Class TransformationStrategies
This class provides several transformation strategies that turn strings or other objects into bit vectors. The transformations might optionally be:
- Lexicographical: for objects based on bytes or characters, such as strings and byte arrays, this means that the first bit of the bit vector is the most significant bit of the first byte or character, and so on. In other words, the lexicographical order between bit vectors reflects the lexicographical byte-by-byte, char-by-char, etc. order. This property is necessary for some kinds of static structures that depend on it, but it has some computational cost, as after compacting bytes or chars into a long we need to reverse the bit order of each piece.
- Prefix-free: no two bit vectors returned by the transformation on two different objects will be comparable in prefix order. Again, this has a cost: it might require linear (e.g., prefixFree()) or constant (e.g., prefixFreeIso()) additional space.
As a general rule, transformations without additional naming are lexicographical. Transformations that generate prefix-free bit vectors are marked as such. Plain transformations that do not provide any guarantee are called raw. They should be used only when performance is the main issue and the two properties above are not relevant.
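As a rough illustration of the two properties, the following self-contained sketch renders byte arrays as strings of '0' and '1' characters instead of using the library's BitVector type. The class BitOrderSketch and its methods are hypothetical names. The least-significant-bit-first layout is only an assumed stand-in for a layout that skips the bit reversal; it is not taken from the library.

```java
final class BitOrderSketch {
    /** Most-significant-bit first: the lexicographical layout described above. */
    static String msbFirst(byte[] a) {
        StringBuilder sb = new StringBuilder(a.length * 8);
        for (byte b : a)
            for (int i = 7; i >= 0; i--) sb.append((b >>> i) & 1);
        return sb.toString();
    }

    /** Least-significant-bit first: an assumed stand-in for a layout without bit reversal. */
    static String lsbFirst(byte[] a) {
        StringBuilder sb = new StringBuilder(a.length * 8);
        for (byte b : a)
            for (int i = 0; i < 8; i++) sb.append((b >>> i) & 1);
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] x = { 1 }, y = { 2 };
        // Byte order says x < y, and the MSB-first bit strings agree.
        System.out.println(msbFirst(x).compareTo(msbFirst(y)) < 0);
        // The LSB-first bit strings order the same two arrays the other way round.
        System.out.println(lsbFirst(x).compareTo(lsbFirst(y)) > 0);
        // Neither layout is prefix-free: the bits of {1} are a prefix of the bits of {1, 2}.
        System.out.println(msbFirst(new byte[] { 1, 2 }).startsWith(msbFirst(x)));
    }
}
```

In this sketch the bits of one array can be a prefix of the bits of a longer array, which is the situation the prefix-free variants are meant to exclude.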
-
Constructor Summary
-
Method Summary
Modifier and Type, Method: Description
- static TransformationStrategy<byte[]> byteArray(): A lexicographical transformation from byte arrays to bit vectors.
- static TransformationStrategy<Long> fixedLong(): A transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector.
- static <T extends BitVector> TransformationStrategy<T> identity(): A trivial transformation for data already in BitVector form.
- static <T extends CharSequence> TransformationStrategy<T> iso(): A trivial transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation.
- static <T extends BitVector> TransformationStrategy<T> prefixFree(): A transformation from bit vectors to bit vectors that guarantees that its results are prefix free.
- static TransformationStrategy<byte[]> prefixFreeByteArray(): A lexicographical transformation from byte arrays to bit vectors that completes the representation with a zero to guarantee lexicographical ordering and prefix-freeness provided the byte arrays do not contain zeros.
- static <T extends CharSequence> TransformationStrategy<T> prefixFreeIso(): A trivial transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation and completes the representation with an ASCII NUL to guarantee lexicographical ordering and prefix-freeness.
- static <T extends CharSequence> TransformationStrategy<T> prefixFreeUtf16(): A trivial transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation and completes the representation with a NUL to guarantee lexicographical ordering and prefix-freeness.
- static <T extends CharSequence> TransformationStrategy<T> prefixFreeUtf32(): A transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs, concatenates the bits of the UTF-32 representation and completes the representation with a NUL to guarantee lexicographical ordering and prefix-freeness.
- static TransformationStrategy<byte[]> rawByteArray(): A trivial, high-performance, raw transformation from byte arrays to bit vectors that simply concatenates the bytes of the array.
- static TransformationStrategy<Long> rawFixedLong(): A trivial, high-performance, raw transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector.
- static <T extends CharSequence> TransformationStrategy<T> rawIso(): A trivial, high-performance, raw transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation.
- static <T extends CharSequence> TransformationStrategy<T> rawUtf16(): A trivial, high-performance, raw transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation.
- static <T extends CharSequence> TransformationStrategy<T> rawUtf32(): A trivial raw transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs and concatenates the bits of the UTF-32 representation.
- static <T extends CharSequence> TransformationStrategy<T> utf16(): A trivial transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation.
- static <T extends CharSequence> TransformationStrategy<T> utf32(): A transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs and concatenates the bits of the UTF-32 representation.
- static <T> Iterable<BitVector> wrap(Iterable<T> iterable, TransformationStrategy<? super T> transformationStrategy): Wraps a given iterable, returning an iterable that contains bit vectors.
- static <T> Iterator<BitVector> wrap(Iterator<T> iterator, TransformationStrategy<? super T> transformationStrategy): Wraps a given iterator, returning an iterator that emits bit vectors.
- static <T> List<BitVector> wrap(List<T> list, TransformationStrategy<? super T> transformationStrategy): Wraps a given list, returning a list that contains bit vectors.
-
Constructor Details
-
TransformationStrategies
public TransformationStrategies()
-
-
Method Details
-
identity
A trivial transformation for data already in BitVector form.
-
rawUtf32
A trivial raw transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs and concatenates the bits of the UTF-32 representation.
Warning: this transformation is not lexicographic.
-
utf32
A transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs and concatenates the bits of the UTF-32 representation.
-
prefixFreeUtf32
A transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs, concatenates the bits of the UTF-32 representation and completes the representation with a NUL to guarantee lexicographical ordering and prefix-freeness.
Note that strings provided to this strategy must not contain NULs.
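To make the surrogate decoding and the NUL terminator concrete, here is a minimal sketch using only the standard library. It works on arrays of code points rather than on bit vectors. Utf32Sketch, codePointsWithNul and bitLength are hypothetical names, and the count of 32 bits per code point (terminator included) is an assumption of this sketch, not a statement about the library's internal layout.

```java
import java.util.Arrays;

final class Utf32Sketch {
    /** Decodes surrogate pairs into code points and appends a NUL (0) terminator. */
    static int[] codePointsWithNul(String s) {
        if (s.indexOf('\0') >= 0) throw new IllegalArgumentException("input must not contain NUL");
        int[] codePoints = s.codePoints().toArray();
        return Arrays.copyOf(codePoints, codePoints.length + 1); // the extra slot is 0
    }

    /** Bit length in this sketch: 32 bits per code point, terminator included. */
    static long bitLength(String s) {
        return 32L * codePointsWithNul(s).length;
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which takes two UTF-16 chars but a single code point.
        String s = "a\uD83D\uDE00";
        System.out.println(Arrays.toString(codePointsWithNul(s)));
        System.out.println(bitLength(s));
    }
}
```

Because no input may contain a NUL, the terminator of a shorter string always differs from the code point found at the same position in any longer string, which is what rules out prefixes.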
-
rawUtf16
A trivial, high-performance, raw transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation.
Warning: this transformation is not lexicographic.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
-
utf16
A trivial transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
-
prefixFreeUtf16
A trivial transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation and completes the representation with a NUL to guarantee lexicographical ordering and prefix-freeness.
Note that strings provided to this strategy must not contain NULs.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
-
rawIso
A trivial, high-performance, raw transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation.
Warning: this transformation is not lexicographic.
Note that this transformation is sensible only for strings that are known to contain just characters in the ISO-8859-1 charset.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
-
iso
A trivial transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation.
Note that this transformation is sensible only for strings that are known to contain just characters in the ISO-8859-1 charset.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
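The ISO-8859-1 restriction can be seen in a small standalone sketch that keeps the lower eight bits of each char. IsoSketch and lowerEightBits are hypothetical names, and the code mimics only the truncation step, not the library's bit-vector adaptor.

```java
import java.util.Arrays;

final class IsoSketch {
    /** Keeps only the lower eight bits of each UTF-16 code unit. */
    static byte[] lowerEightBits(CharSequence s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) out[i] = (byte) s.charAt(i);
        return out;
    }

    public static void main(String[] args) {
        // U+00E9 is in ISO-8859-1, so its single byte 0xE9 identifies it.
        System.out.println(Arrays.toString(lowerEightBits("\u00E9")));
        // U+20AC is outside ISO-8859-1: truncation leaves 0xAC, the same byte as U+00AC.
        System.out.println(Arrays.equals(lowerEightBits("\u20AC"), lowerEightBits("\u00AC")));
    }
}
```

Two different strings can therefore collapse to the same bytes as soon as a character outside ISO-8859-1 appears.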
-
prefixFreeIso
A trivial transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation and completes the representation with an ASCII NUL to guarantee lexicographical ordering and prefix-freeness.
Note that this transformation is sensible only for strings that are known to contain just characters in the ISO-8859-1 charset, and that strings provided to this strategy must not contain ASCII NULs.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
-
rawByteArray
A trivial, high-performance, raw transformation from byte arrays to bit vectors that simply concatenates the bytes of the array.
Warning: this transformation is not lexicographic.
Warning: bit vectors returned by this strategy are adaptors around the original array. If the array changes while the bit vector is being accessed, the results will be unpredictable.
-
byteArray
A lexicographical transformation from byte arrays to bit vectors.
Warning: bit vectors returned by this strategy are adaptors around the original array. If the array changes while the bit vector is being accessed, the results will be unpredictable.
-
prefixFreeByteArray
A lexicographical transformation from byte arrays to bit vectors that completes the representation with a zero to guarantee lexicographical ordering and prefix-freeness provided the byte arrays do not contain zeros.
This transformation is mainly intended for byte arrays representing ASCII strings in compact form.
Warning: bit vectors returned by this strategy are adaptors around the original array. If the array changes while the bit vector is being accessed, the results will be unpredictable.
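The sketch below shows why the arrays must not contain zeros. It works at the byte level with plain arrays; ZeroTerminatedSketch, terminate and isPrefix are hypothetical helpers written for this illustration and are not part of the library.

```java
import java.util.Arrays;

final class ZeroTerminatedSketch {
    /** Returns a copy of the array with one zero byte appended. */
    static byte[] terminate(byte[] a) {
        return Arrays.copyOf(a, a.length + 1);
    }

    /** Returns true if p is a prefix of a. */
    static boolean isPrefix(byte[] p, byte[] a) {
        return p.length <= a.length && Arrays.equals(p, Arrays.copyOf(a, p.length));
    }

    public static void main(String[] args) {
        byte[] ab = { 'a', 'b' }, abc = { 'a', 'b', 'c' };
        // Without the terminator, "ab" is a prefix of "abc".
        System.out.println(isPrefix(ab, abc));
        // With the terminator it no longer is: the zero meets 'c'.
        System.out.println(isPrefix(terminate(ab), terminate(abc)));
        // If zeros are allowed in the input, the guarantee breaks: {1,0} is a prefix of {1,0,0}.
        System.out.println(isPrefix(terminate(new byte[] { 1 }), terminate(new byte[] { 1, 0 })));
    }
}
```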
-
wrap
public static <T> Iterator<BitVector> wrap(Iterator<T> iterator, TransformationStrategy<? super T> transformationStrategy)
Wraps a given iterator, returning an iterator that emits bit vectors.
- Parameters:
- iterator - an iterator.
- transformationStrategy - a strategy to transform the objects returned by iterator.
- Returns:
- an iterator that emits the content of iterator passed through transformationStrategy.
-
wrap
public static <T> Iterable<BitVector> wrap(Iterable<T> iterable, TransformationStrategy<? super T> transformationStrategy)
Wraps a given iterable, returning an iterable that contains bit vectors.
- Parameters:
- iterable - an iterable.
- transformationStrategy - a strategy to transform the objects contained in iterable.
- Returns:
- an iterable that has the content of iterable passed through transformationStrategy.
-
wrap
public static <T> List<BitVector> wrap(List<T> list, TransformationStrategy<? super T> transformationStrategy)
Wraps a given list, returning a list that contains bit vectors.
- Parameters:
- list - a list.
- transformationStrategy - a strategy to transform the objects contained in list.
- Returns:
- a list that has the content of list passed through transformationStrategy.
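The wrapping idea behind the three wrap methods can be sketched without the library as a lazy adaptor that applies a function to each element on demand. WrapSketch is a hypothetical class, and a plain java.util.function.Function stands in for TransformationStrategy. This is an illustration of the pattern, not the library's implementation.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

final class WrapSketch {
    /** Returns an iterator that emits each element of the given iterator passed through transform. */
    static <T, R> Iterator<R> wrap(final Iterator<T> iterator, final Function<? super T, ? extends R> transform) {
        return new Iterator<R>() {
            @Override public boolean hasNext() { return iterator.hasNext(); }
            // The transformation is applied lazily, one element at a time.
            @Override public R next() { return transform.apply(iterator.next()); }
        };
    }

    public static void main(String[] args) {
        // String::length stands in for a transformation to bit vectors.
        Iterator<Integer> lengths = wrap(Arrays.asList("a", "bcd").iterator(), String::length);
        while (lengths.hasNext()) System.out.println(lengths.next());
    }
}
```

Nothing is transformed until next() is called, so in this sketch wrapping a large collection costs no extra memory up front.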
-
prefixFree
A transformation from bit vectors to bit vectors that guarantees that its results are prefix free.
More in detail, we map 0 to 10, 1 to 11, and we add a 0 at the end of all strings.
Warning: bit vectors returned by this strategy are adaptors around the original bit vector. If the bit vector changes while the result is being accessed, the results will be unpredictable.
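The mapping described above (0 to 10, 1 to 11, plus a final 0) can be sketched on strings of '0' and '1' characters. PrefixFreeSketch and encode are hypothetical names, and the string representation is a simplification of the library's bit vectors.

```java
final class PrefixFreeSketch {
    /** Encodes a string of '0'/'1' characters: 0 becomes 10, 1 becomes 11, then a final 0 is appended. */
    static String encode(String bits) {
        StringBuilder sb = new StringBuilder(2 * bits.length() + 1);
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c != '0' && c != '1') throw new IllegalArgumentException("not a bit: " + c);
            sb.append('1').append(c); // every original bit is preceded by a 1
        }
        return sb.append('0').toString(); // a 0 in an even position can only be the terminator
    }

    public static void main(String[] args) {
        System.out.println(encode("0"));
        System.out.println(encode("01"));
        // "0" is a prefix of "01", but the encodings are not comparable in prefix order.
        System.out.println(encode("01").startsWith(encode("0")));
    }
}
```

The output has 2n + 1 bits for an input of n bits, which is the linear additional space cost of this encoding.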
-
fixedLong
A transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector. Note that the bit vectors have as first bit the most significant bit of the underlying long integer, and that the first bit of the representation is flipped, so lexicographical and numerical order coincide.
- Implementation Notes:
- The flipping of the most significant bit was implemented in 2.6.18 to match lexicographical and numerical order for negative numbers, too, and made it necessary to bump the serial version of the strategy.
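The effect of flipping the most significant bit can be checked with a small standalone sketch that renders longs as 64-character bit strings, most significant bit first. FixedLongSketch, bits and unflippedBits are hypothetical names. The unflipped variant is included only for comparison and is not a description of rawFixedLong().

```java
final class FixedLongSketch {
    /** MSB-first rendering of x with the sign bit flipped. */
    static String bits(long x) {
        return unflippedBits(x ^ Long.MIN_VALUE);
    }

    /** MSB-first rendering of x as is, shown only for comparison. */
    static String unflippedBits(long x) {
        StringBuilder sb = new StringBuilder(Long.SIZE);
        for (int i = Long.SIZE - 1; i >= 0; i--) sb.append((x >>> i) & 1);
        return sb.toString();
    }

    public static void main(String[] args) {
        // With the flip, -1 precedes 0 lexicographically, as it does numerically.
        System.out.println(bits(-1L).compareTo(bits(0L)) < 0);
        // Without the flip, -1 is all ones and sorts after every non-negative number.
        System.out.println(unflippedBits(-1L).compareTo(unflippedBits(0L)) > 0);
    }
}
```

In two's complement, negative numbers have the sign bit set, so flipping it places them before the non-negative ones while preserving the order within each group.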
-
rawFixedLong
A trivial, high-performance, raw transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector.
- Implementation Notes:
- Implementing lexicographical order for all numbers in fixedLong() in 2.6.18 made it necessary to bump the serial version of this strategy, too.
-