Class FrontCodedStringList
- All Implemented Interfaces:
ObjectCollection<MutableString>
,ObjectIterable<MutableString>
,ObjectList<MutableString>
,Stack<MutableString>
,Serializable
,Comparable<List<? extends MutableString>>
,Iterable<MutableString>
,Collection<MutableString>
,List<MutableString>
,RandomAccess
This class stores a list of strings using front-coding (also known as prefix-omission)
compression; the compression will be reasonable only if the list is sorted, but you could also
use instances of this class just as a handy way to manage a large amount of strings. It
implements an immutable ObjectList
that returns the
i-th string (as a MutableString
) when the get(int)
method is called
with argument i. The returned mutable string may be freely modified.
As a commodity, this class provides a main method that reads from standard input a sequence of newline-separated strings, and writes a corresponding serialized front-coded string list.
Implementation Details
To store the list of strings, we use either a UTF-8 coded ByteArrayFrontCodedList
, or a
CharArrayFrontCodedList
, depending on the value of the
utf8
parameter at creation time. In the first
case, if the strings are ASCII-oriented the resulting array will be much smaller, but access
times will increase manifold, as each string must be UTF-8 decoded before being returned.
- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList
AbstractObjectList.ObjectRandomAccessSubList<K extends Object>, AbstractObjectList.ObjectSubList<K extends Object>
-
Field Summary
Modifier and TypeFieldDescriptionprotected final ByteArrayFrontCodedList
The underlyingByteArrayFrontCodedList
, ornull
.protected final CharArrayFrontCodedList
The underlyingCharArrayFrontCodedList
, ornull
.static final long
protected final boolean
Whether this front-coded list is UTF-8 encoded. -
Constructor Summary
ConstructorDescriptionFrontCodedStringList
(Collection<? extends CharSequence> c, int ratio, boolean utf8) Creates a new front-coded string list containing the character sequences contained in the given collection.FrontCodedStringList
(Iterator<? extends CharSequence> words, int ratio, boolean utf8) Creates a new front-coded string list containing the character sequences returned by the given iterator. -
Method Summary
Modifier and TypeMethodDescriptionprotected static char[]
byte2Char
(byte[] a, char[] s) protected static int
countUTF8Chars
(byte[] a) get
(int index) Returns the element at the specified position in this front-coded string list as a mutable string.void
get
(int index, MutableString s) Returns the element at the specified position in this front-coded string list by storing it in a mutable string.listIterator
(int k) static void
int
ratio()
Returns the ratio of the underlying front-coded list.int
size()
boolean
utf8()
Returns whether this front-coded string list is storing its strings as UTF-8 encoded bytes.Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList
add, add, addAll, addAll, addElements, addElements, clear, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, forEach, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, peek, pop, push, remove, removeElements, set, setElements, size, subList, toArray, toArray, top, toString
Methods inherited from class java.util.AbstractCollection
containsAll, isEmpty, remove, removeAll, retainAll
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.util.Collection
parallelStream, removeIf, stream, toArray
Methods inherited from interface java.util.List
containsAll, isEmpty, remove, removeAll, replaceAll, retainAll
Methods inherited from interface it.unimi.dsi.fastutil.objects.ObjectCollection
spliterator
Methods inherited from interface it.unimi.dsi.fastutil.objects.ObjectList
addAll, addAll, setElements, setElements, sort, spliterator, unstableSort
-
Field Details
-
serialVersionUID
public static final long serialVersionUID- See Also:
-
byteFrontCodedList
The underlyingByteArrayFrontCodedList
, ornull
. -
charFrontCodedList
The underlyingCharArrayFrontCodedList
, ornull
. -
utf8
protected final boolean utf8Whether this front-coded list is UTF-8 encoded.
-
-
Constructor Details
-
FrontCodedStringList
Creates a new front-coded string list containing the character sequences returned by the given iterator.- Parameters:
words
- an iterator returning character sequences.ratio
- the desired ratio.utf8
- if true, the strings will be stored as UTF-8 byte arrays.
-
FrontCodedStringList
Creates a new front-coded string list containing the character sequences contained in the given collection.- Parameters:
c
- a collection containing character sequences.ratio
- the desired ratio.utf8
- if true, the strings will be stored as UTF-8 byte arrays.
-
-
Method Details
-
utf8
public boolean utf8()Returns whether this front-coded string list is storing its strings as UTF-8 encoded bytes.- Returns:
- true if this front-coded string list is keeping its data as an array of UTF-8 encoded bytes.
-
ratio
public int ratio()Returns the ratio of the underlying front-coded list.- Returns:
- the ratio of the underlying front-coded list.
-
get
Returns the element at the specified position in this front-coded string list as a mutable string.- Specified by:
get
in interfaceList<MutableString>
- Parameters:
index
- an index in the list.- Returns:
- a
MutableString
that will contain the string at the specified position. The string may be freely modified.
-
get
Returns the element at the specified position in this front-coded string list by storing it in a mutable string.- Parameters:
index
- an index in the list.s
- a mutable string that will contain the string at the specified position.
-
countUTF8Chars
protected static int countUTF8Chars(byte[] a) -
byte2Char
protected static char[] byte2Char(byte[] a, char[] s) -
listIterator
- Specified by:
listIterator
in interfaceList<MutableString>
- Specified by:
listIterator
in interfaceObjectList<MutableString>
- Overrides:
listIterator
in classAbstractObjectList<MutableString>
-
size
public int size()- Specified by:
size
in interfaceCollection<MutableString>
- Specified by:
size
in interfaceList<MutableString>
- Specified by:
size
in classAbstractCollection<MutableString>
-
main
-