Class PermutedFrontCodedStringList
- All Implemented Interfaces:
ObjectCollection<CharSequence>, ObjectIterable<CharSequence>, ObjectList<CharSequence>, Stack<CharSequence>, Serializable, Comparable<List<? extends CharSequence>>, Iterable<CharSequence>, Collection<CharSequence>, List<CharSequence>
A FrontCodedStringList whose indices are permuted.
It may happen that a list of strings compresses very well
using front coding, but unfortunately alphabetical order is not
the right order for the strings in the list. Instances of this class
wrap an instance of FrontCodedStringList
together with a permutation π: a request for index i
actually returns the string with index π(i) in the underlying list.
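The idea can be sketched with plain arrays. The class below is a hypothetical stand-in, not the dsiutils implementation: it keeps the strings in sorted order (the order that front-codes well) and answers get(i) through a permutation. The helper frontCodedCost is also an assumption made for illustration; it only counts the characters that remain once each string drops the prefix it shares with its predecessor, which approximates why sorted order compresses better.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the permuted lookup; not the dsiutils implementation. */
final class PermutedLookupSketch {
    private final String[] sorted;    // strings in sorted (well-compressing) order
    private final int[] permutation;  // permutation[i] = position in sorted of the i-th string

    PermutedLookupSketch(String[] sorted, int[] permutation) {
        this.sorted = sorted;
        this.permutation = permutation;
    }

    /** A request for index i returns the string stored at position permutation[i]. */
    String get(int index) {
        return sorted[permutation[index]];
    }

    int size() {
        return permutation.length;
    }

    /** Characters left to store when each string omits the prefix shared with the previous one. */
    static int frontCodedCost(String[] list) {
        int cost = 0;
        String previous = "";
        for (String s : list) {
            int common = 0;
            int max = Math.min(previous.length(), s.length());
            while (common < max && previous.charAt(common) == s.charAt(common)) common++;
            cost += s.length() - common;
            previous = s;
        }
        return cost;
    }

    public static void main(String[] args) {
        String[] original = { "foobar", "bar", "foo", "barn" };
        String[] sorted = original.clone();
        Arrays.sort(sorted);
        int[] permutation = { 3, 0, 2, 1 }; // original position -> sorted position
        PermutedLookupSketch list = new PermutedLookupSketch(sorted, permutation);
        for (int i = 0; i < list.size(); i++) {
            System.out.println(i + " -> " + list.get(i));
        }
        System.out.println("cost in original order: " + frontCodedCost(original));
        System.out.println("cost in sorted order:   " + frontCodedCost(sorted));
    }
}
```

The wrapper returns the strings in their original order while the stored order remains the sorted one, which is the trade-off this class is meant to provide.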
If you start from a newline-delimited, unsorted list of UTF-8 strings, the simplest way to build an instance of this class is to obtain a front-coded string list and a permutation with a simple UN*X pipe (which also avoids storing the sorted strings):
nl -v0 -nln | sort -k2 | tee >(cut -f1 >perm.txt) \
  | cut -f2 | java it.unimi.dsi.util.FrontCodedStringList tmp-lex.fcl
The above command will read a list of strings from standard input, write their sorted index list to perm.txt, and create a front-coded string list tmp-lex.fcl containing the sorted list of strings.
Important: you must use the byte-by-byte collation order; on UN*X, make sure that LC_COLLATE=C. Failure to do so will result in sorting that is an order of magnitude slower and in worse compression.
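A small sketch of why the collation order affects compression, under stated assumptions: BYTE_ORDER compares the unsigned UTF-8 bytes of two strings (which is how LC_COLLATE=C orders them), java.text.Collator for Locale.US stands in for a typical locale-aware sort, and frontCodedCost is a hypothetical measure that counts the characters left after dropping the prefix shared with the previous string. It requires Java 9 or later for Arrays.compareUnsigned. It does not try to reproduce the difference in sorting speed.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

final class CollationSketch {
    /** Byte-by-byte order on the UTF-8 encodings, as LC_COLLATE=C would produce. */
    static final Comparator<String> BYTE_ORDER = (a, b) -> Arrays.compareUnsigned(
            a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8));

    static String[] sortedCopy(String[] strings, Comparator<? super String> order) {
        String[] copy = strings.clone();
        Arrays.sort(copy, order);
        return copy;
    }

    /** Characters left to store when each string omits the prefix shared with the previous one. */
    static int frontCodedCost(String[] list) {
        int cost = 0;
        String previous = "";
        for (String s : list) {
            int common = 0;
            int max = Math.min(previous.length(), s.length());
            while (common < max && previous.charAt(common) == s.charAt(common)) common++;
            cost += s.length() - common;
            previous = s;
        }
        return cost;
    }

    public static void main(String[] args) {
        String[] strings = { "data2", "Data1", "data1", "Data2" };
        String[] byBytes = sortedCopy(strings, BYTE_ORDER);
        // A locale-aware collator interleaves upper and lower case,
        // so neighbouring strings stop sharing prefixes.
        String[] byLocale = sortedCopy(strings, Collator.getInstance(Locale.US));
        System.out.println(Arrays.toString(byBytes) + " cost " + frontCodedCost(byBytes));
        System.out.println(Arrays.toString(byLocale) + " cost " + frontCodedCost(byLocale));
    }
}
```

In byte order the strings that start with the same bytes end up adjacent, which is what front coding exploits.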
perm.txt now contains the permutation that you have to pass to this class, provided that you use the option -i. So the last step is just
java it.unimi.dsi.util.PermutedFrontCodedStringList -i -t tmp-lex.fcl perm.txt your.fcl
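The following sketch mimics the pipeline in memory to show why an inversion step is involved. It assumes that -i asks the tool to invert the permutation it reads; the class and method names are hypothetical. The sorted index list written to perm.txt maps each sorted position to the original line number, whereas a lookup by original index needs the opposite mapping. String.compareTo is used as a simplification; it agrees with byte order for ASCII input.

```java
import java.util.Arrays;

final class PermutationPipelineSketch {
    /** What perm.txt would hold: for each sorted position, the 0-based original line number. */
    static int[] sortedIndexList(String[] original) {
        Integer[] indices = new Integer[original.length];
        for (int i = 0; i < indices.length; i++) indices[i] = i;
        Arrays.sort(indices, (a, b) -> original[a].compareTo(original[b]));
        int[] result = new int[indices.length];
        for (int i = 0; i < result.length; i++) result[i] = indices[i];
        return result;
    }

    /** Inverts a permutation: if p[j] == i then the result maps i to j. */
    static int[] invert(int[] p) {
        int[] inverse = new int[p.length];
        for (int j = 0; j < p.length; j++) inverse[p[j]] = j;
        return inverse;
    }

    public static void main(String[] args) {
        String[] original = { "foobar", "bar", "foo", "barn" };
        String[] sorted = original.clone();
        Arrays.sort(sorted);

        int[] perm = sortedIndexList(original); // contents of perm.txt
        int[] lookup = invert(perm);            // original index -> sorted position

        System.out.println("perm.txt: " + Arrays.toString(perm));
        System.out.println("inverted: " + Arrays.toString(lookup));
        for (int i = 0; i < original.length; i++) {
            // Each original string is found at its inverted position in the sorted list.
            System.out.println(original[i] + " == " + sorted[lookup[i]]);
        }
    }
}
```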
-
Nested Class Summary
Nested classes/interfaces inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList
AbstractObjectList.ObjectRandomAccessSubList<K extends Object>, AbstractObjectList.ObjectSubList<K extends Object>
-
Field Summary
Modifier and Type                      Field                  Description
protected final FrontCodedStringList   frontCodedStringList   The underlying front-coded string list.
protected final int[]                  permutation            The permutation.
static final long                      serialVersionUID
-
Constructor Summary
Constructor                                                                                  Description
PermutedFrontCodedStringList(FrontCodedStringList frontCodedStringList, int[] permutation)   Creates a new permuted front-coded string list using a given front-coded string list and permutation.
-
Method Summary
Modifier and Type   Method                            Description
                    get(int index)
void                get(int index, MutableString s)   Returns the element at the specified position in this front-coded list by storing it in a mutable string.
                    listIterator(int k)
static void         main
int                 size()
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList
add, add, addAll, addAll, addElements, addElements, clear, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, forEach, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, peek, pop, push, remove, removeElements, set, setElements, size, subList, toArray, toArray, top, toString
Methods inherited from class java.util.AbstractCollection
containsAll, isEmpty, remove, removeAll, retainAll
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.util.Collection
parallelStream, removeIf, stream, toArray
Methods inherited from interface java.util.List
containsAll, isEmpty, remove, removeAll, replaceAll, retainAll
Methods inherited from interface it.unimi.dsi.fastutil.objects.ObjectCollection
spliterator
Methods inherited from interface it.unimi.dsi.fastutil.objects.ObjectList
addAll, addAll, setElements, setElements, sort, spliterator, unstableSort
-
Field Details
-
serialVersionUID
public static final long serialVersionUID
-
frontCodedStringList
protected final FrontCodedStringList frontCodedStringList
The underlying front-coded string list.
-
permutation
protected final int[] permutation
The permutation.
-
-
Constructor Details
-
PermutedFrontCodedStringList
Creates a new permuted front-coded string list using a given front-coded string list and permutation.
- Parameters:
frontCodedStringList - the underlying front-coded string list.
permutation - the underlying permutation.
-
-
Method Details
-
get
- Specified by:
get in interface List<CharSequence>
-
get
Returns the element at the specified position in this front-coded list by storing it in a mutable string.
- Parameters:
index - an index in the list.
s - a mutable string that will contain the string at the specified position.
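The point of a get method that fills a caller-supplied string is that one buffer can be reused across many lookups. The sketch below illustrates that calling pattern only; it is a hypothetical stand-in in which java.lang.StringBuilder plays the role of MutableString and plain arrays play the role of the front-coded list.

```java
/** Hypothetical stand-in showing the buffer-reuse pattern; StringBuilder replaces MutableString. */
final class ReusableBufferSketch {
    private final String[] sorted;    // strings in sorted order
    private final int[] permutation;  // original index -> sorted position

    ReusableBufferSketch(String[] sorted, int[] permutation) {
        this.sorted = sorted;
        this.permutation = permutation;
    }

    int size() {
        return permutation.length;
    }

    /** Overwrites s with the string at the given (permuted) index. */
    void get(int index, StringBuilder s) {
        s.setLength(0);
        s.append(sorted[permutation[index]]);
    }

    /** Scans the whole list with a single buffer instead of one new object per element. */
    static int totalLength(ReusableBufferSketch list) {
        StringBuilder buffer = new StringBuilder();
        int total = 0;
        for (int i = 0; i < list.size(); i++) {
            list.get(i, buffer);
            total += buffer.length();
        }
        return total;
    }

    public static void main(String[] args) {
        ReusableBufferSketch list = new ReusableBufferSketch(
                new String[] { "bar", "barn", "foo", "foobar" }, new int[] { 3, 0, 2, 1 });
        System.out.println("total characters: " + totalLength(list));
    }
}
```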
-
size
public int size()
- Specified by:
size in interface Collection<CharSequence>
- Specified by:
size in interface List<CharSequence>
- Specified by:
size in class AbstractCollection<CharSequence>
-
listIterator
- Specified by:
listIterator in interface List<CharSequence>
- Specified by:
listIterator in interface ObjectList<CharSequence>
- Overrides:
listIterator in class AbstractObjectList<CharSequence>
-
main
-