Class FrontCodedStringList

All Implemented Interfaces:
ObjectCollection<MutableString>, ObjectIterable<MutableString>, ObjectList<MutableString>, Stack<MutableString>, Serializable, Comparable<List<? extends MutableString>>, Iterable<MutableString>, Collection<MutableString>, List<MutableString>, RandomAccess

public class FrontCodedStringList extends AbstractObjectList<MutableString> implements RandomAccess, Serializable
Compact storage of strings using front-coding compression (also known as compression by prefix omission).

This class stores a list of strings using front-coding (also known as prefix-omission) compression; the compression will be reasonable only if the list is sorted, but you could also use instances of this class just as a handy way to manage a large amount of strings. It implements an immutable ObjectList that returns the i-th string (as a MutableString) when the get(int) method is called with argument i. The returned mutable string may be freely modified.

As a commodity, this class provides a main method that reads from standard input a sequence of newline-separated strings, and writes a corresponding serialized front-coded string list.

Implementation Details

To store the list of strings, we use either a UTF-8 coded ByteArrayFrontCodedList, or a CharArrayFrontCodedList, depending on the value of the utf8 parameter at creation time. In the first case, if the strings are ASCII-oriented the resulting array will be much smaller, but access times will increase manifold, as each string must be UTF-8 decoded before being returned.

See Also:
  • Field Details

  • Constructor Details

    • FrontCodedStringList

      public FrontCodedStringList(Iterator<? extends CharSequence> words, int ratio, boolean utf8)
      Creates a new front-coded string list containing the character sequences returned by the given iterator.
      Parameters:
      words - an iterator returning character sequences.
      ratio - the desired ratio.
      utf8 - if true, the strings will be stored as UTF-8 byte arrays.
    • FrontCodedStringList

      public FrontCodedStringList(Collection<? extends CharSequence> c, int ratio, boolean utf8)
      Creates a new front-coded string list containing the character sequences contained in the given collection.
      Parameters:
      c - a collection containing character sequences.
      ratio - the desired ratio.
      utf8 - if true, the strings will be stored as UTF-8 byte arrays.
  • Method Details