Class FrontCodedStringList

All Implemented Interfaces:
ObjectCollection<MutableString>, ObjectIterable<MutableString>, ObjectList<MutableString>, Stack<MutableString>, Serializable, Comparable<List<? extends MutableString>>, Iterable<MutableString>, Collection<MutableString>, List<MutableString>, RandomAccess, SequencedCollection<MutableString>

public class FrontCodedStringList extends AbstractObjectList<MutableString> implements RandomAccess, Serializable
Compact storage of strings using front-coding compression (also known as compression by prefix omission).

This class stores a list of strings using front-coding (also known as prefix-omission) compression; the compression will be reasonable only if the list is sorted, but you could also use instances of this class just as a handy way to manage a large amount of strings. It implements an immutable ObjectList that returns the i-th string (as a MutableString) when the get(int) method is called with argument i. The returned mutable string may be freely modified.

As a commodity, this class provides a main method that reads from standard input a sequence of newline-separated strings, and writes a corresponding serialized front-coded string list.

Implementation Details

To store the list of strings, we use either a UTF-8 coded ByteArrayFrontCodedList, or a CharArrayFrontCodedList, depending on the value of the utf8 parameter at creation time. In the first case, if the strings are ASCII-oriented the resulting array will be much smaller, but access times will increase manifold, as each string must be UTF-8 decoded before being returned.

See Also:
  • Field Details

  • Constructor Details

    • FrontCodedStringList

      public FrontCodedStringList(Iterator<? extends CharSequence> words, int ratio, boolean utf8)
      Creates a new front-coded string list containing the character sequences returned by the given iterator.
      Parameters:
      words - an iterator returning character sequences.
      ratio - the desired ratio.
      utf8 - if true, the strings will be stored as UTF-8 byte arrays.
    • FrontCodedStringList

      public FrontCodedStringList(Collection<? extends CharSequence> c, int ratio, boolean utf8)
      Creates a new front-coded string list containing the character sequences contained in the given collection.
      Parameters:
      c - a collection containing character sequences.
      ratio - the desired ratio.
      utf8 - if true, the strings will be stored as UTF-8 byte arrays.
  • Method Details

    • utf8

      public boolean utf8()
      Returns whether this front-coded string list is storing its strings as UTF-8 encoded bytes.
      Returns:
      true if this front-coded string list is keeping its data as an array of UTF-8 encoded bytes.
    • ratio

      public int ratio()
      Returns the ratio of the underlying front-coded list.
      Returns:
      the ratio of the underlying front-coded list.
    • get

      public MutableString get(int index)
      Returns the element at the specified position in this front-coded string list as a mutable string.
      Specified by:
      get in interface List<MutableString>
      Parameters:
      index - an index in the list.
      Returns:
      a MutableString that will contain the string at the specified position. The string may be freely modified.
    • get

      public void get(int index, MutableString s)
      Returns the element at the specified position in this front-coded string list by storing it in a mutable string.
      Parameters:
      index - an index in the list.
      s - a mutable string that will contain the string at the specified position.
    • countUTF8Chars

      protected static int countUTF8Chars(byte[] a)
    • byte2Char

      protected static char[] byte2Char(byte[] a, char[] s)
    • listIterator

      public ObjectListIterator<MutableString> listIterator(int k)
      Specified by:
      listIterator in interface List<MutableString>
      Specified by:
      listIterator in interface ObjectList<MutableString>
      Overrides:
      listIterator in class AbstractObjectList<MutableString>
    • size

      public int size()
      Specified by:
      size in interface Collection<MutableString>
      Specified by:
      size in interface List<MutableString>
      Specified by:
      size in class AbstractCollection<MutableString>
    • main

      public static void main(String[] arg) throws IOException, com.martiansoftware.jsap.JSAPException, NoSuchMethodException
      Throws:
      IOException
      com.martiansoftware.jsap.JSAPException
      NoSuchMethodException