Class HuTuckerCodec

java.lang.Object
it.unimi.dsi.compression.HuTuckerCodec
All Implemented Interfaces:
Codec, PrefixCodec, Serializable

public class HuTuckerCodec extends Object implements PrefixCodec, Serializable
An implementation of the Hu–Tucker optimal lexicographical prefix-free code.

The familiar Huffman coding technique can be extended so to preserve the order in which symbols are given to the coder, in the sense that if j<k, then the j-th symbol will get a code lexicographically smaller than the one assigned to the k-th symbol. This result can be obtained with a small loss in code length (for more details, see the third volume of The Art of Computer Programming).

A Hu–Tucker coder is built given an array of frequencies corresponding to each symbol. Frequency 0 symbols are allowed, but they will degrade the resulting code.

The implementation of this class is rather inefficient, and the time required to build a Hu–Tucker code is quadratic in the number of symbols. An O(n log n) implementation is possible, but it requires very sophisticated data structures.

See Also:
  • Field Details

    • size

      public final int size
      The number of symbols of this coder.
  • Constructor Details

    • HuTuckerCodec

      public HuTuckerCodec(int[] frequency)
    • HuTuckerCodec

      public HuTuckerCodec(long[] frequency)
  • Method Details

    • coder

      public CodeWordCoder coder()
      Description copied from interface: Codec
      Returns a coder for the compression technique represented by this coded.
      Specified by:
      coder in interface Codec
      Specified by:
      coder in interface PrefixCodec
      Returns:
      a coder for the compression technique represented by this codec.
    • decoder

      public Decoder decoder()
      Description copied from interface: Codec
      Returns a decoder for the compression technique represented by this coded.
      Specified by:
      decoder in interface Codec
      Returns:
      a decoder for the compression technique represented by this codec.
    • size

      public int size()
      Description copied from interface: Codec
      Returns the number of symbols handled by this codec.
      Specified by:
      size in interface Codec
      Returns:
      the number of symbols handled by this codec.
    • codeWords

      public BitVector[] codeWords()
      Description copied from interface: PrefixCodec
      Returns the vector of prefix-free codewords used by this prefix coder.
      Specified by:
      codeWords in interface PrefixCodec
      Returns:
      the vector of prefix-free codewords used by this prefix coder.
    • getCoder

      @Deprecated public PrefixCoder getCoder()
      Deprecated.
    • getDecoder

      @Deprecated public Decoder getDecoder()
      Deprecated.