Class HuffmanCodec

java.lang.Object
it.unimi.dsi.compression.HuffmanCodec
All Implemented Interfaces:
Codec, PrefixCodec, Serializable

public class HuffmanCodec extends Object implements PrefixCodec, Serializable
An implementation of Huffman optimal prefix-free coding.

A Huffman coder is built starting from an array of frequencies corresponding to each symbol. Frequency 0 symbols are allowed, but they will degrade the resulting code.
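The effect of zero-frequency symbols can be seen with a minimal sketch. It uses the textbook priority-queue construction, not the in-place procedure cited in this class description, and the class HuffmanLengthsSketch and its methods are hypothetical names that are not part of this library. It computes one codeword length per symbol and the total number of bits needed to encode the input.

```java
import java.util.PriorityQueue;

/** Hypothetical helper, not part of the library: textbook Huffman codeword lengths. */
final class HuffmanLengthsSketch {

    /** Returns the codeword length of each symbol for the given nonnegative frequencies. */
    static int[] codeLengths(long[] frequency) {
        final int n = frequency.length;
        if (n == 0) return new int[0];
        // Nodes 0..n-1 are leaves (symbols); nodes n..2n-2 are internal nodes.
        final long[] weight = new long[2 * n - 1];
        final int[] parent = new int[2 * n - 1];
        PriorityQueue<Integer> queue =
            new PriorityQueue<Integer>((a, b) -> Long.compare(weight[a], weight[b]));
        for (int i = 0; i < n; i++) {
            weight[i] = frequency[i];
            queue.add(i);
        }
        int next = n;
        // Repeatedly merge the two lightest nodes under a new internal node.
        while (queue.size() > 1) {
            int a = queue.poll(), b = queue.poll();
            weight[next] = weight[a] + weight[b];
            parent[a] = parent[b] = next;
            queue.add(next++);
        }
        // The codeword length of a symbol is the depth of its leaf.
        final int root = 2 * n - 2;
        int[] length = new int[n];
        for (int i = 0; i < n; i++)
            for (int node = i; node != root; node = parent[node]) length[i]++;
        return length;
    }

    /** Returns the total number of bits used to encode every occurrence of every symbol. */
    static long encodedBits(long[] frequency) {
        int[] length = codeLengths(frequency);
        long bits = 0;
        for (int i = 0; i < frequency.length; i++) bits += frequency[i] * length[i];
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(encodedBits(new long[] { 1, 1 }));
        System.out.println(encodedBits(new long[] { 1, 1, 0 }));
    }
}
```

In this sketch, adding a third symbol with frequency 0 to the frequencies { 1, 1 } raises the encoded size from 2 to 3 bits, because the unused symbol still occupies a leaf and pushes one of the used symbols one level deeper.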

Instances of this class compute a canonical Huffman code (Eugene S. Schwartz and Bruce Kallick, “Generating a Canonical Prefix Encoding”, Commun. ACM 7(3), pages 166–169, 1964), which can be quickly decoded using table lookups. The construction uses the most efficient one-pass in-place codelength computation procedure described by Alistair Moffat and Jyrki Katajainen in “In-Place Calculation of Minimum-Redundancy Codes”, Algorithms and Data Structures, 4th International Workshop, number 955 in Lecture Notes in Computer Science, pages 393–402, Springer-Verlag, 1995.
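The idea behind a canonical code is that the codewords are fully determined by the codeword lengths. The following sketch shows one common convention (shorter codewords first, ties broken by symbol index, consecutive binary values); the ordering actually used by this class may differ, and CanonicalCodeSketch is a hypothetical name that is not part of this library. Codewords are rendered as strings of 0s and 1s for readability.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper, not part of the library: canonical codewords from codeword lengths. */
final class CanonicalCodeSketch {

    /** Assigns canonical codewords, as bit strings, to symbols with the given lengths. */
    static String[] assign(int[] length) {
        final int n = length.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Sort symbols by increasing length; ties are broken by symbol index.
        Arrays.sort(order, Comparator.<Integer>comparingInt(s -> length[s]).thenComparingInt(s -> s));

        String[] codeWord = new String[n];
        long next = 0;      // numeric value of the next codeword
        int previous = 0;   // length of the previously assigned codeword
        for (int s : order) {
            // When the length grows, pad the running value with zeros on the right.
            next <<= length[s] - previous;
            codeWord[s] = toBits(next, length[s]);
            next++;
            previous = length[s];
        }
        return codeWord;
    }

    /** Renders the lowest {@code length} bits of {@code value}, most significant bit first. */
    private static String toBits(long value, int length) {
        StringBuilder bits = new StringBuilder(length);
        for (int i = length - 1; i >= 0; i--) bits.append((value >>> i) & 1);
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(assign(new int[] { 1, 2, 2 })));
    }
}
```

Because only the lengths are needed to rebuild the code, a decoder can work from small per-length tables instead of walking a tree, which is what makes table-lookup decoding practical.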

We note in passing that this codec uses a CanonicalFast64CodeWordDecoder, which does not support codelengths above 64. However, since the worst case for codelengths is given by Fibonacci numbers, and frequencies are to be provided as integers, no codeword longer than the base-[(5^(1/2) + 1)/2] logarithm of 5^(1/2) · 2^31 (less than 47) will ever be generated.
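The bound quoted above, the logarithm in base (sqrt(5) + 1)/2 of sqrt(5) · 2^31, can be checked numerically. The following is a small sketch; the class CodeLengthBoundSketch and its parameter frequencyBits are hypothetical names introduced only for this illustration.

```java
/** Hypothetical helper, not part of the library: evaluates the codeword-length bound. */
final class CodeLengthBoundSketch {

    /** Returns log_phi(sqrt(5) * 2^frequencyBits), where phi is the golden ratio. */
    static double bound(int frequencyBits) {
        final double phi = (Math.sqrt(5) + 1) / 2;
        // log_phi(sqrt(5) * 2^b) = (ln sqrt(5) + b * ln 2) / ln phi
        return (Math.log(Math.sqrt(5)) + frequencyBits * Math.log(2)) / Math.log(phi);
    }

    public static void main(String[] args) {
        System.out.println(bound(31));
    }
}
```

For 31 bits the value is about 46.3, consistent with the stated limit of less than 47 and well below the 64-bit limit of the decoder.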

  • Field Summary

    Fields
    Modifier and Type    Field    Description
    final int            size     The number of symbols of this coder.
  • Constructor Summary

    Constructors
    Constructor                       Description
    HuffmanCodec(int[] frequency)     Creates a new Huffman codec using the given vector of frequencies.
    HuffmanCodec(long[] frequency)    Creates a new Huffman codec using the given vector of frequencies.
  • Method Summary

    Modifier and Type    Method         Description
    CodeWordCoder        coder()        Returns a coder for the compression technique represented by this codec.
    BitVector[]          codeWords()    Returns the vector of prefix-free codewords used by this prefix coder.
    Decoder              decoder()      Returns a decoder for the compression technique represented by this codec.
    int                  size()         Returns the number of symbols handled by this codec.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • size

      public final int size
      The number of symbols of this coder.
  • Constructor Details

    • HuffmanCodec

      public HuffmanCodec(int[] frequency)
      Creates a new Huffman codec using the given vector of frequencies.
      Parameters:
      frequency - a vector of nonnegative frequencies.
    • HuffmanCodec

      public HuffmanCodec(long[] frequency)
      Creates a new Huffman codec using the given vector of frequencies.
      Parameters:
      frequency - a vector of nonnegative frequencies.
  • Method Details

    • coder

      public CodeWordCoder coder()
      Description copied from interface: Codec
      Returns a coder for the compression technique represented by this codec.
      Specified by:
      coder in interface Codec
      Specified by:
      coder in interface PrefixCodec
      Returns:
      a coder for the compression technique represented by this codec.
    • decoder

      public Decoder decoder()
      Description copied from interface: Codec
      Returns a decoder for the compression technique represented by this codec.
      Specified by:
      decoder in interface Codec
      Returns:
      a decoder for the compression technique represented by this codec.
    • size

      public int size()
      Description copied from interface: Codec
      Returns the number of symbols handled by this codec.
      Specified by:
      size in interface Codec
      Returns:
      the number of symbols handled by this codec.
    • codeWords

      public BitVector[] codeWords()
      Description copied from interface: PrefixCodec
      Returns the vector of prefix-free codewords used by this prefix coder.
      Specified by:
      codeWords in interface PrefixCodec
      Returns:
      the vector of prefix-free codewords used by this prefix coder.
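
To illustrate what prefix-free means for the returned codewords, here is a minimal sketch that works on codewords written as strings of 0s and 1s rather than on BitVector instances. The class PrefixFreeSketch and its methods are hypothetical names that are not part of this library, and the linear scan is only for clarity; it is not the table-based decoding this class relies on.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the library: prefix-free check and naive decoding. */
final class PrefixFreeSketch {

    /** Returns true if no codeword is a prefix of (or equal to) another codeword. */
    static boolean isPrefixFree(String[] codeWord) {
        for (int i = 0; i < codeWord.length; i++)
            for (int j = 0; j < codeWord.length; j++)
                if (i != j && codeWord[j].startsWith(codeWord[i])) return false;
        return true;
    }

    /** Decodes a bit string by matching codewords from left to right. */
    static List<Integer> decode(String bits, String[] codeWord) {
        List<Integer> symbols = new ArrayList<>();
        int pos = 0;
        while (pos < bits.length()) {
            int match = -1;
            // In a prefix-free code at most one nonempty codeword can match here.
            for (int s = 0; s < codeWord.length; s++)
                if (!codeWord[s].isEmpty() && bits.startsWith(codeWord[s], pos)) {
                    match = s;
                    break;
                }
            if (match < 0) throw new IllegalArgumentException("No codeword matches at bit " + pos);
            symbols.add(match);
            pos += codeWord[match].length();
        }
        return symbols;
    }

    public static void main(String[] args) {
        String[] codeWord = { "0", "10", "11" };
        System.out.println(isPrefixFree(codeWord));
        System.out.println(decode("010110", codeWord));
    }
}
```

Since no codeword is a prefix of another, the concatenation of codewords can be split back into symbols without separators, which is the property a prefix codec depends on.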