public class HyperLogLogCounterArray extends Object implements Serializable
HyperLogLog counters represent the number of elements of a set in an approximate way. They have been introduced by Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Freédeéric Meunier in “HyperLogLog: the analysis of a nearoptimal cardinality estimation algorithm”, Proceedings of the 13th conference on analysis of algorithm (AofA 07), pages 127−146, 2007. They are an improvement over the basic idea of loglog counting, introduced by Marianne Durand and Philippe Flajolet in “Loglog counting of large cardinalities”, ESA 2003, 11th Annual European Symposium, volume 2832 of Lecture Notes in Computer Science, pages 605−617, Springer, 2003.
Each counter is composed by m
registers, and each register is made of registerSize
bits.
The first number depends on the desired relative standard deviation, and its logarithm can be computed using log2NumberOfRegisters(double)
,
whereas the second number depends on an upper bound on the number of distinct elements to be counted, and it can be computed
using registerSize(long)
.
Actually, this class implements an array of counters. Each counter is completely independent, but they all use the same hash function.
The reason for this design is that in our intended applications hundred of millions of counters are common, and the JVM overhead to create such a number of objects
would be unbearable. This class allocates an array of LongArrayBitVector
s, each containing CHUNK_SIZE
registers,
and can thus handle billions of billions of registers efficiently (in turn, this means being able to
handle an array of millions of billions of highprecision counters).
When creating an instance, you can choose the size of the array (i.e., the number of counters) and the desired relative standard deviation (either explicitly or choosing the number of registers per counter). Then, you can add an element to a counter. At any time, you can count count (approximately) the number of distinct elements that have been added to a counter.
If you need to reuse this class multiple times, you can clear all registers, possibly setting a new seed. The seed is used to compute the hash function used by the HyperLogLog counters.
This class provides a number of utility methods that make it possible to extract a counter as an array of longs, set the contents of a counter given an array of longs, and maximize quickly two counters given as arrays of longs.
Modifier and Type  Field and Description 

protected long[][] 
bits
An array of arrays of longs containing all registers.

static long 
CHUNK_MASK
The mask used to obtain an register offset in a chunk.

static int 
CHUNK_SHIFT
The logarithm of the maximum size in registers of a bit vector.

static long 
CHUNK_SIZE
The maximum size in registers of a bit vector.

int 
counterLongwords
The size of a counter in longwords (ceiled if there are less then
Long.SIZE registers per counter). 
protected long 
counterResidualMask

protected int 
counterShift
The shift that selects the chunk corresponding to a counter.

int 
counterSize

int 
log2m
The logarithm of the number of registers per counter (at most 30).

protected boolean 
longwordAligned
Whether counters are aligned to longwords.

protected long[] 
lsbMask
A mask containing a one in the least significant bit of each register (i.e., in positions of the form
registerSize * i ). 
int 
m
The number of registers per counter.

protected int 
mMinus1
The number of registers minus one.

protected long[] 
msbMask
A mask containing a one in the most significant bit of each register (i.e., in positions of the form
registerSize * (i + 1)  1 ). 
protected LongBigList[] 
registers
registerSize bit views of bits . 
int 
registerSize
The size in bits of each register.

protected long 
seed
A seed for hashing.

Constructor and Description 

HyperLogLogCounterArray(long arraySize,
long n,
double rsd)
Creates a new array of counters.

HyperLogLogCounterArray(long arraySize,
long n,
int log2m)
Creates a new array of counters.

HyperLogLogCounterArray(long arraySize,
long n,
int log2m,
long seed)
Creates a new array of counters.

Modifier and Type  Method and Description 

void 
add(long k,
long v)
Adds an element to a counter.

int 
chunk(long counter)
Returns the chunk of a given counter.

void 
clear()
Clears all registers.

void 
clear(long seed)
Clears all registers and sets a new seed (e.g., using
Util.randomSeed() ). 
double 
count(long k)
Estimates the number of distinct elements that have been added to a given counter so far.

double 
count(long[] bits,
long offset)
Estimates the number of distinct elements that have been added to a given counter so far.

void 
getCounter(long[] chunkBits,
long index,
long[] dest)
Extracts a counter from this
HyperLogLogCounterArray and writes it into an array of longs. 
void 
getCounter(long index,
long[] dest)
Extracts a counter from this
HyperLogLogCounterArray and writes it into an array of longs. 
static int 
log2NumberOfRegisters(double rsd)
Returns the logarithm of the number of registers per counter that are necessary to attain a
given relative standard deviation.

void 
max(long[] x,
long[] y)
Computes the registerbyregister maximum of two counters.

void 
max(long[] x,
long[] y,
long[] accumulator,
long[] mask)
Computes the registerbyregister maximum of two counters.

long 
offset(long counter)
Returns the bit offset of a given counter in its chunk.

LongBigList[] 
registers()
Returns the array of big lists of registers underlying this array of counters.

static int 
registerSize(long n)
Returns the register size in bits, given an upper bound on the number of distinct elements.

static double 
relativeStandardDeviation(int log2m)
Returns the relative standard deviation corresponding to a given logarithm of the number of registers per counter.

void 
setCounter(long[] source,
long index)
Sets the contents of a counter of this
HyperLogLogCounterArray using a provided array of longs. 
void 
setCounter(long[] source,
long[] chunkBits,
long index)
Sets the contents of a counter of this
HyperLogLogCounterArray using a provided array of longs. 
void 
transfer(long[] source,
long[] dest,
long node)
Transfers the content of a counter between two parallel array of longwords.

public static final int CHUNK_SHIFT
public static final long CHUNK_SIZE
public static final long CHUNK_MASK
protected final int mMinus1
protected final long[][] bits
protected final LongBigList[] registers
registerSize
bit views of bits
.protected final int counterShift
protected long seed
protected final boolean longwordAligned
protected final long counterResidualMask
protected final long[] msbMask
registerSize * (i + 1)  1
).protected final long[] lsbMask
registerSize * i
).public final int log2m
public final int m
public final int registerSize
public final int counterSize
public final int counterLongwords
Long.SIZE
registers per counter).public HyperLogLogCounterArray(long arraySize, long n, double rsd)
arraySize
 the number of counters.n
 the expected number of elements.rsd
 the relative standard deviation.public HyperLogLogCounterArray(long arraySize, long n, int log2m)
arraySize
 the number of counters.n
 the expected number of elements.log2m
 the logarithm of the number of registers per counter (at most 30).public HyperLogLogCounterArray(long arraySize, long n, int log2m, long seed)
arraySize
 the number of counters.n
 the expected number of elements.log2m
 the logarithm of the number of registers per counter (at most 30).seed
 the seed used to compute the hash function.public static int log2NumberOfRegisters(double rsd)
rsd
 the relative standard deviation to be attained.rsd
.public static double relativeStandardDeviation(int log2m)
log2m
 the logarithm of the number of registers per counter (at most 30).public static int registerSize(long n)
n
 an upper bound on the number of distinct elements.public int chunk(long counter)
counter
 a counter.public long offset(long counter)
counter
 a counter.public void clear(long seed)
Util.randomSeed()
).seed
 the new seed used to compute the hash functionpublic void clear()
public void add(long k, long v)
k
 the index of the counter.v
 the element to be added.public LongBigList[] registers()
The main purpose of this method is debugging, as it makes comparing the evolution of the state of two implementations easy.
public double count(long[] bits, long offset)
This is an lowlevel method that should be used only after having understood in detail the inner workings of this class.
bits
 the bit array containing the counter.offset
 the starting bit position of the counter in bits
.public double count(long k)
k
 the index of the counter.k
so far.public final void setCounter(long[] source, long[] chunkBits, long index)
HyperLogLogCounterArray
using a provided array of longs.
Warning: this is a lowlevel method. You must know what you're doing.
source
 an array of at least counterLongwords
longs containing a counter.chunkBits
 the array where the counter will be stored.index
 the index of the counter that will be filled with the provided array.getCounter(long[], long, long[])
public void setCounter(long[] source, long index)
HyperLogLogCounterArray
using a provided array of longs.
Warning: this is a lowlevel method. You must know what you're doing.
source
 an array of at least counterLongwords
longs containing a counter.index
 the index of the counter that will be filled with the provided array.setCounter(long[], long[], long)
public final void getCounter(long[] chunkBits, long index, long[] dest)
HyperLogLogCounterArray
and writes it into an array of longs.
Warning: this is a lowlevel method. You must know what you're doing.
chunkBits
 the array storing the counter.index
 the index of the counter to be extracted.dest
 an array of at least counterLongwords
longs where the counter of given index will be written.setCounter(long[], long[], long)
public final void getCounter(long index, long[] dest)
HyperLogLogCounterArray
and writes it into an array of longs.
Warning: this is a lowlevel method. You must know what you're doing.
index
 the index of the counter to be extracted.dest
 an array of at least counterLongwords
longs where the counter of given index will be written.getCounter(long[], long, long[])
public final void transfer(long[] source, long[] dest, long node)
Warning: this is a lowlevel method. You must know what you're doing.
source
 the source array.dest
 the destination array.node
 the node number.public final void max(long[] x, long[] y)
This method will allocate two temporary arrays. To reduce object creation, use max(long[], long[], long[], long[])
.
x
 a first array of at least counterLongwords
longs containing a counter.y
 a second array of at least counterLongwords
longs containing a counter.public final void max(long[] x, long[] y, long[] accumulator, long[] mask)
x
 a first array of at least counterLongwords
longs containing a counter.y
 a second array of at least counterLongwords
longs containing a counter.accumulator
 a support array of at least counterLongwords
longs.mask
 a support array of at least counterLongwords
longs.