2.7.2
- Faster, safer AbstractBitVector.bits().

2.7.1
- New method LongBigArrayBitVector.bigBits() to expose the underlying big array.

2.7.0
- New MappedFrontCodedStringBigList makes it possible to access a list of strings using memory mapping.
- TransformationStrategies.fixedLong() is now truly lexicographical, even on negative numbers. Unfortunately, this change made it necessary to bump the serial version of the class, which is the same as that of TransformationStrategies.rawFixedLong(), which means that structures serialized using either strategy will not be deserializable.
- Fixed wrong indexing in LongSet views of subvectors in AbstractBitVector.

2.6.17
- New FileLines*Iterable classes replacing older FileLines*Collection implementations. They transparently provide support for more than 2^31 lines and for arbitrary decompressors that extend java.io.InputStream. It is also possible to specify the number of strings in advance.
- New lexicographical prefix-free transformation strategy for byte arrays that do not contain zeros.
- Fixed problem with generation of Java 8 code and some ByteBuffer methods. Thanks to Aaron Novstrup for reporting this bug.

2.6.16
- The DSI Utilities are now dually licensed under the GNU Lesser General Public License 2.1+ or the Apache Software License 2.0.

2.6.15
- Removed (almost) unused dependencies.
- Upgraded to newest versions of Apache Commons.

2.6.14
- Reinstated 12-bit OutputBitStream tables (mistakenly removed).

2.6.13
- OSGi modularization (really, really for real).

2.6.12
- OSGi modularization (really for real).

2.6.11
- Manual strength reduction in arithmetic operations everywhere.
- Useful static methods of LongArrayBitVector and LongBigArrayBitVector are now public.
- Assertions for indices in array-based bit vectors (even when CHECKS is false).
- OSGi modularization (for real).

2.6.10
- OSGi metadata and default modularization.

2.6.9
- Added automatic module name.

2.6.8
- Fixed missing data files from tests (thanks to Pierre Gruet for reporting this bug).

2.6.7
- New methods for composing permutations.
- Big versions of front-coded string lists.

2.6.6
- Fixed SLF4J dependencies (again).

2.6.5
- Fixed erroneous casting in LongBigArrayBitVector. Thanks to Louie Helm for reporting this bug.

2.6.4
- New LongBigArrayBitVector class for bit vectors longer than 2^37.

2.6.3
- Fixed major bug in splitting methods for pseudorandom number generators: they were not advancing the generator state, which means that multiple calls to split() would have returned identical generators.

2.6.2
- Fixed build.xml: we now use the classifier in naming artifacts.

2.6.0
- New, slightly faster PRNGs using the ++ scrambler.

2.5.5
- Faster, no-test versions of Fast.int2nat() and Fast.nat2int().
- ByteBufferLongBigList instances are now mutable.

2.5.4
- Fixed long-standing bug in LongArrayBitVector.{next|previous}{One|Zero}().

2.5.3
- Removed boundary checks from LongArrayBitVector.

2.5.2
- Fixed dependency on SLF4J.

2.5.1
- Improved jump()/longJump() methods for generating non-overlapping sequences. Thanks to Danilo Fabiani and Tim Mattox for reporting problems and suggesting solutions.

2.5.0
- New PRNGs.
- ObjectParser now has default factory methods getInstance, newInstance and valueOf.

2.4.2
- Fixed bug in nextInt(1) call for xoroshiro128+ and xorshift1024*φ implementations.

2.4.1
- xorshift1024*φ generators.
- More Java 8-ish code.
- Eliminated spaces inside ( and [.
- Removed multiplication-free nextDouble()/nextFloat() from all PRNG implementations. Now they all generate floating-point numbers using 53 and 24 significant bits, respectively. The difference is only in the least significant bit of half the values generated (previously it would have always been zero). New methods with a Fast suffix have been added to obtain the old behavior.
- In xoroshiro128+ and xorshift1024*φ implementations, nextInt() now returns the upper half (instead of the lower half) and the shortcut for powers of two in nextInt() and nextLong() now returns the high bits instead of the low bits.

2.4.0
- Java 8-only, fastutil 8.

2.3.6
- Added ReorderingBlockingQueue.

2.3.5
- Moved logback-classic dependency from compile to runtime.
- Fixed licence in Kahan-summation classes.
- Fast constructor for LongArrayBitVector instances of given length.

2.3.4
- Now all RandomGenerator implementations implement Serializable, so all generators can be serialized and restarted in the same state.

2.3.3
- xoroshiro128+ implementations and new JMH-based speed measurements.
- SplitMix64 now provides a truly uniform nextInt(int): the generated sequence, however, will not be the same.
- SplitMix64 and XorShift1024Star will now return different results for nextLong(long)/nextInt(int) when the argument is a power of 2 (but more quickly).

2.3.2
- New JSAP StringParser for enums.

2.3.1
- Fixed bug in PrefixFreeTransformationStrategy.

2.3.0
- New transformation strategies for byte arrays.
- Several new, high-performance raw transformation strategies that do not reverse bit order.

2.2.8
- New class FileLinesByteArrayCollection.

2.2.7
- New high-performance raw transformation strategy for byte arrays.

2.2.6
- Fixed jump() function of xorshift1024* generators.

2.2.5
- All pseudorandom number generators now use a smart, faster conversion to float/double, SplitMix64 is used everywhere to generate a large state from a seed, and the shifts of XorShift128Plus have been slightly modified to obtain a better polynomial weight. The generators should be pretty stable now (hopefully).

2.2.4
- New InputBitStream.copyTo()/OutputBitStream.copyFrom() methods.

2.2.3
- Improved speed for Fast.select() thanks to an idea of Giuseppe Ottaviano.
- New SplitMix64Random PRNG that instantiates Java 8's SplittableRandom and provides the fastest generator passing BigCrush using only 64 bits of state.
- Now all nextBoolean() methods in PRNGs check for nextLong() < 0, which is faster. 1024-bit generators with a large state space are now initialized using a SplitMix64RandomGenerator, which will cause, given a seed, a different sequence to be generated.
- XorShift64Random[Generator] has been deprecated, as SplitMix64Random[Generator] is better in every respect.
- MutableString's methods that read UTF-8 strings are now deprecated, as they don't properly handle surrogate pairs. The self-delimiting methods have not been deprecated, but a warning has been added to the documentation.

2.2.2
- Fixed Maven dependencies.

2.2.1
- Some jackknife BigDecimal operations were not propagating the math context.
- Slightly improved speed of InputBitStream.readUnary() methods (thanks to Giuseppe Ottaviano for the tip).

2.2.0
- Removed all deprecated dependencies on Log4J.
- Removed deprecated methods in Fast.
- Removed OfflineIterable.length() method.
- Removed Utf16TransformationStrategy, IntHyperLogLogCounterArray, XorShiftStarRandom, XorShiftStarRandomGenerator.

2.1.10
- Fixed wrong computation of local speed upon restart of a ProgressLogger.
- New XorShift128PlusRandom generators. These are currently the fastest generators that have no systematic failure in BigCrush, albeit the state space is good only for single-thread usage.

2.1.9
- Fixed iterator bug in FrontCodedStringList, which moreover now implements java.util.RandomAccess.
- Made HyperLogLogCounterArray.counterLongwords public.

2.1.8
- Many utility methods previously in WebGraph's HyperBall class have been moved to HyperLogLogCounterArray.

2.1.7
- Fixed (again, sorry, I think it will be the last time :) the parameters of xorshift* generators.
- XorShiftStarRandom[Generator] has been deprecated in favour of XorShift64StarRandom[Generator] 2.1.6. The new class has slightly improved parameters and a seeding procedure that avoids "zeroland" transient behaviour when seeding with small numbers (I do it all the time...).
- SummaryStats is now thread safe.

2.1.5
- Further replaced BloomFilter methods (hopefully for the last time).
- Switched to standard vByte coding in ByteDiskQueue (the previous coding, mistakenly called Group Varint, was slower).

2.1.4
- Replaced BloomFilter.contains(Hasher) with BloomFilter.contains(HashCode). The first method is now deprecated, as it calls Hasher.hash(), which is not idempotent.
- Fixed bug in the value returned by LongArrayBitVector.set(int,boolean).

2.1.3
- Replaced BloomFilter.add(Hasher) with BloomFilter.add(HashCode). The first method is now deprecated, as it calls Hasher.hash(), which is not idempotent.

2.1.2
- New class for Kahan summations.

2.1.1
- New method InputBitStream.position().

2.1
- Decided the final values, finally, for all constants in XorShiftStar pseudorandom number generators, including the new XorShift1024Star generator. These classes are no longer experimental.
- Added isEmpty() method to MutableString.
- Added slower UTF-32 transformation strategy that handles surrogate pairs.
- New PermutedFrontCodedStringList has the option of passing a text file for the permutation, and instructions on how to build the permutation using just UN*X commands (suggested by Peter Mika).
- Added file: functionality to ObjectParser.
- BloomFilter now supports custom hashers.
- Util.format() was localized, leading to inconsistent formatting. Thanks to Nicola Tonellotto for reporting this bug.
- New ByteDiskQueue for handling large queues.
- We no longer accept repeated characters in the string specifying the additional word constituents of a FastBufferedReader.
- New ProgressLogger.start() methods that make it possible to start ahead of current time.
- Fixed very old set of pernicious bugs in reading/writing coded longs in bit streams due to missing L's in constant literals.
- IntHyperLogLogCounterArray has been deprecated in favour of HyperLogLogCounterArray, which supports long indices and values.

2.0.15
- OfflineIterable instances can now be reused.
- DelimitedWordReader now implements copy() correctly.

2.0.14
- ProgressLogger can now fix the time units used for speed and time per item.
- The one-argument constructor of ByteBufferLongBigList was not computing the parameters passed to the chained constructor correctly. Thanks to Tim Potter for reporting and fixing this bug.

2.0.13
- New LongBigArraySignedStringMap for signing very large maps and keeping the signatures on disk, memory mapped.
- More integer-overflow bug fixes (thanks to Tim Potter).

2.0.12
- The fix to it.unimi.dsi.big.util.ShiftAddXorSignedStringMap was not complete.

2.0.11
- Removed VOID_FUNNEL from BloomFilter (it wasn't really necessary).
- We did not *really* switch to SLF4J for logging. We now do. In the process, we lost serialization compatibility for some classes, and some loggers that were useful only in the main method are no longer cached in static fields.
- it.unimi.dsi.big.util.ShiftAddXorSignedStringMap was not implementing Size64 and not using size64(). Now it does. Thanks to Tim Potter for reporting this problem.
- it.unimi.dsi.big.util.LiterallySignedStringMap was not implementing Size64. Now it does.

2.0.10
- LongArrayBitVector now exploits the fact that shift operations use just the lower necessary bits of their right operand, avoiding a few masking operations.
- BloomFilter has been significantly enhanced and is no longer binary compatible with its previous version.
- We switched to SLF4J for logging.
- ProgressLogger can now optionally display the local speed, that is, the speed shown by the process since the last log. It is now also based on SLF4J, and the Log4J constructors have been deprecated.
- All Log4J-related methods in Util have been deprecated. If ConsoleAppender cannot be found (e.g., because you are using a bridging package that mimics Log4J) a warning is issued.
- Util used to share a static number format without synchronization. Now old format methods are synchronized, and there are new equivalent methods to which you can supply your own NumberFormat.

2.0.9
- Fixed again (hopefully in the right way this time) the Ivy dependency file.

2.0.8
- Several methods in Fast now use equivalent methods in Integer/Long or have just been deprecated in favour of their copies, as recent JVMs intrinsify such methods.
- The Ivy dependency file has been Maven-normalized (now we have a "compile" scope instead of a "build" scope). Thanks to Kazó Csaba for reporting this issue.

2.0.7
- New InputBitStream methods for multiple coded writing (e.g., writeGammas()).
- New static methods in it.unimi.dsi.big.util.StringMaps offer big views of standard string and prefix maps.

2.0.6
- New NullLogger for throwing away all logging.

2.0.4
- Fixed jar building issues.

2.0.3
- New Jackknife class to compute arbitrary-precision jackknife estimates.
- Fixed bug in SummaryStats.max().
- Now property files use "=" instead of " = " as separator, thanks to the new setter in PropertiesConfigurationLayout, and are thus sourceable.
- Now XorShiftStarRandom uses the right number of bits to generate doubles and floats: the previous version could occasionally generate 1.
- New ByteBufferLongBigList class (along the lines of ByteBufferInputStream) that exposes arbitrarily large binary files of longs (with settable endianness) as a big list.
- We now use Ivy to handle jar dependencies.

2.0.2
- Now ByteBufferInputStream initialises copies lazily, which should help when mapping large files with concurrent access (e.g., WebGraph's).
- Now XorShiftStarRandom uses 63 bits to generate doubles, and provides truly uniform nextInt(int) and nextLong(long) methods.
- Fixed computation of singular names in ProgressLogger.
- Now IntHyperLogLogCounterArray has a count() method that returns the estimate of a single counter.

2.0.1
- Now there are permutation methods for big arrays.
- New SummaryStats class for basic statistics bookkeeping (I know, there are billions of those, but we wanted something really simple, fast, and supporting >2^31 values).

2.0
- WARNING: This release has minor binary incompatibilities with previous releases, mainly due to the move from the interface it.unimi.dsi.util.LongBigList to the now standard it.unimi.dsi.fastutil.longs.LongBigList. It is part of a parallel release of fastutil, the DSI Utilities, Sux4J, MG4J, WebGraph, etc. that were all modified to fit the new interface, and that prepare the way for our "big" versions, that is, supporting >2^31 entries in arrays (simulated), elements in lists, terms, documents, nodes, etc. Please read our (short) "Moving Java to Big Data" document (JavaBig.pdf) for details.
- We now require Java 6.
- it.unimi.dsi.util.LongBigList is dead. Long live it.unimi.dsi.fastutil.longs.LongBigList. We're sorry for the nuisance--adapting the code should be very easy (and we warned you anyway :).
- New it.unimi.dsi.big namespace for classes that need redesign to handle collections with more than Integer.MAX_VALUE elements. Presently we provide an adaptation of all classes built around StringMap. The classes have the same names as their non-big counterparts, so watch out.
- Now you can directly save a synchronised ImmutableExternalPrefixMap.
- Instances of ImmutableExternalPrefixMap now contain a small cache that stores the most recent answers.
- Transformation strategies now have a length method that computes the length of a representation (possibly without actually computing the representation).
- New method BitVector.equals(v,from,to) that compares segments of two bit vectors.
- A ProgressLogger will now print the time per item (besides the number of items per time unit). Some patterns compute the correct singular from the plural item name.
- The automatic Log4J configuration system will now log on stderr instead of stdout.
- Fixed very old bug of Interval: the implemented sorted set was returning iterators with an off-by-one starting point.
- Fixed very old bug in OutputBitStream.position() for wrapped output streams. Thanks to Soumen Chakrabarti for reporting this bug.

1.0.12
- The DSI utilities are now distributed under the GNU Lesser General Public License 3.
- We are no longer dependent on COLT, thanks to the new sorting stuff in fastutil (which, in fact, is partially borrowed from COLT) and java.security.SecureRandom.
- New IntHyperLogLogCounterArray that computes estimates of the number of unique elements in streams using HyperLogLog counters.
- LongArrayBitVector now checks that its length does not exceed 2^37.
- Now ProgressLogger makes it possible to update with a given value, or to update and force display. Moreover, times per time interval are scaled so that slow processes get items per minute, hour or day.
- New Util methods to manipulate and generate permutations.
- New Util.randomSeed()/Util.randomSeedBytes() methods that generate a reasonable seed using System.nanoTime() and MurmurHash3.
- We now check that the buffer of a FastBufferedReader is of nonzero length.
- Fast flip() methods in LongArrayBitVector.

1.0.11
- BitVectors.readFast()/BitVectors.writeFast() now use more liberal interfaces.
- Equality in AbstractBitVector is computed quickly word by word, and not scanning bit by bit (ouch!).
- The testForPosition boolean flag in InputBitStream and OutputBitStream constructors now inhibits just the very slow reflective test for the existence of a getChannel() method. Conformance to the RepositionableStream interface is always checked (requested by Bryan Thompson).
- {Input,Output}BitStream.close() was throwing a null pointer exception in certain cases (thanks to Bryan Thompson for finding and fixing this bug).
- CanonicalFast64CodeWordDecoder now works with 0 or 1 symbols (thanks to Bryan Thompson for pointing out this problem).
- IntBloomFilter was incorrectly initialising the field m (thanks to David Greenspan for finding and fixing this bug).
- NullOutputStream is now a RepositionableStream. In this way, the creation of an OutputBitStream wrapping a NullOutputStream is much quicker.
- Fixed integer overflow bugs in LongArrayBitVector.wrap() and elsewhere.

1.0.10
- Fixed return type of writeLongZeta to int.
- New DelimitedWordReader class for tokenising streams by delimiters (rather than by word constituents).
- Fixed a small bug in ObjectParser: string arguments were trimmed even if delimited by quotes.
- ObjectParser will now accept null contexts.
- FastBufferedReader/DelimitedWordReader have toString()/toSpec() methods.
- New MutableString.skipSelfDelimUTF8() method.

1.0.9
- New ByteBufferInputStream.map() method that will map any file into a memory-mapped array of ByteBuffers and expose them as a measurable, repositionable InputStream. Very useful to work around the 2GiB limit of FileChannel.map().
- HuTuckerCodec can now get frequencies as a vector of longs. HuTuckerTransformationStrategy uses the new constructor.
- Code in replace() and clear() methods was not using the fact that unused parts of a LongArrayBitVector are maintained clear. This caused a significant slowdown when replacing or clearing vectors with a large backing array.
- New OfflineIterable class that makes it possible to quickly dump and reread temporary data.

1.0.8
- InputBitStream and OutputBitStream have new constructors that make it possible to skip the reflective tests that are necessary to support position(). This feature was requested by Bryan Thompson, who often had to create a large number of bit streams. Additionally, there is a constructor accepting a file input/output stream that invokes getChannel() without using reflection.
- New Util.getDebugLogger() method for getting more detailed debugging programmatically (e.g., when testing). More generally, now it is possible to configure the log level when autoconfiguring Log4J.
- BitVector.hashCode() has been finally nailed down and documented. The chosen function seems to be reasonable and quick. Sorry if some serialised classes out there depended on it...
- Fixed old bug in ImmutableBinaryTrie: a root with a non-empty compacted path would have returned an incorrect approximated interval on prefixes of that path. Thanks to Peni Nissani for finding this bug.
- Slight improvement to Fast.select(), which was using a few redundant instructions when doing broadword comparisons.

1.0.7
- Fixed obnoxious bug in LongArrayBitVector.replace(BitVector). In some cases bit vectors with garbage after length could have been generated.
- Fixed bug in LongArrayBitVector.longestCommonPrefixLength(). When comparing a string to one of its prefixes the returned length was sometimes incorrect.
- New methods BitVector.firstZero()/BitVector.lastZero() with fast implementations.
- New methods in BitVectors that dump on disk and expose as an iterable object the bit vectors returned by an iterator.

1.0.6
- New transformation strategy that remaps bit vectors so that they are prefix free.
- Fixed a horrendous bug: AbstractBitVector was implementing equals(), but not hashCode(). This fact was causing a number of problems with collections. Since I had to fix it anyway, I built in a better hash function, too.

1.0.5
- WARNING: ObjectParser now throws more accurate exceptions. As an unpleasant side effect, it throws more checked exceptions than it used to, so your code might not compile.
- New (obviously missing) interval comparators in Intervals.
- Fixed absolutely stupid bug in CanonicalFast64CodeWordDecoder: the size of the lastCodeWordPlusOne array would have been the number of codewords, and not the number of codeword lengths. The resulting array was an order of magnitude larger than necessary, which wasn't properly a bug, but very annoying nonetheless (in particular when you generate thousands of decoders, as we do for our large time-aware graphs).
- New optional context object in ObjectParser.

1.0.4
- FastBufferedReader now has a more flexible approach to word segmentation.

1.0.3
- ShiftAddXorSignedStringMap moved here from Sux4J. It can sign any given function (usually one from Sux4J).
- Fixed bug in LongArrayBitVector.length(long) (the underlying long array was not cleared properly).
- Optimised implementations for LongArrayBitVector.{fill,flip}().
- Fixed very stupid but pernicious bug in ImmutableExternalPrefixMap, which was actually a bug in AbstractPrefixMap: the default return value of the underlying function should have been set to -1.

1.0.2
- Fixed very old bug in MutableString.print(PrintStream).
- New methods in BitVector. In particular, fast().
- Fixed bug in LongArrayBitVector.replace(LongArrayBitVector).
- Fixed bug in LongArrayBitVector.copy(LongArrayBitVector).
- New ISO-8859-1 transformation strategy.
- Double (CharSequence/MutableString) implementation for UTF16/ISO-8859-1 transformation strategies.
- Several performance enhancements to bit vector classes.
- A series of wrappers in TransformationStrategies apply transforms to all the elements of collection-like objects.

1.0.1
- New method BitVector.append(BitVector).
- PrefixCoderTransformationStrategy and UTF16TransformationStrategy are now thread-safe, as they return a new instance at each call. They now both support prefix-free and non-prefix-free encodings.
- UTF16TransformationStrategy has now been replaced by two singleton instances in TransformationStrategies.
- Significantly optimised LongArrayBitVector.getLong().

1.0
- First release.