The DSI utilities are a mishmash of classes accumulated during the last ten years in projects developed at the former DSI (Dipartimento di Scienze dell'Informazione, i.e., Information Sciences Department), now DI (Dipartimento di Informatica, i.e., Informatics Department), of the Università degli Studi di Milano. They were originally distributed across several projects (mainly in MG4J), but we finally decided to gather all the material in a single place.
The DSI utilities are distributed under the GNU Lesser General Public License.
The implementations available are a bit eclectic due to the particular kind of applications we developed. Very broadly, we have:
BitVector and its implementations, a high-performance but flexible set of bit vector classes.
it.unimi.dsi.compression package, containing codecs for several types of encodings.
ProgressLogger, a flexible logger with statistics marking the progress of the (many) classes we use that require hours of computation.
ObjectParser, a class making it easy to specify complex objects on the command line.
MutableString, our answer to the Java
I/O package, containing fast versions of several classes existing in java.io, many useful classes for easily reading text data (e.g., FileLinesCollection), bit streams, and classes providing large-size memory mapping.
OfflineIterable, the easy and fast way to store large sequences of objects on disk and iterate over them.
it.unimi.dsi.util package, containing pseudorandom number generators, tries, immutable prefix maps, Bloom filters, a very comfortable Properties class, and more.
it.unimi.dsi.stat package, containing a lightweight class for computing basic statistics and an arbitrary-precision implementation of the Jackknife method.
Util (have a look!)
BulletParser, which we use to parse HTML and XML.
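To give a feel for one of the data structures named in the list above, here is a minimal sketch of a Bloom filter written against the plain JDK only. It is not the implementation shipped in it.unimi.dsi.util and does not use its API: the class name TinyBloomFilter, its methods, and the double-hashing scheme built on String.hashCode() are all assumptions made for illustration.

```java
import java.util.BitSet;

/** Hypothetical illustration class; NOT part of the DSI utilities. */
public class TinyBloomFilter {
    private final BitSet bits;     // the underlying bit vector
    private final int numBits;     // size of the bit vector
    private final int numHashes;   // number of probes per element

    public TinyBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) throw new IllegalArgumentException();
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derives the i-th probe position from two base hashes (double hashing).
    // This is an assumed, deliberately simple scheme, not a tuned hash family.
    private int position(CharSequence s, int i) {
        int h1 = s.toString().hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd, so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    public void add(CharSequence s) {
        for (int i = 0; i < numHashes; i++) bits.set(position(s, i));
    }

    /** May return true for an element never added (false positive),
     *  but never returns false for an element that was added. */
    public boolean mightContain(CharSequence s) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(s, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloomFilter f = new TinyBloomFilter(1 << 16, 4);
        f.add("mg4j");
        f.add("dsiutils");
        System.out.println(f.mightContain("mg4j"));     // prints true
        System.out.println(f.mightContain("dsiutils")); // prints true
    }
}
```

The sketch trades exactness for space: membership queries on added elements always succeed, while queries on other elements may occasionally succeed too, with a rate that depends on the chosen number of bits and probes.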
I/O big classes.
Collections and similar big classes.
Main classes manipulating bits.
Word-based compression/decompression classes.
A fast, lightweight, on-demand (X)HTML parser.
Callbacks for the
Miscellaneous utility classes.
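Several of the packages above deal with bit-level manipulation and compression. As a rough illustration of what a variable-length integer code looks like, the following sketch implements the textbook Elias gamma code on strings of '0' and '1' characters, using only the JDK. The class GammaSketch and its methods are hypothetical; the conventions actually used by the bit stream and codec classes in this library (for instance, the range of codable values or the bit order) may differ.

```java
/** Hypothetical illustration class; NOT part of the DSI utilities. */
public class GammaSketch {

    /** Encodes x >= 1 as (length - 1) zeros followed by x in binary. */
    public static String encode(int x) {
        if (x < 1) throw new IllegalArgumentException("x must be >= 1");
        String binary = Integer.toBinaryString(x); // always starts with '1'
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < binary.length(); i++) sb.append('0');
        return sb.append(binary).toString();
    }

    /** Decodes the single codeword that starts at offset 0 of the bit string;
     *  any bits after that codeword are ignored. */
    public static int decode(String bits) {
        int zeros = 0;
        while (bits.charAt(zeros) == '0') zeros++;  // count the zero prefix
        // The next zeros + 1 bits are the binary representation of the value.
        return Integer.parseInt(bits.substring(zeros, 2 * zeros + 1), 2);
    }

    public static void main(String[] args) {
        for (int x : new int[] { 1, 2, 5, 9 }) {
            System.out.println(x + " -> " + encode(x));
        }
        // prints:
        // 1 -> 1
        // 2 -> 010
        // 5 -> 00101
        // 9 -> 0001001
    }
}
```

Small values get short codewords, which is why codes of this family are a common building block for compressed integer sequences. A real implementation would write to a packed bit stream rather than a string.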