Alternative to Java BitSet with array-like performance? - java

I am looking for an alternative to Java's BitSet implementation. I am implementing a high-performance algorithm, and it seems like using a BitSet object is killing its performance. Any ideas?

Someone here has compared boolean[] to BitSet and concluded:
BitSet is more memory efficient than boolean[] except for very
small sizes. Each boolean in the array takes a byte. The numbers
from Runtime.freeMemory() are a bit muddled for BitSet, but it clearly uses less.
boolean[] is more CPU efficient except for very large sizes, where
they are about even. E.g., for a size of 1 million, boolean[] is about
four times faster (e.g. 6 ms vs 27 ms); for ten and a hundred million
they are about even.
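If you want to sanity-check those numbers on your own hardware, a crude sketch like the following will do (for anything rigorous, use a benchmark harness such as JMH):

import java.util.BitSet;

public class BooleanVsBitSet {
    public static void main(String[] args) {
        int n = 1_000_000;

        long t0 = System.nanoTime();
        boolean[] flags = new boolean[n];
        for (int i = 0; i < n; i += 3) flags[i] = true;   // set every third flag
        long count = 0;
        for (int i = 0; i < n; i++) if (flags[i]) count++;
        long t1 = System.nanoTime();

        BitSet bits = new BitSet(n);
        for (int i = 0; i < n; i += 3) bits.set(i);
        long cardinality = bits.cardinality();
        long t2 = System.nanoTime();

        System.out.printf("boolean[]: %d set, %d ms%n", count, (t1 - t0) / 1_000_000);
        System.out.printf("BitSet:    %d set, %d ms%n", cardinality, (t2 - t1) / 1_000_000);
    }
}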
If you Google, you can find some alternative implementations as well, like JavaEWAH, used by Apache Hive, Apache Spark and Eclipse JGit. It claims:
The goal of word-aligned compression is not to achieve the best
compression, but rather to improve query processing time. Hence, we
try to save CPU cycles, maybe at the expense of storage. However, the
EWAH scheme we implemented is always more efficient storage-wise than
an uncompressed bitmap as implemented in the BitSet class. Unlike
some alternatives, JavaEWAH does not rely on a patented scheme.
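For illustration, basic JavaEWAH usage looks roughly like this (class and method names as I recall them from the JavaEWAH README, so verify against the version you pull in):

import com.googlecode.javaewah.EWAHCompressedBitmap;

public class EwahExample {
    public static void main(String[] args) {
        // bitmapOf builds a compressed bitmap from sorted bit positions
        EWAHCompressedBitmap a = EWAHCompressedBitmap.bitmapOf(0, 2, 64, 1 << 20);
        EWAHCompressedBitmap b = EWAHCompressedBitmap.bitmapOf(1, 2, 1 << 20);

        // and/or return new compressed bitmaps without expanding to a plain BitSet
        EWAHCompressedBitmap intersection = a.and(b);
        EWAHCompressedBitmap union = a.or(b);

        System.out.println("intersection cardinality: " + intersection.cardinality());
        System.out.println("union cardinality: " + union.cardinality());
    }
}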

While searching for an answer to my question "single byte comparison vs multiple boolean comparison", I found OpenBitSet.
They claim to be faster than Java's BitSet and to give direct access to the array of words storing the bits.
I am definitely going to try that. See if it solves your purpose too.
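For what it's worth, OpenBitSet usage looks roughly like this (method names from the Lucene utility class as I remember them; double-check against your Lucene version):

import org.apache.lucene.util.OpenBitSet;

public class OpenBitSetExample {
    public static void main(String[] args) {
        OpenBitSet bits = new OpenBitSet(1_000_000);
        bits.fastSet(42);                 // no growth check; caller guarantees capacity
        bits.fastSet(99_999);

        long[] words = bits.getBits();    // direct access to the backing long[] words
        System.out.println("words allocated: " + words.length);
        System.out.println("cardinality: " + bits.cardinality());
    }
}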

Look at Javolution's FastBitSet:
A high-performance bitset integrated with the collection framework as a set of indices, obeying the collection semantics for methods such as FastSet.size() (cardinality) or FastCollection.equals(java.lang.Object) (same set of indices).
See also http://code.google.com/p/guava-libraries/issues/detail?id=724#c3.

If you really must squeeze the maximum performance out of this thing, and if memory does not matter, you can try storing each of your flags in an integer whose bit width equals the width of your CPU's data bus.
You are probably on a CPU with a 64-bit data bus, so try long integers.
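A minimal sketch of that idea, assuming a simple flag store where every flag gets its own word-sized long (memory-hungry, but each access is a plain array read or write with no masking or shifting):

public class WordSizedFlags {
    // one whole long per flag: wastes memory, trades it for word-aligned accesses
    private final long[] flags;

    public WordSizedFlags(int nFlags) {
        flags = new long[nFlags];
    }

    public void set(int i, boolean value) { flags[i] = value ? 1L : 0L; }
    public boolean get(int i)             { return flags[i] != 0L; }
}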

There are a number of compressed alternatives to the BitSet class. EWAH was already mentioned (https://github.com/lemire/javaewah). More recent additions include Roaring bitmaps (https://github.com/RoaringBitmap/RoaringBitmap) that are used by Apache Lucene, Apache Spark, Elastic Search, and so forth.
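For illustration, basic RoaringBitmap usage looks something like this (API names per the RoaringBitmap README; verify against your version):

import org.roaringbitmap.RoaringBitmap;

public class RoaringExample {
    public static void main(String[] args) {
        RoaringBitmap a = RoaringBitmap.bitmapOf(1, 2, 3, 1_000_000);
        RoaringBitmap b = RoaringBitmap.bitmapOf(3, 4, 1_000_000);

        RoaringBitmap and = RoaringBitmap.and(a, b);   // compressed intersection
        RoaringBitmap or  = RoaringBitmap.or(a, b);    // compressed union

        System.out.println(and.getCardinality());      // 2
        System.out.println(or.contains(4));            // true
    }
}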

Related

How to create efficient bit set structure for big data?

Java's BitSet is held entirely in memory and has no compression.
Say I have 1 billion entries in a bit map - 125 MB is occupied in memory.
Say I have to do AND and OR operations on 10 such bit maps; that takes 1250 MB, or about 1.3 GB, of memory, which is unacceptable.
How to do fast operations on such bit maps without holding them uncompressed in memory?
I do not know the distribution of the bits in the bit set.
I have also looked at JavaEWAH, which is a variant of the Java BitSet class, using run-length encoding (RLE) compression.
Is there any better solution?
One solution is to keep the arrays off the heap.
You'll want to read this answer by @PeterLawrey to a related question.
In summary, the performance of memory-mapped files in Java is quite good, and it avoids keeping huge collections of objects on the heap.
The operating system may limit the size of an individual memory-mapped region. It's easy to work around this limitation by mapping multiple regions. If the regions are a fixed size, simple binary operations on the entity's index can be used to find the corresponding memory-mapped region in the list of memory-mapped files.
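A bare-bones sketch of keeping the bit array off-heap in a memory-mapped file, using only standard java.nio (the file name and sizes are invented for illustration):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedBits {
    public static void main(String[] args) throws IOException {
        long nBits = 1_000_000_000L;            // 1 billion bits
        long nBytes = (nBits + 7) / 8;          // = 125 MB

        try (RandomAccessFile raf = new RandomAccessFile("bits.dat", "rw");
             FileChannel ch = raf.getChannel()) {
            // one region here; for larger data you would map several fixed-size regions
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, nBytes);

            long bit = 123_456_789L;
            int byteIndex = (int) (bit >>> 3);
            map.put(byteIndex, (byte) (map.get(byteIndex) | (1 << (bit & 7))));   // set bit
            boolean isSet = (map.get(byteIndex) & (1 << (bit & 7))) != 0;         // test bit
            System.out.println(isSet);
        }
    }
}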
Are you sure you need compression? Compression trades time for space. It's possible that the reduced I/O ends up saving you time, but it's also possible that it won't. Can you add an SSD?
If you haven't yet tried memory-mapped files, start with that. I'd take a close look at implementing something on top of Peter's Chronicle.
If you need more speed you could try doing your binary operations in parallel.
If you end up needing compression you could always implement it on top of Chronicle's memory mapped arrays.
From the comments here, what I would say as a complement to your initial question:
the bit field distribution is unknown, so BitSet is probably the best we can use
you have to use the bit fields in different modules and want to cache them
That being said, my advice would be to implement a dedicated cache solution, using a LinkedHashMap with access order if LRU is an acceptable eviction strategy, and keeping permanent storage on disk for the BitSets.
Pseudo code:
class BitSetHolder {

    private final int size;                          // maximum number of BitSets kept in memory
    private final BitSetCache bitSetCache = new BitSetCache();

    BitSetHolder(int size) {
        this.size = size;
    }

    class BitSetCache extends LinkedHashMap<Integer, BitSet> {
        BitSetCache() {
            super(16, 0.75f, true);                  // access order ...
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, BitSet> eldest) {
            return size() > BitSetHolder.this.size;  // size is known in BitSetHolder
        }
    }

    BitSet get(int i) {                              // get from cache if present, otherwise from disk
        BitSet bitSet = bitSetCache.get(i);
        if (bitSet == null) {
            bitSet = readFromDisk(i);                // if not in cache, load it and put it in cache
            bitSetCache.put(i, bitSet);
        }
        return bitSet;
    }

    BitSet readFromDisk(int i) {
        ...                                          // load the serialized BitSet for index i
    }
}
That way:
you have transparent access to your 10 bit sets
you keep the most recently accessed bit sets in memory
you limit the memory used to the size of the cache (the minimum size should be 3 if you want to create a bit set by combining 2 others)
If this is an option for your requirements, I could develop it a little more. Anyway, this is adaptable to other eviction strategies, LRU being the simplest as it is native to LinkedHashMap.
The best solution depends a great deal on the usage patterns and structure of the data.
If your data has some structure beyond a raw bit blob, you might be able to do better with a different data structure. For example, a word list can be represented very efficiently in both space and lookup time using a DAG.
Sample Directed Graph and Topological Sort Code
BitSet is internally represented as a long[], which makes it slightly more difficult to refactor. If you grab the source out of OpenJDK, you'd want to rewrite it so that internally it uses iterators, backed by either files or in-memory compressed blobs. However, you would have to rewrite all the loops in BitSet to use iterators, so the entire blob never has to be instantiated.
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/BitSet.java
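As a small illustration of iterating without touching the whole blob, java.util.BitSet already exposes an iterator-style idiom over just the set bits:

import java.util.BitSet;

public class SetBitIteration {
    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(3);
        bits.set(1_000_000);

        // standard idiom from the BitSet javadoc: visit only the set bits
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            System.out.println("set bit at " + i);
            if (i == Integer.MAX_VALUE) break;   // avoid overflow on i + 1
        }
    }
}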

Are bit set operations really faster than sorted set operations?

I am looking around for the best algorithms for bitset operations like intersection and union, and have found a lot of links and similar questions as well.
E.g.: Similar Question on Stack Overflow
One thing I am trying to understand, however, is where bit sets stand in this. E.g., Lucene has adopted BitSet operations to provide high-performing set operations, especially because it can work at a lower level.
However, it looks to me as if the bit set will perform slower and slower as the maximum number of elements grows while the set stays sparse, say a set of ~10 elements where the maximum number of elements can be 2 billion, because that will call for a lot of unnecessary matching. What do you suggest?
Bit sets indeed make sense for dense sets, i.e. sets covering a significant fraction of the domain, as they represent every possible element. The space and running time requirements are O(D) [D = domain size = 2 billion!].
Sorted set operations represent only the elements in the given set and have O(E) behavior [E = number of elements = 10], which is much more appropriate here.
Bit sets are fast rather than efficient: their hidden constant is smaller. They are blazingly fast for small domains (say D <= 1024) as they can process 32/64 elements in a single CPU instruction.
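To make the O(D) vs O(E) contrast concrete, here is a toy comparison (the sizes are invented purely for illustration):

import java.util.BitSet;
import java.util.TreeSet;

public class DenseVsSparse {
    public static void main(String[] args) {
        // Sparse case: ~10 elements spread over a huge domain
        TreeSet<Integer> a = new TreeSet<>();
        TreeSet<Integer> b = new TreeSet<>();
        for (int i = 0; i < 10; i++) { a.add(i * 100_000_000); b.add(i * 150_000_000); }
        a.retainAll(b);                          // work proportional to E (elements present)
        System.out.println("sorted-set intersection: " + a);

        // Dense case: BitSet intersection touches every word of the domain
        BitSet x = new BitSet(1 << 20);
        BitSet y = new BitSet(1 << 20);
        x.set(0, 1 << 19);
        y.set(1 << 18, 1 << 20);
        x.and(y);                                // work proportional to D (domain), 64 bits per op
        System.out.println("bitset intersection cardinality: " + x.cardinality());
    }
}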
For sparse bitsets you can greatly improve performance (and reduce memory usage) by using sparse bitmaps, where you divide your data into chunks instead of storing everything under a single key.
When using bitmaps for analytics, you have a limited number of users active at any given time (e.g. day) and sparse bitmaps use this fact to their advantage.
Shameless plug: http://github.com/bilus/redis-bitops (if you're using Ruby but there are also performance notes there).
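The linked project is Redis/Ruby oriented, but the chunking idea itself translates directly to Java; a hypothetical sketch (ChunkedBitmap and its chunk size are made up for illustration):

import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: only chunks that actually contain set bits are allocated
public class ChunkedBitmap {
    private static final int CHUNK_BITS = 1 << 16;        // 65,536 bits per chunk
    private final Map<Integer, BitSet> chunks = new HashMap<>();

    public void set(long bit) {
        int chunk = (int) (bit / CHUNK_BITS);
        chunks.computeIfAbsent(chunk, c -> new BitSet(CHUNK_BITS))
              .set((int) (bit % CHUNK_BITS));
    }

    public boolean get(long bit) {
        BitSet c = chunks.get((int) (bit / CHUNK_BITS));
        return c != null && c.get((int) (bit % CHUNK_BITS));
    }
}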

Huge binary matrix (logical AND on array of bitsets) Java performance

We have a Java service that computes some logical operations on a huge binary matrix (10,000 x 10,000). This matrix is an array of bitsets. The most important operation is an intersection (logical AND) between a given bitset and each bitset in the array. We are using OpenBitSet and it shows quite good results (at least better than java.util.BitSet). Data sparsity is moderate (there could be many 0s or 1s in a row), and the bitset size is fixed.
The most important thing for us is fast response times (for now ~0.05 s), so we would like to find ways for further improvement as the matrix and the number of requests grow. There could be some algebraic methods or faster libraries for that.
We tried javaewah, but this library performed operations 10x slower compared to OpenBitSet. There is a comparison on the project's page showing that other bitset-compression libraries are slower than Java's BitSet.
Could you suggest some other methods or new ideas?
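For context, the per-request work boils down to a word-level AND over the rows' backing long[] words; a hand-rolled baseline like the following (not the OpenBitSet API, just the underlying loop) is a useful reference point when comparing libraries:

class MatrixAnd {
    // AND one query bitset against every row of the matrix, writing results into out
    // rows[r], query and out[r] are long[] of identical length (the bitset size is fixed)
    static void intersectAll(long[][] rows, long[] query, long[][] out) {
        for (int r = 0; r < rows.length; r++) {
            long[] row = rows[r];
            long[] dst = out[r];
            for (int w = 0; w < query.length; w++) {
                dst[w] = row[w] & query[w];   // 64 bits per AND operation
            }
        }
    }
}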
In my recent blog I discussed a "yet another" bitset implementation - with source code. Maybe you want to give it a try: http://www.censhare.com/en/aktuelles/censhare-labs/yet-another-compressed-bitset
If you don't mind using a client-server solution, Pilosa would be perfect for your use case:
bindings for Java, Python, Go
groupBy support
time range support
huge matrix support
uses the high-performance RoaringBitmap
scales horizontally
Helm chart: https://github.com/pilosa/helm

Are there any strategies (or libraries) for performant varints and varlongs in Java?

Currently, I am serializing some long data using DataOutput.writeLong(long). The issue with this is obvious: there are many, many cases where the longs will be quite small. I was wondering what the most performant varint implementation is. I've seen the strategy from Protocol Buffers, and testing on random long data (which probably isn't the right distribution to test against), I'm seeing a pretty big performance drop (about 3-4x slower). Is this to be expected? Are there any good strategies for serializing longs as quickly as possible while still saving space?
Thanks for your help!
How about using the standard DataOutput format for serialization and a generic compression algorithm such as GZIPOutputStream on top?
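That combination is plain java.io plus java.util.zip; for example (file name and sample values are made up):

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipLongs {
    public static void main(String[] args) throws IOException {
        long[] values = {1, 2, 3, 1_000, -5, 42};   // made-up sample data
        try (DataOutputStream out = new DataOutputStream(
                new GZIPOutputStream(new FileOutputStream("longs.gz")))) {
            out.writeInt(values.length);
            for (long v : values) {
                out.writeLong(v);    // fixed 8 bytes each; gzip squeezes out the redundancy
            }
        }
    }
}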
The protocol buffer encoding is actually pretty good, but it isn't helpful with random longs - it is mostly useful if your longs are likely to be small positive or negative numbers (let's say in the +/- 1000 range 95% of the time).
Numbers in this range will typically get encoded in 1, 2 or 3 bytes compared with 8 for a normal long. Try it with this sort of input on a large set of longs; you can often get 50-70% space savings.
Of course calculating this encoding has some performance overhead, but if you are using this for serialization then CPU time will not be your bottleneck anyway - so you can effectively ignore the encoding cost.
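For reference, the Protocol Buffers-style encoding discussed here is ZigZag plus a base-128 varint; a minimal sketch of the writer side (not the protobuf library itself, just the idea):

import java.io.DataOutput;
import java.io.IOException;

public class VarLong {
    // ZigZag maps small negative numbers to small unsigned numbers
    static long zigZag(long v) {
        return (v << 1) ^ (v >> 63);
    }

    // Base-128 varint: 7 payload bits per byte, high bit = "more bytes follow"
    static void writeVarLong(DataOutput out, long v) throws IOException {
        long u = zigZag(v);
        while ((u & ~0x7FL) != 0) {
            out.writeByte((int) ((u & 0x7F) | 0x80));
            u >>>= 7;
        }
        out.writeByte((int) u);
    }
}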

Smart buffering in an environment with a limited amount of memory - java

Dear StackOverflowers,
I am in the process of writing an application that sorts a huge number of integers from a binary file. I need to do it as quickly as possible, and the main performance issue is the disk access time; since I make a multitude of reads, it slows down the algorithm quite significantly.
The standard way of doing this would be to fill ~50% of the available memory with a buffered object of some sort (BufferedInputStream etc) then transfer the integers from the buffered object into an array of integers (which takes up the rest of free space) and sort the integers in the array. Save the sorted block back to disk, repeat the procedure until the whole file is split into sorted blocks and then merge the blocks together.
The strategy for sorting the blocks utilises only 50% of the memory available since the data is essentially duplicated (50% for the cache and 50% for the array while they store the same data).
I am hoping that I can optimise this phase of the algorithm (sorting the blocks) by writing my own buffered class that allows caching data straight into an int array, so that the array could take up all of the free space and not just 50% of it; this would reduce the number of disk accesses in this phase by a factor of 2. The thing is, I am not sure where to start.
EDIT:
Essentially I would like to find a way to fill up an array of integers by executing only one read on the file. Another constraint is that the array has to use most of the free memory.
If any of the statements I made are wrong, or at least seem to be, please correct me.
Any help appreciated,
Regards
When you say limited, how limited... <1 MB, <10 MB, <64 MB?
It makes a difference, since you won't actually get much benefit, if any, from having large BufferedInputStreams; in most cases the default value of 8192 (JDK 1.6) is enough, and increasing it doesn't usually make that much difference.
Using a smaller BufferedInputStream should leave you with nearly all of the heap to create and sort each chunk before writing them to disk.
You might want to look into the Java NIO libraries, specifically File Channels and Int Buffers.
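For example, a block read with standard NIO into an int[] looks roughly like this (the file name and block size are invented; note that the direct buffer still doubles the transient memory, so it is not a complete answer to the 50% problem):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class BulkIntRead {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("ints.bin"), StandardOpenOption.READ)) {
            int chunkInts = 32 * 1024 * 1024;                     // ~128 MB per block, tune to free heap
            ByteBuffer buf = ByteBuffer.allocateDirect(chunkInts * 4)
                                       .order(ByteOrder.BIG_ENDIAN);   // DataOutputStream byte order
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or the file ends (usually very few large reads)
            }
            buf.flip();

            int[] block = new int[buf.remaining() / 4];
            buf.asIntBuffer().get(block);                         // bulk copy straight into the int[]
            // sort 'block', write it back to disk, then repeat for the next block
        }
    }
}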
You don't give many hints. But two things come to my mind. First, if you have many integers but not that many distinct values, bucket sort could be the solution.
Secondly, one word (OK, term) screams in my head when I hear that: external tape sorting. In the early days of computing (i.e. the stone age), data lived on tapes, and it was very hard to sort data spread over multiple tapes. It is very similar to your situation. Indeed, merge sort was the most commonly used sort in those days, and as far as I remember, Knuth's TAOCP had a nice chapter about it. There might be some good hints about the size of caches, buffers and similar.
