Java - BitSet Replacement

In my code I am trying to check whether a number exists in a HashMap or not. My code is the following:
BitSet arp = new BitSet();
for (int i = 0; i < 10000000; i++) {
    if (map.get(i) != null) {
        arp.set(i, true);
    } else {
        arp.set(i, false);
    }
}
After that, I use the BitSet to check whether number i exists or not. However, I found this BitSet operation quite slow (I also tried string = string + 0/1, which was even slower). Can anybody help me replace this operation with a faster one?

Your code is really difficult to read, but I suspect you're just trying to set the bits in the BitSet that are keys in your HashMap?
In that case, your code should just be more or less:
BitSet bits = new BitSet(10000000);
for (Integer k : map.keySet()) {
    bits.set(k);
}
Even if this wasn't what you meant, as a general rule, BitSet is blazing fast; I suspect it's the rest of your code that's slow.

If you had provided your actual code, we could have looked for the performance errors in the first place. But assuming your code is OK and you have profiled your application to make sure that the BitSet operations are actually the slow part:
If you have enough memory available, you can always just go for a boolean[] instead of a BitSet.
BitSet internally uses long[] to store the separate bits, so it's very good memory-wise, but can sometimes be a little bit too slow.
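For instance, a minimal sketch of the boolean[] variant, assuming keys range from 0 to just under 10 million as in the question (the map variable is illustrative):

boolean[] present = new boolean[10000000];
for (Integer k : map.keySet()) {
    present[k] = true;  // mark every number that exists in the map
}
// present[i] now answers "does i exist?" with a plain array read

A boolean[] typically spends a whole byte per flag, so it is about eight times larger than a BitSet's packed long[], but each access avoids the shift-and-mask work.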


Is this way of generating a SecureRandom BigInteger secure?

So I tried this line of code in Java, which generates a random integer that is 40 bytes long. I have no clue if it's secure, and I wondered if anyone with a little more experience than me could explain.
I would like to know if this is cryptographically secure, meaning: is this a secure way of generating a random number that's a BigInteger? If it isn't secure, what would be a good way to generate a fully cryptographically random BigInteger?
SecureRandom random = new SecureRandom();
BigInteger key_limit = new BigInteger("10000000000000000000000000000000000000000");
int key_length = key_limit.bitLength();
BigInteger key_1 = new BigInteger(key_length, random);
You're rolling your own crypto.
Be prepared to fail. The odds that the code you end up writing will actually be secure are infinitesimal. It is very, very, very easy to make a mistake, and these mistakes are almost always extremely hard to test for. For example, your algorithm may leak information based on how long it takes to process different inputs, thus letting an attacker figure out the key in a matter of hours. Did you plan on writing a test that checks that all attempts to decode anything - be it the actual ciphertext, mangled ciphertext, half of the ciphertext, input specifically crafted to derive key info by measuring processing time, or random gobbledygook - take exactly equally long? Do you even know what kinds of crafted inputs you need to test for?
On the topic of timing attacks, specifically, once you write BigInteger, you've almost certainly lost the game. It's virtually impossible to write an algorithm based on BI that is impervious to timing attacks.
An expert would keep all key and crypto algorithm intermediates in byte[] form.
So, you're doing it wrong. Do not roll your own crypto, you'll mess it up. Use existing algorithms.
If you really, really, really want to go down this road, you need to learn, a lot, before you even start. Begin by analysing a ton of existing implementations. Try to grok every line, try to grok every move. For example, a password hash checking algorithm might contain this code:
public boolean isEqual(byte[] a, byte[] b) {
    if (a.length != b.length) throw new IllegalArgumentException("mismatched lengths");
    int len = a.length;
    boolean pass = true;
    for (int i = 0; i < len; i++) {
        // Deliberately no early exit: every byte is compared, so the running
        // time does not depend on where the first mismatch occurs.
        if (a[i] != b[i]) pass = false;
    }
    return pass;
}
and you may simply conclude: "Eh. Weird. I guess they copied it from C or something, or they just didn't know they could have removed that method entirely and replaced it with java.util.Arrays.equals(a, b). Oh well, it doesn't matter."
and you would be wrong - that's what I mean by "understand it all". Assume no mistakes were made. Arrays.equals can be timing-attacked: the amount of time it takes to run tells you something - the earlier the mismatch, the faster it returns. The method above takes the same time regardless, but it only "works" if the two inputs are equal in length, so it throws instead of returning the seemingly obvious false when they aren't.
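As an aside, the JDK itself ships a comparison written with this attack in mind: java.security.MessageDigest.isEqual(byte[], byte[]), which in current JDKs examines all bytes rather than returning at the first mismatch. A one-liner you could study against the hand-rolled loop above:

boolean equal = java.security.MessageDigest.isEqual(a, b);

(Note it returns false on mismatched lengths rather than throwing, so its behaviour around lengths differs slightly from the snippet above.)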
If you spend that much time analysing them all, you'll have covered this question a few times over.
So, with all that context:
This answer is a bazooka. You WILL blow your foot off. You do not want to write this code. You do not want to do what you are trying to do. BigInteger is the wrong approach.
new BigInteger(8 * 40, secureRandom); will get the job done properly: it generates a random number between 0 and 2^320-1, inclusive - precisely 40 bytes' worth. No more, no less.
Alternatively, 40 bytes' worth of randomness can be generated directly:
byte[] key = new byte[40];
secureRandom.nextBytes(key);
But this is, really, still a grave error unless you really, really, really know what you are doing (try to find an existing implementation from a reputable author, or one that has been reviewed by an expert).
You will get a BigInteger containing a securely generated random number that way.
However, that method for calculating the bit length is (to say the least) odd. I don't know about you, but most programmers would find it difficult to work out how many zeros there are in that string. Then, the computation gives you a bit count such that 2^(bits-1) is less than or equal to the number.
It would make a lot more sense (to me) to just specify a bit count directly in the code, and add a comment to explain it.
To a first approximation [1], 2^(10*N) is 1000^N. However, the former is slightly greater than the latter. That means a bit length computed this way from a round decimal limit overshoots slightly: new BigInteger(key_length, random) can return values at or above key_limit, so the key can end up one decimal digit longer than intended.
[1] Experienced programmers remember that ... and inexperienced programmers can use a programmer's calculator.
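To make the mismatch concrete, a small check you can run (the class name is mine; the 133 is what bitLength() actually returns for 10^40):

import java.math.BigInteger;
import java.security.SecureRandom;

class BitLengthCheck {
    public static void main(String[] args) {
        BigInteger keyLimit = BigInteger.TEN.pow(40); // 1 followed by 40 zeros
        System.out.println(keyLimit.bitLength());     // prints 133 - about 17 bytes, not 40
        // new BigInteger(133, random) is uniform over [0, 2^133 - 1], and since
        // 2^133 > 10^40, the result can also exceed the decimal limit itself.
        BigInteger key = new BigInteger(keyLimit.bitLength(), new SecureRandom());
        System.out.println(key);
    }
}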

Efficiently Storing a Short History of Boolean Events for many Components

To preface this - I have no influence on the design of this problem and I can't really give a lot of details about the technical background.
Say I have a lot of components of the same type that regularly get a boolean event - and I need to hold a short history of these boolean events.
A coworker of mine wrote a rather naive implementation using the type Map<Component, CircularFifoQueue<Boolean>>, CircularFifoQueue being a data structure from Apache Commons Collections. The code works, but given how generics work in Java and the dimensions involved, this is really inefficient: each entry stores a reference to one of the two Boolean singleton objects instead of just one bit.
Generally there are around 100K components, and the history is supposed to hold the 5-10 most recent boolean values (this might be subject to change but probably won't grow beyond 10). This currently means that around 1.5GB of RAM is allocated just for these history maps. These updates also happen quite frequently, so it wouldn't hurt to improve the CPU efficiency if possible.
One obvious change would be to move the history into the Component class to remove the HashMap-induced overhead.
The more complicated question is how to efficiently store the last few boolean values.
One possible way would be to use BitSets, but as those use long[] as their underlying data structure, I doubt it would be the most efficient way to store what is essentially 5 bits.
Another option would be to use an integer directly and shift the value as a way to remove old entries. So basically:

int history = 0;

public void set(int length, boolean active) {
    if (active) {
        history |= 1 << length;    // set the newest entry at position 'length'
    } else {
        history &= ~(1 << length); // clear the newest entry
    }
    // shift one to the right to drop the oldest entry
    history = history >> 1;
}

Just off the top of my head - this code is untested. I don't know how efficient it is, or whether it even works, but that is about what I had in mind.
But that would still carry quite some overhead compared to the optimal case of storing 5 bits of data in 5 bits of memory.
One could achieve additional savings if the histories of the different components were stored in one contiguous array, but I'm not sure how to handle either one giant contiguous BitSet or, alternatively, a large byte[] where each byte represents one bool-history as explained above.
This is a weirdly specific problem and I'd be really glad about any suggestions.
Setting aside the bit manipulations, which I'm sure you'll conquer, please think about how efficient is efficient enough.
Every instance of
class Foo {}
allocates 16 bytes. So if you were to introduce
class ComponentHistory {
    private final int bits;
}
that's 20 bytes.
If you replace the int with a byte, you're still at 20 bytes: the byte field is padded to 4 bytes by the JVM (at least).
If you define a global array of bits somewhere and refer to it from ComponentHistory, the reference itself is at least 4 bytes.
Basically, you can't win :)
But consider this: if you go with the simplest approach you have already outlined, which produces simple, readable code, your 100K component histories will take up about 2MB of RAM - a substantial saving from your current level of 1.5GB. Specifically, you've saved 1498MB.
Suppose you indeed invent a cumbersome yet working way of storing only 5 bits per history. You'd then need 500 Kbit, about 60KB, to store all histories. Against the 1.5GB baseline, your savings are now 1499.94MB - an improvement of about 0.1% over the simple approach. Does that matter at all? More often than not, I'd prefer not to over-optimize here while sacrificing simplicity.
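A minimal sketch of that simple approach - one int per component, wrapped in the history class (the names and the fixed depth are illustrative, not from the post):

class ComponentHistory {
    private static final int DEPTH = 10; // history length discussed above
    private int bits;

    /** Record the newest event; the oldest one falls off the low end. */
    public void push(boolean active) {
        bits >>>= 1;                  // age every entry by one position
        if (active) {
            bits |= 1 << (DEPTH - 1); // the newest entry enters at the top
        }
    }

    /** age = 0 is the most recent event, age = DEPTH - 1 the oldest kept. */
    public boolean get(int age) {
        return (bits & (1 << (DEPTH - 1 - age))) != 0;
    }
}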

Implementing a very efficient bit structure

I'm looking for a solution, in pseudocode, Java or JS, for the following problem:
We need to implement an efficient bit structure to hold data for N bits (you could think of the bits as booleans as well, on/off).
We need to support the following methods:
init(n)
get(index)
set(index, True/False)
setAll(True/false)
Now I got to a solution that is O(1) for everything except init, which is O(n). The idea was to create an array where each index saves the value of a bit. In order to support setAll, I would also save a timestamp with the bit value, to know whether to take the value from the array or from the last setAll call. The O(n) in init is because we need to go through the array to zero it; otherwise it will contain garbage, which can be ANYTHING. Now I was asked to find a solution where init is also O(1) (we can create an array, but we can't clear the garbage; the garbage might even look like valid data, which is wrong and makes the solution bad - we need a solution that works 100%).
Update:
This is an algorithmic question, not a language-specific one. I encountered it in an interview. Also, using an integer to represent the bit array is not good enough because of memory limits. I was tipped that the answer has something to do with some kind of smart handling of the garbage data in the array without cleaning it in init - some mechanism that keeps the structure from failing because of the garbage data in the array (but I'm not sure how).
Make a lazy data structure based on a hashmap (though a hashmap can sometimes have worse access time than O(1)), with 32-bit values for storage (8-, 16- or 64-bit ints are suitable too) and an auxiliary field InitFlag.
To clear all, make an empty map with InitFlag = 0 (deleting the old map is the GC's work in Java, isn't it?)
To set all, make an empty map with InitFlag = 1
When changing some bit, check whether the corresponding key bitnum/32 exists. If yes, just change bit bitnum%32 of that value; if not, and the bit value differs from InitFlag, create the key with a value based on InitFlag (all zeros or all ones) and change the needed bit.
When retrieving some bit, check whether the corresponding key exists. If yes, extract the bit; if not, return the InitFlag value.
SetAll(0): ifl = 0, map = {}
SetBit(35): ifl = 0, map = {1 : 0x8}
SetBit(32): ifl = 0, map = {1 : 0x9}
ClearBit(32): ifl = 0, map = {1 : 0x8}
ClearBit(1): do nothing, ifl = 0, map = {1 : 0x8}
GetBit(1): key=0 doesn't exist, return ifl=0
GetBit(35): key=1 exists, return map[1]>>3 = 1
SetAll(1): ifl = 1, map = {}
SetBit(35): do nothing
ClearBit(35): ifl = 1, map = {1 : 0xFFFFFFF7 = 0b...11110111}
and so on
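A minimal Java sketch of this scheme, assuming 0-based bit indices (the class and field names are mine):

import java.util.HashMap;
import java.util.Map;

class LazyBitStructure {
    private Map<Integer, Integer> words = new HashMap<Integer, Integer>();
    private int initFlag; // value of every bit not present in the map

    public void setAll(boolean value) { // O(1): drop the map, remember the default
        words = new HashMap<Integer, Integer>();
        initFlag = value ? 1 : 0;
    }

    public void set(int index, boolean value) {
        int key = index / 32, bit = index % 32;
        Integer word = words.get(key);
        if (word == null) {
            if ((value ? 1 : 0) == initFlag) return; // already correct by default
            word = (initFlag == 1) ? ~0 : 0;         // materialize the word
        }
        words.put(key, value ? word | (1 << bit) : word & ~(1 << bit));
    }

    public boolean get(int index) {
        Integer word = words.get(index / 32);
        if (word == null) return initFlag == 1;
        return ((word >> (index % 32)) & 1) != 0;
    }
}

init(n) then falls out for free: construct the object and call setAll(false); no O(n) pass over the storage is needed.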
If this is a college/high-school computer science test or homework assignment question, I suspect they are trying to get you to use BOOLEAN BIT-WISE LOGIC - specifically, storing the bits inside an int or a long. I suspect (but I'm not a mind-reader - and I could be wrong!) that using "Arrays" is exactly what your teacher would want you to avoid.
For instance - this quote is copied from Google's search results:
long: The long data type is a 64-bit two's complement integer. The signed long has a minimum value of -2^63 and a maximum value of 2^63-1. In Java SE 8 and later, you can use the long data type to represent an unsigned 64-bit long, which has a minimum value of 0 and a maximum value of 2^64-1.
What that means is that a single long variable in Java can store 64 of your bit-wise values:

long storage = 0L;
// To read a bit-value, mask it with bitwise AND ('&') and test for non-zero.
boolean result1 = (storage & 0b00000001) != 0; // Gets the first bit in 'storage'
boolean result2 = (storage & 0b00000010) != 0; // Gets the second
boolean result3 = (storage & 0b00000100) != 0; // Gets the third
...
boolean result8 = (storage & 0b10000000) != 0; // Gets the eighth result
I could write the entire thing for you, but I'm not 100% sure of your actual specifications - if you use a long, you can only store 64 separate binary values. If you want an arbitrary number of values, you would have to use as many longs as you need.
Here is an SO post about binary / boolean values:
Binary representation in Java
Here is an SO post about bit-shifting:
Java - Circular shift using bitwise operations
Again, it would be a job, and I'm not going to write the entire project. However, the get(int index) and set(int index, boolean val) methods would involve bit-wise shifting of the number 1:

int pos = 1;
pos = pos << 5;                       // a mask 'pointing' at bit index 5 of the list
boolean isSet = (storage & pos) != 0; // retrieves the value stored at position 5
storage |= pos;                       // sets the bit at position 5
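Putting those pieces together, a minimal sketch of get/set over a single long (the class name is mine; indices are assumed to be 0-63):

class LongBits {
    private long storage;

    public boolean get(int index) {
        return (storage & (1L << index)) != 0; // read the bit at 'index'
    }

    public void set(int index, boolean val) {
        if (val) {
            storage |= 1L << index;            // switch the bit on
        } else {
            storage &= ~(1L << index);         // switch the bit off
        }
    }
}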

Why is Java HashMap slowing down?

I am trying to build a map from the contents of a file, and my code is as below:
System.out.println("begin to build the sns map....");
String basePath = PropertyReader.getProp("oldbasepath");
String pathname = basePath + "\\user_sns.txt";
FileReader fr;
Map<Integer, List<Integer>> snsMap =
new HashMap<Integer, List<Integer>>(2000000);
try {
fr = new FileReader(pathname);
BufferedReader br = new BufferedReader(fr);
String line;
int i = 1;
while ((line = br.readLine()) != null) {
System.out.println("line number: " + i);
i++;
String[] strs = line.split("\t");
int key = Integer.parseInt(strs[0]);
int value = Integer.parseInt(strs[1]);
List<Integer> list = snsMap.get(key);
//if the follower is not in the map
if(snsMap.get(key) == null)
list = new LinkedList<Integer>();
list.add(value);
snsMap.put(key, list);
System.out.println("map size: " + snsMap.size());
}
} catch (IOException e) {
e.printStackTrace();
}
System.out.println("finish building the sns map....");
return snsMap;
The program is very fast at first, but gets much slower when the printed information looks like this:
map size: 1138338
line number: 30923602
map size: 1138338
line number: 30923603
....
I tried to find the reason using the two System.out.println() calls, to judge the performance of the BufferedReader and the HashMap, rather than with a Java profiler. Sometimes it takes a while to get the map-size line after the line-number line, and sometimes it takes a while to get the line-number line after the map-size line. My question is: which makes my program slow? The BufferedReader on a big file, or the HashMap with a big map?
If you are testing this from inside Eclipse, you should be aware of the huge performance penalty of writing to stdout/stderr, due to Eclipse capturing that output in the Console view. Printing inside a tight loop is always a performance issue, even outside of Eclipse.
But if what you are complaining about is the slowdown experienced after processing 30 million lines, then I bet it's a memory issue. First it slows down due to intense GC'ing, and then it breaks with an OutOfMemoryError.
You will have to check your program with some profiling tool to understand why it is slow.
In general, file access is much slower than in-memory operations (unless you are constrained on memory and doing excessive GC), so the guess would be that reading the file is the slower part here.
Until you have profiled, you will not know what is slow and what isn't.
Most likely, the System.out calls will show up as the bottleneck, and you'll then have to profile without them again. System.out is the worst thing you can use for finding performance bottlenecks, because using it usually adds an even worse bottleneck.
An obvious optimization for your code is to move the line
snsMap.put(key, list);
into the if statement. You only need to put this when you have created a new list. Otherwise, the put will just replace the current value with itself.
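A sketch of the loop body with that change applied (and the duplicate lookup folded into one):

List<Integer> list = snsMap.get(key);
if (list == null) {
    // the follower is not in the map yet
    list = new LinkedList<Integer>();
    snsMap.put(key, list); // put only when the list is newly created
}
list.add(value);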
The Java cost associated with Integer objects (and in particular the use of Integer in the Java Collections API) is largely a memory (and thus garbage-collection!) issue. You can sometimes get significant gains by using primitive collections such as GNU Trove, depending on how well you can adjust your code to use them efficiently. Most of the gains of Trove are in memory usage. Definitely try rewriting your code to use TIntArrayList and TIntObjectMap from GNU Trove. I'd avoid linked lists, too, in particular for primitive types.
Roughly estimated, a HashMap<Integer, List<Integer>> needs at least 3*16 bytes per entry. The doubly linked list again needs at least 2*16 bytes per element stored. 1M keys + 30M values ~ 1 GB, and that's with no overhead included yet. With GNU Trove's TIntObjectHashMap<TIntArrayList> it should be 4+4+16 bytes per key and 4 bytes per value, so about 144 MB. The overhead is probably similar for both.
The reason that Trove uses less memory is because the types are specialized for primitive values such as int. They will store the int values directly, thus using 4 bytes to store each.
A Java collections HashMap consists of many objects. It roughly looks like this: there are Entry objects that point to a key and a value object each. These must be objects, because of the way generics are handled in Java. In your case, the key will be an Integer object, which uses 16 bytes (4 bytes mark, 4 bytes type, 4 bytes actual int value, 4 bytes padding) AFAIK. These are all 32 bit system estimates. So a single entry in the HashMap will probably need some 16 (entry) + 16 (Integer key) + 32 (yet empty LinkedList) bytes of memory that all need to be considered for garbage collection.
If you have lots of Integer objects, it just will take 4 times as much memory as if you could store everything using int primitives. This is the cost you pay for the clean OOP principles realized in Java.
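A hedged sketch of that Trove variant (it assumes the trove4j library is on the classpath; the loop mirrors the question's code):

import gnu.trove.list.array.TIntArrayList;
import gnu.trove.map.hash.TIntObjectHashMap;

TIntObjectHashMap<TIntArrayList> snsMap =
        new TIntObjectHashMap<TIntArrayList>(2000000);
// inside the read loop:
TIntArrayList list = snsMap.get(key);
if (list == null) {
    list = new TIntArrayList();
    snsMap.put(key, list);
}
list.add(value); // stores the int directly - no Integer boxing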
The best way is to run your program with a profiler (for example, JProfiler) and see which parts are slow. Debug output can also slow your program down, for example.
HashMap is not slow; in reality it's the fastest among the maps. Hashtable is a synchronized, thread-safe map, and can be slow sometimes.
Important note: close the BufferedReader and the FileReader after you have read the data... this might help.
E.g.:
br.close();
fr.close();
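Or, a sketch using try-with-resources (Java 7+), which closes the reader automatically even when an exception is thrown (closing the BufferedReader also closes the wrapped FileReader):

try (BufferedReader br = new BufferedReader(new FileReader(pathname))) {
    String line;
    while ((line = br.readLine()) != null) {
        // process the line as before
    }
} catch (IOException e) {
    e.printStackTrace();
}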
Please check your system processes in the task manager; there may be too many processes running in the background.
Sometimes Eclipse is really resource-heavy, so try running your program from the console to check.

Java BitSet which allows easy Concatenation of BitSets

I need a BitSet which allows easy concatenation of multiple BitSets to create a new BitSet. The default implementation doesn't have such a method.
Is there an implementation in some external library that any of you know of which allows easy concatenation?
For example, let's say I have a bit array 11111 and another bit array 010101. I want appending functionality, so after concatenating them the result would be 11111010101.
Well, there's no way to implement this terribly efficiently (performance- and memory-wise, that is), since there's no left-shift method.
What you can do is use the obvious nextSetBit for-loop - slow, but memory efficient.
The presumably faster method would be to call toLongArray on one set, copy that, correctly shifted, into a large enough long[], create a BitSet from it, and or it with the other. That way you don't do any bit-shifting on single bits but instead work on word-sized chunks.
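A sketch of that word-wise idea, under the assumption that the caller supplies the logical length of the first set, since BitSet itself doesn't track trailing zeros (the method name is mine):

import java.util.BitSet;

static BitSet concatenate(BitSet first, int firstLength, BitSet second) {
    long[] src = second.toLongArray();
    int wordShift = firstLength / 64;
    int bitShift = firstLength % 64;
    long[] dst = new long[wordShift + src.length + 1]; // room for the spill-over
    for (int i = 0; i < src.length; i++) {
        dst[wordShift + i] |= src[i] << bitShift;
        if (bitShift != 0) {
            // bits pushed out of this word land in the next one
            dst[wordShift + i + 1] |= src[i] >>> (64 - bitShift);
        }
    }
    BitSet result = BitSet.valueOf(dst);
    result.or(first); // the first set occupies bits 0..firstLength-1
    return result;
}

With the question's example, concatenate(first, 5, second) reproduces 11111010101 (reading from bit 0 upward).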
This worked for me:

BitSet concatenate_vectors(BitSet vector_1_in, BitSet vector_2_in) {
    BitSet vector_1_in_clone = (BitSet) vector_1_in.clone();
    BitSet vector_2_in_clone = (BitSet) vector_2_in.clone();
    int n = 5; // desired length of the first (leading) vector
    int index = -1;
    // copy each set bit of the second vector into the first, shifted up by n
    while (index < (vector_2_in_clone.length() - 1)) {
        index = vector_2_in_clone.nextSetBit(index + 1);
        vector_1_in_clone.set(index + n);
    }
    return vector_1_in_clone;
}
Result: 11111010101
