In Effective Java there is an example of a Complex class. That class overrides hashCode, which uses a hashDouble method that I have a question about.
private int hashDouble(double val)
{
long longBits = Double.doubleToLongBits(val); // hash the parameter, not the field re
return (int) (longBits ^ (longBits >>> 32));
}
What is the purpose of (int) (longBits ^ (longBits >>> 32))?
The double value is 64 bits wide, but the int returned by the hash method has only 32 bits.
The XOR is there to achieve a better distribution of hash values (compared to simply stripping the upper 32 bits).
The code uses XOR to fold the upper 32 bits of the IEEE 754 value (containing the sign, the exponent and some bits of the mantissa) into the calculation, after aligning them with the lower 32 bits by right-shifting.
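To see the difference between folding and plain truncation, here is a minimal sketch. The class name FoldDemo and the truncate helper are made up for illustration; the comparison with Double.hashCode(double) assumes Java 8 or later, where that static method is documented to use the same XOR fold.

```java
final class FoldDemo {
    // Folds the 64-bit IEEE 754 pattern into 32 bits:
    // the upper half is shifted down and XORed into the lower half.
    static int hashDouble(double val) {
        long bits = Double.doubleToLongBits(val);
        return (int) (bits ^ (bits >>> 32));
    }

    // Plain truncation keeps only the low 32 bits of the mantissa.
    static int truncate(double val) {
        return (int) Double.doubleToLongBits(val);
    }

    public static void main(String[] args) {
        for (double d : new double[] {1.0, 2.0, 12.0}) {
            System.out.println(d + " fold=" + hashDouble(d) + " truncate=" + truncate(d));
        }
    }
}
```

For small whole numbers such as 1.0, 2.0 and 12.0 the low 32 mantissa bits are all zero, so truncation would map all of them to 0, while the folded value still differs.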
I need a very fast universal hash function for a 128-bit key. The returned value needs to be about 32 bit (well, 16 bit would be sufficient; in most cases I only need 1-4 bits actually).
Universal hash means that there are two parameters: key (128 bit) and index (64 bit). For two keys, the universal hash function needs to return a different result eventually, if called with different indexes. So with a different index, the universal hash should behave like a different hash function. For x = universalHash(k, i) and y = universalHash(k, i + 1), it would be best if on average 50% of all bits are different between x and y (randomly). The same applies if the method is called with different keys. In practice, 5% off is OK for me.
It needs to be very fast (one or two multiplications at most). It is called millions of times. Please don't say: no, you won't need it to be fast. It also needs to return different values eventually.
What I have so far (Java code, but C would be fine as well; due to the lack of a 128-bit data type, the key is the composite of a and b, which are 64 bits each):
int universalHash(long a, long b, long index) {
long x = a ^ Long.rotateLeft(b, (int) index) ^ index;
int y = (int) ((x >>> 32) ^ x);
y = ((y >>> 16) ^ y) * 0x45d9f3b;
y = ((y >>> 16) ^ y) * 0x45d9f3b;
y = (y >>> 16) ^ y;
return y;
}
int universalHash2(long a, long b, long index) {
long x = Long.rotateLeft(a, (int) index) ^
Long.rotateRight(b, (int) index) ^ index;
x = (x ^ (x >>> 32)) * 0xbf58476d1ce4e5b9L;
return (int) ((x >>> 32) ^ x);
}
(The second method is actually broken for some values.)
I would like to have a hash function that is faster than those above, and is guaranteed to work in all cases (if possible provably correct, even though that's not a strict requirement; it doesn't need to be cryptographically secure, however).
I will call the universalHash method with incrementing index (first index 0, then index 1, and so on) for the same keys. It would be best if the next result could be calculated faster (e.g. without multiplication) from the previous result. But I also need to have a fast "direct access" if the index is some value (as in the example code).
Background
The problem I'm trying to solve is finding an MPHF (minimal perfect hash function) for a relatively small set of keys (up to 16 keys by directly mapping, and up to about 1024 keys by splitting into smaller subsets). For details on the algorithm, see my MinPerf project, especially the RecSplit algorithm. To support sets of size 10^12 (like BBHash), I'm trying to internally use 128-bit signatures, which would simplify the algorithm.
You need a hash function that outputs 32 bits for 128 bits of inputs.
A simple way would be to just return "some" 32 bits out of the original 128 bits. There are many ways of choosing 32 bits and every choice will have collisions. But the index can decide which 32 bits to choose.
128/32 = 4, so 4 indices are enough to find at least one different bit between two different keys.
For index 0 you choose the lowermost 32 bits
For index 1 you choose the next 32 bits
and so on.
The C implementation would be
uint32_t universal_hash(uint64_t key_higher, uint64_t key_lower, int index) {
// For a lack of portable 128 bit datatype we take the key in parts.
return 0xFFFFFFFF & ( index >=2 ? key_higher >> ((index - 2)*32) : key_lower >> (index*32));
}
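Since the question used Java, the same window selection could be sketched there as well. This is only an illustration of the idea above, not a tested replacement for the question's functions: the names WindowHash and select are made up, and restricting index to 0..3 is my assumption (the C version shifts by 64 bits or more for larger indices, which is undefined behavior in C).

```java
final class WindowHash {
    // index 0..3 selects one of the four 32-bit windows of the 128-bit key,
    // starting with the lowest-order window of keyLower.
    static int select(long keyHigher, long keyLower, int index) {
        if (index < 0 || index > 3) {
            throw new IllegalArgumentException("index must be 0..3");
        }
        long word = index >= 2 ? keyHigher : keyLower;
        // Odd indices take the upper half of the chosen 64-bit word.
        return (int) (word >>> ((index & 1) * 32));
    }

    public static void main(String[] args) {
        long hi = 0x1111111122222222L;
        long lo = 0x3333333344444444L;
        for (int i = 0; i < 4; i++) {
            System.out.println(i + " -> " + Integer.toHexString(select(hi, lo, i)));
        }
    }
}
```

Two different 128-bit keys must differ in at least one of the four windows, so at least one index in 0..3 separates them. Note that the output bits are just copies of input bits, so this does not give the 50% bit-flip behavior the question asks for.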
This question is a result of the responses submitted to my post at CodeReview.
I have a class called Point, which is basically "intended to encapsulate a point represented in 2D space." I have overridden the hashCode() function as follows:
...
@Override
public int hashCode() {
int hash = 0;
hash += (int) (Double.doubleToLongBits(this.getX())
^ (Double.doubleToLongBits(this.getX()) >>> 32));
hash += (int) (Double.doubleToLongBits(this.getY())
^ (Double.doubleToLongBits(this.getY()) >>> 32));
return hash;
}
...
Let me clarify (for those who didn't check the above link) that my Point uses the two doubles: x and y to represent its coordinates.
Problem:
My Problem is evident when I run this method:
public static void main(String[] args) {
Point p1 = Point.getCartesianPoint(12, 0);
Point p2 = Point.getCartesianPoint(0, 12);
System.out.println(p1.hashCode());
System.out.println(p2.hashCode());
}
I get the Output:
1076363264
1076363264
This is clearly a problem. Basically, I intend my hashCode() to return equal hash codes for equal Points. If I reverse the order in one of the parameter declarations (i.e. swap the 12 and the 0 in one of them to get equal Points), I get the correct (same) result. How can I correct my approach while maintaining the quality or uniqueness of the hash?
You cannot get a unique integer hash code for two doubles without some further information about the nature of the numbers stored in the doubles.
Why?
An int is stored as a 32-bit representation, a double as a 64-bit representation (see the Java tutorial).
So you are trying to store 128 bits of information in a 32-bit space, which can never give a unique hash.
However
Uniqueness really isn't the purpose of a hash code; hash codes just need to have fairly uncommon collisions to be useful.
If you know something about the double numbers that reduces their entropy/information content, then you could use this to compress the number of bits they use. This will be dependent on the application of this class, which you have not discussed yet.
This is also why equals normally does not make use of the hash code to check for equality; use getX and getY of each Point to do the comparison instead.
Try this
public int hashCode() {
long bits = Double.doubleToLongBits(x);
int hash = 31 + (int) (bits ^ (bits >>> 32));
bits = Double.doubleToLongBits(y);
hash = 31 * hash + (int) (bits ^ (bits >>> 32));
return hash;
}
This implementation follows the pattern of Arrays.hashCode(double a[]).
It produces these hash codes:
-992476223
1076364225
You can find suggestions on how to write a good hashCode in Effective Java, Item 9.
Can't you use the code that is present in Arrays.hashCode already?
Arrays.hashCode(new double[]{x,y});
This is what Guava, for example, uses in Objects.hashCode.
If you have Java 7, simply:
Objects.hash(x,y)
It might be a silly idea, but you are using +, which is a symmetric operation, and you are getting symmetry problems. What if you use a non-symmetric operation such as division (check for denominator == 0 though) or subtraction? Or any other that you can find in the literature or invent yourself.
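To make the symmetry point concrete, here is a small sketch that compares the additive combination from the question with the multiply-by-31 combination suggested above. The class and method names (CombineDemo, fold, additive, ordered) are made up for illustration and take the coordinates as plain parameters instead of using the Point class.

```java
import java.util.Arrays;

final class CombineDemo {
    // 64-bit pattern of v folded into 32 bits.
    static int fold(double v) {
        long bits = Double.doubleToLongBits(v);
        return (int) (bits ^ (bits >>> 32));
    }

    // Symmetric: addition cannot tell which coordinate is which.
    static int additive(double x, double y) {
        return fold(x) + fold(y);
    }

    // Order-sensitive: the running hash is multiplied by 31
    // before the next coordinate is added.
    static int ordered(double x, double y) {
        int hash = 31 + fold(x);
        return 31 * hash + fold(y);
    }

    public static void main(String[] args) {
        System.out.println(additive(12, 0) + " " + additive(0, 12));
        System.out.println(ordered(12, 0) + " " + ordered(0, 12));
        System.out.println(Arrays.hashCode(new double[] {12, 0}));
    }
}
```

The ordered variant separates (12, 0) from (0, 12), but it is still a 32-bit summary of 128 bits, so other pairs of points will collide.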
Today I was learning about the left shift bit operator (<<). As I understand it, the left shift operator moves bits to the left by the specified number of positions, and I also know that each shift multiplies the value by 2. But I am confused: what exactly is the meaning of "shifting bits", and why does the output differ when the value is assigned to a variable of a different type?
When I call the function below, it gives output as System.out.println("b="+b); //Output: 0
And my question is: how does b become 0, and why is b typecast?
public void leftshiftDemo()
{
byte a=64,b;
int i;
i=a << 2;
b=(byte)(a<<2);
System.out.println("i="+i); //Output: 256 i.e 64*2^2
System.out.println("b="+b); //Output: 0 how & why b is typecasted
}
Update (new doubt):
What does it mean that "If you shift a 1 bit into high-order position (Bit 31 or 63), the value will become negative"? E.g.
public void leftshifHighOrder()
{
int i;
int num=0xFFFFFFE;
for(i=0;i<4;i++)
{
num=num<<1;
System.out.println(num);
/*
* Output:
* 536870908
* 1073741816
* 2147483632
* -32 //how this is -ve?
*/
}
}
When integers are cast to bytes in Java, only the lowest-order bits are kept:
A narrowing conversion of a signed integer to an integral type T
simply discards all but the n lowest order bits, where n is the number
of bits used to represent type T. In addition to a possible loss of
information about the magnitude of the numeric value, this may cause
the sign of the resulting value to differ from the sign of the input
value.
In this case the byte 64 has the following binary representation:
01000000
The shift operator promotes the value to int:
00000000000000000000000001000000
then left shifts it by 2 bits:
00000000000000000000000100000000
We then cast it back into a byte, so we discard all but the last 8 bits:
00000000
Thus the final byte value is 0. However, your integer keeps all the bits, so its final value is indeed 256.
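The steps above can be reproduced with a short sketch. The class name ShiftNarrowDemo and its helper methods are made up for illustration; the bits helper only prints the low width bits of a value.

```java
final class ShiftNarrowDemo {
    // The byte operand is promoted to int before the shift, so nothing is lost.
    static int shiftAsInt(byte a) {
        return a << 2;
    }

    // The cast keeps only the lowest 8 bits of the 32-bit shift result.
    static byte shiftAsByte(byte a) {
        return (byte) (a << 2);
    }

    // Binary form of the low `width` bits of value, with leading zeros.
    static String bits(int value, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = width - 1; i >= 0; i--) {
            sb.append((value >>> i) & 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte a = 64;
        System.out.println(bits(a, 8));                // the original byte
        System.out.println(bits(shiftAsInt(a), 32));   // promoted and shifted
        System.out.println(bits(shiftAsByte(a), 8));   // after narrowing
    }
}
```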
In Java, ints are signed, and two's complement is used to represent them. In this representation, any number whose high-order bit is set to 1 is negative (by definition).
Therefore, when you left-shift a 1 from bit 30 (the one before last for an int) into bit 31, the value becomes negative.
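A quick way to watch this happen is to print the bit pattern next to the value after each shift. This is a minimal sketch using the starting value from the question; the class and method names are made up for illustration.

```java
final class SignBitDemo {
    // Returns the values produced by shifting start left by one bit,
    // `times` times in a row.
    static int[] shiftLeftRepeatedly(int start, int times) {
        int[] out = new int[times];
        int num = start;
        for (int i = 0; i < times; i++) {
            num = num << 1;
            out[i] = num;
        }
        return out;
    }

    public static void main(String[] args) {
        for (int v : shiftLeftRepeatedly(0xFFFFFFE, 4)) {
            // The 32-character binary form makes the top (sign) bit visible.
            String bits = String.format("%32s", Integer.toBinaryString(v)).replace(' ', '0');
            System.out.println(bits + " = " + v);
        }
    }
}
```

On the fourth shift the leftmost character of the printed pattern turns into 1, and that is the step at which the value becomes -32.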
i = a << 2;
In memory:
load a (8 bits) into register R1 (32 bits)
shift register R1 to the left by two positions
assign register R1 (32 bits) to variable i (32 bits).
b = (byte)(a << 2);
In memory:
load a (8 bits) into register R1 (32 bits)
shift register R1 to the left by two positions
assign register R1 (32 bits) to variable b (8 bits). <- this is why the cast (byte) is necessary and why only the last 8 bits of the shift result are kept
The exact meaning of shifting bits is exactly what it sounds like. :-) You shift them to the left.
0011 = 3
0011 << 1 = 0110
0110 = 6
You should read about different data types and their ranges in Java.
Let me explain in easy terms.
byte a=64,b;
int i;
i=a << 2;
b=(byte)(a<<2);
'byte' in Java is a signed 2's complement integer. It can store values from -128 to 127, both inclusive. When you do this,
i = a << 2;
you are left-shifting 'a' by 2 bits, and the value is 64*2*2 = 256. 'i' is of type 'int', and an 'int' in Java can represent that value.
When you left-shift again and typecast the result,
b=(byte)(a<<2);
you keep your lower 8 bits and hence the value is 0.
You can read this for different primitive types in Java.
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
Would that mean that the 100th constant would have to be 1 << 100?
You can use a BitSet, which can hold any number of bits that you want to set or clear, e.g.
BitSet bitSet = new BitSet(101);
bitSet.set(100);
You can't do it directly, because the maximum size of a primitive number that can be used as a bitmask is 64 bits, for a long value. What you can do is split the bitmask into 2 or more ints or longs and then manage it by hand.
int[] mask = new int[4];
final int MAX_SHIFT = 32;
void set(int b) {
mask[b / MAX_SHIFT] |= 1 << (b % MAX_SHIFT);
}
boolean isSet(int b) {
return (mask[b / MAX_SHIFT] & (1 << (b % MAX_SHIFT))) != 0;
}
You can only create a simple bitmask with as many bits as the primitive type has.
If you have a 32-bit int (as in normal Java), then 1 << 31 is the most you can shift the low bit.
To have larger constants, you use an array of int elements: you figure out which array element to use by dividing by 32 (with a 32-bit int) and shift by % 32 (modulo) within the selected array element.
Effective Java Item #32 suggests using an EnumSet instead of bit fields. Internally, it uses a bit vector so it is efficient, however, it becomes more readable as each bit has a descriptive name (the enum constant).
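As a rough sketch of what that looks like, here is an enum used with EnumSet. The Style enum and its constants are hypothetical names chosen for illustration; EnumSet itself works with any enum type, including ones with more than 64 constants.

```java
import java.util.EnumSet;
import java.util.Set;

final class EnumSetDemo {
    // Hypothetical flags; each constant plays the role of one bit.
    enum Style { BOLD, ITALIC, UNDERLINE, STRIKETHROUGH }

    // Equivalent of OR-ing two bit constants together.
    static Set<Style> combine() {
        return EnumSet.of(Style.BOLD, Style.UNDERLINE);
    }

    public static void main(String[] args) {
        Set<Style> styles = combine();
        // Equivalent of testing a bit with & and != 0.
        System.out.println(styles.contains(Style.BOLD));
        System.out.println(styles.contains(Style.ITALIC));
        // Equivalent of setting one more bit.
        styles.add(Style.ITALIC);
        System.out.println(styles);
    }
}
```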
Yes, if you intend to be able to bitwise OR any or all of those constants together, then you're going to need a bit representing each constant. Of course if you use an int you will only have 32 bits and a long will only give you 64 bits.
I know Java doesn't allow unsigned types, so I was wondering how it casts an integer to a byte. Say I have an integer a with a value of 255 and I cast the integer to a byte. Is the value represented in the byte 11111111? In other words, is the value treated more as a signed 8 bit integer, or does it just directly copy the last 8 bits of the integer?
This is called a narrowing primitive conversion. According to the spec:
A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the sign of the resulting value to differ from the sign of the input value.
So it's the second option you listed (directly copying the last 8 bits).
I am unsure from your question whether or not you are aware of how signed integral values are represented, so just to be safe I'll point out that the byte value 1111 1111 is equal to -1 in the two's complement system (which Java uses).
int i = 255;
byte b = (byte)i;
So the value of b in hex is 0xFF, but the decimal value will be -1.
int i = 0xff00;
byte b = (byte)i;
The value of b is now 0x00. This shows that Java takes the last byte of the integer, i.e. the last 8 bits, and interprets it as a signed value.
or does it just directly copy the last
8 bits of the integer
Yes, this is how the cast works.
The following fragment casts an int to a byte. If the integer’s value is outside the range of a byte, it will be reduced modulo the size of the byte’s range (that is, to the remainder of an integer division by 256, wrapped into -128..127).
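To check the modulo description against the actual cast, here is a minimal sketch. The names NarrowDemo, narrow and narrowByArithmetic are made up for illustration; Math.floorMod assumes Java 8 or later.

```java
final class NarrowDemo {
    // The plain cast: keeps the lowest 8 bits, read as a signed value.
    static byte narrow(int value) {
        return (byte) value;
    }

    // The same result computed arithmetically: reduce modulo 256,
    // then map 128..255 onto -128..-1.
    static int narrowByArithmetic(int value) {
        int low = Math.floorMod(value, 256); // always 0..255
        return low >= 128 ? low - 256 : low;
    }

    public static void main(String[] args) {
        for (int v : new int[] {255, 0xff00, 1099, -1099, 2049}) {
            System.out.println(v + " -> " + narrow(v) + " / " + narrowByArithmetic(v));
        }
    }
}
```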
int a;
byte b;
// …
b = (byte) a;
Just a thought on what has been said: always mask your integer with 0xFF (for ints) when converting to bytes. (This assumes myInt was assigned values from 0 to 255.)
e.g.
char myByte = (char)(myInt & 0xFF);
Why? If myInt is bigger than 127, just typecasting to byte returns a negative value (2's complement), which you don't want.
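A byte is 8 bits wide, and 8 bits can represent 256 numbers (2 raised to the power of 8 = 256).
The first bit indicates the sign (two's complement). [if positive then first bit=0, if negative first bit=1]
Let's say you want to convert the integer 1099 to a byte. Just divide 1099 by 256; the remainder is the byte representation of the int. If the remainder falls outside -128..127, add or subtract 256 to bring it into range (for example, 255 becomes -1).
Examples: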
1099/256 => remainder= 75
-1099/256 =>remainder=-75
2049/256 => remainder= 1
reason why? look at this image http://i.stack.imgur.com/FYwqr.png
According to my understanding, you meant
Integer i=new Integer(2);
byte b=i; //will not work
final int i=2;
byte b=i; //fine
At last
Byte b=new Byte((byte) 2); // the int literal must be cast, there is no Byte(int) constructor
int a=b; //fine
for (int i=0; i <= 255; i++) {
byte b = (byte) i; // cast int values 0 to 255 to corresponding byte values
int neg = b; // neg will take on values 0..127, -128, -127, ..., -1
int pos = (int) (b & 0xFF); // pos will take on values 0..255
}
The conversion of a byte that contains a value bigger than 127 (i.e., values 0x80 through 0xFF) to an int results in sign extension of the high-order bit of the byte value (i.e., bit 0x80). To remove the 'extra' one bits, use x & 0xFF; this forces the bits above 0x80 (i.e., bits 0x100, 0x200, 0x400, ...) to zero but leaves the lower 8 bits as is.
You can also write these; they are all equivalent:
int pos = ((int) b) & 0xFF; // convert b to int first, then strip high bits
int pos = b & 0xFF; // done as int arithmetic -- the cast is not needed
Java automatically 'promotes' integer types whose size (in # of bits) is smaller than int to an int value when doing arithmetic. This is done to provide a more deterministic result (than say C, which is less constrained in its specification).
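The promotion is easy to observe with byte arithmetic. Here is a minimal sketch; the class and method names are made up for illustration, and Byte.toUnsignedInt assumes Java 8 or later.

```java
final class PromotionDemo {
    // Both byte operands are promoted to int, so the sum does not wrap.
    static int sumAsInt(byte a, byte b) {
        return a + b;
    }

    // Storing the sum back into a byte needs an explicit cast
    // and keeps only the lowest 8 bits.
    static byte sumAsByte(byte a, byte b) {
        return (byte) (a + b);
    }

    // b is promoted (with sign extension) before the mask is applied.
    static int unsignedValue(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte a = 100;
        byte b = 100;
        System.out.println(sumAsInt(a, b));
        System.out.println(sumAsByte(a, b));
        System.out.println(unsignedValue((byte) -1) + " " + Byte.toUnsignedInt((byte) -1));
    }
}
```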
You may want to have a look at this question on casting a 'short'.