What are the assumptions behind the default Long.hashCode implementation in Java? [duplicate]

I was wondering if someone could explain in detail what
(int)(l ^ (l >>> 32));
does in the following hashcode implementation (generated by eclipse, but the same as Effective Java):
private int i;
private char c;
private boolean b;
private short s;
private long l;
private double d;
private float f;
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + i;
result = prime * result + s;
result = prime * result + (b ? 1231 : 1237);
result = prime * result + c;
long t = Double.doubleToLongBits(d);
result = prime * result + (int) (t ^ (t >>> 32));
result = prime * result + Float.floatToIntBits(f);
result = prime * result + (int) (l ^ (l >>> 32));
return result;
}
Thanks!

Basically it XORs the top 32 bits of a long with the bottom 32 bits. Here's an exploded version:
// Unsigned shift by 32 bits, so top 32 bits of topBits will be 0,
// bottom 32 bits of topBits will be the top 32 bits of l
long topBits = l >>> 32;
// XOR topBits with l; the top 32 bits will effectively be left
// alone, but that doesn't matter because of the next step. The
// bottom 32 bits will be the XOR of the top and bottom 32 bits of l
long xor = l ^ topBits;
// Convert the long to an int - this basically ditches the top 32 bits
int hash = (int) xor;
To answer your comment: you have a long value which has to be converted into an int to be part of the hash (the result has to only be 32 bits). How are you going to do that? You could just take the bottom 32 bits - but then that means changes in only the top 32 bits would be ignored, which wouldn't make it a very good hash. This way, a change in a single bit of input always results in a change of a single bit of the hash. Admittedly you can still get collisions easily - change both bits 7 and 39, for example, or any other pair of bits 32 positions apart - but that's bound to be the case, given that you're going from 2^64 possible values to 2^32.
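The folding step is easy to check directly. The following is a minimal sketch (the class name LongFoldDemo and the helper fold are made up for illustration) that compares the hand-written expression with Long.hashCode(long), available since Java 8, and reproduces the bit 7 / bit 39 collision mentioned above.

```java
class LongFoldDemo {
    // Same expression as in the generated hashCode(): XOR the high
    // 32 bits into the low 32 bits, then keep only the low 32 bits.
    static int fold(long l) {
        return (int) (l ^ (l >>> 32));
    }

    public static void main(String[] args) {
        long value = 0x123456789ABCDEF0L;

        // Long.hashCode(long) is documented to compute the same value.
        System.out.println(fold(value) == Long.hashCode(value)); // true

        // A change in bit 39 alone shows up as bit 7 of the hash.
        System.out.println(fold(1L << 39)); // 128

        // Flipping bits 7 and 39 together cancels out: same hash as 0.
        System.out.println(fold((1L << 7) | (1L << 39)) == fold(0L)); // true
    }
}
```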

It takes a 64-bit number, splits it in half, and XORs the two halves together (essentially).

It takes a (64-bit) long l, exclusive-ORs the top and bottom halves (of 32 bits each) into the bottom 32 bits of a 64-bit result, then takes only the bottom 32 bits with the (int) cast.

Related

Encoding a set of three integers to one unique number

So, my problem set is very simple. I am working with a set of three integers randomly selected from [0-65535], and my job is to encode these integers into one unique number. Here is what I have tried so far.
I have written a Java function called pack to try to encode these numbers as follows
private long pack(long a, long b, long c) {
int N = 65535, M = 65536;
return (a + (b * N) + c * N * M);
}
And I have also written another java function to unpack or decode the packed number back to the original integers as follows
private long[] unpack(long packed) {
int N = 65535, M = 65536;
long a = (packed % N);
long b = (packed / N) % M;
long c = (packed % (N * M));
return new long[]{a, b, c};
}
Now when I ran the code above in my main function using sample data {67, 8192, 7168} I am getting the following as result in my console output
Packing 67, 8192, 7168
Result=30786392678467
UnPacking 30786392678467
Result=[67, 8192, 57411]
From the above, my first and second values are clearly always correct, but the last value always appears to be wrong. What am I missing? Your help is greatly appreciated. Thanks a lot.

I'm going to give you an alternative solution now, and then I can try to debug your current solution when I'm on a PC instead of a phone (rgettman beat me!).
Because each of the three numbers can be a maximum of 65535, that means that each number will fit into 16 bits. For that reason, you can simply build a unique long with the following:
long encoded = (a << 32L) | (b << 16) | c;
And decoding it would look like the following:
long a = (encoded >> 32) & 0xFFFFL;
long b = (encoded >> 16) & 0xFFFFL;
long c = encoded & 0xFFFFL;
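As a round-trip check of the shift-based approach, here is a small self-contained sketch. The class name Pack16Demo and the range check are additions for illustration and are not part of the answer above; the check simply guards the assumption that every value fits into 16 bits.

```java
import java.util.Arrays;

class Pack16Demo {
    // Packs three values from [0, 65535] into one long: a in bits 32-47,
    // b in bits 16-31, c in bits 0-15.
    static long pack(int a, int b, int c) {
        checkRange(a);
        checkRange(b);
        checkRange(c);
        return ((long) a << 32) | ((long) b << 16) | c;
    }

    // Reverses pack() by shifting each field down and masking 16 bits.
    static int[] unpack(long packed) {
        return new int[] {
            (int) ((packed >>> 32) & 0xFFFF),
            (int) ((packed >>> 16) & 0xFFFF),
            (int) (packed & 0xFFFF)
        };
    }

    private static void checkRange(int v) {
        if (v < 0 || v > 0xFFFF) {
            throw new IllegalArgumentException("value out of [0, 65535]: " + v);
        }
    }

    public static void main(String[] args) {
        long packed = pack(67, 8192, 7168);
        System.out.println(Arrays.toString(unpack(packed))); // [67, 8192, 7168]
    }
}
```

Because each value occupies its own 16-bit field, no field can spill into its neighbour, which is what makes the encoding unique.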

Your packing and unpacking code is incorrect according to the range [0, 65535] you've given.
There are 65,536 possible numbers, and you don't want the encoding of one integer to change the encoding of another integer. You should use one constant set to 65536 (which is 2^16).
public static final long PACK = 65536;
Then your pack method changes slightly to:
private long pack(long a, long b, long c) {
return (a + (b * PACK) + c * PACK * PACK);
}
This "packs" a into the least significant 16 bits of the long (bits 0-15, counting from the least significant bit), b into bits 16-31, and c into bits 32-47. (Nothing is packed into the top 16 bits, 48-63, so those bits remain cleared.)
Also, your unpack method changes to:
private static long[] unpack(long packed) {
long a = (packed % PACK);
long b = (packed / PACK) % PACK;
long c = (packed / (PACK * PACK)); // Use / not %.
return new long[]{a, b, c};
}
Notice that c's operation divides by PACK squared, not using the % operator, but using /. Otherwise both M and N have each been replaced by PACK.
Output with these changes:
Packing 67, 8192, 168
Result=722091376707
UnPacking 722091376707
Result=[67, 8192, 168]

Actually, your solution is almost correct: just make sure that M == N == 65536 and fix the problem in unpacking variable c.
private long pack(long a, long b, long c) {
long N = 65536;
return (a + (b * N) + c * N * N);
}
private long[] unpack(long packed) {
long N = 65536;
long a = (packed % N);
long b = (packed / N) % N;
long c = (packed / (N * N));
return new long[]{a, b, c};
}
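Also, I changed the type of N to long. In pack it would not matter, as Java widens N to long during the multiplication with c anyway, but in unpack the subexpression N * N would otherwise be evaluated in 32-bit int arithmetic and overflow (65536 * 65536 wraps to 0) before being widened.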

Java Random.nextInt(int) returns the same value when set with different seeds

I wrote a demo to test java.util.Random. I want to produce a repeating list of the same 5 numbers, but I get the same value when I set different seeds. In my program, the seeds range from 0 to 4. As far as I know, different seeds produce different values and the same seed produces the same value, so I expected the result to be a repeating list of the same 5 numbers. But the actual output values are all the same. What's wrong with my code? Could anyone tell me?
import java.util.Random;
public class Main {
public Main() {
}
public static void main(String[] args) {
for (int i = 0 ; i <= 255; i++)
{
String hex = Integer.toHexString(randInt(0, 255, i % 5));
System.out.println(hex);
}
}
private static Random rand = new Random();
public static int randInt(int min, int max, long seed) {
rand.setSeed(seed);
System.out.println("seed:" + seed);
int randomNum = rand.nextInt((max - min) + 1) + min;
return randomNum;
}
}
The result is :
seed:0
bb
seed:1
bb
seed:2
bb
seed:3
bb
seed:4
bb
seed:0
bb
seed:1
bb
seed:2
bb
seed:3
bb
seed:4
bb
seed:0
bb
seed:1
bb
seed:2
bb
seed:3
bb
seed:4
bb
seed:0
bb
seed:1
...
...
...

"As far as I know, different seeds produce different values"
This is incorrect: different seeds may produce different values, but they can also produce the same values.
There are 2^64 possible seeds and rand.nextInt(256) can only return 256 different values, so many of the seeds must return the same value.
Also, the setSeed Javadoc states:
"The implementation of setSeed by class Random happens to use only 48 bits of the given seed"
So if your seeds differ only in the ignored bits, all of the values will be the same.

I found this implementation on grepcode. There is an if statement that detects whether n is a power of 2. If n (i.e. the bound) is a power of 2, (int)((n * (long)next(31)) >> 31); is used.
public int nextInt(int n) {
if (n <= 0)
throw new IllegalArgumentException("n must be positive");
if ((n & -n) == n) // i.e., n is a power of 2
return (int)((n * (long)next(31)) >> 31);
int bits, val;
do {
bits = next(31);
val = bits % n;
} while (bits - val + (n-1) < 0);
return val;
}
I don't know if this implementation is used in your JDK, but it suggests that power-of-2 bounds are treated differently. The Javadoc for nextInt(int) describes this special case:
The algorithm treats the case where n is a power of two specially: it returns the correct number of high-order bits from the underlying pseudo-random number generator. In the absence of special treatment, the correct number of low-order bits would be returned. Linear congruential pseudo-random number generators such as the one implemented by this class are known to have short periods in the sequence of values of their low-order bits. Thus, this special case greatly increases the length of the sequence of values returned by successive calls to this method if n is a small power of two.
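The effect described in the question can be reproduced with a few lines. This is only a sketch (the class name SeedHighBitsDemo and the helper firstByte are made up here): with a bound of 256, only the highest-order bits of the generator state are returned, and for the tiny seeds 0 to 4 a single step of the generator apparently does not carry their differences up into those bits.

```java
import java.util.Random;

class SeedHighBitsDemo {
    // First value of nextInt(256) for a freshly seeded generator.
    // new Random(seed) seeds the generator the same way setSeed(seed) does.
    static int firstByte(long seed) {
        return new Random(seed).nextInt(256);
    }

    public static void main(String[] args) {
        for (long seed = 0; seed < 5; seed++) {
            // Prints "bb" for every seed, matching the output in the question.
            System.out.println("seed " + seed + " -> "
                    + Integer.toHexString(firstByte(seed)));
        }
    }
}
```

To get a repeating list of five different numbers, one option is to draw the five values once from a single Random instance, store them in an array, and index that array with i % 5 instead of reseeding on every call.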

What does '<<' mean? And what does this code mean?

I don't understand what doCalculatePi means or does in the following example:
public static double doCalculatePi(final int sliceNr) {
final int from = sliceNr * 10;
final int to = from + 10;
final int c = (to << 1) + 1;
double acc = 0;
for (int a = 4 - ((from & 1) << 3), b = (from << 1) + 1; b < c; a = -a, b += 2) {
acc += ((double) a) / b;
}
return acc;
}
public static void main(String args[]){
System.out.println(doCalculatePi(1));
System.out.println(doCalculatePi(2));
System.out.println(doCalculatePi(3));
System.out.println(doCalculatePi(4));
System.out.println(doCalculatePi(10));
System.out.println(doCalculatePi(100));
}
I have printed the values to understand what the results are but I still have no clue what this code calculates. The conditions inside the loop are not clear.

<< means the left shift operation, which shifts the left-hand operand left by the number of bits specified by the right-hand operand (see the Oracle docs).
Say you have the decimal value 5, whose binary representation is 101.
Now for simplicity, consider,
byte a = (byte)0x05;
Hence, the bit representation of a will be,
a = 00000101 // 1 byte is 8 bit
Now if you left shift a by 2, then a will be
a << 2
a = 00010100 //shifted place filled with zero/s
So, you may now understand that, left shift a by 3 means
a << 3
a = 00101000
For better understanding you need to study Bitwise operation.
Note, you are using int instead of byte, and by default, the int data type is a 32-bit signed integer (reference here), so you have to consider,
int a = 5;
in binary
a << 3
a = 00000000 00000000 00000000 00101000 // total 32 bit
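The shifts above can be tried out with a short program. This is a minimal sketch; the class name LeftShiftDemo and the helper bits8, which prints only the low 8 bits, are made up for illustration.

```java
class LeftShiftDemo {
    // Hypothetical helper: the low 8 bits of v as a zero-padded binary string.
    static String bits8(int v) {
        String s = Integer.toBinaryString(v & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        int a = 5;
        System.out.println(bits8(a));      // 00000101
        System.out.println(bits8(a << 2)); // 00010100 (decimal 20)
        System.out.println(bits8(a << 3)); // 00101000 (decimal 40)

        // Each shift by one position doubles the value.
        System.out.println(15 << 1);       // 30
        System.out.println(15 << 2);       // 60

        // Shifting a 1 into bit 31 of an int reaches the sign bit.
        System.out.println(1 << 31);       // -2147483648
    }
}
```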

My guess is that it approximates PI with
PI = doCalculatePi(0)+doCalculatePi(1)+doCalculatePi(2)+...
Just a guess.
Trying this
double d = 0;
for(int k = 0; k<1000; k++) {
System.out.println(d += doCalculatePi(k));
}
gives me
3.0418396189294032
3.09162380666784
3.1082685666989476
[...]
3.1414924531892394
3.14149255348994
3.1414926535900394
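That guess appears to hold: each call adds up ten consecutive terms of the Leibniz series 4 * (1 - 1/3 + 1/5 - 1/7 + ...). Here from << 1 is from * 2, so b runs over the odd denominators, and (from & 1) << 3 is 8 only when from is odd, which makes the starting numerator a either 4 or -4. The sketch below copies doCalculatePi from the question and compares the sum of 1000 slices with a plainly written version of the same series; the class name LeibnizSliceDemo and the helper leibniz are made up for illustration.

```java
class LeibnizSliceDemo {
    // Copied from the question: terms from..from+9 of the series.
    static double doCalculatePi(final int sliceNr) {
        final int from = sliceNr * 10;
        final int to = from + 10;
        final int c = (to << 1) + 1;
        double acc = 0;
        for (int a = 4 - ((from & 1) << 3), b = (from << 1) + 1; b < c; a = -a, b += 2) {
            acc += ((double) a) / b;
        }
        return acc;
    }

    // Hypothetical helper: the first `terms` terms of the Leibniz series,
    // sum over n of (-1)^n * 4 / (2n + 1), written without bit tricks.
    static double leibniz(int terms) {
        double acc = 0;
        for (int n = 0; n < terms; n++) {
            double term = 4.0 / (2 * n + 1);
            acc += (n % 2 == 0) ? term : -term;
        }
        return acc;
    }

    public static void main(String[] args) {
        double sliced = 0;
        for (int k = 0; k < 1000; k++) {
            sliced += doCalculatePi(k);
        }
        System.out.println(sliced);         // sum of 1000 slices
        System.out.println(leibniz(10000)); // same 10000 terms in one loop
        System.out.println(Math.PI);        // reference value
    }
}
```

The slice structure suggests the original code was meant to split the series into independent chunks, for example to compute them in parallel and add the partial sums afterwards.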

<< is the left bit-shift operator.
Basically, every number is represented as a series of binary digits (0's and 1's), and you're shifting each of those digits to the left by however many places you indicate. So for example, 15 is 00001111 and 15 << 1 is 00011110 (or 30), while 15 << 2 is (00111100) which is 60.
There's some special handling that comes into play when you get to the sign bit, but you should get the point.

Get bits of a given range more efficiently?

It's been a while since I did bit manipulation, and I'm not sure if this can be done in a more efficient way.
What I want is to get bits of a specific range from a value.
Let's say the binary of the value is: 0b1101101
Now I want to get a 4-bit range from the 2nd to the 5th bit of this value, interpreted in two's complement.
The range I wanna get: 0b1011
Value in Two's complement: -5
This is the code I have, with some thoughts what I'm doing:
public int bitRange(int value, int from, int to) {
// cut the least significant bits
value = value >> from;
// create the mask
int mask = 0;
for (int i = from; i <= to; i++) {
mask = (mask << 1) + 1;
}
// extract the bits
value = value & mask;
// needed to check the MSB of the range
int msb = 1 << (to - from);
// if MSB is 1, XOR and inverse it
if ((value & msb) == msb ) {
value = value ^ mask;
value = ~value;
}
return value;
}
Now I would like to know if this can be done more efficiently, especially the creation of the dynamic mask and the check of the MSB of the range, which is needed to convert the bit range. Another point is, as user3344003 correctly pointed out, that if the range is 1 bit wide, the output would be -1. I'm sure there is a possible improvement.

For your mask, you could go something like
int mask = 0xffffffff >> 32-(to-from);
Though the chance of that exact code being correct is small. Probably off by one, edge issues, sign problems. But it's on the right track?

Here's your mask:
int mask = 0xffffffff >>> 32 - (to - from + 1);
You have to use >>> because the sign bit is 1.
Another solution could be to store the possible masks which can be 31 values at the most:
private static int[] MASKS = new int[31];
static {
MASKS[0] = 1;
for (int i = 1; i < MASKS.length; i++)
MASKS[i] = (MASKS[i - 1] << 1) + 1;
}
And using this your mask:
int mask = MASKS[to - from];
You can do the same for the msb mask, just store the possible values in a static array, and you don't have to calculate it in your method.

Disclaimer: I'm more of a C or C++ programmer, and I know there are some subtle differences between the bitwise operators in the different languages. But it seems to me that this can be done in one line, as follows, by taking advantage of the arithmetic shift that results when shifting a negative value to the right, where ones are shifted in for the sign extension.
public int bitRange(int value, int from, int to) {
int waste = 31 - to;
return (value << waste) >> (waste + from);
}
breakdown:
int a = 31 - to; // the number of bits to throw away on the left
int b = value << a; // shift the bits to throw away off the left of the value
int c = a + from; // the number of bits that now need to be thrown away on the right
int d = b >> c; // throw bits away on the right, and extend the sign on the left
return d;
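In Java, >> on an int is an arithmetic shift, so the one-liner behaves as described. The sketch below runs it on the example from the question and adds an unsigned variant that uses >>> instead; the class name BitRangeDemo and the method names are made up for illustration.

```java
class BitRangeDemo {
    // Bits from..to (inclusive, bit 0 = least significant), sign-extended:
    // the top bit of the range becomes the sign of the result.
    static int signedBitRange(int value, int from, int to) {
        int waste = 31 - to;
        return (value << waste) >> (waste + from);
    }

    // Same extraction, but zero-filled from the left (no sign extension).
    static int unsignedBitRange(int value, int from, int to) {
        int waste = 31 - to;
        return (value << waste) >>> (waste + from);
    }

    public static void main(String[] args) {
        int value = 0b1101101;
        System.out.println(signedBitRange(value, 2, 5));   // -5 (0b1011 as 4-bit two's complement)
        System.out.println(unsignedBitRange(value, 2, 5)); // 11 (0b1011)

        // A 1-bit range holding a 1 is its own sign bit, hence -1.
        System.out.println(signedBitRange(0b100, 2, 2));   // -1
    }
}
```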

Inverse function of Java's Random function

Java's Random function takes a seed and produces a sequence of 'pseudo-random' numbers.
(It is implemented based on an algorithm discussed in Donald Knuth, The Art of Computer Programming, Volume 2, Section 3.2.1, but the article is too technical for me to understand.)
Is there an inverse function of it?
That is, given a sequence of numbers, would it be possible to mathematically determine what the seed would be?
(That is, brute-forcing doesn't count as a valid method.)
[Edit]
There seems to be quite a number of comments here... I thought I'd clarify what I am looking for.
So for instance, the function y = f(x) = 3x has an inverse function, which is y = g(x) = x/3.
But the function z = f(x, y) = x * y does not have an inverse function, because (I could give a full mathematical proof here, but I don't want to sidetrack my main question), intuitively speaking, there are more than one pair of (x, y) such that (x * y) == z.
Now back to my question: if you say the function is not invertible, please explain why.
(And I am hoping to get answers from those who have really read the article and understand it. Answers like "It's just not possible" aren't really helping.)

If we're talking about the Oracle (née Sun) implementation of java.util.Random, then yes, it is possible once you know enough bits.
Random uses a 48-bit seed and a linear congruential generator. These are not cryptographically safe generators, because of the tiny state size (bruteforceable!) and the fact that the output just isn't that random (many generators will exhibit small cycle length in certain bits, meaning that those bits can be easily predicted even if the other bits seem random).
Random's seed update is as follows:
nextseed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1)
This is a very simple function, and it can be inverted if you know all the bits of the seed by calculating
seed = ((nextseed - 0xBL) * 0xdfe05bcb1365L) & ((1L << 48) - 1)
since 0x5DEECE66DL * 0xdfe05bcb1365L = 1 mod 2^48. With this, a single seed value at any point in time suffices to recover all past and future seeds.
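The two update formulas can be checked against each other in isolation. This is a minimal sketch operating on raw 48-bit seed values only; the class name LcgInverseDemo and the method names next and previous are made up for illustration and are not part of java.util.Random.

```java
class LcgInverseDemo {
    static final long MULTIPLIER = 0x5DEECE66DL;
    static final long INVERSE = 0xDFE05BCB1365L; // inverse of MULTIPLIER mod 2^48
    static final long ADDEND = 0xBL;
    static final long MASK = (1L << 48) - 1;

    // Forward step: the seed update quoted above.
    static long next(long seed) {
        return (seed * MULTIPLIER + ADDEND) & MASK;
    }

    // Backward step: undo the addition, then multiply by the inverse.
    static long previous(long nextSeed) {
        return ((nextSeed - ADDEND) * INVERSE) & MASK;
    }

    public static void main(String[] args) {
        // The two constants multiply to 1 modulo 2^48.
        System.out.println(((MULTIPLIER * INVERSE) & MASK) == 1L); // true

        long seed = 123456789L;
        long stepped = next(next(next(seed)));
        // Three steps forward, three steps back.
        System.out.println(previous(previous(previous(stepped))) == seed); // true
    }
}
```

The long multiplications overflow 64 bits, but that is harmless here: overflow only discards bits above bit 63, and the mask keeps just the low 48 bits.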
Random has no functions that reveal the whole seed, though, so we'll have to be a bit clever.
Now, obviously, with a 48-bit seed, you have to observe at least 48 bits of output or you clearly don't have an injective (and thus invertible) function to work with. We're in luck: nextLong returns ((long)(next(32)) << 32) + next(32);, so it produces 64 bits of output (more than we need). Indeed, we could probably make do with nextDouble (which produces 53 bits), or just repeated calls of any other function. Note that these functions cannot output more than 2^48 unique values because of the seed's limited size (hence, for example, there are 2^64 - 2^48 longs that nextLong will never produce).
Let's specifically look at nextLong. It returns a number (a << 32) + b where a and b are both 32-bit quantities. Let s be the seed before nextLong is called. Then, let t = s * 0x5DEECE66DL + 0xBL, so that a is the high 32 bits of t, and let u = t * 0x5DEECE66DL + 0xBL so that b is the high 32 bits of u. Let c and d be the low 16 bits of t and u respectively.
Note that since c and d are 16-bit quantities, we can just bruteforce them (since we only need one) and be done with it. That's pretty cheap, since 2^16 is only 65536 -- tiny for a computer. But let's be a bit more clever and see if there's a faster way.
We have (b << 16) + d = ((a << 16) + c) * 0x5DEECE66DL + 11. Thus, doing some algebra, we obtain (b << 16) - 11 - (a << 16)*0x5DEECE66DL = c*0x5DEECE66DL - d, mod 2^48. Since c and d are both 16-bit quantities, c*0x5DEECE66DL has at most 51 bits. This usefully means that
(b << 16) - 11 - (a << 16)*0x5DEECE66DL + (k<<48)
is equal to c*0x5DEECE66DL - d for some k at most 6. (There are more sophisticated ways to compute c and d, but because the bound on k is so tiny, it's easier to just bruteforce).
We can just test all the possible values for k until we get a value whose negated remainder mod 0x5DEECE66DL is 16 bits (mod 2^48 again), so that we recover the lower 16 bits of both t and u. At that point, we have a full seed, so we can either find future seeds using the first equation, or past seeds using the second equation.
Code demonstrating the approach:
import java.util.Random;
public class randhack {
public static long calcSeed(long nextLong) {
final long x = 0x5DEECE66DL;
final long xinv = 0xdfe05bcb1365L;
final long y = 0xBL;
final long mask = ((1L << 48)-1);
long a = nextLong >>> 32;
long b = nextLong & ((1L<<32)-1);
if((b & 0x80000000) != 0)
a++; // b had a sign bit, so we need to restore a
long q = ((b << 16) - y - (a << 16)*x) & mask;
for(long k=0; k<=5; k++) {
long rem = (x - (q + (k<<48))) % x;
long d = (rem + x)%x; // force positive
if(d < 65536) {
long c = ((q + d) * xinv) & mask;
if(c < 65536) {
return ((((a << 16) + c) - y) * xinv) & mask;
}
}
}
throw new RuntimeException("Failed!!");
}
public static void main(String[] args) {
Random r = new Random();
long next = r.nextLong();
System.out.println("Next long value: " + next);
long seed = calcSeed(next);
System.out.println("Seed " + seed);
// setSeed mangles the input, so demangle it here to get the right output
Random r2 = new Random((seed ^ 0x5DEECE66DL) & ((1L << 48)-1));
System.out.println("Next long value from seed: " + r2.nextLong());
}
}

I normally wouldn't just link articles... But I found a site where someone looks into this in some depth and thought it was worth posting. http://jazzy.id.au/default/2010/09/20/cracking_random_number_generators_part_1.html
It seems that each new seed is calculated from the previous one this way:
seed = (seed * multiplier + addend) mod (2 ^ precision)
where multiplier is 25214903917, addend is 11, and precision is 48 (bits). You can't calculate what the seed was with only 1 number, but you can with 2.
EDIT: As nhahtdh said there's a part 2 where he delves into more of the math behind the seeds.

I would like to present an implementation to reverse a sequence of integers generated by nextInt().
The program brute-forces the lower 16 bits discarded by nextInt(), uses the algorithm provided in the blog by James Roper to find the previous seed, then checks that the upper 32 bits of that 48-bit seed are the same as the previous number. We need at least 2 integers to derive the previous seed. Otherwise, there will be 2^16 possibilities for the previous seed, and all of them are equally valid until we have at least one more number.
It can be extended for nextLong() easily, and 1 long number is enough to find the seed, since we have 2 pieces of upper 32-bit of the seed in one long, due to the way it is generated.
Note that there are cases where the result is not the same as what you set as the secret seed in the SEED variable. If the number you set as the secret seed occupies more than 48 bits (which is the number of bits used for generating random numbers internally), then the upper 16 bits of the 64-bit long are discarded in the setSeed() method. In such cases, the result returned will not be the same as what you set initially, although the lower 48 bits are likely to be the same.
I would like to give most of the credit to James Roper, the author of the blog article that makes the sample code below possible:
import java.util.Random;
import java.util.Arrays;
class TestRandomReverse {
// The secret seed that we want to find
private static long SEED = 782634283105L;
// Number of random numbers to be generated
private static int NUM_GEN = 5;
private static int[] genNum(long seed) {
Random rand = new Random(seed);
int arr[] = new int[NUM_GEN];
for (int i = 0; i < arr.length; i++) {
arr[i] = rand.nextInt();
}
return arr;
}
public static void main(String args[]) {
int arr[] = genNum(SEED);
System.out.println(Arrays.toString(arr));
Long result = reverse(arr);
if (result != null) {
System.out.println(Arrays.toString(genNum(result)));
} else {
System.out.println("Seed not found");
}
}
private static long combine(int rand, int suffix) {
return (unsignedIntToLong(rand) << 16) | (suffix & ((1L << 16) - 1));
}
private static long unsignedIntToLong(int num) {
return num & ((1L << 32) - 1);
}
// This function finds the seed of a sequence of integer,
// generated by nextInt()
// Can be easily modified to find the seed of a sequence
// of long, generated by nextLong()
private static Long reverse(int arr[]) {
// Need at least 2 numbers.
assert (arr.length > 1);
int end = arr.length - 1;
// Brute force lower 16 bits, then compare
// upper 32 bit of the previous seed generated
// to the previous number.
for (int i = 0; i < (1 << 16); i++) {
long candidateSeed = combine(arr[end], i);
long previousSeed = getPreviousSeed(candidateSeed);
if ((previousSeed >>> 16) == unsignedIntToLong(arr[end - 1])) {
System.out.println("Testing seed: " +
previousSeed + " --> " + candidateSeed);
for (int j = end - 1; j >= 0; j--) {
candidateSeed = previousSeed;
previousSeed = getPreviousSeed(candidateSeed);
if (j > 0 &&
(previousSeed >>> 16) == unsignedIntToLong(arr[j - 1])) {
System.out.println("Verifying: " +
previousSeed + " --> " + candidateSeed);
} else if (j == 0) {
// The XOR is done when the seed is set, need to reverse it
System.out.println("Seed found: " + (previousSeed ^ MULTIPLIER));
return previousSeed ^ MULTIPLIER;
} else {
System.out.println("Failed");
break;
}
}
}
}
return null;
}
private static long ADDEND = 0xBL;
private static long MULTIPLIER = 0x5DEECE66DL;
// Credit to James Roper
// http://jazzy.id.au/default/2010/09/21/cracking_random_number_generators_part_2.html
private static long getPreviousSeed(long currentSeed) {
long seed = currentSeed;
// reverse the addend from the seed
seed -= ADDEND; // reverse the addend
long result = 0;
// iterate through the seeds bits
for (int i = 0; i < 48; i++)
{
long mask = 1L << i;
// find the next bit
long bit = seed & mask;
// add it to the result
result |= bit;
if (bit == mask)
{
// if the bit was 1, subtract its effects from the seed
seed -= MULTIPLIER << i;
}
}
return result & ((1L << 48) - 1);
}
}
