I was scanning some third-party source code with FindBugs (just to be cautious before integrating it into mine) and found the following warning:
long a = b << 32 | c
Bug: Integer shift by 32 Pattern id:
ICAST_BAD_SHIFT_AMOUNT, type: BSHIFT,
category: CORRECTNESS
The code performs an integer shift by
a constant amount outside the range
0..31. The effect of this is to use the lower 5 bits of the integer value
to decide how much to shift by. This
probably isn't what was expected, and
it is at least confusing.
Could anyone please explain what exactly the above means?
Thanks!
(I am quite a newbie in Java programming)
From the Java Language Specification:
If the promoted type of the left-hand operand is int, only the five lowest-order bits of the right-hand operand are used as the shift distance. It is as if the right-hand operand were subjected to a bitwise logical AND operator & (§15.22.1) with the mask value 0x1f. The shift distance actually used is therefore always in the range 0 to 31, inclusive.
So if b is an int, the expression is identical to
long a = b | c;
which I highly doubt is what is intended. It should probably have been
long a = ((long) b << 32) | c;
(If b is already a long, the code is correct and FindBugs is mistaken about the bug).
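To make the difference concrete, here is a minimal sketch, assuming b and c are both int (the original code does not show their types). The class and method names are made up for illustration, and masking c with 0xFFFFFFFFL is an extra precaution of mine so that a negative c does not sign-extend into the upper word.

```java
public class CombineInts {
    // Same shape as the flagged code: b is an int, so the distance 32 is masked to 0
    // and b << 32 is just b.
    static long combineBuggy(int b, int c) {
        return b << 32 | c;
    }

    // Widen b to long before shifting, so the full 64-bit shift is performed.
    // The mask on c is an added assumption: it keeps a negative c out of the high word.
    static long combineFixed(int b, int c) {
        return ((long) b << 32) | (c & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(combineBuggy(1, 2)));  // 3
        System.out.println(Long.toHexString(combineFixed(1, 2)));  // 100000002
    }
}
```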
Edited: The problem almost certainly stems from the fact that 'b' is an 'int' and not a 'long'.
In C, if 'b' is an int instead of a long and you shift left by 32 bits, you invoke undefined behaviour, so any result is permissible. Java defines things differently (as noted in the comment by Rasmus Faber and the chosen answer) and does overlong shifts modulo the maximum number of bits that can be shifted. [It seems an odd way to do business; I'd probably have arranged for an exception in a language that has them. However, it is clearly defined, which is more important than exactly what the definition is.] The coercion to 64 bits doesn't occur while the expression is evaluated; it occurs when the expression is complete and the assignment happens.
The reference to 5 bits is ... intriguing. It means that if you shift left by, say, 48, or binary 110000, it is the same as shifting left by 16. Or, alternatively, 'x << n' is the same as 'x << (n & 31)', which for non-negative n equals 'x << (n % 32)'.
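A quick way to check the 48-versus-16 claim, as a sketch with made-up names: the helper below computes the distance an int shift effectively uses.

```java
public class ShiftMask {
    // The distance an int shift effectively uses: the low five bits of n.
    static int effectiveDistance(int n) {
        return n & 0x1f;
    }

    public static void main(String[] args) {
        int x = 1;
        System.out.println(effectiveDistance(48));  // 16, because 48 is binary 110000
        System.out.println(x << 48);                // 65536
        System.out.println(x << 16);                // 65536
    }
}
```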
As specified in JLS 8, §15.19:
If the promoted type of the left-hand operand is int, then only the five lowest-order bits of the right-hand operand are used as the shift distance. It is as if the right-hand operand were subjected to a bitwise logical AND operator & (§15.22.1) with the mask value 0x1f (0b11111). The shift distance actually used is therefore always in the range 0 to 31, inclusive.
I am not clear about this statement in bold. An example would be much appreciated.
It's Java exploiting compiler optimisations from the C and C++ worlds. For a 32-bit int, using a bit-shift argument greater than or equal to 31 will set the resulting value to 0 for a positive int. (For a negative argument the behaviour in C and C++ on shifting is implementation defined).
Whereas in C and C++, actually using a value greater than 31 for a 32 bit int is in fact undefined behaviour, the Java bods have actually defined the behaviour specifically and simply perform the shift with an argument modulo 32 (which is what the majority of C and C++ compilers actually do). This method is mentioned explicitly in the JLS snippet you've quoted.
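Extracting the five lowest-order bits of a number is equivalent to taking that number modulo 32 (for a negative number, this is the non-negative remainder that Math.floorMod returns, not the result of the % operator).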
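Since an example was asked for, here is a small sketch (the class and method names are mine). It checks that masking with 0x1f agrees with Math.floorMod(n, 32) over a range of distances, and shows where the plain % operator differs for negative values.

```java
public class LowFiveBits {
    // True if (n & 0x1f) equals Math.floorMod(n, 32) for every n in [from, to].
    static boolean maskMatchesFloorMod(int from, int to) {
        for (int n = from; n <= to; n++) {
            if ((n & 0x1f) != Math.floorMod(n, 32)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(maskMatchesFloorMod(-1000, 1000)); // true
        System.out.println(-1 & 0x1f);  // 31
        System.out.println(-1 % 32);    // -1: % keeps the sign of the dividend
        System.out.println(1 << 33);    // 2, since 33 & 0x1f == 1
    }
}
```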
I have two related questions:
The bitwise operator >>> shifts the binary number by the given number of places while filling the most significant bits with 0. But then why do the following operations yield the same number: 5>>>32 yields 5 and -5>>>32 yields -5? If the above description is correct, both of these operations should have yielded 0.
Following on from the above: according to the book Effective Java, we should use (int) (f ^ (f >>> 32)) when calculating the hash code if the field is a long. Why do we do that, and what is the explanation?
5 can be represented as 0101. If you shift it by 1 bit, i.e. 5>>>1, the result is 0010 = 2.
If the promoted type of the left-hand operand is int, only the five
lowest-order bits of the right-hand operand are used as the shift
distance. It is as if the right-hand operand were subjected to a
bitwise logical AND operator & (§15.22.1) with the mask value 0x1f.
The shift distance actually used is therefore always in the range 0 to
31, inclusive.
When you shift an integer with the << or >> operator and the shift
distance is greater than or equal to 32, you take the shift distance
mod 32 (in other words, you mask off all but the low-order 5 bits of
the shift distance). This can be very counterintuitive. For example,
(i >> 32) == i for every integer i. You might expect it to shift the
entire number off to the right, returning 0 for positive inputs and -1
for negative inputs, but it doesn't; it simply returns i, because
(i >> (32 & 0x1f)) == (i >> 0) == i.
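The values from the question can be checked directly. This is only a sketch; the two helper methods exist so that the shift distance is a variable, and their names are made up.

```java
public class ShiftBy32 {
    static int unsignedShift(int i, int n) { return i >>> n; }
    static int signedShift(int i, int n) { return i >> n; }

    public static void main(String[] args) {
        System.out.println(unsignedShift(5, 32));   // 5: distance 32 is masked to 0
        System.out.println(unsignedShift(-5, 32));  // -5
        System.out.println(signedShift(-5, 32));    // -5
        System.out.println(unsignedShift(5, 31));   // 0
        System.out.println(unsignedShift(-5, 31));  // 1: only the sign bit is left, in bit 0
        System.out.println(signedShift(-5, 31));    // -1
        System.out.println(5L >>> 32);              // 0: a long left operand uses six bits of distance
    }
}
```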
The answer to your first question is here: "Why is 1>>32 == 1?"
The answer to the second question, in short, is that this way the whole long value is used (not just a part of it). Note that it is probably also the fastest way to do this.
I know this question was answered long ago, but I tried an example to get more clarity, and I guess it'll help others too.
long x = 3231147483648L;
System.out.println(Long.toBinaryString(x));
System.out.println(Long.toBinaryString(x >>> 32));
System.out.println(Long.toBinaryString(x ^ (x >>> 32)));
System.out.println(Long.toBinaryString((int) (x ^ (x >>> 32))));
This prints:
101111000001001111011001011110001000000000
1011110000
101111000001001111011001011110000011110000
1001111011001011110000011110000
As @avrilfanomar mentions, this XORs the upper 32 bits of the long with the lower 32 bits, and the unsigned right shift operator is what brings the upper half down. Since we want to use this long field while calculating the hash code, directly casting the long to int would mean that long fields differing only in the upper 32 bits contribute the same value to the hash code. Two objects differing only in this field could then have the same hash code and be stored in the same bucket (with, say, a list to resolve collisions), which hurts the performance of hash-based collections. Hence this operation.
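A small sketch of that collision argument, with made-up names: two long values that differ only in the upper 32 bits collapse to the same int under a plain cast, but not under the folding expression. Long.hashCode(long) (Java 8 and later) is documented to compute the same value, so it can be used instead of writing the expression by hand.

```java
public class LongHash {
    // The folding idiom discussed above.
    static int fold(long f) {
        return (int) (f ^ (f >>> 32));
    }

    public static void main(String[] args) {
        long a = 1L;
        long b = 1L | (1L << 40);   // differs from a only in the upper 32 bits

        System.out.println((int) a == (int) b);          // true: the cast drops the difference
        System.out.println(fold(a) == fold(b));          // false
        System.out.println(fold(b) == Long.hashCode(b)); // true
    }
}
```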
I came across an interesting scenario while working with the bitwise shift operators. If the second operand is negative, how does the shift operation work?
That is, in a << b, "<<" shifts the bit pattern of a to the left by b bits. But if b is negative, shouldn't it be an error at runtime?
I am able to run the code below successfully, but I don't understand how it works.
public static void bitwiseleftShift(char testChar)
{
int val=testChar-'a';
int result= 1<<val;
System.out.println("bit wise shift of 1 with val="+val+" is "+result);
}
Input
bitwiseleftShift('A');// ASCII 65
bitwiseleftShift('0'); // ASCII 48
Results
bit wise shift of 1 with val=-32 is 1
bit wise shift of 1 with val=-49 is 32768
ASCII for 'a' is 97. Can someone help me understand how this works?
But if b is negative, shouldn't it be an error at runtime?
Not according to the Java Language Specification, section 15.19:
If the promoted type of the left-hand operand is int, only the five lowest-order bits of the right-hand operand are used as the shift distance. It is as if the right-hand operand were subjected to a bitwise logical AND operator & (§15.22.1) with the mask value 0x1f (0b11111). The shift distance actually used is therefore always in the range 0 to 31, inclusive.
So a shift of -32 actually ends up as a shift of 0, and a shift of -49 actually ends up as a shift of 15 - hence the results you saw.
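The arithmetic can be reproduced with a short sketch (the names here are illustrative, not from the question): masking the negative distance with 0x1f gives the distance that is actually used.

```java
public class NegativeShift {
    // Same operation as in the question: 1 shifted left by a possibly negative distance.
    static int shiftOne(int distance) {
        return 1 << distance;
    }

    public static void main(String[] args) {
        System.out.println(-32 & 0x1f);     // 0
        System.out.println(shiftOne(-32));  // 1
        System.out.println(-49 & 0x1f);     // 15
        System.out.println(shiftOne(-49));  // 32768
        System.out.println(shiftOne(-1));   // -2147483648: -1 & 0x1f == 31
    }
}
```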
I have the following division that I need to do often:
int index = pos / 64;
Division can be expensive at the CPU level. I am hoping there is a way to do that with a bitwise shift. I would also like to understand how you get from division to shift; in other words, I don't want to just memorize the bitwise expression.
int index = pos >> 6 will do it, but this is unnecessary. Any reasonable compiler will do this sort of thing for you. Certainly the Sun/Oracle compiler will.
The general rule is that i/(2^n) can be implemented with i >> n. Similarly i*(2^n) is i << n.
You need to be concerned with negative number representation if i is signed. E.g. twos-complement produces reasonable results (if right shift is arithmetic--sign bit copied). Signed magnitude does not.
The compiler will implement it for you in the most efficient way, as long as you understand what you need and ask the compiler to do exactly that. If a shift is the most efficient way in this case, the compiler will use a shift.
Keep in mind though that if you are performing signed division (i.e. pos is signed), then it cannot be fully implemented by a shift alone. A shift by itself will generate incorrect results for negative values of pos. If the compiler decides to use shifts for this operation, it will also have to perform some post-shift corrections on the intermediate result to make it agree with the requirements of the language specification.
For this reason, if you are really looking for maximum possible efficiency of your division operations, you have to remember not to use signed types thoughtlessly. Prefer to use unsigned types whenever possible, and use signed types only when you have to.
P.S. Java's integer division also truncates toward zero (JLS §15.17.2), so the above remarks apply to Java as well as to C/C++. For a negative dividend in 2's-complement representation, an arithmetic right shift rounds toward negative infinity instead: -7 / 2 is -3, whereas -7 >> 1 is -4. The shift therefore matches Math.floorDiv, not the / operator.
http://www.java-samples.com/showtutorial.php?tutorialid=58
For each power of 2 you want to divide by, right shift it once. So to divide by 4 you would right shift twice. To divide by 8 right shift 3 times. Divide by 16 right shift 4 times. 32 -> 5 times. 64 -> 6 times. So to divide by 64 you can right shift 6 times. myvalue = myvalue >> 6;
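Before swapping pos / 64 for pos >> 6, it is worth checking what happens when pos is negative. The sketch below uses made-up method names: the two forms agree for non-negative values, but for negative ones the shift rounds toward negative infinity (like Math.floorDiv) while / truncates toward zero.

```java
public class DivVsShift {
    static int divide(int pos) { return pos / 64; }
    static int shift(int pos) { return pos >> 6; }

    public static void main(String[] args) {
        System.out.println(divide(130));               // 2
        System.out.println(shift(130));                // 2
        System.out.println(divide(-130));              // -2: truncates toward zero
        System.out.println(shift(-130));               // -3: rounds toward negative infinity
        System.out.println(Math.floorDiv(-130, 64));   // -3
    }
}
```

If pos is known to be non-negative, as an index position usually is, the two forms are interchangeable.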
This is more of a language design question than a programming question.
The following is an excerpt from JLS 15.19 Shift Operators:
If the promoted type of the left-hand operand is int, only the five lowest-order bits of the right-hand operand are used as the shift distance.
If the promoted type of the left-hand operand is long, then only the six lowest-order bits of the right-hand operand are used as the shift distance.
This behavior is also specified in C#, and while I'm not sure if it's in the official spec for JavaScript (if there is one), it also holds there, at least in my own tests.
The consequence is that the following is true:
(1 << 32) == 1
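For illustration, the five-bit and six-bit rules side by side, in a minimal sketch with hypothetical helper names:

```java
public class IntVsLongShift {
    static int shiftInt(int value, int n) { return value << n; }
    static long shiftLong(long value, int n) { return value << n; }

    public static void main(String[] args) {
        System.out.println(shiftInt(1, 32));    // 1: 32 & 0x1f == 0
        System.out.println(shiftLong(1L, 32));  // 4294967296: 32 & 0x3f == 32
        System.out.println(shiftLong(1L, 64));  // 1: 64 & 0x3f == 0
    }
}
```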
I understand that this specification is probably "inspired" by the fact that the underlying hardware only takes 5 bits for the count operand when shifting 32-bit values (and 6 bits for 64-bit), and I can understand such behavior specified at the JVM level for example, but why would high level languages such as C# and Java retain this rather low-level behavior? Shouldn't they provide a more abstract view beyond the hardware implementation and behave more intuitively? (Even better if they can take a negative count to mean to shift in the OTHER direction!)
Java and C# are not fully "high-level". They try real hard to be such that they can be compiled into efficient code, in order to shine in micro-benchmarks. This is why they have the "value types" such as int instead of having, as default integer type, true integers, which would be objects in their own right, and not limited to a fixed range.
Hence, they mimic what the hardware does. They trim it a bit, in that they mandate masking, whereas C only allows it. Still, Java and C# are "medium-level" languages.
Because in most programming environments an integer is only 32 bits. So then 5 bits (which is enough to express 32 values) is already enough to shift the entire integer. A similar reasoning exists for a 64bit long: 6 bits is all you need to completely shift the entire value.
I can understand part of the confusion: if your right-hand operand is the result of a calculation that ends up with a value greater than 32, you might expect it to just shift all the bits rather than apply a mask.
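If the "shift everything out" behaviour is what you actually want, it is easy to write yourself. The helper below is purely hypothetical (nothing like it exists in the standard library under this name): it returns 0 once the distance reaches 32 and treats a negative distance as a logical shift in the other direction.

```java
public class IntuitiveShift {
    // Hypothetical helper: left shift for positive n, logical right shift for negative n,
    // and 0 once every bit has been shifted out.
    static int shift(int value, int n) {
        if (n >= 32 || n <= -32) {
            return 0;
        }
        return n >= 0 ? value << n : value >>> -n;
    }

    public static void main(String[] args) {
        System.out.println(shift(1, 4));    // 16
        System.out.println(shift(16, -4));  // 1
        System.out.println(shift(1, 32));   // 0
        System.out.println(1 << 32);        // 1: the built-in operator masks the distance
    }
}
```

The extra comparison is the price of the more abstract behaviour, which is presumably part of why the language left it out.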
C# and Java define shifting as using only the low-order bits of the shift count, as that's what both SPARC and x86 shift instructions do. Java was originally implemented by Sun on SPARC processors, and C# by Microsoft on x86.
In contrast, C/C++ leave the behavior of shift instructions undefined if the shift count is not in the range 0..31 (for a 32-bit int), allowing any behavior. That's because when C was first implemented, different hardware handled these differently. For example, on a VAX, shifting by a negative amount shifts the other direction. So with C, the compiler can just use the hardware shift instruction and do whatever it does.