Bitshift operators description in the Java Language Specification

As specified in JLS 8, §15.19:
If the promoted type of the left-hand operand is int, then only the five lowest-order bits of the right-hand operand are used as the shift distance. It is as if the right-hand operand were subjected to a bitwise logical AND operator & (§15.22.1) with the mask value 0x1f (0b11111). The shift distance actually used is therefore always in the range 0 to 31, inclusive.
I am not clear about the statement in bold. An example would be much appreciated.

It's Java exploiting compiler optimisations from the C and C++ worlds. For a 32-bit int, a shift distance of 32 or more would, taken literally, shift every bit out and leave 0 for a positive int. (For a negative left-hand operand, right-shifting in C and C++ is implementation-defined, and left-shifting is undefined.)
Whereas in C and C++ using a shift distance greater than 31 for a 32-bit int is in fact undefined behaviour, Java defines the behaviour precisely: the shift is performed with the distance taken modulo 32 (which is what the majority of C and C++ compilers do anyway). This is exactly what the JLS snippet you've quoted describes.
Extracting the five lowest-order bits of a number is equivalent to taking that number modulo 32.
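For example, here is a small sketch (the class name ShiftMaskDemo is mine, purely for illustration) showing the mask in action:
public class ShiftMaskDemo
{
    public static void main(String[] args)
    {
        System.out.println(1 << 32);                           // 1, because 32 & 0x1f == 0
        System.out.println(1 << 33);                           // 2, because 33 & 0x1f == 1
        System.out.println(1 << 35);                           // 8, because 35 & 0x1f == 3
        System.out.println((1 << 35) == (1 << (35 & 0x1f)));   // true
    }
}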

Related

How does integer type cast behave in Java for numbers beyond the range of integers?

Here is my program.
public class Foo
{
    public static void main(String[] args)
    {
        System.out.println((int) 2147483648l);
        System.out.println((int) 2147483648f);
    }
}
Here is the output.
-2147483648
2147483647
Why aren't 2147483648l and 2147483648f cast to the same int value? Can you explain what is going on here, or what concept in Java I need to understand to predict the output of casts like these?
These are examples of the Narrowing Primitive Conversion operation.
In your first example, long to int:
A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the sign of the resulting value to differ from the sign of the input value.
So your (int) 2147483648l is taking the 64 bits of the long:
00000000 00000000 00000000 00000000 10000000 00000000 00000000 00000000
...and dropping the top 32 bits entirely:
10000000 00000000 00000000 00000000
...and taking the remaining 32 bits as an int. Since the leftmost of those is now a sign bit (long and int are stored as two's complement), and since it happens to be set in your 2147483648l value, you end up with a negative number. Since no other bits are set, in two's complement, that means you have the lowest negative number int can represent: -2147483648.
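You can watch the bits being dropped directly; here's a small fragment (meant to go inside a main method) using the standard toBinaryString helpers:
long l = 2147483648l;
int i = (int) l;                                   // keeps only the low 32 bits
System.out.println(Long.toBinaryString(l));        // 10000000000000000000000000000000
System.out.println(Integer.toBinaryString(i));     // 10000000000000000000000000000000
System.out.println(i);                             // -2147483648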
The float to int example follows a more complex rule. The relevant parts for your value are:
...if the floating-point number is not an infinity, the floating-point value is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode (§4.2.3).
...[if] the value [is] too large (a positive value of large magnitude or positive infinity), [then] the result of the first step is the largest representable value of type int or long.
(But see the part of the spec linked above for the details.)
So since 2147483648f rounds to 2147483648, and 2147483648 is too large to fit in int, the largest value for int (2147483647) is used instead.
So in the long to int, it's bit fiddling; in the float to int, it's more mathematical.
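The clamping is easy to see in code; here's a fragment for a main method (the infinity and NaN cases come from the same spec section, not from the snippet quoted above):
System.out.println((int) 2147483648f);              // 2147483647, clamped to Integer.MAX_VALUE
System.out.println((int) Float.POSITIVE_INFINITY);  // 2147483647
System.out.println((int) Float.NEGATIVE_INFINITY);  // -2147483648
System.out.println((int) Float.NaN);                // 0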
In a comment you've asked:
Do you know why both (short) 32768 and (short) 32768f evaluate to -32768? I was expecting the latter to evaluate to 32767.
Excellent question, and that's where my "see the part of the spec linked above for the details" above comes in. :-) (short) 32768f is, in effect, (short)(int)32768f:
In the spec section linked above, under "A narrowing conversion of a floating-point number to an integral type T takes two steps:", it says
In the first step, the floating-point number is converted either to a long, if T is long, or to an int, if T is byte, short, char, or int...
and then later in Step 2's second bullet:
* If T is byte, char, or short, the result of the conversion is the result of a narrowing conversion to type T (§5.1.3) of the result of the first step.
So in step one, 32768f becomes 32768 (an int value), and then of course (short)32768 does the bit-chopping we saw in long => int above, giving us a short value of -32768.
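To check both steps in code, here's a small fragment for a main method (output shown in the comments):
System.out.println((short) 32768f);          // -32768: float -> int -> short
System.out.println((short) (int) 32768f);    // -32768: the same two steps written out
System.out.println((short) 32767f);          // 32767: fits in short, so no wrap-around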
Nice! It's wonderful to see the effects of design decisions presented in the way that you have.
2147483648l is a long, and the rule for converting a long that's too big for an int is to wrap around into the destination type. (Under the hood, the significant bits from the source type are simply discarded.)
2147483648f is a float, and the rule for converting a float that's too big for the destination type is to take the largest possible value of the destination type. Reference: Are Java integer-type primitive casts "capped" at the MAX_INT of the casting type?
The good thing about standards is that there are so many to choose from.

Bit wise shift operator with shift by negative number

I came across an interesting scenario when working with the bitwise shift operators. If the second operand is negative, how does the bitwise shift operation work?
That is, in a << b, "<<" shifts the bit pattern of a to the left by b bits. But if b is negative, shouldn't it be an error at runtime?
I am able to run the code below successfully, but I don't understand how it works.
public static void bitwiseleftShift(char testChar)
{
    int val = testChar - 'a';
    int result = 1 << val;
    System.out.println("bit wise shift of 1 with val=" + val + " is " + result);
}
Input
bitwiseleftShift('A');// ASCII 65
bitwiseleftShift('0'); // ASCII 48
Results
bit wise shift of 1 with val=-32 is 1
bit wise shift of 1 with val=-49 is 32768
ASCII for 'a' is 97. Can someone help me understand how this works?
But if b is negative, shouldn't it be an error at runtime?
Not according to the Java Language Specification, section 15.19:
If the promoted type of the left-hand operand is int, only the five lowest-order bits of the right-hand operand are used as the shift distance. It is as if the right-hand operand were subjected to a bitwise logical AND operator & (§15.22.1) with the mask value 0x1f (0b11111). The shift distance actually used is therefore always in the range 0 to 31, inclusive.
So a shift of -32 actually ends up as a shift of 0, and a shift of -49 actually ends up as a shift of 15 - hence the results you saw.
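You can verify the masking directly (a fragment for a main method; the 0x1f mask is the one from the JLS text above):
System.out.println(-32 & 0x1f);    // 0,  so 1 << -32 == 1 << 0  == 1
System.out.println(-49 & 0x1f);    // 15, so 1 << -49 == 1 << 15 == 32768
System.out.println(1 << -32);      // 1
System.out.println(1 << -49);      // 32768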

Why does bitwise AND of two short values result in an int value in Java?

short permissions = 0755;
short requested = 0700;
short result = permissions & requested;
I get a compiler error:
error possible loss of precision
found : int
required: short
If I'm not totally wrong, the result of a bitwise AND is as wide as the wider operand. Why is the result an int?
Would there be a performance hit if I cast to short?
(short) (permissions & requested)
The short answer (hah!) is, binary numeric promotion.
If any of the operands is of a reference type, unboxing conversion (§5.1.8) is performed. Then:
If either operand is of type double, the other is converted to double.
Otherwise, if either operand is of type float, the other is converted to float.
Otherwise, if either operand is of type long, the other is converted to long.
Otherwise, both operands are converted to type int.
If I'm not totally wrong, the result of a bitwise AND is as wide as the wider operand. Why is the result an int?
Because the Java Language Specification says that the result of non-long integer arithmetic is always an int. It was probably written that way in acknowledgment of the fact that 32-bit CPUs work like that internally anyway: they don't actually have a way to do arithmetic on shorts.
Would there be a performance hit, if I would cast to short?
For the reason given above: no - it will have to happen anyway.
I just wanted to add that you can actually avoid the cast if you use the arithmetic assignment operators. It's not faster or slower, just something that might be nice to know.
short permissions = 0755;
short requested = 0700;
short result = permissions;
result &= requested;
Actually, I suspect that you might take a performance hit. There are only Java bytecodes for bitwise operations on int and long values, so the short values in the permissions and requested variables need (in theory) to be sign-extended before the operation is performed.
(And besides, I think you will find that the native bitwise instructions are only available in 32-bit and 64-bit forms. If there are 8-bit or 16-bit versions, they will take the same number of clocks as the 32-bit version. The CPU datapaths will be at least 32 bits wide, and for and/or/xor there is no way to make narrower types work faster.)
In addition, even though the three variables have type short, the JVM will allocate the same number of bytes to store them. This is a consequence of the way that the JVM is designed.
So if your aim in using short was to save space or time, it probably won't help. But the only way to be sure is to use a profiler to compare the short and int versions of your application ... or better still, just forget about it.
The operands of the & operator are promoted to int, and therefore the result is an int, which has to be cast to short if you want to store it in result. I don't think there should be a performance penalty unless the compiler is bad at generating code.
From http://java.sun.com/docs/books/jls/third_edition/html/expressions.html#15.22.1:
When both operands of an operator &, ^, or | are of a type that is convertible (§5.1.8) to a primitive integral type, binary numeric promotion is first performed on the operands (§5.6.2).
From http://java.sun.com/docs/books/jls/third_edition/html/conversions.html#170983:
[...] Otherwise, both operands are converted to type int.
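Putting the pieces together, a small fragment (reusing the variable names from the question) showing what does and does not compile:
short permissions = 0755;
short requested = 0700;
// short bad = permissions & requested;          // does not compile: the result of & is int
short ok = (short) (permissions & requested);    // explicit narrowing cast
short alt = permissions;
alt &= requested;                                // compound assignment narrows back implicitly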

What's the reason high-level languages like C#/Java mask the bit shift count operand?

This is more of a language design rather than a programming question.
The following is an excerpt from JLS 15.19 Shift Operators:
If the promoted type of the left-hand operand is int, only the five lowest-order bits of the right-hand operand are used as the shift distance.
If the promoted type of the left-hand operand is long, then only the six lowest-order bits of the right-hand operand are used as the shift distance.
This behavior is also specified in C#, and while I'm not sure if it's in the official spec for Javascript (if there's one), it's also true based on my own test at least.
The consequence is that the following is true:
(1 << 32) == 1
I understand that this specification is probably "inspired" by the fact that the underlying hardware only takes 5 bits for the count operand when shifting 32-bit values (and 6 bits for 64-bit), and I can understand such behavior specified at the JVM level for example, but why would high level languages such as C# and Java retain this rather low-level behavior? Shouldn't they provide a more abstract view beyond the hardware implementation and behave more intuitively? (Even better if they can take a negative count to mean to shift in the OTHER direction!)
Java and C# are not fully "high-level". They try hard to be compilable into efficient code, in order to shine in micro-benchmarks. This is why they have "value types" such as int, instead of having, as the default integer type, true integers, which would be objects in their own right and not limited to a fixed range.
Hence, they mimic what the hardware does. They trim it a bit, in that they mandate masking, whereas C only allows it. Still, Java and C# are "medium-level" languages.
Because in most programming environments an integer is only 32 bits. So 5 bits (which is enough to express 32 values) is already enough to shift the entire integer. Similar reasoning applies to a 64-bit long: 6 bits is all you need to shift the entire value.
I can understand part of the confusion: if your right-hand operand is the result of a calculation that ends up with a value greater than 32, you might expect it to just shift all the bits rather than apply a mask.
C# and Java define shifting as using only the low-order bits of the shift count, as that's what both SPARC and x86 shift instructions do. Java was originally implemented by Sun on SPARC processors, and C# by Microsoft on x86.
In contrast, C/C++ leave the behaviour of shift instructions undefined if the shift count is not in the range 0..31 (for a 32-bit int), allowing any behaviour. That's because when C was first implemented, different hardware handled these differently. For example, on a VAX, shifting by a negative amount shifts the other direction. So with C, the compiler can just use the hardware shift instruction and do whatever it does.

Findbugs warning: Integer shift by 32 -- what does it mean?

I was scanning a third party source code using Findbugs (just to be cautious before integrating into it mine), and found the following warning:
long a = b << 32 | c
Bug: Integer shift by 32
Pattern id: ICAST_BAD_SHIFT_AMOUNT, type: BSHIFT, category: CORRECTNESS
The code performs an integer shift by a constant amount outside the range 0..31. The effect of this is to use the lower 5 bits of the integer value to decide how much to shift by. This probably isn't what was expected, and it is at least confusing.
Could anyone please explain what exactly the above means?
Thanks!
(I am quite a newbie in Java programming.)
From the Java Language Specification:
If the promoted type of the left-hand operand is int, only the five lowest-order bits of the right-hand operand are used as the shift distance. It is as if the right-hand operand were subjected to a bitwise logical AND operator & (§15.22.1) with the mask value 0x1f. The shift distance actually used is therefore always in the range 0 to 31, inclusive.
So if b is an int, the expression is identical to
long a = b | c;
which I highly doubt is what is intended. It should probably have been
long a = ((long) b << 32) | c;
(If b is already a long, the code is correct and FindBugs is mistaken about the bug).
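To make the difference concrete, here's a small fragment for a main method (the values 1 and 2 are arbitrary):
int b = 1, c = 2;
System.out.println((long) (b << 32 | c));     // 3: b << 32 is just b, so this is 1 | 2
System.out.println(((long) b << 32) | c);     // 4294967298: the intended packing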
Edited: The problem almost certainly stems from the fact that 'b' is an 'int' and not a 'long'.
In C, if 'b' is an int instead of a long and you shift left by 32 bits, you invoke undefined behaviour, so any result is permissible. Java defines things differently (as noted in the comment by Rasmus Faber and the chosen answer) and performs over-long shifts modulo the maximum number of bits that can be shifted. [It seems an odd way to do business; I'd probably have arranged for an exception in a language that has them. However, it is clearly defined, which is more important than exactly what the definition is.] The coercion to 64 bits doesn't occur while the expression is evaluated; it occurs when the expression is complete and the assignment happens.
The reference to 5 bits is ... intriguing. It means that if you shift an int left by, say, 48, or binary 110000, it is the same as shifting left by 16. Or, alternatively, for an int x, 'x << n' is the same as 'x << (n & 0x1f)', i.e. the shift distance is taken modulo 32.
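For instance (a fragment for a main method):
int x = 1;
System.out.println(x << 48);                  // 65536, the same as x << 16
System.out.println((x << 48) == (x << 16));   // true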
