Packing bytes into a long with |= is giving unexpected results

Packing bytes into a long with |= is giving unexpected results - java

I am trying to concatenate my byte[] data into a long variable. But for some reason, the code is not working as I expected.
I have this byte array which maximum size will be 8 bytes which are 64 bits, the same size a Long variable has so I am trying to concatenate this array into the long variable.
public static void main(String[] args) {
// TODO Auto-generated method stub
byte[] data = new byte[]{
(byte)0xD4,(byte)0x11,(byte)0x92,(byte)0x55,(byte)0xBC,(byte)0xF9
};
Long l = 0l;
for (int i =0; i<6; i++){
l |= data[i];
l <<=8;
String lon = String.format("%064d", new BigInteger(Long.toBinaryString((long)l)));
System.out.println(lon);
}
}
The results are:
1111111111111111111111111111111111111111111111111101010000000000
1111111111111111111111111111111111111111110101000001000100000000
1111111111111111111111111111111111111111111111111001001000000000
1111111111111111111111111111111111111111100100100101010100000000
1111111111111111111111111111111111111111111111111011110000000000
1111111111111111111111111111111111111111111111111111100100000000
When the final result should be something like
111111111111111110101000001000110010010010101011011110011111001
which is 0xD4,0x11,0x92,0x55,0xBC,0xF9

byte in Java is signed, and when you do long |= byte, the byte's value is promoted and the sign bit is extended, which essentially sets all those higher bits to 1 if the byte was a negative value.
You can do this instead:
l |= (data[i] & 255)
To force it into an int and kill the sign before it's then promoted to a long. Here is an example of this happening.
Details
Prerequisite: If the term "sign bit" does not make sense to you, then you must read What is “2's Complement”? first. I will not explain it here.
Consider:
byte b = (byte)0xB5;
long n = 0l;
n |= b; // analogous to your l |= data[i]
Note that n |= b is exactly equivalent to n = n | b (JLS 15.26.2) so we'll look at that.
So first n | b must be evaluated. But, n and b are different types.
According to JLS 15.22.1:
When both operands of an operator &, ^, or | are of a type that is convertible (§5.1.8) to a primitive integral type, binary numeric promotion is first performed on the operands (§5.6.2).
Both operands are convertible to primitive integral types, so we consult 5.6.2 to see what happens next. The relevant rules here are:
Widening primitive conversion (§5.1.2) is applied to convert either or both operands as specified by the following rules:
...
Otherwise, if either operand is of type long, the other is converted to long.
...
Ok, well, n is long, so according to this b must be now be converted to long using the rules specified in 5.1.2. The relevant rule there is:
A widening conversion of a signed integer value to an integral type T simply sign-extends the two's-complement representation of the integer value to fill the wider format.
Well byte is a signed integer value and its being converted to a long, so according to this the sign bit (highest bit) is simply extended to the left to fill the space. So this is what happens in our example (imagine 64 bits here I'm just saving space):
b = (byte)0xB5 10110101
b widened to long 111 ... 1111111110110101
n 000 ... 0000000000000000
n | b 111 ... 1111111110110101
And so n | b evaluates to 0xFFFFFFFFFFFFFFB5, not 0x00000000000000B5. That is, when that sign bit is extended and the OR operation is applied, you've got all those 1's there essentially overwriting all of the bits from the previous bytes you've OR'd in, and your final results, then, are incorrect.
It's all the result of byte being signed and Java requiring long | byte to be converted to long | long prior to performing the calculation.
If you're unclear on the implicit conversions happening here, here is the explicit version:
n = n | (long)b;
Details of workaround
So now consider the "workaround":
byte b = (byte)0xB5;
long n = 0l;
n |= (b & 255);
So here, we evaluate b & 255 first.
So from JLS 3.10.1 we see that the literal 255 is of type int.
This leaves us with byte & int. The rules are about the same as above although we invoke a slightly different case from 5.6.2:
Otherwise, both operands are converted to type int.
So as per those rules byte must be converted to an int first. So in this case we have:
(byte)0xB5 10110101
promote to int 11111111111111111111111110110101 (sign extended)
255 00000000000000000000000011111111
& 00000000000000000000000010110101
And the result is an int, which is signed, but as you can see, now its a positive number and its sign bit is 0.
Then the next step is to evaluate n | the byte we just converted. So again as per the above rules the new int is widened to a long, sign bit extended, but this time:
b & 255 00000000000000000000000010110101
convert to long 000 ... 0000000000000000000000000010110101
n 000 ... 0000000000000000000000000000000000
n | (b & 255) 000 ... 0000000000000000000000000010110101
And now we get the intended value.
The workaround works by converting b to an int as an intermediate step and setting the high 24 bits to 0, thus letting us convert that to a long without the original sign bit getting in the way.
If you're unclear on the implicit conversions happening here, here is the explicit version:
n = n | (long)((int)b & 255);
Other stuff
And also like maraca mentions in comments, swap the first two lines in your loop, otherwise you end up shifting the whole thing 8 bits too far to the left at the end (that's why your low 8 bits are zero).
Also I notice that your expected final result is padded with leading 1s. If that's what you want at the end you can start with -1L instead of 0L (in addition to the other fixes).

Related

Why Type Casting of larger variable to smaller variable results in modulo of larger variable by the range of smaller variable

Recently,while I was going through typecasting concept in java, I have seen that type casting of larger variable to smaller variable results in the modulo of larger variable by the range of smaller variable.Can anyone please explain this in detail why this is the case and is it true for any explicit type conversion?.
class conversion {
public static void main(String args[])
{
double a = 295.04;
int b = 300;
byte c = (byte) a;
byte d = (byte) b;
System.out.println(c + " " + d);
}
}
The above code gives the answer of d as 44 since 300 modulo 256 is 44.Please explain why this is the case and also what happens to the value of c?

It is a design decision that goes all the way back the the C programming language and possibly to C's antecedents too.
What happens when you convert from a larger integer type to a smaller integer type is that the top bits are lopped off.
Why? Originally (and currently) because that is what hardware integer instructions support.
The "other" logical way to do this (i.e. NOT the way that Java defines integer narrowing) would be to convert to that largest (or smallest) value representable in the smaller type; e.g.
// equivalent to real thin in real java
// b = (byte) (Math.max(Math.min(i, 127), -128))
would give +127 as the value of b. Incidentally, this is what happens when you convert a floating-point value to an integer value, and the value is too large. That is what is happening in your c example.
You also said:
The above code gives the answer of d as 44 since 300 modulo 256 is 44.
In fact, the correct calculation would be:
int d = ((b + 128) % 256) - 128;
That is because the range of the Java byte type is -128 to +127.
For completeness, the above behavior only happens in Java when the larger type is an integer type. If the larger type is a floating point type and the smaller one is an integer type, then a source value that is too large or too small (or an infinity) gets converted to the largest or smallest possible integer value for the target type; e.g.
double x = 1.0e200;
int i = (int) x; // assigns 'Integer.MAX_VALUE' to 'i'
And a NaN is converted to zero.
Reference:
Java 17 Language Specification: §5.1.3

Java, combining two integers to long results negative number

I am trying to combining two integers to a long in Java. Here is the code I am using:
Long combinedValue = (long) a << 32 | b;
When a = 0x03 and b = 0x1B56 ED23, I am able to get the expected value (combinedValue = 13343583523 in long).
However, when my a = 0x00 and b = 0xA2BF E1C7, I get a negative value, -1567628857, instead of 2730484167. Can anyone explain why shifting an integer 0 by 32 bits causes the first 32 bits become 0xFFFF FFFF?
Thanks

b is negative, too. That's what that constant means. What you probably want is ((long) a << 32) | (b & 0xFFFFFFFFL).

When you OR (long) a << 32 with b, if b is an int then it will be promoted to a long because the operation must be done between two values of the same type. This is called a widening conversion.
When this conversion from int to long happens, b will be sign extended, meaning that if the top bit is set then it will be copied into the top 32 bits of the 64 bit long value. This is what causes the top 32 bits to be 0xffffffff.

confusing in type conversion and giving unexpected result

public class Multicast {
public static void main(String[] args) {
System.out.println((int) (char) (byte) -2);
}
}
I am confuse about the type conversion and also giving unexpected result (it prints 65534, not -2 as expected).

The confusing result is due to the fact that char is an unsigned type. When -2 is converted to char, its bit representation becomes 1111111111111110 in binary, because two's complement representation is used. That representation becomes a positive 65534 when converted to decimal - the result you see printed by your program.

In Java, char is an unsigned type (and byte is a signed type). So (char)-2 is 65534. Then you widen that to be an int.

your confusion on the results of your code is partly due to sign-extension. I'll try to add a more detailed explanation that may help with your confusion.
char a = '\uffff';
byte b = (byte)a; // b = 0xFF
this DOES result in the loss of information. This is considered a narrowing conversion. Converting a char to a byte "simply discards all but the n lowest order bits".
The result is: 0xFFFF -> 0xFF
char c = (char)b; // c = 0xFFFF
Converting a byte to a char is considered a special conversion. It actually performs TWO conversions. First, the byte is SIGN-extended (the new high order bits are copied from the old sign bit) to an int (a normal widening conversion). Second, the int is converted to a char with a narrowing conversion.
The result is: 0xFF -> 0xFFFFFFFF -> 0xFFFF
int d = (int)c; // d = 0x0000FFFF
Converting a char to an int is considered a widening conversion. When a char type is widened to an integral type, it is ZERO-extended (the new high order bits are set to 0).
The result is: 0xFFFF -> 0x0000FFFF. When printed, this will give you 65535.

How to Convert Int to Unsigned Byte and Back

I need to convert a number into an unsigned byte. The number is always less than or equal to 255, and so it will fit in one byte.
I also need to convert that byte back into that number. How would I do that in Java? I've tried several ways and none work. Here's what I'm trying to do now:
int size = 5;
// Convert size int to binary
String sizeStr = Integer.toString(size);
byte binaryByte = Byte.valueOf(sizeStr);
and now to convert that byte back into the number:
Byte test = new Byte(binaryByte);
int msgSize = test.intValue();
Clearly, this does not work. For some reason, it always converts the number into 65. Any suggestions?

A byte is always signed in Java. You may get its unsigned value by binary-anding it with 0xFF, though:
int i = 234;
byte b = (byte) i;
System.out.println(b); // -22
int i2 = b & 0xFF;
System.out.println(i2); // 234

Java 8 provides Byte.toUnsignedInt to convert byte to int by unsigned conversion. In Oracle's JDK this is simply implemented as return ((int) x) & 0xff; because HotSpot already understands how to optimize this pattern, but it could be intrinsified on other VMs. More importantly, no prior knowledge is needed to understand what a call to toUnsignedInt(foo) does.
In total, Java 8 provides methods to convert byte and short to unsigned int and long, and int to unsigned long. A method to convert byte to unsigned short was deliberately omitted because the JVM only provides arithmetic on int and long anyway.
To convert an int back to a byte, just use a cast: (byte)someInt. The resulting narrowing primitive conversion will discard all but the last 8 bits.

If you just need to convert an expected 8-bit value from a signed int to an unsigned value, you can use simple bit shifting:
int signed = -119; // 11111111 11111111 11111111 10001001
/**
* Use unsigned right shift operator to drop unset bits in positions 8-31
*/
int psuedoUnsigned = (signed << 24) >>> 24; // 00000000 00000000 00000000 10001001 -> 137 base 10
/**
* Convert back to signed by using the sign-extension properties of the right shift operator
*/
int backToSigned = (psuedoUnsigned << 24) >> 24; // back to original bit pattern
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/op3.html
If using something other than int as the base type, you'll obviously need to adjust the shift amount: http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
Also, bear in mind that you can't use byte type, doing so will result in a signed value as mentioned by other answerers. The smallest primitive type you could use to represent an 8-bit unsigned value would be a short.

Except char, every other numerical data type in Java are signed.
As said in a previous answer, you can get the unsigned value by performing an and operation with 0xFF. In this answer, I'm going to explain how it happens.
int i = 234;
byte b = (byte) i;
System.out.println(b); // -22
int i2 = b & 0xFF;
// This is like casting b to int and perform and operation with 0xFF
System.out.println(i2); // 234
If your machine is 32-bit, then the int data type needs 32-bits to store values. byte needs only 8-bits.
The int variable i is represented in the memory as follows (as a 32-bit integer).
0{24}11101010
Then the byte variable b is represented as:
11101010
As bytes are signed, this value represent -22. (Search for 2's complement to learn more about how to represent negative integers in memory)
Then if you cast is to int it will still be -22 because casting preserves the sign of a number.
1{24}11101010
The the casted 32-bit value of b perform and operation with 0xFF.
1{24}11101010 & 0{24}11111111
=0{24}11101010
Then you get 234 as the answer.

The solution works fine (thanks!), but if you want to avoid casting and leave the low level work to the JDK, you can use a DataOutputStream to write your int's and a DataInputStream to read them back in. They are automatically treated as unsigned bytes then:
For converting int's to binary bytes;
ByteArrayOutputStream bos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(bos);
int val = 250;
dos.write(byteVal);
...
dos.flush();
Reading them back in:
// important to use a (non-Unicode!) encoding like US_ASCII or ISO-8859-1,
// i.e., one that uses one byte per character
ByteArrayInputStream bis = new ByteArrayInputStream(
bos.toString("ISO-8859-1").getBytes("ISO-8859-1"));
DataInputStream dis = new DataInputStream(bis);
int byteVal = dis.readUnsignedByte();
Esp. useful for handling binary data formats (e.g. flat message formats, etc.)

The Integer.toString(size) call converts into the char representation of your integer, i.e. the char '5'. The ASCII representation of that character is the value 65.
You need to parse the string back to an integer value first, e.g. by using Integer.parseInt, to get back the original int value.
As a bottom line, for a signed/unsigned conversion, it is best to leave String out of the picture and use bit manipulation as #JB suggests.

Even though it's too late, I'd like to give my input on this as it might clarify why the solution given by JB Nizet works. I stumbled upon this little problem working on a byte parser and to string conversion myself.
When you copy from a bigger size integral type to a smaller size integral type as this java doc says this happens:
https://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html#jls-5.1.3
A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the sign of the resulting value to differ from the sign of the input value.
You can be sure that a byte is an integral type as this java doc says
https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
byte: The byte data type is an 8-bit signed two's complement integer.
So in the case of casting an integer(32 bits) to a byte(8 bits), you just copy the last (least significant 8 bits) of that integer to the given byte variable.
int a = 128;
byte b = (byte)a; // Last 8 bits gets copied
System.out.println(b); // -128
Second part of the story involves how Java unary and binary operators promote operands.
https://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html#jls-5.6.2
Widening primitive conversion (§5.1.2) is applied to convert either or both operands as specified by the following rules:
If either operand is of type double, the other is converted to double.
Otherwise, if either operand is of type float, the other is converted to float.
Otherwise, if either operand is of type long, the other is converted to long.
Otherwise, both operands are converted to type int.
Rest assured, if you are working with integral type int and/or lower it'll be promoted to an int.
// byte b(0x80) gets promoted to int (0xFF80) by the & operator and then
// 0xFF80 & 0xFF (0xFF translates to 0x00FF) bitwise operation yields
// 0x0080
a = b & 0xFF;
System.out.println(a); // 128
I scratched my head around this too :). There is a good answer for this here by rgettman.
Bitwise operators in java only for integer and long?

If you want to use the primitive wrapper classes, this will work, but all java types are signed by default.
public static void main(String[] args) {
Integer i=5;
Byte b = Byte.valueOf(i+""); //converts i to String and calls Byte.valueOf()
System.out.println(b);
System.out.println(Integer.valueOf(b));
}

In terms of readability, I favor Guava's:
UnsignedBytes.checkedCast(long) to convert a signed number to an unsigned byte.
UnsignedBytes.toInt(byte) to convert an unsigned byte to a signed int.

Handling bytes and unsigned integers with BigInteger:
byte[] b = ... // your integer in big-endian
BigInteger ui = new BigInteger(b) // let BigInteger do the work
int i = ui.intValue() // unsigned value assigned to i

in java 7
public class Main {
public static void main(String[] args) {
byte b = -2;
int i = 0 ;
i = ( b & 0b1111_1111 ) ;
System.err.println(i);
}
}
result : 254

I have tested it and understood it.
In Java, the byte is signed, so 234 in one signed byte is -22, in binary, it is "11101010", signed bit has a "1", so with negative's presentation 2's complement, it becomes -22.
And operate with 0xFF, cast 234 to 2 byte signed(32 bit), keep all bit unchanged.
I use String to solve this:
int a = 14206;
byte[] b = String.valueOf(a).getBytes();
String c = new String(b);
System.out.println(Integer.valueOf(c));
and output is 14206.

How are integers cast to bytes in Java?

I know Java doesn't allow unsigned types, so I was wondering how it casts an integer to a byte. Say I have an integer a with a value of 255 and I cast the integer to a byte. Is the value represented in the byte 11111111? In other words, is the value treated more as a signed 8 bit integer, or does it just directly copy the last 8 bits of the integer?

This is called a narrowing primitive conversion. According to the spec:
A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the sign of the resulting value to differ from the sign of the input value.
So it's the second option you listed (directly copying the last 8 bits).
I am unsure from your question whether or not you are aware of how signed integral values are represented, so just to be safe I'll point out that the byte value 1111 1111 is equal to -1 in the two's complement system (which Java uses).

int i = 255;
byte b = (byte)i;
So the value of be in hex is 0xFF but the decimal value will be -1.
int i = 0xff00;
byte b = (byte)i;
The value of b now is 0x00. This shows that java takes the last byte of the integer. ie. the last 8 bits but this is signed.

or does it just directly copy the last
8 bits of the integer
yes, this is the way this casting works

The following fragment casts an int to a byte. If the integer’s value is larger than the range of a byte, it will be reduced modulo (the remainder of an integer division by the) byte’s range.
int a;
byte b;
// …
b = (byte) a;

Just a thought on what is said: Always mask your integer when converting to bytes with 0xFF (for ints). (Assuming myInt was assigned values from 0 to 255).
e.g.
char myByte = (char)(myInt & 0xFF);
why? if myInt is bigger than 255, just typecasting to byte returns a negative value (2's complement) which you don't want.

Byte is 8 bit. 8 bit can represent 256 numbers.(2 raise to 8=256)
Now first bit is used for sign. [if positive then first bit=0, if negative first bit= 1]
let's say you want to convert integer 1099 to byte. just devide 1099 by 256. remainder is your byte representation of int
examples
1099/256 => remainder= 75
-1099/256 =>remainder=-75
2049/256 => remainder= 1
reason why? look at this image http://i.stack.imgur.com/FYwqr.png

According to my understanding, you meant
Integer i=new Integer(2);
byte b=i; //will not work
final int i=2;
byte b=i; //fine
At last
Byte b=new Byte(2);
int a=b; //fine

for (int i=0; i <= 255; i++) {
byte b = (byte) i; // cast int values 0 to 255 to corresponding byte values
int neg = b; // neg will take on values 0..127, -128, -127, ..., -1
int pos = (int) (b & 0xFF); // pos will take on values 0..255
}
The conversion of a byte that contains a value bigger than 127 (i.e,. values 0x80 through 0xFF) to an int results in sign extension of the high-order bit of the byte value (i.e., bit 0x80). To remove the 'extra' one bits, use x & 0xFF; this forces bits higher than 0x80 (i.e., bits 0x100, 0x200, 0x400, ...) to zero but leaves the lower 8 bits as is.
You can also write these; they are all equivalent:
int pos = ((int) b) & 0xFF; // convert b to int first, then strip high bits
int pos = b & 0xFF; // done as int arithmetic -- the cast is not needed
Java automatically 'promotes' integer types whose size (in # of bits) is smaller than int to an int value when doing arithmetic. This is done to provide a more deterministic result (than say C, which is less constrained in its specification).
You may want to have a look at this question on casting a 'short'.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.