Binary data manipulation in Java

I spent the better part of a day chasing down a binary reconstruction bug, and I want to understand why it happened.
The specific line of code looked like this (dataBuffer is an array of bytes):
short data = (short) ((short)dataBuffer[curPos + 3] << 8 | ((short)dataBuffer[curPos + 2]));
It sporadically returned garbage until I added a mask to the low-order byte:
short data = (short) ((short)dataBuffer[curPos + 3] << 8 | (((short)dataBuffer[curPos + 2])) & 0xff);
So my interpretation is that the type cast from byte to short occasionally leaves behind trash in the high-order byte, causing issues when it's OR-ed... but that doesn't make a whole lot of sense.
This code was taken from C++ and worked great there... what am I missing?

It's sign extension. All byte values in Java are signed, so any byte value greater than 127 is actually a negative number. Thus, a byte value of, say, 0x90 (= 144 decimal) is actually treated as -112 when it is a byte. When it is widened to a short it becomes 0xff90 (still -112). You need to mask the value with 0xff to recover the desired bit pattern of 0x0090.
As an aside, you can eliminate a couple of casts from your second expression:
short data = (short) ((dataBuffer[curPos + 3] << 8) | (dataBuffer[curPos + 2] & 0xff));
Those casts are, in fact, quite useless. Operands to bitwise operators are always promoted to int (or to long, if either operand is long) before the operator is applied.
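To make the sign-extension effect concrete, here is a minimal sketch. The class and method names are made up for illustration, and the byte values 0x90 and 0x12 are arbitrary test inputs, not taken from the question's data.

```java
class SignExtensionDemo {
    // Combines two bytes WITHOUT masking the low byte.
    // A negative low byte is sign-extended to 0xFFFFFFxx and wipes out the high byte.
    static short combineUnmasked(byte low, byte high) {
        return (short) ((high << 8) | low);
    }

    // Combines two bytes with the low byte masked to its 8-bit pattern.
    static short combineMasked(byte low, byte high) {
        return (short) ((high << 8) | (low & 0xff));
    }

    public static void main(String[] args) {
        byte low = (byte) 0x90;   // -112 when read as a signed byte
        byte high = 0x12;
        // The "& 0xffff" here is only for printing the 16-bit pattern.
        System.out.printf("unmasked: 0x%04X%n", combineUnmasked(low, high) & 0xffff); // 0xFF90
        System.out.printf("masked:   0x%04X%n", combineMasked(low, high) & 0xffff);   // 0x1290
    }
}
```

The bug only shows up when the low byte is 0x80 or above, which is why the original code failed sporadically rather than always.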

Related

Java generate code byte 256*256 & 0xff

I need to generate a 3-bytes code (like A502F1). I am given a criteria:
1st byte is (serialCodeNumber / (256*256) ) & 0xFF
2nd is (serialCodeNumber / 256) & 0xFF
3rd is (serialCodeNumber) & 0xFF
serialCodeNumber is a sequence 1-0xFFF
What does that mean!?
I would generate it like this:
String codeNum = new BigInteger(256, random).toString(16).toUpperCase().substring(0, 6);
But what is the right way of doing it as the requirement says?

I'm not quite sure what is meant by the serialCodeNumber, since if it is later divided by 65536 (256*256) it has to be considerably larger than 0xFFF (which is 4095) for that to make any reasonable sense.
But let's take a look at the conditions, they would all make sense once you are accustomed to the bitwise AND operator. A good read is available here on how it works but the meat of the matter from that question in my opinion is this sentence by Markus Jarderot:
The result is the bits that are turned on in both numbers.
Your conditions use & 0xFF, and 0xFF is 255, or 11111111 in binary: the lowest eight bits, all turned on. ANDing with it is a neat trick to keep only the lowest 8 bits of any number. And as we all know, 8 bits make up a byte. (Are you starting to see where this is all coming together now?)
As for the conditions before the & 0xFF, some might recognize them as bit shift operations hidden behind divisions (the two forms agree for non-negative values):
(serialCodeNumber / (256*256)) is equivalent to (serialCodeNumber >> 16)
and
(serialCodeNumber / 256) is equivalent to (serialCodeNumber >> 8)
But that is not that important in this case.
So the first condition takes the serialCodeNumber, divides it by 65536 (256*256), keeps the 8 rightmost bits, ignores the rest, and constructs a byte from those 8 bits.
In Java you can pretty much just write the condition as it is:
byte myFirstByte = (byte) ((serialCodeNumber / (256*256)) & 0xFF);
The other conditions aren't much different:
byte mySecondByte = (byte) ((serialCodeNumber / (256)) & 0xFF);
and
byte myThirdByte = (byte) ((serialCodeNumber) & 0xFF);
Once you have all three of your bytes, I'm assuming you need to convert them to a hex String. So I'll add them into a byte array.
byte[] myArray = {myFirstByte,mySecondByte,myThirdByte};
And borrow some method on how to convert byte arrays to HEX strings from this question.
String codeNum = bytesToHex(myArray);
And the result will look something like this:
F03DD7
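Putting the three conditions together, a minimal sketch might look like the following. The class and method names are hypothetical, and String.format is used here in place of the borrowed bytesToHex helper, which is not shown in this answer.

```java
class SerialCodeDemo {
    // Splits serialCodeNumber into three bytes exactly as the requirement
    // states, then renders them as six upper-case hex digits.
    static String toThreeByteCode(int serialCodeNumber) {
        int first = (serialCodeNumber / (256 * 256)) & 0xFF;
        int second = (serialCodeNumber / 256) & 0xFF;
        int third = serialCodeNumber & 0xFF;
        // %02X pads each byte to two hex digits so leading zeros are kept.
        return String.format("%02X%02X%02X", first, second, third);
    }

    public static void main(String[] args) {
        System.out.println(toThreeByteCode(0xA502F1)); // A502F1
        System.out.println(toThreeByteCode(0xFFF));    // 000FFF
    }
}
```

Keeping the intermediate values as int avoids the signed-byte problem when formatting; casting to byte first would require masking again before printing.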
EDIT:
Since you have to generate a serial number that has to be up to 6 bytes in value, I'd recommend using a long number.
A 6 byte number will be anywhere from 1 to 281474976710655, so you probably need to generate one randomly.
First instantiate a Random object which you will be able to poll numbers from:
Random random = new Random();
Once you have that, poll a long from it for the range 1 to 281474976710655.
For this you can borrow KennyTM's answer from this question.
So you can then generate the number like so:
long serialCodeNumber = nextLong(random, 281474976710655L)+1L;
We add the +1L at the end since we want it to include the last number as well as start from 1 instead of 0.
If you ever need to show a HEX string of the serialCodeNumber you can then just call:
String serialHex = Long.toHexString(serialCodeNumber);
But make sure to add any additional "0"s at the left side based on the length of the string so that it is 6-bytes = 12 characters long.
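As a sketch of the padding step, String.format can add the leading zeros directly. The example below also assumes ThreadLocalRandom.nextLong(origin, bound) as a stand-in for the borrowed nextLong helper, which is not shown here; the class, constant, and method names are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

class SerialHexDemo {
    // Largest value that fits in 6 bytes: 281474976710655.
    static final long MAX_SIX_BYTE = 0xFFFFFFFFFFFFL;

    // Picks a value in [1, MAX_SIX_BYTE]. The bound argument is exclusive,
    // so 1 is added to make the maximum reachable.
    static long randomSerial() {
        return ThreadLocalRandom.current().nextLong(1L, MAX_SIX_BYTE + 1L);
    }

    // Renders the value as exactly 12 hex digits, left-padded with zeros.
    static String toPaddedHex(long serial) {
        return String.format("%012X", serial);
    }

    public static void main(String[] args) {
        System.out.println(toPaddedHex(1L));           // 000000000001
        System.out.println(toPaddedHex(MAX_SIX_BYTE)); // FFFFFFFFFFFF
        System.out.println(toPaddedHex(randomSerial()));
    }
}
```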

Reading bytes in Java

I am trying to understand how the following line of code works:
for (int i = 0; i < numSamples; i++) {
short ampValue = 0;
for (int byteNo = 0; byteNo < 2; byteNo++) {
ampValue |= (short) ((data[pointer++] & 0xFF) << (byteNo * 8));
}
amplitudes[i] = ampValue;
}
As far as I understand, this is reading 2 bytes (as there are 2 bytes per sample) in an inclusive manner, i.e. the ampValue is composed of two byte reads. The data is the actual data sample (file) and the pointer is increasing to read it up to the last sample. But I don't understand this part:
"data[pointer++] & 0xFF) << (byteNo * 8)); "
Also, I am wondering whether it makes any difference if I want to read this as a double instead of short?

Looks like data[] is the array of bytes.
data[pointer++] gives you a byte value in the range [-128..127].
0xFF is an int constant, so...
data[pointer++] & 0xFF promotes the byte value to an int value in the range [-128..127]. Then the & operator zeroes out all of the bits that are not set in 0xFF (i.e., it zeroes out the 24 upper bits, leaving only the low 8 bits).
The value of that expression now will be in the range [0..255].
The << operator shifts the result to the left by (byteNo * 8) bits. That's the same as saying, it multiplies the value by 2 raised to the power of (byteNo * 8). When byteNo==0, it will multiply by 2 to the power 0 (i.e., it will multiply by 1). When byteNo==1, it will multiply by 2 to the power 8 (i.e., it will multiply by 256).
This loop builds a 16-bit value from each pair of bytes in the array, taking the first member of each pair as the low-order byte and the second member as the high-order byte. As an int the combined value would be in the range [0..65535], but because ampValue is a short it is stored as a signed value in the range [-32768..32767].
It won't work to declare ampValue as double, because the |= operator will not work on a double, but you can declare the amplitudes[] array to be an array of double, and the assignment amplitudes[i] = ampValue will implicitly promote the value to a double in the range [-32768.0..32767.0].
Additional info: Don't overlook @KevinKrumwiede's comment about a bug in the example.
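The same little-endian assembly can be sketched in two ways: by hand with mask and shift, and with java.nio.ByteBuffer. The class and method names below are hypothetical, and the input bytes are arbitrary test values; the sketch assumes the data length is a multiple of 2.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class SampleReaderDemo {
    // Manual version: mask each byte to 0..255, shift the second byte up by 8 bits.
    static short[] readSamples(byte[] data) {
        short[] amplitudes = new short[data.length / 2];
        int pointer = 0;
        for (int i = 0; i < amplitudes.length; i++) {
            int low = data[pointer++] & 0xFF;   // first byte is the low-order byte
            int high = data[pointer++] & 0xFF;  // second byte is the high-order byte
            amplitudes[i] = (short) (low | (high << 8));
        }
        return amplitudes;
    }

    // Library version: ByteBuffer does the same assembly once the byte order is set.
    static short[] readSamplesWithByteBuffer(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        short[] amplitudes = new short[data.length / 2];
        for (int i = 0; i < amplitudes.length; i++) {
            amplitudes[i] = buffer.getShort();
        }
        return amplitudes;
    }

    public static void main(String[] args) {
        byte[] data = {0x34, 0x12, (byte) 0xFE, (byte) 0xFF};
        short[] samples = readSamples(data);
        System.out.println(samples[0]); // 4660 (0x1234)
        System.out.println(samples[1]); // -2   (0xFFFE read as a signed short)
    }
}
```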

In Java, all bytes are signed. The expression (data[pointer++] & 0xFF) converts the signed byte value to an int with the value of the byte if it were unsigned. Then the expression << (byteNo * 8) left-shifts the resulting value by zero or eight bits depending on the value of byteNo. The value of the whole expression is assigned with bitwise or to ampValue.
There appears to be a bug in this code. The value of ampValue is not reset to zero between iterations. And amplitude is not used. Are those identifiers supposed to be the same?

Let's break down the statement:
|= is the bitwise or and assignment operator. a |= b is equivalent to a = a | b.
(short) casts the int result of the masked and shifted expression to a short.
pointer++ is a post-increment operation. The value of pointer will be returned and used and then immediately incremented every single time it's accessed in this fashion - this is beneficial in this case because the outer-loop is cycling through 2-byte samples (via the inner loop) from the contiguous data buffer, so this keeps incrementing.
& is the bitwise AND operator and 0xFF is the hexadecimal value for the byte 0b11111111 (255 in decimal); the expression data[pointer++] & 0xFF is basically saying, for each bit in the byte retrieved from the data array, AND it with 1. In this context, it forces Java, which treats every byte as signed (i.e. values from -128 to 127 in decimal), to return the value as if it were an unsigned byte (i.e. values from 0 to 255 in decimal).
Since your samples are 2 bytes long, you need to shift the second lot of 8 bits left, as the most significant bits, using the left bit-shift operator <<. The byteNo * 8 ensures that you're only shifting bits when it's the second of the two bytes.
After the two bytes have been read, ampValue will now contain the value of the sample as a short.

Combine 2 8 bit byte array positions to a single 16 bit integer

I have a byte array in which a value is stored as a 16bit unsigned integer. This is spread across two positions in my byte array, DataArray[11] and DataArray[12]. The documentation I have for the packet which contains the byte array tells me that the value I need to extract is packed least significant bit first. I'm having trouble wrapping my head around bitmasks and bit shifting, and I'm actually unclear if I need to use one or the other, or both.
This is what I have so far, but the results don't seem right:
int result = (DataArray[11] << 8 | DataArray[12]) & 0xFF;

You're trying to get a 16-bit integer, right? But you're masking it using & 0xff - which limits you to 8 bits. I suggest you mask each byte rather than the result:
int result = (DataArray[11] & 0xff) |
((DataArray[12] & 0xff) << 8);
I've included more parentheses here than are probably required, just for the sake of sanity and not needing to worry about precedence.
I've also swapped the ordering so that you're shifting DataArray[12] rather than DataArray[11], as it's meant to be least-significant byte first.
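For comparison, here is a small sketch that wraps the corrected expression in a helper. The method name and the sample bytes are hypothetical; only the positions 11 and 12 come from the question.

```java
class Uint16Demo {
    // Reads an unsigned 16-bit value stored least-significant byte first.
    // Each byte is masked BEFORE combining so sign extension cannot leak in.
    static int readUint16LittleEndian(byte[] dataArray, int offset) {
        return (dataArray[offset] & 0xff) | ((dataArray[offset + 1] & 0xff) << 8);
    }

    public static void main(String[] args) {
        byte[] dataArray = new byte[13];
        dataArray[11] = 0x34;         // low-order byte
        dataArray[12] = (byte) 0xF2;  // high-order byte, negative as a signed byte
        System.out.println(readUint16LittleEndian(dataArray, 11)); // 62004 (0xF234)
    }
}
```

The result is kept in an int because Java has no unsigned 16-bit type other than char; a short would turn values above 32767 negative.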

Understanding signed numbers and complements in java

I have a 3 byte signed number that I need to determine the value of in Java. I believe it is signed with one's complement but I'm not 100% sure (I haven't studied this stuff in more than 10 years and the documentation of my problem isn't super clear). I think the problem I'm having is Java does everything in two's complement. I have a specific example to show:
The original 3-byte number: 0xEE1B17
Parsed as an integer (Integer.parseInt(s, 16)) this becomes: 15604503
If I do a simple bit flip (~) of this I get (I think) a two's complement representation: -15604504
But the value I should be getting is: -1172713
What I think is happening is I'm getting the two's complement of the entire int and not just the 3 bytes of the int, but I don't know how to fix this.
What I have been able to do is convert the integer to a binary string (Integer.toBinaryString()) and then manually "flip" all of the 0s to 1s and vice versa. When I then parse this binary string (Integer.parseInt(s, 2)) I get 1172712, which is very close. In all of the other examples I always need to add 1 to the result to get the answer.
Can anyone diagnose what type of signed number encoding is being used here and if there is a solution other than manually flipping every character of a string? I feel like there must be a much more elegant way to do this.
EDIT: All of the responders have helped in different ways, but my general question was how to flip a 3-byte number, and @louis-wasserman answered this and answered first, so I'm marking his answer as the solution. Thanks to everyone for the help!

If you want to flip the low three bytes of a Java int, then you just do ^ 0x00FFFFFF.
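A minimal sketch of that XOR applied to the value from the question (the class and method names are hypothetical):

```java
class FlipLow24Demo {
    // Flips only the low 24 bits; the top byte of the int is left untouched.
    static int flipLow24(int value) {
        return value ^ 0x00FFFFFF;
    }

    public static void main(String[] args) {
        int flipped = flipLow24(0xEE1B17);
        System.out.println(flipped);        // 1172712
        // Two's complement negation is "flip, then add 1", which explains
        // why the manual string flip was always off by one.
        System.out.println(-(flipped + 1)); // -1172713
    }
}
```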

0xFFEE1B17 is -1172713
You must only add the leading byte. FF if the highest bit of the 3-byte value is set and 00 otherwise.
A method which converts your 3-byte value to a proper int could look like this:
if (byte3val > 0x7FFFFF)
return byte3val | 0xFF000000;
else
return byte3val;

Negative signed numbers are defined so that a + (-a) = 0. So it means that all bits are flipped and then 1 added. See Two's complement. You can check that the condition is satisfied by this process by thinking what happens when you add a + ~a + 1.
You can recognize that a number is negative by its most significant bit. So if you need to convert a signed 3-byte number into a 4-byte number, you can do it by checking the bit and if it's set, set also the bits of the fourth byte:
if ((a & 0x800000) != 0)
a = a | 0xff000000;
You can do it also in a single expression, which will most likely perform better, because there is no branching in the computation (branching doesn't play well with pipelining in current CPUs):
a = (a << 8) >> 8;
Here << and >> perform bit shifts. First we shift the number 8 bits to the left (so now it occupies the 3 "upper" bytes instead of the 3 "lower" ones), and then shift it back. The trick is that >> is a so-called arithmetic shift, also known as a signed shift. It copies the most significant bit to all bits that are made vacant by the operation, which is exactly what keeps the sign of the number. Indeed:
(0x1ffffe << 8) >> 8 -> 2097150
(0xfffffe << 8) >> 8 -> -2
Just note that Java also has an unsigned right shift operator >>>. For more information, see Java Tutorial: Bitwise and Bit Shift Operators.
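Both approaches to sign-extending a 24-bit value can be sketched side by side. The class and method names are hypothetical; the shift version is written as (value << 8) >> 8, and the inputs are the values discussed in this thread.

```java
class SignExtend24Demo {
    // Branching version: if bit 23 (the 24-bit sign bit) is set, fill the top byte with ones.
    static int signExtendWithBranch(int value) {
        if ((value & 0x800000) != 0) {
            return value | 0xFF000000;
        }
        return value;
    }

    // Shift version: move the 24-bit value to the top of the int, then let
    // the arithmetic right shift copy the sign bit back down.
    static int signExtendWithShifts(int value) {
        return (value << 8) >> 8;
    }

    public static void main(String[] args) {
        System.out.println(signExtendWithBranch(0xEE1B17)); // -1172713
        System.out.println(signExtendWithShifts(0xEE1B17)); // -1172713
        System.out.println(signExtendWithShifts(0x1FFFFE)); // 2097150
        System.out.println(signExtendWithShifts(0xFFFFFE)); // -2
    }
}
```

Using >>> instead of >> in the second method would fill the vacated bits with zeros and return the original unsigned value, so the choice of operator matters here.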

Bitwise AND, Bitwise Inclusive OR question, in Java

I've a few lines of code within a project, that I can't see the value of...
buffer[i] = (currentByte & 0x7F) | (currentByte & 0x80);
It reads the filebuffer from a file, stored as bytes, and then transfers them to buffer[i] as shown, but I can't understand what the overall purpose is. Any ideas?
Thanks

As the other answers already stated, (currentByte & 0x7F) | (currentByte & 0x80) is equivalent to (currentByte & 0xFF). The JLS3 15.22.1 says this is promoted to an int:
"When both operands of an operator &, ^, or | are of a type that is convertible (§5.1.8) to a primitive integral type, binary numeric promotion is first performed on the operands (§5.6.2). The type of the bitwise operator expression is the promoted type of the operands."
because JLS3 5.6.2 says that when currentByte has type byte and 0x7F is an int (and this is the case), then both operands are promoted to int.
Therefore, buffer will be an array of element type int or wider.
Now, by performing & 0xFF on an int, we effectively map the original byte range -128..127 into the unsigned range 0..255, an operation often used by java.io streams for example.
You can see this in action in the following code snippet. Note that to understand what is happening here, you have to know that Java stores integral types, except char, as 2's complement values.
byte b = -123;
int r = b;
System.out.println(r + "= " + Integer.toBinaryString(r));
int r2 = b & 0xFF;
System.out.println(r2 + "= " + Integer.toBinaryString(r2));
Finally, for a real-world example, check out the Javadoc and implementation of the read method of java.io.ByteArrayInputStream:
/**
* Reads the next byte of data from this input stream. The value
* byte is returned as an <code>int</code> in the range
* <code>0</code> to <code>255</code>. If no byte is available
* because the end of the stream has been reached, the value
* <code>-1</code> is returned.
*/
public synchronized int read() {
return (pos < count) ? (buf[pos++] & 0xff) : -1;
}

(currentByte & 0x7F) | (currentByte & 0x80)
is equivalent to
currentByte & (0x7F | 0x80)
which equals
currentByte & 0xFF
which is exactly the same as
currentByte
Edit: I only looked at the right side of the assignment, and I still think the equivalence is true.
However, it seems like the code wants to cast the signed byte to a larger type while interpreting the byte as unsigned.
Is there an easier way to cast signed-byte to unsigned in java?
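One option, assuming Java 8 or later, is Byte.toUnsignedInt, which gives the same result as the & 0xFF mask. A minimal sketch (the class and method names are hypothetical):

```java
class UnsignedByteDemo {
    // Classic idiom: promotion to int followed by a mask of the low 8 bits.
    static int viaMask(byte b) {
        return b & 0xFF;
    }

    // Library method available since Java 8; equivalent to the mask above.
    static int viaLibrary(byte b) {
        return Byte.toUnsignedInt(b);
    }

    public static void main(String[] args) {
        byte b = -123;
        System.out.println(viaMask(b));    // 133
        System.out.println(viaLibrary(b)); // 133
    }
}
```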

I think someone did too much thinking here. That's just not right.
I have but one remark
The original author was worried about the run-time replacing the byte with a native signed integer (presumably 32-bit) and is explicitly trying to tell us something about the sign bit being "special"?
It's code left behind. Unless you know you're on a fishy run-time? What's the type of the 'buffer' anyway?

The complicated bitwise logic is completely superfluous.
for (int i = 0; i < buffer.length; i++) {
buffer[i] = filebuffer[currentPosition + i] & 0xff;
}
does the same thing. If buffer is declared as an array of bytes you may even leave out the & 0xff, but unfortunately the declaration is not shown.
The reason may be that the original developer was confused by bytes being signed in Java.

The result of a bitwise AND operation has a 1 in each position where both bits are 1, while the result of a bitwise OR operation has a 1 in each position where either one of the two bits is 1.
So an example evaluation for the value 0x65:
  01100101 0x65
& 01111111 0x7F
===============
  01100101 0x65

  01100101 0x65
& 10000000 0x80
===============
  00000000 0x00

  01100101 0x65
| 00000000 0x00
===============
  01100101 0x65

The good thing about these kinds of logical operations: you can try every possible combination (all 256 of them) and verify that you get the answer you expected.
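Such an exhaustive check is short to write. The sketch below (hypothetical class and method names) confirms that the split expression from the question matches a plain & 0xFF for every possible byte:

```java
class ExhaustiveCheckDemo {
    // Returns true if (b & 0x7F) | (b & 0x80) equals b & 0xFF for all 256 byte values.
    static boolean masksAreEquivalent() {
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte currentByte = (byte) i;
            int split = (currentByte & 0x7F) | (currentByte & 0x80);
            int whole = currentByte & 0xFF;
            if (split != whole) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(masksAreEquivalent()); // true
    }
}
```

The loop counter is an int rather than a byte because a byte counter would overflow at 127 and never terminate.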

Turns out, the file which the byte was being read from was in a signed bit notation, and of a different length, therefore it was required to perform this task to allow it to be extended to the Java int type while retaining its correct sign :)
