Java and unsigned values - java

I'm parsing unsigned values from a DatagramSocket. I have a total of 24 bits (3 bytes) coming in: one unsigned 8-bit integer followed by a 16-bit signed integer. But Java never stores anything other than a signed byte in a byte or byte array? When java takes in these values, do you lose that last 8th bit?
int port = 666;
DatagramSocket serverSocket = new DatagramSocket(port);
byte[] receiveData = new byte[3]; // <-- Now at this moment I lost my 8th bit
System.out.println("Binary Server Listening on Port: " + port);
while (true)
{
    DatagramPacket receivePacket = new DatagramPacket(receiveData, receiveData.length);
    serverSocket.receive(receivePacket);
    byte[] bArray = receivePacket.getData();
    byte b = bArray[0];
}
Did I now lose this 8th bit since I turned it into a byte? Was it wrong I initialized a byte array of 3 bytes?

When java takes in these values, do you lose that last 8th bit?
No. You just end up with a negative value when it's set.
So to get a value between 0 and 255, it's simplest to use something like this:
int b = bArray[0] & 0xff;
First the byte is promoted to an int, which sign-extends it. If the high bit of the original value is 1, that leaves 25 leading 1 bits (24 from the extension plus the original high bit). The & 0xff then clears the upper 24 bits again :)
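For the 3-byte packet in the question, a minimal sketch of reading both fields might look like this. It assumes the 16-bit value arrives high byte first (big-endian, the usual network order); the question does not say, so swap the two arguments if the sender uses the other order. The class and method names are made up for illustration.

```java
final class PacketFields {
    // Reads an unsigned 8-bit value (0..255) out of a signed Java byte.
    static int unsigned8(byte b) {
        return b & 0xFF;
    }

    // Builds a signed 16-bit value from two bytes.
    // ASSUMPTION: big-endian order (high byte first).
    static short signed16(byte high, byte low) {
        return (short) (((high & 0xFF) << 8) | (low & 0xFF));
    }

    public static void main(String[] args) {
        // Stand-in for receivePacket.getData(): 0xC8, then 0xFFFE.
        byte[] packet = { (byte) 0xC8, (byte) 0xFF, (byte) 0xFE };

        System.out.println(packet[0]);                       // -56 (signed view)
        System.out.println(unsigned8(packet[0]));            // 200 (same bits, unsigned view)
        System.out.println(signed16(packet[1], packet[2]));  // -2
    }
}
```

All 8 bits are still in the byte; only the interpretation changes when you mask.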

No, you do not lose the 8th bit. But unfortunately, Java has two "features" which make it harder than reasonable to deal with such values:
all of its integer primitive types are signed (the only exception is char, which is an unsigned 16-bit type);
when widening a primitive type to another primitive type with a greater size (for instance, reading a byte into an int as is the case here), the sign bit of the smaller type is extended into the extra bits.
Which means that, for instance, if you read byte 0x80, which translates in binary as:
1000 0000
when you read it as an integer, you get:
1111 1111 1111 1111 1111 1111 1000 0000
^
This freaking bit gets expanded!
whereas you really wanted:
0000 0000 0000 0000 0000 0000 1000 0000
ie, integer value 128. You therefore MUST mask it:
int b = array[0] & 0xff;
1111 1111 1111 1111 1111 1111 1000 0000 <-- byte read as an int, your original value of b
0000 0000 0000 0000 0000 0000 1111 1111 <-- mask (0xff)
--------------------------------------- <-- anded, give
0000 0000 0000 0000 0000 0000 1000 0000 <-- expected result
Sad, but true.
More generally: if you wish to manipulate a lot of byte-oriented data, I suggest you have a look at ByteBuffer, it can help a lot. Unfortunately, this won't save you from bitmask manipulations; it just makes it easier to read a given number of bytes at a time (as primitive types).
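As a rough sketch of the ByteBuffer route for the packet in the question: the byte order is an assumption (big-endian, which is also ByteBuffer's default), and the class and method names are hypothetical. Note that the 8-bit field still needs the mask, because get() returns a signed byte.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteBufferFields {
    // Returns { unsigned 8-bit field, signed 16-bit field } from a 3-byte payload.
    // ASSUMPTION: the 16-bit field is big-endian.
    static int[] parse(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload).order(ByteOrder.BIG_ENDIAN);
        int first = buf.get() & 0xFF;   // get() yields a signed byte, so mask it
        short second = buf.getShort();  // already a signed 16-bit value
        return new int[] { first, second };
    }

    public static void main(String[] args) {
        int[] fields = parse(new byte[] { (byte) 0x80, 0x01, 0x00 });
        System.out.println(fields[0]); // 128
        System.out.println(fields[1]); // 256
    }
}
```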

In Java, byte (as well as short, int and long) is a signed numeric data type only. However, this does not imply any loss of data when treating the values as unsigned binary data. As your illustration shows, 10000000 is -128 when read as a signed number. If you are dealing with binary data, just treat it in its binary form and you will be fine.

Related

how to convert integer to hex signed 2's complement:

Pretty basic stuff i am sure but bits are not my forte.
So for some internal calculation i am trying to convert a given input ( constraint is that it would be a integer string for sure) into its hex equivalent, what stumped me is on how to get
Hex signed 2's complement:
My noob code:
private String toHex(String arg, boolean isAllInt) {
String hexVal = null;
log.info("arg {}, isAllInt {}", arg, isAllInt);
if (isAllInt) {
int intVal = Integer.parseInt(arg);
hexVal = Integer.toHexString(intVal);
// some magic to convert this hexVal to its 2's complement
} else {
hexVal = String.format("%040x", new BigInteger(1, arg.getBytes(StandardCharsets.UTF_8)));
}
log.info("str {} hex {}", arg, hexVal);
return hexVal;
}
Input: 00001
Output: 1
Expected Output: 0001
Input: 00216
Output: D8
Expected Output: 00D8
Input: 1192633166
Output: 4716234E
Expected Output: 4716234E
any predefined library is much welcome or any other useful pointers!
So to pad the hex digits up to either 4 digits or 8 digits, do:
int intVal = Integer.parseInt(arg);
if (intVal >= 0 && intVal <= 0xffff) {
hexVal = String.format("%04x", intVal);
} else {
hexVal = String.format("%08x", intVal);
}
See Java documentation on how the format strings work.
Answering the two's complement aspect.
Two's Complement Representation
Two's complement is a convention for representing signed integral numbers in a fixed number of bits, e.g. 16 (in olden times, various processors used different representations, e.g. one's complement or sign-magnitude).
Positive numbers and zero are represented as expected:
0 is 0000 0000 0000 0000 or hex 0000
1 is 0000 0000 0000 0001 or hex 0001
2 is 0000 0000 0000 0010 or hex 0002
3 is 0000 0000 0000 0011 or hex 0003
4 is 0000 0000 0000 0100 or hex 0004
Negative numbers are represented by adding 1 0000 0000 0000 0000 to them, giving:
-1 is 1111 1111 1111 1111 or hex ffff
-2 is 1111 1111 1111 1110 or hex fffe
-3 is 1111 1111 1111 1101 or hex fffd
This is equivalent to: take the positive representation, flip all bits, and add 1.
For negative numbers, the highest bit is always 1. And that's how the machine distinguishes positive and negative numbers.
All processors in use today do their integer arithmetic based on two's complement representation, so there's typically no need to do special tricks. All the Java datatypes like byte, short, int, and long are defined to be signed numbers in two's complement representation.
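The "flip all bits and add 1" rule can be checked directly in Java. This is only an illustrative sketch for 16-bit values; the helper names are made up.

```java
final class TwosComplement {
    // Negates a 16-bit value by flipping all bits and adding 1.
    static short negate(short value) {
        return (short) (~value + 1);
    }

    // Shows the 16-bit pattern as exactly 4 hex digits.
    static String hex16(short value) {
        return String.format("%04x", value & 0xFFFF);
    }

    public static void main(String[] args) {
        for (short n = 1; n <= 3; n++) {
            short negated = negate(n);
            System.out.println(negated + " is hex " + hex16(negated));
        }
        // Prints:
        // -1 is hex ffff
        // -2 is hex fffe
        // -3 is hex fffd
    }
}
```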
In a comment you wrote
2's compliment is hex of negative of original value
That mixes up the concepts a bit. Two's complement is basically defined on bit patterns, and groups of 4 bits from these bit patterns can nicely be written as hex digits. Two's complement is about representing negative values as bit patterns, but from your question and comments I read that you don't expect negative values, so two's complement shouldn't concern you.
Hex Strings
To represent signed values as hex strings, Java (and most other languages / environments) simply looks at the bit patterns, ignoring their positive / negative interpretation, meaning that e.g. -30 (1111 1111 1110 0010) does not get shown as "-1e" with a minus sign, but as "ffe2".
Because of this, negative values will always get translated to a string with maximum length according to the value's size (16 bits, 32 bits, 64 bits giving 4, 8, or 16 hex digits), because the highest bit will be 1, resulting in a leading hex digit surely not being zero. So for negative values, there's no need to do any padding.
Small positive values will have leading zeros in their hex representation, and Java's toHexString() method suppresses them, so 1 (0000 0000 0000 0001) becomes "1" and not "0001". That's why e.g. format("%04x", ...), as in @nos's answer, is useful.
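Putting the padding rule together with the sample inputs from the question, a sketch could look like the following. The uppercase %X conversion is my assumption, chosen because the expected outputs in the question are uppercase; the class and method names are hypothetical.

```java
final class HexPadding {
    // Pads to 4 hex digits when the value fits in 16 bits, otherwise to 8.
    // Negative ints always come out as 8 digits (their two's complement pattern).
    static String toPaddedHex(String arg) {
        int intVal = Integer.parseInt(arg); // leading zeros such as "00216" are fine
        if (intVal >= 0 && intVal <= 0xFFFF) {
            return String.format("%04X", intVal);
        }
        return String.format("%08X", intVal);
    }

    public static void main(String[] args) {
        System.out.println(toPaddedHex("00001"));      // 0001
        System.out.println(toPaddedHex("00216"));      // 00D8
        System.out.println(toPaddedHex("1192633166")); // 4716234E
        System.out.println(toPaddedHex("-30"));        // FFFFFFE2
    }
}
```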

How to convert int to hex in java?

public void int2byte(){
int x = 128;
byte y = (byte) x;
System.out.println(Integer.toHexString(y));
}
I got result ffffff80, why not 80?
I got result 7f when x = 127.
Bytes are signed in Java. When you cast 128 to a byte, it becomes -128.
The int 128: 00000000 00000000 00000000 10000000
The byte -128: 10000000
Then, when you widen it back to an int with Integer.toHexString, because it's now negative, it gets sign-extended. This means that a bunch of 1 bits show up, explaining your extra f characters in the hex output.
You can convert it in an unsigned fashion using Byte.toUnsignedInt to prevent the sign extension.
Converts the argument to an int by an unsigned conversion. In an unsigned conversion to an int, the high-order 24 bits of the int are zero and the low-order 8 bits are equal to the bits of the byte argument.
System.out.println(Integer.toHexString(Byte.toUnsignedInt(y)));
An alternative is to bit-mask out the sign-extended bits by manually keeping only the 8 bits of the byte:
System.out.println(Integer.toHexString(y & 0xFF));
Either way, the output is:
80
This is because a positive value of 128 cannot be represented with a signed byte.
The range is -128 to 127.
So the byte holds -128, and when it is widened back to an int it is sign-extended: instead of 0x00000080 you get 0xffffff80.
Edit: As others have explained:
Integer 128 shares the last 8 bits with the byte (-128)
The difference is that Integer 128 has leading zeros.
0x00000080 (0000 0000 0000 0000 0000 0000 1000 0000)
while the byte -128 is just
0x80 (1000 0000)
When you convert byte -128 using Integer.toHexString() it takes the leading 1 and sign-extends it so you get
0xffffff80 (1111 1111 1111 1111 1111 1111 1000 0000)
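The same masking idea extends to printing a whole byte array as hex. In the sketch below (names are illustrative, not from any library), each byte is masked to avoid the ffffff prefix and padded to two digits so that small values such as 0x0a do not lose their leading zero.

```java
final class BytesToHex {
    // Formats every byte as exactly two lowercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF)); // mask first, then pad
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 128, 127, 10 };
        System.out.println(toHex(data)); // 807f0a
    }
}
```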

Understanding a program written into bluej using twos complement.

I just started using Bluej to learn more about how computers store integers. I have a small program that I put into Bluej that sets the value of an integer called x to MAX_VALUE - 3 then adds 1 to x six times, printing out a new value each time.
One addition is incorrect, and I need help understanding which value is incorrect and why the results I got are "strange".
Please keep in mind I am VERY naive to the language for computers and am literally reading from a book about storing integers. The book I have is Computer Science 11th edition by J. Glenn Brookshear.
Here is the program I put into BlueJ:
public class Add
{
public Add()
{
int i, x;
x = java.lang.Integer.MAX_VALUE - 3;
i = 0;
while (i < 6) {
x = x + 1;
i = i + 1;
System.out.print(x + "\n");
}
}
}
The values I receive are:
2147483645
2147483646
2147483647
-2147483648
-2147483647
-2147483646
My teacher says there is a problem with any integer math but I do not know what he means. I would just really like to understand why this happens.
I might also note that these numbers are very much larger than 1 and I do not know why.
Thank you all in advance for any responses!
Integers that you store with the int data type are only allocated a limited amount of space in your computer's memory. It's not possible to store every possible integer in this amount of space. So your computer will deal correctly with integers between -2147483648 and 2147483647, because those are enough for most purposes. If you want to store numbers that are outside this range, you need to use a different data type from int. For example, there's long (which has a much bigger range) and BigInteger (which is really limited only by the amount of space allocated to Java itself).
When you add 1 to the largest possible int, the "correct" answer can't fit in an int variable. This is a bit like having an abacus with only one line of beads (which can represent numbers from 0 to 9), and trying to work out 9 + 1. Your computer will roll the number over to the smallest possible int instead. So when you work with int values, the effect is that 2147483647 + 1 = -2147483648, even though mathematically this makes no sense.
There is a limit on the value of an int in Java, in this case MAX_VALUE. When you try to go past that value, it becomes the opposite extreme (-2,147,483,648, MIN_VALUE), like completing the circle and going back to the beginning. There is no higher int value than 2,147,483,647, so when you add 1 to it you get MIN_VALUE instead. Think of it like a snake eating its own tail ;)
If your Windows calculator has a Programmer View, switch to it, click Dword, enter 2147483645, add 1 six times, and watch the bits.
An integer in Java is 32-bits and signed. This means there is one sign bit and 31 bits for the value.
Integer.MAX_VALUE = 2147483647 (base 10)
= 0111 1111 1111 1111 1111 1111 1111 1111 (base 2)
Adding 1 yields
2147483647 + 1 = 2147483648
= 1000 0000 0000 0000 0000 0000 0000 0000
Counting up, this is what you'd expect if you weren't a computer (or your number wasn't bounded by representation space). But with the int data type, we only get 32 bits, and the "first" bit (not technically correct, but it will aid in understanding) tells you whether or not the value is negative.
Now when Java translates this value to base 10 and because this is the signed integer data type...
2147483647 + 1 = 2147483648
= 1000 0000 0000 0000 0000 0000 0000 0000
We read the first bit as 1 so this is a negative number and we need to take its
twos-complement to calculate the value.
= 1000 0000 0000 0000 0000 0000 0000 0000
negated = 0111 1111 1111 1111 1111 1111 1111 1111
+ 1 = 1000 0000 0000 0000 0000 0000 0000 0000
= 2147483648 (base 10)
so when we display this value, it's the negative value of the two's complement,
= -2147483648
The problem with "integer math" your teacher mentions is that, when your data type (Java's int, in this case) is bounded by size, your operations must make sense within its range.
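If you would rather detect the wrap-around than silently get a negative number, here is a small sketch of three options: the plain int addition that wraps, widening to long before adding, and Math.addExact, which throws ArithmeticException on overflow. The class and method names are just for illustration.

```java
final class OverflowDemo {
    // Plain int addition: silently wraps around on overflow.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // Widen to long first, so the mathematically correct sum fits.
    static long wideAdd(int a, int b) {
        return (long) a + b;
    }

    // Math.addExact throws ArithmeticException instead of wrapping.
    static boolean overflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        System.out.println(wrappingAdd(max, 1)); // -2147483648
        System.out.println(wideAdd(max, 1));     // 2147483648
        System.out.println(overflows(max, 1));   // true
        System.out.println(overflows(max, 0));   // false
    }
}
```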

left shift on short produces zero

As 'byte' is 8-bit and 'short' is 16-bit in Java, I believe this should --
byte[] packet = reader.readPacket();
short sh;
sh = (short)packet[1]; //packet[1] holds '0xff'
sh <<= 8;
sh &= 0xFF;
System.out.print(sh+" ");
produce some big positive value, since lower 8bits are promoted to higher 8 bits.
Instead I receive a '0' (zero). Why does it happen so?
What you're doing is shifting the initial value to the left. Because packet[1] is a signed byte holding 0xff (that is, -1), the cast to short sign-extends it:
1111 1111 1111 1111
<<=8
1111 1111 0000 0000
Then, you're doing a bitwise AND with 0xFF:
1111 1111 0000 0000
&
0000 0000 1111 1111
==
0000 0000 0000 0000
Thus, your end result is 0.
The code first shifts left 8 places. So you have all the right most 8 bits set to 0.
Then you AND it with 0xFF which has left most 8 bits 0.
So your final result is all 0's!
Additional comment: It is good practice to avoid short for this kind of arithmetic in Java, because Java promotes byte and short operands to int before applying arithmetic, shift, and bitwise operators. Also, it is not clear from your code what output you expected. If you add that information, it will be easier to spot exactly what needs to be done for the logic you are trying to implement.
Go through this step-by-step:
sh = (short)0xff; //Since you said that packet[1] holds '0xff'
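So now sh = 0x00ff. (With an actual byte variable holding 0xff, the cast sign-extends and sh would be 0xffff; the outcome below is the same either way.) Next, consider the shift sh <<= 8;. Afterwards, sh = 0xff00.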
I'll leave the last sh &= 0xFF; to you (should hopefully be clear why the & op is setting sh to 0).
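Here is a small sketch contrasting the code from the question with one possible fix. I am guessing that the goal was to move the byte into the high 8 bits of a 16-bit value; if so, the mask belongs before the shift and the result is best kept in an int. The method names are hypothetical.

```java
final class ShiftDemo {
    // The sequence from the question: always ends up 0.
    static short original(byte b) {
        short sh = (short) b; // 0xff sign-extends to 0xffff (-1)
        sh <<= 8;             // 0xff00
        sh &= 0xFF;           // keeps only the low 8 bits, which are all 0
        return sh;
    }

    // ASSUMED intent: place the byte in bits 8..15 as an unsigned quantity.
    static int highByte(byte b) {
        return (b & 0xFF) << 8;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        System.out.println(original(b));         // 0
        System.out.println(highByte(b));         // 65280 (0xff00)
        System.out.println((short) highByte(b)); // -256 (same bits read as a signed short)
    }
}
```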

Having difficulty understanding an example of when bit shifting should/needs to be used

I was looking into when one would want to/should use the bit shift operators. I understand that we don't need to use them to multiply by two and such, because JIT compilation will take care of that. I came across why do we need to use shift operators in java, and am confused by part of the accepted answer:
For example, say I have two bytes that are the high-order and low-order bytes
of a two-byte (16-bit) unsigned value. Say you need to construct that value.
In Java, that's:
int high = ...;
int low = ...;
int twoByteValue = (high << 8) | low;
You couldn't otherwise do this without a shift operator.
To answer your questions: you use them where you need to use them! and nowhere else.
I know that I'm missing something, because to me it looks like he's just multiplying high by 2^8 and adding it to low (I've never actually seen | used in this context before, but when I plugged in dummy values and ran my code, it looked like it was just adding the two together). What's actually going on here?
EDIT: For reference, I had high = 10 and low = 3.
As an example, let's compose the 16-bit representation of the number 54321: 1101 0100 0011 0001 from its high-order 8 bits and low-order 8 bits.
The high-order 8 bits would be: 1101 0100
The low-order 8 bits would be : 0011 0001
1101 0100 << 8
will produce
1101 0100 0000 0000
We then bitwise OR this (|) against the low order bits
1101 0100 0000 0000
0011 0001
-------------------
1101 0100 0011 0001
We now have the full binary representation of 54321 (assuming I'm using unsigned ints of course)
Edit:
To use your example: high=10 and low=3
high, written out in 8 bits, would be 0000 1010
low, written in the same fashion, would be 0000 0011
If we shift high to the left 8 bits:
0000 1010 0000 0000
If we OR that against low:
0000 1010 0000 0000
0000 0011
-------------------
0000 1010 0000 0011
If we treat this pattern as a decimal integer it would mean 2563
Perhaps the confusing part is that the 10 and 3 independently really hold no meaning at all in this context. It is the composition of the two that has value here.
Perhaps you are reading a file byte by byte but in one part of the file there is a sequence of 16-bit integers. You would have to take every pair of bytes and combine them in exactly this fashion to get the 16-bit integers.
Now imagine if, on a platform where the largest integer possible is 64 bits, you wanted to store an integer that is so large it occupies 128 bits. Well, you could use a trick similar to this one to fudge the math and store that really big integer in two separate values. OK, maybe it's more complicated than this example, but hopefully that makes it hit home why we would need bitwise operators like these.
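Both worked examples above can be reproduced with a few lines. The sketch below (illustrative names only) also splits the value back apart. It masks each input to 8 bits, which is what makes | give the same result as + here: after the shift, the two operands have no 1 bits in common.

```java
final class CombineBytes {
    // Combines two 8-bit quantities into one 16-bit unsigned value held in an int.
    static int combine(int high, int low) {
        return ((high & 0xFF) << 8) | (low & 0xFF);
    }

    // Extracts the high-order and low-order bytes again.
    static int highOf(int value) {
        return (value >>> 8) & 0xFF;
    }

    static int lowOf(int value) {
        return value & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(combine(0xD4, 0x31)); // 54321
        System.out.println(combine(10, 3));      // 2563 (same as 10 * 256 + 3)
        System.out.println(highOf(2563));        // 10
        System.out.println(lowOf(2563));         // 3
    }
}
```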
