Understanding bitwise operations and their application in Java


I think I understand what they fundamentally do: operate on bits (flip, shift, invert, etc.).
My issue is that I don't know when I'd ever need to use them, and I don't think I fully understand bits.
I know that there are 8 bits in a byte and I know that bits are either a 0 or 1. Now here is where I start to get confused... I believe data types define combinations of bits differently. So if I declare an int, 32 bits are set aside for numbers, if I declare a char, 8 bits are set aside and the bits for that data type define a letter.
Running with that idea, I did the following basic operation in java which confuses me:
int num = 00000010;
System.out.println(num);
This prints 8 and if I define num this way:
int num = 00000100;
System.out.println(num);
This prints 64
So to practice with bitwise operations (just for the hell of it) I tried this:
int num = 00000010 << 1;
System.out.println(num);
And it prints 16, whereas I thought it would shift the bits by one to the left and print 64.
What is happening here, and when would I ever need to apply this method of manipulating bits?

You are accidentally specifying an octal literal when you specify a number with a leading zero.
00000010 => 1*8^1 + 0*8^0 => 8
00000100 => 1*8^2 + 0*8^1 + 0*8^0 => 64
The JLS, Section 3.10.1, describes octal and binary literals:
An octal numeral consists of an ASCII digit 0 followed by one or more
of the ASCII digits 0 through 7 interspersed with underscores, and can
represent a positive, zero, or negative integer.
A binary numeral consists of the leading ASCII characters 0b or 0B
followed by one or more of the ASCII digits 0 or 1 interspersed with
underscores, and can represent a positive, zero, or negative integer.
You are bit-shifting your 8 by one to the left, effectively multiplying it by 2 to get 16. In bits:
00001000 => 00010000
(8 => 16)
Binary literals are expressed with leading 0b, e.g.:
0b000010 => 2
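A small runnable sketch of the three literals from the question side by side (the class name LiteralDemo is just for illustration). Integer.toBinaryString makes it visible that the shift moves the single 1 bit of octal 010 (decimal 8) one place to the left:

```java
class LiteralDemo {
    static final int OCTAL = 00000010;         // leading 0  -> octal: 1*8^1 = 8
    static final int BINARY = 0b00000010;      // leading 0b -> binary: 2
    static final int SHIFTED = 00000010 << 1;  // 8 << 1 = 16

    public static void main(String[] args) {
        System.out.println(OCTAL);   // 8
        System.out.println(BINARY);  // 2
        System.out.println(SHIFTED); // 16
        // The bit patterns: the single 1 bit moved one place to the left.
        System.out.println(Integer.toBinaryString(OCTAL));   // 1000
        System.out.println(Integer.toBinaryString(SHIFTED)); // 10000
    }
}
```

If you meant binary all along, write 0b00000010 << 1, which gives 4.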

Related

how does a number change when it is too long for the selected datatype in java [duplicate]

For example, if this is my input:
byte x=(byte) 200;
This will be the output:
-56
if this is my input:
short x=(short) 250000;
This will be the output:
-12144
I realize that the output is off because the number does not fit into the data type, but how can I predict what the output will be in this case? In my computer science exam this may be one of the questions, and I do not understand why exactly 200 changes to -56, and so on.

The relevant aspects are what overflow looks like, and how the bits that represent the underlying data are treated.
Computers are all bits, grouped together in groups of 8; a group of 8 bits is called a byte.
byte b = 5; for example, is stored in memory as 0000 0101.
Bits can be 0. Or 1. That's it. That's where it ends. And everything is, in the end, bits. This means that - (the minus sign) is not a thing. Computers do not know what - is and cannot store it. We need to write code and agree on some sort of meaning to represent negative numbers.
2's complement
So what's -5 in bits? It's 1111 1011. Which seems bizarre. But it's how it works. If you write: byte b = -5;, then b will contain 1111 1011. It is because javac made that happen. Similarly, if you then call System.out.println(b), then the println method gets the bit sequence 1111 1011. Why does the println method decide to print a - symbol and then a 5 symbol? Because it's programmed that way: We all are in agreement that 1111 1011 is -5. So why is that?
Because of a really cool property - signed/unsigned irrelevancy.
The rule is 2's complement: To switch the sign (i.e. turn 5, which is 0000 0101 into -5 which is 1111 1011), you flip every bit, and then add 1 to the end result. Try it with 0000 0101 - and you'll see it's 1111 1011. This algorithm is reversible - apply the same algorithm (flip every bit, then add 1) and you can turn -5 into 5.
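The flip-and-add-1 rule maps directly onto Java's ~ (bitwise NOT) operator. A minimal sketch, using a made-up class TwosComplement with illustrative method names, that negates a byte this way and prints its bit pattern:

```java
class TwosComplement {
    // Negate by flipping every bit (~) and adding 1, then narrow back to 8 bits.
    static byte negate(byte b) {
        return (byte) (~b + 1);
    }

    // Render the 8 bits of a byte as "xxxx xxxx" (& 0xFF avoids sign extension).
    static String bits(byte b) {
        String s = String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0');
        return s.substring(0, 4) + " " + s.substring(4);
    }

    public static void main(String[] args) {
        byte five = 5;
        byte minusFive = negate(five);
        System.out.println(bits(five));        // 0000 0101
        System.out.println(bits(minusFive));   // 1111 1011
        System.out.println(minusFive);         // -5
        System.out.println(negate(minusFive)); // 5: the rule is its own inverse
    }
}
```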
This 2's complement thing has 2 great advantages:
There is only one 0 value. If we just flipped all bits, we'd have both 1111 1111 and 0000 0000 representing some form of 0. In basic math, there's no such thing as 'negative 0': it's the same as positive 0. Similarly, if we just decided the first bit is the sign and the remaining 7 bits are the number, then 1000 0000 and 0000 0000 would both be 0, which is annoying and inefficient: why waste 2 different bit sequences on the same number?
Plus and minus are sign-mode independent. The computer doesn't have to KNOW whether we are doing the 2's complement thing or not. Take the bit sequence 1111 1011. If we treat that as unsigned bits, then that is 251 (it's 128 + 64 + 32 + 16 + 8 + 2 + 1). If we treat that as a signed number, then the first bit is 1, so the thing is negative: we apply 2's complement and figure out that it is -5. So, is it -5 or 251? It's both, at once! Which one it is depends on the human/code that interprets this bit sequence. So how could the computer possibly do a + b given this? The weird answer is: it doesn't matter, because the math works out the same way. 251 - 10 is 241. -5 - 10 is -15. -15 and 241 are the exact same bit sequence (1111 0001).
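To check the sign-mode independence claim yourself, here is a sketch (class and method names are illustrative only) that does the same subtraction once on the unsigned reading (251) and once on the signed reading (-5) and then compares the low 8 bits:

```java
class SignIndependence {
    // Treat the bits as unsigned 0..255: subtract, then keep the low 8 bits.
    static int subtractUnsigned(int a, int b) {
        return (a - b) & 0xFF;
    }

    // Treat the bits as a signed byte: subtract, then narrow back to 8 bits.
    static byte subtractSigned(byte a, byte b) {
        return (byte) (a - b);
    }

    public static void main(String[] args) {
        byte minusFive = (byte) 251;                   // bits 1111 1011, read as signed: -5
        int u = subtractUnsigned(251, 10);             // 241
        byte s = subtractSigned(minusFive, (byte) 10); // -15
        System.out.println(minusFive);                 // -5
        System.out.println(u + " " + s);               // 241 -15
        System.out.println((s & 0xFF) == u);           // true: same bits either way
        System.out.println(Integer.toBinaryString(u)); // 11110001
    }
}
```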
Overflow
A byte is 8 bits, and there are 256 different sequences of 8 bits; once you have listed those, you have listed each and every possible variant. (2^8 = 256. Likewise, a 16-bit number can be used to convey 65536 different things, because 2^16 is 65536, and so on.) So, given that bytes are 8 bits and we decreed they are signed, and 2's complement signed, the smallest number you can represent with one is -128, which in bits is 1000 0000 (use 2's complement to check my work), and the largest is +127, which in bits is 0111 1111. So what happens if you add 1 to 127? That'd seemingly be +128, except that's not storable in 8 bits if we decree that we interpret these bits as 2's complement signed (which java does). What happens? The bits 'roll over'. We just add 1 as normal, which turns 0111 1111 into 1000 0000, which is -128:
byte b = 127;
b = (byte)(b + 1);
System.out.println(b); // prints -128
Imagine the number line - stretching out into infinity on both ends, from -infinite to +infinite. That's the usual way math works. Computers (or rather, int, long, etc) do not work like that. Instead of a line, it is a circle. Take your infinite number line and take some scissors, and snip that number line at -128 (because a 2's comp signed byte cannot represent -129 or anything else below -128), and at +127 (because our byte cannot represent 128 or anything above it).
And now tape the 2 cut ends together.
That's the number line. What's 'to the right' of 125? 126 - that's what +1 means: Move one to the right on the number line.
What's 'to the right' of +127? Why, -128. Because we taped it together.
Similarly, -127 - 5 is +124. '-5' is 'move 5 places to the left on the number line (or rather, number circle)'. Going one decrement at a time:
-127 (we start here)
-128 (-127 -1)
+127 (-127 -2)
+126 (-127 -3)
+125 (-127 -4)
+124 (-127 -5)
Hence, 124.
Same math applies to short (-32768 to +32767), char (which is really a 16-bit unsigned number - so 0 to 65535), int (-2147483648 to +2147483647), and even long (-2^63 to +2^63-1 - those get a little large).
short x = 32765;
x += 5;
System.out.println(x); // prints -32766.
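For the exam-style question of predicting the result, the circle picture boils down to arithmetic: a cast to an n-bit signed type keeps only the lowest n bits (JLS §5.1.3, narrowing primitive conversion), so you can take the value modulo 2^n and subtract 2^n if the result lands in the upper half. A sketch of that rule, where predict is a made-up helper rather than a JDK method, checked against Java's own casts:

```java
class NarrowingPredictor {
    // Predict the result of casting `value` to a signed type that is `bits` wide:
    // keep the value modulo 2^bits, then wrap anything >= 2^(bits-1) to negative.
    static long predict(long value, int bits) {
        long range = 1L << bits;               // 256 for byte, 65536 for short
        long r = Math.floorMod(value, range);  // 0 .. range-1, even for negative input
        return r >= range / 2 ? r - range : r;
    }

    public static void main(String[] args) {
        System.out.println(predict(200, 8) + " " + (byte) 200);          // -56 -56
        System.out.println(predict(250000, 16) + " " + (short) 250000);  // -12144 -12144
        System.out.println(predict(32770, 16));                          // -32766
    }
}
```

Worked by hand for the short example: 250000 mod 65536 is 250000 - 3 * 65536 = 53392, which is above 32767, so the result is 53392 - 65536 = -12144.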

How do I convert a short to an int without turning it into a negative in java

I am working on a file reader and ran into a problem when trying to read a short. In short (pun intended), Java is converting the two bytes I'm using to make the short into ints to do bitwise operations, and it converts them in a way that keeps the same value. I need to convert the byte into an int in a way that preserves its bits rather than its value.
example of what's happening:
byte number = -1; //-1
int otherNumber = 1;
int result = number | otherNumber; // -1
example of what I want:
byte number = -1; //-1
int otherNumber = 1;
int result = number | otherNumber; // 255 (bits 1111 1111 | 1)

This can be done pretty easily with some bit magic.
I'm sure you're aware that a short is 16 bits (2 bytes) and an int is 32 bits (4 bytes). So, between an integer and a short, there is a two-byte difference. Now, for positive numbers, copying the value of a short to an int is effectively copying the binary data, however, as you've pointed out, this is not the case for negative numbers.
Now let's look at how negative numbers are represented in binary. It's a bit confusing, so I'll try to keep it simple. Modern systems use what's called two's complement to store negative numbers. The very first (most significant) bit of the number tells you whether it's negative. For negative numbers, the remaining bits are inverted and the value is offset by one (which is why there is no negative zero). For example, 2 as a short is represented as 0000 0000 0000 0010, while -2 is represented as 1111 1111 1111 1110. When a negative short is widened to an int, Java sign-extends it: -2 as an int is the same bit pattern with 2 more bytes (16 bits) in front, all set to 1.
So, to combat this, all we need to do is change those extra 1s to 0s. This can be done with the bitwise AND operator (&). It compares the bits at each position in the two operands: the result bit is 1 only if both bits are 1; otherwise it is 0. So AND-ing with a mask keeps the bits where the mask has 1s and clears the bits where the mask has 0s.
Now, with this knowledge, all we need to do is create another integer whose lower two bytes are all 1s and whose upper two bytes are all 0s. This is fairly simple to do using hexadecimal literals, which are ints by default. If you set every bit of a single byte to 1, you get 255, which is 0xFF in hex, so two bytes of 1s is 0xFFFF (as an int, 0x0000FFFF). Pretty simple; now you just need to apply it.
Here is an example that does exactly that:
short a = -2;
int b = a & 0xFFFF;
You could also use Short.toUnsignedInt(), but where's the fun in that? 😉
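Since the original problem was reading a short from two bytes in a file, here is a sketch that shows the sign extension, the mask fix and the JDK helpers Short.toUnsignedInt and Byte.toUnsignedInt (both available since Java 8) together. combineUnsigned is a hypothetical helper, and the big-endian byte order (high byte first) is an assumption about the file format:

```java
class UnsignedWidening {
    // Hypothetical helper: combine two bytes (high byte first, an assumed
    // big-endian layout) into an unsigned 16-bit value in 0..65535.
    // The & 0xFF masks undo the sign extension each byte gets when promoted to int.
    static int combineUnsigned(byte hi, byte lo) {
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }

    public static void main(String[] args) {
        short a = -2;                                       // bits 1111 1111 1111 1110
        System.out.println((int) a);                        // -2: widening sign-extends with 1s
        System.out.println(a & 0xFFFF);                     // 65534: upper 16 bits cleared
        System.out.println(Short.toUnsignedInt(a));         // 65534: same thing via the JDK

        byte number = -1;                                   // bits 1111 1111
        System.out.println(number | 1);                     // -1: number was sign-extended first
        System.out.println((number & 0xFF) | 1);            // 255
        System.out.println(Byte.toUnsignedInt(number) | 1); // 255

        System.out.println(combineUnsigned((byte) 0xFF, (byte) 0xFE)); // 65534
    }
}
```

If you read through a java.io.DataInputStream, its readUnsignedShort() method performs the same big-endian combination for you.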

JAVA and byte arrays

I'm trying to use an API where a socket is used to communicate. A request is made up of different parts, and one of them is the header, which is specified as:
Fixed header: 2 bytes, fixed at 0xffff
Generally I'm not good with bytes and streams, since I've never used them. So how should I create said byte array? I've tried the following:
byte[] header = new byte[]{(byte)0xff, (byte)0xff};
But the bytes each become -1, which I believe is because 0xFF translates to 255, which is outside of the signed byte range (-128 to +127). So how do I create a header like that?

You just did it.
In the end, computers just know about bits. The rest is what the code, and the humans looking at it, make of it. A bit is a 0 or a 1. If you bought a computer with 4GB RAM, then your computer can remember 34359738368 of those.
That's a bit unwieldy, so AMD, or Intel, or TSMC, or whoever baked your chip, baked into the chip's design that the chip groups them in sets of 8 (and for certain jobs, in sets of 64 or even higher). But that's where it ends. It's just bits, really. Negative number? What's that? 2? What is this 2 you speak of? I know only 0 and 1.
So that's unwieldy too, so we humans don't wanna say: This byte holds value 00000101. We'll just say 'that holds 5'.
bits = decimal
00000000 = 0
00000001 = 1
00000010 = 2
00000011 = 3
00000100 = 4
00000101 = 5
... and so on
That's great, but what about -1? We just have 0 and 1. There's no - so how do we do this?
That's where it gets interesting. It's a convention, not something in the computer. There's this thing called two's complement: We all agree to check the first bit. If it is a 1, then we shall call this -X, where X is found by applying the following algorithm: Flip every bit (all zeroes become one, all ones become zeroes), and add 1 to it.
11111011 = -5.
Why? Well, flip every bit: 00000100
then add 1 to it : 00000101
which is 5.
But that immediately eats half of what we can represent. After all, the biggest number we can now store in a byte is 127: 01111111. If we add 1 to this number, we get 10000000, but hey, that starts with a 1 bit, so assuming we are all in agreement that this means it is negative, 10000000 is -128 (a bit of an exotic case: flip every bit and add 1, and you get 10000000 right back).
And sometimes that's annoying or not worth it. So sometimes we all agree that the number cannot be negative at all, and 10000000 is just 128, and 11111111 is just 255.
The computer has no idea. 255 is 11111111 and so is -1. So what's 11111111? The computer doesn't know. It doesn't even know what 2 is. It just knows zeroes and ones, and as far as the computer is concerned, 11111111 is what it is. (The math works out so that + and - 'just work' regardless of whether we decree these numbers are to be seen as two's complement signed or not. Cool, huh? Try it! If 11111011 is both -5 and 251 depending on the opinion of the one reading off the number, what happens? -5 + 2 is -3. 251 + 2 is 253. -3 and 253 boil down to the same sequence of bits. Just an example.) This is, incidentally, why we do the weirdo 'flip all bits and add 1' stuff: so that + and - just work and you don't need to pass along whether you consider the bits 'signed' or 'unsigned'.
In java, all numeric types except char (which is a numeric type. You'd think it represents a character, but it really doesn't) are signed. byte is 'signed 8-bit number' (so, can represent from -128 to +127, inclusive). char is the only exception, that is an 'unsigned 16-bit number', so can hold from 0 to 65535, inclusive. It's just if you e.g. call System.out.println((char) 65);, the println method will interpret that number as: "Look this up in the unicode table and print whatever you find there", so that prints 'A'. That's part of the source code of that particular println method, it's nothing inherent about the char type in java, which is just 'a number between 0 and 65535'.
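A quick sketch of that point (the class and method names are illustrative): the same 16-bit number prints as a letter or as a number depending only on which println overload receives it, and arithmetic on a char yields a plain int:

```java
class CharIsNumber {
    // char arithmetic promotes to int, so cast back to get a char again.
    static char next(char c) {
        return (char) (c + 1);
    }

    public static void main(String[] args) {
        char c = (char) 65;
        System.out.println(c);       // A: println(char) prints the Unicode character
        System.out.println((int) c); // 65: the very same 16-bit number
        System.out.println(c + 1);   // 66: char + int is an int
        System.out.println(next(c)); // B
    }
}
```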
So, when you print your byte array containing 0xFF, 0xFF in java, because java agreed that we consider it signed, it prints -1, -1. But that's just java-ese for 0xFF, 0xFF. Your byte array contains 0xFF, 0xFF because at the bit level -1 and 255 are the exact same number. For bytes anyway. Not so for all the other ones (char, short, int, long).
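To see that the signed printout and the 0xFF bytes are the same thing, here is a sketch (the names HeaderBytes and toHex are made up for illustration) that prints the same header array both ways:

```java
import java.util.Arrays;

class HeaderBytes {
    static final byte[] HEADER = {(byte) 0xFF, (byte) 0xFF};

    // Format each byte as two hex digits, treating it as unsigned via & 0xFF.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(HEADER)); // [-1, -1]  signed view
        System.out.println(toHex(HEADER));           // ff ff     same bits, hex view
        System.out.println(HEADER[0] == (byte) 255); // true
        System.out.println(HEADER[0] & 0xFF);        // 255
    }
}
```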
To recap:
byte x = (byte) 200;
byte x = (byte) 0xC8;
byte x = -56;
In all these cases, x ends up holding the bits 11001000. There is no way to tell the difference. You can't ask the system: So, uh, is this x equal to 200, or 0xC8, or -56? What was used to set it? Because the computer does not know - the compiler translates all of the above code to the exact same end result, which is 11001000.
255 is -1.

Well, to start, you must know that in Java all integer types except char are signed. This means that the most significant bit acts as the sign bit. That is why in Java the constant Byte.MAX_VALUE is 127.
Now, this means you can store 8 bits in a byte, but if you happen to turn on the sign bit, whatever you store will be represented by Java as a negative number.
Since 0xff turns on all the byte bits (i.e. 11111111) instead of getting 255 as you were expecting, what you're getting is -1, because that number represents -1 in Java.
Perhaps to help you understand it, I can show you how the bits work in Java. Imagine a type called nibble of only 4 bits, where the most significant bit is reserved for the sign.
This is how it would look in Java if it existed:
Imaginary Signed Type: Nibble (4 bits)
Dec.  Bin.  Hex.
------------------
 +0   0000  0x0
 +1   0001  0x1
 +2   0010  0x2
 +3   0011  0x3
 +4   0100  0x4
 +5   0101  0x5
 +6   0110  0x6
 +7   0111  0x7
 -8   1000  0x8
 -7   1001  0x9
 -6   1010  0xA
 -5   1011  0xB
 -4   1100  0xC
 -3   1101  0xD
 -2   1110  0xE
 -1   1111  0xF
Notice how the numbers whose most significant bit is on become negative. If this nibble were an unsigned type, it wouldn't have negative numbers and it could reach 15.
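The table can also be generated rather than typed out. A sketch (NibbleTable and signedNibble are made-up names): it moves the 4 bits to the top of an int with << 28 and then uses the sign-preserving >> 28 to bring them back, which reproduces the Dec. column:

```java
class NibbleTable {
    // Interpret the low 4 bits of v as a signed two's complement nibble:
    // shift them to the top of the int, then arithmetic-shift back to sign-extend.
    static int signedNibble(int v) {
        return (v << 28) >> 28;
    }

    public static void main(String[] args) {
        for (int bits = 0; bits < 16; bits++) {
            String bin = String.format("%4s", Integer.toBinaryString(bits)).replace(' ', '0');
            System.out.printf("%+d %s 0x%X%n", signedNibble(bits), bin, bits);
        }
    }
}
```

This is the same trick the JVM applies to a byte when it widens it to an int, just with 4 bits instead of 8.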
That's why Java bytes go from -128 to 127, instead of up to 255 as you were expecting.
Now, when it comes to creating byte arrays to send to a stream, perhaps instead of creating the byte array yourself, you could wrap your socket output stream in a type-aware stream like DataOutputStream, which lets you send data of specific types.
For example:
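// Requires: import java.io.DataOutputStream; socket is your already-connected java.net.Socket.
// Note: closing this stream at the end of the try block also closes the socket.
try (DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
    out.writeByte((byte) 0xff); // writeByte sends the low 8 bits: 1111 1111
    out.writeByte((byte) 0xff);
}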
That way you may avoid all the difficulties of having to create a header array.
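To see what actually goes over the wire without a live socket, here is a sketch that swaps in a ByteArrayOutputStream as a stand-in for socket.getOutputStream() (an assumption purely for testing; HeaderWriter is a made-up name) and uses DataOutputStream.writeShort, which writes the low 16 bits of its int argument, high byte first:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class HeaderWriter {
    // Write the fixed 2-byte 0xFFFF header and return the bytes that were produced.
    static byte[] writeHeader() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(); // stand-in for the socket stream
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeShort(0xFFFF); // low 16 bits, high byte first: ff ff
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory buffer
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] header = writeHeader();
        System.out.println(header.length); // 2
        System.out.println(header[0] == (byte) 0xFF && header[1] == (byte) 0xFF); // true
    }
}
```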
But bottom line, your array is fine.

Why does String.format("%02x ", -1) return ffffffff instead of ff?

I need to convert a byte to a hex string, and tried the following:
String.format("%02x", -1);
Note: -1 here is a two's complement integer.
However, the result I get is "ffffffff" instead of the "ff" I expected. Why?

Because ff isn't -1. It's 255. Specifying 02 in your format might pad a short number with a leading zero to get it up to 2 digits, but it won't truncate a number in a way that would give the wrong value.
If you want to get a one-byte value for a number, you can cut it to one byte with &0xff.
int n = -1;
String.format("%02x", n&0xff);

Because %02 means the minimum width of the result should be 2 and pad with 0 if needed. It does nothing if the result is more than 2 characters wide.
Knowing that, and the fact Java integers are 32 bits long, the result is expected. If you want ff, you can either do String.format("%02x", 255) or String.format("%02x", (byte)-1).

Formatting does not truncate any significant digits. That's why, if you specify fewer positions than needed for the representation, your limit will be ignored and all significant digits will be shown. For instance, if you have the number 0x12345 and try to format it using only 2 positions (like in your format), the result string will consist of 5 digits, not 2.
Hexadecimal ff means decimal 255, not decimal -1 as you may have expected. The number -1 is of type int. It takes 4 bytes in memory. Its hexadecimal representation really needs ffffffff, i.e. 8 hex digits. There are no leading zeroes that you could truncate. Leading zeroes are possible for positive numbers only. For instance, -2 will be formatted to fffffffe, -3 to fffffffd, -256 to ffffff00, -257 to fffffeff, etc.
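The behaviours described in these answers can be checked side by side. A sketch with made-up helper names: hex formats the whole int, while hexByte first keeps only the low 8 bits with & 0xFF. Passing a Byte also works, because java.util.Formatter converts a negative Byte to unsigned by adding 2^8:

```java
class HexFormatting {
    static String hex(int n)     { return String.format("%02x", n); }        // all 32 bits
    static String hexByte(int n) { return String.format("%02x", n & 0xFF); } // low 8 bits only

    public static void main(String[] args) {
        System.out.println(hex(-1));                          // ffffffff
        System.out.println(hexByte(-1));                      // ff
        System.out.println(String.format("%02x", (byte) -1)); // ff (Byte argument)
        System.out.println(hex(0x12345));                     // 12345: width 2 is only a minimum
        System.out.println(hex(-257));                        // fffffeff
        System.out.println(hex(5));                           // 05: padded with a leading zero
    }
}
```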

Why does the default value for the char data type in Java have 4 hex digits when it is a 16-bit data type?

I am new to java, and this question may be silly to many.
When going through the basics, I learnt this:
char: The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
My question is: why do the default, minimum and maximum values have 4 hex digits when it can be only one?

Hex F is decimal 15 or binary 1111. It fits exactly in 4 bits. A 16-bit value can hold 4 groups of 4 bits, hence it ranges from 0x0000 to 0xFFFF (0xFFFF is 65,535, so there are 2^16 = 65,536 possible values).
The \u in your example is a Unicode escape, pretty much saying that you can store Unicode characters that fit in 16 bits, from \u0000 to \uFFFF.

I think you need to read up on numeral systems.
Binary: Represents numbers using 2 digits, 0 and 1.
Decimal: Represents numbers using 10 digits, 0 - 9.
Hexadecimal: Represents numbers using 16 digits, 0 - F.
A char in Java is a type that can hold numbers with 16 bits, i.e. in the range 0 - 1111111111111111 in binary, 0 - 65535 in decimal or 0 - FFFF in hexadecimal.
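A short sketch (CharRange is an illustrative name) that prints the char limits in decimal, hex and bit count, plus the default value a char field receives, which shows why exactly 4 hex digits are needed:

```java
class CharRange {
    static char defaultChar; // an uninitialized char field defaults to the zero character

    public static void main(String[] args) {
        System.out.println((int) defaultChar);                                    // 0
        System.out.println((int) Character.MIN_VALUE);                            // 0
        System.out.println((int) Character.MAX_VALUE);                            // 65535
        System.out.println(Integer.toHexString(Character.MAX_VALUE));             // ffff: 4 hex digits
        System.out.println(Integer.toBinaryString(Character.MAX_VALUE).length()); // 16 bits
        System.out.println(Integer.toBinaryString(0xF));                          // 1111: one hex digit = 4 bits
    }
}
```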
