I've come across some code which uses the bit masks 0xff and 0xff00, or in 16-bit binary form 00000000 11111111 and 11111111 00000000.
/**
 * Function to check if the given string is in GZIP format.
 *
 * @param inString String to check.
 * @return True if GZIP compressed, otherwise false.
 */
public static boolean isStringCompressed(String inString)
{
    try
    {
        byte[] bytes = inString.getBytes("ISO-8859-1");
        int gzipHeader = ((int) bytes[0] & 0xff)
                | ((bytes[1] << 8) & 0xff00);
        return GZIPInputStream.GZIP_MAGIC == gzipHeader;
    } catch (Exception e)
    {
        return false;
    }
}
I'm trying to work out the purpose of using these bit masks in this context (against a byte array). I can't see what difference they would make.
In the context of a GZIP-compressed string, which this method seems to be written for, the GZIP magic number is 35615: 8B1F in hex and 10001011 00011111 in binary.
Am I correct in thinking this swaps the bytes? So, for example, say my input string were \u001f\u008b:
bytes[0] & 0xff
bytes[0] = 1f = 00011111
& ff = 11111111
--------
= 00011111
bytes[1] << 8
bytes[1] = 8b = 10001011
<< 8 = 10001011 00000000
((bytes[1] << 8) & 0xff00)
= 10001011 00000000 & 0xff00
= 10001011 00000000
11111111 00000000 &
-------------------
10001011 00000000
So, ORing the two results together:
00000000 00011111
10001011 00000000 |
-----------------
10001011 00011111 = 8B1F
To me it doesn't seem like the & is doing anything to the original byte in either case, bytes[0] & 0xff or (bytes[1] << 8) & 0xff00. What am I missing?
int gzipHeader = ((int) bytes[0] & 0xff) | ((bytes[1] << 8) & 0xff00);
The type byte in Java is signed. If you cast a byte to an int, its sign will be extended. The & 0xff is to mask out the 1 bits that you get from sign extension, effectively treating the byte as if it were unsigned.
Likewise for 0xff00, except that the byte is first shifted 8 bits to the left.
So, what this does is:
- take the first byte, bytes[0], cast it to int and mask out the sign-extended bits (treating the byte as if it were unsigned)
- take the second byte, cast it to int, shift it left by 8 bits, and mask out the sign-extended bits
- combine the values with |
Note that the shift left effectively swaps the bytes.
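Here is a minimal sketch of both steps, using the \u001f\u008b bytes from the question (the variable names are just for illustration):
byte[] bytes = { 0x1f, (byte) 0x8b };
System.out.println(Integer.toHexString(bytes[1]));         // ffffff8b: sign-extended
System.out.println(Integer.toHexString(bytes[1] & 0xff));  // 8b: sign bits masked out
int gzipHeader = (bytes[0] & 0xff) | ((bytes[1] << 8) & 0xff00);
System.out.println(Integer.toHexString(gzipHeader));       // 8b1f, i.e. GZIP_MAGIC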
Apparently the purpose is to read the first word of bytes and store it in gzipHeader by suitable masking and shifting. More precisely, the first part extracts exactly the first byte, while the second part extracts the second byte, already shifted left by 8 bits. The | combines both into a single int.
The resulting value is compared against the defined value GZIPInputStream.GZIP_MAGIC to determine if the first two bytes are the defined beginning of data compressed with gzip.
This is a trick to overcome big-endian/little-endian issues. It is forcing the interpretation of the first two bytes as little-endian, i.e. [0] contains the low byte and [1] contains the high byte.
byte is a signed type. If you convert 0xff as a byte to int you get -1. If you actually want to get 255, mask after the conversion.
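A two-line sketch of that point:
byte b = (byte) 0xff;
System.out.println(b);         // -1 (sign extension)
System.out.println(b & 0xff);  // 255 (masked after the conversion)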
Related
I have been reading these values byte by byte from streams. For example, I read this value like this:
int payloadLength = r.readUnsignedShort();
The problem I have is that the two-byte value is 0x3100, so it turns out to be 12544, but I'm supposed to read only 0x31, which is just 49. How do I ignore the extra 00?
Right shift the value by 8 bits and then AND it with 0xFF. Right shifting moves the high byte down into the low byte; any other bits that get moved along with it need to be masked off by ANDing (&) with 0xFF.
int payloadLength = r.readUnsignedShort();
payloadLength = (payloadLength >>> 8) & 0xFF;
System.out.println(payloadLength);
You may also want to swap the two bytes.
v = 0xa0b;
v = swapBytes(v);
System.out.println(Integer.toHexString(v)); // 0xb0a
public static int swapBytes(int v) {
    return ((v << 8) & 0xFF00) | ((v >> 8) & 0xFF);
}
Normally, for reading in just 16 bits you would not have to AND it with 0xFF, since the high-order bits are 0s. But I think it is good practice and will prevent possible problems in the future.
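Putting it together as a runnable sketch: the question's r is presumably a java.io.DataInputStream (which is what readUnsignedShort suggests), and the two-byte stream here is made up just for illustration. Reading throws IOException, which the caller would handle or declare.
DataInputStream r = new DataInputStream(
        new ByteArrayInputStream(new byte[] { 0x31, 0x00 }));
int payloadLength = r.readUnsignedShort();     // 0x3100 = 12544
payloadLength = (payloadLength >>> 8) & 0xFF;  // 0x31 = 49
System.out.println(payloadLength);             // 49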
/*
* RFC 1518, 1519 - Classless Inter-Domain Routing (CIDR)
* This converts from "prefix + prefix-length" format to
* "address + mask" format, e.g. from xxx.xxx.xxx.xxx/yy
* to xxx.xxx.xxx.xxx/yyy.yyy.yyy.yyy.
*/
static private String normalizeFromCIDR(final String netspec)
{
    final int bits = 32 - Integer.parseInt(netspec.substring(netspec.indexOf('/') + 1));
    final int mask = (bits == 32) ? 0 : 0xFFFFFFFF - ((1 << bits) - 1);
    return netspec.substring(0, netspec.indexOf('/') + 1) +
           Integer.toString(mask >> 24 & 0xFF, 10) + "." +
           Integer.toString(mask >> 16 & 0xFF, 10) + "." +
           Integer.toString(mask >> 8 & 0xFF, 10) + "." +
           Integer.toString(mask >> 0 & 0xFF, 10);
}
This is a function in Apache James that converts an IP to the specified format. Can you please explain what's happening inside the function? I'm confused by the bit shifting and conversion.
Thanks in advance.
Bit-wise operations maybe aren't the most intuitive at first glance, but once you get them you'll see they're quite easy to understand. I'll try to explain what this code does using 172.16.0.1/23 as an example netspec string.
Part 1 - CIDR to binary
The goal is to make a binary representation of a subnet mask from a given CIDR prefix length. The CIDR prefix length is just the number of 1 bits in the subnet mask. The first line
final int bits = 32 - Integer.parseInt(netspec.substring(netspec.indexOf('/')+1));
finds the CIDR prefix length by getting the index of / and parsing the integer that follows it (23 in my example). This number is subtracted from 32 to get the number of 0 bits in the subnet mask (those bits are also called host bits).
In this example we know that we're dealing with a /23 prefix, so its subnet mask should look like this:
nnnnnnnn nnnnnnnn sssssssh hhhhhhhh
11111111 11111111 11111110 00000000
n represents network bits (16 bits for a class B network), s represents subnet bits, h represents host bits.
For us, network and subnet bits are functionally the same, but I made the distinction just to be precise. Our interest is just in the host bits (the number of them).
The easiest way to make this is to take a 32-bit binary number of all 1s and "fill" the last 9 bits with 0s. This is where the second line comes in:
A note on the bits == 32 check: it is not just an optimization. In Java, shift distances are taken mod 32, so for a /0 prefix 1 << 32 evaluates to 1 (not 0) and the general formula would produce the wrong mask; the ternary returns 0 explicitly for that case.
//final int mask = (bits == 32) ? 0 : 0xFFFFFFFF - ((1 << bits)-1);
final int mask = 0xFFFFFFFF - ((1 << 9)-1);
0xFFFFFFFF gives you a 32-bit binary number of all 1s. 1 shifted left by 9 bits (1 << bits) gives you 512, and 512 - 1 in binary is 111111111:
1 << 9 = 10 00000000
   - 1 =           1
--------------------
          1 11111111
When you subtract those values you will get the subnet mask in binary:
  0xFFFFFFFF   = 11111111 11111111 11111111 11111111
- ((1<<9)-1)   = 00000000 00000000 00000001 11111111
----------------------------------------------------
                 11111111 11111111 11111110 00000000
Which is exactly the network mask we wanted.
Note: This is maybe not the most intuitive way of calculating the binary value. I like to start with a binary number of all ones, which as a signed int has a decimal value of -1. Then I just shift it left by the number of host bits and that's it. (Additionally, if you're dealing with integers larger than 32 bits, you can mask it with 0xFFFFFFFF):
(-1 << 9) & 0xFFFFFFFF
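As a quick sanity check that both formulas agree, here is a sketch using the 9 host bits from the example:
int bits = 9;
int maskA = 0xFFFFFFFF - ((1 << bits) - 1);
int maskB = -1 << bits;
System.out.println(Integer.toBinaryString(maskA));  // 11111111111111111111111000000000
System.out.println(maskA == maskB);                 // true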
Part 2 - Binary to dotted-decimal
The rest of the code converts the binary value to a dotted-decimal representation — 255.255.254.0.
return netspec.substring(0, netspec.indexOf('/') + 1) +  // part of the netspec string before '/' -> IP address
       Integer.toString(mask >> 24 & 0xFF, 10) + "." +   // 11111111 & 0xFF = 0xFF
       Integer.toString(mask >> 16 & 0xFF, 10) + "." +   // 1111111111111111 & 0xFF = 0xFF
       Integer.toString(mask >> 8 & 0xFF, 10) + "." +    // 111111111111111111111110 & 0xFF = 0xFE
       Integer.toString(mask >> 0 & 0xFF, 10);           // 11111111111111111111111000000000 & 0xFF = 0x00
The return statement is composed of several concatenated strings, starting with the IP address and followed by the decimal representation of each octet. The binary mask is shifted right by (4-n)*8 bits (where n is the octet number), and ANDing with 0xFF keeps only the last 8 bits, which are then converted by Integer.toString.
The result is 172.16.0.1/255.255.254.0.
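So, calling the function would look like this (it is declared private in Apache James, so this is just an illustrative sketch):
String normalized = normalizeFromCIDR("172.16.0.1/23");
System.out.println(normalized);  // 172.16.0.1/255.255.254.0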
Another way to get from CIDR notation to the binary mask, in Python:
spec = '1.2.3.4/5'
cidr = int(spec.split('/')[1])
bin_mask = '1' * cidr + '0' * (32 - cidr)
I've got two bytes:
bytes[0] is 0000 0000
bytes[1] is 1000 0000
I would like to put them together and make one float value out of them. So the result should be 128 in decimal and 0000 0000 1000 0000 in binary. Not negative, because of the leading zero.
My solution so far is this approach:
float f = (bytes[0] << 8 | bytes[1]);
But this results in a value of minus 128 for f. It's because of the two's complement, I guess, so bytes[1] gets interpreted as negative. How can I make just the leading bit of bytes[0] act as the sign bit?
Try this:
short int16 = (short)(((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF));
You need to use parentheses because of operator precedence.
Once you have your 16-bit integer, you can assign it to your float:
float f = int16;
Yes, you could have done it in one step, but I wanted to go step by step.
Since you are performing some widening conversions, you have to stop the propagation of leading 1 bits due to the internal usage of two's complement:
byte[] bytes = {0, -128}; // bytes[0] = 0000 0000, bytes[1] = 1000 0000
short s = (short) (((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF));
System.out.println(s); // prints 128
Also, float is for floating-point numbers; since you want a 16-bit integer value, I changed the target datatype to short.
I'm a bit confused about converting bytes to integers. Consider the following code:
byte[] data = new byte[] { 0, (byte) 0xF0 };
int masked = data[0] << 8 & 0xFF | data[1] & 0xFF; //240
int notMasked = data[0] << 8 | data[1]; //-16
Because bytes in Java are signed, data[1] is not 240 decimal, but rather the 2's complement, -16. However, it should still be 11110000 in binary, so why do I need to do data[1] & 0xFF?
Is Java converting everything to Integer before passing it to the | operator? Why does &0xFF make a difference then?
Java bytes are signed (unfortunately) - so when you promote the value to an int in order to perform the bitwise |, it ends up being sign-extended as 0xFFFFFFF0. That then messes up the | with data[0]. The masking with & 0xff converts it to an integer value of 240 (just 0x000000F0) instead.
However, you've still got a problem. This code:
int masked = data[0] << 8 & 0xFF | data[1] & 0xFF;
should be:
int masked = ((data[0] & 0xff) << 8) | (data[1] & 0xFF);
... otherwise you're masking after the shift, which won't work. I've added brackets because I'm never sure of the precedence of &, << and |...
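A small sketch of why the order matters, with a hypothetical non-zero high byte (masking after the shift wipes out data[0] completely):
byte[] data = { 0x12, (byte) 0xF0 };
int wrong = data[0] << 8 & 0xFF | data[1] & 0xFF;        // 240: the 0x12 high byte is lost
int right = ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);  // 0x12F0 = 4848
System.out.println(wrong + " vs " + right);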
It is similar to a known "puzzle"
byte x = -1;
x >>>= 1;
System.out.println(x);
produces
-1
No shift? This is because, before compiling arithmetic / shift / comparison expressions, javac promotes byte (as well as short and char) to int, or to long if there is any long in the expression, so it works as follows:
x -> int = 0xFFFFFFFF; 0xFFFFFFFF >>> 1 = 0x7FFFFFFF; (byte) 0x7FFFFFFF -> 0xFF = -1
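Spelled out step by step, the compound assignment behaves like this sketch:
byte x = -1;                   // bits: 0xFF
int promoted = x;              // sign-extended to 0xFFFFFFFF
int shifted = promoted >>> 1;  // 0x7FFFFFFF
x = (byte) shifted;            // narrowed back: low byte is 0xFF, i.e. -1 again
System.out.println(x);         // -1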
Ok, so I have searched and searched and nothing worked...
I have this array of int, each int occupies only the low order byte. For instance, I have
data[0] = 0x52
data[1] = 0xe4
data[2] = 0x18
data[3] = 0xcb
I want the standard output to contain exactly those bytes (or, in other words, if I write this to a file and examine the file with a hex editor, I should see):
52e418cb
How can I do that?
Thank you for your help
The correct way of doing this is to shift the bytes according to their desired position and then stitch them together using the OR operator. But you should also mask each byte to its lower 8 bits before shifting it. This is needed because a byte is first converted to an int (before the shifting is done). This is no big deal, but when the highest bit is 1 (i.e. the byte is negative), your integer becomes negative as well, which causes all the leading bits to be set to 1.
So:
(byte) 10000000 = (int) 11111111 11111111 11111111 10000000
Using this negative int value with the OR operator would produce a wrong result. So the working line is this one:
((data[0] & 0xFF) << 24) | ((data[1] & 0xFF) << 16) | ((data[2] & 0xFF) << 8) | (data[3] & 0xFF)
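For example, with the four values from the question stored as bytes (a sketch):
byte[] data = { 0x52, (byte) 0xe4, 0x18, (byte) 0xcb };
int packed = ((data[0] & 0xFF) << 24) | ((data[1] & 0xFF) << 16)
        | ((data[2] & 0xFF) << 8) | (data[3] & 0xFF);
System.out.println(Integer.toHexString(packed));  // 52e418cb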
The following seems to work fine. I'm using the OutputStream.write(int) method.
int[] ints = new int[] { 0x52, 0xe4, 0x18, 0xcb };
FileOutputStream os = new FileOutputStream(new File("/tmp/x"));
for (int i : ints) {
os.write(i);
}
os.close();
Results:
> hexdump /tmp/x
0000000 52 e4 18 cb
Just shift and OR them together before writing them to the file/output:
(data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3]
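Since each int in the array only occupies the low-order byte, no masking is needed here. A sketch of writing the packed value so the file contains exactly those four bytes (DataOutputStream.writeInt writes big-endian, high byte first):
int[] data = { 0x52, 0xe4, 0x18, 0xcb };
int packed = (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3];
try (DataOutputStream out = new DataOutputStream(new FileOutputStream("/tmp/x"))) {
    out.writeInt(packed);  // file bytes: 52 e4 18 cb
}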