I'm just beginning to learn about file compression and I've run into a bit of a roadblock. I have an application that encodes a string such as "program" as a compressed binary representation, "010100111111011000" (note that this is still stored as a String).
Encoding
g 111
r 10
a 110
p 010
o 011
m 00
Now I need to write this to the file system using a FileOutputStream. The problem I'm having is: how can I convert the string "010100111111011000" to a byte[] that can be written to the file system with FileOutputStream?
I've never worked with bits and bytes before, so I'm kind of at a dead end here.
An introduction to bit-shift operators:
First, we have the left-shift operator, x << n. This will shift all the bits in x left by n bits, filling the new bits with zero:
1111 1111
<< 3: 1111 1000
Next, we have the signed right-shift operator, x >> n. This shifts all the bits in x right by n, copying the sign bit into the new bits:
1111 1111
>> 3: 1111 1111
1000 0000
>> 3: 1111 0000
0111 1111
>> 3: 0000 1111
Finally, we have the zero-fill right-shift operator, x >>> n. This shifts all bits in x right by n bits, filling the new bits with zero:
1111 1111
>>> 3: 0001 1111
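A minimal Java sketch that reproduces the three shift diagrams above on 8-bit values. The class and method names are illustrative only. One assumption worth stating: Java widens a byte to int before shifting, so the zero-fill case masks the byte with & 0xFF first to get the 8-bit result shown in the diagram.

```java
// Illustrative only: the three shift operators applied to 8-bit values.
final class ShiftDemo {
    // Formats the low 8 bits of value as "xxxx xxxx".
    static String bits8(int value) {
        String s = String.format("%8s", Integer.toBinaryString(value & 0xFF)).replace(' ', '0');
        return s.substring(0, 4) + " " + s.substring(4);
    }

    // x << n: shift left, filling the low bits with zeros.
    static byte shiftLeft(byte b, int n) {
        return (byte) (b << n);
    }

    // x >> n: shift right, copying the sign bit into the new high bits.
    static byte signedShiftRight(byte b, int n) {
        return (byte) (b >> n);
    }

    // x >>> n: shift right, filling the new high bits with zeros.
    // The byte is masked to 8 bits first because Java widens it to int.
    static byte unsignedShiftRight(byte b, int n) {
        return (byte) ((b & 0xFF) >>> n);
    }

    public static void main(String[] args) {
        System.out.println(bits8(shiftLeft((byte) 0b1111_1111, 3)));          // 1111 1000
        System.out.println(bits8(signedShiftRight((byte) 0b1111_1111, 3)));   // 1111 1111
        System.out.println(bits8(signedShiftRight((byte) 0b1000_0000, 3)));   // 1111 0000
        System.out.println(bits8(signedShiftRight((byte) 0b0111_1111, 3)));   // 0000 1111
        System.out.println(bits8(unsignedShiftRight((byte) 0b1111_1111, 3))); // 0001 1111
    }
}
```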
You may also find the bitwise-or operator, x | y, useful. It compares the bits in each position of x and y, setting the result's bit to 1 if it was 1 in either x or y, and to 0 otherwise:
1010 0101
| 1010 1010
---------
1010 1111
You should only need the previous operators for the problem at hand, but for the sake of completeness, here are the last two:
The bitwise-and operator, x & y, sets a bit in the output to 1 if and only if that bit is 1 in both x and y:
1010 0101
& 1010 1010
---------
1010 0000
The bitwise-xor operator, x ^ y, sets an output bit to 1 if that bit is 1 in exactly one of the two numbers, but not both:
1010 0101
^ 1010 1010
---------
0000 1111
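The three bitwise operators above can be checked in Java with the same example values. This is a small illustrative sketch; the class name and the bits8 formatting helper are hypothetical.

```java
// Illustrative only: |, & and ^ on the 8-bit example values used above.
final class BitwiseDemo {
    // Formats the low 8 bits of value as "xxxx xxxx".
    static String bits8(int value) {
        String s = String.format("%8s", Integer.toBinaryString(value & 0xFF)).replace(' ', '0');
        return s.substring(0, 4) + " " + s.substring(4);
    }

    public static void main(String[] args) {
        int x = 0b1010_0101;
        int y = 0b1010_1010;
        System.out.println(bits8(x | y)); // 1010 1111: 1 where either bit is 1
        System.out.println(bits8(x & y)); // 1010 0000: 1 only where both bits are 1
        System.out.println(bits8(x ^ y)); // 0000 1111: 1 where exactly one bit is 1
    }
}
```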
Now, applying these to the situation at hand:
You will need the bit-shift operators to add and manipulate bits. Start with a byte of all zeros; for each character of the string, shift the byte left by one and set its rightmost bit if the character is '1'. Continue until the byte is full (8 bits), then move on to the next byte. Say we want to build the byte "1100 1010":
Our byte    Target bit just read (target = 1100 1010)
---------   -----------------------------------------
0000 0000   (start)
0000 0001   bit 1: 1
0000 0011   bit 2: 1
0000 0110   bit 3: 0
0000 1100   bit 4: 0
0001 1001   bit 5: 1
0011 0010   bit 6: 0
0110 0101   bit 7: 1
1100 1010   bit 8: 0
I will, of course, leave it to you to apply this to your work.
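As one possible way to apply the walk-through above, here is a Java sketch: shift the current value left by one, OR in the next bit, and emit a byte every 8 bits. Assumptions: the final partial byte is padded with zeros on the right, and the class name BitPacker and the file name program.bin are hypothetical. A real decoder would also need to know how many bits are valid (18 for "program"), which this sketch does not store.

```java
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper: packs a '0'/'1' string into bytes, most significant bit first.
final class BitPacker {
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        int current = 0; // bits collected so far for the current byte
        int count = 0;   // how many bits are in 'current'
        int index = 0;   // next free slot in 'out'
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c != '0' && c != '1') {
                throw new IllegalArgumentException("not a bit: " + c);
            }
            current = (current << 1) | (c - '0'); // shift left, then set the lowest bit
            count++;
            if (count == 8) {                     // byte is full: store it and start over
                out[index++] = (byte) current;
                current = 0;
                count = 0;
            }
        }
        if (count > 0) {
            // Assumption: pad the last partial byte with zeros on the right.
            out[index] = (byte) (current << (8 - count));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = pack("010100111111011000"); // "program" from the question
        try (FileOutputStream fos = new FileOutputStream("program.bin")) { // hypothetical path
            fos.write(data);
        }
    }
}
```

The bytes for "program" come out as 0101 0011, 1111 0110 and 0000 0000 (the last one holding the two trailing bits "00" plus padding).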
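Chop your String up into lengths of 8 and parse each piece as a binary number with radix 2. Note that Byte#parseByte(s, 2) throws a NumberFormatException for any chunk that starts with 1 (for example, "11001010" is 202, which is outside the byte range -128..127), so call Integer.parseInt(s, 2) instead and cast the result to byte.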
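A sketch of the chunk-and-parse approach in Java. The class and method names are hypothetical, and right-padding a short final chunk with '0' is an assumption (the decoder would again need to know the real bit count). The main method also shows Byte.parseByte rejecting an 8-bit chunk that starts with 1.

```java
// Hypothetical helper: parses 8-character chunks of a bit string as base-2 bytes.
final class ChunkParser {
    static byte[] parseChunks(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < out.length; i++) {
            int start = i * 8;
            String chunk = bits.substring(start, Math.min(start + 8, bits.length()));
            // Assumption: a short final chunk is right-padded with '0' to 8 characters.
            String padded = (chunk + "00000000").substring(0, 8);
            // Integer.parseInt accepts 0..255 here; the cast keeps the low 8 bits.
            out[i] = (byte) Integer.parseInt(padded, 2);
        }
        return out;
    }

    public static void main(String[] args) {
        try {
            Byte.parseByte("11001010", 2); // 202 does not fit in a signed byte
        } catch (NumberFormatException e) {
            System.out.println("Byte.parseByte rejected the chunk: " + e.getMessage());
        }
        for (byte b : parseChunks("010100111111011000")) {
            System.out.println(Integer.toBinaryString(b & 0xFF));
        }
    }
}
```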
I guess you want to write these zeros and ones as binary values in a file. If so, you can iterate over the string taking 8 characters at a time (with String.substring() or something similar) and convert each chunk to a byte. Note that the Byte(String) constructor parses decimal, not binary, so use (byte) Integer.parseInt(chunk, 2) instead.
It's the easiest solution that comes to mind for now.
If I've misunderstood the problem, please tell me more about it.
Why do both unsigned right shift (logical right shift) and signed right shift (arithmetic right shift) produce the same result for negative numbers?
Log.v("-59 >>> 5 expected 6, actual", String.valueOf((byte)(-59 >>> 5)));
Log.v("11000101 >>> 5 expected 00000110, actual",Integer.toBinaryString( -59 >>> 5));
Log.v("11000101 >> 5 expected 00000110, actual",Integer.toBinaryString( -59 >> 5));
Android Studio Logcat output
-59 >>> 5 expected 6, actual: -2
11000101 >>> 5 expected 00000110, actual: 111111111111111111111111110
11000101 >> 5 expected 00000110, actual: 11111111111111111111111111111110
This is normal behavior. In two's complement, a negative integer behaves as if its binary representation started with infinitely many 1s.
So if you start out with, say, -3, the binary representation looks like this:
...11 1101
If we right-shift this by 2, we get:
...11 1111
Now for the unsigned/signed right shift. This relies on the fact that our integer does not have infinitely many digits. Say we have an 8-bit integer holding -3; it looks like this:
1111 1101
If we do a signed shift, it looks at the MSB (most significant bit, the leftmost one) and preserves that value when shifting. So a signed right shift by 3 looks like this:
1111 1111
In contrast, an unsigned right shift does not check the MSB; it just shifts right and fills with zeros, resulting in this:
0001 1111
This is exactly what you're seeing, except that the output (Integer.toBinaryString) drops the leading zeros.
If you don't know why negative integers are stored that way, read up on two's complement representation.
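A small Java check of the 8-bit -3 example, as a sketch. Since Java ints are 32 bits, the byte is masked with & 0xFF to emulate an 8-bit unsigned shift; the printed output also shows Integer.toBinaryString dropping leading zeros. Class and method names are illustrative.

```java
// Illustrative only: signed vs. unsigned right shift of -3 viewed as an 8-bit value.
final class EightBitShift {
    // >> copies the sign bit; the result is then viewed as 8 bits.
    static int signedShift8(byte b, int n) {
        return (b >> n) & 0xFF;
    }

    // Mask first so only the 8 original bits are shifted, filling with zeros.
    static int unsignedShift8(byte b, int n) {
        return (b & 0xFF) >>> n;
    }

    public static void main(String[] args) {
        byte minusThree = -3; // 1111 1101
        System.out.println(Integer.toBinaryString(signedShift8(minusThree, 3)));   // 11111111
        System.out.println(Integer.toBinaryString(unsignedShift8(minusThree, 3))); // 11111 (0001 1111)
    }
}
```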
As for why (b & 0xff) >>> 5 behaves differently
Integers in Java are 32-bit, which means the binary representation has 32 digits. Your -59 looks like the following binary representation:
1111 1111 1111 1111 1111 1111 1100 0101 == -59
0000 0111 1111 1111 1111 1111 1111 1110 == -59 >>> 5
If you instead AND -59 with 0xff before shifting, you get the following:
1111 1111 1111 1111 1111 1111 1100 0101 == -59
0000 0000 0000 0000 0000 0000 1111 1111 == 0xff
0000 0000 0000 0000 0000 0000 1100 0101 == -59 & 0xff
0000 0000 0000 0000 0000 0000 0000 0110 == (-59 & 0xff) >>> 5
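The table above can be reproduced in Java with a short sketch; the class and method names are hypothetical. The unmasked shift works on the sign-extended 32-bit value, while the masked one shifts only the original 8 bits.

```java
// Illustrative only: why (b & 0xff) >>> 5 differs from b >>> 5 for a negative byte.
final class PromotionDemo {
    // b is promoted to a 32-bit int (sign-extended) before the shift.
    static int unmaskedShift(byte b, int n) {
        return b >>> n;
    }

    // Keep only the low 8 bits, then shift: behaves like an 8-bit unsigned shift.
    static int maskedShift(byte b, int n) {
        return (b & 0xff) >>> n;
    }

    public static void main(String[] args) {
        byte b = -59; // 1100 0101 as a byte
        System.out.println(Integer.toHexString(unmaskedShift(b, 5))); // 7fffffe
        System.out.println((byte) unmaskedShift(b, 5));               // -2
        System.out.println(maskedShift(b, 5));                        // 6
    }
}
```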
Signed right shift (>>): if the number is negative, it fills the vacated bits with 1s; if the number is non-negative, it fills them with 0s.
Unsigned right shift (>>>): it fills with 0s regardless of the sign of the number.
Decimal Binary
x1 = 105 0110 1001
x2 = -38 1101 1010
1. (byte) (x>>2)
2. (byte) (x>>>26)
I understand that for x2 the first shift moves the bits two places to the right and fills the vacated bits with 1s, so the result is:
1111 0110
but I have no idea why the second shift results in:
0011 1111 or 63.
My understanding is that >> fills with 1s if x is negative and with 0s if x is positive, while >>> fills with 0s regardless of the sign. If that is the case, wouldn't the result of x2 >>> 26 be 0000 0000?
The reason for the "strange" bit shift result is that the values are widened to 32 bits (int) before the shift.
I.e., -38 isn't 1101 1010 here, but 1111 1111 1111 1111 1111 1111 1101 1010.
That should make it clear why -38 >>> 26 is 0000 0000 0000 0000 0000 0000 0011 1111 (or 63).
The widening is described in the Java Language Specification:
Otherwise, if the operand is of compile-time type byte, short, or char, it is promoted to a value of type int by a widening primitive conversion (§5.1.2).
If you want to perform bit shift operations on an 8 bit (byte) value, you could mask the value to use only the lower 8 bits, after widening but before shifting, like Federico suggests:
byte x = -38;
int result = (x & 0xFF) >>> 26; // keep only the low 8 bits, then shift
This gives the expected value of 0 (though I'm not sure it makes sense, as any 8-bit value becomes 0 when right-shifted by 8 or more).
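Both shifts from the question can be checked in Java with a sketch like this (class and method names are illustrative). It shows the widening to int before the shift, and how masking with 0xFF changes the >>> result.

```java
// Illustrative only: shifts on byte values are performed after widening to int.
final class WideningDemo {
    static int signedShift(byte x, int n) {
        return x >> n;           // x is sign-extended to 32 bits first
    }

    static int unsignedShift(byte x, int n) {
        return x >>> n;          // -38 becomes 0xFFFFFFDA before the shift
    }

    static int maskedUnsignedShift(byte x, int n) {
        return (x & 0xFF) >>> n; // only the original 8 bits take part
    }

    public static void main(String[] args) {
        byte x1 = 105; // 0110 1001
        byte x2 = -38; // 1101 1010
        System.out.println((byte) signedShift(x1, 2));    // 26, i.e. 0001 1010
        System.out.println((byte) signedShift(x2, 2));    // -10, i.e. 1111 0110
        System.out.println((byte) unsignedShift(x2, 26)); // 63, i.e. 0011 1111
        System.out.println(maskedUnsignedShift(x2, 26));  // 0
    }
}
```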
I am trying my hand at bit manipulation. Can someone provide the basic knowledge that would help me solve bit manipulation problems?
I am facing the following discrepancy.
System.out.println((~1)&1111); gives 1110. Treating ~1 as 0.
System.out.println((~1)); gives -2.
From the official tutorial by Oracle:
The unary bitwise complement operator "~" inverts a bit pattern; it can be applied to any of the integral types, making every "0" a "1" and every "1" a "0"
The reason you get -2 for ~1 is two's complement:
0000 0000 0000 0000 0000 0000 0000 0001
Inverted, this is
1111 1111 1111 1111 1111 1111 1111 1110
Since Java uses two's complement, this is -2.
Break down (~1)&1111 into bits:
1111 1111 1111 1111 1111 1111 1111 1110
0000 0000 0000 0000 0000 0100 0101 0111
___________________________________________
0000 0000 0000 0000 0000 0100 0101 0110
0100 0101 0110 in base 10 is 1110
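The breakdown above can be verified in Java. This sketch (hypothetical class and method names) shows that ANDing with ~1 simply clears the lowest bit, which is why 1111 becomes 1110.

```java
// Illustrative only: ~1 is all ones except bit 0, so x & ~1 clears the lowest bit of x.
final class ComplementDemo {
    static int clearLowestBit(int x) {
        return x & ~1;
    }

    public static void main(String[] args) {
        System.out.println(~1);                           // -2
        System.out.println(Integer.toBinaryString(~1));   // 31 ones followed by a 0
        System.out.println(Integer.toBinaryString(1111)); // 10001010111
        System.out.println(clearLowestBit(1111));         // 1110
    }
}
```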
Let's assume for a moment that every number had exactly three bits. In that case, if we wanted to use signed numbers (positive and negative), we would get these values for 3-bit two's complement:
000 = 0, obviously
001 = 1
010 = 2
011 = 3
100 = -4
101 = -3
110 = -2
111 = -1
So, if you invert 1 with ~, which is really 001 (because the ~ operator inverts EVERY bit), you get 110, which is, correctly, the decimal number -2.
And since ~1 has every bit set except the lowest one, (~1) & 1111 simply clears the lowest bit of 1111, which gives 1110.
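The 3-bit table above can be generated with a short Java sketch. The decode helper is hypothetical: it moves the low width bits to the top of an int and arithmetic-shifts them back, which sign-extends the pattern the same way two's complement reads it.

```java
// Illustrative only: reading a width-bit pattern as a two's complement number.
final class TwosComplementTable {
    static int decode(int pattern, int width) {
        int unused = 32 - width;
        return (pattern << unused) >> unused; // >> copies the top (sign) bit back down
    }

    public static void main(String[] args) {
        for (int p = 0; p < 8; p++) {
            String bits = String.format("%3s", Integer.toBinaryString(p)).replace(' ', '0');
            System.out.println(bits + " = " + decode(p, 3)); // 000 = 0 ... 111 = -1
        }
        int notOne = ~0b001 & 0b111;           // invert 001 within 3 bits: 110
        System.out.println(decode(notOne, 3)); // -2
    }
}
```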
I was running code in an attempt to figure out how Java behaves when precision is lost by casting between the integral numeric types, and I found an unexpected result:
long l2 = 999999999999999999L; //outside of range -2147483648..2147483647 for int
int i3=(int)l2;
System.out.println(l2); //999999999999999999, as expected.
System.out.println(i3); //I expected 2147483647, but got -1486618625
Can someone please explain how I'm getting a large negative int out of a large positive long? I would have expected the system to at least make a best effort casting attempt, returning the maximum positive integer (the closest valid int to the long which is too large to be stored in integer.) Instead, I'm getting a negative number which does not make sense to me.
The narrowing primitive conversion of a long to an int discards all but the lower order 32 bits of the original number, so you don't get Integer.MAX_VALUE.
The JLS, Section 5.1.3, states:
A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the sign of the resulting value to differ from the sign of the input value.
This is in contrast to a primitive narrowing conversion from a floating-point type to an int, which may result in Integer.MAX_VALUE if the original value was too big.
double l2 = 999999999999999999.0;
System.out.println((int) l2);
This prints:
2147483647
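A sketch contrasting the two narrowing conversions in Java. The saturate helper is hypothetical: it implements, by hand, the clamping the question expected, since (int) on a long does not clamp.

```java
// Illustrative only: long -> int truncates, double -> int clamps.
final class NarrowingDemo {
    // What (int) does to a long: keep the low 32 bits; the top one becomes the sign.
    static int truncate(long value) {
        return (int) value;
    }

    // Hypothetical helper: clamp to the int range explicitly.
    static int saturate(long value) {
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, value));
    }

    public static void main(String[] args) {
        long l2 = 999999999999999999L;
        System.out.println(truncate(l2));                             // -1486618625
        System.out.println(truncate(l2) == (int) (l2 & 0xFFFFFFFFL)); // true: same low 32 bits
        System.out.println(saturate(l2));                             // 2147483647
        System.out.println((int) 999999999999999999.0);               // 2147483647
    }
}
```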
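Let's first take a look at the bits of the result, -1486618625:
1010 0111 0110 0011 1111 1111 1111 1111
Now look at the bit representation of 999999999999999999:
0000 1101 1110 0000 1011 0110 1011 0011 1010 0111 0110 0011 1111 1111 1111 1111
Notice that the lowest (rightmost) 32 bits are the same in both cases.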
Now, considering JLS Section 5.1.3, as rgettman stated:
A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the sign of the resulting value to differ from the sign of the input value.
So only the 32 lowest-order bits (10100111011000111111111111111111) are kept by the cast. Since the highest bit of a signed integer represents its sign, and that bit is 1 here, you get a negative number: (1)0100111011000111111111111111111,
which equals -1486618625 in decimal.
A narrowing cast does no range checking or clamping: it doesn't care about the value, and it silently truncates.
999999999999999999 in binary is 0000 1101 1110 0000 1011 0110 1011 0011 1010 0111 0110 0011 1111 1111 1111 1111.
When casting this to a signed int, only the lower 32 bits are used:
1010 0111 0110 0011 1111 1111 1111 1111 - Now, here is what that number means (it's a signed integer now due to the cast):
Leading 1 = negative number
010 0111 0110 0011 1111 1111 1111 1111 would be 660865023 in decimal... not correct.
Java uses two's complement, which means that to get the magnitude of a negative number, you need to invert every bit and add 1:
So:
010 0111 0110 0011 1111 1111 1111 1111
101 1000 1001 1100 0000 0000 0000 0000 + 1
101 1000 1001 1100 0000 0000 0000 0001
101 1000 1001 1100 0000 0000 0000 0001 is 1486618625 - and it's negative: Voila, your -1486618625
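The invert-and-add-1 step can be checked in Java with a sketch like this (hypothetical class and method names). For any int v other than Integer.MIN_VALUE, ~v + 1 equals -v.

```java
// Illustrative only: recovering the magnitude of a negative int by inverting and adding 1.
final class TwosComplementByHand {
    static int magnitudeOfNegative(int v) {
        return ~v + 1; // invert every bit, then add 1
    }

    public static void main(String[] args) {
        int i3 = (int) 999999999999999999L;
        System.out.println(Integer.toBinaryString(i3)); // 10100111011000111111111111111111
        System.out.println(magnitudeOfNegative(i3));    // 1486618625
        System.out.println(magnitudeOfNegative(i3) == -i3); // true
    }
}
```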
The highest bit of a signed integer is the sign bit. So, if you overflow an operation into that bit, you will get a negative number. In this case you are doing it by casting.
999999999999999999 = 1101 1110 0000 1011 0110 1011 0011 1010 0111 0110 0011 1111 1111 1111 1111
Truncated to (int):
1010 0111 0110 0011 1111 1111 1111 1111
Two's complement transform:
0101 1000 1001 1100 0000 0000 0000 0001 = 1486618625
An integer's max value in Java is 2147483647, since Java integers are signed, right?
0xff000000 has a numeric value of 4278190080.
Yet I see Java code like this:
int ALPHA_MASK = 0xff000000;
Can anyone enlighten me please?
Just an addition to erickson's answer:
As he said, signed integers are stored as two's complements to their respective positive value on most computer architectures.
That is, all 2^32 possible values are split into two sets: one for non-negative values, starting with a 0 bit, and one for negative values, starting with a 1 bit.
Now, imagine that we're limited to 3-bit numbers. Let's arrange them in a funny way that'll make sense in a second:
000
111 001
110 010
101 011
100
You see that all numbers on the left-hand side start with a 1-bit whereas on the right-hand side they start with a 0. By our earlier decision to declare the former as negative and the latter as positive, we see that 001, 010 and 011 are the only possible positive numbers whereas 111, 110 and 101 are their respective negative counterparts.
Now what do we do with the two numbers at the top and the bottom? 000 should obviously be zero, and 100 will be the lowest negative number of all, which has no positive counterpart. To summarize:
000 (0)
111 001 (-1 / 1)
110 010 (-2 / 2)
101 011 (-3 / 3)
100 (-4)
You might notice that you can get the bit pattern of -1 (111) by inverting 1 (001) and adding 1 (001) to the result:
001 (= 1) -> 110 + 001 -> 111 (= -1)
Coming back to your question:
0xff000000 = 1111 1111 0000 0000 0000 0000 0000 0000
We don't have to add further zeros in front of it as we already reached the maximum of 32 bits.
Also, it's obviously a negative number (as it's starting with a 1-bit), so we're now going to calculate its absolute value / positive counterpart:
This means we'll invert every bit of
1111 1111 0000 0000 0000 0000 0000 0000
which gives
0000 0000 1111 1111 1111 1111 1111 1111
Then we add
0000 0000 0000 0000 0000 0000 0000 0001
and obtain
0000 0001 0000 0000 0000 0000 0000 0000 = 16777216
Therefore, 0xff000000 = -16777216.
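The arithmetic above can be confirmed in Java. A hexadecimal int literal may spell out any 32-bit pattern, so 0xff000000 compiles and is read as a negative int; Integer.toUnsignedLong (Java 8+) recovers the unsigned value. The class name is illustrative.

```java
// Illustrative only: the same 32 bits read as signed and as unsigned.
final class HexLiteralDemo {
    static final int ALPHA_MASK = 0xff000000;

    public static void main(String[] args) {
        System.out.println(ALPHA_MASK);                         // -16777216
        System.out.println(Integer.toUnsignedLong(ALPHA_MASK)); // 4278190080
        System.out.println(~ALPHA_MASK + 1);                    // 16777216 (invert, add 1)
    }
}
```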
The high bit is a sign bit. Setting it denotes a negative number: -16777216.
Java, like most languages, stores signed numbers in two's complement form. The high bit counts as -2^31 instead of +2^31, so the value is the remaining 31 bits, 0x7F000000 (2130706432), minus 2^31 (2147483648), which yields -16777216.
Something probably worth pointing out: this code is not meant to be used as an integer with a numerical value. Its purpose is to serve as a bitmask that isolates the alpha channel of a 32-bit color value. This variable really shouldn't be thought of as a number at all, just as a binary mask with the high 8 bits turned on.
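A sketch of how such a mask is typically used, assuming a packed ARGB color with alpha in the high 8 bits (the alpha method and the sample color are hypothetical). The >>> keeps the result in 0..255; with >> a high alpha such as 0x80 would come out negative.

```java
// Illustrative only: extracting alpha from a packed ARGB int with a mask.
final class AlphaMaskDemo {
    static final int ALPHA_MASK = 0xff000000;

    // Assumption: alpha is stored in the high 8 bits (ARGB layout).
    static int alpha(int argb) {
        return (argb & ALPHA_MASK) >>> 24; // >>> so the result is not sign-extended
    }

    public static void main(String[] args) {
        int semiTransparentRed = 0x80ff0000;           // hypothetical ARGB value
        System.out.println(alpha(semiTransparentRed)); // 128
    }
}
```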
The extra bit is for the sign.
Java ints are two's complement.
int values are signed in Java.