I am not sure how to phrase the topic for this question because I am new to bit manipulation and really don't understand how it works.
I'm in the process of reverse engineering a game application just to see how it works and wanted to figure out how exactly the '&' operator is being used in a method.
Partial Code:
int n = ...; // test values are provided below
int n2 = n & 1920; // interested in this line of code
switch (n2) {
    // ignore the n2 value assignments inside the cases
    case 256: {
        n2 = 384;
        break;
    }
    case 384: {
        n2 = 512;
        break;
    }
    case 512: {
        n2 = 0;
        break;
    }
}
Test Values:
Input Values | Output Values | Substituting Values
n = 387      | n2 = 384      | (387 & 1920) = 384
n = 513      | n2 = 512      | (513 & 1920) = 512
n = 12546    | n2 = 256      | (12546 & 1920) = 256
n = 18690    | n2 = 256      | (18690 & 1920) = 256
Based on this use case I have a few questions:
What is the & operator doing in this example?
To me it looks like most of the values are being rounded down to the nearest bit interval, except for the numbers greater than 10000
What is so important about the number 1920?
How did they come up with this number to get to a specific bit interval? (if possible to figure out)
The first thing you need to do to understand bit manipulation is to convert all base-10 decimal numbers into a format that shows the bits, i.e. base-2 binary or base-16 hexadecimal numbers (if you've learned to read those).
Bits are numbered from the right, starting at 0.
Decimal     Hex        Binary
   256  =  0x100  =  0b001_0000_0000
   384  =  0x180  =  0b001_1000_0000
   512  =  0x200  =  0b010_0000_0000
  1920  =  0x780  =  0b111_1000_0000
                       | | |       |
                      10 8 7       0   Bit Number
As you can see, n & 1920 will clear all but bits 7-10.
As long as n doesn't have any set bits above 10, i.e. greater than 0x7FF = 2047, the effect is as you stated, the values are being rounded down (truncated) to the nearest bit interval, i.e. multiple of 128.
128 + 256 + 512 + 1024 = 1920.
These are all powers of 2 (using ^ to mean "to the power of"):
128 = 2^7
256 = 2^8
512 = 2^9
1024 = 2^10
The exponent also represents the location of the bit in the number, going from right to left starting with bit 0.
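This also shows where 1920 likely came from; a quick sketch rebuilding the mask from its bit positions:
// 1920 is just bits 7 through 10 OR'd together
int mask = (1 << 7) | (1 << 8) | (1 << 9) | (1 << 10);
System.out.println(mask); // prints 1920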
By ANDing a value with 1920 you can see which of those bits are set.
Let's say you wanted to see if only bit 7 of n is set:
if ((n & 1920) == 128) {
// it is set.
}
Or to see if it had bits 7 and 8 set.
if ((n & 1920) == 384) {
// then those bits are set.
}
You can also set a particular bit by using |:
n |= 128; // sets bit 7 to 1
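Putting it all together, here is a minimal check against the test values from the question:
public class MaskDemo {
    public static void main(String[] args) {
        int[] tests = {387, 513, 12546, 18690}; // the question's inputs
        for (int n : tests) {
            // keep only bits 7-10; all other bits are cleared
            System.out.println(n + " & 1920 = " + (n & 1920));
        }
        // prints 384, 512, 256, 256 -- matching the question's table
    }
}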
I need to set the bit for some of the Bluetooth Features listed below using Java:
HeaderValue:BluetoothFeatures,
Tag ID:0x10,
Length:4 bytes,
Possible Values :
Bit 0 = a,
Bit 1 = b,
Bit 2 = c,
Bit 3 = d,
Bit 4 = e ....so on till bit 31.
To set the seventh bit to 1:
b = (byte) (b | (1 << 6));
To set the sixth bit to zero:
b = (byte) (b & ~(1 << 5));
(The bit positions are effectively 0-based, so that's why the "seventh bit" maps to 1 << 6 instead of 1 << 7.)
Source : Change bits value in Byte
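Since the field here is 4 bytes (bits 0 through 31), an int is a better fit than a byte. A sketch along the same lines, assuming the features value is kept in an int (the variable names are mine):
int features = 0;

// set bit 2 (feature "c") to 1
features |= 1 << 2;

// clear bit 4 (feature "e")
features &= ~(1 << 4);

// test whether bit 2 is set
boolean cEnabled = (features & (1 << 2)) != 0;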
Code in java:
byte x = new Integer((version << 6) | (padding << 5)
| (extension << 4) | cc).byteValue();
I need this in objective-c
I tried uint8_t x = (version << 6) | (padding << 5) | (extension << 4) | cc;
The java statement returns -128, while my approach returns 128
You should use a signed type instead of an unsigned one: uint8_t -> int8_t.
Why do you get 128? There are no negative numbers in unsigned integers, so you can think of it like this: -128 = 0 - 128 = (0 - 1) - 127 = 255 - 127 = 128 (0 - 1 wraps to 255 for an unsigned 1-byte integer, whose range is [0; 255]).
You can also read the byte docs and the C data-types docs.
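To see the Java side of the mismatch, a small sketch (the field values are made up for illustration):
int version = 2, padding = 0, extension = 0, cc = 0;
int packed = (version << 6) | (padding << 5) | (extension << 4) | cc; // 128 as an int
byte b = (byte) packed; // Java's byte is signed, so the bit pattern 1000_0000 reads back as -128
System.out.println(packed); // 128
System.out.println(b);      // -128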
I discovered this oddity:
for (long l = 4946144450195624L; l > 0; l >>= 5)
    System.out.print((char) (((l & 31 | 64) % 95) + 32));
Output:
hello world
How does this work?
The number 4946144450195624 fits in 64 bits, and its binary representation is:
10001100100100111110111111110111101100011000010101000
The program decodes a character for every 5-bits group, from right to left
00100|01100|10010|01111|10111|11111|01111|01100|01100|00101|01000
  d  |  l  |  r  |  o  |  w  | spc |  o  |  l  |  l  |  e  |  h
5-bit codification
With 5 bits it is possible to represent 2⁵ = 32 characters. The English alphabet contains 26 letters, which leaves room for 32 - 26 = 6 symbols apart from letters. With this codification scheme you can have all 26 (one-case) English letters and 6 symbols (space being among them).
Algorithm description
The >>= 5 in the for loop jumps from group to group; the 5-bit group then gets isolated by ANDing the number with the mask 31₁₀ = 11111₂ in the expression l & 31.
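A two-step trace of that shift-and-mask, just to make it concrete:
long l = 4946144450195624L;
System.out.println(Long.toBinaryString(l & 31)); // 1000 -> the 'h' group
l >>= 5;                                         // advance to the next group
System.out.println(Long.toBinaryString(l & 31)); // 101  -> the 'e' group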
Now the code maps the 5-bit value to its corresponding 7-bit ASCII character. This is the tricky part. Check the binary representations for the lowercase
alphabet letters in the following table:
ASCII | ASCII | ASCII | Algorithm
character | decimal value | binary value | 5-bit codification
--------------------------------------------------------------
space | 32 | 0100000 | 11111
a | 97 | 1100001 | 00001
b | 98 | 1100010 | 00010
c | 99 | 1100011 | 00011
d | 100 | 1100100 | 00100
e | 101 | 1100101 | 00101
f | 102 | 1100110 | 00110
g | 103 | 1100111 | 00111
h | 104 | 1101000 | 01000
i | 105 | 1101001 | 01001
j | 106 | 1101010 | 01010
k | 107 | 1101011 | 01011
l | 108 | 1101100 | 01100
m | 109 | 1101101 | 01101
n | 110 | 1101110 | 01110
o | 111 | 1101111 | 01111
p | 112 | 1110000 | 10000
q | 113 | 1110001 | 10001
r | 114 | 1110010 | 10010
s | 115 | 1110011 | 10011
t | 116 | 1110100 | 10100
u | 117 | 1110101 | 10101
v | 118 | 1110110 | 10110
w | 119 | 1110111 | 10111
x | 120 | 1111000 | 11000
y | 121 | 1111001 | 11001
z | 122 | 1111010 | 11010
Here you can see that the ASCII characters we want to map begin with the 7th and 6th bits set (11xxxxx₂), except for space, which only has the 6th bit on. You could OR the 5-bit codification with 96 (96₁₀ = 1100000₂) and that would be enough to do the mapping, but it wouldn't work for space (darn space!).
Now we know that special care has to be taken to process space at the same time as the other characters. To achieve this, the code turns the 7th bit on (but not the 6th) on the extracted 5-bit group with an OR against 64 (64₁₀ = 1000000₂): l & 31 | 64.
So far the 5-bit group is of the form 10xxxxx₂ (space would be 1011111₂ = 95₁₀).
If we can map space to 0 without affecting the other values, we can then turn the 6th bit on and that should be all.
Here is where the mod 95 part comes into play: with the modulus operation (l & 31 | 64) % 95, only space goes back to 0. After this, the code turns the 6th bit on by adding 32₁₀ = 100000₂ to the previous result, ((l & 31 | 64) % 95) + 32, transforming the 5-bit value into a valid ASCII character.
isolates 5 bits +        +---- takes 'space' (and only 'space') back to 0
                |        |
                v        v
          ((l & 31 | 64) % 95) + 32
                   ^           ^
   turns the       |           |
   7th bit on -----+           +--- turns the 6th bit on
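As a quick numeric check of that pipeline, here is the arithmetic for the 'h' group and the space group:
int h  = ((8  | 64) % 95) + 32; // 'h' is encoded as 01000 (8): (72 % 95) + 32 = 104 -> 'h'
int sp = ((31 | 64) % 95) + 32; // space is encoded as 11111 (31): (95 % 95) + 32 = 32 -> ' '
System.out.println((char) h);  // h
System.out.println((char) sp); // (a space)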
The following code does the inverse process: given a lowercase string (maximum 12 characters), it returns the 64-bit long value that could be used with the OP's code:
public class D {
    public static void main(String... args) {
        String v = "hello test";
        int len = Math.min(12, v.length());
        long res = 0L;
        for (int i = 0; i < len; i++) {
            // lower 5 bits of the character: 'a'..'z' -> 1..26, ' ' -> 0
            long c = (long) v.charAt(i) & 31;
            // (31 - c) / 31 is 1 only when c == 0, so space encodes as 31
            res |= ((((31 - c) / 31) * 31) | c) << 5 * i;
        }
        System.out.println(res);
    }
}
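A quick way to sanity-check the encoder is to feed its output straight back into the decoding loop from the question (the class and method names here are mine):
public class RoundTrip {
    static long encode(String v) {
        int len = Math.min(12, v.length());
        long res = 0L;
        for (int i = 0; i < len; i++) {
            long c = (long) v.charAt(i) & 31;
            res |= ((((31 - c) / 31) * 31) | c) << 5 * i;
        }
        return res;
    }

    public static void main(String[] args) {
        // decode with the loop from the question; prints: hello test
        for (long l = encode("hello test"); l > 0; l >>= 5)
            System.out.print((char) (((l & 31 | 64) % 95) + 32));
    }
}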
The following Groovy script prints intermediate values.
String getBits(long l) {
return Long.toBinaryString(l).padLeft(8, '0');
}
for (long l = 4946144450195624l; l > 0; l >>= 5) {
println ''
print String.valueOf(l).padLeft(16, '0')
print '|' + getBits((l & 31))
print '|' + getBits(((l & 31 | 64)))
print '|' + getBits(((l & 31 | 64) % 95))
print '|' + getBits(((l & 31 | 64) % 95 + 32))
print '|';
System.out.print((char) (((l & 31 | 64) % 95) + 32));
}
Here it is:
4946144450195624|00001000|01001000|01001000|01101000|h
0154567014068613|00000101|01000101|01000101|01100101|e
0004830219189644|00001100|01001100|01001100|01101100|l
0000150944349676|00001100|01001100|01001100|01101100|l
0000004717010927|00001111|01001111|01001111|01101111|o
0000000147406591|00011111|01011111|00000000|00100000|
0000000004606455|00010111|01010111|01010111|01110111|w
0000000000143951|00001111|01001111|01001111|01101111|o
0000000000004498|00010010|01010010|01010010|01110010|r
0000000000000140|00001100|01001100|01001100|01101100|l
0000000000000004|00000100|01000100|01000100|01100100|d
Interesting!
The printable standard ASCII characters occupy the range 32 to 126, which is 95 characters in total.
That's why you see the constants 32 and 95 there.
In fact, each character is mapped to 5 bits here (you can work out the 5-bit combination for each character from the table above), and then all the groups are concatenated to form a large number.
Positive longs are 63-bit numbers, large enough to hold the encoded form of 12 characters. So it is large enough to hold "hello world", but for longer texts you would need bigger numbers, or even a BigInteger.
In an application, we wanted to transfer visible English characters, Persian characters, and symbols via SMS. As you can see, there are 32 (the number of Persian characters) + 95 (the number of English characters and standard visible symbols) = 127 possible values, which can be represented in 7 bits.
We converted each UTF-16 (16-bit) character to 7 bits and gained a compression ratio of more than 56%, so we could send texts of twice the length in the same number of SMSes. (Somehow, the same thing happened here.)
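A minimal sketch of that kind of packing, assuming each character has already been mapped to a 7-bit code in 0-127 (the class and function names are mine):
public class Pack7 {
    // Packs 7-bit codes into bytes, LSB-first: 8 codes fit into 7 bytes.
    static byte[] pack7(int[] codes) {
        byte[] out = new byte[(codes.length * 7 + 7) / 8];
        int bitPos = 0;
        for (int code : codes) {
            for (int b = 0; b < 7; b++, bitPos++) {
                if ((code >> b & 1) != 0) {
                    out[bitPos / 8] |= 1 << (bitPos % 8);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] packed = pack7(new int[]{104, 101, 108, 108, 111}); // "hello"
        System.out.println(packed.length); // 5 bytes for 5 characters (35 bits)
    }
}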
You are getting a result which happens to be the char representation of the values below:
104 -> h
101 -> e
108 -> l
108 -> l
111 -> o
32 -> (space)
119 -> w
111 -> o
114 -> r
108 -> l
100 -> d
You've encoded characters as 5-bit values and packed 11 of them into a 64 bit long.
(packedValues >> 5*i) & 31 is the i-th encoded value with a range 0-31.
The hard part, as you say, is encoding the space. The lowercase English letters occupy the contiguous range 97-122 in Unicode (and ASCII, and most other encodings), but the space is 32.
To overcome this, you used some arithmetic. ((x+64)%95)+32 is almost the same as x + 96 (note how bitwise OR is equivalent to addition in this case), but when x=31, we get 32.
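A small loop makes that near-equivalence visible (a sketch; the variable names are mine):
for (int x = 0; x <= 31; x++) {
    int viaModulo = ((x + 64) % 95) + 32; // the expression from the question
    int viaOffset = x + 96;               // a plain shift into the lowercase range
    // identical for x = 0..30; at x = 31 the modulo wraps 95 to 0,
    // sending the space code to 32 instead of 127
    System.out.println(x + ": " + viaModulo + " vs " + viaOffset);
}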
It prints "hello world" for a similar reason this does:
for (int k=1587463874; k>0; k>>=3)
System.out.print((char) (100 + Math.pow(2,2*(((k&7^1)-1)>>3 + 1) + (k&7&3)) + 10*((k&7)>>2) + (((k&7)-7)>>3) + 1 - ((-(k&7^5)>>3) + 1)*80));
But for a somewhat different reason than this:
for (int k=2011378; k>0; k>>=2)
System.out.print((char) (110 + Math.pow(2,2*(((k^1)-1)>>21 + 1) + (k&3)) - ((k&8192)/8192 + 7.9*(-(k^1964)>>21) - .1*(-((k&35)^35)>>21) + .3*(-((k&120)^120)>>21) + (-((k|7)^7)>>21) + 9.1)*10));
I mostly work with Oracle databases, so I would use some Oracle knowledge to interpret and explain :-)
Let's convert the number 4946144450195624 into binary. For that I use a small function called dec2bin, i.e., decimal-to-binary.
SQL> CREATE OR REPLACE FUNCTION dec2bin (N in number) RETURN varchar2 IS
2 binval varchar2(64);
3 N2 number := N;
4 BEGIN
5 while ( N2 > 0 ) loop
6 binval := mod(N2, 2) || binval;
7 N2 := trunc( N2 / 2 );
8 end loop;
9 return binval;
10 END dec2bin;
11 /
Function created.
SQL> show errors
No errors.
SQL>
Let's use the function to get the binary value -
SQL> SELECT dec2bin(4946144450195624) FROM dual;
DEC2BIN(4946144450195624)
--------------------------------------------------------------------------------
10001100100100111110111111110111101100011000010101000
SQL>
Now the catch is the 5-bit conversion. Start grouping from right to left with 5 digits in each group. We get:
100|01100|10010|01111|10111|11111|01111|01100|01100|00101|01000
We are left with just 3 digits in the leftmost group, because the binary value has 53 digits in total.
SQL> SELECT LENGTH(dec2bin(4946144450195624)) FROM dual;
LENGTH(DEC2BIN(4946144450195624))
---------------------------------
53
SQL>
hello world has a total of 11 characters (including the space), so we need 11 × 5 = 55 bits; hence we pad the leftmost group, which was left with just three bits after grouping, with two leading zeros.
So, now we have:
00100|01100|10010|01111|10111|11111|01111|01100|01100|00101|01000
Now, we need to convert the groups to 7-bit ASCII values. For the characters it is easy: we just need to set the 6th and 7th bits, i.e. prepend 11₂ on the left of each 5-bit group above.
That gives:
1100100|1101100|1110010|1101111|1110111|1111111|1101111|1101100|1101100|1100101|1101000
Let's interpret the binary values. I will use the binary to decimal conversion function.
SQL> CREATE OR REPLACE FUNCTION bin2dec (binval in char) RETURN number IS
2 i number;
3 digits number;
4 result number := 0;
5 current_digit char(1);
6 current_digit_dec number;
7 BEGIN
8 digits := length(binval);
9 for i in 1..digits loop
10 current_digit := SUBSTR(binval, i, 1);
11 current_digit_dec := to_number(current_digit);
12 result := (result * 2) + current_digit_dec;
13 end loop;
14 return result;
15 END bin2dec;
16 /
Function created.
SQL> show errors;
No errors.
SQL>
Let's look at each binary value -
SQL> set linesize 1000
SQL>
SQL> SELECT bin2dec('1100100') val,
2 bin2dec('1101100') val,
3 bin2dec('1110010') val,
4 bin2dec('1101111') val,
5 bin2dec('1110111') val,
6 bin2dec('1111111') val,
7 bin2dec('1101111') val,
8 bin2dec('1101100') val,
9 bin2dec('1101100') val,
10 bin2dec('1100101') val,
11 bin2dec('1101000') val
12 FROM dual;
VAL VAL VAL VAL VAL VAL VAL VAL VAL VAL VAL
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
100 108 114 111 119 127 111 108 108 101 104
SQL>
Let's look at what characters they are:
SQL> SELECT chr(bin2dec('1100100')) character,
2 chr(bin2dec('1101100')) character,
3 chr(bin2dec('1110010')) character,
4 chr(bin2dec('1101111')) character,
5 chr(bin2dec('1110111')) character,
6 chr(bin2dec('1111111')) character,
7 chr(bin2dec('1101111')) character,
8 chr(bin2dec('1101100')) character,
9 chr(bin2dec('1101100')) character,
10 chr(bin2dec('1100101')) character,
11 chr(bin2dec('1101000')) character
12 FROM dual;
CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER
--------- --------- --------- --------- --------- --------- --------- --------- --------- --------- ---------
d l r o w ⌂ o l l e h
SQL>
So, what do we get in the output?
d l r o w ⌂ o l l e h
That is hello⌂world in reverse, with ⌂ standing in for the space. The only issue is the space, and the reason is well explained by @higuaro in his answer. I honestly couldn't interpret the space issue myself at first, until I saw the explanation given in his answer.
I found the code slightly easier to understand when translated into PHP, as follows:
<?php
$result=0;
$bignum = 4946144450195624;
for (; $bignum > 0; $bignum >>= 5){
$result = (( $bignum & 31 | 64) % 95) + 32;
echo chr($result);
}
See live code
Use
out.println((char) (((l & 31 | 64) % 95) + 32 / 1002439 * 1002439));
to make it capitalised: 32 / 1002439 is 0 in integer arithmetic, so the + 32 term vanishes and the letters land in the uppercase range 65-90 (the space group maps to 0, a NUL character).
Alright, so I have 4 integers I want to wrap in a long.
Each of the 4 integers contains 3 values, positioned in the first 2 bytes:
+--------+--------+
|xxpppppp|hdcsrrrr|
+--------+--------+
{pppppp} represents one value, {hdcs} represents the second and {rrrr} the last.
I want to pack 4 of these integers into a long. I've tried the following:
ordinal = (c1.ordinal() << (14*3) | c2.ordinal() << (14*2) | c3.ordinal() << 14 | c4.ordinal());
where c1.ordinal()...c4.ordinal() is the integers to wrap.
This does not seem to work if I run a test. Let's say I want to look up the values of the last integer in the long, c4.ordinal(), where {pppppp} = 41, {hdcs} = 8 and {rrrr} = 14. I get the following results:
System.out.println(c4.ordinal() & 0xf); //Prints 14
System.out.println(hand.ordinal() & 0xf); // Prints 14 - correct
System.out.println(c4.ordinal() >> 4 & 0xf); // Prints 8
System.out.println(hand.ordinal() >> 4 & 0xf); // Prints 8 - correct
System.out.println(c4.ordinal() >> 8 & 0x3f); // Prints 41
System.out.println(hand.ordinal() >> 8 & 0x3f); // Prints 61 - NOT correct!
Now, the following is weird to me. If I remove the first two integers, and only wrap the last two, like this:
ordinal = (c3.ordinal() << 14 | c4.ordinal());
And run the same test, I get the correct result:
System.out.println(c4.ordinal() >> 8 & 0x3f); // Prints 41
System.out.println(hand.ordinal() >> 8 & 0x3f); // Prints 41 - correct!
I have no idea what's wrong, and it does not make any sense to me that I get the correct answer when I remove the first two integers. I'm starting to think this might have to do with the long datatype, but I've not found anything yet that supports this theory.
Even though you are assigning the result to a long, all of the operations are performed on int values, and so the high-order bits are lost. (Worse, a shift applied to an int uses only the low 5 bits of the shift distance, so << 42 actually shifts by 42 & 31 = 10, corrupting the lower fields.) Force "promotion" to a long by explicitly widening the values to a long.
long ordinal = (long) c1.ordinal() << (14*3) |
(long) c2.ordinal() << (14*2) |
(long) c3.ordinal() << 14 |
(long) c4.ordinal();
Also, unless you are positive that the top two bits of each value are zero, you could run into other problems. You may wish to mask these off for safety's sake:
long ordinal = (c1.ordinal() & 0x3FFFL) << (14*3) |
(c2.ordinal() & 0x3FFFL) << (14*2) |
(c3.ordinal() & 0x3FFFL) << 14 |
(c4.ordinal() & 0x3FFFL);
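For completeness, a sketch of reading one field back out (the helper names are mine; field 0 is c4 and field 3 is c1):
// Each field is 14 bits wide, so field i occupies bits 14*i .. 14*i + 13.
static int field(long ordinal, int i) {
    return (int) (ordinal >>> (14 * i)) & 0x3FFF;
}

// Split a field into the three sub-values from the |xxpppppp|hdcsrrrr| layout.
static void dump(int f) {
    int rrrr = f & 0xF;          // low 4 bits
    int hdcs = (f >> 4) & 0xF;   // next 4 bits
    int p    = (f >> 8) & 0x3F;  // top 6 bits
    System.out.println(p + " " + hdcs + " " + rrrr); // e.g. 41 8 14 for c4
}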