How does this print "hello world"? - java
I discovered this oddity:
for (long l = 4946144450195624l; l > 0; l >>= 5)
    System.out.print((char) (((l & 31 | 64) % 95) + 32));
Output:
hello world
How does this work?
The number 4946144450195624 fits in 64 bits, and its binary representation is:
10001100100100111110111111110111101100011000010101000
The program decodes one character for every 5-bit group, from right to left:
00100|01100|10010|01111|10111|11111|01111|01100|01100|00101|01000
d | l | r | o | w | | o | l | l | e | h
5-bit codification
For 5 bits, it is possible to represent 2⁵ = 32 characters. The English alphabet contains 26 letters, and this leaves room for 32 - 26 = 6 symbols
apart from letters. With this codification scheme, you can have all 26 (one case) English letters and 6 symbols (space being among them).
Algorithm description
The >>= 5 in the for loop jumps from group to group, and each 5-bit group is isolated by ANDing the number with the mask 31₁₀ = 11111₂ in the expression l & 31.
Now the code maps the 5-bit value to its corresponding 7-bit ASCII character. This is the tricky part. Check the binary representations for the lowercase
alphabet letters in the following table:
ASCII | ASCII | ASCII | Algorithm
character | decimal value | binary value | 5-bit codification
--------------------------------------------------------------
space | 32 | 0100000 | 11111
a | 97 | 1100001 | 00001
b | 98 | 1100010 | 00010
c | 99 | 1100011 | 00011
d | 100 | 1100100 | 00100
e | 101 | 1100101 | 00101
f | 102 | 1100110 | 00110
g | 103 | 1100111 | 00111
h | 104 | 1101000 | 01000
i | 105 | 1101001 | 01001
j | 106 | 1101010 | 01010
k | 107 | 1101011 | 01011
l | 108 | 1101100 | 01100
m | 109 | 1101101 | 01101
n | 110 | 1101110 | 01110
o | 111 | 1101111 | 01111
p | 112 | 1110000 | 10000
q | 113 | 1110001 | 10001
r | 114 | 1110010 | 10010
s | 115 | 1110011 | 10011
t | 116 | 1110100 | 10100
u | 117 | 1110101 | 10101
v | 118 | 1110110 | 10110
w | 119 | 1110111 | 10111
x | 120 | 1111000 | 11000
y | 121 | 1111001 | 11001
z | 122 | 1111010 | 11010
Here you can see that the ASCII characters we want to map begin with the 7th and 6th bits set (11xxxxx₂), except for space, which only has the 6th bit on. You could OR the 5-bit
code with 96 (96₁₀ = 1100000₂), and that would be enough to do the mapping, but it wouldn't work for space (darn space!).
So space has to be handled in the same expression as the other characters. To achieve this, the code turns the 7th bit on (but not the 6th) in the extracted 5-bit group with an OR 64 (64₁₀ = 1000000₂): (l & 31 | 64).
So far the 5-bit group is of the form 10xxxxx₂ (space would be 1011111₂ = 95₁₀).
If we can map space to 0 without affecting the other values, then we can turn the 6th bit on and that should be all.
Here is where the mod 95 part comes into play. Space is 1011111₂ = 95₁₀, so after the modulus
operation ((l & 31 | 64) % 95) only space goes back to 0; every letter is below 95 and stays unchanged. After this, the code turns the 6th bit on by adding 32₁₀ = 100000₂
to the previous result, (((l & 31 | 64) % 95) + 32), transforming the 5-bit value into a valid ASCII character.
isolates 5 bits ---+          +---- takes 'space' (and only 'space') back to 0
                   |          |
                   v          v
               ((l & 31 | 64) % 95) + 32
                        ^           ^
     turns the          |           |
     7th bit on --------+           +--- turns the 6th bit on
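The same steps can be spelled out one at a time. This is only a sketch of the expression from the question; the class and method names (FiveBitDecoder, decodeGroup, decode) are made up for illustration.

```java
class FiveBitDecoder {
    /** Maps one 5-bit code (0..31) to a character, mirroring ((l & 31 | 64) % 95) + 32. */
    static char decodeGroup(long group) {
        long withBit7 = group | 64;        // 10xxxxx: letter codes 1..26 become 65..90, space code 31 becomes 95
        long spaceToZero = withBit7 % 95;  // only 95 wraps around to 0; values below 95 are unchanged
        return (char) (spaceToZero + 32);  // letters land on 97..122, space lands on 32
    }

    /** Decodes a packed value, lowest 5-bit group first. */
    static String decode(long packed) {
        StringBuilder out = new StringBuilder();
        for (long l = packed; l > 0; l >>= 5) {
            out.append(decodeGroup(l & 31)); // isolate the current 5-bit group
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The constant from the question
        System.out.println(decode(4946144450195624L));
    }
}
```

Note that the loop stops as soon as the remaining value is 0, so the scheme relies on no character being encoded as 00000.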
The following code does the inverse process: given a lowercase string (maximum 12 characters), it returns the 64-bit long value that can be used with the OP's code:
public class D {
    public static void main(String... args) {
        String v = "hello test";
        int len = Math.min(12, v.length());
        long res = 0L;
        for (int i = 0; i < len; i++) {
            // Keep the low 5 bits: 'a'..'z' become 1..26, space (32) becomes 0.
            long c = (long) v.charAt(i) & 31;
            // (31 - c) / 31 is 1 only when c == 0, so space is stored as 31; letters are stored as c.
            res |= ((((31 - c) / 31) * 31) | c) << 5 * i;
        }
        System.out.println(res);
    }
}
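The branch-free expression in the encoder above may be easier to follow with an explicit conditional. The following is a sketch under the same assumptions (only 'a' to 'z' and space, at most 12 characters); FiveBitEncoder and encode are hypothetical names, and the input validation is an addition that the original code does not have.

```java
class FiveBitEncoder {
    /** Packs up to 12 characters (a-z or space) into a long, first character in the lowest 5 bits. */
    static long encode(String text) {
        if (text.length() > 12) {
            throw new IllegalArgumentException("at most 12 characters fit into a positive long");
        }
        long packed = 0L;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            long code;
            if (ch == ' ') {
                code = 31;                 // space gets the otherwise unused code 11111
            } else if (ch >= 'a' && ch <= 'z') {
                code = ch & 31;            // 'a' -> 1 ... 'z' -> 26
            } else {
                throw new IllegalArgumentException("unsupported character: " + ch);
            }
            packed |= code << (5 * i);     // place the code in the i-th 5-bit group
        }
        return packed;
    }

    public static void main(String[] args) {
        System.out.println(encode("hello world"));
    }
}
```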
The following Groovy script prints intermediate values.
String getBits(long l) {
    return Long.toBinaryString(l).padLeft(8, '0');
}
for (long l = 4946144450195624l; l > 0; l >>= 5) {
    println ''
    print String.valueOf(l).toString().padLeft(16, '0')
    print '|' + getBits((l & 31))
    print '|' + getBits(((l & 31 | 64)))
    print '|' + getBits(((l & 31 | 64) % 95))
    print '|' + getBits(((l & 31 | 64) % 95 + 32))
    print '|';
    System.out.print((char) (((l & 31 | 64) % 95) + 32));
}
Here it is:
4946144450195624|00001000|01001000|01001000|01101000|h
0154567014068613|00000101|01000101|01000101|01100101|e
0004830219189644|00001100|01001100|01001100|01101100|l
0000150944349676|00001100|01001100|01001100|01101100|l
0000004717010927|00001111|01001111|01001111|01101111|o
0000000147406591|00011111|01011111|00000000|00100000|
0000000004606455|00010111|01010111|01010111|01110111|w
0000000000143951|00001111|01001111|01001111|01101111|o
0000000000004498|00010010|01010010|01010010|01110010|r
0000000000000140|00001100|01001100|01001100|01101100|l
0000000000000004|00000100|01000100|01000100|01100100|d
Interesting!
Visible standard ASCII characters are in the range 32 to 126 (127 is the non-printing DEL).
That's why you see 32 and 95 (127 - 32) there.
In fact, each character is mapped to 5 bits here (you can work out the 5-bit combination for each character), and then all the bits are concatenated to form one large number.
Positive longs are 63-bit numbers, large enough to hold the encoded form of 12 characters. So a long is large enough to hold hello world, but for longer texts you would need larger numbers, or even a BigInteger.
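As a rough sketch of the BigInteger idea, the same 5-bit scheme can be applied to texts longer than 12 characters. The class BigFiveBitCodec and its methods are hypothetical, and the code assumes the input contains only 'a' to 'z' and spaces.

```java
import java.math.BigInteger;

class BigFiveBitCodec {
    /** Packs the text into a BigInteger, first character in the lowest 5 bits. */
    static BigInteger encode(String text) {
        BigInteger packed = BigInteger.ZERO;
        // Walk backwards so that the first character ends up in the lowest group.
        for (int i = text.length() - 1; i >= 0; i--) {
            char ch = text.charAt(i);
            int code = (ch == ' ') ? 31 : (ch & 31); // assumes only a-z and space
            packed = packed.shiftLeft(5).or(BigInteger.valueOf(code));
        }
        return packed;
    }

    /** Decodes with the same expression as the question, 5 bits at a time. */
    static String decode(BigInteger packed) {
        StringBuilder out = new StringBuilder();
        BigInteger mask = BigInteger.valueOf(31);
        for (BigInteger v = packed; v.signum() > 0; v = v.shiftRight(5)) {
            int group = v.and(mask).intValue();
            out.append((char) (((group | 64) % 95) + 32));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        BigInteger packed = encode(text);
        System.out.println(packed);
        System.out.println(decode(packed));
    }
}
```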
In an application we wanted to transfer visible English characters, Persian characters and symbols via SMS. As you see, there are 32 (number of Persian characters) + 95 (number of English characters and standard visible symbols) = 127 possible values, which can be represented with 7 bits.
We converted each 16-bit (UTF-16) character to 7 bits, and gained a compression ratio of more than 56%. So we could send texts of twice the length in the same number of SMSes. (Somehow, the same thing happened here.)
You are getting a result which happens to be the char representation of the values below:
104 -> h
101 -> e
108 -> l
108 -> l
111 -> o
32 -> (space)
119 -> w
111 -> o
114 -> r
108 -> l
100 -> d
You've encoded characters as 5-bit values and packed 11 of them into a 64 bit long.
(packedValues >> 5*i) & 31 is the i-th encoded value with a range 0-31.
The hard part, as you say, is encoding the space. The lowercase English letters occupy the contiguous range 97-122 in Unicode (and ASCII, and most other encodings), but the space is 32.
To overcome this, you used some arithmetic. ((x+64)%95)+32 is almost the same as x + 96 (note how bitwise OR is equivalent to addition, in this case), but when x=31, we get 32.
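A quick way to convince yourself of this is to compare the two formulas for all 32 possible codes. The sketch below uses hypothetical names (SpaceTrickCheck, viaModulo) and simply prints both results side by side.

```java
class SpaceTrickCheck {
    /** The expression from the question, applied to one 5-bit code x (0..31). */
    static int viaModulo(int x) {
        return ((x | 64) % 95) + 32;
    }

    /** True if viaModulo(x) == x + 96 for every code except 31, and 32 for code 31. */
    static boolean matchesExceptSpace() {
        for (int x = 0; x < 31; x++) {
            if (viaModulo(x) != x + 96) {
                return false;
            }
        }
        return viaModulo(31) == 32;
    }

    public static void main(String[] args) {
        for (int x = 0; x < 32; x++) {
            // For x < 32, x | 64 equals x + 64 because bit 6 is never set in x.
            System.out.println(x + ": modulo form = " + viaModulo(x) + ", x + 96 = " + (x + 96));
        }
        System.out.println("matches except for space: " + matchesExceptSpace());
    }
}
```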
It prints "hello world" for a similar reason this does:
for (int k=1587463874; k>0; k>>=3)
System.out.print((char) (100 + Math.pow(2,2*(((k&7^1)-1)>>3 + 1) + (k&7&3)) + 10*((k&7)>>2) + (((k&7)-7)>>3) + 1 - ((-(k&7^5)>>3) + 1)*80));
But for a somewhat different reason than this:
for (int k=2011378; k>0; k>>=2)
System.out.print((char) (110 + Math.pow(2,2*(((k^1)-1)>>21 + 1) + (k&3)) - ((k&8192)/8192 + 7.9*(-(k^1964)>>21) - .1*(-((k&35)^35)>>21) + .3*(-((k&120)^120)>>21) + (-((k|7)^7)>>21) + 9.1)*10));
I mostly work with Oracle databases, so I would use some Oracle knowledge to interpret and explain :-)
Let's convert the number 4946144450195624 into binary. For that I use a small function called dec2bin, i.e., decimal-to-binary.
SQL> CREATE OR REPLACE FUNCTION dec2bin (N in number) RETURN varchar2 IS
2 binval varchar2(64);
3 N2 number := N;
4 BEGIN
5 while ( N2 > 0 ) loop
6 binval := mod(N2, 2) || binval;
7 N2 := trunc( N2 / 2 );
8 end loop;
9 return binval;
10 END dec2bin;
11 /
Function created.
SQL> show errors
No errors.
SQL>
Let's use the function to get the binary value -
SQL> SELECT dec2bin(4946144450195624) FROM dual;
DEC2BIN(4946144450195624)
--------------------------------------------------------------------------------
10001100100100111110111111110111101100011000010101000
SQL>
Now the catch is the 5-bit conversion. Start grouping from right to left with 5 digits in each group. We get:
100|01100|10010|01111|10111|11111|01111|01100|01100|00101|01000
We are finally left with just 3 digits at the left end, because the binary representation has 53 digits in total.
SQL> SELECT LENGTH(dec2bin(4946144450195624)) FROM dual;
LENGTH(DEC2BIN(4946144450195624))
---------------------------------
53
SQL>
hello world has a total of 11 characters (including the space), so we need to pad the leftmost group, which was left with just three bits after grouping, with two leading zero bits.
So, now we have:
00100|01100|10010|01111|10111|11111|01111|01100|01100|00101|01000
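For readers without an Oracle instance at hand, the same grouping can be sketched in Java, the language of the question. FiveBitGroups and groups are hypothetical names; the method pads the binary string on the left to a multiple of 5 digits and then splits it.

```java
class FiveBitGroups {
    /** Formats value in binary, left-padded with zeros to a multiple of 5 digits, in groups of 5. */
    static String groups(long value) {
        StringBuilder bits = new StringBuilder(Long.toBinaryString(value));
        while (bits.length() % 5 != 0) {
            bits.insert(0, '0');           // pad the leftmost group with leading zeros
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bits.length(); i += 5) {
            if (i > 0) {
                out.append('|');
            }
            out.append(bits, i, i + 5);    // copy one 5-digit group
        }
        return out.toString();
    }

    public static void main(String[] args) {
        long n = 4946144450195624L;
        System.out.println(Long.toBinaryString(n).length()); // number of binary digits
        System.out.println(groups(n));
    }
}
```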
Now we need to convert each group to a 7-bit ASCII value. For the letters it is easy; we just need to set the 6th and 7th bits, i.e. prepend 11 to each 5-bit group above.
That gives:
1100100|1101100|1110010|1101111|1110111|1111111|1101111|1101100|1101100|1100101|1101000
Let's interpret the binary values. I will use the binary to decimal conversion function.
SQL> CREATE OR REPLACE FUNCTION bin2dec (binval in char) RETURN number IS
2 i number;
3 digits number;
4 result number := 0;
5 current_digit char(1);
6 current_digit_dec number;
7 BEGIN
8 digits := length(binval);
9 for i in 1..digits loop
10 current_digit := SUBSTR(binval, i, 1);
11 current_digit_dec := to_number(current_digit);
12 result := (result * 2) + current_digit_dec;
13 end loop;
14 return result;
15 END bin2dec;
16 /
Function created.
SQL> show errors;
No errors.
SQL>
Let's look at each binary value -
SQL> set linesize 1000
SQL>
SQL> SELECT bin2dec('1100100') val,
2 bin2dec('1101100') val,
3 bin2dec('1110010') val,
4 bin2dec('1101111') val,
5 bin2dec('1110111') val,
6 bin2dec('1111111') val,
7 bin2dec('1101111') val,
8 bin2dec('1101100') val,
9 bin2dec('1101100') val,
10 bin2dec('1100101') val,
11 bin2dec('1101000') val
12 FROM dual;
VAL VAL VAL VAL VAL VAL VAL VAL VAL VAL VAL
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
100 108 114 111 119 127 111 108 108 101 104
SQL>
Let's look at what characters they are:
SQL> SELECT chr(bin2dec('1100100')) character,
2 chr(bin2dec('1101100')) character,
3 chr(bin2dec('1110010')) character,
4 chr(bin2dec('1101111')) character,
5 chr(bin2dec('1110111')) character,
6 chr(bin2dec('1111111')) character,
7 chr(bin2dec('1101111')) character,
8 chr(bin2dec('1101100')) character,
9 chr(bin2dec('1101100')) character,
10 chr(bin2dec('1100101')) character,
11 chr(bin2dec('1101000')) character
12 FROM dual;
CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER
--------- --------- --------- --------- --------- --------- --------- --------- --------- --------- ---------
d l r o w ⌂ o l l e h
SQL>
So, what do we get in the output?
d l r o w ⌂ o l l e h
That is hello⌂world in reverse. The only issue is the space, and the reason is well explained by @higuaro in his answer. I honestly couldn't interpret the space issue myself at the first attempt, until I saw the explanation given in his answer.
I found the code slightly easier to understand when translated into PHP, as follows:
<?php
$result = 0;
$bignum = 4946144450195624;
for (; $bignum > 0; $bignum >>= 5) {
    $result = (($bignum & 31 | 64) % 95) + 32;
    echo chr($result);
}
Use
out.println((char) (((l & 31 | 64) % 95) + 32 / 1002439 * 1002439));
to make it capitalised.
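The reason this works is integer arithmetic: 32 / 1002439 is 0, so 0 * 1002439 is added instead of 32 and the 6th bit is never turned on. A side effect is that the space code ends up as character 0 (NUL) rather than a space. The sketch below collects the characters into a string instead of printing them; UpperCaseVariant and decodeUpper are hypothetical names, and the snippet in the answer additionally assumes a static import of System.out.

```java
class UpperCaseVariant {
    /** Same loop as the question, but with the "+ 32" neutralised by integer division. */
    static String decodeUpper(long packed) {
        StringBuilder out = new StringBuilder();
        for (long l = packed; l > 0; l >>= 5) {
            // 32 / 1002439 * 1002439 evaluates to 0, so letters stay in the range 65..90
            out.append((char) (((l & 31 | 64) % 95) + 32 / 1002439 * 1002439));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = decodeUpper(4946144450195624L);
        System.out.println(s.replace('\0', ' ')); // show the NUL character as a space
    }
}
```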
Related
java.util.UUID.randomUUID().toString() length
Does java.util.UUID.randomUUID().toString() length always equal to 36? I was not able to find info on that. Here it is said only the following: public static UUID randomUUID() Static factory to retrieve a type 4 (pseudo randomly generated) UUID. The UUID is generated using a cryptographically strong pseudo random number generator. Returns: A randomly generated UUID And that type 4 tells me nothing. I do not know what type 4 means in the case.
Does java.util.UUID.randomUUID().toString() length always equal to 36? Yes, it does. A UUID is actually a 128-bit value (2 longs). Representing 128 bits as a hex string takes 128/4 = 32 characters (each hex character encodes 4 bits). The string format also contains 4 hyphens (-), which is why the length is 36. Example: 54947df8-0e9e-4471-a2f9-9af509fb5889 is 32 hex characters + 4 hyphens = 36 characters. So the length will always be the same.
Update: "I do not know what type 4 means in the case." FYI: there are several ways to generate a UUID. Here type 4 means that the UUID is generated using a random or pseudo-random number. From Wikipedia, Universally_unique_identifier#Versions: For both variants 1 and 2, five "versions" are defined in the standards, and each version may be more appropriate than the others in specific use cases. Version is indicated by the M in the string representation. Version 1 UUIDs are generated from a time and a node id (usually the MAC address); version 2 UUIDs are generated from an identifier (usually a group or user id), time, and a node id; versions 3 and 5 produce deterministic UUIDs generated by hashing a namespace identifier and name; and version 4 UUIDs are generated using a random or pseudo-random number.
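The claim is easy to check empirically. The sketch below prints the length and the version of a freshly generated UUID; the class UuidLengthCheck and the helper hasCanonicalLayout are hypothetical names introduced for illustration.

```java
import java.util.UUID;

class UuidLengthCheck {
    /** True if the text has the 8-4-4-4-12 layout of hex digits separated by hyphens. */
    static boolean hasCanonicalLayout(String text) {
        return text.matches(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        String text = id.toString();
        System.out.println(text);
        System.out.println("length = " + text.length());   // 32 hex digits + 4 hyphens
        System.out.println("version = " + id.version());   // randomUUID() produces type 4
        System.out.println("canonical layout = " + hasCanonicalLayout(text));
    }
}
```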
You can also encode the 16 binary bytes of a UUIDv4 as 24 ASCII characters using Base64, instead of encoding them as ASCII hex (32 characters).
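A minimal sketch of that idea, using only java.util.Base64 and java.nio.ByteBuffer from the standard library. The class UuidBase64 and its method names are hypothetical; the 24-character figure includes the two trailing "=" padding characters of standard Base64.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

class UuidBase64 {
    /** Encodes the 16 bytes of the UUID as standard (padded) Base64. */
    static String toBase64(UUID id) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(id.getMostSignificantBits());
        buf.putLong(id.getLeastSignificantBits());
        return Base64.getEncoder().encodeToString(buf.array());
    }

    /** Reverses toBase64. */
    static UUID fromBase64(String text) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getDecoder().decode(text));
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        String encoded = toBase64(id);
        System.out.println(id + " -> " + encoded + " (" + encoded.length() + " chars)");
        System.out.println("round trip ok: " + id.equals(fromBase64(encoded)));
    }
}
```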
For those who, like me, start googling before reading the javadoc, here is the javadoc ;) UUID.toString
For those who don't know how to read a grammar tree: read it from bottom to top.
a hexDigit is one char
a hexOctet is 2 hexDigits = 2 chars
a node is 6 * hexOctet = 6 * 2 hexDigits = 12 chars
a variant_and_sequence is 2 * hexOctet = 2 * 2 hexDigits = 4 chars
a time_high_and_version is 2 * hexOctet = 2 * 2 hexDigits = 4 chars
a time_mid is 2 * hexOctet = 2 * 2 hexDigits = 4 chars
a time_low is 4 * hexOctet = 4 * 2 hexDigits = 8 chars
and finally, a UUID is <time_low> "-" <time_mid> "-" <time_high_and_version> "-" <variant_and_sequence> "-" <node> = 8 chars + 1 char + 4 chars + 1 char + 4 chars + 1 char + 4 chars + 1 char + 12 chars = 36 chars! That is 128 bits of data + 4 hyphens, as stated previously.
The UUID string representation is as described by this BNF:
UUID = <time_low> "-" <time_mid> "-" <time_high_and_version> "-" <variant_and_sequence> "-" <node>
time_low = 4*<hexOctet>
time_mid = 2*<hexOctet>
time_high_and_version = 2*<hexOctet>
variant_and_sequence = 2*<hexOctet>
node = 6*<hexOctet>
hexOctet = <hexDigit><hexDigit>
hexDigit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "a" | "b" | "c" | "d" | "e" | "f" | "A" | "B" | "C" | "D" | "E" | "F"
What is the purpose of left shifting zero by any amount?
Upon reading the ASM 4.1 source code I've found instances of the following: int ASM4 = 4 << 16 | 0 << 8 | 0; int ASM5 = 5 << 16 | 0 << 8 | 0; Do these left shifts of zero by 8 do anything to the expression, or the 'or' by 0 for that matter? Wouldn't it be equivalent to just have: int ASM4 = 4 << 16; int ASM5 = 5 << 16;
Indeed they are equivalent, but one possible explanation is that they wanted to map the version numbers, including both the major and minor numbers, to a unique ID in their code. So in the following: int ASM4 = 4 << 16 | 0 << 8 | 0; // this looks like 4.0.0 int ASM5 = 5 << 16 | 0 << 8 | 0; // this looks like 5.0.0 The 4 and 5 represent versions 4 and 5 respectively, the zero in 0 << 8 could be the minor number, and the last zero could be another minor number, as in 4.0.0 and 5.0.0. But that's my guess anyway. You'd really have to ask the authors.
In context: // ASM API versions int ASM4 = 4 << 16 | 0 << 8 | 0; int ASM5 = 5 << 16 | 0 << 8 | 0; Yes, this is equivalent to int ASM4 = 4 << 16; int ASM5 = 5 << 16; This is just written that way to make it clear that we are setting the 3rd byte to 4, and both lower bytes to 0. Alternatively, that it is a version number that should be read as 4.0.0.
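To illustrate the "version number in three bytes" reading, here is a sketch that packs and unpacks such a value. This follows the guess made in the answers, not anything confirmed by the ASM authors; PackedVersion and its methods are hypothetical, and the 8-bit width of the minor and patch fields is an assumption.

```java
class PackedVersion {
    /** Packs major.minor.patch into one int, one byte per component (assumed layout). */
    static int pack(int major, int minor, int patch) {
        return major << 16 | minor << 8 | patch;
    }

    static int major(int version) {
        return version >>> 16;
    }

    static int minor(int version) {
        return (version >>> 8) & 0xFF;
    }

    static int patch(int version) {
        return version & 0xFF;
    }

    public static void main(String[] args) {
        int asm4 = pack(4, 0, 0);
        System.out.println(asm4 == 4 << 16);   // the zero shifts contribute nothing
        int v = pack(5, 1, 2);
        System.out.println(major(v) + "." + minor(v) + "." + patch(v));
    }
}
```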
It indeed serves no purpose, but then it is neatly and visually aligned so that the ASM developers know about the opcodes versions (if I'm not mistaken, this is the OpCodes interface you're looking at here). The same way that you'd use 1 << 0 vs 1 << 1, etc.
new to java - trying to understand: checker |= (1 << val)
the following code will check to see if you have any duplicate characters in the string, but i don't understand the if clause: public static boolean isUniqueChars(String str) { int checker = 0; for (int i = 0; i < str.length(); ++i) { int val = str.charAt(i) - 'a'; if ((checker & (1 << val)) > 0) return false; checker |= (1 << val); } return true; } I tried to look up some references, I am new to bit shifting, all i understand is that << shifts the binary number left or right. Can you explain to me how checker |= (1 << val) works ? and that 'if' statement as well.
I was also going through the book Cracking the Coding Interview and ended up googling for a clear explanation. Finally I understood the concept. Here is the approach.
Note: we will assume, in the code below, that the string contains only lowercase 'a' through 'z'. This allows us to use just a single int. A Java int has 32 bits and there are 26 lowercase letters, so we can store a 0/1 (false or true) flag for every letter inside one int. It is similar to bool visited[32]. A bool uses 1 byte, hence you need 32 bytes for storing bool visited[32]. Bit masking is a space optimization of this.
Let's start. You are looping through all the characters in the string. Suppose on the i-th iteration you found the character 'b'. You calculate its 0-based index: int val = str.charAt(i) - 'a'; For 'b' it is 1, i.e. 98 - 97. Now, using the left shift operator, we find the value of 2^1 => 2: (1 << val) // 1 << 1 => 10 (binary)
Now let us see how bitwise & works: 0 & 0 -> 0 0 & 1 -> 0 1 & 0 -> 0 1 & 1 -> 1 So with the code (checker & (1 << val)) we check whether checker[val] == 0. Suppose we had already encountered 'b':
check = 0000 0000 0000 0000 0000 1000 1000 0010
& 'b' = 0000 0000 0000 0000 0000 0000 0000 0010
----------------------------------------------
result= 0000 0000 0000 0000 0000 0000 0000 0010
i.e. the decimal value 2, which is > 0. So we finally understand this part: if ((checker & (1 << val)) > 0) return false;
Now if 'b' was not encountered, then we set the second bit of checker using bitwise OR. (This part is called bit masking.) OR's truth table: 0 | 0 -> 0 0 | 1 -> 1 1 | 0 -> 1 1 | 1 -> 1 So
check = 0000 0000 0000 0000 0000 1000 1000 0000
| 'b' = 0000 0000 0000 0000 0000 0000 0000 0010
----------------------------------------------
result= 0000 0000 0000 0000 0000 1000 1000 0010
So that explains this part: checker |= (1 << val); // checker = checker | (1 << val); I hope this helped someone!
Seems like I am late to the party, but let me take a stab at the explanation.
First of all, the AND, i.e. the & operation: 0 & 0 = 0 1 & 0 = 0 0 & 1 = 0 1 & 1 = 1 So basically, if you are given a bit and you want to find out whether it is 1 or 0, you just & it with a 1. If the result is 1 then you had a 1, else you had a 0. We will use this property of & below.
The OR, i.e. the | operation: 0 | 0 = 0 1 | 0 = 1 0 | 1 = 1 1 | 1 = 1 So basically, if you are given a bit and you want the output to always be 1, you | it with a 1.
Now, in Java the type int is 4 bytes, i.e. 32 bits. Thus we can use an int itself as a data structure to store 32 states, or booleans in simpler terms, since a bit can be either 0 or 1, i.e. false or true. Since we assume that our string is composed of only lowercase characters, we have enough space inside our int to store a boolean for each of the 26 chars! So first we initialize our data structure, which we call checker, to 0, which is nothing but 32 zeros: 00000000000000000000000000000000. So far so good?
Now we go through our string, and for each character we first get an integer representation of the character: int val = str.charAt(i) - 'a'; We subtract 'a' from it because we want our integer to be 0-based. So the vals are:
a = 0 i.e. `00000000000000000000000000000000`
b = 1 i.e. `00000000000000000000000000000001`
c = 2 i.e. `00000000000000000000000000000010`
d = 3 i.e. `00000000000000000000000000000011`
These are only indexes. To turn an index into a flag, we left shift 1 by val, i.e. (1 << val), so each character gets a value with a single 1 bit and 31 zero bits:
a = 1 i.e. `00000000000000000000000000000001`
b = 2 i.e. `00000000000000000000000000000010`
c = 4 i.e. `00000000000000000000000000000100`
d = 8 i.e. `00000000000000000000000000001000`
We are done with the setup. Now we do 2 things. First, assume all characters are different. For every char we encounter, we want our data structure, i.e. checker, to have a 1 for that char.
So we use the OR property described above to generate a 1 in our data structure, and hence we do: checker = checker | (1 << val); Thus checker stores a 1 for every character we encounter. Second, we come to the part where characters can repeat. Before we do step 1, we want to make sure that checker does not already have a 1 at the position corresponding to the current character. So we check the value of checker & (1 << val). With the help of the AND property described above, if this operation gives a non-zero result, then checker already had a 1 at that position, which means we must have encountered this character before. So we immediately return false. That's it. If all our & checks return 0, we finally return true, meaning there were no character repetitions.
1 << val is the same as 2 to the degree of val. So it's a number which has just one one in its binary representation (the one is at position val+1, if you count from the right side of the number to the left one). a |= b means basically this: set in a all binary flags/ones from the binary representation of b (and keep those in a which were already set).
The other answers explain how the operators are used, but I don't think they touch on the logic behind this code. Basically the code 1 << val shifts a 1 in a binary number to a unique place for each character, for example a-0001 b-0010 c-0100 d-1000. As you can see, the place of the 1 is different for different characters. checker = checker | (1 << val): here checker is ORed with that value (basically storing a 1 at the same place as it was in 1 << val), so checker knows which characters have already occurred. Let's say that after the occurrence of a, b, c, d checker would look like this: 0000 1111. Finally, if ((checker & (1 << val)) > 0) checks whether that character has already occurred before, and if so returns false. To explain, you should know a little about the AND (&) operation: 1&1->1 0&0->0 1&0->0. So checker currently has a 1 in the places whose corresponding characters have already occurred, and the only way the expression inside the if statement is true is if a character occurs twice, which leads to 1&1->1 > 0.
This sets the 'val'th bit from the right to 1. 1 << val is a 1 shifted left val times. The rest of the value is 0. The line is equivalent to checker = checker | (1 << val). Or-ing with a 0 bit does nothing, since x | 0 == x. But or-ing with 1 always results in 1. So this turns (only) that bit on. The if statement is similar, in that it is checking to see if the bit is already on. The mask value 1 << val is all 0s except for a single 1. And-ing with 0 always produces 0, so most bits in the result are 0. x & 1 == x, so this will be non-zero only if that bit at val is not 0.
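To watch the mask evolve, the method from the question can be instrumented to print checker after each character. This is a sketch with hypothetical names (BitmaskTrace, trace); it also rejects characters outside 'a' to 'z', which the original code silently assumes, and compares with != 0 rather than > 0 so that the test would still hold if bit 31 were ever used.

```java
class BitmaskTrace {
    /** Same algorithm as in the question, with input validation added. */
    static boolean isUniqueChars(String str) {
        int checker = 0;
        for (int i = 0; i < str.length(); i++) {
            int val = str.charAt(i) - 'a';
            if (val < 0 || val > 25) {
                throw new IllegalArgumentException("only a-z is supported");
            }
            int mask = 1 << val;           // a single 1 bit at position val
            if ((checker & mask) != 0) {   // bit already set: character seen before
                return false;
            }
            checker |= mask;               // remember this character
        }
        return true;
    }

    /** Prints the mask and the accumulated checker for each character (no duplicate check). */
    static void trace(String str) {
        int checker = 0;
        for (int i = 0; i < str.length(); i++) {
            int mask = 1 << (str.charAt(i) - 'a');
            checker |= mask;
            System.out.println(str.charAt(i) + " mask=" + Integer.toBinaryString(mask)
                + " checker=" + Integer.toBinaryString(checker));
        }
    }

    public static void main(String[] args) {
        trace("abcd");
        System.out.println(isUniqueChars("abcd"));
        System.out.println(isUniqueChars("hello"));
    }
}
```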
checker |= (1 << val) is the same as checker = checker | (1 << val). << is the left bit shift, as you said. 1 << val means a 1 shifted val digits to the left. Example: 1 << 4 is 10000 in binary. A left bit shift by one is the same as multiplying by 2, so 4 left bit shifts multiply 1 by 2 four times: 1 * 2 = 2 (1) 2 * 2 = 4 (2) 4 * 2 = 8 (3) 8 * 2 = 16 (4). The | operator is bitwise or. It's like a normal or for one bit. If we have more than one bit, the or operation is done for every bit. Example: 110 | 011 = 111. You can use that for setting flags (making a bit 1). The if condition is similar to that, but has the & operator, which is bitwise and. It is mainly used to mask a binary number. Example: 110 & 100 = 100. So your code just checks whether the bit at place val is 1 and if so returns false; otherwise it sets the bit at place val to 1.
It means do a binary OR on the values checker and (1 << val) (which is 1, left shifted val times) and save the newly created value in checker. Left Shift (<<): shift all the binary digits left by the given number of places. Effectively this multiplies the number by 2 to the power of val, i.e. multiplies the number by 2 val times. Bitwise OR (|): for each binary digit of the left and right values, if there is a 1 in that place in either of the two numbers then keep it. Augmented Assignment (|=): do the operation (in this case bitwise OR) and assign the value to the left-hand variable. This works with many operators, such as: a += b, add b to a and save the new value in a. a *= b, multiply a by b and save the new value in a.
Bitwise shift works as follows. Example: a = 15 (bit representation: 0000 1111). The operation a << 2 shifts the bit representation by 2 positions to the left. So a << 2 is 0011 1100 = 0*2^7+0*2^6+1*2^5+1*2^4+1*2^3+1*2^2+0*2^1+0*2^0 = 1*2^5+1*2^4+1*2^3+1*2^2 = 32+16+8+4 = 60, hence a << 2 = 60. Now: checker & (1<<val) will always be greater than 0 if a 1 is already present at the 1<<val position. Hence we can return false. Else we will assign checker value of 1 at 1
I've been working on the algorithm and here's what I noticed would also work. It makes the algorithm easier to understand when you exercise it by hand:
public static boolean isUniqueChars(String str) {
    if (str.length() > 26) { // Only 26 characters
        return false;
    }
    int checker = 0;
    for (int i = 0; i < str.length(); i++) {
        int val = str.charAt(i) - 'a';
        int newValue = (int) Math.pow(2, val); // int newValue = 1 << val
        if ((checker & newValue) > 0) return false;
        checker += newValue; // checker |= newValue
    }
    return true;
}
When we get the value of val (0-25), we could either shift 1 to the left by the value of val, or we could use powers of 2. Also, for as long as ((checker & newValue) > 0) is false, the new checker value is generated by adding newValue to the old checker value.
public static boolean isUniqueChars(String str) { int checker = 0; for (int i = 0; i < str.length(); ++i) { int val = str.charAt(i) - 'a'; if ((checker & (1 << val)) > 0) return false; checker |= (1 << val); } return true; } 1 << val uses the left shift operator. Let us say we have the character z. The ASCII code of z is 122, so val = 'z' - 'a' = 122 - 97 = 25. 1 << 25 is 1 * 2^25 = 33554432. In binary that is 10000000000000000000000000. If checker has a 1 in its 26th bit, then the condition in if ((checker & (1 << val)) > 0) is true and isUniqueChars returns false. Otherwise checker turns its 26th bit on: the |= operator (bitwise OR and assignment) ORs checker with 10000000000000000000000000 and assigns the result to checker.
Bitwise operator unexpected behavior
Can someone explain this java bitwise operator behavior?? System.out.println(010 | 4); // --> 12 System.out.println(10 | 4); // --> 14 Thank you!
The first number is interpreted as octal. So 010 == 8. Starting from that, it is easy to see, that 8d | 4d == 1000b | 0100b == 1100b == 12d The second number is interpreted to be decimal, which yields 10d | 4d == 1010b | 0100b == 1110b == 14d (Where d indicates a decimal number and b indicates a binary one.)
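The difference between the literal forms can be seen directly. In the sketch below (OctalLiteralDemo is a hypothetical name), the binary literals with the 0b prefix require Java 7 or later; note also that Integer.parseInt treats "010" as decimal, while Integer.decode follows the literal rules and treats it as octal.

```java
class OctalLiteralDemo {
    public static void main(String[] args) {
        System.out.println(010);             // octal literal, value 8
        System.out.println(010 | 4);         // 1000b | 0100b = 1100b
        System.out.println(10 | 4);          // 1010b | 0100b = 1110b
        System.out.println(0b1000 | 0b0100); // same as 010 | 4, written in binary
        System.out.println(Integer.toBinaryString(010 | 4));
        System.out.println(Integer.parseInt("010")); // parsed as decimal
        System.out.println(Integer.decode("010"));   // parsed as octal
    }
}
```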
Wrapping 4 integers in a 64 bit long - java bitwise
Alright, so I have 4 integers I want to wrap in a long. The 4 integers each contain 3 values, positioned in the first 2 bytes:
+--------+--------+
|xxpppppp|hdcsrrrr|
+--------+--------+
{pppppp} represents one value, {hdcs} represents the second and {rrrr} the last. I want to pack 4 of these integers in a long. I've tried the following: ordinal = (c1.ordinal() << (14*3) | c2.ordinal() << (14*2) | c3.ordinal() << 14 | c4.ordinal()); where c1.ordinal()...c4.ordinal() are the integers to wrap. This does not seem to work when I run a test. Let's say I want to look up the values of the last integer in the long, c4.ordinal(), where {pppppp} = 41, {hdcs} = 8 and {rrrr} = 14. I get the following results: System.out.println(c4.ordinal() & 0xf); //Prints 14 System.out.println(hand.ordinal() & 0xf); // Prints 14 - correct System.out.println(c4.ordinal() >> 4 & 0xf); // Prints 8 System.out.println(hand.ordinal() >> 4 & 0xf); // Prints 8 - correct System.out.println(c4.ordinal() >> 8 & 0x3f); // Prints 41 System.out.println(hand.ordinal() >> 8 & 0x3f); // Prints 61 - NOT correct!
Now, the following is weird to me. If I remove the first two integers and only wrap the last two, like this: ordinal = (c3.ordinal() << 14 | c4.ordinal()); and run the same test, I get the correct result: System.out.println(c4.ordinal() >> 8 & 0x3f); // Prints 41 System.out.println(hand.ordinal() >> 8 & 0x3f); // Prints 41 - correct!
I have no idea what's wrong, and it does not make any sense to me that I get the correct answer if I remove the first two integers. I'm starting to think this might have to do with the long datatype, but I've not found anything yet that supports this theory.
Even though you are assigning the result to a long, all of the operations are performed with int values, and so the high-order bits are lost. Force "promotion" to a long by explicitly widening the values to a long. long ordinal = (long) c1.ordinal() << (14*3) | (long) c2.ordinal() << (14*2) | (long) c3.ordinal() << 14 | (long) c4.ordinal(); Also, unless you are positive that the top two bits of each value are zero, you could run into other problems. You may wish to mask these off for safety's sake: long ordinal = (c1.ordinal() & 0x3FFFL) << (14*3) | (c2.ordinal() & 0x3FFFL) << (14*2) | (c3.ordinal() & 0x3FFFL) << 14 | (c4.ordinal() & 0x3FFFL);
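A self-contained sketch of the masked version, with an unpacking helper added for the round trip. FourFieldPacker, pack and unpack are hypothetical names, and the sample value 10638 is simply the question's example fields combined (41 << 8 | 8 << 4 | 14). One extra detail worth knowing: for int operands Java uses only the low 5 bits of the shift distance, so an int shifted by 42 is actually shifted by 10, which is why the higher fields overlapped the lower ones in the original code.

```java
class FourFieldPacker {
    static final long MASK_14 = 0x3FFFL;   // the low 14 bits

    /** Packs four 14-bit values into a long; c4 ends up in the lowest 14 bits. */
    static long pack(int c1, int c2, int c3, int c4) {
        // Masking with a long constant widens each value before the shift.
        return (c1 & MASK_14) << (14 * 3)
             | (c2 & MASK_14) << (14 * 2)
             | (c3 & MASK_14) << 14
             | (c4 & MASK_14);
    }

    /** Extracts the field at the given index; index 0 is c4, index 3 is c1. */
    static int unpack(long packed, int index) {
        return (int) ((packed >>> (14 * index)) & MASK_14);
    }

    public static void main(String[] args) {
        int card = 41 << 8 | 8 << 4 | 14;  // pppppp = 41, hdcs = 8, rrrr = 14
        long hand = pack(card, 1, 2, card);
        System.out.println(unpack(hand, 0) >> 8 & 0x3f);
        System.out.println(unpack(hand, 3) >> 8 & 0x3f);
        // With int operands the shift distance wraps around: 42 & 31 == 10.
        System.out.println((card << 42) == (card << 10));
    }
}
```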