So I'm making a chess engine in Java, which involves a lot of bit operations, and I have been looking at some C code for inspiration. In that code, the author uses this print loop:
for (int rank = 0; rank < 8; rank++) {
    for (int file = 0; file < 8; file++) {
        int square = rank * 8 + file;
        printf("%d", (bitboard & (1ULL << square)) ? 1 : 0);
    }
}
The whole point of this method is that it loops through a long with 64 bits in it and prints out an 8x8 grid representing a chessboard, with 1s for occupied squares and 0s for empty squares.
Now I am familiar with everything in this code block except 1ULL. What does it represent? Is there a way I can use this in Java, or do I not even need to worry about it?
1ULL is an unsigned long long literal: the number 1, as an unsigned long long, which is at least 64 bits wide. The unsigned part doesn't actually matter here, so 1L << square would do the job. (Java also has these literal suffixes, but only F, D, and L, for float, double, and long.)
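For reference, a minimal Java sketch of the same loop (bitboard and its sample value are assumptions for illustration):
long bitboard = 0x000000000000FF00L; // e.g. one rank of pawns

for (int rank = 0; rank < 8; rank++) {
    for (int file = 0; file < 8; file++) {
        int square = rank * 8 + file;
        // Java's long is signed, but shifting and masking behave identically here
        System.out.print((bitboard & (1L << square)) != 0 ? 1 : 0);
    }
    System.out.println(); // newline after each rank
}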
Just when I thought I had a fair grasp on how Java treats all integers/bytes etc. as signed numbers, it hit me with another curveball and got me wondering whether I really understand this treatment after all.
This is a piece of assembly code that is supposed to jump to an address if a condition is met (which it is). The PC just before the jump is C838h, and after the condition check it is supposed to become C838h + FCh (h = hex), which I thought would be treated as signed, so the PC would jump backwards: FCh = -4 as a two's complement negative number. But to my surprise, Java ADDED FCh to the PC, making it incorrectly jump to C934h instead of back to C834h.
C832: LD B,04 06 0004 (PC=C834)
C834: LD (HL), A 77 9800 (PC=C835)
C835: INC L:00 2C 0001 (PC=C836)
C836: JR NZ, n 20 00FC (PC=C934)
I tested this in Java and indeed the result was the same:
int a = 0xC838;
int b = 0xFC;
int result = a + b;
System.out.printf("%04X\n", result); //prints C934 (incorrect)
To fix this I had to cast FCh to a byte after checking that its most significant bit is 1, which it is: 11111100
int a = 0xC838;
int b = (byte) 0xFC;
int result = a + b;
System.out.printf("%04X\n", result); //prints C834 (correct)
In short, I guess my question is: I thought Java would know that FCh is a negative number, but that is not the case unless I cast it to a byte. Why? Sorry, I know this question has been asked many times, and I seem to keep asking it myself a lot.
0xfc is a positive number. If you want a negative number, then write a negative number. -0x4 would do just fine.
But if you want to apply this to non-constant data, you'll need to tell Java that you want it sign-extended in some way.
The core of the problem is that you have a 32-bit signed integer, but you want it treated like an 8-bit signed integer. The easiest way to achieve that would be to just use byte as you did above.
If you really don't want to write byte, you can write (0xfc << 24) >> 24:
class Main
{
    public static void main(String[] args)
    {
        int a = 0xC838;
        int b = (0xfc << 24) >> 24;
        int result = a + b;
        System.out.printf("%04X\n", result);
    }
}
(The 24 derives from the difference of the sizes of int (32 bits) and byte (8 bits)).
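If you need this for non-constant data in more places, the same shift trick generalizes into a small helper (signExtendByte is a hypothetical name for this sketch):
// Sign-extend the low 8 bits of an int; equivalent to (byte) value widened back to int.
static int signExtendByte(int value) {
    return (value << 24) >> 24;
}
For example, signExtendByte(0xFC) == -4, so 0xC838 + signExtendByte(0xFC) == 0xC834.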
The algorithm below converts hex to decimal, but I'm confused: how does this solution work?
public static int hex2decimal(String s) {
    String digits = "0123456789ABCDEF";
    s = s.toUpperCase();
    int val = 0;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        // indexOf maps each hex digit to its numeric value (e.g. 'A' -> 10)
        int d = digits.indexOf(c);
        // shift the accumulated value one hex place left, then add the new digit
        val = 16 * val + d;
    }
    return val;
}
I had known only one approach for doing this before I found this one.
I mean, everyone knows:
X*16^Y, where X is each digit's value and Y is that digit's position (counting from the end to the start).
So converting DA145 to decimal would be:
(5 * 16^0) + (4 * 16^1) + (1 * 16^2) + (10 * 16^3) + (13 * 16^4)
This algorithm uses the fact that we can compute 16^Y by repeatedly multiplying by 16, and that we can factor those repeated multiplications out. Your example then becomes:
(((13*16 + 10)*16 + 1)*16 + 4)*16 + 5
As you can see, 13 is in effect multiplied by 16 four times, 10 three times, and so on.
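As a quick sanity check in Java (using hex2decimal from the question), both forms agree:
int positional = 5 + 4 * 16 + 1 * 256 + 10 * 4096 + 13 * 65536; // 893253
int horner = (((13 * 16 + 10) * 16 + 1) * 16 + 4) * 16 + 5;     // 893253
System.out.println(positional == horner);  // true
System.out.println(hex2decimal("DA145"));  // 893253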
The algorithm does pretty much the same as you would by hand. It takes a String and looks up every single char in digits to get its numeric value (e.g. A is at index 10, since we start counting at 0). This makes it easy to change, e.g. to a base-17 system instead of hexadecimal.
Edit: About the powers of 16, look at skyking's answer.
I am trying to compute Hamming distances between each pair of nodes in a graph of n nodes. Each node in this graph has a label of the same length (k), and the alphabet used for labels is {0, 1, *}. The '*' operates as a don't-care symbol. For example, the Hamming distance between labels 101*01 and 1001*1 is 1 (we say they differ only at the 3rd index).
What I need to do is find all 1-Hamming-distance neighbors of each node and report exactly at which index those two labels differ.
I am comparing each node's label with all the others, character by character, as follows:
// Given two strings s1, s2 of length k,
// returns the index of the change if hd(s1, s2) = 1, -1 otherwise.
static int diffIndex(String s1, String s2, int k) {
    int count = 0;
    char c1, c2;
    int index = -1;
    for (int i = 0; i < k; i++) {
        // do not compute anything for *
        c1 = s1.charAt(i);
        if (c1 == '*')
            continue;
        c2 = s2.charAt(i);
        if (c2 == '*')
            continue;
        if (c1 != c2) {
            index = i;
            count++;
            // if the Hamming distance exceeds 1, stop immediately
            if (count > 1) {
                index = -1;
                break;
            }
        }
    }
    return index;
}
I may have a couple of million nodes, and k is usually around 50. I am using Java; this comparison takes n*n*k time and runs slowly. I considered making use of tries and VP-trees but could not figure out which data structure works for this case. I also studied the Simmetrics library, but nothing flashed into my mind. I would really appreciate any suggestions.
Try this approach:
Convert the keys into ternary numbers (base 3), i.e. 0=0, 1=1, *=2.
10 ternary digits give you a range of 0..59048, which fits in 16 bits.
That means two of those would form a 32-bit word. You could create a lookup table with 4 billion entries that returns the distance between those two 10-digit ternary words.
You can then check 10 characters of the key with one lookup. If you use 5 characters instead, 3^5 gives you 243 values, which fit into one byte, so the lookup table would only be 64 KB.
By using shift operations, you can create lookup tables of different sizes to balance memory and speed.
That way, you can optimize the loop to abort much more quickly.
To get the position of the first difference, you can use a second lookup table which contains the index of the first difference for two key substrings.
If you have millions of nodes, then you will have many that start with the same substring. Try to sort them into buckets where one bucket contains nodes that start with the same key. The goal here is to make the buckets as small as possible (to reduce the n*n).
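A minimal sketch of the 5-character variant (encode5 and DIST are hypothetical names; '*' is encoded as the ternary digit 2):
static int encode5(String key, int offset) {
    int v = 0;
    for (int i = 0; i < 5; i++) {
        char c = key.charAt(offset + i);
        int digit = (c == '0') ? 0 : (c == '1') ? 1 : 2; // '*' -> 2
        v = v * 3 + digit;
    }
    return v;
}

// DIST[a][b] holds the wildcard-aware distance between the two 5-character
// chunks encoded as a and b; 243 * 243 entries, computed once.
static final byte[][] DIST = new byte[243][243];
static {
    for (int a = 0; a < 243; a++) {
        for (int b = 0; b < 243; b++) {
            int x = a, y = b, d = 0;
            for (int i = 0; i < 5; i++) {
                int dx = x % 3, dy = y % 3;
                if (dx != 2 && dy != 2 && dx != dy) d++; // 2 means '*': ignore
                x /= 3;
                y /= 3;
            }
            DIST[a][b] = (byte) d;
        }
    }
}
Comparing two keys then walks both in 5-character steps and sums DIST[encode5(k1, off)][encode5(k2, off)], aborting as soon as the sum exceeds 1.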
Instead of, or in addition to, the string, store a mask for the 1 bits and a mask for the * bits. One could use BitSet, but let's try without:
static int mask(String value, char digit) {
int mask = 0;
int bit = 2; // Start at bit 1 so reported positions are 1-based, matching the question's indexing.
for (int i = 0; i < value.length(); ++i) {
if (value.charAt(i) == digit) {
mask |= bit;
}
bit <<= 1;
}
return mask;
}
class Cell {
int ones;
int stars;
}
int difference(Cell x, Cell y) {
    // a position counts only where neither label has a '*'
    return (x.ones & ~y.stars) ^ (y.ones & ~x.stars);
}
int hammingDistance(Cell x, Cell y) {
return Integer.bitCount(difference(x, y));
}
boolean differsBy1(Cell x, Cell y) {
int diff = difference(x, y);
return diff == 0 ? false : (diff & (diff - 1)) == 0;
}
int bitPosition(int diff) {
    return Integer.numberOfTrailingZeros(diff);
}
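For example, with the labels from the question (this usage sketch assumes the definitions above live in the same class):
Cell x = new Cell();
x.ones = mask("101*01", '1');
x.stars = mask("101*01", '*');
Cell y = new Cell();
y.ones = mask("1001*1", '1');
y.stars = mask("1001*1", '*');
System.out.println(differsBy1(x, y));              // true
System.out.println(bitPosition(difference(x, y))); // 3 (1-based, thanks to the bit-1 start)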
Interesting problem. It would be easy if it weren't for the wildcard symbol.
If the wildcard were a regular character in the alphabet, then for a given string you could enumerate all k strings at Hamming distance 1 and look them up in a multi-map. So for example for 101 you look up 001, 111 and 100.
The don't-care symbol makes it so that you can't do that lookup directly. However, if the multi-map is built such that each node is stored under all its possible keys, you can do that lookup again. So for example 1*1 is stored as 111 and 101. So when you do the lookup for 10* you look up 000, 010, 011, 001, 111, which would find 1*1, which was stored under 111.
The upside of this is also that you can store all labels as integers rather than ternary structures, so with an int[3] as the key value you can use any k < 96.
Performance would depend on the backing implementation of the multi-map. Ideally you'd use a hash implementation for key sizes < 32 and a tree implementation for anything above. With the tree implementation all nodes can be connected to their distance-1 neighbors in O(n*k*log(n)). Building the multi-map takes O(n * 2^z), where z is the maximum number of wildcard characters in any string. If the average number of wildcards is low, this should be an acceptable performance penalty.
Edit: You can improve lookup performance for all nodes to O(n*log(n)) by also inserting the Hamming distance 1 neighbors into the multi-map, but that might just explode its size.
Note: I'm typing this in a lunch break. I haven't checked the details yet.
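A sketch of the expansion step described above (expansions is a hypothetical helper; every '*' is expanded both ways, so "1*1" yields 101 and 111):
import java.util.ArrayList;
import java.util.List;

static List<String> expansions(String label) {
    List<String> out = new ArrayList<>();
    out.add("");
    for (char c : label.toCharArray()) {
        List<String> next = new ArrayList<>();
        for (String prefix : out) {
            if (c == '*') {
                next.add(prefix + '0'); // expand the wildcard both ways
                next.add(prefix + '1');
            } else {
                next.add(prefix + c);
            }
        }
        out = next;
    }
    return out;
}
Each label is then inserted into the multi-map once per expansion, and a query label is expanded the same way before its Hamming distance 1 neighbors are looked up.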
I am reading wave files in my Java program. The right channel audio has half the samples, which happens to be 445440 samples (double amplitude values). Everything is working fine except for some significant differences from the values I am reading in Matlab. What's bugging me is that most of the values are identical (in my program and in Matlab), but when I averaged all the elements, the results are quite far apart:
in Matlab I got 1.4581E-05, and my program gives -44567.3253.
So I started checking values until I found a difference at the 166th element!
Matlab has -6.10351562500000e-05 and I have 2.0! (The values before and after this one are identical.)
This is quite frustrating, as only a few elements in the first 300 differed! As you can imagine, I cannot physically go through all 445440 elements to understand the pattern.
I don't even know where to start looking for the issue, so I'm taking a chance by asking all the brilliant minds out there. Here's my code if it helps:
public double[] getAmplitudes(Boolean asArrayOfDouble) {
    // bytesInASample is 2 (16-bit little endian);
    int numOfSamples = data.length / bytesInASample;
    double[] amplitudes = new double[numOfSamples];
    int pointer = 0;
    for (int i = 0; i < numOfSamples; i++) {
        double ampValue = 0;
        for (int byteNumber = 0; byteNumber < bytesInASample; byteNumber++) {
            ampValue += (double) ((data[pointer++] & 0xFF) << (byteNumber * 8)) / 32767.0;
        }
        amplitudes[i] = ampValue;
    }
    return amplitudes;
}
After this, I am simply reading the right channel data by using the following code:
double[] rightChannelData = new double[data.length / 2];
for (int i = 0; i < data.length / 2; i++)
{
    rightChannelData[i] = data[2 * i + 1];
}
I know this might be a hard question to answer without seeing the actual program and its output in contrast to the Matlab output, so do let me know if any additional information is needed.
You are masking all bytes with the term data[pointer++] & 0xFF, creating all-unsigned values. For values consisting of two bytes, you are creating int values between 0 and 65535 which, after dividing by 32767.0, yield values between 0.0 and 2.0, whereas Matlab, using a signed interpretation, produces values in the range -1.0 to 1.0.
To illustrate this:
The short value 0xFFFE, interpreted as a signed value, is -2, and -2/32768.0 produces -6.10351562500000e-05, while interpreted as unsigned it is 65534, and 65534/32767.0 produces 2.0.
(Note that the negative value was divided by the absolute value of Short.MIN_VALUE rather than Short.MAX_VALUE…)
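The same illustration as code (a small self-contained check):
short s = (short) 0xFFFE;        // signed interpretation: -2
int u = 0xFFFE;                  // unsigned interpretation: 65534
System.out.println(s / 32768.0); // -6.103515625E-5 (matches Matlab)
System.out.println(u / 32767.0); // 2.0 (matches your program)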
It’s not clear how you could calculate an average of -44567.3253 from that. Even for your unsigned values (between 0.0 and 2.0) that is way off.
After all, you are better off not doing everything manually:
// requires java.nio.ByteBuffer, java.nio.ByteOrder, java.nio.ShortBuffer
ShortBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN)
                            .asShortBuffer();
int numOfSamples = buf.remaining();
double[] amplitudes = new double[numOfSamples];
for (int i = 0; i < numOfSamples; i++) {
    amplitudes[i] = buf.get() * (1.0 / 32768.0);
}
return amplitudes;
Since I don't know how Matlab does the normalization, I cannot guarantee that the values are the same. It's possible that the loop body has to look like this instead:
final short s = buf.get();
amplitudes[i] = s * (s < 0 ? (1.0 / 32768.0) : (1.0 / 32767.0));
I recall reading about a method for efficiently using random bits in an article on a math-oriented website, but I can't seem to find the right keywords on Google anymore, and it's not in my browser history.
The gist of the problem was to take a sequence of random numbers in the domain [domainStart, domainEnd) and efficiently use the bits of the random number sequence to project uniformly into the range [rangeStart, rangeEnd). Both the domain and the range are integers (more correctly, longs and not Z). What's an algorithm to do this?
Implementation-wise, I have a function with this signature:
long doRead(InputStream in, long rangeStart, long rangeEnd);
in is based on a CSPRNG (fed by a hardware RNG, conditioned through SecureRandom) that I am required to use; the return value must be between rangeStart and rangeEnd, but the obvious implementation of this is wasteful:
long doRead(InputStream in, long rangeStart, long rangeEnd) throws IOException {
    long retVal = 0;
    long range = rangeEnd - rangeStart;
    // Fill until we get to range
    for (int i = 0; (1L << (8 * i)) < range; i++) {
        int b = 0;
        do {
            b = in.read();
            // but be sure we don't exceed range
        } while (retVal + ((long) b << (8 * i)) >= range);
        retVal += (long) b << (8 * i);
    }
    return retVal + rangeStart;
}
I believe this is effectively the same idea as (rand() * (max - min)) + min, only we're discarding bits that would push us over max. Rather than use a modulo operator, which may incorrectly bias the results toward the lower values, we discard those bits and try again. Since hitting the CSPRNG may trigger re-seeding (which can block the InputStream), I'd like to avoid wasting random bits. Henry points out that this code biases against 0 and 256; Banthar demonstrates it in an example.
First edit: Henry reminded me that summation invokes the Central Limit Theorem. I've fixed the code above to get around that problem.
Second edit: Mechanical snail suggested that I look at the source for Random.nextInt(). After reading it for a while, I realized that this problem is similar to the base conversion problem. See answer below.
Your algorithm produces biased results. Let's assume rangeStart=0 and rangeEnd=257. If the first byte is greater than 0, that will be the result. If it's 0, the result will be either 0 or 256 with 50/50 probability. So 0 and 256 are each half as likely to be chosen as any other number.
I did a simple test to confirm this:
p(0)=0.001945
p(1)=0.003827
p(2)=0.003818
...
p(254)=0.003941
p(255)=0.003817
p(256)=0.001955
I think you need to do the same as java.util.Random.nextInt and discard the whole number, instead of just the last byte.
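A sketch of that rejection approach, assuming the range fits in a positive long (the bit arithmetic and EOF handling here are illustrative, not a drop-in implementation):
static long doRead(InputStream in, long rangeStart, long rangeEnd) throws IOException {
    long range = rangeEnd - rangeStart;
    // smallest number of bits that can represent range - 1
    int bits = 64 - Long.numberOfLeadingZeros(range - 1);
    int bytes = (bits + 7) / 8;
    long mask = (bits == 0) ? 0 : -1L >>> (64 - bits);
    long candidate;
    do {
        candidate = 0;
        for (int j = 0; j < bytes; j++) {
            candidate = (candidate << 8) | in.read(); // EOF handling omitted
        }
        candidate &= mask; // keep only the bits we actually need
    } while (candidate >= range); // redraw instead of taking a modulo, to avoid bias
    return rangeStart + candidate;
}
Because range is always more than half of mask + 1, fewer than half of all masked draws are rejected, so the expected waste per call is bounded.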
After reading the source to Random.nextInt(), I realized that this problem is similar to the base conversion problem.
Rather than converting a single symbol at a time, it would be more effective to convert blocks of input symbols at a time through an accumulator "buffer" which is large enough to represent at least one symbol in the domain and in the range. The new code looks like this:
public int[] fromStream(InputStream input, int length, int rangeLow, int rangeHigh) throws IOException {
    int[] outputBuffer = new int[length];
    // buffer is initially 0, so there is only 1 possible state it can be in
    int numStates = 1;
    long buffer = 0;
    int alphaLength = rangeHigh - rangeLow;
    // Fill outputBuffer from 0 to length
    for (int i = 0; i < length; i++) {
        // Until buffer has enough data from input to emit one symbol in the output alphabet, keep filling it.
        fill:
        while (numStates < alphaLength) {
            // Shift buffer by 8 bits (*256) to mix in new data (of 8 bits)
            buffer = buffer << 8 | input.read();
            // Multiply by 256, as that's the number of states we have possibly introduced
            numStates = numStates << 8;
        }
        // emit the least significant symbol in alphaLength
        outputBuffer[i] = (int) (rangeLow + (buffer % alphaLength));
        // We have consumed the least significant portion of the input.
        buffer = buffer / alphaLength;
        // Track the number of states remaining in buffer
        numStates = numStates / alphaLength;
    }
    return outputBuffer;
}
There is a fundamental difference between converting numbers between bases and this problem, however; in order to convert between bases, I think one needs to have the whole number available to perform the calculation: successive divisions by the target base produce remainders which are used to construct the digits in the target alphabet. In this problem, I don't really need all that information, as long as I'm not biasing the data, which means I can do what I did in the loop labeled "fill".