Java 128-bit structure bit manipulation

Is there a way to create a 128-bit object in Java that can be bit-manipulated the same way as a long or int? I want to do 32-bit shifts, and I want to be able to do a bitwise OR operation on the whole 128-bit structure.

Here, I present to you... an old idea. It is now heavily cut down (no code enhancer, nothing fancy) to a simple 128-bit thingie that should be super fast, though. What I truly want is a ByteBuffer-based array of C-like structs that is fully usable in Java.
The main idea is to allocate more than a single object at a time and use a pointer into the array. That greatly conserves memory, and the memory is allocated in one contiguous area, so there are fewer cache misses (always good).
I did some moderate testing (but the code should still be considered untested).
It allows basic operations such as add, xor, or, and set/get on 128-bit numbers.
The standard rule applies, unfortunately: less documentation than expected.
Adding extra code for extra operations should be straightforward.
Here is the code; look at the main method for some usage. Cheers!
package bestsss.util;
import java.util.Random;
public class Bitz {
final int[] array;
private Bitz(int n){
array=new int[n<<2];
}
public int size(){
return size(this.array);
}
private static int size(int[] array){
return array.length>>2;
}
/**
* Allocates n 128-bit elements. Use newIdx to create a pointer.
* @param n number of 128-bit elements to allocate
* @return the newly allocated Bitz
*/
public static Bitz allocate(int n){
return new Bitz(n);
}
/**
* Main utility class - points to an index in the array
* @param idx index of the 128-bit element to point to
* @return a new Idx positioned at idx
*/
public Idx newIdx(int idx){
return new Idx(array).set(idx);
}
public static class Idx{
private static final long mask = 0xFFFFFFFFL;
//don't make the fields final
int idx;
int[] array;//keep ref. here, reduce the indirection
Idx(int[] array){
this.array=array;
}
public Idx set(int idx) {
if (Bitz.size(array)<=idx || idx<0)
throw new IndexOutOfBoundsException(String.valueOf(idx));
this.idx = idx<<2;
return this;
}
public int index(){
return idx>>2;
}
public Idx shl32(){
final int[] array=this.array;
int idx = this.idx;
array[idx]=array[++idx];
array[idx]=array[++idx];
array[idx]=array[++idx];
array[idx]=0;
return this;
}
public Idx shr32(){
final int[] array=this.array;
int idx = this.idx+3;
array[idx]=array[--idx];
array[idx]=array[--idx];
array[idx]=array[--idx];
array[idx]=0;
return this;
}
public Idx or(Idx src){
final int[] array=this.array;
int idx = this.idx;
int idx2 = src.idx;
final int[] array2=src.array;
array[idx++]|=array2[idx2++];
array[idx++]|=array2[idx2++];
array[idx++]|=array2[idx2++];
array[idx++]|=array2[idx2++];
return this;
}
public Idx xor(Idx src){
final int[] array=this.array;
int idx = this.idx;
int idx2 = src.idx;
final int[] array2=src.array;
array[idx++]^=array2[idx2++];
array[idx++]^=array2[idx2++];
array[idx++]^=array2[idx2++];
array[idx++]^=array2[idx2++];
return this;
}
public Idx add(Idx src){
final int[] array=this.array;
int idx = this.idx+3;
final int[] array2=src.array;
int idx2 = src.idx+3;
long l =0;
l += array[idx]&mask;
l += array2[idx2--]&mask;
array[idx--]=(int)(l&mask);
l>>>=32;
l += array[idx]&mask;
l += array2[idx2--]&mask;
array[idx--]=(int)(l&mask);
l>>>=32;
l += array[idx]&mask;
l += array2[idx2--]&mask;
array[idx--]=(int)(l&mask);
l>>>=32;
l += array[idx]&mask;
l += array2[idx2--];
array[idx]=(int)(l&mask);
// l>>>=32;
return this;
}
public Idx set(long high, long low){
final int[] array=this.array;
int idx = this.idx;
array[idx+0]=(int) ((high>>>32)&mask);
array[idx+1]=(int) ((high>>>0)&mask);
array[idx+2]=(int) ((low>>>32)&mask);
array[idx+3]=(int) ((low>>>0)&mask);
return this;
}
public long high(){
final int[] array=this.array;
int idx = this.idx;
long res = (array[idx]&mask)<<32 | (array[idx+1]&mask);
return res;
}
public long low(){
final int[] array=this.array;
int idx = this.idx;
long res = (array[idx+2]&mask)<<32 | (array[idx+3]&mask);
return res;
}
//inefficient, but oh well
public String toString(){
return String.format("%016x-%016x", high(), low());
}
}
public static void main(String[] args) {
Bitz bitz = Bitz.allocate(256);
Bitz.Idx idx = bitz.newIdx(0);
Bitz.Idx idx2 = bitz.newIdx(2);
System.out.println(idx.set(0, 0xf));
System.out.println(idx2.set(0, Long.MIN_VALUE).xor(idx));
System.out.println(idx.set(0, Long.MAX_VALUE).add(idx2.set(0, 1)));
System.out.println("==");
System.out.println(idx.add(idx));//can add itself
System.out.println(idx.shl32());//left
System.out.println(idx.shr32());//and right
System.out.println(idx.shl32());//back left
//w/ alloc
System.out.println(idx.add(bitz.newIdx(4).set(0, Long.MAX_VALUE)));
//self xor
System.out.println(idx.xor(idx));
//random xor
System.out.println("===init random===");
Random r = new Random(1112);
for (int i=0, s=bitz.size(); i<s; i++){
idx.set(i).set(r.nextLong(), r.nextLong());
System.out.println(idx);
}
Idx theXor = bitz.newIdx(0);
for (int i=1, s=bitz.size(); i<s; i++){
theXor.xor(idx.set(i));
}
System.out.println("===XOR===");
System.out.println(theXor);
}
}

Three possibilities have been identified:
The BitSet class provides some of the operations that you need, but no "shift" method. To implement this missing method, you'd need to do something like this:
BitSet bits = new BitSet(128);
...
// shift left by 32 bits (treating index 0 as the leftmost bit)
for (int i = 0; i < 96; i++) {
bits.set(i, bits.get(i + 32));
}
bits.set(96, 128, false); // toIndex is exclusive, so 128 clears bits 96..127
The BigInteger class provides all of the methods (more or less), but since BigInteger is immutable, it could result in an excessive object creation rate ... depending on how you use the bitsets. (There is also the issue that shiftLeft(32) won't chop off the leftmost bits ... but you can deal with this by using and to mask out the bits at index 128 and higher.)
If performance is your key concern, implementing a custom class with 4 int or 2 long fields will probably give best performance. (Which is actually the faster option of the two will depend on the hardware platform, the JVM, etc. I'd probably choose the long version because it will be simpler to code ... and only try to optimize further if profiling indicated that it was a potentially worthwhile activity.)
Furthermore, you can design the APIs to behave exactly as you require (modulo the constraints of Java language). The downside is that you have to implement and test everything, and you will be hard-wiring the magic number 128 into your code-base.
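The custom-class option could be sketched roughly as follows. This is only a minimal sketch, assuming a mutable class with two long fields; the name Bits128 and its methods are hypothetical and not part of any library.

```java
// Hypothetical mutable 128-bit value built from two long fields (not a library class).
final class Bits128 {
    private long high; // bits 127..64
    private long low;  // bits 63..0

    Bits128(long high, long low) {
        this.high = high;
        this.low = low;
    }

    // Bitwise OR of the whole 128-bit value with another one, in place.
    Bits128 or(Bits128 other) {
        high |= other.high;
        low |= other.low;
        return this;
    }

    // Shift left by 32 bits; the top 32 bits fall off, zeros enter at the bottom.
    Bits128 shl32() {
        high = (high << 32) | (low >>> 32);
        low <<= 32;
        return this;
    }

    // Logical shift right by 32 bits; zeros enter at the top.
    Bits128 shr32() {
        low = (low >>> 32) | (high << 32);
        high >>>= 32;
        return this;
    }

    @Override
    public String toString() {
        return String.format("%016x-%016x", high, low);
    }

    public static void main(String[] args) {
        Bits128 a = new Bits128(0L, 0xFFFFFFFF00000000L);
        System.out.println(a.shl32());
        System.out.println(a.or(new Bits128(0L, 0xFL)));
        System.out.println(a.shr32());
    }
}
```

Because the methods mutate the receiver and return this, calls can be chained without allocating new objects, which is the main reason to prefer this over an immutable type if allocation rate matters.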

There is no longer data type than long (I have logged this as an RFE along with a 128 bit floating point ;)
You can create an object with four 32-bit int values and support these operations fairly easily.

You can't define any new types to which you could apply Java's built-in bitwise operators.
However, could you just use java.math.BigInteger? BigInteger defines all of the bit-wise operations that are defined for integral types (as methods). This includes, for example, BigInteger.or(BigInteger).
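As a rough illustration of the BigInteger route, here is a small sketch. The class name BigInteger128Demo, the constant MASK_128, and the helper shl32 are made up for this example; only or, and, shiftLeft, and subtract are actual BigInteger methods. The mask is there because BigInteger grows as needed and would otherwise keep bits above index 127.

```java
import java.math.BigInteger;

// Sketch: emulating a fixed 128-bit value with the immutable BigInteger class.
final class BigInteger128Demo {
    // 2^128 - 1, used to discard any bits at index 128 and above.
    static final BigInteger MASK_128 =
            BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);

    // Shift left by 32 bits and truncate the result to 128 bits.
    static BigInteger shl32(BigInteger value) {
        return value.shiftLeft(32).and(MASK_128);
    }

    public static void main(String[] args) {
        BigInteger a = new BigInteger("ffffffff000000000000000000000000", 16);
        BigInteger b = BigInteger.valueOf(0xFL);
        System.out.println(a.or(b).toString(16));  // bitwise OR of the two values
        System.out.println(shl32(a).toString(16)); // the top 32 bits are shifted out
    }
}
```

Every operation returns a new BigInteger, so this is convenient but allocates on each step.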

No.
Sorry there isn't a better answer.
One approach may be to create a wrapper object for two long values and implement the required functionality while taking signedness of the relevant operators into account. There is also BigInteger [updated from rlibby's answer], but it doesn't provide the required support.
Happy coding.

Perhaps BitSet would be useful to you.
It has the logical operations, and I imagine shifting wouldn't be all that hard to implement given their utility methods.

AFAIK, the JVM will just convert whatever you code into 32-bit chunks, whatever you do. The JVM is 32-bit. I think even the 64-bit version of the JVM largely processes in 32-bit chunks. It certainly should, to conserve memory... You're just going to slow down your code as the JIT tries to optimise the mess you create. In C/C++ etc. there's no point doing this either, as you will still have impedance from the fact that the hardware you're most likely using has 32- or 64-bit registers. Even the Intel Xeon Phi (which has 512-bit vector registers) just operates on bunches of 32- and 64-bit elements.
If you want to implement something like that, you could try to do it in GLSL or OpenCL if you have GPU hardware available. In 2015 Java Sumatra will be released as part of Java 9, at least that's the plan. Then you will have the ability to integrate java with GPU code out of the box. That IS a big deal, hence the illustrious name!

Related

Can a packed C structure and function be ported to java?

In the past I have written code which handles incoming data from a serial port. The data has a fixed format.
Now I want to migrate this code to java (android). However, I see many obstacles.
The actual code is more complex, but I have a simplified version here:
#define byte unsigned char
#define word unsigned short
#pragma pack(1)
struct addr_t
{
byte foo;
word bar;
};
#pragma pack()
bool RxData( byte val )
{
static byte buffer[20];
static int idx = 0;
buffer[idx++] = val;
return ( idx == sizeof(addr_t) );
}
The RxData function is called every time a byte is received. When the complete chunk of data is in, it returns true.
Some of the obstacles:
The used data types are not available to java. In other threads it is recommended to use larger datatypes, but in this case this is not a workable solution.
The size of the structure is in this case exactly 3 bytes. That's also why the #pragma statement is important. Otherwise the C compiler might "optimize" it for memory use, with a different size as a result.
Java also doesn't have a sizeof function and I have found no alternative for this kind of situation.
I could replace the 'sizeof' with a fixed value of 3, but that would be very bad practice IMO.
Is it at all possible to write such code in Java? Or is it wiser to try to add native C source in Android Studio?
Your C code has its problems too. Technically, you do not know how big a char and a short is. You probably want uint8_t and uint16_t respectively. Also, I'm not sure how portable packing is.
In Java, you need a class. The class might as well tell you how many bytes you need to initialise it.
class Addr
{
private byte foo;
private short bar;
public final static int bufferBytes = 3;
public int getUnsignedFoo()
{
return (int)foo & 0xff;
}
public int getUnsignedBar()
{
return (int)bar & 0xffff;
}
}
Probably a class for the buffer too although there may already be a suitable class in the standard library.
class Buffer
{
private final static int maxSize = 20;
private byte[] bytes = new byte[maxSize];
private int idx = 0;
public boolean rxData(byte b)
{
bytes[idx++] = b;
return idx == Addr.bufferBytes;
}
}
To answer the question about the hard-coding of the 3: this is actually the better way to do it, because the specification of your protocol should say "one byte for foo and two bytes for bar", not "a packed C struct with a char and a short in it". One way to deserialise the buffer is like this:
public class Addr
{
// All the stuff from above
public Addr(byte[] buffer)
{
foo = buffer[0];
bar = someFunctionThatGetsTheEndiannessRight(buffer[1], buffer[2]);
}
}
I have left the way bar is calculated deliberately vague because it depends on your platform as much as anything. You can do it simply with bit shifts, e.g.
(((short)buffer[1] & 0xff) << 8) | ((short)buffer[2] & 0xff)
However, there are better options available. For example, you can use a java.nio.ByteBuffer, which has the machinery to cope with endianness issues.
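A minimal sketch of the ByteBuffer approach, assuming the 3-byte layout from the question (one byte for foo, two bytes for bar). The class name AddrDecoder and the sample bytes are hypothetical, and which byte order is correct depends on the actual protocol, so both are shown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: decoding the 3-byte record (1 byte foo, 2 bytes bar) with ByteBuffer.
final class AddrDecoder {
    // Returns {foo, bar} as unsigned values; the byte order is a protocol assumption.
    static int[] decode(byte[] buffer, ByteOrder order) {
        ByteBuffer bb = ByteBuffer.wrap(buffer).order(order);
        int foo = bb.get() & 0xff;        // unsigned 8-bit value
        int bar = bb.getShort() & 0xffff; // unsigned 16-bit value
        return new int[] { foo, bar };
    }

    public static void main(String[] args) {
        byte[] raw = { 0x12, 0x34, (byte) 0xF6 };
        int[] big = decode(raw, ByteOrder.BIG_ENDIAN);
        int[] little = decode(raw, ByteOrder.LITTLE_ENDIAN);
        System.out.printf("big-endian:    foo=0x%02x bar=0x%04x%n", big[0], big[1]);
        System.out.printf("little-endian: foo=0x%02x bar=0x%04x%n", little[0], little[1]);
    }
}
```

Masking with 0xff and 0xffff after reading is what turns Java's signed byte and short into the unsigned values the C code worked with.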

Convert a long to an int cutting off the overflow

I want to cast a long value to an int value, and if the long value is too big to fit into an int, it should just become the biggest possible int value. My solution looks like this:
long longVar = ...;
int intVar = (int) Math.min(longVar, Integer.MAX_VALUE);
In a more general way (to include the negative maximum) it would be:
long longVar = ...;
int intVar = (int) (longVar < 0 ? Math.max(longVar, Integer.MIN_VALUE) : Math.min(longVar, Integer.MAX_VALUE));
Is there an easier way to do this, like a method in the JRE or something?
An improvement would be
int intVar = (int) Math.min(Math.max(longVar, Integer.MIN_VALUE),
Integer.MAX_VALUE);
Math.max would make [Long.Min,Long.Max] => [Int.Min, Long.Max] and whatever outcome of that, if it is greater than Int.Max will be trimmed down by the outer Math.min to [Int.Min, Int.Max].
I don't know of a ready-to-go method doing this included in java.
The Java 8 method Math.toIntExact throws an exception on overflow. Using it for this would, in my view, be a misuse of exceptions, and probably less efficient than the construct above.
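To make the difference concrete, here is a small sketch that puts the clamping construct next to Math.toIntExact. The class name SaturatingCast and the method saturatedCast are illustrative names only; Math.min, Math.max, and Math.toIntExact are standard library methods.

```java
// Sketch: clamping a long into the int range versus failing on overflow.
final class SaturatingCast {
    // Clamp to [Integer.MIN_VALUE, Integer.MAX_VALUE] before narrowing.
    static int saturatedCast(long value) {
        return (int) Math.min(Math.max(value, Integer.MIN_VALUE), Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long tooBig = 224748223647L;
        System.out.println(saturatedCast(tooBig));         // clamped to the int maximum
        System.out.println(saturatedCast(Long.MIN_VALUE)); // clamped to the int minimum
        System.out.println(saturatedCast(42L));            // fits, so unchanged
        try {
            Math.toIntExact(tooBig);
        } catch (ArithmeticException e) {
            System.out.println("toIntExact threw: " + e.getMessage());
        }
    }
}
```

The int constants are widened to long by overload resolution, so the comparison happens in 64 bits and the final cast can no longer overflow.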
If you can use Guava, there is a method that does exactly what you want: static int Ints.saturatedCast(long):
long longVar = ...;
int intVar = Ints.saturatedCast(longVar);
For general interest, there's the Wikipedia article on saturation arithmetic. The Intel MMX instruction set uses saturation arithmetic and I think Intel offer an SDK to allow Java developers to use MMX. I'm not sure if Guava implements its methods using this SDK (probably not).
You can also write some reusable code.
package it.stackoverflow;
public class UtilInt {
public static int getIntMaxMinLong(long longNumber){
int intNumber = 0;
if (longNumber < Integer.MIN_VALUE )
intNumber = Integer.MIN_VALUE;
else if (longNumber > Integer.MAX_VALUE)
intNumber = Integer.MAX_VALUE;
else
intNumber = (int) longNumber;
return intNumber;
}
}
You can call the method in the static way.
package it.stackoverflow;
public class Main {
public static void main(String[] args) {
// TODO Auto-generated method stub
int intNewValue = UtilInt.getIntMaxMinLong(224748223647L);
}
}

Performance difference between assignment and conditional test

This question is specifically geared towards the Java language, but I would not mind feedback about this being a general concept if so. I would like to know which operation might be faster, or if there is no difference between assigning a variable a value and performing tests for values. For this issue we could have a large series of Boolean values that will have many requests for changes. I would like to know if testing for the need to change a value would be considered a waste when weighed against the speed of simply changing the value during every request.
public static void main(String[] args){
Boolean array[] = new Boolean[veryLargeValue];
for(int i = 0; i < array.length; i++) {
array[i] = randomTrueFalseAssignment;
}
for(int i = 400; i < array.length - 400; i++) {
testAndChange(array, i);
}
for(int i = 400; i < array.length - 400; i++) {
justChange(array, i);
}
}
This could be the testAndChange method
public static void testAndChange(Boolean[] pArray, int ind) {
if(pArray[ind])
pArray[ind] = false;
}
This could be the justChange method
public static void justChange(Boolean[] pArray, int ind) {
pArray[ind] = false;
}
If we were to end up with the very rare case that every value within the range supplied to the methods were false, would there be a point where one method would eventually become slower than the other? Is there a best practice for issues similar to this?
Edit: I wanted to add this to help clarify this question a bit more. I realize that the data type can be factored into the answer, as larger or more efficient data types can be utilized. I am more focused on the task itself. Is the task of a test "if(aConditionalTest)" slower, faster, or indeterminable without additional information (such as data type) compared with the task of an assignment "x=avalue"?
As @TrippKinetics points out, there is a semantic difference between the two methods. Because you use Boolean instead of boolean, it is possible that one of the values is a null reference. In that case the first method (with the if-statement) will throw an exception, while the second simply assigns values to all the elements in the array.
Assuming you use boolean[] instead of Boolean[]: optimization is an undecidable problem. There are very rare cases where adding an if-statement could result in better performance. For instance, most processors use caches, and the if-statement could happen to make the executed code fit exactly in two cache lines, whereas without it the code would spread over more and cause cache misses. Perhaps you think you will save an assignment instruction, but it comes at the cost of a fetch instruction and a conditional instruction (which can break the CPU pipeline). Assigning has more or less the same cost as fetching a value.
In general, however, one can assume that adding an if statement is useless and will nearly always result in slower code. So you can quite safely assume that the if statement will slow down your code.
More specifically on your question, there are faster ways to set a range to false. For instance using bitvectors like:
long[] data = new long[(veryLargeValue+0x3f)>>0x06];//a long has 64 bits; bit i is stored MSB-first in data[i>>6]
//assign random values
int low = 400>>0x06;
int high = (veryLargeValue-400)>>0x06;//assumes high > low
data[low] &= ~(-1L>>>(400&0x3f));//keep only the bits before index 400
for(int i = low+0x01; i < high; i++) {
data[i] = 0x00;
}
data[high] &= -1L>>>((veryLargeValue-400)&0x3f);//keep only the bits from index veryLargeValue-400 on
The advantage is that a processor can perform operations on 32- or 64-bits at once. Since a boolean is one bit, by storing bits into a long or int, operations are done in parallel.
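If hand-rolled masks feel error-prone, the standard library's java.util.BitSet offers the same word-at-a-time idea behind a range API. The sketch below is only an illustration: the class name RangeClearDemo, the helper clearMiddle, and the sizes are made up, while BitSet.set(from, to) and BitSet.clear(from, to) are real methods whose upper index is exclusive.

```java
import java.util.BitSet;

// Sketch: clearing a range of flags in bulk with java.util.BitSet.
final class RangeClearDemo {
    // Builds a bit set of the given size with every bit set, then clears
    // the range [margin, size - margin) in a single call.
    static BitSet clearMiddle(int size, int margin) {
        BitSet bits = new BitSet(size);
        bits.set(0, size);                 // all flags true
        bits.clear(margin, size - margin); // upper bound is exclusive
        return bits;
    }

    public static void main(String[] args) {
        BitSet bits = clearMiddle(10_000, 400);
        System.out.println("flags still set: " + bits.cardinality());
        System.out.println("bit 399: " + bits.get(399) + ", bit 400: " + bits.get(400));
    }
}
```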

Performance of 2D array allocation

I am wondering why allocating a 2D int array at once (new int[50][2]) performs worse than allocating separately, that is, executing new int[50][] first and then new int[2] for each row. Here is some non-professional benchmark code:
public class AllocationSpeed {
private static final int ITERATION_COUNT = 1000000;
public static void main(String[] args) {
new AllocationSpeed().run();
}
private void run() {
measureSeparateAllocation();
measureAllocationAtOnce();
}
private void measureAllocationAtOnce() {
Stopwatch stopwatch = Stopwatch.createStarted();
for (int i = 0; i < ITERATION_COUNT; i++) {
allocateAtOnce();
}
stopwatch.stop();
System.out.println("Allocate at once: " + stopwatch);
}
private int allocateAtOnce() {
int[][] array = new int[50][2];
return array[10][1];
}
private void measureSeparateAllocation() {
Stopwatch stopwatch = Stopwatch.createStarted();
for (int i = 0; i < ITERATION_COUNT; i++) {
allocateSeparately();
}
stopwatch.stop();
System.out.println("Separate allocation: " + stopwatch);
}
private int allocateSeparately() {
int[][] array = new int[50][];
for (int i = 0; i < array.length; i++) {
array[i] = new int[2];
}
return array[10][1];
}
}
I tested on 64-bit Linux; these are the results with different 64-bit Oracle Java versions:
1.6.0_45-b06:
Separate allocation: 401.0 ms
Allocate at once: 1.673 s
1.7.0_45-b18
Separate allocation: 408.7 ms
Allocate at once: 1.448 s
1.8.0-ea-b115
Separate allocation: 380.0 ms
Allocate at once: 1.251 s
Just for curiosity, I tried it with OpenJDK 7 as well (where the difference is smaller):
Separate allocation: 424.3 ms
Allocate at once: 1.072 s
For me it's quite counter-intuitive, I would expect allocating at once to be faster.
Absolutely unbelievable. A benchmark source might suffer from optimizations, GC and JIT, but this?
Looking at the java byte code instruction set:
anewarray (+ 2 bytes indirect class index) for arrays of object classes (a = address)
newarray (+ 1 byte for primitive class) for arrays of primitive types
multianewarray (+ 2 bytes indirect class index) for multidimensional arrays
This leads one to suspect that multianewarray is suboptimal for primitive types.
Before looking further, I hope someone knows where we are misled.
The latter code's inner loop (with a newarray) is hit more times than the former code's multianewarray, so it probably hits C2 and gets subjected to escape analysis sooner. (Once that happens, the rows created by the latter code are allocated on the stack, which is faster than the heap and reduces the workload for the garbage collector.)
It's also possible that these JDK versions didn't actually do escape analysis on rows from a multianewarray, since a multidimensional array is more likely to exceed the size limit for a stack array.

Can The 5-Op Log2(Int 32) Bit Hack be Done in Java?

Just to clarify this is NOT a homework question as I've seen similar accusations leveled against other bit-hackish questions:
That said, I have this bit hack in C:
#include <stdio.h>
const int __FLOAT_WORD_ORDER = 0;
const int __LITTLE_END = 0;
// Finds log-base 2 of 32-bit integer
int log2hack(int v)
{
union { unsigned int u[2]; double d; } t; // temp
t.u[0]=0;
t.u[1]=0;
t.d=0.0;
t.u[__FLOAT_WORD_ORDER==__LITTLE_END] = 0x43300000;
t.u[__FLOAT_WORD_ORDER!=__LITTLE_END] = v;
t.d -= 4503599627370496.0;
return (t.u[__FLOAT_WORD_ORDER==__LITTLE_END] >> 20) - 0x3FF;
}
int main ()
{
int i = 25; //Log2n(25) = 4
int j = 33; //Log2n(33) = 5
printf("Log2n(25)=%i!\n",
log2hack(25));
printf("Log2n(33)=%i!\n",
log2hack(33));
return 0;
}
I want to convert this to Java. So far what I have is:
public int log2Hack(int n)
{
int r; // result of log_2(v) goes here
int[] u = new int [2];
double d = 0.0;
if (BitonicSorterForArbitraryN.__FLOAT_WORD_ORDER==
BitonicSorterForArbitraryN.LITTLE_ENDIAN)
{
u[1] = 0x43300000;
u[0] = n;
}
else
{
u[0] = 0x43300000;
u[1] = n;
}
d -= 4503599627370496.0;
if (BitonicSorterForArbitraryN.__FLOAT_WORD_ORDER==
BitonicSorterForArbitraryN.LITTLE_ENDIAN)
r = (u[1] >> 20) - 0x3FF;
else
r = (u[0] >> 20) - 0x3FF;
return r;
}
(Note it's inside a bitonic sorting class of mine...)
Anyhow, when I run this for the same values 33 and 25, I get 52 in each case.
I know Java's integers are signed, so I'm pretty sure that has something to do with why this is failing. Does anyone have any ideas how I can get this 5-op, 32-bit integer log 2 to work in Java?
P.S. For the record, the technique is not mine, I borrowed it from here:
http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogIEEE64Float
If you're in Java, can't you simply do 31 - Integer.numberOfLeadingZeros(v)? If they implement this using __builtin_clz it should be fast.
I think you did not get the meaning of that code. The C code uses a union - a struct that maps the same memory to two or more different fields. That makes it possible to access the storage allocated for the double as integers. In your Java code, you don't use a union but two different variables that are mapped to different parts of memory. This makes the hack fail.
As Java has no unions, you would have to use serialization to get the results you want. Since that is quite slow, why not use another method to calculate the logarithm?
You are using the union to convert your pair of ints into a double with the same bit pattern. In Java, you can do that with Double.longBitsToDouble, and then convert back with Double.doubleToLongBits. Java is always (or at least gives the impression of always being) big-endian, so you don't need the endianness check.
That said, my attempt to adapt your code into Java didn't work. The signedness of Java integers might be a problem.
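For reference, one way the bit-conversion idea might be written is sketched below. It is not the original poster's code: the class name Log2Hack and both method names are made up, and masking v with 0xFFFFFFFFL is an assumption about how to sidestep the signedness concern. Zero is not a valid input, just as in the C version.

```java
// Sketch of the IEEE-754 log2 trick in Java, using bit conversion instead of a union.
final class Log2Hack {
    // floor(log2(v)) for v treated as an unsigned 32-bit value; v must be non-zero.
    static int log2Hack(int v) {
        // High word 0x43300000 encodes 2^52; v fills the low 32 mantissa bits.
        long bits = (0x43300000L << 32) | (v & 0xFFFFFFFFL);
        double d = Double.longBitsToDouble(bits) - 4503599627370496.0; // subtract 2^52
        // The biased exponent sits in bits 62..52 of the double.
        return (int) (Double.doubleToLongBits(d) >>> 52) - 0x3FF;
    }

    // The conventional alternative from the standard library.
    static int log2Clz(int v) {
        return 31 - Integer.numberOfLeadingZeros(v);
    }

    public static void main(String[] args) {
        for (int v : new int[] { 1, 25, 33, 1 << 30 }) {
            System.out.println(v + ": hack=" + log2Hack(v) + " clz=" + log2Clz(v));
        }
    }
}
```

Whether this beats Integer.numberOfLeadingZeros on a given JVM would need measuring; the library call is usually the simpler choice.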
