Java - byte array arithmetics on bit-level?

I'm using byte arrays (of size 2 or 4) to emulate the effect of short and int data types.
Main idea is to have a data type that support both char and int types, however it is really hard for me to emulate arithmetic operations in this way, since I must do them in bit level.
For those who do not follow:
The int representation of 123 is not equal to the byte[] of {0,1,2,3}, since their bit representations differ (the representation of 123 is 00000000000000000000000001111011 and the representation of {0,1,2,3} is 00000000000000010000001000000011 on my system).
So "int of 123" would actually be equivalent to "byte[] of {0,0,0,123}". The problems occur when values stretch over several bytes and I try to subtract or decrement from those byte arrays, since then you have to interact with several different bytes and my math isn't that sharp.
Any pseudo-code or java library suggestions would be welcome.

Unless you really want to know what bits are being carried from one byte to the next, I'd suggest don't do it this way! If it's just plain math, then convert your arrays to real short and int types, do the math, then convert them back again.
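A minimal sketch of that round trip, assuming the arrays are exactly 4 bytes long and big-endian (so {0,0,0,123} means 123). The class and method names are made up for illustration; ByteBuffer is the standard java.nio class.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ByteArrayInts {
    // Interpret 4 big-endian bytes as an int, e.g. {0,0,0,123} -> 123.
    static int toInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    // Write an int back out as 4 big-endian bytes.
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Subtract with ordinary int arithmetic; borrows across bytes happen for free.
    static byte[] subtract(byte[] a, byte[] b) {
        return toBytes(toInt(a) - toInt(b));
    }

    public static void main(String[] args) {
        // 256 - 1 = 255, which is stored as the byte value -1 (0xFF).
        byte[] result = subtract(new byte[] {0, 0, 1, 0}, new byte[] {0, 0, 0, 1});
        System.out.println(Arrays.toString(result)); // prints [0, 0, 0, -1]
    }
}
```

For 2-byte arrays the same idea should work with getShort and putShort on a 2-byte buffer.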
If you must do it this way, consider the following:
Imagine you're adding two short variables that are in byte arrays.
The first problem you have is that all Java integer types are signed.
The second is that the "carry" from the least-significant-byte into the most-significant-byte is best done using a type that's longer than a byte because otherwise you can't detect the overflow.
i.e. if you add two 8-bit values, the carry will be in bit 8. But a byte only has bits 0..7, so to calculate bit 8 you have to promote your bytes to the next appropriate larger type, do the add operation, then figure out if it resulted in a carry, and then handle that when you add up the MSB. It's just not worth it.
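For anyone who does need the byte-by-byte version, here is a rough Java sketch of the carry handling described above. It assumes two big-endian arrays of equal length and wraps on overflow; the class and method names are hypothetical.

```java
final class ByteWiseAdd {
    // Adds two equal-length big-endian byte arrays, wrapping on overflow
    // the same way short/int addition would.
    static byte[] add(byte[] a, byte[] b) {
        byte[] sum = new byte[a.length];
        int carry = 0;
        for (int i = a.length - 1; i >= 0; i--) {
            // & 0xFF undoes sign extension, so each operand is 0..255 held in an int.
            int t = (a[i] & 0xFF) + (b[i] & 0xFF) + carry;
            sum[i] = (byte) t; // keep the low 8 bits
            carry = t >>> 8;   // bit 8 is the carry into the next byte
        }
        return sum;
    }
}
```

Doing the addition in an int is what makes bit 8 visible; with byte-typed temporaries the carry would be lost.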
BTW, I did actually have to do this sort of bit manipulation many years ago when I wrote an MC6809 CPU emulator. It was necessary to perform multiple operations on the same operands just to be able to figure out the effect on the CPU's various status bits, when those same bits are generated "for free" by a hardware ALU.
For example, my (C++) code to add two 8-bit registers looked like this:
void mc6809::help_adc(Byte& x)
{
    Byte m = fetch_operand();
    {
        Byte t = (x & 0x0f) + (m & 0x0f) + cc.bit.c;
        cc.bit.h = btst(t, 4); // Half carry
    }
    {
        Byte t = (x & 0x7f) + (m & 0x7f) + cc.bit.c;
        cc.bit.v = btst(t, 7); // Bit 7 carry in
    }
    {
        Word t = x + m + cc.bit.c;
        cc.bit.c = btst(t, 8); // Bit 7 carry out
        x = t & 0xff;
    }
    cc.bit.v ^= cc.bit.c;
    cc.bit.n = btst(x, 7);
    cc.bit.z = !x;
}
which requires that three different additions get done on different variations of the operands just to extract the h, v and c flags.

Related

Bit manipulation in Java - 2s complement and flipping bits

I was recently looking into some problems with bit manipulation in Java and I came up with two questions.
1) Firstly, I came up to the problem of flipping all the bits in a number.
I found this solution:
public class Solution {
public int flipAllBits(int num) {
int mask = (1 << (int)Math.floor(Math.log(num)/Math.log(2))+1) - 1;
return num ^ mask;
}
}
But what happens when k = 32 bits? Can the 1 be shifted 33 times?
What I understand from the code (although it doesn't really make sense), the mask is 0111111.(31 1's)....1 and not 32 1's, as someone would expect. And therefore when num is a really large number this would fail.
2) Another question I had was determining when something is a bit sequence in 2s complement or just a normal bit sequence. For example I read that 1010 when flipped is 0110 which is -10 but also 6. Which one is it and how do we know?
Thanks.

1) The Math calls are not necessary. Flipping all the bits of any integral type in Java (or C) is not an arithmetic operation; it is a bitwise operation. Using the '^' operator with -1 as an operand works regardless of the size of int in C/C++, and in Java for any integral type. The tilde '~' operator is the other option. The example below uses int; for long, use Long.toHexString instead.
int i = 0xf0f0f0f0;
System.out.println(Integer.toHexString(i));
i ^= -1; // XOR with all ones flips every bit
System.out.println(Integer.toHexString(i));
i = ~i; // flips every bit back
System.out.println(Integer.toHexString(i));
2) Since the entire range of integers maps to the entire range of integers in a 2's complement transform, it is not possible to detect whether a number is or is not 2's complement unless one knows the range of numbers from which the 2's complement might be calculated and the two sets (before and after) are mutually exclusive.

That mask computation is fairly inscrutable. I'm going to guess that it attempts (unsuccessfully, since you mention it's wrong) to make a mask up to and including the highest set bit. Whether that's useful for "flipping all bits" is another possible point of discussion, since to me at least, "all bits" means all 32 of them, not some number that depends on the value. But if that's what you want then that's what you want. Especially combined with that second question, that looks like a mistake to me, so you'd be implementing the wrong thing from the start - see near the bottom.
Anyway, the mask can be generated with some reasonably nice bitmath, which does not create any doubt about possible edge cases (eg Math.log(0) is probably bad, and k=32 corresponds with negative numbers which are also probably bad to put into a log):
int m = num | (num >> 16);
m |= m >> 8;
m |= m >> 4;
m |= m >> 2;
m |= m >> 1;
return num ^ m;
Note that this function has odd properties, it almost always returns an unsigned-lower number than went in, except at 0. It flips bits so the name is not completely wrong, but flipAllBits(flipAllBits(x)) != x (usually), while the name suggests it should be an involution.
As for the second question, there is nothing to determine. Two's complement is a scheme by which you can interpret a bitvector - any bitvector. So it's really a choice you make: to interpret a given bitvector that way or some other way. In Java the "default" interpretation is two's complement (eg toString will print an int by interpreting it according to its two's complement meaning), but you don't have to go along with it; you can (with care) treat int as unsigned, or as an array of booleans, or several bitfields packed together, etc.
If you wanted to invert all the bits but made the common mistake to assume that the number of bits in an int is variable (and that you therefore needed to compute a mask that covers "all bits"), I have some great news for you, because inverting all bits is a lot easier:
return ~num;
If you were reading "invert all bits" in the context of two's complement, it would have the above meaning, so all bits, including those left of the highest set bit.
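To make the difference between the two readings concrete, here is a small sketch. The class and method names are invented for illustration, and the second method uses Integer.highestOneBit from the standard library instead of the shift-and-OR sequence, so treat it as an alternative formulation rather than the code discussed above.

```java
final class FlipBits {
    // Inverts all 32 bits. Applying it twice returns the original value.
    static int flipAll(int num) {
        return ~num;
    }

    // Inverts only the bits up to and including the highest set bit.
    // For negative inputs the highest set bit is bit 31, so the mask is all ones.
    static int flipUpToHighestSetBit(int num) {
        if (num == 0) {
            return 0; // no set bit, so nothing to flip
        }
        int mask = (Integer.highestOneBit(num) << 1) - 1;
        return num ^ mask;
    }

    public static void main(String[] args) {
        System.out.println(flipAll(10));               // prints -11
        System.out.println(flipUpToHighestSetBit(10)); // prints 5 (1010 -> 0101)
    }
}
```

Only flipAll is an involution; flipUpToHighestSetBit(5) is 2, not 10, because the mask shrinks along with the value.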

Get least significant bytes from an integer

I need to sum all data bytes in ByteArrayOutputStream, adding +1 to the result and taking the 2 least significant bytes.
int checksum = 1;
for(byte b : byteOutputStream.toByteArray()) {
checksum += b;
}
Any input on taking the 2 least significant bytes would be helpful. Java 8 is used in the environment.

If you really mean least significant bytes then:
checksum & 0xFFFF
If you meant that you want to take the 2 least significant bits from checksum, then:
checksum & 0x3

Add
checksum &= 0x0000ffff;
That will zero out everything to the left of the 2 least significant bytes.

Your question is a bit underspecified. You said neither what you want to do with these two bytes nor how you want to store them (which depends on what you want to do).
To get to individual bytes, you can use
byte lowest = (byte)checksum, semiLowest=(byte)(checksum>>8);
In case you want to store them in a single integer variable, you have to decide, how these bytes are to be interpreted numerically, i.e signed or unsigned.
If you want a signed interpretation, the operation is as simple as
short lowest2bytes = (short)checksum;
If you want an unsigned interpretation, there’s the obstacle that Java has no dedicated type for that. There is a 2 byte sized unsigned type (char), but using it for numerical values can cause confusion when other code tries to interpret it as character value (i.e. when printing). So in that case, the best solution is to use an int variable again and only initialize it with the unsigned char value:
int lowest2bytes = (char)checksum;
Note that this is semantically equivalent to
int lowest2bytes = checksum&0xffff;
seen in other solutions.
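Putting the pieces together, a possible sketch of the whole checksum. It assumes the data bytes are meant to be summed as unsigned values (0..255), which the question does not state; drop the & 0xFF if signed summation is intended. The class and method names are hypothetical.

```java
final class ChecksumDemo {
    // Sums all bytes as unsigned values, starting from 1 (the "+1"),
    // and keeps only the 2 least significant bytes of the result.
    static int checksum(byte[] data) {
        int sum = 1;
        for (byte b : data) {
            sum += b & 0xFF; // assumption: bytes are treated as unsigned
        }
        return sum & 0xFFFF;
    }

    // The two bytes individually, low byte first.
    static byte[] lowTwoBytes(int checksum) {
        return new byte[] {(byte) checksum, (byte) (checksum >> 8)};
    }
}
```

With a ByteArrayOutputStream, the call would be checksum(byteOutputStream.toByteArray()).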

Java generate code byte 256*256 & 0xff

I need to generate a 3-bytes code (like A502F1). I am given a criteria:
1st byte is (serialCodeNumber / (256*256) ) & 0xFF
2nd is (serialCodeNumber / 256) & 0xFF
3rd is (serialCodeNumber) & 0xFF
serialCodeNumber is a sequence 1-0xFFF
What does that mean!?
I would generate it like this:
String codeNum = new BigInteger(256, random).toString(16).toUpperCase().substring(0, 6);
But what is the right way of doing it as the requirement says?

I'm not quite sure what is meant by the serialCodeNumber, since if it is later on divided by 65536 it has to be a considerably larger number than 0xFFF (which is 4095) for it to make any reasonable sense.
But let's take a look at the conditions, they would all make sense once you are accustomed to the bitwise AND operator. A good read is available here on how it works but the meat of the matter from that question in my opinion is this sentence by Markus Jarderot:
The result is the bits that are turned on in both numbers.
Your conditions contain & 0xFF, and 0xFF is 255, or 11111111 in binary: the lowest eight bits all turned on. This is a neat trick to retrieve only the lowest 8 bits of any number. And as we all know, 8 bits make up a byte. (Are you starting to see where this all is coming together now?)
As for the conditions before the & 0xFF, some might recognize them as bit shift operations hidden behind divisions.
(serialCodeNumber / (256*256)) is equivalent to (serialCodeNumber >> 16)
and
(serialCodeNumber / 256) is equivalent to (serialCodeNumber >> 8)
But that is not that important in this case.
So the first condition takes the serialCodeNumber, divides it by 65536 (256*256), then looks at the 8 rightmost bits and ignores all others; from those 8 bits it constructs a byte.
In Java you can pretty much just write the condition as it is:
byte myFirstByte = (byte) ((serialCodeNumber / (256*256)) & 0xFF);
The other conditions aren't much different:
byte mySecondByte = (byte) ((serialCodeNumber / (256)) & 0xFF);
and
byte myThirdByte = (byte) ((serialCodeNumber) & 0xFF);
Once you have all three of your bytes, I'm assuming you need to convert them to a hex String. So I'll add them into a byte array.
byte[] myArray = {myFirstByte,mySecondByte,myThirdByte};
And borrow some method on how to convert byte arrays to HEX strings from this question.
String codeNum = bytesToHex(myArray);
And the result will look something like this:
F03DD7
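If you'd rather not borrow a bytesToHex helper, the three conditions can also be formatted directly. This is only a sketch under the assumption that serialCodeNumber is a non-negative int; the class and method names are made up.

```java
final class SerialCode {
    // Builds the 3-byte code as 6 uppercase hex digits, most significant byte first.
    static String toCode(int serialCodeNumber) {
        int first = (serialCodeNumber / (256 * 256)) & 0xFF;
        int second = (serialCodeNumber / 256) & 0xFF;
        int third = serialCodeNumber & 0xFF;
        // %02X pads each byte to two hex digits.
        return String.format("%02X%02X%02X", first, second, third);
    }

    public static void main(String[] args) {
        System.out.println(toCode(0xA502F1)); // prints A502F1
        System.out.println(toCode(0xFFF));    // prints 000FFF
    }
}
```

Note that for the stated range of 1 to 0xFFF the first byte is always 00.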
EDIT:
Since you have to generate a serial number that has to be up to 6 bytes in value, I'd recommend using a long number.
A 6 byte number will be anywhere from 1 to 281474976710655, so you probably need to generate one randomly.
First instantiate a Random object which you will be able to poll numbers from:
Random random = new Random();
Once you have that, poll a long from it for the range 1 to 281474976710655.
For this you can borrow KennyTM's answer from this question.
So you can then generate the number like so:
long serialCodeNumber = nextLong(random, 281474976710655L)+1L;
We add the +1L at the end since we want it to include the last number as well as start from 1 instead of 0.
If you ever need to show a HEX string of the serialCodeNumber you can then just call:
String serialHex = Long.toHexString(serialCodeNumber);
But make sure to add any additional "0"s at the left side based on the length of the string so that it is 6-bytes = 12 characters long.
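As a sketch of the generation and padding steps without the borrowed nextLong helper: ThreadLocalRandom.nextLong(origin, bound) from the standard library picks a value in a range directly, and String.format can do the zero padding. The class and method names are hypothetical, and %X yields uppercase digits whereas Long.toHexString yields lowercase.

```java
import java.util.concurrent.ThreadLocalRandom;

final class SerialHex {
    static final long MAX_SERIAL = 281474976710655L; // 2^48 - 1, the largest 6-byte value

    // Random serial in the range 1..MAX_SERIAL (the upper bound of nextLong is exclusive).
    static long randomSerial() {
        return ThreadLocalRandom.current().nextLong(1L, MAX_SERIAL + 1L);
    }

    // Left-pads with zeros to 12 hex characters (6 bytes).
    static String toPaddedHex(long serialCodeNumber) {
        return String.format("%012X", serialCodeNumber);
    }

    public static void main(String[] args) {
        System.out.println(toPaddedHex(255L)); // prints 0000000000FF
        System.out.println(toPaddedHex(randomSerial()));
    }
}
```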

Unsigned short in Java

How can I declare an unsigned short value in Java?

You can't, really. Java doesn't have any unsigned data types, except char.
Admittedly you could use char - it's a 16-bit unsigned type - but that would be horrible in my view, as char is clearly meant to be for text: when code uses char, I expect it to be using it for UTF-16 code units representing text that's interesting to the program, not arbitrary unsigned 16-bit integers with no relationship to text.

If you really need a value with exactly 16 bits:
Solution 1: Use the available signed short and stop worrying about the sign, unless you need to do comparison (<, <=, >, >=) or division (/, %, >>) operations. See this answer for how to handle signed numbers as if they were unsigned.
Solution 2 (where solution 1 doesn't apply): Use the lower 16 bits of int and remove the higher bits with & 0xffff where necessary.
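A short sketch of how the two solutions combine in practice: the value is stored in a signed short and widened with & 0xffff only for the operations where the sign matters. The class and method names are invented for illustration.

```java
final class UnsignedShortOps {
    // Widen a short to its unsigned value in the range 0..65535.
    static int toUnsigned(short s) {
        return s & 0xFFFF;
    }

    // Unsigned comparison: negative, zero, or positive like Integer.compare.
    static int compareUnsigned(short a, short b) {
        return Integer.compare(a & 0xFFFF, b & 0xFFFF);
    }

    // Unsigned division; the quotient always fits back into 16 bits.
    static short divideUnsigned(short a, short b) {
        return (short) ((a & 0xFFFF) / (b & 0xFFFF));
    }

    public static void main(String[] args) {
        short big = (short) 0xFFFE; // 65534 unsigned, -2 signed
        System.out.println(toUnsigned(big));                        // prints 65534
        System.out.println(compareUnsigned(big, (short) 1) > 0);    // prints true
        System.out.println(divideUnsigned(big, (short) 2));         // prints 32767
    }
}
```

Addition, subtraction and multiplication give the same low 16 bits either way, which is why they need no special handling.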

This is a really stale thread, but for the benefit of anyone coming after. The char is a numeric type. It supports all of the mathematical operators, bit operations, etc. It is an unsigned 16.
We process signals recorded by custom embedded hardware so we handle a lot of unsigned 16 from the A-D's. We have been using chars all over the place for years and have never had any problems.

You can use a char, as it is an unsigned 16 bit value (though technically it is a Unicode character so could potentially change to be a 24 bit value in the future)... the other alternative is to use an int and make sure it is within range.
Don't use a char - use an int :-)
And here is a link discussing Java and the lack of unsigned.

From DataInputStream.java
public final int readUnsignedShort() throws IOException {
int ch1 = in.read();
int ch2 = in.read();
if ((ch1 | ch2) < 0)
throw new EOFException();
return (ch1 << 8) + (ch2 << 0);
}

It is not possible to declare a type unsigned short, but in my case, I needed to get the unsigned number to use it in a for loop. There is the method toUnsignedInt in the class Short that returns "the argument converted to int by an unsigned conversion":
short signedValue = -4767;
System.out.println(signedValue ); // prints -4767
int unsignedValue = Short.toUnsignedInt(signedValue);
System.out.println(unsignedValue); // prints 60769
Similar methods exist for Integer and Long:
Integer.toUnsignedLong
Long.toUnsignedString : In this case it ends up in a String because there isn't a bigger numeric type.

No such type in java

Yep no such thing if you want to use the value in code vs. bit operations.

"In Java SE 8 and later, you can use the int data type to represent an unsigned 32-bit integer, which has a minimum value of 0 and a maximum value of 2^32-1." However this only applies to int and long but not short :(

If using a third party library is an option, there is jOOU (a spin off library from jOOQ), which offers wrapper types for unsigned integer numbers in Java. That's not exactly the same thing as having primitive type (and thus byte code) support for unsigned types, but perhaps it's still good enough for your use-case.
import static org.joou.Unsigned.*;
// and then...
UShort s = ushort(1);
(Disclaimer: I work for the company behind these libraries)

No, really there is no such method, java is a high-level language. That's why Java doesn't have any unsigned data types.

He said he wanted to create a multi-dimensional short array. Yet no one suggested bitwise operators? From what I read you want to use 16 bit integers over 32 bit integers to save memory?
So firstly to begin 10,000 x 10,000 short values is 1,600,000,000 bits, 200,000,000 bytes, 200,000 kilobytes, 200 megabytes.
If you need something with 200MB of memory consumption you may want to redesign this idea. I also do not believe that will even compile, let alone run. You should never initialize large arrays like that; if anything, utilize two features called On Demand Loading and Data Caching. Essentially, on demand loading refers to the idea of only loading data as it is needed. Data caching does the same thing, but utilizes a custom framework for deleting old memory and adding new information as needed. This one is tricky to get GOOD speed performance from. There are other things you can do, but those two are my favorite when done right.
Alright back to what I was saying about bitwise operators.
Take a 32bit integer, or in Java an "int". You can store what are called "bits" in it, so let's say you had 32 Boolean values. In Java, individual values take up 32 bits (except long), whereas in arrays they take up 8 bits for byte, 16 for short, and 32 for int. So unless you have arrays you don't get any memory benefit from using a byte or short. This does not mean you shouldn't use them, as it's a way to ensure you and others know the data range the value should have.
Now as I was saying you could effectively store 32 Booleans into a single integer by doing the following:
int many_booleans = -1; // All are true
many_booleans = 0; // All are false
many_booleans = 1 | 2 | 8; // Bits 1, 2, and 4 (counting from 1) are true, the rest are false
So now a short consists of 16 bits so 16 + 16 = 32 which fits PERFECTLY within a 32bit integer. So every int value can consist of 2 short values.
int two_shorts = (value & 0xFFFF) | (value2 << 16); // mask value so its sign extension cannot overwrite the upper 16 bits
So what the above is doing is value is something between -32768 and 32767 or as an unsigned value 0 - 65535. So let's say value equaled -1 so as an unsigned value it was 65535. This would mean bits 1 through 16 are turned on, but when actually performing the math consider the range 0 - 15.
So we need to then activate bits 17 - 32. So we must begin at something larger than 15 bits. So we begin at 16 bits. So by taking value2 and multiplying it by 65536 which is what "<< 16" does. We now would have let's say value2 equaled 3 it would be OR'd 3x65536 = 196608. So our integer value would equal 262143.
int assumed_value = 262143;
so let's say we want to retrieve the two 16bit integer values.
short value1 = (short)(assumed_value & 0xFFFF); //-1
short value2 = (short)(assumed_value >> 16); //=3
Also, basically think of bitwise operators as powers of 2. That is all they really are. Never look at it in terms of 0's and 1's. I mostly posted this to assist anyone who may come across this searching for unsigned short or even possibly multi-dimensional arrays. If there are any typos, I apologize; I wrote this up quickly.
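The packing and unpacking steps above can be sketched as a pair of helpers. The names are hypothetical; the one detail worth noting is the & 0xFFFF on the low half, since a negative short is sign-extended when promoted to int and would otherwise set all of the upper 16 bits.

```java
final class PackedShorts {
    // Low short goes into bits 0..15, high short into bits 16..31.
    static int pack(short low, short high) {
        return (low & 0xFFFF) | (high << 16);
    }

    static short low(int packed) {
        return (short) packed; // the cast keeps the low 16 bits
    }

    static short high(int packed) {
        return (short) (packed >> 16);
    }

    public static void main(String[] args) {
        int packed = pack((short) -1, (short) 3);
        System.out.println(packed);       // prints 262143
        System.out.println(low(packed));  // prints -1
        System.out.println(high(packed)); // prints 3
    }
}
```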

Java does not have unsigned types. What do you need it for?
Java does have the 'byte' data type, however.

You can code yourself up a ShortUnsigned class and define methods for those operators you want. You won't be able to overload + and - and the others on them, nor have implicit type conversion with other primitive or numeric object types, alas.
Like some of the other answerers, I wonder why you have this pressing need for unsigned short that no other data type will fill.

Simple program to show why unsigned numbers are needed:
package shifttest;
public class ShiftTest{
public static void main(String[] args){
short test = -15000;
System.out.format ("0x%04X 0x%04X 0x%04X 0x%04X 0x%04X\n",
test, test>>1, test>>2, test>>3, test>>4);
}
}
results:
0xC568 0xFFFFE2B4 0xFFFFF15A 0xFFFFF8AD 0xFFFFFC56
Now for those that are not system types:
Java does an arithmetic shift because the operand is signed. However, there are cases where a logical shift would be appropriate, but Java (Sun in particular) deemed it unnecessary; too bad for us, given their shortsightedness. Shift, And, Or, and Exclusive Or are limited tools when all you have are signed longer numbers. This is a particular problem when interfacing to hardware devices that talk "REAL" computer bits that are 16 bits or more. "char" is not guaranteed to work: it is two bytes wide now, but several eastern glyph-based languages such as Chinese, Korean, and Japanese require at least 3 bytes. I am not acquainted with the number needed for Sanskrit-style languages. The number of bytes does not depend on the programmer but rather on the standards committee for Java, so basing char on 16 bits has a downstream risk. To safely implement unsigned shorts in Java, a special class is the best solution, based on the aforementioned ambiguities. The downside of the class is the inability to overload the mathematical operations for this special class. Many of the contributors to this thread have accurately pointed out these issues, but my contribution is a working code example and my experience with 3-byte glyph languages in C++ under Linux.
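For comparison, here is a sketch of one workaround for the same test value. Java's >>> operator does perform a logical shift, but only on the int (or long) that the short is promoted to, so the sign extension has to be masked off first. The class and method names are made up.

```java
final class UnsignedShift {
    // Logical right shift of a 16-bit value: mask away the sign extension, then shift.
    static int shiftRight16(short value, int n) {
        return (value & 0xFFFF) >>> n;
    }

    public static void main(String[] args) {
        short test = -15000; // bit pattern 0xC568
        // prints 0xC568 0x62B4 0x315A 0x18AD 0x0C56
        System.out.format("0x%04X 0x%04X 0x%04X 0x%04X 0x%04X%n",
                shiftRight16(test, 0), shiftRight16(test, 1), shiftRight16(test, 2),
                shiftRight16(test, 3), shiftRight16(test, 4));
    }
}
```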

// here is a method for getting the equivalent of an unsigned short
public static int getShortU(byte [] arr, int i ) throws Exception
{
try
{
byte [] b = new byte[2];
b[1] = arr[i];
b[0] = arr[i+1];
int k = ByteBuffer.wrap(b).getShort();
//if this:
//int k = ((int)b[0] << 8) + ((int)b[1] << 0);
//65536 = 2**16
if ( k <0) k = 65536+ k;
return k;
}
catch(Throwable t)
{
throw new Exception ("from getShort: i=" + i);
}
}

What is the best way to work around the fact that ALL Java bytes are signed?

In Java, there is no such thing as an unsigned byte.
Working with some low level code, occasionally you need to work with bytes that have unsigned values greater than 127, which causes Java to interpret them as negative numbers due to the MSB being used for sign.
What's a good way to work around this? (Saying don't use Java is not an option)

It is actually possible to get rid of the if statement and the addition if you do it like this.
byte[] foobar = ..;
int value = (foobar[10] & 0xff);
The byte is still sign-extended when it is promoted to int, but the & 0xff mask clears the upper 24 bits, so the result is always in the range 0 to 255 instead of a negative number.

When reading any single value from the array copy it into something like a short or an int and manually convert the negative number into the positive value it should be.
byte[] foobar = ..;
int value = foobar[10];
if (value < 0) value += 256; // Patch up the 'falsely' negative value
You can do a similar conversion when writing into the array.
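A possible sketch of both directions, with hypothetical helper names. Reading uses the & 0xFF mask instead of the if statement (both give the same result), and writing relies on the fact that a cast to byte keeps only the low 8 bits.

```java
final class UnsignedBytes {
    // Read the byte at index as a value in the range 0..255.
    static int read(byte[] arr, int index) {
        return arr[index] & 0xFF;
    }

    // Store a value in the range 0..255; values above 127 end up as negative bytes.
    static void write(byte[] arr, int index, int value) {
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException("not an unsigned byte: " + value);
        }
        arr[index] = (byte) value;
    }

    public static void main(String[] args) {
        byte[] foobar = new byte[1];
        write(foobar, 0, 200);
        System.out.println(foobar[0]);       // prints -56
        System.out.println(read(foobar, 0)); // prints 200
    }
}
```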

Using ints is generally better than using shorts because java uses 32-bit values internally anyway (Even for bytes, unless in an array) so using ints will avoid unnecessary conversion to/from short values in the bytecode.

Probably your best bet is to use an integer rather than a byte. It has the room to allow for numbers greater than 128 without the overhead of having to create a special object to replace byte.
This is also suggested by people smarter than me (everybody)
http://www.darksleep.com/player/JavaAndUnsignedTypes.html
http://www.jguru.com/faq/view.jsp?EID=13647

The best way to do bit manipulation/unsigned bytes is through using ints. Even though they are signed they have plenty of spare bits (32 total) to treat as an unsigned byte. Also, all of the mathematical operators will convert smaller fixed precision numbers to int. Example:
short a = 1; // Java has no 's' literal suffix; an int constant in range is narrowed implicitly
short b = 2;
int c = a + b; // the result is up-converted
short small = (short)c; // must cast to get it back to short
Because of this it is best to just stick with integer and mask it to get the bits that you are interested in. Example:
int a = 32;
int b = 128;
int foo = (a + b) & 255; // AND with the mask keeps only the low 8 bits
Here is some more info on Java primitive types http://mindprod.com/jgloss/primitive.html
One last trivial note, there is one unsigned fixed precision number in Java. That is the char primitive.

I know this is a very late response, but I came across this thread when trying to do the exact same thing. The issue is simply trying to determine if a Java byte is >127.
The simple solution is:
if((val & (byte)0x80) != 0) { ... }
If the real issue is >128 instead, just adding another condition to that if-statement will do the trick.
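As a sketch of both checks (method names are hypothetical): the first tests the top bit as above, and the second widens the byte to its unsigned value and compares it directly, which is one way to express the >128 case.

```java
final class HighBitTest {
    // True when the unsigned value of the byte is greater than 127 (top bit set).
    static boolean isAbove127(byte val) {
        return (val & 0x80) != 0;
    }

    // True when the unsigned value of the byte is greater than 128.
    static boolean isAbove128(byte val) {
        return (val & 0xFF) > 128;
    }
}
```

Because a Java byte with its top bit set is negative, isAbove127(val) is equivalent to val < 0.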

I guess you could just use a short to store them. Not very efficient, but really the only option besides some herculean effort that I have seen.
