Get least significant bytes from an integer

I need to sum all data bytes in ByteArrayOutputStream, adding +1 to the result and taking the 2 least significant bytes.
int checksum = 1;
for(byte b : byteOutputStream.toByteArray()) {
checksum += b;
}
Any input on taking the 2 least significant bytes would be helpful. Java 8 is used in the environment.

If you really mean least significant bytes then:
checksum & 0xFFFF
If you meant the 2 least significant bits of checksum, then:
checksum & 0x3

Add
checksum &= 0x0000ffff;
That will zero out everything to the left of the 2 least significant bytes.

Your question is a bit underspecified. You said neither what you want to do with these two bytes nor how you want to store them (which depends on what you want to do).
To get to individual bytes, you can use
byte lowest = (byte)checksum, semiLowest=(byte)(checksum>>8);
In case you want to store them in a single integer variable, you have to decide how these bytes are to be interpreted numerically, i.e. signed or unsigned.
If you want a signed interpretation, the operation is as simple as
short lowest2bytes = (short)checksum;
If you want an unsigned interpretation, there’s the obstacle that Java has no dedicated type for that. There is a 2 byte sized unsigned type (char), but using it for numerical values can cause confusion when other code tries to interpret it as character value (i.e. when printing). So in that case, the best solution is to use an int variable again and only initialize it with the unsigned char value:
int lowest2bytes = (char)checksum;
Note that this is semantically equivalent to
int lowest2bytes = checksum&0xffff;
seen in other solutions.
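Putting the pieces from the answers above together, a minimal sketch of the whole checksum could look like this. The class and method names are made up for illustration. Two things are assumptions, because the question does not state them: the bytes are summed as unsigned values (the question's loop adds them as signed values, which gives a different result for bytes above 127), and the two result bytes are emitted high byte first.

```java
final class Checksum16 {

    // Starts at 1, adds every byte as an unsigned value (assumption),
    // and keeps only the 2 least significant bytes of the sum.
    static int checksum16(byte[] data) {
        int checksum = 1;
        for (byte b : data) {
            checksum += b & 0xFF; // 0..255 instead of -128..127
        }
        return checksum & 0xFFFF;
    }

    // Splits the 16-bit result into two bytes, high byte first (assumption).
    static byte[] toTwoBytes(int checksum) {
        return new byte[] { (byte) (checksum >> 8), (byte) checksum };
    }
}
```

With a ByteArrayOutputStream this would be called as `Checksum16.checksum16(byteOutputStream.toByteArray())`.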

Related

How do I convert a short to an int without turning it into a negative in java

I am working on a file reader and ran into a problem when trying to read a short. In short (pun unintended), Java converts the two bytes I'm using to build the short into ints to do the bitwise operations, and it converts them in a way that keeps the same numeric value. I need to convert a byte into an int in a way that preserves its bits rather than its value.
example of what's happening:
byte number = -1; //-1
int otherNumber = 1;
number | otherNumber; // -1
example of what I want:
byte number = -1; //-1
int otherNumber = 1;
number | otherNumber; // 129

This can be done pretty easily with some bit magic.
I'm sure you're aware that a short is 16 bits (2 bytes) and an int is 32 bits (4 bytes). So, between an integer and a short, there is a two-byte difference. For positive numbers, copying the value of a short to an int is effectively copying the binary data. However, as you've pointed out, this is not the case for negative numbers.
Now let's look at how negative numbers are represented in binary. It's a bit confusing, so I'll try to keep it simple. Modern systems use what's called two's complement to store negative numbers. The very first (most significant) bit of the number tells you whether it is negative. To negate a number, you invert all of its bits and add 1 (which also means there is no negative 0). For example, 2 as a short would be represented as 0000 0000 0000 0010, while -2 would be represented as 1111 1111 1111 1110. When a negative short is widened to an int, the sign bit is copied into the new upper bits (sign extension), so -2 in int form is the same pattern but with 2 more bytes (16 bits) at the beginning that are all set to 1.
So, in order to combat this, all we need to do is change the extra 1s to 0s. This can be done by simply using the bitwise AND operator. This operator goes through each bit position and looks at the bits of both operands. If they're both 1, the result bit is 1. Otherwise, the result bit is 0.
Now, with this knowledge, all we need to do is create another integer (a mask) where the upper two bytes are all 0 and the lower two bytes are all 1. This is fairly simple to do using hexadecimal literals. Since they are ints by default, the upper bytes we don't write out are 0. With a single byte, if you were to set every bit to 1, the max value you can get is 255. 255 in hex is 0xFF, so 2 bytes would be 0xFFFF. Pretty simple, now you just need to apply it.
Here is an example that does exactly that:
short a = -2;
int b = a & 0xFFFF;
You could also use Short.toUnsignedInt(), but where's the fun in that? 😉
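For the file-reader case in the question, the same masking idea applies to each byte before the two are combined. The following is only a sketch: the class and method names are hypothetical, and the byte order (high byte first) is an assumption, since the question does not say how the file is laid out.

```java
final class ShortWidening {

    // Widens a short to int without sign extension, using the mask.
    static int unsignedViaMask(short s) {
        return s & 0xFFFF;
    }

    // Same result via the library method available since Java 8.
    static int unsignedViaLibrary(short s) {
        return Short.toUnsignedInt(s);
    }

    // Builds a 0..65535 value from two bytes, high byte first (assumption).
    // Each byte is masked so its sign extension cannot spill into other bits.
    static int combine(byte high, byte low) {
        return ((high & 0xFF) << 8) | (low & 0xFF);
    }
}
```

For example, `ShortWidening.unsignedViaMask((short) -2)` gives 65534, and `ShortWidening.combine((byte) -1, (byte) 1)` gives 65281 (0xFF01).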

Issues with OR-ing with bytes in Java?

Let's say I have the following code in java
byte t = (byte) 0b10001000;
byte z = 0b00000000;
z = (byte) (t|z);
You'd think the output would be 10001000, however it ends up being -1111000 in String representation, the - sign being the first 1 of course, making it 11111000. If I do the same code but with the last bit in z as a 1, e.g. 00000001, and I perform the same operation, I get -1110111, or 11110111. I figure this is due to some conversion issue with the negatively signed byte t. Is there any way to avoid this? Is there any way to have 10000000 work the same in an OR operation as 01000000?

Java doesn't have unsigned types (apart from char), so when you use binary notation and set the sign bit, things will not go as you expect ;-)
To emulate an unsigned type you need to work in the next size up.
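"Working in the next size up" can be sketched as follows: do the OR in an int, mask the result to 8 bits, and only mask again when printing. The helper names are hypothetical.

```java
final class UnsignedByteOr {

    // ORs two bytes and returns the result as an unsigned value 0..255.
    static int orUnsigned(byte x, byte y) {
        return (x | y) & 0xFF;
    }

    // Renders the 8 bits of a byte as text, padded with leading zeros.
    static String bits(byte b) {
        String raw = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", raw).replace(' ', '0');
    }
}
```

With `byte t = (byte) 0b10001000;`, `UnsignedByteOr.bits(t)` returns "10001000", and `UnsignedByteOr.orUnsigned(t, (byte) 1)` returns 137 (binary 10001001). The stored bits were correct all along; only the signed string conversion made them look wrong.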

Purpose of byte type in Java

I read this line in the Java tutorial:
byte: The byte data type is an 8-bit signed two's complement integer. It has
a minimum value of -128 and a maximum value of 127 (inclusive). The
byte data type can be useful for saving memory in large arrays, where
the memory savings actually matters. They can also be used in place of
int where their limits help to clarify your code; the fact that a
variable's range is limited can serve as a form of documentation.
I don't clearly understand the bold line. Can somebody explain it for me?

Byte has a (signed) range from -128 to 127, whereas int has a (also signed) range of −2,147,483,648 to 2,147,483,647.
What it means is that, if the values you're going to use will always be within that range, then by using the byte type you're telling anyone reading your code that this value will always be between -128 and 127, without having to document it.
Still, proper documentation is always key and you should only use it in the case specified for readability purposes, not as a replacement for documentation.

If you're using a variable whose maximum value is 127, you can use byte instead of int so others know, without reading any later if conditions that might check the boundaries, that this variable can only have a value between -128 and 127.
So it's kind of self-documenting code, as mentioned in the text you're citing.
Personally, I do not recommend this kind of "documentation": just because a variable can only hold a maximum value of 127 doesn't reveal its real purpose.

Integers in Java are stored in 32 bits; bytes are stored in 8 bits.
Let's say you have an array with one million entries. Yikes! That's huge!
int[] foo = new int[1000000];
Now, for each of these integers in foo, you use 32 bits or 4 bytes of memory. In total, that's 4 million bytes, or 4MB.
Remember that an integer in Java is a whole number between -2,147,483,648 and 2,147,483,647 inclusively. What if your array foo only needs to contain whole numbers between, say, 1 and 100? That's a whole lot of numbers you aren't using, by declaring foo as an int array.
This is when byte becomes helpful. Bytes store whole numbers between -128 and 127 inclusively, which is perfect for what you need! But why choose bytes? Because they use one-fourth of the space of integers. Now your array is wasting less memory:
byte[] foo = new byte[1000000];
Now each entry in foo takes up 8 bits or 1 byte of memory, so in total, foo takes up only 1 million bytes or 1MB of memory.
That's a huge improvement over using int[] - you just saved 3MB of memory.
Clearly, you wouldn't want to use this for arrays that hold numbers that would exceed 127, so another way of reading the bold line you mentioned is, Since bytes are limited in range, this lets developers know that the variable is strictly limited to these bounds. There is no reason for a developer to assume that a number stored as a byte would ever exceed 127 or be less than -128. Using appropriate data types saves space and informs other developers of the limitations imposed on the variable.

I imagine one can use byte for anything dealing with actual bytes.
Also, the parts (red, green and blue) of colors commonly have a range of 0-255 (although byte is technically -128 to 127, but that's the same amount of numbers).
There may also be other uses.
The general opposition I have to using byte (and probably why it isn't seen as often as it could be) is that there's lots of casting needed. For example, whenever you do arithmetic operations on a byte (except compound assignments such as +=), it is automatically promoted to int (even byte+byte), so you have to cast it if you want to put it back into a byte.
A very elementary example:
FileInputStream::read returns a byte wrapped in an int (or -1). This can be cast to a byte to make it clearer. I'm not supporting this example as such (because I don't really (at this moment) see the point of doing the below), just saying something similar may make sense.
It could also have returned a byte in the first place (and possibly thrown an exception if end-of-file). This may have been even clearer, but the way it was done does make sense.
FileInputStream file = new FileInputStream("Somefile.txt");
int val;
while ((val = file.read()) != -1)
{
byte b = (byte)val;
// ...
}
If you don't know much about FileInputStream, you may not know what read returns, so you see an int and you may assume the valid range is the entire range of int (-2^31 to 2^31-1), or possibly the range of a char (0-65535) (not a bad assumption for file operations), but then you see the cast to byte and you give that a second thought.
If the return type were to have been byte, you would know the valid range from the start.
Another example:
One of Color's constructors could have been changed from 3 ints to 3 bytes instead, since their range is limited to 0-255.
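The casting overhead mentioned earlier in this answer (byte arithmetic being promoted to int) can be illustrated with a small sketch. The class and method names are made up for the example.

```java
final class BytePromotion {

    // a + b has type int, so the explicit cast is required to compile.
    // The cast keeps only the low 8 bits, so the result can wrap around.
    static byte add(byte a, byte b) {
        return (byte) (a + b);
    }

    // Compound assignment includes an implicit narrowing cast,
    // so no explicit cast is needed here.
    static byte increment(byte a) {
        a += 1;
        return a;
    }
}
```

Here `BytePromotion.add((byte) 100, (byte) 100)` returns -56 rather than 200, and `BytePromotion.increment((byte) 127)` returns -128, which is also why a byte is an awkward fit for the 0-255 color range without masking.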

It means that knowing that a value is explicitly declared as a very small number might help you recall the purpose of it.
Go for real docs when you have to create documentation for your code, though; relying on datatypes is not documentation.

An int has 2 to the 32nd power possible values, from -2,147,483,648 to 2,147,483,647. This is a huge range, and if you are scoring a test that is out of 100, then you are wasting that extra space if all of your numbers are between 0 and 100. It just takes more memory and hard disk space to store ints, and in serious data-driven applications this translates to money wasted if you are not using the extra range that ints provide.

byte data types are generally used when you want to handle data in the form of streams, either from a file or from the network. The reason is that networks and files work on the concept of bytes.
Example: FileOutputStream.write accepts a byte array as its input parameter.

Unsigned short in Java

How can I declare an unsigned short value in Java?

You can't, really. Java doesn't have any unsigned data types, except char.
Admittedly you could use char - it's a 16-bit unsigned type - but that would be horrible in my view, as char is clearly meant to be for text: when code uses char, I expect it to be using it for UTF-16 code units representing text that's interesting to the program, not arbitrary unsigned 16-bit integers with no relationship to text.

If you really need a value with exactly 16 bits:
Solution 1: Use the available signed short and stop worrying about the sign, unless you need to do comparison (<, <=, >, >=) or division (/, %, >>) operations. See this answer for how to handle signed numbers as if they were unsigned.
Solution 2 (where solution 1 doesn't apply): Use the lower 16 bits of int and remove the higher bits with & 0xffff where necessary.
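As a rough sketch of how the two solutions combine: addition works unchanged on a signed short, while comparison, division and right shift are done on the lower 16 bits of an int. The class and method names are hypothetical, and Short.toUnsignedInt assumes Java 8 or later (it is equivalent to `& 0xffff`).

```java
final class UnsignedShortOps {

    // Addition is sign-agnostic: the low 16 bits are the same either way.
    static short add(short a, short b) {
        return (short) (a + b);
    }

    // Comparison must look at the unsigned values.
    static int compare(short a, short b) {
        return Integer.compare(Short.toUnsignedInt(a), Short.toUnsignedInt(b));
    }

    // Division on the unsigned values, stored back into 16 bits.
    static short divide(short a, short b) {
        return (short) (Short.toUnsignedInt(a) / Short.toUnsignedInt(b));
    }

    // Right shift that fills with zeros from the 16-bit boundary.
    static short shiftRight(short a, int n) {
        return (short) (Short.toUnsignedInt(a) >>> n);
    }
}
```

For instance, `compare((short) -1, (short) 1)` is positive because -1 stands for 65535, and `divide((short) -2, (short) 2)` yields 32767 (65534 / 2).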

This is a really stale thread, but for the benefit of anyone coming after. The char is a numeric type. It supports all of the mathematical operators, bit operations, etc. It is an unsigned 16.
We process signals recorded by custom embedded hardware so we handle a lot of unsigned 16 from the A-D's. We have been using chars all over the place for years and have never had any problems.

You can use a char, as it is an unsigned 16 bit value (though technically it is a Unicode character, so it could potentially change to be a 24 bit value in the future)... the other alternative is to use an int and make sure it is within range.
Don't use a char - use an int :-)
And here is a link discussing Java and the lack of unsigned.

From DataInputStream.java
public final int readUnsignedShort() throws IOException {
int ch1 = in.read();
int ch2 = in.read();
if ((ch1 | ch2) < 0)
throw new EOFException();
return (ch1 << 8) + (ch2 << 0);
}
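A short usage sketch for the method quoted above, reading from an in-memory array instead of a file. The wrapper class and method name are made up for illustration; the checked IOException is rethrown as UncheckedIOException only to keep the example compact.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ReadUnsignedShortDemo {

    // Reads the first two bytes of data as a big-endian unsigned 16-bit value.
    static int readFirstUnsignedShort(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUnsignedShort(); // result is an int in 0..65535
        } catch (IOException e) {
            // EOFException (fewer than two bytes) ends up here as well.
            throw new UncheckedIOException(e);
        }
    }
}
```

`ReadUnsignedShortDemo.readFirstUnsignedShort(new byte[] {(byte) 0xFF, (byte) 0xFE})` returns 65534, where readShort would have returned -2.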

It is not possible to declare a type unsigned short, but in my case, I needed to get the unsigned number to use it in a for loop. There is the method toUnsignedInt in the class Short that returns "the argument converted to int by an unsigned conversion":
short signedValue = -4767;
System.out.println(signedValue ); // prints -4767
int unsignedValue = Short.toUnsignedInt(signedValue);
System.out.println(unsignedValue); // prints 60769
Similar methods exist for Integer and Long:
Integer.toUnsignedLong
Long.toUnsignedString : In this case it ends up in a String because there isn't a bigger numeric type.
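Beyond the conversions listed above, Java 8 also added static helpers for unsigned arithmetic on int and long. The sketch below wraps a few of them; the wrapper class and its method names are hypothetical, while the Integer and Long methods it calls are part of the standard library since Java 8.

```java
final class UnsignedHelpers {

    // Widens byte, short and int to the next size up without sign extension.
    static long[] widen(byte b, short s, int i) {
        return new long[] {
            Byte.toUnsignedInt(b),
            Short.toUnsignedInt(s),
            Integer.toUnsignedLong(i)
        };
    }

    // Divides treating x as an unsigned 32-bit value.
    static int halfOfUnsigned(int x) {
        return Integer.divideUnsigned(x, 2);
    }

    // Compares two ints as unsigned 32-bit values.
    static boolean unsignedLess(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0;
    }

    // Text conversions for the full unsigned ranges.
    static int parse(String text) {
        return Integer.parseUnsignedInt(text);
    }

    static String format(long value) {
        return Long.toUnsignedString(value);
    }
}
```

For example, `UnsignedHelpers.parse("4294967295")` returns -1 (all 32 bits set), and `UnsignedHelpers.format(-1L)` returns "18446744073709551615".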

No such type in java

Yep, no such thing if you want to use the value in code rather than just in bit operations.

"In Java SE 8 and later, you can use the int data type to represent an unsigned 32-bit integer, which has a minimum value of 0 and a maximum value of 2^32-1." However this only applies to int and long but not short :(

If using a third party library is an option, there is jOOU (a spin off library from jOOQ), which offers wrapper types for unsigned integer numbers in Java. That's not exactly the same thing as having primitive type (and thus byte code) support for unsigned types, but perhaps it's still good enough for your use-case.
import static org.joou.Unsigned.*;
// and then...
UShort s = ushort(1);
(Disclaimer: I work for the company behind these libraries)

No, really there is no such method, java is a high-level language. That's why Java doesn't have any unsigned data types.

He said he wanted to create a multi-dimensional short array. Yet no one suggested bitwise operators? From what I read you want to use 16 bit integers over 32 bit integers to save memory?
So firstly to begin 10,000 x 10,000 short values is 1,600,000,000 bits, 200,000,000 bytes, 200,000 kilobytes, 200 megabytes.
If you need something with 200MB of memory consumption, you may want to redesign this idea. It will compile, but it may fail at run time with an OutOfMemoryError if the heap is too small. Rather than initializing large arrays like that, consider two techniques called on-demand loading and data caching. On-demand loading means loading data only as it is needed. Data caching does the same thing, but uses a custom framework for evicting old data and adding new information as needed. This one is tricky to get GOOD speed performance from. There are other things you can do, but those two are my favorite when done right.
Alright back to what I was saying about bitwise operators.
So take a 32-bit integer, or in Java an "int". You can store individual "bits" in it, so let's say you had 32 Boolean values. In Java, individual byte, short and int variables typically each take up 32 bits (long takes 64), while in arrays they take up 8 bits for byte, 16 for short, and 32 for int. So unless you have arrays, you typically don't get any memory benefit from using a byte or short. This does not mean you shouldn't use them, as it's a way to ensure you and others know the data range the value should have.
Now as I was saying you could effectively store 32 Booleans into a single integer by doing the following:
int many_booleans = -1; //All are true
many_booleans = 0; //All are false
many_booleans = 1 | 2 | 8; //Bits 1, 2, and 4 are true, the rest are false
So now a short consists of 16 bits so 16 + 16 = 32 which fits PERFECTLY within a 32bit integer. So every int value can consist of 2 short values.
int two_shorts = (value & 0xFFFF) | (value2 << 16); //mask value so its sign extension cannot overwrite value2
What the above is doing: value is something between -32768 and 32767, or as an unsigned value 0 - 65535. Let's say value equaled -1, so as an unsigned value it was 65535. This would mean bits 1 through 16 are turned on (numbered 0 - 15 when actually performing the math).
We then need to activate bits 17 - 32, so value2 has to start above the first 16 bits. We get there by multiplying value2 by 65536, which is what "<< 16" does. Let's say value2 equaled 3: it would be OR'd in as 3x65536 = 196608. So our integer value would equal 65535 + 196608 = 262143.
int assumed_value = 262143;
So let's say we want to retrieve the two 16-bit integer values:
short value1 = (short)(assumed_value & 0xFFFF); //-1
short value2 = (short)(assumed_value >> 16); //=3
Also, basically think of bitwise operators as powers of 2. That is all they really are. Never look at it in terms of 0's and 1's. I mostly posted this to assist anyone who may come across this while searching for unsigned short or even possibly multi-dimensional arrays. If there are any typos, I apologize; I wrote this up quickly.
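The packing and unpacking steps of this answer can be collected into one small sketch. The class and method names are made up for illustration. Note that the low value has to be masked with 0xFFFF before the OR; otherwise a negative low value is sign-extended and overwrites the high half.

```java
final class ShortPacking {

    // Stores low in bits 0..15 and high in bits 16..31 of one int.
    static int pack(short low, short high) {
        return (low & 0xFFFF) | (high << 16);
    }

    // The narrowing cast keeps exactly the low 16 bits.
    static short low(int packed) {
        return (short) packed;
    }

    // Shifting right by 16 moves the high half down.
    static short high(int packed) {
        return (short) (packed >> 16);
    }
}
```

With the numbers from the text, `ShortPacking.pack((short) -1, (short) 3)` gives 262143, and `low` and `high` of that value give back -1 and 3.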

Java does not have unsigned types. What do you need it for?
Java does have the 'byte' data type, however.

You can code yourself up a ShortUnsigned class and define methods for those operators you want. You won't be able to overload + and - and the others on them, nor have implicit type conversion with other primitive or numeric object types, alas.
Like some of the other answerers, I wonder why you have this pressing need for unsigned short that no other data type will fill.

Simple program to show why unsigned numbers are needed:
package shifttest;
public class ShiftTest{
public static void main(String[] args){
short test = -15000;
System.out.format ("0x%04X 0x%04X 0x%04X 0x%04X 0x%04X\n",
test, test>>1, test>>2, test>>3, test>>4);
}
}
results:
0xC568 0xFFFFE2B4 0xFFFFF15A 0xFFFFF8AD 0xFFFFFC56
Now for those that are not system types:
Java does an arithmetic shift because the operand is signed. However, there are cases where a 16-bit logical shift would be appropriate, and Java (Sun in particular) deemed unsigned types unnecessary; the >>> operator does not help directly, because the short is first promoted to a sign-extended int. Too bad for us on their shortsightedness. Shift, And, Or, and Exclusive Or are limited tools when all you have are signed longer numbers. This is a particular problem when interfacing to hardware devices that talk "REAL" computer bits that are 16 bits or more. "char" is not guaranteed to work (it is two bytes wide now), but several Eastern glyph-based languages, such as Chinese, Korean, and Japanese, require at least 3 bytes. I am not acquainted with the number needed for Sanskrit-style languages. The number of bytes does not depend on the programmer but rather on the standards committee for Java. So basing char on 16 bits has a downstream risk. To safely implement unsigned shorts in Java, a special class is the best solution, given the aforementioned ambiguities. The downside of the class is the inability to overload the mathematical operations for this special class. Many of the contributors to this thread have accurately pointed out these issues, but my contribution is a working code example and my experience with 3-byte glyph languages in C++ under Linux.
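One common workaround for the output above, sketched here with a hypothetical helper name, is to mask the short down to its 16 bits before shifting. After the mask the value is a non-negative int, so the shift fills with zeros.

```java
final class LogicalShift16 {

    // Treats value as an unsigned 16-bit quantity and shifts it right by n,
    // filling with zeros from the left.
    static int shiftRightUnsigned16(short value, int n) {
        return (value & 0xFFFF) >>> n;
    }
}
```

For `short test = -15000` (0xC568), `LogicalShift16.shiftRightUnsigned16(test, 1)` gives 0x62B4 and a shift by 4 gives 0x0C56, instead of the sign-extended 0xFFFFE2B4 and 0xFFFFFC56 shown in the results.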

import java.nio.ByteBuffer; // needed for ByteBuffer.wrap below
// here is a method for getting the equivalent of an unsigned short
public static int getShortU(byte [] arr, int i ) throws Exception
{
try
{
byte [] b = new byte[2];
b[1] = arr[i];
b[0] = arr[i+1];
int k = ByteBuffer.wrap(b).getShort();
//if this:
//int k = ((int)b[0] << 8) + ((int)b[1] << 0);
//65536 = 2**16
if ( k <0) k = 65536+ k;
return k;
}
catch(Throwable t)
{
throw new Exception ("from getShort: i=" + i);
}
}
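The method above reads arr[i] as the low byte and arr[i+1] as the high byte, i.e. little-endian order. Assuming that is the intent, a more compact sketch of the same idea lets ByteBuffer handle the byte order and uses a mask instead of the `k < 0` correction. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LittleEndianReader {

    // Reads two bytes starting at index i as a little-endian unsigned short.
    // Throws an unchecked exception if fewer than two bytes are available.
    static int getShortU(byte[] arr, int i) {
        short raw = ByteBuffer.wrap(arr, i, 2)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getShort();
        return raw & 0xFFFF; // 0..65535
    }
}
```

`LittleEndianReader.getShortU(new byte[] {0x34, 0x12}, 0)` returns 0x1234 (4660).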

What is the best way to work around the fact that ALL Java bytes are signed?

In Java, there is no such thing as an unsigned byte.
Working with some low level code, occasionally you need to work with bytes that have unsigned values greater than 127, which causes Java to interpret them as a negative number due to the MSB being used for sign.
What's a good way to work around this? (Saying don't use Java is not an option)

It is actually possible to get rid of the if statement and the addition if you do it like this.
byte[] foobar = ..;
int value = (foobar[10] & 0xff);
The byte is still sign-extended when it is promoted to int, but the mask clears the upper 24 bits, so the result is always in the range 0 to 255.

When reading any single value from the array copy it into something like a short or an int and manually convert the negative number into the positive value it should be.
byte[] foobar = ..;
int value = foobar[10];
if (value < 0) value += 256; // Patch up the 'falsely' negative value
You can do a similar conversion when writing into the array.
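A minimal sketch of both directions, with hypothetical helper names: reading masks the byte up to 0..255, and writing narrows the int back with a cast. The range check on writing is an added assumption, not something the answer above asks for.

```java
final class UnsignedByteArray {

    // Reads element i as an unsigned value 0..255.
    static int get(byte[] array, int i) {
        return array[i] & 0xFF;
    }

    // Writes an unsigned value 0..255 into element i.
    static void set(byte[] array, int i, int value) {
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException("not an unsigned byte: " + value);
        }
        array[i] = (byte) value; // keeps the low 8 bits; 200 is stored as -56
    }
}
```

After `UnsignedByteArray.set(foobar, 10, 200)`, the array holds -56 at index 10, and `UnsignedByteArray.get(foobar, 10)` returns 200 again.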

Using ints is generally better than using shorts because java uses 32-bit values internally anyway (Even for bytes, unless in an array) so using ints will avoid unnecessary conversion to/from short values in the bytecode.

Probably your best bet is to use an integer rather than a byte. It has the room to allow for numbers greater than 127 without the overhead of having to create a special object to replace byte.
This is also suggested by people smarter than me (everybody)
http://www.darksleep.com/player/JavaAndUnsignedTypes.html
http://www.jguru.com/faq/view.jsp?EID=13647

The best way to do bit manipulation/unsigned bytes is through using ints. Even though they are signed they have plenty of spare bits (32 total) to treat as an unsigned byte. Also, all of the mathematical operators will convert smaller fixed precision numbers to int. Example:
short a = 1; // Java has no short literal suffix; the int constant is narrowed
short b = 2;
int c = a + b; // the result is up-converted
short small = (short)c; // must cast to get it back to short
Because of this it is best to just stick with integer and mask it to get the bits that you are interested in. Example:
int a = 32;
int b = 128;
int foo = (a + b) & 255; // AND with the mask keeps only the low 8 bits
Here is some more info on Java primitive types http://mindprod.com/jgloss/primitive.html
One last trivial note, there is one unsigned fixed precision number in Java. That is the char primitive.

I know this is a very late response, but I came across this thread when trying to do the exact same thing. The issue is simply trying to determine if a Java byte is >127.
The simple solution is:
if((val & (byte)0x80) != 0) { ... }
If the real issue is >128 instead, just adding another condition to that if-statement will do the trick.

I guess you could just use a short to store them. Not very efficient, but really the only option besides some herculean effort that I have seen.
