Converting US-ASCII encoded byte to integer and back


I have a byte array that can be of size 2, 3 or 4. I need to convert this to the correct integer value. I also need to do this in reverse, i.e. convert a 2-, 3- or 4-digit integer to a byte array.
E.g., the raw byte values are 54 and 49 (decimal; 0x36 and 0x31 in hex). Decoded as US-ASCII, the string is "61", so the integer answer needs to be 61.
I have read all the conversion questions on Stack Overflow etc. that I could find, but they all give the completely wrong answer. I don't know whether it could be the encoding?
If I do new String(lne,"US-ASCII"), where lne is my byte array, I get the correct 61. But when doing this ((int)lne[0] << 8) | ((int)lne[1] & 0xFF), I get the completely wrong answer.
This may be a silly mistake or I completely don't understand the number representation schemes in Java and the encoding/decoding idea.
Any help would be appreciated.
NOTE: I know I can just parse the String to integer, but I would like to know if there is a way to use fast operations like shifting and binary arithmetic instead?

Here's a thought on how to use fast operations like byte shifting and decimal arithmetic to speed this up. Assuming you have the current code:
byte[] token; // bytes representing a bunch of ascii numbers
int n = Integer.parseInt(new String(token)); // current approach
Then you could instead replace that last line and do the following (assuming no negative numbers, no foreign language characters, etc.):
int n = 0;
for (byte b : token)
n = 10*n + (b-'0');
Out of interest, this resulted in roughly a 28% speedup for me on a massive data set. I think this is due to not having to allocate new String objects and then trash them after each parseInt call.
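For the question's example, the shift expression treats the two bytes as base-256 digits: (54 << 8) | 49 is 13873, not 61. The bytes are ASCII character codes, so each one has to be mapped to a digit (subtract '0') and combined in base 10. Below is a minimal sketch of both directions. The class and method names are hypothetical, and it assumes non-negative values that fit in an int and input consisting only of the bytes '0' to '9'.

```java
final class AsciiDigits {
    /** Parses ASCII decimal digits, e.g. {54, 49} (the text "61") gives 61. */
    static int parse(byte[] token) {
        int n = 0;
        for (byte b : token) {
            if (b < '0' || b > '9') {
                throw new NumberFormatException("not an ASCII digit: " + b);
            }
            n = 10 * n + (b - '0'); // no overflow check: assumes short inputs
        }
        return n;
    }

    /** Writes a non-negative value as exactly `width` ASCII digits, zero-padded on the left. */
    static byte[] format(int value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled");
        }
        byte[] out = new byte[width];
        for (int i = width - 1; i >= 0; i--) {
            out[i] = (byte) ('0' + value % 10);
            value /= 10;
        }
        if (value != 0) {
            throw new IllegalArgumentException("value does not fit in " + width + " digits");
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] lne = {54, 49};
        System.out.println(parse(lne));                                  // 61
        System.out.println(((int) lne[0] << 8) | ((int) lne[1] & 0xFF)); // 13873
        byte[] back = format(61, 2);
        System.out.println(back[0] + " " + back[1]);                     // 54 49
    }
}
```

The fixed width in format is a design choice for this sketch: it matches the 2, 3 or 4 byte fields described in the question, but a variable-width variant would work equally well.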

You need two conversion steps. First, convert your ascii bytes to a string. That's what new String(lne,"us-ascii") does for you. Then, convert the string representation of the number to an actual number. For that you use something like Integer.parseInt(theString) -- remember to handle NumberFormatException.

As you say, new String(lne,"US-ASCII") will give you the correct string. To convert your String to an integer, use int myInt = Integer.parseInt(new String(lne,"US-ASCII"));
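The two-step approach, including the reverse direction asked for in the question, could look roughly like this. It is a sketch only: the class name, method names and the fallback parameter are hypothetical, and returning a fallback is just one possible way to handle NumberFormatException.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class AsciiRoundTrip {
    /** Decodes the bytes as US-ASCII text and parses it; returns `fallback` if it is not a valid int. */
    static int toInt(byte[] lne, int fallback) {
        try {
            return Integer.parseInt(new String(lne, StandardCharsets.US_ASCII));
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    /** Encodes an int as the US-ASCII bytes of its decimal representation. */
    static byte[] toBytes(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(toInt(new byte[] {54, 49}, -1)); // 61
        System.out.println(Arrays.toString(toBytes(61)));   // [54, 49]
    }
}
```

Passing StandardCharsets.US_ASCII instead of the charset name "us-ascii" avoids the checked UnsupportedEncodingException.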

Related

Creating a BigInteger from String and pad "0"s to it

Inputs:
InputParam1: a decimal number in String format (e.g. 30)
InputParam2: an integer giving the number of 0s to prepend (e.g. 6)
To convert a number from decimal to binary and pad digits to its front, I need to perform the following steps:
Step 1: BigInteger binary = new BigInteger(InputParam1, 2);
--> This works when I define the BigInteger with base 16 (which I tried when base 2 failed), but not with base 2 as above.
It throws a NumberFormatException.
Step 2: String pad = StringUtils.repeat("0", InputParam2);
(to repeat 0 'InputParam2' number of times)
--> This works fine.
Step 3: I need to prepend pad to binary (from Step 1).
For a BigInteger, I can't find anything similar to .append.
So I'm trying to a) convert this BigInteger to a String, b) prepend pad, and c) convert it back to a BigInteger (this step I can't do without creating a new BigInteger).
Any pointers on Step 1 and Step 3c would be helpful, please.

#1 - You haven't said what value you're passing in InputParam1, but my guess is that you're passing in something other than a string containing only '0' and '1' characters. The 'radix' parameter to the constructor tells the code how to interpret the string you're giving it. This works fine for me:
BigInteger binary=new BigInteger("1000",2);
System.out.println(binary);
How you construct the number has nothing to do with how you're eventually going to want to represent or display that number. The key idea here is that the representation of a number (hex, decimal, binary) has nothing to do with the number itself, just like leading zeros don't affect the value of a number. 1111 binary, f hex, and 15 decimal are all the same number. If you pass the three of these into BigInteger's constructor with a radix of 2, 16 and 10 respectively, you'll end up with EXACTLY the same thing in each case. The object will lose any notion of what kind of representation you used to set it to its initial value.
#3 - There is no concept of a number (int, BigInteger, etc.) with zero padding at the front, just as there's no notion of which base/radix you might use to represent a number visually. You can think about it conceptually, but that's just a way of displaying the number. It has nothing to do with the number itself.
I hadn't tried it before, but it seems there's no super simple way to format a binary value in Java with leading 0s, like there is for decimal and hex, since String.format() doesn't give a format specifier for binary. Per this StackOverflow post How to get 0-padded binary representation of an integer in java?, this seems to be about the best way to go, having converted the most accepted answer to work with BigInteger:
String str = String.format("%16s", binary.toString(2)).replace(' ', '0');
So here's all of my sample code, with output:
BigInteger binary=new BigInteger("1000",2);
System.out.println(binary);
String str = String.format("%16s", binary.toString(2)).replace(' ', '0');
System.out.println(str);
Output:
8
0000000000001000
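Applied to the inputs from the question ("30" and 6), the same idea might be sketched as follows. The class and method names are hypothetical. Note that "30" has to be parsed with radix 10, since '3' is not a binary digit; that is why new BigInteger("30", 2) throws NumberFormatException.

```java
import java.math.BigInteger;

final class PaddedBinary {
    /** Parses a decimal string and returns its binary digits with `zeros` extra '0' characters in front. */
    static String toPaddedBinary(String decimal, int zeros) {
        String bits = new BigInteger(decimal, 10).toString(2);
        StringBuilder sb = new StringBuilder(zeros + bits.length());
        for (int i = 0; i < zeros; i++) {
            sb.append('0');
        }
        return sb.append(bits).toString();
    }

    public static void main(String[] args) {
        String padded = toPaddedBinary("30", 6);
        System.out.println(padded);                    // 00000011110
        // Parsing the padded text back gives the same number: the zeros exist only in the String.
        System.out.println(new BigInteger(padded, 2)); // 30
    }
}
```

The padding lives in the String, not in the BigInteger, which is why step 3c in the question cannot preserve it.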

Convert 32bit binary string to byte

I am using FileInputStream to read a single byte at a time. I then turn it into a string using Integer.toBinaryString(), and later need to convert it back into a byte.
Here's my problem: Java appears to treat the byte as 32 bits, meaning that the string returned by Integer.toBinaryString() can have a length of 32, and I've been getting quite frustrated trying to convert that back into a byte, because methods like Byte.valueOf() and so on all throw an error about the value being out of byte range.
Am I missing something? Or do I need to develop my own method?

I found the simple answer :|
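import java.math.BigInteger;

public class abc
{
    public static void main (String[] args)
    {
        byte b = -1;
        // b is widened to int (sign-extended), so this yields 32 ones
        String binString = Integer.toBinaryString(b);
        // byteValue() keeps only the low 8 bits
        byte result = new BigInteger(binString, 2).byteValue();
        System.out.println(result); // -1
    }
}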
It truncates the binString, but that doesn't affect the byte value, because the 24 bits on the left aren't relevant to the byte.
In the example above I use a BigInteger and its byteValue() method rather than a plain 0xff mask because Integer.toBinaryString() returns a string of 32 ones here. Integer.parseInt(binString, 2) rejects that string, since the value (4294967295) is larger than Integer.MAX_VALUE, so there is no int to apply & to. BigInteger has no such limit, and byteValue() keeps only the low 8 bits, which has the same effect as trimming the 24 leftmost digits from the string before parsing.
The byte still keeps its value, because the value is contained entirely in the 8 bits on the right (staying true to what a byte is). The 24 extra bits appear because the byte is sign-extended when it is widened to an int for Integer.toBinaryString(); they can be ignored if need be, as in the problem that spawned this thread.
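An alternative that avoids the 32-digit string in the first place is to mask the byte before formatting it. The sketch below is one way to do that; the class and method names are hypothetical. If a 32-digit string already exists, Integer.parseUnsignedInt (Java 8 and later) can also parse it.

```java
final class ByteBits {
    /** Returns exactly 8 binary digits for the byte. */
    static String toBits(byte b) {
        // Masking with 0xFF drops the sign-extended upper bits, leaving at most 8 digits.
        String s = Integer.toBinaryString(b & 0xFF);
        return "00000000".substring(s.length()) + s;
    }

    /** Parses up to 8 binary digits back into a byte. */
    static byte fromBits(String bits) {
        // The parsed value is 0..255, which fits in an int; the cast keeps the low 8 bits.
        return (byte) Integer.parseInt(bits, 2);
    }

    public static void main(String[] args) {
        byte b = -1;
        String bits = toBits(b);
        System.out.println(bits);           // 11111111
        System.out.println(fromBits(bits)); // -1
        // Parsing the unmasked, 32-digit form from Integer.toBinaryString(b):
        System.out.println((byte) Integer.parseUnsignedInt(Integer.toBinaryString(b), 2)); // -1
    }
}
```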

Get least significant bytes from an integer

I need to sum all data bytes in ByteArrayOutputStream, adding +1 to the result and taking the 2 least significant bytes.
int checksum = 1;
for(byte b : byteOutputStream.toByteArray()) {
checksum += b;
}
Any input on taking the 2 least significant bytes would be helpful. Java 8 is used in the environment.

If you really mean least significant bytes then:
checksum & 0xFFFF
If you meant that you want to take least significant bits from checksum, then:
checksum & 0x3

Add
checksum &= 0x0000ffff;
That will zero out everything to the left of the 2 least significant bytes.

Your question is a bit underspecified. You said neither what you want to do with these two bytes nor how you want to store them (which depends on what you want to do).
To get to individual bytes, you can use
byte lowest = (byte)checksum, semiLowest=(byte)(checksum>>8);
If you want to store them in a single integer variable, you have to decide how these bytes are to be interpreted numerically, i.e. signed or unsigned.
If you want a signed interpretation, the operation is as simple as
short lowest2bytes = (short)checksum;
If you want an unsigned interpretation, there’s the obstacle that Java has no dedicated type for that. There is a 2 byte sized unsigned type (char), but using it for numerical values can cause confusion when other code tries to interpret it as character value (i.e. when printing). So in that case, the best solution is to use an int variable again and only initialize it with the unsigned char value:
int lowest2bytes = (char)checksum;
Note that this is semantically equivalent to
int lowest2bytes = checksum&0xffff;
seen in other solutions.
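Putting the pieces together, a checksum routine might be sketched as below. Two things here are assumptions that the question does not state: that each data byte should count as an unsigned value (0 to 255), and that the two result bytes are wanted high byte first. The class and method names are hypothetical.

```java
final class Checksum16 {
    /** Sums the bytes as unsigned values, adds 1, and keeps the 2 least significant bytes. */
    static int checksum(byte[] data) {
        int sum = 1;
        for (byte b : data) {
            sum += b & 0xFF; // assumption: treat each byte as 0..255, not -128..127
        }
        return sum & 0xFFFF;
    }

    /** Splits the 16-bit checksum into two bytes, high byte first (assumed order). */
    static byte[] toTwoBytes(int checksum) {
        return new byte[] {(byte) (checksum >> 8), (byte) checksum};
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFF, 0x01};
        int c = checksum(data);
        System.out.println(c); // 257, i.e. 0x0101
        byte[] two = toTwoBytes(c);
        System.out.println(two[0] + " " + two[1]); // 1 1
    }
}
```

With the signed addition from the question (checksum += b), the same input would give 1 + (-1) + 1 = 1 instead of 257, so the signedness of the bytes matters before the mask is applied.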

Array of chars vs. array of bytes

I've found a few answers about this but none of them seem to apply to my issue.
I'm using the NDK and C++ is expecting an unsigned char array of 1024 elements, so I need to create this in java to pass it as a parameter.
The unsigned char array is expected to contain both numbers and characters.
I have tried this:
byte[] lMessage = new byte[1024];
lMessage[4] = 'a';
The problem is that the element at index 4 then gets added as a numerical value instead of maintaining the 'a' character.
I have also tried
char[] lMessage = new char[1024];
lMessage[4] = 'a';
While this retains the character, it doubles the size of each array element from 8 to 16 bits.
I need the output to be a 8 bit ASCII unsigned array.
Any suggestions?
Thanks.

It is wrong to say that the element "gets added as a numerical value". The only thing that you can say for sure is that it gets added as electrostatic charges in eight cells of your RAM.
How you choose to represent those eight bits (01100001) in order to visualize them has little to do with what they really are, so if you choose to see them as a numerical value, then you might be tricked into believing that they are in fact a numerical value. (Kind of like a self-fulfilling prophecy (wikipedia).)
But in fact they are nothing but 8 electrostatic charges, interpretable in whatever way we like. We can choose to interpret them as a two's complement number (97), we can choose to interpret them as a binary-coded decimal number (61), we can choose to interpret them as an ASCII character ('a'), we can choose to interpret them as an x86 instruction opcode (popa), the list goes on.
The closest thing to an unsigned char in C++ is a byte in java. That's because the fundamental characteristic of these small data types is how many bits long they are. Chars in C++ are 8-bit long, and the only type in java which is also 8-bits long is the byte.
Unfortunately, a byte in java tends to be thought of as a numerical quantity rather than as a character, so tools (such as debuggers) that display bytes will display them as little numbers. But this is just an arbitrary convention: they could have just as easily chosen to display bytes as ASCII characters, and then you would be seeing an actual 'a' in lMessage[4].
So, don't be fooled by what the tools are showing, all that counts is that it is an 8-bit quantity. And if the tools are showing 97 (0x61), then you know that the bit pattern stored in those 8 memory cells can just as legitimately be thought of as an 'a', because the ASCII code of 'a' is 97.
So, finally, to answer your question, what you need to do is find a way to convert a java string, which consists of 16-bit unicode characters, to an array of ASCII characters, which would be bytes in java. You can try this:
String s = "TooManyEduardos";
byte[] bytes = s.getBytes("US-ASCII");
Or you can read the answers to this question: Convert character to ASCII numeric value in java for more ideas.
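As a rough illustration of filling a fixed-size buffer this way, consider the sketch below. The class and method names are hypothetical, the 1024-byte size comes from the question, and the NDK call itself is not shown. It uses StandardCharsets.US_ASCII, which avoids the checked exception thrown by the getBytes(String) overload.

```java
import java.nio.charset.StandardCharsets;

final class AsciiBuffer {
    /** Returns a zero-filled buffer of `size` bytes that starts with the ASCII encoding of `text`. */
    static byte[] build(String text, int size) {
        byte[] src = text.getBytes(StandardCharsets.US_ASCII);
        byte[] buf = new byte[size];
        System.arraycopy(src, 0, buf, 0, Math.min(src.length, size));
        return buf;
    }

    public static void main(String[] args) {
        byte[] lMessage = build("TooManyEduardos", 1024);
        // The same 8 bits, shown as a number and as a character.
        System.out.println(lMessage[4]);        // 97
        System.out.println((char) lMessage[4]); // a
    }
}
```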

This will work for ASCII chars:
lMessage[4] = String.valueOf('a').getBytes()[0];

What is the best way to work around the fact that ALL Java bytes are signed?

In Java, there is no such thing as an unsigned byte.
Working with some low-level code, you occasionally need to work with bytes that have unsigned values greater than 127, which Java interprets as negative numbers because the MSB is used for the sign.
What's a good way to work around this? (Saying don't use Java is not an option)

It is actually possible to get rid of the if statement and the addition if you do it like this.
byte[] foobar = ..;
int value = (foobar[10] & 0xff);
The mask discards the sign-extended upper 24 bits, so the resulting int holds the unsigned value (0 to 255) instead of a negative number.
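A small sketch of the masking idiom in both directions, with hypothetical class and method names. On Java 8 and later, Byte.toUnsignedInt does the same widening as the mask.

```java
final class UnsignedBytes {
    /** Reads a byte as an unsigned value in the range 0..255. */
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    /** Stores a value in the range 0..255 into a byte; only the low 8 bits are kept. */
    static byte fromUnsigned(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xF0;
        System.out.println(raw);                     // -16
        System.out.println(toUnsigned(raw));         // 240
        System.out.println(Byte.toUnsignedInt(raw)); // 240
        System.out.println(fromUnsigned(240));       // -16
    }
}
```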

When reading any single value from the array copy it into something like a short or an int and manually convert the negative number into the positive value it should be.
byte[] foobar = ..;
int value = foobar[10];
if (value < 0) value += 256; // Patch up the 'falsely' negative value
You can do a similar conversion when writing into the array.

Using ints is generally better than using shorts because java uses 32-bit values internally anyway (Even for bytes, unless in an array) so using ints will avoid unnecessary conversion to/from short values in the bytecode.

Probably your best bet is to use an integer rather than a byte. It has room for numbers greater than 127 without the overhead of creating a special object to replace byte.
This is also suggested by people smarter than me (everybody)
http://www.darksleep.com/player/JavaAndUnsignedTypes.html
http://www.jguru.com/faq/view.jsp?EID=13647

The best way to do bit manipulation/unsigned bytes is through using ints. Even though they are signed they have plenty of spare bits (32 total) to treat as an unsigned byte. Also, all of the mathematical operators will convert smaller fixed precision numbers to int. Example:
short a = 1; // Java has no literal suffix for short
short b = 2;
int c = a + b; // the result is up-converted
short small = (short)c; // must cast to get it back to short
Because of this it is best to just stick with integer and mask it to get the bits that you are interested in. Example:
int a = 32;
int b = 128;
int foo = (a + b) & 255; // mask: keep only the low 8 bits
Here is some more info on Java primitive types http://mindprod.com/jgloss/primitive.html
One last trivial note, there is one unsigned fixed precision number in Java. That is the char primitive.

I know this is a very late response, but I came across this thread when trying to do the exact same thing. The issue is simply trying to determine if a Java byte is >127.
The simple solution is:
if((val & (byte)0x80) != 0) { ... }
If the real issue is >128 instead, just adding another condition to that if-statement will do the trick.

I guess you could just use a short to store them. Not very efficient, but really the only option besides some herculean effort that I have seen.
