How can I read a file as unsigned bytes in Java?

How can I read a file to bytes in Java?
It is important to note that all the bytes need to be positive, i.e. the negative range cannot be used.
Can this be done in Java, and if yes, how?
I need to be able to multiply the contents of a file by a constant. I was assuming that I can read the bytes into a BigInteger and then multiply, however since some of the bytes are negative I am ending up with 12 13 15 -12 etc and get stuck.

Well, Java doesn't have the concept of unsigned bytes... the byte type is always signed, with values from -128 to 127 inclusive. However, this will interoperate just fine with other systems which have worked with unsigned values; for example, C# code writing a byte of "255" will produce a file where the same value is read as "-1" in Java. Just be careful, and you'll be okay.
EDIT: You can convert the signed byte to an int with the unsigned value very easily using a bitmask. For example:
byte b = -1; // Imagine this was read from the file
int i = b & 0xff;
System.out.println(i); // 255
Do all your arithmetic using int, and then cast back to byte when you need to write it out again.
You generally read binary data from files using FileInputStream or possibly FileChannel.
It's hard to know what else you're looking for at the moment... if you can give more details in your question, we may be able to help you more.
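As a rough sketch of the masking approach applied to a whole file: the class name, the helper name and the temporary file below are made up for illustration, and Files.readAllBytes is used here instead of FileInputStream only to keep the example short.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class UnsignedFileBytes {
    /** Converts each signed byte to its unsigned value (0..255), held in an int. */
    static int[] toUnsigned(byte[] signed) {
        int[] result = new int[signed.length];
        for (int i = 0; i < signed.length; i++) {
            result[i] = signed[i] & 0xff; // mask off the sign-extension bits
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file, written here only so the example is self-contained.
        Path path = Files.createTempFile("unsigned-demo", ".bin");
        Files.write(path, new byte[] {12, 13, 15, (byte) 244});

        int[] values = toUnsigned(Files.readAllBytes(path));
        for (int v : values) {
            System.out.println(v); // 12, 13, 15, 244 (the last byte is -12 when read as signed)
        }
        Files.delete(path);
    }
}
```

Keeping the values in an int[] costs four times the memory of the byte[], so for large files it may be better to mask each byte at the point of use.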

With the unsigned API in Java 8 you have Byte.toUnsignedInt. That'll be a lot cleaner than manually casting and masking out.
To convert the int back to a byte after working with it, you just need a cast: (byte) value
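A minimal sketch of that round trip, assuming Java 8 or later. The class and method names are only illustrative, and the wrap-around on overflow is a consequence of the narrowing cast, not something the question asked for.

```java
class UnsignedRoundTrip {
    /** Multiplies the unsigned value of b by factor and narrows the result back to a byte. */
    static byte scale(byte b, int factor) {
        int unsigned = Byte.toUnsignedInt(b); // 0..255
        return (byte) (unsigned * factor);    // keeps only the low 8 bits
    }

    public static void main(String[] args) {
        byte raw = (byte) 100;
        byte scaled = scale(raw, 2);
        System.out.println(scaled);                     // -56 as a signed byte
        System.out.println(Byte.toUnsignedInt(scaled)); // 200 when viewed as unsigned
    }
}
```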

You wrote in a comment (please put such information in the question - there is an edit link for this):
I need to be able to multiply the contents of a file by a constant.
I was assuming that I can read the bytes into a BigInteger and then
multiply, however since some of the bytes are negative I am ending
up with 12 13 15 -12 etc and gets stuck.
If you want to use the whole file as a BigInteger, read it in a byte[], and give this array (as a whole) to the BigInteger-constructor.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;

/**
 * Reads a file and converts the content to a BigInteger.
 * @param f the file. The content is interpreted as a
 *    big-endian base-256 number.
 * @param signed if true, interpret the file's content as two's complement
 *    representation of a signed number.
 *    If false, interpret the file's content as an unsigned
 *    (nonnegative) number.
 */
public static BigInteger fileToBigInteger(File f, boolean signed)
    throws IOException
{
    // File.length() returns a long; the cast assumes the file fits in one array (< 2 GB).
    byte[] array = new byte[(int) f.length()];
    try (InputStream in = new FileInputStream(f)) {
        int i = 0; int r;
        // read() may return fewer bytes than requested, so loop until the array is full.
        while ((r = in.read(array, i, array.length - i)) > 0) {
            i = i + r;
        }
    }
    if (signed) {
        return new BigInteger(array);
    }
    else {
        return new BigInteger(1, array);
    }
}
Then you can multiply your BigInteger and save the result in a new file (using the toByteArray() method).
Of course, this depends heavily on the format of your file - my method assumes the file contains the result of the toByteArray() method, not some other format. If you have some other format, please add information about this to your question.
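Here is a sketch of the multiply-and-write-back step, under the assumption that the content is an unsigned big-endian number and the factor is nonnegative. The class and method names are hypothetical. One detail to watch: toByteArray() returns a two's complement encoding, so a positive result whose top bit is set comes back with an extra leading 0x00 byte, which this sketch strips.

```java
import java.math.BigInteger;
import java.util.Arrays;

class MultiplyBytes {
    /** Treats data as an unsigned big-endian number and multiplies it by a nonnegative factor. */
    static byte[] multiply(byte[] data, long factor) {
        BigInteger value = new BigInteger(1, data); // signum 1: never negative
        byte[] product = value.multiply(BigInteger.valueOf(factor)).toByteArray();
        // Drop the sign byte that toByteArray() prepends when the top bit of the magnitude is set.
        if (product.length > 1 && product[0] == 0) {
            return Arrays.copyOfRange(product, 1, product.length);
        }
        return product;
    }

    public static void main(String[] args) {
        byte[] result = multiply(new byte[] {(byte) 0xFF}, 2); // 255 * 2 = 510 = 0x01FE
        for (byte b : result) {
            System.out.println(b & 0xff); // 1, then 254
        }
    }
}
```

The result can be longer than the input, so whatever reads the file back has to cope with a changed length.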
"I need to be able to multiply the contents of a file by a constant." seems quite a dubious goal - what do you really want to do?

If using a larger integer type internally is not a problem, just go with the easy solution, and add 128 to all integers before multiplying them. Instead of -128 to 127, you get 0 to 255. Addition is not difficult ;)
Also, remember that the arithmetic and bitwise operators in Java promote byte operands to int and return an int, so:
byte a = 0;
byte b = 1;
byte c = a | b;
would give a compile-time error since a | b returns an int. You would have to do
byte c = (byte) (a | b);
So I would suggest just adding 128 to all your numbers before you multiply them.

Some testing revealed that this returns the unsigned byte values in [0…255] range one by one from the file:
Reader bytestream = new BufferedReader(new InputStreamReader(
new FileInputStream(inputFileName), "ISO-8859-1"));
int unsignedByte;
while((unsignedByte = bytestream.read()) != -1){
// do work
}
It seems to work for all bytes in the range, including those that no characters are defined for in ISO 8859-1.
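For comparison, a plain InputStream already hands back unsigned values: the no-argument read() returns an int in the range 0 to 255, or -1 at end of stream, so no charset is involved. The sketch below uses a ByteArrayInputStream as a stand-in for the file stream, and the class and method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamUnsignedRead {
    /** Sums every byte of the stream using its unsigned value. */
    static long sum(InputStream in) throws IOException {
        long total = 0;
        int unsignedByte;
        while ((unsignedByte = in.read()) != -1) { // read() yields 0..255, or -1 at end of stream
            total += unsignedByte;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new BufferedInputStream(new FileInputStream(inputFileName)).
        InputStream in = new ByteArrayInputStream(new byte[] {(byte) 255, 1});
        System.out.println(sum(in)); // 256
    }
}
```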

Related

Hash a hexadecimal number with sha-256 in java [duplicate]

The question is about the correct way of creating a hash in Java:
Let's assume I have a positive BigInteger value that I would like to create a hash from. Let's also assume that the messageDigest instance below is a valid SHA-256 instance.
public static final BigInteger B = new BigInteger("BD0C61512C692C0CB6D041FA01BB152D4916A1E77AF46AE105393011BAF38964DC46A0670DD125B95A981652236F99D9B681CBF87837EC996C6DA04453728610D0C6DDB58B318885D7D82C7F8DEB75CE7BD4FBAA37089E6F9C6059F388838E7A00030B331EB76840910440B1B27AAEAEEB4012B7D7665238A8E3FB004B117B58", 16);
byte[] byteArrayBBigInt = B.toByteArray();
this.printArray(byteArrayBBigInt);
messageDigest.reset();
messageDigest.update(byteArrayBBigInt);
byte[] outputBBigInt = messageDigest.digest();
Now I only assume that the code above is correct, as according to my test the hashes I produce match the one produced by:
http://www.fileformat.info/tool/hash.htm?hex=BD0C61512C692C0CB6D041FA01BB152D4916A1E77AF46AE105393011BAF38964DC46A0670DD125B95A981652236F99D9B681CBF87837EC996C6DA04453728610D0C6DDB58B318885D7D82C7F8DEB75CE7BD4FBAA37089E6F9C6059F388838E7A00030B331EB76840910440B1B27AAEAEEB4012B7D7665238A8E3FB004B117B58
However, I am not sure why we do the step below. Because the byte array returned by the digest() call is signed, and in this case negative, I suspect that we need to convert it to a positive number, for example with a function like this:
public static String byteArrayToHexString(byte[] b) {
String result = "";
for (int i=0; i < b.length; i++) {
result += Integer.toString((b[i] & 0xff) + 0x100, 16).substring(1);
}
return result;
}
thus:
String hex = byteArrayToHexString(outputBBigInt);
BigInteger unsignedBigInteger = new BigInteger(hex, 16);
When I construct a BigInteger from the new hex string and convert it back to byte array then I see that the sign bit, that is most significant bit i.e. the leftmost bit, is set to 0 which means that the number is positive, moreover the whole byte is constructed from zeros ( 00000000 ).
My question is: Is there any RFC that describes why we need to always convert the hash to a "positive" unsigned byte array? I mean, even if the number produced after the digest call is negative it is still a valid hash, right? So why do we need that additional procedure? Basically, I am looking for a paper, standard or RFC describing that we need to do so.
A hash consists of an octet string (called a byte array in Java). How you convert it to or from a large number (a BigInteger in Java) is completely out of the scope for cryptographic hash algorithms. So no, there is no RFC to describe it as there is (usually) no reason to treat a hash as a number. In that sense a cryptographic hash is rather different from Object.hashCode().
That you can only treat hexadecimals as unsigned is a bit of an issue, but if you really want to then you can first convert it back to a byte array, and then perform new BigInteger(result). That constructor does treat the encoding within result as signed. Note that in protocols it is often not needed to convert back and forth to hexadecimals; hexadecimals are mainly for human consumption, a computer is fine with bytes.
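To make the two interpretations concrete, here is a small sketch. The class and method names are hypothetical, and the "abc" input is only a placeholder. The same digest bytes can be read as a nonnegative number with the (signum, magnitude) constructor or as a two's complement number with the single-argument constructor, without going through a hex string.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

class DigestAsNumber {
    /** Reads the bytes as an unsigned (nonnegative) big-endian number. */
    static BigInteger unsigned(byte[] digest) {
        return new BigInteger(1, digest);
    }

    /** Reads the bytes as a two's complement big-endian number. */
    static BigInteger signed(byte[] digest) {
        return new BigInteger(digest);
    }

    public static void main(String[] args) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest("abc".getBytes(StandardCharsets.US_ASCII));
        // Same 32 bytes, two numeric readings; the hash itself is just the byte array.
        System.out.println(unsigned(digest).toString(16));
        System.out.println(signed(digest).toString(16));
    }
}
```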

Convert 32bit binary string to byte

I am using FileInputStream to read a single byte at a time. I then turn it into a string using Integer.toBinaryString(), and later need to convert it back into a byte.
Here's my problem: java uses a 32bit byte, meaning that my string returned by Integer.toBinaryString() has a length of 32, and I've been getting quite frustrated trying to convert that back into a byte because methods like Byte.valueOf() and so on all throw an error pertaining to the value being out of byte range.
Am I missing something? Or do I need to develop my own method.
I found the simple answer :|
public class abc
{
public static void main (String[] args)
{
byte b = -1;
String binString = Integer.toBinaryString(b); //value 1111,1111,1111,1111,1111,1111,1111,1111
new BigInteger(binString, 2).byteValue(); //value -1
}
}
It truncates the binString, but that doesn't affect the byte value, because the 24 bits on the left aren't really relevant to the byte.
Now, in the example above I'm using a BigInteger and its byteValue() method instead of a 0xff solution because Integer.toBinaryString() in the code above returns a 32-character string (32 ones), and Integer.parseInt() rejects it because the value exceeds Integer.MAX_VALUE (a long could hold it, but a BigInteger works for a string of any length). Since the string can't be parsed into an int directly, one can't apply operations like & to it; the solution is to drop the 24 bits on the left and keep only the low 8 bits, which is what byteValue() does in the code above.
The byte will still maintain its value, because the value is only contained in the 8 bits on the right (staying true to what a byte is). The extra 24 bits appear because Java sign-extends the byte when it widens it to an int; they can be ignored if need be, as in the problem which spawned this thread.
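Two alternatives that avoid BigInteger, sketched with made-up class and method names and assuming Java 8 or later: mask the byte before calling Integer.toBinaryString() so the string never grows past 8 digits, or parse the full 32-digit string with Integer.parseUnsignedInt() and let the cast keep the low 8 bits.

```java
class BinaryStringToByte {
    /** Produces at most 8 binary digits, because the sign-extension bits are masked off first. */
    static String toBits(byte b) {
        return Integer.toBinaryString(b & 0xff);
    }

    /** Accepts up to 32 binary digits; the cast keeps only the low 8 bits. */
    static byte fromBits(String bits) {
        return (byte) Integer.parseUnsignedInt(bits, 2);
    }

    public static void main(String[] args) {
        byte b = -1;
        System.out.println(toBits(b));                             // 11111111
        System.out.println(fromBits(toBits(b)));                   // -1
        System.out.println(fromBits(Integer.toBinaryString(b)));   // -1, from the 32-digit form
    }
}
```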

Get least significant bytes from an integer

I need to sum all data bytes in ByteArrayOutputStream, adding +1 to the result and taking the 2 least significant bytes.
int checksum = 1;
for(byte b : byteOutputStream.toByteArray()) {
checksum += b;
}
Any input on taking the 2 least significant bytes would be helpful. Java 8 is used in the environment.
If you really mean least significant bytes then:
checksum & 0xFFFF
If you meant that you want to take least significant bits from checksum, then:
checksum & 0x3
Add
checksum &= 0x0000ffff;
That will zero out everything to the left of the 2 least significant bytes.
Your question is a bit underspecified. You said neither what you want to do with these two bytes nor how you want to store them (which depends on what you want to do).
To get to individual bytes, you can use
byte lowest = (byte)checksum, semiLowest=(byte)(checksum>>8);
In case you want to store them in a single integer variable, you have to decide how these bytes are to be interpreted numerically, i.e. signed or unsigned.
If you want a signed interpretation, the operation is as simple as
short lowest2bytes = (short)checksum;
If you want an unsigned interpretation, there’s the obstacle that Java has no dedicated type for that. There is a 2 byte sized unsigned type (char), but using it for numerical values can cause confusion when other code tries to interpret it as character value (i.e. when printing). So in that case, the best solution is to use an int variable again and only initialize it with the unsigned char value:
int lowest2bytes = (char)checksum;
Note that this is semantically equivalent to
int lowest2bytes = checksum&0xffff;
seen in other solutions.
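Putting the pieces together, a possible sketch of the whole checksum. Two things here are assumptions rather than anything stated in the question: the data bytes are summed as unsigned values (the loop in the question sums them as signed), and the two result bytes are returned high byte first. The names are made up.

```java
class Checksum16 {
    /** Starts at 1, adds every byte as an unsigned value, keeps the 2 least significant bytes. */
    static int checksum(byte[] data) {
        int sum = 1;
        for (byte b : data) {
            sum += b & 0xff; // assumption: bytes count as 0..255, not -128..127
        }
        return sum & 0xFFFF;
    }

    /** Splits the checksum into two bytes; high-byte-first order is an assumption. */
    static byte[] lowTwoBytes(int checksum) {
        return new byte[] {(byte) (checksum >> 8), (byte) checksum};
    }

    public static void main(String[] args) {
        int c = checksum(new byte[] {(byte) 0xFF, (byte) 0xFF});
        System.out.println(c); // 511
        byte[] parts = lowTwoBytes(c);
        System.out.println((parts[0] & 0xff) + " " + (parts[1] & 0xff)); // 1 255
    }
}
```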

Can one force String.valueOf() to return unsigned value in Java? [duplicate]

I have a byte array:
byte[] a = new byte[3];
which I have added some bytes. For this example, let's say 3, 4, and 210.
I would like to print this string of bytes to look like 3 4 210, but instead I get 3 4 -46
I am using String.valueOf(a[i]) to do my conversion. Is there any way to force this conversion to give unsigned values?
Thanks in advance,
EDIT: Thanks to the various feedback on this question. I had not realized Java Bytes were signed values by default, and so was suspecting the String.valueOf() method as being the issue. It turns out just simply using
String.valueOf(a[i]&0xFF)
takes care of the signed formatting issue.
Again, thank you for your feedback!
Guava provides a UnsignedBytes class that can make that conversion. The static toString(byte) method
Returns a string representation of x, where x is treated as unsigned.
For example
System.out.println(UnsignedBytes.toString(a[i]));
where a[i] = -46 would print
210
Internally, all this does is call
public static int toInt(byte value) {
return value & UNSIGNED_MASK; // UNSIGNED_MASK = 0xFF
}
and convert the int to a String which it returns.
For an explanation
With
someByte & 0xFF
since 0xFF is an integer literal, the someByte value is widened to an int. Let's take for example the value -46. Its binary representation is
11111111111111111111111111010010
The binary representation of 0xFF is
11111111 // ie 255
If you AND (&) the two
11111111111111111111111111010010
00000000000000000000000011111111
--------------------------------
00000000000000000000000011010010
which is equal to
210
Basically you only keep the lower 8 bits of the int.
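Applied to the array from the question, a small helper along these lines would print 3 4 210. The class and method names are made up for illustration, and the mask could equally be replaced with Byte.toUnsignedInt on Java 8 or later.

```java
class UnsignedPrint {
    /** Joins the unsigned values of the bytes with single spaces. */
    static String format(byte[] a) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < a.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(a[i] & 0xFF); // widened to int, so append(int) prints 0..255
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] a = {3, 4, (byte) 210};
        System.out.println(format(a)); // 3 4 210
    }
}
```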
Java's byte type has a minimum value of -128 and a maximum value of 127 (inclusive). String.valueOf(a[i]) doesn't convert the value to unsigned. Use the int type instead.
Because a byte is limited to the range -128 to 127, storing 210 gives -46, so convert it using the int type.
You've run into Java's famous problem of bytes treated as signed even though most of the real world prefers these unsigned. Try this:
int[] unsignedArr = new int[a.length];
for (int i=0; i<a.length; ++i) {
unsignedArr[i] = a[i] & 0xff;
}
Then you can work with unsignedArr.

Converting US-ASCII encoded byte to integer and back

I have a byte array that can be of size 2, 3 or 4. I need to convert this to the correct integer value. I also need to do this in reverse, i.e. convert a 2-, 3- or 4-character integer to a byte array.
e.g., the raw byte values are 54 and 49 (decimal; 0x36 and 0x31 in hex). The decoded US-ASCII string is "61". So the integer answer needs to be 61.
I have read all the conversion questions on stackoverflow etc. that I could find, but they all give the completely wrong answer. I don't know whether it could be the encoding?
If I do new String(lne,"US-ASCII"), where lne is my byte array, I get the correct 61. But when doing this ((int)lne[0] << 8) | ((int)lne[1] & 0xFF), I get a completely wrong answer.
This may be a silly mistake or I completely don't understand the number representation schemes in Java and the encoding/decoding idea.
Any help would be appreciated.
NOTE: I know I can just parse the String to integer, but I would like to know if there is a way to use fast operations like shifting and binary arithmetic instead?
Here's a thought on how to use fast operations like byte shifting and decimal arithmetic to speed this up. Assuming you have the current code:
byte[] token; // bytes representing a bunch of ascii numbers
int n = Integer.parseInt(new String(token)); // current approach
Then you could instead replace that last line and do the following (assuming no negative numbers, no foreign language characters, etc.):
int n = 0;
for (byte b : token)
n = 10*n + (b-'0');
Out of interest, this resulted in roughly a 28% speedup for me on a massive data set. I think this is due to not having to allocate new String objects and then trash them after each parseInt call.
You need two conversion steps. First, convert your ascii bytes to a string. That's what new String(lne,"us-ascii") does for you. Then, convert the string representation of the number to an actual number. For that you use something like Integer.parseInt(theString) -- remember to handle NumberFormatException.
As you say, new String(lne,"US-ASCII") will give you the correct string. To convert your String to an integer, use int myInt = Integer.parseInt(new String(lne,"US-ASCII"));
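Since the question also asks for the reverse direction, here is a sketch of both. The class and method names are hypothetical, the digit loop assumes a nonnegative number made only of ASCII digits, and the reverse direction does not pad to a fixed width.

```java
import java.nio.charset.StandardCharsets;

class AsciiDigits {
    /** Parses ASCII decimal digits, e.g. {54, 49} ('6', '1') gives 61. */
    static int parse(byte[] digits) {
        int n = 0;
        for (byte b : digits) {
            if (b < '0' || b > '9') {
                throw new NumberFormatException("not an ASCII digit: " + b);
            }
            n = 10 * n + (b - '0');
        }
        return n;
    }

    /** Encodes the decimal representation of n as US-ASCII bytes. */
    static byte[] format(int n) {
        return Integer.toString(n).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(parse(new byte[] {54, 49})); // 61
        for (byte b : format(61)) {
            System.out.println(b); // 54, then 49
        }
    }
}
```

The shift-and-OR expression in the question reads the two bytes as a base-256 number, which is why it yields 54 * 256 + 49 instead of 61; ASCII digits are base 10.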
