Left shift unsigned byte, a better way? - java

I have an array of bytes (because unsigned byte isn't an option) and need to take 4 of them into a 32 bit int. I'm using this:
byte rdbuf[] = new byte[fileLen+1];
int value = (rdbuf[i++] & 0xff) | ((rdbuf[i++] << 8) & 0xff00) | ((rdbuf[i++] << 16) & 0xff0000) | ((rdbuf[i++] << 24) & 0xff000000);
If I don't do all the bitwise ANDs, it sign-extends the bytes, which is clearly not what I want.
In C this would be a no-brainer. Is there a better way in Java?

You do not have to do this, you can use a ByteBuffer:
int i = ByteBuffer.wrap(rdbuf).order(ByteOrder.LITTLE_ENDIAN).getInt();
If you have many ints to read, the code becomes:
ByteBuffer buf = ByteBuffer.wrap(rdbuf).order(ByteOrder.LITTLE_ENDIAN);
while (buf.remaining() >= 4) // at least four bytes
i = buf.getInt();
Javadoc here. Recommended for use in any situation where binary data has to be dealt with (whether you read or write such data). Can do little endian, big endian and even native ordering. (NOTE: big endian by default).
(edit: @PeterLawrey rightly mentions that this looks like little endian data, fixed code extract -- also, see his answer for how to wrap the contents of a file directly into a ByteBuffer)
NOTES:
ByteOrder has a static method called .nativeOrder(), which returns the byte order used by the underlying architecture;
a ByteBuffer has a builtin offset; the current offset can be queried using .position(), and modified using .position(int); .remaining() will return the number of bytes left to read from the current offset until the end;
there are relative methods which will read from/write at the buffer's current offset, and absolute methods, which will read from/write at an offset you specify.
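The notes above can be sketched as follows. This is only an illustration: the class name ByteBufferNotes, its helper methods, and the sample bytes are made up here and are not part of the original answer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteBufferNotes {
    // Hypothetical sample: two little-endian 32-bit ints, 0x04030201 and 0xFFFFFFFE (-2).
    static final byte[] SAMPLE = {1, 2, 3, 4, (byte) 0xFE, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};

    // Relative read: uses the buffer's current position and advances it by 4.
    static int readRelative(ByteBuffer buf) {
        return buf.getInt();
    }

    // Absolute read: reads at the given offset and leaves the position untouched.
    static int readAbsolute(ByteBuffer buf, int offset) {
        return buf.getInt(offset);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(SAMPLE).order(ByteOrder.LITTLE_ENDIAN);
        int first = readRelative(buf);      // position moves from 0 to 4
        int second = readAbsolute(buf, 4);  // position stays at 4
        System.out.println(Integer.toHexString(first) + " " + second);
        System.out.println(buf.position() + " " + buf.remaining());
        buf.position(0);                    // move back to reread from the start
    }
}
```

No masking is needed anywhere: the buffer assembles the four bytes according to the configured byte order.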

Instead of reading into a byte[] that you then have to wrap with a ByteBuffer, which does the shift/mask for you, you can use a direct ByteBuffer, which avoids all this overhead.
FileChannel fc = new FileInputStream(filename).getChannel();
ByteBuffer bb = ByteBuffer.allocateDirect((int) fc.size()).order(ByteOrder.nativeOrder());
fc.read(bb);
bb.flip();
while(bb.remaining() > 0) {
int n = bb.getInt(); // grab 32-bit from direct memory without shift/mask etc.
short s = bb.getShort(); // grab 16-bit from direct memory without shift/mask etc.
// get a String with an unsigned 16 bit length followed by ISO-8859-1 encoding.
int len = bb.getShort() & 0xFFFF;
StringBuilder sb = new StringBuilder(len);
for(int i=0;i<len;i++) sb.append((char) (bb.get() & 0xFF));
String text = sb.toString();
}
fc.close();

Related

How to read and write to/from file at a bit level precision in Java

I am a little new to bit-level reading and writing to/from files.
I want to read and write to a file with bit-level precision, i.e. read into the buffer exactly the same number of bits as were written.
Here is my attempt:
1) Write to file method
private static final int INT_BYTES = Integer.SIZE / Byte.SIZE;
public void writeToFile(FileChannel fc, BitSet bitSet, int intId ) throws IOException
{
//each bitSet has a intId, first two int bytes written will have intId and bitSet.length() and rest will have content of bitSet
int byteLenOfBitSet = (int) Math.ceil(bitSet.length() / 8.0);
ByteBuffer bf = ByteBuffer.allocate(byteLenOfBitSet + 2 * INT_BYTES);
bf.putInt(intId); // put the id
bf.putInt(bitSet.length()); // put the bit length; it is used during read
bf.put(bitSet); // FIXME this is wrong: need to put bits, but bf.put() puts bytes
bf.flip();
fc.write(bf);
}
2) Read from file method
public Result readFromFile(FileChannel fc) throws IOException
{
long currentPos = fc.position();
ByteBuffer bf = ByteBuffer.allocate(2* INT_BYTES);
if(fc.read(bf) < 0)return null;
bf.rewind();
int intId=bf.getInt(); //read first int as intId
int bitLen = bf.getInt(); //read second int as bitLen to read further from file
int byteLen=(int)Math.ceil((bitLen/8.0)); //FIXME not so sure
//move fc read position ahead by 2 INT_BYTES to read bitSet
fc.position((currentPos + INT_BYTES * 2));
bf = ByteBuffer.allocate(byteLen);//FIXME, this is wrong, we need to somehow get bit not in bytelen , don't want unnecessarily read entire byte if there is less than 8 bit of info to read
if(fc.read(bf) < 0)return null;
bf.rewind();
BitSet readBitSet = new BitSet();
//TODO, then read each bit from bf to a readBitSet
// and return result with intId+readBitSet
}
In another set of methods, where I had to read and write just Integers (at byte level) I had got it working fine using logic similar to above. But, got stuck at bit level.
Please let me know if need more clarification.
It might be similar to Read and write file bit by bit
but that answer is for Perl, I am looking for implementation in Java
EDIT
My concern:
Since data are written to file this way
2 INT_BYTES then bitSet example: 5 3 101
2 INT_BYTES then bitSet example: 2 10 1010111101
I am concerned I might read second 2 INT_BYTES while trying to read first bitSet, so my first result bitSet would be wrong. So, wondering how to ensure that bit level boundary is maintained. i.e. I want to read until first BitSet's length only when reading first bitSet.
This answer includes a BitSet subclass that has a toByteArray method. To write, you can get the byte[] array from that method and use ByteBuffer.put(byte[]) (ByteBuffer docs). To read, you can use get() and then loop over the byte[] and rebuild the bitset.
(For reference: FileChannel docs)
EDIT in answer to your question below.
I think you can get rid of fc.position since fc.read and bf.getInt both advance their current positions.
According to this answer, the parameter to allocate should be the number of bytes you want to read with a call to fc.read. So 2*INT_BYTES looks correct for the first allocate call. The second allocate also looks OK; just don't call fc.position.
For byteLen, try byteLen = (bitLen >> 3) + ((bitLen & 0x07) != 0 ? 1 : 0). bitLen >> 3 divides by 8 (2^3) with truncation, so 1..7 bits give zero, 8..15 give one, and so on. If the number of bits is not a multiple of 8, you thus need one more byte: ((bitLen & 0x07) != 0 ? 1 : 0) is 1 in that situation and 0 otherwise.
Bear in mind that the bits will be padded at the end if you don't have a multiple of 8. E.g., reading 12 bits will take two full bytes from the stream.
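As a hedged sketch of the whole round trip: since Java 7, BitSet itself offers toByteArray() and BitSet.valueOf(byte[]), so the subclass mentioned above may not be needed. The class name BitSetRecords and the record layout (id, bit length, padded bits) are assumptions taken from the question, and the example works on a ByteBuffer only; writing it out with FileChannel.write is left out.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

class BitSetRecords {
    private static final int INT_BYTES = Integer.SIZE / Byte.SIZE;

    // Rounds a bit count up to whole bytes: 1..8 bits -> 1 byte, 9..16 bits -> 2 bytes.
    static int byteLength(int bitLen) {
        return (bitLen >> 3) + ((bitLen & 0x07) != 0 ? 1 : 0);
    }

    // Assumed record layout: int id, int bit length, then the bits padded to whole bytes.
    static ByteBuffer encode(int id, BitSet bits) {
        byte[] payload = bits.toByteArray(); // (bits.length() + 7) / 8 bytes
        ByteBuffer bf = ByteBuffer.allocate(2 * INT_BYTES + payload.length);
        bf.putInt(id).putInt(bits.length()).put(payload);
        bf.flip(); // ready to be written, e.g. with FileChannel.write(bf)
        return bf;
    }

    // Reads one record and leaves the buffer positioned at the start of the next one.
    static BitSet decode(ByteBuffer bf) {
        bf.getInt();                     // id, ignored in this sketch
        int bitLen = bf.getInt();
        byte[] payload = new byte[byteLength(bitLen)];
        bf.get(payload);                 // consumes exactly this record's bytes
        return BitSet.valueOf(payload);
    }
}
```

Because the stored length decides how many bytes are consumed, reading the first record cannot run into the second one. Note that BitSet.length() is the index of the highest set bit plus one, so trailing zero bits are not recorded by this layout.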

Parse C byte array into Java ByteBuffer.

I'm parsing a byte array which contains variables of different types. I receive this array from an HID device connected to my phone; the array was produced by a C programmer. I'm trying to parse it using the ByteBuffer class:
byte[] buffer = new byte[64];
if(connection.bulkTransfer(endpoint, buffer, 64, 1000) >= 0)
{
ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
char mId = byteBuffer.getChar();
short rId = byteBuffer.getShort();
// ............................
}
But the values of these variables are not correct. Can anyone please tell me what I'm doing wrong?
Some systems use little-endian byte order and others use big-endian.
Java's DataInputStream always reads big-endian, and a ByteBuffer is big-endian by default.
If the C programmer wrote the byte array in little-endian, you could use a DataInputStream based on an Apache LittleEndianInputStream:
LittleEndianInputStream leis = new LittleEndianInputStream(is);
DataInputStream dis = new DataInputStream(leis);
int i1 = dis.readInt();
short s2 = dis.readShort();
If you and your colleague define a binary interface (file or byte array), you should always enforce a specific byte order (either little or big endian).
If byte order (little vs big endian) is the issue, you can set the byte order for the ByteBuffer to native without changing all of the program:
ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
byteBuffer.order(ByteOrder.nativeOrder()); // Set native byte order
char mId = byteBuffer.getChar();
short rId = byteBuffer.getShort();
On the other hand, if you find ByteBuffer objects more convenient than byte arrays, tell the C programmer to return you a direct byte buffer instead of an array: easier for all parties and probably more efficient.
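A minimal sketch of parsing with an explicit byte order, under stated assumptions: the class name HidPacketParser and the packet layout (a C uint16_t followed by a C int16_t, both little-endian) are hypothetical, since the real layout is not given in the question. Keep in mind that a C char is one byte while a Java char is two, so a field declared as char on the C side would be read with get() rather than getChar().

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class HidPacketParser {
    // Hypothetical layout: uint16_t mId, then int16_t rId, both little-endian.
    static int[] parse(byte[] packet) {
        ByteBuffer bb = ByteBuffer.wrap(packet).order(ByteOrder.LITTLE_ENDIAN);
        int mId = bb.getShort() & 0xFFFF; // mask so the unsigned value fits in an int
        int rId = bb.getShort();          // signed 16-bit value, sign-extended to int
        return new int[] {mId, rId};
    }
}
```

Naming the order explicitly (rather than relying on nativeOrder()) keeps the parser independent of the CPU it happens to run on.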

Reading serial information in Java

I'm working on porting an Android app that has already been developed in Python. In the Python program, there is a line that I'm trying to fully understand:
self.comfd = Serial(...) # from the pySerial API
....
self.buffer = list(struct.unpack('192H', self.comfd.read(384)))
From what I understand, self.comfd.read(384) is reading 384 bytes, and the unpack('192H' is unpacking 192 unsigned shorts from that data. Is this correct?
Now in Java, I've been able to read the buffer using
SerialPort device = new SerialPort(file, baud, flags);
InputStream in = device.getInputStream();
My question is, now that I have the input stream, how do I create the unsigned shorts like the Python program is doing?
What I've tried (not producing correct values):
byte[] buffer = new byte[384];
in.read(buffer);
ByteBuffer bb = ByteBuffer.allocate(2);
for (int i = 0; i < buffer.length / 2; i++) {
bb.put(buffer[i]);
bb.put(buffer[i + 1]);
short val = bb.getShort(0);
System.out.println(val);
bb.clear();
}
What am I doing wrong? Thanks for any help!
edit: I incorporated Jason C's answer and also I was looping incorrectly. By changing it to
for (int i = 0; i < buffer.length; i=i+2) that fixed my problem.
You could use a char (it's a 16-bit unsigned value in Java), e.g.:
byte[] buffer = ...;
ByteBuffer bb = ByteBuffer.wrap(buffer); // don't need to put()
int val = (int)bb.getChar(0);
Use bb.order() to set big- vs. little-endian.
You can also just pack the 2 bytes into an int (assuming little-endian) without using a ByteBuffer. Byte is signed in Java, so you have to convert each byte to an unsigned value before shifting, which you can do by masking it and storing the result in an int (or anything large enough to hold 0-255):
int b0 = buffer[0] & 255; // trick converts to unsigned
int b1 = buffer[1] & 255;
int val = b0 | (b1 << 8);
// or just put it all inline:
int inlineVal = (buffer[0] & 255) | ((buffer[1] & 255) << 8);
For big-endian data just swap b0 and b1.
Hope that helps.
Java has no unsigned numbers (char is 16bit unsigned but it's not a number and math with char will always result in implicit casts to int)
If you read 2 bytes of unsigned data into a short and want to see values in range from 0-65535 (instead of -32768 - 32767) you'll have to use a type that can have values in that range.
For a 16-bit short, the next larger type is the 32-bit int. The conversion that does the trick is
short signed = ...;
int unsigned = signed & 0xFFFF;
Assuming signed has a value of 0xFFFF this is what happens:
short signed = -1; // FFFF on byte level
The expression signed & 0xFFFF contains a short and an int. 0xFFFF is a literal integer type number which when found in Java source is considered int. You could make it long by changing it to 0xFFFFL (you would need that if you want to convert unsigned int to long).
Since the & operator needs both sides in a common type Java will silently convert the smaller one.
int stillSigned = (int) signed; // hidden step
It still has the same value of -1, since that is what the signed interpretation was before, but at the byte level it has been sign-extended to 0xFFFFFFFF.
Now the bit-manipulation is applied to remove all the added FFs
int unsigned = stillSigned & 0xFFFF;
and you end up with 0x0000FFFF on byte level and can finally see the value of 65535.
Since you happen to have 16bit values you can use char and simply cast it to int.
char value = ...;
int unsigned = value;
But above approach works for any unsigned conversion: byteValue & 0xFF, shortValue & 0xFFFF, intValue & 0xFFFFFFFFL
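The three masks can be put side by side in a small sketch; the class and method names (UnsignedWidening, toUnsigned) are made up for illustration.

```java
class UnsignedWidening {
    // Each method widens to the next larger type, then masks off the sign-extension bits.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    static int toUnsigned(short s) {
        return s & 0xFFFF;
    }

    static long toUnsigned(int i) {
        return i & 0xFFFFFFFFL; // the L suffix makes the mask a long
    }
}
```

From Java 8 on, the standard library provides the same conversions as Byte.toUnsignedInt, Short.toUnsignedInt and Integer.toUnsignedLong.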
The next thing to change is reading through a plain InputStream like this:
SerialPort device = new SerialPort(file, baud, flags);
InputStream in = device.getInputStream();
byte[] buffer = new byte[384];
in.read(buffer);
The reason is that InputStream#read(byte[]) is not guaranteed to read all the bytes you want in your buffer. It returns the number of bytes it has read, or -1 if the stream is done. Manually writing code that ensures you have a filled buffer is nasty, but there is a simple solution: DataInputStream
SerialPort device = new SerialPort(file, baud, flags);
DataInputStream in = new DataInputStream(device.getInputStream());
byte[] buffer = new byte[384];
in.readFully(buffer);
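The difference can be demonstrated without serial hardware. In this sketch the trickle() stream is a hypothetical stand-in for a port that delivers data in small chunks; the class and method names are made up, and IOException is wrapped only to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ReadFullyDemo {
    // Hypothetical stream that hands out at most 3 bytes per read call.
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    // A single read call may stop early; returns how many bytes were delivered.
    static int singleRead(byte[] data, byte[] target) {
        try {
            return trickle(data).read(target, 0, target.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // readFully keeps reading until target is full, or throws EOFException.
    static void fullRead(byte[] data, byte[] target) {
        try {
            new DataInputStream(trickle(data)).readFully(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a 10-byte source and a 10-byte target, singleRead fills only the first 3 bytes, while fullRead fills the whole array.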
DataInputStream has very nice functionality that you could use:
SerialPort device = new SerialPort(file, baud, flags);
DataInputStream in = new DataInputStream(device.getInputStream());
int unsignedShort = in.readUnsignedShort();
Another way to get different numbers out of byte[] data is to use ByteBuffer since that provides methods like .getShort()
SerialPort device = new SerialPort(file, baud, flags);
DataInputStream in = new DataInputStream(device.getInputStream());
byte[] buffer = new byte[384];
in.readFully(buffer);
ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
while (byteBuffer.hasRemaining()) {
int unsigned = byteBuffer.getChar();
System.out.println(unsigned);
}
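A rough counterpart of the Python struct.unpack call might look like the sketch below. The class name UnsignedShortUnpacker is made up. Python's 'H' format without a byte-order prefix uses the machine's native order, so which ByteOrder matches the sending device is an assumption you need to check.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class UnsignedShortUnpacker {
    // Unpacks data.length / 2 unsigned 16-bit values in the given byte order.
    static int[] unpack(byte[] data, ByteOrder order) {
        ByteBuffer bb = ByteBuffer.wrap(data).order(order);
        int[] values = new int[data.length / 2];
        for (int i = 0; i < values.length; i++) {
            values[i] = bb.getChar(); // char is unsigned, so it widens to 0..65535
        }
        return values;
    }
}
```

For the 384-byte buffer from the question, unpack(buffer, ByteOrder.LITTLE_ENDIAN) would return 192 values.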

Get single bytes from multi-byte variable in java

How can I split a variable into single bytes in java? I have for example following snippet in C++:
unsigned long someVar;
byte *p = (byte*)(void*) &someVar; // byte being typedef unsigned char (from 0 to 255)
byte *bytes = new byte[sizeof(someVar)];
for(byte i = 0;i<sizeof(someVar);i++)
{
bytes[i] = *p++;
}
.... //do something with bytes
I want to accomplish the same under java, but I can't seem to find an obvious workaround.
There are two ways to do it with the ByteBuffer class. One is to create a new byte array dynamically.
long value = 123;
byte[] bytes = ByteBuffer.allocate(8).putLong(value).array();
Another is to write to an existing array.
long value = 123;
byte[] bytes = new byte[8];
ByteBuffer.wrap(bytes).putLong(value);
// bytes now contains the byte representation of 123.
If you use Guava, there is a convenience Longs.toByteArray. It is simply a wrapper for John's ByteBuffer answer above, but if you already use Guava, it's slightly "nicer" to read.
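One difference from the C++ snippet is worth a sketch: the pointer loop copies bytes in the machine's native order (little-endian on most desktop CPUs), whereas ByteBuffer defaults to big-endian. The class LongBytes and its methods are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class LongBytes {
    // Splits a long into 8 bytes in the requested byte order.
    static byte[] toBytes(long value, ByteOrder order) {
        return ByteBuffer.allocate(8).order(order).putLong(value).array();
    }

    // Same result as toBytes(value, ByteOrder.BIG_ENDIAN), using shifts only.
    static byte[] toBytesManual(long value) {
        byte[] bytes = new byte[8];
        for (int i = 7; i >= 0; i--) {
            bytes[i] = (byte) value; // the cast keeps the lowest 8 bits
            value >>>= 8;            // move the next byte into the lowest position
        }
        return bytes;
    }
}
```

Passing ByteOrder.nativeOrder() would come closest to what the C++ loop produces on the same machine.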

Fast ByteBuffer to CharBuffer or char[]

What is the fastest method to convert a java.nio.ByteBuffer a into a (newly created) CharBuffer b or char[] b?
It is important that a[i] == b[i]. That is, a[i] and a[i+1] should not be combined into one value b[j], as getChar(i) would do; instead, each byte should be "spread" into its own char.
byte a[] = { 1,2,3, 125,126,127, -128,-127,-126 }; // each a byte (which are signed)
char b[] = { 1,2,3, 125,126,127, 128, 129, 130 }; // each a char (which are unsigned)
Note that byte:-128 has the same (lower 8) bits as char:128. Therefore I assume the "best" interpretation would be as I noted it above, because the bits are the same.
After that I also need the vice versa translation: The most efficient way to get a char[] or java.nio.CharBuffer back into a java.nio.ByteBuffer.
So, what you want is to convert using the encoding ISO-8859-1.
I don't claim anything about efficiency, but at least it is quite short to write:
CharBuffer result = Charset.forName("ISO-8859-1").decode(byteBuffer);
The other direction would be:
ByteBuffer result = Charset.forName("ISO-8859-1").encode(charBuffer);
Please measure this against other solutions. (To be fair, the Charset.forName part should not be included, and should also be done only once, not for each buffer again.)
From Java 7 on there also is the StandardCharsets class with pre-instantiated Charset instances, so you can use
CharBuffer result = StandardCharsets.ISO_8859_1.decode(byteBuffer);
and
ByteBuffer result = StandardCharsets.ISO_8859_1.encode(charBuffer);
instead. (These lines do the same as the ones before, just the lookup is easier and there is no risk to mistype the names, and no need to catch the impossible exceptions.)
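A small round-trip sketch using the sample values from the question; the class name Latin1RoundTrip and the array-based helpers are made up for illustration, and no claim about speed is intended.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

class Latin1RoundTrip {
    // Decodes each byte into the char with the same lower 8 bits (0..255).
    static char[] toChars(byte[] bytes) {
        CharBuffer cb = StandardCharsets.ISO_8859_1.decode(ByteBuffer.wrap(bytes));
        char[] chars = new char[cb.remaining()];
        cb.get(chars);
        return chars;
    }

    // Encodes chars in the range 0..255 back into single bytes.
    static byte[] toBytes(char[] chars) {
        ByteBuffer bb = StandardCharsets.ISO_8859_1.encode(CharBuffer.wrap(chars));
        byte[] bytes = new byte[bb.remaining()];
        bb.get(bytes);
        return bytes;
    }
}
```

The reverse direction is only lossless for chars up to 255; anything larger cannot be represented in ISO-8859-1 and is replaced by the encoder.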
I agree with @Ishtar's suggestion to avoid converting to a new structure at all and to convert only as you need it.
However, if you have a heap ByteBuffer, you can do this:
ByteBuffer bb = ...
byte[] array = bb.array();
char[] chars = new char[bb.remaining()];
for (int i = 0; i < chars.length; i++)
chars[i] = (char) (array[bb.arrayOffset() + bb.position() + i] & 0xFF);
Aside from deferring creation of CharBuffer, you may be able to get by without one.
If code that is using data as characters does not strictly need a CharBuffer or char[], just do simple on-the-fly conversion; use ByteBuffer.get() (relative or absolute), convert to char (note: as pointed out, you MUST unfortunately explicitly mask things; otherwise values 128-255 will be sign-extended to incorrect values, 0xFF80 - 0xFFFF; not needed for 7-bit ASCII), and use that.
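The on-the-fly conversion described above fits in one line; the helper below (class ByteChars, method charAt) is a hypothetical name for it.

```java
import java.nio.ByteBuffer;

class ByteChars {
    // Absolute read of one byte as a char in 0..255; does not move the buffer's position.
    static char charAt(ByteBuffer bb, int index) {
        return (char) (bb.get(index) & 0xFF); // the mask undoes sign extension
    }
}
```

Without the mask, the byte -128 would become the char 0xFF80 instead of 0x0080.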
