I am writing some code to handle a stream of binary data. It is received in chunks represented by byte arrays. Combined, the byte arrays represent a sequential stream of messages, each of which ends with the same constant terminator value (0xff in my case). However, the terminator value can appear at any point in any given chunk of data: a single chunk can contain part of a message, multiple messages, or anything in between.
Here is a small sampling of what data handled by this might look like:
[0x00, 0x0a, 0xff, 0x01]
[0x01, 0x01]
[0xff, 0x01, 0xff]
This data should be converted into these messages:
[0x00, 0x0a, 0xff]
[0x01, 0x01, 0x01, 0xff]
[0x01, 0xff]
I have written a small class to handle this. It has a method to add data in byte array form, which is appended to a buffer. When the terminator value is encountered, the buffer is cleared and the complete message is placed in a message queue, which can be accessed using hasNext() and next() methods (similar to an iterator).
This solution works fine, but as I finished it, I realized that there might already be some stable, performant and tested code in an established library that I could be using instead.
So my question is: do you know of a utility library that has such a class, or is there something in the standard Java 6 library that can do this already?
I don't think you need a framework; a custom parser is simple enough.
InputStream in = new ByteArrayInputStream(new byte[]{
        0x00, 0x0a, (byte) 0xff,
        0x01, 0x01, 0x01, (byte) 0xff,
        0x01, (byte) 0xff});
ByteArrayOutputStream baos = new ByteArrayOutputStream();
for (int b; (b = in.read()) >= 0; ) {
    baos.write(b);
    if (b == 0xff) { // read() returns an int in 0-255, so this comparison is effectively unsigned
        byte[] bytes = baos.toByteArray();
        System.out.println(Arrays.toString(bytes));
        baos = new ByteArrayOutputStream();
    }
}
This prints the following (note that (byte) 0xFF prints as -1, because Java bytes are signed):
[0, 10, -1]
[1, 1, 1, -1]
[1, -1]
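For the chunk-by-chunk case in the question, the same loop body can be wrapped in a small accumulator class. Here is a minimal sketch of that idea (the class and method names are illustrative, mirroring the hasNext()/next() API described above; it compiles on Java 6):

import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.Queue;

class MessageAccumulator {
    private static final int TERMINATOR = 0xff;
    private ByteArrayOutputStream current = new ByteArrayOutputStream();
    private final Queue<byte[]> messages = new ArrayDeque<byte[]>();

    // Feed one chunk; complete messages are moved to the queue.
    void add(byte[] chunk) {
        for (byte b : chunk) {
            current.write(b);
            if ((b & 0xff) == TERMINATOR) { // mask to compare the signed byte as unsigned
                messages.add(current.toByteArray());
                current.reset();
            }
        }
    }

    boolean hasNext() { return !messages.isEmpty(); }

    byte[] next() { return messages.remove(); }
}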
I am reading a file which stores binary data.
In Python I can decode the file easily:
>>> s = '0000000000A0A240'
>>> s.decode('hex')
'\x00\x00\x00\x00\x00\xa0\xa2@'
>>> import struct
>>> struct.unpack('d', s.decode('hex'))[0]
2384.0
Now I want to do the same decoding in Java, do we have anything similar?
Since those bytes are in little-endian order (being C code on an Intel processor), use ByteBuffer to help flip the bytes:
String s = "0000000000A0A240";
ByteBuffer buf = ByteBuffer.allocate(8);
buf.putLong(Long.parseUnsignedLong(s, 16)); // parse the 16 hex digits as one big-endian long
buf.flip(); // note: before Java 9, flip() returns Buffer, so these calls cannot be chained fluently
buf.order(ByteOrder.LITTLE_ENDIAN);
double d = buf.getDouble();
System.out.println(d); // prints 2384.0
Here I'm using Long.parseUnsignedLong(s, 16) as a quick way to do the decode('hex') for 8 bytes.
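Note that Long.parseUnsignedLong requires Java 8 and only covers values up to 8 bytes (16 hex digits). For longer hex strings, a small helper can stand in for decode('hex'); a minimal sketch (hexToBytes is an illustrative name, not a library method):

// Convert a hex string of any even length into a byte array.
static byte[] hexToBytes(String s) {
    byte[] out = new byte[s.length() / 2];
    for (int i = 0; i < out.length; i++) {
        out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
    }
    return out;
}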
If data is already a byte array, do this:
byte[] b = { 0x00, 0x00, 0x00, 0x00, 0x00, (byte) 0xA0, (byte) 0xA2, 0x40 };
double d = ByteBuffer.wrap(b)
.order(ByteOrder.LITTLE_ENDIAN)
.getDouble();
System.out.println(d); // prints 2384.0
Imports for the above are:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
I am unit testing a Netty pipeline using the frame-based decoder. It looks like the framing is incorrect if I use a buffer size that is smaller than the largest frame. I am testing with a file that contains two messages. The length field is the second word and includes the length of the entire message, including the length field and the word before it.
new LengthFieldBasedFrameDecoder(65536, 4, 4, -8, 0)
I am reading a file with various block sizes. The size of the first message is 348 bytes, the second is 456 bytes. If a block size of 512, 3456, or larger is used, both messages are read and correctly framed to the next handler, which for diagnostic purposes prints out the contents of the buffer it received as a hexadecimal string. If a smaller block size is used, framing errors occur. The code used to read the file and write it to the channel is shown below.
public class NCCTBinAToCSV {
    private static String inputFileName = "/tmp/combined.bin";
    private static final int BLOCKSIZE = 456;

    public static void main(String[] args) throws Exception {
        byte[] bytes = new byte[BLOCKSIZE];
        EmbeddedChannel channel = new EmbeddedChannel(
                new LengthFieldBasedFrameDecoder(65536, 4, 4, -8, 0),
                new NCCTMessageDecoder(),
                new StringOutputHandler());
        FileInputStream fis = new FileInputStream(new File(inputFileName));
        int bytesRead = 0;
        while ((bytesRead = fis.read(bytes)) != -1) {
            ByteBuf buf = Unpooled.wrappedBuffer(bytes, 0, bytesRead);
            channel.writeInbound(buf);
        }
        channel.flush();
    }
}
Output from a successful run with a block size of 356 bytes is shown below (with the body of the messages truncated for brevity):
LOG:DEBUG 2017-04-24 04:19:24,675[main](netty.NCCTMessageDecoder) - com.ticomgeo.mtr.ncct.netty.NCCTMessageDecoder.decode(NCCTMessageDecoder.java:21) ]received 348 bytes
Frame Start========================================
(byte) 0xbb, (byte) 0x55, (byte) 0x05, (byte) 0x16,
(byte) 0x00, (byte) 0x00, (byte) 0x01, (byte) 0x5c,
(byte) 0x01, (byte) 0x01, (byte) 0x02, (byte) 0x02,
(byte) 0x05, (byte) 0x00, (byte) 0x00, (byte) 0x00,
(byte) 0x50, (byte) 0x3a, (byte) 0xc9, (byte) 0x17,
....
Frame End========================================
Frame Start========================================
(byte) 0xbb, (byte) 0x55, (byte) 0x05, (byte) 0x1c,
(byte) 0x00, (byte) 0x00, (byte) 0x01, (byte) 0xc8,
(byte) 0x01, (byte) 0x01, (byte) 0x02, (byte) 0x02,
(byte) 0x05, (byte) 0x00, (byte) 0x00, (byte) 0x00,
(byte) 0x04, (byte) 0x02, (byte) 0x00, (byte) 0x01,
If I change the block size to 256, the wrong bytes seem to be read as the length field:
Exception in thread "main" io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 65536: 4294967040 - discarded
at io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:499)
at io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:477)
at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:403)
TL;DR: Your problem is caused by Netty reusing the passed-in ByteBuf while you overwrite its contents.
LengthFieldBasedFrameDecoder is designed to reuse the passed-in ByteBuf: since its reference count is 1, it would be wasteful to let the object become garbage when it can be reused. The problem comes from the fact that you are changing the internals of the passed-in ByteBuf, and therefore changing the frame on the fly. Instead of wrappedBuffer, which uses your passed-in array as backing storage, you should use copiedBuffer, which makes a proper copy, so the internals of LengthFieldBasedFrameDecoder are free to work with it.
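Applied to the read loop in the question, that is a one-line change; a minimal sketch, assuming the rest of the test harness stays as posted:

// Copy the chunk so the decoder owns its own bytes; reusing the 'bytes' array is then safe.
ByteBuf buf = Unpooled.copiedBuffer(bytes, 0, bytesRead);
channel.writeInbound(buf);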
I'm trying to connect Android (Java) with Linux (Qt C++) using a socket, and then transfer the length of a message in bytes. For converting an int into an unsigned char array on the C++ side I use:
QByteArray IntToArray(qint32 source)
{
    QByteArray tmp;
    QDataStream data(&tmp, QIODevice::ReadWrite);
    data << source;
    return tmp;
}
But I don't know how I can do the same conversion on the Java side, because Java doesn't have unsigned types. I tried some examples but always got different results. So, I need a Java method which returns this for source = 17:
0x00, 0x00, 0x00, 0x11
I understand that it's a very simple question, but I'm new to Java, so it's not clear to me.
UPD:
Java:
PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
out.print(ByteBuffer.allocate(4).putInt(17).array());
Qt C++:
QByteArray* buffer = new QByteArray();
buffer->append(socket->readAll());
Output:
buffer = 0x5b, 0x42, 0x40, 0x61, 0x39, 0x65, 0x31,
0x62, 0x66, 0x35.
UPD2:
Java:
out.print(toBytes(17));
...
byte[] toBytes(int i)
{
    byte[] result = new byte[4];
    result[0] = (byte) (i >> 24);
    result[1] = (byte) (i >> 16);
    result[2] = (byte) (i >> 8);
    result[3] = (byte) (i /*>> 0*/);
    return result;
}
Qt C++: same
Output:
buffer = 0x5b, 0x42, 0x40, 0x63, 0x38, 0x61, 0x39,
0x33, 0x38, 0x33.
UPD3:
Qt C++:
QByteArray buffer = socket->readAll();
for(int i = 0; i < buffer.length(); ++i){
    std::cout << buffer[i];
}
std::cout << std::endl;
Output:
[B@938a15c
First of all, don't use PrintWriter.
Here's something to remember about Java I/O:
Streams are for bytes, Readers/Writers are for characters.
In Java, a character is not a byte. Characters have an encoding associated with them, like UTF-8. Bytes don't.
When you wrap a Stream in a Reader or a Writer, you are taking a byte stream and imposing a character encoding on that byte stream. You don't want that here.
Just try this:
OutputStream out = socket.getOutputStream();
out.write(toBytes(17));
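As a side note, java.io.DataOutputStream can replace the toBytes() helper from your UPD2 entirely, since writeInt() emits the four bytes of an int in big-endian order. A minimal sketch:

DataOutputStream dataOut = new DataOutputStream(socket.getOutputStream());
dataOut.writeInt(17); // writes 0x00, 0x00, 0x00, 0x11
dataOut.flush();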
I'm currently trying to write a couple of bytes to a specific block. My read commands work fine and I am able to read any block of my tag using the code below:
command = new byte[]{
        (byte) 0x02, // Flags
        (byte) 0x23, // Command: Read multiple blocks
        (byte) 0x09, // First block (offset)
        (byte) 0x03  // Number of blocks (max read size: 32 blocks = 0x1F)
};
byte[] data = nfcvTag.transceive(command);
When I try to write with the code below, my app crashes.
Write = new byte[]{
        (byte) 0x02, // Flags
        (byte) 0x21, // Command: Write 1 block
        (byte) 0x5A, // First block (offset)
        (byte) 0x41  // Data
};
nfcvTag.transceive(Write);
I'm doing this in an AsyncTask and getting the java.lang.RuntimeException: Can't create handler inside thread that has not called Looper.prepare() exception.
Any tips? The tag is an STMicroelectronics M24LR04E-R.
Figured it out. I was only writing 8 bits of data, while the tag has 32 bits per block. Adding three 0x00 bytes made the write successful.
Write = new byte[]{
        (byte) 0x02, // Flags
        (byte) 0x21, // Command: Write 1 block
        (byte) 0x5A, // First block (offset)
        (byte) 0x41, // Data (byte 1 of 4)
        (byte) 0x00,
        (byte) 0x00,
        (byte) 0x00
};
nfcvTag.transceive(Write);
With the below code snippet given in this link,
byte[] bytes = {0x00, 0x48, 0x00, 0x69, 0x00, 0x2C,
0x60, (byte)0xA8, 0x59, 0x7D, 0x00, 0x21}; // "Hi,您好!"
Charset charset = Charset.forName("UTF-8");
// Encode from UCS-2 to UTF-8
// Create a ByteBuffer by wrapping a byte array
ByteBuffer bb = ByteBuffer.wrap(bytes);
// Create a CharBuffer from a view of this ByteBuffer
CharBuffer cb = bb.asCharBuffer();
The documentation for the wrap() method says "The new buffer will be backed by the given byte array"; here we do not have any encoding from bytes to another format, it just places the byte array in a buffer.
Can you please help me understand what exactly we are doing when we say bb.asCharBuffer() in the above code? cb is similar to an array of characters. Because char is UTF-16 in Java, using the asCharBuffer() method, are we considering every 2 bytes in bb as a char? Is this the right approach? If not, please help me with the right approach.
Edit:
I tried this program as recommended by Meisch below,
byte[] bytes = {0x00, 0x48, 0x00, 0x69, 0x00, 0x2C,
0x60, (byte)0xA8, 0x59, 0x7D, 0x00, 0x21}; // "Hi,您好!"
Charset charset = Charset.forName("UTF-8");
CharsetDecoder decoder = charset.newDecoder();
ByteBuffer bb = ByteBuffer.wrap(bytes);
CharBuffer cb = decoder.decode(bb);
which gives this exception:
Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(Unknown Source)
at java.nio.charset.CharsetDecoder.decode(Unknown Source)
at TestCharSet.main(TestCharSet.java:16)
Please help me, I am stuck here!
Note: I am using Java 1.6.
You ask: “Because char is UTF-16 in Java, using asCharBuffer() method, are we considering every 2 bytes in bb as char?”
The answer to that question is yes. Your understanding is correct.
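A quick check (assuming the bytes array from your question is in scope):

CharBuffer cb = ByteBuffer.wrap(bytes).asCharBuffer();
System.out.println(cb); // prints: Hi,您好! (each char built from two big-endian bytes)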
Your next question is: “Is this the right approach?”
If you are just trying to demonstrate how the ByteBuffer, CharBuffer and Charset classes work, it's acceptable.
However, when you are coding an application, you will never write code like that. To begin with, there is no need for a byte array; you can represent the characters as a literal String:
String s = "Hi,\u60a8\u597d!";
If you want to convert the string to UTF-8 bytes, you can simply do this:
byte[] encodedBytes = s.getBytes(StandardCharsets.UTF_8);
If you're still using Java 6, you would do this instead:
byte[] encodedBytes = s.getBytes("UTF-8");
Update: Your byte array represents chars in the UTF-16BE (big-endian) encoding. Specifically, your array has exactly two bytes per character. That is not a valid UTF-8 encoded byte sequence, which is why you're getting the MalformedInputException.
When characters are encoded as UTF-8 bytes, each character is represented with 1 to 4 bytes. For your second code fragment to work, the array must be:
byte[] bytes = {
0x48, 0x69, 0x2c, // ASCII chars are 1 byte each
(byte) 0xe6, (byte) 0x82, (byte) 0xa8, // U+60A8
(byte) 0xe5, (byte) 0xa5, (byte) 0xbd, // U+597D
0x21
};
When converting from bytes to chars, my earlier statement still applies: You don't need ByteBuffer or CharBuffer or Charset or CharsetDecoder. You can use those classes, but usually it's more succinct to just create a String:
String s = new String(bytes, "UTF-8");
If you want a CharBuffer, just wrap the String:
CharBuffer cb = CharBuffer.wrap(s);
You may be wondering when it is appropriate to use a CharsetDecoder directly. You would do that if the bytes are coming from a source which is not under your control, and you have good reason to believe it may not contain properly UTF-8 encoded bytes. Using an explicit CharsetDecoder allows you to customize how invalid bytes will be handled.
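For example, a decoder can be configured to substitute U+FFFD for invalid input instead of throwing. A minimal sketch (note that decode() still declares the checked CharacterCodingException):

CharsetDecoder lenient = Charset.forName("UTF-8").newDecoder()
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);
CharBuffer decoded = lenient.decode(ByteBuffer.wrap(bytes)); // malformed sequences become U+FFFD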
I just had a look at the sources; it boils down to two bytes from the byte buffer being combined into one character. The order in which the two bytes are used depends on the endianness; the default is big-endian.
Another approach using the nio classes, besides what I wrote in the comments, would be to use the CharsetDecoder.decode() method.
Charset charset = Charset.forName("UTF-16BE"); // the question's array is two big-endian bytes per char, not UTF-8
CharsetDecoder decoder = charset.newDecoder();
ByteBuffer bb = ByteBuffer.wrap(bytes);
CharBuffer cb = decoder.decode(bb);