Parsing a TCP packet - java

I'm having trouble parsing a TCP packet from a socket...
In my protocol, my messages are like this:
'A''B''C''D''E'.........0x2300
'A''B''C''D''E' --> start message pattern
0x2300 --> two bytes end message
But due to Nagle's algorithm, sometimes my messages are concatenated like:
'A''B''C''D''E'.........0x2300'A''B''C''D''E'.........0x2300'A''B''C''D''E'.........0x2300
I already tried setting setNoDelay() to true, but the problem persists.
I have the message in a byte[].
How could I split my messages to be parsed individually?
PS: For now I am able to get the first message but the others are lost...

Just loop through your received data and check for end markers. When you find one, set the start index to the beginning of the next package and continue searching. Something like this:
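int packageStart = 0;
for (int i = 0; i < data.length - 1; i++) {
    if (data[i] == 0x23 && data[i + 1] == 0x00) {
        // Found end of package: i is the 0x23 byte, i + 1 is the 0x00 byte
        i++;
        // Process bytes packageStart..i (end marker included)
        processPackage(data, packageStart, i);
        // The next package starts right after the end marker
        packageStart = i + 1;
    }
}
// At this point: bytes from packageStart to data.length are unprocessed...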
As noted, there might be some leftover data (if the data did not end with the end marker). You will want to keep it so you can prepend it to the next batch of received data. That prevents data loss when a message is chopped up across TCP/IP packets.
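As a rough sketch of the "keep the leftover and prepend it" idea, the class below buffers incomplete data between reads. EndMarkerFramer and feed are hypothetical names, not part of the original answer, and the sketch assumes the byte pair 0x23 0x00 never occurs inside a message body.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits a byte stream on the 0x23 0x00 end marker. */
class EndMarkerFramer {
    // Bytes received so far that do not yet form a complete message.
    private byte[] pending = new byte[0];

    /** Feed one chunk read from the socket; returns every message it completes. */
    List<byte[]> feed(byte[] chunk, int length) {
        // Prepend the leftover from the previous call to the new data.
        byte[] data = Arrays.copyOf(pending, pending.length + length);
        System.arraycopy(chunk, 0, data, pending.length, length);

        List<byte[]> messages = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length - 1; i++) {
            if (data[i] == 0x23 && data[i + 1] == 0x00) {
                // Message spans start..i+1 inclusive (end marker included).
                messages.add(Arrays.copyOfRange(data, start, i + 2));
                start = i + 2;
                i++; // skip the second marker byte
            }
        }
        // Keep the unfinished tail for the next call.
        pending = Arrays.copyOfRange(data, start, data.length);
        return messages;
    }
}
```

In a read loop this would be called as framer.feed(buffer, bytesRead) after each socket read. Because the tail is kept, an end marker that is split across two reads is still found on the second call.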

You have to think of it as parsing a continuous stream of bytes. Your code needs to identify the start and end of a message.
Due to the way packets get sent, a single read may give you a complete message, multiple messages, a partial message, etc. Your code needs to identify when a message has begun and keep reading until it has found the end of the message or, in some instances, until it has read more bytes than your maximum message size and needs to resync.
I've seen some comm managers drop and reestablish the connection (start over) and others throw away data until they can get back in sync. Then you get into the fun of whether you need guaranteed delivery and retransmission.
The best protocols are the simple ones. Create a message header which contains, say, an SOH byte, a two-byte message length (or whatever is appropriate), a two-byte message type and a one-byte message subtype. You can also end the message with any number of bytes. Look at an ASCII chart: there are a number of control codes in the hex range 00-1F that have been pretty standard since the terminal days.
No point in reinventing the wheel here. Makes it easier, because you know how long this message should be instead of looking for patterns in the data.
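One way to picture the header described above is the following sketch. The exact layout (SOH, two-byte payload length, two-byte type, one-byte subtype, then the payload) and the names HeaderFraming, writeFrame and readFrame are assumptions made for illustration, not a standard.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical frame layout: SOH, 2-byte length, 2-byte type, 1-byte subtype, payload. */
class HeaderFraming {
    static final int SOH = 0x01; // ASCII "start of heading" control code

    static void writeFrame(DataOutputStream out, int type, int subtype, byte[] payload)
            throws IOException {
        out.writeByte(SOH);
        out.writeShort(payload.length); // payload length, at most 65535 bytes
        out.writeShort(type);
        out.writeByte(subtype);
        out.write(payload);
        out.flush();
    }

    /** Reads one frame and returns its payload; blocks until the whole frame has arrived. */
    static byte[] readFrame(DataInputStream in) throws IOException {
        int first = in.readUnsignedByte();
        if (first != SOH) {
            throw new IOException("Out of sync: expected SOH, got " + first);
        }
        int length = in.readUnsignedShort();
        int type = in.readUnsignedShort();   // not used further in this sketch
        int subtype = in.readUnsignedByte(); // not used further in this sketch
        byte[] payload = new byte[length];
        in.readFully(payload); // keeps reading until exactly 'length' bytes have arrived
        return payload;
    }
}
```

Because the length is known up front, the reader never has to scan the payload for a pattern, and concatenated or split messages are handled by readFully.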

It sounds like you need to treat it like a Byte Stream and buffer the packets until you see your EOF code 0x2300.

Related

HTTP Webserver ignoring last line of POST request [duplicate]

Right now, I'm trying to write a GUI based Java tic-tac-toe game that functions over a network connection. It essentially works at this point, however I have an intermittent error in which several chars sent over the network connection are lost during gameplay. One case looked like this, when println statements were added to message sends/reads:
Player 1:
Just sent ROW 14 COLUMN 11 GAMEOVER true
Player 2:
Just received ROW 14 COLUMN 11 GAMEOV
I'm pretty sure the error is happening when I read over the network. The read takes place in its own thread, with a BufferedReader wrapped around the socket's InputStream, and looks like this:
try {
    int input;
    while ((input = dataIn.read()) != -1) {
        char msgChar = (char) input;
        String message = msgChar + "";
        while (dataIn.ready()) {
            msgChar = (char) dataIn.read();
            message += msgChar;
        }
        System.out.println("Just received " + message);
        this.processMessage(message);
    }
    this.sock.close();
}
My sendMessage method is pretty simple (just a write over a DataOutputStream wrapped around the socket's OutputStream), so I don't think the problem is happening there:
try {
    dataOut.writeBytes(message);
    System.out.println("Just sent " + message);
}
Any thoughts would be highly appreciated. Thanks!
As it turns out, the ready() method guarantees only that the next read WON'T block. Consequently, !ready() does not guarantee that the next read WILL block. Just that it could.
I believe that the problem here had to do with the TCP stack itself. TCP is stream-oriented: it delivers bytes in order, but it makes no guarantees about how the bytes written to the socket are grouped into segments or reads. I suspect that the TCP stack was breaking up the sent string in a way that made sense to it, and that ready() returned false simply because the rest of the message had not arrived yet, in spite of the fact that more data was on its way.
I refactored the code to add a newline character to every message send, then simply performed a readLine() instead. This allowed my network protocol to be dependent on the newline character as a message delimiter, rather than the ready() method. I'm happy to say this fixed the problem.
Thanks for all your input!
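A minimal sketch of the newline-delimited approach described above, with hypothetical names (LineProtocol, frame, readAll). In the real program the Reader would be something like an InputStreamReader wrapped around the socket's InputStream; a generic Reader is used here so the sketch runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class LineProtocol {
    /** Sender side: terminate every message with '\n' so the receiver can find its end. */
    static String frame(String message) {
        return message + "\n";
    }

    /** Receiver side: readLine() blocks until a full line (or end of stream) is available. */
    static List<String> readAll(Reader source) throws IOException {
        List<String> messages = new ArrayList<>();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            messages.add(line); // the line terminator is already stripped
        }
        return messages;
    }
}
```

This only works if a message can never contain a newline itself; otherwise the delimiter has to be escaped or replaced by a length prefix.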
Try flushing the OutputStream on the sender side. The last bytes might remain in some internal buffers.
The types of stream objects you use to read and write the data really matter. It seems to me that this problem is caused by the fact that you use DataOutputStream for sending but something else for receiving. Try to send and receive with DataOutputStream and DataInputStream respectively.
As a matter of fact, if you send something by calling dataOut.writeBoolean(b) but try to receive it by calling dataIn.readUTF(), you will not get the value back. DataInputStream and DataOutputStream are type-sensitive. Try to refactor your code with that in mind.
Moreover, some input streams return a single byte on each invocation of read(). Here you convert that single byte into a char, while in Java a char consists of two bytes.
msgChar = (char)dataIn.read();
Check whether this is the cause of the data loss.

Count the number of messages in protocol buffer file

I am currently writing Java code to count the number of messages in a protocol buffer file (*.pb).
I would like to know if there is meta-data or header that contains the information of number of messages in the protobuf file?
I am looping through the whole file, and I think there should be a better way to do it.
while ((m = message.getParserForType().parseDelimitedFrom(input)) != null) {
    recordCount++;
}
Thanks
David
There is no header or anything that will tell you the number of messages in the file. That format just consists of a length prefix in varint format, followed by a message payload, repeated for as many messages as you have.
However, you could in principle count the number of messages in a much more efficient way. If you just want to know how many there are, you could read the length prefixes and skip over the actual message payloads without parsing them.
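The "read the prefix, skip the payload" idea could look roughly like this. The sketch decodes the varint length prefix by hand with the standard library only, so DelimitedCounter, readVarint32 and countMessages are illustrative names rather than protobuf API, and it assumes the file was written in the length-delimited format described above.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class DelimitedCounter {
    /** Decodes a base-128 varint whose first byte has already been read. */
    static int readVarint32(int firstByte, InputStream in) throws IOException {
        int result = firstByte & 0x7F;
        int b = firstByte;
        // The high bit of each byte says whether another byte follows.
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            if (shift >= 35) throw new IOException("Malformed varint length prefix");
            b = in.read();
            if (b == -1) throw new EOFException("Stream ended inside a length prefix");
            result |= (b & 0x7F) << shift;
        }
        return result;
    }

    /** Counts length-delimited messages without parsing their payloads. */
    static long countMessages(InputStream in) throws IOException {
        long count = 0;
        int first;
        while ((first = in.read()) != -1) {
            int length = readVarint32(first, in);
            if (length < 0) throw new IOException("Negative message length");
            long remaining = length;
            while (remaining > 0) {
                long skipped = in.skip(remaining); // may skip fewer bytes than requested
                if (skipped <= 0) {
                    // skip() made no progress: use read() to tell end of stream from a slow source
                    if (in.read() == -1) throw new EOFException("Truncated message payload");
                    skipped = 1;
                }
                remaining -= skipped;
            }
            count++;
        }
        return count;
    }
}
```

Wrapping the file in a BufferedInputStream before calling countMessages keeps the byte-at-a-time prefix reads cheap.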

Reading first four bytes from ByteBuffer, then writing them back?

I have a ByteBuffer object called msg with the intended message length in the first four bytes, which I read as follows:
int msgLen = msg.getInt();
LOG.debug("Message size: " + msgLen);
If the msgLen is less than some threshold value, I have a partial message and need to cache. In this case, I'd like to put those first four bytes back into the beginning of the message; that is, put the message back together to be identical to pre-reading. For example:
if (msgLen < threshold) {
msg.rewind();
msg.put(msgLen);
Unfortunately, this does not seem to be the correct way to do this. I've tried many combinations of flip, put, and rewind, but I must be misunderstanding something.
How would I put the bytes back into the write buffer in their original order?
The answer was posted by Andremoniy in the comments section. Read operations do not consume bytes in the buffer; they only advance its position, so msg.rewind() was adequate. This didn't work in my case because of some other logic in the program, and I incorrectly associated that with a problem at the buffer level.
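To illustrate why nothing needs to be "put back", here is a small sketch with hypothetical helper names (PeekLength, peekLength, readLengthAndRewind). It shows both the rewind() route and an absolute getInt(index), which reads the prefix without moving the position at all.

```java
import java.nio.ByteBuffer;

class PeekLength {
    /** Reads the 4-byte length prefix without moving the buffer's position. */
    static int peekLength(ByteBuffer msg) {
        return msg.getInt(msg.position()); // absolute get: position is unchanged
    }

    /** Relative read followed by rewind(); assumes the prefix starts at index 0. */
    static int readLengthAndRewind(ByteBuffer msg) {
        int msgLen = msg.getInt(); // advances position by 4, the bytes stay in the buffer
        msg.rewind();              // position back to 0, limit untouched
        return msgLen;
    }
}
```

Note that rewind() always resets the position to 0, so the absolute get is the safer choice when the message does not start at the beginning of the buffer.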

Debugging if UTF-8 decoding is done correctly?

We have Java code talking to an external system over TCP connections, with XML messages encoded in UTF-8.
The messages received begin with '?'. So the XML received is
?<begin>message</begin>
There is a real doubt if the first character is indeed '?'. At the moment, we cannot ask the external system if/what.
The code snippet for reading the stream is as below.
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, Charset.forName("UTF-8")));
int readByte = reader.read();
if (readByte <= 0) {
inputStream.close();
}
builder.append((char) readByte);
We are currently trying to log the raw bytes with int readByte = inputStream.read(). The logs will take a few days to be received.
In the meantime, I was wondering how we could ascertain at our end whether it was truly a '?' and not a decoding issue?
I strongly suspect you have a byte-order mark at the beginning of your document. That won't render as a visible character, and consequently could appear as a question mark. Can you dump the raw bytes and check for that sequence (EF BB BF for UTF-8)?
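If the byte-order mark turns out to be the culprit, one possible way to check for it and drop it before decoding is sketched below. BomSkipper and skipUtf8Bom are hypothetical names; the sketch relies on the fact that Java's UTF-8 decoder does not strip a leading BOM on its own but passes it through as the character U+FEFF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

class BomSkipper {
    /** Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if one is present. */
    static InputStream skipUtf8Bom(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read() may return fewer than 3 bytes, so loop until 3 or end of stream.
        while (n < 3) {
            int r = in.read(head, n, 3 - n);
            if (r == -1) break;
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            in.unread(head, 0, n); // not a BOM: push the bytes back untouched
        }
        return in;
    }
}
```

The returned stream can then be wrapped in the InputStreamReader exactly as before. If the first byte really is 0x3F, the data passes through unchanged and the '?' is genuine.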
Your question seems to boil down to this:
Can we ascertain the real value of the first byte of the message without actually looking at it?
The answer is "No, you can't". (Obviously!)
...
However, if you could intercept the TCP/IP traffic from the external system with a packet sniffer (aka traffic monitoring tool), then dumping the first byte or bytes of the message would be simple ... requiring no code changes.
Is logging the int returned by inputStream.read() the correct way to analyse the bytes received? Or does the word length of the OS or other environment variables come into the picture?
The InputStream.read() method returns either a single (unsigned) byte of data (in the range 0 to 255 inclusive) or -1 to indicate "end of stream". It is not sensitive to the "word length" or anything else.
In short, provided you treat the results appropriately, calling read() should give you the data you need to see what the bytes in the stream really are.
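As a sketch of treating the results appropriately, the hypothetical helper below (HexDump.firstBytesHex is not from the original post) formats the first few values returned by read() as hex, which makes a BOM or a genuine 0x3F easy to spot in a log.

```java
import java.io.IOException;
import java.io.InputStream;

class HexDump {
    /** Reads up to 'max' bytes and returns them as space-separated hex pairs. */
    static String firstBytesHex(InputStream in, int max) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < max; i++) {
            int b = in.read(); // 0..255, or -1 at end of stream
            if (b == -1) break;
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }
}
```

Keep in mind that this consumes the bytes it logs, so in production code the stream would need to be marked and reset, or the bytes pushed back, before normal parsing continues.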

Handling IOExceptions from the Google Protocol Buffer library

I have some code that tries to read in a Google Protocol Buffer message from a socket in Java. However, the mergeDelimitedFrom() method can throw an IOException if it reads in invalid data or if the socket connection is reset (and probably for other reasons). If the connection is reset I would like to exit the loop, but if it is just an invalid message I would like to continue running. One thought is to have some sort of exception counter and exit after X consecutive failures, but I was hoping to be able to figure out what type of error occurred instead of being in the dark.
This is basically the code I have:
while (m_Running)
{
    SomeMessage message = null;
    try
    {
        final Builder builder = SomeMessage.newBuilder();
        if (builder.mergeDelimitedFrom(m_InputStream))
        {
            message = builder.build();
        }
        else
        {
            // Google protocol buffers doesn't document it very well
            // but if mergeDelimitedFrom returns false then it has
            // reached the end of the input stream. For a socket, no
            // more data will be coming so exit from the thread
            m_Running = false;
        }
    }
    catch (final IOException e)
    {
        // what should really be done here ???
    }
}
Just don't do it. If you are reading protocol buffer objects directly off a socket, then you are effectively defining your own application protocol. It's harder than you might think to do it right - there is a good general description of some of the problems at On the Design of Application Protocols. It's important to understand framing - determining where one message ends and another begins.
Which leads us to some advice from the inventors of protobuf at https://developers.google.com/protocol-buffers/docs/techniques. The key piece of advice is this:
If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins.
I recommend that you decide on a framing protocol to divide the stream into messages, then write some custom socket code to handle the work of reading bytes off the socket and dividing them into byte arrays, where each byte array is known to contain exactly one message. Finally, use protobuf to deserialize each message byte array into an object. Deserializing from a byte array cannot fail because of the socket: the only thing that can go wrong at that stage is a malformed message, which protobuf reports as an InvalidProtocolBufferException.
You'll still have to deal with IOExceptions but it will be at a lower level where you are just reading byte arrays and you'll know exactly how much data has been deserialized when the error occurs.
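A rough sketch of that separation follows. The 4-byte big-endian length prefix, the FramedReader class and the Decoder interface are all assumptions made for illustration; Decoder stands in for whatever turns one frame into a message, for example a protobuf parse from a byte array.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

class FramedReader {
    /** Hypothetical stand-in for the per-message deserializer. */
    interface Decoder {
        void decode(byte[] frame) throws Exception;
    }

    /**
     * Reads frames until the stream ends and returns how many decoded cleanly.
     * Socket-level failures propagate as IOException; bad messages are skipped.
     */
    static int readLoop(DataInputStream in, int maxFrameSize, Decoder decoder)
            throws IOException {
        int decoded = 0;
        while (true) {
            int length;
            try {
                length = in.readInt(); // assumed framing: 4-byte big-endian length
            } catch (EOFException e) {
                return decoded; // stream ended before another length prefix could be read
            }
            if (length < 0 || length > maxFrameSize) {
                throw new IOException("Bad frame length " + length + ", stream is out of sync");
            }
            byte[] frame = new byte[length];
            in.readFully(frame); // throws if the connection breaks mid-frame
            try {
                decoder.decode(frame);
                decoded++;
            } catch (Exception e) {
                // Invalid message: the framing is still intact, so keep reading.
            }
        }
    }
}
```

With this split, an exception from readInt or readFully means the connection is gone or the stream is out of sync and the loop should exit, while an exception from the decoder affects only that one message.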
Also consider using something like netty to help with the socket code.
