I have some code that tries to read in a Google Protocol Buffer message from a socket in Java. However, the mergeDelimitedFrom() method can throw an IOException if it reads in invalid data or if the socket connection is reset (and probably for other reasons). If the connection is reset I would like to exit out of the loop, but if it is just an invalid message I would like to continue running. One thought is to just have some sort of exception counter and exit after X consecutive failures, but I was hoping to be able to figure out what type of error occurs instead of being in the dark.
This is basically the code I have:
while (m_Running)
{
SomeMessage message = null;
try
{
final Builder builder = SomeMessage.newBuilder();
if (builder.mergeDelimitedFrom(m_InputStream))
{
message = builder.build();
}
else
{
// Google protocol buffers doesn't document it very well
// but if mergeDelimitedFrom returns false then it has
// reached the end of the input stream. For a socket, no
// more data will be coming so exit from the thread
m_Running = false;
}
}
catch (final IOException e)
{
// what should really be done here ???
}
}
Just don't do it. If you are reading protocol buffer objects directly off a socket, then you are effectively defining your own application protocol. It's harder than you might think to do it right - there is a good general description of some of the problems at On the Design of Application Protocols. It's important to understand framing - determining where one message ends and another begins.
Which leads us to some advice from the inventors of protobuf at https://developers.google.com/protocol-buffers/docs/techniques. The key piece of advice is this:
If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins.
I recommend that you decide on a framing protocol to divide the stream into messages, then write some custom socket code to handle the work of reading bytes off the socket and dividing them into byte arrays, where each byte array is known to contain exactly one message. Finally, use protobuf to deserialize each message byte array into an object. Deserialization then never touches the socket, so any exception it throws (protobuf reports malformed data as InvalidProtocolBufferException, a subclass of IOException) can only mean a bad message, not a broken connection.
You'll still have to deal with IOExceptions but it will be at a lower level where you are just reading byte arrays and you'll know exactly how much data has been deserialized when the error occurs.
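As a rough illustration of that split, here is a minimal sketch of one possible framing scheme: a 4-byte big-endian length prefix followed by the payload. The class name LengthPrefixedFraming, the methods writeFrame and readFrame, and the MAX_FRAME_BYTES limit are all made up for this example, and the prefix format is an assumption rather than anything protobuf prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: each frame is [4-byte big-endian length][payload bytes].
class LengthPrefixedFraming {
    // Assumed upper bound so a corrupt length cannot trigger a huge allocation.
    static final int MAX_FRAME_BYTES = 1 << 20;

    static void writeFrame(OutputStream out, byte[] payload) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    // Returns one complete payload, or null on a clean end of stream
    // (the stream ended before the first byte of a length prefix).
    static byte[] readFrame(InputStream in) throws IOException {
        int first = in.read();
        if (first == -1) {
            return null;
        }
        DataInputStream data = new DataInputStream(in);
        // Rebuild the length from the byte already read plus three more.
        int length = (first << 24)
                | (data.readUnsignedByte() << 16)
                | (data.readUnsignedByte() << 8)
                | data.readUnsignedByte();
        if (length < 0 || length > MAX_FRAME_BYTES) {
            throw new IOException("Invalid frame length: " + length);
        }
        byte[] payload = new byte[length];
        data.readFully(payload); // throws EOFException if the stream ends mid-frame
        return payload;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeFrame(wire, "first".getBytes(StandardCharsets.UTF_8));
        writeFrame(wire, "second".getBytes(StandardCharsets.UTF_8));

        InputStream in = new ByteArrayInputStream(wire.toByteArray());
        byte[] frame;
        while ((frame = readFrame(in)) != null) {
            System.out.println(new String(frame, StandardCharsets.UTF_8));
        }
    }
}
```

With something like this in place, an exception from readFrame means the connection is in trouble, while a failure when parsing the returned byte array (for example with SomeMessage.parseFrom(payload)) means only that one message was bad.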
Also consider using something like netty to help with the socket code.
Right now, I'm trying to write a GUI based Java tic-tac-toe game that functions over a network connection. It essentially works at this point, however I have an intermittent error in which several chars sent over the network connection are lost during gameplay. One case looked like this, when println statements were added to message sends/reads:
Player 1:
Just sent ROW 14 COLUMN 11 GAMEOVER true
Player 2:
Just received ROW 14 COLUMN 11 GAMEOV
I'm pretty sure the error is happening when I read over the network. The read takes place in its own thread, with a BufferedReader wrapped around the socket's InputStream, and looks like this:
try {
int input;
while((input = dataIn.read()) != -1 ){
char msgChar = (char)input;
String message = msgChar + "";
while(dataIn.ready()){
msgChar = (char)dataIn.read();
message+= msgChar;
}
System.out.println("Just received " + message);
this.processMessage(message);
}
this.sock.close();
}
My sendMessage method is pretty simple, (just a write over a DataOutputStream wrapped around the socket's outputstream) so I don't think the problem is happening there:
try {
dataOut.writeBytes(message);
System.out.println("Just sent " + message);
}
Any thoughts would be highly appreciated. Thanks!
As it turns out, the ready() method guarantees only that the next read WON'T block. Consequently, !ready() does not guarantee that the next read WILL block. Just that it could.
I believe that the problem here had to do with TCP itself. Being stream-oriented, TCP delivers bytes in order but makes no guarantees about how they are grouped: a single write may arrive in several pieces. I suspect that the TCP stack split the sent string in a way that made sense to it, so that once the first piece had been consumed, ready() returned false simply because the rest had not arrived yet, even though more of the message was on its way.
I refactored the code to add a newline character to every message send, then simply performed a readLine() instead. This allowed my network protocol to be dependent on the newline character as a message delimiter, rather than the ready() method. I'm happy to say this fixed the problem.
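For anyone who wants to see the shape of that change, here is a minimal sketch of the newline-delimited approach. The LineProtocol class and its method names are invented for this example, in-memory streams stand in for the socket, and it assumes a message never contains a newline itself.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one message per line, '\n' as the delimiter.
class LineProtocol {
    static void sendMessage(Writer out, String message) throws IOException {
        out.write(message);
        out.write('\n'); // the delimiter marks the end of the message
        out.flush();     // push buffered characters onto the wire
    }

    // Reads until end of stream; readLine() blocks until a whole line
    // has arrived, however the bytes were split up in transit.
    static List<String> readMessages(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        List<String> messages = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            messages.add(line);
        }
        return messages;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the socket's output and input streams.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(wire, StandardCharsets.UTF_8);
        sendMessage(out, "ROW 14 COLUMN 11 GAMEOVER true");
        sendMessage(out, "ROW 2 COLUMN 3 GAMEOVER false");

        for (String message : readMessages(new ByteArrayInputStream(wire.toByteArray()))) {
            System.out.println("Just received " + message);
        }
    }
}
```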
Thanks for all your input!
Try flushing the OutputStream on the sender side. The last bytes might remain in some internal buffers.
The types of stream objects you use to operate on the data matter a great deal. It seems to me that this trouble is caused by the fact that you use DataOutputStream for sending but something else for receiving. Try to send and receive with DataOutputStream and DataInputStream respectively.
As a matter of fact, if you send something by calling dataOut.writeBoolean(b) but try to receive it by calling dataIn.readUTF(), you will not get back what you sent. DataInputStream and DataOutputStream are type-sensitive. Try to refactor your code with that in mind.
Moreover, some input streams return a single byte on invocation of read(). Here you try to convert this single byte into a char, while in Java a char consists of two bytes.
msgChar = (char)dataIn.read();
Check whether this is the cause of the data loss.
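The byte-versus-char distinction only bites when read() is called on a raw InputStream; a Reader such as BufferedReader already returns decoded characters. A small sketch of the difference, with invented class and method names and UTF-8 assumed as the wire encoding:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical comparison of two ways to turn received bytes into text.
class ByteVersusChar {
    // Casts every byte to a char: only safe for single-byte characters.
    static String castEachByte(byte[] wire) throws IOException {
        InputStream in = new ByteArrayInputStream(wire);
        StringBuilder text = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            text.append((char) b);
        }
        return text.toString();
    }

    // Lets an InputStreamReader decode the bytes with an explicit charset.
    static String decodeWithReader(byte[] wire) throws IOException {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(wire), StandardCharsets.UTF_8);
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            text.append((char) c);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // "caf\u00e9" is 4 characters but 5 bytes in UTF-8.
        byte[] wire = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(castEachByte(wire).length());     // 5
        System.out.println(decodeWithReader(wire).length()); // 4
    }
}
```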
My goal is to send different kinds of messages from client to server, and it will be text based. The thing I am uncertain of is how to deal with partial reads here. I have to be sure that I get a whole message and nothing more.
Does anyone have experience with that?
Here is what I have so far:
private void handleNewClientMessage(SelectionKey key) throws IOException {
SocketChannel sendingChannel = (SocketChannel) key.channel();
ByteBuffer receivingBuffer = ByteBuffer.allocate(2048);
int bytesRead = sendingChannel.read(receivingBuffer);
if (bytesRead > 0) {
receivingBuffer.flip();
byte[] array = new byte[receivingBuffer.limit()];
receivingBuffer.get(array);
String message = new String(array);
System.out.println("Server received " +message);
}
selector.wakeup();
}
But I have no way of "ending" the message and be certain to have one full message.
Best regards,
O
You can never be sure you won't read more than one message unless you only read one byte at a time. (Which I don't suggest).
Instead I would read as much as you can into a ByteBuffer and then parse it to find the end of the message e.g. a newline for text.
When you find the end of a line, extract it, convert it to a String and process it. Repeat until you have a partial message (or nothing left).
If you find you have only part of a message, compact() the buffer (if position() > 0) so that the partial message moves to the front, then try to read() some more.
This allows you to read as many messages at once as you can while still handling incomplete messages.
Note: You will need to keep the ByteBuffer for a connection so you know what partial messages you have read before.
Note: this will not work if you have a message which is larger than your buffer size. I suggest using a recycled direct ByteBuffer of, say, 1+ MB. With direct ByteBuffers, only the pages of the ByteBuffer which are used get allocated to real memory.
If you are concerned about performance I would re-use your byte[] where possible. You only need to re-allocate it if you need more space than you have already.
BTW, you might find that a BufferedReader with plain IO is much simpler to use and still performs well enough.
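The buffer-and-compact approach described in this answer could look roughly like the sketch below. LineAccumulator and its methods are hypothetical names; the idea is to keep one instance per connection, read from the channel into buffer(), then call drainLines(). It assumes newline-terminated UTF-8 text and a buffer large enough for the longest message.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-connection accumulator for newline-terminated messages.
class LineAccumulator {
    // Stays in "write mode" between calls, ready for channel.read(buffer()).
    private final ByteBuffer buffer;

    LineAccumulator(int capacity) {
        buffer = ByteBuffer.allocate(capacity);
    }

    ByteBuffer buffer() {
        return buffer;
    }

    // Extracts every complete line received so far and keeps any
    // trailing partial line at the front of the buffer for the next read.
    List<String> drainLines() {
        List<String> lines = new ArrayList<>();
        buffer.flip(); // switch to reading what has been received
        int lineStart = 0;
        for (int i = 0; i < buffer.limit(); i++) {
            if (buffer.get(i) == '\n') {
                byte[] line = new byte[i - lineStart];
                buffer.position(lineStart);
                buffer.get(line);
                lines.add(new String(line, StandardCharsets.UTF_8));
                lineStart = i + 1; // skip the newline itself
            }
        }
        buffer.position(lineStart);
        buffer.compact(); // move the partial message to the front, back to write mode
        return lines;
    }

    public static void main(String[] args) {
        LineAccumulator acc = new LineAccumulator(2048);
        // Simulate two reads that split a message in the middle.
        acc.buffer().put("ROW 1\nCOL".getBytes(StandardCharsets.UTF_8));
        System.out.println(acc.drainLines()); // [ROW 1]
        acc.buffer().put("UMN 2\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(acc.drainLines()); // [COLUMN 2]
    }
}
```

If the buffer fills up without a newline ever appearing, the message is larger than the buffer and the connection needs a bigger buffer or an error path; the sketch does not handle that case.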
I'm having some trouble parsing a TCP packet from a socket...
In my protocol, my messages are like this:
'A''B''C''D''E'.........0x2300
'A''B''C''D''E' --> start message pattern
0x2300 --> two bytes end message
But due to Nagle's algorithm, sometimes my messages are concatenated like:
'A''B''C''D''E'.........0x2300'A''B''C''D''E'.........0x2300'A''B''C''D''E'.........0x2300
I already tried to setNoDelay() to true but the problem persists.
I have the message in a byte[].
How could I split my messages to be parsed individually?
PS: For now I am able to get the first message but the others are lost...
Just loop through your received data and check for end markers. When one is found, set a start index to the next package and continue searching. Something like this:
int packageStart = 0;
for (int i = 0; i < data.length - 1; i++) {
if (data[i] == 0x23 && data[i + 1] == 0x00) {
// Found end of package
i++; // i now indexes the last byte of the end marker
processPackage(data, packageStart, i);
packageStart = i + 1; // the next package starts right after the marker
}
}
// At this point: from packageStart till data.length are unprocessed bytes...
As noted, there might be some leftover data (if data did not end with the end marker). You might want to keep it so you can prepend it to the next batch of received data, thus preventing data loss due to chopped-up TCP/IP packets.
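One way to keep the leftover bytes around between reads is a small stateful splitter, sketched below. MarkerSplitter and feed are invented names, and the sketch assumes the byte pair 0x23 0x00 never occurs inside a message body.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical splitter that remembers unfinished data between reads.
class MarkerSplitter {
    // Bytes received so far that do not yet form a complete message.
    private byte[] pending = new byte[0];

    // Appends the first `length` bytes of `received` to the leftover data
    // and returns every complete message, end marker included.
    List<byte[]> feed(byte[] received, int length) {
        byte[] data = Arrays.copyOf(pending, pending.length + length);
        System.arraycopy(received, 0, data, pending.length, length);

        List<byte[]> messages = new ArrayList<>();
        int start = 0;
        for (int i = 0; i + 1 < data.length; i++) {
            if (data[i] == 0x23 && data[i + 1] == 0x00) {
                messages.add(Arrays.copyOfRange(data, start, i + 2));
                start = i + 2;
                i++; // skip the second marker byte
            }
        }
        // Whatever follows the last marker waits for the next read.
        pending = Arrays.copyOfRange(data, start, data.length);
        return messages;
    }

    public static void main(String[] args) {
        MarkerSplitter splitter = new MarkerSplitter();
        // The second message's end marker is split across two reads.
        byte[] read1 = {'A', 'B', 'C', 'D', 'E', 1, 0x23, 0x00, 'A', 'B', 'C', 'D', 'E', 2, 0x23};
        byte[] read2 = {0x00};
        System.out.println(splitter.feed(read1, read1.length).size()); // 1
        System.out.println(splitter.feed(read2, read2.length).size()); // 1
    }
}
```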
You have to think of it as parsing a continuous stream of bytes. Your code needs to identify the start and end of a message.
Due to the way packets get sent, you may have a complete message, multiple messages, a partial message, etc. Your code needs to identify when a message has begun and keep reading until it has found the end of the message or, in some instances, until you've read more bytes than your maximum message size and need to resync.
I've seen some comm managers drop and reestablish the connection (start over) and others throw away data until they can get back in sync. Then you get into the fun of whether you need guaranteed delivery and retransmission.
The best protocols are the simple ones. Create a message header which contains, say, an SOH byte, a two-byte message length (or whatever is appropriate), a two-byte message type and a one-byte message subtype. You can also end the message with any number of bytes. Look at an ASCII chart: there are a number of control bytes in the range 0x00-0x1F that have been pretty standard since the terminal days.
No point in reinventing the wheel here. Makes it easier, because you know how long this message should be instead of looking for patterns in the data.
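To make the header idea concrete, here is a minimal sketch using the layout suggested above: SOH, two-byte body length, two-byte type, one-byte subtype, then the body. HeaderCodec, encode and tryDecodeBody are hypothetical names, and big-endian byte order is an assumption.

```java
import java.nio.ByteBuffer;

// Hypothetical codec for: SOH | length (2 bytes) | type (2 bytes) | subtype (1 byte) | body.
class HeaderCodec {
    static final byte SOH = 0x01;               // ASCII "start of heading"
    static final int HEADER_BYTES = 1 + 2 + 2 + 1;

    static byte[] encode(int type, int subtype, byte[] body) {
        if (body.length > 0xFFFF) {
            throw new IllegalArgumentException("Body too long for a two-byte length");
        }
        // ByteBuffer is big-endian by default.
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + body.length);
        buf.put(SOH);
        buf.putShort((short) body.length);
        buf.putShort((short) type);
        buf.put((byte) subtype);
        buf.put(body);
        return buf.array();
    }

    // Returns the body of the next message and advances past it, or
    // null if the buffer does not yet hold a whole message.
    static byte[] tryDecodeBody(ByteBuffer buf) {
        if (buf.remaining() < HEADER_BYTES) {
            return null; // header incomplete, wait for more data
        }
        int start = buf.position();
        if (buf.get(start) != SOH) {
            throw new IllegalStateException("Lost sync: expected SOH");
        }
        int bodyLength = buf.getShort(start + 1) & 0xFFFF; // unsigned length
        if (buf.remaining() < HEADER_BYTES + bodyLength) {
            return null; // body incomplete, wait for more data
        }
        buf.position(start + HEADER_BYTES);
        byte[] body = new byte[bodyLength];
        buf.get(body);
        return body;
    }

    public static void main(String[] args) {
        byte[] message = encode(7, 1, new byte[] {10, 20});
        ByteBuffer received = ByteBuffer.wrap(message);
        System.out.println(tryDecodeBody(received).length); // 2
    }
}
```

Because the length is known as soon as the header has arrived, the receiver never has to scan the body for a pattern.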
It sounds like you need to treat it like a Byte Stream and buffer the packets until you see your EOF code 0x2300.
In socket I/O, how does ObjectInputStream.readObject() know how many bytes to read? Is the content length encapsulated inside the bytes themselves, or does it simply read all the available bytes in the buffer?
I am asking this because I was referring to the Python socket how-to and it says
Now if you think about that a bit, you’ll come to realize a
fundamental truth of sockets: messages must either be fixed length
(yuck), or be delimited (shrug), or indicate how long they are (much
better), or end by shutting down the connection. The choice is
entirely yours, (but some ways are righter than others).
However, in another SO answer, @DavidCrawshaw mentioned that:
So readObject() does not know how much data it will read, so it does
not know how many objects are available.
I am interested to know how it works...
You're over-interpreting the answer you cited. readObject() doesn't know how many bytes it will read, ahead of time, but once it starts reading it is just parsing an input stream according to a protocol, that consists of tags, primitive values, and objects, which in turn consist of tags, primitive values, and other objects. It doesn't have to know ahead of time. Consider the similar-ish case of XML. You don't know how long the document will be ahead of time, or each element, but you know when you've read it all, because the protocol tells you.
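A quick way to convince yourself is to write two objects back to back with no length prefix and read them again. The sketch below does that with in-memory streams standing in for the socket; the class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: objects written back to back, no explicit lengths.
class BackToBackObjects {
    static byte[] writeAll(Object... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Object value : values) {
                out.writeObject(value);
            }
        }
        return bytes.toByteArray();
    }

    // Each readObject() call consumes exactly one object: the tags in the
    // serialization stream tell the parser where that object ends.
    static List<Object> readAll(byte[] data, int count)
            throws IOException, ClassNotFoundException {
        List<Object> result = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                result.add(in.readObject());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        byte[] wire = writeAll("first", Integer.valueOf(42));
        System.out.println(readAll(wire, 2)); // [first, 42]
    }
}
```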
The readObject() method uses BlockDataInputStream to read the bytes. If you check readObject() in ObjectInputStream, it calls
readObject0(false).
private Object readObject0(boolean unshared) throws IOException {
boolean oldMode = bin.getBlockDataMode();
if (oldMode) {
int remain = bin.currentBlockRemaining();
if (remain > 0) {
throw new OptionalDataException(remain);
} else if (defaultDataEnd) {
/*
* Fix for 4360508: stream is currently at the end of a field
* value block written via default serialization; since there
* is no terminating TC_ENDBLOCKDATA tag, simulate
* end-of-custom-data behavior explicitly.
*/
throw new OptionalDataException(true);
}
bin.setBlockDataMode(false);
}
byte tc;
while ((tc = bin.peekByte()) == TC_RESET) {
bin.readByte();
handleReset();
}
The part that reads from the stream uses bin.readByte(). bin is a BlockDataInputStream, which in turn uses PeekInputStream to read. That class finally uses InputStream.read().
From the description of the read method:
/**
* Reads the next byte of data from the input stream. The value byte is
* returned as an <code>int</code> in the range <code>0</code> to
* <code>255</code>. If no byte is available because the end of the stream
* has been reached, the value <code>-1</code> is returned. This method
* blocks until input data is available, the end of the stream is detected,
* or an exception is thrown.
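So basically it reads byte after byte, blocking until each one arrives, and read() returns -1 only if the stream ends. As EJP mentioned, it never knows ahead of time how many bytes there are to read; the serialization protocol tells it when the object is complete. I hope this helps you understand it.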
I have a socketChannel configured as blocking, but when reading byte buffers of 5K from this socket, I get an incomplete buffer sometimes.
ByteBuffer messageBody = ByteBuffer.allocate(5*1024);
messageBody.mark();
messageBody.order(ByteOrder.BIG_ENDIAN);
int msgByteCount = channel.read(messageBody);
Occasionally, messageBody is not completely filled and channel.read() does not return -1 or an exception, but the actual number of bytes read (which is less than 5k).
Has anyone experienced a similar problem?
That's how reads work. The SocketChannel documentation says:
A read operation might not fill the buffer, and in fact it might not read any bytes at all. [...] It is guaranteed, however, that if a channel is in blocking mode and there is at least one byte remaining in the buffer then this method will block until at least one byte is read [emphasis added].
When you use sockets you must anticipate that the socket might transfer fewer bytes than you expect. You must loop on the .read method to get the remainder of the bytes.
This is also true when you send bytes through a socket. You must check how many bytes were sent, and loop on the send until all bytes have been sent.
This behavior is due to the network layers splitting the messages into multiple packets. If your messages are short, then you are less likely to encounter this. But you should always code for it.
With 5k bytes per buffer you are very likely to see the sender's message split into multiple packets. A read operation returns whatever has arrived so far, which may be only part of your message.
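A minimal sketch of such a read loop for a blocking channel follows. ChannelReads, readFully and the trickle test channel are hypothetical names made up for this example; trickle only exists to imitate a socket that delivers a few bytes at a time.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper for filling a buffer from a blocking channel.
class ChannelReads {
    // Keeps reading until the buffer is full. Assumes blocking mode;
    // on a non-blocking channel read() may return 0 and this would spin.
    static void readFully(ReadableByteChannel channel, ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            if (channel.read(buffer) == -1) {
                throw new EOFException(
                        "Channel closed with " + buffer.remaining() + " bytes still expected");
            }
        }
    }

    // Stand-in for a socket: hands out at most 3 bytes per read() call.
    static ReadableByteChannel trickle(byte[] source) {
        return new ReadableByteChannel() {
            private int offset = 0;
            private boolean open = true;

            @Override
            public int read(ByteBuffer dst) {
                if (offset == source.length) {
                    return -1; // end of stream
                }
                int n = Math.min(3, Math.min(dst.remaining(), source.length - offset));
                dst.put(source, offset, n);
                offset += n;
                return n;
            }

            @Override
            public boolean isOpen() {
                return open;
            }

            @Override
            public void close() {
                open = false;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer messageBody = ByteBuffer.allocate(10);
        readFully(trickle(new byte[10]), messageBody);
        System.out.println(messageBody.position()); // 10
    }
}
```

The same pattern applies when sending: loop on write(buffer) while buffer.hasRemaining().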
TCP/IP sends the information in packets, and they are not always all available when you do the read; therefore you must do the read in a loop.
char [] buffer = new char[1024];
int chars_read;
try
{
while((chars_read = from_server.read(buffer)) != -1)
{
to_user.write(buffer,0,chars_read);
to_user.flush();
}
}
catch(IOException e)
{
to_user.println(e);
}