I have a buffer that stores multiple messages. Say the buffer is 50 bytes (let me try to illustrate this with metacode)
-------------------- (50 byte empty buffer)
and my messages are of size 20. On a given socket read, I may get 1 message
|111111111|----------- (50 byte buffer with one 20 byte message)
Or two messages
|111111111|222222222|----
But if I get three messages, I end up with a partial third message
|111111111|222222222|3333 (TRUNCATE)
On the next socket read, the rest of the third message comes through, and the byte stream contains the second half of message 3:
>>> socket.read()
33334444444455555555 ....
Furthermore, I know the position at which the third message starts, so I'd like to simply retain the contents of the third message in my buffer. I thought compact() would be the answer:
>>> readBuffer.compact()
And then simply pass this same buffer back into socket.read()
>>> socket.read(readBuffer)
And ideally, this would fill my buffer as so
33333333|44444444|55555...
However, I don't think that compacting and simply passing the readBuffer back into socket.read() is the correct approach.
Is there a well-known solution for handling partial messages this way? I can think of a lot of things to try, but this has to be a common problem. I'd like to avoid the intermediate creation of buffers as much as possible, but can't think of a solution that doesn't involve some sort of residual buffer.
Thanks
You are mistaken. Using compact() and then reusing the buffer for the next read() is exactly the correct approach.
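For illustration, a minimal sketch of that loop; MESSAGE_SIZE, handleMessage() and readLoop() are placeholder names, not part of any API:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// MESSAGE_SIZE and handleMessage() stand in for your protocol's details.
static void readLoop(SocketChannel channel) throws IOException {
    ByteBuffer readBuffer = ByteBuffer.allocate(50);
    while (channel.read(readBuffer) != -1) {
        readBuffer.flip();                              // drain mode: limit = bytes available
        while (readBuffer.remaining() >= MESSAGE_SIZE) {
            byte[] message = new byte[MESSAGE_SIZE];
            readBuffer.get(message);                    // consume one complete message
            handleMessage(message);
        }
        readBuffer.compact();                           // keep the partial tail; position is
                                                        // left ready for the next read()
    }
}

compact() copies the unread tail (your partial message 3) to the front of the buffer and leaves the position right after it, so the next read() appends exactly as in your last diagram.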
The docs of the method DatagramChannel.receive(ByteBuffer) state that it will read data from the channel up to the ByteBuffer's size and discard the remainder if the buffer is too small to hold the datagram. I was wondering if there's any way to know that the receive method has discarded data.
The usual technique is to use a buffer one larger than the largest expected datagram. Then if you get one that size, it has almost certainly been truncated, and it is also an application protocol error.
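A sketch of that check; MAX_DATAGRAM stands in for the largest datagram your application protocol allows:

import java.io.IOException;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;

static void receiveOne(DatagramChannel channel) throws IOException {
    // One byte larger than the largest legal datagram.
    ByteBuffer buf = ByteBuffer.allocate(MAX_DATAGRAM + 1);
    SocketAddress sender = channel.receive(buf);
    if (sender != null && buf.position() > MAX_DATAGRAM) {
        // The buffer filled completely: the datagram was almost certainly
        // truncated, and it is oversized for the protocol in any case.
        return;   // drop it
    }
    // ... process buf as a complete datagram ...
}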
Let's say you have a continuous binary stream of data, and each piece of the data needs to be split back out somehow. What is the best way to do it?
Socket.read(byte[] arr) doesn't guarantee that you will receive exactly the same amount of bytes as you sent using Socket.write(byte[] arr): arr may be split (out of 10 bytes you first read 8 and then 2) or spliced together with the next write.
One of the ways of solving this is specifying the incoming byte array's size first: read exactly 4 bytes, convert them into an integer x, then read x bytes. But this only works in TCP, and it can completely mess everything up if you ever send a wrong byte array size.
Another one I can think of is prefixing chunks of data with pseudo-random byte sequences: initialize Random on client and server with the same seed and use its random.nextBytes(byte[] arr) for prefixes. The downside is that to keep the probability of such a sequence occurring in the actual data very small, the prefix has to be pretty long, which adds a lot of useless traffic. And again, it is not a way out for UDP sockets.
So what are other good ways of doing this and are there any simple libraries which would allow me to simply do conn.sendDataChunk(byte[] arr) ?
One of the ways of solving this is specifying the incoming byte array's size first: read exactly 4 bytes, convert them into an integer x, then read x bytes.
Yep, that's exactly what you should do. In other words, you're adding a message header before each message. This is a practical necessity when you want to layer a message-based network protocol atop a stream-based one that has no concept of message boundaries. (TCP purposefully obscures the IP packet boundaries.)
You could also use this as an opportunity to add other fields to the message header, such as a message ID number to help you distinguish between different message types.
But this only works in TCP, and it can completely mess everything up if you ever send a wrong byte array size.
This is true. So don't send a malformed header!
In all seriousness, a header with a length field is standard practice. It's a good idea to add sanity checks on the length: make sure it's not too large (or negative) so you don't end up allocating 2GB of memory to read the next message.
Also, don't assume you can read the whole header in with a single read() either. It may take multiple reads to get the whole header.
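Putting the pieces together, here is a sketch assuming blocking streams; sendDataChunk()/readDataChunk() and MAX_MESSAGE are illustrative names, and this gives you the conn.sendDataChunk(byte[] arr) shape you asked for:

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

static final int MAX_MESSAGE = 1 << 20;   // sanity limit: 1 MiB

// Sender: prefix each chunk with its length.
static void sendDataChunk(DataOutputStream out, byte[] arr) throws IOException {
    out.writeInt(arr.length);
    out.write(arr);
    out.flush();
}

// Receiver: readInt() and readFully() block until the requested bytes
// arrive, so a header split across packets is handled for you.
static byte[] readDataChunk(DataInputStream in) throws IOException {
    int length = in.readInt();
    if (length < 0 || length > MAX_MESSAGE) {
        throw new IOException("bad message length: " + length);
    }
    byte[] arr = new byte[length];
    in.readFully(arr);
    return arr;
}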
Regarding UDP sockets: reading a UDP socket is not the same as reading a TCP socket. You either get the UDP packet or you do not. From UDP:
Datagrams – Packets are sent individually and are checked for integrity only if they arrive. Packets have definite boundaries which are honored upon receipt, meaning a read operation at the receiver socket will yield an entire message as it was originally sent.
So for UDP, you do not need to worry about reading an incorrect number of bytes. But you do have to worry about what happens if some data does not arrive, or if it arrives in a different order than it was sent.
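One common mitigation, sketched here with illustrative names, is to prepend a sequence number to each datagram so the receiver can detect loss and reordering:

import java.io.IOException;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;

static int sequence = 0;   // illustrative; in practice, per-destination state

static void sendChunk(DatagramChannel channel, SocketAddress target, byte[] payload)
        throws IOException {
    ByteBuffer packet = ByteBuffer.allocate(4 + payload.length);
    packet.putInt(sequence++);   // leading sequence number
    packet.put(payload);
    packet.flip();
    channel.send(packet, target);
}
// On the receive side, read the leading int of each datagram: a gap in the
// numbers means loss, a smaller-than-expected number means reordering.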
TCP network messages can be fragmented. But fragmented messages are difficult to parse, especially when data types longer than a byte are transferred. For example, buffer.getLong() may fail if some bytes of the long I expect end up in a second buffer.
Parsing would be much easier if multiple Channels could be recombined on the fly. So I thought of sending all Data through a java.nio.channels.Pipe.
// count total length (buffers is a ByteBuffer[])
int length = 0;
for (ByteBuffer buffer : buffers) {
    length += buffer.remaining();
}
// write to pipe
Pipe pipe = Pipe.open();
pipe.sink().write(buffers);
// read back from pipe
ByteBuffer complete = ByteBuffer.allocateDirect(length);
if (pipe.source().read(complete) != length) {
    System.out.println("Fragmented!");
}
But will this be guaranteed to fill up the buffer completely? Or could the Pipe introduce fragmentation again? In other words, will the body of the condition ever be reached?
TCP fragmentation has little to do with the problem you are experiencing. The TCP stack on the source of the stream is dividing messages that are too large for a single packet into multiple packets and they are arriving and being reassembled possibly out of alignment of the longs you are expecting.
Regardless, you are treating what amounts to a byte array (a ByteBuffer) as an input stream. You are telling the JVM to read 'the rest of what is in the buffer' into a ByteBuffer. Meanwhile, the second half of your long is now inside the network buffer. The ByteBuffer you are now trying to read through will never have the rest of that long.
Consider using a Scanner to read longs; it will block until a long can be read.
Scanner scanner = new Scanner(socket.getChannel());
scanner.nextLong();
Also consider using a DataInputStream to read longs; its readLong() reads exactly eight bytes, blocking until they are available and throwing EOFException if the stream ends first.
DataInputStream dis = new DataInputStream(socket.getInputStream());
dis.readLong();
If you have control over the server, consider using flush() to prevent your packets from getting buffered and sent 'fragmented' or an ObjectOutputStream/ObjectInputStream as a more convenient way to do IO.
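For completeness, the matching server-side write with an explicit flush(), assuming a plain Socket; sendLong() is an illustrative name:

import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

static void sendLong(Socket socket, long value) throws IOException {
    DataOutputStream out = new DataOutputStream(socket.getOutputStream());
    out.writeLong(value);
    out.flush();   // push the buffered bytes onto the wire now
}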
No. A Pipe is intended to be written by one thread and read by another. There is an internal buffer of only 4k. If you write more than that to it you will stall.
They really aren't much use at all actually, except as a demonstration.
I don't understand this:
For example buffer.getLong() may fail if some bytes of the long I expect end up in a second buffer.
What second buffer? You should be using the same receive buffer for the life of the channel. Make it an attachment to the SelectionKey so you can find it when you need it.
I also don't understand this:
Parsing would be much easier if multiple Channels could be recombined on the fly
Surely you mean multiple buffers, but the basic idea is to only have one buffer in the first place.
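A sketch of that one-buffer-per-channel arrangement; the buffer size and method names are assumptions, not a prescription:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// At registration time: attach one receive buffer for the life of the channel.
static void register(Selector selector, SocketChannel channel) throws IOException {
    channel.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(8192));
}

// In the read handler: recover that same buffer from the key.
static void onReadable(SelectionKey key) throws IOException {
    ByteBuffer buffer = (ByteBuffer) key.attachment();
    ((SocketChannel) key.channel()).read(buffer);
    buffer.flip();
    while (buffer.remaining() >= 8) {
        long value = buffer.getLong();   // only taken once all 8 bytes are in
    }
    buffer.compact();                    // preserve a partial long for the next read
}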
For example, I have a file whose content is:
abcdefg
then I use the following code to read 'defg':
ByteBuffer bb = ByteBuffer.allocate(4);
int read = channel.read(bb, 3);   // channel is a FileChannel opened on the file
assert(read == 4);
Since there's adequate data in the file, can I suppose so? Can I assume that the method returns a number less than the limit of the given buffer only when there aren't enough bytes left in the file?
Can I assume that the method returns a number less than the limit of the given buffer only when there aren't enough bytes left in the file?
The Javadoc says:
a read might not fill the buffer
and gives some examples, and
returns the number of bytes read, possibly zero, or -1 if the channel has reached end-of-stream.
This is NOT sufficient to allow you to make that assumption.
In practice, you are likely to always get a full buffer when reading from a file, modulo the end of file scenario. And that makes sense from an OS implementation perspective, given the overheads of making a system call.
But I can also imagine situations where returning a half-empty buffer might make sense. For example, when reading from a locally-mounted remote file system over a slow network link, there is some advantage in returning a partially filled buffer so that the application can start processing the data. Some future OS may implement the read system call to do that in this scenario. If you assume that you will always get a full buffer, you may get a surprise when your application is run on that (hypothetical) new platform.
Another issue is that there are some kinds of stream where you will definitely get partially filled buffers. Socket streams, pipes and console streams are obvious examples. If you code your application assuming file stream behavior, you could get a nasty surprise when someone runs it against another kind of stream ... and it fails.
No, in general you cannot assume that the number of bytes read will be equal to the number of bytes requested, even if there are bytes left to be read in the file.
If you are reading from a local file, chances are that the number of bytes requested will actually be read, but this is by no means guaranteed (and won't likely be the case if you're reading a file over the network).
See the documentation for the ReadableByteChannel.read(ByteBuffer) method (which applies for FileChannel.read(ByteBuffer) as well). Assuming that the channel is in blocking mode, the only guarantee is that at least one byte will be read.
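If your code needs the buffer filled, the usual remedy is to loop until it is full or the channel ends; a minimal sketch using the non-positional read:

import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

static void readFully(ReadableByteChannel channel, ByteBuffer bb) throws IOException {
    while (bb.hasRemaining()) {
        if (channel.read(bb) == -1) {
            throw new EOFException("channel ended before the buffer was filled");
        }
    }
}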
I am building a non-blocking server using Java's NIO package, and I have a couple of questions about validating received data.
I have noticed that when I call read() on a socket channel, it will attempt to fill the byte buffer that it is reading into (as per the documentation). When I send 10 bytes to the server from the client, the server will read those ten bytes into the byte buffer; the rest of the bytes in the byte buffer stay at zero, and the number returned from the read operation is the size of my byte buffer even though the client only wrote 10 bytes.
What I am trying to figure out is whether there is a way to get just the number of bytes the client sent to the server when the server reads from a socket channel (in the above case, 10 instead of 1024).
If that doesn't work, I know I can separate all the actual received data from this 'excess' data in the byte buffer by using delimiters in conjunction with my 'instruction set headers' and whatnot, but it seems like this should exist, so I have to wonder if I am just missing something obvious, or if there is some low-level reason why this can't be done.
Thanks :)
You probably forgot to call the notorious flip() on your buffer.
buffer.clear();
int n = channel.read(buffer);   // n is the number of bytes actually read (10 in your case)
buffer.flip();                  // limit = bytes read, position = 0
// now you can read exactly those bytes from the buffer
buffer.get...
I need to change my signature to nio.sucks