Can TCP fragmentation be eliminated using a Pipe? - java

TCP network messages can be fragmented. But fragmented messages are difficult to parse, especially when data types longer than a byte are transferred. For example, buffer.getLong() may fail if some bytes of the long I expect end up in a second buffer.
Parsing would be much easier if multiple Channels could be recombined on the fly. So I thought of sending all data through a java.nio.channels.Pipe.
// count total length
int length = 0;
for (ByteBuffer buffer : buffers) {
    length += buffer.remaining();
}
// write to pipe
Pipe pipe = Pipe.open();
pipe.sink().write(buffers);
// read back from pipe
ByteBuffer complete = ByteBuffer.allocateDirect(length);
if (pipe.source().read(complete) != length) {
    System.out.println("Fragmented!");
}
But will this be guaranteed to fill up the buffer completely? Or could the Pipe introduce fragmentation again? In other words, will the body of the condition ever be reached?

TCP fragmentation has little to do with the problem you are experiencing. The TCP stack on the source of the stream is dividing messages that are too large for a single packet into multiple packets, and they are arriving and being reassembled possibly out of alignment with the longs you are expecting.
Regardless, you are treating what amounts to a byte array (a ByteBuffer) as an input stream. You are telling the JVM to read 'the rest of what is in the buffer' into a ByteBuffer. Meanwhile, the second half of your long is now inside the network buffer. The ByteBuffer you are now trying to read through will never have the rest of that long.
Consider using a Scanner to read longs; it will block until a long can be read (note, though, that Scanner parses longs from text, not from binary data).
Scanner scanner = new Scanner(socket.getChannel());
scanner.nextLong();
Also consider using a DataInputStream to read longs; its readLong() blocks until all eight bytes have been read (or throws EOFException if the stream ends first).
DataInputStream dis = new DataInputStream(socket.getInputStream());
dis.readLong();
If you have control over the server, consider using flush() to prevent your packets from getting buffered and sent 'fragmented' or an ObjectOutputStream/ObjectInputStream as a more convenient way to do IO.

No. A Pipe is intended to be written by one thread and read by another. There is an internal buffer of only 4k. If you write more than that to it you will stall.
They really aren't much use at all actually, except as a demonstration.
I don't understand this:
For example, buffer.getLong() may fail if some bytes of the long I expect end up in a second buffer.
What second buffer? You should be using the same receive buffer for the life of the channel. Make it an attachment to the SelectionKey so you can find it when you need it.
I also don't understand this:
Parsing would be much easier if multiple Channels could be recombined on the fly
Surely you mean multiple buffers, but the basic idea is to only have one buffer in the first place.
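The "one receive buffer for the life of the channel, attached to the SelectionKey" idea can be sketched as follows. This is a minimal runnable illustration that uses a Pipe as a stand-in for a SocketChannel; the class name and buffer size are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class AttachmentDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open(); // stand-in for a SocketChannel
        pipe.source().configureBlocking(false);
        // one receive buffer, kept for the life of the channel
        ByteBuffer recv = ByteBuffer.allocate(8192);
        SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ, recv);
        // later, inside the select loop, the same buffer is recovered:
        ByteBuffer sameBuffer = (ByteBuffer) key.attachment();
        System.out.println(sameBuffer == recv); // prints true
        selector.close();
    }
}
```

Because the buffer survives across reads, a long whose bytes arrive in two chunks simply accumulates in the same buffer until getLong() can succeed.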

Related

Java Partial message handling from socket NIO ByteBuffer

I have a buffer that stores multiple messages. Say the buffer is 50 bytes (let me try to illustrate this with metacode)
-------------------- (50 byte empty buffer)
and my messages are of size 20. On a given socket read, I may get 1 message
|111111111|----------- (50 byte buffer with 1 20 byte message)
Or two messages
|111111111|222222222|----
But if I get three messages, I end up with a partial third message
|111111111|222222222|3333 (TRUNCATE)
On the next socketread, the rest of the third message comes through and the bytestream contains the second half of message 3:
>>> socket.read()
33334444444455555555 ....
Furthermore, I know the position at which the third message starts, so I'd like to simply retain the contents of the third message in my buffer. I thought compact() would do the trick:
>>> readbuffer.compact()
And then simply pass this same buffer back into socket.read()
>>> socket.read(readBuffer)
And ideally, this would fill my buffer as so
33333333|44444444|55555...
However, I don't think that compacting and simply passing the readbuffer back into sock.read() is the correct approach.
Is there a well-known solution for handling partial messages this way? I can think of a lot of things to try, but this has to be a common problem. I'd like to avoid the intermediary creation of buffers as much as possible, but can't think of a solution that doesn't invoke some sort of a residual buffer.
Thanks
You are mistaken. Using compact() and then reusing the buffer for the next read() is exactly the correct approach.
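For illustration, here is a minimal sketch of that compact()/read() cycle, using a Pipe in place of the socket and a hypothetical fixed 8-byte message size; class and method names are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class CompactDemo {
    static final int MSG_SIZE = 8; // hypothetical fixed message size

    // drain every complete message currently in the buffer
    static void drain(ByteBuffer buf) {
        buf.flip();
        while (buf.remaining() >= MSG_SIZE) {
            byte[] msg = new byte[MSG_SIZE];
            buf.get(msg);
            System.out.println("got message " + msg[0]);
        }
        buf.compact(); // keep any partial message for the next read
    }

    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open(); // stands in for the socket
        ByteBuffer buf = ByteBuffer.allocate(20);

        // first "network read" delivers 1.5 messages
        pipe.sink().write(ByteBuffer.wrap(new byte[] {1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2}));
        pipe.source().read(buf);
        drain(buf); // message 1 complete, message 2 partial

        // second read delivers the rest of message 2
        pipe.sink().write(ByteBuffer.wrap(new byte[] {2, 2, 2, 2}));
        pipe.source().read(buf);
        drain(buf); // message 2 now complete
    }
}
```

The compact() call moves the half-read message 3 (here, message 2) to the front of the buffer and positions the buffer so the next read() appends right after it, which is exactly the behaviour the question hoped for.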

How to change internal buffer size of DataInputStream

I'm using this kind of code for my TCP/IP connection:
sock = new Socket(host, port);
sock.setKeepAlive(true);
din = new DataInputStream(sock.getInputStream());
dout = new DataOutputStream(sock.getOutputStream());
Then, in separate thread I'm checking din.available() bytes to see if there are some incoming packets to read.
The problem is, that if a packet bigger than 2048 bytes arrives, the din.available() returns 2048 anyway. Just like there was a 2048 internal buffer. I can't read those 2048 bytes when I know it's not the full packet my application is waiting for. If I don't read it however - it'll all stuck at 2048 bytes and never receive more.
Can I enlarge the buffer size of DataInputStream somehow? Socket receive buffer is 16384 as returned by sock.getReceiveBufferSize() so it's not the socket limiting me to 2048 bytes.
If there is no way to increase the DataInputStream buffer size - I guess the only way is to declare my own buffer and read everything from DataInputStream to that buffer?
Regards
I'm going to make an assumption about what you're calling a "packet". I am going to assume that your "packet" is some unit of work being passed to your server. TCP packets on standard Ethernet are limited to roughly 1500 bytes (the MTU), no matter what size writes the peer is performing.
So, you cannot expect to atomically read a full unit of work every time. It just won't happen. What you are writing will need to identify how large it is. This can be done by passing a value up front that tells the server how much data it should expect.
Given that, an approach would be to have a thread do a blocking read on din. Just process data as it becomes available until you have a complete packet. Then pass the packet to another thread to process the packet itself. (See ArrayBlockingQueue.)
Your socket reader thread will process data at whatever rate and granularity it arrives. The packet processor thread always works in terms of complete packets.
Wrap your data input stream around a larger buffered input stream:
DataInputStream din = new DataInputStream(
        new BufferedInputStream(sock.getInputStream(), 4096));
But I don't think it's going to help you. You have to consume the input from the socket otherwise the sender will get stuck.
You should probably invest more time in working out a better communication protocol.
If you dig into the source of DataInputStream, available() is actually delegated to the stream it's reading from, so the 2048 is coming from the socket input stream (which is implemented by default in native code), not the DataInputStream.
Also be aware that the API states for an InputStream
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
So just because you are receiving a value of 2048 does not mean there is not more data available, it is simply the amount that is guaranteed to be read without blocking. Also note that while BufferedInputStream as suggested by Alexander is an option, it makes no guarantee that the buffer will always be filled (in fact if you look at the source it only attempts to fill the buffer when one of the read calls is made).
So if you want to make sure you are always receiving "full packets" you are likely better off creating your own input stream wrapper where you can add a specialty method "byte[] readPacket()" that will block until it can fill its own buffer as data is read from the underlying socket stream.
This is not how you use InputStreams. You never want to use the available() method; it's pretty much useless. If you need to read "packets" of data, then you need to design that into your protocol. One easy way to do that is to send the length of the packet first, then send the packet. The receiver reads the length of the packet, then reads exactly that many bytes from the stream.
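A sketch of that length-prefix protocol, using an in-memory stream in place of the socket (the writePacket/readPacket names and the class are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class LengthPrefixDemo {
    // sender side: length first, then the payload
    static void writePacket(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // receiver side: read the length, then exactly that many bytes
    static byte[] readPacket(DataInputStream in) throws IOException {
        int length = in.readInt();   // blocks until all 4 length bytes arrive
        byte[] payload = new byte[length];
        in.readFully(payload);       // blocks until the whole packet arrives
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // an in-memory stream stands in for the socket here
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writePacket(new DataOutputStream(wire), "hello".getBytes("UTF-8"));
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(new String(readPacket(in), "UTF-8")); // prints hello
    }
}
```

readFully() is the key call: unlike a bare read(), it loops internally until the requested number of bytes has arrived, so the receiver never sees a partial packet.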
You can't. DataInputStream doesn't have an internal buffer. You just need to block in a read loop.

How to limit the maximum size read via ObjectInputStream from a Socket?

Is there a way to limit the maximum buffer size to be read from an ObjectInputStream in java?
I want to stop the deserialization if it becomes clear that the Object in question is crafted maliciously huge.
Of course, there is ObjectInputStream.read(byte[] buf, int off, int len), but I do not want to suffer the performance penalty of allocating, say byte[1000000].
Am I missing something here?
You write a FilterInputStream which will throw an exception if it discovers it has read more than a certain amount of data from its underlying stream.
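A minimal sketch of such a FilterInputStream (the class name and limit are illustrative, and a production version would also override skip()):

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// throws once more than `limit` bytes have been read from the underlying stream
class LimitedInputStream extends FilterInputStream {
    private final long limit;
    private long count;

    LimitedInputStream(InputStream in, long limit) {
        super(in);
        this.limit = limit;
    }

    @Override public int read() throws IOException {
        int b = super.read();
        if (b != -1 && ++count > limit) throw new IOException("limit exceeded");
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0 && (count += n) > limit) throw new IOException("limit exceeded");
        return n;
    }
}

public class LimitedInputStreamDemo {
    public static void main(String[] args) throws IOException {
        InputStream in = new LimitedInputStream(new ByteArrayInputStream(new byte[10]), 5);
        byte[] buf = new byte[4];
        in.read(buf); // 4 bytes so far: under the limit
        try {
            in.read(buf); // pushes the count to 8, past the limit of 5
            System.out.println("no exception");
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints limit exceeded
        }
    }
}
```

Wrapping the socket stream this way, e.g. new ObjectInputStream(new LimitedInputStream(socket.getInputStream(), 1 << 20)), caps the total bytes deserialization can consume without any large up-front allocation.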
I can see two ways:
1) do your reads in a loop, grabbing chunks whose allocation size you're comfortable with, and exit and stop when you hit your limit; or
2) Allocate your max-size buffer once and re-use it for subsequent reads.
Actually, there's a really easy way.
You can use NIO's ByteBuffer with the allocateDirect method. This allocates the buffer in native memory outside the Java heap, so it doesn't have a huge overhead, and you can limit its size.
Then, instead of getting the stream from the socket, get the Channel. (Note that Socket.getChannel() returns a channel only if the socket was originally created via SocketChannel.open().)
Code:
Socket s;
ByteBuffer buffer = ByteBuffer.allocateDirect(10 * 1024 * 1024);
s.getChannel().read(buffer);
Now, don't try to call the "array()" method on the byte buffer; it doesn't work on a directly-allocated buffer. However, you can wrap the buffer as an input stream and send it to the ObjectInputStream for further processing.
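A minimal sketch of such a wrapper; the adapter class is hypothetical (the JDK has no built-in ByteBuffer-backed InputStream), and the demo serializes into a heap buffer purely to keep the example self-contained — a direct buffer filled by channel.read() and then flipped works the same way.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.ByteBuffer;

// minimal adapter: exposes whatever is between position and limit as a stream
class ByteBufferInputStream extends InputStream {
    private final ByteBuffer buf;

    ByteBufferInputStream(ByteBuffer buf) {
        this.buf = buf;
    }

    @Override public int read() {
        return buf.hasRemaining() ? (buf.get() & 0xFF) : -1;
    }

    @Override public int read(byte[] dst, int off, int len) {
        if (!buf.hasRemaining()) return -1;
        int n = Math.min(len, buf.remaining());
        buf.get(dst, off, n);
        return n;
    }
}

public class BufferStreamDemo {
    public static void main(String[] args) throws Exception {
        // serialize a small object to simulate bytes read off the channel
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject("hello");
        oos.close();

        ByteBuffer buf = ByteBuffer.wrap(bos.toByteArray());
        Object o = new ObjectInputStream(new ByteBufferInputStream(buf)).readObject();
        System.out.println(o); // prints hello
    }
}
```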

read and write method for large data in socket communication does not work reliably

I have created a socket programming for server client communication.
I am reading data using read(byte[]) of DataInputStream, also writing data using write(byte[]) of DataOutputStream.
Whenever I am sending a small amount of data my program works fine.
But if I send 20000 characters of data and send it 10 times, I am able to receive the data perfectly 8 times but not the other 2 times.
So can I reliably send and receive data using read and write in socket programming?
My guess is that you're issuing a single call to read() and assuming it will return all the data you asked for. Streams don't generally work that way. It will block until some data is available, but it won't wait until it's got enough data to fill the array.
Generally this means looping round. For instance:
byte[] data = new byte[expectedSize];
int totalRead = 0;
while (totalRead < expectedSize) {
    int read = stream.read(data, totalRead, expectedSize - totalRead);
    if (read == -1) {
        throw new IOException("Not enough data in stream");
    }
    totalRead += read;
}
If you don't know how many bytes you're expecting in the first place, you may well want to still loop round, but this time until read() returns -1. Use a buffer (e.g. 8K) to read into, and write into a ByteArrayOutputStream. When you've finished reading, you can then get the data out of the ByteArrayOutputStream as a byte array.
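That unknown-length variant might look like this; the readAll name and the 8K chunk size are illustrative, and an in-memory stream stands in for the socket so the sketch is runnable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadAllDemo {
    // reads the stream to EOF in 8K chunks and returns everything as one array
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = in.read(chunk)) != -1) {
            out.write(chunk, 0, read);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000]; // e.g. one of the 20000-character payloads
        System.out.println(readAll(new ByteArrayInputStream(data)).length); // prints 20000
    }
}
```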
Absolutely -- TCP sockets are a reliable network protocol, provided the API is used properly.
You really need to check the number of bytes you receive on each read() call.
Sockets will arbitrarily decide you have enough data and pass it back on the read call -- the amount can depend on many factors (buffer size, memory availability, network response time, etc.), most of which are unpredictable. For smaller buffers you normally get as many bytes as you asked for, but for larger buffer sizes read() will often return less data than you asked for -- you need to check the number of bytes read and repeat the read call for the remaining bytes.
It is also possible that something in your network infrastructure (router, firewall, etc.) is misconfigured and truncating large packets.
Your problem is that in the server thread you must call outputstream.flush() to specify that the buffered data should be sent to the other end of the connection.

How to read huge data in socket and also write into socketchannel

How can I read very big data using the DataInputStream of a socket, if the data is in String format and has a length of more than 100,000 characters?
Also, how can I write that big data using a SocketChannel in Java?
The problem is that your data is arriving in chunks. Either the packet size is limiting that, or maybe DataInputStream is handing you only what has arrived so far. I don't know, but it doesn't matter. Either way, all 1,000,000 bytes will not arrive at once, so you have to rewrite your program to expect that. You need to read the smaller chunks as you receive them and store them in another byte[1000000] variable (keeping track of your last byte index). Keep looping until you are done reading the socket. Then you can work with your internal variable.
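For the SocketChannel writing half of the question: a single write() call is not guaranteed to drain the buffer (particularly on a non-blocking channel), so the usual pattern is a write loop. A minimal sketch, with a Pipe standing in for the socket channel and the writeFully name invented for the example:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.WritableByteChannel;

public class WriteFullyDemo {
    // a single write() may not drain the buffer, so loop until it is empty
    static void writeFully(WritableByteChannel channel, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open(); // stands in for the SocketChannel
        writeFully(pipe.sink(), ByteBuffer.wrap(new byte[1000]));
        ByteBuffer dst = ByteBuffer.allocate(1000);
        while (dst.hasRemaining()) {
            pipe.source().read(dst);
        }
        System.out.println(dst.position()); // prints 1000
    }
}
```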
