For me, I/O streams are the most misunderstood concept in Java programming.
Suppose we get an input stream from a socket connection:
DataInputStream in = new DataInputStream(clientSocket.getInputStream());
When I get data from a remote server, which of these describes things correctly?
1. The data is stored in the in variable. When extra data arrives from the server, it is appended to in, increasing its size. We can then read data from in like this:
byte[] messageByte = new byte[1000];
boolean end = false;
String messageString = "";
while (!end)
{
    int bytesRead = in.read(messageByte);
    messageString += new String(messageByte, 0, bytesRead);
    if (messageString.length() == 100)
    {
        end = true;
    }
}
2. in is only a link to the source of data and doesn't contain data itself. When we call in.read(messageByte), are 1000 bytes copied from the socket into messageByte?
Alternatively, instead of a socket, let's say we have a stream connected to a file on the HDD. When we call in.read(messageByte), do we read 1000 bytes from the HDD?
Which description is right? I tend to think it's #2, but if so, where is the data stored in the socket case? Does the remote server wait until we read 1000 bytes, and then send more? Or is the data from the server stored in some buffer in the operating system?
The data is stored in the in variable.
No.
When extra data arrives from the server, it is appended to in, increasing its size. We can then read data from in like this:
byte[] messageByte = new byte[1000];
boolean end = false;
String messageString = "";
while (!end)
{
    int bytesRead = in.read(messageByte);
    messageString += new String(messageByte, 0, bytesRead);
    if (messageString.length() == 100)
    {
        end = true;
    }
}
No. See below.
in is only a link to the source of data and doesn't contain data itself.
Correct.
And when we call in.read(messageByte), are 1000 bytes copied from the socket into messageByte?
No. It blocks until:
at least one byte has been transferred, or
end of stream has occurred, or
an exception has been thrown,
whichever occurs first. See the Javadoc.
(Alternatively, instead of a socket, let's say we have a stream connected to a file on the HDD. When we call in.read(messageByte), do we read 1000 bytes from the HDD?)
No. Same as above.
Which description is right?
Neither of them. The correct way to read from an input stream is to loop until you have all the data you're expecting, or EOS or an exception occurs. You can't rely on read() filling the buffer. If you need that, use DataInputStream.readFully().
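For example, a minimal sketch using readFully(), assuming (as the question's loop implies) that exactly 100 bytes are expected:
// A sketch, not the asker's code: read exactly 100 bytes or fail.
// readFully() blocks until the whole array is filled, and throws
// EOFException if the stream ends first.
DataInputStream in = new DataInputStream(clientSocket.getInputStream());
byte[] message = new byte[100];
in.readFully(message);
String messageString = new String(message);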
I tend to think it's #2.
That doesn't make sense. You don't get a choice: (1) and (2) aren't programming paradigms, they are questions about how the stream actually works. How to write the code is a separate question.
Where is the data stored in the socket case?
Some of it is in the socket receive buffer in the kernel. Most of it hasn't arrived yet. None of it is 'in the socket'.
Does the remote server wait until we read 1000 bytes, and then send more data?
No. The server sends through its socket send buffer into your socket receive buffer. Your reads and the server's writes are very decoupled from each other.
Or is the data from the server stored in some buffer in the operating system?
Yes, the socket receive buffer.
It depends on the type of stream. Where the data is stored varies from stream to stream. Some have internal storage, some read from other sources, and some do both.
A FileInputStream reads from the file on disk when you request it to. The data is on disk, it's not in the stream.
A socket's InputStream reads from the operating system's buffers. When packets arrive, the operating system automatically reads them and buffers up a small amount of data (say, 64KB). Reading from the stream drains that OS buffer. If the buffer is empty because no packets have arrived, your read call blocks. If you don't drain the buffer fast enough and it fills up, TCP flow control makes the sender stop transmitting until you free up some space.
A ByteArrayOutputStream has an internal byte[] array. When you write to the stream it stores your writes in that array. In this case the stream does have internal storage.
A BufferedInputStream is tied to another input stream. When you read from a BufferedInputStream it will typically request a big chunk of data from the underlying stream and store it in a buffer. Subsequent read requests are then satisfied from the buffer rather than by performing additional I/O on the underlying stream. The goal is to minimize the number of individual read requests the underlying stream receives by issuing a smaller number of bulk reads. In this case the stream has a mixed strategy: some internal storage, some external reads.
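As an illustration (the file name and buffer size are made up for the example), a sketch of the difference:
// Unbuffered: every read() may trigger a separate disk read.
InputStream raw = new FileInputStream("data.bin");
// Buffered: the wrapper fetches up to 64KB at a time from the file,
// then serves small reads from its internal array.
InputStream buffered = new BufferedInputStream(raw, 64 * 1024);
int first = buffered.read();  // fills the buffer with one bulk read
int second = buffered.read(); // served from the buffer, no extra I/O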
Related
I have a file of bytes mybytes.dat, and I'm writing a TCP Server/Client, where the server sends the contents of mybytes.dat to the client over a socket.
If mybytes.dat is read completely into memory, then the application can send data at about 160MB/s on my local network stack. However, now I'm trying to stream datafiles that are > 1GB, and shouldn't all be read into memory.
This related solution for sending a file in chunks seems appropriate; however, would it be more efficient to read large chunks of the file into memory (say, 1MB at a time in a buffer) and then write these as smaller chunks (32KB) to the socket? If this is reasonable, how can one use a BufferedInputStream to read large chunks and then write smaller chunks to an OutputStream? To get started, let me declare at least some variables:
BufferedInputStream blobReader = new BufferedInputStream(new FileInputStream("mybytes.dat"), 1024*1024);
OutputStream socketWriter = socket.getOutputStream();
What is the correct way to connect my blobReader to the socketWriter, such that I always maintain enough bytes in memory to ensure the application is not limited by disk reads? Or am I completely offbase with this line of reasoning?
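A minimal sketch of the chunked copy the question describes (chunk sizes are taken from the question; whether the smaller write size actually helps throughput is an assumption to be measured, since the OS does its own buffering):
BufferedInputStream blobReader =
        new BufferedInputStream(new FileInputStream("mybytes.dat"), 1024 * 1024);
OutputStream socketWriter = socket.getOutputStream();

// Copy in 32KB slices; the 1MB buffer above keeps disk reads coarse.
byte[] chunk = new byte[32 * 1024];
int n;
while ((n = blobReader.read(chunk)) != -1) {
    socketWriter.write(chunk, 0, n); // write only the bytes actually read
}
socketWriter.flush();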
I am writing about 10000 bytes to an SSL socket in one shot, using the OutputStream obtained from it:
OutputStream os = ssl_Socket.getOutputStream();
os is an OutputStream here. It writes the data to the server successfully, but the data received at the server end is somehow corrupted.
But if I use a BufferedOutputStream, everything works fine:
os = new BufferedOutputStream(c._s.getOutputStream(), 8196);
My questions:
Is there any limit on the data that can be written to an SSL socket in one shot?
Is there any default buffer size?
Why did it work with BufferedOutputStream? Since I have to write large chunks of data, I'd rather not use BufferedOutputStream.
Is there any limit on the data that can be written to an SSL socket in one shot?
There is no limit other than Integer.MAX_VALUE. The SSLSocket's output stream will block until all the data has been sent, including encryption and packaging into the requisite number of underlying SSL records.
Is there any default buffer size?
BufferedOutputStream has a default buffer size of 8192. 8196 is a curious number to use for a buffer size, but you should certainly always use a buffered stream or writer over an SSLSocket's output stream. Otherwise you can get a data explosion of up to 42 times if you write one byte at a time.
Why did it work with BufferedOutputStream? Since I have to write large chunks of data, I'd rather not use BufferedOutputStream.
You don't have to use a BufferedOutputStream, but it doesn't hurt, even if you're writing large chunks of data. The buffer is bypassed when possible.
Your problems are almost certainly at the receiving end.
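For illustration, a minimal sketch of the buffered approach (sslSocket and data are placeholders):
// Coalesce small writes so each SSL record carries a full buffer of
// plaintext instead of a few bytes plus ~40 bytes of record overhead.
OutputStream os = new BufferedOutputStream(sslSocket.getOutputStream(), 8192);
os.write(data);
os.flush(); // force the buffered bytes out as SSL records now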
"[TLS] specifies a fixed maximum plaintext fragment length of 2^14 bytes." - which is 16K.
Read about the "max_fragment_length" TLS extension, which can limit the size of a block.
PS: I'm not familiar with the Java SSL library; maybe there is something specific to it.
TCP network messages can be fragmented. But fragmented messages are difficult to parse, especially when data types longer than a byte are transferred. For example, buffer.getLong() may fail if some bytes of the long I expect end up in a second buffer.
Parsing would be much easier if multiple Channels could be recombined on the fly. So I thought of sending all Data through a java.nio.channels.Pipe.
// count the total length
int length = 0;
for (ByteBuffer buffer : buffers) {
    length += buffer.remaining();
}
// write to the pipe
Pipe pipe = Pipe.open();
pipe.sink().write(buffers);
// read back from the pipe
ByteBuffer complete = ByteBuffer.allocateDirect(length);
if (pipe.source().read(complete) != length) {
    System.out.println("Fragmented!");
}
But will this be guaranteed to fill up the buffer completely? Or could the Pipe introduce fragmentation again? In other words, will the body of the condition ever be reached?
TCP fragmentation has little to do with the problem you are experiencing. The TCP stack at the source divides messages that are too large for a single packet into multiple packets; they arrive and are reassembled, possibly out of alignment with the longs you are expecting.
Regardless, you are treating what amounts to a byte array (a ByteBuffer) as an input stream. You are telling the JVM to read 'the rest of what is in the buffer' into a ByteBuffer. Meanwhile, the second half of your long is still in the network buffer. The ByteBuffer you are reading through will never contain the rest of that long.
Consider using a Scanner to read longs; it will block until a long can be read:
Scanner scanner = new Scanner(socket.getChannel());
scanner.nextLong();
Also consider using a DataInputStream to read longs; per the DataInput contract, readLong() blocks until all eight bytes have been read, or throws EOFException if the stream ends first:
DataInputStream dis = new DataInputStream(socket.getInputStream());
dis.readLong();
If you have control over the server, consider using flush() to prevent your packets from being buffered and sent 'fragmented', or an ObjectOutputStream/ObjectInputStream as a more convenient way to do I/O.
No. A Pipe is intended to be written by one thread and read by another. There is an internal buffer of only 4k. If you write more than that to it you will stall.
They really aren't much use at all actually, except as a demonstration.
I don't understand this:
For example buffer.getLong() may fail if some bytes of the long I expect end up in a second buffer.
What second buffer? You should be using the same receive buffer for the life of the channel. Make it an attachment to the SelectionKey so you can find it when you need it.
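A sketch of that suggestion (the selector setup and serverChannel are assumed; the 8KB size is illustrative):
// Register the channel with its one receive buffer attached for life.
SocketChannel channel = serverChannel.accept();
channel.configureBlocking(false);
channel.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(8192));

// Later, in the select loop, recover the same buffer from the key.
ByteBuffer buf = (ByteBuffer) key.attachment();
((SocketChannel) key.channel()).read(buf);
buf.flip();
while (buf.remaining() >= Long.BYTES) {
    long value = buf.getLong(); // only parse once all 8 bytes are present
    // ... handle value ...
}
buf.compact(); // keep any partial long for the next read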
I also don't understand this:
Parsing would be much easier if multiple Channels could be recombined on the fly
Surely you mean multiple buffers, but the basic idea is to only have one buffer in the first place.
I'm using this kind of code for my TCP/IP connection:
sock = new Socket(host, port);
sock.setKeepAlive(true);
din = new DataInputStream(sock.getInputStream());
dout = new DataOutputStream(sock.getOutputStream());
Then, in a separate thread, I check din.available() to see if there are incoming packets to read.
The problem is that if a packet bigger than 2048 bytes arrives, din.available() returns 2048 anyway, as if there were a 2048-byte internal buffer. I can't read those 2048 bytes when I know it's not the full packet my application is waiting for. But if I don't read them, everything gets stuck at 2048 bytes and I never receive more.
Can I enlarge the buffer size of DataInputStream somehow? The socket receive buffer is 16384, as returned by sock.getReceiveBufferSize(), so it's not the socket limiting me to 2048 bytes.
If there is no way to increase the DataInputStream buffer size, I guess the only option is to declare my own buffer and read everything from the DataInputStream into it?
I'm going to make an assumption about what you're calling a "packet": that it is some unit of work being passed to your server. Ethernet limits a TCP segment to roughly 1500 bytes of payload (the MTU), no matter what size writes the peer is performing.
So, you cannot expect to atomically read a full unit of work every time. It just won't happen. What you are writing will need to identify how large it is. This can be done by passing a value up front that tells the server how much data it should expect.
Given that, an approach would be to have a thread do a blocking read on din. Just process data as it becomes available until you have a complete packet. Then pass the packet to another thread to process the packet itself. (See ArrayBlockingQueue.)
Your socket reader thread will process data at whatever rate and granularity it arrives. The packet processor thread always works in terms of complete packets.
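A rough sketch of that structure (the length-prefix framing and queue capacity are assumptions for illustration):
// Reader thread: assemble complete packets, hand them to a processor.
BlockingQueue<byte[]> packets = new ArrayBlockingQueue<>(64);
Thread reader = new Thread(() -> {
    try {
        while (true) {
            int length = din.readInt();  // assumed length prefix
            byte[] packet = new byte[length];
            din.readFully(packet);       // block until the packet is complete
            packets.put(packet);         // hand off; the processor calls take()
        }
    } catch (IOException | InterruptedException e) {
        // connection closed or shutdown requested
    }
});
reader.start();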
Wrap your data input stream around a larger buffered input stream:
DataInputStream din = new DataInputStream(
        new BufferedInputStream(sock.getInputStream(), 4096));
But I don't think it's going to help you. You have to consume the input from the socket otherwise the sender will get stuck.
You should probably invest more time in working out a better communication protocol.
If you dig into the source of DataInputStream, available() is actually delegated to the stream it's reading from, so the 2048 is coming from the socket input stream, not the DataInputStream (the socket stream's available() is implemented in native code).
Also be aware that the API states, for InputStream.available():
"Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream."
So just because you are receiving a value of 2048 does not mean there is no more data available; it is simply the amount that is guaranteed to be readable without blocking. Also note that while BufferedInputStream, as suggested by Alexander, is an option, it makes no guarantee that the buffer will always be filled (in fact, if you look at the source, it only attempts to fill the buffer when one of the read calls is made).
So if you want to make sure you always receive "full packets", you are likely better off creating your own input stream wrapper with a specialty method, byte[] readPacket(), that blocks until it can fill its own buffer with data read from the underlying socket stream.
This is not how you use InputStreams. You never want to use the available() method; it's pretty much useless. If you need to read "packets" of data, then you need to design that into your protocol. One easy way is to send the length of the packet first, then the packet itself. The receiver reads the length, then reads exactly that many bytes from the stream.
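A minimal sketch of that length-prefix scheme (stream setup as in typical socket code; the names are illustrative):
// Sender: length first, then the payload.
DataOutputStream out = new DataOutputStream(sock.getOutputStream());
out.writeInt(payload.length);
out.write(payload);
out.flush();

// Receiver: read the length, then exactly that many bytes.
DataInputStream in = new DataInputStream(sock.getInputStream());
int length = in.readInt();
byte[] packet = new byte[length];
in.readFully(packet); // blocks until all 'length' bytes have arrived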
You can't. DataInputStream doesn't have an internal buffer. You just need to block in a read loop.
I have written a socket program for server-client communication.
I am reading data using read(byte[]) of DataInputStream, and writing data using write(byte[]) of DataOutputStream.
Whenever I send a small amount of data, my program works fine.
But if I send 20000 characters and send it 10 times, I receive the data perfectly 8 times out of 10.
So can I reliably send and receive data using read and write in socket programming?
My guess is that you're issuing a single call to read() and assuming it will return all the data you asked for. Streams don't generally work that way. It will block until some data is available, but it won't wait until it's got enough data to fill the array.
Generally this means looping round. For instance:
byte[] data = new byte[expectedSize];
int totalRead = 0;
while (totalRead < expectedSize)
{
    int read = stream.read(data, totalRead, expectedSize - totalRead);
    if (read == -1)
    {
        throw new IOException("Not enough data in stream");
    }
    totalRead += read;
}
If you don't know how many bytes you're expecting in the first place, you may well want to still loop round, but this time until read() returns -1. Use a buffer (e.g. 8K) to read into, and write into a ByteArrayOutputStream. When you've finished reading, you can then get the data out of the ByteArrayOutputStream as a byte array.
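A sketch of that unknown-length case (8KB buffer as suggested; stream is the socket's input stream):
// Read until end of stream, accumulating everything in memory.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buffer = new byte[8 * 1024];
int read;
while ((read = stream.read(buffer)) != -1) {
    baos.write(buffer, 0, read); // copy only the bytes actually read
}
byte[] data = baos.toByteArray();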
Absolutely: TCP sockets are a reliable network protocol, provided the API is used properly.
You really need to check the number of bytes you receive on each read() call.
Sockets will arbitrarily decide you have enough data and return it from the read() call; the amount can depend on many factors (buffer size, memory availability, network response time, etc.), most of which are unpredictable. For smaller buffers you normally get as many bytes as you asked for, but for larger buffer sizes read() will often return less data than you asked for; you need to check the number of bytes read and repeat the read call for the remaining bytes.
It is also possible that something in your network infrastructure (router, firewall, etc.) is misconfigured and truncating large packets.
Your problem is that in the server thread you must call outputStream.flush() to specify that the buffered data should be sent to the other end of the connection.
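For instance, a sketch (dout being the server-side DataOutputStream; data is a placeholder):
dout.write(data);
dout.flush(); // push any buffered bytes to the client immediately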