Side effects of flush() OutputStreamWriter to Network - java

What are the side effects of flushing a OutputStreamWriter which is going to a network socket.
I have a program which calls out.flush() after every few bytes. Is there any reason why I should wait until all bytes I need are in the buffer?
Will I get lower transfer rate if I flush too much (more overhead)?
Will this slow down execution of my program (blocking)?

Each time you write to a socket, you add between 5 and 15 micro-seconds. For buffered output, this occurs when you flush() the data. Note: if you don't have a buffered output, it will be performed on every write() and the flush() won't do anything.
Fortunately the OS expects applications to make more calls than is optimal so the it uses Nagle's algorithm by default to groups portions of the data writing into a larger packets. Note: not only does the OS do this but some network adapter do this by default too.
In short, don't flush() too often but unless tens of micros-seconds add up to something which matters to you, you might not notice the difference. e.g. if you do 100 flushes you might add a milli-second.

There is no reason to flush unless:
you want the peer to receive the data as soon as possible
you've sent a buffered request and you're now going to read the response.
In other cases it is better to allow the buffer, the Nagle algorithm, and the TCP receive window work their magic.

In case of data transfer on network, batching of outputStreamWriter's flush does improve performance.
I observed that single flush of data packet of about 520 bytes was taking around 3 milliseconds.

Related

How is InputStream managed in memory?

I am familiar with the concept of InputStream,buffers and why they are useful (when you need to work with data that might be larger then the machines RAM for example).
I was wondering though, how does the InputStream actually carry all that data?. Could a OutOfMemoryError be caused if there is TOO much data being transfered?
Case-scenario
If I connect from a client to a server,requesting a 100GB file, the server starts iterating through the bytes of the file with a buffer, and writing the bytes back to the client with outputStream.write(byte[]). The client is not ready to read the InputStream right now,for whatever reason. Will the server continue sending the bytes of the file indefinitely? and if so, won't the outputstream/inputstream be larger than the RAM of one of these machines?
InputStream and OutputStream implementations do not generally use a lot of memory. In fact, the word "Stream" in these types means that it does not need to hold the data, because it is accessed in a sequential manner -- in the same way that a stream can transfer water between a lake and the ocean without holding a lot of water itself.
But "stream" is not the best word to describe this. It's more like a pipe, because when you transfer data from a server to a client, every stage transfers back-pressure from the client that controls the rate at which data gets sent. This is similar to how your faucet controls the rate of flow through your pipes all the way to the city reservoir:
As the client reads data, it's InputStream only requests more data from the OS when its internal (small) buffers are empty. Each request allows only a limited amount of data to be transferred;
As data is requested from the OS, its own internal buffer empties, and it notifies the server about how much space there is for new data. The server can send only this much (that's called 'flow control' in TCP: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Resource_usage)
On the server side, the server-side OS sends out data from its own internal buffer when the client has space to receive it. As its own internal buffer empties, it allows the writing process to re-fill it with more data.
As the server-side process write()s to its OutputStream, the OutputStream will try to write data to the OS. When the OS buffer is full, it will make the server process wait until the server-side buffer has space to accept new data.
Notice that a slow client can make the server process take a very long time. If you're writing a server, and you don't control the clients, then it's very important to consider this and to ensure that there are not a lot of server-side resources tied up while a long data transfer takes place.
Your question is as interesting as difficult to answer properly.
First: InputStream and OutputStream are not a storage means, but an access means: They describe that the data shall be accessed in sequential, unidirectional order, but not how it shall be stored. The actual way of storing the data is implementation-dependent.
So, would there be an InputStream that stores the whole amount of data simultaneally in memory? Yes, could be, though it would be an appalling implementation. The most common and sensitive implementation of InputStreams / OutputStreams is by storing just a fixed and short amount of data into a temporary buffer of 4K-8K, for example.
(So far, I supposed you already knew that, but it was necessary to tell.)
Second: What about connected writting / reading streams between a server and a client? In a common scenario of buffered writting, the server will not write more data than the buffer allows. So, if the server starts writing, and the client then goes down (for whatever reason), the server will just keep writing until the buffer is full, and then set it as ready for reading, and until the read is not completed (by the client peer), the server won't fill the buffer again. Remember: This kind of read/write is blocking: The client blocks until there is a buffer ready to be read, and the server blocks (or, at least, the server thread bound to this connection, it's understood) until the last read is completed.
How many time will the server block? Typically, a server should have a security timeout to ensure that long blocks will break the connection, thus releasing the blocked thread. The same should have the client.
The timeouts set for the connection depend on the implementation, and the protocol.
No, it does not need to hold all data. I just advances forward in the file (usually using buffered data). The stream can discard old buffers as it pleases.
Note that there are a a lot of very different implementations of inputstreams, so the exact behaviour varies a lot.

Speed difference in write to socket on slow and fast connections?

Does a write to a plain Outputstream backed by a java socket (on serverside) take the same time to clients with a fast and slow connection?
I would suspect no, but i can also suspect that there is some kind of buffer in front of the socket internally.
There's two socket buffers. One for output, one for input. If you're writing data that fits in the buffers, the speed of the connection doesn't matter for the first write.
After that there will be a noticeable difference, and the slow connection will block a lot more, waiting for space in the output buffer. Of course that speed difference won't affect the code, you'll just be writing data that eventually will be transferred, whether it happens fast or slow. It's only the end user that notices the slowness.

Using buffered streams for sending objects?

I'm currently using Java sockets in a client-server application with OutputStream and not BufferedOutputStream (and the same for input streams).
The client and server exchanges serialized objects (writeObject() method).
Does it make sense (more speed) to use BufferedOutputStream and BufferedInputStream in this case?
And when I have to flush or should I not write a flush() statement?
Does it make sense (more speed) to use BufferedOutputStream and BufferedInputStream in this case?
Actually, it probably doesn't make sense1.
The object stream implementation internally wraps the stream it has been given with a private class called BlockDataOutputStream that does buffering. If you wrap the stream yourself, you will have two levels of buffering ... which is likely to make performance worse2.
And when I have to flush or should I not write a flush() statement?
Yes, flushing is probably necessary. But there is no universal answer as to when to do it.
On the one hand, if you flush too often, you generate extra network traffic.
On the other hand, if you don't flush when it is needed, the server can stall waiting for an object that the client has written but not flushed.
You need to find the compromise between these two syndromes ... and that depends on your application's client/server interaction patterns; e.g. whether the message patterns are synchronous (e.g. message/response) or asynchronous (e.g. message streaming).
1 - To be certain on this, you would need to do some forensic testing to 1) measure the system performance, and 2) determine what syscalls are made and when network packets are sent. For a general answer, you would need to repeat this for a number of use-cases. I'd also recommend looking at the Java library code yourself to confirm my (brief) reading.
2 - Probably only a little bit worse, but a well designed benchmark would pick up a small performance difference.
UPDATE
After writing the above, I found this Q&A - Performance issue using Javas Object streams with Sockets - which seems to suggest that using BufferedInputStream / BufferedOutputStream helps. However, I'm not certain whether the performance improvement that was reported is 1) real (i.e. not a warmup artefact) and 2) due to the buffering. It could be just due to adding the flush() call. (Why: because the flush could cause the network stack to push the data sooner.)
I think these links might help you:
What is the purpose of flush() in Java streams?
The flush method flushes the output stream and forces any buffered output bytes to be written out. The general contract of flush is that calling it is an indication that, if any bytes previously written have been buffered by the implementation of the output stream, such bytes should immediately be written to their intended destination.
How java.io.Buffer* stream differs from normal streams?
Internally a buffer array is used and instead of reading bytes individually from the underlying input stream enough bytes are read to fill the buffer. This generally results in faster performance as less reads are required on the underlying input stream.
http://www.oracle.com/technetwork/articles/javase/perftuning-137844.html
As a means of starting the discussion, here are some basic rules on how to speed up I/O: 1.Avoid accessing the disk. 2.Avoid accessing the underlying operating system. 3.Avoid method calls. 4.Avoid processing bytes and characters individually.
So using Buffered-Streams usually speeds speeds up the IO-processe, as less read() are done in the background.

How does TCP_NODELAY affect consecutive write() calls?

TCP_NODELAY is an option to enable quick sending of TCP packets, regardless of their size. This is very useful option when speed matters, however, I'm curious what it will do to this:
Socket socket = [some socket];
socket.setTcpNoDelay(true);
OutputStream out = socket.getOutputStream();
out.write(byteArray1);
out.write(byteArray2);
out.write(byteArray3);
out.flush();
I tried to find what flush actually does on a SocketOutputStream, but as far as I know now, it doesn't do anything. I had hoped it would tell the socket "send all your buffered data NOW", but, unfortunately, no it doesn't.
My question is: are these 3 byte arrays sent in one packet? I know you don't have much control over how TCP constructs a network packet, but is there any way to tell the socket to (at least try to) pack these byte arrays, so network overhead is avoided? Could manually packing the byte arrays and sending them in one call to write help?
My question is: are these 3 byte arrays sent in one packet?
As you have disabled the Nagle algorithm, almost certainly not, but you can't be 100% sure.
I know you don't have much control over how TCP constructs a network packet, but is there any way to tell the socket to (at least try to) pack these byte arrays
Yes. Don't disable the Nagle algorithm.
so network overhead is avoided? Could manually packing the byte arrays and sending them in one call to write help?
Yes, or simpler still just wrap the socket output stream in a BufferedOutputStream and call flush() when you want the data to be sent, as per your present code. You are correct that flush() does nothing on a socket output stream, but it flushes a BufferedOutputStream.
Could manually packing the byte arrays and sending them in one call to write help?
Yes, send them all in a single call to write. This will maximize the chances that your bytes will be sent in a single packet.
Of course, you can never know, since there are too many variables involved - different OSs and lots of different networking gear between you and your peer, but if you give the OS the ability to pack everything together, it will generally try to.
If you disable nagle and make seperate system calls (remember, the OS controls the sockets, not your appliation or java), you're asking the OS to send them individually. The OS has no idea you're about to call write again with some more data.

Does it really matter the size of the objects you send through a Socket with object input/output streams?

Is it more efficient to flush the OutputStream after each individual invocation of ObjectOutputStream#writeObject rather than flushing the stream after a sequence of object writes? (Example: write object and flush 4 times, or write 4 times and then just flush once?)
How does ObjectOutputStream work internally?
Is it somehow better sending four Object[5] (flushing each one) than a Object[20], for example?
It is not better. In fact it is probably worse, from a performance perspective. Each of those flushes will force the OS-level TCP/IP stack to send the data "right now". If you just do one flush at the end, you should save on system calls, and on network traffic.
If you haven't done this already, inserting a BufferedOutputStream between the Socket OutputStream and the ObjectOutputStream will make a much bigger difference to performance. This allows the serialized data to accumulate in memory before being written to the socket stream. This potentially save many system calls and could improve performance by orders of magnitude ... depending on the actual objects being sent.
(The representation of four Object[5] objects is larger than one Object[20] object, and that results in a performance hit in the first case. However, this is marginal at most, and tiny compared with the flushing and buffering issues.)
How does this stream work internally?
That is too general a question to answer sensibly. I suggest that you read up on serialization starting with the documents on this page.
No, it shouldn't matter, unless you have reason to believe the net link is likely to go down, and partial data is useful. Otherwise it just sounds like a way to make the code more complex for no reason.
If you look at the one and only public constructor of ObjectOutputStream, you note that it requires an underlying OutputStream for its instantiation.
When and how you flush your ObjectStream is entirely dependent on the type of stream you are using. (And in considering all this, do keep in mind that not all extension of OutputStream are guaranteed to respect your request to flush -- it is entirely implementation independent, as it is spelled out in the 'contract' of the javadocs.)
But certainly we can reason about it and even pull up the code and see what is actually done.
IFF the underlying OutputStream must utilize the OS services for devices (such as the disk or the network interface in case of Sockets) then the behavior of flush() is entirely OS dependent. For example, you may grab the output stream of a socket and then instantiate an ObjectOutputStream to write serialized objects to the net. TCP/IP implementation of the host OS is in charge.
What is more efficient?
Well, if your object stream is wrapping a ByteArrayOutputStream, you are potentially looking at a series of reallocs and System.arrayCopy() calls. I say potentially, since the implementation of byte array doubles the size on each (internal) resize() op and it is very unlikely that writing n (small) objects and flushing each time will result in n reallocs. (Where n is assumed to be a reasonably small number).
But if you are wrapping a network stream, you must keep in mind that network writes are very expensive. It makes much more sense, if your protocol allows it, to chunk your writes (to fill the send buffer) and just flush once.

Categories

Resources