Do all Java output streams have buffers? - java

I used OutputStreamWriter with a socket. When I tried to write data, I found that the data was not sent out. So I tried flush() and it succeeded.
I would like to ask some questions:
Do all output streams have a buffer? If so, is there any unique benefit to a class like BufferedOutputStream, since buffers are already there?
When I use close(), is the data in the buffer discarded or flushed out? In my test, the data in the buffer was not successfully written out, but I have seen in many places that calling close() will flush the data in the buffer.

Not all OutputStream implementations have buffers.
That's trivially proven, since you can just implement your own.
But most uses of an OutputStream will involve a buffer at some level.
BufferedOutputStream is the obvious candidate, where the buffer is directly in the class as its primary feature. This way you can add explicit buffering to any other OutputStream.
OutputStreams that communicate with the OS, such as FileOutputStream or the OutputStream returned for a network connection, don't have explicit buffers themselves, but they write to the OS, which may or may not have its own buffer (for example, most OSes won't immediately write to a file on a write system call, and only flush on close or after some time).
In either case (direct buffer in the OutputStream or an underlying system buffer) a flush call tends to initiate a write-back to the "real" target.
Note that a flush before a close should never be necessary on a conforming OutputStream.
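As a minimal sketch of the scenario in the question (the host and port here are placeholders, not from the original post): write through a socket's OutputStream, optionally add your own buffer, and flush when the data must go out now.
import java.io.BufferedOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class SocketWriteSketch {
    public static void main(String[] args) throws Exception {
        // "example.com" and 12345 are placeholder values.
        try (Socket socket = new Socket("example.com", 12345);
             OutputStream raw = socket.getOutputStream();              // no explicit buffer of its own
             Writer writer = new OutputStreamWriter(
                     new BufferedOutputStream(raw), StandardCharsets.UTF_8)) {
            writer.write("hello\n");
            writer.flush();   // pushes the buffered bytes down to the socket now
        }                     // close() also flushes on a conforming stream
    }
}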

Not all OutputStream implementations perform buffering, although buffering is common enough that the flush() method is defined as part of the contract of OutputStream (specifically, by implementing the Flushable interface). I think the best source of information about this is the Javadoc of that method:
Flushes this output stream and forces any buffered output bytes to be written out. The general contract of flush is that calling it is an indication that, if any bytes previously written have been buffered by the implementation of the output stream, such bytes should immediately be written to their intended destination.
If the intended destination of this stream is an abstraction provided by the underlying operating system, for example a file, then flushing the stream guarantees only that bytes previously written to the stream are passed to the operating system for writing; it does not guarantee that they are actually written to a physical device such as a disk drive.
The flush method of OutputStream does nothing.
The key phrase is "if any bytes previously written have been buffered by the implementation of the output stream": whether or not buffering happens depends entirely on the implementation.
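To make the "depends on the implementation" point concrete, here is a hypothetical OutputStream (not from any of the posts above) that has no buffer at all, so the inherited no-op flush() is all it needs:
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical example: an OutputStream with no buffer whatsoever.
// Every write is handled immediately, so nothing is ever held back.
class CountingOutputStream extends OutputStream {
    private long count = 0;

    @Override
    public void write(int b) throws IOException {
        count++;   // "written" right away; there is nothing to flush later
    }

    public long getCount() {
        return count;
    }
    // flush() is inherited from OutputStream and does nothing, which is fine here.
}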

Related

Why does the PrintWriter class (and other writers) require a call to flush after writing?

I have noticed that some I/O classes in Java (and a lot of others, like BufferedWriter and FileWriter) require a call to flush() after writing. (With the exception of auto-flush; I'll get to that later.)
For example, this call to println() will not produce any output. However, if I invoke writer#flush() afterwards, the line will print.
PrintWriter writer = new PrintWriter(System.out);
writer.println("test");
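writer.flush();   // as noted above, the line only appears once flush() is called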
Also, does autoflushing impact performance in any way (especially in larger/consistent writes), or is it just a convenience, and is it recommended to use it?
Why does the PrintWriter class (and other writers) require a call to flush after writing?
To the extent that flushing is wanted1, it will be needed if the "stack" of output classes beneath the print writer is doing some output buffering. If an output stream is buffered, then some event needs to trigger pushing (flushing) the buffered output to the external file, pipe, socket or whatever. The things that will trigger flushing are:
the buffer filling up
something calling close() on the stream, or
something calling flush() on the stream.
In the case of a PrintWriter, the underlying stream can also be flushed by the class's auto-flushing mechanism.
The reason for buffering output (in general) is efficiency. Performing the low-level output operation that writes data to the (external) file, pipe, whatever involves a system call. There are significant overheads in doing this, so you want to avoid doing lots of "little" writes.
Output buffering is the standard way to solve this problem. Data to be written is collected in the buffer until the buffer fills up. The net result is that lots of "little" writes are aggregated into a "big" write. The performance improvement can be significant.
Also, does autoflushing impact performance in any way (especially in larger/consistent writes), or is it just a convenience, and is it recommended to use it?
It is really a convenience to avoid having to explicitly flush. It doesn't improve performance. Indeed, if you don't need the data to be flushed1, then unnecessary auto-flushing will reduce performance.
1 - You would want the data to be flushed if someone or something wants to see the data you are writing as soon as possible.
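For illustration, a minimal sketch of the two options discussed above (explicit flush versus the auto-flush constructor):
import java.io.PrintWriter;

public class FlushSketch {
    public static void main(String[] args) {
        // Explicit flushing: nothing is guaranteed to appear until flush() or close().
        PrintWriter manual = new PrintWriter(System.out);
        manual.println("flushed explicitly");
        manual.flush();

        // Auto-flush: the boolean constructor argument makes println() flush for you,
        // which is convenient but costs a flush per line.
        PrintWriter auto = new PrintWriter(System.out, true);
        auto.println("flushed automatically");
    }
}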
They don't require flushing; you only need it if you want to guarantee that the output written so far has actually been delivered, which is exactly what flushing does. If you are fine writing to a file and just want to make sure it gets there before the program terminates, then there is no need to flush.
When data is written to an output stream, the underlying operating system does not guarantee that the data will make it to the file system immediately. In many operating systems, the data may be cached in memory, with a write occurring only after a temporary cache is filled or after some amount of time has passed.
If the data is cached in memory and the application terminates unexpectedly, the data would be lost, because it was never written to the file system. To address this, all output stream classes provide a flush() method, which requests that all accumulated data be written immediately to disk.
The flush() method helps reduce the amount of data lost if the application terminates unexpectedly. It is not without cost, though. Each time it is used, it may cause a noticeable delay in the application, especially for large files. Unless the data that you are writing is extremely critical, the flush() method should be used only intermittently. For example, it should not necessarily be called after every write.
You also do not need to call the flush() method when you have finished writing data, since the close() method will automatically do this.
Read more in the book: OCP Oracle Certified Professional Java SE 11.
Hope this is clear to you!
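A minimal sketch of that last point, using try-with-resources so that close() performs the final flush (the file name is just a placeholder):
import java.io.BufferedWriter;
import java.io.FileWriter;

public class CloseFlushSketch {
    public static void main(String[] args) throws Exception {
        // "data.txt" is a placeholder file name.
        try (BufferedWriter out = new BufferedWriter(new FileWriter("data.txt"))) {
            out.write("some data");
            // no explicit flush needed: close() at the end of this block flushes first
        }
    }
}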

Using buffered streams for sending objects?

I'm currently using Java sockets in a client-server application with OutputStream and not BufferedOutputStream (and the same for input streams).
The client and server exchange serialized objects (via the writeObject() method).
Does it make sense (more speed) to use BufferedOutputStream and BufferedInputStream in this case?
And when do I have to flush, or should I not call flush() at all?
Does it make sense (more speed) to use BufferedOutputStream and BufferedInputStream in this case?
Actually, it probably doesn't make sense1.
The object stream implementation internally wraps the stream it has been given with a private class called BlockDataOutputStream that does buffering. If you wrap the stream yourself, you will have two levels of buffering ... which is likely to make performance worse2.
And when do I have to flush, or should I not call flush() at all?
Yes, flushing is probably necessary. But there is no universal answer as to when to do it.
On the one hand, if you flush too often, you generate extra network traffic.
On the other hand, if you don't flush when it is needed, the server can stall waiting for an object that the client has written but not flushed.
You need to find the compromise between these two syndromes ... and that depends on your application's client/server interaction patterns; e.g. whether the message patterns are synchronous (e.g. message/response) or asynchronous (e.g. message streaming).
1 - To be certain about this, you would need to do some forensic testing to 1) measure the system performance, and 2) determine what syscalls are made and when network packets are sent. For a general answer, you would need to repeat this for a number of use-cases. I'd also recommend looking at the Java library code yourself to confirm my (brief) reading.
2 - Probably only a little bit worse, but a well designed benchmark would pick up a small performance difference.
UPDATE
After writing the above, I found this Q&A - Performance issue using Javas Object streams with Sockets - which seems to suggest that using BufferedInputStream / BufferedOutputStream helps. However, I'm not certain whether the performance improvement that was reported is 1) real (i.e. not a warmup artefact) and 2) due to the buffering. It could be just due to adding the flush() call. (Why: because the flush could cause the network stack to push the data sooner.)
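As a sketch of the synchronous (message/response) case described above, flushing after each message so the server is not left waiting (the endpoint and the Request class are placeholders, not from the original question):
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.Socket;

public class ObjectSendSketch {
    // Hypothetical message type, for illustration only.
    static class Request implements Serializable {
        final String payload;
        Request(String payload) { this.payload = payload; }
    }

    public static void main(String[] args) throws Exception {
        // "example.com" and 12345 are placeholder values.
        try (Socket socket = new Socket("example.com", 12345);
             ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
            out.writeObject(new Request("ping"));
            out.flush();   // in a request/response exchange, flush so the server actually sees it
            // ... then read the reply from an ObjectInputStream ...
        }
    }
}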
I think these links might help you:
What is the purpose of flush() in Java streams?
The flush method flushes the output stream and forces any buffered output bytes to be written out. The general contract of flush is that calling it is an indication that, if any bytes previously written have been buffered by the implementation of the output stream, such bytes should immediately be written to their intended destination.
How java.io.Buffer* stream differs from normal streams?
Internally a buffer array is used and instead of reading bytes individually from the underlying input stream enough bytes are read to fill the buffer. This generally results in faster performance as less reads are required on the underlying input stream.
http://www.oracle.com/technetwork/articles/javase/perftuning-137844.html
As a means of starting the discussion, here are some basic rules on how to speed up I/O:
1. Avoid accessing the disk.
2. Avoid accessing the underlying operating system.
3. Avoid method calls.
4. Avoid processing bytes and characters individually.
So using buffered streams usually speeds up the I/O process, as fewer read() calls are made on the underlying stream in the background.
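For instance, calling read() one byte at a time on a plain FileInputStream issues one underlying read per byte, whereas wrapping it in a BufferedInputStream fills an internal array in larger chunks (the file name below is a placeholder):
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

public class BufferedReadSketch {
    public static void main(String[] args) throws Exception {
        // "data.bin" is a placeholder; most read() calls below are served from
        // the BufferedInputStream's internal array, so far fewer reads hit the file.
        try (InputStream in = new BufferedInputStream(new FileInputStream("data.bin"))) {
            int b;
            while ((b = in.read()) != -1) {
                // process b ...
            }
        }
    }
}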

Is it overkill to use BufferedWriter and BufferedOutputStream together?

I want to write to a socket. From reading about network IO, it seems to me that the optimal way to write to it is to do something like this:
OutputStream outs = null; // e.g. the socket's OutputStream
BufferedWriter out = new BufferedWriter(
        new OutputStreamWriter(new BufferedOutputStream(outs), "UTF-8"));
The BufferedWriter would buffer the input to the OutputStreamWriter, which is recommended because it prevents the writer from starting up the encoder for each character.
The BufferedOutputStream would then buffer the bytes from the Writer to avoid putting one byte at a time potentially onto the network.
It looks a bit like overkill, but it all seems like it helps?
Grateful for any help..
EDIT: From the javadoc on OutputStreamWriter:
Each invocation of a write() method causes the encoding converter to be invoked on the given character(s). The resulting bytes are accumulated in a buffer before being written to the underlying output stream. The size of this buffer may be specified, but by default it is large enough for most purposes. Note that the characters passed to the write() methods are not buffered.
For top efficiency, consider wrapping an OutputStreamWriter within a BufferedWriter so as to avoid frequent converter invocations. For example:
Writer out = new BufferedWriter(new OutputStreamWriter(System.out));
The purpose of the Buffered* classes is to coalesce small write operations into a larger one, thereby reducing the number of system calls, and increasing throughput.
Since a BufferedWriter already collects writes in a buffer, then converts the characters in the buffer into another buffer, and writes that buffer to the underlying OutputStream in a single operation, the OutputStream is already invoked with large write operations. Therefore, a BufferedOutputStream finds nothing to combine, and is simply redundant.
As an aside, the same can apply to the BufferedWriter: buffering will only help if the writer is only passed a few characters at a time. If you know the caller only writes huge strings, the BufferedWriter will find nothing to combine and is redundant, too.
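Under that reasoning, a sketch of the simpler stack would drop the BufferedOutputStream and keep only the writer-level buffer (outs standing in for the socket's OutputStream from the question):
Writer out = new BufferedWriter(new OutputStreamWriter(outs, "UTF-8"));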
The BufferedWriter would buffer the input to the outputStreamWriter, which is recommended because it prevents the writer from starting up the encoder for each character.
Recommended by who, and in what context? What do you mean by "starting up the encoder"? Are you writing a single character at a time to the writer anyway? (We don't know much about how you're using the writer... that could be important.)
The BufferedOutputStream would then buffer the bytes from the Writer to avoid putting one byte at a time potentially onto the network.
What makes you think it would write one byte at a time? I think it very unlikely that OutputStreamWriter will write a byte at a time to the underlying writer, unless you really write a character at a time to it.
Additionally, I'd expect the network output stream to use something like Nagle's algorithm to avoid sending single-byte packets.
As ever with optimization, you should do it based on evidence... have you done any testing with and without these layers of buffering?
EDIT: Just to clarify, I'm not saying the buffering classes are useless. In some cases they're absolutely the right way to go. I'm just saying that as with all optimization, they shouldn't be used blindly. You should consider what you're trying to optimize for (processor usage, memory usage, network usage etc) and measure. There are many factors which matter here - not least of which is the write pattern. If you're already writing "chunkily" - writing large blocks of character data - then the buffers will have relatively little impact. If you're actually writing a single character at a time to the writer, then they would be more significant.
Yes it is overkill. From the Javadoc for OutputStreamWriter: "Each invocation of a write() method causes the encoding converter to be invoked on the given character(s). The resulting bytes are accumulated in a buffer before being written to the underlying output stream.".

FileInputStream and FileOutputStream to the same file: Is a read() guaranteed to see all write()s that "happened before"?

I am using a file as a cache for big data. One thread writes to it sequentially, another thread reads it sequentially.
Can I be sure that all data that has been written (by write()) in one thread can be read() from another thread, assuming a proper "happens-before" relationship in terms of the Java memory model? Is this behavior documented?
In my JDK, FileOutputStream does not override flush(), and OutputStream.flush() is empty. That's why I'm wondering...
The streams in question are owned exclusively by a class that I have full control of. Each stream is guaranteed to be accessed by one thread only. My tests show that it works as expected, but I'm still wondering if this is guaranteed and documented.
See also this related discussion.
Assuming you are using a POSIX file system, then yes.
FileInputStream and FileOutputStream on *nix use the read and write system calls internally. The documentation for write says that reads will see the results of past writes:
After a write() to a regular file has successfully returned:
Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.
I'm pretty sure NTFS on Windows has the same read()/write() guarantees.
You can't talk about a "happens-before" relationship, in terms of the Java memory model, between your FileInputStream and FileOutputStream objects, since they don't share any memory or thread. The VM is free to reorder operations as long as it honors your synchronization requirements. When you have proper synchronization between reads and writes, and no application-level buffering, you are safe.
However, FileInputStream and FileOutputStream do share a file, which leaves things up to the OS; on mainstream operating systems you can expect a read to see the data from an earlier write, in order.
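A sketch of the kind of application-level synchronization that answer is talking about (the file name and sizes are placeholders): the writer publishes, via a volatile, how many bytes have been written, and the reader never reads past that point, so the happens-before edge comes from the volatile rather than from the streams themselves.
import java.io.FileInputStream;
import java.io.FileOutputStream;

public class FileHandoffSketch {
    private static final int TOTAL = 8;                  // placeholder size
    private static volatile int published = 0;           // bytes safely written so far

    public static void main(String[] args) throws Exception {
        new FileOutputStream("cache.bin").close();       // create the placeholder file up front

        Thread writer = new Thread(() -> {
            try (FileOutputStream out = new FileOutputStream("cache.bin")) {
                for (int i = 0; i < TOTAL; i++) {
                    out.write(i);
                    published = i + 1;                   // publish only after write() has returned
                }
            } catch (Exception e) { throw new RuntimeException(e); }
        });

        Thread reader = new Thread(() -> {
            try (FileInputStream in = new FileInputStream("cache.bin")) {
                int consumed = 0;
                while (consumed < TOTAL) {
                    if (consumed < published) {          // never read past the published count
                        int b = in.read();
                        if (b >= 0) consumed++;
                    }
                }
                System.out.println("read " + consumed + " bytes");
            } catch (Exception e) { throw new RuntimeException(e); }
        });

        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}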
If FileOutputStream does not override flush(), then I think you can be sure all data written by write() can be read by read(), unless your OS does something weird with the data (like starting a new thread that waits for the hard drive to spin at the right speed instead of blocking, etc) so that it is not written immediately.
No, you need to flush() the Streams (at least for Buffered(Input|Output)Streams), otherwise you could have data in a buffer.
Maybe you need a concurrent data structure?

Does it really matter the size of the objects you send through a Socket with object input/output streams?

Is it more efficient to flush the OutputStream after each individual invocation of ObjectOutputStream#writeObject rather than flushing the stream after a sequence of object writes? (Example: write object and flush 4 times, or write 4 times and then just flush once?)
How does ObjectOutputStream work internally?
Is it somehow better to send four Object[5] arrays (flushing each one) than one Object[20], for example?
It is not better. In fact it is probably worse, from a performance perspective. Each of those flushes will force the OS-level TCP/IP stack to send the data "right now". If you just do one flush at the end, you should save on system calls, and on network traffic.
If you haven't done this already, inserting a BufferedOutputStream between the Socket OutputStream and the ObjectOutputStream will make a much bigger difference to performance. This allows the serialized data to accumulate in memory before being written to the socket stream. This potentially saves many system calls and could improve performance by orders of magnitude ... depending on the actual objects being sent.
(The representation of four Object[5] objects is larger than one Object[20] object, and that results in a performance hit in the first case. However, this is marginal at most, and tiny compared with the flushing and buffering issues.)
How does this stream work internally?
That is too general a question to answer sensibly. I suggest that you read up on serialization starting with the documents on this page.
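A sketch of the BufferedOutputStream arrangement suggested above, with a single flush for the whole batch (the endpoint and the payload are placeholders):
import java.io.BufferedOutputStream;
import java.io.ObjectOutputStream;
import java.net.Socket;

public class BatchedSendSketch {
    public static void main(String[] args) throws Exception {
        // "example.com" and 12345 are placeholder values.
        try (Socket socket = new Socket("example.com", 12345);
             ObjectOutputStream out = new ObjectOutputStream(
                     new BufferedOutputStream(socket.getOutputStream()))) {
            Object[] batch = new Object[20];             // placeholder payload
            out.writeObject(batch);
            out.flush();   // one flush for the whole batch, not one per object
        }
    }
}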
No, it shouldn't matter, unless you have reason to believe the net link is likely to go down, and partial data is useful. Otherwise it just sounds like a way to make the code more complex for no reason.
If you look at the one and only public constructor of ObjectOutputStream, you note that it requires an underlying OutputStream for its instantiation.
When and how you flush your ObjectStream is entirely dependent on the type of stream you are using. (And in considering all this, do keep in mind that not all extensions of OutputStream are guaranteed to respect your request to flush -- it is entirely implementation dependent, as spelled out in the 'contract' of the Javadocs.)
But certainly we can reason about it and even pull up the code and see what is actually done.
IFF the underlying OutputStream must utilize OS services for devices (such as the disk, or the network interface in the case of sockets), then the behavior of flush() is entirely OS dependent. For example, you may grab the output stream of a socket and then instantiate an ObjectOutputStream to write serialized objects to the net. The host OS's TCP/IP implementation is in charge.
What is more efficient?
Well, if your object stream is wrapping a ByteArrayOutputStream, you are potentially looking at a series of reallocations and System.arraycopy() calls. I say potentially, since the ByteArrayOutputStream implementation doubles its buffer size on each (internal) resize, and it is very unlikely that writing n (small) objects and flushing each time will result in n reallocations (where n is assumed to be a reasonably small number).
But if you are wrapping a network stream, you must keep in mind that network writes are very expensive. It makes much more sense, if your protocol allows it, to chunk your writes (to fill the send buffer) and just flush once.
