I am reading the book Thinking in Java, which explains the java.nio.* package and says that NIO is faster than reading and writing files with traditional IO streams. Why?
I have reviewed the following information:
IO streams are byte oriented: the traditional IO processing unit is the byte, while the NIO processing unit is a block (a byte array). But I think traditional IO can also process blocks (byte arrays) directly, for example through the Buffered* stream classes, and traditional IO even has a method that reads directly into a byte array:
private native int readBytes(byte b[], int off, int len) throws IOException;
IO reads are blocking, while NIO can be non-blocking; but I found that file NIO (FileChannel) can only operate in blocking mode, so there NIO has no advantage.
I think that when you need NIO, it is generally for its other advantages, such as:
transferTo()/transferFrom()
So, when should I use NIO for file reading and writing? Why is it faster than traditional IO? What is the correct way to use it? Should I use IO or NIO only when reading and writing files?
There are only two cases where a FileChannel is faster than a FileInputStream or FileOutputStream.
The first is when you can use an off-heap ("direct") ByteBuffer to hold data, so that it isn't copied into the Java heap. For example, if you were writing a web-server that delivered static files to a socket, it would be faster to use a FileChannel and a SocketChannel rather than a FileInputStream and a SocketOutputStream.
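For that static-file case, the copy can be avoided entirely with FileChannel.transferTo(), which lets the OS move the bytes from the file to the destination without them ever entering the Java heap. Here is a minimal sketch (the class and method names are mine; transferTo() accepts any WritableByteChannel, so a SocketChannel works the same way):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    // Copy a whole file to a channel (e.g. a SocketChannel) without
    // pulling the bytes into the Java heap.
    public static void send(Path file, WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long pos = 0, size = in.size();
            while (pos < size) {
                // transferTo may move fewer bytes than requested, so loop
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }
}
```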
These cases are, in my opinion, very few and far between. Normally when you read (or write) a file in Java you will be doing something with the data. In which case, you can't avoid copying the data onto the heap.
The other use for a FileChannel is to create a MappedByteBuffer for random access to the contents of a file. This is significantly faster than using RandomAccessFile because it replaces explicit calls to the OS kernel with memory accesses that leverage the OS's paging mechanism.
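A minimal sketch of that mapping approach (the class and method names are mine): map the file once, and random reads become plain memory accesses backed by the OS page cache rather than syscalls.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapRead {
    // Random-access read of one byte at an arbitrary offset; after map(),
    // get() is a memory access, not a read() syscall.
    public static byte byteAt(Path file, long offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return map.get((int) offset); // buffer indices are ints, so this sketch assumes a small file
        }
    }
}
```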
If you're just getting started with I/O in Java, I recommend sticking with the classes in java.io unless and until you can explain why switching to java.nio will give you improved performance. It's much easier to use a stream-oriented abstraction than a block-oriented one.
Related
I wanted to pipe an OutputStream to an InputStream such that every time I write to my OutputStream those bytes become available in my InputStream.
Reading the JDK documentation I found PipedInputStream and PipedOutputStream, which seemed a good alternative, e.g.
PipedOutputStream out = new PipedOutputStream();
PipedInputStream in = new PipedInputStream(out);
However, the documentation explicitly states the two streams must be used in separate threads.
Typically, data is read from a PipedInputStream object by one thread and data is written to the corresponding PipedOutputStream by some other thread. Attempting to use both objects from a single thread is not recommended, as it may deadlock the thread.
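The two-thread usage that the Javadoc describes looks roughly like this (a sketch assuming Java 9+ for readAllBytes(); the helper name pipeThrough is mine):

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

public class PipeDemo {
    // Feed bytes through a pipe: a dedicated writer thread writes,
    // the calling thread reads -- as the Javadoc requires.
    public static String pipeThrough(byte[] data) throws IOException, InterruptedException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);
        Thread writer = new Thread(() -> {
            try (out) {                       // closing signals end-of-stream to the reader
                out.write(data);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();
        String result = new String(in.readAllBytes()); // blocks until the writer closes
        writer.join();
        return result;
    }
}
```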
Is there another easy way to pipe these two streams to run in the same thread using some other features available in Java?
I suppose the major issue here is buffering the data written to the output, since while we're writing, those bytes must be buffered somewhere to be later consumed by the reading side.
However I'm working with discrete amounts of data, of the kind that would easily fit in memory. So for me buffering a few bytes is not a big concern. I'm more interested in finding a simple pattern to do this.
Being that the case, I thought I could easily do this manually by writing everything to a ByteArrayOutputStream and then get the bytes from it and read them again in a ByteArrayInputStream.
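That manual approach is only a few lines (a sketch; the class and helper names are mine, and it assumes Java 9+ for readAllBytes() on the reading side):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class InMemoryPipe {
    // Write everything first, then hand the accumulated bytes back
    // as an InputStream -- no second thread needed.
    public static InputStream roundTrip(byte[] message) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(message);                                  // "producer" phase
        return new ByteArrayInputStream(out.toByteArray());  // "consumer" phase
    }
}
```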
However this piping scenario seems such a natural use case that I was wondering if there's another simpler way to pipe two streams in a single-threaded application, e.g.
output.pipe(input);
message.writeTo(output);
process(input);
I want to understand the distinction, to clear up the conceptual difference, as I have been seeing SocketChannel, FileChannel, etc. classes, compared to Socket and File I/O streams.
As far as I know, I/O streams must be accessed sequentially, i.e. they are a sequence of bytes which can be read and written. You can also use a buffered stream to increase the efficiency of I/O.
So, compared to Streams, are "Channels" a totally new concept or just a wrapper over Streams?
If we say "a stream is a sequence of bytes", then what is a channel in that sense, if the two are different?
Neither. Channels are not wrappers around streams (unless you explicitly wrap a stream via Channels.newChannel(InputStream) or Channels.newChannel(OutputStream)) and they are not a “totally new concept”.
Depending on the particular type, a channel still represents a sequence of bytes that may be read or written sequentially. The fact that you can translate between these APIs via factory methods in the Channels class shows that there is a relationship.
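For example, the factory methods let you drive a stream-backed channel with the same ByteBuffer-based loop you would use for any other channel (the class and helper names below are mine):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

public class Bridge {
    // The canonical channel copy loop; works for sockets, files, or
    // channels wrapped around streams via Channels.newChannel(...).
    public static void copy(ReadableByteChannel src, WritableByteChannel dst) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(8192);
        while (src.read(buf) != -1) {
            buf.flip();                      // switch from filling to draining
            while (buf.hasRemaining()) {
                dst.write(buf);
            }
            buf.clear();                     // ready to fill again
        }
    }
}
```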
But the NIO API addresses certain design issues which could not be fixed by refactoring the old stream classes (in a compatible way). E.g. the base types are interfaces now, which allows certain channels to implement multiple types, like ReadableByteChannel and WritableByteChannel at the same time. Further, there is no method for reading a single byte, which is a good way to get rid of the “You can use BufferedStream to increase efficiency” myth. If an insufficient buffer size is the cause of an I/O performance bottleneck, you solve it by providing a larger buffer in the first place, rather than wrapping a stream or channel into another, forcing it to copy all data between buffers. Consequently, there is no BufferedChannel.
Certain implementations like FileChannel offer additional methods allowing random access to the underlying resource, in addition to the sequential access. That way, you can use a uniform interface, instead of dealing with entirely different APIs, like with the RandomAccessFile/InputStream/OutputStream relationship.
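A sketch of that uniform interface (the class and helper names are mine): FileChannel's positional read overload takes an absolute offset and does not even move the channel's own position.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RandomAccessExample {
    // Read `len` bytes at an absolute offset; the read(dst, position)
    // overload leaves the channel's current position untouched.
    public static byte[] readAt(Path file, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (buf.hasRemaining() && ch.read(buf, offset + buf.position()) != -1) {
                // keep reading until the buffer is full or we hit EOF
            }
        }
        return buf.array();
    }
}
```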
Further, a lot of previously missing I/O features were added when NIO was introduced. Some of them could have been implemented atop the old classes without problems, but the designers clearly favored using the new API for all of them, where these features could be considered in the design right from the start.
But generally, as said, a channel isn’t an entirely new concept compared to streams.
I'm currently using Java sockets in a client-server application with OutputStream and not BufferedOutputStream (and the same for input streams).
The client and server exchanges serialized objects (writeObject() method).
Does it make sense (more speed) to use BufferedOutputStream and BufferedInputStream in this case?
And when I have to flush or should I not write a flush() statement?
Does it make sense (more speed) to use BufferedOutputStream and BufferedInputStream in this case?
Actually, it probably doesn't make sense [1].
The object stream implementation internally wraps the stream it has been given with a private class called BlockDataOutputStream that does buffering. If you wrap the stream yourself, you will have two levels of buffering ... which is likely to make performance worse [2].
And when I have to flush or should I not write a flush() statement?
Yes, flushing is probably necessary. But there is no universal answer as to when to do it.
On the one hand, if you flush too often, you generate extra network traffic.
On the other hand, if you don't flush when it is needed, the server can stall waiting for an object that the client has written but not flushed.
You need to find the compromise between these two syndromes ... and that depends on your application's client/server interaction patterns; e.g. whether the message patterns are synchronous (e.g. message/response) or asynchronous (e.g. message streaming).
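For a synchronous message/response pattern, the usual compromise is one flush() per complete message. A sketch (the class and helper names are mine):

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class Messages {
    // Write one complete message, then flush so the peer sees it now
    // rather than whenever the internal buffer happens to fill up.
    public static void sendMessage(ObjectOutputStream out, Serializable msg) throws IOException {
        out.writeObject(msg);
        out.flush();
    }
}
```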
1 - To be certain on this, you would need to do some forensic testing to 1) measure the system performance, and 2) determine what syscalls are made and when network packets are sent. For a general answer, you would need to repeat this for a number of use-cases. I'd also recommend looking at the Java library code yourself to confirm my (brief) reading.
2 - Probably only a little bit worse, but a well designed benchmark would pick up a small performance difference.
UPDATE
After writing the above, I found this Q&A - Performance issue using Javas Object streams with Sockets - which seems to suggest that using BufferedInputStream / BufferedOutputStream helps. However, I'm not certain whether the performance improvement that was reported is 1) real (i.e. not a warmup artefact) and 2) due to the buffering. It could be just due to adding the flush() call. (Why: because the flush could cause the network stack to push the data sooner.)
I think these links might help you:
What is the purpose of flush() in Java streams?
The flush method flushes the output stream and forces any buffered output bytes to be written out. The general contract of flush is that calling it is an indication that, if any bytes previously written have been buffered by the implementation of the output stream, such bytes should immediately be written to their intended destination.
How java.io.Buffer* stream differs from normal streams?
Internally a buffer array is used and instead of reading bytes individually from the underlying input stream enough bytes are read to fill the buffer. This generally results in faster performance as less reads are required on the underlying input stream.
http://www.oracle.com/technetwork/articles/javase/perftuning-137844.html
As a means of starting the discussion, here are some basic rules on how to speed up I/O: 1. Avoid accessing the disk. 2. Avoid accessing the underlying operating system. 3. Avoid method calls. 4. Avoid processing bytes and characters individually.
So using buffered streams usually speeds up the I/O process, as fewer read() calls are made on the underlying stream.
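In code, the difference is just the wrapper. A sketch (the class and helper names are mine):

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class CountBytes {
    // Single-byte reads hit the 8 KB in-memory buffer on most calls;
    // without the BufferedInputStream wrapper, each read() could be
    // a separate call down to the underlying stream.
    public static long count(Path file) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file), 8192)) {
            long n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        }
    }
}
```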
I use MappedByteBuffers to achieve thread safety between readers and writers of a file via volatile variables (the writer updates its position and the readers read the writer's position). (This is a file upload system; the incoming file is a stream, if that matters.) There are more tricks, obviously (sparse files, power-of-two mapping growth), but it all boils down to that.
I can't find a faster way to write to a file while concurrently reading it, without caching it completely in memory (which I cannot do due to its sheer size).
Is there any other method of IO that guarantees visibility within the same process for readers to written bytes? MappedByteBuffer makes its guarantees, indirectly, via the Java Memory Model, and I'd expect any other solution to do the same (read: non platform specific and more).
Is this the fastest way? Am I missing something in the docs?
I did some tests quite a few years ago on what was then decent hardware, and MappedByteBuffer was about 20% faster than any other I/O technique. It does have the disadvantage for writing that you need to know the file size in advance.
Which of the two would be the best choice and in which circumstance?
Clearly there is no sense in using a file channel for a very small file. Besides that, what are the pros and cons of the two input/output mechanisms?
Thanks a lot in advance.
FileChannel has many features missing in java.io: it is interruptible, it can move its position within the file, it can lock a file, etc. And it can be faster than old IO, especially when it uses direct byte buffers; here is an explanation from the ByteBuffer API docs:
A byte buffer is either direct or non-direct. Given a direct byte buffer, the Java virtual machine will make a best effort to perform native I/O operations directly upon it. That is, it will attempt to avoid copying the buffer's content to (or from) an intermediate buffer before (or after) each invocation of one of the underlying operating system's native I/O operations.
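A sketch of reading through a direct buffer (the class and helper names are mine): allocateDirect gives off-heap memory that the native read can fill without an intermediate heap copy.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectRead {
    // Stream a file through an off-heap buffer; the OS read can fill
    // this memory directly, with no copy into a Java heap array.
    public static long totalBytes(Path file) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(8192);
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // discard the data; this sketch only counts bytes
            }
        }
        return total;
    }
}
```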
If you need none of the above features, go with streams; you'll get shorter code.