I am revising my knowledge of I/O in Java and have now come to pipes. I noticed that java.io.* has a pipe mechanism and java.nio.* has a similar one.
I am reading some tutorials about these. My current impression is that pipes from NIO and pipes from I/O look the same. But it would be strange for the JDK to contain duplicate things, so I suspect I am wrong.
Can you clarify the difference?
IO is stream-oriented, NIO is buffer-oriented.
IO streams are blocking, NIO has non-blocking mode.
In this Java NIO Tutorial (by Jakob Jenkov) you will find background and examples. It also helps you find the best approach for your case.
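To make the stream-versus-buffer distinction concrete, here is a minimal sketch that sends the same message through a java.io pipe and a java.nio pipe. The class and method names are made up for illustration, and it assumes Java 9 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the JDK.
final class PipeComparison {

    // java.io: stream-oriented and always blocking. The writer runs in its
    // own thread because piped streams are meant to connect two threads.
    static String viaIoPipe(String message) throws IOException, InterruptedException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);
        Thread writer = new Thread(() -> {
            try (out) {
                out.write(message.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();
        byte[] data;
        try (in) {
            data = in.readAllBytes(); // blocks until the writer closes its end
        }
        writer.join();
        return new String(data, StandardCharsets.UTF_8);
    }

    // java.nio: buffer-oriented. Both ends are SelectableChannels, so they
    // could also be switched to non-blocking mode and registered with a
    // Selector; this sketch leaves them in the default blocking mode.
    static String viaNioPipe(String message) throws IOException {
        Pipe pipe = Pipe.open();
        try (Pipe.SourceChannel source = pipe.source()) {
            try (Pipe.SinkChannel sink = pipe.sink()) {
                ByteBuffer outBuf = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
                while (outBuf.hasRemaining()) {
                    sink.write(outBuf);
                }
            } // closing the sink signals end-of-stream to the source
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            ByteBuffer inBuf = ByteBuffer.allocate(1024);
            while (source.read(inBuf) != -1) {
                inBuf.flip();
                collected.write(inBuf.array(), 0, inBuf.limit());
                inBuf.clear();
            }
            return new String(collected.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

The NIO variant writes and reads in a single thread only because the message is small enough to fit in the operating system's pipe buffer. For larger payloads you would use two threads or non-blocking mode.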
I have a batch job that passes the contents it downloads from various URLs through to S3 storage. I am currently using blocking IO and have reached a point where my job is IO bound, blocked most of the time waiting on IO. So in order to speed up the whole process I was thinking about using non-blocking IO.
Unfortunately I wasn't able to find utility code for passing through content from a set of channels to another set of channels. Since I read that writing correct non-blocking code is not exactly easy, I would prefer to use an existing utility/framework than to write that code myself.
The TransferManager seems to be the only possible option for higher throughput when using the AWS SDK, but it only offers the usage of streams and seems to use IO threads in the background. Apparently there is no out of the box option for non-blocking uploads to S3.
What would you recommend? Right now I can only imagine the following solutions.
1. Stay with blocking IO and use my own IO thread pool
2. Use non-blocking IO to download the files to the local filesystem, then upload with TransferManager
3. Use non-blocking IO for pass through
Option 1 will obviously not scale and 2 will probably work for a while, but I would really like to keep my IOs on EBS low so I'd rather use 3.
To successfully implement option 3 I guess I would have to implement a lot myself so my final question is, whether you think it's worth it and if so, which tools I could use to make this work.
Edit 1
Clarifying that by IO bound I actually meant that the job is mostly waiting for IO. Here you can see that my bandwidth is not really saturated, but I would like it to be if that is possible.
If your job is I/O bound you've finished. You're bound by the speed of the network, not of your code. Using NIO won't make it any faster.
Clarifying that by IO bound I actually meant that the job is mostly waiting for IO.
Yes, that's what 'I/O bound' means. Nothing has changed.
Here you can see that my bandwidth is not really saturated, but I would like it to be if that is possible.
You could try using larger buffers, but as I said it seems to me that you've already finished.
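As a rough illustration of the "larger buffers" suggestion, the sketch below copies a stream with a caller-chosen buffer size. The class name and the 64 KiB figure in the usage note are assumptions made for the example, not values taken from the question or from the AWS SDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of any library mentioned above.
final class PassThrough {

    // Copies everything from in to out using a buffer of the given size
    // and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

Calling it as PassThrough.copy(downloadStream, uploadStream, 64 * 1024) would move data in 64 KiB chunks. Whether that helps at all depends on where the time is actually being spent.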
Strangely, I was unable to find a clear answer on Google about the performance of NIO.2 async IO versus NIO's multiplexed IO via java.nio.channels.Selector.
So, my question is:
Does NIO.2 AsynchronousChannel have better performance than NIO Selector?
Of course, I'm interested in the server side of things under different load profiles: number of simultaneous connections, average connection lifetime, traffic.
The only information I was able to find is that Windows IOCP is slightly better than Windows select.
I don't think NIO.2 will perform better than NIO, because on Linux NIO.2 still uses readiness-based system calls (select/poll/epoll) and thread pools to simulate asynchronous IO. One example is that Netty removed NIO.2 support in 4.0.0, because the authors found that NIO.2 did not bring better performance than NIO on Linux.
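For reference, this is roughly what the Selector (readiness) model being compared looks like: one thread asks which channels are ready, then performs the reads and writes itself. It is a minimal echo-server sketch with a hypothetical class name, not a production design and not a benchmark.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Hypothetical single-threaded echo server built on java.nio.channels.Selector.
final class SelectorEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private final int port;
    private volatile boolean running = true;

    SelectorEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: the OS picks a free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        port = server.socket().getLocalPort();
    }

    int port() {
        return port;
    }

    // Asks the loop in run() to stop; wakeup() unblocks a pending select().
    void close() {
        running = false;
        selector.wakeup();
    }

    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(4096);
        try {
            while (running) {
                selector.select(); // blocks until a channel is ready or wakeup() is called
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buffer);
                    }
                }
            }
            // Shut down: close every registered channel, then the selector.
            for (SelectionKey key : selector.keys()) {
                key.channel().close();
            }
            selector.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void echo(SocketChannel client, ByteBuffer buffer) throws IOException {
        buffer.clear();
        if (client.read(buffer) == -1) { // peer closed the connection
            client.close();
            return;
        }
        buffer.flip();
        // Simplification: a real server would register OP_WRITE instead of
        // spinning here when the socket's send buffer is full.
        while (buffer.hasRemaining()) {
            client.write(buffer);
        }
    }
}
```

The server would be started with new Thread(new SelectorEchoServer()).start(). With NIO.2 the application instead hands a buffer to the channel and is called back when the operation has completed.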
While working on a project using the NIO.2 AIO features I looked at the "old" NIO selector implementation and saw that on Windows the default select function is used, which does not scale at all on Windows due to a bad internal implementation. Everybody knows that on Windows IOCP is the only real solution. Of course the callback-on-completion model does not fit into the NIO selector model, but does this effectively mean that using NIO on Windows is basically not a good idea?
For instance: the new AIO features include an IOCP implementation.
This is especially relevant when using the latest Netty framework, where support for AIO has been dropped. So is Netty not as fast on Windows as it could be?
NIO.2 uses IOCP
The call tree in "Java 7: NIO.2 File Channels on the test bench" demonstrates this for file I/O: several of the called class names contain "Iocp".
See also sun.nio.ch.Iocp.java, the "Windows implementation of AsynchronousChannelGroup encapsulating an I/O completion port".
NIO does not make use of IOCP, as it only supports non-blocking I/O (selectors), not asynchronous I/O (completion handlers), which was only added with NIO.2.
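To show what "asynchronous I/O (completion handlers)" looks like from the application side, here is a small sketch using AsynchronousFileChannel. The class and method names are hypothetical. Which native mechanism backs the call (IOCP on Windows, a thread pool elsewhere) is up to the JDK and is not visible in this code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper illustrating the NIO.2 completion-handler model.
final class AsyncFileRead {

    // Starts one asynchronous read of up to maxBytes from the beginning of
    // the file and returns immediately; the future is completed later from
    // the handler, on a thread owned by the channel's group.
    static CompletableFuture<String> readStart(Path file, int maxBytes) throws IOException {
        AsynchronousFileChannel channel =
                AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        ByteBuffer buffer = ByteBuffer.allocate(maxBytes);
        CompletableFuture<String> result = new CompletableFuture<>();
        channel.read(buffer, 0L, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                buffer.flip();
                result.complete(StandardCharsets.UTF_8.decode(buffer).toString());
                closeQuietly(channel);
            }

            @Override
            public void failed(Throwable exc, Void attachment) {
                result.completeExceptionally(exc);
                closeQuietly(channel);
            }
        });
        return result;
    }

    private static void closeQuietly(AsynchronousFileChannel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
            // nothing useful to do in a sketch
        }
    }
}
```

Contrast this with a Selector, where the application is told a channel is ready and then performs the read itself.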
I think you are confusing asynchronous with faster. Certainly NIO buffers are faster than serializing the same data that would be in the buffers, but many AIO techniques incur costs and delays that can give synchronous IO an advantage.
There was an article a while back that did some pretty good benchmarking of various IO techniques, and the results were (a bit) surprising. The Netty people probably decided to align with the better performing (blocking) IO models mentioned.
The problem with IOCP and Java is that IOCP creates and manages threads. My understanding is that for IOCP to work in Java, each event has to go through a Windows IOCP thread and then be scheduled on the Java thread pool for execution. This makes IOCP very expensive to implement in Java compared with C++/C#.
AIO was probably removed from Netty because no one wants to sacrifice 450,000 potential transactions just to use AIO versus NBIO. The transactional performance gap between AIO and NBIO is huge.
I built a Java TCP server using ServerSocketChannel running on one port. However, it is not very scalable, as it attends to only one incoming socket (in blocking mode).
I want to extend this TCP server to service multiple incoming sockets (at most 10). As such, I am wondering whether I should implement it using non-blocking IO or threads plus blocking IO.
Paul Tyma recently compared the two approaches, generating diverse discussion. Under certain circumstances, modern threading libraries can outperform java.nio.channels.Selector. As the result is somewhat counter-intuitive, you may have to prototype both to get a definitive answer.
JBoss Netty and Apache MINA are NIO frameworks that provide much of what you need to implement your own server. I'm now using Netty and am happy with it.
With only 10 incoming sockets I don't think the difference will be visible. What you should do is focus on the upper layers (protocol, application) and leave the low-level network implementation to a framework. I would recommend Apache MINA for that job. As you will see, it does the job very well with blocking or non-blocking IO, whichever you choose, and leaves open interfaces for you to implement the protocol and application.
I would use threads and blocking I/O until you know you have at least 1000 concurrent connections. That also gives you an easy way to get it working. When and if you get to 1000, evaluate.
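A minimal sketch of the threads-plus-blocking-I/O approach for a small, fixed number of clients might look like this. The class name is made up, it uses ServerSocket rather than the ServerSocketChannel from the question, and it simply echoes bytes back in place of a real protocol. It assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical echo server: one blocking thread per connection, capped by a pool.
final class BlockingEchoServer implements Runnable, AutoCloseable {
    private final ServerSocket server;
    private final ExecutorService workers;

    BlockingEchoServer(int maxClients) throws IOException {
        // Port 0 lets the OS pick a free port; loopback only, for the example.
        server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        workers = Executors.newFixedThreadPool(maxClients);
    }

    int port() {
        return server.getLocalPort();
    }

    @Override
    public void run() {
        while (!server.isClosed()) {
            try {
                Socket client = server.accept();     // blocks until a client connects
                workers.execute(() -> echo(client)); // further clients queue if the pool is busy
            } catch (IOException e) {
                if (server.isClosed()) {
                    return; // close() was called, normal shutdown
                }
                throw new UncheckedIOException(e);
            }
        }
    }

    private static void echo(Socket client) {
        try (client;
             InputStream in = client.getInputStream();
             OutputStream out = client.getOutputStream()) {
            in.transferTo(out); // blocking copy until the client closes its side
        } catch (IOException e) {
            // client went away; nothing else to do in this sketch
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
        workers.shutdownNow();
    }
}
```

It could be started with new Thread(new BlockingEchoServer(10)).start(). With a pool of 10, an eleventh client is accepted but waits until a worker thread becomes free.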
Log4j seems to be pretty fast already, and I was just wondering if anyone knows whether it uses NIO. I tried searching the whole source code for "NIO" (a crude way to search, I know) but did not hit anything. Also, if it's not using NIO, do you think it's worthwhile to modify log4j to use NIO to make it faster? Any pointers, advice, or links to resources would be very helpful.
Also, if it's not using NIO, do you think it's worthwhile to modify log4j to use NIO to make it faster?
No, unless logging is a significant part of your application's activities, in which case there is usually something wrong.
You seem to be under the impression that NIO 'is faster', but this is not true in general. Just try creating two files, one with standard IO and one with NIO, writing a bunch of data to them and closing them. You'll see the performance hardly differs. NIO will only perform better in certain use cases; most usually the case of many connections.
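The experiment suggested above could be sketched like this. The class and method names are made up, and the timing is deliberately naive (no warm-up, no JMH), so treat any numbers it prints as a rough indication only.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical comparison of java.io and java.nio file writes.
final class WriteComparison {

    // Writes the chunk `repeats` times with a classic stream; returns elapsed nanoseconds.
    static long writeWithStream(Path file, byte[] chunk, int repeats) throws IOException {
        long start = System.nanoTime();
        try (OutputStream out = new FileOutputStream(file.toFile())) {
            for (int i = 0; i < repeats; i++) {
                out.write(chunk);
            }
        }
        return System.nanoTime() - start;
    }

    // Writes the same data through a FileChannel; returns elapsed nanoseconds.
    static long writeWithChannel(Path file, byte[] chunk, int repeats) throws IOException {
        long start = System.nanoTime();
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(chunk);
            for (int i = 0; i < repeats; i++) {
                buffer.rewind();
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        byte[] chunk = new byte[8192];
        Path a = Files.createTempFile("stream", ".bin");
        Path b = Files.createTempFile("channel", ".bin");
        System.out.println("stream  ns: " + writeWithStream(a, chunk, 10_000));
        System.out.println("channel ns: " + writeWithChannel(b, chunk, 10_000));
        Files.deleteIfExists(a);
        Files.deleteIfExists(b);
    }
}
```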
Check out the FileAppender source. Pretty much just standard java.io.
I don't see any reason why FileChannel could be any faster than FileOutputStream in this case.
Maybe by using MappedByteBuffer? But in append mode, the behavior is OS dependent.
Ultimately, the performance depends on your hard drive; your code matters very little.
Elaborating on Confusion's answer, File NIO blocks as well. That's why it is not faster than traditional File IO in some scenarios. Quoting O'Reilly's Java NIO book:
File channels are always blocking and cannot be placed into
nonblocking mode. Modern operating systems have sophisticated caching
and prefetch algorithms that usually give local disk I/O very low
latency. Network filesystems generally have higher latencies but often
benefit from the same optimizations. The nonblocking paradigm of
stream-oriented I/O doesn't make as much sense for file-oriented
operations because of the fundamentally different nature of file I/O.
For file I/O, the true winner is asynchronous I/O, which lets a
process request one or more I/O operations from the operating system
but does not wait for them to complete. The process is notified at a
later time that the requested I/O has completed. Asynchronous I/O is
an advanced capability not available on many operating systems. It is
under consideration as a future NIO enhancement.
Edit: With that said, you can get better read/write efficiency if you use File NIO with a MappedByteBuffer. Note that using MappedByteBuffer in Log4j 2 is under consideration.
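For readers who have not used it, writing through a MappedByteBuffer looks roughly like the sketch below. The class name is hypothetical and this is not how Log4j does or would do it. It only shows the API shape.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper showing a memory-mapped file write.
final class MappedWrite {

    // Maps the first data.length bytes of the file and copies data into the mapping.
    static void write(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,   // a READ_WRITE mapping needs a readable channel
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            // Mapping a region beyond the current end of the file grows the file.
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            map.put(data);
            map.force(); // ask the OS to flush the mapped region to storage
        }
    }
}
```

The mapping stays valid after the channel is closed and is only released when the buffer is garbage collected, which on some platforms (notably Windows) can keep the file locked for a while.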