While working on a project using the NIO.2 AIO features, I looked at the "old" NIO selector implementation and saw that on Windows the default select function is used, which does not scale at all on Windows due to a bad internal implementation. Everybody knows that on Windows IOCP is the only real solution. Of course the callback-on-completion model does not fit the NIO selector model, but does this effectively mean that using NIO on Windows is basically not a good idea?
For instance: The new AIO features include an IOCP implementation.
This is especially relevant when using the latest Netty framework, where support for AIO has been dropped. So Netty is not as fast on Windows as it could be?
NIO.2 uses IOCP
The call tree below, taken from "Java 7: NIO.2 File Channels on the test bench", demonstrates this for file I/O: "Iocp" appears in several of the called class names.
See also sun.nio.ch.Iocp.java, the "Windows implementation of AsynchronousChannelGroup encapsulating an I/O completion port".
NIO does not make use of IOCP, as it only supports "non-blocking i/o" (selectors), not "asynchronous i/o" (completion handlers), which was only added with NIO.2.
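To make the "completion handler" style concrete, here is a minimal sketch of an NIO.2 file read that reports its result through a callback rather than through a selector. The class name `CompletionHandlerDemo`, the helper methods, and the temp-file contents are made up for illustration. The code is platform-neutral; whether it ends up on IOCP depends on the JDK implementation for the platform, as described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

// Hypothetical demo class; not part of any library.
final class CompletionHandlerDemo {

    /** Reads up to 1 KiB from the start of the file and returns it as UTF-8 text. */
    static String readAsync(Path file) {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(1024);
            CompletableFuture<String> result = new CompletableFuture<>();
            // The read is only initiated here; the handler is invoked on completion.
            ch.read(buf, 0L, buf, new CompletionHandler<Integer, ByteBuffer>() {
                @Override
                public void completed(Integer bytesRead, ByteBuffer b) {
                    b.flip();
                    result.complete(StandardCharsets.UTF_8.decode(b).toString());
                }

                @Override
                public void failed(Throwable cause, ByteBuffer b) {
                    result.completeExceptionally(cause);
                }
            });
            // Block only so the demo can return a value; real code would chain callbacks.
            return result.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a small temp file, reads it back asynchronously, and cleans up. */
    static String demo() {
        try {
            Path tmp = Files.createTempFile("nio2-demo", ".txt");
            try {
                Files.write(tmp, "hello nio2".getBytes(StandardCharsets.UTF_8));
                return readAsync(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Contrast this with a selector, where the application asks "which channels are ready?" and then performs the read itself.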
I think you are confusing asynchronous with faster. Certainly NIO buffers are faster than serializing the same data that would be in the buffers, but many AIO techniques incur costs and delays that can give synchronous IO an advantage.
There was an article a while back that did some pretty good benchmarking of various IO techniques, and the results were (a bit) surprising. The Netty people probably decided to align with the better performing (blocking) IO models mentioned.
The problem with IOCP and Java is that IOCP creates and manages threads. My understanding is that for IOCP to work in Java, each event has to go through the Windows IOCP thread first and then be scheduled on the Java thread pool for execution. This makes IOCP very expensive to implement in Java compared with C++/C#.
AIO was probably removed from Netty because no one wants to sacrifice 450,000 potential transactions just to use AIO versus NBIO. The transactional performance gap between AIO and NBIO is huge.
Related
What I know is that since JDK 1.2 all Java threads are created using the 'Native Thread Model', which associates each Java thread with an OS thread with the help of JNI and the OS thread library.
So from the following text I believe that all Java threads created nowadays can make use of multi-core processors:
Multiple native threads can coexist. Therefore it is also called many-to-many model. Such characteristic of this model allows it to take complete advantage of multi-core processors and execute threads on separate individual cores concurrently.
But then I read about the Fork/Join Framework, introduced in JDK 7, in Java: The Complete Reference:
Although the original concurrent API was impressive in its own right, it was significantly expanded by JDK 7. The most important addition was the Fork/Join Framework. The Fork/Join Framework facilitates the creation of programs that make use of multiple processors (such as those found in multicore systems). Thus, it streamlines the development of programs in which two or more pieces execute with true simultaneity (that is, true parallel execution), not just time-slicing.
It makes me question why the framework was introduced when the 'Java Native Thread Model' has existed since JDK 1.3?
The fork/join framework does not replace the original low-level thread API; it makes it easier to use for certain classes of problems.
The original, low-level thread API works: you can use all the CPUs and all the cores on the CPUs installed on the system. If you ever try to actually write multithreaded applications, you'll quickly realize that it is hard.
The low-level thread API works well for problems where threads are largely independent and don't have to share information with each other, in other words, embarrassingly parallel problems. Many problems, however, are not like this. With the low-level API, it is very difficult to implement complex algorithms in a way that is safe (produces correct results and does not have unwanted effects like deadlock) and efficient (does not waste system resources).
The Java fork/join framework, an implementation of the fork/join model, was created as a high-level mechanism to make it easier to apply parallel computing to divide-and-conquer algorithms.
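As an illustration of the divide-and-conquer style, here is a minimal sketch that sums an array with a `RecursiveTask`. The class name `ForkJoinSum`, the helper `sumOfFirst`, and the `THRESHOLD` value are arbitrary choices for this example, not tuned recommendations.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums data[from, to) by recursive splitting.
final class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cutoff for sequential work

    private final long[] data;
    private final int from;
    private final int to;

    ForkJoinSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: sum sequentially.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                         // run the left half asynchronously
        long rightSum = right.compute();     // compute the right half in this thread
        return left.join() + rightSum;       // wait for the left half and combine
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSum(data, 0, data.length));
    }

    /** Sums 1..n using the task above. */
    static long sumOfFirst(int n) {
        long[] data = new long[n];
        for (int i = 0; i < n; i++) {
            data[i] = i + 1;
        }
        return parallelSum(data);
    }

    public static void main(String[] args) {
        System.out.println(sumOfFirst(10_000));
    }
}
```

The same result could be produced with raw threads, but the splitting, scheduling, and joining would all have to be written by hand.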
System.IO.File in .NET and .NET Core has a family of Read...Async() methods, all of which return either Task<byte[]> or Task<string> (Task<T> is .NET's equivalent of Java's Future<T>).
This looks largely equivalent to AsynchronousFileChannel APIs (which either consume a CompletionHandler or return a Future), with one major difference.
AsynchronousFileChannel uses a managed background thread to perform asynchronous I/O (the thread may be provided either by the default thread pool (sun.nio.ch.ThreadPool) or by the ExecutorService explicitly specified during channel creation).
The FileStream implementation in .NET, on the other hand, passes the FileOptions.Asynchronous flag to the underlying operating system (see also Synchronous and Asynchronous I/O), doesn't spawn any managed background threads, and uses what is called Overlapped I/O.
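For reference, the Java side of this comparison can be sketched as follows: an AsynchronousFileChannel opened with an explicitly supplied ExecutorService, read through the Future-returning overload. The class name `ExecutorBackedRead`, the single-thread pool, and the temp-file contents are assumptions made for the example. Which thread actually performs the I/O is a JDK implementation detail and is not asserted here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; not part of any library.
final class ExecutorBackedRead {

    /** Reads up to 256 bytes from the start of the file using the given executor. */
    static String readFirstBytes(Path file, ExecutorService executor) {
        Set<StandardOpenOption> options = EnumSet.of(StandardOpenOption.READ);
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, options, executor)) {
            ByteBuffer buf = ByteBuffer.allocate(256);
            Future<Integer> pending = ch.read(buf, 0L); // returns immediately
            pending.get();                              // wait for the read to complete
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** Writes a small temp file, reads it back, and releases all resources. */
    static String demo() {
        ExecutorService executor = Executors.newFixedThreadPool(1);
        try {
            Path tmp = Files.createTempFile("aio-executor", ".txt");
            try {
                Files.write(tmp, "overlapped?".getBytes(StandardCharsets.UTF_8));
                return readFirstBytes(tmp, executor);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            executor.shutdown(); // closing the channel does not shut the executor down
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```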
Questions:
Is there any (existing or planned) File I/O API in Java which would use Overlapped I/O on Windows and POSIX AIO on Unices? Update: the Windows-specific Java runtime features sun.nio.ch.WindowsAsynchronousFileChannelImpl, which is exactly an abstraction layer on top of Overlapped I/O.
Are there any plans to provide java.nio.channels.SelectableChannel implementations for File I/O? If not, what are the technical limitations?
It is not really possible. The whole I/O API would have to be re-implemented. NIO means non-blocking I/O, which is not the same as asynchronous I/O. Java implements the non-blocking model, and long story short, that means the OS has no way to notify the runtime that an operation has completed. Instead, Java uses the select() or poll() system calls to check whether data is available.
I could talk about it, but a stolen picture is worth 100 words:
That is why in Java a separate thread is required to constantly call check, check, check, check ...
I don't know the .NET platform, but if what you posted is correct, it is utilizing asynchronous I/O, i.e. the last column. But I don't trust anything that comes from Microsoft.
Hope it answers your question. Here is some additional reading material:
https://stackoverflow.com/a/2625565/8951886
Strangely, I was unable to find on Google a clear answer on NIO.2 async IO performance versus NIO's multiplexed IO via java.nio.channels.Selector.
So, my question is:
Does NIO.2 AsynchronousChannel have better performance than NIO Selector?
Of course, I'm interested in the server side of things under different load profiles: number of simultaneous connections, average connection lifetime, traffic.
The only information I was able to find is that Windows IOCP is slightly better than Windows select.
I don't think NIO.2 will have better performance than NIO, because, at least on Linux, NIO.2 still makes use of select/poll system calls and thread pools to simulate asynchronous IO. One example is that Netty removed NIO.2 support in 4.0.0, because the author thinks that NIO.2 doesn't bring better performance than NIO on the Linux platform.
I read in a couple of comments on SO that Java 7 supports kernel bypass. However, when googling the topic I did not see any immediate examples of this.
Does anyone have an example of Java 7 performing kernel bypass? I'd be interested to see it.
The Answers to this related Question mention that SolarFlare has Java bindings: Networking with Kernel Bypass in Java.
As far as Java 7 is concerned, there is no support for this kind of thing in the core libraries. Kernel bypass is too system / vendor specific for inclusion in the standard APIs.
You can do other things to improve network throughput in Java that don't involve kernel bypass. For instance using the NIO Buffer and Channel APIs ... However, your typical Java "framework" tends to get in the way of this ... by only exposing Stream / Reader and other high level I/O abstractions to "application" code.
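As a small illustration of working with the NIO Buffer and Channel APIs directly, here is a sketch that copies a file through a direct ByteBuffer. The class name `ChannelCopy`, the 64 KiB buffer size, and the temp-file payload are arbitrary choices for the example. It uses file channels for simplicity, and the same read/flip/write/clear cycle applies to socket channels. No throughput gain is claimed or measured here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical demo class; not part of any library.
final class ChannelCopy {

    /** Copies src to dst through a direct buffer and returns the number of bytes written. */
    static long copy(Path src, Path dst) {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            // A direct buffer lives outside the Java heap (assumed size: 64 KiB).
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
            long total = 0;
            while (in.read(buf) != -1) {
                buf.flip();                  // switch from filling to draining
                while (buf.hasRemaining()) {
                    total += out.write(buf);
                }
                buf.clear();                 // make the buffer ready for the next read
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Copies a generated temp file and checks that the copy is byte-identical. */
    static boolean demo() {
        try {
            Path src = Files.createTempFile("copy-src", ".bin");
            Path dst = Files.createTempFile("copy-dst", ".bin");
            try {
                byte[] payload = new byte[200_000];
                for (int i = 0; i < payload.length; i++) {
                    payload[i] = (byte) i;
                }
                Files.write(src, payload);
                long copied = copy(src, dst);
                return copied == payload.length
                        && Arrays.equals(payload, Files.readAllBytes(dst));
            } finally {
                Files.deleteIfExists(src);
                Files.deleteIfExists(dst);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```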
(I would also opine that if you have an application where network latency and throughput are critical enough for kernel bypass to be worthwhile, you should use a programming language that is "closer to the metal". Java is better for applications where the biggest problem is application complexity ... NOT moving lots of bits through the network fast.)
Take a look at the Onload Extensions API JNI Wrapper on github. The author seems to specialize in kernel bypass.
Kernel bypassing is a method of avoiding the kernel when reading/writing to external data sources, e.g. files or networking.
Instead, you directly access the data storage without letting all the bytes running through the OS kernel. This is usually faster but also less secure, since the entire process is not supervised by the operating system anymore.
Assumption:
In regard to Java, the kernel (could) represent(s) the JVM.
I have found a very good article on this.
I've just started exploring Java NIO, non-blocking IO.
I'm interested to know the fundamentals behind the implementation. How is communication between a Java selector and the physical socket established? Is there an operating-system-level thread that polls the underlying resource continuously? And is there a Java thread per selector continuously polling to receive these events? Could someone kindly point me to this?
No, the point of select is that you don't have to waste cycles polling when nothing is happening. Every OS implements this capability in some way or other (usually through hardware interrupts) and makes it available to user-space programs through the select() system call. The connection to the Java language is that the JVM now contains code that will call the OS's select for you if you use the right NIO classes and methods. But this required changes to the JVM code itself; it isn't something that you could have done purely within Java before NIO.
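A minimal sketch of what "the right NIO classes and methods" look like in practice: a Pipe's readable end is registered with a Selector, and the calling thread simply asks the selector which channels are ready. The class name `SelectorDemo`, the "ping" payload, and the 5-second timeout are made up for the example. There is no polling loop in the Java code; `select` parks the thread until the OS reports readiness or the timeout expires.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any library.
final class SelectorDemo {

    /** Returns "<ready before write>:<ready after write>:<data read>". */
    static String run() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {
                source.configureBlocking(false);              // required before registering
                source.register(selector, SelectionKey.OP_READ);

                int before = selector.selectNow();            // nothing written yet

                sink.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));

                int ready = selector.select(5_000);           // waits until readable
                ByteBuffer buf = ByteBuffer.allocate(16);
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isReadable()) {
                        source.read(buf);
                    }
                }
                selector.selectedKeys().clear();              // keys must be removed by hand
                buf.flip();
                String text = StandardCharsets.UTF_8.decode(buf).toString();
                return before + ":" + ready + ":" + text;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```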
Since it is not specified in the documentation, I'd assume that (strictly speaking) this is implementation dependent.
However in *NIX and Windows the implementation typically relies directly on the select system call. This system call is not implemented by spawning multiple threads.
It depends on the operating system used. On Linux the current implementation uses the kernel's epoll mechanism.
Typically the underlying kernel network system is filling or draining buffers for the socket, probably on its IRQ handling threads. So what you are waiting for is the kernel to tell you that a buffer is ready to be filled (writing) or ready to be drained (reading).
I think it's better to first give you a picture (taken from another person's blog):
(source: csdn.net)
Here is also some information taken from that blog:
The select implementation depends on the OS. For epoll/select in a *nix environment, you can get more information from "Unix Network Programming".
To notify or wake up a blocked select, the JVM also uses different implementations, such as TCP/IP on Windows and pipes on *nix.
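From application code, the wakeup mechanism is reached through `Selector.wakeup()`, regardless of how the JDK delivers it underneath. The sketch below blocks one thread in `select()` and wakes it from another. The class name `WakeupDemo` and the 200 ms delay are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

// Hypothetical demo class; not part of any library.
final class WakeupDemo {

    /** Blocks in select() until another thread calls wakeup(); returns the ready count. */
    static int blockThenWake() {
        try (Selector selector = Selector.open()) {
            Thread waker = new Thread(() -> {
                try {
                    Thread.sleep(200);      // let the main thread enter select()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                selector.wakeup();          // forces the blocked select() to return
            });
            waker.start();

            // No channels are registered, so without the wakeup this would block indefinitely.
            int ready = selector.select();
            waker.join();
            return ready;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(blockThenWake());
    }
}
```

If `wakeup()` happens to run before `select()` is entered, the next `select()` returns immediately, so the sketch does not depend on the exact timing.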