Allocation free implementation of AsynchronousByteChannel (Java NIO.2)

All implementations of AsynchronousByteChannel in Java 8 allocate additional objects for each channel read/write operation. That looks strange to me, as this API was intended (*) to be used for high-performance applications, and allocating additional objects on the hot execution path does not match best practices for such applications. Are there any alternative implementations that may yield good network I/O performance without allocations?
Note: One of the reasons I planned to use NIO.2 is IOCP support on Windows. NIO from Java 1.4 does not use it.
(*) Well, I don't know the original intentions for this API, but I assume that "normal" applications are completely satisfied with the common blocking I/O API and just don't need NIO/AIO.

Related

How is it possible to avoid making garbage in Java?

As far as I know, Java does not provide any way to manage memory manually, because all memory management is done by the built-in, automatically running garbage collector, which might be a little inefficient in some cases.
http://www.coralblocks.com/
I have found this website, which says they make Java tools and libraries that work without creating any garbage at all. I would like a logical explanation of how this is possible.

http://www.coralblocks.com/index.php/2015/10/is-coralfix-the-fastest-and-easiest-to-use-fix-engine/
All Coral Blocks components produce zero garbage for the GC in the critical path.
My guess. Pre-allocated buffers, no String objects. As they say:
At Coral Blocks we use Java as a syntax language. Our libraries have zero external dependencies and we don’t even rely on the JDK standard libraries. With CoralFIX you have total control over the critical path.

In fact, the article about CoralFIX says:
Zero Garbage: All Coral Blocks components produce zero garbage for the GC in the critical path.
That's not the same as saying zero garbage at all. And it is achievable (for Coral) only in a relatively small class of applications; i.e. message-based systems where you can do all of the work by in-place matching on the bytes in a message buffer. As soon as you need to use normal data structures or (most) standard library classes you will be generating objects.
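As a rough illustration of what "in-place matching on the bytes in a message buffer" can look like, here is a minimal sketch. The class name `InPlaceParser`, the `|` field separator and the sample message are assumptions made for this example; they are not taken from CoralFIX or any other library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reads fields straight out of a reused message buffer. */
final class InPlaceParser {

    /**
     * Parses an unsigned decimal integer from buf[start, end) without
     * creating a String or any other intermediate object.
     */
    static long parseUnsignedLong(ByteBuffer buf, int start, int end) {
        if (start >= end) {
            throw new NumberFormatException("empty field");
        }
        long value = 0;
        for (int i = start; i < end; i++) {
            int digit = buf.get(i) - '0'; // absolute get: buffer position is untouched
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("non-digit at index " + i);
            }
            value = value * 10 + digit;
        }
        return value;
    }

    /** Returns the index of the first occurrence of b in buf[from, to), or -1. */
    static int indexOf(ByteBuffer buf, byte b, int from, int to) {
        for (int i = from; i < to; i++) {
            if (buf.get(i) == b) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Allocated once at start-up and reused for every message.
        ByteBuffer buf = ByteBuffer.allocateDirect(64);
        // Test data only; a real system would read from the network into buf.
        buf.put("38=12345|".getBytes(StandardCharsets.US_ASCII));
        buf.flip();

        int eq = indexOf(buf, (byte) '=', 0, buf.limit());
        int sep = indexOf(buf, (byte) '|', eq + 1, buf.limit());
        System.out.println(parseUnsignedLong(buf, eq + 1, sep)); // prints 12345
    }
}
```

On the normal path the parsing methods only touch primitives and the pre-allocated buffer; the error path still allocates an exception, which a stricter design would also avoid.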
And ...
At Coral Blocks we use Java as a syntax language.
In other words, Coral Blocks application programmers don't write Java code!
Is it possible to write your code to do the same thing?
In theory yes, but in practice probably no. You would need to replace so much of the functionality of the Java SE libraries (and 3rd party libraries) that you would be better off writing your application [1] in a different programming language.
[1] I guess, if your application was simple and had minimal Java SE and external library dependencies, then it would be feasible to do it. But few non-trivial applications are like that.

It's not possible to completely cease creating garbage, and it's premature to try to optimize out garbage creation except for certain specific tasks and on extremely memory constrained systems. A great deal of tasks will cause allocations of some sort.
However, garbage can be decreased but not eliminated by:
Pooling and re-using some object references.
Allocating large blocks of data off-heap and managing them manually.
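The two techniques in the list above can be combined. The following is a minimal sketch, assuming a single-threaded caller; `BufferPool` is a hypothetical class written for this example, not part of the JDK or any library mentioned here.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/** Hypothetical fixed-size pool of direct (off-heap) buffers. Not thread-safe. */
final class BufferPool {
    private final ArrayDeque<ByteBuffer> free;
    private final int bufferSize;

    BufferPool(int count, int bufferSize) {
        this.bufferSize = bufferSize;
        this.free = new ArrayDeque<>(count);
        // All allocation happens up front, outside the critical path.
        for (int i = 0; i < count; i++) {
            free.push(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    /** Hands out a pooled buffer; falls back to allocating if the pool is empty. */
    ByteBuffer acquire() {
        ByteBuffer b = free.poll();
        return b != null ? b : ByteBuffer.allocateDirect(bufferSize);
    }

    /** Resets the buffer and returns it to the pool for reuse. */
    void release(ByteBuffer b) {
        b.clear();
        free.push(b);
    }

    int available() {
        return free.size();
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(2, 1024);
        ByteBuffer first = pool.acquire();
        first.putInt(7);
        pool.release(first);
        ByteBuffer second = pool.acquire();
        System.out.println(first == second); // prints true: the same buffer was reused
    }
}
```

The pool reduces garbage only while callers return every buffer they take; a buffer that is never released simply becomes ordinary garbage again.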

You can't avoid making garbage in Java; you can reduce it, though. Good and efficient code usually doesn't leave any variables without use. The one way you can avoid making garbage is by paying attention to what you leave unused.

Java NIO windows implementation

While working on a project using the NIO.2 AIO features, I looked at the "old" NIO selector implementation and saw that on Windows the default select function is used, which does not scale at all on Windows due to a bad internal implementation. Everybody knows that on Windows IOCP is the only real solution. Of course the callback-on-completion model does not fit into the NIO selector model, but does this effectively mean that using NIO on Windows is basically not a good idea?
For instance: The new AIO features include an IOCP implementation.
This is especially true when using the latest Netty framework, where support for AIO has been dropped. So Netty is not as fast on Windows as it could be?

NIO.2 uses IOCP
The call tree from "Java 7: NIO.2 File Channels on the test bench" demonstrates this for file I/O: several of the called class names contain "Iocp".
See also sun.nio.ch.Iocp.java, the "Windows implementation of AsynchronousChannelGroup encapsulating an I/O completion port".
NIO does not make use of IOCP, as it only supports "non-blocking i/o" (selectors), not "asynchronous i/o" (completion handlers) that was only added with NIO.2.
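To make the "completion handler" model concrete, here is a minimal sketch that reads a small temporary file through `AsynchronousFileChannel`. The class name `CompletionHandlerDemo` and its helper methods are assumptions for this example. Which native mechanism backs the call depends on the platform and JDK; the code only shows the API shape.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class CompletionHandlerDemo {

    /** Creates a temporary file containing the given ASCII text. */
    static Path tempFileWith(String text) {
        try {
            Path file = Files.createTempFile("aio-demo", ".txt");
            file.toFile().deleteOnExit();
            return Files.write(file, text.getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts one asynchronous read at offset 0 and waits for its callback. */
    static int readAsync(Path file, ByteBuffer dst) {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            CountDownLatch done = new CountDownLatch(1);
            AtomicInteger result = new AtomicInteger(-1); // stays -1 if the read fails

            ch.read(dst, 0L, null, new CompletionHandler<Integer, Void>() {
                @Override
                public void completed(Integer bytesRead, Void attachment) {
                    result.set(bytesRead);
                    done.countDown();
                }

                @Override
                public void failed(Throwable error, Void attachment) {
                    done.countDown();
                }
            });

            done.await(); // the callback runs on a thread owned by the channel group
            return result.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Path file = tempFileWith("hello");
        ByteBuffer dst = ByteBuffer.allocate(16);
        int n = readAsync(file, dst);
        dst.flip();
        System.out.println(n + " bytes: " + StandardCharsets.US_ASCII.decode(dst));
    }
}
```

With a selector, the application asks which channels are ready and then performs the read itself; here the read is handed off and the handler is invoked once it has finished. Blocking on a latch is only for demonstration and would defeat the purpose in a real server.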

I think you are confusing asynchronous with faster. Certainly NIO buffers are faster than serializing the same data that would be in the buffers, but many AIO techniques incur costs and delays that can give synchronous IO an advantage.
There was an article a while back that did some pretty good benchmarking of various IO techniques, and the results were (a bit) surprising. The Netty people probably decided to align with the better performing (blocking) IO models mentioned.

The problem with IOCP and Java is that IOCP creates and manages threads. My understanding is that for IOCP to work in Java, the event system actually has to go through the Windows IOCP Thread, then scheduled on the Java ThreadPool for execution. This makes IOCP very very expensive to implement in Java versus C++/C#.
AIO was probably removed from Netty because no one wants to sacrifice 450,000 potential transactions just to use AIO versus NBIO. The transactional performance gap between AIO and NBIO is huge.

Kernel bypass with Java 7?

I read in a couple of comments on SO that Java 7 supports kernel bypass. However, when googling the topic I did not see any immediate examples of this.
Does anyone have an example of Java 7 performing kernel bypass? I'd be interested to see it.

The Answers to this related Question mention that SolarFlare has Java bindings: Networking with Kernel Bypass in Java.
As far as Java 7 is concerned, there is no support for this kind of thing in the core libraries. Kernel bypass is too system / vendor specific for inclusion in the standard APIs.
You can do other things to improve network throughput in Java that don't involve kernel bypass. For instance using the NIO Buffer and Channel APIs ... However, your typical Java "framework" tends to get in the way of this ... by only exposing Stream / Reader and other high level I/O abstractions to "application" code.
(I would also opine that if you have an application where network latency and throughput are critical enough for kernel bypass to be worthwhile, you should use a programming language that is "closer to the metal". Java is better for applications where the biggest problem is application complexity ... NOT moving lots of bits through the network fast.)

Take a look at the Onload Extensions API JNI Wrapper on github. The author seems to specialize in kernel bypass.

Kernel bypassing is a method of avoiding the kernel when reading/writing to external data sources, e.g. files or networking.
Instead, you directly access the data storage without letting all the bytes running through the OS kernel. This is usually faster but also less secure, since the entire process is not supervised by the operating system anymore.
Assumption:
In regard to Java, the kernel (could) represent(s) the JVM.
I have found a very good article on this.

Are Threads in Java platform dependent?

It is obvious that OS scheduling/threading algorithms have an impact on Java threads, but can we safely say that threads are OS/machine dependent?
If this is the case, doesn't that make Java platform dependent?

Yes, the details of the scheduling of threads in Java depend on the JVM implementation and (usually) on the OS implementation as well.
But the specifics of that scheduling are also not specified in the Java SE specification; only a few ground rules are specified.
This means that as long as the OS specific scheduling is conforming to those ground rules, it is also conforming to the JVM spec.
If your code depends on scheduling specifics that are not specified in the JVM spec, then it depends on implementation details and can not be expected to work everywhere.
That's pretty much the same situation as file I/O: if you hard-code paths and use a fixed directory separator, then you're working outside the spec and can not expect your code to work cross-platform.
Edit: The JVM implementation itself (i.e. the JRE) is platform dependent, of course. It provides the layer that allows pure Java programs to not care about the platform specifics. To achieve this, the JRE has to be platform specific.
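The file I/O comparison in this answer can be sketched as follows. `PortablePaths` and the directory names are made up for illustration; the point is only that the path is assembled through the `Path` API rather than by concatenating a fixed separator.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

final class PortablePaths {

    /** Builds base/data/name using whatever separator the platform uses. */
    static Path dataFile(Path base, String name) {
        return base.resolve("data").resolve(name);
    }

    public static void main(String[] args) {
        // Non-portable: "app\\data\\log.txt" bakes in one platform's separator.
        Path portable = dataFile(Paths.get("app"), "log.txt");
        System.out.println(portable);
        System.out.println("separator on this platform: " + File.separator);
    }
}
```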

... Java will usually use native threads, but on some operating systems it uses so called "green threads", which the JVM handles itself and is executed by a single native thread.
You shouldn't have to worry about this. It is all handled by the JVM, and is invisible to the programmer. The only real difference I can think of is that on an implementation that uses green threads, there will be no performance gain from multi-threaded divide-and-conquer algorithms. However, the same lack of performance gain is true for implementations that use native threads, but run on a machine with a single core.
Excerpt from JVM & Java Threads Scheduling

Even on the same platform, if you write unsafe multi-thread code, behavior can depend on the full configuration details, the rest of the machine load, and a lot of luck, as well as hardware and OS. An unsafe program can work apparently correctly one day, and fail the next on the same hardware with more-or-less the same workload.
If you write safe multi-thread code, code that depends only on what is promised in the Java Language Specification and the library APIs, the choice of platform can, of course, affect performance, but not whether it works functionally.
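As a small sketch of code that "depends only on what is promised", the counter below uses `AtomicInteger` and `Thread.join`, both of which have documented guarantees. `SafeCounter` is a name invented for this example. Replacing the atomic with a plain `int` field incremented via `++` would be the unsafe variant described above, whose result can vary from run to run.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SafeCounter {

    /** Increments a shared counter from several threads and returns the total. */
    static int countWithThreads(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // join guarantees the worker's updates are visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 100_000)); // prints 400000
    }
}
```

How long this takes depends on the platform and scheduler, but the returned total does not.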

Any concept of shared memory in Java

AFAIK, memory in Java is based on a heap from which memory is allotted to objects dynamically, and there is no concept of shared memory.
If there is no concept of shared memory, then communication between Java programs should be time consuming. In C, inter-process communication is quicker via shared memory than via other modes of communication.
Correct me if I'm wrong. Also, what is the quickest way for two Java programs to talk to each other?

A few ways:
RAM Drive
Apache APR
OpenHFT Chronicle Core
Details here and here with some performance measurements.

Since there is no official API to create a shared memory segment, you need to resort to a helper library/DLL and JNI to use shared memory to have two Java processes talk to each other.
In practice, this is rarely an issue since Java supports threads, so you can have two "programs" run in the same Java VM. Those will share the same heap, so communication will be instantaneous. Plus you can't get errors because of problems with the shared memory segment.
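A minimal sketch of two "programs" running as threads in one JVM and exchanging data through an object on the shared heap. `InJvmMessaging` and the choice of `ArrayBlockingQueue` are assumptions for this example; any thread-safe structure would serve.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class InJvmMessaging {

    /** A producer thread sends 1..messages; the calling thread sums what it receives. */
    static long sumViaQueue(int messages) {
        // The queue lives on the heap that both threads share.
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= messages; i++) {
                    queue.put(i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        long sum = 0;
        try {
            for (int i = 0; i < messages; i++) {
                sum += queue.take(); // blocks while the queue is empty
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumViaQueue(1000)); // prints 500500
    }
}
```

No serialization or system call is needed to hand a reference from one thread to the other, although the queue's internal locking still has a cost.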

Java Chronicle is worth looking at; both Chronicle-Queue and Chronicle-Map use shared memory.
These are some tests that I had done a while ago comparing various off-heap and on-heap options.

One thing to look at is using memory-mapped files, using Java NIO's FileChannel class or similar (see the map() method). We've used this very successfully to communicate (in our case one-way) between a Java process and a C native one on the same machine.
I'll admit I'm no filesystem expert (luckily we do have one on staff!) but the performance for us is absolutely blazingly fast -- effectively you're treating a section of the page cache as a file and reading + writing to it directly without the overhead of system calls. I'm not sure about the guarantees and coherency -- there are methods in Java to force changes to be written to the file, which implies that they are (sometimes? typically? usually? normally? not sure) written to the actual underlying file (somewhat? very? extremely?) lazily, meaning that some proportion of the time it's basically just a shared memory segment.
In theory, as I understand it, memory-mapped files CAN actually be backed by a shared memory segment (they're just file handles, I think) but I'm not aware of a way to do so in Java without JNI.
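The memory-mapped approach described above can be sketched like this. For brevity both mappings are created in one process; in real use the "writer" and "reader" would be separate processes opening the same file. `MappedFileIpc`, the 4096-byte size and the temporary file are assumptions for this example, and visibility between mappings relies on the operating system sharing the same pages, which mainstream systems do for local files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedFileIpc {
    static final int SIZE = 4096;

    /** Creates a zero-filled temporary file of SIZE bytes to back the mapping. */
    static Path newTempFile() {
        try {
            Path file = Files.createTempFile("mapped-ipc", ".bin");
            file.toFile().deleteOnExit();
            return Files.write(file, new byte[SIZE]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps the first SIZE bytes of the file read-write. */
    static MappedByteBuffer map(Path file) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, SIZE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = newTempFile();
        MappedByteBuffer writer = map(file); // would live in process A
        MappedByteBuffer reader = map(file); // would live in process B

        writer.putLong(0, 42L);
        System.out.println(reader.getLong(0)); // prints 42

        writer.force(); // asks the OS to flush changes to the underlying file
    }
}
```

Plain `putLong`/`getLong` calls give no ordering guarantees between concurrently running processes or threads, so a real protocol needs some form of synchronization on top of this.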

Shared memory is sometimes quick. Sometimes it's not: it hurts CPU caches, and synchronization is often a pain (and, should it rely upon mutexes and such, can be a major performance penalty).
Barrelfish is an operating system that demonstrates that IPC using message passing is actually faster than shared memory as the number of cores increases (on conventional x86 architectures as well as the more exotic NUMA NUCA stuff you'd guess it was targeting).
So your assumption that shared memory is fast needs testing for your particular scenario and on your target hardware. It's not a generically sound assumption these days!

There's a couple of comparable technologies I can think of:
A few years back there was a technology called JavaSpaces but that never really seemed to take hold, a shame if you ask me.
Nowadays there are the distributed cache technologies, things like Coherence and Tangosol.
Unfortunately neither will have the outright speed of shared memory, but they do deal with the issues of concurrent access, etc.

The easiest way to do that is to have two processes map the same memory-mapped file. In practice they will be sharing the same off-heap memory space. You can grab the address of this memory and use sun.misc.Unsafe to write/read primitives. It supports concurrency through the putXXXVolatile/getXXXVolatile methods. Take a look at CoralQueue, which offers IPC as well as inter-thread communication inside the same JVM.
Disclaimer: I am one of the developers of CoralQueue.

Similar to Peter Lawrey's Java Chronicle, you can try Jocket.
It also uses a MappedByteBuffer but does not persist any data and is meant to be used as a drop-in replacement to Socket / ServerSocket.
Roundtrip latency for a 1kB ping-pong is around a half-microsecond.

MappedBus (http://github.com/caplogic/mappedbus) is a library I've added on GitHub which enables IPC between multiple (more than two) Java processes/JVMs by message passing.
The transport can be either a memory mapped file or shared memory. To use it with shared memory simply follow the examples on the github page but point the readers/writers to a file under "/dev/shm/".
It's open source and the implementation is fully explained on the github page.

The information provided by Cowan is correct. However, even shared memory won't always appear to be identical in multiple threads (and/or processes) at the same time. The key underlying reason is the Java memory model (which is built on the hardware memory model). See Can multiple threads see writes on a direct mapped ByteBuffer in Java? for a quite useful discussion of the subject.
