Why can't Java IO implement async reading?

In Java IO we use Stream and Reader, while in NIO we use Channel and Selector.
They both do the same thing, but their structure is totally different.
So why didn't they write a new Stream like "AsyncStream" or a Reader like "AsyncReader" to implement what NIO has implemented? Then we would have only one structure, which would be beautiful.
So why can't Java IO implement async reading?
What are the difficulties in implementing async reading using Java IO?
Or what is the advantage of writing a new framework instead of using the existing one?

I tried to understand your idea in the comment above, but I'm afraid that there is insufficient detail to understand it as a proposal. Without that, it is impossible to make a judgment on whether it would work or not.
However, there are a couple of points that can be made in response:
IF Java was a shiny new (and unreleased) language, THEN they could improve on the I/O API design in a few areas.
In addition, IF someone came up with a well-considered, fleshed out API design that combined sync and async capabilities, THEN that could be considered.
But Java is NOT a shiny new language. It is an old programming language, with billions of lines of source code written in it. The kind of change that you are contemplating on a central API would either create huge binary compatibility problems for Oracle's paying customers, or it would lead to a huge legacy code problem. It simply would not happen.
Setting that aside, if you attempted to merge sync and async capabilities into a single API, you risk creating a situation where:
custom stream types need to implement a lot of extra functionality, and / or
the merger leads to unanticipated performance issues.
Now these concerns may not be warranted.
However, we cannot tell without seeing a concrete API design proposal and an implementation to try out for usability and performance. Bear in mind that the original elegant API design for I/O streams and (then) readers / writers actually turned out to have a variety of problems. These did not become apparent until people used the APIs in production code. For example:
character encoding concerns led to the introduction of Reader / Writer in Java 1.1.
performance analysis identified that memory-to-memory copying was a problem, leading to the introduction of Buffer and so on.
In summary, it is really hard to design a good I/O API ... and Java is unlikely to change anyway.
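For context, the split the question is about can be sketched with the two APIs as they exist today. This is only an illustration, not a proposal for a merged API: the class name `SyncVsAsyncRead` and its two helper methods are made up for the example, and the async variant assumes the whole file arrives in a single read, which holds for small files but is not guaranteed in general.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

class SyncVsAsyncRead {

    // Stream style: each read() call blocks the calling thread until data is available.
    static String readBlocking(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Asynchronous channel style: read() returns a Future at once;
    // the caller chooses when (or whether) to wait for the result.
    static String readAsync(Path file) throws Exception {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            Future<Integer> pending = ch.read(buf, 0); // does not block
            // ... other work could be done here ...
            pending.get();                             // now wait for completion
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("sync-async", ".txt");
        Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readBlocking(file));
        System.out.println(readAsync(file));
    }
}
```

Note how the caller-facing contract differs: the stream version has nowhere to hand back a pending result, which is part of why bolting async behaviour onto the existing stream types would change what every custom stream must implement.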

Related

Are ReadableByteChannel and WritableByteChannel a replacement of InputStream and OutputStream?

As far as I know, ReadableByteChannel and WritableByteChannel from the nio package can be considered a replacement for InputStream and OutputStream from the legacy io package. In fact, they can be used to do the same I/O operations and more.
Nevertheless, they still do not seem to be used much. Also, their support in popular libraries is quite poor. For example, the Guava team is even thinking about dropping it.
Is there any reason for new code dealing with I/O to use Streams rather than Channels?

There is a historical context that should not be ignored. When NIO was introduced (back in Java 1.4), a lot of features were missing, and the promise to deliver them later was not kept for a long time. Recall that in Java 1.4 the way to get a FileChannel was to first create a FileInputStream, FileOutputStream, or RandomAccessFile and invoke getChannel on it. Thus, there was no way to write code using NIO/channels without using the older IO/stream API.
The first Java version providing a way to create a FileChannel without using the old API is Java 7. This is also the first version providing an NIO API for directory scan, change notifications and advanced file attributes.
So the NIO API can be considered quite new regarding these features, and it’s not that surprising that it takes some time for developers to adopt it. By the way, this kind of adoption might include removing utility methods which became unnecessary with the new API, like the proposal you have linked. As far as I can see, it really means removing just four methods whose functionality is trivial when using the most recent API.
Obviously, when you want to use one of the newer features like memory mapping or the Java 7 features named above, you have to use the NIO API. On the other hand, when you want to use Serialization or Zip/GZip (de)compression, the only channel support Java offers is to wrap your channel into a stream…
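A minimal sketch of that last point, assuming Java 7 or later: the channel is opened with `FileChannel.open` (no java.io class involved) and then wrapped with `Channels.newOutputStream` / `Channels.newInputStream` so that the stream-only GZIP classes can be used. The class name `GzipOverChannel` and its method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipOverChannel {

    // Compress text into a file via a channel wrapped as an OutputStream.
    static void writeGzip(Path file, String text) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                 StandardOpenOption.CREATE,
                 StandardOpenOption.WRITE,
                 StandardOpenOption.TRUNCATE_EXISTING);
             OutputStream out = new GZIPOutputStream(Channels.newOutputStream(ch))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Decompress the file again via a channel wrapped as an InputStream.
    static String readGzip(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ);
             InputStream in = new GZIPInputStream(Channels.newInputStream(ch))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```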

It really depends on what you are doing. If you are dealing directly with files and/or sockets, there are many advantages to using nio. In some cases (especially around sockets) new functionality is /only/ being exposed through nio. Some of those advantages are muted if you also compress or encrypt data, where most apis only support streams. Historically the servlet-api spec was all based on streams. However, there is growing support for nio in that space as well.

Immediate performance benefits are gained by using NIO Channels with ByteBuffers. Also if you need to block on multiple things Selectors can be handy.

How to choose between Java NIO and IO?

As we know, if we use traditional IO to build a server, it must block somewhere, so we have to use a loop or a one-thread-per-socket model. NIO therefore seems to be the better choice. Is NIO always the better choice?

IMHO, Blocking IO is generally the simplest to use, and unless you have a specific requirement which demands more from your system, you should stick with simplest option.
The next simplest option is blocking NIO, which I often prefer if I want more efficiency or control than IO gives. It is still relatively simple but allows you to use ByteBuffers; e.g. ByteBuffers support little endian.
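As a small illustration of the little-endian point, here is a sketch using `ByteBuffer.order`. The class `LittleEndianDemo` and its methods are hypothetical helpers, not part of any API mentioned above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class LittleEndianDemo {

    // Encode an int with the least significant byte first.
    static byte[] encode(int value) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(value);
        return buf.array();
    }

    // Decode four little-endian bytes back into an int.
    static int decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        byte[] bytes = encode(0x01020304);
        // 0x01020304 is stored as the bytes 04 03 02 01
        System.out.printf("%02x %02x %02x %02x%n", bytes[0], bytes[1], bytes[2], bytes[3]);
        System.out.println(Integer.toHexString(decode(bytes)));
    }
}
```

With plain java.io, `DataInputStream` and `DataOutputStream` are always big-endian, so the same task would need manual byte shuffling.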
A common option is to use non-blocking NIO with Selectors. Much of the complexity this introduces can be handled by frameworks such as Netty or Mina. I suggest you use such a library if you need non-blocking IO, e.g. because you have thousands of concurrent connections per server. IMHO, if you have thousands of connections, you should consider having more servers unless what each connection does is pretty trivial. AFAIK Google goes for more servers rather than thousands of users per server.
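To give a feel for what the frameworks hide, here is a deliberately tiny select loop. It is a sketch under simplifying assumptions: it watches a single in-process `Pipe` instead of sockets so that it runs on its own, and the class name `SelectLoopSketch` is invented. A real server would register a `ServerSocketChannel` and many `SocketChannel`s with the same selector and would also handle accept and write readiness.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class SelectLoopSketch {

    // Waits until the channel is readable, then returns whatever text was available.
    static String readWhenReady(Pipe.SourceChannel source) throws IOException {
        source.configureBlocking(false);          // selectors only accept non-blocking channels
        StringBuilder received = new StringBuilder();
        ByteBuffer buf = ByteBuffer.allocate(256);
        try (Selector selector = Selector.open()) {
            source.register(selector, SelectionKey.OP_READ);
            boolean open = true;
            while (open && received.length() == 0) {
                selector.select();                // blocks until a registered channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();                // handled keys must be removed by hand
                    if (!key.isReadable()) {
                        continue;
                    }
                    ReadableByteChannel ch = (ReadableByteChannel) key.channel();
                    buf.clear();
                    int n = ch.read(buf);
                    if (n < 0) {
                        open = false;             // other end closed the pipe
                    } else {
                        buf.flip();
                        received.append(StandardCharsets.UTF_8.decode(buf));
                    }
                }
            }
        }
        return received.toString();
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        pipe.sink().write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));
        System.out.println(readWhenReady(pipe.source()));
    }
}
```

Even in this reduced form the bookkeeping (non-blocking mode, key removal, buffer flipping, end-of-stream handling) is visible, which is the complexity the answer suggests delegating to a library.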
The more extreme option is to use NIO2. This is even more complex and lengthy to write than non-blocking NIO. I don't know of any frameworks which support this well, i.e. where it is actually faster when you do use it. AFAIK it is worth using if you have Infiniband (which is what it was designed to support) but perhaps not worth using if you have Ethernet.

If you want non-blocking IO, NIO is not the better choice—it's the only choice in Java. Keep in mind that people still use the old IO regularly because it is way simpler to code against. NIO API is quite raw and is more of an enabling low-level technology than a client-side API. I suggest using NIO through an API that provides a simpler interface to the problems you want to solve using non-blocking IO.

A little late, but personally, I use NIO even for the regular "everyday" file handling. So, I use things like:
1. if(Files.notExists(path)) { }
2. Files.createDirectory(path);
3. Files.newInputStream(path); targetPath.resolve("somefile.txt");
4. Files.newBufferedWriter(path, charset);
5. DirectoryStream<Path> directoryStream = Files.newDirectoryStream(path);
and it works fine for me. I prefer Path instead of the old File because of methods like relativize or resolveSibling.
Doesn't strike me as more complicated than IO.
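Put together, the calls listed above might look like the following sketch. The class `EverydayNio`, the directory name `notes` and the file name `hello.txt` are made up for the example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class EverydayNio {

    // Creates dir/notes/hello.txt and returns the file names found in dir/notes.
    static List<String> writeAndList(Path dir) throws IOException {
        Path notes = dir.resolve("notes");
        if (Files.notExists(notes)) {
            Files.createDirectory(notes);
        }
        Path file = notes.resolve("hello.txt");
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            writer.write("hi");
        }
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(notes)) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        return names;
    }

    // resolveSibling swaps the last path element, e.g. notes/hello.txt -> notes/bye.txt.
    static Path sibling(Path file, String otherName) {
        return file.resolveSibling(otherName);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("everyday-nio");
        System.out.println(writeAndList(dir));
        System.out.println(sibling(dir.resolve("notes").resolve("hello.txt"), "bye.txt"));
    }
}
```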

You would only use NIO if you can justify the inevitable complexity that it introduces. If you do not have any guidance in terms of the expected load, and also in terms of whether your product / project has the resources to maintain the relevant code, then you should err on the side of caution and use IO.
To give my answer some weight, I have just spent three months maintaining and bug fixing an integration layer that used raw Java NIO (i.e. without an overarching framework). The design was based, in essence, on client threads adding messages to a queue and a small number of worker threads performing their NIO magic and then passing replies back to client threads in an event-based manner. In retrospect, I cannot justify the original decision to use NIO, since it became a distraction that ate significant amounts of time that should have been spent on higher level business logic.

You can use either of these, unless you are going to create a "super fast" server.
In that case a good approach is to use NIO, since it is the newer, more modern way to write multi-client servers for high-throughput tasks.

Some advantages of the NIO.2 API over the legacy java.io.File class for working with files:
Supports file system–dependent attributes.
Allows you to traverse a directory tree directly.
Supports symbolic links.
For specific use cases and more details, you can see this article
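As one hedged example of the directory-traversal point, `Files.walkFileTree` can collect every regular file below a root. The class `TreeWalk` and its method are illustrative names only. By default the walk does not follow symbolic links.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

class TreeWalk {

    // Returns the paths of all regular files under root, relative to root.
    static List<Path> regularFiles(Path root) throws IOException {
        List<Path> found = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // attrs is supplied by the walk, so no extra file system call is needed here
                if (attrs.isRegularFile()) {
                    found.add(root.relativize(file));
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return found;
    }
}
```

With java.io.File the equivalent needs hand-written recursion over `listFiles()` and gives no direct way to tell a symbolic link from its target.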

Traditional IO gives simple, easy code; NIO is more complicated but more flexible.
In my case I prefer to use IO for small streams and NIO for large streams, but NIO really is more complex.
With NIO I have to create an entire package to manage it, whereas with the io package I can directly use a snippet.

Find an efficient way to integrate different language libraries into one project using Python as the "glue"

I am about to get involved in an NLP-related project and I need to use various libraries. Some are in Java, others in C/C++ (for tasks that require more speed) and finally some are in Python. I was thinking of using Python as the "glue" and creating wrapper classes for every task that I want to do that relies on a different language. In order to do that, the wrapper class, for example, would execute the Java program and communicate with it using pipes.
My questions are:
Do you think that would work for cpu-demanding and highly repetitive tasks? Or would the overhead added by the pipe-communication be too heavy?
Is there any other (preferably simple) architecture that you would suggest?

I would simply advise not doing this.
Don't implement stuff in C/C++ "for speed". The performance benefit is not likely to be as great as you expect; e.g. compared with implementing in Java using "best practice" design and performance techniques.
Don't try and glue lots of languages together. You are setting yourself up for lots of portability issues, difficulties in debugging, and reliability issues; e.g. due to C / C++ bugs crashing the JVM. In addition, there are performance overheads in bridging between languages, and there can be unexpected bottlenecks. (For instance, you may find that your C/C++ has to be run single-threaded due to threading issues, and that you therefore can't get the benefit of Java multi-threading on a typically multi-core system.)
Instead, I advise you to look for libraries that allow you to implement the entire application in one language. If that is not possible, design it so that the different language components are different executables / processes, communicating via some kind of RPC, messaging, or whatever.

Whether or not you'd have problems communicating over pipes / sockets has nothing to do with how CPU intensive the tasks are, but how frequently you'd need to send information between the processes and how much data they need to send. Setting up threads to do your communication will have little processing overhead.
You can probably automatically wrap the C/C++ code with Python (SWIG, ctypesgen, Boost.Python), so the only glue you'll have to write yourself would then be talking to Java.
You could also do it the other way -- run the Python code in the JVM with Jython so the Python and Java code are together, then talk to the C/C++ from there.

You should take a look at Apache UIMA. It is designed exactly for this. From the project website:
The Frameworks run the components, and are available for both Java and C++. The Java Framework supports running both Java and non-Java components (using the C++ framework). The C++ framework, besides supporting annotators written in C/C++, also supports Perl, Python, and TCL annotators.
UIMA can manage pipes and annotators and is built to scale.

I would look at Jepp or JPype instead of using IPC for this. I would avoid Jython since loading the C/C++ libraries into Java would probably be harder than into CPython.

1) Do you think that would work for cpu-demanding and highly repetitive tasks? Or would the overhead added by the pipe-communication be too heavy?
Depends on your task. If this is a typical NLP app where you have a large model loaded in memory and you only communicate relatively small pieces of data (strings in, label sequences/parse trees out), it may work. Pipe communication is hard to get right, though, since there are a lot of buffering and synchronization issues you have to tackle. Python is a very good glue language, but it doesn't solve everything.
2) Is there any other (preferably simple) architecture that you would suggest?
Make your NLP components services and connect to them via REST interfaces. There are off-the-shelf tools that do this, e.g. CLAM. Pyro and SPIRO make communication between Java and Python even more direct and might be easier to use than HTTP/REST (but YMMV).
The parts that are written in C/C++ can also be integrated with CPython using Cython. Don't start implementing things in C or C++ because you think they'll be faster, though; you can also implement them in Python first, then see if you can get the desired performance with NumPy and/or Cython.
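On the buffering issue raised under question 1, here is a sketch of what the Java side of a pipe-based worker could look like. Everything in it is an assumption made for illustration: the line-per-request protocol, the class name `PipeWorker`, and the `process` method, which merely upper-cases its input as a stand-in for a real NLP call.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class PipeWorker {

    // Hypothetical placeholder for the real work (tagging, parsing, ...).
    static String process(String line) {
        return line.toUpperCase(Locale.ROOT);
    }

    // Reads one request per line and writes one reply per line.
    static void serve(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(process(line));
            writer.write('\n');
            // Without this flush the reply can sit in the buffer while the
            // parent process waits for it, and both sides stall.
            writer.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        serve(new InputStreamReader(System.in, StandardCharsets.UTF_8),
              new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
    }
}
```

The calling process has to follow the same discipline: terminate each request with a newline, flush, and only then read the reply. Keeping the worker alive across requests also means the large model is loaded only once.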

Java app & C++ app integration / communication

We have two code bases, one written in C++ (MS VS 6) and another in Java (JDK 6).
Looking for creative ways to make the two talk to each other.
More Details:
Both applications are GUI applications.
Major rewrites or translations are not an option.
Communication needs to be two-way.
Try to avoid anything involving writing files to disk.
So far the options considered are:
ZeroMQ
RPC
CORBA
JNI
Compiling Java to native code, and then linking
Essentially, apart from the last item, this boils down to a choice between various ways to achieve interprocess communication between a Java application and a C++ application. Still open to other creative suggestions!
If you have attempted this, or something similar before please chime in with your suggestions, lessons learnt, pitfalls to avoid, etc.
Someone will no doubt point out shortly, that there is no one correct answer to this question. I thought I would tap on the collective expertise of the SO community anyway, and hope to get many excellent answers.

Well, it depends on how tightly integrated you want these applications to be and how you see them evolving in the future. If you just want to communicate data between the two of them (e.g. you want one to be able to open a file written by the other, or read a stream directly from the other), then I would say that protocol buffers are your best bet. If you want the window rendered by one of these GUI apps to actually be embedded in a panel of the other GUI app, then you probably want to use the JNI approach. With the JNI approach, you can use SWIG to automate a great deal of it, though it is dangerously magical and comes with a number of caveats (e.g. it doesn't do so well with function overloading).
I strongly recommend against CORBA, RMI, and similar remote-procedure-call implementations, mostly because, in my experience, they tend to be very heavy-weight and consume a lot of resources. If you do want something similar to RMI, I would recommend something lighter weight where you pass messages, but not actual objects (as is the case with RMI). For example, you could use protocol buffers as your message format, and then simply serialize these back and forth across normal sockets.
Kit Ho mentioned XML or JSON, but protocol buffers are significantly more efficient than either of those formats and also have notions of backwards-compatibility built directly into the definition language.
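Sending serialized messages over a plain socket needs some way to mark where one message ends. One common approach, shown here only as a sketch and not as anything the answer prescribes, is a length prefix in front of each payload. The class `Framing` is a made-up name, and the payload is treated as opaque bytes; with protocol buffers it would be the serialized message, which is left out here to keep the example free of external libraries.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class Framing {

    // Writes a 4-byte big-endian length followed by the payload bytes.
    static void writeMessage(OutputStream out, byte[] payload) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    // Reads one length-prefixed message; blocks until all of it has arrived.
    static byte[] readMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        byte[] payload = new byte[length];
        data.readFully(payload);
        return payload;
    }
}
```

In use, `out` and `in` would be the streams of a `java.net.Socket`. The C++ side would need to read the same four bytes and convert them from network byte order before reading the payload.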

Use Jacob ( http://sourceforge.net/projects/jacob-project ), JCom ( http://sourceforge.net/projects/jcom ), or j-Interop ( http://j-interop.org ) and use COM for communication.

Since you're using Windows, I'd suggest using DDE (Dynamic Data Exchange). There's a Java library available from Java Parts.

I don't know how much data, and what type of data, you want to transfer.
But to keep things simple, I suggest using XML or JSON over HTTP.
There are lots of libraries for both languages, so you won't need to spend much effort implementing and understanding it.
Moreover, if you later have additional applications to talk to, it won't be hard, since both technologies are cross-language.
Correct me if I am wrong.

Java NIO, to use or not to use a framework?

I'm developing a Java-based server with NIO multiplexing, and I have started to see a lot of frameworks... I don't understand whether these frameworks (for example Netty) only make life easier or also improve performance.

No framework can increase performance of what's underneath it. In the case of NIO I've come around to the view that it already is a framework itself. I've reviewed a couple of NIO frameworks such as Mina, and indeed wrote one myself, but my own conclusion is that this is largely wasted effort, that ultimately gets in the way one way or another. All you need is a well-written select loop and the appropriate data structures.

I think the core point is that they make life easier/get you productive faster. They may be more or less performant compared to each other, or to your own code (no reason to think that if you coded it from scratch you would get better performance the first try - of course ultimately you own it so you can optimize it to death if you want and have the time).
Ultimately they are all using the Java NIO framework and classes, and the only way to outperform those is to do your own JNI - assuming you succeeded - it is hard stuff, really a specialty of its own within programming.

It depends on what you're trying to do. NIO frameworks are useful because they provide you with an abstraction over the core NIO operations. However, they force you to use several design patterns you may not be comfortable with.
If you think you can adapt to those design patterns, you should probably use a framework. It will have fewer bugs, you will have less work to do, and ultimately you won't need to see where all of the action happens. You just have to focus on what you are trying to achieve.
It has some additional overhead in comparison to a "domestic" solution but it is negligible.

It really depends on your level of knowledge with the java.nio API. If you're not sure on how things work then you should probably use a 3rd party API. If you know how things work and are capable of writing code without a 3rd party API then you should definitely use your own code without any strings attached. You can achieve better performance without extra things (3rd party API) going on.
I like to live by the KISS principle.
