What is the problem with NIO in Scala / Java?

While searching the web for information about concurrency on the JVM, I found many questions from people looking for a non-blocking IO library for Scala / Java.
What is the problem? If I want to send something to a file or socket, I can launch a separate thread that does the job.
I know there can be a problem with event-based threads, because the whole system could be blocked. But does that apply to the JVM / Scala?
ADDED:
Please correct me if I'm wrong:
I think that when you need to call an IO function asynchronously, the call has to go to a separate process or system (heavy) thread. Am I right?
So all the questions about solving this in common languages come down to creating and managing separate processes or threads. The only help the language can offer is a pool of threads that are assigned to IO operations asynchronously.
So my hypothesis is:
The statement "language X is better than Y because calling an async IO operation doesn't block the virtual machine" is false, because every language that supports system threads can manage NIO; the only difference is that language X has better support for it through built-in libraries or language features.
Is this hypothesis true?
Can a language achieve NIO without OS support (through processes / threads)?

Scala has a bunch of tools for concurrency, and NIO has a few tools for non-blocking IO. So, it should come as no surprise that there are a lot of great libraries that help connect the dots:
Finagle
"... a library for building asynchronous RPC servers and clients in Java, Scala, or any JVM language. Built atop Netty, Finagle provides a rich set of tools that are protocol independent."
Akka is a pretty nice, featureful actors/concurrency/services package which also uses Netty for its built-in remoting functionality.
Naggati2 is another one from Twitter, also built on Netty; I'm not sure whether it's being superseded by Finagle, though.

Here is an interesting recent blog post that may help you: http://jim-mcbeath.blogspot.com/2011/03/java-nio-and-scala-continuations.html
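The thread-pool approach described in the question can be sketched in plain Java without any NIO library. This is only an illustration: the class name `AsyncWriteSketch` and the helper `writeAsync` are hypothetical, and the pool size of 2 is an arbitrary assumption. The blocking write still occupies a system thread; the caller simply is not that thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncWriteSketch {
    // Hypothetical helper: runs a blocking file write on a pool thread and
    // returns a future holding the number of bytes written.
    static CompletableFuture<Integer> writeAsync(ExecutorService ioPool, Path target, String text) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
                Files.write(target, bytes); // blocks only the pool thread
                return bytes.length;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, ioPool);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService ioPool = Executors.newFixedThreadPool(2); // pool size is an assumption
        Path tmp = Files.createTempFile("async-write", ".txt");
        try {
            CompletableFuture<Integer> pending = writeAsync(ioPool, tmp, "hello");
            // The calling thread is free to do other work here.
            System.out.println("bytes written: " + pending.get());
        } finally {
            ioPool.shutdown();
            Files.deleteIfExists(tmp);
        }
    }
}
```

This scales to as many concurrent operations as there are pool threads, which is why libraries built on selectors (such as Netty) take a different route for very large numbers of connections.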

Related

Find an efficient way to integrate different language libraries into one project using Python as the "glue"

I am about to get involved in a NLP-related project and I need to use various libraries. Some are in java, others in C/C++ (for tasks that require more speed) and finally some are in Python. I was thinking of using Python as the "glue" and create wrapper-classes for every task that I want to do that relies on a different language. In order to do that, the wrapper class, for example, would execute the java program and communicate with it using pipes.
My questions are:
Do you think that would work for cpu-demanding and highly repetitive tasks? Or would the overhead added by the pipe-communication be too heavy?
Is there any other (preferably simple) architecture that you would suggest?
I would simply advise not doing this.
Don't implement stuff in C/C++ "for speed". The performance benefit is not likely to be as great as you expect; e.g. compared with implementing in Java using "best practice" design and performance techniques.
Don't try and glue lots of languages together. You are setting yourself up for lots of portability issues, difficulties in debugging, and reliability issues; e.g. due to C / C++ bugs crashing the JVM. In addition, there are performance overheads in bridging between languages, and there can be unexpected bottlenecks. (For instance, you may find that your C/C++ has to be run single-threaded due to threading issues, and that you therefore can't get the benefit of Java multi-threading on a typically multi-core system.)
Instead, I advise you to look for libraries that allow you to implement the entire application in one language. If that is not possible, design it so that the different language components are different executables / processes, communicating via some kind of RPC, messaging, or whatever.
Whether or not you'd have problems communicating over pipes / sockets depends not on how CPU-intensive the tasks are, but on how frequently you'd need to send information between the processes and how much data they need to send. Setting up threads to do your communication will have little processing overhead.
You can probably automatically wrap the C/C++ code with Python (SWIG, ctypesgen, Boost.Python), so the only glue you'll have to write yourself would then be talking to Java.
You could also do it the other way -- run the Python code in the JVM with Jython so the Python and Java code are together, then talk to the C/C++ from there.
You should take a look at Apache UIMA. It is designed exactly for this. From the project website:
The Frameworks run the components, and are available for both Java and C++. The Java Framework supports running both Java and non-Java components (using the C++ framework). The C++ framework, besides supporting annotators written in C/C++, also supports Perl, Python, and TCL annotators.
UIMA can manage pipes and annotators and is built to scale.
I would look at Jepp or JPype instead of using IPC for this. I would avoid Jython since loading the C/C++ libraries into Java would probably be harder than into CPython.
1) Do you think that would work for cpu-demanding and highly repetitive tasks? Or would the overhead added by the pipe-communication be too heavy?
Depends on your task. If this is a typical NLP app where you have a large model loaded in memory and you only communicate relatively small pieces of data (strings in, label sequences/parse trees out), it may work. Pipe communication is hard to get right, though, since there's a lot of buffering and synchronization issues you have to tackle. Python is a very good glue language, but it doesn't solve everything.
2) Is there any other (preferably simple) architecture that you would suggest?
Make your NLP components services and connect to them via REST interfaces. There are off-the-shelf tools that do this, e.g. CLAM. Pyro and SPIRO make communication between Java and Python even more direct and might be easier to use than HTTP/REST (but YMMV).
The parts that are written in C/C++ can also be integrated with CPython using Cython. Don't start implementing things in C or C++ because you think they'll be faster, though; you can also implement them in Python first, then see if you can get the desired performance with NumPy and/or Cython.

How do I isolate untrusted native code in Java?

I have a C library that I don't trust (in the sense that it might crash frequently). I am calling it from a Java process.
To prevent a crash in the C library from bringing the whole Java app down, I figured it would be best to spawn a dedicated Java process for this library and let it interface with the Java app through socket programming or RMI. Then, if a crash happens, I can just spawn another one and continue processing.
Is ProcessBuilder the way to go? Or are there any other easier ways?
Thanks!
Yes, hosting the native code in a separate Java process is the only way to protect your application from native code.
As for easier ways, there are only minor implementation differences. For example, instead of spawning the code from your Java application, you could wrap the native code in a native wrapper that is configured to auto-start. This would simplify the solution, if you have knowledge of C and sockets. In this approach, RMI wouldn't be the best choice.
Even if you wrap the native code in Java, I still wouldn't pick RMI. I have run into networking problems with Windows on WANs. I would keep the communication simple if possible. If the data is too complicated, maybe a basic serialization library. There are a few choices if you go down the XML route. It's overkill, but you could also embed an http server and web services layer. I don't know your system requirements, bu
Recovery is going to create a variety of challenges. If it stops responding, do you just spawn another process? How many times are you willing to do that? Process management from Java leaves a lot to be desired.
I don't know of an easier way.
For the interaction between the parent and the child, I wouldn't use RMI or sockets. I'd use the child's standard input and output streams, accessible through the Process object. This is simple, efficient, and private. You can use the streams exactly as you would socket streams, although without any considerations of identity, addresses, authentication, and so on. You can write a protocol yourself, or use something like Thrift or Protocol Buffers to build a protocol from entity definitions.
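As a rough sketch of the "write a protocol yourself" option, the following frames each message with a 4-byte length prefix so the reader always knows where a message ends. The class name `StdioFraming` and the methods `send` and `receive` are hypothetical, and the wire format is an assumption chosen for illustration, not something the answer prescribes. The demo uses in-memory streams as a stand-in for the pipe; with a real child, the parent would pass `process.getOutputStream()` to `send` and `process.getInputStream()` to `receive`, where `process` comes from `new ProcessBuilder(...).start()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StdioFraming {
    // Writes one message as a 4-byte big-endian length followed by UTF-8 bytes.
    static void send(OutputStream out, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush(); // flush so the peer is not left waiting on a buffer
    }

    // Reads one message written by send(); blocks until the whole message arrived.
    static String receive(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        byte[] payload = new byte[length];
        data.readFully(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the child's stdin/stdout pipe.
        ByteArrayOutputStream pipe = new ByteArrayOutputStream();
        send(pipe, "parse this");
        send(pipe, "and this");
        InputStream in = new ByteArrayInputStream(pipe.toByteArray());
        System.out.println(receive(in)); // prints "parse this"
        System.out.println(receive(in)); // prints "and this"
    }
}
```

If the child also logs to standard output, those log lines would corrupt the framing, so a child built this way would need to send diagnostics to standard error instead.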
If performance isn't an issue and if there is a possibility of other applications hitting your "native" service, I'd go the RESTful or some other sort of web service oriented way. As far as re-spawning on crashes is concerned, as others have mentioned, just spawn the process as a service and you should be good to go.
If your application is the only entity which would be hitting this native service, then I'd prefer to go the RMI way as opposed to the pure socket way. IMO, RMI is a natural fit for inter-process communication (where the processes are Java processes). RMI has the concept of an "activatable" remote object which would be a natural fit given your requirements (auto-spawn on crash). Also, if using RMI, your application would speak with the native process through well defined Java interfaces rather than ad-hoc protocol contracts (which can be achieved using other high level solutions like web services but a real pain when it comes to raw sockets).
BTW, JFTR, we are using this strategy with our production app and it is working out quite well, YMMV. :-)

Manage out-of-JVM agent pool and communication

I need to manage a pool of agents from my application. All are written in Java but the agents need to run in their own JVM. I wrote a proof of concept that starts the subprocesses and uses stdout/stdin to send commands and keep-alive information. I also open a socket connection for data transfer.
I guess that some connection pooling libraries should be able to help in the management of the agents.
What about the communication between the agents and the main process? Using TCP with XML messages (JAXB) is not really as reliable or convenient as I would like. Any suggestions for a better library to assist here?
I could very well write what I need myself but I'm sure other people have done that way better already.
For messaging, you could try something like ZeroMQ. It's a messaging tool with local transports for communicating between processes, so you could just send serialised objects between the processes.
The alternative is to go back to traditional RMI, which is probably the simplest.
You could try Hessian:
http://hessian.caucho.com/
or Preon:
http://preon.sourceforge.net/
I've actually found two ways that would have been of great help when I developed this:
WebSockets. I used simple sockets, but then I needed to reinvent signaling to check who is sending and when they are done sending. I used a line-based approach, but it's really ugly. WebSockets offer message-based communication, and that's great.
Hazelcast. This is a "distributed system" and offers great things like distributed executors (I schedule a message to be sent in the app server and let any available out-of-JVM agent handle it, atomically), shared and thread-safe hashmaps (to keep track of who is running), etc. Many of the similar tools I had seen were either in native code (like ZeroMQ, btw) or came with per-CPU licenses and such. Hazelcast has a community edition and can be bundled into my apps.
Actually, I had started using vert.x to handle websocket-based communication and realized it was itself using hazelcast.

Are non blocking IOs still an issue with server side Java?

The Java NIO Socket Framework supposedly hides the dirty details of non-blocking IO from developers, allowing them to build highly scalable applications, which can handle over 10000 incoming and outgoing sockets using only one thread.
Are non blocking IOs still a pain with the typical version of Java 2 SE/EE?
Is this framework still necessary and useful?
Thanks for your time.
Well, NIO creates an abstraction over some of the details, certainly. Non-blocking IO is still a pain to get your head around (at least, I find it is) but at least it's feasible. (Personally I prefer the .NET style of asynchronous IO, but that's a different matter.)
I usually use blocking IO: for most tasks, this is all I require and I wouldn't gain significantly by using non-blocking IO. In some cases (such as the one you mentioned) non-blocking IO is really the only way forward if you want to keep your thread count down.
I recommend that you learn about it, play with it, and then use judgement to decide when to use it in production code. I wouldn't suggest starting to use it everywhere...
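To give a feel for what "one thread, many channels" looks like, here is a minimal sketch using a `Selector`. It uses in-process `Pipe` channels as a stand-in for sockets so it runs without a network; the class name `SelectorSketch`, the method `readAll`, and the "channel-N" labels are hypothetical. It assumes each message arrives in a single read, which holds for these tiny writes but not in general; handling partial reads is a large part of why NIO is a pain.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SelectorSketch {
    // Watches all sources with one selector on the calling thread and
    // returns once `expected` reads have produced data.
    static List<String> readAll(List<Pipe.SourceChannel> sources, int expected) throws IOException {
        List<String> received = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < sources.size(); i++) {
                Pipe.SourceChannel source = sources.get(i);
                source.configureBlocking(false); // required before registering
                source.register(selector, SelectionKey.OP_READ, "channel-" + i);
            }
            ByteBuffer buffer = ByteBuffer.allocate(256);
            while (received.size() < expected) {
                selector.select(); // blocks until at least one channel is readable
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove(); // the selected-key set must be cleared by hand
                    buffer.clear();
                    int n = ((Pipe.SourceChannel) key.channel()).read(buffer);
                    if (n < 0) {
                        // In this sketch a channel closing early is treated as an error.
                        throw new EOFException(key.attachment() + " closed early");
                    }
                    if (n > 0) {
                        buffer.flip();
                        received.add(key.attachment() + ": " + StandardCharsets.UTF_8.decode(buffer));
                    }
                }
            }
        }
        return received;
    }

    public static void main(String[] args) throws IOException {
        List<Pipe> pipes = new ArrayList<>();
        List<Pipe.SourceChannel> sources = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Pipe pipe = Pipe.open();
            pipes.add(pipe);
            sources.add(pipe.source());
            pipe.sink().write(ByteBuffer.wrap(("msg" + i).getBytes(StandardCharsets.UTF_8)));
        }
        // The order in which channels are reported is not guaranteed.
        for (String line : readAll(sources, 3)) {
            System.out.println(line);
        }
        for (Pipe pipe : pipes) {
            pipe.sink().close();
            pipe.source().close();
        }
    }
}
```

With blocking IO the same job would need one thread per channel, which is the trade-off the answer above refers to.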
Yes, NIO is very useful. NIO is also a bit hard to work with.
Depending on your needs, you could consider using frameworks that wrap NIO, like Grizzly or MINA. Grizzly is the networking part of the GlassFish app server from Sun/Oracle.
MINA is a network application framework from Apache.org.
Personally I prefer Grizzly, but that's just me.

java.nio.channels.*

What is up with nio channels? There was some nice talk about them when they were added to Java, but I still don't see people using them in their applications.
Is there something wrong with them, or am I just not encountering people who use them?
Any nice examples as to why I should bother using them at all?
Thanks
You're asking about channels, but channels only make sense within the general framework of using the (relatively) new nio capabilities as a whole.
My guess is that of the many, many Java applications out in the world, not many need the capabilities of nio. The usual "business" processes read streams and/or files; nothing special.
That said, the Apache folks have recently rewritten their core Java libraries ( http://hc.apache.org/ ) to use nio, and claim some impressive performance benefits in some cases.
nio also lets you do stuff like memory-mapping files, and this can allow an application to do very fast random access to the file. Again, only some special applications need this, and that's probably why you don't see a lot of it used.
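A minimal sketch of the memory-mapping case, assuming a file of fixed-size records so that any record can be read by offset without scanning. The class name `MappedFileSketch`, the methods `writeSquares` and `readRecord`, and the record layout (one 8-byte long per record, holding the square of its index) are all hypothetical choices for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedFileSketch {
    static final int RECORD_SIZE = Long.BYTES; // assumed layout: one long per record

    // Writes `count` records through a read-write mapping; record i holds i * i.
    static void writeSquares(Path file, int count) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, (long) count * RECORD_SIZE);
            for (int i = 0; i < count; i++) {
                map.putLong(i * RECORD_SIZE, (long) i * i); // absolute put, no seek needed
            }
            map.force(); // push changes to the storage device
        }
    }

    // Reads a single record by index directly from a read-only mapping.
    static long readRecord(Path file, int index) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return map.getLong(index * RECORD_SIZE);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped", ".bin");
        tmp.toFile().deleteOnExit(); // a mapped file may not be deletable immediately on some platforms
        writeSquares(tmp, 1000);
        System.out.println(readRecord(tmp, 750)); // prints 562500
    }
}
```

A real application would typically keep one mapping open for many reads rather than remapping per call as `readRecord` does here.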
Apache Mina is a great networking library and uses NIO.
Apache MINA is a network application framework which helps users develop high performance and high scalability network applications easily. It provides an abstract · event-driven · asynchronous API over various transports such as TCP/IP and UDP/IP via Java NIO.
Net4J, a signaling platform/framework, makes heavy use of NIO channels. (One part of Net4J basically provides a convenience API to NIO channels.)
