I've written a multi-threaded Java program to solve an embarrassingly parallel problem such that it utilizes all the free CPU cycles on a multi-core CPU. I'd like to refactor my solution so that it can run on multiple nodes while still keeping the majority of the code I've already written.
I've used MPI with C in the past and been told that it's the "correct" way to address the issue of maximizing CPU cycles, but I'm also aware of other concurrent frameworks in Java like RMI and wonder if they are just as good.
Is there a good way to handle multi-node and multi-core concurrency in Java where the main goal is to get as many CPU cycles as possible out of the cluster?
Edit: I get the impression that there's no easy way to handle this stuff. I'm not surprised, but I was hoping. :)
Depending on what you are doing and your budget, you might want to look into the following (in no particular order):
Actors, especially Akka, which has good remote actors, STM, and supervisor-style management with a Java API
Norbert
GridGain
Terracotta
Gigaspaces
Oracle Coherence
IBM Extreme Scale
TIBCO ActiveSpaces
Also see:
java.util.concurrent and Guava (presentation slides focusing on util.concurrent) (EventBus)
Libraries such as Javolution or JSR 166
Functional-programming-capable JVM languages such as Scala and Clojure, which arguably make it easier to use multiple cores than plain Java does.
RxJava (Java, Clojure, Scala ... Reactive Extensions)
You can try Hazelcast. It has a distributed ExecutorService. This should allow you to add tasks to a service which run across any number of nodes.
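A minimal sketch of the submit-and-collect pattern described here, written against the standard `java.util.concurrent.ExecutorService` interface. A local thread pool stands in for the cluster, and the class and task names (`ExecutorSketch`, `SumSquares`) are hypothetical. The task is made `Serializable` on the assumption that a distributed executor has to ship it to other nodes; how the distributed service is obtained from Hazelcast is not shown and should be checked against its documentation.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ExecutorSketch {

    // Hypothetical unit of work: sums the squares of the integers in [from, to).
    // Serializable because a distributed executor would need to send it over the network.
    static final class SumSquares implements Callable<Long>, Serializable {
        private static final long serialVersionUID = 1L;
        private final int from;
        private final int to;

        SumSquares(int from, int to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public Long call() {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += (long) i * i;
            }
            return sum;
        }
    }

    // Splits [0, n) into chunks, submits one task per chunk, and adds up the partial results.
    static long sumSquares(ExecutorService executor, int n, int chunks)
            throws InterruptedException, ExecutionException {
        List<Future<Long>> futures = new ArrayList<>();
        int chunkSize = (n + chunks - 1) / chunks; // ceiling division
        for (int start = 0; start < n; start += chunkSize) {
            int end = Math.min(n, start + chunkSize);
            futures.add(executor.submit(new SumSquares(start, end)));
        }
        long total = 0;
        for (Future<Long> future : futures) {
            total += future.get(); // blocks until that chunk is done
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Local stand-in for the cluster-wide executor.
        ExecutorService local =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            System.out.println(sumSquares(local, 1000, 8));
        } finally {
            local.shutdown();
        }
    }
}
```

Because `sumSquares` only depends on the `ExecutorService` interface, code already written around a local pool should need few changes if the executor passed in is later replaced by a distributed one.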
JMS is a good place to start.
Also consider Apache Hadoop; it uses MapReduce and is well suited to many parallel problems.
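To make the MapReduce shape concrete, here is a small in-process word count using Java streams. It is only an illustration of the map, group-by-key, and reduce steps on a single JVM; it does not use the Hadoop API, and the class name `MapReduceSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class MapReduceSketch {

    // Map: split each line into lower-cased words.
    // Shuffle: group identical words together.
    // Reduce: count the occurrences in each group.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.parallelStream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to be or not to be", "be quick")));
    }
}
```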
Related
As far as I know, since JDK 1.2 all Java threads are created using the 'Native Thread Model', which associates each Java thread with an OS thread with the help of JNI and the OS thread library.
So from the following text I believe that all Java threads created nowadays can realize use of multi-core processors:
Multiple native threads can coexist. Therefore it is also called many-to-many model. Such characteristic of this model allows it to take complete advantage of multi-core processors and execute threads on separate individual cores concurrently.
But then I read about the Fork/Join Framework introduced in JDK 7 in Java: The Complete Reference:
Although the original concurrent API was impressive in its own right, it was significantly expanded by JDK 7. The most important addition was the Fork/Join Framework. The Fork/Join Framework facilitates the creation of programs that make use of multiple processors (such as those found in multicore systems). Thus, it streamlines the development of programs in which two or more pieces execute with true simultaneity (that is, true parallel execution), not just time-slicing.
It makes me question why the framework was introduced when the 'Java Native Thread Model' had already existed since JDK 3.
The fork/join framework does not replace the original low-level thread API; it makes it easier to use for certain classes of problems.
The original, low-level thread API works: you can use all the CPUs and all the cores on the CPUs installed on the system. If you ever try to actually write multithreaded applications, you'll quickly realize that it is hard.
The low-level thread API works well for problems where threads are largely independent and don't have to share information with each other, in other words, embarrassingly parallel problems. Many problems, however, are not like this. With the low-level API, it is very difficult to implement complex algorithms in a way that is safe (produces correct results and has no unwanted effects like deadlock) and efficient (does not waste system resources).
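For comparison, this is roughly what the easy case looks like with plain threads: counting primes below a limit, where each worker handles its own slice of the numbers and shares nothing until the end. The class and method names are hypothetical and the prime test is deliberately naive.

```java
final class RawThreadsSketch {

    // Counts primes in [2, limit) using threadCount independent worker threads.
    static long countPrimes(int limit, int threadCount) throws InterruptedException {
        long[] partial = new long[threadCount]; // one slot per worker, no sharing
        Thread[] workers = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long count = 0;
                // Worker id checks 2 + id, 2 + id + threadCount, ...
                for (int n = 2 + id; n < limit; n += threadCount) {
                    if (isPrime(n)) {
                        count++;
                    }
                }
                partial[id] = count;
            });
            workers[t].start();
        }

        long total = 0;
        for (int t = 0; t < threadCount; t++) {
            workers[t].join(); // after join(), the worker's write to partial[t] is visible
            total += partial[t];
        }
        return total;
    }

    // Naive trial division, good enough for a sketch.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(countPrimes(1_000_000, threads));
    }
}
```

This stays simple only because the workers never touch each other's data. Once they need to exchange intermediate results, the locking and coordination described above has to be written by hand.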
The Java fork/join framework, an implementation of the fork/join model, was created as a high-level mechanism to make it easier to apply parallel computing to divide-and-conquer algorithms.
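A minimal divide-and-conquer sketch using the JDK's `RecursiveTask` and `ForkJoinPool`: summing an array by splitting it in half until the pieces are small enough to sum directly. The class names and the `THRESHOLD` value are illustrative assumptions, not tuned recommendations.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class ForkJoinSumSketch {

    // Below this many elements a task sums sequentially (arbitrary choice for the sketch).
    static final int THRESHOLD = 10_000;

    static final class SumTask extends RecursiveTask<Long> {
        private static final long serialVersionUID = 1L;
        private final long[] data;
        private final int lo;
        private final int hi;

        SumTask(long[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            SumTask right = new SumTask(data, mid, hi);
            left.fork();                      // schedule the left half asynchronously
            long rightSum = right.compute();  // work on the right half in this thread
            return left.join() + rightSum;    // wait for the left half and combine
        }
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data));
    }
}
```

The framework takes care of distributing the forked subtasks over its worker threads, so the code only has to state how to split and how to combine.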
I'm new to both, but I want to understand when it's better to use one over the other.
I know that Hadoop only works on embarrassingly parallel tasks (and that MPI is pretty good for almost anything else), but I can't help but notice that developing a massively parallel program with MPI is almost trivial with the MPI_Bcast and MPI_Allreduce functions.
So can anyone tell me more about the optimal usage scenario for each (Hadoop and MPI)? Is there any time where (performance-wise) I should look to one instead of the other?
MPI and Hadoop are designed for different purposes. MPI is a relatively simple communication middleware, suitable for use in tightly coupled, stable, static systems, e.g. supercomputers or dedicated computing clusters. It tries to be very light and fast at message passing and provides some options for dealing with data arrays. Although it supports heterogeneous environments, it supports neither failover nor fault tolerance: if one process dies or some compute node fails, this usually brings down the whole MPI job.
I have a project whose task is to implement matrix multiplication in a parallel distributed environment (on at least 2 computers). I want to solve my problem in Java. The matrix multiplication itself is not a problem; I just don't know which technology to use for running it in a parallel distributed environment. What do you suggest? Thanks :)
I've worked with Hazelcast before. Very easy and straightforward. Just be careful with parallel processing. The job needs to be big enough with a small data footprint, else you're going to be tied down by network communication.
For example, multiplying matrices may be faster on a single processor, but a hard genetic algorithm works great, since each CPU can be an island in an island-model GA implementation. Network communication will then be limited to the emigration strategies.
good luck!
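A minimal sketch of one way to decompose the multiplication: split the rows of A into blocks and compute each block of the result as an independent task. A local thread pool stands in for the remote nodes, and all names are hypothetical. Note that every task needs its rows of A plus all of B, which is exactly the data footprint the answer above warns about when the tasks run across a network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class RowBlockMultiplySketch {

    // Computes rows [rowStart, rowEnd) of the product a * b.
    static double[][] multiplyRows(double[][] a, double[][] b, int rowStart, int rowEnd) {
        int inner = b.length;
        int cols = b[0].length;
        double[][] block = new double[rowEnd - rowStart][cols];
        for (int i = rowStart; i < rowEnd; i++) {
            for (int k = 0; k < inner; k++) {
                double aik = a[i][k];
                for (int j = 0; j < cols; j++) {
                    block[i - rowStart][j] += aik * b[k][j];
                }
            }
        }
        return block;
    }

    // Splits the rows of a into up to `blocks` tasks and stitches the partial results together.
    static double[][] multiply(double[][] a, double[][] b, int blocks)
            throws InterruptedException, ExecutionException {
        int rows = a.length;
        int blockSize = Math.max(1, (rows + blocks - 1) / blocks);
        ExecutorService pool = Executors.newFixedThreadPool(blocks); // stand-in for remote workers
        try {
            List<Future<double[][]>> parts = new ArrayList<>();
            for (int start = 0; start < rows; start += blockSize) {
                final int from = start;
                final int to = Math.min(rows, start + blockSize);
                Callable<double[][]> task = () -> multiplyRows(a, b, from, to);
                parts.add(pool.submit(task));
            }
            double[][] c = new double[rows][];
            int row = 0;
            for (Future<double[][]> part : parts) {
                for (double[] resultRow : part.get()) {
                    c[row++] = resultRow;
                }
            }
            return c;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        System.out.println(java.util.Arrays.deepToString(multiply(a, b, 2)));
    }
}
```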
Hadoop is one of the most widely used distributed computing tools. Though your computing requirement is not very intensive, it's a good tool to explore.
The Akka actor library has excellent support for remote actors which transparently handle data serialization. If you can decompose your matrix multiplication to use actors, you can then later configure your actors to run in the distributed environment quite easily using Akka.
I am helping develop an application that needs to run several processes. I need to be able to start and stop the processes as well as monitor them. JPPF provides the ability to do management and monitoring of JPPF jobs and nodes/servers that run those jobs, but that is all across JVMs. I'm trying to weigh other solutions for management/monitoring processes that may not all be JVMs. The library I am looking for would be preferable if it can be used in Java.
I don't think this addresses your issue of running processes that are not JVMs, but you might be interested in looking at the Akka library as an alternative to JPPF: http://akka.io/. It is mostly built for Scala, I think (not a bad thing!), but it also has a Java API.
To date I've worked in embedded systems and systems programming for hardware interfaces. For fun and personal knowledge, I've recently been trying to learn more about server programming after getting my hands wet with Erlang. I've been going back and thinking about servers from a C++/Java perspective, and now I wonder how scalable systems can be built with technology like C++ or Java.
I've read that due to context switching and limited memory, a per-client thread handler isn't realistic. Usually a thread pool is created and a mix of worker threads and asynchronous I/O is used to handle requests. I wonder, first of all, how does one determine the thread pool size? Does one simply have to measure and find the optimal balance? Eventually, as the system scales, perhaps more than one server is needed to handle requests. How are requests managed across multiple servers handling a large client base?
I am just looking for some direction into where I might be able to read more and find answers to my questions. What area of computer science would I look into for more information in this area? Are there any design patterns for this area of computing?
Your question is too general to have a nice answer. The answer depends greatly on the context, on how much processing any one Thread does, on how rapidly requests arrive, on the CPU family being used, on the web container being used, and on many other factors.
For C++ I've used boost::asio; it's very modern C++ and quite pleasant to work with. Also, the C++0x network libraries will be based on ASIO's implementation, so it's valuable knowledge.
As for designs, one thread per client doesn't work, as you've already learned. For high-performance multithreading the best number of threads seems to be cores × 2, but servers do a lot of I/O per request, which means a lot of idle waiting. From experience, looking at Apache, MySQL, and Oracle, the number of threads is about cores × 10 for database servers and cores × 40 for web servers. I'm not saying these are the ideals, but they seem to be patterns of successful systems, so if your system can be balanced to work optimally with similar numbers, at least you'll know your design isn't completely lousy.
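As a rough illustration of that rule of thumb in Java, the sketch below sizes a fixed pool as cores times a per-core multiplier. The multipliers are the ones quoted in this answer, not measured values, and the class and constant names are hypothetical; the right number still has to come from measuring your own workload.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PoolSizingSketch {

    // Multipliers taken from the rule of thumb above; treat them as starting points only.
    static final int CPU_BOUND = 2;
    static final int DATABASE_SERVER = 10;
    static final int WEB_SERVER = 40;

    // Pool size = number of cores * threads per core.
    static int poolSize(int cores, int threadsPerCore) {
        if (cores < 1 || threadsPerCore < 1) {
            throw new IllegalArgumentException("cores and threadsPerCore must be positive");
        }
        return Math.multiplyExact(cores, threadsPerCore);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int size = poolSize(cores, WEB_SERVER);
        ExecutorService pool = Executors.newFixedThreadPool(size);
        System.out.println("cores=" + cores + ", pool size=" + size);
        pool.shutdown();
    }
}
```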
C++ Network Programming: Mastering Complexity Using ACE and Patterns and
C++ Network Programming: Systematic Reuse with ACE and Frameworks are very good books that describe many design patterns and their use with the highly portable ACE library.
Like Lothar, we use the ACE library which contains reactor and proactor patterns for handling asynchronous events and asynchronous I/O with C++ code. We use sizable worker thread pools that grow as needed (to a configurable maximum) and shrink over time.
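For readers on the Java side of the question, a worker pool that grows on demand up to a configurable maximum and shrinks again when idle can be sketched with the JDK's `ThreadPoolExecutor`. This is not how ACE does it; it is an analogous setup under the assumption that a direct hand-off queue and a caller-runs overflow policy are acceptable. The class and method names are hypothetical.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ElasticPoolSketch {

    // Creates a pool that keeps coreThreads workers alive, adds workers on demand
    // up to maxThreads, and retires the extra workers after idleSeconds without work.
    static ThreadPoolExecutor newElasticPool(int coreThreads, int maxThreads, long idleSeconds) {
        return new ThreadPoolExecutor(
                coreThreads,
                maxThreads,
                idleSeconds, TimeUnit.SECONDS,
                // Direct hand-off: a task is never queued, so a busy pool starts a new thread.
                new SynchronousQueue<>(),
                // At the maximum, the submitting thread runs the task itself (back-pressure).
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newElasticPool(2, 8, 30);
        for (int i = 0; i < 20; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(50); // simulated request handling
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("largest pool size: " + pool.getLargestPoolSize());
    }
}
```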
One of the tricks with C++ is how you are going to propagate exceptions and error situations across network boundaries (which isn't handled by the language). I know that there are ways with .NET to throw exceptions across these network boundaries.
One thing you may consider is looking into SOA (Service Oriented Architecture) for dealing with higher-level distributed system issues. ACE is really for running at the bare metal of the machine.