I have a project whose task is to implement matrix multiplication in a parallel distributed environment (on at least 2 computers). I want to solve the problem in Java. The matrix multiplication itself is no problem; I just don't know which technology to use for running it in a parallel distributed environment. What do you suggest? Thanks :)
I've worked with Hazelcast before. Very easy and straightforward. Just be careful with parallel processing. The job needs to be big enough with a small data footprint, else you're going to be tied down by network communication.
E.g. multiplying matrices may be faster on a single processor, but a hard genetic algorithm works great because each CPU can be an island in an island-model GA implementation; network communication is then limited to the migration strategy.
good luck!
Hadoop is one of the most widely used distributed computing tools. Though your computing requirement is not very intensive, it's a good tool to explore.
The Akka actor library has excellent support for remote actors which transparently handle data serialization. If you can decompose your matrix multiplication to use actors, you can then later configure your actors to run in the distributed environment quite easily using Akka.
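As an illustration, here is a minimal local sketch of that decomposition using Akka's classic Java actor API. The actor and message names (RowMultiplier, RowTask, RowResult) are made up for this example; remoting would later be enabled through configuration rather than code changes.

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import java.io.Serializable;

public class MatrixActors {

    // One task per row of A; messages must be Serializable so Akka can
    // ship them to remote actors unchanged.
    static final class RowTask implements Serializable {
        final int rowIndex; final double[] row; final double[][] b;
        RowTask(int rowIndex, double[] row, double[][] b) {
            this.rowIndex = rowIndex; this.row = row; this.b = b;
        }
    }

    static final class RowResult implements Serializable {
        final int rowIndex; final double[] row;
        RowResult(int rowIndex, double[] row) { this.rowIndex = rowIndex; this.row = row; }
    }

    // Worker: multiplies its row of A by B and replies with the result row.
    static class RowMultiplier extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .match(RowTask.class, task -> {
                        int cols = task.b[0].length;
                        double[] out = new double[cols];
                        for (int j = 0; j < cols; j++) {
                            double sum = 0.0;
                            for (int k = 0; k < task.row.length; k++) {
                                sum += task.row[k] * task.b[k][j];
                            }
                            out[j] = sum;
                        }
                        getSender().tell(new RowResult(task.rowIndex, out), getSelf());
                    })
                    .build();
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("matrix");
        ActorRef worker = system.actorOf(Props.create(RowMultiplier.class), "worker");
        // A coordinating actor (not shown) would send one RowTask per row of A
        // and assemble the RowResults into the product matrix.
    }
}
```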
I've been digging into the depths of IBM's research on JavaSplit and cJVM because I want to run a JVM program across a cluster of 4 Raspberry Pi 3 Model B's like this.
I know nearly nothing about clusters and distributed computing, so I'm starting my dive into the depths by trying to get a Minecraft Server running across them.
My question is: is there a relatively simple way to get a Java program running on a JVM to split across a cluster, without source code access?
Notes:
The main problem is that most Java programs (toy program included) were not built to run across a cluster, but I'm hoping that I can find a way to hack the JVM to make it work.
I've seen some possible solutions, but due to the nature of Minecraft and Java, updates come so frequently and the landscape changes so much that I don't even know what is possible.
As far as I know, FastCraft implements multithreading support, or it used to and it's now built in.
Purpose:
This is both a toy program and a practical problem for me. I'm doing it to learn how clusters work, to learn more about Linux administration and distributed computing, and because it's fun. I'm not doing it to set up a Minecraft server. The server is a cherry on top, but if it doesn't work out I'll shove it on a Dell tower.
Minecraft can be scaled using what is effectively a partitioning service. The tool usually used is BungeeCord. This allows a client to connect to a service which passes the session on to one of multiple backend servers, which run largely without change. This limits the number of users who can be on any one server, but across them you can have any number of servers.
I can only reiterate that such a generic solution, if one exists, is not commonly applied. There are inherent challenges in trying to distribute a JVM, such as translating a shared-memory execution model, where all memory access is cheap, to a distributed model, where non-local memory access is orders of magnitude more expensive, without degrading performance. This requires smart partitioning of the data, and finding such partitions in an automated way is a very complex optimization problem.
In the particular example of Minecraft, one would additionally have to transform a single-threaded program into a multi-threaded one, which is a rather complex program transformation by itself.
In a nutshell, solving the clustering problem in such generality is a research-level topic, for which, to the best of my knowledge, no algorithms competitive with manual code changes currently exist. In addition, if such an algorithm were to exist, it would be very unlikely to be offered free of charge, because it would represent both a significant achievement and something that could be licensed for a lot of money.
I'm new to both, but I want to understand when it's better to use one over the other.
I know that Hadoop only works on embarrassingly parallel tasks (and that MPI is pretty good for almost anything else), but I can't help but notice that developing a massively parallel program with MPI is almost trivial with the MPI_Bcast and MPI_Allreduce functions.
So can anyone tell me more about the optimal usage scenario for each (Hadoop and MPI)? Is there any time where (performance-wise) I should look to one instead of the other?
MPI and Hadoop are designed for different purposes. MPI is a relatively simple communication middleware, suitable for use in tightly coupled, stable, static systems, e.g. supercomputers or dedicated computing clusters. It tries to be very light and fast on message passing and provides some options for dealing with data arrays. Although it supports heterogeneous environments, it supports neither failover nor fault tolerance - if one process dies or some compute node fails, this usually brings down the whole MPI job.
I've written a multi-threaded Java program to solve an embarrassingly parallel problem such that it utilizes all the free CPU cycles on a multi-core CPU. I'd like to refactor my solution so that it can run on multiple nodes while still keeping the majority of the code I've already written.
I've used MPI with C in the past and been told that it's the "correct" way to address the issue of maximizing CPU cycles, but I'm also aware of other concurrent frameworks in Java like RMI and wonder if they are just as good.
Is there a good way to handle multi-node and multi-core concurrency in Java where the main goal is to squeeze as many CPU cycles as possible out of the cluster?
Edit: I get the impression that there's no easy way to handle this stuff. I'm not surprised, but I was hoping. :)
Depending on what you are doing and your budget, you might want to look into (in no particular order):
Actors, especially Akka, which has good remote actors, STM and supervisor-style management with a Java API
Norbert
GridGain
Terracotta
Gigaspaces
Oracle Coherence
IBM Extreme Scale
TIBCO ActiveSpaces
Also see:
java.util.concurrent and Guava (presentation slides focusing on util.concurrent) (EventBus)
Libraries such as javolution or JSR 166
Functional-programming-capable JVM languages such as Scala and Clojure have better multi-core utilization than Java.
RxJava (Java, Clojure, Scala ... Reactive Extensions)
You can try Hazelcast. It has a distributed ExecutorService. This should allow you to add tasks to a service which run across any number of nodes.
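A minimal sketch of what that looks like, assuming a Hazelcast 3.x-style API (the task class and executor name are made up for the example); a task just needs to be a Serializable Callable:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IExecutorService;
import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;

public class HazelcastExecutorExample {

    // The task must be Serializable so Hazelcast can send it to another member.
    static class SquareTask implements Callable<Long>, Serializable {
        private final long n;
        SquareTask(long n) { this.n = n; }
        public Long call() { return n * n; }
    }

    public static void main(String[] args) throws Exception {
        // Each node that starts an instance joins the same cluster.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IExecutorService executor = hz.getExecutorService("workers");

        // The task may run on any member of the cluster.
        Future<Long> result = executor.submit(new SquareTask(7));
        System.out.println(result.get()); // prints 49

        hz.shutdown();
    }
}
```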
JMS is a good place to start.
Also consider Apache Hadoop; it uses MapReduce and is well suited to many parallel problems.
I have to multiply 2 (most of the time sparse) matrices.
Those matrices are pretty big (about 10k*10k) and I have two quad-core Xeons but just one thread for this job.
Is there any fast library for multi-threaded multiplication? Any other advice?
I would try Colt, from CERN. It's a bit old now, but still provides excellent libraries for what you are trying to do.
For parallel processing, try the newer Parallel Colt.
With due respect to Colt and Parallel Colt, they are not very fast. If you insist on using Java and expect fast numerical computations, use JBLAS. JBLAS uses ATLAS. I have compiled JBLAS to use multithreaded ATLAS - it does not do this by default. You would need to change a few configure options. However even single threaded JBLAS is faster than multithreaded Colt and Parallel Colt. I tested Colt, Parallel Colt, JAMA and JBLAS. JBLAS is the best by a country mile.
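For what it's worth, basic JBLAS usage is a one-liner; a minimal sketch (matrix sizes chosen arbitrarily, and jblas plus its native libraries must be on the classpath):

```java
import org.jblas.DoubleMatrix;

public class JblasExample {
    public static void main(String[] args) {
        // Two random 1000 x 1000 matrices; mmul() delegates to the native BLAS.
        DoubleMatrix a = DoubleMatrix.rand(1000, 1000);
        DoubleMatrix b = DoubleMatrix.rand(1000, 1000);
        DoubleMatrix c = a.mmul(b);
        System.out.println(c.get(0, 0));
    }
}
```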
Colt and Parallel Colt are very slow. So is JAMA. The best library in Java for such things is JBLAS.
Do it on a GPU? http://www.nvidia.com/object/io_1254288141829.html
Did you look at the Java Matrix Benchmark? It compares performance between several of the most common Java linear algebra packages - including a couple that use/call native code. Matrix multiplication is of course one of the things tested/compared, and the latest benchmark execution was actually done on a dual quad-core Intel Xeon machine.
What you don't see there is how these libraries perform using sparse matrices (or if they support that at all).
It's possible to get very good performance with a pure Java implementation, but if you want the best possible performance with matrices that big you have to "leave the JVM".
Yes, there are libraries for multi-threaded matrix multiplication; let Google be your friend. Though if you only have one thread, multithreading may not be necessary. Why do you have only one thread on an 8-core machine? One library to consider is the Java BLAS interface.
You're definitely taking the right approach, looking for a library rather than trying to write this yourself.
I am developing a scientific application used to perform physical simulations. The algorithms used are O(n³), so for a large set of data it takes a very long time to process. The application runs a simulation in around 17 minutes, and I have to run around 25,000 simulations. That is around one year of processing time.
The good news is that the simulations are completely independent from each other, so I can easily change the program to distribute the work among multiple computers.
There are multiple solutions I can see to implement this:
Get a multi-core computer and distribute the work among all the cores. Not enough for what I need to do.
Write an application that connects to multiple "processing" servers and distribute the load among them.
Get a cluster of cheap Linux computers, and have the program treat everything as a single entity.
Option number 2 is relatively easy to implement, so I'm not looking so much for suggestions on how to implement it (it can be done just by writing a program that waits on a given port for the parameters, processes the values and returns the result as a serialized file). That would be a good example of grid computing.
However, I wonder about the possibilities of the last option, a traditional cluster. How difficult is it to run a Java program on a Linux grid? Will all the separate computers be treated as a single computer with multiple cores, thus making it easy to adapt the program? Are there any good pointers to resources that would allow me to get started? Or am I making this over-complicated and better off with option number 2?
EDIT: As extra info, I am interested in how to implement something like what is described in this article from Wired Magazine: a scientist replaced a supercomputer with a PlayStation 3 Linux cluster. Definitely, number two sounds like the way to go... but the coolness factor.
EDIT 2: The calculation is very CPU-bound. Basically there are a lot of operations on large matrices, such as inversion and multiplication. I tried to look for better algorithms for these operations, but so far I've found that the operations I need are O(n³) (in libraries that are normally available). The data set is large (for such operations), but it is created on the client based on the input parameters.
I see now that I had a misunderstanding of how a computer cluster under Linux works. I had assumed it would work in such a way that it would just appear that you had all the processors in all the computers available, just as if you had a computer with multiple cores, but that doesn't seem to be the case. It seems that all these supercomputers work by having nodes that execute tasks distributed by some central entity, and that there are several different libraries and software packages that make this distribution easy.
So, as there is no such thing as number 3, the question really becomes: what is the best way to create a clustered Java application?
I would very highly recommend the Java Parallel Processing Framework, especially since your computations are already independent. I did a good bit of work with this as an undergraduate and it works very well. The implementation work is already done for you, so I think this is a good way to achieve the goal in "number 2".
http://www.jppf.org/
Number 3 isn't difficult to do. It requires developing two distinct applications, the client and the supervisor. The client is pretty much what you have already, an application that runs a simulation. However, it needs altering so that it connects to the supervisor using TCP/IP or whatever and requests a set of simulation parameters. It then runs the simulation and sends the results back to the supervisor. The supervisor listens for requests from the clients and, for each request, gets an unallocated simulation from a database and updates the database to indicate the item is allocated but unfinished. When the simulation is finished, the supervisor updates the database with the result. If the supervisor stores the data in an actual database (MySQL, etc.) then the database can easily be queried for the current state of the simulations. This should scale well up to the point where the time taken to provide the simulation data to all the clients is equal to the time required to perform the simulation.
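As a rough sketch of the client side of that scheme (the host name, port and wire protocol here are invented, and Java serialization is just one convenient option):

```java
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.Socket;

// Hypothetical client loop: ask the supervisor for parameters, run the
// simulation, send the result back, repeat until there is no work left.
public class SimulationClient {

    public static void main(String[] args) throws Exception {
        while (true) {
            try (Socket socket = new Socket("supervisor.example.org", 9000);
                 ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
                 ObjectInputStream in = new ObjectInputStream(socket.getInputStream())) {

                out.writeObject("REQUEST_WORK");       // ask for one set of parameters
                out.flush();

                Object params = in.readObject();       // supervisor replies (null = nothing left)
                if (params == null) {
                    break;
                }

                Object result = runSimulation(params); // the existing ~17-minute computation
                out.writeObject(result);               // hand the result back to the supervisor
                out.flush();
            }
        }
    }

    private static Object runSimulation(Object params) {
        // Placeholder for the actual physics simulation.
        return params;
    }
}
```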
The simplest way to distribute computing on a Linux cluster is to use MPI. I'd suggest you download and look at MPICH2. It's free; their home page is here.
If your simulations are completely independent, you don't need most of the features of MPI. You might have to write a few lines of C to interface with MPI and kick off execution of your script or Java program.
You should check out Hazelcast, the simplest peer-to-peer (no centralized server) clustering solution for Java. Try Hazelcast's Distributed ExecutorService for executing your code on the cluster.
Regards,
-talip
You already suggested it, but disqualified it: multi-core. You could go for multi-core, if you had enough cores. One hot topic at the moment is GPGPU computing. Especially NVIDIA's CUDA is a very promising approach if you have many independent tasks which have to do the same computation. A GTX 280 gives you 240 cores, which can compute up to 1120-15360 threads simultaneously. A pair of them could solve your problem. Whether it's really implementable depends on your algorithm (data flow vs. control flow), because all the scalar processors operate in a SIMD fashion.
Drawback: it would be C/C++, not Java.
How optimized are your algorithms? Are you using native BLAS libraries? You can get about an order of magnitude performance gain by switching from naive libraries to optimized ones. Some, like ATLAS, will also automatically spread the calculations over multiple CPUs on a system, so that covers bullet 1 automatically.
AFAIK clusters usually aren't treated as a single entity. They are usually treated as separate nodes and programmed with stuff like MPI and ScaLAPACK to distribute the elements of matrices across multiple nodes. This doesn't really help you all that much if your data set fits in memory on one node anyway.
Have you looked at Terracotta?
For work distribution you'll want to use the Master/Worker framework.
Ten years ago, the company I worked for looked at a similar virtualization solution, and Sun, Digital and HP all supported it at the time, but only with state-of-the-art supercomputers with hardware hot-swap and the like. Since then, I've heard Linux supports the type of virtualization you're looking for in solution #3, but I've never used it myself.
Java primitives and performance
However, if you do matrix calculations you'd want to do them in native code, not in Java (assuming you're using Java primitives). Cache misses in particular are very costly, and interleaving in your arrays will kill performance. Non-interleaved chunks of memory for your matrices, plus native code, will get you most of the speedup without additional hardware.
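To illustrate the locality point, even in pure Java you can avoid many of those misses just by reordering the loops so every inner access is a sequential walk over a row (a sketch, not a tuned implementation):

```java
// Multiplies row-major double[][] matrices with the i-k-j loop order:
// both b[k] and c[i] are scanned sequentially in the inner loop, instead of
// striding down a column of b as the naive i-j-k order does.
public class CacheFriendlyMultiply {

    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, p = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++) {
            double[] cRow = c[i];
            for (int k = 0; k < p; k++) {
                double aik = a[i][k];
                double[] bRow = b[k];
                for (int j = 0; j < m; j++) {
                    cRow[j] += aik * bRow[j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        double[][] c = multiply(a, b);
        System.out.println(c[0][0] + " " + c[0][1]); // 19.0 22.0
        System.out.println(c[1][0] + " " + c[1][1]); // 43.0 50.0
    }
}
```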