I have a little experience with Sun Grid Engine and MPI (using Open MPI).
What's the difference between these frameworks/APIs and JPPF?
All three are related to parallel computing, but at quite different levels.
The Sun Grid Engine (SGE) is a queueing system. It is usually set up by the system administrator of a big computing site, and allows users to submit long-running computing "jobs". SGE checks whether any computing nodes are unoccupied; if so, it starts the job on one of them, otherwise the job waits in the queue until a machine becomes available. SGE mainly cares about correct distribution of the jobs; for a single user it is of very limited use. It is often used in high performance computing to schedule user jobs.
JPPF is a Java framework that helps an application developer implement and run a parallel Java program. It allows a Java application to run independent parts of itself on other machines in parallel, and is useful when a compute-intensive Java application can be split into several mostly independent parts (typically called "tasks"). Although I do not really know the framework, I guess that it is mostly used to distribute big business applications onto several computers.
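To give a feel for the programming model, here is a minimal sketch of a JPPF task and job submission. It follows JPPF's documented task API, but the exact class and method names have shifted between JPPF versions, so treat it as illustrative rather than definitive:

    import java.util.List;
    import org.jppf.client.JPPFClient;
    import org.jppf.client.JPPFJob;
    import org.jppf.node.protocol.AbstractTask;
    import org.jppf.node.protocol.Task;

    // A unit of work that JPPF serializes and executes on a remote node.
    public class SquareTask extends AbstractTask<Long> {
      private final long n;
      public SquareTask(long n) { this.n = n; }

      @Override
      public void run() {
        setResult(n * n); // runs on whichever node picks up the task
      }

      public static void main(String[] args) throws Exception {
        try (JPPFClient client = new JPPFClient()) { // connects to a JPPF driver
          JPPFJob job = new JPPFJob();
          for (long i = 1; i <= 10; i++) job.add(new SquareTask(i));
          List<Task<?>> results = client.submitJob(job); // blocks until all tasks finish
          for (Task<?> t : results) System.out.println(t.getResult());
        }
      }
    }

The client serializes the tasks, a JPPF driver distributes them to the nodes, and the results come back as the same task objects with their results filled in.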
MPI (Message Passing Interface) is an API (mainly for C/Fortran, though bindings for other languages exist) that allows developers to write massively parallel applications. MPI is mostly intended for data-parallel applications, where all parallel jobs perform the same operations on different data and have to communicate a lot. It is used in high performance computing, where a single application may run on up to several thousand processors for up to several days.
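For contrast, here is roughly what an MPI "hello world" looks like from Java, using the Java bindings that ship with recent Open MPI releases (MPI itself is specified for C/Fortran; the class and method names below are specific to Open MPI's bindings, so consider them an assumption if you use another implementation):

    import mpi.MPI;

    // Every MPI rank runs this same program; behavior differs only by rank.
    public class MpiHello {
      public static void main(String[] args) throws Exception {
        MPI.Init(args);                       // set up the MPI runtime
        int rank = MPI.COMM_WORLD.getRank();  // this process's id
        int size = MPI.COMM_WORLD.getSize();  // total number of processes
        System.out.println("Hello from rank " + rank + " of " + size);
        MPI.Finalize();                       // shut down cleanly
      }
    }

    // Typically launched as: mpirun -np 4 java MpiHello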
Related
I've been digging into the depths of IBM's research on JavaSplit and cJVM because I want to run a JVM program across a cluster of four Raspberry Pi 3 Model Bs like this.
I know nearly nothing about clusters and distributed computing, so I'm starting my dive into the depths by trying to get a Minecraft server running across them.
My question is: is there a relatively simple way to get a Java program running on a JVM to split across a cluster without source code access?
Notes:
The main problem is that most Java programs (toy program included) were not built to run across a cluster, but I'm hoping I can find a way to hack the JVM into making it work.
I've seen some possible solutions, but due to the nature of Minecraft and Java, updates come so frequently and the landscape changes so fast that I don't even know what is possible.
As far as I know, FastCraft implements multithreading support, or it used to and it's now built in.
Purpose:
This is both a toy project and a practical problem for me. I'm doing it to learn how clusters work, to learn more about Linux administration and distributed computing, and because it's fun. I'm not doing it to set up a Minecraft server. The server is a cherry on top; if it doesn't work out I'll shove it on a Dell tower.
Minecraft can be scaled using what is effectively a partitioning service. The tool usually used is BungeeCord. It lets a client connect to a proxy which passes the session on to one of multiple backend servers, which run largely unchanged. This limits the number of users who can be on any one server, but between them you can have any number of servers.
I can only reiterate that such a generic solution, if one exists, is not commonly applied. There are inherent challenges in distributing a JVM, such as translating a shared-memory execution model, where all memory access is cheap, to a distributed model, where non-local memory access is orders of magnitude more expensive, without degrading performance. This requires smart partitioning of data, and finding such partitions in an automated way is a very complex optimization problem.
In the particular example of Minecraft, one would additionally have to transform a single-threaded program into a multithreaded one, which is a rather complex program transformation by itself.
In a nutshell, solving the clustering problem in such generality is a research-level topic for which, to the best of my knowledge, no algorithms competitive with manual code changes currently exist. In addition, if such an algorithm were to exist, it would be very unlikely to be offered free of charge, because it would represent a significant achievement and could be licensed for a lot of money.
In short:
Is it worth the effort to add multithreaded scalability (vertical scalability) to an application that will always run on an MPP infrastructure such as Tandem HPNS (which is horizontally scalable)?
Now, let me go deeper:
I've seen in many places that developers working under MPP (Massively Parallel Processing) with Java tend to think that, since it's Java, they can use everything Java provides (you know, "write once, run anywhere!"), and that multithreading libraries (threads, Akka, thread pools, etc.) can help a lot by speeding up performance through parallelism.
They forget that MPP is horizontally scalable: if you need a faster app, you have to design it to run multiple copies of the application, each on a different processor.
On the other side we have SMP (Symmetric Multiprocessing) infrastructures (any Windows, Linux, or UNIX-like environment). There you don't have to worry about that, since scalability is vertical: you can spawn more threads and the OS will distribute their execution across the available cores (here I do agree with using multithreading libraries).
So, with this in mind, here is my question. Suppose there is a need to create an application that will process a heavy load of data with a lot of validations and other requirements, where parallelism would help a lot to improve the load time, but it has to run in an MPP environment (such as Tandem HPNS).
Should the developer invest time in adding multithreading libraries for parallelism and concurrency?
Just a couple of side notes:
1) I'm not saying SMP is better or MPP is better; they are just different infrastructures. My point concerns only the use of multithreading libraries in MPP environments, given that an application using multithreading on MPP will use just one CPU of the N CPUs the server may have.
2) I'm not saying the MPP server does not support multithreading libraries; you can have multiple threads running on HPNS, but even with 20 threads there is no real parallelism, since one thread blocks the others, unless you have the application distributed (several copies running) across different CPUs.
No, I don't think it makes sense to add multithreaded scalability to an application that will always run on Tandem, because Tandem does not provide kernel-level threads, so even if you write a multithreaded application it will not give any benefit.
Tandem HPNS Java does provide multithreading as per the Java spec, but its performance is not comparable with Linux or any other OS that supports kernel-level threading.
The actual purpose of Tandem is high availability, thanks to its hardware redundancy.
What are some ways to do multi-core programming on Android? Do AsyncTasks and threads use multiple cores by themselves, or do we need to call some APIs to enforce that?
You don't need to use any API. The operating system and the Dalvik virtual machine schedule the execution of the threads on the available cores.
I like this blog post on the Qualcomm dev site, since Qualcomm is deep into the hardware-core side of things for Android: https://developer.qualcomm.com/blog/multi-threading-android-apps-multi-core-processors-part-1-2 and https://developer.qualcomm.com/blog/multi-threading-android-apps-multi-core-processors-part-2-2
TL;DR - No APIs necessary; just do clean parallelization using the different Android constructs available, like AsyncTasks, Java threads and IntentService, and the Dalvik VM / Linux kernel will do the rest.
Some good parts of it relevant here:
Can my single-threaded application benefit from multiple cores? How?
Even a single-threaded application can benefit from parallel processing on different cores. For example, if your application uses a media server, then the media processing and your UI rendering application logic can run on different cores at the same time. Also, the garbage collector can run on a different core.
How can I write code that takes advantage of multiple cores?
To realize the maximum potential of the available processing power on multi-core devices, write your application with concurrency in mind. The application should be designed so that tasks which can be executed in parallel are set up to run on separate threads.
Also see the answers here: does single thread application utilize multi core in android? - especially the one analyzing performance and how single-threaded applications can utilize multiple cores.
I also like this piece from AT&T going into the specifics of multi-core coding in Dalvik itself. They use a Mandelbrot set image to explain it, which is cool, with backlinks to the Android dev site for multithreading.
Use multithreading to exploit multiple cores: the Android operating system and the Dalvik virtual machine (DVM) manage multitasking and use multiple cores when needed, so you do not have to use any API. To learn more, download this PDF.
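To make the point concrete, here is a minimal sketch using nothing but standard java.util.concurrent (nothing Android-specific): two CPU-bound tasks are submitted to a thread pool, and the kernel is free to schedule them on different cores.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class TwoCoreDemo {
      public static void main(String[] args) throws Exception {
        // One thread per available core; the OS decides which core runs which thread.
        ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        Future<Long> left  = pool.submit(() -> sumRange(0, 50_000_000L));
        Future<Long> right = pool.submit(() -> sumRange(50_000_000L, 100_000_000L));

        System.out.println("total = " + (left.get() + right.get()));
        pool.shutdown();
      }

      static long sumRange(long from, long to) {
        long s = 0;
        for (long i = from; i < to; i++) s += i;
        return s;
      }
    }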
For what reasons would one choose several processes over several threads to implement an application in Java?
I'm refactoring an older Java application which is currently divided into several smaller applications (processes) running on the same multi-core machine, communicating with each other via sockets.
I personally think this should be done using threads rather than processes, but what arguments would defend the original design?
I (and others, see attributions below) can think of a couple of reasons:
Historical Reasons
The design is from the days when only green threads were available and the original author/designer figured they wouldn't work for him.
Robustness and Fault Tolerance
You use components which are not thread-safe, so you cannot parallelize without resorting to multiple processes.
Some components are buggy and you don't want them to be able to affect more than one process. Say, if a component has a memory or resource leak which eventually could force a process restart, then only the process using the component is affected.
Correct multithreading is still hard to do, and depending on your design, harder than multiprocessing. The latter, however, is arguably not too easy either.
You can have a model where you have a watchdog process that actively monitors (and eventually restarts) crashed worker processes; a minimal sketch of such a watchdog appears at the end of this answer. This may also include suspend/resume of processes, which is not safe with threads (thanks to #Jayan for pointing this out).
OS Resource Limits & Governance
If the process, using a single thread, is already using all of the available address space (e.g. 2 GB for 32-bit apps on Windows), you might need to distribute work amongst processes.
Limiting the use of resources (CPU, memory, etc.) is typically only possible on a per process basis (for example on Windows you could create "job" objects, which require a separate process).
Security Considerations
You can run different processes using different accounts (i.e. "users"), thus providing better isolation between them.
Compatibility Issues
Support multiple/different Java versions: using different processes, you can use different Java versions for your application parts (if required by 3rd-party libraries).
Location Transparency
You could (potentially) distribute your application over multiple physical machines, thus further increasing the scalability and/or robustness of the application (see #Qwe's answer for more details / the original idea).
If you decide to go with threads, you will restrict your app to run on a single machine. This solution doesn't scale (or scales only to some extent) - there are always hardware limits.
Different processes communicating via sockets, on the other hand, can be distributed between machines, so you can add a virtually unlimited number of them. This scales better, at the cost of slower communication between processes.
Deciding which approach is more suitable is itself a very interesting task. And once you make the decision, there's no guarantee that it won't look stupid to your successors in a couple of years when requirements change or new hardware becomes available.
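To illustrate the watchdog pattern mentioned under "Robustness and Fault Tolerance", here is a minimal sketch using standard java.lang.ProcessBuilder; worker.jar is a placeholder name for whatever your worker process is.

    import java.io.IOException;

    // Watchdog: restart a worker process whenever it dies. A crash (or a
    // leak-forced restart) of the worker cannot take this JVM down with it.
    public class Watchdog {
      public static void main(String[] args) throws IOException, InterruptedException {
        while (true) {
          Process worker = new ProcessBuilder("java", "-jar", "worker.jar")
              .inheritIO()   // share stdout/stderr with the watchdog for simplicity
              .start();
          int exit = worker.waitFor();  // block until the worker terminates
          System.err.println("worker exited with " + exit + ", restarting...");
          Thread.sleep(1000);           // back off briefly before restarting
        }
      }
    }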
I am developing a scientific application used to perform physical simulations. The algorithms used are O(n³), so for a large set of data it takes a very long time to process. The application runs a simulation in around 17 minutes, and I have to run around 25,000 simulations. That is around one year of processing time.
The good news is that the simulations are completely independent from each other, so I can easily change the program to distribute the work among multiple computers.
There are multiple solutions I can see to implement this:
Get a multi-core computer and distribute the work among all the cores. Not enough for what I need to do.
Write an application that connects to multiple "processing" servers and distribute the load among them.
Get a cluster of cheap linux computers, and have the program treat everything as a single entity.
Option number 2 is relatively easy to implement, so I'm not looking so much for suggestions on how to implement it (it can be done just by writing a program that waits on a given port for the parameters, processes the values and returns the result as a serialized file). That would be a good example of Grid Computing.
However, I wonder about the possibilities of the last option, a traditional cluster. How difficult is it to run a Java program on a Linux grid? Will all the separate computers be treated as a single computer with multiple cores, making it easy to adapt the program? Are there any good pointers to resources that would allow me to get started? Or am I making this over-complicated and better off with option number 2?
EDIT: As extra info, I am interested in how to implement something like the setup described in this article from Wired Magazine, where a scientist replaced a supercomputer with a PlayStation 3 Linux cluster. Definitely number two sounds like the way to go... but the coolness factor.
EDIT 2: The calculation is very CPU-bound. Basically there are a lot of operations on large matrices, such as inversion and multiplication. I tried to look for better algorithms for these operations, but so far the operations I need are O(n³) (in the libraries that are normally available). The data set is large (for such operations), but it is created on the client based on the input parameters.
I see now that I had a misunderstanding of how a computer cluster under Linux works. I had assumed it would work in such a way that it would just appear that you had all the processors in all the computers available, just as if you had a computer with multiple cores, but that doesn't seem to be the case. It seems that all these supercomputers work by having nodes that execute tasks distributed by some central entity, and that there are several different libraries and software packages that make this distribution easy.
So the question really becomes, since there is no such thing as number 3: What is the best way to create a clustered Java application?
I would very highly recommend the Java Parallel Processing Framework, especially since your computations are already independent. I did a good bit of work with it as an undergraduate and it works very well. The work of doing the implementation is already done for you, so I think this is a good way to achieve the goal in "number 2".
http://www.jppf.org/
Number 3 isn't difficult to do. It requires developing two distinct applications, the client and the supervisor. The client is pretty much what you have already: an application that runs a simulation. However, it needs altering so that it connects to the supervisor using TCP/IP or whatever and requests a set of simulation parameters. It then runs the simulation and sends the results back to the supervisor. The supervisor listens for requests from the clients and, for each request, gets an unallocated simulation from a database and updates the database to indicate the item is allocated but unfinished. When the simulation is finished, the supervisor updates the database with the result. If the supervisor stores the data in an actual database (MySQL, etc.) then the database can be easily queried for the current state of the simulations. This should scale well up to the point where the time taken to provide the simulation data to all the clients equals the time required to perform a simulation.
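Here is a minimal sketch of that supervisor/client handshake over plain sockets. To keep it self-contained the database is replaced by an in-memory queue, and the host name, port and one-line protocol are made up for illustration:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Supervisor: hands out one parameter set per connection and reads the
    // result back. A real version would track allocation state in a database,
    // as described above.
    public class Supervisor {
      public static void main(String[] args) throws Exception {
        BlockingQueue<String> pending = new LinkedBlockingQueue<>();
        for (int i = 0; i < 25000; i++) pending.add("param=" + i);

        try (ServerSocket server = new ServerSocket(9000)) {
          while (!pending.isEmpty()) {
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream()));
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
              out.println(pending.take());          // send parameters to the client
              String result = in.readLine();        // wait for the result line
              System.out.println("got: " + result); // a real version persists this
            }
          }
        }
      }
    }

    // Client: fetch parameters, run one simulation, report the result back.
    class SimulationClient {
      public static void main(String[] args) throws Exception {
        try (Socket s = new Socket("supervisor-host", 9000);
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(s.getInputStream()));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
          String params = in.readLine();        // e.g. "param=42"
          out.println(runSimulation(params));   // the ~17-minute computation
        }
      }
      static String runSimulation(String params) { return params + " -> done"; }
    }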
The simplest way to distribute computing on a Linux cluster is to use MPI. I'd suggest you download and look at MPICH2. It's free. Their home page is here.
If your simulations are completely independent, you don't need most of the features of MPI. You might have to write a few lines of C to interface with MPI and kick off execution of your script or Java program.
You should check out Hazelcast, the simplest peer-to-peer (no centralized server) clustering solution for Java. Try the Hazelcast Distributed ExecutorService for executing your code on the cluster.
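A sketch of what that looks like with Hazelcast's distributed executor (the class names below are from Hazelcast's com.hazelcast.core API; exact packages and method names have moved between major versions, so check the docs for the release you use):

    import java.io.Serializable;
    import java.util.concurrent.Callable;
    import java.util.concurrent.Future;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IExecutorService;

    // A task that Hazelcast can serialize and run on any member of the cluster.
    public class SimTask implements Callable<String>, Serializable {
      public String call() {
        return "simulated"; // placeholder for one 17-minute simulation
      }

      public static void main(String[] args) throws Exception {
        // Starting an instance joins (or forms) the peer-to-peer cluster.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IExecutorService exec = hz.getExecutorService("sim");
        Future<String> result = exec.submit(new SimTask()); // runs on some member
        System.out.println(result.get());
        hz.shutdown();
      }
    }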
You already suggested it, but disqualified it: multiple cores. You could go for multi-core, if you had enough cores. One hot topic at the moment is GPGPU computing. In particular, NVIDIA's CUDA is a very promising approach if you have many independent tasks which have to do the same computation. A GTX 280 gives you 240 cores, which can compute up to 1120 - 15360 threads simultaneously. A pair of them could solve your problem. Whether it's really implementable depends on your algorithm (data flow vs. control flow), because all the scalar processors operate in a SIMD fashion.
Drawback: it would be C/C++, not Java.
How optimized are your algorithms? Are you using native BLAS libraries? You can get about an order of magnitude performance gain by switching from naive libraries to optimized ones. Some, like ATLAS, will also automatically spread the calculations over multiple CPUs on a system, so that covers bullet 1 automatically.
AFAIK clusters usually aren't treated as a single entity. They are usually treated as separate nodes and programmed with stuff like MPI and ScaLAPACK to distribute the elements of matrices onto multiple nodes. This doesn't really help you all that much if your data set fits in memory on one node anyway.
Have you looked at Terracotta?
For work distribution you'll want to use the Master/Worker framework.
Ten years ago, the company I worked for looked at a similar virtualization solution, and Sun, Digital and HP all supported it at the time, but only with state-of-the-art supercomputers with hardware hot-swap and the like. Since then, I heard Linux supports the type of virtualization you're looking for in solution #3, but I've never used it myself.
Java primitives and performance
However, if you do matrix calculations, you'd want to do them in native code, not in Java (assuming you're using Java primitives). Cache misses in particular are very costly, and interleaving in your arrays will kill performance. Non-interleaved chunks of memory in your matrices, plus native code, will get you most of the speedup without additional hardware.
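If you do stay in Java, the "non-interleaved chunks of memory" advice translates to backing a matrix with one flat double[] instead of double[][] (whose rows are separate heap objects), as in this sketch with a cache-friendlier loop order:

    // n x n matrices stored row-major in a single contiguous array:
    // element (i, j) lives at index i * n + j.
    public class FlatMatrix {
      static void multiply(double[] a, double[] b, double[] c, int n) {
        // The i-k-j loop order streams through b and c sequentially, so the
        // inner loop walks contiguous memory instead of striding across rows.
        for (int i = 0; i < n; i++) {
          for (int k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (int j = 0; j < n; j++) {
              c[i * n + j] += aik * b[k * n + j];
            }
          }
        }
      }

      public static void main(String[] args) {
        int n = 512;
        double[] a = new double[n * n], b = new double[n * n], c = new double[n * n];
        java.util.Arrays.fill(a, 1.0);
        java.util.Arrays.fill(b, 2.0);
        multiply(a, b, c, n);
        System.out.println(c[0]); // each entry is 512 * (1.0 * 2.0) = 1024.0
      }
    }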