Multithread applications on MPP architecture

Multithread applications on MPP architecture - java

In short:
Does it worth the effort to add multithreading scalability (Vertical scalability) on an application that will run always in a MPP infrastructure such Tandem HPNS (Horizontal scalable)?
Now, let me go deeper:
I’ve seen on many places the development under MPP (Massively Parallel Processing) using Java tend to think, if it’s Java you can use all what Java provides (You know, Write once run anywhere!) in which multithreading libraries(such threads, AKKA, Thread Pools, etc.) can help a lot by speeding up the performance using parallelism.
Forgetting the fact, if it’s MPP, it is horizontal scalable, meaning if you need a faster app, you have to design it to run multiples copies of the application, each on a different processor.
On the other side we have SMP (Symmetric Multi-processing) infrastructures (here we have any windows, Linux, UNIX like environment), in these you don’t have to worry about that, since the scalability is vertical, you can have more threads in which their execution will be distributed on the different cores the OS have available (Here I do agree on using Multithread libraries).
So, having this in mind, my question is, if there is a need of creating an application that will perform a heavy load of data with a lot of validations and other requirements in which the use of parallelism will help a lot to improve the load time, but, it has to run under a MPP environment (such Tandem HPNS).
Should the developer invest time on adding Multithread libraries to add parallelism and concurrency?
Just a couple of side notes:
1) I’m not saying SMP is better or MPP is better, they are just different infrastructures; my point goes just to the use of multithread libraries on MPP environments giving the fact an application using multithread on MPP will use just one CPU of the N Cpus the Server may has.
2) I’m not saying the MPP server does not support multithread libraries, you can have multithreads running on HPNS, but even you have 20 threads, there is no real parallelism since one thread is blocking the others; unless you have the application distributed (several copies running) on different CPUs.

No I don't think it makes sense to add multithreaded scalability on application that will always run on tandem, because tandem does not provide kernel level thread so even though you write multithreaded application it will not give any benefit.
Even tandem HPNS Java provides multithreading as per Java Spec but its performance is not comparable with linux or any other OS which support kernel level threading.
Actual purpose of tandem is HA availability because of its hardware redundancy.

Related

Compile Java to behave like GO code

Would it be possible to write a Java compiler or Virtual Machine that would let you compile legacy java application that use thread and blocking system call the same way GO program are compiled.
Thus new Thread().run(); would create light weight thread and all blocking system call will instead be asynchronous Operating System call and make the light weight thread yield.
If not, what is the main reason this would be impossible!

Earlier versions of Sun's Java runtime on Solaris (and other UNIX systems) made use of a user space threading system known as "green threads". As described in the Java 1.1 for Solaris documentation:
Implementations of the many-to-one model (many user threads to one kernel thread) allow the application to create any number of threads that can execute concurrently. In a many-to-one (user-level threads) implementation, all threads activity is restricted to user space. Additionally, only one thread at a time can access the kernel, so only one schedulable entity is known to the operating system. As a result, this multithreading model provides limited concurrency and does not exploit multiprocessors. The initial implementation of Java threads on the Solaris system was many-to-one, as shown in the following figure.
This was replaced fairly early on by the use of the operating system's threading support. In the case of Solaris prior to Solaris 9, this was an M:N "many to many" system similar to Go, where the threading library schedules a number of program threads over a smaller number of kernel-level threads. On systems like Linux and newer versions of Solaris that use a 1:1 system where user threads correspond directly with kernel-level threads, this is not the case.
I don't think there has been any serious plans to move the Sun/Oracle JVM away from using the native threading libraries since that time. As history shows, it certainly would be possible for a JVM to use such a model, but it doesn't seem to have been considered a direction worth pursuing.

James Henstridge has already provided good background on Java green threads, and the efficiency problems introduced by exposing native OS threads to the programmer because their use is expensive.
There have been several university attempts to recover from this situation. Two such are JCSP from Kent and CTJ (albeit probably defunct) from Twente. Both offer easy design of concurrency in the Go style (based on Hoare's CSP). But both suffer from the poor JVM performance of coding in this way because JVM threads are expensive.
If performance is not critical, CSP is a superior way to achieve a concurrent design because it avoids the complexities of asynchronous programming. You can use JCSP in production code - I do.
There were reports that the JCSP team also had an experimental JNI-add-on to the JVM to modify the thread semantics to be much more efficient, but I've never seen that in action.
Fortunately for Go you can "have your cake and eat it". You get CSP-based happen-before simplicity, plus top performance. Yay!
Aside: an interesting Oxford University paper reported on a continuation-passing style modification for concurrent Scala programs that allows CSP to be used on the JVM. I'm hoping for further news on this at the CPA2014 conference in Oxford this August (forgive the plug!).

When to choose several processes over threads in Java?

For what reasons would one choose several processes over several threads to implement an application in Java?
I'm refactoring an older java application which is currently divided into several smaller applications (processes) running on the same multi-core machine, communicating which each other via sockets.
I personally think this should be done using threads rather than processes, but what arguments would defend the original design?

I (and others, see attributions below) can think of a couple of reasons:
Historical Reasons
The design is from the days when only green threads were available and the original author/designer figured they wouldn't work for him.
Robustness and Fault Tolerance
You use components which are not thread safe, so you cannot parallelize withough resorting to multiple processes.
Some components are buggy and you don't want them to be able to affect more than one process. Say, if a component has a memory or resource leak which eventually could force a process restart, then only the process using the component is affected.
Correct multithreading is still hard to do. Depending on your design harder than multiprocessing. The later, however, is arguably also not too easy.
You can have a model where you have a watchdog process that can actively monitor (and eventually restart) crashed worker processes. This may also include suspend/resume of processes, which is not safe with threads (thanks to #Jayan for pointing out).
OS Resource Limits & Governance
If the process, using a single thread, is already using all of the available address space (e.g. for 32bit apps on Windows 2GB), you might need to distribute work amongst processes.
Limiting the use of resources (CPU, memory, etc.) is typically only possible on a per process basis (for example on Windows you could create "job" objects, which require a separate process).
Security Considerations
You can run different processes using different accounts (i.e. "users"), thus providing better isolation between them.
Compatibility Issues
Support multiple/different Java versions: Using differnt processes you can use different Java versions for your application parts (if required by 3rd party libraries).
Location Transparency
You could (potentially) distribute your application over multiple physical machines, thus further increasing scalability and/or robustness of the application (see #Qwe's answer for more Details / the original idea).

If you decide to go with threads you will restrict your app to be run on a single machine. This solution doesn't scale (or scales to some extent) - there are always hardware limits.
And different processes communicating via sockets can be distributed between machines, so that you could add virtually unlimited number or them. This scales better at the cost of slow communication between processes.
Deciding which approach is more suitable is itself a very interesting task. And once you make the decision there's no guarantee that it will look stupid to your successors in a couple of years when requirements change or new hardware becomes available.

How to force two Java threads to run on same processor/core?

I would like a solution that doesn't include critical sections or similar synchronization alternatives. I'm looking for something similar the equivalent of Fiber (user level threads) from Windows.

The OS manages what threads are processed on what core. You will need to assign the threads to a single core in the OS.
For instance. On windows, open task manager, go to the processes tab and right click on the java processes... then assign them to a specific core.
That is the best you are going to get.

To my knowledge there is no way you can achieve that.
Simply because the OS manages running threads and distributes resources according to it's scheduler.
Edit:
Since your goal is to have a "spare" core to run other processes on I'd suggest you use a thread manager and get the number of cores on the system (x) and then spawn at most x-1 threads on the specific system. That way you'll have your spare core.
The former statements still apply, you cannot specify which cores to run threads on unless you in the OS specify it. But from java, no.

Short of assigning the entire JVM to a single core, I'm not sure how you'd be able to do this. In Linux, you can use taskset:
http://www.cyberciti.biz/tips/setting-processor-affinity-certain-task-or-process.html
I suppose you could run your JVM within a virtualized environment (e.g., VirtualBox/VMWare instance) with one processor allocated, but I'm not sure that that gets you what you want.

I read this as asking if a Java application can control the thread affinity itself. Java does not provide any way to control this. It is treated as the business of the host operating system.
If anything can do it, the OS can, and they typically can, though the tools you use for thread pinning will be OS specific. (But if the OS is itself is virtualized, there are two levels of pinning. I don't know if that is going to work / be practical.)
There don't appear to be any relevant Hotspot JVM thread tuning options in modern JVMs.
If you were using a Rockit JVM you could choose between "native threads" (where there is a 1-1 mapping between Java and OS threads) and "thin threads" where multiple Java threads are multiplexed onto a small number of OS threads. But AFAIK, JRocket "thin threads" are only supported in 32bit mode, and they don't allow you to tune the number of OS threads used.
This is really the kind of question that you should be asking under a Sun support contract. They have people who have spent years figuring out how to get the best performance out of big Java apps.

MPI , Sungrid vs JPPF?

I have a little experience with SungridEngine and MPI (using OpenMPI).
Whats the different between these frameworks/API and JPPF?

All three of these are somehow related to parallel computing, but on pretty different levels.
The Sun Grid Engine (SGE) is a queueing system. It is usually set up by the system administrator of a big computing site, and allows users to submit long-running computing "jobs". SGE checks whether any computing nodes are unoccupied, and if they are, it starts the job on that machine, otherwise the job will have to wait in the queue until a machine becomes available. SGE mainly cares about correct distribution of the jobs. For a single user, SGE is of very limited use. SGE is often used in high performance computing to schedule the user jobs.
JPPF is a Java framework which can help an application developer to run and implement a parallel Java program. It allows a Java application to run independent parts of it on other machines in parallel. It is useful to split a computing-intensive Java application into several mostly independent parts (which are typically called "tasks"). Although I do not really know the framework, I guess that it is mostly used to distribute big business applications onto several computers.
MPI (Message Passing interface) is an API (mainly for C/FORTRAN, but bindings for other languages exist) that allows developers to write massively parallel applications. MPI is mostly intended for data-parallel applications, where all parallel jobs do the same operations, but on different data, and where the different jobs have to communicate a lot. It is used in high performance computing, where a single application may run on up to several thousands of processors for up to several days.

Java on mainframes

I work for a large corporation that runs a lot of x86 based servers on which we run JVMs.
We have experimented successfully with VMWare ESX to get better usage out of our data center. But these still consume a lot of power per processing unit.
I had a mad idea that we should resurrect mainframes, we could host either lots of JVMs or virtual machines.
Has anyone tried this? Are there any good cost-benefits?
Do you lose flexibility? E.g. we have mainframes in other parts of the company but they seem to have much more rigid usage of the machines.. lots of change control, long lead times etc

IBM makes a special Java co-processor that you should seriously consider. I would not run Java on the general engines as this may increase MPU charges for licensed software.

All this assumes you’re talking about Java on Z/OS and not running Linux VM’s on the mainframe to take advantage of the cost savings that come with fewer machines.
My thoughts on virtualization are at the end of this and it’s probably the route you want to look at but I’ll start out with Z/OS since it’s what mainframes are traditionally associated with and what I have familiarity with. I have some experience with mainframe Java.
The short answer is, it depends, but probably not. What exactly are your applications? The mainframe is a difficult environment compared to x86 servers. If you're running I/O-intensive workloads under something like Websphere, it might be worth it, assuming your mainframe is underutilized.
In my experience, Java is horribly slow on a mainframe but that’s because the system I used was set up for developer flexibility rather than performance. That just goes to prove performance tuning on the mainframe is usually much more complicated then on an average server since mainframes will be running many more workloads then a generic x86 server.
Remember that the mainframe is designed primarily for I/O throughput and can outperform any normal x86 server at that. It was not designed to do a lot of computationally intensive calculations so won’t outperform a small cluster of x86 servers if your doing a lot of math.
The change controls on mainframes are there for a good reason - if one x86 server has a problem, you reboot it. If a mainframe has a problem, every second that it’s down is costing the company money. You also have to take into account any native code your apps depend on or third party libraries that may use native code. All that code would have to be ported.
Configuration of a mainframe also takes a lot longer on average then on an x86 server. I would suggest that, if you want to seriously look into this, you make a better business case than power savings, such as tight integration with current business apps and start out small either with a proof of concept or a new application. One that is not business critical, that can be implemented to take advantage of the mainframes strengths.
IBM mainframes can also run Linux in either native mode or a virtualized environment similar to VMWare. Unless your company is the exception to the rule, your Linux instances would run as virtual machines. I haven’t had much experience with this but, if your app depends on no native code and runs under Linux, it would probably work on a mainframe running Linux. For more info about Linux on mainframes see this link.

We have extensive experience running Java under Windows, Linux and on IBM SystemI (or iSeries, or AS/400, depending on IBM's mood that year) minicomputers. It is my opinion that the mini-computer platform seems to deliver much less bang for your buck against modern multi-core x86 CPU's.
Note that Java benefits more readily from having multiple cores available than typical software today, because of it's inherently multithreaded nature - this would be even more true as you run multiple JVMs.
That said, you will typically be capable of getting many more CPU cores available with better bandwidth to access memory on a mini or mainframe, and better throughput on disk subsystems (overall) so these systems may very well scale much better as you toss more JVMs on them.

IBM allows this. Some of their mainframes can hold Java accelerator processors that run the bytecode natively for more performance. They also have DB2 accelerators, and possibly some for XML operations.
I've never gotten to play with any of them, but I'd sure love to.

Though I've been in the Industry since 1975, I'm no longer sure what a "mainframe" is. My current development machine has four 3GHZ processors in it, 8GB of RAM, and 750GB of disk space (RAID 1, so it's really double that), and two 19-inch flatscreen monitors.
That's because I'm there on a contract. The employees all have much more powerful boxes than mine.
I understand that the server machines, especially the database servers, are much faster.
Mainframe?

Depending on you workload this is worth looking at!
There are a bewildering number of options available to you just using the IBM hardware:
Its definately worth considering the add on java processors.
(these are actually not that different form the standard cpus its just they are
restricted to java jvm workloads -- and -- most importantly are excluded from
cpu based software license pricing).
You can run multple Linux VMs each ruuning thier own Java app.
You can run multiple native VMs running thier minimalist operating system
used to be called DOS but they change the name every couple of years.
The software licenses are cheaper than the main OS, but it has very limited
functionality which turns out to be an advantage if you are running self contained
applications.
You can run in the monster z/OS environment either:-
a. Within USS (Unix System Services) which is pretty much a full UNIX
OS running inside the parent z/OS.
b. Run your java app in its own started task (== unix daemon).
c. Run your app inside CICS.
(Probably not as you need to use CICS/Java API where you would
normally use Servlet/J2EE APIs so you app would require a rewrite.)

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.