Recently, I came to know about the Java 7 fork/join framework - what I learned is that it could be useful for divide-and-conquer like problems.
My question is, does the framework guarantees executing threads on separate CPUs? Or is it event possible to instruct the threads I create using classes of concurrent package to run on separate CPUs available in my server?
It'll be built upon the standard JVM concurrency primitives, in which case they will (eventually) be scheduled onto real OS threads. You cannot guarantee that your OS scheduler is going to schedule threads onto separate CPUS, although it's quite likely in most instances.
Trying to guess what a concurrent scheduler is going to do at runtime is a really bad idea. Just assume that you will be able to make use of no more than as many CPUs as you have active threads, and don't try to second-guess the runtime behaviour unless you're trying to do a particular kind of very low-level optimisation.
At least it will do its best. The fork/join framework is designed to take advantage of multiple processors. By default ForkJoinPool is created with the number of worker threads equal to the number of processors.
Does the framework guarantee executing threads on separate CPUs?
No. No guarantees.
Or is it event possible to instruct the threads I create using classes of concurrent package to run on separate CPUs available in my server?
Not using the standard Java libraries. In theory, anything is possible (up to the limit of what the OS allows) if you are willing to dig around in the native layers of the JVM. But you will be in for a lot of unnecessary work / pain.
My advice:
You probably don't need that level of control. (IMO) it is likely that the default behaviour of the native thread scheduler is "good enough" to achieve satisfactory performance.
If you really need that level of control, you would be better off using a different programming language; i.e. one where you can interact directly with the host OS'es native thread scheduler. You may even need a different operating system ...
Related
So i am pretty confused. I read in an article that version 1.7 onwards java has been 'core-aware'
Now question is if I use Thread class, will the threads be parallel or concurrent assuming that its a multi-core system and tasks are fully disjoint, and lets assume only this process is running on the system?
What was the situation before 1.7 version, does that mean java was only concurrent back then?
Also tell the same for the ForkJoinPool and ThreadPool (Executor Framework).
Concurrent: Not on the same instant, on same core sequentially i.e. on mercy of Thread Schedular.
Parallel: On the same instant on different cores e.g. 8 threads/4 cores(hyperthreaded).
Thanks a lot in advance
Parallel is concurrent. "Concurrent" means that the effective order in which events from two or more different threads happen is undefined (not counting events like unlocking a mutex that are specifically intended to coordinate the threads.) "Parallel" means, that the threads are using more CPU resources than a single CPU core is able to provide. Threads can't run in parallel without also running concurrently.
What was the situation before 1.7 version
I don't remember what changed with 1.7, but I was using Java from its earliest days, and the language always promised that threads would run concurrently. Whether or not they also were able to run in parallel was outside of the scope of the language spec. It depended on what hardware you were running on, what operating system and version, what JVM and version, etc.
I think that the actual change that the "article" was referring to happened in Java 1.3 when the "green thread" implementation1 was replaced with "native" threads. (Source: https://en.wikipedia.org/wiki/Green_thread)
However, your distinction between Concurrent vs Parallel does not match Oracle / Sun's definitions; see Sun's Multithreaded Programming Guide: Defining Multithreading Terms.
"Parallelism: A condition that arises when at least two threads are executing simultaneously."
"Concurrency: A condition that exists when at least two threads are making progress. A more generalized form of parallelism that can include time-slicing as a form of virtual parallelism.".
This also aligns with what the Wikipedia page on Concurrency says.
"In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the outcome. This allows for parallel execution of the concurrent units, which can significantly improve overall speed of the execution in multi-processor and multi-core systems."
If you could give us a citation for the source(s) of your definitions of "Concurrent" and "Parallel", it would help us to understand whether there is a genuine dispute about the terminology ... or ... you are simply misinformed.
1 - Interesting fact: they were called "green threads" because the Sun team that developed the first Java release was called the "Green Team". Source: "Java Technology: The early years" by Jon Byous, April 2003.
So i am pretty confused. I read in an article that version 1.7 onwards java has been 'core-aware'
I think the context matters. Maybe you are talking about this Quora post? To quote:
Ever since Java 7, the JVM has been core-aware and able to access cores within a CPU. If the host has two CPUs with two cores each, then it can create four threads and dispatch them to each of the four cores.
This is not talking about the differences between concurrency theory or parallelism but rather about how the JVM interfaces with the OS and the hardware to provide thread services to the application.
What was the situation before 1.7 version, does that mean java was only concurrent back then?
Java threads have been available for some time before 1.7. Most of the concurrency stuff was greatly improved in 1.5. Again, the post seems specifically about CPUs verses cores. Applications before 1.7 could use multiple cores to run in parallel.
Now question is if I use Thread class, will the threads be parallel or concurrent assuming that its a multi-core system and tasks are fully disjoint, and lets assume only this process is running on the system?
So this part of the question seems to be addressing the academic terms "parallel" and "concurrent". #SolomonSlow sounds like they have more academic instruction around this. I've been programming threads for 30+ years starting when they were non-preemptive – back when reentrance was more about recursion than threads.
To me "concurrent" means in parallel – running concurrency on a single piece of hardware put on different cores (physical or virtual). I understand that this wikipedia page on Concurrency (computer science) disagrees with my definition.
I also understand that a threaded program may run serially depending on many factors including the application itself, the OS, the load on the server running it, etc. and there is a lot of theory behind all this.
Concurrent: Not on the same instant, on same core sequentially i.e. on mercy of Thread Schedular.
This definition I disagree with. The wikipedia page talks about the fact that 2 concurrent units can run in parallel or out of order which could mean sequentially, but it's not part of the definition.
What I know is after JDK 1.2 all Java Threads are created using 'Native Thread Model' which associates each Java Thread with an OS thread with the help of JNI and OS Thread library.
So from the following text I believe that all Java threads created nowadays can realize use of multi-core processors:
Multiple native threads can coexist. Therefore it is also called many-to-many model. Such characteristic of this model allows it to take complete advantage of multi-core processors and execute threads on separate individual cores concurrently.
But when I read about the introduction of Fork/Join Framework introduced in JDK 7 in JAVA The Compelete Reference :
Although the original concurrent API was impressive in its own right, it was significantly expanded by JDK 7. The most important addition was the Fork/Join Framework. The Fork/Join Framework facilitates the creation of programs that make use of multiple processors (such as those found in multicore systems). Thus, it streamlines the development of programs in which two or more pieces execute with true simultaneity (that is, true parallel execution), not just time-slicing.
It makes me question why the framework was introduced when 'Java Native Thread Model' existed since JDK 3?
Fork join framework does not replace the original low level thread API; it makes it easier to use for certain classes of problems.
The original, low-level thread API works: you can use all the CPUs and all the cores on the CPUs installed on the system. If you ever try to actually write multithreaded applications, you'll quickly realize that it is hard.
The low level thread API works well for problems where threads are largely independent, and don't have to share information between each other - in other words, embarrassingly parallel problems. Many problems however are not like this. With the low level API, it is very difficult to implement complex algorithms in a way that is safe (produces correct results and does not have unwanted effects like dead lock) and efficient (does not waste system resources).
The Java fork/join framework, an implementation on the fork/join model, was created as a high level mechanism to make it easier to apply parallel computing for divide and conquer algorithms.
I recently had a chance to listen to another discussion about the JVM. It was stated that Java Virtual Machine is well built in terms of concurrency. I could not find any satisfying answer to what I thought to be a simple question: when JVM runs multiple-threads (and therefore uses multiple virtual-cores, right?) does it make use of multiple real cores of machine's CPU?
I heard "not always" or "sometimes" as an answer; so is there any way to ensure that when we design our code to run multiple threads the JVM will use multiple cores of the CPU as well? Or the other way, what determines whether the JVM uses mutliple CPU cores or not?
I am not really able to give an example of when this would be necessary, but I find it interesting, as I know designers who prefer everything to be deterministic in their project. And what would really be the point of having multiple threads in some big applications if for real they would never be computed parallely?
Java threads, like regular threads, may be scheduled to use multiple cores. The real sticky thing about concurrent programming is that it's hard to "force" a program to use X number of cores and to have this thread run on this core, etc. In other words, to be deterministic.
It's ultimately up to the OS and in some sense the hardware. But at the end of the day the programmer should be thinking about concurrent programming abstractly and trusting that the scheduler is doing its job.
In other words, yes.
In all mainstream, modern jvms, the java threads are "real" operating system threads and therefore will be scheduled across cores just like any other OS threads.
I was just wondering whether we actually need the algorithm to be muti-threaded if it must make use of the multi-core processors or will the jvm make use of multiple core's even-though our algorithm is sequential ?
UPDATE:
Related Question:
Muti-Threaded quick or merge sort in java
I don't believe any current, production JVM implementations perform automatic multi-threading. They may use other cores for garbage collection and some other housekeeping, but if your code is expressed sequentially it's difficult to automatically parallelize it and still keep the precise semantics.
There may be some experimental/research JVMs which try to parallelize areas of code which the JIT can spot as being embarrassingly parallel, but I haven't heard of anything like that for production systems. Even if the JIT did spot this sort of thing, it would probably be less effective than designing your code for parallelism in the first place. (Writing the code sequentially, you could easily end up making design decisions which would hamper automatic parallelism unintentionally.)
Your implementation needs to be multi-threaded in order to take advantage of the multiple cores at your disposal.
Your system as a whole can use a single core per running application or service. Each running application, though, will work off a single thread/core unless implemented otherwise.
Java will not automatically split your program into threads. Currently, if you want you code to be able to run on multiple cores at once, you need to tell the computer through threads, or some other mechanism, how to split up the code into tasks and the dependencies between tasks in your program. However, other tasks can run concurrently on the other cores, so your program may still run faster on a multicore processor if you are running other things concurrently.
An easy way to make you current code parallizable is to use JOMP to parallelize for loops and processing power intensize, easily parellized parts of your code.
I dont think using multi-threaded algorithm will make use of multi-core processors effectively unless coded for effectiveness. Here is a nice article which talks about making use of multi-core processors for developers -
http://java.dzone.com/news/building-multi-core-ready-java
I would like a solution that doesn't include critical sections or similar synchronization alternatives. I'm looking for something similar the equivalent of Fiber (user level threads) from Windows.
The OS manages what threads are processed on what core. You will need to assign the threads to a single core in the OS.
For instance. On windows, open task manager, go to the processes tab and right click on the java processes... then assign them to a specific core.
That is the best you are going to get.
To my knowledge there is no way you can achieve that.
Simply because the OS manages running threads and distributes resources according to it's scheduler.
Edit:
Since your goal is to have a "spare" core to run other processes on I'd suggest you use a thread manager and get the number of cores on the system (x) and then spawn at most x-1 threads on the specific system. That way you'll have your spare core.
The former statements still apply, you cannot specify which cores to run threads on unless you in the OS specify it. But from java, no.
Short of assigning the entire JVM to a single core, I'm not sure how you'd be able to do this. In Linux, you can use taskset:
http://www.cyberciti.biz/tips/setting-processor-affinity-certain-task-or-process.html
I suppose you could run your JVM within a virtualized environment (e.g., VirtualBox/VMWare instance) with one processor allocated, but I'm not sure that that gets you what you want.
I read this as asking if a Java application can control the thread affinity itself. Java does not provide any way to control this. It is treated as the business of the host operating system.
If anything can do it, the OS can, and they typically can, though the tools you use for thread pinning will be OS specific. (But if the OS is itself is virtualized, there are two levels of pinning. I don't know if that is going to work / be practical.)
There don't appear to be any relevant Hotspot JVM thread tuning options in modern JVMs.
If you were using a Rockit JVM you could choose between "native threads" (where there is a 1-1 mapping between Java and OS threads) and "thin threads" where multiple Java threads are multiplexed onto a small number of OS threads. But AFAIK, JRocket "thin threads" are only supported in 32bit mode, and they don't allow you to tune the number of OS threads used.
This is really the kind of question that you should be asking under a Sun support contract. They have people who have spent years figuring out how to get the best performance out of big Java apps.