Java Threads vs OS Threads


It looks like I have mixed up Java threads, OS threads, and interpreted languages.
Before I begin: I understand that green threads are Java threads where the threading is handled by the JVM and the entire Java process runs as a single OS thread, which makes them useless on a multiprocessor system.
Now my question is this. I have two threads, A and B, each with 100 thousand lines of independent code. I run these threads in my Java program on a multiprocessor system. Each thread will be given a native OS thread to run on, which can be scheduled on a different CPU, but since Java is interpreted, will these threads have to interact with the JVM again and again to convert the bytecode to machine instructions? Am I right? If yes, then for smaller programs Java threads won't be a big advantage?
Once HotSpot compiles both of these execution paths, can both be as good as native threads? Am I right?
[EDIT]: An alternate question: assume you have a single Java thread whose code is not JIT-compiled, and you create that thread and start() it. How do the OS thread and the JVM interact to run that bytecode?
thanks

Each Thread will be given a native OS Thread to RUN which can run on a different CPU but since Java is interpreted these threads will require to interact with the JVM again and again to convert the byte code to machine instructions? Am I right?
You are mixing up two different things: the JIT compilation done by the VM and the threading support offered by the VM. Deep down, everything you do translates to some sort of native code. A bytecode instruction that uses threads is no different from JIT-compiled code that accesses threads.
If yes, then for smaller programs Java Threads won't be a big advantage?
Define small here. For short-lived processes, yes, threading doesn't make that big a difference since sequential execution is fast enough. Note that this again depends on the problem being solved. For UI toolkits, no matter how small the application, some sort of threading or asynchronous execution is required to keep the UI responsive.
Threading also makes sense when you have things that can be run in parallel. A typical example would be doing heavy I/O in one thread and computation in another. You really wouldn't want to stall your processing just because your main thread is blocked doing I/O.
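To make the I/O-plus-computation case concrete, here is a minimal sketch. The class and method names are made up for illustration, and `slowRead()` only sleeps to stand in for a real blocking read; the point is simply that the calling thread keeps computing while the other thread is blocked.

```java
class IoAndCompute {
    // Hypothetical stand-in for a blocking I/O call: sleeps instead of touching a file or socket.
    static String slowRead() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "payload";
    }

    // CPU-bound work: the sum 1 + 2 + ... + n.
    static long sumTo(long n) {
        long total = 0;
        for (long i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Runs the fake I/O on a second thread while the calling thread computes.
    static String runBoth() {
        String[] ioResult = new String[1];
        Thread io = new Thread(() -> ioResult[0] = slowRead(), "io-thread");
        io.start();
        long sum = sumTo(1_000_000);      // proceeds while io-thread is blocked
        try {
            io.join();                    // after join(), ioResult[0] is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ioResult[0] + ":" + sum;
    }

    public static void main(String[] args) {
        System.out.println(runBoth());    // payload:500000500000
    }
}
```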
Once the Hotspot compiles both these execution paths both can be as good as native Threads? Am I right?
See my first point.
Threading really isn't a silver bullet, especially when it comes to the common misconception of "use threads to make this code go faster". A bit of reading and experience will be your best bet. Can I recommend getting a copy of this awesome book? :-)
@Sanjay: In fact now I can reframe my question. If I have a Thread whose code has not been JIT'd how does the OS Thread execute it?
Again I'll say it, threading is a completely different concept from JIT. Let's try to look at the execution of a program in simple terms:
java pkg.MyClass -> VM locates method to be run -> start executing the byte-code for the method line by line -> convert each byte-code instruction to its native counterpart -> instruction executed by OS -> instruction executed by machine
When JIT has kicked in:
java pkg.MyClass -> VM locates method to be run which has been JIT'ed -> locate the associated native code for that method -> instruction executed by OS -> instruction executed by machine
As you can see, irrespective of the route you follow, the VM instruction has to be mapped to its native counterpart at some point. Whether that native code is stored for re-use or thrown away is a different matter (an optimization, remember?).
Hence to answer your question, whenever you write threading code, it is translated to native code and run by the OS. Whether that translation is done on the fly or looked up at that point in time is a completely different issue.
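The original question asked what happens when you create a thread and `start()` it. A small sketch, assuming a mainstream JVM with native threads, shows the one thing that does differ: which thread ends up executing the `run()` body. Whether that body is interpreted or JIT-compiled is invisible here. The class and helper names are hypothetical.

```java
class StartVsRun {
    // Returns the name of the thread that actually executed the Runnable's body.
    static String executedBy(boolean useStart) {
        String[] seen = new String[1];
        Thread t = new Thread(() -> seen[0] = Thread.currentThread().getName(), "worker-1");
        if (useStart) {
            t.start();          // a new thread is created and run() executes on it
            try {
                t.join();       // wait so that seen[0] is set and visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        } else {
            t.run();            // ordinary method call on the calling thread; no new thread
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("start(): body ran on " + executedBy(true));
        System.out.println("run():   body ran on " + executedBy(false));
    }
}
```

With `start()` the body reports `worker-1`; with a direct `run()` call it reports whatever thread made the call.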

and the entire Java process runs only as a single OS Thread
This is not true. Although it is not specified, in practice Java threads are native OS threads, and multithreaded Java applications really do make use of multi-core processors or multi-processor platforms.
A common recommendation is to use a thread pool where the number of threads is proportional to the number of cores (factor 1-1.5). This is another hint that the JVM is not restricted to a single OS thread or process.
From Wikipedia:
In Java 1.1, green threads were the only threading model used by the JVM, at least on Solaris. As green threads have some limitations compared to native threads, subsequent Java versions dropped them in favor of native threads.
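The pool-sizing rule of thumb above could be sketched as follows. `poolSize` is a hypothetical helper, and the factor of 1.5 is just the upper end of the range quoted in the answer, not a tuned value.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class CoreSizedPool {
    // Hypothetical helper: scale the core count by a factor, never going below one thread.
    static int poolSize(int cores, double factor) {
        return Math.max(1, (int) Math.ceil(cores * factor));
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        int size = poolSize(cores, 1.5);
        ExecutorService pool = Executors.newFixedThreadPool(size);
        for (int i = 0; i < size; i++) {
            int id = i;
            pool.submit(() ->
                System.out.println("task " + id + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();                              // accept no new tasks
        pool.awaitTermination(5, TimeUnit.SECONDS);   // wait for submitted tasks to finish
    }
}
```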
Now, back in 2010 with Java 7 under development and Java 8 planned - are we really still interested in historic "green threads"??

Threading and running bytecode are separate issues. Green threads are used by the JVM on platforms that do not have native thread support. (I do not know of any platform that lacks thread support.)
Bytecode is interpreted at run time and executed on the native platform by the JVM. The JVM decides which code fragments are executed most often and performs so-called just-in-time compilation of these fragments, so it does not have to translate them again and again. This is independent of threading. If, for example, you have one thread that executes the same code fragment in a loop, this fragment will be compiled and cached by the just-in-time compiler.
Bottom line: do not worry about performance and threads. Java is strong enough to run everything you are coding.

Some Java implementations may create green threads as you describe (scheduling done by the JVM on a single native thread), but normal implementations of Java on PC use multiple cores.
The JVM itself might already use different threads for the work to do (garbage collection, class loading, byte-code-verification, JIT-Compiler).
The OS runs a program called JVM. The JVM executes the Java-Bytecode. If every Java-Thread has an associated native thread (that makes sense and seems to be the case on PC-implementations), then the JVM-code in that thread executes the Java-code - JITed or interpreted - like on a single-thread-program. No difference here through multithreading.
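As a rough illustration that the JVM runs more than your own thread, the sketch below lists the live threads that the JVM exposes to Java code. The exact names vary by JVM and version, and internal GC or compiler threads are generally not visible through this API, so treat the output as a partial view. The class name is made up.

```java
import java.util.List;
import java.util.stream.Collectors;

class JvmThreads {
    // Names of all live threads visible to Java code, sorted for stable printing.
    static List<String> liveThreadNames() {
        return Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Even a program that never creates a thread usually shows a few helper threads here.
        for (String name : liveThreadNames()) {
            System.out.println(name);
        }
    }
}
```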

Related

Java threads are concurrent or parallel?

So I am pretty confused. I read in an article that from version 1.7 onwards Java has been 'core-aware'.
Now the question is: if I use the Thread class, will the threads be parallel or concurrent, assuming that it's a multi-core system, the tasks are fully disjoint, and only this process is running on the system?
What was the situation before version 1.7? Does that mean Java was only concurrent back then?
Please also answer the same for ForkJoinPool and thread pools (the Executor framework).
Concurrent: not at the same instant; on the same core, sequentially, i.e. at the mercy of the thread scheduler.
Parallel: at the same instant on different cores, e.g. 8 threads on 4 cores (hyperthreaded).
Thanks a lot in advance

Parallel is concurrent. "Concurrent" means that the effective order in which events from two or more different threads happen is undefined (not counting events like unlocking a mutex that are specifically intended to coordinate the threads). "Parallel" means that the threads are using more CPU resources than a single CPU core is able to provide. Threads can't run in parallel without also running concurrently.
What was the situation before 1.7 version
I don't remember what changed with 1.7, but I was using Java from its earliest days, and the language always promised that threads would run concurrently. Whether or not they also were able to run in parallel was outside of the scope of the language spec. It depended on what hardware you were running on, what operating system and version, what JVM and version, etc.
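The "order is undefined" definition of concurrency can be seen in a small sketch like the one below (class name hypothetical). Each thread's own entries stay in order, but how the "A" and "B" entries interleave is up to the scheduler and may differ from run to run, whether or not the threads also happen to run in parallel.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Interleaving {
    // Two threads append tagged entries to one shared, synchronized list.
    static List<String> runTwo(int perThread) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Thread a = new Thread(() -> {
            for (int i = 0; i < perThread; i++) log.add("A" + i);
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < perThread; i++) log.add("B" + i);
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        // The set of entries is fixed; the relative order of A and B entries is not.
        System.out.println(runTwo(5));
    }
}
```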

I think that the actual change that the "article" was referring to happened in Java 1.3 when the "green thread" implementation [1] was replaced with "native" threads. (Source: https://en.wikipedia.org/wiki/Green_thread)
However, your distinction between Concurrent vs Parallel does not match Oracle / Sun's definitions; see Sun's Multithreaded Programming Guide: Defining Multithreading Terms.
"Parallelism: A condition that arises when at least two threads are executing simultaneously."
"Concurrency: A condition that exists when at least two threads are making progress. A more generalized form of parallelism that can include time-slicing as a form of virtual parallelism.".
This also aligns with what the Wikipedia page on Concurrency says.
"In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the outcome. This allows for parallel execution of the concurrent units, which can significantly improve overall speed of the execution in multi-processor and multi-core systems."
If you could give us a citation for the source(s) of your definitions of "Concurrent" and "Parallel", it would help us to understand whether there is a genuine dispute about the terminology ... or ... you are simply misinformed.
[1] Interesting fact: they were called "green threads" because the Sun team that developed the first Java release was called the "Green Team". Source: "Java Technology: The early years" by Jon Byous, April 2003.

So I am pretty confused. I read in an article that from version 1.7 onwards Java has been 'core-aware'.
I think the context matters. Maybe you are talking about this Quora post? To quote:
Ever since Java 7, the JVM has been core-aware and able to access cores within a CPU. If the host has two CPUs with two cores each, then it can create four threads and dispatch them to each of the four cores.
This is not talking about the differences between concurrency theory or parallelism but rather about how the JVM interfaces with the OS and the hardware to provide thread services to the application.
What was the situation before 1.7 version, does that mean java was only concurrent back then?
Java threads were available for some time before 1.7. Most of the concurrency support was greatly improved in 1.5. Again, the post seems to be specifically about CPUs versus cores. Applications before 1.7 could use multiple cores to run in parallel.
Now question is if I use Thread class, will the threads be parallel or concurrent assuming that its a multi-core system and tasks are fully disjoint, and lets assume only this process is running on the system?
So this part of the question seems to be addressing the academic terms "parallel" and "concurrent". @SolomonSlow sounds like they have more academic instruction around this. I've been programming threads for 30+ years, starting when they were non-preemptive, back when reentrance was more about recursion than threads.
To me, "concurrent" means in parallel: running concurrently on a single piece of hardware, on different cores (physical or virtual). I understand that the Wikipedia page on Concurrency (computer science) disagrees with my definition.
I also understand that a threaded program may run serially depending on many factors including the application itself, the OS, the load on the server running it, etc. and there is a lot of theory behind all this.
Concurrent: not at the same instant; on the same core, sequentially, i.e. at the mercy of the thread scheduler.
This definition I disagree with. The wikipedia page talks about the fact that 2 concurrent units can run in parallel or out of order which could mean sequentially, but it's not part of the definition.

Wouldn't each thread require its own copy of the JVM?

I'm trying to figure out how the JVM works with regard to spawning multiple threads. I think my mental model may be a little off, but right now I am stuck on grokking this idea: since there is only one copy of the JVM running at any time, wouldn't each thread require its own copy of the JVM? I realize that the multiple threads of a Java application are mapped to native OS threads, but I don't get how the threads that are not running the JVM are crunching bytecode. Is it that all threads somehow have access to the JVM? Thanks, any help appreciated.

but I don't get how the threads that are not running the JVM are crunching bytecode; is it that all threads somehow have access to the JVM?
http://www.artima.com/insidejvm/ed2/jvmP.html explains this well. Here is what it says:
"Each thread of a running Java application is a distinct instance of the virtual machine's execution engine. From the beginning of its lifetime to the end, a thread is either executing bytecodes or native methods. A thread may execute bytecodes directly, by interpreting or executing natively in silicon, or indirectly, by just-in-time compiling and executing the resulting native code. A Java virtual machine implementation may use other threads invisible to the running application, such as a thread that performs garbage collection. Such threads need not be "instances" of the implementation's execution engine. All threads that belong to the running application, however, are execution engines in action."
Summarizing my understanding of this:
For every thread (except the GC thread and its ilk), a corresponding execution engine instance (in the same JVM) converts bytecodes to machine instructions, and a native OS thread executes those machine instructions. Of course, I am not talking about green threads here.

This is a bit oversimplified and some of what I wrote isn't strictly correct, but the essence is this:
since there is only one copy of the JVM running at any time, wouldn't each thread require its own copy of the JVM?
Not really. You could permit multiple threads to read from one piece of memory (as in the same address in memory) and thus have only one JVM. However, you need to be careful that threads don't create a mess when they access such shared resources (the JVM) concurrently, as is the case in the real world (imagine two people trying to type two different documents at the same time on one PC).
One strategy for having multiple threads work well together with some shared resource (such as the JVM (stack, heap, bytecode compiler), console, printer, etc.) is indeed to have a copy for each thread (one PC for each person). For example, each thread has its own stack.
This is, however, not the only way. For example, immutable resources (like class bytecode in memory) can be shared among multiple threads without problems through shared memory. If a memo doesn't change, two people can both safely look at that memo at the same time. Similarly, because class bytecode doesn't change, multiple threads can read it at the same time from one copy.
Another way is to use a lock to sort things out between the threads (whoever is touching the mouse gets to use the PC). For example, you could imagine a JVM in which there is only one bytecode interpreter that is shared among all threads and protected by one global lock (which would be very inefficient in practice, but you get the idea).
There are also some other advanced mechanisms that let multiple threads work with shared resources. The people who developed the JVM used these techniques, and that's why you don't need a copy of the JVM per thread.
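Two of the strategies above, a per-thread copy and a lock around a shared resource, can be seen in ordinary Java code. This is only an application-level sketch with made-up names, not a picture of how the JVM is built internally: the loop variable lives on each thread's own stack, while the counter object is shared and guarded by a lock.

```java
class SharedCounter {
    private long value;                          // shared state, lives on the heap

    synchronized void increment() { value++; }   // the lock admits one thread at a time
    synchronized long get() { return value; }

    // Starts the given number of threads, each incrementing the same counter.
    static long countWith(int threads, int perThread) {
        SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                // 'i' is a local variable: every thread has its own copy on its own stack.
                for (int i = 0; i < perThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWith(4, 10_000));   // 40000
    }
}
```

Without `synchronized`, the final count could come out lower, because `value++` is a read-modify-write sequence that two threads can interleave.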

By definition, threads in a Java application share the same memory space and therefore execute within the same JVM. This way you can easily share objects across multiple threads, perform synchronization, and so on; all of that happens within the JVM.
One way to see it is that processes have their own memory space, while threads within an application share the same memory space.
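A short sketch of what "same memory space" means in practice: an object allocated by one thread is the very same heap object another thread sees. The class and the `REGISTRY` field are hypothetical names for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SharedHeap {
    // A static field: one copy for the whole JVM, reachable from every thread.
    static final Map<String, StringBuilder> REGISTRY = new ConcurrentHashMap<>();

    static String roundTrip() {
        // The producer thread allocates an object and publishes it through the shared map.
        Thread producer = new Thread(
            () -> REGISTRY.put("greeting", new StringBuilder("hello")), "producer");
        producer.start();
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // The calling thread reads the same heap object and modifies it in place.
        StringBuilder shared = REGISTRY.get("greeting");
        shared.append(" world");
        return REGISTRY.get("greeting").toString();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());   // hello world
    }
}
```

Two separate processes could not do this without some explicit inter-process mechanism, since each process has its own address space.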

How can I get the cpu usage a jvm process consumes for each cpu core using java?

I'm using Java to monitor the CPU usage of a certain Java process. Here are my questions:
First, is there a limit such that a single process can only consume CPU time on one or a limited number of CPU cores? Or can it use CPU time on every core?
Second, if I want to monitor the CPU usage of a certain Java process for each CPU core, how can I do that?
I would prefer to handle it using pure Java, not native methods.

To the operating system, a single thread (which I assume is what you mean by "Java process") essentially cannot use CPU on more than one "processor" (which may or may not mean a physical core-- see below) simultaneously.
Generally, whenever a given thread gets a "turn at running", Windows (and I assume other operating systems) will attempt to schedule a given thread on to the same "processor" that it last ran on.
However, the situation is complicated by hyperthreading CPUs, which present several "processors" to the operating system for what is actually a single physical core. In this case, it is the CPU itself that switches between which instruction of which thread is running on which component of the given core at any one time. (For example, the core's arithmetic unit could be performing an arithmetic instruction for Thread 1 while the load/store unit is fetching data from memory for an instruction for Thread 2.)
So given the complexity of this situation, even if you can get per-core measurements, I'm not entirely sure what useful meaning you would attach to them.
P.S. If you'll permit the plug, I don't know if this Java-focussed article on thread scheduling that I wrote a couple of years ago might be useful. I should say I wrote it before either Windows 7 or the latest Intel Core CPUs were released, and there may be one or two updates to the information that would be pertinent (in particular, I don't address the issue of variable core speeds and how that could affect scheduling).
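For the "pure Java" part of the question, the closest thing in the standard `java.lang.management` API, as far as I know, is CPU time per thread rather than per core. The sketch below uses `ThreadMXBean`; the class name and the busy loop are made up for illustration, and CPU time measurement may be unsupported or disabled on some JVMs, which the code checks for.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadCpuTime {
    static volatile long sink;   // keeps the busy loop's result observable

    // Burns CPU on the calling thread; returns CPU nanoseconds used, or -1 if not measurable.
    static long measureBusyLoopNanos(long iterations) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            return -1;
        }
        long before = bean.getCurrentThreadCpuTime();
        long x = 0;
        for (long i = 0; i < iterations; i++) {
            x += i ^ (x >>> 3);
        }
        sink = x;
        return bean.getCurrentThreadCpuTime() - before;
    }

    public static void main(String[] args) {
        System.out.println("processors visible to the JVM: "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("busy loop CPU ns: " + measureBusyLoopNanos(50_000_000L));

        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean.isThreadCpuTimeSupported() && bean.isThreadCpuTimeEnabled()) {
            for (long id : bean.getAllThreadIds()) {
                ThreadInfo info = bean.getThreadInfo(id);   // null if the thread has ended
                long nanos = bean.getThreadCpuTime(id);     // -1 if the thread has ended
                if (info != null && nanos >= 0) {
                    System.out.println(info.getThreadName() + ": " + nanos + " ns");
                }
            }
        }
    }
}
```

This reports how much CPU each thread consumed, not which core it ran on; the latter is the operating system's business.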

Understanding java's native threads and the jvm

I understand that the JVM is itself an application that turns the bytecode of the Java executable into native machine code, but when using native threads I have some questions that I just cannot seem to answer.
Does every thread create its own instance of the JVM to handle its particular execution?
If not, then does the JVM have to have some way to schedule which thread it will handle next? If so, wouldn't this render the multi-threaded nature of Java useless, since only one thread can be run at a time?

Does every thread create their own instance of the JVM to handle their particular execution?
No. They execute in the same JVM so that (for example) they can share objects and values of static fields.
If not then does the JVM have to have some way to schedule which thread it will handle next
There are two kinds of thread implementation in Java. Native threads are mapped onto a thread abstraction which is implemented by the host OS. The OS takes care of native thread scheduling, and time slicing.
The second kind of thread is "green threads". These are implemented and managed by the JVM itself, with the JVM implementing thread scheduling. Java green thread implementations have not been supported by Sun / Oracle JVMs since Java 1.2. (See Green Threads vs Non Green Threads)
If so wouldn't this render the multi-threaded nature of Java useless since only one thread can be run at a time?
We are talking about green threads now, and this is of historic interest (only) from the Java perspective.
Green threads have the advantage that scheduling and context switching are faster in the non-I/O case. (Based on measurements made with Java on Linux 2.2; http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.9238)
With pure green threads, N programming language threads are mapped to a single native thread. In this model you don't get true parallel execution, as you noted.
In a hybrid thread implementation, N programming language threads are mapped onto M native threads (where N > M). In this model, the in-process thread scheduler is responsible for the green thread to native thread scheduling AND you get true parallel execution (if M > 1); see https://stackoverflow.com/a/16965741/139985.
But even with pure green threads, you still get concurrency. Control is switched to another thread when a thread blocks on an I/O operation, while acquiring a lock, and so on. Furthermore, the JVM's runtime could implement periodic thread preemption so that a CPU-intensive thread doesn't monopolize the (single) core to the exclusion of other threads.

Does every thread create their own instance of the jvm to handle their particular execution?
No, your application running in the JVM can have many threads that all exist within that instance of the JVM.
If not then does the jvm have to have some way to schedule which thread it will handle next...
Yes, the JVM has a thread scheduler. There are many different algorithms for thread scheduling, and which one is used is JVM-vendor dependent. (Scheduling in general is an interesting topic.)
...if so wouldn't this render the multi-threaded nature of Java useless since only one thread can be run at a time?
I'm not sure I understand this part of your question. This is kind of the point of threading. You typically have more threads than CPUs, and you want to run more than one thing at a time. Threading allows you to take full(er) advantage of your CPU by making sure it's busy processing one thread while another is waiting on I/O, or is for some other reason not busy.

A Java thread may be mapped one-to-one to a kernel thread, but this need not be the case. There could be n kernel threads running m Java threads, where m may be much larger than n, and n should be larger than the number of processors. The JVM itself starts the n kernel threads, and each one of them picks a Java thread and runs it for a while, then switches to some other Java thread. The operating system picks kernel threads and assigns them to a CPU. So there may be thread scheduling on several levels.
You may be interested in looking at the Go programming language, where thousands of so-called "goroutines" are run by dozens of threads.
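The idea of many units of work multiplexed onto a few threads can be approximated at the application level with an executor. To be clear, this is an analogy and not the n-to-m thread mapping itself: the tasks below are plain `Runnable`s, not Java threads, and the class name is hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ManyTasksFewThreads {
    // Runs 'tasks' small jobs on a pool of 'poolThreads' threads and
    // returns how many distinct threads actually executed them.
    static int distinctWorkers(int poolThreads, int tasks) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                names.add(Thread.currentThread().getName());
                done.countDown();
            });
        }
        try {
            done.await();               // wait until every task has run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return names.size();
    }

    public static void main(String[] args) {
        // 1000 tasks are served by at most 4 threads.
        System.out.println(distinctWorkers(4, 1000));
    }
}
```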

Java threads are mapped to native OS threads. They have little to do with the JVM itself.

How to force two Java threads to run on same processor/core?

I would like a solution that doesn't involve critical sections or similar synchronization alternatives. I'm looking for something similar to Fibers (user-level threads) on Windows.

The OS manages which threads are processed on which core. You will need to assign the threads to a single core in the OS.
For instance, on Windows, open Task Manager, go to the Processes tab, right-click on the Java processes, and assign them to a specific core.
That is the best you are going to get.

To my knowledge there is no way you can achieve that.
Simply because the OS manages running threads and distributes resources according to its scheduler.
Edit:
Since your goal is to have a "spare" core to run other processes on I'd suggest you use a thread manager and get the number of cores on the system (x) and then spawn at most x-1 threads on the specific system. That way you'll have your spare core.
The former statements still apply: you cannot specify which cores threads run on unless you specify it in the OS. From Java, you cannot.
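The "at most x-1 threads" suggestion might look like the sketch below. `workerCount` is a hypothetical helper. Note that this only limits how many threads the program starts; it does not pin anything to a core, and the OS remains free to schedule the threads wherever it likes.

```java
class SpareCore {
    // Leave one core's worth of capacity for other processes, but always keep one worker.
    static int workerCount(int cores) {
        return Math.max(1, cores - 1);
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        Thread[] workers = new Thread[workerCount(cores)];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(
                () -> System.out.println(Thread.currentThread().getName() + " running"),
                "worker-" + i);
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
    }
}
```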

Short of assigning the entire JVM to a single core, I'm not sure how you'd be able to do this. In Linux, you can use taskset:
http://www.cyberciti.biz/tips/setting-processor-affinity-certain-task-or-process.html
I suppose you could run your JVM within a virtualized environment (e.g., VirtualBox/VMWare instance) with one processor allocated, but I'm not sure that that gets you what you want.

I read this as asking if a Java application can control the thread affinity itself. Java does not provide any way to control this. It is treated as the business of the host operating system.
If anything can do it, the OS can, and operating systems typically can, though the tools you use for thread pinning will be OS-specific. (But if the OS is itself virtualized, there are two levels of pinning. I don't know if that is going to work or be practical.)
There don't appear to be any relevant HotSpot JVM thread tuning options in modern JVMs.
If you were using a JRockit JVM you could choose between "native threads" (where there is a 1-1 mapping between Java and OS threads) and "thin threads", where multiple Java threads are multiplexed onto a small number of OS threads. But AFAIK, JRockit "thin threads" are only supported in 32-bit mode, and they don't allow you to tune the number of OS threads used.
This is really the kind of question that you should be asking under a Sun support contract. They have people who have spent years figuring out how to get the best performance out of big Java apps.
