Wouldn't each thread require its own copy of the JVM? - java

I'm trying to figure out how the JVM works with regard to spawning multiple threads. I think my mental model may be a little off, but right now I am stuck on grokking this idea: since there is only one copy of the JVM running at any time, wouldn't each thread require its own copy of the JVM? I realize that the multiple threads of a Java application are mapped to native OS threads, but I don't get how the threads that are not running the JVM are crunching bytecode; is it that all threads somehow have access to the JVM? Thanks, any help appreciated.

but I don't get how the threads that are not running the JVM are crunching bytecode; is it that all threads somehow have access to the JVM?
http://www.artima.com/insidejvm/ed2/jvmP.html explains this well. Here is what it says:
"Each thread of a running Java application is a distinct instance of the virtual machine's execution engine. From the beginning of its lifetime to the end, a thread is either executing bytecodes or native methods. A thread may execute bytecodes directly, by interpreting or executing natively in silicon, or indirectly, by just-in-time compiling and executing the resulting native code. A Java virtual machine implementation may use other threads invisible to the running application, such as a thread that performs garbage collection. Such threads need not be "instances" of the implementation's execution engine. All threads that belong to the running application, however, are execution engines in action."
Summarizing my understanding of this:
For every thread (except the GC thread and its ilk), a corresponding execution engine instance (in the same JVM) converts bytecodes to machine instructions, and a native OS thread executes those machine instructions. Of course, I am not talking about green threads here.

This is a bit oversimplified and some of what I wrote isn't strictly correct, but the essence is this:
since there is only one copy of the JVM running at any time, wouldn't each thread require its own copy of the JVM?
Not really. You could permit multiple threads to read from one piece of memory (as in same address in memory) and thus have only one JVM. However, you need to be careful so that threads don't create a mess when they access such shared resources (JVM) concurrently, as it is the case in the real world (imagine two people trying to type two different documents at the same time with one PC).
One strategy for letting multiple threads work well together with a shared resource (such as parts of the JVM (stack, heap, bytecode compiler), the console, a printer, etc.) is indeed to give each thread its own copy (one PC for each person). For example, each thread has its own stack.
This is however not the only way. For example, immutable resources (like class byte codes in memory) can be shared among multiple threads without problem through shared memory. If a piece of memo doesn't change, two people can both safely look at that memo at the same time. Similarly, because class byte code doesn't change, multiple threads can read them at the same time from one copy.
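To make the last two paragraphs concrete, here is a minimal sketch (the class and method names are made up for illustration). Several threads run the same method, so they share one copy of its bytecode and one immutable list, while each thread keeps its own local variables on its own stack.

```java
import java.util.Arrays;
import java.util.List;

class SharedCodeDemo {
    // Immutable data shared by all threads; safe to read concurrently.
    static final List<Integer> SHARED = List.of(1, 2, 3, 4, 5);

    // Every thread executes this same method (one copy of the bytecode),
    // but 'sum' and 'i' live on the calling thread's own stack.
    static int sumTimes(int factor) {
        int sum = 0;
        for (int i = 0; i < SHARED.size(); i++) {
            sum += SHARED.get(i) * factor;
        }
        return sum;
    }

    static int[] runInThreads(int threadCount) {
        int[] results = new int[threadCount];
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int index = t;
            // Each thread writes only its own slot, so no lock is needed.
            threads[t] = new Thread(() -> results[index] = sumTimes(index + 1));
            threads[t].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join(); // join() makes the worker's write visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runInThreads(3))); // [15, 30, 45]
    }
}
```

The local variables never interfere with each other even though all three threads are inside sumTimes at roughly the same time.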
Another way is to use a lock to sort things out between the threads (whoever is touching the mouse gets to use the PC). For example, you could imagine a JVM in which there is only one bytecode interpreter which is shared among all threads and is protected by one global lock (which would be very inefficient in practice, but you get the idea).
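The lock idea can be sketched in ordinary Java code. This is not how the JVM protects its internals; it only illustrates the principle with a hypothetical shared counter guarded by a single global lock.

```java
class GlobalLockDemo {
    // One lock for everything: simple, but threads queue up behind it.
    private static final Object GLOBAL_LOCK = new Object();

    static int countWithLock(int threadCount, int incrementsPerThread) {
        int[] counter = {0}; // shared, mutable state
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // Only the thread holding the lock may touch the counter.
                    synchronized (GLOBAL_LOCK) {
                        counter[0]++;
                    }
                }
            });
            threads[t].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (GLOBAL_LOCK) {
            return counter[0];
        }
    }

    public static void main(String[] args) {
        System.out.println(countWithLock(4, 10_000)); // 40000
    }
}
```

Without the synchronized block the increments could interleave and some updates would be lost, which is the "mess" described earlier.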
There are also some other advanced mechanisms to let multiple threads work with shared resources. The people who developed the JVM used these techniques, and that's why you don't need a copy of the JVM per thread.

By definition, threads in a Java application share the same memory space and therefore execute within the same JVM. This way you can easily share objects across multiple threads, perform synchronization and such; all of that happens within the JVM.
One way to see it is that processes have their own memory space, while threads within an application share the same memory space.
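One way to check this yourself, assuming Java 9 or later for ProcessHandle (the class name below is made up): ask for the process id from several threads. Every thread reports the same id, because they all live inside the one JVM process.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SameProcessDemo {
    // Collects the process id observed by each of threadCount threads.
    static Set<Long> pidsSeenByThreads(int threadCount) {
        Set<Long> pids = ConcurrentHashMap.newKeySet(); // thread-safe set
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> pids.add(ProcessHandle.current().pid()));
            threads[t].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return pids;
    }

    public static void main(String[] args) {
        // Prints a set with exactly one element: the JVM's own process id.
        System.out.println(pidsSeenByThreads(5));
    }
}
```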

Related

Thread execution on single and multi core

This is what I see in Oracle documentation and would like to confirm my understanding (source):
A computer system normally has many active processes and threads. This
is true even in systems that only have a single execution core, and
thus only have one thread actually executing at any given moment.
Processing time for a single core is shared among processes and
threads through an OS feature called time slicing.
Does it mean that in a single core machine only one thread can be executed at given moment?
And, does it mean that on multi core machine multiple threads can be executed at given moment?
one thread actually executing at any given moment
Imagine that this is a game where 10 people try to sit on 9 chairs in a circle (I think you might know the game): there aren't enough chairs for everyone, but the entire group of people is always moving. It's just that everyone sits on a chair for some amount of time (a very simplified version of time slicing).
Thus multiple processes can run on the same core.
But even if you have multiple processors, it does not mean that a certain thread will run only on one processor during its entire lifetime. There are tools to achieve that (even in Java); this is called thread affinity, where you pin a thread to a particular processor (this is quite handy in some situations). Without pinning, the thread can be moved (rescheduled by the OS) to run on a different core while it is running. This migration involves a context switch, and for some applications switching to a different CPU is unwanted.
At the same time, of course, multiple threads can run in parallel on different cores.
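A small experiment along these lines (a sketch, with a hypothetical class name): start more CPU-bound threads than the machine has cores. They all finish, because the OS scheduler hands out the "chairs" in turns.

```java
import java.util.concurrent.atomic.AtomicInteger;

class TimeSlicingDemo {
    // Starts threadCount CPU-bound threads and returns how many ran to completion.
    static int runThreads(int threadCount) {
        AtomicInteger finished = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                long sum = 0;
                for (int i = 0; i < 1_000_000; i++) {
                    sum += i; // busy work that keeps a core occupied
                }
                if (sum > 0) {
                    finished.incrementAndGet();
                }
            });
            threads[t].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return finished.get();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int threadCount = cores * 2; // deliberately more threads than cores
        System.out.println(cores + " cores, " + runThreads(threadCount)
                + " of " + threadCount + " threads finished");
    }
}
```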
Does it mean that in a single core machine only one thread can be executed at given moment?
Nope, you can easily have more threads than processors assuming they're not doing CPU-bound work. For example, if you have two threads mostly waiting on IO (either from network or local storage) and another thread consuming the data fetched by the first two threads, you could certainly run that on a machine with a single core and obtain better performance than with a single thread.
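Here is a rough sketch of that setup. The "IO" is simulated with Thread.sleep, and the class name and numbers are made up for illustration: two fetcher threads spend most of their time waiting, while the calling thread consumes what they deliver through a queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class IoBoundDemo {
    // Two fetchers each deliver the numbers 1..itemsPerFetcher; the caller sums them.
    static int fetchAndConsume(int itemsPerFetcher) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        Runnable fetcher = () -> {
            try {
                for (int i = 1; i <= itemsPerFetcher; i++) {
                    Thread.sleep(5); // stands in for a blocking network or disk read
                    queue.put(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread first = new Thread(fetcher);
        Thread second = new Thread(fetcher);
        first.start();
        second.start();

        int total = 0;
        try {
            // The consumer blocks in take() until a fetcher has data ready.
            for (int n = 0; n < 2 * itemsPerFetcher; n++) {
                total += queue.take();
            }
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(fetchAndConsume(10)); // 110
    }
}
```

While a fetcher is sleeping (waiting on IO) it uses no CPU, so even a single core is free to run the other threads in the meantime.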
And, does it mean that on multi core machine multiple threads can be executed at given moment?
Well yeah you can execute any number of threads on any number of cores, provided that you have enough memory to allocate a stack for each of them. Obviously if each thread makes intensive use of the CPU it will stop being efficient when the number of threads exceeds the number of cores.

What kind of resources does a process or a thread require in JAVA

I came across the Java documentation, which said
Both processes and threads provide an execution environment, but
creating a new thread requires fewer resources than creating a new
process.
Ref: https://docs.oracle.com/javase/tutorial/essential/concurrency/procthread.html
In this context what actually do we mean by resources?
EDIT1:
Also why is Runnable faster than Threads?
What are the generic resources?
What is the difference in resources both are using?
Spawning a new process will create a new Java Virtual Machine.
Whereas threads will share memory, the JVM, etc.
The JVM is not a light program, so a new one will consume more memory, etc.
Some JVMs are multi-process, allowing multiple processes to share a JVM.
From the linked tutorial in the question:
https://docs.oracle.com/javase/tutorial/essential/concurrency/procthread.html
A process generally has a complete, private set of basic run-time
resources; in particular, each process has its own memory space.
and
Threads share the process's resources, including memory and open files. This makes for efficient, but potentially problematic, communication.
To address EDIT 1.
First let's define some general computing terms.
Operating System Concepts
Resources
From https://en.wikipedia.org/wiki/System_resource
In computing, a system resource, or simply resource, is any physical
or virtual component of limited availability within a computer system.
Every device connected to a computer system is a resource. Every
internal system component is a resource. Virtual system resources
include files (concretely file handles), network connections
(concretely network sockets), and memory areas
Process
https://en.wikipedia.org/wiki/Process_(computing)
In computing, a process is an instance of a computer program that is being executed. It contains the program code and its current activity. Depending on the operating system (OS), a process may be made up of multiple threads of execution that execute instructions concurrently.
Thread
https://en.wikipedia.org/wiki/Thread_(computing)
In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system.
Java Concepts
Process
In relation to Java, a Process typically runs a separate JVM, different heaps, etc.
Threads
Threads share a JVM, and are able to access the same classes and memory, but since they are a concept outside of Java, relating to the Operating System, there are overheads for interacting / creating them.
Runnable - https://docs.oracle.com/javase/7/docs/api/java/lang/Runnable.html
A Runnable is a concept that exists only within Java and that the OS is not aware of. It's literally just an interface with a method called run; you need to handle running it yourself.
The reason for abstracting it away from threads, is that the classes involving threads themselves have to concern themselves with compatibility with the underlying operating system bindings, your runnable doesn't need to know any of this, it's just code that's expected to run in a Java context.
It's really just a marker to show others that you plan for this to be run by a thread, or some other form of scheduled execution.
Whereas threads are external concepts, managed by the operating system, and thus have costs relating to memory, context switching, etc.
Processes are even more costly, and have separate program memory that is not shared.
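The difference is easy to see in a short sketch (the class name and the thread name "worker-1" are made up): calling run() on a Runnable is an ordinary method call on the current thread, and a new thread only comes into play when you hand the Runnable to a Thread and start it.

```java
class RunnableVsThreadDemo {
    // Returns the thread names observed by a Runnable that is called directly
    // (index 0) and by one that is handed to a new Thread (index 1).
    static String[] whereDidItRun() {
        String[] names = new String[2];
        Runnable direct = () -> names[0] = Thread.currentThread().getName();
        Runnable viaThread = () -> names[1] = Thread.currentThread().getName();

        direct.run(); // plain method call: no new thread, no OS involvement

        Thread worker = new Thread(viaThread, "worker-1");
        worker.start(); // this is where a new thread gets created
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String[] names = whereDidItRun();
        System.out.println("direct call ran on: " + names[0]);
        System.out.println("Thread.start ran on: " + names[1]);
    }
}
```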
Everything related with the environment, like cpu,memory, disk, network, etc.

Multiple Processes under one JVM

Can we run multiple processes in one JVM? And each process should have its own memory quota?
My aim is to start new process when a new http request comes in and assign a separate memory to the process so that each user request has its own memory quota - and doesn't bother other user requests if one's memory quota gets full.
How can I achieve this?
Not sure if this is hypothetical.
Short answer: not really.
The Java platform offers you two options:
Threads. And that is the typical answer in many cases: each new incoming request is dealt with by a separate thread (which is probably coming out of a pool to limit the overall number of thread instances that get created/used in parallel). But of course: threads exist in the same process; there is no such thing as controlling the memory consumption "associated" with what a thread is doing.
Child processes. You can create a real process and use that to run whatever you intend to run. But of course: then you have an external real process to deal with.
So, in essence, the real answer is: no, you can't apply this idea to Java. The "more" Java solution would be to look into concepts such as application servers, for example Tomcat or WebSphere.
Or, if you insist on doing things manually; you could build your own "load balancer"; where you have one client-facing JVM; which simply "forwards" requests to one of many other JVMs; and those "other" JVMs would work independently; each running in its own process; which of course you could then "micro manage" regarding CPU/memory/... usage.
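As a rough sketch of the "separate JVM per worker" idea, a parent JVM can launch a child JVM with its own heap limit using ProcessBuilder and the standard -Xmx option. The class and method names below are made up, and -version merely stands in for the worker's real main class, which is not specified here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChildJvmDemo {
    // Builds the command line for a child JVM with its own maximum heap size.
    static List<String> childJvmCommand(int maxHeapMb, String... jvmArgs) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.add("-Xmx" + maxHeapMb + "m"); // heap quota for this process only
        command.addAll(Arrays.asList(jvmArgs));
        return command;
    }

    public static void main(String[] args) throws Exception {
        // "-version" is a placeholder; a real worker would name its main class here.
        Process child = new ProcessBuilder(childJvmCommand(64, "-version"))
                .inheritIO()
                .start();
        long pid = child.pid();
        int exitCode = child.waitFor();
        System.out.println("child JVM " + pid + " exited with " + exitCode);
    }
}
```

If a child runs out of its heap, only that process fails; the parent and the other children are unaffected, which is the isolation the question is after. The price is a full JVM start-up per child, so you would want to keep such workers alive rather than spawn one per request.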
The closest concept is Application Isolation API (JSR-121) that AFAIK has not been implemented: See https://en.wikipedia.org/wiki/Application_Isolation_API.
"The Application Isolation API (JSR 121) provides a specification for isolating and controlling Java application life cycles within a single Java Virtual Machine (JVM) or between multiple JVMs. An isolated computation is described as an Isolate that can communicate and exchange resource handles (e.g. open files) with other Isolates through a messaging facility."
See also https://www.flux.utah.edu/janos/jsr121-internal-review/java/lang/isolate/package-summary.html:
"Informally, isolates are a construct midway between threads and JVMs. Like threads, they can be used to initiate concurrent execution. Like JVMs, they cause execution of a "main" method of a given class to proceed within its own system-level context, independently of any other Java programs that may be running. Thus, isolates differ from threads by guaranteeing lack of interference due to sharing statics or per-application run-time objects (such as the AWT thread and shutdown hooks), and they differ from JVMs by providing an API to create, start, terminate, monitor, and communicate with these independent activities."

Every jvm created for each application, is this a thread or a process

A new JVM instance is allocated to every application that a user starts using the JRE. Is this JVM a new process or a thread, and why?
Is this JVM a new process or a thread?
A process.
Why?
a) Because that is the way that "modern" operating systems work ...
b) Because if JVMs were threads (within a larger process) then different JVMs would be able to interfere with each other in ways that would be impossible to entirely control.
c) Because attempting to address b) would be difficult and would most likely have significant performance implications.
If the JVM were a thread, how could it manage all the I/O control, thread control, and control of the application running under it (and who would start the JVM)?
Threads don't have a separate address space; they run in a shared memory space. Threads are designed for doing small tasks, and loading one with a heavy task leads to an unhandled situation (from the OS perspective). Threads can communicate easily, whereas IPC is quite resource intensive. We install software every day, and in doing so we create processes.
The JVM is equivalent to an operating system process. JVM stands for Java Virtual Machine. It is a memory space where classes are loaded and objects are shared.
It is a process....

Java Threads vs OS Threads

Looks like I have messed up with Java Threads/OS Threads and Interpreted language.
Before I begin, I do understand that Green Threads are Java Threads where the threading is taken care of by the JVM and the entire Java process runs only as a single OS Thread. Thereby on a multi processor system it is useless.
Now my question is: I have two Threads A and B. Each with 100 thousand lines of independent code. I run these threads in my Java Program on a multiprocessor system. Each Thread will be given a native OS Thread to RUN which can run on a different CPU but since Java is interpreted these threads will require to interact with the JVM again and again to convert the byte code to machine instructions ? Am I right ? If yes, then for smaller programs Java Threads won't be a big advantage ?
Once the Hotspot compiles both these execution paths both can be as good as native Threads ? Am I right ?
[EDIT] : An alternate question can be, assume you have a single Java Thread whose code is not JIT compiled, you create that Thread and start() it ? How does the OS Thread and JVM interact to run that Bytecode ?
thanks
Each Thread will be given a native OS
Thread to RUN which can run on a
different CPU but since Java is
interpreted these threads will require
to interact with the JVM again and
again to convert the byte code to
machine instructions ? Am I right ?
You are mixing two different things; JIT done by the VM and the threading support offered by the VM. Deep down inside, everything you do translates to some sort of native code. A byte-code instruction which uses thread is no different than a JIT'ed code which accesses threads.
If yes, then for smaller programs Java
Threads won't be a big advantage ?
Define small here. For short lived processes, yes, threading doesn't make that big a difference since your sequential execution is fast enough. Note that this again depends on the problem being solved. For UI toolkits, no matter how small the application, some sort of threading/asynchronous execution is required to keep the UI responsive.
Threading also makes sense when you have things which can be run in parallel. A typical example would be doing heavy IO in one thread and computation in another. You really wouldn't want to block your processing just because your main thread is blocked doing IO.
Once the Hotspot compiles both these
execution paths both can be as good as
native Threads ? Am I right ?
See my first point.
Threading really isn't a silver bullet, esp when it comes to the common misconception of "use threads to make this code go faster". A bit of reading and experience will be your best bet. Can I recommend getting a copy of this awesome book? :-)
@Sanjay: In fact, now I can reframe my
question. If I have a Thread whose
code has not been JIT'd how does the
OS Thread execute it ?
Again I'll say it, threading is a completely different concept from JIT. Let's try to look at the execution of a program in simple terms:
java pkg.MyClass -> VM locates method
to be run -> Start executing the
byte-code for method line by line ->
convert each byte-code instruction to
its native counterpart -> instruction
executed by OS -> instruction executed
by machine
When JIT has kicked in:
java pkg.MyClass -> VM locates method
to be run which has been JIT'ed ->
locate the associated native code
for that method -> instruction
executed by OS -> instruction executed
by machine
As you can see, irrespective of the route you follow, the VM instruction has to be mapped to its native counterpart at some point in time. Whether that native code is stored for further re-use or thrown away is a different thing (optimization, remember?).
Hence to answer your question, whenever you write threading code, it is translated to native code and run by the OS. Whether that translation is done on the fly or looked up at that point in time is a completely different issue.
and the entire Java process runs only as a single OS Thread
This is not true. Though it is not specified, we often see that Java threads are in fact native OS threads and that multithreaded Java applications really do make use of multi-core processors or multi-processor platforms.
A common recommendation is using a thread pool where the number of threads is proportional to the number of cores (factor 1-1.5). This is another hint, that the JVM is not restricted to a single OS thread / process.
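A minimal sketch of that recommendation, using a factor of 1 and a made-up workload (summing the numbers 1..upTo, split across the workers); the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CoreSizedPoolDemo {
    // Sums 1..upTo using one pool thread per available core.
    static long parallelSum(int upTo) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int worker = 0; worker < cores; worker++) {
                final int start = worker + 1;
                // Worker k handles start, start + cores, start + 2 * cores, ...
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (int n = start; n <= upTo; n += cores) {
                        sum += n;
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for that worker to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1000)); // 500500
    }
}
```

With native threads, the OS is free to schedule each pool thread on a different core, which is why sizing the pool by core count makes sense for CPU-bound work.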
From Wikipedia:
In Java 1.1, green threads were the only threading model used by the JVM,[4] at least on Solaris. As green threads have some limitations compared to native threads, subsequent Java versions dropped them in favor of native threads.
Now, back in 2010 with Java 7 under development and Java 8 planned - are we really still interested in historic "green threads"??
Threading and running bytecode are separate issues. Green threads are used by the JVM on platforms that do not have native support for threads. (IMHO I do not know which platform does not support threads.)
Bytecode is interpreted at run time and executed on the native platform by the JVM. The JVM determines which code fragments are executed most often and performs so-called just-in-time compilation of these fragments, so it does not have to translate them again and again. This is independent of threading. If, for example, you have one thread that executes the same code fragment in a loop, this fragment will be compiled and cached by the just-in-time compiler.
Bottom line: do not worry about performance and threads. Java is strong enough to run everything you are coding.
Some Java-implementations may create
green threads like you describe it
(scheduling made by the JVM on a
single native thread), but normal
implementations of Java on PC use
multiple cores.
The JVM itself might already use different threads for the work to do (garbage collection, class loading, byte-code-verification, JIT-Compiler).
The OS runs a program called JVM. The JVM executes the Java-Bytecode. If every Java-Thread has an associated native thread (that makes sense and seems to be the case on PC-implementations), then the JVM-code in that thread executes the Java-code - JITed or interpreted - like on a single-thread-program. No difference here through multithreading.
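You can peek at the threads living inside a running JVM with Thread.getAllStackTraces(). A small sketch (the class name is made up; which names show up depends on the JVM implementation and version, so treat any particular name as an assumption):

```java
import java.util.List;
import java.util.stream.Collectors;

class JvmThreadsDemo {
    // Returns the names of all live threads the JVM reports, sorted.
    static List<String> liveThreadNames() {
        return Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Besides "main", HotSpot typically lists housekeeping threads
        // such as "Reference Handler" or "Finalizer".
        liveThreadNames().forEach(System.out::println);
    }
}
```

Even a program that never creates a thread itself usually reports more than one entry here, which shows that the JVM process is already multithreaded on its own.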
