Can we run multiple processes inside one JVM, each with its own memory quota?
My aim is to start a new process whenever an HTTP request comes in and to give that process its own memory quota, so that each user request is isolated and doesn't disturb other user requests if one request's quota is exhausted.
How can I achieve this?
Not sure if this is hypothetical.
Short answer: not really.
The Java platform offers you two options:
Threads. That is the typical answer in many cases: each new incoming request is handled by a separate thread (which probably comes out of a pool, to limit the overall number of thread instances created/used in parallel). But of course: threads exist in the same process; there is no built-in way to control the memory consumed by whatever a particular thread is doing.
Child processes. You can create a real OS process and use that to run whatever you intend to run. But of course: then you have an external, real process to deal with.
So, in essence, the real answer is: no, you can't apply this idea to Java. The more "Java" solution would be to look into application servers such as Tomcat or WebSphere.
Or, if you insist on doing things manually, you could build your own "load balancer": one client-facing JVM that simply forwards requests to one of many other JVMs. Those other JVMs would work independently, each running in its own process, which you could then micro-manage with regard to CPU, memory, and other resources.
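If you did go down the child-process route, a rough sketch might look like the following (purely illustrative: com.example.Worker is a hypothetical worker main class and -Xmx64m is an arbitrary per-worker quota). Each incoming request launches a separate worker JVM with its own heap limit.

import java.io.File;
import java.io.IOException;

public class WorkerLauncher {

    // Launches a separate worker JVM with its own heap quota.
    // The worker class name and the -Xmx value are assumptions; adjust
    // the classpath and memory limit to your own setup.
    public static Process launchWorker(String requestId) throws IOException {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        ProcessBuilder pb = new ProcessBuilder(
                javaBin,
                "-Xmx64m",                               // per-worker memory quota
                "-cp", System.getProperty("java.class.path"),
                "com.example.Worker",                    // hypothetical worker main class
                requestId);
        pb.inheritIO();                                  // forward the worker's output to this console
        return pb.start();
    }
}

If one worker exhausts its 64 MB, only that JVM fails; the client-facing JVM and the other workers are unaffected, which is exactly the isolation the question asks for.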
The closest concept is the Application Isolation API (JSR 121), which AFAIK has never been implemented: see https://en.wikipedia.org/wiki/Application_Isolation_API.
"The Application Isolation API (JSR 121) provides a specification for isolating and controlling Java application life cycles within a single Java Virtual Machine (JVM) or between multiple JVMs. An isolated computation is described as an Isolate that can communicate and exchange resource handles (e.g. open files) with other Isolates through a messaging facility."
See also https://www.flux.utah.edu/janos/jsr121-internal-review/java/lang/isolate/package-summary.html:
"Informally, isolates are a construct midway between threads and JVMs. Like threads, they can be used to initiate concurrent execution. Like JVMs, they cause execution of a "main" method of a given class to proceed within its own system-level context, independently of any other Java programs that may be running. Thus, isolates differ from threads by guaranteeing lack of interference due to sharing statics or per-application run-time objects (such as the AWT thread and shutdown hooks), and they differ from JVMs by providing an API to create, start, terminate, monitor, and communicate with these independent activities."
Related
I came across the Java documentation, which says
"Both processes and threads provide an execution environment, but creating a new thread requires fewer resources than creating a new process."
Ref: https://docs.oracle.com/javase/tutorial/essential/concurrency/procthread.html
In this context what actually do we mean by resources?
EDIT1:
Also, why is a Runnable faster than a Thread?
What are the generic resources?
What is the difference in resources both are using?
Spawning a new process will create a new Java Virtual Machine.
Whereas threads share memory, the JVM, etc.
The JVM is not a lightweight program, so it will consume more memory, etc.
Some JVMs are multi-process, allowing multiple processes to share one JVM.
From the linked tutorial in the question:
https://docs.oracle.com/javase/tutorial/essential/concurrency/procthread.html
A process generally has a complete, private set of basic run-time resources; in particular, each process has its own memory space.
and
Threads share the process's resources, including memory and open files. This makes for efficient, but potentially problematic, communication.
To address EDIT 1.
First, let's define some general computing terms.
Operating System Concepts
Resources
From https://en.wikipedia.org/wiki/System_resource
In computing, a system resource, or simply resource, is any physical or virtual component of limited availability within a computer system. Every device connected to a computer system is a resource. Every internal system component is a resource. Virtual system resources include files (concretely file handles), network connections (concretely network sockets), and memory areas.
Process
https://en.wikipedia.org/wiki/Process_(computing)
In computing, a process is an instance of a computer program that is being executed. It contains the program code and its current activity. Depending on the operating system (OS), a process may be made up of multiple threads of execution that execute instructions concurrently.
Thread
https://en.wikipedia.org/wiki/Thread_(computing)
In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system.
Java Concepts
Process
In relation to Java, a process typically means running a separate JVM, with its own heap, etc.
Threads
Threads share a JVM and can access the same classes and memory, but since they are also a concept outside of Java, relating to the operating system, there is overhead in creating and interacting with them.
Runnable - https://docs.oracle.com/javase/7/docs/api/java/lang/Runnable.html
A Runnable is a concept that exists only within Java and that the OS is not aware of; it is literally just an interface with a method called run(), and you have to arrange for it to be executed yourself.
The reason for abstracting it away from threads is that the thread classes themselves have to concern themselves with compatibility with the underlying operating-system bindings; your Runnable doesn't need to know any of this, it's just code that's expected to run in a Java context.
It's really just a marker to show others that you intend for this to be run by a thread, or some other form of scheduled execution.
Threads, by contrast, are external concepts managed by the operating system, and thus have costs relating to memory, context switching, etc.
Processes are even more costly, and have separate program memory that is not shared.
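To make the distinction concrete, here is a minimal sketch (nothing library-specific is assumed): calling run() directly is just an ordinary method call on the current thread, while handing the same Runnable to a Thread asks the operating system for a new native thread.

public class RunnableVsThread {
    public static void main(String[] args) {
        // A Runnable is just an object with a run() method; nothing is scheduled yet.
        Runnable task =
                () -> System.out.println("running in " + Thread.currentThread().getName());

        // Calling run() directly executes it in the current thread: no new OS resources.
        task.run();

        // Wrapping it in a Thread asks the OS to create a new native thread for it.
        new Thread(task, "worker-thread").start();
    }
}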
Everything related to the environment: CPU, memory, disk, network, etc.
I am dealing with the OutOfMemoryError below in WAS 6.1.
Exception in thread "UnitHoldingsPolicySummary" java.lang.OutOfMemoryError: unable to create new native thread.
I have done a lot of research on how to prevent this. After Googling, I found that it happens when native memory gets exhausted because lots of threads are created concurrently.
Now, after analysing the logs below, we can see that threads are created explicitly inside the application, which I have read is a very, very bad practice. (Can experts please confirm this?)
07/07/14 08:50:38:165 BST] 0000142c SystemErr R Exception in thread "xxxxxx" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:574)
at com.fp.sv.controller.business.thread.xxxxxxxxxexecute(Unknown Source)
at com.fp.sv.controller.business.thread.xxxxxxxxx.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
I am more into WAS administration and don't have much knowledge of Java or of thread creation in Java. Now I need to discuss this with the developers, but before that I want to be 100% sure that my findings are correct and that the developers should fix the code so that threads are not created explicitly.
What are all the things I need to check on the application server side before blaming this on the code?
On Solaris, I am running the command pmap -x 9547 | grep -i stack | wc -l to check how many threads exist at a given instant. I could see that during the OutOfMemory issue this number is very high.
Could you please confirm whether this command is a good way to check the number of currently active threads?
Editing the question with my latest findings
Also, when this issue happens, at the same time one of the MQ queues piles up because WAS doesn't pick up the messages from it. I could see the error below in the application-specific logs.
Non recoverable Exception detected whilst connecting to queue manager or response queue
Underlying reason = MQJE001: Completion Code 2, Reason 2102
Can this issue be related to MQ as well, which in turn causes the OutOfMemory issue?
There are different possibilities of implementing a threading system for a virtual machine. The two extreme forms are:
Green threads: all Java Thread instances are managed within one native OS thread. This can cause problems if a method blocks within a native invocation, which makes this implementation complex. In the end, implementers need to introduce renegade threads for holding native locks to overcome such limitations.
Native threads: Each Java Thread instance is backed by a native OS thread.
Because of these limitations of green threads, all modern JVM implementations, including HotSpot, choose the latter approach. This implies that the OS needs to reserve some memory for each created thread. Also, there is some runtime overhead for creating such a thread, as it requires direct interaction with the underlying OS. At some point these costs accumulate, and the OS refuses to create new threads in order to protect the stability of your overall system.
Threads should therefore be pooled for reuse. Object pooling is normally considered bad practice, as many programmers used it to ease the load on the JVM's garbage collector. This is no longer useful, because modern garbage collectors are optimized for handling short-lived objects; today, pooling objects might on the contrary slow your system down. However, if an object is backed by costly native resources (as a Thread is), pooling is still a recommended practice. Look into ExecutorService for a canonical way of pooling threads in Java.
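As a small sketch of that advice (the pool size of 4 and the task bodies are arbitrary), an ExecutorService with a fixed pool reuses a handful of native threads for many tasks instead of creating one thread per task:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PooledTasks {
    public static void main(String[] args) {
        // A fixed pool creates four native threads once and reuses them.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            final int taskId = i;
            pool.submit(() ->
                    System.out.println("task " + taskId + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();   // accept no new tasks; pool threads exit once the queue drains
    }
}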
In general, keep in mind that thread context switches are expensive. You should not create a new thread for small tasks; this will slow your application down. Rather, make your application less concurrent. You only have a limited number of cores that can work concurrently in the first place, and creating more threads than your (non-virtual) cores will not improve runtime performance. Are you implementing some sort of divide-and-conquer algorithm? Look into Java's ForkJoinPool.
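And a minimal ForkJoinPool sketch for such a divide-and-conquer task (summing an array; the threshold of 1,000 elements is an arbitrary choice):

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= 1_000) {                 // small enough: sum directly
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) / 2;                // otherwise split the range in two
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                              // run the left half asynchronously
        return right.compute() + left.join();     // compute the right half, then combine
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        Arrays.fill(data, 1L);
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum);                  // prints 1000000
    }
}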
Yes, it's a bad practice. Normally, you don't manage threads inside a Java EE server. By "normally" I mean "while developing business applications".
According to http://www.oracle.com/technetwork/java/restrictions-142267.html:
Why is thread creation and management disallowed?
The EJB specification assigns to the EJB container the responsibility for managing threads. Allowing enterprise bean instances to create and manage threads would interfere with the container's ability to control its components' lifecycle. Thread management is not a business function, it is an implementation detail, and is typically complicated and platform-specific. Letting the container manage threads relieves the enterprise bean developer of dealing with threading issues. Multithreaded applications are still possible, but control of multithreading is located in the container, not in the enterprise bean.
However, I don't think your logs demonstrate that threads are being created explicitly. If you want to be 100% sure, decompile the deployables and look at the code in those lines.
Also take a look at this:
"java.lang.OutOfMemoryError : unable to create new native Thread"
And this:
https://plumbr.eu/outofmemoryerror/unable-to-create-new-native-thread
Concerning the number of threads used by your app, I'd try a monitoring tool like JConsole or VisualVM.
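If you would rather read the count from inside the JVM itself (for example to log it periodically), the standard java.lang.management API exposes it; a minimal sketch:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCountLogger {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Live threads right now, the peak since JVM start, and the total ever started.
        System.out.println("live threads:  " + threads.getThreadCount());
        System.out.println("peak threads:  " + threads.getPeakThreadCount());
        System.out.println("total started: " + threads.getTotalStartedThreadCount());
    }
}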
I'm trying to figure out how the JVM works with regard to spawning multiple threads. I think my mental model may be a little off, but right now I am stuck on grokking this idea: since there is only one copy of the JVM running at any time, wouldn't each thread require its own copy of the JVM? I realize that the multiple threads of a Java application are mapped to native OS threads, but I don't get how the threads that are not running the JVM are crunching bytecode; is it that all threads somehow have access to the JVM?
but I don't get how the threads that are not running the JVM are crunching bytecode; is it that all threads somehow have access to the JVM?
http://www.artima.com/insidejvm/ed2/jvmP.html explains this well. Here is what it says:
"Each thread of a running Java application is a distinct instance of the virtual machine's execution engine. From the beginning of its lifetime to the end, a thread is either executing bytecodes or native methods. A thread may execute bytecodes directly, by interpreting or executing natively in silicon, or indirectly, by just- in-time compiling and executing the resulting native code. A Java virtual machine implementation may use other threads invisible to the running application, such as a thread that performs garbage collection. Such threads need not be "instances" of the implementation's execution engine. All threads that belong to the running application, however, are execution engines in action."
Summarizing my understanding of this:
For every thread (except the GC thread and its ilk), a corresponding execution engine instance (in the same JVM) converts bytecodes to machine instructions, and the native OS thread executes those machine instructions. Of course, I am not talking about green threads here.
This is a bit oversimplified and some of what I wrote isn't strictly correct, but the essence is this:
since there is only one copy of the JVM running at any time, wouldn't each thread require its own copy of the JVM?
Not really. You can let multiple threads read from one piece of memory (as in, the same address in memory) and thus have only one JVM. However, you need to be careful that threads don't create a mess when they access such shared resources (the JVM) concurrently, just as in the real world (imagine two people trying to type two different documents at the same time on one PC).
One strategy for having multiple threads work well together with some shared resource (such as the JVM (stack, heap, bytecode compiler), a console, a printer, etc.) is indeed to give each thread its own copy (one PC per person). For example, each thread has its own stack.
This is, however, not the only way. For example, immutable resources (like class bytecode in memory) can be shared among multiple threads without problems through shared memory. If a memo doesn't change, two people can both safely look at it at the same time. Similarly, because class bytecode doesn't change, multiple threads can read it at the same time from one copy.
Another way is to use a lock to sort things out between the threads (whoever is touching the mouse gets to use the PC). For example, you could imagine a JVM in which there is only one bytecode interpreter, shared among all threads and protected by one global lock (which would be very inefficient in practice, but you get the idea).
There are also other, more advanced mechanisms for letting multiple threads work with shared resources. The people who developed the JVM used these techniques, and that's why you don't need a copy of the JVM per thread.
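A tiny sketch of the lock idea above, with a made-up shared "printer" that only one thread may use at a time:

public class SharedPrinter {
    private final Object lock = new Object();

    // Only one thread at a time may use the shared printer.
    public void print(String document) {
        synchronized (lock) {
            System.out.println(Thread.currentThread().getName() + " printing: " + document);
        }
    }

    public static void main(String[] args) {
        SharedPrinter printer = new SharedPrinter();
        for (int i = 0; i < 3; i++) {
            final int id = i;
            new Thread(() -> printer.print("doc-" + id), "user-" + id).start();
        }
    }
}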
By definition, threads in a Java application share the same memory space and are therefore executing within the same JVM. This way you can easily share objects across multiple threads, perform synchronization, and so on, all of which happens within the JVM.
One way to see it is that processes have their own memory space, while threads within an application share the same memory space.
For what reasons would one choose several processes over several threads to implement an application in Java?
I'm refactoring an older Java application which is currently divided into several smaller applications (processes) running on the same multi-core machine, communicating with each other via sockets.
I personally think this should be done using threads rather than processes, but what arguments would defend the original design?
I (and others, see attributions below) can think of a couple of reasons:
Historical Reasons
The design is from the days when only green threads were available and the original author/designer figured they wouldn't work for him.
Robustness and Fault Tolerance
You use components which are not thread-safe, so you cannot parallelize without resorting to multiple processes.
Some components are buggy and you don't want them to be able to affect more than one process. Say, if a component has a memory or resource leak which eventually could force a process restart, then only the process using the component is affected.
Correct multithreading is still hard to do, and depending on your design it can be harder than multiprocessing. The latter, however, is arguably not too easy either.
You can have a model where you have a watchdog process that can actively monitor (and eventually restart) crashed worker processes. This may also include suspend/resume of processes, which is not safe with threads (thanks to #Jayan for pointing out).
OS Resource Limits & Governance
If the process, using a single thread, is already using all of the available address space (e.g. 2 GB for 32-bit apps on Windows), you might need to distribute work amongst processes.
Limiting the use of resources (CPU, memory, etc.) is typically only possible on a per process basis (for example on Windows you could create "job" objects, which require a separate process).
Security Considerations
You can run different processes using different accounts (i.e. "users"), thus providing better isolation between them.
Compatibility Issues
Support multiple/different Java versions: using different processes, you can use different Java versions for your application parts (if required by third-party libraries).
Location Transparency
You could (potentially) distribute your application over multiple physical machines, thus further increasing scalability and/or robustness of the application (see #Qwe's answer for more details / the original idea).
If you decide to go with threads you will restrict your app to be run on a single machine. This solution doesn't scale (or scales to some extent) - there are always hardware limits.
And different processes communicating via sockets can be distributed between machines, so you could add a virtually unlimited number of them. This scales better, at the cost of slower communication between processes.
Deciding which approach is more suitable is itself a very interesting task. And once you make the decision, there's no guarantee that it won't look stupid to your successors in a couple of years, when requirements change or new hardware becomes available.
What is the difference between a Thread and a Process in the Java context?
How is inter-Process communication and inter-Thread communication achieved in Java?
Please point me at some real life examples.
The fundamental difference is that threads live in the same address space, while processes live in different address spaces. This means that inter-thread communication is about passing references to objects and changing shared objects, whereas inter-process communication is about passing serialized copies of objects.
In practice, Java inter-thread communication can be implemented as plain Java method calls on a shared object, with appropriate synchronization thrown in. Alternatively, you can use the newer concurrency classes to hide some of the nitty-gritty (and error-prone) synchronization issues.
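For example, here is a minimal sketch of inter-thread communication through a shared object, using one of those concurrency classes (a BlockingQueue) so that the synchronization is handled for you:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class InterThreadExample {
    public static void main(String[] args) {
        // Both threads hold a reference to the same queue object inside one JVM.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);

        Thread producer = new Thread(() -> {
            try {
                queue.put("hello from producer");   // blocks if the queue is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                System.out.println(queue.take());   // blocks until a message arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }
}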
By contrast, Java inter-process communication is based, at the lowest level, on turning state, requests, etc. into sequences of bytes that can be sent as messages or as a stream to another Java process. You can do this work yourself, or you can use a variety of "middleware" technologies of various levels of complexity to abstract away the implementation details. Technologies that may be used include Java object serialization, XML, JSON, RMI, CORBA, SOAP / "web services", message queuing, and so on.
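As a minimal sketch of the lowest-level building block mentioned above, plain Java object serialization turns an object's state into bytes that could then be written to a socket or stream; here the "wire" is simply a byte array, and the Request class is a made-up example type:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationExample {

    // The message type must be Serializable to cross a process boundary this way.
    static class Request implements Serializable {
        final String action;
        Request(String action) { this.action = action; }
    }

    public static void main(String[] args) throws Exception {
        // Sender side: turn the object into bytes (these would normally go to a socket).
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Request("lookup"));
        }

        // Receiver side: reconstruct a *copy* of the object from the byte stream.
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            Request copy = (Request) in.readObject();
            System.out.println("received: " + copy.action);
        }
    }
}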
At a practical level, interthread communication is many orders of magnitude faster than interprocess communication, and allows you to do many things a lot more simply. But the downside is that everything has to live in the same JVM, so there are potential scalability issues, security issues, robustness issues and so on.
A thread can access memory inside a process, even memory that could be manipulated by another thread within the same process. Since all threads are internal to the same running process, they can communicate more quickly (because they don't need the operating system to referee).
A process cannot access memory inside another process, although you can communicate between processes through various means like:
Network packets
Files
Pipes
Shared Memory
Semaphores
CORBA messages
RPC calls
The important thing to remember with process to process communication is that the communication must be managed through the operating system, and like all things which require a middle man, that adds overhead.
On the downside, if a thread misbehaves, it does so within the running process, and the odds are high that it will take down all the well-behaved threads. If a process misbehaves, it can't directly write into the memory of the other processes, and the odds are that only the misbehaving process will die.
Inter-Thread Communication = threads inside the same JVM talking to each other
Inter-Process Communication (IPC) = threads inside the same machine but running in different JVMs talking to each other
Threads inside the same JVM can use pipelining through lock-free queues to talk to each other with nanosecond latency.
Threads in different JVMs can use off-heap shared memory (usually acquired through the same memory-mapped file) to talk to each other with nanosecond latency.
Threads in different machines can use the network to talk to each other with microsecond latency.
For a complete explanation about lock-free queues and IPC you can check CoralQueue.
Disclaimer: I am one of the developers of CoralQueue.
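To make the off-heap shared-memory point above concrete, here is a minimal sketch using a memory-mapped file (the path /tmp/shared.dat and the 1 KB size are just examples); a second JVM that maps the same file region would see the value written here:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class SharedMemoryWriter {
    public static void main(String[] args) throws Exception {
        // Both JVMs map the same file; writes by one become visible to the other.
        try (RandomAccessFile file = new RandomAccessFile("/tmp/shared.dat", "rw");
             FileChannel channel = file.getChannel()) {
            MappedByteBuffer shared = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            shared.putInt(0, 42);   // another JVM mapping this region can read the value at offset 0
        }
    }
}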
I like to think of a single instance of a JVM as a process. So, inter-process communication would be between instances of JVMs, for example through sockets (message passing).
Threads in Java implement Runnable and are contained within a JVM. They share data simply by passing references within the JVM. Whenever threads share data, you almost always need to protect the data so that multiple threads don't clobber each other. There are many mechanisms for protection, all of which involve preventing multiple threads from entering critical sections of code.