Isolation within OSGi

I'm trying to understand the benefits of OSGi, and what I can't work out is what happens if one of the user-supplied components crashes (for example with an OutOfMemoryError). Will the problem be isolated to that component, or will the whole JVM crash?

OSGi does not provide memory or CPU isolation between bundles or components. All bundles in an OSGi Framework run inside a Java Virtual Machine, and Java itself does not have the capability to offer this kind of isolation. OSGi can only do things that are possible within the standard Java architecture.
If you want greater isolation, then use separate OS processes. Remember though: there is no such thing as perfect isolation. If you run as separate processes there is always the chance that a rogue process can take down the entire OS. Even if you run on a separate computer in the same datacentre, then the next power cut or tsunami will affect both computers. So you have to ask yourself how much isolation is needed, and what specific risks you need to mitigate.
For what it's worth, there was an attempt within Java a long time ago to provide memory and CPU isolation. This was JSR-121 (Application Isolation API) and it was never adopted into Java SE. Some vendors such as IBM and Waratek implemented proprietary isolation/multitenancy, but these did not catch on (Waratek later pivoted to application security). Basically you end up implementing a process scheduler within the JVM, and what's the point when the OS already has a good one?

OSGi isolates components at the classloader level, whereas an OutOfMemoryError occurs at the JVM level. OSGi does not provide "memory isolation". In short: the whole JVM is affected and will typically go down.

OSGi isolation concerns class resolution and class loading. The JVM process itself is shared, so an OutOfMemoryError will affect your whole container.
If you want to run a third-party component, you may want to start it as a new instance (managed from your root container) that runs as a separate OS process with its own JVM parameters.
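As a rough illustration of running a component in a separate OS process with its own JVM parameters, the sketch below starts a child JVM through ProcessBuilder with its own heap cap. The class name com.example.ThirdPartyMain and the 64m limit are placeholders chosen for the example, not anything from the posts above; the child reuses the parent's java binary and classpath, which is an assumption that may not suit every deployment.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ChildJvmLauncher {

    /** Builds the command line for a child JVM with its own heap limit. */
    static List<String> buildCommand(String mainClass, String maxHeap) {
        // Assumption: reuse the java binary and classpath of the current JVM.
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + maxHeap);   // this heap cap applies to the child only
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        return cmd;
    }

    /** Starts the child JVM, waits for it, and returns its exit code. */
    static int runIsolated(String mainClass, String maxHeap)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(buildCommand(mainClass, maxHeap))
                .inheritIO()         // forward the child's stdout/stderr
                .start();
        return child.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder class name; pass a real main class as the first argument.
        String mainClass = args.length > 0 ? args[0] : "com.example.ThirdPartyMain";
        int exit = runIsolated(mainClass, "64m");
        System.out.println("child JVM exited with code " + exit);
    }
}
```

If the child runs out of heap, only the child process dies; the parent sees a non-zero exit code and can decide whether to restart it.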

Related

Confused about Java Memory [duplicate]

Is the same JVM used by all running Java applications, or does 'one JVM per Java application' apply? (Say the applications are IntelliJ IDEA, a server and NetBeans, for example.)
Further, is there any connection between JVMs assigned and processes used by each Java application?

Generally speaking, each application will get its own JVM instance and its own OS-level process and each JVM instance is independent of each other.
There are some implementation details such as Class Data Sharing, where multiple JVM instances might share some data/memory but those have no user-visible effect to the applications (except for improved startup time, hopefully).
A common scenario however is a single application server (or "web server") such as Glassfish or Tomcat running multiple web applications. In this case, multiple web applications can share a JVM.

There's one JVM per Java application. There shouldn't be any connection between them unless you establish one, e.g. with networking. If you're working inside an IDE, the code you write generally runs in a separate JVM. The IDE will typically connect to that separate JVM for debugging. If you're dealing with multiple web applications, they can share the same JVM if they're deployed to the same web container.

In theory you can run multiple applications in a JVM. In practice, they can interfere with each other in various ways. For example:
The JVM has one set of System.in/out/err, one default encoding, one default locale, one set of system properties, and so on.
If one application changes these, it affects all applications.
Any application that calls System.exit() kills all applications.
If one application thread goes wild, and consumes too much CPU or memory it will affect the other applications too.
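To make the first point concrete, here is a minimal sketch in which two pretend "applications" are just static methods running in one JVM. The property name demo.shared.setting is made up for the example. Because system properties are JVM-wide, the second method sees the change made by the first.

```java
public class SharedJvmState {

    /** A pretend "application" that changes a JVM-wide setting. */
    static void appA() {
        System.setProperty("demo.shared.setting", "changed-by-A");
    }

    /** A second pretend "application" that reads the same setting. */
    static String appB() {
        return System.getProperty("demo.shared.setting", "default");
    }

    /** Runs both "applications" in this JVM and returns what appB observed. */
    static String runBothInOneJvm() {
        System.clearProperty("demo.shared.setting");
        appA();
        return appB();
    }

    public static void main(String[] args) {
        // appB sees the value set by appA, not its own default.
        System.out.println("appB sees: " + runBothInOneJvm());
    }
}
```

Run as two separate java processes, the same two methods would not see each other's properties at all.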

Short answer: often, yes, you'll get one application per JVM.
Long answer: the JVM can be used that way, and that may be the best option, but it doesn't have to be.
It all depends on what you consider to be an 'application'. An IDE is a good example of an application which is presented to its end users (i.e. us) as a single entity but which is actually composed of multiple underlying applications (compilers, test runners, static analysis tools, packagers, package managers, project / dependency management tools, etc). In that case there are a variety of tricks which the IDE uses to give the user an integrated experience while also shielding them (to some extent) from the individual vagaries of the underlying tools. One such trick is to do some things in a separate JVM, communicating either via text files or via the application-level debugging facilities.
Application servers (Wildfly, Glassfish, Websphere, Weblogic, etc) are applications whose raison d'etre is to act as containers for other applications to run in. In that case, from one perspective, there's a single JVM per application (i.e. one JVM is used to run the entire application server) but there are actually multiple applications contained within that JVM in their own right, each logically separated from each other in their own classloader (reducing the possibility of accidental in-process crosstalk).
So, it all really depends on what you consider an application to be. If you're purely talking about "the thing which runs when 'main()' is called", then you're looking at one application per JVM - when the OS starts the JVM, the JVM runs a single class's public static void main() method.
But once your applications start getting more complicated, the boundaries become more blurred. An IDE such as IntelliJ or Eclipse will reuse much of the same stuff as 'javac', either in the same JVM or a different one, as well as doing different work (such as repainting the screen). And users of a web application on a (shared JVM) application server may actually be using much the same 'core' application as could be used locally via the command line.

The number of JVMs running is the number of java executables invoked.
Each such application invokes its own java executable (java.exe, javaw.exe, etc. on Windows), which means each runs in a separate JVM.

Any application which has shared libraries will share the same copy of those libraries. Java has a fair number of shared libraries. However, you won't notice the difference except for some memory saved.

A little late here, but this info may be useful for somebody. On a Linux system, if you want to know how many JVMs are running, you can try this command:
$ ps -ef | grep "[j]ava" | wc -l
ps lists the processes, grep keeps the lines containing "java" (the [j] bracket stops grep from matching its own command line), and wc -l counts the lines returned.

This question can attract very confusing answers. To keep it short:
Yes: one JVM per java process.
Runtime.exec and ProcessBuilder follow this rule: each java process they start gets its own JVM.
Loading jars via reflection and then invoking their main method won't spawn a new JVM.
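The last point can be checked with a small sketch. OtherApp below is a stand-in for a second "application"; its main method is invoked reflectively and records the process id it runs under, which turns out to be the caller's own. The class names are invented for the example, and ProcessHandle requires Java 9 or later.

```java
import java.lang.reflect.Method;

public class ReflectiveMainDemo {

    /** Stand-in for a second "application" with its own main method. */
    public static class OtherApp {
        static volatile long observedPid = -1;

        public static void main(String[] args) {
            // Record which OS process this main method is running in.
            observedPid = ProcessHandle.current().pid();
        }
    }

    /** Invokes OtherApp.main reflectively and returns the pid it ran under. */
    static long pidSeenByReflectiveMain() {
        try {
            Method main = OtherApp.class.getMethod("main", String[].class);
            // The cast passes the array as the single String[] argument.
            main.invoke(null, (Object) new String[0]);
            return OtherApp.observedPid;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long mine = ProcessHandle.current().pid();
        long theirs = pidSeenByReflectiveMain();
        System.out.println("same process: " + (mine == theirs));
    }
}
```

Starting the same class through ProcessBuilder instead would give it a different pid, because that path launches a new java process.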

Why have one JVM per application?

I read that each application runs in its own JVM. Why is that? Why not have one JVM run two or more apps?
I read an SO post, but could not find the answer there:
Is there one JVM per Java application?
(I am talking about applications launched via a public static void main(String[]) method.)

(I assume you are talking about applications launched via a public static void main(String[]) method ...)
In theory you can run multiple applications in a JVM. In practice, they can interfere with each other in various ways. For example:
The JVM has one set of System.in/out/err, one default encoding, one default locale, one set of system properties, and so on. If one application changes these, it affects all applications.
Any application that calls System.exit() will effectively kill all applications.
If one application goes wild, and consumes too much CPU or memory it will affect the other applications too.
In short, there are lots of problems. People have tried hard to make this work, but they have never really succeeded. One example is the Echidna library, though that project has been quiet for ~10 years. JNode is another example, though they (actually we) "cheated" by hacking core Java classes (like java.lang.System) so that each application got what appeared to be independent versions of System.in/out/err, the System properties and so on [1].
[1] This ("proclets") was supposed to be an interim hack, pending a proper solution using true "isolates". But isolates support stalled, primarily because the JNode architecture used a single address space with no obvious way to separate "system" and "user" stuff. So while we could create APIs that matched the isolate APIs, key isolate functionality (like cleanly killing an isolate) was virtually impossible to implement. Or at least, that was/is my view.

The reasons to have one JVM per application are basically the same as the reasons to have one OS process per application.
Here are a few reasons to have a process per application:
An application bug will not bring down or corrupt data in other applications sharing the same process.
System resources are accounted per process, and hence per application.
Terminating a process automatically releases all associated resources (an application may not clean up after itself, so sharing a process may produce resource leaks).
Some applications, such as Chrome, go even further and create multiple processes to isolate different tabs and plugins.
Speaking of Java, there are a few more reasons not to share a JVM:
The heap maintenance penalty is higher with a large heap. Multiple smaller independent heaps are easier to manage.
It is fairly hard to unload an "application" from a JVM (there are too many subtle reasons for it to stay in memory even when it is not running).
The JVM has a lot of tuning options which you may want to tailor to an application.
There are, though, several cases where a JVM is actually shared between applications:
Application servers and servlet containers (e.g. Tomcat). Server-side Java specs are designed with a shared server JVM and dynamic loading/unloading of applications in mind.
There have been a few attempts to create a shared JVM utility for CLI applications (e.g. Nailgun).
But in practice, even in server-side Java, it is usually better to use one JVM (or several) per application, for the reasons mentioned above.

For isolating execution contexts.
If one of the processes hangs, fails, or has its security compromised, the others are not affected.
I think having separate runtimes also helps GC, because each collector has fewer references to handle than it would if everything were in one heap.
Besides, why would you run them all in one JVM?

Java application servers, like JBoss, are designed to run many applications in one JVM.

When to choose several processes over threads in Java?

For what reasons would one choose several processes over several threads to implement an application in Java?
I'm refactoring an older Java application which is currently divided into several smaller applications (processes) running on the same multi-core machine, communicating with each other via sockets.
I personally think this should be done using threads rather than processes, but what arguments would defend the original design?

I (and others, see attributions below) can think of a couple of reasons:
Historical Reasons
The design is from the days when only green threads were available and the original author/designer figured they wouldn't work for him.
Robustness and Fault Tolerance
You use components which are not thread safe, so you cannot parallelize without resorting to multiple processes.
Some components are buggy and you don't want them to be able to affect more than one process. Say, if a component has a memory or resource leak which eventually could force a process restart, then only the process using the component is affected.
Correct multithreading is still hard to do. Depending on your design, it can be harder than multiprocessing. The latter, however, is arguably not too easy either.
You can have a model where a watchdog process actively monitors (and eventually restarts) crashed worker processes. This may also include suspend/resume of processes, which is not safe with threads (thanks to @Jayan for pointing this out).
OS Resource Limits & Governance
If the process, using a single thread, is already using all of the available address space (e.g. 2 GB for 32-bit apps on Windows), you might need to distribute work amongst processes.
Limiting the use of resources (CPU, memory, etc.) is typically only possible on a per process basis (for example on Windows you could create "job" objects, which require a separate process).
Security Considerations
You can run different processes using different accounts (i.e. "users"), thus providing better isolation between them.
Compatibility Issues
Support multiple/different Java versions: using different processes, you can use different Java versions for different parts of your application (if required by third-party libraries).
Location Transparency
You could (potentially) distribute your application over multiple physical machines, thus further increasing the scalability and/or robustness of the application (see @Qwe's answer for more details / the original idea).
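The watchdog idea from the robustness section can be sketched as follows. This is a minimal illustration under stated assumptions, not a production supervisor: the worker class name com.example.Worker, the 64m heap cap and the restart budget of 3 are all placeholders, and a non-zero exit code is treated as a crash. The restart loop takes an IntSupplier so it can be exercised without starting real processes.

```java
import java.io.IOException;
import java.util.function.IntSupplier;

public class Watchdog {

    /**
     * Runs a worker until it exits with code 0 or the restart budget is used up.
     * Returns the number of restarts performed.
     */
    static int supervise(IntSupplier runWorkerOnce, int maxRestarts) {
        int restarts = 0;
        // Each loop iteration runs the worker once; a non-zero exit triggers a restart.
        while (runWorkerOnce.getAsInt() != 0 && restarts < maxRestarts) {
            restarts++;
        }
        return restarts;
    }

    /** Launches one worker JVM and waits for it; a launch failure counts as a crash. */
    static int runWorkerJvm(String mainClass) {
        try {
            String javaBin = System.getProperty("java.home") + "/bin/java";
            Process p = new ProcessBuilder(javaBin, "-Xmx64m", "-cp",
                    System.getProperty("java.class.path"), mainClass)
                    .inheritIO()
                    .start();
            return p.waitFor();
        } catch (IOException e) {
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        // Placeholder worker class; pass a real main class as the first argument.
        String worker = args.length > 0 ? args[0] : "com.example.Worker";
        int restarts = supervise(() -> runWorkerJvm(worker), 3);
        System.out.println("restarts performed: " + restarts);
    }
}
```

A thread-based design has no equivalent of this loop, because a misbehaving thread cannot be discarded and replaced as cleanly as a process.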

If you decide to go with threads, you restrict your app to running on a single machine. This solution doesn't scale (or scales only to some extent): there are always hardware limits.
Different processes communicating via sockets, on the other hand, can be distributed between machines, so that you can add a virtually unlimited number of them. This scales better, at the cost of slower communication between processes.
Deciding which approach is more suitable is itself a very interesting task. And once you make the decision, there's no guarantee that it won't look stupid to your successors in a couple of years when requirements change or new hardware becomes available.

How to isolate user sessions in a Java EE?

We are considering development of a mission critical application in Java EE, and one thing that really impressed me is the lack of session isolation in the platform. Let me explain the scenario.
We have a native Windows application (a complete ERP solution) that receives about 2k LoC and 50 bug-fixes per month from sparse contributors. It also supports scripting, so the customer can add their own logic and we have no clue about what such logic does. Instead of using a thread pool, each server node has a broker and a process pool. The broker receives a client request, enqueues it until a pooled instance is free, sends the request to that instance, delivers the response to the client, and releases the instance back to the process pool.
This architecture is robust because with so many sparse contributions and custom scripting, it's not uncommon for a deployed version to have some serious bug such as an infinite loop, a long-waiting pessimistic lock, a memory corruption or memory leakage. We implemented a memory limit, a timeout for requests, and a simple watchdog. Whenever some process fails to answer correctly and on time, the broker simply kills it, so the watchdog detects and starts another instance. If a process crashes before it started to answer a request, the broker sends the same request to another pooled instance, and the user doesn't know about any failure on the server side (except in admin logs). This is nice because some instances are slowly trashed by bogus code as they work on requests. Because most session data is held at the client or (in rare cases) at a shared storage, it seems to work perfectly.
Now, considering a move to Java EE, I couldn't find anything similar in the spec or in popular application servers such as Glassfish and JBoss. Yes, I know that most cluster implementations do transparent fail-over with session replication, but we have small companies that use our system on a simple 2-node cluster (and we also have adventurers that use the system on a 1-node server). With a thread pool, I understand that a buggy thread can bring an entire node down, because the server cannot detect and safely kill it. Bringing an entire node down is much worse than killing a single process: we have deployments where each node has about 100 pooled process instances.
I know that IBM and SAP are aware of this problem, based on http://www.trl.ibm.com/people/kawatiya/pub/Kawachiya07vee.pdf and http://java.sys-con.com/node/47362, respectively. But judging by recent JSRs, forums and open-source tools, there isn't much activity in the community.
Now come the questions!
If you have a similar scenario and use Java EE, how did you solve it?
Do you know about an upcoming open-source product or change in the Java EE spec that can address this issue?
Does .NET have the same problem? Can you explain or cite references?
Do you know about some modern and open platform that can address this issue and is worth the task of doing ERP business logic?
Please, I have to ask you not to suggest more testing or any kind of QA investment, because we cannot force our customers to do this on their own scripts. We also have cases where urgent bug-fixes must bypass QA, and while we force the customer to accept this, we cannot make him accept that a buggy software part can affect a range of unrelated features. This issue is about robust architectures, not development process.
Thanks for your attention!

What you have stumbled upon is a fundamental issue regarding the use of Java and "hostile" applications.
It's a fundamental issue not just at the Java EE level, but at the core JVM level. The typical JVMs available have all sorts of issues with loading "unsafe code". From memory leaks, class loader leaks, resource exhaustion, and unclean thread kills, the typical JVM is simply not robust enough to handle badly behaving code well in a shared environment.
A simple example is memory exhaustion of the Java heap. As a basic rule, NOBODY (and by nobody, I specifically mean the core Java library and just about every other third-party library out there) catches OutOfMemoryError. There are the rare few who do, but even they can do little about it. Typical code handles the exceptions it "expects" to handle, but lets others fall through. Unchecked throwables (of which OutOfMemoryError is one; strictly it is an Error, not a RuntimeException) will happily bubble up through the call stack all the way to the top, leaving behind a wreckage of unchecked critical path code and all sorts of things in an unknown state.
Things such as Constructors or static initializers which "can't fail" leaving behind uninitialized class members which are "never null". These damaged classes simply don't know they're damaged. Nobody knows they're damaged, and there's no way to clean them up. A Heap that hits OOM is an unsafe image and pretty much needs to be restarted (unless, of course, you wrote or audited ALL of the code yourself, which, naturally, you won't -- who would?).
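A small sketch of why such errors slip past ordinary handlers: OutOfMemoryError extends Error, so a typical catch (Exception e) block does not see it. The error here is thrown by hand (an assumption made so the demo does not have to exhaust the heap); a real one would come from a failed allocation somewhere arbitrary in the call stack.

```java
public class OomIsAnError {

    /** Reports which catch clause ends up handling a simulated OutOfMemoryError. */
    static String whoCatchesIt() {
        try {
            try {
                // Simulated: thrown by hand so the demo does not need to exhaust the heap.
                throw new OutOfMemoryError("simulated");
            } catch (Exception e) {
                // Typical application-level handler; never reached for an Error.
                return "caught as Exception";
            }
        } catch (Error e) {
            return "escaped catch (Exception), caught as Error";
        }
    }

    /** Checks at runtime whether OutOfMemoryError is a RuntimeException. */
    static boolean isRuntimeException() {
        Throwable t = new OutOfMemoryError("simulated");
        return t instanceof RuntimeException;
    }

    public static void main(String[] args) {
        System.out.println(whoCatchesIt());
        System.out.println("is RuntimeException: " + isRuntimeException());
    }
}
```

Catching Error at the top level only tells you that something went wrong; it does not repair objects that were left half-initialized on the way up.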
Now, there may well be vendor specific JVMs which are better behaved and give you better control. The ones based on the Sun/Oracle JVM (i.e. most of them) do not.
So, it's not necessarily a Java EE issue, it's a JVM issue.
Hosting hostile code in the JVM is a bad idea. The only way it's practical is if you host a scripting language, and that scripting language implements some kind of resource control. That could be done, and you can tweak the existing ones as a start (JavaScript, Groovy, JPython, JRuby). The fact that these languages give users direct access to Java libraries makes them potentially dangerous, so you may have to restrict that as well to only aspects wrapped by script handlers. At this point, though, the "why use Java at all" question floats up.
You'll note Google App Engine does none of these. It spools up a separate JVM for each application that's being run, but even then it greatly restricts what can be done within those JVMs, notably through the existing Java security model. The distinction here is that these instances tend to be "long lived" so as not to endure the processing costs of startup and shutdown. I should say, they SHOULD be long lived, and those that are not do incur those costs.
You can make several instances of the JVM yourself, give them a bit of infrastructure to handle requests for logic, give them custom class loader logic to try and protect from class loader leaks, and minimally let you kill the instances off (they're simply a process) if you want. That can work, and probably work "ok" depending on the granularity of the calls, and the "start up" time for your logic. The start up time will minimally be the loading of the classes for the logic from run to run, that alone may make this a bad idea. And it certainly WON'T be "Java EE". Java EE is not set up to do this kind of thing. But you're not clear what Java EE features you're looking at either.
Effectively, this is what Apache and "mod_php" do. Several instances, as processes, individually handle requests, with badly behaving ones being killed off as necessary. This is why PHP is common in the shared hosting business. In this structure, it's basically "safe".

I believe your scenario is highly atypical, so it is improbable that there is a ready-made framework/platform addressing this need. Java EE more or less assumes that the request processing code is written by the same team as the rest of the app, so it need not be isolated, watched and reset that often, and bug fixes would be handled the same way in all parts of the system. This assumption greatly simplifies development, deployment, testing etc. for most projects, not forcing them to pay for something they don't need. And yes, it isn't suitable for everyone. If you want something fundamentally different, you probably need to implement a fair amount of failover logic yourself. Java EE does provide the fundamental building blocks for this though.
I believe (although have no concrete experience to prove it) that .NET or other platforms are basically built on similar assumptions.

We had a similar, though not so severe, port of a really enormous Perl site to Java. On receiving an HTTP request we instantiate a class and call its processRequest method, surrounded by try-catch and time measurement. Adding a timer and a thread would suffice to be able to abort the request thread (in practice by interrupting it, since forcibly stopping a Java thread is unsafe). This is probably sufficient in real life.
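One way to sketch the "timer and thread" idea with standard library classes is to run each request on an executor and wait for the result with a timeout. The names RequestTimeout and runWithTimeout are invented for the example. Note the limitation: Future.cancel(true) only interrupts the worker thread, so code that ignores interrupts (a tight infinite loop, for instance) keeps running and still holds its memory.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RequestTimeout {

    // Daemon threads, so a stuck request cannot keep the JVM alive at shutdown.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "request-worker");
        t.setDaemon(true);
        return t;
    });

    /** Runs the request; returns its result, or empty if it timed out or threw. */
    static <T> Optional<T> runWithTimeout(Callable<T> request, long timeoutMillis) {
        Future<T> future = POOL.submit(request);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true);     // interrupts the worker; it must cooperate to stop
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty(); // the request itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithTimeout(() -> "ok", 1000));
        System.out.println(runWithTimeout(() -> { Thread.sleep(5000); return "late"; }, 100));
    }
}
```

This bounds how long a caller waits, which is often enough, but it is not isolation: a leaked or runaway request still shares the heap and CPU with everything else in the JVM.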
A Java EE server like GlassFish is also an OSGi container, so you might have more means of isolation there.
You could also run an array of (web or local) applications to which a central web application dispatches your requests. Those applications are then isolated from each other.
Even more isolated are serialized sessions and operating system processes, each starting a new JVM.

tomcat isolate webapps

Multiple webapps run on the same Tomcat, using the same JVM. Sometimes one webapp that has a memory leak causes the entire JVM to crash and affects the other webapps. Any recommendation on how to isolate that without needing multiple JVMs and Tomcat instances?

Within the same JVM everything shares the same memory. There is no system to allocate separate pools or quotas.
If one of your applications behaves really badly in this regard, the only thing you can do is run it isolated in a separate JVM (separate Tomcat).

Are the applications running as separate processes? Or the same one?
First off you should look at profiling to find the memory leak https://stackoverflow.com/questions/1716597/java-memory-leak-detection-tools.
However, as a quick solution from inside, you could use Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory() to see roughly how much heap is in use (totalMemory() alone only reports the heap currently reserved by the JVM). If it grows above a certain limit, and you know which app is causing the problem, you could restart that app.
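A minimal sketch of such a check follows. The 80% threshold is an arbitrary value picked for the example, and the numbers cover the whole JVM: the Runtime API cannot attribute heap usage to an individual webapp, so you still have to know which app to restart.

```java
public class HeapUsage {

    /** Approximate heap currently in use: reserved heap minus the free part of it. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** True when used heap exceeds the given fraction of the maximum heap (-Xmx). */
    static boolean aboveThreshold(double fraction) {
        return usedHeapBytes() > fraction * Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.printf("used=%d MiB, max=%d MiB, above 80%%: %b%n",
                usedHeapBytes() >> 20,
                Runtime.getRuntime().maxMemory() >> 20,
                aboveThreshold(0.8));
    }
}
```

The reading includes garbage that has not been collected yet, so a single high value is not proof of a leak; a trend over time is more telling.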
You could also try running System.gc() which is a terrible way to do it, and really shouldn't be used as it can be ignored by the JVM.

To the best of my knowledge, the short answer is: No, it can't be done. Tomcat uses a single memory space for all running apps.
My knee-jerk response is that you should fix the memory leak rather than trying to isolate the misbehaving app. Cure is better than quarantine. As I don't know the details of your problem, maybe this isn't practical for some reason.

You can't isolate apps in the same JVM (though you can do things like instrument a particular app's ClassLoader for diagnostics).
If your concern is administration/configuration though (and not total memory consumption) you can run multiple instances of Tomcat off the same install by using catalina.home and catalina.base

JSR 121 (the Application Isolation API) was designed to solve this, but it has never been included in a standard Java SE release.

There is no standard system in Java to truly isolate memory used by web applications.
However, you could write some byte-code weaving logic to track how much memory a particular app has allocated. If it goes over a particular threshold, you could throw an exception and stop the app from allocating anymore memory. What do you want to do if you could track all the memory consumed by a web app? What are you trying to implement?
Note that this would only really work effectively for figuring out how much memory a webapp has allocated, not how much it is currently consuming in the system. In order to get that metric, you'd have to byte-code weave finalize() for all objects. Since finalize() gets run in a best-effort fashion by the JVM, this may not get you the most accurate value should the system be under load. The JVM would deprioritize these finalize threads and your value will never get updated even though objects have been cleaned up.

To bring this up to date, it is now possible to run multiple applications on a single JVM. Applications run in isolated Java virtual containers which protect your applications from 'noisy neighbours' as well as allowing you to share resources across your applications. This gives you isolation, elasticity and increased application density for Apache Tomcat. Download it from www.elasticat.com. NB: I work for Waratek, who developed this new JVM.
