Is it possible to make some subset of threads (e.g. from a specific ThreadPool) allocate memory from their own heap? E.g. most of the threads allocate from the regular shared heap, and a few worker threads allocate from individual heaps (1:1 per thread).
The intent is to ensure safe execution of code in a shared environment: a typical worker is stateless and runs on a separate thread, and processing one request should not consume more than 4MB of heap.
Update #1
Re: But why are you worried about "safe execution" and unpredictable increasing of heap consumption?
The point is about safe hosting of arbitrary 3rd party Java code within my process. One of the points is to not get "Out of Memory" for my entire process because of bugs in the 3rd party code.
Update #2
Re: As for limiting memory usage per thread, in Java the language it's impossible
My own investigation before posting this question led me to the same conclusion; I'm just hoping I'm missing something.
The only possible alternative solutions for my use-case as I see right now are ...
1) How much memory does my java thread take? - track thread memory usage in some governor thread and terminate bad threads
2) Run Java code on my own JVM - Yes it is possible. You can download a JVM open source implementation, modify it ... :)
Check out Java nonblocking memory allocation — threads are usually allocating memory from their own allocation blocks already. So if the speed is of concern, Sun has done it for you.
As for limiting memory usage per thread, in Java the language it's impossible. Whether it is possible (or makes sense) in the JVM and Java the platform is an interesting question. You can of course do it the same way as any memory profiler does, but I'm afraid the management system will outgrow the application itself pretty soon.
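For what it's worth, here is a minimal sketch of the "governor" idea, assuming a HotSpot/OpenJDK JVM. It relies on `com.sun.management.ThreadMXBean`, which is a vendor extension rather than standard Java SE API, and it reports cumulative bytes allocated by a thread, not the memory the thread currently keeps alive. The class name `ThreadAllocationGovernor` and the 4 MB budget are made up for illustration.

```java
import java.lang.management.ManagementFactory;

// Sketch only: samples how many bytes a thread has allocated so far.
// Assumption: the JVM exposes com.sun.management.ThreadMXBean (HotSpot/OpenJDK do).
class ThreadAllocationGovernor {
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Cumulative bytes allocated by the thread, or -1 if unsupported, disabled, or the thread is not alive. */
    static long allocatedBytes(Thread t) {
        if (!THREADS.isThreadAllocatedMemorySupported() || !THREADS.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        return THREADS.getThreadAllocatedBytes(t.getId());
    }

    /** True if the thread has allocated more than budgetBytes since the baseline sample. */
    static boolean overBudget(Thread t, long baseline, long budgetBytes) {
        long now = allocatedBytes(t);
        return now >= 0 && baseline >= 0 && now - baseline > budgetBytes;
    }

    public static void main(String[] args) {
        Thread self = Thread.currentThread();
        long baseline = allocatedBytes(self);
        byte[][] junk = new byte[8][];
        for (int i = 0; i < junk.length; i++) {
            junk[i] = new byte[1024 * 1024]; // roughly 8 MB in total
        }
        System.out.println("allocated since baseline: " + (allocatedBytes(self) - baseline) + " bytes");
        System.out.println("over 4 MB budget: " + overBudget(self, baseline, 4L * 1024 * 1024));
    }
}
```

A governor thread could poll `overBudget` for each worker, but it still has no safe way to forcibly stop an offending thread, so this only detects the problem.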
No. There is no concept of this in Java. There is one 'heap' that new allocates from. Java allocation is thread-safe. And why do you think that making more heaps would cause threads to consume less memory?
If you want to control memory usage in a thread, don't allocate things.
You could, in theory, create pools of reusable objects for a purpose like this, but the performance would almost certainly be worse than the obvious alternative.
Threads by design share all the heap and other regions of memory. Only the stack is truly thread local, and this space can be limited.
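To illustrate the point about the stack, here is a small sketch that uses the four-argument `Thread` constructor to request a stack size and then probes how deep a recursion gets before `StackOverflowError`. The `stackSize` argument is only a hint that a JVM may round or ignore, and the class name `StackLimitedThread` is made up.

```java
// Sketch only: limits a thread's stack (not its heap usage) via the stackSize hint.
class StackLimitedThread {
    /** Recursion depth reached before StackOverflowError on a thread created with the given stack size hint. */
    static int probeDepth(long stackSizeBytes) {
        int[] depth = {0};
        Runnable probe = new Runnable() {
            void recurse() {
                depth[0]++;
                recurse();
            }

            @Override
            public void run() {
                try {
                    recurse();
                } catch (StackOverflowError expected) {
                    // the stack limit was reached; depth[0] holds the deepest call
                }
            }
        };
        Thread t = new Thread(null, probe, "stack-probe", stackSizeBytes);
        t.start();
        try {
            t.join(); // join() makes the probe's writes to depth[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return depth[0];
    }

    public static void main(String[] args) {
        System.out.println("depth with 256 KB hint: " + probeDepth(256 * 1024));
        System.out.println("depth with 4 MB hint:   " + probeDepth(4L * 1024 * 1024));
    }
}
```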
If you have tasks which you want to run in their own memory and/or can be stopped, you have to run them as a separate process.
In the spirit of question Java: Why does MaxPermSize exist?, I'd like to ask why the Oracle JVM uses a fixed upper limit for the size of its memory allocation pool.
The default is 1/4 of your physical RAM (with upper & lower limit); as a consequence, if you have a memory-hungry application you have to manually change the limit (parameter -Xmx), or your app will perform poorly, possibly even crash with an OutOfMemoryError.
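A quick way to see which limit a given JVM actually ended up with is to ask `Runtime.maxMemory()`. This is just a sketch; the class name `HeapLimit` is made up, and the value printed depends on the machine and on any -Xmx passed.

```java
// Sketch only: prints the maximum heap size the running JVM will attempt to use.
class HeapLimit {
    /** Maximum heap in bytes; Long.MAX_VALUE if the JVM reports no inherent limit. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long max = maxHeapBytes();
        System.out.println("max heap: " + (max / (1024 * 1024)) + " MB");
    }
}
```

Running it once with no options and once with, say, `-Xmx256m` shows the effect of the parameter.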
Why does this fixed limit even exist? Why does the JVM not allocate memory as needed, like native programs do on most operating systems?
This would solve a whole class of common problems with Java software (just Google to see how many hints there are on the net on solving problems by setting -Xmx).
Edit:
Some answers point out that this will protect the rest of the system from a Java program with a run-away memory leak; without the limit this would bring the whole system down by exhausting all memory. This is true. However, it is equally true for any other program, and modern OSes already let you limit the maximum memory for a program (Linux ulimit, Windows "Job Objects"). So this does not really answer the question, which is "Why does the JVM do it differently from most other programs / runtime environments?".
Why does this fixed limit even exist? Why does the JVM not allocate memory as needed, like native programs do on most operating systems?
The reason is NOT that the GC needs to know before hand what the maximum heap size can be. The JVM is clearly capable of expanding its heap ... up to the maximum ... and I'm sure it would be a relatively small change to remove that maximum. (After all, other Java implementations do this.) And it would equally be possible to have a simple way to say "use as much memory as you like" to the JVM.
I'm sure that the real reason is to protect the host operating system against the effects of faulty Java applications using all available memory. Running with an unbounded heap is potentially dangerous.
Basically, many operating systems (e.g. Windows, Linux) suffer serious performance degradation if some application tries to use all available memory. On Linux for example, the system may thrash badly, resulting in everything on the system running incredibly slowly. In the worst case, the system won't be able to start new processes, and existing processes may start crashing when the operating system refuses their (legitimate) requests for more memory. Often, the only option is to reboot.
If the JVM ran with an unbounded heap by default, any time someone ran a Java program with a storage leak ... or that simply tried to use too much memory ... they would risk bringing down the entire operating system.
In summary, having a default heap bound is a good thing because:
it protects the health of your system,
it encourages developers / users to think about memory usage by "hungry" applications, and
it potentially allows GC optimizations. (As suggested by other answers: it is plausible, but I cannot confirm this.)
EDIT
In response to the comments:
It doesn't really matter why Sun's JVMs live within a bounded heap, where other applications don't. They do, and advantages of doing so are (IMO) clear. Perhaps a more interesting question is why other managed languages don't put a bound on their heaps by default.
The -Xmx and ulimit approaches are qualitatively different. In the former case, the JVM has full knowledge of the limits it is running under and gets a chance to manage its memory usage accordingly. In the latter case, the first thing a typical C application knows about it is when a malloc call fails. The typical response is to exit with an error code (if the program checks the malloc result), or die with a segmentation fault. OK, a C application could in theory keep track of how much memory it has used, and try to respond to an impending memory crisis. But it would be hard work.
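As a rough illustration of "full knowledge of the limits": code running inside the JVM can ask how close the heap is to its bound through the standard `java.lang.management` API and react before allocation fails. This is a sketch; the class name `HeapPressure`, the method `shouldShedLoad` and the 0.9 threshold are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Sketch only: reports how full the heap is relative to its configured maximum.
class HeapPressure {
    /** Fraction of the heap limit currently in use, between 0.0 and 1.0. */
    static double usedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        // getMax() returns -1 when the maximum is undefined; fall back to committed memory.
        long limit = max > 0 ? max : heap.getCommitted();
        return (double) heap.getUsed() / limit;
    }

    /** Hypothetical policy hook: true once usage reaches the given fraction of the limit. */
    static boolean shouldShedLoad(double threshold) {
        return usedFraction() >= threshold;
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.1f%%%n", usedFraction() * 100);
        System.out.println("shed load at 90%? " + shouldShedLoad(0.9));
    }
}
```

Note that the "used" figure includes garbage that has not been collected yet, so a high value does not necessarily mean the application is about to run out of memory.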
The other thing that is different about Java and C/C++ applications is that the former tend to be both more complicated and longer running. In practice, this means that Java applications are more likely to suffer from slow leaks. In the C/C++ case, the fact that memory management is harder means that developers don't attempt to build single applications of that complexity. Rather, they are more likely to build (say) a complex service by having a listener process fork off child processes to do stuff ... and then exit. This naturally mitigates the effect of memory leaks in the child process.
The idea of a JVM responding "adaptively" to requests from the OS to give memory back is interesting. But there is a BIG problem. In order to give a segment of memory back, the JVM first has to clear out any reachable objects in the segment. Typically that means running the garbage collector. But running the garbage collector is the last thing you want to do if the system is in a memory crisis ... because it is pretty much guaranteed to generate a burst of virtual memory paging.
Hm, I'll try summarizing the answers so far.
There is no technical reason why the JVM needs to have a hard limit for its heap size. It could have been implemented without one, and in fact many other dynamic languages do not have this.
Therefore, giving the JVM a heap size limit was simply a design decision by the implementors. Second-guessing why this was done is a bit difficult, and there may not be a single reason. The most likely reason is that it helps protect a system from a Java program with a memory leak, which might otherwise exhaust all RAM and cause other apps to crash or the system to thrash.
Sun could have omitted the feature and simply told people to use the OS-native resource limiting mechanisms, but they probably wanted to always have a limit, so they implemented it themselves.
At any rate, the JVM needs to be aware of any such limit (to adapt its GC strategy), so using an OS-native mechanism would not have saved much programming effort.
Also, there is one reason why such a built-in limit is more important for the JVM than for a "normal" program without GC (such as a C/C++ program):
Unlike a program with manual memory management, a program using GC does not really have a well-defined memory requirement, even with fixed input data. It only has a minimum requirement, i.e. the sum of the sizes of all objects that are actually live (reachable) at a given point in time. However, in practice a program will need additional memory to hold dead, but not yet GCed objects, because the GC cannot collect every object right away, as that would cause too much GC overhead. So GC only kicks in from time to time, and therefore some "breathing room" is required on the heap, where dead objects can await the GC.
This means that the memory required for a program using GC is really a compromise between saving memory and having good throughput (by letting the GC run less often). So in some cases it may make sense to set the heap limit lower than what the JVM would use if it could, to save RAM at the expense of performance. To do this, there needs to be a way to set a heap limit.
I think part of it has to do with the implementation of the Garbage Collector (GC). The GC is typically lazy, meaning it will only start really trying to reclaim memory internally when the heap is at its maximum size. If you didn't set an upper limit, the runtime would happily continue to inflate until it used every available bit of memory on your system.
That's because from the application's perspective, it's more performant to take more resources than exert effort to use the resources you already have to full utilization. This tends to make sense for a lot of (if not most) uses of Java, which is a server setting where the application is literally the only thing that matters on the server. It tends to be slightly less ideal when you're trying to implement a client in Java, which will run amongst dozens of other applications at the same time.
Remember that with native programs, the programmer typically requests but also explicitly cleans up resources. That isn't typically true with environments that do automatic memory management.
It is due to the design of the JVM. Other JVMs (like the one from Microsoft and some IBM ones) can use all the memory available in the system if needed, without an arbitrary limit.
I believe it allows for GC-optimizations.
I think that the upper limit for memory is linked to the fact that the JVM is a VM.
Just as any physical machine has a given (fixed) amount of RAM, so does the VM.
The maximal size makes the JVM easier for the operating system to manage and ensures some performance gains (less swapping).
Sun's JVM also runs on quite limited hardware architectures (embedded ARM systems), and there the management of resources is crucial.
One answer that no-one above gave is that the JVM uses both heap and non-heap memory pools. Putting an upper limit on the heap defines not only how much memory is available for the heap memory pools, but it also defines how much memory is available for NON-HEAP usages. I suppose that the JVM could just allocate non-heap at the top of virtual memory and heap at the bottom of virtual memory and grow both toward each other.
Non-heap memory includes the DLLs or SOs that comprise the JVM and any native code being used as well as compiled Java code, thread stacks, native objects, PermGen (meta-data about compiled classes), among other uses. I've seen Java programs crash because so much memory was given to the heap that the application ran out of non-heap memory. This is where I learned that it can be important to reserve memory for non-heap usages by not setting the heap to be too large.
This makes a much bigger difference in a 32-bit world where an application often has only 2GB of virtual address space than it does in a 64-bit world, of course.
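To get a feel for the non-heap side, the standard `java.lang.management` API can list the non-heap pools the JVM itself tracks. This is only a sketch: it covers JVM-managed pools (on current HotSpot things like Metaspace and the code cache), not thread stacks or memory taken by native libraries, and the class name `NonHeapReport` is made up.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Sketch only: reports the non-heap memory pools that the JVM exposes via JMX.
class NonHeapReport {
    /** Bytes currently used across all JVM-managed non-heap pools. */
    static long nonHeapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getUsed();
    }

    /** Names of the memory pools whose type is NON_HEAP. */
    static List<String> nonHeapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("non-heap used: " + nonHeapUsedBytes() / 1024 + " KB");
        System.out.println("non-heap pools: " + nonHeapPoolNames());
    }
}
```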
Would it not make more sense to separate the upper bound that triggers GC from the maximum that can be allocated? Once the memory allocated hits the upper bound, GC can kick in and release some memory to the free pool.
Sort of like how I clean my desk that I share with my co-worker. I have a large desk, and my threshold of how much junk I can tolerate on the table is much less than the size of my desk. I don't need to have to fill up every available inch before I garbage collect.
I could also return some of the desk space that I am using to my co-worker, who is sharing my desk... I understand JVMs don't return memory back to the system after they've allocated it to themselves, but it does not have to be that way, no?
It does allocate memory as needed, up to -Xmx ;)
One reason I can think of is that once the JVM allocates an amount of memory for its heap, it will never let it go. So if your heap has no upper bound, the JVM may just grab all the free memory on the system and then never let it go.
The upper bound also tells the JVM when it needs to do a full garbage collection. If your app is still under the upper bound, the JVM will postpone garbage collection and let the memory footprint of your application grow.
Native programs can die due to out of memory errors as well since native applications also have a memory limit: the memory available on the system - the memory already held by other applications.
The JVM also needs a contiguous block of system memory in order for garbage collection to be performed efficiently.
EDIT
Contiguous memory claim or here
The JVM will apparently let some memory go, but it is rare with the default configuration.
I have a container that is limited to 1 CPU. The default in that case for Java 11+ (and probably older versions too) is to use SerialGC.
Should I force a threaded GC (like G1GC) or just leave it at SerialGC?
Which one will perform better on a single CPU?
I always assumed SerialGC is better in such case but I frequently see G1GC forced in some cases.
EDIT: I'm asking for general case, because we have a lot of different apps running using the same configuration and it is hard to test each and every case.
According to the documentation.
The serial collector uses a single thread to perform all garbage
collection work, which makes it relatively efficient because there is
no communication overhead between threads.
It's best-suited to single processor machines because it can't take
advantage of multiprocessor hardware, although it can be useful on
multiprocessors for applications with small data sets (up to
approximately 100 MB).
I'm assuming processor = core in the documentation (and your question). While the documentation says that the serial collector is not a good option for multi-core machines, it doesn't say that other collectors would be bad for a single-core machine.
The other collectors do tend to use multiple threads though, and you won't get the full benefits of those in a single-core environment.
So why have you seen G1GC used? Maybe no reason other than it was the newest. However if there is a reason, it would most likely be the shorter GC pauses that G1 provides:
If response time is more important than overall throughput and garbage
collection pauses must be kept shorter than approximately one second,
then select a mostly concurrent collector with -XX:+UseG1GC or
-XX:+UseConcMarkSweepGC.
The best case scenario is that in those cases they measured the performance with different collectors and chose the one that provided the best results.
Also consider the String deduplication Holger mentioned in the comments. This is a specific memory optimization that can be the reason behind using G1GC. After all if you have a single core, you probably don't have a lot of memory at your disposal either.
What do you want to optimize? Do you want to always be able to respond extremely fast, or to have better overall performance? In the first case, you should aim for shorter GC pauses; in the second, for a lower sum of all GC pauses.
There are other factors that you should keep in mind (e.g. how often applications are restarted), so IMO the best approach is a data-driven one. Use GCeasy or GCViewer to analyze the performance of each application and act accordingly.
Please keep in mind that GC tuning is not always required, so if you do not know what you want to achieve, you are probably optimizing prematurely.
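If you want a first number before reaching for a log analyzer, the JVM exposes cumulative GC counters through `GarbageCollectorMXBean`. This is a rough sketch (the class name `GcStats` is made up); the reported time is an approximate accumulated elapsed time per collector, which for concurrent collectors is not the same as application pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch only: sums the collection counters the JVM reports for all of its collectors.
class GcStats {
    /** Total number of collections so far, ignoring collectors that report "undefined" (-1). */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("collections: " + totalCollections());
        System.out.println("collection time: " + totalCollectionMillis() + " ms");
    }
}
```

Sampling these two values before and after a representative workload, once per collector setting, gives a crude basis for comparison.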
In general:
Use the Serial GC for applications that do not have low pause time requirements and run in an environment with few resources.
Go with the G1 garbage collector if you have more resources or you need to respond fast (remember to measure the performance before and after the change).
As a more general comment, don't make the assumption that because you only have a single core/CPU that making a task multi-threaded will have no benefit. Depending on the task involved (in this case GC), there may well be situations where one thread becomes blocked (e.g. waiting for IO to complete), which allows other threads performing another part of the task to use the processor and complete useful work. Overall performance is increased, despite only one thread being able to run at a time.
One important thing that has not been mentioned in this thread is that the G1GC can return the memory (uncommit it) back to the OS, so if other applications are running on the server, they can make use of it.
I noticed this when switching from a single vCPU server to a 2 vCPU server, as Java by default uses SerialGC for a single CPU and G1GC for multiple CPUs (well, at least it does for JDK 11).
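To check which collector a given container actually ended up with, you can print the processor count the JVM sees together with the names of its collector beans. A small sketch (the class name `ActiveCollectors` is made up); as far as I know, HotSpot's serial collector shows up as "Copy" and "MarkSweepCompact", while G1 shows up with names starting with "G1".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch only: shows what the JVM selected, given the CPUs it can see.
class ActiveCollectors {
    /** Names of the garbage collector beans registered in this JVM. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("collectors: " + names());
    }
}
```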
I have some questions about the Garbage Collection concept of Java when working in distributed systems:
Why is mark-and-sweep GC not recommended in RMI system?
Is it possible to run the GCs "Reference counting"-algorithm in a parallel thread without suspending the application itself?
Thanks in advance.
Why is mark-and-sweep GC not recommended in RMI system?
I don't believe it is.
Is it possible to run the GCs "Reference counting"-algorithm in a parallel thread without suspending the application itself?
While reference counting is not forbidden as a mode of GC, it is not supported by any JVM AFAIK as it has many limitations including performance, memory usage and circular references. I know C++ uses it, but it is a hack by comparison to what the managed memory systems do.
Note: MappedByteBuffers use reference counts for some purposes. This is an isolated use case.
There is a purely concurrent collector, the most popular of which is available from Azul. http://www.azulsystems.com/zing/pgc Note: it should really be called "pause less" instead of "pauseless" as it dramatically reduces GC related pauses, but doesn't eliminate them completely. (It is often used for low latency trading system in Java.)
If you are really concerned about GC pauses, the best thing to do is avoid using Java RMI. It is designed to be a "full fat" fully featured RPC which does lots of things you possibly never thought of doing. The Serialization isn't very efficient and generates lots of garbage. Using a more targeted RPC solution can reduce garbage by 90 - 99% or much better.
Check Java's web site on this: http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
Basically there are several garbage collectors available. I've been running the Parallel collector on a production system for quite a while, but googling around will show you that G1 is also showing great promise.
Why is mark-and-sweep GC not recommended in RMI system?
I don't know what this means. RMI uses a distributed garbage collection algorithm (DGC), from Modula-3, which uses reference-counting, among other things, but that's completely separate from JVM garbage collection. I've never heard of the recommendation you mention. Citation please. I'm not sure the statement even makes sense. Changing 'recommended' to 'convenient' as per your comment doesn't really help.
Is it possible to run the GCs "Reference counting"-algorithm in a parallel thread without suspending the application itself?
There is no reference counting algorithm in current JVMs, but GC does run in its own thread as far as I am aware. So does RMI DGC.
In short your question doesn't make sense.
I have a Java/Java EE Web Application.
Often when I see that the application stops responding because of high heap usage (or out of memory scenario), I also see that threads are blocked (via a thread dump) - often on logging, and also on random things.
I have seen this happen more than once in the web application.
Is there any correlation between an out of memory scenario and blocked threads?
Yes, there is a direct correlation between OOM and blocked threads. The reason is that the thread is trying to allocate memory on the heap and is not able to get enough. Mostly you will see blocked threads around logging, class loading, resource lookup and IO. These are all cases where new memory allocation is required.
Yes, because threads are where your code executes and your code needs memory. Java is object-oriented, so creation of new objects is an extremely common occurrence. When a JVM is having memory issues, attempts to allocate more memory block until memory can be granted.
Interfacing with external systems (I/O) is a common point to see threads blocking because these often involve good size chunks of memory allocation (such as string buffers for formatting, reading in a .class file by a class loader, generating objects for a database result set).
This is one of many reasons why troubleshooting OutOfMemoryError can be very difficult. When your heap space is running low / exhausted, everything slows down and breaks to the point where separating the symptoms from the cause becomes difficult.
Yes, there is a correlation. Although threads share the heap, each has its own stack, and both are allocated from the available memory. A thread may be doing some work, such as logging in your case. A logging thread may hold some log data in memory while trying to write it to the log file. Since several threads log, they wait their turn for access to the log file. If threads wait too long for the file, they hold the log data in memory for a long time. If this keeps happening, there will be too many threads holding too much data in memory. Eventually the JVM runs out of memory when something tries to allocate and nothing is available.
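If you want to watch for this from inside the application rather than taking thread dumps by hand, here is a small sketch that counts threads in the `BLOCKED` state. Per the `Thread.State` Javadoc, `BLOCKED` means waiting for a monitor lock (for example a synchronized logger), so this shows lock contention and does not by itself prove a memory problem. The class name `BlockedThreadCounter` is made up.

```java
import java.util.Map;

// Sketch only: counts live threads that are currently waiting for a monitor lock.
class BlockedThreadCounter {
    /** Number of live threads whose state is BLOCKED at the moment of sampling. */
    static long countBlocked() {
        Map<Thread, StackTraceElement[]> all = Thread.getAllStackTraces();
        return all.keySet().stream()
                .filter(t -> t.getState() == Thread.State.BLOCKED)
                .count();
    }

    public static void main(String[] args) {
        System.out.println("blocked threads right now: " + countBlocked());
    }
}
```

Logging this count periodically next to the heap usage would show whether the two really rise together in your application.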
In my application I run some threads with untrusted code, so I have to prevent a memory overflow. I have a WatchDog which monitors the running time of the current thread (the threads are called serially).
But how can I determine the memory usage?
I only know how to get the memory usage of the whole VM, with Runtime.totalMemory().
If there is a way to find out the usage of the thread, or the usage of the single process, it would be great. With the memory usage of the process I could calculate the usage of the thread anyway.
Since a JVM executing a Java program is a Java process you don't have to worry about that. All threads share the same memory space in the JVM process.
Hence it is sufficient to rely on
Runtime.totalMemory()
Runtime.freeMemory()
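For example, a watchdog could compute the currently used heap of the whole JVM from those two calls. This is a minimal sketch (the class name `HeapSnapshot` is made up); the figure is JVM-wide and includes garbage that has not been collected yet, so it cannot be attributed to a single thread.

```java
// Sketch only: whole-JVM heap usage derived from Runtime.totalMemory() and Runtime.freeMemory().
class HeapSnapshot {
    /** Bytes of the currently committed heap that are in use (live objects plus uncollected garbage). */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.out.println("used heap: " + usedBytes() / 1024 + " KB");
    }
}
```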
A Java application cannot control the amount of memory (or CPU) used by its threads, irrespective of whether the threads are running trusted or untrusted code. There are no APIs for doing this in current generation JVMs. And there are certainly no APIs for monitoring a thread's usage of memory. (It is not even clear that this is a meaningful concept ... )
The only way you can guarantee to control the resource usage of untrusted Java code is to run the code in a separate JVM, and use operating system level resource controls (such as ulimit, nice, sigstop, etc) and "-Xmx" to limit that JVM's resource usage.
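A minimal sketch of the separate-JVM approach, using `ProcessBuilder` to start a child JVM with its own -Xmx and a timeout. The jar name `untrusted.jar`, the main class `com.example.Worker`, the class name `ChildJvmLauncher` and the 4m / 30 second limits are all hypothetical; OS-level controls such as ulimit would have to be applied separately.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Sketch only: runs a main class in a child JVM with a bounded heap.
class ChildJvmLauncher {
    /** Builds the command line for a child JVM that uses the same Java installation as this one. */
    static List<String> command(String maxHeap, String classPath, String mainClass) {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + maxHeap);
        cmd.add("-cp");
        cmd.add(classPath);
        cmd.add(mainClass);
        return cmd;
    }

    /** Runs the command; returns the exit code, or -1 if the child was killed after the timeout. */
    static int run(List<String> cmd, long timeoutSeconds) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(cmd).inheritIO().start();
        if (!child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            child.destroyForcibly();
            child.waitFor();
            return -1;
        }
        return child.exitValue();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical jar and main class, for illustration only.
        List<String> cmd = command("4m", "untrusted.jar", "com.example.Worker");
        System.out.println("exit code: " + run(cmd, 30));
    }
}
```

If the child exhausts its heap, only the child dies with an OutOfMemoryError; the parent sees a non-zero exit code and carries on.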
Some time back, Sun produced JSR 121, aimed at addressing this issue. This JSR would allow an application to be split into parts (called "isolates") that communicate via message passing, and would offer the ability for one isolate to monitor and control another. Unfortunately, the Isolate APIs have yet to be implemented in any mainstream JVM.
What you need to do is to run the untrusted code in its own process/JVM. This is possible using the JNI interfaces (if your operating system permits it).