Is there a way to limit the number of cores that Java uses?
And in the same vein, is it possible to limit how much of each core is used?
You can use taskset on Linux. You can also lower the priority of a process with nice, but unless the CPU(s) are busy, a process will get as much CPU as it can use.
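For example (the core list, niceness, and jar name are placeholders):

    # pin a new JVM to cores 0 and 1 and lower its priority
    taskset -c 0,1 nice -n 10 java -jar app.jar

    # or change the affinity of an already-running process
    taskset -p -c 0,1 <pid>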
I have a library for dedicating a thread to a core, called Java Thread Affinity, but it may have a different purpose from what you have in mind. Can you clarify why you want to do this?
I don't think there are built-in JVM options for this kind of tweak; however, you can limit CPU usage by setting the priority and/or CPU affinity of the JVM process.
If you are on Linux, take a look at cpulimit, an excellent tool for this kind of resource limitation.
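For example, to cap an existing JVM process to roughly 50% of one core (the PID is a placeholder):

    cpulimit -p <pid> -l 50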
https://github.com/haosdent/jcgroup
jcgroup is your best choice. You can use this library to limit CPU shares, disk I/O speed, network bandwidth, etc.
I have a container that is limited to 1 CPU; the default for Java 11+ (and probably older versions too) in such a case is to use SerialGC.
Should I force a threaded GC (like G1GC) or just leave it at SerialGC?
Which one will perform better on a single CPU?
I always assumed SerialGC is better in such a case, but I frequently see G1GC forced in some setups.
EDIT: I'm asking about the general case, because we have a lot of different apps running with the same configuration, and it is hard to test each and every one.
According to the documentation:

The serial collector uses a single thread to perform all garbage collection work, which makes it relatively efficient because there is no communication overhead between threads.

It's best-suited to single processor machines because it can't take advantage of multiprocessor hardware, although it can be useful on multiprocessors for applications with small data sets (up to approximately 100 MB).
I'm assuming processor = core in the documentation (and your question). While the documentation says that the serial collector is not a good option for multi-core machines, it doesn't say that other collectors would be bad for a single-core machine.
The other collectors do tend to use multiple threads though, and you won't get the full benefits of those in a single-core environment.
So why have you seen G1GC used? Maybe for no reason other than that it was the newest. However, if there is a reason, it is most likely the shorter GC pauses that G1 provides:
If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately one second, then select a mostly concurrent collector with -XX:+UseG1GC or -XX:+UseConcMarkSweepGC.
The best case scenario is that in those cases they measured the performance with different collectors and chose the one that provided the best results.
Also consider the String deduplication Holger mentioned in the comments. This is a specific memory optimization that can be the reason behind using G1GC. After all if you have a single core, you probably don't have a lot of memory at your disposal either.
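If that is the motivation, it takes a single extra flag, and it only works together with G1 (the jar name is a placeholder):

    java -XX:+UseG1GC -XX:+UseStringDeduplication -jar app.jar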
What do you want to optimize? Do you want always to be able to respond extremely fast, or to have better overall throughput? In the first case, aim for shorter GC pauses; in the second, for the lowest total time spent in GC pauses.
There are other factors to keep in mind (e.g. how often applications are restarted), so IMO the best approach is a data-driven one: use GCeasy or GCViewer to analyze the GC logs of each application and act accordingly.
Also keep in mind that GC tuning is not always required, so if you do not know what you want to achieve, you are probably optimizing prematurely.
In general:
use the Serial GC for applications that do not have low pause-time requirements and run in a low-resource environment
go with the G1 garbage collector if you have more resources or you need to respond fast (remember to measure the performance before and after the change); see the flag sketch below
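For reference, a sketch of the relevant flags (the jar name, pause target, and log path are placeholders; -Xlog is the unified logging syntax from Java 9+):

    # force the serial collector
    java -XX:+UseSerialGC -jar app.jar

    # force G1 with a pause-time goal, plus a GC log you can feed to GCeasy/GCViewer
    java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xlog:gc*:file=gc.log -jar app.jar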
As a more general comment, don't make the assumption that because you only have a single core/CPU that making a task multi-threaded will have no benefit. Depending on the task involved (in this case GC), there may well be situations where one thread becomes blocked (e.g. waiting for IO to complete), which allows other threads performing another part of the task to use the processor and complete useful work. Overall performance is increased, despite only one thread being able to run at a time.
One important thing that has not been mentioned in this thread is that G1GC can return memory (uncommit it) to the OS, so if other applications are running on the server, they can make use of it.
I noticed this when switching from a single-vCPU server to a 2-vCPU server, as Java by default uses SerialGC for a single CPU and G1GC for multiple CPUs (at least it does for JDK 11).
Is there a way to share a core library between Java processes (or another way to minimize the JVM's initial memory impact)?
So here's my case. I'm playing with microservices, and I'm running quite a lot of them. I'm setting their heap to 128 MB, as that's enough for them. But I've noticed that the Linux process consumes much more.
If I understand correctly from here
Max memory = [-Xmx] + [-XX:MaxPermSize] + number_of_threads * [-Xss]
although I am using Java 8, so perm size is probably no longer the issue? Or is it?
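A rough worked example under assumed settings (in Java 8, PermGen is gone and -XX:MaxMetaspaceSize is the analogous knob; the numbers below are purely illustrative):

    Max memory ≈ -Xmx + -XX:MaxMetaspaceSize + number_of_threads * -Xss
               ≈ 128 MB + 64 MB + 30 * 1 MB ≈ 222 MB

and the code cache, GC bookkeeping, and other native overhead come on top of that.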
There is an initial "core" JVM memory footprint... and I was wondering if you have heard of a way to somehow share that "core" memory between processes (as it's really the same), or any other way to deal with that extra cost when running many Java processes.
Conceptually you're asking if you can fork a JVM - since forking (generally) uses copy-on-write memory semantics this can be an effective space-saving measure. Unfortunately as discussed in this answer forking the JVM is not supported, and generally not practical. Non-Unix systems cannot fork efficiently, and there are numerous other side-effects that a forked JVM would have to resolve in messy ways. Theoretically you could probably fork a JVM process, but you'd be walking squarely into "undefined behavior" territory.
The "right" way to avoid JVM startup costs is to reduce the number of JVMs you need to start up in the first place. Java is a highly-concurrent language that supports shared access to common memory out of the box via its threading model. If you can refactor your code to run concurrently in the same JVM you'll see much better performance.
I'm using Java's fork-join framework to deal with a CPU-intensive calculation.
I've tweaked the "sequential threshold" (used to determine whether to create subtasks or do the work directly) a bit, but to my disappointment, going from single-threaded to 4+4 cores only about doubles the overall performance. The pool does report 8 CPUs, and when I manually set 2, 3, 4, ... threads I see gradual increases in performance, but it still tops out at about twice the single-thread throughput. Also, the Linux System Activity monitor hovers around 50% for that Java process.
Also very suspicious is the fact that when I start multiple Java processes, the collective throughput is more in line with what I expected (almost 4 times the single-thread rate) and the System Activity monitor shows higher CPU use.
Is it possible that there is a limitation in either Java, Linux, or the fork/join framework that would disallow full CPU usage? Any suggestions or similar experiences?
NB: This is on an Intel 3770 CPU with 4 physical cores plus hyper-threading (8 logical cores), running Oracle Java 7u13 on a Linux Mint box.
Thanks for the thoughts and answers, everyone! From your suggestions, I concluded that the problem was not the framework itself and went on to do some more testing, finding that after a few minutes the CPU load dropped to 15%!
Turns out, Random (which I use extensively) performs poorly in a multithreaded setup. The solution was to use ThreadLocalRandom.current().nextXXX() instead. I'm now up to a consistent 80% usage (there are still some sequential passages left). Sweet!
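For reference, the change was essentially this: a shared Random makes every thread compete to update one atomic seed, while ThreadLocalRandom keeps per-thread state:

    import java.util.Random;
    import java.util.concurrent.ThreadLocalRandom;

    public class RandomContention {
        // Before: every worker thread CASes on this single shared seed,
        // which effectively serializes the threads under load.
        static final Random SHARED = new Random();

        static int slow() {
            return SHARED.nextInt(100);
        }

        // After: each thread draws from its own generator, no contention.
        static int fast() {
            return ThreadLocalRandom.current().nextInt(100);
        }
    }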
Thanks again for putting me on the right track.
In my application I run some threads with untrusted code, so I have to prevent a memory overflow. I have a watchdog which analyses the running time of the current thread (the threads are called serially).
But how can I determine the memory usage?
I only know the memory usage of the whole VM, via Runtime.totalMemory().
If there were a way to find out the usage of a single thread, or of a single process, that would be great. With the memory usage of the process I could calculate the usage of the thread anyway.
Since a JVM executing a Java program is a single OS process, all threads share the same memory space in the JVM process. Hence it is sufficient to rely on:
Runtime.totalMemory()
Runtime.freeMemory()
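For example, a minimal sketch of a whole-VM check built on those calls:

    public class MemoryCheck {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long used = rt.totalMemory() - rt.freeMemory(); // heap bytes in use
            long max  = rt.maxMemory();                     // the -Xmx ceiling
            System.out.printf("heap used: %d of %d MB%n", used >> 20, max >> 20);
        }
    }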
A Java application cannot control the amount of memory (or CPU) used by its threads, irrespective of whether the threads are running trusted or untrusted code. There are no APIs for doing this in current-generation JVMs. And there are certainly no APIs for monitoring a thread's usage of memory. (It is not even clear that this is a meaningful concept ...)
The only way you can guarantee to control the resource usage of untrusted Java code is to run that code in a separate JVM, and use operating-system-level resource controls (such as ulimit, nice, SIGSTOP, etc.) and -Xmx to limit that JVM's resource usage.
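A minimal sketch of that approach, launching the untrusted code in a child JVM with a hard heap cap (the class name, classpath, heap limit, and timeout are all illustrative):

    import java.util.concurrent.TimeUnit;

    public class Sandbox {
        public static void main(String[] args) throws Exception {
            // Launch the untrusted code in its own JVM with its own -Xmx cap.
            ProcessBuilder pb = new ProcessBuilder(
                    "java", "-Xmx64m",            // hard heap limit for the child JVM
                    "-cp", "sandbox.jar",         // hypothetical classpath
                    "com.example.UntrustedMain"); // hypothetical entry point
            pb.inheritIO();
            Process p = pb.start();
            // Watchdog: kill the child if it runs longer than 30 seconds.
            if (!p.waitFor(30, TimeUnit.SECONDS)) {
                p.destroyForcibly();
            }
        }
    }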
Some time back, Sun produced JSR 121, aimed at addressing this issue. The JSR would allow an application to be split into parts (called "isolates") that communicate via message passing, and offered the ability for one isolate to monitor and control another. Unfortunately, the Isolate APIs have yet to be implemented in any mainstream JVM.
What you need to do is run the untrusted code in its own process/JVM. This is possible using the JNI interfaces (if your operating system permits it).
We have a small test box with 512 MB of RAM. We wanted to see how many threads we can create in Java on this box. To our surprise, we couldn't create many. Essentially the minimum stack size you can set with -Xss is 64 KB, and simple math tells you that 64 KB × 7000 consumes about 437 MB. So we were only able to get up to around 7000 threads, and then we encountered this error:
java.lang.OutOfMemoryError: unable to create new native thread.
Is this the true limit with Java? Per 512 MB of RAM, can we only squeeze in around 7k threads?
Use asynchronous I/O (java.nio) and you won't need 7k threads to support 7k clients; a few threads handling I/O (5?) will be enough.
Take a look at Netty ;)
One thread for each client is a really bad design.
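A minimal java.nio sketch of that model, with one selector thread multiplexing all client connections (the port and buffer size are arbitrary):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.*;
    import java.util.Iterator;

    public class NioEchoServer {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(8080));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buffer = ByteBuffer.allocate(1024);
            while (true) {
                selector.select(); // block until some channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        if (client.read(buffer) == -1) { // client disconnected
                            client.close();
                            continue;
                        }
                        buffer.flip();
                        client.write(buffer); // echo the bytes back
                    }
                }
            }
        }
    }

One thread services every connection here, so the thread count no longer scales with the client count.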
Once you create your 7k threads, you're not going to have any memory to do anything useful. Perhaps you should have a rethink about the design of your application?
Anyway, isn't 512 MB quite small? Perhaps you could provide a bit more information about your application or its domain?
Keep in mind that you will never be able to dedicate 100% of the RAM to running Java threads. Some RAM is used by the OS and other running applications, meaning you will never have the full 512 MB available.
It's not the programming language; the limit is at the operating-system level.
More reading about it, for Windows:
Does Windows have a limit of 2000 threads per process?
Pushing the Limits of Windows: Processes and Threads (by Mark Russinovich)
You don't necessarily need one thread per client session. If you look at the way that a J2EE (or JavaEE) server handles multiple connections it uses a mixture of strategies including concurrency, queuing and swapping. Usually you can configure the maximum number of live concurrent instances and idle time-out values at deployment time to tune the performance of your application.
Try setting the maximum allowed heap (-Xmx) to a lower value and see whether the thread count can be increased. In a project at work I could allocate around 2.5k threads with -Xmx512m and about 4k threads with -Xmx96m.
The bigger your heap, the smaller your thread stack space (at least in my experience).
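A small sketch for measuring the trade-off yourself; run it with different combinations, e.g. java -Xss64k -Xmx96m ThreadLimitTest:

    public class ThreadLimitTest {
        public static void main(String[] args) {
            int count = 0;
            try {
                while (true) {
                    Thread t = new Thread(() -> {
                        try {
                            Thread.sleep(Long.MAX_VALUE); // park forever
                        } catch (InterruptedException ignored) { }
                    });
                    t.setDaemon(true); // let the JVM exit when main ends
                    t.start();
                    count++;
                }
            } catch (OutOfMemoryError e) {
                System.out.println("created " + count + " threads before: " + e);
            }
        }
    }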