Getting cpu usage by process within container with limits

Getting cpu usage by process within container with limits - java

Do you know whether exist a library for Java that makes it possible to get recent CPU usage, let's say for the last few seconds?
This library should work on different OSes (Mac, Linux, Windows) and be container-aware - let's say that our JVM was run in container and CPU was limited to 1000 ticks per any period. Then, cpu usage by process should be relative to the limit.

With Jmx / MBean you can get that kind of info. Java process will run as fast as the CPU limit. So that measurement doesn't make sense on different machine.
To have an idea you need to develop on docker with CPU limits and check wall clock to validate requirement of the service with load tests like gatling https://gatling.io/.

Related

How to test if my java application can successfully handle low memory/CPU resources on Tomcat server?

I want to test how my java application would behave on Tomcat server with 512M RAM only. In other words I need to do memory load-testing to check if my application can run in such restricted environment.
Using which tools and how can I achieve this?
I heard about APM software including Stackify Prefix, New Relic APM, JMeter, JVisualVM, JVM Monitor, JBenchX - but I am not sure I need to proceed with any of them for my specific purpose.
The same problem for having very limited CPU resources. I'd like to test my app on my desktop PC before deploying to Jelastic cloud with limited memory/CPU.

You can artificially limit the JVM heap allocated to tomcat by modifying -Xmx command-line argument which defines the maximum heap space your Tomcat server will use.
If low heap size is the only thing you would like to test - it would be sufficient.
You might also amending CPU affinity to bind your Tomcat server to a single CPU core (or limited number of cores)
If you want to go further you can create a virtual machine using i.e. VirtualBox and replicate all the anticipated hardware/software which you'll have after the deployment.
With regards to testing I would recommend the following performance testing techniques:
Load Testing - putting your system under anticipated load to see if it is capable of handling it
Soak Testing - basically the same as Load Testing but for prolonged duration (i.e. overnight or weekend) - it will allow you to identify memory leaks
Stress Testing - start with Load Testing and gradually increase the load until response time starts exceeding acceptable threshold or errors start occurring (whatever comes the first) - it will let you know the limits of your application/configuration and vision what and how is gonna break
Using profiler tools like YourKit or JProfiler for fine-tuning your code would be beneficial as well.

The best way to do this is with a Virtual Machine. You can pick your technology of choice, but an easy option would be to use Oracle VirtualBox, which is freely-available for many platforms. Just install a minimal OS inside the VM, then add Java, your application, etc. and then run your load-test against it.
Networking works as usual, so you can use your existing load-testing framework and just point it at the IP address of the VM.
There are other fancier way to do it, e.g. using Docker or whatever, but this will get the job done for a smoke-test.
I wouldn't recommend trying to use a server with a large amount of RAM and then try to "synthesize" a low-RAM situation without using something like a Virtual Machine (and BTW Docker uses VMs internally).

Java - issue with memory

Need some help from the experts!
We have a project here (still on dev) that needs to run 50 java processes (for now and it will probably doubled or tripled in the future) at the same time every 5 minutes. I set Xmx50m for every process and our server has only 4gb of RAM, I know that would really slow our server. What I have in mind is to upgrade our RAM. My question is that do I have other options to prevent our server from being slow when running that amount of java processes?

Since you have 50 process and as per your assumption your processes need about 2.5 Gb to run .
To prevent your server from being slow you can follow some best practices to set java memory parameters e.g. set -Xmin and -Xmx the same values and determine a proper values based on your process usage, Also you can profile your process on runtime to ensure that everything is ok.

How to handle thousands of threads in Java without using the new java.util.concurrent package

I have a situation in which I need to create thousands of instances of a class from third party API. Each new instance creates a new thread. I start getting OutOfMemoryError once threads are more than 1000. But my application requires creating 30,000 instances. Each instance is active all the time. The application is deployed on a 64 bit linux box with 8gb RAM and only 2 gb available to my application.
The way the third party library works, I cannot use the new Executor framework or thread pooling.
So how can I solve this problem?
Note that using thread pool is not an option. All threads are running all the time to capture events.
Sine memory size on the linux box is not in my control but if I had the choice to have 25GB available to my application in a 32GB system, would that solve my problem or JVM would still choke up?
Are there some optimal Java settings for the above scenario ?
The system uses Oracle Java 1.6 64 bit.

I concur with Ryan's Answer. But the problem is worse than his analysis suggests.
Hotspot JVMs have a hard-wired minimum stack size - 128k for Java 6 and 160k for Java 7.
That means that even if you set the stack size to the smallest possible value, you'd need to use roughly twice your allocated space ... just for thread stacks.
In addition, having 30k native threads is liable to cause problems on some operating systems.
I put it to you that your task is impossible. You need to find an alternative design that does not require you to have 30k threads simultaneously. Alternatively, you need a much larger machine to run the application.
Reference: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2012-June/003867.html

I'd say give up now and figure another way to do it. Default stack size is 512K. At 30k threads, that's 15G in stack space alone. To fit into 2G, you'll need to cut it down to less than 64K stacks, and that leaves you with zero memory for the heap, including all the Thread objects, or the JVM itself.
And that's just the most obvious problem you're likely to run into when running that many simultaneous threads in one JVM.

I think we are missing lots of details, but would a distributed plateform would work? Each of individual instances would manage a range of your classes instances. Those plateform could be running on different pcs or virtual machines and communicate with each other?

I had the same problem with an SNMP provider that required a thread for each outstanding get (I wanted to have tens of thousands of outstanding gets going on at once). Now that NIO exists I'd just rewrite the library myself if I had to do this again.
You cannot solve it in "Java Code" or configuration. Windows chokes at around 2-3000 threads in my experience (this may have changed in later versions). When I was doing this I surprisingly found that Linux supported even less threads (around 1000).
When the system stops supplying threads, "Out of Memory" is the exception you should expect to see--so I'm sure that's it--I started getting this exception long before I ran out of memory. Perhaps you could hack linux somehow to support more, but I have no idea how.
Using the concurrent package will not help here. If you could switch over to "Green" threads it might, but that might take recompiling the JVM (it would be nice if it was available as a command line switch, but I really don't think it is).

Slowing process creation under Java?

I have a single, large heap (up to 240GB, though in the 20-40GB range for most of this phase of execution) JVM [1] running under Linux [2] on a server with 24 cores. We have tens of thousands of objects that have to be processed by an external executable & then load the data created by those executables back into the JVM. Each executable produces about half a megabyte of data (on disk) that when read right in, after the process finishes, is, of course, larger.
Our first implementation was to have each executable handle only a single object. This involved the spawning of twice as many executables as we had objects (since we called a shell script that called the executable). Our CPU utilization would start off high, but not necessarily 100%, and slowly worsen. As we began measuring to see what was happening we noticed that the process creation time [3] continually slows. While starting at sub-second times it would eventually grow to take a minute or more. The actual processing done by the executable usually takes less than 10 seconds.
Next we changed the executable to take a list of objects to process in an attempt to reduce the number of processes created. With batch sizes of a few hundred (~1% of our current sample size), the process creation times start out around 2 seconds & grow to around 5-6 seconds.
Basically, why is it taking so long to create these processes as execution continues?
[1] Oracle JDK 1.6.0_22
[2] Red Hat Enterprise Linux Advanced Platform 5.3, Linux kernel 2.6.18-194.26.1.el5 #1 SMP
[3] Creation of the ProcessBuilder object, redirecting the error stream, and starting it.

My guess is that you MIGHT be running into problems with fork/exec, if Java is using the fork/exec system calls to spawn subprocesses.
Normally fork/exec is fairly efficient, because fork() does very little - all pages are copy-on-write. This stops being so true with very large processes (i.e. those with gigabytes of pages mapped) because the page tables themselves take a relatively long time to create - and of course, destroy, as you immediately call exec.
As you're using a huge amount of heap, this might be affecting you. The more pages you have mapped in, the worse it may become, which could be what's causing the progressive slowdown.
Consider either:
Using posix_spawn, if that is NOT implemented by fork/exec in libc
Using a single subprocess which is responsible for creating / reaping others; spawn this once and use some IPC (pipes etc) to tell it what to do.
NB: This is all speculation; you should probably do some experiments to see whether this is the case.

Most likely you are running out of a resource. Are your disks getting busier as you create these processes. Do you ensure you have less processes than you have cores? (To minimise context switches) Is your load average below 24?
If your CPU consumption is dropping you are likely to be hitting IO (disk/network) contention i.e. the processes cannot get/write data fast enough to keep them busy. If you have 24 cores, how many disks do you have?
I would suggest you have one process per CPU (in your case I imagine 4) Give each JVM six tasks to run concurrently to use all its cores without overloading the system.

You would be much better off using a set of long lived processes pulling your data off of queues and sending them back that constantly forking new processes for each event, especially from the host JVM with that enormous heap.
Forking a 240GB image is not free, it consumes a large amount of virtual resources, even if only for a second. The OS doesn't know how long the new process will be aware so it must prepare itself as if the entire process will be long lived, thus it sets up the virtual clone of all 240GB before obliterating it with the exec call.
If instead you had a long lived process that you could end objects to via some queue mechanism (and there are many for both Java and C, etc.), that would relieve you of some of the pressure of the forking process.
I don't know how you are transferring the data form the JVM to the external program. But if your external program can work with stdin/stdout, then (assuming you're using unix), you could leverage inetd. Here you make a simple entry in the inetd configuration file for your process, and assign it a port. Then you open up a socket, pour the data down in to it, then read back from the socket. Inetd handles the networking details for you and your program works as simply with stdin and stdout. Mind you'll have an open socket on the network, which may or may not be secure in your deployment. But it's pretty trivial to set up even legacy code to run via a network service.
You could use a simple wrapper like this:
#!/bin/sh
infile=/tmp/$$.in
outfile=/tmp/$$.out
cat > $infile
/usr/local/bin/process -input $infile -output $outfile
cat $outfile
rm $infile $outfile
It's not the highest performing server on the planet designed to zillions of transactions, but it's sure a lot faster than forking 240GB over and over and over.

I most agree with Peter. Your are most probably suffering from IO bottlenecks. Once you have may process the OS has to work harder too for trivial tasks hence having exponential performance penalty.
So the 'solution' could be to create 'consumer' processes, only initialise certain few; as Peter suggested one per CPU or more. Then use some form of IPC to 'transfer' these objects to the consumer processes.
Your 'consumer' processes should manage sub-process creation; the processing executable which I presume you don't have any access to, and this way you don't clutter the OS with too many executables and the 'job' will be "eventually" complete.

one high-end server with one Application Server or multiple Application Servers?

If I have a high-end server, for example with 1T memory and 8x4core CPU...
will it bring more performance if I run multiple App Server (on different JVM) rather than just one App Server?
On App Server I will run some services (EAR whith message driven beans) which exchange message with each other.
btw, has java 64bit now no memory limitation any more?
http://java.sun.com/products/hotspot/whitepaper.html#64

will it bring more performance if I run multiple App Server (on different JVM) rather than just one App Server?
There are several things to take into account:
A single app server means a single point of failure. For many applications, this is not an option and using horizontal and vertical scaling is a common configuration (i.e. multiple VMs per machine and multiple machines). And adding more machines is obviously easier/cheaper if they are small.
A large heap takes longer to fill so the application runs longer before a garbage collection occurs. However, a larger heap also takes longer to compact and causes garbage collection to take longer. Sizing the VM usually means finding a good compromise between frequency and duration (in other words, you don't always want to give as much RAM as possible to one VM)
So, to my experience, running multiple machines hosting multiple JVM is the usual choice (and is usually cheaper than a huge beast and gives you more flexibility).

There is automatically a performance hit when you need to do out-of-process communications, so the question is if the application server does not scale well enough so this can pay off.
As a basic rule of thumb the JVM design allows the usage of any number of CPU's and any amount of RAM the operating system provides. The actual limits are JVM implementation specific, and you need to read the specifications very carefully before choosing to see if there is any limits relevant to you.
Given you have a JVM which can utilize the hardware, you then need an app server which can scale appropriately. A common bottleneck these days is the amount of web requests that can be processed per second - a modern server should be able to process 10000 requests per second (see http://www.kegel.com/c10k.html) but not all do.
So, first of all identify your most pressing needs (connections per second? memory usage? network bandwidth?) and use that to identify the best platform + jvm + app server combination. If you have concrete needs, vendors will usually be happy to assist you to make a sale.

Most likely you will gain by running multiple JVMs with smaller heaps instead of a single large JVM. There is a couple of reasons for this:
Smaller heaps mean shorter garbage collections
More JVMs means lesser competition for internal resources inside JVM such as thread pools and other synchronized access.
How many JVMs you should fit into that box depends on what the application does. The best way to determine this is to set up a load test that simulates production load and observe how the number of requests the system can handle grows with the number of added JVMs. At some point you will see that adding more JVMs does not improve throughput. That's where you should stop.
Yet, there is another consideration. It is better to have multiple physical machines rather than a single big fat box. This is reliability. Should this box go offline for some reason, it will take with it all the app servers that are running inside it. The infrastructure running many separate smaller physical machines is going to be less affected by the failure of a single machine as compared to a single box.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.