I have a Google AppEngine application written in Java using JDK8.
I want to get the CPU utilization. How can I achieve that?
I have tried two approaches:
final OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean(); // the java.lang.management version
final double systemCpuLoadRatio = Math.max(0.0, os.getSystemLoadAverage() / os.getAvailableProcessors());
final long systemCpuLoad = (long) (systemCpuLoadRatio * 100);
final com.sun.management.OperatingSystemMXBean os =
        (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean(); // the com.sun.management version
final long systemCpuLoad = (long) (os.getSystemCpuLoad() * 100);
Both approaches always yield zero. The CPU utilization cannot be 0, as real requests are being served.
Try multiplying by 100.0 then let the cast remove the fractional part...
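Note also that getSystemLoadAverage() returns a negative value on platforms where the load average is unavailable, and the Math.max(0.0, ...) in the first snippet silently turns that into zero. Here is a minimal sketch with an explicit guard (plain Java SE API; whether the sandboxed AppEngine runtime reports a real load average is a separate question):

final OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
final double loadAvg = os.getSystemLoadAverage();
if (loadAvg < 0) {
    // A negative value means the load average is not available on this platform.
    System.out.println("Load average not available");
} else {
    final long systemCpuLoadPercent = (long) (loadAvg / os.getAvailableProcessors() * 100.0);
    System.out.println("Approximate system CPU load: " + systemCpuLoadPercent + "%");
}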
You may also try the OSHI library: https://github.com/oshi/oshi
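For reference, a minimal sketch of the same measurement with OSHI (this assumes the com.github.oshi:oshi-core dependency; the tick-based API shown here is from recent OSHI releases and may differ in older ones):

import oshi.SystemInfo;
import oshi.hardware.CentralProcessor;

public class OshiCpuProbe {
    public static void main(String[] args) throws InterruptedException {
        CentralProcessor cpu = new SystemInfo().getHardware().getProcessor();
        long[] prevTicks = cpu.getSystemCpuLoadTicks(); // snapshot of per-state CPU tick counters
        Thread.sleep(1000);                             // measure over a one-second window
        double load = cpu.getSystemCpuLoadBetweenTicks(prevTicks); // 0.0..1.0
        System.out.printf("System CPU load: %.1f%%%n", load * 100);
    }
}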
Problem: the application is killed due to memory usage.
Status reason: OutOfMemoryError: Container killed due to memory usage
Exit code: 137
Environment: Spring Boot app in docker container on AWS ECS instance with configuration:
AWS hard memory limit/total RAM - 384 MB
-Xmx134m
-Xms134m
-XX:MaxMetaspaceSize=110m
According to the Java max memory formula (which I found during weeks of research - https://dzone.com/articles/why-does-my-java-process - and improved a bit):
max_memory = xmx + non-heap_memory + threads_number * xss
non-heap_memory = metaspace + (compressed_class_space + CodeHeap_profiled_nmethods + CodeHeap_non-nmethods + CodeHeap_non-profiled_nmethods).
Take into account that the second part of non-heap memory (everything besides metaspace) takes nearly 40 MB combined,
so in my case: max_memory = 134 (xmx) + 110 (metaspace max) + 40 (non-heap, non-metaspace) + 9 (threads) * 1 (default Xss) = 293 MB.
However, under load heapUsedMemory = ~105-120 MB and non-heapUsedMemory (metaspace + JVM overhead) = ~140 MB, which means there should be 384 - 120 - 140 = 124 MB of free memory.
So the problem is that there is plenty of free memory, and all the Java tools confirm it (jstat -gc, Spring metrics in Grafana, various Java APIs, etc.).
Here is a code snippet of an API I developed and used during my research:
// requires: java.lang.management.*, java.util.*, java.util.Map.Entry, java.util.stream.Collectors
@GetMapping("/memory/info")
public Map<String, String> getMmUsage() {
    Map<String, Long> info = new HashMap<>();
    List<MemoryPoolMXBean> memPool = ManagementFactory.getMemoryPoolMXBeans();
    for (MemoryPoolMXBean p : memPool) {
        if ("Metaspace".equals(p.getName())) {
            info.put("metaspaceMax", p.getUsage().getMax());
            info.put("metaspaceCommitted", p.getUsage().getCommitted());
            info.put("metaspaceUsed", p.getUsage().getUsed());
        }
    }
    MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    info.put("heapMax", memory.getHeapMemoryUsage().getMax());
    info.put("heapCommitted", memory.getHeapMemoryUsage().getCommitted());
    info.put("heapUsed", memory.getHeapMemoryUsage().getUsed());
    info.put("non-heapMax", memory.getNonHeapMemoryUsage().getMax());
    info.put("non-heapCommitted", memory.getNonHeapMemoryUsage().getCommitted());
    info.put("non-heapUsed", memory.getNonHeapMemoryUsage().getUsed());
    // Render each value as "X Mb (Y Kb)", sorted by key.
    Map<String, String> memoryData = info.entrySet().stream().collect(Collectors.toMap(Entry::getKey, e -> {
        long kb = e.getValue() / 1024;
        return (kb / 1024) + " Mb (" + kb + " Kb)";
    }, (v1, v2) -> v1, TreeMap::new));
    Set<Thread> threads = Thread.getAllStackTraces().keySet();
    memoryData.put("threadsCount", Integer.toString(threads.size()));
    memoryData.put("threadsCountRunning",
            Long.toString(threads.stream().filter(t -> t.getState() == Thread.State.RUNNABLE).count()));
    return memoryData;
}
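To watch these numbers during a load test, the endpoint can be polled in a loop; a minimal sketch (the port and path are assumptions based on a default Spring Boot setup):

while true; do curl -s http://localhost:8080/memory/info; echo; sleep 10; done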
So my application should be stable, since it has plenty of memory to work with. But that's not the case: as described above, the container is killed due to memory usage.
As I described above, the Java tools show that there is plenty of memory and that heap memory is being released. On the other hand, the AWS CloudWatch metric MemoryUtilization shows constant memory growth (in very small increments).
Interesting observation: during endless testing I found the following: with xmx = 134 MB the application lives longer and was able to survive 5 rounds of performance tests; with xmx/xms = 200 MB it survived only 1 round. How could this be possible?
My opinion: it looks like something is using memory and not releasing it properly.
I would like to hear your opinions on why my app keeps dying when there is 50+ MB of free memory, and why the AWS metrics show a different picture than the Java tools.
I am writing a program in Java to periodically display the CPU and memory usage of a given process ID. My implementation invokes tasklist. It is pretty straightforward to get the memory usage with the following command:
tasklist /fi "memusage ge 0" /fi "pid eq 2076" /v
This returns the memory usage of process ID 2076, and I can use it for my task. By invoking the following command, I can extract the CPU time:
tasklist /fi "pid eq 2076" /fi "CPUTIME ge 00:00:00" /v
My question is, how would I go about getting the CPU usage of this process?
I found a post on Stack Overflow for my question, but the answer isn't clear and I don't understand what to type in the command to get what I need. The question was answered in 2008, and someone asked for clarification in 2013, but the person who answered hasn't replied.
Here is the post that I have found.
Memory is like a tea cup: it may be full or empty, and an instantaneous look at the cup shows you how full of tea it is (that is your "memusage" command).
CPU is like a ski lift. It moves at a reasonably constant rate irrespective of whether you are riding it or not. It is not possible to determine your usage from a single instantaneous observation - we need to know how long you were riding it for (that is your "cputime" command). You have to use the "cputime" command at least twice!
For example:
At 7:09 pm, you run the cputime command on your process, and it returns "28 minutes"
At 7:17 pm, you run the cputime command on your process again, and it returns "32 minutes"
From the first time you ran the cputime command to the second time, the usage has increased from 28 minutes to 32 minutes -- the process has used 4 minutes of CPU time.
From 7:09 pm to 7:17 pm is a duration of 8 minutes -- a total of 8 minutes of time was available, but your process used just 4 minutes: 4 / 8 = 50% average system usage.
If your system has multiple processors, then you can divide by the total number of CPUs to get an average per CPU - e.g. 50% / 2 = 25% average in a dual-CPU system.
I used minutes above for ease of writing - in reality you may be looking at how many nanoseconds of CPU time the process used during a time window that is just milliseconds long.
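Here is a minimal Java sketch of this two-sample technique, measuring the current JVM's own process instead of parsing tasklist output (the com.sun.management cast is an assumption about the JVM vendor; on other JVMs this bean may not be available):

import java.lang.management.ManagementFactory;

public class ProcessCpuSampler {
    public static void main(String[] args) throws InterruptedException {
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        int cpus = os.getAvailableProcessors();

        long cpuBefore = os.getProcessCpuTime(); // nanoseconds of CPU time used so far
        long wallBefore = System.nanoTime();
        Thread.sleep(1000);                      // the observation window
        long cpuAfter = os.getProcessCpuTime();
        long wallAfter = System.nanoTime();

        // CPU time consumed divided by wall-clock time elapsed, normalized per CPU.
        double usage = (double) (cpuAfter - cpuBefore) / (wallAfter - wallBefore) / cpus;
        System.out.printf("Process CPU usage: %.1f%%%n", usage * 100);
    }
}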
tasklist does not provide the information you are looking for. I would suggest using Get-Counter. A comment on an answer from the SuperUser site looks to be on track for what you're after.
Get-Counter '\Process(*)\% Processor Time' |
    Select-Object -ExpandProperty CounterSamples |
    Select-Object -Property InstanceName, CookedValue |
    Where-Object { $_.InstanceName -notmatch "^(idle|_total|system)$" } |
    Sort-Object -Property CookedValue -Descending |
    Select-Object -First 25 |
    Format-Table InstanceName, @{L='CPU'; E={($_.CookedValue/100/$env:NUMBER_OF_PROCESSORS).ToString('P')}} -AutoSize
I once wrote a class:
private static class PerformanceMonitor {
    private final int availableProcessors =
            ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors();
    private long lastSystemTime = 0;
    private long lastProcessCpuTime = 0;

    /**
     * Gets the CPU usage of the JVM.
     *
     * @return the CPU usage as a fraction between 0.0 and 1.0
     */
    private synchronized double getCpuUsage() {
        if (lastSystemTime == 0) {
            baselineCounters();
            return 0d;
        }
        long systemTime = System.nanoTime();
        long processCpuTime = 0;
        if (ManagementFactory.getOperatingSystemMXBean() instanceof com.sun.management.OperatingSystemMXBean) {
            processCpuTime = ((com.sun.management.OperatingSystemMXBean)
                    ManagementFactory.getOperatingSystemMXBean()).getProcessCpuTime();
        }
        // CPU time used since the last sample, divided by wall-clock time elapsed.
        double cpuUsage = ((double) (processCpuTime - lastProcessCpuTime))
                / ((double) (systemTime - lastSystemTime));
        lastSystemTime = systemTime;
        lastProcessCpuTime = processCpuTime;
        return cpuUsage / availableProcessors;
    }

    private void baselineCounters() {
        lastSystemTime = System.nanoTime();
        if (ManagementFactory.getOperatingSystemMXBean() instanceof com.sun.management.OperatingSystemMXBean) {
            lastProcessCpuTime = ((com.sun.management.OperatingSystemMXBean)
                    ManagementFactory.getOperatingSystemMXBean()).getProcessCpuTime();
        }
    }
}
Which is used like:
private static final PerformanceMonitor _MONITOR = new PerformanceMonitor();
_MONITOR.getCpuUsage();
This returns the fraction of CPU consumed by this JVM's process.
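Note that the first call only baselines the counters and returns 0; a meaningful reading requires a second call after some interval. A usage sketch:

PerformanceMonitor monitor = new PerformanceMonitor();
monitor.getCpuUsage();                // first call records a baseline and returns 0.0
Thread.sleep(1000);                   // let the process accumulate CPU time
double usage = monitor.getCpuUsage(); // fraction of the elapsed window, per CPU
System.out.printf("JVM CPU usage: %.1f%%%n", usage * 100);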
I'm getting an out-of-memory exception due to lack of Java heap space when I try to download tweets using Flume and pipe them into Hadoop.
I have currently set the heap space to 4 GB in Hadoop's mapred-site.xml, like so:
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx4096m</value>
</property>
I am hoping to download tweets continually for two days but can't get past 45 minutes without errors.
Since I do have the disk space to hold all of this, I am assuming the error is coming from Java having to handle so many things at once. Is there a way for me to slow down the speed at which these tweets are downloaded, or do something else to solve this problem?
Edit: flume.conf included
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <required>
TwitterAgent.sources.Twitter.consumerSecret = <required>
TwitterAgent.sources.Twitter.accessToken = <required>
TwitterAgent.sources.Twitter.accessTokenSecret = <required>
TwitterAgent.sources.Twitter.keywords = manchester united, man united, man utd, man u
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:50070/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
Edit 2
I've tried increasing the memory to 8 GB, which still doesn't help. I am assuming I am placing too many tweets into Hadoop at once and need to write them to disk and release the space again (or something to that effect). Is there a guide anywhere on how to do this?
Set the JAVA_OPTS value in flume-env.sh and restart the Flume agent.
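For example, a sketch of a larger heap setting in conf/flume-env.sh (the exact sizes are assumptions; adapt them to your machine):

# conf/flume-env.sh
export JAVA_OPTS="-Xms512m -Xmx2048m"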
It appears the problem had to do with the batchSize and transactionCapacity: in the original configuration the HDFS sink's batchSize (1000) was larger than the channel's transactionCapacity (100), so a sink transaction could never fit through the channel. I changed them to the following:
TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
This works without me even needing to change the JAVA_OPTS value.
I am a very proficient C# developer, but I need to start writing code that runs on the JVM. The Java language is feature-poor compared to C# these days, so I was interested in the features that Scala offers.
However, when I heard that in Scala all operators are simply methods, I became suspicious of the performance impact that would have on math-heavy computations (which is important for the types of applications my team writes).
So I ran some simple int-based tests, and found that Scala is about 30x slower than the equivalent Java code. Not good! Can anyone tell me what I'm doing wrong, or how to improve the computational performance of the Scala example to be on par with Java?
UPDATE 1: as pointed out by the first two answers, I was being a super-noob and running this in the IntelliJ IDE. I don't know how to run the Scala app via the java command line, which may be an IntelliJ issue. Thanks for the help guys; I'll need to investigate simple command-line execution of Scala before I continue with perf testing, as the IDE-given results are obviously too inaccurate.
UPDATE 2: Luigi in the comments says he gets equal times in IntelliJ, so it seems my wild difference isn't due to IntelliJ? Any other ideas on what this could be? I'll try getting this running via the command line and post an update with my results.
UPDATE 3: after running this via the command line, I get the same 30x perf difference.
My computer is a 3-core AMD x64 at 3.4 GHz, running the 64-bit JDK 6 (1.6.0_31) on Windows 7.
Here are my runtimes:
Java: 210ms.
Scala: between 2000 and 7400ms (generally the 7000 range)
So, I suppose the question is still open: why is Scala running so slowly on my platform? Is it something with the 64-bit Java runtime, or with Java 6?
runtime versions:
C:\Users\jason>java -showversion
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
C:\Users\jason>scala
Welcome to Scala version 2.9.1-1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_31).
UPDATE 4: while my original test shows a 30x difference, increasing the iterations to 100000000 shrinks the difference to about 33%, so it seems Scala was still being dominated by some unknown initialization cost on my machine. I'll close this with the highest-rated answer, as I don't think we'll track down the performance problem given that no one but me is seeing the issue :(
UPDATE 5, SOLUTION: based on the help from the two answers I got, I figured out the problem; see my answer below for details (summary: the first call to System.nanoTime() takes a long time).
Here are my sample apps:
//scala
object HelloWorld {
  //extends Application {
  def main(args: Array[String]) {
    println("hello scala")
    var total: Long = 0
    var i: Long = 0
    var x: Long = 0
    //warm up of the JVM to avoid timing of runtime initialization
    while (i < 100000) {
      x = i
      x += x - 1
      x -= x + 1
      x += 1
      x -= 1
      total += x
      i += 1
    }
    //reset variables
    total = 0
    i = 0
    //start timing
    val start: Long = System.nanoTime
    //run test
    while (i < 100000) {
      x = i
      x += x - 1
      x -= x + 1
      x += 1
      x -= 1
      total += x
      i += 1
    }
    val end: Long = System.nanoTime
    System.out.println("ms, checksum = ")
    System.out.println((end - start) / 1000)
    System.out.println(total)
  }
}
And here is the Java equivalent, which runs 30x faster:
//java
public class app {
    public static void main(String[] args) {
        String message = "hello, java";
        System.out.println(message);
        long total = 0;
        //warm up of the JVM to avoid timing of runtime initialization
        for (long i = 0; i < 100000; i++) {
            long x = i;
            x += x - 1;
            x -= x + 1;
            x++;
            x--;
            total += x;
        }
        //reset variables
        total = 0;
        //start timing and run test
        long start = System.nanoTime();
        for (long i = 0; i < 100000; i++) {
            long x = i;
            x += x - 1;
            x -= x + 1;
            x++;
            x--;
            total += x;
        }
        long end = System.nanoTime();
        System.out.println("ms, checksum = ");
        System.out.println((end - start) / 1000);
        System.out.println(total);
    }
}
So, I guess I figured out the answer myself.
The problem is in the first call to System.nanoTime. That call carries some one-time initialization cost (loading up the underlying base libraries, etc.), which turns out to be much less expensive when triggered from the Java runtime than from the Scala runtime.
I proved this by changing the initial value of total, setting it instead to
var total: Long = System.nanoTime()
This is added before the first "warm up" loop, so the expensive first call to System.nanoTime() happens outside the timed section, and now both versions of the app (Java and Scala) run in about the same time: roughly 2100 for 1,000,000 iterations.
Thanks for your guys' help on this, I wouldn't have figured this out without your assistance.
ps: I'll leave the "accepted answer" as-is because I wouldn't have tracked this down without his help.
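For future benchmarks, here is a minimal pattern that keeps one-time costs like this out of the timed region (a sketch only; a real harness such as JMH is the better tool):

public class BenchSketch {
    public static void main(String[] args) {
        // Touch the timer once so its one-time initialization cost
        // is paid before measurement starts.
        System.nanoTime();

        // ... warm-up loop for the code under test goes here ...

        long start = System.nanoTime();
        long total = 0;
        for (long i = 0; i < 100000000L; i++) {
            total += i;
        }
        long end = System.nanoTime();
        System.out.println("elapsed us = " + (end - start) / 1000);
        System.out.println("checksum = " + total); // keep the result live so the loop isn't dead-code eliminated
    }
}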
I've re-run your code (and increased the number of cycles by 1000x, to make the benchmark meaningful).
Results:
Scala: 92 ms
Java: 59 ms
You can see that Java is roughly 30% faster.
Looking at the bytecode, I can say that the two versions are almost identical, so the difference is really strange (the bytecode listing is quite long, so I won't post it here).
Increasing the count x10000 gives this:
Scala: 884 ms
Java: 588 ms
Since the results are fairly stable, there should be some constant factor lurking somewhere. Maybe in the parameters that the "scala" runner passes to the JVM?
EDIT:
My configuration:
$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
$ scala -version
Scala code runner version 2.9.0.1 -- Copyright 2002-2011, LAMP/EPFL
$ inxi -SCD
System: Host the-big-maker Kernel 2.6.35-22-generic x86_64 (64 bit) Distro Linux Mint 10 Julia
CPU: Quad core AMD Phenom II X4 965 (-MCP-) cache 2048 KB flags (lm nx sse sse2 sse3 sse4a svm)
Clock Speeds: (1) 800.00 MHz (2) 800.00 MHz (3) 800.00 MHz (4) 800.00 MHz
Disks: HDD Total Size: 750.2GB (5.8% used) 1: /dev/sda OCZ 90.0GB
2: /dev/sdb ST3500413AS 500.1GB 3: /dev/sdc ST3802110A 80.0GB
4: /dev/sdd Maxtor_6Y080M0 80.0GB
$ javac app.java
$ scalac app.scala
$ scala HelloWorld
hello scala
ms, checksum =
1051
-100000
$ java app
hello, java
ms, checksum =
1044
-100000
What am I doing wrong?
The javadoc for Runtime.availableProcessors() in Java 1.6 is delightfully unspecific. Is it looking just at the hardware configuration, or also at the load? Is it smart enough to avoid being fooled by hyperthreading? Does it respect a limited set of processors via the linux taskset command?
I can add one datapoint of my own: on a computer here with 12 cores and hyperthreading, Runtime.availableProcessors() indeed returns 24, which is not a good number to use in deciding how many threads to try to run. The machine was clearly not dead-idle, so it also can't have been looking at load in any effective way.
On Windows, GetSystemInfo is called, and dwNumberOfProcessors is read from the returned SYSTEM_INFO structure.
This can be seen from void os::win32::initialize_system_info() and int os::active_processor_count() in os_windows.cpp of the OpenJDK source code.
According to the MSDN documentation, dwNumberOfProcessors reports 'the number of logical processors in the current group', which means that hyperthreading will increase the number of CPUs reported.
On Linux, os::active_processor_count() uses sysconf:
int os::active_processor_count() {
  // Linux doesn't yet have a (official) notion of processor sets,
  // so just return the number of online processors.
  int online_cpus = ::sysconf(_SC_NPROCESSORS_ONLN);
  assert(online_cpus > 0 && online_cpus <= processor_count(), "sanity check");
  return online_cpus;
}
The _SC_NPROCESSORS_ONLN documentation describes it as 'the number of processors currently online (available)'. This is not affected by the processor affinity of the process, and it too counts hyperthreaded logical CPUs.
According to Sun Bug 6673124:
The code for active_processor_count, used by Runtime.availableProcessors(), is as follows:
int os::active_processor_count() {
  int online_cpus = sysconf(_SC_NPROCESSORS_ONLN);
  pid_t pid = getpid();
  psetid_t pset = PS_NONE;
  // Are we running in a processor set?
  if (pset_bind(PS_QUERY, P_PID, pid, &pset) == 0) {
    if (pset != PS_NONE) {
      uint_t pset_cpus;
      // Query number of cpus in processor set
      if (pset_info(pset, NULL, &pset_cpus, NULL) == 0) {
        assert(pset_cpus > 0 && pset_cpus <= online_cpus, "sanity check");
        _processors_online = pset_cpus;
        return pset_cpus;
      }
    }
  }
  // Otherwise return number of online cpus
  return online_cpus;
}
This particular code may be Solaris-specific. But I would imagine that the behavior would be at least somewhat similar on other platforms.
AFAIK, it always gives you the total number of online CPUs, even those not available to the process for scheduling. I have a library which uses this fact to find reserved CPUs: I read /proc/cpuinfo and the default thread affinity of the process to work out what is actually available.
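A minimal Linux-only sketch of that cross-check, comparing what the JVM reports with the processors listed in /proc/cpuinfo (illustrative only; counting "processor" lines is a simplification of what a real library would do):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class CpuCount {
    public static void main(String[] args) throws Exception {
        int jvmCount = Runtime.getRuntime().availableProcessors();
        try (Stream<String> lines = Files.lines(Paths.get("/proc/cpuinfo"))) {
            long cpuinfoCount = lines.filter(line -> line.startsWith("processor")).count();
            System.out.println("/proc/cpuinfo processors: " + cpuinfoCount);
        }
        System.out.println("availableProcessors():    " + jvmCount);
    }
}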