Generating flame graphs for a whole Java program execution

I'm trying to generate a flame graph for a Java program using perf-map-agent. I know that perf-java-record-stack can record data for a running process, and I have found out that the jmaps script in the FlameGraph directory can be used as well. I have found Brendan Gregg's example as well as a Stack Overflow post illustrating this. However, none of these examples passes the Java process as an argument to perf record, which means that perf collects stack traces for the entire system.
I want to record profiling data for the whole execution of the program (and preferably nothing else). Is there any way to do this? I have tried:
perf record -a -g java -XX:+PreserveFramePointer <other JVM arguments> <my java program>; sudo ~/bin/brendangregg/FlameGraph/jmaps
which prints:
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.711 MB perf.data (3449 samples) ]
Fetching maps for all java processes...
Mapping PID 2213 (user malin):
wc(1): 2970 9676 156108 /tmp/perf-2213.map
always with the same PID. This PID is a running process, not the one I tried to record data for.

I think what you want might be:
1. Start perf record with -a -g before the Java application is launched, and leave it running.
2. Run jmaps while the Java application is running, so that the JIT-related symbols are collected.
3. Stop perf record after the Java application has finished.
4. Filter the output of perf script by the PID you are interested in. By then the Java process has run and you know its PID. (Look at the output of perf script to see how to filter it.)
5. Run the flame graph generation script.
This way the Java application is recorded for its whole run.
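The steps above can be sketched as a shell function. Note that in the command from the question, jmaps only starts after perf record (and therefore the JVM it launched) has exited, so the only map it can write is for some other Java process that is still alive. What follows is a rough sketch under several assumptions that are not in the original post: jmaps, stackcollapse-perf.pl and flamegraph.pl from the FlameGraph directory are on PATH, the caller has enough privileges for system-wide perf, the installed perf supports perf script --pid, and WARMUP_SECONDS is a made-up knob for how long to let the JIT work before dumping symbols.

```shell
#!/bin/sh
# Sketch only. Assumptions (not from the original post): jmaps,
# stackcollapse-perf.pl and flamegraph.pl are on PATH, the caller may run
# system-wide perf, and WARMUP_SECONDS is a hypothetical tuning knob.

profile_java_run() {
    if [ "$#" -lt 1 ]; then
        echo "usage: profile_java_run <java arguments...>" >&2
        return 2
    fi

    # Step 1: start system-wide sampling before the JVM exists.
    perf record -F 99 -a -g -o perf.data &
    perf_pid=$!

    # Launch the program in the background so that its PID is known.
    java -XX:+PreserveFramePointer "$@" &
    java_pid=$!

    # Step 2: dump JIT symbols while the JVM is still running.
    # Methods compiled after this point will not be symbolized.
    sleep "${WARMUP_SECONDS:-30}"
    jmaps

    # Step 3: wait for the program to finish, then stop perf.
    wait "$java_pid"
    kill -INT "$perf_pid"
    wait "$perf_pid"

    # Steps 4 and 5: keep only this PID's samples and render the graph.
    perf script -i perf.data --pid "$java_pid" \
        | stackcollapse-perf.pl \
        | flamegraph.pl --color=java > "flame-$java_pid.svg"
}
```

A call such as profile_java_run -jar myapp.jar (the jar name is a placeholder) would leave flame-<pid>.svg in the current directory. If the program finishes before the warm-up delay has passed, jmaps will not find it, so the delay has to be shorter than the run.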

Related

Java job gives OOM error inconsistently

I have scheduled a jar file with cron on a Linux box. The jar connects to a Hive server over JDBC and runs a select query, after which I write the selected data to a CSV file. The daily data volume is around 150 million records and the CSV file is approximately 30 GB.
This job does not complete every time it is invoked, and then only part of the data is written. I checked the PID for errors with dmesg | grep -E 31866 and I can see:
[1208443.268977] Out of memory: Kill process 31866 (java) score 178 or sacrifice child
[1208443.270552] Killed process 31866 (java) total-vm:25522888kB, anon-rss:11498464kB, file-rss:104kB, shmem-rss:0kB
I am invoking my jar with memory options like:
java -Xms5g -Xmx20g -XX:+UseG1GC -cp jarFile
I want to know exactly what the error text means, and whether there is any solution I can apply to ensure my job will not run out of memory. The weird thing is that the job does not fail every time; its behaviour is inconsistent.

That message is actually from the Linux kernel, not your job. It means that your system ran out of memory and the kernel's OOM killer killed your job to resolve the problem (otherwise you'd probably get a kernel panic).
You could try modifying your app to lower its memory requirements (e.g. load your data incrementally, or write a distributed job that performs the needed transformations on the cluster rather than on one machine).

Is it better to launch a Java app once and sleep or repeat launching and killing?

I have a Java application that needs to run several times. Every time it runs, it checks if there's data to process and if so, it processes the data.
I'm trying to figure out what's the best approach (performance, resource consumption, etc.) to do this:
1.- Launch it once, and if there's nothing to process make it sleep (All Java).
2.- Using a bash script to launch the Java app, and when it finishes, sleep (the script) and then relaunch the java app.
I was wondering whether it is better to keep the Java app alive (sleeping) or to relaunch it every time.

It's hard to answer your question without the specific context. On the face of it, your question sounds like it could be a premature optimization.
Generally, I suggest you do what's easier for you to do (and to maintain), unless you have good reasons not to. Here are some possible good reasons, pick the ones appropriate to your situation:
For sleeping in Java:
The check of whether there's new data is easier in Java
Starting the Java program takes time or other resources, for example if on startup, your program needs to load a bunch of data
Starting the Java process from bash is complex for some reason - maybe it requires you to fiddle with a bunch of environment variables, files or something else.
For re-launching the Java program from bash:
The check of whether there's new data is easier in bash
Getting the Java process to sleep is complex - maybe your Java process is a complex multi-threaded beast, and stopping, and then re-starting the various threads is complicated.
You need the memory in between Java jobs - killing the Java process entirely would free all of its memory.
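Option 2 from the question can be sketched in a few lines of shell. This is a minimal sketch, assuming the job is started with a single command; run_periodically, its arguments and the jar name in the usage note are hypothetical.

```shell
#!/bin/sh
# Run a command, sleep, and run it again, with a fresh process each time.
# Usage: run_periodically <interval_seconds> <max_runs> <command...>
# max_runs = 0 means "loop forever". All names here are illustrative.
run_periodically() {
    interval=$1
    max_runs=$2
    shift 2
    runs=0
    while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
        # A failed run is reported but does not stop the loop.
        "$@" || echo "run failed with status $?" >&2
        runs=$((runs + 1))
        # No sleep after the final run.
        if [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, run_periodically 300 0 java -jar myapp.jar would relaunch the program five minutes after each run ends. All memory held by the JVM is returned to the system between runs, which is the last point in the list above.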

I would not keep it alive.
Instead, you can use a job that runs at defined intervals: for example Jenkins, or the Windows Task Scheduler configured to run every 5 minutes (or whatever interval you wish).
Run a batch file with Windows task scheduler
From your batch file you can do the following:
REM compile
javac JavaFileName.java
REM run
java JavaFileName
See here for how to run a Java file from cmd:
How do I run a Java program from the command line on Windows?

Personally, I would decide based on where the application runs.
If it were on my personal computer, I would use the second option with the bash script (the resources on my local machine can change a lot because of heavy use of other programs, and at some point I might run out of memory, for example).
If it goes to the cloud (Amazon, Google, whatever), I know exactly what kind of processes are running there (it should not change as dynamically as on my local PC), and a long-running Java process with some scheduler would be fine for me.

disk I/O of a command line java program

I have a simple question, I've read up online but couldn't find a simple solution:
I'm running a java program on the command line as follows which accesses a database:
java -jar myProgram.jar
I would like a simple mechanism to see the number of disk I/Os performed by this program (on OSX).
So far I've come across iotop but how do I get iotop to measure the disk I/O of myProgram.jar?
Do I need a profiler like JProfiler to get this information?

iotop is a utility that shows the top n processes in descending order of I/O utilization.
Most importantly, it is a live monitoring utility, which means its output changes every n seconds (or at whatever interval you specify). You can redirect it to a file, but you then need to parse that file and plot the data to get anything meaningful out of it.
I would recommend using sar. You can read more about it here
It is a low-level system monitoring utility on Linux/Unix and gives you much more data than iotop.
The best thing about sar is that you can collect the data with a daemon while your program is running and analyze it later using ksar.
In my opinion, you can follow the approach below:
1. Start sar monitoring and collect sar data every n seconds. The value of n depends on the approximate execution time of your program. For example, if your program takes 10 seconds to execute, then sampling every second is good, but if it takes 1 hour, sample every 30 seconds or every minute. This minimizes the overhead of the sar process while keeping the data meaningful.
2. Wait for some time (so that you get data from before your program starts) and then start your program.
3. Let your program run to completion.
4. Wait for some time again (so that you get data from after your program finishes).
5. Stop sar.
6. Visualize the sar data using ksar. To start with, check the disk utilization and then the IOPS for a disk.
You can use profilers for the same thing, but they have a few drawbacks:
They need their own agents (which have their own overhead).
Some of them are not free.
Some of them are not easy to set up.
They may or may not provide the data you need.
Besides this, IMHO, using built-in system-level utilities is always beneficial.
I hope this was helpful.

Your Java program ends up as a process on the host system, so you need to filter the output of the monitoring tool by your own process ID. Refer to the Scripts section of this blog post.
Also, even though you have tagged the question with OS X, do mention in the question itself that you are using OS X.
If you are looking for offline data, that is provided by the proc filesystem on Unix-based systems, but unfortunately it is missing on OS X: Where is the /proc folder on Mac OS X?
/proc on Mac OS X
You might choose to write a small script that dumps data from disk and process monitoring tools for your process ID. The script can look up the process ID by process name: put it in a loop that looks for that process name, and start the script before you launch your Java program. Once the script finds the process, it keeps dumping the relevant data from the commands you choose, at the interval you choose. Once your program ends, the logging script terminates as well.

How to automatically kill orphaned Java processes

I've posted this question before, but didn't get the answer I wanted. The problem I have right now is that there are a number of Java processes getting orphaned. This is both on Linux and Windows. I need a way to FIND which Java processes are the ones that are orphaned and kill them.
NOTE: I CANNOT make changes to the Java code as I have no access to it on any level. I am simply running some tests on my machine. I am aware of solutions like this one
Killing a process using Java
but that is not what I am looking for.

On Linux an orphaned process becomes the child of init, which always has pid 1. To kill java processes that are children of init you can use pkill:
pkill --parent 1 java
To make this automatic you can add this command to cron, for example.
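Before killing anything automatically, it can help to list the candidates first. A minimal sketch, assuming a ps that accepts -e and the pid=, ppid= and comm= output columns (procps on Linux does); the function names are made up for illustration. Keep in mind that a Java service started directly by init also has parent PID 1, and that on systems using a child subreaper an orphan may be re-parented to that process instead, so the list deserves a look before it is acted on.

```shell
#!/bin/sh
# Print the PIDs of processes named "java" whose parent is PID 1.
# Reads "pid ppid comm" lines on stdin, so it can be checked with canned input.
orphaned_java_pids() {
    awk '$2 == 1 && $3 == "java" { print $1 }'
}

# Typical use: feed it the live process table.
list_orphaned_java() {
    ps -e -o pid= -o ppid= -o comm= | orphaned_java_pids
}
```

list_orphaned_java prints one PID per line; once the list looks right, those PIDs can be passed to kill, or the pkill command above can be used directly.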

PHP exec a unit testing with memory usage information

I have a question related to PHP and Java programming.
I am going to develop a web application that runs unit tests.
PHP is the language I'll use for the web part, and it will call the exec() function.
As I read it, the function returns the output of the execution, which I do need.
But that's not enough: I would also like to know how much memory is used during the execution.
The web application will run on an Apache web server on a native Linux operating system (preferably Ubuntu).
This is the second case:
If there is a Java source file containing a program that requires user input during execution, how can I execute it via a web server and also pass all the lines that should act as the user input?
The next problem is that the exec() function only accepts parameters inline.
How if I want
So, does anyone have an idea how to do these things?

The /usr/bin/time program (documented in time(1)) can return the amount of memory used during execution:
$ /usr/bin/time echo hello
hello
0.00user 0.00system 0:00.00elapsed ?%CPU (0avgtext+0avgdata 2512maxresident)k
0inputs+0outputs (0major+205minor)pagefaults 0swaps
You can see that the echo(1) program required 2.5 megabytes of memory and ran very quickly. Larger programs will be more impressive:
$ /usr/bin/time jacksum --help
Unknown argument. Use -h for help. Exit.
Command exited with non-zero status 2
0.08user 0.03system 0:00.87elapsed 12%CPU (0avgtext+0avgdata 57456maxresident)k
25608inputs+64outputs (92major+4072minor)pagefaults 0swaps
jacksum is a Java-based program, so it took 57 megabytes to tell me I screwed up the command line arguments. That's more like it.
You might also find the BSD process accounting system worthwhile. See lastcomm(1), sa(8), and dump-acct(8) for more information.
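For a script or a PHP exec() call, the peak memory is easier to parse if GNU time writes just that number to a file. A sketch assuming GNU time is installed at /usr/bin/time (the -f and -o options and the %M format are GNU-specific); peak_rss_kb and the file names in the usage note are hypothetical.

```shell
#!/bin/sh
# Run a command and print its peak resident set size in kilobytes.
# Requires GNU time; the shell built-in `time` does not support -f or -o.
# Usage: peak_rss_kb <command...>
peak_rss_kb() {
    report=$(mktemp) || return 1
    status=0
    # %M = maximum resident set size in KB; -o keeps the report out of the
    # program's own stderr. The command's stdout is discarded here.
    /usr/bin/time -f '%M' -o "$report" "$@" >/dev/null || status=$?
    # On failure GNU time first writes a "Command exited ..." line,
    # so the number is always on the last line.
    tail -n 1 "$report"
    rm -f "$report"
    return "$status"
}
```

Standard input is passed through to the command, so something like peak_rss_kb java Main < input.txt is one way to supply the lines the program would otherwise read from the user while measuring its memory at the same time.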
