I am evaluating a product which works by instrumenting bytecode.
I launch my application via the provided agent, which instruments the bytecode, profiles the application, and generates reports about it.
What would be the best way to determine the performance penalty imposed by running the agent?
Would it be enough if I were to capture the latency of a few operations in my app via JMH, with and without the agent?
Also, is there a baseline expected drop in performance by using an agent which does bytecode instrumentation?
Thanks
You can download an existing Java performance benchmark like SPECjvm2008 and run it with and without your agent. I wouldn't only write a microbenchmark sourced from the application you are monitoring, because that might not exercise the bottlenecks of the various operations and instrumentation techniques used by this product (method/memory/system instrumentation).
The baseline typically advertised for such an agent is about 5% overhead, a number I would take with a huge grain of salt.
If your application is some kind of a server, you can launch it with the agent and run a separate JMH benchmark that measures calls to your app. That way your app runs with or without the agent and the benchmark code doesn't interfere with it. But you'll have to isolate a lot of factors, like call latency, CPU usage by the caller (if you run the caller from the same host), call distribution (you have a lot of different calls on a server, but only a couple in a benchmark), etc.
If instead you benchmark inside the process the agent is attached to, I wouldn't trust the result unless the measured actions take a really long time, say >5 ms.
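To make the caller-side approach concrete, here is a minimal JMH sketch of such a benchmark. The HTTP endpoint (http://localhost:8080/orders/123) is purely hypothetical; point the benchmark at the application started without the agent, then at the instance started with it, and compare the two latency distributions.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @BenchmarkMode(Mode.SampleTime)            // collect a latency distribution, not just an average
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @Warmup(iterations = 5, time = 5)
    @Measurement(iterations = 10, time = 10)
    @Fork(1)
    @State(Scope.Benchmark)
    public class AgentOverheadBenchmark {

        // Hypothetical endpoint of the application under test.
        private static final URI TARGET = URI.create("http://localhost:8080/orders/123");

        private HttpClient client;

        @Setup
        public void setUp() {
            client = HttpClient.newHttpClient();
        }

        @Benchmark
        public int callServer() throws Exception {
            HttpResponse<String> response = client.send(
                    HttpRequest.newBuilder(TARGET).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            return response.statusCode();      // return a value so the call cannot be optimized away
        }
    }

Keep the caveats above in mind: the caller's own CPU usage and the narrow call mix still distort the picture, so treat the numbers as a relative comparison between the two runs, not as absolute latencies.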
Related
If I want to benchmark methods in my Java class, which one is a better choice? What are the pros/cons between the two?
I have a lot of experience with both JMeter and JMH. I would say there are many cases where you can use either of them, but in the scope of Java class benchmark testing, only JMH is really relevant: you can't do class-level benchmarking with JMeter, only a crude approximation of it. It may work if your class is doing some long operations, where precision is not that important. Reasons:
JMeter's own overhead is significant. For example, I have a JMH test that tests function X, and it averages at about 0.01 ms. I also use this function in JMeter to set things up in a custom Java sampler. That sampler typically shows an average of about 20 ms. That's a difference of roughly 2,000 times, which is purely JMeter overhead.
With JMeter you cannot avoid JVM optimization, which makes the benchmark effectively unrealistic (although due to problem #1 it may not matter at all). JMH has a mechanism to prevent this pitfall; I don't think you can solve that issue in any clean way in JMeter.
I also disagree with Dmitri T's notion that it has a "Limited number of modes" - JMH actually has more relevant modes than JMeter, and they are easier to set up:
first of all, it has several modes you can choose from (including "time to hold the load"). It also has the ability to use all modes at once (with separate results), so you don't have to choose a single mode, and you specify this by an annotation or a command-line parameter alone. In JMeter this is only possible with additional development (e.g. a separate thread group or test).
JMH doesn't have ramp-up, but it has warmup, which lets you exclude initial executions from the result, keeping the results clean of initial startup noise; this serves essentially the same goal as ramp-up.
there are definitely ways to control iterations: their number, their duration, and so on, via annotations or the command line (see the sketch after this list).
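As a rough illustration of those points (not taken from any particular project), a JMH benchmark can declare several modes plus warmup and measurement settings entirely through annotations, and all of them can still be overridden from the command line:

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    // Several modes at once; each produces its own result section in the same run.
    @BenchmarkMode({Mode.Throughput, Mode.AverageTime, Mode.SampleTime})
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @Warmup(iterations = 3, time = 2)          // plays the role ramp-up plays in JMeter: discard startup noise
    @Measurement(iterations = 5, time = 2)     // number and duration of measured iterations
    @Fork(1)
    @State(Scope.Thread)
    public class ModesExample {

        @Benchmark
        public double parse() {
            return Double.parseDouble("123.456");   // stand-in for the code under test
        }
    }

The same settings can be overridden at run time, e.g. java -jar benchmarks.jar ModesExample -bm all -wi 5 -i 10, without touching the code.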
Also, there are quite a few things that are very easy in JMH, whereas in JMeter they require various workarounds. For example:
synchronizing iterations is a matter of a single setting in JMH, but would require careful setup in JMeter.
asymmetric tests, which let you exercise, for example, a producer/consumer model at the same time while measuring each side independently. In JMH you write your benchmarks, mark them with annotations, and you are done (see the sketch below); in JMeter you'd need significant extra work to set it up right.
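For instance, a minimal sketch of such an asymmetric benchmark (illustrative only, with a hypothetical bounded queue as the shared state) looks like this:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Group)
    public class ProducerConsumerBenchmark {

        private final BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1024);

        @Benchmark
        @Group("pipeline")
        @GroupThreads(2)               // two producer threads
        public void produce() throws InterruptedException {
            queue.put(42);
        }

        @Benchmark
        @Group("pipeline")
        @GroupThreads(2)               // two consumer threads, measured and reported separately
        public int consume() throws InterruptedException {
            return queue.take();
        }
    }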
In terms of reporting, JMH, much like JMeter, has plugins for Jenkins and TeamCity that produce result tables and graphs. It can also publish results in various formats, which can be consumed, processed, or stored by other tools.
So if JMH is so great, then what is JMeter good for?
Mainly for testing various network protocols, a use case JMH was not built for. This is where you probably don't care about JMeter's overhead or JVM optimization, and can take advantage of JMeter's built-in samplers. Nothing prevents you from testing any network protocol with JMH, of course (as long as you use a proper library), but in JMeter you are relieved from writing custom code to handle communication protocols.
You can't/don't want to write Java code. In JMeter you can express your logic visually, which makes it possible for people who don't code to write the tests (although you may still need to control the logic of the test with some programming concepts, like loops or timers, and you may need some scripting in pre/post processing). Visual recording can also be attractive if you can use it (that is, if you are recording an HTTP test).
You may also feel that JMeter tests are at the "functional" level, while JMH tests are at the "unit" test level. But that's quite subjective.
If it's really your Java class and all you need is to measure the number of operations per unit of time, go for JMH, as it is a more lightweight solution that can be integrated directly into your application's build system (especially if you use Maven).
Pros:
Simplicity (given you're a Java developer)
Small footprint
Cons:
Limited number of modes (workload models) supported
Very limited reporting options, which don't tell much to a non-technical person
Not many integration options
JMeter is a multi-protocol load testing tool that can be used for assessing the performance of specific functions as well; there are 2 mechanisms available:
JSR223 Sampler and the Groovy language - where you can write arbitrary code
JUnit Request sampler, which detects JUnit tests in libraries on the JMeter classpath so you can execute them with an increased number of threads.
Pros:
Extremely powerful in terms of workload model definition: you can configure thread ramp-up, ramp-down, iterations, time to hold the load, etc.
Can be run from the command line, an IDE, an Ant task, a Maven plugin, etc.
Test reports can be exported as CSV, HTML, charts, or the HTML Reporting Dashboard
Cons:
If you don't have previous JMeter experience you will need to learn another tool
Relatively large footprint (at least ~100 MB of additional JVM heap), and the reporting subsystem can negatively impact your test results, i.e. you will get higher throughput with JMH on the same hardware, assuming you use ~90% of the available OS resources.
It's difficult to find all bottlenecks, deadlocks, and memory leaks in a Java application using unit tests alone.
I'd like to add some level of stress testing for my application. I want to test the limits of the application and determine how it reacts under high load.
I'd like to gauge the following:
Availability under high load
Performance under high load
Memory / CPU / Disk Usage under high load
Whether it crashes under high load or reacts gracefully
It would also be interesting to measure and contrast such characteristics under normal load.
Are there well-known, standard techniques to address stress testing?
I am looking for help / direction in setting up such an environment.
Ideally, I would like to run these tests regularly, so that we can determine if recent deliveries impact performance.
I am a big fan of JMeter. You can set up calls directly against the server just as users would access it. You can control the number of users (concurrent threads) and accesses. It can follow a workflow, scraping pertinent information from page to page. It takes 1 to 2 days to learn it well enough to be productive. (You can do the basics within an hour of downloading!)
As for seeing how all that affects the server, that is a tougher question. I have used professional tools from CA and IBM. (I am drawing a blank on specific tool names - maybe due to PTSD!) I have used out-of-the-box JVM profilers. I have used native Linux and Windows tools. If you are not too concerned about profiling which parts of your application cause issues, then you can just use the native tools for your OS to monitor CPU/memory/IO.
One of our standard techniques is running stepped-ramp load tests to measure scalability.
There are mainly two approaches to performance work on an application:
performance tests and system tests.
How do they differ? It's easy: it's based on their scope. A performance test's scope is limited and highly unrealistic. Example: test the IncomingMessage handler of some app X; for this you would set up a test that sends messages to this handler on an X, Y, Z basis. This approach will help you pin down problems and measure the performance of individual, limited zones of your application.
So this should bring you to the question: am I to benchmark and performance test each one of the components in my app individually? Yes, if you believe the component's behavior is critical and changes in newer versions are likely to introduce performance penalties. But if you want to get a feel for your application as a whole, with its components interacting with each other, and see how performance comes out, then you need a system test.
A system test will always try to replicate, as closely as possible, a customer production environment. Here you can observe what a real-world feel of your app's performance is like and act accordingly to correct it.
So, as a conclusion: set up a system test for your app and measure what you said you wanted to measure. Then stress the system as a whole and see how it reacts; you may be surprised by the outcome.
Finally, performance test individually any critical components you have identified or would like to keep tracking in your app.
As a general guideline, when doing performance you should always:
1.- Get a baseline for the system on an idle state.
2.- Get a baseline for the system under normal expected load.
3.- Get a baseline for the system under stress conditions.
Keep in mind that normal-load results should extrapolate to the stress conditions; a good system will always be one that scales linearly.
Hope this helps.
P.S. Tests, environment setup, and even data collection should be as fully automated as possible; this will help you run them on a regular basis and spend your time diagnosing performance problems rather than setting up the test.
As mentioned by others, tools like JMeter (and commercial tools like LoadRunner) can help you generate a concurrent test load.
Many monitoring tools (some provided within the JDK, like Mission Control, other open source/free tools like JavaMelody, and many commercial ones) can help you do generic monitoring of various system resources (memory, CPU, network bandwidth) and JVM resources (heap, CPU, GC overhead, etc.).
But to really identify bottlenecks within your code as well as in other dependencies of your application (like external services invoked, DB queries/updates, etc.) in a quick and easy way, I recommend considering a good APM, i.e. Application Performance Monitoring tool, such as AppDynamics, Dynatrace, and others. They can help you pinpoint bottlenecks at the level of a specific request, highlight slower parts of the app, generate percentile metrics at the individual service endpoint or component/method level, etc. They can be immensely useful if one is dealing with very high concurrent user counts and stringent response-time NFRs, and they help uncover bottlenecks across the layers of your application. Many even configure these tools in production (expect 2-3% overhead, but worth it in my opinion for the benefits they provide), because production logging is not at debug level by default, so once an error or slowness is observed, it is often extremely difficult to reproduce in lower environments or to debug without debug-level logs from that specific past period.
There's no one tool to tackle this as far as I know, so build your own environment:
Load Injecting & Scripting: JMeter, SOAP UI, LoadUI
Scheduling Tests & Automation: Jenkins, Rundeck
Analytics on transaction data, resources, application performance logs: AppDynamics, ElasticSearch, Splunk
Profiling: AppDynamics, YourKit, Java Mission Control, VisualVM
How do I run performance tests in Java? For example: execution time, tracking memory usage, tracking operations per second, and maybe other helpful measurements. How do I run these tests without influencing how the real application works (without the test)? I mean, I want to be confident that my application would perform the same on average without those tests.
Note: I need to attach a specific tool to my test class. I know there are many different tools that can test a huge number of VM parameters of an application. I need to write a test class with distinct test parameters whose values I can control as I want. It would be good if the API supported a graphical GUI for some of those parameters.
You can use tools like JConsole and VisualVM, especially for memory. These come bundled with the JDK.
As for the speed of your application, log messages can typically help you with that. If the log messages follow a standard, you can pretty much write your own script to give you a nicely formatted result. Also, it is a good idea to write test classes to give you a fair idea of the performance under varying loads, along the lines of the sketch below.
http://visualvm.java.net/gettingstarted.html
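If you only need a rough figure and don't want to pull in a harness yet, a tiny hand-rolled timer like this sketch (all names are made up for illustration) can give you a first impression; for anything precise, prefer JMH, since a loop like this is still exposed to JIT and warm-up effects.

    import java.util.function.Supplier;

    public final class SimpleTimer {

        public static <T> void time(String label, int iterations, Supplier<T> operation) {
            // a few unmeasured runs so class loading and JIT warm-up do not dominate
            for (int i = 0; i < 1_000; i++) {
                operation.get();
            }
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                operation.get();
            }
            long elapsed = System.nanoTime() - start;
            System.out.printf("%s: %.1f ms total, %.3f us/op (%d ops)%n",
                    label, elapsed / 1_000_000.0, elapsed / 1_000.0 / iterations, iterations);
        }

        public static void main(String[] args) {
            time("string concat", 100_000, () -> "result-" + System.currentTimeMillis());
        }
    }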
I would recommend JProfiler for performance testing and JConsole for monitoring memory usage, etc.
I am under the assumption that your servers are hosted in a Linux environment. JConsole and VisualVM are indeed helpful for monitoring your memory and performance. As a workaround, you can also use the sar, top, and mpstat commands to get the values you need.
You will have to install the sysstat package on your Linux server with the command sudo apt-get install sysstat. You can refer to "http://www.thegeekstuff.com/2011/03/sar-examples/?utm_source=feedburner" to learn more about the commands.
You don't need admin access to run any of these commands.
Is there any Java profiler that allows profiling short-lived applications? The profilers I have found so far seem to work with applications that keep running until user termination. However, I want to profile applications that work like command-line utilities: they run and exit immediately. Tools like VisualVM or the NetBeans Profiler do not even recognize that the application was run.
I am looking for something similar to Python's cProfile, in that the profiler result is returned when the application exits.
You can profile your application using the JVM's built-in HPROF agent.
It provides two methods:
sampling the active methods on the stack
timing method execution using injected bytecode (BCI, bytecode injection)
Sampling
This method reveals how often methods were found on top of the stack.
java -agentlib:hprof=cpu=samples,file=profile.txt ...
Timing
This method times the actual invocations of a method; the instrumentation code is injected by the JVM beforehand.
java -agentlib:hprof=cpu=times,file=profile.txt ...
Note: this method will slow down the execution time drastically.
For both methods, the default filename is java.hprof.txt if the file= option is not present.
Full help can be obtained using java -agentlib:hprof=help or can be found in Oracle's documentation.
Sun Java 6 has the java -Xprof switch that'll give you some profiling data.
-Xprof output cpu profiling data
A program running for 30 seconds is not short-lived. What you want is a profiler that can start your program instead of you having to attach to a running system. I believe most profilers can do that, but you would most likely prefer one integrated into an IDE. Have a look at NetBeans.
Profiling a short-running Java application has a couple of technical difficulties:
Profiling tools typically work by sampling the processor's SP or PC register periodically to see where the application is currently executing. If your application is short-lived, insufficient samples may be taken to get an accurate picture.
You can address this by modifying the application to run a number of times in a loop, as suggested by #Mike. You'll have problems if your app calls System.exit(), but the main problem is ...
The performance characteristics of a short-lived Java application are likely to be distorted by JVM warm-up effects. A lot of time will be spent in loading the classes required by your app. Then your code (and library code) will be interpreted for a bit, until the JIT compiler has figured out what needs to be compiled to native code. Finally, the JIT compiler will spend time doing its work.
I don't know if profilers attempt to compensate for JVM warm-up effects. But even if they do, these effects influence your application's real behavior, and there is not a great deal the application developer can do to mitigate them.
Returning to my previous point ... if you run a short-lived app in a loop you are actually doing something that modifies its normal execution pattern and removes the JVM warm-up component. So when you optimize the method that takes (say) 50% of the execution time in the modified app, that is really 50% of the time excluding JVM warm-up. If JVM warm-up is using (say) 80% of the execution time when the app is executed normally, you are actually optimizing 50% of 20%, i.e. about 10% of the real run ... and that is not worth the effort.
If it doesn't take long enough, just wrap a loop around it, an infinite loop if you like, as in the sketch below. That will have no effect on the inclusive time percentages spent either in functions or in lines of code. Then, given that it's taking plenty of time, I just rely on this technique. That tells you which lines of code, whether they are function calls or not, are costing the highest percentage of time and would therefore gain the most if they could be avoided.
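A minimal sketch of that idea, assuming a hypothetical MyCommandLineTool as the original entry point (and assuming it does not call System.exit()):

    public class ProfilingLoopMain {
        public static void main(String[] args) throws Exception {
            // Re-run the real entry point indefinitely so a sampling profiler
            // has time to attach and gather enough samples.
            while (true) {
                MyCommandLineTool.main(args);   // hypothetical original main()
            }
        }
    }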
Start your application with profiling turned on, waiting for the profiler to attach. Any profiler that conforms to the Java profiling architecture should work. I've tried this with the NetBeans profiler.
Basically, when your application starts, it waits for a profiler to be attached before executing. So technically, even a single line of code's execution can be profiled.
With this approach, you can profile all kinds of things: threads, memory, CPU, method/class invocation times and durations...
http://profiler.netbeans.org/
The SD Java Profiler can capture statement block execution-count data no matter how short your run is. Relative execution counts will tell you where the time is spent.
You can use a measurement (metering) recording: http://www.jinspired.com/site/case-study-scala-compiler-part-9
You can also inspect the resulting snapshots: http://www.jinspired.com/site/case-study-scala-compiler-part-10
Disclaimer: I am the architect of JXInsight/OpenCore.
I suggest you try YourKit. It can profile from the start and dump the results when the program finishes. You have to pay for it, but you can get an eval license or use the EAP version without one (time limited).
YourKit can take a snapshot of a profiling session, which can later be analyzed in the YourKit GUI. I use this feature to profile a short-lived command-line application I work on. See my answer to this question for details.
I am attempting to solve performance issues with a large and complex tomcat java web application. The biggest issue at the moment is that, from time to time, the memory usage spikes and the application becomes unresponsive. I've fixed everything I can fix with log profilers and Bayesian analysis of the log files. I'm considering running a profiler on the production tomcat server.
A Note to the Reader with Gentle Sensitivities:
I understand that some may find the very notion of profiling a production app offensive. Please be assured that I have exhausted most of the other options. The reason I am considering this is that I do not have the resources to completely duplicate our production setup on my test server, and I have been unable to cause the failures of interest on my test server.
Questions:
I am looking for answers which work either for a java web application running on tomcat, or answer this question in a language agnostic way.
What are the performance costs of profiling?
Any other reasons why it is a bad idea to remotely connect and profile a web application in production (strange failure modes, security issues, etc)?
How much does profiling affect the memory footprint?
Specifically, are there Java profiling tools that have very low performance costs?
Are there any Java profiling tools designed for profiling web applications?
Does anyone have benchmarks on the performance costs of profiling with visualVM?
What size applications and datasets can visualVM scale to?
OProfile and its ancestor DPCI were developed for profiling production systems. The overhead for these is very low, and they profile your full system, including the kernel, so you can find performance problems in the VM and in the kernel and libraries.
To answer your questions:
Overhead: These are sampled profilers, that is, they generate timer or performance counter interrupts at some regular interval, and they take a look at what code is currently executing. They use that to build a histogram of where you spend your time, and the overhead is very low (1-8% is what they claim) for reasonable sampling intervals.
Take a look at this graph of sampling frequency vs. overhead for OProfile. You can tune the sampling frequency for lower overhead if the defaults are not to your liking.
Usage in production: The only caveat to using OProfile is that you'll need to install it on your production machine. I believe there's kernel support in Red Hat since RHEL3, and I'm pretty sure other distributions support it.
Memory: I'm not sure what the exact memory footprint of OProfile is, but I believe it keeps relatively small buffers around and dumps them to log files occasionally.
Java: OProfile includes profiling agents that support Java and that are aware of code running in JITs. So you'll be able to see Java calls, not just the C calls in the interpreter and JIT.
Web Apps: OProfile is a system-level profiler, so it's not aware of things like sessions, transactions, etc. that a web app would have.
That said, it is a full-system profiler, so if your performance problem is caused by bad interactions between the OS and the JIT, or if it's in some third-party library, you'll be able to see that, because OProfile profiles the kernel and libraries. This is an advantage for production systems, as you can catch problems that are due to misconfigurations or particulars of the production environment that might not exist in your test environment.
VisualVM: Not sure about this one, as I have no experience with VisualVM
Here's a tutorial on using OProfile to find performance bottlenecks.
I've used YourKit to profile apps in a high-load production environment, and while there was certainly an impact, it was easily an acceptable one. YourKit makes a big deal of being able to do this in a non-invasive manner, such as selectively turning off certain profiling features that are more expensive (it's a sliding scale, really).
My favourite aspect of it is that you can run the VM with the YourKit agent attached and it has zero performance impact. It's only when you connect the GUI and start profiling that it has an effect.
There is nothing wrong with profiling production apps. If you work on distributed applications, there are times when an out-of-memory error occurs in a very rare scenario that is very difficult to reproduce in a dev/stage/UAT environment.
You can try using custom profilers, but if you are in a hurry and plugging in or setting up a profiler on a production box would take time, you can also use the JVM to take a memory dump (the JVM's memory dump also gives you a thread dump).
You can activate automatic generation on the JVM command line by using the following option:
-XX:+HeapDumpOnOutOfMemoryError
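For example (the dump path here is only an illustration; -XX:HeapDumpPath is the standard HotSpot option that controls where the .hprof file is written):

    java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/dumps/app.hprof -jar myapp.jar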
The Eclipse Memory Analyzer project has a very powerful feature called "group by value", which makes it possible to build an object query and regroup the instances by a field value. This is useful in the case where you have a lot of instances that contain a smaller set of possible values, and you want to see which values are being used the most. This has really helped me understand some complex memory dumps, so I recommend you try it out.
You may also consider using the tooling built into modern HotSpot JVMs: Java Flight Recorder and Java Mission Control. It is a set of tools that allows you to collect low-level runtime information with a CPU overhead of about 5% (I cannot prove that last statement; it comes from the Oracle engineer who presented the feature and live demo).
You can use this tooling as long as your application is running on a 7u40 JVM or higher. To enable runtime info collection, you need to start the JVM with particular flags:
By default, JFR is disabled in the JVM. To enable JFR, you must launch your Java application with the -XX:+FlightRecorder option. Because JFR is a commercial feature, available only in the commercial packages based on Java Platform, Standard Edition (Oracle Java SE Advanced and Oracle Java SE Suite), you also have to enable commercial features using the -XX:+UnlockCommercialFeatures options.
(Quoted http://docs.oracle.com/javase/8/docs/technotes/guides/jfr/about.html#sthref7)
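For example, a recording can be started together with the application (the duration and file name below are only illustrative; the flags themselves are the documented JFR options for these releases):

    java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=duration=120s,filename=recording.jfr -jar myapp.jar

On an already running JVM, the same recording can be started with jcmd <pid> JFR.start, and the resulting .jfr file is then opened in Java Mission Control.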
I added this answer as this is a viable option for profiling in production, IMO.
Also, there is an Eclipse plugin that supports JFR and JMC and is capable of displaying the information in a user-friendly way.
The tools have improved vastly over the years. These days, most people who have needs like these use a tool that hooks into Java's instrumentation API instead of the profiling API. Surely there are more examples, but New Relic and AppDynamics come to mind. Instrumentation-based solutions usually run as an agent in the JVM and constantly collect data. They report the data at a higher level (business transaction, web transaction, database transaction) than the old profiling approach and allow you to dig deeper (down to the method or line) if necessary. You can even set up monitoring and alerts, so you can track and alert on metrics like page load times and performance against SLAs. With these great tools, you really should have no reason to run a profiler in production any longer. The cost of running them is negligible.