If I want to benchmark methods in my Java class, which one is a better choice? What are the pros/cons between the two?
I have a lot of experience with both JMeter and JMH. There are many cases where you can use either of them, but for benchmarking a Java class only JMH is really relevant: with JMeter you can't do class-level benchmarking, only a crude approximation. It may work if your class performs long operations where precision is not that important. Reasons:
JMeter's own overhead is significant. For example, I have a JMH test that measures function X and averages about 0.01 ms. I also use the same function in a custom Java sampler in JMeter to set things up, and that sampler typically shows an average of about 20 ms. That's a difference of roughly 2000 times, which is pure JMeter overhead.
With JMeter you cannot prevent JVM optimizations (such as dead-code elimination) from skewing the benchmark, which makes the results effectively unrealistic (although, given problem #1, it may not matter at all). JMH has mechanisms to prevent this pitfall; I don't think you can solve that issue cleanly in JMeter.
I also disagree with Dmitri T's notion that JMH has a "Limited number of modes" - it actually has more relevant modes than JMeter, and they are easier to set up:
first of all, it has several modes to choose from (including "time to hold the load"). It can also run all modes at once (with separate results), so you don't have to pick a single one, and you specify this with an annotation or a command-line parameter alone. In JMeter that's only possible with additional development (e.g. a separate thread group or test).
JMH doesn't have ramp-up, but it has warmup, which lets you exclude the initial executions from the result and keep it free of startup noise - essentially the same goal as ramp-up.
there are definitely ways to control iterations: their number, their duration, and so on, via annotations or the command line (see the sketch right after this list).
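As an illustration of those points, here is a minimal sketch of a JMH benchmark where modes, warmup, iterations and forks are all controlled by annotations; MyClass and functionX are placeholders for your own code, not a real API:

    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.annotations.Warmup;
    import org.openjdk.jmh.infra.Blackhole;

    @State(Scope.Benchmark)
    @BenchmarkMode({Mode.AverageTime, Mode.Throughput}) // several modes at once, reported separately
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @Warmup(iterations = 5, time = 1)       // warmup iterations are excluded from the results
    @Measurement(iterations = 10, time = 1) // number and duration of measured iterations
    @Fork(2)                                // each trial runs in a fresh JVM fork
    public class MyClassBenchmark {

        private final MyClass target = new MyClass(); // placeholder for the class under test

        @Benchmark
        public void measureFunctionX(Blackhole bh) {
            // Blackhole consumes the result so the JIT cannot dead-code-eliminate the call
            bh.consume(target.functionX(ThreadLocalRandom.current().nextInt()));
        }
    }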
There are also quite a few things that are very easy in JMH but require various workarounds in JMeter. For example:
synchronizing iterations is a matter of a single annotation in JMH, but would require careful setup in JMeter.
an asymmetric test, which lets you, for example, exercise a producer/consumer model at the same time but measure the two sides independently. In JMH you write your tests, mark them with annotations and you are done; in JMeter you'd need significant extra setup to get it right (a sketch follows).
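A hedged sketch of such an asymmetric test, assuming a simple blocking queue as the shared resource; producer and consumer run together, but JMH reports their results separately:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Group;
    import org.openjdk.jmh.annotations.GroupThreads;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Group) // state shared by all threads of one group
    public class ProducerConsumerBenchmark {

        private final BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1024);

        @Benchmark
        @Group("pc")
        @GroupThreads(1) // one producer thread
        public void producer() throws InterruptedException {
            queue.put(42);
        }

        @Benchmark
        @Group("pc")
        @GroupThreads(3) // three consumer threads, measured independently of the producer
        public int consumer() throws InterruptedException {
            return queue.take();
        }
    }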
In terms of reporting, JMH, much like JMeter, has plugins for Jenkins and TeamCity which produce result tables and graphs. It can also publish results in various formats that can be consumed, processed or stored by other tools.
So if JMH is so great, what is JMeter good for?
Mainly for testing various network protocols, which JMH was not built for. There you probably don't care about JMeter's overhead or JVM optimizations, and you can take advantage of JMeter's built-in samplers. Nothing prevents you from testing a network protocol with JMH, of course (as long as you use a proper client library), but JMeter relieves you from writing custom code to handle the communication protocols.
When you can't or don't want to write Java code. In JMeter you can express your logic visually, which makes it possible for people who don't code to write tests (although you may still need to control the test logic with programming concepts like loops or timers, and you may need some scripting in pre/post processors). Visual recording can also be attractive if you can use it (that is, if you are recording an HTTP test).
You may also feel that JMeter tests are at the "functional" level, while JMH tests are at the "unit" test level. But that's quite subjective.
If it's really your Java class and all you need is to measure the number of operations per unit of time, go for JMH: it's a more lightweight solution that can be integrated directly into your application's build system (especially if you use Maven); a programmatic runner sketch follows the pros and cons lists below.
Pros:
Simplicity (given you're a Java developer)
Small footprint
Cons:
Limited number of modes (workload models) supported
Very limited reporting options, which don't tell much to a non-technical person
Not many integration options
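As a sketch of that build/IDE integration, a small main class can run the benchmarks programmatically, so any build tool that can execute a Java class can run them. MyClassBenchmark is assumed to be one of your benchmark classes and the output file name is arbitrary:

    import org.openjdk.jmh.results.format.ResultFormatType;
    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    public class BenchmarkMain {
        public static void main(String[] args) throws RunnerException {
            Options opts = new OptionsBuilder()
                    .include("MyClassBenchmark")         // regexp selecting the benchmarks to run
                    .forks(1)
                    .resultFormat(ResultFormatType.JSON) // machine-readable output for other tools
                    .result("jmh-result.json")
                    .build();
            new Runner(opts).run();
        }
    }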
JMeter is a multiprotocol load testing tool which can also be used for assessing the performance of specific functions; there are 2 mechanisms available:
JSR223 Sampler and Groovy language - where you can write arbitrary code
JUnit Request sampler, which detects JUnit tests in libraries on the JMeter classpath so you can execute them with an increased number of threads (an example of such a test follows this list).
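For example, an ordinary JUnit test like the sketch below (MyClass is a placeholder for your own class) can be packaged as a jar, dropped into JMeter's lib/junit directory, and then driven by the JUnit Request sampler with whatever thread count you configure:

    import static org.junit.Assert.assertNotNull;

    import org.junit.Test;

    public class MyClassJUnitTest {

        private final MyClass target = new MyClass(); // placeholder for the class under test

        @Test
        public void functionXReturnsAResult() {
            // JMeter measures the elapsed time of each invocation of this test method
            assertNotNull(target.functionX(42));
        }
    }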
Pros:
Extremely powerful in terms of workload models definition, you can configure threads ramp-up, ramp-down, iterations, time to hold the load, etc.
Can be run from command-line, IDE, Ant task, Maven plugin, etc.
Test report can be exported as CSV, HTML, charts, HTML Reporting Dashboard
Cons:
If you don't have previous JMeter experience you will need to learn another tool
Relatively large footprint (at least ~100 MB of additional JVM heap), and the reporting subsystem can negatively impact your test results, i.e. you will see higher throughput with JMH on the same hardware, assuming you use 90% of the available OS resources.
I am evaluating a product which works by instrumenting bytecode.
I launch my application via the provided agent which instruments bytecode, profiles and generates reports about my application.
What would be the best way to determine the performance penalty imposed by running the agent?
Would it be enough if I were to capture the latency of a few operations in my app via JMH, with and without the agent?
Also, is there a baseline expected drop in performance by using an agent which does bytecode instrumentation?
Thanks
You can download an existing Java performance benchmark like SPECjvm2008 and run it with and without your agent. I wouldn't only write a microbenchmark sourced from the application you are monitoring, because that might not exercise the bottlenecks of the various operations and instrumentation techniques used by this product (method, memory, system).
The baseline that is advertised for a typical agent is 5%, which is a number I would take with a huge grain of salt.
If your application is some kind of server, you can launch it with the agent and run a separate JMH benchmark that measures calls to your app. That way your app runs with or without the agent, and the benchmark code doesn't interfere with it. But you'll have to isolate a lot of factors, like call latency, CPU usage by the caller (if you run the caller from the same host), call distribution (a server handles a lot of different calls, a benchmark only a couple), etc.
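A minimal sketch of such an external benchmark, assuming the application under test exposes an HTTP endpoint at a made-up URL; the benchmark runs in its own JVM, so you simply execute it twice, once against the plain app and once against the app started with the agent, and compare:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    // Runs in a separate JVM, so the measurement code never shares a process
    // with the instrumented application.
    @State(Scope.Benchmark)
    public class AgentOverheadBenchmark {

        private final HttpClient client = HttpClient.newHttpClient();
        private final HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/some-operation")) // hypothetical endpoint
                .build();

        @Benchmark
        public int callServer() throws Exception {
            // End-to-end latency of one call to the server under test
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode();
        }
    }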
If instead you benchmark inside the process the agent is attached to, I wouldn't trust the result, unless the measured actions take a really long time, say >5 ms.
What would be the performance impact of instrumenting my Java classes with Cobertura or Clover before deploying, versus deploying uninstrumented jars?
Will this make any significant difference in how the application performs? I tried this on my web application (locally), which is really small, and I can't notice any difference in performance, but I would like to know how much impact it has on big projects, like a heavy web server handling around 50 requests per second.
This will have a performance impact on a larger web application. Cobertura and Clover both do compile-time instrumentation, which modifies the bytecode to add instructions that write the coverage data to disk or wherever you have specified. The performance impact will be relative to the amount of code instrumented, so if you only instrument your own code and not all the dependencies, that will lessen the impact. You will need to test empirically to get a good idea of what the impact will be.
The performance impact of instrumented code can vary very significantly, depending on the kind of application you have. Usually the impact is high for CPU-intensive operations and rather low when your application performs a lot of I/O operations (as the CPU is mostly waiting in that case).
In the case of Clover, it adds an extra instruction (a call to the coverage recorder) for each of the following (a rough sketch of the effect follows the list):
method entry
code statement
true and false condition in a boolean expression
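To see why CPU-bound code suffers most, here is a rough, purely illustrative sketch of what the instrumented class behaves like; CoverageRecorder and the numeric probe IDs are made up for illustration and are not Clover's or Cobertura's real API:

    import java.util.concurrent.atomic.AtomicLongArray;

    public class InstrumentationSketch {

        // Purely illustrative stand-in for the real coverage recorder.
        static final class CoverageRecorder {
            static final AtomicLongArray hits = new AtomicLongArray(1024);
            static void hit(int id) { hits.incrementAndGet(id); }
            static void branch(int id, boolean outcome) { hits.incrementAndGet(outcome ? id : id + 1); }
        }

        // Original method
        int max(int a, int b) {
            return (a > b) ? a : b;
        }

        // Roughly what the instrumented version behaves like: one extra call
        // per method entry, per statement, and per branch outcome.
        int maxInstrumented(int a, int b) {
            CoverageRecorder.hit(101);          // method entry
            boolean cond = a > b;
            CoverageRecorder.branch(102, cond); // true/false outcome of the condition
            CoverageRecorder.hit(103);          // the return statement
            return cond ? a : b;
        }
    }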
You can find some sample performance data here:
https://confluence.atlassian.com/display/CLOVER/Clover+Performance+Tuning
Please keep in mind that it's just sample data. The best is to measure performance of your own application.
I am planning to micro-benchmark my Java code, which involves several calls to local as well as remote databases. I was about to use System.nanoTime() but started reading about micro-benchmarking frameworks such as JMH and Caliper. Using these frameworks is definitely recommended, but from whatever (little) I read, it seems that we can only benchmark a complete method, and they do it non-invasively (w.r.t. the existing code), i.e. we need not litter the existing code with JMH/Caliper code or annotations.
I want to benchmark only specific pieces of code (statements) within some methods. Is it possible to do this with any of the micro-benchmarking frameworks? Please provide some insight into this.
I guess calls to a DB are usually expensive enough to eliminate most of the problems with microbenchmarking, so your approach was probably fine. If you're measuring it in production, repeating the measurement many times, and don't care about a few nanoseconds, stick with System.nanoTime.
You're doing something very different from microbenchmarking like e.g. I did here. You're not trying to optimize a tiny piece of code and you don't want to eliminate external influences.
Microbenchmarking a part of a method makes no sense to me, as a method gets optimized as a whole (and possibly also inlined). It's a different level.
I don't think any framework could help; all they can do in your case is automate work that you don't seem to need. Note that System.nanoTime itself may take several hundred cycles (which is probably fine in your case).
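If you do stay with System.nanoTime, a minimal sketch could look like the following, where runQuery is a placeholder for the DB call you want to measure and the percentiles are computed from repeated samples:

    import java.util.Arrays;
    import java.util.concurrent.TimeUnit;

    public class DbCallTiming {

        // Placeholder for the real DB call being measured.
        static void runQuery() throws Exception {
            // e.g. execute a prepared statement against the local or remote DB
        }

        public static void main(String[] args) throws Exception {
            final int repetitions = 100;
            long[] samples = new long[repetitions];

            for (int i = 0; i < repetitions; i++) {
                long start = System.nanoTime();
                runQuery();
                samples[i] = System.nanoTime() - start;
            }

            Arrays.sort(samples);
            System.out.printf("median: %d ms, p90: %d ms%n",
                    TimeUnit.NANOSECONDS.toMillis(samples[repetitions / 2]),
                    TimeUnit.NANOSECONDS.toMillis(samples[(int) (repetitions * 0.9)]));
        }
    }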
You can try using Metrics from Codahale (now Dropwizard Metrics).
I found it easy to use and low-overhead if you use it in a certain configuration, i.e. with an exponentially decaying reservoir.
Micro-level, precise benchmarking does come with an associated cost: memory overhead at run time for sampling, and the benchmark itself may take time for calculations and stats generation (ideally you would offset that in the stats).
But if you want to benchmark a DB connection, which I don't think should be a very frequent operation, Metrics might be appropriate; I found it easy to use. And yes, it is a bit invasive, but configurable.
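A minimal sketch with Dropwizard/Codahale Metrics, using a Timer backed by an exponentially decaying reservoir; runQuery is a placeholder for the DB call being measured and the metric name is arbitrary:

    import java.util.concurrent.TimeUnit;

    import com.codahale.metrics.ConsoleReporter;
    import com.codahale.metrics.ExponentiallyDecayingReservoir;
    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.Timer;

    public class DbTimingWithMetrics {

        private static final MetricRegistry registry = new MetricRegistry();
        // An exponentially decaying reservoir keeps a statistically representative
        // sample of recent measurements at a small, bounded memory cost.
        private static final Timer dbTimer =
                registry.register("db.call", new Timer(new ExponentiallyDecayingReservoir()));

        // Placeholder for the real DB call being measured.
        static void runQuery() throws Exception { /* ... */ }

        public static void main(String[] args) throws Exception {
            for (int i = 0; i < 100; i++) {
                try (Timer.Context ignored = dbTimer.time()) { // records the elapsed time on close
                    runQuery();
                }
            }
            ConsoleReporter.forRegistry(registry)
                    .convertDurationsTo(TimeUnit.MILLISECONDS)
                    .convertRatesTo(TimeUnit.SECONDS)
                    .build()
                    .report(); // prints count, mean, percentiles, etc.
        }
    }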
It's difficult to find all bottlenecks, deadlocks, and memory leaks in a Java application using unit tests alone.
I'd like to add some level of stress testing for my application. I want to test the limits of the application and determine how it reacts under high load.
I'd like to gauge the following:
Availability under high load
Performance under high load
Memory / CPU / Disk Usage under high load
Does it crash under high load or react gracefully
It would also be interesting to measure and contrast such characteristics under normal load.
Are there well-known, standard techniques to address stress testing?
I am looking for help / direction in setting up such an environment.
Ideally, I would like to run these tests regularly, so that we can determine if recent deliveries impact performance.
I am a big fan of JMeter. You can set up calls directly against the server, just as users would access it. You can control the number of users (concurrent threads) and accesses. It can follow a workflow, scraping pertinent information page to page. It takes 1 to 2 days to learn it well enough to be productive. (You can do the basics within an hour of downloading!)
As for seeing how all that affects the server, that is a tougher question. I have used professional tools from CA and IBM. (I am drawing a blank on the specific tool names - maybe due to PTSD!) I have used out-of-the-box JVM profilers. I have used native Linux and Windows tools. If you are not too concerned about profiling which parts of your application cause issues, then you can just use the native tools for your OS to monitor CPU/memory/IO.
One of our standard techniques is running stepped-ramp load tests to measure scalability.
There are mainly two approaches to testing the performance of an application:
Performance tests and system tests
How do they differ? Well, it's easy: it's based on their scope. Performance tests have a limited scope and are highly unrealistic. Example: test the IncomingMessage handler of some app X; for this you would set up a test which sends messages to this handler on an X, Y, Z basis. This approach will help you pin down problems and measure the performance of individual, limited zones of your application.
So this should lead you to the question: am I to benchmark and performance-test each of the components in my app individually? Yes, if you believe the component's behavior is critical and changes in newer versions are likely to introduce performance penalties. But if you want to get a feel for your application as a whole, the bunch of components interacting with each other, and see how its performance comes out, then you need a system test.
A system test will always try to replicate, as closely as possible, a customer production environment. There you can observe what a real-world feel of your app's performance is like and act accordingly to correct it.
So, as a conclusion: set up a system test of your app and measure what you said you wanted to measure. Then stress the system as a whole and see how it reacts; you will be surprised by the outcome.
Finally, performance-test individually any critical components you have identified or would like to keep tracking in your app.
As a general guideline, when doing performance you should always:
1.- Get a baseline for the system in an idle state.
2.- Get a baseline for the system under normal expected load.
3.- Get a baseline for the system under stress conditions.
Keep in mind that normal-load results should be extrapolated to stress conditions, and a nice system will always be one which scales linearly.
Hope this helps.
P.S. Tests, environment setup and even data collection should be as fully automated as possible; this will help you run them on a regular basis and spend time diagnosing performance problems rather than setting up the test.
As mentioned by others, tools like JMeter (and commercial tools like LoadRunner and others) can help you generate concurrent test load.
Many monitoring tools (some provided within the JDK, like Mission Control, some open source/free tools like JavaMelody, and many commercial ones) can help you do generic monitoring of various system resources (memory, CPU, network bandwidth) and JVM resources (heap, CPU, GC overhead, etc.).
But to really identify bottlenecks within your code as well as in the other dependencies of your application (like external services invoked, DB queries/updates, etc.) in a quick and easy way, I recommend considering a good APM, i.e. an Application Performance Monitoring tool like AppDynamics, Dynatrace and others. They can help you pinpoint bottlenecks at the level of specific requests, highlight the slower parts of the app, and generate percentile metrics at individual service endpoints or at component/method level. They can be immensely useful if you are dealing with very high numbers of concurrent users and stringent response-time NFRs, and they help uncover bottlenecks across the layers of your application. Many teams even configure these tools in production (expect 2-3% overhead, which in my view is worth it for the benefits they provide): production logging is not at debug level by default, so once an error or slowness is observed, it is often extremely difficult to reproduce in lower environments or to debug without debug-level logs from that specific period.
There's no one tool to tackle this as far as I know, so build your own environment:
Load Injecting & Scripting: JMeter, SOAP UI, LoadUI
Scheduling Tests & Automation: Jenkins, Rundeck
Analytics on transaction data, resources, application performance logs: AppDynamics, ElasticSearch, Splunk
Profiling: AppDynamics, YourKit, Java Mission Control, VisualVM
Are there any known techniques (and resources related to them, like research papers or blog entries) which describe how to dynamically and programmatically detect the part of the code that caused a performance regression, if possible on the JVM or some other virtual machine environment (where techniques such as instrumentation can be applied relatively easily)?
In particular, when you have a large codebase and a bigger number of committers to a project (for example an OS, a language, or some framework), it is sometimes hard to find the change that caused a performance regression. A paper such as this one goes a long way toward describing how to detect performance regressions (e.g. in a certain snippet of code), but not how to dynamically find the piece of code in the project that was changed by some commit and caused the regression.
I was thinking that this might be done by instrumenting pieces of the program to detect the exact method which causes the regression, or at least narrowing the range of possible causes of the performance regression.
Does anyone know about anything written about this, or any project using such performance regression detection techniques?
EDIT:
I was referring to something along these lines, but doing further analysis into the codebase itself.
Perhaps not entirely what you are asking, but on a project I've worked on with extreme performance requirements, we wrote performance tests using our unit testing framework, and glued them into our continuous integration environment.
This meant that every check-in, our CI server would run tests that validated we hadn't slowed down the functionality beyond our acceptable boundaries.
It wasn't perfect - but it did allow us to keep an eye on our key performance statistics over time, and it caught check-ins that affected the performance.
Defining "acceptable boundaries" for performance is more an art than a science - in our CI-driven tests, we took a fairly simple approach, based on the hardware specification; we would fail the build if the performance tests exceeded a response time of more than 1 second with 100 concurrent users. This caught a bunch of lowhanging fruit performance issues, and gave us a decent level of confidence on "production" hardware.
We explicitly didn't run these tests before check-in, as that would slow down the development cycle: forcing a developer to run fairly long-running tests before checking in encourages them not to check in too often. We also weren't confident we'd get meaningful results without deploying to known hardware.
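For illustration, such a CI guard can be as simple as a JUnit test that fires a configured number of concurrent requests and fails the build when the slowest one exceeds the budget; the endpoint, thread count and 1-second threshold below are made-up examples, not a recommendation:

    import static org.junit.Assert.assertTrue;

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.junit.Test;

    public class ResponseTimeGuardTest {

        private static final int CONCURRENT_USERS = 100;         // made-up workload model
        private static final long MAX_ACCEPTABLE_MILLIS = 1_000; // made-up boundary

        @Test
        public void responseTimeStaysWithinBudgetUnderLoad() throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://test-server:8080/key-operation")) // hypothetical endpoint
                    .build();

            ExecutorService pool = Executors.newFixedThreadPool(CONCURRENT_USERS);
            List<Callable<Long>> users = new ArrayList<>();
            for (int i = 0; i < CONCURRENT_USERS; i++) {
                users.add(() -> {
                    long start = System.nanoTime();
                    client.send(request, HttpResponse.BodyHandlers.discarding());
                    return (System.nanoTime() - start) / 1_000_000; // elapsed millis
                });
            }

            long worst = 0;
            for (Future<Long> result : pool.invokeAll(users)) {
                worst = Math.max(worst, result.get());
            }
            pool.shutdown();

            assertTrue("Slowest call took " + worst + " ms", worst <= MAX_ACCEPTABLE_MILLIS);
        }
    }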
With tools like YourKit you can take a snapshot of the performance breakdown of a test or application. If you run the application again, you can compare performance breakdowns to find differences.
Performance profiling is more of an art than a science. I don't believe you will find a tool which tells you exactly what the problem is, you have to use your judgement.
For example, say you have a method which is taking much longer than it used to do. Is it because the method has changed or because it is being called a different way, or much more often. You have to use some judgement of your own.
JProfiler lets you see the list of instrumented methods, which you can sort by average execution time, inherent time, number of invocations, etc. I think if this information is saved across releases, one can get some insight into regressions. Of course the profiling data will not be accurate if the tests are not exactly the same.
Some people are aware of a technique for finding (as opposed to measuring) the cause of excess time being taken.
It's simple, but it's very effective.
Essentially it is this:
If the code is slow, it's because it's spending some fraction F (like 20%, 50%, or 90%) of its time doing something unnecessary, call it X, in the sense that if you knew what it was, you'd blow it away and save that fraction of time.
During the general time it's being slow, at any random nanosecond the probability that it's doing X is F.
So just drop in on it, a few times, and ask it what it's doing.
And ask it why it's doing it.
Typical apps are spending nearly all their time either waiting for some I/O to complete, or some library function to return.
If there is something in your program taking too much time (and there is), it is almost certainly one or a few function calls, that you will find on the call stack, being done for lousy reasons.
Here's more on that subject.
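On the JVM, the usual way to "drop in on it" is to grab a handful of thread dumps (e.g. with jstack) while the program is being slow and see which calls keep showing up near the top of the stacks. The sketch below does a rough in-process equivalent of that, purely as an illustration of the idea:

    import java.util.Map;

    public class StackSampler {

        // Take a few "random moment" samples of what every thread is doing.
        public static void main(String[] args) throws InterruptedException {
            for (int sample = 0; sample < 5; sample++) {
                Thread.sleep(1_000); // rough stand-in for "drop in at a random moment"
                System.out.println("==== sample " + sample + " ====");
                for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                    System.out.println(e.getKey().getName());
                    for (StackTraceElement frame : e.getValue()) {
                        System.out.println("    at " + frame);
                    }
                }
            }
            // If the same call appears near the top of most samples,
            // that call is where the time is going.
        }
    }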