I'm going to performance-benchmark a large Java application. What is the best approach?
It is a POJO-based application consisting of an online part with a UI and a WebService layer, but most of the heavy business logic lives in long-running batch processes implemented with Spring Batch. The application is deployed on JBoss Application Server and persists data in an Oracle database.
What I'm looking to get is not only the throughput of the batch processes or the number of concurrent requests handled by the online layer. I'm also looking to gain more insight and gather internal metrics for things like batch job steps, JDBC, etc. Furthermore, I'm going to correlate those metrics with system-wide metrics I can get from the JVM, OS, application server and database.
Ideally I'd like to do it with no code modifications if possible.
My idea for implementing this kind of performance monitoring is to rely on JMX and provide a central component that gathers metrics from the various MBeans and correlates them. It seems that each main component I want to measure is JMX-enabled:
JBoss Application Server has extensive JMX capabilities
Spring Batch can be JMX-enabled with the Spring Batch Admin extension
The JVM exposes all of its internals via JMX
Oracle with DMS metrics also gives good insight into the performance of the JDBC link
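For illustration, the central collector I have in mind would essentially do JSR-160 attribute reads against those MBeans. A minimal sketch (host, port and the attribute read are only examples; a real collector would poll many MBeans on a schedule and correlate the results):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.openmbean.CompositeData;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class SimpleJmxCollector {
        public static void main(String[] args) throws Exception {
            // Host and port are placeholders; point this at the JMX endpoint exposed by the JVM/JBoss instance.
            JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                // Read a standard JVM MBean as an example; JBoss or Spring Batch MBeans are read the same way.
                CompositeData heap = (CompositeData) mbsc.getAttribute(
                        new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
                System.out.println("heap.used=" + heap.get("used"));
            } finally {
                connector.close();
            }
        }
    }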
I was wondering if there are any good open source frameworks/tools that could help me with collection, correlation and visualization of the metrics as described above. Thanks in advance for your recommendations.
P.S. I have done my homework and browsed the open source scene in that regard, but I don't want to put any specific names here yet, to avoid biased answers.
I am looking for a JMX querying tool. I have come across Jolokia and jmxtrans; both support JSON-based querying. jmxtrans has output writers for monitoring tools, which I think Jolokia is missing. I googled, but I didn't find much comparing the two.
I have read positive blog posts about both tools, though. If anyone has used them, please share your experiences.
I'm the author of jmxtrans. I considered Jolokia before I developed jmxtrans, but chose to develop jmxtrans because I had a different use case in mind.
Jolokia runs as a war in Tomcat. It is similar to jmxtrans in that it allows you to query a JMX server, but that is about where the similarity ends. You would need to implement the rest of jmxtrans on top of Jolokia in order to have feature parity.
On the other hand, jmxtrans is standalone with no requirements on Tomcat. My target audience is the netops/devops role. It is configured with a relatively simple JSON based structure so it doesn't require an engineering degree to configure and use it.
The thought is that you use jmxtrans to continuously monitor services which expose statistics via JMX. You use jmxtrans to 'transform' data from JMX into whatever output format you want. Generally, devops people want to integrate their JVMs with some sort of monitoring solution like Graphite or Ganglia, so I provided output writers for those tools.
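To make the 'transform' idea concrete: a Graphite-style output writer ultimately just emits Graphite's plaintext protocol, one "name value timestamp" line per metric over TCP. A minimal sketch of that output step (this is not jmxtrans code; host, port and the metric path are placeholders):

    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.net.Socket;

    public class GraphiteWriterSketch {
        public static void main(String[] args) throws Exception {
            long heapUsed = 123456789L;                    // a value previously read from JMX
            long now = System.currentTimeMillis() / 1000L; // Graphite expects a Unix timestamp in seconds
            // Graphite's plaintext listener conventionally runs on port 2003.
            Socket socket = new Socket("graphite.example.com", 2003);
            try {
                Writer out = new OutputStreamWriter(socket.getOutputStream(), "UTF-8");
                out.write("servers.app01.jvm.heap.used " + heapUsed + " " + now + "\n");
                out.flush();
            } finally {
                socket.close();
            }
        }
    }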
jmxtrans is also very smart about how it does queries against JMX servers, and there is a bit of optimization in there for that. There is also a lot of work to allow you to do things like parallelize requests to many servers, so you can scale jmxtrans to continuously query hundreds of servers with a single jmxtrans instance.
I hope that clarifies things a bit. If you have any specific questions, I'm happy to answer them.
Indeed, jmxtrans and Jolokia have a different focus. jmxtrans is a full monitoring solution with a scheduler which issues JMX requests periodically and sends the results to a backend like Graphite or rrdtool. It uses standard JSR-160 (RMI-based) communication for querying JMX-enabled Java servers.
Jolokia, on the other hand, is an HTTP/JSON-JMX adaptor which allows easy access for non-Java clients and adds some unique features that are not available with a pure JSR-160 implementation. For integration into a monitoring platform, yet another piece of software is needed. For Nagios, there is Jmx4Perl, which provides a full-featured Nagios plugin for querying Jolokia agents.
Since I'm the Jolokia author, let me stress some highlights of Jolokia:
Many requests can be sent at once as a single bulk request. This allows querying hundreds of attributes with a single HTTP round trip, which really makes a huge difference in large environments.
Thanks to the use of HTTP, Jolokia allows easy querying across firewall boundaries (which is a nightmare with the standard JMX connectors).
Fine-grained authorization is easily possible using a plain XML policy file.
Agents are available not only for Tomcat or any other Java EE container, but also for any Java 6 application (e.g. ActiveMQ or Camel).
A nice suite of command-line tools (e.g. a readline-based JMX shell with context-sensitive command completion) comes with Jmx4Perl.
Access libraries for Perl, JavaScript, Python, ... are available.
.... For more info, please refer to www.jolokia.org
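To give a feel for the HTTP/JSON model, reading a single attribute from a Jolokia agent is just a GET against the agent's read endpoint. A minimal sketch with java.net.HttpURLConnection (the base URL and context path are assumptions; adjust them to where the agent is deployed):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class JolokiaReadSketch {
        public static void main(String[] args) throws Exception {
            // URL layout: <agent base>/read/<mbean>/<attribute>
            URL url = new URL("http://localhost:8080/jolokia/read/java.lang:type=Memory/HeapMemoryUsage");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line); // JSON response containing value, timestamp, status, ...
                }
            } finally {
                in.close();
            }
        }
    }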
To summarize, I think you should use jmxtrans when you need a complete monitoring solution based on JSR-160 remoting (though you could use Nagios and check_jmx4perl, too), and Jolokia when you need to overcome JSR-160 limitations or can benefit from one of its unique features. One could even imagine integrating Jolokia into jmxtrans for the communication with the servers being monitored, which would then go over JSON/HTTP instead of JSR-160 RMI (maybe this also clarifies the different focus and supported use cases).
Let me put one more project on the table: https://github.com/dimovelev/metrics-sampler
It queries JMX data using regular expressions and variable substitution, and it also supports JDBC queries as metric sources (mostly to monitor our Oracle DB stats) and mod_qos for the Apache mod_qos stuff. We only need Graphite as output, and that is the only output it currently supports.
By the way, IMHO the JMX ports are problematic with firewalls just because HotSpot picks a random ephemeral port for the RMI server upon startup. With JRockit it can be specified (to the same port as the registry) with a standard JVM option. To do this on HotSpot you need to code it yourself (or reuse code, e.g. from the Tomcat JMX connector). The nice part is that it is okay to set both ports to the same value, thus needing just one firewall rule.
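For reference, pinning both the registry and the RMI server to one port boils down to something like this (a minimal sketch with the standard JMX remote API; the port is a placeholder and a production setup would add authentication/SSL):

    import java.lang.management.ManagementFactory;
    import java.rmi.registry.LocateRegistry;
    import javax.management.MBeanServer;
    import javax.management.remote.JMXConnectorServer;
    import javax.management.remote.JMXConnectorServerFactory;
    import javax.management.remote.JMXServiceURL;

    public class FixedPortJmx {
        public static void main(String[] args) throws Exception {
            int port = 9999; // one port for both the RMI registry and the RMI server
            LocateRegistry.createRegistry(port);
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            // The first port in the URL is the RMI server port, the second the registry port; here they are the same.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi://localhost:" + port + "/jndi/rmi://localhost:" + port + "/jmxrmi");
            JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(url, null, mbs);
            server.start();
            System.out.println("JMX connector listening on port " + port);
        }
    }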
Cheers
Dimo
I am developing a webapp and am looking into how I can automate testing of the website, such as seeing how it copes with multiple concurrent users / heavy traffic. Could anyone point me in the direction of any software or techniques I could use to help me do this?
I am also looking into how to automate testing of the front end. For example, I have unit tested all of my business logic on the backend, but I am unsure what I should do to automate testing of everything else.
For heavy traffic testing, I've been using JMeter. For front end testing, I'm using Selenium.
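To give an idea of the Selenium side, a front-end smoke test in Java is only a few lines (a minimal sketch against the WebDriver API; the URL and element names are made up for illustration):

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;

    public class LoginSmokeTest {
        public static void main(String[] args) {
            WebDriver driver = new FirefoxDriver();
            try {
                driver.get("http://localhost:8080/login");                 // URL is a placeholder
                driver.findElement(By.name("username")).sendKeys("demo");  // element names are placeholders
                driver.findElement(By.name("password")).sendKeys("secret");
                driver.findElement(By.name("password")).submit();
                System.out.println("Title after login: " + driver.getTitle());
            } finally {
                driver.quit();
            }
        }
    }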
Besides Apache JMeter, which generates artificial load and lets you test performance, there are two main techniques for accurately measuring performance during operation:
Tagging Systems (like Google Analytics)
Access Log File Analysis
With tagging, you create an account with Google Analytics and add some JavaScript code to the relevant places in your pages; your visitors' browsers then report to GA and the traffic gets captured there.
The access log file holds all information about each session. It is a data overload, so the data has to be extracted, transformed and loaded (ETL) into a database. The evaluation can then be performed in near real time. You can create a dashboard application that does the ETL and displays the status of your application in near real time.
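The extract step of that ETL is usually just parsing the log line by line. A minimal sketch for common-log-format entries (the regex and sample line are simplified; the load-into-database part is omitted):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class AccessLogParser {
        // Common log format: host ident user [timestamp] "request" status bytes
        private static final Pattern LINE = Pattern.compile(
                "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+).*$");

        public static void main(String[] args) {
            String sample = "192.168.0.1 - - [10/Oct/2011:13:55:36 +0200] \"GET /index.html HTTP/1.1\" 200 2326";
            Matcher m = LINE.matcher(sample);
            if (m.matches()) {
                // In a real ETL you would insert these fields into a database instead of printing them.
                System.out.println("host=" + m.group(1) + " time=" + m.group(2)
                        + " request=" + m.group(3) + " status=" + m.group(4));
            }
        }
    }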
I had the same need some years ago while developing a large-scale webapp.
I used Apache JMeter for automated testing and YourKit Java Profiler for profiling JVM heap usage, and actually found a lot of memory leaks!
cheers
Selenium to test the flow and expected results
YourKit to profile CPU and memory usage => excellent for tracking concurrency issues and memory leaks
Spring Insight to visually understand your application's performance / load, plus:
See the SQL executed for any page request => with drill-down to the corresponding source code
Find pages which are executing slowly and drill into the cause
Verify your application's transactions are working as designed
Spring Insight is deployable as a standalone war (Tomcat / tc Server / etc.)
I am about to develop my master's project using Flex as the front end, with BlazeDS, Java web services and MongoDB on the backend. I am looking to deploy and manage it in the cloud. (The application analyzes financial data from various sources; I will need to query multiple endpoints for news articles and the DB for processing.)
This is my experiment with deploying to the cloud rather than running locally for demo and presentation purposes.
I have looked at Heroku (http://www.heroku.com/), but I am not sure if it allows Flash.
Please suggest a cloud application platform which supports Flex, BlazeDS, Java web services and MongoDB.
Amazon Web Services is a good place to start. You can have an instance ready within 15-30 minutes of signing up. If you are just experimenting, you ought to try to get the Amazon Linux image (AMI) up and running. Scour the net for HOWTOs on setting up Tomcat; for your requirements a full Java EE stack might be too much, but you might know better.
But a word of advice: it's better to get your application working on a local machine first. Then drop the programmer hat and put on the deployment hat 100%, because configuring the deployment environment is a slog: Tomcat configuration, BlazeDS, Mongo's failover servers, load balancers and all kinds of non-programming tasks. You will want to keep your development stack close to home so you can diagnose problems quickly.
The cloud is great only when you want to 1) stop using your home PC and bandwidth as a server, 2) have global mirror points for your application so that users' latency in one part of the world is not worse than in another, or 3) distribute the computing load of one application across many instances of the same application.
Clouds are relatively cheap to deploy on, but if you have an application that is hoarding GBs of bandwidth and storage, be prepared to fork over $1000s+ in costs. You can save money by going with an OS with no licensing costs to get a better rate.
I'm a long-time client-side (Swing) developer and I operated pretty much by myself in the same job for a long time. Working from home in a vacuum, I was pretty much completely isolated from the community. I recently took a position as a server-side Java guy for a startup, and I'm learning a ton of stuff but I'm the only Java person and am pretty much on my own again. Having never done server-side Java before, so much of this stuff is completely new and I feel like I have no idea what the normal best-practices are, or I don't have an intuitive feel for what tools to use for what jobs. I keep reading and reading various Internet sources (SO is awesome!) trying to bulk up my knowledge, but some things seem hard to search for because they don't have any obvious keywords. Hopefully some of you gurus here can point me in the right direction.
I'm in charge of implementing our backend REST service, which for now supports our website and an iPhone app. We're doing a social media site, eventually with many different clients. Currently the only clients of the service are our own website and our own iPhone app. I'm using Jersey, Spring, Tomcat, and RDS (Amazon's MySQL) on Amazon's EC2 platform. Our media storage is via S3. I've picked up all of these things pretty quickly and so far so good -- things are working fine with the website and the iPhone app. Cool.
Our next step is adding some long-running server-side processing. This processing is basically CPU-intensive stuff that doesn't involve any communication until it's done. I'm trying to figure out what the best way to handle this is. I'm thinking of using Amazon's SQS to queue up jobs in response to the REST events that should trigger them, but I can't figure out how I should handle the dequeuing and processing. I know I need some threads somewhere that take jobs off the SQS queue and process them, and then tell the REST service that the job is done. But where do these threads live?
In a plain "java -jar jobconsumer.jar" process on another EC2 instance that starts a small thread pool. Maybe use Spring to wire up this piece and start it running?
In a webapp deployed in a container like Tomcat on another EC2 instance? I don't really know what benefits I would get from this, but somehow running in a container like this seems more stable? Does this sort of container even really support long-running processing loops, or is it just good at responding to HTTP events?
Now that I write it out like that, I don't really see why I would want to use a container. It just seems like an over-complication. However, the Java community seems so centered on these types of containerized, "managed" environments that to not use a container seems somehow wrong. I feel like maybe I'm not understanding what some of the major benefits of these containers are? I mean, beyond the obvious benefits of the web-facing Servlet and JSP specs. Would any of the functionality of those specs help me out with something like this?
For a regular Java web app, you almost certainly want to be using one of the Servlet containers such as Tomcat - it takes care of accepting connections, parsing and serialising HTTP messages, JSPs, SSL, authentication, etc for you.
For a non-web app, the argument for using Tomcat (or similar) is weaker, but there are a few reasons to still consider it:
straightforward to add JSPs for querying and managing the app or add a web API in future
easy distribution of releases (one .war vs. an unholy mess of jars and config files)
hot deployment (although I've yet to see anyone using this for anything serious)
In terms of long-running processing loops, Servlet containers don't help you out beyond notifying your ServletContextListener when the app starts, so you can kick off any long-running tasks.
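If you do go the container route, kicking off the background work from a ServletContextListener looks roughly like this (a minimal sketch; the worker body is a placeholder for your SQS-consuming loop, and the listener still has to be registered in web.xml):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import javax.servlet.ServletContextEvent;
    import javax.servlet.ServletContextListener;

    public class JobWorkerListener implements ServletContextListener {
        private ExecutorService pool;

        public void contextInitialized(ServletContextEvent sce) {
            pool = Executors.newFixedThreadPool(4);
            pool.submit(new Runnable() {
                public void run() {
                    while (!Thread.currentThread().isInterrupted()) {
                        // a blocking receive (e.g. a long-polling queue call) and job processing would go here
                        try {
                            Thread.sleep(1000L);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                }
            });
        }

        public void contextDestroyed(ServletContextEvent sce) {
            pool.shutdownNow(); // interrupt the workers when the webapp is stopped
        }
    }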
It's worth noting that if you're already using Spring, it's relatively easy to switch from a stand-alone app to a container using ContextLoaderListener, so it shouldn't be a problem if you decide later that you need the web stuff.
We recently faced a similar question, as we are hosting a large distributed service on EC2.
In short, we are very happy with Jetty 7 as a container. We use it for our user-facing-www, public-api, and internal-backend-api services. In some cases we use it for non-api services such as a workqueue, simply to expose a bit of status & health info for our monitoring.
The great thing about Jetty (any version) is that it can be configured in ~5 lines of code, with zero external config files, etc. It's not so much a container as an HTTP server that you can embed.
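For illustration, an embedded Jetty server really is only a handful of lines (a minimal sketch against the Jetty 7+ embedded API; the port and handler are placeholders):

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.eclipse.jetty.server.Request;
    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.server.handler.AbstractHandler;

    public class EmbeddedJetty {
        public static void main(String[] args) throws Exception {
            Server server = new Server(8080);
            server.setHandler(new AbstractHandler() {
                public void handle(String target, Request baseRequest, HttpServletRequest request,
                                   HttpServletResponse response) throws IOException, ServletException {
                    response.setContentType("text/plain");
                    response.getWriter().println("OK"); // e.g. a status / health endpoint
                    baseRequest.setHandled(true);
                }
            });
            server.start();
            server.join(); // block the main thread until the server is stopped
        }
    }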
We use Guice for dependency injection, which also favors config-file-less implementations.
Long-lived Java processes are nothing to worry about - you basically bring up your servers / threads / thread pools in your main method and don't call System.exit until you want to shut down explicitly.
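A standalone jobconsumer.jar along those lines can be as small as this (a minimal sketch; fetchNextJob() and process() are stubs standing in for the SQS receive/delete calls and the actual work):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class JobConsumer {
        public static void main(String[] args) {
            final ExecutorService pool = Executors.newFixedThreadPool(4);

            // Shut the workers down cleanly on ctrl-c or instance shutdown.
            Runtime.getRuntime().addShutdownHook(new Thread() {
                public void run() {
                    pool.shutdownNow();
                }
            });

            for (int i = 0; i < 4; i++) {
                pool.submit(new Runnable() {
                    public void run() {
                        while (!Thread.currentThread().isInterrupted()) {
                            String job = fetchNextJob(); // stub: a real receive call would block / long-poll
                            if (job == null) {
                                continue;
                            }
                            process(job);                // stub: the CPU-intensive work
                            // then delete the message from the queue and notify the REST service
                        }
                    }
                });
            }
            // main() returns, but the non-daemon pool threads keep the JVM alive until shutdown.
        }

        private static String fetchNextJob() { return null; } // illustration only
        private static void process(String job) { }           // illustration only
    }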
I am considering developing a web application with Spring and OSGi. It seems like they fit together nicely. What are the options for clustering and load-balancing such an app, and what are the pros and cons of each?
Neither OSGi nor Spring were created to solve problems such as high-availability, clustering or load-balancing. You could, of course, build a clustered and load-balanced system using Spring and OSGi, but you are probably going to need something else as well, such as a way to detect and communicate node failures and load levels.
Since you are building a web application, most likely you will be using one of the many application servers. Good AS products provide clustering for you. Some also provide load balancing. You can also achieve load balancing through a completely independent setup of your own, using Apache for example to front your main application servers.
If you are really bent on creating your own solution, I have seen JGroups being used in multiple products to provide the necessary infrastructure to build a clustering and/or load-balancing solution. Some of the distributed in-memory caching products use JGroups, for example.
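To give a taste of what the JGroups building block provides, joining a cluster and reacting to membership changes is only a few lines (a minimal sketch against the JGroups 3.x API; the cluster name is a placeholder):

    import org.jgroups.JChannel;
    import org.jgroups.Message;
    import org.jgroups.ReceiverAdapter;
    import org.jgroups.View;

    public class ClusterNode {
        public static void main(String[] args) throws Exception {
            JChannel channel = new JChannel(); // default UDP-based stack
            channel.setReceiver(new ReceiverAdapter() {
                public void viewAccepted(View view) {
                    // called whenever a node joins or leaves - the hook for failover or rebalancing logic
                    System.out.println("Cluster view: " + view);
                }
                public void receive(Message msg) {
                    System.out.println("Received: " + msg.getObject());
                }
            });
            channel.connect("demo-cluster");            // cluster name is a placeholder
            channel.send(new Message(null, "hello"));   // null destination = broadcast to all members
            Thread.sleep(5000L);
            channel.close();
        }
    }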
Talking about distributed caches, products such as Ehcache can help with scaling and load-balancing problems.