Scalability of a single server for running a Java Web application


I want to gain more insight into the scale of workload that a single-server Java web application deployed to a single Tomcat instance can handle. In particular, let's pretend that I am developing a wiki application with a usage pattern similar to Wikipedia's. How many simultaneous requests can my server handle reliably before running out of memory or showing signs of excess stress if I deploy it on a machine with the following configuration:
4-Core high-end Intel Xeon CPU
8GB RAM
2 HDDs in RAID-1 (no SSDs, no PCIe-based solid-state storage)
RedHat or Centos Linux (64-bit)
Java 6 (64-bit)
MySQL 5.1 / InnoDB
Also let's assume that the MySQL DB is installed on the same machine as Tomcat and that all the Wiki data are stored inside the DB. Furthermore, let's pretend that the Java application is built on top of the following stack:
SpringMVC for the front-end
Hibernate/JPA for persistence
Spring for DI and Security, etc.
If you haven't used the exact configuration but have experience in evaluating the scalability of a similar architecture, I would be very interested in hearing about that as well.
Thanks in advance.
EDIT: I think I have not articulated my question properly. I mark the answer with the most up votes as the best answer and I'll rewrite my question in the community wiki area. In short, I just wanted to learn about your experiences on the scale of workload your Java application has been able to handle on one physical server as well as some description regarding the type and architecture of the application itself.

You will need a group of tools:
Load-testing tool: JMeter can be used.
Monitoring tool: this is used to monitor the load on various resources. There are many paid as well as free ones, e.g. JProfiler, VisualVM.
Collection and reporting tool. (I have not used any.)
With the above tools you can find the optimal value. I would approach it in the following way:
Find out the ratio in which the pages are accessed, and what the background processes are and how often they run.
Configure JMeter accordingly (for the ratios) and monitor performance under the applied load (time to serve a page can be measured in JMeter); monitor other resources using the monitoring tool. Also check the error ratio. (NOTE: you need to decide what error ratio is unacceptable.)
Keep increasing the load step by step and keep recording the numbers of interest until the server fails completely.
You can decide on the optimal value based on many criteria: low error rate, maximum serving time, etc.
JMeter supports many ways of applying load.
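The step-by-step ramp described above can be sketched in plain Java to show the bookkeeping a tool like JMeter does for you. This is only an illustration under stated assumptions: the class and method names (SteppedLoadSketch, runStep, ramp) are hypothetical, the "request" is a stand-in that sleeps for 2 ms instead of calling a real server, and it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BooleanSupplier;

class SteppedLoadSketch {

    /** Outcome of one load step. */
    static final class StepResult {
        final int threads;
        final int requests;
        final int errors;
        final double avgMillis;

        StepResult(int threads, int requests, int errors, double avgMillis) {
            this.threads = threads;
            this.requests = requests;
            this.errors = errors;
            this.avgMillis = avgMillis;
        }

        double errorRatio() {
            return requests == 0 ? 0.0 : (double) errors / requests;
        }
    }

    /** Fires requestsPerThread calls from each worker; a call fails if it returns false or throws. */
    static StepResult runStep(int threads, int requestsPerThread, BooleanSupplier request) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<long[]>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    long nanos = 0;
                    long errors = 0;
                    for (int i = 0; i < requestsPerThread; i++) {
                        long start = System.nanoTime();
                        boolean ok;
                        try {
                            ok = request.getAsBoolean();
                        } catch (RuntimeException e) {
                            ok = false;
                        }
                        nanos += System.nanoTime() - start;
                        if (!ok) errors++;
                    }
                    return new long[] {nanos, errors};
                }));
            }
            long totalNanos = 0;
            long totalErrors = 0;
            for (Future<long[]> f : futures) {
                long[] r = f.get();
                totalNanos += r[0];
                totalErrors += r[1];
            }
            int total = threads * requestsPerThread;
            return new StepResult(threads, total, (int) totalErrors, totalNanos / 1e6 / total);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("load step failed", e);
        } finally {
            pool.shutdown();
        }
    }

    /** Increases concurrency step by step and stops once the agreed error ratio is exceeded. */
    static List<StepResult> ramp(int[] threadSteps, int requestsPerThread,
                                 double maxErrorRatio, BooleanSupplier request) {
        List<StepResult> results = new ArrayList<>();
        for (int threads : threadSteps) {
            StepResult r = runStep(threads, requestsPerThread, request);
            results.add(r);
            if (r.errorRatio() > maxErrorRatio) {
                break;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        BooleanSupplier fakeRequest = () -> {
            try {
                Thread.sleep(2); // stands in for an HTTP call to the server under test
                return true;
            } catch (InterruptedException e) {
                return false;
            }
        };
        for (StepResult r : ramp(new int[] {1, 2, 4, 8}, 20, 0.01, fakeRequest)) {
            System.out.printf("threads=%d avg=%.2f ms errorRatio=%.3f%n",
                    r.threads, r.avgMillis, r.errorRatio());
        }
    }
}
```

The numbers recorded per step (average time, error ratio) are the ones you would plot against the load to pick the optimal value.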

To be honest, it's almost impossible to say. There are probably about 3 ways (off the top of my head) to build such a system, and each would have fairly different performance characteristics. Your best bet is to build and test.
First, try to get some idea of the estimated volumes you'll have and the latency constraints that you'll need to meet.
Come up with a basic architecture and implement a thin slice end to end through the system (ideally the most common use case). Use a load testing tool (like Grinder or Apache JMeter) to inject load and start measuring the performance. If the performance is acceptable (be conservative: your simple implementation will likely include less functionality and be faster than the full system), continue building the system and testing to make sure you don't introduce a major performance bottleneck. If not, come up with a different design.
If your code is reasonable, the bottleneck will likely be the database, somewhere in the region of 100s of DB ops per second. If that is insufficient, then you may need to think about caching.
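As a rough illustration of the caching idea, here is a minimal in-process LRU cache built on java.util.LinkedHashMap in access-order mode. The class name LruCacheSketch and the page keys are hypothetical, and the sketch is not thread-safe; in a web application you would wrap it (for example with Collections.synchronizedMap) or use a dedicated caching library instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Keeps at most maxEntries entries and evicts the least recently used one. */
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCacheSketch(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, which gives LRU behaviour
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after every insertion.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> pages = new LruCacheSketch<>(2);
        pages.put("Main_Page", "<html>main</html>");
        pages.put("Java", "<html>java</html>");
        pages.get("Main_Page");                      // touch: Main_Page is now most recently used
        pages.put("Tomcat", "<html>tomcat</html>");  // evicts "Java", the least recently used entry
        System.out.println(pages.keySet());
    }
}
```

Every hit in such a cache is one less database operation, which is why caching helps when the database is the bottleneck.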

Definitely take a look at Spring Insight for performance monitoring and analysis.

English Wikipedia has 14GB of data. An 8GB memory cache would have a very high hit/miss ratio, and I think the remaining hard disk reads would be well within the disks' capacity. Therefore, the app is most likely network bound.
English Wikipedia has about 3000 page views per second. It is possible that Tomcat can handle the load with careful tuning, and that the network has enough throughput to serve the traffic.
So the entire wikipedia site can be hosted on one moderate machine? Probably not. Just an idea.
-
http://stats.wikimedia.org/EN/TablesWikipediaEN.htm
http://stats.wikimedia.org/EN/TablesPageViewsMonthly.htm

A single Tomcat instance cannot be spread over multiple machines. If you really are concerned about scalability, you must consider what to do when your application outgrows a single machine.

Related

Decision to go for distributed application?

I have a legacy product in the financial domain, running on Tomcat 6. We get a large number of requests, around 10k per hour. I am wondering, at a high level, whether I should go for a distributed application where my MVC component is on one system and the service/DAO layer is on another box (using Spring Remoting or EJB).
The reason I am planning to go in this direction is so that the load is distributed and performance improves; with this the application also becomes scalable.
I only see the positive side of it, and somehow I am not able to figure out what the negative aspects could be.
If some expert can help: what criteria should I consider when deciding on a distributed model, and what are its pros and cons? I also tried googling for stats on how much load a given web server (Tomcat in my case) can handle efficiently with given hardware (16 GB RAM, Windows 7, processor).
Yes, I am going to do a POC in which I will measure performance with the distributed model vs. without, but high-level input will be highly appreciated.

It is impossible to answer this question without more details: how long does it take to reply to one request on the current server? How many resources are allocated for one request?
Having 10k requests per hour means ~3 requests per second. If performing the necessary operations and replying to a request takes ~300ms on 1 CPU, one simple machine is totally fine. This is simple math, and it doesn't always work. I guess you still have peaks within those 10k requests per hour and they aren't evenly distributed.
If we assume that one reply can take up to 1 second, then you can handle as many replies per second as your system has CPUs (given that the CPU would be the bottleneck). If the CPU isn't the bottleneck for your application server, there's probably something wrong. You should set up the database(s) on a different machine and only perform computation tasks on the application server machine.
Especially in the financial sector with legacy software, I wouldn't try splitting a running product. How old is the current server? I believe that a new server should be cheaper than rewriting an application. Unless you expect 50-100k requests per hour very soon, I don't think splitting up such small parts makes sense.
Instead, run it on up-to-date server hardware, split the application server and the data storage, and you should be fine.
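The back-of-the-envelope math above can be written out as a small calculation. This is a sketch only: the class and method names are hypothetical, and it assumes, as the estimate does, that the CPU is the bottleneck and each request occupies one CPU for its whole service time.

```java
class CapacityEstimate {

    /** Average arrival rate in requests per second for a given hourly volume. */
    static double requestsPerSecond(long requestsPerHour) {
        return requestsPerHour / 3600.0;
    }

    /** Upper bound on throughput when each request occupies one CPU for cpuSecondsPerRequest. */
    static double maxRequestsPerSecond(int cpus, double cpuSecondsPerRequest) {
        return cpus / cpuSecondsPerRequest;
    }

    /** Fraction of the total CPU capacity consumed at the given arrival rate. */
    static double utilization(double arrivalRate, int cpus, double cpuSecondsPerRequest) {
        return arrivalRate / maxRequestsPerSecond(cpus, cpuSecondsPerRequest);
    }

    public static void main(String[] args) {
        double rate = requestsPerSecond(10_000);
        System.out.println("average arrival rate (req/s): " + rate);
        System.out.println("capacity with 1 CPU at 0.3 s/request (req/s): " + maxRequestsPerSecond(1, 0.3));
        System.out.println("utilization of that single CPU: " + utilization(rate, 1, 0.3));
        System.out.println("capacity with 4 CPUs at 1 s/request (req/s): " + maxRequestsPerSecond(4, 1.0));
    }
}
```

With 10k requests per hour the average rate is about 2.8 requests per second, which uses roughly 83% of one CPU at 300 ms per request. That leaves little headroom for peaks, which is why the average alone is not enough to size the machine.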

I am wondering at a high level whether I should go for a distributed application where my MVC component is on one system and service/DAO on another box (can use Spring Remoting/EJB).
I'm not sure what you mean by "system" in this context, but if it means that you are planning to run your application on two servers, one dedicated to presentation and the other dedicated to the business layer, keep in mind that a simpler approach (and probably more suitable for your app) is to build a co-located architecture.
Basically, the idea is to replicate your app on several servers (at least two) and put a load balancer in front of them that routes the incoming requests among the available servers.
All servers share the same database instance. This will give you horizontal scalability and will also improve the availability of your system.
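To make the routing idea concrete, here is a minimal round-robin selection sketch. It only illustrates how requests could be spread over identical replicas; the class name and the host names are hypothetical, and in practice the balancing is normally done by a dedicated proxy or hardware load balancer rather than by application code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinSketch {
    private final List<String> backends;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSketch(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = new ArrayList<>(backends);
    }

    /** Returns the backend for the next request; safe to call from many threads. */
    String next() {
        // floorMod keeps the index valid even after the counter overflows to negative values.
        int index = Math.floorMod(counter.getAndIncrement(), backends.size());
        return backends.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical host names for two identical copies of the application.
        RoundRobinSketch balancer = new RoundRobinSketch(Arrays.asList("app1:8080", "app2:8080"));
        for (int i = 0; i < 4; i++) {
            System.out.println("request " + i + " goes to " + balancer.next());
        }
    }
}
```

Because every replica runs the whole application, no remote calls are needed between the presentation and business layers.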
I only see the positive side of it but somehow not able to figure out what can be the negative aspect of it?
Distributing your business logic will probably involve refactoring your application code; if the system is currently working well, you will almost certainly introduce some bugs.
The necessary remote calls will add latency, and executing your business logic on several servers doesn't resolve performance problems in the presentation tier.
In Expert One-on-One J2EE Development Without EJB (p. 65), you can find a good discussion of why not to distribute your business logic.

Performance / Stress Testing Java EE applications

It's difficult to find all bottlenecks, deadlocks, and memory leaks in a Java application using unit tests alone.
I'd like to add some level of stress testing for my application. I want to test the limits of the application and determine how it reacts under high load.
I'd like to gauge the following:
Availability under high load
Performance under high load
Memory / CPU / Disk Usage under high load
Does it crash under high load or react gracefully
It would also be interesting to measure and contrast such characteristics under normal load.
Are there well-known, standard techniques for stress testing?
I am looking for help / direction in setting up such an environment.
Ideally, I would like to run these tests regularly, so that we can determine if recent deliveries impact performance.

I am a big fan of JMeter. You can set up calls directly against the server just as users would access it. You can control the number of users (concurrent threads) and accesses. It can follow a workflow, scraping pertinent information from page to page. It takes 1 to 2 days to learn it well enough to be productive. (You can do the basics within an hour of downloading!)
As for seeing how all that affects the server, that is a tougher question. I have used professional tools from CA and IBM. (I am drawing a blank on specific tool names - maybe due to PTSD!) I have used out-of-the-box JVM profilers. I have used native linux and windows tools. If you are not too concerned about profiling what parts of your application causes issues, then you can just use the native tools for your OS to monitor CPU/Memory/IO.

One of our standard techniques is running stepped-ramp load tests to measure scalability.

There are mainly two approaches to performance testing an application:
Performance test and system test.
How do they differ? Well, it's easy: it's based on their scope. A performance test's scope is limited and highly unrealistic. Example: test the IncomingMessage handler of some App X; for this you would set up a test which sends messages to this handler on an X, Y, Z basis. This approach will help you pin down problems and measure the performance of individual, limited zones of your application.
So this should now take you to the question: am I to benchmark and performance test each one of the components in my app individually? Yes, if you believe the component's behavior is critical and changes in newer versions are likely to induce performance penalties. But if you want to get a feel for your application as a whole, the bunch of components interacting with each other, and see how performance comes out, then you need a system test.
A system test will always try to replicate a customer production environment as closely as possible. Here you can observe what the real-world performance of your app is like and act accordingly to correct it.
So, in conclusion, set up a system test for your app and measure what you said you wanted to measure. Then stress the system as a whole and see how it reacts; you will be surprised by the outcome.
Finally, performance test individually any critical components you have identified or would like to keep track of in your app.
As a general guideline, when doing performance testing you should always:
1.- Get a baseline for the system on an idle state.
2.- Get a baseline for the system under normal expected load.
3.- Get a baseline for the system under stress conditions.
Keep in mind that normal-load results should be extrapolated to stress conditions; a nice system will always be one that scales linearly.
Hope this helps.
P.S. Tests, environment setup and even data collection should be as fully automated as possible; this will help you run them on a regular basis and spend time diagnosing performance problems rather than setting up the test.
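One way to automate the comparison against a recorded baseline is sketched below. It is an assumption-laden example, not a prescribed method: the class name, the choice of the 95th percentile, the 10% tolerance and all latency values are hypothetical and only serve to show the mechanics.

```java
import java.util.Arrays;

class BaselineCheckSketch {

    /** Nearest-rank percentile (percent in 0..100) of the given latency samples. */
    static long percentile(long[] latenciesMillis, int percent) {
        if (latenciesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        long[] sorted = latenciesMillis.clone();
        Arrays.sort(sorted);
        // Integer form of ceil(percent / 100 * n); avoids floating point rounding surprises.
        int rank = (int) (((long) percent * sorted.length + 99) / 100);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** True when the current value exceeds the baseline by more than the tolerance (0.10 = 10%). */
    static boolean regressed(long baselineMillis, long currentMillis, double tolerance) {
        return currentMillis > baselineMillis * (1.0 + tolerance);
    }

    public static void main(String[] args) {
        // Hypothetical samples: a stored baseline run and the run for the latest delivery.
        long[] baselineRun = {40, 42, 45, 50, 55, 60, 61, 70, 80, 120};
        long[] currentRun = {41, 44, 47, 52, 58, 66, 70, 85, 110, 190};
        long baselineP95 = percentile(baselineRun, 95);
        long currentP95 = percentile(currentRun, 95);
        System.out.println("baseline p95 = " + baselineP95 + " ms, current p95 = " + currentP95 + " ms");
        System.out.println("regression: " + regressed(baselineP95, currentP95, 0.10));
    }
}
```

A check like this can run after every automated test execution, so a delivery that degrades performance is flagged without anyone reading the raw numbers.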

As mentioned by others, tools like JMeter (and commercial tools like LoadRunner and more) can help you generate concurrent test load.
Many monitoring tools (some provided with the JDK, like Mission Control; some other open source / free tools, like JavaMelody; and many commercial ones) can help you do generic monitoring of various system resources (memory, CPU, network bandwidth) and JVM resources (heap, CPU, GC overhead, etc.).
But to really identify bottlenecks within your code, as well as in other dependencies of your application (like external services invoked, DB queries/updates, etc.), in a quick and easy way, I recommend considering a good APM (Application Performance Monitoring) tool like AppDynamics, DynaTrace and more. They can help you pinpoint bottlenecks at the level of a specific request, highlight slower parts of the app, generate percentile metrics at the level of an individual service endpoint or component / method, etc. They can be immensely useful if you are dealing with very high numbers of concurrent users and stringent response time NFRs. They help uncover many bottlenecks across the layers of your application. Many people even configure these tools in production (they are expected to cause 2-3% overhead, but in my view that is worth it for the benefits they provide): production logging is not at debug level by default, so once some errors or slowness are observed, it is often extremely difficult to reproduce them in lower environments, or to debug them in the absence of debug-level logs from the period in question.

There's no one tool to tackle this as far as I know, so build your own environment:
Load Injecting & Scripting: JMeter, SOAP UI, LoadUI
Scheduling Tests & Automation: Jenkins, Rundeck
Analytics on transaction data, resources, application performance logs: AppDynamics, ElasticSearch, Splunk
Profiling: AppDynamics, YourKit, Java Mission Control, VisualVM

Java applications on Oracle Exadata

For reasons that are beside the point, a company has bought an Exadata Eighth Rack. Some of the managers thought that this would improve the performance of current applications. The problem is that hardly any application does intensive database work (yes, this is a good moment for looking at facepalm animated gifs). So, at the moment, migrations have shown only a little benefit.
The question is obvious. Most of the applications are written in Java, and some of them make intensive use of Solr and Cassandra. From what I know, Exadata is intended for storing data, while Exalogic can hold applications too. Anyway, I'm wondering if there is some way of taking advantage of the mentioned infrastructure.

Replace Solr with Oracle Text.
Before I get down-voted: normally I would not recommend replacing existing code built with a popular, open-source program with a seldom-used, proprietary product. But if you want to use a lot of space and CPU on your database servers then Oracle Text can definitely help.
As more generic advice, the primary role of a database is not to store data. A file system can do that. Databases are built to join data. If an application is reading a large amount of data and doing ad hoc joins, those are the jobs you want to move to the database.

Exadata -> Oracle Database extreme performance.
Exalogic -> Fusion Middleware extreme performance. (Java goes here)
Your best move will be refactoring the application to put as much workload as possible on the DB (PL/SQL).
Another thing I could think of, but this would be a radical approach I have never really tried it myself (Yes I work with Exadatas too) maybe you can give it a shot and let us know here...
What about using all those GBs on the Exadata's RAM and start tuning your Java application's latency? I mean with that gruesome amount of Memory you can try and set a real nice amount of heap and avoid Garbage Collection induced latency. Please do let me know here what comes out if you actually try this.

Which protocol do the Java applications use to connect to Oracle?
If it's not IPC (inter-process communication, aka BEQUEATH, aka shared memory) but, say, TCP, and you have many fast and tiny roundtrips, then this would be your low-hanging fruit: eliminate the network stack.
edit: just realized that exadata cannot run java applications by default (only ODA does) - so it wouldn't be possible to make use of IPC. However, perhaps you're able to test the impact of IPC in one of your applications using the former infrastructure?

Exadata cannot host any customer application. You cannot install anything there. You can only host Oracle Database on Exadata.
This means you can use database features like DBFS (a file system on top of the Oracle database) and the Java option (storing and executing Java code in the database). But you need to check which options you have a license for. Also, the internal JVM is used, which cannot be customized or upgraded.
Exadata is a database appliance designed to work with large amounts of differently accessed data in a very effective and manageable way.

Java Webapp Performance Issues

I have a web application, made entirely with Java. The webapp doesn't use any graphical / model framework; instead, it follows the Model-View-Controller pattern. It's built only on the Servlet specification (Servlet ver. 2.4).
The webapp has been in development since 2001, and it's very complex. Initially, it was built to work with Tomcat 4.x/5.x. Currently, it runs on Tomcat 6.x. But we are still having memory leaks.
In depth, the specifications of the webapp can be summarized as:
Uses Servlet v. 2.4 Specification.
It doesn't use Any Framework
It doesn't use JavaEE (Not EJB)
It's based on JavaSE (With Servlets)
Works only on IE 6+ (because of its age)
Infrastructure Specification
Currently, the webapp runs in three environments:
First
IBM Server (I don't remember exactly the model)
Intel Xeon 2.4 Ghz
32GB RAM
1TB HDD
Tomcat (Version 6) is configured to use 8GB of RAM
Second
Dell Server
Intel Xeon 2.0Ghz
4GB RAM
500GB HDD
Tomcat (Version 5.5) is configured to use 1.5GB of RAM
Third
Dell Server
Amd Opteron 1214 2.20Ghz
4GB RAM
320GB HDD
Tomcat (Version 6) is Configured to use 1.5GB of RAM
Database specification
The webapp uses SQL Server 2008 R2 Express Edition as its DBMS, except for the user of the first server specification, who uses SQL Server 2008 R2 Standard Edition. For connection pooling, the app uses Apache DBCP.
Problem
Well, it has very serious performance issues. The webapp slows down continually and often denies service. The only way to recover the app is to restart the Apache Tomcat service.
During a performance audit, I found several programming issues (like database connections that are never closed, and excessive use of the Vector collection [instead of ArrayList]).
I want to know how I can improve the performance of the app, and which applications can help me monitor Tomcat's performance and the webapp's memory usage.
All suggestions are gladly accepted.

You could also try stagemonitor. It is an open source performance monitoring library. It records request response times, JVM metrics, request details including a call stack (profile) of the called methods during the request and more. Because of the low overhead, you can also use it in production.
The tuning procedure would be the following.
Identify slow requests with the Request Dashboard
Analyze the stack trace of the request with the Request Detail Dashboard to find out about slow methods
Dive into your code and try to optimize those slow methods
You can also correlate some metrics like the throughput or number of sessions with the response time or cpu usage
Analyze the heap with the JVM Memory Dashboard
Note: I am the developer of stagemonitor.

I would start with some tools that can help you profile the application. Since you are developing a webapp, start with Lambda Probe and JavaMelody.
The first step is to determine the conditions under which the app starts to behave oddly. Ask yourself a few questions:
Do performance issues arise right after the application starts, or over time?
Are performance issues correlated with the number of client requests?
What is the real performance problem: high load on the server or lack of memory? (Note that they are related, so check which one starts first.)
Are there any background processes which are performing some massive operations? Are they scheduled to run at some particular time period?
Try to find some clues before going deep into the code. It will help you narrow down the possible causes.
As Joshua Bloch has stated in his book "Effective Java", performance issues are rarely the effect of some minor mistakes in source code (although, of course, misuse of Java constructs can lead to disaster). Usually the cause is bad system (API) architecture.
The last suggestion, based on my experience: try not to think that high memory consumption is something bad. Tomcat will use as much memory as the operating system and JVM will let it (not more than the max settings), and only when it needs more will the JVM perform garbage collection. So a typical (proper!) graph of memory consumption looks like a saw. If you are dealing with a memory leak, then the graph will keep increasing constantly and indefinitely. This is the most commonly misunderstood aspect of memory leaks, so keep it in mind.
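The saw-versus-leak distinction above can be checked roughly in code. The sketch below is a heuristic under stated assumptions: heap usage is sampled periodically, the low points of the saw are taken as the level just after a collection, and a leak is suspected only when those low points keep rising. The class and method names are hypothetical and the sample values are made up; the only platform API used is MemoryMXBean from java.lang.management.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class HeapTrendSketch {

    /** Currently used heap in bytes, as reported by the JVM. */
    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Local minima of the series, i.e. the heap level just after each collection. */
    static List<Long> troughs(long[] samples) {
        List<Long> result = new ArrayList<>();
        for (int i = 1; i < samples.length - 1; i++) {
            if (samples[i] < samples[i - 1] && samples[i] <= samples[i + 1]) {
                result.add(samples[i]);
            }
        }
        return result;
    }

    /** Heuristic: suspect a leak when there are at least three troughs and each is higher than the last. */
    static boolean looksLikeLeak(long[] samples) {
        List<Long> lows = troughs(samples);
        if (lows.size() < 3) {
            return false;
        }
        for (int i = 1; i < lows.size(); i++) {
            if (lows.get(i) <= lows.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up samples in MB: a healthy saw and a saw whose floor keeps rising.
        long[] healthy = {10, 50, 90, 12, 55, 95, 11, 60, 92, 10, 40};
        long[] leaking = {10, 50, 90, 30, 70, 110, 50, 90, 130, 70, 100};
        System.out.println("healthy series looks like a leak: " + looksLikeLeak(healthy));
        System.out.println("leaking series looks like a leak: " + looksLikeLeak(leaking));
        System.out.println("used heap right now (bytes): " + usedHeapBytes());
    }
}
```

In a real application the samples would come from calling usedHeapBytes() on a timer, or from the graphs of a monitoring tool, over a long enough period to include several collections.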
To be honest - we cannot help you much further. Those are just pointers, now you will have to make extensive research to figure out the cause :)

The general solution is to use a profiler e.g. YourKit, with a realistic workload which reproduces the problem.
What I do first is a CPU-only profile, then a memory-only profile, and finally a CPU & memory profile at once (I then look at the CPU profile results).
YourKit can also monitor your high-level operations, such as Java EE resources and JDBC connections. I haven't tried these as I don't use them. ;)
It can be a good idea to improve the efficiency even if it's not the cause of the problem, as it will reduce the amount of "noise" in these profiles and make your issues more obvious.
You could try increasing the amount of memory available, but I suspect it will just delay the problem.

Ok. So I have seen huge Java applications run on lesser configurations. You should try to do the following:
First, connect a profiler to your application and see which part of your application takes the most time. You can use JProfiler or Eclipse MAT (I personally prefer JProfiler). Also try to take a look at the objects taking the most memory. This will help you narrow down the parts which you need to rewrite to improve performance.
Once you have taken a look at the memory leaks, update your application to use a 64-bit JDK (assuming it does not already do so).
Take a look at your JVM arguments and optimize them.

You can try the open source tool Webapp Watcher in order to identify where in the code the performance issue is.
You first have to add a filter to the webapp (as explained here) in order to record metrics, then import the logs into the WAW Analyzer tool and follow the steps described in the doc to find where the potential performance issue is in the code.

how to increase the performance of an application

How can we increase the performance of an application? My application is written using Java, Hibernate and Servlets, and I have used WSDL for web services. I have executed some of the tests on a Linux machine so that I can get a proper TPS figure for the execution.
Still, I am not satisfied with the performance.
So what steps should I try in order to increase the performance?
Adding to the above, I have run code coverage and used FindBugs on the code for each and every test and every service I have written.
Individual suggestions are invited.
Thanks.

Profile your application, and remove all of your bottlenecks.
In addition, or better before, take a day or two and read as much from the Java Performance Tuning newsletters as you understand.

You should monitor your application with a tool like VisualVM, JProfiler etc. to determine the performance bottleneck(s). It is pointless to tune the application without knowing where the actual performance problems are located.
In a professional environment, I suggest dynaTrace that can show you performance bottlenecks along the execution path. The tool can show you exactly where the application spends its time.

Is the performance related to disk I/O or network I/O? In a high throughput system (from DB point of view) Hibernate might not be the best way to go. If you have a lot of writes I would recommend you use a different mechanism to write to database -- perhaps simply switching to simple JDBC might speed it up?
Secondly, is it the case that your web services are taking too long to get back with results? SOAP is not really the fastest protocol; have you looked at something like REST, maybe coupled with JSON?
