Configure Tomcat for multiple simultaneous SOAP requests

I'm very much a Tomcat newbie, so I'm guessing that the answer to this is pretty straightforward, but Google is not being friendly to me today.
I have a Java web application mounted on Apache Tomcat. Whilst the application has a front page (for diagnostic purposes), the application is really all about a SOAP interface. No client will ever need to look up the server's web page. The clients send SOAP requests to the server, which parses the requests and then looks up results in a database. The results are then passed back to the clients, again over SOAP.
In its default configuration, Tomcat appears to queue requests. My experiment consisted of installing the client on two separate machines pointing at the same server and running a search at exactly the same time (well, one was 0.11 seconds after the other, but you get the picture).
How do I configure the number of concurrent request threads?
My ideal configuration would be to have X request threads, each of which recycles itself (i.e. calls destructor and constructor and recycles its memory allocation) every Y minutes, or after Z requests, whichever is the sooner. I'm told that one can configure IIS to do this (although I also have no experience with IIS), but how would you do this with Tomcat?
I'd like to be able to recycle threads because Tomcat seems to be grabbing memory when a request comes in and not releasing it, which means that I get occasional (but not consistent) Java Heap Space errors when we are approaching the memory limit (which I have already configured to be 1GB on a 2GB server). I'm not 100% sure if this is due to a memory leak in my application, or just that the tools that I'm using use a lot of memory.
Any advice would be gratefully appreciated.
Thanks,
Rik

By default, Tomcat can handle on the order of 150 to 200 concurrent HTTP requests (the connector's default thread limit depends on the Tomcat version). This is fully configurable, and the right value depends on your server spec and application.
However, if your app has to handle 'bursts' of connections, I'd recommend looking into Tomcat's min and max "spare" threads. These are threads actively waiting for a connection. If there aren't enough waiting threads, Tomcat has to allocate more (which incurs a slight overhead), so you might see a delay.
Also, have a look at my answer to this question which covers how to configure the connector:
Tomcat HTTP Connector Threads
In addition, look at basic JVM tuning - especially in relation to heap allocation overhead and GC pause times.
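To make the "spare threads" idea concrete, here is a minimal sketch using only the JDK's ThreadPoolExecutor. It is an analogy, not Tomcat's own executor: the class and method names (SpareThreadsSketch, newPool, poolSizeUnderBurst) are made up for illustration, and the core pool size stands in for the connector's minimum spare threads while the maximum pool size stands in for its thread limit.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class SpareThreadsSketch {

    /** Pool with minSpare pre-started idle threads that may grow to maxThreads. */
    static ThreadPoolExecutor newPool(int minSpare, int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                minSpare, maxThreads, 60, TimeUnit.SECONDS, new SynchronousQueue<>());
        pool.prestartAllCoreThreads(); // create the spare threads before any request arrives
        return pool;
    }

    /** Submits `burst` blocking tasks at once and reports how many threads the pool ended up with. */
    static int poolSizeUnderBurst(int minSpare, int maxThreads, int burst) {
        ThreadPoolExecutor pool = newPool(minSpare, maxThreads);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < burst; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // simulate a request that is still being served
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown(); // let the simulated requests finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads after a burst of 6: " + poolSizeUnderBurst(2, 10, 6));
    }
}
```

With two spare threads and a burst of six slow requests, the pool has to create extra threads on demand, which is the small allocation overhead mentioned above. In this sketch a burst larger than maxThreads is rejected outright; a real connector would normally queue the excess connections instead.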

Related

The load in some Google App Engine instances and very small in others. Why?

As you can see in the following snapshot, the load on some of the dynamic instances is huge (more than 20k requests) while on others it is very small.
Why is this happening? Shouldn't GAE distribute the load uniformly?

If the load were balanced evenly across the active dynamic instances, they would rarely become idle (only when the entire app's traffic dropped to almost nothing), so it would be difficult to shut them down dynamically.
More info here:
https://cloud.google.com/appengine/docs/scaling#scaling_dynamic_instances
https://cloud.google.com/appengine/docs/managing-resources#instances

This is what I got from a Google App Engine expert:
App Engine request scheduling uses several heuristics for routing requests to application instances. At low QPS it stays in affinity scheduling mode and routes the majority of requests to instances that have most recently responded to the health check and handled requests successfully. That would explain why you see this variation in the number of requests for each instance. As you ramp up the application traffic, load should even out across all instances.
I also asked what policy GAE follows to shut down instances, since I see that many of them stay up even when they are not receiving any requests:
Dynamic instances that are not serving requests get garbage collected eventually. However, you only get billed for 15 additional minutes after they serve the last request. Please refer to this doc for additional information on instance billing.
https://cloud.google.com/appengine/kb/billing#different_on_demand_instance_resident

What determines number of simultaneous connections

In a Java servlet environment, which factors limit the number of simultaneous users?
Number of HTTP connections the server can allow per port
Number of HTTP connections the server can allow across several ports (I can have multiple WAS profiles on several HTTP ports)
Number of servlets in pool
Number of threads configured for WAS to use to service connections
RAM available to server (is there any correlation with the number of service threads, assuming no memory leak in the application?)
Are there any other factors?
Edited:
To leave business logic out of the picture, assume I have only one servlet printing one line to Log4j.
Can my Tomcat server handle 6000 simultaneous HTTP connections? Why not (file handles? CPU time per request?)?
Can I have thread pool size as 5000 (do idle threads cost CPU/RAM)?
Can I have oracle connection pool size as 500 connections (do idle connections cost CPU/RAM)?
Does the amount of garbage generated for each connection have an impact? For example, if for each HTTP connection 20KB of objects are created and left behind by Tomcat, then by the time 2500 requests are processed 100MB of heap would be used, and this may trigger a GC pause of 300ms.
Can we say something like this: if Tomcat uses 0.2 sec of CPU time for processing a single HTTP request, then it would be able to handle roughly 500 http connections in a second. So, 6000 connections would need 5 seconds.

Interesting question. Setting aside all the other performance attributes, it ultimately boils down to how much work the servlet does, or how long it takes when its I/O, CPU and memory demands are at their highest. With that in mind, let's go down your list:
Number of HTTP connections the server can allow per port
There is a limit on file descriptors, but whether you hit it depends on how long a servlet takes to complete a request, i.e. the time from receiving the first byte of the request to finishing sending the entire response. If that takes only 1 ms and you are using Netty with persistent connections, you can reach a number far higher than 6000.
Number of servlets in pool
Theoretically far more than 6000. But how many threads are processing your requests? Is there a thread pool working through them? Say you want to increase it to 2000 concurrent threads: does your CPU then suffer from context switching? Is the work I/O bound? If so, context switching makes sense, but you will then hit network limits because many threads are waiting on network I/O. So it ultimately comes down to how much time you spend on each piece of work.
DB
If it is Oracle, good luck with connection management; you definitely need rigorous monitoring here. This is just another limiting factor and can be treated as another form of blocking I/O. As with any I/O, latency and throughput matter, and it becomes a bottleneck the moment it takes longer than the smallest piece of work.
So, finally, you need to break down the following attributes (and possibly more) for each servlet:
Is it CPU bound? If yes, how many cycles does it take, or can that be safely converted to a time unit, e.g. 1 ms for just the compute portion of the work?
Is it I/O bound? If yes, find the corresponding unit in the same way.
and others
A long list of what you have, e.g. CPU, memory, GB/s
Now you know how much work needs to be done. Divide it by what you have and keep tuning until you find the optimum, and along the way identify any attributes you have not yet considered and account for them.

The biggest bottleneck I have experienced is the time it takes to process the request.
The faster you can service a request, the more connections you can handle.
It's a difficult question to answer due to every application being different.
To figure this out for an application I support, I created a unit test that spawns many threads, and I watched the memory usage in VisualVM from Eclipse.
You can see how your memory consumption changes with the number of threads in use.
And you should be able to get a thread dump and see how much memory the thread is using.
You can extrapolate an average out to understand how much RAM you might need for N number of users.
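A rough sketch of such a test, using only the JDK, might look like the following. The class name ThreadMemoryProbe and the byte array standing in for per-request state are assumptions for illustration; the heap figures it prints are approximate because garbage collection can run at any time, so a profiler such as VisualVM remains the better tool for real measurements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadMemoryProbe {

    /** Heap currently in use, in bytes (approximate; GC timing affects it). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Starts threadCount threads that each hold bytesPerThread (at least 1) bytes,
     * samples heap usage while all of them are alive, then lets them finish.
     * Returns the number of threads that actually ran.
     */
    static int run(int threadCount, int bytesPerThread) {
        CountDownLatch allStarted = new CountDownLatch(threadCount);
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger ran = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        long before = usedHeapBytes();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                byte[] requestState = new byte[bytesPerThread]; // stand-in for per-request data
                ran.incrementAndGet();
                allStarted.countDown();
                try {
                    release.await(); // keep the allocation alive while the heap is sampled
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                requestState[0] = 1; // touch the array so it stays reachable until here
            });
            threads.add(t);
            t.start();
        }
        try {
            allStarted.await();
            long during = usedHeapBytes();
            System.out.printf("threads=%d heap before=%d KB, during=%d KB%n",
                    threadCount, before / 1024, during / 1024);
            release.countDown();
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            release.countDown(); // never leave worker threads blocked
        }
        return ran.get();
    }

    public static void main(String[] args) {
        run(50, 256 * 1024);
        run(200, 256 * 1024);
    }
}
```

Running it with increasing thread counts gives a first impression of how memory grows with concurrency, which you can then extrapolate as described above.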
The bottleneck will be a moving target since you'll optimize one area until you can scale larger, then another area will become your bottleneck.
If the response time of the servlet is a bottleneck, you could use some queuing mathematics to determine how many requests can be queued optimally based on the average response time.
http://www4.ncsu.edu/~hp/SSME_QueueingTheory.pdf
Hope this helps.
Updated to address your additional questions:
Can my Tomcat server handle 6000 simultaneous HTTP connections? Why not (file handles? CPU time per request?)?
It's possible but probably not. Also you should probably add a web layer in front of the application server if you plan on doing high volume.
Suppose you have 6000 users all pounding away on your application. Each request a user sends only exists on the server for a moment [hopefully], and your peak thread count may have never reached over 20.
I'd recommend setting up some monitoring to understand how your application performs under real use cases. Check out http://Hawt.io which uses Jolokia to grab JMX metrics via http.
If you're serious about analytics, I'd recommend using something like Graphite to aggregate your JMX metrics. https://github.com/graphite-project/graphite-web
I've written a collector for Jolokia to send metrics to Carbon/Graphite, and may be able to open-source it with approval from my management. Let me know if you are interested.
Can I have thread pool size as 5000 (do idle threads cost CPU/RAM)?
Idle threads are not much to worry about, though setting your thread pool too high could allow your application server to accept too many requests. If this happens you may end up flooding your DB with connections it can't handle, or your memory allocation may not be enough to handle so many requests. This could degrade overall application performance.
Set it too low, and your app server could start queuing requests, again degrading performance.
It's normal to have some queuing during spikes or high-volume times, but you don't want to overload your application server. Check out queuing theory to understand more about this.
Also, this is where having a web server in front of the app server could help you. If you have Apache serve your static content, only dynamic requests will reach the application servers in most cases.
Tuning is very specific to your individual application. I'd recommend staying with the defaults and just optimize your code until you can gather enough data to know which knob should be turned.
Can I have oracle connection pool size as 500 connections (do idle connections cost CPU/RAM)?
Same situation as the application thread pool size. Though your pool size for DB should be much smaller than the app thread count.
500 would be too high for most web applications unless you have very high volume, in which case you may need a DB cluster environment like Oracle RAC.
If the pool is set too high and you start using a lot of connections, your DB hardware will not be able to keep up and you will end up with performance problems on the database server.
The time it takes for a query to return may increase, in turn causing your application response time to increase. The "log jam" effect.
Use profiling or metrics to determine the avg number of active DB connections under normal use, and use that as a baseline for determining the max allowed.
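If your pool does not already expose such a metric, a small gauge like the one below is one way to collect the baseline. This is only a sketch: ActiveConnectionGauge and its method names are hypothetical, and you would have to call acquired() and released() yourself wherever your code borrows and returns a connection.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ActiveConnectionGauge {
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    /** Call when a connection is borrowed from the pool. */
    void acquired() {
        int now = active.incrementAndGet();
        peak.accumulateAndGet(now, Math::max); // remember the highest value seen so far
    }

    /** Call when the connection is returned to the pool. */
    void released() {
        active.decrementAndGet();
    }

    /** Connections in use right now. */
    int active() {
        return active.get();
    }

    /** Highest number of connections that were in use at the same time. */
    int peak() {
        return peak.get();
    }
}
```

Sampling active() periodically gives the average under normal use, and peak() shows how much headroom the configured maximum really needs.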
Does the amount of garbage generated for each connection have an impact? For example, if for each HTTP connection 20KB of objects are created and left behind by Tomcat, then by the time 2500 requests are processed 100MB of heap would be used, and this may trigger a GC pause of 300ms.
The numbers would be different, but yes. Also remember that full GCs are the bigger concern; young-generation (minor) collections usually pause the application only briefly. Check out the "concurrent mark and sweep" and "Garbage first" collectors.
Can we say something like this: if Tomcat uses 0.2 sec of CPU time for processing a single HTTP request, then it would be able to handle roughly 500 http connections in a second. So, 6000 connections would need 5 seconds.
It's not quite that simple: as each request comes in, others are being processed and completed. Check out queuing theory to understand this better.
http://www4.ncsu.edu/~hp/SSME_QueueingTheory.pdf
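As a back-of-the-envelope aid, the two relationships involved can be written down in a few lines. This sketch assumes purely CPU-bound requests for the throughput bound and uses Little's law for the number of requests in flight; the class and method names are invented for illustration, and the input figures are examples rather than measurements.

```java
final class CapacityEstimate {

    /** Upper bound on requests per second when each request needs cpuSecondsPerRequest of CPU. */
    static double maxThroughput(int cores, double cpuSecondsPerRequest) {
        return cores / cpuSecondsPerRequest;
    }

    /** Little's law: average requests in the system = arrival rate x average time in the system. */
    static double averageInFlight(double arrivalsPerSecond, double secondsInSystem) {
        return arrivalsPerSecond * secondsInSystem;
    }

    public static void main(String[] args) {
        // 0.2 s of CPU per request on one core allows at most 5 requests per second.
        System.out.println(maxThroughput(1, 0.2));
        // 500 requests per second that each spend 40 ms in the server keep about 20 in flight.
        System.out.println(averageInFlight(500, 0.04));
    }
}
```

Under these assumptions, 0.2 seconds of CPU per request limits a single core to about 5 requests per second, so reaching 500 per second would need either far less CPU per request or many more cores. The second figure also shows why thousands of users can translate into only a few dozen busy threads when each request is short.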

There is another common bottleneck: the size of the database connection pool. But I have an additional remark: when you exhaust the number of allowed HTTP connections, or the number of threads allowed to serve requests, you will only reject some requests. But when you exhaust memory (too many sessions with too much data, for example), you can crash the whole application.
The difference shows after a short period of heavy load, once the load falls again:
in the first case, the application is up and can serve requests normally
in the second case, the application is down and must be restarted
EDIT :
I forgot to mention real use cases. The biggest problem I have ever found when serving numerous concurrent connections is the quality of the database requests (assuming you use a database). There is no direct impact, since there is no maximum number, but you can easily hog all the database server's resources. Common examples of poor database requests:
no index on a table with a large number of rows
a request (on a big table) that makes no use of any index
the n+1 syndrome: with an ORM, when you map a one-to-many relation to a collection that is not loaded eagerly, even though you always need data from the collection
the load-full-database syndrome: with an ORM, when you map all relations as eager, any single request ends up loading a large quantity of dependent data.
What makes these problems worse is that they cause no harm in tests while the database is young and has few rows; with time and a growing number of rows, performance falls until the application is unusable for more than a few users.
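The n+1 pattern is easy to see once you count round trips. The sketch below does not use a real ORM or database; FakeDb is a made-up in-memory stand-in whose only job is to count how many "queries" each access style issues.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NPlusOneDemo {

    /** Toy "database": orders per customer id, plus a counter of round trips. */
    static final class FakeDb {
        final Map<Integer, List<String>> ordersByCustomer = new HashMap<>();
        int queries = 0;

        List<Integer> findCustomerIds() {
            queries++;
            return new ArrayList<>(ordersByCustomer.keySet());
        }

        List<String> findOrders(int customerId) {
            queries++;
            return ordersByCustomer.get(customerId);
        }

        /** One round trip for many customers, like a join or an IN (...) query. */
        Map<Integer, List<String>> findOrdersFor(List<Integer> customerIds) {
            queries++;
            Map<Integer, List<String>> result = new HashMap<>();
            for (int id : customerIds) {
                result.put(id, ordersByCustomer.get(id));
            }
            return result;
        }
    }

    static FakeDb dbWith(int customers) {
        FakeDb db = new FakeDb();
        for (int id = 0; id < customers; id++) {
            db.ordersByCustomer.put(id, Arrays.asList("order-" + id));
        }
        return db;
    }

    /** Lazy style: one query for the parents, then one more per parent. */
    static int queriesLazy(FakeDb db) {
        db.queries = 0;
        for (int id : db.findCustomerIds()) {
            db.findOrders(id);
        }
        return db.queries;
    }

    /** Batched style: one query for the parents, one for all children. */
    static int queriesBatched(FakeDb db) {
        db.queries = 0;
        db.findOrdersFor(db.findCustomerIds());
        return db.queries;
    }
}
```

With 100 customers the lazy style issues 101 queries and the batched style 2. The difference is invisible on a test database with a handful of rows and grows linearly with the data.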

Number of HTTP connections the server can allow per port
Unlimited except by kernel resources, e.g. FDs, socket buffer space, etc.
Number of HTTP connections the server can allow across several ports (I can have multiple WAS profiles on several HTTP ports)
As the number of connections per port is unlimited, this is irrelevant.
Number of servlets in pool
Irrelevant except insofar as it increases the rate of incoming requests.
Number of threads configured for WAS to use to service connections
Relevant in an indirect way, see below.
RAM available to server (is there any correlation with the number of service threads, assuming no memory leak in the application?)
Relevant if it limits the number of threads below the configured number of threads mentioned above.
The fundamental limitation is request service time. The shorter, the better. The longer it is, the longer the thread is tied up in that request, the longer wait queues get, ... Queuing theory dictates that the 'sweet spot' is no more than 70% server utilization. Beyond that, wait times grow rapidly with increasing utilization.
So anything that contributes to request service time is significant: for example, thread pool size, connection pool size, concurrency bottlenecks, ...
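To see why wait times take off at high utilization, here is a small sketch based on the textbook M/M/1 formula for mean time in the queue, Wq = rho / (1 - rho) * S. That model (a single server with random arrivals and exponentially distributed service times) is a simplifying assumption and not a description of any particular application server; the class name is invented.

```java
final class QueueWait {

    /** Mean time a request waits in queue for an M/M/1 queue, in the same unit as serviceTime. */
    static double meanWait(double utilization, double serviceTime) {
        if (utilization < 0 || utilization >= 1) {
            throw new IllegalArgumentException("utilization must be in [0, 1)");
        }
        return utilization / (1 - utilization) * serviceTime;
    }

    public static void main(String[] args) {
        for (double rho : new double[] {0.5, 0.7, 0.9, 0.95}) {
            System.out.printf("utilization %.2f -> mean wait %.1f x service time%n",
                    rho, meanWait(rho, 1.0));
        }
    }
}
```

In this model the mean wait is about 2.3 service times at 70% utilization, 9 at 90% and 19 at 95%, which is the rapid growth described above.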

You should also consider that the use case itself is limiting the amount of concurrency. Imagine a collaborative environment where the order of actions matters. This forces you to synchronize actions - even if you would have been able to process all of them at once.
In Java land this could be as simple as sharing a single resource with blocking access (e.g. a shared Random number generator rather than one per thread, shared Vectors, or concurrent structures like ConcurrentHashMap).
The more synchronization the less you will be able to fully utilize your server hardware.
So apart from running out of memory or saturating the CPU or hitting the garbage collection limit this synchronization might be a problem which does not only need to be solved in your code but maybe even requires you to soften some requirements of the high level workflow.
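The random number generator case is the easiest one to illustrate. A single java.util.Random is safe to share between threads, but every call competes for the same internal state; ThreadLocalRandom gives each thread its own generator. The class name RandomSources below is made up for the example.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

final class RandomSources {
    // One instance for the whole application: correct, but all threads contend on it.
    private static final Random SHARED = new Random();

    /** Draws from the shared generator; becomes a hot spot under heavy concurrency. */
    static int sharedNext(int bound) {
        return SHARED.nextInt(bound);
    }

    /** Draws from the calling thread's own generator; no cross-thread contention. */
    static int perThreadNext(int bound) {
        return ThreadLocalRandom.current().nextInt(bound);
    }
}
```

Both methods return values in the same range, so switching from one to the other is a local change that removes one shared resource from the request path.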

Seeing point 6, you can use these tools to check whether your hardware is the bottleneck. Assuming that you're on Linux, you can use vmstat to see statistics on your RAM usage, top or atop (depending on your distro) to see which processes are taking a toll on your CPU and RAM, nload and iftop to see what is consuming network bandwidth, and iotop to see what is reading from and writing to your disk.

How to properly throttle web requests to external systems?

My Java web application pulls some data from external systems (JSON over HTTP) both live whenever the users of my application request it and batch (nightly updates for cases where no user has requested it). The data changes so caching options are likely exhausted.
The external systems have some throttling in place, the exact parameters of which I don't know, and which likely change depending on system load (e.g., at peak times 10 requests per second from one IP address, at off-peak times 100 requests per second from one IP address). If the requests are too frequent, they time out or return HTTP 503.
Right now I am attempting the request 5 times with 2000ms delay between each, giving up if an error is received each time. This is not optimal as sometimes at peak-times nearly all requests fail; I could avoid making these requests and perhaps get at least some to succeed instead.
My goals are to have a somewhat simple, reliable design, and enough flexibility so that I could both pull some metrics from the throttler to understand how well the external systems are responding (and thus adjust how often they are invoked), and to auto-adjust the interval with which I call them (individually per system) so that it is optimal both on off-peak and peak hours.
My infrastructure is Java with RabbitMQ over MongoDB over Linux.
I'm thinking of three main options:
1. Since I already have RabbitMQ used for batch processing, I could just introduce a queue to which the web processes would send the requests they have for external systems; worker processes would then read from that queue, throttle themselves as needed, and return the results. This would allow running multiple parallel worker processes on more servers if needed. My main concerns are that it isn't a very simple solution, and how to handle low peak-hour throughput leaving the web processes waiting for a long while. It also turns RabbitMQ into a critical single point of failure; if it dies the whole system stops (as opposed to the nightly batch processes just not running any more, which is less critical). I suppose RPC is the correct RabbitMQ usage pattern, but I'm not sure. Edit - I've posted a related question, "How to properly implement RabbitMQ RPC from Java servlet web container?", on how to implement this.
2. Introduce nginx (e.g. ngx_http_limit_req_module), HAProxy or other proxy software to the mix (as reverse proxies?) and have it take care of the throttling through some configuration magic. The pro is that I don't have to make code changes. The con is that it is more technology, and technology I've not used before, so the chances of misconfiguring something are quite high. It would also likely not be easy to do dynamic throttling depending on external server load, to prioritize live requests over batch requests, or to get statistics on how the throttling is doing. Also, most documentation and examples will likely cover throttling incoming requests, not outgoing ones.
3. Do a pure-Java solution (e.g., a leaky bucket implementation). It would be simple in the sense that it is "just code", but the devil is in the details; debugging all the deadlocks, starvation and race conditions isn't always fun.
What am I missing here?
Which is the best solution in this case?
P.S. Somewhat related question - what's the proper approach to log all the external system invocations, so that statistics are collected as to how often I invoke them, and what the success rate is?
E.g., after every invocation I'd invoke something like .logExternalSystemInvocation(externalSystemName, wasSuccessful, elapsedTimeMills), and then get some aggregate data out of it whenever needed.
Is there a standard library/tool to use, or do I have to roll my own?
If I use option 1 with RabbitMQ, is there a way to organize the flow so that I get this out of the box from the RabbitMQ console? I wouldn't want to send all failed messages to a poison queue, though: it would fill up too quickly, and in most cases there is no need to re-process these failed requests as the user has already sadly moved on.
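For the logging part, a hand-rolled collector does not need much code. The sketch below keeps per-system counters in memory; the method name follows the logExternalSystemInvocation call mentioned above, while the class name InvocationStats and the two query methods are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class InvocationStats {

    /** Counters for one external system; LongAdder keeps concurrent updates cheap. */
    private static final class Counters {
        final LongAdder calls = new LongAdder();
        final LongAdder failures = new LongAdder();
        final LongAdder totalMillis = new LongAdder();
    }

    private final Map<String, Counters> bySystem = new ConcurrentHashMap<>();

    void logExternalSystemInvocation(String externalSystemName, boolean wasSuccessful, long elapsedMillis) {
        Counters c = bySystem.computeIfAbsent(externalSystemName, name -> new Counters());
        c.calls.increment();
        if (!wasSuccessful) {
            c.failures.increment();
        }
        c.totalMillis.add(elapsedMillis);
    }

    /** Fraction of successful calls, or 1.0 if the system has not been called yet. */
    double successRate(String externalSystemName) {
        Counters c = bySystem.get(externalSystemName);
        if (c == null || c.calls.sum() == 0) {
            return 1.0;
        }
        return 1.0 - (double) c.failures.sum() / c.calls.sum();
    }

    /** Mean elapsed time per call in milliseconds, or 0.0 if there were no calls. */
    double averageMillis(String externalSystemName) {
        Counters c = bySystem.get(externalSystemName);
        if (c == null || c.calls.sum() == 0) {
            return 0.0;
        }
        return (double) c.totalMillis.sum() / c.calls.sum();
    }
}
```

The success rate per system is also the natural input for adjusting how aggressively each system is called.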

Perhaps this open source system can help you a little: http://code.google.com/p/valogato/
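For the pure-Java option, a throttle can be quite small if it never blocks and keeps all state behind one lock. The sketch below is a token bucket, a close relative of the leaky bucket mentioned in the question and not the same algorithm. All names are hypothetical, and the clock is passed in so the behaviour can be tested without sleeping.

```java
import java.util.function.LongSupplier;

final class TokenBucket {
    private final double capacity;
    private final LongSupplier nanoClock;
    private double refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Returns true if a request may be sent now; never blocks. */
    synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    /** Lets the caller lower the rate after 503s or timeouts and raise it again after successes. */
    synchronized void setRefillPerSecond(double newRate) {
        refill(); // settle tokens earned at the old rate first
        this.refillPerSecond = newRate;
    }

    // Adds the tokens earned since the last call, capped at the bucket capacity.
    private void refill() {
        long now = nanoClock.getAsLong();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = now;
    }
}
```

In production the clock would be System::nanoTime, with one bucket per external system. Live requests could call tryAcquire() first and batch jobs could back off whenever it returns false, which gives a simple form of prioritization without extra infrastructure.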

Configuring Jetty for high request volume

In our application we need to handle request volumes in excess of 5,000 requests per second. We've been told that this is feasible with Jetty in our type of application (where we must expose a JSON-HTTP API to a remote system, which will then initiate inbound requests and connections to us).
We receive several thousand inbound HTTP connections, each of which is persistent and lasts about 30 seconds. The remote server then fires requests at us as quickly as we can respond to them on each of these connections. After 30 seconds the connection is closed and another is opened. We must respond in less than 100ms (including network transit time).
Our server is running in EC2 with 8GB of RAM, 4GB of which is allocated to our Java VM (past research suggested that you should not allocate more than half the available RAM to the JVM).
Here is how we currently initialize Jetty based on various tips we've read around the web:
Server server = new Server();
SelectChannelConnector connector = new SelectChannelConnector();
connector.setPort(config.listenPort);
connector.setThreadPool(new QueuedThreadPool(5120));
connector.setMaxIdleTime(600000);
connector.setRequestBufferSize(10000);
server.setConnectors(new Connector[] { connector });
server.setHandler(this);
server.start();
Note that we originally had just 512 threads in our threadpool; we tried increasing to 5120 but this didn't noticeably help.
We find that with this setup we struggle to handle more than 300 requests per second. We don't think the problem is our handler, as it is just doing some quick calculations and a Gson serialization/deserialization.
When we manually make an HTTP request of our own while it's trying to handle this load, we find that it can take several seconds before it begins to respond.
We are using Jetty version 7.0.0.pre5.
Any suggestions, either for a solution, or techniques to isolate the bottleneck, would be appreciated.

First, Jetty 7.0.0.pre5 is VERY old. Jetty 9 is now out, and has many performance optimisations.
Download a newer version of the 7.x line at
https://www.eclipse.org/jetty/previousversions.html
The following advice is documented at
Eclipse.org / Jetty - HowTo: High Load
Eclipse.org / Jetty - HowTo: Garbage Collection
Lies, Damned Lies, and Benchmarks
Be sure you read them.
Next, the threadpool size is for handling accepted requests. 512 is high; 5120 is ridiculous.
Pick a number higher than 50, and less than 500.
If you have a Linux based EC2 node, be sure you configure the networking for maximum benefit at the OS level. (See the document titled "High Load" in the above mentioned list for details)
Be sure you are using a recent JRE/JDK, such as Oracle Java 1.6u38 or 1.7u10. Also, if you have a 64 bit OS, use the 64 bit JRE/JDK.
Set your acceptor count, SelectChannelConnector.setAcceptors(int), to a value between 1 and (number_of_cpu_cores - 1).
Lastly, setup optimized Garbage Collection, and turn on GC Logging to see if the problems you are having are with jetty, or with Java's GC. If you see via the GC logging that there are massive GC "stop the world" events taking lots of time, then you know one more cause for your performance issues.
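Besides GC logging, the JVM exposes accumulated collector statistics through the standard management beans, which gives a quick first check from inside the process. The sketch below uses only java.lang.management; the class name GcReport is made up, and the reported times are approximate totals since JVM start, not individual pause lengths.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcReport {

    /** Approximate total milliseconds spent in GC since JVM start, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total GC time (ms): " + totalGcMillis());
    }
}
```

If the total grows by a large fraction of wall-clock time while the server is under load, garbage collection is a likely contributor to the slow responses and the GC log is the place to look next.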

Spring + Hibernate application not releasing memory

We have created a Spring web app using:
Spring 3.1.0
Hibernate 3.5.4 final
tomcat 6.24
The application is reasonably heavy, we are sending about 1000 contacts per user request.
We tested our application with 9 concurrent users sending repeated requests and profiled it with VisualVM. The results are as follows:
Looking at the results, the high peaks are the repeated requests and the lower points are when all requests are stopped. The first ~200MB of memory does not seem to be released at all. Is Spring actually just this heavy, or do I have a potential memory issue? The release version of this web app will potentially handle many more users.
I have similar results testing on tomcat 7 as well.

It's not a memory issue. The GC is smart enough to release objects once nothing in your application references them. Make sure you are not holding global references to objects that could be local to a method. According to your graph, objects are being released; the 200 MB may be needed for PermGen, so you should not worry.
