I noticed that even when the database is down, so that no connection is actually available in the pool, HikariCP still waits for the connection timeout to expire before throwing an exception to the client.
I agree this is desirable when the database is available, but in my case I would like for the pool not to wait before sending an exception when no connection is available.
The reason is that the database itself answers in less than 2 ms, so I can handle thousands of transactions per second; but when no connection is available, the pool waits much longer (the recommended minimum timeout being 250 ms), so I can no longer sustain the throughput. On the other hand, my logic can work without the database for a period of time.
How should I manage this?
EDIT:
This link is almost what I want to achieve, except that I would prefer HikariCP to do this automatically rather than my having to activate the suspend state manually.
Perhaps you should introduce a counter somewhere in your application code and, if the number of concurrent requests exceeds that value, skip the database. It's hard to tell without knowing what you are dealing with, e.g. reads vs. writes. A rough sketch of the idea follows.
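The threshold, the Supplier-based signature, and the fallback below are all illustrative assumptions, not a HikariCP feature:

import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical gate: skip the database entirely once too many requests
// are already in flight. Threshold and fallback are assumptions to tune.
public class DbLoadGate {
    private static final int MAX_IN_FLIGHT = 50; // roughly your pool size
    private final AtomicInteger inFlight = new AtomicInteger();

    public <T> T withDatabase(Supplier<T> dbCall, Supplier<T> fallback) {
        if (inFlight.incrementAndGet() > MAX_IN_FLIGHT) {
            inFlight.decrementAndGet();
            return fallback.get(); // fail fast instead of waiting on the pool
        }
        try {
            return dbCall.get();
        } finally {
            inFlight.decrementAndGet();
        }
    }
}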
As per brettwooldridge's comment regarding the connectionTimeout property, a lower timeout is unreliable due to thread scheduling, even when there are available connections:
We can certainly consider a lower floor, but 125ms would be the absolute minimum.
Both Windows and Linux have a default scheduler quantum of 20ms. If 16 threads are running on a 4-core CPU, a single thread may have to wait up to 80ms just to be allowed to run. If the pool has a vacancy due to, for example, the retirement of a connection at maxLifetime, this leaves precious little time to establish a connection to fill the slot without returning a spurious failure to the client.
If careful consideration is not taken to ensure the CPU and scheduler are not oversaturated, running at a 125ms timeout puts your application tier at risk of spurious failures even if the pool has available connections. For example, running 32 threads on a 4-core CPU can lead to thread starvation under load of as long as 120ms -- very close to the edge.
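Given that floor, the closest a plain configuration gets to fail-fast is the minimum connectionTimeout. A sketch using HikariCP's actual setters, but with an assumed URL and pool size:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    public static HikariDataSource dataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/app"); // assumed URL
        config.setConnectionTimeout(250); // ms; the floor in recent HikariCP versions
        config.setMaximumPoolSize(10);    // assumed size
        return new HikariDataSource(config);
    }
}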
Related
My goal is to handle WebSocket connections inside threads. If I use a new Thread per connection, the number of WebSocket connections the server can handle is unknown. If I use a thread pool, the number of WebSocket connections the server can handle equals the thread pool size.
I am not sure about the correlation between available processors and threads. Does 1 processor execute 1 thread at a time?
My expected result: Creating more threads than the available processors is not advisable and you should re-design how you handle the WebSocket connections.
in a new Thread
final Socket socket = serverSocket.accept();
new Thread(new WebSocket(socket, listener)).start();
in a Thread pool
final ExecutorService es = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
final Socket socket = serverSocket.accept();
es.execute(new WebSocket(socket, listener));
To avoid confusion, the WebSocket class is a custom class that implements Runnable. As far as I know, Java SE does not have a WebSocket server, only a WebSocket client.
Make threads. A thousand if you want.
At the CPU core level, here's what's happening:
The CPU core is chugging along, doing work for a given websocket.
Pretty soon the core runs into a roadblock: half of an incoming bunch of data has arrived, the rest is still making its way down the network cable, and thus the CPU can't continue until it arrives. Alternatively, the code that the CPU core is running is sending data out, but the network card's buffer is full, so now the CPU core has to wait for the network card to send another packet down the cable before there's room.
Of course, if there's work to do (say you have 10 cores in the box and 15 web users are simultaneously connected; that leaves at least 5 users of your web site waiting around right now), then the CPU should not just start twiddling its thumbs. It should go do something.
In practice, then, there's a whole boatload of memory that WAS relevant but no longer is (all the memory that contained the state and other 'working items' necessary to do the work for the websocket we were working on, but which is currently 'blocked' by the network), and a whole bunch of memory that wasn't relevant but now becomes relevant (all the state and working memory of a websocket connection that was earlier put in 'have yourself a bit of a timeout and wait around for the network packet to arrive' mode, for which the network packet has since arrived; so if a CPU core is free, it can now go do that work).
This is called a 'context switch', and it is ridiculously expensive: 500+ cycles' worth. It is also completely unavoidable. You have to make the context switch. You can't avoid it. That means a cost is paid, and about 500 cycles' worth just go down the toilet. It is what it is.
The thing is, there are 2 ways to pay that cost: you can switch to another thread, which is a full context switch. Or you can have a single thread running so-called 'async' code that manages all this itself and hops to another job, but then there's still a context switch.
Specifically, CPUs can't interact with main memory directly anymore and haven't for the past decade; they can only interact with a CPU cache page. Machine code is not really 'run directly' anymore; instead there's a level below that, where the CPU notices it's about to run an instruction that touches some memory and maps that memory access (after all, main memory is far too slow to wait for) to the right spot in the cache. It'll also notice if the memory you're trying to access isn't in a cache page associated with that core at all, in which case it fires a page miss, which causes the memory subsystem of your CPU/memory bus to 'evict a page' (write it back out to main memory) and then load in the right page, and only then does the CPU continue.
This all happens 'under the hood', you don't have to write code to switch pages, the CPU manages it automatically. But it's a heavy cost. Not quite as heavy as a thread switch but almost as heavy.
CONCLUSION: Threads are good; have many of them. It ensures CPUs won't twiddle their thumbs when there is work to do. Note that there are MANY blog posts that extol the virtues of async, claiming that threads 'do not scale'. They are wrong. Threads scale fine, and async code also pays the cost of context switching, all the time.
In case you weren't aware, 'async code' is code that tries to never sleep (never do anything that would wait). So, instead of writing 'getMeTheNextBlockOfBytesFromTheNetworkCard', you'd write 'onceBytesAreAvailableRunThis(code goes here)'. Writing async code in Java is possible but incredibly difficult compared to using threads.
Even in the extremely rare cases where async code would be a significant win, Project Loom is close to completion, which will grant Java the ability to have thread-like things that you can manually manage (so-called fibers). That is the route the OpenJDK has chosen. In that sense, even if you think async is the answer, it's not; wait for Project Loom to complete instead. If you want to read more, read 'What color is your function?' and 'callback hell'. Neither post is Java-specific, but both cover some of the more serious problems inherent in async.
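For illustration, on a JDK where Loom's virtual threads have landed, the thread-per-connection style above stays the same but each thread becomes cheap. A sketch reusing the question's own WebSocket class and listener (both assumed here):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LoomServer {
    // WebSocket and listener come from the question's own code (assumed here).
    void serve(ServerSocket serverSocket, Object listener) throws IOException {
        try (ExecutorService es = Executors.newVirtualThreadPerTaskExecutor()) {
            while (true) {
                final Socket socket = serverSocket.accept();
                es.submit(new WebSocket(socket, listener)); // one cheap virtual thread per connection
            }
        }
    }
}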
My application is a Piwik Server that receives incoming tracking data from tracking codes placed on hundreds of websites. The bulk of the workload is small writes to the database hundreds of times per second as these tracking requests come in. I'm using MySQL server with JDBC and Hibernate.
I've recently been increasing the maxPoolSize setting gradually on my application to improve performance. It certainly seems like the higher I set the configuration, the more responsive the application is, and the more stable the disk queue depth.
My current configuration:
jdbc.maxPoolSize=100
jdbc.minPoolSize=100
jdbc.maxStatements=1000
Essentially, my question is what risks I should be watching for when I increase the maxPoolSize? Are there any specific factors or metrics that I should watch to judge whether I've configured this setting too high? Obviously if increasing the maxPoolSize was a magic bullet for resolving performance problems, everyone would want to set it as high as possible. Apologies in advance if this is a duplicate, but I couldn't find any answer addressing how to assess if your connection pool is too large.
I'm running MySQL on an AWS RDS instance. These are my guesses as to what the concerns might be:
Avoid exceeding the maximum number of connections allowed by the RDS instance type
Would an excessively high setting suck up all the memory on the server and impact performance?
Will too many threads cause tables to lock and increase queue time for some of the queries?
Any assistance in understanding what factors to watch for would be greatly appreciated.
I strongly recommend setting up DropWizard metrics and/or JMX monitoring.
In the case of JMX, graph the "Active Connections" over time; if your pool never (or rarely) crosses a given threshold, setting the maximumPoolSize above that is simply wasting resources.
In the case of DropWizard metrics, the "Usage" measurement -- reflecting how long connections are out of the pool -- would give a "comparable" for you to check when playing with the maximumPoolSize.
If connections tend to be out of the pool longer when the maximumPoolSize is 50 (for example) compared to 40, that would indicate that the database is oversaturated, and 40 is closer to ideal.
If there is no difference between a maximumPoolSize of 30 compared to 40 (again, just an example), it could mean that 40 is simply unnecessarily high, or it could mean that the period of time over which those metrics were collected was simply a low period of demand and 40 may still be correct.
Best of all is to combine the above metrics with total web request service times and overlay them on a graph or at least side-by-side.
Metrics are the key to analysis! Find and track as many relevant ones as you can; patterns will emerge.
Lastly, you might try setting the pool to minimumIdle=20 and maximumPoolSize=100 and see where the pool generally settles, ignoring the occasional spike. RDS is unlike typical databases, where you control the hardware the database runs on. With RDS you really don't know how Amazon is spreading the load, so it is just going to require experimentation. Let each experiment run long enough (several hours) to collect sufficient data, and take screenshots of your monitoring for comparison.
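Wiring that up might look like the following; the URL and sizes are placeholders, but setMetricRegistry and setRegisterMbeans are HikariCP's actual hooks for Dropwizard metrics and JMX:

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.util.concurrent.TimeUnit;

public class MonitoredPool {
    public static HikariDataSource create() {
        MetricRegistry registry = new MetricRegistry();
        ConsoleReporter.forRegistry(registry).build()
                       .start(1, TimeUnit.MINUTES); // or a Graphite/CSV reporter

        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://rds-host:3306/piwik"); // assumed endpoint
        config.setMinimumIdle(20);
        config.setMaximumPoolSize(100);
        config.setMetricRegistry(registry); // exposes Usage, Wait, etc.
        config.setRegisterMbeans(true);     // JMX: graph active connections over time
        return new HikariDataSource(config);
    }
}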
Avoid exceeding the maximum number of connections allowed by the RDS instance type.
That is plausible.
Would an excessively high setting suck up all the memory on the server and impact performance?
That is possible. Each active connection in the pool will have associated buffers, etcetera. However, I would expect those buffers to be bounded.
Will too many threads cause tables to lock and increase queue time for some of the queries?
Possibly. However, if you are mostly doing small writes then I'd not have thought that locking would be a concern for other writes. But if you are doing simultaneous queries that entail a table scan, locking could be a concern.
However, I'd not have thought that increasing the pool size (above 100) is likely to increase throughput. Check the CPU and/or disk I/O load on the database instance, or network traffic between your front end and the DB instance. If the database is where the bottleneck is, then allowing the front end to make more simultaneous requests is likely to make performance worse.
You need to consider what happens if the load (e.g. the request rate) on your system goes above the overall throughput it can sustain. If the pool size is too large, then a front-end load spike could turn into a database load spike that leads to a drop in throughput. The problem is that you won't know when the load spike is going to happen, and unless you have load-tested your system beforehand with the tweaked pool size, you won't know what the (actual) effect of the pool size change will be ...
I have developed a Java-based server with a thread pool that grows dynamically with the client request rate. This strategy is known as FBOS (Frequency Based Optimization Strategy) for thread pool systems.
For example, if the request rate is 5 requests per second, then my thread pool will have 5 threads to service clients' requests. The client requests are I/O-bound jobs of 1 second each, i.e. each request is a runnable Java object that calls sleep() to simulate an I/O operation.
If the client request rate is 10 requests per second, then my thread pool will have 10 threads in it to process clients. Each thread has an internal timer that is activated when its thread is idle; when the idle time reaches 5 seconds, the timer deletes its thread from the pool, dynamically shrinking it.
My strategy works well for short I/O intensities. My server works nicely for small request rates, but for large request rates my thread pool holds a large number of threads. For example, if the request rate is 100 requests per second, my thread pool will have 100 threads in it.
Now I have 3 questions in mind:
(1) Can I face memory leaks using this strategy at a large request rate?
(2) Can the OS or JVM face excessive thread-management overhead at a large request rate that will slow down the system?
(3) Last, and very important: I am very curious to implement my thread pool in a clustered environment (I am a dummy in clustering).
I just want advice from all of you on how a clustered environment could benefit this scenario of a frequency-based thread pool for I/O-bound jobs only. That is, can a clustered environment give me the benefit of using the memory of other systems (nodes)?
The simplest solution is a cached thread pool; see Executors. I suggest you try this first. It will create as many threads as you need at once, and for I/O-bound requests a single machine can easily expand to thousands of threads without needing an additional server. A sketch follows.
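Both forms below are standard JDK constructions; the second simply swaps in the question's 5-second idle timeout, so no hand-rolled timers are needed:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class Pools {
    // Simplest: grows on demand, reuses threads that have been idle under 60s.
    static ExecutorService cached() {
        return Executors.newCachedThreadPool();
    }

    // Same construction, but with the question's 5-second idle timeout,
    // which matches the FBOS per-thread timers without any custom code.
    static ExecutorService fbosLike() {
        return new ThreadPoolExecutor(
                0, Integer.MAX_VALUE,          // no core threads, grow as needed
                5L, TimeUnit.SECONDS,          // idle threads die after 5 seconds
                new SynchronousQueue<Runnable>());
    }
}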
Can I face memory leaks using this strategy at a large request rate?
No, 100 per second is not particularly high. If you are talking over 10,000 per second, you might have a problem (or need another server).
Can the OS or JVM face excessive thread-management overhead at a large request rate that will slow down the system?
Yes, my rule of thumb is that 10,000 threads waste about one CPU in overhead.
Last, and very important: I am very curious to implement my thread pool in a clustered environment.
Given you appear to be using up to 1% of one machine, I wouldn't worry about using multiple machines to do the I/O. Most likely you want to process the results, but without more information it's hard to say whether more machines would help or not.
Can a clustered environment give me the benefit of using the memory of other systems (nodes)?
It can help if you need it or it can add complexity you don't need if you don't need it.
I suggest you start with a real problem and look for a solution to solve it, rather than start with a cool solution and try to find a problem for it to solve.
Brief
I am running a multithreaded tcp server that uses a fixed thread pool with an unbounded Runnable queue. The clients dispatch the runnables to the pool.
In my stress test scenario, 600 clients attempt to log in to the server and immediately broadcast messages to every other client, simultaneously, repeatedly, endlessly, and without sleeping (right now the clients just discard the incoming messages). Using a quad-core with 1 GB reserved for heap memory, and a parallel GC for both the young and old generations, the server crashes with an OOM exception after 20 minutes. Monitoring the garbage collector reveals that the tenured generation is slowly increasing, and a full GC frees up only a small fraction of memory. A snapshot of the full heap shows that the old generation is almost completely occupied by Runnables (and their outgoing references).
It seems the worker threads cannot finish executing the Runnables as fast as the clients queue them for execution. (For each incoming "event", the server creates 599 runnables, as there are 600 - 1 clients, assuming they are all logged in at the time.)
Question
Can someone please help me conceive a strategy on how to handle the overwhelmed thread pool workers?
Also
If I bound the queue, what policy should I implement to handle rejected execution?
If I increase the size of the heap, wouldn't that only prolong the OOM exception?
A calculation could be made to measure the amount of work in the aggregated Runnables. Could this measurement be used as a basis for a locking mechanism to coordinate the clients' dispatching of work?
What reaction should the client experience when the server is overwhelmed with work?
Do not use an unbounded queue. I cannot tell you what the bound should be; your load tests should answer that question. Anyhow, make the bound configurable: at least dynamically configurable, better yet adaptable to some load measurement.
You did not tell us how the clients submit their requests, but if HTTP is involved, there already is a status code for the overloaded case: 503 Service Unavailable.
I would suggest you limit the capacity of the queue and "push back" on the publisher, either stopping it from publishing or dropping the requests gracefully. You can do the former by making the queue block when it's full, as in the sketch below.
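One standard way to get that blocking behaviour is a rejection handler that re-queues with a blocking put; the worker count and queue bound here are illustrative and should come from your load tests:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPool {
    // Bounded queue; when it fills, the submitting (client-reader) thread
    // blocks in the rejection handler, pushing back on the publisher.
    static ThreadPoolExecutor create(int workers, int queueCapacity) {
        return new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                (task, executor) -> {
                    try {
                        executor.getQueue().put(task); // block instead of throwing
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
    }
}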
You should be able to calculate your maximum throughput based on your network bandwidth and message size. If you are getting less than this, I would consider changing how your server distributes data.
Another approach is to make your message handling more efficient. You could have each reading thread from each client write directly to the listening clients. This avoids the need for an explicit queue (you might think of the buffers in the Socket as a queue of bytes) and limits the speed to whatever the server can handle. It will also not use more memory under load (than it does when idle)
Using this approach you can achieve message rates as high as your network bandwidth can handle (even with a 10 Gig-E network). This moves the bottleneck elsewhere, meaning you still have a problem, but your server shouldn't fail.
BTW: If you use direct ByteBuffers you can do this without creating garbage and with a minimum of heap, e.g. ~1 KB of heap per client. A rough sketch follows.
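Something like this; the per-client handler and fan-out loop are assumptions about how you'd structure it, not a library API:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

class ClientHandler {
    // One reusable direct buffer per client: reads land in native memory,
    // so steady-state heap allocation stays near zero.
    private final ByteBuffer buf = ByteBuffer.allocateDirect(1024);

    void pump(SocketChannel from, Iterable<SocketChannel> listeners) throws IOException {
        if (from.read(buf) > 0) {
            buf.flip();
            for (SocketChannel to : listeners) {
                buf.rewind();
                to.write(buf); // a blocking write stalls on slow peers: natural throttling
            }
            buf.clear();
        }
    }
}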
It sounds as if you're doing load testing. I would determine what you consider to be "acceptable heavy load": what is the heaviest amount of traffic you can expect a single client to generate? Then double it, triple it, or scale it in a similar manner, and use this threshold to throttle or deny clients that use that much bandwidth.
This has a number of perks. First, it gives you the kind of analysis you need to determine server load (users per server). Second, it gives you a first line of defense against DDoS attacks.
You have to somehow throttle the incoming requests, and the mechanism for doing that should depend on the work you are trying to do. Anything else will simply result in an OOM under enough load, and thus open you up to DoS attacks (even unintentional ones).
Fundamentally, you have 4 choices:
Make clients wait until you are ready to accept their requests
Actively reject client requests until you are ready to accept new requests
Allow clients to timeout while trying to reach your server when it is not ready to receive requests
A blend of 2 or 3 of the above strategies.
The right strategy depends on how your real clients will react under the various circumstances – is it better for them to wait, possibly (effectively) indefinitely, or is it better that they know quickly that their work won't get done unless they try again later?
Whichever way you do it, you need to be able to count the number of tasks currently queued and either add a delay, block completely, or return an error condition based on the number of items in the queue.
A simple blocking strategy can be implemented by using a BlockingQueue implementation. However, this doesn't give particularly fine-grained control.
Or you can use a Semaphore to control permits for adding tasks to the queue, which has the advantage of supplying a tryAcquire(long timeout, TimeUnit unit) method if you want to apply mild throttling, as sketched below.
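A minimal sketch of the Semaphore approach; the permit count and timeout are assumptions to be tuned:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class ThrottledDispatcher {
    private final Semaphore permits = new Semaphore(10_000); // assumed bound
    private final ExecutorService pool;

    ThrottledDispatcher(ExecutorService pool) { this.pool = pool; }

    // Returns false (tell the client to back off) if no slot frees up in time.
    boolean trySubmit(Runnable task, long timeoutMs) throws InterruptedException {
        if (!permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // mild throttling: give up after a short wait
        }
        pool.execute(() -> {
            try { task.run(); }
            finally { permits.release(); } // free the slot only when work is done
        });
        return true;
    }
}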
Whichever way, don't allow the threads that service the clients to grow without bounds, or else you'll simply end up with an OOM for a different reason!
Currently we are using a 4-CPU Windows box with 8 GB RAM, with MySQL 5.x installed on the same box. We are using the WebLogic application server for our application. We are targeting 200 concurrent users (obviously not all on the same module/screen). What is the optimal number of connections (min and max) to configure in the connection pool, given that we are using WebLogic's own connection pooling mechanism?
Did you really mean 200 concurrent users or just 200 logged in users? In most cases, a browser user is not going to be able to do more than 1 page request per second. So, 200 users translates into 200 transactions per second. That is a pretty high number for most applications.
Regardless, as an example, let's go with 200 transactions per second. Say each front-end (browser) transaction takes 0.5 seconds to complete, and of those 0.5 seconds, 0.25 is spent in the database. Then you would need 0.5 * 200 = 100 connections in the WebLogic thread pool and 0.25 * 200 = 50 connections in the DB connection pool.
To be safe, I would set the max thread pool sizes to at least 25% larger than you expect to allow for spikes in load. The minimums can be a small fraction of the max, but the tradeoff is that it could take longer for some users because a new connection would have to be created. In this case, 50 - 100 connections is not that many for a DB so that's probably a good starting number.
Note that to figure out your average transaction response times, along with your average DB query time, you will have to run a performance test, because your times under load are probably not going to be the times you see with a single user. A sketch of the arithmetic follows.
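The arithmetic above is Little's law (concurrency = arrival rate x time spent in the tier). A throwaway calculation with the example numbers, all of which are assumptions to replace with measured values:

public class PoolSizing {
    public static void main(String[] args) {
        double txPerSecond = 200;   // target load
        double txTimeSec   = 0.5;   // whole front-end transaction
        double dbTimeSec   = 0.25;  // portion spent in the database
        double headroom    = 1.25;  // +25% for spikes, per the advice above

        System.out.printf("WebLogic threads: %.0f%n", txPerSecond * txTimeSec * headroom);
        System.out.printf("DB connections:   %.0f%n", txPerSecond * dbTimeSec * headroom);
    }
}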
There is a very simple answer to this question:
The number of connections in the connection pool should equal the number of exec threads configured in WebLogic.
The rationale is very simple: if the number of connections is less than the number of threads, some of the threads may end up waiting for a connection, making the connection pool a bottleneck. So it should be at least equal to the number of exec threads (the thread pool size).
Sizing a connection pool is not a trivial thing to do. You basically need:
metrics to investigate the connection usage
failover mechanisms for when there is no connection available
FlexyPool aims to aid you in figuring out the right connection pool size.
You should profile the different expected workflows to find out. Ideally, your connection pool will also dynamically adjust the number of live connections based on recent usage, as it's pretty common for load to be a function of the current time of day in your target geographical area.
Start with a small number and try to reach a reasonable number of concurrent users, then crank it up. I think it's likely that you'll find that your connection pooling mechanism is not nearly as instrumental in your scalability as the rest of the software.
The connection pool should be able to grow and shrink based on actual needs. Log the numbers needed to analyze the running system, either through logging statements or through JMX surveillance. Consider setting up alerts for scenarios like "peak detected: more than X new entries had to be allocated in Y seconds" or "connection was out of the pool for more than X seconds", which will let you give attention to performance issues before they become real problems.
It's difficult to get hard data for this. It's also dependent on a number of factors you don't mention -
200 concurrent users, but how much of their activity will generate database queries? 10 queries per page load? 1 query just on login? etc. etc.
Size of the queries and the db obviously. Some queries run in milliseconds, some in minutes.
You can monitor mysql to watch the current active queries with "show processlist". This could give you a better sense of how much activity is actually going on in the db under peak load.
This is something that needs to be tested and determined on an individual basis - it's pretty much impossible to give an accurate answer for your circumstances without intimately being familiar with them.
Based on my experience with high-transaction financial systems: if you want to handle, for example, 1K requests per second and you have 32 CPUs, you need 1000/32, roughly 31, open connections in your database pool.
Here is my formula:
RPS / CPU_COUNT
In most cases, your database engine will be able to handle your requests even with far fewer connections, but if the number is too low your requests will end up waiting for a connection.
I think it's pretty important to mention that your database must actually be able to handle those transactions (based on your disk speed, database configuration, and server power).
Good luck.