I'm developing a multithreaded application where threads are created when someone connects to my socket. Each connection creates a new thread, and each thread makes queries to a MySQL database using JDBC. I'm wondering if these multiple connections to MySQL from my different threads could cause any problems in my application or negatively affect the MySQL data.
On the contrary, you should always connect to a DB in a multithreaded fashion. Or really, a pooled fashion!
Consider the situation when your application becomes a worldwide hit and you get 100k hits a minute, then you will have a heck of a lot of threads - namely one per connection, which will break your application, your app-server and your DB... :-)
Instead you might implement a pool of DB connections from which your threads borrow a connection and return it when they are done. There are several good open-source projects to choose from for this, c3p0 and Commons DBCP being just two of them.
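To make the borrow/return idea concrete, here is a minimal sketch of a pool built on a blocking queue. It is only an illustration: the class name SimplePool and its methods are made up for this example, it is generic so it can be run without a database (in a real server T would be java.sql.Connection), and it leaves out everything a production pool such as the two mentioned above gives you (validation, eviction, reconnects).

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical minimal pool: a fixed set of resources handed out one at a time.
public class SimplePool<T> {
    private final BlockingQueue<T> idle;

    // The pool is pre-filled; its size never grows beyond resources.size().
    public SimplePool(List<T> resources) {
        this.idle = new ArrayBlockingQueue<>(resources.size(), false, resources);
    }

    // Waits up to the timeout for a free resource, then gives up.
    public T borrow(long timeout, TimeUnit unit) throws InterruptedException {
        T resource = idle.poll(timeout, unit);
        if (resource == null) {
            throw new IllegalStateException("pool exhausted");
        }
        return resource;
    }

    // Callers must return every borrowed resource, ideally in a finally block.
    public void giveBack(T resource) {
        idle.offer(resource);
    }

    public int available() {
        return idle.size();
    }
}
```

Each request thread would call borrow, run its query, and call giveBack in a finally block, so the number of open database connections stays fixed no matter how many client threads exist.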
Hope that helps,
There is nothing to be gained by having more threads than the
actual number of CPU/hardware threads. If anything, more threads
create more overhead, which will effectively slow your application
down.
What threads do provide is a relatively easy way of doing more
of the same thing, but after a while that approach hits a brick wall and one has to start thinking about other potential solutions.
If you want your application to be scalable, now is a good time to
start thinking about how you can scale-out your application, i.e.
a distributed solution where multiple systems share the load.
Instead of more threads on a single system, think in terms of
work queues and worker threads distributed across N systems.
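A rough sketch of the single-machine half of that idea: a work queue feeding a fixed number of worker threads instead of one thread per job. The class name BoundedWorkers is made up for the example, and sizing the pool to availableProcessors() is an assumption that follows the reasoning above; threads that mostly block on I/O may justify a different number. Spreading the workers across N systems would mean replacing the in-memory queue with some shared queue or broker, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: runs any number of jobs on a fixed set of worker threads.
public class BoundedWorkers {
    public static List<Integer> runAll(List<Callable<Integer>> jobs) throws Exception {
        // Assumption: one worker per hardware thread.
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Integer> results = new ArrayList<>();
            // invokeAll queues every job and returns the futures in submission order.
            for (Future<Integer> future : pool.invokeAll(jobs)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

The jobs wait in the executor's internal queue until a worker is free, so a burst of 10,000 jobs still uses only a handful of threads.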
I run multiple game servers and I want to develop a custom application to manage them. Basically all the game servers will connect to the application to exchange data. I don't want any of this data getting lost so I think it would be best to use TCP. I have looked into networking and understand how it works however I have a question about cpu usage. More servers are being added and in the next few months it could potentially reach around 100 - 200 and will continue to grow as needed. Will new threads for each server use a lot of cpu and is it a good idea to do this? Does anyone have any suggestions on how to go about this? Thanks.
You should have a look at non-blocking I/O. With blocking I/O, each socket consumes one thread, and the number of threads in a system is limited. Even if you can create 1000+ threads, it is a questionable approach.
With non-blocking I/O, you can serve multiple sockets with a single thread. This is a more scalable approach, and you control how many threads are running at any given moment.
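For illustration, here is a minimal sketch of one thread serving many sockets with java.nio. It is a plain echo server, not a game-server protocol; the class name SelectorEchoServer, the loopback bind address and the 1024-byte buffer are all assumptions made for the example. It also takes shortcuts a real server should not: it spins on write if the client is slow, and it has no message framing.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Hypothetical single-threaded echo server: one Selector multiplexes all clients.
public class SelectorEchoServer implements Runnable, AutoCloseable {
    private final Selector selector;
    private final ServerSocketChannel server;

    public SelectorEchoServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        // Loopback only for the sketch; port 0 lets the OS pick a free port.
        server.bind(new InetSocketAddress("127.0.0.1", port));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        try {
            while (selector.isOpen()) {
                selector.select(); // blocks until at least one channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (!key.isValid()) {
                        continue;
                    }
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buffer);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // Selector was closed or failed: the loop simply ends.
        }
    }

    // Reads whatever is available and writes it straight back.
    private static void echo(SocketChannel client, ByteBuffer buffer) {
        try {
            buffer.clear();
            if (client.read(buffer) < 0) {
                client.close(); // peer closed the connection
                return;
            }
            buffer.flip();
            while (buffer.hasRemaining()) {
                client.write(buffer); // simplification: busy-waits on a slow client
            }
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();
        server.close();
    }
}
```

You would start it with new Thread(new SelectorEchoServer(0)).start() and connect any number of clients; they are all handled by that one thread.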
More servers are being added and in the next few months it could potentially reach around 100 - 200 and will continue to grow as needed. Will new threads for each server use a lot of cpu and is it a good idea to do this?
It is a standard answer to caution away from 100s of threads and to the NIO solution. However, it is important to note that the NIO approach has a significantly more complex implementation. Isolating the interaction with a server connection to a single thread has its advantages from a code standpoint.
Modern OSes can create 1000s of threads with little overhead aside from the stack memory. If you are sure of your scaling factors (i.e. you're not going to reach 10k connections or something) and you have the core memory, then I would say that a thread per TCP connection could work very well. I've very successfully run applications with 1000s of threads and have not seen fall-offs in performance due to context switching, which used to be the case with earlier processors/kernels.
We're currently trying to make our server software use a connection pool to greatly reduce lag. However, instead of reducing the time queries take to run, it is doubling the time, making it even slower than it was before the connection pool.
Are there any reasons for this? Does JDBC only allow a single query at a time or is there another issue?
Also, does anyone have any examples of multi-threaded connection pools to reduce the time hundreds of queries take as the examples we have found only made it worse.
We've tried using BoneCP and Apache DBCP with similar results...
That one is using Apache's DBCP. We also have tried using BoneCP with the same result...
A connection pool helps mitigate the overhead/cost of creating new connections to the database by reusing already existing ones. This is important if your workload requires many short- to medium-lived connections, e.g. an app that processes concurrent user requests by querying the database. Unfortunately, your example benchmark code does not have such a profile. You are just using 4 connections in parallel and there is no reuse involved.
What a connection pool cannot do is magically speed up execution times or raise the concurrency level beyond what the database provides. If the benchmark code represents the expected workload, I would advise you to look into batching statements instead of threading. That will massively increase the performance of INSERT/UPDATE operations.
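As a rough sketch of what batching looks like in JDBC: queue the rows with addBatch and send them in groups with executeBatch, inside one transaction. The table name events, its column name, the class name BatchInsert and the batch size are all made up for the example; the right batch size depends on your driver and data. (With MySQL's Connector/J, the rewriteBatchedStatements connection property is also worth reading up on.)

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper: inserts many rows with a few round trips instead of one per row.
public class BatchInsert {
    // Assumed table and column, purely for illustration.
    static final String SQL = "INSERT INTO events (name) VALUES (?)";

    // Returns how many times a batch was sent to the database.
    public static int insertAll(Connection conn, List<String> names, int batchSize)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one transaction for the whole load
        int flushes = 0;
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            int pending = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch(); // queued on the client, not yet executed
                if (++pending == batchSize) {
                    ps.executeBatch();
                    pending = 0;
                    flushes++;
                }
            }
            if (pending > 0) { // send the final partial batch
                ps.executeBatch();
                flushes++;
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return flushes;
    }
}
```

With five rows and a batch size of two, this sends three batches and commits once, instead of doing five separate auto-committed inserts.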
Update:
Using multiple connections in parallel can enhance performance. Just keep in mind that there is not necessarily a relation between the threads in your Java application and the threads in the database. JDBC is just a wrapper around the database driver; using multiple connections results in multiple queries being submitted to the database server in parallel. If those queries are suited for it, every modern RDBMS will be able to process them in parallel. But if those queries are very work-intensive, or even worse, involve table locks or conflicting updates, the DB may not be able to do so. If you experience bad performance, check which queries are lagging and optimize them: are they efficient? Are proper indexes in place? Denormalizing the schema may help in more extreme cases; use prepared statements and batch mode for larger updates, and so on. If your DB is overloaded with many similar, small queries, consider caching frequently used data.
Our Java application frequently runs heavy database queries. As the Java process and the Oracle process run on the same computer, these heavy queries may consume so much CPU or I/O that important application threads, i.e. those serving user requests, become unresponsive.
I'm looking for a solution to prioritize transactions (or connections or connection pools) in Oracle. I am aware of Oracle's Resource Manager feature, but we don't have a license to use it.
If prioritization is not possible, can transactions be paused or even killed in the middle?
We are running on Linux, J2EE, Hibernate / SQL.
My 2 cents worth: Try to control priorities at the Java (i.e. application) level rather than relying on Oracle. This could be done using a PriorityBlockingQueue from which threads consume database requests.
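A minimal sketch of that idea, with made-up names (PrioritizedDbWork, submit, runNext): database work is wrapped in a request with a priority, and workers always take the most urgent request first. The convention that a lower number means more urgent is an assumption of this example. Note the limits: this only orders work that is still waiting in the queue, it does not slow down or interrupt a heavy query that is already running.

```java
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical dispatcher: workers pull the most urgent database request first.
public class PrioritizedDbWork {

    // One unit of database work; lower priority value = more urgent (assumed convention).
    private static final class Request implements Comparable<Request> {
        final int priority;
        final Runnable work;

        Request(int priority, Runnable work) {
            this.priority = priority;
            this.work = work;
        }

        @Override
        public int compareTo(Request other) {
            return Integer.compare(priority, other.priority);
        }
    }

    private final PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>();

    // Called by application threads; never blocks because the queue is unbounded.
    public void submit(int priority, Runnable work) {
        queue.put(new Request(priority, work));
    }

    // Called in a loop by a small, fixed number of worker threads.
    // take() blocks until a request is available.
    public void runNext() throws InterruptedException {
        queue.take().work.run();
    }
}
```

User-facing queries would be submitted with a low number and heavy reports with a high one; keeping the number of worker threads small is what actually stops the heavy queries from flooding the database all at once.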
I'm trusting that you're not attempting to do these DB operations from the UI's event dispatch thread :)
You might try messing with thread priorities; use a lower priority on the DB threads so the UI remains responsive. However this might not be effective and may have other issues such as priority inversion.
Another idea is to put short sleeps in a DB thread, or try yield() every so often. These should be done outside DB transactions so that you don't block other concurrent DB clients.
On the Oracle side, I would definitely not recommend messing with Oracle server process priorities. The main reason is, at any given moment, an Oracle process may be holding a lock or latch on some element of the SGA. If you lower the priority of a process, and it's holding locks and/or latches, you could end up impacting the performance of other Oracle server processes, as they queue up behind the lock/latch holder, which is low priority, and now can't get any CPU.
Oracle itself does allow for certain background processes to run at higher priority, but that's by design, and functionality that's built-in to Oracle. I wouldn't mess with priorities of individual server processes.
Finally, I'd really look at the feasibility of moving the Java application code to a different server. That way, Oracle and Java aren't competing for what sounds like a scarce resource.
Hope that helps.
So, here is the deal.
I'm developing an Android application (although it could just as easily be any other mobile platform) that will occasionally be sending queries to a server (which is written in Java). This server will then search a MySQL database for the query, and send the results back to the Android. Although this sounds fairly generic, here are some specifics:
The Android will make a new TCP connection to the server every time it queries. The server is geographically close, the Android could potentially be moving around a lot, and, since the Android app might run for hours while only sending a few queries, this seemed the best use of resources.
The server could potentially have hundreds (or possibly even thousands) of these queries at once.
Since each query runs in its own Thread, each query will at least need its own Statement (and could have its own Connection).
Right now, the server is set up to make one Connection to the database, and then create a new Statement for each query. My questions for those of you with some database experience (MySQL in particular, since it is a MySQL database) are:
a) Is it thread safe to create one Statement per Thread from a single Connection? From what I understand it is, just looking for confirmation.
b) Is there any thread safe way for multiple threads to use a single PreparedStatement? These queries will all be pretty much identical, and since each thread will execute only one query and then return, this would be ideal.
c) Should I be creating a new Connection for each Thread, or is it better to spawn new Statements from a single Connection? I think a single Connection would be better performance-wise, but I have no idea what the overhead for establishing a DB Connection is.
d) Is it best to use stored SQL procedures for all this?
Any hints / comments / suggestions from your experience in these matters are greatly appreciated.
EDIT:
Just to clarify, the android sends queries over the network to the server, which then queries the database. The android does not directly communicate with the database. I am mainly wondering about best practices for the server-database connection here.
Just because a Connection object is thread safe does not mean it's thread efficient. You should use a connection pool as a best practice to avoid potential blocking issues. But in answer to your question, yes, you can share a Connection object between multiple threads.
You do need to create a new Statement/PreparedStatement in each thread that will be accessing the database; they are NOT thread safe. I would highly recommend using PreparedStatements, as you will gain efficiency and protection against SQL injection attacks.
Stored procedures will speed up your database queries since the execution plan is compiled already and saved - highly recommended to use if you can.
Have you looked at caching your database data? Take a look at spymemcached if you can; it's a great product for reducing the number of calls to your data store.
From my experience, you should devote a little time to wrap the database in a web service. This accomplishes two things:
You are forced to examine the data for wider consumption
You make it easier for new consumers to consume the data
It takes a bit more development time, but direct connections to a database over an open network (the Internet) are more problematic than specifying what can be accessed through a method.
Use a connection pool such as Commons DBCP. It will handle all the stuff you're worrying about, out of the box.
I am running a webapp inside WebSphere Application Server 6.1. This webapp has a kind of rules engine, where every rule obtains its very own connection from the WebSphere data source pool. So, I see that when a use case is run, for 100 records of input, about 400-800 connections are obtained from the pool and released back to the pool. I have a feeling that if this engine goes to production, it might take too much time to complete processing.
Is it a bad practice to obtain connections from pool that frequently? What are the overhead costs involved in obtaining connections from pool? My guess is that costs involved should be minimal as pool is nothing but a resource cache. Please correct me if I am wrong.
Connection pooling keeps your connections alive in anticipation: when another user connects, a ready connection to the DB is handed over and the database does not have to open a connection all over again.
This is actually a good idea because opening a connection is not just a one-go thing. There are many trips to the server (authentication, retrieval, status, etc.). So if you've got a connection pool on your website, you're serving your customers faster.
Unless your website is not visited by people, you can't afford not to have a connection pool working for you.
The pool doesn't seem to be your problem. The real problem lies in the fact that your "rules engine" doesn't release connections back to the pool before completing the entire calculation. The engine doesn't scale well, so it seems. If the number of database connections somehow depends on the number of records being processed, something is almost always very wrong!
If you manage to get your engine to release connections as soon as possible, it may be that you only need a few connections instead of a few hundred. Failing that, you could use a connection wrapper that re-uses the same connection every time the rules engine asks for one. That somewhat negates the benefits of having a connection pool, though...
Not to mention that it introduces many multithreading and transaction isolation issues. If the connections are read-only, it might be an option.
A connection pool is all about connection re-use.
If you are holding on to a connection at times when you don't need one, then you are preventing that connection from being re-used somewhere else. And if you have a lot of threads doing this, then you must also run with a larger pool of connections to prevent pool exhaustion. More connections take longer to create and establish, and they take more resources to maintain; there will be more reconnecting as the connections grow old, and your database server will also be impacted by the greater number of connections.
In other words: you want to run with the smallest possible pool without exhausting it. And the way to do that is to hold on to your connections as little as possible.
I have implemented a JDBC connection pool myself and, although many pool implementations out there probably could be faster, you are likely not going to notice because any slack going on in the pool is most likely dwarfed by the time it takes to execute queries on your database.
In short: connection pools just love it when you return their connections. Or they should anyway.
To really check whether your pool is a bottleneck, you should profile your program. If you find the pool is a problem, then you have a tuning problem. A simple pool should be able to handle 100K or more allocations per second, or about 10 microseconds each. However, as soon as you use a connection, it will take between 200 and 2,000 microseconds to do something useful.
I think this is a poor design. Sounds like a Rete rules engine run amok.
If you assume 0.5-1.0 MB minimum per thread (e.g. for stack, etc.) you'll be thrashing a lot of memory. Checking the connections in and out of the pool will be the least of your problems.
The best way to know is to do a performance test and measure memory, wall times for each operation, etc. But this doesn't sound like it'll end well.
Sometimes I see people throw all their rules into Blaze or ILOG or JRules or Drools simply because it's "standard" and high tech. It's a terrific resume item, but how many of those solutions would be better served by a simpler table-driven decision tree? Maybe your problem is one of those.
I'd recommend that you get some data, see if there's a problem, and be prepared to redesign if the data tells you it's necessary.
Could you provide more details on what your rules engine does exactly? If each rule "firing" is performing data updates, you may want to verify that the connection is being properly released (put the release in the finally block of your code to ensure that the connections are really being returned).
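To show what "properly released" can look like, here is a sketch of one rule lookup that borrows a connection, uses it, and returns it even when the query throws. The class name RuleLookup, the table rule_hits and its column are invented for the example. It uses try-with-resources, which needs Java 7 or later; on an older runtime (which is likely the case on WebSphere 6.1) the same effect comes from calling close() on the result set, statement and connection in a finally block.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical rule helper: holds a pooled connection only for the duration of one query.
public class RuleLookup {
    // Assumed table and column names, for illustration only.
    private static final String SQL =
            "SELECT COUNT(*) FROM rule_hits WHERE rule_name = ?";

    public static int countMatches(DataSource pool, String ruleName) throws SQLException {
        // Resources are closed in reverse order when the block exits,
        // whether normally or because of an exception.
        try (Connection conn = pool.getConnection();
             PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, ruleName);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getInt(1) : 0;
            }
        }
    }
}
```

With a pooled DataSource, close() on the connection normally just hands it back to the pool, so the cost per call is small and the connection is free for the next rule straight away.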
If possible, you may want to consider capturing your data updates in a memory buffer and writing to the database only at the end of the rule session/invocation.
If the database operations are read-only, consider caching the information.
As bad as you think 400-800 connections being created and released to the pool is, I suspect it'll be much much worse if you have to create and close 400-800 unpooled connections.