We're currently trying to make our server software use a connection pool to greatly reduce lag. However, instead of reducing the time queries take to run, the pool is doubling it, making queries even slower than they were before.
Are there any reasons for this? Does JDBC only allow a single query at a time or is there another issue?
Also, does anyone have examples of multi-threaded connection pools that reduce the time hundreds of queries take? The examples we have found only made things worse.
We've tried using BoneCP and Apache DBCP with similar results...
A connection pool helps mitigate the overhead of creating new connections to the database by reusing existing ones. This matters if your workload involves many short- to medium-lived connections, e.g. an app that processes concurrent user requests by querying the database. Unfortunately, your example benchmark code does not have such a profile: you are just using 4 connections in parallel, and there is no reuse involved.
What a connection pool cannot do is magically speed up execution times or improve the concurrency level beyond what the database itself provides. If the benchmark code represents the expected workload, I would advise you to look into batching statements instead of threading. That will massively increase the performance of INSERT/UPDATE operations.
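For instance, here is a minimal sketch of JDBC batching. The table, columns, and connection URL are made up for illustration, and a MySQL driver is assumed on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Hedged sketch: insert many rows in one round trip via addBatch/executeBatch.
public class BatchInsertExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/test", "user", "password")) { // placeholder URL
            conn.setAutoCommit(false); // one commit for the whole batch
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO items (id, name) VALUES (?, ?)")) {  // made-up table
                for (int i = 0; i < 1000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "name-" + i);
                    ps.addBatch();    // queue the row client-side
                }
                ps.executeBatch();    // submit all queued rows at once
            }
            conn.commit();
        }
    }
}
```

With MySQL Connector/J, adding rewriteBatchedStatements=true to the URL lets the driver rewrite the batch into multi-row INSERTs, which can help further.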
Update:
Using multiple connections in parallel can enhance performance. Just keep in mind that there is not necessarily a relation between multiple threads in your Java application and multiple threads in the database. JDBC is just a wrapper around the database driver; using multiple connections results in multiple queries being submitted to the database server in parallel. If those queries are suited for it, every modern RDBMS will be able to process them in parallel. But if those queries are very work intensive, or worse, involve table locks or conflicting updates, the DB may not be able to do so.

If you experience bad performance, check which queries are lagging and optimize them: are they efficient? Are proper indexes in place? Denormalizing the schema may help in more extreme cases. Use prepared statements and batch mode for larger updates. If your DB is overloaded with many small, similar queries, consider caching frequently used data.
Related
Application is hosted on multiple Virtual Machines and DB is on single server. All VMs are pointing to single Instance of DB.
In this architecture, I have a table with very few records, but the table is accessed and updated very heavily by threads running on the VMs. This is causing a performance bottleneck and sometimes record-level exceptions. Database-level locking does not seem to be the best option, as it introduces significant delays in request processing.
Please suggest if there is any other technique to solve this problem.
A few questions first!
Is your application using connection pooling? If not, please use it; creating a JDBC connection is expensive! (A minimal setup sketch follows these questions.)
Is your application read-heavy or write-heavy?
What kind of storage engine are you using in your MySQL tables, InnoDB or MyISAM? If your application is write-heavy, use InnoDB tables, as InnoDB uses row-level locking and will serve concurrent requests better.
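As promised above, here is a minimal pool-setup sketch. It assumes Commons DBCP 2; the URL, credentials, and sizing numbers are placeholders to adapt to your environment.

```java
import org.apache.commons.dbcp2.BasicDataSource;

// Hedged sketch: configure a DBCP 2 pool once, share it application-wide.
public class PoolSetup {
    public static BasicDataSource createPool() {
        BasicDataSource ds = new BasicDataSource();
        ds.setUrl("jdbc:mysql://localhost/test"); // placeholder URL
        ds.setUsername("user");                   // placeholder credentials
        ds.setPassword("password");
        ds.setMaxTotal(20); // cap on concurrent connections
        ds.setMaxIdle(10);  // connections kept warm for reuse
        return ds;
    }
}
```

Create one such pool at startup and have every request thread call getConnection() on it; closing the connection simply returns it to the pool.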
One special case - if you are implementing queues on top of database tables, find a database that has a built-in queue operation and use that, or use a reliable messaging service. Building queues on top of databases is typically not efficient. See e.g. http://mikehadlow.blogspot.co.uk/2012/04/database-as-queue-anti-pattern.html
In general, running transactions on databases is slow because, at the end of each transaction, the database needs to be sure that enough has been written to disk that the changes would be safely preserved if the system died right then. If you don't need that guarantee, you might find it faster to write a single non-database application that does what the database does but writes nothing to disk, or still does database IO but only the minimum possible. Then, instead of all the VMs talking to the database directly, they would all talk to this application.
On my server, in order to speed things up, I have allocated a connection pool for my SQLite ODBC source.
What happens if two or more hosts want to alter my data?
Are these multiple connections automatically handled by SQLite?
You can read this thread
If most of those concurrent accesses are reads (e.g. SELECT), SQLite can handle them very well. But if you start writing concurrently, lock contention could become an issue. A lot would then depend on how fast your filesystem is, since the SQLite engine itself is extremely fast and has many clever optimizations to minimize contention. Especially SQLite 3.
For most desktop/laptop/tablet/phone applications, SQLite is fast enough as there's not enough concurrency. (Firefox uses SQLite extensively for bookmarks, history, etc.)
For server applications, somebody some time ago said that anything less than 100K page views a day could be handled perfectly by a SQLite database in typical scenarios (e.g. blogs, forums), and I have yet to see any evidence to the contrary. In fact, with modern disks and processors, 95% of web sites and web services would work just fine with SQLite.
If you want really fast read/write access, use an in-memory SQLite database. RAM is several orders of magnitude faster than disk.
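For illustration, and assuming the Xerial sqlite-jdbc driver is on the classpath, an in-memory database is just a special connection URL:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hedged sketch: the database lives entirely in RAM and disappears
// when the last connection to it is closed.
public class InMemorySqlite {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:");
             Statement st = conn.createStatement()) {
            st.executeUpdate("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)");
            st.executeUpdate("INSERT INTO kv VALUES ('greeting', 'hello')");
        }
    }
}
```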
And check this
In short: it is not a good solution.
Description:
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time.
For your situation, that is not good.
Advice: Use another RDBMS.
I'm developing a multithreaded application where threads are created when someone connects to my socket. Each connection creates a new thread, and each thread makes queries to a MySQL database using JDBC. I'm wondering if these multiple connections to MySQL from my different threads could cause any problems in my application or negatively affect MySQL data.
On the contrary, you should always connect to a DB in a multithreaded fashion. Or really, a pooled fashion!
Consider the situation when your application becomes a worldwide hit and you get 100k hits a minute: you will have a heck of a lot of threads, namely one per connection, which will break your application, your app server, and your DB... :-)
Instead you might implement a pool of DB connections from which your threads can borrow, returning connections when done with them. There are several good open-source projects to choose from for this, c3p0 and Commons DBCP being just two of them.
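The borrow/return pattern is little more than try-with-resources. A minimal sketch, assuming any pooling DataSource and a made-up users table:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

// Hedged sketch: borrow a connection, use it, and let close() return it.
public class PooledQuery {
    static String findName(DataSource pool, long id) throws Exception {
        // close() on a pooled connection returns it to the pool
        try (Connection conn = pool.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT name FROM users WHERE id = ?")) { // made-up query
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}
```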
Hope that helps,
There is nothing to be gained by having more threads than there are actual CPU/hardware threads. If anything, more threads create more overhead, which will effectively slow your application down.

What threads do provide is a relatively easy way of doing more of the same thing, but after a while that approach hits a brick wall and one has to start thinking about other potential solutions.

If you want your application to be scalable, now is a good time to start thinking about how you can scale out your application, i.e. a distributed solution where multiple systems share the load. Instead of more threads on a single system, think in terms of work queues and worker threads distributed across N systems.
So, here is the deal.
I'm developing an Android application (although it could just as easily be any other mobile platform) that will occasionally be sending queries to a server (which is written in Java). This server will then search a MySQL database for the query, and send the results back to the Android. Although this sounds fairly generic, here are some specifics:
The Android will make a new TCP connection to the server every time it queries. The server is geographically close, the Android could potentially be moving around a lot, and, since the Android app might run for hours while only sending a few queries, this seemed the best use of resources.
The server could potentially have hundreds (or possibly even thousands) of these queries at once.
Since each query runs in its own Thread, each query will at least need its own Statement (and could have its own Connection).
Right now, the server is set up to make one Connection to the database, and then create a new Statement for each query. My questions for those of you with some database experience (MySQL in particular, since it is a MySQL database) are:
a) Is it thread safe to create one Statement per Thread from a single Connection? From what I understand it is; I'm just looking for confirmation.
b) Is there any thread safe way for multiple threads to use a single PreparedStatement? These queries will all be pretty much identical, and since each thread will execute only one query and then return, this would be ideal.
c) Should I be creating a new Connection for each Thread, or is it better to spawn new Statements from a single Connection? I think a single Connection would be better performance-wise, but I have no idea what the overhead for establishing a DB Connection is.
d) Is it best to use stored SQL procedures for all this?
Any hints / comments / suggestions from your experience in these matters are greatly appreciated.
EDIT:
Just to clarify, the Android sends queries over the network to the server, which then queries the database. The Android does not directly communicate with the database. I am mainly wondering about best practices for the server-database connection here.
Just because a Connection object is thread safe does not mean it's thread efficient. As a best practice you should use a connection pool to avoid potential blocking issues. But in answer to your question: yes, you can share a Connection object between multiple threads.
You do need to create a new Statement/PreparedStatement in each thread that accesses the database; they are NOT thread safe. I would highly recommend using PreparedStatements, as you gain efficiency and protection against SQL injection attacks.
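A minimal sketch of the per-thread pattern; the query and table are hypothetical. Each worker obtains its own connection and PreparedStatement, so nothing JDBC-related is shared across threads:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

// Hedged sketch: one Runnable per query; the pool itself is thread safe,
// the Connection/PreparedStatement/ResultSet are confined to this thread.
public class QueryWorker implements Runnable {
    private final DataSource pool;
    private final long id; // placeholder parameter

    QueryWorker(DataSource pool, long id) {
        this.pool = pool;
        this.id = id;
    }

    @Override
    public void run() {
        try (Connection conn = pool.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT name FROM users WHERE id = ?")) { // made-up query
            ps.setLong(1, id); // parameters bound per thread
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        } catch (Exception e) {
            e.printStackTrace(); // real code should log and handle this
        }
    }
}
```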
Stored procedures will speed up your database queries, since the execution plan is compiled and saved in advance; highly recommended if you can use them.
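Calling one from JDBC goes through CallableStatement. A minimal sketch, where find_user is a hypothetical procedure taking one argument:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import javax.sql.DataSource;

// Hedged sketch: invoke a stored procedure via the JDBC escape syntax.
public class ProcCall {
    static void call(DataSource pool, long id) throws Exception {
        try (Connection conn = pool.getConnection();
             CallableStatement cs = conn.prepareCall("{call find_user(?)}")) {
            cs.setLong(1, id);
            try (ResultSet rs = cs.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name")); // made-up column
                }
            }
        }
    }
}
```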
Have you looked at caching your database data? Take a look at spymemcached if you can; it's a great product for reducing the number of calls to your data store.
From my experience, you should devote a little time to wrapping the database in a web service. This accomplishes two things:
You are forced to examine the data for wider consumption
You make it easier for new consumers to consume the data
It takes a bit more development time, but direct connections to a database over an open network (the Internet) are more problematic than exposing only what can be accessed through a method.
Use a connection pool such as Commons DBCP. It will handle all the stuff you're worrying about, out of the box.
I have a project whereby I'm reading huge volumes of data from an Oracle database from Java.
I have the feeling that the application we are writing will process the data far faster than a single-threaded SELECT query can deliver it, so I've been trying to research faster ways of obtaining the data.
Does anyone have anything I could read that would help me with my plight?
You haven't given us a lot of information on why it is necessary to bring "huge volumes of data" into the Java application instead of processing it on the database side. Although there can be exceptions, usually this is a signal to re-think the design. As a general rule with Oracle, it is most efficient to do as much work as you can with pure set operations (SQL), followed by procedural processing within the RDBMS engine (PL/SQL), before bringing results back to the client application.
Oracle supports parallel query and parallel DML; in particular, this applies to SELECT queries. Ultimately the bottleneck will probably be the IO read speed. Either use faster disks or stripe the data across many disks.
Update
As APC noted in the comments, Parallel Query/DML is an Enterprise Edition feature and is not available in the Standard Edition.
Also, Parallel DML/Query is not the solution to all performance problems. Since more than one process will be used by the query, it may improve throughput, but at the cost of concurrency. The purpose of parallelism is to use more resources to process the query faster. If the query is IO-bound or CPU-bound, there are no extra resources to use, and adding parallelism will only make matters worse.
From the link above:
Parallel execution is not normally useful for:

Environments in which the CPU, memory, or I/O resources are already heavily utilized. Parallel execution is designed to exploit additional available hardware resources; if no such resources are available, then parallel execution will not yield any benefits and indeed may be detrimental to performance.
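If parallel execution does fit your workload, one way to request it from JDBC is a hint in the statement text. A hedged sketch: the table name and degree of parallelism are made up, and Enterprise Edition is required.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Hedged sketch: ask Oracle to parallelize a single query via a hint.
public class ParallelRead {
    static void scan(Connection conn) throws Exception {
        String sql = "SELECT /*+ PARALLEL(orders, 4) */ id, total FROM orders";
        try (PreparedStatement ps = conn.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // process rs.getLong("id"), rs.getBigDecimal("total") ...
            }
        }
    }
}
```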
Use the setFetchSize(int) method on the Statement or PreparedStatement before you open the query. You should experiment with different sizes. Try 75 as a starting point.
On a slightly different usage, people have said that the PL/SQL bulk-fetch "sweet spot" is between 2000 and 3000, but I saw one benchmark indicating that 75 was optimal.
A large fetch size will tend to reduce the number of round trips between client and server. But if it is too large the database has to have a big buffer and the networking software may have to break up the big message into a lot of packets.
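A minimal sketch of that fetch-size tuning; big_table is a placeholder, and the only change from a plain query is the setFetchSize call before execution:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Hedged sketch: request 75 rows per client/server round trip.
public class FetchSized {
    static void readAll(Connection conn) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id, payload FROM big_table")) { // made-up table
            ps.setFetchSize(75); // rows fetched per round trip; experiment here
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // process rs.getLong(1), rs.getString(2) ...
                }
            }
        }
    }
}
```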
Firstly, 'huge data' to database people is [at least] gigabytes, in which case I suspect your problem is going to be reading those sorts of volumes into your process's memory and aggregating them there. Why do you think a single-threaded SELECT will be the bottleneck?
If the bottleneck were getting the data off disk, then having multiple threads pulling data from the same disk wouldn't necessarily be faster and might even be slower. But if you could spread the data over separate disks, separate threads would be faster. If, using SSDs, you don't think disks will be a contention point, we can look elsewhere.
If the bottleneck was network bandwidth, again multiple threads wouldn't fit any more data through the pipe any faster. You may even benefit from unloading the data to a flat file, compressing it and transferring that.
If the select is being sorted or comes from a hash-join, you may use memory more efficiently with a single thread. Multiple sessions would have to share the machine's memory.
If there is CPU-intensive processing, then multiple threads may help. That could be as simple as having multiple connections from Java, each getting a different slice of data (e.g. A-K and L-Z), as sketched below, but it would very much depend on the SELECT.
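A hedged sketch of that slicing idea, with a made-up customers table: each thread runs the same parameterized SELECT over its own connection, bound to a disjoint key range.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

// Hedged sketch: one thread per key range, one connection per thread.
public class SlicedRead {
    static Thread sliceReader(DataSource pool, String from, String to) {
        return new Thread(() -> {
            try (Connection conn = pool.getConnection();
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT name FROM customers WHERE name >= ? AND name < ?")) {
                ps.setString(1, from); // inclusive lower bound, e.g. "A"
                ps.setString(2, to);   // exclusive upper bound, e.g. "L"
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // process this slice independently of the others
                    }
                }
            } catch (Exception e) {
                e.printStackTrace(); // real code should log and handle this
            }
        });
    }
}
```

Start one thread per slice and join them; whether this is actually faster depends, as the rest of this answer notes, on where the bottleneck really is.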
I agree with dpbradley that you should determine the bottleneck first. If you have the data and the SELECT, it should be simple enough to determine how long it takes (both on the local machine and over the network), and a trace would be the necessary starting point to really dig into how it could be sped up.