I have 20 factory machines, all processing tasks, and a database adapter that talks to the MySQL database where the machines store all their info. Is it safe to have each factory machine panel have its own database adapter and updater thread? The updater thread just continuously checks whether the panel's current taskID is the same as the current one in the database, and if not, it repopulates the panel with information about the new task.
I'm not certain whether having this many connections will add overhead or not.
RDBMSs are designed to be accessed by multiple clients at a time; it's one of their main purposes.
So I don't think 20, or even a thousand, simultaneous connections will cause any problem at all.
Rather than having many connections all doing the same work, create one process that maintains the list of task IDs (if it differs per machine) and checks the current taskID in the database. If a taskID has changed, send a message to the machines whose taskID changed so they update their panels. This avoids unnecessary load on the database and also copes with an increase in the number of machines without any impact.
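For what it's worth, that single-watcher idea could look roughly like this; TaskSource, Panel, the machineID-to-panel map and the 2-second interval are all placeholders for whatever your adapter and panels actually look like, not part of your existing code:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Hypothetical seams: one adapter call that returns machineID -> current taskID,
    // and one callback per panel. The names are illustrative only.
    public class TaskWatcher {
        public interface TaskSource { Map<Integer, Integer> currentTaskIdsByMachine(); }
        public interface Panel { void showTask(int taskId); }

        private final Map<Integer, Integer> lastTaskByMachine = new HashMap<>();
        private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        private final TaskSource source;
        private final Map<Integer, Panel> panels;

        public TaskWatcher(TaskSource source, Map<Integer, Panel> panels) {
            this.source = source;
            this.panels = panels;
        }

        public void start() {
            scheduler.scheduleAtFixedRate(this::pollOnce, 0, 2, TimeUnit.SECONDS);
        }

        private void pollOnce() {
            // One query per poll serves all machines instead of 20 separate updater threads.
            for (Map.Entry<Integer, Integer> e : source.currentTaskIdsByMachine().entrySet()) {
                Integer previous = lastTaskByMachine.put(e.getKey(), e.getValue());
                if (previous == null || !previous.equals(e.getValue())) {
                    Panel panel = panels.get(e.getKey());
                    if (panel != null) {
                        panel.showTask(e.getValue());   // only changed panels repopulate
                    }
                }
            }
        }
    }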
The number of connections in MySQL is controlled by the max_connections system variable (total number of allowed simultaneous connections) and max_user_connections (maximum number of simultaneous connections per user). Take a look at your server settings and change them if necessary. The defaults are definitely higher than 20, though.
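If you just want to see what those limits currently are, a quick check over JDBC works; the URL and credentials below are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Prints the server-side connection limits.
    public class ShowConnectionLimits {
        public static void main(String[] args) throws Exception {
            try (Connection c = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "user", "pass");
                 Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery("SHOW VARIABLES LIKE 'max%connections'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " = " + rs.getString(2));
                }
            }
        }
    }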
I have a scenario in production for a web app where, when a form is submitted, the data gets stored in 3 tables in an Oracle DB through JDBC. Sometimes I see connection timeout errors in the logs while the app is trying to connect to the Oracle DB through Java code. The issue is intermittent.
Below is the exception:
SQL exception while storing data in table
java.sql.SQLRecoverableException: IO Error: Connection timed out
Most of the time the web app is able to connect to the database and insert values, but sometimes I get this timeout error and am unable to insert data. I am not sure why I am getting this intermittent issue. When I checked the connection pool config in my application, I noticed the following:
Pool Size (Maximum number of Connections that this pool can open) : 10
Pool wait (Maximum wait time, in milliseconds, before throwing an Exception if all pooled Connections are in use) : 1000
Since the pool size is just 10, could this connection timeout issue occur when multiple users are trying to connect to the database?
Also, since the data insertion touches 3 tables, we do the whole insertion on a single connection; we are not opening a separate DB connection for each individual table.
NOTE: This application is deployed on an AEM (content management system) server, and the connection pool config is provided by it.
Update: I tried setting the validation query in the connection pool but I am still getting the connection timeout error. I am not sure whether the connection pool actually ran the validation query or not. I have attached the connection pool config above for reference.
I would try two things:
Try setting a validation query so that each time the pool leases a connection, you're sure it's actually available; select 1 from dual should work. With recent JDBC drivers that should not be required, but you might give it a go (a configuration sketch follows at the end of this answer).
Estimate the concurrency of your form. A 10-connection pool is not necessarily too small, depending on the complexity of your work on the DB. It seems you're saving a form, so it should not be THAT complex. How many users per day do you expect? And at peak time, how many users do you expect to be using the form at the same time? A 10-connection pool usually leases and returns connections quite fast, so it can handle several transactions per second. If you expect more, increase the size slightly (more than 25-30 connections actually degrades DB performance, as more queries compete for resources there).
If nothing seems to work, it would be good to check what's happening on your DB. If possible, use Enterprise Manager to see whether there are latches while working with those three tables.
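For reference, here is roughly what the validation-query settings look like if you control the pool yourself, using Apache Commons DBCP2. Since your pool is provided by AEM you would have to find the equivalent options in its config, so treat the property names and values below purely as an illustration:

    import javax.sql.DataSource;
    import org.apache.commons.dbcp2.BasicDataSource;

    public class PoolConfig {
        // Illustration only: a DBCP2 pool that validates connections before handing them out.
        public static DataSource create() {
            BasicDataSource ds = new BasicDataSource();
            ds.setUrl("jdbc:oracle:thin:@//dbhost:1521/ORCL"); // placeholder URL
            ds.setUsername("app");                             // placeholder credentials
            ds.setPassword("secret");
            ds.setMaxTotal(10);                                // pool size
            ds.setMaxWaitMillis(1000);                         // pool wait before throwing
            ds.setValidationQuery("select 1 from dual");
            ds.setTestOnBorrow(true);                          // run the validation query on every lease
            return ds;
        }
    }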
I'll answer this from a programming point of view. There are several possible causes for this problem; they are listed below, each with an appropriate solution. A connection timeout means your new thread does not get database access within the configured time, which can be due to:
Possibility I: Connections not being closed; there may be a connection leak somewhere in your application.
Solution: Check for such a leak and make sure every connection is closed after use.
Possibility II: A big transaction.
Solution:
i. Are these insertions synchronized? If so, use synchronization very carefully: apply it at block level, not method level, and keep the synchronized block as small as possible.
What happens with a big synchronized block is that a thread gets a connection but then has to wait because the block takes too long to execute, so the waiting time of the other threads grows. Suppose there are 100 users, each with a thread for that operation: the first one executes and takes a long time while the others wait, so the 80th, 90th, etc. thread may hit the timeout. That is why the issue appears only for some threads.
So you need to reduce the size of the synchronized block.
ii. Also check whether the transaction itself is big; if it is, try to split it into smaller ones where possible. For example, one small transaction per insertion: one for the first table, one for the second, one for the third; together these three small transactions complete the operation (rough sketch below).
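A rough JDBC sketch of that "one small transaction per insertion" idea; the table and column names are made up, and the DataSource is whatever pool your application already uses:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import javax.sql.DataSource;

    public class FormSaver {
        private final DataSource pool;   // the application's existing connection pool

        public FormSaver(DataSource pool) { this.pool = pool; }

        // Instead of one transaction spanning all three tables, commit after each insert
        // so each connection lease is short. (Table and column names are invented.)
        public void save(String a, String b, String c) throws Exception {
            insert("INSERT INTO table_one (col) VALUES (?)", a);
            insert("INSERT INTO table_two (col) VALUES (?)", b);
            insert("INSERT INTO table_three (col) VALUES (?)", c);
        }

        private void insert(String sql, String value) throws Exception {
            try (Connection conn = pool.getConnection();
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                conn.setAutoCommit(false);
                ps.setString(1, value);
                ps.executeUpdate();
                conn.commit();            // each insert is its own small transaction
            }
        }
    }

Note that splitting the work like this trades atomicity for shorter connection and lock hold times, so only do it if a partially completed form submission is acceptable or handled elsewhere.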
Possibility III: The pool size is not enough if application usage is high.
Solution: Increase the pool size. (This only applies if you properly close all connections after use.)
You can use a Java ExecutorService in this case: one thread, one connection, all asynchronous. Once the transaction completes, release the connection back to the pool. That way you can get rid of this timeout issue.
If one connection is inserting the data into 3 tables and the other threads trying to get a connection are waiting, a timeout is bound to happen.
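A minimal sketch of that ExecutorService approach; the pool is assumed to be a standard javax.sql.DataSource, and the worker count is matched to the pool size from the question:

    import java.sql.Connection;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import javax.sql.DataSource;

    public class AsyncFormWriter {
        private final ExecutorService workers = Executors.newFixedThreadPool(10); // match the pool size
        private final DataSource pool;

        public AsyncFormWriter(DataSource pool) { this.pool = pool; }

        // One task per form submission: borrow one connection, do the three inserts,
        // and let try-with-resources return the connection to the pool.
        public void submit(String a, String b, String c) {
            workers.submit(() -> {
                try (Connection conn = pool.getConnection()) {
                    conn.setAutoCommit(false);
                    // ... the three PreparedStatement inserts go here ...
                    conn.commit();
                } catch (Exception e) {
                    e.printStackTrace();    // real code should log and handle this
                }
            });
        }
    }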
I use thousands of H2 databases via TCP using Hikari connection pools. In a period of 1-30 minutes a lot of queries will be performed on about five of the databases. There will also be some queries to some of the other databases, but it is not predictable which and how many databases are affected by those side queries.
I store the already-used HikariDataSources in a HashMap for subsequent queries, but I fear that the HashMap (and the objects it contains) is getting too large, so I want to create a cleanup thread that closes HikariDataSources and removes them from the HashMap after they haven't been used for a certain period of time.
To remove the right connection pools I need to know whether a pool has gone unused for a defined period of time. How do I get that information?
And is there a better way to handle this number of connection pools? Maybe there is something made for connection pools, similar to how HikariCP is made for connections. Is there a pool for connection pools? :D
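For what it's worth, the cleanup idea described above can be sketched like this: wrap each HikariDataSource with a last-used timestamp and let a scheduled task close and remove the idle ones. The JDBC URL handling, credentials and the 30-minute threshold are simplified placeholders:

    import com.zaxxer.hikari.HikariDataSource;
    import java.sql.Connection;
    import java.sql.SQLException;
    import java.util.Iterator;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class PoolRegistry {
        private static final long MAX_IDLE_MILLIS = TimeUnit.MINUTES.toMillis(30);

        private static class Entry {
            final HikariDataSource ds;
            volatile long lastUsed = System.currentTimeMillis();
            Entry(HikariDataSource ds) { this.ds = ds; }
        }

        private final Map<String, Entry> pools = new ConcurrentHashMap<>();
        private final ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();

        public PoolRegistry() {
            cleaner.scheduleAtFixedRate(this::evictIdle, 1, 1, TimeUnit.MINUTES);
        }

        public Connection connectionFor(String jdbcUrl) throws SQLException {
            Entry e = pools.computeIfAbsent(jdbcUrl, url -> {
                HikariDataSource ds = new HikariDataSource();
                ds.setJdbcUrl(url);                       // credentials etc. omitted
                return new Entry(ds);
            });
            e.lastUsed = System.currentTimeMillis();      // mark the pool as used
            return e.ds.getConnection();
        }

        private void evictIdle() {
            long now = System.currentTimeMillis();
            for (Iterator<Map.Entry<String, Entry>> it = pools.entrySet().iterator(); it.hasNext();) {
                Entry e = it.next().getValue();
                if (now - e.lastUsed > MAX_IDLE_MILLIS) {
                    it.remove();
                    e.ds.close();                         // closes the idle Hikari pool
                }
            }
        }
    }

Alternatively, a cache with per-entry access expiry (for example Guava's CacheBuilder with expireAfterAccess and a removal listener that closes the pool) gives similar behavior with less hand-written bookkeeping.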
I have a system with multiple sensors and I need to collect data from each sensor every minute. I am using
final Runnable collector = new Runnable() { public void run() { ... } };
scheduler.scheduleAtFixedRate(collector, 0, 1, TimeUnit.MINUTES);
to initiate the process every minute; it starts an individual thread for each sensor. Each thread opens a MySQL connection, gets the sensor's details from the database, opens a socket to collect data, stores the data in the database, and then closes the socket and the DB connection. (I make sure all the connections are closed.)
Now there are other applications which I use to generate alerts and reports from that data.
Now, as the number of sensors increases, the server is starting to get overloaded and the applications are getting slow.
I need some expert advice on how to optimise my system and what the best way is to implement this type of system. Should I use only one application to collect data + generate alarms + generate reports + generate chart images, etc.?
Thanks in advance.
Here is the basic code for data collector application
import java.util.ArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class OnlineSampling
{
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

    public void startProcess(int start)
    {
        try
        {
            final Runnable collector = new Runnable()
            {
                @SuppressWarnings("rawtypes")
                public void run()
                {
                    DataBase db = new DataBase();
                    db.connect("localhost");
                    try
                    {
                        ArrayList instruments = new ArrayList();
                        // Check if the database is connected
                        if (db.isConnected())
                        {
                            String query = "SELECT instrumentID,type,ip,port,userID FROM onlinesampling WHERE status = 'free'";
                            instruments = db.getData(query, 5);
                            for (int i = 0; i < instruments.size(); i++)
                            {
                                ...
                                // This OnlineSamplingThread opens the socket, collects the data and does a few more things
                                OnlineSamplingThread comThread = new OnlineSamplingThread(userID, id, type, ip, port, gps, unitID, parameterID, timeZone, units, parameters, scaleFactors, offsets, storageInterval);
                                comThread.start();
                            }
                        }
                    } catch (Exception e)
                    {
                        e.printStackTrace();
                    }
                    finally
                    {
                        // Disconnect from the database
                        db.disconnect();
                    }
                }
            };
            scheduler.scheduleAtFixedRate(collector, 0, 60, TimeUnit.SECONDS);
        } catch (Exception e) {}
    }
}
UPDATED:
How many sensors do you have? We have around 400 sensors (increasing).
How long is the data-gathering session with each sensor?
Each sensor has a small webserver with a SIM card in it to connect to the internet. It depends on the 3G network; in normal conditions it does not take more than 3.5 seconds.
Are you closing the network connections properly after you're done with a single sensor? I make sure I close the socket every time; I have also set the timeout duration for each socket, which is 3.5 seconds.
What OS are you using to collect sensor data? We have our own protocol to communicate with the sensors using socket programming.
Is it configured as a server or a desktop? Each sensor is a server.
What you probably need is connection pooling: instead of opening one DB connection per sensor, have a shared pool of open connections that each thread uses when it needs to access the DB. That way, the number of connections can be much smaller than the number of your sensors (assuming that most of the time the program is doing something other than reading/writing the DB, like communicating with the sensor or waiting for a sensor response).
If you don't use a framework that has a connection pooling feature, you can try Apache Commons DBCP.
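As a rough illustration for the MySQL setup in this question (URL, credentials and pool size are placeholders): create the pool once, and let every sampling thread borrow and return connections with try-with-resources.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import javax.sql.DataSource;
    import org.apache.commons.dbcp2.BasicDataSource;

    public class SensorDb {
        // One shared pool for the whole collector, created once at startup.
        private static final BasicDataSource POOL = new BasicDataSource();
        static {
            POOL.setUrl("jdbc:mysql://localhost:3306/sensors"); // placeholder
            POOL.setUsername("collector");
            POOL.setPassword("secret");
            POOL.setMaxTotal(20);       // far fewer connections than sensors
        }

        public static DataSource pool() { return POOL; }

        // Example use inside a sampling thread: borrow, query, and return immediately.
        public static String instrumentIp(int instrumentId) throws Exception {
            String sql = "SELECT ip FROM onlinesampling WHERE instrumentID = ?";
            try (Connection c = POOL.getConnection();
                 PreparedStatement ps = c.prepareStatement(sql)) {
                ps.setInt(1, instrumentId);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString(1) : null;
                }
            }
        }
    }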
Reuse any open files or sockets whenever you can. DBCP is a good start.
Reuse any threads if you can. That "comThread" is very suspect in that regard.
Consider adding queues to your worker threads. This will allow you to have threads that process tasks/jobs serially.
Profile, Profile, Profile!! You really have no idea what to optimize until you profile. JProfiler and YourKit are both very popular, but there are some free tools such as Netbeans and VisualVM.
Use Database caching such as Redis or Memcache
Consider using Stored Procedures versus inline queries
Consider using a Service Oriented Architecture or Micro-Services. Splitting each application function into a separate service which can be tightly optimized for that function.
These suggestions are based on the small amount of code you posted, but profiling should give you a much better idea.
Databases are made to handle loads far beyond "hundreds of inserts" per minute; in fact, a MySQL database can easily handle hundreds of inserts per second. So your problem is probably not related to the load.
The first goal is to find out "what is slow" or "what is collapsing". Run all the queries that your application runs and see if any of them are abnormally slow compared to the others. Alternatively, configure the Slow Query Log (https://dev.mysql.com/doc/refman/5.0/en/slow-query-log.html) with parameters fitting your problem, and then analyze the output.
Once you find what the problem is, you can ask for help here with more information; we have no way to help you further with only what you have provided.
However, just as a hunch: what is the max_connections parameter value for your database? The default is 100 or 151, I think, so if you have more than 151 sensors connected to the database at the same time it will queue or drop the new incoming connections. If that's your issue, you just have to minimise the time sensors stay connected to the database and it will be fixed.
Your system is (almost certainly) slowing down because of the enormous overhead of starting threads, opening database connections, and then closing them. 300 sensors means five of these operations per second, continuously. That's too many.
Here's what you need to do to make this scalable.
First step
Make your sampling program long-running, rather than starting it over frequently.
Have it start a sensor thread for each 20 sensors (approximately).
Each thread will query its sensors one by one and insert the results into some sort of thread-safe data structure. A Bag or a Queue would be suitable.
When your sensor threads come to the end of each minute's work, make each of them sleep for the remaining time before the next minute starts, then start over.
Have your program start a single database-writing thread. That thread will open a database connection and hold it open. It will then take results from the queue and write them to the database, waiting when no results are available.
The database-writing thread should start a MySQL transaction, then INSERT some number of rows (ten to 100), then Commit the transaction and start another one, rather than using the default autocommit behavior. (If you're using MyISAM tables, you don't need to do this.)
This will drastically improve your throughput and reduce your MySQL overhead.
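A condensed sketch of that single database-writing thread, assuming a BlockingQueue between the sensor threads and the writer and an invented samples table; the batch size of ~50 rows is in the range suggested above:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Timestamp;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    public class SampleWriter {
        // Sensor threads put readings here; a single writer thread drains it.
        private final BlockingQueue<Reading> results = new LinkedBlockingQueue<>();

        public void publish(Reading r) {       // called by the sensor threads
            results.add(r);
        }

        // One long-lived connection, explicit transactions, commit every ~50 rows.
        public void startWriter() {
            Thread writer = new Thread(() -> {
                try (Connection conn = DriverManager.getConnection(
                        "jdbc:mysql://localhost:3306/sensors", "collector", "secret")) { // placeholders
                    conn.setAutoCommit(false);
                    PreparedStatement ps = conn.prepareStatement(
                            "INSERT INTO samples (sensor_id, value, taken_at) VALUES (?, ?, ?)"); // assumed table
                    int pending = 0;
                    while (!Thread.currentThread().isInterrupted()) {
                        Reading r = results.poll(5, TimeUnit.SECONDS);
                        if (r == null) {                    // queue quiet: flush what we have
                            if (pending > 0) { conn.commit(); pending = 0; }
                            continue;
                        }
                        ps.setInt(1, r.sensorId);
                        ps.setDouble(2, r.value);
                        ps.setTimestamp(3, r.takenAt);
                        ps.executeUpdate();
                        if (++pending >= 50) {              // batch commits instead of one per row
                            conn.commit();
                            pending = 0;
                        }
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }, "db-writer");
            writer.setDaemon(true);
            writer.start();
        }

        // Minimal value object for one sample; the fields are assumptions.
        public static class Reading {
            final int sensorId;
            final double value;
            final Timestamp takenAt;
            public Reading(int sensorId, double value, Timestamp takenAt) {
                this.sensorId = sensorId; this.value = value; this.takenAt = takenAt;
            }
        }
    }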
Second step
When your workload gets too big for a single program instance with multiple sensor threads to handle, run multiple instances of the program, each with its own list of sensors.
Third step
When the workload gets too big for a single machine, add another one and run new instances of your program on that new machine.
Collecting data from hundreds of sensors should not pose a performance problem if done correctly. To scale this process you should carefully manage your database connections as well as your sensor connections and you should leverage queues for the sensor-sampling and sensor-data writing processes. If your sensor count is stable, you can cache the sensor connection data, possibly with periodic updates to your sensor connection cache.
Use a connection pool to talk to your database. Query your database for the sensor connection information, then release that connection back to the pool as soon as possible -- do not keep the database connection open while talking to the sensor. It's likely reading sensor connection data (which talks to your database) can be done in a single thread, and that thread creates sensor sampling jobs for your executor.
Within each sensor sampling job, open the HTTP sensor connection, collect sensor data, close HTTP sensor connection, and then create a sensor data write job to write the sampling data to the database. Assuming your sensors are distinct nodes, an HTTP sensor connection pool is not likely to help much because HTTP client and server connections are relatively light (unlike database connections).
Writing sensor-sampling data back to the database should also go through a queue, and those database write jobs should use your database connection pool.
With this design, you should be able to easily handle hundreds of sensors and likely thousands of sensors with modest hardware running a Linux server OS as the collector and a properly configured database.
I suggest you test these processes independently, so you know the sustainable rates for each step:
reading and caching sensor connection data and creating sampling jobs;
executing sampling jobs and creating writing jobs; and
executing sample-data writing jobs.
Let me know if you'd like code as well.
I'm trying to better understand what will happen if multiple threads try to execute different sql queries, using the same JDBC connection, concurrently.
Will the outcome be functionally correct?
What are the performance implications?
Will thread A have to wait for thread B to be completely done with its query?
Or will thread A be able to send its query immediately after thread B has sent its query, after which the database will execute both queries in parallel?
I see that Apache DBCP uses synchronization to ensure that connections obtained from the pool are removed from the pool, and made unavailable, until they are closed. This seems more inconvenient than it needs to be. I'm thinking of building my own "pool" simply by creating a static list of open connections and distributing them in a round-robin manner (sketched below).
I don't mind the occasional performance degradation, and the convenience of not having to close the connection after every use seems very appealing. Is there any downside to me doing this?
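For concreteness, the round-robin idea being described is something like the following; the answers below discuss why sharing live connections between threads like this is risky:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    // The round-robin "pool" being described: a fixed list of shared, always-open connections.
    public class RoundRobinConnections {
        private final List<Connection> connections = new ArrayList<>();
        private final AtomicInteger counter = new AtomicInteger();

        public RoundRobinConnections(String url, String user, String pass, int size) throws Exception {
            for (int i = 0; i < size; i++) {
                connections.add(DriverManager.getConnection(url, user, pass));
            }
        }

        // Threads never "return" a connection; they just grab the next one in the cycle.
        public Connection next() {
            int i = Math.floorMod(counter.getAndIncrement(), connections.size());
            return connections.get(i);
        }
    }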
I ran the following set of tests using an AWS RDS Postgres database and Java 11:
Create a table with 11M rows, each row containing a single TEXT column, populated with a random 100-char string
Pick a random 5 character string, and search for partial-matches of this string, in the above table
Time how long the above query takes to return results. In my case, it takes ~23 seconds. Because there are very few results returned, we can conclude that the majority of this 23 seconds is spent waiting for the DB to run the full-table-scan, and not in sending the request/response packets
Run multiple queries in parallel (with different keywords), using different connections. In my case, I see that they all complete in ~23 seconds; i.e., the queries are being efficiently parallelized.
Run multiple queries on parallel threads, using the same connection. I now see that the first result comes back in ~23 seconds. The second result comes back in ~46 seconds. The third in ~1 minute. etc etc. All the results are functionally correct, in that they match the specific keyword queried by that thread
To add on to what Joni mentioned earlier: his conclusion matches the behavior I'm seeing on Postgres as well. It appears that correctness is preserved, but all parallelism benefits are lost if multiple queries are sent on the same connection at the same time.
Since the JDBC spec doesn't give guarantees of concurrent execution, this question can only be answered by testing the drivers you're interested in, or reading their source code.
In the case of MySQL Connector/J, all methods to execute statements lock the connection with a synchronized block. That is, if one thread is running a query, other threads using the connection will be blocked until it finishes.
Doing things the wrong way will have undefined results... if someone runs some tests, maybe they'll answer all your questions exactly, but then a new JVM comes out, or someone tries it on another JDBC driver or database version, or hits a different set of race conditions, or tries another platform or JVM implementation, and a different undefined result happens.
If two threads modify the same state at the same time, anything can happen depending on the timing. Maybe the second one overwrites the first's query and then both run the same query. Maybe the library detects your error and throws an exception. I don't know and wouldn't bother testing (or maybe someone already knows, or it should be obvious what would happen), so this isn't "the answer", just some advice: use a connection pool, or use a synchronized block, to make sure these problems can't happen.
We had to disable the statement cache on WebSphere because it was throwing ArrayIndexOutOfBoundsException at the PreparedStatement level.
The issue was that some guy thought it was smart to share a connection between multiple threads.
He said it was to save connections, but there is no point in multithreading queries because the DB won't run them in parallel anyway.
There was also an issue with Java runnables that were blocking each other because they used the same connection.
So that's just something not to do; there is nothing to gain.
There is an option in WebSphere to detect this kind of multithreaded access.
I implemented my own since we use Jetty in development.
I'm trying to write a multithreaded program that connects to a MySQL database and processes the result set of a query (which has thousands of rows). The problem is that I have implemented a connection pool, and every thread opens a connection to the database and fetches the result set. But I don't see the advantage of using connection pooling if retrieving that big set takes so long. Wouldn't it be better to fetch the whole set over a single connection (without pooling) and then use a thread pool to process it? Or is there a way for every thread to take the next row of the result set?
If you have a limited number of threads, I would have a connection per thread.
A connection pool is more efficient when the number of threads that could use a connection is large and those threads use the connections a relatively low percentage of the time.
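A minimal sketch of the connection-per-thread suggestion, using a ThreadLocal so each worker thread lazily opens and then reuses its own connection; the URL and credentials are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    // Connection-per-thread: each worker lazily opens one connection and keeps reusing it.
    public class PerThreadConnections {
        private static final String URL = "jdbc:mysql://localhost:3306/mydb"; // placeholder

        private static final ThreadLocal<Connection> CONNECTION = ThreadLocal.withInitial(() -> {
            try {
                return DriverManager.getConnection(URL, "user", "pass");      // placeholder credentials
            } catch (SQLException e) {
                throw new IllegalStateException("could not open connection for " + Thread.currentThread(), e);
            }
        });

        public static Connection current() {
            return CONNECTION.get();   // always the calling thread's own connection
        }
    }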