I have a main thread that runs periodically. It opens a connection with setAutoCommit(false) and passes it by reference to a few child threads, which perform various database read/write operations. A reasonably large number of operations are performed in the child threads. After all the child threads have completed their DB operations, the main thread commits the transaction on the opened connection. Note that I run the threads through an ExecutorService. My question: is it advisable to share a connection across threads? If yes, please check whether the code below implements it correctly. If no, what are other ways to perform a transaction in a multi-threaded scenario? Comments/advice/new ideas are welcome. Pseudo-code:
Connection con = getPrimaryDatabaseConnection();
// let me decide whether to commit or rollback
con.setAutoCommit(false);

ExecutorService executorService = getExecutor();
// the connection is passed to each job via its constructor/set-method;
// the jobs use the provided connection to do their DB operations
Callable jobs[] = getJobs(con);
List futures = new ArrayList();
// note: generics are omitted just to keep this simple
for (Callable job : jobs) {
    futures.add(executorService.submit(job));
}
executorService.shutdown();
// wait till the jobs complete
executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);

List results = new ArrayList();
for (Future future : futures) {
    try {
        results.add(future.get());
    } catch (InterruptedException | ExecutionException e) {
        try {
            // a job has failed; roll back the transaction and rethrow
            con.rollback();
            results = null;
            throw new SomeException();
        } catch (Exception e2) {
            // exception
        } finally {
            try {
                con.close();
            } catch (Exception e2) { /* nothing to do */ }
        }
    }
}
// all the jobs completed successfully!
try {
    // some other checks
    con.commit();
    return results;
} finally {
    try {
        con.close();
    } catch (Exception e) { /* nothing to do */ }
}
I wouldn't recommend sharing a connection between threads: operations on a connection are quite slow, and the overall performance of your application may suffer.
I would rather suggest using an Apache connection pool and giving each thread its own connection.
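For the record, the borrow/return contract such a pool enforces can be sketched with a plain BlockingQueue. This is only an illustration (the SimplePool name and the generic resource type are made up here); real code should use a battle-tested pool such as Apache Commons DBCP:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative pool: each worker borrows a resource (in the real case a
// java.sql.Connection), uses it, and must return it in a finally block.
class SimplePool<T> {
    private final BlockingQueue<T> idle = new LinkedBlockingQueue<>();

    SimplePool(Iterable<T> resources) {
        for (T r : resources) idle.add(r);
    }

    // Blocks up to the timeout, mirroring the pool's
    // "waiting for connection" behaviour.
    T borrow(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
        T r = idle.poll(timeout, unit);
        if (r == null) throw new TimeoutException("no free resource");
        return r;
    }

    void release(T resource) {
        idle.add(resource);
    }
}
```

Each worker thread would borrow a connection at the start of its unit of work and release it in a finally block, so nothing is ever shared between threads.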
You could create a proxy class that holds the JDBC connection and gives synchronized access to it. The threads should never access the connection directly. Depending on the usage and the operations you provide, you could use synchronized methods, or lock on objects if the proxy needs to stay locked until it leaves a certain state.
For those not familiar with the proxy design pattern: here is the wiki article. The basic idea is that the proxy instance hides another object but offers the same functionality.
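A minimal sketch of such a proxy, assuming the workers hand their work in as callbacks (ConnectionProxy and SqlWork are illustrative names made up here, not any library API):

```java
import java.sql.Connection;
import java.sql.SQLException;

// Illustrative proxy: all access to the one shared Connection funnels
// through synchronized methods, so worker threads never touch it directly.
class ConnectionProxy {
    private final Connection target;

    ConnectionProxy(Connection target) {
        this.target = target;
    }

    // Work is handed in as a lambda; the proxy serializes execution.
    synchronized <T> T execute(SqlWork<T> work) throws SQLException {
        return work.apply(target);
    }

    synchronized void commit() throws SQLException {
        target.commit();
    }

    synchronized void rollback() throws SQLException {
        target.rollback();
    }

    @FunctionalInterface
    interface SqlWork<T> {
        T apply(Connection con) throws SQLException;
    }
}
```

The cost of this design is that all database work is serialized on one connection, which is exactly why the other answers suggest a pool instead.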
In this case, consider creating a separate connection for each worker. If any one worker fails, roll back all the connections; if all succeed, commit all the connections.
If you're going to have hundreds of workers, then you'll need to provide synchronized access to the Connection objects, or use a connection pool as @mike and @NKukhar suggested.
I'm writing a stock-quote processing engine and receiving async notifications from a PostgreSQL database with the pgjdbc-ng-0.6 library. I'd like to know whether my database connection is still alive, so I wrote this in the thread's run() method:
while (this.running) {
    try {
        this.running = pgConnection.isValid(Database.CONNECTION_TIMEOUT);
        Thread.sleep(10000);
    } catch (SQLException e) {
        log.warn(e.getMessage(), e);
        gracefullShutdown();
    } catch (InterruptedException e) {
        gracefullShutdown();
    }
}
I read the isValid() declaration, and it states that the function will return false if the timeout is reached. However, isValid() just hangs indefinitely and run() is never exited. To create connectivity issues, I disable the VPN my application uses to connect to the database. A rather rude way, but the function must still return false... Is this a bug in the driver, or does another method exist?
I tried setNetworkTimeout() on the PGDataSource, but without any success.
One possible way to handle this problem for enterprise developers is to use an existing thread pool to submit a Callable whose call() runs a test query against the DB (in the case of Oracle it may be "SELECT 'OK' FROM dual"), using statement.execute(...) (not executeUpdate(), because that may cause the call to get stuck), and then bounding the wait with, for example, future.get(3, TimeUnit.SECONDS).
With this approach, you need to catch the exceptions thrown by the get() call (TimeoutException, InterruptedException, ExecutionException). If the call throws, you mark the database as unavailable; if not, as available.
Before using this approach inside an enterprise application, make sure you have access to the application server's thread pool executor in some object and @Autowire it there.
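A sketch of that idea, with the probe injected as a Callable so nothing here depends on a real database (DbHealthCheck and the parameter names are made up for illustration; in real code the probe would run the test query on a Statement):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class DbHealthCheck {
    // Returns true only if the probe completes successfully inside the
    // timeout. A probe that hangs (like isValid() over a dead VPN) is
    // simply abandoned after the deadline instead of blocking the caller.
    static boolean isAlive(ExecutorService pool, Callable<Boolean> probe, long timeoutSec) {
        Future<Boolean> f = pool.submit(probe);
        try {
            return f.get(timeoutSec, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            f.cancel(true); // don't leave the stuck probe running
            return false;
        } catch (Exception e) {
            return false;   // InterruptedException / ExecutionException
        }
    }
}
```

The key point is that the wall-clock deadline lives in future.get(), so it works even when the driver call underneath never returns.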
I am having difficulty trying to correctly program my application in the way I want it to behave.
Currently, my application (as a Java Servlet) will query the database for a list of items to process. For every item in the list, it will submit an HTTP Post request. I am trying to create a way where I can stop this processing (and even terminate the HTTP Post request in progress) if the user requests. There can be simultaneous threads that are separately processing different queries. Right now, I will stop processing in all threads.
My current attempt involves implementing the database query and HTTP Post in a Callable class. Then I submit the Callable class via the Executor Service to get a Future object.
However, in order to properly stop the processing, I need to abort the HTTP Post and close the database's Connection, Statement, and ResultSet, because Future.cancel() will not do this for me. How can I do this when I call cancel() on the Future object? Do I have to store a list of tuples containing the Future object, HttpPost, Connection, Statement, and ResultSet? This seems like overkill; surely there must be a better way?
Here is some code I have right now that only aborts the HttpPost (and not any database objects).
private static final ExecutorService pool = Executors.newFixedThreadPool(10);

public static Future<HttpClient> upload(final String url) {
    CallableTask ctask = new CallableTask();
    ctask.setFile(largeFile);
    ctask.setUrl(url);
    Future<HttpClient> f = pool.submit(ctask); // creates an HttpPost that posts 'largeFile' to the 'url'
    linklist.add(new tuple<Future<HttpClient>, HttpPost>(f, ctask.getPost())); // store the objects for later cancellation
    return f;
}

// This method cancels all running Future tasks and aborts any POSTs in progress
public static void cancelAll() {
    System.out.println("Checking status...");
    for (tuple<Future<HttpClient>, HttpPost> t : linklist) {
        Future<HttpClient> f = t.getFuture();
        HttpPost post = t.getPost();
        if (f.isDone()) {
            System.out.println("Task is done!");
        } else if (f.isCancelled()) {
            System.out.println("Task was cancelled!");
        } else {
            while (!f.isDone()) {
                f.cancel(true);
                try {
                    Thread.sleep(5000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                System.out.println("!Aborting Post!");
                try {
                    post.abort();
                } catch (Exception ex) {
                    System.out.println("Aborted Post, swallowing exception: ");
                    ex.printStackTrace();
                }
            }
        }
    }
}
Is there an easier way or a better design? Right now I terminate all processing threads - in the future, I would like to terminate individual threads.
I think keeping a list of all the resources to be closed is not the best approach. In your current code, the HTTP request is initiated by the CallableTask, but the closing is done by somebody else. Closing a resource is, in my opinion, the responsibility of whoever opened it.
I would let the CallableTask initiate the HTTP request, connect to the database, and do its stuff, and, when it finishes or is aborted, close everything it opened itself. That way you only have to keep track of the Future instances representing your currently running tasks.
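A sketch of a task that owns its resources, with an AtomicBoolean standing in for the HttpPost/JDBC objects so the cleanup guarantee is visible without a real server or database (SelfCleaningTask is an illustrative name):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative task: it opens its own "resource" (a stand-in for the
// HttpPost and the JDBC objects) and closes it in finally, so a
// Future.cancel(true) can never leak it.
class SelfCleaningTask implements Callable<String> {
    final AtomicBoolean resourceOpen = new AtomicBoolean(false);

    @Override
    public String call() throws Exception {
        resourceOpen.set(true);      // stands in for opening Connection/HttpPost
        try {
            doWork();                // throws InterruptedException on cancel(true)
            return "done";
        } finally {
            resourceOpen.set(false); // always released, even when cancelled
        }
    }

    void doWork() throws InterruptedException {
        Thread.sleep(50);            // stands in for the slow HTTP/DB work
    }
}
```

Because the finally block runs inside call(), cancellation via interrupt releases everything without the canceller knowing what the task had open.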
I think your approach is correct, but you would need to handle the rollback yourself when you cancel the thread.
cancel() just calls interrupt() on the already-executing thread. Have a look here:
http://docs.oracle.com/javase/tutorial/essential/concurrency/interrupt.html
As it says:
An interrupt is an indication to a thread that it should stop what it is doing and do something else. It's up to the programmer to decide exactly how a thread responds to an interrupt, but it is very common for the thread to terminate.
An interrupted thread will throw an InterruptedException when it is waiting, sleeping, or otherwise paused for a long time and another thread interrupts it using the interrupt() method in class Thread.
So you need to explicitly code for interruption in the executing thread wherever scenarios such as the one you mentioned can occur.
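For example, a worker that treats an interrupt as "stop and clean up" might look like this sketch, where the rolledBack flag stands in for connection.rollback() and resource cleanup (InterruptibleWorker is an illustrative name):

```java
// Illustrative worker: an interrupt is taken as the signal to stop and
// undo in-flight work; "rolledBack" is a stand-in for connection.rollback().
class InterruptibleWorker implements Runnable {
    volatile boolean rolledBack = false;

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Thread.sleep(10); // stands in for one unit of DB work
            }
        } catch (InterruptedException e) {
            // sleep()/wait() convert a pending interrupt into this exception;
            // restore the flag so callers further up can still see it
            Thread.currentThread().interrupt();
        }
        rolledBack = true; // stand-in for rollback + closing resources
    }
}
```

Both exit paths (the flag check and the exception) converge on the same cleanup, which is the pattern the Oracle tutorial recommends.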
My team has to make some changes and renew an old web application. This application has one main thread and 5 to 15 daemon threads used as workers to retrieve and insert data in a DB.
All those threads have this design (here simplified for convenience):
public class MyDaemon implements Runnable {
    // initialization and some other stuff
    public void run() {
        ...
        while (isEnabled) {
            Engine.doTask1();
            Engine.doTask2();
            ...
            Thread.sleep(someTime);
        }
    }
}
The Engine class provides a series of static methods that invoke methods of the DataAccessor classes, some of which are themselves static:
public class Engine {
    public static void doTask1() {
        ThisDataAccessor.retrieve(data);
        // some complicated operations
        ThisDataAccessor.insertOrUpdate(data);
    }
    public static void doTask2() {
        ThatDataAccessor da = new ThatDataAccessor();
        da.retrieve(data);
        // etc.
    }
    ...
}
DataAccessor classes usually interact with DB using simple JDBC statements enclosed in synchronized methods (static for some classes). DataSource is configured in the server.
public class ThatDataAccessor {
    public synchronized void retrieve(DataType data) {
        Connection conn = DataSource.getConnection();
        // JDBC stuff
        conn.close();
    }
    ...
}
The problem is that the main thread needs to connect to the DB, and when these daemon threads are working we easily run out of available connections from the pool, getting "waiting for connection timeout" exceptions. In addition, sometimes even the daemon threads themselves get the same exception.
We have to get rid of this problem.
We have a connection pool configured with 20 connections, and no more can be added, since 20 is our production environment's standard. Some blocks of code need to be synchronized, even though we plan to keep the "synchronized" keyword only where really needed. But I don't think that alone would really make a difference.
We are not experienced in multithreaded programming, and we've never faced this connection-pooling problem before; that's why I'm asking: is the problem due to the design of those threads? Is there any flaw we haven't noticed?
I have profiled the thread classes one by one, and as long as they are not running in parallel there seems to be no bottleneck that would justify those "waiting for connection timeout" errors.
The app is running on WebSphere 7, using Oracle 11g.
You are likely missing a finally block somewhere that returns the connections to the pool. With Hibernate, I think this is done when you call close(), or possibly, for transactions, when you call rollback(). But I would call close() anyway.
For example, I wrote a quick-and-dirty pool myself to extend an old app to make it multithreaded, and here is some of the handling code (which should be meaningless to you except for the finally block):
try {
    connection = pool.getInstance();
    connection.beginTransaction();
    processFile(connection, ...);
    connection.endTransaction();
    logger_multiThreaded.info("Done processing file: " + ... );
} catch (IOException e) {
    logger_multiThreaded.severe("Failed to process file: " + ... );
    e.printStackTrace();
} finally {
    if (connection != null) {
        pool.releaseInstance(connection);
    }
}
It is fairly common for people to fail to use finally blocks properly. For example, look at this Hibernate tutorial and skip to the very bottom example. You will see that in the try{} he uses tx.commit() and in the catch{} he uses tx.rollback(), but he has no session.close() and no finally. Even if he added session.close() in the try and in the catch, his connection would still not be closed if the try block threw something other than a RuntimeException, or if the catch caused an additional exception before the rollback(). Without session.close(), I don't think that is actually very good code. Even when such code seemingly works, a finally gives you assurance that you are protected from this type of problem.
So I would rewrite his methods that use Session to match the idiom shown on this Hibernate documentation page. (I also don't recommend his throwing a RuntimeException, but that is a different topic.)
So if you are using Hibernate, I think the above is good enough. Otherwise you'll need to be more specific if you want specific code help; but the simple idea that you should use a finally block to ensure the connection is closed is enough.
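On Java 7+, try-with-resources expresses the same guarantee more compactly than a hand-written finally. A self-contained sketch with a fake connection, so it needs no real database (ResourceDemo and FakeConnection are made-up names for illustration):

```java
// Illustrative: try-with-resources guarantees close() even when the body
// throws, which is exactly what the hand-written finally blocks ensure.
class ResourceDemo {
    static class FakeConnection implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;       // stands in for returning to the pool
        }

        void query() {
            throw new RuntimeException("query failed"); // simulated failure
        }
    }

    // Runs a failing "query" and reports whether the connection
    // was still closed despite the exception.
    static boolean runAndCheck(FakeConnection con) {
        try (FakeConnection c = con) {
            c.query();           // throws
        } catch (RuntimeException expected) {
            // handle / log; close() has already run by this point
        }
        return con.closed;
    }
}
```

Any real Connection, Statement, or ResultSet implements AutoCloseable, so the same shape applies to actual JDBC code.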
I have tested a socket-connection program designed so that the socket connection runs as one separate thread which enqueues messages, and another separate thread (the DB processor) picks them off the queue and runs through a number of SQL statements. What I notice is that the bottleneck is the DB processing. I would like some feedback: is this the right architecture, or should I change or improve the design flow?
The requirement is to capture data via socket connections, run it through a DB process, and then store it accordingly.
public class cServer {

    private LinkedBlockingQueue<String> databaseQueue = new LinkedBlockingQueue<String>();

    class ConnectionHandler implements Runnable {
        private Socket receivedSocketConn1;

        ConnectionHandler(Socket receivedSocketConn1) {
            this.receivedSocketConn1 = receivedSocketConn1;
        }

        // gets data from an inbound connection and queues it for database update
        public void run() {
            databaseQueue.add(message); // put to db queue
        }
    }

    class DatabaseProcessor implements Runnable {

        public void run() {
            // open database connection
            createConnection();
            while (true) {
                try {
                    // keep taking messages added to the queue by ConnectionHandler;
                    // a number of selects, inserts and updates are then run for each
                    message = databaseQueue.take();
                } catch (InterruptedException e) {
                    return;
                }
            }
        }

        void createConnection() {
            System.out.println("Create Connection");
            connCreated = new Date();
            try {
                dbconn = DriverManager.getConnection(
                        "jdbc:mysql://localhost:3306/test1?" + "user=user1&password=*******");
                dbconn.setAutoCommit(false);
            } catch (Throwable ex) {
                ex.printStackTrace(System.out);
            }
        }
    }

    public static void main(String[] args) {
        cServer server = new cServer();
        new Thread(server.new DatabaseProcessor()).start(); // starts the DatabaseProcessor
        try {
            final ServerSocket serverSocketConn = new ServerSocket(8000);
            while (true) {
                try {
                    Socket socketConn1 = serverSocketConn.accept();
                    new Thread(server.new ConnectionHandler(socketConn1)).start();
                } catch (Exception e) {
                    e.printStackTrace(System.out);
                }
            }
        } catch (Exception e) {
            e.printStackTrace(System.out);
        }
    }
}
It's hard (read: impossible) to judge an architecture without the requirements, so I will just make some up:
Maximum throughput:
Don't use a database; write to a flat file, possibly stored on something fast like a solid-state disk.
Guaranteed persistence (if the user gets an answer that isn't an error, the data must be stored securely):
Make the whole thing single-threaded and save everything in a database with redundant disks. Make sure you have a competent DBA who knows about backup and recovery, and test those at regular intervals.
Minimum time for finishing the user request:
Your approach seems reasonable.
Minimum time for finishing the user request + maximum throughput + good persistence (whatever that means):
Your approach seems good. You might plan for multiple threads processing the DB requests, but test how much (more) throughput you actually get and where precisely the bottleneck is (network, DB CPU, IO, lock contention...). Make sure you don't introduce bugs by using a concurrent approach.
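The "multiple threads draining one queue" idea can be sketched like this; in real code each consumer would own its own (pooled) DB connection, and the counter here just stands in for the SQL work (QueueFanOut is an illustrative name):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative fan-out: several consumer threads drain one shared queue.
// BlockingQueue is thread-safe, so no extra locking is needed around it.
class QueueFanOut {
    static int process(int items, int consumers) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < items; i++) queue.add(i);

        AtomicInteger done = new AtomicInteger();
        Thread[] threads = new Thread[consumers];
        for (int c = 0; c < consumers; c++) {
            threads[c] = new Thread(() -> {
                Integer item;
                // poll() returns null once the pre-filled queue is empty
                while ((item = queue.poll()) != null) {
                    done.incrementAndGet(); // stands in for the SQL work
                }
            });
            threads[c].start();
        }
        for (Thread t : threads) t.join();
        return done.get();
    }
}
```

Whether more consumers actually help depends on where the bottleneck is, which is why measuring first, as suggested above, matters.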
Generally, your architecture sounds correct. You need to make sure that your two threads are synchronised correctly when reading/writing from/to the queue.
I am not sure what you mean by the DB processing being the "bottleneck"? If DB processing takes a long time and you end up with a long queue, there's not much you can do apart from having multiple threads perform the DB processing (assuming the processing can be parallelised, of course) or doing some performance tuning in the DB thread.
If you post some specific code that you believe is causing the problem, we can have another look.
You don't need two threads for this simple task. Just read the socket and execute the statements.
I'm still an undergrad just working part time and so I'm always trying to be aware of better ways to do things. Recently I had to write a program for work where the main thread of the program would spawn "task" threads (for each db "task" record) which would perform some operations and then update the record to say that it has finished. Therefore I needed a database connection object and PreparedStatement objects in or available to the ThreadedTask objects.
This is roughly what I ended up writing. Is creating a PreparedStatement object per thread a waste? I thought static PreparedStatements could create race conditions...
Thread A: stmt.setInt();
Thread B: stmt.setInt();
Thread A: stmt.execute();
Thread B: stmt.execute();
A's parameter values never get executed...
Is this thread safe? Is creating and destroying PreparedStatement objects that are always the same not a huge waste?
public class ThreadedTask implements Runnable {
    private final PreparedStatement taskCompleteStmt;

    public ThreadedTask() {
        //...
        taskCompleteStmt = Main.db.prepareStatement(...);
    }

    public void run() {
        //...
        taskCompleteStmt.executeUpdate();
    }
}

public class Main {
    public static final Connection db = DriverManager.getConnection(...);
}
I believe it is not a good idea to share database connections (and prepared statements) between threads. JDBC does not require connections to be thread-safe, and I would expect most drivers to not be.
Give every thread its own connection (or synchronize on the connection for every query, but that probably defeats the purpose of having multiple threads).
Is creating and destroying PreparedStatement objects that are always the same not a huge waste?
Not really. Most of the work happens on the server, and will be cached and re-used there if you use the same SQL statement. Some JDBC drivers also support statement caching, so that even the client-side statement handle can be re-used.
You could see substantial improvement by using batched queries instead of (or in addition to) multiple threads, though. Prepare the query once, and run it for a lot of data in a single big batch.
Thread safety is not the issue here. All looks syntactically and functionally fine, and it should work for about half an hour. Leaking of resources is, however, the real issue. The application will crash after about half an hour because you never close them after use. The database will, in turn, sooner or later close the connection itself so that it can claim it back.
That said, you don't need to worry about caching of prepared statements. The JDBC driver and the DB will take care of that. Rather, worry about resource leaking and make your JDBC code as solid as possible.
public class ThreadedTask implements Runnable {
    public void run() {
        Connection connection = null;
        PreparedStatement statement = null;
        try {
            connection = DriverManager.getConnection(url);
            statement = connection.prepareStatement(sql);
            // ...
        } catch (SQLException e) {
            // Handle?
        } finally {
            if (statement != null) try { statement.close(); } catch (SQLException logOrIgnore) {}
            if (connection != null) try { connection.close(); } catch (SQLException logOrIgnore) {}
        }
    }
}
To improve connecting performance, make use of a connection pool like c3p0. (This, by the way, does not mean you can change the way you write the JDBC code; always acquire and close the resources in the shortest possible scope, in a try-finally block.)
You're best to use a connection pool and get each thread to request a connection from the pool. Create your statements on the connection you're handed, remembering to close it and so release it back to the pool when you're done. The benefit of using the pool is that you can easily increase the number of available connections should you find that thread concurrency is becoming an issue.