Best practice to process multiple rows from a database in different threads - Java

I would like to ask about the best way to do the following. Currently I have many rows being inserted into a database with some status like 'NEW'.
One thread (ThreadA) reads 20 rows from the table with the query select * from TABLE where status = 'NEW' order by some_date asc and puts the data into a queue. It only reads data when the number of elements in the queue is less than 20.
Another thread (ThreadB) reads data from the queue and processes it; during processing it changes the status of the row to something like 'IN PROGRESS'.
My fear is that while ThreadB is processing one row but has not yet updated its status, the number of elements in the queue may drop below 20, so ThreadA will fetch another 20 rows and put them into the queue, and the queue may end up containing duplicates.
The row might come back with a status still 'NEW'. I thought I could mark the rows with some flag (something like fetched) when reading them, and clear the flag after processing.
I feel like I am missing something, so I would like to ask if there is a best practice for handling tasks like this.
PS: The number of threads that read the data might be increased in the future; this is what I am trying to keep in mind.

Right, since no-one seems to be picking this one up, I'll continue here what was started in the comments:
There are lots of solutions to this one. In your case, with just one processing thread, you might for example want to store just the record ids in the queue. Then ThreadB can fetch the row itself to make sure the status is indeed NEW. Or use optimistic locking with update table set status='IN_PROGRESS' where id=rowId and status='NEW' and quit processing the row when the update affects no rows (or throws an optimistic-locking exception).
Optimistic locking is fun, and you could also use it to get rid of the producer thread altogether. Imagine a few threads processing records from the database. Each could pick up a record and try to set its status with optimistic locking, as in the first example. It is quite possible to get a lot of contention for records this way, so each thread could instead fetch N rows, where N is the number of threads (or twice that many), and then process the first row for which it succeeded in setting IN_PROGRESS. This makes for a less complicated system, and one less thing to take care of / synchronize with.
And you can have the threads pick up not only records that are NEW, but also those that are IN_PROGRESS with started_date < sysdate - timeout; that would include records that were not processed because of a system failure (for example, a thread managed to set one row to IN_PROGRESS and then your system went down). So you get some resilience here as well.
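A minimal JDBC sketch of the claim-by-UPDATE idea above, including the timeout-based recovery; the tasks table and its columns are assumptions for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;

public class TaskClaimer {

    // Tries to claim one row via optimistic locking: the UPDATE only succeeds
    // if the row is still NEW, or its IN_PROGRESS marker has timed out.
    // Returns true when this thread now owns the row.
    static boolean tryClaim(Connection con, long rowId, long timeoutMillis) throws Exception {
        String sql = "UPDATE tasks SET status = 'IN_PROGRESS', started_date = ? "
                   + "WHERE id = ? AND (status = 'NEW' "
                   + "   OR (status = 'IN_PROGRESS' AND started_date < ?))";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            long now = System.currentTimeMillis();
            ps.setTimestamp(1, new Timestamp(now));
            ps.setLong(2, rowId);
            ps.setTimestamp(3, new Timestamp(now - timeoutMillis));
            return ps.executeUpdate() == 1; // 0 rows: someone else claimed it
        }
    }
}
```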

Related

Transactions are failing as table is Locked in ORACLE

I have the below scenario in a legacy codebase:
A 'Team' table holds information about a team and a counter. It has columns named 'TEAM_NAME' and 'COUNTER'.
The below 3-step operation is executed in a transaction:
Take an exclusive LOCK on the table.
Read the counter corresponding to the team.
Use that counter, increment its value and save it back to the TEAM table.
Once these steps are performed, commit the complete operation.
Because of the exclusive LOCK taken on the table in the first step, other concurrent transactions are failing. I want to perform this without losing transactions in the system.
I think that removing the LOCK statement and making my method synchronized could work, but I have 4 JVMs in production, so concurrent transactions could still hit this.
Please suggest a better design to handle this.
You should almost never need to do a manual LOCK in Oracle. If you're doing that, you should probably rethink what you are doing. What you should probably be doing is:
Do SELECT ... FOR UPDATE on the row of your table corresponding to the team. This will lock only that row, not the entire table, so concurrent sessions working on different teams will be free to continue.
Do whatever you need to do.
Run an UPDATE ... to update the counter.
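A rough JDBC sketch of the SELECT ... FOR UPDATE approach, using the TEAM_NAME and COUNTER columns from the question (everything else is illustrative, and error handling is trimmed):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class TeamCounterDao {

    // Locks only the team's row, reads the counter, increments it,
    // and writes it back; other teams' rows stay untouched.
    static int nextCounter(Connection con, String teamName) throws Exception {
        con.setAutoCommit(false);
        try (PreparedStatement sel = con.prepareStatement(
                "SELECT counter FROM team WHERE team_name = ? FOR UPDATE")) {
            sel.setString(1, teamName);
            try (ResultSet rs = sel.executeQuery()) {
                if (!rs.next()) {
                    throw new IllegalStateException("No such team: " + teamName);
                }
                int next = rs.getInt(1) + 1;
                try (PreparedStatement upd = con.prepareStatement(
                        "UPDATE team SET counter = ? WHERE team_name = ?")) {
                    upd.setInt(1, next);
                    upd.setString(2, teamName);
                    upd.executeUpdate();
                }
                con.commit(); // releases the row lock
                return next;
            }
        }
    }
}
```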
An even simpler way would be:
Do UPDATE ... RETURNING my_counter INTO ... which will return the updated value of the counter.
Do what you need to do, keeping in mind you have the incremented counter value.
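And a sketch of the UPDATE ... RETURNING variant; wrapping it in an anonymous PL/SQL block lets plain JDBC bind the OUT parameter (table and column names again follow the question):

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.Types;

public class TeamCounterReturningDao {

    // Single round trip: increment and read back the new value in one statement.
    // The UPDATE itself takes the row lock, so increments serialize per team.
    static int nextCounter(Connection con, String teamName) throws Exception {
        String plsql = "BEGIN UPDATE team SET counter = counter + 1 "
                     + "WHERE team_name = ? RETURNING counter INTO ?; END;";
        try (CallableStatement cs = con.prepareCall(plsql)) {
            cs.setString(1, teamName);
            cs.registerOutParameter(2, Types.INTEGER);
            cs.execute();
            return cs.getInt(2);
        }
    }
}
```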

Multiple threads update the same row in a database at a time - how to maintain consistency

In my Java application multiple threads update the same row at the same time. How do I get consistent results?
For example, with a current row value of count = 0, the desired behaviour is:
thread 1 updates it to count + 1 = 1
thread 2, updating at the same time, produces count + 1 = 2
But it should not happen like this:
thread 1 updates it to count + 1 = 1
thread 2, updating at the same time, produces count + 1 = 1
Both threads must not pick up the same value just because they run at the same time. How can we achieve this in JDBC, Hibernate, or the database?
There are two possible ways to go.
Either you choose a pessimistic approach and lock rows, tables or even ranges of rows.
Or you work with versioned Entities (Optimistic Locking).
Maybe you will find more information here:
https://docs.jboss.org/hibernate/orm/3.3/reference/en/html/transactions.html
Incrementing a counter in this way is hard to manage concurrently. You really need to use pessimistic locking to solve this particular problem.
SELECT 1 FROM mytable WHERE x IS TRUE FOR UPDATE
This will force each thread to wait until the previous one commits before it reads the counter.
This is necessary because you have two potential issues: the first is the read race and the second is the write lock. The write lock is taken automatically in most RDBMSs, but unless you take it explicitly before you read, the counter will be incremented only once by both threads together (because both read the original value before the update).
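A sketch of that pattern over JDBC, assuming a hypothetical counters table with id and cnt columns; taking the row lock before the read is the crucial part:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SafeIncrement {

    // Takes the row lock *before* reading, so a second thread blocks until
    // the first commits and then sees the already-incremented value.
    static int increment(Connection con, long id) throws Exception {
        con.setAutoCommit(false);
        try (PreparedStatement sel = con.prepareStatement(
                 "SELECT cnt FROM counters WHERE id = ? FOR UPDATE");
             PreparedStatement upd = con.prepareStatement(
                 "UPDATE counters SET cnt = ? WHERE id = ?")) {
            sel.setLong(1, id);
            int next;
            try (ResultSet rs = sel.executeQuery()) {
                rs.next();
                next = rs.getInt(1) + 1;
            }
            upd.setInt(1, next);
            upd.setLong(2, id);
            upd.executeUpdate();
            con.commit(); // releases the lock; the next thread reads the new value
            return next;
        } catch (Exception e) {
            con.rollback();
            throw e;
        }
    }
}
```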
If you need parallel writes, then you need to insert rows into a log table and materialize the aggregate later. That is a more complex design pattern, though.
Your question is not 100% clear, but I guess you're looking for the different locking strategies: http://docs.jboss.org/hibernate/orm/5.2/userguide/html_single/Hibernate_User_Guide.html#locking
If you're working on a DB that has sequence generators (Oracle, PostgreSQL, ...) you should consider using those. Assuming you always increment by the same value, and it's not the case that one thread increments by one and another by two, that should be a good solution.
Here is a detailed answer to this question:
How to properly handle two threads updating the same row in a database
To summarize:
The biggest question is: are the two threads trying to persist the same data? To summarize the content of the linked answer, let's name the two threads T1 and T2. There are a couple of approaches:
Approach 1: last update wins. This more or less avoids optimistic locking (the version counting). If you have no dependency from T1 to T2, or the reverse, in order to set the status PARSED, this should be fine.
Approach 2: optimistic locking. This is what you have now. The solution is to refresh the data and restart your operation.
Approach 3: row-level DB lock. The solution here is more or less the same as for approach 2, with the small correction that a pessimistic lock is held for the duration of the operation. The main difference is that in this case it may be a READ lock, and you might not even be able to read the data from the database in order to refresh it if it is PESSIMISTIC_READ.
Approach 4: application-level synchronization. There are many different ways to synchronize. One example would be to arrange all your updates in a BlockingQueue or a JMS queue (if you want it to be persistent) and push all updates from a single thread. To visualize it a bit: T1 and T2 put elements on the queue, and a single thread T3 reads operations from it and pushes them to the database server.
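A minimal sketch of approach 4: T1 and T2 call submit(), and only the single writer thread T3 ever touches the database. The UpdateOp shape and applyToDatabase are placeholders for your actual update:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SingleWriter {

    // Hypothetical representation of one pending row update.
    record UpdateOp(long rowId, int newValue) {}

    private final BlockingQueue<UpdateOp> queue = new ArrayBlockingQueue<>(1000);

    // Called by T1, T2, ...: enqueue instead of touching the DB directly.
    public void submit(UpdateOp op) throws InterruptedException {
        queue.put(op); // blocks if the queue is full
    }

    // T3: the only thread that writes to the database, so row updates
    // are applied strictly one at a time and cannot race each other.
    public void startWriterThread() {
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    applyToDatabase(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "db-writer");
        writer.setDaemon(true);
        writer.start();
    }

    private void applyToDatabase(UpdateOp op) {
        // placeholder: run the UPDATE via JDBC/Hibernate here
    }
}
```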

Getting usernames from database that are not being used by a thread

I have a multi-threaded Java program where each thread gets one username for some processing, which takes about 10 minutes or so.
Right now it gets the usernames via a SQL query that returns one username randomly, and the problem is that the same username can be given to more than one thread at a time.
I don't want a username that is being processed by one thread to be fetched again by another thread. What is a simple and easy way to achieve this goal?
Step-by-step solution:
Create a threads table where you store the threads' state. Among other columns, you need to store the owning user's id there as well.
When a thread becomes associated with a user, create a record storing the owner, along with all the other juicy stuff.
When a thread is no longer associated with a user, set its owner to null.
When a thread finishes its job, remove its record.
When you pick a random user for a thread, filter out all the users who are already associated with at least one thread. This way you know that any user left at the end of the randomization is threadless.
Make sure everything stays in place: if, while working on the feature, some thread records were created that should have been removed or stripped of their owner, then do so.
There are a lot of ways to do this... I can think of three solutions to this problem:
1) A singleton class holding an array of all the usernames already in use. Be sure that access to the array is synchronized and that you remove unused usernames from it.
2) A flag in the user table that contains a unique id referencing the thread that is using it. You then have to manage when the flag is removed from the table.
3) As an alternative, why not check whether a pool of connections shared by all the threads could solve your problem?
You could do one batch query that returns all of the usernames you want from the database and store them in a List (or some other collection type).
Then ensure synchronised access to this list to prevent two threads from taking the same username at the same time: use a synchronised list, or a synchronised method through which each thread takes a username and removes it from the list.
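A sketch of that idea; the usernames from the one batch query are handed out through a thread-safe queue, so no two threads can ever take the same one (the batch query itself is left out):

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class UsernameDispenser {

    private final Queue<String> available = new ConcurrentLinkedQueue<>();

    public UsernameDispenser(List<String> allUsernames) {
        available.addAll(allUsernames); // result of the one batch query
    }

    // poll() is atomic, so two threads can never receive the same username.
    // Returns null when no usernames are left.
    public String take() {
        return available.poll();
    }

    // When a thread finishes with a username, it can be handed back.
    public void release(String username) {
        available.offer(username);
    }
}
```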
One way to do it is to add another column to your users table. This column is a simple flag that shows whether a user has an assigned thread or not.
When you query the db, you have to wrap the queries in a transaction:
you begin the transaction, first select a user that doesn't have a thread, then update the flag column, and then commit or roll back.
Since the queries are wrapped in a transaction, the db handles all the issues that arise in scenarios like this.
With this solution there is no need to implement synchronization mechanisms in your code, since the database does it for you.
If you still have problems after doing this, I think you have to configure the isolation levels of your db server.
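A rough JDBC version of that transaction; the users table and its username/claimed_by columns are assumptions, and LIMIT 1 ... FOR UPDATE is MySQL/PostgreSQL-style syntax:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class UserClaimDao {

    // Picks one free user inside a transaction, flags it, and commits.
    // Returns the claimed username, or null when every user is taken.
    static String claimUser(Connection con, String threadId) throws Exception {
        con.setAutoCommit(false);
        try {
            String username = null;
            try (PreparedStatement sel = con.prepareStatement(
                    "SELECT username FROM users "
                  + "WHERE claimed_by IS NULL LIMIT 1 FOR UPDATE");
                 ResultSet rs = sel.executeQuery()) {
                if (rs.next()) username = rs.getString(1);
            }
            if (username != null) {
                try (PreparedStatement upd = con.prepareStatement(
                        "UPDATE users SET claimed_by = ? WHERE username = ?")) {
                    upd.setString(1, threadId);
                    upd.setString(2, username);
                    upd.executeUpdate();
                }
            }
            con.commit();
            return username;
        } catch (Exception e) {
            con.rollback();
            throw e;
        }
    }
}
```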
You appear to want a work queue system. Don't reinvent the wheel: use a well-established existing work queue.
Robust, reliable concurrent work queueing is unfortunately tricky with relational databases. Most "solutions" end up:
Failing to cope with work items not being completed due to a worker restart or crash;
Serializing all work on a lock, so all but one worker are just waiting; and/or
Allowing a work item to be processed more than once.
PostgreSQL 9.5's new FOR UPDATE SKIP LOCKED feature will make it easier to do what you want in the database. For now, use a canned reliable task/work/message queue engine.
If you must do this yourself, you'll want to have a table of active work items where you record the active process ID / thread ID of the worker that's processing a row. You will need a cleanup process that runs periodically, on thread crash, and on program startup that removes entries for failed jobs (where the worker process no longer exists) so they can be re-tried.
Note that unless the work the workers do is committed to the database in the same transaction that marks the work-queue item as done, you will have timing issues where the work can be completed but the DB entry for it isn't marked as done, leading to work being repeated. Absolutely preventing that requires that you commit the work to the DB in the same transaction as the change that marks the work as done, or that you use two-phase commit and an external transaction manager.
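For reference, a sketch of the FOR UPDATE SKIP LOCKED pattern mentioned above (PostgreSQL 9.5+). The work_queue table is an assumption, and note that the work and the done flag are committed in the same transaction, as recommended:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class WorkQueueWorker {

    // Claims one pending item, skipping rows other workers have locked,
    // processes it, and marks it done in the SAME transaction.
    static boolean processOne(Connection con) throws Exception {
        con.setAutoCommit(false);
        try (PreparedStatement claim = con.prepareStatement(
                 "SELECT id, payload FROM work_queue WHERE done = false "
               + "LIMIT 1 FOR UPDATE SKIP LOCKED");
             ResultSet rs = claim.executeQuery()) {
            if (!rs.next()) {              // nothing left, or all rows locked
                con.rollback();
                return false;
            }
            long id = rs.getLong("id");
            doWork(con, rs.getString("payload")); // work happens in this tx too

            try (PreparedStatement done = con.prepareStatement(
                    "UPDATE work_queue SET done = true WHERE id = ?")) {
                done.setLong(1, id);
                done.executeUpdate();
            }
            con.commit(); // work + done flag commit atomically
            return true;
        } catch (Exception e) {
            con.rollback();
            throw e;
        }
    }

    private static void doWork(Connection con, String payload) {
        // placeholder for the actual processing
    }
}
```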

Mysql/JDBC: Deadlock

I have a J2EE server, currently running only one thread (the problem arises even within one single request), that saves its internal model of data to MySQL/InnoDB tables.
The basic idea is to read data from flat files, do a lot of calculation, and then write the result to MySQL; then read another set of flat files for the next day and repeat from step 1. As only a minor part of the rows change, I keep a recordset of already-written rows, compare it to the current result in memory, and then update/insert correspondingly (no deletes, just setting a deletedFlag).
Problem: despite a purely sequential process I get lock timeout errors (#1204), and the InnoDB status dump shows record locks (though I do not know how to work out the details). To complicate things, on my Windows machine everything works, while the production system (where I can't install innotop) shows some record locks.
Now to the critical code:
Read data and calculate (works)
Get a Connection from the Tomcat pool and set autocommit=false
Use a Statement to issue LOCK TABLES order WRITE
Open an updatable Recordset on table order
For each row in the Recordset: if there is a difference, update it from the in-memory object
For objects not yet in the database: insert the data
Commit the Connection, close the Connection
Steps 5/6 have a commit counter so that the rows are committed every 500 changes (to avoid having 50,000 uncommitted rows). In the first run (i.e. without any locks) this takes at most 30 sec per table.
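A sketch of that commit counter, written with plain prepared statements rather than an updatable recordset; all table and column names are illustrative:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

public class OrderSync {

    record OrderRow(long id, int amount, boolean deleted) {}

    // Writes changed rows in batches of 500 so the transaction
    // never accumulates tens of thousands of uncommitted rows.
    static void sync(Connection con, List<OrderRow> changedRows) throws Exception {
        con.setAutoCommit(false);
        int pending = 0;
        try (PreparedStatement upd = con.prepareStatement(
                "UPDATE orders SET amount = ?, deleted_flag = ? WHERE id = ?")) {
            for (OrderRow row : changedRows) {
                upd.setInt(1, row.amount());
                upd.setBoolean(2, row.deleted());
                upd.setLong(3, row.id());
                upd.executeUpdate();
                if (++pending == 500) {   // commit every 500 changes
                    con.commit();
                    pending = 0;
                }
            }
            con.commit(); // flush the remainder
        }
    }
}
```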
As stated above, right now I avoid any other interaction with the database, but in the future other processes (user requests) might read data or even write some fields. I would not mind those processes reading either old or new data, or waiting a couple of minutes to save their changes to the db (that is, waiting for a lock).
I would be happy about any recommendation to do better than that.
Summary: complex code calculates in-memory objects which are to be synchronized with the database. This sync currently seems to lock itself, despite the fact that it sequentially locks, changes and unlocks the tables without any exceptions thrown. But for some reason row locks seem to remain.
Kind regards
Additional information:
MySQL: show processlist lists no active connections (all asleep, or alternatively waiting for table locks on table order), while SHOW ENGINE INNODB STATUS reports a number of row locks (unfortunately I can't tell which transaction is meant, as the output is quite cryptic).
Solved: I had wrongly declared a ResultSet as updatable. The ResultSet was closed only in a finalize() method via the garbage collector, which was not fast enough: before that happened I reopened the ResultSet and therefore tried to acquire a lock on an already locked table.
Yet it was odd that innotop showed another query of mine hanging on a completely different table. Though as it works for me, I do not care about oddities :-)
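The lesson generalizes: close result sets deterministically rather than leaving it to finalize(). A sketch with try-with-resources (query and names are illustrative):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class OrderReader {

    // try-with-resources guarantees the ResultSet and Statement are closed
    // (and their locks released) as soon as this method returns, instead of
    // whenever the garbage collector happens to run finalize().
    static void readOrders(Connection con) throws Exception {
        try (PreparedStatement ps = con.prepareStatement(
                 "SELECT id, amount FROM orders WHERE deleted_flag = 0");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // process the row; no reference to rs escapes this block
            }
        } // closed here, deterministically
    }
}
```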

How to iterate over db records correctly with hibernate

I want to iterate over records in the database and update them. However, since the update both takes some time and is prone to errors, I need to a) not keep the db waiting (as e.g. with a ScrollableResults) and b) commit after each update.
The second thing is that this is done in multiple threads, so I need to ensure that if thread A is taking care of a record, thread B gets a different one.
How can I implement this sensibly with Hibernate?
To give a better idea, the following code would be executed by several threads, where all threads share a single instance of the RecordIterator:
```java
Iterator<Record> iter = db.getRecordIterator();
while (iter.hasNext()) {
    Record rec = iter.next();
    // do something lengthy here
    db.save(rec);
}
```
So my question is how to implement the RecordIterator. If I perform a query on every next(), how do I ensure that I don't return the same record twice? If I don't, which query should I use to return detached objects? Is there a flaw in the general approach (e.g. use one RecordIterator per thread and let the db somehow handle the synchronization)? Additional info: there are way too many records to keep them locally (e.g. in a set of treated records).
Update: because the overall process takes some time, the status of records can change while it runs. Due to that, the ordering of a query's results can change. I guess that to solve this I have to mark records in the database once I return them for processing...
Hmmm, what about pushing your objects from a reader thread into some bounded blocking queue, and letting your updater threads read from that queue?
In your reader, do some paging with setFirstResult/setMaxResults. E.g. if you have 1000 elements maximum in your queue, fill them up 500 at a time. When the queue is full, the next push will automatically wait until the updaters take the next elements.
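A minimal sketch of such a reader, assuming a Record entity and a Hibernate SessionFactory; the HQL, status column and page size are illustrative:

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class RecordReader implements Runnable {

    private final SessionFactory sessionFactory;
    // Bounded: put() blocks when full, which throttles the reader.
    private final BlockingQueue<Record> queue = new ArrayBlockingQueue<>(1000);

    RecordReader(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public BlockingQueue<Record> queue() { return queue; }

    @Override
    public void run() {
        final int pageSize = 500;
        int page = 0;
        while (true) {
            List<Record> batch;
            try (Session session = sessionFactory.openSession()) {
                batch = session.createQuery(
                            "from Record r where r.status = 'NEW'", Record.class)
                        .setFirstResult(page * pageSize)
                        .setMaxResults(pageSize)
                        .list();
            } // session closed: the db is not kept waiting during processing
            if (batch.isEmpty()) break; // nothing left to hand out
            try {
                for (Record rec : batch) {
                    queue.put(rec); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            page++;
        }
    }
}
```

One caveat with fixed offsets: if the updaters change the status used in the where clause, the result set shifts between pages, so rows can be skipped or repeated; paging by primary-key ranges is more robust in that case.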
My suggestion, since you're sharing an instance of the master iterator, would be to run all of your threads inside a shared Hibernate transaction, with one load at the beginning and a big save at the end. You load all of your data into a single Set, which you can iterate over using your threads (be careful of locking, so you might want to split off a section for each thread, or somehow manage the shared resource so that you don't overlap).
The beauty of the Hibernate solution is that the records aren't immediately saved to the database, since you're using a transaction; they are stored in Hibernate's cache, and at the end they are all written back to the database at once. This saves on those expensive database writes you're worried about, plus it gives you an actual object to work with on each iteration, instead of just a database row.
I see in your update that the status of the records may change during processing, and this could always cause a problem. If this is a constantly running or long-running process, then my advice with a Hibernate solution would be to work in smaller sets and, yes, add a flag to mark records that have been updated, so that when you move to the next set you can pick up the ones that haven't been touched.
