BlockingQueue inside a Transaction - java

I am in the process of building a system that has a basic Producer-Consumer flavor to it, but the Producer part needs to run in a transaction.
Here is my exact scenario:
Poller Thread -
[Transaction START]
* Polls the DB for, say, 10 records
* Sets the status in the DB for those records to IN-PROGRESS
* Puts the same 10 records in a LinkedBlockingQueue - workqueue
[Transaction END]
Worker Thread Pool of 5
* Polls the workqueue for tasks, does some lookups, and updates the same records in the DB with the looked-up values.
Now my problem is with part 1 of the process: if for some reason extracting and updating the records in the DB succeeds but inserting one record into the queue fails, I can roll back the whole transaction, and all my records in the DB will be back in the NOT-PROCESSED state. But some elements will already have been inserted into the work queue, and my worker thread pool can pick them up and start processing them, which should not happen.
I am trying to find out whether there is a way to make the writes to the blocking queue transactional.
I am thinking of adding some writeLock()/readLock() mechanism, so that I can stop the worker threads from reading while something is being written to the queue.
Any thoughts on a better approach?
Thanks,

Consider the worst-case scenarios: the unplug case (database connection lost) and the crash case (program out of memory). How would you recover from those?
Some hints:
If you can think of a reason why inserting into the queue would fail (e.g. the queue is full), don't start the transaction. Just skip one poll.
First commit the in-progress transaction, then add all records to the workqueue. Or use one transaction per record, so you can add the records to the workqueue one by one.
Maintain an in-memory HashSet of the IDs of all records being processed. If an ID is in the set but the record is not in-progress, or vice versa, something is very wrong (e.g. the task for a record crashed or did not complete).
Set a timestamp when in-progress was set. Have a separate background process check for records that have been in-progress for too long. Reset the in-progress state if the ID is not in the in-progress HashSet, and your normal process will retry the operation.
Make your tasks idempotent: see if you can find a way for tasks to recognize that the work for a record has already been done. This might be a relatively expensive operation, but it gives you the guarantee that work is only done once in case of retries.
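The commit-first ordering from the second hint can be sketched with an in-memory stand-in for the records table. All the names here (`Poller`, `pollAndEnqueue`, the `db` map, the status strings) are illustrative, not from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class Poller {
    // hypothetical in-memory stand-in for the records table: id -> status
    static final Map<Integer, String> db = new ConcurrentHashMap<>();
    static final BlockingQueue<Integer> workQueue = new LinkedBlockingQueue<>(100);

    // Commit the IN_PROGRESS update first, then enqueue the ids one by one.
    static List<Integer> pollAndEnqueue() throws InterruptedException {
        List<Integer> claimed = new ArrayList<>();
        for (Map.Entry<Integer, String> e : db.entrySet()) {
            if (claimed.size() == 10) break;
            if ("NEW".equals(e.getValue())) {
                e.setValue("IN_PROGRESS"); // in the real system: UPDATE inside the transaction
                claimed.add(e.getKey());
            }
        }
        // [Transaction END] would happen here: the status change is durable
        // before any worker can see the ids. A crash between here and the
        // puts below leaves rows IN_PROGRESS, which the timestamp-based
        // cleanup described above will reset.
        for (Integer id : claimed) {
            workQueue.put(id);
        }
        return claimed;
    }
}
```

The point of the ordering is that a failed `put` can no longer invalidate the DB state; it only delays work, which the cleanup process recovers.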

Related

Need suggestion about java thread pool execution queue processing

In my application we have a number of client databases, and every hour we get new data for processing in those databases.
A cron job checks these databases and picks up the data. It then creates a thread pool and starts executing 30 threads in parallel, with the remaining tasks held in a queue. It takes several hours to process all of them.
So while this execution is running, newly arrived data has to wait, because the cron job will not pick up the new data until the current execution has finished.
Sometimes we have priority data to process, but in this situation those clients also have to wait several hours for their data to be processed.
Please give me a suggestion for avoiding this wait state for newly arrived data.
(I am working with Java 1.7, Tomcat 7 and SQL Server 2012.)
Thank you in advance.
Please let me know if anything is unclear or more information is needed.
Each of your threads should process data in bulk (for example 100/1000 records), and the records should be selected from the DB by priority. Each time you select new records for processing, the data with the highest priority goes first.
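On the executor side, the priority ordering can be sketched with a ThreadPoolExecutor backed by a PriorityBlockingQueue. The `PriorityTask` wrapper below is a made-up illustration, not from the question; note that tasks must be handed in via execute(), because submit() would wrap them in a FutureTask that is not Comparable:

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// A runnable carrying a priority; a lower value runs first
class PriorityTask implements Runnable, Comparable<PriorityTask> {
    final int priority;
    final Runnable work;

    PriorityTask(int priority, Runnable work) {
        this.priority = priority;
        this.work = work;
    }

    @Override public void run() { work.run(); }

    @Override public int compareTo(PriorityTask other) {
        return Integer.compare(priority, other.priority);
    }
}

public class PriorityPool {
    // queued tasks are taken in priority order instead of FIFO
    static ThreadPoolExecutor newPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new PriorityBlockingQueue<>());
    }
}
```

With this, newly arrived high-priority work jumps ahead of queued normal work without needing a second pool or a second Tomcat instance.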
I can't comment yet :(
For this problem we are thinking about two solutions:
1) Create more than one thread pool, for processing normal and high priority data.
2) Create more than one Tomcat instance with the same code, for processing normal and priority data.
But I do not understand which solution is best for my case, 1 or 2. Please give me suggestions about these solutions, so that I can make a decision.
You can use an ExecutorService created with Executors.newCachedThreadPool().
Benefits of using a cached thread pool:
The pool creates new threads if needed but reuses previously constructed threads if they are available.
Only if no threads are available for reuse will a new thread be created and added to the pool.
Threads that have not been used for more than sixty seconds are terminated and removed from the cache. Hence a pool which has not been used long enough will not consume any resources.
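A minimal illustration of a cached pool (the submitted task here is arbitrary):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CachedPoolDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        // threads are created on demand and reused for later submissions
        Future<Integer> f = pool.submit(() -> 21 * 2);
        System.out.println(f.get()); // prints 42
        pool.shutdown();
    }
}
```

One caveat worth noting for this question: a cached pool puts no bound on the number of threads it creates, so for hours-long batch jobs a fixed-size pool (possibly with the priority queue sketched above the other answer) may be the safer fit.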

How to properly handle two threads updating the same row in a database

I have a thread called T1 for reading a flat file and parsing it. I need to create a new thread called T2 for parsing some part of this file, and later this T2 thread will need to update the status of the original entity, which is also being parsed and updated by the original thread T1. How can I handle this situation?
I receive a flat file having the below sample records:
AAAA
BBBB
AACC
BBCC
AADD
BBDD
First this file is saved in the database in Received status. Then all the records starting with BB or with AA need to be processed in a separate thread. Once the file is successfully parsed, both threads will try to update its status in the database to Parsed. In some cases I get a StaleObjectStateException. Edit: and the work done by either thread before the exception is lost. We are using optimistic locking. What is the best way to avoid this problem?
Possible hibernate exceptions when two threads update the same Object?
The above post helps to understand some part of it, but it does not help to resolve my problem.
Part 1 - Your problem
The main reason you are receiving this exception is that you are using Hibernate with optimistic locking. This basically means that either thread T1 or thread T2 has already updated the state to PARSED, and the other thread is now holding an old version of the row, with a smaller version number than the one in the database, while trying to update the state to PARSED as well.
The question here is: "Are the two threads trying to persist the same data?". If the answer is yes, then even if the last update succeeds there shouldn't be any problem, because eventually they are updating the row to the same state. In that case you don't need optimistic locking, because your data will be in sync in any case.
The main problem arises if, after the state is set to RECEIVED, the two threads T1 and T2 actually depend on one another when moving to the next status. In that case you need to ensure that if T1 executed first (or vice versa), T2 refreshes the data for the updated row and re-applies its changes on top of the changes already pushed by T1. The solution in this case is the following: if you encounter a StaleObjectStateException, refresh your data from the database and restart your operation.
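The refresh-and-retry loop can be sketched generically. To keep the sketch self-contained, `IllegalStateException` stands in for Hibernate's `StaleObjectStateException`, and the class and method names are made up:

```java
import java.util.function.Supplier;

public class OptimisticRetry {
    // Runs `update`; on a stale-state failure, runs `refresh` (re-read the
    // row from the database) and tries again, up to maxAttempts times.
    static <T> T withRetry(int maxAttempts, Runnable refresh, Supplier<T> update) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return update.get();
            } catch (IllegalStateException stale) { // stand-in for StaleObjectStateException
                last = stale;
                refresh.run();
            }
        }
        throw last; // give up after maxAttempts stale failures
    }
}
```

The retry only makes sense when re-applying the update on top of the refreshed row is valid, which is exactly the dependency condition discussed above.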
Part 2 - Analysis of the linked post: Possible hibernate exceptions when two threads update the same Object?
Approach 1: this is more or less the "last update wins" situation. It more or less avoids optimistic locking (the version counting). If you don't have a dependency from T1 to T2, or the reverse, in order to set the status to PARSED, this should be fine.
Approach 2, optimistic locking: this is what you have now. The solution is to refresh the data and restart your operation.
Approach 3, row-level DB lock: the solution here is more or less the same as for approach 2, with the small difference that a pessimistic lock is used instead. The main difference is that in this case it may be a READ lock, and you might not even be able to read the data from the database in order to refresh it, if it is PESSIMISTIC_READ.
Approach 4, application-level synchronization: there are many different ways to do synchronization. One example would be to arrange all your updates in a BlockingQueue or a JMS queue (if you want it to be persistent) and push all updates from a single thread. To visualize it a bit: T1 and T2 will put elements on the queue, and a single T3 thread will read the operations and push them to the database server.
If you use application-level synchronization, you should be aware that not every structure can be distributed in a multi-server deployment.
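A self-contained sketch of that single-writer idea (the poison-pill shutdown is an added detail, not from the answer): T1 and T2 only enqueue operations, and one writer thread applies them, so the row version can never race:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleWriter {
    static final Runnable POISON = () -> {};

    // The only thread that ever touches the database row; producers just
    // queue update operations and never talk to the DB themselves.
    static Thread startWriter(BlockingQueue<Runnable> updates) {
        Thread writer = new Thread(() -> {
            try {
                Runnable update;
                while ((update = updates.take()) != POISON) {
                    update.run(); // in real code: execute the DB update here
                }
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        return writer;
    }
}
```

Because all updates are serialized through one thread, no StaleObjectStateException can occur; the trade-off is that the queue itself becomes the thing you must make durable or distributable.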
Well I can't think of anything else for now :)
I'm not certain I understand the question, but it seems it would constitute a logic error for a thread T1, which is only processing (for example) records beginning with AA, to mark the entire file as "Parsed". What happens if, for example, your application crashes after T1 updates but while T2 is still processing BB records? Some BB records are likely to be lost, correct?
Anyhow, the crux of the issue is you have a race condition with two threads updating the same object. The stale object exception just means one of your threads lost the race. A better solution avoids a race entirely.
(I am assuming here that the individual record processing is idempotent, if that's not the case I think you have bigger problems as some failure modes will result in re-processing of records. If record processing has to happen once and only once, then you have a harder problem for which a message queue would probably be a better solution.)
I would leverage the functionality of java.util.concurrent to dispatch records out to threaded workers, and have the thread interacting with Hibernate block until all records have been processed, at which point that thread can mark the file as "Parsed".
For example,
// do something like this during initialization, or use a Guava LoadingCache...
// note I'm assuming RecordType looks like an enum
Map<RecordType, ExecutorService> executors = new HashMap<>();
executors.put(RecordType.AA_RECORD, Executors.newSingleThreadExecutor());
then as you process the file, you dispatch each record as follows, building up a list of futures corresponding to the status of the queued tasks. Let's assume successfully processing a record returns a boolean "true":
List<Future<Boolean>> tasks = new ArrayList<>();
for (Record record : file.getRecords()) {
    ExecutorService executorForRecord = executors.get(record.getRecordType());
    tasks.add(executorForRecord.submit(new RecordProcessor(record)));
}
Now wait for all tasks to complete successfully - there are more elegant ways to do this, especially with Guava. Note that you also need to deal with ExecutionException here if a task failed with an exception; I'm glossing over that.
boolean allSuccess = true;
for (Future<Boolean> task : tasks) {
    allSuccess = allSuccess && task.get();
    if (!allSuccess) break;
}
// if all your tasks completed successfully, update the file record
if (allSuccess) {
    file.setStatus("Parsed");
}
Assuming that each thread T1, T2 will parse different parts of the file, so that neither one overrides the other thread's parsing, the best thing is to decouple your parsing process from the DB commit.
T1 and T2 will do the parsing; T3, or the main thread, will do the commit after both T1 and T2 have finished. I think in this approach it is more correct to change the file status to Parsed only when both threads have finished.
You can think of T3 as a CommitService class which waits until T1 and T2 finish and then commits to the DB.
CountDownLatch is a helpful tool for this. Here is an example:
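A minimal sketch of that CommitService idea with a CountDownLatch (the parsing itself is elided, and `parseWithTwoThreads` is a made-up name):

```java
import java.util.concurrent.CountDownLatch;

public class CommitService {
    static String parseWithTwoThreads() throws InterruptedException {
        CountDownLatch parsed = new CountDownLatch(2); // one count per parser

        Runnable parser = () -> {
            // ... parse this thread's share of the records (AA or BB) ...
            parsed.countDown(); // signal that this parser is done
        };
        new Thread(parser, "T1").start();
        new Thread(parser, "T2").start();

        parsed.await(); // the committing thread blocks until both parsers finish
        // only now is it safe to commit and set the file status
        return "Parsed";
    }
}
```

Since only one thread ever writes the status, the race (and hence the stale-object exception) disappears by construction.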

Android: blocking or non-blocking queue for continuous SQL inserts?

I have an app where, upon pressing a Start button, a service begins polling a few sensors and storing the sensor data in an object whenever the sensor values change. Every 10 ms, a database insert occurs that takes the object's current values and stores them in the database. This happens for 30 minutes.
Given the speed and duration of the insertions, I want to run them in a separate thread from the UI thread so navigation doesn't take a hit. So my service will offer data to the thread by adding it to a queue, and the other thread (the consumer) will take from the queue and insert into the database.
When the Stop button is pressed, I need to make sure to process the rest of the queue before killing off the thread.
It seems that everywhere I look, some sort of blocking queue is recommended for producer/consumer type situations (e.g. LinkedBlockingQueue vs ConcurrentLinkedQueue, or What's the different between LinkedBlockingQueue and ConcurrentLinkedQueue?)
My question is, does a blocking queue make sense in my situation?
The most vital thing in this app is that all data gets inserted into the DB. From what I understand (please correct me if I'm wrong), if the queue becomes full and the consumer thread can't do inserts quickly enough to free up queue space, then the producer is blocked from adding things to the queue. If that's right, then by the time the queue has free space, a few sensor readings will have gone by, and they will not be inserted into the DB because of the blocking.
At the end of the day, I just need the best way to ensure that data gets inserted every 10 ms without skipping a beat. In my mind it makes sense to dump the values into some unbounded queue every 10 ms and have the consumer poll it as soon as it's able; then, when Stop is pressed, drain the rest of the queue before killing the thread.
So what is the correct way to handle this in a 1-producer/1-consumer situation?
If I were you, I would use a single-thread executor for this task - it already comes with the exact functionality that you need out of the box. More info here.
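A sketch of what that looks like, with a list standing in for the SQLite insert (the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SensorInsertService {
    // single worker + unbounded internal queue: offer() never blocks the caller
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    final List<String> inserted = Collections.synchronizedList(new ArrayList<>());

    // called every 10 ms from the sampling code
    void offer(String row) {
        writer.submit(() -> inserted.add(row)); // stand-in for the DB insert
    }

    // called when Stop is pressed: everything already queued is still
    // executed before the worker thread dies
    void stop() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

shutdown() plus awaitTermination() gives exactly the "drain the rest of the queue before killing the thread" behavior asked for: no new tasks are accepted, but every queued insert still runs.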

Getting usernames from database that are not being used by a thread

I have a multi-threaded Java program where each thread gets one username for some processing, which takes about 10 minutes or so.
Right now it gets the usernames via a SQL query that returns one username at random, and the problem is that the same username can be given to more than one thread at a time.
I don't want a username that is being processed by one thread to be fetched again by another thread. What is a simple and easy way to achieve this?
Step-by-step solution:
Create a threads table where you store each thread's state. Among other columns, you need to store the owning user's id there as well.
When a thread is associated with a user, create a record, storing the owner along with all the other juicy stuff.
When a thread is no longer associated with a user, set its owner to null.
When a thread finishes its job, remove its record.
When you pick a random user for a thread, filter out all the users who are already associated with at least one thread. This way you know that any users remaining at the end of the randomization are threadless.
Make sure everything stays in order: if, while working, some thread records were created that should be removed or disassociated from their owner, then do so.
There are a lot of ways to do this... I can think of three solutions to this problem:
1) A singleton class with an array that contains all the users already in use. Make sure that access to the array is synchronized, and remove unused users from it.
2) A flag in the user table that contains a unique ID referencing the thread that is using it. Afterwards you have to manage when you remove the flag from the table.
3) As an alternative, check whether a pool of connections shared by all the threads could be the solution to your problem.
You could do one batch query that returns all of the usernames you want from the database and store them in a List (or some other type of collection).
Then ensure synchronized access to this list to prevent two threads from taking the same username at the same time: use a synchronized list, or a synchronized method, to access the list and remove the username from it.
One way to do it is to add another column to your users table. This column is a simple flag that shows whether a user has an assigned thread or not.
When you query the DB, you have to wrap the queries in a transaction:
you begin the transaction, first select a user that doesn't have a thread, then update the flag column, and then commit or roll back.
Since the queries are wrapped in a transaction, the DB handles all the issues that come up in scenarios like this.
With this solution there is no need to implement synchronization mechanisms in your code, since the database does it for you.
If you still have problems after doing this, I think you have to configure the isolation levels of your DB server.
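The claim step in that transaction can be modelled in memory with an atomic test-and-set. All names here are illustrative; in the real system the body of `claim` is the SELECT-then-UPDATE inside one transaction:

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class UserClaims {
    // stand-in for the users table: username -> "has an assigned thread" flag
    private final ConcurrentMap<String, Boolean> users = new ConcurrentHashMap<>();

    void addUser(String name) {
        users.put(name, false);
    }

    // Atomically pick an unclaimed user and flag it. Two threads can never
    // claim the same username, because replace() is a compare-and-set: it
    // only flips false -> true if the flag is still false.
    Optional<String> claim() {
        for (String name : users.keySet()) {
            if (users.replace(name, false, true)) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    // clear the flag when the thread is done with the user
    void release(String name) {
        users.put(name, false);
    }
}
```

The DB transaction gives you the same guarantee across processes that the compare-and-set gives within one JVM.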
You appear to want a work queue system. Don't reinvent the wheel - use a well established existing work queue.
Robust, reliable concurrent work queuing is unfortunately tricky with relational databases. Most "solutions" end up:
Failing to cope with work items not being completed due to a worker restart or crash;
Serializing all work on a lock, so all but one worker are just waiting; and/or
Allowing a work item to be processed more than once.
PostgreSQL 9.5's new FOR UPDATE SKIP LOCKED feature will make it easier to do what you want in the database. For now, use a canned reliable task/work/message queue engine.
If you must do this yourself, you'll want to have a table of active work items where you record the active process ID / thread ID of the worker that's processing a row. You will need a cleanup process that runs periodically, on thread crash, and on program startup that removes entries for failed jobs (where the worker process no longer exists) so they can be re-tried.
Note that unless the work the workers do is committed to the database in the same transaction that marks the work queue item as done, you will have timing issues where the work can be completed then the DB entry for it isn't marked as done, leading to work being repeated. To absolutely prevent that requires that you commit the work to the DB in the same transaction as the change that marks the work as done, or that you use two-phase commit and an external transaction manager.

Stream a database recordset to multiple thread workers

I have a process which requires streaming data from a database and passing the records off to an external server for processing before returning the results to store back in the database.
Get database row from table A
Hand off to external server
Receive result
insert database row into table B
Currently this is a single-threaded operation, and the bottleneck is the external server process and so I would like to improve performance by using other instances of the external server process to handle requests.
Get 100 database rows from table A
For each row
Hand off to external server 1
Receive Result
insert database row into table B
In parallel get 100 database rows from table A
For each row
Hand off to external server 2
Receive Result
insert database row into table B
Problem 1
I have been investigating Java thread pools, and dispatching records to the external servers this way, however I'm not sure how to fetch records from the database as quickly as possible without the workers rejecting new tasks. Can this be done with thread pools? What architecture should be used to achieve this?
Problem 2
At present I have optimised the database inserts by using batch statements and only executing once 2000 records have been processed. Would it be possible to adopt a similar approach within the workers?
Any help in structuring a solution to this problem would be greatly appreciated.
Based on your comments, I think the key point is controlling the count of pending tasks. You have several options:
Do an estimate of the number of records in your data set. Then decide on a batch size that will produce a reasonable number of tasks. For example, if you want to limit the pending task count to 100: with 100K records you can use a batch size of 1K, and with 1M records a batch size of 10K.
Supply your own bounded BlockingQueue to the threadpool. If you haven't done it before, you probably should study the java.util.concurrent package carefully before doing this.
Or you can use a java.util.concurrent.Semaphore, which is a simpler facility than a user-supplied queue:
Declare a semaphore with your pending task count limit
Semaphore mySemaphore = new Semaphore(max_pending_task_count);
Since your task generation is fast, you can use a single thread to generate all tasks. In your task generating thread:
while (hasMoreTasks()) {
    // this will block if you've reached the count limit
    mySemaphore.acquire();
    // generate a new task only after acquire
    // The new task must have a reference to the Semaphore
    Task task = new Task(..., mySemaphore);
    threadpool.submit(task);
}

// now that you've generated all tasks,
// time to wait for them to finish.
// you may have a better way to detect that, however
while (mySemaphore.availablePermits() < max_pending_task_count) {
    Thread.sleep(some_time);
}
// now, go ahead dealing with the results
In your Task thread:
public void run() {
    try {
        ...
    } finally {
        // when finished, do a release, which increases the permits
        // by 1 and lets your task generator thread produce 1 more task;
        // the finally block ensures the permit is returned even if the
        // task fails with an exception
        mySemaphore.release();
    }
}