I have a set of queries which have to be executed simultaneously. For this I am starting a Runnable thread for each query, calling it from a loop that iterates through the List of queries.
Each thread executes its query and also measures the time the query takes to execute, by capturing the start and end times and taking the difference. This time should be specific to each query and needs to be printed out along with the corresponding query.
In this scenario, do I just need to capture the times and display them? Will it lead to any synchronization problems?
This question is too generic. Will it lead to concurrency problems? Maybe. Do you share objects between the Runnables? If you do, you MIGHT have issues; if you don't, then no.
If your loop looks like this, for example (this code is only here to prove a point; note that each Thread must be start()ed or the Runnable never runs):
for (int i = 0; i < queries.size(); ++i) {
    final String query = queries.get(i);
    new Thread(new Runnable() {
        public void run() {
            // Execute the query here
        }
    }).start(); // without start() the thread never runs
}
If each Thread executes its own query WITHOUT accessing any shared data, then no, you will not have concurrency problems - at least not in the Java code. You could still have problems in the database - for example, multiple queries trying to update the same row.
In this scenario, do I just need to capture the times and display them? Will it lead to any synchronization problems?
I don't know - do you? Why not just give each of your Runnable instances a member field that keeps track of the time it takes to execute, and then, when all threads have finished, iterate over the Runnables and display the data? Only you know what your true intentions are. There are only two synchronization problems I know of: deadlock and race conditions. Neither one applies to this scenario.
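To make that concrete, here is a minimal sketch of the idea - one timing field per Runnable, read only after join() so no extra synchronization is needed; the executeQuery body is a hypothetical stand-in for your real JDBC call:

import java.util.ArrayList;
import java.util.List;

public class TimedQuery implements Runnable {
    private final String query;
    private long elapsedMillis; // written by the worker thread, read after join()

    public TimedQuery(String query) {
        this.query = query;
    }

    public void run() {
        long start = System.nanoTime();
        executeQuery(query);
        elapsedMillis = (System.nanoTime() - start) / 1_000_000;
    }

    private void executeQuery(String sql) {
        // hypothetical: run the query via JDBC here
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> queries = List.of("SELECT 1", "SELECT 2");
        List<TimedQuery> tasks = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (String q : queries) {
            TimedQuery task = new TimedQuery(q);
            Thread thread = new Thread(task);
            tasks.add(task);
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join(); // join() gives a happens-before edge, so the reads below are safe
        }
        for (TimedQuery task : tasks) {
            System.out.println(task.query + " took " + task.elapsedMillis + " ms");
        }
    }
}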
I've already posted this question on the Code Review site (https://codereview.stackexchange.com/questions/158999/get-set-the-value-in-the-cache-using-the-atomicreference-in-java), but thought of posting it here as well, so that it reaches a wider audience and I can get a solution more quickly.
I have the below code, which gets and sets data in a cache using a synchronized block, and I want to know whether I can optimize it:
private final AtomicReference<Integer> cachedIntRef = new AtomicReference<Integer>(); // must be a shared field, or nothing is ever cached

public int getValue() {
    boolean wasCached = true;
    Integer cachedInt = cachedIntRef.get();
    if (cachedInt == null) {
        synchronized (cachedIntRef) {
            cachedInt = cachedIntRef.get();
            if (cachedInt == null) {
                wasCached = false;
                // Make DB call to get the data and update the cache.
                cachedInt = baseDao.getCloudMaximumWeight();
                cachedIntRef.set(cachedInt);
            }
        }
    }
    return cachedInt;
}
I want to know whether there is any way I can remove the synchronized block and optimize further, or whether this code is already optimal.
EDIT: I'll remove the question from one of the sites if I get an answer on either of them. Also, when I profile my application, sometimes even with a small number of threads I see threads blocking on the synchronized piece of code, which made me think that since the code is already using an AtomicReference, I could somehow get rid of synchronized, or that there is some better way of optimizing the code.
I want to know whether there is any way I can remove the synchronized block and optimize further, or whether this code is already optimal.
I assume that optimizing the code means removing the synchronized block. The problem with that thinking is that your DAO call is most likely significantly more expensive than the synchronized block itself. Any IO (especially to a remote database) is going to be at least 4+ orders of magnitude more expensive than the locking.
That said, you can remove the synchronized block if you don't mind multiple DAO calls when initializing the cache. If the DAO calls are inexpensive then having 2 threads making them maybe isn't a problem. There is a race condition on which one's answer will be put into the cache but chances are their results will be the same anyway. I often do this and assume that as the application starts up, the first couple of calls are going to be more expensive as the cache warms. But are 2 threads making the same DAO request ever going to be faster than 1 thread doing it and 1 waiting for the other thread to finish?
If there are a number of different DAO calls, then you can try some sort of lock striping so that not all cache requests go through the same lock. This would allow some parallelization, which might help. I can't tell whether your code is the real thing or just an example of the problem. This is how ConcurrentHashMap works, for example.
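A rough sketch of lock striping under some assumptions - a keyed cache, an arbitrary stripe count of 16, and a hypothetical loadFromDao standing in for the real database call:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StripedCache {
    private static final int STRIPES = 16;
    private final Object[] locks = new Object[STRIPES];
    private final ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();

    public StripedCache() {
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new Object();
        }
    }

    public int getValue(String key) {
        Integer cached = cache.get(key);
        if (cached == null) {
            // Only requests whose keys hash to the same stripe contend with each other.
            synchronized (locks[(key.hashCode() & 0x7fffffff) % STRIPES]) {
                cached = cache.get(key);
                if (cached == null) {
                    cached = loadFromDao(key); // hypothetical stand-in for the DAO call
                    cache.put(key, cached);
                }
            }
        }
        return cached;
    }

    private int loadFromDao(String key) {
        return 42; // replace with the real database lookup
    }
}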
But really, I would make sure that this section of code actually has performance problems before worrying too much about it. And even if a profiler says it is a primary time sink, it may just be that the DAO calls are the most expensive part of the equation, so saving a couple of them with synchronization would be the best way to speed it up anyway. You can take out the DAO calls and replace them with a straight assignment if you need to see whether it is the synchronized block or the dao.* calls that is the problem.
Try using a volatile Integer instead. Maybe I am missing something here, but I don't see the use case for the AtomicReference here.
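For what it's worth, a sketch of that suggestion - the same double-checked pattern over a volatile field, with loadFromDatabase standing in for baseDao.getCloudMaximumWeight():

public class CachedValue {
    private volatile Integer cachedInt; // volatile makes the unsynchronized first read safe
    private final Object lock = new Object();

    public int getValue() {
        Integer value = cachedInt;
        if (value == null) {
            synchronized (lock) {
                value = cachedInt;
                if (value == null) {
                    value = loadFromDatabase(); // the expensive call happens at most once per miss
                    cachedInt = value;
                }
            }
        }
        return value;
    }

    private Integer loadFromDatabase() {
        return 42; // stand-in for baseDao.getCloudMaximumWeight()
    }
}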
Scenario: One thread is being called up to thousands of times per second to do inserts to the same table and is currently doing them one-by-one.
Goal: Do periodic batch inserts instead to improve performance.
I'm trying to instead add the objects being saved to a list as the thread's saveItem method gets called, then use a TimerTask to combine them into a batch insert every 2 seconds or so.
My first thought was to have two lists, call them toSave and toSaveBackup. When the thread's saveItem method is called to save something, it will be added to the toSave list, but once the TimerTask kicks off and needs to save everything to the database, it will set an AtomicBoolean flag saveInProgress to true. This flag is checked by saveItem, and it will add to toSaveBackup instead of toSave if saveInProgress is true. When the batch save is complete, all items in toSaveBackup will be moved to the toSave list, probably with a synchronized block on the lists.
Is this a reasonable approach? Or is there a better best practice? My googling skills have failed me so any help is welcome.
Misc info:
All these inserts are to the same table
Inserts are driven by receipt of MQTT messages, so I can't combine them in a batch before this point
Update: A tweak on CKing's answer below achieved the desired approach: a TimerTask runs every 100 ms and checks the size of the saveQueue and how long it has been since a batch was saved. If either of these values exceeds the configured limit (save every 2 seconds, or every 1000 records, etc.) then we save. A LinkedBlockingQueue is used to simplify synchronization; a rough sketch follows.
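For reference, a rough sketch of that final approach - the limits are illustrative and saveBatch is a hypothetical stand-in for the real JDBC batch insert:

import java.util.ArrayList;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchSaver {
    private static final int MAX_BATCH = 1000;    // save when this many records are queued...
    private static final long MAX_WAIT_MS = 2000; // ...or when this much time has passed
    private final LinkedBlockingQueue<Item> saveQueue = new LinkedBlockingQueue<>();
    private long lastSave = System.currentTimeMillis(); // only touched by the timer thread

    public void start() {
        Timer timer = new Timer("batch-saver", true);
        timer.schedule(new TimerTask() {
            public void run() {
                long now = System.currentTimeMillis();
                if (saveQueue.size() >= MAX_BATCH || now - lastSave >= MAX_WAIT_MS) {
                    List<Item> batch = new ArrayList<>();
                    saveQueue.drainTo(batch); // the queue does the synchronization for us
                    if (!batch.isEmpty()) {
                        saveBatch(batch);
                    }
                    lastSave = now;
                }
            }
        }, 0, 100); // check every 100 ms, as in the update above
    }

    public void saveItem(Item item) { // called from the MQTT receive thread
        saveQueue.add(item);
    }

    private void saveBatch(List<Item> batch) {
        // hypothetical: addBatch()/executeBatch() against the table goes here
    }

    static class Item {}
}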
Thanks again to everyone for their help!
It looks like your primary objective is to wait for a predefined amount of time and then trigger an insert. When an insert is in progress, you want other insert requests to wait till the insert is complete. After the insert is complete, you want to repeat the same process for the next insert requests.
I would propose the following solution with the above understanding in mind. You don't need two separate lists to achieve your goal. Also note that I am proposing an old-fashioned solution for the sake of explanation; I cover some other APIs you can use at the end of my explanation. Here goes:
Define a Timer and a TimerTask that will run every N seconds.
Define an ArrayList that will be used for queuing up insert requests sent to saveItem method.
The saveItem method can define a synchronized block around this ArrayList. You can add items to the ArrayList within this synchronized block as and when saveItem is called.
On the other side of the equation, the TimerTask should also have a synchronized block on the same ArrayList inside its run method. It should insert all the records present in the ArrayList at that moment into the database. Once the insert is complete, the TimerTask should clear the ArrayList and finally exit the synchronized block.
You will no longer need to explicitly monitor if an insert is in progress or create a copy of your ArrayList when an insert is in progress. Your ArrayList becomes the shared resource in this case.
If you also want size to be a deciding factor for proceeding with inserts, you can do this :
Define an int called waitAttempts in TimerTask. This field indicates the number of consecutive wake ups for which the TimerTask should do nothing if the size of the list is not big enough.
Every time the TimerTask wakes up, it can do something like if (waitAttempts % 3 == 0 || list.size() > 10) { insert data } else { increment waitAttempts and do nothing; exit the synchronized block and the run method }. You can change 3 and 10 to whatever numbers suit your throughput requirements.
Note: intrinsic locking was used as a means of explaining the approach. One can always take this approach and implement it using modern constructs such as a BlockingQueue, which would eliminate the need to synchronize manually on the ArrayList. I would also recommend the use of Executors.newSingleThreadScheduledExecutor() instead of a TimerTask, as it ensures that there will only be one thread running at any given time and there won't be an overlap of runs. Also, the logic for waitAttempts is indicative and will need to be adjusted to work correctly.
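To make the waitAttempts idea concrete, here is an indicative sketch of the TimerTask side, with the thresholds as placeholders and insertAll standing in for the actual batch insert:

import java.util.List;
import java.util.TimerTask;

public class InsertTask extends TimerTask {
    private final List<Object> pending; // the same list saveItem synchronizes on
    private int waitAttempts = 0;

    public InsertTask(List<Object> pending) {
        this.pending = pending;
    }

    public void run() {
        synchronized (pending) {
            if (waitAttempts % 3 == 0 || pending.size() > 10) {
                insertAll(pending); // batch insert everything queued so far
                pending.clear();
                waitAttempts = 0;
            } else {
                waitAttempts++; // too few records: skip this wake-up
            }
        }
    }

    private void insertAll(List<Object> batch) {
        // hypothetical: the JDBC batch insert goes here
    }
}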
I have a thread called T1 for reading a flat file and parsing it. I need to create a new thread called T2 for parsing some part of this file, and later this T2 thread will need to update the status of the original entity, which is also being parsed and updated by the original thread T1. How can I handle this situation?
I receive a flat file having the below sample records:
AAAA
BBBB
AACC
BBCC
AADD
BBDD
First this file is saved in the database in Received status. Now all the records starting with BB or with AA need to be processed in a separate thread. Once it's successfully parsed, both threads will try to update the status of this file object in the database to Parsed. In some cases I get a StaleObjectException. Edit: and the work done by either thread before the exception is lost. We are using optimistic locking. What is the best way of avoiding this problem?
Possible hibernate exceptions when two threads update the same Object?
The above post helps to understand some part of it, but it does not help to resolve my problem.
Part 1 - Your problem
The main reason for you receiving this exception is that you are using Hibernate with optimistic locking. This basically tells you that either thread T1 or thread T2 has already updated the state to PARSED, and the other thread is now holding an old version of the row, with a smaller version number than the one in the database, while trying to update the state to PARSED as well.
The question here is: are the two threads trying to persist the same data? If the answer is yes, then even if the last update succeeds there shouldn't be any problem, because eventually they are updating the row to the same state. In that case you don't need optimistic locking, because your data will in any case be in sync.
The main problem arises if, after the state is set to RECEIVED, the two threads T1 and T2 actually depend on one another when moving to the next status. In that case you need to ensure that if T1 executes first (or vice versa), T2 refreshes the data for the updated row and re-applies its changes on top of the changes already pushed by T1. In this case the solution is the following: if you encounter a StaleObjectException, you basically need to refresh your data from the database and restart your operation.
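A sketch of that refresh-and-retry idea; FileEntity here is a hypothetical stand-in for your mapped entity, and session/transaction handling is simplified:

import org.hibernate.Session;
import org.hibernate.StaleObjectStateException;

public void markParsed(Session session, FileEntity file) {
    for (int attempt = 0; attempt < 3; attempt++) { // bounded retries
        try {
            file.setStatus("Parsed");
            session.update(file);
            session.flush(); // the stale check happens on flush
            return;          // success
        } catch (StaleObjectStateException e) {
            session.refresh(file); // reload the current row version...
            // ...then loop and re-apply the change on top of the fresh state
        }
    }
    throw new IllegalStateException("Could not update status after retries");
}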
Part 2 - Analysis of the linked post: Possible hibernate exceptions when two threads update the same Object?
Approach 1: this is more or less the last-update-wins situation. It more or less avoids the optimistic locking (the version counting). If you don't have a dependency between T1 and T2 in order to set the status to PARSED, this should be good.
Approach 2: optimistic locking. This is what you have now. The solution is to refresh the data and restart your operation.
Approach 3: row-level DB lock. The solution here is more or less the same as for approach 2, with the small correction that a pessimistic lock is held in the meantime. The main difference is that in this case it may be a READ lock, and you might not even be able to read the data from the database in order to refresh it if it is PESSIMISTIC_READ.
Approach 4: application-level synchronization. There are many different ways to do synchronization. One example would be to actually arrange all your updates in a BlockingQueue or a JMS queue (if you want it to be persistent) and push all updates from a single thread. To visualize it a bit: T1 and T2 will put elements on the queue, and there will be a single thread T3 reading operations off it and pushing them to the database server.
If you use application-level synchronization, you should be aware that not all structures can be distributed in a multi-server deployment.
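A minimal sketch of that single-writer idea from approach 4 - the updates are modeled as Runnables so the example stays independent of any particular DAO:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleWriter {
    private final BlockingQueue<Runnable> updates = new LinkedBlockingQueue<>();

    public void start() {
        Thread t3 = new Thread(() -> {
            try {
                while (true) {
                    updates.take().run(); // only this thread ever touches the database
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "db-writer");
        t3.setDaemon(true);
        t3.start();
    }

    // T1 and T2 enqueue their updates here instead of writing directly
    public void submitUpdate(Runnable update) {
        updates.add(update);
    }
}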
Well I can't think of anything else for now :)
I'm not certain I understand the question, but it seems it would constitute a logic error for a thread T1 which is only processing, for example, records beginning with AA to mark the entire file as "Parsed". What happens if, for example, your application crashes after T1 updates the status but while T2 is still processing BB records? Some BB records are likely to be lost, correct?
Anyhow, the crux of the issue is you have a race condition with two threads updating the same object. The stale object exception just means one of your threads lost the race. A better solution avoids a race entirely.
(I am assuming here that the individual record processing is idempotent, if that's not the case I think you have bigger problems as some failure modes will result in re-processing of records. If record processing has to happen once and only once, then you have a harder problem for which a message queue would probably be a better solution.)
I would leverage the functionality of java.util.concurrent to dispatch records out to threaded workers, and have the thread interacting with hibernate block until all records have been processed, at which point that thread can mark the file as "Parsed".
For example,
// do something like this during initialization, or use a Guava LoadingCache...
// note: this must be ExecutorService, not plain Executor, so we can call submit() below
Map<RecordType, ExecutorService> executors = new HashMap<>();
// note I'm assuming RecordType looks like an enum
executors.put(RecordType.AA_RECORD, Executors.newSingleThreadExecutor());
Then, as you process the file, you dispatch each record as follows, building up a list of futures corresponding to the status of the queued tasks. Let's assume successfully processing a record returns a boolean "true":
List<Future<Boolean>> tasks = new ArrayList<>();
for (Record record : file.getRecords()) {
    ExecutorService executorForRecord = executors.get(record.getRecordType());
    tasks.add(executorForRecord.submit(new RecordProcessor(record)));
}
Now wait for all tasks to complete successfully - there are more elegant ways to do this, especially with Guava. Note that you also need to deal with ExecutionException here if your task failed with an exception; I'm glossing over that.
boolean allSuccess = true;
for (Future<Boolean> task : tasks) {
    allSuccess = allSuccess && task.get();
    if (!allSuccess) break;
}

// if all your tasks completed successfully, update the file record
if (allSuccess) {
    file.setStatus("Parsed");
}
Assuming that the threads T1 and T2 each parse different parts of the file, neither thread overwrites the other's parsing. The best thing is to decouple your parsing process from the DB commit.
T1 and T2 will do the parsing; T3, or the main thread, will do the commit after both T1 and T2 have finished. And I think in this approach it is more correct to change the file status to Parsed only when both threads have finished.
You can think of T3 as a CommitService class which waits till T1 and T2 finish and then commits to the DB.
CountDownLatch is a helpful tool to do it. Here is an example:
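A sketch of that CommitService idea with CountDownLatch - the parsing and commit bodies are placeholders:

import java.util.concurrent.CountDownLatch;

public class CommitService {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch parsersDone = new CountDownLatch(2);

        Runnable aaParser = () -> {
            // parse all AA records here
            parsersDone.countDown();
        };
        Runnable bbParser = () -> {
            // parse all BB records here
            parsersDone.countDown();
        };

        new Thread(aaParser, "T1").start();
        new Thread(bbParser, "T2").start();

        parsersDone.await(); // T3 / the main thread blocks until both parsers finish
        // Only now update the file status, from a single thread:
        // fileDao.updateStatus(fileId, "Parsed"); // hypothetical commit call
    }
}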
I designed a Java application. A friend suggested using multi-threading; he claims that running my application as several threads will decrease the run time significantly.
In my main class, I carry out several operations that are out of our scope to fill global static variables and hash maps to be used across the whole lifetime of the process. Then I run the core of the application on the entries of an array list.
for (int customerID : customers) {
    ConsumerPrinter consumerPrinter = new ConsumerPrinter();
    consumerPrinter.runPE(docsPath, outputPath, customerID);
    System.out.println("Customer with CustomerID:" + customerID + " Done");
}
For each iteration of this loop, the XMLs of the given customer are fetched from the machine, parsed, and calculations are performed on the parsed data. Later, processed results are written to a text file (fetched and written data can reach several gigabytes at most and 50 MB on average). More than one iteration can write to the same file.
Should I make this piece of code multi-threaded, so that each group of customers is handled in an independent thread?
How can I know the most optimal number of threads to run?
What are the best practices to take into consideration when implementing multi-threading?
Should I make this piece of code multi-threaded, so that each group of customers is handled in an independent thread?
Yes, multi-threading will save you processing time. While iterating over your list you can spawn a new thread for each iteration and do the customer processing in it. But you need proper synchronization: if processing two customers requires operating on the same resource, you must synchronize that operation to avoid possible race conditions or memory consistency issues.
How can I know the most optimal number of threads to run?
You cannot really know without actually measuring the processing time for n customers with different numbers of threads. It will depend on the number of cores your processor has and on what processing actually takes place for each customer.
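As a baseline you can size a thread pool by the number of available cores and measure from there. A sketch against the loop from the question (runPE and its arguments are as in the original; the one-hour wait is an arbitrary placeholder):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

int cores = Runtime.getRuntime().availableProcessors();
ExecutorService pool = Executors.newFixedThreadPool(cores); // tune up or down from this baseline

for (int customerID : customers) {
    pool.submit(() -> {
        ConsumerPrinter consumerPrinter = new ConsumerPrinter();
        consumerPrinter.runPE(docsPath, outputPath, customerID);
        System.out.println("Customer with CustomerID:" + customerID + " Done");
    });
}

pool.shutdown(); // stop accepting new tasks...
try {
    pool.awaitTermination(1, TimeUnit.HOURS); // ...and wait for the in-flight ones
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}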
What are the best practices to take into consideration when implementing multi-threading?
First and foremost, your machine must have multiple cores and your OS must support multi-threading. Almost every system does these days, but it is a good criterion to check. Secondly, you must analyze all the possible scenarios that may lead to race conditions. Every resource that you know will be shared among multiple threads must be made thread-safe. You should also look out for possible memory consistency issues (declare such variables as volatile). Finally, there are things that you cannot predict or analyze until you actually run test cases, like deadlocks (you need to analyze a thread dump) or memory leaks (you need to analyze a heap dump).
The idea of multi-threading is to move some heavy process into another, let's say, "block of memory".
Any UI updates have to be done on the main/default thread, like printing messages or inflating a view, for example. You can ask the app to draw a bitmap, download images from the internet, or run a heavy validation/loop block on a separate thread; imagine that you are creating a second short-lived app to handle those tasks for you.
Remember, you can ask the app to download/draw an image on another thread, but you have to print this image on the screen on the main thread.
This is commonly used to load a large bitmap on a separate thread, do the math to resize that large image and then, on the main thread, inflate/print/paint/show the smaller version of the image to the user.
In your case, I don't know how heavy the runPE() method is or what it does; you could try to create another thread for it, but the rest should stay on the main thread, as it is the main process of your UI.
You could also optimize your loop by placing the "ConsumerPrinter consumerPrinter = new ConsumerPrinter();" before the "for(...)": since it does not change dynamically, you can move it out of the loop to avoid creating the same object each time the loop restarts :)
While straight java multi-threading can be used (java.util.concurrent) as other answers have discussed, consider also alternate programming approaches to multi-threading, such as the actor model. The actor model still uses threads underneath, but much complexity is handled by the actor framework rather than directly by you the programmer. In addition, there is less (or no) need to reason about synchronizing on shared state between threads because of the way programs using the actor model are created.
See Which Actor model library/framework for Java? for a discussion of popular actor model libraries.
I'm using JDBC and need to constantly check the database against changing values.
What I have currently is an infinite loop running, with an inner loop iterating over the changing values, and each iteration checking against the database.
public void runInBG() { // this method is called from another thread
    while (true) {
        while (els.hasElements()) {
            Test el = (Test) els.next();
            String sql = "SELECT * FROM Test WHERE id = '" + el.getId() + "'";
            // this function makes the connection, calls executeQuery etc., and returns a Record object with the values
            Record r = db.getTestRecord(sql);
            if (r != null) {
                // do something
            }
        }
    }
}
I think this isn't the best way.
The other way I'm thinking of is the reverse: to keep iterating over the database.
UPDATE
Thank you for the feedback regarding timers, but I don't think it will solve my problem.
Once a change occurs in the database I need to process the results almost instantaneously against the changing values ("els" from the example code).
Even if the database does not change it still has to check constantly against the changing values.
UPDATE 2
OK, to anyone interested in the answer, I believe I have the solution now. Basically, the solution is NOT to use the database for this. Load in, update, add, etc. only what's needed from the database into memory.
That way you don't have to open and close the database constantly; you only deal with the database when you make a change to it, reflect those changes back into memory, and from then on only deal with whatever is in memory at the time.
Sure, this is more memory intensive, but performance is absolutely key here.
As for the periodic "timer" answers, I'm sorry, but this is not right at all. Nobody has responded with a reason why the use of timers would solve this particular situation.
But thank you again for the feedback, it was still helpful nevertheless.
Another possibility would be using ScheduledThreadPoolExecutor.
You could implement a Runnable containing your logic and register it to the ScheduledExecutorService as follows:
ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(10);
executor.scheduleAtFixedRate(myRunnable, 0, 5, TimeUnit.SECONDS);
The code above creates a ScheduledThreadPoolExecutor with 10 threads in its pool and registers a Runnable to it that will run with a 5-second period, starting immediately.
To schedule your runnable you could use:
scheduleAtFixedRate
Creates and executes a periodic action that becomes enabled first after the given initial delay, and subsequently with the given period; that is executions will commence after initialDelay then initialDelay+period, then initialDelay + 2 * period, and so on.
scheduleWithFixedDelay
Creates and executes a periodic action that becomes enabled first after the given initial delay, and subsequently with the given delay between the termination of one execution and the commencement of the next.
And here you can see the advantages of ThreadPoolExecutor, in order to see whether it fits your requirements. I also advise reading this question: Java Timer vs ExecutorService?, in order to make a good decision.
Keeping the while(true) in runInBG() is a bad idea; you had better remove it. Instead you can have a scheduler/timer (use Timer & TimerTask) which calls runInBG() periodically and checks for updates in the DB.
You could use a Timer:
Timer timer = new Timer("runInBG");
// MyClass must extend TimerTask and contain your repeated method in run()
MyClass t = new MyClass();
timer.schedule(t, 0, 2000); // run now, then every 2 seconds
As you said in the comment above, if the application controls the updates and inserts, then you can create a framework which notifies the 'BG' thread or process about changes in the database. Notification can be over the network via JMS, intra-VM using the observer pattern, or both local and remote.
You can have a generic notification message like this (it can be a class for local notifications or a text message for remote notifications):
<Notification>
    <Type>update/insert</Type>
    <Entity>
        <Name>Account/Customer</Name>
        <Id>id</Id>
    </Entity>
</Notification>
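For the intra-VM observer-pattern variant, a minimal sketch with illustrative names - the code performing the insert/update calls notifyChange, and the 'BG' thread registers a listener:

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ChangeNotifier {
    public interface Listener {
        void onChange(String type, String entity, String id); // mirrors the message above
    }

    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    public void register(Listener listener) {
        listeners.add(listener);
    }

    // called by whatever code performs the insert/update
    public void notifyChange(String type, String entity, String id) {
        for (Listener listener : listeners) {
            listener.onChange(type, entity, id);
        }
    }
}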
To avoid a "busy loop", I would try to use triggers. H2 also supports a DatabaseEventListener API; that way you wouldn't have to create a trigger for each table.
This may not always work, for example if you use a remote connection.