Java thread slowing Postgres DB update

I have a piece of legacy code which is basically like:
// instance variable
List<Future<Object>> futureList;
methodA() {
List list = getListOfMessages();
for (Object o : list) {
methodB(o);
}
}
void methodB(Object o) {
// after multi-threading, this statement takes ~2 mins
someDAO.update(o.value);
// some other tasks
}
This works fine, except that getListOfMessages() returns about a million records into list. So I was asked to multithread it, and I changed it to something like this:
methodA() {
List list = getListOfMessages();
// created executorservice here
for(Object o : list) {
Future future = executorService.submit(() -> methodB(o));
futureList.add(future);
}
// call another method to see the status of each task
checkFutureStatus(futureList);
}
void checkFutureStatus(List<Future<Object>> list) {
for(Future<Object> future : list) {
try {
future.get(1000, TimeUnit.MILLISECONDS);
} catch (InterruptedException | ExecutionException e) {
} catch (TimeoutException e) {
}
}
}
So basically, I pass each list item to methodB but have it handled by separate threads. Once all tasks have been submitted, I check their status, but every one of them throws a TimeoutException. While debugging, I see that the DB updates take far too long inside the threads, around 1-2 minutes each.
To make sure the threads are not competing with each other, I had getListOfMessages() return just one message, and even that takes 1-2 minutes. If I revert everything and go back to the non-threaded approach, the DB update takes 1 ms. I can't figure out why the multithreaded implementation makes the DB update take so long.
I'm using Postgres 10 and the DB update goes through JdbcTemplate.
Thank you in advance.
Edit:
Added method to explain how I'm checking the status of each thread.

Related

How to wait for some period of time and after that just return default value?

I have below code which tells me whether my data is PARTIAL or FULL. It works fine most of the time.
public static String getTypeOfData() {
DataType type = new SelectTypes().getType();
if (type == DataType.partial || type == DataType.temp) {
return "partial";
}
return "full";
}
But sometimes, this line DataType type = new SelectTypes().getType(); just hangs and it keeps on waiting forever. This code is not in my control as it is developed by some other teams.
What I want is this: if the line DataType type = new SelectTypes().getType(); takes more than 10 seconds (or any default number of seconds), my method should return a default string, which can be "partial".
Is this possible to do by any chance? Any example will help me to understand better.
I am using Java 7.

The ExecutorService provides methods which allow you to schedule tasks and invoke them with timeout options. This should do what you are after; however, be careful, since terminating threads could leave your application in an inconsistent state.
If possible, you should contact the owners of the API and ask for clarification or more information.
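As a minimal sketch of the timeout approach described above, the slow call can be submitted to an ExecutorService and its Future read with a timeout. The class name TimeoutDemo, the helper callWithTimeout and the sleeping Callable are hypothetical stand-ins for new SelectTypes().getType(); anonymous classes are used so that it should also compile on Java 7.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutDemo {

    /** Runs the call on a worker thread; returns fallback if it fails or does not finish in time. */
    public static String callWithTimeout(Callable<String> call, long timeoutMillis, String fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // only interrupts; a call that ignores interrupts keeps running
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a getType() call that hangs.
        Callable<String> slow = new Callable<String>() {
            @Override
            public String call() throws Exception {
                Thread.sleep(5000);
                return "full";
            }
        };
        System.out.println(callWithTimeout(slow, 200, "partial"));
    }
}
```

Note that cancel(true) only interrupts the worker. If the third-party call ignores interruption, its thread keeps running in the background even though the caller has already returned the default.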
EDIT: As per your comment, would caching be a possibility? Meaning that on startup, or at some other point, your application goes through the SelectTypes, gets their types and stores them. Assuming that these do not change often, you can save them and update them periodically.
EDIT 2: As per your other comment, I cannot really add much more detail. You would need to add a method call which would allow your application to set these up the moment it is launched (this will depend on what framework you are using, if any).
A possible way would be to make the class containing the getTypeOfData() method as a Singleton. You would then amend the class to pull this information as part of its creation mechanism. Lastly, you would then create a Map<String, Type> in which you would throw in all your types. You could use getClass().getName() to populate the key for your map, and what you are doing now for the value part.

If you are not familiar with ExecutorService, the easiest way to achieve this is to use the Thread wait and notify mechanism:
private final static Object lock = new Object();
private static DataType type = null;
public static String getTypeOfData() {
new Thread(new Runnable() {
@Override
public void run() {
fetchData();
}
}).start();
synchronized (lock) {
try {
lock.wait(10000);//ensures that thread doesn't wait for more than 10 sec
if (type == DataType.partial || type == DataType.temp) {
return "partial";
}else{
return "full";
}
} catch (InterruptedException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
}
return "full";
}
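private static void fetchData() {
// Call the slow method outside the lock, so a hang here cannot block the waiting thread.
DataType result = new SelectTypes().getType();
synchronized (lock) {
type = result;
lock.notify();
}
}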
You may need a few small changes to make it work and look better; for example, instead of creating a new thread directly you could use a Job, plus other changes based on your requirements. But the main idea remains the same: the calling thread waits at most 10 seconds for the response.

Multi Threading in Google App Engine Datastore

How can I make the operations of getting and setting a property from datastore, thread safe?
Currently, I have code which puts tasks in the queue; each task performs some work and then updates a property called numberOfTasks, which is of type int. It basically fetches the current value of this property and increments it.
However, as tasks are executed from the queue, the final value does not come out correct because of a threading issue. Sometimes two tasks try to update the property at the same time, and so an increment is lost.
Could anyone please help in getting this done correctly?
Datastore Property Getter Method:
private String doGet(String rowId) throws EntityNotFoundException {
Key egsKey = KeyFactory.createKey(DATASTORE_KIND, rowId);
Entity egsEntity = datastore.get(egsKey);
// schema changed from String to Text type. Transparently handle that here.
Object propertyValue = egsEntity.getProperty(PROPERTY_KEY);
if (propertyValue instanceof String) {
return (String) propertyValue;
}
Text text = (Text) propertyValue;
return text.getValue();
}
Datastore Property Setter Method:
private void doPut(String rowId, List<String> list) {
Entity entity = new Entity(DATASTORE_KIND, rowId);
entity.setProperty(PROPERTY_KEY, list);
datastore.put(entity);
}
Setter and Getter Methods:
public synchronized int getPendingUsersForProcessing() {
String pendingUsersForProcessingAsString = null;
try {
pendingUsersForProcessingAsString = doGet(PENDING_USERS_FOR_PROCESSING);
return Integer.valueOf(pendingUsersForProcessingAsString);
} catch (NumberFormatException e) {
throw new IllegalStateException("The num of last batches processed in Datastore is not a number: "
+ pendingUsersForProcessingAsString);
} catch (EntityNotFoundException e) {
return DEFAULT_PENDING_USERS_FOR_PROCESSING;
}
}
/** {@inheritDoc} */
@Override
public synchronized void setPendingUsersForProcessing(int pendingUsersForProcessing) {
doPut(PENDING_USERS_FOR_PROCESSING, String.valueOf(pendingUsersForProcessing));
LOG.info("Number of Pending Users For Processing is set to : " + pendingUsersForProcessing);
}
Code Where I am trying to update the property:
int pendingUsers = appProperties.getPendingUsersForProcessing();
int requestUsers = request.getUserKeys().size();
appProperties.setPendingUsersForProcessing(pendingUsers + requestUsers);

This is not exactly a threading issue as you may have multiple instances of your app performing the tasks, and those instances do not know about each other. So this is a contention situation.
You have several options for resolving it:
1. Use sharding for your counters.
2. Instead of constantly updating the same entity, create a new entity for each completed task, using the time when a task was completed as an id. The advantage of this approach is that it creates an audit trail and you can always get stats like the number of tasks completed today, within the last hour, etc. To count the number of entities you can use a keys-only query, which is almost free and very fast. The disadvantage is a higher cost of writing these entities - this is not a solution if you have a very large number of tasks to complete.
3. Instead of counting tasks, count the results of these tasks. For example, if a task updates a user status, you can count the number of users with "pending" status using a free and fast keys-only query. This is a very good approach if you already have an indexed property that you can use as a flag to count the tasks completed.
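To illustrate the idea behind sharded counters, here is a minimal in-memory sketch. It deliberately does not use the Datastore API: the class ShardedCounter and its methods are hypothetical, and in App Engine each shard would presumably be a separate entity updated in its own transaction. The point is only that writers pick a random shard, so they rarely contend on the same one, while readers sum all shards.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

public class ShardedCounter {
    // Each slot plays the role of one counter shard (one entity in a datastore).
    private final AtomicLongArray shards;

    public ShardedCounter(int shardCount) {
        shards = new AtomicLongArray(shardCount);
    }

    /** Adds delta to one randomly chosen shard. */
    public void add(long delta) {
        int shard = ThreadLocalRandom.current().nextInt(shards.length());
        shards.addAndGet(shard, delta);
    }

    /** The counter value is the sum over all shards. */
    public long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }

    /** Increments one shared counter from several threads and returns the final total. */
    public static long countConcurrently(int threads, final int incrementsPerThread)
            throws InterruptedException {
        final ShardedCounter counter = new ShardedCounter(8);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int j = 0; j < incrementsPerThread; j++) {
                        counter.add(1);
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return counter.total();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countConcurrently(4, 1000)); // prints 4000
    }
}
```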

How can I make a thread sleep for a while and then start working again?

I have the following code:
public void run()
{
try
{
logger.info("Looking for new tasks to fetch... ");
// definitions ..
for(Task t: tasks)
{
logger.info(" Task " + t.getId() + " is being fetched ");
// processing ... fetching task info from db using some methods
}
Thread.sleep(FREQUENCY);
//t.start();
} catch (Exception e)
{
logger.info("FetcherThread interrupted: "+e.getMessage());
}
}
I'm trying to make the thread sleep for a specific time, FREQUENCY, and then work again. When I execute this code in Eclipse, the thread works only once, then nothing happens and the process terminates. If I uncomment the statement t.start(), I get "FetcherThread interrupted: null".
Can anyone tell me where I'm going wrong?
N.B.: I want the thread to keep running all the time, but to fetch periodically (say every 5 minutes).

You're missing any sort of loop in that code.
It seems that the thread is actually doing what you tell it to do: it runs all the tasks, then sleeps for a bit - then it has no more work to do, and so exits. There are several ways to address this, in ascending order of complexity and correctness:
The simple (and naive) way to address this is to wrap the try-catch block in an infinite loop (while(true) { ... }). This way after the thread finishes sleeping, it will loop back to the top and process all the tasks again.
However, this isn't ideal, as it's basically impossible to stop the thread. A better approach is to declare a volatile boolean field (e.g. volatile boolean running = true;), and change the loop to while(running). This way, you have a way to make the thread terminate (e.g. expose a method that sets running to false), and the volatile modifier ensures the running thread actually sees the change. See Sun's article "Why is Thread.stop() deprecated" for a longer explanation of this.
And taking a step further back, you may be trying to do this at too low a level. Sleeping and scheduling isn't really part of the job of your Runnable. The actual solution I would adopt is to strip out the sleeping, so that you have a Runnable implementation that processes all the tasks and then terminates. Then I would create a ScheduledExecutorService, and submit the "vanilla" runnable to the executor - this way it's the job of the executor to run the task periodically.
The last solution is ideal from an engineering perspective. You have a class that simply runs the job once and exits - this can be used in other contexts whenever you want to run the job, and composes very well. You have an executor service whose job is the scheduling of arbitrary tasks - again, you can pass different types of Runnable or Callable to this in future, and it will do the scheduling bit just as well. And possibly the best part of all, is that you don't have to write any of the scheduling stuff yourself, but can use a class in the standard library which specifically does this all for you (and hence is likely to have the majority of bugs already ironed out, unlike home-grown concurrency code).
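The executor-based approach can be sketched roughly as follows. This is only an illustration under stated assumptions: the class PeriodicFetch, the helper runPeriodically and the fixed number of runs are hypothetical, added so that the demo terminates; a real fetcher would simply keep the scheduler running and pass FREQUENCY as the period.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PeriodicFetch {

    /** Runs job every periodMillis until it has run `runs` times; returns how often it ran. */
    public static int runPeriodically(final Runnable job, long periodMillis, int runs)
            throws InterruptedException {
        final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        final CountDownLatch remaining = new CountDownLatch(runs);
        final AtomicInteger completed = new AtomicInteger();

        scheduler.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                if (remaining.getCount() == 0) {
                    return; // enough runs already; skip until the scheduler is shut down
                }
                job.run(); // the "vanilla" job: does its work once and returns
                completed.incrementAndGet();
                remaining.countDown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        remaining.await();       // block until the requested number of runs has happened
        scheduler.shutdownNow(); // stop scheduling further runs
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int runs = runPeriodically(new Runnable() {
            @Override
            public void run() {
                System.out.println("Looking for new tasks to fetch...");
            }
        }, 100, 3);
        System.out.println("runs: " + runs); // prints "runs: 3"
    }
}
```

The job itself contains no sleeping or looping; the period lives entirely in the scheduleAtFixedRate call, which is what makes the job reusable elsewhere.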

Task scheduling has first-class support in Java; don't reinvent it. In fact, there are two implementations: Timer (old-school) and ScheduledExecutorService (new). Read up on them and design your app around them.

Try executing the task on a different thread.

You need some kind of loop to repeat your workflow. How shall the control flow get back to the fetching part?

You can put the code inside a loop (for example, a while loop):
while(condition) // you can make it while(true) if you want it to run infinitely.
{
for(Task t: tasks)
{
logger.info(" Task " + t.getId() + " is being fetched ");
// processing ... fetching task info from db using some methods
}
Thread.sleep(FREQUENCY);
}
What's happening in your case is that the thread runs the task loop, sleeps for some time, and then exits.

Put the thread in a loop as others have mentioned here.
I would like to add that calling Thread.start more than once is illegal and that is why you get an exception.
If you would like to spawn multiple threads, create one Thread object per thread you want to start.
See http://docs.oracle.com/javase/6/docs/api/java/lang/Thread.html#start()
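A short sketch of the point about Thread.start: the class name StartTwice and the helper secondStartFails are made up for this illustration. A second call to start() on the same Thread object throws IllegalThreadStateException, even after the first run has finished.

```java
public class StartTwice {

    /** Returns true if calling start() a second time on the same Thread fails. */
    public static boolean secondStartFails() throws InterruptedException {
        Thread t = new Thread(new Runnable() {
            @Override
            public void run() {
                // nothing to do; the thread terminates immediately
            }
        });
        t.start();
        t.join(); // wait until the first run has finished
        try {
            t.start(); // illegal: a Thread object can be started only once
            return false;
        } catch (IllegalThreadStateException e) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(secondStartFails()); // prints true
    }
}
```

The exception carries no message here, which would be consistent with the "FetcherThread interrupted: null" log line in the question.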

public void run()
{
while (keepRunning) {
try
{
logger.info("Looking for new tasks to fetch... ");
// definitions ..
for(Task t: tasks)
{
logger.info(" Task " + t.getId() + " is being fetched ");
// processing ... fetching task info from db using some methods
t.start();
}
Thread.sleep(FREQUENCY);
} catch (Exception e) {
keepRunning = false;
logger.info("FetcherThread interrupted: "+e.getMessage());
}
}
}
Add a member called keepRunning (ideally declared volatile, so that the change is visible across threads) to your thread class and implement an accessor method for setting it to false (from wherever you need to stop the thread from executing the tasks).
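A minimal, self-contained sketch of such a stop flag is shown here. The class StoppableWorker, its stop() method and the pass counter are hypothetical names; the 10 ms sleep stands in for Thread.sleep(FREQUENCY), and the counter stands in for the real fetch work.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StoppableWorker implements Runnable {
    // volatile so that a write from the stopping thread is visible to the worker
    private volatile boolean keepRunning = true;
    private final AtomicInteger passes = new AtomicInteger();

    /** Asks the worker to finish after its current pass. */
    public void stop() {
        keepRunning = false;
    }

    public int getPasses() {
        return passes.get();
    }

    @Override
    public void run() {
        while (keepRunning) {
            passes.incrementAndGet(); // placeholder for fetching and processing tasks
            try {
                Thread.sleep(10); // stands in for Thread.sleep(FREQUENCY)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and exit
                return;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker);
        thread.start();
        Thread.sleep(100); // let it do a few passes
        worker.stop();
        thread.join();
        System.out.println("passes: " + worker.getPasses() + ", alive: " + thread.isAlive());
    }
}
```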

You need to put the sleep in an infinite loop (or within some condition specifying how long you want to keep sleeping). As it stands, the sleep method is invoked at the end of the run method, so the behavior you observe is correct.
The following demo code will print "Sleep" on the console after sleeping for a second. Hope it helps.
import java.util.concurrent.TimeUnit;
public class Test implements Runnable {
/**
* @param args
*/
public static void main(String[] args) {
Test t = new Test();
Thread thread = new Thread(t);
thread.start();
}
public void run() {
try {
// logger.info("Looking for new tasks to fetch... ");
// definitions ..
// for(Task t: tasks)
// {
// logger.info(" Task " + t.getId() + " is being fetched ");
// // processing ... fetching task info from db using some methods
// }
while (true) { // your condition here
TimeUnit.SECONDS.sleep(1);
System.out.println("Sleep");
}
// t.start();
} catch (Exception e) {
// logger.info("FetcherThread interrupted: "+e.getMessage());
}
}
}

You could try ScheduledExecutorService (Javadoc).
Use its scheduleAtFixedRate method, which:
Creates and executes a periodic action that becomes enabled first after the given initial delay, and subsequently with the given period; that is executions will commence after initialDelay then initialDelay+period, then initialDelay + 2 * period, and so on.

AtomicReference to a mutable object and visibility

Say I have an AtomicReference to a list of objects:
AtomicReference<List<Object>> batch = new AtomicReference<List<Object>>(new ArrayList<Object>());
Thread A adds elements to this list: batch.get().add(o);
Later, thread B takes the list and, for example, stores it in a DB: insertBatch(batch.get());
Do I have to do additional synchronization when writing (Thread A) and reading (Thread B) to ensure thread B sees the list the way A left it, or is this taken care of by the AtomicReference?
In other words: if I have an AtomicReference to a mutable object, and one thread changes that object, do other threads see this change immediately?
Edit:
Maybe some example code is in order:
public void process(Reader in) throws IOException {
List<Future<AtomicReference<List<Object>>>> tasks = new ArrayList<Future<AtomicReference<List<Object>>>>();
ExecutorService exec = Executors.newFixedThreadPool(4);
for (int i = 0; i < 4; ++i) {
tasks.add(exec.submit(new Callable<AtomicReference<List<Object>>>() {
@Override public AtomicReference<List<Object>> call() throws IOException {
final AtomicReference<List<Object>> batch = new AtomicReference<List<Object>>(new ArrayList<Object>(batchSize));
Processor.this.parser.parse(in, new Parser.Handler() {
@Override public void onNewObject(Object event) {
batch.get().add(event);
if (batch.get().size() >= batchSize) {
dao.insertBatch(batch.getAndSet(new ArrayList<Object>(batchSize)));
}
}
});
return batch;
}
}));
}
List<Object> remainingBatches = new ArrayList<Object>();
for (Future<AtomicReference<List<Object>>> task : tasks) {
try {
AtomicReference<List<Object>> remainingBatch = task.get();
remainingBatches.addAll(remainingBatch.get());
} catch (ExecutionException e) {
Throwable cause = e.getCause();
if (cause instanceof IOException) {
throw (IOException)cause;
}
throw (RuntimeException)cause;
}
}
// these haven't been flushed yet by the worker threads
if (!remainingBatches.isEmpty()) {
dao.insertBatch(remainingBatches);
}
}
What happens here is that I create four worker threads to parse some text (this is the Reader in parameter to the process() method). Each worker saves the lines it has parsed in a batch, and flushes the batch when it is full (dao.insertBatch(batch.getAndSet(new ArrayList<Object>(batchSize)));).
Since the number of lines in the text isn't a multiple of the batch size, the last objects end up in a batch that isn't flushed, since it's not full. These remaining batches are therefore inserted by the main thread.
I use AtomicReference.getAndSet() to replace the full batch with an empty one. Is this program correct with regard to threading?

Um... it doesn't really work like this. AtomicReference guarantees that the reference itself is visible across threads i.e. if you assign it a different reference than the original one the update will be visible. It makes no guarantees about the actual contents of the object that reference is pointing to.
Therefore, read/write operations on the list contents require separate synchronization.
Edit: So, judging from your updated code and the comment you posted, setting the local reference to volatile is sufficient to ensure visibility.

I think that, forgetting all the code here, your exact question is this:
Do I have to do additional synchronization when writing (Thread A) and
reading (Thread B) to ensure thread B sees the list the way A left it,
or is this taken care of by the AtomicReference?
So, the exact response to that is: YES, atomics take care of visibility. That is not just my opinion but what the JDK documentation says:
The memory effects for accesses and updates of atomics generally follow the rules for volatiles, as stated in The Java Language Specification, Third Edition (17.4 Memory Model).
I hope this helps.

Adding to Tudor's answer: You will have to make the ArrayList itself threadsafe or - depending on your requirements - even larger code blocks.
If you can get away with a threadsafe ArrayList you can "decorate" it like this:
batch = java.util.Collections.synchronizedList(new ArrayList<Object>());
But keep in mind: Even "simple" constructs like this are not threadsafe with this:
Object o = batch.get(batch.size()-1);
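The reason is that size() and get() are two separate calls, and another thread may modify the list between them. One way to make such a compound operation safe, sketched here with a hypothetical helper class LastElement, is to hold the synchronized list's own lock across both calls, as the Collections.synchronizedList documentation requires for iteration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LastElement {

    /**
     * Returns the last element of a list created by Collections.synchronizedList,
     * or null if it is empty. The lock is held across isEmpty(), size() and get(),
     * so no other thread can change the list between those calls.
     */
    public static <T> T last(List<T> syncList) {
        synchronized (syncList) {
            return syncList.isEmpty() ? null : syncList.get(syncList.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<String> batch = Collections.synchronizedList(new ArrayList<String>());
        batch.add("a");
        batch.add("b");
        System.out.println(last(batch)); // prints b
    }
}
```

This only works if every thread goes through the synchronized wrapper; code that still holds a reference to the underlying ArrayList bypasses the lock.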

The AtomicReference will only help you with the reference to the list, it will not do anything to the list itself. More particularly, in your scenario, you will almost certainly run into problems when the system is under load where the consumer has taken the list while the producer is adding an item to it.
This sounds to me like you should be using a BlockingQueue. You can then limit the memory footprint if your producer is faster than your consumer and let the queue handle all contention.
Something like:
ArrayBlockingQueue<Object> queue = new ArrayBlockingQueue<Object> (50);
// ... Producer
queue.put(o);
// ... Consumer
List<Object> queueContents = new ArrayList<Object> ();
// Grab everything waiting in the queue in one chunk. Should never be more than 50 items.
queue.drainTo(queueContents);
Added
Thanks to @Tudor for pointing out the architecture you are using. ... I have to admit it is rather strange. You don't really need AtomicReference at all as far as I can see. Each thread owns its own ArrayList until it is passed on to dao, at which point it is replaced, so there is no contention at all anywhere.
I am a little concerned about you creating four parsers on a single Reader. I hope you have some way of ensuring each parser does not affect the others.
I personally would use some form of producer-consumer pattern as I have described in the code above. Something like this perhaps.
static final int PROCESSES = 4;
static final int batchSize = 10;
public void process(Reader in) throws IOException, InterruptedException {
final List<Future<Void>> tasks = new ArrayList<Future<Void>>();
ExecutorService exec = Executors.newFixedThreadPool(PROCESSES);
// Queue of objects.
final ArrayBlockingQueue<Object> queue = new ArrayBlockingQueue<Object> (batchSize * 2);
// The final object to post.
final Object FINISHED = new Object();
// Start the producers.
for (int i = 0; i < PROCESSES; i++) {
tasks.add(exec.submit(new Callable<Void>() {
@Override
public Void call() throws IOException {
Processor.this.parser.parse(in, new Parser.Handler() {
@Override
public void onNewObject(Object event) {
queue.add(event);
}
});
// Post a finished down the queue.
queue.add(FINISHED);
return null;
}
}));
}
// Start the consumer.
tasks.add(exec.submit(new Callable<Void>() {
@Override
public Void call() throws IOException, InterruptedException {
List<Object> batch = new ArrayList<Object>(batchSize);
int finishedCount = 0;
// Until all threads finished.
while ( finishedCount < PROCESSES ) {
Object o = queue.take();
if ( o != FINISHED ) {
// Batch them up.
batch.add(o);
if ( batch.size() >= batchSize ) {
dao.insertBatch(batch);
// If insertBatch takes a copy we could merely clear it.
batch = new ArrayList<Object>(batchSize);
}
} else {
// Count the finishes.
finishedCount += 1;
}
}
// Finished! Post any incomplete batch.
if ( batch.size() > 0 ) {
dao.insertBatch(batch);
}
return null;
}
}));
// Wait for everything to finish.
exec.shutdown();
// Wait until all is done.
boolean finished = false;
do {
try {
// Wait up to 1 second for termination.
finished = exec.awaitTermination(1, TimeUnit.SECONDS);
} catch (InterruptedException ex) {
}
} while (!finished);
}

On FutureTask, finally and TimeoutExceptions in Java

I'm trying to understand how to ensure that a specific action completes in a certain amount of time. Seems like a simple job for java's new util.concurrent library. However, this task claims a connection to the database and I want to be sure that it properly releases that connection upon timeout.
So, to call the service:
int resultCount = -1;
ExecutorService executor = null;
try {
executor = Executors.newSingleThreadExecutor();
FutureTask<Integer> task = new CopyTask<Integer>();
executor.execute(task);
try {
resultCount = task.get(2, TimeUnit.MINUTES);
} catch (Exception e) {
LOGGER.fatal("Migrate Events job crashed.", e);
task.cancel(true);
return;
}
} finally {
if (executor != null) {
executor.shutdown();
}
}
The task itself simply wraps a Callable; here is the call method:
@Override
public Integer call() throws Exception {
Session session = null;
try {
session = getSession();
... execute sql against sesssion ...
} finally {
if (session != null) {
session.release();
}
}
}
So, my question for those who've made it this far is: is session.release() guaranteed to be called in the case that the task fails due to a TimeoutException? I suspect that it is not, but I would love to be proven wrong.
Thanks
edit: The problem I'm having is that occasionally the SQL in question does not finish due to weird DB problems. So what I want to do is simply close the connection, let the DB roll back the transaction, get some rest and reattempt this at a later time. So I'm treating the get(...) as if it were like killing the thread. Is that wrong?

When you call task.get() with a timeout, that timeout only applies to the attempt to obtain the results (in your current thread), not the calculation itself (in the worker thread). Hence your problem here; if a worker thread gets into some state from which it will never return, then the timeout simply ensures that your polling code will keep running but will do nothing to affect the worker.
Your call to task.cancel(true) in the catch block is what I was initially going to suggest, and this is good coding practice. Unfortunately this only sets a flag on the thread that may/should be checked by well-behaved long-running, cancellable tasks, but it doesn't take any direct action on the other thread. If the SQL executing methods don't declare that they throw InterruptedException, then they aren't going to check this flag and aren't going to be interruptable via the typical Java mechanism.
Really all of this comes down to the fact that the code in the worker thread must support some mechanism of stopping itself if it's run for too long. Supporting the standard interrupt mechanism is one way of doing this; checking some boolean flag intermittently, or other bespoke alternatives, would work too. However there is no guaranteed way to cause another thread to return (short of Thread.stop, which is deprecated for good reason). You need to coordinate with the running code to signal it to stop in a way that it will notice.
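As a rough sketch of that cooperation, the task below polls its thread's interrupt flag between units of work, so that task.cancel(true) actually makes it stop and run its finally block. The class CooperativeCancel and the latches are hypothetical; the countDown in the finally block merely stands in for session.release().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CooperativeCancel {

    /** Returns true if the task's cleanup ran after it was cancelled on timeout. */
    public static boolean cancelStopsTask() throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        final CountDownLatch started = new CountDownLatch(1);
        final CountDownLatch cleanedUp = new CountDownLatch(1);

        Future<?> task = executor.submit(new Runnable() {
            @Override
            public void run() {
                try {
                    started.countDown();
                    // Check the interrupt flag between small units of work.
                    while (!Thread.currentThread().isInterrupted()) {
                        Thread.yield(); // placeholder for one unit of work
                    }
                } finally {
                    cleanedUp.countDown(); // stands in for session.release()
                }
            }
        });

        started.await();
        try {
            task.get(100, TimeUnit.MILLISECONDS); // times out: the task never finishes by itself
        } catch (TimeoutException e) {
            task.cancel(true); // sets the interrupt flag that the loop is polling
        }
        boolean released = cleanedUp.await(2, TimeUnit.SECONDS);
        executor.shutdownNow();
        return released;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(cancelStopsTask()); // prints true
    }
}
```

A task blocked inside a call that ignores interruption would not reach the flag check, which is why a timeout on the blocking call itself is still needed in that situation.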
In this particular case, I expect there are probably some parameters you could set on the DB connection so that the SQL calls will time out after a given period, meaning that control returns to your Java code (probably with some exception) and so the finally block gets called. If not, i.e. there's no way to make the database call (such as PreparedStatement.execute()) return control after some predetermined time, then you'll need to spawn an extra thread within your Callable that can monitor a timeout and forcibly close the connection/session if it expires. This isn't very nice though and your code will be a lot cleaner if you can get the SQL calls to cooperate.
(So ironically despite you supplying a good amount of code to support this question, the really important part is the bit you redacted: "... execute sql against sesssion ..." :-))

You cannot interrupt a thread from the outside, so the timeout will have no effect on the code down in the JDBC layer (perhaps even over in JNI-land somewhere.) Presumably eventually the SQL work will end and the session.release() will happen, but that may be long after the end of your timeout.

The finally block will eventually execute.
When your task takes longer than 2 minutes, a TimeoutException is thrown, but the actual thread continues to perform its work and will eventually run the finally block. Even if you cancel the task and force an interrupt, the finally block will be called.
Here's a small example based on your code. You can test these situations:
public static void main(String[] args) {
int resultCount = -1;
ExecutorService executor = null;
try {
executor = Executors.newSingleThreadExecutor();
FutureTask<Integer> task = new FutureTask<Integer>(new Callable<Integer>() {
@Override
public Integer call() throws Exception {
try {
Thread.sleep(10000);
return 1;
} finally {
System.out.println("FINALLY CALLED!!!");
}
}
});
executor.execute(task);
try {
resultCount = task.get(1000, TimeUnit.MILLISECONDS);
} catch (Exception e) {
System.out.println("Migrate Events job crashed: " + e.getMessage());
task.cancel(true);
return;
}
} finally {
if (executor != null) {
executor.shutdown();
}
}
}

Your example says:
copyRecords.cancel(true);
I assume this was meant to say:
task.cancel(true);
Your finally block will be called assuming that the contents of the try block are interruptible. Some operations are (like wait()), some operations are not (like InputStream#read()). It all depends on the operation that the code is blocking on when the task is interrupted.
