Appropriate way to batch method invocations?


Suppose something like the following:
public boolean doThisThing(SomeArg arg) {
    if (iAmAllowedToDoIt()) {
        doThing(arg);
        return true;
    } else {
        return false;
    }
}
Suppose that iAmAllowedToDoIt() is a very expensive method, that doThisThing() is invoked by many threads concurrently, and that if I am allowed to do anything, I am allowed to do everything. Is there a way to batch invocations of iAmAllowedToDoIt(), so that I accumulate SomeArgs in a concurrent data structure and invoke doThing() on all of them at once after resolving iAmAllowedToDoIt() only one time, without modifying the API? What would that code look like? I can't figure out how to do performant multithreaded batching like this without modifying the API. An ideal answer would not rely on blocking for a fixed period of time to accumulate invocations of doThisThing().
Ideally it would end up as something like:
1. Call doThisThing().
2. Call iAmAllowedToDoIt() asynchronously.
3. All calls to doThisThing() made before (2) returns block until (2) returns.
4. (2) returns; if it returned true, invoke doThing() for all the blocked doThisThing() calls.
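One way to get the four steps above without changing doThisThing()'s signature is sketched below. It assumes the expensive check and the per-argument action can be handed to the class as a `BooleanSupplier` and a `Consumer`; the class `BatchingDoer` and its constructor are hypothetical. The first caller that finds no batch open becomes the leader and runs the check; every caller that arrives while the check is running adds its argument to the same batch and blocks on a `CompletableFuture` until the leader has run doThing for the whole batch. No fixed waiting period is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

class BatchingDoer<A> {
    // One batch = one expensive check plus every argument that arrived while it ran.
    private static final class Batch<T> {
        final List<T> args = new ArrayList<>();
        final CompletableFuture<Boolean> allowed = new CompletableFuture<>();
    }

    private final Object lock = new Object();
    private final BooleanSupplier iAmAllowedToDoIt; // hypothetical stand-in for the expensive check
    private final Consumer<A> doThing;              // hypothetical stand-in for doThing(arg)
    private Batch<A> current;                       // guarded by lock; null when no check is running

    BatchingDoer(BooleanSupplier iAmAllowedToDoIt, Consumer<A> doThing) {
        this.iAmAllowedToDoIt = iAmAllowedToDoIt;
        this.doThing = doThing;
    }

    public boolean doThisThing(A arg) {
        Batch<A> batch;
        boolean leader = false;
        synchronized (lock) {
            if (current == null) {       // no check running: this caller starts one (step 2)
                current = new Batch<>();
                leader = true;
            }
            batch = current;
            batch.args.add(arg);         // join the open batch (step 3)
        }
        if (!leader) {
            return batch.allowed.join(); // block until the leader publishes the result
        }
        boolean ok;
        try {
            ok = iAmAllowedToDoIt.getAsBoolean(); // the single expensive call for this batch
        } catch (RuntimeException | Error e) {
            closeBatch();
            batch.allowed.completeExceptionally(e); // waiters fail with the same exception
            throw e;
        }
        closeBatch();                    // no more arguments can join; later callers start a new batch
        try {
            if (ok) {
                batch.args.forEach(doThing); // step 4: run doThing for every queued argument
            }
        } finally {
            batch.allowed.complete(ok);  // release all blocked callers
        }
        return ok;
    }

    private void closeBatch() {
        synchronized (lock) {
            current = null;
        }
    }
}
```

The batch stays open for exactly as long as the check takes, so bursts of calls share one check while a lone call simply pays the full cost.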

Your containing object could have an AtomicReference that holds a CompletableFuture for the computation of iAmAllowedToDoIt(). Additional invocations of doThisThing() simply await the completion of the completable future if one is present, or create a new one otherwise, with an appropriate CAS loop to avoid creating more than one instance at a time.
Upon completion the reference is set to null again so that threads invoking the method at a later point can start a new computation.
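A minimal sketch of this AtomicReference/CAS idea, assuming the expensive check can be wrapped in a `BooleanSupplier` (the class `SharedPermissionCheck` and its method `isAllowed()` are hypothetical names). doThisThing() would call isAllowed() in place of iAmAllowedToDoIt().

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

class SharedPermissionCheck {
    private final AtomicReference<CompletableFuture<Boolean>> inFlight = new AtomicReference<>();
    private final BooleanSupplier expensiveCheck; // hypothetical stand-in for iAmAllowedToDoIt()

    SharedPermissionCheck(BooleanSupplier expensiveCheck) {
        this.expensiveCheck = expensiveCheck;
    }

    boolean isAllowed() {
        while (true) {
            CompletableFuture<Boolean> existing = inFlight.get();
            if (existing != null) {
                return existing.join();               // a check is running: wait for its answer
            }
            CompletableFuture<Boolean> mine = new CompletableFuture<>();
            if (inFlight.compareAndSet(null, mine)) { // CAS: only one thread installs a future
                try {
                    mine.complete(expensiveCheck.getAsBoolean());
                } catch (RuntimeException e) {
                    mine.completeExceptionally(e);    // waiters see the same failure
                } finally {
                    inFlight.compareAndSet(mine, null); // reset so a later call starts a fresh check
                }
                return mine.join();
            }
            // Lost the CAS race: loop again and wait on the winner's future instead.
        }
    }
}
```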

You could do the following (which implements an algorithm similar to the one proposed by @the8472):
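import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class Test {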
/**
* Lock used to guard accesses to allowedFuture
*/
private final Object lock = new Object();
/**
* The future result being computed, reset to null as soon as the result is known
*/
private FutureTask<Boolean> allowedFuture = null;
private static final Random RANDOM = new Random();
public boolean doThisThing() throws ExecutionException, InterruptedException {
if (iAmAllowedToDoIt()) {
System.out.println("doing it...");
return true;
}
else {
System.out.println("not doing it...");
return false;
}
}
private boolean iAmAllowedToDoIt() throws ExecutionException, InterruptedException {
// if true, this means that this thread is the one which must really compute if I am allowed
boolean mustCompute = false;
// The Future holding the result which is either the cached one, or a new one stored in the cache
FutureTask<Boolean> result;
synchronized (lock) {
// if no one has computed the result yet, or if it has been computed and thus must be recomputed
// then create it
if (this.allowedFuture == null) {
mustCompute = true;
this.allowedFuture = new FutureTask<>(new Callable<Boolean>() {
@Override
public Boolean call() throws Exception {
System.out.println("computing if I am allowed...");
Thread.sleep(RANDOM.nextInt(3000));
boolean allowed = RANDOM.nextBoolean();
System.out.println(allowed ? "allowed!" : "not allowed!");
return allowed;
}
});
}
result = this.allowedFuture;
}
if (mustCompute) {
result.run(); // run the computation in this thread; other threads block in result.get()
// reset the cache to null, so that the next thread recomputes the result
synchronized (lock) {
this.allowedFuture = null;
}
}
return result.get();
}
public static void main(String[] args) {
Test test = new Test();
Runnable r = new Runnable() {
@Override
public void run() {
try {
Thread.sleep(RANDOM.nextInt(6000));
test.doThisThing();
}
catch (ExecutionException | InterruptedException e) {
throw new RuntimeException(e);
}
}
};
for (int i = 0; i < 50; i++) {
Thread t = new Thread(r);
t.start();
}
}
}

Related

Execute two threads in parallel and restart the first one when it ends instead of waiting for both to finish

I have two methods in Java that I execute in parallel in a class scheduled with a fixed delay. The first thread takes a few minutes to complete, while the second one can take some hours. What I want is to restart the first thread as soon as it ends, instead of waiting for the second one to finish and then re-executing both of them.
Can anyone help me with this?
My code is below:
@Scheduled(fixedDelay = 30)
public void scheduled_function() throws IOException, InterruptedException {
Callable<Void> callableSchedule = new Callable<Void>()
{
@Override
public Void call() throws Exception
{
getAndUpdateSchedule();
return null;
}
};
Callable<Void> callableMatches = new Callable<Void>()
{
@Override
public Void call() throws Exception
{
processMatches();
return null;
}
};
//add to a list
List<Callable<Void>> taskList = new ArrayList<Callable<Void>>();
taskList.add(callableSchedule);
taskList.add(callableMatches);
//create a pool executor with threads
ExecutorService executor = Executors.newFixedThreadPool(2);
try
{
//start the threads
executor.invokeAll(taskList);
}
catch (InterruptedException ie)
{
System.out.println("An InterruptedException occurred");
}
finally
{
executor.shutdown(); // a new pool is created on every run, so release its threads
}
}

You can just store a boolean variable, let's call it isComplete, that stores whether the long task has completed or not. This will be an instance variable, since we need it to stay around after scheduled_function() returns. Something like this:
private boolean isComplete = false;
Now, right now this variable is meaningless because we never update it. So, we need to make sure to update this variable when the long task completes:
Callable<Void> callableMatches = new Callable<Void>()
{
@Override
public Void call() throws Exception
{
processMatches();
synchronized (MyClass.this) { // MyClass is just a placeholder name
isComplete = true;
}
return null;
}
};
Notice that where I update the isComplete variable, I put it in a synchronized block. Because the thread that reads the flag synchronizes on the same object, this guarantees that the value we write is actually visible to the other thread, and it prevents the other thread from reading while we're writing the value. The result is that the other thread always gets the updated value.
This bit is tangential to the answer, but we can actually shorten this piece of code significantly by using lambda syntax. Callable is a functional interface, so this is perfectly legal:
Callable<Void> callableMatches = () -> {
processMatches();
synchronized (MyClass.this) { // MyClass is just a placeholder name
isComplete = true;
}
return null;
};
Now all we have to do is check this variable every time we want to start the short task. Since we only have 2 threads, and one of the threads is being used for the long task, we know that this task will always be executed on the same thread. This means there's no point in going back to the executor, we can just put it in a while loop inside the callable. On every iteration of the while loop, we just need to check our isComplete variable, and we'll break out of the loop if the other task has completed.
Callable<Void> callableSchedule = () -> {
while (true) {
synchronized (MyClass.this) { // MyClass is just a placeholder name
if (isComplete) {
break;
}
}
getAndUpdateSchedule();
}
return null;
};
Note that in this example, I've used the lambda syntax and I've put the if statement inside another synchronized block. As I explained above, we don't want to get a stale value here and keep looping after the other task is complete.
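A self-contained sketch of the same pattern, with one swap: the flag is a `volatile` field instead of being guarded by synchronized blocks, which should be enough here because during a run only the long task sets it and the flag only needs to become visible, not be updated atomically. processMatches() and getAndUpdateSchedule() are hypothetical stand-ins with sleeps in place of the real work, and the flag is reset at the start of each run, assuming the method is still triggered repeatedly by the scheduler.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class RestartShortTask {
    // volatile makes the long task's write visible to the looping short task.
    private volatile boolean isComplete;
    final AtomicInteger shortRuns = new AtomicInteger();

    // Hypothetical stand-ins for the question's two methods.
    void processMatches() throws InterruptedException {
        Thread.sleep(200);                       // the long task
    }

    void getAndUpdateSchedule() throws InterruptedException {
        shortRuns.incrementAndGet();
        Thread.sleep(20);                        // the short task
    }

    void runOnce() throws InterruptedException, ExecutionException {
        isComplete = false;                      // reset so the next scheduled run loops again
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            Callable<Void> callableMatches = () -> {
                try {
                    processMatches();
                } finally {
                    isComplete = true;           // stop the short loop even if the long task fails
                }
                return null;
            };
            Callable<Void> callableSchedule = () -> {
                while (!isComplete) {            // re-run the short task until the long one ends
                    getAndUpdateSchedule();
                }
                return null;
            };
            for (Future<Void> f : executor.invokeAll(Arrays.asList(callableSchedule, callableMatches))) {
                f.get();                         // rethrows any exception from either task
            }
        } finally {
            executor.shutdown();
        }
    }
}
```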

How to let other threads to continue on locking elements inside for loop java

Let's say I have below code
public class ContinueIfCannotLock implements Runnable
{
static List<LockingObject> lockObjects = new ArrayList<>();
@Override
public void run()
{
for(LockingObject obj : lockObjects)
{
synchronized ( obj )
{
// do things here
}
}
}
}
and LockingObject is just an empty class. Also, let's assume that before these threads start we have 100 objects in the lockObjects list.
So how can I let a thread continue to the next object in the list if it cannot acquire the lock on the current element, so that no thread waits inside the loop (at least until every object has been locked by some thread)?

Try using Thread.holdsLock(Object obj), which according to the Thread API documentation (Java Platform SE 8):
Returns true if and only if the current thread holds the monitor lock on the specified object.
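public class ContinueIfCannotLock implements Runnable {
    static List<LockingObject> lockObjects = new ArrayList<>();

    @Override
    public void run() {
        for (LockingObject obj : lockObjects) {
            if (Thread.holdsLock(obj)) {
                continue; // skips obj only if THIS thread already holds its monitor
            }
            synchronized (obj) {
                // do things here
            }
        }
    }
}
Be aware that, as the quoted documentation says, holdsLock() reports only on the current thread: it returns false when another thread holds the monitor, so this check does not skip elements that other threads have locked. For that, a non-blocking tryLock(), as in the next answer, is needed.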
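The following small demonstration (class and method names are made up for it) shows why holdsLock() is not the right check for this question: while a second thread holds both a monitor and a `ReentrantLock`, `Thread.holdsLock()` in the main thread returns false, whereas `tryLock()` correctly reports that the lock is taken.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class HoldsLockDemo {
    /** Returns {holdsLock result, tryLock result} as seen by the caller while another thread owns both locks. */
    static boolean[] probe() throws InterruptedException {
        Object monitor = new Object();
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch locked = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);
        Thread owner = new Thread(() -> {
            synchronized (monitor) {
                lock.lock();
                try {
                    locked.countDown();           // both locks are now held by this thread
                    done.await();                 // keep holding them until the probe is finished
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    lock.unlock();
                }
            }
        });
        owner.start();
        locked.await();
        boolean holds = Thread.holdsLock(monitor); // false: it asks only about the current thread
        boolean acquired = lock.tryLock();          // false: the lock is owned by another thread
        if (acquired) {
            lock.unlock();
        }
        done.countDown();
        owner.join();
        return new boolean[] {holds, acquired};
    }
}
```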

You may use locks instead:
static List<ReentrantLock> lockObjects;

public static void init() {
    lockObjects = new ArrayList<>(100);
    for (int i = 0; i < 100; i++) {
        lockObjects.add(new ReentrantLock());
    }
}

@Override
public void run() {
    for (ReentrantLock lock : lockObjects) {
        if (lock.tryLock()) { // returns false immediately instead of blocking
            try {
                // do stuff
            } finally {
                lock.unlock();
            }
            // break if you only want the thread to work once
            break;
        }
    }
}
If your only goal with this was to have a maximum of 100 threads working at the same time, you could also use a Semaphore, which is a lock that lets up to a specified number of threads hold it at once.
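A sketch of the Semaphore variant, assuming a hypothetical limit of three concurrent workers instead of 100 so the effect is easy to observe. acquire() blocks when no permit is free; Semaphore also has a non-blocking tryAcquire(), which returns false instead, if you want the thread to move on as in the question.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedWorkers {
    static final int MAX_CONCURRENT = 3;          // hypothetical limit for the demo
    private final Semaphore permits = new Semaphore(MAX_CONCURRENT);
    private final AtomicInteger inside = new AtomicInteger();
    final AtomicInteger maxObserved = new AtomicInteger();

    void work() throws InterruptedException {
        permits.acquire();                        // blocks while MAX_CONCURRENT threads are inside
        try {
            int now = inside.incrementAndGet();
            maxObserved.accumulateAndGet(now, Math::max); // record the peak concurrency
            Thread.sleep(10);                     // simulated work
        } finally {
            inside.decrementAndGet();
            permits.release();                    // let the next waiting thread in
        }
    }
}
```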

Eliminating excess synchronization and improving error handling when updating a shared variable

I have a shared object that caches the results of database queries whose interface is "get cached results" and "invalidate cached results." It is acceptable to return slightly stale data.
My current solution is pasted at the bottom of this question. Each cache's get and clear method is accessible via a public method in CacheService. Within Cache, lastUpdated contains the most recent query results; isValid indicates whether the results should be updated; updateGuard is used to ensure that only one thread updates the results; and updateWait lets threads wait for another thread to update the results. To ensure progress and because it is acceptable to return slightly stale data, after lastUpdated is updated I immediately return its results from the updating threads and all threads waiting on the update - I do not check to see if isValid has been set to false again.
Major concern: if lastUpdated = getUpdate() throws an exception (likely the result of a network failure when trying to talk to the database) then presently I'm simply returning lastUpdated - it is acceptable to return slightly stale data, but repeated transient faults during getUpdate() could result in extremely stale data. I want to include some logic along the lines of
final int maxRetries = 5;
...
try {
updateWait.drainPermits();
int retryCount = 0;
while(true) {
try {
lastUpdated = getUpdate();
break;
} catch(Exception e) {
retryCount++;
if(retryCount == maxRetries) {
throw Exception e in all threads waiting on semaphore
}
}
}
isValid = true;
}
However I'm not sure of a good way to implement "throw Exception e in all threads waiting on semaphore" or if there's a better alternative. One option I've considered is to use a Scala Try, i.e. Try<ImmutableList<T>> lastUpdated, but I'm trying not to mix Scala and Java objects where possible in order to make code maintenance easier.
Less Major Concern: Right now I've got three synchronization variables (isValid, updateGuard, updateWait) which seems excessive - I'm looking for a way to safely eliminate one or two of these.
public class CacheService {
private final Cache<Foo> fooCache;
private final Cache<Bar> barCache;
// and so on
private abstract class Cache<T> {
private final AtomicBoolean updateGuard = new AtomicBoolean(false);
private final Semaphore updateWait = new Semaphore(Integer.MAX_VALUE);
private volatile boolean isValid = true;
private volatile ImmutableList<T> lastUpdated = getUpdate();
protected abstract ImmutableList<T> getUpdate();
public void clear() {
isValid = false;
}
public ImmutableList<T> get() {
if(isValid) {
return lastUpdated;
} else {
if(updateGuard.compareAndSet(false, true)) {
try {
updateWait.drainPermits();
lastUpdated = getUpdate();
isValid = true;
} finally {
updateGuard.set(false);
updateWait.release(Integer.MAX_VALUE);
}
} else {
while(updateGuard.get()) {
try {
updateWait.acquire();
} catch(InterruptedException e) {
break;
}
}
}
return lastUpdated;
}
}
}
public CacheService() {
fooCache = new Cache<Foo>() {
@Override
protected ImmutableList<Foo> getUpdate() {
return // database query
}
};
// Likewise when initializing barCache etc
}
}

One way to do this is with a CompletableFuture and its completeExceptionally() method:
private abstract static class Cache<T> {
private final AtomicReference<CompletableFuture<ImmutableList<T>>> value =
new AtomicReference<>();
private static final int MAX_TRIES = 5;
protected abstract ImmutableList<T> getUpdate();
public void clear() {
value.getAndUpdate(f -> f != null && f.isDone() ? null : f);
// or value.set(null); if you want the cache to be invalidated while it is being updated.
}
public ImmutableList<T> get() {
CompletableFuture<ImmutableList<T>> f = value.get();
if (f != null) {
try {
return f.get();
} catch (InterruptedException | ExecutionException e) {
throw new RuntimeException(e);
}
}
f = new CompletableFuture<>();
if (!value.compareAndSet(null, f)) {
return get();
}
for(int tries = 0; ; ){
try {
ImmutableList<T> update = getUpdate();
f.complete(update);
return update;
} catch (Exception e){
if(++tries == MAX_TRIES){
f.completeExceptionally(e);
throw new RuntimeException(e);
}
}
}
}
}
You may want to handle the exceptions differently, and you will need to clear it after an exception is thrown if you want to try to get the update again.
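To show how completeExceptionally() covers the question's "throw Exception e in all threads waiting on semaphore" step, here is a small sketch (the class name and the IllegalStateException are illustrative only): several threads block on the same CompletableFuture, and a single completeExceptionally() call makes every one of them fail with the same exception, which get() wraps in an ExecutionException.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

class SharedFailure {
    /** Starts n threads waiting on one update, fails it once, and returns what each waiter caught. */
    static List<Throwable> failAllWaiters(int n) throws InterruptedException {
        CompletableFuture<String> update = new CompletableFuture<>();
        List<Throwable> caught = Collections.synchronizedList(new ArrayList<>());
        List<Thread> waiters = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    update.get();                 // waits, like the threads blocked on updateWait
                } catch (ExecutionException e) {
                    caught.add(e.getCause());     // the exception passed to completeExceptionally
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            waiters.add(t);
            t.start();
        }
        // The updating thread has given up retrying: one call fails every current and later waiter.
        update.completeExceptionally(new IllegalStateException("database unreachable"));
        for (Thread t : waiters) {
            t.join();
        }
        return caught;
    }
}
```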

Your implementation has a problem: a thread that reads isValid == false just before another thread finishes updating can still win the updateGuard CAS afterwards and execute getUpdate() again. So, once you have the lock, you need to recheck isValid.
I am not an expert on the Semaphore class, but I think it should be feasible to combine updateGuard and updateWait.
Here is just the stripped down version of your get method body:
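// updateWait is assumed to be created as new Semaphore(1), so it acts as the update lock;
// get() must declare or handle the InterruptedException thrown by acquire().
while (!isValid) {
    if (updateWait.tryAcquire()) {
        try {
            if (!isValid) {               // recheck: another thread may have just updated
                lastUpdated = getUpdate();
                isValid = true;
            }
        } finally {
            updateWait.release();         // release even if getUpdate() throws
        }
    } else {
        updateWait.acquire();             // wait for the updating thread to finish
        updateWait.release();             // hand the permit back, then re-check isValid
    }
}
return lastUpdated;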
This should have all the semantics from your code, plus rechecking isValid.
Exceptions: In the Java caching library cache2k we implemented exception caching. I wrote a blog entry on this, see: About caching exception. This may address some of your issues.
Bottom line, this is my summary:
- Fail fast, and always propagate an exception if you cannot do anything useful.
- Fail fast means no retries at all, so that blocked resources are freed as soon as possible. The user will retry in any case: on failure, or when the waiting time gets too long.
- When you propagate the exception, don't additionally log it as a warning.
- If you rethrow one exception from a data source to multiple consumers, make it explicit that these exceptions are duplicates.
- If you return outdated data because the most recent request threw an exception, make sure you have a warning mechanism. In cache2k we will probably implement two metrics: how many seconds the data is overdue and how many entries are affected.

Producer consumer in batches; second batch shouldn't come until the previous batch is complete

I'm trying to implement a mechanism where the runnables are both producer and consumer.
The situation is this:
I need to read records from the DB in batches and process them. I'm trying to do this with the producer-consumer pattern: I get a batch, I process it; get a batch, process it. A new batch is fetched whenever the queue is seen to be empty, and one of the threads goes and fetches it. The problem is that I can't mark the records that get fetched for processing, and that's my limitation. So if we fetch the next batch before the previous one has been entirely committed, I might fetch the same records again. Therefore, I need to be able to submit the previous batch entirely before pulling the next one. I'm confused about what I should do here. I've tried keeping a count of the fetched records and then holding my get() until that count is reached.
What's the best way of handling this situation? I'm processing records from the DB in chunks, and the biggest limitation I have is that I can't mark the records that have been picked up. So I want batches to go through sequentially, but each batch should use multithreading internally.
public class DealStoreEnricher extends AsyncExecutionSupport {
private static final int BATCH_SIZE = 5000;
private static final Log log = LogFactory.getLog(DealStoreEnricher.class);
private final DealEnricher dealEnricher;
private int concurrency = 10;
private final BlockingQueue<QueryDealRecord> dealsToBeEnrichedQueue;
private final BlockingQueue<QueryDealRecord> dealsEnrichedQueue;
private DealStore dealStore;
private ExtractorProcess extractorProcess;
ExecutorService executor;
public DealStoreEnricher(DealEnricher dealEnricher, DealStore dealStore, ExtractorProcess extractorProcess) {
this.dealEnricher = dealEnricher;
this.dealStore = dealStore;
this.extractorProcess = extractorProcess;
dealsToBeEnrichedQueue = new LinkedBlockingQueue<QueryDealRecord>();
dealsEnrichedQueue = new LinkedBlockingQueue<QueryDealRecord>(BATCH_SIZE * 3);
}
public ExtractorProcess getExtractorProcess() {
return extractorProcess;
}
public DealEnricher getDealEnricher() {
return dealEnricher;
}
public int getConcurrency() {
return concurrency;
}
public void setConcurrency(int concurrency) {
this.concurrency = concurrency;
}
public DealStore getDealStore() {
return dealStore;
}
public DealStoreEnricher withConcurrency(int concurrency) {
setConcurrency(concurrency);
return this;
}
@Override
public void start() {
super.start();
executor = Executors.newFixedThreadPool(getConcurrency());
for (int i = 0; i < getConcurrency(); i++)
executor.submit(new Runnable() {
public void run() {
try {
QueryDealRecord record = null;
while ((record = get()) != null && !isCancelled()) {
try {
update(getDealEnricher().enrich(record));
processed.incrementAndGet();
} catch (Exception e) {
failures.incrementAndGet();
log.error("Failed to process deal: " + record.getTradeId(), e);
}
}
} catch (InterruptedException e) {
setCancelled();
}
}
});
executor.shutdown();
}
protected void update(QueryDealRecord enrichedRecord) {
dealsEnrichedQueue.add(enrichedRecord);
if (batchComplete()) {
List<QueryDealRecord> enrichedRecordsBatch = new ArrayList<QueryDealRecord>();
synchronized (this) {
dealsEnrichedQueue.drainTo(enrichedRecordsBatch);
}
if (!enrichedRecordsBatch.isEmpty())
updateTheDatabase(enrichedRecordsBatch);
}
}
private void updateTheDatabase(List<QueryDealRecord> enrichedRecordsBatch) {
getDealStore().insertEnrichedData(enrichedRecordsBatch, getExtractorProcess());
}
/**
* @return true if processed records have reached the batch size or there's
* nothing to be processed now.
*/
private boolean batchComplete() {
return dealsEnrichedQueue.size() >= BATCH_SIZE || dealsToBeEnrichedQueue.isEmpty();
}
/**
* Gets an item from the queue of things to be enriched
*
* @return {@linkplain QueryDealRecord} to be enriched
* @throws InterruptedException
*/
protected synchronized QueryDealRecord get() throws InterruptedException {
try {
if (!dealsToBeEnrichedQueue.isEmpty()) {
return dealsToBeEnrichedQueue.take();
} else {
List<QueryDealRecord> records = getNextBatchToBeProcessed();
if (!records.isEmpty()) {
dealsToBeEnrichedQueue.addAll(records);
return dealsToBeEnrichedQueue.take();
}
}
} catch (InterruptedException ie) {
throw new UnRecoverableException("Unable to retrieve QueryDealRecord", ie);
}
return null;
}
private List<QueryDealRecord> getNextBatchToBeProcessed() {
List<QueryDealRecord> recordsThatNeedEnriching = getDealStore().getTheRecordsThatNeedEnriching(getExtractorProcess());
return recordsThatNeedEnriching;
}
@Override
public void stop() {
super.stop();
if (executor != null)
executor.shutdownNow();
}
@Override
public boolean await() throws InterruptedException {
return executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS) && !isCancelled() && complete();
}
@Override
public boolean await(long timeout, TimeUnit unit) throws InterruptedException {
return executor.awaitTermination(timeout, unit) && !isCancelled() && complete();
}
private boolean complete() {
setCompleted();
return true;
}
}

You're already using a BlockingQueue - it does all that work for you.
However, you're using the wrong method addAll() to add new elements to the queue. That method will throw an exception if the queue is not able to accept elements. Rather you should use put() because that's the blocking method corresponding to take(), which you are using correctly.
Regarding your statement in the post title:
second batch shouldn't come until the previous batch is complete
You need not be concerned about the timing of the incoming versus outgoing batches if you use BlockingQueue correctly.

It looks like a Semaphore will work perfectly for you. Have the producing thread acquire the semaphore while the consuming thread releases the semaphore when it completes the batch.
BlockingQueue<Batch> blockingQueue = ...;
Semaphore semaphore = new Semaphore(1);

// Producing thread
semaphore.acquire();             // wait until the previous batch has been fully processed
Batch batch = db.getBatch();     // only then fetch, so no record is fetched twice
blockingQueue.put(batch);

// Consuming thread
for (;;) {
    Batch batch = blockingQueue.take();
    doBatchUpdate(batch);
    semaphore.release();         // tell the producer it may fetch the next batch
}
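A runnable sketch of the semaphore handoff, under these assumptions: the database is replaced by generated integers, each record is processed by a small thread pool via invokeAll() (so a batch is still multithreaded internally), and an empty batch marks the end of the data. The names `SequentialBatches` and `run()` are hypothetical. The returned event list shows that each fetch happens only after the previous commit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SequentialBatches {
    /** Runs a hypothetical fetch/process loop and returns the order of fetch and commit events. */
    static List<String> run(int batches, int batchSize, AtomicInteger processed) throws InterruptedException {
        BlockingQueue<List<Integer>> queue = new LinkedBlockingQueue<>();
        Semaphore previousBatchDone = new Semaphore(1);
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        ExecutorService workers = Executors.newFixedThreadPool(4); // parallelism inside one batch

        Thread producer = new Thread(() -> {
            try {
                for (int b = 0; b < batches; b++) {
                    previousBatchDone.acquire();      // never fetch before the last batch is committed
                    events.add("fetch " + b);
                    List<Integer> batch = new ArrayList<>();
                    for (int i = 0; i < batchSize; i++) {
                        batch.add(b * batchSize + i); // stands in for db.getBatch()
                    }
                    queue.put(batch);
                }
                queue.put(Collections.emptyList());   // empty batch marks the end of the data
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        try {
            for (int b = 0; ; b++) {
                List<Integer> batch = queue.take();
                if (batch.isEmpty()) {
                    break;
                }
                List<Callable<Void>> tasks = new ArrayList<>();
                for (int i = 0; i < batch.size(); i++) {
                    tasks.add(() -> {                 // hypothetical per-record enrichment
                        processed.incrementAndGet();
                        return null;
                    });
                }
                workers.invokeAll(tasks);             // returns only when every record is done
                events.add("commit " + b);            // stands in for the batch database update
                previousBatchDone.release();          // now the producer may fetch again
            }
        } finally {
            workers.shutdown();
        }
        producer.join();
        return events;
    }
}
```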

Java 1.4 synchronization: only allow one instance of method to run (non blocking)?

I have a class providing translation utilities. The translations themselves should be reloaded every 30 minutes. I use Spring Timer support for that. Basically, my class looks like this:
public interface Translator {
public void loadTranslations();
public String getTranslation(String key);
}
loadTranslations() can be pretty long to run, so while it is running the old translations are still available. This is done by loading the translations in a local Map and just changing the reference when all translations are loaded.
My problem is: how do I make sure that when a thread is already loading translations and a second one also tries to run, the second one detects that and returns immediately, without starting a second update?
A synchronized method will only queue the loads... I'm still on Java 1.4, so no java.util.concurrent.
Thanks for your help!

Use some form of locking mechanism to only perform the task if it is not already in progress. Acquiring the locking token must be a one-step process. See:
/**
* @author McDowell
*/
public abstract class NonconcurrentTask implements Runnable {
private boolean token = true;
private synchronized boolean acquire() {
boolean ret = token;
token = false;
return ret;
}
private synchronized void release() {
token = true;
}
public final void run() {
if (acquire()) {
try {
doTask();
} finally {
release();
}
}
}
protected abstract void doTask();
}
Test code that will throw an exception if the task runs concurrently:
public class Test {
public static void main(String[] args) {
final NonconcurrentTask shared = new NonconcurrentTask() {
private boolean working = false;
protected void doTask() {
System.out.println("Working: "
+ Thread.currentThread().getName());
if (working) {
throw new IllegalStateException();
}
working = true;
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
if (!working) {
throw new IllegalStateException();
}
working = false;
}
};
Runnable taskWrapper = new Runnable() {
public void run() {
while (true) {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
shared.run();
}
}
};
for (int i = 0; i < 100; i++) {
new Thread(taskWrapper).start();
}
}
}

I am from a .NET background (no Java experience at all), but you could try a simple static flag of some sort that is checked at the beginning of the method to see if it's already running. Then all you need to do is make sure every read and write of that flag is synchronized. So at the beginning, check the flag: if it is set, return; if it is not set, set it, run the rest of the method, and unset it when it completes. Just make sure to put the code in a try/finally and unset the flag in the finally block, so it always gets unset in case of error. Very simplified, but it may be all you need.
Edit: This probably works better than synchronizing the method. Do you really need to reload the translations again immediately after the previous load finishes? And you may not want to tie up a thread for too long if it has to wait a while.
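The flag approach from this answer could look like the following sketch. It is written without generics, annotations or java.util.concurrent, so it should also compile under Java 1.4; only the `Translator` interface comes from the question, while `ReloadingTranslator`, `fillFromSource()` and `getLoadCount()` are hypothetical names. The flag is an instance field rather than a static one, which assumes a single shared translator instance.

```java
import java.util.HashMap;
import java.util.Map;

interface Translator {
    public void loadTranslations();
    public String getTranslation(String key);
}

class ReloadingTranslator implements Translator {
    private Map translations = new HashMap(); // guarded by "this"; raw types keep it 1.4-compatible
    private boolean loading = false;          // guarded by "this": true while a reload runs
    private int loadCount = 0;                // guarded by "this": completed reloads (for the demo)

    public void loadTranslations() {
        synchronized (this) {
            if (loading) {
                return;                       // another reload is running: return immediately
            }
            loading = true;                   // test-and-set in one synchronized step
        }
        try {
            Map fresh = new HashMap();
            fillFromSource(fresh);            // the slow part runs without holding the lock
            synchronized (this) {
                translations = fresh;         // readers switch to the new map in one step
                loadCount++;
            }
        } finally {
            synchronized (this) {
                loading = false;              // always clear the flag, even if loading failed
            }
        }
    }

    public synchronized String getTranslation(String key) {
        return (String) translations.get(key);
    }

    public synchronized int getLoadCount() {
        return loadCount;
    }

    // Hypothetical stand-in for reading the real translation source.
    protected void fillFromSource(Map target) {
        target.put("hello", "bonjour");
    }
}
```

Because the old map stays in place until the swap, getTranslation() keeps serving the previous translations while a reload is in progress.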

Keep a handle on the load thread to see if it's running?
Or can't you just use a synchronized flag to indicate if a load is in progress?

This is actually identical to the code that is required to manage the construction of a Singleton (gasp!) when done the classical way:
if (instance == null) {
    synchronized (SomeClass.class) {
        if (instance == null) {
            instance = new SomeClass();
        }
    }
}
The inner test is identical to the outer test. The outer test is there so that we don't routinely enter a synchronized block; the inner test confirms that the situation has not changed since we last made the test (the thread could have been preempted before entering the synchronized block).
In your case:
if (translationsNeedLoading()) {
    synchronized (this) {
        if (translationsNeedLoading()) {
            loadTranslations();
        }
    }
}
UPDATE: This way of constructing a singleton will not work reliably under your JDK 1.4. For an explanation, see here. However, I think you will be OK in this scenario.
