I have multiple async tasks running in Spring Boot. These tasks read an Excel file and insert all of its data into the database.
A task is started when a request is made from the front-end. The front-end then periodically polls for the progress status of the task.
I need to track the progress of each of these tasks and know when they are completed.
This is the controller that takes in requests for tasks and for polling their progress status:
@RestController
public class TaskController {

    @Autowired
    private TaskAsyncService taskAsyncService;

    @RequestMapping(method = RequestMethod.POST, value = "/uploadExcel")
    public ResponseEntity<?> uploadExcel(String excelFilePath) {
        String taskId = UUID.randomUUID().toString();
        taskAsyncService.AsyncManager(taskId, excelFilePath);
        HashMap<String, String> responseMap = new HashMap<>();
        responseMap.put("taskId", taskId);
        return new ResponseEntity<>(responseMap, HttpStatus.ACCEPTED);
    }

    // This will be polled to get the progress of tasks being executed
    @RequestMapping(method = RequestMethod.GET, value = "/tasks/progress/{taskId}")
    public ResponseEntity<?> getTaskProgress(@PathVariable String taskId) {
        HashMap<String, String> map = new HashMap<>();
        if (!taskAsyncService.containsTaskEntry(taskId)) {
            map.put("Error", "TaskId does not exist");
            return new ResponseEntity<>(map, HttpStatus.BAD_REQUEST);
        }
        boolean taskProgress = taskAsyncService.getTaskProgress(taskId);
        if (taskProgress) {
            map.put("message", "Task complete");
            taskAsyncService.removeTaskProgressEntry(taskId);
            return new ResponseEntity<>(map, HttpStatus.OK);
        }
        // Otherwise the task is still running
        map.put("progressStatus", "Task running");
        return new ResponseEntity<>(map, HttpStatus.PARTIAL_CONTENT);
    }
}
This is the code that executes the async tasks.
public class TaskAsyncService {

    // The inner map must be created, otherwise isTaskCompleteMap.get() returns null
    private final AtomicReference<ConcurrentHashMap<String, Boolean>> isTaskCompleteMap =
            new AtomicReference<>(new ConcurrentHashMap<>());

    protected boolean containsTaskEntry(String taskId) {
        return isTaskCompleteMap.get().get(taskId) != null;
    }

    protected boolean getTaskProgress(String taskId) {
        return isTaskCompleteMap.get().get(taskId);
    }

    protected void removeTaskProgressEntry(String taskId) {
        if (isTaskCompleteMap.get() != null) {
            isTaskCompleteMap.get().remove(taskId);
        }
    }

    @Async
    public CompletableFuture<?> AsyncManager(String taskId, String excelFilePath) {
        HashMap<String, String> map = new HashMap<>();
        // Add a new entry into isTaskCompleteMap
        isTaskCompleteMap.get().put(taskId, false);
        // Insert the Excel rows into the database
        // Task completed, set the value to true
        isTaskCompleteMap.get().put(taskId, true);
        map.put("Success", "Task completed");
        return CompletableFuture.completedFuture(map);
    }
}
I am using AWS EC2 with a load balancer. Therefore, sometimes a polling request gets handled by a newly spawned server which cannot access the isTaskCompleteMap and responds with "TaskId does not exist".
How do I track the status of the tasks in this case? I understand I need a distributed data structure, but I don't understand what kind or how to implement it.
You can use Hazelcast or similar distributed solutions (Redis, etc.).
maps - https://docs.hazelcast.org/docs/3.0/manual/html/ch02.html#Map
Use a distributed map from Hazelcast instead of the ConcurrentHashMap. A get from such a map returns the task entry even if it is being processed on another pod (server).
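For illustration, a minimal sketch of what that swap could look like in the service above, assuming the Hazelcast library is on the classpath; the map name "task-progress" and the embedded newHazelcastInstance() are assumptions, and cluster/network configuration (e.g. EC2 discovery) is omitted:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class TaskAsyncService {

    // Every EC2 instance that joins the same Hazelcast cluster sees the same
    // distributed map, so a polling request can land on any server behind the
    // load balancer and still find the entry.
    private final HazelcastInstance hazelcast = Hazelcast.newHazelcastInstance();
    private final IMap<String, Boolean> isTaskCompleteMap = hazelcast.getMap("task-progress");

    protected boolean containsTaskEntry(String taskId) {
        return isTaskCompleteMap.containsKey(taskId);
    }

    protected boolean getTaskProgress(String taskId) {
        return Boolean.TRUE.equals(isTaskCompleteMap.get(taskId));
    }

    protected void removeTaskProgressEntry(String taskId) {
        isTaskCompleteMap.remove(taskId);
    }
}

The controller does not need to change; only the backing map moves from a per-JVM ConcurrentHashMap to a cluster-wide structure.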
I need to call a certain API with multiple query params simultaneously, and in order to do that I wanted to use a reactive approach. I ended up with a reactive client that is able to call the endpoint based on a passed SearchQuery, handle the pagination of the response, call for the remaining pages, and return a Flux<Item>. So far it works fine; however, what I need to do now is:
Collect data for all search queries and save them as the initial state.
Once the initial data is collected, start repeating those calls in small time intervals and validate each item against the initial data. Basically, I need to find new items from here.
But I'm running out of options for how to solve that. I came up with probably the dirtiest solution ever, but I bet there are much better ways to do it.
So first of all, this is the relevant code of my client:
public Flux<Item> collectData(final SearchQuery query) {
    final var iteration = new int[]{0};
    return invoke(query, 0)
            .expand(res -> this.handleResponse(res, query, iteration))
            .flatMap(response -> Flux.fromIterable(response.collectItems()));
}

private Mono<ApiResponse> handleResponse(final ApiResponse response, final SearchQuery searchQuery, final int[] iteration) {
    return hasNextPage(response) ? invoke(searchQuery, calculateOffset(++iteration[0])) : Mono.empty();
}

private Mono<ApiResponse> invoke(final SearchQuery query, final int offset) {
    final var url = offset == 0 ? query.toUrlParams() : query.toUrlParamsWithOffset(offset);
    return doInvoke(url).onErrorReturn(ApiResponse.emptyResponse());
}

private Mono<ApiResponse> doInvoke(final String endpoint) {
    return webClient.get()
            .uri(endpoint)
            .retrieve()
            .bodyToMono(ApiResponse.class);
}
And here is my service that uses this client:
private final Map<String, Item> initialItems = new ConcurrentHashMap<>();

void work() {
    final var executorService = Executors.newSingleThreadScheduledExecutor();
    queryRepository.getSearchQueries().forEach(query -> {
        reactiveClient.collectData(query).subscribe(item -> initialItems.put(item.getId(), item));
    });
    executorService.scheduleAtFixedRate(() -> {
        if (isReady()) {
            queryRepository.getSearchQueries().forEach(query -> {
                reactiveClient.collectData(query).subscribe(this::process);
            });
        }
    }, 0, 3, TimeUnit.SECONDS);
}

/**
 * If after a 2-second sleep the size of initialItems remains the same,
 * that most likely means the initial population phase is over,
 * and we can proceed with further data processing.
 **/
private boolean isReady() {
    try {
        final var snapshotSize = initialItems.size();
        Thread.sleep(2000);
        return snapshotSize == initialItems.size();
    } catch (Exception e) {
        return false;
    }
}
I think the code speaks for itself: I just want to finish the first phase, which is the initial data population, and then start processing all incoming data.
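For what it's worth, a minimal sketch of how the two phases could be sequenced without the executor and the sleep-based isReady() check, reusing reactiveClient, queryRepository, initialItems and process from above (the 3-second interval matches the original; the rest is an assumption, and it needs java.time.Duration plus Reactor's Flux/Mono):

void work() {
    // Phase 1: run every query once and collect the results as the initial state.
    // collectMap completes only after all inner publishers have completed.
    Mono<Map<String, Item>> initialState = Flux.fromIterable(queryRepository.getSearchQueries())
            .flatMap(reactiveClient::collectData)
            .collectMap(Item::getId)
            .doOnNext(initialItems::putAll);

    // Phase 2: starts only after phase 1 has completed, then repeats every 3 seconds.
    initialState
            .thenMany(Flux.interval(Duration.ofSeconds(3)))
            .concatMap(tick -> Flux.fromIterable(queryRepository.getSearchQueries())
                    .flatMap(reactiveClient::collectData))
            .subscribe(this::process); // validate each incoming item against initialItems
}

The completion signal of the phase-1 Mono replaces the size-polling heuristic entirely, so no thread ever sleeps.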
I have the following REST controller, which receives requests, transforms them into JSON strings, and puts them into a concurrent queue.
I would like to make a Flux out of this queue and subscribe to it.
Unfortunately, it doesn't work.
What am I doing wrong here?
@RestController
public class EventController {

    private final ObjectMapper mapper = new ObjectMapper();
    private final FirehosePutService firehosePutService;
    private ConcurrentLinkedQueue<String> events = new ConcurrentLinkedQueue<>();
    private int batchSize = 10;

    @Autowired
    public EventController(FirehosePutService firehosePutService) {
        this.firehosePutService = firehosePutService;
        Flux<String> eventFlux = Flux.create((FluxSink<String> sink) -> {
            String next;
            while ((next = events.poll()) != null) {
                sink.next(next);
            }
        });
        eventFlux.publish().autoConnect().subscribe(new BaseSubscriber<String>() {
            int consumed;
            List<String> batchOfEvents = new ArrayList<>(batchSize);

            @Override
            protected void hookOnSubscribe(Subscription subscription) {
                request(batchSize);
            }

            @Override
            protected void hookOnNext(String value) {
                batchOfEvents.add(value);
                consumed++;
                if (consumed == batchSize) {
                    batchOfEvents.addAll(events);
                    log.info("Consume {} elements. Size of batchOfEvents={}", consumed, batchOfEvents.size());
                    firehosePutService.saveBulk(batchOfEvents);
                    consumed = 0;
                    batchOfEvents.clear();
                    events.clear();
                    request(batchSize);
                }
            }
        });
    }

    @GetMapping(value = "/saveMany", produces = "text/html")
    public ResponseEntity<Void> saveMany(@RequestParam MultiValueMap<String, String> allRequestParams) throws JsonProcessingException {
        Map<String, String> paramValues = allRequestParams.toSingleValueMap();
        String reignnEvent = mapper.writeValueAsString(paramValues);
        events.add(reignnEvent);
        return new ResponseEntity<>(HttpStatus.OK);
    }
}
First of all, you use the poll method. It is not blocking and returns null if the queue is empty. You loop over the collection until the first null (i.e. while (next != null)), so your code exits the loop almost immediately because the queue is empty at start. You must replace poll with take, which blocks until an element is available; note that take requires a BlockingQueue (e.g. LinkedBlockingQueue) instead of ConcurrentLinkedQueue.
Secondly, hookOnNext is invoked when an event is removed from the queue. However, you are trying to read the events again using batchOfEvents.addAll(events);. Moreover, you also clear all pending events with events.clear();.
I advise you to remove all direct access to the events collection from the hookOnNext method.
Why do you use Flux here at all? It seems overcomplicated. You can use a plain thread here:
@Autowired
public EventController(FirehosePutService firehosePutService) {
    this.firehosePutService = firehosePutService;
    Thread persister = new Thread(() -> {
        List<String> batchOfEvents = new ArrayList<>(batchSize);
        String next;
        try {
            // take() blocks until an element is available; events must be
            // declared as a BlockingQueue (e.g. LinkedBlockingQueue) for this
            while ((next = events.take()) != null) {
                batchOfEvents.add(next);
                if (batchOfEvents.size() == batchSize) {
                    log.info("Consume {} elements.", batchOfEvents.size());
                    firehosePutService.saveBulk(batchOfEvents);
                    batchOfEvents.clear();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    });
    persister.start();
}
I am using the Spring Framework's StringRedisTemplate to update an entry from multiple threads.
public void processSubmission(final String key, final Map<String, String> submissionDTO) {
    final String hashKey = String.valueOf(Hashing.MURMUR_HASH.hash(key));
    this.stringRedisTemplate.expire(key, 60, TimeUnit.MINUTES);
    final HashOperations<String, String, String> ops = this.stringRedisTemplate.opsForHash();
    Map<String, String> data = findByKey(key);
    String json;
    if (data != null) {
        data.putAll(submissionDTO);
        json = convertSubmission(data);
    } else {
        json = convertSubmission(submissionDTO);
    }
    ops.put(key, hashKey, json);
}
The JSON entry looks like this:
key (assignmentId) -> value (submissionId, status)
As seen in the code, before updating the cache entry I fetch the current entry, add the new entry to it, and put them all back. But since this operation can run on multiple threads, there can be a race condition that leads to data loss. I could synchronize the method above, but then it would be a bottleneck for the parallel processing power of the RxJava implementation, where the processSubmission method is called via RxJava on two asynchronous threads.
class ProcessSubmission {

    @Override
    public Observable<Boolean> processSubmissionSet1(List<Submission> submissionList, HttpHeaders requestHeaders) {
        return Observable.create(observer -> {
            for (final Submission submission : submissionList) {
                // The cache entry insert method is invoked via this call
                final Boolean status = processSubmissionExecutor.processSubmission(submission, requestHeaders);
                observer.onNext(status);
            }
            observer.onCompleted();
        });
    }

    @Override
    public Observable<Boolean> processSubmissionSet2(List<Submission> submissionList, HttpHeaders requestHeaders) {
        return Observable.create(observer -> {
            for (final Submission submission : submissionList) {
                // The cache entry insert method is invoked via this call
                final Boolean status = processSubmissionExecutor.processSubmission(submission, requestHeaders);
                observer.onNext(status);
            }
            observer.onCompleted();
        });
    }
}
The above is called from the service API below.
class MyService {
    public void handleSubmissions() {
        final Observable<Boolean> statusObser1 = processSubmission.processSubmissionSet1(subListDtos.get(0), requestHeaders)
                .subscribeOn(Schedulers.newThread());
        final Observable<Boolean> statusObser2 = processSubmission.processSubmissionSet2(subListDtos.get(1), requestHeaders)
                .subscribeOn(Schedulers.newThread());
        statusObser1.subscribe();
        statusObser2.subscribe();
    }
}
So handleSubmissions is called by multiple threads, one per assignment id. Each main thread then creates and calls two RxJava threads and processes the submission list associated with each assignment.
What would be the best approach to prevent the Redis entry race condition while keeping the performance of the RxJava implementation? Is there a way to do this Redis operation more efficiently?
It looks like you're only using the ops variable to do a put operation at the end, and you could isolate that point, which is where you need to synchronize.
In the short research that I did, I couldn't determine whether HashOperations is already thread-safe.
But an example of how you could isolate just the part you're concerned about is to do something like:
public void processSubmission(final String key, final Map<String, String> submissionDTO) {
    final String hashKey = String.valueOf(Hashing.MURMUR_HASH.hash(key));
    this.stringRedisTemplate.expire(key, 60, TimeUnit.MINUTES);
    Map<String, String> data = findByKey(key);
    String json;
    if (data != null) {
        data.putAll(submissionDTO);
        json = convertSubmission(data);
    } else {
        json = convertSubmission(submissionDTO);
    }
    putThreadSafeValue(key, hashKey, json);
}
And have a method that is synchronized just for the put operation:
private synchronized void putThreadSafeValue(String key, String hashKey, String json) {
    final HashOperations<String, String, String> ops = this.stringRedisTemplate.opsForHash();
    ops.put(key, hashKey, json);
}
There are a number of ways to do this, but it looks like you could restrict the thread contention down to that put operation.
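Note that synchronized only guards threads inside a single JVM. If the same key can also be updated from other processes, one hedged alternative is optimistic locking with Redis WATCH/MULTI/EXEC via Spring Data Redis' SessionCallback. A minimal sketch, reusing findByKey and convertSubmission from the question (the method name and the retry policy, which is left to the caller, are assumptions):

public void processSubmissionAtomically(final String key, final Map<String, String> submissionDTO) {
    final String hashKey = String.valueOf(Hashing.MURMUR_HASH.hash(key));
    stringRedisTemplate.execute(new SessionCallback<List<Object>>() {
        @Override
        @SuppressWarnings({"unchecked", "rawtypes"})
        public List<Object> execute(RedisOperations operations) {
            operations.watch(key); // EXEC is aborted if another client modifies the key
            Map<String, String> data = findByKey(key);
            String json;
            if (data != null) {
                data.putAll(submissionDTO);
                json = convertSubmission(data);
            } else {
                json = convertSubmission(submissionDTO);
            }
            operations.multi(); // queue the write as a transaction
            operations.opsForHash().put(key, hashKey, json);
            // An empty result means the WATCH fired and nothing was written;
            // the caller would have to retry in that case.
            return operations.exec();
        }
    });
}

Unlike the synchronized variant, this does not serialize the RxJava threads; conflicting writers simply lose the race and retry.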
I can easily query the Alfresco audit log over REST using this query:
http://localhost:8080/alfresco/service/api/audit/query/audit-custom?verbose=true
But how do I perform the same request in Java, within an Alfresco module?
It must be synchronous.
A lazy solution would be to call the REST URL from Java, but it would probably be inefficient, and more importantly it would require me to store an admin's password somewhere.
I noticed AuditService has an auditQuery method, so I am trying to call it. Unfortunately it seems to be meant for asynchronous operations? I don't need callbacks, as I need to wait until the queried data is ready before going on to the next step.
Here is my implementation, mostly copied from the source code of the REST API:
int maxResults = 10000;

if (!auditService.isAuditEnabled(AUDIT_APPLICATION, ("/" + AUDIT_APPLICATION))) {
    throw new WebScriptException(
            "Auditing for " + AUDIT_APPLICATION + " is disabled!");
}

final List<Map<String, Object>> entries =
        new ArrayList<Map<String, Object>>(maxResults);

AuditQueryCallback callback = new AuditQueryCallback() {

    @Override
    public boolean valuesRequired() {
        return true; // true = verbose
    }

    @Override
    public boolean handleAuditEntryError(
            Long entryId, String errorMsg, Throwable error) {
        return true;
    }

    @Override
    public boolean handleAuditEntry(
            Long entryId,
            String applicationName,
            String user,
            long time,
            Map<String, Serializable> values) {
        Map<String, Object> entry = new HashMap<String, Object>();
        // Convert the values to Strings
        Map<String, String> valueStrings =
                new HashMap<String, String>(values.size() * 2);
        for (Map.Entry<String, Serializable> mapEntry : values.entrySet()) {
            String key = mapEntry.getKey();
            Serializable value = mapEntry.getValue();
            try {
                String valueString = DefaultTypeConverter.INSTANCE.convert(
                        String.class, value);
                valueStrings.put(key, valueString);
            } catch (TypeConversionException e) {
                // Fall back to toString()
                valueStrings.put(key, value.toString());
            }
        }
        entry.put(JSON_KEY_ENTRY_VALUES, valueStrings);
        entries.add(entry);
        return true;
    }
};

AuditQueryParameters params = new AuditQueryParameters();
params.setApplicationName(AUDIT_APPLICATION);
params.setForward(true);
auditService.auditQuery(callback, params, maxResults);
Though the callback might make it look asynchronous, it is not: auditQuery invokes the callback for each matching entry and only returns once all of them have been handled, so the entries list is fully populated by the time the call returns.
In my GWT application I often refer to the same server results several times. I also don't know which code is executed first. I therefore want to cache my asynchronous (client-side) results.
I want to use an existing caching library; I'm considering guava-gwt.
I found this example of a synchronous Guava cache (in Guava's documentation):
LoadingCache<Key, Graph> graphs = CacheBuilder.newBuilder()
        .build(
                new CacheLoader<Key, Graph>() {
                    public Graph load(Key key) throws AnyException {
                        return createExpensiveGraph(key);
                    }
                });
This is how I'm trying to use a Guava cache asynchronously (I have no clue about how to make this work):
LoadingCache<Key, Graph> graphs = CacheBuilder.newBuilder()
        .build(
                new CacheLoader<Key, Graph>() {
                    public Graph load(Key key) throws AnyException {
                        // I want to do something asynchronous here; I cannot use
                        // Thread.sleep in the browser/JavaScript environment.
                        service.createExpensiveGraph(key, new AsyncCallback<Graph>() {
                            public void onFailure(Throwable caught) {
                                // how to tell the cache about the failure???
                            }

                            public void onSuccess(Graph result) {
                                // how to fill the cache with that result???
                            }
                        });
                        return // I cannot provide any result yet. What can I return???
                    }
                });
GWT is missing many classes from the default JRE (especially concerning threads and concurrency).
How can I use guava-gwt to cache asynchronous results?
As I understand it, what you want is not just an asynchronous cache but also a lazy one, and GWT is not the best place to create it. There is a big problem with client-side asynchronous executions in a GWT app: GWT lacks client-side implementations of Futures and/or Rx components (still, there are some implementations of RxJava for GWT). In plain Java, what you want can be achieved with:
LoadingCache<String, Future<String>> graphs = CacheBuilder.newBuilder().build(new CacheLoader<String, Future<String>>() {
    public Future<String> load(String key) {
        // The executor must not be named like the service that loads the data
        ExecutorService executor = Executors.newSingleThreadExecutor();
        return executor.submit(() -> service.createExpensiveGraph(key));
    }
});

Future<String> value = graphs.get("Some Key");
if (value.isDone()) {
    // get() would block until the data is loaded; the isDone() guard avoids that
    String success = value.get();
}
But as GWT has no implementation of Futures, you need to create one, something like:
public class FutureResult<T> implements AsyncCallback<T> {

    private enum State {
        SUCCEEDED, FAILED, INCOMPLETE;
    }

    private State state = State.INCOMPLETE;
    private LinkedHashSet<AsyncCallback<T>> listeners = new LinkedHashSet<AsyncCallback<T>>();
    private T value;
    private Throwable error;

    public T get() {
        switch (state) {
            case INCOMPLETE:
                // Do not block the browser, just throw
                throw new IllegalStateException("The server response has not been received yet.");
            case FAILED: {
                throw new IllegalStateException(error);
            }
            case SUCCEEDED:
                return value;
        }
        throw new IllegalStateException("Something very unclear");
    }

    public void addCallback(AsyncCallback<T> callback) {
        if (callback == null) return;
        listeners.add(callback);
    }

    public boolean isDone() {
        return state == State.SUCCEEDED;
    }

    public void onFailure(Throwable caught) {
        state = State.FAILED;
        error = caught;
        for (AsyncCallback<T> callback : listeners) {
            callback.onFailure(caught);
        }
    }

    public void onSuccess(T result) {
        this.value = result;
        state = State.SUCCEEDED;
        for (AsyncCallback<T> callback : listeners) {
            callback.onSuccess(value);
        }
    }
}
And your implementation will become:
LoadingCache<String, FutureResult<String>> graphs = CacheBuilder.newBuilder().build(new CacheLoader<String, FutureResult<String>>() {
    public FutureResult<String> load(String key) {
        FutureResult<String> result = new FutureResult<String>();
        // createExpensiveGraph delivers its result through the callback,
        // so return the FutureResult itself
        service.createExpensiveGraph(key, result);
        return result;
    }
});
FutureResult<String> value = graphs.get("Some Key");

// add a custom handler
value.addCallback(new AsyncCallback<String>() {
    public void onSuccess(String result) {
        // do something
    }

    public void onFailure(Throwable caught) {
        // do something
    }
});

// or see if it is already loaded / do not wait
if (value.isDone()) {
    String success = value.get();
}
When using the FutureResult you not only cache the execution but also get some kind of laziness, so you can show a loading screen while the data is loaded into the cache.
If you just need to cache the asynchronous call results, you can go for a non-loading Cache instead of a LoadingCache.
In this case you use the put and getIfPresent methods to store and retrieve records from the cache.
String v = cache.getIfPresent("one");
// returns null
cache.put("one", "1");
v = cache.getIfPresent("one");
// returns "1"
Alternatively, a new value can be loaded from a Callable on cache misses:
String v = cache.get(key,
        new Callable<String>() {
            public String call() {
                return key.toLowerCase();
            }
        });
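To tie that back to the asynchronous GWT case above, a minimal sketch of the non-loading variant (assuming the same service, Key, and Graph from the question; the getGraph wrapper is hypothetical): check the cache first, and populate it from the callback on a miss:

private final Cache<Key, Graph> cache = CacheBuilder.newBuilder().build();

public void getGraph(final Key key, final AsyncCallback<Graph> callback) {
    Graph cached = cache.getIfPresent(key);
    if (cached != null) {
        callback.onSuccess(cached); // served from the cache, no server round trip
        return;
    }
    service.createExpensiveGraph(key, new AsyncCallback<Graph>() {
        public void onFailure(Throwable caught) {
            callback.onFailure(caught); // failures are simply not cached
        }

        public void onSuccess(Graph result) {
            cache.put(key, result);    // fill the cache with the result
            callback.onSuccess(result);
        }
    });
}

Unlike the FutureResult approach, this does not deduplicate concurrent requests for the same key that arrive before the first response comes back.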
For further reference: https://guava-libraries.googlecode.com/files/JavaCachingwithGuava.pdf