Prevent race condition with RxJava asynchronous thread implementation - java

I am using the Spring Framework StringRedisTemplate to update an entry, and the update happens from multiple threads.
public void processSubmission(final String key, final Map<String, String> submissionDTO) {
final String hashKey = String.valueOf(Hashing.MURMUR_HASH.hash(key));
this.stringRedisTemplate.expire(key, 60, TimeUnit.MINUTES);
final HashOperations<String, String, String> ops = this.stringRedisTemplate.opsForHash();
Map<String, String> data = findByKey(key);
String json;
if (data != null) {
data.putAll(submissionDTO);
json = convertSubmission(data);
} else {
json = convertSubmission(submissionDTO);
}
ops.put(key, hashKey, json);
}
The JSON entry looks like this:
key (assignmentId) -> value (submissionId, status)
As the code shows, before updating the cache entry I fetch the current entry, add the new entry, and put them all back. Since this operation can run on multiple threads, a race condition can lead to lost data. I could synchronize the method above, but then it would become a bottleneck for the parallel processing of the RxJava implementation, where processSubmission is called via RxJava on two asynchronous threads.
class ProcessSubmission{
@Override
public Observable<Boolean> processSubmissionSet1(List<Submission> submissionList, HttpHeaders requestHeaders) {
return Observable.create(observer -> {
for (final Submission submission : submissionList) {
//Cache entry insert method invoke via this call
final Boolean status = processSubmissionExecutor.processSubmission(submission, requestHeaders);
observer.onNext(status);
}
observer.onCompleted();
});
}
@Override
public Observable<Boolean> processSubmissionSet2(List<Submission> submissionList, HttpHeaders requestHeaders) {
return Observable.create(observer -> {
for (final Submission submission : submissionList) {
//Cache entry insert method invoke via this call
final Boolean status = processSubmissionExecutor.processSubmission(submission, requestHeaders);
observer.onNext(status);
}
observer.onCompleted();
});
}
}
The above is called from the service API below.
class MyService{
public void handleSubmissions(){
final Observable<Boolean> statusObser1 = processSubmission.processSubmissionSet1(subListDtos.get(0), requestHeaders)
.subscribeOn(Schedulers.newThread());
final Observable<Boolean> statusObser2 = processSubmission.processSubmissionSet2(subListDtos.get(1), requestHeaders)
.subscribeOn(Schedulers.newThread());
statusObser1.subscribe();
statusObser2.subscribe();
}
}
So handleSubmissions is called by multiple threads, one per assignment ID. Each of those main threads then creates two RxJava threads and processes the submission lists associated with the assignment.
What would be the best approach to prevent the Redis entry race condition while keeping the performance of the RxJava implementation? Is there a more efficient way to do this Redis operation?

It looks like you're only using the ops variable to do a put operation at the end, so you could isolate that point, which is where you need to synchronize.
(In the short research that I did, I couldn't find out whether HashOperations is already thread-safe.)
An example of how you could isolate just the part you're concerned about:
public void processSubmission(final String key, final Map<String, String> submissionDTO) {
final String hashKey = String.valueOf(Hashing.MURMUR_HASH.hash(key));
this.stringRedisTemplate.expire(key, 60, TimeUnit.MINUTES);
Map<String, String> data = findByKey(key);
String json;
if (data != null) {
data.putAll(submissionDTO);
json = convertSubmission(data);
} else {
json = convertSubmission(submissionDTO);
}
putThreadSafeValue(key, hashKey, json);
}
And have a method that is synchronized just for the put operation:
private synchronized void putThreadSafeValue(final String key, final String hashKey, final String json) {
final HashOperations<String, String, String> ops = this.stringRedisTemplate.opsForHash();
ops.put(key, hashKey, json);
}
There are a number of ways to do this, but it looks like you could restrict the thread contention down to that put operation.
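One caveat worth checking: if only the put is synchronized, the read (findByKey) and the merge still happen outside the lock, so two threads could read the same old value and one could overwrite the other's update. Below is a minimal sketch, assuming a single JVM, of making the whole read-merge-write step atomic per key. The ConcurrentHashMap only stands in for Redis, and the class and method names are hypothetical. With several application instances a JVM-level lock would not be enough, and a Redis-side approach (for example, writing each submission to its own hash field so no read is needed) would be the thing to look at.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the Redis-backed cache: one map of submissions per assignment key.
class PerKeySubmissionCache {
    private final ConcurrentHashMap<String, Map<String, String>> store = new ConcurrentHashMap<>();

    // compute() performs the read-merge-write atomically for the given key,
    // without taking a single global lock for all keys.
    void processSubmission(String key, Map<String, String> submissionDTO) {
        store.compute(key, (k, existing) -> {
            Map<String, String> merged = (existing == null) ? new HashMap<>() : new HashMap<>(existing);
            merged.putAll(submissionDTO);
            return merged;
        });
    }

    Map<String, String> findByKey(String key) {
        return store.get(key);
    }

    // Usage example: two threads each add 500 distinct submissions to the same key
    // and the method returns how many entries survived.
    static int runDemo() {
        PerKeySubmissionCache cache = new PerKeySubmissionCache();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int t = 0; t < 2; t++) {
            final int offset = t * 500;
            pool.execute(() -> {
                for (int i = 0; i < 500; i++) {
                    cache.processSubmission("assignment-1",
                            Collections.singletonMap("submission-" + (offset + i), "DONE"));
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cache.findByKey("assignment-1").size();
    }
}
```

Because the merge happens inside compute(), no update is lost even when both threads hit the same key, while threads working on different assignment IDs are not forced through one synchronized method.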

Related

Spring WebFlux - how to determine when my client has finished working

I need to call a certain API with multiple query params simultaneously, and to do that I wanted to use a reactive approach. I ended up with a reactive client that can call the endpoint based on the passed SearchQuery, handle pagination of the response by calling for the remaining pages, and return a Flux<Item>. So far it works fine; however, what I need to do now is:
Collect data for all search queries and save them as initial state
Once the initial data is collected, I need to start repeating those calls in small time intervals and validate each item against initial data. Basically, I need to find new items from here.
But I'm running out of options for how to solve that. I came up with probably the dirtiest solution ever, and I bet there are much better ways to do it.
So first of all, this is the relevant code of my client:
public Flux<Item> collectData(final SearchQuery query) {
final var iteration = new int[]{0};
return invoke(query, 0).expand(res ->
this.handleResponse(res, query, iteration))
.flatMap(response -> Flux.fromIterable(response.collectItems()));
}
private Mono<ApiResponse> handleResponse(final ApiResponse response, final SearchQuery searchQuery, final int[] iteration) {
return hasNextPage(response) ? invoke(searchQuery, calculateOffset(++iteration[0])) : Mono.empty();
}
private Mono<ApiResponse> invoke(final SearchQuery query, final int offset) {
final var url = offset == 0 ? query.toUrlParams() : query.toUrlParamsWithOffset(offset);
return doInvoke(url).onErrorReturn(ApiResponse.emptyResponse());
}
private Mono<ApiResponse> doInvoke(final String endpoint) {
return webClient.get()
.uri(endpoint)
.retrieve()
.bodyToMono(ApiResponse.class);
}
And here is my service that is using this client
private final Map<String, Item> initialItems = new ConcurrentHashMap<>();
void work() {
final var executorService = Executors.newSingleThreadScheduledExecutor();
queryRepository.getSearchQueries().forEach(query -> {
reactiveClient.collectData(query).subscribe(item -> initialItems.put(item.getId(), item));
});
executorService.scheduleAtFixedRate(() -> {
if(isReady()) {
queryRepository.getSearchQueries().forEach(query -> {
reactiveClient.collectData(query).subscribe(this::process);
});
}
}, 0, 3, TimeUnit.SECONDS);
}
/**
* If after 2 second sleep size of initialItems remains the same,
* that most likely means that initial population phase is over,
* and we can proceed with further data processing
**/
private boolean isReady() {
try {
final var snapshotSize = initialItems.size();
Thread.sleep(2000);
return snapshotSize == initialItems.size();
} catch (Exception e) {
return false;
}
}
I think the code speaks for itself. I just want to finish the first phase, which is the initial data population, and then start processing all incoming data.

How to track progress status of async tasks running in multiple servers

I have multiple async tasks running in Spring Boot. These tasks read an Excel file and insert all of its data into the database.
The task is started when a request is made from the front-end. The front-end then periodically keeps requesting for the progress status of the task.
I need to track the progress of each of these tasks and know when they are completed.
This is the controller file that takes in requests for tasks and for polling their progress status:
public class TaskController {
@RequestMapping(method = RequestMethod.POST, value = "/uploadExcel")
public ResponseEntity<?> uploadExcel(String excelFilePath) {
String taskId = UUID.randomUUID().toString();
taskAsyncService.AsyncManager(taskId, excelFilePath);
HashMap<String, String> responseMap = new HashMap<>();
responseMap.put("taskId", taskId);
return new ResponseEntity<>(responseMap, HttpStatus.ACCEPTED);
}
// This will be polled to get progress of tasks being executed
@RequestMapping(method = RequestMethod.GET, value = "/tasks/progress/{id}")
public ResponseEntity<?> getTaskProgress(@PathVariable("id") String taskId) {
HashMap<String, String> map = new HashMap<>();
if (!taskAsyncService.containsTaskEntry(taskId)) {
map.put("Error", "TaskId does not exist");
return new ResponseEntity<>(map, HttpStatus.BAD_REQUEST);
}
boolean taskProgress = taskAsyncService.getTaskProgress(taskId);
if (taskProgress) {
map.put("message", "Task complete");
taskAsyncService.removeTaskProgressEntry(taskId);
return new ResponseEntity<>(map, HttpStatus.OK);
}
//Otherwise task is still running
map.put("progressStatus", "Task running");
return new ResponseEntity<>(map, HttpStatus.PARTIAL_CONTENT);
}
}
This is the code that executes the async tasks.
public class TaskAsyncService {
// Initialize the reference with an empty map so that get() never returns null
private final AtomicReference<ConcurrentHashMap<String, Boolean>> isTaskCompleteMap = new AtomicReference<>(new ConcurrentHashMap<>());
protected boolean containsTaskEntry(String taskId) {
if (isTaskCompleteMap.get().get(taskId) != null) {
return true;
}
return false;
}
protected boolean getTaskProgress(String taskId) {
return isTaskCompleteMap.get().get(taskId);
}
protected void removeTaskProgressEntry(String taskId) {
if (isTaskCompleteMap.get() != null) {
isTaskCompleteMap.get().remove(taskId);
}
}
@Async
public CompletableFuture<?> AsyncManager(String taskId, String excelFilePath) {
HashMap<String, String> map = new HashMap<>();
//Add a new entry into isTaskCompleteMap
isTaskCompleteMap.get().put(taskId, false);
//Insert excel rows into database
//Task completed set value to true
isTaskCompleteMap.get().put(taskId, true);
map.put("Success", "Task completed");
return CompletableFuture.completedFuture(map);
}
}
I am using AWS EC2 with a load balancer. Therefore, a polling request sometimes gets handled by a newly spawned server, which cannot access the isTaskCompleteMap and responds with "TaskId does not exist".
How do I track the status of the tasks in this case? I understand I need a distributed data structure, but I don't understand what kind or how to implement it.
You can use Hazelcast or a similar distributed solution (Redis, etc.).
Maps: https://docs.hazelcast.org/docs/3.0/manual/html/ch02.html#Map
Use a distributed map from Hazelcast instead of the ConcurrentHashMap.
A get from such a map should return the task even if it is being processed on another pod (server).
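As a rough sketch of how that swap could look: Hazelcast's IMap implements java.util.concurrent.ConcurrentMap, so the tracking code can be written against ConcurrentMap and handed whichever map is available. TaskProgressTracker and its method names are hypothetical, and the in-memory factory below exists only so the example runs without a cluster.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical tracker written against ConcurrentMap so the backing map can be swapped.
class TaskProgressTracker {
    private final ConcurrentMap<String, Boolean> isTaskComplete;

    TaskProgressTracker(ConcurrentMap<String, Boolean> backingMap) {
        this.isTaskComplete = backingMap;
    }

    // Local, single-server variant; a distributed map would be passed to the constructor instead.
    static TaskProgressTracker inMemory() {
        return new TaskProgressTracker(new ConcurrentHashMap<>());
    }

    void markStarted(String taskId) {
        isTaskComplete.put(taskId, Boolean.FALSE);
    }

    void markCompleted(String taskId) {
        isTaskComplete.put(taskId, Boolean.TRUE);
    }

    boolean containsTaskEntry(String taskId) {
        return isTaskComplete.containsKey(taskId);
    }

    // Returns false for unknown task IDs instead of throwing on a null lookup.
    boolean isCompleted(String taskId) {
        return Boolean.TRUE.equals(isTaskComplete.get(taskId));
    }

    void remove(String taskId) {
        isTaskComplete.remove(taskId);
    }
}
```

In a clustered deployment the constructor argument would presumably come from something like hazelcastInstance.getMap("taskProgress") (the map name is an assumption), so that every server behind the load balancer sees the same entries.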

Cannot create hot stream from Queue

I have the following rest controller, which receives requests, transforms them into JSON strings and puts them into a concurrent queue.
I would like to make a Flux out of this queue and subscribe to it.
Unfortunately, it doesn't work.
What am I doing wrong here?
@RestController
public class EventController {
private final ObjectMapper mapper = new ObjectMapper();
private final FirehosePutService firehosePutService;
private ConcurrentLinkedQueue<String> events = new ConcurrentLinkedQueue<>();
private int batchSize = 10;
@Autowired
public EventController(FirehosePutService firehosePutService) {
this.firehosePutService = firehosePutService;
Flux<String> eventFlux = Flux.create((FluxSink<String> sink) -> {
String next;
while (( next = events.poll()) != null) {
sink.next(next);
}
});
eventFlux.publish().autoConnect().subscribe(new BaseSubscriber<String>() {
int consumed;
List<String> batchOfEvents = new ArrayList<>(batchSize);
@Override
protected void hookOnSubscribe(Subscription subscription) {
request(batchSize);
}
@Override
protected void hookOnNext(String value) {
batchOfEvents.add(value);
consumed++;
if (consumed == batchSize) {
batchOfEvents.addAll(events);
log.info("Consume {} elements. Size of batchOfEvents={}", consumed, batchOfEvents.size());
firehosePutService.saveBulk(batchOfEvents);
consumed = 0;
batchOfEvents.clear();
events.clear();
request(batchSize);
}
}
});
}
@GetMapping(value = "/saveMany", produces = "text/html")
public ResponseEntity<Void> saveMany(@RequestParam MultiValueMap<String, String> allRequestParams) throws JsonProcessingException {
Map<String, String> paramValues = allRequestParams.toSingleValueMap();
String reignnEvent = mapper.writeValueAsString(paramValues);
events.add(reignnEvent);
return new ResponseEntity<>(HttpStatus.OK);
}
}
First of all, you use the poll method. It is non-blocking and returns null if the queue is empty. You loop over the collection until the first null (i.e. while (next != null)), so your code exits the loop almost immediately because the queue is empty at the start. You must replace poll with take, which blocks and waits until an element is available. Note that take is defined on BlockingQueue, so events would have to change from ConcurrentLinkedQueue to an implementation such as LinkedBlockingQueue.
Secondly, hookOnNext is invoked when an event has been removed from the queue. However, you then read the queue again with batchOfEvents.addAll(events); and, moreover, discard all pending events with events.clear();.
I advise you to remove all direct access to the events collection from the hookOnNext method.
Why do you use Flux here at all? It seems overcomplicated. You can use a plain thread here:
@Autowired
public EventController(FirehosePutService firehosePutService) {
this.firehosePutService = firehosePutService;
// Assumes events is declared as a BlockingQueue<String> (e.g. LinkedBlockingQueue), which provides take()
Thread persister = new Thread(() -> {
List<String> batchOfEvents = new ArrayList<>(batchSize);
try {
while (true) {
// take() blocks until an element is available
batchOfEvents.add(events.take());
if (batchOfEvents.size() == batchSize) {
log.info("Consumed {} elements", batchOfEvents.size());
firehosePutService.saveBulk(batchOfEvents);
batchOfEvents.clear();
}
}
} catch (InterruptedException e) {
// Restore the interrupt flag and let the thread exit
Thread.currentThread().interrupt();
}
});
persister.start();
}

How To Know All Asynchronous HTTP Calls are Completed

I am trying to figure out how to determine if all async HTTP GET requests I've made have completed, so that I can execute another method. For context, I have something similar to the code below:
public void init() throws IOException {
Map<String, CustomObject> mapOfObjects = new HashMap<String, CustomObject>();
ObjectMapper mapper = new ObjectMapper();
// some code to populate the map
mapOfObjects.forEach((k,v) -> {
HttpClient.asyncGet("https://fakeurl1.com/item/" + k, createCustomCallbackOne(k, mapper));
// HttpClient is just a wrapper class for your standard OkHTTP3 calls,
// e.g. client.newcall(request).enqueue(callback);
HttpClient.asyncGet("https://fakeurl2.com/item/" + k, createCustomCallbackTwo(k, mapper));
});
}
private Callback createCustomCallbackOne(String id, ObjectMapper mapper) {
return new Callback() {
@Override
public void onResponse(Call call, Response response) throws IOException {
if (response.isSuccessful()) {
try (ResponseBody body = response.body()) {
CustomObject co = mapOfObjects.get(id);
if (co != null) {
co.setFieldOne(mapper.readValue(body.byteStream(), FieldOne.class));
}
} // implicitly closes the response body
}
}
@Override
public void onFailure(Call call, IOException e) {
// log error
}
};
}
// createCustomCallbackTwo does more or less the same thing,
// just sets a different field and then performs another
// async GET in order to set an additional field
So what would be the best/correct way to monitor all these asynchronous calls to ensure they have completed and I can go about performing another method on the Objects stored inside the map?
The simplest way would be to keep a count of how many requests are 'in flight'. Increment it for each request enqueued and decrement it at the end of the callback. When the count reaches 0, all requests are done. Using a semaphore or counting lock, you can wait for it to become 0 without polling.
Note that the callbacks run on separate threads, so you must provide some kind of synchronization.
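The in-flight count described above could be sketched with a java.util.concurrent.CountDownLatch, assuming the total number of requests is known before they are enqueued. The thread pool here only stands in for the HTTP client's callback threads, and InFlightCounterDemo and runAll are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class InFlightCounterDemo {
    // Simulates requestCount async requests. Returns how many callbacks ran,
    // or -1 if they did not all finish within the timeout.
    static int runAll(int requestCount) {
        CountDownLatch inFlight = new CountDownLatch(requestCount);
        AtomicInteger handled = new AtomicInteger();
        // Stands in for the HTTP client's callback threads (assumption for this sketch)
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < requestCount; i++) {
            pool.execute(() -> {
                try {
                    // The real onResponse/onFailure work would go here
                    handled.incrementAndGet();
                } finally {
                    // Always decrement, on success and on failure alike
                    inFlight.countDown();
                }
            });
        }
        try {
            // Blocks without polling until the count reaches zero or the timeout expires
            boolean done = inFlight.await(10, TimeUnit.SECONDS);
            return done ? handled.get() : -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            pool.shutdown();
        }
    }
}
```

If a callback starts further requests, as the second callback in the question does, the total is not known up front; in that case a java.util.concurrent.Phaser, which lets parties register dynamically, may be a better fit than a latch.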
If you want to create a new callback for every request, you could use something like this:
public class WaitableCallback implements Callback {
private boolean done;
private IOException exception;
private final Object[] signal = new Object[0];
@Override
public void onResponse(Call call, Response response) throws IOException {
...
synchronized (this.signal) {
done = true;
signal.notifyAll();
}
}
@Override
public void onFailure(Call call, IOException e) {
synchronized (signal) {
done = true;
exception = e;
signal.notifyAll();
}
}
public void waitUntilDone() throws InterruptedException {
synchronized (this.signal) {
while (!this.done) {
this.signal.wait();
}
}
}
public boolean isDone() {
synchronized (this.signal) {
return this.done;
}
}
public IOException getException() {
synchronized (this.signal) {
return exception;
}
}
}
Create an instance for every request and put it into e.g. a List<WaitableCallback> pendingRequests.
Then you can just wait for all requests to be done:
for ( WaitableCallback cb : pendingRequests ) {
cb.waitUntilDone();
}
// At this point, all requests have been processed.
However, you probably should not create a new identical callback object for every request. Callback's methods get the Call passed as parameter so that the code can examine it to figure out which request it is processing; and in your case, it seems you don't even need that. So use a single Callback instance for the requests that should be handled identically.
If the function asyncGet calls your function createCustomCallbackOne, then it's easy.
For each key you are calling two pages, "https://fakeurl1.com/item/" and "https://fakeurl2.com/item/" (left out + k).
So you need a map to track that, and just one callback function is enough.
Use a map with a key identifying each call:
static final Map<String, Integer> trackerOfAsyncCalls = new HashMap<>();
public void init() throws IOException {
Map<String, CustomObject> mapOfObjects = new HashMap<String, CustomObject>();
//need to keep a track of the keys in some object
ObjectMapper mapper = new ObjectMapper();
trackerOfAsyncCalls.clear();
// some code to populate the map
mapOfObjects.forEach((k,v) -> {
HttpClient.asyncGet("https://fakeurl1.com/item/" + k, createCustomCallback(k,1 , mapper));
// HttpClient is just a wrapper class for your standard OkHTTP3 calls,
// e.g. client.newcall(request).enqueue(callback);
HttpClient.asyncGet("https://fakeurl2.com/item/" + k, createCustomCallback(k, 2, mapper));
trackerOfAsyncCalls.put(k + "-2", null);
});
}
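// idOuter must be final (or effectively final) to be captured by the anonymous class
private Callback createCustomCallback(final String idOuter, final int which, ObjectMapper mapper) {
return new Callback() {
final String myId = idOuter + "-" + which;
// instance initializer: register this call as started but not yet finished
{ trackerOfAsyncCalls.put(myId, null); }
@Override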
public void onResponse(Call call, Response response) throws IOException {
if (response.isSuccessful()) {
trackerOfAsyncCalls.put(myId, 1);
///or put outside of if if u dont care if success or fail or partial...
Now set up a thread, or better a scheduler, that is called every 5 seconds and checks all keys in mapOfObjects and trackerOfAsyncCalls to see whether all calls have been started and whether a final success, timeout, or error status has been recorded for each of them.

How to query the Alfresco Audit Service in Java

I can easily query the Alfresco audit log in REST using this query:
http://localhost:8080/alfresco/service/api/audit/query/audit-custom?verbose=true
But how do I perform the same request in Java within an Alfresco module?
It must be synchronous.
A lazy solution would be to call the REST URL in Java, but it would probably be inefficient, and more importantly it would require me to store an admin's password somewhere.
I noticed AuditService has an auditQuery method, so I am trying to call it. Unfortunately, it seems to be meant for asynchronous operations? I don't need callbacks, as I need to wait until the queried data is ready before going on to the next step.
Here is my implementation, mostly copied from the source code of the REST API:
int maxResults = 10000;
if (!auditService.isAuditEnabled(AUDIT_APPLICATION, ("/" + AUDIT_APPLICATION))) {
throw new WebScriptException(
"Auditing for " + AUDIT_APPLICATION + " is disabled!");
}
final List<Map<String, Object>> entries =
new ArrayList<Map<String,Object>>(maxResults);
AuditQueryCallback callback = new AuditQueryCallback() {
@Override
public boolean valuesRequired() {
return true; // true = verbose
}
@Override
public boolean handleAuditEntryError(
Long entryId, String errorMsg, Throwable error) {
return true;
}
@Override
public boolean handleAuditEntry(
Long entryId,
String applicationName,
String user,
long time,
Map<String, Serializable> values) {
Map<String, Object> entry = new HashMap<String, Object>();
if (values != null) {
// Convert values to Strings
Map<String, String> valueStrings =
new HashMap<String, String>(values.size() * 2);
for (Map.Entry<String, Serializable> mapEntry : values.entrySet()) {
String key = mapEntry.getKey();
Serializable value = mapEntry.getValue();
try {
String valueString = DefaultTypeConverter.INSTANCE.convert(
String.class, value);
valueStrings.put(key, valueString);
}
catch (TypeConversionException e) {
// Use the toString()
valueStrings.put(key, value.toString());
}
}
entry.put(JSON_KEY_ENTRY_VALUES, valueStrings);
}
entries.add(entry);
return true;
}
};
AuditQueryParameters params = new AuditQueryParameters();
params.setApplicationName(AUDIT_APPLICATION);
params.setForward(true);
auditService.auditQuery(callback, params, maxResults);
Though the callback might make it look asynchronous, it is not.
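To illustrate the general pattern (not the Alfresco implementation itself), here is a minimal sketch of a callback-style query that runs synchronously: the callback is invoked on the caller's thread for each entry, and the query method returns only after the last invocation. EntryCallback, SyncQueryDemo, and the generated entries are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback type; this is NOT the Alfresco AuditQueryCallback.
interface EntryCallback {
    // Return false to stop the query early
    boolean handleEntry(long entryId, String value);
}

class SyncQueryDemo {
    // Invokes the callback on the caller's thread for each entry, then returns.
    static void query(EntryCallback callback, int maxResults) {
        for (long id = 1; id <= maxResults; id++) {
            if (!callback.handleEntry(id, "entry-" + id)) {
                return;
            }
        }
    }

    static List<String> collect(int maxResults) {
        List<String> entries = new ArrayList<>();
        // List.add returns true, so the lambda keeps the query going
        query((id, value) -> entries.add(value), maxResults);
        // query() has returned, so every callback invocation has already happened
        return entries;
    }
}
```

With this kind of API the list is fully populated on the line right after the query call, so no waiting or extra synchronization is needed before the next step.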
