I have the following rest controller, which receives requests, transforms them into JSON strings and puts them into a concurrent queue.
I would like to make a Flux out of this queue and subscribe to it.
Unfortunately, it doesn't work.
What am I doing wrong here?
@RestController
public class EventController {
private final ObjectMapper mapper = new ObjectMapper();
private final FirehosePutService firehosePutService;
private ConcurrentLinkedQueue<String> events = new ConcurrentLinkedQueue<>();
private int batchSize = 10;
@Autowired
public EventController(FirehosePutService firehosePutService) {
this.firehosePutService = firehosePutService;
Flux<String> eventFlux = Flux.create((FluxSink<String> sink) -> {
String next;
while (( next = events.poll()) != null) {
sink.next(next);
}
});
eventFlux.publish().autoConnect().subscribe(new BaseSubscriber<String>() {
int consumed;
List<String> batchOfEvents = new ArrayList<>(batchSize);
@Override
protected void hookOnSubscribe(Subscription subscription) {
request(batchSize);
}
@Override
protected void hookOnNext(String value) {
batchOfEvents.add(value);
consumed++;
if (consumed == batchSize) {
batchOfEvents.addAll(events);
log.info("Consume {} elements. Size of batchOfEvents={}", consumed, batchOfEvents.size());
firehosePutService.saveBulk(batchOfEvents);
consumed = 0;
batchOfEvents.clear();
events.clear();
request(batchSize);
}
}
});
}
@GetMapping(value = "/saveMany", produces = "text/html")
public ResponseEntity<Void> saveMany(@RequestParam MultiValueMap<String, String> allRequestParams) throws JsonProcessingException {
Map<String, String> paramValues = allRequestParams.toSingleValueMap();
String reignnEvent = mapper.writeValueAsString(paramValues);
events.add(reignnEvent);
return new ResponseEntity<>(HttpStatus.OK);
}
}
First of all, you use the poll method. It does not block and returns null if the queue is empty. Your loop runs only until the first null (i.e. while (next != null)), so it exits almost immediately because the queue is empty at startup. You must replace poll with take, which blocks until an element is available. Note that take is defined on BlockingQueue, so the events field has to become a BlockingQueue (for example a LinkedBlockingQueue) instead of a ConcurrentLinkedQueue.
Secondly, hookOnNext is invoked after the event has already been removed from the queue. However, you read the events again with batchOfEvents.addAll(events); and then you also discard all pending events with events.clear();
I advise you to remove all direct access to the events collection from the hookOnNext method.
Why do you use Flux here at all? It seems overcomplicated. You can use a plain thread instead:
@Autowired
public EventController(FirehosePutService firehosePutService) {
this.firehosePutService = firehosePutService;
// Assumes events is declared as a BlockingQueue<String>, e.g. new LinkedBlockingQueue<>()
Thread persister = new Thread(() -> {
List<String> batchOfEvents = new ArrayList<>(batchSize);
try {
while (true) {
// take() blocks until an element is available
batchOfEvents.add(events.take());
if (batchOfEvents.size() == batchSize) {
log.info("Consumed {} elements", batchOfEvents.size());
firehosePutService.saveBulk(batchOfEvents);
batchOfEvents.clear();
}
}
} catch (InterruptedException e) {
// restore the interrupt flag and let the thread finish
Thread.currentThread().interrupt();
}
});
persister.start();
}
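The thread-based idea can be tried outside Spring with JDK classes only. The following is a minimal sketch, not the original controller: BatchingConsumer, submit, stop and the sink parameter are hypothetical names, and the sink stands in for firehosePutService.saveBulk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical helper: groups queued events into fixed-size batches on a background thread.
class BatchingConsumer {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private final Thread worker;

    BatchingConsumer(int batchSize, Consumer<List<String>> sink) {
        worker = new Thread(() -> {
            List<String> batch = new ArrayList<>(batchSize);
            try {
                while (true) {
                    batch.add(events.take()); // blocks while the queue is empty
                    if (batch.size() == batchSize) {
                        sink.accept(new ArrayList<>(batch)); // hand over a copy
                        batch.clear();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop() was called
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Called by producers, e.g. from a request handler
    void submit(String event) {
        events.add(event);
    }

    void stop() {
        worker.interrupt();
    }
}
```

With a batch size of 3, seven submitted events produce two batches and leave one event waiting. A real service would probably also flush a partial batch after a timeout, which this sketch leaves out.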
Is there an operator in RxJava, an external library, or some approach I'm missing that creates a Flowable/Observable which receives a function controlling the emission of data, like a valve?
I have a huge JSON file to process, but I have to read a portion of the file (a list of entities), process it, and only then read the next portion. I have tried window() and buffer(), but the BiFunction I pass to Flowable.generate() keeps executing after I have received the first list and before I have finished processing it. I also tried FlowableTransformers.valve() from hu.akarnokd.rxjava3.operators, but it just piles up the items before the flatMap() function that processes the list.
private Flowable<T> flowable(InputStream inputStream) {
return Flowable.generate(() -> jsonFactory.createParser(new GZIPInputStream(inputStream)), (jsonParser, emitter) -> {
final var token = jsonParser.nextToken();
if (token == null) {
emitter.onComplete();
}
if (JsonToken.START_ARRAY.equals(token) || JsonToken.END_ARRAY.equals(token)) {
return jsonParser;
}
if (JsonToken.START_OBJECT.equals(token)) {
emitter.onNext(reader.readValue(jsonParser));
}
return jsonParser;
}, JsonParser::close);
}
Edit: I need to control the emission of items so that I don't overload the memory or the function that processes the data, because that function reads from and writes to a database. The processing also needs to be sequential. The function that processes the data is not entirely mine; it is written in RxJava, and I am expected to use Rx.
I managed to solve it like this, but if there is another way, please let me know:
public static <T> Flowable<T> flowable(InputStream inputStream, JsonFactory jsonFactory, ObjectReader reader, Supplier<Boolean> booleanSupplier) {
return Flowable.generate(() -> jsonFactory.createParser(new GZIPInputStream(inputStream)), (jsonParser, emitter) -> {
if (booleanSupplier.get()) {
final var token = jsonParser.nextToken();
if (token == null) {
emitter.onComplete();
}
if (JsonToken.START_ARRAY.equals(token) || JsonToken.END_ARRAY.equals(token)) {
return jsonParser;
}
if (JsonToken.START_OBJECT.equals(token)) {
emitter.onNext(reader.readValue(jsonParser));
}
}
return jsonParser;
}, JsonParser::close);
}
Edit2: This is one of the ways I'm currently consuming the function
public Flowable<List<T>> paging(Function<List<T>, Single<List<T>>> function) {
final var atomicInteger = new AtomicInteger(0);
final var atomicBoolean = new AtomicBoolean(true);
return flowable(inputStream, jsonFactory, reader, atomicBoolean::get)
.buffer(pageSize)
.flatMapSingle(list -> {
final var counter = atomicInteger.addAndGet(1);
if (counter == numberOfPages) {
atomicBoolean.set(false);
}
return function.apply(list)
.doFinally(() -> {
if (atomicInteger.get() == numberOfPages) {
atomicInteger.set(0);
atomicBoolean.set(true);
}
});
});
}
I managed to solve it like this:
public static Flowable<Object> flowable(JsonParser jsonParser, ObjectReader reader, PublishProcessor<Boolean> valve) {
return Flowable.defer(() -> {
final var token = jsonParser.nextToken();
if (token == null) {
return Completable.fromAction(jsonParser::close)
.doOnError(Throwable::printStackTrace)
.onErrorComplete()
.andThen(Flowable.empty());
}
if (JsonToken.START_OBJECT.equals(token)) {
final var value = reader.readValue(jsonParser);
final var just = Flowable.just(value).compose(FlowableTransformers.valve(valve, true));
return Flowable.concat(just, flowable(jsonParser, reader, valve));
}
return flowable(jsonParser, reader, valve);
});
}
I have multiple async tasks running in Spring Boot. These tasks read an Excel file and insert all of its data into the database.
A task is started when a request is made from the front end. The front end then periodically requests the progress status of the task.
I need to track the progress of each of these tasks and know when they are completed.
This is the controller file that takes in requests for tasks and for polling their progress status:
public class TaskController {
@RequestMapping(method = RequestMethod.POST, value = "/uploadExcel")
public ResponseEntity<?> uploadExcel(String excelFilePath) {
String taskId = UUID.randomUUID().toString();
taskAsyncService.AsyncManager(taskId, excelFilePath);
HashMap<String, String> responseMap = new HashMap<>();
responseMap.put("taskId", taskId);
return new ResponseEntity<>(responseMap, HttpStatus.ACCEPTED);
}
// This will be polled to get progress of tasks being executed
@RequestMapping(method = RequestMethod.GET, value = "/tasks/progress/{id}")
public ResponseEntity<?> getTaskProgress(@PathVariable("id") String taskId) {
HashMap<String, String> map = new HashMap<>();
if (!taskAsyncService.containsTaskEntry(taskId)) {
map.put("Error", "TaskId does not exist");
return new ResponseEntity<>(map, HttpStatus.BAD_REQUEST);
}
boolean taskProgress = taskAsyncService.getTaskProgress(taskId);
if (taskProgress) {
map.put("message", "Task complete");
taskAsyncService.removeTaskProgressEntry(taskId);
return new ResponseEntity<>(map, HttpStatus.OK);
}
//Otherwise task is still running
map.put("progressStatus", "Task running");
return new ResponseEntity<>(map, HttpStatus.PARTIAL_CONTENT);
}
}
This is the code that executes the async tasks.
public class TaskAsyncService {
private final AtomicReference<ConcurrentHashMap<String, Boolean>> isTaskCompleteMap = new AtomicReference<>(new ConcurrentHashMap<>());
protected boolean containsTaskEntry(String taskId) {
if (isTaskCompleteMap.get().get(taskId) != null) {
return true;
}
return false;
}
protected boolean getTaskProgress(String taskId) {
return isTaskCompleteMap.get().get(taskId);
}
protected void removeTaskProgressEntry(String taskId) {
if (isTaskCompleteMap.get() != null) {
isTaskCompleteMap.get().remove(taskId);
}
}
@Async
public CompletableFuture<?> AsyncManager(String taskId, String excelFilePath) {
HashMap<String, String> map = new HashMap<>();
//Add a new entry into isTaskCompleteMap
isTaskCompleteMap.get().put(taskId, false);
//Insert excel rows into database
//Task completed set value to true
isTaskCompleteMap.get().put(taskId, true);
map.put("Success", "Task completed");
return CompletableFuture.completedFuture(map);
}
}
I am using AWS EC2 with a load balancer. Therefore, a polling request is sometimes handled by a newly spawned server that cannot access isTaskCompleteMap and responds with "TaskId does not exist".
How do I track the status of the tasks in this case? I understand I need a distributed data structure, but I don't know which kind or how to implement it.
You can use Hazelcast or a similar distributed store (Redis, etc.).
Hazelcast maps: https://docs.hazelcast.org/docs/3.0/manual/html/ch02.html#Map
Use a distributed map from Hazelcast instead of the ConcurrentHashMap.
A get on such a map should return the task status even if the task is being processed on another pod (server).
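One way to make that swap cheap is to let the service depend only on the java.util.concurrent.ConcurrentMap interface, which Hazelcast's IMap extends. The sketch below is an illustration under that assumption; TaskStatusStore and its method names are hypothetical, and the example is backed by a local ConcurrentHashMap so it runs without a cluster.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical store: it only knows the ConcurrentMap interface, so the backing map
// can be a local ConcurrentHashMap (single server) or a distributed map (cluster).
class TaskStatusStore {
    private final ConcurrentMap<String, Boolean> completedByTaskId;

    TaskStatusStore(ConcurrentMap<String, Boolean> backingMap) {
        this.completedByTaskId = backingMap;
    }

    void markStarted(String taskId) {
        completedByTaskId.put(taskId, Boolean.FALSE);
    }

    void markCompleted(String taskId) {
        completedByTaskId.put(taskId, Boolean.TRUE);
    }

    // Empty when the task id is unknown to the backing map
    Optional<Boolean> isCompleted(String taskId) {
        return Optional.ofNullable(completedByTaskId.get(taskId));
    }

    void remove(String taskId) {
        completedByTaskId.remove(taskId);
    }

    public static void main(String[] args) {
        TaskStatusStore store = new TaskStatusStore(new ConcurrentHashMap<>());
        store.markStarted("task-1");
        store.markCompleted("task-1");
        System.out.println(store.isCompleted("task-1")); // Optional[true]
    }
}
```

In a clustered deployment the constructor argument would presumably be something like hazelcastInstance.getMap("taskStatus"), where the map name is made up for this example. Every server then reads and writes the same entries.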
I am trying to figure out how to determine if all async HTTP GET requests I've made have completed, so that I can execute another method. For context, I have something similar to the code below:
public void init() throws IOException {
Map<String, CustomObject> mapOfObjects = new HashMap<String, CustomObject>();
ObjectMapper mapper = new ObjectMapper();
// some code to populate the map
mapOfObjects.forEach((k,v) -> {
HttpClient.asyncGet("https://fakeurl1.com/item/" + k, createCustomCallbackOne(k, mapper));
// HttpClient is just a wrapper class for your standard OkHTTP3 calls,
// e.g. client.newcall(request).enqueue(callback);
HttpClient.asyncGet("https://fakeurl2.com/item/" + k, createCustomCallbackTwo(k, mapper));
});
}
private Callback createCustomCallbackOne(String id, ObjectMapper mapper) {
return new Callback() {
@Override
public void onResponse(Call call, Response response) throws IOException {
if (response.isSuccessful()) {
try (ResponseBody body = response.body()) {
CustomObject co = mapOfObjects.get(id);
if (co != null) {
co.setFieldOne(mapper.readValue(body.byteStream(), FieldOne.class));
}
} // implicitly closes the response body
}
}
@Override
public void onFailure(Call call, IOException e) {
// log error
}
};
}
// createCustomCallbackTwo does more or less the same thing,
// just sets a different field and then performs another
// async GET in order to set an additional field
So what would be the best/correct way to monitor all these asynchronous calls to ensure they have completed and I can go about performing another method on the Objects stored inside the map?
The simplest way would be to keep a count of how many requests are 'in flight'. Increment it for each request enqueued and decrement it at the end of the callback. When the count reaches 0, all requests are done. Using a semaphore or a counting lock, you can wait for it to become 0 without polling.
Note that the callbacks run on separate threads, so you must provide some kind of synchronization.
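The counting idea can be sketched with java.util.concurrent.CountDownLatch, under the assumption that the number of requests is known before they are enqueued. InFlightDemo and runAndWait are hypothetical names, and a thread pool stands in for the HTTP client's callback threads.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class InFlightDemo {
    // Starts `requests` simulated async calls and blocks until every callback has run.
    static int runAndWait(int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // stand-in for callback threads
        CountDownLatch pending = new CountDownLatch(requests);  // one count per in-flight request
        AtomicInteger handled = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                try {
                    handled.incrementAndGet(); // the work done in onResponse/onFailure
                } finally {
                    pending.countDown();       // decrement on success and on failure
                }
            });
        }
        try {
            if (!pending.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for callbacks");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndWait(20));
    }
}
```

If callbacks enqueue further requests, as the second callback in the question does, the total is not known up front. In that case a java.util.concurrent.Phaser, which allows registering new parties dynamically, may be a better fit than a latch.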
If you want to create a new callback for every request, you could use something like this:
public class WaitableCallback implements Callback {
private boolean done;
private IOException exception;
private final Object[] signal = new Object[0];
@Override
public void onResponse(Call call, Response response) throws IOException {
...
synchronized (this.signal) {
done = true;
signal.notifyAll();
}
}
@Override
public void onFailure(Call call, IOException e) {
synchronized (signal) {
done = true;
exception = e;
signal.notifyAll();
}
}
public void waitUntilDone() throws InterruptedException {
synchronized (this.signal) {
while (!this.done) {
this.signal.wait();
}
}
}
public boolean isDone() {
synchronized (this.signal) {
return this.done;
}
}
public IOException getException() {
synchronized (this.signal) {
return exception;
}
}
}
Create an instance for every request and put it into e.g. a List<WaitableCallback> pendingRequests.
Then you can just wait for all requests to be done:
for ( WaitableCallback cb : pendingRequests ) {
cb.waitUntilDone();
}
// At this point, all requests have been processed.
However, you probably should not create a new identical callback object for every request. Callback's methods get the Call passed as parameter so that the code can examine it to figure out which request it is processing; and in your case, it seems you don't even need that. So use a single Callback instance for the requests that should be handled identically.
If the function asyncGet calls your function createCustomCallbackOne, then it's easy.
For each key you are calling two pages: "https://fakeurl1.com/item/" and "https://fakeurl2.com/item/" (the + k is left out here).
So you need a map to track that, and a single callback function is enough.
Use a map with a key identifying each call:
static final Map<String, Integer> trackerOfAsyncCalls = new HashMap<>();
public void init() throws IOException {
Map<String, CustomObject> mapOfObjects = new HashMap<String, CustomObject>();
//need to keep a track of the keys in some object
ObjectMapper mapper = new ObjectMapper();
trackerOfAsyncCalls.clear();
// some code to populate the map
mapOfObjects.forEach((k,v) -> {
HttpClient.asyncGet("https://fakeurl1.com/item/" + k, createCustomCallback(k,1 , mapper));
// HttpClient is just a wrapper class for your standard OkHTTP3 calls,
// e.g. client.newcall(request).enqueue(callback);
HttpClient.asyncGet("https://fakeurl2.com/item/" + k, createCustomCallback(k, 2, mapper));
trackerOfAsyncCalls.put(k + "-2", null);
});
}
//final important
private Callback createCustomCallback(final String idOuter, int which, ObjectMapper mapper) {
return new Callback() {
final String myId = idOuter + "-" + which;
// instance initializer: register the call as started but not yet finished
{ trackerOfAsyncCalls.put(myId, null); }
@Override
public void onResponse(Call call, Response response) throws IOException {
if (response.isSuccessful()) {
trackerOfAsyncCalls.put(myId, 1);
// or put this outside the if, if you don't care whether the call succeeded, failed, or partially succeeded...
Now set up a thread, or better a scheduler that is called every 5 seconds, to check all keys in mapOfObjects and trackerOfAsyncCalls and see whether every call has been started and has reached a final success, timeout, or error status.
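The periodic check could look roughly like the sketch below. CallTracker, Status and watch are hypothetical names. It uses a ConcurrentHashMap because the callbacks run on other threads, and an enum instead of null values because ConcurrentHashMap does not accept nulls.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class CallTracker {
    enum Status { STARTED, SUCCESS, FAILED }

    private final Map<String, Status> statusByCall = new ConcurrentHashMap<>();

    // Call before enqueueing a request, e.g. with id "key-1" or "key-2"
    void started(String callId) {
        statusByCall.put(callId, Status.STARTED);
    }

    // Call at the end of onResponse or onFailure
    void finished(String callId, boolean ok) {
        statusByCall.put(callId, ok ? Status.SUCCESS : Status.FAILED);
    }

    // True once at least one call is registered and none is still in STARTED state
    boolean allFinished() {
        return !statusByCall.isEmpty()
                && statusByCall.values().stream().noneMatch(s -> s == Status.STARTED);
    }

    // Checks every periodMillis; runs onDone once and then stops the scheduler
    ScheduledExecutorService watch(long periodMillis, Runnable onDone) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            if (allFinished()) {
                onDone.run();
                scheduler.shutdown(); // cancels further periodic runs
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

All calls should be registered with started before watch is invoked, otherwise the check can report completion while requests are still being enqueued. A timeout status is not modelled here.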
I am using Spring Framework's StringRedisTemplate to update an entry, and the update happens from multiple threads.
public void processSubmission(final String key, final Map<String, String> submissionDTO) {
final String hashKey = String.valueOf(Hashing.MURMUR_HASH.hash(key));
this.stringRedisTemplate.expire(key, 60, TimeUnit.MINUTES);
final HashOperations<String, String, String> ops = this.stringRedisTemplate.opsForHash();
Map<String, String> data = findByKey(key);
String json;
if (data != null) {
data.putAll(submissionDTO);
json = convertSubmission(data);
} else {
json = convertSubmission(submissionDTO);
}
ops.put(key, hashKey, json);
}
The JSON entry looks like this:
key (assignmentId) -> value (submissionId, status)
As seen in the code, before updating the cache entry I fetch the current entry, add the new entry, and put them all back. But since this operation can run on multiple threads, a race condition can lead to lost data. I could synchronize the method above, but then it would become a bottleneck for the parallel processing of the RxJava implementation, where the processSubmission method is called via RxJava on two asynchronous threads.
class ProcessSubmission{
@Override
public Observable<Boolean> processSubmissionSet1(List<Submission> submissionList, HttpHeaders requestHeaders) {
return Observable.create(observer -> {
for (final Submission submission : submissionList) {
//Cache entry insert method invoke via this call
final Boolean status = processSubmissionExecutor.processSubmission(submission, requestHeaders);
observer.onNext(status);
}
observer.onCompleted();
});
}
@Override
public Observable<Boolean> processSubmissionSet2(List<Submission> submissionList, HttpHeaders requestHeaders) {
return Observable.create(observer -> {
for (final Submission submission : submissionList) {
//Cache entry insert method invoke via this call
final Boolean status = processSubmissionExecutor.processSubmission(submission, requestHeaders);
observer.onNext(status);
}
observer.onCompleted();
});
}
}
The above is called from the service API below.
class MyService{
public void handleSubmissions(){
final Observable<Boolean> statusObser1 = processSubmission.processSubmissionSet1(subListDtos.get(0), requestHeaders)
.subscribeOn(Schedulers.newThread());
final Observable<Boolean> statusObser2 = processSubmission.processSubmissionSet2(subListDtos.get(1), requestHeaders)
.subscribeOn(Schedulers.newThread());
statusObser1.subscribe();
statusObser2.subscribe();
}
}
So handleSubmissions is called by multiple threads, one per assignment id. Each of those calling threads then creates two RxJava threads, which process the submission lists associated with that assignment.
What would be the best approach to prevent the Redis entry race condition while keeping the performance of the RxJava implementation? Is there a more efficient way to do this Redis operation?
It looks like you're only using the ops variable to do a put operation at the end, so you could isolate that point, which is where you need to synchronize.
In the short research that I did, I couldn't determine whether HashOperations is already thread-safe.
But an example of how you could isolate just the part you're concerned about is to do something like:
public void processSubmission(final String key, final Map<String, String> submissionDTO) {
final String hashKey = String.valueOf(Hashing.MURMUR_HASH.hash(key));
this.stringRedisTemplate.expire(key, 60, TimeUnit.MINUTES);
Map<String, String> data = findByKey(key);
String json;
if (data != null) {
data.putAll(submissionDTO);
json = convertSubmission(data);
} else {
json = convertSubmission(submissionDTO);
}
putThreadSafeValue(key, hashKey, json);
}
And have a method that is synchronized just for the put operation:
private synchronized void putThreadSafeValue(String key, String hashKey, String json) {
final HashOperations<String, String, String> ops = this.stringRedisTemplate.opsForHash();
ops.put(key, hashKey, json);
}
There are a number of ways to do this, but it looks like you could restrict the thread contention down to that put operation.
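One caveat worth keeping in mind: serializing only the put still leaves a window between findByKey and the put in which two threads can read the same old value, so one update can overwrite the other. A possible way to close that window without a global lock is to make the whole read-modify-write atomic per key. The sketch below shows the idea with ConcurrentHashMap.compute, using an in-memory map as a stand-in for the Redis hash. SubmissionCache and merge are hypothetical names, and the approach only protects threads inside a single JVM.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SubmissionCache {
    // Stand-in for the Redis hash: assignment key -> (submissionId -> status)
    private final ConcurrentHashMap<String, Map<String, String>> store = new ConcurrentHashMap<>();

    // The read-modify-write for one key runs atomically; updates to other keys are not blocked by it.
    void merge(String key, Map<String, String> submission) {
        store.compute(key, (k, current) -> {
            Map<String, String> merged = current == null ? new HashMap<>() : new HashMap<>(current);
            merged.putAll(submission);
            return merged;
        });
    }

    Map<String, String> get(String key) {
        return store.getOrDefault(key, Collections.emptyMap());
    }
}
```

If several application instances write to the same Redis key, the atomicity has to come from Redis itself. One option, assuming the data layout can change, might be to store each submission as its own hash field with ops.putAll(key, submissionDTO), which needs no read before the write.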
I am writing a controller that I need to make asynchronous. How can I deal with a list of ListenableFuture? I have a list of URLs to which I need to send GET requests one by one; what is the best solution for this?
@RequestMapping(value = "/repositories", method = RequestMethod.GET)
private void getUsername(@RequestParam(value = "username") String username) {
System.out.println(username);
List<ListenableFuture> futureList = githubRestAsync.getRepositoryLanguages(username);
System.out.println(futureList.size());
}
In the service I use List<ListenableFuture>, which does not seem to work: because the calls are asynchronous, the controller method cannot get the size of futureList to run a for loop over it and attach the callbacks.
public List<ListenableFuture> getRepositoryLanguages(String username){
return getRepositoryLanguages(username, getUserRepositoriesFuture(username));
}
private ListenableFuture getUserRepositoriesFuture(String username) throws HttpClientErrorException {
HttpEntity entity = new HttpEntity(httpHeaders);
ListenableFuture future = restTemplate.exchange(githubUsersUrl + username + "/repos", HttpMethod.GET, entity, String.class);
return future;
}
private List<ListenableFuture> getRepositoryLanguages(final String username, ListenableFuture<ResponseEntity<String>> future) {
final List<ListenableFuture> futures = new ArrayList<>();
future.addCallback(new ListenableFutureCallback<ResponseEntity<String>>() {
@Override
public void onSuccess(ResponseEntity<String> response) {
ObjectMapper mapper = new ObjectMapper();
try {
repositories = mapper.readValue(response.getBody(), new TypeReference<List<Repositories>>() {
});
HttpEntity entity = new HttpEntity(httpHeaders);
System.out.println("Repo size: " + repositories.size());
for (int i = 0; i < repositories.size(); i++) {
futures.add(restTemplate.exchange(githubReposUrl + username + "/" + repositories.get(i).getName() + "/languages", HttpMethod.GET, entity, String.class));
}
} catch (IOException e) {
e.printStackTrace();
}
}
@Override
public void onFailure(Throwable throwable) {
System.out.println("FAILURE in getRepositoryLanguages: " + throwable.getMessage());
}
});
return futures;
}
Should I use something like ListenableFuture<List> instead of List<ListenableFuture> ?
It seems like you have a List<ListenableFuture<Result>>, but you want a ListenableFuture<List<Result>>, so you can take one action when all of the futures are complete.
public static <T> ListenableFuture<List<T>> allOf(final List<? extends ListenableFuture<? extends T>> futures) {
// we will return this ListenableFuture, and modify it from within callbacks on each input future
final SettableListenableFuture<List<T>> groupFuture = new SettableListenableFuture<>();
// use a defensive shallow copy of the futures list, to avoid errors that could be caused by
// someone inserting/removing a future from `futures` list after they call this method
final List<? extends ListenableFuture<? extends T>> futuresCopy = new ArrayList<>(futures);
// Count the number of completed futures with an AtomicInt (to avoid race conditions)
final AtomicInteger resultCount = new AtomicInteger(0);
for (int i = 0; i < futuresCopy.size(); i++) {
futuresCopy.get(i).addCallback(new ListenableFutureCallback<T>() {
@Override
public void onSuccess(final T result) {
int thisCount = resultCount.incrementAndGet();
// if this is the last result, build the ArrayList and complete the GroupFuture
if (thisCount == futuresCopy.size()) {
List<T> resultList = new ArrayList<T>(futuresCopy.size());
try {
for (ListenableFuture<? extends T> future : futuresCopy) {
resultList.add(future.get());
}
groupFuture.set(resultList);
} catch (Exception e) {
// this should never happen, but future.get() forces us to deal with this exception.
groupFuture.setException(e);
}
}
}
@Override
public void onFailure(final Throwable throwable) {
groupFuture.setException(throwable);
// if one future fails, don't waste effort on the others
for (ListenableFuture future : futuresCopy) {
future.cancel(true);
}
}
});
}
return groupFuture;
}
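For comparison, the same "list of futures to future of list" conversion can be sketched with the JDK's CompletableFuture, assuming the individual futures are available as CompletableFuture instances. FutureLists is a hypothetical helper name. Unlike the code above, this version does not cancel the remaining futures when one of them fails.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class FutureLists {
    // Completes with all results in input order once every input future has completed;
    // completes exceptionally if any input future failed.
    static <T> CompletableFuture<List<T>> allOf(List<CompletableFuture<T>> futures) {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
        // join() does not block here because every future is already complete
        return all.thenApply(ignored ->
                futures.stream().map(CompletableFuture::join).collect(Collectors.toList()));
    }
}
```

A caller can then attach a single callback, for example FutureLists.allOf(futures).thenAccept(results -> ...), instead of looping over the individual futures.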
I'm not sure whether you are starting a new project or working on a legacy one, but if your main requirement is a non-blocking, asynchronous REST service, I would suggest having a look at the upcoming Spring Framework 5 and its integration with Reactive Streams. In particular, Spring 5 will allow you to create fully reactive, asynchronous web services with little code.
For example, a fully functional version of your code can be written with this small snippet:
@RestController
public class ReactiveController {
@GetMapping(value = "/repositories")
public Flux<String> getUsername(@RequestParam(value = "username") String username) {
WebClient client = WebClient.create(new ReactorClientHttpConnector());
ClientRequest<Void> listRepoRequest = ClientRequest.GET("https://api.github.com/users/{username}/repos", username)
.accept(MediaType.APPLICATION_JSON).header("user-agent", "reactive.java").build();
return client.exchange(listRepoRequest).flatMap(response -> response.bodyToFlux(Repository.class)).flatMap(
repository -> client
.exchange(ClientRequest
.GET("https://api.github.com/repos/{username}/{repo}/languages", username,
repository.getName())
.accept(MediaType.APPLICATION_JSON).header("user-agent", "reactive.java").build())
.map(r -> r.bodyToMono(String.class)))
.concatMap(Flux::merge);
}
static class Repository {
private String name;
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
}
}
To run this code locally, just clone spring-boot-starter-web-reactive and copy the code into it.
The result is something like {"Java":50563,"JavaScript":11541,"CSS":1177}{"Java":50469}{"Java":130182}{"Shell":21222,"Makefile":7169,"JavaScript":1156}{"Java":30754,"Shell":7058,"JavaScript":5486,"Batchfile":5006,"HTML":4865}. You can still map it to something more usable in an asynchronous way :)