CompletableFuture in a loop construct in a private Ethereum blockchain - Java

I have a private Ethereum blockchain set up with 5 machines mining on it. The size of the blockchain [number of blocks] is, as of now, 300. The processing is done on the back end in Java.
I need to run the following loop construct in an asynchronous manner. The bottleneck of the loop is the execution of the following command:
EthBlock eb = web3.ethGetBlockByNumber(new DefaultBlockParameterNumber(BigInteger.valueOf(i)), true).send();
The command can also return a CompletableFuture<EthBlock> object by ending it with sendAsync(), as described here: https://github.com/web3j/web3j#start-sending-requests Just calling sendAsync().get() removes the parallelism aspect and makes it behave synchronously.
public void businessLogic() throws Exception {
    recentBlocks = new ArrayList<EthBlock.Block>();
    for (long i = 1; i <= 300000; i++) {
        EthBlock eb = web3.ethGetBlockByNumber(new DefaultBlockParameterNumber(BigInteger.valueOf(i)), true).send();
        if (eb == null || eb.getBlock() == null) {
            continue;
        }
        EthBlock.Block block = eb.getBlock();
        recentBlocks.add(block);
    }
}
I am not able to grasp the intuition of translating the code into a form CompletableFuture can operate on. The goal is to 'group up' multiple calls to web3.ethGetBlockByNumber(...).sendAsync() into a collection and run them all at once to fill a list of EthBlock objects, i.e. recentBlocks.
This is what I came up with:
public void businessLogic() throws Exception {
    recentBlocks = new ArrayList<EthBlock.Block>();
    List<CompletableFuture<EthBlock>> compFutures = new ArrayList<>();
    for (long i = 0; i <= 300000; i++) {
        CompletableFuture<EthBlock> compFuture = web3.ethGetBlockByNumber(new DefaultBlockParameterNumber(BigInteger.valueOf(i)), true).sendAsync();
        compFuture.thenAcceptAsync(eb -> { // Doesn't look right
            EthBlock.Block block = eb.getBlock();
            recentBlocks.add(block);
        });
        compFutures.add(compFuture);
    }
    CompletableFuture.allOf(compFutures.toArray(new CompletableFuture[0])).get();
}
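A variant that avoids mutating recentBlocks from several callback threads (a sketch only, assuming the same web3 client and fields as above) waits on allOf first and collects the results afterwards on the calling thread:
public void businessLogic() throws Exception {
    List<CompletableFuture<EthBlock>> compFutures = new ArrayList<>();
    for (long i = 1; i <= 300000; i++) {
        // sendAsync() fires the request without blocking and hands back a CompletableFuture
        compFutures.add(web3.ethGetBlockByNumber(new DefaultBlockParameterNumber(BigInteger.valueOf(i)), true).sendAsync());
    }
    // Block once, until every request has completed (join() rethrows if any request failed)
    CompletableFuture.allOf(compFutures.toArray(new CompletableFuture[0])).join();
    // Every future is already done, so the join() calls below return immediately
    recentBlocks = compFutures.stream()
            .map(CompletableFuture::join)
            .filter(Objects::nonNull)
            .map(EthBlock::getBlock)
            .filter(Objects::nonNull)
            .collect(Collectors.toList());
}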
Implementing IntStream
long start = System.nanoTime();
recentBlocks = IntStream.rangeClosed(0, 300_000)
        .parallel()
        .mapToObj(i -> {
            try {
                System.out.println("Current Thread -> " + Thread.currentThread());
                return web3.ethGetBlockByNumber(new DefaultBlockParameterNumber(BigInteger.valueOf(i)), true).send();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
            return null;
        })
        .filter(Objects::nonNull)
        .map(EthBlock::getBlock)
        .filter(Objects::nonNull)
        .collect(Collectors.toList());
long stop = System.nanoTime();
System.out.println("Time Elapsed: " + TimeUnit.MICROSECONDS.convert(stop - start, TimeUnit.NANOSECONDS));

You might be able to benefit from a parallel stream instead of relying on CompletableFuture, assuming the order of the resulting List isn't important:
IntStream.rangeClosed(0, 300_000)
        .parallel()
        .mapToObj(i -> {
            try {
                return web3.ethGetBlockByNumber(new DefaultBlockParameterNumber(BigInteger.valueOf(i)), true).send();
            } catch (IOException e) {
                // send() declares IOException, so it has to be handled inside the lambda
                return null;
            }
        })
        .filter(Objects::nonNull)
        .map(EthBlock::getBlock)
        .filter(Objects::nonNull)
        .collect(Collectors.toList());
Because you stated that didn't help, let's try an ExecutorService that utilizes a cached thread pool instead:
List<EthBlock.Block> blocks = Collections.synchronizedList(new ArrayList<>(300_000));
ExecutorService service = Executors.newCachedThreadPool();
for (int i = 0; i <= 300_000; i++) {
    BigInteger number = BigInteger.valueOf(i);
    service.execute(() -> {
        try {
            EthBlock eb = web3.ethGetBlockByNumber(new DefaultBlockParameterNumber(number), true).send();
            if (eb == null) {
                return;
            }
            EthBlock.Block block = eb.getBlock();
            if (block != null) {
                blocks.add(block);
            }
        } catch (IOException e) {
            // send() declares IOException, so it has to be handled inside the task
            e.printStackTrace();
        }
    });
}
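The snippet above only submits the tasks; to block until they have all run, one option (a sketch, with an arbitrary 10-minute upper bound) is to shut the pool down and await termination:
service.shutdown(); // no new tasks will be accepted
if (!service.awaitTermination(10, TimeUnit.MINUTES)) { // arbitrary timeout for this sketch
    service.shutdownNow(); // give up and interrupt whatever is still running
}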

CompletableFuture also provides an overloaded get: get(long timeout, TimeUnit unit). You can use it so the get call times out if no result arrives within the given time.
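For illustration (compFuture is the future from the question's snippet; the 30-second timeout is an arbitrary value):
EthBlock eb = compFuture.get(30, TimeUnit.SECONDS); // throws TimeoutException if no result arrives within 30 seconds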

Related

Reactive: executing many .existsById() then executing some operations on collected data

I have many reactive .existsById() operations to perform, and then, on the collected list, I want to execute some operations once.
The problem is that I don't know which reactive operators I should use, or how to rewrite this to achieve the desired effect.
Every attempt I made ended up executing only one part of the code.
First try, where the non-reactive code wasn't waiting on the Cassandra repository:
//First part:
signaturesFromFile.forEach(signatureFromFile -> {
    SignatureKey signatureKey = new SignatureKey(signatureFromFile.getProfileId(), signatureFromFile.getUserId(), signatureFromFile.getId());
    signatureRepositoryCassandra.existsById(signatureKey)
        .map(exists -> {
            if (!exists) {
                notSavedSignatures.add(signatureFromFile);
            }
            return null;
        }).subscribe();
});
//Second part:
String notSavedSignaturesListJson = convertSignaturesToJson(notSavedSignatures);
String pathToNotSavedSigantures = NOT_TRANSFERRED_SIGNATURES_DIRECTORY + signatureFilename;
saveIfFileNotExist(getUserIdFromFileName(signatureFilename), pathToNotSavedSigantures, notSavedSignaturesListJson);
deleteTransferredByUserIdSignaturesFile(signatureFilename);
The second idea was to wrap it all in one reactive stream, but the problem recurred with the roles reversed: the operations in the second part weren't executed.
Flux.fromIterable(signaturesFromFile).map(signatureFromFile -> {
    SignatureKey signatureKey = new SignatureKey(signatureFromFile.getProfileId(), signatureFromFile.getUserId(), signatureFromFile.getId());
    //First part:
    signatureRepositoryCassandra.existsById(signatureKey)
        .map(exists -> {
            if (!exists) {
                notSavedSignatures.add(signatureFromFile);
            }
            return null;
        }).subscribe();
    return null;
}).mergeWith(e -> {
    //Second part:
    String notSavedSignaturesListJson = convertSignaturesToJson(notSavedSignatures);
    String pathToNotSavedSigantures = NOT_TRANSFERRED_SIGNATURES_DIRECTORY + signatureFilename;
    saveIfFileNotExist(getUserIdFromFileName(signatureFilename), pathToNotSavedSigantures, notSavedSignaturesListJson);
    deleteTransferredByUserIdSignaturesFile(signatureFilename);
}).subscribe();
I have a workaround for that problem:
.findAll() and filter the data instead of executing many .existsById() calls,
but I would like to solve it with the correct operators ;)
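One possible shape (a sketch, untested; it assumes existsById returns Mono<Boolean> and reuses the helpers from the snippets above) keeps everything in one pipeline with filterWhen and collectList, so the second part runs only after all checks complete:
Flux.fromIterable(signaturesFromFile)
    .filterWhen(signatureFromFile -> {
        SignatureKey signatureKey = new SignatureKey(signatureFromFile.getProfileId(), signatureFromFile.getUserId(), signatureFromFile.getId());
        // keep only the signatures that do not exist in Cassandra yet
        return signatureRepositoryCassandra.existsById(signatureKey).map(exists -> !exists);
    })
    .collectList()
    .doOnNext(notSaved -> {
        // second part: runs once, after all existence checks have completed
        String notSavedSignaturesListJson = convertSignaturesToJson(notSaved);
        String pathToNotSavedSigantures = NOT_TRANSFERRED_SIGNATURES_DIRECTORY + signatureFilename;
        saveIfFileNotExist(getUserIdFromFileName(signatureFilename), pathToNotSavedSigantures, notSavedSignaturesListJson);
        deleteTransferredByUserIdSignaturesFile(signatureFilename);
    })
    .subscribe();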

Kafka Streams. How to emit the final result in an aggregation window with suppressor

To my huge surprise, I realized that the "suppress" operator does not emit the last event at window close, but only when another event is published on the partition the stream's tasks belong to. So, how can the final aggregate result be emitted without a never-ending event stream?
In a CDC pattern we cannot wait for a subsequent database operation, which could take place after a long time, to emit the final result of the previous aggregation.
The idea is to schedule a FutureTask to send a particular event, say with a fixed "FLUSH" key, whose timestamp falls outside the previous aggregation window. This "FLUSH" event will then be filtered out after the suppress step.
For every record peeked from the stream, a "FLUSH" event is scheduled, replacing any previously scheduled one that has not yet started, to minimize unnecessary "FLUSH" events.
In this example I used a tumbling window, but conceptually it works with other window kinds too.
So, let's suppose we have a topic "user" and want to aggregate into a list all records falling in a 10-second tumbling window.
The model classes are:
User.java
public class User {
    private String name;
    private String surname;
    private String timestamp;
}
UserGrouped.java
public class UserGrouped {
    private List<User> userList = new ArrayList<User>();
}
Topology
...
KStream<String, User> userEvents = builder.stream(userTopic, userConsumerOptions);

TimeWindows tumblingWindow = TimeWindows.of(Duration.ofSeconds(windowDuration))
        .grace(Duration.ofSeconds(windowGracePeriod));

KStream<String, UserGrouped> userGroupedStream = userEvents
        .peek((key, value) -> {
            // Filter out the previous "flush" event to avoid a scheduler loop
            if (!key.equalsIgnoreCase("FLUSH")) {
                // For each event a future task is scheduled that
                // will send a "flush" event to all partitions assigned to the stream.
                scheduleFlushEvent(value.getTimestamp());
            }
        })
        .groupByKey()
        .windowedBy(tumblingWindow)
        .aggregate(
            // INITIALIZER
            () -> new UserGrouped(),
            // AGGREGATOR
            (key, user, userGrouped) -> {
                userGrouped.getUserList().add(user);
                return userGrouped;
            },
            // STREAM STORE
            Materialized.<String, UserGrouped, WindowStore<Bytes, byte[]>>
                    as("userGroupedWindowStore")
                    .withKeySerde(Serdes.String())
                    .withValueSerde(JsonSerdes.UserGrouped()) // Custom Serdes
        )
        .suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded().shutDownWhenFull()))
        .toStream((windowedKey, value) -> windowedKey.key())
        // Discard the flush event
        .filterNot((key, value) -> key.equalsIgnoreCase("FLUSH"))
        .peek((key, value) -> {
            int sizeList = value != null && value.getUserList() != null ? value.getUserList().size() : 0;
            log.info("#### USER GROUPED KEY: {}, Num of elements: {}", key, sizeList);
        });
The scheduler method
private void scheduleFlushEvent(String lastEventTimestamp) {
    // add 1 second to (windowDuration + windowGracePeriod) to ensure that the flush event will be outside the last window
    Long delay = Long.valueOf(windowDuration + windowGracePeriod + 1);

    // FIND THE PARTITIONS ASSIGNED TO THE CURRENT STREAM.
    // The assigned partitions may change after rebalance events,
    // so I need to get them in every iteration.
    // In a Spring context you can use a RebalanceListener to update a 'partitionList'
    // field of this class defined with the @Component annotation
    Set<Integer> partitionList = new HashSet<Integer>();
    StreamThread currThread = (StreamThread) Thread.currentThread();
    for (TaskMetadata taskMetadata : currThread.threadMetadata().activeTasks()) {
        for (TopicPartition topicPartition : taskMetadata.topicPartitions()) {
            partitionList.add(topicPartition.partition());
        }
    }

    Callable<List<RecordMetadata>> task = () -> {
        try {
            List<RecordMetadata> recordMetadataList = new ArrayList<RecordMetadata>();
            Instant instant = Instant.from(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ")
                    .parse(lastEventTimestamp));
            instant = instant.plusSeconds(delay);
            String flushEventTimestamp = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ")
                    .withZone(ZoneId.systemDefault())
                    .format(instant);
            User userFlush = new User();
            userFlush.setTimestamp(flushEventTimestamp);
            String userFlushValue = new String(JsonSerdes.User().serializer().serialize(userTopic, userFlush));
            // SEND THE FLUSH EVENT TO ALL PARTITIONS ASSIGNED TO THE STREAM THREAD
            for (Integer partition : partitionList) {
                ProducerRecord<String, String> userRecord = new ProducerRecord<String, String>(userTopic, partition, "FLUSH", userFlushValue);
                RecordMetadata recordMetadata = userFlushProducer.send(userRecord).get();
                recordMetadataList.add(recordMetadata);
                log.info("SENT FLUSH EVENT PARTITION: {}, VALUE: {}", partition, userFlushValue);
            }
            return recordMetadataList;
        } catch (Exception e) {
            log.error("ERROR", e);
            return null;
        }
    };

    // TASK NOT SCHEDULED YET
    if (scheduledFuture == null || scheduledFuture.isDone()) {
        log.debug("task scheduled");
        scheduledFuture = ses.schedule(task, delay, TimeUnit.SECONDS);
    // TASK ALREADY SCHEDULED.
    // Cancel the previously scheduled task and schedule a new one with a postponed delay
    } else {
        if (!scheduledFuture.isDone() && scheduledFuture.cancel(false)) {
            log.debug("task RE-scheduled");
            scheduledFuture = ses.schedule(task, delay, TimeUnit.SECONDS);
        } else {
            log.warn("task not RE-scheduled");
        }
    }
}

Getting the line number of the Mono/Flux that returned Mono.empty()

Let's say I have a long chain of Monos. Some monos in the chain might return Mono.empty().
I can recover with switchIfEmpty, but I'd like to know which mono raised the empty (maybe so I can know where to add smarter empty handling).
Is there a way to programmatically get this information?
Silly example: in cases where I return "how did I get here?", how can I know whether the first flatMap or the second flatMap triggered the empty handler?
Mono.just("data")
.flatMap(t -> {
if (System.currentTimeMillis() % 2 == 0) {
return Mono.empty();
}
return Mono.just("happy1");
})
.flatMap(t -> {
if (System.currentTimeMillis() % 2 == 0) {
return Mono.empty();
}
return Mono.just("happy2");
})
.map(s -> {
return "successful complete: " + s;
})
.switchIfEmpty(Mono.fromCallable(() -> {
return "how did I get here?";
}))
.block();
Due to the dynamic nature of Flux and Mono, and to the fact that the onComplete signal is considered neutral enough that it is usually just passed through, there is no generic solution for this.
In your particular example, you could replace the Mono.empty() with something like Mono.empty().doOnComplete(() -> /* log something */).
You could even directly perform the logging in the if block, but the decorated empty trick is probably adaptable to more situations.
Another possibility is to turn emptiness into an error, rather than switching on the onComplete signal.
Errors are less neutral, so there are ways to enrich them for debugging purposes. For instance, with a .checkpoint("flatMapX") statement after each flatMap, you'd get additional stacktrace parts pointing to the flatMap that failed due to emptiness.
A way of turning emptiness to error in Mono is .single(), which will enforce exactly one onNext() or propagate onError(NoSuchElementException).
One thing to keep in mind with this trick is that the placement of checkpoint matters: it MUST be AFTER the single() so that the error raised from the single() gets detected and enriched.
So if I build on your snippet:
static final String PARSEABLE_MARKER = "PARSEABLE MARKER: <";
static final char MARKER_END = '>';

String parseLocation(Exception e) {
    StringWriter sw = new StringWriter();
    PrintWriter pw = new PrintWriter(sw);
    e.printStackTrace(pw);
    String trace = sw.toString();
    int start = trace.indexOf(PARSEABLE_MARKER);
    if (start > 0) {
        trace = trace.substring(start + PARSEABLE_MARKER.length());
        trace = trace.substring(0, trace.indexOf(MARKER_END));
        return trace;
    }
    return "I don't know";
}

String testInner() {
    Random random = new Random();
    final boolean first = random.nextBoolean();
    return Mono.just("data")
        .flatMap(t -> {
            if (System.currentTimeMillis() % 2 == 0 && first) {
                return Mono.empty();
            }
            return Mono.just("happy1");
        })
        .single()
        .checkpoint(PARSEABLE_MARKER + "the first flatMap" + MARKER_END)
        .flatMap(t -> {
            if (System.currentTimeMillis() % 2 == 0 && !first) {
                return Mono.empty();
            }
            return Mono.just("happy2");
        })
        .single()
        .checkpoint(PARSEABLE_MARKER + "the second flatMap" + MARKER_END)
        .map(s -> {
            return "successful complete: " + s;
        })
        .onErrorResume(NoSuchElementException.class, e ->
            Mono.just("how did I get here? " + parseLocation(e)))
        .block();
}
This can be run in a loop in a test for instance:
@Test
void test() {
    int successCount = 0;
    int firstCount = 0;
    int secondCount = 0;
    for (int i = 0; i < 100; i++) {
        String message = testInner();
        if (message.startsWith("how")) {
            if (message.contains("first")) {
                firstCount++;
            } else if (message.contains("second")) {
                secondCount++;
            } else {
                System.out.println(message);
            }
        } else {
            successCount++;
        }
    }
    System.out.printf("Stats: %d successful, %d detected first, %d detected second", successCount, firstCount, secondCount);
}
Which prints something like:
Stats: 85 successful, 5 detected first, 10 detected second

Process List of entities using completable futures

I have a list of entities of type T. I also have a functional interface that acts as a supplier, with a method that performs a task on an entity and returns a result R. It looks like:
R performTask(T entity) throws Exception
I want to collect both the successful results and the errors/exceptions into separate maps. The code I wrote here is taking too long; kindly suggest what can be done.
I am looping over the list of entities and then processing their CompletableFutures one by one, which I think is not the right way to do it. Can you suggest what can be done here?
private void updateResultAndExceptionMaps(List<T> entities, final TaskProcessor<T, R> taskProcessor) {
    ExecutorService executor = createExecutorService();
    Map<T, R> outputMap = Collections.synchronizedMap(new HashMap<T, R>());
    Map<T, Exception> errorMap = new ConcurrentHashMap<T, Exception>();
    try {
        entities.stream()
            .forEach(entity -> CompletableFuture.supplyAsync(() -> {
                    try {
                        return taskProcessor.performTask(entity);
                    } catch (Exception e) {
                        errorMap.put(entity, (Exception) e.getCause());
                        LOG.error("Error processing entity Exception: " + entity, e);
                    }
                    return null;
                }, executor)
                .exceptionally(throwable -> {
                    errorMap.put(entity, (Exception) throwable);
                    LOG.error("Error processing entity Throwable: " + entity, throwable);
                    return null;
                })
                .thenAcceptAsync(R -> outputMap.put(entity, R))
                .join()
            ); // end of for-each
        LOG.info("outputMap Map -> " + outputMap);
        LOG.info("errorMap Map -> " + errorMap);
    } catch (Exception ex) {
        LOG.warn("Error: " + ex, ex);
    } finally {
        executor.shutdown();
    }
}
outputMap should contain the entity and the result R.
errorMap should contain the entity and the Exception.
This is because you iterate over the list of entities one by one, creating a CompletableFuture object and immediately blocking the iteration with the join method, which waits until the given processor finishes its work or throws an exception. You can get full multithreading by converting each entity to a CompletableFuture, collecting all the CompletableFuture instances, and only after that waiting for all of them by invoking join on each.
The code below should do the trick in your case:
entities.stream()
    .map(entity -> CompletableFuture.supplyAsync(() -> {
            try {
                return taskProcessor.performTask(entity);
            } catch (Exception e) {
                errorMap.put(entity, (Exception) e.getCause());
            }
            return null;
        }, executor)
        .exceptionally(throwable -> {
            errorMap.put(entity, (Exception) throwable);
            return null;
        })
        .thenAcceptAsync(R -> outputMap.put(entity, R))
    ).collect(Collectors.toList())
    .forEach(CompletableFuture::join);
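An alternative worth considering (a sketch, not part of the original answer) folds the success and failure handling into a single handle stage, so exceptionally and thenAcceptAsync are not needed and a failed task never puts a null result into outputMap:
entities.stream()
    .map(entity -> CompletableFuture.supplyAsync(() -> {
            try {
                return taskProcessor.performTask(entity);
            } catch (Exception e) {
                // rethrow so the failure travels through the future instead of being swallowed
                throw new CompletionException(e);
            }
        }, executor)
        .handle((result, throwable) -> {
            if (throwable != null) {
                // throwable is the CompletionException; its cause is the original exception
                errorMap.put(entity, (Exception) throwable.getCause());
            } else {
                outputMap.put(entity, result);
            }
            return null;
        })
    ).collect(Collectors.toList())
    .forEach(CompletableFuture::join);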

cassandra driver: return set of failures in list of futures

I've got a list of futures that perform data deletion for a given list of studentIds from Cassandra:
val studentIds: List<String> = getStudentIds(...)
val boundStatements: List<BoundStatement> = studentIds.map { bindStudentDelete(it) }
val deleteFutures = boundStatements.map { session.executeAsync(it) }
deleteFutures.forEach {
    // callback that will send metrics for monitoring
    Futures.addCallback(it, MyCallback(...))
}
Above I have registered a callback MyCallback(...) for each future for sending metrics. Then I do:
Futures.inCompletionOrder(deleteFutures).forEach { it.get() }
to wait for the completion of all the deletes. If for any reason some of the futures end up failing (cancelled, something else goes wrong, etc.), I want to return the list of studentIds so that I can deal with them later.
What is the best way to achieve that?
EDIT
The callback could be a way to mutate state to track the success/failure of all the deletions.
class MyCallback(
    private val statsDClient: StatsdClient,
    private val tags: Array<String>,
    val failures: MutableList<String>
) : FutureCallback<Any> {
    override fun onSuccess(result: Any?) {
        // send success metrics
        ...
    }

    override fun onFailure(t: Throwable) {
        // send failure metrics
        ...
        // do something here to get the associated studentId
        val currId = ...
        failures.add(currId)
    }
}
Similarly, I could mutate state in the Futures.inCompletionOrder(deleteFutures).forEach block with a try/catch:
val failedDeletes = mutableListOf<String>()
Futures.inCompletionOrder(deleteFutures).forEach {
    try {
        it.get()
    } catch (e: Exception) {
        // do something to get the studentId for this future
        val currId = ...
        failedDeletes.add(currId)
    }
}
However, there are two things I don't like or am unsure about. One is that it mutates state that has to be defined outside. The other is that I still don't know how to get the studentId at the point of failure (in the onFailure or catch block).
I have added a code snippet below in Java. This is a blocking procedure.
ResultSet getUninterruptibly()
Waits for the query to return and return its result. This method is
usually more convenient than Future.get() because it:
Waits for the result uninterruptibly, and so doesn't throw InterruptedException.
Returns meaningful exceptions, instead of having to deal with ExecutionException.
As such, it is the preferred way to get the future result.
Check this link:
Interface ResultSetFuture
List<ResultSetFuture> futures = new ArrayList<>();
List<Long> futureStudentIds = new ArrayList<>();
// List<Long> successfulIds = new ArrayList<>();
List<Long> unsuccessfulIds = new ArrayList<>();
for (long studentId : studentIds) {
    futures.add(session.executeAsync(statement.deleteStudent(studentId)));
    futureStudentIds.add(studentId);
}
for (int index = 0; index < futures.size(); index++) {
    try {
        futures.get(index).getUninterruptibly();
        // successfulIds.add(futureStudentIds.get(index));
    } catch (Exception e) {
        unsuccessfulIds.add(futureStudentIds.get(index));
        LOGGER.error("", e);
    }
}
return unsuccessfulIds;
For a non-blocking approach, you can use ListenableFuture. See:
Asynchronous queries with the Java driver
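For the non-blocking path, one way to know which studentId failed (a sketch, assuming the driver's ResultSetFuture is a Guava ListenableFuture and reusing the statement.deleteStudent helper from above) is to capture the id next to its callback:
List<Long> failedIds = Collections.synchronizedList(new ArrayList<>());
List<ResultSetFuture> futures = new ArrayList<>();
for (long studentId : studentIds) {
    ResultSetFuture future = session.executeAsync(statement.deleteStudent(studentId));
    final long currentId = studentId; // captured, so the callback knows which delete it belongs to
    Futures.addCallback(future, new FutureCallback<ResultSet>() {
        @Override
        public void onSuccess(ResultSet result) {
            // send success metrics
        }

        @Override
        public void onFailure(Throwable t) {
            failedIds.add(currentId); // the failing studentId is available right here
        }
    }, MoreExecutors.directExecutor());
    futures.add(future);
}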
