Reactive: executing many .existsById() then executing some operations on the collected data - java

I have many reactive .existsById() operations to execute,
and then I want to run some operations once on the collected list.
The problem is that I don't know which reactive operators I should use, or how to rewrite this to achieve the desired effect.
Every attempt I coded resulted in only one part of the code being executed.
First try, where the non-reactive code didn't wait for the Cassandra repo:
//First part:
signaturesFromFile.forEach(signatureFromFile -> {
SignatureKey signatureKey = new SignatureKey(signatureFromFile.getProfileId(), signatureFromFile.getUserId(), signatureFromFile.getId());
signatureRepositoryCassandra.existsById(signatureKey)
.map(exists -> {
if (!exists) {
notSavedSignatures.add(signatureFromFile);
}
return null;
}).subscribe();
});
//Second part:
String notSavedSignaturesListJson = convertSignaturesToJson(notSavedSignatures);
String pathToNotSavedSigantures = NOT_TRANSFERRED_SIGNATURES_DIRECTORY + signatureFilename;
saveIfFileNotExist(getUserIdFromFileName(signatureFilename), pathToNotSavedSigantures, notSavedSignaturesListJson);
deleteTransferredByUserIdSignaturesFile(signatureFilename);
My second idea was to wrap it all in one reactive stream, but the problem recurred with the roles reversed: this time the operations in the second part weren't executed.
Flux.fromIterable(signaturesFromFile).map(signatureFromFile -> {
SignatureKey signatureKey = new SignatureKey(signatureFromFile.getProfileId(), signatureFromFile.getUserId(), signatureFromFile.getId());
//First part:
signatureRepositoryCassandra.existsById(signatureKey)
.map(exists -> {
if (!exists) {
notSavedSignatures.add(signatureFromFile);
}
return null;
}).subscribe();
return null;
}).mergeWith(e -> {
//Second part:
String notSavedSignaturesListJson = convertSignaturesToJson(notSavedSignatures);
String pathToNotSavedSigantures = NOT_TRANSFERRED_SIGNATURES_DIRECTORY + signatureFilename;
saveIfFileNotExist(getUserIdFromFileName(signatureFilename), pathToNotSavedSigantures, notSavedSignaturesListJson);
deleteTransferredByUserIdSignaturesFile(signatureFilename);
}).subscribe();
I have a workaround for this problem:
.findAll() and filter the data instead of executing many .existsById() calls,
but I would like to solve it with the correct operators ;)
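A minimal sketch of one way to keep everything in a single pipeline, assuming Project Reactor and the helpers already used above (convertSignaturesToJson, saveIfFileNotExist, deleteTransferredByUserIdSignaturesFile): filterWhen runs the existsById check per element, collectList gathers the misses, and the side effects run exactly once afterwards.
// Sketch only (Project Reactor). Nothing subscribes in the middle of the pipeline;
// the inner existsById publisher is composed into the outer one via filterWhen.
Flux.fromIterable(signaturesFromFile)
    // keep only the signatures whose key does NOT exist in Cassandra
    .filterWhen(signatureFromFile -> {
        SignatureKey signatureKey = new SignatureKey(
                signatureFromFile.getProfileId(),
                signatureFromFile.getUserId(),
                signatureFromFile.getId());
        return signatureRepositoryCassandra.existsById(signatureKey)
                .map(exists -> !exists);
    })
    // gather everything that survived the filter into one list
    .collectList()
    // the "second part" runs exactly once, after all checks have completed
    .doOnNext(notSavedSignatures -> {
        String notSavedSignaturesListJson = convertSignaturesToJson(notSavedSignatures);
        String pathToNotSavedSignatures = NOT_TRANSFERRED_SIGNATURES_DIRECTORY + signatureFilename;
        saveIfFileNotExist(getUserIdFromFileName(signatureFilename), pathToNotSavedSignatures, notSavedSignaturesListJson);
        deleteTransferredByUserIdSignaturesFile(signatureFilename);
    })
    .subscribe();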

Related

CosmosDB : CosmosPatchOperation not working via Stored Procedure

I want to replace a Cosmos transactional batch with a stored procedure, since my requirement is to upsert 100+ records, which a Cosmos batch does not support. I am adding 2 Java objects and 1 CosmosPatchOperations
to a List and passing it to the method below. Whenever I add the Cosmos patch object, no rows get inserted or updated; otherwise it works fine. I want to perform both the insert and the patch operation in the same transaction. Can somebody please guide me on how to modify the SP so that it supports both insert and patch operations?
String rowsUpserted = "";
try
{
rowsUpserted = container
.getScripts()
.getStoredProcedure("createEvent")
.execute(Arrays.asList(listObj), options)
.getResponseAsString();
}catch(Exception e){
e.printStackTrace();
}
Stored Proc
function createEvent(items) {
var collection = getContext().getCollection();
var collectionLink = collection.getSelfLink();
var count = 0;
if (!items) throw new Error("The array is undefined or null.");
var numItems = items.length;
if (numItems == 0) {
getContext().getResponse().setBody(0);
return;
}
tryCreate(items[count], callback);
function tryCreate(item, callback) {
var options = { disableAutomaticIdGeneration: false };
var isAccepted = collection.upsertDocument(collectionLink, item, options, callback);
if (!isAccepted) getContext().getResponse().setBody(count);
}
function callback(err, item, options) {
if (err) throw err;
count++;
if (count >= numItems) {
getContext().getResponse().setBody(count);
} else {
tryCreate(items[count], callback);
}
}
}
Patching doesn't appear to be supported by the Collection type in the JavaScript stored procedure API. I suspect this was done because it's more of an optimisation for remote calls, and stored procedures execute locally, so it's not really necessary.
The API reference is here: http://azure.github.io/azure-cosmosdb-js-server/Collection.html
upsertDocument expects the full document.
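As a side note (not part of the original answer): in the Java SDK v4 the patch operation is available on the container itself, so one option is to apply the patch from the client rather than inside the stored procedure. A minimal sketch, assuming azure-cosmos 4.x; the item id, partition key value and patch path are placeholders, not values from the question:
// Sketch only: client-side patch via the SDK (assumes azure-cosmos 4.x).
// import com.azure.cosmos.models.CosmosPatchOperations;
// import com.azure.cosmos.models.CosmosItemResponse;
// import com.azure.cosmos.models.PartitionKey;
// import com.fasterxml.jackson.databind.node.ObjectNode;
CosmosPatchOperations patchOps = CosmosPatchOperations.create()
        .set("/status", "PROCESSED");                    // placeholder path/value
CosmosItemResponse<ObjectNode> patchResponse = container.patchItem(
        "itemId",                                        // placeholder id
        new PartitionKey("partitionKeyValue"),           // placeholder partition key
        patchOps,
        ObjectNode.class);
This does not give the single-transaction semantics the question asks for, but it illustrates the point above: the patch helpers live in the client API, not in the server-side Collection object.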

convert for loop in java reactive programming - helidon

I am new to reactive programming and am using the Helidon reactive Java libraries in our code.
I am unable to achieve the use case below.
The scenario is this:
first I invoke a REST API and get a response. From the response, which contains a list of countries, I have to invoke another
REST API that retrieves a response for each country id and updates the country object.
By the time I invoke the second API and set the value on the country object as below, the response has already been returned.
I can't use .get() and wait on the Single, as that blocks the thread.
Please suggest how to replace the for loop below and update the list of objects the reactive way.
Single<WebClientResponse> singleWebClientResp = webClient.get("REST_URL");
Single<String> apiResponse = singleWebClientResp.flatMapSingle(webClientResponse -> {
return webClientResponse.content().as(String.class);
});
apiResponse.flatMapSingle(fusionAPIResponseString -> {
List<Country> countries =
objectMapper.readValue(fusionAPIResponseString,new TypeReference<List<Country>>() {});
for (Country country : countries) {
getCountryByRegion(country.getRegion()).forSingle(newCountry -> {
LOGGER.log(Level.FINE, "newCountry ---> " + newCountry);
country.setRegion(country.getRegion() + "modified" + newCountry);
});
}
});
private Single<String> getCountryByRegion(String regionName) {
LOGGER.log(Level.FINE, "Entering getCountryByRegion");
Single<WebClientResponse> singleWebClientResponse2 = webClient.get().path("v3.1/region/" + regionName)
.contentType(MediaType.APPLICATION_JSON).request();
Single<String> retVal = singleWebClientResponse2.flatMapSingle(webClientResponse -> {
return webClientResponse.content().as(String.class);
});
LOGGER.log(Level.FINE, "Exiting getCountryByRegion");
return retVal;
}
Regards
// NOTE: this should be a static constant
GenericType<List<Country>> countriesType = new GenericType<>() {};
// NOTE: create the webClient only once, not for every request
WebClient webClient = WebClient.builder()
.addMediaSupport(JacksonSupport.create())
.baseUri("service-url")
.build();
// the pipeline starts with the initial countries (i.e. Single<List<Country>>)
webClient.get()
.path("/countries")
// get the countries as List<Country>
.request(countriesType)
// add each country to the reactive pipeline (i.e. Multi<Country>)
// to allow individual reactive mapping
.flatMap(Multi::just)
// map each country by creating a new country with new region
// use flatMap to inline the webClient result in the reactive pipeline
.flatMap(country ->
webClient.get()
.path("/region/" + country.getRegion())
.request(String.class)
.map(newCountry -> new Country(newCountry, country.getRegion())))
// aggregate all items (i.e. Single<List<Country>>)
.collectList()
.onError(res::send)
.forSingle(res::send);
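For context, res in the last two lines is not defined in the snippet; presumably it is the ServerResponse of the Helidon SE handler the pipeline runs in. A rough sketch of that surrounding handler (the route path is an assumption):
// Sketch of the surrounding Helidon SE handler; the path is an assumption.
// import io.helidon.webserver.Routing;
Routing routing = Routing.builder()
    .get("/countries-with-regions", (req, res) -> {
        // the reactive pipeline above runs here and finishes with
        // .onError(res::send) / .forSingle(res::send)
    })
    .build();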

Kafka Streams. How to emit the final result in an aggregation window with suppressor

To my huge surprise, I realized that the "suppress" operator does not emit the last event when the window closes, but only when another event is published on the partition the stream's tasks belong to. So, how can we emit the final aggregated result without a never-ending event stream?
In a CDC pattern we cannot wait for a subsequent database operation, which could take place after a long time, to emit the final result of the previous aggregation.
The idea is to schedule a future task to send a particular event, say one with a fixed "FLUSH" key, whose timestamp falls outside the previous aggregation window. This "FLUSH" event is then filtered out after the suppress step.
For every record peeked from the stream, a "FLUSH" event is scheduled, replacing any previously scheduled one that has not yet started, to minimize unnecessary "FLUSH" events.
In this example I used a tumbling window, but conceptually it works with other window kinds too.
So let's suppose we have a "user" topic and want to aggregate into a list all records falling in a 10-second tumbling window.
The model classes are:
User.java
public class User {
private String name;
private String surname;
private String timestamp;
}
UserGrouped.java
public class UserGrouped {
private List<User> userList = new ArrayList<User>();
}
Topology
...
KStream<String, User> userEvents = builder.stream(userTopic, userConsumerOptions);
TimeWindows tumblingWindow = TimeWindows.of(Duration.ofSeconds(windowDuration))
.grace(Duration.ofSeconds(windowGracePeriod));
KStream<String,UserGrouped> userGroupedStream = userEvents
.peek( (key,value) -> {
//Filter out the previous "flush" event to avoid a scheduler loop
if (!key.equalsIgnoreCase("FLUSH")) {
//For each event a future task is scheduled that
//will send a "flush" event to all partitions assigned to the stream.
scheduleFlushEvent(value.getTimestamp());
}
})
.groupByKey()
.windowedBy(tumblingWindow)
.aggregate(
//INITIALIZER
() -> new UserGrouped(),
//AGGREGATOR
(key, user, userGrouped) -> {
userGrouped.getUserList().add(user);
return userGrouped;
},
//STREAM STORE
Materialized.<String,UserGrouped,WindowStore<Bytes, byte[]>>
as("userGroupedWindowStore")
.withKeySerde(Serdes.String())
.withValueSerde(JsonSerdes.UserGrouped()) //Custom Serdes
)
.suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded().shutDownWhenFull()))
.toStream( (windowedKey,value) -> windowedKey.key())
//Discard the flush event
.filterNot((key,value) -> key.equalsIgnoreCase("FLUSH"))
.peek( (key, value) -> {
int sizeList = value != null && value.getUserList() != null ? value.getUserList().size() : 0;
log.info("#### USER GROUPED KEY: {}, Num of elements: {}",key, sizeList);
})
;
The scheduler method
private void scheduleFlushEvent(String lastEventTimestamp) {
//add 1 second to (windowDuration + windowGracePeriod) to ensure that the flush event falls outside the last window
Long delay = Long.valueOf(windowDuration + windowGracePeriod + 1);
//FIND PARTITIONS ASSIGNED TO THE CURRENT STREAM.
//The partitions assigned may change after rebalance events,
//so I need to get them in every iteration.
//In a Spring context you can use a RebalanceListener to update a 'partitionList'
//field of this class defined with #Component annotation
Set<Integer> partitionList = new HashSet<Integer>();
StreamThread currThread = (StreamThread)Thread.currentThread();
for (TaskMetadata taskMetadata : currThread.threadMetadata().activeTasks()) {
for(TopicPartition topicPartition : taskMetadata.topicPartitions()) {
partitionList.add(topicPartition.partition());
}
}
Callable<List<RecordMetadata>> task = () -> {
try {
List<RecordMetadata> recordMetadataList = new ArrayList<RecordMetadata>();
Instant instant = Instant.from(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ")
.parse(lastEventTimestamp));
instant = instant.plusSeconds(delay);
String flushEventTimestamp = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ")
.withZone(ZoneId.systemDefault() )
.format(instant);
User userFlush = new User();
userFlush.setTimestamp(flushEventTimestamp);
String userFlushValue = new String(JsonSerdes.User().serializer().serialize(userTopic, userFlush));
//SEND FLUSH EVENT TO ALL PARTITION ASSIGNED TO THE STREAM THREAD
for(Integer partition : partitionList) {
ProducerRecord<String,String> userRecord = new ProducerRecord<String, String>(userTopic, partition, "FLUSH", userFlushValue);
RecordMetadata recordMetadata = userFlushProducer.send(userRecord).get();
recordMetadataList.add(recordMetadata);
log.info("SENT FLUSH EVENT PARTITION: {}, VALUE: {}",partition, userFlushValue);
}
return recordMetadataList;
} catch (Exception e) {
log.error("ERROR", e);
return null;
}
};
//TASK NOT SCHEDULED YET
if(scheduledFuture == null
|| scheduledFuture.isDone()) {
log.debug("task scheduled");
scheduledFuture = ses.schedule(task, delay, TimeUnit.SECONDS);
//TASK ALREADY SCHEDULED.
//Cancel the previously scheduled task and schedule a new one with a postponed delay
} else {
if(!scheduledFuture.isDone()
&& scheduledFuture.cancel(false)) {
log.debug("task RE-scheduled");
scheduledFuture = ses.schedule(task, delay, TimeUnit.SECONDS);
} else {
log.warn("task not RE-scheduled");
}
}
}
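Not shown in the original post: scheduleFlushEvent() relies on a few fields (ses, scheduledFuture, userFlushProducer, windowDuration, windowGracePeriod). A minimal sketch of plausible declarations, inferred from how they are used above; the names, types and settings are assumptions:
// Supporting fields assumed by scheduleFlushEvent(); inferred from usage, not from the original post.
// import java.util.concurrent.*; import org.apache.kafka.clients.producer.*;
private final ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
private ScheduledFuture<List<RecordMetadata>> scheduledFuture;
// Plain producer with String serializers, used to send the "FLUSH" records.
private Producer<String, String> userFlushProducer;
private long windowDuration;     // window size in seconds
private long windowGracePeriod;  // grace period in seconds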

RxJava return JsonArray from Observable<JsonArray>

I am fairly new to functional programming and reactive RxJava. I want to get the id and name of a device from the database and store them in a Map, and I am doing it RxJava style. I am calling a function that doesn't need to return anything:
.doOnNext(t -> updateDeviceNameMap())
then the function looks like:
private void updateDeviceNameMap() {
LOGGER.debug("Reading device name and id from database");
Observable<SQLConnection> jdbcConnection = createJdbcConnection();
Scheduler defaultScheduler = RxHelper.scheduler(vertx);
Observable<JsonArray> res = jdbcConnection //need to return JsonArray
.flatMap(connection -> just(connection)
.flatMap(j -> runQuery(connection, "SELECT name,id FROM device")
.observeOn(defaultScheduler)
.doOnNext(m -> LOGGER.info("size: " + m.size()))
.flatMap(job -> { LOGGER.info(">>" + job.getJsonArray(0));
//or if I can extract JsonArray items here,
//I can update my Map here too.
return just(job.getJsonArray(0));
}
)
.doOnError(e -> { LOGGER.error("failed to connect to db", e);
connection.close(); })
.doOnCompleted(connection::close)
.onErrorReturn(e -> null));
//System.out.println("" + res.map(d -> LOGGER.info(d.toString())));
//get the JsonArray and update the deviceNameMap
The connection to the DB is made successfully and the query also runs correctly.
I can convert any object to an Observable with Observable.from(objectName), but can't do the opposite. An appropriate mapping needs to be done after .flatMap(job -> just(job.getJsonArray(0))), but I have no clue how. After running the Verticle, I can't even see anything logged from the line .flatMap(job -> { LOGGER.info(">>" + job.getJsonArray(0));.
Am I missing something?
You must subscribe to your Observable<JsonArray>, otherwise nothing happens.
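To make that concrete, here is a minimal sketch (not from the original answer), assuming RxJava 1.x as exposed by the Vert.x rx API, reusing the question's own helpers createJdbcConnection() and runQuery(), and a deviceNameMap field to fill. The row layout (name at index 0, id at index 1) follows the SELECT in the question; the exact shape of the query result is an assumption.
// Sketch only: the result is assumed to be a JsonArray of JsonArray rows,
// as the logging in the question suggests (job.getJsonArray(0) = first row).
private void updateDeviceNameMap() {
    createJdbcConnection()
        .flatMap(connection -> runQuery(connection, "SELECT name,id FROM device")
            .doOnTerminate(connection::close))   // close on success and on error
        .subscribe(
            rows -> {
                for (int i = 0; i < rows.size(); i++) {
                    JsonArray row = rows.getJsonArray(i);
                    deviceNameMap.put(row.getValue(1), row.getString(0)); // id -> name
                }
            },
            e -> LOGGER.error("failed to read device names from db", e));
}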

How to add elements to ConcurrentHashMap using ExecutorService

I have a requirement to read user information from 2 different sources (db) per userId and store the consolidated information in a Map keyed by userId. The number of users can vary based on the period they have opted for. Groups of users may belong to different periods of the year, e.g. daily, weekly, monthly users.
I used HashMap and LinkedHashMap to get this done. As that slows down the process, I thought of using threading here to make it faster.
After reading some tutorials and examples, I am now using ConcurrentHashMap and ExecutorService.
In some cases, based on validation, I want to skip the current iteration and move on to the next user's info. The continue keyword cannot be used inside the Callable within the for loop. Is there a way to achieve the same thing differently in multithreaded code?
Moreover, the code below works, but it is not significantly faster than the code without threading, which makes me doubt whether the ExecutorService is implemented correctly.
How do we debug if we get an error in multithreaded code? Execution stops at the breakpoint, but it is not consistent and does not move to the next line with F6.
Can someone point out if I am missing something in the code? Any other example of a similar use case would also be of great help.
public void getMap() throws UserException
{
long startTime = System.currentTimeMillis();
Map<String, Map<Integer, User>> map = new ConcurrentHashMap<String, Map<Integer, User>>();
//final String key = "";
try
{
final Date todayDate = new Date();
List<String> applyPeriod = db.getPeriods(todayDate);
for (String period : applyPeriod)
{
try
{
final String key = period;
List<UserTable1> eligibleUsers = db.findAllUsers(key);
Map<Integer, User> userIdMap = new ConcurrentHashMap<Integer, User>();
ExecutorService executor = Executors.newFixedThreadPool(eligibleUsers.size());
CompletionService<User> cs = new ExecutorCompletionService<User>(executor);
int userCount=0;
for (UserTable1 eligibleUser : eligibleUsers)
{
try
{
cs.submit(
new Callable<User>()
{
public User call()
{
int userId = eligibleUser.getUserId();
List<EmployeeTable2> empData = db.findByUserId(userId);
EmployeeTable2 emp = null;
if (null != empData && !empData.isEmpty())
{
emp = empData.get(0);
}else{
String errorMsg = "No record found for given User ID in emp table";
logger.error(errorMsg);
//continue;
// continue does not work here.
}
User user = new User();
user.setUserId(userId);
user.setFullName(emp.getFullName());
return user;
}
}
);
userCount++;
}
catch(Exception ex)
{
String errorMsg = "Error while creating map :" + ex.getMessage();
logger.error(errorMsg);
}
}
for (int i = 0; i < userCount ; i++ ) {
try {
User user = cs.take().get();
if (user != null) {
userIdMap.put(user.getUserId(), user);
}
} catch (ExecutionException e) {
} catch (InterruptedException e) {
}
}
executor.shutdown();
map.put(key, userIdMap);
}
catch(Exception ex)
{
String errorMsg = "Error while creating map :" + ex.getMessage();
logger.error(errorMsg);
}
}
}
catch(Exception ex){
String errorMsg = "Error while creating map :" + ex.getMessage();
logger.error(errorMsg);
}
logger.info("Size of Map : " + map.size());
Set<String> periods = map.keySet();
logger.info("Size of periods : " + periods.size());
for(String period :periods)
{
Map<Integer, User> mapOfuserIds = map.get(period);
Set<Integer> userIds = mapOfuserIds.keySet();
logger.info("Size of Set : " + userIds.size());
for(Integer userId : userIds){
User inf = mapOfuserIds.get(userId);
logger.info("User Id : " + inf.getUserId());
}
}
long endTime = System.currentTimeMillis();
long timeTaken = (endTime - startTime);
logger.info("All threads are completed in " + timeTaken + " milisecond");
logger.info("******END******");
}
You really don't want to create a thread pool with as many threads as the number of users you've read from the db. That doesn't make sense most of the time, because you need to keep in mind that threads need to run somewhere... There are not many servers out there with 10 or 100 or even 1000 cores reserved for your application. A much smaller value, like maybe 5, is often enough, depending on your environment.
And as always with performance topics: you first need to test what your actual bottleneck is. Your application may simply not benefit from threading because, for example, you are reading from a db which only allows 5 concurrent connections at the same time. In that case all your other 995 threads will simply wait.
Another thing to consider is network latency: reading multiple user ids from multiple threads may even increase the round-trip time needed to get the data for one user from the database. An alternative approach might be not to read one user at a time, but the data of all 10,000 of them at once. That way your 10 GBit Ethernet connection to the database, if you have one, might really speed things up, because the communication overhead with the database is small and it can quickly serve you all the data you need in one answer.
So in short, in my opinion your question is about performance optimization of your problem in general, but you don't know enough yet to decide which way to go.
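Two small sketches, not from the original answers, that address the points above and the "continue" question. The first caps the pool size instead of creating one thread per user (the bound of 8 is an arbitrary assumption; tune it to your cores and to the db's connection limit). The second shows how to skip a user inside the Callable: return null instead of continue, since the take()/get() loop in the question already ignores null results.
// 1) Bounded pool instead of one thread per eligible user; 8 is an assumed upper bound.
int poolSize = Math.max(1, Math.min(eligibleUsers.size(), 8));
ExecutorService executor = Executors.newFixedThreadPool(poolSize);
CompletionService<User> cs = new ExecutorCompletionService<User>(executor);

// 2) Inside call(): skip this user instead of "continue".
if (empData == null || empData.isEmpty()) {
    logger.error("No record found for given User ID in emp table");
    return null; // the "if (user != null)" check in the consuming loop skips it
}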
You could try something like this:
List<String> periods = db.getPeriods(todayDate);
// concurrent map, because parallelStream() writes to it from multiple threads
Map<String, Map<Integer, User>> hm = new ConcurrentHashMap<>();
periods.parallelStream().forEach(period -> {
List<UserTable1> eligibleUsers = db.findAllUsers(period);
hm.put(period, eligibleUsers.parallelStream().collect(
Collectors.toMap(UserTable1::getUserId, u -> createUserForId(u.getUserId()))));
});
And in createUserForId you do your db reading:
private User createUserForId(Integer id){
List<EmployeeTable2> empData = db.findByUserId(id);
EmployeeTable2 emp = empData.get(0);
//...
User user = new User();
user.setUserId(id);
user.setFullName(emp.getFullName());
return user;
}
