I have the following method start that returns an Observable.
If I subscribe to it, the operations inside handle (they are DB insert operations)
do not seem to happen. But if I subscribe to the handle method directly, it works and is able to perform the DB insertions.
Stepping through the debugger, the data is all correctly captured. It just seems like the subscription didn't work as intended.
Can I please get some advice on why it doesn't work when I subscribe to the start method, and how to fix it? Thanks.
To note: unsure if it matters, but I'm using RxJava 1.
It doesn't work if I subscribe to the following:
public Observable start(KafkaConsumerRecords<String, String> records) {
    return Observable.from(records.getDelegate().records().records("TOPIC_NAME"))
            .buffer(2)
            .map(this::convert)
            .map(o -> handle(o));
}
It works if I subscribe to the handle method directly. (By passing in a hardcoded eventObject for testing)
public Observable<ResultSet> handle(Object eventObject) {
    Map<String, String> map = (Map<String, String>) eventObject;
    String s1 = map.get("item1");
    String s2 = map.get("item2");
    Observable<ResultSet> rs1 = someDBInsertMethod1(s1);
    Observable<ResultSet> rs2 = someDBInsertMethod2(s2);
    return Observable.merge(rs1, rs2);
}
For reference, the convert method takes in an Object and returns a map.
public Map<String, String> convert(Object records) {
    Map<String, String> map = new HashMap<String, String>();
    // add some data to map from the object records.
    return map;
}
This is how I subscribe via another class in a main method.
// doesn't work, db insert does not happen.
Observable result1 = myClass.start(getHardCodedRecords());
result1.subscribe(resultSet -> System.out.println("printing this thus no errors.... but no DB insert happened"));
//directly subscribing to the handle method and this time it works, able to insert into db.
Observable result2 = myClass.handle(getHardCodedMap());
result2.subscribe(resultSet -> System.out.println("printing this thus no errors.... works"));
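For comparison, a minimal sketch (untested, and assuming the intent is that subscribing to start should also run the inner DB inserts) of the same start method with the inner Observable flattened via flatMap rather than wrapped by map:
public Observable<ResultSet> start(KafkaConsumerRecords<String, String> records) {
    return Observable.from(records.getDelegate().records().records("TOPIC_NAME"))
            .buffer(2)
            .map(this::convert)
            // flatMap subscribes to the Observable<ResultSet> returned by handle;
            // map only emits it as a value without ever subscribing to it
            .flatMap(this::handle);
}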
I was reading the doc here, which gives a use case for reusing the object, as shown below:
stream
    .apply(new WindowFunction<WikipediaEditEvent, Tuple2<String, Long>, String, TimeWindow>() {
        // Create an instance that we will reuse on every call
        private Tuple2<String, Long> result = new Tuple2<>();

        @Override
        public void apply(String userName, TimeWindow timeWindow, Iterable<WikipediaEditEvent> iterable, Collector<Tuple2<String, Long>> collector) throws Exception {
            long changesCount = ...
            // Set fields on an existing object instead of creating a new one
            result.f0 = userName;
            // Auto-boxing!! A new Long value may be created
            result.f1 = changesCount;
            // Reuse the same Tuple2 object
            collector.collect(result);
        }
    });
So instead of creating a new Tuple every time, it seems to be able to reuse the same Tuple, exploiting its mutable nature in order to decrease the pressure on the GC. Would this be applicable in all operators where we can mutate and pass the same object down the pipeline via the collector.collect(...) call?
Using that idea, I wonder in which places I can make such an optimization without breaking the code or introducing sneaky bugs. Again as an example, here is a KeySelector which returns a Tuple, taken from this answer and given below:
KeyedStream<Employee, Tuple2<String, String>> employeesKeyedByCountryndEmployer =
    streamEmployee.keyBy(
        new KeySelector<Employee, Tuple2<String, String>>() {
            @Override
            public Tuple2<String, String> getKey(Employee value) throws Exception {
                return Tuple2.of(value.getCountry(), value.getEmployer());
            }
        }
    );
I wonder whether, in that case, I can reuse the same Tuple by mutating it with different inputs, as below. Of course, in all cases I assume parallelism is more than 1, probably much higher in a real use case.
KeyedStream<Employee, Tuple2<String, String>> employeesKeyedByCountryndEmployer =
    streamEmployee.keyBy(
        new KeySelector<Employee, Tuple2<String, String>>() {
            Tuple2<String, String> tuple = new Tuple2<>();

            @Override
            public Tuple2<String, String> getKey(Employee value) throws Exception {
                tuple.f0 = value.getCountry();
                tuple.f1 = value.getEmployer();
                return tuple;
            }
        }
    );
I do not know whether Flink copies objects between stages in the pipeline, so I wonder if it is safe to do such an optimization. I read about the enableObjectReuse() configuration in the docs, though I am not sure I really understood it. Admittedly this may come down to Flink internals; I could not figure out when Flink does what to manage data/objects/records in the pipeline. Maybe I should make that clear first?
Thanks.
This sort of reuse in a KeySelector is not safe. keyBy is not an operator, and the usual rules about object reuse in operator chains (which I covered here) do not apply.
See Dave Anderson's answer to "Flink, rule of using 'object reuse mode'":
Basically you can't remember input object references across function calls or modify input objects. So in your situation above with the KeySelector, you're modifying an object that you created, not an input object.
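To make the contrast concrete, a minimal sketch of the safe variant, which is essentially the first KeySelector from the question: construct a fresh Tuple2 on every call and remember nothing across invocations.
streamEmployee.keyBy(
    new KeySelector<Employee, Tuple2<String, String>>() {
        @Override
        public Tuple2<String, String> getKey(Employee value) throws Exception {
            // A new key object per invocation: Flink may hold on to the returned
            // tuple (e.g. for hashing and state access), so it must not be mutated later.
            return Tuple2.of(value.getCountry(), value.getEmployer());
        }
    }
);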
I'm using Spring Boot and Spring Data JPA, and I have logic which consists of 3 requests to the DB that I want to run in parallel. I want to use CompletableFuture for this purpose.
In the end I need to build a response object from the results of 5 DB query runs; 3 of them I'm currently running in a loop.
So I've created CompletableFutures:
CompletableFuture<Long> totalFuture = CompletableFuture.supplyAsync(() -> myRepository.getTotal());
CompletableFuture<Long> countFuture = CompletableFuture.supplyAsync(() -> myRepository.getCount());
Then I'm planning to use .allOf with these futures. But I have a problem with the loop calls. How do I rewrite it to use a callable, given that for every request I need to pass a value from the request and then sort the results into a map by key?
Map<String, Integer> groupcount = new HashMap<>();
request.ids().forEach((key, value) -> groupcount.put(key, myRepository
        .getGroupCountId(value)));
To explain a little more thoroughly, I'm posting a code snippet which I want to chain, but for now it works like this.
List<CompletableFuture<Void>> completableFutures = new ArrayList<>();
Map<String, Integer> groupcount = new ConcurrentHashMap<>();
for (var id : request.Ids().entrySet()) {
    completableFutures.add(
        CompletableFuture.runAsync(someOperation, EXECUTOR_SERVICE)
            .thenApply(v -> runQuery(v.getValues))
            .thenAcceptAsync(res -> groupcount.put(v.key, res)));
}
CompletableFuture.allOf(completableFutures.toArray(new CompletableFuture[0])).get();
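For what it's worth, a rough sketch of one way the loop could be expressed so that each id gets its own future and the results land in the map. It assumes getGroupCountId(value) is the blocking repository call and EXECUTOR_SERVICE is the executor from above (names taken from the snippets in the question):
// assumes java.util.concurrent.CompletableFuture and java.util.stream.Collectors are imported
Map<String, Integer> groupcount = new ConcurrentHashMap<>();
List<CompletableFuture<Void>> completableFutures = request.ids().entrySet().stream()
        .map(entry -> CompletableFuture
                .supplyAsync(() -> myRepository.getGroupCountId(entry.getValue()), EXECUTOR_SERVICE)
                .thenAccept(count -> groupcount.put(entry.getKey(), count)))
        .collect(Collectors.toList());
CompletableFuture.allOf(completableFutures.toArray(new CompletableFuture[0])).join();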
We have a single thread that regularly updates a Map, and multiple other threads that read this map.
This is how the update thread executes:
private Map<String, SecondMap> firstMap = new ConcurrentHashMap<>();

private void refresh() // This method is called every X seconds by one thread only
{
    List<SecondMap> newData = getLatestData();
    final List<String> newEntries = new ArrayList<>();
    for (SecondMap map : newData) {
        newEntries.add(map.getName());
        firstMap.put(map.getName(), map);
    }
    final Set<String> cachedEntries = firstMap.keySet();
    for (final String cachedEntry : cachedEntries) {
        if (!newEntries.contains(cachedEntry)) {
            firstMap.remove(cachedEntry);
        }
    }
}

public Map<String, SecondMap> getFirstMap() // Other threads call this
{
    return firstMap;
}
The SecondMap class looks like this
class SecondMap {
    Map<String, SomeClass> data; // Not necessarily a ConcurrentHashMap
    public Map<String, SomeClass> getData() {
        return data;
    }
}
Below is a simplified version of how the reader threads access it:
public void getValue() {
    Map<String, SecondMap> firstMap = getFirstMap();
    SecondMap secondMap = firstMap.get("SomeKey");
    secondMap.getData().get("AnotherKey"); // This returns null
}
We are seeing that when other threads iterate over the received firstMap, they sometimes get null values for some keys in the SecondMap. We don't see any null values for keys in the firstMap, but we do see null values for keys in the second map. One thing we can rule out is getLatestData returning such data: it reads entries from a database, and there can never be null values in the database in the first place. Also, we see that this happens only occasionally. We are probably missing something in how we handle the multi-threaded situation properly, but I am looking for an explanation of why this can happen.
Assuming the Map<String, SomeClass> data; inside the SecondMap class is a HashMap, you can get a null value for a key in two scenarios:
1. If the key maps to a null value, for example "Something" -> null.
2. If the key is not in the map in the first place.
So, without knowing much about where the data is coming from: if one of the maps returned by getLatestData() doesn't have the key "SomeKey" at all, it will return null.
Also, since there's not enough information about how that Map<String, SomeClass> data; is updated, and whether it's mutable or immutable, you may have issues there. If that map is immutable and the SecondMap is immutable, then it's probably fine. But if you are modifying it from multiple threads, you should make it a ConcurrentHashMap, and if you update the reference to a new Map<String, SomeClass> data from different threads inside the SecondMap, you should also make that reference volatile.
class SecondMap {
    volatile Map<String, SomeClass> data; // Not necessarily a ConcurrentHashMap
    public Map<String, SomeClass> getData() {
        return data;
    }
}
If you'd like to understand in depth when to use the volatile keyword and all the intricacies of data races, there's a section about it in this online course: https://www.udemy.com/java-multithreading-concurrency-performance-optimization/?couponCode=CONCURRENCY I have not seen any resource that explains and demonstrates it better. Unfortunately, there are many articles online that simply explain it wrong, which is sad.
I hope that from the little information in the question I was able to point you in some directions that might help. Please share more information if none of that works, or if something does work, please let me know; I'm curious to know what it was :)
Here is my sample code, which returns a list of JsonDocument from the Couchbase server.
Cluster cluster = CouchbaseCluster.create();
Bucket bucket = cluster.openBucket();
List<JsonDocument> foundDocs = Observable
        .just("key1", "key2", "key3", "key4", "key5")
        .flatMap(new Func1<String, Observable<JsonDocument>>() {
            @Override
            public Observable<JsonDocument> call(String id) {
                return bucket.async().get(id);
            }
        })
        .toList()
        .toBlocking()
        .single();
I want to return a Map instead of a List. My return type would be Map<String, JsonDocument>.
I tried the toMap method, but it did not work for me.
You correctly mentioned in your comment that toMap needs a Function as its argument. This Function will be used to "extract" (or maybe "construct") the key under which each value will be entered into the map.
So, in your case you will need a Function<JsonDocument, String> that takes a JsonDocument as its input and somehow returns a String which you will later use to find the JsonDocument in the Map. What that String will be is something only you can answer - maybe there is some ID inside the JsonDocument?
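For example, a minimal sketch (assuming the document's own id is a suitable key; any other String derived from the JsonDocument works too), written with lambdas for brevity:
Map<String, JsonDocument> foundDocs = Observable
        .just("key1", "key2", "key3", "key4", "key5")
        .flatMap(id -> bucket.async().get(id))
        .toMap(doc -> doc.id()) // the key-extractor Func1<JsonDocument, String>
        .toBlocking()
        .single();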
I have a function like this:
private static Map<String, ResponseTimeStats> perOperationStats(List<PassedMetricData> scopedMetrics, Function<PassedMetricData, String> classifier)
{
Map<String, List<PassedMetricData>> operationToDataMap = scopedMetrics.stream()
.collect(groupingBy(classifier));
return operationToDataMap.entrySet().stream()
.collect(toMap(Map.Entry::getKey, e -> StatUtils.mergeStats(e.getValue())));
}
Is there any way to have the groupingBy call do the transformation that I do explicitly in the second statement, so I don't have to separately stream over the map?
Update
Here is what mergeStats() looks like:
public static ResponseTimeStats mergeStats(Collection<PassedMetricData> metricDataList)
{
    ResponseTimeStats stats = new ResponseTimeStats();
    metricDataList.forEach(data -> stats.merge(data.stats));
    return stats;
}
If you can rewrite StatUtils.mergeStats into a Collector, you could just write
return scopedMetrics.stream().collect(groupingBy(classifier, mergeStatsCollector));
And even if you can't do this, you could write
return scopedMetrics.stream().collect(groupingBy(classifier,
collectingAndThen(toList(), StatUtils::mergeStats)));
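For the first option, a rough sketch of what such a mergeStats collector might look like (assuming, as the mergeStats method above suggests, that ResponseTimeStats has a no-arg constructor and a merge method that accepts another ResponseTimeStats):
// assumes java.util.stream.Collector is imported
Collector<PassedMetricData, ResponseTimeStats, ResponseTimeStats> mergeStatsCollector =
        Collector.of(
                ResponseTimeStats::new,                                  // start each group with empty stats
                (stats, data) -> stats.merge(data.stats),                // fold each PassedMetricData in
                (left, right) -> { left.merge(right); return left; });   // combine partial results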
In order to group the PassedMetricData instances, you must consume the entire Stream since, for example, the first and last PassedMetricData might be grouped into the same group.
That's why the grouping must be a terminal operation on the original Stream and you must create a new Stream in order to do the transformation on the results of this grouping.
You could chain these two statements, but it won't make much of a difference:
private static Map<String, ResponseTimeStats> perOperationStats(List<PassedMetricData> scopedMetrics, Function<PassedMetricData, String> classifier)
{
    return scopedMetrics.stream()
            .collect(groupingBy(classifier)).entrySet().stream()
            .collect(toMap(Map.Entry::getKey, e -> StatUtils.mergeStats(e.getValue())));
}