(Working in RxKotlin and RxJava, but using metacode for simplicity)
Many Reactive Extensions guides begin by creating an Observable from already-available data. In The introduction to Reactive Programming you've been missing, one is created from a single string:
var sourceStream = Rx.Observable.just('https://api.github.com/users');
Similarly, the front page of RxKotlin creates one from a populated list:
val list = listOf(1,2,3,4,5)
list.toObservable()
Now consider a simple filter that yields an outStream:
var outStream = sourceStream.filter({ it > 3 })
In both guides the source events are declared a priori, which means the timeline of events has the form:
source: ----1,2,3,4,5-------
out: --------------4,5---
How can I modify sourceStream to be more of a pipeline, where no input data is available at the time of sourceStream's creation? When a source event becomes available, it is immediately processed by out:
source: ---1--2--3-4---5-------
out: ------------4---5-------
I expected to find an Observable.add() for dynamic updates:
var sourceStream = Observable.empty()
var outStream = sourceStream.filter({x > 3})
// print each element as it's added
sourceStream.subscribe({println(it)})
outStream.subscribe({println(it)})
for i in range(5):
    sourceStream.add(i)
Is this possible?
I'm new, but how could I solve my problem without a subject? If I'm
testing an application, and I want it to "pop" an update every 5
seconds, how else can I do it other than this Publish subscribe
business? Can someone post an answer to this question that doesn't
involve a Subscriber?
If you want to pop an update every five seconds, then create an Observable with the interval operator; don't use a Subject. There are a few dozen operators for constructing Observables, so you rarely need a Subject.
That said, sometimes you do need one, and they come in very handy when testing code. I use them extensively in unit tests.
To Use Subject Or Not To Use Subject? is an excellent article on the subject of Subjects.
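There is no Observable.add(), because a plain Observable only emits what its source defines; the type that accepts values pushed from outside is a Subject, which is both an Observer (you call onNext on it) and an Observable (others subscribe to it). As a rough sketch of that dual role, here is a toy push-based stream in plain Java. The class names are illustrative, not the RxJava API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// A toy stand-in for PublishSubject: values pushed in via onNext
// are forwarded to every registered subscriber.
class ToySubject<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();

    void subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
    }

    void onNext(T value) {
        for (Consumer<T> s : subscribers) s.accept(value);
    }

    // filter() returns a new stream that only forwards matching values,
    // mirroring sourceStream.filter { it > 3 } from the question.
    ToySubject<T> filter(Predicate<T> predicate) {
        ToySubject<T> out = new ToySubject<>();
        subscribe(v -> { if (predicate.test(v)) out.onNext(v); });
        return out;
    }
}

public class SubjectDemo {
    public static void main(String[] args) {
        ToySubject<Integer> source = new ToySubject<>();
        ToySubject<Integer> out = source.filter(x -> x > 3);

        source.subscribe(v -> System.out.println("source: " + v));
        out.subscribe(v -> System.out.println("out: " + v));

        // Values arrive after the pipeline is wired up:
        for (int i = 1; i <= 5; i++) source.onNext(i);
    }
}
```

In actual RxJava/RxKotlin the equivalent is a PublishSubject<Int>: wire up the filtered pipeline first, then call subject.onNext(i) as each value arrives.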
I am trying to wrap my head around Kafka Streams and have some fundamental questions that I can't seem to figure out on my own. I understand the concept of a KTable and Kafka state stores, but am having trouble deciding how to approach this. I am also using Spring Cloud Streams, which adds another level of complexity on top of this.
My use case:
I have a rule engine that reads in a Kafka event, processes the event, returns a list of rules that matched and writes it into another topic. This is what I have so far:
@Bean
public Function<KStream<String, ProcessNode>, KStream<String, List<IndicatorEvaluation>>> process() {
    return input -> input.mapValues(this::analyze)
                         .filter((host, evaluation) -> evaluation != null);
}
public List<IndicatorEvaluation> analyze(final String host, final ProcessNode process) {
// Does stuff
}
Some of the stateful rules look like:
[some condition] REPEATS 5 TIMES WITHIN 1 MINUTE
[some condition] FOLLOWEDBY [some condition] WITHIN 1 MINUTE
[rule A exists and rule B exists]
My current implementation is storing all this information in memory to be able to perform the analysis. For obvious reasons, it is not easily scalable. So I figured I would persist this into a Kafka State Store.
I am unsure of the best way to go about it. I know there is a way to create custom state stores, which allow a higher level of flexibility, but I am not sure whether the Kafka Streams DSL supports this.
Still new to Kafka Streams and wouldn't mind hearing a variety of suggestions.
From the description you have given, I believe this use case can still be implemented using the DSL in Kafka Streams. The code you have shown does not track any state. In your topology, you need to add state by tracking the counts of the rules and storing them in a state store; then you only send the output rules when that count hits a threshold. Here is the general idea as pseudo-code. Obviously, you will have to tweak this to satisfy the particular specifications of your use case.
@Bean
public Function<KStream<String, ProcessNode>, KStream<String, List<IndicatorEvaluation>>> process() {
    return input -> input
        .mapValues(this::analyze)
        .filter((host, evaluation) -> evaluation != null)
        ...
        .groupByKey(...)
        .windowedBy(TimeWindows.of(Duration.ofHours(1)))
        .count(Materialized.as("rules"))
        .filter((key, value) -> value > 4)
        .toStream()
        ...
}
I have a use case where I need to capture the data flow from one API to another. For example, my code reads data from a database using Hibernate, and during processing I convert one POJO to another, perform some more processing, and finally convert the result into the final Hibernate object. In a nutshell: something like POJO1 to POJO2 to POJO3.
In Java, is there a way to deduce that an attribute of POJO3 was made/transformed from a given attribute of POJO1? I am looking for something that can capture the data flow from one model to another. The tool can work at either compile time or runtime; I am fine with both.
I am looking for a tool that can run alongside the code and provide data-lineage details on a per-run basis.
Now, instead of POJOs I will call them states! You have a start position, and you iterate and transform your model through different states. At the end you have a final, terminal state that you would like to persist to the database:
stream(A).map(P1).map(P2).map(P3)....-> set of B
If you use a technique known as event sourcing, then yes, you can deduce it. What would this look like? Instead of mapping A directly to state P1 and state P1 to state P2, you queue all the operations that are necessary and sufficient to map A to P1, P1 to P2, and so on. If you want to recover P1 or P2 at any time, it is just the product of the queued operations. You can rewind forward or backward at any time, as long as you have not yet changed your DB state. P1, P2, P3 can act as snapshots.
This way you will be able to rebuild the exact mapping flow for the attribute. How finely you queue your operations, whether at attribute level or more coarse-grained, is up to you.
Here is a good article that depicts event sourcing and how it works: https://kickstarter.engineering/event-sourcing-made-simple-4a2625113224
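A minimal sketch of that idea in plain Java (all names here are illustrative, not from any framework): each transformation is recorded as an event, and any intermediate state P1, P2, ... can be rebuilt on demand by replaying the log up to that point:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Each transformation step is stored as an event: the operation itself
// plus a label describing it, which later yields the lineage.
class TransformEvent<T> {
    final String description;
    final UnaryOperator<T> operation;
    TransformEvent(String description, UnaryOperator<T> operation) {
        this.description = description;
        this.operation = operation;
    }
}

class EventLog<T> {
    private final List<TransformEvent<T>> events = new ArrayList<>();

    void record(String description, UnaryOperator<T> op) {
        events.add(new TransformEvent<>(description, op));
    }

    // Replay the first n events over the initial state: this recovers
    // any intermediate state (P1, P2, ...) without storing it.
    T replay(T initial, int n) {
        T state = initial;
        for (int i = 0; i < n; i++) state = events.get(i).operation.apply(state);
        return state;
    }

    // The lineage of the final value is the ordered list of descriptions.
    List<String> lineage() {
        List<String> out = new ArrayList<>();
        for (TransformEvent<T> e : events) out.add(e.description);
        return out;
    }
}

public class LineageDemo {
    public static void main(String[] args) {
        EventLog<String> log = new EventLog<>();
        log.record("trim whitespace", String::trim);
        log.record("upper-case", String::toUpperCase);

        String p1 = log.replay("  alice ", 1); // state after the first step
        String p2 = log.replay("  alice ", 2); // final state
        System.out.println(p1 + " -> " + p2 + " via " + log.lineage());
    }
}
```

Real event-sourcing systems persist the event log itself; the snapshots mentioned above are just cached replay results.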
UPDATE:
I can think of one more technique to capture the attribute changes: you can instrument your POJOs. It is much the same technique Hibernate uses to enhance POJOs, and the same one profilers use for tracing. You can then capture and react to each setter invocation on Pojo1, Pojo2 and Pojo3. Not sure I would go that way, though...
Here is some detailed reading about bytecode instrumentation: https://www.cs.helsinki.fi/u/pohjalai/k05/okk/seminar/Aarniala-instrumenting.pdf
I can imagine two reasons: either the code was not developed by you, so you want to understand how the inputs are combined and converted into the outputs, or your code is behaving in a way you do not expect.
I think you need to log the values of all the POJOs, inputs and outputs, to some place you can inspect after each run.
For example, to a database table if you might need the data after hundreds of runs, or just to a log in an appropriate form if it is a one-off. Then you have to trace those values manually, layer by layer, to map each one to the next. With the code available, that should be easy. If you have a different need, please explain.
There are "time travelling debuggers". For Java, a quick search only turned up the Chronon Time Travelling Debugger; see this screencast for how it might help you.
Since your transformations probably use setters and getters, this tool might also be interesting: Flow
Writing your own Java agent for tracking this is probably not what you want. You might be able to use AspectJ to add stack-trace logging to getters and setters; see here for a quick introduction.
I have written a Spring Boot microservice using RxJava (an aggregated service) to implement the following simplified use case. The big picture: when an instructor uploads a course content document, a set of questions should be generated and saved.
User uploads a document to the system.
The system calls a Document Service to convert the document into a text.
Then it calls another question-generating service to generate a set of questions from the above text content.
Finally these questions are posted to a basic CRUD microservice to be saved.
When a user uploads a document, many questions are created from it (perhaps hundreds). The problem is that I am posting the questions one at a time, sequentially, for the CRUD service to save. This slows down the operation drastically due to IO-intensive network calls, so it takes around 20 seconds to complete the entire process. Here is the current code, assuming all the questions have been formulated:
questions.flatMapIterable(list -> list).flatMap(q -> createQuestion(q)).toList();
private Observable<QuestionDTO> createQuestion(QuestionDTO question) {
    return Observable.<QuestionDTO>create(sub -> {
        QuestionDTO questionCreated = restTemplate.postForEntity(QUESTIONSERVICE_API,
                new org.springframework.http.HttpEntity<QuestionDTO>(question), QuestionDTO.class).getBody();
        sub.onNext(questionCreated);
        sub.onCompleted();
    }).doOnNext(s -> log.debug("Question was created successfully."))
      .doOnError(e -> log.error("An ERROR occurred while creating a question: " + e.getMessage()));
}
Now my requirement is to post all the questions in parallel to the CRUD service and merge the results on completion. Note that the CRUD service accepts only one question object at a time, and that cannot be changed. I know I could use the Observable.zip operator for this purpose, but I have no idea how to apply it in this context, since the actual number of questions is not predetermined. How can I change the code in line 1 so as to improve the performance of the application? Any help is appreciated.
By default, the observables in flatMap operate on the same scheduler you subscribed on. To run your createQuestion observables in parallel, subscribe each of them on a computation scheduler:
questions.flatMapIterable(list -> list)
.flatMap(q -> createQuestion(q).subscribeOn(Schedulers.computation()))
.toList();
Check this article for a full explanation.
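For readers outside Rx, the same fan-out-and-merge pattern can be sketched with java.util.concurrent. This is not the asker's code: post() is a stand-in for the blocking restTemplate call, and each CompletableFuture plays the role of an inner observable subscribed on its own thread:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class ParallelPostDemo {
    // Stand-in for the blocking REST call inside createQuestion().
    static String post(String question) {
        try {
            Thread.sleep(100); // simulate network latency
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return "saved:" + question;
    }

    public static void main(String[] args) {
        List<String> questions = List.of("q1", "q2", "q3", "q4");
        ExecutorService pool = Executors.newFixedThreadPool(questions.size());

        // Fan out: one future per question, each on its own thread, which
        // is what subscribeOn(Schedulers.computation()) achieves per inner
        // observable in the Rx version.
        List<CompletableFuture<String>> futures = questions.stream()
                .map(q -> CompletableFuture.supplyAsync(() -> post(q), pool))
                .collect(Collectors.toList());

        // Merge: wait for all results, preserving order, like toList() in Rx.
        List<String> saved = futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());

        System.out.println(saved);
        pool.shutdown();
    }
}
```

With four 100 ms calls running concurrently, the whole batch finishes in roughly the time of one call instead of four.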
Here's what I am doing: I have an event from an RSS feed that is telling me that a Ticket was edited. To get the changes made to that ticket, I have to call a REST service.
So I wanted to do it with a more compact, functional approach, but it just turned into a bunch of craziness, when in fact the straight old-style Java is this simple:
/**
* Since the primary interest is in what has been changed, we focus on getting changes
* often and pushing them into the appropriate channels.
*
* @return changes made since we last checked
*/
public List<ProcessEventChange> getLatestChanges() {
    List<ProcessEventChange> recentChanges = new ArrayList<>();
    List<ProcessEvent> latestEvents = getLatestEvents();
    for (ProcessEvent event : latestEvents) {
        recentChanges.addAll(getChanges(event));
    }
    return recentChanges;
}
There were a couple of questions on here related to this that did not seem to have straightforward answers. I am asking this one so that there is a very specific example and the question is crystal clear: is it worth reworking this with streams, and if so, how?
If streams are not good for things like this, they are really not good for much. I say that because this is a very common requirement: that some piece of data be enriched with more information from another source.
What you need is flatMap, which can map a single ProcessEvent object of the input list to multiple ProcessEventChange objects and flatten them all into a single Stream of ProcessEventChange:
List<ProcessEventChange> recentChanges = getLatestEvents()
        .stream()
        .flatMap(e -> getChanges(e).stream())
        .collect(Collectors.toList());
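To see the flattening end to end, here is a self-contained variant in which ProcessEvent and ProcessEventChange are minimal stand-ins (not the asker's real classes) and getChanges is simulated rather than calling a REST service:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FlatMapDemo {
    record ProcessEvent(String id, int changeCount) {}
    record ProcessEventChange(String eventId, int index) {}

    // Stand-in for the REST call: each event expands to several changes.
    static List<ProcessEventChange> getChanges(ProcessEvent event) {
        return IntStream.range(0, event.changeCount())
                .mapToObj(i -> new ProcessEventChange(event.id(), i))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ProcessEvent> latestEvents = List.of(
                new ProcessEvent("e1", 2), new ProcessEvent("e2", 1));

        // flatMap turns each event's list of changes into one flat stream.
        List<ProcessEventChange> recentChanges = latestEvents.stream()
                .flatMap(e -> getChanges(e).stream())
                .collect(Collectors.toList());

        System.out.println(recentChanges.size()); // 3 changes in total
    }
}
```

The structure is identical to the answer's snippet; only the types and the change source are stubbed out.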
Sequential execution of asynchronous operations in Android is complicated, to say the least.
Sequential execution, which used to be a semicolon between two operations, as in do_this(); do_that(), now requires chaining listeners, which is ugly and barely readable.
Oddly enough, the examples that demonstrate the need for chaining sequential operations usually look contrived, but today I found a perfectly reasonable one.
In Android there is in-app billing: an application can support multiple so-called in-app products (also known as SKUs, stock keeping units), letting the user, for example, buy (pay for) only the functionality that he/she needs (and, alas, also letting bearded men sell bitmaps to teenagers).
The function that retrieves in-app product info is
public void queryInventoryAsync(final boolean querySkuDetails,
final List<String> moreSkus,
final QueryInventoryFinishedListener listener)
and it has the restriction that the list must contain at most 20 items. (Yes, it does.)
Even if only a few of these 20 are registered as in-app products.
I want to retrieve information about, say, one hundred in-app products. The first thought is to invoke this function in a loop, but only one asynchronous operation with the market is allowed at any moment.
One may of course say "do not reuse, change the source", and even provide very good arguments for that, and this is probably what I will finally do, but I am writing this because I want to see an elegant reuse solution.
Is there an elegant (that is, not cumbersome) pattern or trick that allows chaining several asynchronous operations in the general case?
(I underline that the asynchronous operation that uses a listener is pre-existing code.)
UPD: this is what is called "callback hell" (http://callbackhell.com/) in the JavaScript world.
You can sequence AsyncTasks one after the other by calling the execute() method of the next AsyncTask in the onPostExecute() method of the previous one.
Handlers are useful for sequential work on any thread, not only on the UI thread.
Check out HandlerThread, create a Handler based on its Looper, and post background work to the handler.
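Outside Android, the sequential guarantee a HandlerThread provides (a single worker thread draining a message queue in order) can be approximated with a single-threaded executor; execute() here plays the role of Handler.post. A minimal sketch:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SequentialWorkerDemo {
    public static void main(String[] args) throws InterruptedException {
        // One worker thread with a queue: tasks run strictly in post order,
        // which is the guarantee a HandlerThread's Looper provides.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        List<String> log = new CopyOnWriteArrayList<>();

        worker.execute(() -> log.add("step 1"));
        worker.execute(() -> log.add("step 2")); // runs only after step 1
        worker.execute(() -> log.add("step 3"));

        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(log); // [step 1, step 2, step 3]
    }
}
```

Each posted task can itself post the next one, which is how sequential chains of asynchronous steps are built on a Handler.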
It looks like ReactiveX promises exactly this.
http://blog.danlew.net/2014/09/22/grokking-rxjava-part-2/
query("Hello, world!") // Returns a List of website URLs based on a text search
.flatMap(urls -> Observable.from(urls))
.flatMap(url -> getTitle(url)) // long operation
.filter(title -> title != null)
.subscribe(title -> System.out.println(title));
ReactiveX for Android:
https://github.com/ReactiveX/RxAndroid
Retrolambda: https://github.com/orfjackal/retrolambda (Lambdas for Java 5,6,7)