Operations on Multiple Streams - java

Here's what I am doing: I have an event from an RSS feed that is telling me that a Ticket was edited. To get the changes made to that ticket, I have to call a REST service.
So I wanted to do it with a more compact, functional approach, but it just turned into a bunch of craziness, when in fact the straight old-style Java is this simple:
/**
 * Since the primary interest is in what has been changed, we focus on getting changes
 * often and pushing them into the appropriate channels.
 *
 * @return changes made since we last checked
 */
public List<ProcessEventChange> getLatestChanges(){
    List<ProcessEventChange> recentChanges = new ArrayList<>();
    List<ProcessEvent> latestEvents = getLatestEvents();
    for (ProcessEvent event : latestEvents){
        recentChanges.addAll(getChanges(event));
    }
    return recentChanges;
}
There were a couple of questions on here related to this that did not seem to have straightforward answers, so I am asking this one to provide a very specific example and make the question crystal clear: is it worth reworking this with streams, and if so, how?
If streams are not good for things like this, they are really not good for much. I say that because this is a very common requirement: some piece of data must be enriched with more information from another source.

What you need is flatMap, which can map a single ProcessEvent object of the input list to multiple ProcessEventChange objects, and flatten them all into a single Stream of ProcessEventChange.
List<ProcessEventChange> recentChanges = getLatestEvents()
        .stream()
        .flatMap(e -> getChanges(e).stream())
        .collect(Collectors.toList());
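To see the shape run end to end, here is a self-contained sketch with stub record types standing in for the question's ProcessEvent and ProcessEventChange (the stubs and the getChanges body are made up; the real one would call the REST service):

```java
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapDemo {
    // hypothetical stand-ins for the question's types
    record ProcessEvent(int id) {}
    record ProcessEventChange(int eventId, String field) {}

    static List<ProcessEventChange> getChanges(ProcessEvent e) {
        // in the real code this would call the REST service for the ticket
        return List.of(new ProcessEventChange(e.id(), "status"),
                       new ProcessEventChange(e.id(), "assignee"));
    }

    static List<ProcessEventChange> getLatestChanges(List<ProcessEvent> events) {
        // one event fans out to many changes; flatMap flattens them into one stream
        return events.stream()
                .flatMap(e -> getChanges(e).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ProcessEventChange> changes =
                getLatestChanges(List.of(new ProcessEvent(1), new ProcessEvent(2)));
        System.out.println(changes.size()); // 2 events x 2 changes each
    }
}
```

On Java 16+ the final `.collect(Collectors.toList())` can be shortened to `.toList()`.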

Related

How to Monitor/inspect data/attribute flow in Java code

I have a use case where I need to capture the data flow from one API to another. For example, my code reads data from a database using Hibernate, and during processing I convert one POJO to another, perform some more processing, and finally convert it into the final Hibernate result object. In a nutshell: something like POJO1 to POJO2 to POJO3.
Is there a way in Java to deduce that an attribute of POJO3 was made/transformed from a particular attribute of POJO1? I am looking for something that can capture the data flow from one model to another. The tool can work at either compile time or runtime; I am OK with both.
I am looking for a tool that can run in parallel with the code and provide data lineage details on each run.
Now, instead of POJOs I will call them states! You have a start position, and you iterate and transform your model through different states. At the end you have a final, terminal state that you would like to persist to the database:
stream(A).map(P1).map(P2).map(P3)....-> set of B
If you use a technique known as event sourcing, then yes, you can deduce it. What would this look like? Instead of mapping A directly to state P1 and P1 to P2, you queue all the operations that are necessary and sufficient to map A to P1, P1 to P2, and so on... If you want to recover P1 or P2 at any point, it is just the product of the queued operations. You can rewind forward or backward at any time, as long as you have not yet changed your DB state. P1, P2, P3 can act as snapshots.
This way you will be able to rebuild the exact mapping flow for this attribute. How fine-grained you queue your operations - down to the attribute level, or more coarse-grained - is up to you.
Here is a good article that explains event sourcing and how it works: https://kickstarter.engineering/event-sourcing-made-simple-4a2625113224
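A toy sketch of the idea (the class and the String state are made up for illustration): queue each transformation instead of applying it destructively, and any intermediate state can be recovered by replaying a prefix of the queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class EventSourcedValue {
    // each queued operation maps one state to the next (A -> P1 -> P2 -> ...)
    private final List<UnaryOperator<String>> ops = new ArrayList<>();
    private final String initial;

    EventSourcedValue(String initial) { this.initial = initial; }

    void apply(UnaryOperator<String> op) { ops.add(op); }

    // replay the first n operations to recover any intermediate state (a snapshot)
    String stateAt(int n) {
        String s = initial;
        for (int i = 0; i < n; i++) s = ops.get(i).apply(s);
        return s;
    }

    String current() { return stateAt(ops.size()); }

    public static void main(String[] args) {
        EventSourcedValue v = new EventSourcedValue("a");
        v.apply(String::toUpperCase);     // A -> P1
        v.apply(s -> s + "!");            // P1 -> P2
        System.out.println(v.stateAt(1)); // rewind to P1
        System.out.println(v.current());  // terminal state P2
    }
}
```

In a real system the queued operations would be named, serializable events rather than lambdas, so the lineage survives the process and can be inspected per attribute.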
UPDATE:
I can think of one more technique to capture the attribute changes. You can instrument your POJOs; it is pretty much the same technique Hibernate uses to enhance POJOs, and the same technique profilers use for tracing. Then you can capture and react to each setter invocation on Pojo1, Pojo2, Pojo3. Not sure I would have gone that way, though...
Here is some detailed reading about bytecode instrumentation: https://www.cs.helsinki.fi/u/pohjalai/k05/okk/seminar/Aarniala-instrumenting.pdf
I can imagine two reasons: either the code was not developed by you, and you therefore want to understand the flow of data and the combinations that convert input to output, OR your code is behaving in a way you are not expecting.
I think you need to log the values of all the POJOs, inputs and outputs, to some place you can inspect later, for each run.
Example: a database table, if you might need the data after hundreds of runs; if it is one-time, maybe a log in an appropriate form. Then you need to manually follow those data values layer by layer to map to the next layer. I think with the code available that would be easy. If you have a different need, please explain.
There are "time travelling debuggers". For Java, a quick search only turned up this:
Chronon Time Travelling Debugger; see this screencast for how it might help you.
Since your transformations probably use setters and getters, this tool might also be interesting: Flow
Writing your own Java agent for tracking this is probably not what you want. You might be able to use AspectJ to add some stack-trace logging to getters and setters. See here for a quick introduction.
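Short of AspectJ or bytecode weaving, the same interception can be sketched with a plain-JDK dynamic proxy, provided the POJO is accessed through an interface (the Pojo interface and trace list below are hypothetical, for illustration only):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class SetterTracer {
    interface Pojo {                 // java.lang.reflect.Proxy needs an interface
        void setName(String name);
        String getName();
    }

    static class PojoImpl implements Pojo {
        private String name;
        public void setName(String name) { this.name = name; }
        public String getName() { return name; }
    }

    static final List<String> trace = new ArrayList<>();

    // wrap the target so every setter invocation is recorded before delegating
    static Pojo traced(Pojo target) {
        InvocationHandler h = (proxy, method, args) -> {
            if (method.getName().startsWith("set")) {
                trace.add(method.getName() + "(" + args[0] + ")");
            }
            return method.invoke(target, args);
        };
        return (Pojo) Proxy.newProxyInstance(
                Pojo.class.getClassLoader(), new Class<?>[]{Pojo.class}, h);
    }

    public static void main(String[] args) {
        Pojo p = traced(new PojoImpl());
        p.setName("order-42");
        System.out.println(trace);
    }
}
```

AspectJ or a java agent can do the same on concrete classes without interfaces; the proxy version is just the cheapest way to prototype the idea.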

How to dynamically update an RX Observable?

(Working in RxKotlin and RxJava, but using metacode for simplicity)
Many Reactive Extensions guides begin by creating an Observable from already available data. In The introduction to Reactive Programming you've been missing, it's created from a single string:
var sourceStream = Rx.Observable.just('https://api.github.com/users');
Similarly, from the frontpage of RxKotlin, from a populated list
val list = listOf(1,2,3,4,5)
list.toObservable()
Now consider a simple filter that yields an outStream,
var outStream = sourceStream.filter({x > 3})
In both guides the source events are declared a priori, which means the timeline of events has some form
source: ----1,2,3,4,5-------
out: --------------4,5---
How can I modify sourceStream to become more of a pipeline? In other words, what if no input data is available when sourceStream is created, and each source event, as it becomes available, is immediately processed by out:
source: ---1--2--3-4---5-------
out: ------------4---5-------
I expected to find an Observable.add() for dynamic updates
var sourceStream = Observable.empty()
var outStream = sourceStream.filter({x>3})
//print each element as its added
sourceStream .subscribe({println(it)})
outStream.subscribe({println(it)})
for i in range(5):
sourceStream.add(i)
Is this possible?
I'm new, but how could I solve my problem without a subject? If I'm testing an application, and I want it to "pop" an update every 5 seconds, how else can I do it other than this publish-subscribe business? Can someone post an answer to this question that doesn't involve a Subscriber?
If you want to pop an update every five seconds, then create an Observable with the interval operator, don't use a Subject. There are some dozen different operators for constructing Observables so you rarely need a subject.
That said, sometimes you do need one, and they come in very handy when testing code. I use them extensively in unit tests.
To Use Subject Or Not To Use Subject? is an excellent article on the subject of Subjects.
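Rx aside, the push-as-you-go shape is easy to see with the JDK's own Flow API (Java 9+), where SubmissionPublisher plays roughly the role a Subject plays in Rx. A sketch, not Rx code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

public class PushPipeline {
    public static List<Integer> run() throws InterruptedException {
        List<Integer> out = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);

        // the "sourceStream": created empty, items are pushed into it later
        SubmissionPublisher<Integer> source = new SubmissionPublisher<>();
        source.subscribe(new Flow.Subscriber<Integer>() {
            public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
            public void onNext(Integer x) { if (x > 3) out.add(x); } // the filter stage
            public void onError(Throwable t) { done.countDown(); }
            public void onComplete() { done.countDown(); }
        });

        // events become available over time, like the question's add() loop
        for (int i = 1; i <= 5; i++) source.submit(i);
        source.close();

        done.await(5, TimeUnit.SECONDS);
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

In RxJava the equivalent source would be a PublishSubject, and `submit` corresponds to `onNext`; the answer's point stands that for a periodic test feed, an interval-style operator is usually cleaner than pushing by hand.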

Hazelcast keySet streaming?

I'm new to Hazelcast, and I'm trying to use it to store data in a map that is too large to fit on a single machine.
One of the processes I need to implement is to go over each of the values in the map and do something with them - no accumulation or aggregation, and I don't need to see all the data at once, so there is no memory concern there.
My trivial implementation would be to use IMap.keySet() and then to iterate over all the keys to get each stored value in turn (and allow the value to be GCed after processing), but my concern is that there is going to be so much data in the system that even just getting the list of keys will be large enough to put undue stress on the system.
I was hoping that there was a streaming API that I can stream keys (or even full entries) in such a way that the local node will not have to cache the entire set locally - but failed to find anything that seemed relevant to me in the documentation.
I would appreciate any suggestions that you may come up with. Thanks.
Hazelcast Jet provides a distributed version of java.util.stream and adds «streaming» capabilities to IMap.
It allows execution of Java Streams API on the Hazelcast cluster.
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.stream.IStreamMap;
import com.hazelcast.jet.stream.IStreamList;

import static com.hazelcast.jet.stream.DistributedCollectors.toIList;

final IStreamMap<String, Integer> streamMap = instance1.getMap("source");

// stream of entries; you can grab keys from it
IStreamList<String> counts = streamMap.stream()
        .map(entry -> entry.getKey().toLowerCase())
        .filter(key -> key.length() >= 5)
        .sorted()
        // this will store the result on the cluster as well,
        // so there is no data movement between client and cluster
        .collect(toIList());
Please find more info about Jet here, and more examples here.
Cheers,
Vik
While the Hazelcast Jet stream implementation looks impressive, I didn't have a lot of time to invest in looking at upgrading to Hazelcast Jet (in our pretty much bog-standard Vert.x setup). Instead I used IMap.executeOnEntries, which seems to do about the same thing as detailed for Hazelcast Jet by @Vik Gamov, except with a more annoying syntax.
My example:
myMap.executeOnEntries(new EntryProcessor<String, MyEntity>() {
    private static final long serialVersionUID = 1L;

    @Override
    public Object process(Entry<String, MyEntity> entry) {
        entry.getValue().fondle();
        return null;
    }

    @Override
    public EntryBackupProcessor<String, MyEntity> getBackupProcessor() {
        return null;
    }
});
As you can see, the syntax is quite annoying:
We need to create an actual object that can be serialized to the cluster - no fancy lambdas here (don't use my serial ID if you copy & paste this - it's broken by design).
One reason it cannot be a lambda is that the interface is not functional - you need another method to handle backup copies (or at least to declare that you don't want to handle them, as I do). While I acknowledge its importance, it is not important all of the time, and I would guess it is only important in rare cases.
Obviously you can't (or at least it isn't trivial to) return data from the processor - which is not important in my case, but still.

Sequential execution of async operations in Android

Sequential execution of asynchronous operations in Android is at least complicated.
Sequential execution, which used to be a semicolon between two statements as in do_this(); do_that(), now requires chaining listeners, which is ugly and barely readable.
Oddly enough, the examples that demonstrate the need for chaining sequential operations usually look contrived, but today I found a perfectly reasonable one.
In Android there is in-app billing, an application can support multiple so-called in-app products (also known as SKU = stock keeping unit), letting the user, for example, buy (pay for) only the functionality that he/she needs (and, alas, also letting bearded men sell bitmaps to teenagers).
The function that retrieves in-app product info is
public void queryInventoryAsync(final boolean querySkuDetails,
final List<String> moreSkus,
final QueryInventoryFinishedListener listener)
and it has a restriction that the list must contain at most 20 items. (Yes it does.)
Even if only a few of these 20 are registered as in-app products.
I want to retrieve, say, information about one hundred in-app products. The first thought would be to invoke this function in a loop, but only one asynchronous operation with the market is allowed at any moment.
One may of course say "do not reuse, change the source", and even provide very good arguments for that, and this is probably what I will finally do, but I write this because I want to see an elegant reuse solution.
Is there an elegant (= not cumbersome) pattern or trick that allows chaining several asynchronous operations in the general case?
(I underline that the asynchronous operation that uses a listener is pre-existing code.)
UPD this is what is called "callback hell" ( http://callbackhell.com/ ) in the JavaScript world.
You can sequence AsyncTasks one after the other by calling the execute() method of the next AsyncTask in the onPostExecute() method of the previous one.
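On Java 8+ (API level 24+), the same sequencing can be written without nested listeners by wrapping the pre-existing listener-based call in a CompletableFuture once and then chaining with thenCompose. A sketch with a hypothetical listener API standing in for queryInventoryAsync:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class SequentialQueries {
    // stand-in for the pre-existing listener-based API from the question
    interface Listener { void onFinished(List<String> result); }
    static void queryInventoryAsync(List<String> skus, Listener listener) {
        // pretend this is asynchronous; for the sketch it just echoes the batch
        listener.onFinished(skus);
    }

    // adapt the listener API to a CompletableFuture once...
    static CompletableFuture<List<String>> query(List<String> skus) {
        CompletableFuture<List<String>> f = new CompletableFuture<>();
        queryInventoryAsync(skus, f::complete);
        return f;
    }

    // ...then run the 20-item batches strictly one after another
    static CompletableFuture<List<String>> queryAll(List<List<String>> batches) {
        CompletableFuture<List<String>> acc =
                CompletableFuture.completedFuture(new ArrayList<>());
        for (List<String> batch : batches) {
            // each batch starts only after the previous one has finished
            acc = acc.thenCompose(results ->
                    query(batch).thenApply(r -> { results.addAll(r); return results; }));
        }
        return acc;
    }

    public static void main(String[] args) {
        List<List<String>> batches = List.of(List.of("sku1", "sku2"), List.of("sku3"));
        System.out.println(queryAll(batches).join());
    }
}
```

This respects the "only one operation in flight" restriction while keeping the call site flat, which is essentially what the RxJava answer below achieves with flatMap.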
Handlers are useful for sequential work on any thread, not only on the UI thread.
Check out HandlerThread, create a Handler based on its Looper, and post background work to the handler.
It looks like ReactiveX promises exactly this.
http://blog.danlew.net/2014/09/22/grokking-rxjava-part-2/
query("Hello, world!") // Returns a List of website URLs based on a text search
.flatMap(urls -> Observable.from(urls))
.flatMap(url -> getTitle(url)) // long operation
.filter(title -> title != null)
.subscribe(title -> System.out.println(title));
ReactiveX for Android:
https://github.com/ReactiveX/RxAndroid
Retrolambda: https://github.com/orfjackal/retrolambda (Lambdas for Java 5,6,7)

Java List.set() changing list size. Why?

I'm writing an Android app which draws 4 graphs in 1 plot. Graph data is stored in GraphViewData(x, y) objects. I also have a List which contains 4 GraphViewData objects.
I want to give user ability to switch off/on some graphs.
I tried to write myList.set(index, null) to hide a graph and then myList.set(index, myObject) to show it again, but every time the List size changes, so I'm getting an IndexOutOfBoundsException.
Please, tell me why the List size is changing? Here is List.set() description:
Replaces the element at the specified location in this List with the
specified object. This operation does not change the size of the List.
Code:
public void removeSerie(int id){
    graphSeries.set(id, null);
    Log.d("CurrentListSize: ", graphSeries.size() + "");
}

public void addSerie(GraphViewData series, int id){
    graphSeries.set(id, series);
}
There is a discrepancy between the Java and Android javadocs for List.set(int, T). This is worrying, but you should resolve it as follows:
The Oracle version of the javadoc is definitive for Java. A List implementation is permitted to change the list size on set.
The Android version of the javadoc should be viewed as incorrect as a source for Java. You could argue that it is correct for Android, but that doesn't really help if you are dealing with code that wasn't specifically written for Android. (For example, code that was written for Java and then compiled for Android.)
The standard List classes in Java and Android won't do this. (Check the source code to be sure, but I'd be extremely surprised if any of them did without the javadocs saying so.)
A custom / 3rd-party List class could do this, and still follow the Java List contract (though arguably not the Android contract).
There are no guarantees that a custom / 3rd-party List class will follow either contract. This renders the whole argument moot ... unless you can persuade the relevant author / supplier to change the relevant list class. (And good luck with that, because it will probably break other peoples' code!!)
So what should you do? I recommend:
If you want to be able to do this with set, make sure that you are using a list class that supports it. Check the code of the list class if necessary.
Otherwise, change your algorithm.
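For the standard ArrayList, the first recommendation is easy to verify directly - set never changes the size, even when setting null:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SetDemo {
    public static void main(String[] args) {
        // four series, like the graphSeries list in the question
        List<String> graphSeries = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));

        graphSeries.set(1, null);    // "hide" a series: size stays 4
        System.out.println(graphSeries.size());

        graphSeries.set(1, "b");     // "show" it again: still 4
        System.out.println(graphSeries.size());
    }
}
```

If the size really is changing, the list in question is not a plain ArrayList, and the problem lies in whatever custom list class (or mistaken remove/add call) is actually being used.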
