I have a stream of objects which I would like to collect in the following way.
Let's say we are handling forum posts:
class Post {
    private Date time;
    private Data data;
}
I want to create a list which groups posts by a period. If there were no posts for X minutes, create a new group.
class PostsGroup {
    List<Post> posts = new ArrayList<>();
}
I want to get a List<PostsGroup> containing the posts grouped by the interval.
Example: interval of 10 minutes.
Posts:
[{time: x, data: {}}, {time: x + 3, data: {}}, {time: x + 12, data: {}}, {time: x + 45, data: {}}]
I want to get a list of posts group:
[
  {posts: [{time: x, data: {}}, {time: x + 3, data: {}}, {time: x + 12, data: {}}]},
  {posts: [{time: x + 45, data: {}}]}
]
Notice that the first group lasted until x + 22. Then a new post was received at x + 45.
Is this possible?
This problem could be easily solved using the groupRuns method of my StreamEx library:
long MAX_INTERVAL = TimeUnit.MINUTES.toMillis(10);
StreamEx.of(posts)
        .groupRuns((p1, p2) -> p2.time.getTime() - p1.time.getTime() <= MAX_INTERVAL)
        .map(PostsGroup::new)
        .toList();
I assume that you have a constructor:
class PostsGroup {
    private List<Post> posts;

    public PostsGroup(List<Post> posts) {
        this.posts = posts;
    }
}
The StreamEx.groupRuns method takes a BiPredicate which is applied to two adjacent input elements and returns true if they must be grouped together. It produces a stream of lists where each list represents one group. The method is lazy and works fine with parallel streams.
You need to retain state between stream entries and write yourself a grouping classifier. Something like this would be a good start.
class Post {
    private final long time;
    private final String data;

    public Post(long time, String data) {
        this.time = time;
        this.data = data;
    }

    @Override
    public String toString() {
        return "Post{" + "time=" + time + ", data=" + data + '}';
    }
}
public void test() {
    long t = 0;
    List<Post> posts = Arrays.asList(
            new Post(t, "One"),
            new Post(t + 1000, "Two"),
            new Post(t + 10000, "Three")
    );
    // Group every 5 seconds. Note: the classifier is stateful, so the
    // stream must be sequential and the posts sorted by time.
    Map<Long, List<Post>> grouped = posts
            .stream()
            .collect(Collectors.groupingBy(new ClassifyByTimeBetween(5000)));
    grouped.entrySet().forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
}
class ClassifyByTimeBetween implements Function<Post, Long> {
    final long delay;
    long currentGroupBy = -1;
    long lastDateSeen = -1;

    public ClassifyByTimeBetween(long delay) {
        this.delay = delay;
    }

    @Override
    public Long apply(Post p) {
        if (lastDateSeen >= 0) {
            if (p.time > lastDateSeen + delay) {
                // Too long since the last post - start a new group.
                currentGroupBy = p.time;
            }
        } else {
            // First time - start there.
            currentGroupBy = p.time;
        }
        lastDateSeen = p.time;
        return currentGroupBy;
    }
}
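If you need the List<PostsGroup> from the question rather than a map, a minimal follow-up sketch (assuming the grouped map from the test method above and the PostsGroup(List<Post>) constructor from the first answer):

List<PostsGroup> groups = grouped.entrySet().stream()
        .sorted(Map.Entry.comparingByKey()) // groupingBy returns an unordered HashMap
        .map(e -> new PostsGroup(e.getValue()))
        .collect(Collectors.toList());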
Since no one has provided a solution with a custom collector, as was required in the original problem statement, here is a collector implementation that groups Post objects based on the provided time interval.
The Date class mentioned in the question has been obsolete since Java 8 and is not recommended for use in new projects. Hence, LocalDateTime will be used instead.
Post & PostsGroup
For testing purposes, I've used Post implemented as a Java 16 record (if you substitute it with a class, the overall solution will be fully compliant with Java 8):
public record Post(LocalDateTime dateTime) {}
Also, I've enhanced the PostsGroup object. My idea is that it should be capable of deciding whether an offered Post should be added to its list of posts or rejected, as the Information Expert principle suggests (in short: all manipulations with the data should happen only inside the class to which that data belongs).
To facilitate this functionality, two extra fields were added: interval, of type Duration from the java.time package, representing the maximum interval between the earliest and the latest post in a group; and intervalBound, of type LocalDateTime, which gets initialized when the first post is added and is later used internally by the method isWithinInterval() to check whether an offered post fits into the interval.
public class PostsGroup {
    private Duration interval;
    private LocalDateTime intervalBound;
    private List<Post> posts = new ArrayList<>();

    public PostsGroup(Duration interval) {
        this.interval = interval;
    }

    public boolean tryAdd(Post post) {
        if (posts.isEmpty()) {
            intervalBound = post.dateTime().plus(interval);
            return posts.add(post);
        } else if (isWithinInterval(post)) {
            return posts.add(post);
        }
        return false;
    }

    public boolean isWithinInterval(Post post) {
        return post.dateTime().isBefore(intervalBound);
    }

    @Override
    public String toString() {
        return "PostsGroup{" + posts + '}';
    }
}
I'm making two assumptions:
All posts in the source are sorted by time (if that is not the case, you should introduce a sorted() operation in the pipeline before collecting the results, as sketched below);
Posts need to be collected into the minimum number of groups; as a consequence, it's not possible to split this task and execute the stream in parallel.
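A minimal sketch of the first point, using the groupPostsByInterval() collector defined below:

List<PostsGroup> groups = posts.stream()
        .sorted(Comparator.comparing(Post::dateTime)) // only needed if the source isn't sorted
        .collect(groupPostsByInterval(interval));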
Building a Custom Collector
We can create a custom collector either inline by using one of the versions of the static method Collector.of() or by defining a class that implements the Collector interface.
These parameters have to be provided while creating a custom collector:
Supplier Supplier<A> is meant to provide a mutable container which stores elements of the stream. In this case, ArrayDeque (as an implementation of the Deque interface) will be handy as a container that gives convenient access to the most recently added element, i.e. the latest PostsGroup.
Accumulator BiConsumer<A,T> defines how to add elements into the container provided by the supplier. For this task, we need to provide the logic that determines whether the next element from the stream (i.e. the next Post) should go into the last PostsGroup in the Deque, or whether a new PostsGroup needs to be allocated for it.
Combiner BinaryOperator<A> combiner() establishes a rule for merging two containers obtained while executing the stream in parallel. Since this operation is treated as non-parallelizable, the combiner is implemented to throw an AssertionError in case of parallel execution.
Finisher Function<A,R> is meant to produce the final result by transforming the mutable container. The finisher function in the code below turns the container, a deque containing the result, into an immutable list.
Note: the Java 16 method Stream.toList() is used inside the finisher function; for earlier versions it can be replaced with collect(Collectors.toUnmodifiableList()) (Java 10+) or collect(Collectors.toList()) (Java 8).
Characteristics allow providing additional information; for instance, Collector.Characteristics.UNORDERED denotes that the order in which partial results of the reduction are produced is not significant. In this case, the collector doesn't require any characteristics.
The method below is responsible for generating the collector based on the provided interval.
public static Collector<Post, ?, List<PostsGroup>> groupPostsByInterval(Duration interval) {
    return Collector.of(
            ArrayDeque::new,
            (Deque<PostsGroup> deque, Post post) -> {
                // if no groups have been created yet, or adding the post to the most recent group fails
                if (deque.isEmpty() || !deque.getLast().tryAdd(post)) {
                    PostsGroup postsGroup = new PostsGroup(interval);
                    postsGroup.tryAdd(post);
                    deque.addLast(postsGroup);
                }
            },
            (Deque<PostsGroup> left, Deque<PostsGroup> right) -> {
                throw new AssertionError("should not be used in parallel");
            },
            (Deque<PostsGroup> deque) -> deque.stream().toList());
}
main() - demo
public static void main(String[] args) {
    List<Post> posts = List.of(
            new Post(LocalDateTime.of(2022, 4, 28, 15, 0)),
            new Post(LocalDateTime.of(2022, 4, 28, 15, 3)),
            new Post(LocalDateTime.of(2022, 4, 28, 15, 5)),
            new Post(LocalDateTime.of(2022, 4, 28, 15, 8)),
            new Post(LocalDateTime.of(2022, 4, 28, 15, 12)),
            new Post(LocalDateTime.of(2022, 4, 28, 15, 15)),
            new Post(LocalDateTime.of(2022, 4, 28, 15, 18)),
            new Post(LocalDateTime.of(2022, 4, 28, 15, 27)),
            new Post(LocalDateTime.of(2022, 4, 28, 15, 48)),
            new Post(LocalDateTime.of(2022, 4, 28, 15, 54)));

    Duration interval = Duration.ofMinutes(10);

    List<PostsGroup> postsGroups = posts.stream()
            .collect(groupPostsByInterval(interval));

    postsGroups.forEach(System.out::println);
}
Output:
PostsGroup{[Post[dateTime=2022-04-28T15:00], Post[dateTime=2022-04-28T15:03], Post[dateTime=2022-04-28T15:05], Post[dateTime=2022-04-28T15:08]]}
PostsGroup{[Post[dateTime=2022-04-28T15:12], Post[dateTime=2022-04-28T15:15], Post[dateTime=2022-04-28T15:18]]}
PostsGroup{[Post[dateTime=2022-04-28T15:27]]}
PostsGroup{[Post[dateTime=2022-04-28T15:48], Post[dateTime=2022-04-28T15:54]]}
Related
I need to calculate the execution time of some methods. These are private methods in the class, so Spring AOP is not appropriate. Now the code looks like this:
public void method() {
    StopWatch sw = new StopWatch();

    sw.start();
    innerMethod1();
    sw.stop();
    Monitoring.add("eventType1", sw.getLastTaskTimeMillis());

    sw.start();
    innerMethod2("abs");
    sw.stop();
    Monitoring.add("eventType2", sw.getLastTaskTimeMillis());

    sw.start();
    innerMethod3(5, 29);
    sw.stop();
    Monitoring.add("eventType3", sw.getLastTaskTimeMillis());
}
But these time-measurement insertions clutter the business logic. Are there any solutions? The data will then be recorded in a database for Grafana. I'm looking towards AspectJ, but I can't pass JVM flags when starting the app.
When class instrumentation is required in environments that do not support or are not supported by the existing LoadTimeWeaver implementations, a JDK agent can be the only solution. For such cases, Spring provides InstrumentationLoadTimeWeaver, which requires a Spring-specific (but very general) VM agent, org.springframework.instrument-{version}.jar (previously named spring-agent.jar).
To use it, you must start the virtual machine with the Spring agent, by supplying the following JVM options:
-javaagent:/path/to/org.springframework.instrument-{version}.jar
To Mark Bramnik:
If I understand you correctly, then for methods
private List<String> innerMethod3(int value, int count) {
    // ...
}

private String innerMethod2(String event) {
    // ...
}
I need methods like:
public <T, R, U> U timed(T value, R count, BiFunction<T, R, U> function) {
    long start = System.currentTimeMillis();
    U result = function.apply(value, count);
    Monitoring.add("method", System.currentTimeMillis() - start);
    return result;
}

public <T, R> R timed(T value, Function<T, R> function) {
    long start = System.currentTimeMillis();
    R result = function.apply(value);
    Monitoring.add("method", System.currentTimeMillis() - start);
    return result;
}
And calling methods:
List<String> timed = timed(5, 5, this::innerMethod3);
String string = timed("string", this::innerMethod2);
But if a method has four parameters, then I need yet another timing method and a new functional interface.
There are many approaches you can take but all will boil down to refactoring.
Approach 1:
class Timed {
    public static void timed(String name, Runnable codeBlock) {
        long from = System.currentTimeMillis();
        codeBlock.run();
        long to = System.currentTimeMillis();
        System.out.println("Monitored: " + name + " : " + (to - from) + " ms");
    }

    public static <T> T timed(String name, Supplier<T> codeBlock) {
        long from = System.currentTimeMillis();
        T result = codeBlock.get();
        long to = System.currentTimeMillis();
        System.out.println("Monitored: " + name + " : " + (to - from) + " ms");
        return result;
    }
}
Notes:
I've used the Runnable / Supplier interfaces for simplicity; you might want to create your own functional interfaces for this.
I've used System.out - you'll use the existing Monitoring.add call instead.
The aforementioned code can be used like this:
Timed.timed("sample.runnable", ()-> { // Timed. can be statically imported for even further brevity
// some code block here
});
// will measure
int result = Timed.timed("sample.callable", () -> 42);
// will measure and result will be 42
Another approach.
Refactor the code to public methods and integrate with Micrometer, which already has annotation support (see @Timed).
I don't know what Monitoring is, but Micrometer already integrates with Prometheus (and other similar products that can store the metrics to be consumed later from Grafana), and it keeps in memory a mathematical model of your measurements rather than every individual measurement. A custom implementation of all that is complicated code to maintain.
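For illustration, a minimal sketch of that route, assuming a Spring application with micrometer-core on the classpath (the metric name and method names here are hypothetical):

import io.micrometer.core.annotation.Timed;
import io.micrometer.core.aop.TimedAspect;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class MetricsConfig {
    // Registers the aspect that makes @Timed work on Spring beans.
    @Bean
    TimedAspect timedAspect(MeterRegistry registry) {
        return new TimedAspect(registry);
    }
}

class SomeService {
    // Must be public and invoked through the Spring proxy to be measured.
    @Timed(value = "eventType1")
    public void innerMethod1AsPublic() {
        // business logic
    }
}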
Update 1
No, you got it wrong: you don't need to maintain different versions of timed - you need only the two versions that I've provided in the solution. In the case that you've presented in the question, you won't even need the second version of timed.
Your code will become:
public void method() {
    Timed.timed("eventType1", () -> {
        innerMethod1();
    });
    Timed.timed("eventType2", () -> {
        innerMethod2("abs");
    });
    Timed.timed("eventType3", () -> {
        innerMethod3(5, 29);
    });
}
The second version is required for the cases where you actually return some value from the "timed" code:
Example:
Let's say you have an innerMethod4 that returns a String, so you'll write the following code:
String result = Timed.timed("eventType3", () -> {
    return innerMethod4(5, 29);
});
If I have a list of timestamps and a file path of an object that I want to convert, can I make a collection of converters that expect the method signature Converter(filePath, start, end)?
More Detail (Pseudo-code):
Some list that has timestamps (imagine they're in seconds): path = somewhere, list = {0, 15, 15, 30}.
How can I do something like this:
list.stream.magic.map(start, end -> new Converter (path, start, end))?
Result: new Converter (path, 0, 15), new Converter(path, 15, 30)
Note: I'm aware of BiFunction, but to my knowledge, streams do not implement it.
There are many approaches to get the required result using streams.
But first of all, you're not obliged to use the Stream API; when dealing with lists of tens or hundreds of elements, I would suggest plain old list iteration (see the sketch after the sample code at the end of this answer).
Just for instance, try the code sample below.
We can easily see two surface problems arising from the nature of streams and their incompatibility with the very idea of pairing elements:
it's necessary to apply a stateful function, which is really tricky to use in map() and should be considered dirty coding, and the mapping produces nulls in even positions that have to be filtered out properly;
there are problems when the stream contains an odd number of elements, and you can never predict whether it does.
If you decide to use streams, then to do this cleanly you need a custom implementation of Iterator, Spliterator, or Collector, depending on the demands.
Anyway, there are a couple of non-obvious corner cases you won't be happy to implement yourself, so you can try one of the many third-party stream libraries.
Two of the most popular are StreamEx and RxJava.
They definitely have tools for pairing stream elements... but don't forget to check the performance for your case!
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Stream;

public class Sample
{
    public static void main(String... arg)
    {
        String path = "somewhere";
        Stream<Converter> stream = Stream.of(0, 15, 25, 30).map(
                new Function<Integer, Converter>()
                {
                    int previous;
                    boolean even = true;

                    @Override
                    public Converter apply(Integer current)
                    {
                        // Emit a Converter on every second element; null otherwise.
                        Converter converter = even ? null : new Converter(path, previous, current);
                        even = !even;
                        previous = current;
                        return converter;
                    }
                }).filter(Objects::nonNull);
        stream.forEach(System.out::println);
    }

    static class Converter
    {
        private final String path;
        private final int start;
        private final int end;

        Converter(String path, int start, int end)
        {
            this.path = path;
            this.start = start;
            this.end = end;
        }

        public String toString()
        {
            return String.format("Converter[%s,%s,%s]", path, start, end);
        }
    }
}
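As promised, here is a plain-iteration sketch (assuming the same Converter class and path variable; the loop simply ignores a trailing unpaired element):

List<Integer> list = Arrays.asList(0, 15, 15, 30);
List<Converter> converters = new ArrayList<>();
for (int i = 0; i + 1 < list.size(); i += 2)
{
    converters.add(new Converter(path, list.get(i), list.get(i + 1)));
}

And if you go the library route, StreamEx can partition a list into non-overlapping sublists; a sketch assuming the one.util.streamex dependency is on the classpath:

List<Converter> converters = StreamEx.ofSubLists(Arrays.asList(0, 15, 15, 30), 2)
        .filter(pair -> pair.size() == 2) // drop a trailing unpaired element
        .map(pair -> new Converter(path, pair.get(0), pair.get(1)))
        .toList();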
I'm working with Akka (version 2.4.17) to build an observation Flow in Java (let's say of elements of type <T> to stay generic).
My requirement is that this Flow should be customizable to deliver a maximum number of observations per unit of time as soon as they arrive. For instance, it should be able to deliver at most 2 observations per minute (the first that arrive, the rest can be dropped).
I looked very closely at the Akka documentation, and in particular this page which details the built-in stages and their semantics.
So far, I tried the following approaches.
With throttle and shaping() mode (to not close the stream when the limit is exceeded):
Flow.of(T.class)
    .throttle(2,
              new FiniteDuration(1, TimeUnit.MINUTES),
              0,
              ThrottleMode.shaping())
With groupedWith and an intermediary custom method:
final int nbObsMax = 2;
Flow.of(T.class)
    .groupedWithin(Integer.MAX_VALUE, new FiniteDuration(1, TimeUnit.MINUTES))
    .map(list -> {
        List<T> listToTransfer = new ArrayList<>();
        for (int i = list.size() - nbObsMax; i > 0 && i < list.size(); i++) {
            listToTransfer.add(new T(list.get(i)));
        }
        return listToTransfer;
    })
    .mapConcat(elem -> elem) // Splitting List<T> into a Flow of T objects
The previous approaches give me the correct number of observations per unit of time, but the observations are retained and only delivered at the end of the time window (and therefore with an additional delay).
To give a more concrete example, if the following observations arrives into my Flow:
[Obs1 t=0s] [Obs2 t=45s] [Obs3 t=47s] [Obs4 t=121s] [Obs5 t=122s]
It should only output the following ones as soon as they arrive (processing time can be neglected here):
Window 1: [Obs1 t~0s] [Obs2 t~45s]
Window 2: [Obs4 t~121s] [Obs5 t~122s]
Any help will be appreciated, thanks for reading my first StackOverflow post ;)
I cannot think of an out-of-the-box solution that does what you want. Throttle will emit in a steady stream because of how it is implemented with the bucket model, rather than granting a permitted lease at the start of every time period.
To get the exact behavior you are after you would have to create your own custom rate-limit stage (which might not be that hard). You can find the docs on how to create custom stages here: http://doc.akka.io/docs/akka/2.5.0/java/stream/stream-customize.html#custom-linear-processing-stages-using-graphstage
One design that could work is having an allowance counter saying how many elements can be emitted, which you reset every interval; for every incoming element you subtract one from the counter and emit, and when the allowance is used up you keep pulling upstream but discard the elements rather than emit them. Using TimerGraphStageLogic for the GraphStageLogic allows you to set a timed callback that can reset the allowance.
I think this is exactly what you need: http://doc.akka.io/docs/akka/2.5.0/java/stream/stream-cookbook.html#Globally_limiting_the_rate_of_a_set_of_streams
Thanks to the answer of @johanandren, I've successfully implemented a custom time-based GraphStage that meets my requirements.
I post the code below, if anyone is interested:
import akka.stream.Attributes;
import akka.stream.FlowShape;
import akka.stream.Inlet;
import akka.stream.Outlet;
import akka.stream.stage.*;
import scala.concurrent.duration.FiniteDuration;

public class CustomThrottleGraphStage<A> extends GraphStage<FlowShape<A, A>> {

    private final FiniteDuration silencePeriod;
    private int nbElemsMax;

    public CustomThrottleGraphStage(int nbElemsMax, FiniteDuration silencePeriod) {
        this.silencePeriod = silencePeriod;
        this.nbElemsMax = nbElemsMax;
    }

    public final Inlet<A> in = Inlet.create("TimedGate.in");
    public final Outlet<A> out = Outlet.create("TimedGate.out");

    private final FlowShape<A, A> shape = FlowShape.of(in, out);

    @Override
    public FlowShape<A, A> shape() {
        return shape;
    }

    @Override
    public GraphStageLogic createLogic(Attributes inheritedAttributes) {
        return new TimerGraphStageLogic(shape) {

            private boolean open = false;
            private int countElements = 0;

            {
                setHandler(in, new AbstractInHandler() {
                    @Override
                    public void onPush() throws Exception {
                        A elem = grab(in);
                        if (open || countElements >= nbElemsMax) {
                            pull(in); // we drop all incoming observations since the rate limit has been reached
                        } else {
                            if (countElements == 0) { // we schedule the next instant to reset the observation counter
                                scheduleOnce("resetCounter", silencePeriod);
                            }
                            push(out, elem); // we forward the incoming observation
                            countElements += 1; // we increment the counter
                        }
                    }
                });
                setHandler(out, new AbstractOutHandler() {
                    @Override
                    public void onPull() throws Exception {
                        pull(in);
                    }
                });
            }

            @Override
            public void onTimer(Object key) {
                if (key.equals("resetCounter")) {
                    open = false;
                    countElements = 0;
                }
            }
        };
    }
}
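For completeness, a minimal sketch of wiring this stage into a flow (MyObs stands for a hypothetical element type):

import akka.NotUsed;
import akka.stream.javadsl.Flow;
import java.util.concurrent.TimeUnit;
import scala.concurrent.duration.FiniteDuration;

// Allow at most 2 elements per minute; excess elements are dropped.
Flow<MyObs, MyObs, NotUsed> rateLimited =
        Flow.of(MyObs.class)
            .via(new CustomThrottleGraphStage<>(2, new FiniteDuration(1, TimeUnit.MINUTES)));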
The following code works and is readable but it seems to me I have intermediate operations that feel like they shouldn't be necessary. I've written this simplified version as the actual code is part of a much larger process.
I've got a Collection of Widget, each with a name and multiple types (indicated by constants of the WidgetType enum). These multiple types are gettable as a Stream<WidgetType> though, if necessary, I could return those as some other type. (For various reasons, it is strongly desirable that these be returned as a Stream<WidgetType> because of how these widgets are used later in the actual code.)
These widgets are added to an EnumMap<WidgetType, List<Widget>> which is, later, translated into an EnumMap<WidgetType, Widget[]>.
If each Widget only had a single WidgetType, this would be a trivial solve but, since any Widget could have 1 or more types, I am tripping all over myself with the syntax of the Collectors.groupingBy() method (and its overloads).
Here's the code example, again, fully functional and gives me the exact result I need.
class StackOverFlowExample {

    private final Map<WidgetType, Widget[]> widgetMap = new EnumMap<>(WidgetType.class);

    public static void main(String[] args) { new StackOverFlowExample(); }

    StackOverFlowExample() {
        Collection<Widget> widgetList = getWidgetsFromWhereverWidgetsComeFrom();
        {
            final Map<WidgetType, List<Widget>> intermediateMap = new EnumMap<>(WidgetType.class);
            widgetList.forEach(w ->
                    w.getWidgetTypes().forEach(wt -> {
                        intermediateMap.putIfAbsent(wt, new ArrayList<>());
                        intermediateMap.get(wt).add(w);
                    })
            );
            intermediateMap.entrySet().forEach(e ->
                    widgetMap.put(e.getKey(), e.getValue().toArray(new Widget[0])));
        }
        Arrays.stream(WidgetType.values()).forEach(wt ->
                System.out.println(wt + ": " + Arrays.toString(widgetMap.get(wt))));
    }

    private Collection<Widget> getWidgetsFromWhereverWidgetsComeFrom() {
        return Arrays.asList(
                new Widget("1st", WidgetType.TYPE_A, WidgetType.TYPE_B),
                new Widget("2nd", WidgetType.TYPE_A, WidgetType.TYPE_C),
                new Widget("3rd", WidgetType.TYPE_A, WidgetType.TYPE_D),
                new Widget("4th", WidgetType.TYPE_C, WidgetType.TYPE_D)
        );
    }
}
This outputs:
TYPE_A: [1st, 2nd, 3rd]
TYPE_B: [1st]
TYPE_C: [2nd, 4th]
TYPE_D: [3rd, 4th]
For completeness sake, here's the Widget class and the WidgetType enum:
class Widget {
    private final String name;
    private final WidgetType[] widgetTypes;

    Widget(String n, WidgetType... wt) { name = n; widgetTypes = wt; }

    public String getName() { return name; }

    public Stream<WidgetType> getWidgetTypes() { return Arrays.stream(widgetTypes).distinct(); }

    @Override public String toString() { return name; }
}

enum WidgetType { TYPE_A, TYPE_B, TYPE_C, TYPE_D }
Any ideas on a better way to execute this logic are welcome. Thanks!
IMHO, the key is to convert a Widget instance to a Stream<Pair<WidgetType, Widget>> instance. Once we have that, we can flatMap a stream of widgets and collect on the resulting stream. Of course we don't have Pair in Java, so we have to use AbstractMap.SimpleEntry instead.
widgets.stream()
        // Convert a stream of widgets to a stream of (type, widget) entries
        .flatMap(w -> w.getWidgetTypes().map(t -> new AbstractMap.SimpleEntry<>(t, w)))
        // Group by the key, and do additional mapping to get the widget
        .collect(groupingBy(e -> e.getKey(),
                mapping(e -> e.getValue(),
                        collectingAndThen(toList(), l -> l.toArray(new Widget[0])))));
P.S. this is an occasion where IntelliJ's suggestion doesn't shorten a lambda with method reference.
This is a bit convoluted, but it produces the same output, not necessarily in the same order. It uses a static import of java.util.stream.Collectors.*.
widgetMap = widgetList.stream()
        .flatMap(w -> w.getWidgetTypes().map(t -> new AbstractMap.SimpleEntry<>(t, w)))
        .collect(groupingBy(Map.Entry::getKey,
                collectingAndThen(mapping(Map.Entry::getValue, toSet()),
                        s -> s.stream().toArray(Widget[]::new))));
Output on my machine:
TYPE_A: [1st, 3rd, 2nd]
TYPE_B: [1st]
TYPE_C: [2nd, 4th]
TYPE_D: [3rd, 4th]
I'm making an Android interface that shows some data fetched from the network. I want to have it show the latest available data, and to never be empty (unless no data has been fetched at all yet) so I'm using a BehaviorSubject to give subscribers (my UI) the latest available info, while refreshing it in the background to update it.
This works, but due to another requirement in my UI, I now have to know whether or not the published result was gotten fresh from the network or not. (In other words, I need to know if the published result was BehaviorSubject's saved item or not.)
How can I achieve this? If I need to split it up into multiple Observables, that's fine, as long as I'm able to get the caching behavior of BehaviorSubject (getting the last available result) while also being able to tell if the result returned was from the cache or not. A hacky way I can think of to do it would be to check if the timestamp of the response was relatively soon, but that'd be really sloppy and I'd rather figure out a way to do it with RxJava.
As you mentioned in the question, this can be accomplished with multiple Observables. In essence, you have two Observables: "the fresh response can be observed", and "the cached response can be observed". If something can be "observed", you can express it as an Observable. Let's name the first one original and the second replayed.
See this JSBin (JavaScript but the concepts can be directly translated to Java. There isn't a JavaBin as far as I know, for these purposes).
var original = Rx.Observable.interval(1000)
  .map(function (x) { return {value: x, from: 'original'}; })
  .take(4)
  .publish().refCount();

var replayed = original
  .map(function (x) { return {value: x.value, from: 'replayed'}; })
  .replay(null, 1).refCount();

var merged = Rx.Observable.merge(original, replayed)
  .replay(null, 1).refCount()
  .distinctUntilChanged(function (obj) { return obj.value; });

console.log('subscribe 1st');
merged.subscribe(function (x) {
  console.log('subscriber1: value ' + x.value + ', from: ' + x.from);
});

setTimeout(function () {
  console.log('  subscribe 2nd');
  merged.subscribe(function (x) {
    console.log('  subscriber2: value ' + x.value + ', from: ' + x.from);
  });
}, 2500);
The overall idea here is: annotate the event with a field from indicating its origin. If it's original, it's a fresh response. If it's replayed, it's a cached response. Observable original will only emit from: 'original' and Observable replayed will only emit from: 'replayed'. In Java we would require a bit more boilerplate because you need to make a class to represent these annotated events. Otherwise the same operators in RxJS can be found in RxJava.
The original Observable is publish().refCount() because we want only one instance of this stream, to be shared with all observers. In fact in RxJS and Rx.NET, share() is an alias for publish().refCount().
The replayed Observable is replay(1).refCount() because it is also shared just like the original one is, but replay(1) gives us the caching behavior.
The merged Observable contains both original and replayed, and this is what you should expose to all subscribers. Since replayed will immediately emit whenever original does, we use distinctUntilChanged on the event's value to ignore immediate consecutive duplicates. The reason we also replay(1).refCount() the merged stream is that we want the merge of original and replayed to be one single instance of a stream shared among all observers. We would have used publish().refCount() for this purpose, but we cannot lose the replay effect that replayed contains; hence it's replay(1).refCount(), not publish().refCount().
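For reference, a rough RxJava 1.x translation of the snippet above (Tagged is a hypothetical wrapper class for the annotated events):

import java.util.concurrent.TimeUnit;
import rx.Observable;

class Tagged {
    final long value;
    final String from;
    Tagged(long value, String from) { this.value = value; this.from = from; }
}

Observable<Tagged> original = Observable.interval(1, TimeUnit.SECONDS)
        .map(x -> new Tagged(x, "original"))
        .take(4)
        .publish().refCount();

Observable<Tagged> replayed = original
        .map(t -> new Tagged(t.value, "replayed"))
        .replay(1).refCount();

// Expose this to subscribers; the 'from' field distinguishes fresh from cached.
Observable<Tagged> merged = Observable.merge(original, replayed)
        .replay(1).refCount()
        .distinctUntilChanged(t -> t.value);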
Doesn't Distinct cover your case? BehaviorSubject only repeats the latest element after subscription.
I believe what you want is something like this:
private final BehaviorSubject<T> fetched = BehaviorSubject.create();
private final Observable<FirstTime<T>> _fetched = fetched.lift(new Observable.Operator<FirstTime<T>, T>() {
    private AtomicReference<T> last = new AtomicReference<>();

    @Override
    public Subscriber<? super T> call(Subscriber<? super FirstTime<T>> child) {
        return new Subscriber<T>(child) {
            @Override
            public void onCompleted() {
                child.onCompleted();
            }

            @Override
            public void onError(Throwable e) {
                child.onError(e);
            }

            @Override
            public void onNext(T t) {
                // Mark the value as first-time only if it differs from the previous one.
                if (!Objects.equals(t, last.getAndSet(t))) {
                    child.onNext(FirstTime.yes(t));
                } else {
                    child.onNext(FirstTime.no(t));
                }
            }
        };
    }
});

public Observable<FirstTime<T>> getObservable() {
    return _fetched;
}
public Observable<FirstTime<T>> getObservable() {
return _fetched;
}
public static class FirstTime<T> {
    final boolean isItTheFirstTime;
    final T value;

    public FirstTime(boolean isItTheFirstTime, T value) {
        this.isItTheFirstTime = isItTheFirstTime;
        this.value = value;
    }

    public boolean isItTheFirstTime() {
        return isItTheFirstTime;
    }

    public T getValue() {
        return value;
    }

    public static <T> FirstTime<T> yes(T value) {
        return new FirstTime<>(true, value);
    }

    public static <T> FirstTime<T> no(T value) {
        return new FirstTime<>(false, value);
    }
}
The wrapper class FirstTime has a boolean which can be used to see if any subscriber to the Observable has seen it before.
Hope that helps.
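A minimal usage sketch, assuming the fields above live in some repository object and render() is a hypothetical UI callback:

repository.getObservable().subscribe(ft -> {
    if (ft.isItTheFirstTime()) {
        // fresh value from the network
    } else {
        // replayed/cached value
    }
    render(ft.getValue());
});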
Store the information of the BehaviorSubject objects in a data structure with good lookup performance, such as a dictionary. Each value would be a key, and the mapped value would be the number of iterations.
So when you look at a particular key, if your dictionary already contains it and its count is already at one, then you know the value is a repeated value.
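A minimal sketch of that bookkeeping in Java (assuming the values implement equals/hashCode and T is the element type):

private final Map<T, Integer> seenCounts = new HashMap<>();

// Returns true if this exact value has been emitted before.
boolean isRepeated(T value) {
    return seenCounts.merge(value, 1, Integer::sum) > 1;
}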
I'm not really sure what you want to achieve. Probably you'd just like to have a smart source for the "latest" data and a second source which tells you when the data was refreshed?
BehaviorSubject<Integer> dataSubject = BehaviorSubject.create(42); // initial value, "never empty"

Observable<String> refreshedIndicator = dataSubject.map(data -> "Refreshed!");
refreshedIndicator.subscribe(System.out::println);

Observable<Integer> latestActualData = dataSubject.distinctUntilChanged();
latestActualData.subscribe(data -> System.out.println("Got new data: " + data));

// simulation of background activity:
Observable.interval(1, TimeUnit.SECONDS)
          .limit(100)
          .toBlocking()
          .subscribe(aLong -> dataSubject.onNext(ThreadLocalRandom.current().nextInt(2)));
Output:
Refreshed!
Got new data: 42
Refreshed!
Got new data: 0
Refreshed!
Refreshed!
Refreshed!
Got new data: 1
Refreshed!
Got new data: 0
Refreshed!
Got new data: 1