I am trying to find the best way to optimise the converters below. The flow starts at convertAndGroupForUpdate, which triggers the conversions and the relevant mappings.
Any help optimising this code would be massively appreciated.
public List<GroupedOrderActionUpdateEntity> convertAndGroupForUpdate(List<SimpleRatifiableAction> actions) {
    List<GroupedOrderActionUpdateEntity> groupedActions = new ArrayList<>();
    Map<String, List<SimpleRatifiableAction>> groupSimple = actions.stream()
            .collect(Collectors.groupingBy(x -> x.getOrderNumber() + x.getActionType()));
    groupSimple.entrySet().stream()
            .map(x -> convertToUpdateGroup(x.getValue()))
            .forEachOrdered(groupedActions::add);
    return groupedActions;
}
public GroupedOrderActionUpdateEntity convertToUpdateGroup(List<SimpleRatifiableAction> actions) {
    List<OrderActionUpdateEntity> actionList = actions.stream()
            .map(x -> convertToUpdateEntity(x))
            .collect(Collectors.toList());
    return new GroupedOrderActionUpdateEntity(
            actions.get(0).getOrderNumber(),
            OrderActionType.valueOf(actions.get(0).getActionType()),
            actions.get(0).getSource(),
            12345,
            actions.stream().map(SimpleRatifiableAction::getNote)
                    .collect(Collectors.joining(", ", "Group Order Note: ", ".")),
            actionList);
}
public OrderActionUpdateEntity convertToUpdateEntity(SimpleRatifiableAction action) {
    return new OrderActionUpdateEntity(action.getId(), OrderActionState.valueOf(action.getState()));
}
You can’t elide the grouping operation, but you don’t need to store the intermediate result in a local variable.
Further, you should not add to a list manually when you can collect to a List; just do it like you did in the other method.
Also, creating a grouping key via string concatenation is tempting, but very dangerous: depending on the contents of the properties, the resulting strings may clash. String concatenation is also rather expensive. Just create a list of the property values; as long as you don’t modify it, it provides the right equality semantics and hash code implementation.
If you only want to process the values of a map, don’t call entrySet() and map each entry via getValue(); just use values() in the first place.
public List<GroupedOrderActionUpdateEntity> convertAndGroupForUpdate(
        List<SimpleRatifiableAction> actions) {
    return actions.stream()
            .collect(Collectors.groupingBy( // use List.of(…, …) in Java 9 or newer
                    x -> Arrays.asList(x.getOrderNumber(), x.getActionType())))
            .values().stream()
            .map(x -> convertToUpdateGroup(x))
            .collect(Collectors.toList());
}
Since convertToUpdateGroup processes the list of actions of each group multiple times, there is not much that can be simplified, and I wouldn’t inline it either. If there were only one operation, e.g. joining the notes to a string, you could do that right in the groupingBy operation, but there is no simple way to collect to multiple results.
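To illustrate the key-clash danger mentioned above, here is a small self-contained sketch; the order numbers and action types are made up for the demo:

```java
import java.util.Arrays;
import java.util.List;

public class KeyClashDemo {
    public static void main(String[] args) {
        // Two logically different (orderNumber, actionType) pairs...
        String key1 = "12" + "3A";  // orderNumber "12",  actionType "3A"
        String key2 = "123" + "A";  // orderNumber "123", actionType "A"
        System.out.println(key1.equals(key2)); // true -- the two groups would merge incorrectly

        // List keys keep the components separate, so no clash occurs
        List<String> listKey1 = Arrays.asList("12", "3A");
        List<String> listKey2 = Arrays.asList("123", "A");
        System.out.println(listKey1.equals(listKey2)); // false
    }
}
```

The list key works because List defines equals and hashCode over its elements in order, so two keys are equal only when every component matches.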
Related
I am currently stuck on this:
I have datapoints that carry a value and a timestamp as a Long (epoch seconds):
public class MyDataPoint {
    private Float value;
    private Long timestamp;
    //constructor, getters and setters here
}
I have lists that are bound to different sources where these datapoints are coming from.
public class MySource {
    private Integer sourceId;
    private List<MyDataPoint> dataPointList;
    //constructor, getters and setters here
}
Now I want to accumulate these datapoints into a new list:
each set of datapoints sharing a timestamp should be accumulated into a new datapoint carrying that timestamp and the sum of their values.
So, for instance, if I have 3 datapoints with the same timestamp, I want to create one datapoint with that timestamp and the sum of the three values.
However, the sources have not all started or ended recording at the same time, so for some timestamps maybe only one datapoint exists.
For now I have stuffed all of the datapoints into one list, thinking I could use streams to achieve my goal, but I can't figure it out. Maybe this is the wrong approach anyway, because I can't see how to use filters or maps to do this.
I have thought about using Optionals since for one timestamp maybe only one exists, but there is no obvious answer for me.
Anyone able to help me out?
I am guessing that you are trying to group the values in the list and then convert it to a new list using a stream. What I suggest is using Collectors.groupingBy and Collectors.summingDouble to convert your List to a Map<Long, Double> first, which holds your timestamp as the key and a Double as the sum of all values that share that timestamp. After this you can convert the map back to the new list.
Not tested yet, but converting your List to a Map<Long, Double> should be something like:
dataPointList.stream().collect(Collectors.groupingBy(MyDataPoint::getTimestamp, Collectors.summingDouble(MyDataPoint::getValue))); // method references read better than lambdas here
The following assumes your MyDataPoint is immutable (you cannot use the same instance to accumulate into), so it uses an intermediate Map.
Collection<MyDataPoint> summary = sources.stream()
        .flatMap(source -> source.getDataPointList().stream()) // smush sources into a single stream of points
        .collect(groupingBy(MyDataPoint::getTimestamp, summingDouble(MyDataPoint::getValue))) // collect points into Map<Long, Double>
        .entrySet().stream() // new stream over the entries of the Map
        .map(e -> new MyDataPoint(e.getValue().floatValue(), e.getKey())) // assuming a (value, timestamp) constructor
        .collect(toList());
Another solution avoids streaming over the intermediate Map a second time by reducing each group directly into a MyDataPoint.
public static MyDataPoint combine(MyDataPoint left, MyDataPoint right) {
    // take right's timestamp so the ZERO identity element doesn't leak its own
    return new MyDataPoint(left.getValue() + right.getValue(), right.getTimestamp()); // return new if immutable, or increase left if not
}
Collection<MyDataPoint> summary = sources.stream()
        .flatMap(source -> source.getDataPointList().stream()) // smush into a single stream of points
        .collect(groupingBy(MyDataPoint::getTimestamp, reducing(MyDataPoint.ZERO, MyDataPoint::combine))) // MyDataPoint.ZERO is an assumed identity constant with value 0
        .values();
This can be upgraded to parallelStream() if MyDataPoint is thread-safe, etc.
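A third variant worth considering is Collectors.toMap with a merge function, which needs neither an identity element like ZERO nor a second pass over the entries. The following is a sketch only; a simplified record stands in for MyDataPoint and the nested lists stand in for MySource:

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

public class SummarizeDemo {
    // Simplified stand-in for the question's MyDataPoint (value, timestamp)
    record Point(float value, long timestamp) {}

    public static void main(String[] args) {
        // Two "sources" whose points partly overlap in time
        List<List<Point>> sources = List.of(
                List.of(new Point(1f, 100L), new Point(2f, 200L)),
                List.of(new Point(3f, 100L)));

        // toMap's merge function sums points sharing a timestamp
        Collection<Point> summary = sources.stream()
                .flatMap(List::stream)
                .collect(Collectors.toMap(
                        Point::timestamp,                                      // key: epoch seconds
                        p -> p,                                                // value: the point itself
                        (a, b) -> new Point(a.value() + b.value(), a.timestamp()))) // merge on clash
                .values();

        System.out.println(summary.size()); // 2 -- one combined point per distinct timestamp
    }
}
```

Here the merge function only ever sees two points with the same timestamp, so no artificial identity element is needed.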
I think the "big picture" solution is quite easy, even if I can foresee some multithreading issues complicating it all.
In pure Java, you simply need a Map:
Map<Long, List<MyDataPoint>> dataPoints = new HashMap<>();
Just use the timestamp as the key.
For the sake of OOP, let's create a class like DataPointCollector:
public class DataPointCollector {
    private Map<Long, List<MyDataPoint>> dataPoints = new HashMap<>();
}
To add an element, create a method in DataPointCollector like:
public void addDataPoint(MyDataPoint dp) {
    if (dataPoints.get(dp.getTimestamp()) == null) {
        dataPoints.put(dp.getTimestamp(), new ArrayList<MyDataPoint>());
    }
    dataPoints.get(dp.getTimestamp()).add(dp);
    // or, more idiomatically, replace the body with:
    // dataPoints.computeIfAbsent(dp.getTimestamp(), k -> new ArrayList<>()).add(dp);
}
This solves most of your theoretical problems.
To get the sum, just iterate over the list and sum the values.
If you need a real-time sum, wrap the List in another object that has totalValue and List<MyDataPoint> as fields, and update totalValue on each invocation of addDataPoint(...).
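Such a wrapper could look like the following sketch; all names are hypothetical, and a simplified record stands in for MyDataPoint:

```java
import java.util.ArrayList;
import java.util.List;

public class BucketDemo {
    // Simplified stand-in for the question's MyDataPoint (value, timestamp)
    record MyDataPoint(Float value, Long timestamp) {}

    // Hypothetical wrapper keeping a running total alongside the points
    static class TimestampBucket {
        private float totalValue;
        private final List<MyDataPoint> points = new ArrayList<>();

        void add(MyDataPoint dp) {
            points.add(dp);
            totalValue += dp.value(); // running sum, updated on every add
        }

        float totalValue() { return totalValue; }
        List<MyDataPoint> points() { return points; }
    }

    public static void main(String[] args) {
        TimestampBucket bucket = new TimestampBucket();
        bucket.add(new MyDataPoint(1.5f, 100L));
        bucket.add(new MyDataPoint(2.5f, 100L));
        System.out.println(bucket.totalValue()); // 4.0
    }
}
```

The collector's map would then be Map<Long, TimestampBucket>, so the sum for any timestamp is available in constant time.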
About streams: it depends on the use case. If at a certain point you have all the DataPoints you need, you can of course use streams to do these things. However, streams are often overkill for simple cases, and I think it's better to focus on an easy solution first and make it fancy with streams only if needed.
I have a problem with correctly combining multiple Collectors::groupingBy functions and then applying them all at once to a given input.
Let's say I have some class implementing following interface:
interface Something {
String get1();
String get2();
String get3();
String get4();
}
And now I can have some list of combinations of the methods from this interface, i.e. these lists can be:
[Something::get1, Something::get3], [Something::get2, Something::get1, Something::get3].
Now, having such a list of methods and having a list of somethings, I would like to group those somethings by getters.
What I mean is that, for example, for the list [Something::get1, Something::get3] and a list [Something1, Something2, ...], I want to get the somethings grouped firstly by get1 and then by get3.
This can be achieved this way:
var lst = List.of(smth1, smth2, smth3);
lst.stream()
    .collect(Collectors.groupingBy(Something::get1, Collectors.groupingBy(Something::get3)));
What if I have any arbitrary list of methods that I would like to apply to grouping?
I was thinking of something like this (ofc. this does not work, but you will get the idea):
Assume that List<Function<Something, String>> groupingFunctions is our list of methods we want to apply to grouping.
var collector = groupingFunctions.stream()
.reduce((f1, f2) -> Collectors.groupingBy(f1, Collectors.groupingBy(f2)))
and then
List.of(smth1, smth2, smth3).stream().collect(collector)
But this approach does not work. How to achieve the result I am thinking of?
You can do this:
public static Collector createCollector(Function<Something, String>... groupKeys) {
    Collector collector = Collectors.toList();
    for (int i = groupKeys.length - 1; i >= 0; i--) {
        collector = Collectors.groupingBy(groupKeys[i], collector);
    }
    return collector;
}
This gives you a raw collector, hence your stream result after grouping is also raw.
Collector collector = createCollector(Something::get1, Something::get2);
You can use this collector like this:
Object result = somethingList.stream().collect(collector);
Because you know how many groupingBy calls you passed to the collector, you can cast the result to the appropriate Map type. In this case, two groupingBy collectors are applied:
Map<String, Map<String, List<Something>>> mapResult = (Map<String, Map<String, List<Something>>>) result;
Since you don’t know how many functions are in the list, you can’t declare a compile-time type reflecting the nesting. But even when using a collector type producing some unknown result type, composing it is not solvable in the clean functional way you intend. The closest you can get is
var collector = groupingFunctions.stream()
        .<Collector<Something,?,?>>reduce(
                Collectors.toList(),
                (c, f) -> Collectors.groupingBy(f, c),
                (c1, c2) -> { throw new UnsupportedOperationException("can't handle that"); });
which has two fundamental problems. There is no way to provide a valid merge function for two Collector instances, so while this may work with a sequential operation, it is not a clean solution. Further, the nesting of the result maps will be in the opposite order; the last function of the list will provide the keys to the outermost map.
There might be ways to fix that, but all of them will make the code even more complicated. Compare this with a straight-forward loop:
Collector<Something,?,?> collector = Collectors.toList();
for(var i = groupingFunctions.listIterator(groupingFunctions.size()); i.hasPrevious(); )
collector = Collectors.groupingBy(i.previous(), collector);
You can use the collector like
Object o = lst.stream().collect(collector);
but you need instanceof checks and type casts to process the resulting Maps…
It would be cleaner to create a single, non-nested Map with List keys which reflect the grouping functions:
Map<List<String>,List<Something>> map = lst.stream().collect(Collectors.groupingBy(
o -> groupingFunctions.stream().map(f -> f.apply(o))
.collect(Collectors.toUnmodifiableList())));
It would allow querying entries like map.get(List.of(arguments, matching, grouping, functions)).
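For instance, with two grouping functions and some made-up data, the composite-key map can be built and queried like this (a record with get1/get2 accessors stands in for the Something interface):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ListKeyDemo {
    // Record accessors get1()/get2() mimic the question's interface methods
    record Something(String get1, String get2) {}

    public static void main(String[] args) {
        List<Function<Something, String>> groupingFunctions =
                List.of(Something::get1, Something::get2);
        List<Something> lst = List.of(
                new Something("a", "x"), new Something("a", "x"), new Something("b", "y"));

        // One flat map whose keys are the grouping-function results, in order
        Map<List<String>, List<Something>> map = lst.stream().collect(Collectors.groupingBy(
                o -> groupingFunctions.stream().map(f -> f.apply(o))
                        .collect(Collectors.toUnmodifiableList())));

        System.out.println(map.get(List.of("a", "x")).size()); // 2
    }
}
```

Unlike the nested-map variant, the key's length adapts automatically to however many functions the list contains.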
Are the map() and filter() methods of Optional lazy like those of Stream?
How can I verify whether they are?
There is a fundamental difference between a Stream and an Optional.
A Stream encapsulates an entire processing pipeline, gathering all operations before doing anything. This allows the implementation to pick different processing strategies, depending on what result is actually requested. It also allows inserting modifiers like unordered() or parallel() into the chain: at that point nothing has been processed yet, so we can still alter the behavior of the subsequent processing.
An extreme example is Stream.of(1, 2, 3).map(function).count(), which will not process function at all in Java 9, as the invariant result of 3 can be determined without.
In contrast, an Optional is just a wrapper around a value (if not empty). Each operation is performed immediately, returning either a new Optional encapsulating the new value, or an empty Optional. In Java 8, all methods returning an Optional, i.e. map, flatMap, or filter, just return an empty Optional when applied to an empty Optional, so when chaining them, the empty Optional becomes a kind of dead end.
But Java 9 will introduce Optional<T> or(Supplier<? extends Optional<? extends T>>), which may return a non-empty Optional from the supplier when applied to an empty Optional.
Since an Optional represents a (possibly absent) value rather than a processing pipeline, you can query the same Optional as many times as you want, whether the query returns a new Optional or a final value.
It’s easy to verify. The following code
Optional<String> first = Optional.of("abc");
Optional<String> second = first.map(s -> {
    System.out.println("Running map");
    return s + "def";
});
System.out.println("starting queries");
System.out.println("first: " + (first.isPresent() ? "has value" : "is empty"));
System.out.println("second: " + (second.isPresent() ? "has value" : "is empty"));
second.map("second's value: "::concat).ifPresent(System.out::println);
will print
Running map
starting queries
first: has value
second: has value
second's value: abcdef
demonstrating that the mapping function is evaluated immediately, before any other query, and that we still can query the first optional after we created a second via map and query optionals multiple times.
In fact, it is strongly recommended to check via isPresent() first, before calling get().
There is no equivalent stream code, as re-using Stream instances this way is invalid. But we can show that intermediate operations are not performed before the terminal operation commences:
Stream<String> stream = Stream.of("abc").map(s -> {
    System.out.println("Running map");
    return s + "def";
});
System.out.println("starting query");
Optional<String> result = stream.findAny();
System.out.println("result " + (result.isPresent() ? "has value" : "is empty"));
result.map("result value: "::concat).ifPresent(System.out::println);
will print
starting query
Running map
result has value
result value: abcdef
showing that the mapping function is not evaluated before the terminal operation findAny() starts. Since we can’t query a stream multiple times, findAny() even uses Optional as return value, which allows us to do that with the final result.
There are other semantic differences between the operations of the same name, e.g. Optional.map will return an empty Optional if the mapping function evaluates to null. For a stream, it makes no difference whether the function passed to map returns null or a non-null value (that’s why we can count the elements without knowing whether it does).
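This null-handling difference is easy to demonstrate with a small sketch:

```java
import java.util.Optional;
import java.util.stream.Stream;

public class NullMappingDemo {
    public static void main(String[] args) {
        // Optional.map: a null mapping result yields an empty Optional
        Optional<String> o = Optional.of("abc").map(s -> (String) null);
        System.out.println(o.isPresent()); // false

        // Stream.map: a null mapping result is still an element of the stream
        long n = Stream.of("abc").map(s -> (String) null).count();
        System.out.println(n); // 1
    }
}
```

So a mapper returning null terminates an Optional chain, while a stream simply carries the null along to the next stage.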
String r = Optional.of("abc")
        .map(s -> {
            System.out.println("Running map");
            return s + "def";
        })
        .filter(s -> {
            System.out.println("First Filter");
            return s.equals("abcdef");
        })
        .map(s -> {
            System.out.println("mapping");
            return s + "jkl";
        })
        .orElse("done");
System.out.println(r);
Running this will produce:
Running map
First Filter
mapping
abcdefjkl
On the other hand, running this:
String r = Optional.of("mnt") // changed input
        .map(s -> {
            System.out.println("Running map");
            return s + "def";
        })
        .filter(s -> {
            System.out.println("First Filter");
            return s.equals("abcdef");
        })
        .map(s -> {
            System.out.println("mapping");
            return s + "jkl";
        })
        .orElse("done");
will produce:
Running map
First Filter
done
I always thought that, since the second map is only executed based on the result of the previous filter, Optional would be considered lazy. It turns out this is not true:
Optional.of("s").map(String::toUpperCase)
Stream.of("test").map(String::toUpperCase)
The map from Optional gets executed immediately, while the one from Stream does not, because no terminal operation has been invoked.
EDIT
Go and up-vote the other answer here; this answer was edited because of that one.
I have a loop which updates a String object:
String result = "";
for (SomeObject obj : someObjectList) {
    result = someMethod(obj, result);
}
The exact implementation of someMethod is irrelevant:
private String someMethod(SomeObject obj, String result) {
    result = result.concat(obj.toString());
    return result;
}
And I want to use a Stream instead of the loop. How can I implement this with a Stream?
@SuppressWarnings("OptionalGetWithoutIsPresent")
String result = Stream.concat(Stream.of(""), someObjectList.stream())
        .reduce(this::someMethod)
        .get();
Your someMethod should be associative as specified in the documentation; however, this only matters for parallel streams, and your code is explicitly sequential.
Because you always append to the result, you can consider the result the first element of the stream and then use the reduce method, which repeatedly merges the first two elements: the current result and the next element.
result has to be the first parameter of your someMethod.
Because all elements in the stream have to be of the same type, while you have a String result and SomeObject elements, you need to change the signature of someMethod to accept two Objects (and do the casts inside the method): private String someMethod(Object result, Object obj). This is the ugliest part of this solution.
You can inline the initial value of the result; there is no need to define result upfront.
You might want to change this::someMethod depending on where the method is declared.
Finally, you don't need to worry about handling the Optional result, because the stream always has at least one element, so it's safe to just call get().
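Putting those points together into a runnable sketch; SomeObject here is a hypothetical stand-in that overrides toString():

```java
import java.util.List;
import java.util.stream.Stream;

public class ReduceDemo {
    // Hypothetical stand-in for the question's SomeObject
    record SomeObject(String name) {
        @Override public String toString() { return name; }
    }

    // Both parameters are Object, as explained above; casts happen inside
    static String someMethod(Object result, Object obj) {
        return ((String) result).concat(obj.toString());
    }

    public static void main(String[] args) {
        List<SomeObject> someObjectList =
                List.of(new SomeObject("a"), new SomeObject("b"), new SomeObject("c"));
        // Seed the stream with the initial "" result, then fold left to right
        String result = (String) Stream.concat(Stream.<Object>of(""), someObjectList.stream())
                .reduce(ReduceDemo::someMethod)
                .get();
        System.out.println(result); // abc
    }
}
```

The cast on get() is the price of mixing String and SomeObject in one Stream<Object>, which is exactly the ugliness the answer warns about.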
final StringBuilder resultBuilder = new StringBuilder();
someObjectList.stream().map(SomeObject::toString).forEach(resultBuilder::append);
final String result = resultBuilder.toString();
To learn more about streams, you can check this page: http://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/ - I think it's very helpful.
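Note that if plain concatenation is all you need, the dedicated joining collector expresses the same thing without a manual StringBuilder or reduce. A sketch, assuming toString() produces the text you want appended:

```java
import java.util.List;
import java.util.stream.Collectors;

public class JoiningDemo {
    public static void main(String[] args) {
        List<Object> someObjectList = List.of("a", 1, "b");
        String result = someObjectList.stream()
                .map(Object::toString)
                .collect(Collectors.joining()); // also safe for an empty list: yields ""
        System.out.println(result); // a1b
    }
}
```

Collectors.joining uses a StringBuilder internally and, unlike reduce with get(), needs no Optional handling for the empty case.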
Although the functional equivalent of what you're trying to achieve here is possible with streams, it's worth reminding you that functional and iterative ways of thinking are not necessarily compatible.
Generally you think of each element on its own, without visibility of the other elements, unless you use a special function like reduce.
Here's something that does what you've asked for:
final List<Object> objectList = Arrays.asList("a", "b", "c", "d");
String concatString = objectList.stream()
        .map(e -> e.toString())
        .reduce((result, element) -> result.concat(element))
        .get();
map transforms each element of the stream, with the toString function called separately on every element. reduce is more complex. It can be described as an accumulation: it executes a function between the accumulated result and the current element. In this case, it takes the first element and concatenates it to the second. It then takes the first/second concatenation and applies the same function with the third element, and so on.
Instead of writing lambdas, you can also pass in method references directly, to tighten up your code a bit:
String result = objectList.stream()
.map(Object::toString)
.reduce(String::concat)
.get();
Suppose I have multiple Java 8 streams, each of which can potentially be converted into a Set<AppStory>. Now I want, with the best performance, to aggregate all the streams into one stream that is DISTINCT by ID and sorted by the property lastUpdate.
There are several ways to do this, but I want the fastest one. For example:
Set<AppStory> appStr1 = StreamSupport.stream(splititerato1, true)
        .map(storyId1 -> vertexToStory1(storyId1)).collect(toSet());
Set<AppStory> appStr2 = StreamSupport.stream(splititerato2, true)
        .map(storyId2 -> vertexToStory2(storyId2)).collect(toSet());
Set<AppStory> appStr3 = StreamSupport.stream(splititerato3, true)
        .map(storyId3 -> vertexToStory3(storyId3)).collect(toSet());
Set<AppStory> set = new HashSet<>();
set.addAll(appStr1);
set.addAll(appStr2);
set.addAll(appStr3);
...and then sort by lastUpdate.
//POJO:
public class AppStory implements Comparable<AppStory> {
    private String storyId;
    private String ........... many other attributes......
    public String getStoryId() {
        return storyId;
    }
    @Override
    public int compareTo(AppStory o) {
        return this.getStoryId().compareTo(o.getStoryId());
    }
}
...but that is the old way.
How can I create ONE stream, DISTINCT by ID and sorted by lastUpdate, with the BEST PERFORMANCE?
Something like:
Set<AppStory> finalSet = distinctStream.sorted((v1, v2) -> Integer.compare('not my issue')).collect(toSet());
Any ideas?
BR
Vitaly
I think the parallel overhead is much greater than the actual work, as you stated in the comments. So let your streams do the job sequentially.
FYI: you should prefer Stream::concat, because slicing operations like Stream::limit can be bypassed by Stream::flatMap.
Stream::sorted collects every element of the stream into a list, sorts the list, and then pushes the elements down the pipeline in the desired order, where they are collected again. This can be avoided by collecting the elements into a List and doing the sorting afterwards. A List is a far better choice than a Set here because the order matters (I know there is a LinkedHashSet, but you can't sort it).
This is, in my opinion, the cleanest and maybe the fastest solution, though we cannot prove that without measuring.
Stream<AppStory> appStr1 = StreamSupport.stream(splititerato1, false)
        .map(this::vertexToStory1);
Stream<AppStory> appStr2 = StreamSupport.stream(splititerato2, false)
        .map(this::vertexToStory2);
Stream<AppStory> appStr3 = StreamSupport.stream(splititerato3, false)
        .map(this::vertexToStory3);
List<AppStory> stories = Stream.concat(Stream.concat(appStr1, appStr2), appStr3)
        .distinct()
        .collect(Collectors.toList());
// assuming AppStory::getLastUpdateTime is of type `long`
stories.sort(Comparator.comparingLong(AppStory::getLastUpdateTime));
I can't guarantee that this would be faster than what you have (I guess so, but you'll have to measure to be sure), but you can simply do this, assuming you have 3 streams:
List<AppStory> distinctSortedAppStories =
Stream.of(stream1, stream2, stream3)
.flatMap(Function.identity())
.map(this::vertexToStory)
.distinct()
.sorted(Comparator.comparing(AppStory::getLastUpdate))
.collect(Collectors.toList());