Combine multiple `Collectors::groupingBy` functions with Java Streams

I have a problem with correctly combining multiple Collectors::groupingBy functions and then applying them all at once to a given input.
Let's say I have some class implementing following interface:
interface Something {
String get1();
String get2();
String get3();
String get4();
}
And now I can have some list of combinations of the methods from this interface, i.e. these lists can be:
[Something::get1, Something::get3], [Something::get2, Something::get1, Something::get3].
Now, having such a list of methods and having a list of somethings, I would like to group those somethings by getters.
What I mean is that, for example, for the list [Something::get1, Something::get3] and a list [Something1, Something2, ...] I want to get the list of somethings grouped first by get1 and then by get3.
This can be achieved this way:
var lst = List.of(smth1, smth2, smth3);
lst.stream()
.collect(Collectors.groupingBy(Something::get1, Collectors.groupingBy(Something::get3)))
What if I have any arbitrary list of methods that I would like to apply to grouping?
I was thinking of something like this (of course, this does not work, but you will get the idea):
Assume that List<Function<Something, String>> groupingFunctions is our list of methods we want to apply to grouping.
var collector = groupingFunctions.stream()
.reduce((f1, f2) -> Collectors.groupingBy(f1, Collectors.groupingBy(f2)))
and then
List.of(smth1, smth2, smth3).stream().collect(collector)
But this approach does not work. How to achieve the result I am thinking of?

You can do this:
@SafeVarargs
public static <A> Collector createCollector(Function<A, String>... groupKeys) {
Collector collector = Collectors.toList();
for (int i = groupKeys.length - 1; i >= 0; i--) {
collector = Collectors.groupingBy(groupKeys[i], collector);
}
return collector;
}
This gives you a raw collector, so the result of the stream after grouping is raw as well.
Collector collector = createCollector(Something::get1, Something::get2);
You can use this collector like this:
Object result = somethingList.stream().collect(collector);
Because you know how many groupingBy functions you passed to the collector, you can cast the result to the appropriate Map type. In this case two groupingBy functions are applied:
Map<String, Map<String, List<Something>>> mapResult = (Map<String, Map<String, List<Something>>>) result;

Since you don’t know how many functions are in the list, you can’t declare a compile-time type reflecting the nesting. But even when using a collector type producing some unknown result type, composing it is not solvable in the clean functional way you intend. The closest you can get is
var collector = groupingFunctions.stream()
.<Collector<Something,?,?>>reduce(
Collectors.toList(),
(c,f) -> Collectors.groupingBy(f, c),
(c1,c2) -> { throw new UnsupportedOperationException("can't handle that"); });
which has two fundamental problems. There is no way to provide a valid merge function for two Collector instances, so while this may work with a sequential operation, it is not a clean solution. Further, the nesting of the result maps will be in the opposite order; the last function of the list will provide the keys to the outermost map.
There might be ways to fix that, but all of them will make the code even more complicated. Compare this with a straight-forward loop:
Collector<Something,?,?> collector = Collectors.toList();
for(var i = groupingFunctions.listIterator(groupingFunctions.size()); i.hasPrevious(); )
collector = Collectors.groupingBy(i.previous(), collector);
You can use the collector like
Object o = lst.stream().collect(collector);
but you need instanceof checks and type casts to process the Maps.
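As a rough sketch of what that processing can look like, the following builds the nested collector with the same reverse loop and then walks the untyped result recursively. The `Item` record, the `describe`-style walk and the sample data are assumptions made for illustration only (records need Java 16 or newer); they are not part of the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class NestedGrouping {
    // Hypothetical stand-in for the Something interface from the question.
    record Item(String get1, String get2, String get3) {}

    // Wraps toList() in one groupingBy per function, starting with the last
    // function so that the first function ends up as the outermost map.
    static <T> Collector<T, ?, ?> nested(List<Function<T, String>> functions) {
        Collector<T, ?, ?> collector = Collectors.toList();
        for (var it = functions.listIterator(functions.size()); it.hasPrevious(); ) {
            collector = Collectors.groupingBy(it.previous(), collector);
        }
        return collector;
    }

    // Walks the untyped result: a Map means one more grouping level,
    // anything else is the List of elements at the leaf.
    static void walk(Object node, String path, List<String> out) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String next = path.isEmpty() ? String.valueOf(e.getKey()) : path + "/" + e.getKey();
                walk(e.getValue(), next, out);
            }
        } else {
            out.add(path + " -> " + ((List<?>) node).size());
        }
    }

    // Groups four sample items by get1, then get3, and returns sorted "path -> count" lines.
    static List<String> demo() {
        List<Function<Item, String>> functions = List.of(Item::get1, Item::get3);
        List<Item> items = List.of(
                new Item("a", "x", "1"), new Item("a", "y", "1"),
                new Item("a", "x", "2"), new Item("b", "x", "1"));
        Object result = items.stream().collect(nested(functions));
        List<String> lines = new ArrayList<>();
        walk(result, "", lines);
        Collections.sort(lines);
        return lines;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The depth of the nesting is only discovered at runtime here, which is exactly the cost of not having a compile-time type for the result.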
It would be cleaner to create a single, non-nested Map with List keys which reflect the grouping functions:
Map<List<String>,List<Something>> map = lst.stream().collect(Collectors.groupingBy(
o -> groupingFunctions.stream().map(f -> f.apply(o))
.collect(Collectors.toUnmodifiableList())));
It would allow querying entries like map.get(List.of(arguments, matching, grouping, functions))
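A self-contained sketch of this flat variant might look as follows. The `Item` record and the sample values are hypothetical stand-ins for `Something` and its data (records need Java 16 or newer); the grouping expression itself is the one shown above.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class FlatKeyGrouping {
    // Hypothetical stand-in for the Something interface from the question.
    record Item(String get1, String get2, String get3) {}

    // Groups by the list of all function results; the key list has one entry per function.
    static <T> Map<List<String>, List<T>> groupByAll(List<T> items,
                                                     List<Function<T, String>> functions) {
        return items.stream().collect(Collectors.groupingBy(
                item -> functions.stream()
                        .map(f -> f.apply(item))
                        .collect(Collectors.toUnmodifiableList())));
    }

    // Groups four sample items by get1 and get3.
    static Map<List<String>, List<Item>> demo() {
        List<Function<Item, String>> functions = List.of(Item::get1, Item::get3);
        List<Item> items = List.of(
                new Item("a", "x", "1"), new Item("a", "y", "1"),
                new Item("a", "x", "2"), new Item("b", "x", "1"));
        return groupByAll(items, functions);
    }

    public static void main(String[] args) {
        // Look up one group by the values of get1 and get3, in function order.
        System.out.println(demo().get(List.of("a", "1")));
    }
}
```

Note that `Collectors.toUnmodifiableList()` rejects null elements, so this sketch assumes the grouping functions never return null.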


How to count unique values in a map that are complex objects

I have the following sample structure:
class MyObject {
private String type;
private String level;
}
Map<String,List<MyObject>> map = new HashMap<>();
MyObject myObject1 = new MyObject();
myObject1.setType("x");
myObject1.setLevel("5");
MyObject myObject2 = new MyObject();
myObject2.setType("y");
myObject2.setLevel("5");
List<MyObject> list1 = new ArrayList<>();
list1.add(myObject1);
list1.add(myObject2);
map.put("1",list1);
MyObject myObject3 = new MyObject();
myObject3.setType("x");
myObject3.setLevel("4");
List<MyObject> list2 = new ArrayList<>();
list2.add(myObject3);
map.put("2",list2);
MyObject myObject4 = new MyObject();
myObject4.setType("x");
myObject4.setLevel("5");
MyObject myObject5 = new MyObject();
myObject5.setType("y");
myObject5.setLevel("5");
List<MyObject> list3 = new ArrayList<>();
list3.add(myObject4);
list3.add(myObject5);
map.put("3",list3);
...
Based on this map, I need to create an intermediate object or some structure where I will store information about the unique values of the map. In the example above, key 1 and key 3 have the same value, so I need to store the information that the combination x = 5, y = 5 occurred twice in the map. The combination x = 4 appeared once in the map. There can be many combinations.
Any suggestions on how to do it the easiest way?
Since this looks like a homework question asking how to generally do the task I will not include code.
Think through what you have to do, write methods you'll need, implement them when you think you have all the pieces you need. Start with a stub for the method that does what you want.
The things you can have duplicates of in the map (why they are in a map, I have no idea) are lists. Write a method that compares two lists and returns whether they are the same.
To write that method you need a way to compare MyObject instances. The best way would be to override the equals() method (and hashCode() along with it).
Next, the question is whether the order in the lists matters. If yes, then the List equals method will work for you (read the javadoc to see exactly what it does). If not, you'll need to write custom code to handle that, or sort the lists before comparison (which would involve writing a comparator for MyObject), or use a library that has that functionality (there should be something in Apache Commons).
Now that we have all that, we come back to the main method, use the methods we wrote appropriately, and all that is left is to do something with the results. Generally anything will do; a map with the list as key and the number of occurrences as value will be simplest, unless you have more constraints or operations to perform on the results.
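For readers who want to see the shape of the last step, here is a minimal sketch under two assumptions: order within a list matters (so `List.equals` is enough), and `MyObject` is modelled as a record, which supplies `equals()` and `hashCode()` automatically (Java 16 or newer). The class and method names are made up for this illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class CombinationCounter {
    // Hypothetical value type; the record generates equals() and hashCode() over both fields.
    record MyObject(String type, String level) {}

    // Uses each list as a key and counts how many map values are equal to it.
    // The lists must not be modified while they serve as keys.
    static Map<List<MyObject>, Long> countCombinations(Map<String, List<MyObject>> map) {
        return map.values().stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Rebuilds the three entries from the question and counts the combinations.
    static Map<List<MyObject>, Long> demo() {
        Map<String, List<MyObject>> map = new HashMap<>();
        map.put("1", List.of(new MyObject("x", "5"), new MyObject("y", "5")));
        map.put("2", List.of(new MyObject("x", "4")));
        map.put("3", List.of(new MyObject("x", "5"), new MyObject("y", "5")));
        return countCombinations(map);
    }

    public static void main(String[] args) {
        demo().forEach((combination, count) -> System.out.println(combination + " occurs " + count + "x"));
    }
}
```

If order should not matter, sorting each list (or converting it to a Set) before using it as a key would be one way to adapt this.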

Optimising multiple streams to single loop

I am trying to find the best way to optimise the converters below. The flow is that I call 'convertAndGroupForUpdate' first, which triggers the conversions and the relevant mappings.
Any help to optimise this code would be massively appreciated.
public List<GroupedOrderActionUpdateEntity> convertAndGroupForUpdate(List<SimpleRatifiableAction> actions) {
List<GroupedOrderActionUpdateEntity> groupedActions = new ArrayList<>();
Map<String, List<SimpleRatifiableAction>> groupSimple = actions.stream()
.collect(Collectors.groupingBy(x -> x.getOrderNumber() + x.getActionType()));
groupSimple.entrySet().stream()
.map(x -> convertToUpdateGroup(x.getValue()))
.forEachOrdered(groupedActions::add);
return groupedActions;
}
public GroupedOrderActionUpdateEntity convertToUpdateGroup(List<SimpleRatifiableAction> actions) {
List<OrderActionUpdateEntity> actionList = actions.stream().map(x -> convertToUpdateEntity(x)).collect(Collectors.toList());
return new GroupedOrderActionUpdateEntity(
actions.get(0).getOrderNumber(),
OrderActionType.valueOf(actions.get(0).getActionType()),
actions.get(0).getSource(),
12345,
actions.stream().map(SimpleRatifiableAction::getNote)
.collect(Collectors.joining(", ", "Group Order Note: ", ".")),
actionList);
}
public OrderActionUpdateEntity convertToUpdateEntity(SimpleRatifiableAction action) {
return new OrderActionUpdateEntity(action.getId(), OrderActionState.valueOf(action.getState()));
}
You can’t elide a grouping operation, but you don’t need to store the intermediate result in a local variable.
Further, you should not add to a list manually, when you can collect to a List. Just do it like you did in the other method.
Also, creating a grouping key via string concatenation is tempting, but very dangerous: depending on the contents of the properties, the resulting strings may clash. String concatenation is also rather expensive. Just create a list of the property values; as long as you don’t modify it, it provides the right equality semantics and hash code implementation.
If you want to process the values of a map only, don’t call entrySet(), to map each entry via getValue(). Just use values() in the first place.
public List<GroupedOrderActionUpdateEntity> convertAndGroupForUpdate(
List<SimpleRatifiableAction> actions) {
return actions.stream()
.collect(Collectors.groupingBy( // use List.of(…, …) in Java 9 or newer
x -> Arrays.asList(x.getOrderNumber(), x.getActionType())))
.values().stream()
.map(x -> convertToUpdateGroup(x))
.collect(Collectors.toList());
}
Since convertToUpdateGroup is processing the list of actions of each group multiple times, there is not much that can be simplified and I wouldn’t inline it either. If there was only one operation, e.g. joining them to a string, you could do that right in the groupingBy operation, but there is no simple way to collect to multiple results.
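To make the key clash concrete, here is a small sketch. The `Action` record and its values are invented for the demonstration (records need Java 16 or newer); only the two grouping styles come from the discussion above.

```java
import java.util.List;
import java.util.stream.Collectors;

class CompositeKeyDemo {
    // Hypothetical, trimmed-down stand-in for SimpleRatifiableAction.
    record Action(String orderNumber, String actionType) {}

    // Two different (orderNumber, actionType) pairs that concatenate to the same string "123".
    static final List<Action> SAMPLE = List.of(new Action("1", "23"), new Action("12", "3"));

    // Number of groups when the key is the concatenated string.
    static int groupsByConcatenation(List<Action> actions) {
        return actions.stream()
                .collect(Collectors.groupingBy(a -> a.orderNumber() + a.actionType()))
                .size();
    }

    // Number of groups when the key is a list of the two property values.
    static int groupsByListKey(List<Action> actions) {
        return actions.stream()
                .collect(Collectors.groupingBy(a -> List.of(a.orderNumber(), a.actionType())))
                .size();
    }

    public static void main(String[] args) {
        System.out.println(groupsByConcatenation(SAMPLE)); // prints 1: both actions share the key "123"
        System.out.println(groupsByListKey(SAMPLE));       // prints 2: the list keys differ
    }
}
```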

How to use Stream instead for each loop

I have a loop which updates a String object:
String result = "";
for (SomeObject obj: someObjectList) {
result = someMethod(obj, result);
}
An implementation of someMethod is irrelevant:
private String someMethod(SomeObject obj, String result) {
result = result.concat(obj.toString());
return result;
}
And I want to use a Stream instead of a loop. How can I implement it with a Stream?
@SuppressWarnings("OptionalGetWithoutIsPresent")
String result = Stream.concat(Stream.of(""), someObjectList.stream())
.reduce(this::someMethod)
.get();
Your someMethod should be associative, as specified in the documentation; however, this is only important for parallel streams, while your code is explicitly sequential.
As you always add to the result, you can consider it a first element of the stream and then use reduce method which will always merge first two elements - current result and next element
result has to be the first parameter of your someMethod
Because all elements in the stream have to be of the same type, while you have String result and SomeObject elements, you need to change the signature of someMethod to accept two Objects (and do the casts inside the method): private String someMethod(Object result, Object obj). This is the most ugly part of this solution.
You can inline the initial value of the result - no need to define result upfront
You might want to change this::someMethod depending on where this method is declared
Finally, you don't need to worry about handling Optional result, because the stream always has at least one element so it's safe to just call get()
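As an alternative sketch that avoids both the Object casts and the Optional, the three-argument `Stream.reduce(identity, accumulator, combiner)` overload lets the result type differ from the element type. The class name and the generic `concatAll` helper below are assumptions for illustration; `someMethod` mirrors the one from the question.

```java
import java.util.List;

class ReduceWithIdentity {
    // Mirrors the question's someMethod: appends the object's string form to the running result.
    static String someMethod(Object obj, String result) {
        return result.concat(obj.toString());
    }

    static <T> String concatAll(List<T> objects) {
        return objects.stream().reduce(
                "",                                        // identity: the initial value of result
                (result, obj) -> someMethod(obj, result),  // accumulator: folds one element into the result
                String::concat);                           // combiner: merges partial results (parallel streams only)
    }

    public static void main(String[] args) {
        System.out.println(concatAll(List.of("a", "b", "c"))); // prints abc
    }
}
```

Because the identity value is supplied explicitly, an empty list simply yields the empty string.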
final StringBuilder resultBuilder = new StringBuilder();
someObjectList.stream().map(SomeObject::toString).forEach(resultBuilder::append);
final String result = resultBuilder.toString();
To know more about Streams, you can check this page: http://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/, I think it's very helpful.
Although the functional equivalent of what you're trying to achieve here is possible with streams, it's worth reminding you that functional and iterative ways of thinking are not necessarily compatible.
Generally you think of each element on its own, and you don't have visibility over other elements, unless you're using a special function like reduce.
Here's something that does what you've asked for:
final List<Object> objectList = Arrays.asList("a", "b", "c", "d");
String concatString = objectList.stream()
.map(e -> e.toString())
.reduce((result, element) -> result.concat(element))
.get();
Map transforms the stream into a new stream in which the toString function has been applied separately to every element. Reduce is more complex. It can be described as an accumulation. It executes a function between the result so far and the current element. In this case, it takes the first element and concatenates the second to it. It then takes that concatenation and applies the same function to the third. And so on.
Instead of dealing with lambdas, you can also pass in methods directly, to tighten up your code a bit:
String result = objectList.stream()
.map(Object::toString)
.reduce(String::concat)
.get();

Convert an RDD into a key value pair RDD, with the values being in a List

I have an RDD with tuples in the following form:
[(1,"b1","c1","d1","e1"), (2,"b2","c2","d2","e2"), ...
What I want is to transform the above into a key-value pair RDD, where the first field will be the key, and the second field a list of strings (value). i.e. I want to turn it to the form:
[(1,["b1","c1","d1","e1"]), (2,["b2","c2","d2","e2"]), ...
After this, is it then possible to access any field that I want?
For example, can I access the tuple (1,["b1","c1","d1","e1"]), then extract only the field d1?
If you have an RDD with Tuples, however the Tuples are represented, you can use mapToPair to transform your RDD of Tuple into a PairRDD with Key and Value as preferred.
In Java 8 this could be
JavaPairRDD<Integer,List<String>> r =
rddOfTuples.mapToPair((t)->new Tuple2<>(
extractKey(t),
extractTuples(t)
));
Note that mapToPair is a narrow transformation and does not itself introduce a shuffle; the resulting RDD has no partitioner, though, so later key-based operations such as groupByKey will shuffle.
To state the obvious, extractKey and extractTuples are methods you need to implement to extract the relevant parts of the original tuple.
With my limited knowledge of Scala Tuples, and assuming the input is something like scala.Tuple5<Integer,String,String,String,String>, this could be:
JavaPairRDD<Integer,List<String>> r =
rddOfTuples.mapToPair((t)->new Tuple2<>(
t._1(),
Arrays.asList(t._2(),t._3(),t._4(),t._5())
));
If however, you do not know beforehand the arity (number of elements) of your Tuple, then in scala terms, it is a Product. To access your elements dynamically, you will need to use the Product interface, with a choice of:
int productArity()
Object productElement(int n)
Iterator<Object> productIterator()
Then it becomes a regular Java exercise:
JavaPairRDD<Integer,List<String>> r =
rddOfTuples.mapToPair((t)->{
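// the capacity argument only pre-sizes the list; elements still have to be added
List<String> l = new ArrayList<>(t.productArity()-1);
for (int i = 1; i < t.productArity(); i++) {
l.add(String.valueOf(t.productElement(i)));
}
// element 0 is the key; a plain Product has no _1() accessor
return new Tuple2<>((Integer) t.productElement(0), l);
});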
I hope I have it all right ... this code above is untested/uncompiled ... So if you can get it to work with corrections, then feel free to apply the corrections in this answer ...
You could try using a map function, eg in Scala:
rdd.map { case (k,v1,v2,v3,v4) => (k,(v1,v2,v3,v4)) }
Or rdd.groupBy could also be used but this could be inefficient on large data sets.

What is the best way to aggregate Streams into one DISTINCT with Java 8

Suppose I have multiple Java 8 streams, each of which can potentially be converted into a Set<AppStory>. Now I want to aggregate all streams, with the best performance, into one stream that is DISTINCT by ID and sorted by a property ("lastUpdate").
There are several ways to do that, but I want the fastest one. For example:
Set<AppStory> appStr1 = StreamSupport.stream(splititerato1, true)
.map(storyId1 -> vertexToStory1(storyId1)).collect(toSet());
Set<AppStory> appStr2 = StreamSupport.stream(splititerato2, true)
.map(storyId2 -> vertexToStory2(storyId2)).collect(toSet());
Set<AppStory> appStr3 = StreamSupport.stream(splititerato3, true)
.map(storyId3 -> vertexToStory3(storyId3)).collect(toSet());
Set<AppStory> set = new HashSet<>();
set.addAll(appStr1);
set.addAll(appStr2);
set.addAll(appStr3);
// ... and then sort by "lastUpdate"
//POJO Object:
public class AppStory implements Comparable<AppStory> {
private String storyId;
private String ........... many other attributes......
public String getStoryId() {
return storyId;
}
@Override
public int compareTo(AppStory o) {
return this.getStoryId().compareTo(o.getStoryId());
}
}
... but it is the old way.
How can I create ONE stream that is DISTINCT by ID and sorted, with the BEST PERFORMANCE?
Something like:
Set<AppStory> finalSet = distinctStream.sort((v1, v2) -> Integer.compare('not my issue').collect(toSet())
Any Ideas ?
BR
Vitaly
I think the parallel overhead is much greater than the actual work, as you stated in the comments. So let your Streams do the job sequentially.
FYI: You should prefer using Stream::concat because slicing operations like Stream::limit can be bypassed by Stream::flatMap.
Stream::sorted collects every element of the Stream into a List, sorts the List, and then pushes the elements down the pipeline in the desired order. Then the elements are collected again. This can be avoided by collecting the elements into a List and sorting afterwards. Using a List is a far better choice than using a Set because the order matters (I know there is a LinkedHashSet, but you can't sort it).
In my opinion this is the cleanest and maybe the fastest solution, although we cannot prove that.
Stream<AppStory> appStr1 =StreamSupport.stream(splititerato1, false)
.map(this::vertexToStory1);
Stream<AppStory> appStr2 =StreamSupport.stream(splititerato2, false)
.map(this::vertexToStory2);
Stream<AppStory> appStr3 =StreamSupport.stream(splititerato3, false)
.map(this::vertexToStory3);
List<AppStory> stories = Stream.concat(Stream.concat(appStr1, appStr2), appStr3)
.distinct().collect(Collectors.toList());
// assuming AppStory::getLastUpdateTime is of type `long`
stories.sort(Comparator.comparingLong(AppStory::getLastUpdateTime));
I can't guarantee that this would be faster than what you have (I guess so, but you'll have to measure to be sure), but you can simply do this, assuming you have 3 streams:
List<AppStory> distinctSortedAppStories =
Stream.of(stream1, stream2, stream3)
.flatMap(Function.identity())
.map(this::vertexToStory)
.distinct()
.sorted(Comparator.comparing(AppStory::getLastUpdate))
.collect(Collectors.toList());
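Both answers rely on `distinct()`, which uses `equals()`. If `AppStory` does not define equality by ID, one possible sketch is to deduplicate through a map keyed by the ID and sort the values afterwards. The trimmed-down `AppStory` record, the `long` type of `lastUpdate` and the keep-the-first-occurrence policy are all assumptions made for this illustration (records need Java 16 or newer).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DistinctById {
    // Hypothetical, trimmed-down AppStory; lastUpdate as a long is an assumption.
    record AppStory(String storyId, long lastUpdate) {}

    @SafeVarargs
    static List<AppStory> distinctSorted(Stream<AppStory>... streams) {
        // Deduplicate by ID: when two stories share an ID, keep the one seen first.
        Map<String, AppStory> byId = Stream.of(streams)
                .flatMap(Function.identity())
                .collect(Collectors.toMap(AppStory::storyId, Function.identity(),
                        (first, second) -> first));
        // Sort once, after collecting, instead of sorting inside the pipeline.
        List<AppStory> result = new ArrayList<>(byId.values());
        result.sort(Comparator.comparingLong(AppStory::lastUpdate));
        return result;
    }

    // Merges two sample streams that share the ID "a" and returns the IDs in sorted order.
    static List<String> demo() {
        Stream<AppStory> s1 = Stream.of(new AppStory("a", 30), new AppStory("b", 10));
        Stream<AppStory> s2 = Stream.of(new AppStory("a", 99), new AppStory("c", 20));
        return distinctSorted(s1, s2).stream()
                .map(AppStory::storyId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [b, c, a]
    }
}
```

Whether this is faster than the variants above is not something the sketch can show; as the answers say, that has to be measured.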
