Cartesian product using Java Streams

I have a cartesian product function in JavaScript:
function cartesianProduct(arr) {
    return arr.reduce(function(a, b) {
        return a.map(function(x) {
            return b.map(function(y) {
                return x.concat(y);
            });
        }).reduce(function(a, b) { return a.concat(b); }, []);
    }, [[]]);
}
So that if I have a 3D array:
var data = [[['D']], [['E'],['L','M','N']]];
The result of cartesianProduct(data) would be the 2D array:
[['D','E'], ['D','L','M','N']]
What I'm trying to do is write this cartesian product function in Java using Streams.
So far I have the following in Java:
public Collection<Collection<String>> cartesianProduct(Collection<Collection<Collection<String>>> arr) {
    return arr.stream().reduce(new ArrayList<Collection<String>>(), (a, b) -> {
        return a.stream().map(x -> {
            return b.stream().map(y -> {
                return Stream.concat(x.stream(), y.stream());
            });
        }).reduce(new ArrayList<String>(), (c, d) -> {
            return Stream.concat(c, d);
        });
    });
}
I have a type checking error that states:
ArrayList<String> is not compatible with Stream<Stream<String>>
My guesses as to what is wrong:
I need to use a collector somewhere (maybe after the Stream.concat)
The data type for the identity is wrong

This is possible with a bit of functional programming magic. Here's a method which accepts a Collection<Collection<Collection<T>>> and produces a Stream<Collection<T>>:
static <T> Stream<Collection<T>> cartesianProduct(Collection<Collection<Collection<T>>> arr) {
    return arr.stream()
            .<Supplier<Stream<Collection<T>>>>map(c -> c::stream)
            .reduce((s1, s2) -> () -> s1.get().flatMap(
                    a -> s2.get().map(b -> Stream.concat(a.stream(), b.stream())
                            .collect(Collectors.toList()))))
            .orElseGet(() -> () -> Stream.<Collection<T>>of(Collections.emptyList()))
            .get();
}
Usage example:
cartesianProduct(
        Arrays.asList(Arrays.asList(Arrays.asList("D")),
                Arrays.asList(Arrays.asList("E"), Arrays.asList("L", "M", "N"))))
        .forEach(System.out::println);
Output:
[D, E]
[D, L, M, N]
Of course, instead of .forEach() you can collect the results into a List if you want to return Collection<Collection<T>>, but returning a Stream seems more flexible to me.
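For instance, a caller that wants a materialized result can simply collect the stream from the usage example above:

Collection<Collection<String>> collected = cartesianProduct(
        Arrays.asList(Arrays.asList(Arrays.asList("D")),
                Arrays.asList(Arrays.asList("E"), Arrays.asList("L", "M", "N"))))
        .collect(Collectors.toList());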
A bit of explanation:
Here we create a stream of stream suppliers via map(c -> c::stream). Each supplier in this stream can produce, on demand, a stream of the corresponding collection's elements. We do this because streams are one-off (otherwise having a stream of streams would be enough). After that we reduce this stream of suppliers, creating for each pair a new supplier which flatMaps the two streams and maps their elements to the concatenated lists. The orElseGet part is necessary to handle empty input. The final .get() just calls the resulting stream supplier to obtain the resulting stream.
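To see why suppliers are needed, here is a quick illustration (not part of the solution) of streams being one-off:

Stream<String> s = Stream.of("D", "E");
s.forEach(System.out::println); // first traversal works
s.forEach(System.out::println); // throws IllegalStateException: stream has already been operated upon or closed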

Related

how to keep track of multiple closure variables in Java lambdas (or a JVM language)?

Using Java, I am trying to find a clean way to accumulate multiple different values in a series of lambdas. For a concrete example, see the following piece of JS (TypeScript) code:
// some filtering helpers. not really interested in that because there are similar things in Java
const mapFilter = <T,U>(arr: T[], transform: (item: T, idx: number, arr: T[]) => U) => arr.map(transform).filter(Boolean)
const mapFilterFlat = <T,U>(arr: T[], transform: (item: T, idx: number, arr: T[]) => U[]) => mapFilter(arr, transform).flat()

const findDeep = () =>
    mapFilterFlat(someObj.level1Items, A =>
        mapFilterFlat(A.level2Items, B =>
            mapFilter(B.level3Items, C =>
                // I am able to access closure variables so I can push them all in my result, instead of just the last level
                C == something ? ({A, B, C}) : null
            )))

let found: {A: any, B: any, C: any}[] = findDeep();
I am not sure if there are existing Java Stream APIs for accumulating such a result. Maybe it's not really possible and I should look into another JVM language?
I eventually did this, but it's not really concise (although I know Java is not really concise either):
public class Finder implements BiFunction<SomeObj, Predicate<SomeObj>, List<State>> {

    static class State {
        Integer A;
        String B;
        List C;

        static State from(Map<String, Object> inputs) {
            var res = new State();
            res.A = (Integer) inputs.get("A");
            res.B = (String) inputs.get("B");
            res.C = (List) inputs.get("C");
            return res;
        }
    }

    Map<String, Object> fields;

    <T> T store(String key, T value) {
        return (T) fields.put(key, value);
    }

    public List<State> apply(SomeObj someObj, Predicate<C> predicate) {
        fields = new HashMap<>();
        return config.level1Items
                .stream()
                .flatMap(A -> store("A", A).level2Items.stream())
                .flatMap(B -> store("B", B).level3Items.stream())
                .peek(C -> store("C", C))
                .filter(predicate)
                .map(o -> State.from(fields))
                .collect(Collectors.toList());
    }
}
I am not even sure that the BiFunction implementation is useful.
Thanks for your guidance.
You are translating the TypeScript, but not translating it as it was: you are not preserving the nesting depth of the lambdas. In your version they are all at the same level, so they don't see the variables from their parent context.
const findDeep = () =>
    mapFilterFlat(someObj.level1Items, A =>
        mapFilterFlat(A.level2Items, B =>
            mapFilter(B.level3Items, C =>
                // I am able to access closure variables so I can push them all in my result, instead of just the last level
                C == something ? ({A, B, C}) : null
            )))
This is not the same as:
return config.level1Items
        .stream()
        .flatMap(A -> store("A", A).level2Items.stream())
        .flatMap(B -> store("B", B).level3Items.stream())
        .peek(C -> store("C", C))
        .filter(predicate)
        .map(o -> State.from(fields))
        .collect(Collectors.toList());
This should be something like this:
return config.level1Items
        .stream()
        .flatMap(A -> store("A", A).level2Items
                .stream()
                .flatMap(B -> store("B", B).level3Items
                        .stream()))
        .peek(C -> store("C", C)) // the same must be done here
        .filter(predicate)
        .map(o -> State.from(fields))
        .collect(Collectors.toList());
If I understand your algorithm correctly, you are trying to get all combinations {A, B, C} where C = something. Your code should be something like this, using forEach to iterate over the items of each Collection/Iterable:
List<Triple<A, B, C>> collector = new ArrayList<>();
config.level1Items.forEach(a -> {
    a.level2Items.forEach(b -> {
        b.level3Items.forEach(c -> {
            if (c.equals(something)) {
                collector.add(new Triple<>(a, b, c));
            }
        });
    });
});
You don't need a stream for that.
Triple is simply an implementation of a 3-value tuple, for example the one in commons-lang3.
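If you would rather not depend on commons-lang3, a minimal stand-in is easy to write yourself; this sketch (using a record, so Java 16+) mirrors the Triple.of factory so the snippets here compile unchanged:

record Triple<A, B, C>(A a, B b, C c) {
    // same factory shape as commons-lang3's Triple.of
    static <A, B, C> Triple<A, B, C> of(A a, B b, C c) {
        return new Triple<>(a, b, c);
    }
}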
Based on NoDataFound's answer I eventually did this:
var collector = AA.stream()
        .flatMap(A -> A.BB.stream()
                .flatMap(B -> B.CC.stream()
                        .filter(predicateOnC)
                        .map(C -> Triple.of(A, B, C))))
        .collect(Collectors.toList());

Chain of operations

I have an interface that takes a string and returns a transformed string.
I have some classes that will transform it in different ways.
Is there any way in Java to create a stream of those classes and apply a transformation to a string?
For example:
class MyClass implements MyOperation {
    public String execute(String s) { return doSomething(s); }
}

class MyClass2 implements MyOperation {
    public String execute(String s) { return doSomething(s); }
}

ArrayList<MyClass> operations = new ArrayList<>();
operations.add(new MyClass());
operations.add(new MyClass2());
...
operations.stream()...
Can I make a stream of that in order to apply lots of transformations to a single string? I thought about .reduce(), but it is strict about the data types.
Your classes all implement methods that transform a String to a String. In other words, they can be represented by a Function<String,String>. They can be combined as follows and applied on a single String:
List<Function<String,String>> ops = new ArrayList<>();
ops.add(s -> s + "0"); // these lambda expressions can be replaced with your methods:
                       // for example - ops.add((new MyClass())::execute);
ops.add(s -> "1" + s);
ops.add(s -> s + " 2");

// here we combine them
Function<String,String> combined =
        ops.stream()
           .reduce(Function.identity(), Function::andThen);

// and here we apply them all on a String
System.out.println(combined.apply("dididi"));
Output:
1dididi0 2
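Note that Function::andThen applies the operations from left to right; reducing with Function::compose instead would apply them from right to left. A quick sketch with the same three lambdas:

Function<String,String> reversed =
        ops.stream()
           .reduce(Function.identity(), Function::compose);
System.out.println(reversed.apply("dididi")); // prints "1dididi 20"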
The ArrayList<MyClass> should be ArrayList<MyOperation>, else the call operations.add(new MyClass2()); would yield a compilation error.
That said, you're looking for this overload of reduce:
String result = operations.stream().reduce("myString",
        (x, y) -> y.execute(x),
        (a, b) -> {
            throw new RuntimeException("unimplemented");
        });
"myString" is the identity value.
(x, y) -> y.execute(x) is the accumulator function to be applied.
(a, b) -> {... is the combiner function, used only when the stream is parallel, so you need not worry about it for a sequential stream.
You may also want to read up on an answer I posted a while back, "Deciphering Stream reduce function".

why does Java Stream "reduce()" accumulate the same object

ComparisonResults comparisonResults = requestsList
        .parallelStream()
        .map(item -> getResponse(item))
        .map(item -> compareToBl(item))
        .reduce(new ComparisonResults(), (result1, result2) -> {
            result1.addSingleResult(result2);
            return result1;
        });
when
private ComparisonResults compareToBl(CompleteRoutingResponseShort completeRoutingResponseShortFresh) {
    ...
    ComparisonResults comparisonResults = new ComparisonResults();
    ...
    return comparisonResults;
}
However, when I debug:
.reduce(new ComparisonResults(), (result1, result2) -> {
    result1.addSingleResult(result2);
    return result1;
});
I see that result1 and result2 are always the same object (same object id in the IDE): result1 equals result2.
addSingleResult should return a new object, a modified copy of this, so you should change your code to:
.reduce(new ComparisonResults(), (result1, result2) -> {
    return result1.addSingleResult(result2);
});
Otherwise, you are always returning the same instance (without modifications).
From the Java documentation:
The reduce operation always returns a new value. However, the accumulator function also returns a new value every time it processes an element of a stream. Suppose that you want to reduce the elements of a stream to a more complex object, such as a collection. This might hinder the performance of your application. If your reduce operation involves adding elements to a collection, then every time your accumulator function processes an element, it creates a new collection that includes the element, which is inefficient. It would be more efficient for you to update an existing collection instead. You can do this with the Stream.collect method, which the next section describes.
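Following that advice, a mutable reduction via collect would look roughly like this. This is only a sketch: it assumes ComparisonResults gains a void merging method (mergeWith here is a hypothetical name):

ComparisonResults comparisonResults = requestsList
        .parallelStream()
        .map(item -> getResponse(item))
        .map(item -> compareToBl(item))
        .collect(ComparisonResults::new,        // supplier: a fresh container per thread
                 ComparisonResults::mergeWith,  // accumulator: mutates the container in place
                 ComparisonResults::mergeWith); // combiner: merges two partial containers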

Cartesian product of streams in Java 8 as stream (using streams only)

I would like to create a method which creates a stream of elements which are cartesian products of multiple given streams (aggregated to the same type at the end by a binary operator). Please note that both arguments and results are streams, not collections.
For example, for two streams of {A, B} and {X, Y} I would like it to produce the stream of values {AX, AY, BX, BY} (simple concatenation is used for aggregating the strings). So far, I have come up with this code:
private static <T> Stream<T> cartesian(BinaryOperator<T> aggregator, Stream<T>... streams) {
    Stream<T> result = null;
    for (Stream<T> stream : streams) {
        if (result == null) {
            result = stream;
        } else {
            result = result.flatMap(m -> stream.map(n -> aggregator.apply(m, n)));
        }
    }
    return result;
}
This is my desired use case:
Stream<String> result = cartesian(
        (a, b) -> a + b,
        Stream.of("A", "B"),
        Stream.of("X", "Y")
);
System.out.println(result.collect(Collectors.toList()));
Expected result: AX, AY, BX, BY.
Another example:
Stream<String> result = cartesian(
        (a, b) -> a + b,
        Stream.of("A", "B"),
        Stream.of("K", "L"),
        Stream.of("X", "Y")
);
Expected result: AKX, AKY, ALX, ALY, BKX, BKY, BLX, BLY.
However, if I run the code, I get this error:
IllegalStateException: stream has already been operated upon or closed
Where is the stream consumed? By flatMap? Can it be easily fixed?
Passing the streams in your example is never better than passing Lists:
private static <T> Stream<T> cartesian(BinaryOperator<T> aggregator, List<T>... lists) {
    ...
}
And use it like this:
Stream<String> result = cartesian(
        (a, b) -> a + b,
        Arrays.asList("A", "B"),
        Arrays.asList("K", "L"),
        Arrays.asList("X", "Y")
);
In both cases you create an implicit array from varargs and use it as data source, thus the laziness is imaginary. Your data is actually stored in the arrays.
In most cases the resulting Cartesian product stream is much longer than the inputs, so there's practically no reason to make the inputs lazy. For example, having five lists of five elements each (25 in total), you get a resulting stream of 3125 elements, so storing 25 elements in memory is not a very big problem. Actually, in most practical cases they are already stored in memory.
In order to generate the stream of Cartesian products you need to constantly "rewind" all the streams (except the first one). To rewind, the streams must be able to retrieve the original data again and again, either by buffering it somehow (which you don't like) or by grabbing it again from the source (collection, array, file, network, random numbers, etc.) and performing all the intermediate operations again and again. If your source and intermediate operations are slow, then the lazy solution may be much slower than the buffering solution. If your source is unable to produce the data again (for example, a random number generator which cannot reproduce the numbers it produced before), your solution will be incorrect.
Nevertheless, a totally lazy solution is possible. Just use stream suppliers instead of streams:
private static <T> Stream<T> cartesian(BinaryOperator<T> aggregator,
                                       Supplier<Stream<T>>... streams) {
    return Arrays.stream(streams)
            .reduce((s1, s2) ->
                    () -> s1.get().flatMap(t1 -> s2.get().map(t2 -> aggregator.apply(t1, t2))))
            .orElse(Stream::empty).get();
}
The solution is interesting as we create and reduce the stream of suppliers to get the resulting supplier and finally call it. Usage:
Stream<String> result = cartesian(
        (a, b) -> a + b,
        () -> Stream.of("A", "B"),
        () -> Stream.of("K", "L"),
        () -> Stream.of("X", "Y")
);
result.forEach(System.out::println);
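This prints the eight products AKX, AKY, ALX, ALY, BKX, BKY, BLX, BLY, matching the expected result above.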
The stream is consumed in the flatMap operation in the second iteration, so you have to create a new stream every time you map your result. Therefore you have to collect the stream in advance, so that you get a fresh stream in every iteration:
private static <T> Stream<T> cartesian(BiFunction<T, T, T> aggregator, Stream<T>... streams) {
    Stream<T> result = null;
    for (Stream<T> stream : streams) {
        if (result == null) {
            result = stream;
        } else {
            Collection<T> s = stream.collect(Collectors.toList());
            result = result.flatMap(m -> s.stream().map(n -> aggregator.apply(m, n)));
        }
    }
    return result;
}
Or even shorter:
private static <T> Stream<T> cartesian(BiFunction<T, T, T> aggregator, Stream<T>... streams) {
    return Arrays.stream(streams).reduce((r, s) -> {
        List<T> collect = s.collect(Collectors.toList());
        return r.flatMap(m -> collect.stream().map(n -> aggregator.apply(m, n)));
    }).orElse(Stream.empty());
}
You can create a method that returns a stream of List<T> and does not aggregate the elements. The algorithm is the same: at each step, collect the elements of the second stream to a list and then append them to the elements of the first stream.
The aggregator is outside the method.
@SuppressWarnings("unchecked")
public static <T> Stream<List<T>> cartesianProduct(Stream<T>... streams) {
    // incorrect incoming data
    if (streams == null) return Stream.empty();

    return Arrays.stream(streams)
            // non-null streams
            .filter(Objects::nonNull)
            // represent each stream element as a singleton List<T>
            .map(stream -> stream.map(Collections::singletonList))
            // summation of pairs of inner lists
            .reduce((stream1, stream2) -> {
                // list of lists from the second stream
                List<List<T>> list2 = stream2.collect(Collectors.toList());
                // append to the elements of the first stream
                return stream1.flatMap(inner1 -> list2.stream()
                        // combinations of inner lists
                        .map(inner2 -> {
                            List<T> list = new ArrayList<>();
                            list.addAll(inner1);
                            list.addAll(inner2);
                            return list;
                        }));
            }).orElse(Stream.empty());
}
public static void main(String[] args) {
    Stream<String> stream1 = Stream.of("A", "B");
    Stream<String> stream2 = Stream.of("K", "L");
    Stream<String> stream3 = Stream.of("X", "Y");

    @SuppressWarnings("unchecked")
    Stream<List<String>> stream4 = cartesianProduct(stream1, stream2, stream3);

    // output
    stream4.map(list -> String.join("", list)).forEach(System.out::println);
}
String.join is a kind of aggregator in this case.
Output:
AKX
AKY
ALX
ALY
BKX
BKY
BLX
BLY
See also: Stream of cartesian product of other streams, each element as a List?

Use of stream, filter and average on list and jdk8

I have this list of data that look like this;
{id, datastring}
{1,"a:1|b:2|d:3"}
{2,"a:2|c:2|c:4"}
{3,"a:2|bb:2|a:3"}
{4,"a:3|e:2|ff:3"}
What I need to do here is to perform operations like averaging, or finding all ids for which an element in the string is less than a certain value.
Here are some examples:
Averages
{a,2}{b,2}{bb,2}{c,3}{d,3}{e,2}{ff,3}
Find all id's where c<4
{2}
Find all id's where a<3
{1,2,3}
Would this be a good use of stream() and filter()?
Yes, you can use stream operations to achieve that, but I would suggest creating a class for this data, so that each row corresponds to one specific instance. That will make your life easier IMO.
class Data {
    private int id;
    private Map<String, List<Integer>> map;
    ....
}
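For completeness, here is a sketch of how the datastring from the question could be parsed into that map (the datastring variable and the split format are assumptions based on the sample rows):

// "a:2|bb:2|a:3" -> {a=[2, 3], bb=[2]}
Map<String, List<Integer>> map = Arrays.stream(datastring.split("\\|"))
        .map(token -> token.split(":"))
        .collect(Collectors.groupingBy(parts -> parts[0],
                Collectors.mapping(parts -> Integer.valueOf(parts[1]),
                        Collectors.toList())));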
That said, let's take a look at how you could implement this. First, the find-all implementation:
public static Set<Integer> ids(List<Data> list, String value, Predicate<Integer> boundPredicate) {
    return list.stream()
            .filter(d -> d.getMap().containsKey(value))
            .filter(d -> d.getMap().get(value).stream().anyMatch(boundPredicate))
            .map(d -> d.getId())
            .collect(toSet());
}
This one is simple to read. You get a Stream<Data> from the list. Then you apply filters so that you only keep instances whose map contains the given key with at least one value satisfying the predicate you give. Then you map each instance to its corresponding id and collect the resulting stream into a Set.
Example of call:
Set<Integer> set = ids(list, "a", value -> value < 3);
which outputs:
[1, 2, 3]
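The other query from the question works the same way:

Set<Integer> set = ids(list, "c", value -> value < 4);

which outputs:

[2]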
The averages request was a bit trickier. I ended up with another implementation; you get a Map<String, IntSummaryStatistics> at the end, which contains the average along with other information.
Map<String, IntSummaryStatistics> stats = list.stream()
        .flatMap(d -> d.getMap().entrySet().stream())
        .collect(toMap(Map.Entry::getKey,
                e -> e.getValue().stream().mapToInt(i -> i).summaryStatistics(),
                (i1, i2) -> { i1.combine(i2); return i1; }));
You first get a Stream<Data>, then you flatMap each map's entry set to get a Stream<Entry<String, List<Integer>>>. Now you collect this stream into a map: each entry's key becomes a map key, and each List<Integer> is mapped to its corresponding IntSummaryStatistics value. If you encounter two identical keys, you combine their respective IntSummaryStatistics values.
Given your data set, you get a Map<String, IntSummaryStatistics>:
ff => IntSummaryStatistics{count=1, sum=3, min=3, average=3.000000, max=3}
bb => IntSummaryStatistics{count=1, sum=2, min=2, average=2.000000, max=2}
a => IntSummaryStatistics{count=5, sum=11, min=1, average=2.200000, max=3}
b => IntSummaryStatistics{count=1, sum=2, min=2, average=2.000000, max=2}
c => IntSummaryStatistics{count=2, sum=6, min=2, average=3.000000, max=4}
d => IntSummaryStatistics{count=1, sum=3, min=3, average=3.000000, max=3}
e => IntSummaryStatistics{count=1, sum=2, min=2, average=2.000000, max=2}
from which you can easily grab the average.
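For example, printing just the averages is a one-liner:

stats.forEach((key, stat) -> System.out.println(key + " => " + stat.getAverage()));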
Here's a full working example; the implementation can certainly be improved, though.
I know that you have your answer, but here are my versions too:
Map<String, Double> result = list.stream()
        .map(Data::getElements)
        .flatMap((Multimap<String, Integer> map) -> map.entries().stream())
        .collect(Collectors.groupingBy(Map.Entry::getKey,
                Collectors.averagingInt((Entry<String, Integer> token) -> token.getValue())));
System.out.println(result);
List<Integer> result2 = list.stream()
        .filter((Data data) -> data.getElements().get("c").stream().anyMatch(i -> i < 4))
        .map(Data::getId)
        .collect(Collectors.toList());
System.out.println(result2);
