Transform list to mapping using Java streams

I have the following pattern repeated throughout my code:
class X<T, V> {
    V doTransform(T t) {
        return null; // dummy implementation
    }

    Map<T, V> transform(List<T> item) {
        return item.stream()
                .map(x -> new AbstractMap.SimpleEntry<>(x, doTransform(x)))
                .collect(toMap(x -> x.getKey(), x -> x.getValue()));
    }
}
Requiring the use of AbstractMap.SimpleEntry is messy and clunky; LINQ's use of anonymous types is more elegant.
Is there a simpler way to achieve this using streams?
Thanks in advance.

You can call doTransform in the value mapper:
Map<T, V> transform(List<T> item) {
    return item.stream().collect(toMap(x -> x, x -> doTransform(x)));
}

Unfortunately, Java doesn't have an exact equivalent of C#'s anonymous types.
In this specific case, you don't need the intermediate map operation, as @Jorn Vernee has suggested. Instead, you can perform the key and value extraction in the toMap collector.
However, when you get to cases where you think you need something like C#'s anonymous types, you may consider one of the following (whether each fits depends on your use case):
anonymous objects
Arrays.asList(...), List.of(...)
an array
Ultimately, if you really need to map to something that can contain two different types of elements, I'd stick with AbstractMap.SimpleEntry.
That said, your current example can be simplified to:
Map<T, V> transform(List<T> items) {
    return items.stream().collect(toMap(Function.identity(), this::doTransform));
}

In this specific example, there is no need for the intermediate storage at all:
Map<T, V> transform(List<T> item) {
    return item.stream().collect(toMap(x -> x, x -> doTransform(x)));
}
But if you need it, Java 9 offers a simpler factory method,
Map<T, V> transform(List<T> item) {
    return item.stream()
            .map(x -> Map.entry(x, doTransform(x)))
            .collect(toMap(x -> x.getKey(), x -> x.getValue()));
}
as long as you don’t have to deal with null.
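For reference, a minimal illustration of that caveat (Map.entry rejects nulls eagerly):

Map.Entry<String, Integer> ok = Map.entry("a", 1);     // fine
Map.Entry<String, Integer> bad = Map.entry("a", null); // throws NullPointerException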
You can use an anonymous inner class here,
Map<T, V> transform(List<T> item) {
    return item.stream()
            .map(x -> new Object() { T t = x; V v = doTransform(x); })
            .collect(toMap(x -> x.t, x -> x.v));
}
but it's less efficient. It's an inner class which captures a reference to the surrounding this, and it also captures x, so you have two fields, t and the synthetic one for capturing x, holding the same thing.
The latter could be circumvented by using a method, e.g.
Map<T, V> transform(List<T> item) {
    return item.stream()
            .map(x -> new Object() { T getKey() { return x; } V v = doTransform(x); })
            .collect(toMap(x -> x.getKey(), x -> x.v));
}
But it doesn’t add to readability.
The only true anonymous types are the types generated for lambda expressions, which could be used to store information via higher order functions:
Map<T, V> transform(List<T> item) {
    return item.stream()
            .map(x -> capture(x, doTransform(x)))
            .collect(HashMap::new, (m, f) -> f.accept(m::put), HashMap::putAll);
}

public static <A, B> Consumer<BiConsumer<A, B>> capture(A a, B b) {
    return f -> f.accept(a, b);
}
but you’d soon hit the limitations of Java’s type system (it still isn’t a functional programming language) if you try this with more complex scenarios.


How to extract a function where I can pass functions as parameters for these chained lambdas in Java 8?

I have a code pattern using Kafka Streams that keeps repeating: I do a map, then group by key, and then reduce. It looks like this:
KTable<ProjectKey, EventConfigurationIdsWithDeletedState> eventConfigurationsByProjectTable = eventConfigurationStream
        .map((key, value) -> {
            Map<String, Boolean> eventConfigurationUpdates = new HashMap<>();
            eventConfigurationUpdates.put(key.getEventConfigurationId(), value != null);
            ProjectKey projectKey = ProjectKey.newBuilder().setId(key.getProjectId()).build();
            EventConfigurationIdsWithDeletedState eventConfigurationIdsWithDeletedState =
                    EventConfigurationIdsWithDeletedState.newBuilder().setEventConfigurations(eventConfigurationUpdates).build();
            return KeyValue.pair(projectKey, eventConfigurationIdsWithDeletedState);
        })
        .groupByKey()
        .reduce((aggValue, newValue) -> {
            Map<String, Boolean> newEventConfigurations = newValue.getEventConfigurations();
            Map<String, Boolean> aggEventConfigurations = aggValue.getEventConfigurations();
            Map.Entry<String, Boolean> newEntry = newEventConfigurations.entrySet().iterator().next();
            if (newEntry.getValue())
                aggEventConfigurations.putAll(newEventConfigurations);
            else
                aggEventConfigurations.remove(newEntry.getKey());
            if (aggEventConfigurations.size() == 0)
                return null;
            return aggValue;
        });
(with eventConfigurationStream being of type KStream<EventConfigurationKey, EventConfiguration>)
Another example that follows this pattern. Note there's a filter here too but that isn't always the case:
KTable<ProjectKey, NotificationSettingsTransition> globalNotificationSettingsPerProjectTable = notificationSettingTable.toStream()
        .filter((key, value) -> {
            return key.getEventConfigurationId() == null;
        })
        .map((key, value) -> {
            ProjectKey projectKey = ProjectKey.newBuilder().setId(key.getProjectId()).build();
            Map<String, NotificationSetting> notificationSettingsMap = new HashMap<>();
            notificationSettingsMap.put(getAsCompoundKeyString(key), value);
            NotificationSettingsTransition notificationSettingTransition = NotificationSettingsTransition
                    .newBuilder()
                    .setNotificationSettingCompoundKeyLastUpdate(getAsCompoundKey(key))
                    .setNotificationSettingLastUpdate(value)
                    .setEventConfigurationIds(new ArrayList<>())
                    .setNotificationSettingsMap(notificationSettingsMap)
                    .build();
            return KeyValue.pair(projectKey, notificationSettingTransition);
        })
        .groupByKey()
        .reduce((aggValue, newValue) -> {
            Map<String, NotificationSetting> notificationSettingMap = aggValue.getNotificationSettingsMap();
            String compoundKeyAsString = getAsString(newValue.getNotificationSettingCompoundKeyLastUpdate());
            if (newValue.getNotificationSettingLastUpdate() != null)
                notificationSettingMap.put(compoundKeyAsString, newValue.getNotificationSettingLastUpdate());
            else
                notificationSettingMap.remove(compoundKeyAsString);
            aggValue.setNotificationSettingCompoundKeyLastUpdate(newValue.getNotificationSettingCompoundKeyLastUpdate());
            aggValue.setNotificationSettingLastUpdate(newValue.getNotificationSettingLastUpdate());
            aggValue.setNotificationSettingsMap(notificationSettingMap);
            return aggValue;
        });
(with notificationSettingTable being of type KTable<NotificationSettingKey, NotificationSetting>, but immediately transformed into a KStream as well)
How could I extract this into a function where I pass one function for the map code and one for the reduce code, but do not have to repeat the .map().groupByKey().reduce() pattern? This should respect that the return types differ, depend on the code in the map function, and should remain typed. Ideally in Java 8, but higher versions are potentially possible. I think I have a good idea of how to do it when the inner types of the KeyValue pair within the map code don't change, but not in this case.
You can parameterise your function to accept two generic functions, where the types will be inferred (or set explicitly if not possible) when the function is called.
For the input to map, you want a BiFunction<K, V, T>, and for reduce you want a BiFunction<U, U, U>, where:
K is the type of key in map's function.
V is the type of value in map's function.
T is the return type of map's function.
U is the type of the aggregator, values and return type of reduce's function.
Looking at KStream and KGroupedStream, you can get more detailed type information to constrain the functions further.
This would make your custom function something like this:
<K, V, T, U> U mapGroupReduce(final KStream<K, V> stream, final BiFunction<K, V, T> mapper, final BiFunction<U, U, U> reducer) {
    return stream.map(mapper).groupByKey().reduce(reducer);
}
You can then call it like so:
mapGroupReduce(yourStream,
        (key, value) -> new KeyValue<>(key, value),
        (acc, value) -> acc);
In your case, instead of using BiFunctions, you need to use:
KeyValueMapper<K, V, KeyValue<T, U>> for the mapper
Reducer<U> for the reducer, as in the typed sketch below.
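A version typed against the Kafka Streams API could look like this (a sketch only; the generic parameter names are mine, and Serde configuration via Grouped/Materialized is omitted):

static <K, V, K2, V2> KTable<K2, V2> mapGroupReduce(
        final KStream<K, V> stream,
        final KeyValueMapper<K, V, KeyValue<K2, V2>> mapper,
        final Reducer<V2> reducer) {
    // map re-keys and re-types the stream, groupByKey groups by the new key
    // (repartitioning if necessary), and reduce folds the values per key
    return stream.map(mapper).groupByKey().reduce(reducer);
}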
However, is this really all that much better than just writing stream.map(M).groupByKey().reduce(R) every time? The more verbose version is more explicit, and given the relative sizes of the mapper and reducer, you are not really saving all that much.

Identity for BinaryOperator

I see in Java 8's UnaryOperator interface the following piece of code, which does nothing to the parameter and returns the same value:
static <T> UnaryOperator<T> identity() {
    return t -> t;
}
Is there anything for BinaryOperator which accepts two parameters of the same kind and returns one value, like this?
static <T> BinaryOperator<T> identity() {
    return (t, t) -> t;
}
The reason I am asking is the following requirement:
List<String> list = Arrays.asList("Abcd", "Abcd");
Map<String, Integer> map = list.stream()
        .collect(Collectors.toMap(str -> str, str -> (Integer) str.length(), (t1, t2) -> t1));
System.out.println(map.size());
In the above code I don't want to do anything for two values of the same key; I just want to return one value, because in my case the values are sure to be the same.
As I am not using the t2 value, Sonar throws an error, so I am asking whether there is anything like UnaryOperator.identity() for BinaryOperator in Java 8.
Your question doesn't really make sense. If you were to paste your proposed BinaryOperator.identity method into an IDE, you would immediately see that it would complain that the identifier t is declared twice.
To fix this, we need a different identifier for each parameter:
return (t, u) -> t;
Now we can clearly see that this is not an identity function. It's a method which takes two arguments and returns the first one. Therefore the best name for this would be something like getFirst.
To answer your question about whether there's anything like this in the JDK: no. Using an identity function is a common use case, so defining a method for that is useful. Arbitrarily returning the first argument of two is not a common use case, and it's not useful to have a method to do that.
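If the goal is simply to express "keep the first value" clearly, a small named helper is enough (a sketch; the name getFirst is mine):

// A merge function that always keeps the first of its two arguments.
static <T> BinaryOperator<T> getFirst() {
    return (first, second) -> first;
}

It can then be passed as the merge function: Collectors.toMap(str -> str, String::length, getFirst()).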
T means they have the same type, not the same value, so that is not an identity per se.
It just means that the BinaryOperator will be used on operands of the same type; providing an identity for different values sounds more like foldLeft or foldRight (with an identity element), which Java does not have.
Your code can be simplified to:
List<String> list = Arrays.asList("Abcd", "Abcd");
Map<String, Integer> map = list.stream()
.collect(Collectors.toMap(Function.identity(), String::length, (a, b) -> a));
System.out.println(map.size());
Or, since for your use case you "don't want to do anything for two values of the same key, just return one value", you may choose to return either value at random, using an implementation like the following:
private static <T> BinaryOperator<T> any() {
    return Math.random() < 0.5 ? ((x, y) -> x) : ((x, y) -> y);
}
and then in your code use it as
Map<String, Integer> map = list.stream()
.collect(Collectors.toMap(Function.identity(), String::length, any()));
Thanks to the suggestions from Holger, Eugene, and Federico, there are other, more efficient implementations of the any method:

private static <T> BinaryOperator<T> any() {
    // suggested by Holger
    return ThreadLocalRandom.current().nextBoolean() ? ((x, y) -> x) : ((x, y) -> y);
}

private static <T> BinaryOperator<T> any() {
    // suggested by Eugene
    long nt = System.nanoTime();
    return ((nt >>> 32) ^ nt) > 0 ? ((x, y) -> x) : ((x, y) -> y);
}

Collector to split stream up into chunks of given size

I've got a problem at hand that I'm trying to solve with something I'm pretty sure I'm not supposed to do, but I don't see an alternative. I'm given a List of Strings and should split it up into chunks of a given size. The result then has to be passed to some method for further processing. As the list might be huge, the processing should be done asynchronously.
My approach is to create a custom Collector that takes the Stream of Strings and converts it to a Stream<List<Long>>:
final Stream<List<Long>> chunks = list
        .stream()
        .parallel()
        .collect(MyCollector.toChunks(CHUNK_SIZE))
        .flatMap(p -> doStuff(p))
        .collect(MyCollector.toChunks(CHUNK_SIZE))
        .map(...)
        ...
The code for the Collector:
public final class MyCollector<T, A extends List<List<T>>, R extends Stream<List<T>>> implements Collector<T, A, R> {
    private final AtomicInteger index = new AtomicInteger(0);
    private final AtomicInteger current = new AtomicInteger(-1);
    private final int chunkSize;

    private MyCollector(final int chunkSize) {
        this.chunkSize = chunkSize;
    }

    @Override
    public Supplier<A> supplier() {
        return () -> (A) new ArrayList<List<T>>();
    }

    @Override
    public BiConsumer<A, T> accumulator() {
        return (A candidate, T acc) -> {
            if (index.getAndIncrement() % chunkSize == 0) {
                candidate.add(new ArrayList<>(chunkSize));
                current.incrementAndGet();
            }
            candidate.get(current.get()).add(acc);
        };
    }

    @Override
    public BinaryOperator<A> combiner() {
        return (a1, a2) -> {
            a1.addAll(a2);
            return a1;
        };
    }

    @Override
    public Function<A, R> finisher() {
        return (a) -> (R) a.stream();
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.CONCURRENT, Characteristics.UNORDERED));
    }

    public static <T> MyCollector<T, List<List<T>>, Stream<List<T>>> toChunks(final int chunkSize) {
        return new MyCollector<>(chunkSize);
    }
}
This seems to work in most cases, but I sometimes get an NPE. I'm sure the accumulator is not thread-safe, as two threads might interfere when adding new lists to the main list. I don't mind a chunk having a few too many or too few elements, though.
I've tried this instead of the current supplier function:
return () -> (A)new ArrayList<List<T>>(){{add(new ArrayList<T>());}};
to make sure there is always a list present. This doesn't work at all and results in empty lists.
Issues:
I'm pretty sure a custom Spliterator would be a good solution. It would not work for synchronous scenarios, however. Also, can I be sure the Spliterator is called?
I'm aware I shouldn't have state at all but not sure how to change it.
Questions:
Is this approach completely wrong or somehow fixable?
If I use a Spliterator - can I be sure it's called or is that decided by the underlying implementation?
I'm pretty sure the casts to (A) and (R) in the supplier and finisher are not necessary but IntelliJ complains. Is there something I'm missing?
EDIT:
I've added some more to the client code as the suggestions with IntStream.range won't work when chained.
I realize I could do it differently as suggested in a comment but it's also a little bit about style and knowing if it's possible.
I have the CONCURRENT characteristic because I assume the Stream API would fall back to synchronous handling otherwise. As stated before, the solution is not thread-safe.
Any help would be greatly appreciated.
Best,
D
I can't comment yet, but I wanted to post the following link to a very similar issue (though not a duplicate, as far as I understand): Java 8 Stream with batch processing
You might also be interested in the following issue on GitHub: https://github.com/jOOQ/jOOL/issues/296
Now, your use of the CONCURRENT characteristic is wrong; the docs say the following about Collector.Characteristics.CONCURRENT:
Indicates that this collector is concurrent, meaning that the result container can support the accumulator function being called concurrently with the same result container from multiple threads.
This means that the supplier only gets called once, and the combiner actually never gets called (cf. the source of the ReferencePipeline.collect() method). That's why you sometimes got NPEs.
As a result, I suggest a simplified version of what you came up with:
public static <T> Collector<T, List<List<T>>, Stream<List<T>>> chunked(int chunkSize) {
    return Collector.of(
            ArrayList::new,
            (outerList, item) -> {
                if (outerList.isEmpty() || last(outerList).size() >= chunkSize) {
                    outerList.add(new ArrayList<>(chunkSize));
                }
                last(outerList).add(item);
            },
            (a, b) -> {
                a.addAll(b);
                return a;
            },
            List::stream,
            Collector.Characteristics.UNORDERED
    );
}

private static <T> T last(List<T> list) {
    return list.get(list.size() - 1);
}
Alternatively, you could write a truly concurrent Collector using proper synchronization, but if you don't mind having more than one list with a size less than chunkSize (which is the effect you can get with a non-concurrent Collector like the one I proposed above), I wouldn't bother.
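For reference, a quick usage sketch of this collector (the chunk size of 2 and the variable names are arbitrary):

List<String> input = Arrays.asList("a", "b", "c", "d", "e");
Stream<List<String>> chunks = input.stream().collect(chunked(2));
chunks.forEach(System.out::println); // [a, b], [c, d], [e]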
Here is one way, in the spirit of doing it all in one expression, which is oddly satisfying: first associate each string with its index in the list, then use that index in the collector to pick the string list to put each string into. Then stream those lists in parallel to your converter method.
final Stream<List<Long>> longListStream = IntStream.range(0, strings.size())
        .parallel()
        .mapToObj(i -> new AbstractMap.SimpleEntry<>(i, strings.get(i)))
        .collect(
                () -> IntStream.range(0, strings.size() / CHUNK_SIZE + 1)
                        .mapToObj(i -> new LinkedList<String>())
                        .collect(Collectors.toList()),
                (stringListList, entry) -> {
                    // index / CHUNK_SIZE selects the chunk this element belongs to
                    stringListList.get(entry.getKey() / CHUNK_SIZE).add(entry.getValue());
                },
                (left, right) -> {
                    // merge partial results element-wise; both containers have the same shape
                    for (int i = 0; i < left.size(); i++) left.get(i).addAll(right.get(i));
                })
        .parallelStream()
        .map(this::doStuffWithStringsAndGetLongsBack);

Collect to map skipping null key/values

Let's say I have some stream and want to collect to map like this
stream.collect(Collectors.toMap(this::func1, this::func2));
But I want to skip null keys/values. Of course, I can do it like this:
stream.filter(t -> func1(t) != null)
.filter(t -> func2(t) != null)
.collect(Collectors.toMap(this::func1, this::func2));
But is there a more beautiful/effective solution?
If you want to avoid evaluating the functions func1 and func2 twice, you have to store the results. E.g.
stream.map(t -> new AbstractMap.SimpleImmutableEntry<>(func1(t), func2(t)))
      .filter(e -> e.getKey() != null && e.getValue() != null)
      .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
This doesn't make the code shorter, and even the efficiency depends on the circumstances. This change pays off if the cost of evaluating func1 and func2 is high enough to compensate for the creation of temporary objects. In principle, the temporary objects could get optimized away, but this isn't guaranteed.
Starting with Java 9, you can replace new AbstractMap.SimpleImmutableEntry<>(…) with Map.entry(…). Since this entry type disallows null right from the start, it would need filtering before constructing the entry:
stream.flatMap(t -> {
          Type1 value1 = func1(t);
          Type2 value2 = func2(t);
          return value1 != null && value2 != null ? Stream.of(Map.entry(value1, value2)) : null;
      })
      .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
Alternatively, you may use a pair type of one of the libraries you’re already using (the Java API itself doesn’t offer such a type).
Another way to avoid evaluating the functions twice: use a pair class of your choice. It's not as concise as Holger's, but it's a little less dense, which can be easier to read.
stream.map(A::doFuncs)
      .flatMap(Optional::stream)
      .collect(Collectors.toMap(Pair::getKey, Pair::getValue));

private static Optional<Pair<Bar, Baz>> doFuncs(Foo foo) {
    final Bar bar = func1(foo);
    final Baz baz = func2(foo);
    if (bar == null || baz == null) return Optional.empty();
    return Optional.of(new Pair<>(bar, baz));
}
(Choose proper names - I didn't know what types you were using)
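Note that Optional::stream requires Java 9 or later; on Java 8 the same pipeline can be expressed as:

stream.map(A::doFuncs)
      .filter(Optional::isPresent)
      .map(Optional::get)
      .collect(Collectors.toMap(Pair::getKey, Pair::getValue));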
One option is to do as in the other answers, i.e. use a Pair type or an implementation of Map.Entry. Another approach, used in functional programming, would be to memoize the functions. According to Wikipedia:
memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
So you could do it by caching the results of the functions in maps:
public static <K, V> Function<K, V> memoize(Function<K, V> f) {
    Map<K, V> map = new HashMap<>();
    return k -> map.computeIfAbsent(k, f);
}
Then, use the memoized functions in the stream:
Function<E, K> memoizedFunc1 = memoize(this::func1);
Function<E, V> memoizedFunc2 = memoize(this::func2);
stream.filter(t -> memoizedFunc1.apply(t) != null)
      .filter(t -> memoizedFunc2.apply(t) != null)
      .collect(Collectors.toMap(memoizedFunc1, memoizedFunc2));
Here E stands for the type of the elements of the stream, K stands for the type returned by func1 (which is the type of the keys of the map) and V stands for the type returned by func2 (which is the type of the values of the map).
This is a naive solution, but it does not call the functions twice and does not create extra objects (the no-op combiner is only safe here because the stream is sequential):
List<Integer> ints = Arrays.asList(1, null, 2, null, 3);
Map<Integer, Integer> res = ints.stream().collect(LinkedHashMap::new, (lhm, i) -> {
    final Integer integer1 = func1(i);
    final Integer integer2 = func2(i);
    if (integer1 != null && integer2 != null) {
        lhm.put(integer1, integer2);
    }
}, (lhm1, lhm2) -> {});
You could create an isFunc1AndFunc2NotNull() method in the current class:

boolean isFunc1AndFunc2NotNull(Foo foo) {
    return func1(foo) != null && func2(foo) != null;
}
And change your stream to:

stream.filter(this::isFunc1AndFunc2NotNull)
      .collect(Collectors.toMap(this::func1, this::func2));

Note that this still evaluates func1 and func2 twice per element, so it improves readability rather than efficiency.

Java 8, how to group stream elements to sets using BiPredicate

I have a stream of files and a method which takes two files as arguments and returns whether they have the same content.
I want to reduce this stream of files to a set (or map) of sets grouping all the files with identical content.
I know this is possible by refactoring the compare method to take one file and return a hash, then grouping the stream by the hash returned by the function given to the collector. But what is the cleanest way to achieve this with a compare method that takes two files and returns a boolean?
For clarity, here is an example of the obvious way with the one-argument function solution:
files.stream().collect(groupingBy(f -> Utility.getHash(f)))
But in my case I have the following method, which I want to utilize in the partitioning process:
public boolean isFileSame(File f, File f2) {
    return Files.equal(f, f2);
}
If all you have is a BiPredicate without an associated hash function that would allow an efficient lookup, you can only use a linear search. There is no builtin collector doing that, but a custom collector working close to the original groupingBy collector can be implemented like this:
public static <T> Collector<T, ?, Map<T, Set<T>>> groupingBy(BiPredicate<T, T> p) {
    return Collector.of(HashMap::new,
        (map, t) -> {
            for (Map.Entry<T, Set<T>> e : map.entrySet())
                if (p.test(t, e.getKey())) {
                    e.getValue().add(t);
                    return;
                }
            map.computeIfAbsent(t, x -> new HashSet<>()).add(t);
        }, (m1, m2) -> {
            if (m1.isEmpty()) return m2;
            m2.forEach((t, set) -> {
                for (Map.Entry<T, Set<T>> e : m1.entrySet())
                    if (p.test(t, e.getKey())) {
                        e.getValue().addAll(set);
                        return;
                    }
                m1.put(t, set);
            });
            return m1;
        }
    );
}
but, of course, the more resulting groups you have, the worse the performance will be.
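For illustration, applying this collector to the question's files (a hypothetical usage, assuming the custom groupingBy above is in scope rather than Collectors.groupingBy):

Map<File, Set<File>> groups = files.stream()
        .collect(groupingBy((f1, f2) -> isFileSame(f1, f2)));
// one representative file per group maps to the set of files with equal content
Set<Set<File>> sameContentGroups = new HashSet<>(groups.values());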
For your specific task, it will be much more efficient to use
public static ByteBuffer readUnchecked(Path p) {
    try {
        return ByteBuffer.wrap(Files.readAllBytes(p));
    } catch (IOException ex) {
        throw new UncheckedIOException(ex);
    }
}
and
Set<Set<Path>> groupsByContents = your stream of Path instances
        .collect(Collectors.collectingAndThen(
                Collectors.groupingBy(YourClass::readUnchecked, Collectors.toSet()),
                map -> new HashSet<>(map.values())));
which will group the files by contents and does the hashing implicitly. Keep in mind that an equal hash does not imply equal contents, but this solution already takes care of that. The finishing function map -> new HashSet<>(map.values()) ensures that the resulting collection does not keep the files' contents in memory after the operation.
A possible solution using the helper class Wrapper:
files.stream()
     .collect(groupingBy(f -> Wrapper.of(f, Utility::getHash, Files::equal)))
     .keySet().stream().map(Wrapper::value).collect(toList());
If you don't want to use Utility.getHash for some reason, try using File.length() as the hash function. The Wrapper provides a general solution to customize the hash/equals functions for any type (e.g. arrays), so it's useful to keep in your toolkit. Here is a sample implementation of the Wrapper:
public class Wrapper<T> {
    private final T value;
    private final ToIntFunction<? super T> hashFunction;
    private final BiFunction<? super T, ? super T, Boolean> equalsFunction;
    private int hashCode;

    private Wrapper(T value, ToIntFunction<? super T> hashFunction, BiFunction<? super T, ? super T, Boolean> equalsFunction) {
        this.value = value;
        this.hashFunction = hashFunction;
        this.equalsFunction = equalsFunction;
    }

    public static <T> Wrapper<T> of(T value, ToIntFunction<? super T> hashFunction, BiFunction<? super T, ? super T, Boolean> equalsFunction) {
        return new Wrapper<>(value, hashFunction, equalsFunction);
    }

    public T value() {
        return value;
    }

    @Override
    public int hashCode() {
        if (hashCode == 0) {
            hashCode = value == null ? 0 : hashFunction.applyAsInt(value);
        }
        return hashCode;
    }

    @Override
    public boolean equals(Object obj) {
        return (obj == this) || (obj instanceof Wrapper && equalsFunction.apply(((Wrapper<T>) obj).value, value));
    }

    // TODO ...
}
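As a quick illustration of reusing Wrapper for another type, here is a hypothetical example grouping equal int arrays (which otherwise use identity-based equals/hashCode):

List<int[]> arrays = Arrays.asList(new int[]{1, 2}, new int[]{1, 2});
Map<Wrapper<int[]>, List<int[]>> grouped = arrays.stream()
        .collect(Collectors.groupingBy(a -> Wrapper.of(a, Arrays::hashCode, Arrays::equals)));
// grouped.size() == 1: both arrays land in the same group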
