Java reduce a collection of strings to a map of occurrences - java

Consider a list such as id1_f, id2_d, id3_f, id1_g. How can I use a stream to get a reduced map of type <String, Integer> holding statistics like:
id1 2
id2 1
id3 1
Note: the key is the part before the _. Can the reduce function help here?

This will get the job done:
Map<String, Long> map = Stream.of("id1_f", "id2_d", "id3_f", "id1_g")
.collect(
Collectors.groupingBy(v -> v.split("_")[0],
Collectors.counting())
);
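The question asks for Integer values, whereas Collectors.counting() produces Long. A minimal sketch, assuming the input is a List<String>: it swaps in Collectors.summingInt to obtain a Map<String, Integer>. The class name PrefixCount and the method countByPrefix are made up for illustration, and TreeMap is used only to make the printed order predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PrefixCount {

    // Counts how often each prefix (the text before the first '_') occurs.
    public static Map<String, Integer> countByPrefix(List<String> items) {
        return items.stream()
                .collect(Collectors.groupingBy(
                        s -> s.split("_")[0],            // key: text before '_'
                        TreeMap::new,                    // sorted keys (illustrative choice)
                        Collectors.summingInt(s -> 1))); // Integer count instead of Long
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("id1_f", "id2_d", "id3_f", "id1_g");
        System.out.println(countByPrefix(items)); // {id1=2, id2=1, id3=1}
    }
}
```

If a string contains no underscore, split("_")[0] returns the whole string, so such items are counted under their full text.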

You can also use the toMap collector:
myList.stream()
.collect(Collectors.toMap((String s) -> s.split("_")[0],
(String s) -> 1, Math::addExact));
If you care about the order of the elements, collect the result into a LinkedHashMap:
myList.stream()
.collect(Collectors.toMap((String s) -> s.split("_")[0],
(String s) -> 1, Math::addExact,
LinkedHashMap::new));

A non-stream approach using Map::merge:
Map<String, Integer> result = new LinkedHashMap<>();
myList.forEach(s -> result.merge(s.split("_")[0], 1, Math::addExact));

Since you want to count the elements, I'd suggest using Guava's Multiset interface, which is dedicated to such purpose.
The definition of Multiset from its JavaDoc:
A collection that supports order-independent equality, like Set, but may have duplicate elements. A multiset is also sometimes called a bag.
Elements of a multiset that are equal to one another are referred to as occurrences of the same single element. The total number of occurrences of an element in a multiset is called the count of that element.
Here are two ways to use it:
1) Without the Stream API:
ImmutableMultiset<String> multiset2 = ImmutableMultiset.copyOf(Lists.transform(
list, str -> StringUtils.substringBefore(str, "_")
));
2) Using the Stream API:
ImmutableMultiset<String> multiset = list.stream()
.map(str -> StringUtils.substringBefore(str, "_"))
.collect(ImmutableMultiset.toImmutableMultiset());
Note that instead of using something like s.split("_")[0], I used Apache Commons Lang's StringUtils.substringBefore, which I find much more readable.
You retrieve the count of an element using the Multiset.count() method.

Related

Java Map with List value to list using streams?

I am trying to rewrite the method below using streams but I am not sure what the best approach is? If I use flatMap on the values of the entrySet(), I lose the reference to the current key.
private List<String> asList(final Map<String, List<String>> map) {
final List<String> result = new ArrayList<>();
for (final Entry<String, List<String>> entry : map.entrySet()) {
final List<String> values = entry.getValue();
values.forEach(value -> result.add(String.format("%s-%s", entry.getKey(), value)));
}
return result;
}
The best I managed to do is the following:
return map.keySet().stream()
.flatMap(key -> map.get(key).stream()
.map(value -> new AbstractMap.SimpleEntry<>(key, value)))
.map(e -> String.format("%s-%s", e.getKey(), e.getValue()))
.collect(Collectors.toList());
Is there a simpler way without resorting to creating new Entry objects?
A stream is a sequence of values (possibly unordered / parallel). map() is what you use when you want to map a single value in the sequence to some single other value. Say, map "alturkovic" to "ALTURKOVIC". flatMap() is what you use when you want to map a single value in the sequence to 0, 1, or many other values. Hence why a flatMap lambda needs to turn a value into a stream of values. flatMap can thus be used to take, say, a list of lists of string, and turn that into a stream of just strings.
Here, you want to map a single entry from your map (a single key/value pair) into a single element (a string describing it). 1 value to 1 value. That means flatMap is not appropriate. You're looking for just map.
Furthermore, you need both key and value to perform your mapping op, so keySet() is also not appropriate. You're looking for entrySet(), which gives you a set of all k/v pairs, just what we need.
That gets us to:
map.entrySet().stream()
.map(e -> String.format("%s-%s", e.getKey(), e.getValue()))
.collect(Collectors.toList());
Your original code makes no effort to treat a single value from a map (which is a List<String>) as separate values; you just call .toString() on the entire ordeal, and be done with it. This means the produced string looks like, say, [Hello, World] given a map value of List.of("Hello", "World"). If you don't want this, you still don't want flatMap, because streams are also homogeneous - the values in a stream are all of the same kind, and thus a stream of 'key1 value1 value2 key2 valueA valueB' is not what you'd want:
map.entrySet().stream()
.map(e -> String.format("%s-%s", e.getKey(), myPrint(e.getValue())))
.collect(Collectors.toList());
public static String myPrint(List<String> in) {
// write your own algorithm here
return String.join(", ", in); // placeholder so the method compiles
}
Stream API just isn't the right tool to replace that myPrint method.
A third alternative is that you want to smear out the map; you want each string in a mapvalue's List<String> to first be matched with the key (so that's re-stating that key rather a lot), and then do something to that. NOW flatMap IS appropriate - you want a stream of k/v pairs first, and then do something to that, and each element is now of the same kind. You want to turn the map:
key1 = [value1, value2]
key2 = [value3, value4]
first into a stream:
key1:value1
key1:value2
key2:value3
key2:value4
and take it from there. This explodes a single k/v entry in your map into more than one, thus, flatmapping needed:
return map.entrySet().stream()
.flatMap(e -> e.getValue().stream()
.map(v -> String.format("%s-%s", e.getKey(), v)))
.collect(Collectors.toList());
Going inside-out, it maps a single entry within a list that belongs to a single k/v pair into the string Key-SingleItemFromItsList.
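To make the flatMap variant easy to try out, here is a self-contained sketch. The class name FlattenMap and the sample data are assumptions for illustration; a LinkedHashMap is used only so the output order is predictable.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FlattenMap {

    // Expands every (key, [v1, v2, ...]) entry into the strings "key-v1", "key-v2", ...
    public static List<String> asList(Map<String, List<String>> map) {
        return map.entrySet().stream()
                .flatMap(e -> e.getValue().stream()   // one entry becomes many strings
                        .map(v -> String.format("%s-%s", e.getKey(), v)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> map = new LinkedHashMap<>();
        map.put("key1", Arrays.asList("value1", "value2"));
        map.put("key2", Arrays.asList("value3", "value4"));
        System.out.println(asList(map));
        // [key1-value1, key1-value2, key2-value3, key2-value4]
    }
}
```

Because the inner lambda is nested inside the flatMap lambda, it can still see e, so no intermediate Entry objects are needed.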
Adding my two cents to the excellent answer by @rzwitserloot, which already explains flatMap and map.
List<String> resultLists = myMap.entrySet().stream()
.flatMap(mapEntry -> printEntries(mapEntry.getKey(),mapEntry.getValue())).collect(Collectors.toList());
System.out.println(resultLists);
Splitting this to a separate method gives good readability IMO,
private static Stream<String> printEntries(String key, List<String> values) {
return values.stream().map(val -> String.format("%s-%s",key,val));
}

Java 8 Streams : Count the occurrence of elements(List<String> list1) from list of text data(List<String> list2)

Input :
List<String> elements= new ArrayList<>();
elements.add("Oranges");
elements.add("Figs");
elements.add("Mangoes");
elements.add("Apple");
List<String> listofComments = new ArrayList<>();
listofComments.add("Apples are better than Oranges");
listofComments.add("I love Mangoes and Oranges");
listofComments.add("I don't know like Figs. Mangoes are my favorites");
listofComments.add("I love Mangoes and Apples");
Output : [Mangoes, Apples, Oranges, Figs] -> Output must be in descending order of the number of occurrences of the elements. If elements appear equal no. of times then they must be arranged alphabetically.
I am new to Java 8 and came across this problem. I tried solving it partially; I couldn't sort it. Can anyone help me with a better code?
My piece of code:
Function<String, Map<String, Long>> function = f -> {
Long count = listofComments.stream()
.filter(e -> e.toLowerCase().contains(f.toLowerCase())).count();
Map<String, Long> map = new HashMap<>(); //creates map for every element. Is it right?
map.put(f, count);
return map;
};
elements.stream().sorted().map(function).forEach(e-> System.out.print(e));
Output: {Apple=2}{Figs=1}{Mangoes=3}{Oranges=2}
In real-life scenarios you would have to consider that applying an arbitrary number of match operations to an arbitrary number of comments can become quite expensive when the numbers grow, so it’s worth doing some preparation:
Map<String,Predicate<String>> filters = elements.stream()
.sorted(String.CASE_INSENSITIVE_ORDER)
.map(s -> Pattern.compile(s, Pattern.LITERAL|Pattern.CASE_INSENSITIVE))
.collect(Collectors.toMap(Pattern::pattern, Pattern::asPredicate,
(a,b) -> { throw new AssertionError("duplicates"); }, LinkedHashMap::new));
The Pattern class is quite valuable even when not doing regex matching. The combination of the LITERAL and CASE_INSENSITIVE flags enables searches with the intended semantics without the need to convert entire strings to lower case (which, by the way, is not sufficient for all possible scenarios). For this kind of matching, the preparation will include building the necessary data structure for the Boyer–Moore algorithm internally, for a more efficient search.
This map can be reused.
For your specific task, one way to use it would be
filters.entrySet().stream()
.map(e -> Map.entry(e.getKey(), listofComments.stream().filter(e.getValue()).count()))
.sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
.forEachOrdered(e -> System.out.printf("%-7s%3d%n", e.getKey(), e.getValue()));
which will print for your example data:
Mangoes 3
Apple 2
Oranges 2
Figs 1
Note that the filters map is already sorted alphabetically, and the sorted operation of the second stream is stable for streams with a defined encounter order. So it only needs to sort by occurrences; entries with equal counts keep their relative order, which is the alphabetical order from the source map.
Map.entry(…) requires Java 9 or newer. For Java 8, you’d have to use something like new AbstractMap.SimpleEntry<>(…) instead.
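As a Java 8-compatible sketch of the same idea: sort alphabetically first, then rely on the stable sort to order by count only. The class name RankElements and the helper countFor are hypothetical, and the result is returned as a list of element names rather than printed.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RankElements {

    // Pairs an element with the number of comments that mention it
    // (literal, case-insensitive substring match).
    private static Map.Entry<String, Long> countFor(String element, List<String> comments) {
        Predicate<String> mentions =
                Pattern.compile(element, Pattern.LITERAL | Pattern.CASE_INSENSITIVE).asPredicate();
        long count = comments.stream().filter(mentions).count();
        return new AbstractMap.SimpleEntry<>(element, count);
    }

    // Most frequently mentioned first; ties stay in alphabetical order because
    // sorted() is stable on a stream with a defined encounter order.
    public static List<String> rank(List<String> elements, List<String> comments) {
        return elements.stream()
                .sorted(String.CASE_INSENSITIVE_ORDER)
                .map(element -> countFor(element, comments))
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> elements = Arrays.asList("Oranges", "Figs", "Mangoes", "Apple");
        List<String> comments = Arrays.asList(
                "Apples are better than Oranges",
                "I love Mangoes and Oranges",
                "I don't know like Figs. Mangoes are my favorites",
                "I love Mangoes and Apples");
        System.out.println(rank(elements, comments)); // [Mangoes, Apple, Oranges, Figs]
    }
}
```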
You can still modify your function to store Map.Entry instead of a complete Map
Function<String, Map.Entry<String, Long>> function = f -> Map.entry(f, listOfComments.stream()
.filter(e -> e.toLowerCase().contains(f.toLowerCase())).count());
and then sort these entries before performing the terminal operation (forEach in your case, to print):
elements.stream()
.map(function)
.sorted(Comparator.comparing(Map.Entry<String, Long>::getValue)
.reversed().thenComparing(Map.Entry::getKey))
.forEach(System.out::println);
This will then give you as output the following:
Mangoes=3
Apple=2
Oranges=2
Figs=1
First thing is to declare an additional class. It'll hold element and count:
class ElementWithCount {
private final String element;
private final long count;
ElementWithCount(String element, long count) {
this.element = element;
this.count = count;
}
String element() {
return element;
}
long count() {
return count;
}
}
To compute count let's declare an additional function:
static long getElementCount(List<String> listOfComments, String element) {
return listOfComments.stream()
.filter(comment -> comment.contains(element))
.count();
}
So now to find the result we need to transform stream of elements to stream of ElementWithCount objects, then sort that stream by count, then transform it back to stream of elements and collect it into result list.
To make this task easier, let's define comparator as a separate variable:
Comparator<ElementWithCount> comparator = Comparator
.comparing(ElementWithCount::count).reversed()
.thenComparing(ElementWithCount::element);
and now as all parts are ready, final computation is easy:
List<String> result = elements.stream()
.map(element -> new ElementWithCount(element, getElementCount(listOfComments, element)))
.sorted(comparator)
.map(ElementWithCount::element)
.collect(Collectors.toList());
You can use Map.Entry instead of a separate class and inline getElementCount, so it'll be "one-line" solution:
List<String> result = elements.stream()
.map(element ->
new AbstractMap.SimpleImmutableEntry<>(element,
listOfComments.stream()
.filter(comment -> comment.contains(element))
.count()))
.sorted(Map.Entry.<String, Long>comparingByValue().reversed().thenComparing(Map.Entry.comparingByKey()))
.map(Map.Entry::getKey)
.collect(Collectors.toList());
But it's much harder to understand in this form, so I recommend splitting it into logical parts.

Stream group by multiple keys

I want to use streams in Java to group a long list of objects based on multiple fields. This will result in a map of map of map of map of map of .... of map of lists.
How can I only extract lists from that complex stream?
Here is some example code for demonstration (list of strings, looking for groups with same length and first letter). I'm not interested in keys, just in resulting grouped entities.
List<String> strings = ImmutableList.of("A", "AA", "AAA", "B", "BB", "BBB", "C", "CC", "CCC", "ABA", "BAB", "CAC");
Map<Character, Map<Integer, List<String>>> collect = strings.stream().collect(
groupingBy(s -> s.charAt(0),
groupingBy(String::length)
)
);
This will produce the following result:
My Map =
{
A =
{
1 = [A]
2 = [AA]
3 = [AAA, ABA]
}
B =
{
1 = [B]
2 = [BB]
3 = [BBB, BAB]
}
C =
{
1 = [C]
2 = [CC]
3 = [CCC, CAC]
}
}
What I'm interested in is actually just the lists from the above result, and ideally I want to get them as part of the groupingBy operation. I know it can be done, for example, by looping over the resulting map structure. But is there a way to achieve it using streams?
[
[A],
[AA],
[AAA, ABA],
[B],
[BB],
[BBB, BAB],
[C],
[CC],
[CCC, CAC]
]
Instead of creating nested groups by using cascaded Collectors.groupingBy, you should group by a composite key:
Map<List<Object>, List<String>> map = strings.stream()
.collect(Collectors.groupingBy(s -> Arrays.asList(s.charAt(0), s.length())));
Then, simply grab the map values:
List<List<String>> result = new ArrayList<>(map.values());
If you are on Java 9+, you might want to change from Arrays.asList to List.of to create the composite keys.
This approach works very well for your case because you stated that you are not interested in keeping the keys, and because the List implementations returned by both Arrays.asList and List.of have well-defined equals and hashCode methods, i.e. they can be safely used as keys in any Map.
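A runnable sketch of the composite-key approach, using plain JDK collections instead of Guava's ImmutableList. The class name CompositeKeyGrouping and the helper key are hypothetical; the LinkedHashMap factory is an optional addition that keeps the groups in first-seen order, and the strings are assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CompositeKeyGrouping {

    // Composite key: first character plus length, wrapped in a List so that
    // equals/hashCode take both components into account.
    private static List<Object> key(String s) {
        return Arrays.asList(s.charAt(0), s.length());
    }

    // Groups by the composite key and returns only the grouped lists.
    public static List<List<String>> groups(List<String> strings) {
        Map<List<Object>, List<String>> byKey = strings.stream()
                .collect(Collectors.groupingBy(
                        CompositeKeyGrouping::key,
                        LinkedHashMap::new,       // keep first-seen group order (optional)
                        Collectors.toList()));
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList(
                "A", "AA", "AAA", "B", "BB", "BBB", "C", "CC", "CCC", "ABA", "BAB", "CAC");
        System.out.println(groups(strings));
        // [[A], [AA], [AAA, ABA], [B], [BB], [BBB, BAB], [C], [CC], [CCC, CAC]]
    }
}
```

With the default HashMap-backed groupingBy the same lists are produced, but their order is unspecified.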
I want to use streams in java to group long list of objects based on multiple fields.
This is trickier than your (invalid) example code leads me to think you expect. Nevertheless, you can flatten a stream via its appropriately-named flatMap() method. For a stream such as you describe, you might need to flatten multiple times or to define a custom mapping method or a complex lambda to flatten all the way down to what you're after.
In the case of a Map of the form presented in the question, you might do something like this:
List<List<String>> result = myMap.values().stream()
.flatMap(m -> m.values().stream()) // as many of these as needed
.collect(Collectors.toList());
If you want to get List<List<String>> as in your example you can use :
List<List<String>> list = collect.entrySet().stream()
.flatMap(e -> e.getValue().entrySet().stream())
.map(Map.Entry::getValue)
.collect(Collectors.toList());
Also if you want to get single list of strings, you can add one more flatMap operation:
...
.flatMap(e -> e.getValue().entrySet().stream())
.flatMap(e -> e.getValue().stream())
...
As @John Bollinger mentioned, streaming over the values rather than the entries is simpler.

Java Stream collect - how to deduce type?

I have been given a stream of words, Stream<String> words, and a class Pair<String,Integer> which realizes a simple tuple for (someString, someInt) with getter and setter methods for both elements called getFirst,setFirst,getSecond,setSecond.
I am now supposed to box each word of the stream into a Pair (word, 1), and then use a Collector to somehow make the whole thing tell me how often each word is in the text. Now I've looked up a Collector that should let me do what I want to, and passed it as .collect(...) to the stream.
But the whole thing looks so complex, and the type inference, deduction, and wildcards floating around this topic aren't making it any easier, so I now have no clue what it is I've created.
I've tried deducing it from the API, and tried all the things I could come up with, but none of it seems to match:
words
.map(x -> new Pair<String,Integer>(x,1))
.collect(Collectors.groupingBy(
x -> x.getFirst(),
Collectors.reducing(
(a,b) -> new Pair<String,Integer>(a.getFirst(), a.getSecond() + b.getSecond())
)
));
Try using Collectors.toMap:
Collection<Pair<String, Integer>> values = words.collect(Collectors.toMap(
Function.identity(),
s -> new Pair<>(s, 1),
(a, b) -> {a.setSecond(a.getSecond() + b.getSecond()); return a;}
)).values();
It creates a map from your stream, using provided:
keyMapper - a mapping function to produce keys
valueMapper - a mapping function to produce values
mergeFunction - a merge function, used to resolve collisions between values associated with the same key
So it groups your Pairs by string value to a map, and then you just call .values() to get a collection of Pairs
The easiest (though not necessarily most efficient) solution would be to group to a map and then convert the entries to pairs:
List<Pair<String, Integer>> pairs = words
.collect(Collectors.groupingBy(x -> x, Collectors.summingInt(x -> 1)))
.entrySet()
.stream()
.map(e -> new Pair<>(e.getKey(), e.getValue()))
.collect(Collectors.toList());
I agree that entering the world of collectors can be a bit frightening at the beginning, particularly if you need to deal with generic type parameters.
There are many ways to solve your problem, both with and without streams.
With streams:
Map<String, Pair<String, Integer>> map = words
.collect(Collectors.toMap(
word -> word,
word -> new Pair<>(word, 1),
(o, n) -> {
o.setSecond(o.getSecond() + n.getSecond());
return o;
}));
Collection<Pair<String, Integer>> result = map.values();
Collectors.toMap works by transforming each element of the stream into the keys (this is the 1st argument word -> word, which means we leave the word as is, so that it will be the key of the map), and by transforming each element of the stream into the values (this is the 2nd argument word -> new Pair<>(word, 1), which means that we've found the word for the first time, so we're creating a new Pair instance for that word with a count of 1).
The 3rd argument is a merge function that is to be used to merge values when the 1st argument returns a key that already belongs to the map. As maps can't have more than one entry for the same key, we need a way to merge the value that is already in the map for that key, with the new value produced by the 2nd argument. In this case, o stands for the old value and n for the new value. The way I merge values is by summing the counts for the word and setting the new count in the Pair instance that corresponds to the old value. There's no need to create a new instance of Pair with the word and the new count, as it's safe to accumulate the count by mutating the old instance of Pair.
Without streams:
Map<String, Pair<String, Integer>> map = new HashMap<>();
words.forEach(word -> map.merge(
word,
new Pair<>(word, 1),
(o, n) -> {
o.setSecond(o.getSecond() + n.getSecond());
return o;
}));
Collection<Pair<String, Integer>> result = map.values();
This uses Map.merge and has similar semantics as the previous code.
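The toMap approach can be tried end to end with the sketch below. The nested Pair class is a hypothetical stand-in for the one described in the question (only the getter and setter names are taken from there), and the LinkedHashMap factory is an illustrative addition that keeps words in first-seen order.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordPairs {

    // Hypothetical stand-in for the Pair class described in the question.
    public static class Pair<A, B> {
        private A first;
        private B second;
        public Pair(A first, B second) { this.first = first; this.second = second; }
        public A getFirst() { return first; }
        public B getSecond() { return second; }
        public void setFirst(A first) { this.first = first; }
        public void setSecond(B second) { this.second = second; }
        @Override public String toString() { return "(" + first + ", " + second + ")"; }
    }

    // Boxes each word into a Pair (word, 1) and merges pairs that share a word.
    public static Collection<Pair<String, Integer>> count(Stream<String> words) {
        Map<String, Pair<String, Integer>> map = words.collect(Collectors.toMap(
                word -> word,                    // key: the word itself
                word -> new Pair<>(word, 1),     // value: first sighting counts as 1
                (o, n) -> {                      // same key again: add the counts
                    o.setSecond(o.getSecond() + n.getSecond());
                    return o;
                },
                LinkedHashMap::new));            // first-seen order (optional)
        return map.values();
    }

    public static void main(String[] args) {
        System.out.println(count(Stream.of("a", "b", "a"))); // [(a, 2), (b, 1)]
    }
}
```

Mutating the old Pair inside the merge function is fine for a sequential stream like this one; whether it suits a parallel stream is not something this sketch tries to settle.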

How to partition a list by predicate using java8?

I have a list a which I want to split into a few smaller lists,
say, all the items that contain "aaa", all that contain "bbb", and some more predicates.
How can I do so using Java 8?
I saw this post, but it only splits into 2 lists.
public void partition_list_java8() {
Predicate<String> startWithS = p -> p.toLowerCase().startsWith("s");
Map<Boolean, List<String>> decisionsByS = playerDecisions.stream()
.collect(Collectors.partitioningBy(startWithS));
logger.info(decisionsByS);
assertTrue(decisionsByS.get(Boolean.TRUE).size() == 3);
}
I saw this post, but it was very old, before java 8.
As explained in @RealSkeptic's comment, a Predicate can return only two results: true and false. This means you can split your data into only two groups.
What you need is some kind of Function which will allow you to determine some common result for elements which should be grouped together. In your case such result could be first character in its lowercase (assuming that all strings are not empty - have at least one character).
Now with Collectors.groupingBy(function) you can group all elements in separate Lists and store them in Map where key will be common result used for grouping (like first character).
So your code can look like
Function<String, Character> firstChar = s -> Character.toLowerCase(s.charAt(0));
List<String> a = Arrays.asList("foo", "Abc", "bar", "baz", "aBc");
Map<Character, List<String>> collect = a.stream()
.collect(Collectors.groupingBy(firstChar));
System.out.println(collect);
Output:
{a=[Abc, aBc], b=[bar, baz], f=[foo]}
You can use Collectors.groupingBy to turn your stream into a map of (grouping) -> (list of things in that grouping). If you don't care about the groupings themselves, call values() on that map to get a Collection<List<String>> of your partitions.
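For the "contains aaa / contains bbb" case from the question, one possible sketch is to give each predicate a label and group by the label of the first predicate that matches. The class name MultiPartition, the helper labelFor, the "other" bucket, and the first-match-wins rule are all assumptions, not something the question specifies.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class MultiPartition {

    // Returns the label of the first predicate that accepts the item, or "other".
    private static String labelFor(String item, Map<String, Predicate<String>> rules) {
        return rules.entrySet().stream()
                .filter(rule -> rule.getValue().test(item))
                .map(Map.Entry::getKey)
                .findFirst()
                .orElse("other");
    }

    // Splits the items into one list per label; each item lands in exactly one list.
    public static Map<String, List<String>> split(List<String> items) {
        Map<String, Predicate<String>> rules = new LinkedHashMap<>(); // checked in insertion order
        rules.put("aaa", s -> s.contains("aaa"));
        rules.put("bbb", s -> s.contains("bbb"));
        return items.stream()
                .collect(Collectors.groupingBy(
                        item -> labelFor(item, rules),
                        TreeMap::new,             // sorted labels, for predictable output
                        Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(split(Arrays.asList("xaaa", "bbby", "zzz", "aaabbb")));
        // {aaa=[xaaa, aaabbb], bbb=[bbby], other=[zzz]}
    }
}
```

If an item should appear in every group whose predicate it satisfies, this classifier-based shape no longer fits, since groupingBy assigns each element to a single key.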
