Collecting occurrences in a HashMap with streams - java

I have an exercise to solve. I have a Fox class with name and color fields. The exercise is to find the frequency of the foxes by color.
So I created a HashMap whose String key would be the fox name and whose Integer value would be the number of occurrences:
Map<String, Integer> freq = new HashMap<>();
Having done that, I have been trying to write the code with streams, but I am struggling to do that. I wrote something like this:
foxes.stream()
.map(Fox::getColor)
.forEach()
//...(continued later on);
, where foxes is a List.
My problem is basically with the syntax. I'd like to do something like this: if the color has no occurrences yet, then
freq.put(Fox::getName, 1)
else
freq.replace(Fox::getName, freq.get(Fox::getName) + 1)
How should I put it together?

I wouldn't suggest proceeding with your approach, simply because there is already a built-in collector for this, i.e. the groupingBy collector with counting() as the downstream collector:
Map<String, Long> result = foxes.stream()
.collect(Collectors.groupingBy(Fox::getName, Collectors.counting()));
This finds the frequency by "name". Likewise, you can get the frequency by colour by changing the classification function:
foxes.stream()
.collect(Collectors.groupingBy(Fox::getColor, Collectors.counting()));
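To make the above concrete, here is a minimal, self-contained sketch. The Fox class below is a hypothetical stand-in built from the two fields mentioned in the question, and the class and method names (FoxFrequency, countByColor, countByColorWithMerge) are made up for illustration. The second method shows how the put/replace logic from the question could be written with Map.merge, if a Map<String, Integer> is really required.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the asker's Fox class (fields taken from the question).
final class Fox {
    private final String name;
    private final String color;

    Fox(String name, String color) {
        this.name = name;
        this.color = color;
    }

    String getName() { return name; }
    String getColor() { return color; }
}

final class FoxFrequency {
    // Groups the foxes by color and counts the members of each group.
    static Map<String, Long> countByColor(List<Fox> foxes) {
        return foxes.stream()
                .collect(Collectors.groupingBy(Fox::getColor, Collectors.counting()));
    }

    // Same result with Integer counts, mirroring the "put 1, else add 1" idea.
    static Map<String, Integer> countByColorWithMerge(List<Fox> foxes) {
        Map<String, Integer> freq = new HashMap<>();
        // merge stores 1 for a new color, otherwise adds 1 to the existing count.
        foxes.forEach(fox -> freq.merge(fox.getColor(), 1, Integer::sum));
        return freq;
    }

    public static void main(String[] args) {
        List<Fox> foxes = Arrays.asList(
                new Fox("Foxy", "red"),
                new Fox("Swiper", "red"),
                new Fox("Snow", "white"));
        System.out.println(countByColor(foxes).get("red"));            // 2
        System.out.println(countByColorWithMerge(foxes).get("white")); // 1
    }
}
```

Note that counting() produces Long values, which is why the result type is Map<String, Long> rather than Map<String, Integer>.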

Related

Handling nested Collections with Java 8 streams

Lately I came across a problem while working with nested collections (values of Maps inside a List):
List<Map<String, Object>> items
This list in my case contains 10-20 Maps.
At some point I had to replace the value Calculation of the key description with Rating. So I came up with this solution:
items.forEach(e -> e.replace("description","Calculation","Rating"));
It would be a fine and efficient solution if every map in this list contained the key-value pair ["description", "Calculation"]. Unfortunately, I know that there will be only one such pair in the whole List<Map<String, Object>>.
The question is:
Is there a better (more efficient) way to find and replace this one value with Java 8 streams, instead of iterating through all the List elements?
Ideally it would be done in one stream, without any complex or obfuscating operations on it.
items.stream()
.filter(map -> map.containsKey("description"))
.findFirst()
.ifPresent(map -> map.replace("description", "Calculation", "Rating"));
You will have to iterate over the list until a map with the key "description" is found. Pick the first such map and try the replacement.
As pointed out by @Holger, if the key "description" isn't unique across the maps, but rather the pair ("description", "Calculation") is unique:
items.stream()
.anyMatch(m -> m.replace("description", "Calculation", "Rating"));
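A small sketch of why the anyMatch variant stops early: Map.replace(key, oldValue, newValue) returns true only when it actually replaced something, and anyMatch short-circuits on the first true. The class and method names (ReplaceOnce, replaceFirst) are invented for this example. Keep in mind that a predicate with side effects is generally discouraged in streams, so treat this as an illustration for a sequential stream rather than a recommended pattern.

```java
import java.util.List;
import java.util.Map;

final class ReplaceOnce {
    // Replaces the first ("description", "Calculation") pair with "Rating".
    // Returns true if a replacement happened. On a sequential stream, maps
    // after the first successful replacement are not visited.
    static boolean replaceFirst(List<Map<String, Object>> items) {
        return items.stream()
                .anyMatch(m -> m.replace("description", "Calculation", "Rating"));
    }
}
```

If two maps happened to contain the pair, only the first one in list order would be changed by a single call.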

Best Way / data structure to count occurrences of strings

Let's assume I have a very long list of strings. I want to count the number of occurrences of each string. I don't know how many strings there are or of what kind (meaning: I have no dictionary of all possible strings).
My first idea was to create a Map and to increase the integer every time I find the key again.
But this feels a bit clumsy. Is there a better way to count all occurrences of those strings?
Since Java 8, the easiest way is to use streams:
Map<String, Long> counts =
list.stream().collect(
Collectors.groupingBy(
Function.identity(), Collectors.counting()));
Prior to Java 8, your currently outlined approach works just fine. (And the Java 8+ way is doing basically the same thing too, just with a more concise syntax).
You can do it without streams too:
Map<String, Long> map = new HashMap<>();
list.forEach(x -> map.merge(x, 1L, Long::sum));
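For comparison, here is a runnable sketch that puts both variants side by side and checks that they agree. The class and method names (StringCounts, withStreams, withMerge) are made up for the example. merge(x, 1L, Long::sum) stores 1 when the key is absent and otherwise adds 1 to the stored value, which is the "increase the integer every time I find the key again" idea from the question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class StringCounts {
    // Stream version: group equal strings and count each group.
    static Map<String, Long> withStreams(List<String> list) {
        return list.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Loop version: one map lookup and update per element.
    static Map<String, Long> withMerge(List<String> list) {
        Map<String, Long> map = new HashMap<>();
        for (String s : list) {
            map.merge(s, 1L, Long::sum);
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c", "a", "a");
        System.out.println(withStreams(words).equals(withMerge(words))); // true
    }
}
```

With a plain Map, a string that never occurred yields null from get, so getOrDefault(key, 0L) is a convenient way to read counts.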
If you really want a specific data structure, you can always look towards Guava's Multiset.
Usage will be similar to this:
List<String> words = Arrays.asList("a b c a a".split(" "));
Multiset<String> wordCounts = words.stream()
.collect(toCollection(HashMultiset::create));
wordCounts.count("a"); // returns 3
wordCounts.count("b"); // returns 1
wordCounts.count("z"); // returns 0, no need to handle null!

Java - Repeated word count in the large file

I want to count the repeated words in the content of a large file. Is there a recommended approach using the Java 8 Stream API?
Updated Details
File format is comma separated values and the file size is around 4 GB
I don’t know if there’s a best approach, and it would also depend on the details you haven’t told us. For now I am assuming a text file with a number of words separated by spaces on each line. A possible approach would be:
Map<String, Long> result = Files.lines(filePath)
.flatMap(line -> Stream.of(line.split(" ")))
.collect(Collectors.groupingBy(word -> word, Collectors.counting()));
I think the splitting of each line into words needs to be refined; you will probably want to discard punctuation, for example. Take this as a starting point and develop it into what you need in your particular situation.
Edit: with thanks to @4castle for the inspiration, the splitting into words can be done in this way if you prefer a method reference over a lambda:
Map<String, Long> result = Files.lines(filePath)
.flatMap(Pattern.compile(" ")::splitAsStream)
.collect(Collectors.groupingBy(word -> word, Collectors.counting()));
It produces the same result. Edit2: nonsense about optimization deleted here.
Maybe we shouldn’t go too far here until we know the more exact requirement for delimiting words in each line.
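Since the update says the file is comma separated, here is a sketch adapted to that, under the assumption that fields are split on plain commas with no quoting or escaping (a real CSV parser would be needed otherwise). The names CsvWordCount, count and repeated are invented for this example. It also wraps Files.lines in try-with-resources, because the stream holds the file open until it is closed. Lines are read lazily, so the 4 GB file is not loaded at once, although the resulting map still needs memory proportional to the number of distinct words.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class CsvWordCount {
    // Assumption: plain comma delimiter, no quoted fields.
    private static final Pattern COMMA = Pattern.compile(",");

    // Counts every non-empty, trimmed field across all lines.
    static Map<String, Long> count(Stream<String> lines) {
        return lines
                .flatMap(COMMA::splitAsStream)
                .map(String::trim)
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
    }

    // Reads the file lazily and closes it when counting is done.
    static Map<String, Long> count(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return count(lines);
        }
    }

    // Keeps only the words that occur more than once.
    static Map<String, Long> repeated(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```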
If you already have a list of all the words, say List<String> words then you can use something like:
Map<String, Integer> counts = words.parallelStream().
collect(Collectors.toConcurrentMap(
w -> w, w -> 1, Integer::sum));
You can approach the same problem differently: count all the words in the file (including repeats), then collect the words into a Set (which does not allow duplicates) using a stream. The total word count minus the size of the set gives the number of repeated occurrences, that is, how many words are repeats of a word already seen.
Long totalWordCount = Files.lines(filePath)
.flatMap(line -> Stream.of(line.split(" "))).count();
Set<String> uniqueWords = Files.lines(filePath)
.flatMap(line -> Stream.of(line.split(" ")))
.collect(Collectors.toSet());
Long repetitiveWordCount = totalWordCount - (long) uniqueWords.size();
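The subtraction above reads the file twice. As a sketch of the same arithmetic in a single pass, the total and the number of distinct words can both be taken from one count map. The names RepetitionCount and repeatedOccurrences are made up, and a List<String> of words is assumed as input to keep the example short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class RepetitionCount {
    // total occurrences minus distinct words = occurrences that repeat an earlier word
    static long repeatedOccurrences(List<String> words) {
        Map<String, Long> counts = words.stream()
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
        long total = counts.values().stream().mapToLong(Long::longValue).sum();
        return total - counts.size();
    }

    public static void main(String[] args) {
        // 6 words in total, 3 distinct ones (a, b, c)
        List<String> words = Arrays.asList("a", "b", "a", "c", "a", "b");
        System.out.println(repeatedOccurrences(words)); // 3
    }
}
```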

Find most common/frequent element in an ArrayList in Java

I have an array list with 5 elements each of which is an Enum. I want to build a method which returns another array list with the most common element(s) in the list.
Example 1:
[Activities.WALKING, Activities.WALKING, Activities.WALKING, Activities.JOGGING, Activities.STANDING]
Method would return: [Activities.WALKING]
Example 2:
[Activities.WALKING, Activities.WALKING, Activities.JOGGING, Activities.JOGGING, Activities.STANDING]
Method would return: [Activities.WALKING, Activities.JOGGING]
WHAT HAVE I TRIED:
My idea was to declare a count for every activity but that means that if I want to add another activity, I have to go and modify the code to add another count for that activity.
Another idea was to declare a HashMap<Activities, Integer> and iterate the array to insert each activity and its occurrence count into it. But then how would I extract the Activities with the most occurrences?
Can you help me out guys?
The most common way of implementing something like this is counting with a Map: define a Map<MyEnum,Integer> that stores a zero for each element of your enumeration. Then walk through your list and increment the counter for each element you find. At the same time, keep track of the current max count. Finally, walk through the entries of the counter map and add to the output list the keys of all entries whose count matches the value of max.
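The steps above can be sketched as follows. The Activities enum here is a hypothetical stand-in with the three constants from the question, and MostCommon / mostCommon are invented names. As a small deviation from the description, the sketch uses merge instead of pre-filling the map with zeros, so activities that never occur simply have no entry.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical enum standing in for the asker's Activities type.
enum Activities { WALKING, JOGGING, STANDING }

final class MostCommon {
    static List<Activities> mostCommon(List<Activities> input) {
        Map<Activities, Integer> counts = new EnumMap<>(Activities.class);
        int max = 0;
        // Count each element and track the highest count seen so far.
        for (Activities activity : input) {
            int count = counts.merge(activity, 1, Integer::sum);
            max = Math.max(max, count);
        }
        // Collect every key whose count equals the maximum.
        List<Activities> result = new ArrayList<>();
        for (Map.Entry<Activities, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == max) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Because EnumMap iterates in declaration order, ties come back in the order the constants are declared, and adding a new constant to the enum needs no change to this method.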
In statistics, this is called the "mode" (in your specific case, "multi mode" is also used, as you want all values that appear most often, not just one). A vanilla Java 8 solution looks like this:
Map<Activities, Long> counts =
Stream.of(WALKING, WALKING, JOGGING, JOGGING, STANDING)
.collect(Collectors.groupingBy(s -> s, Collectors.counting()));
long max = Collections.max(counts.values());
List<Activities> result = counts
.entrySet()
.stream()
.filter(e -> e.getValue().longValue() == max)
.map(Entry::getKey)
.collect(Collectors.toList());
Which yields:
[WALKING, JOGGING]
jOOλ is a library that supports modeAll() on streams. The following program:
System.out.println(
Seq.of(WALKING, WALKING, JOGGING, JOGGING, STANDING)
.modeAll()
.toList()
);
Yields:
[WALKING, JOGGING]
(disclaimer: I work for the company behind jOOλ)

Data structure for storing word associations

I'm trying to implement prediction by analyzing sentences. Consider the following [rather boring] sentences
Call ABC
Call ABC again
Call DEF
I'd like to have a data structure for the above sentences as follows:
Call: (ABC, 2), (again, 1), (DEF, 1)
ABC: (Call, 2), (again, 1)
again: (Call, 1), (ABC, 1)
DEF: (Call, 1)
In general, Word: (Word_it_appears_with, Frequency), ....
Please note the inherent redundancy in this type of data. Obviously, if the frequency of ABC is 2 under Call, the frequency of Call is 2 under ABC. How do I optimize this?
The idea is to use this data when a new sentence is being typed. For example, if Call has been typed, from the data, it's easy to say ABC is more likely to be present in the sentence, and offer it as the first suggestion, followed by again and DEF.
I realise this is one of a million possible ways of implementing prediction, and I eagerly look forward to suggestions of other ways to do it.
Thanks
Maybe use a bidirectional graph. You can store the words as nodes and the frequencies as edge weights.
You can use the following data structure too:
Map<String, Map<String, Long>>
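As a rough sketch of how the nested map could be filled and queried, using the three sentences from the question: the class and method names (CoOccurrence, addSentence, frequency, suggestions) are made up, words are assumed to be separated by whitespace, and a word repeated within one sentence is counted once for that sentence. This version stores each pair in both directions, so it does not remove the redundancy the question mentions; it trades memory for simple lookups.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class CoOccurrence {
    // word -> (word it appears with -> number of shared sentences)
    private final Map<String, Map<String, Long>> freq = new HashMap<>();

    void addSentence(String sentence) {
        // De-duplicate the words of one sentence, keeping their order.
        Set<String> words = new LinkedHashSet<>(Arrays.asList(sentence.trim().split("\\s+")));
        for (String word : words) {
            for (String other : words) {
                if (!word.equals(other)) {
                    // LinkedHashMap keeps first-seen order, used for tie-breaking below.
                    freq.computeIfAbsent(word, k -> new LinkedHashMap<>())
                        .merge(other, 1L, Long::sum);
                }
            }
        }
    }

    long frequency(String word, String other) {
        return freq.getOrDefault(word, Collections.<String, Long>emptyMap())
                   .getOrDefault(other, 0L);
    }

    // Associated words, most frequent first; ties keep first-seen order.
    List<String> suggestions(String word) {
        return freq.getOrDefault(word, Collections.<String, Long>emptyMap())
                .entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

After adding "Call ABC", "Call ABC again" and "Call DEF", suggestions("Call") gives ABC first, followed by again and DEF, matching the ordering described in the question.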
I would consider one of two options:
Option 1:
class Freq {
String otherWord;
int freq;
}
Multimap<String, Freq> mymap;
or maybe a Table
Table<String, String, Integer>
Given the above Freq: you might want to do bi-directional mapping:
class Freq{
String thisWord;
int otherFreq;
Freq otherWord;
}
This would allow for very quick updating of data pairs.
