Let's assume I have a very long list of strings. I want to count the number of occurrences of each string. I don't know in advance how many distinct strings there are or what they look like (that is: I have no dictionary of all possible strings).
My first idea was to create a Map<String, Integer> and to increment the integer every time I find the key again.
But this feels a bit clumsy. Is there a better way to count all occurrences of those strings?
Since Java 8, the easiest way is to use streams:
Map<String, Long> counts =
    list.stream().collect(
        Collectors.groupingBy(
            Function.identity(), Collectors.counting()));
Prior to Java 8, the approach you outlined works just fine. (The Java 8+ way does essentially the same thing, just with more concise syntax.)
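For reference, a minimal sketch of that pre-Java-8 loop (the class and method names here are just for illustration):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountLoop {
    // Classic pre-Java-8 counting: look up the current count,
    // defaulting to 0 when the key has not been seen yet.
    static Map<String, Integer> count(List<String> list) {
        Map<String, Integer> counts = new HashMap<>();
        for (String s : list) {
            Integer current = counts.get(s);
            counts.put(s, current == null ? 1 : current + 1);
        }
        return counts;
    }
}
```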
You can do it without streams too:
Map<String, Long> map = new HashMap<>();
list.forEach(x -> map.merge(x, 1L, Long::sum));
If you really want a dedicated data structure, you can always look towards Guava's Multiset.
Usage will be similar to this:
// assumes: import static java.util.stream.Collectors.toCollection;
List<String> words = Arrays.asList("a b c a a".split(" "));
Multiset<String> wordCounts = words.stream()
    .collect(toCollection(HashMultiset::create));

wordCounts.count("a"); // returns 3
wordCounts.count("b"); // returns 1
wordCounts.count("z"); // returns 0, no need to handle null!
I am new to functional programming, and I am trying to get better.
Currently, I am experimenting with some code that takes on the following basic form:
private static int myMethod(List<Integer> input) {
    Map<Integer, Long> freq = input
        .stream()
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

    return (int) freq
        .keySet()
        .stream()
        .filter(key -> freq.containsKey(freq.get(key)))
        .count();
}
First, a HashMap is used to get the frequency of each number in the list. Next, we count the keys whose frequency values also exist as keys in the map.
What I don't like is how the two streams need to exist apart from one another, where a HashMap is made from a stream only to be instantly and exclusively consumed by another stream.
Is there a way to combine this into one stream? I was thinking something like this:
private static int myMethod(List<Integer> input) {
    return (int) input
        .stream()
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
        .keySet()
        .stream()
        .filter(key -> freq.containsKey(freq.get(key)))
        .count();
}
but the problem here is that there is no freq map to reference, since the map is consumed as part of the pipeline, so the filter cannot do what it needs to do.
In summary, I don't like that this collects to a HashMap only to convert it straight back into a key set. Is there a way to "streamline" (pun intended) this operation so that it:
does not go back and forth between stream and HashMap, and
references itself without needing to declare a separate map before the pipeline?
Thank you!
Your keySet is effectively nothing but a HashSet formed from your input. So you can compute it up front as temporary storage:
Set<Integer> freq = new HashSet<>(input);
and then count and filter based on the values in a single stream pipeline:
return (int) input
    .stream()
    .collect(Collectors.groupingBy(Function.identity(),
        Collectors.counting()))
    .values() // just using the frequencies evaluated
    .stream()
    .filter(count -> freq.contains(count.intValue()))
    .count();
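Putting the pieces together, a self-contained sketch of that approach (class and method names are illustrative):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FreqKeys {
    // Count the entries whose frequency is itself an element of the input.
    static int myMethod(List<Integer> input) {
        Set<Integer> freq = new HashSet<>(input); // the keySet, computed up front
        return (int) input.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .values()   // just the frequencies
                .stream()
                .filter(count -> freq.contains(count.intValue()))
                .count();
    }
}
```

For example, for the input [1, 1, 2, 3] the frequencies are {1=2, 2=1, 3=1}; all three frequency values (2, 1, 1) occur in the input, so the result is 3.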
I am trying to come up with an efficient way to select a discrete but arbitrary range of key-value pairs from a HashMap. This is easy in Python, but seems difficult in Java. I was hoping to avoid using iterators, since they seem slow for this application (correct me if I'm wrong).
For example, I'd like to be able to make the following call:
ArrayList<Pair<K, V>> values = pairsFromRange(hashMap, 0, 5);
You can't do anything that performs meaningfully better than an Iterator to do this with a HashMap.
If you use a TreeMap, however, this becomes easy: use subMap(0, 5) or the like.
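A sketch of the TreeMap variant (note that subMap selects by key range, not by position; the class name here is just for illustration):

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class SubMapDemo {
    // subMap returns a live view of the entries whose keys fall in [from, to)
    static SortedMap<Integer, String> range(TreeMap<Integer, String> map, int from, int to) {
        return map.subMap(from, to);
    }
}
```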
This looks straightforward with lambdas (it implies iteration, of course). skip(n) and limit(n) should let you address any slice of the map, bearing in mind that a HashMap has no defined iteration order.
Map<String, String> m = new HashMap<>();
m.put("k1","v");
m.put("k2","v");
m.put("k3","v");
m.put("k4","v");
m.put("k5","v");
Map<String, String> slice = m.entrySet().stream()
    .limit(3)
    .collect(Collectors.toMap(x -> x.getKey(), x -> x.getValue()));
System.out.println(slice);
slice ==> {k1=v, k2=v, k3=v}
slice = m.entrySet().stream()
    .skip(2)
    .limit(3)
    .collect(Collectors.toMap(x -> x.getKey(), x -> x.getValue()));
System.out.println(slice);
slice ==> {k3=v, k4=v, k5=v}
Starting with a map like:
Map<Integer, String> mapList = new HashMap<>();
mapList.put(2,"b");
mapList.put(4,"d");
mapList.put(3,"c");
mapList.put(5,"e");
mapList.put(1,"a");
mapList.put(6,"f");
I can sort the map using Streams like:
mapList.entrySet()
    .stream()
    .sorted(Map.Entry.<Integer, String>comparingByKey())
    .forEach(System.out::println);
But I need to get a list (and a String) of the corresponding sorted values (that would be: a b c d e f), matching the sorted keys 1 2 3 4 5 6.
I cannot find a way to do it in that stream pipeline.
Thanks
As @MA says in his comment, I need a mapping step, and that is not explained in this question: How to convert a Map to List in Java?
So thank you very much @MA
Sometimes people are too fast into closing questions!
You can use a mapping collector:
var sortedValues = mapList.entrySet()
    .stream()
    .sorted(Map.Entry.comparingByKey())
    .collect(Collectors.mapping(Map.Entry::getValue, Collectors.toList()));
You could also use some of the different collection classes instead of streams:
List<String> list = new ArrayList<>(new TreeMap<>(mapList).values());
The downside is that if you do all of that in a single line it can get quite messy, quite fast. Additionally, you're throwing away the intermediate TreeMap, which exists just for the sorting.
If you want to sort on the keys and collect only the values, you need a mapping step to preserve just the values after the sorting. Afterwards you can collect, or use a forEach loop.
mapList.entrySet()
    .stream()
    .sorted(Map.Entry.comparingByKey())
    .map(Map.Entry::getValue)
    .collect(Collectors.toList());
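Since the question also asks for a String of the sorted values, joining works the same way; a sketch (the class and method names are illustrative):

```java
import java.util.Map;
import java.util.stream.Collectors;

public class SortedJoin {
    // Sort by key, keep only the values, then join them with spaces.
    static String joined(Map<Integer, String> map) {
        return map.entrySet().stream()
                .sorted(Map.Entry.comparingByKey())
                .map(Map.Entry::getValue)
                .collect(Collectors.joining(" "));
    }
}
```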
I've got an exercise to solve. I have a Fox class, which has name and color fields. My exercise is to find the frequency of the foxes by color.
Thus I've created a HashMap, where the String key would be the fox name and the Integer would be the occurrence count itself:
Map<String, Integer> freq = new HashMap<>();
Having done that, I have been trying to write the code with streams, but I am struggling. I wrote something like this:
foxes.stream()
    .map(Fox::getColor)
    .forEach(/* ...continued later on */);
where foxes is a List.
My problem is basically with the syntax. I'd like to do something like: if the color has no occurrences yet, then
freq.put(Fox::getName, 1)
else
freq.replace(Fox::getName, freq.get(Fox::getName) + 1)
How should I put it together?
I wouldn't suggest proceeding with your approach, simply because there is already a built-in collector for this: the groupingBy collector with counting() as the downstream:
Map<String, Long> result = foxes.stream()
    .collect(Collectors.groupingBy(Fox::getName, Collectors.counting()));
This finds the frequency by "name"; likewise, you can get the frequency by colour by changing the classification function:
foxes.stream()
    .collect(Collectors.groupingBy(Fox::getColor, Collectors.counting()));
I want to find the repeated word count in a large file's content. Is there a good approach using the Java 8 stream API?
Updated details
The file format is comma-separated values and the file size is around 4 GB.
I don’t know if there’s a single best approach, and it would also depend on details you haven’t told us. For now I am assuming a text file with a number of words separated by spaces on each line. A possible approach would be:
Map<String, Long> result = Files.lines(filePath)
    .flatMap(line -> Stream.of(line.split(" ")))
    .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
I think the splitting of each line into words needs to be refined; you will probably want to discard punctuation, for example. Take this as a starting point and develop it into what you need in your particular situation.
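As one possible refinement, you could split on any run of non-letter characters, which discards punctuation; a sketch over an in-memory list of lines (the regex and the lower-casing are just one choice; adjust for your data):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WordSplit {
    private static final Pattern NON_LETTERS = Pattern.compile("[^\\p{L}]+");

    // Split each line on runs of non-letter characters, drop empty tokens,
    // normalize case, then count occurrences per word.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(NON_LETTERS::splitAsStream)
                .filter(w -> !w.isEmpty())
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```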
Edit: with thanks to @4castle for the inspiration, the splitting into words can be done this way if you prefer a method reference over a lambda:
Map<String, Long> result = Files.lines(filePath)
    .flatMap(Pattern.compile(" ")::splitAsStream)
    .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
It produces the same result. Edit 2: nonsense about optimization deleted here.
Maybe we shouldn’t go too far here until we know the more exact requirement for delimiting words in each line.
If you already have a list of all the words, say List<String> words, then you can use something like:
Map<String, Integer> counts = words.parallelStream()
    .collect(Collectors.toConcurrentMap(
        w -> w, w -> 1, Integer::sum));
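A quick self-contained check of that parallel variant (the class name and sample data are made up for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ParallelCount {
    // toConcurrentMap collects into one shared concurrent map;
    // Integer::sum merges counts produced by different threads for the same word.
    static Map<String, Integer> count(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.toConcurrentMap(w -> w, w -> 1, Integer::sum));
    }
}
```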
You can perform the same operation in a different way: first count the total number of words in the file (all words, including repeated ones). Then add all the words to a Set (which does not allow duplicates) using a stream. The repeated word count is then the total word count minus the size of the set.
long totalWordCount = Files.lines(filePath)
    .flatMap(line -> Stream.of(line.split(" ")))
    .count();

Set<String> uniqueWords = Files.lines(filePath)
    .flatMap(line -> Stream.of(line.split(" ")))
    .collect(Collectors.toSet());

long repetitiveWordCount = totalWordCount - uniqueWords.size();
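That difference (total minus distinct) is the sum of each word's extra occurrences beyond its first, so it can also be computed in a single pass over the data; a sketch over an in-memory list (file reading omitted, names illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class RepeatCount {
    // One pass: count frequencies, then sum (count - 1) over all words,
    // i.e. every occurrence beyond the first is a "repeat".
    static long repetitiveWordCount(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .values().stream()
                .mapToLong(c -> c - 1)
                .sum();
    }
}
```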