Java count word frequency using stream


Hey, I need to count the frequency of words and return a string listing them. I have to omit words that have fewer than 4 characters and words that occur fewer than 10 times. The results must be ordered from highest to lowest count, and alphabetically when counts are equal.
Here's the code:
import java.util.*;
import java.util.stream.*;
public class Words {
public String countWords(List<String> lines) {
String text = lines.toString();
String[] words = text.split("(?U)\\W+");
Map<String, Long> freq = Arrays.stream(words).sorted()
.collect(Collectors.groupingBy(String::toLowerCase,
Collectors.counting()));
LinkedHashMap<String, Long> freqSorted = freq.entrySet().stream()
.filter(x -> x.getKey().length() > 3)
.filter(y -> y.getValue() > 9)
.sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
.collect(Collectors.toMap(Map.Entry::getKey,
Map.Entry::getValue, (oldValue, newValue) -> oldValue,
LinkedHashMap::new));
return freqSorted.keySet().stream()
.map(key -> key + " - " + freqSorted.get(key))
.collect(Collectors.joining("\n", "", ""));
}
}
I can't change the argument of this method. I'm having trouble sorting alphabetically after sorting by value: I tried using thenComparing but couldn't make it work. Aside from that, I'd appreciate any feedback on how to reduce the number of lines so I don't have to stream three times.

Another approach, which does it in one go without collecting into intermediate maps, is to wrap your grouping collector in collectingAndThen, where you can format the final result:
public String countWords(List<String> lines) {
String text = lines.toString();
String[] words = text.split("(?U)\\W+");
return Arrays.stream(words)
.filter(s -> s.length() > 3)
.collect(Collectors.collectingAndThen(
Collectors.groupingBy(String::toLowerCase, Collectors.counting()),
map -> map.entrySet()
.stream()
.filter(e -> e.getValue() > 9)
.sorted(Map.Entry.<String, Long>comparingByValue().reversed()
.thenComparing(Map.Entry.comparingByKey()))
.map(e -> String.format("%s - %d", e.getKey(), e.getValue()))
.collect(Collectors.joining(System.lineSeparator()))));
}
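Here is a self-contained sketch of the collectingAndThen idea above. The class name WordFrequencySketch and the minLength and minCount parameters are assumptions added for illustration (the question hard-codes 4 and 10), and the lines are split one by one rather than through lines.toString().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class WordFrequencySketch {

    // minLength and minCount are hypothetical parameters; the question hard-codes 4 and 10.
    static String countWords(List<String> lines, int minLength, long minCount) {
        return lines.stream()
                // split each line on runs of non-word characters (Unicode-aware)
                .flatMap(line -> Arrays.stream(line.split("(?U)\\W+")))
                // drops short words and the empty string a leading separator can produce
                .filter(word -> word.length() >= minLength)
                .collect(Collectors.collectingAndThen(
                        Collectors.groupingBy(String::toLowerCase, Collectors.counting()),
                        counts -> counts.entrySet().stream()
                                .filter(e -> e.getValue() >= minCount)
                                // highest count first, ties broken alphabetically
                                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                                        .thenComparing(Map.Entry.comparingByKey()))
                                .map(e -> e.getKey() + " - " + e.getValue())
                                .collect(Collectors.joining("\n"))));
    }
}
```

With the thresholds from the question, the call would be countWords(lines, 4, 10).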

Here is one approach. I am using your frequency count map as the source.
First, define a comparator.
Then stream the existing map's entries and sort them with it.
toMap takes a key mapper, a value mapper, a merge function, and a LinkedHashMap supplier so the sorted order is preserved.
Comparator<Entry<String, Long>> comp =
Entry.comparingByValue(Comparator.reverseOrder());
comp = comp.thenComparing(Entry.comparingByKey());
Map<String, Long> freqSorted = freq.entrySet().stream()
.filter(x -> x.getKey().length() > 3
&& x.getValue() > 9)
.sorted(comp)
.collect(Collectors.toMap(Entry::getKey,
Entry::getValue, (a, b) -> a,
LinkedHashMap::new));
Notes:
To verify that the sorting is correct, you can comment out the filter and use fewer words.
You do not need to sort the initial stream of words when preparing the frequency count, as the entries are sorted when the final map is built.
The merge function is syntactically required but never called, since there are no duplicate keys.
I chose not to use TreeMap because once the stream is sorted, there is no need to sort again.

The problem should be your LinkedHashMap, because it only keeps insertion order and does not sort its entries itself. You can try using a TreeMap, which keeps its entries sorted by key.
And I think you shouldn't focus on getting as few lines as possible; instead, try to make the code as readable as possible for the future. What you have is fine, because you split the streams into logical parts: counting, sorting, and joining!
To swap to TreeMap, just change the variable type and the map supplier passed to the collector.
It would look like this:
import java.util.*;
import java.util.stream.*;
public class Words {
public String countWords(List<String> lines) {
String text = lines.toString();
String[] words = text.split("(?U)\\W+");
Map<String, Long> freq = Arrays.stream(words).sorted()
.collect(Collectors.groupingBy(String::toLowerCase,
Collectors.counting()));
TreeMap<String, Long> freqSorted = freq.entrySet().stream()
.filter(x -> x.getKey().length() > 3)
.filter(y -> y.getValue() > 9)
.sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
.collect(Collectors.toMap(Map.Entry::getKey,
Map.Entry::getValue, (oldValue, newValue) -> oldValue,
TreeMap::new));
return freqSorted.keySet().stream()
.map(key -> key + " - " + freqSorted.get(key))
.collect(Collectors.joining("\n", "", ""));
}
}

Related

To use streams to find keys in a list with max occurrence

We have a list:
List<String> strList = Arrays.asList("10.0 string1", "10.3 string2", "10.0 string3", "10.4 string4","10.3 string5");
Each entry is a string made of two strings separated by a space.
The objective is to find all the keys with the maximum number of occurrences (i.e. 10.0 and 10.3, with 2 occurrences each).
The following code works. The question is: could these 3 statements be reduced to 1, or at least 2?
var map2 = strList.stream()
.map(m -> {String[] parts = m.split(" "); return parts[0];})
.collect((Collectors.groupingBy(Function.identity(),LinkedHashMap::new, Collectors.counting())));
var max3 = map2.entrySet().stream()
.max(Map.Entry.comparingByValue())
.get()
.getValue();
var listOfMax2 = map2.entrySet().stream()
.filter(entry -> entry.getValue().equals(max3))
.map(Map.Entry::getKey)
.collect(Collectors.toList());
System.out.println(listOfMax2);

The code you have is pretty straightforward if you change the names of your variables to something meaningful. You could write a custom collector, but I doubt it is worth the effort or would make your code much more readable. The easiest solution I can think of, if you insist on chaining your stream, is to first build the frequency map, then invert it to use the values (frequencies) as keys and the keys as values, collect into a TreeMap, which is sorted by key, and get the last entry:
List<String> strList = Arrays.asList("10.0 string1", "10.3 string2", "10.0 string3", "10.4 string4", "10.3 string5");
var mostFrequentEntries =
strList.stream()
.map(s -> s.substring(0, s.indexOf(' ')))
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet()
.stream()
.collect(Collectors.groupingBy(Map.Entry::getValue, TreeMap::new, Collectors.mapping(Map.Entry::getKey, Collectors.toList())))
.lastEntry().getValue();
System.out.println(mostFrequentEntries);
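The inverted-TreeMap idea above can be sketched as a small self-contained method. The class and method names are made up for illustration. Two things are assumptions on my side and differ from the snippet above: the first grouping also uses a TreeMap so the keys in the result come out in alphabetical order, and empty input is guarded because lastEntry() returns null on an empty TreeMap.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class MostFrequentKeysSketch {

    // Returns every prefix (the text before the first space) that occurs most often.
    static List<String> mostFrequentPrefixes(List<String> entries) {
        TreeMap<Long, List<String>> byCount = entries.stream()
                // limit 2: only the first space matters; a string without a space is kept whole
                .map(s -> s.split(" ", 2)[0])
                // prefix -> count, sorted by prefix
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()))
                .entrySet().stream()
                // invert: count -> prefixes with that count, sorted by count
                .collect(Collectors.groupingBy(Map.Entry::getValue, TreeMap::new,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
        // lastEntry() holds the highest count; it is null when the input was empty
        return byCount.isEmpty() ? Collections.<String>emptyList() : byCount.lastEntry().getValue();
    }
}
```

For the list from the question this returns 10.0 and 10.3.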

The simplest way I know is to start with a frequency count of the targeted values and return the maximum count and the map in a data structure for subsequent processing.
Here is some data (added to yours for the demo):
List<String> strList = Arrays.asList("10.0 string1",
"10.0 string2", "10.3 string3", "10.0 string4",
"10.3 string5", "10.4 string6", "10.3 string7",
"10.4 string8", "10.5 string9", "10.6 string10");
First, stream the list and create a map based on frequency. This is done by using toMap and incrementing the count for duplicate keys.
Then use collectingAndThen to find the maximum count among the map's values, and return the count and the map in a SimpleEntry data structure.
Entry<Integer,Map<String,Integer>> result = strList.stream()
.collect(Collectors.collectingAndThen(
Collectors.toMap(str -> str.split("\\s+", 2)[0], s -> 1, Integer::sum),
m -> new SimpleEntry<>(
m.isEmpty() ? 0 : Collections.max(m.values()),m)));
Now, using the returned map and the maximum count, print all the keys that have the same count.
int max = result.getKey();
result.getValue().forEach((k,v)-> {
if (v == max) {
System.out.println(k);
}
});
prints
10.3
10.0
Thanks to Holger for making some suggestions regarding Collections.max and the two argument version of String.split().

Java streams average

I need to create two methods using streams. A method that returns an average score of each task.
public Map<String, Double> averageScoresPerTask(Stream<CourseResult> results) {}
and a method that returns a task with the highest average score.
public String easiestTask(Stream<CourseResult> results) {}
I can only modify those 2 methods.
Here is CourseResult class
public class CourseResult {
private final Person person;
private final Map<String, Integer> taskResults;
public CourseResult(final Person person, final Map<String, Integer> taskResults) {
this.person = person;
this.taskResults = taskResults;
}
public Person getPerson() {
return person;
}
public Map<String, Integer> getTaskResults() {
return taskResults;
}
}
And methods that create CourseResult objects.
private final String[] programTasks = {"Lab 1. Figures", "Lab 2. War and Peace", "Lab 3. File Tree"};
private final String[] practicalHistoryTasks = {"Shieldwalling", "Phalanxing", "Wedging", "Tercioing"};
private Stream<CourseResult> programmingResults(final Random random) {
int n = random.nextInt(names.length);
int l = random.nextInt(lastNames.length);
return IntStream.iterate(0, i -> i + 1)
.limit(3)
.mapToObj(i -> new Person(
names[(n + i) % names.length],
lastNames[(l + i) % lastNames.length],
18 + random.nextInt(20)))
.map(p -> new CourseResult(p, Arrays.stream(programTasks).collect(toMap(
task -> task,
task -> random.nextInt(51) + 50))));
}
private Stream<CourseResult> historyResults(final Random random) {
int n = random.nextInt(names.length);
int l = random.nextInt(lastNames.length);
AtomicInteger t = new AtomicInteger(practicalHistoryTasks.length);
return IntStream.iterate(0, i -> i + 1)
.limit(3)
.mapToObj(i -> new Person(
names[(n + i) % names.length],
lastNames[(l + i) % lastNames.length],
18 + random.nextInt(20)))
.map(p -> new CourseResult(p,
IntStream.iterate(t.getAndIncrement(), i -> t.getAndIncrement())
.map(i -> i % practicalHistoryTasks.length)
.mapToObj(i -> practicalHistoryTasks[i])
.limit(3)
.collect(toMap(
task -> task,
task -> random.nextInt(51) + 50))));
}
Based on these methods I can calculate the average of each task by dividing the sum of its scores by 3, because there are only 3 Persons, though I could instead divide by the number of CourseResult objects in the stream in case the .limit(3) in these methods is changed.
I don't know how to access the keys of the taskResults map. I think I need them in order to return a map of unique keys, where the value for each key is the average of the values assigned to that key across the taskResults maps.

For your first question: map each CourseResult to its taskResults, flatMap to get all entries of each taskResults map from all CourseResults, group by map key (task name), and collect by averaging the values for the same key:
public Map<String, Double> averageScoresPerTask(Stream<CourseResult> results) {
return results.map(CourseResult::getTaskResults)
.flatMap(m -> m.entrySet().stream())
.collect(Collectors.groupingBy(Map.Entry::getKey, Collectors.averagingInt(Map.Entry::getValue)));
}
You can use the same approach for your second question: calculate the average for each task and finally stream over the entries of the resulting map to find the task with the highest average.
public String easiestTask(Stream<CourseResult> results) {
return results.map(CourseResult::getTaskResults)
.flatMap(m -> m.entrySet().stream())
.collect(Collectors.groupingBy(Map.Entry::getKey, Collectors.averagingInt(Map.Entry::getValue)))
.entrySet().stream()
.max(Map.Entry.comparingByValue())
.map(Map.Entry::getKey)
.orElse("No easy task found");
}
To avoid code duplication you can call the first method within the second:
public String easiestTask(Stream<CourseResult> results) {
return averageScoresPerTask(results).entrySet()
.stream()
.max(Map.Entry.comparingByValue())
.map(Map.Entry::getKey)
.orElse("No easy task found");
}
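To try the two methods above in isolation, here is a minimal sketch. The nested Result class is a hypothetical stand-in for CourseResult that only carries the task-to-score map (Person is left out), and the methods are static only to keep the example short.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TaskAverageSketch {

    // Hypothetical stand-in for CourseResult: only the task-to-score map matters here.
    static final class Result {
        private final Map<String, Integer> taskResults;

        Result(Map<String, Integer> taskResults) {
            this.taskResults = taskResults;
        }

        Map<String, Integer> getTaskResults() {
            return taskResults;
        }
    }

    static Map<String, Double> averageScoresPerTask(Stream<Result> results) {
        return results
                // one stream of (task, score) entries across all results
                .flatMap(r -> r.getTaskResults().entrySet().stream())
                // group by task name and average the scores of each group
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.averagingInt(Map.Entry::getValue)));
    }

    static String easiestTask(Stream<Result> results) {
        return averageScoresPerTask(results).entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse("No easy task found");
    }
}
```

Each call needs its own stream, since a stream cannot be consumed twice.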
EDIT
To customize the calculation of the average regardless of how many items your maps contain, don't use the built-in collectors like Collectors.averagingInt or Collectors.averagingDouble. Instead, wrap your collector in collectingAndThen, sum the scores using Collectors.summingInt, and after collecting divide by a divisor chosen according to whether the task name starts with Lab or not:
public Map<String, Double> averageScoresPerTask(Stream<CourseResult> results) {
return results.map(CourseResult::getTaskResults)
.flatMap(m -> m.entrySet().stream())
.collect(Collectors.collectingAndThen(
Collectors.groupingBy(Map.Entry::getKey, Collectors.summingInt(Map.Entry::getValue)),
map -> map.entrySet()
.stream()
.collect(Collectors.toMap(
Map.Entry::getKey,
e -> e.getKey().startsWith("Lab") ? e.getValue() / 3. : e.getValue() / 4.))
));
}

To create a map containing the average score for each task, you need to flatten the taskResults map of every CourseResult object in the stream and group the data by key (i.e. by task name).
For that you can use the groupingBy() collector. As its downstream collector, which is responsible for calculating the average of the scores mapped to the same task, you can use averagingDouble().
Here's how it might look:
public Map<String, Double> averageScoresPerTask(Stream<CourseResult> results) {
return results
.map(CourseResult::getTaskResults) // Stream<Map<String, Integer>> - stream of maps
.flatMap(map -> map.entrySet().stream()) // Stream<Map.Entry<String, Integer>> - stream of entries
.collect(Collectors.groupingBy(
Map.Entry::getKey,
Collectors.averagingDouble(Map.Entry::getValue)
));
}
To find the easiest task, you can pass this map instead of the stream, because the method would otherwise have to repeat the same operations. That makes sense in a real-life scenario where the data is retrieved from somewhere (it is better to avoid processing it twice). Moreover, in your case you can't generate a stream from the source twice and pass it to both methods, because the stream data is random. Passing the same stream to both methods is not an option either: a stream pipeline can be executed only once, and after it hits a terminal operation it can't be used again.
public String easiestTask(Map<String, Double> averageByTask) {
return averageByTask.entrySet().stream()
.max(Map.Entry.comparingByValue()) // produces a result of type Optional<Map.Entry<String, Double>>
.map(Map.Entry::getKey) // transforms it into Optional<String>
.orElse("no data"); // or orElseThrow() if data is always expected to be present, depending on your needs
}

Method to calculate the most frequent last name from list of given users with Java Stream API

The function should return an Optional of the most frequent last name (if it occurs at least two times), or Optional.empty if several last names share the highest count or the list of users is empty.
This is what I came up with, but it doesn't return Optional.empty:
@Override
public Optional<String> getMostFrequentLastName(final List<User> users) {
return users.stream()
.map(User::getLastName)
.distinct()
.collect
(Collectors.groupingBy(
Function.identity(),
Collectors.summingInt(w -> 1)
))
.entrySet()
.stream()
.filter(stringIntegerEntry -> stringIntegerEntry.getValue() >= 2)
.sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
.map(Map.Entry::getKey)
.findFirst();
}
This is my test class
public static void main(String[] args) {
Optional<String> optionalS = Stream.of(new User("name1"),
new User("name1"), new User("name2"), new User("name2"))
.map(User::getLastName)
.collect
(Collectors.groupingBy(
Function.identity(),
Collectors.counting()
))
.entrySet()
.stream()
.filter(stringIntegerEntry -> stringIntegerEntry.getValue() >= 2)
.sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
.map(Map.Entry::getKey)
.findFirst();
System.out.println(optionalS.toString());
}
Here is the output:
Optional[name2]
But it should be
Optional.empty

You may use
Optional<String> optionalS =
Stream.of(new User("name1"), new User("name1"), new User("name2"), new User("name2"))
.collect(Collectors.groupingBy(User::getLastName, Collectors.counting()))
.entrySet()
.stream()
.filter(entry -> entry.getValue() >= 2)
.reduce((e1, e2) -> e1.getValue() < e2.getValue()? e2:
e1.getValue() > e2.getValue()? e1:
new AbstractMap.SimpleImmutableEntry<>(null, e1.getValue()))
.map(Map.Entry::getKey);
System.out.println(optionalS.toString());
Getting the maximum value is a form of Reduction. Since you want to get an empty optional in case of a tie, the simplest solution is to write the reduction function explicitly, use the Map.Entry with the bigger value if there is one, otherwise construct a new Map.Entry with a null key.
The result of the reduction is already an Optional, which will be empty if there were no elements (with a count >=2). So the last map step is applied on an Optional. If already empty, the map function won’t be evaluated and the resulting Optional stays empty. If the optional is not empty, but Map.Entry::getKey evaluates to null, the resulting optional will be empty.
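As a sketch of the tie-aware reduction described above, working on plain strings instead of User objects (the class name UniqueMaxSketch and the method name are assumptions for illustration). The reduction is written with explicit branches to make the three cases visible.

```java
import java.util.AbstractMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

class UniqueMaxSketch {

    // Returns the single most frequent name (seen at least twice),
    // or an empty Optional on a tie or when no name qualifies.
    static Optional<String> mostFrequent(List<String> names) {
        return names.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() >= 2)
                .reduce((a, b) -> {
                    if (a.getValue() > b.getValue()) return a;
                    if (a.getValue() < b.getValue()) return b;
                    // tie: keep the count but mark "no winner" with a null key
                    return new AbstractMap.SimpleImmutableEntry<String, Long>(null, a.getValue());
                })
                // Optional.map yields an empty Optional when the mapper returns null
                .map(Map.Entry::getKey);
    }
}
```

A later entry with a strictly higher count still replaces the null-key marker, so the tie only survives if it is a tie for the maximum.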

It seems to me that if you have the same number of maximum of some different lastNames you want to return an Optional::empty, as such:
Map<String, Long> map =
Stream.of(new User("name1"),
new User("name1"),
new User("name2"),
new User("name2"))
.collect(Collectors.groupingBy(User::getLastName, Collectors.counting()));
map.entrySet()
.stream()
.max(Entry.comparingByValue())
.flatMap(en -> {
boolean b = map.entrySet()
.stream()
.filter(x -> !x.getKey().equals(en.getKey()))
.mapToLong(Entry::getValue)
.noneMatch(x -> x == en.getValue());
return b ? Optional.of(en.getKey()) : Optional.empty();
})
.ifPresent(System.out::println);

Here is my monster for you:
Optional<String> optionalS = Stream.of(
new User("name1"),
new User("name1"),
new User("name2"),
new User("name2"))
.map(User::getLastName)
.collect(
Collectors.groupingBy(
Function.identity(),
Collectors.counting()
))
.entrySet()
.stream()
.filter(stringIntegerEntry -> stringIntegerEntry.getValue() >= 2)
.collect(
Collectors.groupingBy(
Map.Entry::getValue,
Collectors.toList()
))
.entrySet()
.stream()
.sorted(Comparator.comparing(
Map.Entry::getKey,
Comparator.reverseOrder()))
.map(Map.Entry::getValue)
.findFirst()
.filter(x -> x.size() == 1)
.map(x -> x.get(0).getKey());
System.out.println(optionalS);

As far as I understand your solution, your stream code creates
Map<String(lastname),Integer(number of occurrences)>
and then filters that map for entries where the number of occurrences is >= 2. In your test case the map has these entries:
<"name1",2>
<"name2",2>
So ordering by value will still return two values.
You should try creating
Map<Integer,List<String>>
which stores number of occurrences -> names. Then filter the map keys, sort them in descending order, and the map value gives you the most frequent last name (or last names, if several share that count in the input).
//edit
Below is a short snippet with my solution:
Map<Integer, List<String>> map = new HashMap<>();
map.put(2,Arrays.asList("name1","name2"));
Optional<String> optionalS = map
.entrySet()
.stream()
.sorted(Map.Entry.comparingByKey(Comparator.reverseOrder()))
.findFirst() // entry with the highest count (max of the map's keys)
.filter(x->x.getValue().size() == 1) // keep it only if exactly one last name has that count
.map(x->x.getValue().get(0)); // extract that last name; the result is Optional.empty if there was no entry or the count was shared
System.out.println(optionalS.toString());
I skipped the part of creating map.
P.S. You can replace HashMap with TreeMap with custom comparator to avoid sorting in stream.

How create Map<String,List<Long>> java 8 with single stream?

I have to create a Map<String,List<Long>> (possibly with a single stream) with key = name of the course and value = number of times the course was chosen as first choice (first entry of the list), second choice (second entry of the list), and third choice (third entry of the list),
for example: Chemistry, List<4,6,7>
I tried this, but it gives me errors:
return courses.values().stream()
.collect(groupingBy(Course::getNome,TreeMap::new, collectingAndThen(Course::getchoice, counting()));

The grouping and counting is fairly simple, but getting into a list takes a bit more work. Here's one way to do that by collectingAndThen streaming the counts:
courses.values()
.stream()
.collect(groupingBy(
Course::getName,
collectingAndThen(
groupingBy(Course::getChoice, counting()),
counts -> IntStream.range(0, 3)
.mapToObj(i -> counts.getOrDefault(i + 1, 0L))
.collect(toList()))))
EDIT: @Eugene suggests I've misunderstood the requirements. If you want to list all the choices rather than the top three, just replace 3 with Collections.max(counts.keySet()).
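For reference, here is a self-contained sketch of the nested-grouping answer above. The Course class is a hypothetical minimal version (just a name and a choice rank of 1, 2 or 3), the input is a plain list rather than courses.values(), and the finisher is pulled out into a named helper, toRankList, purely for readability.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ChoiceCountSketch {

    // Hypothetical minimal Course: a name plus the rank (1, 2 or 3) at which it was chosen.
    static final class Course {
        private final String name;
        private final int choice;

        Course(String name, int choice) {
            this.name = name;
            this.choice = choice;
        }

        String getName() { return name; }

        int getChoice() { return choice; }
    }

    // Turns rank -> count into a list [count for rank 1, rank 2, rank 3], with 0 for missing ranks.
    static List<Long> toRankList(Map<Integer, Long> counts) {
        return IntStream.rangeClosed(1, 3)
                .mapToObj(rank -> counts.getOrDefault(rank, 0L))
                .collect(Collectors.toList());
    }

    static Map<String, List<Long>> countsPerChoice(List<Course> courses) {
        return courses.stream()
                .collect(Collectors.groupingBy(
                        Course::getName,
                        TreeMap::new, // keeps course names sorted
                        Collectors.collectingAndThen(
                                Collectors.groupingBy(Course::getChoice, Collectors.counting()),
                                ChoiceCountSketch::toRankList)));
    }
}
```

A course chosen twice as first choice and once as third choice ends up with the list [2, 0, 1].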

Just for the fun of it, if you are willing to do it in two steps:
static Map<String, List<Long>> group(Map<?, Course> courses) {
Map<String, List<Long>> m = courses.values()
.stream()
.collect(Collectors.collectingAndThen(
Collectors.toMap(
Course::getName,
Course::getChoice,
Math::max),
map -> map.entrySet().stream()
.collect(Collectors.toMap(
Entry::getKey,
e -> new ArrayList<>(Collections.nCopies(e.getValue(), 0L))))
));
courses.values()
.forEach(x -> {
List<Long> l = m.get(x.getName());
l.set(x.getChoice() - 1, l.get(x.getChoice() - 1) + 1);
});
return m;
}

Swapping key from a Map<Key, List<Values>>

I'm looking for a solution to this problem (it is for an exam):
I have a Map<String, SortedSet<String>> operators populated by this method:
public void addOperator(String operatorName, String... destinationNames) throws ProposalException {
if(operators.containsKey((operatorName))){
throw new ProposalException("Operator " + operatorName + " is already in the system!");
}
else{
SortedSet<String> destinationstemp=new TreeSet<>();
for(String s: destinationNames){
if(s!=null){
destinationstemp.add(s);
}
}
operators.put(operatorName, destinationstemp);
}
}
Now, I want to create a new Map<String, SortedSet<String>> destinations that has the destinationName as key and the related operatorNames as values.
How can I do this?
P.S.: below is the usage of the methods, and the part outside the code is the desired output. Sorry for the bad formatting of the code. ph is the instance of the façade pattern class.
public SortedSet<String> getDestOperators(String destinationName) {...} // method that returns the **destinations** values related to destinationName
ph.addOperator("op3","london","rome");
ph.addOperator("op2","london","berlin");
ph.addOperator("op5","berlin","rome","madrid");
ph.addOperator("op1","london","madrid","berlin");
ph.addOperator("op10","rome");
ph.addOperator("op4","madrid","berlin");
System.out.println(ph.getDestOperators("madrid"));
Output: [op1, op4, op5]

You need to go through each entry in your map and check whether its inner set contains the value you are checking against:
public SortedSet<String> getDestOperators(String destinationName) {
SortedSet<String> result = new TreeSet<>();
for (Map.Entry<String, SortedSet<String>> entry : operators.entrySet()) {
if (entry.getValue().contains(destinationName)) {
result.add(entry.getKey());
}
}
return result;
}

To get your example output, here is a simple one-liner with streams:
List<String> result = operators.entrySet().stream().filter(entry -> entry.getValue().contains(destinationName)).map(Entry::getKey).sorted().collect(Collectors.toList());
or, for better readability, spread over multiple lines:
List<String> result = operators
.entrySet()
.stream()
.filter(entry -> entry.getValue().contains(destinationName))
.map(Entry::getKey)
.sorted()
.collect(Collectors.toList());
A more complex one-liner if you want to "reverse" the mapping as described in your text:
Map<String, List<String>> result = operators.entrySet().stream().flatMap(entry -> entry.getValue().stream().collect(Collectors.toMap(Function.identity(), o -> Arrays.asList(entry.getKey()))).entrySet().stream()).collect(Collectors.toMap(Entry::getKey, Entry::getValue, (a, b) -> Stream.of(a, b).flatMap(List::stream).sorted().collect(Collectors.toList())));
or here for better readability spread over multiple lines:
Map<String, List<String>> result2 = operators
.entrySet()
.stream()
.flatMap(entry -> entry
.getValue()
.stream()
.collect(Collectors.toMap(Function.identity(),
o -> Arrays.asList(entry.getKey())))
.entrySet()
.stream())
.collect(Collectors.toMap(Entry::getKey,
Entry::getValue,
(a, b) -> Stream.of(a, b)
.flatMap(List::stream)
.sorted()
.collect(Collectors.toList())));
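If you want the reversed map with SortedSet values as in the question, one possible alternative to the toMap-with-merge version above is to emit (destination, operator) pairs and group them. This is only a sketch with a made-up class name, and it uses groupingBy where the code above uses toMap.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

class InvertMapSketch {

    // Turns operator -> destinations into destination -> operators, with both levels sorted.
    static Map<String, SortedSet<String>> invert(Map<String, SortedSet<String>> operators) {
        return operators.entrySet().stream()
                // emit one (destination, operator) pair per destination of each operator
                .flatMap(e -> e.getValue().stream()
                        .map(dest -> new AbstractMap.SimpleEntry<String, String>(dest, e.getKey())))
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getValue,
                                // explicit type arguments so the values are typed as SortedSet<String>
                                Collectors.<String, SortedSet<String>>toCollection(TreeSet::new))));
    }
}
```

With the operators from the question, invert(operators).get("madrid") gives [op1, op4, op5].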

What you need to do is loop over each operator and then over all entries in its list. If a value from the list is not yet present in your output map, you add it; otherwise you modify its collection of operators.
Here is some code for you:
origin.forEach((key, list) -> {
list.forEach(city -> {
if (result.containsKey(city)) {
result.get(city).add(key);
} else {
SortedSet<String> set = new TreeSet<>();
set.add(key);
result.put(city, set);
}
});
});
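The same loop can be shortened with Map.computeIfAbsent, which creates the set the first time a city is seen. A minimal sketch, with the class name and the variable names chosen only for this example:

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertWithLoopSketch {

    // Builds destination -> operators from operator -> destinations.
    static Map<String, SortedSet<String>> invert(Map<String, SortedSet<String>> operators) {
        Map<String, SortedSet<String>> destinations = new TreeMap<>();
        operators.forEach((operator, cities) ->
                cities.forEach(city ->
                        // create the set on first sight of a city, then add the operator to it
                        destinations.computeIfAbsent(city, k -> new TreeSet<>()).add(operator)));
        return destinations;
    }
}
```

This replaces the containsKey/else branch; the behaviour is otherwise meant to be the same as the loop above.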
