How to preserve all Subgroups while applying nested groupingBy collector - java

I am trying to group a list of employees by the gender and department.
How do I ensure all departments are included in a sorted order for each gender, even when the relevant gender count is zero?
Currently, I have the following code and output:
employeeRepository.findAll().stream()
        .collect(Collectors.groupingBy(Employee::getGender,
                Collectors.groupingBy(Employee::getDepartment,
                        Collectors.counting())));
//output
//{MALE={HR=1, IT=1}, FEMALE={MGMT=1}}
Preferred output is:
{MALE={HR=1, IT=1, MGMT=0}, FEMALE={HR=0, IT=0, MGMT=1}}

To achieve that, first you have to group by department, and only then by gender, not the opposite.
The first collector, groupingBy(Employee::getDepartment, _downstream_), splits the data set into groups based on department. As its downstream collector, partitioningBy(employee -> employee.getGender() == Employee.Gender.MALE, _downstream_) divides the employees of each department into two parts based on gender. Finally, Collectors.counting(), applied as the downstream of partitioningBy, provides the number of employees of each gender in every department. Because partitioningBy always produces both a true and a false entry, a gender that has no employees in a department still gets a count of 0, which is exactly what the nested groupingBy in the question is missing.
So the intermediate map produced by the collect() operation will be of type Map<String, Map<Boolean, Long>> - employee count by gender (Boolean) for each department (for simplicity, department is a plain string).
The next step is to transform this map into Map<Employee.Gender, Map<String, Long>> - employee count by department for each gender.
My approach is to create a stream over the entry set and replace each entry with new ones that hold a gender as their key. To preserve the information about the department, each value is in turn an entry with the department as its key and the employee count as its value.
Then collect the stream of entries with groupingBy on the entry key, apply mapping as a downstream collector to extract the nested entry, and finally apply Collectors.toMap() to collect the entries of type Map.Entry<String, Long> into a map.
"...all departments are included in a sorted order"
To ensure the order in the nested map (count by department), a NavigableMap (here a TreeMap) should be used.
To do that, you need the flavor of toMap() that expects a mapFactory. It also expects a mergeFunction, which isn't really useful for this task since there will be no duplicate keys, but it has to be provided as well.
public static void main(String[] args) {
    List<Employee> employeeRepository =
            List.of(new Employee("IT", Employee.Gender.MALE),
                    new Employee("HR", Employee.Gender.MALE),
                    new Employee("MGMT", Employee.Gender.FEMALE));
    Map<Employee.Gender, NavigableMap<String, Long>> departmentCountByGender = employeeRepository
            .stream()
            .collect(Collectors.groupingBy(Employee::getDepartment, // Map<String, Map<Boolean, Long>> - department to *employee count* by gender
                    Collectors.partitioningBy(employee -> employee.getGender() == Employee.Gender.MALE,
                            Collectors.counting())))
            .entrySet().stream()
            .flatMap(entryDep -> entryDep.getValue().entrySet().stream()
                    .map(entryGen -> Map.entry(entryGen.getKey() ? Employee.Gender.MALE : Employee.Gender.FEMALE,
                            Map.entry(entryDep.getKey(), entryGen.getValue()))))
            .collect(Collectors.groupingBy(Map.Entry::getKey,
                    Collectors.mapping(Map.Entry::getValue,
                            Collectors.toMap(Map.Entry::getKey,
                                    Map.Entry::getValue,
                                    (v1, v2) -> v1,
                                    TreeMap::new))));
    System.out.println(departmentCountByGender);
}
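Dummy Employee class used for demo purposes:
class Employee {
    enum Gender {FEMALE, MALE}
    private final String department;
    private final Gender gender;
    public Employee(String department, Gender gender) {
        this.department = department;
        this.gender = gender;
    }
    public String getDepartment() { return department; }
    public Gender getGender() { return gender; }
}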
Output
{FEMALE={HR=0, IT=0, MGMT=1}, MALE={HR=1, IT=1, MGMT=0}}
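The zeros in this output come from partitioningBy. Unlike groupingBy, it always creates both the true and the false key, even when one partition is empty. Here is a minimal sketch that isolates that intermediate step. It assumes a record-based stand-in for Employee (Java 16+) rather than the class above, and the method name countByDepartment is made up for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitioningDemo {

    enum Gender { FEMALE, MALE }

    // Stand-in for the question's Employee class (assumption: Java 16+ record)
    record Employee(String department, Gender gender) {}

    // Department -> (isMale -> count)
    static Map<String, Map<Boolean, Long>> countByDepartment(List<Employee> employees) {
        return employees.stream()
                .collect(Collectors.groupingBy(Employee::department,
                        Collectors.partitioningBy(e -> e.gender() == Gender.MALE,
                                Collectors.counting())));
    }

    public static void main(String[] args) {
        List<Employee> employees = List.of(
                new Employee("IT", Gender.MALE),
                new Employee("HR", Gender.MALE),
                new Employee("MGMT", Gender.FEMALE));
        Map<String, Map<Boolean, Long>> counts = countByDepartment(employees);
        // MGMT has no male employees, but its inner map still contains the key true
        System.out.println(counts.get("MGMT").get(true));  // 0
        System.out.println(counts.get("MGMT").get(false)); // 1
    }
}
```

The flip side is that this only covers a two-way split. A department that has no employees at all in the data still won't show up.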

You can continue to work on the result of your code:
List<String> deptList = employees.stream().map(Employee::getDepartment).distinct().sorted().toList(); // toList() needs Java 16+
Map<Gender, Map<String, Long>> tmpResult = employees.stream()
        .collect(Collectors.groupingBy(Employee::getGender,
                Collectors.groupingBy(Employee::getDepartment, Collectors.counting())));
Map<Gender, Map<String, Long>> finalResult = new HashMap<>();
for (Map.Entry<Gender, Map<String, Long>> entry : tmpResult.entrySet()) {
    Map<String, Long> val = new LinkedHashMap<>(); // keeps the sorted order of deptList
    for (String dept : deptList) {
        val.put(dept, entry.getValue().getOrDefault(dept, 0L));
    }
    finalResult.put(entry.getKey(), val);
}
System.out.print(finalResult);
Readability and maintainability will probably suffer if you try to achieve the result in a single statement.
However, there is one alternative if you don't mind using a third-party library, abacus-common:
Map<Gender, Map<String, Integer>> result = Stream.of(employees)
.groupByToEntry(Employee::getGender, MoreCollectors.countingIntBy(Employee::getDepartment)) // step 1) group by gender
.mapValue(it -> Maps.newMap(deptList, Fn.identity(), dept -> it.getOrDefault(dept, 0), IntFunctions.ofLinkedHashMap())) // step 2) process the value.
.toMap();
Disclosure: I'm the developer of abacus-common.
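If there can be more than two genders, the partitioningBy trick no longer applies. A sketch in the spirit of the loop-based answer above uses plain loops instead of nested collectors. It pre-fills every (gender, department) pair with 0 in an EnumMap of TreeMaps and then counts. The record-based Employee and the method name are assumptions for illustration (Java 16+):

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class ZeroFilledCounts {

    enum Gender { FEMALE, MALE }

    // Stand-in for the question's Employee class (assumption: Java 16+ record)
    record Employee(String department, Gender gender) {}

    static Map<Gender, NavigableMap<String, Long>> countByGenderAndDepartment(List<Employee> employees) {
        // every department that occurs in the data, without duplicates and sorted
        SortedSet<String> departments = employees.stream()
                .map(Employee::department)
                .collect(Collectors.toCollection(TreeSet::new));

        // pre-fill every (gender, department) pair with 0, so no subgroup can be missing
        Map<Gender, NavigableMap<String, Long>> result = new EnumMap<>(Gender.class);
        for (Gender gender : Gender.values()) {
            NavigableMap<String, Long> byDepartment = new TreeMap<>();
            for (String department : departments) {
                byDepartment.put(department, 0L);
            }
            result.put(gender, byDepartment);
        }

        // then add 1 for every employee
        for (Employee e : employees) {
            result.get(e.gender()).merge(e.department(), 1L, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Employee> employees = List.of(
                new Employee("IT", Gender.MALE),
                new Employee("HR", Gender.MALE),
                new Employee("MGMT", Gender.FEMALE));
        // {FEMALE={HR=0, IT=0, MGMT=1}, MALE={HR=1, IT=1, MGMT=0}}
        System.out.println(countByGenderAndDepartment(employees));
    }
}
```

EnumMap iterates in the enum's declaration order. Every constant of Gender appears, even one with no employees at all.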

Related

Building immutable map using a list of objects

I have a list of students and a delegate with a function that returns the list of servers for a given student (getServersForStudent(student)). I would like to create a map of students indexed by server. A student can be in many servers.
private Map<Server, List<Student>> getStudentsByServer(List<Student> students) {
    Map<Server, List<Student>> map = new HashMap<>();
    students.forEach(student -> {
        List<Server> servers = delegate.getServersForStudent(student);
        if (!servers.isEmpty()) {
            servers.forEach(server -> map.computeIfAbsent(server, k -> new ArrayList<>()).add(student));
        }
    });
    return map;
}
This works perfectly, but I would like to refactor this to use streams in order to make an immutable collection instead. I tried doing this with groupingBy, but I wasn't able to get the right result:
students
        .stream()
        .collect(
                Collectors.groupingBy(
                        student -> delegate.getServersForStudent(student),
                        Collectors.mapping(Function.identity(), Collectors.toList())
                )
        );
This grouping doesn't have the same result as above since it is grouping by lists. Does anyone have any suggestions on how to best do this with Java streams?
You don't need streams to end up with an immutable collection. Simply copy your map into an immutable one at the end, or wrap it in an unmodifiable view:
private Map<Server, List<Student>> getStudentsByServer(final List<Student> students) {
    final Map<Server, List<Student>> map = new HashMap<>();
    for (final Student student : students) {
        for (final Server server : delegate.getServersForStudent(student)) {
            map.computeIfAbsent(server, k -> new ArrayList<>())
                    .add(student);
        }
    }
    // wrap:
    // return Collections.unmodifiableMap(map);
    // or copy (Java 10+):
    return Map.copyOf(map);
}
If you really want to do it stream-based, you have to first create a stream of tuples (student, server), which you can then group. Java does not have a specific tuple type, but short of creating a custom type, you can misuse Map.Entry<K, V> for that:
students
        .stream()
        .flatMap(student -> delegate.getServersForStudent(student)
                .stream()
                .map(server -> Map.entry(student, server)))
        .collect(
                Collectors.groupingBy(
                        tuple -> tuple.getValue(),
                        Collectors.mapping(
                                tuple -> tuple.getKey(),
                                Collectors.toList())));
Note that the collections returned by Collectors don't make any promises about (im)mutability. If you require immutability, you have to add another collection step using Collectors.collectingAndThen:
        .collect(
                Collectors.collectingAndThen(
                        Collectors.groupingBy(
                                tuple -> tuple.getValue(),
                                Collectors.mapping(
                                        tuple -> tuple.getKey(),
                                        Collectors.toList())),
                        Map::copyOf));
        // or wrap with: Collections::unmodifiableMap
It's also worth mentioning that an unmodifiable/immutable map like the one in the example above still allows modifying the lists of students stored as its values, because Collectors.toList() currently returns an ArrayList. If you require the values of the map to be immutable too, you have to take care of that yourself, e.g. by using Collectors.toUnmodifiableList (Java 10+) or by copying/wrapping each list.
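Putting these pieces together, here is a minimal sketch that returns an unmodifiable map whose values are unmodifiable lists as well. Server and Student are hypothetical records, and the servers() accessor stands in for delegate.getServersForStudent(student). Map.entry needs Java 9+, Collectors.toUnmodifiableList and Map.copyOf need Java 10+, and records need Java 16+:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StudentsByServer {

    // Hypothetical stand-ins for the question's types
    record Server(String name) {}

    // servers() plays the role of delegate.getServersForStudent(student)
    record Student(String name, List<Server> servers) {}

    static Map<Server, List<Student>> getStudentsByServer(List<Student> students) {
        Map<Server, List<Student>> grouped = students.stream()
                // one (server, student) entry per server the student belongs to
                .flatMap(student -> student.servers().stream()
                        .map(server -> Map.entry(server, student)))
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        // the value lists are unmodifiable
                        Collectors.mapping(Map.Entry::getValue, Collectors.toUnmodifiableList())));
        // the outer map becomes unmodifiable as well
        return Map.copyOf(grouped);
    }

    public static void main(String[] args) {
        Server a = new Server("a");
        Server b = new Server("b");
        Student ann = new Student("Ann", List.of(a, b));
        Student bob = new Student("Bob", List.of(b));

        Map<Server, List<Student>> byServer = getStudentsByServer(List.of(ann, bob));
        System.out.println(byServer.get(b).size()); // 2

        try {
            byServer.get(b).add(ann);
        } catch (UnsupportedOperationException e) {
            System.out.println("the value lists cannot be modified");
        }
    }
}
```

Collecting into a plain map first and calling Map.copyOf afterwards does the same job as the collectingAndThen step. It just keeps the generic type inference a little simpler to read.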

Groupby counts in java

I am pretty new to Java, coming from C#. I have the following class:
class Resource {
String name;
String category;
String component;
String group;
}
I want to know the following numbers:
1. Count of resources in the category.
2. Distinct count of components in each category. (component names can be duplicate)
3. Count of resources grouped by category and group.
I was able to achieve a little bit of success using Collectors.groupingBy. However, the result is always like this.
Map<String, List<Resource>>
To get the counts, I have to iterate over the key set and compute the sizes.
Using C# LINQ, I can easily compute all of the above metrics.
I am assuming there is definitely a better way to do this in java as well. Please advise.
For #1, I'd use Collectors.groupingBy along with Collectors.counting:
Map<String, Long> resourcesByCategoryCount = resources.stream()
.collect(Collectors.groupingBy(
Resource::getCategory,
Collectors.counting()));
This groups Resource elements by category, counting how many of them belong to each category.
For #2, I wouldn't use streams. Instead, I'd use the Map.computeIfAbsent operation (introduced in Java 8):
Map<String, Set<String>> distinctComponentsByCategory = new LinkedHashMap<>();
resources.forEach(r -> distinctComponentsByCategory.computeIfAbsent(
        r.getCategory(),
        k -> new HashSet<>())
        .add(r.getComponent()));
This first creates a LinkedHashMap (which preserves insertion order). Then the Resource elements are iterated and put into this map so that they are grouped by category, with each component added to the HashSet mapped to its category. As sets don't allow duplicates, no category will contain duplicate components, so the distinct count of components is the size of each set.
For #3, I'd again use Collectors.groupingBy along with Collectors.counting, but I'd use a composite key to group by:
Map<List<String>, Long> resourcesByCategoryAndGroup = resources.stream()
.collect(Collectors.groupingBy(
r -> Arrays.asList(r.getCategory(), r.getGroup()), // or List.of
Collectors.counting()));
This groups Resource elements by category and group, counting how many of them belong to each (category, group) pair. The grouping key is a two-element List<String>, with the category as its 1st element and the group as its 2nd element.
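On Java 16+, a small record can serve as the composite key instead of a List<String>, because records derive equals and hashCode from their components. Here is a sketch. CategoryGroup is a hypothetical name, and Resource is modeled as a record with accessor methods because the question's class declares only fields:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CompositeKeyCount {

    // Record stand-in for the question's Resource class (assumed accessors)
    record Resource(String name, String category, String component, String group) {}

    // Hypothetical key type; equals/hashCode are generated from both components
    record CategoryGroup(String category, String group) {}

    static Map<CategoryGroup, Long> countByCategoryAndGroup(List<Resource> resources) {
        return resources.stream()
                .collect(Collectors.groupingBy(
                        r -> new CategoryGroup(r.category(), r.group()),
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Resource> resources = List.of(
                new Resource("r1", "db", "mysql", "g1"),
                new Resource("r2", "db", "mysql", "g1"),
                new Resource("r3", "db", "pg", "g2"));
        Map<CategoryGroup, Long> counts = countByCategoryAndGroup(resources);
        System.out.println(counts.get(new CategoryGroup("db", "g1"))); // 2
    }
}
```

A named key type also makes the map's type self-describing, which a Map<List<String>, Long> is not.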
Or, instead of using a composite key, you could use nested grouping:
Map<String, Map<String, Long>> resourcesByCategoryAndGroup = resources.stream()
.collect(Collectors.groupingBy(
Resource::getCategory,
Collectors.groupingBy(
Resource::getGroup,
Collectors.counting())));
Thanks, Federico, for the detailed response. #1 and #3 worked great. For #2, I would like to get a Map as the output. Here's the code I'm currently using to get that count; it's the old-style approach, without collectors.
HashMap<String, HashSet<String>> map = new HashMap<>();
for (Resource resource : resources) {
    if (map.containsKey(resource.getCategory())) {
        map.get(resource.getCategory()).add(resource.getGroup());
    } else {
        HashSet<String> componentSet = new HashSet<>();
        componentSet.add(resource.getGroup());
        map.put(resource.getCategory(), componentSet);
    }
}
log.info("Group count in each category");
for (Map.Entry<String, HashSet<String>> entry : map.entrySet()) {
    log.info("{} - {}", entry.getKey(), entry.getValue().size());
}
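To get the counts for #2 directly as a Map, one option is to collect the components of each category into a Set and keep only its size with collectingAndThen. This is a sketch only: Resource is modeled as a record with assumed accessors (Java 16+), and the method name is made up. It counts distinct components, as #2 of the question asks. Swap Resource::component for the group accessor if distinct groups are what you want, as in the loop above:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class DistinctComponents {

    // Record stand-in for the question's Resource class (assumed accessors)
    record Resource(String name, String category, String component, String group) {}

    // Category -> number of distinct components in that category
    static Map<String, Integer> distinctComponentCountByCategory(List<Resource> resources) {
        return resources.stream()
                .collect(Collectors.groupingBy(
                        Resource::category,
                        Collectors.collectingAndThen(
                                // collect the components of each category into a Set (drops duplicates) ...
                                Collectors.mapping(Resource::component, Collectors.toSet()),
                                // ... and keep only its size
                                Set::size)));
    }

    public static void main(String[] args) {
        List<Resource> resources = List.of(
                new Resource("r1", "db", "mysql", "g1"),
                new Resource("r2", "db", "mysql", "g2"),
                new Resource("r3", "db", "postgres", "g1"),
                new Resource("r4", "cache", "redis", "g1"));
        Map<String, Integer> counts = distinctComponentCountByCategory(resources);
        System.out.println(counts.get("db"));    // 2
        System.out.println(counts.get("cache")); // 1
    }
}
```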

Use java stream to group by 2 keys on the same type

Using java stream, how to create a Map from a List to index by 2 keys on the same class?
Here is a code example. I would like the map "personByName" to index every person by firstName OR lastName, so that I get all 3 "Steves", whether Steve is their first name or their last name. I don't know how to combine the two Collectors.groupingBy calls.
public static class Person {
final String firstName;
final String lastName;
protected Person(String firstName, String lastName) {
super();
this.firstName = firstName;
this.lastName = lastName;
}
public String getFirstName() {
return firstName;
}
public String getLastName() {
return lastName;
}
}
@Test
public void testStream() {
    List<Person> persons = Arrays.asList(
            new Person("Bill", "Gates"),
            new Person("Bill", "Steve"),
            new Person("Steve", "Jobs"),
            new Person("Steve", "Wozniac"));
    Map<String, Set<Person>> personByFirstName = persons.stream().collect(Collectors.groupingBy(Person::getFirstName, Collectors.toSet()));
    Map<String, Set<Person>> personByLastName = persons.stream().collect(Collectors.groupingBy(Person::getLastName, Collectors.toSet()));
    Map<String, Set<Person>> personByName = persons.stream().collect(Collectors.groupingBy(Person::getLastName, Collectors.toSet())); // This is wrong, I want both first and last name
    Assert.assertEquals("we should search by firstName AND lastName", 3, personByName.get("Steve").size()); // This fails
}
I found a workaround by looping on the 2 maps, but it is not stream-oriented.
You can do it like this:
Map<String, Set<Person>> personByName = persons.stream()
.flatMap(p -> Stream.of(new SimpleEntry<>(p.getFirstName(), p),
new SimpleEntry<>(p.getLastName(), p)))
.collect(Collectors.groupingBy(SimpleEntry::getKey,
Collectors.mapping(SimpleEntry::getValue, Collectors.toSet())));
Assuming you add a toString() method to the Person class, you can then see the result using:
List<Person> persons = Arrays.asList(
new Person("Bill", "Gates"),
new Person("Bill", "Steve"),
new Person("Steve", "Jobs"),
new Person("Steve", "Wozniac"));
// code above here
personByName.entrySet().forEach(System.out::println);
Output
Steve=[Steve Wozniac, Bill Steve, Steve Jobs]
Jobs=[Steve Jobs]
Bill=[Bill Steve, Bill Gates]
Wozniac=[Steve Wozniac]
Gates=[Bill Gates]
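The same flatMap idea can be written as a self-contained sketch. It uses Map.entry (Java 9+) instead of SimpleEntry and a record in place of the Person class (Java 16+, an assumption for brevity). A record already provides the equals/hashCode that the Set values rely on. Note that Map.entry rejects null keys, so a person with a null name would cause a NullPointerException here:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PersonsByName {

    // Record stand-in for the question's Person class
    record Person(String firstName, String lastName) {}

    static Map<String, Set<Person>> byFirstOrLastName(List<Person> persons) {
        return persons.stream()
                // emit (firstName, person) and (lastName, person) for every person
                .flatMap(p -> Stream.of(Map.entry(p.firstName(), p), Map.entry(p.lastName(), p)))
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toSet())));
    }

    public static void main(String[] args) {
        List<Person> persons = List.of(
                new Person("Bill", "Gates"),
                new Person("Bill", "Steve"),
                new Person("Steve", "Jobs"),
                new Person("Steve", "Wozniac"));
        System.out.println(byFirstOrLastName(persons).get("Steve").size()); // 3
    }
}
```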
You could merge the two Map<String, Set<Person>> instances, for example:
Map<String, Set<Person>> personByFirstName =
persons.stream()
.collect(Collectors.groupingBy(
Person::getFirstName,
Collectors.toCollection(HashSet::new))
);
persons.stream()
.collect(Collectors.groupingBy(Person::getLastName, Collectors.toSet()))
.forEach((str, set) -> personByFirstName.merge(str, set, (s1, s2) -> {
s1.addAll(s2);
return s1;
}));
// personByFirstName contains now all personByName
One way is to use Collectors.teeing, which was added in JDK 12:
Map<String, List<Person>> result = persons.stream()
.collect(Collectors.teeing(
Collectors.groupingBy(Person::getFirstName,
Collectors.toCollection(ArrayList::new)),
Collectors.groupingBy(Person::getLastName),
(byFirst, byLast) -> {
byLast.forEach((last, peopleList) ->
byFirst.computeIfAbsent(last, k -> new ArrayList<>())
.addAll(peopleList));
return byFirst;
}));
Collectors.teeing collects to two separate collectors and then merges the results into a final value. From the docs:
Returns a Collector that is a composite of two downstream collectors. Every element passed to the resulting collector is processed by both downstream collectors, then their results are merged using the specified merge function into the final result.
So, the above code collects to a map by first name and also to a map by last name and then merges both maps into a final map by iterating the byLast map and merging each one of its entries into the byFirst map by means of the Map.computeIfAbsent method. Finally, the byFirst map is returned.
Note that I've collected to a Map<String, List<Person>> instead of to a Map<String, Set<Person>> to keep the example simple. If you actually need a map of sets, you could do it as follows:
Map<String, Set<Person>> result = persons.stream()
.collect(Collectors.teeing(
Collectors.groupingBy(Person::getFirstName,
Collectors.toCollection(LinkedHashSet::new)),
Collectors.groupingBy(Person::getLastName, Collectors.toSet()),
(byFirst, byLast) -> {
byLast.forEach((last, peopleSet) ->
byFirst.computeIfAbsent(last, k -> new LinkedHashSet<>())
.addAll(peopleSet));
return byFirst;
}));
Keep in mind that if you need to have Set<Person> as the values of the maps, the Person class must implement the hashCode and equals methods consistently.
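For the question's Person class, a consistent equals/hashCode pair can be based on both name fields. Here is a minimal sketch using java.util.Objects. The toString format is chosen to match the "Steve Jobs" style of the output shown earlier, and the wrapper class name is made up:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class PersonEquality {

    // The question's Person class, with equals/hashCode/toString added
    static final class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            return Objects.equals(firstName, other.firstName)
                    && Objects.equals(lastName, other.lastName);
        }

        // Uses the same fields as equals, so equal persons have equal hash codes
        @Override
        public int hashCode() {
            return Objects.hash(firstName, lastName);
        }

        @Override
        public String toString() {
            return firstName + " " + lastName;
        }
    }

    public static void main(String[] args) {
        Set<Person> people = new HashSet<>();
        people.add(new Person("Steve", "Jobs"));
        people.add(new Person("Steve", "Jobs")); // equal to the first one, so not added again
        System.out.println(people.size()); // 1
    }
}
```

Declaring the class final keeps the instanceof-based equals symmetric, since no subclass can add fields to the comparison.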
If you want a truly stream-oriented solution, make sure you don't produce any large intermediate collections; otherwise most of the point of using streams is lost.
If you just want to find all Steves, filter first and collect later:
persons.stream()
        .filter(p -> p.getFirstName().equals("Steve") || p.getLastName().equals("Steve"))
        .collect(Collectors.toList());
If you want to do complex things with a stream element, e.g. put an element into multiple collections, or in a map under several keys, just consume a stream using forEach, and write inside it whatever handling logic you want.
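A minimal sketch of that forEach approach follows. It assumes a record-based Person (Java 16+), and the class and method names are made up. Each person is added under both names with Map.computeIfAbsent. Because the lambda mutates a plain HashMap, the stream has to stay sequential:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IndexWithForEach {

    // Record stand-in for the question's Person class
    record Person(String firstName, String lastName) {}

    static Map<String, Set<Person>> index(List<Person> persons) {
        Map<String, Set<Person>> byName = new HashMap<>();
        // sequential stream only: HashMap is not safe for concurrent updates
        persons.stream().forEach(p -> {
            // put the same person under its first name and under its last name
            byName.computeIfAbsent(p.firstName(), k -> new HashSet<>()).add(p);
            byName.computeIfAbsent(p.lastName(), k -> new HashSet<>()).add(p);
        });
        return byName;
    }

    public static void main(String[] args) {
        List<Person> persons = List.of(
                new Person("Bill", "Gates"),
                new Person("Bill", "Steve"),
                new Person("Steve", "Jobs"),
                new Person("Steve", "Wozniac"));
        System.out.println(index(persons).get("Steve").size()); // 3
    }
}
```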
You cannot key your maps by multiple values. For what you want to achieve, you have three options:
Combine your "personByFirstName" and "personByLastName" maps. You will have duplicate values (e.g. Bill Gates will be in the map both under the key Bill and under the key Gates). @Andreas's answer gives a good stream-based way to do this.
Use an indexing library like Lucene and index all your Person objects by first name and last name.
The stream approach - it will not be performant on large data sets but you can stream your collection and use filter to get your matches:
persons
        .stream()
        .filter(p -> p.getFirstName().equals("Steve")
                || p.getLastName().equals("Steve"))
        .collect(Collectors.toList());
(I've written the syntax from memory so you might have to tweak it).
If I got it right you want to map each Person twice, once for the first name and once for the last.
To do this, you have to double your stream somehow. Assuming Couple is some existing 2-tuple type with getLeft() and getRight() accessors (Vavr, for example, has a nice tuple implementation), you could:
persons.stream()
        .map(p -> new Couple<>(new Couple<>(p.firstName, p), new Couple<>(p.lastName, p)))
        .flatMap(c -> Stream.of(c.getLeft(), c.getRight())) // Stream of Couple<String, Person>
        .collect(Collectors.toMap(Couple::getLeft,
                c -> new ArrayList<>(List.of(c.getRight())), // mutable, so later values can be merged into it
                (l1, l2) -> {
                    l1.addAll(l2);
                    return l1;
                }));
I didn't test it, but the concept is: make a stream of (name, person), (surname, person)... for every person, then collect it into a map keyed by the left value of each couple. Each value starts as a mutable one-element list so that the merge function can add to it. Collection::addAll can't be used directly as the merge function, because it returns a boolean instead of the merged collection. If you need a Set, replace the value mapper with c -> new HashSet<>(Set.of(c.getRight())) and keep the same merge function.
Try SetMultimap, either from Google Guava or my library abacus-common
SetMultimap<String, Person> result = Multimaps.newSetMultimap(new HashMap<>(), () -> new HashSet<>()); // by Google Guava.
// Or result = N.newSetMultimap(); // By Abacus-Util
persons.forEach(p -> {
result.put(p.getFirstName(), p);
result.put(p.getLastName(), p);
});

Java API Streams collecting stream in Map where value is a TreeSet

There is a Student class which has name, surname, age fields and getters for them.
Given a stream of Student objects.
How do I call collect so that it returns a Map whose keys are students' ages and whose values are TreeSets containing the surnames of the students with that age?
I wanted to use Collectors.toMap(), but got stuck.
I thought I could do it like this, passing a third parameter to the toMap method:
stream().collect(Collectors.toMap(Student::getAge, Student::getSurname, new TreeSet<String>()))
students.stream()
.collect(Collectors.groupingBy(
Student::getAge,
Collectors.mapping(
Student::getSurname,
Collectors.toCollection(TreeSet::new))
))
Eugene has provided the best solution to what you want as it's the perfect job for the groupingBy collector.
Another solution using the toMap collector would be:
Map<Integer, TreeSet<String>> collect =
students.stream()
.collect(Collectors.toMap(Student::getAge,
s -> new TreeSet<>(Arrays.asList(s.getSurname())),
(l, l1) -> {
l.addAll(l1);
return l;
}));
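Here is a self-contained sketch of the groupingBy version. Student is modeled as a record (Java 16+, an assumption). It also passes TreeMap::new as the map factory, an optional extra that keeps the ages sorted as well as the surnames:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class SurnamesByAge {

    // Record stand-in for the question's Student class
    record Student(String name, String surname, int age) {}

    static Map<Integer, TreeSet<String>> surnamesByAge(List<Student> students) {
        return students.stream()
                .collect(Collectors.groupingBy(
                        Student::age,
                        TreeMap::new, // optional: sorted keys (ages)
                        Collectors.mapping(Student::surname,
                                Collectors.toCollection(TreeSet::new)))); // sorted surnames per age
    }

    public static void main(String[] args) {
        List<Student> students = List.of(
                new Student("Ann", "Smith", 20),
                new Student("Bob", "Adams", 20),
                new Student("Cid", "Brown", 19));
        System.out.println(surnamesByAge(students)); // {19=[Brown], 20=[Adams, Smith]}
    }
}
```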

Collect and map in Java8 Streams [duplicate]

I have the following class.
class Person {
String name;
LocalDate birthday;
Sex gender;
String emailAddress;
public int getAge() {
return birthday.until(IsoChronology.INSTANCE.dateNow()).getYears();
}
public String getName() {
return name;
}
}
I'd like to be able to group by age and then collect the list of the persons' names rather than the Person objects themselves, all in a single nice lambda expression.
For reference, here is my current solution, which stores the result of the grouping by age and then iterates over it to collect the names.
ArrayList<Person> members = new ArrayList<>();
members.add(new Person("Fred", IsoChronology.INSTANCE.date(1980, 6, 20), Person.Sex.MALE, "fred@example.com"));
members.add(new Person("Jane", IsoChronology.INSTANCE.date(1990, 7, 15), Person.Sex.FEMALE, "jane@example.com"));
members.add(new Person("Mark", IsoChronology.INSTANCE.date(1990, 7, 15), Person.Sex.MALE, "mark@example.com"));
members.add(new Person("George", IsoChronology.INSTANCE.date(1991, 8, 13), Person.Sex.MALE, "george@example.com"));
members.add(new Person("Bob", IsoChronology.INSTANCE.date(2000, 9, 12), Person.Sex.MALE, "bob@example.com"));
Map<Integer, List<Person>> collect = members.stream().collect(groupingBy(Person::getAge));
Map<Integer, List<String>> result = new HashMap<>();
collect.keySet().forEach(key -> {
result.put(key, collect.get(key).stream().map(Person::getName).collect(toList()));
});
Current solution
It's not ideal, and for the sake of learning I'd like a more elegant and better-performing solution.
When grouping a Stream with Collectors.groupingBy, you can specify a reduction operation on the values with a custom Collector. Here, we need to use Collectors.mapping, which takes a function (what the mapping is) and a collector (how to collect the mapped values). In this case the mapping is Person::getName, i.e. a method reference that returns the name of the Person, and we collect that into a List.
Map<Integer, List<String>> collect =
members.stream()
.collect(Collectors.groupingBy(
Person::getAge,
Collectors.mapping(Person::getName, Collectors.toList()))
);
You can use a mapping Collector to map the list of Person objects to a list of person names:
Map<Integer, List<String>> collect =
members.stream()
.collect(Collectors.groupingBy(Person::getAge,
Collectors.mapping(Person::getName, Collectors.toList())));
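You can also use Collectors.toMap and provide mappings for the key and the value, plus a merge function (if any). Note that each value is then a single String, with the names of people of the same age joined by "|", rather than a list: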
Map<Integer, String> ageNameMap =
members.stream()
.collect(Collectors.toMap(
person -> person.getAge(),
person -> person.getName(), (pName1, pName2) -> pName1+"|"+pName2)
);
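If a pipe-separated String per age really is what you want, groupingBy, mapping and Collectors.joining can build the same values without a merge function. This is a swapped-in alternative to the toMap version, not a change to it. The record below is a simplified stand-in for Person (Java 16+) with a fixed age instead of a birthday, so the sketch doesn't depend on today's date. Its names and ages are illustrative only:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NamesByAge {

    // Simplified stand-in for the question's Person: a fixed age instead of a birthday
    record Person(String name, int age) {}

    // Same "name1|name2" values as the toMap version, built with groupingBy + joining
    static Map<Integer, String> namesByAge(List<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(Person::age,
                        Collectors.mapping(Person::name, Collectors.joining("|"))));
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("Fred", 44),
                new Person("Jane", 34),
                new Person("Mark", 34));
        System.out.println(namesByAge(people).get(34)); // Jane|Mark
    }
}
```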
