In Java 8 how can I filter a collection using the Stream API by checking the distinctness of a property of each object?
For example I have a list of Person object and I want to remove people with the same name,
persons.stream().distinct();
Will use the default equality check for a Person object, so I need something like,
persons.stream().distinct(p -> p.getName());
Unfortunately the distinct() method has no such overload. Without modifying the equality check inside the Person class is it possible to do this succinctly?
Consider distinct to be a stateful filter. Here is a function that returns a predicate that maintains state about what it's seen previously, and that returns whether the given element was seen for the first time:
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
Set<Object> seen = ConcurrentHashMap.newKeySet();
return t -> seen.add(keyExtractor.apply(t));
}
Then you can write:
persons.stream().filter(distinctByKey(Person::getName))
Note that if the stream is ordered and is run in parallel, this will preserve an arbitrary element from among the duplicates, instead of the first one, as distinct() does.
(This is essentially the same as my answer to this question: Java Lambda Stream Distinct() on arbitrary key?)
An alternative would be to place the persons in a map using the name as a key:
persons.collect(Collectors.toMap(Person::getName, p -> p, (p, q) -> p)).values();
Note that the Person that is kept, in case of a duplicate name, will be the first encontered.
You can wrap the person objects into another class, that only compares the names of the persons. Afterward, you unwrap the wrapped objects to get a person stream again. The stream operations might look as follows:
persons.stream()
.map(Wrapper::new)
.distinct()
.map(Wrapper::unwrap)
...;
The class Wrapper might look as follows:
class Wrapper {
private final Person person;
public Wrapper(Person person) {
this.person = person;
}
public Person unwrap() {
return person;
}
public boolean equals(Object other) {
if (other instanceof Wrapper) {
return ((Wrapper) other).person.getName().equals(person.getName());
} else {
return false;
}
}
public int hashCode() {
return person.getName().hashCode();
}
}
Another solution, using Set. May not be the ideal solution, but it works
Set<String> set = new HashSet<>(persons.size());
persons.stream().filter(p -> set.add(p.getName())).collect(Collectors.toList());
Or if you can modify the original list, you can use removeIf method
persons.removeIf(p -> !set.add(p.getName()));
There's a simpler approach using a TreeSet with a custom comparator.
persons.stream()
.collect(Collectors.toCollection(
() -> new TreeSet<Person>((p1, p2) -> p1.getName().compareTo(p2.getName()))
));
We can also use RxJava (very powerful reactive extension library)
Observable.from(persons).distinct(Person::getName)
or
Observable.from(persons).distinct(p -> p.getName())
You can use groupingBy collector:
persons.collect(Collectors.groupingBy(p -> p.getName())).values().forEach(t -> System.out.println(t.get(0).getId()));
If you want to have another stream you can use this:
persons.collect(Collectors.groupingBy(p -> p.getName())).values().stream().map(l -> (l.get(0)));
You can use the distinct(HashingStrategy) method in Eclipse Collections.
List<Person> persons = ...;
MutableList<Person> distinct =
ListIterate.distinct(persons, HashingStrategies.fromFunction(Person::getName));
If you can refactor persons to implement an Eclipse Collections interface, you can call the method directly on the list.
MutableList<Person> persons = ...;
MutableList<Person> distinct =
persons.distinct(HashingStrategies.fromFunction(Person::getName));
HashingStrategy is simply a strategy interface that allows you to define custom implementations of equals and hashcode.
public interface HashingStrategy<E>
{
int computeHashCode(E object);
boolean equals(E object1, E object2);
}
Note: I am a committer for Eclipse Collections.
Similar approach which Saeed Zarinfam used but more Java 8 style:)
persons.collect(Collectors.groupingBy(p -> p.getName())).values().stream()
.map(plans -> plans.stream().findFirst().get())
.collect(toList());
You can use StreamEx library:
StreamEx.of(persons)
.distinct(Person::getName)
.toList()
I recommend using Vavr, if you can. With this library you can do the following:
io.vavr.collection.List.ofAll(persons)
.distinctBy(Person::getName)
.toJavaSet() // or any another Java 8 Collection
Extending Stuart Marks's answer, this can be done in a shorter way and without a concurrent map (if you don't need parallel streams):
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
final Set<Object> seen = new HashSet<>();
return t -> seen.add(keyExtractor.apply(t));
}
Then call:
persons.stream().filter(distinctByKey(p -> p.getName());
My approach to this is to group all the objects with same property together, then cut short the groups to size of 1 and then finally collect them as a List.
List<YourPersonClass> listWithDistinctPersons = persons.stream()
//operators to remove duplicates based on person name
.collect(Collectors.groupingBy(p -> p.getName()))
.values()
.stream()
//cut short the groups to size of 1
.flatMap(group -> group.stream().limit(1))
//collect distinct users as list
.collect(Collectors.toList());
Distinct objects list can be found using:
List distinctPersons = persons.stream()
.collect(Collectors.collectingAndThen(
Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(Person:: getName))),
ArrayList::new));
I made a generic version:
private <T, R> Collector<T, ?, Stream<T>> distinctByKey(Function<T, R> keyExtractor) {
return Collectors.collectingAndThen(
toMap(
keyExtractor,
t -> t,
(t1, t2) -> t1
),
(Map<R, T> map) -> map.values().stream()
);
}
An exemple:
Stream.of(new Person("Jean"),
new Person("Jean"),
new Person("Paul")
)
.filter(...)
.collect(distinctByKey(Person::getName)) // return a stream of Person with 2 elements, jean and Paul
.map(...)
.collect(toList())
Another library that supports this is jOOλ, and its Seq.distinct(Function<T,U>) method:
Seq.seq(persons).distinct(Person::getName).toList();
Under the hood, it does practically the same thing as the accepted answer, though.
Set<YourPropertyType> set = new HashSet<>();
list
.stream()
.filter(it -> set.add(it.getYourProperty()))
.forEach(it -> ...);
While the highest upvoted answer is absolutely best answer wrt Java 8, it is at the same time absolutely worst in terms of performance. If you really want a bad low performant application, then go ahead and use it. Simple requirement of extracting a unique set of Person Names shall be achieved by mere "For-Each" and a "Set".
Things get even worse if list is above size of 10.
Consider you have a collection of 20 Objects, like this:
public static final List<SimpleEvent> testList = Arrays.asList(
new SimpleEvent("Tom"), new SimpleEvent("Dick"),new SimpleEvent("Harry"),new SimpleEvent("Tom"),
new SimpleEvent("Dick"),new SimpleEvent("Huckle"),new SimpleEvent("Berry"),new SimpleEvent("Tom"),
new SimpleEvent("Dick"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("Cherry"),
new SimpleEvent("Roses"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("gotya"),
new SimpleEvent("Gotye"),new SimpleEvent("Nibble"),new SimpleEvent("Berry"),new SimpleEvent("Jibble"));
Where you object SimpleEvent looks like this:
public class SimpleEvent {
private String name;
private String type;
public SimpleEvent(String name) {
this.name = name;
this.type = "type_"+name;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
}
And to test, you have JMH code like this,(Please note, im using the same distinctByKey Predicate mentioned in accepted answer) :
#Benchmark
#OutputTimeUnit(TimeUnit.SECONDS)
public void aStreamBasedUniqueSet(Blackhole blackhole) throws Exception{
Set<String> uniqueNames = testList
.stream()
.filter(distinctByKey(SimpleEvent::getName))
.map(SimpleEvent::getName)
.collect(Collectors.toSet());
blackhole.consume(uniqueNames);
}
#Benchmark
#OutputTimeUnit(TimeUnit.SECONDS)
public void aForEachBasedUniqueSet(Blackhole blackhole) throws Exception{
Set<String> uniqueNames = new HashSet<>();
for (SimpleEvent event : testList) {
uniqueNames.add(event.getName());
}
blackhole.consume(uniqueNames);
}
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder()
.include(MyBenchmark.class.getSimpleName())
.forks(1)
.mode(Mode.Throughput)
.warmupBatchSize(3)
.warmupIterations(3)
.measurementIterations(3)
.build();
new Runner(opt).run();
}
Then you'll have Benchmark results like this:
Benchmark Mode Samples Score Score error Units
c.s.MyBenchmark.aForEachBasedUniqueSet thrpt 3 2635199.952 1663320.718 ops/s
c.s.MyBenchmark.aStreamBasedUniqueSet thrpt 3 729134.695 895825.697 ops/s
And as you can see, a simple For-Each is 3 times better in throughput and less in error score as compared to Java 8 Stream.
Higher the throughput, better the performance
I would like to improve Stuart Marks answer. What if the key is null, it will through NullPointerException. Here I ignore the null key by adding one more check as keyExtractor.apply(t)!=null.
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
Set<Object> seen = ConcurrentHashMap.newKeySet();
return t -> keyExtractor.apply(t)!=null && seen.add(keyExtractor.apply(t));
}
This works like a charm:
Grouping the data by unique key to form a map.
Returning the first object from every value of the map (There could be multiple person having same name).
persons.stream()
.collect(groupingBy(Person::getName))
.values()
.stream()
.flatMap(values -> values.stream().limit(1))
.collect(toList());
The easiest way to implement this is to jump on the sort feature as it already provides an optional Comparator which can be created using an element’s property. Then you have to filter duplicates out which can be done using a statefull Predicate which uses the fact that for a sorted stream all equal elements are adjacent:
Comparator<Person> c=Comparator.comparing(Person::getName);
stream.sorted(c).filter(new Predicate<Person>() {
Person previous;
public boolean test(Person p) {
if(previous!=null && c.compare(previous, p)==0)
return false;
previous=p;
return true;
}
})./* more stream operations here */;
Of course, a statefull Predicate is not thread-safe, however if that’s your need you can move this logic into a Collector and let the stream take care of the thread-safety when using your Collector. This depends on what you want to do with the stream of distinct elements which you didn’t tell us in your question.
There are lot of approaches, this one will also help - Simple, Clean and Clear
List<Employee> employees = new ArrayList<>();
employees.add(new Employee(11, "Ravi"));
employees.add(new Employee(12, "Stalin"));
employees.add(new Employee(23, "Anbu"));
employees.add(new Employee(24, "Yuvaraj"));
employees.add(new Employee(35, "Sena"));
employees.add(new Employee(36, "Antony"));
employees.add(new Employee(47, "Sena"));
employees.add(new Employee(48, "Ravi"));
List<Employee> empList = new ArrayList<>(employees.stream().collect(
Collectors.toMap(Employee::getName, obj -> obj,
(existingValue, newValue) -> existingValue))
.values());
empList.forEach(System.out::println);
// Collectors.toMap(
// Employee::getName, - key (the value by which you want to eliminate duplicate)
// obj -> obj, - value (entire employee object)
// (existingValue, newValue) -> existingValue) - to avoid illegalstateexception: duplicate key
Output - toString() overloaded
Employee{id=35, name='Sena'}
Employee{id=12, name='Stalin'}
Employee{id=11, name='Ravi'}
Employee{id=24, name='Yuvaraj'}
Employee{id=36, name='Antony'}
Employee{id=23, name='Anbu'}
Here is the example
public class PayRoll {
private int payRollId;
private int id;
private String name;
private String dept;
private int salary;
public PayRoll(int payRollId, int id, String name, String dept, int salary) {
super();
this.payRollId = payRollId;
this.id = id;
this.name = name;
this.dept = dept;
this.salary = salary;
}
}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collector;
import java.util.stream.Collectors;
public class Prac {
public static void main(String[] args) {
int salary=70000;
PayRoll payRoll=new PayRoll(1311, 1, "A", "HR", salary);
PayRoll payRoll2=new PayRoll(1411, 2 , "B", "Technical", salary);
PayRoll payRoll3=new PayRoll(1511, 1, "C", "HR", salary);
PayRoll payRoll4=new PayRoll(1611, 1, "D", "Technical", salary);
PayRoll payRoll5=new PayRoll(711, 3,"E", "Technical", salary);
PayRoll payRoll6=new PayRoll(1811, 3, "F", "Technical", salary);
List<PayRoll>list=new ArrayList<PayRoll>();
list.add(payRoll);
list.add(payRoll2);
list.add(payRoll3);
list.add(payRoll4);
list.add(payRoll5);
list.add(payRoll6);
Map<Object, Optional<PayRoll>> k = list.stream().collect(Collectors.groupingBy(p->p.getId()+"|"+p.getDept(),Collectors.maxBy(Comparator.comparingInt(PayRoll::getPayRollId))));
k.entrySet().forEach(p->
{
if(p.getValue().isPresent())
{
System.out.println(p.getValue().get());
}
});
}
}
Output:
PayRoll [payRollId=1611, id=1, name=D, dept=Technical, salary=70000]
PayRoll [payRollId=1811, id=3, name=F, dept=Technical, salary=70000]
PayRoll [payRollId=1411, id=2, name=B, dept=Technical, salary=70000]
PayRoll [payRollId=1511, id=1, name=C, dept=HR, salary=70000]
Late to the party but I sometimes use this one-liner as an equivalent:
((Function<Value, Key>) Value::getKey).andThen(new HashSet<>()::add)::apply
The expression is a Predicate<Value> but since the map is inline, it works as a filter. This is of course less readable but sometimes it can be helpful to avoid the method.
Building on #josketres's answer, I created a generic utility method:
You could make this more Java 8-friendly by creating a Collector.
public static <T> Set<T> removeDuplicates(Collection<T> input, Comparator<T> comparer) {
return input.stream()
.collect(toCollection(() -> new TreeSet<>(comparer)));
}
#Test
public void removeDuplicatesWithDuplicates() {
ArrayList<C> input = new ArrayList<>();
Collections.addAll(input, new C(7), new C(42), new C(42));
Collection<C> result = removeDuplicates(input, (c1, c2) -> Integer.compare(c1.value, c2.value));
assertEquals(2, result.size());
assertTrue(result.stream().anyMatch(c -> c.value == 7));
assertTrue(result.stream().anyMatch(c -> c.value == 42));
}
#Test
public void removeDuplicatesWithoutDuplicates() {
ArrayList<C> input = new ArrayList<>();
Collections.addAll(input, new C(1), new C(2), new C(3));
Collection<C> result = removeDuplicates(input, (t1, t2) -> Integer.compare(t1.value, t2.value));
assertEquals(3, result.size());
assertTrue(result.stream().anyMatch(c -> c.value == 1));
assertTrue(result.stream().anyMatch(c -> c.value == 2));
assertTrue(result.stream().anyMatch(c -> c.value == 3));
}
private class C {
public final int value;
private C(int value) {
this.value = value;
}
}
Maybe will be useful for somebody. I had a little bit another requirement. Having list of objects A from 3rd party remove all which have same A.b field for same A.id (multiple A object with same A.id in list). Stream partition answer by Tagir Valeev inspired me to use custom Collector which returns Map<A.id, List<A>>. Simple flatMap will do the rest.
public static <T, K, K2> Collector<T, ?, Map<K, List<T>>> groupingDistinctBy(Function<T, K> keyFunction, Function<T, K2> distinctFunction) {
return groupingBy(keyFunction, Collector.of((Supplier<Map<K2, T>>) HashMap::new,
(map, error) -> map.putIfAbsent(distinctFunction.apply(error), error),
(left, right) -> {
left.putAll(right);
return left;
}, map -> new ArrayList<>(map.values()),
Collector.Characteristics.UNORDERED)); }
I had a situation, where I was suppose to get distinct elements from list based on 2 keys.
If you want distinct based on two keys or may composite key, try this
class Person{
int rollno;
String name;
}
List<Person> personList;
Function<Person, List<Object>> compositeKey = personList->
Arrays.<Object>asList(personList.getName(), personList.getRollno());
Map<Object, List<Person>> map = personList.stream().collect(Collectors.groupingBy(compositeKey, Collectors.toList()));
List<Object> duplicateEntrys = map.entrySet().stream()`enter code here`
.filter(settingMap ->
settingMap.getValue().size() > 1)
.collect(Collectors.toList());
A variation of the top answer that handles null:
public static <T, K> Predicate<T> distinctBy(final Function<? super T, K> getKey) {
val seen = ConcurrentHashMap.<Optional<K>>newKeySet();
return obj -> seen.add(Optional.ofNullable(getKey.apply(obj)));
}
In my tests:
assertEquals(
asList("a", "bb"),
Stream.of("a", "b", "bb", "aa").filter(distinctBy(String::length)).collect(toList()));
assertEquals(
asList(5, null, 2, 3),
Stream.of(5, null, 2, null, 3, 3, 2).filter(distinctBy(x -> x)).collect(toList()));
val maps = asList(
hashMapWith(0, 2),
hashMapWith(1, 2),
hashMapWith(2, null),
hashMapWith(3, 1),
hashMapWith(4, null),
hashMapWith(5, 2));
assertEquals(
asList(0, 2, 3),
maps.stream()
.filter(distinctBy(m -> m.get("val")))
.map(m -> m.get("i"))
.collect(toList()));
In my case I needed to control what was the previous element. I then created a stateful Predicate where I controled if the previous element was different from the current element, in that case I kept it.
public List<Log> fetchLogById(Long id) {
return this.findLogById(id).stream()
.filter(new LogPredicate())
.collect(Collectors.toList());
}
public class LogPredicate implements Predicate<Log> {
private Log previous;
public boolean test(Log atual) {
boolean isDifferent = previouws == null || verifyIfDifferentLog(current, previous);
if (isDifferent) {
previous = current;
}
return isDifferent;
}
private boolean verifyIfDifferentLog(Log current, Log previous) {
return !current.getId().equals(previous.getId());
}
}
My solution in this listing:
List<HolderEntry> result ....
List<HolderEntry> dto3s = new ArrayList<>(result.stream().collect(toMap(
HolderEntry::getId,
holder -> holder, //or Function.identity() if you want
(holder1, holder2) -> holder1
)).values());
In my situation i want to find distinct values and put their in List.
Related
I wrote a stream pipeline:
private void calcMin(Clazz clazz) {
OptionalInt min = listOfObjects.stream().filter(y -> (y.getName()
.matches(clazz.getFilter())))
.map(y -> (y.getUserNumber()))
.mapToInt(Integer::intValue)
.min();
list.add(min.getAsInt());
}
This pipeline gives me the lowest UserNumber.
So far, so good.
But I also need the greatest UserNumber.
And I also need the lowest GroupNumber.
And also the greatest GroupNumber.
I could write:
private void calcMax(Clazz clazz) {
OptionalInt max = listOfObjects.stream().filter(y -> (y.getName()
.matches(clazz.getFilter())))
.map(y -> (y.getUserNumber()))
.mapToInt(Integer::intValue)
.max();
list.add(max.getAsInt());
}
And I could also write the same for .map(y -> (y.getGroupNumber())).
This will work, but it is very redudant.
Is there a way to do it more variable?
There are two differences in the examples: the map() operation, and the terminal operation (min() and max()). So, to reuse the rest of the pipeline, you'll want to parameterize these.
I will warn you up front, however, that if you call this parameterized method directly from many places, your code will be harder to read. Comprehension of the caller's code will be easier if you keep a helper function—with a meaningful name—that delegates to the generic method. Obviously, there is a balance here. If you wanted to add additional functional parameters, the number of helper methods would grow rapidly and become cumbersome. And if you only call each helper from one place, maybe using the underlying function directly won't add too much clutter.
You don't show the type of elements in the stream. I'm using the name MyClass in this example as a placeholder.
private static OptionalInt extremum(
Collection<? extends MyClass> input,
Clazz clazz,
ToIntFunction<? super MyClass> valExtractor,
Function<IntStream, OptionalInt> terminalOp) {
IntStream matches = input.stream()
.filter(y -> y.getName().matches(clazz.getFilter()))
.mapToInt(valExtractor);
return terminalOp.apply(matches);
}
private OptionalInt calcMinUserNumber(Clazz clazz) {
return extremum(listOfObjects, clazz, MyClass::getUserNumber, IntStream::min);
}
private OptionalInt calcMaxUserNumber(Clazz clazz) {
return extremum(listOfObjects, clazz, MyClass::getUserNumber, IntStream::max);
}
private OptionalInt calcMinGroupNumber(Clazz clazz) {
return extremum(listOfObjects, clazz, MyClass::getGroupNumber, IntStream::min);
}
private OptionalInt calcMaxGroupNumber(Clazz clazz) {
return extremum(listOfObjects, clazz, MyClass::getGroupNumber, IntStream::max);
}
...
And here's a usage example:
calcMaxGroupNumber(clazz).ifPresent(list::add);
The solution may reduce redundancy but it removes readability from the code.
IntStream maxi = listOfObjects.stream().filter(y -> (y.getName()
.matches(clazz.getFilter())))
.map(y -> (y.getUserNumber()))
.mapToInt(Integer::intValue);
System.out.println(applier(() -> maxi, IntStream::max));
//System.out.println(applier(() -> maxi, IntStream::min));
...
public static OptionalInt applier(Supplier<IntStream> supplier, Function<IntStream, OptionalInt> predicate) {
return predicate.apply(supplier.get());
}
For the sake of variety, I want to add the following approach which uses a nested Collectors.teeing (Java 12 or higher) which enables to get all values by just streaming over the collection only once.
For the set up, I am using the below simple class :
#AllArgsConstructor
#ToString
#Getter
static class MyObject {
int userNumber;
int groupNumber;
}
and a list of MyObjects:
List<MyObject> myObjectList = List.of(
new MyObject(1, 2),
new MyObject(2, 3),
new MyObject(3, 4),
new MyObject(5, 3),
new MyObject(6, 2),
new MyObject(7, 6),
new MyObject(1, 12));
If the task was to get the max and min userNumber one could do a simple teeing like below and add for example the values to map:
Map<String , Integer> maxMinUserNum =
myObjectList.stream()
.collect(
Collectors.teeing(
Collectors.reducing(Integer.MAX_VALUE, MyObject::getUserNumber, Integer::min),
Collectors.reducing(Integer.MIN_VALUE, MyObject::getUserNumber, Integer::max),
(min,max) -> {
Map<String,Integer> map = new HashMap<>();
map.put("minUser",min);
map.put("maxUser",max);
return map;
}));
System.out.println(maxMinUserNum);
//output: {minUser=1, maxUser=7}
Since the task also includes to get the max and min group numbers, we could use the same approach as above and only need to nest the teeing collector :
Map<String , Integer> result =
myObjectList.stream()
.collect(
Collectors.teeing(
Collectors.teeing(
Collectors.reducing(Integer.MAX_VALUE, MyObject::getUserNumber, Integer::min),
Collectors.reducing(Integer.MIN_VALUE, MyObject::getUserNumber, Integer::max),
(min,max) -> {
Map<String,Integer> map = new LinkedHashMap<>();
map.put("minUser",min);
map.put("maxUser",max);
return map;
}),
Collectors.teeing(
Collectors.reducing(Integer.MAX_VALUE, MyObject::getGroupNumber, Integer::min),
Collectors.reducing(Integer.MIN_VALUE, MyObject::getGroupNumber, Integer::max),
(min,max) -> {
Map<String,Integer> map = new LinkedHashMap<>();
map.put("minGroup",min);
map.put("maxGroup",max);
return map;
}),
(map1,map2) -> {
map1.putAll(map2);
return map1;
}));
System.out.println(result);
output
{minUser=1, maxUser=7, minGroup=2, maxGroup=12}
Is it possible use Collectors.groupingBy() with Collectors.counting() to count to the field of a custom object instead of creating a map and transforming it afterwards.
I have a list of users, like this:
public class User {
private String firstName;
private String lastName;
// some more attributes
// getters and setters
}
I want to count all users with the same first and last name. Therefore I have custom object looking like this:
public static class NameGroup {
private String firstName;
private String lastName;
private long count;
// getters and setters
}
I can collect the name groups using this:
List<NameGroup> names = users.stream()
.collect(Collectors.groupingBy(p -> Arrays.asList(p.getFirstName(), p.getLastName()), Collectors.counting()))
.entrySet().stream()
.map(e -> new NameGroup(e.getKey().get(0), e.getKey().get(1), e.getValue()))
.collect(Collectors.toList());
With this solution I have to group the users first to a map and transform it afterwards to my custom object. Is it possible to count all names directly to nameGroup.count to avoid iterating twice over the list (or map) and improve the performance?
You could collect directly to NameGroup.count, but it would be less efficient than what you have, not more.
Internally, the map is being used to maintain a data structure that can efficiently track the name combinations and map them to counts which are updated as more matches are found. Reinventing that data structure is painful and unlikely to result in meaningful improvements.
You could try to collect NameGroups directly in the map instead of going via a count, but most approaches for that would, again, be more expensive than what you have now, and certainly much more awkward.
Honestly: what you have now is perfectly good, and not inefficient in any ways that are important. You should almost certainly stick to what you have.
Not very clean but you can possibly do it as :
List<NameGroup> convertUsersToNameGroups(List<User> users) {
return new ArrayList<>(users.stream()
.collect(Collectors.toMap(p -> Arrays.asList(p.getFirstName(), p.getLastName()),
p -> new NameGroup(p.getFirstName(), p.getLastName(), 1L),
(nameGroup1, nameGroup2) -> new NameGroup(nameGroup1.getFirstName(), nameGroup1.getLastName(),
nameGroup1.getCount() + nameGroup2.getCount()))).values());
}
You can minimize allocations of intermediate objects, e.g. all the Arrays.asList(...) objects, by build a map yourself, instead of using streaming.
This relies on the fact that your NameGroup is mutable.
To even make the code simpler, lets add two helpers to NameGroup:
public static class NameGroup {
// fields here
public NameGroup(User user) {
this.firstName = user.getFirstName();
this.lastName = user.getLastName();
}
public void incrementCount() {
this.count++;
}
// other constructors, getters and setters here
}
With that in place, you can implement the logic like this:
Map<User, NameGroup> map = new TreeMap<>(Comparator.comparing(User::getFirstName)
.thenComparing(User::getLastName));
users.stream().forEach(user -> map.computeIfAbsent(user, NameGroup::new).incrementCount());
List<NameGroup> names = new ArrayList<>(map.values());
Or if you don't actually need a list, the last line can be simplified to:
Collection<NameGroup> names = map.values();
public static class NameGroup {
// ...
#Override
public boolean equals(Object other) {
final NameGroup o = (NameGroup) other;
return firstName.equals(o.firstName) && lastName.equals(o.lastName);
}
#Override
public int hashCode() {
return Objects.hash(firstName, lastName);
}
#Override
public String toString() {
return firstName + " " + lastName + " " + count;
}
}
public static void main(String[] args) throws IOException {
List<User> users = new ArrayList<>();
users.add(new User("fooz", "bar"));
users.add(new User("fooz", "bar"));
users.add(new User("foo", "bar"));
users.add(new User("foo", "bar"));
users.add(new User("foo", "barz"));
users.stream()
.map(u -> new NameGroup(u.getFirstName(), u.getLastName(), 1L))
.reduce(new HashMap<NameGroup, NameGroup>(), (HashMap<NameGroup, NameGroup> acc, NameGroup e) -> {
acc.compute(e, (k, v) -> v == null ? e : new NameGroup(e.firstName, e.lastName, e.count + acc.get(e).count));
return acc;
}, (a, b) -> {
b.keySet().forEach(e -> a.compute(e, (k, v) -> v == null ? e : new NameGroup(e.firstName, e.lastName, e.count + a.get(e).count)));
return a;
}).values().forEach(x -> System.out.println(x));
}
This will output
fooz bar 2
foo barz 1
foo bar 2
I don't know what your requirements are but I modified your NameGroup class to accept a User class instead of first and last names. This then negated the need for for selecting the values from the intermediate stream of List and just from a stream of User. But it still requires the map.
List<NameGroup> names =
users.stream().collect(Collectors.groupingBy(p -> p,Collectors.counting()))
.entrySet().stream()
.map(e -> new NameGroup(e.getKey(), e.getValue())).collect(
Collectors.toList());
I was trying to filter a list based on multiple conditions, sorting.
class Student{
private int Age;
private String className;
private String Name;
public Student(int age, String className, String name) {
Age = age;
this.className = className;
Name = name;
}
public int getAge() {
return Age;
}
public void setAge(int age) {
Age = age;
}
public String getClassName() {
return className;
}
public void setClassName(String className) {
this.className = className;
}
public String getName() {
return Name;
}
public void setName(String name) {
Name = name;
}
}
Now if I have a list of that, say
List<Student> students = new ArrayList<>();
students.add(new Student(24, "A", "Smith"));
students.add(new Student(24, "A", "John"));
students.add(new Student(30, "A", "John"));
students.add(new Student(20, "B", "John"));
students.add(new Student(24, "B", "Prince"));
How would I be able to get a list of the oldest students with a distinct name?
In C# this would be quite simple by using System.Linq GroupBy then comparing and then flattening with select, I'm not too sure how I could achieve the same in Java.
Use the toMap collector:
Collection<Student> values = students.stream()
.collect(toMap(Student::getName,
Function.identity(),
BinaryOperator.maxBy(Comparator.comparingInt(Student::getAge))))
.values();
Explanation
We're using this overload of toMap:
toMap(Function<? super T,? extends K> keyMapper,
Function<? super T,? extends U> valueMapper,
BinaryOperator<U> mergeFunction)
Student::getName above is the keyMapper function used to extract the values for the map keys.
Function.identity() above is the valueMapper function used to extract the values for the map values where Function.identity() simply returns the elements in the source them selves i.e. the Student objects.
BinaryOperator.maxBy(Comparator.comparingInt(Student::getAge)) above is the merge function used to "decide which Student object to return in the case of a key collission i.e. when two given students have the same name" in this case taking the oldest Student .
Finally, invoking values() returns us a collection of students.
The equivalent C# code being:
var values = students.GroupBy(s => s.Name, v => v,
(a, b) => b.OrderByDescending(e => e.Age).Take(1))
.SelectMany(x => x);
Explanation (for those unfamiliar with .NET)
We're using this extension method of GroupBy:
System.Collections.Generic.IEnumerable<TResult> GroupBy<TSource,TKey,TElement,TResult>
(this System.Collections.Generic.IEnumerable<TSource> source,
Func<TSource,TKey> keySelector,
Func<TSource,TElement> elementSelector,
Func<TKey,System.Collections.Generic.IEnumerable<TElement>,TResult> resultSelector);
s => s.Name above is the keySelector function used to extract the value to group by.
v => v above is the elementSelector function used to extract the values i.e. the Student objects them selves.
b.OrderByDescending(e => e.Age).Take(1) above is the resultSelector which given an IEnumerable<Student> represented as b takes the oldest student.
Finally, we apply .SelectMany(x => x); to collapse the resulting IEnumerable<IEnumerable<Student>> into a IEnumerable<Student>.
Or without streams:
Map<String, Student> map = new HashMap<>();
students.forEach(x -> map.merge(x.getName(), x, (oldV, newV) -> oldV.getAge() > newV.getAge() ? oldV : newV));
Collection<Student> max = map.values();
If you need a grouping only sorted, it is quite simple:
Map<String, List<Student>> collect = students.stream() // stream capabilities
.sorted(Comparator.comparingInt(Student::getAge).reversed()) // sort by age, descending
.collect(Collectors.groupingBy(Student::getName)); // group by name.
Output in collect:
Prince=[Student [Age=24, className=B, Name=Prince]],
Smith=[Student [Age=24, className=A, Name=Smith]],
John=[Student [Age=30, className=A, Name=John], Student [Age=24, className=A, Name=John], Student [Age=20, className=B, Name=John]]
Just to mix and merge the other solutions, you could alternatively do :
Map<String, Student> nameToStudentMap = new HashMap<>();
Set<Student> finalListOfStudents = students.stream()
.map(x -> nameToStudentMap.merge(x.getName(), x, (a, b) -> a.getAge() > b.getAge() ? a : b))
.collect(Collectors.toSet());
There are lots of articles that are about Java 8 lambda operations however I couldn't find what I need until now. I tried to convert them to my approach unfortunately I couldn't succeed
Imagine that you have request that comes in POJO such as ;
public class DummyRequest {
private String name;
private String surname;
private String country;
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getSurname() {
return surname;
}
public void setSurname(String surname) {
this.surname = surname;
}
public String getCountry() {
return country;
}
public void setCountry(String country) {
this.country= country;
}
}
During REST/SOAP request surname variable will be assigned as null.
List<Person> persons = Arrays.asList(
new Person("maria", "gambert", "italy"),
new Person("jack", "johson", "usa"),
new Person("johnson", "jack", "usa"),
new Person("kate", "julia", "spain"),
new Person("jack","bob","uk");
DummyRequest dr = new DummyRequest();
dr.setName("jack");
dr.setCountry("usa");
If I knew in advance that the surname field would be null, I could manage to filter the list like this, ignoring that field:
List<Person> result4 = persons.stream().
filter(x -> dummyRequest.getName().equals(x.getName())).
filter( x-> dummyRequest.getCountry().equals(x.getCountry())).
collect(Collectors.toList());
However I don't know which fields will be null and which will not. How could I instead filter my data according to non-null request parameters only?
If you want to only apply the filter for which the dummyRequest contains a non-null value, you would have to dynamically build your stream for the most efficient solution.
This could easily be done by implementing a helper method that conditionally applies a filter:
public static <T, V> Stream<T> filterIfNotNull(Stream<T> stream, V filterValue, Function<T, V> property) {
if (filterValue == null) {
return stream;
}
return stream.filter(t -> filterValue.equals(property.apply(t)));
}
(in your example T would always be Person and V would always be String, but this generic version allows more reusability without additional complexity at the call site)
Then the stream/collect can be implemented like this:
Stream<Person> personStream = persons.stream();
personStream = filterIfNotNull(personStream, dummyRequest.getName(), Person::getName);
personStream = filterIfNotNull(personStream, dummyRequest.getSurname(), Person::getSurname);
personStream = filterIfNotNull(personStream, dummyRequest.getCountry(), Person::getCountry);
List<Person> result4 = personStream.collect(Collectors.toList());
This technique guarantees that the null-check on the request's properties is only applied once.
I would define a static method for that, since you are duplicating the code so many times:
private static boolean nullableOrEqual(String left, String right) {
return left == null || left.equals(right);
}
And then the usage would be:
List<Person> result = persons.stream()
.filter(x -> nullableOrEqual(dr.getSurname(), x.getSurname()))
.filter(x -> nullableOrEqual(dr.getCountry(), x.getCountry()))
.filter(x -> nullableOrEqual(dr.getName(), x.getName()))
.collect(Collectors.toList());
If you want to filter only by the non-null properties of dummyRequest, you can simply add a null check to each Predicate:
List<Person> result4 =
persons.stream()
.filter(x -> dummyRequest.getSurname() == null || dummyRequest.getSurname().equals(x.getSurname()))
.filter(x -> dummyRequest.getName() == null || dummyRequest.getName().equals(x.getName()))
.filter(x -> dummyRequest.getCountry() == null || dummyRequest.getCountry().equals(x.getCountry()))
.collect(Collectors.toList());
You could create a checkNonNullProperties helper method that returns a Predicate<Person> that only checks for equality of non-null properties of your DummyRequest instance. You could use it as follows:
Predicate<Person> condition = checkNonNullProperties(
Arrays.asList(
dr.getCountry(),
dr.getName(),
dr.getSurname()),
Arrays.asList(
Person::getCountry,
Person::getName,
Person::getSurname));
List<Person> result = people.stream()
.filter(condition)
.collect(Collectors.toList());
The helper method:
private static <T> Predicate<T> checkNonNullProperties(
List<?> values,
List<Function<T, ?>> extractors) {
return IntStream.range(0, values.size()).mapToObj(i ->
(Predicate<T>) t -> {
Object value = values.get(i);
Object property = extractors.get(i).apply(t);
return value == null || value.equals(property);
})
.reduce(t -> true, Predicate::and);
}
The checkNonNullProperties method receives a list of values to check for equality and a list of functions that will extract the properties from the argument of the returned predicate. The extracted properties will be checked for equality against their corresponding values only for those values that are non-null.
I'm using an IntStream to drive iteration over both lists. In the mapToObj method I'm mapping the stream's int value to a predicate that returns true when the provided value is null or when it's equal to the extracted property.
In the end, these predicates are reduced to a final predicate via the Predicate::and operator. In the reduce call, I'm providing the identity predicate for the AND operator, which is t -> true (always returns true).
I have two classes that are structured like this:
public class Company {
private List<Person> person;
...
public List<Person> getPerson() {
return person;
}
...
}
public class Person {
private String tag;
...
public String getTag() {
return tag;
}
...
}
Basically the Company class has a List of Person objects, and each Person object can get a Tag value.
If I get the List of the Person objects, is there a way to use Stream from Java 8 to find the one Tag value that is the most common among all the Person objects (in case of a tie, maybe just a random of the most common)?
String mostCommonTag;
if(!company.getPerson().isEmpty) {
mostCommonTag = company.getPerson().stream() //How to do this in Stream?
}
String mostCommonTag = getPerson().stream()
// filter some person without a tag out
.filter(it -> Objects.nonNull(it.getTag()))
// summarize tags
.collect(Collectors.groupingBy(Person::getTag, Collectors.counting()))
// fetch the max entry
.entrySet().stream().max(Map.Entry.comparingByValue())
// map to tag
.map(Map.Entry::getKey).orElse(null);
AND the getTag method appeared twice, you can simplify the code as further:
String mostCommonTag = getPerson().stream()
// map person to tag & filter null tag out
.map(Person::getTag).filter(Objects::nonNull)
// summarize tags
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
// fetch the max entry
.entrySet().stream().max(Map.Entry.comparingByValue())
// map to tag
.map(Map.Entry::getKey).orElse(null);
You could collect the counts to a Map, then get the key with the highest value
List<String> foo = Arrays.asList("a","b","c","d","e","e","e","f","f","f","g");
Map<String, Long> f = foo
.stream()
.collect(Collectors.groupingBy(v -> v, Collectors.counting()));
String maxOccurence =
Collections.max(f.entrySet(), Comparator.comparing(Map.Entry::getValue)).getKey();
System.out.println(maxOccurence);
This should work for you:
private void run() {
List<Person> list = Arrays.asList(() -> "foo", () -> "foo", () -> "foo",
() -> "bar", () -> "bar");
Map<String, Long> commonness = list.stream()
.collect(Collectors.groupingBy(Person::getTag, Collectors.counting()));
Optional<String> mostCommon = commonness.entrySet().stream()
.max(Map.Entry.comparingByValue())
.map(Map.Entry::getKey);
System.out.println(mostCommon.orElse("no elements in list"));
}
public interface Person {
String getTag();
}
The commonness map contains the information which tag was found how often. The variable mostCommon contains the tag that was found most often. Also, mostCommon is empty, if the original list was empty.
If you are open to using a third-party library, you can use Collectors2 from Eclipse Collections with a Java 8 Stream to create a Bag and request the topOccurrences, which will return a MutableList of ObjectIntPair which is the tag value and the count of the number of occurrences.
MutableList<ObjectIntPair<String>> topOccurrences = company.getPerson()
.stream()
.map(Person::getTag)
.collect(Collectors2.toBag())
.topOccurrences(1);
String mostCommonTag = topOccurrences.getFirst().getOne();
In the case of a tie, the MutableList will have more than one result.
Note: I am a committer for Eclipse Collections.
This is helpful for you,
Map<String, Long> count = persons.stream().collect(
Collectors.groupingBy(Person::getTag, Collectors.counting()));
Optional<Entry<String, Long>> maxValue = count .entrySet()
.stream().max((entry1, entry2) -> entry1.getValue() > entry2.getValue() ? 1 : -1).get().getKey();
maxValue.get().getValue();
One More solution by abacus-common
// Comparing the solution by jdk stream,
// there is no "collect(Collectors.groupingBy(Person::getTag, Collectors.counting())).entrySet().stream"
Stream.of(company.getPerson()).map(Person::getTag).skipNull() //
.groupBy(Fn.identity(), Collectors.counting()) //
.max(Comparators.comparingByValue()).map(e -> e.getKey()).orNull();
// Or by multiset
Stream.of(company.getPerson()).map(Person::getTag).skipNull() //
.toMultiset().maxOccurrences().map(e -> e.getKey()).orNull();