Java Lambda Stream Distinct() on arbitrary key? [duplicate]

This question already has answers here:
Java 8 Distinct by property
(34 answers)
Closed 3 years ago.
I frequently run into a problem with Java lambda expressions: I want to distinct() a stream on an arbitrary property or method of an object, but keep the object rather than map it to that property or method. I started creating containers as discussed here, but I did it often enough that it became annoying and produced a lot of boilerplate classes.
I threw together this Pairing class, which holds two objects of two types and allows you to specify keying off the left, right, or both objects. My question is... is there really no built-in lambda stream function to distinct() on a key extractor of some sort? That would really surprise me. If not, will this class fulfill that function reliably?
Here is how it would be called
BigDecimal totalShare = orders.stream()
    .map(c -> Pairing.keyLeft(c.getCompany().getId(), c.getShare()))
    .distinct()
    .map(Pairing::getRightItem)
    .reduce(BigDecimal.ZERO, (x, y) -> x.add(y));
Here is the Pairing class
public final class Pairing<X,Y> {
    private final X item1;
    private final Y item2;
    private final KeySetup keySetup;

    private static enum KeySetup {LEFT,RIGHT,BOTH};

    private Pairing(X item1, Y item2, KeySetup keySetup) {
        this.item1 = item1;
        this.item2 = item2;
        this.keySetup = keySetup;
    }

    public X getLeftItem() {
        return item1;
    }

    public Y getRightItem() {
        return item2;
    }

    public static <X,Y> Pairing<X,Y> keyLeft(X item1, Y item2) {
        return new Pairing<X,Y>(item1, item2, KeySetup.LEFT);
    }

    public static <X,Y> Pairing<X,Y> keyRight(X item1, Y item2) {
        return new Pairing<X,Y>(item1, item2, KeySetup.RIGHT);
    }

    public static <X,Y> Pairing<X,Y> keyBoth(X item1, Y item2) {
        return new Pairing<X,Y>(item1, item2, KeySetup.BOTH);
    }

    public static <X,Y> Pairing<X,Y> forItems(X item1, Y item2) {
        return keyBoth(item1, item2);
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        if (keySetup.equals(KeySetup.LEFT) || keySetup.equals(KeySetup.BOTH)) {
            result = prime * result + ((item1 == null) ? 0 : item1.hashCode());
        }
        if (keySetup.equals(KeySetup.RIGHT) || keySetup.equals(KeySetup.BOTH)) {
            result = prime * result + ((item2 == null) ? 0 : item2.hashCode());
        }
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Pairing<?,?> other = (Pairing<?,?>) obj;
        if (keySetup.equals(KeySetup.LEFT) || keySetup.equals(KeySetup.BOTH)) {
            if (item1 == null) {
                if (other.item1 != null)
                    return false;
            } else if (!item1.equals(other.item1))
                return false;
        }
        if (keySetup.equals(KeySetup.RIGHT) || keySetup.equals(KeySetup.BOTH)) {
            if (item2 == null) {
                if (other.item2 != null)
                    return false;
            } else if (!item2.equals(other.item2))
                return false;
        }
        return true;
    }
}
UPDATE:
Tested Stuart's function below and it seems to work great. The operation below distincts on the first letter of each string. The only part I'm trying to figure out is how the ConcurrentHashMap maintains only one instance for the entire stream.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import com.google.common.collect.ImmutableList;

public class DistinctByKey {
    public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
        Map<Object,Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        final ImmutableList<String> arpts = ImmutableList.of("ABQ","ALB","CHI","CUN","PHX","PUJ","BWI");
        arpts.stream().filter(distinctByKey(f -> f.substring(0,1))).forEach(s -> System.out.println(s));
    }
}
Output is...
ABQ
CHI
PHX
BWI

The distinct operation is a stateful pipeline operation; in this case it's a stateful filter. It's a bit inconvenient to create these yourself, as there's nothing built-in, but a small helper class should do the trick:
/**
 * Stateful filter. T is type of stream element, K is type of extracted key.
 */
static class DistinctByKey<T,K> {
    Map<K,Boolean> seen = new ConcurrentHashMap<>();
    Function<T,K> keyExtractor;

    public DistinctByKey(Function<T,K> ke) {
        this.keyExtractor = ke;
    }

    public boolean filter(T t) {
        return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }
}
I don't know your domain classes, but I think that, with this helper class, you could do what you want like this:
BigDecimal totalShare = orders.stream()
.filter(new DistinctByKey<Order,CompanyId>(o -> o.getCompany().getId())::filter)
.map(Order::getShare)
.reduce(BigDecimal.ZERO, BigDecimal::add);
Unfortunately the type inference couldn't get far enough inside the expression, so I had to specify explicitly the type arguments for the DistinctByKey class.
This involves more setup than the collectors approach described by Louis Wasserman, but this has the advantage that distinct items pass through immediately instead of being buffered up until the collection completes. Space should be the same, as (unavoidably) both approaches end up accumulating all distinct keys extracted from the stream elements.
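As a small demonstration of that streaming behavior (my own sketch, not from the original answer), the stateful filter even works on an unbounded stream, which a buffering collector could never finish processing:
Stream.iterate(0, i -> i + 1)
    .filter(new DistinctByKey<Integer,Integer>(i -> i % 3)::filter)
    .limit(3)
    .forEach(System.out::println);  // prints 0, 1, 2 and terminates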
UPDATE
It's possible to get rid of the K type parameter since it's not actually used for anything other than being stored in a map. So Object is sufficient.
/**
 * Stateful filter. T is type of stream element.
 */
static class DistinctByKey<T> {
    Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    Function<T,Object> keyExtractor;

    public DistinctByKey(Function<T,Object> ke) {
        this.keyExtractor = ke;
    }

    public boolean filter(T t) {
        return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }
}
BigDecimal totalShare = orders.stream()
.filter(new DistinctByKey<Order>(o -> o.getCompany().getId())::filter)
.map(Order::getShare)
.reduce(BigDecimal.ZERO, BigDecimal::add);
This simplifies things a bit, but I still had to specify the type argument to the constructor. Trying to use diamond or a static factory method doesn't seem to improve things. I think the difficulty is that the compiler can't infer generic type parameters -- for a constructor or a static method call -- when either is in the instance expression of a method reference. Oh well.
(Another variation on this that would probably simplify it is to make DistinctByKey<T> implement Predicate<T> and override its test method. This would remove the need to use a method reference and would probably improve type inference. However, it's unlikely to be as nice as the solution below.)
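A minimal sketch of that variation, assuming the same imports as the helper above:
static class DistinctByKey<T> implements Predicate<T> {
    private final Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    private final Function<T,Object> keyExtractor;

    DistinctByKey(Function<T,Object> keyExtractor) {
        this.keyExtractor = keyExtractor;
    }

    @Override
    public boolean test(T t) {
        return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }
}
The call site then shrinks to .filter(new DistinctByKey<Order>(o -> o.getCompany().getId())) with no method reference.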
UPDATE 2
Can't stop thinking about this. Instead of a helper class, use a higher-order function. We can use captured locals to maintain state, so we don't even need a separate class! Bonus, things are simplified so type inference works!
public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
    Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
BigDecimal totalShare = orders.stream()
.filter(distinctByKey(o -> o.getCompany().getId()))
.map(Order::getShare)
.reduce(BigDecimal.ZERO, BigDecimal::add);

You more or less have to do something like
elements.stream()
    .collect(Collectors.toMap(
        obj -> extractKey(obj),
        obj -> obj,
        // pick the first if multiple values have the same key
        (first, second) -> first
    )).values().stream();
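Applied to the totalShare computation from the question, that would look something like this (a sketch assuming the same Order accessors used above):
BigDecimal totalShare = orders.stream()
    .collect(Collectors.toMap(
        o -> o.getCompany().getId(),   // key to deduplicate on
        Order::getShare,               // keep only the share as the value
        (first, second) -> first))     // on a duplicate key, keep the first share
    .values().stream()
    .reduce(BigDecimal.ZERO, BigDecimal::add);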

Another way of finding distinct elements
List<String> uniqueObjects = ImmutableList.of("ABQ","ALB","CHI","CUN","PHX","PUJ","BWI")
.stream()
.collect(Collectors.groupingBy((p)->p.substring(0,1))) // group by first letter
.values()
.stream()
.flatMap(e->e.stream().limit(1))
.collect(Collectors.toList());

A variation on Stuart Marks's second update, using a Set:
public static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
    Set<Object> seen = Collections.newSetFromMap(new ConcurrentHashMap<>());
    return t -> seen.add(keyExtractor.apply(t));
}

We can also use RxJava (a very powerful reactive extensions library):
Observable.from(persons).distinct(Person::getName)
or
Observable.from(persons).distinct(p -> p.getName())

To answer your question in your second update:
The only part I'm trying to figure out is how the ConcurrentHashMap maintains only one instance for the entire stream:
public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
    Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
In your code sample, distinctByKey is only invoked one time, so the ConcurrentHashMap is created just once. Here's an explanation:
The distinctByKey function is just a plain-old function that returns an object, and that object happens to be a Predicate. Keep in mind that a predicate is basically a piece of code that can be evaluated later. To manually evaluate a predicate, you must call a method in the Predicate interface such as test. So, the predicate
t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null
is merely a declaration that is not actually evaluated inside distinctByKey.
The predicate is passed around just like any other object. It is returned and passed into the filter operation, which basically evaluates the predicate repeatedly against each element of the stream by calling test.
I'm sure filter is more complicated than I made it out to be, but the point is, the predicate is evaluated many times outside of distinctByKey. There's nothing special* about distinctByKey; it's just a function that you've called one time, so the ConcurrentHashMap is only created one time.
*Apart from being well made, @stuart-marks :)
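To make that concrete, here's a small standalone illustration (my own snippet): the function runs once, and the returned predicate, which captures the single map, is tested once per element:
// distinctByKey runs once; the lambda it returns closes over the one map.
Predicate<String> p = distinctByKey((String s) -> s.substring(0, 1));

System.out.println(p.test("ABQ"));  // true  -- "A" was not in the map
System.out.println(p.test("ALB"));  // false -- "A" is already there
System.out.println(p.test("BWI"));  // true  -- "B" was not in the map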

You can use the distinct(HashingStrategy) method in Eclipse Collections.
List<String> list = Lists.mutable.with("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI");
ListIterate.distinct(list, HashingStrategies.fromFunction(s -> s.substring(0, 1)))
.each(System.out::println);
If you can refactor list to implement an Eclipse Collections interface, you can call the method directly on the list.
MutableList<String> list = Lists.mutable.with("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI");
list.distinct(HashingStrategies.fromFunction(s -> s.substring(0, 1)))
.each(System.out::println);
HashingStrategy is simply a strategy interface that allows you to define custom implementations of equals and hashCode.
public interface HashingStrategy<E>
{
    int computeHashCode(E object);
    boolean equals(E object1, E object2);
}
Note: I am a committer for Eclipse Collections.

Set.add(element) returns true if the set did not already contain element, otherwise false.
So you can do like this.
Set<String> set = new HashSet<>();
BigDecimal totalShare = orders.stream()
.filter(c -> set.add(c.getCompany().getId()))
.map(c -> c.getShare())
.reduce(BigDecimal.ZERO, BigDecimal::add);
If you want to do this in parallel, you must use a concurrent set (for example, one backed by a ConcurrentHashMap).
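A sketch of that parallel-safe variant, assuming the same Order accessors as above and ConcurrentHashMap.newKeySet() from Java 8:
Set<Object> seen = ConcurrentHashMap.newKeySet();
BigDecimal totalShare = orders.parallelStream()
    .filter(c -> seen.add(c.getCompany().getId()))  // thread-safe add
    .map(Order::getShare)
    .reduce(BigDecimal.ZERO, BigDecimal::add);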

It can be done with something like:
Set<String> distinctCompany = orders.stream()
.map(Order::getCompany)
.collect(Collectors.toSet());

Related

How to make a Predicate from a custom list of Predicates in Java?

I'm relatively new to programming and I have been wondering for the past two days how to make a Predicate that is made from a custom list of other Predicates. So I've come up with some kind of solution. Below is a code snippet that should give you an idea. Because I have written it based solely on reading various pieces of documentation, I have two questions: 1/ is it a good solution? 2/ is there some other, recommended solution for this problem?
public class Tester {
    private static ArrayList<Predicate<String>> testerList;
    // some Predicates of type String here...

    public static void addPredicate(Predicate<String> newPredicate) {
        if (testerList == null) {
            testerList = new ArrayList<Predicate<String>>();
        }
        testerList.add(newPredicate);
    }

    public static Predicate<String> customTesters() {
        return s -> testerList.stream().allMatch(t -> t.test(s));
    }
}
You could have a static method that receives many predicates and returns the predicate you want:
public static <T> Predicate<T> and(Predicate<T>... predicates) {
    // TODO Handle case when argument is null or empty or has only one element
    return s -> Arrays.stream(predicates).allMatch(t -> t.test(s));
}
A variant:
public static <T> Predicate<T> and(Predicate<T>... predicates) {
    // TODO Handle case when argument is null or empty or has only one element
    return Arrays.stream(predicates).reduce(t -> true, Predicate::and);
}
Here I'm using Stream.reduce, which takes an identity and an operator as arguments. Stream.reduce applies the Predicate::and operator to all elements of the stream to produce a result predicate, using the identity as the starting value for the first element of the stream. This is why I have used t -> true as the identity; otherwise the result predicate would always end up evaluating to false.
Usage:
Predicate<String> predicate = and(s -> s.startsWith("a"), s -> s.length() > 4);
Java's Predicate has a nice and method, which returns a new Predicate that is the conjunction of the two predicates. You can chain them all into one with it.
https://docs.oracle.com/javase/8/docs/api/java/util/function/Predicate.html#and-java.util.function.Predicate-
example :
Predicate<String> a = str -> str != null;
Predicate<String> b = str -> str.length() != 0;
Predicate<String> c = a.and(b);
c.test("Str");
//stupid test but you see the idea :)

Group and Reduce list of objects

I have a list of objects with many duplicated and some fields that need to be merged. I want to reduce this down to a list of unique objects using only Java 8 Streams (I know how to do this via old-skool means but this is an experiment.)
This is what I have right now. I don't really like this because the map-building seems extraneous and the values() collection is a view of the backing map, and you need to wrap it in a new ArrayList<>(...) to get a more specific collection. Is there a better approach, perhaps using the more general reduction operations?
@Test
public void reduce() {
    Collection<Foo> foos = Stream.of("foo", "bar", "baz")
        .flatMap(this::getfoos)
        .collect(Collectors.toMap(f -> f.name, f -> f, (l, r) -> {
            l.ids.addAll(r.ids);
            return l;
        })).values();
    assertEquals(3, foos.size());
    foos.forEach(f -> assertEquals(10, f.ids.size()));
}

private Stream<Foo> getfoos(String n) {
    return IntStream.range(0,10).mapToObj(i -> new Foo(n, i));
}

public static class Foo {
    private String name;
    private List<Integer> ids = new ArrayList<>();

    public Foo(String n, int i) {
        name = n;
        ids.add(i);
    }
}
If you break the grouping and reducing steps up, you can get something cleaner:
Stream<Foo> input = Stream.of("foo", "bar", "baz").flatMap(this::getfoos);
Map<String, Optional<Foo>> collect = input.collect(Collectors.groupingBy(f -> f.name, Collectors.reducing(Foo::merge)));
Collection<Optional<Foo>> collected = collect.values();
This assumes a few convenience methods in your Foo class:
public Foo(String n, List<Integer> ids) {
    this.name = n;
    this.ids.addAll(ids);
}

public static Foo merge(Foo src, Foo dest) {
    List<Integer> merged = new ArrayList<>();
    merged.addAll(src.ids);
    merged.addAll(dest.ids);
    return new Foo(src.name, merged);
}
As already pointed out in the comments, a map is a very natural thing to use when you want to identify unique objects. If all you needed to do was find the unique objects, you could use the Stream::distinct method. This method hides the fact that there is a map involved, but apparently it does use a map internally, as hinted by this question that shows you should implement a hashCode method or distinct may not behave correctly.
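To illustrate that point (a hypothetical sketch, not code from this answer): if you wanted Stream::distinct itself to treat Foos with the same name as equal, Foo would need something like the following. Note that even then, distinct would only drop duplicates, not merge their ids:
@Override
public boolean equals(Object o) {
    return o instanceof Foo && ((Foo) o).name.equals(this.name);
}

@Override
public int hashCode() {
    return name.hashCode();
}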
In the case of the distinct method, where no merging is necessary, it is possible to return some of the results before all of the input has been processed. In your case, unless you can make additional assumptions about the input that haven't been mentioned in the question, you do need to finish processing all of the input before you return any results. Thus this answer does use a map.
It is easy enough to use streams to process the values of the map and turn it back into an ArrayList, though. I show that in this answer, as well as providing a way to avoid the appearance of an Optional<Foo>, which shows up in one of the other answers.
public void reduce() {
    ArrayList<Foo> foos = Stream.of("foo", "bar", "baz").flatMap(this::getfoos)
        .collect(Collectors.collectingAndThen(Collectors.groupingBy(f -> f.name,
                     Collectors.reducing(Foo.identity(), Foo::merge)),
                 map -> map.values().stream()
                           .collect(Collectors.toCollection(ArrayList::new))));
    assertEquals(3, foos.size());
    foos.forEach(f -> assertEquals(10, f.ids.size()));
}

private Stream<Foo> getfoos(String n) {
    return IntStream.range(0, 10).mapToObj(i -> new Foo(n, i));
}

public static class Foo {
    private String name;
    private List<Integer> ids = new ArrayList<>();
    private static final Foo BASE_FOO = new Foo("", 0);

    public static Foo identity() {
        return BASE_FOO;
    }

    // use only if side effects to the argument objects are okay
    public static Foo merge(Foo fooOne, Foo fooTwo) {
        if (fooOne == BASE_FOO) {
            return fooTwo;
        } else if (fooTwo == BASE_FOO) {
            return fooOne;
        }
        fooOne.ids.addAll(fooTwo.ids);
        return fooOne;
    }

    public Foo(String n, int i) {
        name = n;
        ids.add(i);
    }
}
If the input elements are supplied in the random order, then having intermediate map is probably the best solution. However if you know in advance that all the foos with the same name are adjacent (this condition is actually met in your test), the algorithm can be greatly simplified: you just need to compare the current element with the previous one and merge them if the name is the same.
Unfortunately there's no Stream API method which would allow you do to such thing easily and effectively. One possible solution is to write custom collector like this:
public static List<Foo> withCollector(Stream<Foo> stream) {
    return stream.collect(Collector.<Foo, List<Foo>>of(ArrayList::new,
        (list, t) -> {
            Foo f;
            if (list.isEmpty() || !(f = list.get(list.size()-1)).name.equals(t.name))
                list.add(t);
            else
                f.ids.addAll(t.ids);
        },
        (l1, l2) -> {
            if (l1.isEmpty())
                return l2;
            if (l2.isEmpty())
                return l1;
            if (l1.get(l1.size()-1).name.equals(l2.get(0).name)) {
                l1.get(l1.size()-1).ids.addAll(l2.get(0).ids);
                l1.addAll(l2.subList(1, l2.size()));
            } else {
                l1.addAll(l2);
            }
            return l1;
        }));
}
My tests show that this collector is always faster than collecting to map (up to 2x depending on average number of duplicate names), both in sequential and parallel mode.
Another approach is to use my StreamEx library which provides a bunch of "partial reduction" methods including collapse:
public static List<Foo> withStreamEx(Stream<Foo> stream) {
return StreamEx.of(stream)
.collapse((l, r) -> l.name.equals(r.name), (l, r) -> {
l.ids.addAll(r.ids);
return l;
}).toList();
}
This method accepts two arguments: a BiPredicate which is applied for two adjacent elements and should return true if elements should be merged and the BinaryOperator which performs merging. This solution is a little bit slower in sequential mode than the custom collector (in parallel the results are very similar), but it's still significantly faster than toMap solution and it's simpler and somewhat more flexible as collapse is an intermediate operation, so you can collect in another way.
Again, both these solutions work only if foos with the same name are known to be adjacent. It's a bad idea to sort the input stream by foo name and then use these solutions, because the sorting will drastically reduce the performance, making it slower than the toMap solution.
As already pointed out by others, an intermediate Map is unavoidable, as that’s the way of finding the objects to merge. Further, you should not modify source data during reduction.
Nevertheless, you can achieve both without creating multiple Foo instances:
List<Foo> foos = Stream.of("foo", "bar", "baz")
.flatMap(n->IntStream.range(0,10).mapToObj(i -> new Foo(n, i)))
.collect(collectingAndThen(groupingBy(f -> f.name),
m->m.entrySet().stream().map(e->new Foo(e.getKey(),
e.getValue().stream().flatMap(f->f.ids.stream()).collect(toList())))
.collect(toList())));
This assumes that you add a constructor
public Foo(String n, List<Integer> l) {
    name = n;
    ids = l;
}
to your Foo class, as it should have if Foo is really supposed to be capable of holding a list of IDs. As a side note, having a type which serves as a single item as well as a container for merged results seems unnatural to me. This is exactly why the code turns out to be so complicated.
If the source items had a single id, using something like groupingBy(f -> f.name, mapping(f -> f.id, toList())), followed by mapping the entries of (String, List<Integer>) to the merged items, would have been sufficient.
Since this is not the case and Java 8 lacks the flatMapping collector, the flatmapping step is moved to the second step, making it look much more complicated.
But in both cases, the second step is not superfluous, as it is where the result items are actually created, and converting the map to the desired list type comes for free.
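For comparison, here's a hypothetical sketch of that simpler single-id shape (it assumes a Foo variant with a single int field id, the List constructor above, and the same static imports):
List<Foo> foos = Stream.of("foo", "bar", "baz")
    .flatMap(n -> IntStream.range(0, 10).mapToObj(i -> new Foo(n, i)))
    .collect(collectingAndThen(
        // each Foo contributes its single id to the group's list
        groupingBy(f -> f.name, mapping(f -> f.id, toList())),
        m -> m.entrySet().stream()
              .map(e -> new Foo(e.getKey(), e.getValue()))
              .collect(toList())));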

Compare two sets of different types

I'm looking for a way to tell if two sets of different element types are identical, assuming I can state a one-to-one relation between those element types. Is there a standard way of doing this in Java, or maybe in Guava or Apache Commons?
Here is my own implementation of this task. For example, I have two element classes which I know how to compare. For simplicity, I compare them by their id field:
class ValueObject {
    public int id;
    public ValueObject(int id) { this.id = id; }
    public static ValueObject of(int id) { return new ValueObject(id); }
}

class DTO {
    public int id;
    public DTO(int id) { this.id = id; }
    public static DTO of(int id) { return new DTO(id); }
}
Then I define an interface which does the comparison
interface TwoTypesComparator<L,R> {
    boolean areIdentical(L left, R right);
}
And the actual method for comparing sets looks like this
public static <L,R> boolean areIdentical(Set<L> left, Set<R> right, TwoTypesComparator<L,R> comparator) {
    if (left.size() != right.size()) return false;
    boolean found;
    for (L l : left) {
        found = false;
        for (R r : right) {
            if (comparator.areIdentical(l, r)) {
                found = true;
                break;
            }
        }
        if (!found) return false;
    }
    return true;
}
Example of client code:
HashSet<ValueObject> valueObjects = new HashSet<ValueObject>();
valueObjects.add(ValueObject.of(1));
valueObjects.add(ValueObject.of(2));
valueObjects.add(ValueObject.of(3));

HashSet<DTO> dtos = new HashSet<DTO>();
dtos.add(DTO.of(1));
dtos.add(DTO.of(2));
dtos.add(DTO.of(34));

System.out.println(areIdentical(valueObjects, dtos, new TwoTypesComparator<ValueObject, DTO>() {
    @Override
    public boolean areIdentical(ValueObject left, DTO right) {
        return left.id == right.id;
    }
}));
I'm looking for the standard solution to this task. Any suggestions on how to improve this code are also welcome.
This is what I would do in your case. You have sets. Sets are hard to compare, but on top of that, you want to compare them by their id.
I see only one proper solution: normalize the wanted values (extract their id), then sort those ids, then compare them in order, because if you don't sort before comparing you can skip past duplicates and/or values.
Think about the fact that Java 8 allows you to play lazy with streams. So don't rush over and think that extracting, then sorting, then copying is long. Laziness allows it to be rather fast compared to iterative solutions.
HashSet<ValueObject> valueObjects = new HashSet<>();
valueObjects.add(ValueObject.of(1));
valueObjects.add(ValueObject.of(2));
valueObjects.add(ValueObject.of(3));
HashSet<DTO> dtos = new HashSet<>();
dtos.add(DTO.of(1));
dtos.add(DTO.of(2));
dtos.add(DTO.of(34));
boolean areIdentical = Arrays.equals(
valueObjects.stream()
.mapToInt((v) -> v.id)
.sorted()
.toArray(),
dtos.stream()
.mapToInt((d) -> d.id)
.sorted()
.toArray()
);
You want to generalize the solution? No problem.
public static <T extends Comparable<? super T>> boolean areIdentical(Collection<ValueObject> vos, Function<ValueObject, T> voKeyExtractor, Collection<DTO> dtos, Function<DTO, T> dtoKeyExtractor) {
    return Arrays.equals(
        vos.stream()
           .map(voKeyExtractor)
           .sorted()
           .toArray(),
        dtos.stream()
            .map(dtoKeyExtractor)
            .sorted()
            .toArray()
    );
}
And for a T that is not comparable:
public static <T> boolean areIdentical(Collection<ValueObject> vos, Function<ValueObject, T> voKeyExtractor, Collection<DTO> dtos, Function<DTO, T> dtoKeyExtractor, Comparator<T> comparator) {
    return Arrays.equals(
        vos.stream()
           .map(voKeyExtractor)
           .sorted(comparator)
           .toArray(),
        dtos.stream()
            .map(dtoKeyExtractor)
            .sorted(comparator)
            .toArray()
    );
}
You mention Guava, and if you don't have Java 8, you can do the following, using the same algorithm:
List<Integer> voIds = FluentIterable.from(valueObjects)
    .transform(valueObjectIdGetter())
    .toSortedList(intComparator());
List<Integer> dtoIds = FluentIterable.from(dtos)
    .transform(dtoIdGetter())
    .toSortedList(intComparator());
return voIds.equals(dtoIds);
Another solution would be to use List instead of Set (if you are allowed to do so). List has a method called get(int index) that retrieves the element at the specified index and you can compare them one by one when both your lists have the same size. More on lists: http://docs.oracle.com/javase/7/docs/api/java/util/List.html
Also, avoid using public variables in your classes. A good practice is to make your variables private and use getter and setter methods.
Instantiate lists and add values
List<ValueObject> list = new ArrayList<>();
List<DTO> list2 = new ArrayList<>();
list.add(ValueObject.of(1));
list.add(ValueObject.of(2));
list.add(ValueObject.of(3));
list2.add(DTO.of(1));
list2.add(DTO.of(2));
list2.add(DTO.of(34));
Method that compares lists
public boolean compareLists(List<ValueObject> list, List<DTO> list2) {
    if (list.size() != list2.size()) {
        return false;
    }
    for (int i = 0; i < list.size(); i++) {
        if (list.get(i).id != list2.get(i).id) {
            return false;
        }
    }
    return true;
}
Your current method is incorrect, or at least inconsistent, for general sets.
Imagine the following:
L contains the pairs (1,1), (1,2), (2,1).
R contains the pairs (1,1), (2,1), (2,2).
Now if your id is the first value, your compare would return true, but are those sets really equal? The problem is that you have no guarantee that there is at most one element with a given id in a set, because you don't know how L and R implement equals, so my advice would be to not compare sets of different types.
If you really need to compare two sets the way you described, I would copy all elements from L to a list, then go through R and, every time you find a matching element in L, remove it from the list. Just make sure you use LinkedList instead of ArrayList.
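A sketch of that copy-and-remove approach (my own code, reusing the TwoTypesComparator interface from the question):
public static <L,R> boolean areIdenticalMultiset(Set<L> left, Set<R> right,
        TwoTypesComparator<L,R> comparator) {
    List<L> remaining = new LinkedList<>(left);
    outer:
    for (R r : right) {
        Iterator<L> it = remaining.iterator();
        while (it.hasNext()) {
            if (comparator.areIdentical(it.next(), r)) {
                it.remove();  // consume the match so duplicates are counted correctly
                continue outer;
            }
        }
        return false;  // no unconsumed element of left matches r
    }
    return remaining.isEmpty();
}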
You could override equals and hashCode on the DTO/value object and then do: leftSet.containsAll(rightSet) && leftSet.size() == rightSet.size()
If you can't alter the element classes, make a decorator and have the sets be of the decorator type.
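For example, a hypothetical decorator for DTO could look like this; the sets would then hold DtoById instances instead of raw DTOs:
final class DtoById {
    final DTO dto;

    DtoById(DTO dto) { this.dto = dto; }

    @Override
    public boolean equals(Object o) {
        // id-based equality without touching the original class
        return o instanceof DtoById && ((DtoById) o).dto.id == this.dto.id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(dto.id);
    }
}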

Lambdas, multiple forEach with casting

Need some help thinking in lambdas from my fellow StackOverflow luminaries.
Standard case of picking through a list of a list of a list to collect some children deep in a graph. What awesome ways could Lambdas help with this boilerplate?
public List<ContextInfo> list() {
    final List<ContextInfo> list = new ArrayList<ContextInfo>();
    final StandardServer server = getServer();
    for (final Service service : server.findServices()) {
        if (service.getContainer() instanceof Engine) {
            final Engine engine = (Engine) service.getContainer();
            for (final Container possibleHost : engine.findChildren()) {
                if (possibleHost instanceof Host) {
                    final Host host = (Host) possibleHost;
                    for (final Container possibleContext : host.findChildren()) {
                        if (possibleContext instanceof Context) {
                            final Context context = (Context) possibleContext;
                            // copy to another object -- not the important part
                            final ContextInfo info = new ContextInfo(context.getPath());
                            info.setThisPart(context.getThisPart());
                            info.setNotImportant(context.getNotImportant());
                            list.add(info);
                        }
                    }
                }
            }
        }
    }
    return list;
}
Note the list itself is going to the client as JSON, so don't focus on what is returned. Must be a few neat ways I can cut down the loops.
Interested to see what my fellow experts create. Multiple approaches encouraged.
EDIT
The findServices and the two findChildren methods return arrays
EDIT - BONUS CHALLENGE
The "not important part" did turn out to be important. I actually need to copy a value only available in the host instance. This seems to ruin all the beautiful examples. How would one carry state forward?
final ContextInfo info = new ContextInfo(context.getPath());
info.setHostname(host.getName()); // The Bonus Challenge
It's fairly deeply nested but it doesn't seem exceptionally difficult.
The first observation is that if a for-loop translates into a stream, nested for-loops can be "flattened" into a single stream using flatMap. This operation takes a single element and returns an arbitrary number of elements in a stream. I looked it up and found that StandardServer.findServices() returns an array of Service, so we turn this into a stream using Arrays.stream(). (I make similar assumptions for Engine.findChildren() and Host.findChildren().)
Next, the logic within each loop does an instanceof check and a cast. This can be modeled using streams as a filter operation to do the instanceof followed by a map operation that simply casts and returns the same reference. This is actually a no-op but it lets the static typing system convert a Stream<Container> to a Stream<Host> for example.
Applying these transformations to the nested loops, we get the following:
public List<ContextInfo> list() {
    final List<ContextInfo> list = new ArrayList<ContextInfo>();
    final StandardServer server = getServer();
    Arrays.stream(server.findServices())
        .filter(service -> service.getContainer() instanceof Engine)
        .map(service -> (Engine)service.getContainer())
        .flatMap(engine -> Arrays.stream(engine.findChildren()))
        .filter(possibleHost -> possibleHost instanceof Host)
        .map(possibleHost -> (Host)possibleHost)
        .flatMap(host -> Arrays.stream(host.findChildren()))
        .filter(possibleContext -> possibleContext instanceof Context)
        .map(possibleContext -> (Context)possibleContext)
        .forEach(context -> {
            // copy to another object -- not the important part
            final ContextInfo info = new ContextInfo(context.getPath());
            info.setThisPart(context.getThisPart());
            info.setNotImportant(context.getNotImportant());
            list.add(info);
        });
    return list;
}
But wait, there's more.
The final forEach operation is a slightly more complicated map operation that converts a Context into a ContextInfo. Furthermore, these are just collected into a List, so we can use collectors to do this instead of creating an empty list up front and then populating it. Applying these refactorings results in the following:
public List<ContextInfo> list() {
    final StandardServer server = getServer();
    return Arrays.stream(server.findServices())
        .filter(service -> service.getContainer() instanceof Engine)
        .map(service -> (Engine)service.getContainer())
        .flatMap(engine -> Arrays.stream(engine.findChildren()))
        .filter(possibleHost -> possibleHost instanceof Host)
        .map(possibleHost -> (Host)possibleHost)
        .flatMap(host -> Arrays.stream(host.findChildren()))
        .filter(possibleContext -> possibleContext instanceof Context)
        .map(possibleContext -> (Context)possibleContext)
        .map(context -> {
            // copy to another object -- not the important part
            final ContextInfo info = new ContextInfo(context.getPath());
            info.setThisPart(context.getThisPart());
            info.setNotImportant(context.getNotImportant());
            return info;
        })
        .collect(Collectors.toList());
}
I usually try to avoid multi-line lambdas (such as in the final map operation) so I'd refactor it into a little helper method that takes a Context and returns a ContextInfo. This doesn't shorten the code at all, but I think it does make it clearer.
UPDATE
But wait, there's still more.
Let's extract the call to service.getContainer() into its own pipeline element:
return Arrays.stream(server.findServices())
.map(service -> service.getContainer())
.filter(container -> container instanceof Engine)
.map(container -> (Engine)container)
.flatMap(engine -> Arrays.stream(engine.findChildren()))
// ...
This exposes the repetition of filtering on instanceof followed by a mapping with a cast. This is done three times in total. It seems likely that other code is going to need to do similar things, so it would be nice to extract this bit of logic into a helper method. The problem is that filter can change the number of elements in the stream (dropping ones that don't match) but it can't change their types. And map can change the types of elements, but it can't change their number. Can something change both the number and types? Yes, it's our old friend flatMap again! So our helper method needs to take an element and return a stream of elements of a different type. That return stream will contain a single casted element (if it matches) or it will be empty (if it doesn't match). The helper function would look like this:
<T,U> Stream<U> toType(T t, Class<U> clazz) {
    if (clazz.isInstance(t)) {
        return Stream.of(clazz.cast(t));
    } else {
        return Stream.empty();
    }
}
(This is loosely based on C#'s OfType construct mentioned in some of the comments.)
While we're at it, let's extract a method to create a ContextInfo:
ContextInfo makeContextInfo(Context context) {
    // copy to another object -- not the important part
    final ContextInfo info = new ContextInfo(context.getPath());
    info.setThisPart(context.getThisPart());
    info.setNotImportant(context.getNotImportant());
    return info;
}
After these extractions, the pipeline looks like this:
return Arrays.stream(server.findServices())
.map(service -> service.getContainer())
.flatMap(container -> toType(container, Engine.class))
.flatMap(engine -> Arrays.stream(engine.findChildren()))
.flatMap(possibleHost -> toType(possibleHost, Host.class))
.flatMap(host -> Arrays.stream(host.findChildren()))
.flatMap(possibleContext -> toType(possibleContext, Context.class))
.map(this::makeContextInfo)
.collect(Collectors.toList());
Nicer, I think, and we've removed the dreaded multi-line statement lambda.
UPDATE: BONUS CHALLENGE
Once again, flatMap is your friend. Take the tail of the stream and migrate it into the last flatMap before the tail. That way the host variable is still in scope, and you can pass it to a makeContextInfo helper method that's been modified to take host as well.
return Arrays.stream(server.findServices())
    .map(service -> service.getContainer())
    .flatMap(container -> toType(container, Engine.class))
    .flatMap(engine -> Arrays.stream(engine.findChildren()))
    .flatMap(possibleHost -> toType(possibleHost, Host.class))
    .flatMap(host -> Arrays.stream(host.findChildren())
        .flatMap(possibleContext -> toType(possibleContext, Context.class))
        .map(ctx -> makeContextInfo(ctx, host)))
    .collect(Collectors.toList());
This would be my version of your code using JDK 8 streams, method references, and lambda expressions:
Arrays.stream(server.findServices())
    .map(Service::getContainer)
    .filter(Engine.class::isInstance)
    .map(Engine.class::cast)
    .flatMap(engine -> Arrays.stream(engine.findChildren()))
    .filter(Host.class::isInstance)
    .map(Host.class::cast)
    .flatMap(host -> Arrays.stream(host.findChildren()))
    .filter(Context.class::isInstance)
    .map(Context.class::cast)
    .map(context -> {
        ContextInfo info = new ContextInfo(context.getPath());
        info.setThisPart(context.getThisPart());
        info.setNotImportant(context.getNotImportant());
        return info;
    })
    .collect(Collectors.toList());
In this approach, I replace your if-statements for filter predicates. Take into account that an instanceof check can be replaced with a Predicate<T>
Predicate<Object> isEngine = someObject -> someObject instanceof Engine;
which can also be expressed as
Predicate<Object> isEngine = Engine.class::isInstance
Similarly, your casts can be replaced by Function<T,R>.
Function<Object,Engine> castToEngine = someObject -> (Engine) someObject;
Which is pretty much the same as
Function<Object,Engine> castToEngine = Engine.class::cast;
And adding items manually to a list in the for loop can be replaced with a collector. In production code, the lambda that transforms a Context into a ContextInfo can (and should) be extracted into a separate method, and used as a method reference.
Solution to bonus challenge
Inspired by @EdwinDalorzo's answer.
public List<ContextInfo> list() {
    final StandardServer server = getServer();
    return Arrays.stream(server.findServices())
        .map(Service::getContainer)
        .filter(Engine.class::isInstance)
        .map(Engine.class::cast)
        .flatMap(engine -> Arrays.stream(engine.findChildren()))
        .filter(Host.class::isInstance)
        .map(Host.class::cast)
        .flatMap(host -> mapContainers(
            Arrays.stream(host.findChildren()), host.getName())
        )
        .collect(Collectors.toList());
}
}
private static Stream<ContextInfo> mapContainers(Stream<Container> containers,
                                                 String hostname) {
    return containers
        .filter(Context.class::isInstance)
        .map(Context.class::cast)
        .map(context -> {
            ContextInfo info = new ContextInfo(context.getPath());
            info.setThisPart(context.getThisPart());
            info.setNotImportant(context.getNotImportant());
            info.setHostname(hostname); // The Bonus Challenge
            return info;
        });
}
First attempt beyond ugly. It will be years before I find this readable. Has to be a better way.
Note the findChildren methods return arrays which of course work with for (N n: array) syntax, but not with the new Iterable.forEach method. Had to wrap them with Arrays.asList
public List<ContextInfo> list() {
    final List<ContextInfo> list = new ArrayList<ContextInfo>();
    final StandardServer server = getServer();
    asList(server.findServices()).forEach(service -> {
        if (!(service.getContainer() instanceof Engine)) return;
        final Engine engine = (Engine) service.getContainer();
        instanceOf(Host.class, asList(engine.findChildren())).forEach(host -> {
            instanceOf(Context.class, asList(host.findChildren())).forEach(context -> {
                // copy to another object -- not the important part
                final ContextInfo info = new ContextInfo(context.getPath());
                info.setThisPart(context.getThisPart());
                info.setNotImportant(context.getNotImportant());
                list.add(info);
            });
        });
    });
    return list;
}
The utility methods
public static <T> Iterable<T> instanceOf(final Class<T> type, final Collection collection) {
    final Iterator iterator = collection.iterator();
    return () -> new SlambdaIterator<>(() -> {
        while (iterator.hasNext()) {
            final Object object = iterator.next();
            if (object != null && type.isAssignableFrom(object.getClass())) {
                return (T) object;
            }
        }
        throw new NoSuchElementException();
    });
}
And finally a Lambda-powerable implementation of Iterator:
public static class SlambdaIterator<T> implements Iterator<T> {
    // Ya put your Lambdas in there
    public static interface Advancer<T> {
        T advance() throws NoSuchElementException;
    }

    private final Advancer<T> advancer;
    private T next;

    protected SlambdaIterator(final Advancer<T> advancer) {
        this.advancer = advancer;
    }

    @Override
    public boolean hasNext() {
        if (next != null) return true;
        try {
            next = advancer.advance();
            return next != null;
        } catch (final NoSuchElementException e) {
            return false;
        }
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        final T v = next;
        next = null;
        return v;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }
}
Lots of plumbing and no doubt 5x the byte code. Must be a better way.

Is there an elegant way to remove nulls while transforming a Collection using Guava?

I have a question about simplifying some Collection handling code, when using Google Collections (update: Guava).
I've got a bunch of "Computer" objects, and I want to end up with a Collection of their "resource id"s. This is done like so:
Collection<Computer> matchingComputers = findComputers();
Collection<String> resourceIds =
    Lists.newArrayList(Iterables.transform(matchingComputers, new Function<Computer, String>() {
        public String apply(Computer from) {
            return from.getResourceId();
        }
    }));
Now, getResourceId() may return null (and changing that is not an option right now), yet in this case I'd like to omit nulls from the resulting String collection.
Here's one way to filter nulls out:
Collections2.filter(resourceIds, new Predicate<String>() {
    @Override
    public boolean apply(String input) {
        return input != null;
    }
});
You could put all that together like this:
Collection<String> resourceIds = Collections2.filter(
    Lists.newArrayList(Iterables.transform(matchingComputers, new Function<Computer, String>() {
        public String apply(Computer from) {
            return from.getResourceId();
        }
    })), new Predicate<String>() {
        @Override
        public boolean apply(String input) {
            return input != null;
        }
    });
But this is hardly elegant, let alone readable, for such a simple task! In fact, plain old Java code (with no fancy Predicate or Function stuff at all) would arguably be much cleaner:
Collection<String> resourceIds = Lists.newArrayList();
for (Computer computer : matchingComputers) {
    String resourceId = computer.getResourceId();
    if (resourceId != null) {
        resourceIds.add(resourceId);
    }
}
Using the above is certainly also an option, but out of curiosity (and desire to learn more of Google Collections), can you do the exact same thing in some shorter or more elegant way using Google Collections?
There's already a predicate in Predicates that will help you here -- Predicates.notNull() -- and you can use Iterables.filter() and the fact that Lists.newArrayList() can take an Iterable to clean this up a little more.
Collection<String> resourceIds = Lists.newArrayList(
    Iterables.filter(
        Iterables.transform(matchingComputers, yourFunction),
        Predicates.notNull()
    )
);
If you don't actually need a Collection, just an Iterable, then the Lists.newArrayList() call can go away too and you're one step cleaner again!
I suspect you might find that the Function will come in handy again, and will be most useful declared as
public class Computer {
    // ...
    public static Function<Computer, String> TO_ID = ...;
}
which cleans this up even more (and will promote reuse).
A bit "prettier" syntax with FluentIterable (since Guava 12):
ImmutableList<String> resourceIds = FluentIterable.from(matchingComputers)
    .transform(getResourceId)
    .filter(Predicates.notNull())
    .toList();

static final Function<Computer, String> getResourceId =
    new Function<Computer, String>() {
        @Override
        public String apply(Computer computer) {
            return computer.getResourceId();
        }
    };
Note that the returned list is an ImmutableList. However, you can use the copyInto() method to pour the elements into an arbitrary collection.
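For example, since the answer notes copyInto() pours the elements into a collection you supply (a sketch reusing the getResourceId function from above; copyInto returns the target collection it was given):
List<String> mutableIds = FluentIterable.from(matchingComputers)
    .transform(getResourceId)
    .filter(Predicates.notNull())
    .copyInto(new ArrayList<String>());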
It took longer than @Jon Skeet expected, but Java 8 streams do make this simple:
List<String> resourceIds = computers.stream()
.map(Computer::getResourceId)
.filter(Objects::nonNull)
.collect(Collectors.toList());
You can also use .filter(x -> x != null) if you like; the difference is very minor.
Firstly, I'd create a constant filter somewhere:
public static final Predicate<Object> NULL_FILTER = new Predicate<Object>() {
    @Override
    public boolean apply(Object input) {
        return input != null;
    }
};
Then you can use:
Iterable<String> ids = Iterables.transform(matchingComputers,
    new Function<Computer, String>() {
        public String apply(Computer from) {
            return from.getResourceId();
        }
    });

Collection<String> resourceIds = Lists.newArrayList(
    Iterables.filter(ids, NULL_FILTER));
You can use the same null filter everywhere in your code.
If you use the same computing function elsewhere, you can make that a constant too, leaving just:
Collection<String> resourceIds = Lists.newArrayList(
    Iterables.filter(
        Iterables.transform(matchingComputers, RESOURCE_ID_PROJECTION),
        NULL_FILTER));
It's certainly not as nice as the C# equivalent would be, but this is all going to get a lot nicer in Java 7 with closures and extension methods :)
You could write your own method like so. This will filter out nulls for any Function that returns null from its apply method.
public static <F, T> Collection<T> transformAndFilterNulls(List<F> fromList, Function<? super F, ? extends T> function) {
    return Collections2.filter(Lists.transform(fromList, function), Predicates.<T>notNull());
}
The method can then be called with the following code.
Collection c = transformAndFilterNulls(Lists.newArrayList("", "SD", "DDF"), new Function<String, Long>() {
    @Override
    public Long apply(String s) {
        return s.isEmpty() ? 20L : null;
    }
});
System.err.println(c);
