Lambdas, multiple forEach with casting - java

Need some help thinking in lambdas from my fellow StackOverflow luminaries.
Standard case of picking through a list of a list of a list to collect some children deep in a graph. What awesome ways could Lambdas help with this boilerplate?
public List<ContextInfo> list() {
final List<ContextInfo> list = new ArrayList<ContextInfo>();
final StandardServer server = getServer();
for (final Service service : server.findServices()) {
if (service.getContainer() instanceof Engine) {
final Engine engine = (Engine) service.getContainer();
for (final Container possibleHost : engine.findChildren()) {
if (possibleHost instanceof Host) {
final Host host = (Host) possibleHost;
for (final Container possibleContext : host.findChildren()) {
if (possibleContext instanceof Context) {
final Context context = (Context) possibleContext;
// copy to another object -- not the important part
final ContextInfo info = new ContextInfo(context.getPath());
info.setThisPart(context.getThisPart());
info.setNotImportant(context.getNotImportant());
list.add(info);
}
}
}
}
}
}
return list;
}
Note the list itself is going to the client as JSON, so don't focus on what is returned. Must be a few neat ways I can cut down the loops.
Interested to see what my fellow experts create. Multiple approaches encouraged.
EDIT
The findServices and the two findChildren methods return arrays
EDIT - BONUS CHALLENGE
The "not important part" did turn out to be important. I actually need to copy a value only available in the host instance. This seems to ruin all the beautiful examples. How would one carry state forward?
final ContextInfo info = new ContextInfo(context.getPath());
info.setHostname(host.getName()); // The Bonus Challenge

It's fairly deeply nested but it doesn't seem exceptionally difficult.
The first observation is that if a for-loop translates into a stream, nested for-loops can be "flattened" into a single stream using flatMap. This operation takes a single element and returns an arbitrary number of elements in a stream. I looked up and found that StandardServer.findServices() returns an array of Service, so we turn this into a stream using Arrays.stream(). (I make similar assumptions for Engine.findChildren() and Host.findChildren().)
Next, the logic within each loop does an instanceof check and a cast. This can be modeled using streams as a filter operation to do the instanceof followed by a map operation that simply casts and returns the same reference. This is actually a no-op but it lets the static typing system convert a Stream<Container> to a Stream<Host> for example.
Applying these transformations to the nested loops, we get the following:
public List<ContextInfo> list() {
final List<ContextInfo> list = new ArrayList<ContextInfo>();
final StandardServer server = getServer();
Arrays.stream(server.findServices())
.filter(service -> service.getContainer() instanceof Engine)
.map(service -> (Engine)service.getContainer())
.flatMap(engine -> Arrays.stream(engine.findChildren()))
.filter(possibleHost -> possibleHost instanceof Host)
.map(possibleHost -> (Host)possibleHost)
.flatMap(host -> Arrays.stream(host.findChildren()))
.filter(possibleContext -> possibleContext instanceof Context)
.map(possibleContext -> (Context)possibleContext)
.forEach(context -> {
// copy to another object -- not the important part
final ContextInfo info = new ContextInfo(context.getPath());
info.setThisPart(context.getThisPart());
info.setNotImportant(context.getNotImportant());
list.add(info);
});
return list;
}
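To try the flattening step in isolation, here is a minimal sketch that uses plain nested arrays instead of the Tomcat types. The class name FlattenDemo, the method stringsIn, and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlattenDemo {
    // Flattens a two-level array of mixed objects and keeps only the Strings,
    // mirroring nested for-loops that contain instanceof checks.
    static List<String> stringsIn(Object[][] groups) {
        return Arrays.stream(groups)
                .flatMap(group -> Arrays.stream(group)) // the inner loop becomes flatMap
                .filter(o -> o instanceof String)       // the instanceof check
                .map(o -> (String) o)                   // the cast
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Object[][] groups = { { "a", 1 }, { }, { 2.0, "b", "c" } };
        System.out.println(stringsIn(groups)); // [a, b, c]
    }
}
```

An empty inner array simply contributes nothing to the flattened stream, just as an empty array would skip the inner loop body.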
But wait, there's more.
The final forEach operation is a slightly more complicated map operation that converts a Context into a ContextInfo. Furthermore, these are just collected into a List, so we can use collectors to do this instead of creating an empty list up front and then populating it. Applying these refactorings results in the following:
public List<ContextInfo> list() {
final StandardServer server = getServer();
return Arrays.stream(server.findServices())
.filter(service -> service.getContainer() instanceof Engine)
.map(service -> (Engine)service.getContainer())
.flatMap(engine -> Arrays.stream(engine.findChildren()))
.filter(possibleHost -> possibleHost instanceof Host)
.map(possibleHost -> (Host)possibleHost)
.flatMap(host -> Arrays.stream(host.findChildren()))
.filter(possibleContext -> possibleContext instanceof Context)
.map(possibleContext -> (Context)possibleContext)
.map(context -> {
// copy to another object -- not the important part
final ContextInfo info = new ContextInfo(context.getPath());
info.setThisPart(context.getThisPart());
info.setNotImportant(context.getNotImportant());
return info;
})
.collect(Collectors.toList());
}
I usually try to avoid multi-line lambdas (such as in the final map operation) so I'd refactor it into a little helper method that takes a Context and returns a ContextInfo. This doesn't shorten the code at all, but I think it does make it clearer.
UPDATE
But wait, there's still more.
Let's extract the call to service.getContainer() into its own pipeline element:
return Arrays.stream(server.findServices())
.map(service -> service.getContainer())
.filter(container -> container instanceof Engine)
.map(container -> (Engine)container)
.flatMap(engine -> Arrays.stream(engine.findChildren()))
// ...
This exposes the repetition of filtering on instanceof followed by a mapping with a cast. This is done three times in total. It seems likely that other code is going to need to do similar things, so it would be nice to extract this bit of logic into a helper method. The problem is that filter can change the number of elements in the stream (dropping ones that don't match) but it can't change their types. And map can change the types of elements, but it can't change their number. Can something change both the number and types? Yes, it's our old friend flatMap again! So our helper method needs to take an element and return a stream of elements of a different type. That return stream will contain a single casted element (if it matches) or it will be empty (if it doesn't match). The helper function would look like this:
<T,U> Stream<U> toType(T t, Class<U> clazz) {
if (clazz.isInstance(t)) {
return Stream.of(clazz.cast(t));
} else {
return Stream.empty();
}
}
(This is loosely based on C#'s OfType construct mentioned in some of the comments.)
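As a quick check of the helper outside the Tomcat context, the sketch below applies it to a mixed list. The class name ToTypeDemo, the method integersIn, and the sample values are assumptions made for the example; toType itself is the helper shown above, made static here so it can be called from main.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToTypeDemo {
    // Returns a one-element stream if t is an instance of clazz, else an empty stream.
    static <T, U> Stream<U> toType(T t, Class<U> clazz) {
        if (clazz.isInstance(t)) {
            return Stream.of(clazz.cast(t));
        } else {
            return Stream.empty();
        }
    }

    // Keeps only the Integers from a mixed list; the result is typed List<Integer>.
    static List<Integer> integersIn(List<Object> mixed) {
        return mixed.stream()
                .flatMap(o -> toType(o, Integer.class))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Object> mixed = Arrays.asList("x", 1, 2.5, 2, null);
        System.out.println(integersIn(mixed)); // [1, 2]
    }
}
```

Note that null elements are dropped as well, because Class.isInstance returns false for null.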
While we're at it, let's extract a method to create a ContextInfo:
ContextInfo makeContextInfo(Context context) {
// copy to another object -- not the important part
final ContextInfo info = new ContextInfo(context.getPath());
info.setThisPart(context.getThisPart());
info.setNotImportant(context.getNotImportant());
return info;
}
After these extractions, the pipeline looks like this:
return Arrays.stream(server.findServices())
.map(service -> service.getContainer())
.flatMap(container -> toType(container, Engine.class))
.flatMap(engine -> Arrays.stream(engine.findChildren()))
.flatMap(possibleHost -> toType(possibleHost, Host.class))
.flatMap(host -> Arrays.stream(host.findChildren()))
.flatMap(possibleContext -> toType(possibleContext, Context.class))
.map(this::makeContextInfo)
.collect(Collectors.toList());
Nicer, I think, and we've removed the dreaded multi-line statement lambda.
UPDATE: BONUS CHALLENGE
Once again, flatMap is your friend. Take the tail of the stream and migrate it into the last flatMap before the tail. That way the host variable is still in scope, and you can pass it to a makeContextInfo helper method that's been modified to take host as well.
return Arrays.stream(server.findServices())
.map(service -> service.getContainer())
.flatMap(container -> toType(container, Engine.class))
.flatMap(engine -> Arrays.stream(engine.findChildren()))
.flatMap(possibleHost -> toType(possibleHost, Host.class))
.flatMap(host -> Arrays.stream(host.findChildren())
.flatMap(possibleContext -> toType(possibleContext, Context.class))
.map(ctx -> makeContextInfo(ctx, host)))
.collect(Collectors.toList());
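The scoping trick can be seen in a self-contained sketch. The Host class below is a hypothetical stand-in (a name plus an array of child paths), not the Tomcat interface, and describe is an illustrative method name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CarryStateDemo {
    // Hypothetical stand-in for a host: a name plus the paths of its children.
    static class Host {
        final String name;
        final String[] paths;
        Host(String name, String... paths) {
            this.name = name;
            this.paths = paths;
        }
    }

    // Pairs every child path with the name of the host that owns it.
    static List<String> describe(List<Host> hosts) {
        return hosts.stream()
                .flatMap(host -> Arrays.stream(host.paths)
                        // 'host' is still in scope inside the nested pipeline
                        .map(path -> host.name + path))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Host> hosts = Arrays.asList(
                new Host("a.example", "/app", "/admin"),
                new Host("b.example", "/shop"));
        System.out.println(describe(hosts));
        // [a.example/app, a.example/admin, b.example/shop]
    }
}
```

Because the inner map sits inside the lambda passed to flatMap, it can read the enclosing host variable without any extra pair or tuple type.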

This would be my version of your code using JDK 8 streams, method references, and lambda expressions:
Arrays.stream(server.findServices())
.map(Service::getContainer)
.filter(Engine.class::isInstance)
.map(Engine.class::cast)
.flatMap(engine -> Arrays.stream(engine.findChildren()))
.filter(Host.class::isInstance)
.map(Host.class::cast)
.flatMap(host -> Arrays.stream(host.findChildren()))
.filter(Context.class::isInstance)
.map(Context.class::cast)
.map(context -> {
ContextInfo info = new ContextInfo(context.getPath());
info.setThisPart(context.getThisPart());
info.setNotImportant(context.getNotImportant());
return info;
})
.collect(Collectors.toList());
In this approach, I replace your if-statements with filter predicates. Take into account that an instanceof check can be replaced with a Predicate<T>:
Predicate<Object> isEngine = someObject -> someObject instanceof Engine;
which can also be expressed as
Predicate<Object> isEngine = Engine.class::isInstance;
Similarly, your casts can be replaced by Function<T,R>.
Function<Object,Engine> castToEngine = someObject -> (Engine) someObject;
Which is pretty much the same as
Function<Object,Engine> castToEngine = Engine.class::cast;
And adding items manually to a list in the for loop can be replaced with a collector. In production code, the lambda that transforms a Context into a ContextInfo can (and should) be extracted into a separate method, and used as a method reference.
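A small runnable sketch of the two method references, using Number in place of the Tomcat types. The class name MethodRefDemo and the sample list are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class MethodRefDemo {
    // Filters and casts with Class::isInstance and Class::cast instead of lambdas.
    static List<Number> numbersIn(List<Object> mixed) {
        Predicate<Object> isNumber = Number.class::isInstance; // like: o instanceof Number
        Function<Object, Number> toNumber = Number.class::cast; // like: (Number) o
        return mixed.stream()
                .filter(isNumber)
                .map(toNumber)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Object> mixed = Arrays.asList("one", 2, 3.5, null);
        System.out.println(numbersIn(mixed)); // [2, 3.5]
    }
}
```

Like instanceof, Class.isInstance returns false for null, so null elements are filtered out before the cast is reached.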

Solution to bonus challenge
Inspired by @EdwinDalorzo's answer.
public List<ContextInfo> list() {
final StandardServer server = getServer();
return Arrays.stream(server.findServices())
.map(Service::getContainer)
.filter(Engine.class::isInstance)
.map(Engine.class::cast)
.flatMap(engine -> Arrays.stream(engine.findChildren()))
.filter(Host.class::isInstance)
.map(Host.class::cast)
.flatMap(host -> mapContainers(
Arrays.stream(host.findChildren()), host.getName())
)
.collect(Collectors.toList());
}
private static Stream<ContextInfo> mapContainers(Stream<Container> containers,
String hostname) {
return containers
.filter(Context.class::isInstance)
.map(Context.class::cast)
.map(context -> {
ContextInfo info = new ContextInfo(context.getPath());
info.setThisPart(context.getThisPart());
info.setNotImportant(context.getNotImportant());
info.setHostname(hostname); // The Bonus Challenge
return info;
});
}

First attempt beyond ugly. It will be years before I find this readable. Has to be a better way.
Note that the findChildren methods return arrays, which of course work with the for (N n : array) syntax but not with the new Iterable.forEach method, so I had to wrap them with Arrays.asList.
public List<ContextInfo> list() {
final List<ContextInfo> list = new ArrayList<ContextInfo>();
final StandardServer server = getServer();
asList(server.findServices()).forEach(service -> {
if (!(service.getContainer() instanceof Engine)) return;
final Engine engine = (Engine) service.getContainer();
instanceOf(Host.class, asList(engine.findChildren())).forEach(host -> {
instanceOf(Context.class, asList(host.findChildren())).forEach(context -> {
// copy to another object -- not the important part
final ContextInfo info = new ContextInfo(context.getPath());
info.setThisPart(context.getThisPart());
info.setNotImportant(context.getNotImportant());
list.add(info);
});
});
});
return list;
}
The utility methods
public static <T> Iterable<T> instanceOf(final Class<T> type, final Collection collection) {
final Iterator iterator = collection.iterator();
return () -> new SlambdaIterator<>(() -> {
while (iterator.hasNext()) {
final Object object = iterator.next();
if (object != null && type.isAssignableFrom(object.getClass())) {
return (T) object;
}
}
throw new NoSuchElementException();
});
}
And finally a Lambda-powerable implementation of Iterator
public static class SlambdaIterator<T> implements Iterator<T> {
// Ya put your Lambdas in there
public static interface Advancer<T> {
T advance() throws NoSuchElementException;
}
private final Advancer<T> advancer;
private T next;
protected SlambdaIterator(final Advancer<T> advancer) {
this.advancer = advancer;
}
@Override
public boolean hasNext() {
if (next != null) return true;
try {
next = advancer.advance();
return next != null;
} catch (final NoSuchElementException e) {
return false;
}
}
@Override
public T next() {
if (!hasNext()) throw new NoSuchElementException();
final T v = next;
next = null;
return v;
}
@Override
public void remove() {
throw new UnsupportedOperationException();
}
}
Lots of plumbing and no doubt 5x the byte code. Must be a better way.

Related

Filter list contains multiple objects java

The code below works fine for me now, but it is not future-proof because of the number of if-else statements and instanceof checks. I would like to extend the Transport list with more objects such as bicycles, motors, etc., but every time I add a new object I need to add more if-else statements and more instanceof checks. Does anyone have a better idea or a better solution?
private static Transport filterObjects(List<Transport> listOfTransport, int refNr) {
List<Transport> cars = listOfTransport.stream()
.filter(transport -> transport instanceof Cars)
.collect(Collectors.toList());
List<Transport> airPlanes = listOfTransport.stream()
.filter(transport -> transport instanceof Airplanes)
.collect(Collectors.toList());
if (!cars.isEmpty()){
return cars.get(refNr);
} else if (!airPlanes.isEmpty()) {
return airPlanes.get(refNr);
} else {
return null;
}
}
Pass in the subtype you want. Maybe this would work:
private static Transport filterObjects(List<Transport> listOfTransport, Class<? extends Transport> clazz, int refNr) {
List<Transport> transports = listOfTransport.stream().filter(clazz::isInstance).collect(Collectors.toList());
return !transports.isEmpty() ? transports.get(refNr) : null;
}
Just as you currently prioritize cars over planes, as your transport types grow you also need some kind of priority on which to return preferentially. You can solve this with an enum. You only need to expand your enum accordingly as soon as you add a new transport type. The enum could look something like:
enum Priority{
Car(1),
Airplane(2);
private int value;
Priority (int value) {
this.value = value;
}
public int getValue() {
return value;
}
}
Then you can refactor your method by grouping the elements of your list by their simple class names and adding them to a sorted map using the priority you define in your enum. You can then use the first entry of the map to determine the return value. Example:
private static Transport filterObjects(List<Transport> listOfTransport, int refNr) {
Comparator<String> comp = Comparator.comparingInt(e -> Priority.valueOf(e).getValue());
List<Transport> result =
listOfTransport.stream()
.collect(Collectors.groupingBy(
e -> e.getClass().getSimpleName(),
() -> new TreeMap<>(comp),
Collectors.toList()))
.firstEntry().getValue();
return (result != null && 0 <= refNr && refNr < result.size()) ?
result.get(refNr) : null;
}
First group the list elements into a map based on subtype, then create a list of subtypes of transport. Iterate this list and then check if corresponding entry exists in the map:
private static final List<Class> subTypes = List.of(Cars.class, Airplanes.class);
private static Transport filterObjects(List<Transport> listOfTransport, int refNr) {
Map<Class, List<Transport>> map = listOfTransport.stream()
.collect(Collectors.groupingBy(t -> t.getClass()));
Optional<List<Transport>> op = subTypes.stream()
.filter(map::containsKey)
.map(map::get) // look up the grouped list for the first subtype that is present
.findFirst();
if(op.isPresent()) {
return op.get().get(refNr); // This could cause IndexOutOfBoundsException
}else{
return null;
}
}
Well, you could do the following.
First, define your order:
static final List<Class<? extends Transport>> ORDER = List.of(
Car.class,
Airplane.class
);
Then, you could write the following method:
private static Transport filterObjects(List<Transport> listOfTransport, int refNr) {
Map<Class<? extends Transport>, Transport> map = listOfTransport.stream()
.collect(Collectors.groupingBy(Transport::getClass, Collectors.collectingAndThen(Collectors.toList(), list -> list.get(refNr))));
return ORDER.stream()
.filter(map::containsKey)
.map(map::get)
.findFirst()
.orElse(null);
}
This maps each distinct Class to the element at index refNr among the list elements whose runtime class is exactly that class.
Then it walks over ORDER and checks if an element has been found within the original listOfTransport. The key won't exist in the map if listOfTransport does not contain any element of the particular class.
Note that if any element of a particular class exists in the list, the number of elements of that class is assumed to be at least refNr + 1; otherwise an IndexOutOfBoundsException is thrown. In other words, each transport type must occur either 0 or more than refNr times within listOfTransport.
Also note that getClass() does not necessarily yield the same result as instanceof. However, I have assumed here that each respective transport does not have further subclasses.
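The difference between the two checks only shows up once a subclass exists. The sketch below uses a hypothetical SportsCar subclass, which is not part of the question, to make that visible.

```java
public class GetClassVsInstanceof {
    static class Transport { }
    static class Car extends Transport { }
    static class SportsCar extends Car { } // hypothetical subclass, for illustration

    // True for Car and for every subclass of Car.
    static boolean matchesByInstanceof(Transport t) {
        return t instanceof Car;
    }

    // True only if the runtime class is exactly Car.
    static boolean matchesByGetClass(Transport t) {
        return Car.class.equals(t.getClass());
    }

    public static void main(String[] args) {
        Transport t = new SportsCar();
        System.out.println(matchesByInstanceof(t)); // true
        System.out.println(matchesByGetClass(t));   // false
    }
}
```

With grouping by getClass(), a SportsCar would therefore land in its own group rather than in the Car group.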

Optimize Nested-if using any alternative DataStructure in Java

How can I optimize this nested-if block so that the comparison is faster? Below is my code, which compares two different Java objects. One of the member variables is a pattern, and it is checked in one of the if blocks.
listOfFilters is one of the values of a Map<String, List<Filter>>. The method below is invoked as shown. The list can hold roughly 400 to 1000 filters.
checkRequest(incomingRequest,map.get(incomingRequest.getFiltersForThis()))
Problem -
public boolean checkRequest(Request incomingRequest, List<Filter> listOfFilters){
for(Filter filter : listOfFilters){
if(incomingRequest.getName() == filter.getName()){
if(incomingRequest.getOrigen() == filter.getOrigen()){
.....
.....
.....
filterMatched = true;
}
}
}
}
}
}
I need to compare the incoming request as above with each Filter available in the system. O(n) is the complexity.
Is there any way I can use the data structure to reduce the complexity from O(n) to O(log n).
Performance degrades as more filters are configured in the system.
I cannot use hashCode() or equals() because the incomingRequest should still succeed if the corresponding filter field is not set. That is, the incomingRequest must match all the filter values, but where a filter field is missing, that check should simply pass.
public boolean checkMatchOrigen(){
return (filter.getOrigen() == null || filter.getOrigen().isEmpty()) ||
(incomingRequest.getOrigen() != null &&
incomingRequest.getOrigen().trim().equals(filter.getOrigen()));
}
You could create a structure like a decision tree or a database index. This is a rather complicated task.
For example, you have four filters:
Name is n1, origin is o1;
Name is n1, origin is o2;
Name is n2, origin is o1;
Name is n2, origin is o5;
One of possible decision trees is:
or-->nameIs(n1)->and->or-->originIs(o1)
| |->originIs(o2)
|
|->nameIs(n2)->and->or-->originIs(o1)
|->originIs(o5)
The idea is to check 'n1' only once for both filters that include it, and so on. Usually, the strongest filters should be checked first. Again, it's difficult to predict which filter will reject more requests.
For example, I've built the tree from your data structure:
public class DemoApplication {
// Group filter list by names, except nulls
public static Map<String, List<Filter>> mapNameToFilter(List<Filter> filters) {
return filters
.stream()
.filter(filter -> filter.getName() != null)
.collect(groupingBy(Filter::getName));
}
// Create predicate to check name and all chunked origins for all entries
public static Predicate<Request> createPredicateByNameAndOrigin(Map<String, List<Filter>> nameToFilterMap) {
return nameToFilterMap
.keySet()
.stream()
.map(name -> {
final Predicate<Request> filterByName = request -> name.equals(request.getName());
final Map<String, List<Filter>> originToFilterMap = mapOriginToFilter(nameToFilterMap.get(name));
return filterByName.and(createPredicateByOrigin(originToFilterMap));
})
.reduce(Predicate::or)
.orElse(filter -> true);
}
// Group filter list by origins, except nulls
public static Map<String, List<Filter>> mapOriginToFilter(List<Filter> filters) {
return filters
.stream()
.filter(filter -> filter.getOrigin() != null)
.collect(groupingBy(Filter::getOrigin));
}
// Create predicate to check origin for all entries
public static Predicate<Request> createPredicateByOrigin(Map<String, List<Filter>> originToFilterMap) {
return originToFilterMap
.keySet()
.stream()
.map(origin -> {
final Predicate<Request> filterByOrigin = request -> origin.equals(request.getOrigin());
return filterByOrigin; // Or go deeper to create more complex predicate
})
.reduce(Predicate::or)
.orElse(filter -> true);
}
public static void main(String[] args) {
List<Filter> list = new ArrayList<>();
list.add(new Filter("n1", "o1"));
list.add(new Filter("n1", "o2"));
list.add(new Filter("n2", "o1"));
list.add(new Filter("n2", "o5"));
list.add(new Filter(null, "o10"));
list.add(new Filter(null, "o20"));
Predicate<Request> p = createPredicateByNameAndOrigin(mapNameToFilter(list));
System.out.println(p.test(new RequestImpl("n1", "2")));
System.out.println(p.test(new RequestImpl("n1", "1")));
System.out.println(p.test(new RequestImpl("n2", "1")));
System.out.println(p.test(new RequestImpl("n10", "3")));
}
}
I've used JDK Predicates, which can be represented as a tree with operations as nodes. This implementation does not handle null values correctly, but that can easily be added.
Note that my tree is static and needs to be rebuilt after each change to the filter list. It's also not balanced. So it's not a solution, just an example.
If you only need to filter by equality criteria, you could create a map for each field. It's the same grouping idea, applied when checking. In this case, you can update the search maps dynamically:
public class DemoApplication {
public static List<Filter> filters = new ArrayList<>();
public static Map<String, Set<Filter>> nameToFiltersMap = new HashMap<>();
public static Map<String, Set<Filter>> originToFiltersMap = new HashMap<>();
public static void addFilter(Filter filter) {
filters.add(filter);
// Rebuild name index
Set<Filter> nameFilters = nameToFiltersMap.getOrDefault(filter.getName(), new HashSet<>());
nameFilters.add(filter);
nameToFiltersMap.put(filter.getName(), nameFilters);
// Rebuild origin index
Set<Filter> originFilters = originToFiltersMap.getOrDefault(filter.getOrigin(), new HashSet<>());
originFilters.add(filter);
originToFiltersMap.put(filter.getOrigin(), originFilters);
}
public static boolean test(Request request) {
// Get all filters matched by name
Set<Filter> nameFilters = nameToFiltersMap.get(request.getName());
if (nameFilters != null) {
// Get all filters matched by origin
Set<Filter> originFilters = originToFiltersMap.get(request.getOrigin());
for (Filter nameFilter: nameFilters) {
if (originFilters != null && originFilters.contains(nameFilter)) {
return true; //filter matches
}
}
}
return false;
}
public static void main(String[] args){
addFilter(new Filter("n1", "o1"));
addFilter(new Filter("n1", "o2"));
addFilter(new Filter("n2", "o1"));
addFilter(new Filter("n2", "o5"));
addFilter(new Filter(null, "o7"));
addFilter(new Filter(null, "o8"));
System.out.println(test(new RequestImpl(null, "o7")));
System.out.println(test(new RequestImpl(null, "o9")));
System.out.println(test(new RequestImpl("n1", "o1")));
System.out.println(test(new RequestImpl("n1", "o3")));
System.out.println(test(new RequestImpl("n2", "o5")));
System.out.println(test(new RequestImpl("n3", "o3")));
}
}
You could also create a custom tree data structure with dynamic rebuilding and rebalancing. But it may be better to use a database or a search engine.
First, you should not use Object as the type of the request. At least for this question, use an interface having the appropriate methods, so that your code has a chance to compile.
interface Request { ... }
Then, if you have really many filters, you can group these filters by name.
Map<String, List<Filter>> filtersByName = ...;
After that, your filtering code becomes:
String reqName = blankToNull(request.getName());
if (reqName != null) {
List<Filter> nameFilters = filtersByName.get(reqName);
if (anyFilterMatches(nameFilters, request)) {
return Decision.REJECT;
}
}
If any of these filters rejects the request, you're done. Otherwise proceed with the next field.
This pattern will be more efficient if the names of the filters differ a lot.
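A minimal sketch of this grouping idea follows. The Filter class, the anyMatch method, and the rule that a null or empty filter field matches anything are assumptions based on the question's description, not the asker's real types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilterIndexDemo {
    // Hypothetical minimal filter: a null or empty field matches any request value.
    static class Filter {
        final String name;
        final String origin;
        Filter(String name, String origin) {
            this.name = name;
            this.origin = origin;
        }
        boolean matchesOrigin(String reqOrigin) {
            return origin == null || origin.isEmpty() || origin.equals(reqOrigin);
        }
    }

    private final Map<String, List<Filter>> byName = new HashMap<>();
    private final List<Filter> anyName = new ArrayList<>(); // filters without a name

    FilterIndexDemo(List<Filter> filters) {
        for (Filter f : filters) {
            if (f.name == null || f.name.isEmpty()) {
                anyName.add(f);
            } else {
                byName.computeIfAbsent(f.name, k -> new ArrayList<>()).add(f);
            }
        }
    }

    // Examines only the filters whose name can possibly match the request name.
    boolean anyMatch(String reqName, String reqOrigin) {
        List<Filter> candidates = new ArrayList<>(anyName);
        candidates.addAll(byName.getOrDefault(reqName, Collections.emptyList()));
        return candidates.stream().anyMatch(f -> f.matchesOrigin(reqOrigin));
    }

    public static void main(String[] args) {
        FilterIndexDemo index = new FilterIndexDemo(Arrays.asList(
                new Filter("n1", "o1"), new Filter("n1", "o2"), new Filter(null, "o7")));
        System.out.println(index.anyMatch("n1", "o1")); // true
        System.out.println(index.anyMatch("n1", "o3")); // false
        System.out.println(index.anyMatch("n9", "o7")); // true
    }
}
```

The name lookup is a hash lookup, so the remaining linear scan covers only the nameless filters plus the filters sharing the request's name.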

Java Lazy Stream of Strings including List<String>

I'm creating a Stream of String lazily, for the first two simple items. However, part of my stream is List of String.
Stream<String> streamA = Stream.concat(
Stream.generate(item::getStringA),
Stream.generate(item::getStringB));
return Stream.concat(streamA, item.getStringList(param).stream());
The above works, but .getStringList needs to be called lazily as well. It's not clear to me how to fetch it and "merge" it in with the rest of the stream.
I think, what you actually want to do, is
return Stream.<Supplier<Stream<String>>>of(
() -> Stream.of(item.getStringA()),
() -> Stream.of(item.getStringB()),
() -> item.getStringList(param).stream())
.flatMap(Supplier::get);
This produces a fully lazy Stream<String> where, e.g. .limit(0).count() will not call any method on item or .findFirst() will only invoke getStringA(), etc.
The stream’s content will be equivalent to
Stream.concat(
Stream.of(item.getStringA(), item.getStringB()), item.getStringList(param).stream())
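The laziness can be observed by recording which getters run. In the sketch below, the static getters and the calls list are hypothetical stand-ins for the item methods in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

public class LazySupplierDemo {
    // Records which of the (hypothetical) getters were actually invoked.
    static final List<String> calls = new ArrayList<>();

    static String getStringA() { calls.add("A"); return "a"; }
    static String getStringB() { calls.add("B"); return "b"; }
    static List<String> getStringList() { calls.add("list"); return Arrays.asList("c", "d"); }

    // Each source is wrapped in a Supplier, so nothing is fetched until the
    // stream pipeline actually reaches that supplier.
    static Stream<String> lazyStream() {
        return Stream.<Supplier<Stream<String>>>of(
                () -> Stream.of(getStringA()),
                () -> Stream.of(getStringB()),
                () -> getStringList().stream())
            .flatMap(Supplier::get);
    }

    public static void main(String[] args) {
        Stream<String> s = lazyStream();
        System.out.println(calls);               // [] since nothing has been called yet
        System.out.println(s.findFirst().get()); // a
        System.out.println(calls);               // [A]
    }
}
```

Building the stream calls nothing, and the short-circuiting findFirst stops after the first supplier has produced an element.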
I don't think any of this does what you think it does. Stream.generate always generates an infinite stream. But the closest thing to what you want is going to be
StreamSupport.stream(() -> item.getStringList(param).spliterator(), 0, false)
...which will lazily call item.getStringList(param). (What you want isn't really an intended use case of Stream, so it's not very well supported.)
What you could do is return a Stream from your item.getStringList()
public static void main(String[] args) throws InterruptedException {
Item item = new Item();
Stream<String> a = Stream.concat(Stream.of("A"), Stream.of("B"));
Stream<String> anotherStream = Stream.concat(a, item.getStringList());
anotherStream.forEach(System.out::println);
}
private static class Item {
public Stream<String> getStringList() {
List<String> l = new ArrayList<>();
l.add("C");
l.add("D");
l.add("E");
l.add("F");
final AtomicInteger i = new AtomicInteger(0);
return Stream.iterate(l.get(i.get()), (f) -> {
// Proof of laziness
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
// incrementAndGet so the seed (index 0) is not emitted twice
return l.get(i.incrementAndGet());
})
// iterate is unbounded by default
.limit(l.size());
}
}
I'm not sure how helpful that approach would be, though, since your list is still held in memory.

Java Lambda Stream Distinct() on arbitrary key? [duplicate]

I frequently run into a problem with Java lambda expressions: I want to distinct() a stream on an arbitrary property or method of an object, but keep the object rather than map it to that property or method. I started to create containers as discussed here, but I did it often enough that it became annoying and produced a lot of boilerplate classes.
I threw together this Pairing class, which holds two objects of two types and allows you to specify keying off the left, right, or both objects. My question is... is there really no built-in lambda stream function to distinct() on a key supplier of some sorts? That would really surprise me. If not, will this class fulfill that function reliably?
Here is how it would be called
BigDecimal totalShare = orders.stream().map(c -> Pairing.keyLeft(c.getCompany().getId(), c.getShare())).distinct().map(Pairing::getRightItem).reduce(BigDecimal.ZERO, (x,y) -> x.add(y));
Here is the Pairing class
public final class Pairing<X,Y> {
private final X item1;
private final Y item2;
private final KeySetup keySetup;
private static enum KeySetup {LEFT,RIGHT,BOTH};
private Pairing(X item1, Y item2, KeySetup keySetup) {
this.item1 = item1;
this.item2 = item2;
this.keySetup = keySetup;
}
public X getLeftItem() {
return item1;
}
public Y getRightItem() {
return item2;
}
public static <X,Y> Pairing<X,Y> keyLeft(X item1, Y item2) {
return new Pairing<X,Y>(item1, item2, KeySetup.LEFT);
}
public static <X,Y> Pairing<X,Y> keyRight(X item1, Y item2) {
return new Pairing<X,Y>(item1, item2, KeySetup.RIGHT);
}
public static <X,Y> Pairing<X,Y> keyBoth(X item1, Y item2) {
return new Pairing<X,Y>(item1, item2, KeySetup.BOTH);
}
public static <X,Y> Pairing<X,Y> forItems(X item1, Y item2) {
return keyBoth(item1, item2);
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
if (keySetup.equals(KeySetup.LEFT) || keySetup.equals(KeySetup.BOTH)) {
result = prime * result + ((item1 == null) ? 0 : item1.hashCode());
}
if (keySetup.equals(KeySetup.RIGHT) || keySetup.equals(KeySetup.BOTH)) {
result = prime * result + ((item2 == null) ? 0 : item2.hashCode());
}
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Pairing<?,?> other = (Pairing<?,?>) obj;
if (keySetup.equals(KeySetup.LEFT) || keySetup.equals(KeySetup.BOTH)) {
if (item1 == null) {
if (other.item1 != null)
return false;
} else if (!item1.equals(other.item1))
return false;
}
if (keySetup.equals(KeySetup.RIGHT) || keySetup.equals(KeySetup.BOTH)) {
if (item2 == null) {
if (other.item2 != null)
return false;
} else if (!item2.equals(other.item2))
return false;
}
return true;
}
}
UPDATE:
Tested Stuart's function below and it seems to work great. The operation below distincts on the first letter of each string. The only part I'm trying to figure out is how the ConcurrentHashMap maintains only one instance for the entire stream
public class DistinctByKey {
public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
Map<Object,Boolean> seen = new ConcurrentHashMap<>();
return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
public static void main(String[] args) {
final ImmutableList<String> arpts = ImmutableList.of("ABQ","ALB","CHI","CUN","PHX","PUJ","BWI");
arpts.stream().filter(distinctByKey(f -> f.substring(0,1))).forEach(s -> System.out.println(s));
}
}
Output is...
ABQ
CHI
PHX
BWI
The distinct operation is a stateful pipeline operation; in this case it's a stateful filter. It's a bit inconvenient to create these yourself, as there's nothing built-in, but a small helper class should do the trick:
/**
* Stateful filter. T is type of stream element, K is type of extracted key.
*/
static class DistinctByKey<T,K> {
Map<K,Boolean> seen = new ConcurrentHashMap<>();
Function<T,K> keyExtractor;
public DistinctByKey(Function<T,K> ke) {
this.keyExtractor = ke;
}
public boolean filter(T t) {
return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
}
I don't know your domain classes, but I think that, with this helper class, you could do what you want like this:
BigDecimal totalShare = orders.stream()
.filter(new DistinctByKey<Order,CompanyId>(o -> o.getCompany().getId())::filter)
.map(Order::getShare)
.reduce(BigDecimal.ZERO, BigDecimal::add);
Unfortunately the type inference couldn't get far enough inside the expression, so I had to specify explicitly the type arguments for the DistinctByKey class.
This involves more setup than the collectors approach described by Louis Wasserman, but this has the advantage that distinct items pass through immediately instead of being buffered up until the collection completes. Space should be the same, as (unavoidably) both approaches end up accumulating all distinct keys extracted from the stream elements.
UPDATE
It's possible to get rid of the K type parameter since it's not actually used for anything other than being stored in a map. So Object is sufficient.
/**
* Stateful filter. T is type of stream element.
*/
static class DistinctByKey<T> {
Map<Object,Boolean> seen = new ConcurrentHashMap<>();
Function<T,Object> keyExtractor;
public DistinctByKey(Function<T,Object> ke) {
this.keyExtractor = ke;
}
public boolean filter(T t) {
return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
}
BigDecimal totalShare = orders.stream()
.filter(new DistinctByKey<Order>(o -> o.getCompany().getId())::filter)
.map(Order::getShare)
.reduce(BigDecimal.ZERO, BigDecimal::add);
This simplifies things a bit, but I still had to specify the type argument to the constructor. Trying to use diamond or a static factory method doesn't seem to improve things. I think the difficulty is that the compiler can't infer generic type parameters -- for a constructor or a static method call -- when either is in the instance expression of a method reference. Oh well.
(Another variation on this that would probably simplify it is to make DistinctByKey<T> implement Predicate<T> and rename the method to test, the name of the single abstract method in Predicate. This would remove the need to use a method reference and would probably improve type inference. However, it's unlikely to be as nice as the solution below.)
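A minimal sketch of that variation, assuming the method is named test as in the released java.util.function.Predicate. The class name DistinctByKeyPredicate and the airport codes are only for illustration, and the type argument is still written out explicitly to keep the example conservative.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Stateful filter that can be passed to filter() directly, without "::filter". */
class DistinctByKeyPredicate<T> implements Predicate<T> {
    private final Map<Object, Boolean> seen = new ConcurrentHashMap<>();
    private final Function<T, Object> keyExtractor;

    DistinctByKeyPredicate(Function<T, Object> ke) {
        this.keyExtractor = ke;
    }

    @Override
    public boolean test(T t) {
        // true only for the first element that produces a given key
        return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    /** Hypothetical usage: keep the first airport code per initial letter. */
    static List<String> firstPerInitial() {
        return Stream.of("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI")
                .filter(new DistinctByKeyPredicate<String>(s -> s.substring(0, 1)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPerInitial()); // [ABQ, CHI, PHX, BWI]
    }
}
```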
UPDATE 2
Can't stop thinking about this. Instead of a helper class, use a higher-order function. We can use captured locals to maintain state, so we don't even need a separate class! Bonus, things are simplified so type inference works!
public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
Map<Object,Boolean> seen = new ConcurrentHashMap<>();
return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
BigDecimal totalShare = orders.stream()
.filter(distinctByKey(o -> o.getCompany().getId()))
.map(Order::getShare)
.reduce(BigDecimal.ZERO, BigDecimal::add);
You more or less have to do something like
elements.stream()
.collect(Collectors.toMap(
obj -> extractKey(obj),
obj -> obj,
(first, second) -> first
// pick the first if multiple values have the same key
)).values().stream();
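A self-contained sketch of the same idea. One assumption differs from the snippet above: it passes LinkedHashMap::new as the map supplier so that the surviving elements keep their encounter order, which the three-argument toMap does not promise. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapDistinctDemo {
    /** Keeps the first element seen for each extracted key, in encounter order. */
    static <T, K> List<T> distinctByKey(List<T> elements, Function<T, K> extractKey) {
        Map<K, T> firstPerKey = elements.stream()
                .collect(Collectors.toMap(
                        extractKey,
                        obj -> obj,
                        (first, second) -> first, // pick the first if keys collide
                        LinkedHashMap::new));     // assumption: preserve insertion order
        return new ArrayList<>(firstPerKey.values());
    }

    public static void main(String[] args) {
        List<String> arpts = Arrays.asList("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI");
        System.out.println(distinctByKey(arpts, s -> s.substring(0, 1))); // [ABQ, CHI, PHX, BWI]
    }
}
```

Note that nothing is emitted until the whole input has been collected into the map.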
Another way of finding distinct elements:
List<String> uniqueObjects = ImmutableList.of("ABQ","ALB","CHI","CUN","PHX","PUJ","BWI")
.stream()
.collect(Collectors.groupingBy((p)->p.substring(0,1))) //expression
.values()
.stream()
.flatMap(e->e.stream().limit(1))
.collect(Collectors.toList());
A variation on Stuart Marks' second update, using a Set:
public static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
Set<Object> seen = Collections.newSetFromMap(new ConcurrentHashMap<>());
return t -> seen.add(keyExtractor.apply(t));
}
We can also use RxJava (a very powerful reactive extensions library):
Observable.from(persons).distinct(Person::getName)
or
Observable.from(persons).distinct(p -> p.getName())
To answer your question in your second update:
The only part I'm trying to figure out is how the ConcurrentHashMap maintains only one instance for the entire stream:
public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
Map<Object,Boolean> seen = new ConcurrentHashMap<>();
return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
In your code sample, distinctByKey is only invoked one time, so the ConcurrentHashMap is created just once. Here's an explanation:
The distinctByKey function is just a plain-old function that returns an object, and that object happens to be a Predicate. Keep in mind that a predicate is basically a piece of code that can be evaluated later. To manually evaluate a predicate, you must call a method in the Predicate interface such as test. So, the predicate
t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null
is merely a declaration that is not actually evaluated inside distinctByKey.
The predicate is passed around just like any other object. It is returned and passed into the filter operation, which basically evaluates the predicate repeatedly against each element of the stream by calling test.
I'm sure filter is more complicated than I made it out to be, but the point is, the predicate is evaluated many times outside of distinctByKey. There's nothing special* about distinctByKey; it's just a function that you've called one time, so the ConcurrentHashMap is only created one time.
*Apart from being well made, @stuart-marks :)
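A small sketch that makes the "called one time" point observable. The class name and the probe method are hypothetical. Each call to distinctByKey creates its own map, so two predicates obtained from two calls do not share state, while repeated evaluations of the same predicate do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

class DistinctByKeyStateDemo {
    static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
        // Runs once per call to distinctByKey, not once per stream element
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        // The lambda captures 'seen' and is evaluated later via test()
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    /** Evaluates two independently created predicates by hand. */
    static List<Boolean> probe() {
        Predicate<String> p1 = distinctByKey(s -> s.substring(0, 1));
        Predicate<String> p2 = distinctByKey(s -> s.substring(0, 1));
        return Arrays.asList(
                p1.test("ABQ"),  // first "A" seen by p1's map
                p1.test("ALB"),  // second "A" for the same map
                p2.test("ALB")); // p2 has its own, still empty, map
    }

    public static void main(String[] args) {
        System.out.println(probe()); // [true, false, true]
    }
}
```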
You can use the distinct(HashingStrategy) method in Eclipse Collections.
List<String> list = Lists.mutable.with("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI");
ListIterate.distinct(list, HashingStrategies.fromFunction(s -> s.substring(0, 1)))
.each(System.out::println);
If you can refactor list to implement an Eclipse Collections interface, you can call the method directly on the list.
MutableList<String> list = Lists.mutable.with("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI");
list.distinct(HashingStrategies.fromFunction(s -> s.substring(0, 1)))
.each(System.out::println);
HashingStrategy is simply a strategy interface that allows you to define custom implementations of equals and hashcode.
public interface HashingStrategy<E>
{
int computeHashCode(E object);
boolean equals(E object1, E object2);
}
Note: I am a committer for Eclipse Collections.
Set.add(element) returns true if the set did not already contain element, otherwise false.
So you can do it like this:
Set<String> set = new HashSet<>();
BigDecimal totalShare = orders.stream()
.filter(c -> set.add(c.getCompany().getId()))
.map(c -> c.getShare())
.reduce(BigDecimal.ZERO, BigDecimal::add);
If you want to run this in parallel, you must use a concurrent set instead (for example, one backed by a ConcurrentHashMap).
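For the parallel case, a minimal sketch under stated assumptions: the Order class here is a hypothetical stand-in with a String company id and a BigDecimal share, and every order of the same company is assumed to carry the same share, since a parallel run does not guarantee which of the duplicates gets through. The set comes from ConcurrentHashMap.newKeySet().

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ParallelDistinctDemo {
    /** Hypothetical stand-in for the real domain class. */
    static final class Order {
        final String companyId;
        final BigDecimal share;

        Order(String companyId, BigDecimal share) {
            this.companyId = companyId;
            this.share = share;
        }
    }

    static BigDecimal totalShare(List<Order> orders) {
        // Thread-safe set: add() is atomic and returns true only for the first insert
        Set<String> seen = ConcurrentHashMap.newKeySet();
        return orders.parallelStream()
                .filter(o -> seen.add(o.companyId))
                .map(o -> o.share)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("acme", new BigDecimal("10")),
                new Order("acme", new BigDecimal("10")),
                new Order("globex", new BigDecimal("20")));
        System.out.println(totalShare(orders)); // 30
    }
}
```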
It can be done with something like this:
Set<String> distinctCompany = orders.stream()
.map(Order::getCompany)
.collect(Collectors.toSet());

Is there an elegant way to remove nulls while transforming a Collection using Guava?

I have a question about simplifying some Collection handling code, when using Google Collections (update: Guava).
I've got a bunch of "Computer" objects, and I want to end up with a Collection of their "resource id"s. This is done like so:
Collection<Computer> matchingComputers = findComputers();
Collection<String> resourceIds =
Lists.newArrayList(Iterables.transform(matchingComputers, new Function<Computer, String>() {
public String apply(Computer from) {
return from.getResourceId();
}
}));
Now, getResourceId() may return null (and changing that is not an option right now), yet in this case I'd like to omit nulls from the resulting String collection.
Here's one way to filter nulls out:
Collections2.filter(resourceIds, new Predicate<String>() {
@Override
public boolean apply(String input) {
return input != null;
}
});
You could put all that together like this:
Collection<String> resourceIds = Collections2.filter(
Lists.newArrayList(Iterables.transform(matchingComputers, new Function<Computer, String>() {
public String apply(Computer from) {
return from.getResourceId();
}
})), new Predicate<String>() {
@Override
public boolean apply(String input) {
return input != null;
}
});
But this is hardly elegant, let alone readable, for such a simple task! In fact, plain old Java code (with no fancy Predicate or Function stuff at all) would arguably be much cleaner:
Collection<String> resourceIds = Lists.newArrayList();
for (Computer computer : matchingComputers) {
String resourceId = computer.getResourceId();
if (resourceId != null) {
resourceIds.add(resourceId);
}
}
Using the above is certainly also an option, but out of curiosity (and desire to learn more of Google Collections), can you do the exact same thing in some shorter or more elegant way using Google Collections?
There's already a predicate in Predicates that will help you here -- Predicates.notNull() -- and you can use Iterables.filter() and the fact that Lists.newArrayList() can take an Iterable to clean this up a little more.
Collection<String> resourceIds = Lists.newArrayList(
Iterables.filter(
Iterables.transform(matchingComputers, yourFunction),
Predicates.notNull()
)
);
If you don't actually need a Collection, just an Iterable, then the Lists.newArrayList() call can go away too and you're one step cleaner again!
I suspect you might find that the Function will come in handy again, and will be most useful declared as
public class Computer {
// ...
public static Function<Computer, String> TO_ID = ...;
}
which cleans this up even more (and will promote reuse).
A bit "prettier" syntax with FluentIterable (since Guava 12):
ImmutableList<String> resourceIds = FluentIterable.from(matchingComputers)
.transform(getResourceId)
.filter(Predicates.notNull())
.toList();
static final Function<Computer, String> getResourceId =
new Function<Computer, String>() {
@Override
public String apply(Computer computer) {
return computer.getResourceId();
}
};
Note that the returned list is an ImmutableList. However, you can use the copyInto() method to pour the elements into an arbitrary collection.
It took longer than @Jon Skeet expected, but Java 8 streams do make this simple:
List<String> resourceIds = computers.stream()
.map(Computer::getResourceId)
.filter(Objects::nonNull)
.collect(Collectors.toList());
You can also use .filter(x -> x != null) if you like; the difference is very minor.
Firstly, I'd create a constant filter somewhere:
public static final Predicate<Object> NULL_FILTER = new Predicate<Object>() {
@Override
public boolean apply(Object input) {
return input != null;
}
};
Then you can use:
Iterable<String> ids = Iterables.transform(matchingComputers,
new Function<Computer, String>() {
public String apply(Computer from) {
return from.getResourceId();
}
});
Collection<String> resourceIds = Lists.newArrayList(
Iterables.filter(ids, NULL_FILTER));
You can use the same null filter everywhere in your code.
If you use the same computing function elsewhere, you can make that a constant too, leaving just:
Collection<String> resourceIds = Lists.newArrayList(
Iterables.filter(
Iterables.transform(matchingComputers, RESOURCE_ID_PROJECTION),
NULL_FILTER));
It's certainly not as nice as the C# equivalent would be, but this is all going to get a lot nicer in Java 7 with closures and extension methods :)
You could write your own method like so. This will filter out nulls for any Function that returns null from its apply method.
public static <F, T> Collection<T> transformAndFilterNulls(List<F> fromList, Function<? super F, ? extends T> function) {
return Collections2.filter(Lists.transform(fromList, function), Predicates.<T>notNull());
}
The method can then be called with the following code.
Collection<Long> c = transformAndFilterNulls(Lists.newArrayList("", "SD", "DDF"), new Function<String, Long>() {
@Override
public Long apply(String s) {
return s.isEmpty() ? 20L : null;
}
});
System.err.println(c);
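For comparison, a JDK-only sketch of the same helper written with Java 8 streams instead of Guava. This swaps in a different technique: the Guava version above returns a lazy view over the input list, while this one builds an eager copy. The class name is hypothetical and Function here is java.util.function.Function.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

class NullFilteringTransform {
    /** Applies the function to every element and drops results that are null. */
    static <F, T> List<T> transformAndFilterNulls(List<F> fromList,
                                                  Function<? super F, ? extends T> function) {
        return fromList.stream()
                .<T>map(function)          // transform each element
                .filter(Objects::nonNull)  // omit null results
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> c = transformAndFilterNulls(
                Arrays.asList("", "SD", "DDF"),
                s -> s.isEmpty() ? 20L : null);
        System.out.println(c); // [20]
    }
}
```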
