I have a project where, in different scenarios, I have to work on different subsets of a large dataset. The way I have written the code, there is a Collector interface, and a class DataCollector implements Collector. The class DataCollector is instantiated with the condition of the subset-creation, and these conditions are enums.
Let's say the dataset is a set of 1 million English words, and I want to work on the subset of words with an odd number of letters. Then I do the following:
DataCollector dataCollector = new DataCollector(CollectionType.WORDS_OF_ODD_LENGTH);
Set<String> oddLengthWords = dataCollector.collect();
where CollectionType is the enum class
enum CollectionType {
WORDS_OF_ODD_LENGTH,
WORDS_OF_EVEN_LENGTH,
STARTING_WITH_VOWEL,
STARTING_WITH_CONSONANT,
....
}
The data collector calls a java.util.function.Predicate depending on the enum with which it was instantiated.
So far, this approach has been robust and flexible enough, but now I am facing increasingly complex scenarios (e.g., collect words of even length starting with a vowel). I would like to avoid adding new CollectionType for every such scenario. What I have noticed is that many of these complex scenarios are just logical operations on the simpler ones (e.g., condition_1 && (condition_2 || condition_3)).
The end-user is the one who specifies these conditions, and the only control I have is that I can specify the set of such conditions. As in, the end-user can only select from CollectionType. Right now, I am trying to generalize from the ability to select only one condition to the ability to select one or more. For that, I need something like
DataCollector dataCollector = new DataCollector(WORDS_OF_ODD_LENGTH &&
STARTING_WITH_VOWEL);
Is there a way I can model my enums to carry out such operations? I am open to other ideas (as in, should I just scrap this enum-based approach for something else, etc.).
I suggest you use Java 8 which has Predicate and operations supporting predicates.
enum CollectionType implements Predicate<String> {
WORDS_OF_ODD_LENGTH(s -> s.length() % 2 != 0),
WORDS_OF_EVEN_LENGTH(WORDS_OF_ODD_LENGTH.negate()),
STARTING_WITH_VOWEL(s -> isVowel(s.charAt(0))),
STARTING_WITH_CONSONANT(STARTING_WITH_VOWEL.negate()),
COMPLEX_CHECK(CollectionType::complexCheck);
private final Predicate<String> predicate;
CollectionType(Predicate<String> predicate) {
this.predicate = predicate;
}
static boolean isVowel(char c) {
return "AEIOUaeiou".indexOf(c) >= 0;
}
public boolean test(String s) {
return predicate.test(s);
}
public static boolean complexCheck(String s) {
// many lines of code, calling many methods
}
}
Then you can write a Predicate like
Predicate<String> p = WORDS_OF_ODD_LENGTH.and(STARTING_WITH_CONSONANT);
or even five letter words starting with a vowel
Predicate<String> p = STARTING_WITH_VOWEL.and(s -> s.length() == 5);
Say you wanted to use this filter on reading the file, you can do
List<String> oddWords = Files.lines(path).filter(WORDS_OF_ODD_LENGTH).collect(toList());
Or you could index them as you load them with
Map<Integer, List<String>> wordsBySize = Files.lines(path)
.collect(groupingBy(s -> s.length()));
Even though you have made your enum a Predicate, you can still optimise its usage like this.
if (predicate == WORDS_OF_ODD_LENGTH || predicate == WORDS_OF_EVEN_LENGTH) {
// all words in a list have the same length, so if the first word
// matches, take all words of that length.
return wordsBySize.values().stream()
.filter(l -> predicate.test(l.get(0)))
.flatMap(l -> l.stream()).collect(toList());
} else {
return wordsBySize.values().stream()
.flatMap(l -> l.stream())
.filter(predicate)
.collect(toList());
}
That is, by using an enum you can recognise specific predicates and optimise for them. (Whether that is a good idea or not I will leave to you.)
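To tie this back to the question's `condition_1 && (condition_2 || condition_3)` case, here is a minimal, self-contained sketch. The `Condition` enum, the `collect` helper, and the sample words are illustrative assumptions standing in for `CollectionType` and `DataCollector`; they are not part of the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ConditionDemo {

    // Hypothetical stand-in for CollectionType; each constant wraps a predicate.
    enum Condition implements Predicate<String> {
        ODD_LENGTH(s -> s.length() % 2 != 0),
        STARTS_WITH_VOWEL(s -> !s.isEmpty() && "AEIOUaeiou".indexOf(s.charAt(0)) >= 0),
        LONGER_THAN_FIVE(s -> s.length() > 5);

        private final Predicate<String> predicate;

        Condition(Predicate<String> predicate) {
            this.predicate = predicate;
        }

        @Override
        public boolean test(String s) {
            return predicate.test(s);
        }
    }

    // Hypothetical collector: keeps the words accepted by the given predicate.
    static List<String> collect(List<String> words, Predicate<String> condition) {
        return words.stream().filter(condition).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words =
                Arrays.asList("apple", "banana", "egg", "orange", "kiwi", "umbrella");

        // STARTS_WITH_VOWEL && (ODD_LENGTH || LONGER_THAN_FIVE)
        Predicate<String> combined = Condition.STARTS_WITH_VOWEL
                .and(Condition.ODD_LENGTH.or(Condition.LONGER_THAN_FIVE));

        System.out.println(collect(words, combined));
    }
}
```

Because the collector accepts any `Predicate<String>`, the end user can still be restricted to the enum constants while the combinations are built with `and`, `or`, and `negate`.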
In the following code, a local method is called on every element of a HashSet. If it returns a special value we halt the loop. Otherwise we add every return value to a new HashSet.
HashSet<Object> myHashSet=…;
HashSet<Object> mySecondHashSet=…;
for (Object s : myHashSet) {
Object value = my_method(s);
if(value==specialValue)
return value;
else
mySecondHashSet.add(value);
}
I’d like to parallelize this process. None of the objects in the HashSet have any objects in common (it’s a tree-like structure), so I know they can run without any synchronization issues. How do I modify the code so that each call of my_method(s) starts a new thread, and so that if one of the threads evaluates to the special value, all the threads halt without returning and the special value is returned?
With Java 8 this can be relatively simple, although it won't preserve the semantics of your original code.
If all you need is to return the special value once you hit it:
if (myHashSet.parallelStream()
.map(x -> method(x))
.anyMatch(x -> x == specialValue)) {
return specialValue;
}
If you need to keep the transformed values until you hit the special value, you already have an answer from @Elliot in the comments. Note that its semantics are not the same as your original code, since no order will be preserved.
I have not verified it, but I would expect the following to short-circuit and stop once it hits the special value:
if (myHashSet.parallelStream()
.anyMatch(x -> method(x) == specialValue)) {
return specialValue;
}
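One way to check the short-circuiting expectation is to count how often the mapping method runs. The sketch below is only an illustration: `SPECIAL`, `myMethod`, and the `calls` counter are hypothetical stand-ins for the question's `specialValue` and `my_method`. With a sequential stream over this list the count stops at the third element; with a parallel stream the count can vary from run to run.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class AnyMatchDemo {

    // Hypothetical special value, compared by identity as in the question.
    static final Object SPECIAL = new Object();

    // Counts how many times myMethod was invoked in the last run.
    static final AtomicInteger calls = new AtomicInteger();

    // Hypothetical stand-in for my_method: returns SPECIAL for the input 3.
    static Object myMethod(Integer x) {
        calls.incrementAndGet();
        return x == 3 ? SPECIAL : x;
    }

    static boolean containsSpecial(List<Integer> input, boolean parallel) {
        calls.set(0);
        Stream<Integer> stream = parallel ? input.parallelStream() : input.stream();
        // anyMatch is a short-circuiting terminal operation.
        return stream.anyMatch(x -> myMethod(x) == SPECIAL);
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        System.out.println("sequential: " + containsSpecial(input, false)
                + ", calls = " + calls.get());
        System.out.println("parallel:   " + containsSpecial(input, true)
                + ", calls = " + calls.get());
    }
}
```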
I would do that in two passes:
find if any of the transformed set elements matches the special value;
transform them to a Set.
Starting a new thread for each transformation is way too heavy, and will bring your machine to its knees (unless you have very few elements, in which case parallelizing is probably not worth the effort).
To avoid transforming the values twice with my_method, you can do the transformation lazily and memoize the result:
private class Memoized {
private Object value;
private Object transformed;
private Function<Object, Object> transform;
public Memoized(Object value, Function<Object, Object> transform) {
this.value = value;
this.transform = transform; // without this, getTransformed() throws a NullPointerException
}
public Object getTransformed() {
if (transformed == null) {
transformed = transform.apply(value);
}
return transformed;
}
}
And then you can use the following code:
Set<Memoized> memoized =
myHashSet.stream() // no need to go parallel here
.map(o -> new Memoized(o, this::my_method))
.collect(Collectors.toSet());
Optional<Memoized> matching = memoized.parallelStream()
.filter(m -> m.getTransformed().equals(specialValue))
.findAny();
if (matching.isPresent()) {
return matching.get().getTransformed();
}
Set<Object> allTransformed =
memoized.parallelStream()
.map(m -> m.getTransformed())
.collect(Collectors.toSet());
I have an entity Employee
class Employee{
private String name;
private String addr;
private String sal;
}
Now I have a list of these employees. I want to filter out the objects that have name = null and set addr = 'A' on the rest. I was able to achieve that as follows:
List<Employee> list2= list.stream()
.filter(l -> l.getName() != null)
.peek(l -> l.setAddr("A"))
.collect(Collectors.toList());
Now list2 contains all employees whose name is not null, each with addr set to A.
I also want to find the employees that were filtered out (name == null) and save them in the DB. One way I achieved this is shown below:
List<Employee> list2= list.stream()
.filter(l -> filter(l))
.peek(l -> l.setAddr("A"))
.collect(Collectors.toList());
private static boolean filter(Employee l){
boolean j = l.getName() != null;
if(!j) {
// save in db
}
return j;
}
1) Is this the right way?
2) Can we do this directly in lambda expression instead of writing separate method?
Generally, you should not use side effects in behavioral parameters. See the sections “Stateless behaviors” and “Side-effects” of the package documentation. Also, it’s not recommended to use peek for non-debugging purposes, see “In Java streams is peek really only for debugging?”
There’s not much advantage in trying to squeeze all these different operations into a single Stream pipeline. Consider the clean alternative:
Map<Boolean,List<Employee>> m = list.stream()
.collect(Collectors.partitioningBy(l -> l.getName() != null));
m.get(false).forEach(l -> {
// save in db
});
List<Employee> list2 = m.get(true);
list2.forEach(l -> l.setAddr("A"));
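For experimenting with the partitioning approach, a self-contained sketch could look like the following. The trimmed-down `Employee` class and the `savedToDb` list (a stand-in for the real persistence call) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionDemo {

    // Trimmed-down, hypothetical version of the Employee entity.
    static class Employee {
        private final String name;
        private String addr;

        Employee(String name) { this.name = name; }

        String getName() { return name; }
        String getAddr() { return addr; }
        void setAddr(String addr) { this.addr = addr; }
    }

    // Hypothetical stand-in for the database.
    static final List<Employee> savedToDb = new ArrayList<>();

    static List<Employee> process(List<Employee> employees) {
        // One pass splits the list into "has a name" (true) and "no name" (false).
        Map<Boolean, List<Employee>> byHasName = employees.stream()
                .collect(Collectors.partitioningBy(e -> e.getName() != null));

        // Side effects happen outside the stream pipeline.
        savedToDb.addAll(byHasName.get(false));
        List<Employee> named = byHasName.get(true);
        named.forEach(e -> e.setAddr("A"));
        return named;
    }

    public static void main(String[] args) {
        List<Employee> result = process(Arrays.asList(
                new Employee("Ann"), new Employee(null), new Employee("Bob")));
        System.out.println(result.size() + " kept, " + savedToDb.size() + " saved");
    }
}
```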
Regarding your second question, a lambda expression allows almost everything a method does. The differences are in the declaration, i.e. you can’t declare additional type parameters or annotate the return type. Still, you should avoid putting too much code into a lambda expression since, of course, you can’t create test cases that call that code directly. But that’s a matter of programming style, not a technical limitation.
If you are okay in using peek for implementing your logic (though it is not recommended unless for learning), you can do the following:
List<Employee> list2= list.stream()
.peek(l -> { // add this peek to do persistence
if(l.getName()==null){
persistInDB(l);
}
}).filter(l -> l.getName() != null)
.peek(l -> l.setAddr("A"))
.collect(Collectors.toList());
You can also do something like this:
List<Employee> list2 = list.stream()
.filter(l->{
boolean condition = l.getName()!=null;
if(condition){
l.setAddr("A");
} else {
persistInDB(l);
}
return condition;
})
.collect(Collectors.toList());
Hope this helps!
I am looking for some help in converting some code I have to use the really nifty Java 8 Stream library. Essentially I have a bunch of student objects and I would like to get back a list of filtered objects as seen below:
List<Integer> classRoomList;
Set<ScienceStudent> filteredStudents = new HashSet<>();
//Return only 5 students in the end
int limit = 5;
for (MathStudent s : mathStudents)
{
// Get the scienceStudent with the same id as the math student
ScienceStudent ss = scienceStudents.get(s.getId());
if (classRoomList.contains(ss.getClassroomId()))
{
if (!exclusionStudents.contains(ss))
{
if (limit > 0)
{
filteredStudents.add(ss);
limit--;
}
}
}
}
Of course, the above is a super contrived example I made up for the sake of learning more Java 8. Assume all students extend a Student object with studentId and classRoomId. An additional requirement is that the result be an immutable set.
A quite literal translation (and the required classes to play around)
interface ScienceStudent {
String getClassroomId();
}
interface MathStudent {
String getId();
}
Set<ScienceStudent> filter(
Collection<MathStudent> mathStudents,
Map<String, ScienceStudent> scienceStudents,
Set<ScienceStudent> exclusionStudents,
List<String> classRoomList) {
return mathStudents.stream()
.map(s -> scienceStudents.get(s.getId()))
.filter(ss -> classRoomList.contains(ss.getClassroomId()))
.filter(ss -> !exclusionStudents.contains(ss))
.limit(5)
.collect(Collectors.toSet());
}
Multiple conditions to filter really just translate into multiple .filter calls or a combined big filter like ss -> classRoomList.contains(ss.getClassroomId()) && !exclusion...
Regarding the immutable set: it is best to wrap the result manually, because collect expects a mutable collection that can be filled from the stream and returned once finished. I don't see an easy way to do that directly with streams.
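One possible way to do the wrapping inside the pipeline, offered as a sketch rather than a definitive answer, is `Collectors.collectingAndThen` with `Collections::unmodifiableSet` as the finishing step. The `firstEvens` helper and its integer data are hypothetical and only stand in for the student filter.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class UnmodifiableSetDemo {

    // Hypothetical helper: keeps at most `limit` even numbers in an unmodifiable set.
    static Set<Integer> firstEvens(List<Integer> numbers, int limit) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .limit(limit)
                // Collect into a mutable set first, then wrap it as the final step.
                .collect(Collectors.collectingAndThen(
                        Collectors.toSet(),
                        Collections::unmodifiableSet));
    }

    public static void main(String[] args) {
        Set<Integer> evens = firstEvens(Arrays.asList(1, 2, 3, 4, 5, 6), 2);
        System.out.println(evens);
        try {
            evens.add(8);
        } catch (UnsupportedOperationException e) {
            System.out.println("set is read-only");
        }
    }
}
```

The wrapper rejects modification through the returned reference; the mutable set underneath is never exposed, so nothing else can change it.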
The null paranoid version
return mathStudents.stream().filter(Objects::nonNull) // math students could be null
.map(MathStudent::getId).filter(Objects::nonNull) // their id could be null
.map(scienceStudents::get).filter(Objects::nonNull) // and the mapped science student
.filter(ss -> classRoomList.contains(ss.getClassroomId()))
.filter(ss -> !exclusionStudents.contains(ss))
.limit(5)
.collect(Collectors.toSet());
I have a list of objects with many duplicates and some fields that need to be merged. I want to reduce this down to a list of unique objects using only Java 8 Streams (I know how to do this via old-skool means but this is an experiment.)
This is what I have right now. I don't really like this because the map-building seems extraneous and the values() collection is a view of the backing map, and you need to wrap it in a new ArrayList<>(...) to get a more specific collection. Is there a better approach, perhaps using the more general reduction operations?
@Test
public void reduce() {
Collection<Foo> foos = Stream.of("foo", "bar", "baz")
.flatMap(this::getfoos)
.collect(Collectors.toMap(f -> f.name, f -> f, (l, r) -> {
l.ids.addAll(r.ids);
return l;
})).values();
assertEquals(3, foos.size());
foos.forEach(f -> assertEquals(10, f.ids.size()));
}
private Stream<Foo> getfoos(String n) {
return IntStream.range(0,10).mapToObj(i -> new Foo(n, i));
}
public static class Foo {
private String name;
private List<Integer> ids = new ArrayList<>();
public Foo(String n, int i) {
name = n;
ids.add(i);
}
}
If you break the grouping and reducing steps up, you can get something cleaner:
Stream<Foo> input = Stream.of("foo", "bar", "baz").flatMap(this::getfoos);
Map<String, Optional<Foo>> collect = input.collect(Collectors.groupingBy(f -> f.name, Collectors.reducing(Foo::merge)));
Collection<Optional<Foo>> collected = collect.values();
This assumes a few convenience methods in your Foo class:
public Foo(String n, List<Integer> ids) {
this.name = n;
this.ids.addAll(ids);
}
public static Foo merge(Foo src, Foo dest) {
List<Integer> merged = new ArrayList<>();
merged.addAll(src.ids);
merged.addAll(dest.ids);
return new Foo(src.name, merged);
}
As already pointed out in the comments, a map is a very natural thing to use when you want to identify unique objects. If all you needed to do was find the unique objects, you could use the Stream::distinct method. This method hides the fact that there is a map involved, but apparently it does use a map internally, as hinted by this question that shows you should implement a hashCode method or distinct may not behave correctly.
In the case of the distinct method, where no merging is necessary, it is possible to return some of the results before all of the input has been processed. In your case, unless you can make additional assumptions about the input that haven't been mentioned in the question, you do need to finish processing all of the input before you return any results. Thus this answer does use a map.
It is easy enough to use streams to process the values of the map and turn it back into an ArrayList, though. I show that in this answer, as well as providing a way to avoid the appearance of an Optional<Foo>, which shows up in one of the other answers.
public void reduce() {
ArrayList<Foo> foos = Stream.of("foo", "bar", "baz").flatMap(this::getfoos)
.collect(Collectors.collectingAndThen(Collectors.groupingBy(f -> f.name,
Collectors.reducing(Foo.identity(), Foo::merge)),
map -> map.values().stream().
collect(Collectors.toCollection(ArrayList::new))));
assertEquals(3, foos.size());
foos.forEach(f -> assertEquals(10, f.ids.size()));
}
private Stream<Foo> getfoos(String n) {
return IntStream.range(0, 10).mapToObj(i -> new Foo(n, i));
}
public static class Foo {
private String name;
private List<Integer> ids = new ArrayList<>();
private static final Foo BASE_FOO = new Foo("", 0);
public static Foo identity() {
return BASE_FOO;
}
// use only if side effects to the argument objects are okay
public static Foo merge(Foo fooOne, Foo fooTwo) {
if (fooOne == BASE_FOO) {
return fooTwo;
} else if (fooTwo == BASE_FOO) {
return fooOne;
}
fooOne.ids.addAll(fooTwo.ids);
return fooOne;
}
public Foo(String n, int i) {
name = n;
ids.add(i);
}
}
If the input elements are supplied in the random order, then having intermediate map is probably the best solution. However if you know in advance that all the foos with the same name are adjacent (this condition is actually met in your test), the algorithm can be greatly simplified: you just need to compare the current element with the previous one and merge them if the name is the same.
Unfortunately there's no Stream API method which would allow you to do such a thing easily and efficiently. One possible solution is to write a custom collector like this:
public static List<Foo> withCollector(Stream<Foo> stream) {
return stream.collect(Collector.<Foo, List<Foo>>of(ArrayList::new,
(list, t) -> {
Foo f;
if(list.isEmpty() || !(f = list.get(list.size()-1)).name.equals(t.name))
list.add(t);
else
f.ids.addAll(t.ids);
},
(l1, l2) -> {
if(l1.isEmpty())
return l2;
if(l2.isEmpty())
return l1;
if(l1.get(l1.size()-1).name.equals(l2.get(0).name)) {
l1.get(l1.size()-1).ids.addAll(l2.get(0).ids);
l1.addAll(l2.subList(1, l2.size()));
} else {
l1.addAll(l2);
}
return l1;
}));
}
My tests show that this collector is always faster than collecting to map (up to 2x depending on average number of duplicate names), both in sequential and parallel mode.
Another approach is to use my StreamEx library which provides a bunch of "partial reduction" methods including collapse:
public static List<Foo> withStreamEx(Stream<Foo> stream) {
return StreamEx.of(stream)
.collapse((l, r) -> l.name.equals(r.name), (l, r) -> {
l.ids.addAll(r.ids);
return l;
}).toList();
}
This method accepts two arguments: a BiPredicate which is applied for two adjacent elements and should return true if elements should be merged and the BinaryOperator which performs merging. This solution is a little bit slower in sequential mode than the custom collector (in parallel the results are very similar), but it's still significantly faster than toMap solution and it's simpler and somewhat more flexible as collapse is an intermediate operation, so you can collect in another way.
Again, both of these solutions work only if foos with the same name are known to be adjacent. It's a bad idea to sort the input stream by foo name and then use these solutions, because the sorting will drastically reduce performance, making it slower than the toMap solution.
As already pointed out by others, an intermediate Map is unavoidable, as that’s the way of finding the objects to merge. Further, you should not modify source data during reduction.
Nevertheless, you can achieve both without creating multiple Foo instances:
List<Foo> foos = Stream.of("foo", "bar", "baz")
.flatMap(n->IntStream.range(0,10).mapToObj(i -> new Foo(n, i)))
.collect(collectingAndThen(groupingBy(f -> f.name),
m->m.entrySet().stream().map(e->new Foo(e.getKey(),
e.getValue().stream().flatMap(f->f.ids.stream()).collect(toList())))
.collect(toList())));
This assumes that you add a constructor
public Foo(String n, List<Integer> l) {
name = n;
ids=l;
}
to your Foo class, as it should have if Foo is really supposed to be capable of holding a list of IDs. As a side note, having a type which serves as a single item as well as a container for merged results seems unnatural to me. This is exactly why the code turns out to be so complicated.
If the source items had a single id, something like groupingBy(f -> f.name, mapping(f -> f.id, toList())), followed by mapping the (String, List<Integer>) entries to the merged items, would be sufficient.
Since this is not the case and Java 8 lacks the flatMapping collector, the flatmapping step is moved to the second step, making it look much more complicated.
But in both cases, the second step is not obsolete as it is where the result items are actually created and converting the map to the desired list type comes for free.
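The single-id case mentioned above can be sketched as follows. `Item` and `Merged` are hypothetical classes introduced only for this illustration, and the `LinkedHashMap::new` map factory is my own addition to keep the groups in encounter order; the plain two-argument `groupingBy` works as well if order does not matter.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SingleIdMergeDemo {

    // Hypothetical source item carrying exactly one id.
    static class Item {
        final String name;
        final int id;

        Item(String name, int id) { this.name = name; this.id = id; }
    }

    // Hypothetical merged result carrying all ids seen for one name.
    static class Merged {
        final String name;
        final List<Integer> ids;

        Merged(String name, List<Integer> ids) { this.name = name; this.ids = ids; }
    }

    static List<Merged> merge(List<Item> items) {
        // Step 1: group by name and collect the ids of each group.
        Map<String, List<Integer>> idsByName = items.stream()
                .collect(Collectors.groupingBy(
                        i -> i.name,
                        LinkedHashMap::new,
                        Collectors.mapping(i -> i.id, Collectors.toList())));

        // Step 2: create one merged item per map entry.
        return idsByName.entrySet().stream()
                .map(e -> new Merged(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Merged> merged = merge(Arrays.asList(
                new Item("foo", 1), new Item("bar", 2), new Item("foo", 3)));
        merged.forEach(m -> System.out.println(m.name + " -> " + m.ids));
    }
}
```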
I know in Java 8, I can do filtering like this :
List<User> olderUsers = users.stream().filter(u -> u.age > 30).collect(Collectors.toList());
But what if I have a collection and half a dozen filtering criteria, and I want to test the combination of the criteria ?
For example I have a collection of objects and the following criteria :
<1> Size
<2> Weight
<3> Length
<4> Top 50% by a certain order
<5> Top 20% by another ratio
<6> True or false by yet another criterion
And I want to test the combination of the above criteria, something like :
<1> -> <2> -> <3> -> <4> -> <5>
<1> -> <2> -> <3> -> <5> -> <4>
<1> -> <2> -> <5> -> <4> -> <3>
...
<1> -> <5> -> <3> -> <4> -> <2>
<3> -> <2> -> <1> -> <4> -> <5>
...
<5> -> <4> -> <3> -> <2> -> <1>
If each testing order may give me different results, how to write a loop to automatically filter through all the combinations ?
What I can think of is to use another method that generates the testing order like the following :
int[][] getTestOrder(int criteriaCount)
{
...
}
So if the criteriaCount is 2, it will return : {{1,2},{2,1}}
If the criteriaCount is 3, it will return : {{1,2,3},{1,3,2},{2,1,3},{2,3,1},{3,1,2},{3,2,1}}
...
But then how to most efficiently implement it with the filtering mechanism in concise expressions that comes with Java 8 ?
Interesting problem. There are several things going on here. No doubt this could be solved in less than half a page of Haskell or Lisp, but this is Java, so here we go....
One issue is that we have a variable number of filters, whereas most of the examples that have been shown illustrate fixed pipelines.
Another issue is that some of the OP's "filters" are context sensitive, such as "top 50% by a certain order". This can't be done with a simple filter(predicate) construct on a stream.
The key is to realize that, while lambdas allow functions to be passed as arguments (to good effect) it also means that they can be stored in data structures and computations can be performed on them. The most common computation is to take multiple functions and compose them.
Assume that the values being operated on are instances of Widget, which is a POJO that has some obvious getters:
class Widget {
String name() { ... }
int length() { ... }
double weight() { ... }
// constructors, fields, toString(), etc.
}
Let's start off with the first issue and figure out how to operate with a variable number of simple predicates. We can create a list of predicates like this:
List<Predicate<Widget>> allPredicates = Arrays.asList(
w -> w.length() >= 10,
w -> w.weight() > 40.0,
w -> w.name().compareTo("c") > 0);
Given this list, we can permute them (probably not useful, since they're order independent) or select any subset we want. Let's say we just want to apply all of them. How do we apply a variable number of predicates to a stream? There is a Predicate.and() method that will take two predicates and combine them using a logical and, returning a single predicate. So we could take the first predicate and write a loop that combines it with the successive predicates to build up a single predicate that's a composite and of them all:
Predicate<Widget> compositePredicate = allPredicates.get(0);
for (int i = 1; i < allPredicates.size(); i++) {
compositePredicate = compositePredicate.and(allPredicates.get(i));
}
This works, but it fails if the list is empty, and since we're doing functional programming now, mutating a variable in a loop is déclassé. But lo! This is a reduction! We can reduce all the predicates over the and operator to get a single composite predicate, like this:
Predicate<Widget> compositePredicate =
allPredicates.stream()
.reduce(w -> true, Predicate::and);
(Credit: I learned this technique from @venkat_s. If you ever get a chance, go see him speak at a conference. He's good.)
Note the use of w -> true as the identity value of the reduction. (This could also be used as the initial value of compositePredicate for the loop, which would fix the zero-length list case.)
Now that we have our composite predicate, we can write out a short pipeline that simply applies the composite predicate to the widgets:
widgetList.stream()
.filter(compositePredicate)
.forEach(System.out::println);
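To see the empty-list behaviour of the reduction in isolation, here is a small sketch that wraps it in a generic helper. The `allOf` name and the string predicates are assumptions chosen for illustration; they do not come from the Widget example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class ComposeDemo {

    // Combines any number of predicates with logical AND.
    // With an empty list, the identity (always true) is returned.
    static <T> Predicate<T> allOf(List<Predicate<T>> predicates) {
        return predicates.stream().reduce(x -> true, Predicate::and);
    }

    public static void main(String[] args) {
        List<Predicate<String>> predicates = Arrays.asList(
                s -> s.length() > 2,
                s -> s.startsWith("a"));

        Predicate<String> both = allOf(predicates);
        System.out.println(both.test("apple"));   // both conditions hold
        System.out.println(both.test("pear"));    // does not start with "a"

        // No predicates at all: every element passes.
        Predicate<String> none = allOf(new ArrayList<>());
        System.out.println(none.test("anything"));
    }
}
```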
Context Sensitive Filters
Now let's consider what I referred to as a "context sensitive" filter, which is represented by the example like "top 50% in a certain order", say the top 50% of widgets by weight. "Context sensitive" isn't the best term for this but it's what I've got at the moment, and it is somewhat descriptive in that it's relative to the number of elements in the stream up to this point.
How would we implement something like this using streams? Unless somebody comes up with something really clever, I think we have to collect the elements somewhere first (say, in a list) before we can emit the first element to the output. It's kind of like sorted() in a pipeline which can't tell which is the first element to output until it has read every single input element and has sorted them.
The straightforward approach to finding the top 50% of widgets by weight, using streams, would look something like this:
List<Widget> temp =
list.stream()
.sorted(comparing(Widget::weight).reversed())
.collect(toList());
temp.stream()
.limit((long)(temp.size() * 0.5))
.forEach(System.out::println);
This isn't complicated, but it's a bit cumbersome as we have to collect the elements into a list and assign it to a variable, in order to use the list's size in the 50% computation.
This is limiting, though, in that it's a "static" representation of this kind of filtering. How would we chain this into a stream with a variable number of elements (other filters or criteria) like we did with the predicates?
An important observation is that this code does its actual work in between the consumption of a stream and the emitting of a stream. It happens to have a collector in the middle, but if you chain a stream to its front and chain stuff off its back end, nobody is the wiser. In fact, the standard stream pipeline operations like map and filter each take a stream as input and emit a stream as output. So we can write a function kind of like this ourselves:
Stream<Widget> top50PercentByWeight(Stream<Widget> stream) {
List<Widget> temp =
stream.sorted(comparing(Widget::weight).reversed())
.collect(toList());
return temp.stream()
.limit((long)(temp.size() * 0.5));
}
A similar example might be to find the shortest three widgets:
Stream<Widget> shortestThree(Stream<Widget> stream) {
return stream.sorted(comparing(Widget::length))
.limit(3);
}
Now we can write something that combines these stateful filters with ordinary stream operations:
shortestThree(
top50PercentByWeight(
widgetList.stream()
.filter(w -> w.length() >= 10)))
.forEach(System.out::println);
This works, but is kind of lousy because it reads "inside-out" and backwards. The stream source is widgetList which is streamed and filtered through an ordinary predicate. Now, going backwards, the top 50% filter is applied, then the shortest-three filter is applied, and finally the stream operation forEach is applied at the end. This works but is quite confusing to read. And it's still static. What we really want is to have a way to put these new filters inside a data structure that we can manipulate, for example, to run all the permutations, as in the original question.
A key insight at this point is that these new kinds of filters are really just functions, and we have functional interface types in Java which let us represent functions as objects, to manipulate them, store them in data structures, compose them, etc. The functional interface type that takes an argument of some type and returns a value of the same type is UnaryOperator. The argument and return type in this case is Stream<Widget>. If we were to take method references such as this::shortestThree or this::top50PercentByWeight, the types of the resulting objects would be
UnaryOperator<Stream<Widget>>
If we were to put these into a list, the type of that list would be
List<UnaryOperator<Stream<Widget>>>
Ugh! Three levels of nested generics is too much for me. (But Aleksey Shipilev did once show me some code that used four levels of nested generics.) The solution for too much generics is to define our own type. Let's call one of our new things a Criterion. It turns out that there's little value to be gained by making our new functional interface type be related to UnaryOperator, so our definition can simply be:
@FunctionalInterface
public interface Criterion {
Stream<Widget> apply(Stream<Widget> s);
}
Now we can create a list of criteria like this:
List<Criterion> criteria = Arrays.asList(
this::shortestThree,
this::lengthGreaterThan20
);
(We'll figure out how to use this list below.) This is a step forward, since we can now manipulate the list dynamically, but it's still somewhat limiting. First, it can't be combined with ordinary predicates. Second, there's a lot of hard-coded values here, such as the shortest three: how about two or four? How about a different criterion than length? What we really want is a function that creates these Criterion objects for us. This is easy with lambdas.
This creates a criterion that selects the top N widgets, given a comparator:
Criterion topN(Comparator<Widget> cmp, long n) {
return stream -> stream.sorted(cmp).limit(n);
}
This creates a criterion that selects the top p percent of widgets, given a comparator:
Criterion topPercent(Comparator<Widget> cmp, double pct) {
return stream -> {
List<Widget> temp =
stream.sorted(cmp).collect(toList());
return temp.stream()
.limit((long)(temp.size() * pct));
};
}
And this creates a criterion from an ordinary predicate:
Criterion fromPredicate(Predicate<Widget> pred) {
return stream -> stream.filter(pred);
}
Now we have a very flexible way of creating criteria and putting them into a list, where they can be subsetted or permuted or whatever:
List<Criterion> criteria = Arrays.asList(
fromPredicate(w -> w.length() > 10), // longer than 10
topN(comparing(Widget::length).reversed(), 4L), // longest 4
topPercent(comparing(Widget::weight).reversed(), 0.50) // heaviest 50%
);
Once we have a list of Criterion objects, we need to figure out a way to apply all of them. Once again, we can use our friend reduce to combine all of them into a single Criterion object:
Criterion allCriteria =
criteria.stream()
.reduce(c -> c, (c1, c2) -> (s -> c2.apply(c1.apply(s))));
The identity function c -> c is clear, but the second arg is a bit tricky. Given a stream s we first apply Criterion c1, then Criterion c2, and this is wrapped in a lambda that takes two Criterion objects c1 and c2 and returns a lambda that applies the composition of c1 and c2 to a stream and returns the resulting stream.
Now that we've composed all the criteria, we can apply it to a stream of widgets like so:
allCriteria.apply(widgetList.stream())
.forEach(System.out::println);
This is still a bit inside-out, but it's fairly well controlled. Most importantly, it addresses the original question, which is how to combine criteria dynamically. Once the Criterion objects are in a data structure, they can be selected, subsetted, permuted, or whatever as necessary, and they can all be combined in a single criterion and applied to a stream using the above techniques.
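As a rough illustration of the permuting step that the original question asked about, the sketch below generates every ordering of a list of criteria and applies each ordering to the same input. It uses `UnaryOperator<Stream<Integer>>` on plain integers instead of the `Criterion` and `Widget` types, and the `permutations` and `applyInOrder` helpers are hypothetical names of my own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PermuteCriteriaDemo {

    // Returns every ordering of the given items (n! lists for n items).
    static <T> List<List<T>> permutations(List<T> items) {
        List<List<T>> result = new ArrayList<>();
        if (items.isEmpty()) {
            result.add(new ArrayList<>());   // exactly one ordering of nothing
            return result;
        }
        for (int i = 0; i < items.size(); i++) {
            List<T> rest = new ArrayList<>(items);
            T head = rest.remove(i);         // pick the i-th item as the first element
            for (List<T> tail : permutations(rest)) {
                List<T> perm = new ArrayList<>();
                perm.add(head);
                perm.addAll(tail);
                result.add(perm);
            }
        }
        return result;
    }

    // Applies the criteria to a fresh stream of the input, in the given order.
    static List<Integer> applyInOrder(List<UnaryOperator<Stream<Integer>>> criteria,
                                      List<Integer> input) {
        Stream<Integer> stream = input.stream();
        for (UnaryOperator<Stream<Integer>> criterion : criteria) {
            stream = criterion.apply(stream);
        }
        return stream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(5, 12, 7, 20, 3, 15);

        UnaryOperator<Stream<Integer>> greaterThanFive = s -> s.filter(n -> n > 5);
        UnaryOperator<Stream<Integer>> smallestThree = s -> s.sorted().limit(3);
        List<UnaryOperator<Stream<Integer>>> criteria =
                Arrays.asList(greaterThanFive, smallestThree);

        // Order matters: filtering first and limiting first give different results.
        for (List<UnaryOperator<Stream<Integer>>> order : permutations(criteria)) {
            System.out.println(applyInOrder(order, input));
        }
    }
}
```

Each ordering gets its own stream from the source list, since a stream can be consumed only once. The number of orderings grows factorially, so this is practical only for a handful of criteria.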
The functional programming gurus are probably saying "He just reinvented ... !" which is probably true. I'm sure this has probably been invented somewhere already, but it's new to Java, because prior to lambda, it just wasn't feasible to write Java code that uses these techniques.
Update 2014-04-07
I've cleaned up and posted the complete sample code in a gist.
We could add a counter with a map step so we know how many elements remain after the filters. I created a helper class that has a method that counts and returns the same object passed:
class DoNothingButCount<T> {
AtomicInteger i;
public DoNothingButCount() {
i = new AtomicInteger(0);
}
public T pass(T p) {
i.incrementAndGet();
return p;
}
}
public void runDemo() {
List<Person>persons = create(100);
DoNothingButCount<Person> counter = new DoNothingButCount<>();
persons.stream().filter(u -> u.size > 12).filter(u -> u.weitght > 12).
map((p) -> counter.pass(p)).
sorted((p1, p2) -> p1.age - p2.age).
collect(Collectors.toList()).stream().
limit((int) (counter.i.intValue() * 0.5)).
sorted((p1, p2) -> p2.length - p1.length).
limit((int) (counter.i.intValue() * 0.5 * 0.2)).forEach((p) -> System.out.println(p));
}
I had to convert the stream to a list and back to a stream in the middle because the limit would use the initial count otherwise. It is all a bit "hackish", but it is all I could think of.
I could do it a bit differently using a Function for my mapped class:
class DoNothingButCount<T > implements Function<T, T> {
AtomicInteger i;
public DoNothingButCount() {
i = new AtomicInteger(0);
}
public T apply(T p) {
i.incrementAndGet();
return p;
}
}
The only thing will change in the stream is:
map((p) -> counter.pass(p)).
will become:
map(counter).
My complete test class including the two examples:
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;
public class Demo2 {
Random r = new Random();
class Person {
public int size, weitght,length, age;
public Person(int s, int w, int l, int a){
this.size = s;
this.weitght = w;
this.length = l;
this.age = a;
}
public String toString() {
return "P: "+this.size+", "+this.weitght+", "+this.length+", "+this.age+".";
}
}
public List<Person>create(int size) {
List<Person>persons = new ArrayList<>();
while(persons.size()<size) {
persons.add(new Person(r.nextInt(10)+10, r.nextInt(10)+10, r.nextInt(10)+10,r.nextInt(20)+14));
}
return persons;
}
class DoNothingButCount<T> {
AtomicInteger i;
public DoNothingButCount() {
i = new AtomicInteger(0);
}
public T pass(T p) {
i.incrementAndGet();
return p;
}
}
class PDoNothingButCount<T > implements Function<T, T> {
AtomicInteger i;
public PDoNothingButCount() {
i = new AtomicInteger(0);
}
public T apply(T p) {
i.incrementAndGet();
return p;
}
}
public void runDemo() {
List<Person>persons = create(100);
PDoNothingButCount<Person> counter = new PDoNothingButCount<>();
persons.stream().filter(u -> u.size > 12).filter(u -> u.weitght > 12).
map(counter).
sorted((p1, p2) -> p1.age - p2.age).
collect(Collectors.toList()).stream().
limit((int) (counter.i.intValue() * 0.5)).
sorted((p1, p2) -> p2.length - p1.length).
limit((int) (counter.i.intValue() * 0.5 * 0.2)).forEach((p) -> System.out.println(p));
}
public void runDemo2() {
List<Person>persons = create(100);
DoNothingButCount<Person> counter = new DoNothingButCount<>();
persons.stream().filter(u -> u.size > 12).filter(u -> u.weitght > 12).
map((p) -> counter.pass(p)).
sorted((p1, p2) -> p1.age - p2.age).
collect(Collectors.toList()).stream().
limit((int) (counter.i.intValue() * 0.5)).
sorted((p1, p2) -> p2.length - p1.length).
limit((int) (counter.i.intValue() * 0.5 * 0.2)).forEach((p) -> System.out.println(p));
}
public static void main(String str[]) {
Demo2 demo = new Demo2();
System.out.println("Demo 2:");
demo.runDemo2();
System.out.println("Demo 1:");
demo.runDemo();
}
}