Loop fusion of Stream in Java 8 (how it works internally) - java

I'm reading the book 'Java 8 in Action', and I saw this example of a Stream in the book:
List<String> names = menu.stream()
        .filter(d -> {
            System.out.println("filtering " + d.getName());
            return d.getCalories() > 300;
        })
        .map(d -> {
            System.out.println("mapping " + d.getName());
            return d.getName();
        })
        .limit(3)
        .collect(toList());
When the code is executed, the result is as follows.
filtering __1__.
mapping __1__.
filtering __2__.
mapping __2__.
filtering __3__.
mapping __3__.
That is, because of limit(3), the log messages are printed only three times!
In this book, this is called "loop fusion."
But I don't understand this.
Because if you want to know whether an object is filtered out, you have to calculate the filtering function. So a "filtering ..." message should be printed for every evaluated element, I think.
Please explain how loop fusion works internally.

“Because if you want to know whether an object is filtered out, you have to calculate the filtering function” is right, but perhaps your sample data wasn’t sufficient to illustrate the point. If you try
List<String> result = Stream.of("java", "streams", "are", "great", "stuff")
        .filter(s -> {
            System.out.println("filtering " + s);
            return s.length() >= 4;
        })
        .map(s -> {
            System.out.println("mapping " + s);
            return s.toUpperCase();
        })
        .limit(3)
        .collect(Collectors.toList());
System.out.println("Result:");
result.forEach(System.out::println);
it will print
filtering java
mapping java
filtering streams
mapping streams
filtering are
filtering great
mapping great
Result:
JAVA
STREAMS
GREAT
This shows that:
In order to find three elements matching the filter, you might have to evaluate more than three elements (here, four elements are evaluated), but you don’t need to evaluate any further elements once you have three matches.
The subsequent mapping function only needs to be applied to matching elements. This allows us to conclude that it is irrelevant whether .map(…).limit(…) or .limit(…).map(…) was specified.
The relative position of .filter and .limit, by contrast, is relevant.
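To illustrate that last point with the same sample data: moving limit(3) in front of the filter changes the result, because the limit then counts source elements rather than matches. A quick sketch:
List<String> swapped = Stream.of("java", "streams", "are", "great", "stuff")
        .limit(3)                      // truncates the source to: java, streams, are
        .filter(s -> s.length() >= 4)  // keeps: java, streams
        .map(String::toUpperCase)
        .collect(Collectors.toList()); // [JAVA, STREAMS] instead of [JAVA, STREAMS, GREAT]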
The term “loop fusion” implies that there is not a filtering loop, followed by a mapping loop, followed by a limit operation, but only one loop (conceptually) performing the entire work, equivalent to the following single loop:
String[] source = { "java", "streams", "are", "great", "stuff" };
List<String> result = new ArrayList<>();
int limit = 3;
for (String s : source) {
    System.out.println("filtering " + s);
    if (s.length() >= 4) {
        System.out.println("mapping " + s);
        result.add(s.toUpperCase());
        // stop right after the third match, just like the short-circuiting limit(3)
        if (--limit == 0) break;
    }
}

I think you got it slightly wrong: the fact that the pipeline stops after three matches is actually called short-circuiting (limit is a short-circuiting operation).
Loop fusion means that filter and map are executed in a single pass; these two operations were merged into a single one that is executed for each element.
You do not see output like this:
filtering
filtering
filtering
mapping
mapping
mapping
Instead you see filter followed immediately by a map; so these two operations were merged into a single one.
Generally you should not care how that is done internally (the stream builds a pipeline of these operations), because this might change and is implementation-specific.
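To make the "pipeline of operations" idea concrete, here is a minimal sketch of the general technique (an illustration only, not the actual JDK implementation, which uses internal Sink objects plus a cancellation flag for true short-circuiting): each stage wraps the next stage's Consumer, so a single loop over the source drives filter, map and limit together.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class FusionSketch {
    public static void main(String[] args) {
        List<String> source = Arrays.asList("java", "streams", "are", "great", "stuff");
        List<String> result = new ArrayList<>();

        Consumer<String> collect = result::add;                    // terminal stage
        int[] remaining = {3};                                     // state for limit(3)
        Consumer<String> limit = s -> {                            // limit stage (no real cancellation here)
            if (remaining[0]-- > 0) collect.accept(s);
        };
        Consumer<String> map = s -> limit.accept(s.toUpperCase()); // map stage
        Consumer<String> filter = s -> {                           // filter stage
            if (s.length() >= 4) map.accept(s);
        };

        for (String s : source) filter.accept(s);                  // one fused loop
        System.out.println(result);                                // [JAVA, STREAMS, GREAT]
    }
}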

Related

How to collect data from a stream in different lists based on a condition?

I have a stream of data as shown below and I wish to collect the data based on a condition.
Stream of data:
452857;0;L100;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L120;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L121;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L126;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L100;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L122;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
I wish to collect the data based on the index = 2 (L100,L121 ...) and store it in different lists of L120,L121,L122 etc using Java 8 streams. Any suggestions?
Note: splittedLine array below is my stream of data.
For instance: I have tried the following but I think there's a shorter way:
List<String> L100_ENTITY_NAMES = Arrays.asList("L100", "L120", "L121", "L122", "L126");
List<List<String>> list = L100_ENTITY_NAMES.stream()
        .map(entity -> Arrays.stream(splittedLine)
                .filter(line -> {
                    String[] values = line.split(String.valueOf(DELIMITER));
                    if (values.length > 0) {
                        return entity.equals(values[2]);
                    } else {
                        return false;
                    }
                })
                .collect(Collectors.toList()))
        .collect(Collectors.toList());
I'd rather change the order and also collect the data into a Map<String, List<String>> where the key would be the entity name.
Assuming splittedLine is the array of lines, I'd probably do something like this:
Set<String> L100_ENTITY_NAMES = Set.of("L100", ...);
String delimiter = String.valueOf(DELIMITER);
Map<String, List<String>> result =
        Arrays.stream(splittedLine)
                .map(line -> {
                    String[] values = line.split(delimiter);
                    if (values.length < 3) {
                        return null;
                    }
                    return new AbstractMap.SimpleEntry<>(values[2], line);
                })
                .filter(Objects::nonNull)
                .filter(entry -> L100_ENTITY_NAMES.contains(entry.getKey()))
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
Note that this isn't necessarily shorter but has a couple of other advantages:
It's not O(n*m) but roughly O(n), since the Set lookup is a constant-time operation, so it should be faster for non-trivial stream sizes
You get an entity name for each list rather than having to rely on matching indices in two lists
It's easier to understand because it uses distinct steps:
split and map the line
filter out null values, i.e. lines that aren't valid in the first place
filter out lines that don't have any of the L100 entity names
collect the filtered lines by entity name so you can easily access the sublists
I would convert the semicolon-delimited lines to objects as soon as possible, instead of keeping them around as a serialized bunch of data.
First, I would create a model modelling our data:
public record LBasedEntity(long id, int zero, String lcode, …) { }
Then, create a method to parse the line. This could also be done with an external parsing library, since this looks like CSV with a semicolon delimiter.
private static LBasedEntity parse(String line) {
    String[] parts = line.split(";");
    if (parts.length < 3) {
        return null;
    }
    long id = Long.parseLong(parts[0]);
    int zero = Integer.parseInt(parts[1]);
    String lcode = parts[2];
    …
    return new LBasedEntity(id, zero, lcode, …);
}
Then the mapping is trivial:
Map<String, List<LBasedEntity>> result = Arrays.stream(lines)
        .map(line -> parse(line))
        .filter(Objects::nonNull)
        .filter(lBasedEntity -> L100_ENTITY_NAMES.contains(lBasedEntity.lcode()))
        .collect(Collectors.groupingBy(LBasedEntity::lcode));
map(line -> parse(line)) parses the line into an LBasedEntity object (or whatever you call it);
filter(Objects::nonNull) filters out all null values produced by the parse method;
The next filter selects all entities of which the lcode property is contained in the L100_ENTITY_NAMES list (I would turn this into a Set, to speed things up);
Then a Map is built with key-value pairs of L100_ENTITY_NAME → List<LBasedEntity>.
You're effectively asking for what languages like Scala provide on collections: groupBy. In Scala you could write:
splitLines.groupBy(_(2)) // Map[String, List[String]]
Of course, you want this in Java, and in my opinion, not using streams here makes sense due to Java's lack of a fold or groupBy function.
HashMap<String, ArrayList<String>> map = new HashMap<>();
for (String[] line : splitLines) {
    if (line.length < 3) continue; // we access line[2] below
    ArrayList<String> xs = map.getOrDefault(line[2], new ArrayList<>());
    xs.addAll(Arrays.asList(line));
    map.put(line[2], xs);
}
As you can see, it's very easy to understand, and actually shorter than the stream based solution.
I'm leveraging two key methods on a HashMap.
The first is getOrDefault; basically, if the value associated with our key doesn't exist, we can provide a default, in our case an empty ArrayList.
The second is put, which actually acts like a putOrReplace because it lets us override the previous value associated with the key.
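As an aside, the getOrDefault/put pair can be collapsed into a single Map.computeIfAbsent call. A minimal sketch (assuming the usual java.util imports; the pre-split sample data here is made up):
String[][] splitLines = {
        { "452857", "0", "L100", "csO" },
        { "452857", "0", "L120", "csO" },
        { "452857", "0", "L100", "csO" },
};
Map<String, List<String>> map = new HashMap<>();
for (String[] line : splitLines) {
    if (line.length < 3) continue;
    // creates and stores an empty list on first access, then returns the stored one
    map.computeIfAbsent(line[2], k -> new ArrayList<>()).addAll(Arrays.asList(line));
}
System.out.println(map.keySet()); // [L100, L120]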
I hope that was helpful. :)
You're asking for a shorter way to achieve the same thing; actually, your code is fine. I guess the only part that makes it look lengthy is the if/else check in the stream.
if (values.length > 0) {
    return entity.equals(values[2]);
} else {
    return false;
}
I would suggest introducing two tiny private methods to improve readability, like this:
List<List<String>> list = L100_ENTITY_NAMES.stream()
        .map(entity -> getLinesByEntity(splittedLine, entity))
        .collect(Collectors.toList());

private List<String> getLinesByEntity(String[] splittedLine, String entity) {
    return Arrays.stream(splittedLine)
            .filter(line -> isLineMatched(entity, line))
            .collect(Collectors.toList());
}

private boolean isLineMatched(String entity, String line) {
    String[] values = line.split(String.valueOf(DELIMITER));
    return values.length > 2 && entity.equals(values[2]);
}

How to loop through stream groups in Java to perform operations on Strings in each group

I have a sorted ArrayList A and used streams to group by the substring(3,7). I'm using a large dataset, so I don't know all the different substring(3,7) there are.
ArrayList A for example looks something like this (but with a lot more data): ooo122ppp, aaa122b333, zzz122bmmm, ccc9o9i333, mmm9o9i111, qqqQmQm888, 777QmQmlll, vvvjjj1sss
I need to loop through each group so that I can do something with that grouped data. I've tried for loops, if statements, etc., but can't figure it out. I tried the following, but I get an error on the for loop. How can I loop through each group to perform operations on the Strings in each group?
Collection<List<String>> grouped = A.stream()
        .collect(groupingBy(ex -> ex.substring(3, 7)))
        .values();
for (int g = 0; g < grouped.forEach(); g++) { // does not compile
    // do something
}
You can use the forEach method on Collection. It should not be confused with a regular for loop.
grouped.forEach(group -> {
    group.forEach(str -> {
        // do something
    });
});
Are you looking for something like this?
List<String> stringList = List.of("ooo122ppp", "aaa122b333", "zzz122bmmm",
        "ccc9o9i333", "mmm9o9i111", "qqqQmQm888", "777QmQmlll", "vvvjjj1sss");
Map<String, List<String>> collection = stringList.stream()
        .collect(Collectors.groupingBy(ex -> ex.substring(3, 7)));
for (Map.Entry<String, List<String>> entry : collection.entrySet()) {
    System.out.println("group <" + entry.getKey() + "> strings " + entry.getValue());
}
Output
group <jjj1> strings [vvvjjj1sss]
group <122b> strings [aaa122b333, zzz122bmmm]
group <122p> strings [ooo122ppp]
group <9o9i> strings [ccc9o9i333, mmm9o9i111]
group <QmQm> strings [qqqQmQm888, 777QmQmlll]
Otherwise please try to better explain the requirement ;)
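If the goal is to transform each group rather than just print it, the same entry loop works. A sketch (the toUpperCase call is just a placeholder for the real per-string operation):
Map<String, List<String>> transformed = new HashMap<>();
for (Map.Entry<String, List<String>> entry : collection.entrySet()) {
    List<String> changed = new ArrayList<>();
    for (String s : entry.getValue()) {
        changed.add(s.toUpperCase()); // placeholder operation
    }
    transformed.put(entry.getKey(), changed);
}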

How can I aggregate elements on a flux by group / how to reduce groupwise?

Assume you have a flux of objects with the following structure:
class Element {
    String key;
    int count;
}
Now imagine those elements flow in a predefined sort order, always in groups of a key, like
{ key = "firstKey", count=123}
{ key = "firstKey", count=1 }
{ key = "secondKey", count=4 }
{ key = "thirdKey", count=98 }
{ key = "thirdKey", count=5 }
.....
What I want to do is create a flux which returns one element for each distinct key and summed count for each key-group.
So basically like a classic reduce for each group, but using the reduce operator does not work, because it only returns a single element and I want to get a flux with one element for each distinct key.
Using bufferUntil might work, but has the drawback that I have to keep state to check whether the key has changed compared to the previous element.
Using groupBy is overkill, as I know that each group has come to an end once a new key is found, so I don't want anything to stay cached after that event.
Is such an aggregation possible using Flux, without keeping a state outside of the flow?
This is currently (as of 3.2.5) not possible without keeping track of state yourself. distinctUntilChanged could have fit the bill with minimal state but doesn't emit the state, just the values it considered as "distinct" according to said state.
The most minimalistic way of solving this is with windowUntil and compose + an AtomicReference for state-per-subscriber:
Flux<Tuple2<T, Integer>> sourceFlux = ...; // assuming key/count represented as `Tuple2`
Flux<Tuple2<T, Integer>> aggregated = sourceFlux.compose(source -> {
    // having this state inside a compose means it will not be shared by multiple subscribers
    AtomicReference<T> last = new AtomicReference<>(null);
    return source
            // use the "last seen" state to split into windows, much like a `groupBy` but with earlier closing
            .windowUntil(i -> !i.getT1().equals(last.getAndSet(i.getT1())), true)
            // reduce each window
            .flatMap(window -> window.reduce((i1, i2) -> Tuples.of(i1.getT1(), i1.getT2() + i2.getT2())));
});
That really worked for me! Thanks for that post.
Please note that in the meantime the compose method was renamed; you need to use transformDeferred instead.
In my case I have a "Dashboard" object which has an id (stored as UUID) on which I want to group the source flux:
Flux<Dashboard> sourceFlux = ... // could be a DB query; the Flux must be sorted by the id
sourceFlux.transformDeferred(dashboardFlux -> {
    // this stores the dashboardIds as the Flux publishes; it is used to decide when to open a new window
    // having this state inside transformDeferred means it will not be shared by multiple subscribers
    AtomicReference<UUID> last = new AtomicReference<>(null);
    return dashboardFlux
            // use the "last seen" state to split into windows, much like a `groupBy` but with earlier closing
            .windowUntil(i -> !i.getDashboardId().equals(last.getAndSet(i.getDashboardId())), true)
            // reduce each window
            .flatMap(window -> window.reduce(... /* reduce one window here */));
});
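Putting the pieces together, here is a self-contained sketch of the technique (assuming reactor-core 3.3+ on the classpath; the sample data is made up):
import java.util.concurrent.atomic.AtomicReference;
import reactor.core.publisher.Flux;
import reactor.util.function.Tuple2;
import reactor.util.function.Tuples;

public class GroupwiseReduce {
    public static void main(String[] args) {
        // sorted by key, as the question requires
        Flux<Tuple2<String, Integer>> source = Flux.just(
                Tuples.of("firstKey", 123), Tuples.of("firstKey", 1),
                Tuples.of("secondKey", 4),
                Tuples.of("thirdKey", 98), Tuples.of("thirdKey", 5));

        source.transformDeferred(flux -> {
                    AtomicReference<String> last = new AtomicReference<>();
                    return flux
                            // open a new window whenever the key changes (cutBefore = true)
                            .windowUntil(t -> !t.getT1().equals(last.getAndSet(t.getT1())), true)
                            // sum the counts within each window
                            .flatMap(w -> w.reduce((a, b) ->
                                    Tuples.of(a.getT1(), a.getT2() + b.getT2())));
                })
                .subscribe(System.out::println); // [firstKey,124] [secondKey,4] [thirdKey,103]
    }
}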

How to validate that a Java 8 Stream has two specific elements in it?

Let's say I have List<Car> and I want to search through that list to verify that I have both a Civic AND a Focus. If it's an OR it's very easy in that I can just apply an OR on the .filter(). Keep in mind that I can't do filter().filter() for this type of AND.
A working solution would be to do:
boolean hasCivic = reportElements.stream()
        .filter(car -> "Civic".equals(car.getModel()))
        .findFirst()
        .isPresent();
boolean hasFocus = reportElements.stream()
        .filter(car -> "Focus".equals(car.getModel()))
        .findFirst()
        .isPresent();
return hasCivic && hasFocus;
But then I'm basically processing the list twice. I can't apply an && in the filter nor can I do filter().filter().
Is there a way to process the stream once to find if the list contains both a Civic and a Focus car?
IMPORTANT UPDATE: The key problem with the solutions provided is that they all guarantee O(n), whereas my solution could finish after just two comparisons. If my list holds, say, 10 million cars, there would be a very significant performance cost. My solution doesn't feel right, though, but maybe it is the best solution performance-wise...
You could filter the stream on "Civic" or "Focus", and then run a collector on getModel() returning a Set<String>. Then you could test if your set contains both keys.
Set<String> models = reportElements.stream()
        .map(Car::getModel)
        .filter(model -> model.equals("Focus") || model.equals("Civic"))
        .collect(Collectors.toSet());
return models.contains("Focus") && models.contains("Civic");
However, this would process the entire stream; it wouldn't "fast succeed" when both have been found.
The following is a "fast succeed" short-circuiting method. (Updated to include comments and clarifications from comments, below)
return reportElements.stream()
        .map(Car::getModel)
        .filter(model -> model.equals("Focus") || model.equals("Civic"))
        .distinct()
        .limit(2)
        .count() == 2;
Breaking the stream operations down one at a time, we have:
.map(Car::getModel)
This operation transforms the stream of cars into a stream of car models.
We do this for efficiency.
Instead of calling car.getModel() multiple times in various places in the remainder of the pipeline (twice in the filter(...) to test against each of the desired models, and again for the distinct() operation), we apply this mapping operation once.
Note that this does not create the "temporary map" mentioned in the comments;
it merely translates the car into the car's model for the next stage of the pipeline.
.filter(model -> model.equals("Focus") || model.equals("Civic"))
This filters the stream of car models, allowing only the "Focus" and "Civic" car models to pass.
.distinct()
This pipeline operation is a stateful intermediate operation.
It remembers each car model that it sees in a temporary Set.
(This is likely the "temporary map" mentioned in the comments.)
Only if the model does not exist in the temporary set,
will it be (a) added to the set, and (b) passed on to the next stage of the pipeline.
At this point in the pipeline, there can only be at most two elements in the stream: "Focus" or "Civic" or neither or both.
We know this because we know the filter(...) will only ever pass those two models, and we know that distinct() will remove any duplicates.
However, this stream pipeline itself does not know that.
It would continue to pass car objects to the map stage to be converted into model strings, pass these models to the filter stage, and send on any matching items to the distinct stage.
It cannot tell that this is futile, because it doesn't understand that nothing else can pass through the algorithm; it simply executes the instructions.
But we do understand.
At most two distinct models can pass through the distinct() stage.
So, we follow this with:
.limit(2)
This is a short-circuiting stateful intermediate operation.
It maintains a count of the number of items which pass through, and
after the indicated amount, it terminates the stream, causing all subsequent items to be discarded without even starting down the pipeline.
At this point in the pipeline, there can only be at most two elements in the stream: "Focus" or "Civic" or neither or both.
But if both, then the stream has been truncated and is at the end.
.count() == 2;
Count up the number of items that made it through the pipeline,
and test against the desired number.
If we found both models, the stream will immediately terminate, count() will return 2, and true will be returned.
If both models are not present, of course, the stream is processed until the bitter end, count() will return a value less than two, and false will result.
Example, using an infinite stream of models.
Every third model is a "Civic", every 7th model is a "Focus", the remainder are all "Model #":
boolean matched = IntStream.iterate(1, i -> i + 1)
        .mapToObj(i -> i % 3 == 0 ? "Civic" : i % 7 == 0 ? "Focus" : "Model " + i)
        .peek(System.out::println)
        .filter(model -> model.equals("Civic") || model.equals("Focus"))
        .peek(model -> System.out.println("  After filter: " + model))
        .distinct()
        .peek(model -> System.out.println("  After distinct: " + model))
        .limit(2)
        .peek(model -> System.out.println("  After limit: " + model))
        .count() == 2;
System.out.println("Matched = " + matched);
Output:
Model 1
Model 2
Civic
After filter: Civic
After distinct: Civic
After limit: Civic
Model 4
Model 5
Civic
After filter: Civic
Focus
After filter: Focus
After distinct: Focus
After limit: Focus
Matched = true
Notice that 3 models got through the filter(), but only 2 made it past distinct() and limit().
More importantly, notice that true was returned long before the end of the infinite stream of models was reached.
Generalizing the solution, since the OP wants something that could work with people, or credit cards, or IP addresses, etc., and the search criteria is probably not a fixed set of two items:
Set<String> models = Set.of("Focus", "Civic");
return reportElements.stream()
        .map(Car::getModel)
        .filter(models::contains)
        .distinct()
        .limit(models.size())
        .count() == models.size();
Here, given an arbitrary models set, existence of any particular set of car models may be obtained, not limited to just 2.
You can do:
boolean hasBoth = reportElements.stream()
        .filter(car -> "Civic".equals(car.getModel()) || "Focus".equals(car.getModel()))
        .collect(Collectors.toMap(
                c -> c.getModel(),
                c -> c,
                (c1, c2) -> c1))
        .size() == 2;
or even with a Set:
boolean hasBoth = reportElements.stream()
        .filter(car -> "Civic".equals(car.getModel()) || "Focus".equals(car.getModel()))
        .map(car -> car.getModel())
        .collect(Collectors.toSet())
        .size() == 2;
and with distinct:
boolean hasBoth = reportElements.stream()
        .filter(car -> "Civic".equals(car.getModel()) || "Focus".equals(car.getModel()))
        .map(car -> car.getModel())
        .distinct()
        .count() == 2L;
The reason it "doesn't feel right" is because you are forcing the stream API to do something it doesn't want to do. You would almost surely be better off with a traditional loop:
boolean hasFocus = false, hasCivic = false;
for (Car c : reportElements) {
    if ("Focus".equals(c.getModel())) hasFocus = true;
    if ("Civic".equals(c.getModel())) hasCivic = true;
    if (hasFocus && hasCivic) return true;
}
return false;
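For completeness, a short-circuiting stream variant of the same idea is sketched below (assuming the Car class from the question). It stops as soon as both models have been seen, but because it mutates external state from a predicate it is only safe on sequential streams, so treat it as a sketch, not a recommendation:
Set<String> missing = new HashSet<>(Arrays.asList("Civic", "Focus"));
boolean hasBoth = reportElements.stream()
        .map(Car::getModel)
        .anyMatch(model -> {
            missing.remove(model);    // discard models as they are seen
            return missing.isEmpty(); // true once both were found -> stream stops
        });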

How to dynamically do filtering in Java 8?

I know in Java 8, I can do filtering like this :
List<User> olderUsers = users.stream().filter(u -> u.age > 30).collect(Collectors.toList());
But what if I have a collection and half a dozen filtering criteria, and I want to test the combinations of the criteria?
For example I have a collection of objects and the following criteria :
<1> Size
<2> Weight
<3> Length
<4> Top 50% by a certain order
<5> Top 20% by a another certain ratio
<6> True or false by yet another criteria
And I want to test the combination of the above criteria, something like :
<1> -> <2> -> <3> -> <4> -> <5>
<1> -> <2> -> <3> -> <5> -> <4>
<1> -> <2> -> <5> -> <4> -> <3>
...
<1> -> <5> -> <3> -> <4> -> <2>
<3> -> <2> -> <1> -> <4> -> <5>
...
<5> -> <4> -> <3> -> <2> -> <1>
If each testing order may give me different results, how do I write a loop to automatically filter through all the combinations?
What I can think of is to use another method that generates the testing order, like the following:
int[][] getTestOrder(int criteriaCount)
{
...
}
So if criteriaCount is 2, it will return: {{1,2},{2,1}}
If criteriaCount is 3, it will return: {{1,2,3},{1,3,2},{2,1,3},{2,3,1},{3,1,2},{3,2,1}}
...
But then how do I implement this most efficiently, in concise expressions, with the filtering mechanism that comes with Java 8?
Interesting problem. There are several things going on here. No doubt this could be solved in less than half a page of Haskell or Lisp, but this is Java, so here we go....
One issue is that we have a variable number of filters, whereas most of the examples that have been shown illustrate fixed pipelines.
Another issue is that some of the OP's "filters" are context sensitive, such as "top 50% by a certain order". This can't be done with a simple filter(predicate) construct on a stream.
The key is to realize that, while lambdas allow functions to be passed as arguments (to good effect), they also mean that functions can be stored in data structures, and computations can be performed on them. The most common computation is to take multiple functions and compose them.
Assume that the values being operated on are instances of Widget, which is a POJO that has some obvious getters:
class Widget {
    String name() { ... }
    int length() { ... }
    double weight() { ... }
    // constructors, fields, toString(), etc.
}
Let's start off with the first issue and figure out how to operate with a variable number of simple predicates. We can create a list of predicates like this:
List<Predicate<Widget>> allPredicates = Arrays.asList(
        w -> w.length() >= 10,
        w -> w.weight() > 40.0,
        w -> w.name().compareTo("c") > 0);
Given this list, we can permute them (probably not useful, since they're order independent) or select any subset we want. Let's say we just want to apply all of them. How do we apply a variable number of predicates to a stream? There is a Predicate.and() method that will take two predicates and combine them using a logical and, returning a single predicate. So we could take the first predicate and write a loop that combines it with the successive predicates to build up a single predicate that's a composite and of them all:
Predicate<Widget> compositePredicate = allPredicates.get(0);
for (int i = 1; i < allPredicates.size(); i++) {
    compositePredicate = compositePredicate.and(allPredicates.get(i));
}
This works, but it fails if the list is empty, and since we're doing functional programming now, mutating a variable in a loop is déclassé. But lo! This is a reduction! We can reduce all the predicates over the and operator to get a single composite predicate, like this:
Predicate<Widget> compositePredicate =
        allPredicates.stream()
                .reduce(w -> true, Predicate::and);
(Credit: I learned this technique from @venkat_s. If you ever get a chance, go see him speak at a conference. He's good.)
Note the use of w -> true as the identity value of the reduction. (This could also be used as the initial value of compositePredicate for the loop, which would fix the zero-length list case.)
Now that we have our composite predicate, we can write out a short pipeline that simply applies the composite predicate to the widgets:
widgetList.stream()
        .filter(compositePredicate)
        .forEach(System.out::println);
Context Sensitive Filters
Now let's consider what I referred to as a "context sensitive" filter, which is represented by the example like "top 50% in a certain order", say the top 50% of widgets by weight. "Context sensitive" isn't the best term for this but it's what I've got at the moment, and it is somewhat descriptive in that it's relative to the number of elements in the stream up to this point.
How would we implement something like this using streams? Unless somebody comes up with something really clever, I think we have to collect the elements somewhere first (say, in a list) before we can emit the first element to the output. It's kind of like sorted() in a pipeline which can't tell which is the first element to output until it has read every single input element and has sorted them.
The straightforward approach to finding the top 50% of widgets by weight, using streams, would look something like this:
List<Widget> temp =
        list.stream()
                .sorted(comparing(Widget::weight).reversed())
                .collect(toList());
temp.stream()
        .limit((long) (temp.size() * 0.5))
        .forEach(System.out::println);
This isn't complicated, but it's a bit cumbersome as we have to collect the elements into a list and assign it to a variable, in order to use the list's size in the 50% computation.
This is limiting, though, in that it's a "static" representation of this kind of filtering. How would we chain this into a stream with a variable number of elements (other filters or criteria) like we did with the predicates?
An important observation is that this code does its actual work in between the consumption of a stream and the emitting of a stream. It happens to have a collector in the middle, but if you chain a stream to its front and chain stuff off its back end, nobody is the wiser. In fact, the standard stream pipeline operations like map and filter each take a stream as input and emit a stream as output. So we can write a function of this kind ourselves:
Stream<Widget> top50PercentByWeight(Stream<Widget> stream) {
    List<Widget> temp =
            stream.sorted(comparing(Widget::weight).reversed())
                  .collect(toList());
    return temp.stream()
               .limit((long) (temp.size() * 0.5));
}
A similar example might be to find the shortest three widgets:
Stream<Widget> shortestThree(Stream<Widget> stream) {
    return stream.sorted(comparing(Widget::length))
                 .limit(3);
}
Now we can write something that combines these stateful filters with ordinary stream operations:
shortestThree(
        top50PercentByWeight(
                widgetList.stream()
                        .filter(w -> w.length() >= 10)))
        .forEach(System.out::println);
This works, but is kind of lousy because it reads "inside-out" and backwards. The stream source is widgetList which is streamed and filtered through an ordinary predicate. Now, going backwards, the top 50% filter is applied, then the shortest-three filter is applied, and finally the stream operation forEach is applied at the end. This works but is quite confusing to read. And it's still static. What we really want is to have a way to put these new filters inside a data structure that we can manipulate, for example, to run all the permutations, as in the original question.
A key insight at this point is that these new kinds of filters are really just functions, and we have functional interface types in Java which let us represent functions as objects, to manipulate them, store them in data structures, compose them, etc. The functional interface type that takes an argument of some type and returns a value of the same type is UnaryOperator. The argument and return type in this case is Stream<Widget>. If we were to take method references such as this::shortestThree or this::top50PercentByWeight, the types of the resulting objects would be
UnaryOperator<Stream<Widget>>
If we were to put these into a list, the type of that list would be
List<UnaryOperator<Stream<Widget>>>
Ugh! Three levels of nested generics is too much for me. (But Aleksey Shipilev did once show me some code that used four levels of nested generics.) The solution for too much generics is to define our own type. Let's call one of our new things a Criterion. It turns out that there's little value to be gained by making our new functional interface type be related to UnaryOperator, so our definition can simply be:
@FunctionalInterface
public interface Criterion {
    Stream<Widget> apply(Stream<Widget> s);
}
Now we can create a list of criteria like this:
List<Criterion> criteria = Arrays.asList(
        this::shortestThree,
        this::lengthGreaterThan20
);
(We'll figure out how to use this list below.) This is a step forward, since we can now manipulate the list dynamically, but it's still somewhat limiting. First, it can't be combined with ordinary predicates. Second, there are a lot of hard-coded values here, such as the shortest three: how about two or four? How about a different criterion than length? What we really want is a function that creates these Criterion objects for us. This is easy with lambdas.
This creates a criterion that selects the top N widgets, given a comparator:
Criterion topN(Comparator<Widget> cmp, long n) {
    return stream -> stream.sorted(cmp).limit(n);
}
This creates a criterion that selects the top p percent of widgets, given a comparator:
Criterion topPercent(Comparator<Widget> cmp, double pct) {
    return stream -> {
        List<Widget> temp =
                stream.sorted(cmp).collect(toList());
        return temp.stream()
                   .limit((long) (temp.size() * pct));
    };
}
And this creates a criterion from an ordinary predicate:
Criterion fromPredicate(Predicate<Widget> pred) {
    return stream -> stream.filter(pred);
}
Now we have a very flexible way of creating criteria and putting them into a list, where they can be subsetted or permuted or whatever:
List<Criterion> criteria = Arrays.asList(
        fromPredicate(w -> w.length() > 10),                   // longer than 10
        topN(comparing(Widget::length), 4L),                   // shortest 4
        topPercent(comparing(Widget::weight).reversed(), 0.50) // heaviest 50%
);
Once we have a list of Criterion objects, we need to figure out a way to apply all of them. Once again, we can use our friend reduce to combine all of them into a single Criterion object:
Criterion allCriteria =
        criteria.stream()
                .reduce(c -> c, (c1, c2) -> (s -> c2.apply(c1.apply(s))));
The identity function c -> c is clear, but the second argument is a bit tricky. Given a stream s, we first apply Criterion c1, then Criterion c2, and this is wrapped in a lambda that takes two Criterion objects c1 and c2 and returns a lambda that applies the composition of c1 and c2 to a stream and returns the resulting stream.
Now that we've composed all the criteria, we can apply it to a stream of widgets like so:
allCriteria.apply(widgetList.stream())
        .forEach(System.out::println);
This is still a bit inside-out, but it's fairly well controlled. Most importantly, it addresses the original question, which is how to combine criteria dynamically. Once the Criterion objects are in a data structure, they can be selected, subsetted, permuted, or whatever as necessary, and they can all be combined in a single criterion and applied to a stream using the above techniques.
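And to connect this back to the original question of testing all orderings: the criteria list can be permuted with a plain recursive generator, and each ordering applied in turn. A sketch (not part of the original answer; permutations is a hypothetical helper in the spirit of the getTestOrder stub from the question):
static <T> List<List<T>> permutations(List<T> items) {
    List<List<T>> result = new ArrayList<>();
    if (items.isEmpty()) {
        result.add(new ArrayList<>());
        return result;
    }
    for (int i = 0; i < items.size(); i++) {
        List<T> rest = new ArrayList<>(items);
        T picked = rest.remove(i);
        for (List<T> tail : permutations(rest)) {
            tail.add(0, picked); // put the picked element in front
            result.add(tail);
        }
    }
    return result;
}

// apply every ordering of the criteria and print each outcome
for (List<Criterion> ordering : permutations(criteria)) {
    Criterion combined = ordering.stream()
            .reduce(c -> c, (c1, c2) -> (s -> c2.apply(c1.apply(s))));
    System.out.println(combined.apply(widgetList.stream()).collect(toList()));
}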
The functional programming gurus are probably saying "He just reinvented ... !" which is probably true. I'm sure this has probably been invented somewhere already, but it's new to Java, because prior to lambda, it just wasn't feasible to write Java code that uses these techniques.
Update 2014-04-07
I've cleaned up and posted the complete sample code in a gist.
We could add a counter via a map operation so we know how many elements we have after the filters. I created a helper class with a method that counts and returns the same object passed in:
class DoNothingButCount<T> {
    AtomicInteger i;

    public DoNothingButCount() {
        i = new AtomicInteger(0);
    }

    public T pass(T p) {
        i.incrementAndGet();
        return p;
    }
}
public void runDemo() {
    List<Person> persons = create(100);
    DoNothingButCount<Person> counter = new DoNothingButCount<>();
    persons.stream()
            .filter(u -> u.size > 12)
            .filter(u -> u.weight > 12)
            .map((p) -> counter.pass(p))
            .sorted((p1, p2) -> p1.age - p2.age)
            .collect(Collectors.toList()).stream()
            .limit((int) (counter.i.intValue() * 0.5))
            .sorted((p1, p2) -> p2.length - p1.length)
            .limit((int) (counter.i.intValue() * 0.5 * 0.2))
            .forEach((p) -> System.out.println(p));
}
I had to convert the stream to a list and back to a stream in the middle because limit would otherwise use the initial count. It's all a bit "hackish", but it's all I could think of.
I could do it a bit differently using a Function for my mapped class:
class DoNothingButCount<T> implements Function<T, T> {
    AtomicInteger i;

    public DoNothingButCount() {
        i = new AtomicInteger(0);
    }

    public T apply(T p) {
        i.incrementAndGet();
        return p;
    }
}
The only thing that will change in the stream is:
map((p) -> counter.pass(p)).
will become:
map(counter).
My complete test class including the two examples:
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;

public class Demo2 {
    Random r = new Random();

    class Person {
        public int size, weight, length, age;

        public Person(int s, int w, int l, int a) {
            this.size = s;
            this.weight = w;
            this.length = l;
            this.age = a;
        }

        public String toString() {
            return "P: " + this.size + ", " + this.weight + ", " + this.length + ", " + this.age + ".";
        }
    }

    public List<Person> create(int size) {
        List<Person> persons = new ArrayList<>();
        while (persons.size() < size) {
            persons.add(new Person(r.nextInt(10) + 10, r.nextInt(10) + 10,
                    r.nextInt(10) + 10, r.nextInt(20) + 14));
        }
        return persons;
    }

    class DoNothingButCount<T> {
        AtomicInteger i;

        public DoNothingButCount() {
            i = new AtomicInteger(0);
        }

        public T pass(T p) {
            i.incrementAndGet();
            return p;
        }
    }

    class PDoNothingButCount<T> implements Function<T, T> {
        AtomicInteger i;

        public PDoNothingButCount() {
            i = new AtomicInteger(0);
        }

        public T apply(T p) {
            i.incrementAndGet();
            return p;
        }
    }

    public void runDemo() {
        List<Person> persons = create(100);
        PDoNothingButCount<Person> counter = new PDoNothingButCount<>();
        persons.stream()
                .filter(u -> u.size > 12)
                .filter(u -> u.weight > 12)
                .map(counter)
                .sorted((p1, p2) -> p1.age - p2.age)
                .collect(Collectors.toList()).stream()
                .limit((int) (counter.i.intValue() * 0.5))
                .sorted((p1, p2) -> p2.length - p1.length)
                .limit((int) (counter.i.intValue() * 0.5 * 0.2))
                .forEach((p) -> System.out.println(p));
    }

    public void runDemo2() {
        List<Person> persons = create(100);
        DoNothingButCount<Person> counter = new DoNothingButCount<>();
        persons.stream()
                .filter(u -> u.size > 12)
                .filter(u -> u.weight > 12)
                .map((p) -> counter.pass(p))
                .sorted((p1, p2) -> p1.age - p2.age)
                .collect(Collectors.toList()).stream()
                .limit((int) (counter.i.intValue() * 0.5))
                .sorted((p1, p2) -> p2.length - p1.length)
                .limit((int) (counter.i.intValue() * 0.5 * 0.2))
                .forEach((p) -> System.out.println(p));
    }

    public static void main(String str[]) {
        Demo2 demo = new Demo2();
        System.out.println("Demo 2:");
        demo.runDemo2();
        System.out.println("Demo 1:");
        demo.runDemo();
    }
}
