I want to collect the values returned by a method called inside a forEach:
@PostMapping(value="insert-ppl")
public String insertPeople(@RequestBody @Valid @NotNull List<Person> people){
people.forEach(this::insertPerson);
}
The insertPerson method returns a string signaling whether the insert into the database was successful or not. I want to collect the strings returned by each call to insertPerson, but I can't use .collect() because this is not a stream.
How can I do that?
forEach is meant to execute a consuming operation on each element. It is not meant to collect any results.
For this purpose you need to do it differently. I think the best option would be to just use a standard for loop and add the result of each insertPerson call to a list that you then return at the end.
Theoretically, you could use something like people.stream().map(this::insertPerson).toList(), but in my opinion that's not a great idea: map() is supposed to just map input to output without any side effects, whereas your insertPerson obviously has side effects, which may make it harder for a reader to understand what is really happening.
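The plain-loop approach suggested in this answer could look roughly like the sketch below. The class name, the String-based stand-in for Person, and the body of insertPerson are assumptions made only so the example compiles on its own; the real method would talk to the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InsertLoopSketch {

    // Hypothetical stand-in for the question's insertPerson: reports the outcome as a String.
    static String insertPerson(String person) {
        return person.isEmpty() ? "FAILED: empty name" : "OK: " + person;
    }

    // Plain loop: the side effect (the insert) and the collection of results are both explicit.
    static List<String> insertPeople(List<String> people) {
        List<String> results = new ArrayList<>(people.size());
        for (String person : people) {
            results.add(insertPerson(person));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(insertPeople(Arrays.asList("Ann", "", "Bob")));
    }
}
```

Returning the list (rather than a joined String) leaves it to the controller to decide how the outcomes are rendered.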
There are a couple of reasons why this code isn't suitable for streams.
Firstly, your goal is to operate via side effects (because you need to update the database), and the only operations intended for such cases are forEach() and forEachOrdered(). And as the API documentation warns, these methods should be used with great care.
Note that map() isn't suitable for that purpose; it wasn't designed to operate via side effects. Intermediate operations like map() can in some cases be optimized away from the stream. That's unlikely to happen in a pipeline built like stream().map().collect(), but the key point is that intermediate operations are semantically not intended to perform resulting actions. In your case, the call to insertPerson() is the resulting action and the accumulated string is only a byproduct.
For your purposes, you can still use Iterable.forEach().
@PostMapping(value = "insert-ppl")
public String insertPeople(@RequestBody @Valid @NotNull List<Person> people) {
StringJoiner result = new StringJoiner("\n"); // provide the required delimiter instead of "\n"
people.forEach(person -> result.add(insertPerson(person)));
return result.toString();
}
Related
I am trying to use Java Streams to make the sequential processing of a list of customers run in parallel. This is a short-term band-aid to a problem that we are solving as part of re-architecting our entire system.
What I am starting with is a List<Customer> Customers structure that contains the customer contact information and all the relevant transaction data. Conceptually, the code I am replacing looks like:
long emailsSent = 0;
List<Customer> customers = methodLoadingAllrelevantData();
for (Customer customer: customers) {
boolean isEmailSent = sendEmail(customer);
if (isEmailSent) {
emailsSent++;
}
}
The sendEmail(customer) function:
Determines if an email should be sent
Formats the email
Attempts to send the email
Returns true if the email was sent successfully
Not great code, but I am just trying to get some more speed from the existing code, not trying to make it better. The method and all its calls are 100-percent thread-safe.
I put it in the following stream structure:
ForkJoinPool limitedParallelThreadPool = new ForkJoinPool(numberOfThreads);
emailsSent = limitedParallelThreadPool.submit( () ->
customers.stream().parallel()
.map(this::_emailCustomer)
.filter(b -> b == true).count()
).get();
This does work as expected, returning the same data as the sequential version.
My questions are: because the purpose of my method is to generate an email, is it poor practice for me to use a map function? Is there a better answer? I am, in effect, mapping a Customer to a boolean, but part of this mapping requires the process to trigger an email.
I was originally trying to use a forEach() operator, but I could not figure out how to get the count without adding state information to the sendEmail function, which interferes with it being thread-safe.
Returns true if the email was sent successfully
It wouldn't be the worst idea to take advantage of the fact that your _emailCustomer method returns a boolean, so you can use Stream#filter instead of a combination of both Stream#map and Stream#filter:
customers.parallelStream()
.filter(this::_emailCustomer)
.count()
To answer your question, though, it would depend on the use-case whether or not Stream#map is the correct intermediate operation to use. According to the documentation of Stream#map, the Function argument that the method accepts must be:
a non-interfering, stateless function to apply to each element
If your _emailCustomer method is either interfering or stateful, then I would refrain from calling it within Stream#map, especially in a parallel context.
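A minimal, self-contained sketch of the filter-and-count variant, run inside a dedicated ForkJoinPool as in the question. The class name and the String-based emailCustomer stub are assumptions for illustration only. Note also that a parallel stream started from inside a ForkJoinPool task using that pool's threads is observed behaviour of current JDKs rather than something the Stream documentation promises.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;

class EmailCountSketch {

    // Hypothetical stand-in for _emailCustomer: "sends" only to addresses containing '@'.
    // It touches no shared mutable state, so calling it from several threads is safe.
    static boolean emailCustomer(String address) {
        return address.contains("@");
    }

    static long sendAll(List<String> addresses, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            // filter() keeps the elements for which the call returned true; count() tallies them.
            return pool.submit(() ->
                    addresses.parallelStream()
                             .filter(EmailCountSketch::emailCustomer)
                             .count()
            ).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAll(Arrays.asList("a@x.org", "invalid", "b@y.org"), 2));
    }
}
```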
Since you don't care in which order those emails are sent, I'd say you are kind of OK in this example. It's just that you are relying on side effects of the map intermediate operation, and that can potentially bite you. For example, this:
Stream.of(1,2,3,4)
.map(x -> x + 1)
.count();
will not execute the map at all (starting with Java 9), since all you want is the count and map will not change it. Your example is safe from that because you are filtering: the final count is not known up front, so map has to be executed. As said, in a parallel environment there is no guarantee about the order in which map will be executed.
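The difference can be made visible by counting how often the mapper is actually invoked. This is only a sketch with made-up names; whether the first pipeline skips the mapper depends on the JDK version and implementation, so its result is deliberately left open.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class CountElisionSketch {

    // map() followed directly by count(): the stream size is known up front,
    // so an implementation is allowed to skip the mapper entirely.
    static int mapperCallsWithoutFilter() {
        AtomicInteger calls = new AtomicInteger();
        Stream.of(1, 2, 3, 4)
              .map(x -> { calls.incrementAndGet(); return x + 1; })
              .count();
        return calls.get();
    }

    // With a filter in between, the size is no longer known,
    // so every element has to travel through map() before it can be counted.
    static int mapperCallsWithFilter() {
        AtomicInteger calls = new AtomicInteger();
        Stream.of(1, 2, 3, 4)
              .map(x -> { calls.incrementAndGet(); return x + 1; })
              .filter(x -> x % 2 == 0)
              .count();
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("without filter: " + mapperCallsWithoutFilter());
        System.out.println("with filter:    " + mapperCallsWithFilter());
    }
}
```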
It's a pity, though, that your sendEmail returns something at all. All the email services I have written were more of an event thing, fire and forget, but I can't tell what your exact scenario needs.
Think about the fact that your map operation will block, until you get a response back and that might trigger this part of the documentation that you need to look at:
A ForkJoinPool is constructed with a given target parallelism level; by default, equal to the number of available processors. The pool attempts to maintain enough active (or available) threads by dynamically adding, suspending, or resuming internal worker threads, even if some tasks are stalled waiting to join others
I hope my question is clear enough.
Let's say you have an API that performs requests over a network backed by IO streams (input and output).
Using RxJava (which I am new to), I would think it could be possible to do the following:
public Single<MyData> getMyDataFromApi() {
return requestMyData()
.map/flat/then()->waitAndprocessData() // here is the missing link for me. What can I use ?
.andThen()->returnData()
As you will understand, the method requestMyData returns a Completable whose sole responsibility is to perform said request (an IO-type operation).
Then, once the request has been performed, the remote entity processes it and returns the requested MyData object as the result, which is again an IO-type operation.
The key point here is that I work with streams (both input and output) whose reading and writing operations are obviously performed in separate IO threads (using Schedulers.io()).
So in the end, is there a way so that my getMyDataFromApi() method does the following :
Perform the request -> it's a completable
Wait for the result -> something like a subscribe ? but without splitting the chain
Process the result -> it's a single or can be a lambda in a map method
Return the result -> final element, obviously a single
To conclude, I strongly believe that requestMyData's signature should be that of a Single, because it's a getter and I am expecting a result or an error.
Without the implementation of the methods, it is quite hard to understand the real problem.
If requestMyData returns a Completable and waitAndprocessData a Single, you can do the following:
return requestMyData().andThen(waitAndprocessData());
Anyway, remember that a Completable is a computation without any value, only an indication of completion (or an exception).
What is the use case for Mono<T> and Flux<T> as a parameter in a function?
Code
Flux<String> findByLastName(Mono<String> lastname) {
//implementation
}
When I invoke the above method from a REST endpoint, how does it differ from using a plain String as the parameter?
To answer your first comment question:
@ErwinBolwidt i know use case for Mono/Flux in computation but i dont understand specifically using it as method parameter – Bibek Shakya
When you use it as a parameter you have to deal with it as a stream (meaning you don't have it yet) so for example you should never say lastname.block(), because this means you've just blocked the thread until this value is available.
Disclaimer Extra information
If you're asking whether you should wrap anything from now on in a Mono or a flux, then of course not, because it adds unnecessary complexity to the method and the caller.
And from a design perspective, the answer is simple; just ask some basic questions:
When to use a Mono in general ?
Well, when you still don't have the value.
When to use Flux in general ?
Well, when you have a stream of data coming or not.
So we should not think about who is using the method and try to make the method convenient for them; we should think about what the method needs.
A use case for that is when the method actually needs its argument in this form, meaning you actually do stream processing inside. For example, your method accepts an infinite stream of sensor data, and inside the method is going crazy like:
Flux<Point> processSensor(Flux<Double> data){
return data.filter(blabla).zipWith(blabla).map(...);
}
Only use cases I can think of why a method parameter is Mono<String> lastname
Was retrieved from a WebClient/Router type function
@Secured("ROLE_EVERYONE") was used in a previous method to retrieve the lastname
For this to work the return type of the method must be a
org.reactivestreams.Publisher (i.e. Mono/Flux).
Is there a way in Java to apply a function to all the elements of a Stream without breaking the Stream chain? I know I can call forEach, but that method returns a void, not a Stream.
There are (at least) 3 ways. For the sake of example code, I've assumed you want to call 2 consumer methods methodA and methodB:
A. Use peek():
list.stream().peek(x -> methodA(x)).forEach(x -> methodB(x));
Although the docs say only use it for "debug", it works (and it's in production right now)
B. Use map() to call methodA, then return the element back to the stream:
list.stream().map(x -> {methodA(x); return x;}).forEach(x -> methodB(x));
This is probably the most "acceptable" approach.
C. Do two things in the forEach():
list.stream().forEach(x -> {methodA(x); methodB(x);});
This is the least flexible and may not suit your need.
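To compare the three variants side by side, here is a small runnable sketch. The two consumers are replaced by log entries ("A" + x and "B" + x), an assumption made purely so the call order becomes visible. On a sequential stream all three produce the same interleaved order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TwoConsumersSketch {

    // Variant A: peek() for the first action, forEach() for the second.
    static List<String> viaPeek(List<Integer> list) {
        List<String> log = new ArrayList<>();
        list.stream().peek(x -> log.add("A" + x)).forEach(x -> log.add("B" + x));
        return log;
    }

    // Variant B: map() runs the first action and hands the element back unchanged.
    static List<String> viaMap(List<Integer> list) {
        List<String> log = new ArrayList<>();
        list.stream().map(x -> { log.add("A" + x); return x; }).forEach(x -> log.add("B" + x));
        return log;
    }

    // Variant C: both actions inside a single forEach().
    static List<String> viaForEach(List<Integer> list) {
        List<String> log = new ArrayList<>();
        list.stream().forEach(x -> { log.add("A" + x); log.add("B" + x); });
        return log;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2);
        System.out.println(viaPeek(input));
        System.out.println(viaMap(input));
        System.out.println(viaForEach(input));
    }
}
```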
You are looking for the Stream's map() function.
example:
List<String> strings = stream
.map(Object::toString)
.collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
The best option you have is to apply map to your stream, which returns a stream consisting of the results of applying the given function to the elements of the stream.
For example:
IntStream.range(1, 100)
.boxed()
.map(item->item+3)
.map(item->item*2)...
We are applying several modifications to the stream, but in some cases we don't want to modify the stream. We just want to visit every element and then pass it down the stream without modification (like the peek() method in the streams API). In such cases, we can map with a method like this:
StreamItem peekyMethod(StreamItem streamItemX) {
// .... visit the streamItemX
//Then pass it down the stream
return streamItemX;
}
Not entirely sure what you mean by breaking the stream chain, but any operation on a Stream that returns a Stream will not break or consume your Stream. Streams are consumed by terminal operations and as you noted the forEach does not return a Stream<T> and as such ends the stream, by executing all the intermediate operations before the forEach and the forEach itself.
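That intermediate operations do nothing until a terminal operation runs can be checked with a short sketch; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipelineSketch {

    // Returns how many elements the mapper had seen before and after the terminal operation.
    static int[] visitsBeforeAndAfterTerminalOp() {
        List<String> visited = new ArrayList<>();
        Stream<String> pipeline = Arrays.asList("a", "b", "c").stream()
                .map(s -> { visited.add(s); return s.toUpperCase(); });
        int before = visited.size();            // the pipeline is only described so far
        pipeline.collect(Collectors.toList());  // the terminal operation pulls the elements through
        int after = visited.size();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] v = visitsBeforeAndAfterTerminalOp();
        System.out.println("before: " + v[0] + ", after: " + v[1]);
    }
}
```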
In the example that you provided in the comments:
myStream.map(obj -> {obj.foo(); return obj;})
You can't really do this with a one-liner. Of course you could use a method reference, but then your returned Stream would be of a different type (assuming foo returns a type):
myStream.map(Obj::foo) // this will turn into Stream<T>, where T is
// the return type of foo, instead of Stream<Obj>
Besides that your map operation is stateful, which is strongly discouraged. Your code will compile and might even work as you want it to - but it might later fail. map operations should be stateless.
You can use the map method, but you have to create a helper method that returns the element it was given. For example:
public class Fluent {
public static <T> Function<T, T> of(Consumer<T> consumer) {
return t -> {
consumer.accept(t);
return t;
};
}
}
And use it when you want to call a void method:
list.stream().map(Fluent.of(SomeClass::method));
or, if you want to use it with a method that takes an argument:
list.stream().map(Fluent.of(x -> x.method("hello")))
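A self-contained usage sketch of this helper idea follows. The helper is repeated inside a class named FluentSketch so the example compiles on its own, and StringBuilder is just a convenient mutable type chosen for the demonstration. The lambda parameter is typed explicitly to keep type inference unambiguous.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Collectors;

class FluentSketch {

    // Same idea as the helper above: run an action on the element, then hand the element back.
    static <T> Function<T, T> of(Consumer<T> consumer) {
        return t -> {
            consumer.accept(t);
            return t;
        };
    }

    // Appends '!' to every word by mutating a StringBuilder inside the chain.
    static List<String> exclaim(List<String> words) {
        return words.stream()
                .map(StringBuilder::new)
                .map(of((StringBuilder sb) -> sb.append('!')))
                .map(StringBuilder::toString)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(exclaim(Arrays.asList("hi", "yo")));
    }
}
```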
I think you are looking for Stream.peek. But read the docs carefully, as it was designed mainly as a debug method. From the docs:
This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline
The action passed to peek must be non-interfering.
I think the cleanest way is to add a mutator to the objects in the stream.
For example,
class Victim {
private String tag;
public Victim withTag(String t) {
this.tag = t;
return this;
}
}
List<Victim> base = List.of(new Victim());
Stream<Victim> transformed = base.stream().map(v -> v.withTag("myTag"));
If you prefer (and many will), you can have the withTag method create and return a new Victim; this allows you to make Victim immutable.
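The immutable variant mentioned above might be sketched like this. The class is named ImmutableVictim only to keep it apart from the mutable example, and the tagAll helper is an assumption added to show the stream usage.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class ImmutableVictim {
    private final String tag;

    ImmutableVictim(String tag) {
        this.tag = tag;
    }

    String tag() {
        return tag;
    }

    // Returns a modified copy instead of mutating this instance.
    ImmutableVictim withTag(String newTag) {
        return new ImmutableVictim(newTag);
    }

    // The source list and its elements stay untouched; the result holds new objects.
    static List<ImmutableVictim> tagAll(List<ImmutableVictim> base, String tag) {
        return base.stream()
                   .map(v -> v.withTag(tag))
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ImmutableVictim original = new ImmutableVictim("old");
        List<ImmutableVictim> tagged = tagAll(Arrays.asList(original), "myTag");
        System.out.println(original.tag() + " -> " + tagged.get(0).tag());
    }
}
```

With this shape the map() call no longer has a side effect on the source objects, which sidesteps the concerns raised in the other answers.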
Suppose I have a method that returns a read-only view into a member list:
class Team {
private List<Player> players = new ArrayList<>();
// ...
public List<Player> getPlayers() {
return Collections.unmodifiableList(players);
}
}
Further suppose that all the client does is iterate over the list once, immediately. Maybe to put the players into a JList or something. The client does not store a reference to the list for later inspection!
Given this common scenario, should I return a stream instead?
public Stream<Player> getPlayers() {
return players.stream();
}
Or is returning a stream non-idiomatic in Java? Were streams designed to always be "terminated" inside the same expression they were created in?
The answer is, as always, "it depends". It depends on how big the returned collection will be. It depends on whether the result changes over time, and how important consistency of the returned result is. And it depends very much on how the user is likely to use the answer.
First, note that you can always get a Collection from a Stream, and vice versa:
// If API returns Collection, convert with stream()
getFoo().stream()...
// If API returns Stream, use collect()
Collection<T> c = getFooStream().collect(toList());
So the question is, which is more useful to your callers.
If your result might be infinite, there's only one choice: Stream.
If your result might be very large, you probably prefer Stream, since there may not be any value in materializing it all at once, and doing so could create significant heap pressure.
If all the caller is going to do is iterate through it (search, filter, aggregate), you should prefer Stream, since Stream has these built-in already and there's no need to materialize a collection (especially if the user might not process the whole result.) This is a very common case.
Even if you know that the user will iterate it multiple times or otherwise keep it around, you still may want to return a Stream instead, for the simple fact that whatever Collection you choose to put it in (e.g., ArrayList) may not be the form they want, and then the caller has to copy it anyway. If you return a Stream, they can do collect(toCollection(factory)) and get it in exactly the form they want.
The above "prefer Stream" cases mostly derive from the fact that Stream is more flexible; you can late-bind to how you use it without incurring the costs and constraints of materializing it to a Collection.
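As an illustration of that late binding, the sketch below lets two callers materialize the same Stream-returning accessor into different collection types via collect(toCollection(factory)). The RosterSketch class and its String players are assumptions for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RosterSketch {
    private final List<String> players = Arrays.asList("Zoe", "Adam", "Mia");

    // Stream-returning accessor: the caller decides whether and how to materialize.
    Stream<String> players() {
        return players.stream();
    }

    // One caller wants the names sorted and de-duplicated.
    static TreeSet<String> sortedNames(RosterSketch roster) {
        return roster.players().collect(Collectors.toCollection(TreeSet::new));
    }

    // Another caller wants a queue in the original order.
    static ArrayDeque<String> queueOfNames(RosterSketch roster) {
        return roster.players().collect(Collectors.toCollection(ArrayDeque::new));
    }

    public static void main(String[] args) {
        RosterSketch roster = new RosterSketch();
        System.out.println(sortedNames(roster));
        System.out.println(queueOfNames(roster));
    }
}
```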
The one case where you must return a Collection is when there are strong consistency requirements, and you have to produce a consistent snapshot of a moving target. Then, you will want to put the elements into a collection that will not change.
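The difference between a live read-only view and a consistent snapshot can be sketched as follows; the class and method names are hypothetical. Collections.unmodifiableList only wraps the backing list, so later changes show through, whereas copying first freezes the contents at the time of the call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SnapshotSketch {
    private final List<String> players = new ArrayList<>();

    void add(String player) {
        players.add(player);
    }

    // Read-only view: later changes to the team are visible through it.
    List<String> view() {
        return Collections.unmodifiableList(players);
    }

    // Read-only snapshot: a copy that stays as it was when the method was called.
    List<String> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(players));
    }

    public static void main(String[] args) {
        SnapshotSketch team = new SnapshotSketch();
        team.add("Ann");
        List<String> view = team.view();
        List<String> snap = team.snapshot();
        team.add("Bob");
        System.out.println("view: " + view + ", snapshot: " + snap);
    }
}
```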
So I would say that most of the time, Stream is the right answer — it is more flexible, it doesn't impose usually-unnecessary materialization costs, and can be easily turned into the Collection of your choice if needed. But sometimes, you may have to return a Collection (say, due to strong consistency requirements), or you may want to return Collection because you know how the user will be using it and know this is the most convenient thing for them.
If you already have a suitable Collection "lying around", and it seems likely that your users would rather interact with it as a Collection, then it is a reasonable choice (though not the only one, and more brittle) to just return what you have.
I have a few points to add to Brian Goetz' excellent answer.
It's quite common to return a Stream from a "getter" style method call. See the Stream usage page in the Java 8 javadoc and look for "methods... that return Stream" for the packages other than java.util.stream. These methods are usually on classes that represent or can contain multiple values or aggregations of something. In such cases, APIs have typically returned collections or arrays of them. For all the reasons that Brian noted in his answer, it's very flexible to add Stream-returning methods here. Many of these classes have collection- or array-returning methods already, because the classes predate the Streams API. If you're designing a new API, and it makes sense to provide Stream-returning methods, it might not be necessary to add collection-returning methods as well.
Brian mentioned the cost of "materializing" the values into a collection. To amplify this point, there are actually two costs here: the cost of storing values in the collection (memory allocation and copying) and also the cost of creating the values in the first place. The latter cost can often be reduced or avoided by taking advantage of a Stream's laziness-seeking behavior. Good examples of this are the APIs in java.nio.file.Files:
static Stream<String> lines(path)
static List<String> readAllLines(path)
Not only does readAllLines have to hold the entire file contents in memory in order to store it into the result list, it also has to read the file to the very end before it returns the list. The lines method can return almost immediately after it has performed some setup, leaving file reading and line breaking until later when it's necessary -- or not at all. This is a huge benefit, if for example, the caller is interested only in the first ten lines:
try (Stream<String> lines = Files.lines(path)) {
List<String> firstTen = lines.limit(10).collect(toList());
}
Of course considerable memory space can be saved if the caller filters the stream to return only lines matching a pattern, etc.
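A runnable sketch of the filtering case, assuming a small temporary file written only for the demonstration. The class name, the file contents, and the "ERROR" marker are all made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyLinesSketch {

    // Keeps only the matching lines; the other lines are dropped as they are read
    // and never accumulate in a list.
    static List<String> linesContaining(Path path, String needle) {
        try (Stream<String> lines = Files.lines(path)) {
            return lines.filter(line -> line.contains(needle))
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a tiny temporary file so the sketch runs without any setup.
    static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("lines-sketch", ".txt");
            try {
                Files.write(tmp, Arrays.asList(
                        "INFO start", "ERROR disk full", "INFO done", "ERROR timeout"));
                return linesContaining(tmp, "ERROR");
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```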
An idiom that seems to be emerging is to name stream-returning methods after the plural of the name of the things that it represents or contains, without a get prefix. Also, while stream() is a reasonable name for a stream-returning method when there is only one possible set of values to be returned, sometimes there are classes that have aggregations of multiple types of values. For example, suppose you have some object that contains both attributes and elements. You might provide two stream-returning APIs:
Stream<Attribute> attributes();
Stream<Element> elements();
Were streams designed to always be "terminated" inside the same expression they were created in?
That is how they are used in most examples.
Note: returning a Stream is not that different from returning an Iterator (admittedly with much more expressive power).
IMHO the best solution is to encapsulate why you are doing this, and not return the collection.
e.g.
public int playerCount();
public Player player(int n);
or if you intend to count them
public int countPlayersWho(Predicate<? super Player> test);
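One possible shape for such an encapsulating class is sketched below. String stands in for Player and the add method is an assumption, both only there to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class EncapsulatedTeam {
    // The list itself never leaves the class.
    private final List<String> players = new ArrayList<>();

    void add(String player) {
        players.add(player);
    }

    int playerCount() {
        return players.size();
    }

    String player(int n) {
        return players.get(n);
    }

    // The caller says what to count; the class decides how to iterate.
    int countPlayersWho(Predicate<? super String> test) {
        int count = 0;
        for (String player : players) {
            if (test.test(player)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        EncapsulatedTeam team = new EncapsulatedTeam();
        team.add("Ann");
        team.add("Bob");
        team.add("Alice");
        System.out.println(team.countPlayersWho(name -> name.startsWith("A")));
    }
}
```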
If the stream is finite, and there is an expected/normal operation on the returned objects which will throw a checked exception, I always return a Collection. Because if you are going to be doing something on each of the objects that can throw a checked exception, you will hate the stream. One real shortcoming of streams is their inability to deal with checked exceptions elegantly.
Now, perhaps that is a sign that you don't need the checked exceptions, which is fair, but sometimes they are unavoidable.
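The friction can be seen in a short sketch: with a loop the checked exception simply propagates, while inside a stream lambda it has to be caught and wrapped, because Function.apply declares no checked exceptions. The load method and class name are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class CheckedExceptionSketch {

    // Hypothetical operation that declares a checked exception.
    static String load(String name) throws IOException {
        if (name.isEmpty()) {
            throw new IOException("empty name");
        }
        return "data:" + name;
    }

    // Collection plus loop: the IOException propagates through the signature.
    static List<String> loadAllLoop(List<String> names) throws IOException {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            out.add(load(name));
        }
        return out;
    }

    // Stream: the lambda must catch the IOException and rethrow it unchecked.
    static List<String> loadAllStream(List<String> names) {
        return names.stream()
                .map(name -> {
                    try {
                        return load(name);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .collect(Collectors.toList());
    }
}
```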
While some of the more high-profile respondents gave great general advice, I'm surprised no one has quite stated:
If you already have a "materialized" Collection in-hand (i.e. it was already created before the call - as is the case in the given example, where it is a member field), there is no point converting it to a Stream. The caller can easily do that themselves. Whereas, if the caller wants to consume the data in its original form, you converting it to a Stream forces them to do redundant work to re-materialize a copy of the original structure.
In contrast to collections, streams have additional characteristics. A stream returned by any method might be:
finite or infinite
parallel or sequential (with a default globally shared threadpool that can impact any other part of an application)
ordered or non-ordered
holding references to be closed or not
These differences also exist in collections, but there they are part of the obvious contract:
All Collections have size, Iterator/Iterable can be infinite.
Collections are explicitly ordered or non-ordered
Parallelism is thankfully not something collections care about beyond thread-safety
Collections also are not closable typically, so also no need to worry about using try-with-resources as a guard.
As a consumer of a stream (either from a method return or as a method parameter) this is a dangerous and confusing situation. To make sure their algorithm behaves correctly, consumers of streams need to make sure the algorithm makes no wrong assumption about the stream characteristics. And that is a very hard thing to do. In unit testing, that would mean that you have to multiply all your tests to be repeated with the same stream contents, but with streams that are
(finite, ordered, sequential, requiring-close)
(finite, ordered, parallel, requiring-close)
(finite, non-ordered, sequential, requiring-close)...
Writing method guards for streams that throw an IllegalArgumentException if the input stream has a characteristic that breaks your algorithm is difficult, because the properties are hidden.
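A sketch of what such a guard might look like, and why it is awkward. Only isParallel() can be asked directly; orderedness is reachable only through spliterator(), which is a terminal operation, so the stream has to be rebuilt afterwards. Finiteness and the need to close cannot be checked this way at all. The class and method names are assumptions.

```java
import java.util.Spliterator;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class StreamGuardSketch {

    // Rejects parallel or unordered input and returns a stream the caller can keep using.
    static <T> Stream<T> requireSequentialOrdered(Stream<T> input) {
        if (input.isParallel()) {
            throw new IllegalArgumentException("parallel stream not supported");
        }
        // spliterator() consumes the original stream object, so after the check
        // a fresh stream is built from the same spliterator.
        Spliterator<T> source = input.spliterator();
        if (!source.hasCharacteristics(Spliterator.ORDERED)) {
            throw new IllegalArgumentException("unordered stream not supported");
        }
        // Closing the rebuilt stream also closes the original one.
        return StreamSupport.stream(source, false).onClose(input::close);
    }
}
```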
Documentation mitigates the problem, but it is flawed and often overlooked, and does not help when a stream provider is modified. As an example, see these javadocs of Java8 Files:
/**
* [...] The returned stream encapsulates a Reader. If timely disposal of
* file system resources is required, the try-with-resources
* construct should be used to ensure that the stream's close
* method is invoked after the stream operations are completed.
*/
public static Stream<String> lines(Path path, Charset cs)
/**
* [...] no mention of closing even if this wraps the previous method
*/
public static Stream<String> lines(Path path)
That leaves Stream only as a valid choice in a method signature when none of the problems above matter, typically when the stream producer and consumer are in the same codebase, and all consumers are known (e.g. not part of the public interface of a class reusable in many places).
It is much safer to use other datatypes in method signatures with an explicit contract (and without implicit thread-pool processing involved) that makes it impossible to accidentally process data with wrong assumptions about orderedness, sizedness or parallelity (and threadpool usage).
I think it depends on your scenario. Maybe it is sufficient to make your Team implement Iterable<Player>.
for (Player player : team) {
System.out.println(player);
}
or in a functional style:
team.forEach(System.out::println);
But if you want a more complete and fluent api, a stream could be a good solution.
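A minimal sketch of the Iterable variant, with String standing in for Player and an add method assumed for the demonstration. Iterating over an unmodifiable view is a design choice made here so that callers cannot remove players through Iterator.remove().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class IterableTeam implements Iterable<String> {
    private final List<String> players = new ArrayList<>();

    void add(String player) {
        players.add(player);
    }

    // The for-each loop and the inherited forEach() both go through this iterator.
    @Override
    public Iterator<String> iterator() {
        return Collections.unmodifiableList(players).iterator();
    }

    public static void main(String[] args) {
        IterableTeam team = new IterableTeam();
        team.add("Ann");
        team.add("Bob");
        for (String player : team) {
            System.out.println(player);
        }
        team.forEach(System.out::println);
    }
}
```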
Perhaps a Stream factory would be a better choice. The big win of only
exposing collections via Stream is that it better encapsulates your
domain model’s data structure. It’s impossible for any use of your domain classes to affect the inner workings of your List or Set simply
by exposing a Stream.
It also encourages users of your domain class to
write code in a more modern Java 8 style. It’s possible to
incrementally refactor to this style by keeping your existing getters
and adding new Stream-returning getters. Over time, you can rewrite
your legacy code until you’ve finally deleted all getters that return
a List or Set. This kind of refactoring feels really good once you’ve
cleared out all the legacy code!
I would probably have 2 methods, one to return a Collection and one to return the collection as a Stream.
class Team
{
private List<Player> players = new ArrayList<>();
// ...
public List<Player> getPlayers()
{
return Collections.unmodifiableList(players);
}
public Stream<Player> getPlayerStream()
{
return players.stream();
}
}
This is the best of both worlds. The client can choose if they want the List or the Stream and they don't have to do the extra object creation of making an immutable copy of the list just to get a Stream.
This also adds only one more method to your API, so you don't end up with too many methods.