Invoking .map() on an infinite stream?

According to the Javadocs for Java SE 8, Stream.map() does the following:
Returns a stream consisting of the results of applying the given function to the elements of this stream.
However, a networking book I'm reading (Learning Network Programming with Java, Richard M. Reese) implements roughly the following code snippet in an echo server.
Supplier<String> inputLine = () -> {
    try {
        return br.readLine();
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
};
Stream.generate(inputLine).map((msg) -> {
    System.out.println("Received: " + (msg == null ? "end of stream" : msg));
    out.println("echo: " + msg);
    return msg;
}).allMatch((msg) -> msg != null);
This is supposed to be a functional way to echo user input back over the socket's output stream. It works as intended, but I don't quite understand how. Is it because map knows the stream is infinite, so it executes lazily as new stream elements become available? Adding something to a collection that map is currently iterating over seems like a little black magic. Could someone please help me understand what is going on behind the scenes?
Here is how I restated this in order to avoid the confusing map usage. I believe the author was trying to avoid an infinite loop since you can't break out of a forEach.
Stream.generate(inputLine).allMatch((msg) -> {
    boolean alive = msg != null;
    System.out.println("Received: " + (alive ? msg : "end of stream"));
    out.println("echo: " + msg);
    return alive;
});

Streams are lazy. Think of them as workers in a chain that pass buckets to each other. The laziness is in the fact that they will only ask the worker behind them for the next bucket if the worker in front of them asks them for it.
So it's best to think about this as allMatch - being a final action, thus eager - asking the map stream for the next item, and the map stream asking the generate stream for the next item, and the generate stream going to its supplier, and providing that item as soon as it arrives.
It stops when allMatch stops asking for items. And it does so when it knows the answer. Are all items in this stream not null? As soon as the allMatch receives an item that is null, it knows the answer is false, and will finish and not ask for any more items. Because the stream is infinite, it will not stop otherwise.
So you have two factors causing this to work the way it works: one is allMatch eagerly asking for the next item (as long as the previous ones weren't null), and the other is the generate stream, which - in order to supply that next item - may need to wait on the supplier, which in turn waits for the user to send more input.
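Here is a minimal, self-contained sketch of that pull-driven behavior (my own example, with a counter standing in for the blocking reader):

import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyDemo {
    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        // generate() only calls the supplier when the terminal operation
        // pulls an item; allMatch stops pulling as soon as its predicate fails.
        boolean all = Stream.generate(counter::incrementAndGet)
                            .map(n -> n) // stands in for the question's map step
                            .allMatch(n -> n < 3);
        System.out.println(all);           // false
        System.out.println(counter.get()); // 3 - the supplier ran only 3 times
    }
}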
But it should be said that map shouldn't have been used here. There should be no side effects in map; it should be used for mapping an item of one type to an item of another type. I think this example was used only as a study aid. The much simpler and more straightforward way would be to use BufferedReader's lines() method, which gives you a finite Stream of the lines coming from the buffered reader.
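For illustration, a minimal sketch of that lines() approach (a StringReader stands in for the socket's reader here; the names are mine, not the book's):

import java.io.BufferedReader;
import java.io.PrintWriter;
import java.io.StringReader;

public class LinesEcho {
    public static void main(String[] args) {
        BufferedReader br = new BufferedReader(new StringReader("hello\nworld"));
        PrintWriter out = new PrintWriter(System.out, true);
        // lines() yields a finite stream that ends when the reader is
        // exhausted, so no null check or allMatch trick is needed.
        br.lines().forEach(msg -> {
            System.out.println("Received: " + msg);
            out.println("echo: " + msg);
        });
    }
}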

Yes - Streams are set up lazily until and unless you perform a terminal operation (final action) on the Stream. Or, more simply:
For as long as the operations on your stream return another Stream, you do not have a terminal operation, and you keep on chaining until you have something returning anything other than a Stream, including void.
This makes sense, as to be able to return anything other than a Stream, the operations earlier in your stream will need to be evaluated to actually be able to provide the data.
In this case, and as per the documentation, allMatch returns a boolean, and thus final execution of your stream is required to calculate that boolean. This is also the point where you provide the Predicate that limits your resulting Stream.
Also note that in the documentation it states:
This is a short-circuiting terminal operation.
Follow that link for more information on those terminal operations, but a terminal operation basically means that it will actually execute the operation. Additionally, the limiting of your infinite stream is the 'short-circuiting' aspect of that method.

Here are the two most relevant sentences of the java-stream documentation. The snippet you provided is a perfect example of them working together:
The documentation of Stream::generate(Supplier<T> s) says:
Returns an infinite sequential unordered stream where each element is generated by the provided Supplier.
And the third bullet of the Stream package documentation:
Laziness-seeking. Many stream operations, such as filtering, mapping, or duplicate removal, can be implemented lazily, exposing opportunities for optimization. For example, "find the first String with three consecutive vowels" need not examine all the input strings. Stream operations are divided into intermediate (Stream-producing) operations and terminal (value- or side-effect-producing) operations. Intermediate operations are always lazy.
In short, the generated stream keeps awaiting further elements until the terminal operation is satisfied. As long as the supplied Supplier<T> keeps producing elements (or blocking until one is available), the stream pipeline continues.
As an example, if you provide the following Supplier, the execution has no chance to stop and will continue infinitely:
Supplier<String> inputLine = () -> {
    return "Hello world";
};


How to fix "Intermediate Stream methods should not be left unused" on Sonarqube

I found this bug on Sonarqube:
private String getMacAdressByPorts(Set<Integer> ports) {
    ports.stream().sorted(); // Sonar shows: "Refactor the code so this stream pipeline is used"
    return ports.toString();
} //getMacAdressByPorts
I have been searching the Internet for a long time, but to no avail. Please help or give me some ideas on how to fix this.
The sorted() method has no effect on the Set you pass in; in fact, since it's an intermediate (non-terminal) operation, it isn't even executed. If you want your ports sorted, you need something like
return ports.stream().sorted().map(String::valueOf).collect(Collectors.joining(","));
(the map(String::valueOf) step is needed because Collectors.joining only accepts CharSequence elements).
EDIT:
As @Slaw correctly points out, to get the same format you had before (i.e. [item1, item2, item3]), you also need to pass the square brackets to the joining collector, i.e. Collectors.joining(", ", "[", "]"). I left those out for simplicity.
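Putting the pieces together, here is a sketch of a full method in the question's shape (the mapping to String is still required before joining):

import java.util.Set;
import java.util.stream.Collectors;

class PortFormatter {
    // Sorts the ports and reproduces the "[a, b, c]" format of
    // Set.toString() via the three-argument joining collector.
    static String format(Set<Integer> ports) {
        return ports.stream()
                    .sorted()
                    .map(String::valueOf)
                    .collect(Collectors.joining(", ", "[", "]"));
    }
}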
From the Sonar Source documentation about this warning (emphasis of mine):
Stream operations are divided into intermediate and terminal operations, and are combined to form stream pipelines. After the terminal operation is performed, the stream pipeline is considered consumed, and cannot be used again. Such a reuse will yield unexpected results.
Official JavaDoc for stream() gives more details on sorted() (emphasis of mine):
Returns a stream consisting of the elements of this stream, sorted according to natural order. If the elements of this stream are not Comparable, a java.lang.ClassCastException may be thrown when the terminal operation is executed. [...] This is a stateful intermediate operation.
This implies that calling only sorted() will have no effect whatsoever. From the Oracle Stream package documentation (still emphasis of mine):
Stream operations are divided into intermediate and terminal operations, and are combined to form stream pipelines. A stream pipeline consists of a source (such as a Collection, an array, a generator function, or an I/O channel); followed by zero or more intermediate operations such as Stream.filter or Stream.map; and a terminal operation such as Stream.forEach or Stream.reduce.
Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.
sorted() returns another Stream, not a sorted list. To solve your Sonar issue (and maybe your code issue, for that matter), you have to call a terminal operation in order to run all the intermediate operations. You can find a (non-exhaustive, I think) list of terminal operations on CodeJava, for instance.
In your case, the solution might look like:
private String getMacAdressByPorts(Set<Integer> ports) {
    /* Ports > Stream > sorted (intermediate operation) >
     * collect (terminal operation) > List > String.
     * Note that you need to statically import toList()
     * from the Collectors class, otherwise it won't compile.
     */
    return ports.stream().sorted().collect(toList()).toString();
}
I finally solved the problem using the code below.
private String getMacAdressByPorts(Set<Integer> ports) {
    return ports.stream().sorted().collect(Collectors.toList()).toString();
}

Stream spliterator implementation detail

While looking into the source code of WrappingSpliterator::trySplit, I was very misled by its implementation:
@Override
public Spliterator<P_OUT> trySplit() {
    if (isParallel && buffer == null && !finished) {
        init();
        Spliterator<P_IN> split = spliterator.trySplit();
        return (split == null) ? null : wrap(split);
    }
    else
        return null;
}
And if you are wondering why this matters, it's because, for example, this:
Arrays.asList(1, 2, 3, 4, 5)
      .stream()
      .filter(x -> x != 1)
      .spliterator();
is using it. In my understanding, adding any intermediate operation to a stream will cause that code to be triggered.
Basically this method says that unless the stream is parallel, this Spliterator is to be treated as one that cannot be split at all. And this matters to me. In one of my methods (this is how I got to that code), I get a Stream as input and "parse" it into smaller pieces, manually, with trySplit. You can think of it, for example, as me trying to implement a findLast for a Stream.
And this is where my desire to split in smaller chunks is nuked, because as soon as I do:
Spliterator<T> sp = stream.spliterator();
Spliterator<T> prefixSplit = sp.trySplit();
I find out that prefixSplit is null, meaning that I basically can't do anything other than consume the entire sp with forEachRemaining.
And this is a bit weird; maybe it makes some sense when filter is present, because in that case the only way (in my understanding) a Spliterator could be returned is by using some kind of buffer, maybe even one with a predefined size (much like Files::lines). But why this:
Arrays.asList(1, 2, 3, 4)
      .stream()
      .sorted()
      .spliterator()
      .trySplit();
returns null is something I don't understand. sorted is a stateful operation that buffers the elements anyway, without reducing or increasing their initial number, so at least in theory this could return something other than null...
When you invoke spliterator() on a Stream, there are only two possible outcomes with the current implementation.
If the stream has no intermediate operations you’ll get the source spliterator that has been used to construct the stream and whose splitting capability is entirely independent from the stream’s parallel state, as in fact, the spliterator doesn’t know anything about the stream.
Otherwise, you'll get a WrappingSpliterator, which encapsulates a source Spliterator and a pipeline state, expressed as a PipelineHelper. This combination of Spliterator and PipelineHelper does not need to be able to work in parallel, and in fact would not work in the case of distinct(), as the WrappingSpliterator gets an entirely different combination depending on whether the Stream is parallel or not.
For stateless intermediate operations, it would not make a difference though. But, as discussed in “Why the tryAdvance of stream.spliterator() may accumulate items into a buffer?”, the WrappingSpliterator is a “one-fits-all implementation” that doesn’t consider the actual nature of the pipeline, so its limitations are the superset of all possible limitations of all supported pipeline stages. So the existence of one scenario that wouldn’t work when ignoring the parallel flag is enough to forbid splitting for all pipelines when not being parallel.
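The effect is easy to observe with a small sketch (my example; note that the parallel case returning a non-null split is an implementation detail of current JDKs, not a guarantee):

import java.util.Arrays;
import java.util.Spliterator;

public class TrySplitDemo {
    public static void main(String[] args) {
        // Sequential pipeline with an intermediate operation: splitting is refused.
        Spliterator<Integer> seq = Arrays.asList(1, 2, 3, 4)
                .stream().sorted().spliterator();
        System.out.println(seq.trySplit()); // null

        // The same pipeline marked parallel: the wrapping spliterator may split.
        Spliterator<Integer> par = Arrays.asList(1, 2, 3, 4)
                .parallelStream().sorted().spliterator();
        System.out.println(par.trySplit()); // non-null on current JDKs
    }
}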

Reusing a stream when extra filtering is needed

The following code, in summary, is meant to filter the first parameter for entities that match the second parameter. And if a 'change' was specified in the second parameter, it should filter down to a narrower result.
When I run this I get an error of 'IllegalStateException: stream has already been operated upon or closed.'
Is there a way that I can reuse the same stream?
I have seen people implementing something like Supplier<Stream<T>>, but I don't think that would work for this case. Or I'm just not familiar enough with it to understand how I could use a Supplier.
/**
 * Filters through DocumentAuditEntityListing to find existence of the entities
 * ActionEnum, ActionContextEnum, LevelEnum, & StatusEnum.
 *
 * @param audits A list of audits to search
 * @param toFind The audit entities to find
 * @return If entities found, return DocumentAudit, else null
 */
public DocumentAudit verifyAudit(DocumentAuditEntityListing audits, DocumentAudit toFind) {
    // Filter for proper entities
    Stream stream = audits.getEntities().stream().filter(doc -> (
            doc.getAction().equals(toFind.getAction())) &&
            doc.getActionContext().equals(toFind.getActionContext()) &&
            doc.getLevel().equals(toFind.getLevel()) &&
            doc.getStatus().equals(toFind.getStatus()));
    // If changes were specified, filter further.
    if (toFind.getChanges() != null) {
        stream.filter(change -> (toFind.getChanges().contains(change)));
    }
    return (DocumentAudit) stream.findFirst().orElse(null);
}
You need to assign the resulting stream with stream = stream.filter(change -> ...).
Also make the stream typesafe (i.e. Stream<DocumentAudit>). Generics have been around since Java 5, so you don't have an excuse to use raw types.
Stream<DocumentAudit> stream = audits.getEntities().stream()
        .filter(doc ->
                doc.getAction().equals(toFind.getAction()) &&
                doc.getActionContext().equals(toFind.getActionContext()) &&
                doc.getLevel().equals(toFind.getLevel()) &&
                doc.getStatus().equals(toFind.getStatus()));
if (toFind.getChanges() != null) {
    stream = stream.filter(change -> toFind.getChanges().contains(change));
}
return stream.findFirst().orElse(null);
The Supplier is not a trick, but simply a way to facilitate the creation of a new stream. The rule is still the same: you cannot reuse a stream under any circumstances. The general approach of using the Supplier is something like this:
Supplier<Stream<Something>> sup = () -> yourList.stream();
So every time you call sup.get() you are getting a new Stream. There is nothing fancy in this, just a matter of style.
Actually, your question is a bit different than the ones I have already seen like this. Basically you are doing:
Stream<Some> s = ...
s.filter
s.filter
s.forEach
And I find the error message a bit misleading here: what stream has already been operated upon or closed actually means is that a single terminal or intermediate operation can be applied only once to a single Stream. If the message made this clearer, it would make more sense.
The error message is confusing, because the stream is not being reused (only one terminal operation, findFirst(), is being called).
However, as pointed out by others, the stream returned by the stream.filter(...) call within the if block is being lost, because it is never assigned to a variable. This is the cause of the confusing error.
Despite all this, I would like to show you another way to deal with this requirement. Instead of conditionally applying an extra filter, you could work on the predicate first and then, once ready, pass it to the stream.filter(...) operation:
Predicate<DocumentAudit> pred = doc ->
        doc.getAction().equals(toFind.getAction()) &&
        doc.getActionContext().equals(toFind.getActionContext()) &&
        doc.getLevel().equals(toFind.getLevel()) &&
        doc.getStatus().equals(toFind.getStatus());

// If changes were specified, filter further.
if (toFind.getChanges() != null) {
    pred = pred.and(change -> toFind.getChanges().contains(change));
}

return audits.getEntities().stream()
        .filter(pred)
        .findFirst()
        .orElse(null);
Some comments...
I've used Predicate.and to compose the conditions.
You were using a raw Stream and casting in the last line. Please don't do that anymore; it's 2017...
Ideally, you should use Optional as the returning value of your method (instead of returning either the found object or null). This way, users of your method would know in advance that the object might either have been found or not. And they could even use Optional's handy operations to conditionally perform actions over the object, or conditionally map it to some of its attributes, etc.
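For illustration, here is the same method reworked to return Optional (a sketch reusing the question's hypothetical types, so it is not runnable in isolation):

// findFirst() already yields an Optional; simply not unwrapping it
// pushes the present/absent decision to the caller.
public Optional<DocumentAudit> findAudit(DocumentAuditEntityListing audits, DocumentAudit toFind) {
    Predicate<DocumentAudit> pred = doc ->
            doc.getAction().equals(toFind.getAction()) &&
            doc.getActionContext().equals(toFind.getActionContext()) &&
            doc.getLevel().equals(toFind.getLevel()) &&
            doc.getStatus().equals(toFind.getStatus());
    if (toFind.getChanges() != null) {
        pred = pred.and(change -> toFind.getChanges().contains(change));
    }
    return audits.getEntities().stream().filter(pred).findFirst();
}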

Does the JDK provide a dummy consumer?

I have a need in a block of code to consume 'n' items from a stream then finish, in essence:
public static <T> void eat(Stream<T> stream, int n) {
    // consume n items of the stream (and throw them away)
}
In my situation, I can't change the signature to return Stream<T> and simply return stream.skip(n); I have to actually throw away some elements from the stream (the logic is not simple) - to be ready for a downstream consumer which doesn't need to know how, or even that, this has happened.
The simplest way to do this is to use limit(n), but I have to call a stream terminating method to activate the stream, so in essence I have:
public static <T> void skip(Stream<T> stream, int n) {
    stream.limit(n).forEach(t -> {});
}
Note: This code is a gross oversimplification of the actual code and is for illustrative purposes only. Actually, limit won't work, because there is logic around what and how to consume elements. Think of it like consuming "header" elements from a stream, then having a consumer consume the "body" elements.
This question is about the "do nothing" lambda t -> {}.
Is there a "do nothing" consumer somewhere in the JDK, like the "do nothing" function Function.identity()?
No, the JDK does not provide a dummy consumer, nor other predefined no-op functions such as a dummy runnable, an always-true predicate, or a supplier that always returns zero. Just write t -> {}; it's shorter anyway than calling any ready-made method that would do the same.
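If the no-op lambda shows up in many places, a common workaround is to define it once in a small helper of your own (a hypothetical utility, not a JDK API):

import java.util.function.Consumer;

final class Consumers {
    private Consumers() {}

    // One shared no-op instance; since it never touches its argument,
    // the unchecked cast to any element type is safe.
    private static final Consumer<Object> NOOP = t -> {};

    @SuppressWarnings("unchecked")
    static <T> Consumer<T> noop() {
        return (Consumer<T>) NOOP;
    }
}

The method from the question would then read stream.limit(n).forEach(Consumers.noop());.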
Introducing the dummy (empty) consumer was considered in the scope of the ticket:
[JDK-8182978] Add default empty consumer - Java Bug System.
According to the ticket, it was decided not to introduce it.
Therefore, there is no dummy (empty) consumer in the JDK.
Yes. Well, more or less yes...
While a Function is not literally a Consumer, its apply method takes a single argument, so a method reference to Function.identity()'s apply can serve as a "do nothing" Consumer (the returned value is simply discarded).
However, the compiler needs a little help to make the leap:
someStream.forEach(identity()::apply);
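A self-contained sketch of that trick (my example):

import java.util.function.Consumer;
import java.util.stream.Stream;
import static java.util.function.Function.identity;

public class IdentityAsConsumer {
    public static void main(String[] args) {
        // identity()'s apply method takes one argument, so the method
        // reference fits Consumer's shape; the return value is discarded.
        Consumer<String> doNothing = identity()::apply;
        Stream.of("a", "b", "c").limit(2).forEach(doNothing);
    }
}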

Java 8 Stream: Iterating, Processing and Count

Is it OK to process and count processed data in such a way?
long count = userDao.findApprovedWithoutData().parallelStream().filter(u -> {
    Data d = dataDao.findInfoByEmail(u.getEmail());
    boolean ret = false;
    if (d != null) {
        String result = "";
        result += getFieldValue(d::getName, ". \n");
        result += getFieldValue(d::getOrganization, ". \n");
        result += getFieldValue(d::getAddress, ". \n");
        if (!result.isEmpty()) {
            u.setData(d.getInfo());
            userDao.update(u);
            ret = true;
        }
    }
    return ret;
}).count();
So, in short: iterate over the incomplete records, update them where data is present, and count the number of updated records?
IMHO this is bad code, because:
The filter predicate has (quite significant) side effects
Predicates should not have side effects (just like getters shouldn't). It's unexpected, and that makes it bad.
The filter predicate is very inefficient
Each execution of the predicate fires a large chain of queries, which makes this code unscalable.
At first glance, the main purpose seems to be getting a count, but really that's a minor (dispensable) bit of info
Good code makes it obvious what is going on (unlike this code)
You should change the code to use a (fairly simple) single update query (that employs a join) and get the count from the "number of rows updated" info in the result from the persistence API.
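As an illustration only (the table and column names are invented, and the UPDATE ... FROM join syntax shown is PostgreSQL-flavored), such a single update might look like:

// Hypothetical native bulk update; executeUpdate() returns the affected
// row count, which replaces the stream-side count().
int updated = entityManager.createNativeQuery(
        "UPDATE users u SET data = d.info " +
        "FROM data d " +
        "WHERE d.email = u.email AND u.approved = true AND u.data IS NULL")
        .executeUpdate();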
It depends on your definition of process. I cannot give you a clear yes or no, because it is hard to conclude without understanding your code and how it is implemented.
You are using a parallel stream, in which case the Java runtime splits the stream into sub-streams based on the number of available threads in the ForkJoinPool common pool.
When using parallelism you need to be careful about possible side effects:
Interference (a lambda expression in a stream operation should not interfere):
Lambda expressions in stream operations should not interfere. Interference occurs when the source of a stream is modified while a pipeline processes the stream.
Stateful lambda expressions:
Avoid using stateful lambda expressions as parameters in stream operations. A stateful lambda expression is one whose result depends on any state that might change during the execution of a pipeline.
Looking at your question and applying the above points to it.
Non-interference strongly states that lambda expressions should not interfere with the source of the stream (unless the stream source is concurrent) during a pipeline operation, because interference can cause:
Exceptions (e.g. ConcurrentModificationException)
Incorrect answers
Nonconformant behaviour
The exception is well-behaved stream sources, which may be modified before the terminal operation commences, as shown below; read more here.
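For reference, here is the kind of well-behaved modification the package documentation describes, adapted from its example: the source is changed after the stream is created but before the terminal operation begins.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WellBehavedSource {
    public static void main(String[] args) {
        List<String> l = new ArrayList<>(Arrays.asList("one", "two"));
        Stream<String> sl = l.stream();
        // Safe: the (late-binding) source is modified before the
        // terminal operation starts traversing it.
        l.add("three");
        System.out.println(sl.collect(Collectors.joining(" "))); // one two three
    }
}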
Your lambda expression does interfere with the source of the stream, which is not advised; however, the interference happens within an intermediate operation (filter), so everything comes down to whether the stream source is well-behaved or not. So you might reconsider your lambda expression with respect to interference. It may also come down to how you update the source of the stream via userDao.update, which is not clear from your question.
Stateful lambda expressions: your lambda expression does not seem to be stateful, because its result depends only on values that do not change during the execution of the pipeline. So this point does not apply to your case.
I advise you to go through the Java 8 Stream documentation, as well as this blog, which explains Java 8 streams really well with examples.
