Java 8 - Count after filter on stream

I hope this question has not been asked before.
In Java 8, I have an array of String myArray as input and an integer maxLength.
I want to count the number of strings in my array whose length is at most maxLength. I WANT to use a stream to solve this.
For that I thought to do this :
int solution = (int) Arrays.stream(myArray).filter(s -> s.length() <= maxLength).count();
However I'm not sure if it is the right way to do this. It will need to go through the array once for the filter and then go through the filtered result a second time to count.
But if I don't use a stream, I could easily write an algorithm that loops over myArray only once.
My questions are very easy: Is there a way to solve this with the same time performance as a plain loop? Is using a stream always a "good" solution?

However I'm not sure if it is the right way to do this. It will need
to go through the array once for the filter and then go through the
filtered result a second time to count.
Your assumption that it will perform multiple passes is wrong. There is something called operation fusion, i.e. multiple operations can be executed in a single pass over the data.
In this case Arrays.stream(myArray) will create a stream object (a cheap operation producing a lightweight object), and filter(s -> s.length() <= maxLength).count() will be combined into a single pass over the data, because there is no stateful operation in the pipeline. It does not filter all the elements of the stream first and then count the elements that passed the predicate.
A quote from a post by Brian Goetz states:
Stream pipelines, in contrast, fuse their operations into as few
passes on the data as possible, often a single pass. (Stateful
intermediate operations, such as sorting, can introduce barrier points
that necessitate multipass execution.)
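A quick way to see this single pass in action is to add a peek step. In this hedged sketch (the array contents and maxLength value are made up for illustration), each element is printed exactly once, showing that the filter and the count are fused into one traversal rather than producing an intermediate filtered array:
String[] myArray = { "a", "ab", "abcd" };
int maxLength = 2;

long solution = Arrays.stream(myArray)
        .peek(s -> System.out.println("visiting " + s)) // fires exactly once per element
        .filter(s -> s.length() <= maxLength)
        .count();
// prints "visiting a", "visiting ab", "visiting abcd"; solution == 2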
As for:
My questions are very easy: Is there a way to solve this with the
same time performance as a plain loop?
That depends on the amount of data and the cost per element. For a small number of elements, an imperative for loop will almost always win.
Is using a stream always a "good" solution?
No. If you really care about performance, then measure, measure and measure again.
Use streams for their declarative style, for their abstraction and composability, and for the possibility of benefiting from parallelism when you know it will actually help.

You can also use IntStream.range over the indexes and filter on the array elements:
int solution = (int) IntStream.range(0, myArray.length)
        .filter(index -> myArray[index].length() <= maxLength)
        .count();

Related

How do elements go through the stream?

How do elements of a stream go through the stream itself? Does it take one element and pass it through all functions (map, then sort, then collect) and then take the second element and repeat the cycle, or does it take all elements, map them, then sort them, and finally collect?
new ArrayList<Integer>().stream()
    .map(x -> x.byteValue())
    .sorted()
    .collect(Collectors.toList());
It depends entirely on the stream. It is usually evaluated lazily, which means it processes one element at a time, but under certain conditions it needs to get all the elements before it continues to the next step. For example, consider the following code:
IntStream.generate(() -> (int) (Math.random() * 100))
    .limit(20)
    .filter(i -> i % 2 == 0)
    .sorted()
    .forEach(System.out::println);
This stream generates random numbers from 0 to 99, limited to 20 elements, after which it filters the numbers by checking whether or not they are even; only the even ones continue. Up to this point, elements are handled one at a time. The change comes when you request sorting of the stream. The sorted() method sorts the stream by the natural ordering of the elements, or by a provided comparator. To sort something you need access to all the elements, because you don't know whether the last element you receive will end up first after sorting. So this method waits for the entire stream, sorts it and returns the sorted stream. After that, the code prints the sorted stream one element at a time.
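A small sketch (with made-up values) makes that barrier visible: every "before sort" line is printed before any "after sort" line, because sorted() buffers the whole stream before passing anything downstream:
Stream.of(3, 1, 2)
    .peek(x -> System.out.println("before sort: " + x)) // runs while elements are still flowing in
    .sorted()                                            // buffers everything here
    .forEach(x -> System.out.println("after sort: " + x));
// before sort: 3, before sort: 1, before sort: 2,
// after sort: 1, after sort: 2, after sort: 3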
That depends on the actual Stream implementation. This mostly applies to parallel streams, because spliterators tend to chunk the data and you don't know which element will be processed when.
In general, a stream goes through each element in order (but doesn't have to). The simplest way to check this behaviour is to put in some breakpoints and see when they actually hit.
Also, certain operations may wait until all prior operations are executed (namely collect()).
I advise checking the Javadoc and reading it carefully, because it gives enough hints to know what to expect.
Something like that, yes.
If you have a stream of integers, say 1, 2, 3, 4, 5, and you do some operations on it, say stream().map(x -> x * 3).filter(x -> x % 2 == 0).findFirst(),
it will first take the first value (1), multiply it by 3, and then check whether the result is even.
Because it's not, it will take the second one (2), multiply it by 3 (= 6), check whether it is even (it is) and satisfy findFirst.
That is the first match, so the stream stops and returns.
This means the remaining integers in the stream won't be evaluated (multiplied and checked for evenness), as it is not necessary.
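Here is that walkthrough as a runnable sketch; the peek step shows that only the first two source elements are ever pulled through the pipeline:
Optional<Integer> first = Stream.of(1, 2, 3, 4, 5)
        .peek(x -> System.out.println("evaluating " + x)) // shows which source elements are consumed
        .map(x -> x * 3)
        .filter(x -> x % 2 == 0)
        .findFirst();
// prints "evaluating 1" and "evaluating 2", then returns Optional[6]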

Length of an infinite IntStream?

I have created an iterator over a random IntStream like this:
final static PrimitiveIterator.OfInt startValue = new Random().ints(0, 60).iterator();
The documentation says this stream is actually endless.
I want to understand what happens in the background.
ints(0, 60) generates an infinite stream of integers. If it is infinite, why is my machine not leaking any memory?
I wonder how many numbers are actually generated, and whether this implementation can cause an error at the point where the stream eventually ends. Or is the stream constantly filled with new integers on the fly, so that it really never ends?
And while I am asking: what is the current best practice for generating random numbers?
The stream is infinite¹ so you can generate as many ints as you want without running out. It does not mean that it keeps generating them when you aren't asking for any.
How many numbers are actually generated depends on the code you write. Every time you retrieve a value from the iterator, a value is generated. None is generated in the background, so there's no "extra" memory being used.
¹ as far as your lifetime is concerned, see Eran's answer
To be exact,
IntStream java.util.Random.ints(int randomNumberOrigin, int randomNumberBound) returns:
an effectively unlimited stream of pseudorandom int values, each conforming to the given origin (inclusive) and bound (exclusive).
This doesn't mean infinite. Looking at the Javadoc, you'll see an implementation note stating that it actually limits the returned IntStream to Long.MAX_VALUE elements:
Implementation Note:
This method is implemented to be equivalent to ints(Long.MAX_VALUE, randomNumberOrigin, randomNumberBound).
Of course Long.MAX_VALUE is a very large number, and therefore the returned IntStream can be seen as "effectively" without limit. For example, if you consume 1000000 ints of that stream every second, it will take you about 292471 years to run out of elements.
That said, as mentioned by the other answers, that IntStream only generates as many numbers as are required by its consumer (i.e. the terminal operation that consumes the ints).
Streams do not (in general¹) store all of their elements in any kind of data structure:
No storage. A stream is not a data structure that stores elements; instead, it conveys elements from a source such as a data structure, an array, a generator function, or an I/O channel, through a pipeline of computational operations.
Instead, each stream element is computed one-by-one each time the stream advances. In your example, each random int would actually be computed when you invoke startValue.nextInt().
So when we do e.g. new Random().ints(0,60), the fact that the stream is effectively infinite isn't a problem, because no random ints are actually computed until we perform some action that traverses the stream. Once we do traverse the stream, ints are only computed when we request them.
Here's a small example using Stream.generate (also an infinite stream) which shows this order of operations:
Stream.generate(() -> {
System.out.println("generating...");
return "hello!";
})
.limit(3)
.forEach(elem -> {
System.out.println(elem);
});
The output of that code is:
generating...
hello!
generating...
hello!
generating...
hello!
Notice that our generator supplier is called once just before every call of our forEach consumer, and no more. If we didn't use limit(3), the program could run forever, but it wouldn't run out of memory.
If we did new Random().ints(0,60).forEach(...), it would work the same way. The stream would do random.nextInt(60) once before every call to the forEach consumer. The elements wouldn't be accumulated anywhere unless we used some action that required it, such as distinct() or a collector instead of forEach.
¹ Some streams probably use a data structure behind the scenes for temporary storage. For example, it's common to use a stack during tree traversal algorithms. Also, some streams such as those created using a Stream.Builder will require a data structure to put their elements in.
As said by @Kayaman in his answer, the stream is infinite in the sense that numbers can be generated forever. The point lies in the word can. It only generates numbers if you actually request them. It will not just generate X numbers up front and store them somewhere (unless you tell it to do so).
So if you want to generate n random numbers (where n is an integer), you can just call the sized overload of ints(0, 60), namely ints(n, 0, 60), on your Random instance:
new Random().ints(n, 0, 60)
The above still does not generate n random numbers, because it is an IntStream, which is lazily evaluated. Without a terminal operation (e.g. collect() or forEach()) nothing really happens.
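A hedged sketch of that laziness (n = 5 is just an example value): building the stream does nothing, and the five values are only generated when the terminal operation runs:
IntStream pending = new Random().ints(5, 0, 60); // nothing is generated yet
int[] values = pending.toArray();                // the five values are generated here
System.out.println(Arrays.toString(values));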
Creating a generator does not generate any numbers. Conceptually, this generator will continually generate new numbers; there is no point at which it would fail to return the next value when asked.

Sorted stream derived from Infinite Stream fails to iterate

import java.util.stream.*;
import java.util.*;

class TestInfiniteStream {
    public static void main(String args[]) {
        IntStream infiniteStream = new Random().ints();
        IntStream sortedStream = infiniteStream.sorted();
        sortedStream.forEach(i -> System.out.println(i));
    }
}
After compiling and executing this code I get the following error.
Exception in thread "main" java.lang.IllegalArgumentException: Stream size exceeds max array size
Does sorting a stream fail on an infinite stream?
The simple answer to “Does sorting a stream fail on an infinite stream?” is “Yes.” sorted() is a stateful intermediate operation which has been implemented by buffering the entire contents and sorting it, before passing any elements to the downstream operations.
In theory, it doesn't need to be that way. Since you are using forEach, which has been explicitly specified as processing the elements in an undefined order, the sorting step could be omitted in your new Random().ints().sorted().forEach(System.out::println); use case. But even if you used forEachOrdered, there is a theoretically achievable correct answer. Since your stream is infinite and will repeatedly contain all int values, a correct sorted output would print -2147483648 (== Integer.MIN_VALUE) forever, as that's the smallest value, contained infinitely many times in that stream.
However, to give this correct answer, the implementation would need specific code to handle this scenario, which is not of much practical value. Instead, the implementation handles this case like any other stream-sorting scenario, which will fail for infinite streams.
In this specific case, the stream has an optimization that leads to a different, unusual exception message. As Eugene pointed out, this stream behaves like a fixed-size stream of Long.MAX_VALUE (== 2⁶³ − 1) elements rather than a truly infinite stream. That's fair, considering that the stream produced by Random will repeat after 2⁴⁸ values, so the entire sequence has been repeated 32768 times before the stream ends instead of running forever. You are unlikely to witness this "sudden" ending after processing 9223372036854775807 elements anyway. But a consequence of this optimization is that the stream will fail fast with the "Stream size exceeds max array size" message instead of failing with an OutOfMemoryError after some processing.
If you eliminate the size information, e.g. via
new Random().ints().filter(x -> true).sorted().forEach(System.out::println);
the operation will try to buffer until failing with java.lang.OutOfMemoryError. The same happens with
IntStream.generate(new Random()::nextInt).sorted().forEach(System.out::println);
which provides no size information to the stream in the first place. In either case, it never gets to sort anything as the buffering happens before the sorting starts.
If you want to get “sorted runs for some limit of elements” as you said in a comment, you have to apply a limit before sorting, e.g.
new Random().ints().limit(100).sorted().forEach(System.out::println);
though it will be more efficient to still use a sized stream, e.g.
new Random().ints(100).sorted().forEach(System.out::println);
No, you cannot sort an infinite stream.
Your infinite stream new Random().ints() produces more integers than can be stored in the array that is used behind the scenes to hold the integers being sorted. An array of course cannot hold an infinite number of integers; its capacity is limited to something close to Integer.MAX_VALUE elements.
Taking a step back, how would anyone or anything sort an infinite number of integers? Nothing can and no one can; it would take at least an infinite amount of time.
Does sorting a stream fail on an infinite stream?
You've kind of answered your own question; the IllegalArgumentException is the specific cause of the failure. It only occurred because you made a program that attempted to do it, and you ran up against a Java array limitation.
The sorted method will attempt to read the entire stream before sorting anything. It won't do any intermediate sorting before it has read the entire stream, so no sorting will be done, partial or full.
Well, that is an interesting question; unfortunately the key clarification is in the comments. First of all, there isn't really an "infinite" stream in your case; the responsible spliterator, RandomIntsSpliterator, even says so:
... and also by treating "infinite" as equivalent to Long.MAX_VALUE...
And now the interesting part: that spliterator will report these characteristics:
public int characteristics() {
    return (Spliterator.SIZED | Spliterator.SUBSIZED |
            Spliterator.NONNULL | Spliterator.IMMUTABLE);
}
Well, I don't know the reasons why, but reporting SIZED or SUBSIZED for an "infinite" stream... Maybe it does not matter either way (because you usually chain these with a limit).
Because SIZED is reported (with a size of Long.MAX_VALUE), there is an internal sink for this, SizedIntSortingSink, which has a check:
if (size >= Nodes.MAX_ARRAY_SIZE)
throw new IllegalArgumentException(Nodes.BAD_SIZE);
which obviously will fail.
In contrast, IntStream.generate does not report SIZED, which makes sense to me; so the entire input has to be buffered by sorted and later handed to the terminal operation, and this will obviously fail with an OutOfMemoryError.
It's also interesting to prove that distinct will not act as a full barrier here, waiting for all values to be processed. Instead it can pass an element to the terminal operation once it knows that it has not been seen before:
Random r = new Random();
IntStream.generate(() -> r.nextInt())
    .distinct()
    .forEach(System.out::println);
This will run for quite a while before finally dying with an OutOfMemoryError. Or, as very nicely added by Holger in the comments, it could hang too:
IntStream.generate(() -> new Random().nextInt(2))
    .distinct()
    .forEach(System.out::println);

Prioritize stream filter functions in Java

I am looking for a way to filter in streams, but using priorities.
The following is the pseudo code:
results.stream().filter(prio1).ifNotFound(filter(prio2)).collect(toList())
The list of results shall be filtered by the first criterion, "prio1", and if no match is found, the second filter shall be applied to try the second criterion, "prio2"; then the results shall be collected.
How do I achieve this in Java 8 using streams?
I am looking for a one-liner in stream.
You will need to stream() your results twice, but the following should work as a one-liner:
results.stream().filter(results.stream().anyMatch(prio1) ? prio1 : prio2).collect(Collectors.toList());
(Credit to flakes for first publishing a multiple-liner using a similar strategy.)
Edit: Since some excellent new answers have come to light, I thought I would offer a short defense of this multiple-stream / anyMatch strategy making reference to certain other parts of this thread:
As pointed out by eckes, anyMatch is optimized to return early and thus minimal time is spent reading the extra stream (especially for the case where prio1 is likely to match). In fact, anyMatch will only read the whole stream in the fallback (prio2) case, so for the average run you are only iterating through one-and-a-fraction list lengths.
Using the Collectors.groupingBy(...) method constructs a Map and two Lists in every case, while the approach above only creates at most a single List. The difference in memory overhead here will become quite significant as the size of results increases. The grouping is done for the entire stream, so even if the very first element happens to pass prio1, every element has to be checked against prio1.or(prio2) and then against prio1 once more.
groupingBy does not account for the case where prio1 and prio2 are not mutually exclusive. If prio2.test(e) can return true for some e which passes prio1, such elements will be missing within the fallback prio2 list. Using anyMatch and one filter at a time avoids this problem.
The line length and complexity of the above method seems far more manageable to me.
Just another approach that does not use anyMatch, but rather groups the entries before operating on the results.
Optional.of(results.stream()
        .filter(prio1.or(prio2))
        .collect(Collectors.groupingBy(prio1::test)))
    .map(map -> map.getOrDefault(true, map.get(false)))
    .ifPresent(System.out::println);
I used Optional so that you have a "one-liner" (just formatted here so that it is more readable). Instead of ifPresent you could also just use orElseGet(Collections::emptyList) and save the result into a List<String>.
The groupingBy puts all prio1-matching entries (out of the prio1-or-prio2-filtered entries) under the key true and the remaining prio2-matching entries under false. If there are no entries under true, then the prio2-filtered entries are returned as the default. If there are no prio1- or prio2-matching results at all, nothing happens.
Note that if you return the Map directly then you only have all prio2-matching entries in false if your filters are mutually exclusive.
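To make the true/false grouping concrete, here is a small hedged sketch with made-up predicates and data (prio1, prio2 and results below are purely illustrative):
Predicate<String> prio1 = s -> s.startsWith("a");
Predicate<String> prio2 = s -> s.length() >= 5;
List<String> results = Arrays.asList("apple", "melon", "fig");

Map<Boolean, List<String>> grouped = results.stream()
        .filter(prio1.or(prio2))            // "fig" matches neither and is dropped
        .collect(Collectors.groupingBy(prio1::test));
System.out.println(grouped); // {false=[melon], true=[apple]} (key order may vary)
Applying the map.getOrDefault(true, map.get(false)) step from the answer above to this map would yield [apple].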
Just make a condition:
final List<Foo> foo;
if (results.stream().anyMatch(prio1)) {
    foo = results.stream().filter(prio1).collect(Collectors.toList());
} else {
    foo = results.stream().filter(prio2).collect(Collectors.toList());
}
If you really want a one liner then you can do the following, but there's no way to get around streaming the list twice. I would argue that the if/else version is cleaner and easier to maintain.
final List<Foo> foo = results.stream()
    .filter(results.stream().anyMatch(prio1) ? prio1 : prio2)
    .collect(Collectors.toList());

When should I use IntStream.range in Java?

I would like to know when I can use IntStream.range effectively. I have three reasons why I am not sure how useful IntStream.range is.
(Please think of start and end as integers.)
If I want an array, [start, start+1, ..., end-2, end-1], the code below is much faster.
int[] arr = new int[end - start];
int index = 0;
for (int i = start; i < end; i++)
    arr[index++] = i;
This is probably because toArray() in IntStream.range(start, end).toArray() is very slow.
I use MersenneTwister to shuffle arrays. (I downloaded MersenneTwister class online.) I do not think there is a way to shuffle IntStream using MersenneTwister.
I do not think just getting int numbers from start to end-1 is useful. I can use for(int i = start; i < end; i++), which seems easier and not slow.
Could you tell me when I should choose IntStream.range?
There are several uses for IntStream.range.
One is to use the int values themselves:
IntStream.range(start, end).filter(i -> isPrime(i))....
Another is to do something N times:
IntStream.range(0, N).forEach(this::doSomething);
Your case (1) is to create an array filled with a range:
int[] arr = IntStream.range(start, end).toArray();
You say this is "very slow" but, like other respondents, I suspect your benchmark methodology. For small arrays there is indeed more overhead with stream setup, but this should be so small as to be unnoticeable. For large arrays the overhead should be negligible, as filling a large array is dominated by memory bandwidth.
Sometimes you need to fill an existing array. You can do that this way:
int[] arr = new int[end - start];
IntStream.range(0, end - start).forEach(i -> arr[i] = i + start);
There's a utility method Arrays.setAll that can do this even more concisely:
int[] arr = new int[end - start];
Arrays.setAll(arr, i -> i + start);
There is also Arrays.parallelSetAll, which can fill an existing array in parallel. Internally, it simply uses an IntStream and calls parallel() on it. This should provide a speedup for large arrays on a multicore system.
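A minimal sketch of that variant, using the same start and end variables as above:
int[] arr = new int[end - start];
Arrays.parallelSetAll(arr, i -> i + start); // fills the array in parallel via the common fork-join pool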
I've found that a fair number of my answers on Stack Overflow involve using IntStream.range. You can search for them using these search criteria in the search box:
user:1441122 IntStream.range
One application of IntStream.range I find particularly useful is to operate on elements of an array, where the array indexes as well as the array's values participate in the computation. There's a whole class of problems like this.
For example, suppose you want to find the locations of increasing runs of numbers within an array. The result is an array of indexes into the first array, where each index points to the start of a run.
To compute this, observe that a run starts at a location where the value is less than the previous value. (A run also starts at location 0). Thus:
int[] arr = { 1, 3, 5, 7, 9, 2, 4, 6, 3, 5, 0 };
int[] runs = IntStream.range(0, arr.length)
    .filter(i -> i == 0 || arr[i - 1] > arr[i])
    .toArray();
System.out.println(Arrays.toString(runs));
[0, 5, 8, 10]
Of course, you could do this with a for-loop, but I find that using IntStream is preferable in many cases. For example, it's easy to store an unknown number of results into an array using toArray(), whereas with a for-loop you have to handle copying and resizing, which distracts from the core logic of the loop.
Finally, it's much easier to run IntStream.range computations in parallel.
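As a minimal sketch (isPrime here is a hypothetical helper, as in the filter example above, and not part of the JDK), adding parallel() is all it takes to spread a range computation across cores:
long primes = IntStream.range(2, 1_000_000)
        .parallel()                 // splits the range across the common fork-join pool
        .filter(i -> isPrime(i))    // isPrime is assumed to exist
        .count();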
Here's an example:
public class Test {
    public static void main(String[] args) {
        System.out.println(sum(LongStream.of(40, 2)));              // call A
        System.out.println(sum(LongStream.range(1, 100_000_000)));  // call B
    }

    public static long sum(LongStream in) {
        return in.sum();
    }
}
So, let's look at what sum() does: it counts the sum of an arbitrary stream of numbers. We call it in two different ways: once with an explicit list of numbers, and once with a range.
If you only had call A, you might be tempted to put the two numbers into an array and pass it to sum() but that's clearly not an option with call B (you'd run out of memory). Likewise you could just pass the start and end for call B, but then you couldn't support the case of call A.
So to sum it up, ranges are useful here because:
We need to pass them around between methods
The target method doesn't just work on ranges but any stream of numbers
But it only operates on individual numbers of the stream, reading them sequentially. (This is why shuffling with streams is a terrible idea in general.)
There is also the readability argument: code using streams can be much more concise than loops, and thus more readable, but I wanted to show an example where a solution relying on IntStreams is functionally superior too.
I used LongStream to emphasise the point, but the same goes for IntStream.
And yes, for simple summing this may look like a bit of overkill, but consider, for example, reservoir sampling.
IntStream.range returns a range of integers as a stream so you can do stream processing over it.
For example, squaring each element:
int[] squares = IntStream.range(1, 10).map(i -> i * i).toArray();
Here are a few differences between IntStream.range and traditional for loops that come to mind:
IntStreams are lazily evaluated; the pipeline is only traversed when a terminal operation is called. For loops evaluate eagerly at each iteration.
IntStream provides functions that are commonly applied to a range of ints, such as sum and average.
IntStream allows you to code multiple operations over a range of ints in a functional way, which reads more fluently, especially if you have a lot of operations.
So basically use IntStream when one or more of these differences are useful to you.
But please bear in mind that shuffling a Stream sounds quite strange, as a Stream is not a data structure, so it does not really make sense to shuffle it (in case you were planning on building a special IntSupplier). Shuffle the result instead.
As for performance, while there may be some overhead, you will still iterate N times in both cases, so it should not really matter.
Basically, if you want Stream operations, you can use the range() method, for example to use parallelism, map() or reduce(). Then you are better off with IntStream.
For example:
IntStream.range(1, 5).parallel().forEach(i -> heavyOperation());
Or:
IntStream.range(1, 5).reduce(1, (x, y) -> x * y)
// > 24
You can achieve the second example also with a for-loop, but you need intermediate variables etc.
Also, if you want the first match for example, you can use findFirst() and cousins to stop consuming the rest of the Stream
It totally depends on the use case. However, the stream API adds a lot of easy one-liners which can definitely replace conventional loops.
IntStream is really helpful syntactic sugar in some cases:
IntStream.range(1, 101).sum();
IntStream.range(1, 101).average();
IntStream.range(1, 101).filter(i -> i % 2 == 0).count();
//... and so on
Whatever you can do with IntStream you can also do with conventional loops, but the one-liner is often easier to understand and maintain.
Still, for descending loops we cannot use IntStream#range directly; it only works with a positive increment. So the following is not directly possible (see the workaround sketch after the loop):
for (int i = 100; i > 1; i--) {
    // Negative loop
}
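If you do need a descending sequence from a stream, one commonly used workaround (a sketch, not something the answer above claimed) is to map an ascending range onto descending values:
// counts 100, 99, ..., 2, mirroring the loop above
IntStream.range(2, 101)
        .map(i -> 102 - i)
        .forEach(System.out::println);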
Case 1: Yes, the conventional loop is faster in this case, as toArray() has some overhead.
Case 2: I don't know anything about it, my apologies.
Case 3: IntStream is not slow at all; IntStream.range and a conventional loop are almost the same in terms of performance.
See :
Java 8 nested loops with streams & performance
You could implement your Mersenne Twister as an Iterator and stream from that.
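A hedged sketch of that idea, assuming the downloaded MersenneTwister class exposes a nextInt() method (both the class and the method name are assumptions here), using java.util.Spliterators and java.util.stream.StreamSupport:
PrimitiveIterator.OfInt mt = new PrimitiveIterator.OfInt() {
    private final MersenneTwister twister = new MersenneTwister(); // hypothetical downloaded class
    public boolean hasNext() { return true; }            // endless source of values
    public int nextInt() { return twister.nextInt(); }   // one value per request
};
IntStream mtStream = StreamSupport.intStream(
        Spliterators.spliteratorUnknownSize(mt, Spliterator.NONNULL), false);
mtStream.limit(10).forEach(System.out::println);
If the iterator detour is not needed, IntStream.generate(twister::nextInt) would be an even shorter way to get an effectively infinite stream from the same generator.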
