How is Stream more efficient? - java

I am trying to digest the Stream package and it seems very difficult for me to understand.
I was reading the Stream package documentation, and at one point I tried to implement it to learn by doing. This is the text I have read:
Intermediate operations return a new stream. They are always lazy;
executing an intermediate operation such as filter() does not actually
perform any filtering, but instead creates a new stream that, when
traversed, contains the elements of the initial stream that match the
given predicate. Traversal of the pipeline source does not begin until
the terminal operation of the pipeline is executed.
I understand this much: they provide a new stream. So my first question is: is creating a stream, without traversing it, a heavy operation?
Now, intermediate operations are lazy and terminal operations are eager, and streams are meant to be more efficient, and more readable, than the old if-else style of programming. The documentation continues:
Processing streams lazily allows for significant efficiencies; in a
pipeline such as the filter-map-sum example above, filtering, mapping,
and summing can be fused into a single pass on the data, with minimal
intermediate state. Laziness also allows avoiding examining all the
data when it is not necessary; for operations such as "find the first
string longer than 1000 characters", it is only necessary to examine
just enough strings to find one that has the desired characteristics
without examining all of the strings available from the source. (This
behavior becomes even more important when the input stream is infinite
and not merely large.)
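To make the quoted behavior concrete, here is a minimal sketch (not from the original question; the data and names are illustrative) that uses peek() to trace which elements are actually examined:
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class LazinessDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc", "dddd");

        // Building the pipeline prints nothing: filter() and map() are lazy.
        Stream<Integer> lengths = words.stream()
                .peek(w -> System.out.println("examining: " + w))
                .filter(w -> w.length() >= 2)
                .map(String::length);

        System.out.println("pipeline built, nothing traversed yet");

        // findFirst() short-circuits: only "a" and "bb" are ever examined.
        System.out.println("first match: " + lengths.findFirst().get());
    }
}
Running this prints "pipeline built, nothing traversed yet" first, confirming that no element flows through the pipeline until the terminal operation runs.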
To demonstrate this laziness, I implemented a small program to understand the concept. Here is the program:
List<String> stringList = new ArrayList<>();
for (int i = 0; i < 10000; i++) {
    stringList.add("String" + i);
}

long start = System.currentTimeMillis();
Stream<String> stream = stringList.stream().filter(s -> s.contains("99"));
long midEnd = System.currentTimeMillis();
System.out.println("Time in millis before applying terminal operation: " + (midEnd - start));

System.out.println(stream.findFirst().get());
long end = System.currentTimeMillis();
System.out.println("Whole time in millis: " + (end - start));
System.out.println("Time in millis for Terminal operation: " + (end - midEnd));

start = System.currentTimeMillis();
for (String ss1 : stringList) {
    if (ss1.contains("99")) {
        System.out.println(ss1);
        break;
    }
}
end = System.currentTimeMillis();
System.out.println("Time in millis with old standard: " + (end - start));
I have executed this program many times, and each time it has shown me that creating a new stream via intermediate operations is the heavy task; terminal operations take very little time in comparison.
And overall, the old if-else pattern is way more efficient than streams. So, again, more questions:
Did I misunderstand something?
If I understood correctly, why and when should streams be used?
If I am doing or understanding anything wrong, can you please help clarify my understanding of the java.util.stream package?
Actual Numbers:
Try 1:
Time in millis before applying terminal operation: 73
String99
Whole time in millis: 76
Time in millis for Terminal operation: 3
String99
Time in millis with old standard: 0
Try 2:
Time in millis before applying terminal operation: 56
String99
Whole time in millis: 59
Time in millis for Terminal operation: 3
String99
Time in millis with old standard: 0
Try 3:
Time in millis before applying terminal operation: 69
String99
Whole time in millis: 72
Time in millis for Terminal operation: 3
String99
Time in millis with old standard: 0
These are my machine details, if this helps:
Memory: 11.6 GiB
Processor: Intel® Core™ i7-3632QM CPU @ 2.20GHz × 8
OS-Type: 64-bit

One of the rationales for the Stream API is that it eliminates the inherent assumption of the for loop: that all iteration happens in the same way. When you use an iterator-based for loop, you are hard-coding the iteration logic to always iterate sequentially. Consider the question, "what if I wanted to replace the implementation of the for loop with something more efficient?"
The Stream API addresses that--it abstracts the notion of iteration and allows other ways of processing multiple data points to be considered -- iterate serially vs. in parallel, add optimizations if it is known that the data is unordered, etc.
Consider your example--although you can't change the implementation of the for loop, you can change the implementation of the Stream to suit different situations. For example, if you have more CPU-intensive operations to do on each element, you might choose a parallel Stream. Here's an example in which sleep delays simulate more complex processing, done in parallel, with very different results:
List<String> stringList = new ArrayList<>();
for (int i = 0; i < 10000; i++) {
    stringList.add("String" + i);
}

long start = System.currentTimeMillis();
Stream<String> stream = stringList.parallelStream().filter(s -> {
    try {
        Thread.sleep(10);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    return s.contains("99");
});
long midEnd = System.currentTimeMillis();
System.out.println("Time in millis before applying terminal operation: " + (midEnd - start));

System.out.println(stream.findAny().get());
long end = System.currentTimeMillis();
System.out.println("Whole time in millis: " + (end - start));
System.out.println("Time in millis for Terminal operation: " + (end - midEnd));

start = System.currentTimeMillis();
for (String ss1 : stringList) {
    try {
        Thread.sleep(20);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    if (ss1.contains("99")) {
        System.out.println(ss1);
        break;
    }
}
end = System.currentTimeMillis();
System.out.println("Time in millis with old standard: " + (end - start));
I kept the same benchmark logic everyone is complaining about, to make it easier for you to compare.
As you can see, there are situations where for loops will always be more efficient than using a Stream, but Streams offer significant advantages in certain situations as well. It would be unwise to extrapolate from one isolated test that one approach is always better than the other--that is an axiom for life as well.

Unless your tests involve JMH, your code is pretty much proof of nothing and, even worse, it will give an ALTERED impression of reality.
assylias made the comment that should make clear what goes wrong.
Your measurements of the "intermediate operation" and then the "short circuit" are also wrong. The intermediate operation, because it is lazy, really does nothing; it only takes place when a terminal one kicks in.
If you have ever worked with Guava, this is how transform/filter is done in their code as well, at least logically.
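For comparison, here is a rough sketch of the Guava version of the same idea (assuming Guava is on the classpath): Iterables.filter returns a lazy view, so the predicate is not evaluated until you iterate.
import com.google.common.base.Predicate;
import com.google.common.collect.Iterables;

import java.util.Arrays;
import java.util.List;

public class GuavaLazyFilterSketch {
    public static void main(String[] args) {
        List<String> strings = Arrays.asList("String1", "String99", "String100");

        // filter() merely wraps the source in a view; nothing is tested here.
        Iterable<String> matches = Iterables.filter(strings, new Predicate<String>() {
            @Override
            public boolean apply(String s) {
                return s.contains("99");
            }
        });

        // The predicate runs only now, during iteration.
        System.out.println(matches.iterator().next());
    }
}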

As others have already noted, your benchmark is flawed. The main problem is that the results are skewed by ignoring compilation time. Try the following:
Stream<String> stream = stringList.stream().filter(s -> s.contains("99"));
long start = System.currentTimeMillis();
stream = stringList.stream().filter(s -> s.contains("99"));
long midEnd = System.currentTimeMillis();
Now the code that backs filter is already compiled and the second call is fast. Even this would work:
Stream<String> stream = stringList.stream().map(s -> s);
long start = System.currentTimeMillis();
stream = stringList.stream().filter(s -> s.contains("99"));
long midEnd = System.currentTimeMillis();
map shares most of the code with filter, so calling filter is fast here, too, because the code is already compiled. And in case you ask: Calling filter or map on a different stream would work too, of course.
Your "old style" code doesn't require additional compilation.

I really don't trust your "benchmark", because too many things can go wrong; you are better off using a framework. But anyway, when people or the docs say streams are more efficient, they don't mean the example you provided.
Streams are a lifted collection (they don't hold data), and as such they are more efficient than eager ones, Scala Lists for instance, where a filter allocates a new List and a map transforms the results into another new List.
Compared with that kind of implementation, Streams win.
But yes, streams allocate objects, which is very cheap on modern JVMs and well looked after by modern GCs.
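In Java terms, the eager-versus-lifted difference looks roughly like this (a sketch, not a benchmark; assumes java.util.stream.Collectors is imported):
// Eager style: every step materializes a full intermediate collection.
List<String> filtered = new ArrayList<>();
for (String s : stringList) {
    if (s.contains("99")) {
        filtered.add(s);          // first pass, first allocation
    }
}
List<Integer> lengths = new ArrayList<>();
for (String s : filtered) {
    lengths.add(s.length());      // second pass, second allocation
}

// Stream style: filter and map are fused into a single pass, and only
// the terminal collect() allocates a result collection.
List<Integer> streamLengths = stringList.stream()
        .filter(s -> s.contains("99"))
        .map(String::length)
        .collect(Collectors.toList());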

Related

How to get the time taken for part of a program in Unix

I'm working on comparing a Binary Search Tree to an AVL one and want to see the usr/sys time for a search operation performed on both. Thing is: I have an application (SearchBST.java/SearchAVL.java) that reads in a file and populates the trees, and then searches them. I want to know if I can check the usr/sys time for just the searching instead of the entire thing (inserting and searching). It seems to me that the insertion is causing the AVL's time (using "time java SearchAVL") to be roughly the same as the BST's.
Should I be doing it differently (such that populating the tree doesn't affect the overall time)? I'll post some code as soon as I can, but I wanted to see if anyone has any thoughts in the meantime.
Why don't you measure the time inside your application?
// Read file to a temporary collection or array
// to prevent meassuring disk performance instead of tree performance
long t = System.nanoTime();
// populate tree
long tPopulate = System.nanoTime() - t;
t = System.nanoTime();
// search tree
long tSearch = System.nanoTime() - t;
System.out.println("tPopulate = " + tPopulate + " ns");
System.out.println("tSearch = " + tSearch + " ns");
This will only print the wall clock time, but since you don't have any Thread.sleep(...) commands or things like that in your program, the wall clock time shouldn't differ much from the user time.
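If you specifically want user/CPU time rather than wall-clock time, the JVM does expose it per thread through ThreadMXBean. A sketch (granularity is platform-dependent, and CPU-time measurement may be unsupported on some JVMs):
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

ThreadMXBean bean = ManagementFactory.getThreadMXBean();
long cpuBefore  = bean.getCurrentThreadCpuTime();   // ns, user + system
long userBefore = bean.getCurrentThreadUserTime();  // ns, user only
// search tree
long cpuSearch  = bean.getCurrentThreadCpuTime() - cpuBefore;
long userSearch = bean.getCurrentThreadUserTime() - userBefore;
System.out.println("search cpu  = " + cpuSearch + " ns");
System.out.println("search user = " + userSearch + " ns");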

Why is parallel stream slower?

I was playing around with infinite streams and made this program for benchmarking. Basically, the bigger the number you provide, the faster it should finish. However, I was amazed to find that using a parallel stream resulted in dramatically worse performance compared to a sequential stream. Intuitively, one would expect an infinite stream of random numbers to be generated and evaluated much faster in a multi-threaded environment, but this appears not to be the case. Why is this?
final int target = Integer.parseInt(args[0]);
if (target <= 0) {
    System.err.println("Target must be between 1 and 2147483647");
    return;
}

final long startTime, endTime;
startTime = System.currentTimeMillis();
System.out.println(
    IntStream.generate(() -> new Double(Math.random() * 2147483647).intValue())
        //.parallel()
        .filter(i -> i <= target)
        .findFirst()
        .getAsInt()
);
endTime = System.currentTimeMillis();
System.out.println("Execution time: " + (endTime - startTime) + " ms");
I totally agree with the other comments and answers, but indeed your test behaves strangely when the target is very low. On my modest laptop the parallel version is on average about 60x slower when very low targets are given. This extreme difference cannot be explained by the overhead of parallelization in the stream APIs, so I was also amazed :-). IMO the culprit lies here:
Math.random()
Internally this call relies on a global instance of java.util.Random. In the documentation of Random it is written:
Instances of java.util.Random are threadsafe. However, the concurrent
use of the same java.util.Random instance across threads may encounter
contention and consequent poor performance. Consider instead using
ThreadLocalRandom in multithreaded designs.
So I think that the really poor performance of the parallel execution compared to the sequential one is explained by the thread contention in random rather than any other overheads. If you use ThreadLocalRandom instead (as recommended in the documentation) then the performance difference will not be so dramatic. Another option would be to implement a more advanced number supplier.
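Concretely, the change the documentation recommends amounts to a one-line swap in the number supplier (a sketch; requires java.util.concurrent.ThreadLocalRandom):
// Contended: all worker threads share the single java.util.Random
// instance that backs Math.random().
IntStream.generate(() -> new Double(Math.random() * 2147483647).intValue())

// Uncontended: each thread draws from its own ThreadLocalRandom.
IntStream.generate(() -> ThreadLocalRandom.current().nextInt(2147483647))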
The cost of passing work to multiple threads is expensive, especially the first time you do it. This cost is fairly fixed, so even if your task is trivial, the overhead is relatively high.
One of the problems you have is that highly inefficient code is a very poor way to determine how well a solution performs. Also, how code runs the first time and how it runs after a few seconds can often differ by 100x (it can be much more). I suggest starting from an example which is already optimal, and only then attempting to use multiple threads.
e.g.
long start = System.nanoTime();
int value = (int) (Math.random() * (target + 1L));
long time = System.nanoTime() - start;
// don't time IO as it is sooo much slower
System.out.println(value);
Note: this will not be efficient until the code has warmed up and been compiled, i.e. ignore the first 2-5 seconds this code is run.
Following suggestions from various answers, I think I've fixed it. I'm not sure what the exact bottleneck was but on an i5-4590T the parallel version with the following code performs faster than the sequential variant. For brevity, I've included only the relevant parts of the (refactored) code:
static IntStream getComputation() {
    return IntStream
        .generate(() -> ThreadLocalRandom.current().nextInt(2147483647));
}

static void computeSequential(int target) {
    for (int loop = 0; loop < target; loop++) {
        final int result = getComputation()
            .filter(i -> i <= target)
            .findAny()
            .getAsInt();
        System.out.println(result);
    }
}

static void computeParallel(int target) {
    IntStream.range(0, target)
        .parallel()
        .forEach(loop -> {
            final int result = getComputation()
                .parallel()
                .filter(i -> i <= target)
                .findAny()
                .getAsInt();
            System.out.println(result);
        });
}
EDIT: I should also note that I put it all in a loop to get longer running times.

Java : Issue with capturing execution time per iteration in a Map

I have a requirement to capture the execution time of some code in iterations. I've decided to use a Map<Integer, Long> for capturing this data, where Integer (the key) is the iteration number and Long (the value) is the time consumed by that iteration in milliseconds.
I've written the below Java code to compute the time taken for each iteration. I want to ensure that the time taken by all iterations is zero before invoking the actual code. Surprisingly, the below code behaves differently on every execution.
Sometimes I get the desired output (zero milliseconds for all iterations), but at times I get positive and even negative values for some random iterations.
I've tried replacing System.currentTimeMillis() with the following:
new java.util.Date().getTime();
System.nanoTime();
org.apache.commons.lang.time.StopWatch
but still no luck.
Any suggestions as why some iterations take additional time and how to eliminate it?
package com.stackoverflow.programmer;

import java.util.HashMap;
import java.util.Map;

public class TestTimeConsumption {

    public static void main(String[] args) {
        Integer totalIterations = 100000;
        Integer nonZeroMilliSecondsCounter = 0;
        Map<Integer, Long> timeTakenMap = new HashMap<>();
        for (Integer iteration = 1; iteration <= totalIterations; iteration++) {
            timeTakenMap.put(iteration, getTimeConsumed(iteration));
            if (timeTakenMap.get(iteration) != 0) {
                nonZeroMilliSecondsCounter++;
                System.out.format("Iteration %6d has taken %d millisecond(s).\n", iteration,
                        timeTakenMap.get(iteration));
            }
        }
        System.out.format("Total non zero entries : %d", nonZeroMilliSecondsCounter);
    }

    private static Long getTimeConsumed(Integer iteration) {
        long startTime = System.currentTimeMillis();
        // Execute code for which execution time needs to be captured
        long endTime = System.currentTimeMillis();
        return (endTime - startTime);
    }
}
Here's the sample output from 5 different executions of the same code:
Execution #1 (NOT OK)
Iteration 42970 has taken 1 millisecond(s).
Total non zero entries : 1
Execution #2 (OK)
Total non zero entries : 0
Execution #3 (OK)
Total non zero entries : 0
Execution #4 (NOT OK)
Iteration 65769 has taken -1 millisecond(s).
Total non zero entries : 1
Execution #5 (NOT OK)
Iteration 424 has taken 1 millisecond(s).
Iteration 33053 has taken 1 millisecond(s).
Iteration 76755 has taken -1 millisecond(s).
Total non zero entries : 3
I am looking for a Java based solution that ensures that all
iterations consume zero milliseconds consistently. I prefer to
accomplish this using pure Java code without using a profiler.
Note: I was also able to accomplish this through C code.
Your HashMap performance may be dropping if it is resizing. The default capacity is 16, which you are exceeding. If you know the expected capacity up front, create the HashMap with the appropriate size, taking into account the default load factor of 0.75.
If you rerun iterations without defining a new map and the Integer key does not start again from zero, you will need to size the map taking into account the total of all possible iterations:
int capacity = (int) ((100000/0.75)+1);
Map<Integer, Long> timeTakenMap = new HashMap<>(capacity);
As you are starting to learn here, writing microbenchmarks in Java is not as easy as one would first assume. Everybody gets bitten at some point, even the hardened performance experts who have been doing it for years.
A lot is going on within the JVM and the OS that skews the results, such as GC, hotspot on the fly optimisations, recompilations, clock corrections, thread contention/scheduling, memory contention and cache misses. To name just a few. And sadly these skews are not consistent, and they can very easily dominate a microbenchmark.
To answer your immediate question of why the timings can sometimes go negative: it is because currentTimeMillis is designed to capture wall-clock time and not elapsed time. No wall clock is accurate on a computer, and there are times when the clock will be adjusted, very possibly backwards. More detail on Java's clocks can be read in the Oracle blog post Inside the Oracle Hotspot VM clocks.
Further details and support of nanoTime versus currentTimeMillis can be read here.
Before continuing with your own benchmark, I strongly recommend that you read how to write a correct micro-benchmark in Java. The quick synopsis is: 1) warm up the JVM before taking results, 2) jump through hoops to avoid dead-code elimination, 3) ensure that nothing else is running on the same machine, but accept that there will be thread scheduling going on (you may even want to pin threads to cores, depending on how far you want to take this), and 4) use a framework specifically designed for microbenchmarking, such as JMH, or for quick lightweight spikes JUnitMosaic gives good results.
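For orientation, the minimal shape of a JMH benchmark looks roughly like this (the annotations are real JMH API; the measured body is a placeholder to adapt):
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class IterationBenchmark {

    @Benchmark
    public long measureIteration() {
        // Replace with the code whose execution time you want to capture.
        // Returning the result prevents dead-code elimination.
        return System.currentTimeMillis();
    }
}
JMH then takes care of warm-up iterations, forking and result aggregation for you.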
I'm not sure if I understand your question.
You're trying to execute a certain set of statements S, and expect the execution time to be zero. You then test this premise by executing it a number of times and verifying the result.
That is a strange expectation to have: anything consumes some time, possibly quite a lot. Hence, although it would be possible for the test to pass, that does not prove that no time has been used, since your program is save_time(); execute(S); compare_time(). Even if execute(S) is nothing, your timing is discrete, and as such it is possible that the 'tick' of your wall clock happens to fall between save_time and compare_time, leading to some time having visibly passed.
As such, I'd expect your C program to behave exactly the same. Have you run it multiple times? What happens when you increase the iterations to over a million? If it still does not occur, then apparently your C compiler has optimized the code in such a way that no time is measured, and apparently Java doesn't.
Or am I understanding you wrong?
You hinted at it right: System.currentTimeMillis() is the way to go in this case.
There is no guarantee on any system that incrementing the integer i corresponds to either a millisecond or a CPU cycle...
You should take System.currentTimeMillis() and calculate the elapsed time.
Example:
public static void main(String[] args) {
    long lapsedTime = System.currentTimeMillis();
    doFoo();
    lapsedTime -= System.currentTimeMillis();
    System.out.println("Time:" + -lapsedTime);
}
I am also not sure I understand exactly: you're trying to execute certain code and get the execution time for each iteration.
If I understand correctly, then I would suggest using System.nanoTime() instead of System.currentTimeMillis(), because if your block of statements is small enough you will always get zero milliseconds.
A simple example could be:
public static void main(String[] args) {
    long lapsedTime = System.nanoTime();
    // do your stuff here.
    lapsedTime -= System.nanoTime();
    System.out.println("Time Taken" + -lapsedTime);
}
Otherwise, there is not much difference between System.nanoTime() and System.currentTimeMillis(); it is just a question of how accurate a result you need, and with millisecond resolution you may get zero if the set of statements in each iteration is small.

Java - System.out effect on performance

I've seen this question and it's somewhat similar. I would like to know if it really is a big factor that would affect the performance of my application. Here's my scenario.
I have a Java webapp that can upload thousands of rows of data from a spreadsheet, which is read row by row from top to bottom. I'm using System.out.println() to show, on the server side, which line the application is currently reading.
- I'm aware of creating a log file. In fact, I'm creating a log file and, at the same time, displaying the logs on the server's prompt.
Is there any other way of printing the current data on the prompt?
I was recently testing with (reading and) writing large (1-1.5 GB) text files, and I found out that:
PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(new FileOutputStream(java.io.FileDescriptor.out), "UTF-8"), 512));
out.println(yourString);
//...
out.flush();
is in fact almost 250% faster than
System.out.println(yourString);
My test program first read about 1 GB of data, processed it a bit, and output it in a slightly different format.
Test results (on Macbook Pro, with SSD reading&writing using same disk):
data-output-to-system-out > output.txt => 1min32sec
data-written-to-file-in-java => 37sec
data-written-to-buffered-writer-stdout > output.txt => 36sec
I did try multiple buffer sizes between 256 and 10k, but that didn't seem to matter.
So keep in mind if you're creating unix command-line tools with Java where output is meant to be directed or piped to somewhere else, don't use System.out directly!
It can have an impact on your application performance. The magnitude will vary depending on the kind of hardware you are running on and the load on the host.
Some points on which this can translate to performance wise:
-> Like Rocket boy stated, println is synchronized, which means you will incur locking overhead on the object header, and it may cause thread bottlenecks depending on your design.
-> Printing to the console requires kernel time; kernel time means the CPU will not be running in user mode, which basically means your CPU will be busy executing kernel code instead of your application code.
-> If you are already logging this, that means extra kernel time for I/O, and if your platform does not support asynchronous I/O this means your CPU might become stalled on busy waits.
You can actually try to benchmark this and verify it yourself.
There are ways to get away with this, for example having really fast I/O, a huge dedicated machine, and biased locking in your JVM options if your application's console printing will not be multithreaded.
Like everything on performance, it all depends on your hardware and priorities.
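One common shape for the asynchronous option is to hand lines off to a single background writer thread, so application threads never block on console I/O. A sketch (in production you would normally use an async appender from a logging framework instead):
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncConsole {
    private static final BlockingQueue<String> QUEUE = new LinkedBlockingQueue<>();

    static {
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    System.out.println(QUEUE.take()); // only this thread pays the I/O cost
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.setDaemon(true);
        writer.start();
    }

    // Callers return immediately instead of blocking on the synchronized println.
    public static void log(String line) {
        QUEUE.offer(line);
    }
}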
System.out.println() is synchronized:
public void println(String x) {
    synchronized (this) {
        print(x);
        newLine();
    }
}
If multiple threads write to it, its performance will suffer.
Yes, it will have a HUGE impact on performance. If you want a quantifiable number, well then there's plenty of software and/or ways of measuring your own code's performance.
System.out.println is very slow compared to most operations. This is because it places more work on the machine than other IO operations (and it is single-threaded).
I suggest you write the output to a file and tail the output of this file. This way, the output will still be slow, but it won't slow down your web service so much.
Here's a very simple program to check the performance of System.out.println and compare it with a multiplication operation (you can use other operations or functions specific to your requirements).
public class Main {

    public static void main(String[] args) throws InterruptedException {
        long tTime = System.nanoTime();
        long a = 123L;
        long b = 234L;
        long c = a * b;
        long uTime = System.nanoTime();
        System.out.println("a * b = " + c + ". Time taken for multiplication = " + (uTime - tTime) + " nano Seconds");
        long vTime = System.nanoTime();
        System.out.println("Time taken to execute Print statement : " + (vTime - uTime) + " nano Seconds");
    }
}
Output depends on your machine and its current state.
Here's what I got on : https://www.onlinegdb.com/online_java_compiler
a * b = 28782. Time taken for multiplication = 330 nano Seconds
Time taken to execute Print statement : 338650 nano Seconds
EDIT :
I have a logger set up on my local machine, so I wanted to give you an idea of the performance difference between System.out.println and logger.info, i.e., a comparison between console printing and logging:
// 'logger' is assumed to be an already configured logging-framework instance.
public static void main(String[] args) throws InterruptedException {
    long tTime = System.nanoTime();
    long a = 123L;
    long b = 234L;
    long c = a * b;
    long uTime = System.nanoTime();
    System.out.println("a * b = " + c + ". Time taken for multiplication = " + (uTime - tTime) + " nano Seconds");
    long vTime = System.nanoTime();
    System.out.println("Time taken to execute Print statement : " + (vTime - uTime) + " nano Seconds");
    long wTime = System.nanoTime();
    logger.info("a * b = " + c + ". Time taken for multiplication = " + (uTime - tTime) + " nano Seconds");
    long xTime = System.nanoTime();
    System.out.println("Time taken to execute log statement : " + (xTime - wTime) + " nano Seconds");
}
Here's what I got on my local machine :
a * b = 28782. Time taken for multiplication = 1667 nano Seconds
Time taken to execute Print statement : 34080917 nano Seconds
2022-11-15 11:36:32.734 [] INFO CreditAcSvcImpl uuid: - a * b = 28782. Time taken for multiplication = 1667 nano Seconds
Time taken to execute log statement : 9645083 nano Seconds
Notice that System.out.println takes roughly 24 ms longer than logger.info.

What's wrong with System.nanoTime?

I have a very long string with the pattern </value> at the very end. I am trying to test the performance of some function calls, so I made the following test to find the answer... but I think I might be using nanoTime incorrectly, because the results don't make sense no matter how I swap the order around...
long start, end;
start = System.nanoTime();
StringUtils.indexOf(s, "</value>");
end = System.nanoTime();
System.out.println(end - start);
start = System.nanoTime();
s.indexOf("</value>");
end = System.nanoTime();
System.out.println(end - start);
start = System.nanoTime();
sb.indexOf("</value>");
end = System.nanoTime();
System.out.println(end - start);
I get the following:
163566 // StringUtils
395227 // String
30797 // StringBuilder
165619 // StringBuilder
359639 // String
32850 // StringUtils
No matter which order I swap them around, the numbers will always be somewhat the same... What's the deal here?
From java.sun.com website's FAQ:
Using System.nanoTime() between various points in the code to perform elapsed time measurements should always be accurate.
Also:
http://download.oracle.com/javase/1.5.0/docs/api/java/lang/System.html#nanoTime()
The differences between the two runs are on the order of microseconds, and that is expected. There are many things going on in your machine which make the execution environment never quite the same between two runs of your application. That is why you get that difference.
EDIT: Java API says:
This method provides nanosecond precision, but not necessarily
nanosecond accuracy.
Most likely there are memory initialization issues or other things that happen at JVM startup that are skewing your numbers. You should take a bigger sample to get more accurate numbers. Play around with the order, run it multiple times, etc.
It is more than likely that the methods you are checking use some common code behind the scenes. But the JIT will do its work only after about 10,000 invocations. Hence, this could be why your first two examples always seem slower.
Quick fix: just perform the 3 method calls, on a long enough string, before taking the first measurement.
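Applied to the snippet above, that quick fix looks something like this (a sketch; the iteration count is arbitrary):
// Warm up all three implementations so the JIT has compiled the
// shared code paths before anything is measured.
for (int i = 0; i < 20_000; i++) {
    StringUtils.indexOf(s, "</value>");
    s.indexOf("</value>");
    sb.indexOf("</value>");
}

// Only now take the measurements.
long start = System.nanoTime();
StringUtils.indexOf(s, "</value>");
long end = System.nanoTime();
System.out.println(end - start);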
