Is stream.max() always faster than stream.reduce()? [duplicate] - java

This question already has answers here:
How do I write a correct micro-benchmark in Java?
(11 answers)
Closed 5 years ago.
I have the following code snippet:
// List of persons with name and age
List<Person> persons = new ArrayList<>();
// Adding 10,000 objects
for (int i = 0; i < 10000; i++) {
    Person p = new Person();
    p.setName("Person " + i);
    p.setAge(i);
    persons.add(p);
}
long time1 = System.nanoTime();
System.out.println("Time before stream.reduce() " + time1);
Optional<Person> o1 = persons.stream().reduce(BinaryOperator.maxBy(Comparator.comparingInt(p -> p.getAge())));
long time2 = System.nanoTime();
System.out.println(o1.get() + "\nTime after stream.reduce() " + time2);
System.out.println("**** Rough execution time for stream.reduce() : " + (time2 - time1) + " nano secs");

long time3 = System.nanoTime();
System.out.println("Time before stream.max() " + time3);
Optional<Person> o2 = persons.stream().max((p01, p02) -> p01.getAge() - p02.getAge());
long time4 = System.nanoTime();
System.out.println(o2.get() + "\nTime after stream.max() " + time4);
System.out.println("**** Rough execution time for stream.max() : " + (time4 - time3) + " nano secs");
While this may not be the ideal way to measure execution time, what I am basically trying to do here is find the oldest Person and print how long that takes using stream.reduce() vs. stream.max().
Output
Time before stream.reduce() 8834253431112
[ Person 9999, 9999]
Time after stream.reduce() 8834346269743
**** Rough execution time for stream.reduce() : 92838631 nano secs
Time before stream.max() 8834346687875
[ Person 9999, 9999]
Time after stream.max() 8834350117000
**** Rough execution time for stream.max() : 3429125 nano secs
P.S. I have run this code multiple times, changing the order of stream.max() and stream.reduce(), and found that stream.reduce() takes significantly more time to produce the output than stream.max().
So is stream.max() always faster than stream.reduce()? If yes then, when should we use stream.reduce()?

The ReferencePipeline implementation of max looks as follows:
public final Optional<P_OUT> max(Comparator<? super P_OUT> comparator) {
return reduce(BinaryOperator.maxBy(comparator));
}
So any performance difference that you observe is just an artifact of the approach that you use for measuring the performance.
Or, more clearly: The answer is No, it is not "always faster".
Edit: Just for reference, here is a slightly adjusted version of your code. It runs the test for different numbers of elements, repeatedly. This is still not a real, reliable (micro-)benchmark, but it is more reliable than running the whole thing only once:
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class MaxReducePerformance
{
    public static void main(String[] args)
    {
        for (int n = 500000; n <= 5000000; n += 500000)
        {
            List<Person> persons = new ArrayList<>();
            for (int i = 0; i < n; i++)
            {
                Person p = new Person();
                p.setName("Person " + i);
                p.setAge(i);
                persons.add(p);
            }
            System.out.println("For " + n);

            long time1 = System.nanoTime();
            Optional<Person> o1 = persons.stream().reduce((p01, p02) ->
            {
                if (p01.getAge() < p02.getAge())
                    return p02;
                return p01;
            });
            long time2 = System.nanoTime();
            double d0 = (time2 - time1) / 1e9;
            System.out.println("Reduce: " + d0 + " seconds, " + o1);

            long time3 = System.nanoTime();
            Optional<Person> o2 = persons.stream().max(
                (p01, p02) -> p01.getAge() - p02.getAge());
            long time4 = System.nanoTime();
            double d1 = (time4 - time3) / 1e9;
            System.out.println("Max : " + d1 + " seconds, " + o2);
        }
    }
}

class Person
{
    String name;
    int age;

    void setName(String name)
    {
        this.name = name;
    }

    void setAge(int age)
    {
        this.age = age;
    }

    int getAge()
    {
        return age;
    }
}
The output should show that the durations are basically equal.

Your reduce function evaluates getAge twice in each iteration, which is why the result may be slower, depending on compiler optimizations. Restructure your code and check the result.
Also, Stream.max might benefit from built-in VM optimizations, so prefer built-in functions over hand-rolled equivalents.
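As a sketch of that restructuring (with a minimal stand-in Person class, not the poster's), both variants can be driven by the same Comparator, which removes any difference in how often the age is extracted:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.BinaryOperator;

public class MaxVsReduce {
    // minimal stand-in for the poster's Person class
    static class Person {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        int getAge() { return age; }
    }

    static Optional<Person> viaReduce(List<Person> persons, Comparator<Person> cmp) {
        return persons.stream().reduce(BinaryOperator.maxBy(cmp));
    }

    static Optional<Person> viaMax(List<Person> persons, Comparator<Person> cmp) {
        return persons.stream().max(cmp);
    }

    public static void main(String[] args) {
        List<Person> persons = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) persons.add(new Person("Person " + i, i));
        // one shared comparator: the key extractor is invoked identically
        // in both variants, so the comparison is apples-to-apples
        Comparator<Person> byAge = Comparator.comparingInt(Person::getAge);
        System.out.println(viaReduce(persons, byAge).get().getAge()); // 9999
        System.out.println(viaMax(persons, byAge).get().getAge());    // 9999
    }
}
```

Both calls go through the same reduction internally, so with identical comparators any remaining timing gap is measurement noise.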

Related

Does entity name length impact program performance in case of reflection?

In the case of using reflection, we access entities by names encoded in strings, e.g. m = getMethod("someMethod"). To find the requested entity, a string comparison has to be done. Does that mean the length of the entity name influences performance? If so, how big is the impact?
The answer depends heavily on the Java Virtual Machine you're using. I wrote a test program just to get some numbers for JVM 1.8.0_05 (yes, it's old ;-):
import java.lang.reflect.Method;
public class ReflectionAccessTest {
public final static void main(String[] args) throws Exception {
for (int i = 0; i < 100000; i++) {
// do some "training"
ReflectionTarget.class.getMethod("a", Integer.TYPE, Integer.TYPE);
ReflectionTarget.class.getMethod("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", Integer.TYPE, Integer.TYPE);
ReflectionTarget.class.getMethod("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", Integer.TYPE, Integer.TYPE);
}
Method method = null;
long start;
start = System.currentTimeMillis();
for (int i = 0; i < 10000000; i++) {
// measure lookup of the short-named method
method = ReflectionTarget.class.getMethod("a", Integer.TYPE, Integer.TYPE);
}
System.out.println("Time to get method with short name " + (System.currentTimeMillis() - start) + " ms");
start = System.currentTimeMillis();
for (int i = 0; i < 10000000; i++) {
method.invoke(null, Integer.MAX_VALUE, Integer.MIN_VALUE);
}
System.out.println("Time to execute method with short name " + (System.currentTimeMillis() - start) + " ms");
start = System.currentTimeMillis();
for (int i = 0; i < 10000000; i++) {
// measure lookup of the medium-named method
method = ReflectionTarget.class.getMethod("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", Integer.TYPE, Integer.TYPE);
}
System.out.println("Time to get method with medium name " + (System.currentTimeMillis() - start) + " ms");
start = System.currentTimeMillis();
for (int i = 0; i < 10000000; i++) {
method.invoke(null, Integer.MAX_VALUE, Integer.MIN_VALUE);
}
System.out.println("Time to execute method with medium name " + (System.currentTimeMillis() - start) + " ms");
start = System.currentTimeMillis();
for (int i = 0; i < 10000000; i++) {
// measure lookup of the long-named method
method = ReflectionTarget.class.getMethod("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", Integer.TYPE, Integer.TYPE);
}
System.out.println("Time to get method with long name " + (System.currentTimeMillis() - start) + " ms");
start = System.currentTimeMillis();
for (int i = 0; i < 10000000; i++) {
method.invoke(null, Integer.MAX_VALUE, Integer.MIN_VALUE);
}
System.out.println("Time to execute method with long name " + (System.currentTimeMillis() - start) + " ms");
}
private static class ReflectionTarget {
public static void a(int a, int b) {
// do nothing
}
public static void aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa(int a, int b) {
// do nothing
}
public static void aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa(int a, int b) {
// do nothing
}
}
}
The output is as follows:
Time to get method with short name 1012 ms
Time to execute method with short name 58 ms
Time to get method with medium name 3690 ms
Time to execute method with medium name 177 ms
Time to get method with long name 6279 ms
Time to execute method with long name 180 ms
The times are indeed dependent on the length of the name (that surprised me at first, but on second thought it's obvious, because some kind of equality test has to be done, and that is length-dependent).
But you can also see that the impact is negligible. A call of getMethod takes about 100 nanoseconds for a method whose name is only one character and about 630 nanoseconds for a method with a crazy long name (I haven't counted the number of a's).
If this difference is actually relevant for you, you might try caching the Method objects you retrieve. But depending on how long the called method takes, that might be completely useless unless its execution time is also in the sub-microsecond range.
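A minimal sketch of such a cache, assuming a hypothetical helper class MethodCache (for brevity the cache key here is only class name plus method name; a real cache would also include the parameter types):

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MethodCache {
    // key is "ClassName#methodName"; parameter types are left out of the
    // key for brevity -- a production cache would include them
    private static final Map<String, Method> CACHE = new ConcurrentHashMap<>();

    static Method lookup(Class<?> cls, String name, Class<?>... paramTypes) {
        return CACHE.computeIfAbsent(cls.getName() + "#" + name, k -> {
            try {
                return cls.getMethod(name, paramTypes);
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException(e);
            }
        });
    }

    public static void main(String[] args) {
        // after the first lookup, the name-length-dependent string
        // comparison inside getMethod is no longer on the hot path
        Method m1 = lookup(String.class, "substring", int.class);
        Method m2 = lookup(String.class, "substring", int.class);
        System.out.println(m1 == m2); // true: the second call hits the cache
    }
}
```

Note that getMethod returns a fresh Method copy on every call, so caching also avoids that allocation.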

Same code block takes different time durations to execute

I was trying to compare the execution time of similar blocks.
Sample code and output are below:
public class Tester {
    public static void main(String[] args) {
        System.out.println("Run 1");
        List<Integer> list = new ArrayList<>();
        int i = 0;
        long st = System.currentTimeMillis();
        while (++i < 10000) {
            list.add(i);
        }
        System.out.println("Time taken :" + (System.currentTimeMillis() - st));

        System.out.println("Run 2");
        int j = 0;
        List<Integer> list2 = new ArrayList<>();
        long ST = System.currentTimeMillis();
        while (++j < 10000) {
            list2.add(j);
        }
        System.out.println("Time taken :" + (System.currentTimeMillis() - ST));

        System.out.println("Run 3");
        int k = 0;
        List<Integer> list3 = new ArrayList<>();
        long ST2 = System.currentTimeMillis();
        while (++k < 10000) {
            list3.add(k);
        }
        System.out.println("Time taken :" + (System.currentTimeMillis() - ST2));
    }
}
Output
Run 1
Time taken :6
Run 2
Time taken :3
Run 3
Time taken :1
Why am I getting different execution times?
This is probably due to just-in-time compilation and HotSpot optimizing the ArrayList code, but you cannot be 100% sure.
Apart from that, your sample size is much too small to be significant.
a) Since Java code is compiled to bytecode, some optimizations are applied to your code; this may or may not explain your observations.
b) Each subsequent similar operation executes faster until the JVM is "warmed up" for that operation, due to e.g. JVM lazy loading or CPU caching.
c) If you want to benchmark properly, check out the Java Microbenchmark Harness (JMH).
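JMH needs its own build setup, so as a rough stand-in here is a hand-rolled sketch (the helper name WarmupHarness is made up, and this is emphatically not a substitute for JMH): repeat the workload, discard the warm-up iterations, and report the best of the remaining timings.

```java
import java.util.ArrayList;
import java.util.List;

public class WarmupHarness {
    // run the workload `warmup + reps` times, ignore the first `warmup`
    // timings, and return the minimum of the rest (the least-disturbed run)
    static long bestTimeNanos(Runnable workload, int warmup, int reps) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < warmup + reps; r++) {
            long start = System.nanoTime();
            workload.run();
            long elapsed = System.nanoTime() - start;
            if (r >= warmup) best = Math.min(best, elapsed);
        }
        return best;
    }

    public static void main(String[] args) {
        // the same workload as the question: fill an ArrayList
        Runnable fillList = () -> {
            List<Integer> list = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) list.add(i);
        };
        System.out.println("best: " + bestTimeNanos(fillList, 5, 5) + " ns");
    }
}
```

Even this crude harness will make the three "runs" in the question converge, because the JIT has compiled the hot path before the measured iterations start.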

System.nanoTime vs System.currentTimeMillis

According to its documentation, System.nanoTime returns
nanoseconds since some fixed but arbitrary origin time. However, on all x64 machines on which I tried the code below, there were time jumps, moving that fixed origin time around. There may be a flaw in my method of acquiring the correct time via an alternative method (here, currentTimeMillis). However, the main purpose, measuring relative times (durations), is negatively affected, too.
I came across this problem trying to measure latencies when comparing different queues to LMAX's Disruptor where I got very negative latencies sometimes. In those cases, start and end timestamps were created by different threads, but the latency was computed after those threads had finished.
My code here takes time using nanoTime, computes the fixed origin in currentTimeMillis time, and compares that origin between calls. And since I must ask a question here: What is wrong with this code? Why does it observe violations of the fixed origin contract? Or does it not?
import java.text.*;
/**
* test coherency between {@link System#currentTimeMillis()} and {@link System#nanoTime()}
*/
public class TimeCoherencyTest {
static final int MAX_THREADS = Math.max( 1, Runtime.getRuntime().availableProcessors() - 1);
static final long RUNTIME_NS = 1000000000L * 100;
static final long BIG_OFFSET_MS = 2;
static long startNanos;
static long firstNanoOrigin;
static {
initNanos();
}
private static void initNanos() {
long millisBefore = System.currentTimeMillis();
long millisAfter;
do {
startNanos = System.nanoTime();
millisAfter = System.currentTimeMillis();
} while ( millisAfter != millisBefore);
firstNanoOrigin = ( long) ( millisAfter - ( startNanos / 1e6));
}
static NumberFormat lnf = DecimalFormat.getNumberInstance();
static {
lnf.setMaximumFractionDigits( 3);
lnf.setGroupingUsed( true);
};
static class TimeCoherency {
long firstOrigin;
long lastOrigin;
long numMismatchToLast = 0;
long numMismatchToFirst = 0;
long numMismatchToFirstBig = 0;
long numChecks = 0;
public TimeCoherency( long firstNanoOrigin) {
firstOrigin = firstNanoOrigin;
lastOrigin = firstOrigin;
}
}
public static void main( String[] args) {
Thread[] threads = new Thread[ MAX_THREADS];
for ( int i = 0; i < MAX_THREADS; i++) {
final int fi = i;
final TimeCoherency tc = new TimeCoherency( firstNanoOrigin);
threads[ i] = new Thread() {
@Override
public void run() {
long start = getNow( tc);
long firstOrigin = tc.lastOrigin; // get the first origin for this thread
System.out.println( "Thread " + fi + " started at " + lnf.format( start) + " ns");
long nruns = 0;
while ( getNow( tc) < RUNTIME_NS) {
nruns++;
}
final long runTimeNS = getNow( tc) - start;
final long originDrift = tc.lastOrigin - firstOrigin;
nruns += 3; // account for start and end call and the one that ends the loop
final long skipped = nruns - tc.numChecks;
System.out.println( "Thread " + fi + " finished after " + lnf.format( nruns) + " runs in " + lnf.format( runTimeNS) + " ns (" + lnf.format( ( double) runTimeNS / nruns) + " ns/call) with"
+ "\n\t" + lnf.format( tc.numMismatchToFirst) + " different from first origin (" + lnf.format( 100.0 * tc.numMismatchToFirst / nruns) + "%)"
+ "\n\t" + lnf.format( tc.numMismatchToLast) + " jumps from last origin (" + lnf.format( 100.0 * tc.numMismatchToLast / nruns) + "%)"
+ "\n\t" + lnf.format( tc.numMismatchToFirstBig) + " different from first origin by more than " + BIG_OFFSET_MS + " ms"
+ " (" + lnf.format( 100.0 * tc.numMismatchToFirstBig / nruns) + "%)"
+ "\n\t" + "total drift: " + lnf.format( originDrift) + " ms, " + lnf.format( skipped) + " skipped (" + lnf.format( 100.0 * skipped / nruns) + " %)");
}};
threads[ i].start();
}
try {
for ( Thread thread : threads) {
thread.join();
}
} catch ( InterruptedException ie) {};
}
public static long getNow( TimeCoherency coherency) {
long millisBefore = System.currentTimeMillis();
long now = System.nanoTime();
if ( coherency != null) {
checkOffset( now, millisBefore, coherency);
}
return now - startNanos;
}
private static void checkOffset( long nanoTime, long millisBefore, TimeCoherency tc) {
long millisAfter = System.currentTimeMillis();
if ( millisBefore != millisAfter) {
// disregard since thread may have slept between calls
return;
}
tc.numChecks++;
long nanoMillis = ( long) ( nanoTime / 1e6);
long nanoOrigin = millisAfter - nanoMillis;
long oldOrigin = tc.lastOrigin;
if ( oldOrigin != nanoOrigin) {
tc.lastOrigin = nanoOrigin;
tc.numMismatchToLast++;
}
if ( tc.firstOrigin != nanoOrigin) {
tc.numMismatchToFirst++;
}
if ( Math.abs( tc.firstOrigin - nanoOrigin) > BIG_OFFSET_MS) {
tc.numMismatchToFirstBig ++;
}
}
}
Now I made some small changes. Basically, I bracket the nanoTime call between two currentTimeMillis calls to see whether the thread has been rescheduled (which should take more than the currentTimeMillis resolution). In that case, I disregard the loop cycle. Actually, if we know that nanoTime is sufficiently fast (as on newer architectures like Ivy Bridge), we can instead bracket currentTimeMillis between nanoTime calls.
Now the long >10 ms jumps are gone. Instead, we count when we get more than 2 ms away from the first origin, per thread. On the machines I have tested, for a runtime of 100 s, there are always close to 200,000 jumps between calls. It is for those cases that I think either currentTimeMillis or nanoTime may be inaccurate.
As has been mentioned, computing a new origin each time means you are subject to error.
// ______ delay _______
// v v
long origin = (long)(System.currentTimeMillis() - System.nanoTime() / 1e6);
// ^
// truncation
If you modify your program so that you also compute the origin difference, you'll find it is very small: about 200 ns on average in my measurements, which is about right for the time delay.
Using multiplication instead of division (which should be OK without overflow for another couple hundred years), you'll also find that the number of computed origins that fail the equality check is much larger, about 99%. If the error were caused by the time delay alone, they would only pass when the delay happens to be identical to the last one.
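To make the truncation concrete, here is a small sketch with fixed example values (not real clock readings), contrasting the ms-resolution origin from the diagram above with a ns-resolution origin computed by multiplication:

```java
public class OriginTruncation {
    public static void main(String[] args) {
        // fixed example readings: 1000 ms wall clock, 1,000,001 ns monotonic
        long millis = 1_000L;
        long nanos = 1_000_001L;

        // origin in ms, as in the question: nanos/1e6 truncates away
        // the sub-millisecond part of the nanoTime reading
        long originMs = (long) (millis - nanos / 1e6);   // 998

        // origin in ns via multiplication: no truncation at all
        long originNs = millis * 1_000_000L - nanos;     // 998_999_999

        // the truncated origin is off by almost a full millisecond
        long errorNs = originMs * 1_000_000L - originNs; // -999_999
        System.out.println(originMs + " " + originNs + " " + errorNs);
    }
}
```

Since the truncation error varies from call to call, two origins computed this way rarely agree exactly even when the underlying clocks are perfectly coherent.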
A much simpler test is to accumulate elapsed time over some number of subsequent calls to nanoTime and see if it checks out with the first and last calls:
public class SimpleTimeCoherencyTest {
public static void main(String[] args) {
final long anchorNanos = System.nanoTime();
long lastNanoTime = System.nanoTime();
long accumulatedNanos = lastNanoTime - anchorNanos;
long numCallsSinceAnchor = 1L;
for(int i = 0; i < 100; i++) {
TestRun testRun = new TestRun(accumulatedNanos, lastNanoTime);
Thread t = new Thread(testRun);
t.start();
try {
t.join();
} catch(InterruptedException ie) {}
lastNanoTime = testRun.lastNanoTime;
accumulatedNanos = testRun.accumulatedNanos;
numCallsSinceAnchor += testRun.numCallsToNanoTime;
}
System.out.println(numCallsSinceAnchor);
System.out.println(accumulatedNanos);
System.out.println(lastNanoTime - anchorNanos);
}
static class TestRun
implements Runnable {
volatile long accumulatedNanos;
volatile long lastNanoTime;
volatile long numCallsToNanoTime;
TestRun(long acc, long last) {
accumulatedNanos = acc;
lastNanoTime = last;
}
@Override
public void run() {
long lastNanos = lastNanoTime;
long currentNanos;
do {
currentNanos = System.nanoTime();
accumulatedNanos += currentNanos - lastNanos;
lastNanos = currentNanos;
numCallsToNanoTime++;
} while(currentNanos - lastNanoTime <= 100000000L);
lastNanoTime = lastNanos;
}
}
}
That test does indicate the origin is the same (or at least the error is zero-mean).
As far as I know, System.currentTimeMillis() does indeed jump sometimes, depending on the underlying OS. I have observed this behaviour myself.
Your code gives me the impression that you try to measure the offset between System.nanoTime() and System.currentTimeMillis() repeatedly. You should instead observe this offset by calling System.currentTimeMillis() only once before concluding that System.nanoTime() sometimes jumps.
By the way, I would not claim that the spec (the javadoc describes System.nanoTime() as relative to some fixed point) is always perfectly implemented. You can look at this discussion, where multi-core CPUs or changes of CPU frequency can negatively affect the required behaviour of System.nanoTime(). But one thing is sure: System.currentTimeMillis() is far more subject to arbitrary jumps.

Testing code to get the average time for the calls

This is my code. I am trying to test the average time it takes to call the getLocationIp method, passing in an ipAddress. I generate random IP addresses, pass each one to getLocationIp, and compute the time difference, then put the differences into a HashMap with their counts. Afterwards I print the map to see the actual counts. Is this the right way to test this, or is there some other way? I am also not sure whether my generateIPAddress method generates a random IP address every time. In addition, I take a start_total time before entering the loop and an end_total time after everything completes; can I calculate the average time from that?
long total = 10000;
long found = 0;
long found_country = 0;
long runs = total;
Map<Long, Long> histgram = new HashMap<Long, Long>();
try {
    long start_total = System.nanoTime();
    while (runs > 0) {
        String ipAddress = generateIPAddress();
        long start_time = System.nanoTime();
        resp = GeoLocationService.getLocationIp(ipAddress);
        long end_time = System.nanoTime();
        long difference = (end_time - start_time) / 1000000;
        Long count = histgram.get(difference);
        if (count != null) {
            count++;
            histgram.put(Long.valueOf(difference), count);
        } else {
            histgram.put(Long.valueOf(difference), Long.valueOf(1L));
        }
        runs--;
    }
    long end_total = System.nanoTime();
    long finalTotal = (end_total - start_total) / 1000000;
    float avg = (float) finalTotal / total;
    Set<Long> keys = histgram.keySet();
    for (Long key : keys) {
        Long value = histgram.get(key);
        System.out.println("$$$GEO OPTIMIZE SVC MEASUREMENT$$$, HG data, " + key + ":" + value);
    }
This is my generateIPAddress method:
private String generateIPAddress() {
    Random r = new Random();
    String s = r.nextInt(256) + "." + r.nextInt(256) + "." + r.nextInt(256) + "." + r.nextInt(256);
    return s;
}
Any suggestions will be appreciated.
Generally, when you benchmark functions you want to run them multiple times and average the results. That gives you a clearer indication of the actual time your program will spend in them, considering that you rarely care about the performance of something run only once.
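Since the question already collects a bucketed histogram (elapsed ms -> call count), the average and percentiles can be derived from it directly. A sketch, assuming a hypothetical helper class HistogramStats whose map shape matches the poster's histgram:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class HistogramStats {
    // histogram maps an elapsed-ms bucket to the number of calls in it,
    // matching the poster's Map<Long, Long> histgram
    static double averageMs(Map<Long, Long> histogram) {
        long totalCalls = 0, totalMs = 0;
        for (Map.Entry<Long, Long> e : histogram.entrySet()) {
            totalCalls += e.getValue();
            totalMs += e.getKey() * e.getValue();
        }
        return totalCalls == 0 ? 0.0 : (double) totalMs / totalCalls;
    }

    // smallest bucket such that `quantile` of all calls fall at or below it
    static long percentileMs(Map<Long, Long> histogram, double quantile) {
        long totalCalls = histogram.values().stream().mapToLong(Long::longValue).sum();
        long seen = 0;
        for (Map.Entry<Long, Long> e : new TreeMap<>(histogram).entrySet()) {
            seen += e.getValue();
            if (seen >= quantile * totalCalls) return e.getKey();
        }
        return -1; // empty histogram
    }

    public static void main(String[] args) {
        Map<Long, Long> histogram = new HashMap<>();
        histogram.put(1L, 7L);   // 7 calls took ~1 ms
        histogram.put(2L, 2L);   // 2 calls took ~2 ms
        histogram.put(50L, 1L);  // 1 outlier took ~50 ms
        System.out.println(averageMs(histogram));         // 6.1
        System.out.println(percentileMs(histogram, 0.9)); // 2
    }
}
```

The example shows why a percentile is often more useful than the average: a single 50 ms outlier drags the mean up to 6.1 ms even though 90% of calls finished within 2 ms.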

Stringtemplate low performance in comparison to Velocity and Mvel

I am trying to do some evaluation of template frameworks.
For a simple performance test I'm using these templates
private static String mvelTemplate = "Hello, my name is #{name},"
+ " #foreach{user : group.users} - #{user.id} - #{user.name} "
+ " #end{}";
private static String velocityTemplate = "Hello, my name is ${name},"
+ "#foreach($user in $group.users) - ${user.id} - ${user.name} #end " ;
private static String stringTemplate = "Hello, my name is <name>,"
+ "<group.users:{x| - <x.id> - <x.name>}> ";
// the group has 20 users
// 'Java' uses plain StringBuffer
The StringTemplate part is:
ST st = new ST(stringTemplate);
for (Map.Entry<String, Object> entry : vars.entrySet()) {
    st.add(entry.getKey(), entry.getValue());
}
start = System.currentTimeMillis();
for (int n = 0; n < 10000; n++) {
    st.render();
}
end = System.currentTimeMillis();
And the results are
Mvel.Compiled elapsed:68ms. ~147K per second
Velocity Cache elapsed:183ms. ~54K per second
StringTemplate elapsed:234ms. ~42K per second
Java elapsed:21ms. ~476K per second
Since I have no experience with StringTemplate, here is my question:
Is StringTemplate really that slow, or is there another (faster) way to render a template with it?
Update:
vars looks like this:
Map<String, Object> vars = new HashMap<String, Object>();
Group g = new Group("group1");
for (int i = 0; i < 20; i++) {
    g.addUser(new User(i, "user" + i));
}
vars.put("group", g);
vars.put("name", "john");
Now with 1,000,000 iterations per template, and the whole benchmark looped 10 times:
Mvel.Compiled elapsed:7056ms. ~141K per second
Velocity Cache elapsed:18239ms. ~54K per second
StringTemplate elapsed:22926ms. ~43K per second
Java elapsed:2182ms. ~458K per second
Part of what you are observing is likely a compiler warm-up issue. When I run the test I enclose below with 10,000 iterations, it takes 350 ms on my computer. When I increase it to 100,000, it takes 1225 ms, which is only 3.5x more time, not 10x. When I run it 1,000,000 times I get 8397 ms, which is only about 7x the cost and time when it should be 10x. Clearly the compiler is doing something interesting here with optimization. For a long-running program, I would expect ST to do better in your tests. The garbage collector could also play a role here. Try your examples with bigger loop lengths.
Anyway, speed was not my first priority with ST v4, but thank you for pointing this out. I will probably look into optimizing at some point. I don't think I've even run a profiler on it.
import org.stringtemplate.v4.*;
import java.util.*;

public class T {
    public static class User {
        public int id;
        public String name;
        public User(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private static String stringTemplate = "Hello, my name is <name>,"
        + "<users:{x| - <x.id> - <x.name>}> ";

    public static void main(String[] args) {
        ST st = new ST(stringTemplate);
        List<User> users = new ArrayList<User>();
        for (int i = 1; i <= 5; i++) {
            users.add(new User(i, "bob" + i));
        }
        st.add("users", users);
        st.add("name", "tjp");
        long start = System.currentTimeMillis();
        for (int n = 0; n < 1000000; n++) {
            st.render();
        }
        long end = System.currentTimeMillis();
        System.out.printf("%d ms\n", end - start);
    }
}
