I'm happy enough with Guava on Java 8 - are there any performance benefits or pitfalls in migrating to Streams for sequential code?
I've started a project on GitHub to play with this.
Initial results are surprisingly positive for Streams. For an identity map over strings, on Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode), 2014 MacBook Air 1.7GHz i7, code at https://github.com/dmcg/iterables-v-streams#ea8498ee0627fc59834001a837fa92fba4bcf47e
Experiment selection:
Benchmark Methods: [guava, iterate, streams]
Instruments: [allocation, runtime]
User parameters: {}
Virtual machines: [default]
Selection type: Full cartesian product
This selection yields 6 experiments.
Trial Report (1 of 6):
Experiment {instrument=allocation, benchmarkMethod=iterate, vm=default, parameters={}}
Results:
bytes(B): min=4072.00, 1st qu.=4072.00, median=4072.00, mean=4072.00, 3rd qu.=4072.00, max=4072.00
objects: min=3.00, 1st qu.=3.00, median=3.00, mean=3.00, 3rd qu.=3.00, max=3.00
Trial Report (2 of 6):
Experiment {instrument=allocation, benchmarkMethod=guava, vm=default, parameters={}}
Results:
bytes(B): min=15104.00, 1st qu.=15104.00, median=15104.00, mean=15104.00, 3rd qu.=15104.00, max=15104.00
objects: min=17.00, 1st qu.=17.00, median=17.00, mean=17.00, 3rd qu.=17.00, max=17.00
Trial Report (3 of 6):
Experiment {instrument=allocation, benchmarkMethod=streams, vm=default, parameters={}}
Results:
bytes(B): min=15272.00, 1st qu.=15272.00, median=15272.00, mean=15527.64, 3rd qu.=15432.00, max=17252.80
objects: min=20.00, 1st qu.=20.00, median=20.00, mean=25.00, 3rd qu.=26.00, max=53.00
Trial Report (4 of 6):
Experiment {instrument=runtime, benchmarkMethod=guava, vm=default, parameters={}}
Results:
runtime(ns): min=13365.32, 1st qu.=13660.61, median=13802.51, mean=13961.91, 3rd qu.=14445.46, max=14715.34
Trial Report (5 of 6):
Experiment {instrument=runtime, benchmarkMethod=iterate, vm=default, parameters={}}
Results:
runtime(ns): min=9952.47, 1st qu.=10892.64, median=11372.35, mean=11243.07, 3rd qu.=11785.48, max=12024.76
Trial Report (6 of 6):
Experiment {instrument=runtime, benchmarkMethod=streams, vm=default, parameters={}}
Results:
runtime(ns): min=10527.26, 1st qu.=11051.70, median=11747.29, mean=11631.15, 3rd qu.=12205.97, max=12581.39
Collected 81 measurements from:
2 instrument(s)
2 virtual machine(s)
3 benchmark(s)
Execution complete: 1.188 min.
Results have been uploaded. View them at: https://microbenchmarks.appspot.com/runs/d2c7f83b-2cfa-4217-ab0b-e8d506eaa85c
I'm still getting my head around Google Caliper, but the results seem to suggest that Streams are faster than Guava, and not much slower than a straight for loop.
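For context, the three benchmark methods compare roughly the following shapes of code. This is a hedged reconstruction rather than the exact code in the repository; the class name and list contents are placeholders:

import com.google.common.collect.Lists;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IdentityMapBenchmarks {
    static final List<String> strings = Arrays.asList("a", "b", "c"); // placeholder data

    // iterate: plain for-each loop baseline
    static List<String> iterate() {
        List<String> result = new ArrayList<>(strings.size());
        for (String s : strings) {
            result.add(s);
        }
        return result;
    }

    // guava: Lists.transform returns a lazy view, so copy it to force evaluation
    static List<String> guava() {
        return new ArrayList<>(Lists.transform(strings, s -> s));
    }

    // streams: Java 8 pipeline with an identity map
    static List<String> streams() {
        return strings.stream().map(s -> s).collect(Collectors.toList());
    }
}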
Current stats:
Generating an .xls file containing 3 tables, each containing 100K rows x 3 columns, took ~24s with the following breakdown:
building the JRXlsExporter - ~6s
JRXlsExporter.exportReport() - ~18s
The code was executed on the following hardware: 6-core i7 CPU, 16GB RAM, SSD storage. The maximum heap size for the JVM was set to 4GB.
Here is a snippet to give you some insight into how I'm adding multiple tables. I'm using in-memory datasources to populate the tables.
DynamicReportBuilder drb = new DynamicReportBuilder();
...
// one concatenated subreport per table; each one is registered against a
// datasource parameter named after its index ("0", "1", "2")
DynamicReport subReport = this.createSubreport(report.getSubReports()[i]);
String dsName = String.valueOf(i);
drb.addConcatenatedReport(subReport, new ClassicLayoutManager(),
        dsName, DJConstants.DATA_SOURCE_ORIGIN_PARAMETER,
        DJConstants.DATA_SOURCE_TYPE_COLLECTION, false);
...
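Because the datasource origin is DATA_SOURCE_ORIGIN_PARAMETER, each subreport pulls its collection from the parameters map at fill time under the name registered above. A hedged sketch of that fill step, continuing the snippet; the row collections and the empty master datasource are my assumptions, not code from the project:

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import ar.com.fdvs.dj.core.DynamicJasperHelper;
import ar.com.fdvs.dj.core.layout.ClassicLayoutManager;
import net.sf.jasperreports.engine.JREmptyDataSource;
import net.sf.jasperreports.engine.JasperPrint;

// hypothetical in-memory bean collections, one per table
Collection<Object> firstTableRows = new ArrayList<Object>();
Collection<Object> secondTableRows = new ArrayList<Object>();

Map<String, Object> params = new HashMap<String, Object>();
params.put("0", firstTableRows);   // datasource for the subreport registered as "0"
params.put("1", secondTableRows);  // one entry per concatenated subreport
JasperPrint print = DynamicJasperHelper.generateJasperPrint(
        drb.build(), new ClassicLayoutManager(),
        new JREmptyDataSource(), params); // the master report has no rows of its own here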
DynamicJasper version:
dynamicjasper: 5.1.0
jasperreports: 6.3.0
dynamicjasper core fonts: 1.0
So far I've tried the following options:
Played with virtualization: no virtualization, JRSwapFileVirtualizer, JRGzipVirtualizer and JRFileVirtualizer. The best times were recorded with JRSwapFileVirtualizer, which is the one used for the stats above (see the sketch below).
Tried changing net.sf.jasperreports.text.measurer.factory to com.jaspersoft.jasperserver.api.engine.jasperreports.util.SingleLineTextMeasurerFactory [1], but I wasn't able to find any source code for SingleLineTextMeasurerFactory.
[1] https://community.jaspersoft.com/wiki/how-modify-jasperreportsproperties-file-disable-multi-line-data-processing
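For reference, here is a minimal sketch of the JRSwapFileVirtualizer setup mentioned above; the directory, block size, grow count and page limit are illustrative assumptions, not values from my project:

import java.util.HashMap;
import java.util.Map;
import net.sf.jasperreports.engine.JRParameter;
import net.sf.jasperreports.engine.fill.JRSwapFileVirtualizer;
import net.sf.jasperreports.engine.util.JRSwapFile;

// swap file in the system temp dir: 4 KB blocks, growing 100 blocks at a time
JRSwapFile swapFile = new JRSwapFile(System.getProperty("java.io.tmpdir"), 4096, 100);
// keep at most ~300 report pages in memory; older pages are swapped to disk
JRSwapFileVirtualizer virtualizer = new JRSwapFileVirtualizer(300, swapFile, true);
Map<String, Object> params = new HashMap<String, Object>();
params.put(JRParameter.REPORT_VIRTUALIZER, virtualizer);
// ... fill the report with these parameters, then:
virtualizer.cleanup(); // release the swap file once the report is done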
I want to use this function (here is the code on GitHub) to split my dataset into two parts: 90% as the training dataset (for example) and the remaining 10% as the test set. I tried this code:
library(XLConnect)
library(readxl)
library(xlsx)
# stratified() is the function from the GitHub code linked above
ybi <- read_excel("D:/ii.xls")
#View(ybi)
test <- stratified(ybi, 8, .1)  # 10% sample, stratified on the 8th column (the class)
no <- test$ID_unit              # indices of the test-set samples
train <- ybi[-no, ]             # the remaining rows are the training data
write.xlsx(train, "D:/mm.xlsx", sheetName = "Newdata")
In fact my data has 8 attributes and 65,534 rows.
With the code above I selected just 10% based on the eighth attribute, which is the class. It produces the test set without any problem, but not the training data; the error is shown in the attached figure.
How can I fix it?
It looks like your JVM does not have enough memory allocated for the heap.
As a quick fix, export the _JAVA_OPTIONS system variable:
export _JAVA_OPTIONS="-Xmx8G -Xms1G -Xcheck:jni"
You can also use:
options(java.parameters = "-Xmx8G")
and set -Xmx to a value that will make R happy. Note that options(java.parameters = ...) only takes effect if it runs before any rJava-based package (such as xlsx) loads and starts the JVM, so put it at the top of your script.
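Since the root cause is the JVM heap ceiling, it can also help to confirm which limit the JVM actually picked up. A minimal Java check (a sketch; the class name is mine):

public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reports the heap ceiling (-Xmx) the running JVM applied
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
    }
}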
I am a very proficient C# developer, but I need to start writing code that works on the JVM. The Java language is feature-poor compared to C# these days, so I was interested in the features that Scala offers.
However, when I heard that in Scala all operators are simply methods, I became suspicious of the performance impact that would have on math-heavy computations (which are important for the types of applications my team writes).
So I ran some simple int-based tests, and found that Scala is about 30x slower than the equivalent Java code. Not good! Can anyone tell me what I'm doing wrong, or how to improve the computational performance of the Scala example to be on par with Java?
UPDATE 1: as pointed out by the first two answers, I was being a super-noob and running this in the IntelliJ IDE. I don't yet know how to run the Scala app via the java command line. Thanks for the help guys; I'll need to investigate simple command-line execution of Scala before I continue with perf testing, as the IDE-given results are obviously too inaccurate.
UPDATE 2: Luigi says in the comments that he gets equal times in IntelliJ, so it seems my wild difference isn't due to IntelliJ? Any other ideas on what this could be? I'll try getting this running via the command line and post an update with my results.
UPDATE3:
After running this via the command line, I get the same 30x perf difference.
My computer is a 3-core AMD x64 at 3.4GHz, running the 64-bit JDK 1.6.0_31 on Windows 7.
Here are my runtimes:
Java: 210ms.
Scala: between 2000 and 7400ms (generally in the 7000 range)
So, I suppose the question is still open: why is Scala running so slowly on my platform? Is it something with the 64-bit Java runtime, or with Java 6?
Runtime versions:
C:\Users\jason>java -showversion
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
C:\Users\jason>scala
Welcome to Scala version 2.9.1-1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_31).
UPDATE 4: while my original test has a 30x difference, increasing the iterations to 100000000 causes the difference to shrink to about 33%, so it seems Scala was still being dominated by some unknown initialization cost on my machine. I'll close this with the highest-rated answer, as I don't think we'll find the performance problem while no one except me can see the issue :(
UPDATE 5, SOLUTION: based on the help from the two answers I got, I figured out the problem; see my answer below for more details (summary: the first call to System.nanoTime() takes a long time).
Here are my sample apps:
//scala
object HelloWorld {
  //extends Application {
  def main(args: Array[String]) {
    println("hello scala")
    var total: Long = 0
    var i: Long = 0
    var x: Long = 0
    //warm up of the JVM to avoid timing of runtime initialization
    while (i < 100000) {
      x = i
      x += x - 1
      x -= x + 1
      x += 1
      x -= 1
      total += x
      i += 1
    }
    //reset variables
    total = 0
    i = 0
    //start timing
    var start: Long = System.nanoTime
    //run test
    while (i < 100000) {
      x = i
      x += x - 1
      x -= x + 1
      x += 1
      x -= 1
      total += x
      i += 1
    }
    var end: Long = System.nanoTime
    //note: (end - start) / 1000 is actually microseconds, despite the "ms" label
    System.out.println("ms, checksum = ")
    System.out.println((end - start) / 1000)
    System.out.println(total)
  }
}
And here is the Java equivalent, which ran 30x faster:
//java
public class app {
    public static void main(String[] args) {
        String message = "hello, java";
        System.out.println(message);
        long total = 0;
        //warm up of the JVM to avoid timing of runtime initialization
        for (long i = 0; i < 100000; i++) {
            long x = i;
            x += x - 1;
            x -= x + 1;
            x++;
            x--;
            total += x;
        }
        //reset variables
        total = 0;
        //start timing and run test
        long start = System.nanoTime();
        for (long i = 0; i < 100000; i++) {
            long x = i;
            x += x - 1;
            x -= x + 1;
            x++;
            x--;
            total += x;
        }
        long end = System.nanoTime();
        System.out.println("ms, checksum = ");
        System.out.println((end - start) / 1000);
        System.out.println(total);
    }
}
So, I guess I figured out the answer myself.
The problem is in the call to System.nanoTime. That call carries some one-time initialization cost (loading the Java base libraries, etc.) which turns out to be much cheaper when triggered from the Java runtime than from the Scala runtime.
I proved this by changing the initial value of total, setting it instead to
var total: Long = System.nanoTime()
This line runs before the first "warm up" loop, and with it both versions of the app (Java and Scala) run in about the same time: around 2100 (in the units the test prints) for 1,000,000 iterations.
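The takeaway generalizes: pay System.nanoTime()'s one-time cost before the region you measure. A minimal Java sketch of that pattern (the class name and the loop body are just stand-ins):

public class TimingHarness {
    public static void main(String[] args) {
        // the first call initializes the timer machinery; keep it outside the timed region
        System.nanoTime();

        long start = System.nanoTime();
        long total = 0;
        for (long i = 0; i < 100000; i++) {
            total += i; // stand-in workload
        }
        long end = System.nanoTime();

        System.out.println("elapsed us = " + (end - start) / 1000 + ", checksum = " + total);
    }
}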
Thanks for your guys' help on this, I wouldn't have figured this out without your assistance.
PS: I'll leave the "accepted answer" as-is, because I wouldn't have tracked this down without his help.
I've re-run your code (and increased the number of cycles x1000, to get some meaning into the benchmark).
Results:
Scala: 92 ms
Java: 59 ms
You can see that Java is 30% faster.
Looking at the bytecode, I can say that the two versions are almost identical, so the difference is really strange (the bytecode listing is quite long, so I won't post it here).
Increasing the count x10000 gives this:
Scala: 884 ms
Java: 588 ms
Since the results are fairly stable, there should be some constant factor lurking somewhere. Maybe it's in some parameters that the "scala" runner passes to the JVM?
EDIT:
My configuration:
$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
$ scala -version
Scala code runner version 2.9.0.1 -- Copyright 2002-2011, LAMP/EPFL
$ inxi -SCD
System: Host the-big-maker Kernel 2.6.35-22-generic x86_64 (64 bit) Distro Linux Mint 10 Julia
CPU: Quad core AMD Phenom II X4 965 (-MCP-) cache 2048 KB flags (lm nx sse sse2 sse3 sse4a svm)
Clock Speeds: (1) 800.00 MHz (2) 800.00 MHz (3) 800.00 MHz (4) 800.00 MHz
Disks: HDD Total Size: 750.2GB (5.8% used) 1: /dev/sda OCZ 90.0GB
2: /dev/sdb ST3500413AS 500.1GB 3: /dev/sdc ST3802110A 80.0GB
4: /dev/sdd Maxtor_6Y080M0 80.0GB
$ javac app.java
$ scalac app.scala
$ scala HelloWorld
hello scala
ms, checksum =
1051
-100000
$ java app
hello, java
ms, checksum =
1044
-100000
What am I doing wrong?
We need to implement an application for evaluating the results of an online programming challenge. The users will implement the programming challenge and compile their source through a web interface. We are supposed to compile the submitted sources on the fly and present some statistics about the program, like expected memory consumption and possible performance indicators. Does anybody know how we can gather memory consumption and performance indicators of a program statically from its sources?
While you could possibly do static analysis of the source to infer performance characteristics, I suspect it would be far simpler to just run a JUnit test suite over the code.
If you can present your challenge as a code stub or interface, you should be able to create a suitable JUnit suite which validates correctness and tests performance.
Granted, JUnit may not be the best way of running performance tests, but you can likely bend it to the task, as sketched below. Alternatively, you could look at JMeter or something similar.
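For example, here is a hedged sketch of what such a harness could look like with JUnit 4. The Challenge interface, the inline stand-in submission, and the limits are all hypothetical, not an existing API:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class ChallengeTest {

    // hypothetical contract that submitted sources must implement
    interface Challenge {
        int solve(int[] input);
    }

    // stand-in for loading the contestant's compiled class (e.g. via reflection)
    private final Challenge submission = input -> {
        int sum = 0;
        for (int v : input) sum += v;
        return sum;
    };

    @Test
    public void producesCorrectAnswer() {
        assertEquals(6, submission.solve(new int[] {1, 2, 3}));
    }

    // JUnit 4 fails the test if it runs longer than the timeout (in ms),
    // which makes for a crude but workable performance gate
    @Test(timeout = 1000)
    public void finishesWithinTimeLimit() {
        int[] large = new int[1_000_000];
        for (int i = 0; i < large.length; i++) large[i] = i;
        submission.solve(large);
    }
}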
Found something very useful. I am not sure if this is exactly what I was looking for, and I have yet to analyse the results, but it is quite interesting.
We can gather some performance statistics using the HPROF profiler agent shipped with the JDK. The good thing is that it can be run during compilation to produce some interesting statistics on the code being compiled. Below are some samples. More details can be found at http://download.oracle.com/javase/7/docs/webnotes/tsg/TSG-VM/html/tooldescr.html#gbluz
$ javac -J-agentlib:hprof=heap=sites Hello.java
SITES BEGIN (ordered by live bytes) Wed Oct 4 13:13:42 2006
percent live alloc'ed stack class
rank self accum bytes objs bytes objs trace name
1 44.13% 44.13% 1117360 13967 1117360 13967 301926 java.util.zip.ZipEntry
2 8.83% 52.95% 223472 13967 223472 13967 301927 com.sun.tools.javac.util.List
3 5.18% 58.13% 131088 1 131088 1 300996 byte[]
4 5.18% 63.31% 131088 1 131088 1 300995 com.sun.tools.javac.util.Name[]
$ javac -J-agentlib:hprof=heap=dump Hello.java
HEAP DUMP BEGIN (39793 objects, 2628264 bytes) Wed Oct 4 13:54:03 2006
ROOT 50000114 (kind=<thread>, id=200002, trace=300000)
ROOT 50000006 (kind=<JNI global ref>, id=8, trace=300000)
ROOT 50008c6f (kind=<Java stack>, thread=200000, frame=5)
:
CLS 50000006 (name=java.lang.annotation.Annotation, trace=300000)
loader 90000001
OBJ 50000114 (sz=96, trace=300001, class=java.lang.Thread#50000106)
name 50000116
group 50008c6c
contextClassLoader 50008c53
inheritedAccessControlContext 50008c79
blockerLock 50000115
OBJ 50008c6c (sz=48, trace=300000, class=java.lang.ThreadGroup#50000068)
name 50008c7d
threads 50008c7c
groups 50008c7b
ARR 50008c6f (sz=16, trace=300000, nelems=1,
elem type=java.lang.String[]#5000008e)
[0] 500007a5
CLS 5000008e (name=java.lang.String[], trace=300000)
super 50000012
loader 90000001
:
HEAP DUMP END
$ javac -J-agentlib:hprof=cpu=times Hello.java
CPU TIME (ms) BEGIN (total = 2082665289) Wed Oct 4 13:43:42 2006
rank self accum count trace method
1 3.70% 3.70% 1 311243 com.sun.tools.javac.Main.compile
2 3.64% 7.34% 1 311242 com.sun.tools.javac.main.Main.compile
3 3.64% 10.97% 1 311241 com.sun.tools.javac.main.Main.compile
4 3.11% 14.08% 1 311173 com.sun.tools.javac.main.JavaCompiler.compile
5 2.54% 16.62% 8 306183 com.sun.tools.javac.jvm.ClassReader.listAll
6 2.53% 19.15% 36 306182 com.sun.tools.javac.jvm.ClassReader.list
7 2.03% 21.18% 1 307195 com.sun.tools.javac.comp.Enter.main
8 2.03% 23.21% 1 307194 com.sun.tools.javac.comp.Enter.complete
9 1.68% 24.90% 1 306392 com.sun.tools.javac.comp.Enter.classEnter
10 1.68% 26.58% 1 306388 com.sun.tools.javac.comp.Enter.classEnter
...
CPU TIME (ms) END
I was curious about the runhprof output. I am mainly concerned with the memory section. It looks like there are multiple entries for the same class. Why would that be?
Is there a way to get hprof to print how much memory the instances of a particular class take up in memory, with one value per class?
Also, what tools do you use besides 'hat' to analyze the output?
I ran the java command with the JVM arg:
-Xrunhprof:heap=sites,depth=4,format=a,file=prof/hprof_dump.txt
Here is a brief snippet of the output; some classes are listed multiple times.
SITES BEGIN (ordered by live bytes) Tue Jul 28 19:33:41 2009
percent live alloc'ed stack class
rank self accum bytes objs bytes objs trace name
1 29.75% 29.75% 700080 43755 576000016 36000001 307483 java.lang.Double
2 7.13% 36.88% 167840 5245 370432 11576 300993 clojure.lang.PersistentHashMap$LeafNode
3 2.09% 38.98% 49296 2054 60048 2502 301295 clojure.lang.Symbol
4 2.09% 41.07% 49200 3 49200 3 301071 char[]
5 1.33% 42.40% 31344 1306 68088 2837 300998 clojure.lang.PersistentHashMap$BitmapIndexedNode
6 1.10% 43.50% 25800 645 25800 645 301050 clojure.lang.Var
7 1.05% 44.54% 24624 3 24624 3 301069 byte[]
8 0.86% 45.40% 20184 841 49608 2067 301003 clojure.lang.PersistentHashMap$INode[]
9 0.78% 46.18% 18304 572 58720 1835 301308 clojure.lang.PersistentList
10 0.75% 46.93% 17568 549 17568 549 308832 java.lang.String[]
11 0.70% 47.62% 16416 2 16416 2 301036 byte[]
Eclipse Memory Analyzer is excellent. It loads the dump file very quickly, produces lots of nice reports about the heap dump, and lets you query the dump for objects/classes using an SQL-like language called OQL (for example, SELECT * FROM java.lang.Double lists every Double instance). Love it.