I am running a performance test in Java that compares several serialization formats, including Avro, Protobuf, and Thrift.
The test deserializes a byte-array message with 30 long fields 1,000,000 times.
The result for Avro is not good.
Protobuf and Thrift take around 2000 milliseconds on average, but Avro takes 9000 milliseconds.
The documentation advises reusing the decoder, so I wrote the code as follows.
byte[] bytes = readFromFile("market.avro");
long begin = System.nanoTime();
DatumReader<Market> userDatumReader = new ReflectDatumReader<>(Market.class);
InputStream inputStream = new SeekableByteArrayInput(bytes);
BinaryDecoder reuse = DecoderFactory.get().binaryDecoder(inputStream, null);
Market marketReuse = new Market();
for (int i = 0; i < loopCount; i++) {
    // A fresh input stream per message, with the decoder instance passed in for reuse
    inputStream = new SeekableByteArrayInput(bytes);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(inputStream, reuse);
    userDatumReader.read(marketReuse, decoder);
}
long end = System.nanoTime() - begin;
System.out.println("avro loop " + loopCount + " times: " + (end * 1d / 1000 / 1000));
I don't think Avro should be that slow, so I believe I am doing something wrong, but I am not sure what. Am I using 'reuse' the wrong way?
Is there any advice for Avro performance testing? Thanks in advance.
It took me a while to figure this one out, but apparently DecoderFactory.get().binaryDecoder is the culprit: it creates an 8 KB buffer every time it is invoked, and this buffer is not reused but reallocated on every invocation. I don't see why a buffer is involved in the first place.
The saner alternative is to use DecoderFactory.get().directBinaryDecoder.
I am trying to convert an image to a byte array so that I can transfer it over the network for further processing.
In C#, the following code does the job in about 2 to 3 milliseconds.
Image image = Image.FromFile("D:/tst.jpg");
DateTime pre = DateTime.Now;
int sz;
using (MemoryStream sourceImageStream = new MemoryStream())
{
    // Encode the image as JPEG into memory and record the encoded size
    image.Save(sourceImageStream, System.Drawing.Imaging.ImageFormat.Jpeg);
    byte[] sourceImageData = sourceImageStream.ToArray();
    sz = sourceImageData.Length;
}
MessageBox.Show("Size " + sz + " time : " + (DateTime.Now - pre).TotalMilliseconds);
Size 268152 time : 3.0118
But in Java doing the same as below takes way too much time.
BufferedImage image = ImageIO.read(new File("D:/tst.jpg"));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Instant pre = Instant.now();
ImageIO.write( image, "jpeg", baos );
Instant now = Instant.now();
System.out.println("Size " + baos.size() + " time : " + ChronoUnit.MILLIS.between(pre, now));
Size 268167 time : 91.0
The source image is a JPG. In C#, PNG compression took around 90 ms, so my guess is that Java is somehow spending time recompressing the same JPG image. The image dimensions are 2048 x 1536.
Java is frustratingly slow here. How can I get rid of this problem in Java?
Take this image into consideration.
C#: Size 1987059 time : 11.0129
Java: Size 845093 time : 155.0
The source image is 1987059 bytes (the same size as the C# encoded byte array), but Java compresses it to 845093 bytes. I tried setting the compression quality to 1f, but that did not reduce the time.
The main problem with this kind of testing is pointed out in the first comment: this is a micro-benchmark. If you run that code only once in Java, you'll mostly measure the time taken to initialize the runtime, plus class loading and initialization.
Here's a slightly modified version of your code (I originally wrote this as an answer to your follow-up question, which is now closed as a duplicate, but the same concept applies) that at least includes a warm-up phase. You'll see that there's quite a difference in the measurements. On my 2014 MacBook Pro, the output is:
Initial load time 415 ms (5)
Average warm up load time 73 ms (5)
Normal load time 65 ms (5)
As you can see, the "normal" time to load an image is a lot less than the initial time, which includes a lot of overhead.
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class TestJPEGSpeed {
    public static void main(String[] args) throws IOException {
        File input = new File(args[0]);
        test(input, 1, "Initial");
        test(input, 100, "Average warm up");
        test(input, 1, "Normal");
    }

    // Loads the image 'runs' times and prints the average time per load
    private static void test(File input, int runs, final String type) throws IOException {
        BufferedImage image = null;
        long start = System.currentTimeMillis();
        for (int i = 0; i < runs; i++) {
            image = ImageIO.read(input);
        }
        long stop = System.currentTimeMillis();
        System.out.println(type + " load time " + ((stop - start) / runs) + " ms (type=" + image.getType() + ")");
    }
}
(I also wrote a different version, that took a second parameter, and loaded a different file in the "normal" case, but the measurements were similar, so I left it out).
Most likely there are still issues with this benchmark, such as measuring I/O time rather than decoding time, but at least it's a little more fair.
PS: Some bonus background information. If you use an Oracle JRE at least, the bundled JPEG plugin for ImageIO uses JNI and a natively compiled version of IJG's libjpeg (written in C). This is used for both reading and writing JPEG. You could probably see better performance if you used native bindings for libjpeg-turbo. But as this is all native code, it's unlikely the performance will vary drastically from platform to platform.
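The same warm-up idea can be applied to the encoding step from the question. Below is a minimal sketch, not a rigorous benchmark: it assumes a synthetic in-memory image instead of D:/tst.jpg, and the class name JpegEncodeWarmup, the helper names, and the run counts are made up for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public class JpegEncodeWarmup {

    // Encodes the image as JPEG into memory and returns the encoded size in bytes.
    public static int encode(BufferedImage image) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "jpeg", baos)) {
            throw new IOException("no JPEG writer available");
        }
        return baos.size();
    }

    // Runs the encoder 'runs' times and returns the average time per run in milliseconds.
    public static double averageMillis(BufferedImage image, int runs) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            encode(image);
        }
        return (System.nanoTime() - start) / 1e6 / runs;
    }

    // Builds a synthetic RGB image (hypothetical stand-in for the real file).
    public static BufferedImage syntheticImage(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                image.setRGB(x, y, (x * 31 + y * 17) & 0xFFFFFF);
            }
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage image = syntheticImage(2048, 1536);
        // The first call pays for class loading and plugin lookup; later calls do not.
        System.out.println("Initial encode:  " + averageMillis(image, 1) + " ms");
        System.out.println("Warm-up average: " + averageMillis(image, 20) + " ms");
        System.out.println("After warm-up:   " + averageMillis(image, 1) + " ms");
    }
}
```

Only the last figure is comparable to a steady-state measurement on another platform; for anything serious, a harness such as JMH would be the safer choice.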
I didn't think there was a difference between an inputstream object read from a local file vs one from a network source (Amazon S3 in this case) so hopefully someone can enlighten me.
These programs were run on a VM running CentOS 6.3.
The test file in both cases is 10 MB.
Local file code:
InputStream is = new FileInputStream("/home/anyuser/test.jpg");
int read = 0;
int buf_size = 1024 * 1024 * 2;
byte[] buf = new byte[buf_size];
ByteArrayOutputStream baos = new ByteArrayOutputStream(buf_size);
long t3 = System.currentTimeMillis();
int i = 0;
while ((read = is.read(buf)) != -1) {
    System.out.println("reading for the " + i + "th time");
    baos.write(buf, 0, read); // keep only the bytes actually read
    i++;
}
long t4 = System.currentTimeMillis();
System.out.println("Time to read = " + (t4-t3) + "ms");
The output is shown below. It reads 5 times, which makes sense since the buffer is 2 MB and the file is 10 MB.
reading for the 0th time
reading for the 1th time
reading for the 2th time
reading for the 3th time
reading for the 4th time
Time to read = 103ms
Now, we have the same code run with the same 10MB test file, except this time, the source is from Amazon S3. We don't start reading until we finish getting the stream from S3. However, this time, the read loop is running through thousands of times, when it should only read it 5 times.
InputStream is;
long t1 = System.currentTimeMillis();
is = getS3().getFileFromBucket(S3Path,input);
long t2 = System.currentTimeMillis();
System.out.print("Time to get file " + input + " from S3: ");
System.out.println((t2-t1) + "ms");
int read = 0;
int buf_size = 1024*1024*2;
byte[] buf = new byte[buf_size];
ByteArrayOutputStream baos = new ByteArrayOutputStream(buf_size);
long t3 = System.currentTimeMillis();
int i = 0;
while ((read = is.read(buf)) != -1) {
    if ((i % 100) == 0)
        System.out.println("reading for the " + i + "th time");
    baos.write(buf, 0, read); // keep only the bytes actually read
    i++;
}
long t4 = System.currentTimeMillis();
System.out.println("Time to read = " + (t4-t3) + "ms");
The output is as follows:
Time to get file test.jpg from S3: 2456ms
reading for the 0th time
reading for the 100th time
reading for the 200th time
reading for the 300th time
reading for the 400th time
reading for the 500th time
reading for the 600th time
reading for the 700th time
reading for the 800th time
reading for the 900th time
reading for the 1000th time
reading for the 1100th time
reading for the 1200th time
reading for the 1300th time
reading for the 1400th time
Time to read = 14471ms
The amount of time taken to read the stream changes from run to run. Sometimes it takes 60 seconds, sometimes 15 seconds. It doesn't get faster than 15 sec. The read loop still loops through 1400+ times on each test run of the program, even though I think it should only be 5 times, like the local file example.
Is this how inputstream works when the source is through the network, even though we had finished getting the file from the network source? Thanks in advance for your help.
I don't think it's specific to Java. When you read from the network, the actual read call to the operating system returns one packet of data at a time, no matter how big a buffer you allocated. If you check the size of the data read (your read variable), it should show the size of the network packet used.
This is one of the reasons why people use a separate thread to read from the network and avoid blocking by using asynchronous I/O techniques.
As @imel96 points out, there is nothing in the documentation that guarantees the behaviour you are expecting. You will never read 2MB at a time from a socket, because the socket receive buffer isn't normally that large, quite apart from other factors such as bandwidth.
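To make the point concrete, here is a small sketch of how read(buf) behaves when the source hands over data in small pieces, and how to loop until a buffer is actually full. The network is only simulated: ChunkedReadDemo, chunked(), and the 1460-byte chunk size are hypothetical names and values chosen for illustration, not anything from S3 or the AWS SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedReadDemo {

    // Hypothetical stand-in for a network stream: never returns more than
    // maxChunk bytes per read call, however large the caller's buffer is.
    public static InputStream chunked(byte[] data, final int maxChunk) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, maxChunk));
            }
        };
    }

    // Counts how many read(buf) calls it takes to drain the stream.
    public static int countReads(InputStream in, byte[] buf) throws IOException {
        int calls = 0;
        while (in.read(buf) != -1) {
            calls++;
        }
        return calls;
    }

    // Keeps reading until buf is full or the stream ends; returns the bytes stored.
    public static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream before the buffer filled up
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10 * 1024 * 1024];
        byte[] buf = new byte[2 * 1024 * 1024];
        System.out.println("read calls needed: " + countReads(chunked(data, 1460), buf));
        System.out.println("bytes after fill:  " + fill(chunked(data, 1460), buf));
    }
}
```

The number of loop iterations depends on how much each read call returns, not on the buffer size, which is why the S3 loop runs well over a thousand times for a 10 MB file.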
Using the following code as a benchmark, the system can write 10,000 rows to disk in a fraction of a second:
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
void withSync() {
    /* O_CREAT requires a mode argument */
    int f = open( "/tmp/t8" , O_RDWR | O_CREAT, 0644 );
    lseek (f, 0, SEEK_SET );
    int records = 10*1000;
    clock_t ustart = clock();
    for(int i = 0; i < records; i++) {
        write(f, "012345678901234567890123456789" , 30);
    }
    clock_t uend = clock();
    close (f);
    printf(" sync() seconds:%lf writes per second:%lf\n", ((double)(uend-ustart))/(CLOCKS_PER_SEC), ((double)records)/((double)(uend-ustart))/(CLOCKS_PER_SEC));
}
In the above code, 10,000 records can be written and flushed out to disk in a fraction of a second, output below:
sync() seconds:0.006268 writes per second:0.000002
In the Java version, it takes over 4 seconds to write 10,000 records. Is this just a limitation of Java, or am I missing something?
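public void testFileChannel() throws IOException {
    RandomAccessFile raf = new RandomAccessFile(new File("/tmp/t5"),"rw");
    FileChannel c = raf.getChannel();
    ByteBuffer b = ByteBuffer.allocateDirect(64*1024);
    long s = System.currentTimeMillis();
    for(int i=0;i<10000;i++){
        // Loop body assumed from the answers below: fill the buffer, write it, force it to disk
        b.clear();
        b.put("012345678901234567890123456789".getBytes());
        b.flip();
        c.write(b);
        c.force(false);
    }
    long e=System.currentTimeMillis();
    System.out.println("With flush "+(e-s));
}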
Returns this:
With flush 4263
Please help me understand what is the correct/fastest way to write records to disk in Java.
Note: I am using the RandomAccessFile class in combination with a ByteBuffer as ultimately we need random read/write access on this file.
Actually, I am surprised that test is not slower. The behavior of force is OS dependent, but broadly it forces the data to disk. If you have an SSD you might achieve 40K writes per second, but with an HDD you won't. The C example clearly isn't committing the data to disk, as even the fastest SSD cannot perform more than 235K IOPS (the manufacturers guarantee it won't go faster than that :D).
If you need the data committed to disk every time, you can expect it to be slow and entirely dependent on the speed of your hardware. If you only need the data flushed to the OS, so that nothing is lost when the program crashes but the OS does not, you can write the data without force. A faster option is to use memory-mapped files. This gives you random access without a system call for each record.
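As a rough sketch of the memory-mapped option, assuming the same 30-byte record as in the question: the class name MappedWriteDemo and the temp-file location are illustrative choices, and this is not how Java Chronicle itself is implemented.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedWriteDemo {

    public static final byte[] RECORD =
            "012345678901234567890123456789".getBytes(StandardCharsets.US_ASCII);

    // Writes 'records' fixed-size records through a memory mapping and
    // returns the elapsed time in nanoseconds. No force() is called, so the
    // OS decides when the dirty pages reach the disk.
    public static long writeRecords(Path file, int records) throws IOException {
        long size = (long) records * RECORD.length;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel channel = raf.getChannel()) {
            // Mapping READ_WRITE beyond the current length grows the file to 'size'.
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            long start = System.nanoTime();
            for (int i = 0; i < records; i++) {
                map.put(RECORD); // a plain memory copy, no system call per record
            }
            return System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped", ".dat");
        long nanos = writeRecords(file, 10_000);
        System.out.println("mapped write " + nanos / 1e6 + " ms, file size " + Files.size(file));
        Files.deleteIfExists(file);
    }
}
```

Calling map.force() after the loop would request a write-back to the device, at which point the timing depends on the hardware again.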
I have a library, Java Chronicle, which can read/write 5-20 million records per second with a latency of 80 ns in text or binary formats, with random access, and can be shared between processes. It only works this fast because it is not committing the data to disk on every record, but you can test that if the JVM crashes at any point, no data written to the chronicle is lost.
This code is more similar to what you wrote in C. Takes only 5 msec on my machine. If you really need to flush after every write, it takes about 60 msec. Your original code took about 11 seconds on this machine. BTW, closing the output stream also flushes.
public static void testFileOutputStream() throws IOException {
    OutputStream os = new BufferedOutputStream( new FileOutputStream( "/tmp/fos" ) );
    byte[] bytes = "012345678901234567890123456789".getBytes();
    long s = System.nanoTime();
    for ( int i = 0; i < 10000; i++ ) {
        os.write( bytes );
    }
    os.close(); // closing also flushes the buffered data
    long e = System.nanoTime();
    System.out.println( "outputstream " + ( e - s ) / 1e6 );
}
The Java equivalent of fputs is file.write("012345678901234567890123456789"); you are calling 4 functions in Java and just 1 in C, so the delay seems obvious.
I think this is most similar to your C version. I think the direct buffers in your Java example are causing many more buffer copies than the C version. This takes about 2.2 s on my (old) box.
public static void testFileChannelSimple() throws IOException {
    RandomAccessFile raf = new RandomAccessFile(new File("/tmp/t5"),"rw");
    FileChannel c = raf.getChannel();
    byte[] bytes = "012345678901234567890123456789".getBytes();
    long s = System.currentTimeMillis();
    for(int i=0;i<10000;i++){
        // Loop body assumed: write the heap array directly (no direct buffer), then force
        c.write(ByteBuffer.wrap(bytes));
        c.force(false);
    }
    long e=System.currentTimeMillis();
    System.out.println("With flush "+(e-s));
}
While googling, I see that using java.io.File#length() can be slow.
FileChannel has a size() method that is available as well.
Is there an efficient way in java to get the file size?
Well, I tried to measure it up with the code below:
For runs = 1 and iterations = 1, the URL method is fastest most of the time, followed by channel. I ran this fresh, with some pause in between, about 10 times. So for one-time access, using the URL is the fastest way I can think of:
LENGTH sum: 10626, per Iteration: 10626.0
CHANNEL sum: 5535, per Iteration: 5535.0
URL sum: 660, per Iteration: 660.0
For runs = 5 and iterations = 50, the picture looks different.
LENGTH sum: 39496, per Iteration: 157.984
CHANNEL sum: 74261, per Iteration: 297.044
URL sum: 95534, per Iteration: 382.136
File must be caching the calls to the filesystem, while channels and URL have some overhead.
import java.io.*;
import java.net.*;
import java.util.*;
public enum FileSizeBench {
public long getResult() throws Exception {
File me = new File(FileSizeBench.class.getResource(
return me.length();
public long getResult() throws Exception {
FileInputStream fis = null;
try {
File me = new File(FileSizeBench.class.getResource(
fis = new FileInputStream(me);
return fis.getChannel().size();
} finally {
public long getResult() throws Exception {
InputStream stream = null;
try {
URL url = FileSizeBench.class
stream = url.openStream();
return stream.available();
} finally {
public abstract long getResult() throws Exception;
public static void main(String[] args) throws Exception {
    int runs = 5;
    int iterations = 50;
    EnumMap<FileSizeBench, Long> durations = new EnumMap<FileSizeBench, Long>(FileSizeBench.class);
    for (int i = 0; i < runs; i++) {
        for (FileSizeBench test : values()) {
            if (!durations.containsKey(test)) {
                durations.put(test, 0L);
            }
            long duration = testNow(test, iterations);
            durations.put(test, durations.get(test) + duration);
            // System.out.println(test + " took: " + duration + ", per iteration: " + ((double)duration / (double)iterations));
        }
    }
    for (Map.Entry<FileSizeBench, Long> entry : durations.entrySet()) {
        System.out.println(entry.getKey() + " sum: " + entry.getValue() + ", per Iteration: " + ((double)entry.getValue() / (double)(runs * iterations)));
    }
}
// Times 'iterations' calls of one test and returns the elapsed time in microseconds
private static long testNow(FileSizeBench test, int iterations)
        throws Exception {
    long result = -1;
    long before = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
        // Read into a separate variable so the comparison with the first result is meaningful
        long current = test.getResult();
        if (result == -1) {
            result = current;
        } else if (current != result) {
            throw new Exception("variance detected!");
        }
    }
    return (System.nanoTime() - before) / 1000;
}
} // end of enum FileSizeBench
The benchmark given by GHad measures lots of other stuff (such as reflection, instantiating objects, etc.) besides getting the length. If we try to get rid of these things then for one call I get the following times in microseconds:
file sum___19.0, per Iteration___19.0
raf sum___16.0, per Iteration___16.0
channel sum__273.0, per Iteration__273.0
For 100 runs and 10000 iterations I get:
file sum__1767629.0, per Iteration__1.7676290000000001
raf sum___881284.0, per Iteration__0.8812840000000001
channel sum___414286.0, per Iteration__0.414286
I ran the following modified code, passing the name of a 100 MB file as the argument.
import java.io.*;
import java.nio.channels.*;
import java.net.*;
import java.util.*;
public class FileSizeBench {
    private static File file;
    private static FileChannel channel;
    private static RandomAccessFile raf;
    public static void main(String[] args) throws Exception {
        int runs = 1;
        int iterations = 1;
        // Open everything once, outside the timed sections
        file = new File(args[0]);
        channel = new FileInputStream(args[0]).getChannel();
        raf = new RandomAccessFile(args[0], "r");
        HashMap<String, Double> times = new HashMap<String, Double>();
        times.put("file", 0.0);
        times.put("channel", 0.0);
        times.put("raf", 0.0);
        long start;
        for (int i = 0; i < runs; ++i) {
            long l = file.length();
            start = System.nanoTime();
            for (int j = 0; j < iterations; ++j)
                if (l != file.length()) throw new Exception();
            times.put("file", times.get("file") + System.nanoTime() - start);
            start = System.nanoTime();
            for (int j = 0; j < iterations; ++j)
                if (l != channel.size()) throw new Exception();
            times.put("channel", times.get("channel") + System.nanoTime() - start);
            start = System.nanoTime();
            for (int j = 0; j < iterations; ++j)
                if (l != raf.length()) throw new Exception();
            times.put("raf", times.get("raf") + System.nanoTime() - start);
        }
        // Times are accumulated in nanoseconds and printed in microseconds
        for (Map.Entry<String, Double> entry : times.entrySet()) {
            System.out.println(
                entry.getKey() + " sum: " + 1e-3 * entry.getValue() +
                ", per Iteration: " + (1e-3 * entry.getValue() / runs / iterations));
        }
    }
}
All the test cases in this post are flawed, as they access the same file for each method tested, so disk caching kicks in and tests 2 and 3 benefit from it. To prove my point, I took the test case provided by GHad and changed the order of the enumeration; the results are below.
Looking at the results, I think File.length() is really the winner.
The order of the tests is the order of the output. You can see that the time taken on my machine varied between executions, but File.length(), when it was not first and so did not incur the first disk access, won.
LENGTH sum: 1163351, per Iteration: 4653.404
CHANNEL sum: 1094598, per Iteration: 4378.392
URL sum: 739691, per Iteration: 2958.764

CHANNEL sum: 845804, per Iteration: 3383.216
URL sum: 531334, per Iteration: 2125.336
LENGTH sum: 318413, per Iteration: 1273.652

URL sum: 137368, per Iteration: 549.472
LENGTH sum: 18677, per Iteration: 74.708
CHANNEL sum: 142125, per Iteration: 568.5
When I modify your code to use a file accessed by an absolute path instead of a resource, I get a different result (for 1 run, 1 iteration, and a 100,000-byte file; times for a 10-byte file are identical to those for 100,000 bytes):
LENGTH sum: 33, per Iteration: 33.0
CHANNEL sum: 3626, per Iteration: 3626.0
URL sum: 294, per Iteration: 294.0
In response to rgrig's benchmark, the time taken to open/close the FileChannel and RandomAccessFile instances also needs to be taken into account, as these classes open a stream for reading the file.
After modifying the benchmark, I got these results for 1 iteration on an 85 MB file:
file totalTime: 48000 (48 us)
raf totalTime: 261000 (261 us)
channel totalTime: 7020000 (7 ms)
For 10000 iterations on same file:
file totalTime: 80074000 (80 ms)
raf totalTime: 295417000 (295 ms)
channel totalTime: 368239000 (368 ms)
If all you need is the file size, file.length() is the fastest way to do it. If you plan to use the file for other purposes like reading/writing, then RAF seems to be a better bet. Just don't forget to close the file connection :-)
import java.io.File;
import java.io.FileInputStream;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.Map;
public class FileSizeBench
{
    public static void main(String[] args) throws Exception
    {
        int iterations = 1;
        String fileEntry = args[0];
        Map<String, Long> times = new HashMap<String, Long>();
        times.put("file", 0L);
        times.put("channel", 0L);
        times.put("raf", 0L);
        long fileSize;
        long start;
        long end;
        File f1;
        FileChannel channel;
        RandomAccessFile raf;
        // Each timed section includes creating/opening the object, not just the size call
        for (int i = 0; i < iterations; i++)
        {
            // file.length()
            start = System.nanoTime();
            f1 = new File(fileEntry);
            fileSize = f1.length();
            end = System.nanoTime();
            times.put("file", times.get("file") + end - start);
            // channel.size()
            start = System.nanoTime();
            channel = new FileInputStream(fileEntry).getChannel();
            fileSize = channel.size();
            end = System.nanoTime();
            times.put("channel", times.get("channel") + end - start);
            // raf.length()
            start = System.nanoTime();
            raf = new RandomAccessFile(fileEntry, "r");
            fileSize = raf.length();
            end = System.nanoTime();
            times.put("raf", times.get("raf") + end - start);
        }
        for (Map.Entry<String, Long> entry : times.entrySet()) {
            System.out.println(entry.getKey() + " totalTime: " + entry.getValue() + " (" + getTime(entry.getValue()) + ")");
        }
    }
    // Formats a nanosecond count as ns, us, or ms
    public static String getTime(Long timeTaken)
    {
        if (timeTaken < 1000) {
            return timeTaken + " ns";
        } else if (timeTaken < (1000*1000)) {
            return timeTaken/1000 + " us";
        } else {
            return timeTaken/(1000*1000) + " ms";
        }
    }
}
I ran into this same issue. I needed to get the file size and modified date of 90,000 files on a network share. Using Java, and being as minimalistic as possible, it took a very long time. (I needed to get the URL from the file, and the path of the object as well, so the time varied somewhat, but it was more than an hour.) I then used a native Win32 executable to do the same task, just dumping the file path, modified date, and size to the console, and executed that from Java. The speed was amazing. The native process, together with my string handling to read the data, could process over 1000 items a second.
So even though people down-ranked the above comment, this is a valid solution, and it did solve my issue. In my case I knew ahead of time which folders I needed the sizes of, and I could pass them on the command line to my Win32 app. I went from hours to process a directory to minutes.
The issue also seemed to be Windows-specific. OS X did not have the same issue and could access network file info as fast as the OS could do so.
Java file handling on Windows is terrible. Local disk access for files is fine, though. It was just network shares that caused the terrible performance. Windows could get info on the network share and calculate the total size in under a minute, too.
If you want the file size of multiple files in a directory, use Files.walkFileTree. You can obtain the size from the BasicFileAttributes that you'll receive.
This is much faster than calling .length() on the result of File.listFiles() or using Files.size() on the result of Files.newDirectoryStream(). In my test cases it was about 100 times faster.
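A minimal sketch of that approach, summing the sizes of all regular files under a directory. The class name DirectorySize and the decision to skip unreadable entries silently are my own choices for illustration.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicLong;

public class DirectorySize {

    // Walks the tree once and adds up the sizes reported in the attributes
    // that the walker has already fetched for each file.
    public static long totalSize(Path root) throws IOException {
        final AtomicLong total = new AtomicLong();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    total.addAndGet(attrs.size());
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Assumption: unreadable entries are skipped rather than aborting the walk.
                return FileVisitResult.CONTINUE;
            }
        });
        return total.get();
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(root + " contains " + totalSize(root) + " bytes");
    }
}
```

The size and timestamps arrive together in BasicFileAttributes, so there is no separate length() or lastModified() call per file.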
Actually, I think the "ls" may be faster. There are definitely some issues in Java with getting file info. Unfortunately, there is no equivalent safe method of recursive ls for Windows. (cmd.exe's DIR /S can get confused and generate errors in infinite loops.)
On XP, accessing a server on the LAN, it takes me 5 seconds in Windows to get the count of the files in a folder (33,000) and the total size.
When I iterate recursively through this in Java, it takes me over 5 minutes. I started measuring the time it takes to do file.length(), file.lastModified(), and file.toURI(), and what I found is that 99% of my time is taken by those 3 calls. The 3 calls I actually need to do...
The difference for 1000 files is 15 ms locally versus 1800 ms on the server. The server path scanning in Java is ridiculously slow. If the native OS can be fast at scanning that same folder, why can't Java?
As a more complete test, I used WinMerge on XP to compare the modified date and size of the files on the server versus the files locally. This iterated over the entire directory tree of 33,000 files in each folder. Total time: 7 seconds. Java: over 5 minutes.
So the original statement and question from the OP is true and valid. It's less noticeable when dealing with a local file system. Doing a local compare of the folder with 33,000 items takes 3 seconds in WinMerge and 32 seconds locally in Java. So again, Java versus native is a 10x slowdown in these rudimentary tests.
Java 1.6.0_22 (latest), Gigabit LAN, and network connections, ping is less than 1ms (both in the same switch)
Java is slow.
From GHad's benchmark, there are a few issues people have mentioned:
1> As BalusC mentioned, stream.available() is flawed in this case,
because available() returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream.
So first, remove the URL approach.
2> As StuartH mentioned, the order in which the tests run also makes a cache difference, so take that out by running the tests separately.
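To illustrate the first point with something other than a file: the sketch below uses a GZIPInputStream over an in-memory array, which is my own choice of example and not part of GHad's benchmark. Its available() is documented to return 1 until end of stream, whatever the real length is. The class and method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class AvailableIsNotSize {

    // Compresses the given bytes with gzip, entirely in memory.
    public static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    // What available() reports on a freshly opened decompressing stream.
    public static int availableBeforeReading(byte[] raw) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzip(raw)))) {
            return in.available();
        }
    }

    // The real length, found the slow way by reading everything.
    public static long countBytes(byte[] raw) throws IOException {
        long total = 0;
        byte[] buf = new byte[256];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzip(raw)))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = new byte[1000];
        System.out.println("available(): " + availableBeforeReading(raw));
        System.out.println("actual size: " + countBytes(raw));
    }
}
```

For a plain local file, available() often happens to match the length, which is why the URL variant appeared to work in the benchmark; nothing in the contract guarantees it.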
Now start the test:
When the CHANNEL test is run alone:
CHANNEL sum: 59691, per Iteration: 238.764
When the LENGTH test is run alone:
LENGTH sum: 48268, per Iteration: 193.072
So it looks like LENGTH is the winner here:
public long getResult() throws Exception {
File me = new File(FileSizeBench.class.getResource(
return me.length();