Deserializing Avro is slow - Java

I am running a performance test in Java comparing several serialization formats, including Avro, Protobuf, and Thrift. The test deserializes a byte array message containing 30 long fields, 1,000,000 times.
The result for Avro is not good: Protobuf and Thrift take around 2000 milliseconds on average, but Avro takes 9000 milliseconds.
The documentation advises reusing the decoder, so I wrote the code as follows:
byte[] bytes = readFromFile("market.avro");
long begin = System.nanoTime();
DatumReader<Market> userDatumReader = new ReflectDatumReader<>(Market.class);
InputStream inputStream = new SeekableByteArrayInput(bytes);
BinaryDecoder reuse = DecoderFactory.get().binaryDecoder(inputStream, null);
Market marketReuse = new Market();
for (int i = 0; i < loopCount; i++) {
    inputStream = new SeekableByteArrayInput(bytes);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(inputStream, reuse);
    userDatumReader.read(marketReuse, decoder);
}
long end = System.nanoTime() - begin;
System.out.println("avro loop " + loopCount + " times: " + (end * 1d / 1000 / 1000));
I don't think Avro should be that slow, so I believe I am doing something wrong, but I'm not sure what. Am I using 'reuse' incorrectly?
Is there any advice for Avro performance testing? Thanks in advance.

It took me a while to figure this one out, but apparently
DecoderFactory.get().binaryDecoder is the culprit: it allocates an 8KB internal buffer every time it is invoked, and that buffer is reallocated rather than reused on each invocation. I don't see why a buffer is involved in the first place.
The saner alternative is to use DecoderFactory.get().directBinaryDecoder.
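To illustrate, here is a self-contained sketch of the directBinaryDecoder approach against the Apache Avro reflect API. The two-field Market class is a made-up stand-in for the 30-field record in the question, and the class/method names are illustrative; it requires the Avro library on the classpath.

```java
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectDatumReader;
import org.apache.avro.reflect.ReflectDatumWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class AvroDirectDecoderDemo {
    // Hypothetical stand-in for the Market record in the question (2 long fields instead of 30).
    public static class Market {
        public long bid;
        public long ask;
    }

    static byte[] encodeOnce() throws IOException {
        Market m = new Market();
        m.bid = 1;
        m.ask = 2;
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().directBinaryEncoder(baos, null);
        new ReflectDatumWriter<>(Market.class).write(m, encoder);
        encoder.flush();
        return baos.toByteArray();
    }

    static Market decodeRepeatedly(byte[] bytes, int loops) throws IOException {
        ReflectDatumReader<Market> reader = new ReflectDatumReader<>(Market.class);
        Market reuseRecord = new Market();
        BinaryDecoder reuseDecoder = null;
        for (int i = 0; i < loops; i++) {
            // directBinaryDecoder reuses the decoder instance and reads straight
            // from the stream, instead of allocating a fresh 8KB read-ahead buffer per call.
            reuseDecoder = DecoderFactory.get()
                    .directBinaryDecoder(new ByteArrayInputStream(bytes), reuseDecoder);
            reader.read(reuseRecord, reuseDecoder);
        }
        return reuseRecord;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encodeOnce();
        long begin = System.nanoTime();
        decodeRepeatedly(bytes, 1_000_000);
        System.out.println("avro direct decode ms: " + (System.nanoTime() - begin) / 1e6);
    }
}
```

Note that DecoderFactory also offers binaryDecoder(byte[], BinaryDecoder reuse), which avoids the stream wrapper entirely when the message is already a byte array.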

Related

Java BufferedImage to byte array conversion is too slow compared to other languages

I am trying to convert an image to a byte array so that I can transfer it over the network for further processing.
In C#, the following code does the job in about 2-3 milliseconds:
Image image = Image.FromFile("D:/tst.jpg");
DateTime pre = DateTime.Now;
int sz;
using (MemoryStream sourceImageStream = new MemoryStream())
{
    image.Save(sourceImageStream, System.Drawing.Imaging.ImageFormat.Jpeg);
    byte[] sourceImageData = sourceImageStream.ToArray();
    sz = sourceImageData.Count();
}
MessageBox.Show("Size " + sz + " time : " + (DateTime.Now - pre).TotalMilliseconds);
Output:
Size 268152 time : 3.0118
But doing the same in Java, as below, takes far too long:
BufferedImage image = ImageIO.read(new File("D:/tst.jpg"));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Instant pre = Instant.now();
ImageIO.write(image, "jpeg", baos);
baos.flush();
Instant now = Instant.now();
System.out.println("Size " + baos.size() + " time : " + ChronoUnit.MILLIS.between(pre, now));
Output:
Size 268167 time : 91.0
The source image is a JPG. In C#, when using PNG compression, the time was around 90 ms, so my guess is that Java is somehow recompressing the same JPG image. The image dimensions are 2048 x 1536.
Java is frustratingly slow here. How can I get rid of this problem in Java?
Take this image into consideration.
C#:
Size 1987059 time : 11.0129
Java:
Size 845093 time : 155.0
The source image is 1987059 bytes (the same as the C# encoded byte array), but in Java it is compressed to 845093 bytes. I have tried setting the compression quality to 1f, but it didn't help to reduce the time.
The main problem with this kind of testing is pointed out in the first comment: this is a micro-benchmark. If you run that code only once in Java, you'll mostly measure the time taken to initialize the runtime, class loading, and class initialization.
Here's a slightly modified version of your code (I originally wrote this as an answer to your follow-up question, now closed as a duplicate, but the same concept applies) that at least includes a warm-up phase. You'll see that there's quite a difference in the measurements. On my 2014 MacBook Pro, the output is:
Initial load time 415 ms (5)
Average warm up load time 73 ms (5)
Normal load time 65 ms (5)
As you see, the "normal" time to load an image, is a lot less than the initial time, which includes a lot of overhead.
Code:
public class TestJPEGSpeed {
    public static void main(String[] args) throws IOException {
        File input = new File(args[0]);
        test(input, 1, "Initial");
        test(input, 100, "Average warm up");
        test(input, 1, "Normal");
    }

    private static void test(File input, int runs, final String type) throws IOException {
        BufferedImage image = null;
        long start = System.currentTimeMillis();
        for (int i = 0; i < runs; i++) {
            image = ImageIO.read(input);
        }
        long stop = System.currentTimeMillis();
        System.out.println(type + " load time " + ((stop - start) / runs) + " ms (" + image.getType() + ")");
    }
}
(I also wrote a different version that took a second parameter and loaded a different file in the "normal" case, but the measurements were similar, so I left it out.)
Most likely there are still issues with this benchmark, like measuring I/O time rather than decoding time, but at least it's a little fairer.
PS: Some bonus background information. If you use an Oracle JRE, at least, the bundled JPEG plugin for ImageIO uses JNI and a natively compiled version of IJG's libjpeg (written in C). This is used for both reading and writing JPEGs. You could probably get better performance with native bindings for libjpeg-turbo. But as this is all native code, it's unlikely the performance will vary drastically from platform to platform.
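As a side note on the follow-up about setting the compression quality: the explicit javax.imageio writer API looks roughly like this. A sketch only; the JpegQuality class and toJpeg helper are made-up names, and the 0.9f quality value is an arbitrary example.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;

public class JpegQuality {
    // Encode a BufferedImage to JPEG bytes with an explicit compression quality,
    // instead of the ImageIO.write convenience method (which uses a default quality).
    static byte[] toJpeg(BufferedImage image, float quality) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality); // 0.0f = smallest file, 1.0f = best quality
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(baos)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        BufferedImage image = new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB);
        byte[] jpeg = toJpeg(image, 0.9f);
        System.out.println("encoded " + jpeg.length + " bytes");
    }
}
```

This controls file size versus quality, not encoding speed, which (as noted above) is dominated by the native codec.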

What is the least expensive hash algorithm?

I don't know much about hash algorithms.
I need to compute the hash of an incoming file live in Java, before forwarding the file to a remote system (a bit like S3) which requires a file hash in MD2/MD5/SHA-X.
This hash is not computed for security reasons, but simply as a consistency checksum.
I am able to compute the hash live while forwarding the file, with a DigestInputStream from the Java standard library, but would like to know which algorithm is best to avoid the performance problems of using a DigestInputStream.
One of my former colleagues tested this and told us that computing the hash live can be quite expensive compared to running a Unix command-line tool or hashing a file on disk.
Edit about premature optimization:
I work at a company whose goal is to help other companies dematerialize their documents.
This means we have a batch process that handles document transfers from other companies. We are targeting millions of documents per day in the future, and today the execution time of this batch is sensitive for our business.
A hashing optimization of 10 milliseconds per document, for 1 million documents per day, reduces the daily execution time by about 3 hours, which is pretty huge.
If you simply want to detect accidental corruption during transmission, etc., then a simple (non-cryptographic) checksum should be sufficient. But note that (for example) a 16-bit checksum will fail to detect random corruption one time in 2^16. And it is no guard against someone deliberately modifying the data.
The Wikipedia page on checksums lists various options, including a number of commonly used (and cheap) ones like Adler-32 and CRCs.
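Both of those cheap options ship with the JDK in java.util.zip. A small sketch (class and helper names are made up; for a stream you would call update once per buffer read instead of once on the whole array):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;

public class ChecksumDemo {
    static long crc32Of(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    static long adler32Of(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "The quick brown fox".getBytes(StandardCharsets.US_ASCII);
        // Both checksums are 32-bit values returned as a long.
        System.out.println("CRC32:   " + Long.toHexString(crc32Of(data)));
        System.out.println("Adler32: " + Long.toHexString(adler32Of(data)));
    }
}
```

Of course, if the remote system insists on MD2/MD5/SHA-X, a plain checksum won't satisfy it; this is only relevant if you control both ends.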
However, I agree with @ppeterka: this smells of "premature optimization".
I know a lot of people don't believe in micro-benchmarks, but let me post the results I got.
Input:
bigFile.txt = appx 143MB size
hashAlgorithm = MD2, MD5, SHA-1
test code:
while (true) {
    long l = System.currentTimeMillis();
    MessageDigest md = MessageDigest.getInstance(hashAlgorithm);
    try (InputStream is = new BufferedInputStream(Files.newInputStream(Paths.get("bigFile.txt")))) {
        DigestInputStream dis = new DigestInputStream(is, md);
        int b;
        while ((b = dis.read()) != -1) {
        }
    }
    byte[] digest = md.digest();
    System.out.println(System.currentTimeMillis() - l);
}
results:
MD5
------
22030
10356
9434
9310
11332
9976
9575
16076
-----
SHA-1
-----
18379
10139
10049
10071
10894
10635
11346
10342
10117
9930
-----
MD2
-----
45290
34232
34601
34319
-----
It seems that MD2 is noticeably slower than MD5 or SHA-1.
Like NKukhar, I've tried to do a micro-benchmark, but with different code and better results:
public static void main(String[] args) throws Exception {
    String bigFile = "100mbfile";

    // We put the file bytes in memory; we don't want to measure the time it takes to read from the disk
    byte[] bigArray = IOUtils.toByteArray(Files.newInputStream(Paths.get(bigFile)));
    byte[] buffer = new byte[50_000]; // the byte buffer we will use to consume the stream

    // we prepare the algos to test
    Set<String> algos = ImmutableSet.of(
        "no_hash", // no hashing
        MessageDigestAlgorithms.MD5,
        MessageDigestAlgorithms.SHA_1,
        MessageDigestAlgorithms.SHA_256,
        MessageDigestAlgorithms.SHA_384,
        MessageDigestAlgorithms.SHA_512
    );

    int executionNumber = 20;
    for (String algo : algos) {
        long totalExecutionDuration = 0;
        for (int i = 0; i < executionNumber; i++) {
            long beforeTime = System.currentTimeMillis();
            InputStream is = new ByteArrayInputStream(bigArray);
            if (!"no_hash".equals(algo)) {
                is = new DigestInputStream(is, MessageDigest.getInstance(algo));
            }
            while ((is.read(buffer)) != -1) { }
            long executionDuration = System.currentTimeMillis() - beforeTime;
            totalExecutionDuration += executionDuration;
        }
        System.out.println(algo + " -> average of " + totalExecutionDuration/executionNumber + " millies per execution");
    }
}
This produces the following output for a 100MB file on a good i7 developer machine:
no_hash -> average of 6 millies per execution
MD5 -> average of 201 millies per execution
SHA-1 -> average of 335 millies per execution
SHA-256 -> average of 576 millies per execution
SHA-384 -> average of 481 millies per execution
SHA-512 -> average of 464 millies per execution

How is reading an InputStream from a local file different from reading one over the network (via Amazon S3)?

I didn't think there was a difference between an InputStream read from a local file and one from a network source (Amazon S3 in this case), so hopefully someone can enlighten me.
These programs were run on a VM running CentOS 6.3.
The test file in both cases are 10MB.
Local file code:
InputStream is = new FileInputStream("/home/anyuser/test.jpg");
int read = 0;
int buf_size = 1024 * 1024 * 2;
byte[] buf = new byte[buf_size];
ByteArrayOutputStream baos = new ByteArrayOutputStream(buf_size);
long t3 = System.currentTimeMillis();
int i = 0;
while ((read = is.read(buf)) != -1) {
    baos.write(buf, 0, read);
    System.out.println("reading for the " + i + "th time");
    i++;
}
long t4 = System.currentTimeMillis();
System.out.println("Time to read = " + (t4-t3) + "ms");
The output of this code is this: it reads 5 times, which makes sense since the buffer size read in is 2MB and the file is 10MB.
reading for the 0th time
reading for the 1th time
reading for the 2th time
reading for the 3th time
reading for the 4th time
Time to read = 103ms
Now, we run the same code with the same 10MB test file, except this time the source is Amazon S3. We don't start reading until we have finished getting the stream from S3. However, this time the read loop runs thousands of times, when it should only read 5 times.
InputStream is;
long t1 = System.currentTimeMillis();
is = getS3().getFileFromBucket(S3Path, input);
long t2 = System.currentTimeMillis();
System.out.print("Time to get file " + input + " from S3: ");
System.out.println((t2-t1) + "ms");

int read = 0;
int buf_size = 1024*1024*2;
byte[] buf = new byte[buf_size];
ByteArrayOutputStream baos = new ByteArrayOutputStream(buf_size);
long t3 = System.currentTimeMillis();
int i = 0;
while ((read = is.read(buf)) != -1) {
    baos.write(buf, 0, read);
    if ((i % 100) == 0)
        System.out.println("reading for the " + i + "th time");
    i++;
}
long t4 = System.currentTimeMillis();
System.out.println("Time to read = " + (t4-t3) + "ms");
The output is as follows:
Time to get file test.jpg from S3: 2456ms
reading for the 0th time
reading for the 100th time
reading for the 200th time
reading for the 300th time
reading for the 400th time
reading for the 500th time
reading for the 600th time
reading for the 700th time
reading for the 800th time
reading for the 900th time
reading for the 1000th time
reading for the 1100th time
reading for the 1200th time
reading for the 1300th time
reading for the 1400th time
Time to read = 14471ms
The amount of time taken to read the stream changes from run to run: sometimes it takes 60 seconds, sometimes 15, and it never gets faster than 15 seconds. The read loop still runs 1400+ times on each test run of the program, even though I think it should only run 5 times, as in the local file example.
Is this how an InputStream works when the source is on the network, even after we have finished getting the file from the network source? Thanks in advance for your help.
I don't think it's specific to Java. When you read from the network, the actual read call to the operating system returns one packet of data at a time, no matter how big the buffer you allocated is. If you check the size of the read data (your read variable), it should show the size of the network packet used.
This is one of the reasons why people use a separate thread to read from the network, and avoid blocking by using asynchronous I/O techniques.
As @imel96 points out, there is nothing in the documentation that guarantees the behaviour you are expecting. You will never read 2MB at a time from a socket, because the socket receive buffer isn't normally that large, quite apart from other factors such as bandwidth.
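If you actually want each loop iteration to process a full buffer regardless of how the bytes trickle in, you can loop until the buffer is full yourself, or use DataInputStream.readFully, which does exactly that. A sketch; the ChunkyStream class below is a made-up stand-in for a socket that returns at most 512 bytes per read:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {
    // Made-up stand-in for a socket stream: returns at most 512 bytes per read(),
    // the way a network stream returns roughly one packet at a time.
    public static class ChunkyStream extends FilterInputStream {
        public ChunkyStream(InputStream in) { super(in); }
        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 512));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[10 * 1024 * 1024]; // pretend this is the 10MB file
        DataInputStream din = new DataInputStream(
                new ChunkyStream(new ByteArrayInputStream(source)));

        byte[] buf = new byte[2 * 1024 * 1024];
        int fullReads = 0;
        for (int i = 0; i < 5; i++) { // 5 x 2MB = 10MB
            din.readFully(buf);       // loops internally until the buffer is completely full
            fullReads++;
        }
        System.out.println("full-buffer reads: " + fullReads); // prints 5, not thousands
    }
}
```

This doesn't make the download faster (the total bytes transferred are the same); it only changes how the reads are grouped on your side.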

How do you write to disk (with flushing) in Java and maintain performance?

Using the following code as a benchmark, the system can write 10,000 rows to disk in a fraction of a second:
void withSync() {
    int f = open("/tmp/t8", O_RDWR | O_CREAT);
    lseek(f, 0, SEEK_SET);
    int records = 10*1000;
    clock_t ustart = clock();
    for (int i = 0; i < records; i++) {
        write(f, "012345678901234567890123456789", 30);
        fsync(f);
    }
    clock_t uend = clock();
    close(f);
    printf(" sync() seconds:%lf writes per second:%lf\n", ((double)(uend-ustart))/(CLOCKS_PER_SEC), ((double)records)/((double)(uend-ustart))/(CLOCKS_PER_SEC));
}
In the above code, 10,000 records can be written and flushed out to disk in a fraction of a second, output below:
sync() seconds:0.006268 writes per second:0.000002
In the Java version, it takes over 4 seconds to write 10,000 records. Is this just a limitation of Java, or am I missing something?
public void testFileChannel() throws IOException {
    RandomAccessFile raf = new RandomAccessFile(new File("/tmp/t5"), "rw");
    FileChannel c = raf.getChannel();
    c.force(true);
    ByteBuffer b = ByteBuffer.allocateDirect(64*1024);
    long s = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        b.clear();
        b.put("012345678901234567890123456789".getBytes());
        b.flip();
        c.write(b);
        c.force(false);
    }
    long e = System.currentTimeMillis();
    raf.close();
    System.out.println("With flush " + (e-s));
}
Returns this:
With flush 4263
Please help me understand what is the correct/fastest way to write records to disk in Java.
Note: I am using the RandomAccessFile class in combination with a ByteBuffer as ultimately we need random read/write access on this file.
Actually, I am surprised that test is not slower. The behavior of force is OS dependent, but broadly it forces the data to disk. If you have an SSD you might achieve 40K writes per second, but with an HDD you won't. In the C example it clearly isn't committing the data to disk, as even the fastest SSD cannot perform more than about 235K IOPS (the manufacturers guarantee it won't go faster than that :D).
If you need the data committed to disk on every write, you can expect it to be slow and entirely dependent on the speed of your hardware. If you just need the data flushed to the OS, so that you will not lose any data if the program crashes but the OS does not, you can write data without force. A faster option is to use memory-mapped files; this gives you random access without a system call for each record.
I have a library, Java Chronicle, which can read/write 5-20 million records per second with a latency of 80 ns, in text or binary formats, with random access, and can be shared between processes. It only works this fast because it does not commit the data to disk on every record, but you can test that if the JVM crashes at any point, no data written to the chronicle is lost.
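A sketch of the memory-mapped alternative mentioned above, using only standard NIO. The MappedWrite/writeRecords names and the record count are made up for illustration; note that force() is called once at the end rather than once per record:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class MappedWrite {
    static long writeRecords(File file, byte[] record, int count) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel ch = raf.getChannel()) {
            // Map a region big enough for all records; this extends the file to that size.
            MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_WRITE,
                    0, (long) record.length * count);
            for (int i = 0; i < count; i++) {
                mbb.put(record); // plain memory write: no write() system call per record
            }
            mbb.force(); // one sync to disk at the end, instead of one per record
            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("records", ".dat");
        file.deleteOnExit();
        byte[] record = "012345678901234567890123456789".getBytes(StandardCharsets.US_ASCII);
        long start = System.nanoTime();
        long length = writeRecords(file, record, 10_000);
        System.out.println("wrote " + length + " bytes in " + (System.nanoTime() - start) / 1e6 + " ms");
    }
}
```

The trade-off matches the answer's point: data sits in the page cache until the OS (or the final force) writes it out, so a power failure can still lose the tail, but a JVM crash cannot.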
This code is more similar to what you wrote in C. It takes only 5 ms on my machine, and about 60 ms if you really need to flush after every write. Your original code took about 11 seconds on this machine. BTW, closing the output stream also flushes.
public static void testFileOutputStream() throws IOException {
    OutputStream os = new BufferedOutputStream(new FileOutputStream("/tmp/fos"));
    byte[] bytes = "012345678901234567890123456789".getBytes();
    long s = System.nanoTime();
    for (int i = 0; i < 10000; i++) {
        os.write(bytes);
    }
    long e = System.nanoTime();
    os.close();
    System.out.println("outputstream " + (e - s) / 1e6);
}
The Java equivalent of fputs is file.write("012345678901234567890123456789");. You are calling 4 functions where C calls just 1, so the delay seems obvious.
I think this is most similar to your C version. I think the direct buffers in your Java example are causing many more buffer copies than the C version. This takes about 2.2 s on my (old) box.
public static void testFileChannelSimple() throws IOException {
    RandomAccessFile raf = new RandomAccessFile(new File("/tmp/t5"), "rw");
    FileChannel c = raf.getChannel();
    c.force(true);
    byte[] bytes = "012345678901234567890123456789".getBytes();
    long s = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        raf.write(bytes);
        c.force(true);
    }
    long e = System.currentTimeMillis();
    raf.close();
    System.out.println("With flush " + (e-s));
}

Parsing a text file on BlackBerry takes forever

I was originally using RIM's native XML parser methods to parse a 150k text file of approximately 5000 lines of XML; however, it was taking about 2 minutes to complete, so I tried a line-based format:
Title: Book Title
Line 1
Line 2
Line 3
I should be able to read the file in less time than it takes to blink, but it is still slow.
The identifier books is a Vector of Book objects, and lines are stored in a Vector of Strings in the Book object.
Class classs = Class.forName("com.Gui.FileLoader");
InputStream is = classs.getResourceAsStream(fileName);
int totalFileSize = IOUtilities.streamToBytes(is).length;
int totalRead = 0;

// Thought that maybe a shared input stream would be faster; in this case it's not.
SharedInputStream sis = SharedInputStream.getSharedInputStream(classs.getResourceAsStream(fileName));
LineReader lr = new LineReader(sis);
String strLine = new String(lr.readLine());
totalRead += strLine.length();
Book book = null;

// Loop over the file until EOF is reached; catch the EOF error and move on with life after that.
while (true) {
    // If the line starts with "Title:" we've got a new book; add the old book to our books vector.
    if (strLine.startsWith("Title:")) {
        if (book != null) {
            books.addElement(book);
        }
        book = new Book();
        book.setTitle(strLine.substring(strLine.indexOf(':') + 1).trim());
        strLine = new String(lr.readLine());
        totalRead += strLine.length();
        continue;
    }
    int totalComplete = (int) (((double) totalRead / (double) totalFileSize) * 100.00);
    _observer.processStatusUpdate(totalComplete, book.getTitle());
    book.addLine(strLine);
    strLine = new String(lr.readLine(), "ascii");
    totalRead += strLine.length();
}
For one thing, you're reading the file twice: once to determine the size and then again to parse it. Since you're already reading it into a byte array to determine the size, why not pass that byte array to a ByteArrayInputStream constructor? For example:
// Used to determine file size and then show in progress bar; app is threaded.
byte[] fileBytes = IOUtilities.streamToBytes(is);
int totalFileSize = fileBytes.length;
int totalRead = 0;
ByteArrayInputStream bais = new ByteArrayInputStream(fileBytes);
LineReader lr = new LineReader(bais);
This way it won't matter if the rest of the classes reading from the stream read a byte at a time: it's all in memory.
It is easy to assume that all the operations you've elided from the code sample finish in constant time. I am guessing that one of them is doing something inefficient, such as book.addLine(strLine); or perhaps _observer.processStatusUpdate(totalComplete, book.getTitle());. If those operations do not complete in constant time, you could easily have a quadratic parsing algorithm.
Just thinking about the operations is the best way to figure it out, but if you're stumped, try using the BlackBerry profiler. Run your program in the Eclipse debugger and get it to stop at a breakpoint just before parsing. Then, in Eclipse, select 'Window .. Show View .. Other .. BlackBerry .. BlackBerry Profiler View'.
Select the 'setup options' button from the profiler view toolbar (it has a blue triangle in the icon). Set 'method attribution' to cumulative, and 'what to profile' to 'time including native methods'.
Then continue your program. Once parsing is finished, pause program execution, then click on the 'method' tab of the profiler view. You should be able to determine your pain point from there.
Where does the profiler say you spend your time?
If you do not have a preferred profiler, there is jvisualvm in the Java 6 JDK.
(My guess is that you will find all the time being spent on the way down to "read a character from the file". If so, you need to buffer.)
Try using new BufferedInputStream(classs.getResourceAsStream(fileName));
EDIT:
Apparently the documentation that says they have BufferedInputStream is wrong.
I am going to leave this wrong answer here just so people have that info (doc being wrong).
