Java MappedByteBuffer.get() surprisingly slow


I'm trying to understand whether the performance I'm getting from the get() method of the MappedByteBuffer class is normal. My code is the following:
private byte[] testBuffer = new byte[4194304];
private File sdcardDir, filepath;
private FileInputStream inputStream;
private FileChannel fileChannel;
private MappedByteBuffer mappedByteBuffer;
// Obtain the root folder of the external storage
sdcardDir = Environment.getExternalStorageDirectory();
// Create the reference to the file to be read
filepath = new File(sdcardDir, "largetest.avi");
inputStream = new FileInputStream(filepath);
fileChannel = inputStream.getChannel();
mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, (4194304));
Log.d("GFXUnpack", "Starting to read");
mappedByteBuffer.position(0);
mappedByteBuffer.get(testBuffer, 0, (4194304));
Log.d("GFXUnpack", "Ended to read");
mappedByteBuffer.rewind();
Since I'm a beginner and I need the fastest way to read data from the SD card, I looked for documentation and found that file mapping is considered, in many cases, the fastest approach to reading a file. But when I run the above code, although the buffer is filled correctly, the performance is so slow (or maybe not? You decide!) that reading those 4194304 bytes takes almost 5 seconds, which is less than 1 MB per second. I'm using Eclipse directly connected to my Optimus Dual smartphone. The read takes the same time even if I put it in a loop (I thought the initialization overhead might disappear over multiple reads, but that is not the case).
This relation between file size and time doesn't change if I make the file smaller or larger: 8 MB is read in almost 9 seconds, 2 MB in 2 seconds, and so on.
I've read that even a slow SD card can be read at a speed of at least 5 MB per second.
Note that 4194304 is a power of 2, since I've read that this would increase performance.
Please tell me your opinion: is 1 MB per second the actual performance on a modern smartphone, or is there something wrong with my code? Thank you.

It's worth noting that in the HotSpot JVM, MappedByteBuffer.get() uses an intrinsic rather than a native call. When copying large blocks of data it copies multiple bytes at a time, e.g. 8 bytes, or more with MMX instructions.
AFAIK, Android doesn't do this, which makes this call much more expensive.

I can't see anything wrong with your code. It is probably just the speed of the device and / or the file system implementation. As Tom Hawtin puts it "[m]emory mapped I/O will not make your disks run faster".
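One way to check whether the mapping or the storage itself is the limit is to time the same read through both paths. Below is a minimal sketch in plain Java (not Android-specific). The class name ReadTiming and the command-line path argument are hypothetical, and the numbers it prints depend entirely on the device and file system.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class ReadTiming {

    // Reads the first `length` bytes through a read-only memory mapping.
    static byte[] readMapped(Path file, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, length);
            byte[] out = new byte[length];
            mapped.get(out, 0, length); // bulk get, as in the question
            return out;
        }
    }

    // Reads the first `length` bytes with ordinary stream reads.
    static byte[] readStream(Path file, int length) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] out = new byte[length];
            int filled = 0;
            while (filled < length) {
                // read() may return fewer bytes than requested, so loop until full
                int n = in.read(out, filled, length - filled);
                if (n == -1) {
                    throw new IOException("File is shorter than " + length + " bytes");
                }
                filled += n;
            }
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]); // hypothetical: file path given on the command line
        int length = 4194304;           // same size as in the question

        long t0 = System.nanoTime();
        readMapped(file, length);
        long t1 = System.nanoTime();
        readStream(file, length);
        long t2 = System.nanoTime();

        System.out.println("mapped: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("stream: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Whichever read runs second may benefit from the operating system's file cache, so it is worth swapping the order and repeating the run before drawing conclusions.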

Related

Buffered-Input Stream [duplicate]

Let me preface this post with a single caution. I am a total beginner when it comes to Java. I have been programming PHP on and off for a while, but I was ready to make a desktop application, so I decided to go with Java for various reasons.
The application I am working on is in the beginning stages (less than 5 classes) and I need to read bytes from a local file. Typically, the files are currently less than 512kB (but may get larger in the future). Currently, I am using a FileInputStream to read the file into three byte arrays, which perfectly satisfies my requirements. However, I have seen a BufferedInputStream mentioned, and was wondering if the way I am currently doing this is best, or if I should use a BufferedInputStream as well.
I have done some research and have read a few questions here on Stack Overflow, but I am still having trouble understanding when to use and when not to use the BufferedInputStream. In my situation, the first array I read bytes into is only a few bytes (less than 20). If the data I receive in these bytes is good, then I read the rest of the file into two more byte arrays of varying size.
I have also heard many people mention profiling to see which is more efficient in each specific case, however, I have no profiling experience and I'm not really sure where to start. I would love some suggestions on this as well.
I'm sorry for such a long post, but I really want to learn and understand the best way to do these things. I always have a bad habit of second guessing my decisions, so I would love some feedback. Thanks!

If you are consistently doing small reads then a BufferedInputStream will give you significantly better performance. Each read request on an unbuffered stream typically results in a system call to the operating system to read the requested number of bytes. The overhead of a system call may be thousands of machine instructions per syscall. A buffered stream reduces this by doing one large read of (say) up to 8k bytes into an internal buffer, and then handing out bytes from that buffer. This can drastically reduce the number of system calls.
However, if you are consistently doing large reads (e.g. 8k or more) then a BufferedInputStream slows things a bit. You typically don't reduce the number of syscalls, and the buffering introduces an extra data copying step.
In your use-case (where you read a 20 byte chunk first then lots of large chunks) I'd say that using a BufferedInputStream is more likely to reduce performance than increase it. But ultimately, it depends on the actual read patterns.
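To make the difference in call counts visible, here is a small sketch that counts how many read requests actually reach the underlying stream, with and without a BufferedInputStream. The names ReadCallCounter and CountingInputStream are made up for this illustration, and an in-memory ByteArrayInputStream stands in for a file, so it shows call counts only, not real system-call cost.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadCallCounter {

    // Wrapper that counts how often the wrapped stream is asked for data.
    static final class CountingInputStream extends FilterInputStream {
        long calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Consumes the data one byte at a time and returns the number of
    // read requests that reached the counting stream.
    static long underlyingCalls(byte[] data, boolean buffered) throws IOException {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(data));
        try (InputStream in = buffered ? new BufferedInputStream(source) : source) {
            while (in.read() != -1) {
                // nothing to do, we only care about the call count
            }
        }
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64 * 1024];
        System.out.println("unbuffered calls: " + underlyingCalls(data, false));
        System.out.println("buffered calls:   " + underlyingCalls(data, true));
    }
}
```

With single-byte reads the unbuffered count grows with the data size, while the buffered count grows only with the number of buffer refills. If the loop read 8k or more at a time instead, the two counts would be close and the buffer would only add a copy.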

If you are using relatively large arrays to read the data a chunk at a time, then BufferedInputStream will just introduce a wasteful copy. (Remember, read does not necessarily fill the whole array; you might want DataInputStream.readFully.) Where BufferedInputStream wins is when making lots of small reads.
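As a rough illustration of the readFully point, the sketch below reads a short header and then a body, each into an array that is guaranteed to be filled. The class name HeaderThenBody and the 20/10 byte sizes are assumptions chosen to resemble the question, not the asker's actual format.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class HeaderThenBody {

    // Reads exactly headerLength bytes, then exactly bodyLength bytes.
    // Returns a two-element array: { header, body }.
    static byte[][] read(InputStream in, int headerLength, int bodyLength) throws IOException {
        DataInputStream data = new DataInputStream(in);

        byte[] header = new byte[headerLength];
        data.readFully(header); // throws EOFException if the stream ends early

        byte[] body = new byte[bodyLength];
        data.readFully(body);

        return new byte[][] { header, body };
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a file: 20 header bytes followed by 10 body bytes.
        byte[] file = new byte[30];
        for (int i = 0; i < file.length; i++) {
            file[i] = (byte) i;
        }

        byte[][] parts = read(new ByteArrayInputStream(file), 20, 10);
        System.out.println("header bytes: " + parts[0].length);
        System.out.println("body bytes:   " + parts[1].length);
    }
}
```

A plain in.read(header) could legally return after filling only part of the array; readFully keeps reading until the array is full or the stream ends.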

BufferedInputStream reads more of the file than you need in advance. As I understand it, it does more work up front: one big continuous disk read instead of many small reads in a tight loop.
As far as profiling - I like the profiler that's built into netbeans. It's really easy to get started with. :-)

I can't speak to the profiling, but from my experience developing Java applications I find that when I use any of the buffer classes (BufferedInputStream, StringBuffer) my applications are much faster. Because of that, I use them even for the smallest files or string operations.

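import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Renamed so the class no longer shadows java.io.BufferedInputStream
class BufferedInputStreamDemo
{
    public static void main(String[] args) throws IOException
    {
        // try-with-resources closes both streams when the block exits
        try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream("abc.txt")))
        {
            // available() is only an estimate of the remaining bytes
            int size = bis.available();
            // Mark the start of the stream; the +1 keeps the mark valid
            // after reading up to the end of the file
            bis.mark(size + 1);
            int x;
            // First pass: print every byte as a character
            while ((x = bis.read()) != -1)
            {
                System.out.print((char) x);
            }
            // Return to the marked position and read the content again
            bis.reset();
            while ((x = bis.read()) != -1)
            {
                System.out.print((char) x);
            }
        }
    }
}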

Efficiently sharing data between processes in different languages

Context
I am writing a Java program that communicates with a C# program through standard in and standard out. The C# program is started as a child process. It gets "requests" through stdin and sends "responses" through stdout. The requests are very lightweight (a few bytes in size), but the responses are large. In a normal run of the program, the responses amount to about 2GB of data.
I am looking for ways to improve performance, and my measurements indicate that writing to stdout is a bottleneck. Here are the numbers from a normal run:
Total time: 195 seconds
Data transferred through stdout: 2026MB
Time spent writing to stdout: 85 seconds
stdout throughput: 23.8 MB/s
By the way, I am writing all the bytes to an in-memory buffer first, and copying them in one go to stdout to make sure I only measure stdout write time.
Question
What is an efficient and elegant way to share data between the C# child process and the Java parent process? It is clear that stdout is not going to be enough.
I have read here and there about sharing memory through memory mapped files, but the Java and .NET APIs give me the impression that I'm looking in the wrong place.

Before you invest more in memory mapped files or named pipes, I would first check whether you actually read and write efficiently. java.lang.Process.getInputStream() uses a BufferedInputStream, so the reader side should be OK. But in your C# program you will most likely use Console.Write. The problem here is that AutoFlush is enabled by default, so every single write explicitly flushes the stream. I wrote my last C# code years ago, so I'm not up to date, but maybe it is possible to set the AutoFlush property of Console.Out to false and flush the stream manually after multiple writes.
If disabling AutoFlush is not possible, the only way to improve performance with Console.Out would be to write more text with a single write.
Another potential bottleneck may be a shell in between that has to interpret the written data. Ensure that you execute the C# program directly and not through a script or by calling the command executor.
Before you start using memory mapped files I would first try to simply write into a file. As long as you have enough free memory that is not used by your programs or others and as long as there are no other programs with frequent disk access the operating system will be able to hold quite a big amount of written data within the file system cache. As long as your Java program reads fast enough from file while your C# program is writing to the file chances are high that only some or even no data has to be loaded from disk.

As Matthew Watson mentioned in the comments, it is indeed possible and incredibly fast to use a memory mapped file. In fact, the throughput for my program went from 24 MB/s to 180 MB/s. Below is the gist of it.
The following Java code creates the memory mapped file used for communication and opens a buffer we can read from:
var path = Paths.get("test.mmap");
var channel = FileChannel.open(path, StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE);
var mappedByteBuffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 200_000 * 8);
The following C# code opens the memory mapped file and creates a stream that you can use to write bytes to it (note that buffer is the name of the array of bytes to be written):
// This code assumes the file has already been created on the Java side
var file = File.Open("test.mmap", FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite);
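// fileName is passed as the mapName argument; it is not defined in this snippet (mapName may also be null)
var memoryMappedFile = MemoryMappedFile.CreateFromFile(file, fileName, 0, MemoryMappedFileAccess.ReadWrite, HandleInheritability.None, false);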
var stream = memoryMappedFile.CreateViewStream();
stream.Write(buffer, 0, buffer.Length);
stream.Flush();
Of course, you need to somehow synchronize the Java and the C# side. For the sake of simplicity, I didn't include that in the code above. In my code, I am using standard in and standard out to signal when it is safe to read / write.
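For the Java side of the exchange, a minimal sketch of putting values into the mapped file and getting them back out might look like the following. The layout (a 4-byte count followed by 8-byte values), the little-endian byte order and the name MappedExchange are all assumptions made for this example and are not taken from the code above; both sides have to agree on whatever layout is actually used, and the signalling over stdin/stdout is left out here as well.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class MappedExchange {

    // Assumed layout: int count, then `count` longs, all little-endian.
    static void writeLongs(Path file, long[] values) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            long size = 4 + 8L * values.length;
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            // Java buffers default to big-endian; little-endian is assumed for the other side
            buffer.order(ByteOrder.LITTLE_ENDIAN);
            buffer.putInt(values.length);
            for (long value : values) {
                buffer.putLong(value);
            }
            buffer.force(); // push the changes out to the file
        }
    }

    static long[] readLongs(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            buffer.order(ByteOrder.LITTLE_ENDIAN);
            long[] values = new long[buffer.getInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = buffer.getLong();
            }
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("exchange", ".mmap"); // throwaway file for the demo
        writeLongs(file, new long[] { 1L, 2L, 3L });
        System.out.println(Arrays.toString(readLongs(file)));
    }
}
```

In the real setup the writer and the reader are different processes, so the reader must not touch the buffer until the writer has signalled that it is done.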

Why to use BitmapFactory.Options.inTempStorage?

What are intended use cases for the BitmapFactory.Options.inTempStorage option?
Documentation is pretty terse on this:
Temp storage to use for decoding. Suggest 16K or so.
If I'm not mistaken, it means that if you don't provide the buffer explicitly, the decoder creates and uses one by itself.
So the only benefit I see is reusing the same 16K buffer for multiple decodings, which seems to have a questionable impact on performance or memory usage.
So why do the SDK authors give us control over the temp storage for decoding? Would providing a much larger buffer improve decoding performance?
Can someone expand on this?

It seems that your assumption is the correct one - this option is mainly for recycling the buffer itself.
From the Android Source Code:
// pass some temp storage down to the native code. 1024 is made up,
// but should be large enough to avoid too many small calls back
// into is.read(...) This number is not related to the value passed
// to mark(...) above.
byte [] tempStorage = null;
if (opts != null) tempStorage = opts.inTempStorage;
if (tempStorage == null) tempStorage = new byte[16 * 1024];
This means that if you do not pass in this buffer, it will be allocated for you. Though this does not look like an optimization for most cases, if you load many small images, the allocation of a 16K buffer per image might be pricey.
Regarding the buffer size, as you can see from the comments in the code, there is no magic number. What happens is that the native code that decodes the image uses the managed InputStream code to fetch the actual raw bytes (from disk, network, etc.). It uses the allocated buffer to transfer the bytes for each read call. So it really depends on the InputStream. For example, a disk InputStream might read from the disk in chunks of 4k, and then 16k is more than enough: passing in a bigger buffer will not improve performance, since the buffer will not be filled with more than 4k on each read call.
In any case, this kind of optimization should only be considered for really specific cases. If you have such a case, you can provide a bigger buffer and see if it has any effect on performance.

Files.newInputStream creates slow InputStream

On my Windows 7 machine, Files.newInputStream returns sun.nio.ch.ChannelInputStream. When I tested its performance against FileInputStream, I was surprised to find that FileInputStream is faster.
This test
InputStream in = new FileInputStream("test");
long t0 = System.currentTimeMillis();
byte[] a = new byte[16 * 1024];
for (int n; (n = in.read(a)) != -1;) {
}
System.out.println(System.currentTimeMillis() - t0);
reads a 100 MB file in 125 ms. If I replace the first line with
InputStream in = Files.newInputStream(Paths.get("test"));
I get 320 ms.
If Files.newInputStream is slower, what advantages does it have over FileInputStream?

If you tested new FileInputStream second, you are probably just seeing the effect of cache priming by the operating system. It isn't plausible that Java is causing any significant difference to an I/O-bound process. Try it the other way around, and on a much larger dataset.
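A rough way to take the cache effect out of the comparison is to run both variants several times and alternate which one goes first. The sketch below does that; the class name StreamComparison, the number of rounds and the command-line path are arbitrary choices for illustration, and this is still a crude timing loop rather than a proper benchmark.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class StreamComparison {

    // Reads the stream to the end with a 16 KB array, as in the question.
    // Returns the number of bytes read.
    static long drain(InputStream in) throws IOException {
        byte[] a = new byte[16 * 1024];
        long total = 0;
        for (int n; (n = in.read(a)) != -1;) {
            total += n;
        }
        return total;
    }

    // Elapsed nanoseconds for a full read through FileInputStream.
    static long timeFileInputStream(Path file) throws IOException {
        long t0 = System.nanoTime();
        try (InputStream in = new FileInputStream(file.toFile())) {
            drain(in);
        }
        return System.nanoTime() - t0;
    }

    // Elapsed nanoseconds for a full read through Files.newInputStream.
    static long timeNewInputStream(Path file) throws IOException {
        long t0 = System.nanoTime();
        try (InputStream in = Files.newInputStream(file)) {
            drain(in);
        }
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]); // hypothetical: file path given on the command line
        for (int round = 0; round < 6; round++) {
            long fis;
            long nio;
            if (round % 2 == 0) {       // alternate which variant runs first
                fis = timeFileInputStream(file);
                nio = timeNewInputStream(file);
            } else {
                nio = timeNewInputStream(file);
                fis = timeFileInputStream(file);
            }
            System.out.println("round " + round
                    + ": FileInputStream " + fis / 1_000_000 + " ms"
                    + ", Files.newInputStream " + nio / 1_000_000 + " ms");
        }
    }
}
```

If the first round is much slower than the rest regardless of which variant ran first, the difference is most likely the operating system's cache rather than the stream class.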

I don't want to be the buzzkill, but the javadoc doesn't state any advantages, nor does any documentation I could find:
Opens a file, returning an input stream to read from the file. The
stream will not be buffered, and is not required to support the mark
or reset methods. The stream will be safe for access by multiple
concurrent threads. Reading commences at the beginning of the file.
Whether the returned stream is asynchronously closeable and/or
interruptible is highly file system provider specific and therefore
not specified.
I think the method is just a utility method not necessarily meant to replace or improve on FileInputStream. Note that the concurrency point might explain some slow down.

Your FileInputStream and FileOutputStream objects might introduce long GC pauses
Every time you create either a FileInputStream or a FileOutputStream, you are creating an object. Even if you close it correctly and promptly, it will be put into a special category that only gets cleaned up when the garbage collector does a full GC. Sadly, due to backwards compatibility constraints, this is not something that can be fixed in the JDK anytime soon, as there could be some code out there where somebody has extended FileInputStream / FileOutputStream and is relying on those finalize() methods to ensure the call to close().
The solution (at least if you are using Java 7 or newer) is not too hard: just switch to Files.newInputStream(...) and Files.newOutputStream(...)
https://dzone.com/articles/fileinputstream-fileoutputstream-considered-harmful

The documentation says
"The stream will not be buffered"
That is because Files.newInputStream(Path) supports non-blocking I/O.
You can try this in debug mode: you can open a non-blocking input stream and modify the file at the same time, but with FileInputStream you cannot.
FileInputStream requires a "write lock" on the file, so it can buffer the file's content, which speeds up reading.
ChannelInputStream cannot; it must guarantee that it reads the "current" content of the file.
The above is my experience; I didn't check every point against the Javadoc.

Java: How to improve reading of a 50 gigabyte file

I am reading a 50 GB file containing millions of rows separated by newline characters. Presently I am using the following code to read the file
String line = null;
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("FileName")));
while ((line = br.readLine()) != null)
{
// Processing each line here
// All processing is done in memory. No IO required here.
}
Since the file is so big, it takes 2 hours to process the whole file. Can I improve the reading of the file from the hard disk so that the I/O (reading) takes minimal time? The restriction with my code is that I have to process the lines in sequential order.

it is taking 2 Hrs to process the whole file.
50 GB / 2 hours equals approximately 7 MB/s. It's not a bad rate at all. A good (modern) hard disk should be capable of sustaining a higher rate continuously, so maybe your bottleneck is not the I/O? You're already using BufferedReader, which, as the name says, buffers (in memory) what it reads. You could experiment with creating the reader with a somewhat bigger buffer than the default size (8192 characters), like so:
BufferedReader br = new BufferedReader(
new InputStreamReader(new FileInputStream("FileName")), 100000);
Note that with the default 8192 bytes buffer and 7 MB/s throughput the BufferedReader is going to re-fill its buffer almost 1000 times per second, so lowering that number could really help cutting down some overhead. But if the processing that you're doing, instead of the I/O, is the bottleneck, then no I/O trick is going to help you much. You should maybe consider making it multi-threaded, but whether it's doable, and how, depends on what "processing" means here.
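To see how much of the two hours is reading and how much is processing, one option is a loop that only reads lines and does nothing with them, run with a few different buffer sizes. This is a sketch under assumptions: the class name ReadOnlyLoop, the UTF-8 charset and the list of buffer sizes are chosen for illustration and are not part of the original code.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class ReadOnlyLoop {

    // Reads every line and discards it; returns the number of lines read.
    static long countLines(String fileName, int bufferSize) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8), // charset is an assumption
                bufferSize)) {
            long lines = 0;
            while (br.readLine() != null) {
                lines++; // no processing here, reading only
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        String fileName = args[0]; // hypothetical: file path given on the command line
        int[] bufferSizes = { 8192, 100000, 1 << 20 };
        for (int bufferSize : bufferSizes) {
            long t0 = System.nanoTime();
            long lines = countLines(fileName, bufferSize);
            long millis = (System.nanoTime() - t0) / 1_000_000;
            System.out.println("buffer " + bufferSize + ": " + lines + " lines in " + millis + " ms");
        }
    }
}
```

If this loop alone finishes in a small fraction of the two hours, the time is going into the per-line processing and a bigger buffer will not change much.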

Your only hope is to parallelize the reading and processing of what's inside. Your strategy should be to never require the entire file contents to be in memory at once.
Start by profiling the code you have to see where the time is being spent. Rewrite the part that takes the most time and re-profile to see if it improved. Keep repeating until you get an acceptable result.
I'd think about Hadoop and a distributed solution. Data sets that are larger than yours are processed routinely now. You might need to be a bit more creative in your thinking.

Without NIO you won't be able to break the throughput barrier. For example, try using new Scanner(File) instead of directly creating readers. I recently looked at its source code; it uses NIO's file channels.
But the first thing I would suggest is to run an empty loop with BufferedReader that does nothing but read. Note the throughput, and also keep an eye on the CPU. If the loop floors the CPU, then there's definitely an issue with the I/O code.

Disable the antivirus and any other program which adds to disk contention while reading the file.
Defragment the disk.
Create a raw disk partition and read the file from there.
Read the file from an SSD.
Create a 50GB Ramdisk and read the file from there.

I think you may get the best results by re-considering the problem you're trying to solve. There's clearly a reason you're loading this 50Gig file. Consider if there isn't a better way to break the stored data down and only use the data you really need.

The way you read the file is fine. There might be ways to make it faster, but that usually requires understanding where your bottleneck is. Because the I/O throughput is actually on the lower end, I assume the computation is affecting performance. If it's not too lengthy, you could show your whole program.
Alternatively, you could run your program without the contents of the loop and see how long it takes to read through the file :)
