Android: Memory-friendly modification of image bytes (Java)

I'm currently making an Android App that modifies some bytes of an image. For this, I've written this code:
Bitmap bmp = BitmapFactory.decodeStream(new FileInputStream(path));
// ARGB_8888 stores 4 bytes per pixel, so size the buffer from getRowBytes()
ByteBuffer buffer = ByteBuffer.allocate(bmp.getRowBytes() * bmp.getHeight());
bmp.copyPixelsToBuffer(buffer);
return buffer.array();
The problem is that this approach uses too much heap memory and throws an OutOfMemoryError.
I know that I can request a larger heap for the app, but that doesn't seem like a good design choice.
Is there a more memory-friendly way of changing bytes of an image?

It looks like there are two copies of the pixel data on the managed heap:
The uncompressed data in the Bitmap
The copy of the data in the ByteBuffer
The memory requirement could be halved by leaving the data in the Bitmap and using getPixel() / setPixel() (or perhaps editing a row at a time with the "bulk" variants), but that adds some overhead.
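A sketch of the row-at-a-time variant (the bitmap must have been decoded as mutable for setPixels() to work, and modifyPixel() is a hypothetical per-pixel edit):
int width = bmp.getWidth();
int height = bmp.getHeight();
int[] row = new int[width];                       // one row of packed ARGB pixels
for (int y = 0; y < height; y++) {
    bmp.getPixels(row, 0, width, 0, y, width, 1); // read row y
    for (int x = 0; x < width; x++) {
        row[x] = modifyPixel(row[x]);             // hypothetical per-pixel edit
    }
    bmp.setPixels(row, 0, width, 0, y, width, 1); // write row y back
}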
Depending on the nature of the image, you may be able to use a less precise format (e.g. RGB 565 instead of 8888), halving the memory requirement.
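A sketch of requesting the smaller format at decode time (inMutable is only needed if you want to edit the pixels in place):
BitmapFactory.Options opts = new BitmapFactory.Options();
opts.inPreferredConfig = Bitmap.Config.RGB_565; // 2 bytes per pixel instead of 4
opts.inMutable = true;                          // allow in-place pixel edits
Bitmap bmp = BitmapFactory.decodeStream(new FileInputStream(path), null, opts);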
As noted in one of the comments, you could uncompress the data to a file, memory-map it with java.nio.channels.FileChannel#map(), and access it through a MappedByteBuffer. This adds a fair bit of overhead to loading and saving, and may be annoying since you have to work through a ByteBuffer rather than a byte[].
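A rough sketch of that route (the scratch-file path is illustrative):
RandomAccessFile raf = new RandomAccessFile("/sdcard/pixels.raw", "rw");
FileChannel channel = raf.getChannel();
long size = (long) bmp.getRowBytes() * bmp.getHeight();
MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
bmp.copyPixelsToBuffer(mapped); // spill the uncompressed pixels into the mapping
mapped.rewind();
// ... edit bytes with mapped.get(i) / mapped.put(i, (byte) value) ...
channel.close();
raf.close();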
Another option is expanding the heap with android:largeHeap (documented here), though in some respects you're just postponing the inevitable: you may be asked to edit an image that is too large for the "large" heap. Also, the capacity of a "large" heap varies from device to device, just as the "normal-sized" heap does. Whether or not this makes sense depends in part on how large the images you're loading are.
Before you do any of this I'd recommend using the heap analysis tools (see e.g. this blog post) to see where your memory is going. Also, look at the logcat above the out-of-memory exception; it should identify the size of the allocation that failed. Make sure it looks "reasonable", i.e. you're not inadvertently allocating significantly more than you think you are.


How much memory should I use in my Java program?

I am making a Java program.
It involves making an image with a size of up to 9933 * 14043 pixels (which is A0 size at 300 ppi). The image is 24-bit, so it would take up about 400 MB of space. The BufferedImage class somehow takes more RAM than the bitmap's actual size, so the image consumes about 600 MB of RAM.
With other data, the app would take at most about 700 MB of RAM when rendering the large image. I haven't had any problem with it so far. However, if the end user doesn't have enough free RAM, the JVM will not be able to allocate the memory for the bitmap and will throw an OutOfMemoryError.
So what should I do?
I came up with something:
Catch the error and show a prompt to the user.
Wait some time until there's enough memory. If the waiting lasts too long, show a prompt.
Write my own bitmap class, and write the image out in parts with a FileOutputStream. The .bmp format is not terribly complicated. (Actually, I have already written and optimized most of it.) By rendering the bitmap in parts, the whole image doesn't have to stay in RAM. The size of the parts can be changed dynamically according to the available memory. However, this is kind of reinventing the wheel and takes a significant amount of work. Also, the parts of the image that involve text must be drawn into a BufferedImage and then converted to my class (because I don't want to dig into the TrueType font format). Anyway, if the Java BufferedImage class works in my case, I wouldn't go down this road. (A sketch of the row-streaming idea follows below.)
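For what it's worth, here is a minimal sketch of that row-streaming approach for a 24-bit .bmp; the RowSource callback and all names are illustrative, not taken from the original code:
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class StreamingBmpWriter {
    // Hypothetical callback: the caller renders one image row (BGR byte order) on demand.
    public interface RowSource {
        void fillRow(int y, byte[] bgrRow) throws IOException;
    }

    private static void writeIntLE(OutputStream out, int v) throws IOException {
        out.write(v); out.write(v >> 8); out.write(v >> 16); out.write(v >> 24);
    }

    private static void writeShortLE(OutputStream out, int v) throws IOException {
        out.write(v); out.write(v >> 8);
    }

    public static void write(String path, int width, int height, RowSource rows)
            throws IOException {
        int rowBytes = (width * 3 + 3) & ~3;              // BMP rows are padded to 4 bytes
        int dataSize = rowBytes * height;
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(path))) {
            // BITMAPFILEHEADER (14 bytes)
            out.write('B'); out.write('M');
            writeIntLE(out, 14 + 40 + dataSize);          // total file size
            writeIntLE(out, 0);                           // reserved
            writeIntLE(out, 14 + 40);                     // offset of pixel data
            // BITMAPINFOHEADER (40 bytes)
            writeIntLE(out, 40);                          // header size
            writeIntLE(out, width);
            writeIntLE(out, height);                      // positive height = bottom-up rows
            writeShortLE(out, 1);                         // color planes
            writeShortLE(out, 24);                        // bits per pixel
            writeIntLE(out, 0);                           // BI_RGB, no compression
            writeIntLE(out, dataSize);
            writeIntLE(out, 2835); writeIntLE(out, 2835); // ~72 dpi, in pixels/meter
            writeIntLE(out, 0); writeIntLE(out, 0);       // palette fields, unused
            // Pixel data: only one row is ever in memory at a time.
            byte[] row = new byte[rowBytes];              // tail padding stays zero
            for (int y = height - 1; y >= 0; y--) {       // bottom-up storage order
                rows.fillRow(y, row);
                out.write(row);
            }
        }
    }
}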
I doubt that anyone has less than a gig of RAM nowadays, so you can check whether the user has enough memory with Runtime.getRuntime().maxMemory(), and if they don't, just show an error and close. Here's an example that uses JOptionPane in the case of an error:
long memory = Runtime.getRuntime().maxMemory(); // in bytes
long required = 700L * 1024 * 1024;             // 700MB, in bytes
if (memory < required) {
    JOptionPane.showMessageDialog(null, "You don't have enough memory. (700MB required)", "Error", JOptionPane.ERROR_MESSAGE);
    System.exit(0);
}
maxMemory() returns the maximum amount of memory the JVM will attempt to use (in bytes).

Bitmap.Config.HARDWARE vs Bitmap.Config.RGB_565

API 26 adds the new option Bitmap.Config.HARDWARE:
Special configuration, when bitmap is stored only in graphic memory. Bitmaps in this configuration are always immutable. It is optimal for cases, when the only operation with the bitmap is to draw it on a screen.
Questions that aren't explained in the docs:
1. Should we now ALWAYS prefer Bitmap.Config.HARDWARE over Bitmap.Config.RGB_565 when speed is of top priority and quality and mutability are not (e.g. for thumbnails, etc.)?
2. Does pixel data after decoding using this option actually NOT consume ANY heap memory and reside in GPU memory only? If so, this finally seems to be a relief for the OutOfMemoryError concern when working with images.
3. What quality should we expect from this option compared to RGB_565, RGBA_F16 or ARGB_8888?
4. Is the speed of decoding itself the same/better/worse compared to decoding with RGB_565?
5. (Thanks @CommonsWare for pointing to it in the comments.) What would happen if we exceed GPU memory when decoding an image using this option? Would some exception be thrown (maybe the same OutOfMemoryError)?
Documentation and public source code have not been pushed to Google's git yet, so my research is based only on partial information, some experiments, and my own experience porting JVMs to various devices.
My test created a large mutable Bitmap and copied it into a new HARDWARE Bitmap on a button click, adding it to a list of bitmaps. I managed to create several instances of the large bitmap before it crashed.
I was able to find this in the android-o-preview-4 git push:
+struct AHardwareBuffer;
+#ifdef EGL_EGLEXT_PROTOTYPES
+EGLAPI EGLClientBuffer eglGetNativeClientBufferANDROID (const struct AHardwareBuffer *buffer);
+#else
+typedef EGLClientBuffer (EGLAPIENTRYP PFNEGLGETNATIVECLIENTBUFFERANDROID) (const struct AHardwareBuffer *buffer);
And looking at the documentation of AHardwareBuffer: under the hood it creates an EGLClientBuffer backed by an ANativeWindowBuffer (native graphic buffer) in Android shared memory ("ashmem"). But the actual implementation may vary across hardware.
So as to the questions:
Should we now ALWAYS prefer Bitmap.Config.HARDWARE over Bitmap.Config.RGB_565...?
For SDK >= 26, the HARDWARE configuration can improve low-level bitmap drawing by removing the need to copy the pixel data to the GPU every time the same bitmap returns to the screen. I guess it can prevent dropping some frames when a bitmap is added to the screen.
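For reference, requesting it at decode time looks like this (API 26+):
BitmapFactory.Options opts = new BitmapFactory.Options();
opts.inPreferredConfig = Bitmap.Config.HARDWARE;   // pixels live in graphics memory
Bitmap bmp = BitmapFactory.decodeFile(path, opts); // resulting bitmap is immutable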
The memory is not counted against your app, and my test confirmed this.
The native library docs say it will return null if memory allocation was unsuccessful.
Without the source code, it is not clear what the Java implementation (the API implementors) will do in this case - it might decide to throw an OutOfMemoryError or fall back to a different type of allocation.
Update: experiments reveal that no OutOfMemoryError is thrown. While the allocation is successful, everything works fine. Upon a failed allocation, the emulator crashed (just gone). On other occasions I got a weird NullPointerException when allocating a Bitmap in app memory.
Due to the unpredictable stability, I would not recommend using this new API in production currently. At least not without extensive testing.
Does pixel data after decoding using this option actually NOT consume ANY heap memory and reside in GPU memory only? If so, this finally seems to be a relief for the OutOfMemoryError concern when working with images.
Pixel data will be in shared memory (probably texture memory), but there will still be a small Bitmap object on the Java heap referencing it (so "ANY" is inaccurate).
Every vendor can decide to implement the actual allocation differently; it's not a public API they are bound to.
So OutOfMemoryError may still be an issue. I'm not sure how it can be handled correctly.
What quality compared to RGB_565/ARGB_8888?
The HARDWARE flag is not about quality, but about pixel storage location. Since the configuration flags cannot be OR-ed together, I suppose the default (ARGB_8888) is used for the decoding.
(Actually, the HARDWARE enum value seems like a hack to me.)
Is speed of decoding itself the same/better/worse...?
The HARDWARE flag seems unrelated to decoding itself, so expect the same speed as with ARGB_8888.
What would happen if we exceed GPU memory?
My tests resulted in very bad things when memory runs out.
The emulator sometimes crashed horribly, and on other occasions I got unexpected, unrelated NPEs. No OutOfMemoryError occurred, and there was also no way to tell when GPU memory was running out, so there is no way to foresee this.

Why use BitmapFactory.Options.inTempStorage?

What are the intended use cases for the BitmapFactory.Options.inTempStorage option?
Documentation is pretty terse on this:
Temp storage to use for decoding. Suggest 16K or so.
If I'm not mistaken, it means that if you don't provide the buffer explicitly, one will be created and used internally.
So the only benefit I see is reusing the same 16K buffer across multiple decodes, which seems to have a rather questionable impact on performance/memory usage.
So why do the SDK authors give us control over the temp storage for decoding? Would providing a much greater buffer improve decoding performance?
Can someone expand on this?
It seems that your assumption is correct - this option is mainly for recycling the buffer itself.
From the Android Source Code:
// pass some temp storage down to the native code. 1024 is made up,
// but should be large enough to avoid too many small calls back
// into is.read(...) This number is not related to the value passed
// to mark(...) above.
byte [] tempStorage = null;
if (opts != null) tempStorage = opts.inTempStorage;
if (tempStorage == null) tempStorage = new byte[16 * 1024];
This means that if you do not pass in this buffer, one will be allocated for you. It does not look like a worthwhile optimization for most cases, but if you load many small images, the allocation of a 16K buffer per image might be pricey.
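A sketch of that recycling pattern (thumbnailFiles is an illustrative list of small images):
byte[] tempStorage = new byte[16 * 1024]; // allocated once, reused by every decode
BitmapFactory.Options opts = new BitmapFactory.Options();
opts.inTempStorage = tempStorage;
for (File f : thumbnailFiles) {
    Bitmap thumb = BitmapFactory.decodeFile(f.getAbsolutePath(), opts);
    // ... use thumb ...
}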
Regarding the buffer size: as you can see from the comments in the code, there is no magic number. What happens is that the native code that decodes the image uses the managed InputStream to fetch the actual raw bytes (from disk, network, etc.). It uses the allocated buffer to pass the bytes along on each read call. So it really depends on the InputStream. For example, a disk InputStream might read from the disk in chunks of 4K, in which case 16K is more than enough - passing in a bigger buffer will not improve performance, since the buffer will not fill up with more than 4K on each read call.
In any case, this kind of optimization should be considered only for really specific cases - if you have such a case, you can provide a bigger buffer and see if it has any effect on performance.

Java Heap Hard Drive

I have been working on a Java program that generates fractal orbits for quite some time now. Much like photographs, the larger the image, the better it will be when scaled down. The program uses a 2D object (Point) array, which is written to when a point's value is calculated. That is to say, the Point is stored in its corresponding cell, i.e.:
Point p = new Point(25,30);
histogram[25][30] = p;
Of course, this is edited for simplicity. I could just write the point values to a CSV and apply them to the raster later, but similar methods have yielded undesirable results. I tried for quite some time because I enjoyed being able to make larger images with the space freed by not having this array. It just won't work. For clarity, I'd like to add that the Point object also stores color data.
The next problem is the WritableRaster, which will have the same dimensions as the array. Combined, the two take up a great deal of memory. I have come to accept this, after trying to change the way it is done several times, each time with lower quality results.
After trying to optimize for memory and time, I've come to the conclusion that I'm really limited by RAM. This is what I would like to change. I am aware of the -Xmx switch (set to 10GB). Is there any way to use Windows' virtual memory to store the raster and/or the array? I am well aware of the significant performance hit this will cause, but in lieu of lowering quality, there really doesn't seem to be much choice.
The OS is already turning hard drive space into RAM for you and every other process (paging) - no magic needed. But this will be more of a performance disaster than you think; it will be so slow as to effectively not work.
Are you looking for memory-mapped files?
http://docs.oracle.com/javase/6/docs/api/java/nio/MappedByteBuffer.html
If this is really to be done in memory, I would bet that you could dramatically lower your memory usage with some optimization. For example, your Point object is mostly overhead and not data. Count up the bytes needed for the reference, then for the Object overhead, compared to two ints.
You could reduce the overhead to nothing with two big parallel int arrays for your x and y coordinates. Of course you'd have to encapsulate this for access in your code. But it could halve your memory usage for this data structure. Millions fewer objects also speeds up GC runs.
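A sketch of that parallel-array layout (class and method names are illustrative):
public class PointStore {
    private final int[] xs, ys, colors; // colors packed as ARGB ints

    public PointStore(int capacity) {
        xs = new int[capacity];
        ys = new int[capacity];
        colors = new int[capacity];
    }

    public void set(int i, int x, int y, int argb) {
        xs[i] = x; ys[i] = y; colors[i] = argb;
    }

    public int x(int i)     { return xs[i]; }
    public int y(int i)     { return ys[i]; }
    public int color(int i) { return colors[i]; }
}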
Instead of putting a WritableRaster in memory, consider writing out the image file in some simple image format directly, yourself. BMP can be very simple. Then perhaps using an external tool to efficiently convert it.
Try -XX:+UseCompressedOops to reduce object overhead too. Also try -XX:NewRatio=20 or higher to make the JVM reserve almost all its heap for long-lived objects. This can actually let you use more heap.
It is not recommended to configure your JVM memory parameters (-Xmx) so that the operating system has to allocate from its swap space. Apparently the garbage collection mechanism needs random access to heap memory, and if it doesn't have it, the program will thrash for a long time and possibly lock up. Please check the answer already given to my question (last paragraph):
does large value for -Xmx postpone Garbage Collection

Optimal data structure for a large number of (float) values

I'm developing a visualization app for Android (including older devices running Android 2.2).
The input model of my app contains an area, which typically consists of tens of thousands of vertices. Typical models have 50000-100000 vertices (each with x, y, z float coords), i.e. they use up 600 KB to 1.2 MB of memory in total. The app requires all vertices to be available in memory at any time. This is all I can share about this app (I'm not allowed to share high-level use cases), so I'm wondering if my conclusions below are correct and whether there is a better solution.
For example, assume there are count=50000 vertices. I see two solutions:
1.) My earlier solution was using my own VertexObj (better readability due to encapsulation, better locality when accessing individual coordinates):
public static class VertexObj {
    public float x, y, z;
}
VertexObj[] mVertices = new VertexObj[count]; // 50,000 objects
2.) My other idea is using a large float[] instead:
float[] mVertices = new float[count * 3]; // 150,000 float values
The problem with the first solution is the big memory overhead -- we are on a mobile device where the app's heap might be limited to 16-24MB (and my app needs memory for other things too). According to the official Android pages, object allocation should be avoided when it is not truly necessary. In this case, the memory overhead can be huge even for 50,000 vertices:
First of all, the "useful" memory is 50000*3*4 = 600 KB (used by the float values themselves). Then we have +200K of overhead for the VertexObj references, and probably another +400K for Java object headers (they're probably at least 8 bytes per object on Android, too). This is 600K of "wasted" memory for 50,000 vertices, which is 100% overhead (!). In the case of 100,000 vertices, the overhead is 1.2 MB.
The second solution is much better, as it requires only the useful 600K for float values.
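Encapsulated access to the flat array could look like this (a sketch; the class name is illustrative):
public class VertexBuffer {
    private final float[] data; // x0, y0, z0, x1, y1, z1, ...

    public VertexBuffer(int count) { data = new float[count * 3]; }

    public float x(int i) { return data[i * 3]; }
    public float y(int i) { return data[i * 3 + 1]; }
    public float z(int i) { return data[i * 3 + 2]; }

    public void set(int i, float x, float y, float z) {
        data[i * 3] = x;
        data[i * 3 + 1] = y;
        data[i * 3 + 2] = z;
    }
}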
Apparently, the conclusion is that I should go with float[], but I would like to know the risks in this case. Note that my doubts might be related to lower-level (not strictly Android-specific) aspects of memory management as well.
As far as I know, when I write new float[300000], the app asks the VM to reserve a contiguous block of 300000*4 = 1200K bytes. (It happened to me on Android that I requested a 1MB byte[] and got an OutOfMemoryError, even though the Dalvik heap had much more than 1MB free. I suppose this was because it could not reserve a contiguous block of 1MB.)
Since the GC of Android's VM is not a compacting GC, I'm afraid that if the memory is "fragmented", such a huge float[] allocation may result in an OOM. If I'm right here, then this risk should be handled. E.g. what about allocating multiple float[] objects, each storing a portion (say 200 KB)? Such linked-list memory management mechanisms are used by operating systems and VMs, so it sounds unusual to me that I would need to use them here (at the application level). What am I missing?
If nothing, then I guess that the best solution is using a linked list of float[] objects (to avoid OOM but keep overhead small)?
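For concreteness, a sketch of what such a chunked store could look like (all names are illustrative):
public class ChunkedVertexBuffer {
    private static final int CHUNK = 50 * 1024; // floats per chunk, ~200 KB each
    private final float[][] chunks;

    public ChunkedVertexBuffer(int count) {
        int total = count * 3;                  // x, y, z per vertex
        chunks = new float[(total + CHUNK - 1) / CHUNK][];
        for (int i = 0; i < chunks.length; i++) {
            chunks[i] = new float[Math.min(CHUNK, total - i * CHUNK)];
        }
    }

    // Index by flat float position: vertex * 3 + component (0=x, 1=y, 2=z).
    public float get(int index) {
        return chunks[index / CHUNK][index % CHUNK];
    }

    public void set(int index, float v) {
        chunks[index / CHUNK][index % CHUNK] = v;
    }
}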
The out-of-memory error you are facing while allocating the float array is quite strange.
If the biggest contiguous free block in the heap is smaller than the memory required by the float array, the heap grows in order to accommodate the required memory.
Of course, this would fail if the heap has already reached the maximum available to your application. That would mean your application has exhausted the heap and then released a significant number of objects, resulting in memory fragmentation with no more heap to allocate. However, if this is the case, and assuming the fragmented free memory is enough to hold the float array (otherwise your application wouldn't run anyway), it's just a matter of allocation order.
If you allocate the memory required for the float array during application startup, you have plenty of contiguous memory for it. Then you just let your application do the remaining work, as the contiguous block is already allocated.
You can easily check the memory blocks being allocated (and the free ones) using DDMS in Eclipse: select your app and press the Update Heap button.
Just for the sake of avoiding misleading you, I tested this before posting, allocating several contiguous memory blocks of float[300000].
Regards.
I actually ran into a problem where I wanted to embed data for a test case. You'll have quite a fun time embedding huge arrays, because Eclipse kept complaining when the method exceeded something like 65,535 bytes due to me declaring an array like that inline. However, this is actually a quite common approach.
The rest comes down to optimization. The big question is this: would it be worth the trouble doing all of that optimizing? If you aren't hard up on RAM, you should be fine using 1.2 MB. There's also a chance that Java will whine if you have an array that large, but you can do things like use a fancier data structure such as a LinkedList, or chop up the array into smaller ones. For statically set data that you are reading like crazy, I feel an array is a good choice.
I know you can make .xml resource files for integers, so storing the values as integers with a tactic like multiplying each value before storing, reading it in, and then dividing it back would be another option. You can also put text files into your assets folder. Just read them once in the application and you can use the data however you like.
As for double vs. float, I feel that in your case - a math or science case - doubles would be safer if you can pull them off. If you do any math, you'll have less chance of error with double, especially with an operation like multiplication. Floats are usually faster, though. I'm not sure if Java does SIMD packing, but if it does, more floats can be packed into a SIMD register than doubles.
