Fast way to compress binary data?

I have some binary data (pixel values) in an int[] (or a byte[] if you prefer) that I want to write to disk in an Android app. I want to spend only a small amount of processing time on this, but get as much compression as I can. What are my options?
In many cases the array will contain lots of consecutive zeros so something simple and fast like RLE compression would probably work well. I can't see any Android API functions for this though. If I have to loop over the array in Java, this will be very slow as there is no JIT on most Android devices. I could use the NDK but I'd rather avoid this if I can.

DeflaterOutputStream takes ~25 ms to compress 1 MB in Java. It's a native method, so a JIT should not make much difference.
Do you have a requirement which says 0.2s or 0.5s is too slow?
Can you do it in a background thread so the user doesn't notice how long it takes?
GZIP is based on the Deflater + CRC32, so it is likely to be much the same or slightly slower.
Deflater has several modes. The DEFAULT_STRATEGY is fastest in Java, but simpler compressions such as HUFFMAN_ONLY might be faster for you.
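To compare the modes on your own data, a minimal sketch along these lines may help. The class name DeflateStrategies and the test buffer are made up for illustration; only java.util.zip calls are used, and the timings it prints will vary by device.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DeflateStrategies {
    /** Compresses data with the given level and strategy (zlib-wrapped output). */
    static byte[] compress(byte[] data, int level, int strategy) {
        Deflater deflater = new Deflater(level);
        deflater.setStrategy(strategy);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release the native zlib state
        return out.toByteArray();
    }

    /** Inflates zlib-wrapped data produced by compress(). */
    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                throw new DataFormatException("truncated or unsupported input");
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical test data: 1 MB that is mostly zeros, like the pixel arrays in the question.
        byte[] pixels = new byte[1 << 20];
        for (int i = 0; i < pixels.length; i += 4096) pixels[i] = 0x7F;

        int[] strategies = {Deflater.DEFAULT_STRATEGY, Deflater.FILTERED, Deflater.HUFFMAN_ONLY};
        String[] names = {"DEFAULT_STRATEGY", "FILTERED", "HUFFMAN_ONLY"};
        for (int i = 0; i < strategies.length; i++) {
            long start = System.nanoTime();
            byte[] packed = compress(pixels, Deflater.BEST_SPEED, strategies[i]);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println(names[i] + ": " + packed.length + " bytes in " + micros + " us");
        }
    }
}
```

Measure on the target device before choosing; which strategy wins depends on the data.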

Android has Java's DeflaterOutputStream. Would that work?

Wrap a java.io.FileOutputStream (not a FileWriter, which handles character data and cannot be chained to a GZIP stream) in a
http://download.oracle.com/javase/1.4.2/docs/api/java/util/zip/GZIPOutputStream.html
and write the byte array to it.
Then, when you need to read the data back in, do the reverse: wrap a java.io.FileInputStream (not a FileReader) in a
http://download.oracle.com/javase/1.4.2/docs/api/java/util/zip/GZIPInputStream.html
Depending on the size of the file you're saving, you will see some compression; gzip is good like that. If you're not seeing much of a benefit, just write the data uncompressed through a buffered stream (that should be the fastest). If you do gzip it, buffering the underlying file streams could also speed it up a bit.
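A minimal sketch of the gzip round trip to disk, assuming byte streams (FileOutputStream/FileInputStream) rather than the character-oriented FileWriter/FileReader. The class and method names are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipFileDemo {
    /** Writes data to file as a gzip stream. */
    static void writeGzip(File file, byte[] data) throws IOException {
        try (OutputStream out = new GZIPOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.write(data);
        } // closing the GZIPOutputStream writes the gzip trailer
    }

    /** Reads a gzip file back into a byte array. */
    static byte[] readGzip(File file) throws IOException {
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
            return result.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("pixels", ".gz");
        byte[] data = new byte[1 << 20]; // mostly zeros, as in the question
        writeGzip(file, data);
        System.out.println(data.length + " bytes stored in " + file.length() + " bytes");
        System.out.println("restored " + readGzip(file).length + " bytes");
        file.delete();
    }
}
```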

I've had to solve basically the same problem on another platform and my solution was to use a modified LZW compression. First, do some difference filtering (similar to PNG) on the 32bpp image. This will turn most of the image to black if there are large areas of common color. Then use a generic GIF compression algorithm treating the filtered image as if it's 8bpp. You'll get decent compression and it works very quickly. This will need to run in native code (NDK). It's really quite easy to get native code working on Android.
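The difference-filtering step could look roughly like the sketch below. It is written in Java only to illustrate the idea (the answer above does this in native code), it assumes packed 32-bit pixels stored row by row, and it is modelled loosely on PNG's "Sub" filter; the class and method names are made up.

```java
class DeltaFilter {
    /** Replaces each pixel with the per-channel difference from its left neighbour (mod 256). */
    static int[] filter(int[] pixels, int width) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            int left = (i % width == 0) ? 0 : pixels[i - 1]; // first pixel of a row has no neighbour
            out[i] = combine(pixels[i], left, -1);
        }
        return out;
    }

    /** Reverses filter(): adds the already-restored left neighbour back to each pixel. */
    static int[] unfilter(int[] filtered, int width) {
        int[] out = new int[filtered.length];
        for (int i = 0; i < filtered.length; i++) {
            int left = (i % width == 0) ? 0 : out[i - 1];
            out[i] = combine(filtered[i], left, +1);
        }
        return out;
    }

    /** Computes a + sign * b separately for each of the four 8-bit channels. */
    private static int combine(int a, int b, int sign) {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            int channel = (((a >>> shift) & 0xFF) + sign * ((b >>> shift) & 0xFF)) & 0xFF;
            result |= channel << shift;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] row = {0xFF336699, 0xFF336699, 0xFF336699, 0xFF33669A};
        for (int value : filter(row, row.length)) {
            System.out.println(Integer.toHexString(value));
        }
    }
}
```

Areas of constant color become runs of zeros after filtering, which is what makes the following compression step effective.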

Random thought: if it's image data, try saving it as PNG. Standard Java has it, I'm sure Android will too, and it is probably optimized with native code. It has pretty good compression and it's lossless.

Related

Stream image format conversions and resizing in Java

So, let's say I want to recode some PNG to JPEG in Java. The image has an extreme resolution, say 10 000 x 10 000 px. Using the "standard" Java image API writers and readers, you need at some point to have the entire image decoded in RAM, which takes an extreme amount of space (hundreds of MB). I have been looking at how other tools do this, and I found that ImageMagick uses disk pixel storage, but this seems to be way too slow for my needs. So what I need is a true streaming recoder. By true streaming I mean reading and processing the data in chunks or bins, not just accepting a stream as input and then decoding it whole beforehand.
Now, first the theory behind it: is it even possible, given the JPEG and PNG algorithms, to do this using streams, or let's say in bins of data, so that there is no need to have the entire image decoded in memory (or other storage)? In JPEG compression, the first few stages could be done in streams, but I believe Huffman encoding needs to build the entire tree of value probabilities after quantization, so it needs to analyze the whole image. That would mean the whole image needs to be decoded beforehand, or somehow on demand by regions.
And the golden question: if the above can be achieved, is there any Java library that actually works in this way and saves a large amount of RAM?

If I create a 10,000 x 10,000 PNG file, full of incompressible noise, with ImageMagick like this:
convert -size 10000x10000 xc:gray +noise random image.png
I see ImageMagick uses 675M of RAM to create the resulting 572MB file.
I can convert it to a JPEG with vips like this:
vips im_copy image.png output.jpg
and vips uses no more than 100MB of RAM while converting, and takes 7 seconds on a reasonable spec iMac around 4 years old - albeit with SSD.

I have thought about this for a while, and I would really like to implement such a library. Unfortunately, it's not that easy. Different image formats store pixels in different ways. PNG or GIFs may be interlaced. JPEGs may be progressive (multiple scans). TIFFs are often striped or tiled. BMPs are usually stored bottom up. PSDs are channeled. Etc.
Because of this, the minimum amount of data you have to read to recode to a different format, may in worst case be the entire image (or maybe not, if the format supports random access and you can live with a lot of seeking back and forth)... Resampling (scaling) the image to a new file using the same format would probably work in most cases though (probably not so good for progressive JPEGs, unless you can resample each scan separately).
If you can live with a disk buffer as the second best option, I have created some classes that allow BufferedImages to be backed by nio MappedByteBuffers (memory-mapped file buffers, kind of like virtual memory). While performance isn't really comparable to in-memory images, it's also not entirely useless. Have a look at MappedImageFactory and MappedFileBuffer.

I've written a PNG encoder/decoder that does that (read and write progressively, which only requires to store a row in memory) for PNG format: PNGJ
I don't know if there is something similar with JPEG

Advice on replacing a block of bytes in a file at run time, when the file is read

Folks. I trust that the community will see this as a relevant question. My apologies if not and mods, please close.
I am developing a video playback app with static content for a customer. My customer wants me to implement some basic security to stop someone unpacking the deployed app (it's for Android) and simply copying the MPEGs. My customer has made basic protection a critical requirement and, he's paying the bills :)
The files are too big to decrypt on the fly, so I'm considering the following approach. I'd welcome thoughts and suggestions as to alternatives. I am aware of the arguments for and against copy protection schemes and security through obscurity, which my proposed approach uses, and my question is not "should I?".
Take a block of bytes, say 256, from somewhere in the header of the MPG. Replace those bytes with random values such that the MPEG won't play without a lot of effort to repair it. Store the original 256 bytes in one of the app's bitmaps such that the bitmap still displays properly. When playing the video, read it in through a byte stream and replace the bytes with their original values before passing them to the output stream.
In summary:
Extract 256 bytes from the header of the MPEG
Store these bytes in a bitmap
Randomise values in the original bytes
At run time, read the 256 bytes back out of the bitmap
Read MPEG through an inputstream using a byte array buffer
Replace randomised bytes with the original values
Stream the input to an outputstream which is the input to the video player.
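The run-time half of the steps above (read, restore the original bytes, pass along) could be sketched as a stream filter like this. PatchingInputStream is a made-up name, and how the original bytes are hidden in and recovered from the bitmap is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Restores a block of original bytes at a fixed offset while the stream is being read. */
class PatchingInputStream extends FilterInputStream {
    private final long patchOffset;
    private final byte[] patch;
    private long position = 0; // number of bytes consumed from the underlying stream

    PatchingInputStream(InputStream in, long patchOffset, byte[] originalBytes) {
        super(in);
        this.patchOffset = patchOffset;
        this.patch = originalBytes.clone();
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b < 0) return b;
        long index = position - patchOffset;
        position++;
        return (index >= 0 && index < patch.length) ? (patch[(int) index] & 0xFF) : b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n <= 0) return n;
        // Overwrite the part of this chunk that overlaps the patched region, if any.
        long start = Math.max(position, patchOffset);
        long end = Math.min(position + n, patchOffset + patch.length);
        for (long p = start; p < end; p++) {
            buf[off + (int) (p - position)] = patch[(int) (p - patchOffset)];
        }
        position += n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        position += skipped;
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would desynchronise the position counter
    }

    public static void main(String[] args) throws IOException {
        byte[] scrambled = {1, 2, 0, 0, 5};
        InputStream in = new PatchingInputStream(
                new ByteArrayInputStream(scrambled), 2, new byte[] {3, 4});
        for (int b = in.read(); b != -1; b = in.read()) {
            System.out.print(b + " ");
        }
        System.out.println();
    }
}
```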
I do recognise at least 2 ways to defeat this, reverse engineering and screen grabbing, but the point is to prevent the average thief simply copying my customer's content with no effort.
Thoughts folks?
Thanks

I would suggest using an encryption/decryption scheme for the entire stream:
Real time video stream decryption is the standard way to deal with this issue. Its processing overhead is negligible when compared to the actual video decoding. For example, each and every single DVD player out there supports the CSS encryption scheme.
While using Java does impose some restrictions, such as the inability to make effective use of various CPU-specific instructions, you should be able to find a decryption algorithm that is not very expensive. I would suggest profiling your application before rejecting stream encryption algorithms out of hand.
Mangling the header does make some video files hard to read, but far from impossible. Some files have redundant information, others are actually the result of straight-out concatenation which would leave any following segments readable. Some streaming video codecs actually insert enough metadata to rebuild the stream every few seconds. And there are a lot of video formats out there.
In other words there is no way to guarantee that removing any number of bytes from the start of a file would make it unreadable. I also think that imposing on your client a bunch of restrictions w.r.t. the video formats that they can use is not reasonable and limits the future usefulness of your application.
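As a rough illustration of decrypting the stream while it is read, the sketch below wraps an input stream in a javax.crypto.CipherInputStream. The choice of AES in CTR mode, the class name, and the hard-coded demo key are assumptions for the example only; where the key is kept inside the app is a separate problem that this does not address.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class StreamCryptoDemo {
    private static Cipher cipher(int mode, byte[] key, byte[] iv) throws GeneralSecurityException {
        // Assumption: AES/CTR is available from the installed security provider.
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c;
    }

    /** Encrypts the whole payload once, e.g. when packaging the app. */
    static byte[] encrypt(byte[] plain, byte[] key, byte[] iv) throws GeneralSecurityException {
        return cipher(Cipher.ENCRYPT_MODE, key, iv).doFinal(plain);
    }

    /** Wraps an encrypted stream so that reads return decrypted bytes on the fly. */
    static InputStream decrypting(InputStream encrypted, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        return new CipherInputStream(encrypted, cipher(Cipher.DECRYPT_MODE, key, iv));
    }

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // demo key of all zeros; never ship a key like this
        byte[] iv = new byte[16];
        byte[] video = "pretend this is MPEG data".getBytes("UTF-8");
        byte[] stored = encrypt(video, key, iv);
        byte[] played = readAll(decrypting(new ByteArrayInputStream(stored), key, iv));
        System.out.println(new String(played, "UTF-8"));
    }
}
```

Because the data is decrypted chunk by chunk as the player reads it, the full decrypted file never has to exist in memory or on disk.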

Java - Parallelizing Gzip

I was assigned to parallelize GZip in Java 7, and I am not sure whether it is possible.
The assignment is:
Parallelize gzip using a given number of threads
Each thread takes a 1024 KiB block, using the last 32 KiB of the previous block as a dictionary. There is an option to use no dictionary
Read from stdin and write to stdout
What I have tried:
I have tried using GZIPOutputStream, but there doesn't seem to be a way to isolate and parallelize the deflate(), nor can I access the deflater to alter the dictionary. I tried extending GZIPOutputStream, but it didn't act as I wanted, since I still couldn't isolate the compress/deflate.
I tried using Deflater with wrap enabled and a FilterOutputStream to output the compressed bytes, but I wasn't able to get it to compress properly in GZip format. I made it so each thread had a compressor that writes to a byte array, which is then written to the OutputStream.
I am not sure if I implemented my approaches wrongly or took the wrong approaches completely. Can anyone point me in the right direction for which classes to use for this project?

Yep, zipping a file with dictionary can't be parallelized, as everything depends on everything. Maybe your teacher asked you to parallelize the individual gzipping of multiple files in a folder? That would be a great example of parallelized work.

To make a process concurrent, you need to have portions of code which can run concurrently and independently. Most compression algorithms are designed to be run sequentially, where every byte depends on every byte that comes before.
The only way to do compression concurrently is to change the algorithm (making it incompatible with existing approaches).

I think you can do it by inserting appropriate resets in the compression stream. The idea is that the underlying compression engine used in gzip allows the deflater to be reset, with an aim that it makes it easier to recover from stream corruption, though at a cost of making the compression ratio worse. After reset, the deflater will be in a known state and thus you could in fact start from that state (which is independent of the content being compressed) in multiple threads (and from many locations in the input data, of course) produce a compressed chunk and include the data produced when doing the following reset so that it takes the deflater back to the known state. Then you've just to reassemble the compressed pieces into the overall compressed stream. “Simple!” (Hah!)
I don't know if this will work, and I suspect that the complexity of the whole thing will make it not a viable choice except when you're compressing single very large files. (If you had many files, it would be much easier to just compress each of those in parallel.) Still, that's what I'd try first.
(Also note that the gzip format is just a deflated stream with extra metadata.)
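One way to try the block-wise idea with only java.util.zip is sketched below. It is an untuned illustration, not a reference solution: each block is compressed by its own raw Deflater (nowrap = true), every block except the last is ended with SYNC_FLUSH so the pieces can be concatenated on a byte boundary, and the gzip header and CRC32/length trailer are written by hand. The class name is made up, and the whole input is held in memory for simplicity.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

class ParallelGzipSketch {
    static final int BLOCK_SIZE = 1024 * 1024; // 1024 KiB per task, as in the assignment
    static final int DICT_SIZE = 32 * 1024;    // tail of the previous block used as dictionary

    /** Compresses one block to raw deflate data (no zlib or gzip wrapper). */
    static byte[] deflateBlock(byte[] input, int start, int len, boolean useDict, boolean last) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // true = raw deflate
        try {
            if (useDict && start > 0) {
                int dictLen = Math.min(DICT_SIZE, start);
                deflater.setDictionary(input, start - dictLen, dictLen);
            }
            deflater.setInput(input, start, len);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[64 * 1024];
            if (last) {
                deflater.finish(); // only the last block carries the final-block marker
                while (!deflater.finished()) {
                    out.write(buf, 0, deflater.deflate(buf));
                }
            } else {
                int n;
                do { // SYNC_FLUSH pads to a byte boundary without ending the deflate stream
                    n = deflater.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
                    out.write(buf, 0, n);
                } while (n == buf.length);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    /** Writes input to out as a single gzip member, compressing the blocks on a thread pool. */
    static void compress(final byte[] input, OutputStream out, int threads, final boolean useDict)
            throws Exception {
        int blocks = Math.max(1, (input.length + BLOCK_SIZE - 1) / BLOCK_SIZE);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> results = new ArrayList<Future<byte[]>>();
            for (int i = 0; i < blocks; i++) {
                final int start = i * BLOCK_SIZE;
                final int len = Math.min(BLOCK_SIZE, input.length - start);
                final boolean last = (i == blocks - 1);
                results.add(pool.submit(new Callable<byte[]>() {
                    public byte[] call() {
                        return deflateBlock(input, start, len, useDict, last);
                    }
                }));
            }
            // Minimal gzip header: magic, CM = deflate, no flags, no mtime, unknown OS.
            out.write(new byte[] {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, (byte) 0xff});
            for (Future<byte[]> result : results) {
                out.write(result.get()); // blocks are written in their original order
            }
            CRC32 crc = new CRC32();
            crc.update(input);
            writeIntLE(out, (int) crc.getValue());
            writeIntLE(out, input.length);
            out.flush();
        } finally {
            pool.shutdown();
        }
    }

    private static void writeIntLE(OutputStream out, int v) throws IOException {
        out.write(v);
        out.write(v >>> 8);
        out.write(v >>> 16);
        out.write(v >>> 24);
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = System.in.read(buf)) != -1) {
            all.write(buf, 0, n);
        }
        compress(all.toByteArray(), System.out, Runtime.getRuntime().availableProcessors(), true);
    }
}
```

The result should be readable by GZIPInputStream or the gzip tool. A real solution would read and write block by block instead of buffering everything, and could also compute the CRC incrementally.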

how to optimize image in java for performance

I'm transferring an image over TCP/IP and I'd like to optimize it while keeping the quality as good as possible.
What kind of methods or algorithms can I use?
P.S.
Now that I think about it, maybe I should ask what the best and fastest way is to send an image
via TCP/IP.

To find the right answer to your question, you need to have a look at the images themselves. Are they real world images captured on camera? Or are they synthetic images, like icons or graphs?
Lossy compression (like JPEG) works very well for real scenes with many gradients and smooth edges. For images with solid colors and hard edges, you have a much higher (even perceived) loss in image quality and less gain in compression rates compared to lossless compression.
Basically, the established image formats for your domain are PNG (Portable Network Graphics) and JPEG. PNG images are always compressed losslessly, and their compression algorithm works better than the competition, i.e. GIF. If the images are well-suited, you gain compression rates comparable to JPEG; if not (like real world images), you gain typical ZIP compression rates (around 50%).
After deciding between lossy and lossless compression (or a combination, based on picture type; you could also compress each image in both formats and then compare, if processing time does not matter as much as network throughput), you should also take advantage of progressive coding, which is supported by both the JPEG and PNG formats.
With progressive coding, the data is organized so that the more data you receive, the better the quality (as opposed to just sending the image row by row). The advantage here is that you can show the image to the user while it is still being received. However, for this you need a decoder that exposes this functionality.
I don't know about the libraries available in Java for this.

You should check Java Advanced Imaging API.
But to use it effectively you will need to understand what type of image operations are right for your problem. This will depend, among other things, on the encoding of your source image.
As for the "good quality as much as possible", you will most likely need to experiment with various compression techniques and their relevant parameters before deciding which one gives the right balance of speed, size and quality for your needs.
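For JPEG, such an experiment could start with the standard javax.imageio API rather than JAI. The sketch below encodes one image at several quality settings and prints the resulting sizes; the class name and the synthetic test image are made up for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.Random;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;

class JpegQualityDemo {
    /** Encodes image as JPEG with the given quality (0.0 = smallest, 1.0 = best). */
    static byte[] encodeJpeg(BufferedImage image, float quality) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpeg");
        if (!writers.hasNext()) {
            throw new IOException("no JPEG writer available");
        }
        ImageWriter writer = writers.next();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    /** Builds a synthetic test image: a colour gradient with some noise on top. */
    static BufferedImage sampleImage(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Random random = new Random(42);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int r = (x * 255) / width;
                int g = (y * 255) / height;
                int b = random.nextInt(256);
                image.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage image = sampleImage(256, 256);
        for (float quality : new float[] {0.2f, 0.5f, 0.9f}) {
            System.out.println("quality " + quality + ": "
                    + encodeJpeg(image, quality).length + " bytes");
        }
    }
}
```

Run it against representative images from your own application; the size and quality trade-off depends heavily on the content.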

You may take a look at this. It's a comparison between common compression algorithms (quality and compression rate).
Edit: it is not directly java, but you probably can find an implementation of the desired algorithm.

For images intended for human viewing JPEG is quite nice. What is in the remote end? A browser?

How would you change a single byte in a file?

What is the best way to change a single byte in a file using Java? I've implemented this in several ways. One uses all byte array manipulation, but this is highly sensitive to the amount of memory available and doesn't scale past 50 MB or so (i.e. I can't allocate 100MB worth of byte[] without getting OutOfMemory errors). I also implemented it another way which works, and scales, but it feels quite hacky.
If you're a java io guru, and you had to contend with very large files (200-500MB), how might you approach this?
Thanks!

I'd use RandomAccessFile, seek to the position I wanted to change and write the change.

If all I wanted to do was change a single byte, I wouldn't bother reading the entire file into memory. I'd use a RandomAccessFile, seek to the byte in question, write it, and close the file.
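A minimal sketch of that approach, with a made-up class name and a bounds check added as a precaution:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

class SingleByteEdit {
    /** Overwrites the byte at offset in place, without reading the rest of the file. */
    static void writeByteAt(File file, long offset, int value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            if (offset < 0 || offset >= raf.length()) {
                throw new IllegalArgumentException("offset outside file: " + offset);
            }
            raf.seek(offset);
            raf.write(value); // writes the low 8 bits of value
        }
    }

    /** Reads the byte at offset as an unsigned value (0..255). */
    static int readByteAt(File file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            return raf.readUnsignedByte();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("bytes", ".bin");
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(new byte[1024]);
        }
        writeByteAt(file, 512, 0x2A);
        System.out.println("byte at 512 is now " + readByteAt(file, 512));
        file.delete();
    }
}
```

Memory use is independent of the file size, so the same code applies to files of several hundred MB.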
