I'm trying to implement an image compression algorithm based on the DCT for color JPEG. I'm a newbie in image processing, so I need some help. What I need is clarification of the algorithm.
I'm using the DCT implementation from here.
So, here is the algorithm as I understood it:
Load an image using ImageIO into BufferedImage.
Create 3 matrices (1 for each channel: red, green, blue):
int rgb = bufferedImage.getRGB(i, j);
int red = (rgb >> 16) & 0xFF;
int green = (rgb >> 8) & 0xFF;
int blue = rgb & 0xFF;
Pad the matrices so their dimensions are multiples of 8, so they can be split into 8x8 chunks (where 8 is the size of the DCT matrix, N).
For each matrix, split it into chunks of size 8x8 (result: splittedImage)
Perform forwardDCT on matrices from splittedImage (result: dctImage).
Perform quantization on matrices from dctImage (result: quantizedImage)
Here I don't know what to do. I can either:
merge the quantizedImage matrices into one matrix margedImage, convert it into a Vector and perform the compressImage method on it,
or convert the small matrices from quantizedImage into Vectors, perform the compressImage method on them, and then merge them into one matrix.
So, at this point I have 3 matrices for the red, green and blue colors. Then I convert those matrices into one RGB matrix, create a new BufferedImage and use the setRGB method to set the pixel values. Then I save the image to a file.
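For reference, steps 1-3 above might look roughly like this (a sketch only; the variable names are mine, and ImageIO, BufferedImage and File come from javax.imageio, java.awt.image and java.io):
BufferedImage img = ImageIO.read(new File("input.jpg"));
int w = img.getWidth(), h = img.getHeight();
int paddedW = ((w + 7) / 8) * 8;   // round width up to a multiple of N = 8
int paddedH = ((h + 7) / 8) * 8;   // round height up to a multiple of N = 8

int[][] red   = new int[paddedH][paddedW];
int[][] green = new int[paddedH][paddedW];
int[][] blue  = new int[paddedH][paddedW];

for (int y = 0; y < h; y++) {
    for (int x = 0; x < w; x++) {
        int rgb = img.getRGB(x, y);
        red[y][x]   = (rgb >> 16) & 0xFF;
        green[y][x] = (rgb >> 8) & 0xFF;
        blue[y][x]  = rgb & 0xFF;
    }
}
// The padded border is left at 0 here; repeating the last row/column is a
// common alternative that reduces artifacts at the block edges.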
Extra questions:
Is it better to convert RGB into YCbCr and perform DCT on Y, Cb and Cr?
The Javadoc of the compressImage method says that it is Run-Length Encoding, not Huffman Encoding. So will the compressed image open in an image viewer? Or should I use Huffman Encoding according to the JPEG specification, and is there any open source Huffman Encoding implementation in Java?
If you want to follow the implementation steps, I suggest reading:
http://www.amazon.com/Compressed-Image-File-Formats-JPEG/dp/0201604434/ref=sr_1_1?ie=UTF8&qid=1399765722&sr=8-1&keywords=compressed+image+file+formats
In regard to your questions:
1) The JPEG standard knows nothing about color spaces and does not care whether you use RGB, YCbCr, or CMYK. There are several JPEG file formats (e.g., JFIF, EXIF, Adobe) that specify the color spaces, usually YCbCr.
The reason for using YCbCr is that it follows the JPEG trend of concentrating information. There tends to be more useful information in the Y component than in the Cb or Cr components. With YCbCr you can keep 4 Y samples (or even 16) for every Cb and Cr sample, which cuts the amount of data to be compressed roughly in half.
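If you do convert, a minimal sketch of the usual JFIF-style RGB to YCbCr conversion (full-range BT.601 coefficients; the helper names are made up for illustration):
static int clamp(double v) { return (int) Math.max(0, Math.min(255, Math.round(v))); }

static int[] rgbToYCbCr(int r, int g, int b) {
    int y  = clamp( 0.299    * r + 0.587    * g + 0.114    * b);        // luma
    int cb = clamp(-0.168736 * r - 0.331264 * g + 0.5      * b + 128);  // blue-difference chroma
    int cr = clamp( 0.5      * r - 0.418688 * g - 0.081312 * b + 128);  // red-difference chroma
    return new int[] { y, cb, cr };
}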
Note that the JPEG file formats specify limits on sampling (JPEG allows 2:3 sampling while most implementations do not).
2) The DCT coefficients are run-length encoded and then Huffman (or arithmetic) encoded. You have to use both.
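For illustration only, here is a sketch of the zig-zag scan that precedes the run-length step, plus a naive (zero-run, value) pass over one quantized 8x8 block. The scan order matches JPEG's, but the output pairs are a simplification, not the exact JPEG symbol format:
static int[] zigZag(int[][] block) {              // block is 8x8
    int[] out = new int[64];
    int idx = 0;
    for (int s = 0; s <= 14; s++) {               // s = row + col of each anti-diagonal
        if (s % 2 == 0) {                         // even diagonals run bottom-left to top-right
            for (int row = Math.min(s, 7); row >= Math.max(0, s - 7); row--)
                out[idx++] = block[row][s - row];
        } else {                                  // odd diagonals run top-right to bottom-left
            for (int row = Math.max(0, s - 7); row <= Math.min(s, 7); row++)
                out[idx++] = block[row][s - row];
        }
    }
    return out;
}

static List<int[]> runLength(int[] zz) {          // needs java.util.List and java.util.ArrayList
    List<int[]> pairs = new ArrayList<>();
    int zeros = 0;
    for (int i = 1; i < zz.length; i++) {         // index 0 (the DC coefficient) is coded separately
        if (zz[i] == 0) { zeros++; }
        else { pairs.add(new int[] { zeros, zz[i] }); zeros = 0; }
    }
    pairs.add(new int[] { 0, 0 });                // end-of-block marker, JPEG-style
    return pairs;
}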
Related
I have been playing around with some image processing tools in java and was wondering how to create my own image format (with its own file extension and header).
Say I am trying to store two JPEG images in a new file with the extension .abcde.
How would I approach this and modify the file header?
I did some research and found this, but it didn't have any sort of example.
https://gamedevelopment.tutsplus.com/tutorials/create-custom-binary-file-formats-for-your-games-data--gamedev-206
Any advice/references/examples would be great, thanks.
Okay. Here you should ask yourself a question: what is a binary file? It is a kind of "box" that contains 8-bit values (0-255 range). And what are images, actually? The same thing, except that the data in images is stored in a specific way. If you look at bitmaps (*.BMP files), for example, you'll see that they contain RGB values (it actually depends on the color depth, but we are talking about 24-bit color here). So, what does that mean? It means that each pixel is stored as 3 bytes (24 bits), one each for the Red, Green and Blue values.
E.g. 0x00 0x00 0x00 is a black pixel in hex representation,
and 0xFF 0xFF 0xFF is white.
But storing image data this way is quite inefficient. For example, a 1024x1024 image would be 1024*1024*3 = 3,145,728 bytes, or 3 megabytes! And that is not counting the alpha channel, which is also stored as one byte per pixel.
So this is where data compression comes in; it can be lossy or lossless. If you open a PNG file in a hex editor, for example, you'll see DEFLATE-compressed data (LZ77 + Huffman coding). It is lossless.
GIF files are compressed with LZW, which is quite inefficient nowadays. There are lots of really nice image formats, such as FLIF or BPG, which compress images better than PNG and JPEG.
So, if you want to create your own file format, you can create either a raw pixel container file (like BMP or PCX) or a compressed pixel file (for the latter you would write a custom data compression algorithm; C or C++ suit that purpose best).
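If you go the raw-container route in Java, a minimal sketch might look like this (the magic number, header layout and the .abcde idea are arbitrary choices for illustration, not an existing format):
import java.awt.image.BufferedImage;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class AbcdeWriter {
    static final int MAGIC = 0xABCDE001;              // identifies our made-up format

    static void write(BufferedImage img, String path) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(path))) {
            out.writeInt(MAGIC);                      // header: magic number
            out.writeInt(img.getWidth());             // header: width
            out.writeInt(img.getHeight());            // header: height
            for (int y = 0; y < img.getHeight(); y++) {
                for (int x = 0; x < img.getWidth(); x++) {
                    int rgb = img.getRGB(x, y);
                    out.writeByte((rgb >> 16) & 0xFF);  // R
                    out.writeByte((rgb >> 8) & 0xFF);   // G
                    out.writeByte(rgb & 0xFF);          // B
                }
            }
        }
    }
}
A reader would do the same in reverse: check the magic number, read width and height, then read width*height*3 bytes of pixel data.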
I need to do some image manipulation in java. I am porting python code, which uses numpy arrays with dimensions cols, rows, channels; these are floating point. I know how to get RGB out of a BufferedImage and how to put it back; this question is about how to lay out the resulting float image.
Here are some of the options:
direct translation:
float[][][] img = new float[cols][rows][channels];
put channels first:
float[][][] img = new float[channels][cols][rows];
combine indexes:
float[] img = new float[rows*cols*channels];
img[ i * cols * channels + j * channels + k ] = ...;
Option 1 has the advantage that it reads the same as the original code; but it seems non-idiomatic for Java, and probably not fast.
Option 2 should be faster, if I understand how Java N-dimensional arrays work under the hood; at the cost of looking slightly odd. It seems this allocates channels*cols arrays of size rows, as opposed to option 1 which allocates rows*cols arrays of size channels (a very large number of tiny arrays = large overhead).
Option 3 seems to be closest to what the AWT and other Java code does; but it requires passing around the dimensions (they are not built into the array) and it is very easy to get the indexing wrong (especially when there is other index arithmetic going on).
Which of these is better and why? What are some of the other pros and cons? Is there an even better way?
UPDATE
I benchmarked options 1 and 2, on a non-trivial example of image processing which runs four different algorithms (in a 10x loop, so the VM gets to warm up). This is on OpenJDK 7 on Ubuntu, Intel i5 cpu. Surprisingly, there isn't much of a speed difference: option 2 is about 6% slower than option 1. There is a pretty large difference in amount of memory garbage-collected (using java -verbose:gc): option 1 collects 1.32 GB of memory during the entire run, while option 2 collects only 0.87 GB (not quite half, but then again not all images used are color). I wonder how much difference there will be in Dalvik?
BoofCV has float image types and the raw pixel data can be manipulated directly. See the tutorial.
BoofCV provides several routines for quickly converting a BufferedImage into its different image types, and those conversions are very fast.
Convert a BufferedImage to a multispectral float type image with BoofCV:
MultiSpectral<ImageFloat32> image =
    ConvertBufferedImage.convertFromMulti(bufferedImage, null, true, ImageFloat32.class);
Access pixel value from the float image array:
float value = image.getBand(i).data[ image.startIndex + y*image.stride + x];
Another way to get and set the pixel value:
float f = image.getBand(i).get(x, y);
...
image.getBand(i).set(x, y, f);
Where i represents the index of the color channel.
Convert a BoofCV image back to BufferedImage:
BufferedImage bufferedImage =
    new BufferedImage(image.width, image.height, BufferedImage.TYPE_4BYTE_ABGR);
ConvertBufferedImage.convertTo(image, bufferedImage, true);
You are right, option 3 has a smaller memory footprint.
As for which performs better, you'd have to profile and/or benchmark the options.
Given your statement that row and column counts are large, I'd go with option 3, but wrap the array in a class that knows the dimensions, e.g. called Image.
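For example, such a wrapper might look like this (class and method names here are just illustrative):
public final class Image {
    public final int rows, cols, channels;
    private final float[] data;

    public Image(int rows, int cols, int channels) {
        this.rows = rows;
        this.cols = cols;
        this.channels = channels;
        this.data = new float[rows * cols * channels];
    }

    // Same indexing as option 3: (i * cols + j) * channels + k
    public float get(int row, int col, int channel) {
        return data[(row * cols + col) * channels + channel];
    }

    public void set(int row, int col, int channel, float value) {
        data[(row * cols + col) * channels + channel] = value;
    }
}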
Option 3 is what BufferedImage uses in Java. It's good for memory, as Andreas said, but it's not optimal for image processing, where you usually want the data for one channel to be contiguous.
The most practical would be:
float[][] img = new float[channels][cols*rows];
That way, the channels are separated and thus can be processed independently. This representation would also be optimal if you want to call native code.
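For example (the channel order and names are just for illustration), reading the green value at (x, y) with that layout would be:
float g = img[1][y * cols + x];   // channel 1 = green, row-major within the plane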
I am performing operations on a grayscale image, and the resulting image of these operations has the same extension as the input image. For example, if the input image is .jpg or .png, the output image is .jpg or .png respectively.
I am converting the image to grayscale as follows:
Imgproc.cvtColor(mat, grayscale, Imgproc.COLOR_BGR2GRAY);
and I am checking the channel count using:
.channels()
The problem is that when I want to know how many channels the image contains, despite it being a grayscale image, I always get number of channels = 3!
Kindly let me know why that is happening.
The depth (or better, color depth) is the number of bits used to represent a color value. A color depth of 8 usually means 8 bits per channel (so you have 256 color values, or rather shades of grey, per channel, from 0 to 255), and 3 channels then mean that one pixel value is composed of 3*8 = 24 bits.
However, this also depends on nomenclature. Usually you will say
"Color depth is 8-bits per channel"
but you also could say
"The color depth of the image is 32-bits"
and then mean 8 bits per RGBA channel, or
"The image has a color depth of 24-bits"
and mean 8 bits per R, G and B channel.
The grayscale image has three channels because technically it is not a grayscale image. It is a colored image with the same values for all the three channels (r, g, b) in every pixel. Therefore, visually it looks like a grayscale image.
To check the channels in the image (with PIL in Python), use:
img.getbands()
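If you are using OpenCV's Java bindings (as the cvtColor call in the question suggests), here is a quick way to verify this, assuming the 3.x/4.x packages org.opencv.core.Mat, org.opencv.imgproc.Imgproc and org.opencv.imgcodecs.Imgcodecs:
Mat grayscale = new Mat();
Imgproc.cvtColor(mat, grayscale, Imgproc.COLOR_BGR2GRAY);
System.out.println(grayscale.channels());   // 1: the converted Mat really is single channel

// If the result was saved and read back, force single-channel loading:
Mat reloaded = Imgcodecs.imread("result.png", Imgcodecs.IMREAD_GRAYSCALE);
System.out.println(reloaded.channels());    // 1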
Is there a simple way to get an rgba int[] from an argb BufferedImage? I need it to be converted for opengl, but I don't want to have to iterate through the pixel array and convert it myself.
OpenGL 1.2+ supports a GL_BGRA pixel format and reversed packed pixels.
On the surface BGRA does not sound like what you want, but let me explain.
Calls like glTexImage2D (...) do what is known as pixel transfer, which involves packing and unpacking image data. During the process of pixel transfer, data conversion may be performed, special alignment rules may be followed, etc. The data conversion step is what we are particularly interested in here; you can transfer pixels in a number of different layouts besides the obvious RGBA component order.
If you reverse the byte order (e.g. data type = GL_UNSIGNED_INT_8_8_8_8_REV) together with a GL_BGRA format, you will effectively transfer ARGB pixels without any real effort on your part.
Example glTexImage2D (...) call:
glTexImage2D (..., GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, image);
The usual use-case for _REV packed data types is handling endian differences between different processors, but it also comes in handy when you want to reverse the order of components in an image (since there is no such thing as GL_ARGB).
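In Java this might look like the following LWJGL-flavoured sketch (assuming the LWJGL GL11/GL12 bindings, org.lwjgl.BufferUtils and java.nio.IntBuffer; image stands for your ARGB BufferedImage). The int[] returned by BufferedImage.getRGB is already in ARGB order, which GL_BGRA + GL_UNSIGNED_INT_8_8_8_8_REV reads directly:
int[] argb = image.getRGB(0, 0, width, height, null, 0, width);  // ARGB-packed ints
IntBuffer pixels = BufferUtils.createIntBuffer(argb.length);
pixels.put(argb).flip();

GL11.glTexImage2D(GL11.GL_TEXTURE_2D, 0, GL11.GL_RGBA8,
                  width, height, 0,
                  GL12.GL_BGRA, GL12.GL_UNSIGNED_INT_8_8_8_8_REV,
                  pixels);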
Do not convert things for OpenGL - it is perfectly capable of doing this by itself.
You can convert between ARGB and RGBA with bitwise rotations, which are fast and concise. Writing <<< for a circular left rotation (Java has no such operator; Integer.rotateLeft is the real thing):
rgba = argb <<< 8
argb = rgba <<< 24
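A quick sketch of that rotation applied to a whole ARGB pixel array in place (bufferedImage, w and h are assumed to be yours):
int[] pixels = bufferedImage.getRGB(0, 0, w, h, null, 0, w);   // ARGB-packed ints
for (int i = 0; i < pixels.length; i++) {
    pixels[i] = Integer.rotateLeft(pixels[i], 8);              // ARGB -> RGBA
}
// Integer.rotateRight(rgba, 8) (or rotateLeft by 24) converts back to ARGB.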
If you have any further questions, this topic should give you a more in-depth answer on converting between RGBA and ARGB.
Also, if you'd like to learn more about Java's bitwise operators, check out this link.
I'm developing an Android app that, for every YUV image passed from the camera, randomly picks 10 pixels and checks whether they are red or blue.
I know how to do this for RGB images, but not for the YUV format.
I cannot convert it pixel by pixel into an RGB image because of the runtime constraints.
I'm assuming you're using the Camera API's preview callbacks, where you get a byte[] array of data for each frame.
First, you need to select which YUV format you want to use. NV21 is required to be supported, and YV12 is required since Android 3.0. None of the other formats are guaranteed to be available. So NV21 is the safest choice, and also the default.
Both of these formats are YUV 4:2:0 formats; the color information is subsampled by 2x in both dimensions, and the layout of the image data is fairly different from the standard interleaved RGB format. FourCC.org's NV21 description, as one source, has the layout information you want - first the Y plane, then the UV data interleaved. But since the two color planes are only 1/4 of the size of the Y plane, you'll have to decide how you want to upsample them - the simplest is nearest neighbor. So if you want pixel (x,y) from the image of size (w, h), the nearest neighbor approach is:
Y = image[ y * w + x ];
// integer division plays the role of floor(y/2) and floor(x/2) here
V = image[ w * h + (y / 2) * w + (x / 2) * 2 ];      // V comes first in NV21
U = image[ w * h + (y / 2) * w + (x / 2) * 2 + 1 ];
More sophisticated upsampling (bilinear, cubic, etc) for the chroma channels can be used as well, but what's suitable depends on the application.
Once you have the YUV pixel, you'll need to interpret it. If you're more comfortable operating in RGB, you can use these JPEG conversion equations at Wikipedia to get the RGB values.
Or, you can just use large positive values of V (Cr) to indicate red, especially if U (Cb) is small.
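A rough sketch of that idea directly on the NV21 buffer (the threshold values are arbitrary starting points to tune, not measured values):
static boolean looksRed(byte[] nv21, int w, int h, int x, int y) {
    int uvOffset = w * h + (y / 2) * w + (x / 2) * 2;
    int v = nv21[uvOffset] & 0xFF;       // Cr, the red-difference chroma
    int u = nv21[uvOffset + 1] & 0xFF;   // Cb, the blue-difference chroma
    return v > 170 && u < 130;
}

static boolean looksBlue(byte[] nv21, int w, int h, int x, int y) {
    int uvOffset = w * h + (y / 2) * w + (x / 2) * 2;
    int v = nv21[uvOffset] & 0xFF;
    int u = nv21[uvOffset + 1] & 0xFF;
    return u > 170 && v < 130;
}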
From Reuben Scratton's answer in
Converting YUV->RGB(Image processing)->YUV during onPreviewFrame in android?
You can make the camera preview use RGB format instead of YUV.
Try this (setPreviewFormat is an instance method, so call it on the Camera.Parameters object returned by camera.getParameters() and apply it with camera.setParameters()):
parameters.setPreviewFormat(ImageFormat.RGB_565);
YUV is just another colour space. You can define red in YUV space just as you can in RGB space. A quick calculation suggests an RGB value of 255,0,0 (red) should appear as something like 76,84,255 in YUV space, so just look for something close to that.
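If you want a concrete test, a tiny sketch of that "close to (76, 84, 255)" check (the tolerance is a guess you would need to tune):
static boolean isRoughlyRed(int y, int u, int v) {
    int dy = y - 76, du = u - 84, dv = v - 255;
    return dy * dy + du * du + dv * dv < 60 * 60;
}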