I've made this method for getting the pixel values of an image; I'm using it to compare 1 image against 50 other images. However, it takes forever to produce outputs. Does anyone know of a way I can speed this method up? Would converting the images to grayscale be a quicker way? If anyone could help with code, that would be great!
public static double[] GetHistogram (BufferedImage img) {
    double[] myHistogram = new double[16777216];
    for (int y = 0; y < img.getHeight(); y += 1)
    {
        for (int x = 0; x < img.getWidth(); x += 1)
        {
            int clr = img.getRGB(x,y);
            Color c = new Color(img.getRGB(x, y));
            int pixelIntValue = (int) c.getBlue() * 65536 + c.getGreen() * 256 + c.getRed();
            myHistogram[pixelIntValue]++;
        }
    }
    return myHistogram;
}
TLDR: use a smaller image and read this paper.
You should try to eliminate any unnecessary function calls as @Piglet mentioned, but you should definitely keep the colors in one histogram instead of a separate histogram for R, G, and B. Aside from getting rid of the extra function calls, I think there are four things you can do to speed up your algorithm—both creating and comparing the histograms—and reduce the memory usage (because less page caching means less disk thrashing and more speed).
Use a smaller image
One of the advantages of color histogram indexing is that it is relatively independent of resolution. The color of an object does not change with the size of the image. Obviously, there are limits to this—imagine trying to match objects using a 1×1 image. However, if your images have millions of pixels (like the images from most smartphones these days), you should definitely resize them. These authors found that an image resolution of only 16×11 still produced very good results [see page 17], but even resizing down to ~100×100 pixels should still provide a significant speed-up.
BufferedImage inherits the method getScaledInstance from Image, which you can use to get a smaller image.
double scalingFactor = 0.25; // You need to choose this value to work with your images
int aSmallHeight = (int) (myBigImage.getHeight() * scalingFactor);
int aSmallWidth = (int) (myBigImage.getWidth() * scalingFactor);
Image smallerImage = myBigImage.getScaledInstance(aSmallWidth, aSmallHeight, Image.SCALE_FAST);
Reducing your image size is the single most effective thing you can do to speed up your algorithm. If you do nothing else, at least do this.
Use less information from each color channel
This won't make as much difference for generating your histograms because it will actually require a little more computation, but it will dramatically speed up comparing the histograms. The general idea is called quantization. Basically, if you have red values in the range 0..255, they can be represented as one byte. Within that byte, some bits are more important than others.
Consider this color sample image. I placed a mostly arbitrary shade of red in the top left, and in each of the other corners, I ignored one or more bits in the red channel (indicated by the underscores in the color byte). I intentionally chose a color with lots of one bits in it so that I could show the "worst" case of ignoring a bit. (The "best" case, when we ignore a zero bit, has no effect on the color.)
There's not much difference between the upper right and upper left corners, even though we ignored one bit. The upper left and lower left have a visible, but minimal, difference even though we ignored 3 bits. The upper left and lower right corners are very different even though we ignored only one bit, because it was the most significant bit. By strategically ignoring less significant bits, you can reduce the size of your histogram, which means there's less for the JVM to move around and fewer bins when it comes time to compare them.
Here are some solid numbers. Currently, you have 2⁸ × 2⁸ × 2⁸ = 16777216 bins. If you ignore the 3 least significant bits from each color channel, you will get 2⁵ × 2⁵ × 2⁵ = 32768 bins, which is 1/512 of the number of bins you are currently using. You may need to experiment with your set of images to see what level of quantization still produces acceptable results.
Quantization is very simple to implement. You can just ignore the rightmost bits by performing a bit shift operation on each channel:
int numBits = 3;
int quantizedRed = pixelColor.getRed() >> numBits;
int quantizedGreen = pixelColor.getGreen() >> numBits;
int quantizedBlue = pixelColor.getBlue() >> numBits;
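To combine the quantized channels into a single histogram bin, you could do something like the following. This is only a minimal sketch, assuming numBits = 3 so that 5 bits remain per channel:
// Minimal sketch: pack the quantized channels into one bin index.
// Assumes numBits = 3, so each channel keeps 8 - 3 = 5 bits.
int bitsPerChannel = 8 - numBits;
int bin = (quantizedRed << (2 * bitsPerChannel))
        | (quantizedGreen << bitsPerChannel)
        | quantizedBlue;
myHistogram[bin]++; // the histogram now only needs 2^15 = 32768 bins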
Use a different color space
While grayscale might be quicker, you should not use grayscale because you lose all of your color information that way. When you're matching objects using color histograms, the actual hue or chromaticity is more important than how light or dark something is. (One reason for this is because the lighting intensity can vary across an image or even between images.) There are other representations of color that you could use that don't require you to use 3 color channels.
For example, L*a*b* (see also this) uses one channel (L) to encode the brightness, and two channels (a, b) to encode color. The a and b channels each range from -100 to 100, so if you create a histogram using only a and b, you would only need 40000 bins. The disadvantage of a histogram of only a and b is that you lose the ability to record black and white pixels. Other color spaces each have their own advantages and disadvantages for your algorithm.
It is generally not very difficult to convert between color spaces because there are many existing implementations of color space conversion functions that are freely available on the internet. For example, here is a Java conversion from RGB to L*a*b*.
If you do choose to use a different color space, be careful using quantization as well. You should apply any quantization after you do the color space conversion, and you will need to test different quantization levels because the new color space might be more or less sensitive to quantization than RGB. My preference would be to leave the image in RGB because quantization is already so effective at reducing the number of bins.
Use different data types
I did some investigating, and I noticed that BufferedImage stores the image as a Raster, which uses a SampleModel to describe how pixels are stored in the data buffer. This means there is a lot of overhead just to retrieve the value of one pixel. You will achieve faster results if your image is stored as byte[] or int[]. You can get the byte array using
byte[] pixels = ((DataBufferByte) bufferedImage.getRaster().getDataBuffer()).getData();
See the answer to this previous question for more information and some sample code to convert it to a 2D array.
This last thing might not make much difference, but I noticed that you are using double for storing your histogram. You should consider whether int would work instead. In Java, int has a maximum value of > 2 billion, so overflow shouldn't be an issue (unless you are making a histogram of an image with more than 2 billion pixels, in which case, see my first point). An int uses only half as much memory as a double (which is a big deal when you have thousands or millions of histogram bins), and for many math operations they can be faster (though this depends on your hardware).
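Putting those last two points together, the histogram loop could work directly on the raw array. This is only a minimal sketch, assuming the image happens to be TYPE_3BYTE_BGR (samples stored as B, G, R byte triplets):
// Minimal sketch, assuming a TYPE_3BYTE_BGR image; requires java.awt.image.DataBufferByte.
byte[] pixels = ((DataBufferByte) bufferedImage.getRaster().getDataBuffer()).getData();
int[] histogram = new int[1 << 24];
for (int i = 0; i < pixels.length; i += 3) {
    int b = pixels[i]     & 0xFF; // mask so each byte is treated as unsigned
    int g = pixels[i + 1] & 0xFF;
    int r = pixels[i + 2] & 0xFF;
    histogram[(r << 16) | (g << 8) | b]++;
}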
If you want to read more about color histograms for object matching, go straight to the source and read Swain and Ballard's Color Indexing paper from 1991.
Calculating a histogram with 16777216 classes is quite unusual.
Most histograms are calculated for each channel separately, resulting in a 256-class histogram each for R, G, and B, or just one if you convert the image to grayscale.
I am no expert in Java. I don't know how cleverly the compilers optimize code.
But you call img.getHeight() for every row and img.getWidth() for every column of your image.
I don't know how often those expressions are actually evaluated, but maybe you can save some processing time if you just use 2 variables that you assign the width and height of your image to before you start your loops.
You also call img.getRGB(x,y) twice for every pixel. Same story. Maybe it is faster to just do it once. Function calls are usually slower than reading variables from memory.
You should also think about what you are doing here. img.getRGB(x,y) gives you an integer representation for a color.
Then you put that integer into a constructor to make a Color object out of it. Then you use c.getBlue() and so on to get integer values for red, green and blue out of that Color object. Just to put it together into an integer again?
You could just use the return value of getRGB straight away and at least save 4 function calls, 3 multiplications, 3 summations...
So again, given that I last programmed Java about 10 years ago, my function would look more like this:
public static double[] GetHistogram (BufferedImage img) {
    double[] myHistogram = new double[16777216];
    int width = img.getWidth();
    int height = img.getHeight();
    for (int y = 0; y < height; y += 1)
    {
        for (int x = 0; x < width; x += 1)
        {
            int clr = img.getRGB(x, y) & 0xFFFFFF; // strip the alpha bits so the value fits the array
            myHistogram[clr]++;
        }
    }
    return myHistogram;
}
Of course a double array is an odd choice for counting, and that whole 16777216-class histogram doesn't make much sense anyway, but maybe that helps you a bit to speed things up.
I'd just use a bit mask to get the red, green and blue values out of that integer and create three histograms.
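For example, a minimal sketch of that idea, reusing the width and height variables from the function above (getRGB packs the channels as 0xAARRGGBB):
// Minimal sketch: three 256-bin histograms built with bit masks on the packed ARGB int.
int[] redHist = new int[256];
int[] greenHist = new int[256];
int[] blueHist = new int[256];
for (int y = 0; y < height; y += 1) {
    for (int x = 0; x < width; x += 1) {
        int clr = img.getRGB(x, y);
        redHist[(clr >> 16) & 0xFF]++;
        greenHist[(clr >> 8) & 0xFF]++;
        blueHist[clr & 0xFF]++;
    }
}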
Related
I am trying to do some image processing in Java. I used the ImageIO library for reading and writing images. I can read the image pixel value in two ways as follows (there might be other methods which I do not know about).
Using BufferedImage's getRGB method:
pixel = image.getRGB(x,y);
Using Raster's getSample method:
WritableRaster raster = image.getRaster();
pixel = raster.getSample(x,y,0);
What is the difference in the above two approaches?
1: The first approach will always return a pixel in int ARGB format, and in the sRGB color space. Regardless of the image's internal representation. This means that unless the image's internal representation is TYPE_INT_ARGB, some conversion has to be done. This is sometimes useful, because it's predictable, but just as often it's quite slow. As an example, color space conversion is quite expensive. Also, if the image has higher precision than 8 bits per sample and/or 4 samples per pixel, precision loss occurs. This may or may not be acceptable, given your use case.
2: The second approach may give you a pixel value, but not in all cases, as it gives you the sample value at (x,y) for band 0 (the first band). For TYPE_INT_ARGB this will be the same as the pixel value. For TYPE_BYTE_INDEXED this will be the index to use in the look up table (you need to look it up to get the pixel value). For TYPE_3BYTE_BGR this will give you the blue value only (you need to combine it with the samples in band 1 and 2 to get the full pixel value). Etc. for other types. For samples that are not internally represented as an int, data type conversion occurs (and in rare cases precision loss). It might work for you, but I've never had much use for the getSample(...) methods.
Instead I suggest you look into what I believe to be the fastest way to get at pixel data. That is using the getDataElements method:
Object pixel = null; // pixel is initialized on the first invocation of getDataElements
for (int y = 0; y < raster.getHeight(); y++) {
    for (int x = 0; x < raster.getWidth(); x++) {
        pixel = raster.getDataElements(x, y, pixel);
        // handle the pixel according to the raster's transfer type
    }
}
This will give you the "native" values from the data buffer, without any conversion.
You then need to have special handling for each transfer type (see the DataBuffer class) you want to support, and perhaps a common fallback for non-standard types.
This will have the same "problem" as your approach 2 for pixel values vs normalized RGB values, so you might need to convert/look up "manually".
Which approach is better depends, as always, on the situation. You have to look at each use case and decide what's more important: ease/simplicity, or the best possible performance (or perhaps best quality?).
I'm practicing some simple 2D game programming, and came up with a theory that during animation the actual change in an image's position is best calculated with floating point numbers. I have a feeling that if you move an image around with ints the animation won't be as smooth.
In Java it seems you can't draw an image with floating point numbers to give an image a position. But apparently when you initially declare your x and y, you can declare them as double or float, and when it comes to actually drawing the image you have to cast them to ints, like I found HERE:
/**
* Draw this entity to the graphics context provided
*
* @param g The graphics context on which to draw
*/
public void draw(Graphics g) {
sprite.draw(g,(int) x,(int) y);
}
My question is about how Java handles the conversion.
If the code casts these doubles at the last minute, why have them as doubles in the first place?
Does Java hide the numbers after the decimal?
I know in C and C++ the numbers after the decimal get cut off and you only see what's before it. How does Java handle this casting?
Pixels on a display are discrete and limited in number; therefore display coordinates need to be integer numbers - floating point numbers make no sense, as you do not physically have a pixel at e.g. (341.4, 234.7).
That said, integers should only be used at the final drawing stage. When you calculate object movement, speeds etc, you need to use floating point numbers. Integers will cause an amazing number of precision problems. Consider the following snippet:
a = 1;
x = (a / 2) * 2;
If a and x are floating point numbers, x will end up with the expected value of 1. If they are integers, you will get 0.
Baseline: use floating point types for physics computations and convert to int at drawing time. That will allow you to perform the physics calculations with as much precision as required.
EDIT:
As far as the conversion from FP numbers to integers is concerned, while FP numbers have a greater range, the values produced by your physics calculation after normalization to your drawing area size should not normally overflow an int type.
That said, Java truncates the floating point numbers when converting to an integer type, which can create artifacts (e.g. an animation with no pixels at the rightmost pixel column, due to e.g. 639.9 being converted to 639 rather than 640). You might want to have a look at Math.round() or some of the other rounding methods provided by Java for more reasonable results.
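For example, a quick sketch of the difference between truncation and rounding:
double x = 639.9;
int truncated = (int) x;           // 639: the cast simply drops the fractional part
int rounded = (int) Math.round(x); // 640: rounds to the nearest integer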
Java truncates the decimals. Eg:
(int) 2.34 == 2
(int) 2.90 == 2
The reason for not being able to draw at a floating-point position is simply that there are no half pixels etc :)
Java casts floats to int by dropping the decimal. But I don't think having x and y coordinates as floats makes any sense. You have pixels on the screen, which cannot be presented in anything less than one pixel. For example, you can't draw a pixel at .5px x .5px because on the screen it will just be a 1px x 1px pixel. I am not a computer game programmer, but I have written an animation engine in Java and it was very smooth. I can share this if you'd like.
Note that you should draw using ints but do all your calculation using doubles. Things like rotation, or anything else that relies on a mathematical formula, should be done with floating point.
The reason x and y need to be doubles is for when they need to be computed mathematically, for example:
x += (delta * dx) / 1000;
You want to avoid overflows and loss of precision up until you paint the pixel.
I've heard that in gray-scale images with 8-bit color depth, the data is stored in the first 7 bits of each pixel's byte and the last bit is kept intact! So we can store some information using the last bit of all pixels, is it true?
If so, how could the data be interpreted in individual pixels? I mean there is no Red, Blue, and Green, so what do those bits mean?
And how can I calculate the average value of all pixels of an image?
I prefer to use pure Java classes, not JAI or other third parties.
Update 1
BufferedImage image = ...; // loading image
image.getRGB(i, j);
The getRGB method always returns an int, which is bigger than one byte!!!
What should I do?
My understanding is that 8-bit colour depth means there are 8 bits per pixel (i.e. one byte) and that Red, Green and Blue all have this value, e.g. greyscale=192 means Red=192, Green=192, Blue=192. There is no 7 bits plus another 1 bit.
AFAIK, you can just use a normal average. However, I would use long for the sum and make sure each byte is treated as unsigned, i.e. `b & 0xff`.
EDIT: If the grey scale is say 128 (or 0x80), I would expect the RGB to be 128,128,128 or 0x808080.
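A minimal sketch of that average, assuming the image is TYPE_BYTE_GRAY so that the raster is backed by a byte[]:
// Minimal sketch: average pixel value of a TYPE_BYTE_GRAY image.
byte[] data = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
long sum = 0;
for (byte b : data) {
    sum += b & 0xff; // mask so the byte is treated as unsigned (0..255)
}
double average = (double) sum / data.length;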
I'm able to display a waveform, but I don't know how to implement zooming in on the waveform.
Any idea?
Thanks piccolo
By zoom, I presume you mean horizontal zoom rather than vertical. The way audio editors do this is to scan the waveform, breaking it up into time windows where each pixel in X represents some number of samples. It can be a fractional number, but you can get away with disallowing fractional zoom ratios without annoying the user too much. Once you zoom out a bit, the max value is always a positive integer and the min value is always a negative integer.
For each pixel on the screen, you need to know the minimum sample value for that pixel and the maximum sample value. So you need a function that scans the waveform data in chunks and keeps track of the accumulated max and min for that chunk.
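For instance, a minimal sketch of such a scan (hypothetical names; samples for one channel assumed to be a float[] in -1..1):
// Minimal sketch: per-pixel min/max scan for drawing a waveform.
static void buildMinMax(float[] samples, int samplesPerPixel, float[] mins, float[] maxs) {
    for (int px = 0; px < mins.length; px++) {
        float min = Float.POSITIVE_INFINITY;
        float max = Float.NEGATIVE_INFINITY;
        int start = px * samplesPerPixel;
        int end = Math.min(start + samplesPerPixel, samples.length);
        for (int i = start; i < end; i++) {
            min = Math.min(min, samples[i]);
            max = Math.max(max, samples[i]);
        }
        mins[px] = min;
        maxs[px] = max;
    }
}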
This is a slow process, so professional audio editors keep a pre-calculated table of min and max values at some fixed zoom ratio. It might be at 512/1 or 1024/1. When you are drawing with a zoom ratio of more than 1024 samples/pixel, you use the pre-calculated table. If you are below that ratio, you get the data directly from the file. If you don't do this, you will find that your drawing code gets to be too slow when you zoom out.
It's worthwhile to write code that handles all of the channels of the file in a single pass when doing this scanning; slowness here will make your whole program feel sluggish. It's the disk IO that matters here (the CPU has no trouble keeping up), so straightforward C++ code is fine for building the min/max tables, but you don't want to go through the file more than once and you want to do it sequentially.
Once you have the min/max tables, keep them around. You want to go back to the disk as little as possible, and many of the reasons for wanting to repaint your window will not require you to rescan your min/max tables. The memory cost of holding on to them is not that high compared to the disk IO cost of building them in the first place.
Then you draw the waveform by drawing a series of 1 pixel wide vertical lines between the max value and the min value for the time represented by that pixel. This should be quite fast if you are drawing from pre built min/max tables.
Answered by https://stackoverflow.com/users/234815/John%20Knoeller
Working on this right now: C# with a little LINQ, but it should be easy enough to read and understand. The idea here is to have an array of float values from -1 to 1 representing the amplitude for every sample in the wav file. Then, knowing how many samples per second there are, we need a scaling factor: segments per second. At this point you are simply reducing the data points and smoothing them out. To zoom in really tight, give a segments per second of 1000; to zoom way out, maybe 5-10. Note that right now I'm just doing normal averaging, whereas this needs to be updated to be much more efficient and probably use RMS (root mean square) averaging to make it perfect.
private List<float> BuildAverageSegments(float[] aryRawValues, int iSamplesPerSecond, int iSegmentsPerSecond)
{
    double nDurationInSeconds = aryRawValues.Length / (double) iSamplesPerSecond;
    int iNumSegments = (int) Math.Round(iSegmentsPerSecond * nDurationInSeconds);
    int iSamplesPerSegment = (int) Math.Round(aryRawValues.Length / (double) iNumSegments); // total number of samples divided by the total number of segments

    List<float> colAvgSegVals = new List<float>();
    for (int i = 0; i < iNumSegments - 1; i++)
    {
        int iStartIndex = i * iSamplesPerSegment;
        int iEndIndex = (i + 1) * iSamplesPerSegment;
        float fAverageSegVal = aryRawValues.Skip(iStartIndex).Take(iEndIndex - iStartIndex).Average();
        colAvgSegVals.Add(fAverageSegVal);
    }
    return colAvgSegVals;
}
Outside of this, you need to get your audio into a wav format. You should be able to find source everywhere to read that data, then use something like this to convert the raw byte data to floats - again, this is horribly rough and inefficient, but clear:
public float[] GetFloatData()
{
    // Scale factor - SignificantBitsPerSample
    if (Data != null && Data.Length > 0)
    {
        float nMaxValue = (float) Math.Pow((double) 2, SignificantBitsPerSample);
        float[] aryFloats = new float[Data[0].Length];
        for (int i = 0; i < Data[0].Length; i++)
        {
            aryFloats[i] = Data[0][i] / nMaxValue;
        }
        return aryFloats;
    }
    else
    {
        return null;
    }
}
I have a BufferedImage in Java and I want to record how similar each pixel is to another based on the color value, so that pixels with 'similar' colors will have a higher similarity value. For example, red and pink will have a similarity value of 1000, but red and blue will have something like 300 or less.
How can I do this? When I get the RGB from a BufferedImage pixel it returns a negative integer, and I am not sure how to implement this with that.
First, how are you getting the integer value?
Once you get the RGB values, you could try
√((r2 - r1)² + (g2 - g1)² + (b2 - b1)²)
This would give you the distance in 3D space from the two points, each designated by (r1,g1,b1) and (r2,g2,b2).
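In Java, that distance might look roughly like this (a minimal sketch, assuming you have already unpacked the channels as shown further down):
// Minimal sketch: Euclidean distance between two RGB colors treated as points in 3D space.
static double colorDistance(int r1, int g1, int b1, int r2, int g2, int b2) {
    int dr = r2 - r1;
    int dg = g2 - g1;
    int db = b2 - b1;
    return Math.sqrt(dr * dr + dg * dg + db * db);
}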
Or there are more sophisticated ways using the HSV value of the color.
HSL is a bad move. L*a*b is a color space designed to represent how color is actually perceived, and is based on data from hundreds of experiments involving people with real eyes looking at different colors and saying "I can tell the difference between those two. But not those two".
Distance in L*a*b space represents actual perceived distance according to the predictions derived from those experiments.
Once you convert into L*a*b you just need to measure linear distance in a 3D space.
I suggest you start reading here: Color difference formulas, if you want to do this right. It explains the ΔE*ab, ΔE*94, ΔE*00 and ΔE*CMC formulas for calculating color difference.
If you are going to use HSV, you need to realize that HSV values are not points in a three-dimensional space but rather the angle, magnitude, and distance-from-top of a cone. To calculate the distance between two HSV values, you first need to determine your points in 3D space by transforming:
X = Cos(H)*S*V
Y = Sin(H)*S*V
Z = V
for both points and then taking the Euclidean distance between them:
Sqrt((X0 - X1)*(X0 - X1) + (Y0 - Y1)*(Y0 - Y1) + (Z0 - Z1)*(Z0 - Z1))
At a cost of 2 Cos, 2 Sin, and a square root.
Alternatively, you can calculate the distance a bit more easily, if you're so inclined, by realizing that when flattened to 2D space you simply have two vectors from the origin, and applying the law of cosines to find the distance in XY space:
C² = A² + B² - 2*A*B*Cos(Theta)
where A = S*V of the first value, B = S*V of the second, and Theta is the difference in hue, H0 - H1.
Then you factor in Z, to expand the 2D space into 3D space.
A = S0*V0
B = S1*V1
dTheta = H1-H0
dZ = V0-V1
distance = sqrt(dZ*dZ + A*A + B*B - 2*A*B*Cos(dTheta))
Note that because the law of cosines gives us C², we just plug it right in there with the change in Z, which costs 1 Cos and 1 Sqrt. HSV is plenty useful, you just need to know what type of color space it's describing. You can't just slap the values into a Euclidean function and get something coherent out of it.
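A minimal Java sketch of that calculation (assuming hue in degrees and s, v normalized to 0..1):
// Minimal sketch: distance between two HSV colors using the law of cosines in the cone.
// Assumes h in degrees (0..360), s and v normalized to 0..1.
static double hsvDistance(double h0, double s0, double v0,
                          double h1, double s1, double v1) {
    double a = s0 * v0;                      // planar radius of the first point
    double b = s1 * v1;                      // planar radius of the second point
    double dTheta = Math.toRadians(h1 - h0); // hue difference as an angle
    double dZ = v0 - v1;                     // height difference along the cone axis
    return Math.sqrt(dZ * dZ + a * a + b * b - 2 * a * b * Math.cos(dTheta));
}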
The easiest approach is to convert both colours to an HSV value and find the difference in H values. Minimal change means the colours are similar. It's up to you to define a threshold, though.
You're probably calling getRGB() on each pixel, which returns the color as four 8-bit values packed into an int: the high byte is alpha, the next byte red, the next byte green, and the low byte blue. You need to separate out the channels. Even then, color similarity in RGB space is not so great; you might get much better results using HSL or HSV space. See here for conversion code.
In other words:
int a = (argb >> 24) & 0xff;
int r = (argb >> 16) & 0xff;
int g = (argb >> 8) & 0xff;
int b = argb & 0xff;
I don't know the specific byte ordering in Java buffered images, but I think that's right.
You could get the separate bytes as follows:
int rgb = bufferedImage.getRGB(x, y); // Returns by default ARGB.
int alpha = (rgb >>> 24) & 0xFF;
int red = (rgb >>> 16) & 0xFF;
int green = (rgb >>> 8) & 0xFF;
int blue = (rgb >>> 0) & 0xFF;
I find HSL values easier to understand. HSL Color explains how they work and provides the conversion routines. Like the other answer, you would need to determine what similar means to you.
There's an interesting paper on exactly this problem:
A New Perceptually Uniform Color Space with Associated Color Similarity Measure for Content-Based Image and Video Retrieval
by M. Sarifuddin and Rokia Missaoui
You can find this easily using Google, or in particular Google Scholar.
To summarise, some color spaces (e.g. RGB, HSV, Lab) and distance measures (such as Geometric mean and Euclidean distance) are better representations of human perception of color similarity than others. The paper talks about a new color space, which is better than the rest, but it also provides a good comparison of the common existing color spaces and distance measures. Qualitatively*, it seems the best measure for perceptual distance using commonly available color spaces is : the HSV color space and a cylindrical distance measure.
*At least, according to Figure 15 in the referenced paper.
The cylindrical distance measure is (in Latex notation):
D_{cyl} = \sqrt{\Delta V^{2} + S_1^{2} + S_2^{2} - 2 S_1 S_2 \cos(\Delta H)}
This is a similar question to #1634206.
If you're looking for the distance in RGB space, the Euclidean distance will work, assuming you treat red, green, and blue values all equally.
If you want to weight them differently, as is commonly done when converting color/RGB to grayscale, you need to weight each component by a different amount. For example, using the popular conversion from RGB to grayscale of 30% red + 59% green + 11% blue:
d2 = (30*(r1-r2))**2 + (59*(g1-g2))**2 + (11*(b1-b2))**2;
The smaller the value of d2, the closer the colors (r1,g1,b1) and (r2,g2,b2) are to each other.
But there are other color spaces to choose from than just RGB, which may be better suited to your problem.
Color perception is not linear because the human eye is more sensitive to certain colors than others.
So jitter answered correctly
I tried it out. The HSL/HSV value is definitely not useful. For instance:
all colors with L=0 are 'black' (RGB 000000), though their HSL difference may imply a high color distance.
all colors with S=0 are a shade of 'gray', though their HSL difference may imply a high color distance.
the H (hue) range begins and ends with a shade of 'red', so H=0 and H=[max] (360°, or 100%, or 240, depending on the application) are both red and relatively similar to each other, but the Euclidean HSL distance is close to maximum.
So my recommendation is to use the squared Euclidean RGB distance, (r2-r1)² + (g2-g1)² + (b2-b1)², without the root. The (subjective) threshold of 1000 then works fine for similar colors. Colors with differences > 1000 are well distinguishable by the human eye. Additionally, it can be helpful to weight the components differently (see the previous post).
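As a minimal sketch of that check (threshold of 1000 taken from above, channel unpacking as shown earlier):
// Minimal sketch: squared RGB distance compared against the suggested threshold of 1000.
static boolean isSimilar(int rgb1, int rgb2) {
    int dr = ((rgb1 >> 16) & 0xFF) - ((rgb2 >> 16) & 0xFF);
    int dg = ((rgb1 >> 8) & 0xFF) - ((rgb2 >> 8) & 0xFF);
    int db = (rgb1 & 0xFF) - (rgb2 & 0xFF);
    return dr * dr + dg * dg + db * db <= 1000;
}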