I'm looking for an algorithm that can measure the perceptual similarity between two images. I want to give the system one picture, have it search my whole database, which contains a huge number of pictures, and retrieve the images that are perceptually most similar to the source image. Could anybody please help me?
In other words, I want to find similar pictures. I have heard that some algorithms can find similar pictures based on the source picture's shape, color, etc. (pixel by pixel). I want a system where I input the source image and the system retrieves similar images based on perceptual features such as shape, color and size.
Thanks
You need to define carefully what 'perceptually similar' means to you before trying to find a measurable entity that captures it. Imagine a picture of a grass field under a blue sky with a horse. Should your application retrieve all horse pictures? Or all pictures with green grass and a blue sky? In the latter case, the above-mentioned color histograms are a good start. Alternatively you could look at Gaussian mixture models (GMM); they are used quite a bit in retrieval. This code could be a starting point, as could this article: Image retrieval using color histograms generated by Gauss mixture vector quantization.
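To make the color-histogram idea concrete, here is a minimal sketch in plain Python. It assumes each image is already available as a list of (r, g, b) tuples; the function names, the bin count and the toy data are made up for illustration, and histogram intersection is only one of several possible comparison measures.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Normalized, coarsely quantized RGB histogram of a list of (r, g, b) pixels."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = float(len(pixels))
    return {bin_: n / total for bin_, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())

def rank_by_color(query_pixels, database):
    """Return database names sorted from most to least similar to the query."""
    q = color_histogram(query_pixels)
    scores = {name: histogram_intersection(q, color_histogram(px))
              for name, px in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: a "grass under sky" query versus two hypothetical database images.
green, blue, red = (30, 160, 40), (60, 120, 230), (200, 30, 30)
query = [green] * 50 + [blue] * 50
database = {"meadow": [green] * 60 + [blue] * 40,
            "sunset": [red] * 90 + [blue] * 10}
print(rank_by_color(query, database))
```

For a huge database you would compute and store the histograms once, not on every query.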
More complicated is the so-called "bag of words" or "visual words" approach. It is increasingly used for image categorization and identification. This algorithm usually starts by detecting robust points in an image, meaning points that will survive certain image distortions. Popular detectors are SIFT and SURF. The region around each detected point is captured with a descriptor, which could for example be a smart histogram.
In its simplest form, one collects all descriptors from all images and clusters them, for example using k-means. Every original image then has descriptors that contribute to a number of clusters. The centroids of these clusters, i.e. the visual words, can be used to build a new descriptor for the image. The VLFeat website contains a nice demo of this approach, classifying the Caltech 101 dataset. Also noteworthy are the results and software from Caltech itself.
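The clustering step described above can be sketched as follows. This is only an illustration: the descriptors are hypothetical 2-D vectors standing in for real SIFT/SURF output, the k-means is a plain textbook version with a fixed seed, and all names are made up.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, d):
    """Index of the centroid (visual word) closest to descriptor d."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], d))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster went empty
                centroids[i] = [sum(c) / len(g) for c in zip(*g)]
    return centroids

def bow_histogram(descriptors, centroids):
    """Normalized count of how often each visual word occurs in one image."""
    hist = [0.0] * len(centroids)
    for d in descriptors:
        hist[nearest(centroids, d)] += 1.0
    return [h / len(descriptors) for h in hist]

# Hypothetical 2-D descriptors per image (real SIFT descriptors have 128 dimensions).
images = {
    "a": [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1]],
    "b": [[5.1, 5.0], [4.9, 5.0], [0.0, 0.0]],
}
all_descriptors = [d for descs in images.values() for d in descs]
vocabulary = kmeans(all_descriptors, k=2)
signatures = {name: bow_histogram(descs, vocabulary) for name, descs in images.items()}
print(signatures)
```

The resulting signatures are fixed-length vectors, so they can be compared with any histogram distance or fed to a classifier.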
One simple way to start is to compare color histograms.
The following article, however, proposes using joint histograms instead. It may be worth a look.
http://www.cs.cornell.edu/rdz/joint-histograms.html
Related
I want an efficient way to find sub-images within an image. For example, we have an image of a country map that contains the states as sub-images.
I need a way to extract the sub-images of the states from the country map.
If you have only pixels, you'll need image processing algorithms to find sub-images.
Your solution will be specific to what the image you are processing looks like. For example, if states are outlined in a certain color, you could try edge detection. If states are each different colors, you could run a flood fill like algorithm to create boundaries for each state.
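As a rough illustration of the flood-fill idea, the sketch below labels connected regions of identical color and reports a bounding box for each. It assumes the image is a small grid where each cell holds one color value (letters here); the function names and the toy map are invented for the example, and real maps with anti-aliased borders would need a color tolerance.

```python
from collections import deque

def label_regions(grid):
    """Label 4-connected regions of equal value; returns (labels, region_count)."""
    h, w = len(grid), len(grid[0])
    labels = [[-1] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            # Breadth-first flood fill from an unlabeled seed pixel.
            queue = deque([(sy, sx)])
            labels[sy][sx] = count
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and grid[ny][nx] == grid[y][x]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count

def bounding_boxes(labels, count):
    """Bounding box (top, left, bottom, right) of each labeled region."""
    boxes = [[len(labels), len(labels[0]), -1, -1] for _ in range(count)]
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            b = boxes[lab]
            b[0], b[1] = min(b[0], y), min(b[1], x)
            b[2], b[3] = max(b[2], y), max(b[3], x)
    return [tuple(b) for b in boxes]

# Toy "map": each letter stands for one state's fill color.
country = ["AAB",
           "ACB",
           "CCB"]
labels, n = label_regions(country)
print(n, bounding_boxes(labels, n))
```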
This is a difficult problem. Try computer vision or object recognition for keywords in your research.
However, I suggest instead you build up vector files which define boundaries of subimages by hand. If you've only a few to do, this isn't a big deal.
What is the best way to identify an image's type? rwong's answer on this question suggests that Google segments images into the following groups:
Photo - continuous-tone
Clip art - smooth shading
Line drawing - bitonal
What is the best strategy for classifying an image into one of those groups? I'm currently using Java but any general approaches are welcome.
Thanks!
Update:
I tried the unique colour counting method that tyjkenn mentioned in a comment and it seems to work for about 90% of the cases that I've tried. In particular black and white photos are hard to correctly detect using unique colour count alone.
Getting the image histogram and counting the peaks alone doesn't seem like it will be a viable option. For example this image only has two peaks:
Here are two more images I've checked out:
Here are some rather simple but effective approaches to differentiate between drawings and photos. Use them in combination to achieve the best accuracy:
1) Mime type or file extension
PNGs are typically clip arts or drawings, while JPEGs are mostly photos.
2) Transparency
If the image has an alpha channel, it's most likely a drawing. If an alpha channel exists, you can additionally iterate over all pixels to check whether transparency is actually used. Here is some Python example code:
from PIL import Image
img = Image.open('test.png')
transparency = False
# Only modes with an alpha channel (or a palette transparency entry) can be transparent.
if img.mode in ('RGBA', 'RGBa', 'LA') or (img.mode == 'P' and 'transparency' in img.info):
    if img.mode != 'RGBA':
        img = img.convert('RGBA')
    # True if any pixel has an alpha value below the (tunable) threshold of 220.
    transparency = any(px[3] < 220 for px in img.getdata())
print('Transparency:', transparency)
3) Color distribution
Clip art often has regions of identical color. If a few colors make up a significant part of the image, it is more likely a drawing than a photo. This code outputs the percentage of the image area that is covered by the ten most used colors (Python example):
from PIL import Image
img = Image.open('test.jpg')
# Shrink first to speed up counting; LANCZOS replaces the removed ANTIALIAS alias.
img.thumbnail((200, 200), Image.LANCZOS)
w, h = img.size
# getcolors() returns (count, color) pairs; keep the ten largest counts.
counts = sorted((count for count, _ in img.convert('RGB').getcolors(w * h)), reverse=True)
print(100.0 * sum(counts[:10]) / (w * h))
You need to adapt and optimize those values. Is ten colors enough for your data? Which percentage works best for you? Find out by testing a larger number of sample images. 30% or more typically indicates clip art, though not for sky photos and the like. Therefore we need another method, the next one.
4) Sharp edge detection via FFT
Sharp edges result in high frequencies in a Fourier spectrum. And typically such features are more often found in drawings (another Python snippet):
from PIL import Image
import numpy as np
img = Image.open('test.jpg').convert('L')
# Magnitude of every 2-D Fourier coefficient of the grayscale image.
values = np.abs(np.fft.fft2(np.asarray(img))).flatten()
# Percentage of coefficients whose magnitude exceeds the threshold.
high_values_ratio = 100.0 * np.count_nonzero(values > 10000) / values.size
print(high_values_ratio)
This code gives you the percentage of frequency components whose magnitude is above the threshold (10000 in the snippet). Again: optimize such numbers according to your sample images.
Combine and optimize these methods for your image set. Let me know if you can improve this - or just edit this answer, please. I'd like to improve it myself :-)
This problem can be solved by image classification and that's probably Google's solution to the problem. Basically, what you have to do is (i) get a set of images labeled into 3 categories: photo, clip-art and line drawing; (ii) extract features from these images; (iii) use the image's features and label to train a classifier.
Feature Extraction:
In this step you have to extract visual information that may be useful for the classifier to discriminate between the 3 categories of images:
A very basic yet useful visual feature is the image histogram and its variants. For example, the gray level histogram of a photo is probably smoother than a histogram of a clipart, where you have regions that may be all of the same color value.
Another feature one can use is obtained by converting the image to the frequency domain (e.g. using FFT or DCT) and measuring the energy of the high-frequency components. Because line drawings will probably have sharp color transitions, their high-frequency components will tend to accumulate more energy.
There's also a number of other feature extraction algorithms that may be used.
Training a Classifier:
After the feature extraction phase, we will have for each image a vector of numeric values (let's call it the image feature vector) and its label. That's a suitable input for training a classifier. As for the classifier, one may consider neural networks, SVMs and others.
Classification:
Now that we have a trained classifier, to classify an image (i.e. detect an image's category) we simply have to extract its features and feed them to the classifier, which will return the predicted category.
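The train/classify pipeline above can be sketched with a deliberately simple nearest-centroid classifier. Note that this is a stand-in chosen for brevity, not the neural network or SVM mentioned in the answer, and the two features and all numbers are made up for illustration.

```python
def train_nearest_centroid(samples):
    """samples: list of (feature_vector, label). Returns {label: mean vector}."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(model, features):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(model[label], features))
    return min(model, key=dist)

# Made-up feature vectors: [share of top-10 colors, share of high-frequency energy].
training = [
    ([0.05, 0.10], "photo"), ([0.10, 0.15], "photo"),
    ([0.60, 0.30], "clip-art"), ([0.70, 0.35], "clip-art"),
    ([0.95, 0.70], "line drawing"), ([0.98, 0.80], "line drawing"),
]
model = train_nearest_centroid(training)
print(classify(model, [0.08, 0.12]))
```

With real data you would hold out part of the labeled set to measure accuracy before trusting the predictions.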
Histograms would be a first way to do this.
Convert the color image to grayscale and calculate the histogram.
A strongly bimodal histogram with two sharp peaks, one in black (or dark) and one in white (or bright), probably with much more white, is a good indication of a line drawing.
If you have just a few more peaks then it is likely a clip-art type image.
Otherwise it's a photo.
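A crude way to turn this rule into code is to ask how many gray levels are needed to cover most of the pixels, which is a stand-in for counting sharp peaks. The sketch below assumes the image is already a flat list of gray values from 0 to 255; the 90% coverage and the two cut-offs are guesses that would need tuning on real images.

```python
def bins_needed(gray_pixels, coverage=0.9):
    """Number of gray levels (0-255) needed to cover `coverage` of all pixels."""
    hist = [0] * 256
    for v in gray_pixels:
        hist[v] += 1
    covered, needed = 0, 0
    for count in sorted(hist, reverse=True):
        if covered >= coverage * len(gray_pixels):
            break
        covered += count
        needed += 1
    return needed

def guess_type(gray_pixels, line_max=2, clip_max=8):
    """Few dominant gray levels: line drawing; a handful: clip art; else photo."""
    n = bins_needed(gray_pixels)
    if n <= line_max:
        return "line drawing"
    if n <= clip_max:
        return "clip art"
    return "photo"

line = [255] * 900 + [0] * 100           # mostly white, some black strokes
clip = [v for v in (20, 80, 140, 200, 250) for _ in range(200)]  # five flat tones
photo = [i % 256 for i in range(2560)]   # evenly spread gray levels
print(guess_type(line), guess_type(clip), guess_type(photo))
```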
In addition to color histograms, also consider edge information and the consistency of line widths throughout the image.
Photo - natural edges will have a variety of edge strengths, and it's less likely that there will be many parallel edges.
Clip art - A watershed algorithm could help identify large, connected regions of consistent brightness. In clip art and synthetic images designed for high visibility there are more likely to be perfectly straight lines and parallel lines. A histogram of edge strengths is likely to have a few very strong peaks.
Line drawing - synthetic lines are likely to have very consistent width. The Stroke Width Transform could help you identify strokes. (One of the basic principles is to find edge gradients that "point at" each other.) A histogram of edge strengths may have only one strong peak.
Ok I'm writing a small Java app that accepts two images as inputs, compares them, then gives a quantitative output as a measure of similarity (eg. 50% similar).
To my understanding FFT is a good way to measure similarity of two images. But I can't for the love of god figure out how to code/implement it.
So far I've implemented another function which basically gives me two histograms (one for each image). All I need now is to write a method that will FFT an image and give me a quantifiable outcome.
Can anyone help me out with this? I'd really like to see some sample code, or at least a pointer in the right direction. Many thanks in advance.
Similarity is not an exact term. For example, if you have a circle and an ellipse, are they similar? They are both round objects, so in this sense they are, but if we want to filter out circles only, they are not. You will have to define a measure (or measures, for example roundness, intensity distribution, size, orientation, number of objects, Euler number, etc.), then calculate it for each image. The similarity of the two images will be (some kind of) distance between the two calculated values. This could be the Euclidean distance (for two real-valued measures), or some kind of error function (RMS for intensity distributions).
You will also have to choose which transforms your measure should be invariant to (is the rotated image similar to the original? If yes, a simple Fourier transform is not appropriate).
Measuring the similarity of images is hard; if you have to do that, I would read about image stitching. If you just need to distinguish BLOBs, first try to calculate some simple measures (I recommend calculating moments such as area and orientation; read up on K-means clustering), or the 1D Fourier transform of the distance of the contour from the center of mass (which is a little bit more difficult).
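To illustrate the "distance between measures" idea from this answer, here is a small sketch. The measures themselves (roundness, normalized area) are assumed to have been computed elsewhere, the numbers are invented, and the mapping from distance to a percentage is just one arbitrary choice.

```python
import math

def euclidean(m1, m2):
    """Euclidean distance between two vectors of scalar measures."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(m1, m2)))

def rms_error(d1, d2):
    """Root-mean-square difference between two equally long distributions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)) / len(d1))

def similarity_percent(distance, scale):
    """Map a distance to 0..100; `scale` is the distance treated as 0% similar."""
    return max(0.0, 100.0 * (1.0 - distance / scale))

# Made-up measures [roundness, normalized area] for a circle and an ellipse.
circle, ellipse = [1.0, 0.5], [0.7, 0.5]
print(similarity_percent(euclidean(circle, ellipse), scale=1.0))
```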
Before you attempt to code up a 2DFT, you should fully understand the math behind it. flolo is correct that you can compute it by first doing a 1D FFT on the rows and columns and then combining the results, but I have no reason to believe the L_inf norm is the best way to convert them to a metric, since it completely skips the usual combining step to create the full 2DFT. Take a look at http://fourier.eng.hmc.edu/e101/lectures/Image_Processing/node6.html at the very bottom of the page.
That said, there may be better ways to compare images that don't require comparing 2D arrays of information. For instance, PCA (Principal Component Analysis, which is just a matter of running SVD {Singular Value Decomposition} on your images after mean-centering them, though I'd take a look at the Wikipedia article on it first) will give you a 1D vector to which you could then apply some L_p norm directly to compare, although in this case I would use something like sum(min(a_i/b_i , b_i/a_i))/length(a), where a and b are the 1D vectors you got from the transform.
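The ratio-based score suggested at the end of this answer is easy to write down. The sketch below assumes a and b are equally long vectors whose entries are non-zero and pairwise of the same sign (otherwise the ratios are not meaningful); the function name is made up.

```python
def ratio_similarity(a, b):
    """Mean of min(a_i/b_i, b_i/a_i); 1.0 for identical vectors.

    Assumes all entries are non-zero and a_i, b_i share the same sign.
    """
    return sum(min(x / y, y / x) for x, y in zip(a, b)) / len(a)

print(ratio_similarity([2.0, 4.0], [2.0, 4.0]))  # identical vectors
print(ratio_similarity([2.0, 4.0], [1.0, 8.0]))  # each entry off by a factor of 2
```

Multiplying the result by 100 gives the kind of "percent similar" figure the question asks for.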
There are many good sites with code for an FFT on a 1-D array of values. You just apply this FFT row by row to your image, and afterwards you apply it column-wise to the results.
Now you need a metric to compare the resulting transformed images. My suggestion would be to try the max-norm (L_inf), that is max_{x,y} |fft2d(img1)[x,y] - fft2d(img2)[x,y]|.
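Here is a small sketch of the row-then-column idea together with the max-norm. To stay self-contained it uses a naive O(n^2) DFT in place of a real FFT library, so it is only suitable for tiny images; images are assumed to be equally sized lists of rows of gray values, and all names are invented.

```python
import cmath

def dft(values):
    """Naive O(n^2) 1-D discrete Fourier transform (stand-in for a real FFT)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values))
            for k in range(n)]

def dft2d(image):
    """2-D transform: 1-D transform of every row, then of every column."""
    rows = [dft(row) for row in image]
    cols = [dft(col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order

def max_norm_distance(img1, img2):
    """L_inf distance between the 2-D spectra of two same-sized images."""
    f1, f2 = dft2d(img1), dft2d(img2)
    return max(abs(a - b) for r1, r2 in zip(f1, f2) for a, b in zip(r1, r2))

a = [[1, 2], [3, 4]]
b = [[1, 2], [3, 5]]
print(max_norm_distance(a, a), max_norm_distance(a, b))
```

Keep in mind the objection raised elsewhere in this thread: whether L_inf on the spectrum is a good similarity measure is debatable.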
If you just want to check whether it is likely that one image is a quick edit of another, for something like DRM of stock photography, then check the percentages of a normalized color palette within probable regions. If they match within a THRESHOLD for a NUMBER_OF_TEST_COLORS in any one of a number of TEST_REGIONS within the image, then you have a "suspect"... you still need a human to check the suspects. But this is a quick and dirty way to find many of the image re-sizers, horiz/vert flippers, background color changers, file format changers, and other subtle variations... Of course "normalizing the colors" to a quantized palette is an art unto itself. I would recommend quantizing images to the nearest "web safe" colors for practicality.
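A rough sketch of that check for a single pair of regions might look like this. It assumes each region is a list of (r, g, b) tuples and snaps every channel to the nearest multiple of 51, which yields the 216 web-safe colors; the default threshold and number of test colors are placeholders, and the sample pixels are made up.

```python
from collections import Counter

def web_safe(pixel):
    """Snap each channel to the nearest multiple of 51 (the 216 web-safe colors)."""
    return tuple(int(round(c / 51.0)) * 51 for c in pixel)

def palette_shares(pixels):
    """Share of the region covered by each quantized color."""
    counts = Counter(web_safe(p) for p in pixels)
    return {color: n / float(len(pixels)) for color, n in counts.items()}

def is_suspect(region_a, region_b, threshold=0.05, number_of_test_colors=3):
    """True if the most common colors of region_a occur in region_b with
    nearly the same share (within `threshold`)."""
    a, b = palette_shares(region_a), palette_shares(region_b)
    top = sorted(a, key=a.get, reverse=True)[:number_of_test_colors]
    return all(abs(a[c] - b.get(c, 0.0)) <= threshold for c in top)

original = [(250, 10, 10)] * 70 + [(10, 10, 250)] * 30
edited = [(12, 8, 247)] * 31 + [(255, 5, 12)] * 69    # flipped and recompressed
unrelated = [(10, 250, 10)] * 100
print(is_suspect(original, edited), is_suspect(original, unrelated))
```

Because only color shares are compared, the pixel order inside a region does not matter, which is what makes the check robust to flips and resizes.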
I'm a blue collar garbage man in comparison to a mathematician, but garbage men are quite practical! I have had good success with this kind of approach in grouping similar images and search by color applications.
In weka I load an arff file. I can view the relationship between attributes using the visualize tab.
However I can't understand the meaning of the jitter slider. What is its purpose?
You can find the answer in the mailing list archives:
The jitter function in the Visualize panel just adds artificial random
noise to the coordinates of the plotted points in order to spread the
data out a bit (so that you can see points that might have been
obscured by others).
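The effect is easy to reproduce outside Weka. The sketch below is not Weka's implementation, just an illustration of the idea with made-up names: it adds small uniform noise to each plotted coordinate so that coinciding points separate.

```python
import random

def jitter(points, amount, seed=None):
    """Return a copy of (x, y) points with uniform noise in [-amount, amount] added."""
    rng = random.Random(seed)
    return [(x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
            for x, y in points]

# Five instances that share the same values plot as a single dot...
points = [(1.0, 2.0)] * 5
# ...but become distinguishable once jittered.
spread = jitter(points, amount=0.1, seed=42)
print(len(set(points)), len(set(spread)))
```

The noise only affects where points are drawn; the underlying data is left unchanged.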
I don't know Weka, but generally jitter is a term for the variation of a periodic signal relative to some reference interval. I'm guessing the slider allows you to set some range or threshold below which data points are treated as being regular, or to modify the output to introduce some variation. The Wikipedia entry can give you some background.
Update: from this pdf, the jitter slider is for this purpose:
“Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
Based on the accompanying slide it looks like it introduces some variation in the visualisation, perhaps to show when two data points overlap.
Update 2: This google books extract (to Data mining By Ian H. Witten, Eibe Frank) seems to confirm my guess:
[jitter] is a random displacement applied to X and Y values to separate points that lie on top of one another. Without jitter, 1000 instances at the same data point would look just the same as 1 instance
I don't know the products you mention, but jittering generally means randomising the sample positions. E.g., in ray tracing you would normally render a ray through each pixel on the screen. Jittering adds a random offset to each ray to reduce issues caused by regular aliasing.
Given:
two images of the same subject matter;
the images have the same resolution, colour depth, and file format;
the images differ in size and rotation; and
two lists of (x, y) co-ordinates that correlate the images.
I would like to know:
How do you transform the larger image so that it visually aligns to the second image?
(Optional.) What is the minimum number of points needed to get an accurate transformation?
(Optional.) How far apart do the points need to be to get an accurate transformation?
The transformation would need to rotate, scale, and possibly shear the larger image. Essentially, I want to create (or find) a program that does the following:
Input two images (e.g., TIFFs).
Click several anchor points on the small image.
Click the several corresponding anchor points on the large image.
Transform the large image such that it maps to the small image by aligning the anchor points.
This would help align pictures of the same stellar object. (For example, a hand-drawn picture from 1855 mapped to a photograph taken by Hubble in 2000.)
Many thanks in advance for any algorithms (preferably Java or similar pseudo-code), ideas or links to related open-source software packages.
This is called Image Registration.
Mathworks discusses this, Matlab has this ability, and more information is in the Elastix Manual.
Consider:
Open source Matlab equivalents
IRTK
IRAF
Hugin
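If rotation, scaling and shear are all you need, the mapping is an affine transform, which has six unknowns; each pair of corresponding points gives two equations, so at least three non-collinear pairs are required, and extra pairs help average out clicking errors. A minimal least-squares sketch follows. The function names and the anchor points are made up, and it only estimates the transform; resampling the pixels is a separate step.

```python
def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("points are collinear or fewer than three")
    out = []
    for col in range(3):
        replaced = [[rhs[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        out.append(det3(replaced) / d)
    return out

def fit_affine(src, dst):
    """Least-squares affine map taking src points (x, y) onto dst points (u, v).

    Returns ((a, b, c), (d, e, f)) with u = a*x + b*y + c and v = d*x + e*y + f.
    """
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations: (A^T A) p = A^T u, solved once for u and once for v.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs_u = [sum(r[i] * u for r, (u, _) in zip(rows, dst)) for i in range(3)]
    rhs_v = [sum(r[i] * v for r, (_, v) in zip(rows, dst)) for i in range(3)]
    return tuple(solve3(m, rhs_u)), tuple(solve3(m, rhs_v))

def apply_affine(t, point):
    (a, b, c), (d, e, f) = t
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)

# Made-up anchor points: the small image is the large one scaled by 0.5
# and rotated by 90 degrees.
large = [(0, 0), (10, 0), (0, 10), (10, 10)]
small = [(0, 0), (0, 5), (-5, 0), (-5, 5)]
t = fit_affine(large, small)
print(apply_affine(t, (4, 2)))
```

Points spread widely over the image constrain the fit better than points clustered in one corner.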
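You can use java.awt.geom.AffineTransform together with java.awt.image.AffineTransformOp, or the Java Advanced Imaging API, to rotate, shear and scale the images once you have worked out which transformation you want; javax.imageio handles reading and writing the image files.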
For a C++ implementation (without GUI), try the old KLT (Kanade-Lucas-Tomasi) tracker.
http://www.ces.clemson.edu/~stb/klt/