Find and crop the relevant image area automatically - Java

We are trying to crop the relevant area of an image (photo) with a square aspect ratio (1:1), similar to what Facebook does when creating thumbnails.
In our case, it doesn't really matter whether the crop keeps the original height of the image being processed (or the original width when the orientation is portrait, h > w), or whether the auto-crop resizes the result as well.
I am thinking of algorithms such as comparing objects against the background, using focus, or something like a heat map that combines colors and/or areas to find the most relevant part. There could be several ideas/methods for finding the main part of the image, similar to face detection.
We are looking for a Java (Android)-based solution or anything that can be adapted for Java / Android. Any help or idea would be greatly appreciated! Thank you!

I would do this in two steps, where the initial step is more robust and the second could be based on, for example, entropy. For the first step, you can use SURF, which is relatively common nowadays, and I would expect to find Java implementations of it. SURF gives a set of key points that it considers important for describing your image. Given where these key points lie, you have a set of (x, y) coordinates which you can use to reduce the initial image to the area that encloses them. Since these key points might be anywhere in your image, you will probably want to discard some of them (i.e., those that are too far from the others -- outliers). A very simple way to do this discarding step is to consider the convex hull of the initial set of key points; from there, you can peel this hull multiple times. Each time you "peel" it, you effectively discard the points on the current convex hull.
Here is a sample of such a first step (in Mathematica):
f = Import["http://fohn.net/duck-pictures-facts/mallard-duck.jpg"];
kp = ImageKeypoints[f, MaxFeatures -> 200];
Show[f, Graphics[{PointSize[Medium], Red, Point[kp]}]]
After peeling the convex hull formed by the key points once and trimming the image to the bounding rectangle of the remaining points:
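Since the question asks for Java/Android, a rough equivalent of that first step with OpenCV's Java bindings might look like the sketch below. It uses ORB instead of SURF (SURF lives in the non-free contrib module) and simply crops to the bounding rectangle of the detected key points, leaving out the hull peeling and the 1:1 aspect ratio:

import org.opencv.core.*;
import org.opencv.features2d.ORB;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;
import java.util.ArrayList;
import java.util.List;

public class KeypointCrop {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        Mat img = Imgcodecs.imread("mallard-duck.jpg");

        // Detect up to 200 key points (ORB here; SURF would need the contrib build)
        ORB orb = ORB.create(200);
        MatOfKeyPoint keyPoints = new MatOfKeyPoint();
        orb.detect(img, keyPoints);

        // Collect the key point coordinates and take their bounding rectangle
        List<Point> pts = new ArrayList<>();
        for (KeyPoint kp : keyPoints.toArray()) {
            pts.add(kp.pt);
        }
        MatOfPoint ptsMat = new MatOfPoint();
        ptsMat.fromList(pts);
        Rect box = Imgproc.boundingRect(ptsMat);

        // Crop the image to that rectangle
        Mat cropped = new Mat(img, box);
        Imgcodecs.imwrite("cropped.jpg", cropped);
    }
}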
From the cropped image, you can decide which sub-region to pick based on some other method. One that is apparently common is the one used by Reddit, which successively removes slices of lower entropy from the image. A quick search turned up one such implementation at https://github.com/christopherhan/pycrop/blob/master/pycrop.py#L33; it is very simple.
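A minimal Java sketch of that entropy idea (not a port of pycrop, just the principle), assuming an int[][] of 0-255 gray values; trimming columns for landscape images is analogous:

// Shannon entropy of the gray-level histogram of rows [top, bottom)
static double entropy(int[][] gray, int top, int bottom, int width) {
    int[] hist = new int[256];
    int total = 0;
    for (int y = top; y < bottom; y++) {
        for (int x = 0; x < width; x++) {
            hist[gray[y][x]]++;
            total++;
        }
    }
    double e = 0.0;
    for (int count : hist) {
        if (count == 0) continue;
        double p = (double) count / total;
        e -= p * (Math.log(p) / Math.log(2));
    }
    return e;
}

// Trim rows until height == width, dropping the less informative band each step
static int[] cropVertically(int[][] gray, int width, int height) {
    int top = 0, bottom = height;
    int slice = Math.max(1, height / 20);
    while (bottom - top > width) {
        int step = Math.min(slice, bottom - top - width);
        double topEntropy = entropy(gray, top, top + step, width);
        double bottomEntropy = entropy(gray, bottom - step, bottom, width);
        if (topEntropy < bottomEntropy) top += step; else bottom -= step;
    }
    return new int[] { top, bottom };   // range of rows to keep
}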
Another kind of method you might want to try is called seam carving. Also note that, depending on how large the initial image is, cropping a small piece of it is unlikely to give anything relevant. In those cases, it is more interesting to first resize the image and then apply the relevant methods.

Related

Image processing - OpenCV, Identifying digits

I am new to image processing and to opencv in particular.
I am working on an OCR project in which I need to identify numbers.
This is my image to process:
Let's say I have already optimized the image; my questions are:
In the image the numbers always appear several times. Let's say I have found the contours; how can I know which one is the best one to process?
How can I know by what angle I need to rotate each contour to make it straight?
In the image the numbers always appear several times. Let's say I have found the contours; how can I know which one is the best one to process?
You always want the biggest numbers, because they are the least warped by perspective. So you always want the numbers in the middle of the image, because they are also in the middle of the ball.
How can I know by what angle I need to rotate each contour to make it straight?
Have a look at rotated rect. I explained how to find the angle in this thread.
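For reference, a minimal OpenCV-Java sketch of that idea (the contour is assumed to come from Imgproc.findContours):

// imports from org.opencv.core and org.opencv.imgproc
// Rotate the source image so that the contour's min-area rectangle is upright
static Mat deskew(Mat image, MatOfPoint contour) {
    MatOfPoint2f contour2f = new MatOfPoint2f(contour.toArray());
    RotatedRect box = Imgproc.minAreaRect(contour2f);
    double angle = box.angle;   // note: the angle convention differs between OpenCV versions

    Mat rotation = Imgproc.getRotationMatrix2D(box.center, angle, 1.0);
    Mat rotated = new Mat();
    Imgproc.warpAffine(image, rotated, rotation, image.size());
    return rotated;
}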
Since you always have a perfectly centered ball, you should think about using mapping to "unwarp" your ball (so do a projection like from the globe onto a map). It should be pretty straightforward afterwards to find the numbers on the flat image.
Edit: Since you only have 10 numbers you might also "brute force" the solution with a big enough training set. So just throw all numbers you detect into a classifier and keep the most likely solution.
1) I agree with @Sebastian on the first part. Exploit the fact that in your scenario the numbers are placed on the surface of a ball, so first select the blobs inside a centered region of interest.
2) The contours shown in the image are not rotated (the numbers are). Instead of "rotating" these bounding boxes, which seems to be quite a headache, I'd rather use them combined with rotation invariant keypoints. I'll clarify this:
a) You know where your numbers are, so you don't have to search in the entire image. OK, keep these already selected regions in mind.
b) You can take "straight" samples of the numbers 0-9 and use them as ground truth.
c) You can perform a matching search between each "ground truth" image and each candidate region. Now, forget the scale/rotation: use scale/rotation invariant keypoints! Something like this:
Again, notice that you have already selected the region of interest, so in your case the search will consist of checking the number of matches (number of blue lines) between each of the registered numbers and your candidate, as sketched below. I think it's worth a try! :)
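A hedged OpenCV-Java sketch of such a matching score, using ORB as the rotation-invariant keypoint/descriptor (the two Mats are assumed to be already-cropped images of a ground-truth digit and a candidate blob):

// imports from org.opencv.core and org.opencv.features2d
static int matchScore(Mat templateDigit, Mat candidateRegion) {
    ORB orb = ORB.create();
    MatOfKeyPoint kp1 = new MatOfKeyPoint(), kp2 = new MatOfKeyPoint();
    Mat desc1 = new Mat(), desc2 = new Mat();
    orb.detectAndCompute(templateDigit, new Mat(), kp1, desc1);
    orb.detectAndCompute(candidateRegion, new Mat(), kp2, desc2);

    // Brute-force Hamming matcher with cross-check; more matches = better candidate
    BFMatcher matcher = BFMatcher.create(Core.NORM_HAMMING, true);
    MatOfDMatch matches = new MatOfDMatch();
    matcher.match(desc1, desc2, matches);
    return matches.toArray().length;    // the "number of blue lines"
}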
You can find more info on the different keypoints available in opencv here.
Hope that it helps!

How to do perspective fixing?

I'm searching for a fast way to fix the perspective of a picture, in Java or any other language. Currently I don't really have any idea how to do it, nor can I find anything useful on Google.
Input:
Point[4] , Color[][]
Output:
Perspective-Fixed Color[][]
By perspective fixing, I mean what Photoshop does. Just like:
I'd appreciate it if you could tell me how the code works, since I want to understand the logic.
The simple solution is to just remap coordinates from the original to the final image, copying pixels from one coordinate space to the other and rounding off as necessary -- which may result in some pixels being copied several times adjacent to each other, and other pixels being skipped, depending on whether you're stretching or shrinking (or both) in either dimension. Make sure your copying iterates through the destination space, so all pixels are covered there even if they're painted more than once, rather than through the source, which may skip pixels in the output.
The better solution involves calculating the corresponding source coordinate without rounding, and then using its fractional position between pixels to compute an appropriate average of the (typically) four pixels surrounding that location. This is essentially a filtering operation, so you lose some resolution -- but the result looks a LOT better to the human eye; it does a much better job of retaining small details and avoids creating straight-line artifacts which humans find objectionable.
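If a library is acceptable, OpenCV's Java bindings already implement exactly this destination-driven remap with bilinear filtering; a hedged sketch, assuming the four corner points and the target size are known:

// imports from org.opencv.core and org.opencv.imgproc
// srcCorners: the four corner points of the skewed region, in clockwise order
static Mat fixPerspective(Mat src, Point[] srcCorners, int w, int h) {
    MatOfPoint2f from = new MatOfPoint2f(srcCorners);
    MatOfPoint2f to = new MatOfPoint2f(
            new Point(0, 0), new Point(w, 0),
            new Point(w, h), new Point(0, h));

    // 3x3 homography that maps the source corners onto an upright w x h rectangle
    Mat homography = Imgproc.getPerspectiveTransform(from, to);

    // Iterates over the destination and samples the source with bilinear filtering
    Mat fixed = new Mat();
    Imgproc.warpPerspective(src, fixed, homography, new Size(w, h), Imgproc.INTER_LINEAR);
    return fixed;
}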
Note that the same basic approach can be used to remap flat images onto any other shape, including 3D surface mapping.

Need Conceptual Help Rendering a Heat Map

I need to create a heatmap for android google maps. I have geolocation and points that have negative and positive weight attributed to them that I would like to visually represent. Unlike the majority of heatmaps, I want these positive and negative weights to destructively interfere; that is, when two points are close to each other and one is positive and the other is negative, the overlap of them destructively interferes, effectively not rendering areas that cancel out completely.
I plan on using the Android Google Maps TileOverlay/TileProvider class, which has the job of creating/rendering tiles based on a given location and zoom. (I don't have an issue with this part.)
How should I go about rendering these Tiles? I plan on using java's Graphics class but the best that I can think of is going through each pixel, calculating what color it should be based on the surrounding data points, and rendering that pixel. This seems very inefficient, however, and I was looking for suggestions on a better approach.
Edit: I've considered everything from using a non-android Google Map inside of a WebView to using a TileOverlay to using a GroundOverlay. What I am now considering doing is having a large 2 dimensional array of "squares." Each square would have a long, lat, and total +/- weights. When a new data point is added, instead of rendering it exactly where it is, it will be added to the "square" that it is in. The weight of this data point will be added to the square and then I will use the GoogleMap Polygon object to render the square on the map. The ratio of +points to -points will determine the color that is rendered, with a ratio closer to 1:1 being clear, >1 being blue (cold point), and <1 being red (hot point).
Edit: a.k.a. clustering the data into small regional groups
I suggest trying
going through each pixel, calculating what color it should be based on the surrounding data points, and rendering that pixel.
Even if it is slow, it will work. There are not too many tiles on the screen, there are not too many pixels in each tile, and all of this is done on a background thread.
All this is still followed by translating the Bitmap into a byte[]. The byte[] is a representation of a PNG or JPG file, so it's not a simple pixel mapping from the Bitmap. That last operation takes some time too and may require more processing power than your whole algorithm.
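On Android, the per-pixel approach plus that PNG step might look roughly like the sketch below (weightAt() and colorFor() are hypothetical helpers, and a 256x256 tile is assumed):

// imports: android.graphics.Bitmap, java.io.ByteArrayOutputStream,
// com.google.android.gms.maps.model.Tile / TileProvider
public Tile getTile(int x, int y, int zoom) {
    int size = 256;
    Bitmap bitmap = Bitmap.createBitmap(size, size, Bitmap.Config.ARGB_8888);

    for (int px = 0; px < size; px++) {
        for (int py = 0; py < size; py++) {
            // weightAt() and colorFor() are hypothetical helpers: the first sums the
            // +/- weights of nearby data points, the second maps that sum to a color
            double w = weightAt(x, y, zoom, px, py);
            bitmap.setPixel(px, py, colorFor(w));
        }
    }

    // Compress the Bitmap into the PNG byte[] that Tile expects
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    bitmap.compress(Bitmap.CompressFormat.PNG, 100, out);
    return new Tile(size, size, out.toByteArray());
}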
Edit (moved from comment):
What you describe in the edit sounds like a simple clustering on LatLng. I can't say it's a better or worse idea, but it's something worth a try.

What is the meaning of jitter in visualize tab of weka

In weka I load an arff file. I can view the relationship between attributes using the visualize tab.
However I can't understand the meaning of the jitter slider. What is its purpose?
You can find the answer in the mailing list archives:
The jitter function in the Visualize panel just adds artificial random noise to the coordinates of the plotted points in order to spread the data out a bit (so that you can see points that might have been obscured by others).
I don't know Weka, but generally jitter is a term for the variation of a periodic signal relative to some reference interval. I'm guessing the slider allows you to set some range or threshold below which data points are treated as being regular, or to modify the output to introduce some variation. The Wikipedia entry can give you some background.
Update: from this pdf, the jitter slider is for this purpose:
“Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
Based on the accompanying slide it looks like it introduces some variation in the visualisation, perhaps to show when two data points overlap.
Update 2: This Google Books extract (from Data Mining by Ian H. Witten and Eibe Frank) seems to confirm my guess:
[jitter] is a random displacement applied to X and Y values to separate points that lie on top of one another. Without jitter, 1000 instances at the same data point would look just the same as 1 instance
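In code terms it amounts to something like this (an illustrative sketch, not Weka's actual implementation; sliderValue and axisRange are assumed):

// random is a java.util.Random; x and y are the point's real coordinates
double jitterAmount = sliderValue * axisRange;   // how far points may be displaced
double xPlot = x + (random.nextDouble() - 0.5) * jitterAmount;
double yPlot = y + (random.nextDouble() - 0.5) * jitterAmount;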
I don't know the products you mention, but jittering generally means randomising the sample positions. E.g., in ray tracing you would normally render a ray through each pixel on the screen. Jittering adds a random offset to each ray to reduce issues caused by regular aliasing.

Find an Image within an Image

I am looking for the best way to detect an image within another image. I have a small image and would like to find the location that it appears within a larger image - which will actually be screen captures. Conceptually, it is like a 'Where's Waldo?' sort of search in the larger image.
Are there any efficient/quick ways to accomplish this? Speed is more important than memory.
Edit:
The 'inner' image may not always have the same scale but will have the same rotation.
It is not safe to assume that the image will be perfectly contained within the other, pixel for pixel.
Wikipedia has an article on Template Matching, with sample code.
(While that page doesn't handle changed scales, it has links to other styles of matching, for example Scale invariant feature transform)
If rotation also had to be catered for, the Generalised Hough Transform can be used.
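With OpenCV's Java bindings, plain template matching boils down to a couple of calls; a hedged sketch (file names are placeholders, and it will not handle the scale change mentioned in the edit):

// imports from org.opencv.core, org.opencv.imgproc and org.opencv.imgcodecs
Mat screen = Imgcodecs.imread("screenshot.png");
Mat template = Imgcodecs.imread("waldo.png");

// Slide the template over the screenshot and score every position
Mat result = new Mat();
Imgproc.matchTemplate(screen, template, result, Imgproc.TM_CCOEFF_NORMED);

// The best match is where the normalized correlation peaks
Core.MinMaxLocResult mmr = Core.minMaxLoc(result);
Point topLeft = mmr.maxLoc;   // top-left corner of the best match in the screenshot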
You can treat this as a substring problem, where characters in the alphabet are pixels and your string is the image. You would also need to use a special character, in a similar vein to a line break, to denote the image boundary.
The algorithm you want is on wikipedia: http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
Update: If you cannot assume that the image is perfectly contained within the other, pixel for pixel, then this approach will not work.
There are other, more complicated algorithms based on the same dynamic programming concept as the above, but I won't go into them unless it's necessary.
