I have to find the contours of the image. After that, I want to fill in the holes in the number characters, but not in the other spaces. The image is the following.
http://i.stack.imgur.com/jlLYE.jpg
Actually, if that is not possible, is there any other method to segment this image using OpenCV on the Java platform? I want the image to contain only the characters. Thank you.
http://i.stack.imgur.com/kY4Dh.png
Here is a simple method (but I am not sure whether it will work everywhere; test it yourself).
NB: The code is in Python; I don't do Java, sorry about that :(
1. Load the grayscale image
2. Apply Otsu's binarization
import cv2
import numpy as np

# load as grayscale; Otsu picks the threshold value automatically
img = cv2.imread('test.png', 0)
ret, thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
Below is the thresholded image:
Now you can try two methods:
3a. Median Blurring with a 3x3 kernel
res = cv2.medianBlur(thresh,3)
Result:
3b. Erosion with a 3x1 kernel (vertical). 3x1 because all the lines in your image are mostly horizontal. If there are vertical lines in other images, you may need a 3x3 kernel (not sure; check it).
kernel = np.ones((3, 1), np.uint8)  # vertical 3x1 structuring element
cls = cv2.erode(thresh, kernel)
If you find you are also losing parts of the digits, you can apply dilation after the erosion, or replace both steps with a morphological opening (see the sketch below).
Result:
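If you go the opening route instead, a minimal sketch (same thresh as above; the 3x1 kernel is the same guess as in 3b):

# opening = erosion followed by dilation with the same kernel; it removes
# thin lines while restoring the digit strokes that erosion thinned out
kernel = np.ones((3, 1), np.uint8)
opened = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)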
Finally, find contours. This will also pick up some noise left in the preprocessed image, but you can filter it out by checking the aspect ratio, area, etc., as sketched below.
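A rough sketch of that filtering, using the median-blurred image res from 3a (the area and aspect-ratio limits are arbitrary guesses to tune on your own images):

# note: OpenCV 3.x returns three values from findContours; 2.x and 4.x return two
contours, _ = cv2.findContours(res, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
digits = []
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    aspect = w / float(h)
    # keep blobs that are big enough and roughly digit-shaped
    if cv2.contourArea(cnt) > 50 and 0.2 < aspect < 1.2:
        digits.append((x, y, w, h))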
There is a solution that is not that complicated. Think about the vertical run-length representation of the image: since you have only black and white, you can treat each vertical line of the image as a list containing only 1 (black pixel) and 0 (white pixel), so you will have, for instance, 01111111110000000000000111. This can be compressed by keeping only the lengths of each sublist of consecutive 1s or 0s, so instead of 0000000001111111000111111111111 you get 0 9 7 3 12. It starts with 0 because, by convention, you always start with the count of black pixels, and since there are no black pixels at the beginning, you put a 0 (it is much easier to work this way).

Once you have this representation, take the maximal value of a white run (a run is that count of consecutive white or black pixels) and the minimal one, and go through all the white runs. For each of them, check whether its value is closer to the smallest white run, and if so, remove it.
This algorithm should work for the given image ;)
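A minimal sketch of that run-length encoding (assuming each column has already been extracted as a list of 0/1 values):

def run_lengths(column):
    # column is a list of 1 (black) and 0 (white) pixels; the encoding
    # always starts with the black count, so a column beginning with
    # white gets a leading 0
    runs = []
    current, count = 1, 0
    for px in column:
        if px == current:
            count += 1
        else:
            runs.append(count)
            current, count = px, 1
    runs.append(count)
    return runs

# run_lengths([0]*9 + [1]*7 + [0]*3 + [1]*12) -> [0, 9, 7, 3, 12]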
I don't understand what happens with the pixels in a VirtualDisplay in Android when the output dimensions are reduced compared to the input ones.
When I have, for example, input = size of my display = 1920x960, and I set the output to be 1920/3 x 960/3, what happens with the image pixels in that case:
pixel density is increased or
maybe it takes only a smaller part of the screen that is centered and has dimensions 640x320, or
something else?
Additionally, is there a way that I can only grab center part of screen as in picture below?
By digging in the AOSP, watching and looking at a thread, and experimenting, I have come to my own (possibly oversimplified) conclusion.
Android calls the native Java SDK, which produces the information needed to render a bitmap/pixels in a Java program.
If the results are the same, there is no need to pass/copy them to the GPU again.
If the results are not the same, they are passed/copied to the GPU to be "invalidated" and then re-rendered.
Now to your question.
By looking at the Bitmap class and at this thread, it occurred to me that resizing depends on the scaling ratio passed to the Matrix class.
If resized, it will expensively create a new Bitmap that ends up with either a pretty bad higher pixel density or a not-so-smooth lower pixel density.
If the pixel density is increased (smaller dimensions, your case), it will look squashed and, if need be, the colors are averaged with the nearest neighbouring pixels ("kind of" like how JPEG works).
After resizing, it will still stay anchored to its origin (the top-left part of the rendered object), which is defined by its X and Y coordinates.
For your second question, about screen grabbing, you can take a look at this and then programmatically resize the image by doing something like this:
//...
// note: createBitmap crops with (source, x, y, width, height), not right/bottom
Bitmap center = Bitmap.createBitmap(screenshot_bitmap, left, top, width, height);
//...
So I'm trying to fill an ArrayList<Rectangle> with the bounds of each letter of an image file.
For example, given this .png image:
I want to fill an ArrayList<Rectangle> with 14 Rectangles (one rectangle for each letter).
We can assume that the image will contain only 2 colors, one for the background and one for the letters, in this case, pixels will be either white or red.
At first I thought I could search for white columns in between the letters: if I found a completely white column, I could get the width, for example, from the lowest and the highest red pixel x-values, so that width = maxX - minX, and so on:
x = minX;
y = minY;
w = maxX-minX;
h = maxY-minY;
letterBounds.add(new Rectangle(x,y,w,h));
The problem is that there's no space in between the letters, not even 1 pixel:
My next idea was: for each red pixel I find, look for a neighbor that hasn't been seen yet; once I can't find any more neighbors, I have all the pixels needed to get the bounds of that letter. But with this approach I will get 2 rectangles for letters like "i". I could then write some algorithm to merge those rectangles, but I don't know how that would turn out with other multi-part letters, so before I try that I wanted to ask here for more ideas.
So do you guys have any ideas?
You can use the OpenCV cv2.findContours() function. Instead of using the cv2.drawContours() function, which would highlight the outline of each letter, you can draw a rectangle on the image with cv2.rectangle(), using the coordinates extracted from the cv2.findContours() result.
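A minimal sketch of that idea (the file name and threshold value are placeholders; note this still yields two boxes for multi-part letters like "i", which you would have to merge, e.g. by horizontal overlap):

import cv2

img = cv2.imread('letters.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# the letters are darker than the white background, so invert the threshold
_, thresh = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY_INV)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
cv2.imwrite('boxes.png', img)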
I think a two-step algorithm is enough to solve the problem without using a library like OpenCV.
histogram
seam calculation
1. histogram
C.....C..C...
.C.C.C...C...
..C.C....CCCC
1111111003111
A dot (.) means the background color (white).
C means any color except the background color (in your case, red).
Accumulating the number of non-background pixels in each vertical column generates the histogram:
         *
         *
*******..****
0123456789ABC
It is clear that boundaries exist at columns 7 and 8, where the histogram is empty.
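A short numpy sketch of this projection, using a toy array that mirrors the figure above:

import numpy as np

# toy binary image: 1 = character pixel, 0 = background
img = np.array([[1,0,0,0,0,0,1,0,0,1,0,0,0],
                [0,1,0,1,0,1,0,0,0,1,0,0,0],
                [0,0,1,0,1,0,0,0,0,1,1,1,1]])

hist = img.sum(axis=0)          # -> [1 1 1 1 1 1 1 0 0 3 1 1 1]
gaps = np.where(hist == 0)[0]   # -> [7 8], the empty boundary columns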
2. seam calculation
Some cases, like "We", cannot be solved by the histogram because there are no empty vertical lines at all.
The Seam Carving algorithm gives us some hints:
https://en.wikipedia.org/wiki/Seam_carving
A more detailed implementation can be found at
princeton.edu - seamCarving.html
Energy calculation for a pixel:
The red numbers are not the color values of the pixels, but energy values calculated from adjacent pixels.
The vertical paths with minimal energy give us the boundaries between characters.
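A rough sketch of the dual-gradient energy and the seam cost (plain numpy; a simplified variant of the Princeton formulation, with wrap-around at the image borders for brevity):

import numpy as np

def energy(gray):
    # dual-gradient energy: squared differences of each pixel's
    # horizontal and vertical neighbours
    dx = np.roll(gray, -1, axis=1).astype(float) - np.roll(gray, 1, axis=1)
    dy = np.roll(gray, -1, axis=0).astype(float) - np.roll(gray, 1, axis=0)
    return dx**2 + dy**2

def cheapest_seam_end(e):
    # dynamic programming: cost[y][x] = e[y][x] + min of the three cells
    # above it (x-1, x, x+1); returns the column where the seam ends
    cost = e.copy()
    for y in range(1, e.shape[0]):
        left = np.roll(cost[y-1], 1)
        left[0] = np.inf
        right = np.roll(cost[y-1], -1)
        right[-1] = np.inf
        cost[y] += np.minimum(np.minimum(left, cost[y-1]), right)
    return cost[-1].argmin()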
3. One more thing...
Statistical data is required to decide whether to apply seam carving or not:
the max and min widths of the characters.
Even if the histogram gives us vertical boundaries, it is not clear whether a group contains one character or two or more.
I am new to Android, and my group is currently creating a graphing application using a GLSurfaceView with OpenGL ES 2.0.
We have recently displayed the grid and tick marks on the plot, and now I have been assigned the task of implementing a numeric scale and labeling the x and y axes as "X" and "Y".
After doing a lot of research, I have decided to accomplish this by rendering a string of characters to a bitmap. I have encountered many problems in achieving this, but I understand the basic concept. I know I will need the characters "0123456789", "XY" and "-" (for the -x and -y scale). I have seen many different examples and have tried extensively to follow JVitela's example here.
I am beginning to grasp the concept, but as far as my string goes, I know I have 13 characters, so how large should my bitmap be?
Also, in JVitela's example I am dumbfounded by the code:
Drawable background = context.getResources().getDrawable(R.drawable.background);
I don't understand what exactly is going on, and when I write this code I receive a syntax error on context.
For my application, I understand I would need to save the string into a bitmap much like the one below. I would create a bitmap, but how big should it be? Then I would create a canvas from the bitmap and canvas.drawText() into it.
[ 0 1 2 3 4 ]
| 5 6 7 8 9 |
[ X Y Z ]
Basically I am asking:
How to achieve the following bit map above?
How would I draw single digit numbers from the bit map?
How would I draw numbers with more than one digit?
You're asking a lot of questions, but I'll try to answer a few:
so how large should my bitmap be?
It's really up to you, depending on how crisp you want the text to be. You could allocate a huge bitmap with hundreds of pixels for each character that would zoom very well, or a very small bitmap with limited resolution. I'd say: whatever "font size" you want to have, allocate at least that many pixels in height for each character. So if you want to draw something with a font size of 20, then with 5 columns and 3 rows of characters you need a bitmap of about 5x20 by 3x20, i.e. 100x60.
How would I draw single digit numbers from the bit map?
You draw a quad with OpenGL in the place where you want the letter, and use that quad's texture coordinates to pick the letter.
For example, if you want to draw an X, you draw a quad on the screen and assign it texcoords from (0,0) to (0.2, 0.33), which selects the left 1/5th of the texture and the bottom 1/3rd. You'll see how a box like this lines up with the position of the "X" in your texture.
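To make the texcoord arithmetic concrete, here is a small sketch (Python for brevity; it assumes the 5x3 atlas drawn in the question, with the bottom row at v = 0 as in OpenGL):

# atlas rows listed from bottom (v = 0) to top (v = 1)
ROWS = ["XYZ", "56789", "01234"]

def glyph_uv(ch, cols=5.0, rows=3.0):
    # returns (u0, v0, u1, v1), the texture rectangle for character ch
    for r, row in enumerate(ROWS):
        if ch in row:
            c = row.index(ch)
            return (c / cols, r / rows, (c + 1) / cols, (r + 1) / rows)
    raise ValueError(ch)

# glyph_uv('X') -> (0.0, 0.0, 0.2, 0.333...), the quad from the example above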
How would I draw numbers with more than one digit?
You just draw two independent single digits right next to each other.
If your only goal here is to draw text in Android, it might be easier to just use a FrameLayout and layer TextViews on top of your GLSurfaceView. OpenGL isn't designed for text, which makes it somewhat cumbersome.
What is the best way to identify an image's type? rwong's answer on this question suggests that Google segments images into the following groups:
Photo - continuous-tone
Clip art - smooth shading
Line drawing - bitonal
What is the best strategy for classifying an image into one of those groups? I'm currently using Java but any general approaches are welcome.
Thanks!
Update:
I tried the unique colour counting method that tyjkenn mentioned in a comment, and it seems to work for about 90% of the cases I've tried. In particular, black-and-white photos are hard to detect correctly using the unique colour count alone.
Getting the image histogram and counting the peaks alone doesn't seem like a viable option. For example, this image has only two peaks:
Here are two more images I've checked out:
Here are some rather simple but effective approaches to differentiate between drawings and photos. Use them in combination to achieve the best accuracy:
1) Mime type or file extension
PNGs are typically clip arts or drawings, while JPEGs are mostly photos.
2) Transparency
If the image has an alpha channel, it's most likely a drawing. If an alpha channel exists, you can additionally iterate over all pixels to check whether transparency is actually used. Here is some Python example code:
from PIL import Image

img = Image.open('test.png')
transparency = False
if img.mode in ('RGBA', 'RGBa', 'LA') or (img.mode == 'P' and 'transparency' in img.info):
    if img.mode != 'RGBA':
        img = img.convert('RGBA')
    # transparent if any pixel is clearly not opaque
    transparency = any(px[3] < 220 for px in img.getdata())
print('Transparency:', transparency)
3) Color distribution
Clip arts often have regions with identical colors. If a few colors make up a significant part of the image, it's more likely a drawing than a photo. This code outputs the percentage of the image area made up by the ten most-used colors (Python example):
from PIL import Image

img = Image.open('test.jpg')
img.thumbnail((200, 200), Image.ANTIALIAS)
w, h = img.size
# color counts, most frequent first
colors = sorted(img.convert('RGB').getcolors(w * h), key=lambda x: x[0], reverse=True)
print(sum(count for count, color in colors[:10]) / float(w * h))
You need to adapt and optimize those values. Are ten colors enough for your data? What percentage works best for you? Find out by testing a larger number of sample images. 30% or more is typically a clip art, though not for sky photos and the like. Therefore, we need another method: the next one.
4) Sharp edge detection via FFT
Sharp edges result in high frequencies in the Fourier spectrum, and such features are typically found more often in drawings (another Python snippet):
from PIL import Image
import numpy as np

img = Image.open('test.jpg').convert('L')
# magnitudes of the 2-D Fourier spectrum
values = np.abs(np.fft.fft2(np.asarray(img))).flatten().tolist()
high_values = [x for x in values if x > 10000]
high_values_ratio = 100 * (float(len(high_values)) / len(values))
print(high_values_ratio)
This code gives you the percentage of frequency components whose magnitude exceeds the threshold (10,000 here). Again: optimize such numbers according to your sample images.
Combine and optimize these methods for your image set. Let me know if you can improve this - or just edit this answer, please. I'd like to improve it myself :-)
This problem can be solved by image classification and that's probably Google's solution to the problem. Basically, what you have to do is (i) get a set of images labeled into 3 categories: photo, clip-art and line drawing; (ii) extract features from these images; (iii) use the image's features and label to train a classifier.
Feature Extraction:
In this step you have to extract visual information that may be useful for the classifier to discriminate between the 3 categories of images:
A very basic yet useful visual feature is the image histogram and its variants. For example, the gray level histogram of a photo is probably smoother than a histogram of a clipart, where you have regions that may be all of the same color value.
Another feature one can use is to convert the image to the frequency domain (e.g. using the FFT or DCT) and measure the energy of the high-frequency components. Because line drawings will probably have sharp transitions of colors, their high-frequency components will tend to accumulate more energy.
There's also a number of other feature extraction algorithms that may be used.
Training a Classifier:
After the feature extraction phase, we will have for each image a vector of numeric values (let's call it the image feature vector) together with its label. These (feature vector, label) tuples are suitable input for training a classifier. As for the classifier, one may consider neural networks, SVMs and others.
Classification:
Now that we have a trained classifier, to classify an image (i.e. detect its category) we simply extract its features, input them to the classifier, and it will return the predicted category.
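As a toy sketch of this pipeline (scikit-learn is assumed; train_paths and train_labels are placeholders for your labeled image set, and the features are just the two described above):

import numpy as np
from PIL import Image
from sklearn.svm import SVC

def features(path):
    gray = np.asarray(Image.open(path).convert('L'), dtype=float)
    # feature 1: normalized gray-level histogram
    hist, _ = np.histogram(gray, bins=32, range=(0, 255), density=True)
    # feature 2: share of strong high-frequency components (threshold is a guess)
    spectrum = np.abs(np.fft.fft2(gray))
    high_freq = spectrum[spectrum > 10000].size / float(spectrum.size)
    return np.append(hist, high_freq)

# labels: 0 = photo, 1 = clip art, 2 = line drawing
X = [features(p) for p in train_paths]      # train_paths: your image files
clf = SVC().fit(X, train_labels)            # train_labels: their categories
print(clf.predict([features('unknown.jpg')]))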
Histograms would be a first way to do this.
Convert the color image to grayscale and calculate the histogram.
A very bimodal histogram with 2 sharp peaks, one at black (or dark) and one at white (or bright), probably with much more white, is a good indication of a line drawing.
If you have just a few more peaks then it is likely a clip-art type image.
Otherwise it's a photo.
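A rough sketch of that decision rule (the smoothing width, the 2% peak threshold and the peak-count cutoffs are guesses to tune on real samples):

import numpy as np
from PIL import Image

gray = np.asarray(Image.open('test.jpg').convert('L'))
hist, _ = np.histogram(gray, bins=64, range=(0, 255))
hist = np.convolve(hist, np.ones(3) / 3, mode='same')  # light smoothing
# a "peak" is a bin larger than both neighbours holding >= 2% of all pixels
peaks = [i for i in range(1, 63)
         if hist[i] > hist[i-1] and hist[i] > hist[i+1]
         and hist[i] >= 0.02 * gray.size]
if len(peaks) <= 2:
    print('line drawing')
elif len(peaks) <= 8:
    print('clip art')
else:
    print('photo')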
In addition to color histograms, also consider edge information and the consistency of line widths throughout the image.
Photo - natural edges will have a variety of edge strengths, and it's less likely that there will be many parallel edges.
Clip art - A watershed algorithm could help identify large, connected regions of consistent brightness. In clip art and synthetic images designed for high visibility there are more likely to be perfectly straight lines and parallel lines. A histogram of edge strengths is likely to have a few very strong peaks.
Line drawing - synthetic lines are likely to have very consistent width. The Stroke Width Transform could help you identify strokes. (One of the basic principles is to find edge gradients that "point at" each other.) A histogram of edge strengths may have only one strong peak.
I am trying to solve a problem of compositing two images in Java. The program will take a part of the first image and paste it onto the second image. The goal is to make the boundary between the two images less visible. The boundary must be chosen so that the difference between the two images at the boundary is small.
My Tasks:
Write a method to choose the boundary between the two images. The method will receive the overlapping parts of the input images. These must first be transformed so that the boundary always runs from the top-left corner to the bottom-right corner.
NOTE:
The returned image should not be the joined image, but should indicate which parts of the two images were used.
The pixels of the boundary line can be marked with a constant (SEAM). Pixels of the first image can be marked with the integer 0, pixels of the second image with the integer 1. After choosing the boundary line, the flood-fill algorithm can be used to fill the remaining pixels with 0 or 1.
NOTE: The image can be represented as a graph in which each pixel is connected to its left, right, top and bottom neighbors, so using flood fill will be like a depth-first search.
A shortest-path algorithm must be used to choose the boundary in order to keep it small.
NOTE: I cannot use any Java data structure except arrays (not even ArrayList).
I am new to this area and am trying to solve it. What steps must I follow to solve this problem? Or can you point me to a tutorial?
I would do it like this:
First, choose the width of the border strip to check; that is up to you.
1. Find the maximal possible shift in pixels. Call it D.
2. For every possible shift in the square (±D, ±D), compute k (the correlation coefficient) for the border region. The border is taken in the middle of the shift.
3. The shift with the largest k is the best; take it for granted.
4. Now begin to move the border, checking it by k in the same way, and find its place. Done.
If D is large and the process is slow, do it in 2 (or more) stages: in the earlier stages the step used when computing k is large, and the last stage uses a step of 1. You could also apply filtering beforehand.
If the border or the images' relative position can be rotated, the algorithm does not change in principle; just extend it to also try for the best k among slightly rotated positions and, later, a rotated border as well.
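A sketch of steps 1-3 (plain numpy; img1 and img2 stand for the two overlapping grayscale patches as float arrays, and the exhaustive double loop is the slow single-stage version):

import numpy as np

def correlation(a, b):
    # normalized cross-correlation of two equally sized patches
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())

def best_shift(img1, img2, D):
    # try every shift in the square (-D..D, -D..D) and keep the largest k
    h, w = img1.shape
    best, best_k = (0, 0), -1.0
    for dy in range(-D, D + 1):
        for dx in range(-D, D + 1):
            a = img1[D:h-D, D:w-D]
            b = img2[D+dy:h-D+dy, D+dx:w-D+dx]
            k = correlation(a, b)
            if k > best_k:
                best, best_k = (dy, dx), k
    return best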