I have an image affected by uneven illumination. First I crop the area I want to process, then convert it to a binary image. I used Otsu's thresholding, but it gives a bad result for this problem. I have also tried adaptive thresholding, but that method depends on the block size and C parameters (OpenCV method). What should I do to get a good result for this problem?
Original image (I crop a certain area of it):
Otsu thresholding result:
Adaptive thresholding is not suitable for your case. If you simply want a binary image with a black background and white text (or vice versa), and you have a tightly cropped area, you can follow these steps:
1) Convert the image to grayscale.
2) Normalize the image (ignore the darkest 1% and the lightest 1% of pixels).
3) Apply a fixed threshold (something between 0.3 and 0.7 on the normalized range).
4) Apply some morphological enhancement such as eroding, dilating, opening and closing to eliminate noise.
Adaptive thresholding is meant for uneven luminance, for example a lighting gradient across the board, which is not present in your example.
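Steps 2 and 3 can be sketched in a few lines. This is only an illustration, not the answerer's code: it assumes NumPy, a grayscale image given as a 2-D array, and a hypothetical helper name `binarize`; the 1% clip and the 0.5 threshold are starting values to tune. The morphological cleanup of step 4 is left out.

```python
import numpy as np

def binarize(gray, clip_percent=1.0, threshold=0.5):
    """Stretch contrast while ignoring the darkest/lightest pixels, then threshold."""
    gray = np.asarray(gray, dtype=np.float64)
    # Intensities outside these two percentiles are treated as outliers.
    lo, hi = np.percentile(gray, [clip_percent, 100.0 - clip_percent])
    if hi <= lo:  # flat image: nothing to stretch
        return np.zeros(gray.shape, dtype=np.uint8)
    # Map [lo, hi] onto [0, 1] and clip whatever falls outside.
    normalized = np.clip((gray - lo) / (hi - lo), 0.0, 1.0)
    # Fixed threshold on the normalized range: white (255) where brighter.
    return np.where(normalized > threshold, 255, 0).astype(np.uint8)

sample = np.array([[10, 10, 200, 200],
                   [10, 12, 198, 200]], dtype=np.uint8)
print(binarize(sample))
```

If the text is dark on a light background, invert the result (or flip the comparison) to get white text on black.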
I don't understand what happens to the pixels of a Virtual Display in Android when the output dimensions are smaller than the input ones.
For example, when the input is the size of my display (1920x960) and I set the output to 1920/3 by 960/3, what happens to the image pixels:
the pixel density is increased, or
maybe it takes only a smaller, centered part of the screen with dimensions 640x320, or
something else?
Additionally, is there a way to grab only the center part of the screen, as in the picture below?
By digging in the AOSP, watching, looking at a thread and experimenting,
I have come to my own (possibly overly) simplified conclusion.
Android calls the native Java SDK which produces the information that's needed to render a bitmap/pixels on a Java program.
If the results are the same, there is no need to pass/copy them to the GPU again.
If the results are not the same, they are passed/copied to the GPU to be "invalidated" and then re-rendered.
Now to your question.
By looking at the Bitmap class and at this thread, it occurred to me that resizing depends on the scaling ratio passed to the Matrix class.
If resized, it will (expensively) create a new Bitmap that looks like either a rather poor higher pixel density or a not-so-smooth lower pixel density.
If the pixel density is increased (smaller dimensions, your case), it will look squashed and, if need be, the colors are averaged with the nearest neighbouring pixels ("kind of" like how JPEG works).
After resizing, it will still stay at its origin (the top-left part of the rendered object), which is defined by its X and Y coordinates.
For your second question, about screen grabbing, you can take a look at this and then programmatically crop the image by doing something like this:
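//...
// createBitmap takes (source, x, y, width, height), not the right/bottom edges
Bitmap cropped = Bitmap.createBitmap(screenshot_bitmap, left, top, right - left, bottom - top);
//...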
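To grab only the center of the screen you need the crop rectangle first. Here is a small sketch of that arithmetic in Python (the helper name `center_crop_rect` is made up for illustration). The four values it returns are in the x, y, width, height order that `Bitmap.createBitmap(source, x, y, width, height)` expects.

```python
def center_crop_rect(screen_w, screen_h, crop_w, crop_h):
    """Return (x, y, width, height) of a crop_w x crop_h window centered on the screen."""
    if crop_w > screen_w or crop_h > screen_h:
        raise ValueError("crop window is larger than the screen")
    # Split the leftover space evenly on both sides.
    x = (screen_w - crop_w) // 2
    y = (screen_h - crop_h) // 2
    return x, y, crop_w, crop_h

# A 640x320 window in the middle of a 1920x960 screen.
print(center_crop_rect(1920, 960, 640, 320))  # (640, 320, 640, 320)
```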
I've researched a lot and could not find a definitive answer: what kind of image color is most commonly used as pHash input to generate the hash/fingerprint?
For example, I have a target image that I'm looking for within a source image. The target can have many colors and shades, but the shape is always the same (e.g. tulips). I have experimented with the image as is, converted to grayscale, and thresholded (pure black and white). I know most pHash libraries will convert the input to grayscale before the hash is computed.
But before I move forward: is pre-processing the image color worthwhile? (Ignoring size and rotation, and assuming source and target are the same for both.)
So after testing and more research: it's best to use the original colored image. Most pHash implementations convert an image to grayscale regardless, so applying my own grayscale conversion followed by the internal one actually produced poor results. The same goes for thresholding (pure black and white). There were more collisions and many more false positives.
I used a 64-bit pHash and it worked very well. I also tried Wavelet Hash, which was good for color changes but not good for overall matching.
What worked for me is a large data set that was fed into a BinaryTree. This way the lookups were fast and there were many examples to compare against. For Java I used: https://github.com/KilianB/JImageHash
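For readers wondering how two hashes are compared: perceptual hashes are usually matched by Hamming distance (the number of differing bits). Below is a minimal Python sketch. The hash values are invented, the linear scan is a simple stand-in for the tree lookup described above, and `max_distance=10` is an assumption to tune on your own data.

```python
def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")

def best_match(query_hash, known_hashes, max_distance=10):
    """Return (label, distance) of the closest known hash, or None if nothing is close enough."""
    if not known_hashes:
        return None
    label = min(known_hashes, key=lambda name: hamming_distance(query_hash, known_hashes[name]))
    distance = hamming_distance(query_hash, known_hashes[label])
    return (label, distance) if distance <= max_distance else None

# Invented 64-bit hashes, for illustration only.
known = {"tulip": 0xF0F0F0F0F0F0F0F0, "rose": 0x0123456789ABCDEF}
print(best_match(0xF0F0F0F0F0F0F0F1, known))  # ('tulip', 1)
```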
I'm trying to develop a simple PC application for license plate recognition (Java + OpenCV + Tess4j). The images aren't really good (they will be better in the future). I want to preprocess the image for Tesseract, and I'm stuck on detecting the license plate (rectangle detection).
My steps:
1) Source Image
Mat img = Imgcodecs.imread("sample_photo.jpg");
Imgcodecs.imwrite("preprocess/True_Image.png", img);
2) Gray Scale
Mat imgGray = new Mat();
Imgproc.cvtColor(img, imgGray, Imgproc.COLOR_BGR2GRAY);
Imgcodecs.imwrite("preprocess/Gray.png", imgGray);
3) Gaussian Blur
Mat imgGaussianBlur = new Mat();
Imgproc.GaussianBlur(imgGray,imgGaussianBlur,new Size(3, 3),0);
Imgcodecs.imwrite("preprocess/gaussian_blur.png", imgGaussianBlur);
4) Adaptive Threshold
Mat imgAdaptiveThreshold = new Mat();
Imgproc.adaptiveThreshold(imgGaussianBlur, imgAdaptiveThreshold, 255, Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY, 99, 4);
Imgcodecs.imwrite("preprocess/adaptive_threshold.png", imgAdaptiveThreshold);
There should be a 5th step here: detection of the plate region (probably even without deskewing for now).
I cropped the needed region from the image (after the 4th step) with Paint, and got:
Then I did OCR (via tesseract, tess4j):
File imageFile = new File("preprocess/adaptive_threshold_AFTER_PAINT.png");
ITesseract instance = new Tesseract();
instance.setLanguage("eng");
instance.setTessVariable("tessedit_char_whitelist", "acekopxyABCEHKMOPTXY0123456789");
String result = instance.doOCR(imageFile);
System.out.println(result);
and got a (good enough?) result: "Y841ox EH" (almost correct).
How can I detect and crop the plate region after the 4th step? Do I have to make any changes (improvements) in steps 1-4? I would like to see an example implemented with Java + OpenCV (not JavaCV).
Thanks in advance.
EDIT (thanks to @Abdul Fatir's answer)
Here is a code sample that works (for me at least), using Netbeans + Java + OpenCV + Tess4j, for those who are interested in this question. The code is not the best, but I wrote it just for studying.
http://pastebin.com/H46wuXWn (do not forget to put tessdata folder into your project folder)
Here's how I suggest you should do this task.
1) Convert to grayscale.
2) Gaussian blur with a 3x3 or 5x5 filter.
3) Apply a Sobel filter to find vertical edges.
Sobel(gray, dst, -1, 1, 0)
4) Threshold the resultant image to get a binary image.
5) Apply a morphological close operation using a suitable structuring element.
6) Find the contours of the resulting image.
7) Find the minAreaRect of each contour. Select rectangles based on aspect ratio and minimum and maximum area.
8) For each selected contour, find the edge density. Set a threshold for edge density and choose the rectangles exceeding that threshold as possible plate regions.
9) A few rectangles will remain after this. You can filter them based on orientation or any criteria you deem suitable.
10) Clip these detected rectangular portions from the image after adaptiveThreshold and apply OCR.
a) Result after Step 5
b) Result after Step 7. Green ones are all the minAreaRects and the Red ones are those which satisfy the following criteria: Aspect Ratio range (2,12) & Area range (300,10000)
c) Result after Step 9. Selected rectangle. Criteria: Edge Density > 0.5
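The rectangle selection by aspect ratio and area can be sketched like this in Python. It is an illustration only: the function name is hypothetical, the ranges are the ones quoted above, and treating the range limits as inclusive is my assumption. The width and height would come from each contour's minAreaRect.

```python
def is_plate_candidate(width, height,
                       aspect_range=(2.0, 12.0),
                       area_range=(300.0, 10000.0)):
    """Check a rotated rectangle's size against the aspect-ratio and area criteria."""
    if width <= 0 or height <= 0:
        return False
    # A rotated rectangle may report its sides in either order,
    # so compare the long side against the short side.
    aspect = max(width, height) / float(min(width, height))
    area = width * height
    return (aspect_range[0] <= aspect <= aspect_range[1]
            and area_range[0] <= area <= area_range[1])

print(is_plate_candidate(120, 30))  # True
```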
EDIT
For edge-density, what I did in the above examples is the following.
1) Apply the Canny edge detector directly to the input image. Let the resulting Canny image be Ic.
2) Multiply the results of the Sobel filter and Ic. Basically, take an AND of the Sobel and Canny images.
3) Gaussian blur the resultant image with a large filter. I used 21x21.
4) Threshold the resulting image using Otsu's method. You'll get a binary image.
5) For each red rectangle, rotate the portion inside this rectangle (in the binary image) to make it upright. Loop through the pixels of the rectangle and count the white pixels. (How to rotate?)
6) Edge Density = No. of white pixels in the rectangle / Total no. of pixels in the rectangle
7) Choose a threshold for edge density.
NOTE: Instead of going through steps 1 to 3, you can also use the binary image from step 5 of the main procedure (the result of the morphological close) for calculating edge density.
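The edge density formula itself is a one-liner. A minimal sketch, assuming NumPy and an upright binary patch (already cut out of the rectangle and rotated) in which any non-zero pixel counts as white; the function name is made up:

```python
import numpy as np

def edge_density(binary_patch):
    """Fraction of white (non-zero) pixels in an upright binary patch."""
    patch = np.asarray(binary_patch)
    if patch.size == 0:
        return 0.0
    return float(np.count_nonzero(patch)) / patch.size

patch = [[255, 0],
         [255, 255]]
print(edge_density(patch))  # 0.75
```

A patch like this one would pass the "Edge Density > 0.5" criterion used in the example images.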
Actually, OpenCV has a pre-trained model specifically for Russian license plates: haarcascade_russian_plate_number
There is also an open source ANPR project for Russian license plates: plate_recognition. It does not use Tesseract, but it has a quite good pre-trained neural network.
Find all connected components (the white areas) and determine their outlines.
Filter them based on size (as a fraction of the image), aspect ratio (width to height) and white/black ratio to retrieve candidate plates.
Undo the transformation of the rectangle.
Remove the bolts.
Pass the image to the OCR engine.
How to draw warped text like this picture in libgdx?
There are different methods to do this – and they do not come standard in libgdx, so you will have to implement one yourself.
Convert the text to outlines. Warp each of the coordinates. Draw polyfilled objects using these warped coordinates. This is what professional software such as Adobe Illustrator and CorelDraw do.
Draw the text into a bitmap. Warp the bitmap. For a better result, draw the bitmap at twice the output size so you can use subsampling.
(Based on the rather poor quality of the sample image) Draw each of the characters slightly rotated. You can base the amount of rotation on the total number of characters (quick, dirty, and simple), or ever so slightly improve it by using the individual widths of each character to determine its relative position inside the entire string, and base the amount of rotation on that.
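The last option (rotating each character by its relative position in the string) can be sketched as follows. This is plain Python rather than libgdx code, the function name is hypothetical, and the 60 degree total spread and the sign convention (negative on the left) are assumptions. In practice the widths would come from your font's glyph metrics.

```python
def character_rotations(char_widths, total_angle_deg=60.0):
    """Rotation per character, based on where its center sits inside the whole string."""
    total = float(sum(char_widths))
    if total <= 0:
        return []
    rotations = []
    offset = 0.0
    for width in char_widths:
        # Relative position of this character's center, from 0.0 (left) to 1.0 (right).
        center = (offset + width / 2.0) / total
        # Spread the total angle symmetrically around the middle of the string.
        rotations.append((center - 0.5) * total_angle_deg)
        offset += width
    return rotations

print(character_rotations([10, 10, 10, 10]))  # [-22.5, -7.5, 7.5, 22.5]
```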
Are you going to use this picture for some motion, or do you just need it for display? If it's the latter, why don't you just draw it in GIMP, Photoshop or even Paint and position/scale it where you need it on the screen as a normal sprite/actor?
What is the best way to identify an image's type? rwong's answer on this question suggests that Google segments images into the following groups:
Photo - continuous-tone
Clip art - smooth shading
Line drawing - bitonal
What is the best strategy for classifying an image into one of those groups? I'm currently using Java but any general approaches are welcome.
Thanks!
Update:
I tried the unique colour counting method that tyjkenn mentioned in a comment and it seems to work for about 90% of the cases that I've tried. In particular black and white photos are hard to correctly detect using unique colour count alone.
Getting the image histogram and counting the peaks alone doesn't seem like it will be a viable option. For example, this image only has two peaks:
Here are two more images I've checked out:
Here are some rather simple but effective approaches to differentiate between drawings and photos. Use them in combination to achieve the best accuracy:
1) Mime type or file extension
PNGs are typically clip arts or drawings, while JPEGs are mostly photos.
2) Transparency
If the image has an alpha channel, it's most likely a drawing. If an alpha channel exists, you can additionally iterate over all pixels to check whether transparency is actually used. Here is some example Python code:
from PIL import Image
img = Image.open('test.png')
transparency = False
if img.mode in ('RGBA', 'RGBa', 'LA') or (img.mode == 'P' and 'transparency' in img.info):
    if img.mode != 'RGBA':
        img = img.convert('RGBA')
    # any pixel with alpha below 220 counts as (partly) transparent
    transparency = any(px[3] < 220 for px in img.getdata())
print('Transparency:', transparency)
3) Color distribution
Clip art often has regions with identical colors. If a few colors make up a significant part of the image, it's more likely a drawing than a photo. This code outputs the share of the image area (as a fraction between 0 and 1) that is covered by the ten most used colors (Python example):
from PIL import Image
img = Image.open('test.jpg')
# Image.ANTIALIAS was removed in recent Pillow versions; LANCZOS is the same filter
img.thumbnail((200, 200), Image.LANCZOS)
w, h = img.size
# getcolors() returns (count, color) pairs; keep the counts, largest first
counts = sorted((count for count, _ in img.convert('RGB').getcolors(w * h)), reverse=True)
print(sum(counts[:10]) / float(w * h))
You need to adapt and optimize those values. Are ten colors enough for your data? Which percentage works best for you? Find out by testing a larger number of sample images. A share of 30% (0.3) or more typically indicates clip art. That does not hold for photos of the sky and the like, though. Therefore, we need another method, which is the next one.
4) Sharp edge detection via FFT
Sharp edges result in high frequencies in a Fourier spectrum. And typically such features are more often found in drawings (another Python snippet):
from PIL import Image
import numpy as np
img = Image.open('test.jpg').convert('L')
# magnitude of every 2-D Fourier coefficient, flattened into one array
values = np.abs(np.fft.fft2(np.asarray(img))).flatten()
# percentage of coefficients whose magnitude exceeds 10000
high_values_ratio = 100 * float(np.count_nonzero(values > 10000)) / values.size
print(high_values_ratio)
This code gives you the percentage of frequency components whose magnitude is above 10,000. Again: optimize such numbers according to your sample images.
Combine and optimize these methods for your image set. Let me know if you can improve this - or just edit this answer, please. I'd like to improve it myself :-)
This problem can be solved by image classification and that's probably Google's solution to the problem. Basically, what you have to do is (i) get a set of images labeled into 3 categories: photo, clip-art and line drawing; (ii) extract features from these images; (iii) use the image's features and label to train a classifier.
Feature Extraction:
In this step you have to extract visual information that may be useful for the classifier to discriminate between the 3 categories of images:
A very basic yet useful visual feature is the image histogram and its variants. For example, the gray level histogram of a photo is probably smoother than a histogram of a clipart, where you have regions that may be all of the same color value.
Another feature that one can use is to convert the image to the frequency domain (e.g. using FFT or DCT) and measure the energy of the high frequency components. Because line drawings will probably have sharp color transitions, their high frequency components will tend to accumulate more energy.
There are also a number of other feature extraction algorithms that may be used.
Training a Classifier:
After the feature extraction phase, we will have for each image a vector of numeric values (let's call it the image feature vector) paired with its label. That's a suitable input for training a classifier. As for the classifier, one may consider Neural Networks, SVM and others.
Classification:
Now that we have a trained classifier, to classify an image (i.e. detect an image's category) we simply have to extract its features and feed them to the classifier, which will return the predicted category.
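To make the train/classify flow concrete, here is a deliberately tiny Python sketch. It uses a nearest-centroid classifier as a simple stand-in for the neural network or SVM mentioned above, and it assumes NumPy. The two-element feature vectors and their values are invented toy numbers, not measurements.

```python
import numpy as np

def train_centroids(features, labels):
    """Average the feature vectors of each category into one centroid per label."""
    features = np.asarray(features, dtype=np.float64)
    centroids = {}
    for label in sorted(set(labels)):
        rows = [i for i, current in enumerate(labels) if current == label]
        centroids[label] = features[rows].mean(axis=0)
    return centroids

def classify(centroids, feature_vector):
    """Return the label whose centroid is closest (Euclidean) to the feature vector."""
    vector = np.asarray(feature_vector, dtype=np.float64)
    return min(centroids, key=lambda label: np.linalg.norm(vector - centroids[label]))

# Toy training set: [share of area in the top colors, high-frequency ratio] (made up).
train_x = [[0.05, 0.10], [0.08, 0.15],
           [0.60, 0.30], [0.70, 0.35],
           [0.95, 0.60], [0.98, 0.70]]
train_y = ["photo", "photo", "clip art", "clip art", "line drawing", "line drawing"]

model = train_centroids(train_x, train_y)
print(classify(model, [0.65, 0.32]))  # clip art
```

With real data you would replace the toy vectors with features extracted from labeled images, and probably swap in a stronger classifier.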
Histograms would be a first way to do this.
Convert the color image to grayscale and calculate the histogram.
A very bi-modal histogram with 2 sharp peaks in black (or dark) and white (or bright), probably with much more white, is a good indication of a line drawing.
If you have just a few more peaks then it is likely a clip-art type image.
Otherwise it's a photo.
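A rough sketch of that histogram rule in Python, assuming NumPy and an 8-bit grayscale image as a 2-D array. All the numbers (32 bins, a 5% minimum share per peak, 80% of pixels inside the peaks, at most 6 peaks for clip art) are my own guesses that you would need to tune, and the function name is made up.

```python
import numpy as np

def classify_by_histogram(gray, bins=32, min_peak_share=0.05,
                          min_peak_mass=0.8, max_clipart_peaks=6):
    """Guess the image type from the number of sharp peaks in its gray histogram."""
    hist, _ = np.histogram(np.asarray(gray).ravel(), bins=bins, range=(0, 256))
    if hist.sum() == 0:
        return "photo"
    share = hist / float(hist.sum())
    padded = np.concatenate(([0.0], share, [0.0]))
    # A peak is a bin that holds enough pixels and is a local maximum.
    peaks = [i for i in range(bins)
             if share[i] >= min_peak_share
             and padded[i + 1] >= padded[i]       # not lower than the left neighbor
             and padded[i + 1] > padded[i + 2]]   # strictly higher than the right neighbor
    peak_mass = sum(share[i] for i in peaks)
    if peak_mass < min_peak_mass:
        return "photo"          # pixels are spread out over many gray levels
    if len(peaks) <= 2:
        return "line drawing"   # essentially black and white
    if len(peaks) <= max_clipart_peaks:
        return "clip art"       # a handful of flat tones
    return "photo"

# Synthetic example: a mostly white page with some black strokes.
page = np.full((100, 100), 255, dtype=np.uint8)
page[:10, :] = 0
print(classify_by_histogram(page))
```

As noted in the question's update, black and white photos are a known weak spot for histogram-only rules, so treat this as one signal among several.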
In addition to color histograms, also consider edge information and the consistency of line widths throughout the image.
Photo - natural edges will have a variety of edge strengths, and it's less likely that there will be many parallel edges.
Clip art - A watershed algorithm could help identify large, connected regions of consistent brightness. In clip art and synthetic images designed for high visibility there are more likely to be perfectly straight lines and parallel lines. A histogram of edge strengths is likely to have a few very strong peaks.
Line drawing - synthetic lines are likely to have very consistent width. The Stroke Width Transform could help you identify strokes. (One of the basic principles is to find edge gradients that "point at" each other.) A histogram of edge strengths may have only one strong peak.