I'm trying to detect rectangles using OpenCV. However, this sometimes gets pretty difficult after running the Canny method, because two of the edges usually get erased. I've tried many different threshold combinations and blurring the image before applying Canny, but I haven't had much success yet. Currently, I'm not blurring the image, so this is pretty much what I'm doing:
// Load the source image (OpenCV 2.4 Java API)
Mat imgSource = Highgui.imread(filepath);
// Canny with thresholds 300/600, aperture size 5, L2 gradient; the result overwrites the input
Imgproc.Canny(imgSource, imgSource, 300, 600, 5, true);
Example:
original http://imagizer.imageshack.us/a/img822/8776/27i9j.jpg
Canny http://imagizer.imageshack.us/a/img841/9868/wkc95.jpg
Then I'm using OpenCV's findContours method to detect the rectangle. It works about 80% of the time. How can I improve it?
Try different threshold values; in this case you will get better results when using lower thresholds, like 10 and 100.
blur(src, src, Size(3,3));        // light blur to suppress JPEG noise
cvtColor(src, tmp, CV_BGR2GRAY);  // Canny expects a single-channel image
Canny(tmp, thr, 10, 100, 3);      // note: use the grayscale tmp, not src
Alternatively, you can get the contour image by applying a threshold, like:
threshold(tmp,thr,50,255,THRESH_BINARY_INV);
The problem here is probably the JPEG compression of the image file.
Try converting the image to monochrome, since you only have a black/white image, and adjust the threshold value. This should eliminate the noise around the edges of the lines.
Then Canny can be applied with almost any values, e.g. as sketched below.
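A minimal Java sketch of that idea (OpenCV 3+ Java API; the file names and the 128 cutoff are placeholders to tune):

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class BinarizeThenCanny {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        // Load directly as grayscale ("monochrome")
        Mat gray = Imgcodecs.imread("input.jpg", Imgcodecs.IMREAD_GRAYSCALE);
        // Binarize first so the JPEG noise around the line edges is flattened
        Mat binary = new Mat();
        Imgproc.threshold(gray, binary, 128, 255, Imgproc.THRESH_BINARY);
        // With a clean binary image, the Canny thresholds are far less sensitive
        Mat edges = new Mat();
        Imgproc.Canny(binary, edges, 50, 150);
        Imgcodecs.imwrite("edges.png", edges);
    }
}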
Please note, I am a complete beginner in computer vision and OpenCV(Java).
My objective is to identify parking signs, and to draw bounding boxes around them. My problem is that the four signs from the top (with red borders) were not identified (see last image). I am also noticing that the Canny edge detection does not capture the edges of these four signs (see second image). I have tried with other images, and got the same results. My approach is as follows:
Load the image and convert it to gray scale
Pre-process the image by applying bilateralFilter and Gaussian blur
Execute Canny edge detection
Find all contours
Calculate the perimeter with arcLength and approximate the contour with approxPolyDP
If the approximated figure has 4 points, assume it is a rectangle and add the contour
Finally, draw the contours that have exactly 4 points.
Mat filtered = new Mat();
Mat edges = new Mat(src.size(), CvType.CV_8UC1);
Imgproc.cvtColor(src, edges, Imgproc.COLOR_RGB2GRAY);
Imgproc.bilateralFilter(edges, filtered, 11, 17, 17);
org.opencv.core.Size s = new Size(5, 5);
Imgproc.GaussianBlur(filtered, filtered, s, 0);
Imgproc.Canny(filtered, filtered, 170, 200);
List<MatOfPoint> contours = new ArrayList<MatOfPoint>();
Imgproc.findContours(filtered, contours, new Mat(), Imgproc.RETR_LIST, Imgproc.CHAIN_APPROX_SIMPLE);
List<MatOfPoint> rectangleContours = new ArrayList<MatOfPoint>();
for (MatOfPoint contour : contours) {
    // approxPolyDP needs a floating-point contour
    MatOfPoint2f dst = new MatOfPoint2f();
    contour.convertTo(dst, CvType.CV_32F);
    double perimeter = Imgproc.arcLength(dst, true);
    double approximationAccuracy = 0.02 * perimeter;
    MatOfPoint2f approx = new MatOfPoint2f();
    Imgproc.approxPolyDP(dst, approx, approximationAccuracy, true);
    if (approx.total() == 4) {
        rectangleContours.add(contour);
        Toast.makeText(reactContext.getApplicationContext(), "Rectangle detected" + approx.total(), Toast.LENGTH_SHORT).show();
    }
}
Imgproc.drawContours(src, rectangleContours, -1, new Scalar(0, 255, 0), 5);
Very happy to get advice on how I could resolve this issue, even if it implies changing my strategy.
What about starting with OCR, Tesseract, in order to recognize big "P" and other parking-related text patterns?
(Toast suggests this is Android: How can I use Tesseract in Android?
General Tesseract for Java: https://www.geeksforgeeks.org/tesseract-ocr-with-java-with-examples/ )
Another example, in Python, but see the preprocessing and other tricks and ideas for making the letters recognizable when the image has gradients, lower contrast, small fonts etc.: How to obtain the best result from pytesseract?
Also, there could be filtering by color, since the colors of the signs are known. The conversion to grayscale removes that valuable information; finding the edges is OK, but the colors can still be used. E.g. split the image into its B, G, R channels and use each channel as a grayscale image, possibly boosting it. The red and blue borders would stand out.
It seems the contrast around the red borders is too low, while the blue signs are brighter compared to the black contour. Even without splitting, some of the color channels could be amplified before converting to grayscale, e.g. the red one; a sketch follows.
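A hedged Java sketch of the channel-splitting idea; src is assumed to be a BGR Mat from imread (Android bitmaps may arrive as RGBA, so check the channel order), and the 1.5 gain is a placeholder:

import java.util.ArrayList;
import java.util.List;
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;

static Mat edgesFromRedChannel(Mat src) {
    // Split into B, G, R planes; index 2 is red for a BGR Mat
    List<Mat> channels = new ArrayList<>();
    Core.split(src, channels);
    Mat red = channels.get(2);
    // Boost the channel before edge detection (gain 1.5, offset 0 are placeholders)
    Mat boosted = new Mat();
    red.convertTo(boosted, -1, 1.5, 0);
    Mat edges = new Mat();
    Imgproc.Canny(boosted, edges, 170, 200);
    return edges;
}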
You could also search for big yellow/blue regions with low contrast in which text ("P" etc.) was found; Tesseract has a function that returns the bounding boxes of the text it found.
Also, once you find a sign somewhere, or a bar of signs and their directions, you could search near it vertically/horizontally.
You may try HoughLines as well; it may find the black border around the signs, as in the sketch below.
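A minimal sketch of the HoughLines idea; edges is assumed to be the Canny output from the question's pipeline, and the vote threshold, minimum length and maximum gap are placeholders:

import org.opencv.core.Mat;
import org.opencv.core.Point;
import org.opencv.core.Scalar;
import org.opencv.imgproc.Imgproc;

static void drawStraightBorders(Mat edges, Mat src) {
    // Probabilistic Hough transform: returns line segments as (x1, y1, x2, y2)
    Mat lines = new Mat();
    Imgproc.HoughLinesP(edges, lines, 1, Math.PI / 180, 80, 30, 10);
    for (int i = 0; i < lines.rows(); i++) {
        double[] l = lines.get(i, 0);
        Imgproc.line(src, new Point(l[0], l[1]), new Point(l[2], l[3]),
                new Scalar(0, 0, 255), 2);
    }
}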
Calculate the perimeter with arcLength and approximate the contour with approxPolyDP
If the approximated figure has 4 points, then assume it is a rectangle and add the contour
IMO finding exactly 4 points (even after simplification of the polygon) is hard and may not be enough evidence, and there are rounded corners etc. if contours are compared directly.
The angles between the vertices and the distances also matter: are the lines parallel (within some tolerance), etc.
The process could be iterative: gradually reduce the polygon detail, checking the area and perimeter, until the number of vertices reaches 4 (or about that). If the area and perimeter don't change much after polygon approximation (simplifying the rounded corners etc.) while the number of points in the contour gets reduced, it is a good rectangle candidate; the acceptable ratio has to be found empirically. I'd also try a comparison to the bounding box and the convex hull measurements etc. A sketch of the iterative idea follows.
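A hedged Java sketch of that iterative simplification; the epsilon range and the 0.9 area-retention ratio are placeholders to tune:

import org.opencv.core.MatOfPoint;
import org.opencv.core.MatOfPoint2f;
import org.opencv.imgproc.Imgproc;

static MatOfPoint2f simplifyToQuad(MatOfPoint contour) {
    MatOfPoint2f curve = new MatOfPoint2f(contour.toArray());
    double perimeter = Imgproc.arcLength(curve, true);
    double area = Imgproc.contourArea(contour);
    // Grow epsilon step by step, simplifying the polygon more each time
    for (double eps = 0.01; eps <= 0.10; eps += 0.01) {
        MatOfPoint2f approx = new MatOfPoint2f();
        Imgproc.approxPolyDP(curve, approx, eps * perimeter, true);
        double approxArea = Imgproc.contourArea(approx);
        // Accept only if the simplification kept most of the original area
        if (approx.total() == 4 && approxArea > 0.9 * area) {
            return approx;
        }
        if (approx.total() < 4) {
            break; // over-simplified
        }
    }
    return null; // not rectangle-like
}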
If you need to only detect the parking signs, then treat this problem as a classic object detection problem (just like face detection). For the best results, you will need to use deep learning based convolutional neural network models.
To start with, you can train a YOLO model, which will give you much better results than anything you have tried with OpenCV. You need at least 500 images, and then you need to annotate them. This tutorial is a good kick-start on YOLO; give it a try.
Like YOLO, there are many other models, and all of them can be trained using a similar process. If you want to deploy your model on Android, I recommend choosing a TensorFlow-based model: train it on your PC and integrate the trained, serialized model in your app.
I am trying to create an OCR application. I need to locate the text using contours. However, my image has a lot of noise and I was wondering if there is a way to remove it.
My current code:
// Input image already converted to a matrix
Imgproc.cvtColor(matrixImage, matrixImage, Imgproc.COLOR_BGR2GRAY);
// Gaussian blur
Imgproc.GaussianBlur(matrixImage, matrixImage, new Size(7,7), 0);
Imgproc.threshold(matrixImage, matrixImage, 125, 255, Imgproc.THRESH_BINARY_INV);
// This is my current approach for removing noise. However, there is still
// a lot of random areas that can be removed.
// Remove specs from image
Mat morphingMatrix = Mat.ones(3, 3, CvType.CV_8UC1); // note: CvType.CV_8UC1 in the Java API
Imgproc.morphologyEx(matrixImage, matrixImage, Imgproc.MORPH_OPEN, morphingMatrix);
// Image denoising
Photo.fastNlMeansDenoising(matrixImage, matrixImage);
My input image. I allow users to manually mark the corners so the transformed image below only applies the transformation to the middle white piece of paper.
I have a solution for your problem. However, it does not involve removal of noise in this case.
Step 1:
I have obtained my own transformed image from the original image uploaded by you:
I presume you know how to perform transformations on images as stated in your question. Nevertheless, to learn more about them visit THIS. To learn about homographic projections visit THIS SITE.
I obtained the gray scale of this image:
Step 2:
To this image I performed adaptive threshold using Gaussian filter:
Step 3:
This step involves a couple of morphological operations:
Firstly, to remove the unwanted spots in the image, I used a morphological closing operation:
Secondly, I used a morphological dilation operation (which you may not need unless you want to highlight your text). A combined Java sketch of Steps 2-3 follows:
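Since the question is in Java, here is a hedged sketch of Steps 2-3 with the OpenCV Java API; block size 11, constant 2 and the 3x3 kernel are assumptions to tune:

import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;

static Mat cleanForOcr(Mat gray) {
    // Step 2: adaptive threshold using a Gaussian-weighted neighbourhood
    Mat thresh = new Mat();
    Imgproc.adaptiveThreshold(gray, thresh, 255,
            Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C, Imgproc.THRESH_BINARY, 11, 2);
    // Step 3a: morphological closing to remove small unwanted spots
    Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(3, 3));
    Mat closed = new Mat();
    Imgproc.morphologyEx(thresh, closed, Imgproc.MORPH_CLOSE, kernel);
    // Step 3b: optional dilation to thicken the text strokes
    Mat dilated = new Mat();
    Imgproc.dilate(closed, dilated, kernel);
    return dilated;
}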
I'm trying to develop a simple PC application for license plate recognition (Java + OpenCV + Tess4j). The images aren't really good (later they will be better). I want to preprocess the image for tesseract, and I'm stuck on the detection of the license plate (rectangle detection).
My steps:
1) Source Image
Mat img = new Mat();
img = Imgcodecs.imread("sample_photo.jpg");
Imgcodecs.imwrite("preprocess/True_Image.png", img);
2) Gray Scale
Mat imgGray = new Mat();
Imgproc.cvtColor(img, imgGray, Imgproc.COLOR_BGR2GRAY);
Imgcodecs.imwrite("preprocess/Gray.png", imgGray);
3) Gaussian Blur
Mat imgGaussianBlur = new Mat();
Imgproc.GaussianBlur(imgGray,imgGaussianBlur,new Size(3, 3),0);
Imgcodecs.imwrite("preprocess/gaussian_blur.png", imgGaussianBlur);
4) Adaptive Threshold
Mat imgAdaptiveThreshold = new Mat();
Imgproc.adaptiveThreshold(imgGaussianBlur, imgAdaptiveThreshold, 255, Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY, 99, 4);
Imgcodecs.imwrite("preprocess/adaptive_threshold.png", imgAdaptiveThreshold);
Here should be the 5th step, which is the detection of the plate region (probably even without deskewing for now).
I cropped the needed region from the image (after the 4th step) with Paint, and got:
Then I did OCR (via tesseract, tess4j):
File imageFile = new File("preprocess/adaptive_threshold_AFTER_PAINT.png");
ITesseract instance = new Tesseract();
instance.setLanguage("eng");
instance.setTessVariable("tessedit_char_whitelist", "acekopxyABCEHKMOPTXY0123456789");
String result = instance.doOCR(imageFile);
System.out.println(result);
and got a (good enough?) result - "Y841ox EH" (almost correct)
How can I detect and crop the plate region after the 4th step? Do I have to make some changes (improvements) in steps 1-4? I would like to see an example implemented via Java + OpenCV (not JavaCV).
Thanks in advance.
EDIT (thanks to #Abdul Fatir's answer)
Well, here is a working (for me at least) code sample (Netbeans + Java + OpenCV + Tess4j) for those who are interested in this question. The code is not the best, but I made it just for studying.
http://pastebin.com/H46wuXWn (do not forget to put tessdata folder into your project folder)
Here's how I suggest you should do this task.
Convert to Grayscale.
Gaussian Blur with 3x3 or 5x5 filter.
Apply Sobel Filter to find vertical edges.
Sobel(gray, dst, -1, 1, 0)
Threshold the resultant image to get a binary image.
Apply a morphological close operation using suitable structuring element.
Find contours of the resulting image.
Find minAreaRect of each contour. Select rectangles based on aspect ratio and minimum and maximum area.
For each selected contour, find the edge density. Set a threshold for edge density and choose the rectangles exceeding that threshold as possible plate regions.
Only a few rectangles will remain after this. You can filter them based on orientation or any criteria you deem suitable.
Clip these detected rectangular portions from the image after adaptiveThreshold and apply OCR. A sketch of steps 1-7 follows.
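Not the answer's exact code, but a hedged Java sketch of steps 1-7; the 17x3 kernel and the aspect/area cutoffs are placeholders (the latter taken from the criteria listed below):

import java.util.ArrayList;
import java.util.List;
import org.opencv.core.*;
import org.opencv.imgproc.Imgproc;

static List<RotatedRect> candidatePlates(Mat img) {
    // Steps 1-3: grayscale, blur, vertical Sobel edges
    Mat gray = new Mat(), blurred = new Mat(), sobel = new Mat();
    Imgproc.cvtColor(img, gray, Imgproc.COLOR_BGR2GRAY);
    Imgproc.GaussianBlur(gray, blurred, new Size(5, 5), 0);
    Imgproc.Sobel(blurred, sobel, CvType.CV_8U, 1, 0);
    // Step 4: Otsu binarization
    Mat binary = new Mat();
    Imgproc.threshold(sobel, binary, 0, 255, Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);
    // Step 5: close with a wide kernel so plate characters merge into one blob
    Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(17, 3));
    Imgproc.morphologyEx(binary, binary, Imgproc.MORPH_CLOSE, kernel);
    // Steps 6-7: contours -> minAreaRect, filtered by aspect ratio and area
    List<MatOfPoint> contours = new ArrayList<>();
    Imgproc.findContours(binary, contours, new Mat(), Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);
    List<RotatedRect> candidates = new ArrayList<>();
    for (MatOfPoint c : contours) {
        RotatedRect r = Imgproc.minAreaRect(new MatOfPoint2f(c.toArray()));
        double w = r.size.width, h = r.size.height;
        double aspect = Math.max(w, h) / Math.max(1.0, Math.min(w, h));
        double area = w * h;
        if (aspect > 2 && aspect < 12 && area > 300 && area < 10000) {
            candidates.add(r); // possible plate; next comes the edge-density check
        }
    }
    return candidates;
}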
a) Result after Step 5
b) Result after Step 7. Green ones are all the minAreaRects and the Red ones are those which satisfy the following criteria: Aspect Ratio range (2,12) & Area range (300,10000)
c) Result after Step 9. Selected rectangle. Criteria: Edge Density > 0.5
EDIT
For edge-density, what I did in the above examples is the following.
Apply a Canny edge detector directly to the input image. Let the Canny edge image be Ic.
Multiply results of Sobel filter and Ic. Basically, take an AND of Sobel and Canny images.
Gaussian Blur the resultant image with a large filter. I used 21x21.
Threshold the resulting image using Otsu's method. You'll get a binary image.
For each red rectangle, rotate the portion inside this rectangle (in the binary image) to make it upright. Loop through the pixels of the rectangle and count the white pixels. (How to rotate?)
Edge Density = (no. of white pixels in the rectangle) / (total no. of pixels in the rectangle)
Choose a threshold for edge density.
NOTE: Instead of going through steps 1 to 3, you can also use the binary image from step 5 for calculating the edge density; a sketch of the computation follows.
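A hedged Java sketch of this edge-density computation; binaryEdges is assumed to be the binary image from the steps above, and rect one of the red candidate rectangles:

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.RotatedRect;
import org.opencv.imgproc.Imgproc;

static double edgeDensity(Mat binaryEdges, RotatedRect rect) {
    // Rotate the whole image so the candidate rectangle becomes upright
    Mat rotation = Imgproc.getRotationMatrix2D(rect.center, rect.angle, 1.0);
    Mat rotated = new Mat();
    Imgproc.warpAffine(binaryEdges, rotated, rotation, binaryEdges.size());
    // Crop the now-upright rectangle around its center
    Mat patch = new Mat();
    Imgproc.getRectSubPix(rotated, rect.size, rect.center, patch);
    // White pixels / total pixels
    return Core.countNonZero(patch) / (double) (patch.rows() * patch.cols());
}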
Actually, OpenCV ships a pre-trained model specifically for Russian license plates: haarcascade_russian_plate_number.
There is also an open-source ANPR project for Russian license plates: plate_recognition. It does not use tesseract, but it has a quite good pre-trained neural network.
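For reference, a minimal sketch of running the bundled cascade via the OpenCV Java API; the XML path is a placeholder for wherever your OpenCV data files live:

import org.opencv.core.*;
import org.opencv.imgproc.Imgproc;
import org.opencv.objdetect.CascadeClassifier;

static void detectPlates(Mat img) {
    CascadeClassifier plateCascade =
            new CascadeClassifier("haarcascade_russian_plate_number.xml");
    Mat gray = new Mat();
    Imgproc.cvtColor(img, gray, Imgproc.COLOR_BGR2GRAY);
    // Multi-scale detection returns one Rect per plate candidate
    MatOfRect plates = new MatOfRect();
    plateCascade.detectMultiScale(gray, plates);
    for (Rect r : plates.toArray()) {
        Imgproc.rectangle(img, r.tl(), r.br(), new Scalar(0, 255, 0), 2);
    }
}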
Find all connected components (the white areas) and determine their outlines.
Filter them based on size (as a fraction of the image), ratio (width/height) and white/black ratio to retrieve candidate plates. Then:
Undo the transformation of the rectangle
Remove the bolts
Pass in image to the OCR engine.
I have to find the contours of the image. After that, I want to find out how to fill the holes in the number characters, but not in the other spaces. The image is the following.
http://i.stack.imgur.com/jlLYE.jpg
Actually, if that is not possible, is there any other method for me to perform segmentation of this image using OpenCV on the Java platform? I want the image to contain the characters only. Thank you.
http://i.stack.imgur.com/kY4Dh.png
Here is a simple method (But I am not sure if it will work everywhere. Test it yourself)
NB: Code is in Python, I don't do Java, sorry about that :(
Load the grayscale image
Apply Otsu's binarization
import cv2
import numpy as np
img = cv2.imread('test.png',0)
ret, thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
Below is the thresholded image:
Now you can try two methods:
3a. Median Blurring with a 3x3 kernel
res = cv2.medianBlur(thresh,3)
Result:
3b. Erosion with a 3x1 kernel (vertical). 3x1 because all the lines in your image are more horizontal; if there are vertical lines in other images, you may need to take a 3x3 kernel (not sure, check it).
kernel = np.ones((3,1), np.uint8)
cls = cv2.erode(thresh, kernel)
If you think you are also losing some parts of the digits, you can apply dilation after the erosion, or replace everything with a morphological opening.
Result:
Finally, find the contours. This will also pick up some noise left in the preprocessed image, but you can filter those out by checking their aspect ratio, area etc.
There is a not-that-complicated solution. Think about the vertical run-length representation of the image: since you only have black and white there, each vertical line of the image can be seen as a list containing only 1 (black pixel) and 0 (white pixel), so you will have for instance 01111111110000000000000111. This can be compressed by keeping only the lengths of each sublist containing only 1s or 0s, so instead of 0000000001111111000111111111111 you will have 0 9 7 3 12. It starts with 0 because, by convention, you always begin with the count of black pixels, and since there are no black pixels at the beginning you put a 0 (it will be much easier to work like this). After you have this representation, take the maximal value of a white run (a run is that count of consecutive white or black pixels) and the minimal one, and go through all the white runs. For each of them, check whether its length is closer to the smallest white run; if that is the case, just remove it.
This algorithm should work for the given image ;)
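Since the question targets Java, here is one possible reading of this as a hedged sketch, assuming a CV_8UC1 image with black characters (0) on white (255) and interpreting "remove" as filling the short white runs with black:

import java.util.ArrayList;
import java.util.List;
import org.opencv.core.Mat;

static void fillShortWhiteRuns(Mat binary) {
    for (int x = 0; x < binary.cols(); x++) {
        // Collect the white runs of this column as {start, length}
        List<int[]> whiteRuns = new ArrayList<>();
        int runStart = -1;
        for (int y = 0; y <= binary.rows(); y++) {
            boolean white = y < binary.rows() && binary.get(y, x)[0] > 0;
            if (white && runStart < 0) runStart = y;
            if (!white && runStart >= 0) {
                whiteRuns.add(new int[]{runStart, y - runStart});
                runStart = -1;
            }
        }
        if (whiteRuns.size() < 2) continue;
        int min = Integer.MAX_VALUE, max = 0;
        for (int[] r : whiteRuns) {
            min = Math.min(min, r[1]);
            max = Math.max(max, r[1]);
        }
        for (int[] r : whiteRuns) {
            // "Closer to the smallest white run" -> treat it as a hole and fill it
            if (r[1] - min < max - r[1]) {
                for (int y = r[0]; y < r[0] + r[1]; y++) {
                    binary.put(y, x, 0);
                }
            }
        }
    }
}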
What is the best way to identify an image's type? rwong's answer on this question suggests that Google segments images into the following groups:
Photo - continuous-tone
Clip art - smooth shading
Line drawing - bitonal
What is the best strategy for classifying an image into one of those groups? I'm currently using Java but any general approaches are welcome.
Thanks!
Update:
I tried the unique colour counting method that tyjkenn mentioned in a comment and it seems to work for about 90% of the cases I've tried. In particular, black-and-white photos are hard to detect correctly using the unique colour count alone.
Getting the image histogram and counting the peaks alone doesn't seem like it will be a viable option. For example, this image has only two peaks:
Here are two more images I've checked out:
Rather simple, but effective approaches to differentiate between drawings and photos. Use them in combination to achieve the best accuracy:
1) Mime type or file extension
PNGs are typically clip arts or drawings, while JPEGs are mostly photos.
2) Transparency
If the image has an alpha channel, it's most likely a drawing. In case an alpha channel exists, you can additionally iterate over all pixels to check whether transparency is indeed used. Here is some Python example code:
from PIL import Image

img = Image.open('test.png')
transparency = False
if img.mode in ('RGBA', 'RGBa', 'LA') or (img.mode == 'P' and 'transparency' in img.info):
    if img.mode != 'RGBA':
        img = img.convert('RGBA')
    transparency = any(px for px in img.getdata() if px[3] < 220)
print('Transparency:', transparency)
3) Color distribution
Clip arts often have regions with identical colors. If a few colors make up a significant part of the image, it's rather a drawing than a photo. This code outputs the percentage of the image area that is made up of the ten most-used colors (Python example):
from PIL import Image

img = Image.open('test.jpg')
img.thumbnail((200, 200), Image.ANTIALIAS)
w, h = img.size
colors = sorted(img.convert('RGB').getcolors(w * h), key=lambda x: x[0], reverse=True)
print(sum(count for count, color in colors[:10]) / float(w * h))
You need to adapt and optimize those values. Are ten colors enough for your data? What percentage works best for you? Find out by testing a larger number of sample images. 30% or more is typically a clip art - though not for sky photos or the like. Therefore, we need another method - the next one.
4) Sharp edge detection via FFT
Sharp edges result in high frequencies in a Fourier spectrum. And typically such features are more often found in drawings (another Python snippet):
from PIL import Image
import numpy as np

img = Image.open('test.jpg').convert('L')
values = np.abs(np.fft.fft2(np.asarray(img))).flatten().tolist()
high_values = [x for x in values if x > 10000]
high_values_ratio = 100 * (float(len(high_values)) / len(values))
print(high_values_ratio)
This code gives you the percentage of frequency components whose magnitude exceeds the threshold (10000 here). Again: optimize such numbers according to your sample images.
Combine and optimize these methods for your image set. Let me know if you can improve this - or just edit this answer, please. I'd like to improve it myself :-)
This problem can be solved by image classification and that's probably Google's solution to the problem. Basically, what you have to do is (i) get a set of images labeled into 3 categories: photo, clip-art and line drawing; (ii) extract features from these images; (iii) use the image's features and label to train a classifier.
Feature Extraction:
In this step you have to extract visual information that may be useful for the classifier to discriminate between the 3 categories of images:
A very basic yet useful visual feature is the image histogram and its variants. For example, the gray level histogram of a photo is probably smoother than a histogram of a clipart, where you have regions that may be all of the same color value.
Another feature that one can use is to convert the image to the frequency domain (e.g. using FFT or DCT) and measure the energy of high frequency components. Because line drawings will probably have sharp transitions of colors, its high frequency components will tend to accumulate more energy.
There are also a number of other feature extraction algorithms that may be used; a sketch of the frequency-domain feature follows.
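As a hedged Java sketch of the frequency-domain feature described above, using OpenCV's DCT; the 10% low-frequency cutoff is a placeholder:

import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;

static double highFrequencyEnergyRatio(Mat gray) {
    // DCT needs floating-point input with even dimensions
    Mat f = new Mat();
    gray.convertTo(f, CvType.CV_32F);
    Mat even = f.submat(0, f.rows() & ~1, 0, f.cols() & ~1).clone();
    Mat dct = new Mat();
    Core.dct(even, dct);
    // Energy per coefficient; low frequencies live in the top-left corner
    Mat energy = dct.mul(dct);
    double total = Core.sumElems(energy).val[0];
    int cut = (int) (0.1 * Math.min(dct.rows(), dct.cols()));
    double low = Core.sumElems(energy.submat(0, cut, 0, cut)).val[0];
    // Line drawings tend to score higher here than smooth photos
    return (total - low) / total;
}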
Training a Classifier:
After the feature extraction phase, we will have for each image a vector of numeric values (let's call it the image feature vector) and its label. That's suitable input for training a classifier. As for the classifier, one may consider Neural Networks, SVMs and others.
Classification:
Now that we have a trained classifier, to classify an image (i.e. detect an image category) we simply have to extract its features and input them to the classifier, and it will return the predicted category.
Histograms would be a first way to do this.
Convert the color image to grayscale and calculate the histogram.
A very bimodal histogram with 2 sharp peaks, in black (or dark) and white (or light), probably with much more white, is a good indication of a line drawing.
If you have just a few more peaks then it is likely a clip-art type image.
Otherwise it's a photo.
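Since the asker uses Java, here is a hedged OpenCV sketch of this histogram heuristic; the 0.2 prominence cutoff is a placeholder to tune:

import java.util.Arrays;
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfFloat;
import org.opencv.core.MatOfInt;
import org.opencv.imgproc.Imgproc;

static int countHistogramPeaks(Mat gray) {
    // 256-bin grayscale histogram, normalized to [0, 1]
    Mat hist = new Mat();
    Imgproc.calcHist(Arrays.asList(gray), new MatOfInt(0), new Mat(),
            hist, new MatOfInt(256), new MatOfFloat(0f, 256f));
    Core.normalize(hist, hist, 0, 1, Core.NORM_MINMAX);
    int peaks = 0;
    for (int i = 1; i < 255; i++) {
        double prev = hist.get(i - 1, 0)[0];
        double cur = hist.get(i, 0)[0];
        double next = hist.get(i + 1, 0)[0];
        // Count prominent local maxima only
        if (cur > prev && cur > next && cur > 0.2) peaks++;
    }
    return peaks; // ~2 sharp peaks -> line drawing, a few -> clip art, many/none -> photo
}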
In addition to color histograms, also consider edge information and the consistency of line widths throughout the image.
Photo - natural edges will have a variety of edge strengths, and it's less likely that there will be many parallel edges.
Clip art - A watershed algorithm could help identify large, connected regions of consistent brightness. In clip art and synthetic images designed for high visibility there are more likely to be perfectly straight lines and parallel lines. A histogram of edge strengths is likely to have a few very strong peaks.
Line drawing - synthetic lines are likely to have very consistent width. The Stroke Width Transform could help you identify strokes. (One of the basic principles is to find edge gradients that "point at" each other.) A histogram of edge strengths may have only one strong peak.