I have created a neural network in Keras using the InceptionV3 pretrained model:
base_model = applications.inception_v3.InceptionV3(weights='imagenet', include_top=False)
# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(2048, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(len(labels_list), activation='sigmoid')(x)
I trained the model successfully and want to predict the following image: https://imgur.com/a/hoNjDfR. To do so, the image is resized to 299x299 and normalized (just divided by 255):
def img_to_array(img, data_format='channels_last', dtype='float32'):
    if data_format not in {'channels_first', 'channels_last'}:
        raise ValueError('Unknown data_format: %s' % data_format)
    # Numpy array x has format (height, width, channel)
    # or (channel, height, width)
    # but original PIL image has format (width, height, channel)
    x = np.asarray(img, dtype=dtype)
    if len(x.shape) == 3:
        if data_format == 'channels_first':
            x = x.transpose(2, 0, 1)
    elif len(x.shape) == 2:
        if data_format == 'channels_first':
            x = x.reshape((1, x.shape[0], x.shape[1]))
        else:
            x = x.reshape((x.shape[0], x.shape[1], 1))
    else:
        raise ValueError('Unsupported image shape: %s' % (x.shape,))
    return x
def load_image_as_array(path):
    if pil_image is not None:
        _PIL_INTERPOLATION_METHODS = {
            'nearest': pil_image.NEAREST,
            'bilinear': pil_image.BILINEAR,
            'bicubic': pil_image.BICUBIC,
        }
        # These methods were only introduced in version 3.4.0 (2016).
        if hasattr(pil_image, 'HAMMING'):
            _PIL_INTERPOLATION_METHODS['hamming'] = pil_image.HAMMING
        if hasattr(pil_image, 'BOX'):
            _PIL_INTERPOLATION_METHODS['box'] = pil_image.BOX
        # This method is new in version 1.1.3 (2013).
        if hasattr(pil_image, 'LANCZOS'):
            _PIL_INTERPOLATION_METHODS['lanczos'] = pil_image.LANCZOS
    with open(path, 'rb') as f:
        img = pil_image.open(io.BytesIO(f.read()))
        width_height_tuple = (IMG_HEIGHT, IMG_WIDTH)
        resample = _PIL_INTERPOLATION_METHODS['nearest']
        img = img.resize(width_height_tuple, resample)
    return img_to_array(img, data_format=K.image_data_format())
img_array = load_image_as_array('https://imgur.com/a/hoNjDfR')
img_array = img_array/255
Then I predict it with the trained model in Keras:
predict(img_array.reshape(1,img_array.shape[0],img_array.shape[1],img_array.shape[2]))
The result is the following:
array([[0.02083278, 0.00425783, 0.8858412 , 0.17453966, 0.2628744 ,
0.00428194, 0.2307986 , 0.01038828, 0.07561868, 0.00983179,
0.09568241, 0.03087404, 0.00751176, 0.00651798, 0.03731382,
0.02220723, 0.0187968 , 0.02018479, 0.3416505 , 0.00586909,
0.02030778, 0.01660049, 0.00960067, 0.02457979, 0.9711478 ,
0.00666443, 0.01468313, 0.0035468 , 0.00694743, 0.03057212,
0.00429407, 0.01556832, 0.03173089, 0.01407397, 0.35166138,
0.00734553, 0.0508953 , 0.00336689, 0.0169737 , 0.07512951,
0.00484502, 0.01656419, 0.01643038, 0.02031735, 0.8343202 ,
0.02500874, 0.02459189, 0.01325032, 0.00414564, 0.08371573,
0.00484318]], dtype=float32)
The important point is that four of the outputs are greater than 0.8:
>>> y[y>=0.8]
array([0.9100583 , 0.96635956, 0.91707945, 0.9711707 ], dtype=float32)
Now I have converted my network to .pb and imported it into an Android project. I want to predict the same image on Android, so I resize and normalize the image the same way as in Python, using the following code:
// Resize image:
InputStream imageStream = getAssets().open("test3.jpg");
Bitmap bitmap = BitmapFactory.decodeStream(imageStream);
Bitmap resized_image = utils.processBitmap(bitmap,299);
and then normalize by using the following function:
public static float[] normalizeBitmap(Bitmap source,int size){
float[] output = new float[size * size * 3];
int[] intValues = new int[source.getHeight() * source.getWidth()];
source.getPixels(intValues, 0, source.getWidth(), 0, 0, source.getWidth(), source.getHeight());
for (int i = 0; i < intValues.length; ++i) {
final int val = intValues[i];
output[i * 3] = Color.blue(val) / 255.0f;
output[i * 3 + 1] = Color.green(val) / 255.0f;
output[i * 3 + 2] = Color.red(val) / 255.0f ;
}
return output;
}
But in Java I get different values. None of the four indices has a value greater than 0.8; their values are between 0.1 and 0.4.
I have checked my code several times, but I don't understand why I don't get the same values for the same image on Android. Any idea or hint?
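One difference that is visible in the code above and may be worth ruling out: the Java loop writes blue, green, red for each pixel, while np.asarray on an RGB-mode PIL image yields red, green, blue. The sketch below only illustrates that ordering difference on packed ARGB ints (the layout Bitmap.getPixels returns). pack_argb and to_floats are hypothetical helpers, not part of either code base, and this is an assumption about the cause rather than a confirmed diagnosis; the resize step (utils.processBitmap) is not shown and could differ as well.

```python
def pack_argb(r, g, b, a=255):
    """Pack 8-bit channels into one ARGB int (hypothetical helper)."""
    return (a << 24) | (r << 16) | (g << 8) | b


def to_floats(pixels, order):
    """Flatten packed pixels to floats in [0, 1] using the given channel order."""
    shifts = {"r": 16, "g": 8, "b": 0}
    out = []
    for val in pixels:
        for channel in order:
            out.append(((val >> shifts[channel]) & 0xFF) / 255.0)
    return out


# Two made-up pixels: pure red and a dark bluish gray.
pixels = [pack_argb(255, 0, 0), pack_argb(10, 20, 30)]

as_bgr = to_floats(pixels, "bgr")  # order written by normalizeBitmap above
as_rgb = to_floats(pixels, "rgb")  # order of np.asarray on an RGB PIL image

print("BGR:", as_bgr[:3])
print("RGB:", as_rgb[:3])
```

If the model was trained on RGB input, feeding it the first layout swaps the red and blue channels of every pixel, which is enough to change the predictions.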
I'm developing an Android app to recognize text on a particular plate, as in the photo here:
I have to recognize the white text (e.g. next to "Mod."). I'm using Google ML Kit's text recognition API, but it fails. So I'm using OpenCV to preprocess the image, but I don't know how to emphasize the white text so that the OCR recognizes it. I tried several things, such as contrast, brightness, gamma correction, and adaptive thresholding, but the results vary a lot depending on how the photo is taken. Do you have any ideas?
Thank you very much.
I coded this example in Python (since SIFT lives in OpenCV's non-free contrib module, which is not part of the stock Android SDK), but you can still use it to understand how to solve the problem.
First I created this image as a template:
Step 1: Load images
""" 1. Load images """
# load image of plate
src_path = "nRHzD.jpg"
src = cv2.imread(src_path)
# load template of plate (to be looked for)
src_template_path = "nRHzD_template.jpg"
src_template = cv2.imread(src_template_path)
Step 2: Find the template using SIFT and perspective transformation
# convert images to gray scale
src_gray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
src_template_gray = cv2.cvtColor(src_template, cv2.COLOR_BGR2GRAY)
# use SIFT to find template
n_matches_min = 10
template_found, homography = find_template(src_gray, src_template_gray, n_matches_min)
warp = transform_perspective_and_crop(homography, src, src_gray, src_template)
warp_gray = cv2.cvtColor(warp, cv2.COLOR_BGR2GRAY)
warp_hsv = cv2.cvtColor(warp, cv2.COLOR_BGR2HSV)
template_hsv = cv2.cvtColor(src_template, cv2.COLOR_BGR2HSV)
Step 3: Find regions of interest (using the green parts of the template image)
green_hsv_lower_bound = [50, 250, 250]
green_hsv_upper_bound = [60, 255, 255]
mask_rois, mask_rois_img = crop_img_in_hsv_range(warp, template_hsv, green_hsv_lower_bound, green_hsv_upper_bound)
roi_list = separate_rois(mask_rois, warp_gray)
# sort the rois by distance to the top left corner -> x (values[1]) + y (values[2])
roi_list = sorted(roi_list, key=lambda values: values[1]+values[2])
Step 4: Apply a Canny Edge detection to the rois (regions of interest)
for i, roi in enumerate(roi_list):
    roi_img, roi_x_offset, roi_y_offset = roi
    print("#roi:{} x:{} y:{}".format(i, roi_x_offset, roi_y_offset))
    roi_img_blur_threshold = cv2.Canny(roi_img, 40, 200)
    cv2.imshow("ROI image", roi_img_blur_threshold)
    cv2.waitKey()
There are many ways to detect the digits; one of the easiest approaches is to run K-means clustering on each of the contours.
Full code:
""" This code shows a way of getting the digit's edges in a pre-defined position (in green) """
import cv2
import numpy as np
def find_template(src_gray, src_template_gray, n_matches_min):
    # Initiate SIFT detector (cv2.SIFT_create() in OpenCV >= 4.4)
    sift = cv2.xfeatures2d.SIFT_create()
    """ find grid using SIFT """
    # find the keypoints and descriptors with SIFT
    kp1, des1 = sift.detectAndCompute(src_template_gray, None)
    kp2, des2 = sift.detectAndCompute(src_gray, None)
    FLANN_INDEX_KDTREE = 0
    index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
    search_params = dict(checks = 50)
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(des1, des2, k=2)
    # store all the good matches as per Lowe's ratio test.
    good = []
    for m,n in matches:
        if m.distance < 0.7*n.distance:
            good.append(m)
    # defaults returned when the template is not found
    sift_matches = None
    homography = None
    if len(good) > n_matches_min:
        src_pts = np.float32([kp1[m.queryIdx].pt for m in good ]).reshape(-1,1,2)
        dst_pts = np.float32([kp2[m.trainIdx].pt for m in good ]).reshape(-1,1,2)
        M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC,5.0)
        matchesMask = mask.ravel().tolist()
        h_template, w_template = src_template_gray.shape
        pts = np.float32([[0, 0], [0, h_template - 1], [w_template - 1, h_template - 1], [w_template - 1,0]]).reshape(-1,1,2)
        homography = cv2.perspectiveTransform(pts, M)
    else:
        print("Not enough matches are found - %d/%d" % (len(good), n_matches_min))
        matchesMask = None
    # show matches
    draw_params = dict(matchColor = (0, 255, 0), # draw matches in green color
                       singlePointColor = None,
                       matchesMask = matchesMask, # draw only inliers
                       flags = 2)
    if matchesMask:
        src_gray_copy = src_gray.copy()
        sift_matches = cv2.polylines(src_gray_copy, [np.int32(homography)], True, 255, 2, cv2.LINE_AA)
        sift_matches = cv2.drawMatches(src_template_gray, kp1, src_gray_copy, kp2, good, None, **draw_params)
    return sift_matches, homography
def transform_perspective_and_crop(homography, src, src_gray, src_template):
    # src_template is the colour (3-channel) template image
    """ get mask and contour of template """
    mask_img_template = np.zeros(src_gray.shape, dtype=np.uint8)
    mask_img_template = cv2.polylines(mask_img_template, [np.int32(homography)], True, 255, 1, cv2.LINE_AA)
    # OpenCV 3.x returns (image, contours, hierarchy); 4.x returns (contours, hierarchy)
    _ret, contours, hierarchy = cv2.findContours(mask_img_template, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    template_contour = None
    # approximate the contour
    c = contours[0]
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    # if our approximated contour has four points, then
    # we can assume that we have found our template
    warp = None
    if len(approx) == 4:
        template_contour = approx
        cv2.drawContours(mask_img_template, [template_contour] , -1, (255,0,0), -1)
        """ Transform perspective """
        # now that we have our template contour, we need to determine
        # the top-left, top-right, bottom-right, and bottom-left
        # points so that we can later warp the image -- we'll start
        # by reshaping our contour to be our finals and initializing
        # our output rectangle in top-left, top-right, bottom-right,
        # and bottom-left order
        pts = template_contour.reshape(4, 2)
        rect = np.zeros((4, 2), dtype = "float32")
        # the top-left point has the smallest sum whereas the
        # bottom-right has the largest sum
        s = pts.sum(axis = 1)
        rect[0] = pts[np.argmin(s)]
        rect[2] = pts[np.argmax(s)]
        # compute the difference between the points -- the top-right
        # will have the minimum difference and the bottom-left will
        # have the maximum difference
        diff = np.diff(pts, axis = 1)
        rect[1] = pts[np.argmin(diff)]
        rect[3] = pts[np.argmax(diff)]
        # now that we have our rectangle of points, let's compute
        # the width of our new image
        (tl, tr, br, bl) = rect
        widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
        widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
        # ...and now for the height of our new image
        heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
        heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
        # take the maximum of the width and height values to reach
        # our final dimensions
        maxWidth = max(int(widthA), int(widthB))
        maxHeight = max(int(heightA), int(heightB))
        # construct our destination points which will be used to
        # map the screen to a top-down, "birds eye" view
        homography = np.array([
            [0, 0],
            [maxWidth - 1, 0],
            [maxWidth - 1, maxHeight - 1],
            [0, maxHeight - 1]], dtype = "float32")
        # calculate the perspective transform matrix and warp
        # the perspective to grab the screen
        M = cv2.getPerspectiveTransform(rect, homography)
        warp = cv2.warpPerspective(src, M, (maxWidth, maxHeight))
        # resize warp
        h_template, w_template, _n_channels = src_template.shape
        warp = cv2.resize(warp, (w_template, h_template), interpolation=cv2.INTER_AREA)
    # warp stays None when no four-point contour was found
    return warp
def crop_img_in_hsv_range(img, hsv, lower_bound, upper_bound):
    mask = cv2.inRange(hsv, np.array(lower_bound), np.array(upper_bound))
    # do an MORPH_OPEN (erosion followed by dilation) to remove isolated pixels
    kernel = np.ones((5,5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # Bitwise-AND mask and original image
    res = cv2.bitwise_and(img, img, mask=mask)
    return mask, res
def separate_rois(column_mask, img_gray):
    # go through each of the boxes
    # https://stackoverflow.com/questions/41592039/contouring-a-binary-mask-with-opencv-python
    border = cv2.copyMakeBorder(column_mask, 1, 1, 1, 1, cv2.BORDER_CONSTANT, value=0)
    # OpenCV 3.x returns (image, contours, hierarchy); 4.x returns (contours, hierarchy)
    _, contours, hierarchy = cv2.findContours(border, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE, offset=(-1, -1))
    cell_list = []
    for contour in contours:
        cell_mask = np.zeros_like(img_gray) # Create mask where white is what we want, black otherwise
        cv2.drawContours(cell_mask, [contour], -1, 255, -1) # Draw filled contour in mask
        # turn that mask into a rectangle
        (x,y,w,h) = cv2.boundingRect(contour)
        #print("x:{} y:{} w:{} h:{}".format(x, y, w, h))
        cv2.rectangle(cell_mask, (x, y), (x+w, y+h), 255, -1)
        # copy the img_gray using that mask
        img_tmp_region = cv2.bitwise_and(img_gray, img_gray, mask= cell_mask)
        # Now crop
        (y, x) = np.where(cell_mask == 255)
        (top_y, top_x) = (np.min(y), np.min(x))
        (bottom_y, bottom_x) = (np.max(y), np.max(x))
        img_tmp_region = img_tmp_region[top_y:bottom_y+1, top_x:bottom_x+1]
        cell_list.append([img_tmp_region, top_x, top_y])
    return cell_list
""" 1. Load images """
# load image of plate
src_path = "nRHzD.jpg"
src = cv2.imread(src_path)
# load template of plate (to be looked for)
src_template_path = "nRHzD_template.jpg"
src_template = cv2.imread(src_template_path)
""" 2. Find the plate (using the template image) and crop it into a rectangle """
# convert images to gray scale
src_gray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
src_template_gray = cv2.cvtColor(src_template, cv2.COLOR_BGR2GRAY)
# use SIFT to find template
n_matches_min = 10
template_found, homography = find_template(src_gray, src_template_gray, n_matches_min)
warp = transform_perspective_and_crop(homography, src, src_gray, src_template)
warp_gray = cv2.cvtColor(warp, cv2.COLOR_BGR2GRAY)
warp_hsv = cv2.cvtColor(warp, cv2.COLOR_BGR2HSV)
template_hsv = cv2.cvtColor(src_template, cv2.COLOR_BGR2HSV)
""" 3. Find regions of interest (using the green parts of the template image) """
green_hsv_lower_bound = [50, 250, 250]
green_hsv_upper_bound = [60, 255, 255]
mask_rois, mask_rois_img = crop_img_in_hsv_range(warp, template_hsv, green_hsv_lower_bound, green_hsv_upper_bound)
roi_list = separate_rois(mask_rois, warp_gray)
# sort the rois by distance to the top left corner -> x (values[1]) + y (values[2])
roi_list = sorted(roi_list, key=lambda values: values[1]+values[2])
""" 4. Apply a Canny Edge detection to the rois (regions of interest) """
for i, roi in enumerate(roi_list):
    roi_img, roi_x_offset, roi_y_offset = roi
    print("#roi:{} x:{} y:{}".format(i, roi_x_offset, roi_y_offset))
    roi_img_blur_threshold = cv2.Canny(roi_img, 40, 200)
    cv2.imshow("ROI image", roi_img_blur_threshold)
    cv2.waitKey()
I am trying to use the JavaCV K-means implementation, but I get the following error:
OpenCV Error: Assertion failed (!centers.empty()) in cvKMeans2, file src\matrix.cpp, line 4233
My source code is:
IplImage src = cvLoadImage(fileName, CV_LOAD_IMAGE_COLOR);
int cluster_count = 3;
int attempts = 10;
CvTermCriteria termCriteria = new CvTermCriteria(TermCriteria.EPS + TermCriteria.MAX_ITER, 10, 1.0);
cvReshape(src, src.asCvMat(), 1, src.height() * src.width());
IplImage samples = cvCreateImage(cvGetSize(src), src.depth(), 1);
cvConvertImage(src, samples, CV_32F);
IplImage labels = cvCreateImage(new CvSize(samples.height()), 1, CV_8U);
IplImage centers = cvCreateImage(new CvSize(cluster_count), 1, CV_32F);
cvKMeans2(samples, cluster_count, labels, termCriteria, 1, new long[attempts], KMEANS_RANDOM_CENTERS, centers, new double[attempts]);
I'm a beginner with JavaCV and would like to know what I am doing wrong in this code.
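For reference, OpenCV's k-means expects the samples as a floating-point matrix with one row per sample, an integer label per sample, and one row per cluster center. The sketch below shows that layout for pixel clustering: N rows of 3 features, N labels, K centers of 3 values each. It is a plain-Python stand-in (Lloyd's algorithm with a deterministic start) written only for illustration; it is not the JavaCV API, and the kmeans function, the tiny image and the stopping rule are assumptions.

```python
def kmeans(samples, k, max_iter=10, eps=1.0):
    """Plain Lloyd's algorithm; samples is a list of N feature rows."""
    # Deterministic start: the first k distinct rows.
    # (OpenCV's KMEANS_RANDOM_CENTERS picks random centers instead.)
    centers = []
    for row in samples:
        if row not in centers:
            centers.append(list(row))
        if len(centers) == k:
            break
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, row in enumerate(samples):
            dists = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centers]
            labels[i] = dists.index(min(dists))
        # Update step: each center moves to the mean of its members.
        shift = 0.0
        for j in range(len(centers)):
            members = [samples[i] for i in range(len(samples)) if labels[i] == j]
            if not members:
                continue
            new = [sum(col) / len(members) for col in zip(*members)]
            shift = max(shift, sum((a - b) ** 2 for a, b in zip(new, centers[j])))
            centers[j] = new
        if shift <= eps * eps:
            break
    return labels, centers


# A made-up 2x3 image with three channels per pixel.
image = [
    [(250, 0, 0), (0, 250, 0), (0, 0, 250)],
    [(255, 5, 0), (0, 255, 5), (5, 0, 255)],
]

# Reshape to N x 3: one row per pixel, one column per channel, as floats.
samples = [[float(c) for c in px] for row in image for px in row]

labels, centers = kmeans(samples, k=3)
print(len(samples), "samples,", len(labels), "labels,", len(centers), "centers")
```

The point to carry over is the shapes: the sample matrix has height * width rows and 3 columns, and the centers output needs cluster_count rows with the same number of columns.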
I'm new to OpenCV's SVMs. I tried a sample classifier, but it only returns 0 as the predicted label, even though I used the value 5 both for training and for prediction.
I've changed the values about a hundred times, but I just don't get what's wrong. I'm using OpenCV 3.0 with Java. Here's my code:
Mat labels = new Mat(new Size(1,4),CvType.CV_32SC1);
labels.put(0, 0, 1);
labels.put(1, 0, 1);
labels.put(2, 0, 1);
labels.put(3, 0, 0);
Mat data = new Mat(new Size(1,4),CvType.CV_32FC1);
data.put(0, 0, 5);
data.put(1, 0, 2);
data.put(2, 0, 3);
data.put(3, 0, 8);
Mat testSamples = new Mat(new Size(1,1),CvType.CV_32FC1);
testSamples.put(0,0,5);
SVM svm = SVM.create();
TermCriteria criteria = new TermCriteria(TermCriteria.EPS + TermCriteria.MAX_ITER,100,0.1);
svm.setKernel(SVM.LINEAR);
svm.setType(SVM.C_SVC);
svm.setGamma(0.5);
svm.setNu(0.5);
svm.setC(1);
svm.setTermCriteria(criteria);
//data is N x 64 trained data Mat , labels is N x 1 label Mat with integer values;
svm.train(data, Ml.ROW_SAMPLE, labels);
Mat results = new Mat();
int predictedClass = (int) svm.predict(testSamples, results, 0);
Even if I change the labels to 1 and 2, I still get 0.0 as a response, so something has to be fundamentally wrong. I just don't know what to do. Please help! :)
I had a similar problem in C++. I'm not sure whether it's the same in Java, but in C++ the predictions were stored in the results matrix instead of being returned as a float.
I am learning how to get the local and global maxima in an image. As far as I know, an image has only one global maximum and one global minimum, and I managed to get these values and their locations in the image. So my questions are:
how to get the local maxima in an image
how to get the local minima in an image
As you can see in the code below, I am using a mask, but at run time I receive the error message shown below. Please let me know why a mask is needed and how to use it properly.
update:
Line 32 is: MinMaxLocResult s = Core.minMaxLoc(gsMat, mask);
code:
public static void main(String[] args) {
MatFactory matFactory = new MatFactory();
FilePathUtils.addInputPath(path_Obj);
Mat bgrMat = matFactory.newMat(FilePathUtils.getInputFileFullPathList().get(0));
Mat gsMat = SysUtils.rgbToGrayScaleMat(bgrMat);
Log.D(TAG, "main", "gsMat.dump(): \n" + gsMat.dump());
Mat mask = new Mat(new Size(3,3), CvType.CV_8U);//which type i should set for the mask
MinMaxLocResult s = Core.minMaxLoc(gsMat, mask);
Log.D(TAG, "main", "s.maxVal: " + s.maxVal);//to get the global maximum
Log.D(TAG, "main", "s.minVal: " + s.minVal);//to get the global minimum
Log.D(TAG, "main", "s.maxLoc: " + s.maxLoc);//to get the coordinates of the global maximum
Log.D(TAG, "main", "s.minLoc: " + s.minLoc);//to get the coordinates of the global minimum
}
error message:
OpenCV Error: Assertion failed (A.size == arrays[i0]->size) in cv::NAryMatIterator::init, file ..\..\..\..\opencv\modules\core\src\matrix.cpp, line 3197
Exception in thread "main" CvException [org.opencv.core.CvException: ..\..\..\..\opencv\modules\core\src\matrix.cpp:3197: error: (-215) A.size == arrays[i0]->size in function cv::NAryMatIterator::init
]
at org.opencv.core.Core.n_minMaxLocManual(Native Method)
at org.opencv.core.Core.minMaxLoc(Core.java:7919)
at com.example.globallocalmaxima_00.MainClass.main(MainClass.java:32)
To calculate the global min/max values you don't need a mask at all; calling Core.minMaxLoc(gsMat) is enough. The assertion fails because a mask, when given, must have the same size as the image.
To calculate the local min/max values you can use a little trick: erode and dilate the image, then compare each pixel with the original image. If the original value equals the eroded value, the pixel is a local minimum; if it equals the dilated value, the pixel is a local maximum.
The code is as follows:
Mat eroded = new Mat();
Mat dilated = new Mat();
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(5,5));
Imgproc.erode(gsMat, eroded, kernel);
Imgproc.dilate(gsMat, dilated, kernel);
Mat localMin = new Mat(gsMat.size(), CvType.CV_8U, new Scalar(0));
Mat localMax = new Mat(gsMat.size(), CvType.CV_8U, new Scalar(0));
for (int i = 0; i < gsMat.rows(); i++) {
    for (int j = 0; j < gsMat.cols(); j++) {
        // Mat.get returns a double[] with one entry per channel
        double value = gsMat.get(i, j)[0];
        if (value == eroded.get(i, j)[0])
            localMin.put(i, j, 255);
        if (value == dilated.get(i, j)[0])
            localMax.put(i, j, 255);
    }
}
Please note that I'm not a Java programmer, so the code is only an illustration of the algorithm.
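The same erode/dilate comparison can be sketched without OpenCV, which makes the idea easy to check on a small grid. This is a minimal plain-Python sketch, assuming a 3x3 square window that is clipped at the image border; window_reduce, local_extrema and the sample grid are made up for illustration.

```python
def window_reduce(img, size, reduce_fn):
    """Apply reduce_fn (min or max) over a size x size window around each pixel.

    Windows are clipped at the image border.
    """
    h, w = len(img), len(img[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            out[i][j] = reduce_fn(
                img[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            )
    return out


def local_extrema(img, size=3):
    """Return (minima, maxima) as lists of (row, col) positions."""
    eroded = window_reduce(img, size, min)   # grayscale erosion
    dilated = window_reduce(img, size, max)  # grayscale dilation
    h, w = len(img), len(img[0])
    coords = [(i, j) for i in range(h) for j in range(w)]
    minima = [(i, j) for i, j in coords if img[i][j] == eroded[i][j]]
    maxima = [(i, j) for i, j in coords if img[i][j] == dilated[i][j]]
    return minima, maxima


img = [
    [1, 2, 1, 0],
    [2, 9, 2, 1],
    [1, 2, 1, 5],
    [0, 1, 3, 4],
]

minima, maxima = local_extrema(img)
print("local minima:", minima)
print("local maxima:", maxima)
```

Note that on a flat region every pixel equals both its eroded and its dilated value, so plateaus are reported as minima and maxima at the same time; filter those out if that matters for your use case.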