I am trying to rotate image without standard method , making color array and manipulate it, but when I invoke the, rotation I get black points (look the picture)
Here is my code, colScaled is the picture I am trying to convert to an array:
public void arrays() {
colScaled = zoom2();
int j = 0;
int i = 0;
angel = Integer.parseInt(this.mn.jTextField1.getText());
float degree = (float) Math.toRadians(angel);
float cos = (float) Math.cos(degree);
float sin = (float) Math.sin(degree);
int W = Math.round(colScaled[0].length * Math.abs(sin) + colScaled.length * Math.abs(cos));
int H = Math.round(colScaled[0].length * Math.abs(cos) + colScaled.length * Math.abs(sin));
int x;
int y;
int xn = (int) W / 2;
int yn = (int) H / 2;
int hw = (int) colScaled.length / 2;
int hh = (int) colScaled[0].length / 2;
BufferedImage image = new BufferedImage(W + 1, H + 1, im.getType());
for (i = 0; i < colScaled.length; i++) {
for (j = 0; j < colScaled[0].length; j++) {
x = Math.round((i - hw) * cos - (j - hh) * sin + xn);
y = Math.round((i - hw) * sin + (j - hh) * cos + yn);
image.setRGB(x, y, colScaled[i][j]);
}
}
ImageIcon ico = new ImageIcon(image);
this.mn.jLabel1.setIcon(ico);
}
Notice this block in your code :-
for (i = 0; i < colScaled.length; i++) {
for (j = 0; j < colScaled[0].length; j++) {
x = Math.round((i - hw) * cos - (j - hh) * sin + xn);
y = Math.round((i - hw) * sin + (j - hh) * cos + yn);
image.setRGB(x, y, colScaled[i][j]);
}
}
The x and y is pixel coordinate in source image (colScaled).
The objective of this code is to fill all pixels in destination image (image).
In your loop, there is no guarantee that all pixels in the destination image will be filled, even it is in the rectangle zone.
The above image depict the problem.
See? It is possible that the red pixel in the destination image will not be written.
The correct solution is to iterating pixel in destination image, then find a corresponding pixel in source image later.
Edit: After posting, I just saw the Spektre's comment.
I agree, it seems to be a duplicated question. The word "pixel array" made me thing it is not.
I am trying to find image in an image. I do this for desktop automation. At this moment, I'm trying to be fast, not precise. As such, I have decided to match similar image solely based on the same average color.
If I pick several icons on my desktop, for example:
And I will search for the last one (I'm still wondering what this file is):
You can clearly see what is most likely to be the match:
In different situations, this may not work. However when image size is given, it should be pretty reliable and lightning fast.
I can get a screenshot as BufferedImage object:
MSWindow window = MSWindow.windowFromName("Firefox", false);
BufferedImage img = window.screenshot();
//Or, if I can estimate smaller region for searching:
BufferedImage img2 = window.screenshotCrop(20,20,50,50);
Of course, the image to search image will be loaded from template saved in a file:
BufferedImage img = ImageIO.read(...whatever goes in there, I'm still confused...);
I explained what all I know so that we can focus on the only problem:
Q: How can I get average color on buffered image? How can I get such average color on sub-rectangle of that image?
Speed wins here. In this exceptional case, I consider it more valuable than code readability.
I think that no matter what you do, you are going to have an O(wh) operation, where w is your width and h is your height.
Therefore, I'm going to post this (naive) solution to fulfil the first part of your question as I do not believe there is a faster solution.
/*
* Where bi is your image, (x0,y0) is your upper left coordinate, and (w,h)
* are your width and height respectively
*/
public static Color averageColor(BufferedImage bi, int x0, int y0, int w,
int h) {
int x1 = x0 + w;
int y1 = y0 + h;
long sumr = 0, sumg = 0, sumb = 0;
for (int x = x0; x < x1; x++) {
for (int y = y0; y < y1; y++) {
Color pixel = new Color(bi.getRGB(x, y));
sumr += pixel.getRed();
sumg += pixel.getGreen();
sumb += pixel.getBlue();
}
}
int num = w * h;
return new Color(sumr / num, sumg / num, sumb / num);
}
There is a constant time method for finding the mean colour of a rectangular section of an image but it requires a linear preprocess. This should be fine in your case. This method can also be used to find the mean value of a rectangular prism in a 3d array or any higher dimensional analog of the problem. I will be using a gray scale example but this can be easily extended to 3 or more channels simply by repeating the process.
Lets say we have a 2 dimensional array of numbers we will call "img".
The first step is to generate a new array of the same dimensions where each element contains the sum of all values in the original image that lie within the rectangle that bounds that element and the top left element of the image.
You can use the following method to construct such an image in linear time:
int width = 1920;
int height = 1080;
//source data
int[] img = GrayScaleScreenCapture();
int[] helperImg = int[width * height]
for(int y = 0; y < height; ++y)
{
for(int x = 0; x < width; ++x)
{
int total = img[y * width + x];
if(x > 0)
{
//Add value from the pixel to the left in helperImg
total += helperImg[y * width + (x - 1)];
}
if(y > 0)
{
//Add value from the pixel above in helperImg
total += helperImg[(y - 1) * width + x];
}
if(x > 0 && y > 0)
{
//Subtract value from the pixel above and to the left in helperImg
total -= helperImg[(y - 1) * width + (x - 1)];
}
helperImg[y * width + x] = total;
}
}
Now we can use helperImg to find the total of all values within a given rectangle of img in constant time:
//Some Rectangle with corners (x0, y0), (x1, y0) , (x0, y1), (x1, y1)
int x0 = 50;
int x1 = 150;
int y0 = 25;
int y1 = 200;
int totalOfRect = helperImg[y1 * width + x1];
if(x0 > 0)
{
totalOfRect -= helperImg[y1 * width + (x0 - 1)];
}
if(y0 > 0)
{
totalOfRect -= helperImg[(y0 - 1) * width + x1];
}
if(x0 > 0 && y0 > 0)
{
totalOfRect += helperImg[(y0 - 1) * width + (x0 - 1)];
}
Finally, we simply divide totalOfRect by the area of the rectangle to get the mean value:
int rWidth = x1 - x0 + 1;
int rheight = y1 - y0 + 1;
int meanOfRect = totalOfRect / (rWidth * rHeight);
Here's a version based on k_g's answer for a full BufferedImage with adjustable sample precision (step).
public static Color getAverageColor(BufferedImage bi) {
int step = 5;
int sampled = 0;
long sumr = 0, sumg = 0, sumb = 0;
for (int x = 0; x < bi.getWidth(); x++) {
for (int y = 0; y < bi.getHeight(); y++) {
if (x % step == 0 && y % step == 0) {
Color pixel = new Color(bi.getRGB(x, y));
sumr += pixel.getRed();
sumg += pixel.getGreen();
sumb += pixel.getBlue();
sampled++;
}
}
}
int dim = bi.getWidth()*bi.getHeight();
// Log.info("step=" + step + " sampled " + sampled + " out of " + dim + " pixels (" + String.format("%.1f", (float)(100*sampled/dim)) + " %)");
return new Color(Math.round(sumr / sampled), Math.round(sumg / sampled), Math.round(sumb / sampled));
}
I am trying to properly rotate a sword in my 2D game. I have a sword image file, and I wish to rotate the image at the player's location. I tried using Graphics2D and AffineTransform, but the problem is that the player moves on a different coordinate plane, the Screen class, and the Graphics uses the literal location of the pixels on the JFrame. So, I realized that I need to render the sword by rotating the image itself, and then saving it into a pixel array for my screen class to render. However, I don't know how to do this. Here is the code for my screen rendering method:
public void render(double d, double yOffset2, BufferedImage image, int colour,
int mirrorDir, double scale, SpriteSheet sheet) {
d -= xOffset;
yOffset2 -= yOffset;
boolean mirrorX = (mirrorDir & BIT_MIRROR_X) > 0;
boolean mirrorY = (mirrorDir & BIT_MIRROR_Y) > 0;
double scaleMap = scale - 1;
for (int y = 0; y < image.getHeight(); y++) {
int ySheet = y;
if (mirrorY)
ySheet = image.getHeight() - 1 - y;
int yPixel = (int) (y + yOffset2 + (y * scaleMap) - ((scaleMap * 8) / 2));
for (int x = 0; x < image.getWidth(); x++) {
int xPixel = (int) (x + d + (x * scaleMap) - ((scaleMap * 8) / 2));
int xSheet = x;
if (mirrorX)
xSheet = image.getWidth() - 1 - x;
int col = (colour >> (sheet.pixels[xSheet + ySheet
* sheet.width])) & 255;
if (col < 255) {
for (int yScale = 0; yScale < scale; yScale++) {
if (yPixel + yScale < 0 || yPixel + yScale >= height)
continue;
for (int xScale = 0; xScale < scale; xScale++) {
if (x + d < 0 || x + d >= width)
continue;
pixels[(xPixel + xScale) + (yPixel + yScale)
* width] = col;
}
}
}
}
}
}
Here is one of my poor attempts to call the render method from the Sword Class:
public void render(Screen screen) {
AffineTransform at = new AffineTransform();
at.rotate(1, image.getWidth() / 2, image.getHeight() / 2);
AffineTransformOp op = new AffineTransformOp(at,
AffineTransformOp.TYPE_BILINEAR);
image = op.filter(image, null);
screen.render(this.x, this.y, image, SwordColor, 1, 1.5, sheet);
hitBox.setLocation((int) this.x, (int) this.y);
for (Entity entity : level.getEntities()) {
if (entity instanceof Mob) {
if (hitBox.intersects(((Mob) entity).hitBox)) {
// ((Mob) entity).health--;
}
}
}
}
Thank you for any help you can provide, and please feel free to tell me if theres a better way to do this.
You can rotate() the image around an anchor point, also seen here in a Graphics2D context. The method concatenates translate(), rotate() and translate() operations, also seen here as explicit transformations.
Addendum: It rotates the image, but how do I save the pixels of the image as an array?
Once you filter() the image, use one of the ImageIO.write() methods to save the resulting RenderedImage, for example.
I am writing a java code that divides an image into chunks and rotate to some degree and combine the chunks to become one final image. Then use same code to divide the image into chunks and rotate opposite. I expect to get the same image as the original but I get an image with black line separated between them. For example an image is divided into 8 rows and 8 columns and conduct rotation. I have googled it and come up with the following code:
public static BufferedImage Didvide( BufferedImage image , int Bz ,double angle ){
int rows = Bz;
int cols = Bz;
int chunks = rows * cols;
int chunkWidth = image.getWidth() / cols;
int chunkHeight = image.getHeight() / rows;
int count = 0;
BufferedImage imgs[] = new BufferedImage[chunks];
for (int x = 0; x < rows; x++) {
for (int y = 0; y < cols; y++) {
imgs[count] = new BufferedImage(chunkWidth, chunkHeight,
image.getType());
// draws image chunk
Graphics2D gr = imgs[count++].createGraphics();
gr.drawImage(image, 0, 0, chunkWidth, chunkHeight, chunkWidth
* y, chunkHeight * x, chunkWidth * y + chunkWidth,
chunkHeight * x + chunkHeight, null);
gr.dispose();
}
}
BufferedImage[] Rimgs = new BufferedImage[imgs.length];
for (int i = 0; i < imgs.length; i++) {
Rimgs[i] = rotate(imgs[i], angle);
}
chunkWidth = Rimgs[0].getWidth();
chunkHeight = Rimgs[0].getHeight();
// Initializing the final image
BufferedImage finalImg = new BufferedImage(chunkWidth * cols,
chunkHeight * rows, BufferedImage.TYPE_3BYTE_BGR);
int num = 0;
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
finalImg.createGraphics().drawImage(Rimgs[num], chunkWidth * j,
chunkHeight * i, null);
num++;
} } return finalImg; }
public static BufferedImage rotate(BufferedImage image, double angle) {
double sin = Math.abs(Math.sin(angle)), cos = Math.abs(Math.cos(angle));
int w = image.getWidth(), h = image.getHeight();
int neww = (int) Math.floor(w * cos + h * sin), newh = (int) Math
.floor(h * cos + w * sin);
GraphicsConfiguration gc = getDefaultConfiguration();
BufferedImage result = gc.createCompatibleImage(neww, newh,
Transparency.OPAQUE);
Graphics2D g = result.createGraphics();
g.translate((neww - w) / 2, (newh - h) / 2);
g.rotate(angle, w / 2, h / 2);
g.drawRenderedImage(image, null);
g.dispose();
return result;
}
The problem I face after dividing an image of baboo gray-scale 298X298 pixel into 8 col and 8 row, the resulting image has black lines separating columns. However when I divide the image into 12 or 4 it works fine. Can you please let me know where I should consider.
Seems I can not post image.
When I divide and rotate the image into 8 rows and 8 columns of an image with 298X298, I get a result of 296X296 pixel. How can I fix this. So the size of before dividing and rotating is same as after.
Thanks in advance for your help.
Does anyone know of an algorithm (or search terms / descriptions) to locate a known image within a larger image?
e.g.
I have an image of a single desktop window containing various buttons and areas (target). I also have code to capture a screen shot of the current desktop. I would like an algorithm that will help me find the target image within the larger desktop image (what exact x and y coordinates the window is located at). The target image may be located anywhere in the larger image and may not be 100% exactly the same (very similar but not exact possibly b/c of OS display differences)
Does anyone know of such an algorithm or class of algorithms?
I have found various image segmentation and computer vision algorithms but they seem geared to "fuzzy" classification of regions and not locating a specific image within another.
** My goal is to create a framework that, given some seed target images, can find "look" at the desktop, find the target area and "watch" it for changes. **
Have a look at the paper I wrote: http://werner.yellowcouch.org/Papers/subimg/index.html. It's highly detailed and appears to be the only article discussing how to apply fourier transformation to the problem of subimage finding.
In short, if you want to use the fourier transform one could apply the following formula: the correlation between image A and image B when image A is shifted over dx,dy is given in the following matrix: C=ifft(fft(A) x conjugate(fft(B)). So, the position in image C that has the highest value, has the highest correlation and that position reflects dx,dy.
This result works well for subimages that are relatively large. For smaller images, some more work is necessary as explained in the article. Nevertheless, such fourier transforms are quite fast. It results in around 3*sxsylog_2(sx*sy)+3*sx*sy operations.
You said your image may not be exactly the same, but then say you don't want "fuzzy" algorithms. I'm not sure those are compatible. In general, though, I think you want to look at image registration algorithms. There's an open source C++ package called ITK that might provide some hints. Also ImageJ is a popular open source Java package. Both of these have at least some registration capabilities available if you poke around.
Here's the skeleton of code you'd want to use:
// look for all (x,y) positions where target appears in desktop
List<Loc> findMatches(Image desktop, Image target, float threshold) {
List<Loc> locs;
for (int y=0; y<desktop.height()-target.height(); y++) {
for (int x=0; x<desktop.width()-target.width(); x++) {
if (imageDistance(desktop, x, y, target) < threshold) {
locs.append(Loc(x,y));
}
}
}
return locs;
}
// computes the root mean squared error between a rectangular window in
// bigImg and target.
float imageDistance(Image bigImg, int bx, int by, Image target) {
float sum_dist2 = 0.0;
for (int y=0; y<target.height(); y++) {
for (int x=0; x<target.width(); x++) {
// assume RGB images...
for (int colorChannel=0; colorChannel<3; colorChannel++) {
float dist = target.getPixel(x,y) - bigImg.getPixel(bx+x,by+y);
sum_dist2 += dist * dist;
}
}
}
return Math.sqrt(sum_dist2 / target.width() / target.height());
}
You could consider other image distances (see a similar question). For your application, the RMS error is probably a good choice.
There are probably various Java libraries that compute this distance for you efficiently.
You could use unique visual elements of this target area to determine its position. These unique visual elements are like a "signature". Examples: unique icons, images and symbols. This approach works independently of the window resolution if you have unique elements in the corners. For fixed sized windows, just one element is sufficient to find all window coordinates.
Below I illustrate the idea with a simple example using Marvin Framework.
Unique elements:
Program output:
Original Image:
window.png
Source code:
import static marvin.MarvinPluginCollection.*;
public class FindSubimageWindow {
public FindSubimageWindow(){
MarvinImage window = MarvinImageIO.loadImage("./res/window.png");
MarvinImage eclipse = MarvinImageIO.loadImage("./res/eclipse_icon.png");
MarvinImage progress = MarvinImageIO.loadImage("./res/progress_icon.png");
MarvinSegment seg1, seg2;
seg1 = findSubimage(eclipse, window, 0, 0);
drawRect(window, seg1.x1, seg1.y1, seg1.x2-seg1.x1, seg1.y2-seg1.y1);
seg2 = findSubimage(progress, window, 0, 0);
drawRect(window, seg2.x1, seg2.y1, seg2.x2-seg2.x1, seg2.y2-seg2.y1);
drawRect(window, seg1.x1-10, seg1.y1-10, (seg2.x2-seg1.x1)+25, (seg2.y2-seg1.y1)+20);
MarvinImageIO.saveImage(window, "./res/window_out.png");
}
private void drawRect(MarvinImage image, int x, int y, int width, int height){
x-=4; y-=4; width+=8; height+=8;
image.drawRect(x, y, width, height, Color.red);
image.drawRect(x+1, y+1, width-2, height-2, Color.red);
image.drawRect(x+2, y+2, width-4, height-4, Color.red);
}
public static void main(String[] args) {
new FindSubimageWindow();
}
}
I considered the solution of Werner Van Belle (since all other approaches are incredible slow - not practicable at all):
An Adaptive Filter for the Correct Localization of Subimages: FFT
based Subimage Localization Requires Image Normalization to work
properly
I wrote the code in C# where I need my solution, but I am getting highly inaccurate results. Does it really not work well, in contrary to Van Belle's statement, or did I do something wrong? I used https://github.com/tszalay/FFTWSharp as a C# wrapper for FFTW.
Here is my translated code: (original in C++ at http://werner.yellowcouch.org/Papers/subimg/index.html)
using System.Diagnostics;
using System;
using System.Runtime.InteropServices;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using FFTWSharp;
using unsigned1 = System.Byte;
using signed2 = System.Int16;
using signed8 = System.Int64;
public class Subimage
{
/**
* This program finds a subimage in a larger image. It expects two arguments.
* The first is the image in which it must look. The secon dimage is the
* image that is to be found. The program relies on a number of different
* steps to perform the calculation.
*
* It will first normalize the input images in order to improve the
* crosscorrelation matching. Once the best crosscorrelation is found
* a sad-matchers is applied in a grid over the larger image.
*
* The following two article explains the details:
*
* Werner Van Belle; An Adaptive Filter for the Correct Localization
* of Subimages: FFT based Subimage Localization Requires Image
* Normalization to work properly; 11 pages; October 2007.
* http://werner.yellowcouch.org/Papers/subimg/
*
* Werner Van Belle; Correlation between the inproduct and the sum
* of absolute differences is -0.8485 for uniform sampled signals on
* [-1:1]; November 2006
*/
unsafe public static Point FindSubimage_fftw(string[] args)
{
if (args == null || args.Length != 2)
{
Console.Write("Usage: subimg\n" + "\n" + " subimg is an image matcher. It requires two arguments. The first\n" + " image should be the larger of the two. The program will search\n" + " for the best position to superimpose the needle image over the\n" + " haystack image. The output of the the program are the X and Y\n" + " coordinates.\n\n" + " http://werner.yellowouch.org/Papers/subimg/\n");
return new Point();
}
/**
* The larger image will be called A. The smaller image will be called B.
*
* The code below relies heavily upon fftw. The indices necessary for the
* fast r2c and c2r transforms are at best confusing. Especially regarding
* the number of rows and colums (watch our for Asx vs Asx2 !).
*
* After obtaining all the crosscorrelations we will scan through the image
* to find the best sad match. As such we make a backup of the original data
* in advance
*
*/
int Asx = 0, Asy = 0;
signed2[] A = read_image(args[0], ref Asx, ref Asy);
int Asx2 = Asx / 2 + 1;
int Bsx = 0, Bsy = 0;
signed2[] B = read_image(args[1], ref Bsx, ref Bsy);
unsigned1[] Asad = new unsigned1[Asx * Asy];
unsigned1[] Bsad = new unsigned1[Bsx * Bsy];
for (int i = 0; i < Bsx * Bsy; i++)
{
Bsad[i] = (unsigned1)B[i];
Asad[i] = (unsigned1)A[i];
}
for (int i = Bsx * Bsy; i < Asx * Asy; i++)
Asad[i] = (unsigned1)A[i];
/**
* Normalization and windowing of the images.
*
* The window size (wx,wy) is half the size of the smaller subimage. This
* is useful to have as much good information from the subimg.
*/
int wx = Bsx / 2;
int wy = Bsy / 2;
normalize(ref B, Bsx, Bsy, wx, wy);
normalize(ref A, Asx, Asy, wx, wy);
/**
* Preparation of the fourier transforms.
* Aa is the amplitude of image A. Af is the frequence of image A
* Similar for B. crosscors will hold the crosscorrelations.
*/
IntPtr Aa = fftw.malloc(sizeof(double) * Asx * Asy);
IntPtr Af = fftw.malloc(sizeof(double) * 2 * Asx2 * Asy);
IntPtr Ba = fftw.malloc(sizeof(double) * Asx * Asy);
IntPtr Bf = fftw.malloc(sizeof(double) * 2 * Asx2 * Asy);
/**
* The forward transform of A goes from Aa to Af
* The forward tansform of B goes from Ba to Bf
* In Bf we will also calculate the inproduct of Af and Bf
* The backward transform then goes from Bf to Aa again. That
* variable is aliased as crosscors;
*/
//#original: fftw_plan_dft_r2c_2d
//IntPtr forwardA = fftwf.dft(2, new int[] { Asy, Asx }, Aa, Af, fftw_direction.Forward, fftw_flags.Estimate);//equal results
IntPtr forwardA = fftwf.dft_r2c_2d(Asy, Asx, Aa, Af, fftw_flags.Estimate);
//#original: fftw_plan_dft_r2c_2d
//IntPtr forwardB = fftwf.dft(2, new int[] { Asy, Asx }, Ba, Bf, fftw_direction.Forward, fftw_flags.Estimate);//equal results
IntPtr forwardB = fftwf.dft_r2c_2d(Asy, Asx, Ba, Bf, fftw_flags.Estimate);
double* crosscorrs = (double*)Aa;
//#original: fftw_plan_dft_c2r_2d
//IntPtr backward = fftwf.dft(2, new int[] { Asy, Asx }, Bf, Aa, fftw_direction.Backward, fftw_flags.Estimate);//equal results
IntPtr backward = fftwf.dft_c2r_2d(Asy, Asx, Bf, Aa, fftw_flags.Estimate);
/**
* The two forward transforms of A and B. Before we do so we copy the normalized
* data into the double array. For B we also pad the data with 0
*/
for (int row = 0; row < Asy; row++)
for (int col = 0; col < Asx; col++)
((double*)Aa)[col + Asx * row] = A[col + Asx * row];
fftw.execute(forwardA);
for (int j = 0; j < Asx * Asy; j++)
((double*)Ba)[j] = 0;
for (int row = 0; row < Bsy; row++)
for (int col = 0; col < Bsx; col++)
((double*)Ba)[col + Asx * row] = B[col + Bsx * row];
fftw.execute(forwardB);
/**
* The inproduct of the two frequency domains and calculation
* of the crosscorrelations
*/
double norm = Asx * Asy;
for (int j = 0; j < Asx2 * Asy; j++)
{
double a = ((double*)Af)[j * 2];//#Af[j][0];
double b = ((double*)Af)[j * 2 + 1];//#Af[j][1];
double c = ((double*)Bf)[j * 2];//#Bf[j][0];
double d = ((double*)Bf)[j * 2 + 1];//#-Bf[j][1];
double e = a * c - b * d;
double f = a * d + b * c;
((double*)Bf)[j * 2] = (double)(e / norm);//#Bf[j][0] = (fftw_real)(e / norm);
((double*)Bf)[j * 2 + 1] = (double)(f / norm);//Bf[j][1] = (fftw_real)(f / norm);
}
fftw.execute(backward);
/**
* We now have a correlation map. We can spent one more pass
* over the entire image to actually find the best matching images
* as defined by the SAD.
* We calculate this by gridding the entire image according to the
* size of the subimage. In each cel we want to know what the best
* match is.
*/
int sa = 1 + Asx / Bsx;
int sb = 1 + Asy / Bsy;
int sadx = 0;
int sady = 0;
signed8 minsad = Bsx * Bsy * 256L;
for (int a = 0; a < sa; a++)
{
int xl = a * Bsx;
int xr = xl + Bsx;
if (xr > Asx) continue;
for (int b = 0; b < sb; b++)
{
int yl = b * Bsy;
int yr = yl + Bsy;
if (yr > Asy) continue;
// find the maximum correlation in this cell
int cormxat = xl + yl * Asx;
double cormx = crosscorrs[cormxat];
for (int x = xl; x < xr; x++)
for (int y = yl; y < yr; y++)
{
int j = x + y * Asx;
if (crosscorrs[j] > cormx)
cormx = crosscorrs[cormxat = j];
}
int corx = cormxat % Asx;
int cory = cormxat / Asx;
// We dont want subimages that fall of the larger image
if (corx + Bsx > Asx) continue;
if (cory + Bsy > Asy) continue;
signed8 sad = 0;
for (int x = 0; sad < minsad && x < Bsx; x++)
for (int y = 0; y < Bsy; y++)
{
int j = (x + corx) + (y + cory) * Asx;
int i = x + y * Bsx;
sad += Math.Abs((int)Bsad[i] - (int)Asad[j]);
}
if (sad < minsad)
{
minsad = sad;
sadx = corx;
sady = cory;
// printf("* ");
}
// printf("Grid (%d,%d) (%d,%d) Sip=%g Sad=%lld\n",a,b,corx,cory,cormx,sad);
}
}
//Console.Write("{0:D}\t{1:D}\n", sadx, sady);
/**
* Aa, Ba, Af and Bf were allocated in this function
* crosscorrs was an alias for Aa and does not require deletion.
*/
fftw.free(Aa);
fftw.free(Ba);
fftw.free(Af);
fftw.free(Bf);
return new Point(sadx, sady);
}
private static void normalize(ref signed2[] img, int sx, int sy, int wx, int wy)
{
/**
* Calculate the mean background. We will subtract this
* from img to make sure that it has a mean of zero
* over a window size of wx x wy. Afterwards we calculate
* the square of the difference, which will then be used
* to normalize the local variance of img.
*/
signed2[] mean = boxaverage(img, sx, sy, wx, wy);
signed2[] sqr = new signed2[sx * sy];
for (int j = 0; j < sx * sy; j++)
{
img[j] -= mean[j];
signed2 v = img[j];
sqr[j] = (signed2)(v * v);
}
signed2[] var = boxaverage(sqr, sx, sy, wx, wy);
/**
* The normalization process. Currenlty still
* calculated as doubles. Could probably be fixed
* to integers too. The only problem is the sqrt
*/
for (int j = 0; j < sx * sy; j++)
{
double v = Math.Sqrt(Math.Abs((double)var[j]));//#double v = sqrt(fabs(var[j])); <- ambigous
Debug.Assert(!double.IsInfinity(v) && v >= 0);
if (v < 0.0001) v = 0.0001;
img[j] = (signed2)(img[j] * (32 / v));
if (img[j] > 127) img[j] = 127;
if (img[j] < -127) img[j] = -127;
}
/**
* As a last step in the normalization we
* window the sub image around the borders
* to become 0
*/
window(ref img, sx, sy, wx, wy);
}
private static signed2[] boxaverage(signed2[] input, int sx, int sy, int wx, int wy)
{
signed2[] horizontalmean = new signed2[sx * sy];
Debug.Assert(horizontalmean != null);
int wx2 = wx / 2;
int wy2 = wy / 2;
int from = (sy - 1) * sx;
int to = (sy - 1) * sx;
int initcount = wx - wx2;
if (sx < initcount) initcount = sx;
int xli = -wx2;
int xri = wx - wx2;
for (; from >= 0; from -= sx, to -= sx)
{
signed8 sum = 0;
int count = initcount;
for (int c = 0; c < count; c++)
sum += (signed8)input[c + from];
horizontalmean[to] = (signed2)(sum / count);
int xl = xli, x = 1, xr = xri;
/**
* The area where the window is slightly outside the
* left boundary. Beware: the right bnoundary could be
* outside on the other side already
*/
for (; x < sx; x++, xl++, xr++)
{
if (xl >= 0) break;
if (xr < sx)
{
sum += (signed8)input[xr + from];
count++;
}
horizontalmean[x + to] = (signed2)(sum / count);
}
/**
* both bounds of the sliding window
* are fully inside the images
*/
for (; xr < sx; x++, xl++, xr++)
{
sum -= (signed8)input[xl + from];
sum += (signed8)input[xr + from];
horizontalmean[x + to] = (signed2)(sum / count);
}
/**
* the right bound is falling of the page
*/
for (; x < sx; x++, xl++)
{
sum -= (signed8)input[xl + from];
count--;
horizontalmean[x + to] = (signed2)(sum / count);
}
}
/**
* The same process as aboe but for the vertical dimension now
*/
int ssy = (sy - 1) * sx + 1;
from = sx - 1;
signed2[] verticalmean = new signed2[sx * sy];
Debug.Assert(verticalmean != null);
to = sx - 1;
initcount = wy - wy2;
if (sy < initcount) initcount = sy;
int initstopat = initcount * sx;
int yli = -wy2 * sx;
int yri = (wy - wy2) * sx;
for (; from >= 0; from--, to--)
{
signed8 sum = 0;
int count = initcount;
for (int d = 0; d < initstopat; d += sx)
sum += (signed8)horizontalmean[d + from];
verticalmean[to] = (signed2)(sum / count);
int yl = yli, y = 1, yr = yri;
for (; y < ssy; y += sx, yl += sx, yr += sx)
{
if (yl >= 0) break;
if (yr < ssy)
{
sum += (signed8)horizontalmean[yr + from];
count++;
}
verticalmean[y + to] = (signed2)(sum / count);
}
for (; yr < ssy; y += sx, yl += sx, yr += sx)
{
sum -= (signed8)horizontalmean[yl + from];
sum += (signed8)horizontalmean[yr + from];
verticalmean[y + to] = (signed2)(sum / count);
}
for (; y < ssy; y += sx, yl += sx)
{
sum -= (signed8)horizontalmean[yl + from];
count--;
verticalmean[y + to] = (signed2)(sum / count);
}
}
return verticalmean;
}
private static void window(ref signed2[] img, int sx, int sy, int wx, int wy)
{
int wx2 = wx / 2;
int sxsy = sx * sy;
int sx1 = sx - 1;
for (int x = 0; x < wx2; x++)
for (int y = 0; y < sxsy; y += sx)
{
img[x + y] = (signed2)(img[x + y] * x / wx2);
img[sx1 - x + y] = (signed2)(img[sx1 - x + y] * x / wx2);
}
int wy2 = wy / 2;
int syb = (sy - 1) * sx;
int syt = 0;
for (int y = 0; y < wy2; y++)
{
for (int x = 0; x < sx; x++)
{
/**
* here we need to recalculate the stuff (*y/wy2)
* to preserve the discrete nature of integers.
*/
img[x + syt] = (signed2)(img[x + syt] * y / wy2);
img[x + syb] = (signed2)(img[x + syb] * y / wy2);
}
/**
* The next row for the top rows
* The previous row for the bottom rows
*/
syt += sx;
syb -= sx;
}
}
private static signed2[] read_image(string filename, ref int sx, ref int sy)
{
Bitmap image = new Bitmap(filename);
sx = image.Width;
sy = image.Height;
signed2[] GreyImage = new signed2[sx * sy];
BitmapData bitmapData1 = image.LockBits(new Rectangle(0, 0, image.Width, image.Height), ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
unsafe
{
byte* imagePointer = (byte*)bitmapData1.Scan0;
for (int y = 0; y < bitmapData1.Height; y++)
{
for (int x = 0; x < bitmapData1.Width; x++)
{
GreyImage[x + y * sx] = (signed2)((imagePointer[0] + imagePointer[1] + imagePointer[2]) / 3.0);
//4 bytes per pixel
imagePointer += 4;
}//end for x
//4 bytes per pixel
imagePointer += bitmapData1.Stride - (bitmapData1.Width * 4);
}//end for y
}//end unsafe
image.UnlockBits(bitmapData1);
return GreyImage;
}
}
Your don't need fuzzy as in "neural network" because (as I understand) you don't have rotation, tilts or similar. If OS display differences are the only modifications the difference should be minimal.
So WernerVanBelle's paper is nice but not really necessary and MrFooz's code works - but is terribly innefficient (O(width * height * patter_width * pattern_height)!).
The best algorithm I can think of is the Boyer-Moore algorithm for string searching, modified to allow 2 dimensional searches.
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
Instead of one displacement you will have to store a pair of displacements dx and dy for each color. When checking a pixel you move only in the x direction x = x + dx and store only the minimum of the dy's DY = min(DY, dy) to set the new y value after a whole line has been tested (ie x > width).
Creating a table for all possible colors probably is prohibitve due to the imense number of possible colors, so either use a map to store the rules (and default to the pattern dimensions if a color is not inside the map) or create tables for each color seperately and set dx = max(dx(red), dx(green), dx(blue)) - which is only an approximation but removes the overhead of a map.
In the preprocessing of the bad-character rule, you can account for small deviations of colors by spreading rules from all colors to their "neighbouring" colors (however you wish to define neighbouring).