I am attempting (and failing at) locating area's containing text from a larger image. Specifically I am looking to recognize titles of Magic cards. At the moment I have managed to cut the images down to blocks containing the title, such as
input image.
Despite this and even with training the ocr library to work with only with this font accuracy is still low. As far as I can tell the best thing I can do is crop the image to only the text. After research I still have been unable to do so. I attempted to implement the solutions presented in Extracting text OpenCV however the text is too close to the border for this to work. attempt image. If possible help in the form of java would be greatly appreciated. (sorry for the image links, I don't have the reputation to embed images)
Posting answer as suggested.
This answer relies on the text always being close to the same distance/offset away from the border.
Find the boundings of the border using Canny/Hough etc, and with whatever filtering techniques works best with your images (erosion, dilution, sharpen, grayscale, binary thresholding, etc).
Then take a smaller interior submat() of this border bounding Rect to get an approximation of where the text should be and run the ocr on this submat.
Related
Lets say I use a black key jpg image as my layout in Java Code. Is there an easy way to get the region out of an image so I can use region.Contains() for my onTouchListener?
I've looked for them and there isn't any. I ask this because Im making a piano app and would like to get the regions of the images I use for the black/white keys.
Rather than use an image, try rendering the piano keys yourself, and check if input is in the black key regions you render, rather than using an image.
If you really want to use the image, define the black regions within the image by hand.
Detecting regions in an image is a difficult problem, one you probably don't want to get into for such a simple purpose. If you had many images, or had to do this frequently it might be worth the headache but as you presumably don't... don't worry about it.
I have searched over bunch of sites, and I was unable to find solution for my problem.
This is the problem:
I am making PDF's in Java using iText library.
Everything works fine except one thing.
Transparent PNG images have black/gray border around non-transparent area.
I didn't set any borders in code, and actually I have tried to remove them (with no luck).
Can someone help me how to solve this problem?
The closest answer what I have found is: Resizing an image in asp.net without losing the image quality
But I cannot (don't know) interpret this code in Java.
My code is pretty big to copy/paste, but these are steps:
create document
load image from given path
manipulate image (resize, rotate, positioning)
add image to current page
save pdf file
This is what I have tried also:
http://itext-general.2136553.n4.nabble.com/template/NamlServlet.jtp?macro=print_post&node=2157267
http://itext-general.2136553.n4.nabble.com/template/NamlServlet.jtp?macro=print_post&node=2330200
I have tried more than those 2, but I didn't bookmarked them (none of them worked)
Thanks in advance
UPDATE: I forgot to mention that my original pictures don't have border. Border is created somehow by iText. I initially thought that it was bug, but since iText 5.0.2 this problem remained so now I doubt that is bug (I am currently using 5.1.3).
UPDATE 2 I forgot to add this link: http://itext-general.2136553.n4.nabble.com/template/NamlServlet.jtp?macro=print_post&node=2157261
Here is presented VB script that works, but I cannot convert to Java code (it still draws black border), so can someone help me at least with this to convert good?
You could use the java BufferedImage method, getSubImage(x, y, w, h) which allows you to crop a sub image out of an existing image. That way you could cut out the edges.
See here: Class BufferedImage
I need to to clip variablesized images into puzzle shaped pices like this(not squares): http://www.fernando.com.ar/jquery-puzzle/
I have considered the posibility of doing this with a php library like Cairo or GD, but have little to no experience with these librays, and see no immidiate soulution for creating a clipping mask dynamicaly scalable for different sized images.
I'm looking for guidance/tips on which serverside programing language to use to accomplish this task, and preferably an approach to this problem.
You can create an image using GD with the size of the puzzle piece. and then copy the full image on that image with the right cropping to get the right part of the image.
Then you can just dynamically color in every part of the piece you want to remove with a distinct color (eg #0f0) and then use imagecolorallocatealpha to make that color transparent. Do it for each piece and you have your server side image pieces.
However, if I where you I would create the clipping mask of each puzzle peace in advance in the distinct color. That would make two images per connection (one with the "circle" connecter sticking out and one where this circle connector fits into). That way you can just copy these masks onto the image to create nice edges quickly.
GD is quite complicated, I've heard very good things about Image Magick for which there is a PHP version and lots of documentation on php.net. However, not all web servers would have this installed by default.
http://www.php.net/manual/en/book.imagick.php
If you choose to do it using PHP with GD then the code here may help:
http://php.amnuts.com/index.php?do=view&id=15&file=class.imagemask.php
Essentially what you need to do with GD is to start with a mask at a particular size and then use the imagecopyresampled function to copy the mask image resource to a larger or smaller size. To see what I mean, check out the _getMaskImage method class shown at the url above. A working example of the output can be seen at:
http://php.amnuts.com/demos/image-mask/
The problem with doing it via GD, as far as I can tell, is that you need to do it a pixel at a time if you want to achieve varying opacity levels, so processing a large image could take a few seconds. With ImageMagick this may not be the case.
My question is similar to this one, but is more specific in scope.
In my card game application, I would like for users to be able to click on words located in a scanned jpeg image. Please see this sample Pokemon trading card.
In this case, the user should be able to hover his mouse over the text "Scratch", upon which a pulsing rectangular border will appear around the text, indicating that it is clickable. The problem is how to detect the border of the text. There will be an array of words KNOWN BEFOREHAND that the user may click on (these will be retrieved from a database on a card-by-card basis). To continue our example, the array in this case will be ["Scratch", "Live Coal"]. Once the user clicks on "Scratch", the application must know via a call-back that "Scratch" was chosen instead of "Live Coal".
I was thinking of using optical character recognition libraries to solve this problem, but the open-source options for this are poor in quality (e.g. GOCR) and/or not well-tested on multiple platforms (e.g. Tesseract). I only care about Windows and Mac compatibility. Am I missing an obvious/simpler solution/algorithm that does not require OCR? I cannot simply hand-code in bounding boxes for each card, as there will be thousands of scanned cards in my database. The user may also upload his own custom card scans with an accompanying array of clickable text.
Text color is not always black. See this panorama of different card and text styles that will be permitted. The black cards have white text, and the third-to-last card (Zekrom) has black text with a white outline.
Solutions in any programming language are appreciated. However, please note that I am looking for open-source algorithms and/or libraries. If there is a solution in Ruby or Java, even better, as my code is primarily in these two languages.
EDIT: I forgot to mention that the order of the words/phrases in the array will be the same as on the card. Thus, the array will be ["Scratch", "Live Coal"] instead of ["Live Coal", "Scratch"]. I am mentioning this because it can potentially simplify the task. Thus, for this example, I can simply look for black pixels (though I have to watch out for the black star in the white circle). However, there will be more difficult cases where there is descriptive text under the attack name in a smaller font (again, see the panorama for examples).
I would just write a program that allows you to visually draw a bounding box around your text for simplicity but could could do this buy detecting differences in pixel color. Since the text is black you could see where the upper-left most black pixel is without large indents and within the bottom half of the card.
When the cursor is stationary, check if there is a black pixel either underneath or to 4 pixels around the cursor. If it is, check the first three consecutive (because there still might be a non-black pixel between the letters) non-black pixels to the left of the cursor, to the right, to the top and at the bottom. If yes, use these locations to draw a square. You can use OpenCV.
Given:
two images of the same subject matter;
the images have the same resolution, colour depth, and file format;
the images differ in size and rotation; and
two lists of (x, y) co-ordinates that correlate the images.
I would like to know:
How do you transform the larger image so that it visually aligns to the second image?
(Optional.) What are the minimum number of points needed to get an accurate transformation?
(Optional.) How far apart do the points need to be to get an accurate transformation?
The transformation would need to rotate, scale, and possibly shear the larger image. Essentially, I want to create (or find) a program that does the following:
Input two images (e.g., TIFFs).
Click several anchor points on the small image.
Click the several corresponding anchor points on the large image.
Transform the large image such that it maps to the small image by aligning the anchor points.
This would help align pictures of the same stellar object. (For example, a hand-drawn picture from 1855 mapped to a photograph taken by Hubble in 2000.)
Many thanks in advance for any algorithms (preferably Java or similar pseudo-code), ideas or links to related open-source software packages.
This is called Image Registration.
Mathworks discusses this, Matlab has this ability, and more information is in the Elastix Manual.
Consider:
Open source Matlab equivalents
IRTK
IRAF
Hugin
you can use the javax.imageio or Java Advanced Imaging api's for rotating, shearing and scaling the images once you found out what you want to do with them.
For a C++ implementation (without GUI), try the old KLT (Kanade-Lucas-Tomasi) tracker.
http://www.ces.clemson.edu/~stb/klt/