My question is similar to this one, but is more specific in scope.
In my card game application, I would like for users to be able to click on words located in a scanned jpeg image. Please see this sample Pokemon trading card.
In this case, the user should be able to hover his mouse over the text "Scratch", upon which a pulsing rectangular border will appear around the text, indicating that it is clickable. The problem is how to detect the border of the text. There will be an array of words KNOWN BEFOREHAND that the user may click on (these will be retrieved from a database on a card-by-card basis). To continue our example, the array in this case will be ["Scratch", "Live Coal"]. Once the user clicks on "Scratch", the application must know via a call-back that "Scratch" was chosen instead of "Live Coal".
I was thinking of using optical character recognition libraries to solve this problem, but the open-source options for this are poor in quality (e.g. GOCR) and/or not well-tested on multiple platforms (e.g. Tesseract). I only care about Windows and Mac compatibility. Am I missing an obvious/simpler solution/algorithm that does not require OCR? I cannot simply hand-code in bounding boxes for each card, as there will be thousands of scanned cards in my database. The user may also upload his own custom card scans with an accompanying array of clickable text.
Text color is not always black. See this panorama of different card and text styles that will be permitted. The black cards have white text, and the third-to-last card (Zekrom) has black text with a white outline.
Solutions in any programming language are appreciated. However, please note that I am looking for open-source algorithms and/or libraries. If there is a solution in Ruby or Java, even better, as my code is primarily in these two languages.
EDIT: I forgot to mention that the order of the words/phrases in the array will be the same as on the card. Thus, the array will be ["Scratch", "Live Coal"] instead of ["Live Coal", "Scratch"]. I am mentioning this because it can potentially simplify the task. Thus, for this example, I can simply look for black pixels (though I have to watch out for the black star in the white circle). However, there will be more difficult cases where there is descriptive text under the attack name in a smaller font (again, see the panorama for examples).
For simplicity, I would just write a program that lets you visually draw a bounding box around your text, but you could also do this by detecting differences in pixel color. Since the text is black, you could look for the upper-leftmost black pixel that lies within the bottom half of the card and is not indented far from the left edge.
When the cursor is stationary, check whether there is a black pixel either directly underneath it or within 4 pixels of it. If there is, scan outward from the cursor to the left, right, top, and bottom until you reach the first three consecutive non-black pixels in each direction (three, because a single non-black pixel may simply be a gap between letters). Use those four locations to draw a rectangle. You can use OpenCV for this.
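The idea above can be sketched in plain Java without OpenCV. This is only an illustration under stated assumptions: the class `TextBoxFinder`, its method names, and the `gap` and `threshold` parameters are hypothetical, "black" is taken to mean average RGB brightness below `threshold`, and a single `gap` value stands in for both the search radius around the cursor and the run of blank pixels that ends the text. Instead of scanning four single lines, it grows a box until a blank margin of `gap` pixels surrounds it.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

final class TextBoxFinder {
    /** True if the average of the R, G and B channels is below the threshold. */
    static boolean isDark(BufferedImage img, int x, int y, int threshold) {
        int rgb = img.getRGB(x, y);
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (r + g + b) / 3 < threshold;
    }

    /** Returns {x, y} of the first dark pixel in the region (clipped to the image), or null. */
    static int[] firstDark(BufferedImage img, int x0, int y0, int x1, int y1, int threshold) {
        x0 = Math.max(0, x0);
        y0 = Math.max(0, y0);
        x1 = Math.min(img.getWidth() - 1, x1);
        y1 = Math.min(img.getHeight() - 1, y1);
        for (int y = y0; y <= y1; y++)
            for (int x = x0; x <= x1; x++)
                if (isDark(img, x, y, threshold)) return new int[] {x, y};
        return null;
    }

    /**
     * Starts from a dark pixel within `gap` pixels of the cursor and grows a box
     * one pixel at a time until the `gap`-wide margin around it has no dark pixels.
     * Returns null if there is no dark pixel near the cursor.
     */
    static Rectangle findBox(BufferedImage img, int cx, int cy, int gap, int threshold) {
        int[] seed = firstDark(img, cx - gap, cy - gap, cx + gap, cy + gap, threshold);
        if (seed == null) return null;
        int left = seed[0], right = seed[0], top = seed[1], bottom = seed[1];
        boolean grew = true;
        while (grew) {
            grew = false;
            if (firstDark(img, left - gap, top - gap, left - 1, bottom + gap, threshold) != null) {
                left--; grew = true;
            }
            if (firstDark(img, right + 1, top - gap, right + gap, bottom + gap, threshold) != null) {
                right++; grew = true;
            }
            if (firstDark(img, left - gap, top - gap, right + gap, top - 1, threshold) != null) {
                top--; grew = true;
            }
            if (firstDark(img, left - gap, bottom + 1, right + gap, bottom + gap, threshold) != null) {
                bottom++; grew = true;
            }
        }
        return new Rectangle(left, top, right - left + 1, bottom - top + 1);
    }
}
```

A larger `gap` bridges the spaces between letters and words, but it may also merge the attack name with nearby icons or descriptive text, so the value would need tuning per card style.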
Introduction
The title is a bit complicated so let's break it down:
I have an image submitted by a user
The image is a top view of a landscape featuring clearly marked regions. For example, if this were a park, the image would be a top view of the park's layout.
I need to allow the user to classify different elements in the image and estimate the area occupied by those elements. Continuing with the park analogy; the park may have two pavilions and a sand volleyball court. I must allow the user to mark the points of interest (let's say the volleyball court) and compute their area (given the overall dimensions of the depicted park)
Current Ideas
I think I should create a buffered image and use that as the background of a canvas.
I'm not sure about the user input. My first idea was to have the users drag rectangles associated with a specific feature (e.g. a red rectangle for volleyball courts) onto the matching region of the image. Rectangles work because the elements are mostly rectangular, but I don't know whether users can resize rectangles.
To reiterate, the main problem is determining the area occupied by physical structures in a given image. No machine vision, just plain old Mouse Events.
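Once a rectangle has been placed, the area itself is a matter of scaling pixels to real-world units. A minimal sketch, assuming the image covers exactly the stated overall dimensions with no perspective distortion; the class `RegionArea` and its parameter names are hypothetical.

```java
final class RegionArea {
    /**
     * Converts a rectangle measured in image pixels to a real-world area.
     * realWidth and realHeight are the dimensions of the whole depicted site,
     * in whatever unit the caller uses (for example meters).
     */
    static double realArea(int rectWidthPx, int rectHeightPx,
                           int imageWidthPx, int imageHeightPx,
                           double realWidth, double realHeight) {
        double unitsPerPixelX = realWidth / imageWidthPx;
        double unitsPerPixelY = realHeight / imageHeightPx;
        return (rectWidthPx * unitsPerPixelX) * (rectHeightPx * unitsPerPixelY);
    }
}
```

For example, a 100 x 50 pixel rectangle on a 1000 x 500 pixel image of a 200 m x 100 m park corresponds to 20 m x 10 m, so `realArea(100, 50, 1000, 500, 200.0, 100.0)` gives 200 square meters.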
How should I be approaching the user input dilemma? Any APIs I should be digging through?
Please let me know if I can improve the question and explanation.
A portion of my app involves the user drawing images that will be later strung together in a PDF. The user is free to use the entire screen to draw. Once the user is done drawing, I'd like to trim off all of the white space before adding the image to a PDF. This is where I am having problems. I've thought of two different methods to determine the location of trimmable white space and both seem clumsy.
My first thought was having the motion event of the stylus record if the event has gone outside of the box so far. If it has, I would expand the box to accommodate this. Unfortunately I could see polling every time there is a motion event being bad for performance. I can't just look at up and down events because the user could draw something like the letter V.
Then I thought I could look at all the pixels (using getPixel()) and see where the highest, lowest, rightmost and leftmost black pixels are. Again this seems like a really inefficient way to find the box. I'm sure I could skip some pixels to improve performance, but I can't skip too many.
Is there a standard way of doing what I want to do? I haven't been able to find anything.
Inside your editor, at the point where you record that a pixel has been drawn upon, you can also update the minimum and maximum X and Y, and then use them later to crop the image.
If the user is drawing, aren't you already handling the onTouchEvent callback in order to capture the drawing events? If so, it shouldn't be a big deal to keep a minX, maxX, minY and maxY and check each recorded drawing event against these values.
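A minimal sketch of that bookkeeping in plain Java follows. The class `DrawingBounds` is hypothetical and deliberately free of any Android types; the assumption is that `include` would be called with the coordinates of each recorded drawing event. Each call costs four comparisons, which should be negligible next to the drawing itself.

```java
final class DrawingBounds {
    private int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
    private int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;

    /** Expands the bounds so that they contain the point (x, y). */
    void include(int x, int y) {
        if (x < minX) minX = x;
        if (x > maxX) maxX = x;
        if (y < minY) minY = y;
        if (y > maxY) maxY = y;
    }

    /** True until at least one point has been included. */
    boolean isEmpty() { return minX > maxX; }

    int left()   { return minX; }
    int top()    { return minY; }
    int right()  { return maxX; }
    int bottom() { return maxY; }

    /** Width and height of the crop region in pixels, inclusive of both edges. */
    int width()  { return isEmpty() ? 0 : maxX - minX + 1; }
    int height() { return isEmpty() ? 0 : maxY - minY + 1; }
}
```

When cropping, it is probably worth padding the box by about half the stroke width, since the recorded points are stroke centers rather than the outermost painted pixels.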
A customer has asked me for a piece of software, and one of its requirements is to build a form and fill it with data collected from a database.
This form is currently being created in Excel. It uses cells to build the form, some cells have blank background, others blank background with black bottom border (to look like a line where text is typed), others have gray background with white text, and there's also a logo image. In Excel, some cells are merged to become bigger than other cells. They fill the text in another spreadsheet and the required cells in the form take that text and format it.
I've looked at many report frameworks in Java; some are very complex and some look like Excel's graph builders, but I saw none that can make a complex 2D form like this.
The data filled into it is simple (name, quantity, some numbers), but the fields have different lengths, requiring, for example, that the name's cell be merged to cover a full horizontal line, and some use a smaller font size. There's no repeated data that would require sorting, and I have no problem gathering the data.
In the end, the filled form must also be printed, so I can't use normal Swing table or grid. It will be used in Windows now, but it'd be nice to support Linux printing too.
Any suggestion of a Java component that builds a 2D layout like this and fills it with strings will be very much appreciated. I even thought of taking a screenshot of their current form and just use 2D Graphics to print the text, but I'd not be able to print it.
This is an example of the kind of form I must build, it's somewhat like that but some areas have gray background with white text:
No, it's not a duplicate, but it is a good example of the layout.
If I have an image of a table of boxes, with some coloured in, is there an image processing library that can help me turn this into an array?
Thanks
You can use a thresholding function to binarize the image into dark/light pixels so dark pixels are 0 and light ones are 1.
Then you would want to remove image artifacts using dilation and erosion functions to remove noise (all these are well defined on Wikipedia).
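As a rough illustration of the noise-removal step, here is a sketch of erosion and dilation with a 3x3 neighborhood on a binarized image. The class `Morphology` is hypothetical, and it assumes the image has already been converted to a `boolean[][]` mask in which `true` marks a dark pixel. Running erosion followed by dilation (an "opening") removes isolated specks while restoring larger regions to roughly their original size.

```java
final class Morphology {
    /** Keeps a pixel only if every pixel in its 3x3 neighborhood is set. */
    static boolean[][] erode(boolean[][] mask) { return apply(mask, true); }

    /** Sets a pixel if any pixel in its 3x3 neighborhood is set. */
    static boolean[][] dilate(boolean[][] mask) { return apply(mask, false); }

    /** Erosion followed by dilation: removes specks smaller than the neighborhood. */
    static boolean[][] open(boolean[][] mask) { return dilate(erode(mask)); }

    private static boolean[][] apply(boolean[][] mask, boolean requireAll) {
        int h = mask.length, w = mask[0].length;
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean all = true, any = false;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int ny = y + dy, nx = x + dx;
                        // Pixels outside the image count as unset.
                        boolean v = ny >= 0 && ny < h && nx >= 0 && nx < w && mask[ny][nx];
                        all &= v;
                        any |= v;
                    }
                }
                out[y][x] = requireAll ? all : any;
            }
        }
        return out;
    }
}
```

Because out-of-image pixels count as unset, erosion also trims regions that touch the image border; whether that matters depends on how tightly the scan is cropped.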
Finally, if you know where the boxes are, you can just read the value at the center of each box to determine the array value, or use an area near the center and take the prevailing value (i.e. more 0's means a filled-in square, more 1's means an empty square).
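The thresholding and sampling steps could look roughly like this. It is a sketch under strong assumptions: the class `BoxGridReader` is hypothetical, the image is assumed to be cropped to exactly the table, the boxes are assumed to form an evenly spaced grid of `rows` by `cols`, and a pixel counts as dark when its average RGB value is below `threshold`. Each box is decided by a majority vote over the central half of its cell, which avoids the grid lines.

```java
import java.awt.image.BufferedImage;

final class BoxGridReader {
    /** Returns filled[row][col] == true where the center of the cell is mostly dark. */
    static boolean[][] readGrid(BufferedImage img, int rows, int cols, int threshold) {
        boolean[][] filled = new boolean[rows][cols];
        double cellW = (double) img.getWidth() / cols;
        double cellH = (double) img.getHeight() / rows;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // Sample the central half of the cell; always at least one pixel.
                int x0 = (int) ((c + 0.25) * cellW);
                int x1 = Math.max(x0 + 1, (int) ((c + 0.75) * cellW));
                int y0 = (int) ((r + 0.25) * cellH);
                int y1 = Math.max(y0 + 1, (int) ((r + 0.75) * cellH));
                int dark = 0, total = 0;
                for (int y = y0; y < y1; y++) {
                    for (int x = x0; x < x1; x++) {
                        int rgb = img.getRGB(x, y);
                        int avg = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
                        if (avg < threshold) dark++;
                        total++;
                    }
                }
                filled[r][c] = dark * 2 > total; // prevailing value wins
            }
        }
        return filled;
    }
}
```

Colored (rather than black) fills may have a fairly high average brightness, so the threshold, or the brightness measure itself, would have to be chosen by inspecting real scans.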
If you are scanning these boxes and there is a lot of variation in the position of the boxes, you will have to perform some level of image registration using known points, or fiducials.
As for which tools to use, I'd recommend first trying this manually using a tool like ImageJ, which has a UI and can also be used programmatically since it is written entirely in Java.
Other good libraries for this include OpenCV and the Java Advanced Imaging API.
Your results will definitely vary depending on the input images and how consistently lit and positioned they are.
The best way to see how it will do for your data is to try applying these processing steps manually to see where your threshold value should be and how much dilating/eroding you need to get consistent results.
I'm working on a small project which requires changing the clothes (shirt, pants, etc.) of a person in any 2D image he chooses to upload. So somehow edges need to be detected, and the relevant areas are supposed to be filled with new patterns. I do see a lot of other complications, but let's assume only simple patterns have to be filled.
For a web application, is it possible to do it in HTML5? Any other alternatives?
For a standalone application, what kind of technology would be preferred, C++/Java?
Update
Based on Bart's comment:
Any useful pointer like Bart's would be really useful
Assumption: Clear traceable 'standing' human figure in 2d image
Since it's an image, there is no real-time scenario
Assumption: Clear traceable 'standing' human figure in 2d image
A way to do this is to require the user to take two pictures. One picture is the one with the user in it, the other picture must be taken in the same camera position and orientation, but the user steps out of the frame for that one.
Since both pictures will have the same background you can compare pixel by pixel between the two images and flag those pixels that have a difference over some threshold. Of course the threshold must be selected so that camera noise isn't detected as a difference. Once you have the collection of pixels that are different you can filter them and calculate an approximate silhouette for the user from the pixels on the edge.
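The pixel-by-pixel comparison might be sketched as follows. The class `BackgroundDiff` and its parameters are hypothetical; the sketch assumes both pictures have identical dimensions and flags a pixel when its largest per-channel difference exceeds `threshold`. Filtering the mask and tracing the silhouette are left out.

```java
import java.awt.image.BufferedImage;

final class BackgroundDiff {
    /** Returns mask[y][x] == true where the two images differ by more than the threshold. */
    static boolean[][] foregroundMask(BufferedImage withPerson,
                                      BufferedImage backgroundOnly,
                                      int threshold) {
        int w = withPerson.getWidth(), h = withPerson.getHeight();
        if (backgroundOnly.getWidth() != w || backgroundOnly.getHeight() != h) {
            throw new IllegalArgumentException("Images must have the same size");
        }
        boolean[][] mask = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int a = withPerson.getRGB(x, y);
                int b = backgroundOnly.getRGB(x, y);
                int dr = Math.abs(((a >> 16) & 0xFF) - ((b >> 16) & 0xFF));
                int dg = Math.abs(((a >> 8) & 0xFF) - ((b >> 8) & 0xFF));
                int db = Math.abs((a & 0xFF) - (b & 0xFF));
                // Small differences are treated as camera noise.
                mask[y][x] = Math.max(dr, Math.max(dg, db)) > threshold;
            }
        }
        return mask;
    }
}
```

A suitable threshold could be estimated by taking two background-only shots and measuring how much they differ from each other.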
A simplification of the above method can be done if you have control over the background. You could use a bluescreen to avoid having to have a second picture with the background.