Handwriting signature detection - Java

I'm trying to find out whether a scanned PDF form contains a signature (like making sure a check is signed).
The problem domain:
I will be receiving document packages (multi-page PDFs with multiple forms). I have already put together document-package classifiers that check the package for all documents and scale the images to a common size. After that I know where the signatures should be and can scan that area of the document specifically. What I'm looking for is the best approach to making sure a signature is present. I've considered just checking for a base threshold of dark pixels, but that seems clumsy. The trouble with signatures is that they are not really writing, more of a personal mark.
The only thing I can come up with is a machine-learning method to look for loopiness? But I'm not all that familiar with machine learning and don't even know where to start with something like that. Any suggestions for practical approaches would be very appreciated.
I'm coding this in Java, if that's helpful at all.

What you asked is very broad, so there isn't a lot of specific information we can give you. However, I can point you to some helpful links:
http://java-ml.sourceforge.net/ -- a library you can download that has lots of useful algorithms and other code to include in your program
https://www.youtube.com/playlist?list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU -- a series that explains neural networks (something you might want to look into for your machine learning)
A big tip for your algorithm: instead of looking at exactly how long all of the loops and strokes are, look at their relative distances.
"Relative distances from what?" you say. Well this is where the next tip comes in handy: instead of keeping track of the lines, keep track of the tips of the loops and the order of these points. If you then take the distance between all of them (relatively of course which means to set one of the lengths to zero). Along to keeping track of the distances, you should also keep track of the angles. You would calculate the angle ABC by taking the distance between (A,B), (B,C), and (A,C) (A,B, and C being coordinates on the xy plane) which creates a triangle between the points which allows you to use trigonometry to calculate the angle.
(I am assuming for all of this that you are also trying to detect whose signature it is, because it doesn't really complicate things much at all.) When trying to match the detected signature against stored signatures to see if they are the "same," don't require the distances and angles to match exactly. Give a margin of error (say, a percentage range above and below). Here is a tip: make the margin of error rather large, so that a poorly written signature will still be detected. This raises the chances of more than one stored signature being matched. Luckily, there is a simple solution: run the algorithm again on the signatures that were matched, but with a smaller margin of error (you of course don't do this manually, the program does it). Continue decreasing the margin of error until only one signature remains.
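A sketch of that narrowing loop, assuming a matching predicate you would supply yourself (the names and tolerance values are illustrative only):

import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Repeatedly re-filter the candidate signatures with a shrinking tolerance
// until at most one remains. matchesAtTolerance encapsulates the
// distance/angle comparison described above.
static <T> List<T> narrow(List<T> candidates, BiPredicate<T, Double> matchesAtTolerance) {
    List<T> remaining = new ArrayList<>(candidates);
    double tolerance = 0.5;   // start with a generous 50% margin of error
    while (remaining.size() > 1 && tolerance > 0.01) {
        final double t = tolerance;
        remaining.removeIf(c -> !matchesAtTolerance.test(c, t));
        tolerance *= 0.8;     // tighten the margin and run again
    }
    return remaining;
}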
I am hoping you already have ideas for detecting where the actual signature is, but do check for the difference in darkness of the pixels, and make sure it is fairly continuous. Also note that signatures are commonly signed in black or blue, and sometimes in red or other fancy colors.

Related

Video and audio analysis in Java

I would like to do some audio and video analysis in Java.
In a bit more detail, I would like to identify the points in audio/video that have either been monotonous for quite some time or have drastically changed compared to some previous state.
If you want to look at it in a mathematical way, I can try to explain it like this:
Example:
You have an audio file. You should extract the waveform of that audio file. You could try to approximate that waveform with some simpler function that can be expressed as a closed formula. Let's call that function f(t).
Now, to find out how your function behaves (is it increasing or decreasing) at some point or interval, I guess I could use the first derivative, f'(t). If I'd like even more information, I assume the second derivative, f''(t), would also come in handy.
So, if we assume we can do that then I guess I'd have 1 piece of information about the audio.
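To make the derivative idea concrete on sampled audio, here is a sketch assuming the waveform has already been decoded into an array of samples:

// Approximate f'(t) for uniformly sampled audio via central differences;
// samples[i] is the amplitude at time i / sampleRate.
static double[] derivative(double[] samples, double sampleRate) {
    double[] d = new double[samples.length];
    double dt = 1.0 / sampleRate;
    for (int i = 1; i < samples.length - 1; i++) {
        d[i] = (samples[i + 1] - samples[i - 1]) / (2 * dt);
    }
    if (samples.length > 1) { // one-sided differences at the ends
        d[0] = (samples[1] - samples[0]) / dt;
        d[samples.length - 1] = (samples[samples.length - 1] - samples[samples.length - 2]) / dt;
    }
    return d;
}

Applying the same differencing to the result gives an approximation of f''(t).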
However, if I'm not mistaken, audio files also have spectrograms, so I'm unsure how those fit into all of this.
So, the real question goes here: Is there a way to do this in Java (efficiently)? I've done some digging and found MusicG; however, the last update was in July 2012, which leads me to believe it may be abandoned.
The second part refers to video files, but without their audio component.
This is where I'll have more questions, so I'm just gonna go and shoot them.
How do you identify points of change in "pace" in videos?
Here's an example:
Imagine the video shows a car driver's point of view while he's driving on a long, straight road. Since the surroundings are mostly the same, the pace could be described as "not changing much". At one point, the road begins to curve, but the driver, due to him falling asleep, is not following the road that precisely, so the surroundings start to change somewhat, and so does the pace. At the apex of that curve there is a tree, which grows bigger and bigger as the car approaches it. Here, the POV (and the pace) is changing quite a lot, since the tree is getting bigger and bigger. In the end, the car crashes into the tree, all hell breaks loose, the car starts to roll uncontrollably, which indicates a really intense pace.
I'm assuming one way could be to do image segmentation and somehow determine which portions of the frames are changing, and how big those portions are, to try to determine the pace, but I'd like additional input.
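As a crude starting point for quantifying change between frames (a plain-BufferedImage sketch, not a real segmentation):

import java.awt.image.BufferedImage;

// Mean absolute per-channel difference between two same-sized consecutive
// frames. Spikes in this score over time suggest a change of "pace"; real
// footage would need smoothing and region-level analysis on top of this.
static double frameDifference(BufferedImage prev, BufferedImage curr) {
    long total = 0;
    for (int y = 0; y < prev.getHeight(); y++) {
        for (int x = 0; x < prev.getWidth(); x++) {
            int a = prev.getRGB(x, y), b = curr.getRGB(x, y);
            total += Math.abs(((a >> 16) & 0xFF) - ((b >> 16) & 0xFF))
                   + Math.abs(((a >> 8) & 0xFF) - ((b >> 8) & 0xFF))
                   + Math.abs((a & 0xFF) - (b & 0xFF));
        }
    }
    return total / (3.0 * prev.getWidth() * prev.getHeight());
}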
If anyone has had prior experience doing any sort of related work in Java, what approaches did you explore and/or use? One thing that immediately comes to my mind is JavaCV, but as I said, with my limited experience, I'm unsure what to actually try.

Detecting an object (words) in an image

I want to implement object detection for the city name on a license plate. I have an image:
and I want to detect if the image contains the word "بابل":
I have tried a template matching method using OpenCV and also MATLAB, but the results are poor when tested with other images.
I have also read this page, but I was not able to get a good understanding of what to do from that.
Can anyone help me or give me a step by step way to solve that?
I have a project to recognize license plates; we can already detect and recognize the numbers, but I also need to detect and recognize the words (the same words appear across many cars).
Your question is very broad, but I will do my best to explain optical character recognition (OCR) in a programmatic context and give you the general project workflow that successful OCR implementations follow.
The problem you face is easier than most, because instead of having to recognize/differentiate between different characters, you only have to recognize a single image (assuming this is the only city you want to recognize). You are, however, subject to many of the limitations of any image recognition algorithm (quality, lighting, image variation).
Things you need to do:
1) Image isolation
You'll have to isolate your image from a noisy background:
I think that the best isolation technique would be to first isolate the license plate, and then isolate the specific characters you're looking for. Important things to keep in mind during this step:
Does the license plate always appear in the same place on the car?
Are cars always in the same position when the image is taken?
Is the word you are looking for always in the same spot on the license plate?
The difficulty/implementation of the task depends greatly on the answers to these three questions.
2) Image capture/preprocessing
This is a very important step for your particular implementation. Although possible, it is highly unlikely that your image will look like this:
as your camera would have to be directly in front of the license plate. More likely, your image may look like one of these:
depending on the perspective the image is taken from. Ideally, all of your images will be taken from the same vantage point, and you'll simply be able to apply a single transform so that they all look similar (or not apply one at all). If you have photos taken from different vantage points, you need to transform them, or else you will be comparing two different images. Also, especially if you take images from only one vantage point and decide not to apply a transform, make sure the text your algorithm is looking for is rendered from the same point of view. If you don't, you'll have a not-so-great success rate that's difficult to debug.
3) Image optimization
You'll probably want to (a) convert your images to black and white and (b) reduce their noise. These two processes are called binarization and despeckling, respectively. There are many implementations of these algorithms available in many different languages, most discoverable by a Google search. You can batch-process your images using any language/free tool you want, or find an implementation that works with whatever language you decide to work in.
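For illustration, a minimal global-threshold binarization in Java (the threshold is a tunable assumption; adaptive methods such as Otsu's usually do better on uneven lighting):

import java.awt.image.BufferedImage;

// Pixels darker than the threshold become black, everything else white.
static BufferedImage binarize(BufferedImage src, int threshold) {
    BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(),
            BufferedImage.TYPE_BYTE_BINARY);
    for (int y = 0; y < src.getHeight(); y++) {
        for (int x = 0; x < src.getWidth(); x++) {
            int rgb = src.getRGB(x, y);
            int gray = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
            out.setRGB(x, y, gray < threshold ? 0xFF000000 : 0xFFFFFFFF);
        }
    }
    return out;
}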
4) Pattern recognition
If you only want to search for the name of this one city (only one word, ever), you'll most likely want to implement a matrix matching strategy. Many people also refer to matrix matching as pattern recognition, so you may have heard the term in this context before. Here is an excellent paper detailing an algorithmic implementation that should help you immensely, should you choose to use matrix matching. The other approach is feature extraction, which attempts to identify words based on patterns within letters (i.e. loops, curves, lines). You might use this if the font of the word on the license plate ever changes, but if the same font will always be used, I think matrix matching will give the best results.
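A naive sketch of matrix matching, assuming both images have already been binarized and scaled to the same dimensions (the acceptance cutoff is my own guess):

// Fraction of pixels that agree between a candidate region and a template
// of the city name; accept the match if the score is high enough.
static double matchScore(boolean[][] candidate, boolean[][] template) {
    int agree = 0, total = 0;
    for (int y = 0; y < template.length; y++) {
        for (int x = 0; x < template[y].length; x++) {
            if (candidate[y][x] == template[y][x]) agree++;
            total++;
        }
    }
    return (double) agree / total; // e.g., accept above ~0.9
}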
5) Algorithm training
Depending on the approach you take (if you use a learning algorithm), you may need to train your algorithm with tagged data. What this means is that you have a series of images that you've labeled as True (contains the city name) or False (does not). Here's a pseudocode example of how this works:
train = [(img1, True), (img2, True), (img3, False), (img4, False)]
img_recognizer = algorithm(train)
Then, you apply your trained algorithm to identify untagged images.
test_untagged = [img5, img6, img7]
for image in test_untagged:
    img_recognizer(image)
Your training sets should be much larger than four data points; in general, the bigger the better. Just make sure, as I said before, that all the images have had the same transformation applied.
Here is a very, very high-level code flow that may be helpful in implementing your algorithm:
img_in = capture_image()
cropped_img = isolate(img_in)
scaled_img = normalize_scale(cropped_img)
img_desp = despeckle(scaled_img)
img_final = binarize(img_desp)
#train
match = train_match(training_set)
boolCity = match(img_final)
The processes above have been implemented many times and are thoroughly documented in many languages. Below are some implementations in the languages tagged in your question.
Pure Java
cvBlob in OpenCV (check out this tutorial and this blog post too)
tesseract-ocr in C++
Matlab OCR
Good luck!
If you are asking "I want to detect if the image contains the word بابل", this is a classic problem, which can be solved using a classifier like the one described at http://code.opencv.org/projects/opencv/wiki/FaceDetection.
But I assume you still want more. Years ago I tried to solve similar problems, and I provide an example image to show how good/bad it was:
To detect the licence plate I used very basic rectangle detection, which is included in every OpenCV samples folder, and then used a perspective transform to fix the layout and size. It was important to implement multiple checks to see whether a rectangle looked good enough to be a licence plate. For example, if a rectangle is 500 px tall and 2 px wide, it is probably not what I want, and it was rejected.
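A sketch of such a plausibility check (the bounds below are illustrative guesses, not values from this answer):

// Reject candidate rectangles whose shape could not plausibly be a plate.
static boolean looksLikePlate(int width, int height) {
    if (width <= 0 || height <= 0) return false;
    double aspect = (double) width / height;
    return aspect > 2.0 && aspect < 8.0   // plates are wide and short
        && width > 60 && height > 15;     // and not vanishingly small
}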
Use https://code.google.com/p/cvblob/ to extract the Arabic text and other components on the detected plate. I had a similar need just yesterday on another project, where I had to extract Japanese kanji symbols from a page:
CvBlob does a lot of work for you.
As the next step, use the technique explained at http://blog.damiles.com/2008/11/basic-ocr-in-opencv/ to match the city name. Just train the algorithm with example images of different city names, and soon it will recognize 99% of them out of the box. I have used similar approaches on different projects and am quite sure they work.

Java library or strategy to tell if an image contains a signature

I have an image file, and I need to determine if a specified area of this image contains a signature. Or to put it in end-user terms, "Has this document been signed?"
What I have done so far is to examine all the pixels contained in the area, to calculate an average "darkness", and compare that to a reference value. If the difference in darkness exceeds some threshold, then I consider it signed.
The problem with this (admittedly simplistic) approach is that, because the pixels of the signature itself are such a small fraction of the area, I have to use a very low darkness threshold, which results in a large number of false positives. I can't distinguish a real signature from stray markings, smudges, fax artifacts, etc.
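For reference, a minimal sketch of this style of check (counting dark pixels rather than averaging; both threshold parameters are placeholders to be tuned):

import java.awt.image.BufferedImage;

// Fraction of "dark" pixels in the signature area, compared to a cutoff.
static boolean looksSigned(BufferedImage img, int x0, int y0, int w, int h,
                           int darkThreshold, double minDarkFraction) {
    int dark = 0;
    for (int y = y0; y < y0 + h; y++) {
        for (int x = x0; x < x0 + w; x++) {
            int rgb = img.getRGB(x, y);
            int gray = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
            if (gray < darkThreshold) dark++;
        }
    }
    return dark / (double) (w * h) >= minDarkFraction;
}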
To be clear...I'm not trying to match any specific signature or set of signatures. That is, I don't care who signed it, only whether it is signed.
Is anyone aware of a Java library that can do this, or of a better approach to this problem than what I am currently doing?
EDIT:
This is an example of the kinds of images I am working with. This document would be faxed to the recipient, signed and faxed back. It won't be this clean-looking by the time I need to look for a signature.
I do not know of any simple solutions. You could wrap queXF or write something similar in Java. This paper talks about a color-code algorithm for recognizing signatures.
Here is what I believe can be done (although not a very good solution, it may still work). It would involve a bit of machine learning. I am assuming that your image does not contain handwritten text and is just an image.
The first thing to do would be to create a dataset of images which contain a signature and images which do not. The positive samples should contain only signatures (you can learn a classifier for multiple aspect ratios), and the negative samples should contain random images of the same aspect ratio/dimensions. You can then compute some feature over these samples (HoG can be used as a feature, although I do not claim it is the best one for this application) and learn an SVM for each aspect ratio.
The next step would be to slide a detection window (of different aspect ratios) across the image, apply the multiple SVMs you have learnt, and check whether any of them gives a positive response.
Although this approach may not always work, it should give a decent amount of accuracy. The more data you use for training, the better the results will get (and if you can come up with a good feature vector to represent a signature, it will help your case even further).
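A skeleton of that sliding-window detection (the classifier predicate stands in for the trained SVM over HoG features, which is not shown here):

import java.awt.image.BufferedImage;
import java.util.function.Predicate;

// Slide a fixed-size window across the image in steps of `stride` and
// report whether any window gets a positive response from the classifier.
static boolean containsSignature(BufferedImage img, int winW, int winH,
                                 int stride, Predicate<BufferedImage> classifier) {
    for (int y = 0; y + winH <= img.getHeight(); y += stride) {
        for (int x = 0; x + winW <= img.getWidth(); x += stride) {
            if (classifier.test(img.getSubimage(x, y, winW, winH))) {
                return true; // positive SVM response
            }
        }
    }
    return false;
}

In practice you would repeat this for several window sizes/aspect ratios, as the answer suggests.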

Detecting if multiple bent lines intersect

I'm in the process of making a swimlane diagram but can't come up with a good algorithm to automatically lay out the lines that connect the nodes in the diagram. What I essentially want is this.
However, I don't have any protection against lines overlapping or intersecting right now and it sometimes gets very messy.
Does anyone know a way to detect if a line will intersect ANY of the lines already drawn?
One idea I've come up with is to store the paths in an array or table and check the entire table every time a new line is about to be drawn, but that does not seem efficient.
I'm doing this in JavaScript and Java through the use of GWT, so maybe there is an easy way to solve this using one of the tools provided by these languages?
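For the pairwise check described above, the standard segment-intersection test uses cross-product orientation signs; a Java sketch (collinear overlaps are deliberately not handled):

// Segments AB and CD properly intersect iff A and B lie on opposite sides
// of line CD, and C and D lie on opposite sides of line AB.
static double cross(double ox, double oy, double ax, double ay, double bx, double by) {
    return (ax - ox) * (by - oy) - (ay - oy) * (bx - ox);
}

static boolean segmentsIntersect(double ax, double ay, double bx, double by,
                                 double cx, double cy, double dx, double dy) {
    double d1 = cross(cx, cy, dx, dy, ax, ay); // side of CD that A is on
    double d2 = cross(cx, cy, dx, dy, bx, by); // side of CD that B is on
    double d3 = cross(ax, ay, bx, by, cx, cy); // side of AB that C is on
    double d4 = cross(ax, ay, bx, by, dx, dy); // side of AB that D is on
    return d1 * d2 < 0 && d3 * d4 < 0;
}

Checking every new polyline segment against every stored one is O(n*m) per insertion; if that gets slow, a spatial index over segment bounding boxes can prune most comparisons.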
If your real issue is to minimize line intersections, there are several algorithms that try to accomplish this in diagrams. Check this link for an example; there are also algorithms used in auto-routing for electronic design automation that apply to this kind of diagram, like the Lee algorithm or the A* algorithm.
I don't know if the tools you are using have enough flexibility to implement these kinds of algorithms; usually you need to implement your own heuristic according to the specific type of diagram, but I hope these links are enough to give you good ideas.
Minimizing the number of line intersections in a graph is a difficult NP-hard problem; check this link (about the crossing number) for more information.
Good luck.

Fastest algorithm for locating an object in a field

What would be the best algorithm in terms of speed for locating an object in a field?
The field consists of 18 by 18 squares with side length 30.48 cm. The robot is placed in the square (0,0) and its job is to reach the light source while avoiding obstacles along the way. To locate the light source, the robot does a 360 degree turn to find the angle with the highest light reading and then travels towards the source. It can reliably detect a light source from 100 cm.
The way I'm implementing this at present is by storing the information about each tile in a 2D array. The possible values of the tiles are unexplored (the default), blocked (there's an obstacle), and empty (there's nothing there). I'm thinking of using the DFS algorithm, where the children are at position (i+3,j) or (i,j+3). However, considering that I will be doing a rotation to find the angle with the highest light reading at each child, I think there may be an algorithm that can locate the light source faster than DFS. Also, I will only be travelling in the x and y directions, since the robot uses the grid lines on the floor to correct its x and y positions.
I would appreciate it if a fast and reliable algorithm could be suggested to accomplish this task.
This is a really broad question, and I'm not an expert so my answer is based on "first principles" thinking rather than experience in the field.
(I'm assuming that your robot has generally unobstructed line of sight and movement; i.e. it is an open area with scattered obstacles, not in a maze.)
The problem is interpreting the information that you get back from a 360 degree scan.
If the robot sees the light source, then traversing a route to the light source is either trivial, or a "simple" maze walking task.
The difficulty is when you don't see the source. It might mean that the source is not within the circle of visibility. But it could also mean that the light is behind an obstacle. And unfortunately, a simple sensor like you are describing cannot distinguish these two cases.
If your sensor system allowed you to see the obstacles, you could plot the locations of the "shadow" regions (regions behind obstacles), and use that to keep track of the places that are left to search. So your strategy would be to visit a small number of locations and do a scan at each, then methodically "tidy up" a small number of areas that were in shadow.
But since you cannot easily tell where the shadow areas are, you need an algorithm that (ultimately) searches everywhere. DFS is a general strategy that searches everywhere, but it does so by (in effect) looking in the nooks and crannies first. A better strategy is to do a breadth-first search, and only visit the nooks and crannies if the wide-scale scans didn't find the light source.
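A sketch of that breadth-first strategy over the tile grid (the light-sensing predicate stands in for the robot's 360-degree sweep and is supplied by you):

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.BiPredicate;

// Visit cells in order of distance from the start, skipping blocked cells,
// and stop at the first cell whose scan sees the light source.
static int[] bfsToLight(boolean[][] blocked, int startX, int startY,
                        BiPredicate<Integer, Integer> lightSeenAt) {
    int w = blocked.length, h = blocked[0].length;
    boolean[][] seen = new boolean[w][h];
    Queue<int[]> queue = new ArrayDeque<>();
    queue.add(new int[] {startX, startY});
    seen[startX][startY] = true;
    int[][] moves = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}}; // x/y moves only, per the question
    while (!queue.isEmpty()) {
        int[] cell = queue.poll();
        if (lightSeenAt.test(cell[0], cell[1])) return cell; // source located
        for (int[] m : moves) {
            int nx = cell[0] + m[0], ny = cell[1] + m[1];
            if (nx >= 0 && ny >= 0 && nx < w && ny < h
                    && !seen[nx][ny] && !blocked[nx][ny]) {
                seen[nx][ny] = true;
                queue.add(new int[] {nx, ny});
            }
        }
    }
    return null; // every reachable cell scanned, no light found
}

Note that this gives the search order, not a drive plan; the physical robot still has to route between successive scan points.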
I would appreciate it if a fast and reliable algorithm could be suggested to accomplish this task.
I think you are going to need to develop one yourself. (Isn't this the point of the problem / task / competition?)
Although it may not look like it, this is more like a maze-following problem than anything else. I suppose this is some kind of challenge or contest situation, where there's always a path from start to target, but suppose for a moment there's not. One of the acceptable results for a robot navigating to a beacon fully surrounded by obstacles would be a report describing the closed path of obstacles surrounding the signal. If there's no such closed path, then you can find a hole somewhere; this is why it looks like maze following.
So the basic algorithm I'd choose is to start with a spiraling-inward traversal, sweeping out a path narrow enough that you're sure to see the beacon if one is present. If there are no obstacles (a degenerate case), this finds the target in minimal time. (Hint: each turn reduces the number of cells your sensor can cover per step.)
Take the spiral traversal to be counter-clockwise. What you have then is related to the rule for solving mazes by keeping your right hand on the wall and following the generated path. In this case, you have the complication that, while the start of the maze is on the boundary, the end may not be. It's possible for the right-hand-touching path to fail in such a situation. Detecting this requires looking for "cavities" in the region swept out by adjacency to the wall.
