For the last week I've been researching and experimenting with facial recognition. The intended application is to let a user look up a person's information in an SQL database simply by taking a picture of their face. The initial expectation was to compress a face down to a key or hash and use that as the database lookup key. This need not be extremely accurate, as the person looking up the information can, and most likely will, end up doing a final comparison between the original image on file and the person standing in front of them.
OpenCV/JavaCV seems the obvious starting point, and the facial detection it provides works well. However, its Eigenfaces implementation isn't ideal for facial recognition here, because online training would mean retraining against hundreds of thousands of user faces every time a new face needs to be added to the training set, which wouldn't work.
I am experimenting with SURF descriptors computed on a face extracted using OpenCV's Haar cascade features, and this appears to get me closer to the intended result. However, I am unable to think of a way to efficiently look up and compare roughly 30 descriptors (each a 64- or 128-dimensional vector) in a database. I've done some reading about LSH and spectral hashing algorithms, but there are no implementations to be found for Java, and my math isn't strong enough to implement them myself.
Does anyone have any thoughts or ideas on how this might be accomplished, or if it is even possible?
Hashing isn't complicated, nor do you need a degree in maths.
Assuming that any two images will produce a fairly similar number of 'descriptors', you only need a reasonable match on enough of them to reach a high enough confidence factor.
How specific these descriptors are determines what level of collision you can accept in your hashing algorithm.
As you have several of them, I would suggest that you don't need anything too sophisticated - after all, you probably want a level of 'fuzziness' in your search?
Start with something simple - experiment and refine. You might even find that you'll need different hashing for different descriptors - i.e. some might be more specific than others?
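To make that concrete, here's a minimal sketch (Python for brevity; everything here is illustrative, not a proven scheme) of the kind of coarse-quantization hash I mean. Nearby descriptors often land in the same bucket, which gives you the fuzziness:

import numpy as np

def coarse_hash(descriptor, dims_used=16, bits_per_dim=2):
    # Quantize the first few dimensions of a SURF descriptor into coarse
    # buckets and pack them into one integer key. SURF components are
    # roughly in [-1, 1] (an assumption worth verifying on your data).
    levels = 2 ** bits_per_dim
    d = np.asarray(descriptor[:dims_used])
    buckets = np.clip(((d + 1.0) / 2.0 * levels).astype(int), 0, levels - 1)
    key = 0
    for b in buckets:
        key = (key << bits_per_dim) | int(b)
    return key

def match_score(query_keys, stored_keys):
    # A face "matches" a stored identity if enough of its ~30 descriptor
    # keys collide with that identity's stored keys.
    return len(set(query_keys) & set(stored_keys)) / float(len(query_keys))

The integer keys can be indexed directly in an SQL column; tune dims_used and bits_per_dim to trade collisions against misses.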
Hopefully some food for thought.
I am currently doing my dissertation, which involves two people: a professional athlete and an amateur. Using image-processing skeletonization, I would first like to record the professional athlete performing the squat exercise; then, when the amateur performs the exercise, I want to be able to compare the professional's skeleton with the amateur's to see if the movement is properly formed.
I'm open to any suggestions and opinions, and would gladly appreciate some help.
Here lies your question:
properly formed.
What does "properly formed" actually mean? How can it be quantified?
Bear in mind I'm not an athlete or experienced in this field.
If I were given the task, I would counter-intuitively go in the opposite direction, moving away from Processing 3/Kinect/computer. I would instead:
find a professional athlete
find a skilled trainer versed in functional mobility training.
find an amateur (probably easiest)
Item 2 will be trickier. For example, FMS seems to put a lot of emphasis on correct exercising and mobility (to enhance performance and reduce the risk of injuries). I'm not sure if that's the only approach or the best one. You might want to check opinions on Physical Fitness, consult with people studying/teaching exercise science, etc. Do check credentials, as it feels like a field where everyone has an opinion/preference.
The idea is to understand how a professional, educated trainer assesses correct movement. Take note of how that works in the real world and try to systemise it.
What are the cues for a correct execution?
Is it the key poses?
The motion in between?
How the skeletal and muscular systems work together, the weights/forces applied, etc.?
Having a better understanding of how this works in the real world should lead you to things you can start quantifying/comparing numerically on a computer.
Try to make a checklist/score system manually using a pen and paper based on the information you gather. If this works you already have a system you can start programming.
The next step is acquiring the data.
This is probably where the Kinect comes in, but bear in mind:
the second version of the kinect is more precise than the first
there is a Kinect2 SDK wrapper for Processing 3: use that if you can (Windows only). There is a way to get libfreenect2 working with OpenNI on OSX/Linux, and therefore with SimpleOpenNI in Processing, but it's not straightforward and you won't have the same precision in the skeleton tracking algorithm
use data that is as precise as possible:
you can get the accuracy of a tracked skeleton joint
use an environment that doesn't contain a complex background (this makes it easy to segment users and detect/track skeletons, with little chance of mistaking something else for them). Prefer artificial, non-incandescent light (less of a problem with Kinect v2, but you still want as little IR interference as possible).
comparing orientation matrices or joints on single poses might not be enough to get the full picture: how do you capture/quantify motion taking into account the things that the kinect can't easily see: muscles flexing/forces applied/moving centre of gravity/etc.
try to use a grid system that will make it simple to pair the digital values with real world measurements. Check out how people used to study motion in the past, for example Étienne-Jules Marey or Eadweard Muybridge
Motion capture by Étienne-Jules Marey
Motion study by Eadweard Muybridge (notice the grid)
It's a pretty full-on project to get right, involving bits of anatomy/physics/kinematics/etc.
Start with the research first:
how did people study this in the past ?
what are the current developments ?
how does it work in the real world (without computers) ?
Take your constraints into account:
what resources (people/gear/etc.) can you use ?
how much time do you have available ?
Given the above, what topic/section of the project can realistically be tackled to get useful results?
Overall probably something along these lines:
background research
real world studies
design the comparison system around features that can be measured both with the Kinect and by a person
record data (real-world data + mobility comparison evaluation, and Kinect data + mobility comparison)
compare data
write an evaluation of the findings (how effective is the system? what are its limitations? what could be improved (future work)? etc.)
In short, be aware of the Kinect's limitations: skeleton tracking is probability-based, so it's not 100% accurate. Use data that's as clean/correct as possible to begin with (it's easier to acquire good data if you can control the capture environment). Of what a real trainer would track, what could you track with a Kinect? Do a comparison of the intersecting measurements; a tiny sketch of one such measurement follows.
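As an example of an "intersecting measurement", here is an illustrative sketch (Python/NumPy; the function names, data layout, and the assumption that the two recordings are already time-aligned are all mine, not part of any Kinect SDK) of one quantity both a trainer and a Kinect can track: the knee angle during a squat.

import numpy as np

def joint_angle(a, b, c):
    # Angle at joint b (degrees), given 3D positions of joints a, b, c,
    # e.g. hip-knee-ankle for the knee angle during a squat.
    v1 = np.asarray(a) - np.asarray(b)
    v2 = np.asarray(c) - np.asarray(b)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def compare_rep(pro_frames, amateur_frames, tolerance_deg=10.0):
    # Per-frame knee-angle difference between two recordings of the same
    # exercise; each frame is a (hip, knee, ankle) triple of 3D points.
    # Assumes the frames are already time-aligned (in practice you would
    # resample or use dynamic time warping first).
    diffs = [abs(joint_angle(*p) - joint_angle(*a))
             for p, a in zip(pro_frames, amateur_frames)]
    return sum(d <= tolerance_deg for d in diffs) / len(diffs)  # fraction "ok"

The returned fraction is one possible score for the checklist discussed above; which joints and tolerances matter is exactly what the real-world research should tell you.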
I want to implement object detection for the city name on a license plate. I have an image:
and I want to detect if the image contains the word "بابل":
I have tried a template matching method using OpenCV and also MATLAB, but the results are poor when tested on other images.
I have also read this page, but I was not able to get a good understanding of what to do from that.
Can anyone help me, or give me a step-by-step way to solve this?
I have a project to recognize license plates; we can already detect and recognize the numbers, but I also need to detect and recognize the words (the same words appear across many cars).
Your question is very broad, but I will do my best to explain optical character recognition (OCR) in a programmatic context and give you a general project workflow followed by successful OCR algorithms.
The problem you face is easier than most, because instead of having to recognize/differentiate between different characters, you only have to recognize a single image (assuming this is the only city you want to recognize). You are, however, subject to many of the limitations of any image recognition algorithm (quality, lighting, image variation).
Things you need to do:
1) Image isolation
You'll have to isolate your image from a noisy background:
I think that the best isolation technique would be to first isolate the license plate, and then isolate the specific characters you're looking for. Important things to keep in mind during this step:
Does the license plate always appear in the same place on the car?
Are cars always in the same position when the image is taken?
Is the word you are looking for always in the same spot on the license plate?
The difficulty/implementation of the task depends greatly on the answers to these three questions.
2) Image capture/preprocessing
This is a very important step for your particular implementation. Although possible, it is highly unlikely that your image will look like this:
as your camera would have to be directly in front of the license plate. More likely, your image may look like one of these:
depending on the perspective the image is taken from. Ideally, all of your images will be taken from the same vantage point and you'll simply be able to apply a single transform so that they all look similar (or not apply one at all). If you have photos taken from different vantage points, you need to manipulate them, or else you will be comparing two different images. Also, especially if you are taking images from only one vantage point and decide not to do a transform, make sure that the text your algorithm is looking for is transformed to be from the same point of view. If you don't, you'll have a not-so-great success rate that's difficult to debug/figure out.
3) Image optimization
You'll probably want to (a) convert your images to black-and-white and (b) reduce the noise in your images. These two processes are called binarization and despeckling, respectively. There are many implementations of these algorithms available in many different languages, most findable with a Google search. You can batch-process your images using any language or free tool you want, or find an implementation that works with whatever language you decide to work in.
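For illustration, here is a minimal preprocessing sketch using OpenCV's Python bindings (the function calls are real OpenCV API; the file names and kernel size are placeholders to tune):

import cv2

img = cv2.imread("plate.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.medianBlur(img, 3)  # despeckling: removes salt-and-pepper noise
# binarization: Otsu's method picks the black/white threshold automatically
_, img_bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
cv2.imwrite("plate_bw.png", img_bw)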
4) Pattern recognition
If you only want to search for the name of this one city (only one word ever), you'll most likely want to implement a matrix matching strategy. Many people also refer to matrix matching as pattern recognition so you may have heard it in this context before. Here is an excellent paper detailing an algorithmic implementation that should help you immensely should you choose to use matrix matching. The other algorithm available is feature extraction, which attempts to identify words based on patterns within letters (i.e. loops, curves, lines). You might use this if the font style of the word on the license plate ever changes, but if the same font will always be used, I think matrix matching will have the best results.
5) Algorithm training
Depending on the approach you take (if you use a learning algorithm), you may need to train your algorithm with tagged data. This means you have a series of images that you've identified as True (contains the city name) or False (does not). Here's a pseudocode example of how this works:
train = [(img1, True), (img2, True), (img3, False), (img4, False)]
img_recognizer = algorithm(train)
Then, you apply your trained algorithm to identify untagged images.
test_untagged = [img5, img6, img7]
for image in test_untagged:
    img_recognizer(image)
Your training sets should be much larger than four data points; in general, the bigger the better. Just make sure, as I said before, that all the images are of an identical transformation.
Here is a very, very high-level code flow that may be helpful in implementing your algorithm:
img_in = capture_image()
cropped_img = isolate(img_in)
scaled_img = normalize_scale(cropped_img)
img_desp = despeckle(scaled_img)
img_final = binarize(img_desp)
#train
match = train_match(training_set)
boolCity = match(img_final)
The processes above have been implemented many times and are thoroughly documented in many languages. Below are some implementations in the languages tagged in your question.
Pure Java
cvBlob in OpenCV (check out this tutorial and this blog post too)
tesseract-ocr in C++
Matlab OCR
Good luck!
If you are asking "I want to detect if the image contains the word بابل", this is a classic problem, solved using a cascade classifier like the one described at http://code.opencv.org/projects/opencv/wiki/FaceDetection.
But I assume you still want more. Years ago I tried to solve similar problems, and I provide an example image to show how good/bad it was:
To detect the licence plate, I used the very basic rectangle detection that is included in every OpenCV samples folder, and then used a perspective transform to fix the layout and size. It was important to implement multiple checks to see if a rectangle looks plausible enough to be a licence plate. For example, if a rectangle is 500px tall and 2px wide, it is probably not what I want, and was rejected.
Use https://code.google.com/p/cvblob/ to extract the Arabic text and other components on the detected plate. I had a similar need just yesterday on another project, where I had to extract Japanese kanji symbols from a page:
CvBlob does a lot of work for you.
As the next step, use the technique explained at http://blog.damiles.com/2008/11/basic-ocr-in-opencv/ to match the city name. Just train the algorithm with example images of the different city names, and soon it will recognize 99% of them out of the box. I have used similar approaches on different projects and am quite sure they work.
How can the tempo/BPM of a song be determined programmatically? What algorithms are commonly used, and what considerations must be made?
This is challenging to explain in a single StackOverflow post. In general, the simplest beat-detection algorithms work by locating peaks in sound energy, which is easy to detect. More sophisticated methods use comb filters and other statistical/waveform methods. For a detailed explication including code samples, check this GameDev article out.
The keywords to search for are "Beat Detection", "Beat Tracking" and "Music Information Retrieval". There is lots of information here: http://www.music-ir.org/
There is a (maybe) annual contest called MIREX where different algorithms are tested on their beat detection performance.
http://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/mck/
That should give you a list of algorithms to test.
A classic algorithm is Beatroot (google it), which is nice and easy to understand. It works like this:
Short-time FFT the music to get a sonogram.
Sum the increases in magnitude over all frequencies for each time step (ignore the decreases). This gives you a 1D time-varying function called the "spectral flux".
Find the peaks using any old peak detection algorithm. These are called "onsets" and correspond to the start of sounds in the music (starts of notes, drum hits, etc).
Construct a histogram of inter-onset-intervals (IOIs). This can be used to find likely tempos.
Initialise a set of "agents" or "hypotheses" for the beat-tracking result. Feed these agents the onsets one at a time in order. Each agent tracks the list of onsets that are also beats, and the current tempo estimate. The agents can either accept the onsets, if they fit closely with their last tracked beat and tempo, ignore them if they are wildly different, or spawn a new agent if they are in-between. Not every beat requires an onset - agents can interpolate.
Each agent is given a score according to how neat its hypothesis is - if all its beat onsets are loud it gets a higher score. If they are all regular it gets a higher score.
The highest scoring agent is the answer.
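Steps 1-3 are only a few lines in practice. A rough sketch (Python/NumPy; the frame size, hop, and peak-picking thresholds are illustrative choices of mine, not Beatroot's exact values):

import numpy as np

def spectral_flux(signal, frame_len=1024, hop=512):
    # steps 1-2: short-time FFT, then sum the positive magnitude increases
    window = np.hanning(frame_len)
    frames = np.array([signal[i:i + frame_len] * window
                       for i in range(0, len(signal) - frame_len, hop)])
    mags = np.abs(np.fft.rfft(frames, axis=1))
    return np.maximum(np.diff(mags, axis=0), 0).sum(axis=1)

def pick_onsets(flux, window=10, bias=1.5):
    # step 3: a crude peak picker - local maximum well above the local mean
    onsets = []
    for i in range(window, len(flux) - window):
        local = flux[i - window:i + window + 1]
        if flux[i] == local.max() and flux[i] > bias * local.mean():
            onsets.append(i)  # onset at frame i (time = i * hop / sample_rate)
    return onsets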
Downsides to this algorithm in my experience:
The peak-detection is rather ad-hoc and sensitive to threshold parameters and whatnot.
Some music doesn't have obvious onsets on the beats. Obviously it won't work with those.
Difficult to know how to resolve the 60bpm-vs-120bpm issue, especially with live tracking!
Throws away a lot of information by only using a 1D spectral flux. I reckon you can do much better by having a few band-limited spectral fluxes (and maybe one broadband one for drums).
Here is a demo of a live version of this algorithm, showing the spectral flux (black line at the bottom) and onsets (green circles). It's worth considering the fact that the beat is extracted from only the green circles. I've played back the onsets just as clicks, and to be honest I don't think I could hear the beat from them, so in some ways this algorithm is better than people at beat detection. I think the reduction to such a low-dimensional signal is its weak step though.
Annoyingly I did find a very good site with many algorithms and code for beat detection a few years ago. I've totally failed to refind it though.
Edit: Found it!
Here are some great links that should get you started:
http://marsyasweb.appspot.com/
http://www.vamp-plugins.org/download.html
Beat extraction involves the identification of cognitive metric structures in music. Very often these do not correspond to physical sound energy - for example, in most music there is a level of syncopation, which means that the "foot-tapping" beat that we perceive does not correspond to the presence of a physical sound. This means that this is a quite different field to onset detection, which is the detection of the physical sounds, and is performed in a different way.
You could try the Aubio library, which is a plain C library offering both onset and beat extraction tools.
There is also the online Echonest API, although this involves uploading an MP3 to a website and retrieving XML, so it might not be so suitable.
EDIT: I came across this last night - a very promising looking C/C++ library, although I haven't used it myself. Vamp Plugins
The general area of research you are interested in is called MUSIC INFORMATION RETRIEVAL
There are many different algorithms that do this but they all are fundamentally centered around ONSET DETECTION.
Onset detection measures the start of an event; the event in this case is a note being played. You can look for changes in the weighted Fourier transform (high frequency content), or you can look for large changes in spectral content (spectral difference); there are a couple of papers further down that I recommend you look into. Once you apply an onset detection algorithm, you pick off where the beats are via thresholding.
There are various algorithms you can use once you've got that time localization of the beat. You can turn it into a pulse train (create a signal that is zero for all time and 1 only when a beat happens), then apply an FFT to that, and bam: the largest peak gives you the frequency of onsets.
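As a sketch of that pulse-train trick (Python/NumPy; the sample rate and the 30-240 BPM band are my own choices, not from any paper):

import numpy as np

def tempo_from_onsets(onset_times, duration, fs=100):
    # build a pulse train sampled at fs Hz: 1 at each onset, 0 elsewhere
    pulse = np.zeros(int(duration * fs) + 1)
    idx = np.clip((np.asarray(onset_times) * fs).astype(int), 0, len(pulse) - 1)
    pulse[idx] = 1.0
    spectrum = np.abs(np.fft.rfft(pulse))
    freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fs)
    band = (freqs >= 0.5) & (freqs <= 4.0)  # only consider 30-240 BPM
    return 60.0 * freqs[band][np.argmax(spectrum[band])]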
Here are some papers to lead you in the right direction:
https://web.archive.org/web/20120310151026/http://www.elec.qmul.ac.uk/people/juan/Documents/Bello-TSAP-2005.pdf
https://adamhess.github.io/Onset_Detection_Nov302011.pdf
Here is an extension to what some people are discussing:
Someone mentioned looking into applying a machine learning algorithm: basically, collect a bunch of features from the onset detection functions (mentioned above), combine them with the raw signal in a neural network/logistic regression, and learn what makes a beat a beat.
Look into Dr. Andrew Ng; he has free machine learning lectures from Stanford University online (not the long-winded video lectures; there is actually an online distance course).
If you can manage to interface with python code in your project, Echo Nest Remix API is a pretty slick API for python:
There's a method, analysis.tempo, which will give you the BPM. It can do a whole lot more than simple BPM, as you can see from the API docs or this tutorial.
Perform a Fourier transform, and find peaks in the power spectrum. You're looking for peaks below the 20 Hz cutoff for human hearing. I'd guess typically in the 0.1-5ish Hz range to be generous.
SO question that might help: Bpm audio detection Library
Also, here is one of several "peak finding" questions on SO: Peak detection of measured signal
Edit: Not that I do audio processing. It's just a guess based on the fact that you're looking for a frequency domain property of the file...
Another edit: it is worth noting that lossy compression formats like MP3 store Fourier-domain data rather than time-domain data in the first place. With a little cleverness, you can save yourself some heavy computation... but see the thoughtful comment by cobbal.
To repost my answer: The easy way to do it is to have the user tap a button in rhythm with the beat, and count the number of taps divided by the time.
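As trivial as it sounds, it's only a few lines (Python sketch; how the tap handler is wired to a button depends on your UI toolkit):

import time

taps = []

def on_tap():  # call this from your button handler
    taps.append(time.time())

def bpm():
    if len(taps) < 2:
        return None
    # n taps define n-1 intervals over the elapsed time
    return 60.0 * (len(taps) - 1) / (taps[-1] - taps[0])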
Others have already described some beat-detection methods. I want to add that there are some libraries available that provide techniques and algorithms for this sort of task.
Aubio is one of them, it has a good reputation and it's written in C with a C++ wrapper so you can integrate it easily with a cocoa application (all the audio stuff in Apple's frameworks is also written in C/C++).
There are several methods to get the BPM, but the one I find most effective is the "beat spectrum" (described here).
This algorithm computes a similarity matrix by comparing each short sample of the music with every other. Once the similarity matrix is computed, it is possible to get the average similarity between every pair of samples {S(T); S(T+1)} for each time interval T: this is the beat spectrum. The first high peak in the beat spectrum is, most of the time, the beat duration. The best part is that you can also do things like music structure or rhythm analysis.
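A rough sketch of that computation (Python/NumPy; assumes you already have a spectrogram-like feature matrix with one row per short sample, and uses cosine similarity, which is one common choice):

import numpy as np

def beat_spectrum(features):
    # features: (n_frames, n_bins) matrix, one row per short sample
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    sim = (features @ features.T) / (norms * norms.T + 1e-9)  # cosine similarity
    # average the similarity along each diagonal: entry k is the mean
    # similarity between samples that are k frames apart
    return np.array([np.diag(sim, k=lag).mean() for lag in range(1, len(sim))])

The first strong peak in the returned array gives the beat period in frames.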
I'd imagine this will be easiest in 4-4 dance music, as there should be a single low frequency thud about twice a second.
I want to implement an OCR system. I need my program not to make any mistakes on the letters it does choose to recognize. It doesn't matter if it cannot recognize a lot of them (i.e. high precision even with a low recall is okay).
Can someone help me choose a suitable ML algorithm for this? I've been looking around and have found some confusing things. For example, I found contradicting statements about SVM: the scikit-learn docs mention that we cannot get probability estimates for SVM, whereas I found another post that says it is possible to do this in WEKA.
Anyway, I am looking for a machine learning algorithm that best suits this purpose. It would be great if you could suggest a library for the algorithm as well. I prefer Python-based solutions, but I am OK working with Java as well.
It is possible to get probability estimates from SVMs in scikit-learn by simply setting probability=True when constructing the SVC object. The docs only warn that the probability estimates might not be very good.
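For example (probability=True and predict_proba are the real scikit-learn API; the data variables and the 0.95 cut-off are placeholders to tune):

from sklearn.svm import SVC

clf = SVC(probability=True)  # enables predict_proba via Platt scaling
clf.fit(X_train, y_train)    # X_train, y_train: your labelled letter images

probs = clf.predict_proba(X_test)
# High precision, low recall: only answer when very confident.
confident = probs.max(axis=1) > 0.95
predictions = clf.classes_[probs.argmax(axis=1)]
accepted = predictions[confident]  # the rest are left unrecognized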
The quintessential probabilistic classifier is logistic regression, so you might give that a try. Note that LR is a linear model though, unlike SVMs which can learn complicated non-linear decision boundaries by using kernels.
I've seen people using neural networks with good results, but that was already a few years ago. I asked an expert colleague and he said that nowadays people use things like nearest-neighbor classifiers.
I don't know scikit or WEKA, but any half-decent classification package should have at least k-nearest neighbors implemented. Or you can implement it yourself, it's ridiculously easy. Give that one a try: it will probably have lower precision than you want, however you can make a slight modification where instead of taking a simple majority vote (i.e. the most frequent class among the neighbors wins) you require larger consensus among the neighbors to assign a class (for example, at least 50% of neighbors must be of the same class). The larger the consensus you require, the larger your precision will be, at the expense of recall.
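A sketch of that modification (Python/NumPy; k and the consensus fraction are knobs to tune on a validation set):

from collections import Counter
import numpy as np

def knn_consensus(X_train, y_train, x, k=15, consensus=0.8):
    # plain k-nearest neighbours, but abstain unless at least `consensus`
    # of the k neighbours agree - trading recall for precision
    dists = np.linalg.norm(np.asarray(X_train) - np.asarray(x), axis=1)
    top = np.asarray(y_train)[np.argsort(dists)[:k]]
    label, count = Counter(top).most_common(1)[0]
    return label if count / k >= consensus else None  # None = "don't know"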
I'm looking for several methods to compare two images to see how similar they are. Currently I plan to have percentages as the 'similarity index' end-result. My program outline is something like this:
User selects 2 images to compare.
With a button, the images are compared using several different methods.
At the end, each method will have a percentage next to it indicating how similar the images are based on that method.
I've done a lot of reading lately and some of the stuff I've read seems to be incredibly complex and advanced and not for someone like me with only about a year's worth of Java experience. So far I've read about:
The Fourier transform - I'm finding this rather confusing to implement in Java, but apparently the Java Advanced Imaging API has a class for it, though I'm not sure how to convert the output into an actual result.
SIFT algorithm - seems incredibly complex
Histograms - probably the easiest out of all mentioned so far
Pixel grabbing - seems viable, but if there's a considerable amount of variation between the two images, it doesn't look like it will produce any sort of accurate result. I might be wrong?
I also have the idea of pre-processing an image using a Sobel filter first, then comparing the results. The problem is the actual comparison part.
So I'm looking to see if anyone has ideas for comparing images in Java, and hoping there are people here who have done similar projects before. I just want some input on viable comparison techniques that aren't too hard to implement in Java.
Thanks in advance
Fourier transform - This can be used to efficiently compute the cross-correlation, which will tell you how to align the two images and how similar they are when optimally aligned.
SIFT descriptors - These can be used to compare local features. They are often used for correspondence analysis and object recognition. (See also SURF.)
Histograms - The normalized cross-correlation often yields good results for comparing images on a global level. But since you are just comparing color distributions you could end up declaring an outdoor scene with lots of snow as similar to an indoor scene with lots of white wallpaper...
Pixel grabbing - No idea what this is...
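Since histograms are the method you flagged as easiest, here's a minimal sketch of a normalized-histogram comparison (Python/NumPy for brevity; the same few operations translate directly to Java, and histogram intersection is one of several reasonable similarity measures):

import numpy as np

def histogram_similarity(img_a, img_b, bins=32):
    # Compare two images (uint8 RGB arrays) by their normalized colour
    # histograms. Returns a score in [0, 1]; 1 means identical
    # distributions. Note this ignores spatial layout entirely, hence
    # the snow-vs-wallpaper caveat above.
    def hist(img):
        h, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins,) * 3,
                              range=((0, 256),) * 3)
        return h.ravel() / h.sum()
    ha, hb = hist(img_a), hist(img_b)
    return float(np.minimum(ha, hb).sum())  # histogram intersection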
You can get a good overview from this paper. Another field you might want to look into is content-based image retrieval (CBIR).
Sorry for not being Java specific. HTH.
As a better alternative to simple pixel grabbing, try SSIM. It does require that your images are essentially of the same object from the same angle, however. It's useful if you're comparing images that have been compressed with different algorithms, for example (e.g. JPEG vs JPEG2000). Also, it's a fairly simple approach that you should be able to implement reasonably quickly to see some results.
I don't know of a Java implementation, but there's a C++ implementation using OpenCV. You could try to re-use that (through something like javacv) or just write it from scratch. The algorithm itself isn't that complicated anyway, so you should be able to implement it directly.
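If you do write it from scratch, a simplified single-window version of SSIM looks like this (Python/NumPy sketch; the standard algorithm averages this quantity over small local windows, which matters for accuracy, so treat this as a first approximation):

import numpy as np

def ssim_global(a, b, L=255.0):
    # SSIM over two whole greyscale images, using one global window.
    # c1, c2 are the standard stabilizing constants from the SSIM paper.
    a, b = a.astype(float), b.astype(float)
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))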