Combine FisherFaces and LBPH algorithm to improve accuracy - java

Hi everyone, I'm trying to implement a face recognition system for a video surveillance application.
In this context the test images are low quality, the illumination changes from one image to another, and the detected subjects are not always in the same pose.
As a first recognizer I used FisherFaces and, with 49 test images, I obtained an accuracy of 35/49, considering only the predicted labels (not the distances of each classified subject). Trying to get better accuracy, I attempted to preprocess both the training images and the test images; the preprocessing I chose is described in the book "Mastering OpenCV with Practical Computer Vision Projects". The steps are (a rough sketch follows the list below):
detection of the eyes in order to align and rotate the face;
separate histogram equalization to standardize the lighting in the image;
filtering to reduce the effect of pixel noise, because the histogram equalization increases it;
the last step is to apply an elliptical mask to the face in order to remove details that are not significant for the recognition.
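A rough sketch of how such a pipeline can look with the OpenCV 3.x/4.x Java bindings (eye detection is omitted and the eye centers are assumed to be given; the bilateral filter, its parameters, and the ellipse axes are illustrative assumptions, not necessarily the book's exact values):

    import org.opencv.core.*;
    import org.opencv.imgproc.Imgproc;

    public class FacePreprocessing {

        /**
         * Preprocessing sketch: rotate so the eye line is horizontal, equalize the
         * histogram, smooth the noise, and apply an elliptical mask.
         * Eye positions are assumed to come from a separate eye detector.
         */
        public static Mat preprocess(Mat grayFace, Point leftEye, Point rightEye) {
            // 1) Align: rotate around the point between the eyes so the eye line is horizontal.
            Point center = new Point((leftEye.x + rightEye.x) / 2.0,
                                     (leftEye.y + rightEye.y) / 2.0);
            double angle = Math.toDegrees(Math.atan2(rightEye.y - leftEye.y,
                                                     rightEye.x - leftEye.x));
            Mat rot = Imgproc.getRotationMatrix2D(center, angle, 1.0);
            Mat aligned = new Mat();
            Imgproc.warpAffine(grayFace, aligned, rot, grayFace.size());

            // 2) Standardize lighting with histogram equalization.
            Mat equalized = new Mat();
            Imgproc.equalizeHist(aligned, equalized);

            // 3) Reduce the noise amplified by the equalization (bilateral filter keeps edges).
            Mat smoothed = new Mat();
            Imgproc.bilateralFilter(equalized, smoothed, 0, 20.0, 2.0);

            // 4) Keep only an elliptical region of the face; zero out the rest.
            Mat mask = Mat.zeros(smoothed.size(), CvType.CV_8UC1);
            Point faceCenter = new Point(smoothed.cols() / 2.0, smoothed.rows() / 2.0);
            Size axes = new Size(smoothed.cols() * 0.4, smoothed.rows() * 0.5); // illustrative axes
            Imgproc.ellipse(mask, faceCenter, axes, 0, 0, 360, new Scalar(255), -1);
            Mat result = Mat.zeros(smoothed.size(), smoothed.type());
            smoothed.copyTo(result, mask);
            return result;
        }
    }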
Well, with this preprocessing I obtained worse results than before (4/49 subjects correctly classified). So I thought of using another classifier, the LBPH recognizer, to improve recognition accuracy, since the two algorithms use different features and different ways of classifying a face; used together, they might yield higher accuracy.
So my question is about ways to combine these two algorithms: does anyone know how to merge the two outputs in order to obtain better accuracy? My idea is this: if FisherFaces and LBPH give the same result (the same label), there is no problem; if they disagree, take the vector of labels and the vector of distances from each algorithm and, for each subject, sum the corresponding distances; the label assigned to the test image is then the one with the smallest summed distance.
This is just my idea, but there may be other ways to fuse the outputs of both algorithms. Note that I would also have to change the code of the predict function of the OpenCV face module, since it returns a single int rather than a vector of labels and distances.
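For illustration, a minimal sketch of the distance-sum fusion described above. It does not rely on any particular OpenCV API: it assumes you have already obtained, by whatever means (e.g. a modified predict), a distance per candidate label from each recognizer, and it only fuses the two maps; the min-max normalization is an added assumption to make the two distance scales comparable.

    import java.util.HashMap;
    import java.util.Map;

    public class ScoreFusion {

        /**
         * Fuses per-label distances from two recognizers by summing them after a
         * simple min-max normalization (the two distance scales are not directly
         * comparable otherwise). The smallest fused distance wins.
         * Input maps: label -> distance.
         */
        public static int fuse(Map<Integer, Double> fisherDistances,
                               Map<Integer, Double> lbphDistances) {
            Map<Integer, Double> fisherNorm = normalize(fisherDistances);
            Map<Integer, Double> lbphNorm = normalize(lbphDistances);

            int bestLabel = -1;
            double bestScore = Double.MAX_VALUE;
            for (Integer label : fisherNorm.keySet()) {
                Double lbph = lbphNorm.get(label);
                if (lbph == null) continue; // label unknown to one of the recognizers
                double fused = fisherNorm.get(label) + lbph;
                if (fused < bestScore) {
                    bestScore = fused;
                    bestLabel = label;
                }
            }
            return bestLabel;
        }

        /** Min-max normalization of the distances to [0, 1]. */
        private static Map<Integer, Double> normalize(Map<Integer, Double> distances) {
            double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
            for (double d : distances.values()) {
                min = Math.min(min, d);
                max = Math.max(max, d);
            }
            Map<Integer, Double> out = new HashMap<>();
            double range = Math.max(max - min, 1e-9);
            for (Map.Entry<Integer, Double> e : distances.entrySet()) {
                out.put(e.getKey(), (e.getValue() - min) / range);
            }
            return out;
        }
    }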

Related

Neural Network Training Criteria: How to Train on Multiple Categories (i.e. shape and color) Without Over-Training

I've been exploring image recognition via Neural Networks. After some research, I started with Encog and their "ImageNeuralNetwork.java" example.
In their example, they use one image per US currency coin (penny, dime, etc) as a training set and then identify a given image of a coin accordingly.
Now I want to use their example as a starting point to practice with different images. I'm trying to use shapes/colors as training. For example, I want the program to recognize the difference between a red circle and red rectangle, but I also want to recognize the difference between a red circle and a blue circle.
I remember reading that you shouldn't over-train and give every possible combination of training images (as in giving 4 images in this case, of 2 differently colored circles and 2 differently colored rectangles).
Would I still be able to use Encog's coin identification example to train on multiple categories (shape and color) or is this another concept? Is there a particular minimum number of training images I can provide without providing every possible color/shape combination and thus over-training?
When it comes to avoiding over-training there are no reliable rules of thumb. It depends entirely on the structure of your network and the features of your data. Most people who build neural networks manage over-training (or over-fitting) by trial and error: as long as your network classifies the training data with high accuracy but the test data with poor accuracy, you are over-training, and you need to reduce your training iterations, rebuild the network, and keep repeating this. So, to answer your second question, there is no particular minimum number of images.
As for your first question, you can definitely train on multiple categories, and there are several ways to do this: either with multiple output neurons for each category or with an encoded output, though most commonly a separate network for each category works better. Also, for color or shape recognition, principal component analysis works better than a neural network in most cases.
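To illustrate the "multiple output neurons" option, here is a minimal Encog 3.x sketch with one output neuron per category (circle/rectangle and red/blue). The toy feature vectors, network shape, activation functions, and stopping criterion are assumptions made only for the example:

    import org.encog.engine.network.activation.ActivationSigmoid;
    import org.encog.ml.data.MLDataSet;
    import org.encog.ml.data.basic.BasicMLDataSet;
    import org.encog.neural.networks.BasicNetwork;
    import org.encog.neural.networks.layers.BasicLayer;
    import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

    public class ShapeColorNetwork {

        public static void main(String[] args) {
            // Toy data: 3 features per example (purely illustrative, not real image features).
            double[][] input = {
                {0.9, 0.1, 0.8},   // red circle
                {0.9, 0.1, 0.2},   // red rectangle
                {0.1, 0.9, 0.8},   // blue circle
                {0.1, 0.9, 0.2}    // blue rectangle
            };
            // Two output neurons: [isCircle, isRed]; e.g. a red rectangle is {0.0, 1.0}.
            double[][] ideal = {
                {1.0, 1.0},
                {0.0, 1.0},
                {1.0, 0.0},
                {0.0, 0.0}
            };

            BasicNetwork network = new BasicNetwork();
            network.addLayer(new BasicLayer(null, true, input[0].length));       // input layer
            network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 10)); // hidden layer (arbitrary size)
            network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 2)); // one neuron per category
            network.getStructure().finalizeStructure();
            network.reset();

            MLDataSet trainingSet = new BasicMLDataSet(input, ideal);
            ResilientPropagation train = new ResilientPropagation(network, trainingSet);

            // Stop on a small error or a capped number of iterations; in practice also
            // watch a held-out set to catch over-fitting.
            int epoch = 0;
            do {
                train.iteration();
                epoch++;
            } while (train.getError() > 0.01 && epoch < 500);
            train.finishTraining();
        }
    }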

GPS data comparison after smoothing

I'm trying to compare multiple algorithms that are used to smooth GPS data. I'm wondering what should be the standard way to compare the results to see which one provides better smoothing.
I was thinking of a machine learning approach: to create a car model based on a classifier and check which tracks show better behaviour.
For those of you with more experience on this, is this a good approach? Are there other ways to do this?
Generally, there is no universally valid way to compare two datasets, since it completely depends on the applied/required quality criterion.
For your approach
"I was thinking of a machine learning approach: to create a car model based on a classifier and check which tracks show better behaviour."
this means that you will need to define your term "better behaviour" mathematically.
One possible quality criterion for your application is as follows (it consists of two parts that express opposing quality aspects):
First part (deviation from raw data): Compute the RMSE (root mean squared error) between the smoothed data and the raw data. This gives you a measure of how far your smoothed track deviates from the given raw coordinates. The error (RMSE) increases if you smooth more and decreases if you smooth less.
Second part (track smoothness): Compute the mean absolute lateral acceleration that the car would experience along the track (the second derivative of position). This decreases if you smooth more and increases if you smooth less, i.e. it behaves contrary to the RMSE.
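A minimal sketch of both measures, assuming the tracks are equally spaced samples already projected into a local metric coordinate system (x/y in meters) at a constant sampling interval; the finite-difference acceleration uses the total magnitude as a simple proxy for the lateral component:

    public class TrackQuality {

        /** RMSE between raw and smoothed positions (same length, same timestamps). */
        public static double rmse(double[][] raw, double[][] smoothed) {
            double sum = 0.0;
            for (int i = 0; i < raw.length; i++) {
                double dx = raw[i][0] - smoothed[i][0];
                double dy = raw[i][1] - smoothed[i][1];
                sum += dx * dx + dy * dy;
            }
            return Math.sqrt(sum / raw.length);
        }

        /**
         * Mean absolute acceleration estimated by central second differences,
         * with dt the (assumed constant) sampling interval in seconds.
         * A full implementation would isolate the lateral component; this uses
         * the total acceleration magnitude as a simple proxy.
         */
        public static double meanAbsAcceleration(double[][] track, double dt) {
            double sum = 0.0;
            int count = 0;
            for (int i = 1; i < track.length - 1; i++) {
                double ax = (track[i + 1][0] - 2 * track[i][0] + track[i - 1][0]) / (dt * dt);
                double ay = (track[i + 1][1] - 2 * track[i][1] + track[i - 1][1]) / (dt * dt);
                sum += Math.hypot(ax, ay);
                count++;
            }
            return count == 0 ? 0.0 : sum / count;
        }
    }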
Result evaluation:
(1) Find a sequence of your data where you know that the underlying GPS track is a straight line or where the tracked object is not moving. Note that for such tracks the (lateral) acceleration is zero by definition(!).
For these, compute the RMSE and the mean absolute lateral acceleration.
The RMSE of approaches that have (almost) zero acceleration results from measurement inaccuracies!
(2) Plot the results in a coordinate system with the RMSE on the x axis and the mean acceleration on the y axis.
(3) Pick all approaches that have an RMSE similar to what you found in step (1).
(4) From those approaches, pick the one(s) with the smallest acceleration. Those give you the smoothest track with an error explained through measurement inaccuracies!
(5) You're done :)
I have no experience on this topic but I have a few things in mind that may help you.
You know it is a car. You know the data is generated by a car, so you can define a set of properties of a car. For example, if a car is moving at a speed above 50 km/h then the angle of a corner should be at least 110 degrees. I am only guessing at the values, but if you do a little research I am sure you will be able to define such properties. Next, you can test how well each approximation fits the car properties and choose the best one.
Raw data. I assume you are testing all methods on a stretch of a given road. You can generate a "raw GPS track" - a track that best fits the movement of a car. Google Maps may help you generate such a track, or some GPS device with higher accuracy. Then you measure the distance between each approximation and your generated track - the one with the minimum distance wins.
I think you can easily match the coordinates after converting them to addresses, because an address has a street, an area and a city, so you can easily match at different radii.
Try this link.
Take a look at this paper that discusses comparing machine learning algorithms:
"Choosing between two learning algorithms
based on calibrated tests" available at:
http://www.cs.waikato.ac.nz/ml/publications/2003/bouckaert-calibrated-tests.pdf
Also check out this paper:
"Bayesian Comparison of Machine Learning Algorithms on Single and
Multiple Datasets" available at:
http://www.jmlr.org/proceedings/papers/v22/lacoste12/lacoste12.pdf
Note: from the question, I understand that you are looking for the best way to compare the results of machine learning algorithms, and not for additional machine learning algorithms that may implement this feature.
Machine learning is not a well-suited approach for this task; you would have to define what good smoothing is...
In principle your task cannot be solved by an algorithm that gives a general answer, because every smoothing destroys the original data by some amount and adds invented positions, and different systems/humans that use the smoothed data react differently to that changed data.
The question is: what do you want to achieve with smoothing?
Why do you need smoothing? (Have you forgotten to implement or enable a stand-still filter that eliminates movement while the vehicle is standing still? Without it, GPS introduces jumping locations during a stand-still.)
The GPS chip already has built-in (best possible?) real-time smoothing using a Kalman filter; on the one hand it has more information than a post-processing smoothing algorithm, on the other hand it has less.
So next you have to ask yourself: are you comparing post-processing smoothing algorithms or real-time algorithms? (Probably post-processing.) Comparing a real-time smoothing algorithm with a post-processing smoothing algorithm is not fair.
Again: what do you expect from smoothed data? That it looks somewhat nice but unrealistic, like photoshopped models in TV advertisements?
What is good smoothing? Close to the real vehicle position, which nobody ever knows, or a curve with low acceleration?
I would prefer a smoothing algorithm that produces the curve closest to the real (usually unknown) vehicle trajectory.
Or you might just think it should somehow look beautiful: in that case overlay the curves in different colors, display them on a satellite image map, and let a team of humans (experts, at least owning and driving their own car) decide what looks good and realistic.
We humans have the best multi-purpose pattern matching algorithm built in.
Again, why smooth? For display on a map, to please the humans that look at that map?
Or to feed the smoothed tracks to other algorithms that have problems with the original data?
To please humans, I have given an answer above.
To please other algorithms:
What do they need? Nearer positions? Or better course/direction values between points?
Which attributes do you want to smooth: only the latitude/longitude coordinates, or also the speed value and the course value?
I have a lot of professional experience with GPS tracks, and I recommend just removing every location under 7 km/h and keeping the rest as it is. In most cases there is no need for further smoothing.
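A minimal sketch of that filter, assuming each point carries the speed reported by the receiver in km/h; the 7 km/h threshold is the value recommended above:

    import java.util.ArrayList;
    import java.util.List;

    public class StandStillFilter {

        /** Simple GPS point: latitude, longitude and receiver-reported speed in km/h. */
        public static class GpsPoint {
            public final double lat, lon, speedKmh;
            public GpsPoint(double lat, double lon, double speedKmh) {
                this.lat = lat; this.lon = lon; this.speedKmh = speedKmh;
            }
        }

        /** Drops every point recorded below the given speed threshold (e.g. 7 km/h). */
        public static List<GpsPoint> filterStandStill(List<GpsPoint> track, double thresholdKmh) {
            List<GpsPoint> result = new ArrayList<>();
            for (GpsPoint p : track) {
                if (p.speedKmh >= thresholdKmh) {
                    result.add(p);
                }
            }
            return result;
        }
    }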
Otherwise it gets expensive:
A possible solution:
1) You arrange a €2000 reference GPS receiver delivered with a magnetic vehicle roof antenna (e.g. a Hemisphere 2000 GPS receiver) and use that as the reference.
2) You use a consumer GPS of the kind usually used for your task (smartphone, etc.).
Both are mounted inside the car: drive some test tracks in good conditions (highways), but more tracks in very bad ones: strong curves combined with big houses on the left and right, and through tunnels, a straight one and a curved one, if you have them.
3) Apply the smoothing algorithms to the consumer GPS tracks.
4) Compare the smoothed tracks to the reference track, by matching pairs of positions and finally calculating the RMSE (root mean squared error).
Difficulties:
Matching two positions: hopefully the timestamps can be matched exactly, which is usually not the case (a 0.5 s offset is possible).
Think about what you will do when there is a GPS outage.
Consider first displaying a raw track and identifying what kind of unsmoothed data is not suitable/nice looking. (You could post the pictures here later.)
What about using the good old Kalman filter?
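If you want to try it, here is a minimal 1D constant-velocity Kalman filter sketch, applied independently per coordinate; the process and measurement noise values are placeholders you would have to tune, and a real GPS filter would couple the coordinates and use the receiver's accuracy estimates.

    public class SimpleKalman1D {

        // State: position and velocity of one coordinate, plus the covariance entries of P.
        private double pos = 0.0, vel = 0.0;
        private double pPos = 1.0, pVel = 1.0, pPosVel = 0.0;
        private final double processNoise;     // placeholder tuning value
        private final double measurementNoise; // placeholder tuning value
        private boolean initialized = false;

        public SimpleKalman1D(double processNoise, double measurementNoise) {
            this.processNoise = processNoise;
            this.measurementNoise = measurementNoise;
        }

        /** Feed one position measurement taken dt seconds after the previous one. */
        public double update(double measuredPos, double dt) {
            if (!initialized) {
                pos = measuredPos;          // start at the first measurement
                initialized = true;
                return pos;
            }

            // Predict with a constant-velocity motion model.
            pos += vel * dt;
            pPos += dt * (2 * pPosVel + dt * pVel) + processNoise;
            pPosVel += dt * pVel;
            pVel += processNoise;

            // Correct: blend prediction and measurement with the Kalman gain.
            double s = pPos + measurementNoise;
            double gainPos = pPos / s;
            double gainVel = pPosVel / s;
            double residual = measuredPos - pos;
            pos += gainPos * residual;
            vel += gainVel * residual;
            pVel -= gainVel * pPosVel;      // uses the pre-update cross term
            pPosVel *= (1 - gainPos);
            pPos *= (1 - gainPos);
            return pos;
        }
    }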

Activity Recognition - Dimension reduction for continuous HMMs

I am a novice at HMMs but I have tried to build a model using Jahmm for the UCI Human Activity Recognition data set. The data set has 561 features and 7352 rows, also includes the x/y/z inertial values of both the accelerometer and gyroscope, and is mainly for recognizing 6 activities: Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, and Laying. The data is normalized to [-1,1] but not z-scaled, and I can only get decent results after scaling (the scale() function in R). After scaling, I have tried PCA, removing features with 90+% correlation, and randomForest importance measures with mtry=8 for dimension reduction; so far randomForest is the only one that seemed to work, but the results are still quite low (80%). Also, some activities sometimes give NaN values when run through the Jahmm code.
According to what I've read about HMMs so far, these results are too low. Should I do more preprocessing before I use the said dimension reduction techniques? Is there a particular dimension reduction technique that is compatible with HMMs? Am I overfitting? Or am I better off making the model discrete instead of continuous? I really have to use Activity Recognition and HMMs for my project, so I would be glad to get suggestions/feedback from people who have already tried Jahmm and R for continuous HMMs. It would also be great if someone could suggest a package/library that uses log probabilities and produces a Viterbi sequence from a fitted HMM given a new set of test data.

Normalize output of fft using Libgdx in Android, from accelerometer data

I use the FFT class from the Libgdx library in an Android project, where I process the accelerometer signal to create a signal spectrum.
I need to normalize the output computed from the accelerometer data. I read that there isn't a single "correct" way to do this, only conventions: some scale the FFT by 1/N, others by 1/sqrt(N).
I didn't understand whether this convention is up to whoever implements the library, meaning every library has its own normalization factor, or whether it is up to the user, so that I can decide freely for the sake of presentation.
If it depends on the library, what is the normalization factor of the FFT in the Libgdx library?
Edit 1: I already searched the documentation but found nothing. Here it is: http://libgdx-android.com/docs/api/com/badlogic/gdx/audio/analysis/FFT.html
I was about to say "just check the documentation", but it turns out that it's terrible, and doesn't say one way or the other!
Still, you could determine the scale factor empirically. Just run an FFT on an all-ones dataset. There will be one non-zero bin in the output. There are three likely values of this bin:
1.0: The scale was 1/N
sqrt(N): The scale was 1/sqrt(N)
N: The scale was 1
You can do the same trick for the inverse FFT, although it's redundant. The forward and inverse scale factors must multiply to 1/N.
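A minimal sketch of that empirical check with the Libgdx class linked in the question, assuming its constructor takes the transform size and a sample rate and that forward()/getSpectrum() behave like the Minim-style API it is ported from; adapt the method names if your Libgdx version differs.

    import com.badlogic.gdx.audio.analysis.FFT;

    public class FftScaleCheck {

        public static void main(String[] args) {
            int n = 64;                          // transform size (any power of two)
            float[] ones = new float[n];
            for (int i = 0; i < n; i++) {
                ones[i] = 1.0f;
            }

            // The second constructor argument is the sample rate; its value does not
            // matter for this test, only the transform size does.
            FFT fft = new FFT(n, 44100);
            fft.forward(ones);

            // Only the DC bin should be non-zero for a constant input.
            float dc = fft.getSpectrum()[0];
            if (Math.abs(dc - 1.0f) < 1e-3) {
                System.out.println("Library scales by 1/N");
            } else if (Math.abs(dc - Math.sqrt(n)) < 1e-3) {
                System.out.println("Library scales by 1/sqrt(N)");
            } else if (Math.abs(dc - n) < 1e-3) {
                System.out.println("Library applies no scaling (bin = N)");
            } else {
                System.out.println("Unexpected DC bin value: " + dc);
            }
        }
    }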
There's a specific normalization depending on whether you want the spectrum or the power spectral density. Oli provided a good test for determining whether the library scales by 1/N, by 1/sqrt(N), or not at all.
Here's a document that explains everything in great detail along with a comprehensive comparison of window functions.
http://edoc.mpg.de/395068

I need a class to perform hypothesis testing on a normal population

In particular, I want to generate a tolerance interval, for which I would need the values of Zx for x some value of the standard normal.
Does the Java standard library have anything like this, or should I roll my own?
EDIT: Specifically, I'm looking to do something akin to linear regression on a set of images. I have two images, and I want to see what the degree of correlation is between their pixels. I suppose this might fall under computer vision as well.
Simply calculate the Pearson correlation coefficient between those two images.
You will have 3 coefficients because the R, G, B channels need to be analyzed separately.
Or you can calculate 1 coefficient just for the intensity levels of the images, or you could calculate the correlation between the Hue values of the images after converting them to the HSV or HSL color space.
Do whatever you see fit :-)
EDIT: The correlation coefficient may be maximized only after scaling and/or rotating one of the images. This may or may not be a problem - it depends on your needs.
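A minimal sketch of the per-channel Pearson correlation in plain Java with java.awt.image.BufferedImage, assuming both images have the same dimensions (size handling and error checks are omitted):

    import java.awt.image.BufferedImage;

    public class ImageCorrelation {

        /**
         * Pearson correlation between one channel of two same-sized images.
         * channelShift: 16 for red, 8 for green, 0 for blue.
         */
        public static double pearson(BufferedImage a, BufferedImage b, int channelShift) {
            int w = a.getWidth(), h = a.getHeight();
            double sumX = 0, sumY = 0, sumXX = 0, sumYY = 0, sumXY = 0;
            long n = (long) w * h;
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    double px = (a.getRGB(x, y) >> channelShift) & 0xFF;
                    double py = (b.getRGB(x, y) >> channelShift) & 0xFF;
                    sumX += px;
                    sumY += py;
                    sumXX += px * px;
                    sumYY += py * py;
                    sumXY += px * py;
                }
            }
            // r = cov(X, Y) / (std(X) * std(Y)), computed from running sums.
            double cov = sumXY - sumX * sumY / n;
            double varX = sumXX - sumX * sumX / n;
            double varY = sumYY - sumY * sumY / n;
            return cov / Math.sqrt(varX * varY);
        }
    }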
You can use the complete statistical power of R using rJava/JRI. This includes correlations between pixels and so on.
Another option is to look at ImageJ, which contains libraries for many image manipulations, mathematics and statistics. It is an application all right, but the library is usable in development as well. It comes with an extensive developers' manual. On a side note, ImageJ can be combined with R as well.
ImageJ allows you to use appropriate methods for finding image similarity measures, based on Fourier transformations or other approaches. More information can be found in Digital Image Processing with Java and ImageJ. See also this paper.
Another option is Commons Math. It also contains the basic statistical tools.
See also the answers on this question and this question.
It seems you want to compare two images to see how similar they are. In this case, the first two things to try are SSD (sum of squared differences) and normalized correlation between the two images (the latter is closely related to the Pearson correlation that 0x69 suggests).
You can also try normalized correlation over small (corresponding) windows in the two images and add up the results over several (all) small windows in the image.
These two are very simple methods which you can write in a few minutes.
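A quick sketch of the SSD measure on grayscale intensities, again assuming equally sized BufferedImage inputs; the luminance weights are the usual Rec. 601 coefficients.

    import java.awt.image.BufferedImage;

    public class ImageSsd {

        /** Sum of squared differences over grayscale intensities (lower = more similar). */
        public static double ssd(BufferedImage a, BufferedImage b) {
            double sum = 0.0;
            for (int y = 0; y < a.getHeight(); y++) {
                for (int x = 0; x < a.getWidth(); x++) {
                    double diff = gray(a.getRGB(x, y)) - gray(b.getRGB(x, y));
                    sum += diff * diff;
                }
            }
            return sum;
        }

        /** Rec. 601 luminance from an ARGB pixel. */
        private static double gray(int argb) {
            int r = (argb >> 16) & 0xFF, g = (argb >> 8) & 0xFF, b = argb & 0xFF;
            return 0.299 * r + 0.587 * g + 0.114 * b;
        }
    }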
I'm not sure however what this has to do with hypothesis testing or linear regression, you might want to edit to clarify this part of your question.
