I am a novice at HMMs, but I have tried to build code using Jahmm for the UCI Human Activity Recognition data set. The data set has 561 features and 7352 rows, also includes the xyz inertial values of both the accelerometer and gyroscope, and is meant for recognizing 6 activities: Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, and Laying. The data is normalized to [-1, 1], but not z-scaled, and I can only get decent results after scaling (the scale() function in R). After scaling, I have tried PCA, dropping features with correlation above 90%, and randomForest importance measures (with mtry=8) for dimensionality reduction; so far randomForest is the only one that seemed to work, but the results are still quite low (80%). Also, some activities sometimes give NaN values when run through the Jahmm code.
According to what I've read about HMMs so far, these results are too low. Should I do more preprocessing before I apply the dimensionality reduction techniques mentioned above? Is there a particular dimensionality reduction technique that is compatible with HMMs? Am I overfitting? Or am I better off making the model discrete instead of continuous? I really have to do activity recognition with HMMs for my project, so I would be glad to get suggestions/feedback from people who have already tried Jahmm and R for continuous HMMs. It would also be great if someone could suggest a package/library that uses log probabilities and returns a Viterbi sequence from a fitted HMM given a new set of test data.
Related
Hi everyone, I'm trying to implement a face recognition system for a video surveillance application.
In this context the test images are of low quality, the illumination changes from one image to another, and, moreover, the detected subjects are not always in the same pose.
As a first recognizer I used FisherFaces and, with 49 test images, I obtained an accuracy of 35/49, without considering the distances of each classified subject (I just considered the labels). To get better accuracy I tried preprocessing both the training images and the test images; the preprocessing I chose is described in the book "Mastering OpenCV with Practical Computer Vision Projects". The steps are:
detection of the eyes in order to align and rotate the face;
separate histogram equalization to standardize the lighting in the image;
filtering to reduce the effect of pixel noise, because the histogram equalization increases it;
finally, applying an elliptical mask to the face in order to remove details of the face that are not significant for the recognition.
Well, with this type of preprocessing I obtained worse results than before (4/49 subjects properly classified). So I thought of using another classifier, the LBPH recognizer, to improve the accuracy of the recognition, since these two algorithms have different features and different ways of classifying a face; using them together might increase the accuracy.
So my question is about ways to combine these two algorithms; does anyone know how to merge the two outputs in order to obtain better accuracy? My idea is this: if FisherFaces and LBPH give the same result (the same label), there is no problem; if they disagree, take the vector of labels and the vector of distances from each algorithm and, for each subject, sum the corresponding distances; the label of the test image is then the one with the shortest combined distance.
This is just my idea; there may be other ways to fuse the outputs of the two algorithms. Either way I would have to change the code of the predict function of the face module in OpenCV, since it returns an int rather than a vector of int.
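A minimal sketch of that score-level fusion, assuming you have already extracted, for the same test image, the per-subject labels and distances from each recognizer (the method name and the array arguments are hypothetical, not part of the OpenCV API):

// Hypothetical score-level fusion of two face recognizers.
// labels[i] is the subject id for candidate i; fisherDist[i] and lbphDist[i]
// are the distances each recognizer assigned to that candidate for the same
// test image (lower distance = better match).
public class FusionSketch {

    static int fuse(int[] labels, double[] fisherDist, double[] lbphDist) {
        int bestLabel = -1;
        double bestScore = Double.MAX_VALUE;
        for (int i = 0; i < labels.length; i++) {
            double combined = fisherDist[i] + lbphDist[i];
            if (combined < bestScore) {
                bestScore = combined;
                bestLabel = labels[i];
            }
        }
        return bestLabel;
    }
}

Since Fisherfaces and LBPH distances live on different scales, it is usually worth normalizing each distance vector (for example, dividing by its maximum) before summing, otherwise one recognizer will dominate the vote.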
I am trying to learn from, and also adapt, the ImageNeuralNetwork example in Java. So far my problem is that when I give the NN a larger number of 32x32 images and let it train, the error never goes below 14%, and at the beginning it jumps all over the place.
My images are black and white and are classified into 27 classes, so I know there are 27 output neurons.
My question is why the NN is not learning. I tried different hidden layer setups (1 or 2 layers) with different neuron counts, but nothing helps.
Can anyone give me an idea of what I'm doing wrong? Like I said, I'm just beginning with NNs and I'm a bit lost here.
Edit: It seems that if I give it fewer images to learn from, the error goes down, but that doesn't really solve the problem: if I wanted to classify a lot of images I would be stuck with the error never going down.
You need to use only one hidden layer. Additional hidden layers in this kind of neural network really do not give you much; see the universal approximation theorem. I would try starting with (input count + output count) * 1.5 as the number of hidden neurons.
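If the example you are adapting is the Encog ImageNeuralNetwork sample (an assumption on my part), a single hidden layer sized with that heuristic might look like the sketch below; the layer sizes are a starting point, not tuned values.

import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;

public class NetworkSketch {
    public static BasicNetwork build(int inputCount, int outputCount) {
        // Heuristic starting point: (inputs + outputs) * 1.5 hidden neurons.
        int hiddenCount = (int) ((inputCount + outputCount) * 1.5);

        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, inputCount));                      // input layer (32x32 = 1024 pixels)
        network.addLayer(new BasicLayer(new ActivationSigmoid(), true, hiddenCount));  // single hidden layer
        network.addLayer(new BasicLayer(new ActivationSigmoid(), false, outputCount)); // 27 outputs, one per class
        network.getStructure().finalizeStructure();
        network.reset(); // randomize the weights
        return network;
    }
}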
As to why the ANN fails to converge with more images, that is more difficult. Most likely it is because the additional images are too varied for the ANN to classify them all. A simple feedforward ANN is really not ideal for grid-based image recognition. The neural network does not know which pixels are next to each other; it just sees a straight linear vector of pixels. The ANN is basically learning which pixels must be present for each of the letters. If you shift one of the letters even slightly, the ANN might not recognize it, because you've now moved nearly every pixel it was trained with.
I really do not do much with OCR. However, this does seem to be an area where deep learning excels. Convolutional neural networks are much better at handling neighboring pixels and tolerating small shifts. You might get better results with a deep learning approach. More info here: http://dpkingma.com/sgvb_mnist_demo/demo.html
I am a novice at HMMs, but I have tried to build code using Jahmm for the UCI Human Activity Recognition data set. The data set has 561 features and 7352 rows, also includes the xyz inertial values of both the accelerometer and gyroscope, and is meant for recognizing 6 activities: Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, and Laying. So far, I have tried the following:
With the xyz inertial values:
For each of the 6 activities, I trained 6 HMMs, one per axis (for both the accelerometer and the gyroscope), using only the training data of that activity for the corresponding HMM. For each activity I then applied equal weights to the probabilities of all axes (when applied to the test data) and added them up to get a total per activity; the activity with the maximum probability is the one picked. (I had no luck with this one: some activities get very high accuracies while others get very low ones.) Note: I used "ObservationReal", 6 states (I actually tried 2-10 states), and just uniformly divided initial values for the HMM. I sometimes get NaN values for some of the activities. (A Jahmm sketch of this per-activity scoring, done in log space, follows this list.)
I also tried scaling (z-scoring) the data first in R and then applying the above method, but still to no avail.
I also tried coding the inertial values with "ObservationVector", but I couldn't figure out how to set the initial Opdfs (it says that the matrix has to be positive definite).
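For reference, a minimal Jahmm sketch of that per-activity, per-axis scoring, assuming univariate ObservationReal sequences and Gaussian output distributions; the use of KMeansLearner to initialize before Baum-Welch, and of the scaled learner/calculator to fight the numerical underflow that often shows up as NaN, are my assumptions rather than anything prescribed by the data set:

import java.util.List;
import be.ac.ulg.montefiore.run.jahmm.ForwardBackwardScaledCalculator;
import be.ac.ulg.montefiore.run.jahmm.Hmm;
import be.ac.ulg.montefiore.run.jahmm.ObservationReal;
import be.ac.ulg.montefiore.run.jahmm.OpdfGaussianFactory;
import be.ac.ulg.montefiore.run.jahmm.learn.BaumWelchScaledLearner;
import be.ac.ulg.montefiore.run.jahmm.learn.KMeansLearner;

public class ActivityHmmSketch {

    // Train a 6-state HMM on the training sequences of one activity and one axis.
    public static Hmm<ObservationReal> train(List<List<ObservationReal>> sequences) {
        // k-means gives a data-driven initial model instead of uniformly divided initial values.
        KMeansLearner<ObservationReal> kMeans =
                new KMeansLearner<ObservationReal>(6, new OpdfGaussianFactory(), sequences);
        Hmm<ObservationReal> initial = kMeans.learn();

        BaumWelchScaledLearner baumWelch = new BaumWelchScaledLearner();
        baumWelch.setNbIterations(20);
        return baumWelch.learn(initial, sequences);
    }

    // Log-likelihood of one test sequence under one trained model (scaled to avoid underflow).
    public static double logLikelihood(Hmm<ObservationReal> hmm, List<ObservationReal> sequence) {
        return new ForwardBackwardScaledCalculator(sequence, hmm).lnProbability();
    }
}

At test time you would compute logLikelihood for each axis' test sequence under each activity's models, sum the per-axis log-likelihoods per activity (which corresponds to multiplying the per-axis probabilities rather than adding them, but is far more robust numerically), and pick the activity with the largest total.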
With the feature values:
I found that the feature set is just too large to run in Jahmm, so with the scaled data (because I couldn't get any decent results with the out-of-the-box data even though it is normalized to [-1,1]), I ran the train and test data through PCA and correlation filtering in R before feeding them to my Jahmm code (which consists of six 6-state HMMs, one for each activity, taking the maximum probability on the test data), and the results are still not good. In particular, the Sitting activity always gets around 20% accuracy. (Same parameters as in the "Note" above.)
I ran randomForest with the same data in R (with mtry=8) and got the importance values. I first separated the locomotive and static activities using 119 variables, then classified the locomotive activities (Walking, W. Upstairs, W. Downstairs) with 89 features (chosen by RF importance) and the static activities (Sitting, Standing, Laying) with 5 variables. Separating the locomotive and static activities is easy (2 states, 100%), but with this method, even with adjusted HMM parameters, I only reached 86% overall accuracy. (I used 3-state HMMs for the second level.)
I trained one HMM for all activities, with 6 states (each corresponding to 1 activity, as I've read in one paper), but I couldn't figure out how to use the Viterbi after that. It tells me the Viterbi needs List<Observation O> test sequences, while I obviously have List<List<ObservationReal>> for my test data. (See the decoding sketch after this list.)
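On the Viterbi point in the last item: Jahmm's Hmm exposes mostLikelyStateSequence(...), which decodes a single observation sequence, so (assuming the same ObservationReal setup) you would loop over the outer list and decode one sequence at a time. A minimal sketch:

import java.util.ArrayList;
import java.util.List;
import be.ac.ulg.montefiore.run.jahmm.Hmm;
import be.ac.ulg.montefiore.run.jahmm.ObservationReal;

public class ViterbiSketch {

    // Decode each test sequence separately; each int[] holds the most likely
    // state index at every time step of that sequence.
    public static List<int[]> decodeAll(Hmm<ObservationReal> hmm,
                                        List<List<ObservationReal>> testSequences) {
        List<int[]> statePaths = new ArrayList<int[]>();
        for (List<ObservationReal> sequence : testSequences) {
            statePaths.add(hmm.mostLikelyStateSequence(sequence));
        }
        return statePaths;
    }
}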
I have also tried HMM packages in R:
depmixS4 - doesn't have a Viterbi function, and I have no idea how to get the posterior probabilities for the test data (it gives the probabilities for the train data only); I've tried contacting the author of the package and he tried to help me, but the code he told me to try gives me errors (I have yet to email him back).
RHmm - works like a charm at first; I trained only one 6-state HMM with all the train data, but it produces NaNs, resulting in a bad Viterbi sequence on the test data.
According to what I've read about HMMs so far, these results are too low for an HMM. Am I doing something wrong? Should I do more preprocessing before I use the techniques above? Is the data really too large for HMMs/Jahmm? Am I overfitting? I am stuck now, but I really have to do activity recognition with HMMs for my project. I would be glad to get suggestions/feedback from people who have already tried Jahmm and R for continuous HMMs. I am also open to studying other languages, if that would mean it would finally work.
I just stumbled upon your question while searching for a scalable Java library. It seems you did not train the HMMs properly. When I first used HMMs, I was also not able to get correct results. I have used R to train and test HMMs; here are some suggestions that may be helpful to you.
Properly assign random initial values when initializing the transition and emission probabilities. Here is a code snippet in R using the HMM library.
library(HMM)
....
...
# random, row-normalized initial transition matrix (numStates x numStates)
ranNum <- matrix(runif(numStates * numStates, 0.0001, 1.000), nrow = numStates, ncol = numStates)
transitionInit <- ranNum / rowSums(ranNum)
# random, row-normalized initial emission matrix (numStates x numSymbols)
ranNum <- matrix(runif(numStates * numSymbols, 0.0001, 1.000), nrow = numStates, ncol = numSymbols)
emissionInit <- ranNum / rowSums(ranNum)
rowSums(emissionInit)  # sanity check: each row should sum to 1
hmm <- initHMM(c(1:numStates), symbols, transProbs = transitionInit, emissionProbs = emissionInit)
Try to chop your rows into short sequences. I used a sliding-window technique to chop them and then removed the redundant ones to avoid retraining and to save time. (A Java sketch of this windowing appears after these suggestions.)
You can save memory by replacing a string observable with an integer or a symbol.
I used the following to train the HMM with Baum-Welch and computed the log forward probabilities to determine the likelihood (not the probability). You need to sum the log likelihoods of the states to get the final log likelihood of the sequence:
# Baum-Welch training; the small pseudo-count keeps probabilities away from zero
bw <- baumWelch(hmm, trainSet, maxIterations = numIterations, delta = 1E-9, pseudoCount = 1E-9)
# log forward probabilities of one validation sequence (rows = states, columns = time steps)
logForwardProbabilities <- forward(bw$hmm, validationSet[cnt, ])
vProbs <- sum(logForwardProbabilities[, seqSize])
This is a negative number; calculate it for each of the 6 HMMs you trained, and the model with the largest value is the best match for the sequence.
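As referenced in the second suggestion, here is a small Java sketch of the sliding-window chopping (Java because the asker's pipeline is Java-based); the window length and step are placeholders to pick for your data:

import java.util.ArrayList;
import java.util.List;

public class SlidingWindow {

    // Chop one long observation row into overlapping fixed-length windows,
    // e.g. windowSize = 64 with step = 32 for 50% overlap.
    public static List<double[]> chop(double[] row, int windowSize, int step) {
        List<double[]> windows = new ArrayList<double[]>();
        for (int start = 0; start + windowSize <= row.length; start += step) {
            double[] window = new double[windowSize];
            System.arraycopy(row, start, window, 0, windowSize);
            windows.add(window);
        }
        return windows;
    }
}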
I hope this helps you or someone else, if it's not too late.
I use the FFT class from the Libgdx library in an Android project, where I process the accelerometer signal to create a signal spectrum.
I need to normalize the output computed from the accelerometer data. I have read that there isn't a single "correct" way to do this, only conventions: some implementations scale the FFT by 1/N, others by 1/sqrt(N).
What I don't understand is whether this convention is up to whoever implements the library, meaning that every library has its own normalization factor, or up to the user, so that I can choose whatever gives the nicest representation.
If it depends on the library, what is the normalization factor of the FFT in the Libgdx library?
Edit 1: I have already searched the documentation but found nothing. Here it is: http://libgdx-android.com/docs/api/com/badlogic/gdx/audio/analysis/FFT.html
I was about to say "just check the documentation", but it turns out that it's terrible, and doesn't say one way or the other!
Still, you could determine the scale factor empirically. Just run an FFT on an all-ones dataset. There will be one non-zero bin in the output. There are three likely values for this bin:
1.0: The scale was 1/N
sqrt(N): The scale was 1/sqrt(N)
N: The scale was 1
You can do the same trick for the inverse FFT, although it's redundant. The forward and inverse scale factors must multiply to 1/N.
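Here is one way to run that probe from Java, sketched with Apache Commons Math purely as an illustration; the same all-ones test applies unchanged to the Libgdx FFT, you would only substitute its own method calls:

import java.util.Arrays;
import org.apache.commons.math3.complex.Complex;
import org.apache.commons.math3.transform.DftNormalization;
import org.apache.commons.math3.transform.FastFourierTransformer;
import org.apache.commons.math3.transform.TransformType;

public class FftScaleProbe {
    public static void main(String[] args) {
        int n = 8;                        // any power of two works
        double[] ones = new double[n];
        Arrays.fill(ones, 1.0);

        FastFourierTransformer fft = new FastFourierTransformer(DftNormalization.STANDARD);
        Complex[] out = fft.transform(ones, TransformType.FORWARD);
        double bin0 = out[0].abs();       // the only non-zero bin for a constant input

        if (Math.abs(bin0 - 1.0) < 1e-9) {
            System.out.println("the forward transform scales by 1/N");
        } else if (Math.abs(bin0 - Math.sqrt(n)) < 1e-9) {
            System.out.println("the forward transform scales by 1/sqrt(N)");
        } else if (Math.abs(bin0 - n) < 1e-9) {
            System.out.println("the forward transform applies no scaling");
        }
    }
}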
There is also a specific normalization depending on whether you want the amplitude spectrum or the power spectral density. Oli provided a good test for determining the 1/N, 1/sqrt(N), or no scaling that the library performs.
Here's a document that explains everything in great detail along with a comprehensive comparison of window functions.
http://edoc.mpg.de/395068
I'm currently developing a percussion tutorial program. The program needs to determine which drum is being played; to do this, I was going to analyse the frequency content of the drum recording and see whether the dominant frequency lies within a given range.
I have been using the Apache Commons Math implementation of the FFT so far (http://commons.apache.org/math/), but my question is: once I perform the FFT, how do I use the array of results to calculate the frequencies contained in the signal?
Note: I have also tried experimenting with autocorrelation, but it didn't seem to work too well on samples from a drum kit.
Any help, or alternative suggestions for how to determine which drum is being hit, would be greatly appreciated.
Edit: Since writing this I've found a great online lesson on implementing the FFT in Java for time/frequency transformations: Spectrum Analysis in Java.
In the area of music information retrieval, people often use a related representation known as mel-frequency cepstral coefficients (MFCCs).
For any N-sample segment of your signal, take the FFT. The resulting spectrum is then transformed into a set of MFCCs containing, say, 12 elements (i.e., coefficients). This 12-element vector is used to classify the instrument, including which drum was hit.
To do supervised classification, you can use something like a support vector machine (SVM). LIBSVM is a commonly used library that has Java compatibility (and many other languages). You train the SVM with these MFCCs and their corresponding instrument labels. Then, you test it by feeding a query MFCC vector, and it will tell you which instrument it is.
So the basic procedure, in summary:
Get FFT.
Get MFCCs from FFT.
Train SVM with MFCCs and instrument labels.
Query the SVM with MFCCs of the query signal.
Check for Java packages that do these things. (They must exist; I just don't know them off-hand.) Relatively speaking, drum transcription is easier than transcription of most other instrument groups, so I am optimistic that this would work.
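Since LIBSVM is mentioned above, here is a minimal sketch of the training and query steps with its Java bindings, assuming the MFCC vectors have already been computed elsewhere (the MFCC extraction itself is not shown, and the parameter values are just reasonable defaults):

import libsvm.*;

public class DrumSvmSketch {

    // Train an SVM on MFCC feature vectors (one per drum hit) with numeric labels.
    static svm_model train(double[][] mfccs, double[] labels) {
        svm_problem prob = new svm_problem();
        prob.l = mfccs.length;
        prob.y = labels;
        prob.x = new svm_node[prob.l][];
        for (int i = 0; i < prob.l; i++) {
            prob.x[i] = toNodes(mfccs[i]);
        }

        svm_parameter param = new svm_parameter();
        param.svm_type = svm_parameter.C_SVC;
        param.kernel_type = svm_parameter.RBF;
        param.C = 1.0;
        param.gamma = 1.0 / mfccs[0].length;
        param.cache_size = 100;
        param.eps = 1e-3;
        return svm.svm_train(prob, param);
    }

    // Predict the drum label for a query MFCC vector.
    static double predict(svm_model model, double[] mfcc) {
        return svm.svm_predict(model, toNodes(mfcc));
    }

    private static svm_node[] toNodes(double[] v) {
        svm_node[] nodes = new svm_node[v.length];
        for (int i = 0; i < v.length; i++) {
            nodes[i] = new svm_node();
            nodes[i].index = i + 1;   // LIBSVM feature indices are 1-based
            nodes[i].value = v[i];
        }
        return nodes;
    }
}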
For further reading, there are a whole bunch of articles on drum transcription.
When I made a program using a DFT, I had it create an array of frequencies and the amplitude at each frequency. I could then find the largest amplitudes and compare those to musical notes, getting a good grasp of what was played. If you know the approximate frequency of the drum, you should be able to do the same.
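A sketch of that frequency/amplitude table using the Apache Commons Math FFT already mentioned in the question: bin k of an N-point FFT corresponds to the frequency k * sampleRate / N, and for a real signal only the first N/2 bins carry new information, so peak picking looks roughly like this:

import org.apache.commons.math3.complex.Complex;
import org.apache.commons.math3.transform.DftNormalization;
import org.apache.commons.math3.transform.FastFourierTransformer;
import org.apache.commons.math3.transform.TransformType;

public class PeakFrequency {

    // Return the frequency (in Hz) of the strongest bin in a real signal.
    // samples.length must be a power of two for this FFT implementation.
    public static double dominantFrequency(double[] samples, double sampleRate) {
        FastFourierTransformer fft = new FastFourierTransformer(DftNormalization.STANDARD);
        Complex[] spectrum = fft.transform(samples, TransformType.FORWARD);

        int n = samples.length;
        int peakBin = 1;                   // skip bin 0, which holds the DC offset
        double peakAmplitude = 0.0;
        for (int k = 1; k < n / 2; k++) {  // the second half mirrors the first for real input
            double amplitude = spectrum[k].abs();
            if (amplitude > peakAmplitude) {
                peakAmplitude = amplitude;
                peakBin = k;
            }
        }
        return peakBin * sampleRate / n;   // bin index -> frequency in Hz
    }
}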