I have implemented the MFCC algorithm and want to implement BFCC. What are the differences between them, and is it enough to just swap in another function in place of the frequency-to-mel conversion (2595 * Math.log10(1 + frequency / 700)) and the mel-to-frequency conversion (700 * (Math.pow(10, mel / 2595) - 1))? I am following this code: MFCC
PS: Do I also need to change the code for the triangular filters?
These are just different scales for representing the frequency spacing of the filters. MFCC uses filters whose center frequencies are spaced along the mel scale, while BFCC uses filters with center frequencies spaced along the Bark scale.
The bark scale would simply be represented as:
Bark(f) = 13 * arctan(0.00076 * f) + 3.5 * arctan((f / 7500)^2)
where f is the frequency in Hz.
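For reference, a minimal Java sketch of that conversion (the method name hzToBark is just illustrative, not part of the MFCC code you linked):

public static double hzToBark(double f) {
    // Zwicker-style Hz-to-Bark mapping, matching the formula above.
    return 13.0 * Math.atan(0.00076 * f)
         + 3.5 * Math.atan((f / 7500.0) * (f / 7500.0));
}

You would call this wherever your current code calls its frequency-to-mel function when placing the filter center frequencies.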
Though you can use the Bark scale to set the center frequency spacing, research shows that using either MFCC or BFCC to represent the feature vectors of an input speech sample has very little effect on ASR system performance. The industry standard remains MFCC; in fact, I have not heard much about BFCC.
If the code for the computation of filter coefficients is relatively generic and it takes in center frequencies as an input parameter, then I would say that you are OK. But, it is always best to double-check. Use MATLAB and plot frequency responses and check! You can check the [following paper][1] out for a comparison between MFCC, BFCC and uniform scale frequency spacings.
Update 1: The center frequency of a band-pass (or band-stop) filter is the arithmetic or geometric mean of its upper and lower cutoff frequencies.
Also, the reverse equation, solving for f given a Bark value, is not trivial; there is no simple closed form, so it has to be solved numerically. One way would be to construct a table of f versus Bark values and then do a table lookup. I have not been able to find any links to a reverse equation.
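As a rough sketch of that idea (reusing the illustrative hzToBark method above; barkToHz is likewise just a made-up name), you can invert the forward formula numerically, since it is monotonically increasing in f:

public static double barkToHz(double bark) {
    // Bisection search over the forward mapping; 60 halvings of a
    // 0-24000 Hz range give far better than 1 Hz precision.
    double lo = 0.0, hi = 24000.0;
    for (int i = 0; i < 60; i++) {
        double mid = 0.5 * (lo + hi);
        if (hzToBark(mid) < bark) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return 0.5 * (lo + hi);
}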
[1]: http://148.204.64.201/paginas%20anexas/voz/articulos%20interesantes/front%20end/MFCC/a-comparative-study-of.pdf
Alternatively, you could simply pick the band-edge frequencies of each Bark critical band by hand (a bunch of if's and else's), since there is no exact equation for the Bark critical bands (nor for the mel bands, although there is a pretty close one). Then take the logarithm of the value in each band and apply the DCT; remember this is done per frame. The mel scale is also a logarithmic scale, so there is not much difference between doing MFCC and BFCC.
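If you go the hand-picked route, a lookup table is cleaner than a chain of if/else branches. A minimal sketch, assuming the commonly cited Zwicker critical-band edges (double-check the exact values against whatever reference you follow):

// 25 band-edge frequencies in Hz delimiting the 24 Bark critical bands.
static final double[] BARK_BAND_EDGES_HZ = {
    20, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
    1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
    6400, 7700, 9500, 12000, 15500
};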
While simulating a random point process based on the Poisson distribution with 1000 dots, they all appear to occupy a small region in the center of the window.
I used Donald Knuth's inverse sampling algorithm to implement the Poisson-based pseudo-random number generator.
https://en.wikipedia.org/wiki/Poisson_distribution#Generating_Poisson-distributed_random_variables
The lambda value (aka the success rate) was set to window_dimension/2, and I obtained this result (screenshot):
Code:
public double getPoisson(double lambda) { // lambda = 250
    // Knuth's inverse-transform sampler: multiply uniform variates
    // together until the product drops below e^(-lambda); the number
    // of factors needed, minus one, is Poisson(lambda)-distributed.
    double L = Math.exp(-lambda);
    double p = 1d;
    int k = 0;
    do {
        k++;
        p *= Math.random();
    } while (p > L);
    return k - 1;
}
It looks to me like the problem is with what you think the output should be, because the program seems to be generating pretty much what you asked for. A Poisson with a rate of 500 will have both its expected value and its variance equal to 500, and for large values of λ it's pretty symmetric and bell-shaped. Taken together, that means the standard deviation is sqrt(500), which is slightly less than 22.4, so you should expect about 95% of your outcomes to be 500±45, which looks like what you're getting.
With your subsequent edit saying (in a comment) that λ=250, the results behave similarly. The likely range of outcomes in each dimension is 250±31, still clustering to the center.
It's easy to confirm my explanation by creating Poisson random variates with a standard deviation such that ±3σ span your plot area.
You need a larger variance/standard deviation to increase the spread of outcomes across your window. To demo this, I went with a Poisson(6400)—which has a standard deviation of 80—and subtracted 6150 to give the result a mean of 250. The overwhelming majority of values will therefore fall between 0 and 500. I generated 1000 independent pairs of values and plotted them using the JMP statistics package, and here are the results:
and just for jollies, here's a plot of independent pairs of Normal(250, 80)'s:
They look pretty darn similar, don't they?
To reiterate, there's nothing wrong with the Poisson algorithm you used. It's doing exactly what you told it to do, even if that's not what you expected the results to look like.
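If you want to reproduce the Poisson(6400) demo in Java rather than JMP, note one practical caveat: Math.exp(-6400) underflows to zero, so your getPoisson cannot take a lambda that large directly. A rough sketch of a workaround (the helper names getPoissonLarge and generatePoints are purely illustrative) exploits the fact that sums of independent Poisson variates are themselves Poisson:

public double getPoissonLarge(double lambda, int chunks) {
    // Sum 'chunks' independent Poisson(lambda/chunks) draws to avoid
    // the exp(-lambda) underflow in getPoisson for large lambda.
    double sum = 0;
    for (int i = 0; i < chunks; i++) {
        sum += getPoisson(lambda / chunks);
    }
    return sum;
}

public double[][] generatePoints() {
    // 1000 (x, y) pairs: Poisson(6400) shifted down by 6150 has mean 250
    // and standard deviation 80, so nearly all values land in [0, 500).
    double[][] points = new double[1000][2];
    for (int i = 0; i < points.length; i++) {
        points[i][0] = getPoissonLarge(6400, 64) - 6150;
        points[i][1] = getPoissonLarge(6400, 64) - 6150;
    }
    return points;
}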
Addendum
Since you don't believe that Poisson converges to Gaussian as lambda grows, here's some direct evidence for your specific case, again generated with JMP:
On the left is a histogram of 1000 randomly generated Poisson(250) values. Note the well-formed bell shape. I had JMP select the best continuous distribution fit based on AIC (Akaike Information Criterion). It selected normality as the best possible fit, with the diagnostics on the right and the resulting density plot in red superimposed on the histogram. The results pretty much speak for themselves.
I am doing two-face comparison using OpenCV's FaceRecognizer of the LBP type. My question is how to calculate the prediction confidence in percentage format, given the following code (JavaCV):
int n[] = new int[1];
double p[] = new double[1];
personRecognizer.predict(mat, n, p);
double confidence = p[0];
But the confidence is a double value; how should I convert it into a percentage value of probability?
Is there an existing formula?
Sorry if I didn't state my question in a clear way. Ok, here is the scenario:
I want to compare two face images and get the likeness of the two faces. For example, input John's pic and his classmate Tom's pic, and say the likeness is 30%; then input John's pic and his brother Jack's pic, and the likeness comes out as 80%.
These two likeness factors show that Jack looks more like his brother John than Tom does... so the likeness factor in percentage format is what I want: the higher the value, the more alike the two input faces are.
Currently I do this by computing the confidence value of the input using the OpenCV function FaceRecognizer.predict, but the confidence value actually stands for the distance between the inputs in their feature vector space. So how can I scale the distance (confidence) into the likeness percentage format?
You are digging quite deep with this question. Well, according to the OpenCV documentation:
predict()
Predicts a label and associated confidence (e.g. distance) for a given
input image
I am not sure what you are looking for here, but the question is not really easy to answer. Intra-person face variation (variation of the same person) is vast, and inter-person face variation (faces from different persons) can be more compact (e.g. when both faces are frontal while the second intra-person facial image is a profile), so this is a whole topic in itself.
Probably you should have some ground truth (i.e. some faces whose labels are already known) and derive from this set the percentage you want by associating the distances with the labels. Though this is also often inaccurate, as distance does not necessarily coincide with your perception of similarity (as mentioned before, inter-person faces can vary a lot).
Edit:
First of all, there is no universal human perception of face similarity. On the other hand, most people would recognize a face that belongs to the same person across various poses and postures. The word most here is important: as you push the limits, human perception starts to diverge, e.g. when asked to recognize a face across the years and the time span becomes quite large (child vs. adolescent vs. old person).
Are you asking to compute the similarity of noses/eyes etc.? If so, I think the best way is to find a set of noses/eyes belonging to the same persons, train over this set, and then check your performance on a different set from different persons.
The usual approach, as far as I know, is to train and test using pairs of images comprising positive and negative samples. A positive sample is a pair of images belonging to the same person, while a negative one is an image pair belonging to two different people.
I am not sure what you are asking exactly so maybe you can check out this link.
Hope it helped.
Edit 2:
Well, since you want to convert the distance you are getting into a similarity expressed as a percentage, you can somehow invert the distance to get the similarity. Some problems arise here though:
There is a value for an absolute match, dis = 0, or equivalently sim = 100%, but there is no explicit value for a total mismatch: dis = infinity, so sim = 0%. The similarity scale, on the other hand, has explicit boundaries of 0% - 100%.
Since the extreme values include 0 and infinity, a smarter conversion than a simple inversion is needed.
You can easily assign 1.0 (or 100% similarity) to the absolute match, but what you are going to take as the total mismatch is not clear. You can treat an arbitrarily high distance as 0.0 (since there is no big practical difference between, e.g., a distance of 10000 and 11000, I guess), and consider all distance values higher than this to be 0.0 as well.
To find what that value should be, I would suggest comparing two quite distinct images and using the distance between them as the 0.0 point.
Let's suppose that this value is disMax = 250.0; and simMax = 100.0;
then a simple approach could be:
double sim = simMax - simMax/disMax*dis;
which gives a similarity of 100.0 for a distance of 0 and 0.0 for a distance of 250. Distances larger than 250 would give negative similarity values, which should be treated as 0.0.
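Putting the pieces together, a rough sketch of such a conversion (the helper name distanceToSimilarity is just illustrative, and 250.0 is the arbitrary disMax discussed above):

public static double distanceToSimilarity(double dis, double disMax) {
    // Linear map: distance 0 -> 100%, distance disMax (or more) -> 0%.
    double simMax = 100.0;
    double sim = simMax - simMax / disMax * dis;
    return Math.max(sim, 0.0);
}

// Usage with the question's predict() call:
// personRecognizer.predict(mat, n, p);
// double likeness = distanceToSimilarity(p[0], 250.0);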
I read the Point set registration article and would like to implement it for my simple line matching. However, I have only very basic maths knowledge and cannot really understand the equations on the page.
Assume I am able to extract points from two images, search for the nearest pairs by brute-force looping, and get a list of pairs with their corresponding distances.
What is the next step to calculate a single index from the data obtained above?
The idea I have currently come up with is to simply average all the distances, but I believe there are much better approaches. Or should I capture more data for the calculation?
Your instincts are almost correct.
Generally, the metric is the sum of squared distances, with the goal of finding the least-squares fit (minimizing the sum of all the individual squared distances). Essentially this minimizes the standard deviation (actually it minimizes the variance, but with the same end effect).
So take all your corresponding pairs, calculate the squared distance between them (a fast calculation with no sqrt involved, faster than calculating actual distances), add them up, and the lower the better. If your point sets differ in count, you may wish to divide by the count to get a proper variance value.
This metric applies to pretty much any registration algorithm.
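A minimal sketch of that metric in Java (the array layout for the matched pairs is just an assumption about how you store your data):

public static double meanSquaredDistance(double[][] a, double[][] b) {
    // a[i] and b[i] are the i-th matched pair, each stored as {x, y}.
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
        double dx = a[i][0] - b[i][0];
        double dy = a[i][1] - b[i][1];
        sum += dx * dx + dy * dy;   // no sqrt needed for a least-squares score
    }
    return sum / a.length;          // divide by the count for a variance-like value
}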
By the way, if you already have a point correspondence and you know there is no scaling/skewing, you might also be interested in Horn's method, which is a closed-form (non-iterative) algorithm that just spits out the least-squares fit directly. It's very efficient.
(P.S. For a very simple explanation of why the variance is a better indicator than the mean distance, check out this page).
I am trying to build a simple system to recognize simple shapes using Fourier descriptors:
I am using this implementation of the Fast Fourier transform in my program (link below):
http://www.wikijava.org/wiki/The_Fast_Fourier_Transform_in_Java_%28part_1%29
fft(double[] inputReal, double[] inputImag, boolean direction)
The inputs are the real and imaginary parts (which are essentially the x, y coordinates of the boundary points I have),
and the outputs are the transformed real and imaginary numbers.
Question: How can I use the output (transformed real, imag) as invariant descriptors of my simple shapes?
This was what I thought:
calculate R = sqrt(real^2 + imag^2) for each of the N steps.
divide each R by R[1] (the normalization factor) to make it invariant.
The problem is that I get very different R values for slightly different images (such as ones with slight rotations applied, etc.).
In other words:
My descriptors are not invariant... I think I am doing something wrong with getting the R value.
There is some theory you need to know first about Fourier descriptors: it is an extremely interesting technique, but it should be applied correctly. What you want is invariance: invariance to rotation, translation, maybe even affine transforms. To allow a good comparison with other sets of Fourier descriptors, you should take the following things into consideration:
if you want invariance to translation, do not use the DC-term, that is the first element in your resulting array of Fourier coefficients
if you want invariance to scaling, make the comparison ratio-like, for example by dividing every Fourier coefficient by the DC coefficient: f*[1] = f[1]/f[0], f*[2] = f[2]/f[0], and so on.
if you want invariance to the start point of your contour, only use absolute values of the resulting Fourier coefficients.
Only the first 5 to 8 Fourier coefficients are useful when comparing the coefficients of two different objects; higher coefficients only describe the fine details of your contour, which mostly isn't very useful information (it's the global form that matters).
Let's say you have two objects and their Fourier descriptors. The resulting arrays of Fourier coefficients can be of different sizes, meaning that the 'frequency interval' of the resulting frequency content differs between the two shapes, and you can't compare apples with pears. Zero-pad your shorter contour to match the length of the longer contour, and then calculate the Fourier descriptors. Now there is a one-to-one analogy between coefficients and a good comparison is possible.
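A rough sketch of those steps, following your R = sqrt(real^2 + imag^2) idea together with the list above (drop the DC term, normalize, keep only the leading coefficients); the helper name fourierDescriptors is just illustrative:

public static double[] fourierDescriptors(double[] real, double[] imag, int keep) {
    // Magnitudes |F(k)|: removes the dependence on the contour start point.
    double[] mag = new double[real.length];
    for (int k = 0; k < real.length; k++) {
        mag[k] = Math.sqrt(real[k] * real[k] + imag[k] * imag[k]);
    }
    // Skip the DC term (index 0) for translation invariance and divide by
    // the first remaining magnitude for scale invariance.
    double norm = mag[1];
    double[] desc = new double[keep];       // keep roughly 5 to 8 coefficients
    for (int k = 0; k < keep; k++) {
        desc[k] = mag[k + 1] / norm;        // desc[0] is 1.0 by construction
    }
    return desc;
}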
Hope this helps. By the way, user-made FFT implementations are not to be trusted in my opinion; go for the solutions that established libraries provide. If you are working with images, OpenCV provides Fourier transform utilities.
If you are looking to match different shapes, try using different shape descriptors from MPEG-7 standard. You will probably need a classifier, take a look at SVM, Boosting, Neural Networks ...: http://docs.opencv.org/modules/ml/doc/ml.html
I am looking for an open source package (preferably Java but R or other languages would be ok too) that provides these 2 functions
1) points output_seq[] SCALE(points input_seq[], double factor)
In other words a sequence of doubles (x1,y1), (x2,y2)... is given as input that represents a graph (each point is connected to the next by a straight line) and a scaling factor is given. Then it returns a similar sequence as output. The catch is that the output sequence may have fewer or more elements than the input. For example, if I request magnification by a factor of 2.012 then the output sequence may have twice as many elements as the input. The scaling factor should be a double, not an integer.
Lastly, it's important to return the output sequence as points (doubles), I have very little interest in the actual drawing on a screen beyond proving that it does the right thing.
2) points output_seq[] ROTATE(points input_seq[], double angle)
same as above, except there is no scaling but just rotation; the angle covers a full turn (0 to 359.9999 degrees) and is given in radians.
The size of the output is always the same as the size of the input.
Again the emphasis is on getting the output sequence as doubles, not so much on the actual drawing on a screen.
If you know the right terminology I should have used then let me know.
Thank you so much.
In Java, Path2D is suitable for 2D floating point coordinates. The lineTo() method will add straight lines to the path. Because Path2D implements the Shape interface, rotate and scale are possible via createTransformedShape(). One approach to interpolation, using PathIterator, is shown here.
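A rough sketch of the rotation case, assuming the input sequence is stored as parallel x/y arrays (the scaling case is the same with AffineTransform.getScaleInstance, plus whatever resampling step you choose for changing the point count):

import java.awt.geom.AffineTransform;
import java.awt.geom.Path2D;
import java.awt.geom.PathIterator;
import java.util.ArrayList;
import java.util.List;

public class PolylineTransform {
    // Rotate a polyline given as (x[i], y[i]) pairs by 'angle' radians
    // about the origin; returns the transformed points as {x, y} pairs.
    public static List<double[]> rotate(double[] x, double[] y, double angle) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(x[0], y[0]);
        for (int i = 1; i < x.length; i++) {
            path.lineTo(x[i], y[i]);                  // straight segments
        }
        AffineTransform rot = AffineTransform.getRotateInstance(angle);
        PathIterator it = path.getPathIterator(rot);  // applies the rotation
        List<double[]> out = new ArrayList<>();
        double[] coords = new double[6];
        while (!it.isDone()) {
            it.currentSegment(coords);                // point is in coords[0..1]
            out.add(new double[] { coords[0], coords[1] });
            it.next();
        }
        return out;
    }
}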