Help with using FFT to determine frequency of an audio sample - java

I'm currently developing a percussion tutorial program. The program requires that I can determine which drum is being played. To do this, I was going to analyse the frequency of the drum recording and see if the frequency falls within a given range.
I have been using the Apache math commons implementation for FFT so far (http://commons.apache.org/math/), but my question is: once I perform the FFT, how do I use the array of results to calculate the frequencies contained in the signal?
Note: I have also tried experimenting with autocorrelation, but it didn't seem to work too well with samples from a drum kit.
Any help or alternative suggestions for how to determine which drum is being hit would be greatly appreciated.
Edit: Since writing this I've found a great online lesson on implementing FFT in Java for time/frequency transformations: Spectrum Analysis in Java.

In the area of music information retrieval, people often use a related representation known as mel-frequency cepstral coefficients (MFCCs).
For any N-sample segment of your signal, take the FFT. The resulting spectrum is then transformed into a set of MFCCs containing, say, 12 coefficients. This 12-element vector is used to classify the instrument, including which drum was hit.
To do supervised classification, you can use something like a support vector machine (SVM). LIBSVM is a commonly used library that has Java compatibility (and many other languages). You train the SVM with these MFCCs and their corresponding instrument labels. Then, you test it by feeding a query MFCC vector, and it will tell you which instrument it is.
So the basic procedure, in summary:
Get FFT.
Get MFCCs from FFT.
Train SVM with MFCCs and instrument labels.
Query the SVM with MFCCs of the query signal.
Check for Java packages that do these things. (They must exist; I just don't know them.) Relatively speaking, drum transcription is easier than transcription of most other instrument groups, so I am optimistic that this would work.
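Since LIBSVM's Java API is mentioned above, here is a minimal sketch of the train/query steps, assuming the MFCC extraction is already done elsewhere and each vector has 12 coefficients; the class and method names other than the libsvm ones are made up for illustration:

import libsvm.*;

public class DrumClassifier {

    // Convert one MFCC vector into LIBSVM's sparse node format.
    static svm_node[] toNodes(double[] mfcc) {
        svm_node[] nodes = new svm_node[mfcc.length];
        for (int j = 0; j < mfcc.length; j++) {
            nodes[j] = new svm_node();
            nodes[j].index = j + 1;   // LIBSVM feature indices are 1-based
            nodes[j].value = mfcc[j];
        }
        return nodes;
    }

    // Train on MFCC vectors with numeric labels (e.g. 0 = kick, 1 = snare, 2 = hi-hat).
    public static svm_model train(double[][] mfccs, double[] labels) {
        svm_problem prob = new svm_problem();
        prob.l = mfccs.length;
        prob.y = labels;
        prob.x = new svm_node[mfccs.length][];
        for (int i = 0; i < mfccs.length; i++) prob.x[i] = toNodes(mfccs[i]);

        svm_parameter param = new svm_parameter();
        param.svm_type = svm_parameter.C_SVC;
        param.kernel_type = svm_parameter.RBF;
        param.gamma = 1.0 / 12;       // a common default: 1 / number of features
        param.C = 1.0;
        param.cache_size = 100;
        param.eps = 1e-3;
        return svm.svm_train(prob, param);
    }

    // Returns the predicted label for a query MFCC vector.
    public static double classify(svm_model model, double[] mfcc) {
        return svm.svm_predict(model, toNodes(mfcc));
    }
}

The parameter values are just starting points; in practice you would tune C and gamma with cross-validation.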
For further reading, there are a whole bunch of articles on drum transcription.

When I made a program using a DFT, I had it create an array of frequencies and the amplitude at each frequency. I could then find the largest amplitudes and compare those to musical notes, getting a good grasp of what was played. If you know the approximate frequency of the drum, you should be able to do the same.
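To make that concrete, here is a minimal sketch using the Apache Commons Math FFT the asker already has; the key fact is that bin i of an N-point FFT corresponds to frequency i * sampleRate / N, and only the first N/2 bins are unique for real input:

import org.apache.commons.math3.complex.Complex;
import org.apache.commons.math3.transform.DftNormalization;
import org.apache.commons.math3.transform.FastFourierTransformer;
import org.apache.commons.math3.transform.TransformType;

public class PeakFrequency {

    // Returns the frequency (Hz) of the strongest FFT bin.
    // samples.length must be a power of two for this implementation.
    public static double peakHz(double[] samples, double sampleRate) {
        FastFourierTransformer fft =
                new FastFourierTransformer(DftNormalization.STANDARD);
        Complex[] spectrum = fft.transform(samples, TransformType.FORWARD);

        int n = samples.length;
        int peakBin = 1;                       // skip bin 0 (the DC offset)
        for (int i = 2; i < n / 2; i++) {
            if (spectrum[i].abs() > spectrum[peakBin].abs()) peakBin = i;
        }
        return peakBin * sampleRate / n;       // bin index -> frequency in Hz
    }
}

For a drum you would compare amplitudes across a range of bins rather than rely on one peak, since drums are not strongly pitched.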

Related

How to recognize the sound frequency from a .WAV file in Java

AudioInputStream stream = AudioSystem.getAudioInputStream(new File("file_a4.wav"));
I am looking for a way to recognise the frequency of a musical-scale sound (e.g. A4 = 440 Hz) recorded in a .wav file. I have read a lot about FFT, but it has been suggested that the frequencies of the musical scale do not line up exactly with the FFT's frequency bins.
I have also heard about DTFT. What should I use to recognise the frequency from a sound file?
What I understand from your question is that you want to recognize the musical note(s) an instrument is playing in a wav file. If that is the case, there are several algorithms for doing that, and you could always train a neural network to do it too.
Some important things to take into account:
Any instrument (the same goes for the musical sounds produced by the human voice) has its own particular "color" when producing a note. This color is called the timbre (https://en.wikipedia.org/wiki/Timbre), and it is composed of the harmonic and inharmonic frequencies that surround the frequency you psychoacoustically perceive when listening to that specific note. This is why you cannot just look for the peak of an FFT to detect the musical note, and it is also the reason why a piano sounds different from a guitar when playing the same note.
The analysis of an audio signal is often performed by windowing the signal and calculating the DFT of the windowed part. Each window produces its own spectrum, and it is from the analysis of each individual spectrum and/or the analysis of how they interact that you (or your CNN, for example) will draw your conclusions/results. This process of windowing the signal and calculating the DFTs produces a spectrogram (https://en.wikipedia.org/wiki/Spectrogram).
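As a small illustration of point 2, here is a hedged sketch of the framing step: split the signal into overlapping Hann-windowed frames, each of which would then be FFT'd to build the spectrogram (the function name and parameters are made up for illustration):

// Split a signal into overlapping, Hann-windowed frames of `size` samples,
// advancing by `hop` samples each time. Each frame is then ready to be FFT'd.
static double[][] frames(double[] x, int size, int hop) {
    int count = (x.length - size) / hop + 1;
    double[][] out = new double[count][size];
    for (int f = 0; f < count; f++) {
        for (int i = 0; i < size; i++) {
            double hann = 0.5 * (1 - Math.cos(2 * Math.PI * i / (size - 1)));
            out[f][i] = x[f * hop + i] * hann;  // windowing reduces spectral leakage
        }
    }
    return out;
}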
After that short introduction, here are some simple algorithms for identifying single notes in a wav file. You will be able to find implementations of these algorithms, and many others, on the internet. The detection of notes produced by chords is more complex but can be done with other algorithms or neural networks.
On the use of autocorrelation analysis for pitch detection: https://ieeexplore.ieee.org/document/1162905
YIN algorithm: http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf
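For reference, here is a minimal sketch of the naive autocorrelation approach from the first paper, assuming one analysis window of raw samples (YIN refines this same idea with a difference function and normalization):

// Naive autocorrelation pitch estimate over one window of samples.
// Assumes the window is longer than sampleRate / minHz samples.
static double estimatePitchHz(double[] x, double sampleRate, double minHz, double maxHz) {
    int minLag = (int) (sampleRate / maxHz);
    int maxLag = (int) (sampleRate / minHz);
    int bestLag = minLag;
    double best = Double.NEGATIVE_INFINITY;
    for (int lag = minLag; lag <= maxLag; lag++) {
        double sum = 0;
        for (int i = 0; i + lag < x.length; i++) {
            sum += x[i] * x[i + lag];   // similarity of the signal with itself at this lag
        }
        if (sum > best) { best = sum; bestLag = lag; }
    }
    return sampleRate / bestLag;        // strongest lag -> fundamental frequency
}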

Distinguish notes FFT algorithm

What I am trying to achieve is distinguishing separate notes in an audio file. For simplicity, let's say that a couple of notes are played one after another. The main question is: how do I determine when the next note is played?
What I have already done is reading samples from an audio file and doing Fourier Transform on those samples using JTransforms library. Here's what I get:
[chart of the raw FFT output]
Then I have calculated spectrum based on the data that FFT returned, and here's what I get:
[chart of the calculated spectrum]
As I understand it, the bigger "columns" on the chart are the harmonics, and the small ones are noise and other non-harmonic overtones, right?
After that I tried the same process with an audio file with two notes played one after another, but the result was much the same.
As a side question, do any of you know of lightweight and fast libraries for visualizing such data? Using JFreeChart on bigger datasets is a real pain for my processor.
To detect successive frequency bursts of different frequencies, and some of their time-domain parameters, one can use overlapping short FFT windows (shorter than the expected burst length) and look for where the relative magnitudes of frequency peaks swap order, or rise above/fall below thresholds. If you know, a priori, the frequencies involved, you can use Goertzel filters instead of FFTs, with sliding windows or successive approximation in time for finer time-domain granularity.
For pitched notes (such as music), one can do something similar, except using a pitch detection/estimation method (instead of simple FFT magnitudes, which are not reliable) on sufficiently short time domain windows of data.

Activity Recognition - Dimension reduction for continuous HMMs

I am a novice at HMMs but I have tried to build a code using Jahmm for the UCI Human Activity Recognition data set. The data set has 561 features and 7352 rows, and also includes the xyz inertial values of both the accelerometer and gyroscope, and it is mainly for recognizing 6 activities: Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, and Laying. The data is normalized [-1,1], but not z-scaled. I can only get decent results after scaling (scale() function in R). After scaling, I have tried PCA, correlation of 90+%, and randomForest importance measures with mtry=8 for dimensional reduction, but so far, randomForest was the only one that seemed to work, but results are still quite low (80%). Also, sometimes, some activities give NaN values when run on the Jahmm code.
According to what I've read about HMMs so far, these results are too low. Should I do more preprocessing before I use the said dimension reduction techniques? Is there a particular dimension reduction technique that is compatible with HMMs? Am I overfitting? Or am I better off making the model discrete instead of continuous? I really have to do activity recognition and HMMs for my project. I would be glad to get suggestions/feedback from people who have already tried Jahmm and R for continuous HMMs. It would also be great if someone could suggest a package/library that uses log probabilities and produces a Viterbi sequence from a fitted HMM given a new set of test data.

Detect specific sound in audio file [duplicate]

How can the tempo/BPM of a song be determined programmatically? What algorithms are commonly used, and what considerations must be made?
This is challenging to explain in a single StackOverflow post. In general, the simplest beat-detection algorithms work by locating peaks in sound energy, which are easy to detect. More sophisticated methods use comb filters and other statistical/waveform methods. For a detailed explanation including code samples, check out this GameDev article.
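As a rough illustration of the "peaks in sound energy" idea (a sketch, not the GameDev article's exact code), here is a block-energy detector; the window and threshold values are placeholders to tune:

// Flag sample offsets where a block's energy exceeds `threshold` times the
// running average energy - the crudest form of beat detection.
static java.util.List<Integer> energyBeats(double[] samples, int window, double threshold) {
    java.util.List<Integer> beats = new java.util.ArrayList<>();
    double avg = 0;
    for (int start = 0; start + window <= samples.length; start += window) {
        double e = 0;
        for (int i = start; i < start + window; i++) e += samples[i] * samples[i];
        if (avg > 0 && e > threshold * avg) beats.add(start);  // sudden energy burst
        avg = 0.9 * avg + 0.1 * e;                             // slow-moving average
    }
    return beats;
}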
The keywords to search for are "Beat Detection", "Beat Tracking" and "Music Information Retrieval". There is lots of information here: http://www.music-ir.org/
There is a (maybe) annual contest called MIREX where different algorithms are tested on their beat detection performance.
http://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/mck/
That should give you a list of algorithms to test.
A classic algorithm is Beatroot (google it), which is nice and easy to understand. It works like this:
Short-time FFT the music to get a sonogram.
Sum the increases in magnitude over all frequencies for each time step (ignore the decreases). This gives you a 1D time-varying function called the "spectral flux" (see the sketch after these steps).
Find the peaks using any old peak detection algorithm. These are called "onsets" and correspond to the start of sounds in the music (starts of notes, drum hits, etc).
Construct a histogram of inter-onset-intervals (IOIs). This can be used to find likely tempos.
Initialise a set of "agents" or "hypotheses" for the beat-tracking result. Feed these agents the onsets one at a time in order. Each agent tracks the list of onsets that are also beats, and the current tempo estimate. The agents can either accept the onsets, if they fit closely with their last tracked beat and tempo, ignore them if they are wildly different, or spawn a new agent if they are in-between. Not every beat requires an onset - agents can interpolate.
Each agent is given a score according to how neat its hypothesis is - if all its beat onsets are loud it gets a higher score. If they are all regular it gets a higher score.
The highest scoring agent is the answer.
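Here is a minimal sketch of the spectral flux step (step 2), assuming you already have a sequence of FFT magnitude frames from step 1:

// Spectral flux: for each time step, sum the increases in magnitude across all
// frequency bins relative to the previous frame, ignoring decreases.
static double[] spectralFlux(double[][] magFrames) {
    double[] flux = new double[magFrames.length];
    for (int t = 1; t < magFrames.length; t++) {
        double sum = 0;
        for (int k = 0; k < magFrames[t].length; k++) {
            double diff = magFrames[t][k] - magFrames[t - 1][k];
            if (diff > 0) sum += diff;   // onsets show up as broadband magnitude increases
        }
        flux[t] = sum;
    }
    return flux;
}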
Downsides to this algorithm in my experience:
The peak-detection is rather ad-hoc and sensitive to threshold parameters and whatnot.
Some music doesn't have obvious onsets on the beats. Obviously it won't work with those.
Difficult to know how to resolve the 60bpm-vs-120bpm issue, especially with live tracking!
Throws away a lot of information by only using a 1D spectral flux. I reckon you can do much better by having a few band-limited spectral fluxes (and maybe one broadband one for drums).
Here is a demo of a live version of this algorithm, showing the spectral flux (black line at the bottom) and onsets (green circles). It's worth considering the fact that the beat is extracted from only the green circles. I've played back the onsets just as clicks, and to be honest I don't think I could hear the beat from them, so in some ways this algorithm is better than people at beat detection. I think the reduction to such a low-dimensional signal is its weak step though.
Annoyingly I did find a very good site with many algorithms and code for beat detection a few years ago. I've totally failed to refind it though.
Edit: Found it!
Here are some great links that should get you started:
http://marsyasweb.appspot.com/
http://www.vamp-plugins.org/download.html
Beat extraction involves the identification of cognitive metric structures in music. Very often these do not correspond to physical sound energy - for example, in most music there is a level of syncopation, which means that the "foot-tapping" beat that we perceive does not correspond to the presence of a physical sound. This means that this is quite a different field from onset detection, which is the detection of the physical sounds and is performed in a different way.
You could try the Aubio library, which is a plain C library offering both onset and beat extraction tools.
There is also the online Echonest API, although this involves uploading an MP3 to a website and retrieving XML, so it might not be so suitable.
EDIT: I came across this last night - a very promising looking C/C++ library, although I haven't used it myself. Vamp Plugins
The general area of research you are interested in is called MUSIC INFORMATION RETRIEVAL
There are many different algorithms that do this, but they are all fundamentally centered around ONSET DETECTION.
Onset detection measures the start of an event; the event in this case is a note being played. You can look for changes in the weighted Fourier transform (high frequency content), or you can look for large changes in spectral content (spectral difference). (There are a couple of papers further down that I recommend you look into.) Once you apply an onset detection algorithm, you pick off where the beats are via thresholding.
There are various algorithms that you can use once you've got that time localization of the beat. You can turn it into a pulse train (create a signal that is zero for all time and 1 only when your beat happens), then apply an FFT to that, and BAM, now you have the frequency of onsets at the largest peak.
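A hedged sketch of that last idea: since the interesting tempo range is tiny, you can evaluate the DFT of the onset pulse train directly at candidate tempos instead of running a full FFT (the 40-240 BPM scan range is an assumption):

// Estimate tempo by evaluating the pulse train's DFT magnitude at candidate BPMs.
// Each onset contributes a unit impulse at its time in seconds.
static double tempoBpm(double[] onsetTimesSec) {
    double bestBpm = 0, bestMag = -1;
    for (double bpm = 40; bpm <= 240; bpm += 0.5) {
        double hz = bpm / 60.0;
        double re = 0, im = 0;
        for (double t : onsetTimesSec) {
            re += Math.cos(2 * Math.PI * hz * t);
            im += Math.sin(2 * Math.PI * hz * t);
        }
        double mag = Math.hypot(re, im);         // large when onsets line up at this rate
        if (mag > bestMag) { bestMag = mag; bestBpm = bpm; }
    }
    return bestBpm;                              // beware: harmonics (2x, 0.5x) also peak
}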
Here are some papers to lead you in the right direction:
https://web.archive.org/web/20120310151026/http://www.elec.qmul.ac.uk/people/juan/Documents/Bello-TSAP-2005.pdf
https://adamhess.github.io/Onset_Detection_Nov302011.pdf
Here is an extension to what some people are discussing:
Someone mentioned looking into applying a machine learning algorithm: basically, collect a bunch of features from the onset detection functions (mentioned above), combine them with the raw signal in a neural network/logistic regression, and learn what makes a beat a beat.
Look into Dr. Andrew Ng; he has free machine learning lectures from Stanford University online (not the long-winded video lectures - there is actually an online distance course).
If you can manage to interface with Python code in your project, the Echo Nest Remix API is a pretty slick API for Python:
There's a method analysis.tempo which will give you the BPM. It can do a whole lot more than simple BPM, as you can see from the API docs or this tutorial.
Perform a Fourier transform, and find peaks in the power spectrum. You're looking for peaks below the 20 Hz cutoff for human hearing. I'd guess typically in the 0.1-5ish Hz range to be generous.
SO question that might help: Bpm audio detection Library
Also, here is one of several "peak finding" questions on SO: Peak detection of measured signal
Edit: Not that I do audio processing. It's just a guess based on the fact that you're looking for a frequency domain property of the file...
another edit: It is worth noting that lossy compression formats like MP3 store Fourier-domain data rather than time-domain data in the first place. With a little cleverness, you can save yourself some heavy computation... but see the thoughtful comment by cobbal.
To repost my answer: The easy way to do it is to have the user tap a button in rhythm with the beat, and count the number of taps divided by the time.
Others have already described some beat-detection methods. I want to add that there are some libraries available that provide techniques and algorithms for this sort of task.
Aubio is one of them; it has a good reputation and it's written in C with a C++ wrapper, so you can integrate it easily with a Cocoa application (all the audio stuff in Apple's frameworks is also written in C/C++).
There are several methods to get the BPM but the one I find the most effective is the "beat spectrum" (described here).
This algorithm computes a similarity matrix by comparing each short sample of the music with every other. Once the similarity matrix is computed, it is possible to get the average similarity between each pair of samples {S(T); S(T+1)} for each time interval T: this is the beat spectrum. The first high peak in the beat spectrum is, most of the time, the beat duration. The best part is that you can also use it for things like music structure or rhythm analysis.
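For the shape of the idea (a sketch, not the cited paper's exact formulation), the beat spectrum can be read off the diagonals of a cosine-similarity comparison over short feature frames:

// Beat spectrum sketch: average similarity along each diagonal (lag) of a
// frame-by-frame cosine-similarity comparison. The first strong peak at
// lag > 0 approximates the beat period in frames.
static double[] beatSpectrum(double[][] frames) {
    int n = frames.length;
    double[] spectrum = new double[n];
    for (int lag = 0; lag < n; lag++) {
        double sum = 0;
        for (int i = 0; i + lag < n; i++) sum += cosine(frames[i], frames[i + lag]);
        spectrum[lag] = sum / (n - lag);
    }
    return spectrum;
}

static double cosine(double[] a, double[] b) {
    double dot = 0, na = 0, nb = 0;
    for (int k = 0; k < a.length; k++) {
        dot += a[k] * b[k]; na += a[k] * a[k]; nb += b[k] * b[k];
    }
    return dot / (Math.sqrt(na * nb) + 1e-12);   // guard against all-zero frames
}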
I'd imagine this will be easiest in 4-4 dance music, as there should be a single low frequency thud about twice a second.

Record audio in Java and determine in real time if a tone of frequency x was played; if so, do something

I want to be able to detect a tone of a predetermined frequency using Java. What I am doing is playing a tone (the frequency of the tone is variable by user input) and trying to detect whether the tone is of a certain frequency. If it is, I execute a certain method. From what I have read I will need to use FFT, but I'm not sure how to implement it in Java. There seems to be a lot of documentation on how to do it, but what documentation there is involves looking at an audio file rather than real-time analysis. I don't need to save the audio to a file, just determine if and when a tone of frequency x was recorded.
Ideally I would like to record at a sample rate of 44 kHz and, after determining if a tone was detected, determine when the tone was detected with an accuracy of +-3 ms. However, a lower accuracy would be acceptable as long as it isn't ridiculous (i.e. +100 ms). I know roughly what I need to do from what I have looked up, but I need help tying it all together. In pseudocode it would look roughly like this (I think):
Note that I know roughly, to within +-1 s, when a tone of satisfying frequency may be detected.
for (i = 0; i < 440000 * 2; i++) { // *2 because of expected appearance interval; may change
    record sound sample
    fft(sound sample)
    if (frequencySoundSample > x) {
        do something
        return
    }
}
There will be considerable background noise while the tone is played. However, the tone will have a very high frequency, like 15-22 kHz, so it is my belief that by simply looking for when the recorder detects a very high frequency I can be sure it is my tone (also, the tone will be played at a high amplitude for maybe 0.5 s or 1 s). I know that there will not be other high-frequency sounds as background noise (I am expecting a background frequency high of maybe 5 kHz).
I have two questions, then. Is the pseudocode that I have provided sufficient for what I want to do? If it isn't, or if there is a better way of doing this, I'm all for it. Second, how would I implement this in Java? I understand what I need to do, but I'm having trouble tying it all together. I'm pretty decent with Java, but I'm not familiar with the syntax involved with audio and I don't have any experience with FFT. Please be explicit and give code with comments. I've been trying to figure this out for a while; I just need to see it all tied together. Thank you.
EDIT
I understand that using a for loop like the one I have will not produce the frequency that I want. It was more to show roughly what I want: recording, performing the FFT, and testing the frequency all at once as time progresses.
If you're just looking for a specific frequency then an FFT-based method is probably a bad choice for your particular application, for two reasons:
it's overkill - you're computing an entire spectrum just to detect the magnitude at one point
to get 3 ms resolution for your onset detection you'll need a large overlap between successive FFTs, which will require much more CPU bandwidth than just processing successive blocks of samples
A better choice for detecting the presence or absence of a single tone is the Goertzel algorithm (aka Goertzel filter). It's effectively a DFT evaluated at a single frequency domain bin, and is widely used for tone detection. It's much less computationally expensive than an FFT, very simple to implement, and you can test its output on every sample, so no resolution problem (other than those dictated by the laws of physics). You'll need to low pass filter the magnitude of the output and then use some kind of threshold detection to determine the onset time of your tone.
Note that there are a number of useful questions and answers on SO already about tone detection and using the Goertzel algorithm (e.g. Precise tone onset/duration measurement?) - I suggest reading these along with the Wikipedia entry as a good starting point.
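For reference, the Goertzel filter itself is only a few lines. This is a minimal sketch that returns the squared magnitude at one target frequency for one block of samples; you would run it on successive blocks and threshold the result as described above:

// Goertzel filter: squared magnitude of one frequency bin over a block of samples.
static double goertzelPower(double[] block, double targetHz, double sampleRate) {
    double w = 2 * Math.PI * targetHz / sampleRate;
    double coeff = 2 * Math.cos(w);
    double s1 = 0, s2 = 0;
    for (double x : block) {
        double s0 = x + coeff * s1 - s2;   // second-order recurrence
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

With 44.1 kHz input, a block of about 128 samples gives you a decision roughly every 3 ms, which matches the asker's timing requirement.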
I'm actually working on a similar project with pitch detection, in Java as well. If you want to use FFT, you can do it with these steps. Java has a lot of libraries that can make this process easy for you.
First, you need to read in the sound file. This can be done using Java Sound. It's a built-in library with functions that make it easy to record sound. Examples can be found here. The default sample rate is 44,100 Hz (CD quality). These examples can get you from playing the actual tone to a double array of bytes representing the tone.
Second, you should take the FFT with JTransforms. Here is an example of FFT being taken on a collection of samples.
FFT gives you an array twice the length of the array of samples you passed it. You need to go through the FFT array in twos, since each pair of elements holds the imaginary and real parts of one bin. Compute the magnitude of each bin with sqrt(im^2 + re^2). Then find the largest magnitude; the index of that magnitude corresponds to the frequency you're looking for.
Keep in mind, you don't take the FFT of the entire sound. You break the sound up into chunks and FFT each one. The chunks can overlap for higher accuracy, but that shouldn't be a problem, since you're just looking for a predetermined note. To reduce spectral leakage, you can also window each chunk before doing this.
Once you have all the FFTs, they should agree on a certain frequency, and you can check that against the note you want (see the sketch below).
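Putting those steps together, here is a minimal sketch assuming the current JTransforms package (org.jtransforms); realForwardFull fills the interleaved {re, im} array of twice the chunk length described above:

import org.jtransforms.fft.DoubleFFT_1D;

// FFT one chunk of samples and return the frequency (Hz) of the strongest bin.
static double peakFrequency(double[] chunk, double sampleRate) {
    int n = chunk.length;
    double[] a = new double[2 * n];            // output is twice the input length
    System.arraycopy(chunk, 0, a, 0, n);
    new DoubleFFT_1D(n).realForwardFull(a);

    int peakBin = 1;                           // skip bin 0 (DC)
    double peakMag = 0;
    for (int k = 1; k < n / 2; k++) {          // only the first half is unique
        double mag = Math.hypot(a[2 * k], a[2 * k + 1]);  // sqrt(re^2 + im^2)
        if (mag > peakMag) { peakMag = mag; peakBin = k; }
    }
    return peakBin * sampleRate / n;           // bin index -> Hz
}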
If you want to try and visualize this, I'd suggest using JFreeChart. It's another library that makes it easy to graph things.
