I am developing a system as an aid to musicians performing transcription. The aim is to perform automatic music transcription (it does not have to be perfect, as the user will correct glitches / mistakes later) on a single instrument monophonic recording. Does anyone here have experience in automatic music transcription? Or digital signal processing in general? Help from anyone is greatly appreciated no matter what your background.
So far I have investigated the use of the Fast Fourier Transform for pitch detection, and a number of tests in both MATLAB and my own Java test programs have shown it to be fast and accurate enough for my needs. Another element of the task that will need to be tackled is the display of the produced MIDI data in sheet music form, but this is something I am not concerned with right now.
In brief, what I am looking for is a good method for note onset detection, i.e. the position in the signal where a new note begins. As slow onsets can be quite difficult to detect properly, I will initially be using the system with piano recordings. This is also partially due to the fact I play piano and should be in a better position to obtain suitable recordings for testing. As stated above, early versions of this system will be used for simple monophonic recordings, possibly progressing later to more complex input depending on progress made in the coming weeks.
Here is a graphic that illustrates the threshold approach to note onset detection:
This image shows a typical WAV file with three discrete notes played in succession. The red line represents a chosen signal threshold, and the blue lines represent note start positions returned by a simple algorithm that marks a start when the signal level crosses the threshold.
As the image shows, selecting a proper absolute threshold is difficult. In this case, the first note is picked up fine, the second note is missed completely, and the third note is (barely) picked up, but its start is marked very late. In general, a low threshold causes you to pick up phantom notes, while raising it causes you to miss notes. One solution to this problem is to use a relative threshold that triggers a start if the signal increases by a certain percentage over a certain time, but this has problems of its own.
A simpler solution is to use the somewhat-counterintuitively named compression (not MP3 compression - that's something else entirely) on your wave file first. Compression essentially flattens the spikes in your audio data and then amplifies everything so that more of the audio is near the maximum values. The effect on the above sample would look like this (which shows why the name "compression" appears to make no sense - on audio equipment it's usually labelled "loudness"):
After compression, the absolute threshold approach will work much better (although it's easy to over-compress and start picking up fictional note starts, the same effect as lowering the threshold). There are a lot of wave editors out there that do a good job of compression, and it's better to let them handle this task - you'll probably need to do a fair amount of work "cleaning up" your wave files before detecting notes in them anyway.
In coding terms, a WAV file loaded into memory is essentially just an array of two-byte integers, where 0 represents no signal and 32,767 and -32,768 represent the peaks. In its simplest form, a threshold detection algorithm would just start at the first sample and read through the array until it finds a value greater than the threshold.
short threshold = 10000;
for (int i = 0; i < samples.Length; i++)
{
    // widen to int before Abs: Math.Abs on a short equal to -32768 throws an OverflowException
    if (Math.Abs((int)samples[i]) > threshold)
    {
        // here is one note onset point
    }
}
In practice this works horribly, since normal audio has all sorts of transient spikes above a given threshold. One solution is to use a running average signal strength (i.e. don't mark a start until the average of the last n samples is above the threshold).
short threshold = 10000;
int window_length = 100;
int running_total = 0;
// tally up the first window_length samples (absolute values, since
// the raw signed samples average out to roughly zero)
for (int i = 0; i < window_length; i++)
{
    running_total += Math.Abs((int)samples[i]);
}
// calculate moving average
for (int i = window_length; i < samples.Length; i++)
{
    // remove oldest sample and add current
    running_total -= Math.Abs((int)samples[i - window_length]);
    running_total += Math.Abs((int)samples[i]);
    int moving_average = running_total / window_length;
    if (moving_average > threshold)
    {
        // here is one note onset point
        int onset_point = i - (window_length / 2);
    }
}
All of this requires much tweaking and playing around with settings to get it to find the start positions of a WAV file accurately, and usually what works for one file will not work very well on another. This is a very difficult and not-perfectly-solved problem domain you've chosen, but I think it's cool that you're tackling it.
Update: this graphic shows a detail of note detection I left out, namely detecting when the note ends:
The yellow line represents the off-threshold. Once the algorithm has detected a note start, it assumes the note continues until the running average signal strength drops below this value (shown here by the purple lines). This is, of course, another source of difficulties, as is the case where two or more notes overlap (polyphony).
Once you've detected the start and stop points of each note, you can now analyze each slice of WAV file data to determine the pitches.
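The on-threshold / off-threshold idea described above can be sketched roughly as follows. This is only an illustration in Java, not the exact algorithm behind the graphics: the class name `NoteSegmenter`, the window length and both threshold values are assumptions, and the running average is taken over absolute sample values.

```java
import java.util.ArrayList;
import java.util.List;

final class NoteSegmenter {
    /** Returns {start, end} sample-index pairs (end exclusive) for each detected note. */
    static List<int[]> segment(short[] samples, int window, int onThreshold, int offThreshold) {
        List<int[]> notes = new ArrayList<>();
        long runningTotal = 0;
        int start = -1; // -1 means "no note currently sounding"
        for (int i = 0; i < samples.length; i++) {
            // keep a running sum of the absolute values of the last `window` samples
            runningTotal += Math.abs((int) samples[i]);
            if (i >= window) runningTotal -= Math.abs((int) samples[i - window]);
            if (i < window - 1) continue; // window not full yet
            long average = runningTotal / window;
            if (start < 0 && average > onThreshold) {
                start = i - window / 2;                    // note start
            } else if (start >= 0 && average < offThreshold) {
                notes.add(new int[] {start, i - window / 2}); // note end
                start = -1;
            }
        }
        if (start >= 0) notes.add(new int[] {start, samples.length}); // note still sounding at EOF
        return notes;
    }

    public static void main(String[] args) {
        short[] samples = new short[3000];
        for (int i = 1000; i < 2000; i++) samples[i] = (short) (i % 2 == 0 ? 20000 : -20000);
        for (int[] note : segment(samples, 100, 10000, 5000)) {
            System.out.println("note from sample " + note[0] + " to " + note[1]);
        }
    }
}
```

Keeping the off-threshold below the on-threshold gives the detector some hysteresis, so a note whose level wobbles around a single threshold is not split into many short notes.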
Update 2: I just read your updated question. Pitch-detection through auto-correlation is much easier to implement than FFT if you're writing your own from scratch, but if you've already checked out and used a pre-built FFT library, you're better off using it for sure. Once you've identified the start and stop positions of each note (and included some padding at the beginning and end for the missed attack and release portions), you can now pull out each slice of audio data and pass it to an FFT function to determine the pitch.
One important point here is not to use a slice of the compressed audio data, but rather to use a slice of the original, unmodified data. The compression process distorts the audio and may produce an inaccurate pitch reading.
One last point about note attack times is that it may be less of a problem than you think. Often in music an instrument with a slow attack (like a soft synth) will begin a note earlier than a sharp attack instrument (like a piano) and both notes will sound as if they're starting at the same time. If you're playing instruments in this manner, the algorithm will pick up the same start time for both kinds of instruments, which is good from a WAV-to-MIDI perspective.
Last update (I hope): Forget what I said about including some padding samples from the early attack part of each note - I forgot this is actually a bad idea for pitch detection. The attack portions of many instruments (especially piano and other percussive-type instruments) contain transients that aren't multiples of the fundamental pitch, and will tend to screw up pitch detection. You actually want to start each slice a little after the attack for this reason.
Oh, and kind of important: the term "compression" here does not refer to MP3-style compression.
Update again: here is a simple function that does non-dynamic compression:
public void StaticCompress(short[] samples, float param)
{
    for (int i = 0; i < samples.Length; i++)
    {
        int sign = (samples[i] < 0) ? -1 : 1;
        float norm = ABS(samples[i] / 32768.0); // NOT short.MaxValue; must be a floating-point division
        norm = 1.0 - POW(1.0 - norm, param);
        samples[i] = 32768 * norm * sign;
    }
}
When param = 1.0, this function will have no effect on the audio. Larger param values (2.0 is good, which will square the normalized difference between each sample and the max peak value) will produce more compression and a louder overall (but crappy) sound. Values under 1.0 will produce an expansion effect.
One other probably obvious point: you should record the music in a small, non-echoic room since echoes are often picked up by this algorithm as phantom notes.
Update: here is a version of StaticCompress that will compile in C# and explicitly casts everything. This returns the expected result:
public void StaticCompress(short[] samples, double param)
{
    for (int i = 0; i < samples.Length; i++)
    {
        Compress(ref samples[i], param);
    }
}
public void Compress(ref short orig, double param)
{
    double sign = 1;
    if (orig < 0)
    {
        sign = -1;
    }
    // 32768 is max abs value of a short. best practice is to
    // pre-normalize data or use peak value in place of 32768
    double norm = Math.Abs((double)orig / 32768.0);
    norm = 1.0 - Math.Pow(1.0 - norm, param);
    // round, then clamp to the short range (+32768 would overflow)
    double scaled = Math.Round(32768.0 * norm * sign);
    orig = (short)Math.Max(short.MinValue, Math.Min(short.MaxValue, scaled));
}
Sorry, my knowledge score on Matlab is 0. If you posted another question on why your Matlab function doesn't work as expected it would get answered (just not by me).
What you want to do is often called WAV-to-MIDI (google "wav-to-midi"). There have been many attempts at this process, with varying results (note onset is one of the difficulties; polyphony is much harder to deal with). I'd recommend starting with a thorough search of the off-the-shelf solutions, and only start work on your own if there's nothing acceptable out there.
The other part of the process you'd need is something to render the MIDI output as a traditional musical score, but there are umpteen billion products that do that.
Another answer is: yes, I've done a lot of digital signal processing (see the software on my website - it's an infinite-voice software synthesizer written in VB and C), and I'm interested in helping you with this problem. The WAV-to-MIDI part isn't really that difficult conceptually, it's just making it work reliably in practice that's hard. Note onset is just setting a threshold - errors can be easily adjusted forward or backward in time to compensate for note attack differences. Pitch detection is much easier to do on a recording than it is to do in real time, and involves just implementing an auto-correlation routine.
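For what it's worth, an auto-correlation routine of the kind mentioned above can be quite short. The following is a minimal sketch in Java, not code from the synthesizer mentioned in this answer: the class name `AutocorrelationPitch`, the search range parameters and the choice of a plain (unnormalised) autocorrelation are all assumptions, and it expects one note slice taken a little after the attack.

```java
final class AutocorrelationPitch {
    /** Estimates the fundamental frequency in Hz, searching between minHz and maxHz. */
    static double estimateHz(double[] x, double sampleRate, double minHz, double maxHz) {
        int minLag = Math.max(1, (int) Math.floor(sampleRate / maxHz));
        int maxLag = Math.min((int) Math.ceil(sampleRate / minHz), x.length - 1);
        int bestLag = -1;
        double best = 0.0;
        for (int lag = minLag; lag <= maxLag; lag++) {
            // correlate the slice with a copy of itself shifted by `lag` samples
            double sum = 0.0;
            for (int i = 0; i + lag < x.length; i++) {
                sum += x[i] * x[i + lag];
            }
            // the strongest positive correlation marks one period of the waveform
            if (sum > best) {
                best = sum;
                bestLag = lag;
            }
        }
        return bestLag < 0 ? Double.NaN : sampleRate / bestLag;
    }

    public static void main(String[] args) {
        double sampleRate = 44100.0;
        double[] slice = new double[4100];
        for (int i = 0; i < slice.length; i++) {
            slice[i] = Math.sin(2.0 * Math.PI * 441.0 * i / sampleRate);
        }
        System.out.println("estimated pitch (Hz): " + estimateHz(slice, sampleRate, 100.0, 1000.0));
    }
}
```

Because the lag is a whole number of samples, the estimate is quantised (more coarsely for high notes); interpolating around the best lag is a common refinement that is left out here.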
You should look at MIRToolbox - it is written for Matlab, and has an onset detector built in - it works pretty well. The source code is GPL'd, so you can implement the algorithm in whatever language works for you. What language is your production code going to use?
This library is centered around audio labeling:
aubio
aubio is a library for audio labelling. Its features include segmenting a sound file before each of its attacks, performing pitch detection, tapping the beat and producing midi streams from live audio. The name aubio comes from 'audio' with a typo: several transcription errors are likely to be found in the results too.
I have had good luck with it for onset detection and pitch detection. It's written in C, but there are SWIG/Python wrappers.
Also, the author of the library has a PDF of his thesis on the page, which has great info and background about labeling.
Hard onsets are easily detected in the time domain by using an average energy measurement.
E = SUM for n = 0 to N-1 of x[n]^2
Do this with chunks of the entire signal. You should see peaks when onsets occur (the window size is up to you, my suggestion is 50ms or more).
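As a rough Java sketch of the chunked energy measurement above: `chunkEnergies` is the sum of squares per chunk, while the peak-picking rule in `onsetChunks` (energy must exceed a floor and jump by a given ratio over the previous chunk) is an assumption added for illustration, as are all names and parameter values.

```java
import java.util.ArrayList;
import java.util.List;

final class EnergyOnsets {
    /** Sum of squared samples for each non-overlapping chunk of chunkSize samples. */
    static double[] chunkEnergies(short[] samples, int chunkSize) {
        int chunks = samples.length / chunkSize;
        double[] energy = new double[chunks];
        for (int c = 0; c < chunks; c++) {
            double sum = 0.0;
            for (int i = c * chunkSize; i < (c + 1) * chunkSize; i++) {
                sum += (double) samples[i] * samples[i];
            }
            energy[c] = sum;
        }
        return energy;
    }

    /** Flags chunk c when its energy is above `floor` and more than `ratio` times the previous chunk's. */
    static List<Integer> onsetChunks(double[] energy, double ratio, double floor) {
        List<Integer> onsets = new ArrayList<>();
        for (int c = 1; c < energy.length; c++) {
            if (energy[c] > floor && energy[c] > ratio * energy[c - 1]) {
                onsets.add(c);
            }
        }
        return onsets;
    }

    public static void main(String[] args) {
        int chunk = 2205; // 50 ms at 44.1 kHz
        short[] samples = new short[chunk * 4];
        for (int i = 2 * chunk; i < 3 * chunk; i++) samples[i] = (short) (i % 2 == 0 ? 10000 : -10000);
        System.out.println("onset chunks: " + onsetChunks(chunkEnergies(samples, chunk), 2.0, 1e6));
    }
}
```

Multiplying a chunk index by the chunk size gives the approximate onset position in samples; overlapping chunks would give finer time resolution at the cost of more computation.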
Extensive Papers on Onset Detection:
For Hardcore Engineers:
http://www.nyu.edu/classes/bello/MIR_files/2005_BelloEtAl_IEEE_TSALP.pdf
Easier for average person to understand:
https://adamhess.github.io/Onset_Detection_Nov302011.pdf
You could try to transform the wav signal into a graph of amplitude against time. Then a way to determine a consistent onset is to calculate the intersection of a tangent in the inflection point of the rising flank of a signal with the x axis.
I don't really properly know how to explain the exact thing I'm achieving. I need to track down the intensity of certain frequencies from an mp3 file for a certain number of times on android (Java) (or if it's possible on Dart (Flutter)). This is an image explaining what I mean:
I made this screenshot in Blender using Bake Sound to f-Curve Modifier, which is exactly what I'm trying to achieve, but it's written in C++, so the first thing I did was trying to get some hints from the source code but I couldn't manage to find anything.
As you can see, it's not real-time; it's the value of a certain frequency range (80-255 Hz in this case) over time.
Zoomed version:
As you can see, it's just a graph of the intensity of that frequency range over time, and the "divisions" on the X axis in this case are about 180 s / 600 frames.
For what concerns the file format:
The input files are mp3s or wav.
For What Concerns the Language:
My main goal is to achieve this in java, but if it's possible, it would be nice to be done in Flutter (so Dart). I'm asking one of the two. If Flutter is impossible or too difficult, a java implementation is good anyway, I'm using a platform channel already on the application so isn't that much of a problem.
I've looked up online but the only tutorials I could find were real time examples that used FFT.
Better Explanation Of what I need:
I have a number of frames: Let's Say 300.
I need a function that is something like this:
List<Double> calculateFrequencies(int number_of_frames, double freq_low,
        double freq_high, String FilePath) {
    List<Double> result = new ArrayList<>();
    double length = 0; // placeholder: here I need to obtain the length of the song in units as small as possible, for example, milliseconds or nanoseconds
    for (int v = 0; v < number_of_frames; v++) {
        double currFrame = length / number_of_frames * v;
        double intensity = get_intensity(currFrame, freq_low, freq_high); // How could I do this??
        result.add(intensity);
    }
    return result;
}
How could I do this?
Is that possible? And if it is, is it possible in android too?
The Discrete Fourier Transform (for which the FFT is the most commonly used algorithm) is exactly what you need, since it maps your time-domain samples to the frequency domain. Assuming you have the sound samples over a period of time and the rate at which they were sampled, you will be able to achieve your goal; it doesn't really matter whether it is real time or not.
The second step would be to process the results from your FFT, which would produce the amplitudes of the frequencies present in your sample.
Try Github for Java based FFT implementations https://github.com/search?q=java+fft and you are likely to find also some examples there.
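To make the two steps concrete, here is a minimal Java sketch of "intensity of a frequency range per frame". It assumes the MP3 or WAV has already been decoded to mono PCM samples in a `double[]` (decoding is not shown), and it evaluates only the DFT bins inside the band with a naive loop instead of calling an FFT library. The class name `BandIntensity`, the window length `n` and the scaling by `n` are assumptions made for illustration.

```java
final class BandIntensity {
    /** Magnitude of DFT bin k computed over x[start .. start + n). */
    static double binMagnitude(double[] x, int start, int n, int k) {
        double re = 0.0, im = 0.0;
        for (int i = 0; i < n; i++) {
            double angle = 2.0 * Math.PI * k * i / n;
            re += x[start + i] * Math.cos(angle);
            im -= x[start + i] * Math.sin(angle);
        }
        return Math.sqrt(re * re + im * im);
    }

    /** One intensity value per frame for the band [freqLow, freqHigh]; requires x.length >= n. */
    static double[] intensities(double[] x, double sampleRate, int frames,
                                double freqLow, double freqHigh, int n) {
        // bin k corresponds to k * sampleRate / n Hz
        int kLow = (int) Math.ceil(freqLow * n / sampleRate);
        int kHigh = Math.min((int) Math.floor(freqHigh * n / sampleRate), n / 2);
        double[] result = new double[frames];
        for (int v = 0; v < frames; v++) {
            // spread the frame start positions evenly over the recording
            int start = (int) ((long) v * (x.length - n) / frames);
            double sum = 0.0;
            for (int k = kLow; k <= kHigh; k++) {
                sum += binMagnitude(x, start, n, k);
            }
            result[v] = sum / n; // scale so the value does not grow with the window length
        }
        return result;
    }

    public static void main(String[] args) {
        double sampleRate = 8000.0;
        double[] x = new double[8000];
        for (int i = 0; i < x.length; i++) {
            double hz = i < 4000 ? 100.0 : 1000.0; // low tone first, then a tone outside the band
            x[i] = Math.sin(2.0 * Math.PI * hz * i / sampleRate);
        }
        for (double v : intensities(x, sampleRate, 4, 80.0, 255.0, 1024)) {
            System.out.println(v);
        }
    }
}
```

For long files or wide bands, a real FFT (one transform per frame, then summing the bins in the band) would be much faster than this per-bin loop; the structure of the result is the same.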
I can reliably get around 87% accuracy on my test data of 10,000 MNIST images, and about 98% accuracy on my training data. I developed the neural net from scratch alongside the development of a Matrix class to go along with it. Currently, I am using mini-batch stochastic gradient descent (mini batch of size 64) with a learning rate of 0.6. I am using a fixed learning rate which I heard is bad but I am a bit unsure of how best to incorporate a changing learning rate. This is an input -> hidden -> output layered MLP NN.
Currently, 5000 iterations are enough to get me around 70% accuracy (it also takes about 6 seconds) but if I want anything better, I have to iterate for 400k or so. I am keeping track of the average error of my output and graphing it on excel for every parameter change, and it always follows the model of dropping incredibly fast at first and then plateauing and having changes occur at MUCH larger intervals.
I want to implement momentum and a changing learning rate, but I am unfortunately a bit rusty with the notation of the math for backpropagation. I have looked at 20+ answers about implementing momentum but they all use the notation to describe it which would be fine if I understood it 100%. I get the idea behind using the past weight matrix to update the current, but my confusion comes into play when I am looking at where I would implement it in my specific code.
Here is the code for my forward and backpropagation. I would love any pointers about implementing momentum or a changing learning rate (or any suggestions at all about how to better improve my NN given the information I have provided). All the variables in forward prop are matrices (obvious, but I just want to be clear).
/**
* forward propagate through the network
*/
private void forwardPropagation(Matrix inputBatch, Matrix outputBatch, boolean training) {
hiddenActivation = inputBatch.mult(inputToHiddenWeights);
hiddenActivation = hiddenActivation.sigmoidify();
outputActivation = hiddenActivation.mult(this.hiddenToOutputWeights);
outputActivation = outputActivation.sigmoidify();
if(training) {
backPropagation(inputBatch, outputBatch);
}
}
/**
* Perform backpropagation algorithm to update the weights
* and train the NN.
*/
private void backPropagation(Matrix inputBatch, Matrix outputBatch) {
// Compute gradient at output layer
outputErrorMatrix = outputBatch.sub(outputActivation);
// to keep track of the average error on every iteration (the data I plot)
this.avgErrorPerIteration[iterationToEpsilon] = outputErrorMatrix.averageValue();
// if the current error is less than a certain given error, exit and save weights
if(this.avgErrorPerIteration[iterationToEpsilon] < this.epsilon) {
this.lessThanEpsilon = true;
return;
}
// to print out the initial error value (I compare it to the end value)
if(iterationToEpsilon==0) {
System.out.println("Average error after first propagation: " + outputErrorMatrix.averageValue());
}
// compute slope at output and hidden layers
Matrix slopeOutput = outputActivation.sigmoidifyPrime();
Matrix slopeHiddenLayer = hiddenActivation.sigmoidifyPrime();
// compute delta at output layer
Matrix deltaOutput = (outputErrorMatrix.multAcross(slopeOutput)).mult(LEARNING_RATE);
// calculate error at hidden layer
Matrix hiddenError = deltaOutput.mult(hiddenToOutputWeights.transpose());
// compute delta at hidden layer
Matrix deltaHidden = hiddenError.multAcross(slopeHiddenLayer);
// update weight at both output and hidden layers
hiddenToOutputWeights = hiddenToOutputWeights.add(((hiddenActivation.transpose()).mult(deltaOutput)).mult(LEARNING_RATE));
inputToHiddenWeights = inputToHiddenWeights.add(((inputBatch.transpose()).mult(deltaHidden)).mult(LEARNING_RATE));
iterationToEpsilon++;
}
I don't think momentum or changing the learning rate will help you get a better test accuracy, as these two techniques only help the optimization algorithm, and your optimization algorithm is already doing quite well (it only tries to reduce the error on the training set, and this error is very low).
Apparently your neural network is overfitting the training set, so one thing you can try is Dropout. An easier thing you can try is Weight decay. The point is, you need to regularize your network, as you are overfitting, and these two are regularization techniques, while the other two typically aren't.
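Weight decay (L2 regularization) amounts to shrinking every weight slightly on each update. A minimal sketch in Java, using plain `double[][]` arrays rather than the question's `Matrix` class (whose full API is not shown), and assuming `grad` holds the gradient of the loss with respect to the weights; the name `WeightDecay` and the parameter `lambda` are hypothetical:

```java
final class WeightDecay {
    /** One SGD step with L2 weight decay: w <- w - lr * (grad + lambda * w). */
    static void step(double[][] w, double[][] grad, double lr, double lambda) {
        for (int r = 0; r < w.length; r++) {
            for (int c = 0; c < w[r].length; c++) {
                // the lambda * w term pulls each weight towards zero
                w[r][c] -= lr * (grad[r][c] + lambda * w[r][c]);
            }
        }
    }

    public static void main(String[] args) {
        double[][] w = {{1.0, -2.0}};
        double[][] grad = {{0.0, 0.0}};
        step(w, grad, 0.5, 0.1);
        System.out.println(w[0][0] + " " + w[0][1]);
    }
}
```

In the question's code the update is added rather than subtracted (the error is computed as target minus output), so there the decay term would presumably be subtracted from the weights after the existing `add` call. The value of `lambda` has to be tuned; small values such as 1e-4 are a common starting point.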
I want to be able to detect a tone of a predetermined frequency using java. What I am doing is playing a tone (the frequency of the tone is variable by user input) and I am trying to detect if the tone is of a certain frequency. If it is, I execute a certain method. From what I have read I will need to use FFT, but I'm not sure how to implement it in java. There seems to be a lot of documentation for how to do it, but what documentation there is involves looking at an audio file rather than real time analysis. I don't need to save the audio to a file, just determine if and when a tone of frequency x was recorded.
Ideally I would like to record at a sample rate of 44 kHz and, after determining if a tone was detected, determine when the tone was detected with an accuracy of +-3 ms. However, an accuracy less than this would be acceptable as long as it isn't ridiculous (i.e. +-100 ms). I know roughly what I need to do from what I have looked up, but I need help tying it all together. Using pseudo code it would look roughly like this (I think)
Note that I know roughly, within +-1 s, when a tone of satisfying frequency may be detected
for(i = 0, i < 440000 * 2, i++){//*2 because of expected appearance interval;may change
record sound sample
fft(sound sample)
if(frequencySoundSample > x){
do something
return
}
}
There will be considerable background noise while the tone is played. However the tone will have a very high frequency, like 15-22KHz, so it is my belief that by simply looking for when the recorder detects a very high frequency I can be sure it is my tone (also the tone will be played with a high amplitude for maybe .5s or 1s). I know that there will not be other high frequency sounds as background noise (I am expecting a background frequency high of maybe 5KHz).
I have two questions then. Is the pseudo code that I have provided sufficient for what I want to do? If it isn't or if there is a better way of doing this I'm all for it. Second, how would I implement this in java? I understand what I need to do, but I'm having trouble tying it all together. I'm pretty decent with java but I'm not familiar with the syntax involved with audio and I don't have any experience with fft. Please be explicit and give code with comments. I've been trying to figure this out for a while I just need to see it all tied together. Thank you.
EDIT
I understand that using a for loop like I have will not produce the frequency that I want. It was more to show roughly what I want. That is, recording, performing fft, and testing the frequency all at once as time progresses.
If you're just looking for a specific frequency then an FFT-based method is probably a bad choice for your particular application, for two reasons:
it's overkill - you're computing an entire spectrum just to detect the magnitude at one point
to get 3 ms resolution for your onset detection you'll need a large overlap between successive FFTs, which will require much more CPU bandwidth than just processing successive blocks of samples
A better choice for detecting the presence or absence of a single tone is the Goertzel algorithm (aka Goertzel filter). It's effectively a DFT evaluated at a single frequency domain bin, and is widely used for tone detection. It's much less computationally expensive than an FFT, very simple to implement, and you can test its output on every sample, so no resolution problem (other than those dictated by the laws of physics). You'll need to low pass filter the magnitude of the output and then use some kind of threshold detection to determine the onset time of your tone.
Note that there are a number of useful questions and answers on SO already about tone detection and using the Goertzel algorithm (e.g. Precise tone onset/duration measurement?) - I suggest reading these along with the Wikipedia entry as a good starting point.
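For reference, the core of the Goertzel algorithm is only a few lines. This is a minimal block-based sketch in Java (class and method names are assumptions): it returns the squared magnitude at one target frequency for a block of samples, and leaves out the low-pass filtering and thresholding of the output described above.

```java
final class Goertzel {
    /** Squared magnitude of the block x at targetHz (equivalent to one DFT bin). */
    static double power(double[] x, double sampleRate, double targetHz) {
        double coeff = 2.0 * Math.cos(2.0 * Math.PI * targetHz / sampleRate);
        double s1 = 0.0; // state one sample ago
        double s2 = 0.0; // state two samples ago
        for (double sample : x) {
            double s0 = sample + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    public static void main(String[] args) {
        double sampleRate = 44100.0;
        double[] block = new double[441]; // 10 ms
        for (int i = 0; i < block.length; i++) {
            block[i] = Math.sin(2.0 * Math.PI * 18000.0 * i / sampleRate);
        }
        System.out.println("power at 18 kHz: " + power(block, sampleRate, 18000.0));
        System.out.println("power at 5 kHz:  " + power(block, sampleRate, 5000.0));
    }
}
```

Running this on short successive (or overlapping) blocks and comparing the result against a threshold gives the onset time to within roughly one block length; shorter blocks give better timing but a wider frequency response.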
I'm actually working on a similar project with pitch detection, in Java as well. If you want to use FFT, you could do it with these steps. Java has a lot of libraries that can make this process easy for you.
First, you need to read in the sound file. This can be done using Java Sound. It's a built-in library with functions that make it easy to record sound. Examples can be found here. The default sample rate is 44,100 Hz (CD quality). These examples can get you from playing the actual tone to an array of doubles representing the tone.
Second, you should take the FFT with JTransforms. Here is an example of FFT being taken on a collection of samples.
FFT gives you an array twice the length of the array of samples you passed it. You need to go through the FFT array by twos, since each entry of this array is stored as a real part followed by an imaginary part. Compute the magnitude of each entry with sqrt(re^2 + im^2). Then, find which magnitude is the largest. The index of that magnitude corresponds to the frequency you're looking for: for n samples, entry k corresponds to k * sampleRate / n Hz (for real input only the first n/2 entries are meaningful; the rest mirror them).
Keep in mind, you don't take the FFT of the entire sound. You break the sound up into chunks, and FFT each one. The chunks can overlap for higher accuracy, but accuracy shouldn't be a problem here, since you're just looking for a predetermined note. To reduce spectral leakage, you can also apply a window function (e.g. Hann) to each chunk before transforming it.
Once you have all the FFTs, they should confirm a certain frequency, and you can check that against the note you want.
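The "walk the array by twos and find the largest magnitude" step can be sketched like this in Java. To keep the example self-contained, a naive O(n^2) DFT stands in for the FFT library call; it produces the same interleaved real/imaginary layout. The names `PeakFrequency`, `dftInterleaved` and `peakHz` are assumptions.

```java
final class PeakFrequency {
    /** Naive DFT returning interleaved {re0, im0, re1, im1, ...}; a stand-in for an FFT library. */
    static double[] dftInterleaved(double[] x) {
        int n = x.length;
        double[] out = new double[2 * n];
        for (int k = 0; k < n; k++) {
            double re = 0.0, im = 0.0;
            for (int i = 0; i < n; i++) {
                double angle = 2.0 * Math.PI * k * i / n;
                re += x[i] * Math.cos(angle);
                im -= x[i] * Math.sin(angle);
            }
            out[2 * k] = re;
            out[2 * k + 1] = im;
        }
        return out;
    }

    /** Frequency in Hz of the strongest bin, ignoring DC and the mirrored upper half. */
    static double peakHz(double[] spectrum, double sampleRate) {
        int n = spectrum.length / 2;
        int bestBin = 1;
        double bestMag = -1.0;
        for (int k = 1; k < n / 2; k++) {
            double re = spectrum[2 * k];
            double im = spectrum[2 * k + 1];
            double mag = Math.sqrt(re * re + im * im);
            if (mag > bestMag) {
                bestMag = mag;
                bestBin = k;
            }
        }
        return bestBin * sampleRate / n;
    }

    public static void main(String[] args) {
        double sampleRate = 8000.0;
        double[] chunk = new double[256];
        for (int i = 0; i < chunk.length; i++) {
            chunk[i] = Math.sin(2.0 * Math.PI * 1000.0 * i / sampleRate);
        }
        System.out.println("peak (Hz): " + peakHz(dftInterleaved(chunk), sampleRate));
    }
}
```

The frequency resolution is sampleRate / n, so with 256 samples at 8 kHz the answer is only accurate to about 31 Hz; longer chunks trade time resolution for frequency resolution.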
If you want to try and visualize this, I'd suggest using JFreeChart. It's another library that makes it easy to graph things.
I am working on a small example application for my fourth year project (dealing with Functional Reactive Programming). The idea is to create a simple program that can play a .wav file and then shows a 'bouncing' animation of the current volume of the playing song (like in audio recording software). I'm building this in Scala so have mainly been looking at Java libraries and existing solutions.
Currently, I have managed to play a .wav file easily but I can't seem to achieve the second goal. Basically, is there a way I can decode a .wav file so I can have some way of accessing the 'volume' at any given time? By volume I think I mean its amplitude, but I may be wrong about this - Higher Physics was a while ago....
Clearly, I don't know much about this at all so it would be great if someone could point me in the right direction!
In digital audio processing you typically refer to the momentary peak amplitude of the signal (this is also called PPM -- peak programme metering). Depending on how accurate you want to be or if you wish to model some standardised metering or not, you could either
just use a sliding window of sample frames (find the maximum absolute value per window)
implement some sort of peak-hold mechanism that retains the last peak value for a given duration and then start to have the value 'fall' by a given amount of decibels per second.
The other measuring mode is RMS, which is calculated by integrating over a certain time window (add the squared sample values, divide by the window length, and take the square root, thus root-mean-square, RMS). This gives a better idea of the 'energy' of the signal, moving more smoothly than peak measurements, but not capturing the maximum values observed. This mode is sometimes called VU meter as well. You can approximate this with a sort of lagging (lowpass) filter, e.g. y[i] = y[i-1]*a + |x[i]|*(1-a), for some value 0 < a < 1
You typically display the values logarithmically, i.e. in decibels, as this corresponds better with our perception of signal strength and also for most signals produces a more regular coverage of your screen space.
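The measurements above are small enough to sketch directly. The following Java is an illustration only, assuming samples normalised to the range -1..1; the class name `LevelMeter`, the method names and the floor used to avoid log(0) are assumptions.

```java
final class LevelMeter {
    /** Largest absolute sample value in x[start .. start + length). */
    static double peak(double[] x, int start, int length) {
        double max = 0.0;
        for (int i = start; i < start + length; i++) {
            max = Math.max(max, Math.abs(x[i]));
        }
        return max;
    }

    /** Root-mean-square of x[start .. start + length). */
    static double rms(double[] x, int start, int length) {
        double sum = 0.0;
        for (int i = start; i < start + length; i++) {
            sum += x[i] * x[i];
        }
        return Math.sqrt(sum / length);
    }

    /** Converts a linear amplitude to decibels relative to full scale (1.0). */
    static double toDecibels(double amplitude) {
        return 20.0 * Math.log10(Math.max(amplitude, 1e-10)); // floor avoids log(0)
    }

    /** One step of the lagging filter y[i] = y[i-1]*a + |x[i]|*(1-a), with 0 < a < 1. */
    static double smooth(double previous, double sample, double a) {
        return previous * a + Math.abs(sample) * (1.0 - a);
    }

    public static void main(String[] args) {
        double[] x = new double[800];
        for (int i = 0; i < x.length; i++) x[i] = Math.sin(2.0 * Math.PI * i / 8.0);
        System.out.println("peak dB: " + toDecibels(peak(x, 0, x.length)));
        System.out.println("rms dB:  " + toDecibels(rms(x, 0, x.length)));
    }
}
```

For an animation you would call `peak` or `rms` on the window of samples that corresponds to the current playback position once per display frame, and draw the decibel value.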
Three projects I'm involved with may help you:
ScalaAudioFile which you can use to read the sample frames from an AIFF or WAVE file
ScalaAudioWidgets which is a still young and incomplete project to provide some audio application widgets on top of scala-swing, including a PPM view -- just use a sliding window and set the window's current peak value (and optionally RMS) at a regular interval, and the view will take care of peak-hold and fall times
(ScalaCollider, a client for the SuperCollider sound synthesis system, which you might use to play back the sound file and measure the peak and RMS amplitudes in real time. The latter is probably an overkill for your project and would involve some serious learning curve if you have never heard of SuperCollider. The advantage would be that you don't need to worry about synchronising your sound playback with the meter display)
In a WAV file, the data at a given point in the stream IS the volume (shifted by half of the dynamic range). In other words, if you know the type of WAV file (for example 8-bit, mono), each byte represents a single sample. If you know the sample rate (say 44100 Hz), then multiply the time in seconds by 44100 and that is the byte you want to look at (counted from the start of the sample data, after the header).
The value of the byte is the volume (distance from the middle: 0 and 255 are the peaks, 128 is zero). This is assuming that the encoding is not mu-law encoding. I found some good info on how to tell the difference, or better yet, convert between these formats here:
http://www.gnu.org/software/octave/doc/interpreter/Audio-Processing.html
You may want to average these samples though over a window of some fixed number of samples.
The Dell Streak has been discovered to have an FM radio which has very crude controls. 'Scanning' is unavailable by default, so my question is does anyone know how, using Java on Android, one might 'listen' to the FM radio as we iterate up through the frequency range detecting white noise (or a good signal) so as to act much like a normal radio's seek function?
I have done some practical work in this specific area. I would recommend (if you have a little time for it) trying just a little experimentation before resorting to FFTs. The PCM stream can be interpreted in very complex and subtle ways (as in high-quality filtering and resampling), but for many purposes it can also be treated practically as the path of a wiggly line.
White noise is unpredictable shaking of the line, which is nevertheless quite continuous in intensity (RMS, absolute mean...). Acoustic content is recurrent wiggling and occasional surprises (jumps, leaps) :]
Non-noise like content of a signal may be estimated by performing quick calculations on a running window of the pcm stream.
For example, noise will strongly tend to have a higher value for the absolute integral of its derivative, than non-noise. I think that is the academic way of saying this:
loop(n+1 to n.length)
{ sumd0 += abs(pcm[n]);
  sumd1 += abs(pcm[n]-pcm[n-1]);
}
wNoiseRatio = 0.8; // approximate: quite easily discovered, bit tricky to calculate.
if((sumd1/sumd0)<wNoiseRatio)
{ /*not like noise*/ }
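A runnable Java version of the first test might look like the sketch below. It assumes 16-bit PCM in a `short[]`; the class name `NoiseRatio` is hypothetical, and the 0.8 cut-off is the rough value from the pseudocode, which you would tune on real recordings.

```java
import java.util.Random;

final class NoiseRatio {
    /** Sum of |pcm[n] - pcm[n-1]| divided by sum of |pcm[n]|; higher values look more like noise. */
    static double derivativeRatio(short[] pcm) {
        long sumAbs = 0;
        long sumDiff = 0;
        for (int n = 1; n < pcm.length; n++) {
            sumAbs += Math.abs((int) pcm[n]);
            sumDiff += Math.abs(pcm[n] - pcm[n - 1]); // shorts are promoted to int, so no overflow
        }
        return sumAbs == 0 ? 0.0 : (double) sumDiff / sumAbs;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        short[] noise = new short[4410];
        short[] tone = new short[4410];
        for (int i = 0; i < noise.length; i++) {
            noise[i] = (short) (rnd.nextInt(65536) - 32768);
            tone[i] = (short) Math.round(20000 * Math.sin(2.0 * Math.PI * 440.0 * i / 44100.0));
        }
        System.out.println("noise ratio: " + derivativeRatio(noise));
        System.out.println("tone ratio:  " + derivativeRatio(tone));
    }
}
```

Uniform random samples give a ratio well above 1, while a low-frequency tone gives a ratio close to zero, which is the separation the test relies on; real radio audio will fall somewhere in between.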
Also, the running absolute average over ~16 to ~30 samples will tend to vary less for white noise than for an acoustic signal:
loop(n+24 to n.length-16)
{ runAbsAve1 += abs(pcm[n]) - abs(pcm[n-24]); }
loop(n+24+16 to n.length)
{ runAbsAve2 += abs(pcm[n]) - abs(pcm[n-24]); }
unusualDif= 5; //a factor. tighter values for longer measures.
if(abs(runAbsAve1-runAbsAve2)>(runAbsAve1+runAbsAve2)/(2*unusualDif))
{ /*not like noise*/ }
This concerns how white noise tends to be non-sporadic over large enough span to average out its entropy. Acoustic content is sporadic (localised power) and recurrent (repetitive power).
The simple test reacts to acoustic content with lower frequencies and could be drowned out by high-frequency content. There are simple-to-apply lowpass filters which could help (and no doubt other adaptations).
Also, the root mean square can be divided by the mean absolute value, providing another ratio which should be particular to white noise, though I can't figure out what it is right now. The ratio will also differ for the signal's derivatives.
I think of these as being simple formulaic signatures of noise. I'm sure there are more.
Sorry not to be more specific. It is fuzzy and imprecise advice, but so is performing simple tests on the output of an FFT. For a better explanation and more ideas, perhaps check out statistical and stochastic(?) measurements of entropy and randomness on Wikipedia etc.
Use a Fast Fourier Transform.
This is what you can use a Fast Fourier Transform for. It analyzes the signal and determines the strength of the signal at various frequencies. If there's a spike in the FFT curve at all, it should indicate that the signal is not simply white noise.
Here is a library which supports FFT's. Also, here is a blog with source code in case you want to learn about what the FFT does.
If you don't have FFT tools available, just a wild suggestion:
Try to compress a few milliseconds of audio.
A typical feature of noise is that it compresses much less than clear signal.
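A quick way to try this on the JVM is the standard `java.util.zip.Deflater`. The sketch below is only an illustration: the class name `Compressibility` is an assumption, the input is raw PCM bytes, and any cut-off between "noise" and "signal" would have to be found by experiment.

```java
import java.util.Random;
import java.util.zip.Deflater;

final class Compressibility {
    /** Compressed size divided by original size; values near 1.0 mean "barely compressible". */
    static double ratio(byte[] data) {
        if (data.length == 0) return 0.0;
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[1024];
        long compressed = 0;
        while (!deflater.finished()) {
            compressed += deflater.deflate(buffer); // only the byte count matters here
        }
        deflater.end();
        return (double) compressed / data.length;
    }

    public static void main(String[] args) {
        byte[] noise = new byte[4096];
        new Random(7).nextBytes(noise);
        byte[] tone = new byte[4096];
        for (int i = 0; i < tone.length; i++) {
            tone[i] = (byte) Math.round(100 * Math.sin(2.0 * Math.PI * (i % 32) / 32.0));
        }
        System.out.println("noise ratio: " + ratio(noise));
        System.out.println("tone ratio:  " + ratio(tone));
    }
}
```

The perfectly periodic tone here is an idealised case; real broadcast audio compresses far less well than that, so expect the gap between noise and signal to be much narrower in practice.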
As far as I know there is no API or even drivers for the FM radio in the Android SDK, and unless Dell releases one you will have to roll your own. It's actually even worse than that: all(?) new chipsets have an FM radio, but not all phones have an FM radio application.
The old Windows Mobile had the same problem.
For white noise detection you need to do an FFT and check that it has a more or less continuous spectrum. But recording from FM might be a problem.
Just high pass filtering it will give a good idea, and has sometimes been used for squelch on fm radios.
Note that this is comparable to what the derivative suggestion was getting at - taking the derivative is a simple form of high pass filter, and taking the absolute value of that a crude way of measuring power.
Do you have a subscription to the IEEE Xplore library? There are countless papers (one picked at random) on this very topic.
A very simplistic method would be to observe the "flatness" of the power spectral density. One could take this by using a Fast Fourier Transform of the signal in the time domain and find the standard deviation of the spectral density. If it is below some threshold, you have your white noise.
The main question here is: what type of signal do you have access to?
I bet you don't have direct access to the analog EM signal, so you can't run an FFT on it. You also can't build a phase-locked loop, which is the way your standard old radio tuner works ("scanning" in your case).
Your only option is indeed to pick one frequency and listen to it (and try to detect whether it's noise with an FFT on the sound). You might even only have access to the FFTed signal.
Problem here: If you want to detect a potential frequency using white noise you will pick up signals too easily.
Anyway, here is what I would try to do with this strategy:
Double integrate the autocorrelation of the spectral density over a fraction of a second of audio. And this for each frequency.
Then look for a FM frequency where this number is maxed.
Little explanation here:
The spectral density tells you which audio frequencies are strongest at a given moment.
If the same frequencies are still the strongest a little later, you supposedly have some clear audio. You get this by integrating the autocorrelation of the spectral density for one audio frequency over a fraction of a second (using some function that grows faster than linearly might also work).
You then just have to integrate this for all audio frequencies
Also be careful to normalize the integrals: a loud white noise signal should not get a higher score than a clear but low audio signal.
Several people have mentioned the FFT, which you'll want to do, but to then detect white noise you need to make sure that the magnitude is relatively constant over the range of audio frequencies. You'll want to look at magnitudes only; you can throw away the phases. You can compute an average and standard deviation for the magnitudes in O(N) time. For white noise, you should find the standard deviation to be a fairly constant fraction of the average, whereas a strong tone or station makes it much larger. For a single FFT of Gaussian white noise the bin magnitudes are Rayleigh-distributed, so that fraction is about 0.52 regardless of N; averaging the magnitude spectra of K successive blocks shrinks it by roughly 1/sqrt(K).
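A minimal Java sketch of this mean / standard deviation check follows. A naive DFT is used in place of an FFT library so the example is self-contained, and the names `SpectralFlatness`, `magnitudes` and `relativeSpread` are assumptions. Any decision threshold on the returned ratio would need to be tuned on real radio audio.

```java
import java.util.Random;

final class SpectralFlatness {
    /** Magnitudes of DFT bins 1 .. n/2 - 1 (DC and the mirrored half are skipped). */
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n / 2 - 1];
        for (int k = 1; k < n / 2; k++) {
            double re = 0.0, im = 0.0;
            for (int i = 0; i < n; i++) {
                double angle = 2.0 * Math.PI * k * i / n;
                re += x[i] * Math.cos(angle);
                im -= x[i] * Math.sin(angle);
            }
            mag[k - 1] = Math.sqrt(re * re + im * im);
        }
        return mag;
    }

    /** Standard deviation of the magnitudes divided by their mean (both computed in O(N)). */
    static double relativeSpread(double[] mag) {
        double mean = 0.0;
        for (double m : mag) mean += m;
        mean /= mag.length;
        double variance = 0.0;
        for (double m : mag) variance += (m - mean) * (m - mean);
        variance /= mag.length;
        return mean == 0.0 ? 0.0 : Math.sqrt(variance) / mean;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[] noise = new double[256];
        double[] tone = new double[256];
        for (int i = 0; i < 256; i++) {
            noise[i] = rnd.nextGaussian();
            tone[i] = Math.sin(2.0 * Math.PI * 16 * i / 256.0);
        }
        System.out.println("noise spread: " + relativeSpread(magnitudes(noise)));
        System.out.println("tone spread:  " + relativeSpread(magnitudes(tone)));
    }
}
```

A flat, noise-like spectrum keeps the ratio well below 1, while a signal dominated by a few strong frequencies pushes it far above 1.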