How to create sound clips when silent?

How to create sound clips when silent? - java

So I am writing a program that splits an audio clip up into multiple parts when no sound is playing. So that all the clips created from the sound file only contain sections with sound. How would I accomplish this using Java? I plan on using FLAC, but the program currently supports WAV as well. Would RMS be the best way to determine this? Bonus points for any code.

You can roughly approximate the 'loudness' of a sampled waveform by averaging the squares of the differences between samples n and n+1. This will give you a rough indicator of how "loud" these samples will appear to the hearer.
The method is more sensitive to high frequencies than low ones, thats why it can be off quite a bit if the sound has a very extreme frequency distribution.
For a precise solution you will need to take the FFT approach and also correct the extracted frequencies weighting by a model representing the hearers ear (not all frequencies feel equally loud at the same DB level).

Related

How to recognize the sound frequency from a .WAV file in Java

AudioInputStream stream = AudioSystem.getAudioInputStream(new File("file_a4.wav"));
I am looking for a way to recognise the frequency of a musical scale sound (e.g. A4 = 440 Hz) recorded on a .wav file. I have read a lot about FFT, but it has been suggested that the frequencies on the musical scale do not match the FFT.
I have also heard about DTFT. What should I use to recognise the frequency from a sound file?

What I understand from your question is that you want to recognize the musical note/s an instrument is playing in a wav file. If that is the case, there are several algorithms for doing that, and you could always train a neural network for doing that too.
Some important Things to take into account are:
Any instrument (the same would happen for the musical sounds produced by the human voice) has its own particular "color" when producing a note. This color is called the timbre (https://en.wikipedia.org/wiki/Timbre), and is composed by the harmonic and inharmonic frequencies that surround the frequency you psychoacoustically perceive when listening to that specific note. This is why you cannot just look for the peak of an FFT to detect the musical note, and it is also the reason why a piano sounds different than a guitar when playing the same note.
The analysis of an audio signal is often performed by windowing the signal and calculating the DFT of the windowed part of the signal. Each window would then produce its own spectrum, and it s from the analysis of each individual spectrum and/or the analysis of how they interact that you (or your CNN, for example) will obtain your conclusions/results. This process of windowing the signal and calculating the DFTs produces a spectogram (https://en.wikipedia.org/wiki/Spectrogram#:~:text=A%20spectrogram%20is%20a%20visual,sonographs%2C%20voiceprints%2C%20or%20voicegrams.)
After that short introduction, here are some simple algorithms for identifying single notes in a wav file. You will be able to find implementations of those algorithms on the internet, and many others. The detection of the notes produced by chords is more complex but can be done with other algorithms or neural networks.
On the use of autocorrelation analysis for pitch detection: https://ieeexplore.ieee.org/document/1162905
YIN algorithm: http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf

Distinguish notes FFT algorithm

What I am trying to achieve is distinguishing separate notes in an audio file. For simplicity lets say that a couple of notes are played one after another. The main question is how to determine when the next note is played ?
What I have already done is reading samples from an audio file and doing Fourier Transform on those samples using JTransforms library. Here's what I get:
.
Then I have calculated spectrum based on the data that FFT returned, and here's what I get:
.
As I understand the bigger "columns" on the chart are the harmonics, and the small ones are noise and other non-harmonic overtones, right ?
After that I have tried to do same process with audio file with two notes played one after another, but the result was kind of the same.
As a side question, do any of you know some lightweight and fast libraries for visualizing such data ? Because using JFreeChart for bigger datasets is a real pain for my processor.

To detect successive frequency bursts of different frequencies, and some of their time domain parameters, one can use overlapping short FFT windows (length shorter than the expected burst length) and look for where the relative magnitudes of frequency peaks swap order, or fall above/below thresholds. If you know, a priori, the frequencies involved, you can use the Goertzel filters instead of FFTs, with sliding windows or successive approximation in time for finer time domain granularity.
For pitched notes (such as music), one can do something similar, except using a pitch detection/estimation method (instead of simple FFT magnitudes, which are not reliable) on sufficiently short time domain windows of data.

record audio in java and determine real time if a tone of x frequency was played if so do something

I want to be able to detect a tone of a predetermined frequency using java. What I am doing is playing a tone (the frequency of the tone is variable by user input) and I am trying to detect if the tone is of a certain frequency. If it is, I execute a certain method. From what I have read I will need to us FFT, but I'm not sure how to implement it in java. There seems to be a lot of documentation for how to do it, but what documentation there is involves looking at an audio file rather than real time analysis. I don't need to save the audio to a file just determine if and when a tone of frequency x was recorded.
Ideally I would like to record at a sample rate of 44KHz and after determining if a tone was detected, determine when the tone was detected with an accuracy of +-3ms. However, an accuracy less than this would be acceptable as long as it isn't ridiculous (ie +100ms). I know roughly what I need to do from what I have looked up, but I need help tying it all together. Using pseudo code it would look roughly like this (I think)
Note that I know roughly within +-1s of when a tone of satisfying frequency maybe detected
for(i = 0, i < 440000 * 2, i++){//*2 because of expected appearance interval;may change
record sound sample
fft(sound sample)
if(frequencySoundSample > x){
do something
return
}
}
There will be considerable background noise while the tone is played. However the tone will have a very high frequency, like 15-22KHz, so it is my belief that by simply looking for when the recorder detects a very high frequency I can be sure it is my tone (also the tone will be played with a high amplitude for maybe .5s or 1s). I know that there will not be other high frequency sounds as background noise (I am expecting a background frequency high of maybe 5KHz).
I have two questions then. Is the pseudo code that I have provided sufficient for what I want to do? If it isn't or if there is a better way of doing this I'm all for it. Second, how would I implement this in java? I understand what I need to do, but I'm having trouble tying it all together. I'm pretty decent with java but I'm not familiar with the syntax involved with audio and I don't have any experience with fft. Please be explicit and give code with comments. I've been trying to figure this out for a while I just need to see it all tied together. Thank you.
EDIT
I understand that using a for loop like I have will not produce the frequency that I want. It was more to show roughly what I want. That is, recording, performing fft, and testing the frequency all at once as time progresses.

If you're just looking for a specific frequency then an FFT-based method is probably a bad choice for your particular application, for two reasons:
it's overkill - you're computing an entire spectrum just to detect the magnitude at one point
to get 3 ms resolution for your onset detection you'll need a large overlap between successive FFTs, which will require much more CPU bandwidth than just processing successive blocks of samples
A better choice for detecting the presence or absence of a single tone is the Goertzel algorithm (aka Goertzel filter). It's effectively a DFT evaluated at a single frequency domain bin, and is widely used for tone detection. It's much less computationally expensive than an FFT, very simple to implement, and you can test its output on every sample, so no resolution problem (other than those dictated by the laws of physics). You'll need to low pass filter the magnitude of the output and then use some kind of threshold detection to determine the onset time of your tone.
Note that there are a number of useful questions and answers on SO already about tone detection and using the Goertzel algorithm (e.g. Precise tone onset/duration measurement?) - I suggest reading these along with the Wikipedia entry as a good starting point.

Im actually working on a similar project with pitch detection, in Java as well. If you want to use FFT, you could do it with these steps. Java has a lot of libraries that can make this process easy for you.
First, you need to read in the sound file. This can be done using Java Sound. It's a built in library with functions that make it easy to record sound. Examples can be found here. The default sample rate is 44,100 KHz (CD quality). These examples can get you from playing the actual tone to a double array of bytes representing the tone.
Second, you should take the FFT with JTransforms. Here is an example of FFT being taken on a collection of samples.
FFT gives you an array twice the length of the array of samples you passed it. You need to go through the FFT array by two's, since each part of this array is represented as an imaginary and a real piece. Compute the magnitude of each part of this array with sqrt(im^2 + re^2). Then, find which magnitude is the largest. The index of that magnitude corresponds to the frequency you're looking for.
Keep in mind, you don't take FFT on the entire portion of sound. You break the sound up into chunks, and FFT each one. The chunks can overlap for higher accuracy, but that shouldn't be a problem, since you're just looking for a predetermined note. If you want to improve performance, you can also window each chunk before doing this.
Once you have all the FFTs, they should confirm a certain frequency, and you can check that against the note you want.
If you want to try and visualize this, I'd suggest using JFreeChart. It's another library that makes it easy to graph things.

Representing music audio samples in terms of dB? [duplicate]

This question already has answers here:
Detect silence when recording
(2 answers)
Closed 9 years ago.
I am starting a project which would allow me to use Java to read sound samples, and depending on the properties of each sample (I'm thinking focusing on decibels at the moment for the sake of simplification, or finding some way to compute the overall 'volume' of a specific sample or set of samples), return a value from 0-255 where 0 would be silence and 255 would be the highest sound pressure (Compared to a reference point, I suppose? I have no idea how to word this). I want to then have these values returned as bytes and sent to an Arduino in order to control the intensity of LED's using PWM, and visually 'see' the music.
I am not any sort of audio file format expert, and have no particular understanding of how the data is stored in a music file. As such, I am having trouble finding out how to read a sample and find a way to represent its overall volume level as a byte. I have looked through the javax.sound.sampled package and it is all very confusing to me. Any insight as to how I could accomplish this would be greatly appreciated.

First i suggest you to read Pulse-code modulation which is the format use to store data on a .wav file (the simplest to begin with).
Next there is a post on how to get PCM data from a wav file in java here.
Finally to get the "volume" (which is actually more the energy) apply this energy equation.
wish it could help you,

As Bastyen (+1 from me) indicates, calculating decibels is actually NOT simple, but requires looking at a large number of samples. However, since sound samples run MUCH more frequently than visual frames in an animation, making an aggregate measure works out rather neatly.
A nice visual animation rate, for example, updates 60 times per second, and the most common sampling rate for sound is 44100 times per second. So, 735 samples (44100 / 60 = 735) might end up being a good choice for interfacing with a visualizer.
By the way, of all the official Java tutorials I've read (I am a big fan), I have found the ones that accompany the javax.sound.sampled to be the most difficult. http://docs.oracle.com/javase/tutorial/sound/TOC.html
But they are still worth reading. If I were in charge of a rewrite, there would be many more code examples. Some of the best code examples are in several sections deep, e.g., the "Using Files and Format Converters" discussion.
If you don't wish to compute the RMS, a hack would be to store the local high and/or low value for the given number of samples. Relating these numbers to decibels would be dubious, but MAYBE could be useful after giving it a mapping of your choice to the visualizer. Part of the problem is that values for a single point on given wave can range wildly. The local high might be more due to the phase of the constituent harmonics happening to line up than about the energy or volume.
Your PCM top and bottom values would probably NOT be 0 and 256, more likely -128 to 127 for 8-bit encoding. More common still is 16-bit encoding (-32768 to 32767). But you will get the hang of this if you follow Bastyen's links. To make your code independent of the bit-encoding, you would likely normalize the data (convert to floats between -1 and 1) before doing any other calculations.

Can you programmatically detect white noise?

The Dell Streak has been discovered to have an FM radio which has very crude controls. 'Scanning' is unavailable by default, so my question is does anyone know how, using Java on Android, one might 'listen' to the FM radio as we iterate up through the frequency range detecting white noise (or a good signal) so as to act much like a normal radio's seek function?

I have done some practical work on this specific area, i would recommend (if you have a little time for it) to try just a little experimentation before resorting to fft'ing. The pcm stream can be interpreted very complexely and subtly (as per high quality filtering and resampling) but can also be practically treated for many purposes as the path of a wiggly line.
White noise is unpredictable shaking of the line, which is never-the-less quite continuous in intensity (rms, absolute mean..) Acoustic content is recurrent wiggling and occasional surprises (jumps, leaps) :]
Non-noise like content of a signal may be estimated by performing quick calculations on a running window of the pcm stream.
For example, noise will strongly tend to have a higher value for the absolute integral of its derivative, than non-noise. I think that is the academic way of saying this:
loop(n+1 to n.length)
{ sumd0+= abs(pcm[n]);
sumd1+= abs(pcm[n]-pcm[n-1]);
}
wNoiseRatio = ?0.8; //quite easily discovered, bit tricky to calculate.
if((sumd1/sumd0)<wNoiseRatio)
{ /*not like noise*/ }
Also, the running absolute average over ~16 to ~30 samples of white noise will tend to vary less, over white noise than acoustic signal:
loop(n+24 to n.length-16)
{ runAbsAve1 += abs(pcm[n]) - abs(pcm[n-24]); }
loop(n+24+16 to n.length)
{ runAbsAve2 += abs(pcm[n]) - abs(pcm[n-24]); }
unusualDif= 5; //a factor. tighter values for longer measures.
if(abs(runAbsAve1-runAbsAve2)>(runAbsAve1+runAbsAve2)/(2*unusualDif))
{ /*not like noise*/ }
This concerns how white noise tends to be non-sporadic over large enough span to average out its entropy. Acoustic content is sporadic (localised power) and recurrent (repetitive power).
The simple test reacts to acoustic content with lower frequencies and could be drowned out by high frequency content. There are simple to apply lowpass filters which could help (and no doubt other adaptions).
Also, the root mean square can be divided by the mean absolute sum providing another ratio which should be particular to white noise, though i cant figure what it is right now. The ratio will also differ for the signals derivatives as well.
I think of these as being simple formulaic signatures of noise. I'm sure there are more..
Sorry to not be more specific, it is fuzzy and imprecise advice, but so is performing simple tests on the output of an fft. For better explaination and more ideas perhaps check out statistical and stochastic(?) measurements of entropy and randomness on wikipedia etc.

Use a Fast Fourier Transform.
This is what you can use a Fast Fourier Transform for. It analyzes the signal and determines the strength of the signal at various frequencies. If there's a spike in the FFT curve at all, it should indicate that the signal is not simply white noise.
Here is a library which supports FFT's. Also, here is a blog with source code in case you want to learn about what the FFT does.

If you don't have FFT tools available, just a wild suggestion:
Try to compress a few milliseconds of audio.
A typical feature of noise is that it compresses much less than clear signal.

As far as I know there is no API or even drivers for the FM Radio in the Android SDK and unless Dell releases one you will have to roll your own. It's actually even worse than that. All(?) new chipsets has FM Radio but not all phones has an FM Radio application.
The old Windows Mobile had the same problem.

For white noise detection you need to do FFT and see that it has more or less continious spectrum. But recording from FM might be a problem.

Just high pass filtering it will give a good idea, and has sometimes been used for squelch on fm radios.
Note that this is comparable to what the derivative suggestion was getting at - taking the derivative is a simple form of high pass filter, and taking the absolute value of that a crude way of measuring power.

Do you have a subscription to the IEEE Xplore library? There are countless papers (one picked at random) on this very topic.
A very simplistic method would be to observe the "flatness" of the power spectral density. One could take this by using a Fast Fourier Transform of the signal in the time domain and find the standard deviation of the spectral density. If it is below some threshold, you have your white noise.

The main question here is: what type of signal do you have access to?
I bet you don't have direct access to the analog EM signal directly. So no use of FFT on this signal possible. You can't also try to build a phased-lock loop, which is the way your standard old radio tuner works ("Scanning" in your case).
Your only option is indeed to pick one frequency and listen too it (and try do detect when it's noise with FFT on sound). You might even only have access to the FFTed signal.
Problem here: If you want to detect a potential frequency using white noise you will pick up signals too easily.
Anyway, here is what I would try to do with this strategy:
Double integrate the autocorrelation of the spectral density over a fraction of a second of audio. And this for each frequency.
Then look for a FM frequency where this number is maxed.
Little explanation here:
Spectral density gives you a signal which most used frequencies are maxed.
If a bit of time later if the same frequencies are maxed then you have some supposedly clear audio. You get this by integrating the autocorrelation the spectral density for one audio frequency for a fraction of a second (using some function that grows larger than linear might also work)
You then just have to integrate this for all audio frequencies
Also be careful to normalize the integrals: a loud white noise signal should not get a higher score than a clear but low audio signal.

Several people have mentioned the FFT, which you'll want to do, but to then detect white noise you need to make sure that the magnitude is relatively constant over the range of audio frequencies. You'll want to look at magnitudes only, you can throw away the phases. You can compute an average and standard deviation for the magnitudes in O(N) time. For white noise, you should find the standard deviation to be a relatively small fraction of the average. If I remember my statistics right, it should be about (1/sqrt(N)) of the average.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.