Smooth sound seek - Java

I have a not-so-simple question about Java Sound (the javax.sound package).
I am implementing an MP3 player with cross-fade and smooth volume and seek controls.
I am reading the sound as a stream in 4096-byte chunks and calculating the position in milliseconds manually.
When I want to seek() (change the base position from which the stream will be read), I hear a really ugly "jump" in the sound wave. I tried examining JLayer and other MP3 APIs, but they either don't have a seek() function at all or they have this ugly sound jump too.
My question is: how can I make the jump from one sound-wave chunk to the other smoother? I tried interpolation, but a reasonable amount of time to not hear the jump is 300 ms, and that's too long for a seek() function.
Have you encountered this problem?
Do you know the solution?
I will paste a code sample here just to be sure.
public void seek( long pPosition )
{
    sourceDataLine.flush();
    seekIndex = ( sourceDataLine.getMicrosecondPosition() / 1000 ) - currentPositionInMilliseconds;
}

public long getPositionInMilliseconds()
{
    return ( sourceDataLine.getMicrosecondPosition() / 1000 ) - seekIndex;
}
the "position in milliseconds" is needed because of DataLine API of javax.sound
Thanks I'm frustrated...

You can't really create a smooth transition if the chunks you want to transition between are too short for cross-fading, but you can eliminate the worst of the artifacts at the boundaries.
The bad artifact I'm referring to often sounds like a click or pop, but many of them in short succession can sound like thrashing, and if the intervals are regular they may even introduce a specific pitch of their own. This kind of artifact is a result of cutting the audio into arbitrary blocks: the amplitude may jump from the end of one block to the start of the next, or from the end of a block to silence. There are a few ways to eliminate it. The most common is to move the boundary from the arbitrary location to the nearest 'zero crossing', so that there is no longer a jump or discontinuity, as in the sketch below. Alternatively, if you lay the two blocks over each other, you could look for a place where their values cross each other, preferably moving in the same direction.
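Here is a minimal sketch of the zero-crossing search, assuming the audio has already been decoded to signed PCM (16-bit mono in a short[]); the method name and layout are illustrative only:

// Returns the sample index nearest to 'start' where the waveform crosses zero,
// so that a cut made there avoids an amplitude discontinuity.
static int nearestZeroCrossing(short[] pcm, int start) {
    for (int offset = 0; start - offset > 0 || start + offset + 1 < pcm.length; offset++) {
        int after = start + offset;
        if (after + 1 < pcm.length && (pcm[after] <= 0) != (pcm[after + 1] <= 0)) {
            return after + 1;   // sign change between 'after' and 'after + 1'
        }
        int before = start - offset;
        if (before > 0 && (pcm[before - 1] <= 0) != (pcm[before] <= 0)) {
            return before;      // sign change between 'before - 1' and 'before'
        }
    }
    return start; // no crossing found (e.g. DC offset): cut at the original spot
}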

The only way I know to do this is by working directly on the data at the per-frame level. You have to "open up" the sounds to get at the bytes and do your computations directly. Most built-in Java controls have a granularity limited by the buffer size; in effect, they can only process one volume change per sound-data buffer.
Even when you are working at the per-frame level, there are problems to surmount due to Java's lack of real-time guarantees. But they are surmountable.
I made a "clip slicer," for example, that uses the equivalent of a clip as source sound. It takes random slices of the sample and strings them together. As little as 16 frames of overlapping interpolation works to keep the sound flowing smoothly. Using 1/10th of a second slices with 16-frame overlaps worked well for making an endlessly streaming brook from a 4-second recording.
I made a Theremin that takes mouse-motion-listener locations for volume and pitch. I got it to work quite smoothly with about 30 or 40 frames of latency. The trick was time-stamping the mouse-motion-listener outputs and basing the controls on calculations made from that data, since the events do not arrive or get processed smoothly in real time, which creates zippering and other discontinuities.
Another thing to consider: the raw sample data does not map linearly to decibels. A small volume differential at the low end is much more discontinuous (and prone to clicks) than the same linear interval at the high end. I solved this by mapping the audio data to decibel volumes and scaling the size of each volume-change step based on that amplitude mapping; a sketch of such a mapping is below. I hope some of these ideas prove helpful!
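A sketch of such a mapping, assuming normalized samples in the range -1..1; the 1e-10 floor is only there to avoid negative infinity:

// Linear amplitude (0..1) to decibels and back, so that volume ramps can be
// stepped in equal perceptual (dB) increments rather than equal linear ones.
static double amplitudeToDb(double amp) {
    return 20.0 * Math.log10(Math.max(amp, 1e-10));
}

static double dbToAmplitude(double db) {
    return Math.pow(10.0, db / 20.0);
}

Stepping a fade in equal dB increments and converting each step back to a linear factor gives much smoother results at the quiet end than stepping the linear gain directly.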


Making a sound file with varying playback speed in Java

I have managed to play a sound file at a different speed using answers from here, but I need to be able to adjust the speed as it plays. There are two methods I've thought of using. The first is to split the audio file into short clips and play each one after the last ends. I haven't tried that yet, but it seems like it could easily end with the file playing over itself or having short gaps.
The other method is to take the original file as a stream and then wrap it in a stream that speeds it up or slows it down as needed. This seems like it would work well, but in order to construct an AudioInputStream, I either need an InputStream of known length, which is impossible to figure out ahead of time, or a TargetDataLine, which is an interface with far more methods than I'd care to implement.
Is there a better way of doing this? Also, why does AudioInputStream need to know the length of the stream?
Alternatively, is there an external library I could use?
If you are simply playing back an audio file (e.g., a .wav) and are okay with the pitch of the sound being shifted, a simple possibility is to read the data from an AudioInputStream, translate it to PCM, interpolate through that data at the desired rate, translate back to bytes, and ship it out via a SourceDataLine.
To speed up or slow down in real time, loosely couple your inputs to the variable holding the increment used to progress through the incoming frames. To minimize discontinuities, you can smooth the transitions from one pitch to another over a given number of frames.
This is how real-time frequency changes are achieved in the open-source library AudioCue on GitHub. Smoothing between frequency changes there is set to occur over 1028 frames (approx. 1/40th of a second), but quicker changes are certainly possible. The sound data in that library is taken from an internal float array of PCM values, but a good example of the code needed to read the data as a line rather than a fixed array can be seen in the first code example of the Java Sound Trail, "Using Files and Format Converters". You might want to use an InputStream as the argument for the AudioInputStream. At the point in the example where it says "Here, do something useful..", you would convert to PCM, cursor through the resulting PCM at the desired frequency rate using linear interpolation, and then repackage and send the data out via a SourceDataLine. A rough sketch of that interpolation step follows below.
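Here is a minimal sketch of that cursoring-with-linear-interpolation idea, assuming float PCM already in memory; the names are my own illustration, not AudioCue's actual code:

// Reads frames from 'pcm' at a variable rate using a fractional cursor.
// rate = 1.0 plays normally, 2.0 plays twice as fast (an octave up), etc.
// For real-time control you would nudge 'rate' gradually between frames.
static float[] resampleLinear(float[] pcm, double rate) {
    int outLength = (int) ((pcm.length - 1) / rate);
    float[] out = new float[outLength];
    double cursor = 0.0;
    for (int i = 0; i < outLength; i++) {
        int idx = (int) cursor;
        double frac = cursor - idx;   // fractional position between two frames
        out[i] = (float) (pcm[idx] * (1 - frac) + pcm[idx + 1] * frac);
        cursor += rate;
    }
    return out;
}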
If you wish to preserve pitch (time-stretch or compress only), then this starts to require more heavy-duty DSP. This thread at the Signal Processing Stack Exchange site has some info on that. I've had some success making granules with a Hamming window to aid cross-fading between them, but some of the other solutions were over my head (and I haven't been back to this problem in a long while). It was possible to change the spacing of the granules in real time, if I remember correctly. It didn't sound as good as Audacity's algorithm, though, but that's probably more on me than on the approach. I'm pretty much self-taught and experimenting, not working in the field professionally.
(I believe Phil's answer will get you going nicely. I'm just posting this to add my two cents about resampling.)
Short answer: Create an AudioInputStream that either drops samples or adds zero samples. As the length you can set AudioSystem.NOT_SPECIFIED.
Long answer: If you add zero samples, you might want to interpolate, but not linearly. The reason you have to interpolate when upsampling is aliasing, which you want to avoid, and you do so by applying a lowpass filter. The reason for this is simple. The Nyquist-Shannon theorem states that when a signal is sampled at X Hz, you can only unambiguously represent frequencies up to X/2 Hz. When you upsample, you increase the sample frequency, so in theory you can represent a larger frequency range. Indeed, when simply adding zeros you see some energy in those additional frequency ranges, which shouldn't be there, because you have no information about it. So you need to "cut it off" using a lowpass filter. More about upsampling can be found on Wikipedia.
Long story short, there is a proper way to do it. You seem to be okay with distortions, so doing it the right way may be unnecessary and a waste of time.
Shameless plug: If you nevertheless want to do it somewhat right, you might find the Resample class of jipes useful. It's not a universal resampler, i.e., it only supports a limited number of factors, like 2, 4, ..., but it may prove useful for you.
import com.tagtraum.jipes.math.MultirateFilters.Resampler;

[...]

float[] original = ... ; // original signal as float array

// downsample by a factor of 2
Resampler downsampler2 = new MultirateFilters.Resampler(1, 2);
float[] downsampled = downsampler2.map(original);

// upsample by a factor of 2
Resampler upsampler2 = new MultirateFilters.Resampler(2, 1);
float[] upsampled = upsampler2.map(original);
If you want to do time-scale modification (TSM), i.e., change the tempo without changing the frequencies, you might want to use Rubberband for Java.

Record audio in Java and determine in real time if a tone of frequency x was played; if so, do something

I want to be able to detect a tone of a predetermined frequency using Java. What I am doing is playing a tone (the frequency of the tone is variable by user input) and trying to detect whether the tone is of a certain frequency. If it is, I execute a certain method. From what I have read I will need to use an FFT, but I'm not sure how to implement it in Java. There seems to be a lot of documentation on how to do it, but what documentation there is involves analyzing an audio file rather than real-time analysis. I don't need to save the audio to a file, just determine if and when a tone of frequency x was recorded.
Ideally I would like to record at a sample rate of 44 kHz and, after determining whether the tone was detected, determine when it was detected with an accuracy of ±3 ms. However, a lower accuracy would be acceptable as long as it isn't ridiculous (e.g., +100 ms). I know roughly what I need to do from what I have looked up, but I need help tying it all together. In pseudocode it would look roughly like this (I think):
Note that I know roughly, within ±1 s, when a tone of satisfying frequency may be detected.
for (i = 0; i < 440000 * 2; i++) {   // *2 because of expected appearance interval; may change
    record sound sample
    fft(sound sample)
    if (frequencySoundSample > x) {
        do something
        return
    }
}
There will be considerable background noise while the tone is played. However, the tone will have a very high frequency, around 15-22 kHz, so I believe that simply looking for when the recorder detects a very high frequency will be enough to be sure it is my tone (the tone will also be played at high amplitude for maybe 0.5 s or 1 s). I know that there will not be other high-frequency sounds in the background noise (I am expecting a background frequency high of maybe 5 kHz).
I have two questions, then. Is the pseudocode I have provided sufficient for what I want to do? If it isn't, or if there is a better way of doing this, I'm all for it. Second, how would I implement this in Java? I understand what I need to do, but I'm having trouble tying it all together. I'm pretty decent with Java, but I'm not familiar with the syntax involved with audio and I don't have any experience with FFTs. Please be explicit and give code with comments. I've been trying to figure this out for a while; I just need to see it all tied together. Thank you.
EDIT
I understand that using a for loop like the one I have will not produce the frequency that I want. It was more to show roughly what I want to do: recording, performing the FFT, and testing the frequency all at once as time progresses.
If you're just looking for a specific frequency then an FFT-based method is probably a bad choice for your particular application, for two reasons:
it's overkill - you're computing an entire spectrum just to detect the magnitude at one point
to get 3 ms resolution for your onset detection you'll need a large overlap between successive FFTs, which will require much more CPU bandwidth than just processing successive blocks of samples
A better choice for detecting the presence or absence of a single tone is the Goertzel algorithm (aka Goertzel filter). It's effectively a DFT evaluated at a single frequency-domain bin, and it is widely used for tone detection. It's much less computationally expensive than an FFT, very simple to implement, and you can test its output on every sample, so there is no resolution problem (other than those dictated by the laws of physics). You'll need to low-pass filter the magnitude of the output and then use some kind of threshold detection to determine the onset time of your tone. A sketch follows below.
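Here is a minimal, self-contained sketch of the Goertzel power computation for one block of samples; the block length and detection threshold are choices you would tune:

// Goertzel power at 'targetFreq' Hz for one block of samples.
// Equivalent to the squared magnitude of a single DFT bin.
static double goertzelPower(float[] samples, double targetFreq, double sampleRate) {
    double omega = 2.0 * Math.PI * targetFreq / sampleRate;
    double coeff = 2.0 * Math.cos(omega);
    double s1 = 0.0, s2 = 0.0;
    for (float sample : samples) {
        double s0 = sample + coeff * s1 - s2;   // second-order recurrence
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

You would run this over successive short blocks, low-pass the resulting powers, and flag an onset when the smoothed power crosses your threshold.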
Note that there are a number of useful questions and answers on SO already about tone detection and using the Goertzel algorithm (e.g. Precise tone onset/duration measurement?) - I suggest reading these along with the Wikipedia entry as a good starting point.
I'm actually working on a similar project with pitch detection, in Java as well. If you want to use an FFT, you could do it with these steps. Java has a lot of libraries that can make this process easy for you.
First, you need to read in the sound. This can be done using Java Sound, a built-in library with functions that make it easy to record sound. Examples can be found here. The default sample rate is 44,100 Hz (CD quality). These examples can get you from playing the actual tone to an array of bytes representing it.
Second, you should take the FFT with JTransforms. Here is an example of FFT being taken on a collection of samples.
The FFT gives you an array twice the length of the array of samples you passed it. You need to step through the FFT array in twos, since each bin is represented by a real and an imaginary part. Compute the magnitude of each bin with sqrt(re^2 + im^2), then find the largest magnitude. The index of that bin corresponds to the frequency you're looking for.
Keep in mind, you don't take the FFT of the entire sound at once. You break the sound up into chunks and FFT each one. The chunks can overlap for higher accuracy, but that shouldn't be needed, since you're just looking for a predetermined note. If you want to improve the results, you can also window each chunk before transforming it.
Once you have all the FFTs, they should confirm a certain frequency, and you can check that against the note you want; a rough sketch of the magnitude-and-peak step is below.
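A rough sketch of that step, assuming an interleaved complex FFT output of the form [re0, im0, re1, im1, ...] (the layout JTransforms' complexForward uses):

// 'fft' holds interleaved complex output: [re0, im0, re1, im1, ...].
// Returns the frequency (Hz) of the bin with the largest magnitude.
static double dominantFrequency(double[] fft, double sampleRate) {
    int bins = fft.length / 2;                 // number of complex bins = FFT size
    int peakBin = 0;
    double peakMag = 0.0;
    for (int bin = 0; bin < bins / 2; bin++) { // only bins below Nyquist
        double re = fft[2 * bin];
        double im = fft[2 * bin + 1];
        double mag = Math.sqrt(re * re + im * im);
        if (mag > peakMag) {
            peakMag = mag;
            peakBin = bin;
        }
    }
    return peakBin * sampleRate / bins;        // bin index -> Hz
}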
If you want to try and visualize this, I'd suggest using JFreeChart. It's another library that makes it easy to graph things.

I want to know how to use a music file chosen by the user, then detect when a certain sound frequency occurs (Android)

So basically I want to get the range of 60-150 Hz, which is the general area where the bass in a song lies. Whenever the signal is in this range I want to run a function, and only in that range. My problem is that I have tried to look up the functions needed to do so, but with no luck. If someone could show me a good article or explanation on this, that would be great! I appreciate all the help, and I will continue looking on my own. If more explanation is needed, I can provide whatever information is needed!
Austin.
UPDATE: Here is a simplified algorithm:
User selects the song they want
Song loads onto player
A function scans the song, finds the lower frequencies throughout, and outputs a pattern.
Step 1) Do a fast fourier transform: http://en.wikipedia.org/wiki/Fast_Fourier_transform
An FFT takes a piece of sound and transforms it into the frequency/time domain - as in, which frequencies are playing, how intensely, and during what parts of the sound. This is a useful mathematical operation that relies on the property that all sound, no matter how complex, can be fundamentally constructed out of one or more sine waves of different frequencies and amplitudes.
If you've ever looked at a spectrogram, for example in foobar2000, that is implemented using an FFT.
I suggest that instead of trying to implement the FFT yourself, you find a library that is well tested and fast, such as FFTW (http://en.wikipedia.org/wiki/FFTW), which is written in C.
Step 2) Now that you've FFTed the part of the sound the user is listening to, you can simply inspect the frequency bins and do whatever you want! Detecting bass kicks is not quite as simple as "is this frequency bin a high value?", though, because you may mistake bass lines for bass kicks. You may need to do further testing and research to get it to work juuust right. A starting-point sketch for inspecting the 60-150 Hz bins follows below.
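A starting-point sketch, again assuming interleaved [re0, im0, re1, im1, ...] FFT output; it sums the magnitudes of the bins that fall in the 60-150 Hz range so you can compare that energy against a running average:

// Sums FFT magnitudes for bins falling between loHz and hiHz (e.g. 60-150).
// 'fft' is interleaved [re0, im0, re1, im1, ...]; 'n' is the FFT size.
static double bandEnergy(double[] fft, int n, double sampleRate,
                         double loHz, double hiHz) {
    int loBin = (int) Math.ceil(loHz * n / sampleRate);   // Hz -> bin index
    int hiBin = (int) Math.floor(hiHz * n / sampleRate);
    double energy = 0.0;
    for (int bin = loBin; bin <= hiBin; bin++) {
        double re = fft[2 * bin];
        double im = fft[2 * bin + 1];
        energy += Math.sqrt(re * re + im * im);
    }
    return energy;   // compare against a running average to flag bass hits
}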
EDIT: Delyan suggests http://www.clear.rice.edu/elec301/Projects01/beat_sync/beatalgo.html and it looks pretty good.

Finding the 'volume' of a .wav at a given time

I am working on a small example application for my fourth-year project (dealing with Functional Reactive Programming). The idea is to create a simple program that can play a .wav file and then show a 'bouncing' animation of the current volume of the playing song (like in audio-recording software). I'm building this in Scala, so I have mainly been looking at Java libraries and existing solutions.
Currently, I have managed to play a .wav file easily, but I can't seem to achieve the second goal. Basically, is there a way I can decode a .wav file so that I have some way of accessing
the 'volume' at any given time? By volume I think I mean its amplitude, but I may be wrong about this - Higher Physics was a while ago...
Clearly, I don't know much about this at all so it would be great if someone could point me in the right direction!
In digital audio processing you typically refer to the momentary peak amplitude of the signal (this is also called PPM - peak programme metering). Depending on how accurate you want to be, or whether you wish to model some standardized metering, you could either
just use a sliding window of sample frames (find the maximum absolute value per window), or
implement some sort of peak-hold mechanism that retains the last peak value for a given duration and then lets the value 'fall' by a given number of decibels per second.
The other measuring mode is RMS, which is calculated by integrating over a certain time window: add the squared sample values, divide by the window length, and take the square root, thus root-mean-square (RMS). This gives a better idea of the 'energy' of the signal, moving more smoothly than peak measurements but not capturing the maximum values observed. This mode is sometimes called a VU meter as well. You can approximate it with a sort of lagging (lowpass) filter, e.g. y[i] = y[i-1]*a + |x[i]|*(1-a), for some value 0 < a < 1.
You typically display the values logarithmically, i.e., in decibels, as this corresponds better with our perception of signal strength and, for most signals, produces a more regular coverage of your screen space. A sketch of a simple per-window meter is below.
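A sketch of both measuring modes over one window of normalized float samples; the window size and the silence floor are arbitrary choices:

// Peak and RMS over one window of samples in -1..1 (e.g. 1024 frames),
// returned in decibels full scale (0 dBFS = maximum).
static double[] peakAndRmsDb(float[] window) {
    double peak = 0.0, sumSquares = 0.0;
    for (float s : window) {
        peak = Math.max(peak, Math.abs(s));
        sumSquares += s * s;
    }
    double rms = Math.sqrt(sumSquares / window.length);
    double peakDb = 20.0 * Math.log10(Math.max(peak, 1e-10)); // floor avoids -Infinity
    double rmsDb = 20.0 * Math.log10(Math.max(rms, 1e-10));
    return new double[] { peakDb, rmsDb };
}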
Three projects I'm involved with may help you:
ScalaAudioFile which you can use to read the sample frames from an AIFF or WAVE file
ScalaAudioWidgets, which is a still-young and incomplete project to provide some audio-application widgets on top of scala-swing, including a PPM view - just use a sliding window and set the window's current peak value (and optionally RMS) at a regular interval, and the view will take care of peak-hold and fall times
(ScalaCollider, a client for the SuperCollider sound-synthesis system, which you might use to play back the sound file and measure the peak and RMS amplitudes in real time. The latter is probably overkill for your project and would involve a serious learning curve if you have never heard of SuperCollider. The advantage is that you don't need to worry about synchronizing your sound playback with the meter display.)
In a .wav file, the data at a given point in the stream IS the volume (shifted by half of the dynamic range). In other words, if you know the type of the .wav file (for example, 8-bit mono), each byte represents a single sample. If you know the sample rate (say 44,100 Hz), then multiply the time in seconds by 44,100 and that is the byte you want to look at.
The value of the byte is the volume (distance from the middle: 0 and 255 are the peaks, 128 is the midpoint, i.e. silence). This assumes the encoding is not mu-law. I found some good info on how to tell the difference, or better yet, convert between these formats, here:
http://www.gnu.org/software/octave/doc/interpreter/Audio-Processing.html
You may want to average these samples over a window of some fixed number of samples, though. A sketch of the single-sample lookup is below.
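A sketch of the single-sample lookup for the 8-bit mono case described above (real code would verify the format first and loop on skip(), which may skip fewer bytes than asked):

import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

// Reads the unsigned 8-bit mono sample nearest to 'seconds' and returns
// its distance from the midpoint, i.e. the instantaneous amplitude.
static int amplitudeAt(File wavFile, double seconds) throws Exception {
    try (AudioInputStream in = AudioSystem.getAudioInputStream(wavFile)) {
        float sampleRate = in.getFormat().getSampleRate(); // e.g. 44100
        long byteOffset = (long) (seconds * sampleRate);   // 1 byte per sample (8-bit mono)
        in.skip(byteOffset);
        int sample = in.read();                 // 0..255, midpoint = 128
        return Math.abs(sample - 128);          // 0 = silence, 127 = peak
    }
}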

Can you programmatically detect white noise?

The Dell Streak has been discovered to have an FM radio with very crude controls. 'Scanning' is unavailable by default, so my question is: does anyone know how, using Java on Android, one might 'listen' to the FM radio as we iterate up through the frequency range, detecting white noise (or a good signal), so as to act much like a normal radio's seek function?
I have done some practical work in this specific area. I would recommend (if you have a little time for it) trying a little experimentation before resorting to FFTs. The PCM stream can be interpreted very complexly and subtly (as with high-quality filtering and resampling), but it can also, for many practical purposes, be treated as the path of a wiggly line.
White noise is unpredictable shaking of the line, which is nevertheless quite continuous in intensity (RMS, absolute mean...). Acoustic content is recurrent wiggling and occasional surprises (jumps, leaps) :]
Non-noise-like content of a signal may be estimated by performing quick calculations on a running window of the PCM stream.
For example, noise will strongly tend to have a higher value for the absolute integral of its derivative than non-noise. I think this is the academic way of saying it:
// ratio of the mean absolute derivative to the mean absolute value;
// noise scores high on this, tonal content scores lower
double sumd0 = 0, sumd1 = 0;
for (int n = 1; n < pcm.length; n++) {
    sumd0 += Math.abs(pcm[n]);
    sumd1 += Math.abs(pcm[n] - pcm[n - 1]);
}
double wNoiseRatio = 0.8; // quite easily discovered empirically, bit tricky to calculate
if ((sumd1 / sumd0) < wNoiseRatio) {
    /* not like noise */
}
Also, the running absolute average over ~16 to ~30 samples will tend to vary less over white noise than over an acoustic signal:
// two running absolute averages over 24-sample windows, offset by 16 samples;
// over white noise the two averages stay close to each other
double runAbsAve1 = 0, runAbsAve2 = 0;
for (int n = 24; n < pcm.length - 16; n++) {
    runAbsAve1 += Math.abs(pcm[n]) - Math.abs(pcm[n - 24]);
}
for (int n = 24 + 16; n < pcm.length; n++) {
    runAbsAve2 += Math.abs(pcm[n]) - Math.abs(pcm[n - 24]);
}
double unusualDif = 5; // a factor; tighter values for longer measures
if (Math.abs(runAbsAve1 - runAbsAve2) > (runAbsAve1 + runAbsAve2) / (2 * unusualDif)) {
    /* not like noise */
}
This concerns how white noise tends to be non-sporadic over a large enough span to average out its entropy, while acoustic content is sporadic (localized power) and recurrent (repetitive power).
These simple tests react to acoustic content with lower frequencies and could be drowned out by high-frequency content. There are simple-to-apply lowpass filters which could help (and no doubt other adaptations).
Also, the root mean square can be divided by the mean absolute sum, providing another ratio which should be particular to white noise, though I can't figure out what it is right now. The ratio will also differ for the signal's derivatives.
I think of these as simple formulaic signatures of noise. I'm sure there are more...
Sorry not to be more specific; it is fuzzy and imprecise advice, but so is performing simple tests on the output of an FFT. For better explanation and more ideas, perhaps check out statistical and stochastic measurements of entropy and randomness on Wikipedia, etc.
Use a Fast Fourier Transform.
This is what a Fast Fourier Transform is for: it analyzes the signal and determines its strength at various frequencies. If there's any spike in the FFT curve, it should indicate that the signal is not simply white noise.
Here is a library which supports FFTs. Also, here is a blog with source code in case you want to learn about what the FFT does.
If you don't have FFT tools available, just a wild suggestion:
Try to compress a few milliseconds of audio.
A typical feature of noise is that it compresses much less than a clear signal; a quick sketch using the JDK's Deflater is below.
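This is a rough sketch only; the point at which a ratio counts as "incompressible" depends on the PCM format and would need tuning:

import java.util.zip.Deflater;

// Deflates a few milliseconds of raw PCM bytes; noise compresses poorly,
// so a ratio near 1.0 suggests white noise, a clearly lower ratio a signal.
static double compressionRatio(byte[] pcmChunk) {
    Deflater deflater = new Deflater();
    deflater.setInput(pcmChunk);
    deflater.finish();
    byte[] out = new byte[pcmChunk.length * 2]; // head-room for incompressible input
    int compressedSize = deflater.deflate(out);
    deflater.end();
    return (double) compressedSize / pcmChunk.length;
}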
As far as I know there is no API, or even drivers, for the FM radio in the Android SDK, and unless Dell releases one you will have to roll your own. It's actually even worse than that: all(?) new chipsets have an FM radio, but not all phones have an FM radio application.
The old Windows Mobile had the same problem.
For white-noise detection you need to do an FFT and check that the spectrum is more or less continuous. But recording from FM might be a problem.
Just high-pass filtering the signal will give a good idea, and this has sometimes been used for squelch on FM radios.
Note that this is comparable to what the derivative suggestion above was getting at: taking the derivative is a simple form of high-pass filter, and taking the absolute value of that is a crude way of measuring power.
Do you have a subscription to the IEEE Xplore library? There are countless papers (one picked at random) on this very topic.
A very simplistic method would be to observe the "flatness" of the power spectral density: take a Fast Fourier Transform of the signal in the time domain and find the standard deviation of the spectral density. If it is below some threshold, you have your white noise.
The main question here is: what type of signal do you have access to?
I bet you don't have direct access to the analog EM signal, so no use of an FFT on that signal is possible. You also can't try to build a phase-locked loop, which is the way your standard old radio tuner works ("scanning", in your case).
Your only option is indeed to pick one frequency and listen to it (and try to detect when it's noise with an FFT on the sound). You might even only have access to the FFTed signal.
The problem here: if you want to detect a potential frequency using white noise, you will pick up signals too easily.
Anyway, here is what I would try to do with this strategy:
Double-integrate the autocorrelation of the spectral density over a fraction of a second of audio, and do this for each frequency.
Then look for an FM frequency where this number is maximized.
A little explanation:
The spectral density gives you a signal in which the most-used frequencies are maxed.
If, a bit of time later, the same frequencies are still maxed, then you have some supposedly clear audio. You get this by integrating the autocorrelation of the spectral density for one audio frequency over a fraction of a second (using some function that grows faster than linear might also work).
You then just have to integrate this over all audio frequencies.
Also be careful to normalize the integrals: a loud white-noise signal should not get a higher score than a clear but quiet audio signal.
Several people have mentioned the FFT, which you'll want to do, but to then detect white noise you need to make sure that the magnitude is relatively constant over the range of audio frequencies. You'll want to look at the magnitudes only; you can throw away the phases. You can compute an average and standard deviation for the magnitudes in O(N) time. For white noise, the standard deviation should be a relatively small fraction of the average - if I remember my statistics right, about 1/sqrt(N) of the average. A minimal sketch of that test is below.
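Assuming the FFT magnitudes are already in a double[], the flatness check is just a mean and standard deviation (the threshold ratio is an assumption to tune, per the 1/sqrt(N) estimate above):

// White-noise test from FFT magnitudes: for noise, the standard deviation
// of the magnitudes is a small fraction of their mean.
static boolean looksLikeWhiteNoise(double[] magnitudes, double maxRatio) {
    double sum = 0.0;
    for (double m : magnitudes) sum += m;
    double mean = sum / magnitudes.length;

    double var = 0.0;
    for (double m : magnitudes) var += (m - mean) * (m - mean);
    double stdDev = Math.sqrt(var / magnitudes.length);

    return stdDev / mean < maxRatio;   // e.g. maxRatio ~ 0.5, tune empirically
}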
