I have managed to play a sound file at a different speed using answers from here, but I need to be able to adjust the speed as it plays. There are two methods I've thought of. The first is to split the audio file into short clips and play each one after the last ends. I haven't tried that yet, but it seems like it could easily end with the file playing over itself or with short gaps.
The other method is to take the original file as a stream and then wrap it in a second stream that speeds it up or slows it down as needed. This seems like it would work well, but to construct an AudioInputStream, I either need an InputStream of known length, which is impossible to figure out ahead of time, or a TargetDataLine, which is an interface with far more methods than I'd care to implement.
Is there a better way of doing this? Also, why does AudioInputStream need to know the length of the stream?
Alternately, is there an external library I could use?
If you are simply playing back an audio file (e.g., a .wav) and are okay with the pitch of the sound being shifted, a simple possibility is to read the data from an AudioInputStream, translate it to PCM, interpolate through that data at the desired rate, then translate back to bytes and ship them out via a SourceDataLine.
To speed up or slow down in real time, loosely couple inputs to the variable holding the increment being used to progress through the incoming frames. To minimize discontinuities, you can smooth out the transitions from one pitch to another over a given number of frames.
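A minimal sketch of that interpolation core might look like this (mono float PCM assumed; the class and method names are just illustrative, not from any library):

```java
// Sketch: linear interpolation through PCM frames at a variable rate.
// A rate > 1.0 speeds playback up (raising pitch); < 1.0 slows it down.
public class VariableRatePlayback {

    // Walk a fractional cursor through the input, emitting one
    // interpolated frame per output index.
    public static float[] resample(float[] pcm, double rate) {
        int outLen = (int) ((pcm.length - 1) / rate) + 1;
        float[] out = new float[outLen];
        double cursor = 0.0;
        for (int i = 0; i < outLen; i++) {
            int idx = (int) cursor;
            double frac = cursor - idx;
            float a = pcm[idx];
            float b = (idx + 1 < pcm.length) ? pcm[idx + 1] : a;
            out[i] = (float) (a + frac * (b - a)); // linear interpolation
            cursor += rate; // to glide between speeds, smooth changes to "rate" here
        }
        return out;
    }
}
```

In a real-time loop you would not resample a whole array at once; instead, the increment added to `cursor` would be the variable that your speed control adjusts (and smooths) per frame.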
This is done to achieve real-time frequency changes in the open-source library AudioCue, on GitHub. Smoothing there between frequency changes is set to occur over 1028 frames (approximately 1/40th of a second), but quicker changes are certainly possible. The sound data in that library is taken from an internal float array of PCM values. But a good example of the code needed to read the data as a line rather than a fixed array can be seen in the first code example of the Sound Trail's "Using Files and Format Converters" section. You would likely use an InputStream as the argument for the AudioInputStream. At the point in the example where it says "Here, do something useful...", you would convert to PCM, cursor through the resulting PCM at the desired frequency rate using linear interpolation, and then repackage and send the data out via a SourceDataLine.
If you wish to preserve pitch (time-stretch or compress only), then this starts to require more heavy-duty DSP. This thread at the StackExchange Signal Processing site has some info on that. I've had some success making granules with a Hamming window to aid cross-fading between them, but some of the other solutions were over my head (and I haven't been back to this problem in a long while). It was possible to change the spacing of the granules in real time, if I remember correctly. It didn't sound as good as Audacity's algorithm, though, but that's probably more on me than on the approach. I'm pretty much self-taught and experimenting, not working in the field professionally.
(I believe Phil's answer will get you going nicely. I'm just posting this to add my two cents about resampling.)
Short answer: Create an AudioInputStream that either drops samples or adds zero samples. For the length you can pass AudioSystem.NOT_SPECIFIED.
Long answer: If you add zero samples, you might want to interpolate, but not linearly. The reason you have to interpolate for upsampling is aliasing, which you want to avoid. You do so by applying a lowpass filter. The reason is simple: the Nyquist-Shannon theorem states that when a signal is sampled at X Hz, you can only unambiguously represent frequencies up to X/2 Hz. When you upsample, you increase the sample frequency, so in theory you can represent a larger frequency range. Indeed, when simply adding zeros you see some energy in those additional frequency ranges, energy which shouldn't be there, because you have no information about it. So you need to "cut it off" using a lowpass filter. More about upsampling can be found on Wikipedia.
Long story short, there is a proper way to do it. You seem to be OK with distortions, so doing it the proper way may be unnecessary and simply a waste of time.
Shameless plug: If you nevertheless want to do it somewhat right, you might find the Resample class of jipes useful. It's not a universal resampler, i.e., it only supports a limited number of factors, like 2, 4, ..., but it may prove useful for you.
import com.tagtraum.jipes.math.MultirateFilters.Resampler;
[...]
float[] original = ... ; // original signal as floats
Resampler downsampler2 = new Resampler(1, 2); // downsample by a factor of 2
float[] downsampled = downsampler2.map(original);
Resampler upsampler2 = new Resampler(2, 1);   // upsample by a factor of 2
float[] upsampled = upsampler2.map(original);
If you want to do time-scale modification (TSM), i.e., change the tempo without changing the frequencies, you might want to use Rubberband for Java.
Android provides a default of 15 steps for its sound systems which you can access through Audio Manager. However, I would like to have finer control.
One method of doing so seems to be altering specific files within the Android system to divide the sound levels even further than the default. I would like to programmatically achieve the same effect using Java.
Fine volume control is an example of the app being able to divide the sound levels into one hundred distinct intervals. How do I achieve this?
One way, in Java, to get very precise volume adjustment is to access the PCM data directly and multiply it by some factor, usually from 0 up to 1. Another is to try and access the line's volume control, if it has one. I've given up trying to do the latter. The precision is okay in terms of amplitude, but the timing is terrible. One can only have one volume change per audio buffer read.
To access the PCM data directly, one has to iterate through the audio read buffer, translate the bytes into PCM, perform the multiplication then translate back to bytes. But this gives you per-frame control, so very smooth and fast fades can be made.
EDIT: To do this in Java, first check out the sample code snippet at the start of this java tutorial link, in particular, the section with the comment
// Here, do something useful with the audio data that's now in the audioBytes array...
There are several StackOverflow questions that show code for the math to convert audio bytes to PCM and back, using Java. Should not be hard to uncover with a search.
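For illustration, here is a hedged sketch of that math for 16-bit little-endian PCM (the class and method names are mine, not from the tutorial):

```java
// Sketch of per-frame volume control: decode 16-bit little-endian PCM
// samples, multiply by a gain factor (0.0 to 1.0), and re-encode.
// Assumes the byte array holds complete 2-byte samples.
public class PcmGain {

    public static byte[] applyGain(byte[] audioBytes, double gain) {
        byte[] out = new byte[audioBytes.length];
        for (int i = 0; i < audioBytes.length; i += 2) {
            // little-endian: low byte first, high byte carries the sign
            int sample = (audioBytes[i] & 0xFF) | (audioBytes[i + 1] << 8);
            int scaled = (int) (sample * gain);
            // clamp to the 16-bit range to avoid wrap-around distortion
            scaled = Math.max(-32768, Math.min(32767, scaled));
            out[i] = (byte) (scaled & 0xFF);
            out[i + 1] = (byte) (scaled >> 8);
        }
        return out;
    }
}
```

Because the gain is applied per sample, you can also vary it gradually across the buffer for click-free fades, as described above.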
Pretty late to the party, but I'm currently trying to solve this issue as well. If you are making your own media player app and are running an instance of a MediaPlayer, then you can use the function setVolume(leftScalar, rightScalar), where leftScalar and rightScalar are floats in the range of 0.0 to 1.0, representing logarithmic-scale volume for each respective ear.
HOWEVER, this means that you must have a reference to the currently active MediaPlayer instance. If you are making a music app, no biggie. If you're trying to run a background service that allows users to give higher precision over all media output, I'm not sure how to use this in that scenario.
Hope this helps.
I want to be able to detect a tone of a predetermined frequency using Java. What I am doing is playing a tone (the frequency of the tone is variable by user input) and I am trying to detect if the tone is of a certain frequency. If it is, I execute a certain method. From what I have read I will need to use FFT, but I'm not sure how to implement it in Java. There seems to be a lot of documentation on how to do it, but what documentation there is involves looking at an audio file rather than real-time analysis. I don't need to save the audio to a file, just determine if and when a tone of frequency x was recorded.
Ideally I would like to record at a sample rate of 44 kHz and, after determining if a tone was detected, determine when the tone was detected with an accuracy of ±3 ms. However, a lower accuracy would be acceptable as long as it isn't ridiculous (e.g., ±100 ms). I know roughly what I need to do from what I have looked up, but I need help tying it all together. In pseudocode it would look roughly like this (I think):
Note that I know roughly, within ±1 s, when a tone of satisfying frequency may be detected.
for (i = 0; i < 440000 * 2; i++) { // *2 because of expected appearance interval; may change
    record sound sample
    fft(sound sample)
    if (frequencySoundSample > x) {
        do something
        return
    }
}
There will be considerable background noise while the tone is played. However, the tone will have a very high frequency, like 15-22 kHz, so it is my belief that by simply looking for when the recorder detects a very high frequency I can be sure it is my tone (also, the tone will be played at a high amplitude for maybe 0.5 s or 1 s). I know that there will not be other high-frequency sounds in the background noise (I am expecting a background frequency high of maybe 5 kHz).
I have two questions, then. Is the pseudocode I have provided sufficient for what I want to do? If it isn't, or if there is a better way of doing this, I'm all for it. Second, how would I implement this in Java? I understand what I need to do, but I'm having trouble tying it all together. I'm pretty decent with Java, but I'm not familiar with the syntax involved with audio and I don't have any experience with FFT. Please be explicit and give code with comments. I've been trying to figure this out for a while; I just need to see it all tied together. Thank you.
EDIT
I understand that using a for loop like I have will not produce the frequency that I want. It was more to show roughly what I want: recording, performing the FFT, and testing the frequency all at once as time progresses.
If you're just looking for a specific frequency then an FFT-based method is probably a bad choice for your particular application, for two reasons:
it's overkill - you're computing an entire spectrum just to detect the magnitude at one point
to get 3 ms resolution for your onset detection you'll need a large overlap between successive FFTs, which will require much more CPU bandwidth than just processing successive blocks of samples
A better choice for detecting the presence or absence of a single tone is the Goertzel algorithm (aka Goertzel filter). It's effectively a DFT evaluated at a single frequency domain bin, and is widely used for tone detection. It's much less computationally expensive than an FFT, very simple to implement, and you can test its output on every sample, so no resolution problem (other than those dictated by the laws of physics). You'll need to low pass filter the magnitude of the output and then use some kind of threshold detection to determine the onset time of your tone.
Note that there are a number of useful questions and answers on SO already about tone detection and using the Goertzel algorithm (e.g. Precise tone onset/duration measurement?) - I suggest reading these along with the Wikipedia entry as a good starting point.
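For illustration, a minimal Goertzel magnitude computation might look like the sketch below (the class and method names are mine; a real detector would run this over successive blocks and apply threshold logic to the result):

```java
// Sketch of a Goertzel filter: measures the magnitude of one target
// frequency over a block of samples, without computing a full FFT.
public class Goertzel {

    public static double magnitude(float[] samples, double targetFreq, double sampleRate) {
        double omega = 2.0 * Math.PI * targetFreq / sampleRate;
        double coeff = 2.0 * Math.cos(omega);
        double s1 = 0, s2 = 0;
        // second-order recurrence evaluated once per sample
        for (float x : samples) {
            double s0 = x + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        // power at the target frequency; sqrt gives the magnitude
        double power = s1 * s1 + s2 * s2 - coeff * s1 * s2;
        return Math.sqrt(Math.max(0, power));
    }
}
```

A block of 256 samples at 44.1 kHz is about 5.8 ms, which fits comfortably inside the ±3 ms onset budget once you smooth and threshold the output.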
I'm actually working on a similar project with pitch detection, in Java as well. If you want to use FFT, you could do it with these steps. Java has a lot of libraries that can make this process easy for you.
First, you need to read in the sound file. This can be done using Java Sound. It's a built-in library with functions that make it easy to record sound. Examples can be found here. The most common sample rate is 44,100 Hz (CD quality). These examples can get you from playing the actual tone to an array of sample values representing the tone.
Second, you should take the FFT with JTransforms. Here is an example of FFT being taken on a collection of samples.
FFT gives you an array twice the length of the array of samples you passed it. You need to step through the FFT array two elements at a time, since each bin is represented by a real and an imaginary part. Compute the magnitude of each bin with sqrt(re^2 + im^2). Then find the largest magnitude. The index of that bin corresponds to the frequency you're looking for (frequency = binIndex * sampleRate / N, where N is the number of samples).
Keep in mind, you don't take the FFT of the entire sound at once. You break the sound up into chunks and FFT each one. The chunks can overlap for higher accuracy, but that shouldn't be necessary, since you're just looking for a predetermined note. To improve accuracy, you can also apply a window function to each chunk before transforming it.
Once you have all the FFTs, they should agree on a certain frequency, and you can check that frequency against the note you want.
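To illustrate the magnitude step, here is a sketch that assumes an interleaved (re, im, re, im, ...) layout for the FFT output, which is how JTransforms lays out complex results; check your library's documentation for the exact convention. The class name is mine:

```java
// Sketch: scan an interleaved complex FFT result for the bin with the
// largest magnitude and convert that bin index to a frequency in Hz.
public class PeakFrequency {

    public static double dominantFrequency(double[] fftInterleaved, double sampleRate) {
        int bins = fftInterleaved.length / 2;
        int half = bins / 2; // for real input, only the first half is non-redundant
        int peakBin = 0;
        double peakMag = -1;
        for (int bin = 1; bin < half; bin++) { // skip bin 0 (the DC offset)
            double re = fftInterleaved[2 * bin];
            double im = fftInterleaved[2 * bin + 1];
            double mag = Math.sqrt(re * re + im * im);
            if (mag > peakMag) {
                peakMag = mag;
                peakBin = bin;
            }
        }
        return peakBin * sampleRate / bins; // bin index -> frequency in Hz
    }
}
```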
If you want to try and visualize this, I'd suggest using JFreeChart. It's another library that makes it easy to graph things.
I have a not so simple question about Java Sound ( javax.sound package ).
I am implementing MP3 player with cross fade and smooth volume and seek controls.
I am reading the sound as a stream in 4096-byte chunks and calculating the position in milliseconds manually.
When I want to seek() (change the base position from which the stream will be read), I hear a really ugly "jump" in the sound wave. I tried examining JLayer and other MP3 APIs, but they either don't have a seek() function at all or they have this "ugly sound jump" too.
My question is: how can I make this jump from one sound-wave chunk to the other smoother? I tried interpolation, but a reasonable amount of time to "not hear the jump" is 300 ms, and that's too long for a seek() function.
Have you encountered this problem?
Do you know the solution?
I will paste a code sample here just to be sure.
public void seek( long pPosition )
{
    sourceDataLine.flush();
    seekIndex = ( sourceDataLine.getMicrosecondPosition() / 1000 ) - currentPositionInMilliseconds;
}

public long getPositionInMilliseconds()
{
    return ( sourceDataLine.getMicrosecondPosition() / 1000 ) - seekIndex;
}
The "position in milliseconds" is needed because of the DataLine API of javax.sound.
Thanks. I'm frustrated...
You can't really create a smooth transition if the chunks you want to transition are too short for cross-fading, but you can eliminate the worst of the artifacts from the boundaries.
The bad artifact I'm referring to often sounds like a click or pop, but if there are many in short succession it might sound like thrashing, or it may even introduce a specific pitch of its own if the intervals are regular. This kind of artifact is a result of creating arbitrary blocks of audio, because the amplitude of the audio at the boundaries may jump from one block to the next, or from the end of a block to silence. There are a few ways to eliminate it, the most common of which is to move the boundary from the arbitrary location to the nearest 'zero crossing' so that there is no longer a jump or discontinuity. Alternatively, since your blocks are not on top of each other, you could find some place where the values of the blocks cross each other, preferably going in the same direction.
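As a sketch of the zero-crossing approach (illustrative names; mono float PCM assumed):

```java
// Sketch: instead of cutting at an arbitrary seek point, walk forward to
// the nearest sample where the signal crosses zero and cut there, so the
// amplitude does not jump at the boundary.
public class ZeroCrossing {

    public static int nearestCrossingAfter(float[] pcm, int start) {
        for (int i = Math.max(start, 1); i < pcm.length; i++) {
            // a sign change (or exact zero) between consecutive samples
            boolean downUp = pcm[i - 1] <= 0 && pcm[i] >= 0;
            boolean upDown = pcm[i - 1] >= 0 && pcm[i] <= 0;
            if (downUp || upDown) {
                return i;
            }
        }
        return pcm.length - 1; // no crossing found; cut at the end
    }
}
```

For stereo data you would apply this per channel, or pick a compromise point, since the channels rarely cross zero at the same frame.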
The only way I know to do this is working directly on the data at the per-frame level. You have to "open up" the sounds to get at the bytes and directly do your computations. Most built in Java controls have a granularity that is hindered by the size of the buffer, i.e., can only process one volume change, in effect, per sound data buffer.
Even when you are working at the per-frame level, there are problems to surmount with Java's lack of real time guarantees. But they are surmountable.
I made a "clip slicer," for example, that uses the equivalent of a clip as source sound. It takes random slices of the sample and strings them together. As little as 16 frames of overlapping interpolation works to keep the sound flowing smoothly. Using 1/10th of a second slices with 16-frame overlaps worked well for making an endlessly streaming brook from a 4-second recording.
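A minimal version of that overlap idea, as a linear crossfade over a handful of frames (the names are illustrative, not from the clip slicer itself):

```java
// Sketch: stitch two PCM slices together by blending the tail of the
// first into the head of the second over a short overlap region.
public class Crossfade {

    public static float[] stitch(float[] a, float[] b, int overlap) {
        float[] out = new float[a.length + b.length - overlap];
        // copy the non-overlapping part of "a" unchanged
        System.arraycopy(a, 0, out, 0, a.length - overlap);
        for (int i = 0; i < overlap; i++) {
            float t = (float) i / overlap; // linear ramp 0 -> 1
            out[a.length - overlap + i] =
                a[a.length - overlap + i] * (1 - t) + b[i] * t;
        }
        // copy the rest of "b" unchanged
        System.arraycopy(b, overlap, out, a.length, b.length - overlap);
        return out;
    }
}
```

Even an overlap as small as 16 frames, as mentioned above, is enough to remove the click at the slice boundary.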
I made a Theremin that takes mouse-motion listener locations for volume and pitch. I got it to work quite smoothly with about 30 or 40 frame latency. The trick was time-stamping the mouse-motion-listener outputs, and basing the controls on the calculations made on that data, as the events do not arrive or get processed smoothly in real time, creating zippering or other discontinuities.
Another thing to consider: the range of the data does not map well to decibels. A small volume differential at the low end is much more discontinuous (and prone to clicks) than the same volume interval at the high end. I solved this by making a mapping of the audio data to decibel volumes, and basing the amount of volume change on that amplitude mapping. I hope some of these ideas prove helpful!
I am starting a project which would allow me to use Java to read sound samples and, depending on the properties of each sample, return a value from 0 to 255, where 0 would be silence and 255 would be the highest sound pressure (compared to a reference point, I suppose? I have no idea how to word this). For the sake of simplification I'm thinking of focusing on decibels at the moment, or finding some way to compute the overall 'volume' of a specific sample or set of samples. I want to then have these values returned as bytes and sent to an Arduino in order to control the intensity of LEDs using PWM, and visually 'see' the music.
I am not any sort of audio file format expert, and have no particular understanding of how the data is stored in a music file. As such, I am having trouble finding out how to read a sample and find a way to represent its overall volume level as a byte. I have looked through the javax.sound.sampled package and it is all very confusing to me. Any insight as to how I could accomplish this would be greatly appreciated.
First, I suggest you read about pulse-code modulation, which is the format used to store data in a .wav file (the simplest format to begin with).
Next, there is a post on how to get PCM data from a wav file in Java here.
Finally, to get the "volume" (which is really closer to the energy), apply the energy equation.
I hope this helps.
As Bastyen (+1 from me) indicates, calculating decibels is actually NOT simple, but requires looking at a large number of samples. However, since sound samples run MUCH more frequently than visual frames in an animation, making an aggregate measure works out rather neatly.
A nice visual animation rate, for example, updates 60 times per second, and the most common sampling rate for sound is 44100 times per second. So, 735 samples (44100 / 60 = 735) might end up being a good choice for interfacing with a visualizer.
By the way, of all the official Java tutorials I've read (I am a big fan), I have found the ones that accompany the javax.sound.sampled to be the most difficult. http://docs.oracle.com/javase/tutorial/sound/TOC.html
But they are still worth reading. If I were in charge of a rewrite, there would be many more code examples. Some of the best code examples are in several sections deep, e.g., the "Using Files and Format Converters" discussion.
If you don't wish to compute the RMS, a hack would be to store the local high and/or low value for the given number of samples. Relating these numbers to decibels would be dubious, but MAYBE could be useful after giving it a mapping of your choice to the visualizer. Part of the problem is that values for a single point on given wave can range wildly. The local high might be more due to the phase of the constituent harmonics happening to line up than about the energy or volume.
Your PCM top and bottom values would probably NOT be 0 and 256, more likely -128 to 127 for 8-bit encoding. More common still is 16-bit encoding (-32768 to 32767). But you will get the hang of this if you follow Bastyen's links. To make your code independent of the bit-encoding, you would likely normalize the data (convert to floats between -1 and 1) before doing any other calculations.
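Putting those pieces together, a hedged sketch for 16-bit little-endian PCM might look like this (the class and method names are mine; a window of 735 frames matches the 60-updates-per-second figure above):

```java
// Sketch: decode a window of 16-bit little-endian PCM to normalized
// floats, compute the RMS, and map it to a 0-255 level for the LED.
public class RmsMeter {

    public static int rmsLevel(byte[] window) {
        double sumSquares = 0;
        int frames = window.length / 2;
        for (int i = 0; i < window.length; i += 2) {
            // little-endian: low byte first, high byte carries the sign
            int sample = (window[i] & 0xFF) | (window[i + 1] << 8);
            double normalized = sample / 32768.0; // -> roughly [-1, 1]
            sumSquares += normalized * normalized;
        }
        double rms = Math.sqrt(sumSquares / frames);
        return (int) Math.min(255, Math.round(rms * 255));
    }
}
```

The linear mapping to 0-255 here is the simplest choice; as noted above, a logarithmic (decibel) mapping would match perception better, and you could substitute one once the plumbing works.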
I am working on a small example application for my fourth year project (dealing with Functional Reactive Programming). The idea is to create a simple program that can play a .wav file and then shows a 'bouncing' animation of the current volume of the playing song (like in audio recording software). I'm building this in Scala so have mainly been looking at Java libraries and existing solutions.
Currently, I have managed to play a .wav file easily but I can't seem to achieve the second goal. Basically is there a way I can decode a .wav file so I can have someway of accessing
the 'volume' at any given time? By volume I think I mean its amplitude, but I may be wrong about this - Higher Physics was a while ago....
Clearly, I don't know much about this at all so it would be great if someone could point me in the right direction!
In digital audio processing you typically refer to the momentary peak amplitude of the signal (this is also called PPM -- peak programme metering). Depending on how accurate you want to be or if you wish to model some standardised metering or not, you could either
just use a sliding window of sample frames (find the maximum absolute value per window)
implement some sort of peak-hold mechanism that retains the last peak value for a given duration and then start to have the value 'fall' by a given amount of decibels per second.
The other measuring mode is RMS, which is calculated by integrating over a certain time window (add the squared sample values, divide by the window length, and take the square root, thus root-mean-square, RMS). This gives a better idea of the 'energy' of the signal, moving more smoothly than peak measurements but not capturing the maximum values observed. This mode is sometimes called a VU meter as well. You can approximate it with a sort of lagging (lowpass) filter, e.g. y[i] = y[i-1]*a + |x[i]|*(1-a), for some value 0 < a < 1.
You typically display the values logarithmically, i.e. in decibels, as this corresponds better with our perception of signal strength and also for most signals produces a more regular coverage of your screen space.
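As a sketch of both measuring modes together (the peak-hold fall-off is omitted for brevity, and the names are illustrative):

```java
// Sketch: a simple meter combining a lagging-lowpass envelope
// (y[i] = y[i-1]*a + |x[i]|*(1-a)) with a peak-hold value.
public class Meter {

    private final double a; // smoothing factor, 0 < a < 1 (closer to 1 = slower)
    private double envelope = 0;
    private double peak = 0;

    public Meter(double a) {
        this.a = a;
    }

    public void process(float sample) {
        double abs = Math.abs(sample);
        envelope = envelope * a + abs * (1 - a); // smoothed, VU-style value
        if (abs > peak) {
            peak = abs; // peak-hold; a real PPM would let this fall over time
        }
    }

    public double envelope() { return envelope; }

    public double peak() { return peak; }
}
```

For display you would convert both values to decibels, e.g. 20 * log10(value), clamped at some floor like -60 dB for silence.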
Three projects I'm involved with may help you:
ScalaAudioFile which you can use to read the sample frames from an AIFF or WAVE file
ScalaAudioWidgets which is a still young and incomplete project to provide some audio application widgets on top of scala-swing, including a PPM view -- just use a sliding window and set the window's current peak value (and optionally RMS) at a regular interval, and the view will take care of peak-hold and fall times
(ScalaCollider, a client for the SuperCollider sound synthesis system, which you might use to play back the sound file and measure the peak and RMS amplitudes in real time. The latter is probably overkill for your project and would involve a serious learning curve if you have never heard of SuperCollider. The advantage would be that you don't need to worry about synchronising your sound playback with the meter display.)
In a wav file, the data at a given point in the stream IS the volume (shifted by half of the dynamic range). In other words, if you know the type of wav file (for example, 8-bit mono), each byte represents a single sample. If you know the sample rate (say 44100 Hz), then multiply the time by 44100 and that is the byte you want to look at.
The value of the byte is the volume (distance from the middle: 0 and 255 are the peaks, 128 is zero). This is assuming that the encoding is not mu-law encoding. I found some good info on how to tell the difference, or better yet, convert between these formats, here:
http://www.gnu.org/software/octave/doc/interpreter/Audio-Processing.html
You may want to average these samples though over a window of some fixed number of samples.