I am writing some code that takes a WAV file and writes it out to an AudioTrack in stream mode. This is a minimal viable test to get AudioTrack stream mode working.
But once I write some buffer of audio to the AudioTrack, and subsequently call play(), the method getPlaybackHeadPosition() continually returns 0.
EDIT: If I ignore my available-frames check and just continually write buffers to the AudioTrack, the write method returns 0 (after the first buffer write), indicating that it simply did not write any more audio. So it seems that the AudioTrack just doesn't want to start playing.
My code is properly priming the AudioTrack. The play method is not throwing any exceptions, so I am not sure what is going wrong.
When stepping through the code, everything on my end is exactly as I anticipate, so I am thinking I somehow have the AudioTrack configured wrong.
I am running on an emulator, but I don't think that should be an issue.
The WavFile class I am using is a vetted class that I have up and running reliably in lots of Java projects; it is tested to work well.
Observe the following log write, which is a snippet from the larger chunk of code. This log write is never hit...
if (headPosition > 0)
Log.e("headPosition is greater than zero!!");
..
public static void writeToAudioTrackStream(final WavFile wave)
{
Log.e("writeToAudioTrackStream");
Thread thread = new Thread()
{
public void run()
{
try {
final float[] data = wave.getData();
int format = -1;
if (wave.getChannel() == 1)
format = AudioFormat.CHANNEL_OUT_MONO;
else if (wave.getChannel() == 2)
format = AudioFormat.CHANNEL_OUT_STEREO;
else
throw new RuntimeException("writeToAudioTrackStatic() - unsupported number of channels value = "+wave.getChannel());
final int bufferSizeInFrames = 2048;
final int bytesPerSmp = wave.getBytesPerSmp();
final int bufferSizeInBytes = bufferSizeInFrames * bytesPerSmp * wave.getChannel();
AudioTrack audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, wave.getSmpRate(),
format,
AudioFormat.ENCODING_PCM_FLOAT,
bufferSizeInBytes,
AudioTrack.MODE_STREAM);
int index = 0;
float[] buffer = new float[bufferSizeInFrames * wave.getChannel()];
boolean started = false;
int framesWritten = 0;
while (index < data.length) {
// calculate the available space in the buffer
int headPosition = audioTrack.getPlaybackHeadPosition();
if (headPosition > 0)
Log.e("headPosition is greater than zero!!");
int framesInBuffer = framesWritten - headPosition;
int availableFrames = bufferSizeInFrames - framesInBuffer;
// once the buffer has no space, the prime is done, so start playing
if (availableFrames == 0) {
if (!started) {
audioTrack.play();
started = true;
}
continue;
}
int endOffset = availableFrames * wave.getChannel();
for (int i = 0; i < endOffset; i++)
buffer[i] = data[index + i];
int samplesWritten = audioTrack.write(buffer , 0 , endOffset , AudioTrack.WRITE_BLOCKING);
// could return error values
if (samplesWritten < 0)
throw new RuntimeException("AudioTrack write error.");
framesWritten += samplesWritten / wave.getChannel();
index = endOffset;
}
}
catch (Exception e) {
Log.e(e.toString());
}
}
};
thread.start();
}
Per the documentation,
For portability, an application should prime the data path to the maximum allowed by writing data until the write() method returns a short transfer count. This allows play() to start immediately, and reduces the chance of underrun.
With a strict reading, this might be seen to contradict the earlier statement:
...you can optionally prime the data path prior to calling play(), by writing up to bufferSizeInBytes...
(emphasis mine), but the intent is clear enough: You're supposed to get a short write first.
This is just to get play started. Once that takes place, you can, in fact, use
getPlaybackHeadPosition() to determine when more space is available. I've used that technique successfully in my own code, on many different devices/API levels.
As an aside: You should be prepared for getPlaybackHeadPosition() to change only in large increments (if I remember correctly, it's getMinBufferSize()/2). This is the max resolution available from the system; onMarkerReached() cannot be used to do any better.
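To make that concrete, here is a minimal sketch of the prime-then-play pattern described above (not the question's or the answer's code; pcm is assumed to be the full interleaved float data and audioTrack a MODE_STREAM track configured for ENCODING_PCM_FLOAT):
int offset = 0;
// 1) Prime: non-blocking writes fill the track's buffer until a short transfer occurs.
while (offset < pcm.length) {
    int requested = pcm.length - offset;
    int written = audioTrack.write(pcm, offset, requested, AudioTrack.WRITE_NON_BLOCKING);
    if (written < 0)
        throw new RuntimeException("AudioTrack write error: " + written);
    offset += written;
    if (written < requested)
        break; // short transfer count: the buffer is full, priming is done
}
// 2) Start playback only after the buffer is primed.
audioTrack.play();
// 3) Feed the rest; blocking writes now return as the playing track drains its buffer.
while (offset < pcm.length) {
    int written = audioTrack.write(pcm, offset, pcm.length - offset, AudioTrack.WRITE_BLOCKING);
    if (written < 0)
        throw new RuntimeException("AudioTrack write error: " + written);
    offset += written;
}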
Related
I am managing audio capture and playback using the Java Sound API (TargetDataLine and SourceDataLine). Now suppose that in a conference environment one participant's audio queue grows larger than the jitter buffer size (due to processing or network delays), and I want to fast-forward the audio bytes I have for that participant so the queue becomes shorter than the jitter buffer size.
How can I fast-forward the audio byte array of that participant?
I can't do it during playback, as the player thread normally just dequeues one frame from every participant's queue and mixes them for playback. The only way I can see is to dequeue more than one frame for that participant and mix (?) them for fast-forwarding before mixing with the other participants' single dequeued frames?
Thanks in advance for any kind of help or advice.
There are two ways to speed up the playback that I know of. In one case, the faster pace creates a rise in pitch. The coding for this is relatively easy. In the other case, pitch is kept constant, but it involves a technique of working with sound granules (granular synthesis), and is harder to explain.
For the situation where maintaining the same pitch is not a concern, the basic plan is as follows: instead of advancing by single frames, advance by a frame plus a small increment. For example, let's say that advancing by 1.1 frames at a time over the course of 44,000 frames is sufficient to catch you up. (That would also mean that the pitch increase would be about 1/10 of an octave.)
To advance a "fractional" frame, you first have to convert the bytes of the two bracketing frames to PCM. Then, use linear interpolation to get the intermediate value. Then convert that intermediate value back to bytes for the output line.
For example, if you are advancing from frame[0] to frame["1.1"] you will need to know the PCM for frame[1] and frame[2]. The intermediate value can be calculated using a weighted average:
value = PCM[1] * 9/10 + PCM[2] * 1/10
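To illustrate that weighted average in code, a fractional read can be done roughly like this (a sketch; pcm and pos are assumed names, with pcm holding one channel's samples as floats and pos the fractional frame index):
// Sketch: read a "fractional" frame by linear interpolation between the two bracketing frames.
static float readFractional(float[] pcm, double pos) {
    int i = (int) pos;          // lower bracketing frame
    double frac = pos - i;      // distance into the gap, 0..1
    // e.g. pos = 1.1  ->  0.9 * pcm[1] + 0.1 * pcm[2], matching the formula above
    return (float) (pcm[i] * (1.0 - frac) + pcm[i + 1] * frac);
}
Advancing pos by 1.1 per output frame then plays back about 10% faster, with the pitch rise described above.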
I think it might be good to make the amount by which you advance change gradually. Take a few dozen frames to ramp up the increment and allow time to ramp down again when returning to normal dequeuing. If you suddenly change the rate at which you are reading the audio data, it is possible to introduce a discontinuity that will be heard as a click.
I have used this basic plan for dynamic control of playback speed, but I haven't had the experience of employing it for the situation that you are describing. Regulating the variable speed could be tricky if you also are trying to enforce keeping the transitions smooth.
The basic idea for using granules involves obtaining contiguous PCM (I'm not clear what the optimum number of frames would be for voice; 1 to 50 ms is commonly cited for this technique in synthesis) and giving it a volume envelope that allows you to mix sequential granules end-to-end (they must overlap).
I think the envelopes for the granules make use of a Hann function or a Hamming window, but I'm not clear on the details, such as the overlapping placement of the granules so that they mix/transition smoothly. I've only dabbled, and I'm going to assume the folks at Signal Processing will be the best bet for advice on how to code this.
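For what it's worth, the Hann envelope itself is simple to generate; the sketch below shows only the window, not the trickier granule placement and overlap-add logic that the paragraph above leaves open:
// Sketch: a Hann window of n samples, w(i) = 0.5 * (1 - cos(2*pi*i/(n-1))).
// Multiplying a granule by this envelope fades it in and out so that
// overlapping granules can be summed without audible hard edges.
static float[] hannWindow(int n) {
    float[] w = new float[n];
    for (int i = 0; i < n; i++) {
        w[i] = (float) (0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1))));
    }
    return w;
}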
I found a fantastic Git repo (the Sonic library, mainly intended for audio players) which does exactly what I wanted, with a lot of control. I can feed it a whole .wav file or even chunks of audio byte arrays, and after processing we get a sped-up playback experience and more. For real-time processing I call it on every chunk of audio bytes.
I also found another algorithm to detect whether an audio chunk/byte array is voice or not; depending on its result, I can simply skip playing non-voice packets, which gives around a 1.5x speedup with less processing.
public class DTHVAD {
public static final int INITIAL_EMIN = 100;
public static final double INITIAL_DELTAJ = 1.0001;
private static boolean isFirstFrame;
private static double Emax;
private static double Emin;
private static int inactiveFrameCounter;
private static double Lamda; // weighting between Emin and Emax in the adaptive threshold
private static double DeltaJ;
static {
initDTH();
}
private static void initDTH() {
Emax = 0;
Emin = 0;
isFirstFrame = true;
Lamda = 0.950; // range is 0.950---0.999
DeltaJ = 1.0001;
}
public static boolean isAllSilence(short[] samples, int length) {
boolean r = true;
for (int l = 0; l < length; l += 80) {
if (!isSilence(samples, l, l+80)) {
r = false;
break;
}
}
return r;
}
public static boolean isSilence(short[] samples, int offset, int length) {
boolean isSilenceR = false;
long energy = energyRMSE(samples, offset, length);
// printf("en=%ld\n",energy);
if (isFirstFrame) {
Emax = energy;
Emin = INITIAL_EMIN;
isFirstFrame = false;
}
if (energy > Emax) {
Emax = energy;
}
if (energy < Emin) {
if ((int) energy == 0) {
Emin = INITIAL_EMIN;
} else {
Emin = energy;
}
DeltaJ = INITIAL_DELTAJ; // Resetting DeltaJ with initial value
} else {
DeltaJ = DeltaJ * 1.0001;
}
long threshold = (long) ((1 - Lamda) * Emax + Lamda * Emin);
// printf("e=%ld,Emin=%f, Emax=%f, thres=%ld\n",energy,Emin,Emax,threshold);
Lamda = (Emax - Emin) / Emax;
if (energy > threshold) {
isSilenceR = false; // voice marking
} else {
isSilenceR = true; // noise marking
}
Emin = Emin * DeltaJ;
return isSilenceR;
}
private static long energyRMSE(short[] samples, int offset, int length) {
double cEnergy = 0;
float reversOfN = (float) 1 / length;
long step = 0;
for (int i = offset; i < length; i++) {
step = samples[i] * samples[i]; // x*x/N=
// printf("step=%ld cEng=%ld\n",step,cEnergy);
cEnergy += (long) ((float) step * reversOfN);// for length =80
// reverseOfN=0.0125
}
cEnergy = Math.pow(cEnergy, 0.5);
return (long) cEnergy;
}
}
Here I convert my byte array to a short array and detect whether it is voice or non-voice with:
frame.silence = DTHVAD.isSilence(encodeShortBuffer, 0, shortLen);
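The byte-to-short conversion isn't shown above; for 16-bit little-endian PCM it can be done roughly like this (a sketch with assumed names, not part of the original class):
// Sketch: convert 16-bit little-endian PCM bytes to shorts before calling DTHVAD.isSilence().
static short[] toShorts(byte[] audioBytes, int byteLen) {
    short[] samples = new short[byteLen / 2];
    for (int i = 0; i < samples.length; i++) {
        samples[i] = (short) ((audioBytes[2 * i] & 0xFF) | (audioBytes[2 * i + 1] << 8));
    }
    return samples;
}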
tl;dr: I need to keep some values in my app up to date with the values in ~10 small files, but I'm worried that reading the values over and over will have a lot of GC overhead. Do I create a bunch of unbuffered file readers and poll them, or is there any way to "map" the value in a file into a Java double that I can re-read a moment later when the value has (maybe) changed?
Long version: I've got some physical sensors (gyroscope, tachometer) whose current values ev3dev helpfully exposes as small files in a virtual filesystem. For example, one file called "/sys/bus/lego/drivers/ev3-analog-sensor/angle" contains 56.26712
Or the next moment it contains 58.9834
And I'd like a value in my app to stay as closely in sync with that file as possible. I could have your standard loop containing MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size()); (from here), but that seems like a lot of allocation overhead if I put it in a fast loop.
Maybe something with a Scanner, or
FileChannel inChannel = aFile.getChannel();
ByteBuffer buffer = ByteBuffer.allocate(1024);
while(inChannel.read(buffer) > 0)...
I haven't found a magic function like KeepInSyncWithFile(myFloatArray, File("./angle", MODE.FILE_TO_VALUE, 10, TimeUnits.MS)).
Java 8+
Since you are talking about pseudo-files on the /sys virtual filesystem, it's unlikely that the standard WatchService will work for them. In order to get updated values, you need to read these files.
The good news is that you can keep reading in a garbage-free manner, i.e. with no allocation at all. Open the file and allocate the buffer just once, and every time you want to read a value, seek to the beginning of the file and read to an existing preallocated buffer.
Here is the code:
public class DeviceReader implements Closeable {
private final RandomAccessFile file;
private final byte[] buf = new byte[512];
public DeviceReader(String fileName) throws IOException {
this.file = new RandomAccessFile(fileName, "r");
}
@Override
public void close() throws IOException {
file.close();
}
public synchronized double readDouble() throws IOException {
file.seek(0);
int length = file.read(buf);
if (length <= 0) {
throw new EOFException();
}
int sign = 1;
long exp = 0;
long value = 0;
for (int i = 0; i < length; i++) {
byte ch = buf[i];
if (ch == '-') {
sign = -1;
} else if (ch == '.') {
exp = 1;
} else if (ch >= '0' && ch <= '9') {
value = (value * 10) + (ch - '0');
exp *= 10;
} else if (ch < ' ') {
break;
}
}
return (double) (sign * value) / Math.max(1, exp);
}
}
Note that I manually parse a floating-point number from the byte[] buffer. It would be much easier to call Double.parseDouble, but in that case you'd have to convert the byte[] to a String, and the algorithm would no longer be allocation-free.
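Usage is then a tight polling loop that reuses the same reader, something like this (a sketch; the 10 ms interval and the running flag are illustrative, not from the question):
// Sketch: poll the sensor file with one reusable DeviceReader (no per-read garbage).
void pollAngle() throws IOException, InterruptedException {
    try (DeviceReader angle =
             new DeviceReader("/sys/bus/lego/drivers/ev3-analog-sensor/angle")) {
        while (running) {
            double value = angle.readDouble(); // seek(0) + read into the same buffer
            // ... update whatever needs the latest angle ...
            Thread.sleep(10);
        }
    }
}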
I can't vouch for this, but FileObserver might be worth looking into. You can cache the latest values in your app and observe the file via FileObserver to find out whether any modify event occurs. I personally don't have any experience working with it, so I can't say for sure whether it would work with system files. But if it does, then it's a better solution than repeatedly looking up the file in a loop.
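If FileObserver does receive events for such files (which, as noted, is the uncertain part), the wiring would look roughly like this (a sketch using android.os.FileObserver; the path is the one from the question, and whether /sys pseudo-files emit inotify modify events at all is an open assumption):
// Sketch only: cache the value and refresh it when FileObserver reports a MODIFY event.
FileObserver observer = new FileObserver("/sys/bus/lego/drivers/ev3-analog-sensor/angle",
                                         FileObserver.MODIFY) {
    @Override
    public void onEvent(int event, String path) {
        // Re-read the file here (e.g. with the DeviceReader above) and update the cached value.
    }
};
observer.startWatching();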
I am writing my own audio format as part of a game console project. Part of the project requires me to write an emulator so that I know exactly how to implement its functions in hardware. I am currently writing the DSP portion, but I am having trouble writing a decoding algorithm. Before I go further, I'll explain my format.
DST (Dingo Sound Track) Audio format
The audio format records only two pieces of data per sample: the amplitude and the number of frames since the last sample. I'll explain. When converting an audio file (WAV, for example), it compares the current sample with the previous one. If it detects that the current sample switches amplitude direction relative to the previous sample, it records the previous sample and the number of frames since the last record. It keeps going until the end of the file. Here is a diagram to explain further:
What I need to do
I need my "DSP" to figure out the data between each sample, as accurately as possible using only the given information. I don't think it's my encoding algorithm, because when I play the file in Audacity, I can sort of make out the original song. But when I try to play it with my decoding algorithm, I get scattered clicks. I am able to play WAV files directly with a few mods to the algorithm with almost no quality drop, so I know it's definitely the algorithm and not the rest of the DSP.
The Code
Now that I've got all of the basic info out of the way, here is my code (only the important parts).
Encoding algorithm:
FileInputStream s = null;
BufferedWriter bw;
try {
int bytes;
int previous = 0;
int unsigned;
int frames = 0;
int size;
int cursor = 0;
boolean dir = true;
int bytes2;
int previous2 = 0;
int unsigned2;
int frames2 = 0;
boolean dir2 = true;
s = new FileInputStream(selectedFile);
size = (int)s.getChannel().size();
File f = new File(Directory.getPath() + "\\" + (selectedFile.getName().replace(".wav", ".dts")));
System.out.println(f.getPath());
if(!f.exists()){
f.createNewFile();
}
bw = new BufferedWriter(new FileWriter(f));
try (BufferedInputStream b = new BufferedInputStream(s)) {
byte[] data = new byte[128];
b.skip(44);
System.out.println("Loading...");
while ((bytes = b.read(data)) > 0) {
// do something
for(int i=1; i<bytes; i += 4) {
unsigned = data[i] & 0xFF;
if (dir) {
if (unsigned < previous) {
bw.write(previous);
bw.write(frames);
dir = !dir;
frames = 0;
}else{
frames ++;
}
} else {
if (unsigned > previous) {
bw.write(previous);
bw.write(frames);
dir = !dir;
frames = 0;
}else{
frames ++;
}
}
previous = unsigned;
cursor ++;
unsigned2 = data[i + 2] & 0xFF;
if (dir2) {
if (unsigned2 < previous2) {
bw.write(previous2);
bw.write(frames2);
dir2 = !dir2;
frames2 = 0;
}else{
frames2 ++;
}
} else {
if (unsigned2 > previous2) {
bw.write(previous2);
bw.write(frames2);
dir2 = !dir2;
frames2 = 0;
}else{
frames2 ++;
}
}
previous2 = unsigned2;
cursor ++;
progress.setValue((int)(((float)(cursor / size)) * 100));
}
}
b.read(data);
}
bw.flush();
bw.close();
System.out.println("Done");
convert.setEnabled(true);
status.setText("finished");
} catch (Exception ex) {
status.setText("An error has occured");
ex.printStackTrace();
convert.setEnabled(true);
}
finally {
try {
s.close();
} catch (Exception ex) {
status.setText("An error has occured");
ex.printStackTrace();
convert.setEnabled(true);
}
}
The progress and status objects can be ignored for they are part of the GUI of my converter tool. This algorithm converts WAV files to my format (DST).
Decoding algorithm:
int start = bufferSize * (bufferNumber - 1);
short current;
short frames;
short count = 1;
short count2 = 1;
float jump;
for (int i = 0; i < bufferSize; i ++) {
current = RAM.read(start + i);
i++;
frames = RAM.read(start + i);
if (frames == 0) {
buffer[count - 1] = current;
count ++;
} else {
jump = current / frames;
for (int i2 = 1; i2 < frames; i2++) {
buffer[(2 * i2) - 1] = (short) (jump * i2);
count ++;
}
}
i++;
current = RAM.read(start + i);
i++;
frames = RAM.read(start + i);
if (frames == 0) {
buffer[count2] = current;
count2 ++;
} else {
jump = current / frames;
for (int i2 = 1; i2 < frames; i2++) {
buffer[2 * i2] = (short) (jump * i2);
count2 ++;
}
}
}
bufferNumber ++;
if(bufferNumber > maxBuffer){
bufferNumber = 1;
}
The RAM object is just a byte array. bufferNumber and maxBuffer refer to the number of processing buffers the DSP core uses. buffer is the object that the resulting audio is written to. This set of algorithms is designed to handle stereo tracks, which work the same way in my format except that each sample contains two sets of data, one for each channel.
The Question
How do I figure out the missing audio between each sample, as accurately as possible, and how accurate will the approach be? I would love to simply use the WAV format, but my console is limited on memory (RAM). This format halves the RAM space required to process audio. I am also planning on implementing this algorithm in an ARM microcontroller, which will be the console's real DSP. The algorithm should also be fast, but accuracy is more important. If I need to clarify or explain anything further, let me know since this is my first BIG question and I am sure I forgot something. Code samples would be nice, but aren't needed that much.
EDIT:
I managed to get the DSP to output a song, but it's sped up and filled with static. The sped-up part is due to a glitch where it isn't splitting the track into stereo (I think), and the static is due to the initial increment being too steep. Here is a picture of what I'm getting:
Here is the new code used in the DSP:
if (frames == 0) {
buffer[i - 1] = current;
//System.out.println(current);
} else {
for (int i2 = 1; i2 < frames + 1; i2++) {
jump = (float)(previous + ((float)(current - previous) / (frames - i2 + 1)));
//System.out.println((short)jump);
buffer[(2 * i2) - 1] = (short)(jump);
}
}
previous = current;
I need a way to smooth out those initial increments, and I'd prefer not to use complex arithmetic because I am limited on performance when I port this to hardware (preferably something that can operate on a 100 MHz ARM controller while keeping a 44.1 kHz sample rate). Edit: the resulting wave should actually be backwards. Sorry.
Second Edit:
I got the DSP to output in stereo, but unfortunately that didn't fix anything else like I hoped it would. I also fixed some bugs with the encoder, so now it takes 8-bit unsigned audio. This has become more of a math issue, so I thought I'd post a similar question on Mathematics Stack Exchange. Well, that was a waste of time; it got put on hold nearly instantly.
You basically have a record of the signal's local extrema and want to reconstruct the signal. The most straightforward way would be to use some monotonic interpolation scheme. You can try whether this fits your needs, but I suspect the result would be quite inaccurate because the characteristics of the signal are ignored.
I am not an audio engineer, so my assumptions could be wrong. But maybe, you get somewhere with these thoughts.
The signal is basically a mixture of sines. Calculating a sine function for any segment between two key frames is quite easy. The period is given by twice their distance. The amplitude is given by half the amplitude difference. This will give you a sine that hits the two key samples exactly. Furthermore, it will give you a C1-continuous signal because the derivatives at the connection points are zero. For a nice signal, you probably need even more smoothness. So you could start to interpolate the two sines around a key frame with an appropriate window function. I would start with a simple triangle window but others may give better results. This procedure will preserve the extrema.
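A sketch of that sine segment between two recorded extrema could look like this (names are assumed; prev and curr are consecutive recorded key samples and frames is the recorded gap between them):
// Sketch: fill the gap between two recorded extrema with half a sine period.
// At k = 0 the value is prev, at k = frames it is curr, and the slope is zero
// at both ends, which is what makes the joins C1-continuous.
static void fillSineSegment(short[] out, int outOffset, int prev, int curr, int frames) {
    double mid = (prev + curr) / 2.0;   // centre line of the segment
    double amp = (curr - prev) / 2.0;   // half the amplitude difference
    for (int k = 1; k < frames; k++) {
        out[outOffset + k - 1] = (short) (mid - amp * Math.cos(Math.PI * k / frames));
    }
}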
It is probably easier to tackle this problem visually (with a plot of the signal), so you can see the results.
If it's all about size, then maybe you want to look into established audio compression methods. They usually give a much better compression ratio than 1:2. Also, I don't understand why this method saves RAM, because you'll have to calculate all samples when decoding anyway. Of course, this assumes that the complete data are not loaded into RAM but streamed in pieces.
I'm playing around with looping a sound snippet at successively faster speeds, just for the fun of it, and have stumbled upon this question, which solves it nicely, I guess. It does get kind of buggy when you go to high speeds, because it drops anything in between and just takes one byte every so often. So I wanted to change it to take the average of all bytes in a stretch of the array. The problem is, bytes don't lend themselves to being divided by ints, and I'm a bit stupid when it comes to converting between bytes and ints. My solution was to do this (again building on the code from the question mentioned above):
import javax.swing.JOptionPane;
import javax.swing.JFileChooser;
import javax.sound.sampled.*;
import java.net.URL;
import java.io.ByteArrayOutputStream;
import java.io.ByteArrayInputStream;
import java.util.Date;
import java.io.File;
class AcceleratePlayback {
public static void main(String[] args) throws Exception {
int playBackSpeed = 3;
File soundFile;
if (args.length>0) {
try {
playBackSpeed = Integer.parseInt(args[0]);
} catch (Exception e) {
e.printStackTrace();
System.exit(1);
}
}
System.out.println("Playback Rate: " + playBackSpeed);
JFileChooser chooser = new JFileChooser();
chooser.showOpenDialog(null);
soundFile = chooser.getSelectedFile();
System.out.println("FILE: " + soundFile);
AudioInputStream ais = AudioSystem.getAudioInputStream(soundFile);
AudioFormat af = ais.getFormat();
int frameSize = af.getFrameSize();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] b = new byte[1 << 16]; // 64 KiB read buffer; note "2^16" in Java is XOR (= 18), not a power of two
int read = 1;
while( read>-1 ) {
read = ais.read(b);
if (read>0) {
baos.write(b, 0, read);
}
}
System.out.println("End entire: \t" + new Date());
//This is the important bit
byte[] b1 = baos.toByteArray();
byte[] b2 = new byte[b1.length/playBackSpeed];
for (int ii=0; ii<b2.length/frameSize; ii++) {
for (int jj=0; jj<frameSize; jj++) {
int b3=0;
for (int kk = 0; kk < playBackSpeed; kk++){
b3 = b3+(int)b1[(ii*frameSize*playBackSpeed)+jj+kk];
}
b3 = b3/playBackSpeed;
b2[(ii*frameSize)+jj] = (byte)b3;
}
}
//ends here
System.out.println("End sub-sample: \t" + new Date());
ByteArrayInputStream bais = new ByteArrayInputStream(b2);
AudioInputStream aisAccelerated = new AudioInputStream(bais, af, b2.length);
Clip clip = AudioSystem.getClip();
clip.open(aisAccelerated);
clip.loop(2*playBackSpeed);
clip.start();
JOptionPane.showMessageDialog(null, "Exit?");
}
}
I do realize this is probably the wrong way to do it, but I'm not sure what else I could do, any thoughts??
Best, Alex.
Since my earlier "solution" was cited, I'll lay out what I use for variable-speed playback in more detail. I confess I don't totally understand the approach used in this question, and so am not going to attempt to improve upon the code. I'm risking "not answering the question" in doing this, but perhaps the increased detail about using linear interpolation will show that it can be a sufficient way to make the higher-speed loops you are going for.
I'm NOT claiming the approach that I came up with is the best. I'm not a sound engineer. But it seems to work. (Am always grateful for any suggested improvements.)
This is for a sound library I've made for my own games. It is based on the idea of a Java Clip, but with some extra capabilities. In my library, there's a place to store data, and another couple structures for playback, one for concurrent single plays, and another for looping. Both allow vari-speed, even to the extent of playing the sound backwards.
To load and hold the "clip" data, I just use a single int[], called 'clipData', but I use it for both L & R, so odd & even ints are for either ear.
Loading 'clipData' initially:
while((bytesRead = ais.read(buffer, 0, 1024)) != -1)
{
bufferIdx = 0;
for (int i = 0, n = bytesRead / 2; i < n; i ++)
{
clipData[(int)clipIdx++] =
( buffer[(int)bufferIdx++] & 0xff )
| ( buffer[(int)bufferIdx++] << 8 ) ;
}
}
For playback, the object that holds this data array has two get() methods. The first is for normal speed. An int is used to index into the clipData array (maybe should be a 'long', for larger audio files!):
public double[] get(int idx) throws ArrayIndexOutOfBoundsException
{
idx *= 2; // assumed: stereo data
double[] audioVals = new double[2];
audioVals[0] = clipData[idx++];
audioVals[1] = clipData[idx];
return audioVals;
}
Maybe returning a float array is acceptable, in place of the double[]?
Here is the enhanced get() method for variable speed. It uses linear interpolation to account for the fractional part of the double used as an index into the clipData:
public double[] get(double idx) throws ArrayIndexOutOfBoundsException
{
int intPart = (int) idx * 2;             // index of the left bracketing (stereo) frame
double fractionalPart = idx - (int) idx; // fractional distance between the two frames
int valR1 = clipData[intPart++];
int valL1 = clipData[intPart++];
int valR2 = clipData[intPart++];
int valL2 = clipData[intPart];
double[] audioVals = new double[2];
audioVals[0] = (valR1 * (1 - fractionalPart)
+ valR2 * fractionalPart);
audioVals[1] = (valL1 * (1 - fractionalPart)
+ valL2 * fractionalPart);
return audioVals;
}
The while(playing) loop (for loading data into the playback SourceDataLine) has a variable associated with the clipData which I call "cursor" that iterates through the sound data array. For normal playback, 'cursor' is incremented by 1, and tested to make sure it goes back to zero when it reaches the end of the clipData.
You could write something like: audioData = clipData.get(cursor++) to read successive frames of the data.
For varispeed, the above would be more like this:
audioData = clipData.get(cursor += speedIncrement);
'speedIncrement' is a double. If it is set to 2.0, the playback is twice as fast. If it is set to 0.5, it is half as fast. If you put in the right checks you can even make speedIncrement equal a negative value for reverse playback.
This works as long as the speed doesn't go above the Nyquist value (at least theoretically). And again, you have to test to make sure 'cursor' hasn't gone off the edge of the clipData, but restarts at the appropriate spot at the other end of the sound data array.
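Putting the pieces together, the inner playback loop ends up roughly like this (a sketch rather than my exact code; frameCount and playing are assumed names):
// Sketch: vari-speed loop. clipData is the object exposing get(double) above,
// frameCount the number of stereo frames it holds.
double cursor = 0;
while (playing) {
    double[] audioVals = clipData.get(cursor);  // linear-interpolated L/R frame
    // ... convert audioVals to bytes and write them to the SourceDataLine ...
    cursor += speedIncrement;                   // 1.0 = normal, 2.0 = double speed
    if (cursor >= frameCount - 1) {
        cursor -= frameCount - 1;               // wrap around for looping playback
    }
}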
Hope this helps!
Another note: you may want to rewrite the above get() methods to send a buffer's worth of reads instead of single frames. I'm currently experimenting with doing things on a per-frame basis. I think it makes the code a little easier to understand, and helps with per-frame processing and responsiveness, but it surely slows things down.
I've been working on implementing a system for real-time audio capture and analysis within an existing music software project. The goal of this system is to begin capturing audio when the user presses the record button (or after a specified count-in period), determine the notes the user sings or plays, and notate these notes on a musical staff. The gist of my method is to use one thread to capture chunks of audio data and put them into a queue, and another thread to remove the data from the queue and perform the analysis.
This scheme works well, but I am having trouble quantifying the latency between the onset of audio capture and playback of the MIDI backing instruments. Audio capture begins before the MIDI instruments begin playing back, and the user is presumably going to be synchronizing his or her performance with the MIDI instruments. Therefore, I need to ignore audio data captured before the backing MIDI instruments begin playing and only analyze audio data collected after that point.
Playback of the backing tracks is handled by a body of code that has been in place for quite a while and maintained by someone else, so I would like to avoid refactoring the whole program if possible. Audio capture is controlled with a Timer object and a class that extends TimerTask, instances of which are created in a lumbering (~25k lines) class called Notate. Notate also keeps tabs on the objects that handle playback of the backing tracks, by the way. The Timer’s .scheduleAtFixedRate() method is used to control periods of audio capture, and the TimerTask notifies the capture thread to begin by calling .notify() on the queue (ArrayBlockingQueue).
My strategy for calculating the time gap between the initialization of these two processes has been to subtract the timestamp taken just before capture begins (in milliseconds) from the timestamp taken at the moment playback begins, which I'm defining as the moment the .start() method is called on the Java Sequencer object that is in charge of the MIDI backing tracks. I then use the result to determine the number of audio samples that I expect to have been captured during this interval (n) and ignore the first n * 2 bytes in the array of captured audio data (n * 2 because I am capturing 16-bit samples, whereas the data is stored as a byte array, so 2 bytes per sample).
However, this method is not giving me accurate results. The calculated offset is always less than I expect it to be, such that there remains a non-trivial (and unfortunately varied) amount of "empty" space in the audio data after beginning analysis at the designated position. This causes the program to attempt to analyze audio data collected before the user had begun to play along with the backing MIDI instruments, effectively adding rests (the absence of musical notes) at the beginning of the user's musical passage and ruining the rhythm values calculated for all subsequent notes.
Below is the code for my audio capture thread, which also determines the latency and corresponding position offset for the array of captured audio data. Can anyone offer insight into why my method for determining latency is not working correctly?
public class CaptureThread extends Thread
{
public void run()
{
//number of bytes to capture before putting data in the queue.
//determined via the sample rate, tempo, and # of "beats" in 1 "measure"
int bytesToCapture = (int) ((SAMPLE_RATE * 2.) / (score.getTempo()
/ score.getMetre()[0] / 60.));
//temporary buffer - will be added to ByteArrayOutputStream upon filling.
byte tempBuffer[] = new byte[target.getBufferSize() / 5];
int limit = (int) (bytesToCapture / tempBuffer.length);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream(bytesToCapture);
int bytesRead;
try
{ //Loop until stopCapture is set.
while (!stopCapture)
{ //first, wait for notification from TimerTask
synchronized (thisCapture)
{
thisCapture.wait();
}
if (!processingStarted)
{ //the time at which audio capture begins
startTime = System.currentTimeMillis();
}
//start the TargetDataLine, from which audio data is read
target.start();
//collect 1 captureInterval's worth of data
for (int n = 0; n < limit; n++)
{
bytesRead = target.read(tempBuffer, 0, tempBuffer.length);
if (bytesRead > 0)
{ //Append data to output stream.
outputStream.write(tempBuffer, 0, bytesRead);
}
}
if (!processingStarted)
{
long difference = (midiSynth.getPlaybackStartTime()
+ score.getCountInTime() * 1000 - startTime);
positionOffset = (int) ((difference / 1000.)
* SAMPLE_RATE * 2.);
if (positionOffset % 2 != 0)
{ //1 sample = 2 bytes, so positionOffset must be even
positionOffset += 1;
}
}
if (outputStream.size() > 0)
{ //package data collected in the output stream into a byte array
byte[] capturedAudioData = outputStream.toByteArray();
//add captured data to the queue for processing
processingQueue.add(capturedAudioData);
synchronized (processingQueue)
{
try
{ //notify the analysis thread that data is in the queue
processingQueue.notify();
} catch (Exception e)
{
//handle the error
}
}
outputStream.reset(); //reset the output stream
}
}
} catch (Exception e)
{
//handle error
}
}
}
I am looking into using a Mixer object to synchronize the TargetDataLine which is accepting data from the microphone and the Line that handles playback from the MIDI instruments. Now to find the Line that handles playback... Any ideas?
Google has a good open-source app called AudioBufferSize that you are probably familiar with. I modified this app to test one-way latency, that is, the time between when a user presses a button and when the sound is played by the audio API. Here is the code I added to AudioBufferSize to achieve this. Could you use such an approach to provide the timing delta between the event and when the user perceives it?
final Button latencyButton = (Button) findViewById(R.id.latencyButton);
latencyButton.setOnClickListener(new OnClickListener() {
public void onClick(View v) {
mLatencyStartTime = getCurrentTime();
latencyButton.setEnabled(false);
// Do the latency calculation, play a 440 hz sound for 250 msec
AudioTrack sound = generateTone(440, 250);
sound.setNotificationMarkerPosition(count /2); // Listen for the end of the sample
sound.setPlaybackPositionUpdateListener(new OnPlaybackPositionUpdateListener() {
public void onPeriodicNotification(AudioTrack sound) { }
public void onMarkerReached(AudioTrack sound) {
// The sound has finished playing, so record the time
mLatencyStopTime = getCurrentTime();
diff = mLatencyStopTime - mLatencyStartTime;
// Update the latency result
TextView lat = (TextView)findViewById(R.id.latency);
lat.setText(diff + " ms");
latencyButton.setEnabled(true);
logUI("Latency test result= " + diff + " ms");
}
});
sound.play();
}
});
There is a reference to generateTone which looks likes this:
private AudioTrack generateTone(double freqHz, int durationMs) {
int count = (int)(44100.0 * 2.0 * (durationMs / 1000.0)) & ~1;
short[] samples = new short[count];
for(int i = 0; i < count; i += 2){
short sample = (short)(Math.sin(2 * Math.PI * i / (44100.0 / freqHz)) * 0x7FFF);
samples[i + 0] = sample;
samples[i + 1] = sample;
}
AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, 44100,
AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT,
count * (Short.SIZE / 8), AudioTrack.MODE_STATIC);
track.write(samples, 0, count);
return track;
}
Just realized this question is several years old. Sorry; maybe someone will find it useful.