Currently I am trying to record a sound wave from a mic and display amplitude values in real time in Java. I came across TargetDataLine, but I am having a bit of trouble understanding how to get data from it.
Sample code from Oracle states:
line = (TargetDataLine) AudioSystem.getLine(info);
line.open(format, line.getBufferSize());
ByteArrayOutputStream out = new ByteArrayOutputStream();
int numBytesRead;
byte[] data = new byte[line.getBufferSize() / 5];

// Begin audio capture.
line.start();

// Here, stopped is a global boolean set by another thread.
while (!stopped) {
    // Read the next chunk of data from the TargetDataLine.
    numBytesRead = line.read(data, 0, data.length);

    // ****ADDED CODE HERE****

    // Save this chunk of data.
    out.write(data, 0, numBytesRead);
}
So I am currently trying to add code to get an input stream of amplitude values; however, when I print the variable data at the marked line, all I get is a ton of raw bytes:
for (int j = 0; j < data.length; j++) {
    System.out.format("%02X ", data[j]);
}
Does anyone who has used TargetDataLine before know how I can make use of it?
For anyone who has trouble using TargetDataLine for sound extraction in the future, the class WaveData by Ganesh Tiwari contains a very helpful method that turns bytes into a float array (http://code.google.com/p/speech-recognition-java-hidden-markov-model-vq-mfcc/source/browse/trunk/SpeechRecognitionHMM/src/org/ioe/tprsa/audio/WaveData.java):
public float[] extractFloatDataFromAudioInputStream(AudioInputStream audioInputStream) {
    format = audioInputStream.getFormat();
    audioBytes = new byte[(int) (audioInputStream.getFrameLength() * format.getFrameSize())];
    // calculate durationSec
    float milliseconds = (long) ((audioInputStream.getFrameLength() * 1000) / audioInputStream.getFormat().getFrameRate());
    durationSec = milliseconds / 1000.0;
    // System.out.println("The current signal has duration " + durationSec + " Sec");
    try {
        audioInputStream.read(audioBytes);
    } catch (IOException e) {
        System.out.println("IOException during reading audioBytes");
        e.printStackTrace();
    }
    return extractFloatDataFromAmplitudeByteArray(format, audioBytes);
}
Using this I can get sound amplitude data.
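If you want amplitude values straight from the bytes that line.read() hands you, without going through an AudioInputStream first, you can assemble the samples yourself. Below is a minimal sketch assuming 16-bit signed little-endian mono PCM (check your AudioFormat before relying on those assumptions); bytesToAmplitudes is just an illustrative helper name, not part of any library:

public static float[] bytesToAmplitudes(byte[] data, int numBytesRead) {
    // Sketch: convert raw 16-bit signed little-endian PCM bytes into
    // amplitudes in the range [-1.0, 1.0). Assumes mono; for stereo the
    // resulting samples alternate left/right.
    float[] amplitudes = new float[numBytesRead / 2];
    for (int i = 0; i < amplitudes.length; i++) {
        int low  = data[2 * i] & 0xFF;      // low byte, treated as unsigned
        int high = data[2 * i + 1];         // high byte carries the sign
        short sample = (short) ((high << 8) | low);
        amplitudes[i] = sample / 32768f;    // normalize
    }
    return amplitudes;
}

Called at the ****ADDED CODE HERE**** spot with the numBytesRead you just received, it yields one normalized amplitude per sample for that chunk.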
Good morning folks, I am trying to send audio data from a microphone attached to an ESP32 board over WiFi to my desktop running some Java code. If I play the audio data using Java's AudioSystem library, it's a bit staticky but legible. Switching to the Sphinx-4 library, which converts audio to text, it only sometimes recognizes the words.
This is the first time I've had to mess with raw audio data, so it may not even be possible: the board can only read 12-bit signals, so when converting to 16 bits each 12-bit step maps to roughly 15 16-bit values. It could also be due to the roughly 115-microsecond delay used to downsample to 16 kHz.
How can I smooth out the audio playback enough that it can be easily recognized by the Sphinx-4 library? The current implementation has very small breaks and some noise that I think is throwing it off.
ESP32 Code:
#define BUFFERMAX 8000
#define ONE_SECOND 1000000

int writeBuffer[BUFFERMAX];

void writeAudio() {
    for (int i = 0; i < BUFFERMAX; i = i + 1) {
        // data read in is 12 bits, so map the value to 16 bits (2 bytes)
        sensorValue = map(analogRead(sensorPin), 0, 4096, -32000, 32000);
        // none-to-minimal sound is around -7000, so try to zero out additional noise with an average
        int prevAvg = avg;
        avg = (avg + sensorValue) / 2;
        sensorValue = abs(prevAvg) + sensorValue;
        if (abs(sensorValue) < 1000) { sensorValue = 0; }
        writeBuffer[i] = sensorValue;
        // delay so that 8000 ints (16000 bytes) take one second to record
        delayMicroseconds(delayMicro);
    }
    client.write((byte*)writeBuffer, sizeof(writeBuffer));
}
Java Sphinx:
StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
// Start recognition process pruning previously cached data.
recognizer.startRecognition(socket.getInputStream() );
System.out.print("awaiting command...");
SpeechResult result = recognizer.getResult();
System.out.println(result.getHypothesis().toLowerCase());
Java play audio:
private static void init() throws LineUnavailableException {
    // specifying the audio format
    AudioFormat _format = new AudioFormat(16000.F, // Sample Rate
            16,    // Size of SampleBits
            1,     // Number of Channels
            true,  // Is Signed?
            false  // Is Big Endian?
    );
    // creating the DataLine Info for the speaker format
    DataLine.Info speakerInfo = new DataLine.Info(SourceDataLine.class, _format);
    // getting the mixer for the speaker
    _speaker = (SourceDataLine) AudioSystem.getLine(speakerInfo);
    _speaker.open(_format);
}
_streamIn = socket.getInputStream();
_speaker.start();
byte[] data = new byte[16000];
System.out.println("Waiting for data...");
while (_running) {
    long start = new Date().getTime();
    // checking if the data is available to speak
    if (_streamIn.available() <= 0)
        continue; // data not available so continue back to start of loop
    // count of the data bytes read
    int readCount = _streamIn.read(data, 0, data.length);
    if (readCount > 0 && (readCount % 2) == 0) {
        System.out.println(readCount);
        _speaker.write(data, 0, readCount);
        readCount = 0;
    }
    System.out.println("Time: " + (new Date().getTime() - start));
}
I think I have a performance (latency) issue with the Java Sound API.
Audio Monitor
The following code does indeed work for me. It correctly opens up the microphone and outputs the audio input through my speakers in real time (i.e. monitoring). But my concern is the speed at which the playback happens... it is half a second behind from when I speak into my microphone until playback through my speakers.
How do I increase performance? How do I lower the latency?
private void initForLiveMonitor() {
    AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 2, 4, 44100, false);
    try {
        // Speaker
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
        SourceDataLine sourceLine = (SourceDataLine) AudioSystem.getLine(info);
        sourceLine.open();

        // Microphone
        info = new DataLine.Info(TargetDataLine.class, format);
        TargetDataLine targetLine = (TargetDataLine) AudioSystem.getLine(info);
        targetLine.open();

        Thread monitorThread = new Thread() {
            @Override
            public void run() {
                targetLine.start();
                sourceLine.start();
                byte[] data = new byte[targetLine.getBufferSize() / 5];
                int readBytes;
                while (true) {
                    readBytes = targetLine.read(data, 0, data.length);
                    sourceLine.write(data, 0, readBytes);
                }
            }
        };

        System.out.println("Start LIVE Monitor for 15 seconds");
        monitorThread.start();
        Thread.sleep(15000);
        targetLine.stop();
        targetLine.close();
        System.out.println("End LIVE Monitor");
    } catch (LineUnavailableException lue) {
        lue.printStackTrace();
    } catch (InterruptedException ie) {
        ie.printStackTrace();
    }
}
Additional Notes
With this code, the playback is smooth (no pops or jitter), just half a second delayed.
I also know that my computer and USB audio interface are capable of handling real-time monitoring through the computer, because when I do a side-by-side comparison with Logic Pro X there is minimal delay; I perceive none at all.
My attempts at making the byte[] size smaller or larger haven't helped.
My conclusion is that this is an issue with my Java code. Thanks in advance.
There is more than one buffer involved!
When you open the SourceDataLine and TargetDataLine, I'd recommend using the form where you specify the buffer size. But I don't know what size to recommend; I haven't played around with this enough to know what the optimum size is for safely piping microphone input. My experience is more with real-time synthesis.
Anyway, how about this: define the length of data[] and use the same length in your line-opening methods. Try numbers like 1024 or multiples of it (while making sure the number of bytes divides evenly by the per-frame number of bytes, which looks to be 4 according to the format you are using).
int bufferLen = 1024 * 4; // experiment with buffer size here
byte[] data = new byte[bufferLen];
sourceLine.open(format, bufferLen);
targetLine.open(format, bufferLen);
Also, maybe some of the code in your run() would be better placed elsewhere, so as not to add to the required processing before the piping can even start. The array data[] and the int readBytes could be instance variables, ready to roll, rather than being dinked with in run(), potentially adding to the latency.
Those are things I'd try, anyway.
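To make that concrete, here is a minimal sketch of how the two suggestions might fit together; the buffer length, the class and field names, and the format are assumptions to experiment with, not a tested low-latency recipe:

import javax.sound.sampled.*;

public class LiveMonitor {
    // Pre-allocated so run() does no setup work before piping starts (field names are illustrative).
    private static final int BUFFER_LEN = 1024 * 4;  // bytes; a multiple of the 4-byte frame size
    private final byte[] data = new byte[BUFFER_LEN];
    private int readBytes;

    public void start() throws LineUnavailableException {
        AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 2, 4, 44100, false);

        SourceDataLine sourceLine = (SourceDataLine) AudioSystem.getLine(new DataLine.Info(SourceDataLine.class, format));
        TargetDataLine targetLine = (TargetDataLine) AudioSystem.getLine(new DataLine.Info(TargetDataLine.class, format));

        // Open both lines with an explicit, matching buffer size instead of the default.
        sourceLine.open(format, BUFFER_LEN);
        targetLine.open(format, BUFFER_LEN);

        Thread monitorThread = new Thread(() -> {
            targetLine.start();
            sourceLine.start();
            while (true) {
                readBytes = targetLine.read(data, 0, data.length);
                sourceLine.write(data, 0, readBytes);
            }
        });
        monitorThread.start();
    }
}

Smaller values of BUFFER_LEN should reduce the delay at the cost of a higher risk of dropouts, so it is worth trying a few sizes on the actual hardware.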
I'm having some trouble reading and playing certain audio clips on Android 2.0.1 (Motorola Droid A855). Below is the code segment that I use. It works fine for some files, but for other files it just doesn't exit the while loop. I have tried checking the
InputStream.available()
method, but with no luck. I even printed out the number of bytes it reads correctly before getting stuck. It seems that it gets stuck in the loop at the last round of read (when there are fewer than 512 bytes left) and doesn't exit the loop.
int sampleFreq = 44100;
int minBufferSize = AudioTrack.getMinBufferSize(sampleFreq, AudioFormat.CHANNEL_IN_STEREO, AudioFormat.ENCODING_PCM_16BIT);
int bufferSize = 512;
AudioTrack at = new AudioTrack(AudioManager.STREAM_MUSIC, sampleFreq, AudioFormat.CHANNEL_IN_STEREO, AudioFormat.ENCODING_PCM_16BIT, minBufferSize, AudioTrack.MODE_STREAM);
InputStream input;
try {
    File fileID = new File(Environment.getExternalStorageDirectory(), resourceID);
    input = new FileInputStream(fileID);
    int filesize = (int) fileID.length();
    int i = 0, byteread = 0;
    byte[] s = new byte[bufferSize];
    at.play();
    while ((i = input.read(s, 0, bufferSize)) > -1) {
        at.write(s, 0, i);
        //at.flush();
        byteread += i;
        Log.i(TAG, "playing audio " + byteread + "\t" + filesize);
    }
    at.stop();
    at.release();
    input.close();
} catch (FileNotFoundException e) {
    // TODO
    e.printStackTrace();
} catch (IOException e) {
    // TODO
    e.printStackTrace();
}
The audio files are around 1-2 MB in size and are in WAV format. The following is an example of the logging:
> : playing audio 1057280 1058474
> : playing audio 1057792 1058474
> : playing audio 1058304 1058474
Any idea why this is happening, given that it runs perfectly for some of the audio files?
Make sure your call to write() always delivers a byte size which is an integral number of samples.
For your 16 bit stereo mode, that should be an integral multiple of 4 bytes.
Additionally, for stutter-free operation you should really respect the minimum buffer size of the audio subsystem and deliver at least that much data in each call to the audio write method (with the possible exception of the final write).
If your source data is a .wav file, make sure you actually skip the header and read samples only starting from a valid payload chunk.
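As a rough illustration of those points, here is a sketch that skips a canonical 44-byte PCM WAV header and then feeds AudioTrack in frame-aligned chunks at least as large as the minimum buffer size. The 44-byte offset and the helper names are assumptions; a robust player would walk the RIFF chunks to find the actual data chunk:

import android.media.AudioTrack;
import java.io.IOException;
import java.io.InputStream;

// Sketch: play raw PCM from a simple WAV file. Assumes a canonical 44-byte
// header and 16-bit stereo data (4 bytes per frame).
class WavPlaybackSketch {
    private static final int WAV_HEADER_BYTES = 44;  // assumption; real files may differ
    private static final int FRAME_BYTES = 4;        // 16-bit stereo

    static void playWav(InputStream input, AudioTrack at, int minBufferSize) throws IOException {
        long skipped = 0;
        while (skipped < WAV_HEADER_BYTES) {
            long n = input.skip(WAV_HEADER_BYTES - skipped);
            if (n <= 0) break;                        // stream ended before the header did
            skipped += n;
        }

        // Write at least minBufferSize bytes at a time, as a whole number of frames.
        int chunk = ((minBufferSize + FRAME_BYTES - 1) / FRAME_BYTES) * FRAME_BYTES;
        byte[] buffer = new byte[chunk];

        at.play();
        int read;
        while ((read = readFully(input, buffer)) > 0) {
            at.write(buffer, 0, read - (read % FRAME_BYTES));  // keep writes frame-aligned
        }
        at.stop();
        at.release();
    }

    // Fill as much of buf as possible; returns the byte count (may be short at end of stream).
    private static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) break;
            total += n;
        }
        return total;
    }
}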
I have some problems finding out what I actually read with the AudioInputStream. The program below just prints the byte array I get, but I don't even know whether the bytes are actually the samples, i.e. whether the byte array is the audio wave.
File fileIn;
AudioInputStream audio_in;
byte[] audioBytes;
int numBytesRead;
int numFramesRead;
int numBytes;
int totalFramesRead;
int bytesPerFrame;

try {
    audio_in = AudioSystem.getAudioInputStream(fileIn);
    bytesPerFrame = audio_in.getFormat().getFrameSize();
    if (bytesPerFrame == AudioSystem.NOT_SPECIFIED) {
        bytesPerFrame = 1;
    }
    numBytes = 1024 * bytesPerFrame;
    audioBytes = new byte[numBytes];
    try {
        numBytesRead = 0;
        numFramesRead = 0;
    } catch (Exception ex) {
        System.out.println("Something went completely wrong");
    }
} catch (Exception e) {
    System.out.println("Something went completely wrong");
}
and in some other part, I read some bytes with this:
try {
    if ((numBytesRead = audio_in.read(audioBytes)) != -1) {
        numFramesRead = numBytesRead / bytesPerFrame;
        totalFramesRead += numFramesRead;
    }
} catch (Exception e) {
    System.out.println("Had problems reading new content");
}
So first of all, this code is not from me. This is my first time reading audio files, so I got some help from the interwebs (found the link:
Java - reading, manipulating and writing WAV files
on Stack Overflow, who would have known).
The question is: what do the bytes in audioBytes represent? Since the source is 44 kHz stereo, there have to be two waves hiding in there somewhere, am I right? So how do I filter the important information out of these bytes?
// EDIT
So what I added is this function:
public short[] Get_Sample() {
    if (samplesRead == 1024) {
        Read_Buffer();
        samplesRead = 4;
    } else {
        samplesRead = samplesRead + 4;
    }
    short sample[] = new short[2];
    sample[0] = (short) (audioBytes[samplesRead - 4] + 256 * audioBytes[samplesRead - 3]);
    sample[1] = (short) (audioBytes[samplesRead - 2] + 256 * audioBytes[samplesRead - 1]);
    return sample;
}
where Read_Buffer() reads the next 1024 (or fewer) bytes and loads them into audioBytes. sample[0] is used for the left channel, sample[1] for the right. But I'm still not sure, since the waves I get from this look quite "noisy". (Edit: the WAV I used actually has little-endian byte order, so I had to change the calculation.)
The AudioInputStream read() method returns the raw audio data. You don't know the 'construction' of the data until you read the audio format with getFormat(), which returns an AudioFormat. From the AudioFormat you can call getChannels(), getSampleSizeInBits() and more... This is because the AudioInputStream is made for a known format.
When you calculate a sample value there are different possibilities for the sign and endianness of the data (in the case of 16-bit samples). To make the code more generic, use the AudioFormat object returned from the AudioInputStream to get more info about the data buffer:
getEncoding(): PCM_SIGNED, PCM_UNSIGNED ...
isBigEndian(): true or false
As you have already discovered, incorrect sample building can lead to distorted sound. If you work with various files it may cause problems in the future. If you won't provide support for some formats, just check what the AudioFormat says and throw an exception (e.g. javax.sound.sampled.UnsupportedAudioFileException). It will save you time.
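For example, here is a hedged sketch of assembling one frame of 16-bit samples while consulting the AudioFormat; the helper name is illustrative and only signed 16-bit PCM is handled:

import javax.sound.sampled.AudioFormat;

// Illustrative helper: assemble one frame (one short per channel) from 16-bit PCM bytes,
// honoring the endianness reported by the AudioFormat. Assumes PCM_SIGNED.
public static short[] frameToSamples(byte[] buf, int offset, AudioFormat format) {
    if (!AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
            || format.getSampleSizeInBits() != 16) {
        throw new IllegalArgumentException("Only 16-bit signed PCM is handled in this sketch");
    }
    boolean bigEndian = format.isBigEndian();
    short[] frame = new short[format.getChannels()];
    for (int ch = 0; ch < frame.length; ch++) {
        int b0 = buf[offset + 2 * ch] & 0xFF;      // first byte of the sample
        int b1 = buf[offset + 2 * ch + 1] & 0xFF;  // second byte of the sample
        frame[ch] = bigEndian
                ? (short) ((b0 << 8) | b1)         // big-endian: first byte is the high byte
                : (short) ((b1 << 8) | b0);        // little-endian: second byte is the high byte
    }
    return frame;
}

Note the & 0xFF masks: without them, Java sign-extends the low byte and corrupts the assembled value, which is a common cause of the "noisy" waveform described above.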
This is code that attempts to record an audio sample, but I have an unconstructed AudioFormat object (which gets passed to DataLine.Info) because I don't know the sample rate.
EDIT
I have seen that just arbitrarily picking a sample rate of 8000 works. But is that fine? Can I use any sample rate?
boolean lineIsStopped = false;
TargetDataLine line = null;
AudioFormat af; // object not constructed throughout
DataLine.Info info = new DataLine.Info(TargetDataLine.class, af); // af not initialized

try {
    line = (TargetDataLine) AudioSystem.getLine(info);
    line.open(af);
} catch (LineUnavailableException ex) {
    // handle the error
}

// now we are ready for input
// call start to begin accepting data from the mic
byte data[] = new byte[line.getBufferSize() / 5];
line.start(); // this statement starts delivering data into the line buffer

// start retrieving data from the line buffer
int numBytesRead;
int offset = 0;
ByteArrayOutputStream out = new ByteArrayOutputStream();
while (!lineIsStopped) { // while the line is not stopped, i.e. is active
    numBytesRead = line.read(data, offset, data.length);
    // now save the data
    try {
        out.write(data); // writes data to this output stream
    } catch (Exception exc) {
        System.out.println(exc);
    }
}
Given this, how can I construct the AudioFormat object without having any audio sample yet?
After reading your comments: you are recording from the mic, in which case you want to set the audio format according to the quality you want from the mic. If you want telephone quality, 8 kHz is fine; for tape quality, 22 kHz; and for CD-quality audio, 44.1 kHz. Of course, if you're transmitting that over the network, then 8 kHz is probably going to be good enough.
It's always a good idea to have this be a setting in your application so the user can control what quality they want.
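As a sketch of what that could look like in code (the buildFormat and openMicLine helpers, and the mono 16-bit little-endian choices, are illustrative assumptions):

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

// Illustrative helper: build a mono, 16-bit signed little-endian PCM format
// at whatever sample rate the user (or a settings screen) asked for.
static AudioFormat buildFormat(float sampleRate) {
    int sampleSizeInBits = 16;
    int channels = 1;
    boolean signed = true;
    boolean bigEndian = false;
    return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
}

static TargetDataLine openMicLine(float sampleRate) throws LineUnavailableException {
    AudioFormat af = buildFormat(sampleRate);          // e.g. 8000, 22050, or 44100
    DataLine.Info info = new DataLine.Info(TargetDataLine.class, af);
    TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
    line.open(af);
    return line;
}

With the format constructed up front from a user-chosen rate such as 8000, 22050, or 44100, the DataLine.Info no longer has to reference an uninitialized af.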