After a lot of research and trial and error, I have reached a point where I can construct a spectrogram, though I think it has some elements right and some wrong.
1. First, I read the .wav file into a byte array and extract only the data part.
2. I convert the byte array into a double array, taking the average of the right and left channels. I also noticed that one sample of one channel consists of 2 bytes, so 4 bytes become 1 double.
3. For a certain window size (a power of 2), I apply the FFT from here and get the amplitude in the frequency domain. This is one vertical strip of the spectrogram image.
4. I repeat this with the same window size, with overlap, across the whole data and obtain the spectrogram.
The following is the code that reads the .wav file into a double array:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
public class readWAV2Array {
private byte[] entireFileData;
//SR = sampling rate
public double getSR(){
ByteBuffer wrapped = ByteBuffer.wrap(Arrays.copyOfRange(entireFileData, 24, 28)); // big-endian by default
double SR = wrapped.order(java.nio.ByteOrder.LITTLE_ENDIAN).getInt();
return SR;
}
public readWAV2Array(String filepath, boolean print_info) throws IOException{
Path path = Paths.get(filepath);
this.entireFileData = Files.readAllBytes(path);
if (print_info){
//extract format
String format = new String(Arrays.copyOfRange(entireFileData, 8, 12), "UTF-8");
//extract number of channels
int noOfChannels = entireFileData[22];
String noOfChannels_str;
if (noOfChannels == 2)
noOfChannels_str = "2 (stereo)";
else if (noOfChannels == 1)
noOfChannels_str = "1 (mono)";
else
noOfChannels_str = noOfChannels + " (more than 2 channels)";
//extract sampling rate (SR)
int SR = (int) this.getSR();
//extract bits per sample (bit depth)
int BPS = entireFileData[34];
System.out.println("---------------------------------------------------");
System.out.println("File path: " + filepath);
System.out.println("File format: " + format);
System.out.println("Number of channels: " + noOfChannels_str);
System.out.println("Sampling rate: " + SR);
System.out.println("Bit depth: " + BPS);
System.out.println("---------------------------------------------------");
}
}
public double[] getByteArray (){
byte[] data_raw = Arrays.copyOfRange(entireFileData, 44, entireFileData.length);
int totalLength = data_raw.length;
//declare double array for mono
int new_length = totalLength/4;
double[] data_mono = new double[new_length];
double left, right;
for (int i = 0; i < new_length; i++){
left = ((data_raw[i] & 0xff) << 8) | (data_raw[i+1] & 0xff);
right = ((data_raw[i+2] & 0xff) << 8) | (data_raw[i+3] & 0xff);
data_mono[i] = (left+right)/2.0;
}
return data_mono;
}
}
The following code is the main program:
import java.awt.Color;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import javax.imageio.ImageIO;
public class App {
public static Color getColor(double power) {
double H = power * 0.4; // Hue (note 0.4 = green, see hue chart below)
double S = 1.0; // Saturation
double B = 1.0; // Brightness
return Color.getHSBColor((float)H, (float)S, (float)B);
}
public static void main(String[] args) {
String filepath = "audio_work/Sine_Sweep_Full_Spectrum_20_Hz_20_kHz_audiocheck.wav";
try {
//get raw double array containing .WAV data
readWAV2Array audioTest = new readWAV2Array(filepath, true);
double[] rawData = audioTest.getByteArray();
int length = rawData.length;
//initialize parameters for FFT
int WS = 2048; //WS = window size
int OF = 8; //OF = overlap factor
int windowStep = WS/OF;
//calculate FFT parameters
double SR = audioTest.getSR();
double time_resolution = WS/SR;
double frequency_resolution = SR/WS;
double highest_detectable_frequency = SR/2.0;
double lowest_detectable_frequency = 5.0*SR/WS;
System.out.println("time_resolution: " + time_resolution*1000 + " ms");
System.out.println("frequency_resolution: " + frequency_resolution + " Hz");
System.out.println("highest_detectable_frequency: " + highest_detectable_frequency + " Hz");
System.out.println("lowest_detectable_frequency: " + lowest_detectable_frequency + " Hz");
//initialize plotData array
int nX = (length-WS)/windowStep;
int nY = WS;
double[][] plotData = new double[nX][nY];
//apply FFT and find MAX and MIN amplitudes
double maxAmp = Double.NEGATIVE_INFINITY; //Double.MIN_VALUE is the smallest positive double, not the most negative
double minAmp = Double.POSITIVE_INFINITY;
double amp_square;
double[] inputImag = new double[length];
for (int i = 0; i < nX; i++){
Arrays.fill(inputImag, 0.0);
double[] WS_array = FFT.fft(Arrays.copyOfRange(rawData, i*windowStep, i*windowStep+WS), inputImag, true);
for (int j = 0; j < nY; j++){
amp_square = (WS_array[2*j]*WS_array[2*j]) + (WS_array[2*j+1]*WS_array[2*j+1]);
if (amp_square == 0.0){
plotData[i][j] = amp_square;
}
else{
plotData[i][j] = 10 * Math.log10(amp_square);
}
//find MAX and MIN amplitude
if (plotData[i][j] > maxAmp)
maxAmp = plotData[i][j];
else if (plotData[i][j] < minAmp)
minAmp = plotData[i][j];
}
}
System.out.println("---------------------------------------------------");
System.out.println("Maximum amplitude: " + maxAmp);
System.out.println("Minimum amplitude: " + minAmp);
System.out.println("---------------------------------------------------");
//Normalization
double diff = maxAmp - minAmp;
for (int i = 0; i < nX; i++){
for (int j = 0; j < nY; j++){
plotData[i][j] = (plotData[i][j]-minAmp)/diff;
}
}
//plot image
BufferedImage theImage = new BufferedImage(nX, nY, BufferedImage.TYPE_INT_RGB);
double ratio;
for(int x = 0; x<nX; x++){
for(int y = 0; y<nY; y++){
ratio = plotData[x][y];
//theImage.setRGB(x, y, new Color(red, green, 0).getRGB());
Color newColor = getColor(1.0-ratio);
theImage.setRGB(x, y, newColor.getRGB());
}
}
File outputfile = new File("saved.png");
ImageIO.write(theImage, "png", outputfile);
} catch (IOException e) {
e.printStackTrace();
}
}
}
However, the image I obtain from a .wav file playing a sweeping sound from 20 Hz to 20 kHz is like this:
The color shows the intensity of sound: red (high) --> green (low)
By right, it should look something like the picture below:
I would really appreciate any corrections, improvements, or suggestions on my project. Thank you in advance for commenting on my question.
Fortunately it seems you have more rights than wrongs.
The first and main issue, which results in the extra red lines, is how you decode the data in readWAV2Array.getByteArray. Since each stereo frame spans 4 bytes, you must index in multiples of 4 (e.g. bytes 0,1,2,3 for sample 0 and bytes 4,5,6,7 for sample 1); otherwise you are reading overlapping blocks of 4 bytes (e.g. bytes 0,1,2,3 for sample 0, then bytes 1,2,3,4 for sample 1). The other thing with this conversion is that you must explicitly cast the result to the signed short type before assigning it to left and right (which are of type double) in order to get a signed 16-bit result out of the unsigned bytes. This gives a conversion loop that looks like:
for (int i = 0; 4*i+3 < totalLength; i++){
left = (short)((data_raw[4*i+1] & 0xff) << 8) | (data_raw[4*i] & 0xff);
right = (short)((data_raw[4*i+3] & 0xff) << 8) | (data_raw[4*i+2] & 0xff);
data_mono[i] = (left+right)/2.0;
}
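(Note that 16-bit WAV data is stored little-endian, which is why the second byte of each pair is the one shifted into the high position here, the opposite of your original loop.)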
At this point you should start to get a plot that has strong lines representing your 20Hz-20kHz chirp:
But you should notice that you actually get two lines. This is because, for a real-valued signal, the frequency spectrum has Hermitian symmetry: the magnitude of the spectrum above the Nyquist frequency (half the sampling rate, in this case 44100 Hz / 2) is a redundant mirror image of the spectrum below the Nyquist frequency. Plotting only the non-redundant part below the Nyquist frequency can be achieved by changing the definition of nY in main to:
int nY = WS/2 + 1;
and would give you:
Almost what we're looking for, but the sweep with increasing frequency produces a figure with a decreasing line. That's because your indexing puts the 0 Hz frequency at index 0, the top of the figure, and the 22050 Hz frequency at index nY-1, the bottom of the figure. To flip the figure around and get the more usual 0 Hz at the bottom and 22050 Hz at the top, change the indexing to:
plotData[i][nY-j-1] = 10 * Math.log10(amp_square);
Now you should have a plot which looks like the one you were expecting (although with a different color map):
A final note: while I understand your intention to avoid taking the log of 0 in your conversion to decibels, falling back to the linear-scale amplitude in that specific case can produce unexpected results. Instead, I would select a cutoff threshold amplitude for the protection:
// select threshold based on the expected spectrum amplitudes
// e.g. 80dB below your signal's spectrum peak amplitude
double threshold = 1.0;
// limit values and convert to dB
plotData[i][nY-j-1] = 10 * Math.log10(Math.max(amp_square,threshold));
I am really new to sound processing, and so far I have managed to understand (with a lot of help and criticism :P) how to (1) take two frequencies and generate audio out of them, alternately.
Then, (2) write that audio as a .wav file that can be played by media players.
Then, (3) instead of a time, I took input from the user in the form of bits (8 bits max); where there is a '0' in the given input I used the 1st frequency, and in case of a '1' the 2nd frequency.
I am attaching the code mentioned above, the '(3)' one, for the sake of helping someone who needs it.
If you want to see my previous code, click here
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.util.Scanner;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
public class AudioBits {
public static void main(String[] args) throws IOException {
Scanner in = new Scanner(System.in);
final double SAMPLING_RATE = 44100; // Audio sampling rate
float timeInterval = in.nextFloat(); //Time specified by user in milliseconds for each bit to be played
int frequency1 = in.nextInt(); //Frequency1 specified by the user in hz
int frequency2 = in.nextInt(); //Frequency2 specified by the user in hz
//To check if the user enters the value in the form of 0-1 or not, as that what is required
//And also the bits entered should not be greater than 8
while (!in.hasNext("[0-1]{1,8}")) {
System.out.println("Wrong input.");
in.next();
}
//Value in zero-one form. Where there is '0' it means one frequency and incase of '1' it means the other frequency
String binary = in.next();
//Converting the String value of one-zero form into its equivalent integer
int value = Integer.parseInt(binary, 2);
int binVal = 0b10000000; //Used to perform '&' operation with 'value'
//Size of buffer[]: timeInterval milliseconds of samples (e.g. 2000 ms -> 88200)
float buffer[] = new float[((int) (timeInterval * SAMPLING_RATE)) / 1000];
//buffer1[] has the same size
float buffer1[] = new float[((int) (timeInterval * SAMPLING_RATE)) / 1000];
for (int sample = 0; sample < buffer.length; sample++) { //range from zero to buffer.length
double cycle = sample / SAMPLING_RATE; //Fraction of cycle between samples
buffer[sample] = (float) (Math.sin(2 * Math.PI * frequency1 * cycle)); //value at every point of the cycle
}
for (int sample = 0; sample < buffer1.length; sample++) {
double cycle = sample / SAMPLING_RATE; //Fraction of cycle between samples
buffer1[sample] = (float) (Math.sin(2 * Math.PI * frequency2 * cycle));
}
byte byteBuffer[] = new byte[buffer.length * 2]; //Size of byteBuffer
byte byteBuffer1[] = new byte[buffer1.length * 2]; //Size of byteBuffer1
int count = 0;
//Convert float samples to 16-bit little-endian PCM bytes
for (int i = 0; i < byteBuffer.length; i++) {
int x = (int) (buffer[count++] * Short.MAX_VALUE);
byteBuffer[i++] = (byte) x;
byteBuffer[i] = (byte) (x / 256);
}
count = 0;
for (int i = 0; i < byteBuffer1.length; i++) {
int x = (int) (buffer1[count++] * Short.MAX_VALUE);
byteBuffer1[i++] = (byte) x;
byteBuffer1[i] = (byte) (x / 256);
}
byte[] merge = new byte[8 * byteBuffer.length]; //Merged Array's length
//Merging the two frequencies into one. Where there is '0' adding 1st frequency and in case of '1' adding 2nd
for (int i = 0; i < 8; i++) { //Loop for 8 Bits
int c = value & binVal; //'&' operation to check whether 'c' contains zero or not in every iteration
if (c == 0) {
System.arraycopy(byteBuffer, 0, merge, i * (byteBuffer.length), byteBuffer.length); //Adds 1st frequency
} else {
System.arraycopy(byteBuffer1, 0, merge, i * (byteBuffer.length), byteBuffer1.length); //Adds 2nd frequency
}
binVal = binVal >> 1; //Right Shifting the value of 'binVal' to be used for 'c'
}
File out = new File("E:/RecordAudio30.wav"); //The path where user want the file data to be written
//Construct an audio format using a 44100 Hz sampling rate, 16-bit samples, mono,
//signed, and little-endian byte ordering (the final 'false')
AudioFormat format = new AudioFormat((float) SAMPLING_RATE, 16, 1, true, false);
// It uses 'merge' as its buffer array that contains bytes that may be read from the stream.
ByteArrayInputStream bais = new ByteArrayInputStream(merge);
//Constructs an audio input stream that has the requested format and length in sample frames, using audio data
//from the specified input stream.
AudioInputStream audioInputStream = new AudioInputStream(bais, format, (long) (8 * (byteBuffer.length / 2)));
//Writes a stream of bytes representing an audio file of the specified file type to the external file provided.
AudioSystem.write(audioInputStream, AudioFileFormat.Type.WAVE, out);
audioInputStream.close(); //Closes this audio input stream
}
}
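For example (a hypothetical run): entering 200 for the time interval, 440 and 880 for the two frequencies, and 10110001 for the bits produces a 1.6-second .wav file that plays the 880 Hz tone for 200 ms for each '1' and the 440 Hz tone for each '0'.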
I am a student working on my first robotics project. I need help fixing a program that was provided by the manufacturer when I purchased an IR camera (similar to the Wii IR camera). The manufacturer provided two sketches. The first is an Arduino sketch that reads the I2C input and prints it out via the serial monitor. I don't have a problem with this sketch, but I am providing it for reference.
#include <Wire.h>
int IRsensorAddress = 0xB0;
//int IRsensorAddress = 0x58;
int slaveAddress;
int ledPin = 13;
boolean ledState = false;
byte data_buf[16];
int i;
int Ix[4];
int Iy[4];
int s;
void Write_2bytes(byte d1, byte d2)
{
Wire.beginTransmission(slaveAddress);
Wire.write(d1); Wire.write(d2);
Wire.endTransmission();
}
void setup()
{
slaveAddress = IRsensorAddress >> 1; // This results in 0x58 as the address to pass to TWI
Serial.begin(19200);
pinMode(ledPin, OUTPUT); // Set the LED pin as output
Wire.begin();
// IR sensor initialize
Write_2bytes(0x30, 0x01); delay(10);
Write_2bytes(0x30, 0x08); delay(10);
Write_2bytes(0x06, 0x90); delay(10);
Write_2bytes(0x08, 0xC0); delay(10);
Write_2bytes(0x1A, 0x40); delay(10);
Write_2bytes(0x33, 0x33); delay(10);
delay(100);
}
void loop()
{
ledState = !ledState;
if (ledState) { digitalWrite(ledPin, HIGH); }
else { digitalWrite(ledPin, LOW); }
//IR sensor read
Wire.beginTransmission(slaveAddress);
Wire.write(0x36);
Wire.endTransmission();
Wire.requestFrom(slaveAddress, 16); // Request the 16 bytes of tracking data
for (i = 0; i < 16; i++) { data_buf[i] = 0; }
i = 0;
while (Wire.available() && i < 16) {
data_buf[i] = Wire.read();
i++;
}
Ix[0] = data_buf[1];
Iy[0] = data_buf[2];
s = data_buf[3];
Ix[0] += (s & 0x30) << 4;
Iy[0] += (s & 0xC0) << 2;
Ix[1] = data_buf[4];
Iy[1] = data_buf[5];
s = data_buf[6];
Ix[1] += (s & 0x30) << 4;
Iy[1] += (s & 0xC0) << 2;
Ix[2] = data_buf[7];
Iy[2] = data_buf[8];
s = data_buf[9];
Ix[2] += (s & 0x30) << 4;
Iy[2] += (s & 0xC0) << 2;
Ix[3] = data_buf[10];
Iy[3] = data_buf[11];
s = data_buf[12];
Ix[3] += (s & 0x30) << 4;
Iy[3] += (s & 0xC0) << 2;
for (i = 0; i < 4; i++)
{
if (Ix[i] < 1000)
Serial.print("");
if (Ix[i] < 100)
Serial.print("");
if (Ix[i] < 10)
Serial.print("");
Serial.print(int(Ix[i]));
Serial.print(",");
if (Iy[i] < 1000)
Serial.print("");
if (Iy[i] < 100)
Serial.print("");
if (Iy[i] < 10)
Serial.print("");
Serial.print(int(Iy[i]));
if (i < 3)
Serial.print(",");
}
Serial.println("");
delay(15);
}
The second sketch is a Processing sketch that draws a visual representation of the IR signal seen by the camera. This is where I am having the issue. Sometimes when I start the program, it runs correctly: I get a printout of 8 numbers (four (x,y) coordinates), and the blobs representing the IR signals can be seen in the Processing window. But more often than not, I get an "ArrayIndexOutOfBoundsException" followed by the number of the index being requested from the output array.
Based on my searching and a request for help on the Processing forum, it seems there may be an issue with converting the string from the serial output of the Arduino sketch into an integer array in the Processing sketch. The only response I got from the Processing forum was a link suggesting that I add tabs ("\t") to separate the string elements before splitting and converting them, but that didn't do anything to fix the array error. I originally added a line to print the length of the array so I could check that it was correct (it should be 8), and the length of the array kept changing. I'm pretty sure the error is occurring at lines 29 to 40.
This is the Processing sketch:
// Example by Tom Igoe
// Modified for http://www.DFRobot.com by Lumi, Jan. 2014
/*
This code should show one colored blob for each detected IR source (max four) at the relative
position to the camera.
*/
import processing.serial.*;
int lf = 10; // Linefeed in ASCII
String myString = null;
Serial myPort; // The serial port
void setup() {
// List all the available serial ports
println(Serial.list());
// Open the port you are using at the rate you want:
myPort = new Serial(this, Serial.list()[1], 19200);
myPort.clear();
// Throw out the first reading, in case we started reading
// in the middle of a string from the sender.
myString = myPort.readStringUntil(lf);
myString = null;
size(800, 800);
//frameRate(30);
}
void draw() {
background(77);
//while (myPort.available() > 0) {
myString = myPort.readStringUntil(lf);
if (myString != null) {
int[] output = int(split(myString, ','));
println(myString); // display the incoming string
int xx = output[0];
int yy = output[1];
int ww = output[2];
int zz = output[3];
int xxx = output[4];
int yyy = output[5];
int www = output[6];
int zzz = output[7];
ellipseMode(RADIUS); // Set ellipseMode to RADIUS
fill(255, 0, 0); // Set fill to red
ellipse(xx, yy, 20, 20);
ellipseMode(RADIUS); // Set ellipseMode to RADIUS
fill(0, 255, 0); // Set fill to green
ellipse(ww, zz, 20, 20);
ellipseMode(RADIUS); // Set ellipseMode to RADIUS
fill(0, 0, 255); // Set fill to blue
ellipse(xxx, yyy, 20, 20);
ellipseMode(RADIUS); // Set ellipseMode to RADIUS
fill(255); // Set fill to white
ellipse(www, zzz, 20, 20);
}
}
Any help would be appreciated. I have been banging my head against this thing for over a week now, but I don't want to give up.
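One common guard for this kind of error (a minimal sketch, assuming the comma-separated line format above; readStringUntil can return a partial line, so the token count must be checked before indexing):
void draw() {
background(77);
myString = myPort.readStringUntil(lf);
if (myString != null) {
// trim() strips the trailing newline; a partial read can yield
// fewer than 8 tokens, so check the length before indexing
int[] output = int(split(trim(myString), ','));
if (output.length == 8) {
// ... safe to read output[0] through output[7] and draw here
}
}
}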
I am trying to find the k nearest neighbors with the Knn classifier in OpenCV.
I found this C++ code:
class atsKNN{
public :
void knn(cv::Mat& trainingData, cv::Mat& trainingClasses, cv::Mat& testData, cv::Mat& testClasses, int K)
{
cv::KNearest knn(trainingData, trainingClasses, cv::Mat(), false, K);
cv::Mat predicted(testClasses.rows, 1, CV_32F);
for(int i = 0; i < testData.rows; i++) {
const cv::Mat sample = testData.row(i);
predicted.at<float>(i,0) = knn.find_nearest(sample, K);
}
float percentage = evaluate(predicted, testClasses) * 100;
cout << "K Nearest Neighbor Evaluated Accuracy = " << percentage << "%" << endl;
prediction = predicted;
}
void showplot(cv::Mat testData)
{
plot_binary(testData, prediction, "Predictions Backpropagation");
}
private:
cv::Mat prediction;
};
The comments mention that it works really well, but I am having problems converting it to Java. There is no documentation for Java. I tried using a C++-to-Java converter, but the resulting code does not work.
Here is the code it produced:
public class atsKNN
{
public final void knn(cv.Mat trainingData, cv.Mat trainingClasses, cv.Mat testData, cv.Mat testClasses, int K)
{
cv.KNearest knn = new cv.KNearest(trainingData, trainingClasses, cv.Mat(), false, K);
cv.Mat predicted = new cv.Mat(testClasses.rows, 1, CV_32F);
for (int i = 0; i < testData.rows; i++)
{
final cv.Mat sample = testData.row(i);
predicted.<Float>at(i,0) = knn.find_nearest(sample, K);
}
float percentage = evaluate(predicted, testClasses) * 100;
System.out.print("K Nearest Neighbor Evaluated Accuracy = ");
System.out.print(percentage);
System.out.print("%");
System.out.print("\n");
prediction = predicted;
}
public final void showplot(cv.Mat testData)
{
plot_binary(testData, prediction, "Predictions Backpropagation");
}
private cv.Mat prediction = new cv.Mat();
}
edit:
The line predicted.at(i,0) = knn.find_nearest(sample, K); is most definitely wrong.
There is no function at in the object Mat.
Also, there is no evaluate function.
Another thing: where does the prediction Mat belong? In Java you cannot just put it at the end of the class.
Thanks=)
The following code is for finding the digits; here's some code to try:
import org.opencv.core.*;
import org.opencv.imgproc.*;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.ml.*;
import org.opencv.utils.*;
import java.util.*;
class SimpleSample {
static{ System.loadLibrary(Core.NATIVE_LIBRARY_NAME); }
public static void main(String[] args) {
// samples/data/digits.png, have a look at it.
Mat digits = Imgcodecs.imread("digits.png", 0);
// setup train/test data:
Mat trainData = new Mat(),
testData = new Mat();
List<Integer> trainLabs = new ArrayList<Integer>(),
testLabs = new ArrayList<Integer>();
// 50 rows of digits, 5 rows per digit class:
for (int r=0; r<50; r++) {
// 100 digits per row:
for (int c=0; c<100; c++) {
// crop out 1 digit:
Mat num = digits.submat(new Rect(c*20,r*20,20,20));
// we need float data for knn:
num.convertTo(num, CvType.CV_32F);
// 50/50 train/test split:
if (c % 2 == 0) {
// for opencv ml, each feature has to be a single row:
trainData.push_back(num.reshape(1,1));
// add a label for that feature (the digit number):
trainLabs.add(r/5);
} else {
testData.push_back(num.reshape(1,1));
testLabs.add(r/5);
}
}
}
// make a Mat of the train labels, and train knn:
KNearest knn = KNearest.create();
knn.train(trainData, Ml.ROW_SAMPLE, Converters.vector_int_to_Mat(trainLabs));
// now test predictions:
for (int i=0; i<testData.rows(); i++)
{
Mat one_feature = testData.row(i);
int testLabel = testLabs.get(i);
Mat res = new Mat();
float p = knn.findNearest(one_feature, 1, res);
System.out.println(testLabel + " " + p + " " + res.dump());
}
//// hmm, the 'real world' test case probably looks more like this:
//// make sure, you follow the very same preprocessing steps used in the train phase:
// Mat one_feature = Imgcodecs.imread("one_digit.png", 0);
// Mat feature = new Mat(); one_feature.convertTo(feature, CvType.CV_32F);
// Imgproc.resize(feature, feature, new Size(20,20));
// float predicted = knn.findNearest(feature.reshape(1,1), 1);
}
}
I have a Java application that records audio from a mixer and stores it in a byte array, or saves it to a file.
What I need is to get audio from two mixers simultaneously and save it to an audio file (I am trying with .wav).
The thing is that I can get the two byte arrays, but I don't know how to merge them (by "merge" I don't mean concatenate).
To be specific, it is an application that handles conversations over a USB modem, and I need to record them (the streams are the voices of each talking person; I have already managed to record them separately).
Any clue on how to do it?
Here is my code:
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.Path;
public class FileMixer {
Path path1 = Paths.get("/file1.wav");
Path path2 = Paths.get("/file2.wav");
byte[] byte1 = Files.readAllBytes(path1);
byte[] byte2 = Files.readAllBytes(path2);
byte[] out = new byte[byte1.length];
public FileMixer() {
byte[] byte1 = Files.readAllBytes(path1);
byte[] byte2 = Files.readAllBytes(path2);
for (int i=0; i<byte1.Length; i++)
out[i] = (byte1[i] + byte2[i]) >> 1;
}
}
Thanks in advance
To mix sound waves digitally, you add each corresponding data point from the two files together:
for (int i=0; i<source1.length; i++)
result[i] = (byte) ((source1[i] + source2[i]) >> 1);
In other words, you take item 0 from byte array 1 and item 0 from byte array 2, add them together, and put the resulting number in item 0 of your result array. Repeat for the remaining values. To prevent overflow, you may need to divide each resulting value by two, which is what the >> 1 above does.
Make sure to merge amplitude data and not just byte data. If your sample size is 8 bits, one byte equals one amplitude value; but if it is 16 bits, you need to combine two bytes into one short and merge those.
Currently you are loading your file like this:
byte[] byte1 = Files.readAllBytes(path1);
This will also load your .wav file header into the byte array, but you only want to merge the actual audio data. Load it like this instead:
public static ByteBuffer loadFile(File file) throws IOException {
DataInputStream in = new DataInputStream(new FileInputStream(file));
byte[] sound = new byte[in.available() - 44];
in.skipNBytes(44); // skip the 44-byte header
in.readFully(sound); // read() may stop early; readFully fills the whole array
in.close();
return ByteBuffer.wrap(sound);
}
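Note that in.available() works here because a FileInputStream reports the number of bytes remaining in the file; for other stream types you would determine the size from the file length instead.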
You can then merge every byte of these buffers, or every two bytes, depending on your sample size. I will use 16 bits as it's more common.
public static ByteBuffer mergeAudio(ByteBuffer smaller, ByteBuffer larger) {
// The amplitude data in a .wav file is stored little-endian, while
// getShort() reads big-endian, so each short must be byte-reversed
// to get the actual value. We add up the amplitude values, divide by
// the number of contributing sources to get the mean, and reverse the
// bytes again before storing the result.
// The result will have the size of the larger audio file. In my case that's file2.
ByteBuffer result = ByteBuffer.allocate(larger.capacity());
while (larger.hasRemaining()) {
// getShort() for SampleSize 16bit get() for 8 bit.
// Reverse the short because of LittleEndian/BigEndian
short sum = Short.reverseBytes(larger.getShort());
int matches = 1;
// check if the smaller file still has content so it needs to merge
if (smaller.hasRemaining()) {
// getShort() for SampleSize 16bit get() for 8 bit
// Reverse the short because of LittleEndian/BigEndian
sum += Short.reverseBytes(smaller.getShort());
matches++;
}
// append the mean of all merged values
// reverse again
result.putShort(Short.reverseBytes((short) (sum / (float) matches)));
}
return result;
}
We now need to write our own .wav file header followed by the merged data, and finally write the result to disk:
public static void saveToFile(File file, byte[] audioData) throws IOException {
int audioSize = audioData.length;
int fileSize = audioSize + 44;
// The stream that writes the audio file to the disk
DataOutputStream out = new DataOutputStream(new FileOutputStream(file));
// Write Header
out.writeBytes("RIFF");// 0-4 ChunkId always RIFF
out.writeInt(Integer.reverseBytes(fileSize));// 5-8 ChunkSize always audio-length +header-length(44)
out.writeBytes("WAVE");// 9-12 Format always WAVE
out.writeBytes("fmt ");// 13-16 Subchunk1 ID always "fmt " with trailing whitespace
out.writeInt(Integer.reverseBytes(16)); // 17-20 Subchunk1 Size always 16
out.writeShort(Short.reverseBytes(audioFormat));// 21-22 Audio-Format 1 for PCM PulseAudio
out.writeShort(Short.reverseBytes(channels));// 23-24 Num-Channels 1 for mono, 2 for stereo
out.writeInt(Integer.reverseBytes(sampleRate));// 25-28 Sample-Rate
out.writeInt(Integer.reverseBytes(byteRate));// 29-32 Byte Rate
out.writeShort(Short.reverseBytes(blockAlign));// 33-34 Block Align
out.writeShort(Short.reverseBytes(sampleSize));// 35-36 Bits-Per-Sample
out.writeBytes("data");// 37-40 Subchunk2 ID always data
out.writeInt(Integer.reverseBytes(audioSize));// 41-44 Subchunk 2 Size audio-length
out.write(audioData);// append the merged data
out.close();// close the stream properly
}
It's important that the two files you want to merge have the same channels, sample size, sample rate, and audio format.
This is how you calculate the header data:
private static short audioFormat = 1;
private static int sampleRate = 44100;
private static short sampleSize = 16;
private static short channels = 2;
private static short blockAlign = (short) (sampleSize * channels / 8);
private static int byteRate = sampleRate * sampleSize * channels / 8;
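With these values (44100 Hz, 16-bit, stereo), that works out to blockAlign = 16 * 2 / 8 = 4 bytes per sample frame and byteRate = 44100 * 16 * 2 / 8 = 176400 bytes per second.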
Here is a working example where I put everything together:
import static java.lang.Math.ceil;
import static java.lang.Math.round;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
public class AudioMerger {
private short audioFormat = 1;
private int sampleRate = 44100;
private short sampleSize = 16;
private short channels = 2;
private short blockAlign = (short) (sampleSize * channels / 8);
private int byteRate = sampleRate * sampleSize * channels / 8;
private ByteBuffer audioBuffer;
private ArrayList<MergeSound> sounds = new ArrayList<MergeSound>();
private ArrayList<Integer> offsets = new ArrayList<Integer>();
public void addSound(double offsetInSeconds, MergeSound sound) {
if (sound.getAudioFormat() != audioFormat)
throw new RuntimeException("Incompatible AudioFormat");
if (sound.getSampleRate() != sampleRate)
throw new RuntimeException("Incompatible SampleRate");
if (sound.getSampleSize() != sampleSize)
throw new RuntimeException("Incompatible SampleSize");
if (sound.getChannels() != channels)
throw new RuntimeException("Incompatible amount of Channels");
int offset = secondsToByte(offsetInSeconds);
offset = offset % 2 == 0 ? offset : offset + 1;// align to a short boundary so merging starts on a whole sample
sounds.add(sound);
offsets.add(offset);// store the aligned offset
}
public void merge(double durationInSeconds) {
audioBuffer = ByteBuffer.allocate(secondsToByte(durationInSeconds));
for (int i = 0; i < sounds.size(); i++) {
ByteBuffer buffer = sounds.get(i).getBuffer();
int offset1 = offsets.get(i);
// iterate over all sound data to append it
while (buffer.hasRemaining()) {
int position = offset1 + buffer.position();// the global position in audioBuffer
// add the audio data to the vars
short sum = Short.reverseBytes(buffer.getShort());
int matches = 1;
// make sure later entries don't override previously merged data:
// continue only if there's empty audio data at this position
if (audioBuffer.getShort(position) == 0) {
// iterate over the other sounds and check if the need to be merged
for (int j = i + 1; j < sounds.size(); j++) {// set j to i+1 to avoid all previous
ByteBuffer mergeBuffer = sounds.get(j).getBuffer();
int mergeOffset = offsets.get(j);
// check if this soundfile contains data that has to be merged
if (position >= mergeOffset && position < mergeOffset + mergeBuffer.capacity()) {
sum += Short.reverseBytes(mergeBuffer.getShort(position - mergeOffset));
matches++;
}
}
// cast to float before dividing: 3/2 = 1 in integer math, but round(3/2f) = 2
audioBuffer.putShort(position, Short.reverseBytes((short) round(sum / (float) matches)));
}
}
buffer.rewind();// So the sound can be added again
}
}
private int secondsToByte(double seconds) {
return (int) ceil(seconds * byteRate);
}
public void saveToFile(File file) throws IOException {
byte[] audioData = audioBuffer.array();
int audioSize = audioData.length;
int fileSize = audioSize + 44;
// The stream that writes the audio file to the disk
DataOutputStream out = new DataOutputStream(new FileOutputStream(file));
// Write Header
out.writeBytes("RIFF");// 0-4 ChunkId always RIFF
out.writeInt(Integer.reverseBytes(fileSize));// 5-8 ChunkSize always audio-length +header-length(44)
out.writeBytes("WAVE");// 9-12 Format always WAVE
out.writeBytes("fmt ");// 13-16 Subchunk1 ID always "fmt " with trailing whitespace
out.writeInt(Integer.reverseBytes(16)); // 17-20 Subchunk1 Size always 16
out.writeShort(Short.reverseBytes(audioFormat));// 21-22 Audio-Format 1 for PCM PulseAudio
out.writeShort(Short.reverseBytes(channels));// 23-24 Num-Channels 1 for mono, 2 for stereo
out.writeInt(Integer.reverseBytes(sampleRate));// 25-28 Sample-Rate
out.writeInt(Integer.reverseBytes(byteRate));// 29-32 Byte Rate
out.writeShort(Short.reverseBytes(blockAlign));// 33-34 Block Align
out.writeShort(Short.reverseBytes(sampleSize));// 35-36 Bits-Per-Sample
out.writeBytes("data");// 37-40 Subchunk2 ID always data
out.writeInt(Integer.reverseBytes(audioSize));// 41-44 Subchunk 2 Size audio-length
out.write(audioData);// append the merged data
out.close();// close the stream properly
}
}
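A minimal usage sketch (assuming a MergeSound class, which the code above references but does not show, that loads a .wav file and exposes its format fields and a ByteBuffer of sample data):
AudioMerger merger = new AudioMerger();
merger.addSound(0.0, new MergeSound(new File("file1.wav"))); // starts at 0 s
merger.addSound(2.5, new MergeSound(new File("file2.wav"))); // starts at 2.5 s
merger.merge(10.0); // total output length in seconds
merger.saveToFile(new File("merged.wav"));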
I am currently trying to implement some Android code to detect when a number of specific audio frequency ranges are played through the phone's microphone. I have set it up using the AudioRecord class:
int channel_config = AudioFormat.CHANNEL_CONFIGURATION_MONO;
int format = AudioFormat.ENCODING_PCM_16BIT;
int sampleSize = 8000; // this is actually the sample rate in Hz
int bufferSize = AudioRecord.getMinBufferSize(sampleSize, channel_config, format);
AudioRecord audioInput = new AudioRecord(AudioSource.MIC, sampleSize, channel_config, format, bufferSize);
The audio is then read in:
short[] audioBuffer = new short[bufferSize];
audioInput.startRecording();
audioInput.read(audioBuffer, 0, bufferSize);
Performing an FFT is where I become stuck, as I have very little experience in this area. I have been trying to use this class:
FFT in Java and Complex class to go with it
I am then sending the following values:
Complex[] fftTempArray = new Complex[bufferSize];
for (int i=0; i<bufferSize; i++)
{
fftTempArray[i] = new Complex(audio[i], 0);
}
Complex[] fftArray = fft(fftTempArray);
This could easily be me misunderstanding how this class is meant to work, but the values returned jump all over the place and aren't representative of a consistent frequency, even in silence. Is anyone aware of a way to perform this task, or am I overcomplicating matters by trying to grab only a small number of frequency ranges rather than drawing a graphical representation?
First you need to ensure that the result you are getting is correctly converted to a float/double. I'm not sure how the short[] version works, but the byte[] version only returns the raw bytes. This byte array then needs to be properly converted to floating-point samples. The code for the conversion should look something like this:
double[] micBufferData = new double[bytesRecorded / 2]; // one entry per 16-bit sample
final int bytesPerSample = 2; // As it is 16bit PCM
final double amplification = 100.0; // choose a number as you like
for (int index = 0, floatIndex = 0; index < bytesRecorded - bytesPerSample + 1; index += bytesPerSample, floatIndex++) {
double sample = 0;
for (int b = 0; b < bytesPerSample; b++) {
int v = bufferData[index + b];
if (b < bytesPerSample - 1 || bytesPerSample == 1) {
v &= 0xFF;
}
sample += v << (b * 8);
}
double sample32 = amplification * (sample / 32768.0);
micBufferData[floatIndex] = sample32;
}
Then you use micBufferData[] to create your input complex array.
Once you get the results, use the magnitudes of the complex numbers. Most of the magnitudes should be close to zero, except at the frequencies that are actually present.
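For example, with the Complex class from the same source (assuming it provides an abs() method returning the modulus, as the Princeton version does):
double[] magnitudes = new double[fftArray.length];
for (int i = 0; i < fftArray.length; i++) {
magnitudes[i] = fftArray[i].abs(); // sqrt(re*re + im*im)
}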
You need the sampling frequency to convert the array indices of those magnitudes into frequencies:
private double ComputeFrequency(int arrayIndex) {
return ((1.0 * sampleRate) / (1.0 * fftOutWindowSize)) * arrayIndex;
}
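Putting these together, a minimal sketch for locating the dominant frequency (assuming sampleRate is your recording rate, 8000 here, and fftOutWindowSize is the FFT length):
int peakIndex = 1; // skip bin 0 (the DC component)
for (int i = 1; i < magnitudes.length / 2; i++) { // only bins below Nyquist
if (magnitudes[i] > magnitudes[peakIndex]) peakIndex = i;
}
double peakFrequency = ComputeFrequency(peakIndex);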