I am really new to sound processing and so far I have been able to work out (with a lot of help and criticism :P) how to (1) take two frequencies and generate audio from them, alternately.
Then, (2) write that audio as a .wav file that can be played by media players.
Then, (3) instead of a duration, I took input from the user in the form of bits (8 bits max); where there is a '0' in the input I used the first frequency, and where there is a '1' the second frequency.
I am attaching the code for step (3) below, for the sake of helping someone who needs it.
If you want to see my previous code, click here
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.util.Scanner;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
public class AudioBits {
public static void main(String[] args) throws IOException {
Scanner in = new Scanner(System.in);
final double SAMPLING_RATE = 44100; // Audio sampling rate
float timeInterval = in.nextFloat(); //Time specified by user in milliseconds for each bit to be played
int frequency1 = in.nextInt(); //Frequency1 specified by the user in hz
int frequency2 = in.nextInt(); //Frequency2 specified by the user in hz
//Check that the user enters the value in 0-1 form, as that is what is required,
//and that no more than 8 bits are entered
while (!in.hasNext("[0-1]{1,8}")) {
System.out.println("Wrong input.");
in.next();
}
//Value in zero-one form. Where there is a '0' it means one frequency and in case of a '1' it means the other frequency
String binary = in.next();
//Converting the String value of one-zero form into its equivalent integer
int value = Integer.parseInt(binary, 2);
int binVal = 0b10000000; //Used to perform '&' operation with 'value'
//Size of buffer[]: samples per bit = timeInterval (ms) * SAMPLING_RATE / 1000
float buffer[] = new float[((int) (timeInterval * SAMPLING_RATE)) / 1000];
//Size of buffer1[], same length as buffer[]
float buffer1[] = new float[((int) (timeInterval * SAMPLING_RATE)) / 1000];
for (int sample = 0; sample < buffer.length; sample++) { //range from zero to buffer.length
double cycle = sample / SAMPLING_RATE; //Fraction of cycle between samples
buffer[sample] = (float) (Math.sin(2 * Math.PI * frequency1 * cycle)); //value at every point of the cycle
}
for (int sample = 0; sample < buffer1.length; sample++) {
double cycle = sample / SAMPLING_RATE; //Fraction of cycle between samples
buffer1[sample] = (float) (Math.sin(2 * Math.PI * frequency2 * cycle));
}
byte byteBuffer[] = new byte[buffer.length * 2]; //Size of byteBuffer
byte byteBuffer1[] = new byte[buffer1.length * 2]; //Size of byteBuffer1
int count = 0;
//Convert the float samples to 16-bit little-endian PCM
for (int i = 0; i < byteBuffer.length; i++) {
int x = (int) (buffer[count++] * Short.MAX_VALUE);
byteBuffer[i++] = (byte) x; //low byte
byteBuffer[i] = (byte) (x >> 8); //high byte (arithmetic shift keeps the sign of negative samples)
}
count = 0;
for (int i = 0; i < byteBuffer1.length; i++) {
int x = (int) (buffer1[count++] * Short.MAX_VALUE);
byteBuffer1[i++] = (byte) x;
byteBuffer1[i] = (byte) (x >> 8);
}
byte[] merge = new byte[8 * byteBuffer.length]; //Merged Array's length
//Merging the two frequencies into one. Where there is '0' adding 1st frequency and in case of '1' adding 2nd
for (int i = 0; i < 8; i++) { //Loop for 8 Bits
int c = value & binVal; //'&' operation to check whether 'c' contains zero or not in every iteration
if (c == 0) {
System.arraycopy(byteBuffer, 0, merge, i * (byteBuffer.length), byteBuffer.length); //Adds 1st frequency
} else {
System.arraycopy(byteBuffer1, 0, merge, i * (byteBuffer.length), byteBuffer1.length); //Adds 2nd frequency
}
binVal = binVal >> 1; //Right Shifting the value of 'binVal' to be used for 'c'
}
File out = new File("E:/RecordAudio30.wav"); //The path where user want the file data to be written
//Construct an audio format, using a 44100Hz sampling rate, 16-bit samples, mono, and
// little-endian byte ordering
AudioFormat format = new AudioFormat((float) SAMPLING_RATE, 16, 1, true, false);
// It uses 'merge' as its buffer array that contains bytes that may be read from the stream.
ByteArrayInputStream bais = new ByteArrayInputStream(merge);
//Constructs an audio input stream that has the requested format and length in sample frames, using audio data
//from the specified input stream.
AudioInputStream audioInputStream = new AudioInputStream(bais, format, (long) (8 * (byteBuffer.length / 2)));
//Writes a stream of bytes representing an audio file of the specified file type to the external file provided.
AudioSystem.write(audioInputStream, AudioFileFormat.Type.WAVE, out);
audioInputStream.close(); //Closes this audio input stream
}
}
I am using mp3spi and Triton, and this code handles exclusively 192kbps MP3 files.
The problem I am facing is that the first second of data is mostly made up exclusively of:
0,0,0,0 or 255,255,255,255
I believe I might not be skipping the header correctly, in which case the values are not a true depiction of the MP3 at that specific millisecond. Does anyone see anything wrong with the way I'm skipping the header, or with how I'm adding the bytes to the array?
In other words, I want the array at position [0] to be equal to the MP3 at position 00:00:00, and the array at position [44100] to be equal to the song at exactly 1 second in.
This is the code I use for reading the bytes from the MP3 file and adding them to the ArrayList bytes.
import javax.sound.sampled.*;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
public class ReadMP3 {
private ArrayList<Integer> bytes = new ArrayList<>();
private AudioFormat decodedFormat;
public ReadMP3() throws UnsupportedAudioFileException, IOException {
String filename = new ReadFiles().getFile();
File file = new File(filename);
AudioInputStream in = AudioSystem.getAudioInputStream(file);
AudioInputStream din = null;
AudioFormat baseFormat = in.getFormat();
AudioFormat decodedFormat = new
AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
baseFormat.getSampleRate(),
16,
baseFormat.getChannels(),
baseFormat.getChannels() * 2,
baseFormat.getSampleRate(),
false);
din = AudioSystem.getAudioInputStream(decodedFormat, in);
this.decodedFormat = decodedFormat;
int i = 0;
while(true){
int currentByte = din.read();
if (currentByte == -1) {break;}
bytes.add(i, currentByte);
i++;
}
din.close();
in.close();
}
//Getters used by AnalyzeMP3 below
public ArrayList<Integer> getBytes() { return bytes; }
public AudioFormat getDecodedFormat() { return decodedFormat; }
}
This is the second part of my code, where I add 4 bytes to each index of the array, such that array.length / 44100 is equal to the length of the song in seconds. This implies that each array[i][4] is equal to 1hz, and array[0][4] up to array[44100][4] is the first second of the song.
public class AnalyzeMP3 {
//adds 4 bytes to offset[i], where each i represents 1hz,
//and 44100hz=1sec
public static int[][] calculate(ReadMP3 mp3) {
//calculates and prints how long the song is
double seconds = mp3.getBytes().size() /
mp3.getDecodedFormat().getFrameRate() / 4;
System.out.println("Length of song: " + (int)seconds + "s");
//adds 4 values to i through the whole song
int[][] offset = new int[mp3.getBytes().size()/4][4];
for(int i = 0; i < mp3.getBytes().size()/4; i++) {
for(int j = 0; j < 4; j++) {
offset[i][j] = mp3.getBytes().get(i+j);
}
}
return offset;
}
}
Thanks Brad and VC.One for making me realize my own mistakes.
To begin with, I had to use the correct values for the PCM_SIGNED encoding, like this:
AudioFormat decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
(float)44.1, //samplerate
16, //sampleSizeInBits
2, //channels
626, //frameSize
(float)38.4615385, //frameRate
false); //bigEndian
Then I needed to accurately represent the 2 channels in an array. How I did it above in the class AnalyzeMP3 is wrong; it should be done like this:
//adds 4 values to i through the whole song
int[][] offset = new int[mp3.getBytes().size()/4][4];
int counter = 0;
for(int i = 0; i < mp3.getBytes().size()/4;i++) {
for(int j = 0; j < 4; j++) {
offset[i][j] = mp3.getBytes().get(counter);
counter++;
}
}
After making these changes the array is 4351104 in size. 4351104 / 44100 is equal to the song length in seconds. And there is no header or anything I have to skip; the array is now an accurate representation of the whole song, with 44100 entries per second. This can easily be transformed to represent 10 ms as 441 entries, etc., as sketched below.
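For illustration, here is a minimal sketch of that 10 ms grouping, assuming the offset[][] layout built in AnalyzeMP3 above (each row holding the 4 bytes of one frame); the helper name and triple-array layout are purely illustrative:
// Hypothetical helper: groups the per-frame rows into 10 ms chunks (441 frames at 44.1 kHz).
// Assumes offset[i] holds the 4 bytes of frame i, as built in the corrected loop above.
public static int[][][] splitInto10msChunks(int[][] offset) {
    final int framesPerChunk = 441;              // 44100 frames/s * 0.010 s
    int chunks = offset.length / framesPerChunk; // any incomplete tail chunk is dropped
    int[][][] result = new int[chunks][framesPerChunk][];
    for (int c = 0; c < chunks; c++) {
        for (int f = 0; f < framesPerChunk; f++) {
            result[c][f] = offset[c * framesPerChunk + f];
        }
    }
    return result;
}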
After much research and a lot of trial and error, I have reached a point where I can construct a spectrogram, which I think has elements of rights and wrongs.
1. First, I read the .wav file into a byte array and extract only the data part.
2. I convert the byte array into a double array which takes the average of the right and left channels. I also noticed that 1 sample of 1 channel consists of 2 bytes, so 4 bytes become 1 double.
3. For a certain window size (a power of 2), I apply the FFT from here and get the amplitude in the frequency domain. This is one vertical strip of the spectrogram image.
4. I do this repeatedly with the same window size and overlap for the whole data and obtain the spectrogram.
The following is the code for reading the .wav file into a double array:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
public class readWAV2Array {
private byte[] entireFileData;
//SR = sampling rate
public double getSR(){
ByteBuffer wrapped = ByteBuffer.wrap(Arrays.copyOfRange(entireFileData, 24, 28)); // big-endian by default
double SR = wrapped.order(java.nio.ByteOrder.LITTLE_ENDIAN).getInt();
return SR;
}
public readWAV2Array(String filepath, boolean print_info) throws IOException{
Path path = Paths.get(filepath);
this.entireFileData = Files.readAllBytes(path);
if (print_info){
//extract format
String format = new String(Arrays.copyOfRange(entireFileData, 8, 12), "UTF-8");
//extract number of channels
int noOfChannels = entireFileData[22];
String noOfChannels_str;
if (noOfChannels == 2)
noOfChannels_str = "2 (stereo)";
else if (noOfChannels == 1)
noOfChannels_str = "1 (mono)";
else
noOfChannels_str = noOfChannels + "(more than 2 channels)";
//extract sampling rate (SR)
int SR = (int) this.getSR();
//extract bits per sample (bit depth)
int BPS = entireFileData[34];
System.out.println("---------------------------------------------------");
System.out.println("File path: " + filepath);
System.out.println("File format: " + format);
System.out.println("Number of channels: " + noOfChannels_str);
System.out.println("Sampling rate: " + SR);
System.out.println("Bit depth: " + BPS);
System.out.println("---------------------------------------------------");
}
}
public double[] getByteArray (){
byte[] data_raw = Arrays.copyOfRange(entireFileData, 44, entireFileData.length);
int totalLength = data_raw.length;
//declare double array for mono
int new_length = totalLength/4;
double[] data_mono = new double[new_length];
double left, right;
for (int i = 0; i < new_length; i++){
left = ((data_raw[i] & 0xff) << 8) | (data_raw[i+1] & 0xff);
right = ((data_raw[i+2] & 0xff) << 8) | (data_raw[i+3] & 0xff);
data_mono[i] = (left+right)/2.0;
}
return data_mono;
}
}
The following code is the main program to run
import java.awt.Color;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import javax.imageio.ImageIO;
public class App {
public static Color getColor(double power) {
double H = power * 0.4; // Hue (note 0.4 = green, see hue chart below)
double S = 1.0; // Saturation
double B = 1.0; // Brightness
return Color.getHSBColor((float)H, (float)S, (float)B);
}
public static void main(String[] args) {
// TODO Auto-generated method stub
String filepath = "audio_work/Sine_Sweep_Full_Spectrum_20_Hz_20_kHz_audiocheck.wav";
try {
//get raw double array containing .WAV data
readWAV2Array audioTest = new readWAV2Array(filepath, true);
double[] rawData = audioTest.getByteArray();
int length = rawData.length;
//initialize parameters for FFT
int WS = 2048; //WS = window size
int OF = 8; //OF = overlap factor
int windowStep = WS/OF;
//calculate FFT parameters
double SR = audioTest.getSR();
double time_resolution = WS/SR;
double frequency_resolution = SR/WS;
double highest_detectable_frequency = SR/2.0;
double lowest_detectable_frequency = 5.0*SR/WS;
System.out.println("time_resolution: " + time_resolution*1000 + " ms");
System.out.println("frequency_resolution: " + frequency_resolution + " Hz");
System.out.println("highest_detectable_frequency: " + highest_detectable_frequency + " Hz");
System.out.println("lowest_detectable_frequency: " + lowest_detectable_frequency + " Hz");
//initialize plotData array
int nX = (length-WS)/windowStep;
int nY = WS;
double[][] plotData = new double[nX][nY];
//apply FFT and find MAX and MIN amplitudes
double maxAmp = Double.MIN_VALUE;
double minAmp = Double.MAX_VALUE;
double amp_square;
double[] inputImag = new double[length];
for (int i = 0; i < nX; i++){
Arrays.fill(inputImag, 0.0);
double[] WS_array = FFT.fft(Arrays.copyOfRange(rawData, i*windowStep, i*windowStep+WS), inputImag, true);
for (int j = 0; j < nY; j++){
amp_square = (WS_array[2*j]*WS_array[2*j]) + (WS_array[2*j+1]*WS_array[2*j+1]);
if (amp_square == 0.0){
plotData[i][j] = amp_square;
}
else{
plotData[i][j] = 10 * Math.log10(amp_square);
}
//find MAX and MIN amplitude
if (plotData[i][j] > maxAmp)
maxAmp = plotData[i][j];
else if (plotData[i][j] < minAmp)
minAmp = plotData[i][j];
}
}
System.out.println("---------------------------------------------------");
System.out.println("Maximum amplitude: " + maxAmp);
System.out.println("Minimum amplitude: " + minAmp);
System.out.println("---------------------------------------------------");
//Normalization
double diff = maxAmp - minAmp;
for (int i = 0; i < nX; i++){
for (int j = 0; j < nY; j++){
plotData[i][j] = (plotData[i][j]-minAmp)/diff;
}
}
//plot image
BufferedImage theImage = new BufferedImage(nX, nY, BufferedImage.TYPE_INT_RGB);
double ratio;
for(int x = 0; x<nX; x++){
for(int y = 0; y<nY; y++){
ratio = plotData[x][y];
//theImage.setRGB(x, y, new Color(red, green, 0).getRGB());
Color newColor = getColor(1.0-ratio);
theImage.setRGB(x, y, newColor.getRGB());
}
}
File outputfile = new File("saved.png");
ImageIO.write(theImage, "png", outputfile);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
However, the image I obtain from the .wav file playing a sweeping sound from 20Hz-20kHz looks like this:
The colors show the intensity of the sound, red (high) --> green (low).
By right, it should look something like the picture below:
I would really appreciate any corrections/improvements/suggestions on my project. Thank you in advance for commenting on my question.
Fortunately it seems you have more rights than wrongs.
The first and main issue which results in the extra red lines is due to how you decode the data in readWAV2Array.getByteArray. Since the samples span 4 bytes, you must index in multiples of 4 (e.g. bytes 0,1,2,3 for sample 0, bytes 4,5,6,7 for sample 1) otherwise you would be reading overlapping blocks of 4 bytes (e.g. bytes 0,1,2,3 for sample 0, bytes 1,2,3,4 for sample 1). The other thing with this conversion is that you must explicitly cast the result to the signed short type before it can be assigned to left and right (which are of type double) in order to get a signed 16 bit result out of unsigned bytes. This should give you a conversion loop which looks like:
for (int i = 0; 4*i+3 < totalLength; i++){
left = (short)((data_raw[4*i+1] & 0xff) << 8) | (data_raw[4*i] & 0xff);
right = (short)((data_raw[4*i+3] & 0xff) << 8) | (data_raw[4*i+2] & 0xff);
data_mono[i] = (left+right)/2.0;
}
At this point you should start to get a plot that has strong lines representing your 20Hz-20kHz chirp:
But you should notice that you actually get 2 lines. This is because for a real-valued signal the frequency spectrum has Hermitian symmetry. The magnitude of the spectrum above the Nyquist frequency (half the sampling rate, in this case 44100Hz/2) is thus a redundant reflection of the spectrum below the Nyquist frequency. Plotting only the non-redundant part below the Nyquist frequency can be achieved by changing the definition of nY in main to:
int nY = WS/2 + 1;
and would give you:
Almost what we're looking for, but the sweep with increasing frequency generates a figure with a line that's decreasing. That's because your indexing puts the 0Hz frequency at index 0, which is the top of the figure, and the 22050Hz frequency at index nY-1, which is the bottom of the figure. To flip the figure around and get the more usual 0Hz at the bottom and 22050Hz at the top, you can change the indexing to:
plotData[i][nY-j-1] = 10 * Math.log10(amp_square);
Now you should have a plot which looks like the one you were expecting (although with a different color map):
A final note: while I understand your intention to avoid taking the log of 0 in your conversion to decibels, setting the output to the linear-scale amplitude in this specific case could produce unexpected results. Instead I would select a cutoff threshold amplitude for protection:
// select threshold based on the expected spectrum amplitudes
// e.g. 80dB below your signal's spectrum peak amplitude
double threshold = 1.0;
// limit values and convert to dB
plotData[i][nY-j-1] = 10 * Math.log10(Math.max(amp_square,threshold));
I'm trying to generate a sine wave and add it to a byte array. I searched and found examples, but I always get a distorted waveform like the one in the attachment.
Please give me your opinion on why this happens. Thanks.
My code is here:
private byte[] getData(int freq) { // taking pitch data
double pha = Math.PI/2; // defining phase
final int LENGTH = 44100 * 10; // defining length of sine wave, byte array
final byte[] arr = new byte[LENGTH];
for(int i = 0; i < arr.length; i++) {
double angle = (2.0 * Math.PI * i*freq+pha) / (44100);
arr[i] = (byte) (Math.cos(angle) *127* 0.3); // 0.3 is amplitude scale
}
return arr;
}
Distorted waveform example pic
The code looks fine. I suspect it's the visualiser interpreting the two's complement signed values as unsigned (-1 becoming 255, -2 becoming 254 and so on).
I write to a wav file and plot it with SonicVisualiser
According to WAVE PCM soundfile format:
8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767.
It looks like you either need to shift your sine wave up by 128 (so that it fits fully within the 0-255 range), or move to using 16-bit samples.
You can use this code to convince yourself that what you generate is appropriate at the level of Java semantics:
public static void main(String[] args) {
for (byte b : getData(300)) System.out.println(sample(b));
}
static String sample(byte val) {
final int len = (val-Byte.MIN_VALUE)/2;
final StringBuilder b = new StringBuilder();
for (int i = 0; i < len; i++) b.append(i < len-1? ' ' : '#');
return b.toString();
}
It will print a nice vertical sine. Fix your code by producing unsigned bytes with this method:
static byte[] getData(int freq) {
double pha = Math.PI/2;
final int LENGTH = 44100 * 10;
final byte[] arr = new byte[LENGTH];
for(int i = 0; i < arr.length; i++) {
double angle = (2.0 * Math.PI * i*freq+pha) / (44100);
int unsignedSample = (int) (Math.cos(angle)*Byte.MAX_VALUE*0.3 - Byte.MIN_VALUE);
arr[i] = (byte) (unsignedSample & 0xFF);
}
return arr;
}
If you print this, you'll see the same waveform which you saw in SonicVisualizer, but in that tool it will now look the way you intended.
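The other route mentioned above is moving to 16-bit samples, which WAV stores as signed 2's-complement values. A minimal sketch of that variant, assuming 16-bit signed, little-endian, mono output (the method name and constants are illustrative, not from the original post):
static byte[] getData16(int freq) {
    final int SAMPLE_RATE = 44100;
    final int SAMPLES = SAMPLE_RATE * 10;   // 10 seconds of audio
    final double pha = Math.PI / 2;
    byte[] arr = new byte[SAMPLES * 2];     // 2 bytes per 16-bit sample
    for (int i = 0; i < SAMPLES; i++) {
        double angle = (2.0 * Math.PI * i * freq + pha) / SAMPLE_RATE;
        short sample = (short) (Math.cos(angle) * Short.MAX_VALUE * 0.3); // 0.3 is the amplitude scale
        arr[2 * i] = (byte) (sample & 0xff);            // low byte first (little-endian)
        arr[2 * i + 1] = (byte) ((sample >> 8) & 0xff); // high byte
    }
    return arr;
}
When writing this to a WAV file, the AudioFormat would need to declare 16-bit samples and bigEndian=false to match.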
I have a Java application that records audio from a mixer and stores it in a byte array, or saves it to a file.
What I need is to get audio from two mixers simultaneously and save it to an audio file (I am trying with .wav).
The thing is that I can get the two byte arrays, but I don't know how to merge them (by "merge" I don't mean concatenate).
To be specific, it is an application that handles conversations over a USB modem and I need to record them (the streams are the voices of each speaker; I have already managed to record them separately).
Any clue on how to do it?
Here is my code:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
public class FileMixer {
Path path1 = Paths.get("/file1.wav");
Path path2 = Paths.get("/file2.wav");
byte[] out;
public FileMixer() throws IOException {
byte[] byte1 = Files.readAllBytes(path1);
byte[] byte2 = Files.readAllBytes(path2);
out = new byte[byte1.length];
for (int i = 0; i < byte1.length; i++)
out[i] = (byte) ((byte1[i] + byte2[i]) >> 1);
}
}
Thanks in advance
To mix sound waves digitally, you add each corresponding data point from the two files together.
for (int i=0; i<source1.length; i++)
result[i] = (source1[i] + source2[i]) >> 1;
In other words, you take item 0 from byte array 1 and item 0 from byte array 2, add them together, and put the resulting number in item 0 of your result array. Repeat for the remaining values. To prevent clipping, you may need to divide each resulting value by two.
Make sure to merge amplitude data and not just byte data. If your sample size is 8 bits, one byte equals one amplitude value; but if it is 16, you need to combine two bytes into one short and merge those.
Currently you are loading your file like this:
byte[] byte1 = Files.readAllBytes(path1);
This will also load your .wav file header into the byte array, but you only want to merge the actual audio data. Load it like this:
public static ByteBuffer loadFile(File file) throws IOException {
DataInputStream in = new DataInputStream(new FileInputStream(file));
byte[] sound = new byte[in.available() - 44];
in.skipNBytes(44); // skip the 44-byte WAV header
in.readFully(sound); // read all of the remaining audio data
in.close();
return ByteBuffer.wrap(sound);
}
You can then merge every byte of these buffers, or every two bytes, depending on your sample size. I will use 16 bits as it is more common.
public static ByteBuffer mergeAudio(ByteBuffer smaller, ByteBuffer larger) {
// When we merge we have to deal with little-endian vs. big-endian:
// the amplitude data is stored in reverse (little-endian) order in the .wav file,
// so when we extract an amplitude value we need to reverse its bytes to get the actual value.
// We then add up the amplitude values and divide by the number of contributors to get the mean.
// When we store the value we need to reverse it again.
// The result will have the size of the larger audio file. In my case that is file2.
ByteBuffer result = ByteBuffer.allocate(larger.capacity());
while (larger.hasRemaining()) {
// getShort() for 16-bit samples, get() for 8-bit.
// Reverse the short because of little-endian storage.
int sum = Short.reverseBytes(larger.getShort()); // use int so the sum cannot overflow
int matches = 1;
// check if the smaller file still has content that needs to be merged
if (smaller.hasRemaining()) {
sum += Short.reverseBytes(smaller.getShort());
matches++;
}
// append the mean of the merged values, reversed again for little-endian storage
result.putShort(Short.reverseBytes((short) (sum / (float) matches)));
}
return result;
}
We now need to create our own .wav file header and append our merged data to it. Finally, we can write the result to disk.
public static void saveToFile(File file, byte[] audioData) throws IOException {
int audioSize = audioData.length;
int fileSize = audioSize + 44;
// The stream that writes the audio file to the disk
DataOutputStream out = new DataOutputStream(new FileOutputStream(file));
// Write the 44-byte WAV header
out.writeBytes("RIFF"); // bytes 0-3 ChunkID, always "RIFF"
out.writeInt(Integer.reverseBytes(fileSize - 8)); // bytes 4-7 ChunkSize = total file size - 8
out.writeBytes("WAVE"); // bytes 8-11 Format, always "WAVE"
out.writeBytes("fmt "); // bytes 12-15 Subchunk1 ID, always "fmt " with a trailing space
out.writeInt(Integer.reverseBytes(16)); // bytes 16-19 Subchunk1 Size, always 16 for PCM
out.writeShort(Short.reverseBytes(audioFormat)); // bytes 20-21 AudioFormat, 1 for PCM
out.writeShort(Short.reverseBytes(channels)); // bytes 22-23 NumChannels, 1 for mono, 2 for stereo
out.writeInt(Integer.reverseBytes(sampleRate)); // bytes 24-27 SampleRate
out.writeInt(Integer.reverseBytes(byteRate)); // bytes 28-31 ByteRate
out.writeShort(Short.reverseBytes(blockAlign)); // bytes 32-33 BlockAlign
out.writeShort(Short.reverseBytes(sampleSize)); // bytes 34-35 BitsPerSample
out.writeBytes("data"); // bytes 36-39 Subchunk2 ID, always "data"
out.writeInt(Integer.reverseBytes(audioSize)); // bytes 40-43 Subchunk2 Size = audio data length
out.write(audioData); // append the merged audio data
out.close(); // close the stream properly
}
It's important that the two files you want to merge have the same channels, sample size, sample rate, and audio format.
This is how you calculate the header data:
private static short audioFormat = 1;
private static int sampleRate = 44100;
private static short sampleSize = 16;
private static short channels = 2;
private static short blockAlign = (short) (sampleSize * channels / 8);
private static int byteRate = sampleRate * sampleSize * channels / 8;
Here is your working example where I put everything together:
import static java.lang.Math.ceil;
import static java.lang.Math.round;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
public class AudioMerger {
private short audioFormat = 1;
private int sampleRate = 44100;
private short sampleSize = 16;
private short channels = 2;
private short blockAlign = (short) (sampleSize * channels / 8);
private int byteRate = sampleRate * sampleSize * channels / 8;
private ByteBuffer audioBuffer;
private ArrayList<MergeSound> sounds = new ArrayList<MergeSound>();
private ArrayList<Integer> offsets = new ArrayList<Integer>();
public void addSound(double offsetInSeconds, MergeSound sound) {
if (sound.getAudioFormat() != audioFormat)
throw new RuntimeException("Incompatible AudioFormat");
if (sound.getSampleRate() != sampleRate)
throw new RuntimeException("Incompatible SampleRate");
if (sound.getSampleSize() != sampleSize)
throw new RuntimeException("Incompatible SampleSize");
if (sound.getChannels() != channels)
throw new RuntimeException("Incompatible amount of Channels");
int offset = secondsToByte(offsetInSeconds);
offset = offset % 2 == 0 ? offset : offset + 1; // ensure we start on a short boundary when merging
sounds.add(sound);
offsets.add(offset);
}
public void merge(double durationInSeconds) {
audioBuffer = ByteBuffer.allocate(secondsToByte(durationInSeconds));
for (int i = 0; i < sounds.size(); i++) {
ByteBuffer buffer = sounds.get(i).getBuffer();
int offset1 = offsets.get(i);
// iterate over all sound data to append it
while (buffer.hasRemaining()) {
int position = offset1 + buffer.position();// the global position in audioBuffer
// add the audio data to the vars
int sum = Short.reverseBytes(buffer.getShort()); // use int so summing cannot overflow
int matches = 1;
// make sure later entries dont override the previsously merged
// continue only if theres empty audio data
if (audioBuffer.getShort(position) == 0) {
// iterate over the other sounds and check if the need to be merged
for (int j = i + 1; j < sounds.size(); j++) {// set j to i+1 to avoid all previous
ByteBuffer mergeBuffer = sounds.get(j).getBuffer();
int mergeOffset = offsets.get(j);
// check if this soundfile contains data that has to be merged
if (position >= mergeOffset && position < mergeOffset + mergeBuffer.capacity()) {
sum += Short.reverseBytes(mergeBuffer.getShort(position - mergeOffset));
matches++;
}
}
// cast to float before dividing: 3/2 = 1 in integer math, but round(3/2f) = 2
audioBuffer.putShort(position, Short.reverseBytes((short) round(sum / (float) matches)));
}
}
buffer.rewind();// So the sound can be added again
}
}
private int secondsToByte(double seconds) {
return (int) ceil(seconds * byteRate);
}
public void saveToFile(File file) throws IOException {
byte[] audioData = audioBuffer.array();
int audioSize = audioData.length;
int fileSize = audioSize + 44;
// The stream that writes the audio file to the disk
DataOutputStream out = new DataOutputStream(new FileOutputStream(file));
// Write the 44-byte WAV header
out.writeBytes("RIFF"); // bytes 0-3 ChunkID, always "RIFF"
out.writeInt(Integer.reverseBytes(fileSize - 8)); // bytes 4-7 ChunkSize = total file size - 8
out.writeBytes("WAVE"); // bytes 8-11 Format, always "WAVE"
out.writeBytes("fmt "); // bytes 12-15 Subchunk1 ID, always "fmt " with a trailing space
out.writeInt(Integer.reverseBytes(16)); // bytes 16-19 Subchunk1 Size, always 16 for PCM
out.writeShort(Short.reverseBytes(audioFormat)); // bytes 20-21 AudioFormat, 1 for PCM
out.writeShort(Short.reverseBytes(channels)); // bytes 22-23 NumChannels, 1 for mono, 2 for stereo
out.writeInt(Integer.reverseBytes(sampleRate)); // bytes 24-27 SampleRate
out.writeInt(Integer.reverseBytes(byteRate)); // bytes 28-31 ByteRate
out.writeShort(Short.reverseBytes(blockAlign)); // bytes 32-33 BlockAlign
out.writeShort(Short.reverseBytes(sampleSize)); // bytes 34-35 BitsPerSample
out.writeBytes("data"); // bytes 36-39 Subchunk2 ID, always "data"
out.writeInt(Integer.reverseBytes(audioSize)); // bytes 40-43 Subchunk2 Size = audio data length
out.write(audioData); // append the merged audio data
out.close(); // close the stream properly
}
}
I am currently trying to implement some code using Android to detect when a number of specific audio frequency ranges are played through the phone's microphone. I have set up the class using the AudioRecord class:
int channel_config = AudioFormat.CHANNEL_CONFIGURATION_MONO;
int format = AudioFormat.ENCODING_PCM_16BIT;
int sampleSize = 8000;
int bufferSize = AudioRecord.getMinBufferSize(sampleSize, channel_config, format);
AudioRecord audioInput = new AudioRecord(AudioSource.MIC, sampleSize, channel_config, format, bufferSize);
The audio is then read in:
short[] audioBuffer = new short[bufferSize];
audioInput.startRecording();
audioInput.read(audioBuffer, 0, bufferSize);
Performing an FFT is where I become stuck, as I have very little experience in this area. I have been trying to use this class:
FFT in Java and Complex class to go with it
I am then sending the following values:
Complex[] fftTempArray = new Complex[bufferSize];
for (int i=0; i<bufferSize; i++)
{
fftTempArray[i] = new Complex(audio[i], 0);
}
Complex[] fftArray = fft(fftTempArray);
This could easily be me misunderstanding how this class is meant to work, but the values returned jump all over the place and aren't representative of a consistent frequency, even in silence. Is anyone aware of a way to perform this task, or am I overcomplicating matters by trying to grab only a small number of frequency ranges rather than drawing it as a graphical representation?
First you need to ensure that the result you are getting is correctly converted to a float/double. I'm not sure how the short[] version works, but the byte[] version only returns the raw byte version. This byte array then needs to be properly converted to a floating point number. The code for the conversion should look something like this:
double[] micBufferData = new double[<insert-proper-size>];
final int bytesPerSample = 2; // As it is 16bit PCM
final double amplification = 100.0; // choose a number as you like
for (int index = 0, floatIndex = 0; index < bytesRecorded - bytesPerSample + 1; index += bytesPerSample, floatIndex++) {
double sample = 0;
for (int b = 0; b < bytesPerSample; b++) {
int v = bufferData[index + b];
if (b < bytesPerSample - 1 || bytesPerSample == 1) {
v &= 0xFF;
}
sample += v << (b * 8);
}
double sample32 = amplification * (sample / 32768.0);
micBufferData[floatIndex] = sample32;
}
Then you use micBufferData[] to create your input complex array.
Once you get the results, use the magnitudes of the complex numbers; most of them should be close to zero, except at the frequencies that actually carry energy.
You need the sampling rate to convert the array indices to frequencies:
private double ComputeFrequency(int arrayIndex) {
return ((1.0 * sampleRate) / (1.0 * fftOutWindowSize)) * arrayIndex;
}
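Putting those two steps together, here is a small sketch that scans the magnitudes for the strongest bin and converts it to a frequency. It assumes the Complex class you linked exposes an abs() (magnitude) method and that sampleRate matches the rate used for recording; the method name is illustrative:
static double dominantFrequency(Complex[] fftArray, int sampleRate) {
    int fftOutWindowSize = fftArray.length;
    int peakIndex = 1;        // skip bin 0, which holds the DC offset
    double peakMagnitude = 0;
    // only the first half of the bins is non-redundant for real-valued input
    for (int i = 1; i < fftOutWindowSize / 2; i++) {
        double magnitude = fftArray[i].abs();
        if (magnitude > peakMagnitude) {
            peakMagnitude = magnitude;
            peakIndex = i;
        }
    }
    // convert the bin index to Hz, the same mapping as ComputeFrequency above
    return ((double) sampleRate / fftOutWindowSize) * peakIndex;
}
To detect your specific ranges, you could instead sum the magnitudes of the bins whose computed frequencies fall inside each range and compare that sum against a threshold.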