Wav comparison, same file - java

I'm currently stumped. I've been looking around and experimenting with audio comparison. I've found quite a bit of material, and a ton of references to different libraries and methods for doing it.
So far I've used Audacity to export a 3-minute WAV file called "long.wav" and then split the first 30 seconds of that into a file called "short.wav". I figured that somewhere along the line I could log the data for each file (to log.txt) through Java and should see at least some visual similarities among the values. Here's some code.
Main method:
int totalFramesRead = 0;
File fileIn = new File(filePath);
BufferedWriter writer = new BufferedWriter(new FileWriter(outPath));
writer.flush();
writer.write("");
try {
AudioInputStream audioInputStream =
AudioSystem.getAudioInputStream(fileIn);
int bytesPerFrame =
audioInputStream.getFormat().getFrameSize();
if (bytesPerFrame == AudioSystem.NOT_SPECIFIED) {
// some audio formats may have unspecified frame size
// in that case we may read any amount of bytes
bytesPerFrame = 1;
}
// Set an arbitrary buffer size of 1024 frames.
int numBytes = 1024 * bytesPerFrame;
byte[] audioBytes = new byte[numBytes];
try {
int numBytesRead = 0;
int numFramesRead = 0;
// Try to read numBytes bytes from the file.
while ((numBytesRead =
audioInputStream.read(audioBytes)) != -1) {
// Calculate the number of frames actually read.
numFramesRead = numBytesRead / bytesPerFrame;
totalFramesRead += numFramesRead;
// Here, do something useful with the audio data that's
// now in the audioBytes array...
if(totalFramesRead <= 4096 * 100)
{
Complex[][] results = PerformFFT(audioBytes);
int[][] lines = GetKeyPoints(results);
DumpToFile(lines, writer);
}
}
} catch (Exception ex) {
// Handle the error...
}
audioInputStream.close();
} catch (Exception e) {
// Handle the error...
}
writer.close();
Then PerformFFT:
public static Complex[][] PerformFFT(byte[] data) throws IOException
{
final int totalSize = data.length;
int amountPossible = totalSize/Harvester.CHUNK_SIZE;
//When turning into frequency domain we'll need complex numbers:
Complex[][] results = new Complex[amountPossible][];
//For all the chunks:
for(int times = 0;times < amountPossible; times++) {
Complex[] complex = new Complex[Harvester.CHUNK_SIZE];
for(int i = 0;i < Harvester.CHUNK_SIZE;i++) {
//Put the time domain data into a complex number with imaginary part as 0:
complex[i] = new Complex(data[(times*Harvester.CHUNK_SIZE)+i], 0);
}
//Perform FFT analysis on the chunk:
results[times] = FFT.fft(complex);
}
return results;
}
At this point I've tried logging everywhere: audioBytes before the transforms, the Complex values, and the FFT results.
The problem: no matter which values I log, the log.txt of each WAV file is completely different. I don't understand it. Given that I cut short.wav out of long.wav (and they have all the same properties), there should be a very heavy similarity in either the raw byte[] data, or the Complex[][] FFT data, or something else by this point.
How can I possibly compare these files if the data isn't even close to similar at any point in these calculations?
I know I'm missing quite a bit of knowledge about audio analysis, and this is why I've come to the board for help! Thanks for any info, help, or fixes you can offer!

Have you looked at MARF? It is a well-documented Java library used for audio recognition.
It is used to recognize speakers (for transcription or securing software) but the same features should be able to be used to classify audio samples. I'm not familiar with it but it looks like you'd want to use the FeatureExtraction class to extract an array of features from each audio sample and then create a unique id.

For 16-bit audio, 3e-05 isn't really that different from zero. So a file of zeros is pretty much the same as a file of zeros (maybe missing equality by some tiny rounding errors.)
ADDED:
For your comparison, read in and plot, using some Java plotting library, a portion of each of the two waveforms when they get past the portion that's mostly (close to) zero.
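Before plotting or comparing anything, the raw bytes have to be turned into sample values. The following is a minimal sketch, assuming the files are 16-bit signed little-endian PCM (check `audioInputStream.getFormat()` to confirm; other formats need different decoding). The class name `PcmSamples`, both method names, and the `threshold` parameter are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmSamples {
    /**
     * Decodes 16-bit signed little-endian PCM bytes into one short per sample.
     * Assumption: mono or interleaved data; a trailing odd byte is ignored.
     */
    static short[] decode16BitLittleEndian(byte[] audioBytes, int numBytesRead) {
        int sampleCount = numBytesRead / 2;
        short[] samples = new short[sampleCount];
        ByteBuffer.wrap(audioBytes, 0, sampleCount * 2)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asShortBuffer()
                  .get(samples);
        return samples;
    }

    /** Returns the index of the first sample whose magnitude exceeds threshold, or -1 if none does. */
    static int firstAudibleIndex(short[] samples, int threshold) {
        for (int i = 0; i < samples.length; i++) {
            if (Math.abs((int) samples[i]) > threshold) {
                return i;
            }
        }
        return -1;
    }
}
```

Logging or plotting the decoded samples from `firstAudibleIndex` onward, rather than individual bytes, should make the two files much easier to compare by eye.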

For debugging, I think you'd be better off plotting the data in MATLAB, which is much more powerful for this kind of problem.
Use "wavread" to read the file and "stft" to get the short-time Fourier transform, which is a matrix of complex numbers. Then take abs(Matrix) to get the magnitude of each complex number, and show the image with imshow(abs(Matrix),[]).
I don't know how you are comparing the whole file with the 30 s clip (by looking at the STFT image?).

I don't know how you are comparing the two audio files, but consider services that offer music recognition (like TrackId or MotoID). They take a small sample of the music you're hearing (10-20 seconds) and process it on their servers. My theory is that they store samples of that length or shorter, along with a database of patterns for those samples (or they calculate the patterns on the fly); in your case, the patterns would be Fourier transforms. You may need to break your long audio file into chunks that are either the same size as your sample data or smaller. In the first case, you may find a specific chunk that most closely resembles the pattern in your sample data. In the second case, your smaller chunks may each resemble a part of your sample data, and you can calculate the probability that the sample data belongs to a given audio file.
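As a much simpler stand-in for the chunk matching described above, here is a sketch that slides the short clip along the long recording and reports the offset where the samples differ least. It works directly on decoded time-domain samples, not on Fourier patterns, so it only suits clips cut losslessly from the same file. The class and method names are hypothetical, and the search is brute force, so it is slow for long recordings.

```java
final class ClipLocator {
    /**
     * Returns the offset in longSamples at which shortSamples fits best,
     * measured by the smallest sum of absolute differences, or -1 if the
     * clip is longer than the recording.
     */
    static int bestOffset(short[] longSamples, short[] shortSamples) {
        int best = -1;
        long bestCost = Long.MAX_VALUE;
        for (int off = 0; off + shortSamples.length <= longSamples.length; off++) {
            long cost = 0;
            // Stop summing early once this offset can no longer beat the best one.
            for (int i = 0; i < shortSamples.length && cost < bestCost; i++) {
                cost += Math.abs(longSamples[off + i] - shortSamples[i]);
            }
            if (cost < bestCost) {
                bestCost = cost;
                best = off;
            }
        }
        return best;
    }
}
```

For a clip taken from the start of the recording, the best offset should come out as 0 with a cost of zero; anything else suggests the two files are not being decoded the same way.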

I think you are looking at Acoustic Fingerprinting
It's hard, and there are libraries to do it.
If you want to implement it yourself, this is a whitepaper on the shazam algorithm.

Related

Can I use java.nio for Console Input?

Consider the scenario of competitive programming, where I have to read 2*10^5 (or even more) numbers from the console. For that I use BufferedReader or, for even faster performance, a custom reader class that uses DataInputStream under the hood.
A quick Internet search gave me this.
We can use java.io for streaming smaller amounts of data, and for large streams we can use java.nio.
So I want to try java.nio console input and test it against java.io performance.
Is it possible to read console input using java.nio?
Can I read data from System.in using java.nio?
Will it be faster than the input methods that I currently have?
Any relevant information will be appreciated.
Thanks ✌️
You can open a channel to stdin like
FileInputStream stdin = new FileInputStream(FileDescriptor.in);
FileChannel stdinChannel = stdin.getChannel();
When stdin has been redirected to a file, operations like querying the size, performing fast transfers to other channels and even memory mapping may work. But when the input is a real console or a pipe or you are reading character data, the performance is unlikely to differ significantly.
The performance depends on the way you read it, not the class you are using.
An example of code directly operating on a channel, to process white-space separated decimal numbers, is
CharsetDecoder cs = Charset.defaultCharset().newDecoder();
ByteBuffer bb = ByteBuffer.allocate(1024);
CharBuffer cb = CharBuffer.allocate(1024);
while(stdinChannel.read(bb) >= 0) {
bb.flip();
cs.decode(bb, cb, false);
bb.compact();
cb.flip();
extractDoubles(cb);
cb.compact();
}
bb.flip();
cs.decode(bb, cb, true);
if(cb.position() > 0) {
cb.flip();
extractDoubles(cb);
}
private static void extractDoubles(CharBuffer cb) {
doubles: for(int p = cb.position(); p < cb.limit(); ) {
while(p < cb.limit() && Character.isWhitespace(cb.get(p))) p++;
cb.position(p);
if(cb.hasRemaining()) {
for(; p < cb.limit(); p++) {
if(Character.isWhitespace(cb.get(p))) {
int oldLimit = cb.limit();
double d = Double.parseDouble(cb.limit(p).toString());
cb.limit(oldLimit);
processDouble(d);
continue doubles;
}
}
}
}
}
This is more complicated than using java.util.Scanner or a BufferedReader’s readLine() followed by split("\\s"), but has the advantage of avoiding the complexity of the regex engine, as well as not creating String objects for the lines. When there is more than one number per line, or there are empty lines, i.e. the line strings would not match the number strings, this can save the copying overhead intrinsic to string construction.
This code is still handling arbitrary charsets. When you know the expected charset and it is ASCII based, using a lightweight transformation instead of the CharsetDecoder, like shown in this answer, can gain an additional performance increase.
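To illustrate what such a lightweight transformation might look like, here is a minimal sketch that parses whitespace-separated decimal integers straight from the bytes of an InputStream, with no decoder and no String objects. It assumes well-formed ASCII input and handles ints rather than doubles to stay short. The class name `AsciiIntReader` and the 64 KiB buffer size are arbitrary choices, not taken from the linked answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.NoSuchElementException;

final class AsciiIntReader {
    private final InputStream in;
    private final byte[] buf = new byte[1 << 16];
    private int len = 0;
    private int pos = 0;

    AsciiIntReader(InputStream in) {
        this.in = in;
    }

    /** Returns the next byte as 0..255, refilling the buffer when it is empty, or -1 at end of input. */
    private int readByte() {
        if (pos == len) {
            try {
                len = in.read(buf, 0, buf.length);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            pos = 0;
            if (len <= 0) {
                len = 0;
                return -1;
            }
        }
        return buf[pos++] & 0xFF;
    }

    /** Parses the next decimal int; assumes ASCII digits with an optional leading '-'. */
    int nextInt() {
        int b = readByte();
        while (b != -1 && b <= ' ') {
            b = readByte(); // skip whitespace and control bytes
        }
        if (b == -1) {
            throw new NoSuchElementException("end of input");
        }
        boolean negative = (b == '-');
        if (negative) {
            b = readByte();
        }
        int value = 0;
        while (b >= '0' && b <= '9') {
            value = value * 10 + (b - '0');
            b = readByte();
        }
        return negative ? -value : value;
    }
}
```

Usage would be along the lines of `new AsciiIntReader(System.in).nextInt()`. Whether this beats a BufferedReader in practice depends on the input and the JVM, so it is worth measuring.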

SourceDataLine buffersize resulting in clicks and stops in playback

I am trying to play back audio I create in realtime in my application with a SourceDataLine. When opening the SourceDataLine with sdl.open(format), it creates a default buffer of 32000 (my sample rate), so effectively one second. But since my application should be low-latency, I have tried to use a smaller buffer. (sdl.open(format, buffer);)
When generating sound, I use a buffer of 512 samples at the moment (I haven't figured out the best value there; if you have any insight, I would appreciate it).
Some pseudocode for my algorithm:
int pos = 0;
int max = 512;
byte sampleBuffer[] = new byte[max];
while(active) {
sampleBuffer[pos++] = generateSample(); // actually I generate doubles and make bytes out of em later, but who cares
if (pos == max) {
sdl.write(sampleBuffer, 0, pos);
pos = 0;
}
}
When I try to use my own buffer size (I tried everything from max to max * 2 // * 4; * 8; * 16), I get a lot of clicks and noise.
If you guys have any insight on the right way to go here, I would really appreciate it. I don't know how much bigger my SourceDataLine buffer should be than the chunks that I write to the Line, if at all. Are there any tricks to getting this smooth? I am quite certain that my program generates the audio fast enough, so that should not be the problem.

How Buffer Streams works internally in Java

I'm reading about buffered streams. I searched and found many answers that cleared up the concepts, but I still have a few questions.
After searching, I have come to understand that a buffer is temporary memory (RAM) that lets a program read data quickly instead of going to the hard disk each time, and that when the buffer is empty the native input API is called.
After reading a little more, I found this answer:
Reading data from disk byte-by-byte is very inefficient. One way to
speed it up is to use a buffer: instead of reading one byte at a time,
you read a few thousand bytes at once, and put them in a buffer, in
memory. Then you can look at the bytes in the buffer one by one.
I have two points of confusion:
1: How is the buffer filled, and by whom (the native API? how?)? As in the quote above, who fills a few thousand bytes at once? Surely that takes the same amount of time. Suppose I have 5 MB of data, and it takes 5 seconds to load the 5 MB into the buffer, and then the program takes 5 seconds to use the data from the buffer: 10 seconds in total. But if I skip buffering, the program gets the data directly from the hard disk at 1 MB per 2 seconds, which is also 10 seconds in total. Please clear up this confusion.
2: The second one is how this line works:
BufferedReader inputStream = new BufferedReader(new FileReader("xanadu.txt"));
As I'm thinking FileReader write data to buffer, then BufferedReader read data from buffer memory? Also explain this.
Thanks.
As for the performance of using buffering during read/write, it's probably minimal in impact since the OS will cache too, however buffering will reduce the number of calls to the OS, which will have an impact.
When you add other operations on top, such as character encoding/decoding or compression/decompression, the impact is greater as those operations are more efficient when done in blocks.
Your second question said:
As I'm thinking FileReader write data to buffer, then BufferedReader read data from buffer memory? Also explain this.
I believe your thinking is wrong. Yes, technically the FileReader will write data to a buffer, but the buffer is not defined by the FileReader, it's defined by the caller of the FileReader.read(buffer) method.
The operation is initiated from outside, when some code calls BufferedReader.read() (any of the overloads). BufferedReader will then check its buffer, and if enough data is available in the buffer, it will return the data without involving the FileReader. If more data is needed, the BufferedReader will call the FileReader.read(buffer) method to get the next chunk of data.
It's a pull operation, not a push, meaning the data is pulled out of the readers by the caller.
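The pull behaviour can be observed with a small experiment. The sketch below wraps a source in a hypothetical `CountingReader` that counts how often it is asked for data, then reads the same text one char at a time with and without a BufferedReader in front. A StringReader stands in for the FileReader so the example needs no file, and the 1024-char buffer size is an arbitrary choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class CountingReader extends Reader {
    private final Reader source;
    int readCalls = 0; // how many times this reader was asked for data

    CountingReader(Reader source) {
        this.source = source;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        readCalls++;
        return source.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
        source.close();
    }

    /** Reads text one char at a time and returns how many read calls reached the source. */
    static int countSourceReads(String text, boolean buffered) {
        CountingReader counter = new CountingReader(new StringReader(text));
        try (Reader r = buffered ? new BufferedReader(counter, 1024) : counter) {
            while (r.read() != -1) {
                // consume one char at a time, as a naive caller would
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.readCalls;
    }
}
```

For a 10,000-char text, the unbuffered run asks the source 10,001 times (once per char plus the end-of-stream check), while the buffered run should need only around eleven calls, because each call pulls up to 1024 chars into the buffer. With a real file behind the source, each of those calls can end up in the operating system, which is where the saving comes from.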
All of this is done by a private method named fill(), which I include here for educational purposes; any Java IDE will let you view the source code yourself:
private void fill() throws IOException {
int dst;
if (markedChar <= UNMARKED) {
/* No mark */
dst = 0;
} else {
/* Marked */
int delta = nextChar - markedChar;
if (delta >= readAheadLimit) {
/* Gone past read-ahead limit: Invalidate mark */
markedChar = INVALIDATED;
readAheadLimit = 0;
dst = 0;
} else {
if (readAheadLimit <= cb.length) {
/* Shuffle in the current buffer */
// here copy the read chars in a memory buffer named cb
System.arraycopy(cb, markedChar, cb, 0, delta);
markedChar = 0;
dst = delta;
} else {
/* Reallocate buffer to accommodate read-ahead limit */
char ncb[] = new char[readAheadLimit];
System.arraycopy(cb, markedChar, ncb, 0, delta);
cb = ncb;
markedChar = 0;
dst = delta;
}
nextChar = nChars = delta;
}
}
int n;
do {
n = in.read(cb, dst, cb.length - dst);
} while (n == 0);
if (n > 0) {
nChars = dst + n;
nextChar = dst;
}
}

convert a wav file to a txt file

thanks for reading this
What I'm trying to do is take a .wav file (only a short audio clip) and convert it to ints, each one representing a tone of the audio.
In case you're wondering why: I'm doing an Arduino project and I want the Arduino to play a song, and for that I need an int array where every int is a tone.
So I thought, "if I write a little application that converts any .wav file to a txt file storing the ints that represent the melody notes, I just need to copy those values into the Arduino project code."
So after all this, maybe you're asking, "What is your problem?"
I wrote the code and it is "working"; the only problem is that the txt file only has "1024" on each line...
So obviously I have a problem: not all the tones are 1024 -_-
package WaveToText;
import java.io.*;
/**
*
* @author Luis Miguel Mejía Suárez
* @project This project is to convert a wav music file to an int array
* which is going to be printed in a txt file to be used for an arduino
* @serial 1.0.1 (05/11/201)
*/
public final class Converter
{
/**
*
* @Class Here is where all the code for the application is going to be located
*
* @Param Text is a .txt file where the ints are going to be stored
* @Param MyFile is the input of the wav file to be converted
*/
PrintStream Text;
InputStream MyFile;
public Converter () throws FileNotFoundException, IOException
{
MyFile = new FileInputStream("C:\\Users\\luismiguel\\Dropbox\\ESTUDIO\\PROGRAMAS\\JAVA\\WavToText\\src\\WaveToText\\prueba.wav");
Text = new PrintStream(new File("Notes.txt"));
}
public void ConvertToTxt() throws IOException
{
BufferedInputStream in = new BufferedInputStream(MyFile);
int read;
byte[] buff = new byte[1024];
while ((read = in.read(buff)) > 0)
{
Text.println(read);
}
Text.close();
}
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws IOException{
// TODO code application logic here
Converter Exc = new Converter();
Exc.ConvertToTxt();
}
}
Wait wait wait..... a lot of things aren't right here....
You can't just read the bytes and send them to the Arduino because, as you say, the Arduino expects note numbers. The numbers in a WAV file are first the "header" with audio info, and then the numbers representing discrete points in the signal (the waveform). If you want to get notes, you need an algorithm for pitch detection or music transcription.
Pitch detection could work if your music is monophonic or close to monophonic. For full-band songs it would be troublesome. So I guess the "Arduino part" will play monophonic music, and you need to extract the fundamental frequency of the signal at each moment in time. (This is called pitch detection, and there are different ways to do it: autocorrelation, AMDF, spectral analysis.) You must also keep the timing of the notes.
Once you have extracted the frequencies, there is a formula to convert a frequency into an integer representing a note number on a piano: n = 12 * log2(f / 440) + 49, where n is the integer note number and f is the fundamental frequency of the note in Hz. Before calculating, you should also quantize the frequencies you get from the pitch recognition algorithm to the closest note frequency (google for the exact note frequencies).
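The note-number formula translates to Java roughly as follows. This is a small sketch: the class and method names are made up, and rounding to the nearest integer stands in for the quantization step. It assumes an 88-key piano numbering with A4 = 440 Hz as key 49.

```java
final class PianoKeys {
    /** Piano key number (A4 = 440 Hz = key 49) nearest to the given frequency in Hz. */
    static int keyNumber(double frequencyHz) {
        double log2 = Math.log(frequencyHz / 440.0) / Math.log(2.0);
        return (int) Math.round(12.0 * log2 + 49.0);
    }

    /** Equal-tempered frequency in Hz of a piano key number (inverse of the formula). */
    static double keyFrequency(int key) {
        return 440.0 * Math.pow(2.0, (key - 49) / 12.0);
    }
}
```

For example, `keyNumber(440.0)` gives 49 and `keyNumber(261.63)` (middle C) gives 40. `keyFrequency` can generate the table of exact note frequencies instead of looking it up.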
However, I really suggest doing some more research. It would be really difficult to detect notes in music where you have several instruments playing, drums, a singer, all together...
while ((read = in.read(buff)) > 0)
{
Text.println(read);
}
This bit of code reads up to 1024 bytes of data from in into buff and assigns the number of bytes actually read to read, which is 1024 on every pass until the last chunk of the file. You then print read to your text file.
You probably wanted to print buff to your text file, but that is going to write 1024 bytes, rather than the 1024 ints you want.
You will need to create a for loop to print the individual bytes as ints.
while ((read = in.read(buff)) > 0)
{
    // Only the first 'read' bytes of buff are valid on this pass.
    for (int i = 0; i < read; i++)
        Text.println(buff[i]); // the byte is widened to int; one value per line
}

Java algorithm for normalizing audio

I'm trying to normalize an audio file of speech.
Specifically, where an audio file contains peaks in volume, I'm trying to level it out, so the quiet sections are louder, and the peaks are quieter.
I know very little about audio manipulation, beyond what I've learnt from working on this task. Also, my math is embarrassingly weak.
I've done some research, and the Xuggle site provides a sample which shows reducing the volume using the following code: (full version here)
@Override
public void onAudioSamples(IAudioSamplesEvent event)
{
// get the raw audio bytes and adjust their value
ShortBuffer buffer = event.getAudioSamples().getByteBuffer().asShortBuffer();
for (int i = 0; i < buffer.limit(); ++i)
buffer.put(i, (short)(buffer.get(i) * mVolume));
super.onAudioSamples(event);
}
Here, they modify the bytes in getAudioSamples() by a constant of mVolume.
Building on this approach, I've attempted a normalisation that modifies the bytes in getAudioSamples() to a normalised value, considering the max/min in the file. (See below for details.) I have a simple filter to leave "silence" alone (i.e., anything below a value).
I'm finding that the output file is very noisy (i.e., the quality is seriously degraded). I assume that the error is either in my normalisation algorithm, or in the way I manipulate the bytes. However, I'm unsure of where to go next.
Here's an abridged version of what I'm currently doing.
Step 1: Find peaks in file:
Reads the full audio file, and finds the highest and lowest values of buffer.get() for all AudioSamples
@Override
public void onAudioSamples(IAudioSamplesEvent event) {
IAudioSamples audioSamples = event.getAudioSamples();
ShortBuffer buffer =
audioSamples.getByteBuffer().asShortBuffer();
short min = Short.MAX_VALUE;
short max = Short.MIN_VALUE;
for (int i = 0; i < buffer.limit(); ++i) {
short value = buffer.get(i);
min = (short) Math.min(min, value);
max = (short) Math.max(max, value);
}
// assignment of min/max omitted for brevity.
super.onAudioSamples(event);
}
Step 2: Normalize all values:
In a loop similar to step1, replace the buffer with normalized values, calling:
buffer.put(i, normalize(buffer.get(i)));
public short normalize(short value) {
if (isBackgroundNoise(value))
return value;
short rawMin = // min from step1
short rawMax = // max from step1
short targetRangeMin = 1000;
short targetRangeMax = 8000;
int abs = Math.abs(value);
double a = (abs - rawMin) * (targetRangeMax - targetRangeMin);
double b = (rawMax - rawMin);
double result = targetRangeMin + ( a/b );
// Copy the sign of value to result.
result = Math.copySign(result,value);
return (short) result;
}
Questions:
Is this a valid approach for attempting to normalize an audio file?
Is my math in normalize() valid?
Why would this cause the file to become noisy, where a similar approach in the demo code doesn't?
I don't think the concept of "minimum sample value" is very meaningful, since the sample value just represents the current "height" of the sound wave at a certain time instant. I.e. its absolute value will vary between the peak value of the audio clip and zero. Thus, having a targetRangeMin seems to be wrong and will probably cause some distortion of the waveform.
I think a better approach might be to have some sort of weight function that decreases the sample value based on its size, i.e. bigger values are decreased by a larger percentage than smaller values. This would also introduce some distortion, but probably not very noticeable.
Edit: here is a sample implementation of such a method:
public short normalize(short value) {
short rawMax = // max from step1
short targetMax = 8000;
//This is the maximum volume reduction
double maxReduce = 1 - targetMax/(double)rawMax;
int abs = Math.abs(value);
double factor = (maxReduce * abs/(double)rawMax);
return (short) Math.round((1 - factor) * value);
}
For reference, this is what your algorithm did to a sine curve with an amplitude of 10000:
This explains why the audio quality becomes much worse after being normalized.
This is the result after running with my suggested normalize method:
"normalization" of audio is the process of increasing the level of the audio such that the maximum is equal to some given value, usually the maximum possible value. Today, in another question, someone explained how to do this (see #1): audio volume normalization
However, you go on to say "Specifically, where an audio file contains peaks in volume, I'm trying to level it out, so the quiet sections are louder, and the peaks are quieter." This is called "compression" or "limiting" (not to be confused with the type of compression such as that used in encoding MP3s!). You can read more about that here: http://en.wikipedia.org/wiki/Dynamic_range_compression
A simple compressor is not particularly hard to implement, but you say your math "is embarrassingly weak." So you might want to find one that's already built. You might be able to find a compressor implemented in http://sox.sourceforge.net/ and convert that from C to Java. The only Java implementation of a compressor I know of whose source is available (and it's not very good) is in this book.
As an alternative to solve your problem, you might be able to normalize your file in segments of say 1/2 a second each, and then connect the gain values you use for each segment using linear interpolation. You can read about linear interpolation for audio here: http://blog.bjornroche.com/2010/10/linear-interpolation-for-audio-in-c-c.html
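The segment-based alternative could be sketched like this. It is only an illustration of the idea, not a tuned leveler: the names `SegmentLeveler`, `segmentLength`, `targetPeak` and `maxGain` are hypothetical, gains are interpolated between segment centers, and silent segments are simply given `maxGain`, which is an arbitrary choice.

```java
final class SegmentLeveler {
    /**
     * Computes one gain per segment (targetPeak / segment peak, capped at maxGain)
     * and applies it with linear interpolation between segment centers.
     * Assumes segmentLength > 0 and 16-bit samples.
     */
    static short[] level(short[] samples, int segmentLength, int targetPeak, double maxGain) {
        int segments = (samples.length + segmentLength - 1) / segmentLength;
        double[] gains = new double[segments];
        for (int s = 0; s < segments; s++) {
            int start = s * segmentLength;
            int end = Math.min(samples.length, start + segmentLength);
            int peak = 0;
            for (int i = start; i < end; i++) {
                peak = Math.max(peak, Math.abs((int) samples[i]));
            }
            gains[s] = (peak == 0) ? maxGain : Math.min(maxGain, targetPeak / (double) peak);
        }
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            // Position of sample i in units of segments, relative to the first segment's center.
            double pos = (i - segmentLength / 2.0) / segmentLength;
            int left = (int) Math.floor(pos);
            double frac = pos - left;
            double gLeft = gains[Math.max(0, Math.min(segments - 1, left))];
            double gRight = gains[Math.max(0, Math.min(segments - 1, left + 1))];
            double gain = gLeft + frac * (gRight - gLeft);
            long scaled = Math.round(samples[i] * gain);
            // Clamp to the 16-bit range so an overshoot cannot wrap around.
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
        }
        return out;
    }
}
```

At 32000 samples per second, half-second segments would mean a `segmentLength` of 16000. With a single segment spanning the whole file, this reduces to plain peak normalization with one gain. Note that near a boundary between a quiet and a loud segment the interpolated gain can push loud samples above `targetPeak`; a real compressor avoids this with attack and release smoothing.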
I don't know if the source code is available for the levelator, but that's something else you can try.
