How to track sections without sounds in a wav file?
The small program I want to develop splits a wav file, treating a section with no volume as a dividing point.
How can a program tell that the volume of a wav file is low?
I'll use Java or MFC.
I've had success with silence detection by calculating RMS of the signal. This is done in the following manner (assuming you have an array of audio samples):
long sumOfSquares = 0;
for (int i = startindex; i <= endindex; i++) {
    // widen to long before multiplying so the square cannot overflow
    sumOfSquares += (long) samples[i] * samples[i];
}
int numberOfSamples = endindex - startindex + 1;
// Math.sqrt returns a double; dividing as double avoids integer truncation
double rms = Math.sqrt((double) sumOfSquares / numberOfSamples);
If rms is below a certain threshold, you can consider the region silent.
Well, a wave file is basically a list of values that represent a sound wave sampled at some fixed rate (usually 44100 Hz). Silence is basically when the values are near 0. Just set some threshold value and look for continuous regions (let's say 100 ms long) where the absolute value stays below that threshold.
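The threshold-plus-minimum-length idea can be sketched roughly as follows. This is only an illustration: the class name SilenceFinder, the method name, and its parameters are made up for the example, and it assumes the samples are already decoded into a short[] for a single channel.

```java
import java.util.ArrayList;
import java.util.List;

final class SilenceFinder {
    /**
     * Returns {start, end} sample-index pairs (end exclusive) for every run of at
     * least minSilenceMs milliseconds whose absolute sample values stay below threshold.
     */
    static List<int[]> findSilentRegions(short[] samples, int sampleRate,
                                         int threshold, int minSilenceMs) {
        int minRun = (int) ((long) sampleRate * minSilenceMs / 1000);
        List<int[]> regions = new ArrayList<>();
        int runStart = -1; // -1 means "not currently inside a quiet run"
        // Iterate one step past the end so a quiet run at the tail is closed too.
        for (int i = 0; i <= samples.length; i++) {
            boolean quiet = i < samples.length && Math.abs(samples[i]) < threshold;
            if (quiet) {
                if (runStart < 0) runStart = i;
            } else if (runStart >= 0) {
                if (i - runStart >= minRun) regions.add(new int[] {runStart, i});
                runStart = -1;
            }
        }
        return regions;
    }
}
```

The midpoint of each returned region would be a reasonable place to cut the file; the threshold itself still has to be tuned to the recording.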
Simple silence detection is done by sequentially comparing sound chunks with some value (which is chosen depending on record quality).
Something like:
abs(track[position]) < 0.1
or
(track[position] < 0.1) && (track[position] > -0.1)
if we assume that each sample is a float in [-1, 1].
It works better if the sound is normalized first.
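A minimal sketch of what normalizing could mean here, assuming simple peak normalization (scaling so the loudest sample has magnitude 1.0). The class and method names are invented for the example.

```java
final class PeakNormalizer {
    /** Scales the samples in place so the largest absolute value becomes 1.0. */
    static void normalize(float[] track) {
        float peak = 0f;
        for (float v : track) {
            peak = Math.max(peak, Math.abs(v));
        }
        if (peak == 0f) {
            return; // all-zero input: nothing to scale
        }
        for (int i = 0; i < track.length; i++) {
            track[i] /= peak;
        }
    }
}
```

After this step a fixed threshold such as 0.1 means the same thing for a quiet recording as for a loud one.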
I am reading wave files in my Java program. The right channel holds half of the samples, which comes to 445440 samples (amplitude values as doubles). Everything works fine except for some significant differences from the values I read in Matlab. What's bugging me is that most of the values are identical (in my program and in Matlab), but when I average all the elements, the results are far apart:
in Matlab I got 1.4581E-05, and in my program -44567.3253.
So I started checking values until I found a different one at the 166th element!
Matlab has -6.10351562500000e-05 and I have 2.0! (The values before and after this one are identical.)
This is quite frustrating, as only a few of the first 300 elements differ! As you can imagine, I cannot go through all 445440 elements by hand to understand the pattern.
I don't even know where to start looking for the issue. So taking a chance by asking all the brilliant minds out there. Here's my code if it helps:
public double[] getAmplitudes(Boolean asArrayOfDouble){
    // bytesInASample is 2 (16-bit little endian)
    int numOfSamples = data.length / bytesInASample;
    double[] amplitudes = new double[numOfSamples];
    int pointer = 0;
    for (int i = 0; i < numOfSamples; i++) {
        double ampValue = 0;
        for (int byteNumber = 0; byteNumber < bytesInASample; byteNumber++) {
            ampValue += (double) ((data[pointer++] & 0xFF) << (byteNumber * 8)) / 32767.0;
        }
        amplitudes[i] = ampValue;
    }
    return amplitudes;
}
After this, I am simply reading the right channel data by using the following code:
double[] rightChannelData = new double[data.length/2];
for(int i = 0; i < data.length/2; i++)
{
rightChannelData [i] = data[2*i+1];
}
I know this might be a hard question to answer without seeing the actual program and its output in contrast to the Matlab output, so do let me know if any additional information is needed.
You are masking all bytes with the term data[pointer ++] & 0xFF, which makes every value unsigned. For samples consisting of two bytes you are creating int values between 0 and 65535 which, after dividing by 32767.0, yield values between 0.0 and about 2.0, whereas Matlab, using a signed interpretation, produces values in the range -1.0 to 1.0.
To illustrate this:
The short value 0xFFFE, interpreted as a signed value, is -2, and the division -2/32768.0 produces -6.10351562500000e-05, while interpreted as unsigned it is 65534, and 65534/32767.0 produces 2.0.
(Note that the negative value was divided by the absolute value of Short.MIN_VALUE rather than Short.MAX_VALUE…)
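The same illustration as a small, self-contained sketch. The class and method names are made up for the example; the two bytes are the little-endian encoding of 0xFFFE.

```java
final class SampleDecoding {
    /** Decodes two little-endian bytes as a signed 16-bit sample. */
    static int signed16(byte lo, byte hi) {
        // The cast to short restores the sign bit before widening back to int.
        return (short) (((hi & 0xFF) << 8) | (lo & 0xFF));
    }

    /** Decodes the same two bytes as an unsigned value, which is what the masked loop builds. */
    static int unsigned16(byte lo, byte hi) {
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }
}
```

For the bytes 0xFE, 0xFF the signed reading is -2 and the unsigned reading is 65534, which reproduces the two values seen at the 166th element.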
It’s not clear how you could calculate an average of -44567.3253 from that. Even for your unsigned values (between 0.0 and 2.0) that is way off.
After all, you are better off not doing everything manually:
ShortBuffer buf=ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN)
.asShortBuffer();
int numOfSamples = buf.remaining();
double[] amplitudes = new double[numOfSamples];
for (int i = 0; i < numOfSamples; i++) {
amplitudes[i] = buf.get() * (1.0/32768.0);
}
return amplitudes;
Since I don’t know how Matlab does the normalization, I cannot guarantee that the values are the same. It’s possible that the loop body has to look like this instead:
final short s = buf.get();
amplitudes[i] = s * (s<0? (1.0/32768.0): (1.0/32767));
I recall reading about a method for efficiently using random bits in an article on a math-oriented website, but I can't seem to get the right keywords in Google to find it anymore, and it's not in my browser history.
The gist of the problem that was being asked was to take a sequence of random numbers in the domain [domainStart, domainEnd) and efficiently use the bits of the random number sequence to project uniformly into the range [rangeStart, rangeEnd). Both the domain and the range are integers (more correctly, longs and not Z). What's an algorithm to do this?
Implementation-wise, I have a function with this signature:
long doRead(InputStream in, long rangeStart, long rangeEnd);
in is based on a CSPRNG (fed by a hardware RNG, conditioned through SecureRandom) that I am required to use; the return value must be between rangeStart and rangeEnd, but the obvious implementation of this is wasteful:
long doRead(InputStream in, long rangeStart, long rangeEnd) throws IOException {
    long retVal = 0;
    long range = rangeEnd - rangeStart;
    // Fill until we get to range
    for (int i = 0; (1L << (8 * i)) < range; i++) {
        long b; // renamed from "in" so it no longer shadows the stream parameter
        do {
            b = in.read();
            // but be sure we don't exceed range
        } while (retVal + (b << (8 * i)) >= range);
        retVal += b << (8 * i);
    }
    return retVal + rangeStart;
}
I believe this is effectively the same idea as (rand() * (max - min)) + min, only we're discarding bits that push us over max. Rather than use a modulo operator which may incorrectly bias the results to the lower values, we discard those bits and try again. Since hitting the CSPRNG may trigger re-seeding (which can block the InputStream), I'd like to avoid wasting random bits. Henry points out that this code biases against 0 and 257; Banthar demonstrates it in an example.
First edit: Henry reminded me that summation invokes the Central Limit Theorem. I've fixed the code above to get around that problem.
Second edit: Mechanical snail suggested that I look at the source for Random.nextInt(). After reading it for a while, I realized that this problem is similar to the base conversion problem. See answer below.
Your algorithm produces biased results. Let's assume rangeStart=0 and rangeEnd=257. If the first byte is greater than 0, that will be the result. If it's 0, the result will be either 0 or 256 with 50/50 probability. So 0 and 256 are each half as likely to be chosen as any other number.
I did a simple test to confirm this:
p(0)=0.001945
p(1)=0.003827
p(2)=0.003818
...
p(254)=0.003941
p(255)=0.003817
p(256)=0.001955
I think you need to do the same as java.util.Random.nextInt and discard the whole number, instead of just the last byte.
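A rough sketch of that "discard the whole number" approach, written against the doRead signature from the question. It is not the code of java.util.Random; the class name UniformReader is invented, and it assumes the stream delivers uniformly random bytes and that rangeEnd - rangeStart does not overflow a long.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class UniformReader {
    /** Returns a value in [rangeStart, rangeEnd), uniform if the input bytes are uniform. */
    static long doRead(InputStream in, long rangeStart, long rangeEnd) throws IOException {
        long range = rangeEnd - rangeStart;
        if (range <= 0) {
            throw new IllegalArgumentException("empty or overflowing range");
        }
        // Smallest number of bits that can represent range - 1 (0 when range == 1).
        int bits = 64 - Long.numberOfLeadingZeros(range - 1);
        int bytes = (bits + 7) / 8;
        long mask = (1L << bits) - 1;
        while (true) {
            long candidate = 0;
            for (int i = 0; i < bytes; i++) {
                int b = in.read();
                if (b < 0) {
                    throw new EOFException("random source exhausted");
                }
                candidate = (candidate << 8) | b;
            }
            // Dropping the surplus high bits keeps the rejection rate below one half.
            candidate &= mask;
            if (candidate < range) {
                return rangeStart + candidate;
            }
            // Otherwise throw away the whole candidate, not just its last byte, and retry.
        }
    }
}
```

This removes the bias at the cost of discarding some random bits on every rejected draw, so it does not address the wish to waste as few bits as possible.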
After reading the source to Random.nextInt(), I realized that this problem is similar to the base conversion problem.
Rather than converting a single symbol at a time, it would be more effective to convert blocks of input symbols at a time through an accumulator "buffer" which is large enough to represent at least one symbol in the domain and in the range. The new code looks like this:
public int[] fromStream(InputStream input, int length, int rangeLow, int rangeHigh) throws IOException {
int[] outputBuffer = new int[length];
// buffer is initially 0, so there is only 1 possible state it can be in
int numStates = 1;
long buffer = 0;
int alphaLength = rangeHigh - rangeLow; // size of the output alphabet; must be positive
// Fill outputBuffer from 0 to length
for (int i = 0; i < length; i++) {
// Until buffer has sufficient data filled in from input to emit one symbol in the output alphabet, fill buffer.
fill:
while(numStates < alphaLength) {
// Shift buffer by 8 (*256) to mix in new data (of 8 bits)
buffer = buffer << 8 | input.read();
// Multiply by 256, as that's the number of states that we have possibly introduced
numStates = numStates << 8;
}
// spits out least significant symbol in alphaLength
outputBuffer[i] = (int) (rangeLow + (buffer % alphaLength));
// We have consumed the least significant portion of the input.
buffer = buffer / alphaLength;
// Track the number of states we've introduced into buffer
numStates = numStates / alphaLength;
}
return outputBuffer;
}
There is a fundamental difference between converting numbers between bases and this problem, however; in order to convert between bases, I think one needs to have enough information about the number to perform the calculation - successive divisions by the target base result in remainders which are used to construct the digits in the target alphabet. In this problem, I don't really need to know all that information, as long as I'm not biasing the data, which means I can do what I did in the loop labeled "fill."
Correct me if I'm approaching this wrong, but I have a queue server and a bunch of Java workers running in a cluster. My queue has work units that are very small, but there are many of them. So far my benchmarks and review of the workers have shown that I get about 200mb/second.
So I'm trying to figure out how to get more work units through my bandwidth. Currently my CPU usage is not very high (40-50%) because it can process the data faster than the network can send it. I want to get more work through the queue and am willing to pay for it with expensive compression/decompression (since half of each core is idle right now).
I have tried Java LZO and gzip, but was wondering if there is anything better (even if it's more CPU-expensive)?
Updated: data is a byte[]. Basically the queue only takes it in that format, so I am using ByteArrayOutputStream to write two ints and an int[] to a byte[]. The values in the int[] are all ints between 0 and 100 (or 1000, but the vast majority of the numbers are zeros). The lists are quite large, anywhere from 1000 to 10,000 items (again, majority zeros; never more than 100 non-zero numbers in the int[]).
It sounds like using a custom compression mechanism that exploits the structure of the data could be very efficient.
Firstly, using a short[] (16 bit data type) instead of an int[] will halve (!) the amount of data sent, you can do this because the numbers are easily between -2^15 (-32768) and 2^15-1 (32767). This is ridiculously easy to implement.
Secondly, you could use a scheme similar to run-length encoding: a positive number represents that number literally, while a negative number represents that many zeros (after taking absolute values). e.g.
[10, 40, 0, 0, 0, 30, 0, 100, 0, 0, 0, 0] <=> [10, 40, -3, 30, -1, 100, -4]
This is harder to implement than just substituting short for int, but will provide ~80% compression in the very worst case (1000 numbers, 100 non-zero, none of which are consecutive).
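A minimal sketch of this scheme, assuming the input values are non-negative shorts so that negative numbers are free to carry run lengths. The class name ZeroRunCodec and its methods are made up for the example.

```java
import java.util.Arrays;

final class ZeroRunCodec {
    /** Replaces each run of zeros with one negative count; other values pass through. */
    static short[] encode(short[] values) {
        short[] out = new short[values.length]; // the encoding is never longer than the input
        int n = 0;
        int zeros = 0;
        for (short v : values) {
            if (v < 0) {
                throw new IllegalArgumentException("negative values are reserved for run lengths");
            }
            if (v == 0) {
                // Flush early so the count always fits in a short.
                if (++zeros == Short.MAX_VALUE) {
                    out[n++] = (short) -zeros;
                    zeros = 0;
                }
            } else {
                if (zeros > 0) {
                    out[n++] = (short) -zeros;
                    zeros = 0;
                }
                out[n++] = v;
            }
        }
        if (zeros > 0) {
            out[n++] = (short) -zeros;
        }
        return Arrays.copyOf(out, n);
    }

    /** Expands negative counts back into runs of zeros. */
    static short[] decode(short[] encoded) {
        int length = 0;
        for (short v : encoded) {
            length += v < 0 ? -v : 1;
        }
        short[] out = new short[length]; // Java zero-fills new arrays
        int pos = 0;
        for (short v : encoded) {
            if (v < 0) {
                pos += -v; // skip over a run of zeros
            } else {
                out[pos++] = v;
            }
        }
        return out;
    }
}
```

Encoding the twelve-element example above yields the seven-element array shown, and decoding restores the original.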
I just did some simulations to work out the compression ratios. I tested the method I described above, and the one suggested by Louis Wasserman and sbridges. Both performed very well.
Assuming the length of the array and the number of non-zero numbers are both uniformly between their bounds, both methods save about 5400 ints (or shorts) on average with a compressed size of about 2.5% the original! The run-length encoding method seems to save about 1 additional int (or average compressed size that is 0.03% smaller), i.e. basically no difference, so you should use the one that is easiest to implement. The following are histograms of the compression ratios for 50000 random samples (they are very similar!).
Summary: using shorts instead of ints and one of the compression methods, you will be able to compress the data to about 1% of its original size!
For the simulation, I used the following R script:
SIZE <- 50000
lengths <- sample(1000:10000, SIZE, replace=T)
nonzeros <- sample(1:100, SIZE, replace=T)
f.rle <- function(len, nonzero) {
indexes <- sort(c(0,sample(1:len, nonzero, F)))
steps <- diff(indexes)
sum(steps > 1) + nonzero # one short per run of zeros, and one per non-zero value
}
f.index <- function(len, nonzero) {
nonzero * 2
}
# using the [value, -1 * number of zeros,...] method
rle.comprs <- mapply(f.rle, lengths, nonzeros)
print(mean(lengths - rle.comprs)) # average number of shorts saved
rle.ratios <- rle.comprs / lengths * 100
print(mean(rle.ratios)) # average compression ratio
# using the [(index, value),...] method
index.comprs <- mapply(f.index, lengths, nonzeros)
print(mean(lengths - index.comprs)) # average number of shorts saved
index.ratios <- index.comprs / lengths * 100
print(mean(index.ratios)) # average compression ratio
par(mfrow=c(2,1))
hist(rle.ratios, breaks=100, freq=F, xlab="Compression ratio (%)", main="Run length encoding")
hist(index.ratios, breaks=100, freq=F, xlab="Compression ratio (%)", main="Store indices")
Try encoding your data as two varints, the first varint is the index of the number in the sequence, the second is the number itself. For entries which are 0, write nothing.
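One way this could look, as a sketch only. It uses a common 7-bits-per-byte varint layout, and it adds a length header that the suggestion above does not mention (an assumption, so the decoder knows how many trailing zeros to restore). All names are invented for the example.

```java
import java.io.ByteArrayOutputStream;

final class SparseVarintCodec {
    /** Writes 7 bits per byte, low bits first, with the high bit set on all but the last byte. */
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Reads one varint starting at pos[0] and advances pos[0] past it. */
    static int readVarint(byte[] data, int[] pos) {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = data[pos[0]++] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    /** Emits the array length, then an (index, value) pair for every non-zero entry. */
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, values.length); // assumed header, not part of the original suggestion
        for (int i = 0; i < values.length; i++) {
            if (values[i] != 0) {
                writeVarint(out, i);
                writeVarint(out, values[i]);
            }
        }
        return out.toByteArray();
    }

    /** Rebuilds the full array; entries that were never written stay zero. */
    static int[] decode(byte[] data) {
        int[] pos = {0};
        int[] values = new int[readVarint(data, pos)];
        while (pos[0] < data.length) {
            int index = readVarint(data, pos);
            values[index] = readVarint(data, pos);
        }
        return values;
    }
}
```

With values up to 1000 and indices up to 10,000, every index and value fits in one or two bytes, so an array with 100 non-zero entries would take at most about 400 bytes.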
I wrote an implementation of an RLE algorithm. This operates on a byte array, so could be used as an in-line filter with your existing code. It should safely handle large or negative values should your data change in the future.
It encodes a sequence of zeros as {0}{qty} where {qty} is in the range 1..255. All other bytes are stored as the byte itself. You squish your byte array before sending, and bloat it back to full size when receiving.
public static byte[] squish(byte[] bloated) {
int size = bloated.length;
ByteBuffer bb = ByteBuffer.allocate(2 * size + 4); // 4-byte length header plus, at worst, 2 bytes per input byte
bb.putInt(size);
int zeros = 0;
for (int i = 0; i < size; i++) {
if (bloated[i] == 0) {
if (++zeros == 255) {
bb.putShort((short) zeros);
zeros = 0;
}
} else {
if (zeros > 0) {
bb.putShort((short) zeros);
zeros = 0;
}
bb.put(bloated[i]);
}
}
if (zeros > 0) {
bb.putShort((short) zeros);
zeros = 0;
}
size = bb.position();
byte[] buf = new byte[size];
bb.rewind();
bb.get(buf, 0, size); // copy only the bytes actually written
return buf;
}
public static byte[] bloat(byte[] squished) {
ByteBuffer bb = ByteBuffer.wrap(squished);
byte[] bloated = new byte[bb.getInt()];
int pos = 0;
while (bb.position() < bb.capacity()) {
byte value = bb.get();
if (value == 0) {
bb.position(bb.position() - 1);
pos += bb.getShort();
} else {
bloated[pos++] = value;
}
}
return bloated;
}
I've been impressed with BZIP2, compared with 7z and gzip. I haven't personally tried this Java implementation, but it looks like it would be easy to substitute your GZIP call for this one and verify the results.
http://www.kohsuke.org/bzip2
You should probably try all the major ones on your data stream and see which works best. You should also consider that some algorithms will take longer to run, adding more latency to the queue. This may or may not be a problem depending on your application.
You can sometimes get better compression if you know something about the data. (dbaupp's answer covers this approach nicely)
This comparison of compression algorithms might be useful. From the article:
I'm trying to write a GPS tracking app (akin to a jogging app) on Android, and the issue of GPS location jitter has reared its ugly head. When accuracy is FINE and accuracy is within 5 meters, the position is jittering 1-n meters per second. How do you distinguish this jitter from legitimate movement, or filter it out?
Apps such as Sporypal clearly have some way of filtering out this noise.
Any thoughts?
Could you just run the positions through a low pass filter?
Something of the order
x(n) = (1-K)*x(n-1) + K*S(n)
where
S is your noisy samples and x, the low pass filtered samples. K is a constant between 0 and 1 which you would probably have to experiment with for best performance.
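Since the question is about Android, here is the same recurrence as a small Java sketch. The class name is made up, and seeding the filter with the first sample is an assumption of this example, chosen so the output does not start from zero.

```java
final class LowPassFilter {
    private final double k;   // smoothing factor K, 0 < K <= 1
    private double state;     // x(n-1), the previous filtered value
    private boolean primed;   // false until the first sample arrives

    LowPassFilter(double k) {
        if (k <= 0 || k > 1) {
            throw new IllegalArgumentException("K must be in (0, 1]");
        }
        this.k = k;
    }

    /** Applies x(n) = (1-K)*x(n-1) + K*S(n) and returns x(n). */
    double filter(double sample) {
        if (!primed) {
            state = sample; // assumption: start from the first sample rather than from 0
            primed = true;
        } else {
            state = (1 - k) * state + k * sample;
        }
        return state;
    }
}
```

One filter instance per coordinate (latitude and longitude) would be fed each new fix; a smaller K smooths more but makes the track lag further behind real movement.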
Per TK's suggestion:
My pseudocode will look awfully C like:
float noisy_lat[128], noisy_lon[128];
float smoothed_lat[128], smoothed_lon[128];
float lat_delay=0., lon_delay=0.;
float smooth(float in[], float out[], int n, float K, float delay)
{
int i;
for (i=0; i<n; i++) {
*out = *in++ * K + delay * (1-K);
delay = *out++;
}
return delay;
}
loop:
Get new samples of position in noisy_lat and noisy_lon
// LPF the noise samples to produce smoother position data
lat_delay = smooth(noisy_lat, smoothed_lat, 128, K, lat_delay);
lon_delay = smooth(noisy_lon, smoothed_lon, 128, K, lon_delay);
// Rinse. Repeat.
go to loop:
In a nutshell, this is simply a feedback integrator with a one-sample delay. If your input has low frequency white-ish noise on top of the desired signal, this integrator will average the input signal over time, thus causing the noise components to average out to near zero, leaving you with the desired signal.
How well it works will depend on how much noise your signal has and the filter feedback factor K. As I said before, you'll have to play around a bit with the value to see which value produces the cleanest, most desirable result.
I am working on an Android project where I am using an FFT to process accelerometer data, and I have trouble understanding how these things actually work.
I am using the JTransforms library by Piotr Wendykier in the following way:
int length = vectors.length;
float[] input = new float[length*2];
for(int i=0;i<length;i++){
input[i]=vectors[i];
}
FloatFFT_1D fftlib = new FloatFFT_1D(length);
fftlib.complexForward(input);
float outputData[] = new float[(input.length+1)/2];
if(input.length%2==0){
for(int i = 0; i < length/2; i++){
outputData[i]= (float) Math.sqrt((Math.pow(input[2*i],2))+(Math.pow(input[2*(i)+1], 2)));
}
}else{
for(int i = 0; i < length/2+1; i++){
outputData[i]= (float) Math.sqrt((Math.pow(input[2*i],2))+(Math.pow(input[2*i+1], 2)));
}
}
List<Float> output = new ArrayList<Float>();
for (float f : outputData) {
output.add(f);
}
The result is an array with the following data.
I have a problem interpreting the output data. The data are from a 10-second interval, and the sampling frequency is 50 Hz. While capturing, I was moving the phone up and down in my hand about once every 3/4 second, so could the peak at an x value of about 16 correspond to the period of the strongest component of the signal?
I need to obtain the frequency of the strongest component in the signal.
The frequency represented by each FFT result bin is the bin number times the sample rate divided by the length of the FFT (convolved with a sinc function giving it non-zero width, to get a bit technical). If your sample rate is 50 Hz and your FFT length is 512, then bin 16 of the FFT result would represent about 1.6 Hz (16 × 50 / 512 ≈ 1.56 Hz), which corresponds to a period of roughly 0.64 seconds.
The spike at bin 0 (DC) might represent the non-zero force of gravity on the accelerometer.
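Putting the two points together, a sketch of how the strongest non-DC component could be picked out of a magnitude array such as outputData. The class and method names are hypothetical, and the array is assumed to hold at least two bins.

```java
final class SpectrumPeak {
    /** Centre frequency in Hz of an FFT bin: bin * sampleRate / fftLength. */
    static double binFrequency(int bin, double sampleRate, int fftLength) {
        return bin * sampleRate / fftLength;
    }

    /** Index of the largest magnitude, skipping bin 0 (DC). Needs at least two bins. */
    static int strongestBin(float[] magnitudes) {
        int best = 1;
        for (int i = 2; i < magnitudes.length; i++) {
            if (magnitudes[i] > magnitudes[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

Calling binFrequency(strongestBin(outputData), 50, fftLength) would then give the dominant frequency in Hz, where fftLength is the number of samples passed to the transform.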
Since you have real data, you should pass these values to the realForward function (not complexForward), as stated here.