I would like to be able to take a frequency (e.g. 1000 Hz, 250 Hz, 100 Hz) and play it out through the phone hardware.
I know that Android's AudioTrack will allow me to play 16-bit PCM if I can calculate an array of bytes or shorts. I would like to calculate only a single period so that later I can loop it without any issues, and so I can keep the calculation time down.
How could this be achieved?
Looping a single period isn't necessarily a good idea - the cycle may not fit nicely into an exact number of samples so you might get an undesirable discontinuity at the end of each cycle, or worse, the audible frequency may end up slightly off.
That said, the math isn't hard:
float sample_rate = 44100;
float samples_per_cycle = sample_rate / frequency;
int samples_to_produce = ....
for (int i = 0; i < samples_to_produce; ++i) {
    sample[i] = (short) Math.floor(32767.0 * Math.sin(2 * Math.PI * i / samples_per_cycle));
}
To see what I meant above about the frequency, take the standard tuning pitch of 440 Hz.
Strictly this needs 100.227 samples per cycle, but the code above would produce 100. So if you repeat your 100 samples over and over, you'll actually play the cycle 441 times per second, and your pitch will be off by 1 Hz.
To avoid the problem you'd really need to calculate several periods of the waveform, although I don't know how many are needed to fool the ear into hearing the right pitch.
Ideally it would be as many as are needed such that:
i / samples_per_cycle
is a whole number, so that the last sample (technically the one after the last sample) ends exactly on a cycle boundary. I think if your input frequencies are all whole numbers, then producing exactly one second's worth would work.
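To make that concrete, here is a minimal sketch (class and method names are my own, not from the question) that produces exactly one second of 16-bit samples. For a whole-number frequency, the sample after the last one lands exactly on a cycle boundary, so the buffer can loop cleanly:

```java
public class ToneBuffer {
    // One second of 16-bit PCM at the given rate; for integer frequencies the
    // buffer ends on a cycle boundary and can be looped without discontinuity.
    static short[] oneSecondOfSine(int sampleRate, int frequency) {
        short[] samples = new short[sampleRate];
        double samplesPerCycle = (double) sampleRate / frequency;
        for (int i = 0; i < samples.length; i++) {
            samples[i] = (short) Math.round(
                    32767.0 * Math.sin(2 * Math.PI * i / samplesPerCycle));
        }
        return samples;
    }

    public static void main(String[] args) {
        short[] buf = oneSecondOfSine(44100, 440);
        System.out.println(buf.length);
    }
}
```

The buffer produced here could then be handed to AudioTrack in MODE_STATIC with looping enabled, though the exact playback setup is outside this sketch.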
Related
I am trying to make a simple signal generator for isochronic pulsating sounds.
Basically a "beatFrequency" is controlling the amplitude variation of the main (pitch) frequency.
It works pretty well, except that for some pitch frequencies above 4-5 kHz there is a second tone generated with a lower frequency.
It's not for all frequencies, but for quite a few I can definitely hear a second tone.
What can this be? Some kind of resonance? I tried increasing the sampling rate, but it's not changing anything, and 44100 should be enough up to around 20 kHz, if I understand correctly?
I really can't figure it out on my own, so thankful for all help!
Here is example code with beatFreq 1 Hz, pitch frequency 5000 Hz and sample rate 44100.
public void playSound() {
    double beatFreq = 1;
    double pitch = 5000;
    int mSampleRate = 44100;
    AudioTrack mAudioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, mSampleRate,
            AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT,
            256, AudioTrack.MODE_STREAM);
    int loopLength = 2 * this.mSampleRate;
    short[] mBuffer = new short[loopLength];
    while (isPlaying) {
        double[] mSound = new double[loopLength];
        for (int i = 0; i < loopLength; i = i + 1) {
            mSound[i] = beatFreq * Math.sin((1.0 * Math.PI * i / (this.mSampleRate / pitch)));
            mBuffer[i] = (short) (mSound[i] * Short.MAX_VALUE);
        }
        mAudioTrack.play();
        mAudioTrack.write(mBuffer, 0, loopLength);
    }
}
Here is (added) an image of the frequencies when I play the tone at 4734 Hz. For example, there is a rather large peak at around 1100 Hz, as well as many higher ones.
The code is now just using the pitch; I have removed the beatFreq:
In your code, you are using beatFreq*Math.sin((1.0*Math.PI * i/(this.mSampleRate/pitch))) to determine the frequency (some sort of assignment is missing here).
However, if mSampleRate and pitch are both int values, mSampleRate/pitch is an integer division instead of a double division. For particular combinations of pitch and sample rate, this results in lower frequencies than intended, and the effect should get worse for greater pitch values.
Try using double instead of int; that should get rid of the problem.
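A quick check of the pitfall, using the values from the question (the demo class is just a harness of my own):

```java
public class IntDivDemo {
    public static void main(String[] args) {
        int sampleRate = 44100;
        int pitch = 5000;
        // int / int truncates: the period becomes 8 samples (~5512.5 Hz)
        System.out.println(sampleRate / pitch);          // prints 8
        // double division keeps the intended 8.82 samples per cycle
        System.out.println(sampleRate / (double) pitch); // prints 8.82
    }
}
```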
Rewriting my answer after better understanding the question: the OP wants to do amplitude modulation.
Yes, Java can do amplitude modulation. I've made a passable cricket sound, for example, by taking a 4.9 kHz tone, modulating the volume at 66 Hz, and giving the resulting tone an AR envelope.
In your code, the variable beatFreq remains constant over an entire for-loop. Isn't your intention to vary this as well over the course of time?
I think you should simultaneously compute the beatFreq wave value in its own function (but also using the varying i), and multiply that result (scaled to range from 0 to 1) against the value computed for the faster tone.
EDIT
To move redundant calculations out of the inner loop the following is a possibility:
Have the following as instance variables:
private double incr;
private double incrSum;
private final double TWO_PI = Math.PI * 2;
Have the following calculation done only one time, in your constructor:
incr = pitch / audioFmt.getSampleRate();
incr *= TWO_PI;
This assumes pitch is a value stated in Hertz, and audioFmt is the Java AudioFormat being used. With Android, I don't know how the audio format's sample rate is stored or accessed.
With this in place you can have a method that returns the next double PCM value with some very simple code:
private double getNextSinePCM()
{
    incrSum += incr;
    if (incrSum > TWO_PI)
    {
        incrSum -= TWO_PI;
    }
    return Math.sin(incrSum);
}
Note: do not reset incrSum to zero as you stated in your comment. This can introduce a discontinuity.
If you have a second tone, it gets its own increment and running sum. With these two results in hand, you can multiply them to get amplitude modulation.
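A sketch of that scheme (names are illustrative, not from the original code): two independent phase accumulators, with the beat wave rescaled from [-1, 1] to [0, 1] before it multiplies the pitch wave:

```java
public class AmTone {
    private static final double TWO_PI = Math.PI * 2;
    private final double pitchIncr, beatIncr;
    private double pitchSum, beatSum;

    AmTone(double pitchHz, double beatHz, double sampleRate) {
        // per-sample phase increments, computed once
        pitchIncr = TWO_PI * pitchHz / sampleRate;
        beatIncr = TWO_PI * beatHz / sampleRate;
    }

    // One PCM sample: the beat wave, rescaled to 0..1, multiplies the pitch wave.
    double nextPcm() {
        pitchSum += pitchIncr;
        if (pitchSum > TWO_PI) pitchSum -= TWO_PI;
        beatSum += beatIncr;
        if (beatSum > TWO_PI) beatSum -= TWO_PI;
        return 0.5 * (1 + Math.sin(beatSum)) * Math.sin(pitchSum);
    }

    public static void main(String[] args) {
        AmTone tone = new AmTone(5000, 1, 44100);
        for (int i = 0; i < 4; i++) System.out.println(tone.nextPcm());
    }
}
```

The returned doubles stay in [-1, 1] and would still need conversion to whatever PCM format the output device expects.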
Now, as to the question as how to properly convert the PCM double value returned to something Android can use, I can't give you a definitive answer as I am not running Android.
Just a comment: it seems to me you are enthusiastic about working with sound, but maybe self-taught or lagging a bit in basic programming techniques. Moving redundant calculations outside of a loop is kind of fundamental. So is the ability to make simple test cases for testing assumptions and trouble-shooting. As you go forward, I want to encourage you to dedicate some time to developing these fundamentals as well as pursuing the interest in sound! You might check out the StackOverflow code reading group as a resource for more tips. I am also self-taught and have learned a lot from there as well as the code-reading course at JavaRanch called "CattleDrive".
I have an array (size 128) of data that I am using FFT on. I am trying to find the frequency of the data through the FFT spectrum. The problem is that the formula freq = i * Fs / N doesn't seem to be working. My data is quite noisy and I don't know if it is because of my noisy data or because I am doing something else wrong. Below is my raw data:
And this is the spectrum that results from the transform:
I am getting two maximum peaks of equal magnitude at index 4 and 128 in the output array. The frequency of the data should be around 1.1333 Hz, but I am getting 5-6 or completely wrong values when I use the formula:
freq = i * Fs / N;
where i is the array index of the largest magnitude peak, Fs is the sampling rate in Hz, and N is the data size.
Using my data, you get freq = (4 * 11.9) / 128 = 0.37 Hz, which is very far off from what is expected.
If my calculation is correct, are there any ways to improve my data? Or, are my calculations for frequency incorrect?
Let's first make sure you are looking at the actual magnitudes. FFTs return complex values associated with each frequency bin. These complex values are typically represented by two values: one for the real part and another for the imaginary part. The magnitude of a frequency component can then be obtained by computing sqrt(real*real + imaginary*imaginary).
This should give you half as many values and the corresponding spectrum (with the magnitudes expressed in decibels):
As you can see, there is a strong peak near 0 Hz, which is consistent with your raw data having a large average value as well as an increasing trend from time 0 to ~4.2 s (both of which are larger than the oscillations' amplitude). If we were to remove these low-frequency contributions (with, for example, a high-pass filter with a cutoff frequency around 0.25 Hz), we would get the following fluctuation data:
with the corresponding spectrum:
As you can see, the oscillation frequency can much more readily be observed in bin 11, which gives you freq = (11 * 11.9) / 128 ≈ 1 Hz.
Note however that whether removing these frequency components below 0.25 Hz is an improvement of your data depends on whether those frequency components are of interest for your application (possibly not, since you seem to be interested in relatively faster fluctuations).
You need to remove the DC bias (average of all samples) before the FFT to measure frequencies near 0 Hz (or near the FFT's 0th result bin). Applying a window function (von Hann or Hamming window) after removing the DC bias and before the FFT may also help.
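A minimal sketch of both suggestions (class and method names are mine): subtract the mean to remove the DC bias, then taper the block with a von Hann window before handing it to the FFT:

```java
public class Preprocess {
    // Subtract the DC bias (mean), then apply a von Hann window.
    static double[] removeDcAndWindow(double[] x) {
        double mean = 0;
        for (double v : x) mean += v;
        mean /= x.length;
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double hann = 0.5 * (1 - Math.cos(2 * Math.PI * i / (x.length - 1)));
            y[i] = (x[i] - mean) * hann;
        }
        return y;
    }

    public static void main(String[] args) {
        // A signal oscillating around a DC level of 2.0; the windowed
        // output is centered on zero and tapers to zero at both ends.
        double[] y = removeDcAndWindow(new double[]{1.5, 2.5, 1.5, 2.5, 1.5, 2.5, 1.5, 2.5});
        for (double v : y) System.out.println(v);
    }
}
```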
As part of a Monte Carlo simulation, I have to roll a group of dice until certain values show up a certain amount of times. My code that does this calls upon a dice class which generates a random number between 1 and 6, and returns it. Originally the code looked like
public void roll() {
    value = (int) (Math.random() * 6) + 1;
}
and it wasn't very fast. By exchanging Math.random() for
ThreadLocalRandom.current().nextInt(1, 7);
It ran a section that calls this about 250 million times in roughly 60% of the original time.
As part of the full simulation it will call upon this method billions of times at the very least, so is there any faster way to do this?
Pick a random generator that is as fast and as good as you need it to be, and that isn't slowed down to a tiny fraction of its normal speed by thread-safety mechanisms. Then pick a method of generating the [1..6] integer distribution that is as fast and as precise as you need it to be.
The fastest simple generator that is of sufficiently high quality to beat standard tests for PRNGs like TestU01 (instead of failing systematically, like the Mersenne Twister) is Sebastiano Vigna's xorshift64*. I'm showing it as C code, but Sebastiano has it in Java as well:
uint64_t xorshift64s(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    return x * 2685821657736338717ull;
}
Sebastiano Vigna's site has lots of useful info, links and benchmark results. Including papers, for the mathematically inclined.
At that high resolution you can simply use 1 + xorshift64s(state) % 6 and the bias will be immeasurably small. If that is not fast enough, implement the modulo division by multiplication with the inverse. If that is not fast enough - if you cannot afford two MULs per variate - then it gets tricky and you need to come back here. xorshift1024* (Java) plus some bit trickery for the variate would be an option.
Batching - generating an array full of numbers and processing that, then refilling the array and so on - can unlock some speed reserves. Needlessly wrapping things in classes achieves the opposite.
P.S.: if ThreadLocalRandom and xorshift* are not fast enough for your purposes even with batching then you might be going about things in the wrong way, or you might be doing it in the wrong language. Or both.
P.P.S.: in languages like Java (or C#, or Delphi), abstraction is not free, it has a cost. In Java you also have to reckon with things like mandatory gratuitous array bounds checking, unless you have a compiler that can eliminate those checks. Teasing high performance out of a Java program can get very involved... In C++ you get abstraction and performance for free.
Darth is correct that Xorshift* is probably the best generator to use. Use it to fill a ring buffer of bytes, then fetch the bytes one at a time to roll your dice, refilling the buffer when you've fetched enough. To get the actual die roll, avoid division and bias by using rejection sampling. The rest of the code then looks something like this (in C):
do {
    if (bp >= buffer + sizeof buffer) {
        // refill buffer with xorshift64* outputs
    }
    v = *bp++ & 7;   // low 3 bits of the next byte: 0..7
} while (v > 5);     // reject 6 and 7 to avoid bias
return v;            // 0..5; the die face is v + 1
This will allow you to get on average 6 die rolls per 64-bit random value.
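For completeness, here is a hedged Java sketch combining the two answers: a xorshift64* state (Java's >>> standing in for the unsigned shifts) consumed one byte at a time with rejection sampling. The class, seed handling, and batching granularity are illustrative, not from either answer verbatim:

```java
public class FastDice {
    private long state;     // xorshift64* state, must never be zero
    private long current;   // the 64-bit value currently being consumed
    private int bytesLeft;  // bytes remaining in `current`

    FastDice(long seed) { state = (seed == 0) ? 1 : seed; }

    private long nextXorshift64s() {
        state ^= state >>> 12;
        state ^= state << 25;
        state ^= state >>> 27;
        return state * 2685821657736338717L;
    }

    // Returns a die roll in 1..6 via byte-wise rejection sampling.
    int roll() {
        while (true) {
            if (bytesLeft == 0) {
                current = nextXorshift64s();
                bytesLeft = 8;
            }
            int v = (int) (current & 7); // low 3 bits of the next byte
            current >>>= 8;
            bytesLeft--;
            if (v < 6) return v + 1;     // reject 6 and 7
        }
    }

    public static void main(String[] args) {
        FastDice dice = new FastDice(42);
        for (int i = 0; i < 10; i++) System.out.print(dice.roll() + " ");
        System.out.println();
    }
}
```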
I'm currently using the JTransforms library to calculate the DFT of a one-second signal at a sampling rate of 44100 Hz. The code is quite simple:
DoubleFFT_1D fft = new DoubleFFT_1D(SIZE);
fft.complexForward(signal); // signal: 44100 length array with audio bytes.
See this page for the documentation of JTransform's DoubleFFT_1D class.
http://incanter.org/docs/parallelcolt/api/edu/emory/mathcs/jtransforms/fft/DoubleFFT_1D.html
The question is: what is SIZE supposed to be? I know it's probably the window size, but can't seem to get it to work with the most common values I've come across, such as 1024 and 2048.
At the moment I'm testing this function by generating the signal of a 1 kHz sinusoid. However, when I use the code above and compare the results with MATLAB's fft function, they seem to be of a whole different magnitude. E.g. MATLAB gives results such as 0.0004 - 0.0922i, whereas the above code produces results like -1.7785E-11 + 6.8533E-11i, with SIZE set to 2048. The contents of the signal array are equal, however.
Which value for SIZE would give a similar FFT-function as MATLAB's built-in fft?
According to the documentation, SIZE looks like it should be the number of samples in signal. If it's truly a 1 s signal at 44.1 kHz, then you should use SIZE = 44100. Since you're using complex data, signal should be an array twice this size (real/imaginary in sequence).
If you don't use SIZE = 44100, your results will not match what Matlab gives you. This is because of the way Matlab (and probably JTransforms) scales the fft and ifft functions based on the length of the input - don't worry that the amplitudes don't match. By default, Matlab calculates the FFT using the full signal. You can provide a second argument to fft (in Matlab) to calculate the N-point FFT and it should match your JTransforms result.
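To see why the raw magnitudes depend on the transform length, here is a naive DFT sketch (purely illustrative; not JTransforms): a sinusoid completing k cycles in N samples produces magnitude N/2 in bin k, so changing SIZE changes the numbers even when the signal is the same:

```java
public class NaiveDft {
    // Magnitude of DFT bin k for a real signal x (O(N) per bin; demo only).
    static double binMagnitude(double[] x, int k) {
        double re = 0, im = 0;
        for (int n = 0; n < x.length; n++) {
            double angle = 2 * Math.PI * k * n / x.length;
            re += x[n] * Math.cos(angle);
            im -= x[n] * Math.sin(angle);
        }
        return Math.hypot(re, im);
    }

    public static void main(String[] args) {
        int n = 2048;
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = Math.sin(2 * Math.PI * 100 * i / n); // 100 cycles in n samples
        }
        System.out.println(binMagnitude(x, 100)); // ~ n/2 = 1024
        System.out.println(binMagnitude(x, 50));  // ~ 0 (no energy in this bin)
    }
}
```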
From your comments, it sounds like you're trying to create a spectrogram. For this, you will have to figure out your tradeoff between: spectral resolution, temporal resolution, and computation time. Here is my (Matlab) code for a 1-second spectrogram, calculated for each 512-sample chunk of a 1s signal.
fs = 44100; % Hz
w = 1;      % s
t = linspace(0, w, w*fs);
k = linspace(-fs/2, fs/2, w*fs);

% simulate the signal - time-dependent frequency
f = 10000*t; % Hz
x = cos(2*pi*f.*t);

m = 512; % SIZE
S = zeros(m, floor(w*fs/m));
for i = 0:(w*fs/m)-1
    s = x((i*m+1):((i+1)*m));
    S(:,i+1) = fftshift(fft(s));
end
For this image we have 512 samples along the frequency axis (y-axis), ranging from [-22050 Hz 22050 Hz]. There are 86 samples along the time axis (x-axis) covering about 1 second.
For this image we now have 4096 samples along the frequency axis (y-axis), ranging from [-22050 Hz 22050 Hz]. The time axis (x-axis) again covers about 1 second, but this time with only 10 chunks.
Whether it's more important to have fine time resolution (512-sample chunks) or high spectral resolution (4096-sample chunks) will depend on what kind of signal you're working with. You have to make a decision about what you want in terms of temporal/spectral resolution and what you can achieve in reasonable computation time. If you use SIZE = 4096, for example, you will be able to calculate the spectrum ~10 times per second (based on your sampling rate), but the FFT may not be fast enough to keep up. If you use SIZE = 512 you will have poorer spectral resolution, but the FFT will calculate much faster and you can calculate the spectrum ~86 times per second. If the FFT is still not fast enough, you could start skipping chunks (e.g. use SIZE = 512 but only calculate for every other chunk, giving ~43 spectra per 1 s signal). Hopefully this makes sense.
I'm able to display waveform but I don't know how to implement zoom in on the waveform.
Any idea?
Thanks piccolo
By zoom, I presume you mean horizontal zoom rather than vertical. The way audio editors do this is to scan the waveform, breaking it up into time windows where each pixel in X represents some number of samples. It can be a fractional number, but you can get away with disallowing fractional zoom ratios without annoying the user too much. Once you zoom out a bit, the max value is always a positive integer and the min value is always a negative integer.
For each pixel on the screen, you need to know the minimum sample value for that pixel and the maximum sample value. So you need a function that scans the waveform data in chunks and keeps track of the accumulated max and min for each chunk.
This is a slow process, so professional audio editors keep a pre-calculated table of min and max values at some fixed zoom ratio. It might be at 512/1 or 1024/1. When you are drawing at a zoom ratio of more than 1024 samples/pixel, you use the pre-calculated table; if you are below that ratio, you get the data directly from the file. If you don't do this, you will find that your drawing code gets too slow when you zoom out.
It's worthwhile to write code that handles all of the channels of the file in a single pass when doing this scanning; slowness here will make your whole program feel sluggish. It's the disk I/O that matters here - the CPU has no trouble keeping up - so straightforward C++ code is fine for building the min/max tables, but you don't want to go through the file more than once, and you want to read it sequentially.
Once you have the min/max tables, keep them around. You want to go back to the disk as little as possible, and many of the reasons for wanting to repaint your window will not require you to rescan your min/max tables. The memory cost of holding on to them is not that high compared to the disk I/O cost of building them in the first place.
Then you draw the waveform by drawing a series of 1 pixel wide vertical lines between the max value and the min value for the time represented by that pixel. This should be quite fast if you are drawing from pre built min/max tables.
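A sketch of that min/max scan (not from the original answer; names are mine, and real code would also handle the remainder chunk and multiple channels):

```java
public class MinMaxScan {
    // For each pixel, record the min and max sample in its chunk of
    // samplesPerPixel samples; drawing is then one vertical line per pixel
    // from minMax[p][0] to minMax[p][1].
    static short[][] scan(short[] samples, int samplesPerPixel) {
        int pixels = samples.length / samplesPerPixel;
        short[][] minMax = new short[pixels][2];
        for (int p = 0; p < pixels; p++) {
            short min = Short.MAX_VALUE, max = Short.MIN_VALUE;
            for (int i = p * samplesPerPixel; i < (p + 1) * samplesPerPixel; i++) {
                if (samples[i] < min) min = samples[i];
                if (samples[i] > max) max = samples[i];
            }
            minMax[p][0] = min;
            minMax[p][1] = max;
        }
        return minMax;
    }

    public static void main(String[] args) {
        short[][] mm = scan(new short[]{0, 5, -3, 2, 7, -7, 1, 0}, 4);
        System.out.println(mm[0][0] + ".." + mm[0][1]); // prints -3..5
    }
}
```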
Answered by https://stackoverflow.com/users/234815/John%20Knoeller
Working on this right now: C# with a little LINQ, but it should be easy enough to read and understand. The idea here is to have an array of float values from -1 to 1 representing the amplitude of every sample in the wav file. Then, knowing how many samples per second there are, we need a scaling factor: segments per second. At this point you are simply reducing the data points and smoothing them out. To zoom in really tight, give a segments-per-second value of 1000; to zoom way out, maybe 5-10. Note that right now I'm just doing normal averaging; this needs to be updated to be much more efficient and should probably use RMS (root-mean-square) averaging to make it perfect.
private List<float> BuildAverageSegments(float[] aryRawValues, int iSamplesPerSecond, int iSegmentsPerSecond)
{
    double nDurationInSeconds = aryRawValues.Length / (double) iSamplesPerSecond;
    int iNumSegments = (int) Math.Round(iSegmentsPerSecond * nDurationInSeconds);
    // total number of samples divided by the total number of segments
    int iSamplesPerSegment = (int) Math.Round(aryRawValues.Length / (double) iNumSegments);
    List<float> colAvgSegVals = new List<float>();
    for (int i = 0; i < iNumSegments - 1; i++)
    {
        int iStartIndex = i * iSamplesPerSegment;
        int iEndIndex = (i + 1) * iSamplesPerSegment;
        float fAverageSegVal = aryRawValues.Skip(iStartIndex).Take(iEndIndex - iStartIndex).Average();
        colAvgSegVals.Add(fAverageSegVal);
    }
    return colAvgSegVals;
}
Outside of this, you need to get your audio into wav format; you should be able to find source code everywhere to read that data. Then use something like this to convert the raw byte data to floats. Again, this is horribly rough and inefficient, but clear:
public float[] GetFloatData()
{
    // Scale factor: SignificantBitsPerSample
    if (Data != null && Data.Length > 0)
    {
        float nMaxValue = (float) Math.Pow(2, SignificantBitsPerSample);
        float[] aryFloats = new float[Data[0].Length];
        for (int i = 0; i < Data[0].Length; i++)
        {
            aryFloats[i] = Data[0][i] / nMaxValue;
        }
        return aryFloats;
    }
    else
    {
        return null;
    }
}