I have been reading all the posts on SO about FFT and using JTransforms. I have real data of size N. I would like to pass it to an FFT so that I can calculate the magnitude at every frequency.
I have it done in matlab using the FFT function and I am trying to do the same in Java. My question is:
Should I use RealForward or RealForwardFull? Or should I create another array with zero imaginary parts and pass it to complexForward?
How big is the resulting FFT?
When you compute the DFT of an N-element array of real numbers, your result will be an N-element array of complex numbers. Now, looking at the JTransforms documentation, in JTransforms these complex numbers are represented as pairs of two real numbers (one for the real part and one for the imaginary part), so your FFT result will actually be twice as long as your FFT input.
RealForwardFull computes the FFT of an array and returns all FFT coefficients, whilst RealForward returns only half of them. The reason RealForward may suit some people's needs is that the FFT of a real signal has a symmetry property, so knowing just half of the coefficients lets you obtain the rest from that symmetry.
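The symmetry in question is conjugate symmetry: for a real input, X[N-k] is the complex conjugate of X[k]. A minimal brute-force DFT (plain Java, no JTransforms; `naiveDft` is an illustrative quadratic helper, not a library call) makes this easy to check:

```java
// Naive O(N^2) DFT of a real signal, used only to illustrate the
// conjugate symmetry X[N-k] = conj(X[k]) that RealForward exploits.
public class DftSymmetry {
    // Returns {Re[0], Im[0], Re[1], Im[1], ...} like JTransforms' layout.
    static double[] naiveDft(double[] x) {
        int n = x.length;
        double[] out = new double[2 * n];
        for (int k = 0; k < n; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double ang = -2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(ang);
                im += x[t] * Math.sin(ang);
            }
            out[2 * k] = re;
            out[2 * k + 1] = im;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1, 5, -2, 3, 0, 4, -1, 2};
        double[] X = naiveDft(x);
        int n = x.length;
        for (int k = 1; k < n; k++)
            // X[n-k] equals the complex conjugate of X[k]
            System.out.printf("k=%d: re %.4f vs %.4f, im %.4f vs %.4f%n",
                k, X[2 * (n - k)], X[2 * k], X[2 * (n - k) + 1], -X[2 * k + 1]);
    }
}
```

Running this on any real-valued array shows the second half of the spectrum mirroring the first, which is exactly the redundancy RealForward avoids storing.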
I am exploring the JTransforms FFT and I tried FFT.realForward() and FFT.realForwardFull() and I got these results:
RealForward
16.236587494
-2.3513066039999995 <----------------?
1.8268368691062924
-1.1179067733368138
6.698362081000001
5.353354667
-6.146021583106291
-12.013420149336813
RealForwardFull
16.236587494
0.0 <------------------
1.8268368691062924
-1.1179067733368138
6.698362081000001
5.353354667
-6.146021583106291
-12.013420149336813
-2.3513066039999995 <-----------------
0.0 <-------------
-6.146021583106291
12.013420149336813
6.698362081000001
-5.353354667
1.8268368691062924
1.1179067733368138
So, as you can see, realForwardFull() gave a different imaginary part for the first number and added a new pair. Shouldn't realForward() produce the same results minus any duplication?
Because the FFT of a real signal is symmetrical, both the k=0 and k=N/2 components are real-only (you can see that in the "full" output). So they're packed together in the "non-full" output.
This output layout is explained in the JavaDoc.
(Note: FFTW does the same thing.)
For information on how your FFT output data is represented look at the JTransforms documentation HERE
For RealForwardFull the output is the full FFT data the usual way, i.e. N complex numbers, with the real and imaginary components interleaved.
For RealForward the output is arranged as follows: the first two elements are the real components of the first (k = 0) and middle (k = N/2) FFT coefficients respectively; the rest of the FFT coefficients follow, with real and imaginary components interleaved.
As Oli Charlesworth mentioned, the imaginary components of the k = 0 and k = N/2 terms are zero because the input signal is real, so RealForward does not return them. It returns only the minimum information needed to reconstruct the full spectrum.
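As a sketch of what "reconstructing the full spectrum" means in practice, the plain-Java helper below (an illustration of the documented layout, not a JTransforms API) expands a realForward result into the interleaved full spectrum; feeding it the eight numbers from the question reproduces the realForwardFull listing:

```java
// Expands JTransforms' realForward packing into a full interleaved
// complex spectrum, assuming n (the real input length) is even.
// Packed layout: a[0] = Re[0], a[1] = Re[n/2],
// a[2k] = Re[k], a[2k+1] = Im[k] for 0 < k < n/2.
public class UnpackRealForward {
    static double[] unpack(double[] a) {
        int n = a.length;                 // number of real input samples
        double[] full = new double[2 * n];
        full[0] = a[0];                   // Re[0]; Im[0] stays 0
        full[n] = a[1];                   // Re[n/2]; Im[n/2] stays 0
        for (int k = 1; k < n / 2; k++) {
            full[2 * k] = a[2 * k];                 // Re[k]
            full[2 * k + 1] = a[2 * k + 1];         // Im[k]
            full[2 * (n - k)] = a[2 * k];           // Re[n-k] =  Re[k]
            full[2 * (n - k) + 1] = -a[2 * k + 1];  // Im[n-k] = -Im[k]
        }
        return full;
    }

    public static void main(String[] args) {
        double[] a = {16.236587494, -2.3513066039999995,
                      1.8268368691062924, -1.1179067733368138,
                      6.698362081000001, 5.353354667,
                      -6.146021583106291, -12.013420149336813};
        double[] full = unpack(a);
        for (int k = 0; k < a.length; k++)
            System.out.printf("%.13f  %.13f%n", full[2 * k], full[2 * k + 1]);
    }
}
```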
I am looking for a way to shuffle a large amount of data which does not fit into memory (approx. 40GB).
I have around 30 million entries, of variable length, stored in one large file. I know the starting and ending positions of each entry in that file. I need to shuffle this data, which does not fit in RAM.
The only solution I thought of is to shuffle an array containing the numbers from 1 to N, where N is the number of entries, with the Fisher-Yates algorithm and then copy the entries in a new file, according to this order. Unfortunately, this solution involves a lot of seek operations, and thus, would be very slow.
Is there a better solution to shuffle large amount of data with uniform distribution?
First get the shuffle issue out of the way. Do this by inventing a hash algorithm for your entries that produces random-like results, then do a normal external sort on the hash.
Now that you have transformed your shuffle into a sort, the problem becomes finding an efficient external sort algorithm that fits your budget and memory limits. That should now be just a Google search away.
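A minimal in-memory sketch of the idea, with a random 64-bit key standing in for the hash (for the 40 GB case the sort would be an external merge sort, but the principle is identical):

```java
import java.util.*;

// "Shuffle by sorting on a random key": attach a uniform random key to
// each entry, then sort by key. In the real scenario the sort step would
// be an external sort over the key-tagged entries.
public class ShuffleBySort {
    static <T> List<T> shuffleBySort(List<T> entries, Random rnd) {
        List<Map.Entry<Long, T>> keyed = new ArrayList<>();
        for (T e : entries)
            keyed.add(new AbstractMap.SimpleEntry<>(rnd.nextLong(), e));
        keyed.sort(Map.Entry.comparingByKey()); // external sort in practice
        List<T> out = new ArrayList<>();
        for (Map.Entry<Long, T> k : keyed) out.add(k.getValue());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shuffleBySort(Arrays.asList(1, 2, 3, 4, 5),
                                         new Random()));
    }
}
```

With 64-bit keys, collisions among 30 million entries are vanishingly rare, and a stray tie only matters if it is broken in a biased way.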
A simple approach is to pick a K such that 1/K of the data fits comfortably in memory. Perhaps K=4 for your data, assuming you've got 16GB RAM. I'll assume your random number function has the form rnd(n) which generates a uniform random number from 0 to n-1.
Then:
for i = 0 .. K-1
    Initialize your random number generator to a known state.
    Read through the input data, generating a random number rnd(K) for each item as you go.
    Retain items in memory whenever rnd(K) == i.
    After you've read the input file, shuffle the retained data in memory.
    Write the shuffled retained items to the output file.
This is very easy to implement, will avoid a lot of seeking, and is clearly correct.
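A sketch of the approach in Java, assuming for simplicity that the "file" is a list of strings and that a seeded java.util.Random stands in for "initialize your random number generator to a known state":

```java
import java.util.*;

// K-pass shuffle: pass i keeps exactly the items whose pre-assigned
// random bucket is i, shuffles them in RAM, and appends them to the
// output. Re-seeding the Random each pass regenerates the same bucket
// sequence, so every item is kept in exactly one pass.
public class KPassShuffle {
    static List<String> shuffle(List<String> input, int k, long seed) {
        List<String> out = new ArrayList<>();
        Random shuffleRnd = new Random(seed + 1);
        for (int pass = 0; pass < k; pass++) {
            Random rnd = new Random(seed);       // known state, every pass
            List<String> kept = new ArrayList<>();
            for (String item : input)            // "read through the input"
                if (rnd.nextInt(k) == pass) kept.add(item);
            Collections.shuffle(kept, shuffleRnd);
            out.addAll(kept);                    // "write to the output"
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shuffle(Arrays.asList("a", "b", "c", "d", "e", "f"),
                                   3, 123L));
    }
}
```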
An alternative is to partition the input data into K files based on the random numbers, and then go through each, shuffling in memory and writing to disk. This reduces disk IO (each item is read twice and written twice, compared to the first approach where each item is read K times and written once), but you need to be careful to buffer the IO to avoid a lot of seeking, it uses more intermediate disk, and is somewhat more difficult to implement. If you've got only 40GB of data (so K is small), then the simple approach of multiple iterations through the input data is probably best.
If you use 20 ms as the time for reading or writing 1 MB of data (and assuming the in-memory shuffling cost is insignificant), the simple approach will take 40*1024*(K+1)*20 ms, which is about 1 hour 8 minutes (assuming K=4). The intermediate-file approach will take 40*1024*4*20 ms, which is around 55 minutes, assuming you can minimize seeking. Note that SSDs are approximately 20 times faster for reads and writes (even ignoring seeking), so you should expect to perform this task in well under 10 minutes using an SSD. Numbers from Latency Numbers Every Programmer Should Know.
I suggest keeping your general approach, but inverting the map before doing the actual copy. That way, you read sequentially and do scattered writes rather than the other way round.
A read has to be done when requested before the program can continue. A write can be left in a buffer, increasing the probability of accumulating more than one write to the same disk block before actually doing the write.
Premise
From what I understand, using the Fisher-Yates algorithm and the data you have about the positions of the entries, you should be able to obtain (and compute) a list of:
struct Entry {
    long long sourceStartIndex;
    long long sourceEndIndex;
    long long destinationStartIndex;
    long long destinationEndIndex;
};
Problem
From this point onward, the naive solution is to seek each entry in the source file, read it, then seek to the new position of the entry in the destination file and write it.
The problem with this approach is that it uses way too many seeks.
Solution
A better way to do it is to reduce the number of seeks by using two huge buffers, one for each file.
I recommend a small buffer for the source file (say 64MB) and a big one for the destination file (as big as the user can afford - say 2GB).
Initially, the destination buffer will be mapped to the first 2GB of the destination file. At this point, read the whole source file, in chunks of 64MB, in the source buffer. As you read it, copy the proper entries to the destination buffer. When you reach the end of the file, the output buffer should contain all the proper data. Write it to the destination file.
Next, map the output buffer to the next 2GB of the destination file and repeat the procedure. Continue until you have written the whole output file.
Caution
Since the entries have arbitrary sizes, it's very likely that at the beginning and ending of the buffers you will have suffixes and prefixes of entries, so you need to make sure you copy the data properly!
Estimated time costs
The execution time depends, essentially, on the size of the source file, the available RAM for the application and the reading speed of the HDD. Assuming a 40GB file, 2GB of RAM and a 200MB/s HDD read speed, the program will need to read 800GB of data (40GB * (40GB / 2GB)). Assuming the HDD is not highly fragmented, the time spent on seeks will be negligible. This means the reads alone will take about an hour! But if, luckily, the user has 8GB of RAM available for your application, the time may decrease to only 15 to 20 minutes.
I hope this will be enough for you, as I don't see any other faster way.
Although you can use external sort on a random key, as proposed by OldCurmudgeon, the random key is not necessary. You can shuffle blocks of data in memory, and then join them with a "random merge," as suggested by aldel.
It's worth specifying what "random merge" means more clearly. Given two shuffled sequences of equal size, a random merge behaves exactly as in merge sort, with the exception that the next item to be added to the merged list is chosen using a boolean value from a shuffled sequence of zeros and ones, with exactly as many zeros as ones. (In merge sort, the choice would be made using a comparison.)
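A sketch of the merge step itself, assuming the zero/one choice sequence has already been generated with exactly a.size() zeros and b.size() ones:

```java
import java.util.*;

// Random merge of two shuffled lists: at each step, take the next item
// from list a or list b according to a pre-generated sequence of choices
// containing exactly a.size() zeros and b.size() ones.
public class RandomMerge {
    static <T> List<T> merge(List<T> a, List<T> b, List<Integer> choices) {
        List<T> out = new ArrayList<>();
        int i = 0, j = 0;
        for (int c : choices)
            out.add(c == 0 ? a.get(i++) : b.get(j++));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(Arrays.asList(1, 2),
                                 Arrays.asList(3, 4),
                                 Arrays.asList(0, 1, 0, 1)));
    }
}
```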
Proving it
My assertion that this works isn't enough. How do we know this process gives a shuffled sequence, such that every ordering is equally possible? It's possible to give a proof sketch with a diagram and a few calculations.
First, definitions. Suppose we have N unique items, where N is an even number, and M = N / 2. The N items are given to us in two M-item sequences labeled 0 and 1 that are guaranteed to be in a random order. The process of merging them produces a sequence of N items, such that each item comes from sequence 0 or sequence 1, and the same number of items come from each sequence. It will look something like this:
0: a b c d
1: w x y z
N: a w x b y c d z
Note that although the items in 0 and 1 appear to be in order, they are just labels here, and the order doesn't mean anything. It just serves to connect the order of 0 and 1 to the order of N.
Since we can tell from the labels which sequence each item came from, we can create a "source" sequence of zeros and ones. Call that c.
c: 0 1 1 0 1 0 0 1
By the definitions above, there will always be exactly as many zeros as ones in c.
Now observe that for any given ordering of labels in N, we can reproduce a c sequence directly, because the labels preserve information about the sequence they came from. And given N and c, we can reproduce the 0 and 1 sequences. So we know there's always one path back from a sequence N to one triple (0, 1, c). In other words, we have a reverse function r defined from the set of all orderings of N labels to triples (0, 1, c) -- r(N) = (0, 1, c).
We also have a forward function f from any triple (0, 1, c) that simply re-merges 0 and 1 according to the values in c. Together, these two functions show that there is a one-to-one correspondence between outputs of r(N) and orderings of N.
But what we really want to prove is that this one-to-one correspondence is exhaustive -- that is, we want to prove that there aren't extra orderings of N that don't correspond to any triple, and that there aren't extra triples that don't correspond to any ordering of N. If we can prove that, then we can choose orderings of N in a uniformly random way by choosing triples (0, 1, c) in a uniformly random way.
We can complete this last part of the proof by counting bins. Suppose every possible triple gets a bin. Then we drop every ordering of N in the bin for the triple that r(N) gives us. If there are exactly as many bins as orderings, then we have an exhaustive one-to-one correspondence.
From combinatorics, we know that the number of orderings of N unique labels is N!. We also know that the number of orderings of 0 and 1 are both M!. And we know that the number of possible sequences c is N choose M, which is the same as N! / (M! * (N - M)!).
This means there are a total of
M! * M! * N! / (M! * (N - M)!)
triples. But N = 2 * M, so N - M = M, and the above reduces to
M! * M! * N! / (M! * M!)
That's just N!. QED.
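The counting argument can also be checked by brute force for a small case. For N = 4 and M = 2, the sketch below enumerates all 2! * 2! * C(4,2) = 24 triples and merges each one; the 24 results are exactly the 24 orderings of four labels, each produced once:

```java
import java.util.*;

// Brute-force check of the counting argument for N = 4, M = 2:
// every triple (ordering of {a,b}, ordering of {c,d}, choice sequence
// with two zeros and two ones) yields a distinct ordering of the four
// labels, and every one of the 4! = 24 orderings appears exactly once.
public class MergeCount {
    static Map<String, Integer> countOrderings() {
        String[][] zeros = {{"a", "b"}, {"b", "a"}};
        String[][] ones  = {{"c", "d"}, {"d", "c"}};
        Map<String, Integer> counts = new HashMap<>();
        for (String[] z : zeros)
            for (String[] o : ones)
                for (int mask = 0; mask < 16; mask++) {
                    if (Integer.bitCount(mask) != 2) continue; // two ones
                    StringBuilder sb = new StringBuilder();
                    int i = 0, j = 0;
                    for (int pos = 0; pos < 4; pos++)
                        sb.append(((mask >> pos) & 1) == 0 ? z[i++] : o[j++]);
                    counts.merge(sb.toString(), 1, Integer::sum);
                }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countOrderings().size() + " distinct orderings");
    }
}
```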
Implementation
To pick triples in a uniformly random way, we must pick each element of the triple in a uniformly random way. For 0 and 1, we accomplish that using a straightforward Fisher-Yates shuffle in memory. The only remaining obstacle is generating a proper sequence of zeros and ones.
It's important -- important! -- to generate only sequences with equal numbers of zeros and ones. Otherwise, you haven't chosen from among Choose(N, M) sequences with uniform probability, and your shuffle may be biased. The really obvious way to do this is to shuffle a sequence containing an equal number of zeros and ones... but the whole premise of the question is that we can't fit that many zeros and ones in memory! So we need a way to generate random sequences of zeros and ones that are constrained such that there are exactly as many zeros as ones.
To do this in a way that is probabilistically coherent, we can simulate drawing balls labeled zero or one from an urn, without replacement. Suppose we start with fifty 0 balls and fifty 1 balls. If we keep count of the number of each kind of ball in the urn, we can maintain a running probability of choosing one or the other, so that the final result isn't biased. The (suspiciously Python-like) pseudocode would be something like this:
def generate_choices(N, M):
    n0 = M
    n1 = N - M
    while n0 + n1 > 0:
        if randrange(0, n0 + n1) < n0:
            yield 0
            n0 -= 1
        else:
            yield 1
            n1 -= 1
Because randrange draws integers, there is no floating-point error here; every valid sequence of zeros and ones is generated with exactly equal probability.
This last part of the algorithm is crucial. Going through the above proof exhaustively makes it clear that other ways of generating ones and zeros won't give us a proper shuffle.
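For completeness, here is the same urn-drawing generator in Java; because nextInt draws integers, each arrangement of zeros and ones comes out with exactly equal probability:

```java
import java.util.*;

// Urn-without-replacement generator for the choice sequence c: emits
// exactly m zeros and (n - m) ones. At each step the probability of a
// zero equals the fraction of "zero balls" still in the urn.
public class ChoiceSequence {
    static int[] generate(int n, int m, Random rnd) {
        int n0 = m, n1 = n - m;
        int[] c = new int[n];
        for (int i = 0; i < n; i++) {
            if (rnd.nextInt(n0 + n1) < n0) { c[i] = 0; n0--; }
            else                           { c[i] = 1; n1--; }
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(generate(10, 5, new Random())));
    }
}
```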
Performing multiple merges in real data
There remain a few practical issues. The above argument assumes a perfectly balanced merge, and it also assumes you have only twice as much data as you have memory. Neither assumption is likely to hold.
The first turns out not to be a big problem because the above argument doesn't actually require equally sized lists. It's just that if the list sizes are different, the calculations are a little more complex. If you go through the above replacing the M for list 1 with N - M throughout, the details all line up the same way. (The pseudocode is also written in a way that works for any M greater than zero and less than N. There will then be exactly M zeros and N - M ones.)
The second means that in practice, there might be many, many chunks to merge this way. The process inherits several properties of merge sort — in particular, it requires that for K chunks, you'll have to perform roughly K / 2 merges, and then K / 4 merges, and so on, until all the data has been merged. Each batch of merges will loop over the entire dataset, and there will be roughly log2(K) batches, for a run time of O(N * log(K)). An ordinary Fisher-Yates shuffle would be strictly linear in N, and so in theory would be faster for very large K. But until K gets very, very large, the penalty may be much smaller than the disk seeking penalties.
The benefit of this approach, then, comes from smart IO management. And with SSDs it might not even be worth it: the seek penalties might not be large enough to justify the overhead of multiple merges. Paul Hankin's answer has some useful tips for thinking through the practical issues raised.
Merging all data at once
An alternative to doing multiple binary merges would be to merge all the chunks at once -- which is theoretically possible, and might lead to an O(N) algorithm. The random number generation algorithm for values in c would need to generate labels from 0 to K - 1, such that the final outputs have exactly the right number of labels for each category. (In other words, if you're merging three chunks with 10, 12, and 13 items, then the final value of c would need to have 0 ten times, 1 twelve times, and 2 thirteen times.)
I think there is probably an O(N) time, O(1) space algorithm that will do that, and if I can find one or work one out, I'll post it here. The result would be a truly O(N) shuffle, much like the one Paul Hankin describes towards the end of his answer.
Logically partition your database entries (e.g. alphabetically)
Create indexes based on the partitions you created
Build a DAO to serialize/deserialize entries based on the index
I'm currently using the JTransforms-library to calculate the DFT of a one-second signal at a FS of 44100 Hz. The code is quite simple:
DoubleFFT_1D fft = new DoubleFFT_1D(SIZE);
fft.complexForward(signal); // signal: 44100 length array with audio bytes.
See this page for the documentation of JTransform's DoubleFFT_1D class.
http://incanter.org/docs/parallelcolt/api/edu/emory/mathcs/jtransforms/fft/DoubleFFT_1D.html
The question is: what is SIZE supposed to be? I know it's probably the window size, but can't seem to get it to work with the most common values I've come across, such as 1024 and 2048.
At the moment I'm testing this function by generating a signal of a 1 kHz sinusoid. However, when I use the code above and compare the results with MATLAB's fft function, they seem to be of a whole different magnitude. E.g. MATLAB gives results such as 0.0004 - 0.0922i, whereas the above code produces results like -1.7785E-11 + 6.8533E-11i, with SIZE set to 2048. The contents of the signal array are equal, however.
Which value for SIZE would give a similar FFT-function as MATLAB's built-in fft?
According to the documentation, SIZE looks like it should be the number of samples in signal. If it's truly a 1 s signal at 44.1 kHz, then you should use SIZE = 44100. Since you're using complex data, signal should be an array twice this size (real/imaginary in sequence).
If you don't use SIZE = 44100, your results will not match what Matlab gives you. This is because of the way Matlab (and probably JTransforms) scales the fft and ifft functions based on the length of the input - don't worry that the amplitudes don't match. By default, Matlab calculates the FFT using the full signal. You can provide a second argument to fft (in Matlab) to calculate the N-point FFT and it should match your JTransforms result.
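One way to see the length dependence without any library at all: for a unit-amplitude sinusoid that lands exactly on bin k, the unscaled DFT peak has magnitude N/2, so the raw numbers grow with the transform length, and dividing by N recovers a length-independent amplitude. A brute-force sketch (illustrative only, not JTransforms):

```java
// For a unit cosine landing exactly on bin k, the unscaled DFT peak has
// magnitude N/2. So N = 2048 and N = 44100 give different raw numbers
// for the same physical signal, while peak/N is the same for both.
public class FftScaling {
    static double peakMagnitude(int n, int k) {
        double re = 0, im = 0;
        for (int t = 0; t < n; t++) {
            double x = Math.cos(2 * Math.PI * k * t / n); // unit sinusoid
            double ang = -2 * Math.PI * k * t / n;
            re += x * Math.cos(ang);
            im += x * Math.sin(ang);
        }
        return Math.hypot(re, im);
    }

    public static void main(String[] args) {
        System.out.println(peakMagnitude(512, 10) / 512);   // ~0.5
        System.out.println(peakMagnitude(2048, 10) / 2048); // ~0.5
    }
}
```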
From your comments, it sounds like you're trying to create a spectrogram. For this, you will have to figure out your tradeoff between: spectral resolution, temporal resolution, and computation time. Here is my (Matlab) code for a 1-second spectrogram, calculated for each 512-sample chunk of a 1s signal.
fs = 44100; % Hz
w = 1; % s
t = linspace(0, w, w*fs);
k = linspace(-fs/2, fs/2, w*fs);
% simulate the signal - time-dependent frequency
f = 10000*t; % Hz
x = cos(2*pi*f.*t);
m = 512; % SIZE
S = zeros(m, floor(w*fs/m));
for i = 0:(w*fs/m)-1
    s = x((i*m+1):((i+1)*m));
    S(:,i+1) = fftshift(fft(s));
end
For this image we have 512 samples along the frequency axis (y-axis), ranging from [-22050 Hz 22050 Hz]. There are 86 samples along the time axis (x-axis) covering about 1 second.
For this image we now have 4096 samples along the frequency axis (y-axis), ranging from [-22050 Hz 22050 Hz]. The time axis (x-axis) again covers about 1 second, but this time with only 10 chunks.
Whether it's more important to have fast time resolution (512-sample chunks) or high spectral resolution (4096-sample chunks) will depend on what kind of signal you're working with. You have to make a decision about what you want in terms of temporal/spectral resolution, and what you can achieve in reasonable computation time. If you use SIZE = 4096, for example, you will be able to calculate the spectrum ~10x/s (based on your sampling rate) but the FFT may not be fast enough to keep up. If you use SIZE = 512 you will have poorer spectral resolution, but the FFT will calculate much faster and you can calculate the spectrum ~86x/s. If the FFT is still not fast enough, you could then start skipping chunks (e.g. use SIZE=512 but only calculate for every other chunk, giving ~43 spectrums per 1s signal). Hopefully this makes sense.
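The Matlab chunking loop above translates to Java roughly as follows; dftMagnitudes is a deliberately naive O(N^2) stand-in for the real FFT call (e.g. DoubleFFT_1D.realForward), used only so the sketch is self-contained:

```java
import java.util.Arrays;

// Sketch of a spectrogram loop: split the signal into SIZE-sample
// chunks and transform each chunk. For a 1 s signal at 44100 Hz and
// SIZE = 512 this yields 86 chunks (columns of the spectrogram).
public class Spectrogram {
    static double[][] spectrogram(double[] x, int size) {
        int chunks = x.length / size;
        double[][] s = new double[chunks][];
        for (int i = 0; i < chunks; i++) {
            double[] chunk = Arrays.copyOfRange(x, i * size, (i + 1) * size);
            s[i] = dftMagnitudes(chunk); // swap in a real FFT here
        }
        return s;
    }

    // Naive DFT magnitude spectrum, standing in for a fast FFT.
    static double[] dftMagnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double ang = -2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(ang);
                im += x[t] * Math.sin(ang);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }
}
```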
I would like to be able to take a frequency (eg. 1000hz, 250hz, 100hz) and play it out through the phone hardware.
I know that Android's AudioTrack will allow me to play a 16-bit PCM if I can calculate an array of bits or shorts. I would like to calculate only a single period so that later I can loop it without any issues, and so I can keep the calculation time down.
How could this be achieved?
Looping a single period isn't necessarily a good idea - the cycle may not fit nicely into an exact number of samples so you might get an undesirable discontinuity at the end of each cycle, or worse, the audible frequency may end up slightly off.
That said, the math isn't hard:
float sample_rate = 44100;
float samples_per_cycle = sample_rate / frequency;
int samples_to_produce = ....
short[] sample = new short[samples_to_produce];
for (int i = 0; i < samples_to_produce; ++i) {
    sample[i] = (short) Math.floor(32767.0 * Math.sin(2 * Math.PI * i / samples_per_cycle));
}
To see what I meant above about the frequency, take the standard tuning pitch of 440 Hz.
Strictly this needs 100.227 samples, but the code above would produce 100. So if you repeat your 100 samples over and over you'll actually play the sample 441 times per second, so your pitch will be off by 1 Hz.
To avoid the problem you'd really need to calculate several periods of the waveform, although I don't know how many are needed to fool the ear into hearing the right pitch.
Ideally it would be as many as are needed such that:
i / samples_per_cycle
is a whole number, so that the last sample (technically the one after the last sample) ends exactly on a cycle boundary. I think if your input frequencies are all whole numbers then producing one second's worth exactly would work.
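For integer frequencies at an integer sample rate, the exact minimal loop length is easy to compute: the waveform repeats every fs / gcd(fs, f) samples, which is the shortest buffer that loops seamlessly. A small sketch:

```java
// Shortest seamless loop for an integer frequency f at integer sample
// rate fs: the waveform repeats exactly every fs / gcd(fs, f) samples
// (containing f / gcd(fs, f) whole cycles).
public class LoopLength {
    static int gcd(int a, int b) { return b == 0 ? a : gcd(b, a % b); }

    static int samplesPerLoop(int sampleRate, int frequency) {
        return sampleRate / gcd(sampleRate, frequency);
    }

    public static void main(String[] args) {
        // 440 Hz at 44100 Hz: gcd = 20, so 2205 samples = exactly 22 cycles
        System.out.println(samplesPerLoop(44100, 440));
    }
}
```

For 440 Hz at 44100 Hz this gives 2205 samples, i.e. exactly 22 cycles in 50 ms, so looping that buffer produces no discontinuity and no pitch error.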