Since I don't want to write one myself, I am searching for a good FFT implementation for Java. First I used this one, FFT Princeton, but it uses objects and my profiler told me that it's not really fast because of that. So I googled again and found this one: FFT Columbia, which is faster. Maybe one of you knows another FFT implementation? I'd like to have the "best" one, because my app has to process a huge amount of sound data and users don't like waiting... ;-)
Regards.
FFTW is the "Fastest Fourier Transform in the West", and it has some Java wrappers:
http://www.fftw.org/download.html
Hope that helps!
Late to the party - here is a pure Java solution for those cases where JNI is not an option: JTransforms.
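For what it's worth, a minimal sketch of an in-place real forward transform with JTransforms (package and method names from memory, so check them against the release you actually pull in; older versions lived under the edu.emory.mathcs.jtransforms prefix):

import org.jtransforms.fft.DoubleFFT_1D;

public class JTransformsExample {
    public static void main(String[] args) {
        int n = 8;                       // transform size (a power of two is not required here)
        double[] signal = new double[n]; // real samples; transformed in place
        for (int i = 0; i < n; i++) {
            signal[i] = Math.sin(2 * Math.PI * i / n);
        }
        DoubleFFT_1D fft = new DoubleFFT_1D(n);
        fft.realForward(signal);         // signal now holds the packed complex spectrum
        System.out.println(java.util.Arrays.toString(signal));
    }
}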
I wrote a function for the FFT in Java: http://www.wikijava.org/wiki/The_Fast_Fourier_Transform_in_Java_%28part_1%29
I've released it into the public domain, so you can use it anywhere, for personal or business projects alike. Just cite me in the credits and send me a link to your work, and you're fine.
It is completely reliable. I've checked its output against Mathematica's FFT and the results were always correct to the 15th decimal digit. I think it's a very good FFT implementation for Java. I wrote it against J2SE 1.6 and tested it on J2SE 1.5-1.6.
If you count the number of instructions (which is much simpler than a proper computational complexity estimate), you can clearly see that this version is good even though it's not optimized at all. I'm planning to publish the optimized version if there are enough requests.
Let me know if it was useful, and leave any comments you like.
I'm sharing the code right here:
/**
* @author Orlando Selenu
* Originally written in the Summer of 2008
* Based on the algorithms originally published by E. Oran Brigham, "The Fast Fourier Transform", 1973, in ALGOL60 and FORTRAN
*/
public class FFTbase {
/**
* The Fast Fourier Transform (generic version, with NO optimizations).
*
* @param inputReal
* an array of length n, the real part
* @param inputImag
* an array of length n, the imaginary part
* @param DIRECT
* TRUE = direct transform, FALSE = inverse transform
* @return a new array of length 2n
*/
public static double[] fft(final double[] inputReal, double[] inputImag,
boolean DIRECT) {
// - n is the dimension of the problem
// - nu is its logarithm in base 2
int n = inputReal.length;
// If n is a power of 2, then ld is an integer (_without_ decimals)
double ld = Math.log(n) / Math.log(2.0);
// Here I check if n is a power of 2. If ld has a fractional part, I quit
// the function and return null.
if (((int) ld) - ld != 0) {
System.out.println("The number of elements is not a power of 2.");
return null;
}
// Declaration and initialization of the variables
// ld should be an integer, actually, so I don't lose any information in
// the cast
int nu = (int) ld;
int n2 = n / 2;
int nu1 = nu - 1;
double[] xReal = new double[n];
double[] xImag = new double[n];
double tReal, tImag, p, arg, c, s;
// Here I check if I'm going to do the direct transform or the inverse
// transform.
double constant;
if (DIRECT)
constant = -2 * Math.PI;
else
constant = 2 * Math.PI;
// I don't want to overwrite the input arrays, so here I copy them. This
// choice adds \Theta(2n) to the complexity.
for (int i = 0; i < n; i++) {
xReal[i] = inputReal[i];
xImag[i] = inputImag[i];
}
// First phase - calculation
int k = 0;
for (int l = 1; l <= nu; l++) {
while (k < n) {
for (int i = 1; i <= n2; i++) {
p = bitreverseReference(k >> nu1, nu);
// direct FFT or inverse FFT
arg = constant * p / n;
c = Math.cos(arg);
s = Math.sin(arg);
tReal = xReal[k + n2] * c + xImag[k + n2] * s;
tImag = xImag[k + n2] * c - xReal[k + n2] * s;
xReal[k + n2] = xReal[k] - tReal;
xImag[k + n2] = xImag[k] - tImag;
xReal[k] += tReal;
xImag[k] += tImag;
k++;
}
k += n2;
}
k = 0;
nu1--;
n2 /= 2;
}
// Second phase - recombination
k = 0;
int r;
while (k < n) {
r = bitreverseReference(k, nu);
if (r > k) {
tReal = xReal[k];
tImag = xImag[k];
xReal[k] = xReal[r];
xImag[k] = xImag[r];
xReal[r] = tReal;
xImag[r] = tImag;
}
k++;
}
// Here I have to mix xReal and xImag into a single output array (yes, it
// would be possible to do this in the earlier parts of the code, but
// it's done here for readability).
double[] newArray = new double[xReal.length * 2];
double radice = 1 / Math.sqrt(n);
for (int i = 0; i < newArray.length; i += 2) {
int i2 = i / 2;
// I used Stephen Wolfram's Mathematica as a reference so I'm going
// to normalize the output while I'm copying the elements.
newArray[i] = xReal[i2] * radice;
newArray[i + 1] = xImag[i2] * radice;
}
return newArray;
}
/**
* The reference bit reverse function.
*/
private static int bitreverseReference(int j, int nu) {
int j2;
int j1 = j;
int k = 0;
for (int i = 1; i <= nu; i++) {
j2 = j1 / 2;
k = 2 * k + j1 - 2 * j2;
j1 = j2;
}
return k;
}
}
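For anyone copy-pasting the class above, a small usage sketch: the input length must be a power of two, and the output interleaves real and imaginary parts, normalized by 1/sqrt(n):

double[] re = {1, 0, 0, 0, 0, 0, 0, 0}; // a unit impulse
double[] im = new double[8];
double[] spectrum = FFTbase.fft(re, im, true); // direct transform
for (int i = 0; i < spectrum.length; i += 2) {
    // an impulse gives a flat spectrum: every bin is 1/sqrt(8) + 0i
    System.out.printf("bin %d: %.4f %+.4fi%n", i / 2, spectrum[i], spectrum[i + 1]);
}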
EDIT: 5th of May, 2022. Well... after more than 10 years I'm publishing the code on Github to avoid losing it: https://github.com/hedoluna/fft
Feel free to contribute and send me your opinions :) Thanks!
I guess it depends on what you are processing. If you are calculating the FFT over a long duration, you might find that it takes a while depending on how many frequency points you want. However, in most cases audio is considered non-stationary (that is, the signal's mean and variance change too much over time), so taking one large FFT (a periodogram PSD estimate) is not an accurate representation. Alternatively, you could use the short-time Fourier transform, whereby you break the signal up into smaller frames and calculate the FFT of each. The frame size varies depending on how quickly the statistics change; for speech it is usually 20-40 ms, for music I assume it is slightly higher.
This method is good if you are sampling from the microphone, because it allows you to buffer one frame at a time, calculate the FFT, and give what the user feels is "real time" interaction, since we can't really perceive a time difference as small as 20 ms.
I developed a small benchmark to test the difference between the FFTW and KissFFT C libraries on a speech signal. Yes, FFTW is highly optimised, but when you are only taking short frames, updating the data for the user, and using a small FFT size, they are both very similar. Here is an example of how to use the KissFFT library in Android with LibGDX by badlogic games. I implemented this library using overlapping frames in an Android app I developed a few months ago called Speech Enhancement for Android.
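To make the framing idea concrete, here is a rough sketch that windows each frame with a Hann function and feeds it to the FFTbase class posted in another answer on this page (frame and hop sizes are only illustrative):

// Rough STFT sketch: fixed-size frames with 50% overlap, Hann-windowed, each
// frame passed to the FFTbase.fft routine shown earlier on this page.
public static void shortTimeFFT(double[] samples) {
    int frameSize = 512;          // ~32 ms at 16 kHz; must be a power of two for FFTbase
    int hopSize = frameSize / 2;  // 50% overlap between consecutive frames
    double[] window = new double[frameSize];
    for (int i = 0; i < frameSize; i++) {
        window[i] = 0.5 * (1 - Math.cos(2 * Math.PI * i / (frameSize - 1))); // Hann window
    }
    for (int start = 0; start + frameSize <= samples.length; start += hopSize) {
        double[] re = new double[frameSize];
        double[] im = new double[frameSize];
        for (int i = 0; i < frameSize; i++) {
            re[i] = samples[start + i] * window[i];
        }
        double[] spectrum = FFTbase.fft(re, im, true);
        // ... use the interleaved re/im spectrum of this frame here
    }
}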
I'm looking into using SSTJ for FFTs in Java. It can redirect via JNI to FFTW if the library is available or will use a pure Java implementation if not.
Related
I am managing audio capture and playback using the Java Sound API (TargetDataLine and SourceDataLine). Now suppose that, in a conference environment, one participant's audio queue grows larger than the jitter size (due to processing or network delays), and I want to fast-forward the audio bytes I have for that participant to make the queue shorter than the jitter size.
How can I fast-forward the audio byte array of that participant?
I can't do it during playback, since normally the player thread just dequeues one frame from every participant's queue and mixes them for playback. The only way I can see is to dequeue more than one frame for that participant and mix(?) them for fast-forwarding before mixing the result with the other participants' single dequeued frames for playback.
Thanks in advance for any kind of help or advice.
There are two ways to speed up the playback that I know of. In one case, the faster pace creates a rise in pitch. The coding for this is relatively easy. In the other case, pitch is kept constant, but it involves a technique of working with sound granules (granular synthesis), and is harder to explain.
For the situation where maintaining the same pitch is not a concern, the basic plan is as follows: instead of advancing by single frames, advance by a frame + a small increment. For example, let's say that advancing 1.1 frames over a course of 44000 frames is sufficient to catch you up. (That would also mean that the pitch increase would be about 1/10 of an octave.)
To advance a "fractional" frame, you first have to convert the bytes of the two bracketing frames to PCM. Then, use linear interpolation to get the intermediate value. Then convert that intermediate value back to bytes for the output line.
For example, if you are advancing from frame[0] to frame["1.1"] you will need to know the PCM for frame[1] and frame[2]. The intermediate value can be calculated using a weighted average:
value = PCM[1] * 9/10 + PCM[2] * 1/10
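A rough sketch of that weighted average for 16-bit mono PCM that has already been decoded into a short[] (the method name is mine, not from any particular library):

// Reads a "fractional" sample position by linearly interpolating between the
// two bracketing PCM frames, e.g. position 1.1 mixes frame 1 and frame 2.
public static short sampleAt(short[] pcm, double position) {
    int i = (int) Math.floor(position);
    double frac = position - i;                       // 0.1 in the frame["1.1"] example
    double value = pcm[i] * (1.0 - frac) + pcm[i + 1] * frac;
    return (short) Math.round(value);
}

Advancing the read position by 1.1 frames per output frame plays back about 10% faster, with the corresponding rise in pitch described above.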
I think it might be good to make the amount by which you advance change gradually. Take a few dozen frames to ramp up the increment and allow time to ramp down again when returning to normal dequeuing. If you suddenly change the rate at which you are reading the audio data, it is possible to introduce a discontinuity that will be heard as a click.
I have used this basic plan for dynamic control of playback speed, but I haven't had the experience of employing it for the situation that you are describing. Regulating the variable speed could be tricky if you also are trying to enforce keeping the transitions smooth.
The basic idea for using granules involves obtaining contiguous PCM (I'm not clear what the optimum number of frames would be for voice; 1 to 50 ms is cited as commonly being used with this technique in synthesis), and giving it a volume envelope that allows you to mix sequential granules end-to-end (they must overlap).
I think the envelopes for the granules make use of a Hann function or Hamming window, but I'm not clear on the details, such as the overlapping placement of the granules so that they mix/transition smoothly. I've only dabbled, and I'm going to assume the folks at Signal Processing will be the best bet for advice on how to code this.
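As a starting point, a Hann envelope for a granule can be generated like this; with the usual 50% overlap, two consecutive Hann-windowed granules sum to a roughly constant level, but I haven't verified the best settings for voice:

// Hann envelope: rises from 0 to 1 and back to 0, so granules offset by half
// a granule length can be mixed end-to-end without an abrupt level change.
public static float[] hannEnvelope(int granuleLength) {
    float[] env = new float[granuleLength];
    for (int i = 0; i < granuleLength; i++) {
        env[i] = (float) (0.5 * (1 - Math.cos(2 * Math.PI * i / (granuleLength - 1))));
    }
    return env;
}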
I found a fantastic Git repo (the sonic library, mainly for audio players) which does exactly what I wanted, with plenty of controls. I can feed it a whole .wav file or even chunks of audio byte arrays, and after processing we get a sped-up playback experience and more. For real-time processing I call it on every chunk of the audio byte array.
I also found another way/algorithm to detect whether an audio chunk/byte array is voice or not, and depending on its result I can simply skip playing non-voice packets, which gives us around 1.5x speedup with less processing.
public class DTHVAD {
public static final int INITIAL_EMIN = 100;
public static final double INITIAL_DELTAJ = 1.0001;
private static boolean isFirstFrame;
private static double Emax;
private static double Emin;
private static int inactiveFrameCounter;
private static double Lamda; // adaptive threshold weight, range 0.950 - 0.999
private static double DeltaJ;
static {
initDTH();
}
private static void initDTH() {
Emax = 0;
Emin = 0;
isFirstFrame = true;
Lamda = 0.950; // range is 0.950---0.999
DeltaJ = 1.0001;
}
public static boolean isAllSilence(short[] samples, int length) {
boolean r = true;
for (int l = 0; l < length; l += 80) {
if (!isSilence(samples, l, l+80)) {
r = false;
break;
}
}
return r;
}
public static boolean isSilence(short[] samples, int offset, int length) {
boolean isSilenceR = false;
long energy = energyRMSE(samples, offset, length);
// printf("en=%ld\n",energy);
if (isFirstFrame) {
Emax = energy;
Emin = INITIAL_EMIN;
isFirstFrame = false;
}
if (energy > Emax) {
Emax = energy;
}
if (energy < Emin) {
if ((int) energy == 0) {
Emin = INITIAL_EMIN;
} else {
Emin = energy;
}
DeltaJ = INITIAL_DELTAJ; // Resetting DeltaJ with initial value
} else {
DeltaJ = DeltaJ * 1.0001;
}
long threshold = (long) ((1 - Lamda) * Emax + Lamda * Emin);
// printf("e=%ld,Emin=%f, Emax=%f, thres=%ld\n",energy,Emin,Emax,threshold);
Lamda = (Emax - Emin) / Emax;
if (energy > threshold) {
isSilenceR = false; // voice marking
} else {
isSilenceR = true; // noise marking
}
Emin = Emin * DeltaJ;
return isSilenceR;
}
private static long energyRMSE(short[] samples, int offset, int length) {
// Note: "length" is used as an exclusive end index (see the calls above),
// so the number of samples in the frame is (length - offset).
double cEnergy = 0;
float reversOfN = (float) 1 / (length - offset); // 0.0125 for an 80-sample frame
long step = 0;
for (int i = offset; i < length; i++) {
step = samples[i] * samples[i]; // x*x
// printf("step=%ld cEng=%ld\n",step,cEnergy);
cEnergy += (long) ((float) step * reversOfN); // accumulate x*x/N
}
cEnergy = Math.pow(cEnergy, 0.5); // root of the mean square
return (long) cEnergy;
}
}
Here I convert my byte array to a short array and detect whether it is voice or non-voice with:
frame.silence = DTHVAD.isSilence(encodeShortBuffer, 0, shortLen);
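The byte[] to short[] conversion itself can be done with a ByteBuffer; a minimal sketch, assuming 16-bit little-endian PCM (adjust the byte order to whatever your capture format delivers):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Converts 16-bit PCM bytes to shorts; assumes little-endian sample order.
public static short[] bytesToShorts(byte[] audioBytes, int byteLen) {
    short[] samples = new short[byteLen / 2];
    ByteBuffer.wrap(audioBytes, 0, byteLen)
              .order(ByteOrder.LITTLE_ENDIAN)
              .asShortBuffer()
              .get(samples);
    return samples;
}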
I have an Android application which captures gesture coordinates (3 axes: x, y, z). I need to compare them with the coordinates I have in my DB and determine whether they are the same or not.
I also need to allow some tolerance, since the accelerometer (the device which captures gestures) is very sensitive. That alone would be easy, but I also want to treat e.g. a "big circle" drawn in the air the same as a "small circle" drawn in the air, meaning that the values would differ but the shape of the graph would be the same, right?
I have heard about translating graph values into bits and then comparing them. Is that the right approach? Is there any library for such a comparison?
So far I just hard coded it, covering all my requirements except the last one (big circle vs small circle).
My code now:
private int checkWhetherGestureMatches(byte[] values, String[] refValues) throws IOException {
int valuesSize = 32;
int ignorePositions = 4;
byte[] valuesX = new byte[valuesSize];
byte[] valuesY = new byte[valuesSize];
byte[] valuesZ = new byte[valuesSize];
for (int i = 0; i < valuesSize; i++) {
int position = i * 3 + ignorePositions;
valuesX[i] = values[position];
valuesY[i] = values[position + 1];
valuesZ[i] = values[position + 2];
}
Double[] valuesXprevious = new Double[valuesSize];
Double[] valuesYprevious = new Double[valuesSize];
Double[] valuesZprevious = new Double[valuesSize];
for (int i = 0; i < valuesSize; i++) {
int position = i * 3 + ignorePositions;
valuesXprevious[i] = Double.parseDouble(refValues[position]);
valuesYprevious[i] = Double.parseDouble(refValues[position + 1]);
valuesZprevious[i] = Double.parseDouble(refValues[position + 2]);
}
int incorrectPoints = 0;
for (int j = 0; j < valuesSize; j++) {
boolean withinTolerance = Math.abs(valuesX[j] - valuesXprevious[j]) < 20
&& Math.abs(valuesY[j] - valuesYprevious[j]) < 20
&& Math.abs(valuesZ[j] - valuesZprevious[j]) < 20;
if (!withinTolerance) {
incorrectPoints++;
}
}
return incorrectPoints;
}
EDIT:
I found JGraphT, it might work. If you know anything about that already, let me know.
EDIT2:
See these images, they are the same gesture but one is done in a slower motion than another.
Faster one: (image omitted)
Slower one: (image omitted)
I haven't captured images of the same gesture where one would be smaller than another, might add that later.
If your list of gestures is complex, I would suggest training a neural network which can classify the gestures based on the graph value bits you mentioned. The task is very similar to classification of handwritten numerical digits, for which lots of resources are there on the net.
The other approach would be to mathematically guess the shape of the gesture, but I doubt it will be useful considering the tolerance of the accelerometer and the fact that users won't draw accurate shapes.
(a) Convert your 3D coordinates into a 2D plane figure. Use matrix transformations.
(b) Normalize your gesture scale - again with matrix transformations.
(c) Normalize the number of points, or use interpolation in the next step.
(d) Calculate the difference between your stored (s) gesture and current (c) gesture as
Sum((Xs[i] - Xc[i])^2 + (Ys[i] - Yc[i])^2) where i = 0 .. num of points
If the difference is below your predefined precision, the gestures are equal (a rough sketch follows below).
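Steps (b) and (d) could look roughly like this for one pair of axes, assuming both gestures already have the same number of points (step (c)); dividing by the largest absolute coordinate is just one simple way to normalize the scale:

// Normalizes each gesture so its largest absolute coordinate is 1, making a
// big circle and a small circle comparable, then sums squared differences.
public static double gestureDifference(double[] xs, double[] ys, double[] xc, double[] yc) {
    normalizeScale(xs, ys);
    normalizeScale(xc, yc);
    double sum = 0;
    for (int i = 0; i < xs.length; i++) {
        sum += (xs[i] - xc[i]) * (xs[i] - xc[i]) + (ys[i] - yc[i]) * (ys[i] - yc[i]);
    }
    return sum; // compare against your predefined precision threshold
}

private static void normalizeScale(double[] x, double[] y) {
    double max = 1e-9; // avoid division by zero for an all-zero gesture
    for (int i = 0; i < x.length; i++) {
        max = Math.max(max, Math.max(Math.abs(x[i]), Math.abs(y[i])));
    }
    for (int i = 0; i < x.length; i++) {
        x[i] /= max;
        y[i] /= max;
    }
}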
I have used a Java implementation of the Dynamic Time Warping algorithm. The library is called fastDTW.
Unfortunately, from what I understood, they don't support it anymore, but I still found it useful.
https://code.google.com/p/fastdtw/
I can't recall now, but I think I used this one and compiled it myself:
https://github.com/cscotta/fastdtw/tree/master/src/main/java/com/fastdtw/dtw
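For context, the plain quadratic-time dynamic time warping distance that fastDTW approximates looks roughly like this for one-dimensional sequences (a minimal textbook sketch, not the library's API):

// Classic O(n*m) DTW distance between two 1-D sequences using the absolute
// difference as the local cost; fastDTW approximates this in linear time.
public static double dtwDistance(double[] a, double[] b) {
    int n = a.length, m = b.length;
    double[][] cost = new double[n + 1][m + 1];
    for (double[] row : cost) {
        java.util.Arrays.fill(row, Double.POSITIVE_INFINITY);
    }
    cost[0][0] = 0;
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            double d = Math.abs(a[i - 1] - b[j - 1]);
            cost[i][j] = d + Math.min(cost[i - 1][j - 1],
                         Math.min(cost[i - 1][j], cost[i][j - 1]));
        }
    }
    return cost[n][m];
}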
In Java, I am trying to implement the following equation for calculating the current velocity of a skydiver not neglecting air resistance.
v(t) = v(t-∆t) + (g - [(drag x crossArea x airDensity) / (2*mass)] * v[(t-∆t)^2]) * (∆t)
My problem is that I am not sure how to translate "v(t - ∆t)" into a code. Right now I have this method below, where as you can see I am using the method within itself to find the previous velocity. This has continued to result in a stack overflow error message, understandably.
(timeStep = ∆t)
public double calculateVelocity(double time){
double velocity;
velocity = calculateVelocity(time - timeStep)
+ (acceleration - ((drag * crossArea * airDensity)
/ (2 * massOfPerson))
* (calculateVelocity(time - timeStep)*(time * timeStep)))
* timeStep;
return velocity;
}
I am calling the above method in the method below. Assume that endingTime is an int that will come from user input; it is written this way to stay dynamic.
public void assignVelocitytoArrays(){
double currentTime = 0;
while(currentTime <= endingTime){
this.vFinal = calculateVelocity(currentTime);
currentTime += timeStep;
}
}
I would like to figure this out on my own, could someone give me a general direction? Is using a method within itself the right idea or am I completely off track?
The formula you want to implement is the recursive representation of a sequence, mathematically speaking.
Recursive sequences need a starting point, e.g.
v(0) = 0 (because a negative time does not make sense)
and a rule to calculate the next elements, e.g.
v(t) = v(t-∆t) + (g - [(drag x crossArea x airDensity) / (2*mass)] * v[(t-∆t)^2] ) * (∆t)
(btw: are you sure it has to be v([t-∆t]^2) instead of v([t-∆t])^2?)
So your approach to use recursion (calling a function within itself) to calculate a recursive sequence is correct.
In your implementation, you only forgot one detail: the starting point. How should your program know that v(0) is defined not by the rule but by a definite value? So you must include it:
if(input value == starting point){
return starting point
}
else{
follow the rule
}
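Applied to the method from the question, the check could look something like the sketch below (it keeps the field names from the question and, following the remark about the rule above, assumes the factor is really v(t-∆t) squared):

public double calculateVelocity(double time) {
    if (time <= 0) {            // base case: v(0) = 0, so the recursion terminates
        return 0;
    }
    double previous = calculateVelocity(time - timeStep); // compute v(t - ∆t) once
    return previous
            + (acceleration - ((drag * crossArea * airDensity) / (2 * massOfPerson))
               * previous * previous)   // assumes the rule meant v(t - ∆t) squared
            * timeStep;
}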
On a side note: you seem to be creating an ascending array of velocities. It would make sense to use the already calculated values in the array instead of recursion, so you don't have to calculate every step again and again.
This only works if you did indeed make a mistake in the rule (i.e. if the factor is really v(t-∆t) squared rather than v evaluated at (t-∆t)^2).
int maxSteps = (int) (maxTime / timeStep);
double[] v = new double[maxSteps];
v[0] = 0; // starting point
for (int t = 1; t < maxSteps; t++) {
    v[t] = v[t - 1] + (g - (drag * crossArea * airDensity) / (2 * mass) * v[t - 1] * v[t - 1]) * timeStep;
}
I'm looking for Java code (or a library) that calculates the earth mover's distance (EMD) between two histograms. This could be directly or indirectly (e.g. using the Hungarian algorithm). I found several implementations of this in C/C++ (e.g. "Fast and Robust Earth Mover's Distances"), but I'm wondering if there is a Java version readily available.
I will be using the EMD calculation to evaluate the approach given by this paper in the context of a science project I'm working on.
Update
Using a variety of resources I estimate that the code below should do the trick. determineMinCostAssignment is the calculation of the optimal assignment as determined by the Hungarian algorithm. For this I will be using the code from http://konstantinosnedas.com/dev/soft/munkres.htm
My main concern is the calculated flow: I am not sure if this is correct. Is there someone who can verify that this is correct or not?
/**
* Determines the Earth Mover's Distance between two histograms, assuming an equal distance between two buckets of a histogram. The distance between
* two buckets is equal to the difference in the indexes of the buckets.
*
* @param threshold
* The maximum distance to use between two buckets.
*/
public static double determineEarthMoversDistance(double[] histogram1, double[] histogram2, int threshold) {
if (histogram1.length != histogram2.length)
throw new InvalidParameterException("Each histogram must have the same number of elements");
double[][] groundDistances = new double[histogram1.length][histogram2.length];
for (int i = 0; i < histogram1.length; ++i) {
for (int j = 0; j < histogram2.length; ++j) {
int abs_diff = Math.abs(i - j);
groundDistances[i][j] = Math.min(abs_diff, threshold);
}
}
int[][] assignment = determineMinCostAssignment(groundDistances);
double costSum = 0, flowSum = 0;
for (int i = 0; i < assignment.length; i++) {
double cost = groundDistances[assignment[i][0]][assignment[i][1]];
double flow = histogram2[assignment[i][1]];
costSum += cost * flow;
flowSum += flow;
}
return costSum / flowSum;
}
Here's a pure Java port of the FastEMD algorithm that I just released:
https://github.com/telmomenezes/JFastEMD
The website "Fast and Robust Earth Mover's Distances" has a Java wrapper for the C/C++ code with compiled binary for Linux and Windows.
This is what I use for Java/Scala:
import org.apache.commons.math3.ml.distance.EarthMoversDistance
new EarthMoversDistance().compute(observed, expected)
https://github.com/wihoho/VideoRecognition
This adapts the author's C implementation with a Python module through a file interface.
The modified C code is under the folder EarthMoverDistance SourceCode.
I am pretty sure you can do the same thing with Java: just add a file interface to connect the C implementation of EMD with your Java code.
Just to clarify: this is NOT a homework question, as I've seen similar accusations leveled against other bit-hackish questions.
That said, I have this bit hack in C:
#include <stdio.h>
const int __FLOAT_WORD_ORDER = 0;
const int __LITTLE_END = 0;
// Finds log-base 2 of 32-bit integer
int log2hack(int v)
{
union { unsigned int u[2]; double d; } t; // temp
t.u[0]=0;
t.u[1]=0;
t.d=0.0;
t.u[__FLOAT_WORD_ORDER==__LITTLE_END] = 0x43300000;
t.u[__FLOAT_WORD_ORDER!=__LITTLE_END] = v;
t.d -= 4503599627370496.0;
return (t.u[__FLOAT_WORD_ORDER==__LITTLE_END] >> 20) - 0x3FF;
}
int main ()
{
int i = 25; //Log2n(25) = 4
int j = 33; //Log2n(33) = 5
printf("Log2n(25)=%i!\n",
log2hack(25));
printf("Log2n(33)=%i!\n",
log2hack(33));
return 0;
}
I want to convert this to Java. So far what I have is:
public int log2Hack(int n)
{
int r; // result of log_2(v) goes here
int[] u = new int [2];
double d = 0.0;
if (BitonicSorterForArbitraryN.__FLOAT_WORD_ORDER==
BitonicSorterForArbitraryN.LITTLE_ENDIAN)
{
u[1] = 0x43300000;
u[0] = n;
}
else
{
u[0] = 0x43300000;
u[1] = n;
}
d -= 4503599627370496.0;
if (BitonicSorterForArbitraryN.__FLOAT_WORD_ORDER==
BitonicSorterForArbitraryN.LITTLE_ENDIAN)
r = (u[1] >> 20) - 0x3FF;
else
r = (u[0] >> 20) - 0x3FF;
return r;
}
(Note it's inside a bitonic sorting class of mine...)
Anyhow, when I run this for the same values, 33 and 25, I get 52 in each case.
I know Java's integers are signed, so I'm pretty sure that has something to do with why this is failing. Does anyone have any ideas how I can get this 5-op, 32-bit integer log 2 to work in Java?
P.S. For the record, the technique is not mine, I borrowed it from here:
http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogIEEE64Float
If you're in Java, can't you simply use 31 - Integer.numberOfLeadingZeros(v)? If it is implemented with the equivalent of __builtin_clz it should be fast.
I think you did not get the meaning of that code. The C code uses a union, a struct that maps the same memory to two or more different fields. That makes it possible to access the storage allocated for the double as integers. In your Java code, you don't use a union but two different variables that are mapped to different parts of memory. This makes the hack fail.
As Java has no unions, you would have to use serialization to get the same effect that way. Since that is quite slow, why not use another method to calculate the logarithm?
You are using the union to convert your pair of ints into a double with the same bit pattern. In Java, you can do that with Double.longBitsToDouble, and then convert back with Double.doubleToLongBits. Java is always (or at least gives the impression of always being) big-endian, so you don't need the endianness check.
That said, my attempt to adapt your code into Java didn't work. The signedness of Java integers might be a problem.
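For what it's worth, here is my own sketch of the longBitsToDouble/doubleToLongBits idea described above; treat it as an illustration rather than a drop-in replacement for the code in the question:

// Port of the 2^52 trick using Double.longBitsToDouble instead of a C union.
// Packs 0x43300000 into the high 32 bits and v (treated as unsigned) into the
// low 32, subtracts 2^52 so the double's value equals v, then reads the exponent.
public static int log2ViaDoubleBits(int v) {
    long bits = 0x4330000000000000L | (v & 0xFFFFFFFFL);
    double d = Double.longBitsToDouble(bits) - 4503599627370496.0; // minus 2^52
    return (int) (Double.doubleToLongBits(d) >>> 52) - 0x3FF;      // biased exponent - 1023
}

// log2ViaDoubleBits(25) == 4, log2ViaDoubleBits(33) == 5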