I've run into a problem where I need to generate random samples from a multivariate normal distribution with mean 0 and a given 3x3 variance-covariance matrix in Java.
Is there an easy way to do this?
1) Use a library implementation, as suggested by Dima.
Or, if you really feel a burning need to do this yourself:
2) Assuming you want to generate normals with a mean vector M and variance/covariance matrix V, perform a Cholesky decomposition on V to obtain a lower triangular matrix L such that V = LL^t (where the superscript t indicates transpose). Generate a vector Z of three independent standard normals (using Random.nextGaussian() for the individual elements). Then LZ + M has the desired multivariate normal distribution.
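If you want to see what that looks like end to end, here is a minimal, dependency-free Java sketch of the approach just described; the example covariance values and the class name are made up for illustration, and V is assumed to be positive definite:

import java.util.Arrays;
import java.util.Random;

public class MvnSampler {
    public static void main(String[] args) {
        double[] mean = {0.0, 0.0, 0.0};
        double[][] v = {                 // example 3x3 covariance matrix (illustrative values)
            {1.0, 0.5, 0.2},
            {0.5, 2.0, 0.3},
            {0.2, 0.3, 1.5}
        };
        double[][] l = cholesky(v);
        Random rng = new Random();

        // Z = vector of independent standard normals, sample = L*Z + M
        double[] z = new double[3];
        for (int i = 0; i < 3; i++) z[i] = rng.nextGaussian();
        double[] sample = new double[3];
        for (int i = 0; i < 3; i++) {
            sample[i] = mean[i];
            for (int j = 0; j <= i; j++) sample[i] += l[i][j] * z[j];
        }
        System.out.println(Arrays.toString(sample));
    }

    // Standard Cholesky decomposition: returns lower triangular L with V = L*L^t
    static double[][] cholesky(double[][] v) {
        int n = v.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = v[i][j];
                for (int k = 0; k < j; k++) sum -= l[i][k] * l[j][k];
                l[i][j] = (i == j) ? Math.sqrt(sum) : sum / l[j][j];
            }
        }
        return l;
    }
}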
Apache Commons has what you are looking for:
import org.apache.commons.math3.distribution.MultivariateNormalDistribution;

MultivariateNormalDistribution mnd = new MultivariateNormalDistribution(means, covariances);
double[] vals = mnd.sample();
I am creating a game where a landscape is generated, and all of the generation works perfectly. A week ago I created a basic 'forest' generation system, which is just a for loop that takes a chunk and places a random number of trees at random locations. But that does not give the result I would like to achieve.
Code:
for (int t = 0; t <= randomForTrees.nextInt(maxTreesPerChunk); t++) {
    // generate random X, Z positions; the Y position is the terrain height at (X, Z)
    float treeX = random.nextInt((int) (Settings.TERRAIN_VERTEX_COUNT + Settings.TERRAIN_SIZE)) + terrain.getX();
    float treeZ = random.nextInt((int) (Settings.TERRAIN_VERTEX_COUNT + Settings.TERRAIN_SIZE)) + terrain.getZ();
    float treeY = terrain.getTerrainHeightAtSpot(treeX, treeZ);

    // create a tree entity at the generated position
    Entity tree = new Entity(TreeStaticModel, new Vector3f(treeX, treeY, treeZ), 0, random.nextInt(360), 0, 1);

    // only keep the tree if it is on land
    if (!(tree.getPosition().y <= -17)) {
        trees.add(tree);
    }
}
Result:
First of all take a look at my:
simple C++ Island generator
As you can see, you can compute biomes from elevation, slope, etc. More sophisticated generators create a Voronoi map dividing the map into biome regions, assigning biome types randomly (with some rules) based on the neighbors already assigned.
Back to your question: you should place your trees more densely around some positions instead of covering a large area uniformly with sparse trees. So you need a slightly different kind of randomness distribution (like a Gaussian). See the legendary:
Understanding “randomness”
on how to get a different distribution from a uniform one...
So what you should do is pick a few random locations that cover your region shape uniformly, and then generate trees with a density that depends on the minimal distance to these points: the smaller the distance, the denser the tree placement. A sketch of this idea follows.
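Here is a minimal Java sketch of that idea: pick a few uniformly random forest centers in a chunk, then place trees with Gaussian offsets around them so the density falls off with distance to the nearest center. All names and numbers (chunk size, spread, counts) are assumptions for illustration, not taken from your project.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ForestPlacement {
    public static void main(String[] args) {
        Random rng = new Random();
        float chunkSize = 256f;          // assumed size of a terrain chunk
        int centers = 4;                 // number of forest "seeds" per chunk
        int treesPerCenter = 40;
        float spread = 20f;              // standard deviation of the cluster radius

        List<float[]> trees = new ArrayList<>();
        for (int c = 0; c < centers; c++) {
            // uniformly random cluster center inside the chunk
            float cx = rng.nextFloat() * chunkSize;
            float cz = rng.nextFloat() * chunkSize;
            for (int t = 0; t < treesPerCenter; t++) {
                // Gaussian offset -> dense near the center, sparse further away
                float x = cx + (float) rng.nextGaussian() * spread;
                float z = cz + (float) rng.nextGaussian() * spread;
                if (x >= 0 && x < chunkSize && z >= 0 && z < chunkSize) {
                    trees.add(new float[]{x, z});
                }
            }
        }
        System.out.println("placed " + trees.size() + " trees");
    }
}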
What you are looking for is a low-discrepancy sequence to generate the positions. The generated numbers are not truly random, but they cover the domain much more evenly. This distinguishes them from ordinary random number generators, which do not automatically produce evenly spread points.
One example of such a sequence would be the Halton Sequence, and Apache Commons also has an implementation which you can use.
HaltonSequenceGenerator generator = new HaltonSequenceGenerator(2); // 2 dimensions
double[] nextVector = generator.nextVector();
In your case, using two dimensions, the resulting array also has two entries. What you still need to do is translate the points into your local coordinates by adding the central point of the square where you want to place the forest to each generated vector. Also, to increase the gap between points, you should consider scaling the vectors.
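A minimal sketch of that translation and scaling step, assuming Apache Commons Math's HaltonSequenceGenerator; the center coordinates and region size are made-up values for illustration:

import org.apache.commons.math3.random.HaltonSequenceGenerator;

HaltonSequenceGenerator generator = new HaltonSequenceGenerator(2); // 2D points in [0, 1)
float centerX = 128f, centerZ = 128f;   // assumed center of the forest square
float regionSize = 64f;                 // assumed side length; a larger value spreads the points out

for (int i = 0; i < 100; i++) {
    double[] p = generator.nextVector();
    // scale the unit-square point to the region, then translate to local coordinates
    float treeX = centerX + (float) (p[0] - 0.5) * regionSize;
    float treeZ = centerZ + (float) (p[1] - 0.5) * regionSize;
    // ... place a tree at (treeX, treeZ), e.g. at terrain.getTerrainHeightAtSpot(treeX, treeZ)
}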
I am working on a project for which I need to generate a square random invertible matrix.
I found out how to generate a square random matrix, but I want to be sure that it is invertible without having to compute the determinant or generate the matrix multiple times. Can you please give me a tip?
One way is to generate the SVD of the matrix. That is, you generate 'random' (square) orthogonal matrices U and V and a 'random' diagonal matrix S, and then compute
M = U*S*V'
Note that every matrix has an SVD.
As long as none of the diagonal elements of S are 0, M will be invertible. Many routines that deal with invertible matrices are sensitive to the condition number of the matrix; errors tend to increase as the condition number gets larger. The condition number of M is the same as the condition number of S, which is the largest (by absolute value) diagonal element of S divided by the smallest (by absolute value). You may want to control this. One way is to generate the elements of S to be uniform in, say, [lo, hi] and then randomly set the sign.
One way to generate 'random' orthogonal matrices is to generate them as a product of 'random' Householder reflections, that is, matrices of the form
R_v = I - 2*v*v'/(v'*v)
where v is a 'random' vector.
Every n by n orthogonal matrix can be written as a product of n Householder reflections.
All this is not as computationally severe as it might at first look. Due to the special form of the reflectors it is straightforward to write routines that compute
R_u*M and M*R_v'
in place in M, using only n extra storage and O(n*n) operations.
So one scheme would be (a Java sketch follows this outline):
Generate S
Repeat n times:
    Generate a random non-zero vector u
    Update S to be R_u*S
    Generate a random non-zero vector v
    Update S to be S*R_v'
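Putting the scheme together, here is a minimal Java sketch under the assumptions above (diagonal entries uniform in [lo, hi] with random signs, Gaussian reflection vectors); the class and method names are made up for illustration:

import java.util.Random;

public class RandomInvertibleMatrix {
    static final Random RNG = new Random();

    // Build an n x n invertible matrix by applying random Householder reflections
    // on both sides of a random diagonal matrix S.
    static double[][] randomInvertible(int n, double lo, double hi) {
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            // diagonal entries uniform in [lo, hi] with a random sign -> never zero
            double s = lo + (hi - lo) * RNG.nextDouble();
            m[i][i] = RNG.nextBoolean() ? s : -s;
        }
        for (int k = 0; k < n; k++) {
            applyHouseholderLeft(m, randomVector(n));   // M <- R_u * M
            applyHouseholderRight(m, randomVector(n));  // M <- M * R_v'
        }
        return m;
    }

    static double[] randomVector(int n) {
        double[] v = new double[n];
        for (int i = 0; i < n; i++) v[i] = RNG.nextGaussian();
        return v;
    }

    // M <- (I - 2*v*v'/(v'*v)) * M, computed column by column with O(n) extra storage
    static void applyHouseholderLeft(double[][] m, double[] v) {
        int n = m.length;
        double vv = dot(v, v);
        for (int j = 0; j < n; j++) {
            double proj = 0;
            for (int i = 0; i < n; i++) proj += v[i] * m[i][j];   // v' * M[:,j]
            double c = 2 * proj / vv;
            for (int i = 0; i < n; i++) m[i][j] -= c * v[i];
        }
    }

    // M <- M * (I - 2*v*v'/(v'*v)); note R_v' = R_v because reflections are symmetric
    static void applyHouseholderRight(double[][] m, double[] v) {
        int n = m.length;
        double vv = dot(v, v);
        for (int i = 0; i < n; i++) {
            double proj = 0;
            for (int j = 0; j < n; j++) proj += m[i][j] * v[j];   // M[i,:] * v
            double c = 2 * proj / vv;
            for (int j = 0; j < n; j++) m[i][j] -= c * v[j];
        }
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }
}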
An LU decomposition might work.
Generate two matrices: L, a lower triangular matrix with all entries above the main diagonal zero, and U, an upper triangular matrix with all entries below the main diagonal zero. Then form the matrix A = LU.
The determinant of either L or U is just the product of the entries down its main diagonal, so you just need to ensure none of these are zero. The determinant of A is the product of the two determinants.
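A minimal Java sketch of this LU idea, keeping the diagonal entries away from zero; the [0.5, 1.5] magnitude range and the method name are assumptions for illustration:

import java.util.Random;

static double[][] randomInvertibleLU(int n, Random rng) {
    double[][] l = new double[n][n];
    double[][] u = new double[n][n];
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (i > j) l[i][j] = rng.nextGaussian();   // below-diagonal entries of L
            if (i < j) u[i][j] = rng.nextGaussian();   // above-diagonal entries of U
        }
        // diagonal entries bounded away from zero: magnitude in [0.5, 1.5], random sign
        l[i][i] = (rng.nextBoolean() ? 1 : -1) * (0.5 + rng.nextDouble());
        u[i][i] = (rng.nextBoolean() ? 1 : -1) * (0.5 + rng.nextDouble());
    }
    // A = L * U
    double[][] a = new double[n][n];
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                a[i][j] += l[i][k] * u[k][j];
    return a;
}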
I am trying to develop Java code for a data mining algorithm, the k-apriori algorithm, which improves the performance of the apriori algorithm. I have already developed 1) apriori and 2) apriori based on a boolean matrix. The thing I am not able to understand is how the Wiener function helps to transform the data, and why we use it in this algorithm. I tried to search Google for an example of the k-apriori algorithm but was not able to find any. I know how the K-means algorithm works. If anyone has an example of k-apriori, especially of how it works, it would be helpful.
Here is the link from which I am referring to the K-apriori algorithm.
I never implemented k-apriori myself, but if I am right it is just Apriori working on the K clusters found by K-means.
As you know, K-means is based on the concept of cluster centroids. Usually binary data clustering is done by using 0 and 1 as numerical values. But that is very problematic when it comes to calculating centroids from the data: if you have binary data, the distance between two points is just the number of bits that differ between them. You can read more about this problem in this link.
To get any meaningful clusters, K-means should operate on real values. That's why you use the Wiener function to transform the binary values into real values, which helps K-means produce satisfying results.
Wiener function - it is applied to each binary vector as follows:
Calculate the local mean µ around each element of the input vector Xi
Calculate the local variance σ^2 around each element
Perform the Wiener transformation for each element of the vector using the equation for Y below, based on its neighbourhood
Assume you have a binary matrix X of size p x q and a vector V which is the n-th row of that matrix. Let's choose a neighbourhood window of 3. For the n-th position of vector V:
µ = 1/3 * ( V[n-1] + V[n] + V[n+1] )
σ^2 = 1/3 * ( ( V[n-1]-µ )^2 + ( V[n]-µ )^2 + ( V[n+1]-µ )^2 )
Y[n] = µ + (σ^2 - λ^2)/σ^2 * ( V[n] - µ )
where λ^2 is the average of all the local estimated variances; for example, assuming the length of vector V is 5:
λ^2 = ( σ^2[0] + σ^2[1] + σ^2[2] + σ^2[3] + σ^2[4] ) / 5
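A minimal Java sketch of this transformation for one binary row vector, assuming a window of 3, zero-padding at the borders, and a fallback to µ when σ^2 is 0 (these boundary choices are my assumptions, not part of the paper):

static double[] wienerTransform(double[] v) {
    int n = v.length;
    double[] mu = new double[n];
    double[] var = new double[n];
    for (int i = 0; i < n; i++) {
        // local mean and variance over the 3-element neighbourhood (zero-padded)
        double a = (i > 0) ? v[i - 1] : 0;
        double b = v[i];
        double c = (i < n - 1) ? v[i + 1] : 0;
        mu[i] = (a + b + c) / 3.0;
        var[i] = ((a - mu[i]) * (a - mu[i])
                + (b - mu[i]) * (b - mu[i])
                + (c - mu[i]) * (c - mu[i])) / 3.0;
    }
    // lambda^2 = average of all local variances
    double lambda2 = 0;
    for (double s2 : var) lambda2 += s2;
    lambda2 /= n;

    double[] y = new double[n];
    for (int i = 0; i < n; i++) {
        // Y[n] = mu + (sigma^2 - lambda^2)/sigma^2 * (V[n] - mu); guard against sigma^2 = 0
        y[i] = (var[i] == 0) ? mu[i]
             : mu[i] + (var[i] - lambda2) / var[i] * (v[i] - mu[i]);
    }
    return y;
}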
I am working on a project in Java and have two 2D int arrays, both 10x15. I want to compute the Mahalanobis distance between them. They are grouped into categories along the x axis of the array (size 10). I understand that you must find the mean value within these groups and redistribute the data so that it is centered. My problem now is generating the covariance matrix necessary for the calculation. If anyone knows a good way to do this, or can point me to a useful guide that steps me through the process in 3D, it would be a great help. Thanks.
A covariance matrix contains the expected relationship between any two variables. Given a statistical distribution on a vector x, with statistical mean avg:
covariance(i,j) = expected value of [ (x[i] - avg[i])(x[j] - avg[j]) ]
Given a statistical set of N vectors v_1 ... v_N, with mean vector avg, you can estimate the covariance of the distribution they were taken from as follows:
sample_covariance(i,j) = sum[for k=1..N]( (v_k[i] - avg[i])*(v_k[j] - avg[j]) ) / (N-1)
This last is the covariance matrix you're looking for. I recommend you also read the wiki link above.
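A minimal Java sketch of that sample covariance estimate, where data[k][i] is the i-th component of the k-th sample vector; the method name is just illustrative:

static double[][] sampleCovariance(double[][] data) {
    int n = data.length;          // number of sample vectors
    int d = data[0].length;       // dimension of each vector
    double[] avg = new double[d];
    for (double[] v : data)
        for (int i = 0; i < d; i++) avg[i] += v[i] / n;

    // sample_covariance(i,j) = sum_k (v_k[i] - avg[i])*(v_k[j] - avg[j]) / (n - 1)
    double[][] cov = new double[d][d];
    for (double[] v : data)
        for (int i = 0; i < d; i++)
            for (int j = 0; j < d; j++)
                cov[i][j] += (v[i] - avg[i]) * (v[j] - avg[j]) / (n - 1);
    return cov;
}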
I have implemented the MFCC algorithm and want to implement BFCC. What are the differences between them, and is it enough to just use another function in place of the frequency-to-mel (2595 * Math.log10(1 + frequency / 700)) and mel-to-frequency (700 * (Math.pow(10, mel / 2595) - 1)) functions? I am following this code: MFCC
PS: Do I need to change the code for the triangular filters?
These are just different scales of representing the frequency spacings of the filters. MFCC uses filters whose center frequencies are spaced along the mel scale, while BFCC will use filters with center frequencies spaced along the bark scale.
The bark scale would simply be represented as:
Bark(f) = 13*arctan(0.00076*f) + 3.5*arctan((f/7500)*(f/7500))
where f is the frequency in Hz.
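For reference, here is a Java counterpart to the mel functions quoted in the question, using the Bark formula above; the method name hertzToBark is just illustrative:

static double hertzToBark(double f) {
    // Bark(f) = 13*arctan(0.00076*f) + 3.5*arctan((f/7500)^2)
    return 13.0 * Math.atan(0.00076 * f)
         + 3.5 * Math.atan((f / 7500.0) * (f / 7500.0));
}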
Though you can use the bark scale to represent the center frequency spacings, research shows that using either MFCC or BFCC to represent the feature vectors of an input speech sample has very little effect on ASR system performance. The industry standard remains MFCC. In fact, I have not heard much about BFCC.
If the code for the computation of the filter coefficients is relatively generic and takes the center frequencies as an input parameter, then I would say that you are OK. But it is always best to double-check: use MATLAB, plot the frequency responses, and verify! You can check the [following paper][1] for a comparison between MFCC, BFCC and uniform scale frequency spacings.
Update 1: The center frequency of a filter is either the arithmetic or geometric mean of the upper and lower cutoff frequencies of a band-pass/band-stop filter.
Also, the reverse equation to solve for f given a Bark frequency is not trivial: it is a quadratic equation that needs to be solved. One way would be to construct a table for different values of f and Bark and then do a table lookup. But I have not been able to find any links to the reverse equation.
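Since Bark(f) increases monotonically with f, a numeric inversion can stand in for the table lookup suggested above. Here is a sketch using simple bisection over the Bark formula from earlier; the frequency range and iteration count are assumptions:

static double barkToHertz(double bark) {
    double lo = 0.0, hi = 20000.0;            // assumed frequency search range in Hz
    for (int it = 0; it < 60; it++) {         // bisection works because Bark(f) is monotonic
        double mid = 0.5 * (lo + hi);
        if (hertzToBark(mid) < bark) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

static double hertzToBark(double f) {
    return 13.0 * Math.atan(0.00076 * f)
         + 3.5 * Math.atan((f / 7500.0) * (f / 7500.0));
}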
[1]: http://148.204.64.201/paginas%20anexas/voz/articulos%20interesantes/front%20end/MFCC/a-comparative-study-of.pdf
You could instead just select the frequencies of each Bark critical band by hand (a bunch of if's and else's), since there is no exact equation for the Bark critical bands (there is none for mel either, but there is a pretty close one). Then take the logarithm of the value for each band and apply the DCT; remember this is done for each frame. The mel scale is also logarithmic, so there is not much difference between doing MFCC and BFCC.