How to implement Jaccard index in Java or JSP?

I have a problem calculating Jaccard similarity to find similar books, using transaction IDs from a MySQL database of sales transactions:
t1= Java,Ruby,C
t2= Java,C#, Python
t3= C#, VB, C
....etc
Size of Java intersection = 2; (How could we find it out?)
Size of union = 3, (How could we find it out?)
Jaccard similarity = (intersection/union) = 2/3
But I don't understand how to find the "intersection" and "union" of the two vectors, or how to implement this in Java/JSP.
Please help me and thanks a lot!

You need to use one of the standard Set classes (for example, HashSet). With sets you can compute the intersection and the union and then take their sizes.
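A minimal sketch of that idea, assuming each transaction has already been loaded from the database into a Set of item names (the class and method names here are made up for illustration). It compares two transactions; to compare two books instead, build one set of transaction IDs per book and pass those in the same way.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class JaccardSketch {

    // Jaccard similarity = |A intersect B| / |A union B|.
    // Returning 0 when both sets are empty is a choice made for this sketch.
    static <T> double jaccard(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<T> intersection = new HashSet<>(a); // copy so the inputs are not modified
        intersection.retainAll(b);              // keep only elements that are also in b
        Set<T> union = new HashSet<>(a);
        union.addAll(b);                        // add every element of b
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> t1 = new HashSet<>(Arrays.asList("Java", "Ruby", "C"));
        Set<String> t2 = new HashSet<>(Arrays.asList("Java", "C#", "Python"));
        // intersection = {Java}, union = {Java, Ruby, C, C#, Python}
        System.out.println(jaccard(t1, t2)); // prints 0.2
    }
}
```

retainAll and addAll modify the set they are called on, which is why the sketch works on copies.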

Related

calculate w vector of svm with rbf kernel

I need to find a way to calculate the w vector of an SVM with an RBF kernel. According to the libsvm documentation, the RBF kernel is defined as:
RBF: exp(-gamma*|u-v|^2)
I know u is a 1*n array (model.sv_coef)
and v is an m*n matrix (model.SVs).
Now, I don't know how to calculate the Euclidean distance between u and v, |u-v| (one of them is a 1-D array and the other is a 2-D one),
and after that, how can I find a 1*n array that is the w vector?
Thanks in advance.
What you describe is not possible. You cannot compute d(u, v) as a Euclidean distance in your case, because the dimensions do not match when the vectors/matrices are defined as in your question. In the kernel formula, u and v are data points, not the results of an SV estimation.
Be clear about the dimensions of your vectors/matrices. Consider using DTW (dynamic time warping) or similar when your vectors have different lengths. You can put DTW into the RBF kernel; this will at least give results on an empirically sound basis (but keep in mind that DTW violates the triangle inequality).
One more thing: u and v are data points. You should familiarize yourself more with SVMs.
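To make the dimension point concrete, here is a small sketch of the kernel formula for two data points of the same length. The class and method names are hypothetical and gamma is whatever value the model was trained with; this is not libsvm code.

```java
class RbfKernelSketch {

    // K(u, v) = exp(-gamma * |u - v|^2) for two data points of the same dimension.
    static double rbf(double[] u, double[] v, double gamma) {
        if (u.length != v.length) {
            throw new IllegalArgumentException("u and v must have the same length");
        }
        double squaredDistance = 0.0;
        for (int i = 0; i < u.length; i++) {
            double d = u[i] - v[i];
            squaredDistance += d * d;
        }
        return Math.exp(-gamma * squaredDistance);
    }

    public static void main(String[] args) {
        double[] u = {1.0, 2.0};
        double[] v = {1.0, 2.0};
        System.out.println(rbf(u, v, 0.5)); // identical points: prints 1.0
    }
}
```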

Ojalgo: Defining whether a matrix is stable in Java

I'm trying to solve M (NxN) linear systems (Ax = B, B = [b1,b2,...bM]) using ojAlgo. Thanks to apete's advice, I managed to check whether A is singular (A and B are objects of type PrimitiveMatrix), but it seems that it is sometimes also unstable.
It would be really useful for me to determine if this matrix is stable or not.
Any help would be most appreciated. Thank you!
You should find the condition number of your matrix. I believe getCondition() gives you the condition number of a matrix. The larger the number, the less stable the matrix is.
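To illustrate what the condition number measures, here is a plain-Java sketch that is independent of ojAlgo. It computes the 1-norm condition number of a 2x2 matrix from its explicit inverse. The 2x2 restriction and all names are assumptions made to keep the example short; for real matrices, use the library routine.

```java
class ConditionNumberSketch {

    // Matrix 1-norm of a 2x2 matrix: the maximum absolute column sum.
    static double norm1(double[][] m) {
        double col0 = Math.abs(m[0][0]) + Math.abs(m[1][0]);
        double col1 = Math.abs(m[0][1]) + Math.abs(m[1][1]);
        return Math.max(col0, col1);
    }

    // 1-norm condition number ||A|| * ||A^-1|| of a 2x2 matrix.
    // Returns positive infinity when the determinant is exactly zero.
    static double condition(double[][] a) {
        double det = a[0][0] * a[1][1] - a[0][1] * a[1][0];
        if (det == 0.0) {
            return Double.POSITIVE_INFINITY;
        }
        double[][] inverse = {
            { a[1][1] / det, -a[0][1] / det},
            {-a[1][0] / det,  a[0][0] / det}
        };
        return norm1(a) * norm1(inverse);
    }

    public static void main(String[] args) {
        double[][] wellConditioned = {{1.0, 0.0}, {0.0, 1.0}};
        double[][] nearlySingular = {{1.0, 1.0}, {1.0, 1.0001}};
        System.out.println(condition(wellConditioned)); // identity: prints 1.0
        System.out.println(condition(nearlySingular));  // a large value
    }
}
```

A value close to 1 means small changes in B cause small changes in x. A very large value means the solution is sensitive to rounding errors even though the matrix is not exactly singular.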

Singular Matrices and Jama

I am using the Jama API to solve a linear algebra problem, but it gives me an error: java.lang.RuntimeException: Matrix is singular.
I suppose that when the matrix is singular, multiple solutions are possible. Is there a way in the Jama API to get one of these solutions, or is there another API that can help me here?
Below is a code snippet I am using:
Matrix A = new Matrix(input);
Matrix B = new Matrix(startState);
Matrix X = A.solve(B);
answer = X.getArray();
return answer;
Check the determinant of the matrix: if it is zero, the matrix does not have an inverse (the rows making up the matrix are not independent). In that case, you can look into SVD, Gauss-Seidel, Jacobi iteration, etc. As an alternative library, you could also look into Apache Commons Math.
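As a sketch of the determinant check, written in plain Java so it runs without any library (Jama's Matrix also has a det() method you can call before solve). The class name, method names, and the tolerance are assumptions for illustration. A determinant threshold depends on the scale of the matrix entries, so treat the tolerance as something to tune.

```java
class SingularCheckSketch {

    // Determinant via Gaussian elimination with partial pivoting.
    // Works on a copy, so the caller's matrix is left unchanged.
    static double determinant(double[][] input) {
        int n = input.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) {
            a[i] = input[i].clone();
        }
        double det = 1.0;
        for (int col = 0; col < n; col++) {
            // Pick the row with the largest absolute value in this column.
            int pivot = col;
            for (int row = col + 1; row < n; row++) {
                if (Math.abs(a[row][col]) > Math.abs(a[pivot][col])) {
                    pivot = row;
                }
            }
            if (a[pivot][col] == 0.0) {
                return 0.0; // no usable pivot: the matrix is singular
            }
            if (pivot != col) {
                double[] tmp = a[pivot];
                a[pivot] = a[col];
                a[col] = tmp;
                det = -det; // a row swap flips the sign
            }
            det *= a[col][col];
            for (int row = col + 1; row < n; row++) {
                double factor = a[row][col] / a[col][col];
                for (int k = col; k < n; k++) {
                    a[row][k] -= factor * a[col][k];
                }
            }
        }
        return det;
    }

    // True when |det| is at or below the given tolerance.
    static boolean isNearlySingular(double[][] a, double tolerance) {
        return Math.abs(determinant(a)) <= tolerance;
    }

    public static void main(String[] args) {
        double[][] dependentRows = {{1.0, 2.0}, {2.0, 4.0}}; // second row = 2 * first row
        System.out.println(isNearlySingular(dependentRows, 1e-12)); // prints true
    }
}
```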

how to implement k-means for simple grouping in java

I would like to see a simple k-means algorithm in Java. I want to use k-means only for grouping a one-dimensional array, not multi-dimensional data.
For example,
before grouping the array consists of 2,4,7,5,12,34,18,25
if we want four groups, then we get
group 1: 2,4,5
group 2: 7,12
group 3: 18,25
group 4: 34
You can take a look at the Weka implementation or simply use the Weka API if all you need are the clusters and not the implementation.
The standard (heuristic) algorithm for K-means clustering is presented on the Wikipedia page, together with links to variations and some existing implementations.
(This is a programming forum, so it is reasonable to assume that you are capable of writing the Java code yourself if you cannot find a suitable existing implementation.)
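For reference, the standard algorithm on one-dimensional data could be sketched like this. It is a minimal illustration, not the Weka or SPMF implementation; the names are made up, and spreading the initial centroids evenly over the sorted values is an assumption of this sketch.

```java
import java.util.Arrays;

class KMeans1DSketch {

    // Index of the centroid closest to the value (the first one wins on ties).
    static int nearest(double value, double[] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (Math.abs(value - centroids[c]) < Math.abs(value - centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Standard k-means iteration; returns the cluster index of each input value.
    static int[] cluster(double[] values, int k, int maxIterations) {
        if (k < 1 || k > values.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of values");
        }
        // Initial centroids: evenly spaced picks from the sorted data.
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] centroids = new double[k];
        for (int c = 0; c < k; c++) {
            centroids[c] = sorted[(int) ((long) c * (sorted.length - 1) / Math.max(1, k - 1))];
        }
        int[] assignment = new int[values.length];
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each value to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < values.length; i++) {
                int best = nearest(values[i], centroids);
                if (best != assignment[i]) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            // Update step: move each centroid to the mean of its values.
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int i = 0; i < values.length; i++) {
                sum[assignment[i]] += values[i];
                count[assignment[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (count[c] > 0) {
                    centroids[c] = sum[c] / count[c];
                }
            }
            if (!changed) {
                break; // assignments are stable, so the centroids are too
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 7, 5, 12, 34, 18, 25};
        System.out.println(Arrays.toString(cluster(data, 4, 100)));
    }
}
```

The grouping depends on the initial centroids, so the result can differ from the grouping shown in the question.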
You can run k-means with Weka's SimpleKMeans as:
SimpleKMeans kmeans = new SimpleKMeans();
kmeans.setSeed(10);
// This is the important parameter to set
kmeans.setPreserveInstancesOrder(true);
kmeans.setNumClusters(numberOfClusters);
kmeans.buildClusterer(instances);
// This array returns the cluster number (starting with 0) for each instance
// The array has as many elements as the number of instances
int[] assignments = kmeans.getAssignments();
int i = 0;
for (int clusterNum : assignments) {
    // %n ends the line so each instance is printed on its own row
    System.out.printf("Instance %d -> Cluster %d%n", i, clusterNum);
    i++;
}
You can check my software, the SPMF data mining software.
It offers an efficient implementation of KMeans in just 3 files, so it should be easy to understand.
The software also offers many other algorithms, but you don't need them.
There is also a graphical user interface for launching KMeans and the other algorithms.

Which is the best way to implement a sparse vector in Java?

Which is the best way to implement a sparse vector in Java?
Ideally it would be something that can be manipulated quite easily (normalization, scalar product, and so on).
Thanks in advance
MTJ has a Sparse Vector class. It has norm functions (1-norm, 2-norm, and ∞-norm) and dot product functions.
JScience has a SparseVector implementation that is part of its linear algebra package.
You can also look at la4j's CompressedVector implementation. It uses a pair of arrays: an array of values and an array of their indices. With binary search on top of that, it just flies. So, this implementation guarantees O(log n) running time for get/set operations.
Just a brief example
Vector a = new CompressedVector(new double[]{ 1.0, 2.0, 3.0 });
// calculates L_1 norm of the vector
double n = a.norm();
// calculates the sum of vectors elements
double s = a.fold(Vectors.asSumAccumulator(0.0));
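If you would rather roll your own, the pair-of-arrays idea can be sketched in plain Java as below. This is an illustrative, immutable sketch with made-up names, not la4j's actual code. It assumes the indices are passed in sorted ascending order, and that both vectors in a dot product have the same length.

```java
import java.util.Arrays;

class SparseVectorSketch {
    private final int length;      // logical length, including zero entries
    private final int[] indices;   // sorted positions of the non-zero entries
    private final double[] values; // values[i] is the entry at position indices[i]

    SparseVectorSketch(int length, int[] sortedIndices, double[] values) {
        if (sortedIndices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have the same size");
        }
        this.length = length;
        this.indices = sortedIndices.clone();
        this.values = values.clone();
    }

    int length() {
        return length;
    }

    // Binary search over the stored positions: O(log n) in the number of non-zeros.
    double get(int position) {
        int k = Arrays.binarySearch(indices, position);
        return k >= 0 ? values[k] : 0.0;
    }

    // Scalar product: walk both sorted index arrays in step, like a merge.
    double dot(SparseVectorSketch other) {
        double sum = 0.0;
        int i = 0;
        int j = 0;
        while (i < indices.length && j < other.indices.length) {
            if (indices[i] == other.indices[j]) {
                sum += values[i++] * other.values[j++];
            } else if (indices[i] < other.indices[j]) {
                i++;
            } else {
                j++;
            }
        }
        return sum;
    }

    // Euclidean (2-) norm; dividing every stored value by it would normalize the vector.
    double euclideanNorm() {
        return Math.sqrt(dot(this));
    }

    public static void main(String[] args) {
        SparseVectorSketch a = new SparseVectorSketch(5, new int[]{0, 3}, new double[]{3.0, 4.0});
        SparseVectorSketch b = new SparseVectorSketch(5, new int[]{3, 4}, new double[]{2.0, 1.0});
        System.out.println(a.euclideanNorm()); // sqrt(9 + 16): prints 5.0
        System.out.println(a.dot(b));          // only position 3 overlaps: prints 8.0
    }
}
```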
