I'm looking for a fast SVD library, in either C, C++ or Java. Ultimately I'm using Java, but I'm very comfortable using JNA to wrap C++, e.g. http://github.com/hughperkins/jeigen
I'm looking for a fast SVD library that will handle sparse matrices. To keep this objective, so that the question doesn't get marked as too subjective, let's say:
targeting use with news20.binary, e.g. from http://mldata.org/repository/data/viewslug/news20binary/
how long does it take to run?
how much variance is conserved, e.g. for an S matrix of size 6 or 20?
I looked around at a few libraries and found:
MATLAB: super fast, about 10 seconds, but it's not really a 'library' as such. Average squared projection error: 0.93.
redsvd: super fast, about 1 second to run for 6 features, but the average squared projection error is 0.97, which is very high.
Eigen's SVD is both very slow and only for dense matrices.
svdlibc: ran for 28 minutes before I stopped it; I guess it's calculating the full S rather than just the first 6 features or so.
Basically, I'm looking for a library that gives about the same speed and average squared projection error as MATLAB, or at least something comparable.
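For concreteness, here's a minimal sketch of how I compute the average squared projection error, given the top-k right singular vectors returned by whatever library is under test. It uses dense double arrays for clarity; a real run on news20.binary would of course use a sparse representation, and the method name and array layout here are just illustrative:

```java
public class ProjectionError {
    // rows: n x d data matrix; vk: d x k matrix whose columns are the
    // top-k right singular vectors (assumed orthonormal).
    static double avgSquaredProjectionError(double[][] rows, double[][] vk) {
        int d = vk.length, k = vk[0].length;
        double total = 0.0;
        for (double[] x : rows) {
            double[] y = new double[k];               // y = Vk^T x
            for (int j = 0; j < k; j++)
                for (int i = 0; i < d; i++)
                    y[j] += vk[i][j] * x[i];
            double normX = 0.0, normY = 0.0;
            for (int i = 0; i < d; i++) normX += x[i] * x[i];
            for (int j = 0; j < k; j++) normY += y[j] * y[j];
            // With orthonormal columns, ||x - Vk Vk^T x||^2 = ||x||^2 - ||Vk^T x||^2
            total += (normX - normY) / normX;
        }
        return total / rows.length;
    }
}
```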
From my experience, svdlibc is the best library of those options. I've dug a bit through its code before and I don't believe it's calculating the full S matrix (i.e., it is a true "thin svd"). If you can control the matrix representation on disk, svdlibc performs much faster when using the sparse binary input format due to the significantly lower I/O overhead.
The S-Space Package provides an executable jar wrapping SVDLIBJ, a Java port of SVDLIBC. However, they found it produced different results from SVDLIBC for certain inputs.
I am looking for a library in Java or Scala which can do the same clustering that scipy's linkage does.
Performs hierarchical/agglomerative clustering.
The input y may be either a 1-d condensed distance matrix or a 2-d array of observation vectors.
If y is a 1-d condensed distance matrix, then y must be a $\binom{n}{2}$-sized vector (i.e., of length n(n-1)/2), where n is the number of original observations paired in the distance matrix. The behavior of this function is very similar to the MATLAB linkage function.
The Java libraries I have found (like jblas) are pretty low level, lacking higher-order algorithms like linkage. On the other hand, I am pretty sure there are libraries that do this. It would be nice if you could point me to one or two.
PS: One can find a lot of individuals implementing some form of hierarchical clustering; I'd prefer a more established library like Commons Math if possible. But there I could only find k-means clustering.
In the end I am using this library: https://github.com/lbehnke/hierarchical-clustering-java
It's not heavily maintained, but it passes comparison against the Python and MATLAB implementations.
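For anyone who lands here later, a short usage sketch. The class and method names below follow the project's README (package com.apporiented), but double-check them against the version you pull in:

```java
import com.apporiented.algorithm.clustering.AverageLinkageStrategy;
import com.apporiented.algorithm.clustering.Cluster;
import com.apporiented.algorithm.clustering.ClusteringAlgorithm;
import com.apporiented.algorithm.clustering.DefaultClusteringAlgorithm;

public class LinkageDemo {
    public static void main(String[] args) {
        // Pairwise distance matrix (symmetric, zero diagonal) plus labels,
        // analogous to what you'd feed scipy's linkage after squareform().
        double[][] distances = {
            {0.0, 1.0, 9.0},
            {1.0, 0.0, 8.0},
            {9.0, 8.0, 0.0}
        };
        String[] names = {"a", "b", "c"};

        ClusteringAlgorithm alg = new DefaultClusteringAlgorithm();
        // Average linkage, matching scipy's linkage(y, method="average")
        Cluster root = alg.performClustering(distances, names,
                new AverageLinkageStrategy());
        System.out.println(root.getName());
    }
}
```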
I have a multi-objective optimization problem which I would like to solve, preferably in Java, using evolutionary algorithms.
I use a parametric finite element model with a couple of real or integer input values x1...xn describing for example the geometry of the model. Each parameter can have values in a certain interval, e.g. x1 \in [2,10], x2 \in [1,4], ...
My goal is to find the optimum solution for one or more given criteria which I calculate within the finite element model. So the values of the objective function are calculated by the model.
I basically need a framework where I can define optimization parameters with certain intervals (x1...xn). The framework should build an initial population with starting values for x1...xn for each individual. With those values I create my model for each individual, perform my calculations, and give back the values of the target function. Then the framework does its job and creates a new offspring population.
Is there an evolutionary algorithm framework in Java that can do that?
I had a quick look at TinyGP, Jenetics and JGAP. But these focus on Genetic Programming and Symbolic Regression problems. Or did I miss something fundamental?
You can look at the Watchmaker API.
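To make the requested workflow concrete, here is a deliberately naive, framework-agnostic generational loop showing what such a framework automates: sample within the intervals, evaluate, select, mutate. It is single-objective for brevity, and evaluateModel is a hypothetical stand-in for the finite element calculation (here just a toy quadratic); the interval bounds are the ones from the question:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

public class NaiveGa {
    static final double[] LOWER = {2.0, 1.0};   // x1 in [2,10], x2 in [1,4]
    static final double[] UPPER = {10.0, 4.0};
    static final Random RNG = new Random(42);

    // Hypothetical stand-in for the finite element model: replace this
    // with a call into your solver returning the objective value.
    static double evaluateModel(double[] x) {
        return Math.pow(x[0] - 7.0, 2) + Math.pow(x[1] - 3.0, 2);
    }

    public static void main(String[] args) {
        int popSize = 20, generations = 50;
        double[][] pop = new double[popSize][];
        for (int i = 0; i < popSize; i++) pop[i] = randomIndividual();

        for (int g = 0; g < generations; g++) {
            // Sort ascending by objective value (minimization). A real
            // framework would cache fitness instead of re-evaluating here.
            Arrays.sort(pop, Comparator.comparingDouble(NaiveGa::evaluateModel));
            // Elitist replacement: keep the best half, refill with mutants of it.
            for (int i = popSize / 2; i < popSize; i++)
                pop[i] = mutate(pop[i - popSize / 2]);
        }
        System.out.println("best: " + Arrays.toString(pop[0])
                + " -> " + evaluateModel(pop[0]));
    }

    static double[] randomIndividual() {
        double[] x = new double[LOWER.length];
        for (int i = 0; i < x.length; i++)
            x[i] = LOWER[i] + RNG.nextDouble() * (UPPER[i] - LOWER[i]);
        return x;
    }

    static double[] mutate(double[] parent) {
        double[] child = parent.clone();
        int i = RNG.nextInt(child.length);
        child[i] += RNG.nextGaussian() * 0.5;
        // Clamp back into the allowed interval.
        child[i] = Math.max(LOWER[i], Math.min(UPPER[i], child[i]));
        return child;
    }
}
```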
Looking for a high-performance String hashing function in Java/Scala: something faster than the MurmurHash family. It doesn't need to be cryptographically strong, only to distribute well.
Any suggestions?
You can find very fast hash function implementations for Java here, which, incidentally, exploit the internal String representation (the char[] array) to maximize speed: https://github.com/OpenHFT/Zero-Allocation-Hashing
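A hedged usage sketch; the entry point below (LongHashFunction.xx() with hashChars) matches the project's README at the time of writing, but verify it against the version you depend on:

```java
import net.openhft.hashing.LongHashFunction;

public class HashDemo {
    public static void main(String[] args) {
        // xxHash computed directly over the String's chars,
        // with no intermediate byte[] allocation.
        long h = LongHashFunction.xx().hashChars("hello world");
        System.out.println(Long.toHexString(h));
    }
}
```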
The fastest hashing algorithm that fits the bill presently seems to be xxHash. The lz4-java project contains an implementation ported to Java. I don't know whether the Java implementation has been benchmarked against MurmurHash, though; performance optimizations in C don't always port to/from Java. (In particular, xxHash involves more array accesses, so there could be non-negligible bounds-checking overhead.)
Edit: it looks to me like lz4-java uses JNI to call the C implementation of xxHash, but JNI overhead is non-negligible, so the performance concerns remain.
However, given that Scala includes a MurmurHash function, and that Java contains a faster default hash (about 2x) that is sorta-reasonably distributed sometimes, one does wonder whether it's really necessary. For instance, hashing a String with scala.util.hashing.MurmurHash3 is about as fast as creating that string from an array of bytes in the first place, and it's twice as fast as that if you give it the array of bytes directly.
I'm using Mahout's KMeansDriver to build clusters, and I want to use Spearman correlation as the DistanceMeasure.
Can I find this algorithm in Java, or do I need to write it myself? I didn't find any examples of it on the web.
Do not use k-means with other distance measures.
It may stop converging.
K-means is designed to minimize variance. Your distance function must also minimize variance, otherwise you lose the convergence property. For guaranteed convergence with other distances, see partitioning around medoids (PAM) aka k-medoids.
Correlation measures are a good example of distances that do not work with k-means:
Consider these two vectors, with absolute Spearman correlation used as the distance: dist = 1 - |r|
1 2 3 4 5
5 4 3 2 1
Obviously, the Spearman correlation here is -1, so dist = 1 - |-1| = 0 and these two vectors are considered "identical".
However, k-means will now compute the mean of these two, which yields the constant vector
3 3 3 3 3
which is maximally dissimilar to both (in fact, its correlation with anything isn't even well defined, because it has zero variance). In other words: the mean does not minimize absolute correlation distance, so you shouldn't use this distance function with k-means.
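You can verify this numerically. A small self-contained sketch: Spearman is just Pearson computed on ranks, and since both example vectors are already permutations of 1..5, their ranks equal their values, so plain Pearson suffices here:

```java
public class SpearmanDemo {
    // Pearson correlation; returns NaN when either vector has zero variance.
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double ma = 0, mb = 0;
        for (int i = 0; i < n; i++) { ma += a[i]; mb += b[i]; }
        ma /= n; mb /= n;
        double cov = 0, va = 0, vb = 0;
        for (int i = 0; i < n; i++) {
            cov += (a[i] - ma) * (b[i] - mb);
            va  += (a[i] - ma) * (a[i] - ma);
            vb  += (b[i] - mb) * (b[i] - mb);
        }
        return cov / Math.sqrt(va * vb);   // 0/0 -> NaN for a constant vector
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {5, 4, 3, 2, 1};
        double[] mean = {3, 3, 3, 3, 3};   // what k-means would average to
        System.out.println("r(x, y)    = " + pearson(x, y));    // -1.0 => dist 1-|r| = 0
        System.out.println("r(x, mean) = " + pearson(x, mean)); // NaN: undefined
    }
}
```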
Variance = squared Euclidean
This is why you should be using k-means only with squared Euclidean distance.
On L2-normalized vectors: Variance ~ Cosine
This is easy to see from the definition of cosine similarity: for unit vectors, ||x - y||^2 = 2 - 2*cos(x, y), which is also the reason why spherical k-means works.
In Java, I have a set of expressions like cond1 AND (cond2 OR cond3) AND (cond4 OR cond5). I would like to convert it into a tree and then evaluate the final boolean answer. I searched a lot around Java BDD libraries but was not able to find anything. Any suggestions, ideally with sample code?
A 5-second Google search returned some reasonable-looking results:
JavaBDD
Java Decision Diagram Libraries
What is the best Binary Decision Diagram library for Java?
Is this not what you're looking for?
He means Binary Decision Diagrams.
I've been tinkering with JavaBDD and JBDD/JDD. Both are based on BuDDy (a C library) -- JBDD actually uses the C DLLs for a marginal performance boost.
It looks to me like JavaBDD is more fully featured (e.g., it supports composing BDDs, which is what I need). But there is no tutorial for it, and while the class docs aren't terrible, frankly I can't figure out how to use it for the most basic boolean operations (like the problem you pose).
JBDD/JDD requires you to do manual garbage collection and does weird things like storing BDD handles in Java integers -- clearly carry-overs from C. But it has a set of tutorials.
If you want to run your own parser, check out JavaCC.
Here is a nice tutorial to get you started. A bit older, but still valid:
http://www.javaworld.com/jw-12-2000/jw-1229-cooltools.html
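If a full BDD library turns out to be overkill, a minimal hand-rolled expression tree is enough to represent and evaluate the example from the question. A sketch (the parsing step is left out; with JavaCC you'd build this tree from the grammar's actions):

```java
import java.util.Map;

interface BoolExpr {
    boolean eval(Map<String, Boolean> env);

    // Factory methods building the tree nodes as lambdas.
    static BoolExpr var(String name)            { return env -> env.get(name); }
    static BoolExpr and(BoolExpr l, BoolExpr r) { return env -> l.eval(env) && r.eval(env); }
    static BoolExpr or(BoolExpr l, BoolExpr r)  { return env -> l.eval(env) || r.eval(env); }
}

public class ExprDemo {
    public static void main(String[] args) {
        // cond1 AND (cond2 OR cond3) AND (cond4 OR cond5)
        BoolExpr e = BoolExpr.and(
                BoolExpr.and(BoolExpr.var("cond1"),
                             BoolExpr.or(BoolExpr.var("cond2"), BoolExpr.var("cond3"))),
                BoolExpr.or(BoolExpr.var("cond4"), BoolExpr.var("cond5")));

        System.out.println(e.eval(Map.of(
                "cond1", true, "cond2", false, "cond3", true,
                "cond4", false, "cond5", false)));   // prints false
    }
}
```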