Any Latent Semantic Indexing? - java

Is there any open source implementation of LSI in Java? I want to use that library for my project. I have seen jLSI but it implements some other model of LSI. I want a standard model.

Have you considered LDA (latent Dirichlet allocation)? I haven't really looked into it either, but I ran into the same problem with LSI recently (the patents). From what I understand, LDA is a related, more powerful technique. http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation apparently links to some open-source implementations.

A Google search for java LSI leads to a similar question that recommends SemanticVectors, a package built on top of Lucene that is 'similar' to LSI. I don't know whether it's closer to standard LSI than the jLSI implementation.
That thread also mentions that LSI is patented and that there aren't many implementations of it, so if you need a standard implementation you may have to use a language other than Java.

The S-Space Package has an open source version of LSA, with bindings for the LSI document vectors. (Both approaches operate on the same term-document matrix and are equivalent except in the output.) It's a fairly scalable approach that uses the thin SVD. I've used it to run LSI on all of Wikipedia with no issues (after removing infrequent terms with fewer than 5 occurrences).
As Scott Ray mentioned, the SemanticVectors package also has a good LSI implementation that recently switched to the same thin-SVD implementation (SVDLIBJ), so you might check it out if you haven't already.
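To make the "thin SVD on a term-document matrix" idea concrete, here is a toy sketch of what LSI computes. It is not the S-Space or SemanticVectors API: the class name TinyLsi and its methods are made up for illustration, and instead of a proper thin SVD (like SVDLIBJ) it uses power iteration on A^T A with deflation, which is only reasonable for tiny matrices. Each document ends up as a k-dimensional vector (the right singular vectors scaled by the singular values), compared by cosine similarity.

```java
import java.util.Random;

/**
 * Toy LSI (hypothetical helper): approximates the top-k right singular vectors
 * of a small term-document matrix A via power iteration on A^T A plus deflation,
 * then scales them by the singular values to get document coordinates.
 * Not a thin SVD like SVDLIBJ; only suitable for tiny matrices.
 */
class TinyLsi {

    // termDoc[t][d] = weight of term t in document d; returns docs x k coordinates.
    static double[][] documentVectors(double[][] termDoc, int k, int iterations) {
        int terms = termDoc.length, docs = termDoc[0].length;
        // Gram matrix G = A^T A; its eigenvectors are A's right singular vectors.
        double[][] g = new double[docs][docs];
        for (int i = 0; i < docs; i++)
            for (int j = 0; j < docs; j++)
                for (int t = 0; t < terms; t++)
                    g[i][j] += termDoc[t][i] * termDoc[t][j];

        Random rnd = new Random(42); // fixed seed for reproducible start vectors
        double[][] out = new double[docs][k];
        for (int c = 0; c < k; c++) {
            double[] v = new double[docs];
            for (int i = 0; i < docs; i++) v[i] = rnd.nextDouble() + 0.1;
            normalize(v);
            for (int it = 0; it < iterations; it++) {
                double[] next = multiply(g, v);
                if (Math.sqrt(dot(next, next)) < 1e-12) break; // no rank left
                normalize(next);
                v = next;
            }
            double lambda = dot(v, multiply(g, v)); // eigenvalue of G = sigma^2
            double sigma = Math.sqrt(Math.max(lambda, 0.0));
            for (int i = 0; i < docs; i++) out[i][c] = v[i] * sigma;
            // Deflation: subtract the found component so the next pass finds the next one.
            for (int i = 0; i < docs; i++)
                for (int j = 0; j < docs; j++)
                    g[i][j] -= lambda * v[i] * v[j];
        }
        return out;
    }

    static double cosine(double[] a, double[] b) {
        double denom = Math.sqrt(dot(a, a) * dot(b, b));
        return denom == 0 ? 0 : dot(a, b) / denom;
    }

    private static double[] multiply(double[][] m, double[] v) {
        double[] r = new double[m.length];
        for (int i = 0; i < m.length; i++) r[i] = dot(m[i], v);
        return r;
    }

    private static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    private static void normalize(double[] v) {
        double n = Math.sqrt(dot(v, v));
        for (int i = 0; i < v.length; i++) v[i] /= n;
    }
}
```

For example, with terms car/auto/engine/flower/petal and three documents {car, engine}, {auto, engine}, {flower, petal}, the first two documents share only "engine" (raw cosine 0.5), but with k = 2 their latent vectors point in essentially the same direction, while the flower document stays orthogonal to both.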

A Google search for NLP tools turned up these slides, which I think help...

I believe that LSA/LSI was patented in 1989, which means the patent should have just expired. Hopefully we will see some nice open source applications soon.

Have you tried the Semantic Vector package?
http://code.google.com/p/semanticvectors/

Related

Status of Apache-Commons Commons-Functor

When I look at the commons-functor website, it appears to be out of the sandbox, but it also says there is no official binary release. I thought I saw it in a Maven repository somewhere, but I can't find it now. Does anyone know what the status is and whether there is an official binary release? I suspect I'm just poor at navigating the Commons website.
Looks like the last development was three weeks ago (see this) and their "release notes" for 1.0 are just a skeleton (see here).
A couple of quick searches show that Apache Commons Lang used to have a package org.apache.commons.lang.functor, but it seems this was removed some time ago (around 2003).
As a side note, it appears that Apache Commons Collections has a package org.apache.commons.collections.functors - but this might not be what you're looking for.
I have absolutely no idea what the release or maintenance state of this library is. My apologies.
But what I do know is that the world needs another functional programming library for Java like it needs a hole in the, er, head. Ozone layer? There are already quite a number in circulation - Functional Java, the functional parts of Guava, LambdaJ, and others - all doing much the same thing (or at least having overlapping bits doing much the same thing). What we need to do now is to start coalescing our attention around two or three of these libraries, developing common styles and idioms for using them.
At the company where I work, where there are a lot of big fans of functional programming, we seem to have settled on Functional Java, having had LambdaJ, Guava, and a couple of homebrewed functional frameworks in our codebase (and having rewritten bits of it in Scala!). That decision was made by people with a deeper understanding of functional style than me, and before I joined the company, so I can't explain the reasoning, merely report that it was made. Functional Java is actively developed, and it's in Maven. I would urge you to have a look at it and see if it meets your needs.
Commons Functor is heading toward its first official release. You can give it a try by using the nightly snapshot from the Apache repository:
https://repository.apache.org/content/groups/snapshots/org/apache/commons/commons-functor/1.0-SNAPSHOT/

How do I learn to use Java commons-collections?

Weird title, I know, let me explain.
I am a developer most familiar with C# and Javascript. I am completely sunk into those semi-functional worlds to the point that most of my code is about mapping/reducing/filtering collections. In C# that means I use LINQ just about everywhere, in Javascript it's Underscore.js and jQuery.
I have currently been assigned to an ongoing Java project and am feeling rather stifled. I simply do not think in terms of "create an array, shuffle stuff from one to another". I can (and did) create my own versions of the main map/reduce functions using anonymous types implementing interfaces but why re-invent the wheel? The project I am currently on already has commons-collections-3.1.jar and looking through the classes contained it seems like it likely can do everything that I want and more.
For the life of me, I can't figure out how to actually use it. Looking through the dozens of classes therein isn't very helpful, and the only thing I can google up is the API doc, which is just as unhelpful.
How do you use it to Map/Select, Filter/Where, Reduce/Aggregate? Is there anywhere that gives an actual tutorial on this library?
(Comment as answer for formatting purposes.)
Not so much, other than the limited user guide.
That said, I'm not sure where specifically you're having problems--filtering and selecting is mostly wrapped up in the functors package, and utilized by the CollectionUtils class.
While you're not looking for a replacement, you might find things like Guava or Lambda4J a bit more similar to what you're used to (within Java's constraints), and they're a bit less verbose.
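For the select/map trio, the commons-collections 3.x calls are CollectionUtils.select(collection, predicate) and CollectionUtils.collect(collection, transformer), which take the untyped Predicate and Transformer interfaces from org.apache.commons.collections; as far as I know there's no built-in fold. The sketch below does not use the jar. It mimics the same shape with small generic stand-in interfaces (FunctorSketch, Pred, Xform and Fold are hypothetical names) and adds a reduce, so you can see how the pieces fit together.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class FunctorSketch {
    // Hypothetical stand-ins shaped like commons-collections 3.x Predicate/Transformer
    // (the real ones are untyped: evaluate(Object) / transform(Object)).
    interface Pred<T> { boolean evaluate(T input); }
    interface Xform<A, B> { B transform(A input); }
    interface Fold<A, R> { R combine(R acc, A item); }

    // Filter/Where: keep the elements the predicate accepts.
    static <T> List<T> select(Collection<T> in, Pred<T> p) {
        List<T> out = new ArrayList<T>();
        for (T t : in) if (p.evaluate(t)) out.add(t);
        return out;
    }

    // Map/Select: transform every element.
    static <A, B> List<B> collect(Collection<A> in, Xform<A, B> f) {
        List<B> out = new ArrayList<B>();
        for (A a : in) out.add(f.transform(a));
        return out;
    }

    // Reduce/Aggregate: thread an accumulator through the elements, left to right.
    static <A, R> R reduce(Collection<A> in, R seed, Fold<A, R> f) {
        R acc = seed;
        for (A a : in) acc = f.combine(acc, a);
        return acc;
    }
}
```

On Java 5 or 6 you would pass anonymous classes, e.g. new FunctorSketch.Pred<Integer>() { public boolean evaluate(Integer n) { return n % 2 == 0; } }; on Java 8+ a lambda such as n -> n % 2 == 0 works, since each interface has a single abstract method.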
Try these links:
http://commons.apache.org/collections/userguide.html (basic tutorial)
http://larvalabs.com/collections/tutorial.html (advanced tutorial with generics)
@george-mauer, you might have to rely on articles like this or a book like Jakarta Commons Cookbook. I've also found it quite useful to learn by writing samples of my own.

Backpropagation through time

Does anyone know of a library with a working implementation of backpropagation through time?
Any of Java/Python/C#/VB.NET/F# (preferably the last one) will do!
Assuming you're already using some library for BP, it should be (TM) rather straightforward to implement BPTT using BP as a step in the process.
The Wikipedia entry for BPTT [1] includes relevant pseudo code.
My own starting point, about 18 years ago, was "The Truck Backer-Upper: An Example of Self-Learning in Neural Networks" [2].
[1] http://en.wikipedia.org/wiki/Backpropagation_through_time
[2] http://www-isl.stanford.edu/~widrow/papers/c1989thetruck.pdf
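In case it helps to see the "BP as a step in the process" idea in code, here's a minimal sketch (my own, not from any of the libraries mentioned): a single tanh recurrent unit is unrolled over the input sequence, the output error is propagated backwards one time step at a time, and the per-step gradients for the shared weights are summed. The class ScalarRnnBptt, the squared-error loss on the last output only, and the parameter names are assumptions for illustration.

```java
/** Hypothetical one-unit RNN: h_t = tanh(w*h_{t-1} + u*x_t), y = v*h_T, trained with BPTT. */
class ScalarRnnBptt {
    double w, u, v; // recurrent weight, input weight, output weight

    ScalarRnnBptt(double w, double u, double v) { this.w = w; this.u = u; this.v = v; }

    // Forward pass over the whole sequence; returns hidden states h[0..T-1] (h_{-1} = 0).
    double[] forward(double[] x) {
        double[] h = new double[x.length];
        double prev = 0.0;
        for (int t = 0; t < x.length; t++) {
            h[t] = Math.tanh(w * prev + u * x[t]);
            prev = h[t];
        }
        return h;
    }

    // Squared error on the final output only.
    double loss(double[] x, double target) {
        double[] h = forward(x);
        double y = v * h[h.length - 1];
        return 0.5 * (y - target) * (y - target);
    }

    // Returns {dL/dw, dL/du, dL/dv} by backpropagating through the unrolled time steps.
    double[] gradients(double[] x, double target) {
        double[] h = forward(x);
        int last = x.length - 1;
        double dy = v * h[last] - target;   // dL/dy
        double dv = dy * h[last];
        double dh = dy * v;                 // dL/dh_T
        double dw = 0, du = 0;
        for (int t = last; t >= 0; t--) {
            double da = dh * (1 - h[t] * h[t]); // back through tanh
            double hPrev = t > 0 ? h[t - 1] : 0.0;
            dw += da * hPrev;                   // shared weights: sum over time
            du += da * x[t];
            dh = da * w;                        // pass the error back one time step
        }
        return new double[] {dw, du, dv};
    }

    // One plain gradient-descent step on the whole sequence.
    void step(double[] x, double target, double lr) {
        double[] g = gradients(x, target);
        w -= lr * g[0];
        u -= lr * g[1];
        v -= lr * g[2];
    }
}
```

A finite-difference check (nudge one weight by a small epsilon and compare the change in loss with the BPTT gradient) is a cheap way to confirm hand-derived gradients before scaling up to vectors and matrices.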
I've used NeuronDotNet only for a limited time though. It allows you to create a feed-forward BackPropagation NN. I especially liked their use of intuitively named classes. Good luck!
This is a .net library.
I'm from a Java background but Encog has a .net implementation as well (and is a seriously good framework for NNets, with good time series support)
Can't help with an F# framework, but what domain are you coding for? If it's finance I'll reassert the "take a look at Encog"
Perhaps pybrain would do? The docstring for its BackpropTrainer class suggests that it does backpropagation through time:
class BackpropTrainer(Trainer):
    """Trainer that trains the parameters of a module according to a
    supervised dataset (potentially sequential) by backpropagating the errors
    (through time)."""
What about this one? Just a Google search to help...
I've had good experiences with Weka. In my view it's one of the best, and almost certainly the most comprehensive, general-purpose machine learning libraries around.
You could certainly do BPTT with Weka. You may find a ready-made classifier that does what you need, but even if not, you can just chain a few normal backpropagation units together as described in the very good Wikipedia article on BPTT.
I implemented a backpropagation algorithm in Java quite a while ago. I uploaded it to GitHub; maybe you'll find it useful: https://github.com/bernii/NeuralNetwokPerceptronKohonen
Let me know if it was helpful :)
You can use TensorFlow's dynamic_rnn() function (API doc). TensorFlow's tutorial on Recurrent Neural Networks will help.
Also, this great blog post provides a nice introduction to predicting sequences using TensorFlow. Here's another blog post with some code to predict a time series.

Graph Control in Java/Netbeans? Does it exist?

Does anyone have a link to a graph class/library that I can use to produce a graph in Java?
Thanks for any help!
My top choice would be JGraph as others have suggested; I am using JGraph5 because it is better documented than the newer alternative, JGraphX. EDIT: JGraphX turns out to be the far superior version, despite the lack of documentation. It's not that hard to figure out.
JGraph
Demonstration app
Feature list
Licensing agreement
Other alternatives I've researched:
JGraphT
"JGraphT is a free Java graph library that provides mathematical graph-theory objects and algorithms...complete source code included, under the terms of the GNU Lesser General Public License." (http://jgrapht.sourceforge.net/)
Main project repository
Example visualizations
JUNG - Java Universal Network/Graph Framework
Main project repository
yEd Graph Editor
Implementation of yFiles library
Demonstration Java applet
About yEd
Saves graphs in GraphML format
I used JGraph as a visualizer for networks of nodes/topologies at my previous job, it's not half bad once you get past the architecture (it's a big state machine if I recall correctly).
Visual graph: JFreeChart
You may also wish to consider the Google Charts API, if you can make web service requests.
A really good alternative is to use the Google Charts API: platform-independent, easy to use, and fast (the processing is done on Google's servers).
Graphviz would be my choice. It's not Java, but it's terrific and easy to use.
There is a Java component that works with dot to generate graphs. I've used it - very nice, indeed.
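If you go the Graphviz route from Java, the simplest glue is to emit DOT text yourself and hand it to the dot tool. This is a minimal sketch (DotWriter is a hypothetical helper, not the Java component mentioned above) that turns an adjacency map into a digraph, quoting node names so that names like main.c are valid DOT IDs.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper that renders an adjacency map as Graphviz DOT source. */
class DotWriter {
    // Builds a directed graph; nodes with no outgoing edges are still declared.
    static String toDot(String name, Map<String, List<String>> edges) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph ").append(quote(name)).append(" {\n");
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            if (e.getValue().isEmpty()) {
                sb.append("  ").append(quote(e.getKey())).append(";\n"); // isolated node
            }
            for (String target : e.getValue()) {
                sb.append("  ").append(quote(e.getKey()))
                  .append(" -> ").append(quote(target)).append(";\n");
            }
        }
        sb.append("}\n");
        return sb.toString();
    }

    // Quoted DOT IDs may contain any characters; only embedded double quotes need escaping.
    static String quote(String id) {
        return "\"" + id.replace("\"", "\\\"") + "\"";
    }
}
```

Save the result to a .dot file and render it with something like dot -Tpng graph.dot -o graph.png. Using a LinkedHashMap keeps the edges in insertion order, which makes the output easy to diff.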
I would recommend JGraphT. I used it to create multi-level graphs in my dissertation and as the base of GPS routing software. Understanding what is going on was a bit of a mind-bender at first, but once I looked at how the algorithms package works, I found it quite easy to implement A*/D* heuristic algorithms. For working out the distance between nodes on the graph, I'd also recommend looking at the Haversine function, if that's your thing.
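For reference, the Haversine formula gives the great-circle distance between two latitude/longitude points on a sphere. A minimal Java version follows; the class name and the 6371 km mean Earth radius are my choices, and a spherical model ignores the Earth's flattening.

```java
/** Hypothetical helper: great-circle distance via the Haversine formula. */
class Haversine {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius (spherical model)

    // Inputs in degrees; returns the distance in kilometres.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1), phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        // a = sin^2(dPhi/2) + cos(phi1) * cos(phi2) * sin^2(dLambda/2)
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        // Central angle; atan2 stays numerically stable for small and antipodal distances.
        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return EARTH_RADIUS_KM * c;
    }
}
```

Because a route along the surface can't be shorter than the great-circle distance (on this spherical model), it's a common choice for the A* heuristic between nodes with known coordinates.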

Java Library with subgraph isomorphism problem support?

I'm trying to analyze the usage of "#include" in C files (what is included first, dependencies...).
To do so, I extract from a C file the "#include" and I'm building a graph. I would like to identify common patterns in this graph...
So far, I'm using JGraphT as the graph engine (not sure this is the correct expression) and JGraph for the rendering (however using jgraph is a bit problematic since the Layouts are no longer included in the free release).
I've been unable to find any isomorphism support in JGraphT. Do you know of any solution that provides this kind of support (something like igraph, but for Java)?
I'm using java 1.5 and the proposed solution must be free...
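For the first step (pulling the "#include" lines out of each file and turning them into edges), a regex pass is usually enough. This is a rough sketch with hypothetical names (IncludeGraph, includes, build); it ignores /* */ block comments, #if/#ifdef conditionals and macro-expanded includes, and it sticks to Java 5 syntax since the question mentions Java 1.5. The resulting adjacency map could then be loaded into JGraphT.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: builds a file -> included-files map from C sources. */
class IncludeGraph {
    // Matches #include <x.h> or #include "x.h" at line start, allowing spaces around '#'.
    private static final Pattern INCLUDE =
        Pattern.compile("^\\s*#\\s*include\\s*[<\"]([^>\"]+)[>\"]");

    // Returns the included file names in the order they appear in the source.
    static List<String> includes(String source) {
        List<String> result = new ArrayList<String>();
        for (String line : source.split("\\r?\\n")) {
            Matcher m = INCLUDE.matcher(line);
            if (m.find()) result.add(m.group(1));
        }
        return result;
    }

    // One entry per file; the list order preserves "what is included first".
    static Map<String, List<String>> build(Map<String, String> sources) {
        Map<String, List<String>> graph = new LinkedHashMap<String, List<String>>();
        for (Map.Entry<String, String> e : sources.entrySet()) {
            graph.put(e.getKey(), includes(e.getValue()));
        }
        return graph;
    }
}
```

Lines commented out with // are skipped because the pattern requires '#' to be the first non-blank character.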
Not sure one of them can do isomorphism but I've collected a couple of links to graph layout engines in my blog: http://blog.pdark.de/2009/02/11/graph-layout-in-java/
You might want to look at graphviz, too. It's not Java but has a very powerful layout engine.
As for isomorphism: You probably only need to check for patterns at level 0 (i.e. the direct includes) because anything below that must be isomorphic by definition (all files included by some include file will always be the same unless someone used a lot of #if magic in the includes section).
Have you looked at Parsemis?
It's a Java graph mining library, and (sub)graph isomorphism is fundamental to this process, so my guess is that they're solving this issue somehow.
I'm not sure about the license, but I believe it's open source, since it was developed for academic purposes.
I've been pondering this problem myself lately (looking for common markup structures to factor out of JSPs into tags, in my case).
A library for this would be great. I haven't found one yet. In the meantime, here are a couple of problems that may be related to yours (isomorphically?).
I was planning to research the technique mathematical software uses to analytically evaluate integrals in calculus problems. In this case, there are a bunch of known structural patterns, and the problem in question has to be matched to one of the known patterns. The best way to do this is not always obvious because it depends on what terms are grouped together, etc.
Algorithms used in biology to find corresponding structures in two complex molecules might also be adapted to this problem.
Looks like there was a mention of isomorphism in the "experimental" package of JGraphT a few months back, but apparently no documentation.
Isomorphism comparison is a fundamental requirement in cheminformatics software (technically it's monomorphism that's used). Atoms are "nodes" and bonds are "edges". Molecular graphs are undirected and can be cyclic. A few open source cheminformatics libraries written in Java are available. You might be able to find some clues for solving your problem by looking at these libraries.
For example, I've written a BSD-licensed cheminformatics library called MX that implements a monomorphism algorithm based on VF. I wrote a high-level overview of how the algorithm was implemented, and you can browse the source for the mapping package in my GitHub repository. Most of the work is done in the DefaultState class.
MX also includes a fast exhaustive ring detector and other graph manipulations that might be applicable to your problem.
I don't know of a particular graph library with subgraph isomorphism code. Since the problem is known to be NP-complete, you can't do much other than search anyway. It shows up a lot in graph rewriting schemes, so AGG might help.
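Since it really does come down to search, here is what the plain backtracking version looks like. It's a minimal sketch (SubgraphMatcher is a hypothetical name) that finds one injective mapping of pattern vertices to target vertices such that every pattern edge is also a target edge, i.e. the non-induced variant, the monomorphism mentioned above for cheminformatics. Graphs are adjacency matrices, directed by default (make them symmetric for undirected graphs). VF-style algorithms add candidate ordering and lookahead pruning on top of this; none of that is here, so the worst case is exponential.

```java
import java.util.Arrays;

/** Hypothetical helper: naive backtracking subgraph monomorphism search. */
class SubgraphMatcher {
    // Returns map[p] = target vertex for each pattern vertex p, or null if no embedding exists.
    static int[] findEmbedding(boolean[][] pattern, boolean[][] target) {
        int[] map = new int[pattern.length];
        Arrays.fill(map, -1);
        boolean[] used = new boolean[target.length];
        return extend(0, pattern, target, map, used) ? map : null;
    }

    // Tries to place pattern vertex p, then recurses on p + 1.
    private static boolean extend(int p, boolean[][] pattern, boolean[][] target,
                                  int[] map, boolean[] used) {
        if (p == pattern.length) return true; // every pattern vertex placed
        for (int t = 0; t < target.length; t++) {
            if (used[t] || !consistent(p, t, pattern, target, map)) continue;
            map[p] = t;
            used[t] = true;
            if (extend(p + 1, pattern, target, map, used)) return true;
            map[p] = -1; // undo and try the next candidate
            used[t] = false;
        }
        return false;
    }

    // Every pattern edge between p and an already-mapped vertex must exist in the target.
    private static boolean consistent(int p, int t, boolean[][] pattern, boolean[][] target,
                                      int[] map) {
        if (pattern[p][p] && !target[t][t]) return false; // self-loop
        for (int q = 0; q < p; q++) {
            if (pattern[p][q] && !target[t][map[q]]) return false;
            if (pattern[q][p] && !target[map[q]][t]) return false;
        }
        return true;
    }
}
```

For an include graph you would run it with a small pattern (say a chain a -> b -> c) against the adjacency matrix of the whole graph, then repeat after marking found matches if you want more than one.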

