Java library with subgraph isomorphism support?

I'm trying to analyze how "#include" is used in C files (what is included first, dependencies, and so on).
To do so, I extract the "#include" directives from a C file and build a graph. I would like to identify common patterns in this graph.
So far I'm using JGraphT as the graph engine (not sure that's the correct term) and JGraph for the rendering (though using JGraph is a bit problematic, since the layouts are no longer included in the free release).
I've been unable to find any isomorphism support in JGraphT. Do you know of any solution providing this kind of support (something like igraph, but for Java)?
I'm using Java 1.5, and the proposed solution must be free.
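For concreteness, a minimal sketch of the extraction step, assuming JGraphT's DefaultDirectedGraph API with file names as vertices (the IncludeGraphBuilder class and the regex are my own illustration, not from the original post, and include order is ignored):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jgrapht.DirectedGraph;
import org.jgrapht.graph.DefaultDirectedGraph;
import org.jgrapht.graph.DefaultEdge;

public class IncludeGraphBuilder {
    // Matches both #include <foo.h> and #include "foo.h"
    private static final Pattern INCLUDE =
        Pattern.compile("^\\s*#\\s*include\\s*[<\"]([^\">]+)[\">]");

    // Adds an edge sourceFile -> includedFile for every #include found.
    public static void addIncludes(DirectedGraph<String, DefaultEdge> graph,
                                   String sourceFile) throws Exception {
        graph.addVertex(sourceFile);
        BufferedReader in = new BufferedReader(new FileReader(sourceFile));
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = INCLUDE.matcher(line);
            if (m.find()) {
                String included = m.group(1);
                graph.addVertex(included);
                graph.addEdge(sourceFile, included);
            }
        }
        in.close();
    }

    public static void main(String[] args) throws Exception {
        DirectedGraph<String, DefaultEdge> g =
            new DefaultDirectedGraph<String, DefaultEdge>(DefaultEdge.class);
        for (String file : args) {
            addIncludes(g, file);
        }
        System.out.println(g);
    }
}
```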

I'm not sure any of them can do isomorphism, but I've collected a couple of links to graph layout engines on my blog: http://blog.pdark.de/2009/02/11/graph-layout-in-java/
You might want to look at graphviz, too. It's not Java, but it has a very powerful layout engine.
As for isomorphism: you probably only need to check for patterns at level 0 (i.e. the direct includes), because anything below that must be isomorphic by definition (all files included by some include file will always be the same, unless someone used a lot of #if magic in the includes section). A sketch of that level-0 check follows below.
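To make the level-0 idea concrete, here is a small sketch (my own illustration, not from the original answer) that buckets files by their set of direct includes; files landing in the same bucket share the level-0 pattern:

```java
import java.util.*;

public class IncludePatterns {
    /** Groups files by their set of direct includes. Files that end up in
        the same bucket have identical level-0 include patterns. The input
        map (file -> its direct includes) would come from the extraction
        step shown earlier. */
    public static Map<Set<String>, List<String>> groupByPattern(
            Map<String, Set<String>> directIncludes) {
        Map<Set<String>, List<String>> byPattern =
            new HashMap<Set<String>, List<String>>();
        for (Map.Entry<String, Set<String>> e : directIncludes.entrySet()) {
            Set<String> pattern = new TreeSet<String>(e.getValue());
            List<String> bucket = byPattern.get(pattern);
            if (bucket == null) {
                bucket = new ArrayList<String>();
                byPattern.put(pattern, bucket);
            }
            bucket.add(e.getKey());
        }
        return byPattern;
    }
}
```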

Have you looked at Parsemis?
It's a Java graph mining library, and (sub)graph isomorphism is fundamental to that task, so my guess is that they've solved this problem somehow.
I'm not sure about the license, but I believe it's open source, as it was developed for academic purposes.

I've been pondering this problem myself lately (looking for common markup structures to factor out of JSPs into tags, in my case).
A library for this would be great. I haven't found one yet. In the meantime, here are a couple of problems that may be related to yours (isomorphically?).
I was planning to research the technique mathematical software uses to analytically evaluate integrals in calculus problems. In this case, there are a bunch of known structural patterns, and the problem in question has to be matched to one of the known patterns. The best way to do this is not always obvious because it depends on what terms are grouped together, etc.
Algorithms used in biology to find corresponding structures in two complex molecules might also be adapted to this problem.

Looks like there was a mention of isomorphism in the "experimental" package of JGraphT a few months back, but apparently no documentation.
Isomorphism comparison is a fundamental requirement in cheminformatics software (technically it's monomorphism that's used). Atoms are "nodes" and bonds are "edges". Molecular graphs are undirected and can be cyclic. A few open source cheminformatics libraries written in Java are available. You might be able to find some clues for solving your problem by looking at these libraries.
For example, I've written a BSD-licensed cheminformatics library called MX that implements a monomorphism algorithm based on VF. I wrote a high-level overview of how the algorithm was implemented, and you can browse the source for the mapping package in my GitHub repository. Most of the work is done in the DefaultState class.
MX also includes a fast exhaustive ring detector and other graph manipulations that might be applicable to your problem.
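To illustrate the kind of search these matchers perform (a toy of my own, not MX's actual code; VF-style implementations add much smarter candidate ordering and pruning), here is a naive backtracking monomorphism finder over adjacency maps:

```java
import java.util.*;

/** Naive backtracking search for a subgraph monomorphism: an injective
    mapping of pattern vertices onto target vertices that preserves all
    pattern edges (the target may have extra edges). Exponential in the
    worst case, which is expected, since the problem is NP-complete. */
public class NaiveMonomorphism {
    private final Map<Integer, Set<Integer>> pattern; // vertex -> neighbors
    private final Map<Integer, Set<Integer>> target;
    private final List<Integer> patternVertices;

    public NaiveMonomorphism(Map<Integer, Set<Integer>> pattern,
                             Map<Integer, Set<Integer>> target) {
        this.pattern = pattern;
        this.target = target;
        this.patternVertices = new ArrayList<Integer>(pattern.keySet());
    }

    /** Returns a pattern-to-target mapping, or null if none exists. */
    public Map<Integer, Integer> find() {
        Map<Integer, Integer> mapping = new HashMap<Integer, Integer>();
        return extend(mapping) ? mapping : null;
    }

    private boolean extend(Map<Integer, Integer> mapping) {
        if (mapping.size() == patternVertices.size()) return true;
        Integer u = patternVertices.get(mapping.size()); // next unmapped vertex
        for (Integer v : target.keySet()) {
            if (mapping.containsValue(v)) continue; // keep the mapping injective
            if (consistent(u, v, mapping)) {
                mapping.put(u, v);
                if (extend(mapping)) return true;
                mapping.remove(u); // backtrack
            }
        }
        return false;
    }

    // Every already-mapped neighbor of u must map to a neighbor of v.
    private boolean consistent(Integer u, Integer v,
                               Map<Integer, Integer> mapping) {
        for (Integer w : pattern.get(u)) {
            Integer mapped = mapping.get(w);
            if (mapped != null && !target.get(v).contains(mapped)) return false;
        }
        return true;
    }
}
```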

I sure don't know of a particular graph library with subgraph isomorphism code. Since the problem is known to be NP-complete, you can't do much other than search anyway. It shows up a lot in graph rewriting schemes, so AGG might help.

Related

Maximum Entropy Markov Model for Named Entity Recognition in Java

I have a parsing problem that would be solved really well by a MEMM, but I have spent far too much time trying to find a good implementation of the algorithm (ideally in Java). Has anyone done this before? Alternatively, I could implement it myself if someone has some readable documentation.
Thanks!
(I have already tried Mallet, and the trainer in the jar was unimplemented.)
Have you looked into the Stanford NLP Group's CMMClassifier, found in the Stanford CoreNLP suite of NLP tools?
I'm afraid I cannot speak to the quality of the underlying MEMM implementation, but it is in Java, and I've used several other parts of Stanford NLP with relative success.
I find that sometimes the drawback of CoreNLP is its extensive object model and the very many dependencies that most modules have. When one wishes to focus on a single tool/class the distraction and learning curve associated with these dependencies can be annoying. On the other hand, this object model effectively corresponds to actual lower and mid-level processes which are common to many NLP tasks and hence can be quite useful.
What is your reason for thinking MEMMs are particularly good for your problem? Usually it is very hard to find theoretical justifications why something would work better than something else and the question is resolved empirically.
If you have Mallet already, try using its Conditional Random Field implementation. Recent research, starting with Lafferty, McCallum and Pereira's "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data", shows that CRFs are often superior to MEMMs for sequence tagging. A rough training sketch follows below.
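For what it's worth, here is a rough sketch of CRF training with Mallet's fst package, using the one-token-per-line input format of Mallet's SimpleTagger (class names are from Mallet 2.x as I remember them; double-check against your version):

```java
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.ObjectOutputStream;
import java.util.regex.Pattern;
import cc.mallet.fst.CRF;
import cc.mallet.fst.CRFTrainerByLabelLikelihood;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SimpleTaggerSentence2FeatureVectorSequence;
import cc.mallet.pipe.iterator.LineGroupIterator;
import cc.mallet.types.InstanceList;

public class CrfTagger {
    public static void main(String[] args) throws Exception {
        // Input: one token per line ("feature1 feature2 ... LABEL"),
        // sentences separated by blank lines.
        Pipe pipe = new SimpleTaggerSentence2FeatureVectorSequence();
        pipe.setTargetProcessing(true);

        InstanceList training = new InstanceList(pipe);
        training.addThruPipe(new LineGroupIterator(
                new FileReader(args[0]), Pattern.compile("^\\s*$"), true));

        CRF crf = new CRF(pipe, null);
        crf.addStatesForLabelsConnectedAsIn(training);

        CRFTrainerByLabelLikelihood trainer = new CRFTrainerByLabelLikelihood(crf);
        trainer.train(training, 100); // number of optimizer iterations

        // CRF is Serializable; save the trained model for later tagging.
        ObjectOutputStream out =
                new ObjectOutputStream(new FileOutputStream("ner.model"));
        out.writeObject(crf);
        out.close();
    }
}
```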

Looking for a reliable and efficient Java library for Formal Concept Analysis

I need to perform Formal Concept Analysis on the fly and am looking for an efficient SDK for calculating the concepts for a given context. There are a lot of research tools around but I'm looking for something supported and reliable.
Any help would be appreciated
Frameworks / Libraries
FCA Home suggests a list of software, including a few Java solutions:
Colibri-Java
FCAlib
Reuse or Learn from Existing Tools
Some FCA exploration tools and academic projects provide source code you could draw inspiration from, or directly reuse if their licenses permit such use:
ToscanaJ
Galicia
CORON
ConExp
Roll Your Own
Or, as a last resort, you'll need to roll your own (a sketch of the core computation follows below), based on either (or a combination of):
existing solutions available in other languages (maybe you can use a binding?),
experimental research (like this one, among many others).
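If you do roll your own, the core enumeration is compact for modest contexts: the intents of a context are exactly the intersections of subsets of object rows (the empty subset yielding the full attribute set), so closing the row set under intersection enumerates them all. A naive sketch of my own; serious implementations use NextClosure or better:

```java
import java.util.*;

public class TinyFca {
    /** Enumerates all intents of a binary context (objects x attributes).
        We start from the top intent (all attributes) and close the set
        under intersection with object rows until a fixpoint is reached.
        Output can be exponential, but small contexts are fine. */
    public static Set<BitSet> intents(boolean[][] context, int numAttrs) {
        List<BitSet> rows = new ArrayList<BitSet>();
        for (boolean[] r : context) {
            BitSet bits = new BitSet(numAttrs);
            for (int j = 0; j < numAttrs; j++) if (r[j]) bits.set(j);
            rows.add(bits);
        }
        Set<BitSet> intents = new HashSet<BitSet>();
        BitSet top = new BitSet(numAttrs);
        top.set(0, numAttrs);
        intents.add(top); // intent of the empty set of objects
        boolean grew = true;
        while (grew) {
            grew = false;
            for (BitSet row : rows) {
                for (BitSet intent : new ArrayList<BitSet>(intents)) {
                    BitSet meet = (BitSet) intent.clone();
                    meet.and(row);
                    if (intents.add(meet)) grew = true;
                }
            }
        }
        return intents; // each intent, with its extent, is one formal concept
    }
}
```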

Any Latent Semantic Indexing?

Is there any open source implementation of LSI in Java? I want to use such a library for my project. I have seen jLSI, but it implements a different model of LSI; I want the standard model.
Have you considered LDA (Latent Dirichlet Allocation)? I haven't really either, but I encountered the same problem with LSI recently (patents). From what I understand, LDA is a related, more powerful technique. http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation apparently has some links to open-source implementations.
A Google search for "java LSI" leads to a similar question that recommends SemanticVectors, a package built on top of Lucene that is 'similar' to LSI. I don't know if it's closer to what you want than the jLSI implementation.
That thread also mentions that LSI is patented and there aren't many implementations of it, so if you need a standard implementation you may have to use a language other than Java.
The S-Space Package has an open source version of LSA, with bindings for the LSI document vectors. (Both approaches operate on the same term-document matrix and are equivalent except in their output.) It's a fairly scalable approach that uses the thin SVD. I've used it to run LSI on all of Wikipedia with no issue (after removing infrequent terms with fewer than 5 occurrences).
As Scott Ray mentioned, the SemanticVectors package also has a good LSI implementation that recently switched to the same thin-SVD library (SVDLIBJ), so you might check that out too if you haven't already.
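For anyone unfamiliar with what these packages compute, the standard LSI setup (my summary, not from either answer) factors the m x n term-document matrix with a rank-k truncated SVD and works in the reduced space:

```latex
% Rank-k truncated SVD of the term-document matrix X (m terms, n documents):
X \approx U_k \Sigma_k V_k^{\top}
% Document j is represented by column j of \Sigma_k V_k^{\top}.
% A query vector q over the m terms is folded into the same space as
\hat{q} = \Sigma_k^{-1} U_k^{\top} q
% and compared to documents by cosine similarity in k dimensions.
```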
A Google search for NLP tools turned up some slides which I think may help...
I believe that LSA/LSI was patented in 1989, which means the patent should have just expired. Hopefully we will see some nice open source applications soon.
Have you tried the SemanticVectors package?
http://code.google.com/p/semanticvectors/

Machine learning challenge: diagnosis program in Java/Groovy (data mining, machine learning)

I'm planning to develop a program in Java which will provide a diagnosis. The data set is divided into two parts: one for training and the other for testing. My program should learn to classify from the training data and then make predictions on the testing part.
The data contains answers to 30 questions, each in its own column, with each record on a new line; the last column holds the diagnosis (0 or 1). In the testing part of the data the diagnosis column is empty. The data set contains about 1000 records.
I've never done anything similar, so I'll appreciate any advice or information about solutions to similar problems.
I was thinking about the Java Machine Learning Library or the Java Data Mining Package, but I'm not sure that's the right direction, and I'm still not sure how to tackle this challenge.
Please advise.
All the best!
I strongly recommend you use Weka for your task.
It's a collection of machine learning algorithms with a user-friendly front-end that facilitates many different kinds of feature and model selection strategies.
You can do a lot of really complicated stuff using it without having to do any coding or math.
The makers have also published a pretty good textbook that explains the practical aspects of data mining.
Once you get the hang of it, you can use its API to integrate any of its classifiers into your own Java programs, as in the sketch below.
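A minimal sketch of that API route (my own illustration against Weka 3's stable API; train.arff and test.arff are hypothetical files in Weka's ARFF format, holding the 30 answers plus the 0/1 diagnosis as the last attribute):

```java
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class Diagnose {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train.arff"); // hypothetical file
        train.setClassIndex(train.numAttributes() - 1);  // diagnosis column

        J48 tree = new J48();       // C4.5-style decision tree
        tree.buildClassifier(train);

        Instances test = DataSource.read("test.arff");   // hypothetical file
        test.setClassIndex(test.numAttributes() - 1);
        for (int i = 0; i < test.numInstances(); i++) {
            double predicted = tree.classifyInstance(test.instance(i));
            System.out.println(test.instance(i) + " -> "
                    + test.classAttribute().value((int) predicted));
        }
    }
}
```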
As Gann Bierner said, this is a classification problem. The best classification algorithm for your needs that I know of is Ross Quinlan's decision-tree algorithm (C4.5). It's conceptually very easy to understand.
For off-the-shelf implementations of classification algorithms, the best bet is Weka: http://www.cs.waikato.ac.nz/ml/weka/. I have studied Weka but not used it, as I discovered it a little too late.
I used a much simpler implementation called JadTi. It works pretty well for smaller data sets such as yours. I have used it quite a bit, so I can say that confidently. JadTi can be found at:
http://www.run.montefiore.ulg.ac.be/~francois/software/jaDTi/
Having said all that, your real challenge will be building a usable interface over the web. For that, the dataset alone is of limited use: it works on the premise that you already have the training set, feed the new test dataset in as one step, and get the answer(s) immediately.
But my application, and probably yours too, was a step-by-step user discovery process, with features to go back and forth over the decision tree nodes.
To build such an application, I created a PMML document from my training set and built a Java engine that traverses each node of the tree, asking the user for input (text/radio/list) and using the values as inputs to the next possible node predicate.
The PMML standard can be found here: http://www.dmg.org/ (you only need the TreeModel part). The NetBeans XML plugin is a good schema-aware editor for PMML authoring. Altova XML can do a better job, but costs $$.
It is also possible to store your dataset in an RDBMS and create the PMML automagically! I have not tried that. A rough sketch of such a traversal engine follows below.
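Here is a rough sketch of that kind of engine (my own toy, not the original poster's code) using plain DOM against PMML's TreeModel; it handles only "equal" SimplePredicates and skips compound predicates, missing-value strategies, and so on:

```java
import java.util.Scanner;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;

/** Walks a PMML TreeModel interactively: at each node it shows the test
    in each child's SimplePredicate and descends into the first child the
    user confirms (or whose predicate is True). */
public class PmmlTreeWalker {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(args[0]); // the PMML file
        Element node = (Element) doc.getElementsByTagName("Node").item(0);
        Scanner in = new Scanner(System.in);

        while (true) {
            Element next = null;
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength() && next == null; i++) {
                if (!(children.item(i) instanceof Element)) continue;
                Element child = (Element) children.item(i);
                if (!"Node".equals(child.getTagName())) continue;
                Element pred = firstElement(child); // a Node's first child is its predicate
                if (pred == null) continue;
                if ("True".equals(pred.getTagName())) {
                    next = child;
                } else if ("SimplePredicate".equals(pred.getTagName())) {
                    System.out.print(pred.getAttribute("field") + " = "
                            + pred.getAttribute("value") + "? (y/n) ");
                    if (in.nextLine().trim().startsWith("y")) next = child;
                }
            }
            if (next == null) break; // no child matched: current node is the answer
            node = next;
        }
        System.out.println("Diagnosis: " + node.getAttribute("score"));
    }

    private static Element firstElement(Element e) {
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++)
            if (kids.item(i) instanceof Element) return (Element) kids.item(i);
        return null;
    }
}
```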
Good luck with your project, please feel free to let me know if you need further inputs.
There are various algorithms that fall into the category of "machine learning", and which is right for your situation depends on the type of data you're dealing with.
If your data essentially consists of mappings from a set of questions to a set of diagnoses, each of which can be yes/no, then methods that could potentially work include neural networks and methods for automatically building a decision tree from the training data.
I'd have a look at some of the standard texts, such as Russell & Norvig ("Artificial Intelligence: A Modern Approach") and other introductions to AI/machine learning, and see if you can easily adapt the algorithms they mention to your particular data. See also O'Reilly's "Programming Collective Intelligence" for sample Python code for one or two algorithms that might be adaptable to your case.
If you can read Spanish, the Mexican publishing house Alfaomega have also published various good AI-related introductions in recent years.
This is a classification problem, not really data mining. The general approach is to extract features from each data instance and let the classification algorithm learn a model from the features and the outcome (which for you is 0 or 1). Presumably each of your 30 questions would be its own feature.
There are many classification techniques you can use. Support vector machines are popular, as is maximum entropy. I haven't used the Java Machine Learning library, but at a glance I don't see either of these. The OpenNLP project has a maximum entropy implementation; LibSVM has a support vector machine implementation. You'll almost certainly have to convert your data into a form the library can understand, e.g. as shown below.
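To illustrate that last point: LibSVM expects each record as a label followed by index:value pairs, so the 30-answer rows described in the question could be rewritten with a small converter like this (the comma-separated input layout and file arguments are assumptions):

```java
import java.io.*;

/** Converts CSV rows ("a1,a2,...,a30,diagnosis") into LibSVM's sparse
    format: "<label> 1:<a1> 2:<a2> ... 30:<a30>". Zero-valued features
    may be omitted in LibSVM format, but writing them all keeps it simple. */
public class CsvToLibSvm {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        PrintWriter out = new PrintWriter(new FileWriter(args[1]));
        String line;
        while ((line = in.readLine()) != null) {
            String[] cols = line.split(",");
            StringBuilder sb = new StringBuilder(cols[cols.length - 1].trim());
            for (int i = 0; i < cols.length - 1; i++) {
                sb.append(' ').append(i + 1).append(':').append(cols[i].trim());
            }
            out.println(sb);
        }
        in.close();
        out.close();
    }
}
```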
Good luck!
Update: I agree with the other commenter that Russell and Norvig is a great AI book which discusses some of this. Bishop's "Pattern Recognition and Machine Learning" discusses classification issues in depth, if you're interested in the down-and-dirty details.
Your task is a classic one for neural networks, which are intended first of all to solve exactly this kind of classification task. Neural networks have a rather simple implementation in any language, and they are the "mainstream" of machine learning, closer to AI than most alternatives.
You just implement (or get an existing implementation of) a standard neural network, for example a multilayer network trained by error back-propagation, and feed it the training examples in a loop. After some time of such training it will start working on real examples; a minimal sketch follows after the links below.
You can read more about neural networks starting from here:
http://en.wikipedia.org/wiki/Neural_network
http://en.wikipedia.org/wiki/Artificial_neural_network
You can also find links to many ready-made implementations here:
http://en.wikipedia.org/wiki/Neural_network_software
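A minimal sketch of such a network (my own illustration: one hidden layer, sigmoid activations, plain stochastic back-propagation; no regularization or stopping criterion beyond what the caller imposes):

```java
import java.util.Random;

/** One-hidden-layer network trained by error back-propagation; enough
    for a yes/no diagnosis from numeric question answers. */
public class TinyBackprop {
    private final double[][] w1; // hidden x (inputs+1); last column = bias
    private final double[] w2;   // output weights over (hidden+1); last = bias
    private final double rate = 0.1;

    public TinyBackprop(int inputs, int hidden, Random rnd) {
        w1 = new double[hidden][inputs + 1];
        w2 = new double[hidden + 1];
        for (double[] row : w1)
            for (int j = 0; j < row.length; j++) row[j] = rnd.nextGaussian() * 0.1;
        for (int j = 0; j < w2.length; j++) w2[j] = rnd.nextGaussian() * 0.1;
    }

    private static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    public double predict(double[] x) { return forward(x, new double[w1.length]); }

    private double forward(double[] x, double[] hiddenOut) {
        double out = w2[w2.length - 1]; // output bias
        for (int h = 0; h < w1.length; h++) {
            double sum = w1[h][x.length]; // hidden bias
            for (int i = 0; i < x.length; i++) sum += w1[h][i] * x[i];
            hiddenOut[h] = sigmoid(sum);
            out += w2[h] * hiddenOut[h];
        }
        return sigmoid(out);
    }

    /** One stochastic gradient step on a single (x, target) example,
        minimizing squared error. */
    public void train(double[] x, double target) {
        double[] hiddenOut = new double[w1.length];
        double out = forward(x, hiddenOut);
        double deltaOut = (out - target) * out * (1 - out); // error at output
        for (int h = 0; h < w1.length; h++) {
            double deltaH = deltaOut * w2[h] * hiddenOut[h] * (1 - hiddenOut[h]);
            w2[h] -= rate * deltaOut * hiddenOut[h];
            for (int i = 0; i < x.length; i++) w1[h][i] -= rate * deltaH * x[i];
            w1[h][x.length] -= rate * deltaH; // hidden bias
        }
        w2[w2.length - 1] -= rate * deltaOut; // output bias
    }
}
```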

Graph Control in Java/Netbeans? Does it exist?

Does anyone have a link to a graph class/library that I can use to produce a graph in Java?
Thanks for any help!
My top choice would be JGraph as others have suggested; I am using JGraph5 because it is better documented than the newer alternative, JGraphX. EDIT: JGraphX turns out to be the far superior version, despite the lack of documentation. It's not that hard to figure out.
JGraph
Demonstration app
Feature list
Licensing agreement
Other alternatives I've researched:
JGraphT
"JGraphT is a free Java graph library that provides mathematical graph-theory objects and algorithms...complete source code included, under the terms of the GNU Lesser General Public License." (http://jgrapht.sourceforge.net/)
Main project repository
Example visualizations
JUNG - Java Universal Network/Graph Framework
Main project repository
yEd Graph Editor
Implementation of yFiles library
Demonstration Java applet
About yEd
Saves graphs in GraphML format
I used JGraph as a visualizer for networks of nodes/topologies at my previous job; it's not half bad once you get past the architecture (it's a big state machine, if I recall correctly).
Visual graph: JFreeChart
You may also wish to consider the Google Charts API, if you can make web service requests.
A really good alternative is to use the Google Charts API: platform independent, easy to use, and fast (processing is done on Google's servers).
graphviz would be my choice. It's not Java, but still terrific and easy to use.
There is a Java component that works with dot to generate graphs. I've used it - very nice, indeed.
I would recommend JGraphT. I used it to create multi-level graphs in my dissertation and as the base of GPS routing software. Understanding what was going on was a bit of a mind-bender, but once I looked at how the algorithms package works, I found it quite easy to implement A*/D* heuristic algorithms. For working out the distance between nodes on the graph, I'd also recommend looking at the Haversine function (shown below), if that's your thing.
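For reference, the standard haversine great-circle distance (a textbook formula, not from the original answer; returns kilometers using a mean Earth radius):

```java
public class Haversine {
    /** Great-circle distance between two lat/lon points given in degrees,
        via the haversine formula; result in kilometers. */
    public static double distanceKm(double lat1, double lon1,
                                    double lat2, double lon2) {
        final double R = 6371.0; // mean Earth radius in km
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * R * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
}
```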
