Ways to persist Guava Graph - java

I'm using common.graph from Google Guava version 21.0. It suits my use case very well except for one aspect: persistence. The graph seems to be in-memory only. The graph classes do not implement Serializable, as explained in this issue post.
Google describes three models to store the topology. The third option is:
a separate data repository (for example, a database) stores the topology
But that's all. I didn't find any methods in the package for using a separate data repository. Is there any way to do this? Or is the only way to use the nodes() and edges() methods to get a Set of my nodes and a Set of my edges? I could persist them in a database if I implement Serializable in these classes, and restore the graph by calling addNode(Node) and addEdge(Source, Target, Edge) (there are no addAll methods). But this seems like a workaround.
Thanks for your support!

To briefly recap the reason why Guava's common.graph classes aren't Serializable: Java serialization is fragile because it depends on the details of the implementation, and that can change at any time, so we don't support it for the graph types.
In the short term, your proposed workaround is probably your best bet, although you'll need to be careful to store the endpoints (source and target) of the edges alongside the edge objects so that you'll be able to rebuild the graph as you describe. And in fact this may work for you in the longer term, too, if you've got a database that you're happy with and you don't need to worry about interoperation with anyone else.
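A minimal sketch of that workaround, assuming string node and edge ids that contain no tabs or newlines: every edge is persisted together with its two endpoints, and the node set is stored separately so that isolated nodes (which appear in no edge) survive the round trip. `TopologyCodec`, `EdgeRecord` and the line format are hypothetical, not something Guava provides; each line could just as well be a database row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Sketch: flatten a graph's topology into text lines (e.g. rows to store) and read it back. */
public final class TopologyCodec {
    /** One persisted edge: the edge id stored together with both endpoints. */
    public static final class EdgeRecord {
        public final String edge;
        public final String source;
        public final String target;

        public EdgeRecord(String edge, String source, String target) {
            this.edge = edge;
            this.source = source;
            this.target = target;
        }
    }

    private TopologyCodec() {}

    /** Writes "N<TAB>node" for every node, then "E<TAB>edge<TAB>source<TAB>target" for every edge. */
    public static List<String> encode(Set<String> nodes, List<EdgeRecord> edges) {
        List<String> lines = new ArrayList<>();
        for (String node : nodes) {
            lines.add("N\t" + node);
        }
        for (EdgeRecord e : edges) {
            lines.add("E\t" + e.edge + "\t" + e.source + "\t" + e.target);
        }
        return lines;
    }

    /** Parses lines produced by encode(), adding the results to the given collections. */
    public static void decode(List<String> lines, Set<String> nodesOut, List<EdgeRecord> edgesOut) {
        for (String line : lines) {
            String[] f = line.split("\t", -1);
            if (f.length == 2 && f[0].equals("N")) {
                nodesOut.add(f[1]);
            } else if (f.length == 4 && f[0].equals("E")) {
                edgesOut.add(new EdgeRecord(f[1], f[2], f[3]));
            } else {
                throw new IllegalArgumentException("Malformed line: " + line);
            }
        }
    }
}
```

On load, each decoded node would go to addNode and each EdgeRecord to a MutableNetwork's addEdge(source, target, edge), or to a MutableGraph's putEdge(source, target) if you don't need edge objects.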
As I mentioned in that GitHub issue, another option is to persist your graph to some kind of file format. (Guava itself does not provide a mechanism for doing this, but JUNG will for common.graph graphs once I can get 3.0 out the door, which I'm still working on.) Note that most graph file formats (at least the ones I'm familiar with) have fairly limited support for storing node and edge metadata, so you might want your own file format (say, something based on protocol buffers).

One way I found of storing the graph was through the DOT format, like so:
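import com.google.common.graph.EndpointPair;
import com.google.common.graph.Graph;
import java.util.function.Function;

public final class DOTWriter {
    // Writes the nodes and edges of a Guava Graph in DOT; `id` maps each node to a unique
    // identifier (assumed to contain no double quotes).
    public static <N> String write(Graph<N> graph, Function<N, String> id) {
        StringBuilder sb = new StringBuilder();
        sb.append(graph.isDirected() ? "strict digraph G {\n" : "strict graph G {\n");
        String edgeOp = graph.isDirected() ? " -> " : " -- ";
        for (N n : graph.nodes()) {
            // List every node so that isolated nodes are kept
            sb.append("  \"").append(id.apply(n)).append("\";\n");
        }
        // Guava Graph edges have no objects of their own; each one is an EndpointPair of two nodes
        for (EndpointPair<N> e : graph.edges()) {
            sb.append("  \"").append(id.apply(e.nodeU())).append('"').append(edgeOp)
              .append('"').append(id.apply(e.nodeV())).append("\";\n");
        }
        sb.append("}\n");
        return sb.toString();
    }
}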
This will produce something like
strict digraph G {
  "node_A";
  "node_B";
  "node_A" -> "node_B";
}
It's very easy to read this and build the graph in memory again.
If your nodes are complex objects you should store them separately though.
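Reading it back can be sketched with two regular expressions, assuming only the simple subset written above: one statement per line, no attributes, and ids (quoted or not) without whitespace, quotes, semicolons or braces. `SimpleDotReader` is a hypothetical helper, not a general DOT parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: parses the restricted DOT subset produced by a simple writer (one statement per line). */
public final class SimpleDotReader {
    // Edge statement: a -> b or a -- b, ids optionally quoted, trailing semicolon optional
    private static final Pattern EDGE = Pattern.compile(
            "^\\s*\"?([^\"\\s;{}]+)\"?\\s*(?:->|--)\\s*\"?([^\"\\s;{}]+)\"?\\s*;?\\s*$");
    // Node statement: a single id, optionally quoted, trailing semicolon optional
    private static final Pattern NODE = Pattern.compile("^\\s*\"?([^\"\\s;{}]+)\"?\\s*;?\\s*$");

    public final Set<String> nodes = new LinkedHashSet<>();
    public final List<String[]> edges = new ArrayList<>();

    public static SimpleDotReader parse(String dot) {
        SimpleDotReader r = new SimpleDotReader();
        for (String line : dot.split("\\R")) {
            Matcher m = EDGE.matcher(line);
            if (m.matches()) {
                r.nodes.add(m.group(1));
                r.nodes.add(m.group(2));
                r.edges.add(new String[] {m.group(1), m.group(2)});
                continue;
            }
            m = NODE.matcher(line);
            if (m.matches()) {
                r.nodes.add(m.group(1));
            }
            // Anything else (the graph header, the closing brace, blank lines) is ignored
        }
        return r;
    }
}
```

To rebuild a Guava MutableGraph<String>, call graph.addNode(n) for each entry of nodes and graph.putEdge(e[0], e[1]) for each entry of edges.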

Based on @Maria Ines Parnisari's answer, I modified the approach a little. Printing the edges as Mermaid text (Mermaid is a diagramming plugin for Markdown) gives a clear diagram in IntelliJ IDEA 2021.2 or later, which has better Markdown support.
import com.google.common.graph.EndpointPair;
import com.google.common.graph.GraphBuilder;
import com.google.common.graph.MutableGraph;

public class MermaidExport {
    public static void main(String[] args) {
        //noinspection UnstableApiUsage
        MutableGraph<String> graph = GraphBuilder.directed()
                .allowsSelfLoops(false)
                .build();
        graph.addNode("root");
        graph.putEdge("root", "s1_1");
        graph.putEdge("root", "s1_2");
        graph.putEdge("root", "s1_3");
        graph.putEdge("s1_2", "s2");
        graph.putEdge("s2", "s3");
        graph.putEdge("s3", "s4");
        graph.putEdge("s3", "s5");
        graph.putEdge("s4", "s6");
        graph.putEdge("s5", "s6");
        graph.putEdge("s1_1", "s6");
        graph.putEdge("s1_1", "s2");
        // Print Mermaid text, then copy it into a mermaid code block in a Markdown file
        StringBuilder sb = new StringBuilder("graph TD\n"); // Mermaid needs a diagram-type header
        for (EndpointPair<String> edge : graph.edges()) {
            // Mermaid uses `-->` for a directed edge
            sb.append(edge.nodeU()).append(" --> ").append(edge.nodeV()).append('\n');
        }
        System.out.println(sb);
    }
}

Related

Java - get rid of Hardcoding

I am new to Java and am working on a desktop application to learn Java standards and concepts. Part of the application does XML parsing, and that part forms the majority of the application. The application parses a huge XML file and generates readable text depending on the nodes and values. At the moment, everything in the parsing logic is hard-coded.
As the application grows and new requirements and change requests come in, I would like to get rid of all the hard-coded values and maintain a config file instead (like a lookup table). I would like to revise the existing behavior, shown below:
public String getReadable() {
action = action.getFirstChild();
switch (action.getNodeName()) {
case "Drive": {
String name = XMLUtility.getChildFrom(action, 0).getTextContent();
String model = XMLUtility.getChildFrom(action, 1).getTextContent();
String speed = XMLUtility.getChildFrom(action, 2).getTextContent();
return name + " is driving the " + model + " vehicle at a speed of " + speed + " mph";
}
case "Stop": {
String model = action.getFirstChild().getTextContent();
return model + " is now stopped";
}
}
return "";
}
There are hundreds of similar functions in the parsing part of my application. I would like to maintain all the hard-coded values centrally, which is a more flexible way of maintaining the code.
But, as I do not have any prior experience in Java, I am looking for a standard way of doing this. Please recommend a clean approach to improve my code.
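A minimal sketch of one common approach, assuming a properties-style lookup table (in a real application it would be loaded from a file rather than a string): each element name maps to a message template, and java.text.MessageFormat fills in the placeholders with the child texts. `ReadableTemplates` is a hypothetical class name; extracting the child texts from the DOM would still be done by the existing XMLUtility code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Properties;

/** Sketch: message templates keyed by XML element name, loaded from a properties-style config. */
public final class ReadableTemplates {
    private final Properties templates = new Properties();

    public ReadableTemplates(String configText) {
        try {
            templates.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Fills the template for nodeName with the child texts ({0} = first child, {1} = second, ...).
     * Returns "" for unknown node names, like the original switch statement.
     */
    public String render(String nodeName, String... childTexts) {
        String pattern = templates.getProperty(nodeName);
        if (pattern == null) {
            return "";
        }
        // MessageFormat treats single quotes as escape characters; write '' for a literal quote
        return MessageFormat.format(pattern, (Object[]) childTexts);
    }
}
```

With a config containing the lines `Drive={0} is driving the {1} vehicle at a speed of {2} mph` and `Stop={0} is now stopped`, adding a new action becomes a config change instead of a new case in the switch.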

Using training made with python API as input to LabelImage module in java API?

I have a problem with the Java TensorFlow API. I ran the training using the Python TensorFlow API, which generated the files output_graph.pb and output_labels.txt. Now, for some reason, I want to use those files as input to the LabelImage module in the Java TensorFlow API. I thought everything would work fine, since that module expects exactly one .pb and one .txt file. Nevertheless, when I run the module, I get this error:
2017-04-26 10:12:56.711402: W tensorflow/core/framework/op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
Exception in thread "main" java.lang.IllegalArgumentException: No Operation named [input] in the Graph
at org.tensorflow.Session$Runner.operationByName(Session.java:343)
at org.tensorflow.Session$Runner.feed(Session.java:137)
at org.tensorflow.Session$Runner.feed(Session.java:126)
at it.zero11.LabelImage.executeInceptionGraph(LabelImage.java:115)
at it.zero11.LabelImage.main(LabelImage.java:68)
I would be very grateful if you could help me find where the problem is. Furthermore, I want to ask whether there is a way to run the training from the Java TensorFlow API, because that would make things easier.
To be more precise:
As a matter of fact, I do not use self-written code, at least for the relevant steps. All I have done is run the training with this module, https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py, feeding it the directory that contains the images, divided into subdirectories according to their description. In particular, I think these are the lines that generate the outputs:
output_graph_def = graph_util.convert_variables_to_constants(
sess, graph.as_graph_def(), [FLAGS.final_tensor_name])
with gfile.FastGFile(FLAGS.output_graph, 'wb') as f:
f.write(output_graph_def.SerializeToString())
with gfile.FastGFile(FLAGS.output_labels, 'w') as f:
f.write('\n'.join(image_lists.keys()) + '\n')
Then, I give the outputs (one some_graph.pb and one some_labels.txt) as input to this java module: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/java/src/main/java/org/tensorflow/examples/LabelImage.java, replacing the default inputs. The error I get is the one reported above.
The model used by default in LabelImage.java is different from the model that is being retrained, so the names of the input and output nodes do not align. Note that TensorFlow models are graphs and the arguments to feed() and fetch() are names of nodes in the graph. So you need to know the names appropriate for your model.
Looking at retrain.py, it seems that it has a node that takes the raw contents of a JPEG file as input (the node DecodeJpeg/contents) and produces the set of labels in the node final_result.
If that's the case, then you'd do something like the following in Java. You don't need the part that constructs a graph to normalize the image, since that seems to be part of the retrained model, so replace LabelImage.java:64 with something like:
try (Tensor image = Tensor.create(imageBytes);
Graph g = new Graph()) {
g.importGraphDef(graphDef);
try (Session s = new Session(g);
// Note the change to the name of the node and the fact
// that it is being provided the raw imageBytes as input
Tensor result = s.runner().feed("DecodeJpeg/contents", image).fetch("final_result").run().get(0)) {
final long[] rshape = result.shape();
if (result.numDimensions() != 2 || rshape[0] != 1) {
throw new RuntimeException(
String.format(
"Expected model to produce a [1 N] shaped tensor where N is the number of labels, instead it produced one with shape %s",
Arrays.toString(rshape)));
}
int nlabels = (int) rshape[1];
float[] probabilities = result.copyTo(new float[1][nlabels])[0];
// At this point nlabels = number of classes in your retrained model
DoSomethingWith(probabilities);
}
}
Hope that helps.
Regarding the "No operation" error, I was able to resolve that by using input and output layer names "Mul" and "final_result", respectively. See:
https://github.com/tensorflow/tensorflow/issues/2883

Naive Bayes Text Classification Algorithm

Hi there! I just need help implementing the Naive Bayes text classification algorithm in Java to test my data set for research purposes. It is compulsory to implement the algorithm in Java rather than using tools like Weka or RapidMiner to get the results!
My Data Set has the following type of Data:
Doc Words Category
This means that for each training document (a string) the words and the category are known in advance. Part of the data set is shown below:
Doc Words Category
Training
1 Integration Communities Process Oriented Structures...(more string) A
2 Integration Communities Process Oriented Structures...(more string) A
3 Theory Upper Bound Routing Estimate global routing...(more string) B
4 Hardware Design Functional Programming Perfect Match...(more string) C
.
.
.
Test
5 Methodology Toolkit Integrate Technological Organisational
6 This test contain string naive bayes test text text test
So the data set comes from a MySQL database, and it may contain multiple training strings and test strings as well! The thing is, I just need to implement the Naive Bayes text classification algorithm in Java.
The algorithm should follow the example in Table 13.1 of the following source:
Source: Read here
I could implement the algorithm in Java myself, but I just need to know whether there is some kind of Java library, with source code and documentation, that would let me just test the results.
The problem is that I only need the results once; it's just a one-off test.
So, to come to the point: can somebody recommend a good Java library that would help me code this algorithm in Java and process my data set, or give me some good ideas on how to do it easily?
I will be thankful for your help.
Thanks in advance
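Since writing it by hand is also an option, here is a minimal from-scratch sketch of multinomial Naive Bayes with add-one (Laplace) smoothing, scored with log probabilities to avoid floating-point underflow on long documents. The lowercase-and-split-on-whitespace tokenizer is an assumption, and `MultinomialNaiveBayes` is a hypothetical class name; it is not taken from any of the libraries mentioned in the answers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of multinomial Naive Bayes with add-one (Laplace) smoothing, scored in log space. */
public final class MultinomialNaiveBayes {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCountsPerClass = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Assumption: lowercase and split on whitespace; real data may need stemming or stop words
    private static String[] tokenize(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    public void train(String text, String category) {
        totalDocs++;
        docsPerClass.merge(category, 1, Integer::sum);
        Map<String, Integer> counts = termCountsPerClass.computeIfAbsent(category, c -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
            tokensPerClass.merge(category, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** Returns the category maximizing log P(c) + sum over tokens of log P(t|c). */
    public String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> entry : docsPerClass.entrySet()) {
            String category = entry.getKey();
            double score = Math.log(entry.getValue() / (double) totalDocs); // prior N_c / N
            Map<String, Integer> counts = termCountsPerClass.get(category);
            int tokens = tokensPerClass.getOrDefault(category, 0);
            for (String token : tokenize(text)) {
                if (!vocabulary.contains(token)) {
                    continue; // terms never seen in training are ignored
                }
                int count = counts.getOrDefault(token, 0);
                // Add-one smoothing: (count of t in c + 1) / (tokens in c + |V|)
                score += Math.log((count + 1.0) / (tokens + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }
}
```

For the data set in the question, each training row from MySQL would be passed to train(words, category) and each test row to classify(words).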
As per your requirement, you can use Apache Spark's machine learning library, MLlib. MLlib is Spark's scalable machine learning library, consisting of common learning algorithms and utilities. There is also a Java code template for implementing the algorithm with the library. So, to begin with, you can:
Implement the Java skeleton for Naive Bayes provided on their site, as given below.
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.mllib.classification.NaiveBayes;
import org.apache.spark.mllib.classification.NaiveBayesModel;
import org.apache.spark.mllib.regression.LabeledPoint;
import scala.Tuple2;
JavaRDD<LabeledPoint> training = ... // training set
JavaRDD<LabeledPoint> test = ... // test set
final NaiveBayesModel model = NaiveBayes.train(training.rdd(), 1.0);
JavaPairRDD<Double, Double> predictionAndLabel =
test.mapToPair(new PairFunction<LabeledPoint, Double, Double>() {
@Override public Tuple2<Double, Double> call(LabeledPoint p) {
return new Tuple2<Double, Double>(model.predict(p.features()), p.label());
}
});
double accuracy = predictionAndLabel.filter(new Function<Tuple2<Double, Double>, Boolean>() {
@Override public Boolean call(Tuple2<Double, Double> pl) {
return pl._1().equals(pl._2());
}
}).count() / (double) test.count();
For loading your data sets, I don't think there is a better solution than Spark SQL; MLlib fits into Spark's APIs very well. To get started, I would recommend going through the MLlib API first and implementing the algorithm according to your needs. This is pretty easy with the library.
Then, to process your data sets, use Spark SQL.
I recommend sticking with this. I too looked at multiple options before settling on this easy-to-use library and its seamless interoperation with other technologies. I could have posted complete code tailored to your question, but I think this is enough to get you going.
You can use the Weka Java API and include it in your project if you do not want to use the GUI.
Here's a link to the documentation to incorporate a classifier in your code:
https://weka.wikispaces.com/Use+WEKA+in+your+Java+code
Please take a look at the Bow toolkit.
It is GNU-licensed and its source code is available. Among other things, it supports:
Setting word vector weights according to Naive Bayes, TFIDF, and several other methods.
Performing test/train splits, and automatic classification tests.
It's not a Java library, but you could compile the C code to check that your Java code produces similar results for a given corpus.
I also spotted a decent Dr. Dobb's article that implements it in Perl. Once again, it's not the desired Java, but it will give you the one-time results that you are asking for.
Hi, I think Spark would help you a lot:
http://spark.apache.org/docs/1.2.0/mllib-naive-bayes.html
You can even choose the language you think is most appropriate for your needs: Java, Python or Scala!
You may want to take a look at this.
https://mahout.apache.org/users/classification/bayesian.html
Please use scikit-learn from Python. It already has an implementation of what you need:
class sklearn.naive_bayes.MultinomialNB(alpha=1.0, fit_prior=True, class_prior=None)
You can use an analytics platform like KNIME; it has a variety of classification algorithms (Naive Bayes included). You can run it through its GUI or its Java API.
If you want to implement the Naive Bayes text classification algorithm in Java, the Weka Java API is a better solution. The data set has to be in .arff format; creating an .arff file from a MySQL database is very easy. Below are the Java code for the classifier and a link to a sample .arff file.
Create a new text document and open it in Notepad. Copy all of the text from the following link, paste it in, and save the file as DataSet.arff: http://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/weather.arff
Download Weka Java API: http://www.java2s.com/Code/Jar/w/weka.htm
Code for the classifier:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;

public class NaiveBayesDemo {
    public static void main(String[] args) {
        // Read the ARFF file; try-with-resources closes the reader
        try (BufferedReader breader = new BufferedReader(new FileReader("DataSet.arff"))) {
            Instances train = new Instances(breader);
            // The last attribute is the class (e.g. with 40 attributes, index 39 holds yes/no)
            train.setClassIndex(train.numAttributes() - 1);

            // Build the model on the full training set
            NaiveBayes nB = new NaiveBayes();
            nB.buildClassifier(train);

            // Estimate accuracy with 10-fold cross-validation (seed 1 for repeatable folds)
            Evaluation eval = new Evaluation(train);
            eval.crossValidateModel(nB, train, 10, new Random(1));

            // Collect the report once; the same text can be shown in a GUI text area
            StringBuilder txtAreaShow = new StringBuilder();
            txtAreaShow.append("Run Information\n===================\n");
            txtAreaShow.append("Scheme: ").append(nB.getClass().getName()).append('\n');
            txtAreaShow.append("Relation: ").append(train.relationName()).append('\n');
            txtAreaShow.append("\nClassifier Model (full training set)\n")
                    .append("======================================\n");
            txtAreaShow.append(nB);
            txtAreaShow.append(eval.toSummaryString("\nSummary Results\n==================\n", true));
            txtAreaShow.append(eval.toClassDetailsString());
            txtAreaShow.append(eval.toMatrixString());
            System.out.println(txtAreaShow);
        } catch (FileNotFoundException ex) {
            System.err.println("File not found");
            System.exit(1);
        } catch (IOException ex) {
            System.err.println("Invalid input or output.");
            System.exit(1);
        } catch (Exception ex) {
            // buildClassifier, crossValidateModel and the report methods declare "throws Exception"
            System.err.println("Exception occurred: " + ex);
            System.exit(1);
        }
    }
}
You can take a look at Blayze - It's a pretty minimal Naive Bayes library for the JVM written in Kotlin. Should be easy to follow.
Full disclosure: I'm one of the authors of Blayze

caching in java

Hi guys,
I am implementing a simple example of a two-level cache in Java:
1st level: memory
2nd level: filesystem
I am new to Java and I am doing this just to understand caching in Java.
And sorry for my English, it is not my native language :)
I have completed the 1st level using the LinkedHashMap class and its removeEldestEntry method, and it looks like this:
import java.util.LinkedHashMap;
import java.util.Map;

public class Level1 {
    private static final int MAX_CACHE = 50;
    // accessOrder = true: iteration runs from least to most recently accessed, giving LRU eviction
    private final Map<String, String> cache = new LinkedHashMap<String, String>(MAX_CACHE, .75F, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
            return size() > MAX_CACHE; // evict the least recently used entry once over capacity
        }
    };

    public Level1() {
        for (int i = 1; i < 52; i++) {
            String string = String.valueOf(i);
            cache.put(string, string);
            System.out.println("Cache size = " + cache.size()
                    + "\tRecent value = " + i
                    + "\tLast value = " + cache.get(string)
                    + "\tValues in cache = " + cache.values());
        }
    }
}
Now I am going to write my 2nd level. What code and methods should I write to implement these tasks:
1) When the 1st level cache is full, the evicted value shouldn't just be discarded by removeEldestEntry; it should be moved to the 2nd level (to a file).
2) When a new value is added to the 1st level, it should first be looked up in the file (2nd level), and if it exists there it should be moved from the 2nd level to the 1st level.
I also tried to use LRUMap to upgrade my 1st level, but the compiler couldn't find the class LRUMap. What's the problem? Is some special syntax needed?
You can use the built-in Java serialization mechanism and just send your data to a file by wrapping a FileOutputStream in an ObjectOutputStream and then calling writeObject().
This method is simple but not very flexible: for example, you will fail to read an old cache from the file if your classes have changed.
You can use serialization to XML, e.g. JAXB or XStream. I used XStream in the past and it worked just fine. You can easily store any collection in a file and then restore it.
Obviously you can store stuff in DB but it is more complicated.
Note that you are not taking thread safety into account in your cache! LinkedHashMap is not thread-safe, so you would need to synchronize access to it. Even better, you could use ConcurrentHashMap, which handles synchronization internally; its default concurrencyLevel is 16 concurrently updating threads, which you can change via one of its constructors (since Java 8 this value is only a sizing hint). Unlike LinkedHashMap, though, it does not keep access order, so it cannot do LRU eviction on its own.
I don't know your exact requirements or how complicated you want this to be, but have you looked at existing cache implementations like the Ehcache library?
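For the two tasks in the question, a minimal sketch of a two-level cache, assuming String keys that are valid file names and Serializable values (null values are not supported). It combines the LinkedHashMap LRU approach from the question with Java serialization: removeEldestEntry writes the evicted entry to a file instead of just dropping it, and get() promotes an entry found on disk back into memory. `TwoLevelCache` is a hypothetical name, and like the original it is not thread-safe.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: LRU in-memory cache (level 1) that spills evicted entries to files (level 2). Not thread-safe. */
public class TwoLevelCache<V extends Serializable> {
    private final Path dir;
    private final Map<String, V> level1;

    public TwoLevelCache(int maxInMemory, Path dir) {
        this.dir = dir;
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // accessOrder = true gives least-recently-used eviction order
        this.level1 = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                if (size() > maxInMemory) {
                    writeToDisk(eldest.getKey(), eldest.getValue()); // task 1: move to level 2
                    return true;
                }
                return false;
            }
        };
    }

    public void put(String key, V value) {
        deleteFromDisk(key); // the new value supersedes any copy spilled earlier
        level1.put(key, value);
    }

    public V get(String key) {
        V value = level1.get(key);
        if (value == null) {
            value = readFromDisk(key);
            if (value != null) { // task 2: promote from level 2 back to level 1
                deleteFromDisk(key);
                level1.put(key, value); // may in turn spill the current eldest entry
            }
        }
        return value;
    }

    public boolean inMemory(String key) {
        return level1.containsKey(key); // containsKey does not change the access order
    }

    private Path fileFor(String key) {
        return dir.resolve(key + ".ser"); // assumes keys are valid file names
    }

    private void writeToDisk(String key, V value) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(fileFor(key)))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    private V readFromDisk(String key) {
        Path file = fileFor(key);
        if (!Files.exists(file)) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (V) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    private void deleteFromDisk(String key) {
        try {
            Files.deleteIfExists(fileFor(key));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The serialization caveat from the first answer still applies: files written by an older version of the value class may fail to deserialize after the class changes.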

BooleanQuery$TooManyClauses exception when using wildcard queries

I'm using Hibernate Search / Lucene to maintain a really simple index to find objects by name - no fancy stuff.
My model classes all extend a class NamedModel which looks basically as follows:
@MappedSuperclass
public abstract class NamedModel {
    @Column(unique = true)
    @Field(store = Store.YES, index = Index.UN_TOKENIZED)
    protected String name;
}
My problem is that I get a BooleanQuery$TooManyClauses exception when querying the index for objects with names starting with a specific letter, e.g. "name:l*".
A query like "name:lin*" will work without problems, in fact any query using more than one letter before the wildcard will work.
While searching the net for similar problems, I only found people using pretty complex queries and that always seemed to cause the exception. I don't want to increase maxClauseCount because I don't think it's a good practice to change limits just because you reach them.
What's the problem here?
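Lucene rewrites your simple query name:l* into a query containing every term that starts with l (something like name:lou OR name:la OR name:...), which I believe is meant to be faster. Each of those terms becomes a clause of a BooleanQuery, so a one-letter prefix can easily exceed the clause limit.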
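To make the mechanism concrete, a toy model in plain Java (not Lucene code): a sorted set stands in for the index's term dictionary for the name field, and a prefix expands to one clause per matching term. The limit of 1024 is Lucene's default maxClauseCount; `PrefixExpansion` is a hypothetical name. Because the name field is UN_TOKENIZED, every distinct name is one term, so a one-letter prefix can match far more terms than a three-letter one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Toy model of prefix rewriting: one OR clause per indexed term that starts with the prefix. */
public final class PrefixExpansion {
    /** Lucene's default BooleanQuery clause limit (maxClauseCount). */
    public static final int MAX_CLAUSE_COUNT = 1024;

    private PrefixExpansion() {}

    /** Collects the terms a prefix query would expand to, scanning a sorted term dictionary. */
    public static List<String> expand(NavigableSet<String> termDictionary, String prefix) {
        List<String> terms = new ArrayList<>();
        // In sorted order, all terms sharing a prefix are contiguous, starting at the prefix itself
        for (String term : termDictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            terms.add(term);
        }
        return terms;
    }

    public static boolean wouldExceedLimit(NavigableSet<String> termDictionary, String prefix) {
        return expand(termDictionary, prefix).size() > MAX_CLAUSE_COUNT;
    }
}
```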
As a workaround, you may use a ConstantScorePrefixQuery instead of a PrefixQuery:
// instead of new PrefixQuery(prefix)
new ConstantScoreQuery(new PrefixFilter(prefix));
However, this changes scoring of documents (hence sorting if you rely on score for sorting). As we faced the challenge of needing score (and boost), we decided to go for a solution where we use PrefixQuery if possible and fallback to ConstantScorePrefixQuery where needed:
new PrefixQuery(prefix) {
public Query rewrite(final IndexReader reader) throws IOException {
try {
return super.rewrite(reader);
} catch (final TooManyClauses e) {
log.debug("falling back to ConstantScoreQuery for prefix " + prefix + " (" + e + ")");
final Query q = new ConstantScoreQuery(new PrefixFilter(prefix));
q.setBoost(getBoost());
return q;
}
}
};
(As an enhancement, one could use some kind of LRUMap to cache terms that failed before to avoid going through a costly rewrite again)
I can't help you with integrating this into Hibernate Search though. You might ask after you've switched to Compass ;)
