I am currently writing a program that deals with graphs created by the jgrapht library. I have multiple graphs of the form:
UndirectedGraph<Integer, DefaultEdge> g_x = new SimpleGraph<Integer, DefaultEdge>(DefaultEdge.class);
g_x.addVertex(1);
g_x.addVertex(2);
g_x.addVertex(3);
g_x.addVertex(4);
g_x.addEdge(1, 2);
g_x.addEdge(2, 4);
...
which are constant graphs associated with street maps that I am given as files. Right now I have all of my graphs declared in my main method and just reference the graph I want when a map is loaded. What I would like to do is have another file paired with each map (i.e map1.map and map1.graph) so that when I load the map from a file I can also load the graph like:
map = loadMap(mapName);
g_x = loadGraph(mapName);
where mapName is the file name prefix, and not have to store it in my source code. Is it possible to do this in Java, and if so, how would I create the files and load them? Would it also be possible to do this with a generic Object?
One option is to serialize your objects to XML or JSON (you could change the .xml extension to .map if you really wanted). Then you can open the XML in your code for each object you wish to load.
Serializing:
File file = new File(filename);                       // filename: path to write to
FileOutputStream out = new FileOutputStream(file);
XStream xmlStream = new XStream(new DomDriver());
out.write(xmlStream.toXML(objectToSave).getBytes());  // objectToSave: the object you want to persist
out.close();
Deserializing:
try {
    XStream xmlStream = new XStream(new DomDriver());
    state = (ClassNameYouWishToSave) xmlStream.fromXML(new FileInputStream(filename));
} catch (IOException e) {
    e.printStackTrace();
}
You will need these imports:
import com.thoughtworks.xstream.XStream;
import com.thoughtworks.xstream.io.xml.DomDriver;
It is a simplistic way to do it, but it works. Hope it helps.
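Applied to the graphs from the question, a minimal sketch could look like the following. The helper class, the mapName + ".graph" naming convention, and the idea of serializing the already-built graph once from a small helper program are assumptions on my part; it is also worth verifying that XStream can reflectively serialize your particular jgrapht graph class.
import com.thoughtworks.xstream.XStream;
import com.thoughtworks.xstream.io.xml.DomDriver;
import org.jgrapht.UndirectedGraph;
import org.jgrapht.graph.DefaultEdge;
import java.io.*;

public class GraphStore {

    // Run once per map (e.g. from a small helper program) so the
    // addVertex/addEdge calls no longer have to live in main().
    public static void saveGraph(UndirectedGraph<Integer, DefaultEdge> g, String mapName) throws IOException {
        XStream xmlStream = new XStream(new DomDriver());
        try (Writer out = new FileWriter(mapName + ".graph")) {
            out.write(xmlStream.toXML(g));
        }
    }

    // Mirrors the desired g_x = loadGraph(mapName) call from the question.
    @SuppressWarnings("unchecked")
    public static UndirectedGraph<Integer, DefaultEdge> loadGraph(String mapName) throws IOException {
        XStream xmlStream = new XStream(new DomDriver());
        try (InputStream in = new FileInputStream(mapName + ".graph")) {
            return (UndirectedGraph<Integer, DefaultEdge>) xmlStream.fromXML(in);
        }
    }
}
Because XStream works on plain Objects, the same pair of methods also answers the "generic Object" part of the question.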
I need to write unit tests for my serialize() and deserialize() functions, but I have no idea how to cover the FileChooser and FileInputStream parts.
Also, if there are two functions in this file, must I write exactly two corresponding test functions?
The two functions are as follows:
/**
 * serialize and save all the datasets and charts
 * @param myDataset
 *            pass all the existing datasets into this function for serialization purpose
 * @param myChart
 *            pass all the existing charts into this function for serialization purpose
 */
public static void serialize(HashMap<String, DataTable> myDataset, HashMap<String, Chart> myChart) {
    // first, let the user select a file to save to
    FileChooser chooser = new FileChooser();
    chooser.setTitle("Save");
    FileChooser.ExtensionFilter extFilter = new FileChooser.ExtensionFilter("comp3111 type", "*.comp3111");
    chooser.getExtensionFilters().add(extFilter);
    File file = chooser.showSaveDialog(new Stage());
    try {
        FileOutputStream fOutput = new FileOutputStream(file);
        ObjectOutputStream objOutput = new ObjectOutputStream(fOutput);
        // write both maps into the same stream
        objOutput.writeObject(myDataset);
        objOutput.writeObject(myChart);
        objOutput.close();
        fOutput.close();
    } catch (FileNotFoundException e) {
        System.out.println(e.getMessage());
        e.printStackTrace();
    } catch (IOException e) {
        System.out.println(e.getMessage());
        e.printStackTrace();
    }
}
/**
 * choose a file with *.comp3111 extension for deserialization purpose
 * @return the chosen *.comp3111 file
 */
public static File deserialize() {
    FileChooser chooser = new FileChooser();
    FileChooser.ExtensionFilter extFilter = new FileChooser.ExtensionFilter("comp3111 type", "*.comp3111");
    chooser.getExtensionFilters().add(extFilter);
    File mfile = chooser.showOpenDialog(new Stage());
    return mfile;
}
The issue is that you have code in your serialize and deserialize methods that doesn't belong there. They are too broad: not only do they serialize and deserialize objects (although the code for deserialization appears to be missing), they also open file choosers.
While this may make sense from a program flow point of view, it also breaks the single responsibility principle (note: I'm aware that SRP is usually applied to a class and not to a method). Your methods do too many things.
So you should write your methods in such a way that they accept a File or InputStream as an input parameter and return a collection of objects as a result (for deserialization), or take a collection of objects plus an OutputStream to write to (for serialization). That way, you can test exactly that behaviour.
Even better would be to write the test first: you could, for example, use a fixture file, both to read from and check that it produces the expected objects, and to verify that the ByteArrayOutputStream filled by the serializer matches that same file.
It's called test-driven development, and it's a good way to write testable and stable code.
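As a rough sketch of that split, keeping the question's DataTable and Chart types (the class name, the method names, and the javafx.util.Pair return type are assumptions of mine, not the original code):
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.HashMap;
import javafx.util.Pair;

public final class ChartPersistence {

    // Pure serialization: no UI, easily tested against a ByteArrayOutputStream.
    public static void save(HashMap<String, DataTable> datasets,
                            HashMap<String, Chart> charts,
                            OutputStream out) throws IOException {
        try (ObjectOutputStream objOutput = new ObjectOutputStream(out)) {
            objOutput.writeObject(datasets);
            objOutput.writeObject(charts);
        }
    }

    // Pure deserialization: pass a FileInputStream in production,
    // or a ByteArrayInputStream / fixture file in a unit test.
    @SuppressWarnings("unchecked")
    public static Pair<HashMap<String, DataTable>, HashMap<String, Chart>> load(InputStream in)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream objInput = new ObjectInputStream(in)) {
            HashMap<String, DataTable> datasets = (HashMap<String, DataTable>) objInput.readObject();
            HashMap<String, Chart> charts = (HashMap<String, Chart>) objInput.readObject();
            return new Pair<>(datasets, charts);
        }
    }
}
The FileChooser then lives in a thin UI method that only obtains the File and delegates to these two, and the unit tests call save and load directly with in-memory streams or a small fixture file.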
I am trying to use the Jackson streaming API to deserialize huge objects from XML. The idea is to combine the streaming API and ObjectMapper to parse XML (or JSON) in small chunks. However, I see some inconsistent behavior with the XML parser.
With this code snippet:
try {
    String xml1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo></foo>";
    String xml2 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo><bar></bar></foo>";
    XmlFactory xmlFactory = new XmlFactory();
    JsonParser jp = xmlFactory.createParser(new ByteArrayInputStream(xml1.getBytes()));
    JsonToken token = jp.nextToken();
    while (token != null) {
        System.out.println("xml1 token=" + token);
        token = jp.nextToken();
    }
    jp = xmlFactory.createParser(new ByteArrayInputStream(xml2.getBytes()));
    token = jp.nextToken();
    while (token != null) {
        System.out.println("xml2 token=" + token);
        token = jp.nextToken();
    }
} catch (IOException e) {
    e.printStackTrace();
}
I am getting:
xml1 token=START_OBJECT
xml1 token=END_OBJECT
xml2 token=START_OBJECT
xml2 token=FIELD_NAME
xml2 token=VALUE_NULL
xml2 token=END_OBJECT
Why is the FIELD_NAME token missing for xml1? Why is there just one START_OBJECT token for the second XML? Is there any setting that would allow me to see the FIELD_NAME of the outer tag?
The problem is quite simple: the XML module is different from most other Jackson dataformat modules in that direct access via the streaming API is not supported.
This is mentioned in the project README (along with the mention that the "tree model" is similarly not supported).
"Not supported" does not necessarily mean it cannot be used at all, just that its behavior differs from the handling of JSON, so callers really need to know what they are doing, above and beyond the API used for JSON content (and Smile, CBOR, and YAML; even CSV content is represented in a way that is compatible with JSON access).
While you can try to use XmlFactory and the streaming parser/generator directly, their behavior is controlled based on metadata from Java classes, to make things work correctly via the databinding API (that is, XmlMapper).
With that in mind, the reason for the observed tokens is that such a translation is necessary to map to the expected Java object structure:
public class Foo {
    public Bar bar;
}
which would map to JSON like:
{
"bar" : null
}
as well as XML of
<foo>
<bar></bar>
</foo>
Another way to put this is that the XML and JSON data models are fundamentally different, and they cannot be trivially translated. Since Jackson's token model is based on JSON, some work is needed to translate XML elements and attributes into the structure that the equivalent JSON would have.
The above is not to say that what you are trying to do is impossible. There are two ways you might be able to make things work:
Knowing the translation that XmlParser performs, call nextToken() and expect the translated token stream rather than a literal rendering of the XML.
Instead of using XmlParser directly, construct an XMLStreamReader (the low-level Stax streaming parser), read the "raw" tokens, and construct a separate XmlParser (via XmlFactory) at the expected location and use that for reading.
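As a rough sketch of the second option, here is the raw-token walk with the standard Stax API; handing off to a Jackson XmlParser at the interesting location, as described above, is deliberately left out:
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.ByteArrayInputStream;

public class RawXmlTokens {
    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo><bar></bar></foo>";
        XMLStreamReader sr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        while (sr.hasNext()) {
            int event = sr.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                System.out.println("start element: " + sr.getLocalName()); // foo, then bar
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                System.out.println("end element: " + sr.getLocalName());
            }
        }
        sr.close();
    }
}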
I hope this helps.
A kid with a hammer...
I don't know much about Jackson; in fact, I just started using it, thinking of using JSON or YAML instead of XML. But for XML, we have been using XStream with success.
//Consumer side
FileInputStream fis = new FileInputStream(filename);
XStream xs = new XStream();
Object obj = xs.fromXML(fis);
fis.close();
Also, if you are the one originating the serialization and it is done from Java, you could use Java serialization altogether for a lower footprint and faster operation.
//producer side
FileOutputStream fos = new FileOutputStream(filename);
ObjectOutputStream oos = new ObjectOutputStream(new BufferedOutputStream(fos));
oos.writeObject(yourVeryComplexObjectStructure); //I am writing a list of ten 1MB objects
oos.flush();
oos.close();
fos.close();
//Consumer side
final FileInputStream fin = new FileInputStream(filename);
final ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(fin));
@SuppressWarnings("unchecked")
final YourVeryComplexObjectStructureType object = (YourVeryComplexObjectStructureType) ois.readObject();
ois.close();
fin.close();
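One caveat: plain Java serialization only works if every class reachable from the written object implements java.io.Serializable, roughly like this (the class body is purely illustrative):
import java.io.Serializable;
import java.util.List;

public class YourVeryComplexObjectStructureType implements Serializable {
    private static final long serialVersionUID = 1L; // bump on incompatible layout changes

    private List<double[]> payload;
    // fields whose types are not Serializable must be marked transient or replaced
}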
I'm trying to create an Avro file in Java (just testing code at the moment). Everything works fine; the code looks about like this:
GenericRecord record = new GenericData.Record(schema);
File file = new File("test.avro");
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
dataFileWriter.create(schema, file);
dataFileWriter.append(record);
dataFileWriter.close();
The problem I'm facing now is: what kind of Java object do I instantiate when I want to write a union? Not necessarily at the top level; the union may be attached to a record being written. There are classes provided for a few complex types, like GenericData.Record, GenericData.Array, etc. For the types without such a class, the right object is usually a standard Java object (classes implementing java.util.Map for the "map" Avro type, etc.).
But I cannot figure out what the right object to instantiate is for writing a union.
This question refers to writing an Avro file WITHOUT code generation. Any help is very much appreciated.
Here's what I did:
Suppose the schema is defined like this:
record MyStructure {
...
record MySubtype {
int p1;
}
union {null, MySubtype} myField = null;
...
}
And this is the Java code:
Schema schema; // the schema of the main structure
// ....
GenericRecord rec = new GenericData.Record(schema);
// find the index of the MySubtype branch inside the union {null, MySubtype}
int i = schema.getField("myField").schema().getIndexNamed("MySubtype");
// instantiate a record for that branch and fill its fields
GenericRecord myField = new GenericData.Record(schema.getField("myField").schema().getTypes().get(i));
myField.put("p1", 100);
rec.put("myField", myField);
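For the other branch of the union nothing special is needed: you simply put null (or, since the default is null, leave the field unset). Assuming the DataFileWriter from the question's snippet, both cases are written the same way:
// the null branch of the union is just a plain Java null
GenericRecord recWithoutSub = new GenericData.Record(schema);
recWithoutSub.put("myField", null);

// both records go through the same writer
dataFileWriter.append(rec);
dataFileWriter.append(recWithoutSub);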
I used Weka Explorer:
Loaded the arff file
Applied StringToWordVector filter
Selected IBk as the best classifier
Generated/Saved my_model.model binary
In my Java code I deserialize the model:
URL curl = ClassUtility.findClasspathResource( "models/my_model.model" );
final Classifier cls = (Classifier) weka.core.SerializationHelper.read( curl.openConnection().getInputStream() );
Now I have the classifier, BUT I also need the information about the filter. Where I am stuck is: how do I prepare an instance to be classified by my deserialized model, i.e. how do I apply the filter before classification? (The raw instance I have to classify has a text field with tokens in it; the filter was supposed to transform that into a set of new attributes.)
I even tried to use a FilteredClassifier, where I set the classifier to the deserialized one and the filter to a manually created instance of StringToWordVector:
final StringToWordVector filter = new StringToWordVector();
filter.setOptions(new String[]{"-C", "-P x_", "-L"});
FilteredClassifier fcls = new FilteredClassifier();
fcls.setFilter(filter);
fcls.setClassifier(cls);
The above does not work either. It throws the exception:
Exception in thread "main" java.lang.NullPointerException: No output instance format defined
What I am trying to avoid is doing the training in the Java code. It can be very slow, I may well have multiple classifiers to train (with different algorithms as well), and I want my app to start fast.
Your problem is that your model doesn't know anything about what the filter did to the data. The StringToWordVector filter changes the data, but depending on the input (training) data. A model trained on this transformed data set will only work on data that underwent the exact same transformation. To guarantee this, the filter needs to be part of your model.
Using a FilteredClassifier is the correct idea, but you have to use it from the beginning:
Load the ARFF file
Select FilteredClassifier as classifier
Select StringToWordVector as filter for it
Select IBk as classifier for the FilteredClassifier
Generate/Save the model to my_model.model
The trained and serialized model will then also contain the initialized filter, including the information on how to transform data.
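If you prefer to do those steps in code rather than in the Explorer, a minimal sketch could look like this (the training-file path, class index, and output path are assumptions):
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class TrainAndSave {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("training.arff");
        train.setClassIndex(train.numAttributes() - 1);

        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(new StringToWordVector()); // the filter becomes part of the model
        fc.setClassifier(new IBk());
        fc.buildClassifier(train);

        // the serialized model now carries the initialized filter with it
        SerializationHelper.write("models/my_model.model", fc);
    }
}
At prediction time you deserialize this single object and call classifyInstance on raw, unfiltered instances; the embedded filter applies the vocabulary it learned during training.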
Another way to do this is to apply the same filter to your testing data as the one used on the training data. I describe the procedure in detail below. In your case, you just need to follow the steps after loading your serialized classifier.
Create your training file (e.g training.arff)
Create Instances from training file. Instances trainingData = ..
Use StringToWordVector to transform your string attributes to number representation:
sample code:
StringToWordVector filter = new StringToWordVector();
filter.setWordsToKeep(1000000);
if (useIdf) {
    filter.setIDFTransform(true);
}
filter.setTFTransform(true);
filter.setLowerCaseTokens(true);
filter.setOutputWordCounts(true);
filter.setMinTermFreq(minTermFreq);
filter.setNormalizeDocLength(new SelectedTag(StringToWordVector.FILTER_NORMALIZE_ALL, StringToWordVector.TAGS_FILTER));
NGramTokenizer t = new NGramTokenizer();
t.setNGramMaxSize(maxGrams);
t.setNGramMinSize(minGrams);
filter.setTokenizer(t);
WordsFromFile stopwords = new WordsFromFile();
stopwords.setStopwords(new File("data/stopwords/stopwords.txt"));
filter.setStopwordsHandler(stopwords);
if (useStemmer) {
    Stemmer s = new /*Iterated*/LovinsStemmer();
    filter.setStemmer(s);
}
filter.setInputFormat(trainingData);
Apply the filter to trainingData: trainingData = Filter.useFilter(trainingData, filter);
Select a classifier to create your model
sample code for LibLinear classifier
Classifier cls = null;
LibLINEAR liblinear = new LibLINEAR();
liblinear.setSVMType(new SelectedTag(0, LibLINEAR.TAGS_SVMTYPE));
liblinear.setProbabilityEstimates(true);
// liblinear.setBias(1); // default value
cls = liblinear;
cls.buildClassifier(trainingData);
Save model
sample code
System.out.println("Saving the model...");
ObjectOutputStream oos;
oos = new ObjectOutputStream(new FileOutputStream(path+"mymodel.model"));
oos.writeObject(cls);
oos.flush();
oos.close();
Create a testing file (e.g testing.arff)
Create Instances from the testing file: Instances testingData = ...
Load classifier
sample code
Classifier myCls = (Classifier) weka.core.SerializationHelper.read(path+"mymodel.model");
Use the same StringToWordVector filter as above, or create a new one for testingData, but remember to use the trainingData for this command: filter.setInputFormat(trainingData); This keeps the format of the training set and does not add words that are not in the training set.
Apply the filter to testingData: testingData = Filter.useFilter(testingData, filter);
Classify!
sample code
for (int j = 0; j < testingData.numInstances(); j++) {
    double res = myCls.classifyInstance(testingData.get(j));
}
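If you would rather not keep the raw trainingData around at prediction time, one option (assuming the filter is Serializable, which Weka filters normally are) is to serialize the fitted filter next to the model and load both later; the file name is an assumption:
// save the fitted filter alongside the classifier
weka.core.SerializationHelper.write(path + "mymodel.filter", filter);

// at prediction time, load it instead of rebuilding it from trainingData
StringToWordVector savedFilter = (StringToWordVector) weka.core.SerializationHelper.read(path + "mymodel.filter");
testingData = Filter.useFilter(testingData, savedFilter);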
What I'm trying to do is convert an object to XML, then transfer it as a String via a web service so another platform (.NET in this case) can read the XML and parse it back into the same object. I've been reading this article:
http://simple.sourceforge.net/download/stream/doc/tutorial/tutorial.php#start
And I've been able to do everything with no problems until here:
Serializer serializer = new Persister();
PacienteObj pac = new PacienteObj();
pac.idPaciente = "1";
pac.nomPaciente = "Sonia";
File result = new File("example.xml");
serializer.write(pac, result);
I know this will sound silly, but I can't find where Java creates the new File("example.xml"), so I can check the information.
I also want to know whether there is any way to convert that XML into a String instead of a File, because that's exactly what I need. I can't find that information in the article.
Thanks in advance.
I also want to know whether there is any way to convert that XML into a String instead of a File, because that's exactly what I need. I can't find that information in the article.
Check out the JavaDoc. There is a method that writes to a Writer, so you can hook it up to a StringWriter (which writes into a String):
StringWriter result = new StringWriter(expectedLength);
serializer.write(pac, result);
String s = result.toString();
You can use an instance of StringWriter:
Serializer serializer = new Persister();
PacienteObj pac = new PacienteObj();
pac.idPaciente = "1";
pac.nomPaciente = "Sonia";
StringWriter result = new StringWriter();
serializer.write(pac, result);
String xml = result.toString(); // xml now contains the serialized data
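If you also need the reverse direction on the Java side, the same Persister can read the object back from the String (read declares a checked Exception, so wrap the call accordingly):
PacienteObj restored = serializer.read(PacienteObj.class, xml);
System.out.println(restored.nomPaciente); // prints: Sonia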
Logging or printing the statement below will tell you where the file is on the file system:
System.out.println(result.getAbsolutePath());