How to build an RDF file using an XML file as input - Java

I have an input file in XML format and I need to convert it into a .rdf file based on an ontology model I have created.
Can anyone let me know what the suitable method is to do this using the Jena API in Java?

Is your input file in some arbitrary XML format, or is it already serialized as RDF/XML? (i.e., is the root tag of your document <rdf:RDF>?)
If it is in some arbitrary format, then you will need to define some RDF-based schema for representing your data. This is purely project-specific, and it will require work on your part to define how a graph should represent your data.
Once you have done that, basic document construction is a topic for the Jena tutorials. There is far too much material to cover here, but the basics of creating a statement should suffice:
final Model m = ModelFactory.createDefaultModel();
final Resource s = m.createResource("urn:ex:subject");
final Property p = m.createProperty("urn:ex:predicate");
final Resource o = m.createResource("urn:ex:object");
m.add(s, p, o);
// Files.createTempFile creates the file for us, so no open options are needed
try (final OutputStream out = Files.newOutputStream(Files.createTempFile("tmp", ".rdf"))) {
    m.write(out, "RDF/XML");
}
Iterating over your XML and constructing the proper set of statements is left as an exercise for the reader; a sketch of the general approach follows.
If your data is already in RDF/XML, then you can directly read it into a model:
// Assume you have an InputStream called 'in' pointing at your input data
final Model m = ModelFactory.createDefaultModel();
m.read(in, null, "RDF/XML"); // Assumed that there is no base

Related

Dumping values with quotes with SnakeYaml

I have a simple YAML file test.yml as follows:
color: 'red'
I load and dump the file as follows:
final DumperOptions yamlOptions = new DumperOptions();
yamlOptions.setDefaultFlowStyle(DumperOptions.FlowStyle.BLOCK);
Yaml yaml = new Yaml(yamlOptions);
Object result = yaml.load(new FileInputStream(new File("test.yml")));
System.out.println(yaml.dump(result));
I expect to get
color: 'red'
However, during the dump the serializer leaves out the quotes and prints
color: red
How can I make the serializer print the original quotes too?
Not with the high-level API. Quoting the spec:
The scalar style is a presentation detail and must not be used to convey content information, with the exception that plain scalars are distinguished for the purpose of tag resolution.
The high-level API implements the whole YAML loading process, giving you only the content of the YAML file, without any information about presentation details, as required by the spec.
That being said, you can use the low-level API, which preserves presentation details:
final Yaml yaml = new Yaml();
final Iterator<Event> events = yaml.parse(new StreamReader(new UnicodeReader(
        new FileInputStream(new File("test.yml"))))).iterator();
final DumperOptions yamlOptions = new DumperOptions();
final Emitter emitter = new Emitter(new PrintWriter(System.out), yamlOptions);
while (events.hasNext()) emitter.emit(events.next());
However, be aware that even this will not preserve every presentation detail of your input (e.g. indentation and comments will not be preserved). SnakeYaml is not a round-tripping library and is therefore unable to preserve the exact input layout.

Map Multiple CSV to single POJO

I have many CSV files with different column headers. Currently I am reading those CSV files and mapping them to different POJO classes based on their column headers. Some of the CSV files have around 100 column headers, which makes it difficult to create a POJO class.
Is there any technique where I can map those CSV files to a single POJO class? Or should I read each CSV file line by line and parse accordingly, or should I create the POJO at runtime (Javassist)?
If I understand your problem correctly, you can use uniVocity-parsers to process this and get the data in a map:
//First create a configuration object - there are many options
//available and the tutorial has a lot of examples
CsvParserSettings settings = new CsvParserSettings();
settings.setHeaderExtractionEnabled(true);
CsvParser parser = new CsvParser(settings);
parser.beginParsing(new File("/path/to/your.csv"));
// you can also apply some transformations:
// NULL year should become 0000
parser.getRecordMetadata().setDefaultValueOfColumns("0000", "Year");
// decimal separator in prices will be replaced by comma
parser.getRecordMetadata().convertFields(Conversions.replace("\\.00", ",00")).set("Price");
Record record;
while ((record = parser.parseNextRecord()) != null) {
    Map<String, String> map = record.toFieldMap(/*you can pass a list of column names of interest here*/);
    //for performance, you can also reuse the map and call record.fillFieldMap(map);
}
Or you can even parse the file and get beans of different types in a single step. Here's how you do it:
CsvParserSettings settings = new CsvParserSettings();
//Create a row processor to process input rows. In this case we want
//multiple instances of different classes:
MultiBeanListProcessor processor = new MultiBeanListProcessor(TestBean.class, AmountBean.class, QuantityBean.class);
// we also need to grab the headers from our input file
settings.setHeaderExtractionEnabled(true);
// configure the parser to use the MultiBeanProcessor
settings.setRowProcessor(processor);
// create the parser and run
CsvParser parser = new CsvParser(settings);
parser.parse(new File("/path/to/your.csv"));
// get the beans:
List<TestBean> testBeans = processor.getBeans(TestBean.class);
List<AmountBean> amountBeans = processor.getBeans(AmountBean.class);
List<QuantityBean> quantityBeans = processor.getBeans(QuantityBean.class);
See the examples in the uniVocity-parsers tutorial for more.
If your data is too big and you can't hold everything in memory, you can stream the input row by row by using the MultiBeanRowProcessor instead. The method rowProcessed(Map<Class<?>, Object> row, ParsingContext context) will give you a map of instances created for each class in the current row. Inside the method, just call:
AmountBean amountBean = (AmountBean) row.get(AmountBean.class);
QuantityBean quantityBean = (QuantityBean) row.get(QuantityBean.class);
...
//perform something with the instances parsed in a row.
Hope this helps.
Disclaimer: I'm the author of this library. It's open-source and free (Apache 2.0 license)
To me, creating a POJO class is not a good idea in this case, as neither the number of columns nor the number of files is constant.
Therefore, it is better to use something more dynamic, for which you do not have to change your code to a great extent just to support more columns or files.
I would go for a List of Maps (List<Map<String, String>>) for a given CSV file,
where each map represents a row in your CSV file with the column name as key.
You can easily extend this to multiple CSV files.

Parsing RDF items

I have a couple of lines of (I think) RDF data:
<http://www.test.com/meta#0001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class>
<http://www.test.com/meta#0002> <http://www.test.com/meta#CONCEPT_hasType> "BEAR"^^<http://www.w3.org/2001/XMLSchema#string>
Each line has 3 items in it. I want to pull out the local part of each item (the text after the last #). That would result in:
0001, type, Class
0002, CONCEPT_hasType, (BEAR, string)
Is there a library out there (Java or Scala) that would do this split for me? Or do I just need to shove String.split calls and assumptions into my code?
Most RDF libraries will have something to facilitate this. For example, if you parse your RDF data using Eclipse RDF4J's Rio parser, you will get back each line as a org.eclipse.rdf4j.model.Statement, with a subject, predicate and object value. The subject in both your lines will be an org.eclipse.rdf4j.model.IRI, which has a getLocalName() method you can use to get the part behind the last #. See the Javadocs for more details.
Assuming your data is in N-Triples syntax (which it seems to be given the example you showed us), here's a simple bit of code that does this and prints it out to STDOUT:
// parse the file into a Model object
InputStream in = new FileInputStream(new File("/path/to/rdf-data.nt"));
org.eclipse.rdf4j.model.Model model = Rio.parse(in, "", RDFFormat.NTRIPLES);
for (org.eclipse.rdf4j.model.Statement st : model) {
    org.eclipse.rdf4j.model.Resource subject = st.getSubject();
    if (subject instanceof org.eclipse.rdf4j.model.IRI) {
        System.out.print(((IRI) subject).getLocalName());
    } else {
        System.out.print(subject.stringValue());
    }
    // ... etc for predicate and object (the 2nd and 3rd elements in each RDF statement)
}
Update: if you don't want to read the data from a file but simply use a String, you can just use a java.io.StringReader instead of an InputStream:
StringReader r = new StringReader("<http://www.test.com/meta#0001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .");
org.eclipse.rdf4j.model.Model model = Rio.parse(r, "", RDFFormat.NTRIPLES);
Alternatively, if you don't want to parse the data at all and just want to do String processing, there is an org.eclipse.rdf4j.model.util.URIUtil class to which you can feed a string and which gives you back the index of the local name part:
String uri = "http://www.test.com/meta#0001";
String localpart = uri.substring(URIUtil.getLocalNameIndex(uri)); // will be "0001"
(disclosure: I am on the RDF4J development team)

Read XML data and store in a text file dynamically

I need to read XML data and store it in a text file. In my code I am hard-coding getTagValue for all the tag names. With 4 tag names I can hardcode getTagValue, but now I have 200 tags. How can I read the data into a text file without hard-coding getTagValue?
When using DOM to parse the XML you must know the exact structure of the XML, so there is no real way to avoid what you are doing.
If you have an XSD (if not, you can write one), you can generate Java objects from it using an XML binding framework like XmlBeans, and then with one line you can parse the XML and start working with regular Java objects.
Sample code would be:
File xmlFile = new File("c:\\employees.xml");
// Bind the instance to the generated XMLBeans types.
EmployeesDocument empDoc = EmployeesDocument.Factory.parse(xmlFile);
// Get and print pieces of the XML instance.
Employees emps = empDoc.getEmployees();
Employee[] empArray = emps.getEmployeeArray();
for (int i = 0; i < empArray.length; i++) {
    System.out.println(empArray[i]);
}

Externalize XML construction from a stream of CSV in Java

I get a stream of values as CSV; based on some condition I need to generate an XML document including only a subset of the values from the CSV. For example:
Input : a:value1, b:value2, c:value3, d:value4, e:value5.
if (condition1)
XML O/P = <Request><ValueOfA>value1</ValueOfA><ValueOfE>value5</ValueOfE></Request>
else if (condition2)
XML O/P = <Request><ValueOfB>value2</ValueOfB><ValueOfD>value4</ValueOfD></Request>
I want to externalize the process in such a way that, given a template, the output XML is generated accordingly. String manipulation is the easiest way of implementing this, but I do not want to mess up the XML if special characters appear in the input, etc. Please suggest.
Perhaps you could benefit from a templating engine, something like Apache Velocity. A sketch of the idea follows.
I would suggest creating an XSD and using JAXB to create the Java binding classes that you can use to generate the XML; see the sketch below.
I recommend my own templating engine (JATL, http://code.google.com/p/jatl/). Although it is geared to (X)HTML, it is also very good at generating XML.
I didn't bother solving the whole problem for you (that is, double splitting on the input: "," and then ":"), but this is how you would use JATL:
final String a = "stuff";
HtmlWriter html = new HtmlWriter() {
    @Override
    protected void build() {
        //If condition1
        start("Request").start("ValueOfA").text(a).end().end();
    }
};
//Now write.
StringWriter writer = new StringWriter();
String results = html.write(writer).getBuffer().toString();
Which would generate
<Request><ValueOfA>stuff</ValueOfA></Request>
All the correct escaping is handled for you.
