Could anyone please recommend a tutorial, or tell me how I can build a Java program that extracts information from XML files and produces the output as RDF triples using an existing ontology? An example would be really helpful.
Thanks
There are ready-made tools that address this problem, such as XSPARQL. You can write an XSPARQL query that queries the XML and produces RDF triples as output. This example should be pretty close to what you're looking for.
Your problem is really two problems:
parsing XML
writing RDF
For Java XML parsing, there are numerous examples on the web:
Java and XML - Tutorial
Java Examples in a Nutshell, Chapter 19, XML
Working with XML: The Java/XML Tutorial
For RDF there are fewer resources, since it's a much more specialized field:
What are some good Java RDF libraries?
In the past I worked with Jena – it offers a friendly API to the semantic web stack.
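To make the two steps concrete, here is a minimal sketch that uses only the JDK: XPath pulls values out of the XML, and the triples are written by hand as N-Triples lines. The namespaces, the `/library/book` layout, and the `Book` and `title` terms are made up for illustration; in a real project you would use the IRIs of your own ontology and let a library such as Jena build and serialize the model instead of concatenating strings. The sketch also assumes the `id` attribute is already safe to embed in an IRI.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XmlToTriples {
    // Hypothetical namespaces: replace with the IRIs of your ontology and data.
    static final String ONT = "http://example.org/ontology#";
    static final String DATA = "http://example.org/data/";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    /** Step 1: parse the XML. Step 2: emit one N-Triples line per statement. */
    static List<String> convert(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList books = (NodeList) xpath.evaluate(
                    "/library/book", doc, XPathConstants.NODESET);

            List<String> triples = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                // Assumes the id attribute needs no IRI escaping.
                String subject = "<" + DATA + "book/" + book.getAttribute("id") + ">";
                triples.add(subject + " <" + RDF_TYPE + "> <" + ONT + "Book> .");
                String title = xpath.evaluate("title", book);
                triples.add(subject + " <" + ONT + "title> " + literal(title) + " .");
            }
            return triples;
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not convert XML", e);
        }
    }

    /** Quotes a plain string literal, escaping backslashes, quotes and line breaks. */
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r") + "\"";
    }

    public static void main(String[] args) {
        String xml = "<library><book id=\"b1\"><title>Dune</title></book></library>";
        for (String triple : convert(xml)) {
            System.out.println(triple);
        }
    }
}
```

The mapping from XML elements to ontology terms lives entirely in `convert`, so that is the part you would adapt to your own schema.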
I would recommend the XmlToRdf Java library.
XmlToRdf offers fast conversion by using the built-in Java SAX parser to stream-convert your XML file to RDF. A large selection of configuration options (with sane defaults) makes it simple to adjust the conversion to your needs, including element renaming and advanced IRI generation with composite identifiers.
Output from the conversion can be written directly to file as RDF Turtle or added to a Sesame Repository or Jena Dataset for further processing. With Sesame and Jena it is possible to apply further SPARQL-based transformations to the data and to output formats such as RDF Turtle and JSON-LD.
Related
I have a rich text document (.rtf or .doc) that has a lot of data elements which need to be read and converted into structured data objects, either XML or JSON. These docs follow certain formats in terms of data. Are there any libraries that I can use to do the conversion in Java? Has anyone come across this type of scenario?
Has anyone tried Apache POI or Apache Tika to convert into XML?
I'd break this task into two parsers and two serializers:
1. Parse RTF to a Java model
2. Parse DOC to a Java model
3. Serialize the Java model to XML
4. Serialize the Java model to JSON
For 1 and 2 it's pretty standard to use POI.
For 3 and 4 you have many more options; a popular one is Jackson.
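As a rough illustration of the serializer half of that split, here is a JDK-only sketch. The `DocumentModel` class and its flat name/value layout are assumptions made for the example; a parser would populate it. The XML side uses the standard `XMLStreamWriter`, and the JSON side is a hand-written stand-in for what a library such as Jackson would normally do for you.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical intermediate model: field name -> value, in document order. */
final class DocumentModel {
    final Map<String, String> fields = new LinkedHashMap<>();

    /** Serializes the model to XML; the writer escapes text and attribute values. */
    String toXml() {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("document");
            for (Map.Entry<String, String> field : fields.entrySet()) {
                writer.writeStartElement("field");
                writer.writeAttribute("name", field.getKey());
                writer.writeCharacters(field.getValue());
                writer.writeEndElement();
            }
            writer.writeEndElement();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    /** Serializes the model to a flat JSON object (stand-in for a JSON library). */
    String toJson() {
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append(quote(field.getKey())).append(':').append(quote(field.getValue()));
        }
        return json.append('}').toString();
    }

    /** Quotes a JSON string, escaping quotes, backslashes and control characters. */
    private static String quote(String s) {
        StringBuilder quoted = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                quoted.append('\\').append(c);
            } else if (c < 0x20) {
                quoted.append(String.format("\\u%04x", (int) c));
            } else {
                quoted.append(c);
            }
        }
        return quoted.append('"').toString();
    }

    public static void main(String[] args) {
        DocumentModel model = new DocumentModel();
        model.fields.put("customer", "ACME & Co");
        model.fields.put("total", "42");
        System.out.println(model.toXml());
        System.out.println(model.toJson());
    }
}
```

Keeping the model independent of both input formats means each parser and each serializer can be swapped without touching the others.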
I'd suggest looking at RTF Parser Kit which you can use to populate a Java data structure suitable for further processing or persistence.
Is there any Java API available for reading and writing Graph Modeling Language (GML) files?
In fact, I am looking for any popular graph file format that is supported by a handy graph editor and visualizer (with good layout management), together with a convenient Java API for reading and writing graphs in that format.
My intention is to generate graphs in my application, save them in this standard format, feed them to the graph editor mentioned above for further manipulation, and save them again in the same format (which would naturally be readable again in my application).
Among the graph editors, I found yEd a handy one, and it supports the GML (Graph Modeling Language) format as well. According to the GML website there seems to be a C-language API. So I am looking for an API in Java (or Scala) that lets me read and write this format conveniently.
Try GraphStream. The API documentation can be found at: http://graphstream-project.org/api/gs-core/
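If you only need to write simple graphs, the GML text format is small enough to produce by hand. The sketch below is a hypothetical `GmlGraph` helper in plain Java, not GraphStream's API; it emits the basic `graph [ node [ ... ] edge [ ... ] ]` structure with numeric ids and labels. It assumes that quotes and ampersands in labels should be written as `&quot;` and `&amp;`, which is my reading of the GML convention, so check that against the editor you use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal graph holder that can print itself as GML. */
final class GmlGraph {
    private final List<String> nodeLabels = new ArrayList<>();
    private final List<int[]> edges = new ArrayList<>();

    /** Adds a node and returns its id (its position in insertion order). */
    int addNode(String label) {
        nodeLabels.add(label);
        return nodeLabels.size() - 1;
    }

    /** Adds a directed edge between two existing node ids. */
    void addEdge(int source, int target) {
        if (source < 0 || source >= nodeLabels.size()
                || target < 0 || target >= nodeLabels.size()) {
            throw new IllegalArgumentException("Unknown node id");
        }
        edges.add(new int[] {source, target});
    }

    /** Renders the graph as GML text. */
    String toGml() {
        StringBuilder gml = new StringBuilder("graph [\n  directed 1\n");
        for (int id = 0; id < nodeLabels.size(); id++) {
            gml.append("  node [\n    id ").append(id)
               .append("\n    label \"").append(escape(nodeLabels.get(id)))
               .append("\"\n  ]\n");
        }
        for (int[] edge : edges) {
            gml.append("  edge [\n    source ").append(edge[0])
               .append("\n    target ").append(edge[1]).append("\n  ]\n");
        }
        return gml.append("]\n").toString();
    }

    // Assumption: special characters in labels are written as entities.
    private static String escape(String label) {
        return label.replace("&", "&amp;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        GmlGraph graph = new GmlGraph();
        int a = graph.addNode("A");
        int b = graph.addNode("B");
        graph.addEdge(a, b);
        System.out.print(graph.toGml());
    }
}
```

Reading GML back is more work than writing it, because an editor may add layout attributes of its own, so a library is the better choice for the round trip.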
Do you know of any tool that creates an AST from a Java program or class and produces an XML representation (a collection of documents or a single XML document) of that AST?
kind regards,
Johannes
Not any tools directly, but http://www.antlr.org/ is the de facto tool for building ASTs from any general language, and there exist several grammar files for Java that you can repurpose for your own programs. So grab ANTLR, use the latest Java grammar, and write out the XML representation you want.
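For a sense of what "walk the AST and write XML" looks like, here is a sketch that swaps ANTLR for the Compiler Tree API that ships with the JDK (`com.sun.source`), simply because it needs no extra dependency. It is not the ANTLR approach itself. It parses a source string and prints one XML element per tree node, named after the node kind. The `AstToXml` class name and the choice to record names only for classes and methods are my own, and it has to run on a JDK rather than a bare JRE.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.net.URI;
import java.util.Collections;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class AstToXml extends TreeScanner<Void, Void> {
    private final StringBuilder out = new StringBuilder();
    private int depth;

    /** Called for every node: open an element, visit the children, close it. */
    @Override
    public Void scan(Tree tree, Void unused) {
        if (tree == null) {
            return null;
        }
        String tag = tree.getKind().name();
        String name = "";
        if (tree instanceof ClassTree) {
            name = ((ClassTree) tree).getSimpleName().toString();
        } else if (tree instanceof MethodTree) {
            name = ((MethodTree) tree).getName().toString();
        }
        indent().append('<').append(tag);
        if (!name.isEmpty()) {
            out.append(" name=\"").append(escape(name)).append('"');
        }
        out.append(">\n");
        depth++;
        super.scan(tree, unused);
        depth--;
        indent().append("</").append(tag).append(">\n");
        return null;
    }

    private StringBuilder indent() {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        return out;
    }

    // Constructors are named "<init>", so attribute values must be escaped.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Parses (without compiling) the given source text and returns the XML dump. */
    static String dump(final String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK");
        }
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Input.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(file));
        AstToXml dumper = new AstToXml();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                dumper.scan(unit, null);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Could not parse source", e);
        }
        return dumper.out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump("class Greeter { int twice(int x) { return x * 2; } }"));
    }
}
```

Even for a one-line class the dump is many lines long, because every modifier list, type and expression becomes its own element.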
Our DMS Software Reengineering Toolkit with its Java Front End can do this directly. You ask DMS to parse the file, and produce an XML dump using a command line switch ++XML.
See What would an AST (abstract syntax tree) for an object-oriented programming language look like?.
As a general rule, we don't recommend this, for several reasons:
XML output for real files is really enormous, and takes a lot of time to write and read
Most people do this because they believe with an XML representation that just a little bit of XSLT will get them what they want
If you intend to modify the code, once you have the XML you pretty much can't regenerate it.
The machinery that DMS provides (attribute grammars, symbol tables, flow analyses, pattern matching and source-to-source transformations, source regeneration from the AST) is what you really want, and you get access to it by using DMS after the parsing step, without ever exporting the XML.
I am building an app in Java using Jena for semantic information scraping. I am looking for an RDFa parser that would allow me to correctly extract all the RDFa statements. Specifically, I need one that extracts information about the namespaces used and, presuming the RDFa tags in the page are correct, produces correct triples that distinguish between object and data properties.
I went through all the Java RDFa parsers listed at http://rdfa.info/wiki/Consume. They all struggle to extract any RDFa statements. The Jena RDFa parser shows plenty of errors and then dies a terrible death, and when the parsers do not crash, the data is of little use because it is incorrectly processed and generally mixed up. I am a newbie in this area, so please be gentle :)
I was also thinking of using a library written in a different language, but then again I don't really know how to plug it into Java code. Any suggestions?
Most RDFa parsers struggle with invalid HTML. The any23 library includes an RDFa parser that can deal with invalid HTML. It parses any RDFa into full RDF, including namespace mappings and so on, and is under active development.
Use java-rdfa. It supports Jena and uses the validator.nu HTML5 parser, which parses HTML the way a browser does (i.e. it will repair broken markup).
I need to read an XML file using Java. Its contents are something like
<ReadingFile>
<csvFile>
<fileName>C:/Input.csv</fileName>
<delimiter>COMMA</delimiter>
<tableFieldNamesList>COMPANYNAME|PRODUCTNAME|PRICE</tableFieldNamesList>
<fieldProcessorDescriptorSize>20|20|20</fieldProcessorDescriptorSize>
<fieldName>company_name|product_name|price</fieldName>
</csvFile>
</ReadingFile>
Is there any special reader/JARs or should we read using FileInputStream?
Check out Java's JAXP APIs which come as standard. You can read the XML in from the file into a DOM (object model), or as SAX - a series of events (your code will receive an event for each start-of-element, end-of-element etc.). For both DOM and SAX, I would look at an API tutorial to get started.
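As a starting point for the DOM route, here is a small sketch using only the JAXP classes in the JDK. The `CsvFileConfig` holder class is made up for this example, it reads just three of the elements from the question, and it parses from a string to stay self-contained; for a real file you would pass a `java.io.File` to `DocumentBuilder.parse` instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical holder for the values in the <csvFile> element. */
final class CsvFileConfig {
    final String fileName;
    final String delimiter;
    final List<String> fieldNames;

    private CsvFileConfig(String fileName, String delimiter, List<String> fieldNames) {
        this.fileName = fileName;
        this.delimiter = delimiter;
        this.fieldNames = fieldNames;
    }

    /** Loads the whole document into a DOM tree, then picks elements by tag name. */
    static CsvFileConfig parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return new CsvFileConfig(
                    text(doc, "fileName"),
                    text(doc, "delimiter"),
                    // The field names are pipe-separated inside one element.
                    Arrays.asList(text(doc, "fieldName").split("\\|")));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Returns the trimmed text of the first element with the given tag name. */
    private static String text(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("Missing <" + tag + "> element");
        }
        return nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<ReadingFile><csvFile>"
                + "<fileName>C:/Input.csv</fileName>"
                + "<delimiter>COMMA</delimiter>"
                + "<fieldName>company_name|product_name|price</fieldName>"
                + "</csvFile></ReadingFile>";
        CsvFileConfig config = parse(xml);
        System.out.println(config.fileName + " " + config.delimiter + " " + config.fieldNames);
    }
}
```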
Alternatively, you may find JDOM easier/more intuitive to use.
Another suggestion: try out Commons Digester. This allows you to develop parsing code very quickly using a rule-based approach. There's a tutorial here and the library is available here.
I also agree with Brian and Alzoid that JAXB is great for getting up and running quickly. You can use the xjc binding compiler that ships with the JDK (up to Java 8; JAXB and xjc were removed from the JDK in Java 11 and are now separate downloads) to auto-generate your Java classes from an XML schema.
XStream would do very nicely here. Check out the one-page tutorial.
You can use external libraries like
Castor https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-1046622.html
I have used Castor in the past. Here are a few other links that might help.
http://www.xml-training-guide.com/e-xml27.html
http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/XMLReader.html
http://www.cafeconleche.org/books/xmljava/chapters/ch07.html
There are two major ways to parse XML with Java. The first is to use a SAX parser (see here), which is fairly simple.
The second option is to use a DOM parser (see here), which is more complicated but gives you more control.
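To show what the SAX style looks like in practice, here is a minimal sketch using the parser bundled with the JDK. The `LeafElementHandler` class is a made-up name; it records the text of every element that directly contains text, keyed by element name, which is enough for a flat file like the one in the question (repeated element names would overwrite each other).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects element name -> text for text-bearing elements. */
final class LeafElementHandler extends DefaultHandler {
    final Map<String, String> values = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // a new element starts, so discard text seen so far
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            values.put(qName, value);
        }
        text.setLength(0);
    }

    /** Streams through the XML once and returns the collected values. */
    static Map<String, String> read(String xml) {
        try {
            LeafElementHandler handler = new LeafElementHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ReadingFile><csvFile>"
                + "<fileName>C:/Input.csv</fileName>"
                + "<delimiter>COMMA</delimiter>"
                + "</csvFile></ReadingFile>";
        System.out.println(read(xml));
    }
}
```

SAX never holds the whole document in memory, which is why it suits large files; the price is that you track state yourself, as the `text` buffer does here.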
JAXB is another technology that might suit your needs.