Parser to parse unknown XML Schema in java

I have read the other answers on Stack Overflow, but I am not able to relate them to my question.
When I call a web service, I get a response. I get the schema with response.getData(), which returns a String (the XML of the data table containing the results). We don't know in advance what data that XML contains.
I need a third-party parser with a method that takes this string and returns all the elements in the XML, so that I can then print the ones I need.
I don't want to write the XML parsing myself. Is there a way to do this? (Does it even make sense?) Sorry if I am totally wrong. (Using Axis2/Eclipse.) (Edited)
Edit: Adding the code I've already tried.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
NodeList nodeList = null;
try {
String xml = res2.getResult().getRawData();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(new ByteArrayInputStream(xml.getBytes()));
nodeList = document.getElementsByTagName("PhoneNumber");
NamedNodeMap attrib = document.getAttributes();
for (int i = 0; i < attrib.getLength(); i++) {
String nodeName = attrib.item(i).getNodeName();
//nodeName
String nodeValue = attrib.item(i).getNodeValue();
}
But I am not sure whether the phone number is under that tag or under another name. Also, we don't know how many tags there are.
Thanks. Using the code by SyamS, I am able to print all the nodes and their values from the XML. Now I want to store them in a HashMap, with the node name as the key and the node values in a list.
Example XML:
<Docs>
<Doc>
<Id>12</Id>
<Phone>1234</Phone>
</Doc>
<Doc>
<Id>147</Id>
<Phone>12345</Phone>
<Locked>false</Locked>
<BID>2</BID>
<DocId>8</DocId>
<Date>2014-02-04T12:18:50.063-07:00</Date>
<Urgent>false</Urgent>
</Doc>
</Docs>

You don't need a third-party library for this. You can simply identify all leaf elements with the JDK's StAX streaming parser (XPath would also work) and read their values (and attributes, if needed). For example:
public static Map<String, List<String>> parseXml(String xml) throws XMLStreamException {
StringBuilder content = null;
Map<String, List<String>> dataMap = new HashMap<>();
XMLInputFactory factory = XMLInputFactory.newInstance();
InputStream stream = new ByteArrayInputStream(xml.getBytes());
XMLStreamReader reader = factory.createXMLStreamReader(stream);
while (reader.hasNext()) {
int event = reader.next();
switch (event) {
case XMLStreamConstants.START_ELEMENT:
content = new StringBuilder();
break;
case XMLStreamConstants.CHARACTERS:
if (content != null) {
content.append(reader.getText().trim());
}
break;
case XMLStreamConstants.END_ELEMENT:
if (content != null) {
String leafText = content.toString();
if(dataMap.get(reader.getLocalName()) == null){
List<String> values = new ArrayList<>();
values.add(leafText);
dataMap.put(reader.getLocalName(), values);
} else {
dataMap.get(reader.getLocalName()).add(leafText);
}
}
content = null;
break;
case XMLStreamConstants.START_DOCUMENT:
break;
}
}
return dataMap;
}
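The same grouping can also be done with XPath, which the answer mentions as an option. The following is only a minimal sketch under stated assumptions: the class name LeafCollector and the method collectLeaves are made up for illustration, and a "leaf" is taken to mean an element with no child elements, selected with the expression //*[not(*)].

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class LeafCollector {

    /** Groups the trimmed text of every leaf element by its tag name. */
    static Map<String, List<String>> collectLeaves(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // "//*[not(*)]" selects every element that has no element children.
        NodeList leaves = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//*[not(*)]", doc, XPathConstants.NODESET);

        // LinkedHashMap keeps the tag names in order of first appearance.
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (int i = 0; i < leaves.getLength(); i++) {
            Node leaf = leaves.item(i);
            result.computeIfAbsent(leaf.getNodeName(), k -> new ArrayList<>())
                  .add(leaf.getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Docs><Doc><Id>12</Id><Phone>1234</Phone></Doc>"
                   + "<Doc><Id>147</Id><Phone>12345</Phone></Doc></Docs>";
        // prints {Id=[12, 147], Phone=[1234, 12345]}
        System.out.println(collectLeaves(xml));
    }
}
```

This version loads the whole document into memory, so for very large responses the streaming StAX approach is likely the better fit.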

You should read the answers under "Best XML parser for Java". Using the example XML from "Sample XML File (books.xml)", which I've downloaded to a temp folder on my C: drive, you could use Java's built-in SAXParser. Here's an example class that iterates through all the elements in the XML. Create the class in your project and call its parseXml method like this:
File xml = new File("c:/temp/books.xml");
MySaxParser sax = new MySaxParser(xml);
sax.parseXml();
This is the class you can copy into your project to try it out. Modify it to suit your needs, of course. The imports should point you to the relevant Java API pages, starting with the SAXParser class.
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class MySaxParser extends DefaultHandler {
private String absolutePathToXml = "";
public MySaxParser(File xml) {
absolutePathToXml = xml.getAbsolutePath();
}
/**
* Parses an XML file into memory
*/
public void parseXml() {
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
SAXParser parser = factory.newSAXParser();
parser.parse(absolutePathToXml, this);
} catch (ParserConfigurationException e) {
System.out.println("ParserConfigurationException: ");
e.printStackTrace();
} catch (SAXException e) {
System.out.println("SAXException: ");
e.printStackTrace();
} catch (IOException e) {
System.out.println("IOException: ");
e.printStackTrace();
}
}
/**
* Event: Parser starts reading an element
*/
@Override
public void startElement(String s1, String s2, String elementName, Attributes attributes)
throws SAXException {
//print an element's name
System.out.println("element: " + elementName);
//print all attributes for this element
for(int i = 0; i < attributes.getLength(); i++) {
System.out.println("attribute: " + attributes.getValue(i));
}
}
}

Calling and Testing java method

I have built some code to ingest an XML document and parse through the values, but now I'm stuck on testing the method.
What is the proper way to unit test this method? I'm also unsure how to pass an XML document to it.
public class WebServiceTools
{
static public String getVersionFromWSResponseFromDOM(Document responseDocument) {
String versionDataAsXML = badData;
try {
responseDocument.normalizeDocument();
NodeList resultList = responseDocument.getElementsByTagName("ti:VersionResponse");
Element resultElement = (Element) resultList.item(0);
if (!badData.equalsIgnoreCase(resultElement.getTextContent())) {
versionDataAsXML = resultElement.getTextContent().trim();
}
} catch (Exception e) {
e.printStackTrace();
}
return versionDataAsXML;
}
}
package org.examples.tools;
import java.lang.reflect.Method;
public class ReflectApp {
public static void main(String[] args) {
//String parameter
Class[] paramString = new Class[1];
paramString[0] = String.class;
try{
//load the AppTest at runtime
Class cls = Class.forName("org.examples.tools.WebServiceTools");
Object obj = cls.newInstance();
//call the printItString method, pass a String param
Method method = cls.getDeclaredMethod("printItString", paramString);
method.invoke(obj, new String(" Do I put document here? "));
}catch(Exception ex){
ex.printStackTrace();
}
}
package org.examples.tools;
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
public class TestGetVersion {
public static void main (String[] args) throws Exception {
String fileName = "C:/examples/VersionResponse.xml"; // Set path to file
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new File(fileName));
// Create doc
String result = WebServiceTools.getVersionFromWSResponseFromDOM(doc);
// Treat result
System.out.print(result);
}
}

I understand your problem is how to read an XML file into a Document, right?
There are several ways and libraries to read an XML from a file: Java: How to read and write xml files?
For instance:
String fileName = ""; // Set path to file
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new File(fileName));
Then, just call your method from a main function or a JUnit test:
public static void main (String[] args) {
// Create doc
String result = WebServiceTools.getVersionFromWSResponseFromDOM(doc);
// Treat result
}
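For a test, the Document does not have to come from a file; it can be built from an XML string held in the test itself. The sketch below is an illustration under assumptions: the class VersionResponseCheck, the helper firstText (a stand-in for the lookup done inside getVersionFromWSResponseFromDOM) and the namespace URI urn:example are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical test helper, not part of the original code.
class VersionResponseCheck {

    /** Builds a DOM Document from an XML string, so no file is needed. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the trimmed text of the first element with this tag name, or null if absent. */
    static String firstText(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // "urn:example" is a made-up namespace URI for this sketch.
        Document doc = parse(
                "<ti:VersionResponse xmlns:ti=\"urn:example\"> 1.2.3 </ti:VersionResponse>");
        // prints 1.2.3
        System.out.println(firstText(doc, "ti:VersionResponse"));
    }
}
```

In a JUnit test, the same parse helper could feed the Document straight into WebServiceTools.getVersionFromWSResponseFromDOM and the result could be compared with the expected version string.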
UPDATE
REFLECTION
Your cls.getDeclaredMethod("printItString", paramString); is correct, although passing the parameter types through a variable named paramString is confusing; at first sight I thought it was a String. I would rather write:
Method method = cls.getDeclaredMethod("printItString", new Class[] {String.class});
I think this makes it clearer (just my opinion).
Calling the method through reflection is just what you did. Didn't it work?
Object result = method.invoke(obj, new Object[] {"whatever string"});
I assume printItString is a method of the WebServiceTools class with the signature printItString(String param).
Unmarshall
First things first: as far as I know, unmarshalling usually means converting XML back into an object (in XML serialization libraries such as XStream or JAXB), but I guess you meant converting a Document back to a String. Am I right?
If so, here is one way that works:
Source source = new DOMSource(doc);
StringWriter writer = new StringWriter();
Result result = new StreamResult(writer);
// create a transformer
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = transFactory.newTransformer();
// set some options on the transformer
transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
// transform the xml document into a string
transformer.transform(source, result);
String xml = writer.getBuffer().toString();
If this is not what you meant, please clarify.

how to split an XML file into multiple XML files using java

I'm using XML files in Java for the first time and I need some help. I am trying to split an XML file into multiple XML files using Java.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<products>
<product>
<description>Sony 54.6" (Diag) Xbr Hx929 Internet Tv</description>
<gtin>00027242816657</gtin>
<price>2999.99</price>
<orderId>2343</orderId>
<supplier>Sony</supplier>
</product>
<product>
<description>Apple iPad 2 with Wi-Fi 16GB - iOS 5 - Black
</description>
<gtin>00885909464517</gtin>
<price>399.0</price>
<orderId>2343</orderId>
<supplier>Apple</supplier>
</product>
<product>
<description>Sony NWZ-E464 8GB E Series Walkman Video MP3 Player Blue
</description>
<gtin>00027242831438</gtin>
<price>91.99</price>
<orderId>2343</orderId>
<supplier>Sony</supplier>
</product>
<product>
<description>Apple MacBook Air A 11.6" Mac OS X v10.7 Lion MacBook
</description>
<gtin>00885909464043</gtin>
<price>1149.0</price>
<orderId>2344</orderId>
<supplier>Apple</supplier>
</product>
<product>
<description>Panasonic TC-L47E50 47" Smart TV Viera E50 Series LED
HDTV</description>
<gtin>00885170076471</gtin>
<price>999.99</price>
<orderId>2344</orderId>
<supplier>Panasonic</supplier>
</product>
</products>
and I'm trying to get three XML documents like:
<?xml version="1.0" encoding="UTF-8"?>
<products>
<product>
<description>Sony 54.6" (Diag) Xbr Hx929 Internet Tv</description>
<gtin>00027242816657</gtin>
<price currency="USD">2999.99</price>
<orderid>2343</orderid>
</product>
<product>
<description>Sony NWZ-E464 8GB E Series Walkman Video MP3 Player Blue</description>
<gtin>00027242831438</gtin>
<price currency="USD">91.99</price>
<orderid>2343</orderid>
</product>
</products>
one for each supplier. How can I achieve this? Any help would be great.

Make sure you change the path in inputFile to point to your file, and adjust the output path as well:
StreamResult result = new StreamResult(new File("C:\\xmls\\" + supplier.trim() + ".xml"));
Here is the code:
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class ExtractXml
{
/**
* @param args
*/
public static void main(String[] args) throws Exception
{
String inputFile = "resources/products.xml";
File xmlFile = new File(inputFile);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // never forget this!
XPathFactory xfactory = XPathFactory.newInstance();
XPath xpath = xfactory.newXPath();
XPathExpression allProductsExpression = xpath.compile("//product/supplier/text()");
NodeList productNodes = (NodeList) allProductsExpression.evaluate(doc, XPathConstants.NODESET);
//Collect each supplier name once (a Set drops the duplicates, so each file is written only once)
java.util.Set<String> suppliers = new java.util.LinkedHashSet<String>();
for (int i=0; i<productNodes.getLength(); ++i)
{
Node productName = productNodes.item(i);
System.out.println(productName.getTextContent());
suppliers.add(productName.getTextContent());
}
//Now we create the split XMLs
for (String supplier : suppliers)
{
String xpathQuery = "/products/product[supplier='" + supplier + "']";
xpath = xfactory.newXPath();
XPathExpression query = xpath.compile(xpathQuery);
NodeList productNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
System.out.println("Found " + productNodesFiltered.getLength() +
" product(s) for supplier " + supplier);
//We store the new XML file in supplierName.xml e.g. Sony.xml
Document suppXml = dBuilder.newDocument();
//we have to recreate the root node <products>
Element root = suppXml.createElement("products");
suppXml.appendChild(root);
for (int i=0; i<productNodesFiltered.getLength(); ++i)
{
Node productNode = productNodesFiltered.item(i);
//we append a product (cloned) to the new file
Node clonedNode = productNode.cloneNode(true);
suppXml.adoptNode(clonedNode); //We adopt the orphan :)
root.appendChild(clonedNode);
}
//At the end, we save the file XML on disk
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(suppXml);
StreamResult result = new StreamResult(new File("resources/" + supplier.trim() + ".xml"));
transformer.transform(source, result);
System.out.println("Done for " + supplier);
}
}
}

A DOM parser consumes more memory, because it loads the whole document. I prefer to use a SAX parser to read the XML and write it out.

I like the approach of Xmappr (https://code.google.com/p/xmappr/), where you can use simple annotations.
First, the root element Products, which simply holds a list of Product instances:
@RootElement
public class Products {
@Element
public List<Product> product;
}
Then the Product class:
@RootElement
public class Product {
@Element
public String description;
@Element
public String supplier;
@Element
public String gtin;
@Element
public String price;
@Element
public String orderId;
}
And then you simply fetch the Product instances from the Products object:
public static void main(String[] args) throws FileNotFoundException {
Reader reader = new FileReader("test.xml");
Xmappr xm = new Xmappr(Products.class);
Products products = (Products) xm.fromXML(reader);
// fetch list of products
List<Product> listOfProducts = products.product;
// do sth with the products in the list
for (Product product : listOfProducts) {
System.out.println(product.description);
}
}
After that you can do whatever you want with the products (e.g. sort them by supplier and write each group out to an XML file).

You can have a look here to see how to parse an XML document using DOM in Java:
DOM XML Parser Example
Here is how to write the new XML file(s):
Create XML file using java
In addition, you could study XPath to select your nodes easily: Java Xpath expression
If performance is not your main concern, then once you have loaded the DOM and set up XPath, you can retrieve all the suppliers in your XML document with the following XPath query:
//supplier/text()
You will get something like this:
Text='Sony'
Text='Apple'
Text='Sony'
Text='Apple'
Text='Panasonic'
Then I would put those results in an ArrayList or a similar collection. The second step is to iterate over that collection and, for each item, query the input XML document to extract all the nodes with that particular supplier:
/products/product[supplier='Sony']
Of course, in Java you will have to build this last XPath query dynamically:
String xpathQuery = "/products/product[supplier='" + currentValue + "']";
After that, you will get the list of nodes that match the supplier you specified. The next step is to construct the new XML DOM and save it to a file.
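The two XPath steps described above could be sketched as follows. This is an illustration under assumptions, not the answer's own code: the class SupplierQueries and its methods are hypothetical names, the supplier list is de-duplicated with a LinkedHashSet, and the supplier name is bound as an XPath variable ($name) instead of being concatenated into the query, so that names containing quotes do not break the expression.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for this sketch.
class SupplierQueries {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Step 1: returns each supplier name once, in order of first appearance. */
    static Set<String> distinctSuppliers(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList names = (NodeList) xpath.evaluate(
                "//supplier/text()", doc, XPathConstants.NODESET);
        Set<String> result = new LinkedHashSet<>();
        for (int i = 0; i < names.getLength(); i++) {
            result.add(names.item(i).getNodeValue().trim());
        }
        return result;
    }

    /** Step 2: selects the product elements that belong to one supplier. */
    static NodeList productsOf(Document doc, String supplier) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The resolver answers every variable lookup with the supplier name;
        // the expression below only uses $name.
        xpath.setXPathVariableResolver(variableName -> supplier);
        return (NodeList) xpath.evaluate(
                "/products/product[normalize-space(supplier) = $name]",
                doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<products>"
                + "<product><gtin>1</gtin><supplier>Sony</supplier></product>"
                + "<product><gtin>2</gtin><supplier>Apple</supplier></product>"
                + "<product><gtin>3</gtin><supplier>Sony</supplier></product>"
                + "</products>");
        for (String supplier : distinctSuppliers(doc)) {
            System.out.println(supplier + ": " + productsOf(doc, supplier).getLength());
        }
    }
}
```

Each NodeList returned by productsOf can then be cloned into a fresh Document and written out with a Transformer, one file per supplier.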

Consider this xml
<?xml version="1.0"?>
<SSNExportDocument xmlns="urn:com:ssn:schema:export:SSNExportFormat.xsd" Version="0.1" DocumentID="b482350d-62bb-41be-b792-8a9fe3884601-1" ExportID="b482350d-62bb-41be-b792-8a9fe3884601" JobID="464" RunID="3532468" CreationTime="2019-04-16T02:20:01.332-04:00" StartTime="2019-04-15T20:20:00.000-04:00" EndTime="2019-04-16T02:20:00.000-04:00">
<MeterData MeterName="MUNI1-11459398" UtilDeviceID="11459398" MacID="00:12:01:fae:fe:00:d5:fc">
<RegisterData StartTime="2019-04-15T20:00:00.000-04:00" EndTime="2019-04-15T20:00:00.000-04:00" NumberReads="1">
<RegisterRead ReadTime="2019-04-15T20:00:00.000-04:00" GatewayCollectedTime="2019-04-16T01:40:06.214-04:00" RegisterReadSource="REG_SRC_TYPE_EO_CURR_READ" Season="-1">
<Tier Number="0">
<Register Number="1" Summation="5949.1000" SummationUOM="GAL"/>
</Tier>
</RegisterRead>
</RegisterData>
</MeterData>
<MeterData MeterName="MUNI4-11460365" UtilDeviceID="11460365" MacID="00:11:01:bc:fe:00:d3:f9">
<RegisterData StartTime="2019-04-15T20:00:00.000-04:00" EndTime="2019-04-15T20:00:00.000-04:00" NumberReads="1">
<RegisterRead ReadTime="2019-04-15T20:00:00.000-04:00" GatewayCollectedTime="2019-04-16T01:40:11.082-04:00" RegisterReadSource="REG_SRC_TYPE_EO_CURR_READ" Season="-1">
<Tier Number="0">
<Register Number="1" Summation="136349.9000" SummationUOM="GAL"/>
</Tier>
</RegisterRead>
</RegisterData>
</MeterData>
We can use JAXB which converts your xml tags to objects. Then we can play around with them.
File xmlFile = new File("input.xml");
JAXBContext jaxbContext = JAXBContext.newInstance(SSNExportDocument.class);
Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
SSNExportDocument ssnExpDoc = (SSNExportDocument) jaxbUnmarshaller.unmarshal(xmlFile);
MeterData mD = new MeterData();
Map<String, List<MeterData>> meterMapper = new HashMap<String, List<MeterData>>(); // Phantom Reference
for (MeterData mData : ssnExpDoc.getMeterData()) {
String meterFullName = mData.getMeterName();
String[] splitMeterName = meterFullName.split("-");
List<MeterData> _meterDataList = meterMapper.get(splitMeterName[0]);// o(1)
if (_meterDataList == null) {
_meterDataList = new ArrayList<>();
_meterDataList.add(mData);
meterMapper.put(splitMeterName[0], _meterDataList);
_meterDataList = null;
} else {
_meterDataList.add(mData);
}
}
meterMapper now maps each meter-name prefix (the part before the "-") to the list of matching MeterData objects.
Then marshal the contents using:
JAXBContext jaxbContext = JAXBContext.newInstance(SSNExportDocument.class);
// Create Marshaller
Marshaller jaxbMarshaller = jaxbContext.createMarshaller();
// Required formatting??
jaxbMarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
jaxbMarshaller.setProperty(Marshaller.JAXB_FRAGMENT, Boolean.TRUE);
//jaxbMarshaller.setProperty("com.sun.xml.bind.xmlDeclaration", Boolean.FALSE);
// Print XML String to Console
StringWriter sw = new StringWriter();
// Write XML to StringWriter
jaxbMarshaller.marshal(employee, sw);
// Verify XML Content
String xmlContent = sw.toString();
System.out.println(xmlContent);

Not a perfect solution, but it works in most cases. I had to play around with some string operations to make it work. Basically, this solution splits the given XML at a given element, forms sub-XMLs, and writes them to a list.
public static void main(String[] args) {
java.io.File inputFile = new java.io.File("input.xml");
String elementSplitString = "product";
java.io.InputStream inputStream = null;
try {
inputStream = new java.io.BufferedInputStream(new java.io.FileInputStream(inputFile));
javax.xml.stream.XMLInputFactory inputFactory = javax.xml.stream.XMLInputFactory.newInstance();
javax.xml.stream.XMLOutputFactory outputFactory = javax.xml.stream.XMLOutputFactory.newInstance();
javax.xml.stream.XMLEventReader reader = inputFactory.createXMLEventReader(inputStream);
javax.xml.stream.XMLEventWriter writer = null;
StringWriter parentXMLStringWriter = new StringWriter();
javax.xml.stream.XMLEventWriter headerWriter = outputFactory.createXMLEventWriter(parentXMLStringWriter);
StringWriter stringWriter = null;
String lastReadEvent = "";
boolean splitElementFound = false;
List<StringBuilder> list = new ArrayList<StringBuilder>();
while (reader.hasNext()) {
javax.xml.stream.events.XMLEvent event = reader.nextEvent();
switch(event.getEventType()) {
case javax.xml.stream.XMLStreamConstants.START_ELEMENT:
javax.xml.stream.events.StartElement startElement = (javax.xml.stream.events.StartElement)event;
if (startElement.getName().getLocalPart().equals(elementSplitString)) {
splitElementFound = true;
stringWriter = new StringWriter();
writer = outputFactory.createXMLEventWriter(stringWriter);
if (writer != null) writer.add(event);
} else if(writer != null)
writer.add(event);
break;
case javax.xml.stream.XMLStreamConstants.END_ELEMENT:
javax.xml.stream.events.EndElement endElement = (javax.xml.stream.events.EndElement)event;
if (endElement.getName().getLocalPart().equals(elementSplitString)) {
if (writer != null) writer.add(event);
writer.close();
StringBuilder builder = new StringBuilder();
String parentXML = parentXMLStringWriter.toString();
builder.append(parentXML.subSequence(0, parentXML.indexOf(">", parentXML.indexOf(lastReadEvent)) + 1));
builder.append(stringWriter.toString());
builder.append(parentXML.substring(parentXML.indexOf(">", parentXML.indexOf(lastReadEvent)) + 2));
list.add(builder);
writer = null;
}else if(writer != null)
writer.add(event);
break;
default:
if (writer != null)
writer.add(event);
break;
}
if(!splitElementFound) {
if(event instanceof javax.xml.stream.events.StartElement)
lastReadEvent = ((javax.xml.stream.events.StartElement)event).getName().getLocalPart();
else if(event instanceof javax.xml.stream.events.EndElement)
lastReadEvent = ((javax.xml.stream.events.EndElement)event).getName().getLocalPart();
headerWriter.add(event);
}else {
headerWriter.close();
}
}
headerWriter = null;
reader.close();
if (writer != null) writer.close();
} catch(Throwable ex) {
ex.printStackTrace();
} finally {
if (inputStream != null) {
try {
inputStream.close();
} catch (java.io.IOException ex) {
// do nothing
}
}
}
}

If you have the schema (XSD) for your XML dialect, an alternative to DOM would be JAXB.

Java jdom xml parsing

It's my first day with Java, and I am trying to build a small XML parser for my websites so that I can get a clean look at my sitemaps.xml. The code I use looks like this:
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URL;
import java.util.List;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
class downloadxml {
public static void main(String[] args) throws IOException {
String str = "http://www.someurl.info/sitemap.xml";
URL url = new URL(str);
InputStream is = url.openStream();
int ptr = 0;
StringBuilder builder = new StringBuilder();
while ((ptr = is.read()) != -1) {
builder.append((char) ptr);
}
String xml = builder.toString();
org.jdom2.input.SAXBuilder saxBuilder = new SAXBuilder();
try {
org.jdom2.Document doc = saxBuilder.build(new StringReader(xml));
System.out.println(xml);
Element xmlfile = doc.getRootElement();
System.out.println("ROOT -->"+xmlfile);
List list = xmlfile.getChildren("url");
System.out.println("LIST -->"+list);
} catch (JDOMException e) {
// handle JDOMException
} catch (IOException e) {
// handle IOException
}
System.out.println("===========================");
}
}
When the code reaches
System.out.println(xml);
I get a clean print of the XML sitemap. When it comes to:
System.out.println("ROOT -->"+xmlfile);
Output:
ROOT -->[Element: <urlset [Namespace: http://www.sitemaps.org/schemas/sitemap/0.9]/>]
It also finds the root element. But for some reason, when the code asks for the children, it prints an empty list:
System.out.println("LIST -->"+list);
Output:
LIST -->[]
What should I do differently? Any pointers on how to get the children?
The XML looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>http://www.image.url</loc>
<image:image>
<image:loc>http://www.image.url/image.jpg</image:loc>
</image:image>
<changefreq>daily</changefreq>
</url>
<url>
</urlset>

You've come a long way in a day.
Short answer: you are ignoring the namespace of your XML document. The url elements are in the sitemap namespace, so a lookup without a namespace finds nothing. Change the line:
List list = xmlfile.getChildren("url");
to
Namespace ns = Namespace.getNamespace("http://www.sitemaps.org/schemas/sitemap/0.9");
List list = xmlfile.getChildren("url", ns);
For your convenience, you may also want to simplify the whole build process to:
org.jdom2.Document doc = saxBuilder.build("http://www.someurl.info/sitemap.xml");
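The same namespace pitfall exists outside JDOM. As a rough comparison, here is how the lookup might look with the JDK's own DOM API; this is only a sketch, and the class SitemapUrls and the method countUrls are hypothetical names. It assumes the sitemap namespace URI shown in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for this sketch.
class SitemapUrls {

    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    /** Counts the url elements that are in the sitemap namespace. */
    static int countUrls(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is off by default; the *NS lookups need it.
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Match on namespace URI plus local name, not on the bare tag name.
        return doc.getElementsByTagNameNS(SITEMAP_NS, "url").getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<urlset xmlns=\"" + SITEMAP_NS + "\">"
                + "<url><loc>http://www.image.url</loc></url>"
                + "</urlset>";
        // prints 1
        System.out.println(countUrls(xml));
    }
}
```

In both libraries the element is identified by the pair of namespace URI and local name, which is why a lookup by the bare name "url" comes back empty.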

My answer is similar to the one above, but with catch clauses that display helpful messages when the input XML is not well-formed. The input here is an XML file.
File file = new File("adr781.xml");
SAXBuilder builder = new SAXBuilder(false);
try {
Document doc = builder.build(file);
Element root = doc.getRootElement();
} catch (JDOMException e) {
say(file.getName() + " is not well-formed.");
say(e.getMessage());
} catch (IOException e) {
say("Could not check " + file.getAbsolutePath());
say(" because " + e.getMessage());
}

XML Utility class exist for simple modification - add, remove/delete, change/modify?

Does a Java library exist that has the capability shown in the client code below? I'm looking for a library that provides basic XML manipulation capabilities using strings.
MagicXml mXml = MagicXmlUtil.createXml("<team name='cougars'><players><player name='Michael'/></players></team>");
mXml.addNode("players", "<player name='Frank'/>");
mXml.addNode("players", "<player name='Delete Me'/>");
mXml.removeNode("player[@name='Delete Me']");
mXml.addAttribute("team[@name='cougars']", "city", "New York");
mXml.addAttribute("team[@name='cougars']", "deleteMeAttribute", "Delete Me");
mXml.removeAttribute("team[@name='cougars']", "deleteMeAttribute");
mXml.modifyAttribute("player[@name='Michael']", "name", "Mike");
mXml.setNodeValue("player[@name='Mike']", "node value for Mike");
MagicXmlNode node = mXml.getNode("player[@name='Frank']");
mXml.addNode("players", node);
mXml.modifyAttribute("player[@name='Frank'][1]", "name", "Frank2");
System.out.println("mXml:\n" + mXml.toString());
mXml:
<team name='cougars' city="New York">
<players>
<player name='Mike'>
node value for Mike
</player>
<player name='Frank' />
<player name='Frank2' />
</players>
</team>

There are many Java libraries for XML manipulation and editing. The basic ones in the Java standard library are hard to use if you're a beginner, so you should try JDOM (Java Document Object Model), which makes parsing and editing easy.
Read a bit of the documentation and download sample code from http://www.jdom.org/ if you want to try it. Good luck =)

Either you use something that already exists, like dom4j or JDOM, or, as I said in my comment, you create a simple class that wraps the calls for finding nodes with XPath and adding/removing what you want (nodes, attributes, etc.).
This is a sample class; I'll let you add what's missing (modifyAttribute, setNodeValue, etc.):
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
public class MagicXml {
static XPath xpath = XPathFactory.newInstance().newXPath();
Document doc;
Element root;
public MagicXml(String xml) throws Exception {
doc = parseXml(xml);
root = doc.getDocumentElement();
}
private static Document parseXml(String xml) throws Exception {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(xml.getBytes());
return docBuilder.parse(bis);
}
private String asXPath(String path) {
return path.startsWith("/") ? path : "//" + path;
}
private static Node findNode(Document doc, String xPath) throws Exception {
XPathExpression expr = xpath.compile(xPath);
return (Node) expr.evaluate(doc, XPathConstants.NODE);
}
public static MagicXml createXml(String xml) throws Exception {
return new MagicXml(xml);
}
public MagicXml addNode(String path, String xml) throws Exception {
Document subDoc = parseXml(xml);
Node destNode = findNode(doc, asXPath(path));
Node srcNode = subDoc.getFirstChild();
destNode.appendChild(doc.adoptNode(srcNode.cloneNode(true)));
return this;
}
public MagicXml removeNode(String path) throws Exception {
Node destNode = findNode(doc, asXPath(path));
destNode.getParentNode().removeChild(destNode);
return this;
}
public MagicXml addAttribute(String path, String attr, String value) throws Exception {
Element destNode = (Element)findNode(doc, asXPath(path));
destNode.setAttribute(attr, value);
return this;
}
public MagicXml removeAttribute(String path, String attr) throws Exception {
Element destNode = (Element)findNode(doc, asXPath(path));
destNode.removeAttribute(attr);
return this;
}
public String docToString(Document doc) {
try {
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
StringWriter sw = new StringWriter();
transformer.transform(new DOMSource(doc), new StreamResult(sw));
return sw.toString();
} catch (Exception e) {
return "";
}
}
public String toString() {
return docToString(doc);
}
public static void main(String[] args) throws Exception {
System.out.println(//
MagicXml.createXml("<team name='cougars'><players><player name='Michael'/></players></team>")//
.addNode("players", "<player name='Frank'/>")//
.addNode("players", "<player name='Delete Me'/>")//
.removeNode("player[@name='Delete Me']") //
.addAttribute("player[@name='Frank']", "foo", "bar") //
.addAttribute("player[@name='Frank']", "bar", "bazz") //
.removeAttribute("player[@name='Frank']", "bar") //
.toString());
}
}
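The two operations the answer leaves open, modifyAttribute and setNodeValue, might look roughly like this. It is a standalone sketch under assumptions rather than a drop-in addition: the class XmlEdits and the helper require are hypothetical names, paths are passed as full XPath expressions (attribute tests use @), and a path that matches nothing raises an exception instead of failing silently.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for this sketch.
class XmlEdits {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the first element matching the XPath, or throws if there is none. */
    static Element require(Document doc, String xpathExpr) throws Exception {
        Object node = XPathFactory.newInstance().newXPath()
                .evaluate(xpathExpr, doc, XPathConstants.NODE);
        if (!(node instanceof Element)) {
            throw new IllegalArgumentException("No element matches: " + xpathExpr);
        }
        return (Element) node;
    }

    /** Overwrites (or creates) the attribute on the matched element. */
    static void modifyAttribute(Document doc, String xpathExpr, String attr, String value)
            throws Exception {
        require(doc, xpathExpr).setAttribute(attr, value);
    }

    /** Replaces the content of the matched element with the given text. */
    static void setNodeValue(Document doc, String xpathExpr, String text) throws Exception {
        require(doc, xpathExpr).setTextContent(text);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<team><players><player name='Michael'/></players></team>");
        modifyAttribute(doc, "//player[@name='Michael']", "name", "Mike");
        setNodeValue(doc, "//player[@name='Mike']", "node value for Mike");
        Element player = require(doc, "//player");
        // prints Mike: node value for Mike
        System.out.println(player.getAttribute("name") + ": " + player.getTextContent());
    }
}
```

Note that setTextContent removes any existing child nodes of the matched element, which is fine for leaf elements like player but would discard nested elements elsewhere.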

XStream is a very easy XML manipulation tool. It can go from java classes to XML and vice versa very easily.

XML Namespace is getting issues for parsing the file in XPath + java

I have an XML file like this:
<?xml version="1.0" encoding="UTF-8"?>
<QDTM_IN300301QD ITSVersion="XML_1.0" xmlns="urn:hl7-org:v3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:hl7-org:v3 QDTM_IN300401QD.xsd ">
<controlActEvent classCode="CACT" moodCode="EVN">
<code code="QDTM_TE300401QD">
</code>
<statusCode code="Active" />
<subject contextConductionInd="true" contextControlCode="ON"
typeCode="SUBJ">
<registrationEvent classCode="REG" moodCode="EVN">
<statusCode code="token" />
<subject contextControlCode="AN" typeCode="SBJ">
<testCodeIdentifier classCode="ROL">
<playingTestCodeDetails classCode="ENT"
determinerCode="INSTANCE">
<code code="6399Z" codeSystemName="QTIM" codeSystemVersion="Updated">
<originalText><![CDATA[CBC (includes Differential and Platelets)]]></originalText>
<translation codeSystemName="DOSCATALOGNAMEHTMLENABLED">
<originalText><![CDATA[CBC (includes Differential and Platelets)]]></originalText>
</translation>
</code>
</playingTestCodeDetails>
</testCodeIdentifier>
</subject>
</registrationEvent>
</subject>
</controlActEvent>
</QDTM_IN300301QD>
JAVA CODE:
package com.parse;
import java.io.IOException;
import java.util.Iterator;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
public class ParseXPath {
public String parseXML(String fileName) {
fileName = "D://projects//Draft.xml";
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder;
Document doc;
try {
builder = domFactory.newDocumentBuilder();
doc = builder.parse(fileName);
XPath xpath = XPathFactory.newInstance().newXPath();
xpath.setNamespaceContext(new NamespaceContext(){public String getNamespaceURI(String prefix) {
return "urn:hl7-org:v3";
}
public String getPrefix(String namespaceURI) {
return null; // we are not using this.
}
public Iterator getPrefixes(String namespaceURI) {
return null; // we are not using this.
}
});
String expr="//QDTM_IN300401QD/controlActEvent/subject/registrationEvent/subject/testCodeIdentifier/playingTestCodeDetails/code/translation[@codeSystemName='DOSCATALOGNAMEHTMLENABLED']/originalText/text()";
String result = xpath.evaluate(expr, doc);
System.out.println("Result --> "+result);
return result;
} catch (ParserConfigurationException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (XPathExpressionException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return fileName;
}
public static void main(String[] args)
throws ParserConfigurationException, SAXException,
IOException, XPathExpressionException {
ParseXPath p = new ParseXPath();
p.parseXML("test");
}
}
I am having a problem with the namespace in this XML. When the root element declares xmlns="urn:hl7-org:v3", the XPath query returns no data. As a workaround, I wrote the Java code above and removed that declaration from the XML.
I need to parse the XML and get the data without deleting the namespace declaration. Is this problem related to the XSD, or is the XSD referenced in the file not being found?

You need to include the namespace in the expression. In XPath the namespace is referenced through a prefix, which acts as a lookup key for the full namespace URI. Every element step that lives in the namespace needs the prefix, not only the first one:
String expr="//prefix:QDTM_IN300401QD/prefix:controlActEvent/...."
You bind the prefix to the URI with a namespace mapping; take a look at https://www.ibm.com/developerworks/library/x-javaxpathapi/index.html and https://xml.apache.org/xalan-j/xpath_apis.html#namespacecontext
If the XML contains only one namespace, you could also try //*:elementname in your expression to ignore which namespace the element belongs to. Note that *:elementname is XPath 2.0 syntax; with the XPath 1.0 engine that ships with the JDK, the equivalent is //*[local-name()='elementname'].
Take a look at http://www.w3schools.com/XML/xml_namespaces.asp to understand how namespaces are used and what problem they solve.
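If you go the "ignore the namespace" route with the JDK's built-in XPath 1.0 engine, a local-name() predicate is one way to do it. The following is only a minimal sketch: the class name LocalNameXPathDemo, the evaluate helper, and the shortened sample XML are made up for illustration and are not part of the original code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are illustrative only.
class LocalNameXPathDemo {

    // Parses the XML string with a namespace-aware parser and returns the
    // string value of the given XPath 1.0 expression.
    static String evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // same setting as in the question
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Shortened stand-in for the document in the question.
        String xml = "<root xmlns='urn:hl7-org:v3'>"
                + "<translation codeSystemName='DOSCATALOGNAMEHTMLENABLED'>"
                + "<originalText>CBC</originalText>"
                + "</translation></root>";

        // An unprefixed name only matches elements in no namespace,
        // so this selects nothing and yields an empty string.
        System.out.println("plain:      [" + evaluate(xml, "//originalText") + "]");

        // local-name() compares only the local part of the element name,
        // regardless of the namespace the element is in.
        System.out.println("local-name: ["
                + evaluate(xml, "//*[local-name()='originalText']") + "]");
    }
}
```

This avoids writing a NamespaceContext, but it also matches elements with the same local name in any other namespace, so it is best kept for documents where that cannot cause confusion.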

The purpose of getNamespaceURI inside of the NamespaceContext is to associate each namespace in the source document with a unique prefix, so that the XPath engine knows which namespace an element belongs to whenever it encounters that prefix string in an XPath expression. The prefix here doesn't need to match the prefix (if any) for the same URI in the source XML; it just needs to provide a mapping from the prefix to the correct namespace.
So, if you were to write the expression like this:
//p:QDTM_IN300301QD/p:controlActEvent/p:subject/p:registrationEvent/p:subject
/p:testCodeIdentifier/p:playingTestCodeDetails/p:code
/p:translation[@codeSystemName='DOSCATALOGNAMEHTMLENABLED']
/p:originalText/text()
...then you'd write the corresponding getNamespaceURI like this:
public String getNamespaceURI(String prefix) {
if ("p".equals(prefix)) {
return "urn:hl7-org:v3";
}
return null;
}
This is how the engine knows to look for an element in the urn:hl7-org:v3 namespace whenever it encounters the p prefix, which is the whole point. Otherwise, how would the engine know that you didn't want some element named QDTM_IN300301QD in no namespace? Or an element with that name in some other namespace?
Note that the prefix name is arbitrary; it can be anything you want, as long as it's unique. That is, if you have other namespaces in your document, then you'll need to modify getNamespaceURI to be aware of those namespaces and assign a unique prefix to each of them.
Here is a complete (minimal) example:
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("Draft.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
NamespaceContext ctx = new NamespaceContext() {
public String getNamespaceURI(String prefix) {
if ("p".equals(prefix)) {
return "urn:hl7-org:v3";
}
return null;
}
public String getPrefix(String uri) {
throw new UnsupportedOperationException();
}
public Iterator getPrefixes(String uri) {
throw new UnsupportedOperationException();
}
};
xpath.setNamespaceContext(ctx);
XPathExpression expr = xpath.compile("//p:QDTM_IN300301QD/p:controlActEvent" +
"/p:subject/p:registrationEvent" +
"/p:subject/p:testCodeIdentifier/p:playingTestCodeDetails/p:code" +
"/p:translation[@codeSystemName='DOSCATALOGNAMEHTMLENABLED']" +
"/p:originalText/text()");
System.out.println("[" + expr.evaluate(doc, XPathConstants.STRING) + "]");
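If the document has more than one namespace, the if-chain in getNamespaceURI can be replaced by a small map from prefix to URI. The sketch below is one possible way to do that, not a complete implementation of the NamespaceContext contract (the reserved xml and xmlns prefixes are not handled). The names MultiNamespaceDemo, MapNamespaceContext and bind, the second namespace urn:example:extra, and the sample XML are all assumptions made up for the example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are illustrative only.
class MultiNamespaceDemo {

    // Map-backed NamespaceContext: one entry per prefix used in XPath expressions.
    static class MapNamespaceContext implements NamespaceContext {
        private final Map<String, String> prefixToUri = new HashMap<>();

        // Registers a prefix and returns this so calls can be chained.
        MapNamespaceContext bind(String prefix, String uri) {
            prefixToUri.put(prefix, uri);
            return this;
        }

        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            String uri = prefixToUri.get(prefix);
            return uri != null ? uri : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
                if (entry.getValue().equals(uri)) {
                    return entry.getKey();
                }
            }
            return null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
        }
    }

    // Evaluates an XPath 1.0 expression against an XML string using the given prefix bindings.
    static String evaluate(String xml, String expression, NamespaceContext ctx) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(ctx);
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: default namespace from the question plus a hypothetical second one.
        String xml = "<QDTM_IN300301QD xmlns='urn:hl7-org:v3' xmlns:x='urn:example:extra'>"
                + "<code code='6399Z'><x:note>hypothetical</x:note></code>"
                + "</QDTM_IN300301QD>";

        // The prefixes p and e are arbitrary; they do not have to match the document's prefixes.
        NamespaceContext ctx = new MapNamespaceContext()
                .bind("p", "urn:hl7-org:v3")
                .bind("e", "urn:example:extra");

        System.out.println(evaluate(xml, "//p:code/@code", ctx));
        System.out.println(evaluate(xml, "//p:code/e:note", ctx));
    }
}
```

Note that the document uses the prefix x for the second namespace while the XPath expression uses e; only the URI has to agree.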
