Java jdom xml parsing - java

It's my first day with Java, and I'm trying to build a small XML parser for my websites so I can get a clean view of my sitemap.xml files. The code I use looks like this:
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URL;
import java.util.List;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
class downloadxml {
public static void main(String[] args) throws IOException {
String str = "http://www.someurl.info/sitemap.xml";
URL url = new URL(str);
InputStream is = url.openStream();
int ptr = 0;
StringBuilder builder = new StringBuilder();
while ((ptr = is.read()) != -1) {
builder.append((char) ptr);
}
String xml = builder.toString();
org.jdom2.input.SAXBuilder saxBuilder = new SAXBuilder();
try {
org.jdom2.Document doc = saxBuilder.build(new StringReader(xml));
System.out.println(xml);
Element xmlfile = doc.getRootElement();
System.out.println("ROOT -->"+xmlfile);
List list = xmlfile.getChildren("url");
System.out.println("LIST -->"+list);
} catch (JDOMException e) {
// handle JDOMException
} catch (IOException e) {
// handle IOException
}
System.out.println("===========================");
}
}
When the code reaches
System.out.println(xml);
I get a clean print of the xml sitemap. When it comes to:
System.out.println("ROOT -->"+xmlfile);
Output:
ROOT -->[Element: <urlset [Namespace: http://www.sitemaps.org/schemas/sitemap/0.9]/>]
It also finds the root element. But for some reason, when the code should fetch the children, it prints an empty list:
System.out.println("LIST -->"+list);
Output:
LIST -->[]
What should I do differently? Any pointers on how to get the children?
The XML looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>http://www.image.url</loc>
<image:image>
<image:loc>http://www.image.url/image.jpg</image:loc>
</image:image>
<changefreq>daily</changefreq>
</url>
<url>
</urlset>

You've come a long way in a day.
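Short answer: you are ignoring the namespace of your XML document. The <urlset> root declares a default namespace, so its unprefixed children such as <url> belong to that namespace, and getChildren("url") only matches elements that are in no namespace. Import org.jdom2.Namespace and change the line:
List list = xmlfile.getChildren("url");
to
Namespace ns = Namespace.getNamespace("http://www.sitemaps.org/schemas/sitemap/0.9");
List<Element> list = xmlfile.getChildren("url", ns);
You can also simplify the whole download-and-parse step by passing the URL straight to SAXBuilder: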
org.jdom2.Document doc = saxBuilder.build("http://www.someurl.info/sitemap.xml");
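The same namespace rule applies outside JDOM as well. As a rough sketch with the JDK's built-in DOM API (not JDOM, so no extra jar is needed), the lookup has to name the sitemap namespace explicitly. The class name SitemapLocs and the method locations are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SitemapLocs {
    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Returns the first sitemap-namespace <loc> text found inside each <url> element.
    static List<String> locations(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this the namespace lookups below find nothing
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> result = new ArrayList<>();
            NodeList urls = doc.getElementsByTagNameNS(SITEMAP_NS, "url");
            for (int i = 0; i < urls.getLength(); i++) {
                Element url = (Element) urls.item(i);
                // <image:loc> lives in another namespace, so it is not matched here
                NodeList locs = url.getElementsByTagNameNS(SITEMAP_NS, "loc");
                if (locs.getLength() > 0) {
                    result.add(locs.item(0).getTextContent().trim());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sitemap", e);
        }
    }
}
```

Calling SitemapLocs.locations(xml) with the sitemap text from the question should give one entry per <url> element.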

My answer is similar to the one above, but with catch clauses that print helpful messages when the input XML is not well-formed. The input here is an XML file; say is my own print helper (not shown), and System.out.println would do just as well.
File file = new File("adr781.xml");
SAXBuilder builder = new SAXBuilder(false);
try {
Document doc = builder.build(file);
Element root = doc.getRootElement();
} catch (JDOMException e) {
say(file.getName() + " is not well-formed.");
say(e.getMessage());
} catch (IOException e) {
say("Could not check " + file.getAbsolutePath());
say(" because " + e.getMessage());
}
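If you only need the well-formedness check and do not have JDOM on the classpath, something along these lines should work with the JDK parser alone. This is a sketch; WellFormedCheck and firstError are hypothetical names, and the custom ErrorHandler is only there to stop the default handler from printing to stderr.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class WellFormedCheck {
    // Returns null when the XML is well-formed, otherwise the parser's error message.
    static String firstError(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignore warnings */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```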

Related

JAVA code snippet to replace single quote(') to double quote in whole XML file

I have an XML file with nested tags. We can use a DOM or JDOM parser.
I want to replace every single quote (') inside the text of all tags with a double quote, throughout the whole XML file. Tags can also be nested inside other tags. I want a loop that visits every tag and replaces the value, like HYPER SHIPPING'SDN BHD_First_Page --> HYPER SHIPPING''SDN BHD_First_Page
Sample code:
public void iterateChildNodes(org.jdom.Element parentNode) {
if(parentNode.getChildren().size() == 0) {
if(parentNode.getText().contains("'")) {
parentNode.setText(parentNode.getText().replaceAll("'", "\'"));
LOGGER.info("************* Below Value updated");
LOGGER.info(parentNode.getText());
}
}else {
List<Element> rec = parentNode.getChildren();
for(Element i : rec) {
iterateChildNodes(i);
}
}
}
Sample XML File
<Document>
<Identifier>DOC1</Identifier>
<Type>HYPER SHIPPING SDN BHD</Type>
<Description>HYPER SHIPPING SDN BHD</Description>
<Confidence>33.12</Confidence>
<ConfidenceThreshold>10.0</ConfidenceThreshold>
<Valid>true</Valid>
<Reviewed>true</Reviewed>
<ReviewedBy>SYSTEM</ReviewedBy>
<ValidatedBy>SYSTEM</ValidatedBy>
<ErrorMessage/>
<Value>HYPER SHIPPING'SDN BHD_First_Page</Value> //Value to be replaced here
<DocumentDisplayInfo/>
<DocumentLevelFields/>
<Pages>
<Page>
<Identifier>PG0</Identifier>
<OldFileName>HYPER-KL FEB-0001-0001.tif</OldFileName>
<NewFileName>BI2E7_0.tif</NewFileName>
<SourceFileID>1</SourceFileID>
<PageLevelFields>
<PageLevelField>
<Name>Search_Engine_Classification</Name>
<Value>Park Street '10 road</Value> //Value to be replaced here
<Type/>
<Confidence>66.23</Confidence>
<LearnedFileName>HYPER KL-JUN-0001.tif</LearnedFileName>
<OcrConfidenceThreshold>0.0</OcrConfidenceThreshold>
<OcrConfidence>0.0</OcrConfidence>
<FieldOrderNumber>0</FieldOrderNumber>
<ForceReview>false</ForceReview>
</PageLevelField>
</PageLevelFields>
</Page>
</Pages>
</Document>
This code replaces every ' with " in the element text of an XML file.
I have not added a description; work through the code step by step, it is easy to follow.
(Updated)
Part 1: Using JDOM
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.NodeList;
import org.jdom2.input.SAXBuilder;
import org.jdom2.transform.JDOMSource;
import org.w3c.dom.*;
import java.io.*;
public class XmlModificationJDom {
public static void main(String[] args) {
XmlModificationJDom xmlModificationJDom = new XmlModificationJDom();
xmlModificationJDom.updateXmlAndSaveJDom();
}
public void updateXmlAndSaveJDom() {
try {
File inputFile = new File("document.xml");
SAXBuilder saxBuilder = new SAXBuilder();
org.jdom2.Document xmlDocument = saxBuilder.build(inputFile);
org.jdom2.Element rootElement = xmlDocument.getRootElement();
iterateAndUpdateElementsUsingJDom(rootElement);
saveUpdatedXmlUsingJDomSource(xmlDocument);
} catch (Exception ex) {
ex.printStackTrace();
}
}
public void iterateAndUpdateElementsUsingJDom(org.jdom2.Element element) {
if (element.getChildren().size() == 0) {
// System.out.println(element.getName() + ","+ element.getText());
if (element.getText().contains("'")) {
element.setText(element.getText().replaceAll("\'", "\""));
}
} else {
// System.out.println(element.getName());
for (org.jdom2.Element childElement : element.getChildren()) {
iterateAndUpdateElementsUsingJDom(childElement);
}
}
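}
// Reconstructed helper (the call above had no matching method): writes the document to updated-document-jdom.xml
public void saveUpdatedXmlUsingJDomSource(org.jdom2.Document xmlDocument) throws Exception {
javax.xml.transform.Transformer transformer = javax.xml.transform.TransformerFactory.newInstance().newTransformer();
transformer.transform(new JDOMSource(xmlDocument), new javax.xml.transform.stream.StreamResult(new File("updated-document-jdom.xml")));
}
}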
Part 2: Using DOM
import javax.xml.parsers.*;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.File;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class XmlModificationDom {
public static void main(String[] args) {
XmlModificationDom xmlModificationDom = new XmlModificationDom();
xmlModificationDom.updateXmlAndSave();
}
public void updateXmlAndSave() {
try {
File inputFile = new File("document.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document document = dBuilder.parse(inputFile);
document.getDocumentElement().normalize();
Node parentNode = document.getDocumentElement(); // the root element; getFirstChild() could also be a comment
iterateChildNodesAndUpate(parentNode);
writeAndSaveXML(document);
} catch (Exception ex) {
ex.printStackTrace();
}
}
public void writeAndSaveXML(Document document) throws Exception {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(new File("updated-document.xml"));
transformer.transform(source, result);
}
public void iterateChildNodesAndUpate(Node parentNode) {
NodeList nodeList = parentNode.getChildNodes();
for (int index = 0; index < nodeList.getLength(); index++) {
Node node = nodeList.item(index);
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element element = (Element) node;
//System.out.print(element.getNodeName());
if (element.hasChildNodes() && element.getChildNodes().getLength() > 1) {
//System.out.println("Child > " + element.getNodeName());
iterateChildNodesAndUpate(element);
} else {
//System.out.println(" - " + element.getTextContent());
if (element.getTextContent().contains("'")) {
String str = element.getTextContent().replaceAll("\'", "\"");
element.setTextContent(str);
}
}
}
}
}
}
Input file document.xml:
<Document>
<Identifier>DOC1</Identifier>
<Type>HYPER SHIPPING SDN BHD</Type>
<Description>HYPER SHIPPING SDN BHD</Description>
<Confidence>33.12</Confidence>
<ConfidenceThreshold>10.0</ConfidenceThreshold>
<Valid>true</Valid>
<Reviewed>true</Reviewed>
<ReviewedBy>SYSTEM</ReviewedBy>
<ValidatedBy>SYSTEM</ValidatedBy>
<ErrorMessage/>
<Value>HYPER SHIPPING'SDN BHD_First_Page</Value> //Value to be replaced here
<DocumentDisplayInfo/>
<DocumentLevelFields/>
<Pages>
<Page>
<Identifier>PG0</Identifier>
<OldFileName>HYPER-KL FEB-0001-0001.tif</OldFileName>
<NewFileName>BI2E7_0.tif</NewFileName>
<SourceFileID>1</SourceFileID>
<PageLevelFields>
<PageLevelField>
<Name>Search_Engine_Classification</Name>
<Value>Park Street '10 road</Value> //Value to be replaced here
<Type/>
<Confidence>66.23</Confidence>
<LearnedFileName>HYPER KL-JUN-0001.tif</LearnedFileName>
<OcrConfidenceThreshold>0.0</OcrConfidenceThreshold>
<OcrConfidence>0.0</OcrConfidence>
<FieldOrderNumber>0</FieldOrderNumber>
<ForceReview>false</ForceReview>
</PageLevelField>
</PageLevelFields>
</Page>
</Pages>
</Document>
Output updated-document.xml/updated-document-jdom.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Document>
<Identifier>DOC1</Identifier>
<Type>HYPER SHIPPING SDN BHD</Type>
<Description>HYPER SHIPPING SDN BHD</Description>
<Confidence>33.12</Confidence>
<ConfidenceThreshold>10.0</ConfidenceThreshold>
<Valid>true</Valid>
<Reviewed>true</Reviewed>
<ReviewedBy>SYSTEM</ReviewedBy>
<ValidatedBy>SYSTEM</ValidatedBy>
<ErrorMessage/>
<Value>HYPER SHIPPING"SDN BHD_First_Page</Value><DocumentDisplayInfo/>
<DocumentLevelFields/>
<Pages>
<Page>
<Identifier>PG0</Identifier>
<OldFileName>HYPER-KL FEB-0001-0001.tif</OldFileName>
<NewFileName>BI2E7_0.tif</NewFileName>
<SourceFileID>1</SourceFileID>
<PageLevelFields>
<PageLevelField>
<Name>Search_Engine_Classification</Name>
<Value>Park Street "10 road</Value><Type/>
<Confidence>66.23</Confidence>
<LearnedFileName>HYPER KL-JUN-0001.tif</LearnedFileName>
<OcrConfidenceThreshold>0.0</OcrConfidenceThreshold>
<OcrConfidence>0.0</OcrConfidence>
<FieldOrderNumber>0</FieldOrderNumber>
<ForceReview>false</ForceReview>
</PageLevelField>
</PageLevelFields>
</Page>
</Pages>
</Document>
For more detailed code, visit this repo.
Inside a Java string literal only the double quote has to be escaped with a backslash; escaping the single quote is optional:
value = value.replace("'", "\"");
Just replace the removeQuote method with:
private static void removeQuote(Document batchXml) throws JDOMException, Exception {
Element root = batchXml.getRootElement();
List<Element> docs = root.getChild("Documents").getChildren("Document");
for (Element doc : docs) {
// Read the <Value> element, replace the quotes, and write the result back
Element valueElement = doc.getChild("Value");
if (valueElement != null) {
valueElement.setText(valueElement.getText().replace("'", "\""));
}
}
}
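If you would rather not rely on the "element has no children" check, another option is to rewrite the text nodes themselves, which also covers elements that mix text and child tags. The following is only a sketch using the JDK's DOM and Transformer classes; QuoteReplacer, replaceInText and replaceQuotes are hypothetical names, and CDATA sections are deliberately left untouched here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class QuoteReplacer {
    // Recursively rewrites text nodes; element names and attributes are not changed.
    static void replaceInText(Node node, String target, String replacement) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(target, replacement));
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            replaceInText(child, target, replacement);
        }
    }

    // Parses the XML string, swaps ' for " in all text nodes, and serializes it again.
    static String replaceQuotes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            replaceInText(doc.getDocumentElement(), "'", "\"");
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

To write to a file instead of a string, pass new StreamResult(new File(...)) to transform, as the DOM answer above does.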

Parsing xml data from one xml to a new xml in Java

I have an XML file that has paragraph elements, sentence elements, and annotation sub-elements under the sentences. I would like to read these annotation elements and write their content to a new XML file like this:
<sentence>
<Date></Date>
<Person></Person>
<NumberDate></NumberDate>
<Location></Location>
<etc></etc>
</sentence>
In my code, I parse the XML file and read the annotations, but I am only able to print them to the console. I can't figure out how to continue and export them to a new XML file.
Here is my code:
package domparserxml;
import java.io.File;
//package domparserxml;
import java.io.IOException;
import java.io.PrintStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class DomParserXml {
public static void main(String[] args) {
// Tap into the xml
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("Chrono.xml"); //This is my input xml file
NodeList paragraphList = doc.getElementsByTagName("paragraph");//getting the paragraph tags
for (int i=0;i<paragraphList.getLength();i++) {
Node p = paragraphList.item(i);//getting the paragraphs
if (p.getNodeType()==Node.ELEMENT_NODE) {//if the datatype is Node element than we can handle it
Element paragraph = (Element) p;
paragraph.getAttribute("id"); //get the paragraph id
paragraph.getAttribute("date");//get the paragraph date
NodeList sentenceList = paragraph.getChildNodes();//getting the sentence childnodes of the paragraph element
for(int j=0;j<sentenceList.getLength();j++) {
Node s = sentenceList.item(j);
if(s.getNodeType()==Node.ELEMENT_NODE) {
Element sentence = (Element) s;
//sentence.getAttribute("id"); //dont need it now
NodeList annotationList = sentence.getChildNodes();//the annotation tags or nodes are childnodes of the sentence element
int len = annotationList.getLength(); //to make it shorter and reusable
System.out.println(""); //added these two just to add spaces in between sentences
//System.out.println("");
for(int a=0;a<len;a++) { //here i am using 'len' i defined above.
Node anno = annotationList.item(a);
if(anno.getNodeType()==Node.ELEMENT_NODE) {
Element annotation = (Element) anno;
if(a ==1){ //if it is the first sentence of the paragraph, print all these below:
//PrintStream myconsole = new PrintStream(new File("C:\\Users\\ngwak\\Applications\\eclipse\\workfolder\\results.xml"));
//System.setOut(myconsole);
//myconsole.print("paragraph-id:" + paragraph.getAttribute("id") + ";" + "paragraph-date:" + paragraph.getAttribute("date") + ";" + "senteid:" + sentence.getAttribute("id") + ";" + annotation.getTagName() + ":" + annotation.getTextContent() + ";");
System.out.print("paragraph-id:" + paragraph.getAttribute("id") + ";" + "paragraph-date:" + paragraph.getAttribute("date") + ";" + "senteid:" + sentence.getAttribute("id") + ";" + annotation.getTagName() + ":" + annotation.getTextContent() + ";");
}
if (a>1){ // if there is more after the first sentence, don't write paragraph, id etc. again, just write what is new..
//PrintStream myconsole = new PrintStream(new File("C:\\Users\\ngwak\\Applications\\eclipse\\workfolder\\results.xml"));
System.out.print(annotation.getTagName() + ":" + annotation.getTextContent() + ";");
//myconsole.print("paragraph-id:" + paragraph.getAttribute("id") + " " + "paragraph-date:" + paragraph.getAttribute("date") + " " + "senteid:" + sentence.getAttribute("id") + " " + annotation.getTagName() + ":" + annotation.getTextContent() + " ");
}
}
}
}
}
}
}
} catch (ParserConfigurationException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
Can somebody please help me?
Thanks.
DOM provides many handy classes for creating an XML file. First, create a Document with the DocumentBuilder class and define all the XML content (nodes, attributes) with the Element class. Finally, use the Transformer class to write the entire XML content to an output stream, typically a file.
Have a look at the code below; you can use it right after you have read all the values into your paragraph variable. As written it assumes paragraph is in scope, so move the body into your loop rather than running CreateXML on its own.
package com.sujit;
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class CreateXML {
public static void main(String[] args) {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder;
try
{
docBuilder = docFactory.newDocumentBuilder();
// root elements
Document doc = docBuilder.newDocument();
Element rootElement = doc.createElement("sentence"); //root
doc.appendChild(rootElement);
Element date = doc.createElement("date");
date.appendChild(doc.createTextNode(paragraph.getAttribute("date"))); // child
rootElement.appendChild(date);
Element person = doc.createElement("person");
person.appendChild(doc.createTextNode(paragraph.getAttribute("person")));
rootElement.appendChild(person);
Element numberdate = doc.createElement("numberdate");
numberdate.appendChild(doc.createTextNode(paragraph.getAttribute("numberDate")));
rootElement.appendChild(numberdate);
Element location = doc.createElement("location");
location.appendChild(doc.createTextNode(paragraph.getAttribute("location")));
rootElement.appendChild(location);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
File file = new File("E://file.xml");
StreamResult result = new StreamResult(file);
transformer.transform(source, result);
System.out.println("File saved!");
}
catch (ParserConfigurationException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
} catch (TransformerConfigurationException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (TransformerException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
Let me know if you still face any issue.
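To tie the reading and the writing together, here is a minimal sketch that copies every annotation element found under a sentence into a new document. It assumes the annotations are direct element children of <sentence>; the wrapper element name sentences and the names AnnotationExport and export are invented for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AnnotationExport {
    // Builds a new document with one <sentence> per input sentence, holding copies of its annotation elements.
    static String export(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document in = builder.parse(new InputSource(new StringReader(xml)));
            Document out = builder.newDocument();
            Element root = out.createElement("sentences"); // hypothetical wrapper element
            out.appendChild(root);
            NodeList sentences = in.getElementsByTagName("sentence");
            for (int i = 0; i < sentences.getLength(); i++) {
                Element target = out.createElement("sentence");
                root.appendChild(target);
                NodeList children = sentences.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE) { // skip plain text between annotations
                        Element copy = out.createElement(child.getNodeName());
                        copy.setTextContent(child.getTextContent());
                        target.appendChild(copy);
                    }
                }
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(out), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Swapping the StringWriter for new StreamResult(new File("results.xml")) would write the result to disk instead.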

how to split an XML file into multiple XML files using java

I'm using XML files in Java for the first time and I need some help. I am trying to split an XML file into multiple XML files using Java:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<products>
<product>
<description>Sony 54.6" (Diag) Xbr Hx929 Internet Tv</description>
<gtin>00027242816657</gtin>
<price>2999.99</price>
<orderId>2343</orderId>
<supplier>Sony</supplier>
</product>
<product>
<description>Apple iPad 2 with Wi-Fi 16GB - iOS 5 - Black
</description>
<gtin>00885909464517</gtin>
<price>399.0</price>
<orderId>2343</orderId>
<supplier>Apple</supplier>
</product>
<product>
<description>Sony NWZ-E464 8GB E Series Walkman Video MP3 Player Blue
</description>
<gtin>00027242831438</gtin>
<price>91.99</price>
<orderId>2343</orderId>
<supplier>Sony</supplier>
</product>
<product>
<description>Apple MacBook Air A 11.6" Mac OS X v10.7 Lion MacBook
</description>
<gtin>00885909464043</gtin>
<price>1149.0</price>
<orderId>2344</orderId>
<supplier>Apple</supplier>
</product>
<product>
<description>Panasonic TC-L47E50 47" Smart TV Viera E50 Series LED
HDTV</description>
<gtin>00885170076471</gtin>
<price>999.99</price>
<orderId>2344</orderId>
<supplier>Panasonic</supplier>
</product>
</products>
and I'm trying to get three XML documents like:
<?xml version="1.0" encoding="UTF-8"?>
<products>
<product>
<description>Sony 54.6" (Diag) Xbr Hx929 Internet Tv</description>
<gtin>00027242816657</gtin>
<price currency="USD">2999.99</price>
<orderid>2343</orderid>
</product>
<product>
<description>Sony NWZ-E464 8GB E Series Walkman Video MP3 Player Blue</description>
<gtin>00027242831438</gtin>
<price currency="USD">91.99</price>
<orderid>2343</orderid>
</product>
</products>
one for each supplier. How can I achieve this? Any help would be great.
Make sure you change the path in "inputFile" to your file, and also the output part:
StreamResult result = new StreamResult(new File("C:\\xmls\\" + supplier.trim() + ".xml"));
Here is the code:
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class ExtractXml
{
/**
* @param args
*/
public static void main(String[] args) throws Exception
{
String inputFile = "resources/products.xml";
File xmlFile = new File(inputFile);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true); // never forget this! It must be set before the builder is created
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
XPathFactory xfactory = XPathFactory.newInstance();
XPath xpath = xfactory.newXPath();
XPathExpression allProductsExpression = xpath.compile("//product/supplier/text()");
NodeList productNodes = (NodeList) allProductsExpression.evaluate(doc, XPathConstants.NODESET);
//Collect each supplier name once (the same supplier appears on several products)
List<String> suppliers = new ArrayList<String>();
for (int i=0; i<productNodes.getLength(); ++i)
{
String supplierName = productNodes.item(i).getTextContent();
if (!suppliers.contains(supplierName))
{
System.out.println(supplierName);
suppliers.add(supplierName);
}
}
//Now we create the split XMLs
for (String supplier : suppliers)
{
String xpathQuery = "/products/product[supplier='" + supplier + "']";
xpath = xfactory.newXPath();
XPathExpression query = xpath.compile(xpathQuery);
NodeList productNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
System.out.println("Found " + productNodesFiltered.getLength() +
" product(s) for supplier " + supplier);
//We store the new XML file in supplierName.xml e.g. Sony.xml
Document suppXml = dBuilder.newDocument();
//we have to recreate the root node <products>
Element root = suppXml.createElement("products");
suppXml.appendChild(root);
for (int i=0; i<productNodesFiltered.getLength(); ++i)
{
Node productNode = productNodesFiltered.item(i);
//we append a product (cloned) to the new file
Node clonedNode = productNode.cloneNode(true);
suppXml.adoptNode(clonedNode); //We adopt the orphan :)
root.appendChild(clonedNode);
}
//At the end, we save the file XML on disk
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(suppXml);
StreamResult result = new StreamResult(new File("resources/" + supplier.trim() + ".xml"));
transformer.transform(source, result);
System.out.println("Done for " + supplier);
}
}
}
A DOM parser consumes more memory because it loads the whole document. I prefer to use a SAX parser to read the XML and write the output as I go.
I like the approach of Xmappr (https://code.google.com/p/xmappr/), where you can use simple annotations.
First the root element Products, which simply holds a list of Product instances:
@RootElement
public class Products {
@Element
public List<Product> product;
}
Then the Product-class
@RootElement
public class Product {
@Element
public String description;
@Element
public String supplier;
@Element
public String gtin;
@Element
public String price;
@Element
public String orderId;
}
And then you simply fetch the Product-instances from the Products:
public static void main(String[] args) throws FileNotFoundException {
Reader reader = new FileReader("test.xml");
Xmappr xm = new Xmappr(Products.class);
Products products = (Products) xm.fromXML(reader);
// fetch list of products
List<Product> listOfProducts = products.product;
// do sth with the products in the list
for (Product product : listOfProducts) {
System.out.println(product.description);
}
}
You can then do whatever you want with the products (e.g. group them by supplier and write each group to its own XML file).
You can have a look here to see how to parse an XML document using DOM in Java:
DOM XML Parser Example
And here is how to write the new XML file(s):
Create XML file using java
In addition, you could study XPath to easily select your nodes: Java Xpath expression
If performance is not your main goal, then once you have loaded the DOM and created an XPath instance, you can retrieve all the suppliers in your XML document with the following XPath query:
//supplier/text()
you will get something like that:
Text='Sony'
Text='Apple'
Text='Sony'
Text='Apple'
Text='Panasonic'
Then I would put those results in an ArrayList or similar (a Set avoids the duplicates). The second step is to iterate over that collection and, for each item, query the XML input document to extract all the nodes with that particular supplier:
/products/product[supplier='Sony']
Of course, in Java you will have to build the last XPath query dynamically:
String xpathQuery = "/products/product[supplier='" + currentValue + "']";
After that, you will get the list of nodes that match the supplier you specified. The next step is constructing the new XML DOM and saving it to a file.
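As a variation on the steps just described, the grouping can also be done in a single pass over the products, without one XPath query per supplier. This is only a sketch with the JDK's DOM classes; SupplierSplitter and split are hypothetical names, and the result is returned as strings so that writing the files is left to the caller.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SupplierSplitter {
    // Maps each supplier name to a serialized <products> document holding only its products.
    static Map<String, String> split(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document in = builder.parse(new InputSource(new StringReader(xml)));
            Map<String, Document> bySupplier = new LinkedHashMap<>();
            NodeList products = in.getElementsByTagName("product");
            for (int i = 0; i < products.getLength(); i++) {
                Element product = (Element) products.item(i);
                NodeList supplierNodes = product.getElementsByTagName("supplier");
                if (supplierNodes.getLength() == 0) continue; // skip products without a supplier
                String supplier = supplierNodes.item(0).getTextContent().trim();
                Document out = bySupplier.get(supplier);
                if (out == null) {
                    out = builder.newDocument();
                    out.appendChild(out.createElement("products")); // recreate the root node
                    bySupplier.put(supplier, out);
                }
                // importNode makes a deep copy that belongs to the target document
                out.getDocumentElement().appendChild(out.importNode(product, true));
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<String, Document> entry : bySupplier.entrySet()) {
                StringWriter writer = new StringWriter();
                transformer.transform(new DOMSource(entry.getValue()), new StreamResult(writer));
                result.put(entry.getKey(), writer.toString());
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

Each map entry can then be written to a file named after its key, e.g. Sony.xml.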
Consider this XML:
<?xml version="1.0"?>
<SSNExportDocument xmlns="urn:com:ssn:schema:export:SSNExportFormat.xsd" Version="0.1" DocumentID="b482350d-62bb-41be-b792-8a9fe3884601-1" ExportID="b482350d-62bb-41be-b792-8a9fe3884601" JobID="464" RunID="3532468" CreationTime="2019-04-16T02:20:01.332-04:00" StartTime="2019-04-15T20:20:00.000-04:00" EndTime="2019-04-16T02:20:00.000-04:00">
<MeterData MeterName="MUNI1-11459398" UtilDeviceID="11459398" MacID="00:12:01:fae:fe:00:d5:fc">
<RegisterData StartTime="2019-04-15T20:00:00.000-04:00" EndTime="2019-04-15T20:00:00.000-04:00" NumberReads="1">
<RegisterRead ReadTime="2019-04-15T20:00:00.000-04:00" GatewayCollectedTime="2019-04-16T01:40:06.214-04:00" RegisterReadSource="REG_SRC_TYPE_EO_CURR_READ" Season="-1">
<Tier Number="0">
<Register Number="1" Summation="5949.1000" SummationUOM="GAL"/>
</Tier>
</RegisterRead>
</RegisterData>
</MeterData>
<MeterData MeterName="MUNI4-11460365" UtilDeviceID="11460365" MacID="00:11:01:bc:fe:00:d3:f9">
<RegisterData StartTime="2019-04-15T20:00:00.000-04:00" EndTime="2019-04-15T20:00:00.000-04:00" NumberReads="1">
<RegisterRead ReadTime="2019-04-15T20:00:00.000-04:00" GatewayCollectedTime="2019-04-16T01:40:11.082-04:00" RegisterReadSource="REG_SRC_TYPE_EO_CURR_READ" Season="-1">
<Tier Number="0">
<Register Number="1" Summation="136349.9000" SummationUOM="GAL"/>
</Tier>
</RegisterRead>
</RegisterData>
</MeterData>
We can use JAXB which converts your xml tags to objects. Then we can play around with them.
File xmlFile = new File("input.xml");
JAXBContext jaxbContext = JAXBContext.newInstance(SSNExportDocument.class);
Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
SSNExportDocument ssnExpDoc = (SSNExportDocument) jaxbUnmarshaller.unmarshal(xmlFile);
// Group the meters by the part of MeterName before the first '-', e.g. "MUNI1"
Map<String, List<MeterData>> meterMapper = new HashMap<String, List<MeterData>>();
for (MeterData mData : ssnExpDoc.getMeterData()) {
String[] splitMeterName = mData.getMeterName().split("-");
// O(1) lookup; creates the list the first time a prefix is seen
meterMapper.computeIfAbsent(splitMeterName[0], key -> new ArrayList<>()).add(mData);
}
meterMapper now maps each meter-name prefix to the list of its MeterData objects.
Then marshal the contents using:
// Reuse the jaxbContext created above
// Create Marshaller
Marshaller jaxbMarshaller = jaxbContext.createMarshaller();
// Pretty-print the output
jaxbMarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
jaxbMarshaller.setProperty(Marshaller.JAXB_FRAGMENT, Boolean.TRUE);
//jaxbMarshaller.setProperty("com.sun.xml.bind.xmlDeclaration", Boolean.FALSE);
// Collect the XML in a StringWriter
StringWriter sw = new StringWriter();
// Write XML to the StringWriter; "employee" stands for whichever JAXB object you want to marshal
jaxbMarshaller.marshal(employee, sw);
// Verify XML Content
String xmlContent = sw.toString();
System.out.println(xmlContent);
Not a perfect solution, but it works in most cases. I had to play around with some string operations to make it work. Basically, this solution splits the given XML at a given element, forms sub-XMLs, and writes those to a list.
public static void main(String[] args) {
java.io.File inputFile = new java.io.File("input.xml");
String elementSplitString = "product";
java.io.InputStream inputStream = null;
try {
inputStream = new java.io.BufferedInputStream(new java.io.FileInputStream(inputFile));
javax.xml.stream.XMLInputFactory inputFactory = javax.xml.stream.XMLInputFactory.newInstance();
javax.xml.stream.XMLOutputFactory outputFactory = javax.xml.stream.XMLOutputFactory.newInstance();
javax.xml.stream.XMLEventReader reader = inputFactory.createXMLEventReader(inputStream);
javax.xml.stream.XMLEventWriter writer = null;
StringWriter parentXMLStringWriter = new StringWriter();
javax.xml.stream.XMLEventWriter headerWriter = outputFactory.createXMLEventWriter(parentXMLStringWriter);
StringWriter stringWriter = null;
String lastReadEvent = "";
boolean splitElementFound = false;
List<StringBuilder> list = new ArrayList<StringBuilder>();
while (reader.hasNext()) {
javax.xml.stream.events.XMLEvent event = reader.nextEvent();
switch(event.getEventType()) {
case javax.xml.stream.XMLStreamConstants.START_ELEMENT:
javax.xml.stream.events.StartElement startElement = (javax.xml.stream.events.StartElement)event;
if (startElement.getName().getLocalPart().equals(elementSplitString)) {
splitElementFound = true;
stringWriter = new StringWriter();
writer = outputFactory.createXMLEventWriter(stringWriter);
if (writer != null) writer.add(event);
} else if(writer != null)
writer.add(event);
break;
case javax.xml.stream.XMLStreamConstants.END_ELEMENT:
javax.xml.stream.events.EndElement endElement = (javax.xml.stream.events.EndElement)event;
if (endElement.getName().getLocalPart().equals(elementSplitString)) {
if (writer != null) writer.add(event);
writer.close();
StringBuilder builder = new StringBuilder();
String parentXML = parentXMLStringWriter.toString();
builder.append(parentXML.subSequence(0, parentXML.indexOf(">", parentXML.indexOf(lastReadEvent)) + 1));
builder.append(stringWriter.toString());
builder.append(parentXML.substring(parentXML.indexOf(">", parentXML.indexOf(lastReadEvent)) + 2));
list.add(builder);
writer = null;
}else if(writer != null)
writer.add(event);
break;
default:
if (writer != null)
writer.add(event);
break;
}
if(!splitElementFound) {
if(event instanceof javax.xml.stream.events.StartElement)
lastReadEvent = ((javax.xml.stream.events.StartElement)event).getName().getLocalPart();
else if(event instanceof javax.xml.stream.events.EndElement)
lastReadEvent = ((javax.xml.stream.events.EndElement)event).getName().getLocalPart();
headerWriter.add(event);
}else {
headerWriter.close();
}
}
headerWriter = null;
reader.close();
if (writer != null) writer.close();
} catch(Throwable ex) {
ex.printStackTrace();
} finally {
if (inputStream != null) {
try {
inputStream.close();
} catch (java.io.IOException ex) {
// do nothing
}
}
}
}
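A more compact take on the same StAX idea might look like the sketch below. It assumes the split elements are not nested inside each other and that you only need the fragments themselves, not the surrounding parent markup that the code above stitches back on. StaxSplitter and fragments are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StaxSplitter {
    // Returns one serialized fragment per element called `name`, in document order.
    static List<String> fragments(String xml, String name) {
        List<String> result = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
            StringWriter buffer = null;
            XMLEventWriter writer = null; // non-null only while inside a split element
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(name)) {
                    buffer = new StringWriter();
                    writer = outputFactory.createXMLEventWriter(buffer);
                }
                if (writer != null) {
                    writer.add(event); // copy everything between the start and end tag
                    if (event.isEndElement()
                            && event.asEndElement().getName().getLocalPart().equals(name)) {
                        writer.flush();
                        writer.close();
                        result.add(buffer.toString());
                        writer = null;
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }
}
```

Because only one fragment is buffered at a time, memory use stays small even for large inputs when the reader is created from a file stream instead of a string.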
An alternative to DOM, if you have the schema (XSD) for your XML dialect, would be JAXB.

Parser to parse unknown XML Schema in java

I tried to understand the other answers on Stack Overflow, but I am not able to relate them to my question.
When I call a web service, I get a response. I get the schema with response.getData(); (the XML of the data table containing the results; return type String). We don't know what data we will get in that XML.
I need to use a third-party parser, so that when I pass the above string to a method of that parser, it returns all the elements in that XML and I can then print the required elements.
I don't want to start parsing the XML myself. Is there a way I can do this? (Does it even make sense?) Sorry if I am totally wrong. (Using Axis2/Eclipse.) (Edited)
Edit: Adding the code I've tried already.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
NodeList nodeList = null;
try {
String xml = res2.getResult().getRawData();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(new ByteArrayInputStream(xml.getBytes()));
nodeList = document.getElementsByTagName("PhoneNumber");
NamedNodeMap attrib = document.getAttributes();
for (int i = 0; i < attrib.getLength(); i++) {
String nodeName = attrib.item(i).getNodeName();
//nodeName
String nodeValue = attrib.item(i).getNodeValue();
}
But I am not sure whether the phone number is stored under the PhoneNumber tag or under some other name. We also don't know how many tags there are.
Thanks. Using the code by SyamS, I am able to print all the nodes and their values from the XML. Now I want to store them in a HashMap, with the node name as the key and the node values in a list.
Example XML :
<Docs>
<Doc>
<Id>12</Id>
<Phone>1234</Phone>
</Doc>
<Doc>
<Id>147</Id>
<Phone>12345</Phone>
<Locked>false</Locked>
<BID>2</BID>
<DocId>8</DocId>
<Date>2014-02-04T12:18:50.063-07:00</Date>
<Urgent>false</Urgent>
</Doc>
</Docs>
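You don't need a third-party library for this. You could identify all leaf nodes with XPath and read their values (and attributes), or stream through the document. The example below uses the JDK's StAX reader and collects the text of every leaf element, grouped by element name: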
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public static Map<String, List<String>> parseXml(String xml) throws XMLStreamException {
    // Non-null only while we are inside an element that has no child element so far
    StringBuilder content = null;
    Map<String, List<String>> dataMap = new HashMap<>();
    XMLInputFactory factory = XMLInputFactory.newInstance();
    // Read the String directly so that no platform-dependent byte encoding is involved
    XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
    while (reader.hasNext()) {
        int event = reader.next();
        switch (event) {
            case XMLStreamConstants.START_ELEMENT:
                // Any element start resets the buffer, so parents never count as leaves
                content = new StringBuilder();
                break;
            case XMLStreamConstants.CHARACTERS:
                if (content != null) {
                    // Text may arrive in several chunks; trim only once at the end
                    content.append(reader.getText());
                }
                break;
            case XMLStreamConstants.END_ELEMENT:
                if (content != null) {
                    // Leaf element: store its text under the element name
                    dataMap.computeIfAbsent(reader.getLocalName(), k -> new ArrayList<>())
                           .add(content.toString().trim());
                }
                content = null;
                break;
            default:
                break;
        }
    }
    reader.close();
    return dataMap;
}
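For comparison, the same leaf-collection idea can be sketched with the JDK's DOM parser instead of StAX. This is only an illustration: the class name DomLeafCollector and the method collectLeaves are made up for this example, and it assumes the document is small enough to load into memory. An element counts as a leaf when it has no child elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named only for this sketch
public class DomLeafCollector {

    // Parses the XML string and groups the text of every leaf element by tag name
    public static Map<String, List<String>> collectLeaves(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        // LinkedHashMap keeps the tag names in first-seen order
        Map<String, List<String>> result = new LinkedHashMap<>();
        walk(doc.getDocumentElement(), result);
        return result;
    }

    // Depth-first walk; an element without child elements is recorded as a leaf
    private static void walk(Element element, Map<String, List<String>> result) {
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                walk((Element) child, result);
            }
        }
        if (!hasChildElement) {
            result.computeIfAbsent(element.getNodeName(), k -> new ArrayList<>())
                  .add(element.getTextContent().trim());
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Docs><Doc><Id>12</Id><Phone>1234</Phone></Doc>"
                   + "<Doc><Id>147</Id><Phone>12345</Phone><Locked>false</Locked></Doc></Docs>";
        // prints {Id=[12, 147], Phone=[1234, 12345], Locked=[false]}
        System.out.println(collectLeaves(xml));
    }
}
```

DOM is easier to navigate afterwards, while the streaming version avoids holding the whole tree in memory, so which one fits depends on the size of the responses.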
You should read the answers to "Best XML parser for Java". Using the example XML from "Sample XML File (books.xml)", which I downloaded to a temp folder on my C: drive, you could use Java's built-in SAXParser. Here is an example class that iterates through all the elements in an XML file. Create the class in your project and call its parseXml method like this:
File xml = new File("c:/temp/books.xml");
MySaxParser sax = new MySaxParser(xml);
sax.parseXml();
This is the class you can copy into your project to try it out. Modify according to your needs, of course. The imports should direct you to the appropriate Java API pages such as Class SAXParser to begin with.
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class MySaxParser extends DefaultHandler {
private final File xmlFile;
public MySaxParser(File xml) {
this.xmlFile = xml;
}
/**
* Streams through the XML file, firing handler events such as startElement
*/
public void parseXml() {
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
SAXParser parser = factory.newSAXParser();
// parse(File, DefaultHandler) avoids having a plain file path interpreted as a URI
parser.parse(xmlFile, this);
} catch (ParserConfigurationException e) {
System.out.println("ParserConfigurationException: ");
e.printStackTrace();
} catch (SAXException e) {
System.out.println("SAXException: ");
e.printStackTrace();
} catch (IOException e) {
System.out.println("IOException: ");
e.printStackTrace();
}
}
/**
* Event: Parser starts reading an element
*/
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
//print the element's qualified name
System.out.println("element: " + qName);
//print all attributes for this element as name=value
for(int i = 0; i < attributes.getLength(); i++) {
System.out.println("attribute: " + attributes.getQName(i) + "=" + attributes.getValue(i));
}
}
}
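The handler above only prints element and attribute names. Since the question also asks for the values, here is a rough sketch of how a SAX handler could capture element text by overriding characters and endElement as well. The class name TextCollectingHandler and its parse helper are invented for this example, and the inline XML is just a small stand-in for books.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, named only for this sketch
public class TextCollectingHandler extends DefaultHandler {
    private final Map<String, List<String>> values = new LinkedHashMap<>();
    // Non-null only while inside an element that has not opened a child element
    private StringBuilder text;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Every element start resets the buffer, so only leaf elements keep their text
        text = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver the text of one element in several calls
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (text != null) {
            values.computeIfAbsent(qName, k -> new ArrayList<>()).add(text.toString().trim());
        }
        text = null;
    }

    public Map<String, List<String>> getValues() {
        return values;
    }

    // Convenience helper: parse an XML string and return the collected leaf values
    public static Map<String, List<String>> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollectingHandler handler = new TextCollectingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getValues();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><book id=\"bk101\"><author>Gambardella, Matthew</author>"
                   + "<title>XML Developer's Guide</title></book></catalog>";
        System.out.println(parse(xml));
    }
}
```

Container elements such as catalog and book are skipped because their buffer is discarded as soon as a child element starts.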

How can I append an attribute to an existing XML element in Java?

I want to append an attribute to an existing element in XML using Java. For example:
<employee>
<details name="Jai" age="25"/>
<details name="kishore" age="30"/>
</employee>
I want to add a weight attribute to it (assume that it is calculated and then appended in the response). How can I append it to all items?
<details name="Jai" age="25" weight="55"/>
import org.w3c.dom.*;
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
public class AddAndPrint {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(new File("/path/to/file.xml"));
        NodeList employees = doc.getElementsByTagName("employee");
        // NodeList is not Iterable, so use index-based loops
        for (int i = 0; i < employees.getLength(); i++) {
            NodeList children = employees.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Only Element nodes can carry attributes (text nodes cannot)
                if (child instanceof Element && "details".equals(child.getNodeName())) {
                    ((Element) child).setAttribute("weight", "150");
                }
            }
        }
        // Serialize the modified document to a String and print it
        Source source = new DOMSource(doc);
        StringWriter stringWriter = new StringWriter();
        Result result = new StreamResult(stringWriter);
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer();
        transformer.transform(source, result);
        System.out.println(stringWriter.getBuffer().toString());
    }
}
Here is a quick solution based on JDOM:
public static void main(String[] args) throws JDOMException, IOException {
File xmlFile = new File("employee.xml");
SAXBuilder builder = new SAXBuilder();
Document build = builder.build(xmlFile);
XPath details = XPath.newInstance("//details");
List<Element> detailsNodes = details.selectNodes(build);
for (Element detailsNode:detailsNodes) {
detailsNode.setAttribute("weight", "70"); // static weight for demonstration
}
XMLOutputter outputter = new XMLOutputter(Format.getPrettyFormat());
outputter.output(build, System.out);
}
First, we build a document (SAXBuilder); next, we create an XPath expression for the details nodes; then we iterate over the elements matching that expression and add the weight attribute.
The last two lines just verify that it's white magic :-)
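If you would rather not add JDOM as a dependency, the same XPath approach can be sketched with the JDK's own javax.xml.xpath and DOM classes. This is a minimal illustration, not a drop-in replacement: the class name AddWeightWithXPath and the method addWeight are made up for this example, and the weight value is passed in as a plain string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named only for this sketch
public class AddWeightWithXPath {

    // Adds weight="<weight>" to every details element and returns the serialized XML
    public static String addWeight(String xml, String weight) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Select all details elements anywhere in the document
        NodeList details = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//details", doc, XPathConstants.NODESET);
        for (int i = 0; i < details.getLength(); i++) {
            ((Element) details.item(i)).setAttribute("weight", weight);
        }

        // Serialize the modified DOM tree back to a String
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<employee><details name=\"Jai\" age=\"25\"/>"
                   + "<details name=\"kishore\" age=\"30\"/></employee>";
        System.out.println(addWeight(xml, "55"));
    }
}
```

Note that the order of attributes in the serialized output is not guaranteed to match the input, which does not matter for XML consumers but can be surprising when comparing text.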
