Parsing xml data from one xml to a new xml in Java

Parsing xml data from one xml to a new xml in Java - java

I have an xml file that have paragraphs element, sentence elements and annotation sub element under sentences. I would like to read these annotation elements and extract the content to write them to a new xml file like:
<sentence>
<Date></Date>
<Person></Person>
<NumberDate></NumberDate>
<Location></Location>
<etc></etc>
</sentence>
In my code, I parse the xml file and read the annotations but am only able to print to console. I cant figure out how to continue and how to export to a new xml file.
Here is my code:
package domparserxml;
import java.io.File;
//package domparserxml;
import java.io.IOException;
import java.io.PrintStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class DomParserXml {
public static void main(String[] args) {
// Tap into the xml
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("Chrono.xml"); //This is my input xml file
NodeList paragraphList = doc.getElementsByTagName("paragraph");//getting the paragraph tags
for (int i=0;i<paragraphList.getLength();i++) {
Node p = paragraphList.item(i);//getting the paragraphs
if (p.getNodeType()==Node.ELEMENT_NODE) {//if the datatype is Node element than we can handle it
Element paragraph = (Element) p;
paragraph.getAttribute("id"); //get the paragraph id
paragraph.getAttribute("date");//get the paragraph date
NodeList sentenceList = paragraph.getChildNodes();//getting the sentence childnodes of the paragraph element
for(int j=0;j<sentenceList.getLength();j++) {
Node s = sentenceList.item(j);
if(s.getNodeType()==Node.ELEMENT_NODE) {
Element sentence = (Element) s;
//sentence.getAttribute("id"); //dont need it now
NodeList annotationList = sentence.getChildNodes();//the annotation tags or nodes are childnodes of the sentence element
int len = annotationList.getLength(); //to make it shorter and reusable
System.out.println(""); //added these two just to add spaces in between sentences
//System.out.println("");
for(int a=0;a<len;a++) { //here i am using 'len' i defined above.
Node anno = annotationList.item(a);
if(anno.getNodeType()==Node.ELEMENT_NODE) {
Element annotation = (Element) anno;
if(a ==1){ //if it is the first sentence of the paragraph, print all these below:
//PrintStream myconsole = new PrintStream(new File("C:\\Users\\ngwak\\Applications\\eclipse\\workfolder\\results.xml"));
//System.setOut(myconsole);
//myconsole.print("paragraph-id:" + paragraph.getAttribute("id") + ";" + "paragraph-date:" + paragraph.getAttribute("date") + ";" + "senteid:" + sentence.getAttribute("id") + ";" + annotation.getTagName() + ":" + annotation.getTextContent() + ";");
System.out.print("paragraph-id:" + paragraph.getAttribute("id") + ";" + "paragraph-date:" + paragraph.getAttribute("date") + ";" + "senteid:" + sentence.getAttribute("id") + ";" + annotation.getTagName() + ":" + annotation.getTextContent() + ";");
}
if (a>1){ // if there is more after the first sentence, don't write paragraph, id etc. again, just write what is new..
//PrintStream myconsole = new PrintStream(new File("C:\\Users\\ngwak\\Applications\\eclipse\\workfolder\\results.xml"));
System.out.print(annotation.getTagName() + ":" + annotation.getTextContent() + ";");
//myconsole.print("paragraph-id:" + paragraph.getAttribute("id") + " " + "paragraph-date:" + paragraph.getAttribute("date") + " " + "senteid:" + sentence.getAttribute("id") + " " + annotation.getTagName() + ":" + annotation.getTextContent() + " ");
}
}
}
}
}
}
}
} catch (ParserConfigurationException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
Can somebody please help me.
Thanks.

DOM provides many handy classes to create XML file easily. Firstly, you have to create a Document with DocumentBuilder class, define all the XML content – node, attribute with Element class. In last, use Transformer class to output the entire XML content to stream output, typically a File.
Have a look at the code, you can use this code just after you get all the values in your paragraph variable
package com.sujit;
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class CreateXML {
public static void main(String[] args) {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder;
try
{
docBuilder = docFactory.newDocumentBuilder();
// root elements
Document doc = docBuilder.newDocument();
Element rootElement = doc.createElement("sentence"); //root
doc.appendChild(rootElement);
Element date = doc.createElement("date");
date.appendChild(doc.createTextNode(paragraph.getAttribute("date"))); // child
rootElement.appendChild(date);
Element person = doc.createElement("person");
person.appendChild(doc.createTextNode(paragraph.getAttribute("person")));
rootElement.appendChild(person);
Element numberdate = doc.createElement("numberdate");
numberdate.appendChild(doc.createTextNode(paragraph.getAttribute("numberDate")));
rootElement.appendChild(numberdate);
Element location = doc.createElement("location");
location.appendChild(doc.createTextNode(paragraph.getAttribute("location")));
rootElement.appendChild(location);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
File file = new File("E://file.xml");
StreamResult result = new StreamResult(file);
transformer.transform(source, result);
System.out.println("File saved!");
}
catch (ParserConfigurationException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
} catch (TransformerConfigurationException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (TransformerException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
Let me know if you still face any issue.

Related

No getting desired XML output in Java

I am converting CSV file to XML , it is converting but not getting desired structured output .
My java Code :-
public static void main(String[] args){
List<String> headers=new ArrayList<String>(5);
File file=new File("C:/Users/Admin/Desktop/data.csv");
BufferedReader reader=null;
try {
DocumentBuilderFactory domFactory =DocumentBuilderFactory.newInstance();
DocumentBuilder domBuilder=domFactory.newDocumentBuilder();
Document newDoc=domBuilder.newDocument();
// Root element
Element rootElement=newDoc.createElement("root");
newDoc.appendChild(rootElement);
reader = new BufferedReader(new FileReader(file));
int line=0;
String text=null;
while ((text=reader.readLine())!=null) {
StringTokenizer st=new StringTokenizer(text, "?", false);
String[] rowValues=new String[st.countTokens()];
int index=0;
while (st.hasMoreTokens()) {
String next=st.nextToken();
rowValues[index++]=next;
}
//String[] rowValues = text.split(",");
if (line == 0) { // Header row
for (String col:rowValues) {
headers.add(col);
Element rowElement=newDoc.createElement("header");
rootElement.appendChild(rowElement);
for (int col1=0;col1<headers.size();col1++) {
String header = headers.get(col1);
String value = null;
if (col1<rowValues.length) {
value=rowValues[col1];
} else {
// ?? Default value
value=" ";
}
rowElement.setTextContent(value);
System.out.println(headers+" "+value);
}
}} else { // Data row
Element rowElement=newDoc.createElement("row");
rootElement.appendChild(rowElement);
for (int col=0;col<headers.size();col++) {
String header = headers.get(col);
String value = null;
if (col<rowValues.length) {
value=rowValues[col];
} else {
// ?? Default value
value=" ";
}
rowElement.setTextContent(value);
System.out.println(header+" "+value);
}
}
line++;
}
try {
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
aTransformer.setOutputProperty(OutputKeys.INDENT, "yes");
aTransformer.setOutputProperty(OutputKeys.METHOD, "xml");
aTransformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
Source src = new DOMSource(newDoc);
Result result = new StreamResult(new File("C:/Users/Admin/Desktop/data.xml"));
aTransformer.transform(src, result);
System.out.println("File creation successfully!");
} catch (Exception exp) {
exp.printStackTrace();
} finally {
try {
} catch (Exception e1) {
}
try {
} catch (Exception e1) {
}
}
} catch (Exception e1) {
e1.printStackTrace();
}
}
This is my CSV file:-
Symbol,Open,High,Low,Last Traded Price,Change
"NIFTY 50","9,645.90","9,650.65","9,600.95","9,609.30","-5.70"
"RELIANCE","1,390.00","1,414.20","1,389.00","1,407.55","26.50"
"BPCL","647.70","665.00","645.95","660.10","10.75"
"ADANIPORTS","368.50","373.80","368.00","372.25","4.25"
"ONGC","159.50","161.75","159.35","160.80","1.70"
And this is the output I am getting:-
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<header>Symbol,Open,High,Low,Last Traded Price,Change</header>
<row>"NIFTY 50","9,645.90","9,650.65","9,600.95","9,609.30","-5.70"</row>
<row>"RELIANCE","1,390.00","1,414.20","1,389.00","1,407.55","26.50"</row>
<row>"BPCL","647.70","665.00","645.95","660.10","10.75"</row>
<row>"ADANIPORTS","368.50","373.80","368.00","372.25","4.25"</row>
<row>"ONGC","159.50","161.75","159.35","160.80","1.70"</row>
</root>
Suggest me where am I going wrong ? I tried according to me , but getting confuse where in header and row section should I make changes.
ADDED :-
Expected output
<root>
<header>symbol</header>
<row>NIFTY 50</row>
<row>RELIANCE</row>
<row>BPCL></row>
.
.
<header>Open</header>
<row>9,645.90</row>
<row>1,390.00</row>
.
.
</root>

For your reference:
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import org.apache.commons.csv.QuoteMode;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class CsvToXml {
public static void main(String[] args) {
File inputFile = new File("C:/Users/Admin/Desktop/data.csv");
CSVParser inParser = null;
Document newDoc = null;
try {
inParser = CSVParser.parse(inputFile, StandardCharsets.UTF_8,
CSVFormat.EXCEL.withHeader().withQuoteMode(QuoteMode.NON_NUMERIC));
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder domBuilder = domFactory.newDocumentBuilder();
newDoc = domBuilder.newDocument();
// Root element
Element rootElement = newDoc.createElement("root");
newDoc.appendChild(rootElement);
List<CSVRecord> records = inParser.getRecords();
for (String key : inParser.getHeaderMap().keySet()) {
Element rowElement = newDoc.createElement("header");
rootElement.appendChild(rowElement);
rowElement.setTextContent(key);
for (CSVRecord record : records) {
rowElement = newDoc.createElement("row");
rootElement.appendChild(rowElement);
rowElement.setTextContent(record.get(key));
}
}
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
aTransformer.setOutputProperty(OutputKeys.INDENT, "yes");
aTransformer.setOutputProperty(OutputKeys.METHOD, "xml");
aTransformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
Source src = new DOMSource(newDoc);
Result result = new StreamResult(new File("C:/Users/Admin/Desktop/data.xml"));
aTransformer.transform(src, result);
System.out.println("File creation successfully!");
} catch (Exception e) {
e.printStackTrace();
} finally {
if (inParser != null) {
try {
inParser.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
}
This is using Apache Commons CSV.

splitting Sitemap into more sitemaps if it has more than maxnumber of urls

I would like to split my Sitemap into Sitemaps, if it has more than maxURLs. The following example should split the Sitemap, if it has more than one url.
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.CharacterData;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
public class SiteMapSplitter {
public static void main(String[] args){
String sitemapStr = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n" +
"<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n" +
"<url>\n" +
"<loc>test1.html</loc>\n" +
"<lastmod>today</lastmod>\n" +
"<changefreq>daily</changefreq>\n" +
"<priority>1.0</priority>\n" +
"</url>\n" +
"<url>\n" +
"<loc>test2.html</loc>\n" +
"<lastmod>yesterday</lastmod>\n" +
"<changefreq>daily</changefreq>\n" +
"<priority>1.0</priority>\n" +
"</url></urlset>";
try {
splitSitemap(sitemapStr);
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
}
static private void splitSitemap(String sitemapStr) throws ParserConfigurationException {
DocumentBuilder db = null;
try {
db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(sitemapStr));
Document doc = null;
try {
doc = db.parse(is);
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
NodeList nodes = doc.getElementsByTagName("url");
int maxURLs = 1;
Set<String> smURLsSet= new HashSet<String>();
if (nodes.getLength()>maxURLs){
for (int i = 0; i < nodes.getLength(); i++) {
StringBuilder smURLsBuilder = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n" +
"<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
for (int k = 0; k<maxURLs; k++){
Element element = (Element) nodes.item(i);
smURLsBuilder.append(element);
}
smURLsSet.add(smURLsBuilder.toString());
}
Iterator i = smURLsSet.iterator();
while(i.hasNext()){
System.out.println(i.next());
}
}
}
}
The problem is that Element element = (Element) nodes.item(i); smURLsBuilder.append(element);
does not append the whole element (in this case the url and its childreen) to the smURLsBuilder. How to do this?

You should consider using an object oriented approach to the sitemap. Either with data binding (JAXB) or even shorter using data projection (Disclosure: I'm affiliated with that project). This way you do not need to create the XML by string concatenation.
public class SitemapSplitter {
static String sitemapStr = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n" +
"<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n" +
"<url>\n" +
"<loc>test1.html</loc>\n" +
"<lastmod>today</lastmod>\n" +
"<changefreq>daily</changefreq>\n" +
"<priority>1.0</priority>\n" +
"</url>\n" +
"<url>\n" +
"<loc>test2.html</loc>\n" +
"<lastmod>yesterday</lastmod>\n" +
"<changefreq>daily</changefreq>\n" +
"<priority>1.0</priority>\n" +
"</url></urlset>";
public interface Sitemap {
#XBWrite("/urlset/url")
Sitemap setUrls(List<? extends Node> urls);
}
public static void main(String... args) {
XBProjector projector = new XBProjector(Flags.TO_STRING_RENDERS_XML);
// Get all urls from existing sitemap.
List<Node> urlNodes = projector.onXMLString(sitemapStr).evalXPath("/xbdefaultns:urlset/xbdefaultns:url").asListOf(Node.class);
for (Node urlNode: urlNodes) {
// Create a new sitemap, here with only one url
Sitemap newSitemap = projector.onXMLString(sitemapStr).createProjection(Sitemap.class).setUrls(Collections.singletonList(urlNode));
System.out.println(newSitemap);
}
}
}
This program prints out
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>test1.html</loc>
<lastmod>today</lastmod>
<changefreq>daily</changefreq>
<priority>1.0</priority>
</url>
</urlset>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>test2.html</loc>
<lastmod>yesterday</lastmod>
<changefreq>daily</changefreq>
<priority>1.0</priority>
</url>
</urlset>

Java Xml Parsing :Distinguish two Xml Elements by missing tag

From a Webserver request i get an XML Response,which contains my needed data.
it looks like (excerpt):
<ctc:BasePrice>
<cgc:PriceAmount amountCurrencyID="EUR">18.75</cbc:PriceAmount>
<cgc:BaseQuantity quantityUnitCode="EA">1</cbc:BaseQuantity>
</ctc:BasePrice>
<ctc:BasePrice>
<cgc:PriceAmount amountCurrencyID="EUR">18.25</cbc:PriceAmount>
<cgc:BaseQuantity quantityUnitCode="EA">1</cbc:BaseQuantity>
<cgc:MinimumQuantity quantityUnitCode="EA">3</cbc:MinimumQuantity>
<ctc:BasePrice>
What i need is the first "PriceAmount" value,which could be a different price then the second.
But how can i make sure to retrieve the correct one,by "telling" the parser he should take the element which does not contain the "MinimumQuantity" Field and distinguish them ?
I read a lot in Sax etc but could find an idea how to implement a "logic" for that.
Maybe someone ran into similar problem.Thanks in advance for every hint.

You could use xpath for this. The expression "//*[local-name()='MinimumQuantity']/../*[local-name()='PriceAmount']" should return what you want. eg
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class XpathParser {
public static void main(String[] args) {
try {
String xml = "<root>"
+ "<ctc:BasePrice>"
+ "<cgc:PriceAmount amountCurrencyID=\"EUR\">18.75</cgc:PriceAmount>"
+ "<cgc:BaseQuantity quantityUnitCode=\"EA\">1</cgc:BaseQuantity>"
+ "</ctc:BasePrice>"
+ "<ctc:BasePrice>"
+ "<cgc:PriceAmount amountCurrencyID=\"EUR\">18.25</cgc:PriceAmount>"
+ "<cgc:BaseQuantity quantityUnitCode=\"EA\">1</cgc:BaseQuantity>"
+ "<cgc:MinimumQuantity quantityUnitCode=\"EA\">3</cgc:MinimumQuantity>"
+ "</ctc:BasePrice>"
+ "</root>";
InputStream xmlStream = new ByteArrayInputStream(xml.getBytes());
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(xmlStream);
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "//*[local-name() = 'BasePrice' and not(descendant::*[local-name() = 'MinimumQuantity'])]/*[local-name()='PriceAmount']";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.println(nodeList.item(i).getFirstChild().getNodeValue());
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (XPathExpressionException e) {
e.printStackTrace();
}
}
}
Dom Parser way
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class DomParser{
public static void main(String[] args) {
try {
String xml = "<root>"
+ "<ctc:BasePrice>"
+ "<cgc:PriceAmount amountCurrencyID=\"EUR\">18.75</cgc:PriceAmount>"
+ "<cgc:BaseQuantity quantityUnitCode=\"EA\">1</cgc:BaseQuantity>"
+ "</ctc:BasePrice>"
+ "<ctc:BasePrice>"
+ "<cgc:PriceAmount amountCurrencyID=\"EUR\">18.25</cgc:PriceAmount>"
+ "<cgc:BaseQuantity quantityUnitCode=\"EA\">1</cgc:BaseQuantity>"
+ "<cgc:MinimumQuantity quantityUnitCode=\"EA\">3</cgc:MinimumQuantity>"
+ "</ctc:BasePrice>"
+ "</root>";
InputStream xmlStream = new ByteArrayInputStream(xml.getBytes());
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(xmlStream);
xmlDocument.getDocumentElement().normalize();
NodeList nList = xmlDocument.getElementsByTagName("ctc:BasePrice");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
if(eElement.getElementsByTagName("cgc:MinimumQuantity").getLength() == 0){
System.out.println(eElement.getElementsByTagName("cgc:PriceAmount").item(0).getTextContent());
}
}
}
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}

Java jdom xml parsing

it's my first day with java and I try to build a little xml parser for my websites, so I can have a clean look on my sitemaps.xml . The code I use is like that
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URL;
import java.util.List;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
class downloadxml {
public static void main(String[] args) throws IOException {
String str = "http://www.someurl.info/sitemap.xml";
URL url = new URL(str);
InputStream is = url.openStream();
int ptr = 0;
StringBuilder builder = new StringBuilder();
while ((ptr = is.read()) != -1) {
builder.append((char) ptr);
}
String xml = builder.toString();
org.jdom2.input.SAXBuilder saxBuilder = new SAXBuilder();
try {
org.jdom2.Document doc = saxBuilder.build(new StringReader(xml));
System.out.println(xml);
Element xmlfile = doc.getRootElement();
System.out.println("ROOT -->"+xmlfile);
List list = xmlfile.getChildren("url");
System.out.println("LIST -->"+list);
} catch (JDOMException e) {
// handle JDOMExceptio n
} catch (IOException e) {
// handle IOException
}
System.out.println("===========================");
}
}
When the code pass
System.out.println(xml);
I get a clean print of the xml sitemap. When it comes to:
System.out.println("ROOT -->"+xmlfile);
Output:
ROOT -->[Element: <urlset [Namespace: http://www.sitemaps.org/schemas/sitemap/0.9]/>]
It also finds the root element. But for some reason or another, when the script should go for the childs, it return an empty print:
System.out.println("LIST -->"+list);
Output:
LIST -->[]
What should I do in another way? Any pointers to get the childs?
The XML looks like this
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>http://www.image.url</loc>
<image:image>
<image:loc>http://www.image.url/image.jpg</image:loc>
</image:image>
<changefreq>daily</changefreq>
</url>
<url>
</urlset>

You've come a long way in a day.
Short answer, you are ignoring the namespace of your XML Document. Change the line:
List list = xmlfile.getChildren("url");
to
Namespace ns = Namespace.getNamespace("http://www.sitemaps.org/schemas/sitemap/0.9");
List list = xmlfile.getChildren("url", ns);
For your convenience, you may also want to simplify the whole build process to:
org.jdom2.Document doc = saxBuilder.build("http://www.someurl.info/sitemap.xml");

My comment is similar to the above, but with the catch clauses, that display nice messages when the input xml is not "well-formed". The input here is an xml file.
File file = new File("adr781.xml");
SAXBuilder builder = new SAXBuilder(false);
try {
Document doc = builder.build(file);
Element root = doc.getRootElement();
} catch (JDOMException e) {
say(file.getName() + " is not well-formed.");
say(e.getMessage());
} catch (IOException e) {
say("Could not check " + file.getAbsolutePath());
say(" because " + e.getMessage());
}

how to parse this XML using DOM and put its content in a hashtable?

I want two hash tables out of the following XML. The first one being (screen id,widget id)
and the second one being (widget id,string id).
I have been able to parse this XML using DOM but putting its content into a hash table is what I haven't done.
<?xml version="1.0" encoding="UTF-8"?>
<screen id="616699" name ="SCR_NEW_HOME">
<widget id="617259" type="label" name= "NEW_HOME_TITLE">
<attribute type = "Strings">
<val id="54">HOME_SYSSETUP</val>
</attribute>
</widget>
<widget id="616836" type = "label" name ="HOME_MENU">
<attribute type="Strings">
<val id="1815" >DAILY</val>
<val id="2060" >MONTH_NOV</val>
<val id="1221" >ASPECT_RATIO_PANSCAN</val>
</attribute>
</widget>
<screen id="1556" name="SCR_EVENTLIST">
<widget id="77009" type= "label" name="EL_GUIDE_EVENT_TABLE">
<attribute type ="Strings">
<val id="1">time</val>
<val id="2">date</val>
</attribute>
</widget>
<widget id="186461" type= "label" name= "EL_PIG_CONT">
<attribute type ="Strings">
<val id="3">progress bar</val>
<val id="4">video</val>
</attribute>
</widget>
and the code I have tried is
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Hashtable;
public class ReadXmlFile {
private static Hashtable<Integer,ArrayList<Integer>> D1 = new Hashtable<Integer, ArrayList<Integer>>();
private static Hashtable<Integer,ArrayList<Integer>> D2 = new Hashtable<Integer,ArrayList<Integer>>();
static Integer ScreenID;
static ArrayList<Integer> StringID;
static ArrayList<Integer> WidgetID;
static Integer WidgetID2;
public static void main(String argv[]) {
// try {
File fXmlFile = new File("E:/eclipse workspace/stringValidation/screens.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = null;
try {
dBuilder = dbFactory.newDocumentBuilder();
} catch (ParserConfigurationException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
Document doc = null;
try {
doc = dBuilder.parse(fXmlFile);
} catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
doc.getDocumentElement().normalize();
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
if(doc.hasChildNodes()){
printNote(doc.getChildNodes());
}
}
private static void printNote(NodeList nodeList) {
for (int count = 0; count < nodeList.getLength(); count++) {
Node tempNode = nodeList.item(count);
// make sure it's element node.
if (tempNode.getNodeType() == Node.ELEMENT_NODE) {
// get node name and value
System.out.println("\nNode Name =" + tempNode.getNodeName() + " [OPEN]");
System.out.println("Node Value =" + tempNode.getTextContent());
if (tempNode.hasAttributes()) {
// get attributes names and values
NamedNodeMap nodeMap = tempNode.getAttributes();
for (int i = 0; i < nodeMap.getLength(); i++) {
Node node = nodeMap.item(i);
System.out.println("attr name : " + node.getNodeName());
System.out.println("attr value : " + node.getNodeValue());
}
}
if (tempNode.hasChildNodes()) {
// loop again if has child nodes
printNote(tempNode.getChildNodes());
}
System.out.println("Node Name =" + tempNode.getNodeName() + " [CLOSE]");
}
}
}
}

By looking at your code I assume that you are able to print out the whole document, but not to locate the data you are looking for. I do things like that with XMLBeam. It lets you create an object oriented representation of the data you need, without having to follow the complete structure of the xml you are processing. Here is how to extract the data in your first xml file, the second is just as easy (and much shorter than walking through the DOM by hand):
public class TestFirst {
#XBDocURL("res://first.xml")
public interface Projection {
#XBRead("//screen")
List<Screen> getScreens();
}
public interface Widget {
#XBRead("./#id")
String getID();
#XBRead("./attribute[#type='Strings']/val")
List<String> getStringAttributes();
}
public interface Screen {
#XBRead("./#id")
String getID();
#XBRead("./widget")
List<Widget> getWidgets();
}
#Test
public void testFirst() throws IOException {
Projection projection = new XBProjector().io().fromURLAnnotation(Projection.class);
for (Screen screen:projection.getScreens()) {
for (Widget widget:screen.getWidgets()) {
for (String string:widget.getStringAttributes()) {
System.out.println(screen.getID()+" "+ widget.getID()+ " "+ string);
}
}
}
}
}
This prints out
616699 617259 HOME_SYSSETUP
616699 616836 DAILY
616699 616836 MONTH_NOV
616699 616836 ASPECT_RATIO_PANSCAN
Now you should be able to fill your HashTable. (Plz consider HashMap or ConcurrentHashMap)

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Parsing xml data from one xml to a new xml in Java - java

Related

No getting desired XML output in Java

splitting Sitemap into more sitemaps if it has more than maxnumber of urls

Java Xml Parsing :Distinguish two Xml Elements by missing tag

Java jdom xml parsing

how to parse this XML using DOM and put its content in a hashtable?

Categories

Resources