Parse Specific Elements DOM - Java


I believe this is a simple question, but I am having trouble figuring out how it works.
This is the XML file (from www.w3schools.com):
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Edited by XMLSpy® -->
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="web" cover="paperback">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
As you can see, the book XQuery Kick Start has more than one author.
But I can't find a way to get the right number of authors.
This is my code:
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
...
public static void main(String argv[]) throws ParserConfigurationException, SAXException, IOException {
// Relative path: "\books.xml" does not work, because "\b" is a backspace escape in a Java string literal
File fXmlFile = new File("books.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("book");
System.out.println("----------------------------");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
System.out.println("\nCurrent Element :" + nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("Category : " + eElement.getAttribute("category"));
System.out.println("Title : " + eElement.getElementsByTagName("title").item(0).getTextContent());
System.out.println("Author : " + eElement.getElementsByTagName("author").item(0).getTextContent());
System.out.println("Year : " + eElement.getElementsByTagName("year").item(0).getTextContent());
System.out.println("Price : " + eElement.getElementsByTagName("price").item(0).getTextContent());
}
}
}
But as a result I only get one author:
Root element :bookstore
----------------------------
Current Element :book
Category : cooking
Title : Everyday Italian
Author : Giada De Laurentiis
Year : 2005
Price : 30.00
Current Element :book
Category : children
Title : Harry Potter
Author : J K. Rowling
Year : 2005
Price : 29.99
Current Element :book
Category : web
Title : XQuery Kick Start
Author : James McGovern
Year : 2003
Price : 49.99
Current Element :book
Category : web
Title : Learning XML
Author : Erik T. Ray
Year : 2003
Price : 39.95
Does anyone know a good way to get the right number of elements?
Sorry about the long question; I didn't know how else to explain it, so I pasted everything here.
*I'm new to DOM*

You are always getting the first author, because you only retrieve the first item of the NodeList:
getElementsByTagName("author").item(0)
Iterate over all of them instead, since a book can have more than one:
NodeList authors = eElement.getElementsByTagName("author");
for (int i = 0; i < authors.getLength(); i++) {
System.out.println("Author : " + authors.item(i).getTextContent());
}
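A self-contained sketch of the same idea, assuming the XML is supplied as an inline string (a trimmed copy of books.xml) instead of being read from disk. The class name BookAuthors, the SAMPLE constant and the authorsByTitle helper are hypothetical names used only for illustration. It collects every author of each book into a list rather than reading only item(0):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class BookAuthors {

    // Hypothetical trimmed copy of the question's books.xml, inlined so the sketch runs without a file.
    static final String SAMPLE =
        "<bookstore>"
        + "<book category=\"cooking\"><title lang=\"en\">Everyday Italian</title>"
        + "<author>Giada De Laurentiis</author></book>"
        + "<book category=\"web\"><title lang=\"en\">XQuery Kick Start</title>"
        + "<author>James McGovern</author><author>Per Bothner</author>"
        + "<author>Kurt Cagle</author><author>James Linn</author>"
        + "<author>Vaidyanathan Nagarajan</author></book>"
        + "</bookstore>";

    // Maps each book title to all of its authors, in document order.
    static Map<String, List<String>> authorsByTitle(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            String title = book.getElementsByTagName("title").item(0).getTextContent();

            // Fetch the author NodeList once, then walk every item instead of only item(0).
            NodeList authorNodes = book.getElementsByTagName("author");
            List<String> authors = new ArrayList<>();
            for (int j = 0; j < authorNodes.getLength(); j++) {
                authors.add(authorNodes.item(j).getTextContent());
            }
            result.put(title, authors);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (Map.Entry<String, List<String>> e : authorsByTitle(SAMPLE).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

With this sample, XQuery Kick Start ends up with all five authors. The LinkedHashMap is only there to keep the books in document order when printing.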


Parsing the multilevel XML File with Java - Dom Parser

I have this XML file that contains 3 categories: employee_list, position_details and employee_info.
<?xml version="1.0" encoding="UTF-8"?>
<employee>
<employee_list>
<employee ID="1">
<firstname>Andrei</firstname>
<lastname>Rus</lastname>
<age>23</age>
<position-skill ref="Java"/>
<detail-ref ref="AndreiR"/>
</employee>
<employee ID="2">
<firstname>Ion</firstname>
<lastname>Popescu</lastname>
<age>25</age>
<position-skill ref="Python"/>
<detail-ref ref="IonP"/>
</employee>
<employee ID="3">
<firstname>Georgiana</firstname>
<lastname>Domide</lastname>
<age>33</age>
<position-skill ref="C"/>
<detail-ref ref="GeorgianaD"/>
</employee>
</employee_list>
<position_details>
<position ID="Java">
<role>Junior Developer</role>
<skill_name>Java</skill_name>
<experience>1</experience><!-- years of experience -->
</position>
<position ID="Python">
<role>Developer</role>
<skill_name>Python</skill_name>
<experience>3</experience>
</position>
<position ID="C">
<role>Senior Developer</role>
<skill_name>C</skill_name>
<experience>5</experience>
</position>
</position_details>
<employee_info>
<detail ID="AndreiR">
<username>AndreiR</username>
<residence>Timisoara</residence>
<yearOfBirth>1999</yearOfBirth>
<phone>0</phone>
</detail>
<detail ID="IonP">
<username>IonP</username>
<residence>Timisoara</residence>
<yearOfBirth>1997</yearOfBirth>
<phone>0</phone>
</detail>
<detail ID="GeorgianaD">
<username>GeorgianaD</username>
<residence>Arad</residence>
<yearOfBirth>1989</yearOfBirth>
<phone>0</phone>
</detail>
</employee_info>
</employee>
I would like to write Java code for all 3 categories, but so far I have only managed to get the first category (employee_list) working. When I try to retrieve information from the position_details or employee_info category, the program does not find the information belonging to each category.
I wrote the Java code for the 3 categories and the result looks like this:
package Dom;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.File;
import java.io.IOException;
import java.util.Scanner;
public class main {
public static void main(String[] args) {
try {
File xmlDoc = new File("employees.xml");
DocumentBuilderFactory dbFact = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuild = dbFact.newDocumentBuilder();
Document doc = dBuild.parse(xmlDoc);
//Read the root element
// doc locates the root element and gives its name
System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
System.out.println("-----------------------------------------------------------------------------");
//read an array of students, which we call NodeList
NodeList nList = doc.getElementsByTagName("employee");
System.out.println("Total Category inside = " + nList.getLength());
System.out.println("-----------------------------------------------------");
for(int i = 0 ; i<nList.getLength();i++) {
Node nNode = nList.item(i);
//System.out.println("Node name: " + nNode.getNodeName()+" " + (i+1));
if(nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("Person id#: " + eElement.getAttribute("id"));
System.out.println("Person Last Name: " + eElement.getElementsByTagName("lastname").item(0).getTextContent());
System.out.println("Person First name: " + eElement.getElementsByTagName("firstname").item(0).getTextContent());
System.out.println("Person Age: " + eElement.getElementsByTagName("age").item(0).getTextContent());
System.out.println("--------------------------------------------------------------------------");
}
}
System.out.println("=============================================================================================");
nList = doc.getElementsByTagName("position");
System.out.println("Total Category inside = " + nList.getLength());
System.out.println("-----------------------------------------------------");
for(int i = 0 ; i<nList.getLength();i++) {
Node nNode = nList.item(i);
//System.out.println("Node name: " + nNode.getNodeName()+" " + (i+1));
if(nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("Role: " + eElement.getElementsByTagName("role").item(0).getTextContent());
System.out.println("Skill: "+ eElement.getElementsByTagName("skill_name").item(0).getTextContent());
System.out.println("Experience: "+ eElement.getElementsByTagName("experience").item(0).getTextContent());
System.out.println("--------------------------------------------------------------------------");
}
}
System.out.println("=============================================================================================");
nList = doc.getElementsByTagName("detail");
System.out.println("Total Category inside = " + nList.getLength());
System.out.println("-----------------------------------------------------");
for(int i = 0 ; i<nList.getLength();i++) {
Node nNode = nList.item(i);
//System.out.println("Node name: " + nNode.getNodeName()+" " + (i+1));
if(nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("Person with username: " + eElement.getElementsByTagName("username").item(0).getTextContent());
System.out.println("Username: " + eElement.getElementsByTagName("username").item(0).getTextContent());
System.out.println("Residence: "+ eElement.getElementsByTagName("residence").item(0).getTextContent());
System.out.println("Year of birth: "+ eElement.getElementsByTagName("yearOfBirth").item(0).getTextContent());
System.out.println("Phone: "+ eElement.getElementsByTagName("phone").item(0).getTextContent());
System.out.println("--------------------------------------------------------------------------");
}
}
}catch(Exception e) {
e.printStackTrace(); // report parse errors instead of silently ignoring them
}
}
}
output:
Root element: employee
-----------------------------------------------------------------------------
Total Category inside = 4
-----------------------------------------------------
Person id#:
Person Last Name: Rus
Person First name: Andrei
Person Age: 23
--------------------------------------------------------------------------
Person id#:
Person Last Name: Rus
Person First name: Andrei
Person Age: 23
--------------------------------------------------------------------------
Person id#:
Person Last Name: Popescu
Person First name: Ion
Person Age: 25
--------------------------------------------------------------------------
Person id#:
Person Last Name: Domide
Person First name: Georgiana
Person Age: 33
--------------------------------------------------------------------------
=============================================================================================
Total Category inside = 3
-----------------------------------------------------
Role: Junior Developer
Skill: Java
Experience: 1
--------------------------------------------------------------------------
Role: Developer
Skill: Python
Experience: 3
--------------------------------------------------------------------------
Role: Senior Developer
Skill: C
Experience: 5
--------------------------------------------------------------------------
=============================================================================================
Total Category inside = 3
-----------------------------------------------------
Person with username: AndreiR
Username: AndreiR
Residence: Timisoara
Year of birth: 1999
Phone: 0
--------------------------------------------------------------------------
Person with username: IonP
Username: IonP
Residence: Timisoara
Year of birth: 1997
Phone: 0
--------------------------------------------------------------------------
Person with username: GeorgianaD
Username: GeorgianaD
Residence: Arad
Year of birth: 1989
Phone: 0
--------------------------------------------------------------------------
Is it possible to group the output per person, in the following form:
PersonId
firstname
lastname
age
role
skill_name
experience
username
residence
yearOfBirth
phone

But you've basically done it already: you wrote the code for the three categories. "Failing to find information according to each category" probably means that the output is not what you expected. The reason is that Document.getElementsByTagName() searches the whole document for all elements with the name passed as the argument. Because your root element is also named employee, it is included as an additional node in the NodeList returned by doc.getElementsByTagName("employee"), and it has no ID attribute of its own. Hence "Total Category inside = 4". If you then call getElementsByTagName("lastname") on this first node, the root, it does find a <lastname/> element below it, just not as a direct child: it is two levels down, inside the first "real" <employee/>. That is why Andrei Rus is printed twice. Note also that attribute names are case-sensitive: your code asks for "id", but the XML uses "ID", so every "Person id#:" line comes out empty.
So what you probably want is to search for your element names not globally, but within the local context of each category, as you already do successfully inside the loops. For example, change
NodeList nList = doc.getElementsByTagName("employee");
to
NodeList nList = doc.getElementsByTagName("employee_list");
nList = ((Element)nList.item(0)).getElementsByTagName("employee");
for the employee_list category, and likewise for the other categories.
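A minimal sketch of the difference between the global and the scoped lookup, assuming a cut-down inline copy of employees.xml. ScopedLookup, SAMPLE, parse, globalCount and scopedEmployees are hypothetical names for illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ScopedLookup {

    // Hypothetical cut-down employees.xml: the root element is also called "employee".
    static final String SAMPLE =
        "<employee><employee_list>"
        + "<employee ID=\"1\"><lastname>Rus</lastname></employee>"
        + "<employee ID=\"2\"><lastname>Popescu</lastname></employee>"
        + "<employee ID=\"3\"><lastname>Domide</lastname></employee>"
        + "</employee_list></employee>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Global search over the whole document: the root <employee> matches too.
    static int globalCount(Document doc) {
        return doc.getElementsByTagName("employee").getLength();
    }

    // Scoped search: only <employee> elements below <employee_list>.
    static NodeList scopedEmployees(Document doc) {
        Element list = (Element) doc.getElementsByTagName("employee_list").item(0);
        return list.getElementsByTagName("employee");
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println("Global count: " + globalCount(doc));
        NodeList employees = scopedEmployees(doc);
        for (int i = 0; i < employees.getLength(); i++) {
            Element e = (Element) employees.item(i);
            // Attribute names are case-sensitive: the XML uses "ID", not "id".
            System.out.println(e.getAttribute("ID") + " "
                + e.getElementsByTagName("lastname").item(0).getTextContent());
        }
    }
}
```

With this sample the global lookup counts 4 elements (the root plus the three real employees), while the scoped lookup returns only the 3 employees, each with a non-empty ID.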
To group the records more nicely, you don't need to print them immediately. You can store the values you read from the DOM in the fields of an object of a class you define, or in a more generic structure such as a List of Maps. You can then loop over the objects or the list you created and print the field values in your preferred order.
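A sketch of the List-of-Maps variant, under one assumption not stated in the question: that position-skill/@ref points at position/@ID and detail-ref/@ref points at detail/@ID (the names suggest this, and every ref in the sample XML has a matching ID). GroupedEmployees, SAMPLE, text, indexById and records are hypothetical names. It builds one ordered record per employee in the field order requested above:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class GroupedEmployees {

    // Hypothetical one-employee excerpt of employees.xml, inlined for a self-contained run.
    static final String SAMPLE =
        "<employee>"
        + "<employee_list><employee ID=\"1\"><firstname>Andrei</firstname><lastname>Rus</lastname>"
        + "<age>23</age><position-skill ref=\"Java\"/><detail-ref ref=\"AndreiR\"/></employee></employee_list>"
        + "<position_details><position ID=\"Java\"><role>Junior Developer</role>"
        + "<skill_name>Java</skill_name><experience>1</experience></position></position_details>"
        + "<employee_info><detail ID=\"AndreiR\"><username>AndreiR</username><residence>Timisoara</residence>"
        + "<yearOfBirth>1999</yearOfBirth><phone>0</phone></detail></employee_info>"
        + "</employee>";

    // Text of the first descendant element with the given tag name.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    // Indexes the elements of one category (e.g. <position> under <position_details>) by their ID attribute.
    static Map<String, Element> indexById(Document doc, String category, String tag) {
        Map<String, Element> index = new HashMap<>();
        Element section = (Element) doc.getElementsByTagName(category).item(0);
        NodeList items = section.getElementsByTagName(tag);
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            index.put(item.getAttribute("ID"), item);
        }
        return index;
    }

    // One ordered record per employee, following the ref attributes into the other two categories.
    static List<Map<String, String>> records(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, Element> positions = indexById(doc, "position_details", "position");
        Map<String, Element> details = indexById(doc, "employee_info", "detail");

        List<Map<String, String>> result = new ArrayList<>();
        NodeList employees = ((Element) doc.getElementsByTagName("employee_list").item(0))
            .getElementsByTagName("employee");
        for (int i = 0; i < employees.getLength(); i++) {
            Element emp = (Element) employees.item(i);
            String posRef = ((Element) emp.getElementsByTagName("position-skill").item(0)).getAttribute("ref");
            String detRef = ((Element) emp.getElementsByTagName("detail-ref").item(0)).getAttribute("ref");
            Element pos = positions.get(posRef);
            Element det = details.get(detRef);

            Map<String, String> r = new LinkedHashMap<>(); // keeps insertion order for printing
            r.put("PersonId", emp.getAttribute("ID"));
            for (String tag : new String[] {"firstname", "lastname", "age"}) r.put(tag, text(emp, tag));
            for (String tag : new String[] {"role", "skill_name", "experience"}) r.put(tag, text(pos, tag));
            for (String tag : new String[] {"username", "residence", "yearOfBirth", "phone"}) r.put(tag, text(det, tag));
            result.add(r);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (Map<String, String> r : records(SAMPLE)) {
            r.forEach((k, v) -> System.out.println(k + ": " + v));
            System.out.println("----");
        }
    }
}
```

A dedicated Employee class would give you typed fields instead; the map-based version just avoids writing one field per tag. If a ref has no matching ID, positions.get or details.get returns null and the text call fails, so real data may need a check there.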

How to convert xml file to HashMap using apache Tika

In my case I am able to read the XML file and parse it to get its content, but the metadata only provides the file type, which is "application/xml".
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.xml.XMLParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;
public class XmlParserExample {
public static void main(String[] args) throws IOException, SAXException, TikaException {
BodyContentHandler handler = new BodyContentHandler();
XMLParser parser = new XMLParser();
Metadata metadata = new Metadata();
ParseContext pcontext = new ParseContext();
FileInputStream inputstream = new FileInputStream(new File("example.xml"));
parser.parse(inputstream, handler, metadata, pcontext);
System.out.println("Contents of the document:" + handler.toString());
System.out.println("Metadata of the document:");
String[] metadataNames = metadata.names();
for(String name : metadataNames) {
System.out.println(name + ": " + metadata.get(name));
}
}
}
The above snippet prints the whole XML content and the Content-Type (as metadata). But I also want to get the XML tags, so that I can build a HashMap, which is a requirement in my case.
Below is my dummy example.xml:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE PubmedArticleSet SYSTEM "http://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_190101.dtd">
<PubmedArticleSet>
<PubmedArticle>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">27483086</PMID>
<DateCompleted>
<Year>2018</Year>
<Month>05</Month>
<Day>02</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>05</Month>
<Day>02</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1532-849X</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>26</Volume>
<Issue>4</Issue>
<PubDate>
<Year>2017</Year>
<Month>Jun</Month>
</PubDate>
</JournalIssue>
<Title>Journal of prosthodontics : official journal of the American College of Prosthodontists</Title>
<ISOAbbreviation>J Prosthodont</ISOAbbreviation>
</Journal>
<ArticleTitle>The Use of CADCAM Technology for Fabricating Cast Gold Survey Crowns under Existing Partial Removable Dental Prosthesis. A Clinical Report.</ArticleTitle>
<Pagination>
<MedlinePgn>321-326</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1111jopr.12525</ELocationID>
<Abstract>
<AbstractText>The fabrication of a survey crown under an existing partial removable dental prosthesis (PRDP) has always been a challenge to many dental practitioners. This clinical report presents a technique for fabricating accurate cast gold survey crowns to fit existing PRDPs using CAD/CAM technology. The report describes a technique that would digitally scan the coronal anatomy of a cast gold survey crown and an abutment tooth under existing PRDPs planned for restoration, prior to any preparation. The information is stored in the digital software where all the coronal anatomical details are preserved without any modifications. The scanned designs are then applied to the scanned teeth preparations, sent to the milling machine and milled into full-contour clear acrylic resin burn-out patterns. The acrylic resin patterns are tried in the patient's mouth the same day to verify the full seating of the PRDP components. The patterns are then invested and cast into gold crowns and cemented in the conventional manner.</AbstractText>
<CopyrightInformation>© 2016 by the American College of Prosthodontists.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>El Kerdani</LastName>
<ForeName>Tarek</ForeName>
<Initials>T</Initials>
<AffiliationInfo>
<Affiliation>Department of Restorative Dental Sciences, Division of Prosthodontics, University of Florida College of Dentistry, Gainesville, FL.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Roushdy</LastName>
<ForeName>Sally</ForeName>
<Initials>S</Initials>
<AffiliationInfo>
<Affiliation>Department of Restorative Dental Sciences, Division of Prosthodontics, University of Florida College of Dentistry, Gainesville, FL.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D002363">Case Reports</PublicationType>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2016</Year>
<Month>08</Month>
<Day>02</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>J Prosthodont</MedlineTA>
<NlmUniqueID>9301275</NlmUniqueID>
<ISSNLinking>1059-941X</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>7440-57-5</RegistryNumber>
<NameOfSubstance UI="D006046">Gold</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>D</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000368" MajorTopicYN="N">Aged</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017076" MajorTopicYN="Y">Computer-Aided Design</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D003442" MajorTopicYN="Y">Crowns</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D000044" MajorTopicYN="N">Dental Abutments</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017267" MajorTopicYN="Y">Dental Prosthesis Design</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D003832" MajorTopicYN="Y">Denture, Partial, Removable</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006046" MajorTopicYN="N">Gold</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008297" MajorTopicYN="N">Male</DescriptorName>
</MeshHeading>
</MeshHeadingList>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">CADM</Keyword>
<Keyword MajorTopicYN="N">cast gold</Keyword>
<Keyword MajorTopicYN="N">milled acrylic resin patterns</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="accepted">
<Year>2016</Year>
<Month>06</Month>
<Day>13</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2016</Year>
<Month>8</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2018</Year>
<Month>5</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2016</Year>
<Month>8</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">27483086</ArticleId>
<ArticleId IdType="doi">10.111pr.12525</ArticleId>
</ArticleIdList>
</PubmedData>
</PubmedArticle>
<PubmedArticle>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">27483087</PMID>
<DateCompleted>
<Year>2018</Year>
<Month>08</Month>
<Day>07</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>08</Month>
<Day>07</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">2326-5205</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>68</Volume>
<Issue>11</Issue>
<PubDate>
<Year>2016</Year>
<Month>11</Month>
</PubDate>
</JournalIssue>
<Title>Arthritis &amp; rheumatology (Hoboken, N.J.)</Title>
</Journal>
<ArticleTitle>Reply.</ArticleTitle>
<Pagination>
<MedlinePgn>2826-2827</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10t.39831</ELocationID>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Hitchon</LastName>
<ForeName>Carol Ann</ForeName>
<Initials>CA</Initials>
<AffiliationInfo>
<Affiliation>University of Manitoba, Winnipeg, Manitoba, Canada.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Koppejan</LastName>
<ForeName>Hester</ForeName>
<Initials>H</Initials>
<AffiliationInfo>
<Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Trouw</LastName>
<ForeName>Leendert A</ForeName>
<Initials>LA</Initials>
<AffiliationInfo>
<Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Huizinga</LastName>
<ForeName>Tom J W</ForeName>
<Initials>TJ</Initials>
<AffiliationInfo>
<Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Toes</LastName>
<ForeName>René E M</ForeName>
<Initials>RE</Initials>
<AffiliationInfo>
<Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>El-Gabalawy</LastName>
<ForeName>Hani S</ForeName>
<Initials>HS</Initials>
<AffiliationInfo>
<Affiliation>University of Manitoba, Winnipeg, Manitoba, Canada.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>MOP‐77700</GrantID>
<Agency>CIHR</Agency>
<Country>Canada</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D016422">Letter</PublicationType>
<PublicationType UI="D013485">Research Sup</PublicationType>
<PublicationType UI="D016420">Comment</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2016</Year>
<Month>10</Month>
<Day>09</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>Arthritis Rheumatol</MedlineTA>
<NlmUniqueID>101623795</NlmUniqueID>
<ISSNLinking>2326-5191</ISSNLinking>
</MedlineJournalInfo>
<CommentsCorrectionsList>
<CommentsCorrections RefType="CommentOn">
<RefSource>dff</RefSource>
<PMID Version="1">27483211</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="CommentOn">
<RefSource>Arthritis Rheumato</RefSource>
<PMID Version="1">26946484</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2016</Year>
<Month>07</Month>
<Day>26</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2016</Year>
<Month>07</Month>
<Day>28</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2016</Year>
<Month>10</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2016</Year>
<Month>10</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>1</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2016</Year>
<Month>8</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">27483087</ArticleId>
<ArticleId IdType="doi">efre</ArticleId>
</ArticleIdList>
</PubmedData>
</PubmedArticle>
</PubmedArticleSet>
Kindly help me out on this.
Thanks

My suggestion: If you want to read an XML file, and then parse its contents, you are probably better off using a purpose-built XML parser, rather than Tika.
There are various options - each with its own pros and cons (for example speed, memory consumption).
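As an illustration of the memory trade-off, a streaming StAX reader (javax.xml.stream, part of the JDK) visits elements one at a time instead of building a tree. This is a minimal sketch, not the approach used below; StaxTitles, SAMPLE and articleTitles are hypothetical names, and disabling DTD support is an assumption so that the external DTD in the DOCTYPE is not processed:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxTitles {

    // Hypothetical minimal input with the same element names as the PubMed file.
    static final String SAMPLE = "<PubmedArticleSet>"
        + "<PubmedArticle><ArticleTitle>First title.</ArticleTitle></PubmedArticle>"
        + "<PubmedArticle><ArticleTitle>Reply.</ArticleTitle></PubmedArticle>"
        + "</PubmedArticleSet>";

    // Collects the text of every <ArticleTitle> element while streaming through the input.
    static List<String> articleTitles(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: DTD processing is not needed, so it is switched off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("ArticleTitle")) {
                    // getElementText() reads the text-only content up to the matching end tag.
                    titles.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(articleTitles(SAMPLE));
    }
}
```

For a single file of this size the DOM approach is simpler; streaming mainly pays off when the input is too large to keep in memory.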
Here is one approach - it reads the entire file into memory, but you already do that with your Tika approach, so I assume file size is not a problem.
The code assumes there is a file called pubmed.xml which contains the XML presented in the question.
It reads the XML from file, and handles each element as a DOM node.
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
...
public void parseUsingDom() {
try {
File xmlFile = new File("C:/tmp/pubmed.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
doc.getDocumentElement().normalize();
NodeList articles = doc.getElementsByTagName("Article");
for (int i = 0; i < articles.getLength(); i++) {
Node article = articles.item(i);
if (article.getNodeType() == Node.ELEMENT_NODE) {
Element articleElement = (Element) article;
String title = articleElement
.getElementsByTagName("ArticleTitle")
.item(0).getTextContent();
System.out.println("");
System.out.println("Title : " + title);
NodeList authors = articleElement.getElementsByTagName("Author");
for (int j = 0; j < authors.getLength(); j++) {
Node author = authors.item(j);
if (author.getNodeType() == Node.ELEMENT_NODE) {
Element authorElement = (Element) author;
String foreName = authorElement
.getElementsByTagName("ForeName")
.item(0).getTextContent();
String lastName = authorElement
.getElementsByTagName("LastName")
.item(0).getTextContent();
System.out.println("Author : " + lastName + ", " + foreName);
}
}
}
}
} catch (Exception e) {
System.err.print(e);
}
}
The program prints the following output, just as a demo of what is possible:
Title : The Use of CADCAM Technology for Fabricating Cast Gold Survey Crowns under Existing Partial Removable Dental Prosthesis. A Clinical Report.
Author : El Kerdani, Tarek
Author : Roushdy, Sally
Title : Reply.
Author : Hitchon, Carol Ann
Author : Koppejan, Hester
Author : Trouw, Leendert A
Author : Huizinga, Tom J W
Author : Toes, René E M
Author : El-Gabalawy, Hani S
In your case, you would capture the relevant values in a hash map, of course.
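One possible way to build that map, sketched under the assumption that "tag to value" means: key = the slash-separated path of tag names from the root, value = the list of texts of all leaf elements at that path (repeated tags such as Author/LastName accumulate). XmlToMap, SAMPLE, flatten and walk are hypothetical names, and attributes are ignored:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlToMap {

    // Hypothetical excerpt of the PubMed XML, inlined for a self-contained run.
    static final String SAMPLE = "<PubmedArticleSet><PubmedArticle><MedlineCitation>"
        + "<PMID Version=\"1\">27483087</PMID>"
        + "<Article><ArticleTitle>Reply.</ArticleTitle><AuthorList>"
        + "<Author><LastName>Hitchon</LastName></Author>"
        + "<Author><LastName>Koppejan</LastName></Author>"
        + "</AuthorList></Article></MedlineCitation></PubmedArticle></PubmedArticleSet>";

    // Flattens every leaf element into "Root/.../Leaf" -> list of text values.
    static Map<String, List<String>> flatten(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, List<String>> map = new LinkedHashMap<>();
        Element root = doc.getDocumentElement();
        walk(root, root.getTagName(), map);
        return map;
    }

    private static void walk(Element element, String path, Map<String, List<String>> map) {
        boolean hasChildElements = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                Element childElement = (Element) child;
                walk(childElement, path + "/" + childElement.getTagName(), map);
            }
        }
        if (!hasChildElements) {
            // Leaf element: record its trimmed text under the full tag path.
            map.computeIfAbsent(path, k -> new ArrayList<>()).add(element.getTextContent().trim());
        }
    }

    public static void main(String[] args) throws Exception {
        flatten(SAMPLE).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

LinkedHashMap is a HashMap subclass that also keeps document order; use a plain HashMap if order doesn't matter. If you need one map per article, call the same walk on each PubmedArticle element instead of the root.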

XPath Java parsing xml document under more conditions

I need to show the elements from books.xml that satisfy the following two
conditions: price > 10 and publish_date > "2006-12-31". books.xml is:
<?xml version='1.0'?>
<catalog>
<book id='bk110'>
<author>O'Brien, Tim</author>
<title>Microsoft .NET: The Programming Bible</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2006-12-09</publish_date>
<description>Microsoft's .NET initiative is explored in
detail in this deep programmer's reference.</description>
</book>
<book id='bk111'>
<author>O'Brien, Tim</author>
<title>MSXML3: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2007-12-01</publish_date>
<description>The Microsoft MSXML3 parser is covered in
detail, with attention to XML DOM interfaces, XSLT processing,
SAX and more.</description>
</book>
<book id='bk112'>
<author>Galos, Mike</author>
<title>Visual Studio 7: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>49.95</price>
<publish_date>2008-04-16</publish_date>
<description>Microsoft Visual Studio 7 is explored in depth,
looking at how Visual Basic, Visual C++, C#, and ASP+ are
integrated into a comprehensive development
environment.</description>
</book>
</catalog>
When I try this code:
package web.services;
import java.io.File;
import java.io.*;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.xpath.*;
import org.xml.sax.*;
import org.w3c.dom.*;
public class WebServices {
private static void showElements() {
InputSource inputSource = null;
Object result;
NodeList nodeList = null;
String file;
String workingDir = System.getProperty("user.dir");
file="data"+File.separator+"books.xml";
try {
XPathFactory factory = XPathFactory.newInstance();
XPath xPath = factory.newXPath();
XPathExpression xPathExpression = xPath.compile("//book[price > 10][xs:date(publish_date) > xs:date('2005-12-31')]/*/text()");
File xmlDocument = new File(file);
try {
inputSource = new InputSource(new FileInputStream(xmlDocument));
} catch (FileNotFoundException ex) {
Logger.getLogger(WebServices.class.getName()).log(Level.SEVERE, null, ex);
}
result = xPathExpression.evaluate(inputSource, XPathConstants.NODESET);
nodeList = (NodeList) result;
} catch (XPathExpressionException ex) {
Logger.getLogger(WebServices.class.getName()).log(Level.SEVERE, null, ex);
}
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.print("Node name: " + nodeList.item(i).getNodeName());
System.out.print(" | ");
System.out.println("Node value: " + nodeList.item(i).getNodeValue());
System.out.println("------------------------------------------------");
}
}
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
showElements();
}
}
I'm getting this error:
maj 27, 2015 10:01:19 AM web.services.WebServices showElements
SEVERE: null
javax.xml.transform.TransformerException: Unknown error in XPath.
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:368)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:305)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:135)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:109)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:303)
at web.services.WebServices.showElements(WebServices.java:39)
at web.services.WebServices.main(WebServices.java:58)
Caused by: java.lang.NullPointerException
at com.sun.org.apache.xpath.internal.functions.FuncExtFunction.execute(FuncExtFunction.java:210)
at com.sun.org.apache.xpath.internal.Expression.execute(Expression.java:157)
at com.sun.org.apache.xpath.internal.operations.Operation.execute(Operation.java:111)
at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.executePredicates(PredicatedNodeTest.java:344)
at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.acceptNode(PredicatedNodeTest.java:481)
at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(AxesWalker.java:374)
at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(WalkingIterator.java:197)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(NodeSequence.java:344)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(NodeSequence.java:503)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:279)
at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:214)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:339)
... 6 more
---------
java.lang.NullPointerException
at com.sun.org.apache.xpath.internal.functions.FuncExtFunction.execute(FuncExtFunction.java:210)
at com.sun.org.apache.xpath.internal.Expression.execute(Expression.java:157)
at com.sun.org.apache.xpath.internal.operations.Operation.execute(Operation.java:111)
at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.executePredicates(PredicatedNodeTest.java:344)
at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.acceptNode(PredicatedNodeTest.java:481)
at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(AxesWalker.java:374)
at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(WalkingIterator.java:197)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(NodeSequence.java:344)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(NodeSequence.java:503)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:279)
at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:214)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:339)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:305)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:135)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:109)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:303)
at web.services.WebServices.showElements(WebServices.java:39)
at web.services.WebServices.main(WebServices.java:58)
--------------- linked to ------------------
javax.xml.xpath.XPathExpressionException: javax.xml.transform.TransformerException: Unknown error in XPath.
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:305)
at web.services.WebServices.showElements(WebServices.java:39)
at web.services.WebServices.main(WebServices.java:58)
Caused by: javax.xml.transform.TransformerException: Unknown error in XPath.
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:368)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:305)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:135)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:109)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:303)
... 2 more
Caused by: java.lang.NullPointerException
at com.sun.org.apache.xpath.internal.functions.FuncExtFunction.execute(FuncExtFunction.java:210)
at com.sun.org.apache.xpath.internal.Expression.execute(Expression.java:157)
at com.sun.org.apache.xpath.internal.operations.Operation.execute(Operation.java:111)
at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.executePredicates(PredicatedNodeTest.java:344)
at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.acceptNode(PredicatedNodeTest.java:481)
at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(AxesWalker.java:374)
at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(WalkingIterator.java:197)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(NodeSequence.java:344)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(NodeSequence.java:503)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:279)
at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:214)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:339)
... 6 more
Exception in thread "main" java.lang.NullPointerException
at web.services.WebServices.showElements(WebServices.java:45)
at web.services.WebServices.main(WebServices.java:58)
Java Result: 1
BUILD SUCCESSFUL (total time: 2 seconds)
What is wrong? Thank you!

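You are trying to use XPath 2.0 data types such as xs:date, but the XPath implementation in the Oracle JRE supports only XPath 1.0, which has no such data types (the FuncExtFunction frames in the stack trace suggest the engine is treating xs:date as an unknown extension function). For this particular path expression you should be able to stay with XPath 1.0 and use a simple numeric comparison, with a path like //book[price > 10][number(translate(publish_date, '-', '')) > 20051231]. Here translate removes the dashes, so a date such as 2006-03-10 becomes the number 20060310, which can be compared directly with 20051231.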
If you want to use XPath 2.0, you need a third-party library such as Saxon 9, or an XQuery implementation (XPath 2.0 is a subset of XQuery 1.0).
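A minimal sketch of the XPath 1.0 approach using the JDK's own javax.xml.xpath API. The sample XML, the id attribute and the matchingIds helper are hypothetical, since the question's full file is not shown; only the path expression comes from the answer above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPath10DateFilter {

    // Hypothetical sample data; the question's full XML file is not shown here.
    static final String XML =
        "<catalog>"
        + "<book id='b1'><price>44.95</price><publish_date>2005-10-01</publish_date></book>"
        + "<book id='b2'><price>5.95</price><publish_date>2006-03-10</publish_date></book>"
        + "<book id='b3'><price>36.95</price><publish_date>2007-04-01</publish_date></book>"
        + "</catalog>";

    // Pure XPath 1.0: translate() drops the dashes ("2007-04-01" -> "20070401"),
    // number() turns that into a number, which is then compared with 20051231.
    static final String EXPR =
        "//book[price > 10][number(translate(publish_date, '-', '')) > 20051231]";

    // Returns the id attribute of every book matched by EXPR.
    static List<String> matchingIds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(EXPR, doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                ids.add(((Element) hits.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchingIds(XML));
    }
}
```

With this sample data only b3 passes both predicates: b1 was published before 2006 and b2 costs less than 10. The trick relies on every publish_date using the same yyyy-MM-dd layout; mixed date formats would need XPath 2.0 or parsing in Java.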

How to browse and display XML content with Java

I have a problem in Java with DOM...
Here is the XML:
<?xml version="1.0" encoding="UTF-8"?>
<bib>
<domain>
<title>Specifications</title>
<bib_ref>
<year>March 2000</year>
<title>MOF 1.3</title>
<author>OMG</author>
<weblink>D:\SALIM\Docs\Specifications\MOF1_3.pdf</weblink>
</bib_ref>
<bib_ref>
<year>August 2002</year>
<title>IDLto Java LanguageMapping Specification</title>
<author>OMG</author>
<weblink>D:\SALIM\Docs\Specifications\IDL2Java.pdf</weblink>
</bib_ref>
<bib_ref>
<year>1999</year>
<title>XML Metadata Interchange (XMI) Version 1.1</title>
<author>OMG</author>
<weblink>D:\SALIM\Docs\Specifications\xmi-1.1.pdf</weblink>
</bib_ref>
<bib_ref>
<year>2002</year>
<title>XML Metadata Interchange (XMI) Version 2</title>
<author>"OMG</author>
<weblink>D:\SALIM\Docs\Specifications\XMI2.pdf</weblink>
</bib_ref>
<bib_ref>
<year>2002</year>
<title>XMI Version 1Production of XML Schema Specification</title>
<author>OMG</author>
<weblink>D:\SALIM\Docs\Specifications\XMI1XSD.pdf</weblink>
</bib_ref>
<bib_ref>
<year>2002</year>
<title>EDOC</title>
<author>OMG</author>
<weblink>D:\SALIM\Docs\Specifications\EDOC02-02-05.pdf</weblink>
</bib_ref>
</domain>
<domain>
<title>Theses</title>
<bib_ref>
<year>Octobre 2001</year>
<title>Echanges de Spécifications Hétérogènes et Réparties</title>
<author>Xavier Blanc</author>
<weblink>D:\SALIM\Docs\Theses\TheseXavier.pdf</weblink>
</bib_ref>
<bib_ref>
<year>Janvier 2001</year>
<title>Composition of Object-Oriented Software Design Models</title>
<author>Siobhan Clarke</author>
<weblink>D:\SALIM\Docs\Theses\SClarkeThesis.pdf</weblink>
</bib_ref>
......
......
Then, in the Java main method, I call the dispContent function, which is shown here:
public void dispContent (Node n)
{
    String domainName = null;
    // we are in an element node
    if (n instanceof Element) {
        Element e = ((Element) n);
        // domain title
        if (e.getTagName().equals("title") && e.getParentNode().getNodeName().equals("domain")) {
            domainName = e.getTextContent();
            DomaineTemplate(domainName);
        }
        else if (e.getTagName().equals("bib_ref")) {
            NodeList ref = e.getChildNodes();
            for (int i = 0; i < ref.getLength(); i++) {
                Node temp = (Node) ref.item(i);
                if (temp.getNodeType() == Node.ELEMENT_NODE) {
                    if (temp.getNodeType() == org.w3c.dom.Node.TEXT_NODE)
                        continue;
                    out.println(temp.getNodeName() + " : " + temp.getTextContent() + "\n");
                }
            }
        }
        else {
            NodeList sub = n.getChildNodes();
            for(int i=0; (i < sub.getLength()); i++)
                dispContent(sub.item(i));
        }
    }
    /*else if (n instanceof Document) {
        NodeList fils = n.getChildNodes();
        for(int i=0; (i < fils.getLength()); i++) {
            dispContent(fils.item(i));
        }
    }*/
}
The DomaineTemplate function just displays its parameter.
My problem occurs when I iterate over the "bib_ref" elements in Java. Instead of showing each "bib_ref" separately, it displays the contents of all the "bib_ref" elements... on one line! I want each "bib_ref" to display only its own content (its year, title, author and weblink tags).
Here is what it currently displays when I iterate over bib_ref:
Specifications
year : March 2000 title : MOF 1.3 author : OMG weblink : D:\SALIM\Docs\Specifications\MOF1_3.pdf year : August 2002 title : IDLto Java LanguageMapping Specification author : OMG weblink : D:\SALIM\Docs\Specifications\IDL2Java.pdf year : 1999 title : XML Metadata Interchange (XMI) Version 1.1 author : OMG weblink : D:\SALIM\Docs\Specifications\xmi-1.1.pdf year : 2002 title : XML Metadata Interchange (XMI) Version 2 author : OMG weblink : D:\SALIM\Docs\Specifications\XMI2.pdf year : 2002 title : XMI Version 1Production of XML Schema Specification author : OMG weblink : D:\SALIM\Docs\Specifications\XMI1XSD.pdf year : 2002 title : EDOC author : OMG weblink : D:\SALIM\Docs\Specifications\EDOC02-02-05.pdf
Theses
year : Octobre 2001 title : Echanges de Spécifications Hétérogènes et Réparties author : Xavier Blanc weblink : D:\SALIM\Docs\Theses\TheseXavier.pdf year : Janvier 2001 title : Composition of Object-Oriented Software Design Models author : Siobhan Clarke weblink : D:\SALIM\Docs\Theses\SClarkeThesis.pdf year : Juin 2002 title : Contribution à la représentation de processu par des techniques de méta modélisation author : Erwan Breton weblink : D:\SALIM\Docs\Theses\ErwanBretonThesis.pdf year : Octobre 2000 title : Technique de Modélisation et de Méta modélisation author : Richard Lemesle weblink : D:\SALIM\Docs\Theses\RichardLemesle.pdf year : Juillet 2002 title : Utilsation d'agents mobiles pour la construction des services distribués author : Siegfried Rouvrais weblink : D:\SALIM\Docs\Theses\theserouvrais.pdf
...
...
Can you help me?
I'm just a beginner with XML in Java and I've been searching for a solution for about 3 hours... Thanks a lot!

I called dispContent(doc.getFirstChild()), where doc is the Document parsed from your XML file.
Assumptions: out.println() is System.out.println(), and DomaineTemplate(domainName) adds a newline (based on the output you provided).
I get the following output in the console:
Specifications
year : March 2000
title : MOF 1.3
author : OMG
weblink : D:\SALIM\Docs\Specifications\MOF1_3.pdf
year : August 2002
title : IDLto Java LanguageMapping Specification
author : OMG
weblink : D:\SALIM\Docs\Specifications\IDL2Java.pdf
year : 1999
title : XML Metadata Interchange (XMI) Version 1.1
author : OMG
weblink : D:\SALIM\Docs\Specifications\xmi-1.1.pdf
year : 2002
title : XML Metadata Interchange (XMI) Version 2
author : "OMG
weblink : D:\SALIM\Docs\Specifications\XMI2.pdf
year : 2002
title : XMI Version 1Production of XML Schema Specification
author : OMG
weblink : D:\SALIM\Docs\Specifications\XMI1XSD.pdf
year : 2002
title : EDOC
author : OMG
weblink : D:\SALIM\Docs\Specifications\EDOC02-02-05.pdf
Theses
year : Octobre 2001
title : Echanges de Spécifications Hétérogènes et Réparties
author : Xavier Blanc
weblink : D:\SALIM\Docs\Theses\TheseXavier.pdf
year : Janvier 2001
title : Composition of Object-Oriented Software Design Models
author : Siobhan Clarke
weblink : D:\SALIM\Docs\Theses\SClarkeThesis.pdf
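A self-contained sketch of that setup, for anyone who wants to reproduce it. Like the answer, it assumes out behaves like System.out and DomaineTemplate just prints the domain name; here both write into a StringBuilder (a hypothetical change so the result can be checked), the XML is a trimmed excerpt of the question's file with shortened weblink values, and BibDisplay, domaineTemplate and render are illustrative names. The traversal is the question's dispContent with the unreachable TEXT_NODE check dropped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BibDisplay {

    // Trimmed excerpt of the question's XML (weblink values shortened).
    static final String XML =
        "<bib><domain><title>Specifications</title>"
        + "<bib_ref><year>March 2000</year><title>MOF 1.3</title>"
        + "<author>OMG</author><weblink>MOF1_3.pdf</weblink></bib_ref>"
        + "<bib_ref><year>1999</year><title>XMI 1.1</title>"
        + "<author>OMG</author><weblink>xmi-1.1.pdf</weblink></bib_ref>"
        + "</domain></bib>";

    private final StringBuilder out = new StringBuilder();

    // Hypothetical stand-in for DomaineTemplate: the domain name on its own line.
    private void domaineTemplate(String domainName) {
        out.append(domainName).append('\n');
    }

    // Same traversal as the question's dispContent: one "name : value" line per bib_ref child.
    void dispContent(Node n) {
        if (!(n instanceof Element)) {
            return; // text nodes, comments, etc. are ignored, as in the original
        }
        Element e = (Element) n;
        if (e.getTagName().equals("title") && e.getParentNode().getNodeName().equals("domain")) {
            domaineTemplate(e.getTextContent());
        } else if (e.getTagName().equals("bib_ref")) {
            NodeList ref = e.getChildNodes();
            for (int i = 0; i < ref.getLength(); i++) {
                Node child = ref.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    out.append(child.getNodeName()).append(" : ")
                       .append(child.getTextContent()).append('\n');
                }
            }
        } else {
            NodeList sub = n.getChildNodes();
            for (int i = 0; i < sub.getLength(); i++) {
                dispContent(sub.item(i));
            }
        }
    }

    static String render(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            BibDisplay display = new BibDisplay();
            display.dispContent(doc.getFirstChild()); // same entry point as in the answer
            return display.out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(XML));
    }
}
```

Each field comes out on its own line, which supports the point that the DOM traversal itself is not what joins everything into one line.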
If "\n" is not producing a new line, you can try the line separator the system itself uses:
public static final String NEW_LINE = System.getProperty("line.separator");
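On Java 7 and later, System.lineSeparator() returns the same platform separator (the initial value of the line.separator property), and the %n conversion in String.format expands to it as well. A short sketch; LineSeparatorDemo and formatField are hypothetical names.

```java
public class LineSeparatorDemo {

    // "\r\n" on Windows, "\n" on Linux and macOS.
    static final String NEW_LINE = System.lineSeparator();

    // %n in a format string is replaced by the platform line separator.
    static String formatField(String name, String value) {
        return String.format("%s : %s%n", name, value);
    }

    public static void main(String[] args) {
        System.out.print(formatField("year", "March 2000"));
        System.out.print("title : MOF 1.3" + NEW_LINE);
    }
}
```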
If you don't want a blank line after each child of a "bib_ref" node, and would rather have one blank line after each "bib_ref", change:
else if (e.getTagName().equals("bib_ref")) {
    NodeList ref = e.getChildNodes();
    for (int i = 0; i < ref.getLength(); i++) {
        Node temp = (Node) ref.item(i);
        if (temp.getNodeType() == Node.ELEMENT_NODE) {
            if (temp.getNodeType() == org.w3c.dom.Node.TEXT_NODE)
                continue;
            out.println(temp.getNodeName() + " : " + temp.getTextContent() + "\n");
        }
    }
}
to:
else if (e.getTagName().equals("bib_ref")) {
    NodeList ref = e.getChildNodes();
    for (int i = 0; i < ref.getLength(); i++) {
        Node temp = (Node) ref.item(i);
        // Only element children (year, title, author, weblink); whitespace text nodes are skipped.
        // Removed the inner TEXT_NODE check: it can never be true inside this ELEMENT_NODE branch.
        if (temp.getNodeType() == Node.ELEMENT_NODE) {
            // Removed "\n":
            out.println(temp.getNodeName() + " : " + temp.getTextContent());
        }
    }
    // Added out.println();
    out.println();
}
Results:
Specifications
year : March 2000
title : MOF 1.3
author : OMG
weblink : D:\SALIM\Docs\Specifications\MOF1_3.pdf
year : August 2002
title : IDLto Java LanguageMapping Specification
author : OMG
weblink : D:\SALIM\Docs\Specifications\IDL2Java.pdf
year : 1999
title : XML Metadata Interchange (XMI) Version 1.1
author : OMG
weblink : D:\SALIM\Docs\Specifications\xmi-1.1.pdf
year : 2002
title : XML Metadata Interchange (XMI) Version 2
author : "OMG
weblink : D:\SALIM\Docs\Specifications\XMI2.pdf
year : 2002
title : XMI Version 1Production of XML Schema Specification
author : OMG
weblink : D:\SALIM\Docs\Specifications\XMI1XSD.pdf
year : 2002
title : EDOC
author : OMG
weblink : D:\SALIM\Docs\Specifications\EDOC02-02-05.pdf
Theses
year : Octobre 2001
title : Echanges de Spécifications Hétérogènes et Réparties
author : Xavier Blanc
weblink : D:\SALIM\Docs\Theses\TheseXavier.pdf
year : Janvier 2001
title : Composition of Object-Oriented Software Design Models
author : Siobhan Clarke
weblink : D:\SALIM\Docs\Theses\SClarkeThesis.pdf

Now I see you have tagged your question with html, although your code has nothing to do with HTML so far. So I assume out is the output stream or writer of a servlet and you're trying to output HTML.
In that case, the line breaks from println and from "\n" appear only in the HTML source; the browser treats them as ordinary whitespace and does not render them as line breaks.
Change this line:
out.println(temp.getNodeName() + " : " + temp.getTextContent() + "\n");
to:
out.println(temp.getNodeName() + " : " + temp.getTextContent() + "<br />");
and the browser should then display each field on its own line.
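A sketch of that servlet-side fix, assuming out is the PrintWriter returned by HttpServletResponse.getWriter(). To keep it runnable without a servlet container, a PrintWriter over a StringWriter stands in for the response, and a LinkedHashMap stands in for the children of one bib_ref; HtmlLineBreaks, writeBibRef and render are hypothetical names.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

public class HtmlLineBreaks {

    // One "name : value" line per field. The <br /> is what the browser renders
    // as a line break; the newline from println only tidies the HTML source.
    static void writeBibRef(PrintWriter out, Map<String, String> fields) {
        for (Map.Entry<String, String> f : fields.entrySet()) {
            out.println(f.getKey() + " : " + f.getValue() + "<br />");
        }
    }

    // Captures what would have been sent to the browser.
    static String render(Map<String, String> fields) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            writeBibRef(out, fields);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        Map<String, String> ref = new LinkedHashMap<>(); // keeps document order
        ref.put("year", "March 2000");
        ref.put("title", "MOF 1.3");
        System.out.print(render(ref));
    }
}
```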

DOM Parser for reading XML File(edit)

I want to read the following XML file using a DOM parser:
<?xml version="1.0" encoding="UTF-8"?>
<CCL>
<COUNTRY>
<COUNTRYNAME>INDIA</COUNTRYNAME>
<CITY>
<CITYNAME>NOIDA</CITYNAME>
<LOCALITY>SEC 22^SEC 24^SEC 55</LOCALITY>
</CITY>
<CITY>
<CITYNAME>DELHI</CITYNAME>
<LOCALITY>MAYUR VIHAR^PATPARGANJ^CHANDNI CAHUK</LOCALITY>
</CITY>
</COUNTRY>
<COUNTRY>
<COUNTRYNAME>SINGAPORE</COUNTRYNAME>
<CITY>
<CITYNAME>TIONG BAHRU</CITYNAME>
<LOCALITY>BLK 150^BLK 154^BLK 129</LOCALITY>
</CITY>
<CITY>
<CITYNAME>TANJONG PAGAR</CITYNAME>
<LOCALITY>MAXWELL ROAD^CECILL STREET^AXA TOWER</LOCALITY>
</CITY>
</COUNTRY>
</CCL>
and my Java code is:
public void ReadXMlFile(File f) throws ParserConfigurationException, SAXException, IOException
{
    log.info("Reading log file" + f.getName() + ", from: "+ f.getAbsolutePath());
    File fXmlFile = f;
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(fXmlFile);
    doc.getDocumentElement().normalize();
    log.info("Root element :" + doc.getDocumentElement().getNodeName());
    NodeList nList = doc.getElementsByTagName("COUNTRY");
    log.info("-----------------------");
    for (int temp = 0; temp < nList.getLength(); temp++) {
        Node nNode = nList.item(temp);
        if (nNode.getNodeType() == Node.ELEMENT_NODE) {
            Element eElement = (Element) nNode;
            log.info("COUNTRYNAME: " + getTagValue("COUNTRYNAME", eElement));
            NodeList nodel= eElement.getChildNodes();
            for(int tempcity=0; tempcity< nodel.getLength() ; tempcity++)
            {
                Node nNode_1 = nodel.item(tempcity);
                if (nNode_1.getNodeType() == Node.ELEMENT_NODE) {
                    Element eElement_1 = (Element) nNode_1;
                    log.info("CITYNAME: " + getTagValue("CITYNAME", eElement));
                    log.info("LOCALITY: " + getTagValue("LOCALITY", eElement));
                }
            }
        }
    }
}
private static String getTagValue(String sTag, Element eElement) {
    NodeList nlList = eElement.getElementsByTagName(sTag).item(0).getChildNodes();
    Node nValue = (Node) nlList.item(0);
    return nValue.getNodeValue();
}
I'm getting the following output:
INFO [http-8080-1] (UtilityClass.java:41) - Root element :CCL
INFO [http-8080-1] (UtilityClass.java:43) - -----------------------
INFO [http-8080-1] (UtilityClass.java:52) - COUNTRYNAME: INDIA
INFO [http-8080-1] (UtilityClass.java:61) - CITYNAME: NOIDA
INFO [http-8080-1] (UtilityClass.java:62) - LOCALITY: SEC 22^SEC 24^SEC 55
INFO [http-8080-1] (UtilityClass.java:61) - CITYNAME: NOIDA
INFO [http-8080-1] (UtilityClass.java:62) - LOCALITY: SEC 22^SEC 24^SEC 55
INFO [http-8080-1] (UtilityClass.java:61) - CITYNAME: NOIDA
INFO [http-8080-1] (UtilityClass.java:62) - LOCALITY: SEC 22^SEC 24^SEC 55
INFO [http-8080-1] (UtilityClass.java:52) - COUNTRYNAME: SINGAPORE
INFO [http-8080-1] (UtilityClass.java:61) - CITYNAME: TIONG BAHRU
INFO [http-8080-1] (UtilityClass.java:62) - LOCALITY: BLK 150^BLK 154^BLK 129
INFO [http-8080-1] (UtilityClass.java:61) - CITYNAME: TIONG BAHRU
INFO [http-8080-1] (UtilityClass.java:62) - LOCALITY: BLK 150^BLK 154^BLK 129
INFO [http-8080-1] (UtilityClass.java:61) - CITYNAME: TIONG BAHRU
INFO [http-8080-1] (UtilityClass.java:62) - LOCALITY: BLK 150^BLK 154^BLK 129
I want to read all the CITY entries for every country.
I am able to read the COUNTRY tag, but I am not sure how to read each CITY and its CITYNAME for every COUNTRY entry.
Can anyone help me resolve this issue?

It looks like you're referencing the wrong Element in your CITYNAME and LOCALITY log messages.
Try changing from:
if (nNode_1.getNodeType() == Node.ELEMENT_NODE) {
    Element eElement_1 = (Element) nNode_1;
    log.info("CITYNAME: " + getTagValue("CITYNAME", eElement));
    log.info("LOCALITY: " + getTagValue("LOCALITY", eElement));
}
To:
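// Only CITY children: COUNTRYNAME has no CITYNAME child, so getTagValue would throw a NullPointerException on it
if (nNode_1.getNodeType() == Node.ELEMENT_NODE && nNode_1.getNodeName().equals("CITY")) {
    Element eElement_1 = (Element) nNode_1;
    log.info("CITYNAME: " + getTagValue("CITYNAME", eElement_1));
    log.info("LOCALITY: " + getTagValue("LOCALITY", eElement_1));
}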
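Your getTagValue(..) searches inside the element you pass to it, so passing the COUNTRY element (eElement) always returns the first CITY's values; that is why NOIDA and TIONG BAHRU are repeated once per element child of COUNTRY. Also note that the inner loop visits every element child of COUNTRY, including COUNTRYNAME, which has no CITYNAME child, so getTagValue would throw a NullPointerException for it; only CITY children should be processed.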
HTH
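A self-contained sketch of a slightly different approach from the answer's: instead of looping over all child nodes of COUNTRY and filtering, it asks each COUNTRY for its CITY elements with getElementsByTagName("CITY"), so COUNTRYNAME is never visited. The XML is the question's file inlined as a string; CountryCityReader and read are hypothetical names, and this getTagValue uses getTextContent() rather than the question's first-child getNodeValue().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CountryCityReader {

    // The question's XML, inlined.
    static final String XML =
        "<CCL>"
        + "<COUNTRY><COUNTRYNAME>INDIA</COUNTRYNAME>"
        + "<CITY><CITYNAME>NOIDA</CITYNAME><LOCALITY>SEC 22^SEC 24^SEC 55</LOCALITY></CITY>"
        + "<CITY><CITYNAME>DELHI</CITYNAME><LOCALITY>MAYUR VIHAR^PATPARGANJ^CHANDNI CAHUK</LOCALITY></CITY>"
        + "</COUNTRY>"
        + "<COUNTRY><COUNTRYNAME>SINGAPORE</COUNTRYNAME>"
        + "<CITY><CITYNAME>TIONG BAHRU</CITYNAME><LOCALITY>BLK 150^BLK 154^BLK 129</LOCALITY></CITY>"
        + "<CITY><CITYNAME>TANJONG PAGAR</CITYNAME><LOCALITY>MAXWELL ROAD^CECILL STREET^AXA TOWER</LOCALITY></CITY>"
        + "</COUNTRY>"
        + "</CCL>";

    // Text of the first element named tag, searched only inside parent.
    static String getTagValue(String tag, Element parent) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    // One "COUNTRY / CITY / LOCALITY" line per CITY element.
    static List<String> read(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList countries = doc.getElementsByTagName("COUNTRY");
            for (int i = 0; i < countries.getLength(); i++) {
                Element country = (Element) countries.item(i);
                String countryName = getTagValue("COUNTRYNAME", country);
                // Only the CITY elements of this COUNTRY; COUNTRYNAME is never visited.
                NodeList cities = country.getElementsByTagName("CITY");
                for (int j = 0; j < cities.getLength(); j++) {
                    Element city = (Element) cities.item(j);
                    lines.add(countryName + " / " + getTagValue("CITYNAME", city)
                        + " / " + getTagValue("LOCALITY", city));
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not read XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        read(XML).forEach(System.out::println);
    }
}
```

Because each lookup is scoped to the current CITY element, every city is reported once with its own CITYNAME and LOCALITY.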
