CSV to XML conversion in Java
I was working on converting CSV to XML data. By looking at various examples I was able to write the code for parsing the CSV file and getting the XML file. However, the code I have written returns the XML file with incorrect tags.
This is the code for the conversion:
package com.adarsh.parse;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class Converter {
/* Protected members to avoid instantiation */
protected DocumentBuilderFactory domFactory = null;
protected DocumentBuilder domBuilder = null;
/* Constant strings */
// Input CSV file
final String INPUT_FILE = "sample_data.csv";
// Output XML document
final String OUTPUT_FILE ="in.xml";
// First element in the XML document
final String FIRST_ELEMENT="school";
public Converter(){
try {
domFactory = DocumentBuilderFactory.newInstance();
/* Obtaining instance of class DocumentBuilder */
domBuilder = domFactory.newDocumentBuilder();
}
catch(ParserConfigurationException exp) {
System.err.println(exp.toString());
}
catch(FactoryConfigurationError exp){
System.err.println(exp.toString());
}
catch(Exception exp){
System.err.println(exp.toString());
}
}
/**
* This method converts the given CSV file into an XML document
*/
public int convert(String csvFileName, String xmlFileName) {
int rowCount = -1;
try {
/* Initializing the XML document */
Document newDoc = domBuilder.newDocument();
/* Creating the root element in the XML */
Element rootElem = newDoc.createElement(FIRST_ELEMENT);
newDoc.appendChild(rootElem);
/* Reading the CSV file */
BufferedReader csvFileReader;
csvFileName = INPUT_FILE;
csvFileReader = new BufferedReader(new FileReader(csvFileName));
/* Initialize the number of fields to 0 */
int fieldCount = 0;
String[] csvFields = null;
StringTokenizer stringTokenizer = null;
/**
* Map the column names in the CSV file as the elements in the XML
* document, eliminate any other characters not eligible for XML element
* naming
*/
/* Initialize the current line variable */
String currLine = csvFileReader.readLine();
/* Loop until we reach the end of the file
* edge case: Empty CSV file
* */
if(currLine != null) {
/* Separate fields based on commas */
stringTokenizer = new StringTokenizer(currLine, ",");
fieldCount = stringTokenizer.countTokens();
/* If there is data in the CSV file */
if(fieldCount > 0) {
/* Initialize a String Array of Fields */
csvFields = new String[fieldCount];
int i = 0;
/* Loop till all elements are found and save fields */
while (stringTokenizer.hasMoreElements()) {
csvFields[i++] = String.valueOf(stringTokenizer.nextElement());
}
}
}
else {
System.out.println("Nothing to parse");
}
/* reading rows from the CSV file */
while((currLine = csvFileReader.readLine()) != null) {
stringTokenizer = new StringTokenizer(currLine, ",");
fieldCount = stringTokenizer.countTokens();
/* if rows exist in the CSV file*/
if(fieldCount > 0) {
/* Create the row element*/
Element rowElem = newDoc.createElement("row");
int i = 0;
/* until there are more elements*/
while(stringTokenizer.hasMoreElements()) {
try {
/* Append each element found to each row element*/
String currValue = String.valueOf(stringTokenizer.nextElement());
Element currElem = newDoc.createElement(csvFields[i++]);
currElem.appendChild(newDoc.createTextNode(currValue));
rowElem.appendChild(currElem);
}
catch(Exception exp) {
}
}
/* Append the rows to the root element*/
rootElem.appendChild(rowElem);
rowCount++;
}
}
/* Finish reading the CSV file */
csvFileReader.close();
/* Saving the generated XML doc into required format file to disk */
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
aTransformer.setOutputProperty(OutputKeys.INDENT, "yes");
aTransformer.setOutputProperty(OutputKeys.METHOD, "xml");
aTransformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
Source src = new DOMSource(newDoc);
xmlFileName = OUTPUT_FILE;
Result dest = new StreamResult(new File(xmlFileName));
aTransformer.transform(src, dest);
rowCount++;
}
catch(IOException exp) {
System.err.println(exp.toString());
}
catch(Exception exp) {
System.err.println(exp.toString());
}
/* Number of rows parsed into XML */
return rowCount;
}
}
This is the sample CSV data in the file:
classroom_id,classroom_name,teacher_1_id,teacher_1_last_name,teacher_1_first_name,teacher_2_id,teacher_2_last_name,teacher_2_first_name,student_id,student_last_name,student_first_name,student_grade
103, Brian's Homeroom, 10300000001, O'Donnell, Brian, , , , , , , 102,
Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011,
Patterson, John, 10200000011, McCrancy, Brandon, 1 102, Mr. Smith's
PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson,
John, 10200000018, Reginald, Alexis, 1 102, Mr. Smith's PhysEd Class,
10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000019,
Gayle, Matthew, 1 102, Mr. Smith's PhysEd Class, 10200000001, Smith,
Arthur, 10200000011, Patterson, John, 10200000010, Smith, Nathaniel, 1
102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur,
10200000011, Patterson, John, 10200000013, Lanni, Erica, 1 102, Mr.
Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011,
Patterson, John, 10200000014, Flores, Michael, 1 102, Mr. Smith's
PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson,
John, 10200000012, Marco, Elizabeth, 1 102, Mr. Smith's PhysEd Class,
10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000016,
Perez, Brittany, 1 102, Mr. Smith's PhysEd Class, 10200000001, Smith,
Arthur, 10200000011, Patterson, John, 10200000015, Hill, Jasmin, 1
102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur,
10200000011, Patterson, John, 10200000017, Hiram, William, 1 101, Mrs.
Jones' Math Class, 10100000001, Jones, Barbara, , , , 10100000015,
Cruz, Alex, 1 101, Mrs. Jones' Math Class, 10100000001, Jones,
Barbara, , , , 10100000014, Garcia, Lizzie, 1 101, Mrs. Jones' Math
Class, 10100000001, Jones, Barbara, , , , 10100000013, Mercado, Toby,
1 101, Mrs. Jones' Math Class, 10100000001, Jones, Barbara, , , ,
10100000011, Gutierrez, Kimberly, 2 101, Mrs. Jones' Math Class,
10100000001, Jones, Barbara, , , , 10100000010, Gil, Michael, 2
I was expecting the following output in the XML file:
<grade id="1">
<classroom id="101" name="Mrs. Jones' Math Class">
<teacher id="10100000001" first_name="Barbara" last_name="Jones"/>
<student id="10100000010" first_name="Michael" last_name="Gil"/>
<student id="10100000011" first_name="Kimberly" last_name="Gutierrez"/>
<student id="10100000013" first_name="Toby" last_name="Mercado"/>
<student id="10100000014" first_name="Lizzie" last_name="Garcia"/>
<student id="10100000015" first_name="Alex" last_name="Cruz"/>
</classroom>
<classroom id="102" name="Mr. Smith's PhysEd Class">
<teacher id="10200000001" first_name="Arthur" last_name="Smith"/>
<teacher id="10200000011" first_name="John" last_name="Patterson"/>
<student id="10200000010" first_name="Nathaniel" last_name="Smith"/>
<student id="10200000011" first_name="Brandon" last_name="McCrancy"/>
<student id="10200000012" first_name="Elizabeth" last_name="Marco"/>
<student id="10200000013" first_name="Erica" last_name="Lanni"/>
<student id="10200000014" first_name="Michael" last_name="Flores"/>
<student id="10200000015" first_name="Jasmin" last_name="Hill"/>
<student id="10200000016" first_name="Brittany" last_name="Perez"/>
<student id="10200000017" first_name="William" last_name="Hiram"/>
<student id="10200000018" first_name="Alexis" last_name="Reginald"/>
<student id="10200000019" first_name="Matthew" last_name="Gayle"/>
</classroom>
<classroom id="103" name="Brian's Homeroom">
<teacher id="10300000001" first_name="Brian" last_name="O'Donnell"/>
</classroom>
</grade>
This is the output I am currently getting:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<school>
<row>
<classroom_id>101</classroom_id>
</row>
<row>
<classroom_id>101</classroom_id>
</row>
<row>
<classroom_id>101</classroom_id>
</row>
<row>
<classroom_id>101</classroom_id>
</row>
<row>
<classroom_id>101</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>103</classroom_id>
</row>
</school>
Could someone please help me with this? I would like to understand where I am going wrong. Thanks.
P.S. I have already looked at other questions about CSV to XML conversion here on Stack Overflow. However, I wasn't able to find a solution or explanation that fits my specific problem.
P.P.S. Please don't suggest XSLT unless it is strictly necessary for converting this kind of CSV data to XML. If there is no other choice I will go and learn it, but I have very little knowledge of XSLT. I would appreciate it a lot if you could suggest changes to the code I have already written.
It seems your CSV content has no newline separators between records.
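Separately from the line-break issue, the code in the question creates one child element per column, while the expected output groups rows by classroom and stores the values as attributes. Below is a minimal sketch of that grouping, not a drop-in fix. Assumptions: the names GroupedCsvToXml, toDocument, addPerson and SAMPLE are made up for illustration; every record sits on its own line with 12 comma-separated fields in the order of the header; the grade level is left out because the sample data does not make clear how classrooms map to grades. The sketch uses String.split(",", -1) plus trim() instead of StringTokenizer, because StringTokenizer skips empty fields and keeps the leading spaces. It may also be worth printing the exception in the empty catch block of the question's inner loop, since createElement throws a DOMException for names that are not valid XML names and that exception is currently hidden.

```java
import java.io.StringWriter;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class GroupedCsvToXml {

    // Hypothetical sample: the header plus three records from the question, one per line.
    static final String[] SAMPLE = {
        "classroom_id,classroom_name,teacher_1_id,teacher_1_last_name,teacher_1_first_name,"
            + "teacher_2_id,teacher_2_last_name,teacher_2_first_name,"
            + "student_id,student_last_name,student_first_name,student_grade",
        "102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000011, McCrancy, Brandon, 1",
        "102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000018, Reginald, Alexis, 1",
        "103, Brian's Homeroom, 10300000001, O'Donnell, Brian, , , , , , , "
    };

    /** Builds school/classroom/(teacher|student) from CSV lines; lines[0] is the header. */
    static Document toDocument(String[] lines) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("school");
            doc.appendChild(root);
            Map<String, Element> classrooms = new LinkedHashMap<>(); // classroom_id -> element
            Set<String> seen = new HashSet<>();                      // people already emitted
            for (int n = 1; n < lines.length; n++) {
                String[] f = lines[n].split(",", -1);                // -1 keeps empty fields
                if (f.length < 12) {
                    continue;                                        // skip malformed records
                }
                for (int i = 0; i < f.length; i++) {
                    f[i] = f[i].trim();
                }
                Element room = classrooms.get(f[0]);
                if (room == null) {
                    room = doc.createElement("classroom");
                    room.setAttribute("id", f[0]);
                    room.setAttribute("name", f[1]);
                    root.appendChild(room);
                    classrooms.put(f[0], room);
                }
                // Column order: id, last name, first name for each person.
                addPerson(doc, room, seen, "teacher", f[2], f[4], f[3]);
                addPerson(doc, room, seen, "teacher", f[5], f[7], f[6]);
                addPerson(doc, room, seen, "student", f[8], f[10], f[9]);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the XML document", e);
        }
    }

    /** Appends one teacher or student unless the id is empty or already present in this classroom. */
    static void addPerson(Document doc, Element room, Set<String> seen,
                          String tag, String id, String first, String last) {
        if (id.isEmpty() || !seen.add(room.getAttribute("id") + "/" + tag + "/" + id)) {
            return;
        }
        Element person = doc.createElement(tag);
        person.setAttribute("id", id);
        person.setAttribute("first_name", first);
        person.setAttribute("last_name", last);
        room.appendChild(person);
    }

    public static void main(String[] args) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(toDocument(SAMPLE)), new StreamResult(out));
        System.out.println(out);
    }
}
```

Values with embedded commas or quotes would need a real CSV parser; the split-based approach here only works because the sample data contains neither.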
Related
How to convert xml file to HashMap using apache Tika
In my case I am able to read the XML file and parse it to get its content, but the metadata only provides the file type, which is "application/xml".
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.xml.XMLParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;
public class XmlParserExample {
    public static void main(String[] args) throws IOException, SAXException, TikaException {
        BodyContentHandler handler = new BodyContentHandler();
        XMLParser parser = new XMLParser();
        Metadata metadata = new Metadata();
        ParseContext pcontext = new ParseContext();
        FileInputStream inputstream = new FileInputStream(new File("example.xml"));
        parser.parse(inputstream, handler, metadata, pcontext);
        System.out.println("Contents of the document:" + handler.toString());
        System.out.println("Metadata of the document:");
        String[] metadataNames = metadata.names();
        for(String name : metadataNames) {
            System.out.println(name + ": " + metadata.get(name));
        }
    }
}
The snippet above prints the whole XML content and the content type (as metadata). But I also want to fetch the XML tags so that I can create a HashMap, which is the requirement in my case.
Below is my Dummy example.xml:- <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE PubmedArticleSet SYSTEM "http://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_190101.dtd"> <PubmedArticleSet> <PubmedArticle> <MedlineCitation Status="MEDLINE" Owner="NLM"> <PMID Version="1">27483086</PMID> <DateCompleted> <Year>2018</Year> <Month>05</Month> <Day>02</Day> </DateCompleted> <DateRevised> <Year>2018</Year> <Month>05</Month> <Day>02</Day> </DateRevised> <Article PubModel="Print-Electronic"> <Journal> <ISSN IssnType="Electronic">1532-849X</ISSN> <JournalIssue CitedMedium="Internet"> <Volume>26</Volume> <Issue>4</Issue> <PubDate> <Year>2017</Year> <Month>Jun</Month> </PubDate> </JournalIssue> <Title>Journal of prosthodontics : official journal of the American College of Prosthodontists</Title> <ISOAbbreviation>J Prosthodont</ISOAbbreviation> </Journal> <ArticleTitle>The Use of CADCAM Technology for Fabricating Cast Gold Survey Crowns under Existing Partial Removable Dental Prosthesis. A Clinical Report.</ArticleTitle> <Pagination> <MedlinePgn>321-326</MedlinePgn> </Pagination> <ELocationID EIdType="doi" ValidYN="Y">10.1111jopr.12525</ELocationID> <Abstract> <AbstractText>The fabrication of a survey crown under an existing partial removable dental prosthesis (PRDP) has always been a challenge to many dental practitioners. This clinical report presents a technique for fabricating accurate cast gold survey crowns to fit existing PRDPs using CAD/CAM technology. The report describes a technique that would digitally scan the coronal anatomy of a cast gold survey crown and an abutment tooth under existing PRDPs planned for restoration, prior to any preparation. The information is stored in the digital software where all the coronal anatomical details are preserved without any modifications. The scanned designs are then applied to the scanned teeth preparations, sent to the milling machine and milled into full-contour clear acrylic resin burn-out patterns. 
The acrylic resin patterns are tried in the patient's mouth the same day to verify the full seating of the PRDP components. The patterns are then invested and cast into gold crowns and cemented in the conventional manner.</AbstractText> <CopyrightInformation>© 2016 by the American College of Prosthodontists.</CopyrightInformation> </Abstract> <AuthorList CompleteYN="Y"> <Author ValidYN="Y"> <LastName>El Kerdani</LastName> <ForeName>Tarek</ForeName> <Initials>T</Initials> <AffiliationInfo> <Affiliation>Department of Restorative Dental Sciences, Division of Prosthodontics, University of Florida College of Dentistry, Gainesville, FL.</Affiliation> </AffiliationInfo> </Author> <Author ValidYN="Y"> <LastName>Roushdy</LastName> <ForeName>Sally</ForeName> <Initials>S</Initials> <AffiliationInfo> <Affiliation>Department of Restorative Dental Sciences, Division of Prosthodontics, University of Florida College of Dentistry, Gainesville, FL.</Affiliation> </AffiliationInfo> </Author> </AuthorList> <Language>eng</Language> <PublicationTypeList> <PublicationType UI="D002363">Case Reports</PublicationType> <PublicationType UI="D016428">Journal Article</PublicationType> </PublicationTypeList> <ArticleDate DateType="Electronic"> <Year>2016</Year> <Month>08</Month> <Day>02</Day> </ArticleDate> </Article> <MedlineJournalInfo> <Country>United States</Country> <MedlineTA>J Prosthodont</MedlineTA> <NlmUniqueID>9301275</NlmUniqueID> <ISSNLinking>1059-941X</ISSNLinking> </MedlineJournalInfo> <ChemicalList> <Chemical> <RegistryNumber>7440-57-5</RegistryNumber> <NameOfSubstance UI="D006046">Gold</NameOfSubstance> </Chemical> </ChemicalList> <CitationSubset>D</CitationSubset> <MeshHeadingList> <MeshHeading> <DescriptorName UI="D000368" MajorTopicYN="N">Aged</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName UI="D017076" MajorTopicYN="Y">Computer-Aided Design</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName UI="D003442" MajorTopicYN="Y">Crowns</DescriptorName> 
</MeshHeading> <MeshHeading> <DescriptorName UI="D000044" MajorTopicYN="N">Dental Abutments</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName UI="D017267" MajorTopicYN="Y">Dental Prosthesis Design</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName UI="D003832" MajorTopicYN="Y">Denture, Partial, Removable</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName UI="D006046" MajorTopicYN="N">Gold</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName UI="D008297" MajorTopicYN="N">Male</DescriptorName> </MeshHeading> </MeshHeadingList> <KeywordList Owner="NOTNLM"> <Keyword MajorTopicYN="N">CADM</Keyword> <Keyword MajorTopicYN="N">cast gold</Keyword> <Keyword MajorTopicYN="N">milled acrylic resin patterns</Keyword> </KeywordList> </MedlineCitation> <PubmedData> <History> <PubMedPubDate PubStatus="accepted"> <Year>2016</Year> <Month>06</Month> <Day>13</Day> </PubMedPubDate> <PubMedPubDate PubStatus="pubmed"> <Year>2016</Year> <Month>8</Month> <Day>3</Day> <Hour>6</Hour> <Minute>0</Minute> </PubMedPubDate> <PubMedPubDate PubStatus="medline"> <Year>2018</Year> <Month>5</Month> <Day>3</Day> <Hour>6</Hour> <Minute>0</Minute> </PubMedPubDate> <PubMedPubDate PubStatus="entrez"> <Year>2016</Year> <Month>8</Month> <Day>3</Day> <Hour>6</Hour> <Minute>0</Minute> </PubMedPubDate> </History> <PublicationStatus>ppublish</PublicationStatus> <ArticleIdList> <ArticleId IdType="pubmed">27483086</ArticleId> <ArticleId IdType="doi">10.111pr.12525</ArticleId> </ArticleIdList> </PubmedData> </PubmedArticle> <PubmedArticle> <MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM"> <PMID Version="1">27483087</PMID> <DateCompleted> <Year>2018</Year> <Month>08</Month> <Day>07</Day> </DateCompleted> <DateRevised> <Year>2018</Year> <Month>08</Month> <Day>07</Day> </DateRevised> <Article PubModel="Print-Electronic"> <Journal> <ISSN 
IssnType="Electronic">2326-5205</ISSN> <JournalIssue CitedMedium="Internet"> <Volume>68</Volume> <Issue>11</Issue> <PubDate> <Year>2016</Year> <Month>11</Month> </PubDate> </JournalIssue> <Title>Arthritis & rheumatology (Hoboken, N.J.)</Title> </Journal> <ArticleTitle>Reply.</ArticleTitle> <Pagination> <MedlinePgn>2826-2827</MedlinePgn> </Pagination> <ELocationID EIdType="doi" ValidYN="Y">10t.39831</ELocationID> <AuthorList CompleteYN="Y"> <Author ValidYN="Y"> <LastName>Hitchon</LastName> <ForeName>Carol Ann</ForeName> <Initials>CA</Initials> <AffiliationInfo> <Affiliation>University of Manitoba, Winnipeg, Manitoba, Canada.</Affiliation> </AffiliationInfo> </Author> <Author ValidYN="Y"> <LastName>Koppejan</LastName> <ForeName>Hester</ForeName> <Initials>H</Initials> <AffiliationInfo> <Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation> </AffiliationInfo> </Author> <Author ValidYN="Y"> <LastName>Trouw</LastName> <ForeName>Leendert A</ForeName> <Initials>LA</Initials> <AffiliationInfo> <Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation> </AffiliationInfo> </Author> <Author ValidYN="Y"> <LastName>Huizinga</LastName> <ForeName>Tom J W</ForeName> <Initials>TJ</Initials> <AffiliationInfo> <Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation> </AffiliationInfo> </Author> <Author ValidYN="Y"> <LastName>Toes</LastName> <ForeName>René E M</ForeName> <Initials>RE</Initials> <AffiliationInfo> <Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation> </AffiliationInfo> </Author> <Author ValidYN="Y"> <LastName>El-Gabalawy</LastName> <ForeName>Hani S</ForeName> <Initials>HS</Initials> <AffiliationInfo> <Affiliation>University of Manitoba, Winnipeg, Manitoba, Canada.</Affiliation> </AffiliationInfo> </Author> </AuthorList> <Language>eng</Language> <GrantList CompleteYN="Y"> <Grant> <GrantID>MOP‐77700</GrantID> <Agency>CIHR</Agency> 
<Country>Canada</Country> </Grant> </GrantList> <PublicationTypeList> <PublicationType UI="D016422">Letter</PublicationType> <PublicationType UI="D013485">Research Sup</PublicationType> <PublicationType UI="D016420">Comment</PublicationType> </PublicationTypeList> <ArticleDate DateType="Electronic"> <Year>2016</Year> <Month>10</Month> <Day>09</Day> </ArticleDate> </Article> <MedlineJournalInfo> <Country>United States</Country> <MedlineTA>Arthritis Rheumatol</MedlineTA> <NlmUniqueID>101623795</NlmUniqueID> <ISSNLinking>2326-5191</ISSNLinking> </MedlineJournalInfo> <CommentsCorrectionsList> <CommentsCorrections RefType="CommentOn"> <RefSource>dff</RefSource> <PMID Version="1">27483211</PMID> </CommentsCorrections> <CommentsCorrections RefType="CommentOn"> <RefSource>Arthritis Rheumato</RefSource> <PMID Version="1">26946484</PMID> </CommentsCorrections> </CommentsCorrectionsList> </MedlineCitation> <PubmedData> <History> <PubMedPubDate PubStatus="received"> <Year>2016</Year> <Month>07</Month> <Day>26</Day> </PubMedPubDate> <PubMedPubDate PubStatus="accepted"> <Year>2016</Year> <Month>07</Month> <Day>28</Day> </PubMedPubDate> <PubMedPubDate PubStatus="pubmed"> <Year>2016</Year> <Month>10</Month> <Day>28</Day> <Hour>6</Hour> <Minute>0</Minute> </PubMedPubDate> <PubMedPubDate PubStatus="medline"> <Year>2016</Year> <Month>10</Month> <Day>28</Day> <Hour>6</Hour> <Minute>1</Minute> </PubMedPubDate> <PubMedPubDate PubStatus="entrez"> <Year>2016</Year> <Month>8</Month> <Day>3</Day> <Hour>6</Hour> <Minute>0</Minute> </PubMedPubDate> </History> <PublicationStatus>ppublish</PublicationStatus> <ArticleIdList> <ArticleId IdType="pubmed">27483087</ArticleId> <ArticleId IdType="doi">efre</ArticleId> </ArticleIdList> </PubmedData> </PubmedArticle> </PubmedArticleSet> Kindly help me out on this. Thanks
My suggestion: If you want to read an XML file, and then parse its contents, you are probably better off using a purpose-built XML parser, rather than Tika. There are various options, each with its own pros and cons (for example speed, memory consumption).
Here is one approach. It reads the entire file into memory, but you already do that with your Tika approach, so I assume file size is not a problem. The code assumes there is a file called pubmed.xml which contains the XML presented in the question. It reads the XML from file, and handles each element as a DOM node.
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
...
public void parseUsingDom() {
    try {
        File xmlFile = new File("C:/tmp/pubmed.xml");
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(xmlFile);
        doc.getDocumentElement().normalize();
        NodeList articles = doc.getElementsByTagName("Article");
        for (int i = 0; i < articles.getLength(); i++) {
            Node article = articles.item(i);
            if (article.getNodeType() == Node.ELEMENT_NODE) {
                Element articleElement = (Element) article;
                String title = articleElement
                        .getElementsByTagName("ArticleTitle")
                        .item(0).getTextContent();
                System.out.println("");
                System.out.println("Title : " + title);
                NodeList authors = articleElement.getElementsByTagName("Author");
                for (int j = 0; j < authors.getLength(); j++) {
                    Node author = authors.item(j);
                    if (author.getNodeType() == Node.ELEMENT_NODE) {
                        Element authorElement = (Element) author;
                        String foreName = authorElement
                                .getElementsByTagName("ForeName")
                                .item(0).getTextContent();
                        String lastName = authorElement
                                .getElementsByTagName("LastName")
                                .item(0).getTextContent();
                        System.out.println("Author : " + lastName + ", " + foreName);
                    }
                }
            }
        }
    } catch (Exception e) {
        System.err.print(e);
    }
}
The program prints the following output, just as a demo of what is possible:
Title : The Use of CADCAM Technology for Fabricating Cast Gold Survey Crowns under Existing Partial Removable Dental Prosthesis. A Clinical Report.
Author : El Kerdani, Tarek
Author : Roushdy, Sally

Title : Reply.
Author : Hitchon, Carol Ann
Author : Koppejan, Hester
Author : Trouw, Leendert A
Author : Huizinga, Tom J W
Author : Toes, René E M
Author : El-Gabalawy, Hani S
In your case, you would capture the relevant values in a hash map, of course.
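As a rough illustration of that last step, here is a sketch that collects the same values into a map instead of printing them. The class name ArticleAuthors, the methods authorsByTitle and text, and the cut-down SAMPLE_XML are assumptions made for this example; which keys and values the real map should hold depends on the requirement, which the question does not spell out. A LinkedHashMap is used only so that the entries keep document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ArticleAuthors {

    // Cut-down stand-in for the XML in the question (same element names, fewer fields).
    static final String SAMPLE_XML =
        "<PubmedArticleSet><PubmedArticle><MedlineCitation><Article>"
        + "<ArticleTitle>Reply.</ArticleTitle>"
        + "<AuthorList>"
        + "<Author><LastName>Hitchon</LastName><ForeName>Carol Ann</ForeName></Author>"
        + "<Author><LastName>Koppejan</LastName><ForeName>Hester</ForeName></Author>"
        + "</AuthorList>"
        + "</Article></MedlineCitation></PubmedArticle></PubmedArticleSet>";

    /** Maps each article title to its authors, formatted as "LastName, ForeName". */
    static Map<String, List<String>> authorsByTitle(String xml) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList articles = doc.getElementsByTagName("Article");
            for (int i = 0; i < articles.getLength(); i++) {
                Element article = (Element) articles.item(i);
                List<String> names = new ArrayList<>();
                NodeList authors = article.getElementsByTagName("Author");
                for (int j = 0; j < authors.getLength(); j++) {
                    Element author = (Element) authors.item(j);
                    names.add(text(author, "LastName") + ", " + text(author, "ForeName"));
                }
                result.put(text(article, "ArticleTitle"), names);
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return result;
    }

    /** Text of the first descendant with the given tag, or "" if there is none. */
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(authorsByTitle(SAMPLE_XML));
    }
}
```

Keying by title is only for the demo; two articles with the same title would overwrite each other, so a unique value such as the PMID is probably a safer key for real data.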
VTD-XML reading gives no results
I am trying to read RSS content using VTD-XML. Below is a sample of the RSS.
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<?xml-stylesheet type="text/xsl" href="rss.xsl"?>
  <channel>
    <title>MyRSS</title>
    <atom:link href="http://www.example.com/rss.php" rel="self" type="application/rss+xml" />
    <link>http://www.example.com/rss.php</link>
    <description>MyRSS</description>
    <language>en-us</language>
    <pubDate>Tue, 22 May 2018 13:15:15 +0530</pubDate>
    <item>
      <title>Title 1</title>
      <pubDate>Tue, 22 May 2018 13:14:40 +0530</pubDate>
      <link>http://www.example.com/news.php?nid=47610</link>
      <guid>http://www.example.com/news.php?nid=47610</guid>
      <description>bla bla bla</description>
    </item>
  </channel>
</rss>
As you know, some RSS feeds contain more styling info and so on. However, <channel> and <item> are common to every RSS feed, at least for the ones I need to use.
I tried VTD-XML to read this as quickly as possible. Below is the code.
VTDGen vg = new VTDGen();
if (vg.parseHttpUrl(appDataBean.getUrl(), true)) {
    VTDNav vn = vg.getNav();
    AutoPilot ap = new AutoPilot(vn);
    ap.selectXPath("/channel/item");
    int result = -1;
    while ((result = ap.evalXPath()) != -1) {
        if (vn.matchElement("item")) {
            do {
                // do something with the contents of the item node
                Log.d("VTD", vn.toString(vn.getText()));
            } while (vn.toElement(VTDNav.NEXT_SIBLING));
        }
    }
}
Unfortunately this did not print anything. What am I doing wrong here? Also, none of the RSS feeds are very big, so I need to read them in a couple of milliseconds. This code runs on Android.
XPath Java parsing xml document under more conditions
I need to show elements from books.xml which satisfy the following two conditions: price > 10 and publish_date > "2006-12-31".
books.xml is:
<?xml version='1.0'?>
<catalog>
  <book id='bk110'>
    <author>O'Brien, Tim</author>
    <title>Microsoft .NET: The Programming Bible</title>
    <genre>Computer</genre>
    <price>36.95</price>
    <publish_date>2006-12-09</publish_date>
    <description>Microsoft's .NET initiative is explored in detail in this deep programmer's reference.</description>
  </book>
  <book id='bk111'>
    <author>O'Brien, Tim</author>
    <title>MSXML3: A Comprehensive Guide</title>
    <genre>Computer</genre>
    <price>36.95</price>
    <publish_date>2007-12-01</publish_date>
    <description>The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more.</description>
  </book>
  <book id='bk112'>
    <author>Galos, Mike</author>
    <title>Visual Studio 7: A Comprehensive Guide</title>
    <genre>Computer</genre>
    <price>49.95</price>
    <publish_date>2008-04-16</publish_date>
    <description>Microsoft Visual Studio 7 is explored in depth, looking at how Visual Basic, Visual C++, C#, and ASP+ are integrated into a comprehensive development environment.</description>
  </book>
</catalog>
When I try this code:
package web.services;
import java.io.File;
import java.io.*;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.xpath.*;
import org.xml.sax.*;
import org.w3c.dom.*;
public class WebServices {
    private static void showElements() {
        InputSource inputSource = null;
        Object result;
        NodeList nodeList = null;
        String file;
        String workingDir = System.getProperty("user.dir");
        file="data"+File.separator+"books.xml";
        try {
            XPathFactory factory = XPathFactory.newInstance();
            XPath xPath = factory.newXPath();
            XPathExpression xPathExpression = xPath.compile("//book[price > 10][xs:date(publish_date) > xs:date('2005-12-31')]/*/text()");
            File xmlDocument = new File(file);
            try {
                inputSource = new InputSource(new FileInputStream(xmlDocument));
            } catch (FileNotFoundException ex) {
                Logger.getLogger(WebServices.class.getName()).log(Level.SEVERE, null, ex);
            }
            result = xPathExpression.evaluate(inputSource, XPathConstants.NODESET);
            nodeList = (NodeList) result;
        } catch (XPathExpressionException ex) {
            Logger.getLogger(WebServices.class.getName()).log(Level.SEVERE, null, ex);
        }
        for (int i = 0; i < nodeList.getLength(); i++) {
            System.out.print("Node name: " + nodeList.item(i).getNodeName());
            System.out.print(" | ");
            System.out.println("Node value: " + nodeList.item(i).getNodeValue());
            System.out.println("------------------------------------------------");
        }
    }
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // TODO code application logic here
        showElements();
    }
}
I'm getting this error:
maj 27, 2015 10:01:19 AM web.services.WebServices showElements
SEVERE: null
javax.xml.transform.TransformerException: Unknown error in XPath.
    at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:368)
    at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:305)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:135)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:109)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:303)
    at web.services.WebServices.showElements(WebServices.java:39)
    at web.services.WebServices.main(WebServices.java:58)
Caused by: java.lang.NullPointerException
    at com.sun.org.apache.xpath.internal.functions.FuncExtFunction.execute(FuncExtFunction.java:210)
    at com.sun.org.apache.xpath.internal.Expression.execute(Expression.java:157)
    at com.sun.org.apache.xpath.internal.operations.Operation.execute(Operation.java:111)
    at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.executePredicates(PredicatedNodeTest.java:344)
    at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.acceptNode(PredicatedNodeTest.java:481)
    at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(AxesWalker.java:374)
    at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(WalkingIterator.java:197)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(NodeSequence.java:344)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(NodeSequence.java:503)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:279)
    at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:214)
    at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:339)
    ... 6 more
---------
java.lang.NullPointerException
    at com.sun.org.apache.xpath.internal.functions.FuncExtFunction.execute(FuncExtFunction.java:210)
    at com.sun.org.apache.xpath.internal.Expression.execute(Expression.java:157)
    at com.sun.org.apache.xpath.internal.operations.Operation.execute(Operation.java:111)
    at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.executePredicates(PredicatedNodeTest.java:344)
    at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.acceptNode(PredicatedNodeTest.java:481)
    at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(AxesWalker.java:374)
    at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(WalkingIterator.java:197)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(NodeSequence.java:344)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(NodeSequence.java:503)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:279)
    at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:214)
    at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:339)
    at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:305)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:135)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:109)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:303)
    at web.services.WebServices.showElements(WebServices.java:39)
    at web.services.WebServices.main(WebServices.java:58)
--------------- linked to ------------------
javax.xml.xpath.XPathExpressionException: javax.xml.transform.TransformerException: Unknown error in XPath.
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:305)
    at web.services.WebServices.showElements(WebServices.java:39)
    at web.services.WebServices.main(WebServices.java:58)
Caused by: javax.xml.transform.TransformerException: Unknown error in XPath.
    at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:368)
    at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:305)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:135)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:109)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:303)
    ... 2 more
Caused by: java.lang.NullPointerException
    at com.sun.org.apache.xpath.internal.functions.FuncExtFunction.execute(FuncExtFunction.java:210)
    at com.sun.org.apache.xpath.internal.Expression.execute(Expression.java:157)
    at com.sun.org.apache.xpath.internal.operations.Operation.execute(Operation.java:111)
    at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.executePredicates(PredicatedNodeTest.java:344)
    at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.acceptNode(PredicatedNodeTest.java:481)
    at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(AxesWalker.java:374)
    at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(WalkingIterator.java:197)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(NodeSequence.java:344)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(NodeSequence.java:503)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:279)
    at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:214)
    at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:339)
    ... 6 more
Exception in thread "main" java.lang.NullPointerException
    at web.services.WebServices.showElements(WebServices.java:45)
    at web.services.WebServices.main(WebServices.java:58)
Java Result: 1
BUILD SUCCESSFUL (total time: 2 seconds)
What is wrong? Thank you!
You are trying to use XPath 2.0 data types such as xs:date, but the XPath implementation in the Oracle JRE supports only XPath 1.0, which does not know any such data types. For that particular path expression, it should be possible to use XPath 1.0 and a simple number comparison, with a path like //book[price > 10][number(translate(publish_date, '-', '')) > 20051231]. If you want to use XPath 2.0, you need to look into third-party libraries like Saxon 9 or into XQuery implementations (as XPath 2.0 is a subset of XQuery 1.0).
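As a rough illustration of the XPath 1.0 workaround, here is a minimal sketch that evaluates that expression with the JDK's built-in javax.xml.xpath API. The class name XPath1DateFilter, the catalog root element, the title element, and the sample data are assumptions made up for this example; only book, price, and publish_date come from the expression above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPath1DateFilter {

    // XPath 1.0 only: translate() strips the dashes, number() turns
    // "2006-03-15" into 20060315, so dates can be compared as numbers.
    static final String EXPR =
        "//book[price > 10][number(translate(publish_date, '-', '')) > 20051231]";

    // Returns the <title> text (hypothetical element) of every matching book.
    static List<String> titles(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList books = (NodeList) xpath.evaluate(
            EXPR, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            result.add(xpath.evaluate("title", books.item(i)));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data, not the asker's file.
        String xml =
            "<catalog>"
          + "<book><title>Kept</title><price>12.50</price>"
          + "<publish_date>2006-03-15</publish_date></book>"
          + "<book><title>Too cheap</title><price>8</price>"
          + "<publish_date>2007-01-01</publish_date></book>"
          + "<book><title>Too old</title><price>30</price>"
          + "<publish_date>2004-12-31</publish_date></book>"
          + "</catalog>";
        System.out.println(titles(xml));
    }
}
```

The trick only works because the dates are in zero-padded yyyy-MM-dd form, so numeric order matches chronological order.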
How to read xml file in java
How to read XML file in java. Below is my XML file: <?xml version="1.0" encoding="utf-8"?> <LivescoreData> <Sport SportId="1"> <Name language="en">Soccer</Name> <Name language="se">Fotboll</Name> <Category CategoryId="34"> <Name language="en">Australia</Name> <Name language="se">Australien</Name> <Tournament TournamentId="144"> <Name language="en">Hyundai A-League</Name> <Name language="se">Hyundai A-League</Name> <Match MatchId="4616735"> <MatchDate>2011-01-05T07:30:00</MatchDate> <Team1 TeamId="1029369"> <Name language="en">Wellington Phoenix FC</Name> <Name language="se">Wellington</Name> </Team1> <Team2 TeamId="529088"> <Name language="en">Melbourne Victory</Name> <Name language="se">Melbourne Victory</Name> </Team2> <Status Code="100"> <Name language="en">Ended</Name> <Name language="se">Avslutad</Name> </Status> <Winner>1</Winner> <Scores> <Score type="Current"> <Team1>2</Team1> <Team2>0</Team2> </Score> </Scores> <Goals></Goals> <Cards></Cards> <Substitutions></Substitutions> <Lineups></Lineups> </Match> </Tournament> </Category> <Category CategoryId="1"> <Name language="en">England</Name> <Name language="se">England</Name> <Tournament TournamentId="1"> <Name language="en">Premier League</Name> <Name language="se">Premier League</Name> <Match MatchId="4601857"> <MatchDate>2011-01-04T21:00:00</MatchDate> <Team1 TeamId="5431228"> <Name language="en">Blackpool FC</Name> <Name language="se">Blackpool FC</Name> </Team1> <Team2 TeamId="23960"> <Name language="en">Birmingham City</Name> <Name language="se">Birmingham City</Name> </Team2> <Status Code="100"> <Name language="en">Ended</Name> <Name language="se">Avslutad</Name> </Status> <Winner>1</Winner> <Scores> <Score type="Current"> <Team1>5</Team1> <Team2>1</Team2> </Score> </Scores> <Goals></Goals> <Cards></Cards> <Substitutions></Substitutions> <Lineups></Lineups> </Match> <Match MatchId="4601859"> <MatchDate>2011-01-04T21:00:00</MatchDate> <Team1 TeamId="26511"> <Name language="en">Fulham FC</Name> <Name 
language="se">Fulham FC</Name> </Team1> <Team2 TeamId="94356"> <Name language="en">West Bromwich Albion</Name> <Name language="se">West Bromwich Albion</Name> </Team2> <Status Code="100"> <Name language="en">Ended</Name> <Name language="se">Avslutad</Name> </Status> <Winner>1</Winner> <Scores> <Score type="Current"> <Team1>4</Team1> <Team2>1</Team2> </Score> </Scores> <Goals></Goals> <Cards></Cards> <Substitutions></Substitutions> <Lineups></Lineups> </Match> </Tournament> </Category> </Sport> </LivescoreData>
Below is the code. It prints the first value of the XML (Soccer) but is not able to print the next one.
nodeLst = doc.getElementsByTagName("Sport");
for (int i = 0; i < nodeLst.getLength(); i++) {
    Node myNode = nodeLst.item(i);
    if (myNode.getNodeType() == Node.ELEMENT_NODE) {
        Element Sport = (Element) myNode;
        NodeList Name = Sport.getElementsByTagName("Name");
        Element NameElement = (Element) Name.item(0);
        NodeList Namevalue = NameElement.getChildNodes();
        System.out.println("Name : " + ((Node) Namevalue.item(0)).getNodeValue() + "|");
        // This gives me null value
        NodeList Category = Sport.getElementsByTagName("Category");
        Element CategoryName = (Element) Category.item(0);
        NodeList Categoryvalue = CategoryName.getChildNodes();
        System.out.println("Category: " + ((Node) Categoryvalue.item(0)).getNodeValue());
    }
}
I am able to read only the first value, Soccer. The second one just gives a null value. I need my result as:
Soccer | Australia | Hyundai A-League | Wellington Phoenix FC - Melbourne Victory : 2 - 0
Soccer | England | Premier League | Blackpool FC - Birmingham City : 5 - 1
Soccer | England | Premier League | Fulham FC - West Bromwich Albion : 4 - 1
You could use JAXP for parsing the XML.
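A minimal sketch of what that could look like with the JAXP DOM API, assuming the LivescoreData structure from the question. The null in the question most likely comes from calling getNodeValue() on the first child of Category, which is a whitespace text node or a Name element rather than the name text. The class LivescoreReader and its helpers children, first, and enName are hypothetical names introduced here; they walk direct children only, so a Team1 inside Score is not confused with the Team1 directly under Match.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class LivescoreReader {

    // Direct child elements with the given tag name (not all descendants).
    static List<Element> children(Element parent, String tag) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                out.add((Element) n);
            }
        }
        return out;
    }

    // First direct child with the given tag; assumes it exists.
    static Element first(Element parent, String tag) {
        return children(parent, tag).get(0);
    }

    // Text of the direct <Name language="en"> child, or "" if there is none.
    static String enName(Element parent) {
        for (Element name : children(parent, "Name")) {
            if ("en".equals(name.getAttribute("language"))) {
                return name.getTextContent().trim();
            }
        }
        return "";
    }

    // One "Sport | Category | Tournament | Team1 - Team2 : s1 - s2" line per Match.
    static List<String> summarize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> lines = new ArrayList<>();
        for (Element sport : children(doc.getDocumentElement(), "Sport")) {
            for (Element category : children(sport, "Category")) {
                for (Element tournament : children(category, "Tournament")) {
                    for (Element match : children(tournament, "Match")) {
                        Element score = first(first(match, "Scores"), "Score");
                        lines.add(enName(sport) + " | " + enName(category) + " | "
                            + enName(tournament) + " | "
                            + enName(first(match, "Team1")) + " - "
                            + enName(first(match, "Team2")) + " : "
                            + first(score, "Team1").getTextContent().trim() + " - "
                            + first(score, "Team2").getTextContent().trim());
                    }
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample in the same shape as the question's file.
        String xml =
            "<LivescoreData><Sport SportId=\"1\"><Name language=\"en\">Soccer</Name>"
          + "<Category CategoryId=\"34\"><Name language=\"en\">Australia</Name>"
          + "<Tournament TournamentId=\"144\"><Name language=\"en\">Hyundai A-League</Name>"
          + "<Match MatchId=\"4616735\">"
          + "<Team1 TeamId=\"1029369\"><Name language=\"en\">Wellington Phoenix FC</Name></Team1>"
          + "<Team2 TeamId=\"529088\"><Name language=\"en\">Melbourne Victory</Name></Team2>"
          + "<Scores><Score type=\"Current\"><Team1>2</Team1><Team2>0</Team2></Score></Scores>"
          + "</Match></Tournament></Category></Sport></LivescoreData>";
        for (String line : summarize(xml)) {
            System.out.println(line);
        }
    }
}
```

To read from disk instead of a string, DocumentBuilder.parse also accepts a java.io.File.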
Since you haven't really said what you want to do with said XML file, the best I can do is direct you to this guide: http://tutorials.jenkov.com/java-xml/dom.html
It's much better/easier NOT to do this in Java, but to do it in XSLT or XQuery code which you can invoke from your Java application.
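For illustration, a sketch of the XSLT route using the javax.xml.transform API that ships with the JDK (XSLT 1.0). The stylesheet and the class name LivescoreXslt are assumptions written for the XML shown in the question, not something taken from an existing project.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class LivescoreXslt {

    // XSLT 1.0 stylesheet: emits one text line per <Match> element.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:for-each select=\"//Match\">"
      + "<xsl:value-of select=\"ancestor::Sport/Name[@language='en']\"/>"
      + "<xsl:text> | </xsl:text>"
      + "<xsl:value-of select=\"ancestor::Category/Name[@language='en']\"/>"
      + "<xsl:text> | </xsl:text>"
      + "<xsl:value-of select=\"ancestor::Tournament/Name[@language='en']\"/>"
      + "<xsl:text> | </xsl:text>"
      + "<xsl:value-of select=\"Team1/Name[@language='en']\"/>"
      + "<xsl:text> - </xsl:text>"
      + "<xsl:value-of select=\"Team2/Name[@language='en']\"/>"
      + "<xsl:text> : </xsl:text>"
      + "<xsl:value-of select=\"Scores/Score[@type='Current']/Team1\"/>"
      + "<xsl:text> - </xsl:text>"
      + "<xsl:value-of select=\"Scores/Score[@type='Current']/Team2\"/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML text and returns the plain-text result.
    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample in the same shape as the question's file.
        String xml =
            "<LivescoreData><Sport SportId=\"1\"><Name language=\"en\">Soccer</Name>"
          + "<Category CategoryId=\"1\"><Name language=\"en\">England</Name>"
          + "<Tournament TournamentId=\"1\"><Name language=\"en\">Premier League</Name>"
          + "<Match MatchId=\"4601857\">"
          + "<Team1 TeamId=\"5431228\"><Name language=\"en\">Blackpool FC</Name></Team1>"
          + "<Team2 TeamId=\"23960\"><Name language=\"en\">Birmingham City</Name></Team2>"
          + "<Scores><Score type=\"Current\"><Team1>5</Team1><Team2>1</Team2></Score></Scores>"
          + "</Match></Tournament></Category></Sport></LivescoreData>";
        System.out.print(transform(xml));
    }
}
```

In a real application the stylesheet would normally live in its own .xsl file and be loaded with new StreamSource(new File(...)), which keeps the mapping editable without recompiling the Java code.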
Cutting a XML with JAXB
I have the following xml: <Package> <PackageHeader> <name>External Vendor File</name> <description>External vendor file for some purpose</description> <version>3.141694baR3</version> </PackageHeader> <PackageBody> <Characteristic id="1"> <Size> <value>1.68</value> <scale>Meters</scale> <comment>Size can vary, depending on temperature</comment> </Size> <Weight> <value>9</value> <scale>M*Tons</scale> <comment>His mama is so fat, we had to use another scale</comment> </Weight> <rating> <ratingCompany>ISO</ratingCompany> <rating:details xmlns:rating="http://www.w3schools.com/ratingDetails"> <rating:value companyDepartment="Finance">A</rating:value> <rating:expirationDate update="1/12/2010">1/1/2014</rating:expirationDate> <rating:comment userID="z94234">You're not Silvia.</rating:comment> <rating:comment userID="r24942">You're one of the Kung-Fu Creatures On The Rampage</rating:comment> <rating:comment userID="i77880">TWO!</rating:comment> <rating:priority>3</rating:priority> </rating:details> </rating> </Characteristic> <Characteristic id="2"> <Size/> <Weight/> <rating/> </Characteristic> ... 
<Characteristic id="n"/> </PackageBody> </Package> And the following Java code: public class XMLTest { public static void main(String[] args) throws Exception { Package currentPackage = new Package(); Package sourcePackage = new Package(); int totalCharacteristics; PackageBody currentPackageBody = new PackageBody(); Characteristic currentCharacteristic = new Characteristic(); rating currentRating = new rating(); FileInputStream fis = new FileInputStream("sourceFile.xml"); JAXBContext myCurrentContext = JAXBContext.newInstance(Package.class); Marshaller m = myCurrentContext.createMarshaller(); Unmarshaller um = myCurrentContext.createUnmarshaller(); sourcePackage = (Package)um.unmarshal(fis); currentPackage.setPackageHeader(sourcePackage.getPackageHeader()); totalCharacteristics = sourcePackage.getPackageBody().getCharacteristics().size(); for (int i = 0; i < totalCharacteristics; i++) { currentRating = sourcePackage.getPackageBody().getCharacteristics().get(i).getrating(); } currentCharacteristic.setrating(currentRating); currentPackageBody.getCharacteristics().add(currentCharacteristic); currentPackage.setPackageBody(currentPackageBody); m.marshal(currentPackage, new File("targetFile.xml")); fis.close(); } } Which gives me the next XML: <Package> <PackageHeader> <name>External Vendor File</name> <description>External vendor file for some purpose</description> <version>3.141694baR3</version> </PackageHeader> <PackageBody> <Characteristic id="1"> <rating> <ratingCompany>ISO</ratingCompany> <rating:details xmlns:rating="http://www.w3schools.com/ratingDetails"> <rating:value companyDepartment="Finance">A</rating:value> <rating:expirationDate update="1/12/2010">1/1/2014</rating:expirationDate> <rating:comment userID="z94234">You're not Silvia.</rating:comment> <rating:comment userID="r24942">You're one of the Kung-Fu Creatures On The Rampage</rating:comment> <rating:comment userID="i77880">TWO!</rating:comment> <rating:priority>3</rating:priority> </rating:details> 
</rating> </Characteristic> <Characteristic id="2"> <rating/> </Characteristic> ... <Characteristic id="n"/> </PackageBody> </Package>
And this is what I need:
<Package> <PackageHeader> <name>External Vendor File</name> <description>External vendor file for some purpose</description> <version>3.141694baR3</version> </PackageHeader> <PackageBody> <Characteristic> <rating id="1"> <ratingCompany>ISO</ratingCompany> <rating:details xmlns:rating="http://www.w3schools.com/ratingDetails"> <rating:comment userID="z94234">You're not Silvia.</rating:comment> <rating:comment userID="r24942">You're one of the Kung-Fu Creatures On The Rampage</rating:comment> <rating:comment userID="i77880">TWO!</rating:comment> <rating:priority>3</rating:priority> </rating:details> </rating> </Characteristic> <Characteristic> <rating id="2"/> </Characteristic> ... <Characteristic/> </PackageBody> </Package>
But I have a few questions:
How could I read a 4 GB file (for example, with StAX)?
If I want to filter some tags from source to target (as in the last XML), would I have to assign them one by one to the targetFile? Is there any iterator that might allow me to go through all subnodes and assign them?
If the sourceFile changes, would I need to rerun xjc and recompile the whole project?
Thanks.
For reading huge XML files, you definitely need a streaming parser like StAX. In addition, you can combine it with JAXB to selectively map a given piece of the XML to a Java object if you wish to work with it. You need to regenerate your JAXB classes only if your schema changes; there is no need to regenerate them if only your application code changes.
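As a sketch of the streaming part only, the following copies a document event by event with the StAX event API and skips selected subtrees, so memory use does not grow with file size. It uses plain StAX rather than the StAX plus JAXB combination mentioned above. The class name StaxTagFilter and the choice to drop Size and Weight are assumptions for illustration; it does not move the id attribute or remove the rating:value and rating:expirationDate elements as the desired output in the question does.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StaxTagFilter {

    // Local names of the elements to drop, together with everything inside them (assumption).
    static final Set<String> DROPPED = new HashSet<>(Arrays.asList("Size", "Weight"));

    // Streams events from in to out, omitting the dropped subtrees.
    static void filter(Reader in, Writer out) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        int skipDepth = 0; // > 0 while inside a dropped element
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (skipDepth > 0) {
                // Track nesting so the skip ends at the matching end tag.
                if (event.isStartElement()) {
                    skipDepth++;
                } else if (event.isEndElement()) {
                    skipDepth--;
                }
                continue;
            }
            if (event.isStartElement()
                    && DROPPED.contains(event.asStartElement().getName().getLocalPart())) {
                skipDepth = 1;
                continue;
            }
            writer.add(event);
        }
        writer.flush();
        writer.close();
        reader.close();
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample in the same shape as the question's file.
        String xml =
            "<Package><PackageBody><Characteristic id=\"1\">"
          + "<Size><value>1.68</value><scale>Meters</scale></Size>"
          + "<Weight><value>9</value></Weight>"
          + "<rating><ratingCompany>ISO</ratingCompany></rating>"
          + "</Characteristic></PackageBody></Package>";
        StringWriter out = new StringWriter();
        filter(new StringReader(xml), out);
        System.out.println(out);
    }
}
```

For a multi-gigabyte file, the Reader and Writer would be buffered file streams instead of strings. If part of the document should still be handled as Java objects, JAXB's Unmarshaller can also unmarshal from an XMLStreamReader positioned at a single element.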