How to convert an XML file to a HashMap using Apache Tika - Java

In my case I am able to read the XML file and parse it to get its content, but the metadata only provides the type of the file, which is "application/xml".
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.xml.XMLParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;
public class XmlParserExample {
public static void main(String[] args) throws IOException, SAXException, TikaException {
BodyContentHandler handler = new BodyContentHandler();
XMLParser parser = new XMLParser();
Metadata metadata = new Metadata();
ParseContext pcontext = new ParseContext();
FileInputStream inputstream = new FileInputStream(new File("example.xml"));
parser.parse(inputstream, handler, metadata, pcontext);
System.out.println("Contents of the document:" + handler.toString());
System.out.println("Metadata of the document:");
String[] metadataNames = metadata.names();
for(String name : metadataNames) {
System.out.println(name + ": " + metadata.get(name));
}
}
}
The above snippet prints the whole XML content and the content type (as metadata). But I also want to fetch the XML tags, so that I can create a HashMap, which is a requirement in my case.
Below is my dummy example.xml:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE PubmedArticleSet SYSTEM "http://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_190101.dtd">
<PubmedArticleSet>
<PubmedArticle>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">27483086</PMID>
<DateCompleted>
<Year>2018</Year>
<Month>05</Month>
<Day>02</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>05</Month>
<Day>02</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1532-849X</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>26</Volume>
<Issue>4</Issue>
<PubDate>
<Year>2017</Year>
<Month>Jun</Month>
</PubDate>
</JournalIssue>
<Title>Journal of prosthodontics : official journal of the American College of Prosthodontists</Title>
<ISOAbbreviation>J Prosthodont</ISOAbbreviation>
</Journal>
<ArticleTitle>The Use of CADCAM Technology for Fabricating Cast Gold Survey Crowns under Existing Partial Removable Dental Prosthesis. A Clinical Report.</ArticleTitle>
<Pagination>
<MedlinePgn>321-326</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1111jopr.12525</ELocationID>
<Abstract>
<AbstractText>The fabrication of a survey crown under an existing partial removable dental prosthesis (PRDP) has always been a challenge to many dental practitioners. This clinical report presents a technique for fabricating accurate cast gold survey crowns to fit existing PRDPs using CAD/CAM technology. The report describes a technique that would digitally scan the coronal anatomy of a cast gold survey crown and an abutment tooth under existing PRDPs planned for restoration, prior to any preparation. The information is stored in the digital software where all the coronal anatomical details are preserved without any modifications. The scanned designs are then applied to the scanned teeth preparations, sent to the milling machine and milled into full-contour clear acrylic resin burn-out patterns. The acrylic resin patterns are tried in the patient's mouth the same day to verify the full seating of the PRDP components. The patterns are then invested and cast into gold crowns and cemented in the conventional manner.</AbstractText>
<CopyrightInformation>© 2016 by the American College of Prosthodontists.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>El Kerdani</LastName>
<ForeName>Tarek</ForeName>
<Initials>T</Initials>
<AffiliationInfo>
<Affiliation>Department of Restorative Dental Sciences, Division of Prosthodontics, University of Florida College of Dentistry, Gainesville, FL.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Roushdy</LastName>
<ForeName>Sally</ForeName>
<Initials>S</Initials>
<AffiliationInfo>
<Affiliation>Department of Restorative Dental Sciences, Division of Prosthodontics, University of Florida College of Dentistry, Gainesville, FL.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D002363">Case Reports</PublicationType>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2016</Year>
<Month>08</Month>
<Day>02</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>J Prosthodont</MedlineTA>
<NlmUniqueID>9301275</NlmUniqueID>
<ISSNLinking>1059-941X</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>7440-57-5</RegistryNumber>
<NameOfSubstance UI="D006046">Gold</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>D</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000368" MajorTopicYN="N">Aged</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017076" MajorTopicYN="Y">Computer-Aided Design</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D003442" MajorTopicYN="Y">Crowns</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D000044" MajorTopicYN="N">Dental Abutments</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017267" MajorTopicYN="Y">Dental Prosthesis Design</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D003832" MajorTopicYN="Y">Denture, Partial, Removable</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006046" MajorTopicYN="N">Gold</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008297" MajorTopicYN="N">Male</DescriptorName>
</MeshHeading>
</MeshHeadingList>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">CADM</Keyword>
<Keyword MajorTopicYN="N">cast gold</Keyword>
<Keyword MajorTopicYN="N">milled acrylic resin patterns</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="accepted">
<Year>2016</Year>
<Month>06</Month>
<Day>13</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2016</Year>
<Month>8</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2018</Year>
<Month>5</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2016</Year>
<Month>8</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">27483086</ArticleId>
<ArticleId IdType="doi">10.111pr.12525</ArticleId>
</ArticleIdList>
</PubmedData>
</PubmedArticle>
<PubmedArticle>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">27483087</PMID>
<DateCompleted>
<Year>2018</Year>
<Month>08</Month>
<Day>07</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>08</Month>
<Day>07</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">2326-5205</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>68</Volume>
<Issue>11</Issue>
<PubDate>
<Year>2016</Year>
<Month>11</Month>
</PubDate>
</JournalIssue>
<Title>Arthritis &amp; rheumatology (Hoboken, N.J.)</Title>
</Journal>
<ArticleTitle>Reply.</ArticleTitle>
<Pagination>
<MedlinePgn>2826-2827</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10t.39831</ELocationID>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Hitchon</LastName>
<ForeName>Carol Ann</ForeName>
<Initials>CA</Initials>
<AffiliationInfo>
<Affiliation>University of Manitoba, Winnipeg, Manitoba, Canada.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Koppejan</LastName>
<ForeName>Hester</ForeName>
<Initials>H</Initials>
<AffiliationInfo>
<Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Trouw</LastName>
<ForeName>Leendert A</ForeName>
<Initials>LA</Initials>
<AffiliationInfo>
<Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Huizinga</LastName>
<ForeName>Tom J W</ForeName>
<Initials>TJ</Initials>
<AffiliationInfo>
<Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Toes</LastName>
<ForeName>René E M</ForeName>
<Initials>RE</Initials>
<AffiliationInfo>
<Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>El-Gabalawy</LastName>
<ForeName>Hani S</ForeName>
<Initials>HS</Initials>
<AffiliationInfo>
<Affiliation>University of Manitoba, Winnipeg, Manitoba, Canada.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>MOP‐77700</GrantID>
<Agency>CIHR</Agency>
<Country>Canada</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D016422">Letter</PublicationType>
<PublicationType UI="D013485">Research Sup</PublicationType>
<PublicationType UI="D016420">Comment</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2016</Year>
<Month>10</Month>
<Day>09</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>Arthritis Rheumatol</MedlineTA>
<NlmUniqueID>101623795</NlmUniqueID>
<ISSNLinking>2326-5191</ISSNLinking>
</MedlineJournalInfo>
<CommentsCorrectionsList>
<CommentsCorrections RefType="CommentOn">
<RefSource>dff</RefSource>
<PMID Version="1">27483211</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="CommentOn">
<RefSource>Arthritis Rheumato</RefSource>
<PMID Version="1">26946484</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2016</Year>
<Month>07</Month>
<Day>26</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2016</Year>
<Month>07</Month>
<Day>28</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2016</Year>
<Month>10</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2016</Year>
<Month>10</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>1</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2016</Year>
<Month>8</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">27483087</ArticleId>
<ArticleId IdType="doi">efre</ArticleId>
</ArticleIdList>
</PubmedData>
</PubmedArticle>
</PubmedArticleSet>
Kindly help me out on this.
Thanks

My suggestion: If you want to read an XML file, and then parse its contents, you are probably better off using a purpose-built XML parser, rather than Tika.
There are various options - each with its own pros and cons (for example speed, memory consumption).
Here is one approach - it reads the entire file into memory, but you already do that with your Tika approach, so I assume file size is not a problem.
The code assumes there is a file called pubmed.xml which contains the XML presented in the question.
It reads the XML from file, and handles each element as a DOM node.
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
...
public void parseUsingDom() {
try {
File xmlFile = new File("C:/tmp/pubmed.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
doc.getDocumentElement().normalize();
NodeList articles = doc.getElementsByTagName("Article");
for (int i = 0; i < articles.getLength(); i++) {
Node article = articles.item(i);
if (article.getNodeType() == Node.ELEMENT_NODE) {
Element articleElement = (Element) article;
String title = articleElement
.getElementsByTagName("ArticleTitle")
.item(0).getTextContent();
System.out.println("");
System.out.println("Title : " + title);
NodeList authors = articleElement.getElementsByTagName("Author");
for (int j = 0; j < authors.getLength(); j++) {
Node author = authors.item(j);
if (author.getNodeType() == Node.ELEMENT_NODE) {
Element authorElement = (Element) author;
String foreName = authorElement
.getElementsByTagName("ForeName")
.item(0).getTextContent();
String lastName = authorElement
.getElementsByTagName("LastName")
.item(0).getTextContent();
System.out.println("Author : " + lastName + ", " + foreName);
}
}
}
}
} catch (Exception e) {
System.err.print(e);
}
}
The program prints the following output, just as a demo of what is possible:
Title : The Use of CADCAM Technology for Fabricating Cast Gold Survey Crowns under Existing Partial Removable Dental Prosthesis. A Clinical Report.
Author : El Kerdani, Tarek
Author : Roushdy, Sally
Title : Reply.
Author : Hitchon, Carol Ann
Author : Koppejan, Hester
Author : Trouw, Leendert A
Author : Huizinga, Tom J W
Author : Toes, René E M
Author : El-Gabalawy, Hani S
In your case, you would capture the relevant values in a hash map, of course.
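As a minimal sketch of that last step, the same DOM traversal can collect the values into a map instead of printing them. The class name ArticleMapSketch, the helper names, and the choice of "title -> list of authors" as the map layout are assumptions made for illustration; the question does not say which keys and values are needed. The XML is parsed from an inline string (a shortened version of the sample) so that the example runs on its own, and a LinkedHashMap (a HashMap subclass) is used only to keep the document order.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ArticleMapSketch {

    // Hypothetical helper: maps each ArticleTitle to its authors as "LastName, ForeName".
    public static Map<String, List<String>> authorsByTitle(Document doc) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList articles = doc.getElementsByTagName("Article");
        for (int i = 0; i < articles.getLength(); i++) {
            // getElementsByTagName only returns elements, so the cast is safe
            Element article = (Element) articles.item(i);
            String title = article.getElementsByTagName("ArticleTitle")
                    .item(0).getTextContent();
            List<String> names = new ArrayList<>();
            NodeList authors = article.getElementsByTagName("Author");
            for (int j = 0; j < authors.getLength(); j++) {
                Element author = (Element) authors.item(j);
                String lastName = author.getElementsByTagName("LastName")
                        .item(0).getTextContent();
                String foreName = author.getElementsByTagName("ForeName")
                        .item(0).getTextContent();
                names.add(lastName + ", " + foreName);
            }
            result.put(title, names);
        }
        return result;
    }

    // Hypothetical helper: parses an XML string into a DOM document.
    public static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened stand-in for the pubmed.xml content from the question
        String xml = "<PubmedArticleSet>"
                + "<PubmedArticle><MedlineCitation><Article>"
                + "<ArticleTitle>Reply.</ArticleTitle>"
                + "<AuthorList>"
                + "<Author><LastName>Hitchon</LastName><ForeName>Carol Ann</ForeName></Author>"
                + "<Author><LastName>Koppejan</LastName><ForeName>Hester</ForeName></Author>"
                + "</AuthorList>"
                + "</Article></MedlineCitation></PubmedArticle>"
                + "</PubmedArticleSet>";
        Map<String, List<String>> map = authorsByTitle(parse(xml));
        map.forEach((title, authors) -> System.out.println(title + " -> " + authors));
    }
}
```

Note that a map keyed by title silently overwrites an earlier entry if two articles share a title; keying by PMID instead may be a safer choice for real data.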

Related

Inlining SVG's not working on Unix Server but External SVG's ok

We are using Apache FOP to generate PDF's & have an issue with SVG's.
To include an SVG we're using something like the following...
<fo:block>
<fo:external-graphic src="classpath:image/MyImage.svg" content-width="150mm"/>
</fo:block>
The above works fine in all environments.
Now I'm trying to inline an SVG in the Stylesheet, like this...
<fo:block>
<fo:instream-foreign-object content-width="272.6mm">
<svg xmlns="http://www.w3.org/2000/svg" width="780" height="120" viewBox="0 0 780 120">
<g style="fill-opacity:0;stroke-width:2;stroke:black">
<rect x="2" y="2" width="254" height="99"/>
<rect x="256" y="2" width="485" height="99"/>
</g>
</svg>
</fo:instream-foreign-object>
</fo:block>
That works OK under Windows, but when deployed on our Linux server, it seems to do nothing.
I have read some comments on the Apache FOP website saying that it uses Apache Batik to render SVGs, that this requires a graphical environment, and that it will therefore not work in many Unix configurations.
What I don't understand is why the external SVG works on the Unix server while the inline one does not.
Also, they recommend a tool called PJA Toolkit to work around this issue, but it looks very dated, so I wonder whether it's going to work with our JDK 17.
I would be grateful if anyone has some info about this.
Previously we hadn't noticed the symptom, which was a spurious empty Namespace entry on the <g> Element, meaning it no longer belonged to the SVG Namespace:
<svg xmlns="http://www.w3.org/2000/svg" height="775" viewBox="0 0 1184 775" width="1184">
<g xmlns="" style="fill-opacity:0;stroke-width:7;stroke:red">
The problem turned out to be Namespace Awareness when parsing the Stylesheet.
When enabled, the generated inline SVG works fine.
Here's a Java source to illustrate the solution.
(It uses text blocks, which became a standard feature in JDK 15; in JDK 13 and 14 they were only available as a preview feature.)
Just look for the TODO and (un)comment that line to see the difference:
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
public class SvgNamespaceDomSimple {
private static final String DATA_XML = """
<?xml version="1.0" encoding="UTF-8"?><dataXml/>""";
private static final String STYLESHEET = """
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format"
version="3.0">
<xsl:template match="/">
<fo:root>
<fo:layout-master-set>
<fo:simple-page-master master-name="singlePage" page-width="297mm" page-height="210mm">
<fo:region-body margin-left="2.19mm" margin-top="6.75mm"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="singlePage">
<fo:flow flow-name="xsl-region-body">
<fo:block>
<fo:instream-foreign-object content-width="272.6mm" content-type="image/svg+xml">
<svg xmlns="http://www.w3.org/2000/svg" width="1184" height="775" viewBox="0 0 1184 775">
<g style="fill-opacity:0;stroke-width:7;stroke:black">
<rect width="254" height="99"/>
</g>
</svg>
</fo:instream-foreign-object>
</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
</xsl:template>
</xsl:stylesheet>
""";
private static Document parse(final byte[] byteArray) throws ParserConfigurationException, SAXException, IOException {
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true); // TODO comment out this line to see the spurious xmlns="" on <g>
final DocumentBuilder dbd = dbf.newDocumentBuilder();
final Document doc = dbd.parse(new ByteArrayInputStream(byteArray));
System.out.println("DocumentBuilderFactory.: " + dbf.getClass());
System.out.println("DocumentBuilder........: " + dbd.getClass());
System.out.println("Document...............: " + doc.getClass());
return doc;
}
public static void main(final String[] args) throws Exception {
System.out.println("Data XML...............:" + '\n' + DATA_XML + '\n');
System.out.println("Stylesheet.............:" + '\n' + STYLESHEET);
try(final ByteArrayInputStream ist = new ByteArrayInputStream (DATA_XML.getBytes());
final ByteArrayOutputStream ost = new ByteArrayOutputStream();)
{
final Document stylesheetDoc = parse(STYLESHEET.getBytes());
final TransformerFactory transformerFactory = TransformerFactory.newInstance();
final Templates templates = transformerFactory.newTemplates(new DOMSource(stylesheetDoc));
final Transformer transformer = templates.newTransformer();
transformer.transform(new StreamSource(ist), new StreamResult(ost));
final String foString = new String(ost.toByteArray());
System.out.println("TransformerFactory.....: " + transformerFactory.getClass());
System.out.println("FO Bytes...............:" + '\n' + foString);
System.out.println( foString.replace(">", ">" + '\n'));
}
}
}
The resulting inline SVG then looked something like this & was rendered fine both under Windows & on our Linux Server:
<fo:instream-foreign-object content-width="272.6mm" content-type="image/svg+xml">
<svg xmlns="http://www.w3.org/2000/svg" width="1184" height="775" viewBox="0 0 1184 775">
<g style="fill-opacity:0;stroke-width:7;stroke:black">
<rect width="254" height="99"/>
</g>
</svg>
</fo:instream-foreign-object>
As a workaround, you can specify the Namespace explicitly as follows:
<fo:instream-foreign-object content-width="272.6mm" content-type="image/svg+xml">
<svg:svg xmlns:svg="http://www.w3.org/2000/svg" width="1184" height="775" viewBox="0 0 1184 775">
<svg:g style="fill-opacity:0;stroke-width:2;stroke:black">
<svg:rect width="254" height="99"/>
</svg:g>
</svg:svg>
</fo:instream-foreign-object>
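The effect of namespace awareness on the parsed DOM can also be checked in isolation. The following is a small sketch, not part of the original program: the class NamespaceAwareCheck and its helper namespaceOf are hypothetical names. It parses the same SVG fragment twice and reports the namespace URI of the <g> element. With a parser that is not namespace-aware, DOM elements carry no namespace URI at all, which is consistent with the xmlns="" symptom described above once the stylesheet is handed to the transformer as a DOMSource.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class NamespaceAwareCheck {

    // Hypothetical helper: returns the namespace URI of the first element with the given tag name.
    public static String namespaceOf(String xml, String tagName, boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(namespaceAware);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName(tagName).item(0).getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String svg = "<svg xmlns=\"http://www.w3.org/2000/svg\">"
                + "<g><rect width=\"254\" height=\"99\"/></g></svg>";
        // Not namespace-aware: the element has no namespace URI
        System.out.println("unaware: " + namespaceOf(svg, "g", false));
        // Namespace-aware: <g> inherits the default SVG namespace from <svg>
        System.out.println("aware  : " + namespaceOf(svg, "g", true));
    }
}
```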

CSV to XML conversion Java

I was working on converting CSV to XML data. By looking at various examples I was able to write the code for parsing the CSV file and getting the XML file. However, the code I have written returns the XML file with incorrect tags.
This is the Code for Conversion:
package com.adarsh.parse;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class Converter {
/* Protected members to avoid instantiation */
protected DocumentBuilderFactory domFactory = null;
protected DocumentBuilder domBuilder = null;
/* Constant strings */
// Input CSV file
final String INPUT_FILE = "sample_data.csv";
// Output XML document
final String OUTPUT_FILE ="in.xml";
// First element in the XML document
final String FIRST_ELEMENT="school";
public Converter(){
try {
domFactory = DocumentBuilderFactory.newInstance();
/* Obtaining instance of class DocumentBuilder */
domBuilder = domFactory.newDocumentBuilder();
}
catch(ParserConfigurationException exp) {
System.err.println(exp.toString());
}
catch(FactoryConfigurationError exp){
System.err.println(exp.toString());
}
catch(Exception exp){
System.err.println(exp.toString());
}
}
/**
* This method converts the given CSV file into an XML document
*/
public int convert(String csvFileName, String xmlFileName) {
int rowCount = -1;
try {
/* Initializing the XML document */
Document newDoc = domBuilder.newDocument();
/* Creating the root element in the XML */
Element rootElem = newDoc.createElement(FIRST_ELEMENT);
newDoc.appendChild(rootElem);
/* Reading the CSV file */
BufferedReader csvFileReader;
csvFileName = INPUT_FILE;
csvFileReader = new BufferedReader(new FileReader(csvFileName));
/* Initialize the number of fields to 0 */
int fieldCount = 0;
String[] csvFields = null;
StringTokenizer stringTokenizer = null;
/**
* Map the column names in the CSV file as the elements in the XML
* document, eliminate any other characters not eligible for XML element
* naming
*/
/* Initialize the current line variable */
String currLine = csvFileReader.readLine();
/* Loop until we reach the end of the file
* edge case: Empty CSV file
* */
if(currLine != null) {
/* Separate fields based on commas */
stringTokenizer = new StringTokenizer(currLine, ",");
fieldCount = stringTokenizer.countTokens();
/* If there is data in the CSV file */
if(fieldCount > 0) {
/* Initialize a String Array of Fields */
csvFields = new String[fieldCount];
int i = 0;
/* Loop till all elements are found and save fields */
while (stringTokenizer.hasMoreElements()) {
csvFields[i++] = String.valueOf(stringTokenizer.nextElement());
}
}
}
else {
System.out.println("Nothing to parse");
}
/* reading rows from the CSV file */
while((currLine = csvFileReader.readLine()) != null) {
stringTokenizer = new StringTokenizer(currLine, ",");
fieldCount = stringTokenizer.countTokens();
/* if rows exist in the CSV file*/
if(fieldCount > 0) {
/* Create the row element*/
Element rowElem = newDoc.createElement("row");
int i = 0;
/* until there are more elements*/
while(stringTokenizer.hasMoreElements()) {
try {
/* Append each element found to each row element*/
String currValue = String.valueOf(stringTokenizer.nextElement());
Element currElem = newDoc.createElement(csvFields[i++]);
currElem.appendChild(newDoc.createTextNode(currValue));
rowElem.appendChild(currElem);
}
catch(Exception exp) {
}
}
/* Append the rows to the root element*/
rootElem.appendChild(rowElem);
rowCount++;
}
}
/* Finish reading the CSV file */
csvFileReader.close();
/* Saving the generated XML doc into required format file to disk */
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
aTransformer.setOutputProperty(OutputKeys.INDENT, "yes");
aTransformer.setOutputProperty(OutputKeys.METHOD, "xml");
aTransformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
Source src = new DOMSource(newDoc);
xmlFileName = OUTPUT_FILE;
Result dest = new StreamResult(new File(xmlFileName));
aTransformer.transform(src, dest);
rowCount++;
}
catch(IOException exp) {
System.err.println(exp.toString());
}
catch(Exception exp) {
System.err.println(exp.toString());
}
/* Number of rows parsed into XML */
return rowCount;
}
}
This is the sample CSV data in the file:
classroom_id,classroom_name,teacher_1_id,teacher_1_last_name,teacher_1_first_name,teacher_2_id,teacher_2_last_name,teacher_2_first_name,student_id,student_last_name,student_first_name,student_grade
103, Brian's Homeroom, 10300000001, O'Donnell, Brian, , , , , , , 102,
Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011,
Patterson, John, 10200000011, McCrancy, Brandon, 1 102, Mr. Smith's
PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson,
John, 10200000018, Reginald, Alexis, 1 102, Mr. Smith's PhysEd Class,
10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000019,
Gayle, Matthew, 1 102, Mr. Smith's PhysEd Class, 10200000001, Smith,
Arthur, 10200000011, Patterson, John, 10200000010, Smith, Nathaniel, 1
102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur,
10200000011, Patterson, John, 10200000013, Lanni, Erica, 1 102, Mr.
Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011,
Patterson, John, 10200000014, Flores, Michael, 1 102, Mr. Smith's
PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson,
John, 10200000012, Marco, Elizabeth, 1 102, Mr. Smith's PhysEd Class,
10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000016,
Perez, Brittany, 1 102, Mr. Smith's PhysEd Class, 10200000001, Smith,
Arthur, 10200000011, Patterson, John, 10200000015, Hill, Jasmin, 1
102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur,
10200000011, Patterson, John, 10200000017, Hiram, William, 1 101, Mrs.
Jones' Math Class, 10100000001, Jones, Barbara, , , , 10100000015,
Cruz, Alex, 1 101, Mrs. Jones' Math Class, 10100000001, Jones,
Barbara, , , , 10100000014, Garcia, Lizzie, 1 101, Mrs. Jones' Math
Class, 10100000001, Jones, Barbara, , , , 10100000013, Mercado, Toby,
1 101, Mrs. Jones' Math Class, 10100000001, Jones, Barbara, , , ,
10100000011, Gutierrez, Kimberly, 2 101, Mrs. Jones' Math Class,
10100000001, Jones, Barbara, , , , 10100000010, Gil, Michael, 2
I was expecting to get the output as following in XML file:
<grade id="1">
<classroom id="101" name="Mrs. Jones' Math Class">
<teacher id="10100000001" first_name="Barbara" last_name="Jones"/>
<student id="10100000010" first_name="Michael" last_name="Gil"/>
<student id="10100000011" first_name="Kimberly" last_name="Gutierrez"/>
<student id="10100000013" first_name="Toby" last_name="Mercado"/>
<student id="10100000014" first_name="Lizzie" last_name="Garcia"/>
<student id="10100000015" first_name="Alex" last_name="Cruz"/>
</classroom>
<classroom id="102" name="Mr. Smith's PhysEd Class">
<teacher id="10200000001" first_name="Arthur" last_name="Smith"/>
<teacher id="10200000011" first_name="John" last_name="Patterson"/>
<student id="10200000010" first_name="Nathaniel" last_name="Smith"/>
<student id="10200000011" first_name="Brandon" last_name="McCrancy"/>
<student id="10200000012" first_name="Elizabeth" last_name="Marco"/>
<student id="10200000013" first_name="Erica" last_name="Lanni"/>
<student id="10200000014" first_name="Michael" last_name="Flores"/>
<student id="10200000015" first_name="Jasmin" last_name="Hill"/>
<student id="10200000016" first_name="Brittany" last_name="Perez"/>
<student id="10200000017" first_name="William" last_name="Hiram"/>
<student id="10200000018" first_name="Alexis" last_name="Reginald"/>
<student id="10200000019" first_name="Matthew" last_name="Gayle"/>
</classroom>
<classroom id="103" name="Brian's Homeroom">
<teacher id="10300000001" first_name="Brian" last_name="O'Donnell"/>
</classroom>
</grade>
This is how I am currently getting the output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<school>
<row>
<classroom_id>101</classroom_id>
</row>
<row>
<classroom_id>101</classroom_id>
</row>
<row>
<classroom_id>101</classroom_id>
</row>
<row>
<classroom_id>101</classroom_id>
</row>
<row>
<classroom_id>101</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>102</classroom_id>
</row>
<row>
<classroom_id>103</classroom_id>
</row>
</school>
So could someone please help me with this? I was wondering where I am going wrong. Thanks
P.S. I have already looked at other questions about CSV to XML conversion here on Stack Overflow. However, I wasn't able to find a suitable solution or explanation for my specific problem.
P.P.S. Please don't suggest XSLT unless it is strictly necessary for converting such CSV data to XML. If there is no other choice I will have to go and learn it, as I have very little knowledge of XSLT. I would appreciate it a lot if you could suggest changes to the code I have already written.
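It seems your CSV content has no newline separators between records. In the sample data the rows run together, so the last field of one record and the first field of the next end up in the same token (for example "1 102").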
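Once every record is on its own line, each data line can be matched against the header by position. The following is a minimal sketch under that assumption; the class CsvRowSketch and the helper toRecord are hypothetical names, not part of the code in the question. It uses String.split with a limit of -1, which keeps empty and trailing fields, whereas StringTokenizer skips empty tokens between adjacent delimiters, so the column positions can shift. A plain split on commas does not handle quoted fields that contain commas; the sample data has none.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvRowSketch {

    // Hypothetical helper: pairs each header name with the trimmed value at the same position.
    public static Map<String, String> toRecord(String headerLine, String dataLine) {
        String[] names = headerLine.split(",", -1);   // limit -1 keeps trailing empty fields
        String[] values = dataLine.split(",", -1);
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            // Missing trailing columns are treated as empty values
            String value = i < values.length ? values[i].trim() : "";
            record.put(names[i].trim(), value);
        }
        return record;
    }

    public static void main(String[] args) {
        String header = "classroom_id,classroom_name,teacher_1_id,teacher_1_last_name,"
                + "teacher_1_first_name,teacher_2_id,teacher_2_last_name,teacher_2_first_name,"
                + "student_id,student_last_name,student_first_name,student_grade";
        // First record of the sample data, assumed to end where the grade field is empty
        String row = "103, Brian's Homeroom, 10300000001, O'Donnell, Brian, , , , , , ,";
        Map<String, String> record = toRecord(header, row);
        record.forEach((name, value) -> System.out.println(name + " = [" + value + "]"));
    }
}
```

With records in this form, the values can be written as attributes (Element.setAttribute) on classroom, teacher and student elements rather than as one child element per column, which is closer to the expected output shown in the question.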

XPath Java parsing xml document under more conditions

I need to show the elements from books.xml that satisfy the following two conditions: price > 10 and publish_date > "2006-12-31". books.xml is:
<?xml version='1.0'?>
<catalog>
<book id='bk110'>
<author>O'Brien, Tim</author>
<title>Microsoft .NET: The Programming Bible</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2006-12-09</publish_date>
<description>Microsoft's .NET initiative is explored in
detail in this deep programmer's reference.</description>
</book>
<book id='bk111'>
<author>O'Brien, Tim</author>
<title>MSXML3: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2007-12-01</publish_date>
<description>The Microsoft MSXML3 parser is covered in
detail, with attention to XML DOM interfaces, XSLT processing,
SAX and more.</description>
</book>
<book id='bk112'>
<author>Galos, Mike</author>
<title>Visual Studio 7: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>49.95</price>
<publish_date>2008-04-16</publish_date>
<description>Microsoft Visual Studio 7 is explored in depth,
looking at how Visual Basic, Visual C++, C#, and ASP+ are
integrated into a comprehensive development
environment.</description>
</book>
</catalog>
When I try this code:
package web.services;
import java.io.File;
import java.io.*;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.xpath.*;
import org.xml.sax.*;
import org.w3c.dom.*;
public class WebServices {
private static void showElements() {
InputSource inputSource = null;
Object result;
NodeList nodeList = null;
String file;
String workingDir = System.getProperty("user.dir");
file="data"+File.separator+"books.xml";
try {
XPathFactory factory = XPathFactory.newInstance();
XPath xPath = factory.newXPath();
XPathExpression xPathExpression = xPath.compile("//book[price > 10][xs:date(publish_date) > xs:date('2005-12-31')]/*/text()");
File xmlDocument = new File(file);
try {
inputSource = new InputSource(new FileInputStream(xmlDocument));
} catch (FileNotFoundException ex) {
Logger.getLogger(WebServices.class.getName()).log(Level.SEVERE, null, ex);
}
result = xPathExpression.evaluate(inputSource, XPathConstants.NODESET);
nodeList = (NodeList) result;
} catch (XPathExpressionException ex) {
Logger.getLogger(WebServices.class.getName()).log(Level.SEVERE, null, ex);
}
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.print("Node name: " + nodeList.item(i).getNodeName());
System.out.print(" | ");
System.out.println("Node value: " + nodeList.item(i).getNodeValue());
System.out.println("------------------------------------------------");
}
}
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
showElements();
}
}
I'm getting this error:
maj 27, 2015 10:01:19 AM web.services.WebServices showElements
SEVERE: null
javax.xml.transform.TransformerException: Unknown error in XPath.
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:368)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:305)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:135)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:109)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:303)
at web.services.WebServices.showElements(WebServices.java:39)
at web.services.WebServices.main(WebServices.java:58)
Caused by: java.lang.NullPointerException
at com.sun.org.apache.xpath.internal.functions.FuncExtFunction.execute(FuncExtFunction.java:210)
at com.sun.org.apache.xpath.internal.Expression.execute(Expression.java:157)
at com.sun.org.apache.xpath.internal.operations.Operation.execute(Operation.java:111)
at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.executePredicates(PredicatedNodeTest.java:344)
at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.acceptNode(PredicatedNodeTest.java:481)
at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(AxesWalker.java:374)
at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(WalkingIterator.java:197)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(NodeSequence.java:344)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(NodeSequence.java:503)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:279)
at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:214)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:339)
... 6 more
---------
java.lang.NullPointerException
at com.sun.org.apache.xpath.internal.functions.FuncExtFunction.execute(FuncExtFunction.java:210)
at com.sun.org.apache.xpath.internal.Expression.execute(Expression.java:157)
at com.sun.org.apache.xpath.internal.operations.Operation.execute(Operation.java:111)
at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.executePredicates(PredicatedNodeTest.java:344)
at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.acceptNode(PredicatedNodeTest.java:481)
at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(AxesWalker.java:374)
at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(WalkingIterator.java:197)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(NodeSequence.java:344)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(NodeSequence.java:503)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:279)
at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:214)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:339)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:305)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:135)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:109)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:303)
at web.services.WebServices.showElements(WebServices.java:39)
at web.services.WebServices.main(WebServices.java:58)
--------------- linked to ------------------
javax.xml.xpath.XPathExpressionException: javax.xml.transform.TransformerException: Unknown error in XPath.
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:305)
at web.services.WebServices.showElements(WebServices.java:39)
at web.services.WebServices.main(WebServices.java:58)
Caused by: javax.xml.transform.TransformerException: Unknown error in XPath.
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:368)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:305)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:135)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:109)
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:303)
... 2 more
Caused by: java.lang.NullPointerException
at com.sun.org.apache.xpath.internal.functions.FuncExtFunction.execute(FuncExtFunction.java:210)
at com.sun.org.apache.xpath.internal.Expression.execute(Expression.java:157)
at com.sun.org.apache.xpath.internal.operations.Operation.execute(Operation.java:111)
at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.executePredicates(PredicatedNodeTest.java:344)
at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.acceptNode(PredicatedNodeTest.java:481)
at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(AxesWalker.java:374)
at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(WalkingIterator.java:197)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(NodeSequence.java:344)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(NodeSequence.java:503)
at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:279)
at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:214)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:339)
... 6 more
Exception in thread "main" java.lang.NullPointerException
at web.services.WebServices.showElements(WebServices.java:45)
at web.services.WebServices.main(WebServices.java:58)
Java Result: 1
BUILD SUCCESSFUL (total time: 2 seconds)
What is wrong? Thank you!
You are trying to use XPath 2.0 data types such as xs:date, but the XPath implementation in the Oracle JRE supports only XPath 1.0, which has no such data types. For this particular expression, XPath 1.0 with a plain number comparison should be enough, using a path like //book[price > 10][number(translate(publish_date, '-', '')) > 20051231]. The translate call strips the hyphens, so an ISO date such as 2008-04-16 becomes the number 20080416, which sorts in the same order as the date itself.
If you want to use XPath 2.0, you need to look into third-party libraries like Saxon 9 or into XQuery implementations (as XPath 2.0 is a subset of XQuery 1.0).
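As a rough sketch of the XPath 1.0 workaround, the standard javax.xml.xpath API can evaluate the rewritten expression as shown here. The class name, the helper method and the inline two-book sample are made up for illustration, and the path is narrowed to title text to keep the output short; only the two predicates come from the answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question.
class XPath1DateFilter {

    // XPath 1.0 only: stripping the hyphens lets 2008-04-16 compare as the number 20080416.
    static final String EXPRESSION =
            "//book[price > 10]"
            + "[number(translate(publish_date, '-', '')) > 20051231]"
            + "/title/text()";

    // Returns the titles of all books that satisfy both predicates.
    static List<String> recentTitles(String xml) throws XPathExpressionException {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate(
                EXPRESSION, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getNodeValue());
        }
        return titles;
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Made-up sample data in the same shape as the catalog from the question.
        String xml = "<catalog>"
                + "<book id='a'><title>Old</title><price>44.95</price>"
                + "<publish_date>2000-10-01</publish_date></book>"
                + "<book id='b'><title>New</title><price>49.95</price>"
                + "<publish_date>2008-04-16</publish_date></book>"
                + "</catalog>";
        System.out.println(recentTitles(xml));
    }
}
```

With the sample in main, only the book published in 2008 passes both predicates. Reading the file through a FileInputStream, as in the question, should work the same way once the expression no longer uses xs:date.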

Parse Specific Elements DOM - Java

I believe this is a simple question, but I am having trouble finding out how it works.
This is the XML file (from www.w3schools.com):
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Edited by XMLSpy® -->
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="web" cover="paperback">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
As you can see, the book XQuery Kick Start has more than one author.
But I can't find a way to get the right number of authors.
This is my code:
public static void main(String argv[]) throws ParserConfigurationException, SAXException, IOException {
File fXmlFile = new File("\\books.xml"); // the backslash must be escaped; "\b" alone is a backspace character
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("book");
System.out.println("----------------------------");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
System.out.println("\nCurrent Element :" + nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("Category : " + eElement.getAttribute("category"));
System.out.println("Title : " + eElement.getElementsByTagName("title").item(0).getTextContent());
System.out.println("Author : " + eElement.getElementsByTagName("author").item(0).getTextContent());
System.out.println("Year : " + eElement.getElementsByTagName("year").item(0).getTextContent());
System.out.println("Price : " + eElement.getElementsByTagName("price").item(0).getTextContent());
}
}
}
But as a result I only get one author per book:
Root element :bookstore
----------------------------
Current Element :book
Category : cooking
Title : Everyday Italian
Author : Giada De Laurentiis
Year : 2005
Price : 30.00
Current Element :book
Category : children
Title : Harry Potter
Author : J K. Rowling
Year : 2005
Price : 29.99
Current Element :book
Category : web
Title : XQuery Kick Start
Author : James McGovern
Year : 2003
Price : 49.99
Current Element :book
Category : web
Title : Learning XML
Author : Erik T. Ray
Year : 2003
Price : 39.95
Does anyone know a good way to get the right number of elements?
Sorry about the long question; I didn't know how to express myself, so I had to paste everything here.
I'm new to DOM.
You always get only the first author because you retrieve just the first item of the NodeList:
getElementsByTagName("author").item(0)
Try iterating over the list, as there can be more than one author:
NodeList authors = eElement.getElementsByTagName("author");
for (int i = 0; i < authors.getLength(); i++) {
System.out.println("Author : " + authors.item(i).getTextContent());
}
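For completeness, here is a minimal self-contained sketch of the same idea that collects every author of every book into a list. The class and method names and the inline sample XML are assumptions made for illustration; only the getElementsByTagName loop comes from the answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question.
class AuthorsPerBook {

    // Collects the text of every <author> element below the given <book>.
    static List<String> authorsOf(Element book) {
        NodeList authors = book.getElementsByTagName("author");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < authors.getLength(); i++) {
            names.add(authors.item(i).getTextContent());
        }
        return names;
    }

    // Returns one list of author names per <book>, in document order.
    static List<List<String>> authorsPerBook(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList books = doc.getElementsByTagName("book");
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            // getElementsByTagName only returns elements, so the cast is safe.
            result.add(authorsOf((Element) books.item(i)));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample in the same shape as the bookstore file from the question.
        String xml = "<bookstore>"
                + "<book category=\"children\"><title>Harry Potter</title>"
                + "<author>J K. Rowling</author></book>"
                + "<book category=\"web\"><title>XQuery Kick Start</title>"
                + "<author>James McGovern</author><author>Per Bothner</author></book>"
                + "</bookstore>";
        for (List<String> authors : authorsPerBook(xml)) {
            System.out.println(authors.size() + " author(s): " + authors);
        }
    }
}
```

Fetching the NodeList once per book and looping up to getLength() avoids assuming that every book has exactly one author.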

Cutting a XML with JAXB

I have the following xml:
<Package>
<PackageHeader>
<name>External Vendor File</name>
<description>External vendor file for some purpose</description>
<version>3.141694baR3</version>
</PackageHeader>
<PackageBody>
<Characteristic id="1">
<Size>
<value>1.68</value>
<scale>Meters</scale>
<comment>Size can vary, depending on temperature</comment>
</Size>
<Weight>
<value>9</value>
<scale>M*Tons</scale>
<comment>His mama is so fat, we had to use another scale</comment>
</Weight>
<rating>
<ratingCompany>ISO</ratingCompany>
<rating:details xmlns:rating="http://www.w3schools.com/ratingDetails">
<rating:value companyDepartment="Finance">A</rating:value>
<rating:expirationDate update="1/12/2010">1/1/2014</rating:expirationDate>
<rating:comment userID="z94234">You're not Silvia.</rating:comment>
<rating:comment userID="r24942">You're one of the Kung-Fu Creatures On The Rampage</rating:comment>
<rating:comment userID="i77880">TWO!</rating:comment>
<rating:priority>3</rating:priority>
</rating:details>
</rating>
</Characteristic>
<Characteristic id="2">
<Size/>
<Weight/>
<rating/>
</Characteristic>
...
<Characteristic id="n"/>
</PackageBody>
</Package>
And the following Java code:
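import java.io.File;
import java.io.FileInputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
public class XMLTest {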
public static void main(String[] args) throws Exception {
Package currentPackage = new Package();
Package sourcePackage = new Package();
int totalCharacteristics;
PackageBody currentPackageBody = new PackageBody();
Characteristic currentCharacteristic = new Characteristic();
rating currentRating = new rating();
FileInputStream fis = new FileInputStream("sourceFile.xml");
JAXBContext myCurrentContext = JAXBContext.newInstance(Package.class);
Marshaller m = myCurrentContext.createMarshaller();
Unmarshaller um = myCurrentContext.createUnmarshaller();
sourcePackage = (Package)um.unmarshal(fis);
currentPackage.setPackageHeader(sourcePackage.getPackageHeader());
totalCharacteristics = sourcePackage.getPackageBody().getCharacteristics().size();
for (int i = 0; i < totalCharacteristics; i++)
{
currentRating = sourcePackage.getPackageBody().getCharacteristics().get(i).getrating();
}
currentCharacteristic.setrating(currentRating);
currentPackageBody.getCharacteristics().add(currentCharacteristic);
currentPackage.setPackageBody(currentPackageBody);
m.marshal(currentPackage, new File("targetFile.xml"));
fis.close();
}
}
Which gives me the following XML:
<Package>
<PackageHeader>
<name>External Vendor File</name>
<description>External vendor file for some purpose</description>
<version>3.141694baR3</version>
</PackageHeader>
<PackageBody>
<Characteristic id="1">
<rating>
<ratingCompany>ISO</ratingCompany>
<rating:details xmlns:rating="http://www.w3schools.com/ratingDetails">
<rating:value companyDepartment="Finance">A</rating:value>
<rating:expirationDate update="1/12/2010">1/1/2014</rating:expirationDate>
<rating:comment userID="z94234">You're not Silvia.</rating:comment>
<rating:comment userID="r24942">You're one of the Kung-Fu Creatures On The Rampage</rating:comment>
<rating:comment userID="i77880">TWO!</rating:comment>
<rating:priority>3</rating:priority>
</rating:details>
</rating>
</Characteristic>
<Characteristic id="2">
<rating/>
</Characteristic>
...
<Characteristic id="n"/>
</PackageBody>
</Package>
And this is what I need:
<Package>
<PackageHeader>
<name>External Vendor File</name>
<description>External vendor file for some purpose</description>
<version>3.141694baR3</version>
</PackageHeader>
<PackageBody>
<Characteristic>
<rating id="1">
<ratingCompany>ISO</ratingCompany>
<rating:details xmlns:rating="http://www.w3schools.com/ratingDetails">
<rating:comment userID="z94234">You're not Silvia.</rating:comment>
<rating:comment userID="r24942">You're one of the Kung-Fu Creatures On The Rampage</rating:comment>
<rating:comment userID="i77880">TWO!</rating:comment>
<rating:priority>3</rating:priority>
</rating:details>
</rating>
</Characteristic>
<Characteristic>
<rating id="2"/>
</Characteristic>
...
<Characteristic/>
</PackageBody>
</Package>
But I have a few questions:
How could I read a 4 GB file (for example, by reading it with StAX)?
If I want to filter some tags from source to target (as in the last XML), would I have to assign them one by one to the targetFile? Is there any iterator that would let me go through all subnodes and assign them?
If the sourceFile changes, would I need to rerun xjc and recompile the whole project?
Thanks.
For reading huge XML files, you definitely need a streaming parser like StAX. In addition, you can combine it with JAXB to selectively map a given piece of the XML to a Java object if you wish to work with it. You need to regenerate your JAXB classes only if your schema changes; there is no need to regenerate them if only your application code changes.
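A minimal sketch of the streaming part, using only the StAX cursor API from javax.xml.stream, might look like the following. The class name, the method and the "id=company" output format are assumptions for illustration; the element names are taken from the XML in the question.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original question.
class StaxCharacteristicScan {

    // Walks the document once, front to back, without building a tree in memory,
    // and records "id=ratingCompany" for each Characteristic that has a ratingCompany.
    static List<String> ratingCompanies(Reader source) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
        List<String> found = new ArrayList<>();
        String currentId = null;
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if ("Characteristic".equals(name)) {
                    // Remember the id of the enclosing Characteristic.
                    currentId = reader.getAttributeValue(null, "id");
                } else if ("ratingCompany".equals(name)) {
                    // getElementText reads the text and moves to the end tag.
                    found.add(currentId + "=" + reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return found;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Shortened sample in the same shape as the Package file from the question.
        String xml = "<Package><PackageBody>"
                + "<Characteristic id=\"1\"><Size><value>1.68</value></Size>"
                + "<rating><ratingCompany>ISO</ratingCompany></rating></Characteristic>"
                + "<Characteristic id=\"2\"><rating/></Characteristic>"
                + "</PackageBody></Package>";
        System.out.println(ratingCompanies(new StringReader(xml)));
    }
}
```

For a real multi-gigabyte file you would pass a FileReader or an InputStream and handle each match as it is found, rather than collecting everything in a list. To combine this with JAXB, one option is to stop at each rating start tag and hand the reader to Unmarshaller.unmarshal(XMLStreamReader, Class), which maps just that subtree to an object.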
