How can I count the elements inside an XML tag in Java?

For xml file view:
<?xml version="1.0"?>
<EXAMPLE DATE="20160830">
<SUB NUM="1">
<NAME>Peter</NAME>
</SUB>
<SUB NUM="2">
<NAME>Mary</NAME>
</SUB>
</EXAMPLE>
After setting up a NodeList to inspect the document,
I want to count the "NAME" tags inside each "SUB NUM="[x]"" element.
The code I have so far:
NodeList nList= doc.getElementsByTagName("NUM"); // doc has been set up correctly and parses successfully
nList.getLength() returns 2 because the XML has two tags named "NUM", but I want to count within each group separately.
Is there a way to get the counts per group, like this:
SUB NUM [1] Found: [1] Length with tag name: [NAME]
SUB NUM [2] Found: [1] Length with tag name: [NAME]

This can be done as follows. Only the printing of the HashMap uses Java 8 syntax; if you are not on Java 8, iterate over the map with a regular loop and print each entry.
import java.io.StringReader;
import java.util.HashMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class NumCountHandler extends DefaultHandler {
private HashMap<String, Integer> countOfNum = new HashMap<String, Integer>();
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("SUB")) {
String attributeNum = attributes.getValue("NUM");
// System.out.println("Here" + qName +"" + );
if (countOfNum.containsKey(attributeNum)) {
Integer count = countOfNum.get(attributeNum);
countOfNum.put(attributeNum, new Integer(count.intValue() + 1));
} else {
countOfNum.put(attributeNum, new Integer(1));
}
}
}
public static void main(String[] args) {
try {
String xml = "<EXAMPLE DATE=\"20160830\"> <SUB NUM=\"1\"> <NAME>Peter</NAME> </SUB> <SUB NUM=\"2\"> <NAME>Mary</NAME> </SUB></EXAMPLE>";
InputSource is = new InputSource(new StringReader(xml));
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
NumCountHandler userhandler = new NumCountHandler();
saxParser.parse(is, userhandler);
userhandler.countOfNum
.forEach((k, v) -> System.out.println("SUB NUM [" + k + "]" + "Length with tap name :[" + v + "]"));
} catch (Exception e) {
e.printStackTrace();
}
}
}
Prints
SUB NUM [1]Length with tap name :[1]
SUB NUM [2]Length with tap name :[1]
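
The handler above counts SUB elements per NUM value. If the goal is to count the NAME elements inside each SUB, a DOM-based variant is one option. The following is a minimal sketch, not a definitive implementation: the class name NameCountPerSub and the method countNames are made up for illustration, and it assumes each SUB carries a NUM attribute as in the sample document.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
public class NameCountPerSub {

    // Maps each SUB's NUM attribute to the number of NAME elements inside that SUB.
    static Map<String, Integer> countNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, Integer> counts = new LinkedHashMap<>();
        NodeList subs = doc.getElementsByTagName("SUB");
        for (int i = 0; i < subs.getLength(); i++) {
            Element sub = (Element) subs.item(i);
            // Calling getElementsByTagName on the element limits the search to its descendants.
            int names = sub.getElementsByTagName("NAME").getLength();
            // If two SUB elements share a NUM value, their counts are added together.
            counts.merge(sub.getAttribute("NUM"), names, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<EXAMPLE DATE=\"20160830\"><SUB NUM=\"1\"><NAME>Peter</NAME></SUB>"
                + "<SUB NUM=\"2\"><NAME>Mary</NAME></SUB></EXAMPLE>";
        countNames(xml).forEach((num, count) -> System.out.println(
                "SUB NUM [" + num + "] Found: [" + count + "] Length with tag name: [NAME]"));
    }
}
```

The key point is that getElementsByTagName is called on each SUB element rather than on the whole document, so the count is scoped to that group.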

Related

How to programmatically fix an XML document based on maxLength restrictions in the schema

How can I programmatically fix the content of an XML document to conform with the maxLength restrictions of its schema (in this case: cut the content to 10 characters if longer)?
This very similar question asks how to insert default values based on the schema (unfortunately the answer was not detailed enough for me).
The API documentation of ValidatorHandler says:
ValidatorHandler checks if the SAX events follow the set of
constraints described in the associated Schema, and additionally it
may modify the SAX events (for example by adding default values, etc.)
I looked at usages of Schema.newValidatorHandler() and ValidatorHandler.setContentHandler() on tabnine.com, but I couldn't find any examples that modify the input stream.
Example Schema:
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="a">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:maxLength value="10" />
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:schema>
Example XML document:
<?xml version="1.0" encoding="UTF-8" ?>
<a xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="schema.xsd">0123456789x</a>
Example validation error:
cvc-maxLength-valid: Value '0123456789x' with length = '11' is not facet-valid with respect to maxLength '10' for type '#AnonType_a'.
Current attempts (this code uses the javax.xml APIs, but I am open to any solution at all):
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import javax.xml.validation.ValidatorHandler;
public class Test {
public static void main(String[] args) throws Exception {
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(new File("schema.xsd"));
// validation
Validator validator = schema.newValidator();
validator.validate(new StreamSource(new File("document.xml")));
// modify stream while parsing?
ValidatorHandler validatorHandler = schema.newValidatorHandler();
validatorHandler.setErrorHandler(?);
validatorHandler.setContentHandler(?);
validatorHandler.setDocumentLocator(?);
SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
saxParser.parse(new File("document.xml"), ?); // only accepts DefaultHandler or HandlerBase
}
}

I managed to implement a solution based on Schema.newValidatorHandler(). Most of my time was lost on the fact that SAXParser.parse() only accepts a DefaultHandler. To plug in a custom ContentHandler, one has to use SAXParser.getXMLReader().setContentHandler().
I am aware that this proof of concept is not very robust, because it is parsing the validation error message to extract the maxLength schema information. So this solution is relying on a very specific SAX implementation.
I looked at schema aware XSLT transformation, but could not find any indication that the schema information can be accessed in the transformation expressions.
Writing my own specialized schema parser is still not completely off the table.
import java.io.IOException;
import java.io.StringReader;
import java.util.Map.Entry;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
public class TestFixMaxLength {
public static void main(String[] args) throws Exception {
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(new StreamSource(TestFixMaxLength.class.getResourceAsStream("schema.xsd")));
// validation on original input should fail
// schema.newValidator().validate(new StreamSource(TestFixMaxLength.class.getResourceAsStream("input.xml")));
CustomContentHandler customContentHandler = new CustomContentHandler();
ValidatorHandler validatorHandler = schema.newValidatorHandler();
validatorHandler.setContentHandler(customContentHandler);
validatorHandler.setErrorHandler(customContentHandler);
SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
saxParserFactory.setNamespaceAware(true);
SAXParser saxParser = saxParserFactory.newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
xmlReader.setContentHandler(validatorHandler);
xmlReader.parse(new InputSource(TestFixMaxLength.class.getResourceAsStream("input.xml")));
// not: saxParser.parse(TestFixMaxLength.class.getResourceAsStream("input.xml"), ???);
System.out.println();
System.out.println();
System.out.println(customContentHandler.m_outputBuilder.toString());
// validation on corrected input should pass
schema.newValidator().validate(new StreamSource(new StringReader(customContentHandler.m_outputBuilder.toString())));
}
/****************************************************************************************************************************************/
private static class CustomContentHandler extends DefaultHandler {
private StringBuilder m_outputBuilder = new StringBuilder();
private SortedMap<String, String> m_prefixMappings = new TreeMap<>();
private int m_lastValueLength = 0;
private Matcher m_matcher = Pattern.compile(
"cvc-maxLength-valid: Value '(.+?)' with length = '(.+?)' is not facet-valid with respect to maxLength '(.+?)' for type '(.+?)'.",
Pattern.CASE_INSENSITIVE | Pattern.DOTALL).matcher("");
@Override
public void error(SAXParseException e) throws SAXException {
if (e.getMessage().startsWith("cvc-maxLength-valid")) {
System.out.println("error: " + e);
m_matcher.reset(e.getMessage());
if (m_matcher.matches()) {
int maxLength = Integer.parseInt(m_matcher.group(3));
m_outputBuilder.setLength(m_outputBuilder.length() - m_lastValueLength + maxLength);
} else {
System.out.println("unexpected message format");
}
}
}
@Override
public void startDocument() throws SAXException {
System.out.println("startDocument");
}
@Override
public void endDocument() throws SAXException {
System.out.println("endDocument");
}
@Override
public void startPrefixMapping(String prefix, String uri) throws SAXException {
System.out.println("startPrefixMapping: prefix: " + prefix + ", uri: " + uri);
m_prefixMappings.put(prefix, uri);
}
@Override
public void endPrefixMapping(String prefix) throws SAXException {
System.out.println("endPrefixMapping: prefix: " + prefix);
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
System.out.println("startElement: uri: " + uri + ", localName: " + localName + ", qName: " + qName
+ ", attributes: " + attributes.getLength());
m_outputBuilder.append('<');
m_outputBuilder.append(qName);
for (int i = 0; i < attributes.getLength(); i++) {
m_outputBuilder.append(' ');
m_outputBuilder.append(attributes.getQName(i));
m_outputBuilder.append('=');
m_outputBuilder.append('\"');
m_outputBuilder.append(attributes.getValue(i));
m_outputBuilder.append('\"');
}
if (!m_prefixMappings.isEmpty()) {
for (Entry<String, String> mapping : m_prefixMappings.entrySet()) {
m_outputBuilder.append(" xmlns:");
m_outputBuilder.append(mapping.getKey());
m_outputBuilder.append('=');
m_outputBuilder.append('\"');
m_outputBuilder.append(mapping.getValue());
m_outputBuilder.append('\"');
}
m_prefixMappings.clear();
}
m_outputBuilder.append('>');
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
System.out.println("endElement: uri: " + uri + ", localName: " + localName + ", qName: " + qName);
m_outputBuilder.append('<');
m_outputBuilder.append('/');
m_outputBuilder.append(qName);
m_outputBuilder.append('>');
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
System.out.println(
"characters: '" + new String(ch, start, length) + "', start: " + start + ", length: " + length);
m_outputBuilder.append(ch, start, length);
m_lastValueLength = length;
}
@Override
public void skippedEntity(String name) throws SAXException {
System.out.println("skippedEntity: name: " + name);
}
@Override
public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
System.out.println("ignorableWhitespace: '" + new String(ch, start, length) + "', start: " + start
+ ", length: " + length);
m_outputBuilder.append(ch, start, length);
}
@Override
public void processingInstruction(String target, String data) throws SAXException {
System.out.println("processingInstruction: target: " + target + ", data: " + data);
}
@Override
public InputSource resolveEntity(String publicId, String systemId) throws IOException, SAXException {
System.out.println("resolveEntity");
return null;
}
}
}
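
As an alternative to parsing validator error messages, the maxLength values can be read from the schema file itself. The following is only a rough sketch of the "specialized schema parser" idea, under strong assumptions: it handles only xs:maxLength facets nested inside a named xs:element (as in the example schema), ignores named types and target namespaces, and measures length with String.length(). The class MaxLengthTruncator and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; covers only the simple inline-type case.
public class MaxLengthTruncator {

    private static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Maps element name -> maxLength for every xs:maxLength facet nested in a named xs:element.
    static Map<String, Integer> readMaxLengths(String xsd) throws Exception {
        Map<String, Integer> limits = new HashMap<>();
        NodeList facets = parse(xsd)
                .getElementsByTagNameNS(XMLConstants.W3C_XML_SCHEMA_NS_URI, "maxLength");
        for (int i = 0; i < facets.getLength(); i++) {
            Element facet = (Element) facets.item(i);
            // Walk up to the enclosing xs:element that declares the anonymous type.
            for (Node n = facet.getParentNode(); n instanceof Element; n = n.getParentNode()) {
                Element e = (Element) n;
                if ("element".equals(e.getLocalName()) && e.hasAttribute("name")) {
                    limits.put(e.getAttribute("name"), Integer.parseInt(facet.getAttribute("value")));
                    break;
                }
            }
        }
        return limits;
    }

    private static boolean hasChildElements(Element e) {
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    // Truncates the text of every leaf element whose local name has a known limit.
    static Document truncate(String xml, Map<String, Integer> limits) throws Exception {
        Document doc = parse(xml);
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            Integer limit = limits.get(e.getLocalName());
            if (limit == null || hasChildElements(e)) {
                continue;
            }
            String text = e.getTextContent();
            if (text.length() > limit) {
                e.setTextContent(text.substring(0, limit));
            }
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"><xs:element name=\"a\">"
                + "<xs:simpleType><xs:restriction base=\"xs:string\"><xs:maxLength value=\"10\"/>"
                + "</xs:restriction></xs:simpleType></xs:element></xs:schema>";
        Document fixed = truncate("<a>0123456789x</a>", readMaxLengths(xsd));
        System.out.println(fixed.getDocumentElement().getTextContent());
    }
}
```

This does not depend on the wording of a particular parser's error messages, but it only understands the small part of XML Schema described above; the corrected document should still be validated afterwards.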

JAVA code snippet to replace single quote(') to double quote in whole XML file

I have an XML file with nested tags. Either a DOM or a JDOM parser can be used.
I want to replace every single quote (') inside the text of all tags with a double quote, across the whole XML file. Tags can also be nested inside other tags. I want a loop that visits every tag and replaces the value, for example HYPER SHIPPING'SDN BHD_First_Page --> HYPER SHIPPING''SDN BHD_First_Page
Sample code
public void iterateChildNodes(org.jdom.Element parentNode) {
if(parentNode.getChildren().size() == 0) {
if(parentNode.getText().contains("'")) {
parentNode.setText(parentNode.getText().replaceAll("'", "\'"));
LOGGER.info("************* Below Value updated");
LOGGER.info(parentNode.getText());
}
}else {
List<Element> rec = parentNode.getChildren();
for(Element i : rec) {
iterateChildNodes(i);
}
}
}
Sample XML File
<Document>
<Identifier>DOC1</Identifier>
<Type>HYPER SHIPPING SDN BHD</Type>
<Description>HYPER SHIPPING SDN BHD</Description>
<Confidence>33.12</Confidence>
<ConfidenceThreshold>10.0</ConfidenceThreshold>
<Valid>true</Valid>
<Reviewed>true</Reviewed>
<ReviewedBy>SYSTEM</ReviewedBy>
<ValidatedBy>SYSTEM</ValidatedBy>
<ErrorMessage/>
<Value>HYPER SHIPPING'SDN BHD_First_Page</Value> //Value to be replaced here
<DocumentDisplayInfo/>
<DocumentLevelFields/>
<Pages>
<Page>
<Identifier>PG0</Identifier>
<OldFileName>HYPER-KL FEB-0001-0001.tif</OldFileName>
<NewFileName>BI2E7_0.tif</NewFileName>
<SourceFileID>1</SourceFileID>
<PageLevelFields>
<PageLevelField>
<Name>Search_Engine_Classification</Name>
<Value>Park Street '10 road</Value> //Value to be replaced here
<Type/>
<Confidence>66.23</Confidence>
<LearnedFileName>HYPER KL-JUN-0001.tif</LearnedFileName>
<OcrConfidenceThreshold>0.0</OcrConfidenceThreshold>
<OcrConfidence>0.0</OcrConfidence>
<FieldOrderNumber>0</FieldOrderNumber>
<ForceReview>false</ForceReview>
</PageLevelField>
</PageLevelFields>
</Page>
</Pages>
</Document>

This code replaces every ' with " in the text of an XML file.
No detailed description is given here; stepping through the code should make it clear.
(Updated)
Part 1: Using JDOM
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.input.SAXBuilder;
import org.jdom2.transform.JDOMSource;
public class XmlModificationJDom {
    public static void main(String[] args) {
        XmlModificationJDom xmlModificationJDom = new XmlModificationJDom();
        xmlModificationJDom.updateXmlAndSaveJDom();
    }
    public void updateXmlAndSaveJDom() {
        try {
            File inputFile = new File("document.xml");
            SAXBuilder saxBuilder = new SAXBuilder();
            Document xmlDocument = saxBuilder.build(inputFile);
            Element rootElement = xmlDocument.getRootElement();
            iterateAndUpdateElementsUsingJDom(rootElement);
            saveUpdatedXmlUsingJDomSource(xmlDocument);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
    // Recursively visits every element; only leaf elements have their text rewritten.
    public void iterateAndUpdateElementsUsingJDom(Element element) {
        if (element.getChildren().isEmpty()) {
            if (element.getText().contains("'")) {
                element.setText(element.getText().replace("'", "\""));
            }
        } else {
            for (Element childElement : element.getChildren()) {
                iterateAndUpdateElementsUsingJDom(childElement);
            }
        }
    }
    // This method was missing from the listing. The body is an assumption
    // reconstructed from its name and the output file name mentioned below.
    public void saveUpdatedXmlUsingJDomSource(Document xmlDocument) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(new JDOMSource(xmlDocument),
                new StreamResult(new File("updated-document-jdom.xml")));
    }
}
Part 2: Using DOM
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class XmlModificationDom {
public static void main(String[] args) {
XmlModificationDom xmlModificationDom = new XmlModificationDom();
xmlModificationDom.updateXmlAndSave();
}
public void updateXmlAndSave() {
try {
File inputFile = new File("document.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document document = dBuilder.parse(inputFile);
document.getDocumentElement().normalize();
Node parentNode = document.getFirstChild();
iterateChildNodesAndUpate(parentNode);
writeAndSaveXML(document);
} catch (Exception ex) {
ex.printStackTrace();
}
}
public void writeAndSaveXML(Document document) throws Exception {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(new File("updated-document.xml"));
transformer.transform(source, result);
}
public void iterateChildNodesAndUpate(Node parentNode) {
NodeList nodeList = parentNode.getChildNodes();
for (int index = 0; index < nodeList.getLength(); index++) {
Node node = nodeList.item(index);
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element element = (Element) node;
//System.out.print(element.getNodeName());
if (element.hasChildNodes() && element.getChildNodes().getLength() > 1) {
//System.out.println("Child > " + element.getNodeName());
iterateChildNodesAndUpate(element);
} else {
//System.out.println(" - " + element.getTextContent());
if (element.getTextContent().contains("'")) {
String str = element.getTextContent().replaceAll("\'", "\"");
element.setTextContent(str);
}
}
}
}
}
}
Input file document.xml:
<Document>
<Identifier>DOC1</Identifier>
<Type>HYPER SHIPPING SDN BHD</Type>
<Description>HYPER SHIPPING SDN BHD</Description>
<Confidence>33.12</Confidence>
<ConfidenceThreshold>10.0</ConfidenceThreshold>
<Valid>true</Valid>
<Reviewed>true</Reviewed>
<ReviewedBy>SYSTEM</ReviewedBy>
<ValidatedBy>SYSTEM</ValidatedBy>
<ErrorMessage/>
<Value>HYPER SHIPPING'SDN BHD_First_Page</Value> //Value to be replaced here
<DocumentDisplayInfo/>
<DocumentLevelFields/>
<Pages>
<Page>
<Identifier>PG0</Identifier>
<OldFileName>HYPER-KL FEB-0001-0001.tif</OldFileName>
<NewFileName>BI2E7_0.tif</NewFileName>
<SourceFileID>1</SourceFileID>
<PageLevelFields>
<PageLevelField>
<Name>Search_Engine_Classification</Name>
<Value>Park Street '10 road</Value> //Value to be replaced here
<Type/>
<Confidence>66.23</Confidence>
<LearnedFileName>HYPER KL-JUN-0001.tif</LearnedFileName>
<OcrConfidenceThreshold>0.0</OcrConfidenceThreshold>
<OcrConfidence>0.0</OcrConfidence>
<FieldOrderNumber>0</FieldOrderNumber>
<ForceReview>false</ForceReview>
</PageLevelField>
</PageLevelFields>
</Page>
</Pages>
</Document>
Output updated-document.xml/updated-document-jdom.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Document>
<Identifier>DOC1</Identifier>
<Type>HYPER SHIPPING SDN BHD</Type>
<Description>HYPER SHIPPING SDN BHD</Description>
<Confidence>33.12</Confidence>
<ConfidenceThreshold>10.0</ConfidenceThreshold>
<Valid>true</Valid>
<Reviewed>true</Reviewed>
<ReviewedBy>SYSTEM</ReviewedBy>
<ValidatedBy>SYSTEM</ValidatedBy>
<ErrorMessage/>
<Value>HYPER SHIPPING"SDN BHD_First_Page</Value><DocumentDisplayInfo/>
<DocumentLevelFields/>
<Pages>
<Page>
<Identifier>PG0</Identifier>
<OldFileName>HYPER-KL FEB-0001-0001.tif</OldFileName>
<NewFileName>BI2E7_0.tif</NewFileName>
<SourceFileID>1</SourceFileID>
<PageLevelFields>
<PageLevelField>
<Name>Search_Engine_Classification</Name>
<Value>Park Street "10 road</Value><Type/>
<Confidence>66.23</Confidence>
<LearnedFileName>HYPER KL-JUN-0001.tif</LearnedFileName>
<OcrConfidenceThreshold>0.0</OcrConfidenceThreshold>
<OcrConfidence>0.0</OcrConfidence>
<FieldOrderNumber>0</FieldOrderNumber>
<ForceReview>false</ForceReview>
</PageLevelField>
</PageLevelFields>
</Page>
</Pages>
</Document>

Inside a Java string literal the double quote must be escaped with a backslash (escaping the single quote is optional):
value =value.replace("\'","\"");

Just replace the removeQuote method with
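private static void removeQuote(Document batchXml) throws JDOMException, Exception {
    Element root = batchXml.getRootElement();
    List<Element> docs = root.getChild("Documents").getChildren("Document");
    for (Element doc : docs) {
        Element valueElement = doc.getChild("Value");
        if (valueElement != null) {
            // Write the changed text back; replacing on a local String alone has no effect on the document.
            valueElement.setText(valueElement.getText().replace("'", "\""));
        }
    }
}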

splitting Sitemap into more sitemaps if it has more than maxnumber of urls

I would like to split my Sitemap into Sitemaps, if it has more than maxURLs. The following example should split the Sitemap, if it has more than one url.
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.CharacterData;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
public class SiteMapSplitter {
public static void main(String[] args){
String sitemapStr = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n" +
"<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n" +
"<url>\n" +
"<loc>test1.html</loc>\n" +
"<lastmod>today</lastmod>\n" +
"<changefreq>daily</changefreq>\n" +
"<priority>1.0</priority>\n" +
"</url>\n" +
"<url>\n" +
"<loc>test2.html</loc>\n" +
"<lastmod>yesterday</lastmod>\n" +
"<changefreq>daily</changefreq>\n" +
"<priority>1.0</priority>\n" +
"</url></urlset>";
try {
splitSitemap(sitemapStr);
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
}
static private void splitSitemap(String sitemapStr) throws ParserConfigurationException {
DocumentBuilder db = null;
try {
db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(sitemapStr));
Document doc = null;
try {
doc = db.parse(is);
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
NodeList nodes = doc.getElementsByTagName("url");
int maxURLs = 1;
Set<String> smURLsSet= new HashSet<String>();
if (nodes.getLength()>maxURLs){
for (int i = 0; i < nodes.getLength(); i++) {
StringBuilder smURLsBuilder = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n" +
"<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
for (int k = 0; k<maxURLs; k++){
Element element = (Element) nodes.item(i);
smURLsBuilder.append(element);
}
smURLsSet.add(smURLsBuilder.toString());
}
Iterator i = smURLsSet.iterator();
while(i.hasNext()){
System.out.println(i.next());
}
}
}
}
The problem is that Element element = (Element) nodes.item(i); smURLsBuilder.append(element);
does not append the whole element (in this case the url and its children) to smURLsBuilder. How can I do this?

You should consider using an object oriented approach to the sitemap. Either with data binding (JAXB) or even shorter using data projection (Disclosure: I'm affiliated with that project). This way you do not need to create the XML by string concatenation.
public class SitemapSplitter {
static String sitemapStr = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n" +
"<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n" +
"<url>\n" +
"<loc>test1.html</loc>\n" +
"<lastmod>today</lastmod>\n" +
"<changefreq>daily</changefreq>\n" +
"<priority>1.0</priority>\n" +
"</url>\n" +
"<url>\n" +
"<loc>test2.html</loc>\n" +
"<lastmod>yesterday</lastmod>\n" +
"<changefreq>daily</changefreq>\n" +
"<priority>1.0</priority>\n" +
"</url></urlset>";
public interface Sitemap {
@XBWrite("/urlset/url")
Sitemap setUrls(List<? extends Node> urls);
}
public static void main(String... args) {
XBProjector projector = new XBProjector(Flags.TO_STRING_RENDERS_XML);
// Get all urls from existing sitemap.
List<Node> urlNodes = projector.onXMLString(sitemapStr).evalXPath("/xbdefaultns:urlset/xbdefaultns:url").asListOf(Node.class);
for (Node urlNode: urlNodes) {
// Create a new sitemap, here with only one url
Sitemap newSitemap = projector.onXMLString(sitemapStr).createProjection(Sitemap.class).setUrls(Collections.singletonList(urlNode));
System.out.println(newSitemap);
}
}
}
This program prints out
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>test1.html</loc>
<lastmod>today</lastmod>
<changefreq>daily</changefreq>
<priority>1.0</priority>
</url>
</urlset>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>test2.html</loc>
<lastmod>yesterday</lastmod>
<changefreq>daily</changefreq>
<priority>1.0</priority>
</url>
</urlset>
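
Staying with plain JAXP is also possible: appending an Element to a StringBuilder only calls its toString(), whereas a Transformer serializes the node together with its children. The sketch below is one way to do that, not the answerer's method; the class name DomSitemapSplitter and the method split are hypothetical, and it assumes the input uses the standard sitemap namespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
public class DomSitemapSplitter {

    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Splits the sitemap into documents that each hold at most maxUrls url elements.
    static List<String> split(String sitemapXml, int maxUrls) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document source = builder.parse(new InputSource(new StringReader(sitemapXml)));
        NodeList urls = source.getElementsByTagNameNS(SITEMAP_NS, "url");

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < urls.getLength(); start += maxUrls) {
            Document part = builder.newDocument();
            Element urlset = part.createElementNS(SITEMAP_NS, "urlset");
            part.appendChild(urlset);
            int end = Math.min(start + maxUrls, urls.getLength());
            for (int i = start; i < end; i++) {
                // importNode(node, true) deep-copies the url element with all of its children.
                urlset.appendChild(part.importNode(urls.item(i), true));
            }
            // The Transformer writes the complete subtree, unlike StringBuilder.append(element).
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(part), new StreamResult(out));
            parts.add(out.toString());
        }
        return parts;
    }

    public static void main(String[] args) throws Exception {
        String sitemap = "<urlset xmlns=\"" + SITEMAP_NS + "\">"
                + "<url><loc>test1.html</loc></url>"
                + "<url><loc>test2.html</loc></url></urlset>";
        for (String part : split(sitemap, 1)) {
            System.out.println(part);
        }
    }
}
```

Copying nodes into a fresh Document per chunk avoids building XML by string concatenation, so escaping and namespaces are left to the serializer.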

how to parse this XML using DOM and put its content in a hashtable?

I want two hash tables out of the following XML. The first one being (screen id,widget id)
and the second one being (widget id,string id).
I have been able to parse this XML using DOM but putting its content into a hash table is what I haven't done.
<?xml version="1.0" encoding="UTF-8"?>
<screen id="616699" name ="SCR_NEW_HOME">
<widget id="617259" type="label" name= "NEW_HOME_TITLE">
<attribute type = "Strings">
<val id="54">HOME_SYSSETUP</val>
</attribute>
</widget>
<widget id="616836" type = "label" name ="HOME_MENU">
<attribute type="Strings">
<val id="1815" >DAILY</val>
<val id="2060" >MONTH_NOV</val>
<val id="1221" >ASPECT_RATIO_PANSCAN</val>
</attribute>
</widget>
<screen id="1556" name="SCR_EVENTLIST">
<widget id="77009" type= "label" name="EL_GUIDE_EVENT_TABLE">
<attribute type ="Strings">
<val id="1">time</val>
<val id="2">date</val>
</attribute>
</widget>
<widget id="186461" type= "label" name= "EL_PIG_CONT">
<attribute type ="Strings">
<val id="3">progress bar</val>
<val id="4">video</val>
</attribute>
</widget>
and the code I have tried is
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Hashtable;
public class ReadXmlFile {
private static Hashtable<Integer,ArrayList<Integer>> D1 = new Hashtable<Integer, ArrayList<Integer>>();
private static Hashtable<Integer,ArrayList<Integer>> D2 = new Hashtable<Integer,ArrayList<Integer>>();
static Integer ScreenID;
static ArrayList<Integer> StringID;
static ArrayList<Integer> WidgetID;
static Integer WidgetID2;
public static void main(String argv[]) {
// try {
File fXmlFile = new File("E:/eclipse workspace/stringValidation/screens.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = null;
try {
dBuilder = dbFactory.newDocumentBuilder();
} catch (ParserConfigurationException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
Document doc = null;
try {
doc = dBuilder.parse(fXmlFile);
} catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
doc.getDocumentElement().normalize();
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
if(doc.hasChildNodes()){
printNote(doc.getChildNodes());
}
}
private static void printNote(NodeList nodeList) {
for (int count = 0; count < nodeList.getLength(); count++) {
Node tempNode = nodeList.item(count);
// make sure it's element node.
if (tempNode.getNodeType() == Node.ELEMENT_NODE) {
// get node name and value
System.out.println("\nNode Name =" + tempNode.getNodeName() + " [OPEN]");
System.out.println("Node Value =" + tempNode.getTextContent());
if (tempNode.hasAttributes()) {
// get attributes names and values
NamedNodeMap nodeMap = tempNode.getAttributes();
for (int i = 0; i < nodeMap.getLength(); i++) {
Node node = nodeMap.item(i);
System.out.println("attr name : " + node.getNodeName());
System.out.println("attr value : " + node.getNodeValue());
}
}
if (tempNode.hasChildNodes()) {
// loop again if has child nodes
printNote(tempNode.getChildNodes());
}
System.out.println("Node Name =" + tempNode.getNodeName() + " [CLOSE]");
}
}
}
}

By looking at your code I assume that you are able to print out the whole document, but not to locate the data you are looking for. I do things like that with XMLBeam. It lets you create an object oriented representation of the data you need, without having to follow the complete structure of the xml you are processing. Here is how to extract the data in your first xml file, the second is just as easy (and much shorter than walking through the DOM by hand):
public class TestFirst {
@XBDocURL("res://first.xml")
public interface Projection {
@XBRead("//screen")
List<Screen> getScreens();
}
public interface Widget {
@XBRead("./@id")
String getID();
@XBRead("./attribute[@type='Strings']/val")
List<String> getStringAttributes();
}
public interface Screen {
@XBRead("./@id")
String getID();
@XBRead("./widget")
List<Widget> getWidgets();
}
@Test
public void testFirst() throws IOException {
Projection projection = new XBProjector().io().fromURLAnnotation(Projection.class);
for (Screen screen:projection.getScreens()) {
for (Widget widget:screen.getWidgets()) {
for (String string:widget.getStringAttributes()) {
System.out.println(screen.getID()+" "+ widget.getID()+ " "+ string);
}
}
}
}
}
This prints out
616699 617259 HOME_SYSSETUP
616699 616836 DAILY
616699 616836 MONTH_NOV
616699 616836 ASPECT_RATIO_PANSCAN
Now you should be able to fill your Hashtable. (Please consider HashMap or ConcurrentHashMap instead.)
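
For comparison, the two tables can also be filled with the plain DOM API the question started from. This is a minimal sketch under stated assumptions: the class ScreenWidgetMaps and its field names are invented for illustration, every widget is assumed to be a direct child of its screen, and the sample in main wraps two closed screen elements in a hypothetical screens root so that the input is well-formed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
public class ScreenWidgetMaps {

    // screen id -> widget ids, and widget id -> val ids (the "string ids").
    final Map<String, List<String>> widgetsByScreen = new LinkedHashMap<>();
    final Map<String, List<String>> stringsByWidget = new LinkedHashMap<>();

    ScreenWidgetMaps(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList widgets = doc.getElementsByTagName("widget");
        for (int i = 0; i < widgets.getLength(); i++) {
            Element widget = (Element) widgets.item(i);
            String widgetId = widget.getAttribute("id");
            // Assumption: the enclosing screen is the widget's parent element.
            Element screen = (Element) widget.getParentNode();
            widgetsByScreen.computeIfAbsent(screen.getAttribute("id"), k -> new ArrayList<>())
                    .add(widgetId);
            List<String> stringIds = stringsByWidget.computeIfAbsent(widgetId, k -> new ArrayList<>());
            NodeList vals = widget.getElementsByTagName("val");
            for (int j = 0; j < vals.getLength(); j++) {
                stringIds.add(((Element) vals.item(j)).getAttribute("id"));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<screens>"
                + "<screen id=\"616699\"><widget id=\"617259\"><attribute type=\"Strings\">"
                + "<val id=\"54\">HOME_SYSSETUP</val></attribute></widget></screen>"
                + "<screen id=\"1556\"><widget id=\"77009\"><attribute type=\"Strings\">"
                + "<val id=\"1\">time</val><val id=\"2\">date</val></attribute></widget></screen>"
                + "</screens>";
        ScreenWidgetMaps maps = new ScreenWidgetMaps(xml);
        System.out.println(maps.widgetsByScreen);
        System.out.println(maps.stringsByWidget);
    }
}
```

Each map value is a list because one screen holds several widgets and one widget holds several val entries.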

Java - Reading XML file

I am trying to read some data from an XML file and am having some trouble. The XML I have is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<EmailSettings>
<recipient>test@test.com</recipient>
<sender>test2@test.com</sender>
<subject>Sales Query</subject>
<description>email body message</description>
</EmailSettings>
I am trying to read these values as strings into my Java program, I have written this code so far:
private static Document getDocument (String filename){
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setIgnoringComments(true);
factory.setIgnoringElementContentWhitespace(true);
factory.setValidating(false);
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse(new InputSource(filename));
}
catch (Exception e){
System.out.println("Error reading configuration file:");
System.out.println(e.getMessage());
}
return null;
}
Document doc = getDocument(configFileName);
Element config = doc.getDocumentElement();
I am struggling with reading in the actual string values though.

One of the possible implementations:
File file = new File("userdata.xml");
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.parse(file);
String usr = document.getElementsByTagName("user").item(0).getTextContent();
String pwd = document.getElementsByTagName("password").item(0).getTextContent();
when used with the XML content:
<credentials>
<user>testusr</user>
<password>testpwd</password>
</credentials>
results in "testusr" and "testpwd" getting assigned to the usr and pwd references above.
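
Applied to the EmailSettings file from the question, the same calls might look like the sketch below. The class name EmailSettingsReader and the helpers textOf and parseRoot are hypothetical, and the XML is inlined as a string only so the example is self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
public class EmailSettingsReader {

    // Returns the trimmed text of the first <tag> below parent, or null if there is none.
    static String textOf(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Parses the XML string and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<EmailSettings>"
                + "<recipient>test@test.com</recipient>"
                + "<sender>test2@test.com</sender>"
                + "<subject>Sales Query</subject>"
                + "<description>email body message</description>"
                + "</EmailSettings>";
        Element config = parseRoot(xml);
        System.out.println("recipient = " + textOf(config, "recipient"));
        System.out.println("sender = " + textOf(config, "sender"));
        System.out.println("subject = " + textOf(config, "subject"));
        System.out.println("description = " + textOf(config, "description"));
    }
}
```

Returning null for a missing tag keeps a lookup of an absent setting from throwing a NullPointerException on item(0).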

Reading xml the easy way:
http://www.mkyong.com/java/jaxb-hello-world-example/
package com.mkyong.core;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
@XmlRootElement
public class Customer {
String name;
int age;
int id;
public String getName() {
return name;
}
@XmlElement
public void setName(String name) {
this.name = name;
}
public int getAge() {
return age;
}
@XmlElement
public void setAge(int age) {
this.age = age;
}
public int getId() {
return id;
}
@XmlAttribute
public void setId(int id) {
this.id = id;
}
}
package com.mkyong.core;
import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
public class JAXBExample {
public static void main(String[] args) {
Customer customer = new Customer();
customer.setId(100);
customer.setName("mkyong");
customer.setAge(29);
try {
File file = new File("C:\\file.xml");
JAXBContext jaxbContext = JAXBContext.newInstance(Customer.class);
Marshaller jaxbMarshaller = jaxbContext.createMarshaller();
// output pretty printed
jaxbMarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
jaxbMarshaller.marshal(customer, file);
jaxbMarshaller.marshal(customer, System.out);
} catch (JAXBException e) {
e.printStackTrace();
}
}
}
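The javax.xml.bind package is no longer bundled with the JDK from Java 11 onwards, so the example above needs JAXB as a separate dependency there. As a dependency-free alternative for the reading direction, here is a rough sketch that uses plain DOM instead of JAXB. It assumes the XML has the shape the annotations above describe (a root element with an id attribute and name and age child elements). CustomerDomReader and CustomerData are hypothetical names for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical DOM-based reader; this swaps JAXB for plain DOM.
class CustomerDomReader {

    // Simple immutable holder mirroring the fields of the Customer class above.
    static final class CustomerData {
        final int id;
        final String name;
        final int age;

        CustomerData(int id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }
    }

    // Reads the id attribute and the name and age child elements of the root element.
    static CustomerData read(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            int id = Integer.parseInt(root.getAttribute("id").trim());
            String name = root.getElementsByTagName("name").item(0).getTextContent().trim();
            int age = Integer.parseInt(
                    root.getElementsByTagName("age").item(0).getTextContent().trim());
            return new CustomerData(id, name, age);
        } catch (Exception e) {
            // Covers malformed XML, missing elements and non-numeric values alike.
            throw new IllegalArgumentException("Could not read customer XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<customer id=\"100\"><age>29</age><name>mkyong</name></customer>";
        CustomerData customer = read(xml);
        System.out.println(customer.id + " " + customer.name + " " + customer.age);
    }
}
```

Unlike JAXB, this ties the element names to the code by hand, so it only pays off for small, stable documents.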

If using another library is an option, the following may be easier:
package for_so;
import java.io.File;
import rasmus_torkel.xml_basic.read.TagNode;
import rasmus_torkel.xml_basic.read.XmlReadOptions;
import rasmus_torkel.xml_basic.read.impl.XmlReader;
public class Q7704827_SimpleRead
{
public static void
main(String[] args)
{
String fileName = args[0];
TagNode emailNode = XmlReader.xmlFileToRoot(new File(fileName), "EmailSettings", XmlReadOptions.DEFAULT);
String recipient = emailNode.nextTextFieldE("recipient");
String sender = emailNode.nextTextFieldE("sender");
String subject = emailNode.nextTextFieldE("subject");
String description = emailNode.nextTextFieldE("description");
emailNode.verifyNoMoreChildren();
System.out.println("recipient = " + recipient);
System.out.println("sender = " + sender);
System.out.println("subject = " + subject);
System.out.println("description = " + description);
}
}
The library and its documentation are at rasmustorkel.com

Avoid hardcoding; try to make the code dynamic. The code below is meant to work for any XML. I have used a SAX parser, but you can use DOM or XPath; it's up to you.
I store all the tag names and values in a map, after which it is easy to retrieve any value you want. I hope this helps.
SAMPLE XML:
<parent>
<child >
<child1> value 1 </child1>
<child2> value 2 </child2>
<child3> value 3 </child3>
</child>
<child >
<child4> value 4 </child4>
<child5> value 5</child5>
<child6> value 6 </child6>
</child>
</parent>
JAVA CODE:
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class saxParser {
static Map<String,String> tmpAtrb=null;
static Map<String,String> xmlVal= new LinkedHashMap<String, String>();
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, VerifyError {
/**
* We can pass the class name of the XML parser
* to the SAXParserFactory.newInstance().
*/
//SAXParserFactory saxDoc = SAXParserFactory.newInstance("com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl", null);
SAXParserFactory saxDoc = SAXParserFactory.newInstance();
SAXParser saxParser = saxDoc.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
String tmpElementName = null;
String tmpElementValue = null;
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
tmpElementValue = "";
tmpElementName = qName;
tmpAtrb=new HashMap<String, String>();
//System.out.println("Start Element :" + qName);
/**
* Store attributes in HashMap
*/
for (int i=0; i<attributes.getLength(); i++) {
String aname = attributes.getLocalName(i);
String value = attributes.getValue(i);
tmpAtrb.put(aname, value);
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
if(tmpElementName.equals(qName)){
System.out.println("Element Name :"+tmpElementName);
/**
* Retrieve attributes from HashMap
*/
for (Map.Entry<String, String> entrySet : tmpAtrb.entrySet()) {
System.out.println("Attribute Name :"+ entrySet.getKey() + "Attribute Value :"+ entrySet.getValue());
}
System.out.println("Element Value :"+tmpElementValue);
xmlVal.put(tmpElementName, tmpElementValue);
System.out.println(xmlVal);
//Fetching The Values From The Map
String getKeyValues=xmlVal.get(tmpElementName);
System.out.println("XmlTag:"+tmpElementName+":::::"+"ValueFetchedFromTheMap:"+getKeyValues);
}
}
@Override
public void characters(char ch[], int start, int length) throws SAXException {
// characters() may deliver one text node in several chunks, so append instead of overwriting
tmpElementValue += new String(ch, start, length);
}
};
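/**
* Alternative using the SAX 2.0 XMLReader interface (needs imports for
* org.xml.sax.XMLReader and org.xml.sax.InputSource). If these lines
* are used, the last line is not needed.
*/
//XMLReader reader = saxParser.getXMLReader();
//reader.setContentHandler(handler);
//reader.parse(new InputSource("c:/file.xml"));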
saxParser.parse(new File("D:/Test _ XML/file.xml"), handler);
}
}
OUTPUT:
Element Name :child1
Element Value : value 1
XmlTag:<child1>:::::ValueFetchedFromTheMap: value 1
Element Name :child2
Element Value : value 2
XmlTag:<child2>:::::ValueFetchedFromTheMap: value 2
Element Name :child3
Element Value : value 3
XmlTag:<child3>:::::ValueFetchedFromTheMap: value 3
Element Name :child4
Element Value : value 4
XmlTag:<child4>:::::ValueFetchedFromTheMap: value 4
Element Name :child5
Element Value : value 5
XmlTag:<child5>:::::ValueFetchedFromTheMap: value 5
Element Name :child6
Element Value : value 6
XmlTag:<child6>:::::ValueFetchedFromTheMap: value 6
Values Inside The Map:{child1= value 1 , child2= value 2 , child3= value 3 , child4= value 4 , child5= value 5, child6= value 6 }
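For comparison, here is a more compact sketch of the same idea: collect the text of every leaf element into a map keyed by tag name. It is not the code above, just one possible variant; the class name LeafTextCollector and the method collect are made up for illustration. It accumulates text in a StringBuilder because characters() may be called more than once per element, and it parses from a String so it runs without a file:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: maps each leaf element's tag name to its trimmed text.
class LeafTextCollector extends DefaultHandler {
    private final Map<String, String> values = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    // True while the most recently opened element has had no child elements.
    private boolean leaf;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start collecting text for the new element
        leaf = true;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (leaf) {
            values.put(qName, text.toString().trim());
        }
        leaf = false; // the enclosing element has children, so it is not a leaf
    }

    // Parses the given XML and returns tag name -> text for all leaf elements.
    static Map<String, String> collect(String xml) {
        try {
            LeafTextCollector handler = new LeafTextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<parent><child><child1> value 1 </child1>"
                + "<child2> value 2 </child2></child></parent>";
        System.out.println(collect(xml)); // prints {child1=value 1, child2=value 2}
    }
}
```

As with the original, a map keyed by tag name keeps only the last value when the same leaf tag occurs more than once; a Map<String, List<String>> would be needed to keep them all.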
