Java XML pull data on specific node

Java XML pull data on specific node - java

I am trying to pull all data by searching for a specific node. My code below can only print out all the nodes but what if I only want to pull information on pantone 101 for example and print out all of the other nodes such as the colors within that specific pantone 101 node. Here is the code I've created to print out the XML data, how can I edit this to only print a specific node. Thanks!
<inventory>
<Product pantone="100" blue="7.4" red="35" green="24"> </Product>
<Product pantone="101" blue="5.4" red="3" rubine="35" purple="24"> </Product>
<Product pantone="102" orange="5.4" purple="35" white="24"> </Product>
<Product pantone="103" orange="5.4" purple="35" white="24"> </Product>
<Product pantone="104" orange="5.4" purple="35" white="24"> </Product>
<Product pantone="105" orange="5.4" purple="35" white="24"> </Product>
<Product pantone="106" black="5.4" rubine="35" white="24" purple="35" orange="5.4"> </Product>
</inventory>
//
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;
public class ReadXML {
public static void main(String args[]) throws Exception {
DocumentBuilderFactory buildFactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder dBuilder = buildFactory.newDocumentBuilder();
Document document = dBuilder.parse(ReadXML.class.getResourceAsStream("data.xml"));
document.normalize();
//get main node
NodeList rootNodes = document.getElementsByTagName("inventory");
Node rootNode = rootNodes.item(0);
Element rootElement = (Element) rootNode;
//print all with specific tag
NodeList inventoryList = rootElement.getElementsByTagName("Product");
for(int i = 0; i < inventoryList.getLength(); i++){
Node pantone = inventoryList.item(i);
Element pantoneElement = (Element) pantone;
//remove blank elements
System.out.println("Pantone: " + pantoneElement.getAttribute("pantone")); // print attribute
System.out.println("Blue: " + pantoneElement.getAttribute("blue"));
System.out.println("Red: " + pantoneElement.getAttribute("red"));
System.out.println("Orange: " + pantoneElement.getAttribute("orange"));
System.out.println("White: " + pantoneElement.getAttribute("white"));
System.out.println("Purple: " + pantoneElement.getAttribute("purple"));
System.out.println("Green: " + pantoneElement.getAttribute("green"));
System.out.println("Black: " + pantoneElement.getAttribute("black"));
System.out.println("Rubine: " + pantoneElement.getAttribute("rubine"));
}
} catch (ParserConfigurationException | SAXException | IOException e) {
}
}
}

You can use XPath to get data from XML document using more advanced criteria. For example, the following is XPath expression that will get <Product> element having pantone attribute equals 101 :
/inventory/Product[#pantone=101]
alternatively, just don't mention /inventory in the XPath if you plan to call the XPath on <inventory> element as the context :
Product[#pantone=101]
I'm not familiar with Java, but this post should give you a good hint : How to read XML using XPath in Java

Related

Java XML Document XPath with Disable Escaping

Currently, I need to get the element of XML without escaping.
For example:
<?xml version="1.0" encoding="UTF-8"?>
<Message>
<Header>H001</Header>
<Body>
<Item>ABC&amp;ABC&quot;</Item>
</Body>
</Message>
I need to get the value of "Item" element via XPath.
However, it is escaped automatically.
My Result = ABC&ABC"
Expected = ABC&amp;ABC&quot;
How can I get the expected value?

XPath will always return the values of nodes that result from XML parsing. The string value of the Item element in your XML, after parsing, is ABC&ABC", so that's what XPath gives you. If you want ABC&amp;ABC&quot; then you will have to reverse the action of the XML parser - this is known as serialization. Parsing "unescapes" entity and character references (it turns & into &). Serialization escapes special characters such as "&" (it turns & into &).

Put content surrounded by CDATA.
Note: Charater data (CDATA) will tell the parser to send the text as regular text (no markup) without parsing.
For example :
abc.xml
<?xml version="1.0" encoding="UTF-8"?>
<Messages>
<Message>
<Header>H001</Header>
<Body>
<Item><![CDATA[ABC&&ABC&quot;]]></Item>
</Body>
</Message>
</Messages>
Java code :
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class Test {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputStream input = Thread.currentThread().getContextClassLoader().getResourceAsStream("abc.xml");
Document doc = builder.parse(input);
doc.getDocumentElement().normalize();
NodeList list = doc.getElementsByTagName("Message");
for (int i = 0; i < list.getLength(); i++) {
Node node = list.item(i);
NodeList children = node.getChildNodes();
for (int j = 0; j < children.getLength(); j++) {
node = children.item(j);
System.out.println(node.getTextContent().trim());
}
}
}
}
Output :
H001
ABC&&ABC&quot;

How can i send that soap api response some fields into db

How can I parse and send data fields to db using java. I need code for store data to db. It would require any extra dependencies.
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soap:Body>
<WS_FulfillmentResponse xmlns="https://imfawstest.codemantra.com/AIDC/">
<WS_FulfillmentResult>
<Date_Generated>12/2/2019 2:24:03 AM</Date_Generated>
<product>
<PUB_ID>25905</PUB_ID>
<LOG_NO>46725</LOG_NO>
<PIN_NO>94389</PIN_NO>
<SERIES_CODE>001</SERIES_CODE>
<SERIES_DESC>IMF Working Papers</SERIES_DESC>
<TITLE><![CDATA[cM_Safe and Wholesome Food - Nordic Reflections]]></TITLE>
<SUBTITLE />
<PUBL_SERIES_VOLNO>Working Paper No. 12/596</PUBL_SERIES_VOLNO>
<LANGUAGE_ID>4</LANGUAGE_ID>
<EDITION />
<PRC_STATUS_ID>11</PRC_STATUS_ID>
<PRC_STATUS_CODE>40</PRC_STATUS_CODE>
<Process_Status_Desc>Key' metadata required for promotion is entered by now (Project ID & description, Stock No.)</Process_Status_Desc>
<PUBLISHED_DATE />
<REVISION_DATE />
<EST_COMPL_DATE />
<ISBN>9781498302258</ISBN>
<ISSN_ID>152</ISSN_ID>
<COPYRIGHT_YR>2019</COPYRIGHT_YR>
<DOI>10.5089/9781498302258.001</DOI>
<Persistent_Link>https://elibrary.imf.org/view/IMF001/25905-9781498302258/25905-9781498302258/25905-9781498302258.xml</Persistent_Link>
<SKU>802258</SKU>
<DRAFT>0</DRAFT>
<FRONT_MATTERS>0</FRONT_MATTERS>
<MAIN>0</MAIN>
<Size>Small</Size>
<TOTAL>0</TOTAL>
<ROMAN_ARABIC />
<Dept_Phone>37779</Dept_Phone>
<Dept_Email>JLI2</Dept_Email>
<Dept_Contact_ID>22382</Dept_Contact_ID>
<Dept_Contact_Name />
<EXR_Editor_Name>Jim Beardow</EXR_Editor_Name>
<EXR_Editor_ID>45</EXR_Editor_ID>
<EXR_Editor_Phone>37899</EXR_Editor_Phone>
<EXR_Editor_Email>jbeardow#imf.org</EXR_Editor_Email>
<OutsidePublisher_Id>0</OutsidePublisher_Id>
<OutsidePublisher_Name />
<OutsidePublisher_Email />
<OutsidePublisher_Phone />
<OutsidePublisher_Address />
<PUBL_MANUS_EXPT_DATE>10/31/2019</PUBL_MANUS_EXPT_DATE>
<PUBL_DATE_TO_GRAPHICS />
<PUBL_DATE_TRANSLATION_RECD />
<PUBL_DATE_SENT_TRANSLATION>10/31/2019</PUBL_DATE_SENT_TRANSLATION>
<PUBL_MANUS_RCVD_DATE>10/31/2019</PUBL_MANUS_RCVD_DATE>
<PUBL_ASG_EDITOR_DATE>10/31/2019</PUBL_ASG_EDITOR_DATE>
<FIRST_RECIEPT_DATE>10/31/2019</FIRST_RECIEPT_DATE>
<PUBL_OUT_OF_PRINT_DATE />
<PRIORITY>Medium</PRIORITY>
<PUBL_DEPT_SUMMARY />
<PUBL_NOTE_CORESP />
<PUBL_INTERNAL_REMARKS />
above is sample format how to parse and take some fields name and isbn,price some more send to db how.

You can extract each data using inbuilt Java API for XML Processing called DOM XML Parser. It turns the XML file into DOM or Tree structure, and you have to traverse a node by node to get what you want. Here is an example that shows how to do it,
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
try {
// Creating a constructor of file class and parsing an XML file
File file = new File("XMLFile.xml");
// An instance of factory that gives a document builder
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// An instance of builder to parse the specified XML file
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
doc.getDocumentElement().normalize();
System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
NodeList nodeList = doc.getElementsByTagName("product");
// NodeList is not iterable, so we are using for loop
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
System.out.println("\nNode Name :" + node.getNodeName());
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) node;
System.out.println("PUB ID: " + eElement.getElementsByTagName("PUB_ID").item(0).getTextContent());
System.out.println("LOG NO: " + eElement.getElementsByTagName("LOG_NO").item(0).getTextContent());
System.out.println("PIN NO: " + eElement.getElementsByTagName("PIN_NO").item(0).getTextContent());
System.out.println("SERIES CODE: " + eElement.getElementsByTagName("SERIES_CODE").item(0).getTextContent());
System.out.println("SERIES DESC: " + eElement.getElementsByTagName("SERIES_DESC").item(0).getTextContent());
}
}
} catch (SAXException | IOException | ParserConfigurationException ex) {
Logger.getLogger(YourClass.class.getName()).log(Level.SEVERE, null, ex);
}
And then you can code an insert query to send data to your database.
Hope this helps you!

is there any way other than using Xpath for this?

hello guys i'am writing this program:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
public class DOMbooks {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
File file = new File("books-fixed.xml");
Document doc = docBuilder.parse(file);
NodeList list = doc.getElementsByTagName("*");
int bookCounter = 1;
for (int i = 1; i < list.getLength(); i++) {
Element element = (Element)list.item(i);
String nodeName = element.getNodeName();
if (nodeName.equals("book")) {
bookCounter++;
System.out.println("BOOK " + bookCounter);
String isbn = element.getAttribute("sequence");
System.out.println("\tsequence:\t" + isbn);
}
else if (nodeName.equals("author")) {
System.out.println("\tAuthor:\t" + element.getChildNodes().item(0).getNodeValue());
}
else if (nodeName.equals("title")) {
System.out.println("\tTitle:\t" + element.getChildNodes().item(0).getNodeValue());
}
else if (nodeName.equals("publishYear")) {
System.out.println("\tpublishYear:\t" + element.getChildNodes().item(0).getNodeValue());
}
else if (nodeName.equals("genre")) {
System.out.println("\tgenre:\t" + element.getChildNodes().item(0).getNodeValue());
}
}
}
}
i want to print all the data about the "Science Fiction" books.. i know i should use Xpath but it's stuck, with too much errors... any suggestions?
assuming that i have this table and i only want to select science fiction books with all their info
<book sequence="5">
<title>Aftershock</title>
<auther>Robert B. Reich</auther>
<publishYear>2010</publishYear>
<genre>Economics</genre>
</book>
- <book sequence="6">
<title>The Time Machine</title>
<auther>H.G. Wells</auther>
<publishYear>1895</publishYear>
<genre>Science Fiction</genre>
assuming i have this table i only want to print the Science Fiction books with all their info...

i want to print all the data about the "Science Fiction" books.. i know i should use Xpath but it's stuck,
I assume you'd mean that you want all the books for which genre == "Science Fiction", right? In that case, XPath is really much simpler than whatever you were trying in Java (you don't show the root note, so I'll start with '//', which selects at any depth):
//book[genre = 'Science Fiction']
XSLT approach to simplify things
Now, having another look at your code, it looks like you want to print each and every element, including the element's name. This is more trivially done in XSLT:
<!-- every XSLT 1.0 must start like this -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- you want text -->
<xsl:output method="text" />
<!-- match any science fiction book (your primary goal) -->
<xsl:template match="book[genre = 'Science Fiction']">
<xsl:text>BOOK </xsl:text>
<xsl:value-of select="position()" />
<!-- send the children and attribute to be processed by templates -->
<xsl:apply-templates select="#sequence | *" />
</xsl:template>
<!-- "catch" any elements or attributes under <book> -->
<xsl:template match="book/* | book/#*">
<!-- a newline and a tab per line-->
<xsl:text>
</xsl:text>
<!-- and the name of the element or attribute -->
<xsl:value-of select="local-name()" />
<!-- another tab, plus contents of the element or attribute -->
<xsl:text> </xsl:text>
<xsl:value-of select="." />
</xsl:template>
<!-- make sure that other values are ignored, but process children -->
<xsl:template match="node()">
<xsl:apply-templates />
</xsl:template>
</xsl:stylesheet>
You can use this code, which is significantly shorter (if you ignore the comments and whitespace) and (arguably, once you get the hang of it) more readable than your original code. To use it:
Store it as books.xsl
Then, simply use this (copied and changed from here):
import javax.xml.transform.*;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
public class TestMain {
public static void main(String[] args) throws IOException, URISyntaxException, TransformerException {
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(new File("books.xsl"));
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new File("books-fixed.xml"));
transformer.transform(text, new StreamResult(new File("output.txt")));
}
}
XPath 2.0
If you can use Saxon in Java, the above becomes a one-liner with XPath 2.0 and you don't even need XSLT:
for $book in //book[genre = 'Science Fiction']
return (
'BOOK',
count(//book[genre = 'Science Fiction'][. << $book]) + 1,
for $tag in $book/(#sequence | *)
return $tag/local-name(), ':', string($tag)
)

Extract a block of XML into another XML file in java

I have a XML file called word.xml containing
<A>
<Answer>How was you day</Answer>
<Question>Happy day </Question>
<Biased> good morning </Biased>
<abc>..............</abc>
.
. // few more tags here
.
</A>
Now i want to extract another XML file called word1.xml containing part of word1.xml
<A>
<Answer>How was you day</Answer>
<Question>Happy day </Question>
</A>
Java Code which I tried so far
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class ReadXMLFile {
public static void main(String args[]) {
try {
File stocks = new File("word.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(stocks);
doc.getDocumentElement().normalize();
System.out.println("root of xml file" + doc.getDocumentElement().getNodeName());
NodeList nodes = doc.getElementsByTagName("A");
System.out.println("==========================");
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
System.out.println("i value---"+i);
System.out.println(nodes.getLength());
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element element = (Element) node;
System.out.println(element.getTextContent());
//element.getElementsByTagName(name)
File statText = new File(i+".txt");
FileOutputStream is = new FileOutputStream(statText);
OutputStreamWriter osw = new OutputStreamWriter(is);
Writer w = new BufferedWriter(osw);
w.write("<Answer>");
w.write(element.getElementsByTagName("Answer").item(0).getTextContent());
w.write("</Answer>");
w.write("Question");
w.write(element.getElementsByTagName("Question").item(0).getTextContent());
w.write("</Question>");
w.close();
}
}
}
catch (Exception ex) {
ex.printStackTrace();
}
private static String getValue(String tag, Element element) {
NodeList nodes = element.getElementsByTagName(tag).item(0).getChildNodes();
Node node = (Node) nodes.item(0);
return node.getNodeValue();
}
}
}
I just want to include tags in my results. This is the DIRTY way of doing. Can you please suggest me the best way.Need help. Thanks in advance.

If Java is not a mandatory constraint here, you can achieve this by using XSLT. It's pretty easy to follow. You can find some guidance here: Link
An example of my own practice:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//title">
<article>
<title>
<xsl:value-of select="./name/>
<xsl:text> : </xsl:text>
<xsl:value-of select = "./number/>
</title>
<references>
<xsl:value-of select = "reference"/>
</references>
</article>
</xsl:for-each>
</xsl:template>
Hope it helps!

Just like BeginnerJava explained XSL is the most appropriate technology here as you are transforming one XML tree to another XML tree and XSL is meant for that.
in XSL the code needed to achieve what you describe would be (I skipped some bits):
<xsl:template match="A">
<xsl:copy>
<xsl:apply-templates select="Answer|Question"/>
</xsl:copy>
</xsl:template>
You can invoke XSL transfomation from you Java code or from command line like this:
java net.sf.saxon.Transform [options] source-document stylesheet [ params…]

Parse the xml into DOM using a DocumentParser. remove the elements that you don't need from the resulting Document. write the modified Document to a new File using a Transformer. (note, the details for each of these steps can be found in any of the thousands of java xml tutorials online).

How to iterate through the org.w3c.dom.NodeList using jython?

I'm using org.w3c.dom to process some xml documents. And I'm using jython 2.5.1 to implement it.
Part of my xml document (EmployeeInfo.xml) is like:
<employees>
<employee id="1">
<name>ABC</name>
<title>Software Engineer</title>
</employee>
<employee id="2">
<name>DEF</name>
<title>Systems Engineer</title>
</employee>
<employee id="3">
<name>GHI</name>
<title>QA Engineer</title>
</employee>
......
</employees>
And my jython code for reading in and parsing xml is like:
import sys, logging
logging.basicConfig(level=logging.INFO)
from java.io import File
from javax.xml.parsers import DocumentBuilder
from javax.xml.parsers import DocumentBuilderFactory
from org.w3c.dom import Document
from org.w3c.dom import Element
from org.w3c.dom import Node
from org.w3c.dom import NodeList
// ... some code
file = "C:/Users/Adminstrator/Doc/EmployeeInfo.xml"
doc = File(file)
if doc.exists():
docFactory = DocumentBuilderFactory.newInstance()
docFactory.setNamespaceAware(True)
docBuilder = docFactory.newDocumentBuilder()
if doc.endswith(".xml"):
logging.info(" -- Reading " + doc)
employeeDoc = docBuilder.parse(doc)
if employeeDoc != None:
employees = employeeDoc.getElementsByTagNameNS("*","employee")
if employees != None:
for employee in employees:
logging.info(employee.getChildNodes().getLength())
else:
logging.warn("Failed to get the employee from " + doc)
else:
logging.warn("Failed to parse the document " + doc)
else:
logging.warn("Failed to find the specified document" + doc + ", please check the path!")
When I ran this script, there was an error:
TypeError: 'org.apache.xerces.dom.DeepNodeListImpl' object is not iterable
referring to the line:
for employee in employees:
It seems like it automatically treat the 'employees' as the jython's NodeList rather than org.w3c.dom.NodeList...
I searched online regarding this issue, but I've got little regarding this issue...Could anyone here help me with this? Thanks in advance!

I used the while loop to replace the for loop because it's rare to use the for(int i=0; i
So I used:
i = 0
while i < employees.getLength:
employee = employees.item(i)
i = i + 1
....

org.apache.xerces.dom.DeepNodeListImpl
This should work:
for (int i; i < employees.getLength(); i++) {
Node employee = employees.item(i);
....
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Java XML pull data on specific node - java

Related

Java XML Document XPath with Disable Escaping

How can i send that soap api response some fields into db

is there any way other than using Xpath for this?

Extract a block of XML into another XML file in java

How to iterate through the org.w3c.dom.NodeList using jython?

Categories

Resources