Extract a block of XML into another XML file in java

I have an XML file called word.xml containing:
<A>
<Answer>How was you day</Answer>
<Question>Happy day </Question>
<Biased> good morning </Biased>
<abc>..............</abc>
.
. // few more tags here
.
</A>
Now I want to extract part of word.xml into another XML file called word1.xml, containing:
<A>
<Answer>How was you day</Answer>
<Question>Happy day </Question>
</A>
The Java code I have tried so far:
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class ReadXMLFile {
public static void main(String args[]) {
try {
File stocks = new File("word.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(stocks);
doc.getDocumentElement().normalize();
System.out.println("root of xml file" + doc.getDocumentElement().getNodeName());
NodeList nodes = doc.getElementsByTagName("A");
System.out.println("==========================");
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
System.out.println("i value---"+i);
System.out.println(nodes.getLength());
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element element = (Element) node;
System.out.println(element.getTextContent());
//element.getElementsByTagName(name)
File statText = new File(i+".txt");
FileOutputStream is = new FileOutputStream(statText);
OutputStreamWriter osw = new OutputStreamWriter(is);
Writer w = new BufferedWriter(osw);
w.write("<Answer>");
w.write(element.getElementsByTagName("Answer").item(0).getTextContent());
w.write("</Answer>");
w.write("<Question>");
w.write(element.getElementsByTagName("Question").item(0).getTextContent());
w.write("</Question>");
w.close();
}
}
}
catch (Exception ex) {
ex.printStackTrace();
}
}
// Returns the value of the first child node of the first <tag> element under the given element.
private static String getValue(String tag, Element element) {
NodeList nodes = element.getElementsByTagName(tag).item(0).getChildNodes();
Node node = (Node) nodes.item(0);
return node.getNodeValue();
}
}
I just want to include the tags in my results. This is the DIRTY way of doing it. Can you please suggest the best way? Thanks in advance.

If Java is not a mandatory constraint here, you can achieve this with XSLT, which is fairly easy to follow.
An example from my own practice:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//title">
<article>
<title>
<xsl:value-of select="./name"/>
<xsl:text> : </xsl:text>
<xsl:value-of select="./number"/>
</title>
<references>
<xsl:value-of select = "reference"/>
</references>
</article>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Hope it helps!

Just like BeginnerJava explained, XSL is the most appropriate technology here, since you are transforming one XML tree into another XML tree and that is what XSL is meant for.
In XSL, the code needed to achieve what you describe would be (I skipped some bits):
<xsl:template match="A">
<xsl:copy>
<xsl:apply-templates select="Answer|Question"/>
</xsl:copy>
</xsl:template>
You can invoke the XSL transformation from your Java code or from the command line like this:
java net.sf.saxon.Transform [options] source-document stylesheet [ params…]
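Invoking the transformation from Java can be sketched with the JDK's built-in javax.xml.transform API, with no Saxon required. This is a minimal sketch under some assumptions: the class name ExtractWithXslt and the method extract are made up for illustration, the stylesheet is held in a string, and it uses xsl:copy-of instead of xsl:apply-templates so that it works without the "skipped bits" (an identity template).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ExtractWithXslt {
    // Hypothetical stylesheet: copy <A>, keeping only its <Answer> and <Question> children.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"A\">"
      + "<xsl:copy><xsl:copy-of select=\"Answer|Question\"/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML text and returns the result as a string.
    static String extract(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<A><Answer>How was your day</Answer><Question>Happy day</Question>"
                   + "<Biased>good morning</Biased></A>";
        System.out.println(extract(xml));
    }
}
```

To work on files instead, pass new StreamSource(new File("word.xml")) and new StreamResult(new File("word1.xml")) to transform.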

Parse the XML into a DOM using a DocumentBuilder. Remove the elements that you don't need from the resulting Document. Write the modified Document to a new file using a Transformer. (Note: the details for each of these steps can be found in any of the thousands of Java XML tutorials online.)
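Those three steps might look roughly like the following. It is only a sketch: the class name KeepChildren and the method keepOnly are invented for illustration, and it works on strings so that it is self-contained (for files, parse a File and write to a StreamResult wrapping a File).

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class KeepChildren {
    // Parses xml, removes every child element of the root whose name is not in keep,
    // and returns the serialized result.
    static String keepOnly(String xml, String... keep) throws Exception {
        // Step 1: parse into a DOM.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        List<String> wanted = Arrays.asList(keep);

        // Step 2: remove unwanted elements. Walk backwards, because the NodeList is live
        // and removing a child shifts the indexes of the siblings after it.
        NodeList kids = root.getChildNodes();
        for (int i = kids.getLength() - 1; i >= 0; i--) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.ELEMENT_NODE && !wanted.contains(kid.getNodeName())) {
                root.removeChild(kid);
            }
        }

        // Step 3: serialize the modified document with an identity Transformer.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<A><Answer>How was your day</Answer><Question>Happy day</Question>"
                   + "<Biased>good morning</Biased><abc>...</abc></A>";
        System.out.println(keepOnly(xml, "Answer", "Question"));
    }
}
```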

Related

Java XML Document XPath with Disable Escaping

Currently, I need to get the element of XML without escaping.
For example:
<?xml version="1.0" encoding="UTF-8"?>
<Message>
<Header>H001</Header>
<Body>
<Item>ABC&amp;ABC&quot;</Item>
</Body>
</Message>
I need to get the value of "Item" element via XPath.
However, it is escaped automatically.
My Result = ABC&ABC"
Expected = ABC&amp;ABC&quot;
How can I get the expected value?

XPath will always return the values of nodes that result from XML parsing. The string value of the Item element in your XML, after parsing, is ABC&ABC", so that's what XPath gives you. If you want ABC&amp;ABC&quot; then you will have to reverse the action of the XML parser; this is known as serialization. Parsing "unescapes" entity and character references (it turns &amp; into &). Serialization escapes special characters such as "&" (it turns & into &amp;).
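As a rough illustration of that round trip, the sketch below reads the Item value with XPath and then re-escapes it by hand. The class EscapeDemo and the helper escapeXml are hypothetical names, not part of any library; a real serializer may make different choices (for instance, many do not escape quotes in text content), which is why the escaping is spelled out explicitly here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EscapeDemo {
    // Hypothetical helper: re-escapes the characters the parser unescaped.
    // '&' must be replaced first, otherwise the other replacements would be escaped twice.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Returns the parsed (unescaped) string value of /Message/Body/Item.
    static String itemValue(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate("/Message/Body/Item", doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Message><Header>H001</Header><Body>"
                   + "<Item>ABC&amp;ABC&quot;</Item></Body></Message>";
        String parsed = itemValue(xml);
        System.out.println(parsed);            // the value XPath sees after parsing
        System.out.println(escapeXml(parsed)); // the value escaped again for XML output
    }
}
```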

Wrap the content in a CDATA section.
Note: a character data (CDATA) section tells the parser to treat the enclosed text as plain text, without interpreting markup or entity references inside it.
For example:
abc.xml
<?xml version="1.0" encoding="UTF-8"?>
<Messages>
<Message>
<Header>H001</Header>
<Body>
<Item><![CDATA[ABC&&ABC&quot;]]></Item>
</Body>
</Message>
</Messages>
Java code :
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class Test {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputStream input = Thread.currentThread().getContextClassLoader().getResourceAsStream("abc.xml");
Document doc = builder.parse(input);
doc.getDocumentElement().normalize();
NodeList list = doc.getElementsByTagName("Message");
for (int i = 0; i < list.getLength(); i++) {
Node node = list.item(i);
NodeList children = node.getChildNodes();
for (int j = 0; j < children.getLength(); j++) {
node = children.item(j);
System.out.println(node.getTextContent().trim());
}
}
}
}
Output :
H001
ABC&&ABC&quot;

Is there any way other than using XPath for this?

Hello, I am writing this program:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
public class DOMbooks {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
File file = new File("books-fixed.xml");
Document doc = docBuilder.parse(file);
NodeList list = doc.getElementsByTagName("*");
int bookCounter = 0; // incremented before printing, so the first book is BOOK 1
for (int i = 1; i < list.getLength(); i++) {
Element element = (Element)list.item(i);
String nodeName = element.getNodeName();
if (nodeName.equals("book")) {
bookCounter++;
System.out.println("BOOK " + bookCounter);
String isbn = element.getAttribute("sequence");
System.out.println("\tsequence:\t" + isbn);
}
else if (nodeName.equals("author")) {
System.out.println("\tAuthor:\t" + element.getChildNodes().item(0).getNodeValue());
}
else if (nodeName.equals("title")) {
System.out.println("\tTitle:\t" + element.getChildNodes().item(0).getNodeValue());
}
else if (nodeName.equals("publishYear")) {
System.out.println("\tpublishYear:\t" + element.getChildNodes().item(0).getNodeValue());
}
else if (nodeName.equals("genre")) {
System.out.println("\tgenre:\t" + element.getChildNodes().item(0).getNodeValue());
}
}
}
}
I want to print all the data about the "Science Fiction" books. I know I should use XPath, but I am stuck with too many errors. Any suggestions?
Assume I have this data and only want to select the Science Fiction books with all their info:
<book sequence="5">
<title>Aftershock</title>
<auther>Robert B. Reich</auther>
<publishYear>2010</publishYear>
<genre>Economics</genre>
</book>
<book sequence="6">
<title>The Time Machine</title>
<auther>H.G. Wells</auther>
<publishYear>1895</publishYear>
<genre>Science Fiction</genre>
</book>

i want to print all the data about the "Science Fiction" books.. i know i should use Xpath but it's stuck,
I assume you mean that you want all the books for which genre == "Science Fiction", right? In that case, XPath is much simpler than what you were trying in Java (you don't show the root node, so I'll start with '//', which selects at any depth):
//book[genre = 'Science Fiction']
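Evaluating that expression from Java could look like the sketch below, which uses the JDK's javax.xml.xpath API. The class name SciFiBooks, the method describe, and the <books> root element in the sample are assumptions made for the example, since the question does not show the root node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SciFiBooks {
    // Returns one "sequence: title" line per book whose genre is Science Fiction.
    static List<String> describe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList books = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate("//book[genre = 'Science Fiction']", doc, XPathConstants.NODESET);
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            String title = book.getElementsByTagName("title").item(0).getTextContent();
            lines.add(book.getAttribute("sequence") + ": " + title);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Assumed root element <books>; the real file may use a different one.
        String xml = "<books>"
            + "<book sequence=\"5\"><title>Aftershock</title><genre>Economics</genre></book>"
            + "<book sequence=\"6\"><title>The Time Machine</title><genre>Science Fiction</genre></book>"
            + "</books>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```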
XSLT approach to simplify things
Now, having another look at your code, it looks like you want to print each and every element, including the element's name. This is more trivially done in XSLT:
<!-- every XSLT 1.0 must start like this -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- you want text -->
<xsl:output method="text" />
<!-- match any science fiction book (your primary goal) -->
<xsl:template match="book[genre = 'Science Fiction']">
<xsl:text>BOOK </xsl:text>
<xsl:value-of select="position()" />
<!-- send the children and attribute to be processed by templates -->
<xsl:apply-templates select="@sequence | *" />
</xsl:template>
<!-- "catch" any elements or attributes under <book> -->
<xsl:template match="book/* | book/@*">
<!-- a newline and a tab per line-->
<xsl:text>&#10;&#9;</xsl:text>
<!-- and the name of the element or attribute -->
<xsl:value-of select="local-name()" />
<!-- another tab, plus contents of the element or attribute -->
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="." />
</xsl:template>
<!-- make sure that other values are ignored, but process children -->
<xsl:template match="node()">
<xsl:apply-templates />
</xsl:template>
</xsl:stylesheet>
You can use this code, which is significantly shorter (if you ignore the comments and whitespace) and (arguably, once you get the hang of it) more readable than your original code. To use it:
Store it as books.xsl
Then run it from Java like this:
import javax.xml.transform.*;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
public class TestMain {
public static void main(String[] args) throws IOException, URISyntaxException, TransformerException {
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(new File("books.xsl"));
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new File("books-fixed.xml"));
transformer.transform(text, new StreamResult(new File("output.txt")));
}
}
XPath 2.0
If you can use Saxon in Java, the above becomes a one-liner with XPath 2.0 and you don't even need XSLT:
for $book in //book[genre = 'Science Fiction']
return (
'BOOK',
count(//book[genre = 'Science Fiction'][. << $book]) + 1,
for $tag in $book/(@sequence | *)
return ($tag/local-name(), ':', string($tag))
)

Java XML pull data on specific node

I am trying to pull all data for a specific node. My code below prints out every node, but what if I only want the information for pantone 101, for example, including all the other attributes (such as the colors) of that specific pantone 101 node? Here is the code I've written to print out the XML data. How can I edit it to print only a specific node? Thanks!
<inventory>
<Product pantone="100" blue="7.4" red="35" green="24"> </Product>
<Product pantone="101" blue="5.4" red="3" rubine="35" purple="24"> </Product>
<Product pantone="102" orange="5.4" purple="35" white="24"> </Product>
<Product pantone="103" orange="5.4" purple="35" white="24"> </Product>
<Product pantone="104" orange="5.4" purple="35" white="24"> </Product>
<Product pantone="105" orange="5.4" purple="35" white="24"> </Product>
<Product pantone="106" black="5.4" rubine="35" white="24" purple="35" orange="5.4"> </Product>
</inventory>
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;
public class ReadXML {
public static void main(String args[]) throws Exception {
DocumentBuilderFactory buildFactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder dBuilder = buildFactory.newDocumentBuilder();
Document document = dBuilder.parse(ReadXML.class.getResourceAsStream("data.xml"));
document.normalize();
//get main node
NodeList rootNodes = document.getElementsByTagName("inventory");
Node rootNode = rootNodes.item(0);
Element rootElement = (Element) rootNode;
//print all with specific tag
NodeList inventoryList = rootElement.getElementsByTagName("Product");
for(int i = 0; i < inventoryList.getLength(); i++){
Node pantone = inventoryList.item(i);
Element pantoneElement = (Element) pantone;
//remove blank elements
System.out.println("Pantone: " + pantoneElement.getAttribute("pantone")); // print attribute
System.out.println("Blue: " + pantoneElement.getAttribute("blue"));
System.out.println("Red: " + pantoneElement.getAttribute("red"));
System.out.println("Orange: " + pantoneElement.getAttribute("orange"));
System.out.println("White: " + pantoneElement.getAttribute("white"));
System.out.println("Purple: " + pantoneElement.getAttribute("purple"));
System.out.println("Green: " + pantoneElement.getAttribute("green"));
System.out.println("Black: " + pantoneElement.getAttribute("black"));
System.out.println("Rubine: " + pantoneElement.getAttribute("rubine"));
}
} catch (ParserConfigurationException | SAXException | IOException e) {
e.printStackTrace(); // report parse and I/O failures instead of silently swallowing them
}
}
}

You can use XPath to select data from an XML document using more advanced criteria. For example, the following XPath expression selects the <Product> element whose pantone attribute equals 101:
/inventory/Product[@pantone=101]
Alternatively, omit /inventory from the XPath if you plan to evaluate it with the <inventory> element as the context node:
Product[@pantone=101]
I'm not familiar with Java, but this post should give you a good hint: How to read XML using XPath in Java
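On the Java side, that expression can be evaluated with javax.xml.xpath, and every attribute of the matched element can then be listed through getAttributes(). The following is a sketch only: the class PantoneLookup and the method attributesOf are invented names, and a TreeMap is used so that the attributes print in a predictable (alphabetical) order.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class PantoneLookup {
    // Returns all attributes of the first node matched by the XPath expression,
    // or an empty map when nothing matches.
    static Map<String, String> attributesOf(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Node product = (Node) XPathFactory.newInstance().newXPath()
            .evaluate(expression, doc, XPathConstants.NODE);
        Map<String, String> result = new TreeMap<>();
        if (product != null && product.getAttributes() != null) {
            NamedNodeMap attrs = product.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                result.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<inventory>"
            + "<Product pantone=\"100\" blue=\"7.4\" red=\"35\" green=\"24\"> </Product>"
            + "<Product pantone=\"101\" blue=\"5.4\" red=\"3\" rubine=\"35\" purple=\"24\"> </Product>"
            + "</inventory>";
        attributesOf(xml, "/inventory/Product[@pantone=101]")
            .forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

Because the attributes are read generically, there is no need to ask for "blue", "red", and so on by name, and colors that a product does not have are simply not printed.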

How to iterate through the org.w3c.dom.NodeList using jython?

I'm using org.w3c.dom to process some xml documents. And I'm using jython 2.5.1 to implement it.
Part of my xml document (EmployeeInfo.xml) is like:
<employees>
<employee id="1">
<name>ABC</name>
<title>Software Engineer</title>
</employee>
<employee id="2">
<name>DEF</name>
<title>Systems Engineer</title>
</employee>
<employee id="3">
<name>GHI</name>
<title>QA Engineer</title>
</employee>
......
</employees>
And my jython code for reading in and parsing xml is like:
import sys, logging
logging.basicConfig(level=logging.INFO)
from java.io import File
from javax.xml.parsers import DocumentBuilder
from javax.xml.parsers import DocumentBuilderFactory
from org.w3c.dom import Document
from org.w3c.dom import Element
from org.w3c.dom import Node
from org.w3c.dom import NodeList
# ... some code
file = "C:/Users/Adminstrator/Doc/EmployeeInfo.xml"
doc = File(file)
if doc.exists():
    docFactory = DocumentBuilderFactory.newInstance()
    docFactory.setNamespaceAware(True)
    docBuilder = docFactory.newDocumentBuilder()
    if doc.endswith(".xml"):
        logging.info(" -- Reading " + doc)
        employeeDoc = docBuilder.parse(doc)
        if employeeDoc != None:
            employees = employeeDoc.getElementsByTagNameNS("*","employee")
            if employees != None:
                for employee in employees:
                    logging.info(employee.getChildNodes().getLength())
            else:
                logging.warn("Failed to get the employee from " + doc)
        else:
            logging.warn("Failed to parse the document " + doc)
else:
    logging.warn("Failed to find the specified document" + doc + ", please check the path!")
When I ran this script, there was an error:
TypeError: 'org.apache.xerces.dom.DeepNodeListImpl' object is not iterable
referring to the line:
for employee in employees:
It seems like Jython automatically treats 'employees' as its own NodeList rather than as an org.w3c.dom.NodeList...
I searched online but found little about this issue. Could anyone here help me with this? Thanks in advance!

I replaced the for loop with a while loop, because the index-based for(int i=0; i... form is rarely used in Python.
So I used:
i = 0
while i < employees.getLength():
    employee = employees.item(i)
    i = i + 1
    ....

org.apache.xerces.dom.DeepNodeListImpl
This should work:
for (int i = 0; i < employees.getLength(); i++) {
Node employee = employees.item(i);
....
}

Modify XML file with xPath

I want to modify an existing XML file using XPath. If the node doesn't exist, it should be created (along with its parents if necessary). An example:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<param0>true</param0>
<param1>1.0</param1>
</configuration>
And here are a couple of xPaths I want to insert/modify:
/configuration/param1/text() -> 4.0
/configuration/param2/text() -> "asdf"
/configuration/test/param3/text() -> true
The XML file should look like this afterwards:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<param0>true</param0>
<param1>4.0</param1>
<param2>asdf</param2>
<test>
<param3>true</param3>
</test>
</configuration>
I tried this:
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
try {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
Document doc = domFactory.newDocumentBuilder().parse(file.getAbsolutePath());
XPath xpath = XPathFactory.newInstance().newXPath();
String xPathStr = "/configuration/param1/text()";
Node node = ((NodeList) xpath.compile(xPathStr).evaluate(doc, XPathConstants.NODESET)).item(0);
System.out.printf("node value: %s\n", node.getNodeValue());
node.setNodeValue("4.0");
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.transform(new DOMSource(doc), new StreamResult(file));
} catch (Exception e) {
e.printStackTrace();
}
The node is changed in the file after running this code. Exactly what I wanted. But if I use one of the below paths, node is null (and therefore a NullPointerException is thrown):
/configuration/param2/text()
/configuration/test/param3/text()
How can I change this code so that the node (and non existing parent nodes as well) are created?
EDIT: Ok, to clarify: I have a set of parameters that I want to save to XML. During development, this set can change (some parameters get added, some get moved, some get removed). So I basically want to have a function to write the current set of parameters to an already existing file. It should override the parameters that already exist in the file, add new parameters and leave old parameters in there.
The same for reading, I could just have the xPath or some other coordinates and get the value from the XML. If it doesn't exist, it returns the empty string.
I don't have any constraints on how to implement it, xPath, DOM, SAX, XSLT... It should just be easy to use once the functionality is written (like BeniBela's solution).
So if I have the following parameters to set:
/configuration/param1/text() -> 4.0
/configuration/param2/text() -> "asdf"
/configuration/test/param3/text() -> true
the result should be the starting XML + those parameters. If they already exist at that xPath, they get replaced, otherwise they get inserted at that point.

If you want a solution without dependencies, you can do it with just DOM and without XPath/XSLT.
Node.getChildNodes|getNodeName / NodeList.* can be used to find the nodes, and Document.createElement|createTextNode, Node.appendChild to create new ones.
Then you can write your own simple "XPath" interpreter that creates missing nodes in the path, like this:
public static void update(Document doc, String path, String def){
String p[] = path.split("/");
//search nodes or create them if they do not exist
Node n = doc;
for (int i=0;i < p.length;i++){
NodeList kids = n.getChildNodes();
Node nfound = null;
for (int j=0;j<kids.getLength();j++)
if (kids.item(j).getNodeName().equals(p[i])) {
nfound = kids.item(j);
break;
}
if (nfound == null) {
nfound = doc.createElement(p[i]);
n.appendChild(nfound);
n.appendChild(doc.createTextNode("\n")); //add whitespace, so the result looks nicer. Not really needed
}
n = nfound;
}
NodeList kids = n.getChildNodes();
for (int i=0;i<kids.getLength();i++)
if (kids.item(i).getNodeType() == Node.TEXT_NODE) {
//text node exists
kids.item(i).setNodeValue(def); //override
return;
}
n.appendChild(doc.createTextNode(def));
}
Then, if you only want to update text() nodes, you can use it as:
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
Document doc = domFactory.newDocumentBuilder().parse(file.getAbsolutePath());
update(doc, "configuration/param1", "4.0");
update(doc, "configuration/param2", "asdf");
update(doc, "configuration/test/param3", "true");
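The question also asks for reading a value (with an empty string when the node does not exist) and for saving the document back to the file. A minimal sketch of those two pieces with the standard JDK APIs follows; the class ConfigIo and its methods parse, read, and save are hypothetical names introduced for illustration. It relies on XPath.evaluate(String, Object) returning the string value of the result, which is the empty string when nothing matches.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ConfigIo {
    // Parses XML text into a DOM document (a File can be parsed the same way).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Reads the string value at the given XPath; yields "" when no node matches.
    static String read(Document doc, String path) throws XPathExpressionException {
        return XPathFactory.newInstance().newXPath().evaluate(path, doc);
    }

    // Writes the (possibly modified) document to the target file.
    static void save(Document doc, File target) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(new DOMSource(doc), new StreamResult(target));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<configuration><param0>true</param0><param1>1.0</param1></configuration>");
        System.out.println("param1 = '" + read(doc, "/configuration/param1/text()") + "'");
        System.out.println("param2 = '" + read(doc, "/configuration/param2/text()") + "'");
    }
}
```

After calling update(...) as shown above, save(doc, file) would persist the changes.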

Here is a simple XSLT solution:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="param1/text()">4.0</xsl:template>
<xsl:template match="/*">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
<param2>asdf</param2>
<test><param3>true</param3></test>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<configuration>
<param0>true</param0>
<param1>1.0</param1>
</configuration>
the wanted, correct result is produced:
<configuration>
<param0>true</param0>
<param1>4.0</param1>
<param2>asdf</param2>
<test><param3>true</param3></test>
</configuration>
Do Note:
An XSLT transformation never "updates in-place". It always creates a new result tree. Therefore, if one wants to modify the same file, typically the result of the transformation is saved under another name, then the original file is deleted and the result is renamed to have the original name.
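The "transform to a new file, then replace the original" routine described in the note can be sketched in Java as follows. The class InPlaceXslt, the method transformInPlace, and the small stylesheet are assumptions made for the example (the stylesheet only rewrites param1 so that the block stays short); the temporary file is created next to the original so that the final move stays on the same file system.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class InPlaceXslt {
    // Hypothetical stylesheet: identity copy, except that param1's text becomes 4.0.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"node()|@*\">"
      + "<xsl:copy><xsl:apply-templates select=\"node()|@*\"/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"param1/text()\">4.0</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms the file into a temporary sibling, then moves it over the original.
    static void transformInPlace(Path xml, String xsl) throws Exception {
        Path tmp = Files.createTempFile(xml.toAbsolutePath().getParent(), "xslt", ".tmp");
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        try (InputStream in = Files.newInputStream(xml);
             OutputStream out = Files.newOutputStream(tmp)) {
            transformer.transform(new StreamSource(in), new StreamResult(out));
        }
        // Both streams are closed here, so the original can be replaced safely.
        Files.move(tmp, xml, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("configuration", ".xml");
        Files.write(file, "<configuration><param1>1.0</param1></configuration>"
            .getBytes(StandardCharsets.UTF_8));
        transformInPlace(file, XSL);
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```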

I've created a small project for using XPATH to create/update XML: https://github.com/shenghai/xmodifier
The code to change your XML looks like this:
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = builderFactory.newDocumentBuilder();
Document document = documentBuilder.parse(xmlfile);
XModifier modifier = new XModifier(document);
modifier.addModify("/configuration/param1", "4.0");
modifier.addModify("/configuration/param2", "asdf");
modifier.addModify("/configuration/test/param3", "true");
modifier.modify();

I would point you to another way of doing what you described: VTD-XML, which I consider a better fit than the other solutions provided for this question. Here are a few links:
Simplify XML processing with vtd-xml
Manipulate XML the Ximple Way
Processing XML with Java – A Performance Benchmark
import com.ximpleware.*;
import java.io.*;
public class modifyXML {
public static void main(String[] s) throws VTDException, IOException{
VTDGen vg = new VTDGen();
if (!vg.parseFile("input.xml", false))
return;
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/configuration/param1/text()");
XMLModifier xm = new XMLModifier(vn);
// using XPath
int i=ap.evalXPath();
if(i!=-1){
xm.updateToken(i, "4.0");
}
String s1 ="<param2>asdf</param2>\n<test>\n<param3>true</param3>\n</test>";
xm.insertAfterElement(s1);
xm.output("output.xml");
}
}
