I have the following .xml file
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book id="1">
<name>Book-1</name>
<author>Author-1</author>
</book>
</bookstore>
I want to delete the book with id="1", and I want all of its child nodes to be removed along with it. Is there a way to do that?
You can use an XSLT script:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="book[@id='1']"/>
</xsl:stylesheet>
Basically, this copies the entire document except for the elements that match the pattern "book[@id='1']". The empty template suppresses the matched book element together with all of its child nodes.
XSLT will of course introduce some overhead but the advantages are:
You can easily modify the transformation when the query becomes more complex
The execution mechanism is likely to have fewer bugs than code you write yourself (no offense, but XSLT processors are used by thousands of people, so the easy bugs have already been found and fixed)
You can test transformations with numerous existing tools; if you hardcode the transformation, you will have to write a test bench of your own.
In some cases XSLT will even run faster, simply because the developers of the processor have put effort into optimizing execution.
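Using the DOM parser (without showing the entire file; I hope you are familiar with DOM):
NodeList nList = doc.getElementsByTagName("book");
// The NodeList is live, so iterate backwards: removing a node shrinks the list.
for (int temp = nList.getLength() - 1; temp >= 0; temp--) {
    Node nNode = nList.item(temp);
    if (nNode.getNodeType() == Node.ELEMENT_NODE) {
        Element eElement = (Element) nNode;
        if (eElement.getAttribute("id").equals("1")) {
            // Remove the node from its own parent; doc.removeChild only works
            // for direct children of the document. The children of the book go with it.
            eElement.getParentNode().removeChild(eElement);
        }
    }
}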
Another simple option is to parse the XML file into objects altogether; browse through the documentation of XStream.
Related
I want to fetch the count of attributes using XPath in Java. I know we can use DOM parsers, but my input file is going to be very large. I can't really use SAX, as there are multiple nested tags I need to take care of. I'm also not sure which attributes are going to be inside the XML document. Having XPath would make my life easier, but I'm really worried that a DOM parser will exhaust the memory. I read about the s9api but couldn't really solve it. Are there any other libraries in Java that support XPath without a DOM parser? Sharing examples would be really helpful.
Let's say my input is
<?xml version="1.0" encoding="UTF-8"?>
<cricketers>
<continent>
<team>
<aussies>
<cricketer type="righty">
<name>Smith</name>
<role>Captain</role>
<position>Wicket-Keeper</position>
</cricketer>
<cricketer type="lefty">
<name>Warner</name>
<role>Batsman</role>
<position>Point</position>
</cricketer>
</aussies>
</team>
</continent>
<continent>
<team>
<england>
<cricketer type="righty">
<name>Morgan</name>
<role>Captain</role>
<position>Covers</position>
</cricketer>
<cricketer type="lefty">
<name>Cook</name>
<role>Batsman</role>
<position>Point</position>
</cricketer>
</england>
</team>
</continent>
<continent>
<team>
<aussies>
<cricketer type="righty">
<name>Smith</name>
<role>Captain</role>
<position>Wicket-Keeper</position>
</cricketer>
<cricketer type="lefty">
<name>Warner</name>
<role>Batsman</role>
<position>Point</position>
</cricketer>
</aussies>
</team>
</continent>
</cricketers>
Given an xpath //team/aussies/cricketer, the count is 4 in this case.
I want to implement something like this without DOM parser
With XSLT 3 supporting streaming (e.g. with Saxon EE 10 or 9.9) you can use
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
expand-text="yes">
<xsl:output method="adaptive"/>
<xsl:mode streamable="yes"/>
<xsl:template match="/">
<xsl:sequence select="count(//@*)"/>
</xsl:template>
</xsl:stylesheet>
if the task is really only to count all attributes. Saxon should run that in a single, forwards-only pass through the whole document without building a full tree of all nodes.
Counting elements selected by a path that has no predicates doing child selection, like
<xsl:template match="/">
<xsl:sequence select="count(//team/aussies/cricketer)"/>
</xsl:template>
should also work.
In the s9api, you simply need to make sure you pass in the input document as a stream to the Xslt30Transformer e.g.
Processor processor = new Processor(true);
XsltCompiler compiler = processor.newXsltCompiler();
XsltExecutable executable = compiler.compile(new StreamSource("count-example1.xsl"));
Xslt30Transformer transformer = executable.load30();
XdmValue result = transformer.applyTemplates(new StreamSource("sample1.xml"));
System.out.println(result);
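If a streaming XSLT processor is not an option, a similar count can be sketched with the StAX API that ships with the JDK, by keeping a stack of the currently open element names. The class and method names below are made up for illustration, and the matching only covers simple paths such as //team/aussies/cricketer (no predicates), so treat it as a rough sketch rather than a replacement for XPath:

```java
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Saxon or of the answer above.
class StreamingCounter {

    /** Counts elements whose innermost path steps equal the given names, e.g. team, aussies, cricketer. */
    static int countBySuffix(Reader xml, String... suffix) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            Deque<String> path = new ArrayDeque<>(); // open elements, innermost first
            int count = 0;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    path.push(r.getLocalName());
                    if (endsWith(path, suffix)) {
                        count++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    path.pop();
                }
            }
            r.close();
            return count;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts every attribute in the document (namespace declarations are not included). */
    static int countAttributes(Reader xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            int count = 0;
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    count += r.getAttributeCount();
                }
            }
            r.close();
            return count;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static boolean endsWith(Deque<String> path, String[] suffix) {
        if (path.size() < suffix.length) {
            return false;
        }
        Iterator<String> it = path.iterator(); // iterates innermost element first
        for (int i = suffix.length - 1; i >= 0; i--) {
            if (!it.next().equals(suffix[i])) {
                return false;
            }
        }
        return true;
    }
}
```

Called as StreamingCounter.countBySuffix(reader, "team", "aussies", "cricketer") on the sample input from the question, this gives 4. Only the stack of open element names is kept in memory, so the size of the file should not matter much.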
I want to parse an XML document that has the following characteristics:
it has no XML declaration
its elements can appear in no particular order
<employees>
<employee>
<details>
<name>Joe</name>
<age>34</age>
</details>
<address>
<street>test</street>
<nr>12</nr>
</address>
</employee>
<employee>
<address>....</address>
<details>
<!-- note the changed order of elements! -->
<age>24</age>
<name>Sam</name>
</details>
</employee>
</employees>
Output should be a csv:
name,age,street,nr
Joe,34,test,12
Sam,24,...
Problem: when using event-driven parsers like StAX/SAX, I would have to create a temporary Employee bean whose properties I set on each event node, and later on convert the bean to CSV.
But as my XML file is several GB in size, I'd like to avoid having to create additional bean objects for each entry.
Thus I probably have to use plain old DOM parsing? Correct me if I'm wrong, I'm happy for any suggestions.
I tried as follows. The problem is that doc.getElementsByTagName("employees") returns an empty NodeList, while I'd expect one XML element. Why?
StringBuilder sb = new StringBuilder();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(xml)));
doc.getDocumentElement().normalize();
NodeList employeesList = doc.getElementsByTagName("employees");
for (int i = 0; i < employeesList.getLength(); i++) {
    Node employees = employeesList.item(i);
    if (employees.getNodeType() == Node.ELEMENT_NODE) {
        NodeList employeeList = ((Element) employees).getElementsByTagName("employee");
        for (int j = 0; j < employeeList.getLength(); j++) {
            Element employee = (Element) employeeList.item(j);
            // Look each field up by name, so the order of the child elements does not matter.
            // This assumes every employee contains all four elements.
            for (String field : new String[] {"name", "age", "street", "nr"}) {
                sb.append(employee.getElementsByTagName(field).item(0).getTextContent()).append(',');
            }
            sb.setLength(sb.length() - 1); // drop the trailing comma
            sb.append('\n');
        }
    }
}
A DOM solution is going to use a lot of memory, a SAX/Stax solution is going to involve writing and debugging a lot of code. The ideal tool for this job is an XSLT 3.0 streamable transformation:
<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:mode streamable="yes" on-no-match="shallow-skip"/>
<xsl:template match="employee">
<xsl:value-of select="copy-of(.)!(.//name, .//age, .//street, .//nr)"
separator=","/>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:transform>
NOTE
I originally wrote the select expression as copy-of(.)//(name, age, street, nr). This is incorrect, because the // operator sorts the results into document order, which we don't want. The use of ! and , carefully avoids the sorting.
Do not use a StringBuilder; write directly to the file instead (Files.newBufferedWriter).
It is not a big deal to parse the XML manually, as there does not seem to be a high level of complexity, nor any need for XML-based validation.
DOM parsing would build a document object model, which is just what you would not want.
StAX needs to build a full employee if the sub-elements are unordered.
So reading an employee yourself would not be that different.
Also, the XML does not seem to originate from an XML writer, so you might need to patch text that is invalid in XML, like a bare & that should be &amp; in XML.
If the XML is valid (you could have a Reader that adds <?xml ...> in front), scanning through the XML would be:
XMLInputFactory f = XMLInputFactory.newInstance();
XMLStreamReader r = f.createXMLStreamReader( ... );
while(r.hasNext()) {
r.next();
}
That easily allows maintaining a Map of the employee fields, which is started at <employee> and then validated and written out at </employee>.
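The Map-based approach could be sketched roughly as follows. The class name, the column list and the lack of CSV escaping are assumptions made for illustration; the element names are taken from the sample XML in the question. Only one small Map is alive at any time, regardless of the file size:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only.
class EmployeeCsv {

    // Output columns; the names match the leaf elements of the sample XML.
    private static final String[] COLUMNS = {"name", "age", "street", "nr"};

    static void convert(Reader in, Writer out) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            Map<String, String> row = new HashMap<>(); // reused for every employee
            StringBuilder text = new StringBuilder();  // text of the current element
            out.write(String.join(",", COLUMNS) + "\n");
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        text.setLength(0);
                        if (r.getLocalName().equals("employee")) {
                            row.clear();
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        text.append(r.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (r.getLocalName().equals("employee")) {
                            writeRow(row, out); // one line per employee, written immediately
                        } else {
                            row.put(r.getLocalName(), text.toString().trim());
                        }
                        text.setLength(0);
                        break;
                    default:
                        break;
                }
            }
            r.close();
            out.flush();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes the columns in a fixed order; missing fields become empty cells.
    // No CSV quoting is done here, so values containing commas would need extra care.
    private static void writeRow(Map<String, String> row, Writer out) throws IOException {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < COLUMNS.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(row.getOrDefault(COLUMNS[i], ""));
        }
        out.write(line.append('\n').toString());
    }
}
```

For a real file you would pass something like Files.newBufferedReader(...) and Files.newBufferedWriter(...) instead of in-memory readers and writers.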
I have a Java REST API and we recently changed domain. The API is versioned, although up to now this has only involved adding and removing elements across the versions.
I would like to change the namespaces if someone goes back to previous versions, but I am struggling. I have realised now, after some hacking about, that it is probably because I am changing the namespace of the XML that is actually being referenced. I was thinking of it as a text document, but I guess the tool is not?
So looking at this XML with the namespace URL veg.com:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ns2:apple xmlns:ns2="http://veg.com/app/api/apple" xmlns:ns1="http://veg.com/app/api" xmlns:ns3="http://veg.com/app/api/apple/red"
xmlns:ns4="http://veg.com/app/banana" xmlns:ns5="http://veg.com/app/api/pear" xmlns:ns6="http://veg.com/app/api/orange"
ns1:created="2016-05-23T16:47:55+01:00" ns1:href="http://falseserver:8080/app/api/apple/1" ns1:id="1">
<ns2:name>granny smith</ns2:name>
<ns2:flavour>sweet</ns2:flavour>
<ns2:origin>southwest region</ns2:origin>
...
</ns2:apple>
I would like to change the namespaces to fruit.com. This is a very hacky unit test which shows the broad approach that I have been trying...
@Test
public void testNamespaceChange() throws Exception {
Document appleDoc = load("apple.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
org.w3c.dom.Node node = (org.w3c.dom.Node) xpath.evaluate("//*[local-name()='apple']", appleDoc , XPathConstants.NODE);
NamedNodeMap nodeMap = node.getAttributes();
for (int i = 0; i < nodeMap.getLength(); i++) {
if (nodeMap.item(i).getNodeName().startsWith("xmlns:ns")) {
nodeMap.item(i).setTextContent( nodeMap.item(i).getNodeValue().replace( "veg.com", "fruit.com"));
}
}
//Check values have been set
for (int i = 0; i < nodeMap.getLength(); i++) {
System.out.println(nodeMap.item(i).getNodeName());
System.out.println(nodeMap.item(i).getNodeValue());
System.out.println("----------------");
}
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(node), result);
System.out.println("XML IN String format is: \n" +
writer.toString());
}
So the result of this is that the loop of nodeMap items shows the updates taking hold
i.e. all updated along these lines
xmlns:ns1
http://fruit.com/app/api
-------------------------------------------
xmlns:ns2
http://fruit.com/app/api/apple
-------------------------------------------
xmlns:ns3
http://fruit.com/app/api/apple/red
-------------------------------------------
...
but when I print out the transformed document I get what I see in the API response...
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ns2:apple xmlns:ns2="http://veg.com/app/api/apple" xmlns:ns1="http://veg.com/app/api" xmlns:ns3="http://fruit.com/app/api/apple/red"
xmlns:ns4="http://fruit.com/app/banana" xmlns:ns5="http://fruit.com/app/api/pear" xmlns:ns6="http://fruit.com/app/api/orange"
ns1:created="2016-05-23T16:47:55+01:00" ns1:href="http://falseserver:8080/app/api/apple/1" ns1:id="1">
The sibling (and further down the hierarchy) namespaces have been changed but ns1 and ns2 have remained unchanged.
Can anyone tell me why, and whether there is a simple way for me to update them? I guess the next step for me might be to stream the XML document into a string, update the namespaces as text and then reload it as an XML document, but I'm hoping I'm being defeatist and there is a more elegant solution?
I would solve it with an XSLT like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="*[namespace-uri()='http://veg.com/app/api/apple']" priority="1">
<xsl:element name="{local-name()}" namespace="http://fruit.com/app/api/apple">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This stylesheet combines the identity transform with a template which changes namespace of elements in http://veg.com/app/api/apple to http://fruit.com/app/api/apple.
I think it is much simpler than the Java code that you have. You would also be more flexible, should you find out that the versions of your XML differ in more than just namespaces.
Please consider this to be a rough sketch. I wrote a book on XSLT some 15 years ago, but did not use XSLT for more than 6 or 7 years.
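A likely explanation for the behaviour in the question: in the DOM, every element and attribute stores its own namespace URI, which is fixed when the node is created. Editing the xmlns:* attributes does not touch those nodes, and on output the serializer declares the namespaces that the nodes actually use, which would be why ns1 and ns2 (both used on the root element) come out unchanged. If you would rather stay in Java, a minimal sketch using the DOM Level 3 method Document.renameNode could look like this. The class and method names are hypothetical, and it assumes the document was parsed with a namespace-aware parser:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class NamespaceRenamer {

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this, nodes carry no namespace URI
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Replaces 'from' with 'to' in the namespace URIs of node and all of its descendants. */
    static void rename(Node node, String from, String to) {
        Document doc = node.getOwnerDocument();
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            // Take a snapshot first: renaming an attribute modifies the live NamedNodeMap.
            NamedNodeMap map = node.getAttributes();
            List<Attr> attrs = new ArrayList<>();
            for (int i = 0; i < map.getLength(); i++) {
                attrs.add((Attr) map.item(i));
            }
            for (Attr a : attrs) {
                String ns = a.getNamespaceURI();
                if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(ns)) {
                    // An xmlns:* declaration: update its value so it agrees with the renamed nodes.
                    a.setValue(a.getValue().replace(from, to));
                } else if (ns != null && ns.contains(from)) {
                    doc.renameNode(a, ns.replace(from, to), a.getName());
                }
            }
            String ns = node.getNamespaceURI();
            if (ns != null && ns.contains(from)) {
                // renameNode may return a new node, so continue with the returned one.
                node = doc.renameNode(node, ns.replace(from, to), node.getNodeName());
            }
        }
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before the child is possibly replaced
            rename(child, from, to);
            child = next;
        }
    }
}
```

In the unit test from the question this would be called as NamespaceRenamer.rename(appleDoc.getDocumentElement(), "veg.com", "fruit.com") before the document is handed to the Transformer.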
I have an XML file and I know the node name I need to change the value for.
The nodename is ipAddress.
I can use JDOM, get document, get node and change the value and write it or I can write an XSLT file.
The code changing value goes from Java, so my question is which option is better? The size of the XML file can be different.
Another XSLT-related question: Is it possible to write an XSLT file such that I will not be listing all nodes that are in XML but will just specify like if node == ipAddress, then take the new value, and how would I apply the XSLT transformation from Java?
Thank you.
You could use the standard org.w3c.dom APIs to get a DOM. Then get the node using the standard javax.xml.xpath APIs. And then use the javax.xml.transform APIs to write it back out.
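Something like the following, which removes a node selected with XPath; to change a value instead, select the ipAddress element (for example with //ipAddress) and call setTextContent on it: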
import java.io.File;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.*;
import org.w3c.dom.*;
public class Demo {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
Document document = dbf.newDocumentBuilder().parse(new File("input.xml"));
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
XPathExpression expression = xpath.compile("//A/B[C/E/text()=13]");
Node b13Node = (Node) expression.evaluate(document, XPathConstants.NODE);
b13Node.getParentNode().removeChild(b13Node);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
t.transform(new DOMSource(document), new StreamResult(System.out));
}
}
XSLT solution:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="pNewIpAddress" select="'192.68.0.1'"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="ipAddress/text()">
<xsl:value-of select="$pNewIpAddress"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on any document, all nodes of the document are copied "as-is" except for the text-node child of any ipAddress element (regardless where this element is in the document). The latter is replaced with the value of an externally provided parameter, named $pNewIpAddress.
For example, if the transformation is applied against this XML document:
<t>
<a>
<b>
<ipAddress>127.0.0.1</ipAddress>
</b>
<c/>
</a>
<d/>
</t>
the wanted, correct result is produced:
<t>
<a>
<b>
<ipAddress>192.68.0.1</ipAddress>
</b>
<c/>
</a>
<d/>
</t>
There are many Java-based XSLT processors and the proper place to understand how they can be invoked from Java is their documentation. One of the best such XSLT processors is Saxon and its documentation can be found at:
http://www.saxonica.com/documentation/documentation.xml
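For the "how would I apply the XSLT transformation from Java" part, the standard javax.xml.transform (JAXP) API that ships with the JDK is usually enough for an XSLT 1.0 stylesheet like the one above. The following is a minimal sketch; the class name is made up, and the stylesheet is embedded as a string only to keep the example self-contained (normally you would load it from a file with new StreamSource(new File(...))):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
class IpAddressUpdater {

    // The stylesheet from the answer above, embedded for self-containment.
    private static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output omit-xml-declaration='yes' indent='yes'/>"
        + "<xsl:strip-space elements='*'/>"
        + "<xsl:param name='pNewIpAddress' select=\"'192.68.0.1'\"/>"
        + "<xsl:template match='node()|@*'>"
        + "<xsl:copy><xsl:apply-templates select='node()|@*'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='ipAddress/text()'>"
        + "<xsl:value-of select='$pNewIpAddress'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML and returns the result as a string. */
    static String apply(String xml, String newIpAddress) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            // Overrides the default value of the global parameter pNewIpAddress.
            transformer.setParameter("pNewIpAddress", newIpAddress);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same code should work with Saxon or another JAXP-compliant processor on the classpath, since TransformerFactory.newInstance() picks up whichever implementation is configured.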
I want to take an attribute found through XPath and replace its value in the Document.
This is the xml:
<MineX STATE="add">
<Desc F_CREATOR="admin" F_ENTRYDATE="2010-12-24" F_HEIGHT="0.875" F_ID="1" F_LEFT="1.15625" F_LINE_COLOR="255" F_FORECOLOR="0">
<F_CUSTOM_BYTES></F_CUSTOM_BYTES>
</Desc>
</MineX>
With Java, I can retrieve the value like this:
org.w3c.dom.Document xmlDoc = getDoc(path);
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression myExp = xpath.compile("//MineX/Desc/@F_LINE_COLOR");
System.out.println("Line color:" + (String)myExp.evaluate(xmlDoc, XPathConstants.STRING) + "\n");
This prints out: Line color:255
So, what XPath function will allow me to replace the 255, for another string?
Or do I need something other than XPath for this?
So, what XPath function will allow me
to replace the 255, for another
string? Or do I need something other
than XPath for this?
XPath is the query language for XML and as such cannot modify an XML document.
In order to modify an XML document one needs to use the programming language (such as XSLT, C#, JS, PHP, ..., etc) that is hosting XPath.
Here is a solution, where the hosting language is XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="pNewLineColor" select="123"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@F_LINE_COLOR">
<xsl:attribute name="{name()}">
<xsl:value-of select="$pNewLineColor"/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<MineX STATE="add">
<Desc F_CREATOR="admin"
F_ENTRYDATE="2010-12-24"
F_HEIGHT="0.875"
F_ID="1"
F_LEFT="1.15625"
F_LINE_COLOR="255"
F_FORECOLOR="0">
<F_CUSTOM_BYTES></F_CUSTOM_BYTES>
</Desc>
</MineX>
the wanted, correct result is produced:
<MineX STATE="add">
<Desc F_CREATOR="admin"
F_ENTRYDATE="2010-12-24"
F_HEIGHT="0.875"
F_ID="1"
F_LEFT="1.15625"
F_LINE_COLOR="123"
F_FORECOLOR="0">
<F_CUSTOM_BYTES></F_CUSTOM_BYTES>
</Desc>
</MineX>
XPath is a query language for extracting information out of an XML file. As far as I know it is not suited for replacing or editing data in an XML. One way to transform an XML is via XSLT.
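Since the question uses Java as the host language, the same idea can also be sketched with the DOM directly: let XPath select the attribute node (rather than its string value), then change it through the DOM API. The class and method names below are made up for illustration:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class AttributeUpdater {

    /** Sets the attribute selected by the XPath expression and returns the serialized document. */
    static String setAttribute(String xml, String xpathToAttribute, String newValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Ask for a NODE instead of a STRING, so the attribute itself can be modified.
            Node attr = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathToAttribute, doc, XPathConstants.NODE);
            if (attr == null) {
                throw new IllegalArgumentException("Nothing matched " + xpathToAttribute);
            }
            attr.setNodeValue(newValue);

            // An identity Transformer (no stylesheet) just serializes the modified DOM.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the document in the question, the call would be AttributeUpdater.setAttribute(xml, "//MineX/Desc/@F_LINE_COLOR", "123").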