Change xml namespace url in Java - java

I have a java REST API and we recently changed domain. The api is versioned although up to now this has involved adding removing elements across the versions.
I would like to change the namespaces if someone goes back to previous versions but I am struggling. I have realised now, after some hacking about, that it is probably because I am changing the namespace of the xml that is actually being referenced. I was thinking of it as a text document but I guess the tool is not ?
So looking at this xml with the n#namespace url veg.com ->
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ns2:apple xmlns:ns2="http://veg.com/app/api/apple" xmlns:ns1="http://veg.com/app/api" xmlns:ns3="http://veg.com/app/api/apple/red"
xmlns:ns4="http://veg.com/app/banana" xmlns:ns5="http://veg.com/app/api/pear" xmlns:ns6="http://veg.com/app/api/orange"
ns1:created="2016-05-23T16:47:55+01:00" ns1:href="http://falseserver:8080/app/api/apple/1" ns1:id="1">
<ns2:name>granny smith</ns2:title>
<ns2:flavour>sweet</ns2:status>
<ns2:origin>southwest region</ns2:grantCategory>
...
</ns2:apple>
I would like to change the namespaces to fruit.com. This is a very hacky unit test which shows the broad approach that I have been trying...
#Test
public void testNamespaceChange() throws Exception {
Document appleDoc = load("apple.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
org.w3c.dom.Node node = (org.w3c.dom.Node) xpath.evaluate("//*[local-name()='apple']", appleDoc , XPathConstants.NODE);
NamedNodeMap nodeMap = node.getAttributes();
for (int i = 0; i < nodeMap.getLength(); i++) {
if (nodeMap.item(i).getNodeName().startsWith("xmlns:ns")) {
nodeMap.item(i).setTextContent( nodeMap.item(i).getNodeValue().replace( "veg.com", "fruit.com"));
}
}
//Check values have been set
for (int i = 0; i < nodeMap.getLength(); i++) {
System.out.println(nodeMap.item(i).getNodeName());
System.out.println(nodeMap.item(i).getNodeValue());
System.out.println("----------------");
}
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(node), result);
System.out.println("XML IN String format is: \n" +
writer.toString());
}
So the result of this is that the loop of nodeMap items shows the updates taking hold
i.e. all updated along these lines
xmlns:ns1
http://fruit.com/app/api
-------------------------------------------
xmlns:ns2
http://fruit.com/app/api/apple
-------------------------------------------
xmlns:ns3
http://fruit.com/app/api/apple/red
-------------------------------------------
...
but when I print out the transfomed document I get what I see in the api response...
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ns2:apple xmlns:ns2="http://veg.com/app/api/apple" xmlns:ns1="http://veg.com/app/api" xmlns:ns3="http://fruit.com/app/api/apple/red"
xmlns:ns4="http://fruit.com/app/banana" xmlns:ns5="http://fruit.com/app/api/pear" xmlns:ns6="http://fruit.com/app/api/orange"
ns1:created="2016-05-23T16:47:55+01:00" ns1:href="http://falseserver:8080/app/api/apple/1" ns1:id="1">
The sibling (and further down the hierarchy) namespaces have been changed but ns1 and ns2 have remained unchanged.
Can anyone tell me why and whether there is a simple way for me to update them ? I guess the next step for me might be to stream the xml doc into a string, update them as text and then reload it as an xml document but I'm hoping I'm being defeatist and there is a more elegant solution ?

I would solve it with an XSLT like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="*[namespace-uri()='http://veg.com/app/api/apple']" priority="1">
<xsl:element name="local-name()" namespace="http://fruit.com/app/api/apple">
<xsl:apply-templates select="#*|node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This stylesheet combines the identity transform with a template which changes namespace of elements in http://veg.com/app/api/apple to http://fruit.com/app/api/apple.
I think it is much simpler that Java code that you have. You'd be also more flexible, should you find out you have more differences between version of you XML apart just namespaces.
Please consider this to be a rough sketch. I wrote a book on XSLT some 15 years ago, but did not use XSLT for more than 6 or 7 years.

Related

How to parse large xml document with DOM?

I want to parse a xml element that has the following incidents:
and no xml declaration
can serve the elements in no particular order
<employees>
<employee>
<details>
<name>Joe</name>
<age>34</age>
</details>
<address>
<street>test</street>
<nr>12</nr>
</address>
</employee>
<employee>
<address>....</address>
<details>
<!-- note the changed order of elements! -->
<age>24</age>
<name>Sam</name>
</details>
</employee>
</employees>
Output should be a csv:
name;age;street;nr
Joe,34,test,12
Sam,24,...
Problem: when using event-driven parsers like stax/sax, I would have to create a temporary Employee bean whose properties I set on each event node, and lateron convert the bean to csv.
But as my xml file is several GB in size, I'd like to prevent having to create additional bean objects for each entry.
Thus I probably have to use plain old DOM parsing? Correct my if I'm wrong, I'm happy for any suggestions.
I tried as follows. Problem is that doc.getElementsByTagName("employees") returns an empty nodelist, while I'd expect one xml element. Why?
StringBuilder sb = new StringBuilder();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(xml)));
doc.getDocumentElement().normalize();
NodeList employees = doc.getElementsByTagName("employees");
for (int i = 0; i < employees.getLength(); i++) {
Node employee = employees.item(i);
if (employees.getNodeType() == Node.ELEMENT_NODE) {
NodeList employee = ((Element) employees).getElementsByTagName("employee");
for (int j = 0; j < employee.getLength(); j++) {
NodeList details = ((Element) employee).getElementsByTagName("details");
//the rest is pseudocode
for (details)
sb.append(getElements("name").item(0) + ",");
sb.append(getElements("age").item(0) + ",");
for (address)
sb.append(getElements("street").item(0) + ",");
sb.append(getElements("nr").item(0) + ",");
}
}
}
A DOM solution is going to use a lot of memory, a SAX/Stax solution is going to involve writing and debugging a lot of code. The ideal tool for this job is an XSLT 3.0 streamable transformation:
<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:mode streamable="yes" on-no-match="shallow-skip"/>
<xsl:template match="employee">
<xsl:value-of select="copy-of(.)!(.//name, .//age, .//street, .//nr)"
separator=","/>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:transform>
NOTE
I originally wrote the select expression as copy-of(.)//(name, age, street, nr). This is incorrect, because the // operator sorts the results into document order, which we don't want. The use of ! and , carefully avoids the sorting.
Do not use a StringBuilder but write immediately to the file (Files.newBufferedWriter).
It is not a big deal to manually parse the XML as there does not seem to be a high level of complexity, neither need of XML based validation.
DOM parsing would build a document object model, just what you would not want.
Stax needs to build a full employee if sub-elements are unordered.
So doing reading an employee yourself would not be that different.
Also the XML seems not to originate from XML writing, and might need to patch XML invalid text, like & that should be & in XML.
If the XML is valid (you could have a Reader that adds <?xml ...> in front), scanning through the XML would be:
XMLInputFactory f = XMLInputFactory.newInstance();
XMLStreamReader r = f.createXMLStreamReader( ... );
while(r.hasNext()) {
r.next();
}
That easily allows maintaing a Map for employee attributes, started with <employee> and ending, being validated and written at </employee>.

Accessing unparsed entities in XSLT with a SAXTransformerFactory and TransformerHandlers

I have some trouble while retrieving unparsed entity URIs, with the XPath function unparsed-entity-uri().
I'm using a SAXTransformerFactory like in "Efficient XSLT pipeline in Java" question, because I need to perform a transformations chain (i.e. apply several XSLT transformations, and use the result of a transformation as input for the second transformation).
I discovered I'm unable to retrieve unparsed entity thank to the code below. Actually it works well with Xalan, but not with Saxon-HE (version 9.7.0) - but I need Saxon because I'd rather XSLT 2.0 (even if in the code below there's nothing specific to XSLT 2, it's only for the sake of providing an example). It also works with Saxon if I don't use a TransformerHandler, e.g. stf.newTransformer(new StreamSource("transfo.xsl")).transform(new StreamSource("input.xsl"), new StreamResult(System.out)) will produce the desired output.
Is there a configuration step that I forgot?
// use "org.apache.xalan.processor.TransformerFactoryImpl" for Xalan
String transformerFactoryClassName = "net.sf.saxon.TransformerFactoryImpl";
SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance(transformerFactoryClassName,
LaunchSimpleTransformationUnparsedEntities.class.getClassLoader());
try {
TransformerHandler thTransf = stf
.newTransformerHandler(new StreamSource("transfo.xsl"));
// output the result in console
thTransf.setResult(new StreamResult(System.out));
// Launch transformation of input.xml
Transformer t = stf.newTransformer();
t.transform(new StreamSource("input.xml"),
new SAXResult(thTransf));
} catch (TransformerConfigurationException e) {
e.printStackTrace();
} catch (TransformerException e) {
e.printStackTrace();
}
In input, I have (for input.xml):
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE book
[<!ENTITY cover_hadrien SYSTEM "images/covers/cover_hadrien.jpg" NDATA jpeg>]>
<book>
<title>Les mémoires d'Hadrien</title>
<author>Marguerite Yourcenar</author>
<cover imgref="cover_hadrien" />
</book>
and a sample XSLT (for transfo.xsl):
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:template match="cover">
<xsl:copy>
<xsl:value-of select="unparsed-entity-uri(#imgref)"/>
</xsl:copy>
</xsl:template>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
as a result, I would expect something like:
<?xml version="1.0" encoding="UTF-8"?><book>
<title>Les mémoires d'Hadrien</title>
<author>Marguerite Yourcenar</author>
<cover>images/covers/cover_hadrien.jpg</cover>
</book>
but <cover> is empty when performing the transformation with Saxon.
Interesting observation. The issue in fact is not with Saxon's TransformerHandler, but rather with the "identity transformer" obtained using SAXTransformerFactory.newTransformer(): the identity transformer is not passing unparsed entities down the line. This is essentially because Saxon's identity transformer is reusing parts of the XSLT engine, and XSLT does not provide any way for a transformation to output unparsed entities in the result. If you sent the SAX parser output directly to the TransformerHandler, rather than going via an identity transformer, then I think it would all work.
As with all things JAXP-related, the specification of SAXTransformerFactory.newTransformer() is infuriatingly vague. All it says is that the returned Transformer performs a copy of the Source to the Result. i.e. the "identity transform". What exactly counts as a copy? I think Saxon's interpretation has been that it is equivalent to the effect of doing an XSLT identity transform - which would lose unparsed entities (as well as other things like CDATA sections, the DTD, etc).
Incidentally XSLT 2.0 specifies that the result of unparsed-entity-uri() should be an absolute URI (XSLT 1.0 doesn't say anything on the subject) so even if this is fixed, the Saxon output will be different.
Entered as a Saxon issue here: https://saxonica.plan.io/issues/3201 I think we need to be a bit careful about passing unparsed entities to a SAXResult if we don't pass all the other events expected by a SAX DTDHandler - and we're certainly not going to change the Saxon identity transformer to retain things (like DTD declarations) that aren't modelled in XDM.
Indeed, following #MichaelKay's details, launching the transformation that way works properly:
// launch transformation of input.xml
XMLReader reader = XMLReaderFactory.createXMLReader();
reader.setContentHandler(thTransf);
reader.setDTDHandler(thTransf);
reader.parse(new InputSource(input.xml"));
(this will replace the following line:
// Launch transformation of input.xml
Transformer t = stf.newTransformer();
t.transform(new StreamSource("input.xml"),
new SAXResult(thTransf));
that were used initially).

How to delete a parent node will all its child nodes?

I have the following .xml file
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book id="1">
<name>Book-1</name>
<author>Author-1</author>
</book>
</bookstore>
The question is I want to delete the book with id="1". I want the way to delete a book node so that all its child nodes are removed automatically. Is there a way to do so?
You can use a XSLT script:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="node()|#*">
<xsl:copy>
<xsl:apply-templates select="node()|#*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="book[#id='1']"/>
</xsl:stylesheet>
Basically what one does is copying the entire file without entities that match the pattern "book[#id='1']"
XSLT will of course introduce some overhead but the advantages are:
You can easily modify the transformation when the query becomes more complex
The execution mechanism is more bug-free than code you will write yourself (no offense, but XSLT programs are used by thousands of people and thus easy bugs will be debugged)
You can test transformations with numerous programs, by hardcoding the transformation you will have to write a testbench on your own.
In some cases XSLT will even run faster simply because programmers try to optimize execution.
Using the DOM parser (without specifying the entire file, I hope you are familiar with DOM):
NodeList nList = doc.getElementsByTagName("book");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
if(eElement.getAttribute("id").equals("1")) {
doc.removeChild(eElement);
}
}
}
There is another simple solution available to parse the XML file altogether. Browse through the documentation of Xstream

Doing DOM Node-to-String transformation, but with namespace issues

So we have an XML Document with custom namespaces. (The XML is generated by software we don't control. It's parsed by a namespace-unaware DOM parser; standard Java7SE/Xerces stuff, but also outside our effective control.) The input data looks like this:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<MainTag xmlns="http://BlahBlahBlah" xmlns:CustomAttr="http://BlitherBlither">
.... 18 blarzillion lines of XML ....
<Thing CustomAttr:gibberish="borkborkbork" ... />
.... another 27 blarzillion lines ....
</MainTag>
The Document we get is usable and xpath-queryable and traversable and so on.
Converting this Document into a text format for writing out to a data sink uses the standard Transformer approach described in a hundred SO "how do I change my XML Document into a Java string?" questions:
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter stringwriter = new StringWriter();
transformer.transform (new DOMSource(theXMLDocument), new StreamResult(stringwriter));
return stringwriter.toString();
and it works perfectly.
But now I'd like to transform individual arbitrary Nodes from that Document into strings. A DOMSource constructor accepts Node pointers just the same as it accepts a Document (and in fact Document is just a subclass of Node, so it's the same API as far as I can tell). So passing in an individual Node in the place of "theXMLDocument" in the snippet above works great... until we get to the Thing.
At that point, transform() throws an exception:
java.lang.RuntimeException: Namespace for prefix 'CustomAttr' has not been declared.
at com.sun.org.apache.xml.internal.serializer.SerializerBase.getNamespaceURI(Unknown Source)
at com.sun.org.apache.xml.internal.serializer.SerializerBase.addAttribute(Unknown Source)
at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.addAttribute(Unknown Source)
......
That makes sense. (The "com.sun.org.apache" is weird to read, but whatever.) It makes sense, because the namespace for the custom attribute was declared at the root node, but now the transformer is starting at a child node and can't see the declarations "above" it in the tree. So I think I understand the problem, or at least the symptom, but I'm not sure how to solve it though.
If this were a String-to-Document conversion, we'd be using a DocumentBuilderFactory instance and could call .setNamespaceAware(false), but this is going in the other direction.
None of the available properties for transformer.setOutputProperty() affect the namespaceURI lookup, which makes sense.
There is no such corresponding setInputProperty or similar function.
The input parser wasn't namespace aware, which is how the "upstream" code got as far as creating its Document to hand to us. I don't know how to hand that particular status flag on to the transforming code, which is what I really would like to do, I think.
I believe it's possible to (somehow) add a xmlns:CustomAttr="http://BlitherBlither" attribute to the Thing node, the same as the root MainTag had. But at that point the output is no longer identical XML to what was read in, even if it "means" the same thing, and the text strings are eventually going to be compared in the future. We wouldn't know if it were needed until the exception got thrown, then we could add it and try again... ick. For that matter, changing the Node would alter the original Document, and this really ought to be a read-only operation.
Advice? Is there some way of telling the Transformer, "look, don't stress your dimwitted little head over whether the output is legit XML in isolation, it's not going to be parsed back in on its own (but you don't know that), just produce the text and let us worry about its context"?
Given your quoted error message "Namespace for prefix 'CustomAttr' has not been declared.",
I'm assuming that your pseudo code is along the lines of:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<MainTag xmlns="http://BlahBlahBlah" xmlns:CustomAttr="http://BlitherBlither">
.... 18 blarzillion lines of XML ....
<Thing CustomAttr:attributeName="borkborkbork" ... />
.... another 27 blarzillion lines ....
</MainTag>
With that assumption, here's my suggestion:
So you want to extract the "Thing" node from the "big" XML. The standard approach is to use a little XSLT to do that. You prepare the XSL transformation with:
Transformer transformer = transformerFactory.newTransformer(new StreamSource(new File("isolate-the-thing-node.xslt")));
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setParameter("elementName", stringWithCurrentThing); // parameterize transformation for each Thing
...
EDIT: #Ti, please note the parameterization instruction above (and below in the xslt).
The file 'isolate-the-thing-node.xslt' could be a flavour of the following:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:custom0="http://BlahBlahBlah"
xmlns:custom1="http://BlitherBlither"
version="1.0">
<xsl:param name="elementName">to-be-parameterized</xsl:param>
<xsl:output encoding="utf-8" indent="yes" method="xml" omit-xml-declaration="no" />
<xsl:template match="/*" priority="2" >
<!--<xsl:apply-templates select="//custom0:Thing" />-->
<!-- changed to parameterized selection: -->
<xsl:apply-templates select="custom0:*[local-name()=$elementName]" />
</xsl:template>
<xsl:template match="node() | #*" priority="1">
<xsl:copy>
<xsl:apply-templates select="node() | #*" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Hope that gets you over the "Thing" thing :)
I have managed to parse the provided document, get the Thing node and print it without issues.
Take a look at the Working Example:
Node rootElement = d.getDocumentElement();
System.out.println("Whole document: \n");
System.out.println(nodeToString(rootElement));
Node thing = rootElement.getChildNodes().item(1);
System.out.println("Just Thing: \n");
System.out.println(nodeToString(thing));
nodeToString:
private static String nodeToString(Node node) {
StringWriter sw = new StringWriter();
try {
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
t.setOutputProperty(OutputKeys.INDENT, "yes");
t.transform(new DOMSource(node), new StreamResult(sw));
} catch (TransformerException te) {
System.out.println("nodeToString Transformer Exception");
}
return sw.toString();
}
Output:
Whole document:
<?xml version="1.0" encoding="UTF-8"?><MainTag xmlns="http://BlahBlahBlah" xmlns:CustomAttr="http://BlitherBlither">
<Thing CustomAttr="borkborkbork"/>
</MainTag>
Just Thing:
<?xml version="1.0" encoding="UTF-8"?><Thing CustomAttr="borkborkbork"/>
When I try the same code with CustomAttr:attributeName as suggested by #marty it fails with the original exception, so it looks like somewhere in your original XML you are prefixing a attribute or node with that custom CustomAttr namespace.
In the latter case you can leverage the problem with setNamespaceAware(true), which will include the namespace information on the Thing node itself.
<?xml version="1.0" encoding="UTF-8"?><Thing xmlns:CustomAttr="http://BlitherBlither" CustomAttr:attributeName="borkborkbork" xmlns="http://BlahBlahBlah"/>

Java xsl transform produces weird content after xsl-copy

I am using the following xsl to retrieve the schema types from wsdl
<?xml version="1.0" encoding="UTF-8"?><xsl:transform version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8"/>
<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="//*[local-name()='schema' and namespace-uri()='http://www.w3.org/2001/XMLSchema']">
<xsl:copy-of select="."></xsl:copy-of>
</xsl:template></xsl:transform>
and the following java snippet
tFactory = TransformerFactory.newInstance();
xslSource = new StreamSource("resources/input/wsdl2xsd.xsl");
xmlSource = new StreamSource(wsdlURI.toString());
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
outStream = new BufferedOutputStream(byteArrayOutputStream);
transformer = tFactory.newTransformer(xslSource);
transformer.transform(xmlSource, new StreamResult(outStream));
System.out.print(new String(byteArrayOutputStream.toByteArray()));
but for some reason I am getting a lot for empty space with some weird string after the end of my 'schema' element?. It works for some wslds though. I tried eBays public wsdl which is huge:
http://developer.ebay.com/webservices/741/eBaySvc.wsdl
which outputs strings after the 'schema' element.
Well your current stylesheet code relies on the built-in templates to reach your own template and on that way you might get output from text nodes.
As you don't seem to want that you can either add <xsl:template match="text()"/> to your code or you might want a different approach where you simply do e.g. <xsl:template match="/"><xsl:copy-of select="//xs:schema" xmlns:xs="http://www.w3.org/2001/XMLSchema"/></xsl:template>.

Categories

Resources