Remove data from XML file in DOM? - java

Is there an easy way (perhaps using the DOM api, or other) where I could remove the actual data from an XML file, leaving behind just a kind of template of its schema, so that we can see what potential information it can hold.
I will give an example, to make this clear.
Suppose the user inputs the following XML file:
<photos page="2" pages="89" perpage="10" total="881">
<photo id="2636" owner="47058503995@N01"
secret="a123456" server="2" title="test_04"
ispublic="1" isfriend="0" isfamily="0" />
<photo id="2635" owner="47058503995@N01"
secret="b123456" server="2" title="test_03"
ispublic="0" isfriend="1" isfamily="1" />
<photo id="2633" owner="47058503995@N01"
secret="c123456" server="2" title="test_01"
ispublic="1" isfriend="0" isfamily="0" />
<photo id="2610" owner="12037949754@N01"
secret="d123456" server="2" title="00_tall"
ispublic="1" isfriend="0" isfamily="0" />
</photos>
Then I want to transform this into:
<photos page="..." pages="..." perpage="..." total="...">
<photo id="..." owner="..."
secret="..." server="..." title="..."
ispublic="..." isfriend="..." isfamily="..." />
</photos>
I'm sure this could be written manually, but what would be the best, most efficient and reliable way of doing this (preferably in Java)?
Thanks!

There are plenty of possibilities:
DOM API (included in JDK)
SAX API (included in JDK)
JDOM (easy to use, but external)
XSLT (transforming XML with prepared XSL stylesheet, JDK supports XSLT 1.0)
I think XSLT is the most reliable and universal way to transform XML into other XML. Here is a quick example:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:strip-space elements="*"/>
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:template match="node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()[position()=1]"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{name()}">...</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
Result:
<photos page="..." pages="..." perpage="..." total="...">
<photo id="..." owner="..." secret="..." server="..." title="..." ispublic="..."
isfriend="..."
isfamily="..."/>
</photos>
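A minimal sketch of running such a stylesheet from Java with the JDK's javax.xml.transform API. The class name SkeletonDemo, the method name skeleton and the shortened sample input are assumptions made for illustration; the stylesheet is the one from this answer, inlined as a string and without indent="yes".

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SkeletonDemo {
    // Stylesheet from the answer: keep the first child only, blank out attribute values.
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:strip-space elements='*'/>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()[position()=1]'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='@*'>"
      + "<xsl:attribute name='{name()}'>...</xsl:attribute>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML text and returns the result as a string. */
    static String skeleton(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Shortened, hypothetical input in the shape of the question's sample.
        String xml = "<photos page='2' pages='89'>"
            + "<photo id='2636' title='test_04'/>"
            + "<photo id='2635' title='test_03'/>"
            + "</photos>";
        System.out.println(skeleton(xml));
    }
}
```

Two limitations of this stylesheet are worth keeping in mind: because it follows only the first child of each element, sibling elements with other names are dropped from the skeleton, and a text node that happens to be the first child is copied as-is.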

Rather than use the DOM API, in which you'd have to iterate across the structure yourself, take a look at the SAX API, which iterates itself and calls you back for each element, text node etc. For each element you get called back for, you'll get the set of attributes too.
You'd still have to determine what to output, reduce duplicates etc. But you get a callback for an end-of-element as well, so perhaps record everything you get given, and then for your end-of-element callback, just determine the unique set of data you wish to output.
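As a rough sketch of that idea, the handler below records, for every element name, the set of attribute names seen on it. The class name AttributeCollector and the method collect are hypothetical, and the sketch only gathers names; it does not rebuild the nesting or write the template document.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class AttributeCollector extends DefaultHandler {
    // Element name -> attribute names seen on any occurrence, in first-seen order.
    final Map<String, Set<String>> seen = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Set<String> names = seen.computeIfAbsent(qName, k -> new LinkedHashSet<>());
        for (int i = 0; i < atts.getLength(); i++) {
            names.add(atts.getQName(i)); // duplicates collapse because this is a set
        }
    }

    /** Parses the XML text with SAX and returns the collected element/attribute names. */
    static Map<String, Set<String>> collect(String xml) {
        try {
            AttributeCollector handler = new AttributeCollector();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<photos page='2'>"
            + "<photo id='2636' title='test_04'/>"
            + "<photo id='2635' secret='b123456'/>"
            + "</photos>";
        collect(xml).forEach((element, attrs) -> System.out.println(element + " " + attrs));
    }
}
```

Because SAX never holds the whole document, this approach also suits inputs that are too large for a DOM tree.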

There are plenty of XML libraries you can use for this job. If you are interested in learning, try XmlBeans or JAXB. These APIs give you a great deal of control and validation. You also get to learn XSD and how to generate Java classes from an XSD, and parsing and writing XML files is fairly easy with them. Here are some useful links:
XmlBeans
JAXB 2.0

Related

If regEx is bad for matching XML, what is the correct way?

I was trying to do a simple string delete in XML.
I want to delete something like the following.
<A>
<B>Test Name</B>
</A>
Has to work with all possible XML, though.
<Test><A><B>Test Name</B></A></Test>
<Test ><A ><B >Test Name</B ></A ></Test >
<Test>
<A>
<B>Test Name</B>
</A>
</Test>
etc, etc.
The regular expression I have so far is simply:
<A>\s*(\r\n|\r|\n)*\s*<B>Test Name<\/B>\s*(\r\n|\r|\n)*\s*<\/A>
Everyone always says regex is bad for matching XML, which it clearly is. So what should I use instead?
GC_
The best approach for this case is XSLT. Even with XSLT 1.0 this is simple (you can use a Java XSLT processor, Linux's xsltproc, or any other XSLT processor; every XSLT processor supports at least XSLT 1.0):
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*" />
<!-- identity template - matches everything except the things matched by other templates -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- Removes the elements you do not want -->
<xsl:template match="A[B[normalize-space(.)='Test Name']]" />
</xsl:stylesheet>
The output of your sample case (with a hypothetical root element) would be
<Test/>
<Test/>
<Test/>
Trying to use regex would be error-prone and not good practice at all.
Why would you make it complicated if it could be so easy?
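To try this from Java, a sketch along the following lines should work with the XSLT processor bundled in the JDK. The class name RemoveElementDemo and the method strip are made up for this example; the stylesheet is the one above, inlined without indent="yes".

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class RemoveElementDemo {
    // Identity template plus an empty template that swallows the unwanted <A> element.
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:strip-space elements='*'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"A[B[normalize-space(.)='Test Name']]\"/>"
      + "</xsl:stylesheet>";

    /** Returns the input XML with every matching <A> element removed. */
    static String strip(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The three formatting variants from the question.
        System.out.println(strip("<Test><A><B>Test Name</B></A></Test>"));
        System.out.println(strip("<Test ><A ><B >Test Name</B ></A ></Test >"));
        System.out.println(strip("<Test>\n<A>\n<B>Test Name</B>\n</A>\n</Test>"));
    }
}
```

The whitespace differences that make the regex fragile never reach the match pattern, because the parser has already turned the markup into a tree.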

Dynamic XML response based on the XML input

We are building a web service which takes an XML input and provides an XML response based on that input. The input XML contains the XML structure with empty values, which will be replaced by the actual values from the database. The user can reduce the number of nodes requested in the input XML if they do not want all the details.
Sample Input XML
<?xml version="1.0" encoding="UTF-8"?>
<XML>
<RequestHeader>
<id>123</id>
</RequestHeader>
<RequestedElement>
<ABC>
<DEF>
<GHI />
<JKL>
<MNO />
</JKL>
</DEF>
</ABC>
</RequestedElement>
</XML>
Sample Response
<XML>
<ABC>
<DEF>
<GHI>Value1</GHI>
<JKL>
<MNO>Value2</MNO>
</JKL>
</DEF>
</ABC>
</XML>
The way I have implemented it now is to keep a mapping from the XML nodes to table names and column names, and then use reflection to retrieve the data from the database and generate the XML. However, using reflection slows down the whole process.
The other option I can think of is to get rid of reflection, create the XML with all the nodes, and then use XSLT to generate the final XML with only the requested nodes. Is it possible to do this with XSLT?
Is there a better option that would improve performance and still give the desired result?
One way to do this is to invoke the chosen XSLT processor (how to do this differs from one XSLT processor to another -- read the documentation) passing to it global transformation parameters, whose names are the names of the elements that should receive the corresponding values.
In the transformation below the values of the parameters are hardcoded, but in the real implementation these must be provided by the invoker -- so in the XSLT code they will look like this:
<xsl:param name="GHI"/>
<xsl:param name="MNO"/>
Here is the complete transformation:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="GHI">Value1</xsl:param>
<xsl:param name="MNO">Value2</xsl:param>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="RequestHeader"/>
<xsl:template match="*[not(node()) and name() = document('')/*/xsl:param/@name]">
<xsl:copy>
<xsl:value-of select="document('')/*/xsl:param[@name = name(current())]"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<XML>
<RequestHeader>
<id>123</id>
</RequestHeader>
<RequestedElement>
<ABC>
<DEF>
<GHI />
<JKL>
<MNO />
</JKL>
</DEF>
</ABC>
</RequestedElement>
</XML>
the wanted result is produced:
<XML>
<RequestedElement>
<ABC>
<DEF>
<GHI>Value1</GHI>
<JKL>
<MNO>Value2</MNO>
</JKL>
</DEF>
</ABC>
</RequestedElement>
</XML>
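Note that the lookup through document('') reads the parameter defaults written in the stylesheet source itself. If the values are already available in Java, a plain DOM walk is another way to fill the empty elements. The sketch below is an alternative to the XSLT above, not a rendering of it: the class FillRequestedValues, its methods and the name-to-value map are assumptions, it needs Java 9+ for Map.of, and it does not drop RequestHeader.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class FillRequestedValues {
    /** Recursively sets the text of every empty element whose name has a value in the map. */
    static void fill(Element element, Map<String, String> values) {
        if (!element.hasChildNodes() && values.containsKey(element.getTagName())) {
            element.setTextContent(values.get(element.getTagName()));
            return;
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                fill((Element) child, values);
            }
        }
    }

    /** Parses the request, fills the requested elements and serializes the result. */
    static String respond(String requestXml, Map<String, String> values) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(requestXml)));
            fill(doc.getDocumentElement(), values);
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build response", e);
        }
    }

    public static void main(String[] args) {
        String request = "<XML><RequestedElement><ABC><DEF><GHI/><JKL><MNO/></JKL></DEF></ABC>"
            + "</RequestedElement></XML>";
        // In a real service these values would come from the database lookup.
        System.out.println(respond(request, Map.of("GHI", "Value1", "MNO", "Value2")));
    }
}
```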

xslt transformation in java , attribute value not encoded

I am writing a Java program where in some cases I have to perform an XSLT transformation.
In this case I need to add an attribute named type to the work element.
Its value should be the same as the value of the element ns2:type_work.
For example:
<ns2:work>
<ns2:type_work>PROP</ns2:type_work>
<ns2:identifier_work>2017/375/943030</ns2:identifier_work>
</ns2:work>
should become
<ns2:work type="PROP">
<ns2:type_work>PROP</ns2:type_work>
<ns2:identifier_work>2017/375/943030</ns2:identifier_work>
</ns2:work>
I have made the following XSLT
<xsl:template match="ns2:work">
<ns2:work>
<xsl:attribute name="type" select="ns2:type_work/node()" />
<xsl:apply-templates select="@*|child::node()" />
</ns2:work>
</xsl:template>
and I apply it using the standard Java API (javax.xml.transform).
I get no errors and the type attribute is created, but it is empty.
Does this have something to do with the XSLT version? Is my XSLT not compatible with 1.0?
How can I work around this?
If you are using XSLT 1.0, then the code needs to look like this, as select is not valid on xsl:attribute in XSLT 1.0
<xsl:attribute name="type">
<xsl:value-of select="ns2:type_work/node()" />
</xsl:attribute>
(Note that you can just use <xsl:value-of select="ns2:type_work" /> here.)
Better still, use an attribute value template:
<ns2:work type="{ns2:type_work}">
<xsl:apply-templates select="@*|child::node()" />
</ns2:work>
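A small sketch for checking the attribute value template with the JDK's XSLT 1.0 processor. The namespace URI urn:example:ns2 is a placeholder (the question does not show the real one), and the class name WorkTypeDemo and the added identity template are assumptions for the sake of a complete example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class WorkTypeDemo {
    // urn:example:ns2 is a placeholder namespace; use the one from the real documents.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:ns2='urn:example:ns2'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='ns2:work'>"
      + "<ns2:work type='{ns2:type_work}'>"   // attribute value template, valid in XSLT 1.0
      + "<xsl:apply-templates select='@*|node()'/>"
      + "</ns2:work>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Adds the type attribute to every ns2:work element in the given XML text. */
    static String addType(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ns2:work xmlns:ns2='urn:example:ns2'>"
            + "<ns2:type_work>PROP</ns2:type_work>"
            + "<ns2:identifier_work>2017/375/943030</ns2:identifier_work>"
            + "</ns2:work>";
        System.out.println(addType(xml));
    }
}
```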

XSLT CallTemplate ForEach XML extension function

I want the $content to be the expected string. I know that copy-of instead of value-of $content produces the expected string. But how do I not use copy-of and pass it to say a java extension function?
I asked a different related question here.
XML
<?xml version="1.0"?>
<a>
<b c="d"/>
<b c="d"/>
<b c="d"/>
</a>
XSL
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template name="foo">
<xsl:param name="content"></xsl:param>
<!-- <xsl:copy-of select="$content"></xsl:copy-of> -->
<!-- copy-of produces the expected string here, but how to pass to Java -->
<xsl:value-of select="java:someMethod($content)" />
<!-- I want the content to be the expected string -->
</xsl:template>
<xsl:template match="/">
<xsl:call-template name="foo">
<xsl:with-param name="content">
<xsl:for-each select="a/b">
<e>
<xsl:value-of select="@c" />
</e>
</xsl:for-each>
</xsl:with-param>
</xsl:call-template>
</xsl:template>
</xsl:stylesheet>
Desired String to be passed to Java extension function from $content.
<?xml version="1.0"?>
<e>d</e>
<e>d</e>
<e>d</e>
PS: Calling foo is mandatory.
Ultimately, my objective is to emulate result-document in XSLT 1.0 with extension function.
Calling Java extension functions depends on which XSLT processor you are using, which you haven't told us. If you use Java's TransformerFactory you will get whatever is on the classpath, for example the built-in version of Xalan, the Apache version of Xalan, or Saxon.
Your description suggests you want to pass a string containing lexical XML to your Java extension function. That implies you need to serialize a node to a string. That can't be achieved without an extension function such as saxon:serialize(). It's probably easier to pass the node to your Java method, and have the Java method do the serialization.
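A minimal sketch of the Java side of that suggestion, assuming the processor hands the method an org.w3c.dom.Node. The class NodeSerializer and its methods are hypothetical, and how such a method is bound as an extension function, and which node type it actually receives, depends on the XSLT processor in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NodeSerializer {
    /** Serializes a DOM node to lexical XML using an identity transformer. */
    public static String serialize(Node node) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    /** Helper used only by the demo below: parses XML text into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a node received from the stylesheet.
        Node element = parse("<e>d</e>").getDocumentElement();
        System.out.println(serialize(element));
    }
}
```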

Dynamic xml filtering and transform (in Java)

I have an XML file that looks like
<?xml version='1.0' encoding='UTF-8'?>
<root>
<node name="foo1" value="bar1" />
<node name="foo2" value="bar2" />
</root>
I have a method
String processBar(String bar)
and I want to end up with
<?xml version='1.0' encoding='UTF-8'?>
<root>
<node name="foo1" value="processBar("bar1")" />
<node name="foo2" value="processBar("bar2")" />
</root>
Is there an easy way to do this? Preferably in Java. Note that the file is too large to safely load completely into memory. The data in the XML is roughly arbitrary and processBar may be complex, so I don't want to use regular expressions.
Assuming you mean replacing the attribute values with the result of calling processBar on said attribute values...
Use the JDK's XSLT API to run the following:
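<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:java="http://xml.apache.org/xalan/java"
extension-element-prefixes="java">
<!-- identity template: copies everything not matched by a more specific template;
     without it the built-in rules never visit attributes -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- replaces each value attribute with the result of processBar -->
<xsl:template match="/root/node/@value">
<xsl:attribute name="value">
<xsl:value-of select="java:com.example.yourclass.processBar(string(.))"/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>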
This uses the Xalan-Java extensions and assumes a static method. You can get an instance of an object and store it in an xsl:variable, like this:
<xsl:variable name="frobber" select="java:com.example.Frobber.new()"/>
<xsl:value-of select="java:processBar($frobber, string(.))"/>
Or somesuch.
This only works with Xalan, but since that's the XSLT processor distributed with the JDK, I doubt it will be onerous to use Xalan.
You can either parse the whole thing with a Java XML parser, or just read the file content into a string and then do a regex replace on it (e.g. using http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#replaceAll%28java.lang.String,%20java.lang.String%29)
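Since the question says the file is too large to load into memory, a streaming rewrite with StAX (javax.xml.stream, part of the JDK) may be worth considering. This is a different technique from the ones in the answers above. In the sketch below the class StreamingRewrite and its methods are hypothetical, and processBar is passed in as a UnaryOperator so that any implementation can be plugged in.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

final class StreamingRewrite {
    /** Copies the XML event by event, rewriting the value attribute of every node element. */
    static void rewrite(Reader in, Writer out, UnaryOperator<String> processBar)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && "node".equals(event.asStartElement().getName().getLocalPart())) {
                StartElement start = event.asStartElement();
                List<Attribute> attrs = new ArrayList<>();
                Iterator<?> it = start.getAttributes();
                while (it.hasNext()) {
                    Attribute a = (Attribute) it.next();
                    if ("value".equals(a.getName().getLocalPart())) {
                        a = events.createAttribute(a.getName(), processBar.apply(a.getValue()));
                    }
                    attrs.add(a);
                }
                // Rebuild the start tag with the processed attribute in place of the original.
                event = events.createStartElement(
                    start.getName(), attrs.iterator(), start.getNamespaces());
            }
            writer.add(event);
        }
        writer.close();
        reader.close();
    }

    /** Convenience wrapper for small inputs; large files would use file readers and writers. */
    static String rewrite(String xml, UnaryOperator<String> processBar) {
        try {
            StringWriter out = new StringWriter();
            rewrite(new StringReader(xml), out, processBar);
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("rewrite failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><node name='foo1' value='bar1'/><node name='foo2' value='bar2'/></root>";
        // String::toUpperCase stands in for the real processBar.
        System.out.println(rewrite(xml, String::toUpperCase));
    }
}
```

Only one event is held at a time, so memory use stays flat regardless of file size. The serialized form may differ cosmetically from the input (for example, an empty element may be written as a start and end tag pair).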
