Ant and XML configuration file parsing - java

I have an XML file of the following form -
<map MAP_XML_VERSION="1.0">
<entry key="database.user" value="user1"/>
...
</map>
Does ant have a native ability to read this and let me perform an xquery to pull back values for keys? Going through the API I did not see such capabilities.

The optional Ant task XMLTask is designed to do this. Give it an XPath expression and you can select the above into (say) a property. Here's an article on how to use it, with examples. It'll do tons of other XML-related manipulations/querying/creation as well.
e.g.
<xmltask source="map.xml">
<!-- copies to a property 'user' -->
<copy path="/map/entry[@key='database.user']/@value" attrValue="true" property="user"/>
</xmltask>
Disclaimer: I'm the author.

You can use the scriptdef tag to create a JavaScript wrapper for your class. Inside JS, you pretty much have the full power of Java and can do any kind of complicated XML parsing you want.
For example:
<project default="build">
<target name="build">
<xpath-query query="//entry[@key='database.user']/@value"
xmlFile="test.xml" addproperty="value"/>
<echo message="Value is ${value}"/>
</target>
<scriptdef name="xpath-query" language="javascript">
<attribute name="query"/>
<attribute name="xmlfile"/>
<attribute name="addproperty"/>
<![CDATA[
importClass(java.io.FileInputStream);
importClass(javax.xml.xpath.XPath);
importClass(javax.xml.xpath.XPathConstants);
importClass(javax.xml.xpath.XPathFactory);
importClass(org.xml.sax.InputSource);
var exp = attributes.get("query");
var filename = attributes.get("xmlfile");
var input = new InputSource(new FileInputStream(filename));
var xpath = XPathFactory.newInstance().newXPath();
var value = xpath.evaluate(exp, input, XPathConstants.STRING);
self.project.setProperty( attributes.get("addproperty"), value );
]]>
</scriptdef>
</project>
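The XPath evaluation that the script performs can also be tried outside Ant. The following is a minimal Java sketch, assuming the XML is available as a string; the class name MapXPathDemo and the helper query are made up for illustration and are not part of Ant or of the answer above.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class MapXPathDemo {

    // Hypothetical helper: evaluates an XPath expression against an XML string
    // and returns the result converted to a string.
    static String query(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Same shape as the map file in the question.
        String xml = "<map MAP_XML_VERSION=\"1.0\">"
                + "<entry key=\"database.user\" value=\"user1\"/>"
                + "</map>";
        System.out.println(query(xml, "/map/entry[@key='database.user']/@value"));
    }
}
```

Note that attributes are selected with @ in XPath, so the expression reads @key and @value.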

Sounds like you want something like ant-xpath-task. I'm not aware of a built-in way to do this with Ant.


Include tag in jaxb

I have an xml file of following structure:
<root>
<paramsToInclude>
<params id="id1">
<param11>val1</param11>
<param12>val2</param12>
<param13>val3</param13>
<param14>val4</param14>
</params>
<params id="id3">
<param31>val1</param31>
<param32>val2</param32>
</params>
</paramsToInclude>
<process>
<subprocess1>
<include params="id1"/>
<query>
SELECT *
FROM
table;
</query>
</subprocess1>
<subprocess2>
<rule>rule1</rule>
<rule>rule2</rule>
</subprocess2>
<subprocess3>
<processParam>val1</processParam>
<include params="id2"/>
<include params="id3"/>
</subprocess3>
</process>
I'm using JAXB to parse this XML into Java classes. Is there a way to substitute each include in the process with the values defined at the beginning of the file? I mean, I want the file to be parsed as if it looked like this:
<root>
<paramsToInclude>
<params id="id1">
<param11>val1</param11>
<param12>val2</param12>
<param13>val3</param13>
<param14>val4</param14>
</params>
<params id="id3">
<param31>val1</param31>
<param32>val2</param32>
</params>
</paramsToInclude>
<process>
<subprocess1>
<param11>val1</param11>
<param12>val2</param12>
<param13>val3</param13>
<param14>val4</param14>
<query>
SELECT *
FROM
table;
</query>
</subprocess1>
<subprocess2>
<rule>rule1</rule>
<rule>rule2</rule>
</subprocess2>
<subprocess3>
<processParam>val1</processParam>
<param11>val1</param11>
<param12>val2</param12>
<param13>val3</param13>
<param14>val4</param14>
<param31>val1</param31>
<param32>val2</param32>
</subprocess3>
</process>
Is it possible to do this? I found http://thetechietutorials.blogspot.com/2011/08/jaxb-tutorial-part-2-jaxb-with-xinclude.html, which shows how to include content from another file, but a comment there says that it's impossible to do this within the same file. (I understand that I could put these includes in another XML file, but I don't think that's the best way.) I also don't want to use a HashMap, because then the included params would be stored in the HashMap while processParam (from subprocess3) would be a class variable.
Is there a way to do this somehow?

XLIFF, versioning or translation updates process (Diff Leverage step)

I am considering using XLIFF to standardize localization efforts within the enterprise.
I am very new to XLIFF, and after some research I worked out the following general process for using it:
Extract strings from development project resources (.resx for .NET, .properties for Java). A good way to do this, as I found, is to use Rainbow from the Okapi Localization Toolbox with the "Utilities => Translation Kit Creation" command.
Then translate the extracted file as described at http://www.opentag.com/okapi/wiki/index.php?title=How_to_Translate_XLIFF_Documents, for example using the Virtaal application.
Finally, convert the translated XLIFF back into the original format (resx/properties), which is also possible with Rainbow via "Utilities => Translation Kit Post-Processing".
So far everything is clear. However, I would like to know what the best practices are when adding or modifying string resources. I would prefer not to have all resources re-translated every time a new string is added to the string resources in the original format (resx/properties).
It would also be great if there were versioning support for the translations, so that translations into multiple languages are consolidated (provide the same set of strings) if they are marked with the same version, and the version is updated when new strings are added or existing strings are modified.
Is there a ready-to-use solution for this? Or is it something we will have to build on our own?
EDIT:
I found the Diff Leverage step in Okapi Rainbow's Pipeline library, but I am having difficulty getting it to work. Here are two XLIFF files. The first one is the first version of the resources, translated into French; the second one is a file generated from the new version of the resources with the following changes:
1 string updated (AdminTitleResource is now Administration Resource)
1 string removed (HomeLinkResource is gone)
2 new strings added (Project and Company)
But running Diff Leverage pipeline doesn't produce a smart merge of the translations. Any ideas why?
The translated xliff for previous version of resources:
<?xml version="1.0" encoding="windows-1252"?><xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:okp="okapi-framework:xliff-extensions" version="1.2">
<file original="/Messages.resx" source-language="en-us" target-language="fr-fr" datatype="xml">
<body>
<trans-unit id="1" resname="AccessDenied" xml:space="preserve" approved="yes">
<source xml:lang="en-us">Access denied</source>
<target xml:lang="fr-fr" state="translated">Accès refusé </target>
<note>Error message</note>
</trans-unit>
<trans-unit id="2" resname="AdminTitleResource" xml:space="preserve" approved="yes">
<source xml:lang="en-us">Administration</source>
<target xml:lang="fr-fr" state="translated">Administration</target>
<note></note>
</trans-unit>
<trans-unit id="3" resname="HomeLinkResource" xml:space="preserve" approved="yes">
<source xml:lang="en-us">Main page</source>
<target xml:lang="fr-fr" state="translated">Page web principale</target>
<note></note>
</trans-unit>
<trans-unit id="4" resname="SelectCategoriesResource" xml:space="preserve" approved="yes">
<source xml:lang="en-us">Categories</source>
<target xml:lang="fr-fr" state="translated">Catégories</target>
<note></note>
</trans-unit>
<trans-unit id="5" resname="SelectConfigResource" xml:space="preserve">
<source xml:lang="en-us">Configuration</source>
<target xml:lang="fr-fr" state="needs-review-translation">Paramètres</target>
<note></note>
</trans-unit>
<trans-unit id="6" resname="SelectGroupsResource" xml:space="preserve">
<source xml:lang="en-us">User groups</source>
<target xml:lang="fr-fr" state="needs-review-translation">Utiliser le groupe</target>
<note></note>
</trans-unit>
</body>
</file>
</xliff>
How do I get XLIFF file with only strings that need to be translated?
The new file with changes listed above:
<?xml version="1.0" encoding="windows-1252"?><xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:okp="okapi-framework:xliff-extensions" version="1.2">
<file original="/Messages_v2.resx" source-language="en-us" target-language="fr-fr" datatype="xml">
<body>
<trans-unit id="1" resname="AccessDenied" xml:space="preserve">
<source xml:lang="en-us">Access denied</source>
<target xml:lang="fr-fr">Access denied</target>
<note>Error message</note>
</trans-unit>
<trans-unit id="2" resname="AdminTitleResource" xml:space="preserve">
<source xml:lang="en-us">Administration Resource</source>
<target xml:lang="fr-fr">Administration Resource</target>
<note></note>
</trans-unit>
<trans-unit id="3" resname="SelectCategoriesResource" xml:space="preserve">
<source xml:lang="en-us">Categories</source>
<target xml:lang="fr-fr">Categories</target>
<note></note>
</trans-unit>
<trans-unit id="4" resname="SelectConfigResource" xml:space="preserve">
<source xml:lang="en-us">Configuration</source>
<target xml:lang="fr-fr">Configuration</target>
<note></note>
</trans-unit>
<trans-unit id="5" resname="SelectGroupsResource" xml:space="preserve">
<source xml:lang="en-us">User groups</source>
<target xml:lang="fr-fr">User groups</target>
<note></note>
</trans-unit>
<trans-unit id="6" resname="Project" xml:space="preserve">
<source xml:lang="en-us">Project</source>
<target xml:lang="fr-fr">Project</target>
<note></note>
</trans-unit>
<trans-unit id="7" resname="Company" xml:space="preserve">
<source xml:lang="en-us">Company</source>
<target xml:lang="fr-fr">Company</target>
<note></note>
</trans-unit>
</body>
</file>
</xliff>
There is an answer for this here:
http://tech.groups.yahoo.com/group/okapitools/message/2494
EDIT: content of linked message
Hi Paul,
I am trying to figure out how to use the diff leverage
to improve the translation experience and get an
update xliff file when merging/leveraging existing
translation with new version of the document with
added/modified/removed strings.
As Jim noted, with XLIFF files you may be able to take advantage of
ID-based steps.
But the Diff leverage step would work too. Here is how to do it:
I've assumed you have the XLIFF files and just want to update them.
You could create pipelines that do additional things like create a
translation kit etc. but this will keep things simple.
First you need to put the new source file in the Input List 1 and the
translated file in the Input List 2.
Then you can create the following pipeline:
Raw document to Filter Events
Diff Leveraging
Filter Events to Raw Document
In the parameters for the Diff Leverage step: make sure the option
"Copy to/over the target" is set.
Then execute the pipeline.
I've attached a comparison (compare_out.html) between the original new
file and the output file. As you can see all the text that could be
leveraged is now in the output. Your entry 'AdminTitleResource' is not
translated because it's the source in the translated file is
different, and your two new entries are also not translated.
You'll also note the new attributes approved='yes' that are there to
indicate the translation was done. That extra flag can be used to
differentiate entries that need translation from the ones that have
been leveraged.
For some reason two of the leveraged entries do not have it: I'll have
to look at that and report back. It may be a bug or some condition I
don't recall (maybe Jim does).
The Id-Based Copy step could almost be better. It would copy the
translated text by matching in the resname of the entries. I say
almost because currently it does not look at the source texts, so you
get the translation even if the new source is different (it's not a
'leveraging' step). But we could add an option to make that extra
check and that would make the step work like a leveraging step. I'll
try to find the time to do this.
Hope this helps,
-yves

Ant string functions?

Does Ant have any way of doing uppercase/lowercase/capitalize/uncapitalize string manipulations? I looked at PropertyRegex, but I don't believe the last two are possible with that. Is there anything else?
From this thread, use an Ant <script> task:
<target name="capitalize">
<property name="foo" value="This is a normal line that doesn't say much"/>
<!-- Using Javascript functions to convert the string -->
<script language="javascript"> <![CDATA[
// getting the value
sentence = project.getProperty("foo");
// convert to lowercase and uppercase
lowercaseValue = sentence.toLowerCase();
uppercaseValue = sentence.toUpperCase();
// store the result in a new property
project.setProperty("allLowerCase",lowercaseValue);
project.setProperty("allUpperCase",uppercaseValue);
]]> </script>
<!-- Display the values -->
<echo>allLowerCase=${allLowerCase}</echo>
<echo>allUpperCase=${allUpperCase}</echo>
</target>
Output
D:\ant-1.8.0RC1\bin>ant capitalize
Buildfile: D:\ant-1.8.0RC1\bin\build.xml
capitalize:
[echo] allLowerCase=this is a normal line that doesn't say much
[echo] allUpperCase=THIS IS A NORMAL LINE THAT DOESN'T SAY MUCH
BUILD SUCCESSFUL
Update for WarrenFaith's comment, to separate the script into another target and pass a property from the called target back to the calling target:
Use antcallback from the ant-contrib jar:
<target name="testCallback">
<antcallback target="capitalize" return="allUpperCase">
<param name="param1" value="This is a normal line that doesn't say much"/>
</antcallback>
<echo>a = ${allUpperCase}</echo>
</target>
and the capitalize target uses the passed-in param1 thus:
<target name="capitalize">
<property name="foo" value="${param1}"/>
Final output
[echo] a = THIS IS A NORMAL LINE THAT DOESN'T SAY MUCH
You could use the script task with a JSR 223-supported scripting language such as JavaScript, JRuby, or Jython to do your string handling.
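The examples above cover upper and lower case only. The remaining two operations from the question, capitalize and uncapitalize, come down to changing the first character. Here is that logic as a small Java sketch; the class and method names are illustrative assumptions, not an Ant API, and the same two-line idea could be ported into a script block.

```java
import java.util.Locale;

public class StringCase {

    // Returns s with its first character upper-cased; null and "" are returned unchanged.
    static String capitalize(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        return s.substring(0, 1).toUpperCase(Locale.ROOT) + s.substring(1);
    }

    // Returns s with its first character lower-cased; null and "" are returned unchanged.
    static String uncapitalize(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        return s.substring(0, 1).toLowerCase(Locale.ROOT) + s.substring(1);
    }

    public static void main(String[] args) {
        String sentence = "this is a normal line that doesn't say much";
        System.out.println(capitalize(sentence));
        System.out.println(uncapitalize("This is a normal line"));
    }
}
```

Only the first character is touched; the rest of the string keeps its original case.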

How to use size of file inside Ant target

I'm currently in the process of replacing my homebrew build script with an Ant build script.
Now I need to replace various tokens with the size of a specific file. I know how to get the size in bytes via the <length> task and store it in a property, but I need the size in kilobytes and megabytes too.
How can I access the file size in other representations (KB, MB) or compute these values from within the Ant target and store them in a property?
Edit: After I discovered the <script> task, it was fairly easy to calculate the other values using some JavaScript and add a new property to the project using project.setNewProperty("foo", "bar");.
I found a solution that requires no third-party library or custom task: the <script> task allows JavaScript (or any other language supported by Apache BSF or JSR 223) to be used from within an Ant target.
<target name="insert-filesize">
<length file="${afile}" property="fs.length.bytes" />
<script language="javascript">
<![CDATA[
var length_bytes = project.getProperty("fs.length.bytes");
var length_kbytes = Math.round((length_bytes / 1024) * Math.pow(10,2))
/ Math.pow(10,2);
var length_mbytes = Math.round((length_kbytes / 1024) * Math.pow(10,2))
/ Math.pow(10,2);
project.setNewProperty("fs.length.kb", length_kbytes);
project.setNewProperty("fs.length.mb", length_mbytes);
]]>
</script>
<copy todir="${target.dir}">
<fileset dir="${source.dir}">
<include name="**/*" />
<exclude name="**/*.zip" />
</fileset>
<filterset begintoken="$$$$" endtoken="$$$$">
<filter token="SIZEBYTES" value="${fs.length.bytes}"/>
<filter token="SIZEKILOBYTES" value="${fs.length.kb}"/>
<filter token="SIZEMEGABYTES" value="${fs.length.mb}"/>
</filterset>
</copy>
</target>
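For reference, here is the same arithmetic as a standalone Java sketch with rounding to two decimal places. The class and method names are made up for illustration. One deliberate difference from the script above: the megabyte value is derived directly from the byte count rather than from the already rounded kilobyte value, which avoids rounding twice.

```java
public class FileSizeUnits {

    // Rounds a value to two decimal places.
    static double roundTo2(double value) {
        return Math.round(value * 100.0) / 100.0;
    }

    // Size in kilobytes (1 KB = 1024 bytes), rounded to two decimals.
    static double toKilobytes(long bytes) {
        return roundTo2(bytes / 1024.0);
    }

    // Size in megabytes (1 MB = 1024 * 1024 bytes), rounded to two decimals.
    static double toMegabytes(long bytes) {
        return roundTo2(bytes / (1024.0 * 1024.0));
    }

    public static void main(String[] args) {
        long bytes = 1536L; // sample value standing in for ${fs.length.bytes}
        System.out.println("KB: " + toKilobytes(bytes));
        System.out.println("MB: " + toMegabytes(bytes));
    }
}
```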
There is a math task at http://ant-contrib.sourceforge.net/ that may be useful.

Merge Two XML Files in Java

I have two XML files of similar structure which I wish to merge into one file.
Currently I am using EL4J XML Merge which I came across in this tutorial.
However, it does not merge as I expect it to. The main problem is that it is not merging the <results> from both files into one element, i.e. one that contains results 1, 2, 3 and 4.
Instead it just discards either 1 and 2 or 3 and 4 depending on which file is merged first.
So I would be grateful to anyone who has experience with XML Merge if they could tell me what I might be doing wrong or alternatively does anyone know of a good XML API for Java that would be capable of merging the files as I require?
Many Thanks for Your Help in Advance
Edit:
Could really do with some good suggestions on doing this so added a bounty. I've tried jdigital's suggestion but still having issues with XML merge.
Below is a sample of the type of structure of XML files that I am trying to merge.
<run xmloutputversion="1.02">
<info type="a" />
<debugging level="0" />
<host starttime="1237144741" endtime="1237144751">
<status state="up" reason="somereason"/>
<something avalue="test" test="alpha" />
<target>
<system name="computer" />
</target>
<results>
<result id="1">
<state value="test" />
<service value="gamma" />
</result>
<result id="2">
<state value="test4" />
<service value="gamma4" />
</result>
</results>
<times something="0" />
</host>
<runstats>
<finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
<result total="0" />
</runstats>
</run>
<run xmloutputversion="1.02">
<info type="b" />
<debugging level="0" />
<host starttime="1237144741" endtime="1237144751">
<status state="down" reason="somereason"/>
<something avalue="test" test="alpha" />
<target>
<system name="computer" />
</target>
<results>
<result id="3">
<state value="testagain" />
<service value="gamma2" />
</result>
<result id="4">
<state value="testagain4" />
<service value="gamma4" />
</result>
</results>
<times something="0" />
</host>
<runstats>
<finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
<result total="0" />
</runstats>
</run>
Expected output
<run xmloutputversion="1.02">
<info type="a" />
<debugging level="0" />
<host starttime="1237144741" endtime="1237144751">
<status state="down" reason="somereason"/>
<status state="up" reason="somereason"/>
<something avalue="test" test="alpha" />
<target>
<system name="computer" />
</target>
<results>
<result id="1">
<state value="test" />
<service value="gamma" />
</result>
<result id="2">
<state value="test4" />
<service value="gamma4" />
</result>
<result id="3">
<state value="testagain" />
<service value="gamma2" />
</result>
<result id="4">
<state value="testagain4" />
<service value="gamma4" />
</result>
</results>
<times something="0" />
</host>
<runstats>
<finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
<result total="0" />
</runstats>
</run>
Not very elegant, but you could do this with the DOM parser and XPath:
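import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class MergeXmlDemo {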
public static void main(String[] args) throws Exception {
// proper error/exception handling omitted for brevity
File file1 = new File("merge1.xml");
File file2 = new File("merge2.xml");
Document doc = merge("/run/host/results", file1, file2);
print(doc);
}
private static Document merge(String expression,
File... files) throws Exception {
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
XPathExpression compiledExpression = xpath
.compile(expression);
return merge(compiledExpression, files);
}
private static Document merge(XPathExpression expression,
File... files) throws Exception {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
.newInstance();
docBuilderFactory
.setIgnoringElementContentWhitespace(true);
DocumentBuilder docBuilder = docBuilderFactory
.newDocumentBuilder();
Document base = docBuilder.parse(files[0]);
Node results = (Node) expression.evaluate(base,
XPathConstants.NODE);
if (results == null) {
throw new IOException(files[0]
+ ": expression does not evaluate to node");
}
for (int i = 1; i < files.length; i++) {
Document merge = docBuilder.parse(files[i]);
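Node nextResults = (Node) expression.evaluate(merge,
XPathConstants.NODE);
if (nextResults == null) {
continue; // this file has no matching node, so there is nothing to merge
}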
while (nextResults.hasChildNodes()) {
Node kid = nextResults.getFirstChild();
nextResults.removeChild(kid);
kid = base.importNode(kid, true);
results.appendChild(kid);
}
}
return base;
}
private static void print(Document doc) throws Exception {
TransformerFactory transformerFactory = TransformerFactory
.newInstance();
Transformer transformer = transformerFactory
.newTransformer();
DOMSource source = new DOMSource(doc);
Result result = new StreamResult(System.out);
transformer.transform(source, result);
}
}
This assumes that you can hold at least two of the documents in RAM simultaneously.
I use XSLT to merge XML files. It allows me to adjust the merge operation to either just slam the content together or to merge at a specific level. It is a little more work (and XSLT syntax is kind of special) but super flexible. You need a few things here:
a) Include an additional file
b) Copy the original file 1:1
c) Design your merge point with or without duplication avoidance
a) In the beginning I have
<xsl:param name="mDocName">yoursecondfile.xml</xsl:param>
<xsl:variable name="mDoc" select="document($mDocName)" />
This allows you to point to the second file using $mDoc.
b) The instructions to copy a source tree 1:1 are 2 templates:
<!-- Copy everything including attributes as default action -->
<xsl:template match="*">
<xsl:element name="{name()}">
<xsl:apply-templates select="@*" />
<xsl:apply-templates />
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{name()}"><xsl:value-of select="." /></xsl:attribute>
</xsl:template>
With nothing else you get a 1:1 copy of your first source file. Works with any type of XML. The merging part is file specific. Let's presume you have event elements with an event ID attribute. You do not want duplicate IDs. The template would look like this:
<xsl:template match="events">
<xsl:variable name="allEvents" select="descendant::*" />
<events>
<!-- copies all events from the first file -->
<xsl:apply-templates />
<!-- Merge the new events in. You need to adjust the select clause -->
<xsl:for-each select="$mDoc/logbook/server/events/event">
<xsl:variable name="curID" select="@id" />
<xsl:if test="not ($allEvents[@id=$curID]/@id = $curID)">
<xsl:element name="event">
<xsl:apply-templates select="@*" />
<xsl:apply-templates />
</xsl:element>
</xsl:if>
</xsl:for-each>
</events>
</xsl:template>
Of course you can compare other things like tag names etc. Also it is up to you how deep the merge happens. If you don't have a key to compare, the construct becomes easier e.g. for log:
<xsl:template match="logs">
<xsl:element name="logs">
<xsl:apply-templates select="@*" />
<xsl:apply-templates />
<xsl:apply-templates select="$mDoc/logbook/server/logs/log" />
</xsl:element>
</xsl:template>
To run XSLT in Java use this:
Source xmlSource = new StreamSource(xmlFile);
Source xsltSource = new StreamSource(xsltFile);
Result xmlResult = new StreamResult(resultFile);
TransformerFactory transFact = TransformerFactory.newInstance();
Transformer trans = transFact.newTransformer(xsltSource);
// Load Parameters if we have any
if (ParameterMap != null) {
for (Entry<String, String> curParam : ParameterMap.entrySet()) {
trans.setParameter(curParam.getKey(), curParam.getValue());
}
}
trans.transform(xmlSource, xmlResult);
Or you can download the Saxon XSLT processor and do it from the command line (Linux shell example):
#!/bin/bash
notify-send -t 500 -u low -i gtk-dialog-info "Transforming $1 with $2 into $3 ..."
# That's actually the only relevant line below
java -cp saxon9he.jar net.sf.saxon.Transform -t -s:$1 -xsl:$2 -o:$3
notify-send -t 1000 -u low -i gtk-dialog-info "Extraction into $3 done!"
YMMV
Thanks to everyone for their suggestions. Unfortunately, none of the suggested methods turned out to be suitable in the end, as I needed rules for how different nodes of the structure were merged.
So what I did was take the DTD relating to the XML files I was merging and from that create a number of classes reflecting the structure.
From this I used XStream to deserialize the XML file back into classes.
I then annotated my classes, so merging became a matter of combining the rules assigned through annotations with some reflection to merge the objects, as opposed to merging the actual XML structure.
If anyone is interested in the code, which in this case merges Nmap XML files, please see http://fluxnetworks.co.uk/NmapXMLMerge.tar.gz. The code is not perfect and, I will admit, not massively flexible, but it definitely works. I'm planning to reimplement the system so that it parses the DTD automatically when I have some free time.
This is how it should look like using XML Merge:
action.default=MERGE
xpath.info=/run/info
action.info=PRESERVE
xpath.result=/run/host/results/result
action.result=MERGE
matcher.result=ID
You have to set the ID matcher for the //result node and the PRESERVE action for the //info node. Also beware that the .properties files XML Merge uses are case-sensitive: you have to use "xpath", not "XPath", in your .properties.
Don't forget to define -config parameter like this:
java -cp lib\xmlmerge-full.jar; ch.elca.el4j.services.xmlmerge.tool.XmlMergeTool -config xmlmerge.properties example1.xml example2.xml
It might help if you were explicit about the result that you're interested in achieving. Is this what you're asking for?
Doc A:
<root>
<a/>
<b>
<c/>
</b>
</root>
Doc B:
<root>
<d/>
</root>
Merged Result:
<root>
<a/>
<b>
<c/>
</b>
<d/>
</root>
Are you worried about scaling for large documents?
The easiest way to implement this in Java is to use a streaming XML parser (google for 'java StAX'). If you use the javax.xml.stream library you'll find that the XMLEventWriter has a convenient method XMLEventWriter#add(XMLEvent). All you have to do is loop over the top level elements in each document and add them to your writer using this method to generate your merged result. The only funky part is implementing the reader logic that only considers (only calls 'add') on the top level nodes.
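A rough sketch of that loop, assuming the documents are available as strings and that all of them share the same root element. The class name StaxMergeDemo and the merge helper are made up for illustration; only the javax.xml.stream calls are real API. The "funky part" is handled with a depth counter, so the root start and end tags of each input are skipped and everything nested inside them is passed to XMLEventWriter#add.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxMergeDemo {

    // Hypothetical helper: copies the children of each document's root element
    // into a single new root element named rootName.
    static String merge(String rootName, String... docs) {
        try {
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            XMLEventFactory events = XMLEventFactory.newInstance();
            writer.add(events.createStartElement("", "", rootName));
            for (String doc : docs) {
                XMLEventReader reader =
                        XMLInputFactory.newInstance().createXMLEventReader(new StringReader(doc));
                int depth = 0; // 0 = outside the root, 1 = directly inside the root
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (event.isStartElement()) {
                        depth++;
                        if (depth > 1) {
                            writer.add(event); // skip only the root's own start tag
                        }
                    } else if (event.isEndElement()) {
                        if (depth > 1) {
                            writer.add(event); // skip only the root's own end tag
                        }
                        depth--;
                    } else if (depth >= 1) {
                        writer.add(event); // text, comments etc. inside the root
                    }
                }
                reader.close();
            }
            writer.add(events.createEndElement("", "", rootName));
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Merge failed", e);
        }
    }

    public static void main(String[] args) {
        String docA = "<root><a/><b><c/></b></root>";
        String docB = "<root><d/></root>";
        System.out.println(merge("root", docA, docB));
    }
}
```

Because events are streamed straight from reader to writer, no document has to be held in memory as a tree. For files, the StringReader would be replaced by a file-based reader.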
I recently implemented this method if you need hints.
I took a look at the referenced link; it's odd that XMLMerge would not work as expected. Your example seems straightforward. Did you read the section entitled Using XPath declarations with XmlMerge? Using the example, try to set up an XPath for results and set it to merge. If I'm reading the doc correctly, it would look something like this:
XPath.resultsNode=results
action.resultsNode=MERGE
You might be able to write a Java app that deserializes the XML documents into objects, then "merges" the individual objects programmatically into a collection. You can then serialize the collection object back out to an XML file with everything "merged."
The JAXB API has some tools that can convert an XML document/schema into Java classes. The "xjc" tool might be able to do this, although I can't remember whether you can create classes directly from the XML doc or whether you have to generate a schema first. There are tools out there that can generate a schema from an XML doc.
Hope this helps... not sure if this is what you were looking for.
In addition to using Stax (which does make sense), it'd probably be easier with StaxMate (http://staxmate.codehaus.org/Tutorial). Just create 2 SMInputCursors, and child cursor if need be. And then typical merge sort with 2 cursors. Similar to traversing DOM documents in recursive-descent manner.
So, you're only interested in merging the 'results' elements? Everything else is ignored? The fact that input0 has an <info type="a"/> and input1 has an <info type="b"/> and the expected result has an <info type="a"/> seems to suggest this.
If you're not worried about scaling and you want to solve this problem quickly then I would suggest writing a problem-specific bit of code that uses a simple library like JDOM to consider the inputs and write the output result.
Attempting to write a generic tool that was 'smart' enough to handle all of the possible merge cases would be pretty time consuming - you'd have to expose a configuration capability to define merge rules. If you know exactly what your data is going to look like and you know exactly how the merge needs to be executed then I would imagine your algorithm would walk each XML input and write to a single XML output.
You can try Dom4J, which provides a very good means of extracting information using XPath queries and also allows you to write XML very easily. You just need to play around with the API for a while to do your job.
Sometimes you just need to concatenate XML files with a similar structure into one, like this:
File xml1:
<root>
<level1>
...
</level1>
<!--many records-->
<level1>
...
</level1>
</root>
File xml2:
<root>
<level1>
...
</level1>
<!--many records-->
<level1>
...
</level1>
</root>
In this case, the next procedure that uses jdom2 library can help you:
void concatXML(Path fSource,Path fDest) {
Document jdomSource = null;
Document jdomDest = null;
List<Element> elems = new LinkedList<Element>();
SAXBuilder jdomBuilder = new SAXBuilder();
try {
jdomSource = jdomBuilder.build(fSource.toFile());
jdomDest = jdomBuilder.build(fDest.toFile());
Element root = jdomDest.getRootElement();
root.detach();
String sourceNextElementName=((Element) jdomSource.getRootElement().getContent().get(1)).getName();
for (Element record:jdomSource.getRootElement().getDescendants(new ElementFilter(sourceNextElementName)))
elems.add(record);
for (Element elem : elems) (elem).detach();
root.addContent(elems);
Document newDoc = new Document(root);
XMLOutputter xmlOutput = new XMLOutputter();
xmlOutput.output(newDoc, System.out);
xmlOutput.setFormat(Format.getPrettyFormat());
xmlOutput.output(newDoc, Files.newBufferedWriter(fDest, Charset.forName("UTF-8")));
} catch (Exception e) {
e.printStackTrace();
}
}
Have you considered just not bothering with parsing the XML "properly" and just treating the files as big long strings and using boring old things such as hash maps and regular expressions...? This could be one of those cases where the fancy acronyms with X in them just make the job fiddlier than it needs to be.
Obviously this does depend a bit on how much data you actually need to parse out while doing the merge. But by the sound of things, the answer to that is not much.
