Two Xmls input, One output with XSL transform


I'm trying to write an XSL stylesheet that needs to take some values from one XML file and others from another, and output an XML document. I've searched online for some solution and I found that I've to put this <xsl:variable name='file' select="'file:///C:/Users/file.xml'"> inside my input XML which is supposed to load another XML and store it into a variable but from this I dont know how to get the tags value of the document.
The file.xml is this one:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<silosMediaObject>
<canBeDeleted>-1</canBeDeleted>
<checkedOut>-1</checkedOut>
<checkedOutBy>-1</checkedOutBy>
<deleted>-1</deleted>
<description>Traccia audio migrata da ASCN</description>
<externalResourcePath>TEST/ASCN/lq/3763_2015-05-05.mp3</externalResourcePath>
<fileName>3763_2015-05-05.mp3</fileName>
<framesPerSecond>-1</framesPerSecond>
<hasScheduledIngestion>false</hasScheduledIngestion>
<isArchived>-1</isArchived>
<isArchiving>-1</isArchiving>
<isAvailable>-1</isAvailable>
<isEncoding>-1</isEncoding>
<isRestoring>-1</isRestoring>
<isVerified>-1</isVerified>
<mediaObjectId>-1</mediaObjectId>
<mediaTypeId>-1</mediaTypeId>
<mosId>4347</mosId>
<resourceIsExternal>-1</resourceIsExternal>
<sourceMediaObjectId>-1</sourceMediaObjectId>
<state>AVAILABLE</state>
<versionLinkId>-1</versionLinkId>
</silosMediaObject>
The Java class I'm using to transform the file is this one:
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
public class TestMain {
public static void main(String[] args) throws IOException, URISyntaxException, TransformerException {
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(new File("C:\\Users\\xmltemplate_transformer.xsl"));
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new File("C:\\Users\\tobe_transformed.xml"));
transformer.transform(text, new StreamResult(new File("C:\\Users\\out.xml")));
}
}

I've searched online for some solution and I found that I've to put this <xsl:variable name='file' select="'file:///C:/Users/file.xml'"> inside my input XML which is supposed to load another XML and store it into a variable
I don't know where you got that idea, but you're confused. The select value is interpreted as an XPath expression. Yours is a string literal containing a URL with the file scheme. As far as XPath or XSLT is concerned, it is just a string. One might do something further to cause the file designated by that URL to be parsed, but what you've presented has no such effect.
In particular, you might have wanted to do this:
<xsl:variable name='file' select="document('file:///C:/Users/file.xml')"/>
The document() function is the secret sauce that actually causes the designated file to be read and parsed (if possible); when used as shown, its result is a node set containing the root node of the resulting document, or an empty node set if the designated document cannot be parsed and the processor elects not to signal an error.
Note: when you say you put the xsl:variable "inside my input XML", I presume you mean at an appropriate place inside your (XML-based) XSL stylesheet. If you actually mean that you have placed it in a different XML data file that you are processing, then it will have no direct effect there, other than to be included, as itself, in the input tree.
but from this I dont know how to get the tags value of the document.
Having successfully parsed the file, you can use the resulting node set anywhere that XSLT expects an expression that evaluates to a node set. In particular, within its scope, you can use a reference to the variable you've defined ($file) as an argument to XPath functions, or as a whole expression, such as the select expression of an xsl:apply-templates. Since you haven't said what, specifically, you want to do with the contents, I cannot be any more specific myself. See what you can do, and if you can't figure out the details then that could be a suitable topic for a new question.
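To make this concrete, here is a minimal Java sketch that runs such a stylesheet through JAXP. Only the silosMediaObject structure comes from the question; the <item><title> input document, the mediaFile parameter name, the class name and the temporary file (standing in for C:/Users/file.xml) are assumptions for illustration. The URI is passed in as a stylesheet parameter rather than hard-coded, and $file is then used like any other node set.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class MergeWithDocument {

    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output omit-xml-declaration='yes'/>"
        + "<xsl:param name='mediaFile'/>"
        // document() is what actually reads and parses the second file
        + "<xsl:variable name='file' select='document($mediaFile)'/>"
        + "<xsl:template match='/'>"
        + "<merged>"
        + "<title><xsl:value-of select='/item/title'/></title>"
        + "<fileName><xsl:value-of select='$file/silosMediaObject/fileName'/></fileName>"
        + "<state><xsl:value-of select='$file/silosMediaObject/state'/></state>"
        + "</merged>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms mainXml, taking extra values from the XML file at mediaFile. */
    static String merge(String mainXml, Path mediaFile) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            // Pass the file's URI as a parameter instead of hard-coding it in the stylesheet
            t.setParameter("mediaFile", mediaFile.toUri().toString());
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(mainXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes xml to a temporary file, standing in for C:/Users/file.xml. */
    static Path writeTemp(String xml) {
        try {
            Path p = Files.createTempFile("file", ".xml");
            Files.writeString(p, xml);
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path media = writeTemp("<silosMediaObject><fileName>3763_2015-05-05.mp3</fileName>"
                + "<state>AVAILABLE</state></silosMediaObject>");
        System.out.println(merge("<item><title>Episode 3763</title></item>", media));
    }
}
```

The result combines the title from the main input with fileName and state read through $file.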

Related

XML File loses its format after reading and writing in Java

I'm writing a program in Java that is going to read an XML file, make some modifications, and then write the file back in the same format.
The following is the code block that reads and writes the XML file:
final Document fileDocument = parseFileAsDocument(file);
final OutputFormat format = new OutputFormat(fileDocument);
// try-with-resources guarantees the FileWriter is closed, even if serialization fails
try (FileWriter out = new FileWriter(file)) {
final XMLSerializer serializer = new XMLSerializer(out, format);
serializer.serialize(fileDocument);
}
catch (final IOException e) {
System.out.println(e.getMessage());
}
This is the method used to parse the file:
private Document parseFileAsDocument(final File file) {
Document inputDocument = null;
try {
inputDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
} catch (ParserConfigurationException | SAXException | IOException e) {
// on failure inputDocument stays null
e.printStackTrace();
}
return inputDocument;
}
I'm noticing two changes after the file is written:
Before I had a node similar to this:
<instance ref='filter'>
<value></value>
</instance>
After reading and writing, the node looks like this:
<instance ref="filter">
<value/>
</instance>
As you can see above, 'filter' has been changed to "filter" with double quotes.
The second change is that <value></value> has been changed to <value/>. This happens across the XML file wherever we have a node like <tag></tag> with no value in between. If we have something like <tag>somevalue</tag>, there is no issue.
Any thoughts on how to keep the format of the XML nodes the same after writing?
I'd appreciate it!

You can't, and you shouldn't try. It's a bit like complaining that when you add 0123 and 0234, you get 357 without the leading zeroes. Leading zeroes in integers aren't considered significant, so arithmetic operations don't preserve them. The same happens to insignificant details of your XML, like the distinction between double quotes and single quotes, and the distinction between a self-closing tag and a start/end tag pair for an empty element. If any consumer of the XML is depending on these details, they need to be sent for retraining.
The most usual reason for asking for lexical details to be preserved is that you want to detect changes. But this means you are doing your comparisons the wrong way: you should be comparing at the logical level, not the physical level. One way to do comparisons is to canonicalize the XML, so whenever there is an arbitrary choice to be made between equivalent representations, it is made the same way.
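A minimal sketch of comparing at the logical level in Java, assuming a full canonicalizer is more than you need: it swaps in DOM Level 3 Node.isEqualNode rather than canonicalization. Both documents are parsed and their element trees compared, so quote style, attribute order and <value></value> versus <value/> make no difference. Whitespace-only text nodes still count, so differences in indentation would make the documents unequal. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LogicalXmlCompare {

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    /**
     * True when both strings parse to equal element trees. isEqualNode compares names,
     * namespaces, attribute sets (in any order) and children, not the original lexical form.
     */
    static boolean sameContent(String a, String b) {
        return parse(a).getDocumentElement().isEqualNode(parse(b).getDocumentElement());
    }

    public static void main(String[] args) {
        System.out.println(sameContent(
                "<instance ref='filter'><value></value></instance>",
                "<instance ref=\"filter\"><value/></instance>"));
    }
}
```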

Saxon/Javax Transform from multiple XML Files/Strings

This code is in Java and uses Saxon.
I am implementing a transform function that transforms an XML document together with several secondary XML sources.
None of the inputs are files, so I cannot use document() or other methods that reference files directly.
String transform(String xml, List<String> secondaryXmls, String xslt);
It returns the transformed XML result.
I have successfully applied the XSLT transformation to the single XML document, but I have difficulty applying a transformation that also uses the secondaryXmls. I have done my research and still could not find the right way to do this.
Here is a snapshot of the code:
TransformerFactory tFactory = TransformerFactory.newInstance("net.sf.saxon.TransformerFactoryImpl",null);
Document transformerDoc = loadXMLFromString(xslt);
Source transformerSource = new DOMSource(transformerDoc);
Transformer transformer = tFactory.newTransformer(transformerSource);
Document sourceDoc = loadXMLFromString(xml);
Source source = new DOMSource(sourceDoc);
DOMResult result = new DOMResult();
transformer.transform(source, result);
Document resultDoc = (Document) result.getNode();
return getStringFrom(resultDoc);
Thanks!
EDIT:
Which is the better way?
Concatenating all the XMLs, transforming, and returning only the original part, filtering out the concatenated secondary XMLs.
Writing code that adds
<xsl:variable name="asd" select="document('asd')">
on top of the XSLT string.

First thing - get rid of all that DOM stuff! Using the DOM with Saxon slows it down by a factor of ten. Let Saxon build the trees in its own format, by using a StreamSource or SAXSource, and a StreamResult. Or you can build a tree in Saxon format yourself, if you want, using the s9api DocumentBuilder class.
Then as to the answer to your question: here are three possible solutions:
(a) supply the documents as a stylesheet parameter of type document-node()* (that is, a sequence of document nodes). In the Java, convert your list of XML strings to a list of document nodes by calling Configuration.buildDocument() on each one.
(b) write a URIResolver whose effect is to interpret the URI doc/3 as meaning the third document in the list; then use document('doc/3') to fetch that document.
(c) write a CollectionURIResolver which makes the whole collection of documents available using the collection() function.
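Option (b) can be sketched with the standard JAXP javax.xml.transform.URIResolver interface. The doc/N convention comes from the answer; the class name StringListResolver, its 1-based numbering and its fallback rules are assumptions. It also follows the advice above to use StreamSource and StreamResult instead of DOM. With Saxon on the classpath, TransformerFactory.newInstance() will usually pick it up; otherwise pass "net.sf.saxon.TransformerFactoryImpl" as the question does. A stylesheet would then read the third secondary document with document('doc/3').

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Resolves URIs of the form "doc/N" to the N-th string in a list, counting from 1. */
public class StringListResolver implements URIResolver {
    private final List<String> docs;

    public StringListResolver(List<String> docs) {
        this.docs = docs;
    }

    @Override
    public Source resolve(String href, String base) {
        // Some processors may pass an already absolutized URI, so only the tail is inspected.
        int i = href.lastIndexOf("doc/");
        if (i < 0) {
            return null; // null asks the processor to fall back to its default resolution
        }
        int n;
        try {
            n = Integer.parseInt(href.substring(i + 4));
        } catch (NumberFormatException e) {
            return null;
        }
        if (n < 1 || n > docs.size()) {
            return null;
        }
        return new StreamSource(new StringReader(docs.get(n - 1)));
    }

    /** The question's transform(xml, secondaryXmls, xslt), using streams rather than DOM. */
    public static String transform(String xml, List<String> secondaryXmls, String xslt) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer t = factory.newTransformer(new StreamSource(new StringReader(xslt)));
            // document('doc/N') calls made during this transformation go through the resolver
            t.setURIResolver(new StringListResolver(secondaryXmls));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Returning null for unknown URIs keeps ordinary document() calls working alongside the in-memory documents.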

how to pass all parameters of input XML document into output XML document

The general task is XML document processing. It would be nice to have all the parameters of the input XML's root element, usually given as <p:MyDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:p="urn:where/to/look/4/schema" schemaVersion="1.3">,
passed to the output XML.
More specifically, how do I get them from the input XML?
I can pass them (if I know them in advance) as
Element rootElement = document.createElementNS("urn:hard/coded", root);
rootElement.setAttribute("schemaVersion", "myWish");

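You can get all the attribute details of the root element using document.getDocumentElement().getAttributes(), which returns a NamedNodeMap; namespace declarations such as xmlns:p appear in it as attributes too. (Calling getAttributes() on the Document node itself returns null, because only elements have attributes.)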
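A minimal sketch of copying the input root's name, namespace and attributes (including the xmlns declarations) onto a new output root, so nothing has to be known in advance. The class and method names are hypothetical; the input markup is the question's <p:MyDocument> element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

public class RootAttributeCopier {

    private static DocumentBuilderFactory factory() {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // keeps namespace URIs on elements and attributes
        return dbf;
    }

    static Document parse(String xml) {
        try {
            return factory().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Creates a new document whose root has the same qualified name, namespace and attributes. */
    static Document copyRoot(Document input) {
        try {
            Element inRoot = input.getDocumentElement();
            Document output = factory().newDocumentBuilder().newDocument();
            Element outRoot = output.createElementNS(inRoot.getNamespaceURI(), inRoot.getTagName());
            NamedNodeMap attrs = inRoot.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr a = (Attr) attrs.item(i);
                // xmlns:* declarations carry the http://www.w3.org/2000/xmlns/ namespace URI;
                // plain attributes such as schemaVersion have a null namespace URI
                outRoot.setAttributeNS(a.getNamespaceURI(), a.getName(), a.getValue());
            }
            output.appendChild(outRoot);
            return output;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```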

XSLT transform in xmlSignature java?

I have an XML document. I am signing part of the document using XML Signature. Before computing the digest, I want to apply an XSLT transform. From what I've read, XSLT converts an XML document to another format (which can also be XML). Now I am confused: where is the transformed document available? How do I retrieve values from this newly created document if I want to show them to the user?
My XML Document
<r1>
<user>asd</user>
<person>ghi</person>
</r1>
Code for Transformation
Transform t=fac.newTransform(Transform.XPATH,new XPathFilterParameterSpec("/r1/user"));
According to the XPath transform, whenever the value of the user element changes, the XML signature should fail validation, and if the person element's value changes, the signature should still validate. But when I change the person element's value, the signature does not validate. Why?

The xslt transform used when signing a document relates to how nodes in your source XML are selected when the signature is calculated.
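This question/answer by Dave relates to signing parts of an XML document using XPath Filter 2.0. The link to Sean Mullans' post in that answer suggests XPath Filter 2.0 is more appropriate for signing parts of a document, because with the original XPath filter the expression is evaluated once per node and converted to a boolean. An absolute expression such as /r1/user is non-empty, and therefore true, for every node, so every node, person included, ends up in the signed data.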
So based on the sun dsig example you can replace the Reference creation using:
List<XPathType> xpaths = new ArrayList<XPathType>();
xpaths.add(new XPathType("//r1/user", XPathType.Filter.INTERSECT));
Reference ref = fac.newReference
("", fac.newDigestMethod(DigestMethod.SHA1, null),
Collections.singletonList
(fac.newTransform(Transform.XPATH2,
new XPathFilter2ParameterSpec(xpaths))),
null, null);
This allows //r1/user to be protected with a signature while the rest of the document can be altered.
The problem with the xpath/xpath2 selection is that a signature can be generated for /some/node/that/does/not/exist. You are right to modify a test document and make sure the signature is working the way you expect.
You might test the document in a test program by generating a signature then tampering with the xml node before verification:
NodeList nlt = doc.getElementsByTagName("user");
nlt.item(0).getFirstChild().setTextContent("Something else");
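Putting these pieces together, here is a self-contained sketch that signs the question's r1 document with the XPath Filter 2.0 reference shown above, tampers with one element, then validates. The class and method names are hypothetical, and the key pair is generated on the fly only for the demonstration. It uses SHA-256 and RSA-SHA256 instead of SHA-1 because recent JDKs (17 and later) may reject SHA-1-based XML signatures under their default secure-validation policy. Changing person should leave the signature valid, while changing user should break it.

```java
import java.io.StringReader;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Collections;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignatureMethod;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.XPathFilter2ParameterSpec;
import javax.xml.crypto.dsig.spec.XPathType;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPath2SignatureDemo {

    /** Signs only //r1/user, changes the text of the named element, and reports whether the signature still validates. */
    static boolean signTamperVerify(String elementToTamper) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // the XML Signature DOM API expects a namespace-aware DOM
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(
                    new StringReader("<r1><user>asd</user><person>ghi</person></r1>")));

            KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
            kpg.initialize(2048);
            KeyPair kp = kpg.generateKeyPair();

            XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
            // XPath Filter 2.0 INTERSECT: keep only the subtree selected by //r1/user
            XPathType userOnly = new XPathType("//r1/user", XPathType.Filter.INTERSECT);
            Reference ref = fac.newReference("",
                    fac.newDigestMethod(DigestMethod.SHA256, null),
                    Collections.singletonList(fac.newTransform(Transform.XPATH2,
                            new XPathFilter2ParameterSpec(Collections.singletonList(userOnly)))),
                    null, null);
            SignedInfo si = fac.newSignedInfo(
                    fac.newCanonicalizationMethod(CanonicalizationMethod.INCLUSIVE,
                            (C14NMethodParameterSpec) null),
                    fac.newSignatureMethod(SignatureMethod.RSA_SHA256, null),
                    Collections.singletonList(ref));
            // The Signature element is appended as the last child of <r1>
            fac.newXMLSignature(si, null).sign(new DOMSignContext(kp.getPrivate(), doc.getDocumentElement()));

            // Tamper with the in-memory document after signing
            doc.getElementsByTagName(elementToTamper).item(0).getFirstChild().setTextContent("Something else");

            Node sigNode = doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);
            DOMValidateContext vc = new DOMValidateContext(kp.getPublic(), sigNode);
            return fac.unmarshalXMLSignature(vc).validate(vc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("person changed, still valid: " + signTamperVerify("person"));
        System.out.println("user changed, still valid:   " + signTamperVerify("user"));
    }
}
```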
A more reliable alternative to an xpath selector might be to put an ID on the xml document elements you hope to sign like:
<r1>
<user id="sign1">asd</user>
<person>ghi</person>
</r1>
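then reference this ID as the URI in the first parameter of newReference, with an enveloped transform. (For "#sign1" to resolve, the DOM must treat id as an ID attribute, for example by calling setIdAttribute("id", true) on the user element before signing and before validating.)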
Reference ref = fac.newReference
("#sign1", fac.newDigestMethod(DigestMethod.SHA1, null),
Collections.singletonList
(fac.newTransform(Transform.ENVELOPED,(TransformParameterSpec) null)),
null, null);
For the output, a signature operation adds a new Signature element to the DOM you have loaded in memory. You can stream the output by transforming it like this:
TransformerFactory tf = TransformerFactory.newInstance();
Transformer trans = tf.newTransformer();
trans.setOutputProperty(OutputKeys.INDENT, "yes");
trans.transform(new DOMSource(doc), new StreamResult(System.out));

The XSLT specification doesn't define what happens to the result document; that's defined by the API specifications of your chosen XSLT processor. For example, if you invoke XSLT from Java using the JAXP interface, you can ask for the result as a DOM tree in memory or for it to be serialized to a specified file on disk.
You have tagged your question "Java" which is the only clue you give to your processing environment. My guess is you want to transform to a DOM and then use DOM interfaces to get the value from the new document. Though if you're using XSLT 2.0 and Saxon, the s9api interface is much more usable than the native JAXP interface.
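A minimal sketch of that approach with plain JAXP: the stylesheet writes into a DOMResult, and ordinary DOM calls then read values from the result tree. The stylesheet, the <display><name> element names and the class name are hypothetical; the input is the r1 document from the question.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class TransformToDom {

    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/r1'>"
        + "<display><name><xsl:value-of select='user'/></name></display>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet into an in-memory DOM tree instead of a file. */
    static Document transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            DOMResult result = new DOMResult(); // with no node supplied, a new Document is created
            t.transform(new StreamSource(new StringReader(xml)), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document out = transform("<r1><user>asd</user><person>ghi</person></r1>");
        // Ordinary DOM navigation works on the result tree
        System.out.println(out.getElementsByTagName("name").item(0).getTextContent());
    }
}
```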

The xslt part defines only the transformation definition, nothing else.
Have a look at this:
java xslt tutorial
In Francois Gravel's answer, input.xml is the file that will be transformed and transform.xslt is the XSLT definition that describes how to transform it. output.out holds the results; this may be XML, but it can also be HTML, a flat file, and so on.
This is where I started when I was using XSLT:
http://www.w3schools.com/xsl/default.asp
Have a look at this also:
http://www.w3schools.com/xsl/tryxslt.asp?xmlfile=cdcatalog&xsltfile=cdcatalog

Are there any advantages to using an XSLT stylesheet compared to manually parsing an XML file using a DOM parser

For one of our applications, I've written a utility that uses java's DOM parser. It basically takes an XML file, parses it and then processes the data using one of the following methods to actually retrieve the data.
getElementsByTagName()
getElementAtIndex()
getFirstChild()
getNextSibling()
getTextContent()
Now I have to do the same thing, but I am wondering whether it would be better to use an XSLT stylesheet. The organisation that sends us the XML file keeps changing their schema, meaning that we have to change our code to cater for these schema changes. I'm not very familiar with XSLT processing, so I'm trying to find out whether I'm better off using XSLT stylesheets rather than "manual parsing".
The reason XSLT stylesheets look attractive is that I think that if the schema for the XML file changes, I will only need to change the stylesheet. Is this correct?
The other thing I would like to know is which of the two (XSLT transformer or DOM parser) is better performance-wise. For the manual option, I just use the DOM parser to parse the XML file. How does the XSLT transformer actually parse the file? Does it include additional overhead compared to manually parsing the XML file? The reason I ask is that performance is important because of the nature of the data I will be processing.
Any advice?
Thanks
Edit
Basically, what I am currently doing is parsing an XML file and processing the values in some of the XML elements. I don't transform the XML file into any other format. I just extract some values, fetch a row from an Oracle database and save a new row into a different table. The XML file I parse just contains reference values I use to retrieve some data from the database.
Is xslt not suitable in this scenario? Is there a better approach that I can use to avoid code changes if the schema changes?
Edit 2
Apologies for not being clear enough about what I am doing with the XML data. Basically, there is an XML file which contains some information. I extract this information from the XML file and use it to retrieve more information from a local database. The data in the XML file is more like reference keys for the data I need in the database. I then take the content I extracted from the XML file plus the content I retrieved from the database using a specific key from the XML file, and save that data into another database table.
The problem I have is that I know how to write a DOM parser to extract the information I need from the XML file, but I was wondering whether using an XSLT stylesheet was a better option, as I wouldn't have to change the code if the schema changes.
Reading the responses below, it sounds like XSLT is only used for transforming an XML file into another XML file or some other format. Given that I don't intend to transform the XML file, there is probably no need to add the additional overhead of parsing the XSLT stylesheet as well as the XML file.

Transforming XML documents into other formats is XSLT's reason for being. You can use XSLT to output HTML, JSON, another XML document, or anything else you need. You don't specify what kind of output you want. If you're just grabbing the contents of a few elements, then maybe you won't want to bother with XSLT. For anything more, XSLT offers an elegant solution. This is primarily because XSLT understands the structure of the document it's working on. Its processing model is tree traversal and pattern matching, which is essentially what you're manually doing in Java.
You could use XSLT to transform your source data into the representation of your choice. Your code will always work on this structure. Then, when the organization you're working with changes the schema, you only have to change your XSLT to transform the new XML into your custom format. None of your other code needs to change. Why should your business logic care about the format of its source data?
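A minimal sketch of that layering, using two hypothetical vendor schema versions that put the same key in different places. Each version gets its own mapping stylesheet into a fixed internal <lookup><key> format, and lookupKey, standing in for the business logic, only ever sees that internal format. The schemas, element names and class name are assumptions; building the stylesheets from a string template only keeps the example short, and in practice each mapping would be its own .xsl file.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class VendorMapping {

    /** Builds a stylesheet mapping a vendor-specific key path to <lookup><key>...</key></lookup>. */
    static String mappingFor(String keyPath) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:template match='/'>"
                + "<lookup><key><xsl:value-of select='" + keyPath + "'/></key></lookup>"
                + "</xsl:template>"
                + "</xsl:stylesheet>";
    }

    // Hypothetical vendor schema v1: <order><ref>42</ref></order>
    static final String V1_TO_INTERNAL = mappingFor("/order/ref");
    // Hypothetical vendor schema v2 moves the key: <order><header><key>42</key></header></order>
    static final String V2_TO_INTERNAL = mappingFor("/order/header/key");

    /** Applies a mapping stylesheet and returns the internal-format document. */
    static Document toInternal(String vendorXml, String mappingXslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(mappingXslt)));
            DOMResult result = new DOMResult();
            t.transform(new StreamSource(new StringReader(vendorXml)), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Business logic: knows only the internal format, never the vendor schema. */
    static String lookupKey(Document internal) {
        return internal.getElementsByTagName("key").item(0).getTextContent();
    }
}
```

When the vendor moves the key again, only a new mapping stylesheet is needed; lookupKey stays the same.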

You are right that XSLT's processing model based on a rule-based event-driven approach makes your code more resilient to changes in the schema.
Because it's a different processing model to the procedural/navigational approach that you use with DOM, there is a learning and familiarisation curve, which some people find frustrating; if you want to go this way, be patient, because it will be a while before the ideas click into place. Once you are there, it's much easier than DOM programming.
The performance of a good XSLT processor will be good enough for your needs. It's of course possible to write very inefficient code, just as it is in any language, but I've rarely seen a system where XSLT was the bottleneck. Very often the XML parsing takes longer than the XSLT processing (and that's the same cost as with DOM or JAXB or anything else.)
As others have said, a lot depends on what you want to do with the XML data, which you haven't really explained.

I think that what you need is actually an XPath expression. You could configure that expression in some property file or whatever you use to retrieve your setup parameters.
In this way, you'd just change the XPath expression whenever your customer hides away the info you use in yet another place.
Basically, XSLT is overkill; you just need an XPath expression. A single XPath expression lets you home in on each value you are after.
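A minimal sketch of the configurable-expression idea using the JAXP 1.3 javax.xml.xpath API: the expressions live in a java.util.Properties object (normally loaded from a file with props.load), so when a value moves, only the configuration changes. The property key customer.ref, the message documents and the class name are hypothetical.

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ConfiguredXPath {

    /** Evaluates the XPath stored under key in props against xml and returns its string value. */
    static String extract(String xml, Properties props, String key) {
        String expr = props.getProperty(key);
        if (expr == null) {
            throw new IllegalArgumentException("No XPath configured for " + key);
        }
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // In practice: props.load(...) from a configuration file
        props.setProperty("customer.ref", "/message/customer/@ref");
        System.out.println(extract("<message><customer ref='C-17'/></message>", props, "customer.ref"));
    }
}
```

If the customer later moves the reference into, say, /message/header/customerRef, only the property value changes.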
Update
Since we are now talking about JDK 1.4 I've included below 3 different ways of fetching text in an XML file using XPath. (as simple as possible, no NPE guard fluff I'm afraid ;-)
Starting from the most up to date.
0. First the sample XML config file
<?xml version="1.0" encoding="UTF-8"?>
<config>
<param id="MaxThread" desc="MaxThread" type="int">250</param>
<param id="rTmo" desc="RespTimeout (ms)" type="int">5000</param>
</config>
1. Using JAXP 1.3 standard part of Java SE 5.0
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
public class TestXPath {
private static final String CFG_FILE = "test.xml" ;
private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";
public static void main(String[] args) {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setNamespaceAware(true);
DocumentBuilder builder;
try {
builder = docFactory.newDocumentBuilder();
Document doc = builder.parse(CFG_FILE);
XPathExpression expr = XPathFactory.newInstance().newXPath().compile(XPATH_FOR_PRM_MaxThread);
Object result = expr.evaluate(doc, XPathConstants.NUMBER);
if ( result instanceof Double ) {
System.out.println( ((Double)result).intValue() );
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
2. Using JAXP 1.2 standard part of Java SE 1.4-2
import javax.xml.parsers.*;
import org.apache.xpath.XPathAPI;
import org.w3c.dom.*;
public class TestXPath {
private static final String CFG_FILE = "test.xml" ;
private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";
public static void main(String[] args) {
try {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setNamespaceAware(true);
DocumentBuilder builder = docFactory.newDocumentBuilder();
Document doc = builder.parse(CFG_FILE);
Node param = XPathAPI.selectSingleNode( doc, XPATH_FOR_PRM_MaxThread );
if ( param instanceof Text ) {
System.out.println( Integer.decode(((Text)(param)).getNodeValue() ) );
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
3. Using JAXP 1.1 standard part of Java SE 1.4 + jdom + jaxen
You need to add these 2 jars (available from www.jdom.org - binaries, jaxen is included).
import java.io.File;
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;
public class TestXPath {
private static final String CFG_FILE = "test.xml" ;
private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";
public static void main(String[] args) {
try {
SAXBuilder sxb = new SAXBuilder();
Document doc = sxb.build(new File(CFG_FILE));
Element root = doc.getRootElement();
XPath xpath = XPath.newInstance(XPATH_FOR_PRM_MaxThread);
Text param = (Text) xpath.selectSingleNode(root);
Integer maxThread = Integer.decode( param.getText() );
System.out.println( maxThread );
} catch (Exception e) {
e.printStackTrace();
}
}
}

Since performance is important, I would suggest using a SAX parser for this. JAXB will give you roughly the same performance as DOM parsing PLUS it will be much easier and more maintainable. Handling the changes in the schema also should not affect you badly if you are using JAXB: just get the new schema and regenerate the classes. If you have a bridge between JAXB and your domain logic, then the changes can be absorbed in that layer without worrying about XML. I prefer treating XML as just a message that is used in the messaging layer. All the application code should be agnostic of the XML schema.
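If you take the SAX route, a handler that only collects the text of the elements you care about might look like this. No tree is built, so memory use stays flat regardless of file size. SaxExtractor and textOf are hypothetical names, the sample input is the config file from the XPath answer above, and the sketch does not handle nested elements that share the same name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxExtractor {

    /** Collects the text content of every element with the given local name. */
    static List<String> textOf(String xml, String elementName) {
        List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null while inside a matching element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (elementName.equals(localName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) {
                    current.append(ch, start, length); // may be called several times per element
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (current != null && elementName.equals(localName)) {
                    values.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // ensures localName is populated
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(textOf("<config><param id='MaxThread'>250</param>"
                + "<param id='rTmo'>5000</param></config>", "param"));
    }
}
```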
