Multiple XSLT transformations - Java

My input is a set of XML files (anywhere from 1,000 to 100,000 of them), and I have to convert each one into 6 CSV files that are later saved to the database. My question is how to do this more efficiently in Java. Right now I create 6 transformers with different XSLT stylesheets and transform each XML document 6 times by hand. I also tried doing it in a single XSLT transformation using xsl:result-document. That works, but because there can be more than one input XML file, the data in the result files is overwritten after each transformation. My idea is to collect the data from all the XML files in the CSV files and then copy it into the DB tables.
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTemplates(stylesource).newTransformer();
Transformer transformer2 = tf.newTemplates(stylesource2).newTransformer();
Transformer transformer3 = tf.newTemplates(stylesource3).newTransformer();
Transformer transformer4 = tf.newTemplates(stylesource4).newTransformer();
Transformer transformer5 = tf.newTemplates(stylesource5).newTransformer();
Transformer transformer6 = tf.newTemplates(stylesource6).newTransformer();
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
public void transformXmlToCsv(String content) throws TransformerException, IOException, SAXException {
Document doc = db.parse(new InputSource(new StringReader(content)));
Source source = new DOMSource(doc);
transformer.transform(source, outputTarget);
transformer2.transform(source, outputTarget2);
transformer3.transform(source, outputTarget3);
transformer4.transform(source, outputTarget4);
transformer5.transform(source, outputTarget5);
transformer6.transform(source, outputTarget6);
}

One improvement you could make would be to avoid repeated parsing of the source document by building the input tree once. For example, by building a DOM tree and using a DOMSource, or (better if you're using Saxon) by using Saxon interfaces to build the tree once in Saxon's internal format.
Another improvement would be to only create one TransformerFactory for everything. Creating a TransformerFactory is typically expensive (it involves a search of the classpath) and there's no need to ever create more than one.
It should be easy to fix your problem with xsl:result-document. There are many ways of doing it, e.g. by directing the output of each transformation to a different directory, but I can't tell what the best way is without more information.
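As a rough illustration of the first two points, here is a minimal sketch that compiles each stylesheet once, parses each input once, and appends every result to a long-lived Writer so that output accumulates across input files instead of being overwritten. It uses only standard JAXP; the class name MultiCsvTransformer and the method transformAll are hypothetical, and it assumes each stylesheet uses method="text" and does not emit a header row on every run.

```java
import java.io.StringReader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: one instance per worker thread (DocumentBuilder is not thread-safe).
public class MultiCsvTransformer {
    private final List<Templates> compiled = new ArrayList<>();
    private final DocumentBuilder builder;

    public MultiCsvTransformer(List<String> stylesheets) throws Exception {
        // A single factory compiles every stylesheet exactly once.
        TransformerFactory factory = TransformerFactory.newInstance();
        for (String xslt : stylesheets) {
            compiled.add(factory.newTemplates(new StreamSource(new StringReader(xslt))));
        }
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XSLT expects a namespace-aware DOM
        builder = dbf.newDocumentBuilder();
    }

    // Parses the input once, then appends the i-th stylesheet's result to the i-th writer.
    public void transformAll(String xml, List<? extends Writer> outputs) throws Exception {
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        for (int i = 0; i < compiled.size(); i++) {
            compiled.get(i).newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(outputs.get(i)));
        }
    }
}
```

The caller would open six writers (for example, buffered file writers) before the first input, call transformAll once per XML document, and close the writers after the last one.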

Related

Java - correctly indenting an XML made from multiple sources

I'm trying to correctly indent (indentation = 2) an XML file written by a Java Spring Boot application. The problem is that I'm not making up the XML myself, I'm creating the XML by joining parts of various source XML with different schemas.
My code is:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setAttribute("indent-number", 2);
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
DOMSource source = new DOMSource(root.getOwnerDocument());
StreamResult file = new StreamResult(outputFile);
transformer.transform(source, file);
This seems to ignore completely the "indentation" parameter: it just copies whatever indentation was present in the original XML Files.
To copy the nodes, I tried both:
root.appendChild(document.adoptNode(extractedNodeToCopy.cloneNode(true)));
and
root.appendChild(document.importNode(extractedNodeToCopy, true));
But this doesn't change anything.
I don't get error messages, the result is simply indented as the original documents were (so every tag has a different style).
You are using XSLT already. The code
transformerFactory.newTransformer()
instantiates an XSLT transformer with a default XSL template that performs the 'identity' transformation (see also https://docs.oracle.com/en/java/javase/11/docs/api/java.xml/javax/xml/transform/TransformerFactory.html#newTransformer() )
So to change the indentation in your case, you could either:
(a) use Java to prepare the data in your DOM (stored in root), adding text nodes that contain whitespace to your taste, or
(b) do a similar job within a stylesheet (see also Problems Trying to Pretty Print XSLT Output) and make use of it by calling
transformerFactory.newTransformer(your stylesheet)
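A common reason the serializer leaves the old layout alone is that the whitespace-only text nodes copied from the source documents are still in the DOM. A minimal sketch of the Java-side option is to remove those nodes before serializing and let the transformer add its own indentation. The class and method names here are hypothetical, and the sketch assumes the documents have no mixed content or xml:space="preserve" sections whose whitespace matters.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class Reindenter {
    // Recursively removes text nodes that consist only of whitespace.
    public static void stripWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceNodes(child);
            }
            child = next;
        }
    }

    // Strips the inherited layout, then serializes with a requested indent of 2.
    public static String toIndentedString(Document doc) throws Exception {
        stripWhitespaceNodes(doc);
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        // Implementation-specific property, as used in the question.
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

The exact whitespace the serializer produces can differ between JDK versions and XSLT implementations, so treat the output layout as something to check in your own environment.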

Convert org.w3c.dom.Document to File file

I have an XML file as an object in Java, as org.w3c.dom.Document doc, and I want to convert it into File file. How can I convert the type Document to File?
thanks
I want to add metadata elements in an existing xml file (standard dita) with type File.
I know a way to add elements to the file, but then I have to convert the file to a org.w3c.dom.Document. I did that with the method loadXML:
private Document loadXML(File f) throws Exception {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
return builder.parse(f);
}
After that I change the org.w3c.dom.Document, then I want to continue with the flow of the program and I have to convert the Document doc back to a File file.
What is an efficient way to do that? Or is there a better solution for getting some elements into an XML File without converting it?
You can use a Transformer to output the entire XML content to a File, as shown below:
Document doc =...
// write the content into xml file
DOMSource source = new DOMSource(doc);
FileWriter writer = new FileWriter(new File("/tmp/output.xml"));
StreamResult result = new StreamResult(writer);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.transform(source, result);
With JDK 1.8.0 a short way is to use the built-in XMLSerializer (which was introduced with JDK 1.4 as a fork of Apache Xerces)
import com.sun.org.apache.xml.internal.serialize.XMLSerializer;
Document doc = //use your method loadXML(File f)
//change Document
java.io.Writer writer = new java.io.FileWriter("MyOutput.xml");
XMLSerializer xml = new XMLSerializer(writer, null);
xml.serialize(doc);
Use an object of type OutputFormat to configure output, for example like this:
OutputFormat format = new OutputFormat(Method.XML, StandardCharsets.UTF_8.toString(), true);
format.setIndent(4);
format.setLineWidth(80);
format.setPreserveEmptyAttributes(true);
format.setPreserveSpace(true);
XMLSerializer xml = new XMLSerializer(writer, format);
Note that these classes come from a com.sun.* package, which is undocumented and therefore generally not seen as the preferred way of doing things. However, with javax.xml.transform.OutputKeys you cannot specify the amount of indentation or the line width, for example. So if that is important, this solution should help.

Saxon/Javax Transform from multiple XML Files/Strings

This code is in Java and uses Saxon.
I am implementing a transform function that transforms an XML document together with several secondary XML sources.
None of the inputs are files, so I cannot use document() or other methods that refer to files directly.
String transform(String xml, List<String> secondaryXmls, String xslt);
It outputs the transformed XML result.
I have succeeded in applying the XSLT transformation to the single XML document, but I am having difficulty applying a transformation that also uses the secondaryXmls. I have done my research and still cannot find the right way to do this.
Here is a snapshot of the code:
TransformerFactory tFactory = TransformerFactory.newInstance("net.sf.saxon.TransformerFactoryImpl",null);
Document transformerDoc = loadXMLFromString(xslt);
Source transformerSource = new DOMSource(transformerDoc);
Transformer transformer = tFactory.newTransformer(transformerSource);
Document sourceDoc = loadXMLFromString(xml);
Source source = new DOMSource(sourceDoc);
DOMResult result = new DOMResult();
transformer.transform(source, result);
Document resultDoc = (Document) result.getNode();
return getStringFrom(resultDoc);
Thanks!
EDIT:
Which is the better way:
(1) concatenate all the XMLs, transform, and return only the original part, filtering out the concatenated secondary XMLs, or
(2) write code that adds
<xsl:variable name="asd" select="document('asd')"/>
at the top of the XSLT string?
First thing - get rid of all that DOM stuff! Using the DOM with Saxon slows it down by a factor of ten. Let Saxon build the trees in its own format, by using a StreamSource or SAXSource, and a StreamResult. Or you can build a tree in Saxon format yourself, if you want, using the s9api DocumentBuilder class.
Then as to the answer to your question: here are three possible solutions:
(a) supply the documents as a stylesheet parameter of type document-node()* (that is, a sequence of document nodes). In the Java, convert your list of XML strings to a list of document nodes by calling Configuration.buildDocument() on each one.
(b) write a URIResolver whose effect is to interpret the URI doc/3 as meaning the third document in the list; then use document('doc/3') to fetch that document.
(c) write a CollectionURIResolver which makes the whole collection of documents available using the collection() function.
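Option (b) can be sketched with the standard JAXP URIResolver interface alone. The example below is a minimal illustration, not Saxon-specific code: the class name InMemoryTransform and the 1-based doc/N naming scheme are assumptions, and it obtains the default TransformerFactory (to use Saxon, request its factory class as in the question). It also follows the advice above by using StreamSource and StreamResult instead of DOM.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class InMemoryTransform {
    // Accepts "doc/N", with or without a resolved base path in front; N is 1-based.
    private static final Pattern DOC_URI = Pattern.compile("(?:.*/)?doc/(\\d+)");

    public static String transform(String xml, List<String> secondaryXmls, String xslt)
            throws TransformerException {
        URIResolver resolver = (href, base) -> {
            Matcher m = DOC_URI.matcher(href);
            if (!m.matches()) {
                return null; // not ours: let the processor resolve it normally
            }
            int index = Integer.parseInt(m.group(1)) - 1;
            if (index < 0 || index >= secondaryXmls.size()) {
                throw new TransformerException("No secondary document for " + href);
            }
            return new StreamSource(new StringReader(secondaryXmls.get(index)));
        };

        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        transformer.setURIResolver(resolver); // consulted for document() calls

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

A stylesheet would then read the second secondary document with document('doc/2').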

How to write to an existing XML file using java

I've got an XML file, employees.xml, that looks like this:
<Employees>
<Employee>
<FirstName>myFirstName</FirstName>
<LastName>myLastName</LastName>
<Salary>10000</Salary>
</Employee>
</Employees>
Now how do I add new Employee elements to the existing XML file? Example code would be highly appreciated.
You can't 'write nodes to an existing XML file.' You can read an existing XML file into memory, add to the data model, and then write a new file. You can rename the old file and write the new file under the old name. But there is no commonly-used Java utility that will modify an XML file in place.
To add to an existing XML file, you generally need to read it in to an internal data structure, add the needed data in the internal form and then write it all out again, overwriting the original file.
The internal structure can be DOM or one of your own making, and there are multiple ways of both reading it in and writing it out.
If the data is reasonably small, DOM is probably easiest, and there is some sample code in the answers to this related question.
If your data is large, DOM will not do. Possible approaches are to use SAX to read and write (though SAX is traditionally only a reading mechanism) as described in an answer to another related question.
You might also want to consider JAXB or (maybe even best) StAX.
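For the small-file case, the read, modify, and rewrite cycle described above might look like the following minimal DOM sketch. The class name EmployeeAppender and its method are hypothetical; the element names come from the employees.xml in the question, and the sketch assumes the file fits comfortably in memory.

```java
import java.io.File;
import java.io.OutputStream;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EmployeeAppender {
    public static void addEmployee(File xmlFile, String first, String last, String salary)
            throws Exception {
        // 1. Read the whole file into a DOM tree.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xmlFile);

        // 2. Build the new <Employee> element and append it to the root <Employees>.
        Element employee = doc.createElement("Employee");
        employee.appendChild(textElement(doc, "FirstName", first));
        employee.appendChild(textElement(doc, "LastName", last));
        employee.appendChild(textElement(doc, "Salary", salary));
        doc.getDocumentElement().appendChild(employee);

        // 3. Write the tree back out, overwriting the original file.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        try (OutputStream out = Files.newOutputStream(xmlFile.toPath())) {
            t.transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    private static Element textElement(Document doc, String name, String value) {
        Element e = doc.createElement(name);
        e.setTextContent(value);
        return e;
    }
}
```

If a crash halfway through the write would be a problem, writing to a temporary file and renaming it over the original, as the first answer suggests, is the safer variant.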
Please use xstream to parse your file as an object, or create a list with employees and then you can directly convert that to xml.
I think this link can be useful for you.
Here are samples showing how to read/parse, modify (add elements), and save (write to the XML file again).
The following samples you can find at: http://www.petefreitag.com/item/445.cfm
Read:
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse("/path/to/file.xml");
Modify:
// attributes
Node earth = doc.getFirstChild();
NamedNodeMap earthAttributes = earth.getAttributes();
Attr galaxy = doc.createAttribute("galaxy");
galaxy.setValue("milky way");
earthAttributes.setNamedItem(galaxy);
// nodes
Node canada = doc.createElement("country");
canada.setTextContent("ca");
earth.appendChild(canada);
Write XML file:
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
// this sample writes to a StringWriter; pass a File to StreamResult instead to save to a file
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
System.out.println(xmlString);
You need to use DOM to write/edit your xml file.
It's very easy:
You just need to create nodes and add attributes to them.
You can also write/edit XSLT files by using DOM.
Just search Google for "DOM Java".

Control order of XML attributes in output file in Java

How do I control the order that the XML attributes are listed within the output file?
It seems by default they are getting alphabetized, which the program I'm sending this XML to apparently isn't handling.
e.g. I want zzzz to show first, then bbbbb in the following code.
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("requests");
doc.appendChild(root);
root.appendChild(request);
root.setAttribute("zzzzzz", "My z value");
root.setAttribute("bbbbbbb", "My b value");
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(file));
transformer.transform(source, result);
The order of attributes is defined to be insignificant in XML: no conformant XML application should produce results that depend on the order in which attributes appear. Therefore, serializers (code that produces lexical XML as output) will usually give you no control over the order.
Now, it would sometimes be nice to have that control for cosmetic reasons, because XML is designed to be human-readable. So there's a valid reason for wanting the feature. But the fact is, I know of no serializer that offers it.
I had the same issue when I used the XML DOM API for writing the file. To resolve the problem I had to use XMLStreamWriter. Attributes appear in the XML file in the order in which you write them with XMLStreamWriter.
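A minimal sketch of that approach, using the attribute names from the question, is shown below. The class name OrderedAttributes is hypothetical. With the JDK's built-in StAX implementation the attributes are emitted in the order of the writeAttribute calls, though other implementations are not obliged to behave the same way.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class OrderedAttributes {
    public static String build() throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("requests");
        // Attributes are written in call order: zzzzzz first, then bbbbbbb.
        writer.writeAttribute("zzzzzz", "My z value");
        writer.writeAttribute("bbbbbbb", "My b value");
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }
}
```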
XML Canonicalisation results in a consistent attribute ordering, primarily to allow one to check a signature over some or all of the XML, though there are other potential uses. This may suit your purposes.
If you don't want to use another framework just for a custom attribute order you can simply add an order identifier to the attributes.
<someElement a__price="32" b__amount="3"/>
After the XML serializer is done post process the raw XML like so:
public static String removeAttributeOrderIdentifiers(String xml) {
// keep the leading space so that adjacent attributes stay separated
return xml.replaceAll(
" [a-z]__(.+?=\")",
" $1"
);
}
And you will get:
<someElement price="32" amount="3"/>
