Java: write XML according to user defined XSD - java

I'm writing a tool to transform CSV-formatted data into XML. The user specifies the conversion rules: the XSD for the output, and which field in the CSV goes into which element of the resulting XML.
(very simplified use-case) Example:
CSV
Ciccio;Pippo;Pappo
1;2;3
XSD
(more stuff...)
<xs:element name="onetwo">
<xs:element name="three">
<xs:element name="four">
USER GIVES RULES
Ciccio -> onetwo
Pippo -> three
Pappo -> four
I've implemented this in C# using Dataset, how could I do it in Java? I know there's DOM, JAXB etc. but it seems XSD is only used to validate an otherwise created XML. Am I wrong?
Edit:
Everything needs to be at runtime. I don't know what kind of XSD I'll receive so I cannot instantiate objects that don't exist nor populate them with data. So I'm guessing the xjc is not an option.

Since you have the XSD for your output XML file, the best way to create this XML would be to use the Java Architecture for XML Binding (JAXB). The "Using JAXB" tutorial gives an overview of how to apply it to your requirement.
The basic idea is as follows:
Generate JAXB Java classes from an XML schema, i.e. the XSD that you have
Use schema-derived JAXB classes to unmarshal and marshal XML content in a Java application
Create a Java content tree from scratch using schema-derived JAXB classes
Marshal the content tree to your output XML file.
Here's another tutorial that you might find informative.

This is still work in progress, but you could recurse over the XSD writing out elements as you find them to a new document tree.
public void run() throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(new InputSource(new FileReader(
            "schema.xsd")));
    Document outputDoc = builder.newDocument();
    recurse(document.getDocumentElement(), outputDoc, outputDoc);
    TransformerFactory transFactory = TransformerFactory.newInstance();
    Transformer transformer = transFactory.newTransformer();
    StringWriter buffer = new StringWriter();
    transformer.transform(new DOMSource(outputDoc),
            new StreamResult(buffer));
    System.out.println(buffer.toString());
}
public void recurse(Node node, Node outputNode, Document outputDoc) {
    if (node.getNodeType() == Node.ELEMENT_NODE) {
        Element element = (Element) node;
        if ("xs:element".equals(node.getNodeName())) {
            Element newElement = outputDoc.createElement(element
                    .getAttribute("name"));
            outputNode = outputNode.appendChild(newElement);
            // map elements from CSV values here?
        }
        if ("xs:attribute".equals(node.getNodeName())) {
            //TODO required attributes
        }
    }
    NodeList list = node.getChildNodes();
    for (int i = 0; i < list.getLength(); i++) {
        recurse(list.item(i), outputNode, outputDoc);
    }
}
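The "map elements from CSV values here?" step could look roughly like the sketch below. It skips the schema walk entirely and only shows how the user's rules connect CSV columns to element names. The class name CsvRowMapper, the method toXml and the wrapper element "row" are made up for illustration; in the real tool the element names and nesting would come from the XSD.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class CsvRowMapper {

    /**
     * Turns one CSV data line into XML. "rules" maps a CSV column name to
     * the name of the target element (e.g. Ciccio -> onetwo).
     */
    static String toXml(String headerLine, String dataLine, Map<String, String> rules)
            throws ParserConfigurationException, TransformerException {
        String[] headers = headerLine.split(";");
        String[] values = dataLine.split(";");

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element row = doc.createElement("row"); // hypothetical wrapper element
        doc.appendChild(row);

        for (int i = 0; i < headers.length && i < values.length; i++) {
            String target = rules.get(headers[i]);
            if (target == null) {
                continue; // no rule for this column, so it is skipped
            }
            Element element = doc.createElement(target);
            element.setTextContent(values[i]);
            row.appendChild(element);
        }

        // Serialize the tree without an XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Calling CsvRowMapper.toXml("Ciccio;Pippo;Pappo", "1;2;3", rules) with the three rules from the question produces one element per mapped column inside the wrapper.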

Related

Adding element to XML using javax parser without document modification

I am trying to add elements to an XML document. The elements are added successfully, but the problem is that the parser modifies the original XML file in other places, e.g. it swaps namespace and id attributes or deletes duplicate namespace definitions. I need to get precisely the same document (same syntax, whitespace preserved), only with the specific elements added. I would greatly appreciate any suggestions. Here is my code:
public void appendTimestamp(String timestamp, String signedXMLFile, String timestampedXMLFile){
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try{
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new File(signedXMLFile));
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList list = (NodeList)xPath.evaluate("//*[local-name()='Signature']/*[local-name()='Object']/*[local-name()='QualifyingProperties']", doc, XPathConstants.NODESET);
if(list.getLength() != 1){
throw new Exception();
}
Node node = list.item(0);
Node unsignedProps = doc.createElement("xades:UnsignedProperties");
Node unsignedSignatureProps = doc.createElement("xzep:UnsignedSignatureProperties");
Node timestampNode = doc.createElement("xzep:SignatureTimeStamp");
timestampNode.appendChild(doc.createTextNode(timestamp));
unsignedSignatureProps.appendChild(timestampNode);
unsignedProps.appendChild(unsignedSignatureProps);
node.appendChild(unsignedProps);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "no");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
DOMSource source = new DOMSource(doc);
StringWriter writer = new StringWriter();
StreamResult stringWriter = new StreamResult(writer);
transformer.transform(source, stringWriter);
writer.flush();
System.out.println(writer.toString());
}catch(Exception e){
e.printStackTrace();
}
}
The original xml file:
...
<ds:Object Id="objectIdVerificationObject" xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
...
Modified xml file:
...
<ds:Object xmlns:ds="http://www.w3.org/2000/09/xmldsig#" Id="objectIdVerificationObject">
...
If you use the DOM model, the whole XML file is read, represented in memory as a node tree, and then saved back to XML in a way determined by the writer. So it is almost impossible to preserve the original XML format: you have no control over the serialization, and details such as attribute order and whitespace inside tags are not represented in the node tree at all.
You need to read the original XML piece by piece and output its content to the new file exactly as it was read; then, at the "right" place, add your new content and continue simply copying the original.
For example, you could use XMLStreamWriter and XMLStreamReader to achieve that, as they offer the low-level operations.
But it would probably be much easier to just copy the XML as text, line by line, until you recognize the insertion point, then create the new XML portion, append it as text, and continue copying.
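A minimal sketch of the copy-as-text idea, assuming the whole file fits in memory as a String and that the closing tag of the target element appears literally in the file. The class and method names are made up, and the exact tag text (for instance "</xades:QualifyingProperties>") is an assumption, since the question only matches on local names.

```java
final class TextInsertSketch {

    /**
     * Inserts "fragment" immediately before the last occurrence of
     * "closingTag". Every other character of the input is left untouched,
     * so attribute order and whitespace are preserved.
     */
    static String insertBefore(String xml, String closingTag, String fragment) {
        int at = xml.lastIndexOf(closingTag);
        if (at < 0) {
            throw new IllegalArgumentException("closing tag not found: " + closingTag);
        }
        return xml.substring(0, at) + fragment + xml.substring(at);
    }
}
```

The fragment itself can still be built with DOM and serialized separately; only the surrounding document is handled as plain text. If the closing tag can occur more than once, or inside a comment or CDATA section, a plain lastIndexOf is not enough and the search needs to be more careful.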

How to parse large SOAP response

I have a large SOAP response that I want to process and store in Database. I'm trying to process the whole thing as Document as below
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setCoalescing(true);
DocumentBuilder db = dbf.newDocumentBuilder();
InputStream is = new ByteArrayInputStream(resp.getBytes());
Document doc = db.parse(is);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(fetchResult);
String result = (String) expr.evaluate(doc, XPathConstants.STRING);
resp is the SOAP response and fetchResult is
String fetchResult = "//result/text()";
I'm getting an out-of-memory exception with this approach, so I was trying to process the document as a stream rather than consuming the entire response as a Document.
But I can not come up with the code.
Could any of you please help me out?
DOM and JDOM are memory-hungry parsing APIs: they build a tree of the whole XML document in memory. You should use StAX or SAX instead, because they read the document as a stream and only keep the current event in memory.
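A minimal StAX sketch for the case in the question, assuming the value of interest is the text of the first element whose local name is "result" and that this element contains only text. The class and method names are made up for illustration.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxResultReader {

    /** Returns the text of the first "result" element, or null if there is none. */
    static String firstResult(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // SOAP messages carry no DTD, so DTD processing is switched off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "result".equals(reader.getLocalName())) {
                    // getElementText() fails if the element has child elements.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }
}
```

Passing the response as an InputStream straight from the connection, instead of first holding it in a String, avoids keeping a second full copy in memory. Note that a single very large text value still ends up in memory as one String.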
If this is in Java, you could try using dom4j. It has a nice way of reading the XML using an XPath expression.
Additionally dom4j provides an event based model for processing XML documents. Using this event based model allows us to prune the XML tree when parts of the document have been successfully processed avoiding having to keep the entire document in memory.
Suppose you need to process a very large XML file that is generated externally by some database process and looks something like the following (where N is a very large number).
<ROWSET>
<ROW id="1">
...
</ROW>
<ROW id="2">
...
</ROW>
...
<ROW id="N">
...
</ROW>
</ROWSET>
So to process each <ROW> individually you can do the following.
// enable pruning mode to call me back as each ROW is complete
SAXReader reader = new SAXReader();
reader.addHandler( "/ROWSET/ROW",
new ElementHandler() {
public void onStart(ElementPath path) {
// do nothing here...
}
public void onEnd(ElementPath path) {
// process a ROW element
Element row = path.getCurrent();
Element rowSet = row.getParent();
Document document = row.getDocument();
...
// prune the tree
row.detach();
}
}
);
Document document = reader.read(url);
// The document will now be complete but all the ROW elements
// will have been pruned.
// We may want to do some final processing now
...
Please see How dom4j handle very large XML documents? to understand how it works.
Moreover dom4j works with any SAX parser via JAXP.
For more details see What XML parser does dom4j use?
The XPath & XPathExpression classes have methods that accept an InputSource argument.
InputStream input = ...;
InputSource source = new InputSource(input);
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("...");
String result = (String) expr.evaluate(source, XPathConstants.STRING);

Filling template xml in memory using Java and JDOM?

I want to create an XML document from a template at runtime in Java using JDOM.
Below is a sample template
<PARENT>
<ISSUES>
<ISSUE id="ISSUE-X">
<SUMMARY></SUMMARY>
<CATEGORY></CATEGORY>
..
</ISSUE>
</ISSUES>
</PARENT>
I want to load this template file using Java + JDOM and get the following
<PARENT>
<ISSUES>
<ISSUE id="ISSUE-1">
<SUMMARY>Test 1</SUMMARY>
<CATEGORY>Cat 1</CATEGORY>
..
</ISSUE>
<ISSUE id="ISSUE-2">
<SUMMARY>Test 2</SUMMARY>
<CATEGORY>Cat 2</CATEGORY>
..
</ISSUE>
</ISSUES>
</PARENT>
Ideally I want to create more ISSUE nodes and fill the data from DB & save to file
Reason I thought I could use Template is because there will be additional nodes under <ISSUE> which I need to fill from db & was thinking filling this via template would be much faster
Can someone guide me on how to get this done in Java using JDOM?
Note: This template will adhere to a XSD which I haven't given here.
Thanks in advance
EDIT: Code snippet below
String sXMLPath = "D:\\WS\\issue_sample.xml";
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder;
dBuilder = dbFactory.newDocumentBuilder();
org.w3c.dom.Document doc = dBuilder.parse(new File(sXMLPath));
DOMBuilder domBuilder = new DOMBuilder();
Document xConfigurationDocument;
xConfigurationDocument = domBuilder.build(doc);
XPathFactory xpfac = XPathFactory.instance();
XPathExpression<Element> xElements = xpfac.compile("//ns:MY-ISSUE/ns:ISSUES",Filters.element(),null,Namespace.getNamespace("ns", "http://www.myns.net/schemas/issue"));
List<Element> elements = xElements.evaluate(xConfigurationDocument);
for (Element xIssuesParent : elements) {
System.out.println(xIssuesParent.getName());
Element xCloneIssue = null ;
for (Element xIssueChild : xIssuesParent.getChildren())
{
xCloneIssue = xIssueChild.clone();
System.out.println(xIssueChild.getName());
xIssuesParent.removeContent(xIssueChild);
}
for (int i = 1; i < 3; i++) {
xCloneIssue.setAttribute("ID", "ISSUE-" + i);
xIssuesParent.addContent(xCloneIssue);
}
}
XMLOutputter xmlOutput = new XMLOutputter();
// display nice nice
xmlOutput.setFormat(Format.getPrettyFormat());
xmlOutput.output(xConfigurationDocument, new FileWriter("c:\\temp\\OutputFile.xml"));
I am trying out this in a sample application
The problem I face is that in the for loop (for (int i = 1; i < 3; i++)), after the first iteration I always get the following error: The Content already has an existing parent "ISSUES"
Obviously what I am missing is a new clone.
My question is: how can I always get a handle on an element and keep adding it to the parent?
If it will adhere to an XSD then take a look at org.jdom.input.DOMBuilder which you can parse a DTD into.

Java Web Service returns string with &gt; and &lt; instead of > and <

I have a Java web service that returns a string. I'm creating the body of this XML string with the DocumentBuilder and Document classes. When I view the source of the returned XML (which looks fine in the browser window), instead of < and > it's returning &lt; and &gt; around the XML nodes.
Please help.
****UPDATE (including code example)
The code is not including any error catching, it was stripped for simplicity.
One code block and three methods are included:
The first code block (EXAMPLE SETUP) shows the basics of how the Document object is set up. The method appendPayment(...) is where the actual document building happens. It calls two helper methods, getTagValue(...) and prepareElement(...).
**Note: this code is meant to copy specific parts from a pre-existing XML string, xmlString, and grab the necessary information to be returned later.
****UPDATE 2
Response added at the end of the question
************ Follow-up question to the first answer is here:
How to return arbitrary XML Document using an Eclipse/AXIS2 POJO Service
EXAMPLE SETUP
{
//create new document
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder newDocBuilder = docFactory.newDocumentBuilder();
Document newDoc = newDocBuilder.newDocument();
Element rootElement = newDoc.createElement("AllTransactions");
newDoc.appendChild(rootElement);
appendPayment(stringXML, newDoc);
}
public static void appendPayment(String xmlString, Document newDoc) throws Exception
{
//convert string to inputstream
ByteArrayInputStream bais = new ByteArrayInputStream(xmlString.getBytes());
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document oldDoc = docBuilder.parse(bais);
oldDoc.getDocumentElement().normalize();
NodeList nList = oldDoc.getChildNodes();
Node nNode = nList.item(0);
Element eElement = (Element) nNode;
//Create new child node for this payment
Element transaction = newDoc.createElement("Transaction");
newDoc.getDocumentElement().appendChild(transaction);
//status
transaction.appendChild(prepareElement("status", eElement, newDoc));
//amount
transaction.appendChild(prepareElement("amount", eElement, newDoc));
}
private static String getTagValue(String sTag, Element eElement)
{
NodeList nlList = eElement.getElementsByTagName(sTag).item(0).getChildNodes();
Node nValue = (Node) nlList.item(0);
return nValue.getNodeValue();
}
private static Element prepareElement(String sTag, Element eElement, Document newDoc)
{
String str = getTagValue(sTag, eElement);
Element newElement = newDoc.createElement(sTag);
newElement.appendChild(newDoc.createTextNode(str));
return newElement;
}
Finally, I use the following method to convert the final Document object to a String
public static String getStringFromDocument(Document doc)
{
try
{
DOMSource domSource = new DOMSource(doc);
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.transform(domSource, result);
return writer.toString();
}
catch(TransformerException ex)
{
ex.printStackTrace();
return null;
}
}
The header type of the response is as follows
Server: Apache-Coyote/1.1
Content-Type: text/xml;charset=utf-8
Transfer-Encoding: chunked
This is an example response
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<getTransactionsResponse xmlns="http://services.paypal.com">
<getTransactionsReturn>&lt;AllTransactions&gt;&lt;Transaction&gt;&lt;status&gt;PENDING&lt;/status&gt;&lt;amount&gt;55.55&lt;/amount&gt;&lt;/transaction&gt;
</getTransactionsResponse>
</soapenv:Body>
</soapenv:Envelope>
The framework is doing what you tell it; your method returns a String, which means the generated WSDL should have a response message of type <xsd:string>. As we know, XML strings must encode certain characters as character entity references (i.e. "<" becomes "&lt;" so the XML parser treats it as a string, not the beginning of an XML element as you intend). If you want to return an XML document then you must define the XML structure in the WSDL <types> section and set the response message part to the appropriate element.
To put it another way, you are trying to send "typed" data without using the strong type system provided by SOAP/WSDL (namely XML schema); this is generally regarded as bad design (see Loosely typed versus strongly typed web services).
The ultimate solution is to define the response document via a proper XML Schema. If there is no set schema, as by the design of your service, then use the <xsd:any> type for the message response type, although this approach has its pitfalls. Moreover, such a redesign implies a schema-first (top-down) development model, and from the comment stream it seems that you are currently practicing a code-first (bottom-up) approach. Perhaps your tools provide a mechanism such as a "general XML document" return type or annotation which achieves the same effect.
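The escaping described in this answer can be reproduced with plain DOM, without any web service framework. The sketch below is not Axis2 code; it only contrasts putting an XML string into a document as text with importing it as nodes. The class and method names are made up, and the element name getTransactionsReturn is borrowed from the response in the question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class EscapingDemo {

    /** Payload stored as a string value: its markup characters get escaped. */
    static String asText(String payloadXml)
            throws ParserConfigurationException, TransformerException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element ret = doc.createElement("getTransactionsReturn");
        doc.appendChild(ret);
        ret.setTextContent(payloadXml);
        return serialize(doc);
    }

    /** Payload parsed and imported as nodes: it stays real markup. */
    static String asNodes(String payloadXml)
            throws ParserConfigurationException, SAXException, IOException, TransformerException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document payload = builder.parse(new InputSource(new StringReader(payloadXml)));
        Document doc = builder.newDocument();
        Element ret = doc.createElement("getTransactionsReturn");
        doc.appendChild(ret);
        ret.appendChild(doc.importNode(payload.getDocumentElement(), true));
        return serialize(doc);
    }

    private static String serialize(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

A String return type puts the service in the position of asText: the framework only knows it has a string, so it must escape it. Declaring a structured response type is what moves it to the asNodes side.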

How to deal with unknown entity references?

I'm parsing (a lot of) XML files that contain entity references which I don't know in advance (I can't change that fact).
For example:
xml = "<tag>I'm content with &funny; &entity; &references;.</tag>"
When I try to parse this using the following code:
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
final DocumentBuilder db = dbf.newDocumentBuilder();
final InputSource is = new InputSource(new StringReader(xml));
final Document d = db.parse(is);
I get the following exception:
org.xml.sax.SAXParseException: The entity "funny" was referenced, but not declared.
What I want to achieve is that the parser replaces every entity that is not declared (unknown to the parser) with an empty String ''.
Or even better, is there a way to pass a map to the parser like:
Map<String,String> entityMapping = ...
entityMapping.put("funny","very");
entityMapping.put("entity","important");
entityMapping.put("references","stuff");
so that I could do the following:
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
final DocumentBuilder db = dbf.newDocumentBuilder();
final InputSource is = new InputSource(new StringReader(xml));
db.setEntityResolver(entityMapping);
final Document d = db.parse(is);
If I then obtained the text from the document using this example code, I should receive:
I'm content with very important stuff.
Any suggestions? Of course, I would already be happy to just replace the unknown entities with empty strings.
Thanks,
The StAX API has support for this. Have a look at XMLInputFactory, it has a runtime property which dictates whether or not internal entities are expanded, or left in place. If set to false, then the StAX event stream will contain instances of EntityReference to represent the unexpanded entities.
If you still want a DOM as the end result, you can chain it together like this:
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
String xml = "my xml";
StringReader xmlReader = new StringReader(xml);
XMLEventReader eventReader = inputFactory.createXMLEventReader(xmlReader);
StAXSource source = new StAXSource(eventReader);
DOMResult result = new DOMResult();
transformer.transform(source, result);
Node document = result.getNode();
In this case, the resulting DOM will contain nodes of org.w3c.dom.EntityReference mixed in with the text nodes. You can then process these as you see fit.
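If only the text is needed, the same factory property can be combined with the Map from the question directly on the stream, without building a DOM. This is a sketch under one important assumption: that the StAX implementation in use reports undeclared entities as ENTITY_REFERENCE events when replacement is switched off, rather than rejecting them. That behaviour is implementation-dependent, so it should be checked against the parser you actually run. The class and method names are made up.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class EntityMapReader {

    /**
     * Collects the text content of the document, substituting each entity
     * reference with its value from "entityMapping" (or "" if unknown).
     */
    static String textWithEntities(String xml, Map<String, String> entityMapping)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Leave entity references in the event stream instead of expanding them.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.SPACE:
                    case XMLStreamConstants.CDATA:
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.ENTITY_REFERENCE:
                        // For this event type getLocalName() is the entity name.
                        text.append(entityMapping.getOrDefault(reader.getLocalName(), ""));
                        break;
                    default:
                        break; // element tags, comments etc. are ignored here
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }
}
```

With an empty map every unknown entity is simply dropped, which matches the fallback asked for in the question.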
Since your XML input seems to be available as a String, could you not do a simple pre-processing with regular expression replacement?
xml = "...";
/* replace entities before parsing */
for (Map.Entry<String,String> entry : entityMapping.entrySet()) {
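    // Pattern.quote / Matcher.quoteReplacement (java.util.regex) keep '.', '$' and '\' in keys and values literal
    xml = xml.replaceAll(Pattern.quote("&" + entry.getKey() + ";"), Matcher.quoteReplacement(entry.getValue()));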
}
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
...
It's quite hacky, and you may want to spend some extra effort to ensure that the regexps only match where they really should (think <entity name="&don't-match-me;"/>), but at least it's something...
Of course, there are more efficient ways to achieve the same effect than calling replaceAll() a lot of times.
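One such single-pass variant, sketched with java.util.regex (Java 9 or later because of Set.of and the StringBuilder overloads). It replaces every named entity it finds, using the map where possible and an empty string otherwise, and leaves the five predefined XML entities and numeric character references alone. The class name and the ASCII-only name pattern are simplifications chosen for illustration; real XML names allow more characters, and the attribute-value caveat above still applies.

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityPreprocessor {

    // Simplified (ASCII-only) entity name; "&#123;" does not match because of '#'.
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z_][A-Za-z0-9._-]*);");

    // Entities every XML parser already knows; these must stay as they are.
    private static final Set<String> PREDEFINED = Set.of("lt", "gt", "amp", "apos", "quot");

    /** Replaces all named entity references in one pass over the input. */
    static String replaceEntities(String xml, Map<String, String> entityMapping) {
        Matcher matcher = ENTITY.matcher(xml);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String replacement = PREDEFINED.contains(name)
                    ? matcher.group()                          // keep &amp; etc. untouched
                    : entityMapping.getOrDefault(name, "");    // unknown entities become ""
            // quoteReplacement stops '$' and '\' in values from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Replacement values are inserted verbatim, so a value containing "<" or "&" would itself need escaping before the result is parsed.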
You could add the entity declarations at the beginning of the file. Look here for more information.
You could also take a look at this thread, where someone seems to have implemented the EntityResolver interface (you could also implement EntityResolver2!) so that you can process the entities on the fly (e.g. with your proposed Map).
WARNING: there is a bug in JDK 6, but you could try it with JDK 5.
