How to parse large SOAP response

I have a large SOAP response that I want to process and store in a database. I'm trying to parse the whole thing into a Document, as shown below:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setCoalescing(true);
DocumentBuilder db = dbf.newDocumentBuilder();
InputStream is = new ByteArrayInputStream(resp.getBytes());
Document doc = db.parse(is);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(fetchResult);
String result = (String) expr.evaluate(doc, XPathConstants.STRING);
resp is the SOAP response and fetchResult is
String fetchResult = "//result/text()";
I'm getting an out-of-memory error with this approach, so I am trying to process the response as a stream rather than loading the entire response into a Document.
But I cannot come up with the code.
Could anyone please help me out?

DOM and JDOM are memory-hungry parsing APIs: each builds a tree of the entire XML document in memory. You should use StAX or SAX instead, because they read the document as a stream of events and never need to hold the whole document in memory.
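As a rough illustration of the StAX route, here is a minimal sketch that pulls the text of the first result element out of a stream, which is approximately what the //result/text() expression in the question selects. The class and method names (SoapResultExtractor, firstResultText) are made up for this example, and it assumes the result element contains only text. To actually save memory, the InputStream should come straight from the HTTP connection rather than from a String that already holds the whole response.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
final class SoapResultExtractor {

    /** Returns the text of the first element whose local name is "result", or null if none is found. */
    static String firstResultText(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, true); // merge adjacent text and CDATA
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);  // SOAP messages carry no DTD
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                // Only the current event is held in memory; no tree is built.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "result".equals(reader.getLocalName())) {
                    // Assumption: <result> has text-only content; getElementText() throws otherwise.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }
}
```

If every result element has to be stored rather than just the first one, the same loop can write each value to the database as it is encountered instead of returning early.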

If this is in Java, you could try dom4j. It has a nice way of reading the XML using XPath expressions.
Additionally, dom4j provides an event-based model for processing XML documents. Using this event-based model allows us to prune the XML tree once parts of the document have been successfully processed, which avoids having to keep the entire document in memory.
Suppose you need to process a very large XML file that is generated externally by some database process and looks something like the following (where N is a very large number):
<ROWSET>
<ROW id="1">
...
</ROW>
<ROW id="2">
...
</ROW>
...
<ROW id="N">
...
</ROW>
</ROWSET>
So to process each <ROW> individually you can do the following.
// enable pruning mode to call me back as each ROW is complete
SAXReader reader = new SAXReader();
reader.addHandler( "/ROWSET/ROW",
new ElementHandler() {
public void onStart(ElementPath path) {
// do nothing here...
}
public void onEnd(ElementPath path) {
// process a ROW element
Element row = path.getCurrent();
Element rowSet = row.getParent();
Document document = row.getDocument();
...
// prune the tree
row.detach();
}
}
);
Document document = reader.read(url);
// The document will now be complete but all the ROW elements
// will have been pruned.
// We may want to do some final processing now
...
Please see How dom4j handle very large XML documents? to understand how it works.
Moreover dom4j works with any SAX parser via JAXP.
For more details see What XML parser does dom4j use?

The XPath & XPathExpression classes have methods that accept an InputSource argument.
InputStream input = ...;
InputSource source = new InputSource(input);
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("...");
String result = (String) expr.evaluate(source, XPathConstants.STRING);

Related

Format attributes for XML in Pretty format in java

I am trying to pretty-print an XML string. I want all of an element's attributes to be printed on a single line.
XML input:
<root><feeds attribute1="a" attribute2="b" attribute3="c" attribute4="d" attribute5="e" attribute6="f"> <id>2140</id><title>gj</title><description>ghj</description>
<msg/>
Expected output:
<root>
<feeds attribute1="a" attribute2="b" attribute3="c" attribute4="d" attribute5="e" attribute6="f">
<id>2140</id>
<title>gj</title>
<description>ghj</description>
<msg/>
</feeds>
Actual Output:
<root>
<feeds attribute1="a" attribute2="b" attribute3="c" attribute4="d"
attribute5="e" attribute6="f">
<id>2140</id>
<title>gj</title>
<description>ghj</description>
<msg/>
</feeds>
Here is my code to format the XML. I have also tried a SAX parser. I don't want to use dom4j.
public static String formatXml(String xml) {
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
LSSerializer writer = impl.createLSSerializer();
writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
writer.getDomConfig().setParameter("xml-declaration", false);
writer.getDomConfig().setParameter("well-formed", true);
LSOutput output = impl.createLSOutput();
ByteArrayOutputStream out = new ByteArrayOutputStream();
output.setByteStream(out);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xml));
writer.write(db.parse(is), output);
return new String(out.toByteArray());
}
Is there any way to keep the attributes on one line with a SAX or DOM parser? I am not looking for an additional library; I am looking for a solution that uses only the standard Java library.

A SAX or DOM parser will read your input string and allow your application to understand what was passed in. At some point in time your application then writes out that data, and that is the moment where you decide to insert additional whitespace (like linefeeds and tab characters) to pretty-print the document.
If you really want to use SAX and make the parser efficient, the best you could do is write the document while it is being parsed. So you would implement the ContentHandler interface (https://docs.oracle.com/en/java/javase/11/docs/api/java.xml/org/xml/sax/ContentHandler.html) such that it directly writes out the data, adding linefeeds where you feel they belong.
Check this tutorial to see how the ContentHandler can then be applied in a SAX parser: https://docs.oracle.com/javase/tutorial/jaxp/sax/parsing.html
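To make the idea concrete, here is a minimal sketch of such a handler using only JDK classes. It extends DefaultHandler (which implements ContentHandler) and writes every start tag, with all of its attributes, on a single line. The class name OneLineAttributeFormatter and the two-space indent are choices made for this example. It is deliberately simplified: whitespace around text is trimmed, and comments, processing instructions and the DOCTYPE are dropped.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: pretty-prints while parsing, keeping each start tag on one line.
final class OneLineAttributeFormatter extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private final StringBuilder text = new StringBuilder();
    private int depth = 0;
    private boolean startTagOpen = false; // "<name attrs" written, ">" or "/>" still pending

    static String format(String xml) throws Exception {
        OneLineAttributeFormatter handler = new OneLineAttributeFormatter();
        // The default factory is not namespace-aware, so qName is always populated.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String pending = takeText();
        if (startTagOpen) out.append(">\n"); // the parent turned out to have child elements
        if (!pending.isEmpty()) indent(depth).append(escape(pending)).append('\n');
        indent(depth).append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) { // all attributes stay on the same line
            out.append(' ').append(atts.getQName(i)).append("=\"")
               .append(escape(atts.getValue(i))).append('"');
        }
        startTagOpen = true;
        depth++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may arrive in several chunks, so buffer it
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String pending = takeText();
        depth--;
        if (startTagOpen) { // no child elements: write <a/> or <a>text</a> on one line
            out.append(pending.isEmpty() ? "/>" : ">" + escape(pending) + "</" + qName + ">");
            startTagOpen = false;
        } else {
            if (!pending.isEmpty()) indent(depth + 1).append(escape(pending)).append('\n');
            indent(depth).append("</").append(qName).append('>');
        }
        out.append('\n');
    }

    private String takeText() {
        String t = text.toString().trim();
        text.setLength(0);
        return t;
    }

    private StringBuilder indent(int level) {
        for (int i = 0; i < level; i++) out.append("  ");
        return out;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Calling OneLineAttributeFormatter.format(xml) on a well-formed version of the input from the question should give the layout the question asks for, with the feeds start tag and its six attributes on one line.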

Casting JDom 1.1.3 Element to Document without DocumentBuilderFactory or DocumentBuilder

I need to find the easiest and most efficient way to convert a JDOM element (with all its child nodes) to a Document. ownerDocument() won't work, as this is JDOM version 1.
Moreover, org.jdom.IllegalAddException: The Content already has an existing parent "root" exception occurs when using the following code.
DocumentBuilderFactory dbFac = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFac.newDocumentBuilder();
Document doc = null;
Element elementInfo = getElementFromDB();
doc = new Document(elementInfo);
XMLOutputter xmlOutput = new XMLOutputter();
byte[] byteInfo= xmlOutput.outputString(elementInfo).getBytes("UTF-8");
String stringInfo = new String(byteInfo);
doc = dBuilder.parse(stringInfo);

I think you have to use the following method of the element.
Document doc = <element>.getDocument();
Refer to the API documentation. It says:
Return this parent's owning document or null if the branch containing this parent is currently not attached to a document.

JDOM content can only have one parent at a time, and you have to detach it from one parent before you can attach it to another. Take this code:
Document doc = null;
Element elementInfo = getElementFromDB();
doc = new Document(elementInfo);
If that code is failing, it is because the getElementFromDB() method is returning an Element that is part of some other structure. You need to 'detach' it:
Element elementInfo = getElementFromDB();
elementInfo.detach();
Document doc = new Document(elementInfo);
OK, that solves the IllegalAddException.
On the other hand, if you just want to get the document node containing the element, JDOM 1.1.3 allows you to do that with getDocument:
Document doc = elementInfo.getDocument();
Note that the doc may be null.
To get the top most element available, try:
Element top = elementInfo;
while (top.getParentElement() != null) {
top = top.getParentElement();
}
In your case, your elementInfo you get from the DB is a child of an element called 'root', something like:
<root>
<elementInfo> ........ </elementInfo>
</root>
That is why you get the message you do, with the word "root" in it:
The Content already has an existing parent "root"

Read sitemap with XPath

I want to read a sitemap with XPath, but it doesn't work.
Here is my code:
private void evaluate2(String src){
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
try{
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new ByteArrayInputStream(src.getBytes()));
System.out.println(src);
XPathFactory xp_factory = XPathFactory.newInstance();
XPath xpath = xp_factory.newXPath();
XPathExpression expr = xpath.compile("//url/loc");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
System.out.println(nodes.getLength());
for (int i = 0; i < nodes.getLength(); i++) {
items.add(nodes.item(i).getNodeValue());
System.out.println(nodes.item(i).toString());
}
}catch(Exception e){
System.out.println(e.getMessage());
}
}
The remote sitemap source is retrieved beforehand and passed to evaluate2 through the variable src.
System.out.println(nodes.getLength()); displays 0.
My XPath query itself should be fine, because the same query works in PHP.
Do you see any errors in my code?
Thanks

You parse the sitemap with a namespace-aware parser (that's what factory.setNamespaceAware(true) does), but then attempt to access it using an XPath that does not use a namespace resolver (or reference any namespaces).
The simplest solution is to configure the parser as not namespace aware. As long as you're just parsing a self-contained sitemap, that shouldn't be a problem.
One more problem in your code is that you pass the sitemap contents as a String, then convert that String using the platform default encoding. This will work as long as your platform-default encoding matches that of the actual bytes that you retrieved from the server (assuming that you also created the string using the platform-default encoding). If it doesn't, you're likely to get a conversion error.

I think the input has a namespace. So you would have to set a NamespaceContext on the xpath object and use prefixes in your XPath, i.e. //url/loc should become //ns:url/ns:loc,
and then add the namespace prefix binding to the namespace context object.
You can find a NamespaceContext implementation in Apache WS Commons: http://ws.apache.org/commons/util/apidocs/index.html
ws-commons-utils
NamespaceContextImpl namespaceContextObj = new NamespaceContextImpl();
namespaceContextObj.startPrefixMapping("ns", "http://sitename/xx");
xpath.setNamespaceContext(namespaceContextObj);
XPathExpression expr = xpath.compile("//ns:url/ns:loc");
In case you don't know which namespaces are coming, you can get them from the document itself, but I doubt it'll be of much use. There are a few how-tos here:
http://www.ibm.com/developerworks/xml/library/x-nmspccontext/index.html
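If pulling in ws-commons-util is not an option, the same binding can be done with a small anonymous NamespaceContext from the JDK. The sketch below assumes the document uses the standard sitemap namespace http://www.sitemaps.org/schemas/sitemap/0.9 (check the xmlns attribute on your urlset element); the class name SitemapLocs and the prefix sm are arbitrary choices for this example. It also reads each value with getTextContent(), because getNodeValue() returns null for element nodes such as loc.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
final class SitemapLocs {
    // Assumption: the sitemap declares the standard sitemaps.org namespace as its default namespace.
    private static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    static List<String> extractLocs(String sitemapXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(sitemapXml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the arbitrary prefix "sm" to the sitemap namespace for use in the expression.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "sm".equals(prefix) ? SITEMAP_NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return SITEMAP_NS.equals(namespaceURI) ? "sm" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                String prefix = getPrefix(namespaceURI);
                return prefix == null
                        ? Collections.<String>emptyIterator()
                        : Collections.singletonList(prefix).iterator();
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate("//sm:url/sm:loc", doc, XPathConstants.NODESET);
        List<String> locs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            locs.add(nodes.item(i).getTextContent().trim()); // element text, not getNodeValue()
        }
        return locs;
    }
}
```

The prefix used in the XPath does not have to match anything in the document; only the namespace URI has to match.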

I can't see any errors in your code, so I guess the problem is the source.
Are you sure that the source file contains this element?
Maybe you could try this code to parse the String into a Document:
builder.parse(new InputSource(new StringReader(xml)));

Java Web Service returns string with &gt; and &lt; instead of > and <

I have a Java web service that returns a string. I'm creating the body of this XML string with the DocumentBuilder and Document classes. When I view the source of the returned XML (which looks fine in the browser window), instead of < and > it's returning &lt; and &gt; around the XML nodes.
Please help.
****UPDATE (including code example)
The code is not including any error catching, it was stripped for simplicity.
One code block and three methods are included:
The first code block (EXAMPLE SETUP) shows the basics of how the Document object is set up. The method appendPayment(...) is where the actual document building happens. It calls two helper methods, getTagValue(...) and prepareElement(...).
**Note: this code is meant to copy specific parts from a pre-existing XML string, xmlString, and grab the necessary information to be returned later.
****UPDATE 2
Response added at the end of the question
************ Follow-up question to the first answer is here:
How to return arbitrary XML Document using an Eclipse/AXIS2 POJO Service
EXAMPLE SETUP
{
//create new document
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder newDocBuilder = docFactory.newDocumentBuilder();
Document newDoc = newDocBuilder.newDocument();
Element rootElement = newDoc.createElement("AllTransactions");
newDoc.appendChild(rootElement);
appendPayment(stringXML, newDoc);
}
public static void appendPayment(String xmlString, Document newDoc) throws Exception
{
//convert string to inputstream
ByteArrayInputStream bais = new ByteArrayInputStream(xmlString.getBytes());
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document oldDoc = docBuilder.parse(bais);
oldDoc.getDocumentElement().normalize();
NodeList nList = oldDoc.getChildNodes();
Node nNode = nList.item(0);
Element eElement = (Element) nNode;
//Create new child node for this payment
Element transaction = newDoc.createElement("Transaction");
newDoc.getDocumentElement().appendChild(transaction);
//status
transaction.appendChild(prepareElement("status", eElement, newDoc));
//amount
transaction.appendChild(prepareElement("amount", eElement, newDoc));
}
private static String getTagValue(String sTag, Element eElement)
{
NodeList nlList = eElement.getElementsByTagName(sTag).item(0).getChildNodes();
Node nValue = (Node) nlList.item(0);
return nValue.getNodeValue();
}
private static Element prepareElement(String sTag, Element eElement, Document newDoc)
{
String str = getTagValue(sTag, eElement);
Element newElement = newDoc.createElement(sTag);
newElement.appendChild(newDoc.createTextNode(str));
return newElement;
}
Finally, I use the following method to convert the final Document object to a String
public static String getStringFromDocument(Document doc)
{
try
{
DOMSource domSource = new DOMSource(doc);
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.transform(domSource, result);
return writer.toString();
}
catch(TransformerException ex)
{
ex.printStackTrace();
return null;
}
}
The header type of the response is as follows
Server: Apache-Coyote/1.1
Content-Type: text/xml;charset=utf-8
Transfer-Encoding: chunked
This is an example response
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<getTransactionsResponse xmlns="http://services.paypal.com">
<getTransactionsReturn>&lt;AllTransactions&gt;&lt;Transaction&gt;&lt;status&gt;PENDING&lt;/status&gt;&lt;amount&gt;55.55&lt;/amount&gt;&lt;/transaction&gt;
</getTransactionsResponse>
</soapenv:Body>
</soapenv:Envelope>

The framework is doing what you tell it; your method returns a String, which means the generated WSDL should have a response message of type <xsd:string>. As we know, XML strings must encode certain characters as character entity references (i.e. "<" becomes "&lt;" so the XML parser treats it as a string, not the beginning of an XML element as you intend). If you want to return an XML document then you must define the XML structure in the WSDL <types> section and set the response message part to the appropriate element.
To put it another way, you are trying to send "typed" data without using the strong type system provided by SOAP/WSDL (namely XML schema); this is generally regarded as bad design (see Loosely typed versus strongly typed web services).
The ultimate solution is to define the response document via a proper XML Schema. If there is no set schema, as by the design of your service, then use the <xsd:any> type for the message response type, although this approach has its pitfalls. Moreover, such a redesign implies a schema-first (top-down) development model, and from the comment stream it seems that you are currently practicing a code-first (bottom-up) approach. Perhaps your tools provide a mechanism such as a "general XML document" return type or annotation which achieves the same effect.
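The escaping itself is easy to reproduce outside any web service framework. The sketch below is plain JAXP, not Axis2 code, and the class and method names are invented for the illustration. It puts the same payload into a wrapper element twice: once as a text node, which is effectively what happens when the service method returns a String, and once as imported element nodes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; it only illustrates text nodes versus element nodes.
final class StringVersusNodes {

    /** Wraps payloadXml in a getTransactionsReturn element, either as text or as real nodes. */
    static String wrap(String payloadXml, boolean asText) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document envelope = builder.newDocument();
        Element ret = envelope.createElement("getTransactionsReturn");
        envelope.appendChild(ret);

        if (asText) {
            // A String return value ends up as character data, so markup characters get escaped.
            ret.appendChild(envelope.createTextNode(payloadXml));
        } else {
            // Importing the parsed payload keeps it as elements, so nothing is escaped.
            Document payload = builder.parse(new InputSource(new StringReader(payloadXml)));
            ret.appendChild(envelope.importNode(payload.getDocumentElement(), true));
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(envelope), new StreamResult(writer));
        return writer.toString();
    }
}
```

With asText set to true the serialized wrapper contains the payload with its angle brackets turned into entity references, as in the response shown in the question; with asText set to false the payload appears as nested elements.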

How to deal with unknown entity references?

I'm parsing (a lot of) XML files that contain entity references which I don't know in advance (I can't change that fact).
For example:
xml = "<tag>I'm content with &funny; &entity; &references;.</tag>"
when i try to parse this using the following code:
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
final DocumentBuilder db = dbf.newDocumentBuilder();
final InputSource is = new InputSource(new StringReader(xml));
final Document d = db.parse(is);
i get the following exception:
org.xml.sax.SAXParseException: The entity "funny" was referenced, but not declared.
What I want to achieve instead is that the parser replaces every entity that is not declared (unknown to the parser) with an empty String ''.
Or, even better, is there a way to pass a map to the parser, like:
Map<String,String> entityMapping = ...
entityMapping.put("funny","very");
entityMapping.put("entity","important");
entityMapping.put("references","stuff");
so that i could do the following:
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
final DocumentBuilder db = dbf.newDocumentBuilder();
final InputSource is = new InputSource(new StringReader(xml));
db.setEntityResolver(entityMapping);
final Document d = db.parse(is);
If I then obtained the text from the document using this example code, I should receive:
I'm content with very important stuff.
Any suggestions? Of course, I would already be happy to just replace the unknown entities with empty strings.
Thanks,

The StAX API has support for this. Have a look at XMLInputFactory, it has a runtime property which dictates whether or not internal entities are expanded, or left in place. If set to false, then the StAX event stream will contain instances of EntityReference to represent the unexpanded entities.
If you still want a DOM as the end result, you can chain it together like this:
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
String xml = "my xml";
StringReader xmlReader = new StringReader(xml);
XMLEventReader eventReader = inputFactory.createXMLEventReader(xmlReader);
StAXSource source = new StAXSource(eventReader);
DOMResult result = new DOMResult();
transformer.transform(source, result);
Node document = result.getNode();
In this case, the resulting DOM will contain nodes of org.w3c.dom.EntityReference mixed in with the text nodes. You can then process these as you see fit.
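If only the text is needed and not a DOM, the entity references can also be handled directly on the stream. The following is a minimal sketch that uses the Map idea from the question; the class and method names are made up for this example. It assumes the StAX implementation reports undeclared entities as ENTITY_REFERENCE events when replacement is switched off, which is implementation-dependent, so test it against the parser you actually use.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
final class EntityMappingReader {

    /** Concatenates all text in the document, substituting entity references from the mapping. */
    static String textWithMappedEntities(String xml, Map<String, String> mapping)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Leave entity references in place instead of failing on undeclared ones.
        // Assumption: the parser then emits ENTITY_REFERENCE events for them.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.ENTITY_REFERENCE:
                        // getLocalName() is the entity name; unknown names become "".
                        text.append(mapping.getOrDefault(reader.getLocalName(), ""));
                        break;
                    default:
                        break; // element boundaries, comments, etc. are ignored here
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }
}
```

Any entity name missing from the map is replaced with an empty string, which matches the fallback behaviour the question asks for.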

Since your XML input seems to be available as a String, could you not do a simple pre-processing with regular expression replacement?
xml = "...";
/* replace entities before parsing */
for (Map.Entry<String,String> entry : entityMapping.entrySet()) {
xml = xml.replaceAll("&" + entry.getKey() + ";", entry.getValue());
}
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
...
It's quite hacky, and you may want to spend some extra effort to ensure that the regexps only match where they really should (think <entity name="&don't-match-me;"/>), but at least it's something...
Of course, there are more efficient ways to achieve the same effect than calling replaceAll() a lot of times.
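One such way is a single regex pass that looks up each entity name in the map. The sketch below is only an illustration: the class name EntityPreprocessor is invented, unknown entities are replaced with an empty string as the question suggests, and the five predefined XML entities as well as numeric character references are left untouched. It does not solve the false-match caveat mentioned above (for example inside CDATA sections), and replacement values are inserted as-is, so they must already be valid XML text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class EntityPreprocessor {
    // Named references only; numeric ones such as &#38; do not match because of the '#'.
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z_][A-Za-z0-9._-]*);");
    private static final Set<String> PREDEFINED =
            new HashSet<>(Arrays.asList("amp", "lt", "gt", "quot", "apos"));

    static String replaceEntities(String xml, Map<String, String> mapping) {
        Matcher matcher = ENTITY.matcher(xml);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            String replacement = PREDEFINED.contains(name)
                    ? matcher.group()                 // keep &amp; and friends for the parser
                    : mapping.getOrDefault(name, ""); // unknown entities become ""
            // quoteReplacement stops '$' and '\' in the value from being treated specially.
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

The returned string can then be handed to the DocumentBuilder exactly as in the question.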

You could add the entities at the beginning of the file. Look here for more information.
You could also take a look at this thread, where someone seems to have implemented the EntityResolver interface (you could also implement EntityResolver2) so that the entities can be processed on the fly (e.g. with your proposed Map).
WARNING: there is a bug in JDK 6, but you could try it with JDK 5.
