Parsing XML stored in a string - Java

I have a string that contains XML, so when I print it to the console it looks like an XML file. I want to read values from this string in Java using the DOM or SAX library, but I don't know how, because the string is not stored in a file.
<?xml version="1.0" encoding="UTF-8"?>
<ADT_A01 xmlns="urn:hl7-org:v2xml">
<MSH>
<MSH.1>|</MSH.1>
<MSH.2>^~\&amp;</MSH.2>
<MSH.3>
<HD.1>HIS</HD.1>
</MSH.3>
<MSH.4>
<HD.1>RIH</HD.1>
</MSH.4>
<MSH.5>
<HD.1>EKG</HD.1>
</MSH.5>
<MSH.6>
<HD.1>EKG</HD.1>
</MSH.6>
<MSH.7>199904140038</MSH.7>
<MSH.9>
<MSG.1>ADT</MSG.1>
<MSG.2>A01</MSG.2>
</MSH.9>
<MSH.11>
<PT.1>P</PT.1>
</MSH.11>
<MSH.12>
<VID.1>2.6</VID.1>
</MSH.12>
</MSH>
<PID>
<PID.1>1</PID.1>
<PID.3>
<CX.1>1478895</CX.1>
<CX.2>4</CX.2>
<CX.3>M10</CX.3>
<CX.4>
<HD.1>PA</HD.1>
</CX.4>
</PID.3>
<PID.5>
<XPN.1>
<FN.1>XTEST</FN.1>
</XPN.1>
<XPN.2>PATIENT</XPN.2>
</PID.5>
<PID.7>19591123</PID.7>
<PID.8> F</PID.8>
</PID>
</ADT_A01>

For DOM, one option is to use an InputSource:
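// Needs java.io.StringReader, javax.xml.parsers.*, org.w3c.dom.Document, org.xml.sax.InputSource
String str = "<xml>...</xml>";
// newDocumentBuilder() is an instance method, so obtain a factory instance first
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(str)));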
You can use a similar strategy with SAX, since it supports InputSource as well.
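To make both routes concrete, here is a minimal sketch that reads one value from an XML string with DOM and with SAX. The class name XmlFromString, the helper names, and the shortened SAMPLE message are all made up for illustration; only the InputSource-over-StringReader idea comes from the answer above. Namespace awareness is switched on because the message in the question declares a default namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class XmlFromString {

    // Hypothetical, shortened version of the HL7 message from the question.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<ADT_A01 xmlns=\"urn:hl7-org:v2xml\">"
            + "<MSH><MSH.7>199904140038</MSH.7></MSH>"
            + "<PID><PID.7>19591123</PID.7></PID>"
            + "</ADT_A01>";

    /** DOM: text of the first element with this local name, or null if absent. */
    static String domText(String xml, String localName) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // the sample uses a default namespace
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node node = doc.getElementsByTagNameNS("*", localName).item(0);
            return node == null ? null : node.getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML string", e);
        }
    }

    /** SAX: same lookup using callbacks; returns "" if the element is absent. */
    static String saxText(String xml, String localName) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inside; // currently within the wanted element
            private boolean done;   // first match already collected

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (!done && local.equals(localName)) {
                    inside = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inside) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (inside && local.equals(localName)) {
                    inside = false;
                    done = true;
                }
            }
        };
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // needed so the local name is reported
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML string", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(domText(SAMPLE, "PID.7"));
        System.out.println(saxText(SAMPLE, "MSH.7"));
    }
}
```

The DOM version is shorter and fine for messages of this size; the SAX version avoids building a tree, which only starts to matter for much larger input.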

Related

Format attributes for XML in Pretty format in java

I am trying to pretty-print an XML string. I want all the attributes of an element to be printed on a single line.
XML input:
<root><feeds attribute1="a" attribute2="b" attribute3="c" attribute4="d" attribute5="e" attribute6="f"> <id>2140</id><title>gj</title><description>ghj</description>
<msg/>
Expected output:
<root>
<feeds attribute1="a" attribute2="b" attribute3="c" attribute4="d" attribute5="e" attribute6="f">
<id>2140</id>
<title>gj</title>
<description>ghj</description>
<msg/>
</feeds>
Actual Output:
<root>
<feeds attribute1="a" attribute2="b" attribute3="c" attribute4="d"
attribute5="e" attribute6="f">
<id>2140</id>
<title>gj</title>
<description>ghj</description>
<msg/>
</feeds>
Here is my code to format xml. I have also tried SAX parser. I don't want to use DOM4J.
public static String formatXml(String xml) {
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
LSSerializer writer = impl.createLSSerializer();
writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
writer.getDomConfig().setParameter("xml-declaration", false);
writer.getDomConfig().setParameter("well-formed", true);
LSOutput output = impl.createLSOutput();
ByteArrayOutputStream out = new ByteArrayOutputStream();
output.setByteStream(out);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xml));
writer.write(db.parse(is), output);
return new String(out.toByteArray());
}
Is there any way to keep the attributes on one line with a SAX or DOM parser? I am not looking for an additional library; I want a solution that uses only the standard Java library.

A SAX or DOM parser will read your input string and allow your application to understand what was passed in. At some point in time your application then writes out that data, and that is the moment where you decide to insert additional whitespace (like linefeeds and tab characters) to pretty-print the document.
If you really want to use SAX and keep the processing efficient, the best you can do is write the document out while it is being parsed. To do that, implement the ContentHandler interface (https://docs.oracle.com/en/java/javase/11/docs/api/java.xml/org/xml/sax/ContentHandler.html) so that it writes the data directly, adding linefeeds where you feel they belong.
Check this tutorial to see how the ContentHandler can then be applied in a SAX parser: https://docs.oracle.com/javase/tutorial/jaxp/sax/parsing.html
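As a rough illustration of that idea, the sketch below extends DefaultHandler (which implements ContentHandler) and writes each element as it is reported, keeping every attribute on the same line as its start tag. The class name OneLineAttributePrinter and the two-space indent are my own choices, not anything from the question. It is deliberately simple: it trims text, collapses empty elements to the `<x/>` form, and silently drops comments, processing instructions, and the XML declaration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class OneLineAttributePrinter extends DefaultHandler {

    private final StringBuilder out = new StringBuilder();
    private final StringBuilder text = new StringBuilder();
    private int depth;
    private boolean tagOpen;    // "<name attrs" written, closing ">" still pending
    private boolean lastWasEnd; // the previous thing written was an end tag

    /** Parses the string with SAX and returns the re-indented document. */
    static String format(String xml) {
        OneLineAttributePrinter handler = new OneLineAttributePrinter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML string", e);
        }
        return handler.out.toString();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        flushText();
        closeStartTag();
        newLine();
        out.append('<').append(qName);
        // All attributes go on the start tag's own line, whatever their number.
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i))
               .append("=\"").append(escape(atts.getValue(i))).append('"');
        }
        tagOpen = true;
        lastWasEnd = false;
        depth++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        flushText();
        depth--;
        if (tagOpen) {          // no content at all: write <name/>
            out.append("/>");
            tagOpen = false;
        } else {
            if (lastWasEnd) {   // element had child elements: close on a new line
                newLine();
            }
            out.append("</").append(qName).append('>');
        }
        lastWasEnd = true;
    }

    private void flushText() {
        String trimmed = text.toString().trim();
        text.setLength(0);
        if (!trimmed.isEmpty()) {
            closeStartTag();
            out.append(escape(trimmed));
            lastWasEnd = false;
        }
    }

    private void closeStartTag() {
        if (tagOpen) {
            out.append('>');
            tagOpen = false;
        }
    }

    private void newLine() {
        if (out.length() > 0) {
            out.append('\n');
        }
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Calling OneLineAttributePrinter.format(xml) returns the formatted string. Because the handler decides where every linefeed goes, no serializer-imposed line width can split the attribute list.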

If I keep space in any of the tag, I get error XML document structures must start and end within the same entity

Why am I getting this error if any of the tags contains a space?
XML document structures must start and end within the same entity.
This is the code:
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(path));
Document doc = db.parse(is);
My xml data:
path=
<user>
<userdata>
<salutation>Ms</salutation>
<firstname>TestUser</firstname>
<middlename>Tes tMN</middlename>
<lastname>TestLN</lastname>
<loginid>testEmail#gmail.com</loginid>
<display-name>Test User</display-name>
<email>testEmail#gmail.com</email>
<isadmin>1</isadmin>
<mobno>9865323656</mobno>
<dob>1994/01/02</dob>
<status>1</status>
</userdata>
<operationdata>
<operationtype>1</operationtype>
</operationdata>
</user>
When I try to pass the XML data as a string in a URL and the data contains a space, the URL breaks. Is there any way to handle that?
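One possible explanation, and it is only an assumption since the sending code is not shown, is that the URL is cut off at the first space, so the parser receives a truncated document. Percent-encoding the XML before putting it in the URL avoids that. A minimal sketch with the JDK's URLEncoder and URLDecoder follows; the class name XmlQueryParam is made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class XmlQueryParam {

    /** Encodes XML so it can be used as a query parameter value. */
    static String encode(String xml) {
        try {
            // Spaces become '+', characters such as '<', '>' and '/' become %XX.
            return URLEncoder.encode(xml, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Reverses encode() on the receiving side. */
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<middlename>Tes tMN</middlename>";
        String encoded = encode(xml);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(xml));
    }
}
```

Many server frameworks already decode query parameters for you, in which case only the encoding step on the sending side is needed. For larger documents, sending the XML in a POST body is usually a better fit than a URL.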

Convert String to org.w3c.dom Document

I am receiving this string in a servlet from JavaScript:
%3C%3Fxml+version%3D%221.0%22+encoding%3D%22UTF-8%22+standalone%3D%22no%22%3F%3E%3Crunningparameters%3E++%0A%09%3Caccesskey%3Eawd%3C%2Faccesskey%3E%0A%09%3Csecretkey%3ErL8cuC6GzH%2F8zHycg45sM47MqQMVVPmgZjdgDIJS%3C%2Fsecretkey%3E%0A%09%3Ckeyfile%3E%2FTdcGooglePrototype%2Fresources%2Ffrank_arianit_keypair.pem%3C%2Fkeyfile%3E%0A%09%3Creadybucketname%3Efa-initialbucket%3C%2Freadybucketname%3E%0A%09%3Cdonebucketname%3Efa-completedbucket%3C%2Fdonebucketname%3E%0A%09%3Cmanagertodownloader%3Ehttps%3A%2F%2Fsqs.eu-central-1.amazonaws.com%2F210890795349%2FFA_M2D%0A%09%3C%2Fmanagertodownloader%3E%0A%09%3Capplicationtomanager%3Ehttps%3A%2F%2Fsqs.eu-central-1.amazonaws.com%2F210890795349%2FFA_A2M%0A%09%3C%2Fapplicationtomanager%3E%0A%09%3Cpolicyconfig%3Ehttps%3A%2F%2Fsqs.eu-central-1.amazonaws.com%2F210890795349%2FFA_Policy%0A%09%3C%2Fpolicyconfig%3E%0A%09%3Cqueuelength%3E100%3C%2Fqueuelength%3E%0A%09%3Cnumberofdownloader%3E100%3C%2Fnumberofdownloader%3E%0A%09%3Cdownloadspeed%3E10%3C%2Fdownloadspeed%3E%0A%09%3Cpagerankthreshold%3E0.001%3C%2Fpagerankthreshold%3E%0A%09%3Ctotalpages%3E10000%3C%2Ftotalpages%3E%0A%09%3Cpagerankmaxloop%3E10%3C%2Fpagerankmaxloop%3E%0A%09%3Ccountryfilters%3E%0A%09%09%3Curlcountry%3Ecom%3C%2Furlcountry%3E%0A%09%3C%2Fcountryfilters%3E%0A%09%3Cfiletypefilters%3E++++++%0A%09%09%3Curlfiletype%3Epdf%3C%2Furlfiletype%3E%0A%09%09%3Curlfiletype%3Ehtml%3C%2Furlfiletype%3E++%0A%09%09%3Curlfiletype%3Ehtm%3C%2Furlfiletype%3E%0A%09%3C%2Ffiletypefilters%3E%0A%09%3Cseedurls%3E%0A%09%09%09%3Curl%3Ehttp%3A%2F%2Fwww.cnn.com%2F%3C%2Furl%3E+%09%0A%09%3C%2Fseedurls%3E%0A%3C%2Frunningparameters%3E
It is XML, but it looks messed up. Is there a way to convert it to a DOM document?
I tried:
Document d;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
d = parser.parse(xmlstring);
But it doesn't work, presumably because of the strange format of the string.
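The string above looks URL-encoded (form encoding, where `%3C` is `<` and `+` is a space). Note also that DocumentBuilder.parse(String) treats its argument as a URI to load, not as XML content. A sketch that addresses both points, assuming the input really is UTF-8 form-encoded XML, could look like this; the class name EncodedXmlToDocument is hypothetical.

```java
import java.io.StringReader;
import java.net.URLDecoder;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EncodedXmlToDocument {

    /** Decodes a form-encoded XML string and parses it into a DOM Document. */
    static Document parse(String formEncodedXml) {
        try {
            // Step 1: undo the URL encoding (%3C -> '<', '+' -> ' ', ...).
            String xml = URLDecoder.decode(formEncodedXml, "UTF-8");
            // Step 2: parse the text itself via a Reader, not as a URI.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not decodable, well-formed XML", e);
        }
    }
}
```

If the value comes from HttpServletRequest.getParameter, the container has normally decoded it already, and decoding a second time would corrupt any literal `+` or `%` in the data. In that case only step 2 applies.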

How to parse large SOAP response

I have a large SOAP response that I want to process and store in a database. I'm trying to process the whole thing as a Document, as shown below:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setCoalescing(true);
DocumentBuilder db = dbf.newDocumentBuilder();
InputStream is = new ByteArrayInputStream(resp.getBytes());
Document doc = db.parse(is);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(fetchResult);
String result = (String) expr.evaluate(doc, XPathConstants.STRING);
resp is the SOAP response and fetchResult is
String fetchResult = "//result/text()";
I'm getting an out-of-memory exception with this approach, so I tried to process the document as a stream rather than loading the entire response as a Document.
But I cannot come up with the code.
Could any of you please help me out?

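DOM and JDOM build the whole document as a tree in memory, which is what exhausts the heap on a large response. You should use StAX or SAX instead: they are streaming APIs that hand you the document piece by piece, so memory use does not grow with the size of the document.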
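A minimal StAX sketch of that approach is shown below. It pulls events from a Reader and returns the text of the first `result` element, which roughly corresponds to the `//result/text()` expression in the question. The class and method names are made up, and it assumes `result` contains only text, because getElementText fails on nested elements.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxResultReader {

    /** Returns the text of the first <result> element, or null if there is none. */
    static String firstResult(Reader in) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // not needed for SOAP
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                while (reader.hasNext()) {
                    // Only the current event is held in memory, never the whole tree.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "result".equals(reader.getLocalName())) {
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read XML stream", e);
        }
    }
}
```

To get the full benefit, pass a Reader or InputStream that reads straight from the HTTP response instead of first collecting the response into a String; otherwise the String itself still occupies memory proportional to the response size.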

If this is in Java, you could try using dom4j. It has a nice way of reading XML using XPath expressions.
Additionally dom4j provides an event based model for processing XML documents. Using this event based model allows us to prune the XML tree when parts of the document have been successfully processed avoiding having to keep the entire document in memory.
Suppose you need to process a very large XML file that is generated externally by some database process and looks something like the following (where N is a very large number):
<ROWSET>
<ROW id="1">
...
</ROW>
<ROW id="2">
...
</ROW>
...
<ROW id="N">
...
</ROW>
</ROWSET>
So to process each <ROW> individually you can do the following.
// enable pruning mode to call me back as each ROW is complete
SAXReader reader = new SAXReader();
reader.addHandler( "/ROWSET/ROW",
new ElementHandler() {
public void onStart(ElementPath path) {
// do nothing here...
}
public void onEnd(ElementPath path) {
// process a ROW element
Element row = path.getCurrent();
Element rowSet = row.getParent();
Document document = row.getDocument();
...
// prune the tree
row.detach();
}
}
);
Document document = reader.read(url);
// The document will now be complete but all the ROW elements
// will have been pruned.
// We may want to do some final processing now
...
Please see "How dom4j handle very large XML documents?" in the dom4j FAQ to understand how it works.
Moreover, dom4j works with any SAX parser via JAXP.
For more details, see "What XML parser does dom4j use?"

The XPath and XPathExpression interfaces have evaluate methods that accept an InputSource argument.
InputStream input = ...;
InputSource source = new InputSource(input);
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("...");
String result = (String) expr.evaluate(source, XPathConstants.STRING);

How to rename an xml node to a html tag

Say I have a Java String which has xml data like so:
String content = "<abc> Hello <mark> World </mark> </abc>";
Now, I want to render this String as text on a web page and highlight/mark the word "World". The tag "abc" could change dynamically, so is there a way I can rename the outermost XML tag in a String using Java?
I would like to convert the above String to the format shown below:
String content = "<i> Hello <mark> World </mark> </i>";
Now, I could use the new String to set html content and display the text in italics and highlight the word World.
Thanks,
Sony
PS: I am using xquery over files in BaseX xml database. The String content is essentially a result of an xquery which uses ft:extract(), a function to extract full text search results.

XML "parsing" with regexes can be cumbersome. If there is a possibility that your XML string can be more complicated than the one used in your example, you should consider processing it as a real XML node.
String newName = "i";
// parse String as DOM
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(content)));
// modify DOM
doc.renameNode(doc.getDocumentElement(), null, newName);
This code assumes that the element that needs to be renamed is always the outermost element, that is, the root element.
Now the document is a DOM tree. It can be converted back to String object with a transformer.
// output DOM as String
Transformer transformer = TransformerFactory.newInstance().newTransformer();
StringWriter sw = new StringWriter();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(doc), new StreamResult(sw));
String italicsContent = sw.toString();

Perhaps a simple regex?
String content = "<abc> Sample text <mark> content </mark> </abc>";
Pattern outerTags = Pattern.compile("^<(\\w+)>(.*)</\\1>$");
Matcher m = outerTags.matcher(content);
if (m.matches()) {
content = "<i>" + m.group(2) + "</i>";
System.out.println(content);
}
Alternatively, use a DOM parser, find the children of the outer tag, and print them, preceded and followed by your desired tag as strings.
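A sketch of that DOM alternative, under the assumption that the outer element is the document root: parse the string, serialize each child of the root with an identity Transformer, and wrap the result in the new tag. The class name OuterTagSwap and the method name rewrap are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class OuterTagSwap {

    /** Replaces the root element's tag with newTag, keeping its children. */
    static String rewrap(String xml, String newTag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Identity transformer: copies nodes to text without changing them.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.METHOD, "xml");
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter sw = new StringWriter();
            sw.append('<').append(newTag).append('>');
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                // Each child (text or element) is written in document order.
                serializer.transform(new DOMSource(children.item(i)), new StreamResult(sw));
            }
            sw.append("</").append(newTag).append('>');
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not rewrap XML string", e);
        }
    }
}
```

Usage would be something like rewrap(content, "i"). Unlike the regex, this does not care what the outer tag is called or whether it carries attributes, but any attributes on the old outer tag are dropped.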
