Parse CDATA from XML to enable editing Java

I am writing a simulator that communicates with a client's piece of software over a local socket. The messages are XML. I have working code that parses the incoming XML string into a Document via the DocumentBuilder interface.
I have run into a problem with CDATA, which I had never seen before. I need to access fields inside a CDATA section and change them. I load a 'template' XML document to reply with, and copy values received in the first message into the response. Some of the fields that need to be changed are inside the CDATA section (the example below shows what I mean).
public static String getOutputMessage(String input) throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    // Parse the incoming message (needs org.xml.sax.InputSource and java.io.StringReader)
    Document inputDoc = db.parse(new InputSource(new StringReader(input)));
    Document outputDoc;
    Element messageElement = (Element) inputDoc.getElementsByTagName("TRANS").item(0);
    // messageType, path and transaction are assumed to be fields defined elsewhere in the class
    messageType = messageElement.getAttribute("name");
    if (messageType.equals("processTransaction")) {
        outputDoc = db.parse(path + "processTransaction\\posPrintReceipt.xml");
        outputDoc = changeContent(outputDoc, "PAN_NUMBER", transaction.getPan_number());
        outputDoc = changeContent(outputDoc, "TOKEN", transaction.getToken());
        outputDoc = changeContent(outputDoc, "TOTAL_AMOUNT", transaction.getTotal_amount());
        outputDoc = changeContent(outputDoc, "TRANSACTION_TIME", transaction.getTransaction_time());
        outputDoc = changeContent(outputDoc, "TRANSACTION_DATE", transaction.getTransaction_date());
    }
    // Serializing outputDoc to the returned String is not shown in this excerpt
}
private static Document changeContent(Document doc,String tag,String value) {
System.out.println("Changing: ["+tag+" : "+value+"]");
NodeList nodes=doc.getElementsByTagName(tag);
Node node = nodes.item(0);
Node parent=node.getParentNode();
node.setTextContent(value);
System.out.println(doc.getElementsByTagName(tag).item(0) + " " + node.getTextContent());
parent.replaceChild(node, doc.getElementsByTagName(tag).item(0));
return doc;
}
The functions above work on normal elements, but below is an example of the XML message I have to read. The values I need to change, such as PAN_NUMBER and TOKEN, sit inside a CDATA section:
<RLSOLVE_MSG version="5.0">
<MESSAGE>
<SOURCE_ID>DP01</SOURCE_ID>
<TRANS_NUM>000001</TRANS_NUM>
</MESSAGE>
<POI_MSG type="interaction">
<INTERACTION name="posPrintReceipt">
<RECEIPT type="merchant" format="xml">
<![CDATA[<RECEIPT>
<AUTH_CODE>06130</AUTH_CODE>
<CARD_SCHEME>VISA</CARD_SCHEME>
<CURRENCY_CODE>GBP</CURRENCY_CODE>
<CUSTOMER_PRESENCE>internet</CUSTOMER_PRESENCE>
<FINAL_AMOUNT>1.00</FINAL_AMOUNT>
<MERCHANT_NUMBER>8888888</MERCHANT_NUMBER>
<PAN_NUMBER>454420******0382</PAN_NUMBER>
<PAN_EXPIRY>12/15</PAN_EXPIRY>
<TERMINAL_ID>04176421</TERMINAL_ID>
<TOKEN>454420bbbbbkqrm0382</TOKEN>
<TOTAL_AMOUNT>1.00</TOTAL_AMOUNT>
<TRANSACTION_DATA_SOURCE>keyed</TRANSACTION_DATA_SOURCE>
<TRANSACTION_DATE>14/02/2014</TRANSACTION_DATE>
<TRANSACTION_NUMBER>000001</TRANSACTION_NUMBER>
<TRANSACTION_RESPONSE>06130</TRANSACTION_RESPONSE>
<TRANSACTION_TIME>17:13:17</TRANSACTION_TIME>
<TRANSACTION_TYPE>purchase</TRANSACTION_TYPE>
<VERIFICATION_METHOD>unknown</VERIFICATION_METHOD>
<DUPLICATE>false</DUPLICATE>
</RECEIPT>]]>
</RECEIPT>
</INTERACTION>
</POI_MSG>

CDATA is an encoding mechanism to include arbitrary data within an XML file. Everything within CDATA is parsed as a single string when loading the XML into a Document instance. If you need to access the contents of the CDATA as a DOM document, you will need to instantiate a second Document object from the string contents, make your changes, then serialize that back to a string and put the string back into a CDATA in the original document.
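The steps described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the class name CdataEditor, the helper names parse, serialize and changeCdataContent, and the shortened sample message are all hypothetical. It assumes the CDATA payload is itself a well-formed XML document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class CdataEditor {

    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    /** Serializes a DOM Document to a string, without the XML declaration. */
    static String serialize(Document doc) throws Exception {
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /**
     * Hypothetical helper: treats the text of the first holderTag element as an
     * embedded XML document, sets the text of the first tag element inside it,
     * and writes the result back into the holder as a single CDATA section.
     */
    static void changeCdataContent(Document outer, String holderTag, String tag, String value)
            throws Exception {
        Element holder = (Element) outer.getElementsByTagName(holderTag).item(0);
        // getTextContent() returns the CDATA payload as a plain string
        Document inner = parse(holder.getTextContent().trim());
        inner.getElementsByTagName(tag).item(0).setTextContent(value);
        // Remove the old children, then store the edited payload as new CDATA
        while (holder.hasChildNodes()) {
            holder.removeChild(holder.getFirstChild());
        }
        holder.appendChild(outer.createCDATASection(serialize(inner)));
    }

    public static void main(String[] args) throws Exception {
        // Shortened stand-in for the message in the question
        String xml = "<INTERACTION><RECEIPT type=\"merchant\">"
                + "<![CDATA[<RECEIPT><TOKEN>old</TOKEN></RECEIPT>]]>"
                + "</RECEIPT></INTERACTION>";
        Document doc = parse(xml);
        changeCdataContent(doc, "RECEIPT", "TOKEN", "new");
        System.out.println(serialize(doc));
    }
}
```

Note that getElementsByTagName("RECEIPT") on the outer document only finds the outer RECEIPT element, because the inner one is plain text as far as the outer parser is concerned.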

I don't think a CDATA section is parsed like the regular elements in the XML. A CDATA section exists purely to escape markup so that its content is not interpreted as XML. My suggestion would be to use an element to represent the data that is currently in the CDATA section. If you still want to use a CDATA section, you will need to read the section as a string and then load that string into a separate Document.

Related

> and < get converted to &gt; and &lt; when adding an XML-like string with element.setTextContent()

I have a string which looks like an XML
Ex: String sample = "<GrpHdr><MsgId>MQSECJYJHRBPDTZTYNNEYXOZUPAUDEKVDFV</MsgId><CreDtTm>2023-02-02T21:48:58.075+05:30</CreDtTm></GrpHdr>";
I am trying to create an XML document with an element containing the above information:
Ex:
<ns1:TstCode>T</ns1:TstCode>
<ns1:FType>SCF</ns1:FType>
<ns1:FileRef>220811084023</ns1:FileRef>
<ns1:RoutingInd>ALL</ns1:RoutingInd>
<ns1:FileBusDt>2022-08-11</ns1:FileBusDt>
<ns1:FIToFI xmlns="urn:iso:std:iso:20022:tech:xsd">
<GrpHdr>
<MsgId>MQSECJYJHRBPDTZTYNNEYXOZUPAUDEKVDFV</MsgId>
<CreDtTm>2023-02-02T21:48:58.075+05:30</CreDtTm>
</GrpHdr>
</ns1:FIToFI>
When I create the document for the above XML using this code:
private static DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
DOMImplementation domImpl = db.getDOMImplementation();
private Document buildExampleDocumentWithNamespaces(DOMImplementation domImpl, String output) {
Document document = domImpl.createDocument("urn:Scf:xsd:$BlkCredTrf", "ns1:BlkCredTrf", null);
document.getDocumentElement().appendChild(document.createElement("ns1:TstCode")).setTextContent("T");
document.getDocumentElement().appendChild(document.createElement("ns1:FType")).setTextContent("SCF");
document.getDocumentElement().appendChild(document.createElement("ns1:FileRef")).setTextContent("220811084023");
document.getDocumentElement().appendChild(document.createElement("ns1:RoutingInd")).setTextContent("ALL");
document.getDocumentElement().appendChild(document.createElement("ns1:FileBusDt")).setTextContent("2022-08-11");
document.getDocumentElement().appendChild(document.createElementNS("urn:iso:std:iso:tech:xsd","ns1:FIToFI"));
return document;
}
I do not have issues until this point.
When I try to add <GrpHdr><MsgId>MQSECJYJHRBPDTZTYNNEYXOZUPAUDEKVDFV</MsgId><CreDtTm>2023-02-02T21:48:58.075+05:30</CreDtTm></GrpHdr> as a Text content to the FIToFI tag at last, using the code:
document.getDocumentElement().appendChild(document.createElementNS("urn:iso:std:iso:tech:xsd","ns1:FIToFI")).setTextContent(sample);
The XML gets created like this:
<ns1:TstCode>T</ns1:TstCode>
<ns1:FType>SCF</ns1:FType>
<ns1:FileRef>220811084023</ns1:FileRef>
<ns1:RoutingInd>ALL</ns1:RoutingInd>
<ns1:FileBusDt>2022-08-11</ns1:FileBusDt>
<ns1:FIToFI xmlns="urn:iso:std:iso:20022:tech:xsd">
&lt;GrpHdr&gt;
&lt;MsgId&gt;MQSECJYJHRBPDTZTYNNEYXOZUPAUDEKVDFV&lt;/MsgId&gt;
&lt;CreDtTm&gt;2023-02-02T21:48:58.075+05:30&lt;/CreDtTm&gt;
&lt;/GrpHdr&gt;
</ns1:FIToFI>
Please help me to create this XML without the escape characters.

The escaping is intended. When you set the text content of an element, special characters such as < have to be escaped as &lt; in the serialized XML; otherwise the text would be interpreted as XML markup such as elements or comments. That is why content set with setTextContent() comes out escaped.
When you want to add an element instead, you use methods like appendChild() with an Element argument. Build your elements as usual with the createElement() method and add them together like this:
Element element = document.createElementNS("urn:iso:std:iso:tech:xsd","ns1:FIToFI");
Element grpHdr = document.createElement("GrpHdr");
Element msgId = document.createElement("MsgId");
msgId.setTextContent("MQSECJYJHRBPDTZTYNNEYXOZUPAUDEKVDFV");
grpHdr.appendChild(msgId);
Element creDtTm = document.createElement("CreDtTm");
creDtTm.setTextContent("2023-02-02T21:48:58.075+05:30");
grpHdr.appendChild(creDtTm);
element.appendChild(grpHdr);
document.getDocumentElement().appendChild(element);
This will add the XML element inside the other XML element.
When you have the inner XML as a string, parse the XML string with DocumentBuilder.parse() (see How to create a XML object from String in Java?) and import the Element with the Document.importNode() method (see org.w3c.dom.DOMException: WRONG_DOCUMENT_ERR: A node is used in a different document than the one that created it). The code can look like this:
String innerXml = "<GrpHdr><MsgId[...]eDtTm></GrpHdr>";
Document innerDocument = db.parse(new InputSource(new StringReader(innerXml)));
Element innerRootElement = innerDocument.getDocumentElement();
Node importedNode = document.importNode(innerRootElement, true);
element.appendChild(importedNode);
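For reference, here is a self-contained sketch of the same approach that can be run as-is. The class name FragmentImporter, the helper appendXml, and the element names root, asText and asElements are made up for this illustration. It contrasts setTextContent(), which creates a text node, with importNode(), which creates real child elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FragmentImporter {

    /**
     * Hypothetical helper: parses xmlFragment (which must have a single root
     * element) and appends a deep copy of that root to parent.
     */
    static Node appendXml(Element parent, String xmlFragment) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document inner = db.parse(new InputSource(new StringReader(xmlFragment)));
        // importNode copies the node into the target document; appending the
        // original node directly would throw WRONG_DOCUMENT_ERR
        Node imported = parent.getOwnerDocument().importNode(inner.getDocumentElement(), true);
        return parent.appendChild(imported);
    }

    public static void main(String[] args) throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = document.createElement("root");
        document.appendChild(root);

        // Variant 1: the string becomes a single text node (escaped on output)
        Element asText = document.createElement("asText");
        asText.setTextContent("<MsgId>1</MsgId>");
        root.appendChild(asText);

        // Variant 2: the string is parsed and becomes a real child element
        Element asElements = document.createElement("asElements");
        root.appendChild(asElements);
        appendXml(asElements, "<MsgId>1</MsgId>");

        System.out.println(asText.getFirstChild().getNodeType() == Node.TEXT_NODE);       // true
        System.out.println(asElements.getFirstChild().getNodeType() == Node.ELEMENT_NODE); // true
    }
}
```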

Can't parse XML (from web) using JSoup

I am trying to work with small XML files sent from the web and parse a few attributes from them. How would I approach this in JSoup? I know it is an HTML parser rather than an XML parser, but it supports XML too, and I don't have to build any handlers, builder factories and such as I would have to with DOM, SAX etc.
Here is the example XML: LINK. I can't paste it here because it exits the code tag after every line. If someone can fix that I would be grateful.
And here is my piece of code:
String xml = "http://www.omdbapi.com/?t=Private%20Ryan&y=&plot=short&r=xml";
Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
// want to select first occurrence of genre tag though there is only one it
// doesn't work without .first() - but it doesn't parse it
Element genreFromXml = doc.select("genre").first();
String genre = genreFromXml.text();
System.out.println(genre);
It results in NPE at:
String genre = genreFromXml.text();

There are 2 issues in your code:
You provide a String representation of a URL where XML content is expected. You should instead use the method parse(InputStream in, String charsetName, String baseUri, Parser parser) to parse your XML from an input stream.
There is no element genre in your XML; genre is an attribute of the element movie.
Here is how your code should look:
String url = "http://www.omdbapi.com/?t=Private%20Ryan&y=&plot=short&r=xml";
// Parse the doc using an XML parser
Document doc = Jsoup.parse(new URL(url).openStream(), "UTF-8", "", Parser.xmlParser());
// Select the first element "movie"
Element movieFromXml = doc.select("movie").first();
// Get its attribute "genre"
String genre = movieFromXml.attr("genre");
// Print the result
System.out.println(genre);
Output:
Drama, War

OpenSAML custom attribute value

I'm trying to create a SAML response. One of the attributes that makes up the assertion is called address and the attribute value needs to be a custom type that is defined in an XSD. How do I add custom attribute value types to the response?

If your attribute value XML is in String form:
String yourXMLFragment = "...";
AttributeStatementBuilder attributeStatementBuilder =
(AttributeStatementBuilder) builderFactory.getBuilder(AttributeStatement.DEFAULT_ELEMENT_NAME);
AttributeStatement attributeStatement = attributeStatementBuilder.buildObject();
AttributeBuilder attributeBuilder =
(AttributeBuilder) builderFactory.getBuilder(Attribute.DEFAULT_ELEMENT_NAME);
Attribute attr = attributeBuilder.buildObject();
attr.setName("yourAttributeName");
XSAnyBuilder sb2 = (XSAnyBuilder) builderFactory.getBuilder(XSAny.TYPE_NAME);
XSAny attrAny = sb2.buildObject(AttributeValue.DEFAULT_ELEMENT_NAME, XSAny.TYPE_NAME);
attrAny.setTextContent(yourXMLFragment.trim());
attr.getAttributeValues().add(attrAny);
attributeStatement.getAttributes().add(attr);

Actually, the above does not yield correct results. It can only be used to create an XSAny with text content, not XML content (XML content gets escaped).
So after digging in the OpenSAML sources, the following worked as needed:
public XSAny createXSAny(Element dom)
{
XSAnyBuilder anyBuilder = (XSAnyBuilder) Configuration.getBuilderFactory().getBuilder(XSAny.TYPE_NAME);
XSAny any = anyBuilder.buildObject(AttributeValue.DEFAULT_ELEMENT_NAME, XSAny.TYPE_NAME);
// this builds only the root element not the whole dom
XSAny xo=anyBuilder.buildObject(dom);
// set/populate dom so whole dom gets into picture
xo.setDOM(dom);
any.getUnknownXMLObjects().add(xo);
return any;
}

jdom removes duplicate namespace declaration (xmloutputter)

jdom seems to remove duplicate namespace declarations. This is a problem when an XML document is embedded into another XML structure, for example in OAI-PMH (Open Archives Initiative). It matters when the surrounding XML is only a container and the embedded document gets extracted later.
Here is some code. The embedded XML is contained in the string embeddedxml. It declares the xsi namespace. We construct a jdom container that also declares the xsi namespace. We parse and embed the string. When we print the whole thing, the inner xsi namespace declaration is gone.
public static final Namespace OAI_PMH= Namespace.getNamespace( "http://www.openarchives.org/OAI/2.0/");
public static final Namespace XSI = Namespace.getNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
public static final String SCHEMA_LOCATION = "http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd";
public static final String ROOT_NAME = "OAI-PMH";
String embeddedxml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?> <myxml xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\""
+ "http://www.isotc211.org/2005/gmd"
+ " http://www.ngdc.noaa.gov/metadata/published/xsd/schema/gmd/gmd.xsd"
+ " http://www.isotc211.org/2005/gmx"
+ " http://www.ngdc.noaa.gov/metadata/published/xsd/schema/gmx/gmx.xsd\">\""
+ "</myxml>";
// loadstring omitted (parse embeddedxml into jdom)
Element xml = loadString(embeddedxml ,false);
Element root = new Element(ROOT_NAME, OAI_PMH);
root.setAttribute("schemaLocation", SCHEMA_LOCATION, XSI);
// insert embedded xml into container structure
root.addContent(xml);
XMLOutputter out = new XMLOutputter(Format.getPrettyFormat());
// will see that the xsi namespace declaration from embeddedxml is gone
out.output(root,System.out);
I think that XMLOutputter is responsible for this behaviour. Any hints on how I can make it preserve the duplicate namespace declaration?
thanks
Kurt

Something is missing in your code: the declaration of final static String ROOT_NAME is not shown and Element xml is not used after initialization.
If ROOT_NAME is initialized with "myxml" somewhere else, then the solution to your problem is, that you just don't add the xml element to your document, and the result looks as if you did so.

How to prevent xml transformer to transform empty tags into single tag

I'm using javax.xml.transform.Transformer class to transform the DOM source into XML string. I have some empty elements in DOM tree, and these become one tag which I don't want.
How do I prevent <sampletag></sampletag> from becoming <sampletag/>?

I had the same problem.
This function expands self-closing tags in the serialized XML string:
// Requires: import java.util.LinkedList;
public static String fixClosedTag(String rawXml) {
    // Pairs of {self-closing tag, expanded tag} found in the input
    LinkedList<String[]> listTags = new LinkedList<String[]>();
    String[] parts = rawXml.split("<");
    for (String part : parts) {
        int indexEnd = part.indexOf("/>");
        if (indexEnd > -1) {
            // Element name plus any attributes, e.g. name attr="x"
            String tagBody = part.substring(0, indexEnd);
            // The closing tag must carry the element name only, without attributes
            String tagName = tagBody.trim().split("\\s+")[0];
            String oldTag = "<" + tagBody + "/>";
            String newTag = "<" + tagBody.trim() + "></" + tagName + ">";
            listTags.add(new String[] { oldTag, newTag });
        }
    }
    String prettyXML = rawXml;
    for (String[] el : listTags) {
        // replace() matches the tag literally; replaceAll() would treat it as a regex
        prettyXML = prettyXML.replace(el[0], el[1]);
    }
    return prettyXML;
}

If you want to control how XML is formatted, provide your own ContentHandler to prettify XML into "text". It should not matter to the receiving end (unless human) whether it receives <name></name> or <name/> - they both mean the same thing.

The two representations are equivalent to an XML parser, so it doesn't matter.
If you want to process XML with anything else than an XML-parser, you will end up with a lot of work and an XML-parser anyway.
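To illustrate the equivalence claimed above, here is a small sketch that parses both spellings and compares the resulting DOM elements. The class name EmptyTagEquivalence and the helper sameElement are invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class EmptyTagEquivalence {

    /** Parses an XML string and returns its root element. */
    static Element root(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    /** Returns true if both XML strings parse to structurally equal root elements. */
    static boolean sameElement(String a, String b) throws Exception {
        return root(a).isEqualNode(root(b));
    }

    public static void main(String[] args) throws Exception {
        // Both forms produce an element with no children
        System.out.println(sameElement("<sampletag></sampletag>", "<sampletag/>")); // true
        // An element with text content is a different tree
        System.out.println(sameElement("<sampletag>x</sampletag>", "<sampletag/>")); // false
    }
}
```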

If the process you are sending it through NEEDS the element not to be self-closing (which it should not), you can force the element not to be self-closing by placing content inside of it.
How does the PDF converter handle XML comments or processing instructions?
<sampletag><!--Sample Comment--></sampletag>
<sampletag><?SampleProcessingInstruction?></sampletag>

I tried the following to prevent empty tags from being collapsed into a single tag:
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.METHOD, "html");
With this setting the empty tags are retained.
