How to escape ampersand in the XML present inside the CDATA section

Hi, I tried to find a solution for this but couldn't find anything of much help.
The problem is that the CDATA section contains XML, and I want to escape the special character '&' present in it. I'm using XMLBeans and tried using XmlOptionCharEscapeMap, but it throws an exception while parsing.
`XmlObject.Factory.parse(XMLString, xmlOptionsObj);`
Here, setSaveSubstituteCharacters on xmlOptionsObj was set with an XmlOptionCharEscapeMap object.
XML example:
<Message xmlns="http://www.com.test/XMLSchema">
<Header></Header>
<Body><![CDATA[<Inner xmlns="http://www.com.test/XMLSchema">
<TagA>...</TagA>
</Inner>]]></Body>
</Message>
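One possible approach, sketched with the JDK's built-in DOM parser rather than XMLBeans: a raw '&' is legal inside CDATA, so the outer document parses fine, and the trouble only starts when the inner text is parsed as XML. The sketch below escapes bare ampersands in the inner text before the second parse. The class name, the helper names and the "Fish & Chips" value are made up for illustration.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataAmpersandSketch {
    // Matches '&' that does not already start an entity or character reference.
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    public static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Reads the CDATA text of <Body>, repairs it, and parses it as a second document.
    public static String innerTagA(String outerXml) throws Exception {
        Document outer = parse(outerXml);
        String inner = outer.getElementsByTagNameNS("*", "Body").item(0).getTextContent();
        Document innerDoc = parse(escapeBareAmpersands(inner));
        return innerDoc.getElementsByTagNameNS("*", "TagA").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String outer = "<Message xmlns=\"http://www.com.test/XMLSchema\"><Header></Header>"
            + "<Body><![CDATA[<Inner xmlns=\"http://www.com.test/XMLSchema\">"
            + "<TagA>Fish & Chips</TagA></Inner>]]></Body></Message>";
        System.out.println(innerTagA(outer));
    }
}
```

The regex leaves references such as &amp; or &#38; untouched, so text that is already escaped is not escaped twice.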

Related

Java CDATA extract xml

For some reason someone changed the web service XML response that I need. So now the information I need to fetch is inside a CDATA tag.
The thing is that all "<" and ">" characters have been replaced with "&lt;" and "&gt;".
Example how it should look like:
<MapAAAResult><![CDATA[<map>http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxbinkor4.png|vialcap:2</map>
<nbr>234</nbr>
<nbrProcess>97 ....
And this is how I am receiving it:
<MapAAAResult>
&lt;mapa&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;
&lt;nbr&gt;234&lt;/nbr&gt;
&lt;nbrProcess&gt;97 .....
How can I get the information back to its original form? More exactly, how can I transform that information back into XML?
Any ideas?
Thanks!!

Possibly related to the character escaping issue:
HTML inside XML CDATA being converted with &lt; and &gt; brackets
Characters like "<" and "&" are illegal in XML element content (">" is usually escaped too), and escaping them can be done via CDATA or character replacement. Looks like the web service switched up its schema somewhere along the way.
I've encountered a similar issue where I had to parse escaped XML. A quick way to get the XML back is to use replaceAll():
String data = "<MapAAAResult>"
+ "&lt;map&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;&lt;nbr&gt;234&lt;/nbr&gt;"
+ "&lt;nbrProcess&gt;97";
// Unescape &amp; last so that "&amp;lt;" becomes the text "&lt;" rather than "<".
data = data.replaceAll("&lt;", "<");
data = data.replaceAll("&gt;", ">");
data = data.replaceAll("&amp;", "&");
System.out.println(data);
you will get back:
<MapAAAResult><map>http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1</map><nbr>234</nbr><nbrProcess>97...
It can get more complex with embedded CDATA tags within the first CDATA field, and xml parsing could get confused with the ending "]]>" such as:
<xml><![CDATA[ <tag><![CDATA[data]]></tag> ]]></xml>
Thus, escaping the embedded data with &lt;, &gt; and &amp; is more resilient but can introduce unnecessary processing. Also note: XML parsers recognize these escaped characters and unescape them when you read the text content.
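As an alternative to manual string replacement, here is a minimal sketch that lets the JDK's DOM parser do the unescaping: reading the text content of the outer element already returns the inner markup with "<" and ">" restored, and that text can then be parsed again. The class name, the synthetic `<root>` wrapper and the example.invalid URL are assumptions for illustration. The wrapper is there because the payload has several top-level elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EscapedPayloadSketch {
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static String extractMap(String response) throws Exception {
        // getTextContent() returns the payload with &lt; and &gt; already unescaped.
        String inner = parse(response).getDocumentElement().getTextContent();
        // Wrap in a synthetic root so the fragment becomes a well-formed document.
        Document innerDoc = parse("<root>" + inner + "</root>");
        return innerDoc.getElementsByTagName("map").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String response = "<MapAAAResult>"
            + "&lt;map&gt;http://example.invalid/a.png|vialcap:1&lt;/map&gt;"
            + "&lt;nbr&gt;234&lt;/nbr&gt;"
            + "</MapAAAResult>";
        System.out.println(extractMap(response));
    }
}
```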
Some other related threads:
XSL unescape HTML inside CDATA
When to CDATA vs. Escape & Vice Versa?

org.xml.sax.SAXParseException;Reference is not allowed in prolog. auto generated XHTML, java

I just wanted to try out Flying Saucer to generate a PDF from an xhtml code. So what I did was to make a layout in LibreOffice, let it generate the xhtml code and (wanted to) hand this over to the parsing library (in java) to generate the pdf.
However, I couldn't take over all of the XML code 1:1 as I needed to escape things, so I replaced all "<" with "&lt;", all ">" with "&gt;" and all double quotes with \".
When trying to parse the whole thing i get following error:
[Fatal Error] :1:2: Reference is not allowed in prolog.
I tried to track it down via some logic thinking and googling. If I understood right following is my "prolog":
buf.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
buf.append("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN\" \"http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd\">");
buf.append("<html xmlns=\"http://www.w3.org/1999/xhtml\"><!--This file was converted to xhtml by OpenOffice.org - see http://xml.openoffice.org/odf2xhtml for more info.--><head profile=\"http://dublincore.org/documents/dcmi-terms/\"><meta http-equiv=\"Content-Type\" content=\"application/xhtml+xml; charset=utf-8\"/><title xml:lang=\"en-US\">- no title specified</title><meta name=\"DCTERMS.title\" content=\"\" xml:lang=\"en-US\"/><meta name=\"DCTERMS.language\" content=\"en-US\" scheme=\"DCTERMS.RFC4646\"/><meta name=\"DCTERMS.source\" content=\"http://xml.openoffice.org/odf2xhtml\"/><meta name=\"DCTERMS.issued\" content=\"2012-11-20T20:59:05.11\" scheme=\"DCTERMS.W3CDTF\"/><meta name=\"DCTERMS.provenance\" content=\"\" xml:lang=\"en-US\"/><meta name=\"DCTERMS.subject\" content=\",\" xml:lang=\"en-US\"/><link rel=\"schema.DC\" href=\"http://purl.org/dc/elements/1.1/\" hreflang=\"en\"/><link rel=\"schema.DCTERMS\" href=\"http://purl.org/dc/terms/\" hreflang=\"en\"/><link rel=\"schema.DCTYPE\" href=\"http://purl.org/dc/dcmitype/\" hreflang=\"en\"/><link rel=\"schema.DCAM\" href=\"http://purl.org/dc/dcam/\" hreflang=\"en\"/><style type=\"text/css\">");
Sorry for the huge (and ugly) thing. The next thing I did was comment out line by line to see where the problem is.
The error still appears if I comment out the first two lines; after commenting out the third I get a different error ("Content is not allowed in prolog" or similar).
Anyway, here is the third line. I can't find the error; any help is appreciated :)
buf.append("<html xmlns=\"http://www.w3.org/1999/xhtml\"><!--This file was converted to xhtml by OpenOffice.org - see http://xml.openoffice.org/odf2xhtml for more info.--><head profile=\"http://dublincore.org/documents/dcmi-terms/\"><meta http-equiv=\"Content-Type\" content=\"application/xhtml+xml; charset=utf-8\"/><title xml:lang=\"en-US\">- no title specified</title><meta name=\"DCTERMS.title\" content=\"\" xml:lang=\"en-US\"/><meta name=\"DCTERMS.language\" content=\"en-US\" scheme=\"DCTERMS.RFC4646\"/><meta name=\"DCTERMS.source\" content=\"http://xml.openoffice.org/odf2xhtml\"/><meta name=\"DCTERMS.issued\" content=\"2012-11-20T20:59:05.11\" scheme=\"DCTERMS.W3CDTF\"/><meta name=\"DCTERMS.provenance\" content=\"\" xml:lang=\"en-US\"/><meta name=\"DCTERMS.subject\" content=\",\" xml:lang=\"en-US\"/><link rel=\"schema.DC\" href=\"http://purl.org/dc/elements/1.1/\" hreflang=\"en\"/><link rel=\"schema.DCTERMS\" href=\"http://purl.org/dc/terms/\" hreflang=\"en\"/><link rel=\"schema.DCTYPE\" href=\"http://purl.org/dc/dcmitype/\" hreflang=\"en\"/><link rel=\"schema.DCAM\" href=\"http://purl.org/dc/dcam/\" hreflang=\"en\"/><style type=\"text/css\">");
thanks in advance!
edit1: http://validator.w3.org/check validated it as totally correct!

It appears you're being confused by the bad layout of this blog article. If you download the sample code, you'll see that the '<' and '>' characters are not converted to "&lt;" and "&gt;" in the author's actual code and data.
In order to get quotes into hard-coded Java strings, you do of course have to escape them. But you shouldn't need any of this xml escaping.
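A small sketch of the difference, using the JDK's DOM parser instead of Flying Saucer so that it stays self-contained: the string with only Java quote escaping parses, while the entity-escaped version fails in the prolog because the document starts with '&'. The class name and the shortened XHTML snippet (without the DOCTYPE) are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class PrologSketch {
    // Returns true if the string parses as a well-formed XML document.
    public static boolean isWellFormed(String xml) throws Exception {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Only the double quotes are escaped, because this is a Java string literal.
        String good = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>- no title specified</title></head><body/></html>";
        // Entity-escaped markup is plain text to the parser, not a document.
        String bad = "&lt;?xml version=\"1.0\" encoding=\"UTF-8\"?&gt;"
            + "&lt;html&gt;&lt;/html&gt;";
        System.out.println(isWellFormed(good));
        System.out.println(isWellFormed(bad));
    }
}
```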

how to place < symbol in xml file which is to be read by the java program?

I am placing an SQL query(which contains < symbol) inside xml file and i am trying to read that query in a java program.
But it is displaying the exception
"org.xml.sax.SAXParseException: The content of elements must consist
of well-formed character data or markup."
can any one help me how to fix the above issue?

You need to escape using XML entities:
& encode as &amp;
< encode as &lt;
Technically, you don't need to escape the following, but it is common to do so:
> encode as &gt;
" encode as &quot;
' encode as &apos;
For more info, see this Wikipedia article.
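If the XML file is generated from code, the mapping above can be applied with a small helper. This is a minimal sketch; the class and method names are made up, and the SQL text is only an example.

```java
public class XmlEscapeSketch {
    // Replaces the five predefined XML special characters with their entities.
    public static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM t WHERE a < 5 AND b = \"x\"";
        // prints: <query>SELECT * FROM t WHERE a &lt; 5 AND b = &quot;x&quot;</query>
        System.out.println("<query>" + escapeXml(sql) + "</query>");
    }
}
```

Because each character is handled in a single pass, an '&' produced by the escaping is never escaped a second time.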

Make use of CDATA
CDATA - (Unparsed) Character Data
CDATA stands for Character Data; it means that the data between these tags includes text that could be interpreted as XML markup but should not be.
The term CDATA is used for text data that should not be parsed by the XML parser.
Characters like "<" and "&" are illegal in XML elements.
"<" will generate an error because the parser interprets it as the start of a new element.
"&" will generate an error because the parser interprets it as the start of an entity reference.
Some text, like JavaScript code, contains a lot of "<" or "&" characters. To avoid errors script code can be defined as CDATA.
Everything inside a CDATA section is ignored by the parser.
Example:
<![CDATA[ select <abcddata> ]]>
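To illustrate reading such a query back in Java, here is a minimal sketch with the JDK's DOM parser. The `<query>` element name, the class name and the SQL text are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataQuerySketch {
    // Returns the text of the root element; CDATA content comes back verbatim.
    public static String readQuery(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<query><![CDATA[ SELECT * FROM orders WHERE total < 100 ]]></query>";
        // prints: SELECT * FROM orders WHERE total < 100
        System.out.println(readQuery(xml));
    }
}
```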

&lt; = <
&gt; = >
These are the predefined entities (the same as in HTML) and should be accepted by the parser.

XML Illegal Attribute Value

I am using the SAX parser in java to read some XML. The XML I am giving it has problems and is causing the parse to fail. Here is the error message:
11-18 10:25:37.290: W/System.err(3712): org.xml.sax.SAXParseException: Illegal: "<" inside attribute value (position:START_TAG <question text='null'>#1:23 in java.io.InputStreamReader#4074c678)
I have a feeling that it does not like the fact that I have some HTML tags inside of a string in the XML. I would think that anything inside the quotes gets ignored from a syntax standpoint. Also, is it valid to use single quotes here? Here is an example:
<quiz>
<question text="<img src='//files/alex/hilltf.PNG' alt='hill' style='max-width:400px' /> is represented on map by cut. ">
<answer text="1"/>
<answer text="2" correct="true"/>
</question>
</quiz>

You need to escape the < inside the text attribute value as &lt;. Since XML uses < to open tags, it's illegal in content unless escaped or enclosed in a CDATA section (which isn't an option for an attribute value).

The error message is correct. A < must be the start of a tag and cannot appear inside an attribute value. It must be &lt; instead. I don't believe the single quotes are a problem.
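A minimal sketch of the fix, using the JDK's DOM parser rather than the Android parser from the question: with &lt; and &gt; in the attribute value the document parses, and getAttribute returns the original HTML string. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttributeEscapeSketch {
    // Parses the quiz XML and returns the text attribute of the first question.
    public static String questionText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Element question = (Element) doc.getElementsByTagName("question").item(0);
        return question.getAttribute("text");
    }

    public static void main(String[] args) throws Exception {
        // Single quotes are fine inside a double-quoted attribute value.
        String xml = "<quiz><question text=\"&lt;img src='//files/alex/hilltf.PNG' alt='hill' /&gt;"
            + " is represented on map by cut.\"><answer text=\"1\"/></question></quiz>";
        // prints: <img src='//files/alex/hilltf.PNG' alt='hill' /> is represented on map by cut.
        System.out.println(questionText(xml));
    }
}
```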

Java XML resource bundle and HTML content

I need to store HTML content as a value in a resource bundle (XML format). The HTML tags conflict with the XML tags. How can I store the HTML string without using character entity references (&lt; and &gt;)?

Put it between <![CDATA[ and ]]>.

Have you tried using CDATA? A CDATA section will not be parsed as markup by the parser.
See the example here:
http://www.w3schools.com/xml/xml_cdata.asp
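As a rough sketch, assuming the bundle uses the java.util.Properties XML format and is loaded with loadFromXML (the question does not say how the bundle is read), an entry wrapped in CDATA comes back as the raw HTML string. The "welcome" key and the class name are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class XmlBundleSketch {
    // Loads a properties document in the XML format understood by java.util.Properties.
    public static Properties load(String xml) throws Exception {
        Properties props = new Properties();
        props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return props;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
            + "<properties>\n"
            + "<entry key=\"welcome\"><![CDATA[<b>Hello</b> & welcome]]></entry>\n"
            + "</properties>\n";
        System.out.println(load(xml).getProperty("welcome"));
    }
}
```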
