Java XML resource bundle and HTML content

I need to store HTML content as a value in a resource bundle (XML format). The HTML tags conflict with the XML tags. How can I store the HTML string without using character entity references (&lt; and &gt;)?

Put it between <![CDATA[ and ]]>.

Have you tried using a CDATA section? The parser treats everything inside a CDATA section as plain character data rather than markup.
See the example here:
http://www.w3schools.com/xml/xml_cdata.asp
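
As a rough illustration of the CDATA approach, the following Java sketch parses a small XML bundle and reads an HTML value back unchanged. The <bundle>/<entry key="..."> layout and the class and method names are assumptions made for this example, not a format taken from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CdataBundleDemo {
    // Hypothetical bundle layout; the HTML value sits inside a CDATA section.
    static final String BUNDLE =
        "<bundle>"
      + "<entry key=\"greeting\"><![CDATA[<b>Hello</b>, <i>world</i>]]></entry>"
      + "</bundle>";

    // Returns the text of the <entry> whose key attribute matches, or null.
    static String readValue(String xml, String key) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList entries = doc.getElementsByTagName("entry");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                if (key.equals(entry.getAttribute("key"))) {
                    // CDATA content comes back as plain text, tags included.
                    return entry.getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse bundle", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readValue(BUNDLE, "greeting"));
    }
}
```

The HTML is stored verbatim, so no &lt; or &gt; references are needed in the bundle file.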

Related

Java CDATA extract xml

For some reason, someone changed the web service XML response that I need. So now the information I need to fetch is inside a CDATA tag.
The problem is that all "<" and ">" characters have been replaced with "&lt;" and "&gt;".
Example of how it should look:
<MapAAAResult><![CDATA[<map>http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxbinkor4.png|vialcap:2</map>
<nbr>234</nbr>
<nbrProcess>97 ....
And this is how I am receiving it:
<MapAAAResult>
&lt;mapa&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;
&lt;nbr&gt;234&lt;/nbr&gt;
&lt;nbrProcess&gt;97 .....
How can I get the information back to its original form? More precisely, how can I transform it back into XML?
Any ideas?
Thanks!!

Possibly related to this character escaping issue:
HTML inside XML CDATA being converted with &lt; and &gt; brackets
Characters such as "<" and "&" are illegal as literal text in XML element content (">" is usually escaped along with them), and they can be protected either with CDATA or with character replacement. It looks like the web service changed its schema somewhere along the way.
I've encountered a similar issue where I had to parse escaped XML. A quick way to get the XML back is to use replaceAll():
String data = "<MapAAAResult>"
+ "&lt;map&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;&lt;nbr&gt;234&lt;/nbr&gt;"
+ "&lt;nbrProcess&gt;97";
data = data.replaceAll("&lt;", "<");
data = data.replaceAll("&gt;", ">");
// Decode &amp; last, so that "&amp;lt;" ends up as the text "&lt;" and not "<".
data = data.replaceAll("&amp;", "&");
System.out.println(data);
You will get back:
<MapAAAResult><map>http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1</map><nbr>234</nbr><nbrProcess>97...
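
Instead of string replacement, an XML parser can do the decoding, since it resolves &lt;, &gt; and &amp; when it returns element text. The sketch below parses the response once to recover the escaped payload, then parses that payload again as XML. The sample response is a hypothetical, completed version of the truncated one above (the URL and the closing tags are made up), and the <payload> wrapper element is an assumption added so that the fragment has a single root.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EscapedXmlDemo {
    // Hypothetical, completed sample; the real response is truncated in the question.
    static final String RESPONSE =
        "<MapAAAResult>"
      + "&lt;map&gt;http://example.invalid/image.png|vialcap:1&lt;/map&gt;"
      + "&lt;nbr&gt;234&lt;/nbr&gt;"
      + "&lt;nbrProcess&gt;97&lt;/nbrProcess&gt;"
      + "</MapAAAResult>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Returns the text of the first element named tagName inside the escaped payload.
    static String innerValue(String response, String tagName) {
        // First pass: the parser decodes the entity references in the text.
        String payload = parse(response).getDocumentElement().getTextContent();
        // Second pass: wrap the fragment in one root element and parse it as XML.
        Document inner = parse("<payload>" + payload + "</payload>");
        return inner.getElementsByTagName(tagName).item(0).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(innerValue(RESPONSE, "nbr"));
    }
}
```

This avoids having to get the replacement order right by hand, at the cost of a second parse.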
It can get more complex with CDATA sections embedded within the first CDATA section, because the XML parser treats the first "]]>" it meets as the end of the outer section, as in:
<xml><![CDATA[ <tag><![CDATA[data]]></tag> ]]></xml>
Thus, escaping the embedded data with &lt; &gt; &amp; is more resilient but can introduce unnecessary processing. Also note: some parsers or XML readers can recognize the escaped characters.
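
If CDATA is still preferred, one common workaround for an embedded "]]>" is to split it across two adjacent CDATA sections, so that the terminator never appears whole inside either one. The following sketch shows the idea; the class and method names are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NestedCdataDemo {
    // Wraps text in CDATA, splitting every "]]>" into "]]" + ">" across
    // two adjacent CDATA sections.
    static String wrapInCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Parses the XML and returns the text content of its root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getTextContent() concatenates the adjacent CDATA sections.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String inner = " <tag><![CDATA[data]]></tag> ";
        String xml = "<xml>" + wrapInCdata(inner) + "</xml>";
        System.out.println(xml);
        System.out.println(textOf(xml).equals(inner));
    }
}
```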
Some other related threads:
XSL unescape HTML inside CDATA
When to CDATA vs. Escape & Vice Versa?

In XSLT, how do I get the filepath of the xml file of a certain element if that xml file was included with xinclude?

I have these XML files:
master.xml (which uses XInclude to include child1.xml and child2.xml)
child1.xml
child2.xml
Both child1.xml and child2.xml contain a <section> element with some text.
In the XSLT transformation, I'd like to add the name of the file the <section> element came from, so I get something like:
<section srcFile="child1.xml">Text from child 1.</section>
<section srcFile="child2.xml">Text from child 2.</section>
How do I retrieve the values child1.xml and child2.xml?

Unless you turn off that feature, all XInclude processors should add an @xml:base attribute
with the URL of the included file. So you don't have to do anything; it should already be:
<section xml:base="child1.xml">Text from child 1.</section>
<section xml:base="child2.xml">Text from child 2.</section>
(If you want, you can use XSLT to transform the @xml:base attribute into @srcFile.)
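
A minimal sketch of that renaming step, run from Java through the standard javax.xml.transform API, might look like this. It assumes the XInclude step has already produced xml:base attributes; the <doc> wrapper and the class name are made up for the example. The stylesheet matches the attribute by local name and namespace URI, which is more verbose than @xml:base but does not depend on how the processor resolves the xml prefix.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlBaseToSrcFile {
    // Identity transform plus one template that turns xml:base into srcFile.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"@*[local-name()='base' and"
      + " namespace-uri()='http://www.w3.org/XML/1998/namespace']\">"
      + "<xsl:attribute name='srcFile'><xsl:value-of select='.'/></xsl:attribute>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical post-XInclude document.
        String merged =
            "<doc>"
          + "<section xml:base='child1.xml'>Text from child 1.</section>"
          + "<section xml:base='child2.xml'>Text from child 2.</section>"
          + "</doc>";
        System.out.println(transform(merged));
    }
}
```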

I'm 99% sure that once xi:include has been processed, you have a single document (and single infoset) that won't let you determine which URL any given part of the document came from.
I think you will need to place that information directly in the individual included files. Having said that, you can still give document-uri a try, but I think all nodes will return the same URI.

How to parse xml having html tags within xml tags

I've got an XML document which has HTML within the XML tags, and I'm not able to parse it.
When I start parsing the XML, the str tag has HTML in it.
Can anyone help me extract the HTML with all its tags?

It is a good idea to store XHTML within CDATA tags (<![CDATA[ and ]]>), so that it can be retrieved normally:
<str name="body">
<![CDATA[<font face="arial" size="2"><ul><li><p align="justify">india’s first</p></li></ul></font>]]>
</str>

The problem is not the HTML itself but malformed HTML. If the HTML is under your control, make sure it complies with XHTML, and the XML parser will treat it as normal XML. Otherwise, you can use a tool like "HTML Tidy" to fix the HTML, or use an HTML parser. For example:
http://www.codeproject.com/KB/dotnet/apmilhtml.aspx

Need to handle special characters in URL

My input html is
<p>
<span>first
</span>
<span>Google Cloud Connect for Microsoft Office</span>
</p>
I am using XSLT 1.0 to convert the HTML to XML. My output XML is
<Relationship Id="rId12700703801" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="http://tools.google.com/dlpage/cloudconnect#utm_campaign=launch&utm_source=en-na-us-gdb-GCC-Appsperience_02242011&utm_medium=blog" TargetMode="External"/></Relationships>
with the error "XML Parsing Error: not well-formed" reported in the Target attribute, at the "=" after launch&utm_source.
I want to escape the special characters in the URL through XSLT and produce well-formed XML.
Please help me. Thanks in advance.

Are you generating the input HTML? If so, you can use URLEncoder.encode to encode the string properly so the transformer doesn't complain about the syntax.
If this is just a random HTML page and you have no control over it, then you probably need to use an HTML parser such as TagSoup to pre-correct it, as most HTML files are not well-formed.
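
The underlying rule is that a literal "&" inside an XML attribute value has to be written as "&amp;". When the XML is built from Java code rather than assembled as text, a DOM serializer applies that escaping on its own. The sketch below illustrates this with a cut-down Relationship element; the class name, the method name and the shortened URL are assumptions for the example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AmpersandAttributeDemo {
    // Builds a <Relationship> element and serializes it; the serializer
    // escapes "&" in the attribute value as "&amp;".
    static String relationshipXml(String targetUrl) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element rel = doc.createElement("Relationship");
            rel.setAttribute("Target", targetUrl);
            rel.setAttribute("TargetMode", "External");
            doc.appendChild(rel);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the URL from the question.
        System.out.println(relationshipXml(
            "http://tools.google.com/dlpage/cloudconnect#utm_campaign=launch&utm_source=x"));
    }
}
```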

XSLT expects XML as input, not HTML. You need to turn your HTML into XML if you want to transform it with XSLT.
I think it might be possible to do it with HTML Tidy.

How to add doctype with ID attribute to the XML file in java?

I am dynamically creating a DOM object and need to add the following doctype to the XML file in Java:
<!DOCTYPE MyXml [<!ATTLIST node id ID #REQUIRED>]>
I am using org.w3c.dom. Is there any way to do this?
Regards,
Abhishek

The org.w3c.dom just provides the interfaces for the DOM. Are you implementing these interfaces?
Otherwise, if you are using a library like JDOM, it's very simple.
See http://www.jdom.org/docs/apidocs/org/jdom/DocType.html
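
With the standard API alone, DOMImplementation.createDocumentType accepts only a qualified name, a public ID and a system ID, so it offers no way to supply an internal subset. One possible workaround, sketched below, is to parse a tiny seed document that already carries the DOCTYPE and then build the rest of the tree on the resulting Document. The class and method names are made up for this example, and whether the internal subset survives when the document is written out depends on the serializer you use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DoctypeSeedDemo {
    // Seed document: the DOCTYPE from the question plus an empty root element.
    static final String SEED =
        "<!DOCTYPE MyXml [<!ATTLIST node id ID #REQUIRED>]><MyXml/>";

    // Parses the seed so the returned Document already has the DOCTYPE attached.
    static Document newDocumentWithDoctype() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SEED)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse seed document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocumentWithDoctype();
        // Build the rest of the tree dynamically, as in the question.
        Element node = doc.createElement("node");
        node.setAttribute("id", "n1");
        doc.getDocumentElement().appendChild(node);
        System.out.println(doc.getDoctype().getName());
        System.out.println(doc.getDoctype().getInternalSubset());
    }
}
```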
