I have a string with newlines in an XML attribute:
<rule-parameter formattingMethod="NO_FORMATTING_METHOD" type="String" value="Hello,
My name is luba.
How are you?"/>
When I unmarshal this XML, the string in the resulting Java property ends up on a single line.
What should I do so that the Java property also keeps the newlines in the string?
An XML parser (including JAXB) will not preserve a literal newline in an XML attribute; attribute-value normalization replaces it with a space.
http://www.w3.org/TR/REC-xml/#sec-line-ends
You will need to move this content to an XML element instead.
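To illustrate the difference, here is a minimal sketch that uses the JDK's built-in DOM parser rather than JAXB (the normalization happens in the XML parser underneath either way). The class name `AttributeNewlineDemo`, the helper `parseRoot`, and the `<value>` child element are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeNewlineDemo {

    // Hypothetical helper: parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Literal newlines inside an attribute are normalized to spaces.
        Element asAttribute = parseRoot(
                "<rule-parameter value=\"Hello,\nMy name is luba.\nHow are you?\"/>");
        System.out.println(asAttribute.getAttribute("value"));

        // The same text as element content keeps its newlines.
        Element asElement = parseRoot(
                "<rule-parameter><value>Hello,\nMy name is luba.\nHow are you?</value></rule-parameter>");
        System.out.println(asElement.getTextContent());
    }
}
```

If the producer of the XML can be changed but the attribute has to stay, writing each newline as the character reference `&#10;` inside the attribute value is another option, because character references are not affected by this normalization.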
Related
For some reason, someone changed the web service XML response that I need. The information I need to fetch is now inside a CDATA section.
The problem is that all "<" and ">" characters have been replaced with "&lt;" and "&gt;".
Example of how it should look:
<MapAAAResult><!CDATA[<map>http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxbinkor4.png|vialcap:2</map>
<nbr>234</nbr>
<nbrProcess>97 ....
And this is how I am receiving it:
<MapAAAResult>
&lt;mapa&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;
&lt;nbr&gt;234&lt;/nbr&gt;
&lt;nbrProcess&gt;97 .....
How can I get the information back to its original form? More precisely, how can I transform it back into XML?
Any ideas?
Thanks!!
Possibly related to the character escaping issue:
HTML inside XML CDATA being converted with &lt; and &gt; brackets
Characters such as "<" and "&" are illegal as literal text in XML element content (">" is usually escaped as well), so they are escaped either by wrapping the text in CDATA or by character replacement. It looks like the web service changed its schema somewhere along the way.
I've encountered a similar issue where I had to parse an escaped xml. A quick solution to get back the xml is to use replaceAll():
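// The payload arrives with its markup escaped as entity references.
String data = "<MapAAAResult>"
+ "&lt;map&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;&lt;nbr&gt;234&lt;/nbr&gt;"
+ "&lt;nbrProcess&gt;97";
data = data.replaceAll("&lt;", "<");
data = data.replaceAll("&gt;", ">");
// Unescape &amp; last so that "&amp;lt;" becomes "&lt;" rather than "<".
data = data.replaceAll("&amp;", "&");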
System.out.println(data);
you will get back:
<MapAAAResult><map>http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1</map><nbr>234</nbr><nbrProcess>97...
It can get more complex with embedded CDATA tags within the first CDATA field, and xml parsing could get confused with the ending "]]>" such as:
<xml><![CDATA[ <tag><![CDATA[data]]></tag> ]]></xml>
Thus, escaping the embedded data with &lt;, &gt; and &amp; is more resilient but can introduce unnecessary processing. Also note: some parsers or XML readers can recognize the escaped characters.
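As an alternative to manual string replacement, the outer document can be parsed first, since a parser resolves the entity references when it hands back the text. The sketch below uses the JDK's DOM parser; the class `EscapedPayloadDemo`, its helpers, and the wrapper element `root` are hypothetical names, and the sample payload is shortened. The wrapper is only there because the unescaped payload has several top-level elements and is therefore not a well-formed document on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EscapedPayloadDemo {

    // Hypothetical helper: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Text content of the outer root element; the parser has already
    // turned &lt;, &gt; and &amp; back into <, > and &.
    static String innerXml(String outerXml) {
        return parse(outerXml).getDocumentElement().getTextContent();
    }

    // Wraps the fragment in a synthetic root and returns the text of the
    // first element with the given tag name.
    static String firstValue(String fragment, String tagName) {
        Document doc = parse("<root>" + fragment + "</root>");
        return doc.getElementsByTagName(tagName).item(0).getTextContent();
    }

    public static void main(String[] args) {
        String outer = "<MapAAAResult>"
                + "&lt;map&gt;http://example.invalid/a.png|vialcap:1&lt;/map&gt;"
                + "&lt;nbr&gt;234&lt;/nbr&gt;"
                + "</MapAAAResult>";
        String inner = innerXml(outer);
        System.out.println(inner);
        System.out.println(firstValue(inner, "nbr"));
    }
}
```

This only works if the unescaped payload is itself well-formed once wrapped; a truncated payload would still fail at the second parse.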
Some other related threads:
XSL unescape HTML inside CDATA
When to CDATA vs. Escape & Vice Versa?
How can I ignore whitespace while parsing an XML file? The parser always calls the characters(...) method again when a '\n' or '\r' follows an end element, so the method is called twice instead of only once.
A SAXParser that is parsing a document against a DTD calls ignorableWhitespace() when it encounters whitespace in element content. For example, if this XML fragment
<ol>
<li>one</li>
<li>two</li>
</ol>
is parsed against this DTD fragment:
<!ELEMENT ol (li+)>
<!ELEMENT li (#PCDATA)>
the SAXParser would call characters(...) for "one" and "two", and ignorableWhitespace(...) for all the white space between the elements.
Note also that this applies only to parsing against a DTD. When using a Schema, ignorableWhitespace(...) is not called (even though the same kind of information is available).
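A small sketch of this behavior, assuming the JDK's default SAX parser with DTD validation switched on and the DTD supplied as an internal subset. The class `IgnorableWhitespaceDemo` and the handler `CountingHandler` are names invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class IgnorableWhitespaceDemo {

    // Collects character data and counts ignorable whitespace separately.
    static class CountingHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int ignorableChars = 0;

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorableChars += length;
        }
    }

    // Parses the document with a validating parser and returns the handler.
    static CountingHandler parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // assumption: validate against the DTD
            CountingHandler handler = new CountingHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE ol [\n"
                + "<!ELEMENT ol (li+)>\n"
                + "<!ELEMENT li (#PCDATA)>\n"
                + "]>\n"
                + "<ol>\n<li>one</li>\n<li>two</li>\n</ol>";
        CountingHandler handler = parse(xml);
        System.out.println("text: " + handler.text);
        System.out.println("ignorable whitespace chars: " + handler.ignorableChars);
    }
}
```

When no DTD is available, a common workaround is to append everything passed to characters(...) to a buffer and trim or discard it in endElement(...).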
I want to bind XML content as a String to a field.
Here is what my XML looks like:
<sample>
<content>
<p>here is content <b>with bold</b></p>
</content>
</sample>
which should be bound to the following domain object:
@Entity
@Table(name="news_table")
@XmlRootElement
class Sample {
@XmlElement(name="content")
@Column(name="news_content")
private String content;
}
After unmarshalling, I want the content starting with <p> to be bound as a String, in order to persist formatted text with HTML tags, so that:
System.out.println(sample.getContent());
prints the following:
> "<p>here is content <b>with bold</b></p>"
With the @XmlElement annotation I only get an empty string "" back from the binding operation, because, as far as I understand, JAXB treats the element starting with "<p>" as an object to be bound.
Any suggestions?
Try using the @XmlAnyElement annotation with a custom DomHandler. You can find an example here.
If changing the content of the XML file is an option, you could just escape the < and >. JAXB then handles it just fine, and you also get the correct HTML string when calling getContent() in Java.
Here is your xml file with escaped content:
<sample>
<content>&lt;p&gt;here is content &lt;b&gt;with bold&lt;/b&gt;&lt;/p&gt;</content>
</sample>
I transform XML with the Saxon XSLT2 processor (using Java + the Saxon S9API) and have to deal with source documents that contain invalid characters in tag names and therefore can't be parsed by the document builder.
Example:
<A>
<B />
<C>
<D />
</C>
<E!_RANDOM_ />
< />
</A>
Code:
import net.sf.saxon.s9api.*;
[...]
/* XSLT Processor & Compiler */
proc = new Processor(false);
/* build document from input*/
XdmNode source = proc.newDocumentBuilder().build(new StreamSource(input));
Error:
Error on line X column Y
SXXP0003: Error reported by XML parser: Element type
"E" must be followed by either attribute specifications, ">" or "/>".
The exclamation mark and a tag name consisting of just a space are currently my only invalid tags.
I am searching for a more robust solution than just removing whole lines of the (formatted) XML.
With some mind-bending I could come up with a regular expression to identify the invalid strings, but I would struggle with removing nodes that contain attributes and child nodes.
Thank you for your help!
If the input contains invalid tags then it is not XML. It's best to get your mind-set right by referring to these as non-XML documents rather than XML documents; that helps to make it clear that to process non-XML documents, you need non-XML tools. (Forget about "nodes" - there are no nodes until the document has been parsed, and it can't be parsed until you have turned it into well-formed XML). To turn non-XML into XML, you will typically want to use non-XML tools that are good at text manipulation, such as Perl. Of course, it's much better to fix the problem at source: all the benefits of XML are lost if people generate data in private non-XML formats.
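As a rough sketch of such a text-level cleanup step (in Java rather than Perl, to match the question), the following removes only self-closing tags whose name is not a valid XML name, which covers the two cases shown in the question. The class `InvalidTagStripper` and the method `stripInvalidEmptyTags` are hypothetical, the name check is an ASCII-only simplification of the XML Name rule, and the regular expression assumes that attribute values, comments and CDATA sections contain no "<" or ">". It does not attempt to handle invalid tags that have child content.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InvalidTagStripper {

    // A self-closing tag: "<", a candidate name (possibly empty),
    // anything except "<" or ">", then "/>".
    private static final Pattern EMPTY_TAG = Pattern.compile("<([^\\s/>]*)[^<>]*/>");

    // ASCII-only simplification of the XML Name production (an assumption).
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z_:][A-Za-z0-9_.:-]*");

    // Removes self-closing tags whose name is not a valid XML name.
    static String stripInvalidEmptyTags(String text) {
        Matcher m = EMPTY_TAG.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean valid = VALID_NAME.matcher(m.group(1)).matches();
            // Keep valid tags unchanged; drop invalid ones entirely.
            m.appendReplacement(out, Matcher.quoteReplacement(valid ? m.group() : ""));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "<A>\n<B />\n<C>\n<D />\n</C>\n<E!_RANDOM_ />\n< />\n</A>";
        System.out.println(stripInvalidEmptyTags(input));
    }
}
```

The cleaned text can then be passed to the document builder as a StreamSource; anything beyond these simple cases is better fixed where the data is generated.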
I want to get the value of an attribute in an XML file without knowing its index, since the same attributes are repeated in more than one element of the file.
Here is my XML file:
<fields>
<form name="userAdditionFrom">
</form>
</fields>
and here is the processing code:
case XMLEvent.ATTRIBUTE:
//how can i know the index of attribute?
String attName = xmlReader.getAttributeValue(?????);
break;
thanx in advance.
Alaa
If it is an XMLStreamReader, then getAttributeValue(int index) and getAttributeValue(String namespaceURI, String localName) can be used to get the attribute value.
From your question it looks like you are using a mix of the Event and Cursor APIs. I have appended a Using StAX link for your reference, which gives an idea of how to use both.
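A minimal cursor-API sketch of the lookup by name, assuming the attribute has no namespace (passing null as the namespace URI means the namespace is not checked). The class `AttributeByNameDemo` and the method `attributeValues` are illustrative names only. Note that with XMLStreamReader the attributes are read while the cursor is on a START_ELEMENT event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class AttributeByNameDemo {

    // Collects the value of the named attribute from every matching element.
    static List<String> attributeValues(String xml, String element, String attribute) {
        List<String> values = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(element)) {
                    // Look the attribute up by name instead of by index.
                    String value = reader.getAttributeValue(null, attribute);
                    if (value != null) {
                        values.add(value);
                    }
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException("Could not read XML", e);
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<fields><form name=\"userAdditionFrom\"></form></fields>";
        System.out.println(attributeValues(xml, "form", "name"));
    }
}
```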
Resources:
XMLStreamReader getAttributeValue(String, String) JavaDoc Entry
Using StAX