Preserving the CDATA format with a SAX parser - java

I'm trying to parse an XML file and insert some of its values into my database. I'm developing in Java and using SAX to parse the file.
My problem is that when I read an element whose value is wrapped in CDATA, I only get what the CDATA section contains. However, I want to keep the CDATA markup.
For example, with the XML below:
<?xml version="1.0" encoding="UTF-8"?>
<Bank>
<Account type="saving">
<Id>1001</Id>
<Name><![CDATA[<Jack> <Robinson>]]></Name>
<Amt>10000</Amt>
</Account>
<Account type="current">
<Id>1002</Id>
<Name>Sony Corporation</Name>
<Amt>1000000</Amt>
</Account>
</Bank>
I would like to get the Name as <![CDATA[<Jack> <Robinson>]]>, and not only <Jack> <Robinson>, which is what I am getting.
Can anyone help me with this issue, please?
PS: Sorry for my English, I'm French.
Best regards,

As @Quentin asked, I am curious why you care about the markup.
Have you considered appending <![CDATA[ and ]]> manually in your output, for example with a StringBuffer (or StringBuilder)?
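If the goal is to re-add the delimiters only where the source actually used CDATA, SAX can report CDATA boundaries through org.xml.sax.ext.LexicalHandler. The following is a minimal sketch, not a definitive implementation: the class name CdataAwareNameHandler and the method parseNames are made up for illustration, and it assumes the parser supports the standard lexical-handler property (the JDK's built-in parser does).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: collects the text of every <Name> element and
// re-adds the CDATA delimiters wherever the source document used them.
final class CdataAwareNameHandler extends DefaultHandler2 {
    private final List<String> names = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void startCDATA() {
        text.append("<![CDATA["); // LexicalHandler callback: a CDATA section begins
    }

    @Override
    public void endCDATA() {
        text.append("]]>"); // LexicalHandler callback: the CDATA section ends
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Name".equals(qName)) {
            names.add(text.toString());
        }
    }

    // Parses the XML string and returns the <Name> values, CDATA markup included.
    static List<String> parseNames(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CdataAwareNameHandler handler = new CdataAwareNameHandler();
            // Register the same object as lexical handler so CDATA events arrive.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Calling CdataAwareNameHandler.parseNames(xml) on the Bank document from the question should give one entry with the CDATA wrapper for the first account and the plain text for the second.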

Related

Selective XML parsing

This is the XML file I have:
<?xml version="1.0" encoding="UTF-8"?>
<Bank>
<Account type="saving">
<Id>1001</Id>
<Name>Jack Robinson</Name>
<Amt>10000</Amt>
</Account>
<Account type="current">
<Id>1002</Id>
<Name>Sony Corporation</Name>
<Amt>1000000</Amt>
</Account>
</Bank>
I need to parse this XML and get the contents between <Bank>...</Bank>. My output XML should be:
<Account type="saving">
<Id>1001</Id>
<Name>Jack Robinson</Name>
<Amt>10000</Amt>
</Account>
<Account type="current">
<Id>1002</Id>
<Name>Sony Corporation</Name>
<Amt>1000000</Amt>
</Account>
Any ideas on how to achieve this using Java?
First of all, your output is not a well-formed XML document: an XML document must have a single root element, which is exactly what you are trying to remove.
As @Seelenvirtuose said, there are tons of ways to do what you want, on many levels: from simply manipulating the original XML as a String up to using the DOM, JAXB, XPath/XQuery, or XSLT. It is a matter of choice.
For example, with Apache Commons Lang:
String resultString = org.apache.commons.lang.StringUtils.substringBetween(originalXMLString,"<Bank>","</Bank>").trim();
Of course, your output can only be a String, because it is not a well-formed XML document. You can then do whatever you want with that String: print it, store it in a file or a database, and so on.
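For the DOM route mentioned above, a rough JDK-only sketch could look like the following. The class name RootContentExtractor and the method innerXml are made up for illustration; the idea is to parse the document, then serialize each child element of the root without an XML declaration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: returns the child elements of the root, serialized as a String.
final class RootContentExtractor {
    static String innerXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // An identity transformer serializes DOM nodes back to text.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace-only text nodes between the elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    serializer.transform(new DOMSource(child), new StreamResult(out));
                }
            }
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalStateException("Could not extract root content", e);
        }
    }
}
```

Unlike the substringBetween approach, this does not depend on the exact spelling of the root tag (attributes, whitespace), at the cost of building a DOM tree in memory.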

How to remove XML meta tags in java or jsp

I have some XML that looks like this:
<?xml version='1.0' encoding='utf-8'?><s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><s:Body xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><content></content>
I just want to extract the content alone. I am expecting something like this:
<content></content>
I know String.replace is a solution, but the XML changes every time.
Thanks.
Using jsoup:
String extracted = Jsoup.parse(xml).getElementsByTag("s:body").first().html();
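If adding jsoup is not an option, a namespace-aware DOM parse from the JDK can do something similar. This is only a sketch and it assumes the full envelope is well-formed (the snippet in the question is cut off before the closing tags); SoapBodyExtractor and bodyContent are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: serializes whatever elements sit inside the SOAP Body.
final class SoapBodyExtractor {
    private static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    static String bodyContent(String soapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to look up Body by namespace, not by prefix
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(soapXml)));

            NodeList bodies = doc.getElementsByTagNameNS(SOAP_NS, "Body");
            if (bodies.getLength() == 0) {
                throw new IllegalArgumentException("No SOAP Body element found");
            }

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            NodeList children = bodies.item(0).getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    serializer.transform(new DOMSource(child), new StreamResult(out));
                }
            }
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalStateException("Could not extract SOAP body content", e);
        }
    }
}
```

Because the lookup uses the namespace URI, it keeps working if the sender changes the s: prefix to something else.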

android - How to search in xml file

I have an XML file at /res/xml/countries.xml, and I also have a view with an EditText inside.
I want to search countries.xml for the text the user types into the EditText.
Here is my XML file:
<?xml version="1.0" encoding="utf-8"?>
<countries>
<country>
<name>United State</name>
</country>
</countries>
If you want to read the whole document, use an XML parser such as the pull parser already mentioned. If you only want to pick out one or a few specific things, I would suggest XPath. Here is a good tutorial on how to use XPath in Java.
First you need to parse the XML file containing the country names. Store the list of countries in an ArrayList and then write a search routine that looks for the text taken from the EditText.
Try this XPath tutorial; you will have a good learning experience.
This tutorial explains clearly how to parse an XML file: XML parser
Another, simpler tutorial with XPath: Xparse tutorial
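To make the XPath suggestion concrete, here is a small plain-Java sketch. The names CountrySearch and findCountries are hypothetical, and it assumes the XML is available as a String; on Android, files under /res/xml are normally read through an XmlResourceParser instead, so the loading step would differ there. The XPath expression selects all name elements, and the case-insensitive match is done in Java to avoid building XPath strings from user input.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: returns the country names that contain the query text.
final class CountrySearch {
    static List<String> findCountries(String xml, String query) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Select every <name> under <countries>/<country>.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList names = (NodeList) xpath.evaluate(
                    "/countries/country/name", doc, XPathConstants.NODESET);

            String needle = query.trim().toLowerCase(Locale.ROOT);
            List<String> matches = new ArrayList<>();
            for (int i = 0; i < names.getLength(); i++) {
                String name = names.item(i).getTextContent().trim();
                if (name.toLowerCase(Locale.ROOT).contains(needle)) {
                    matches.add(name);
                }
            }
            return matches;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("Could not search countries", e);
        }
    }
}
```

In an activity, the query argument would be the text read from the EditText, and the returned list could back a ListView or similar.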

Parsing Hyphenated xml Tag

Can somebody explain how to parse a hyphenated XML tag using Commons Digester? I have been searching around the net, but no luck. Please help me out with this.
<foo id="1">...
<bar>
<foo-ref id="1"/>
</bar>
</foo>
What about this?
digester.addObjectCreate("foo-ref", "mypackage.FooRef");
http://commons.apache.org/proper/commons-digester/guide/core.html#doc.Usage

Parse XML with nested xml opening tags <?xml ...?> in java

Can you help me parse XML with a nested <?xml version="1.0" encoding="utf-8"?> declaration? When I try to parse this XML, I get a parsing error.
<?xml version="1.0" encoding="utf-8"?>
<soap>
<soapenvBody>
<serviceResponse>
<?xml version="1.0" encoding="UTF-8"?>
<data>
<respCode>0</respCode>
</data>
</serviceResponse>
</soapenvBody>
</soap>
I don't think this is really a Java problem. Having a second XML declaration within the XML body is just illegal, so I don't think you'll be able to get any XML parsers to parse that. If you have control over the XML (it looks like you're generating it to store a response) then you could try wrapping the inner-XML document with CDATA:
<?xml version="1.0" encoding="utf-8"?>
<soap>
<soapenvBody>
<serviceResponse>
<![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<data>
<respCode>0</respCode>
</data>
]]>
</serviceResponse>
</soapenvBody>
</soap>
EDIT:
I'm thinking that you most likely don't want the extra XML declaration inside that response at all. Do you have control over the code that creates the response? My guess is that the XML snippet <data>...</data> is created as a separate DOM object and then the string is spliced in the middle of the response. Writing out the entire XML document object results in the XML declaration being included, but if you just grab the document root node object (<data>) and write that out as a string then it probably won't include the extra XML declaration that's causing you all this trouble.
It occurred to me that a parser made for dealing with HTML might be able to do what you want. Since HTML tends to be a total mess compared to strict XML, HTML parsers are usually much more error-tolerant. A quick search turned up jsoup. I was able to pull the respCode from your sample XML above with roughly this code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
String data = "your xml goes here";
Document doc = Jsoup.parse(data);
String respCodeRaw = doc.select("respCode").first().text();
int respCode = Integer.parseInt(respCodeRaw);
(I actually tested the library in the Clojure repl, but the code above should work!)
Markup that starts with <? is normally a processing instruction. <?xml ...?> is the XML declaration, and it can only appear at the very beginning of the XML content. It is not allowed in the XML body.
Why does your soap body contain this? Do you have the option of removing it?
I did not find any parser in Java that can parse such embedded XML, as it is not well-formed, and I guess almost all parsers reject XML that is not well-formed. So I chose to preprocess the XML and select the inner XML; then I parsed it with a SAX parser and retrieved the values. Thanks for your replies, guys.
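The poster did not share the preprocessing code, so the following is only a guess at what such a step could look like: it strips every XML declaration except one at the very start of the input and then reads a value with SAX. The names EmbeddedDeclarationCleaner, stripInnerDeclarations and elementText are hypothetical, and the regex is deliberately naive (it would also strip a declaration-like string inside a comment or CDATA section).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for XML that contains a second, illegal XML declaration.
final class EmbeddedDeclarationCleaner {
    // Matches <?xml ...?> but not other processing instructions such as <?xml-stylesheet ...?>.
    private static final Pattern XML_DECL = Pattern.compile("<\\?xml\\s[^?]*\\?>");

    // Keeps a declaration at offset 0 (if any) and removes all later ones.
    static String stripInnerDeclarations(String xml) {
        Matcher leading = XML_DECL.matcher(xml);
        int bodyStart = leading.lookingAt() ? leading.end() : 0;
        String body = XML_DECL.matcher(xml.substring(bodyStart)).replaceAll("");
        return xml.substring(0, bodyStart) + body;
    }

    // Returns the trimmed text of the first element with the given name, or null.
    static String elementText(String xml, String elementName) {
        final StringBuilder text = new StringBuilder();
        final String[] result = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (result[0] == null && elementName.equals(qName)) {
                    result[0] = text.toString().trim();
                }
            }
        };
        try {
            String cleaned = stripInnerDeclarations(xml);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(cleaned)), handler);
            return result[0];
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse cleaned XML", e);
        }
    }
}
```

With the sample response from the question, EmbeddedDeclarationCleaner.elementText(xml, "respCode") would return the text of the respCode element once the inner declaration has been removed.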
