So I'm trying to parse a GPX file using the XmlPullParser.
For the most part, I have it working, but noticed that I'm not getting what I'm expecting.
A snippet of the file:
<?xml version="1.0" encoding="utf-8"?>
<gpx xmlns="http://www.topografix.com/GPX/1/1">
<wpt lat="34.767778" lon="-88.078889">
<name>EG1325</name>
<type>Waypoint</type>
<extensions>
<groundspeak:cache>
<groundspeak:country>United States</groundspeak:country>
</groundspeak:cache>
</extensions>
</wpt>
</gpx>
I trimmed the unimportant tags here, for the purpose of this question, assuming that the file passes validation with all namespaces represented. (Because the full file does.)
The issue comes when I get past the <type> tag.
Using EITHER next() or nextToken(), I get the END_TAG event for the <type> tag. The next event is a TEXT event, and its text contains \n. The event after that is a START_TAG, but for the <groundspeak:cache> tag and NOT the <extensions> tag.
I see this with both nextToken() and next(). Is this expected?
Edit to add: The only option I set in code for the XmlPullParser is:
XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
factory.setNamespaceAware(false);
Check your XML file. Some XML files start with three extra bytes, specifically "EF BB BF". This is called a BOM (byte order mark). When the XML contains these extra bytes, XmlPullParser does not work properly: it behaves as if there is no START_TAG event and goes straight to END_DOCUMENT.
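A minimal sketch of one way to guard against this, assuming the file is UTF-8 and the parser is fed from an InputStream. The class and method names (BomSkipper, skipUtf8Bom, readAll) are made up for illustration. It only handles the UTF-8 BOM and does not establish whether a BOM is the cause of the behaviour described in the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BomSkipper {
    /** Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if one is present. */
    static InputStream skipUtf8Bom(InputStream in) {
        try {
            PushbackInputStream pushback = new PushbackInputStream(in, 3);
            byte[] head = new byte[3];
            int read = pushback.readNBytes(head, 0, 3); // requires Java 9+
            boolean hasBom = read == 3
                    && (head[0] & 0xFF) == 0xEF
                    && (head[1] & 0xFF) == 0xBB
                    && (head[2] & 0xFF) == 0xBF;
            if (!hasBom && read > 0) {
                pushback.unread(head, 0, read); // not a BOM: put the bytes back
            }
            return pushback;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: reads the remaining stream as UTF-8 text. */
    static String readAll(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned stream could then be handed to the parser, for example with parser.setInput(stream, "UTF-8").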
My requirement is to read the following tag from data.xml and use its content to replace the content of the same tag in display.xml, using Ant.
data.xml
--------
<data>123456789</data>
display.xml
-----------
<data>abcdefg</data>
I need to read the content from data.xml and replace the content in display.xml with it.
my final output should be like:
data.xml
--------
<data>123456789</data>
display.xml
-----------
<data>123456789</data>
How can I solve this issue? Thanks in advance.
I did not find any Ant task named GetXmlProperties, but I think you may have thought about this one, XmlProperty, which composes a sequence of properties from parsing an XML document.
Two ways, to achieve what you want, come to my mind (there may be more):
The most primitive would be to use the XmlProperty task to retrieve the value in question and then run a crude Replace task on the destination file, doing a simple string replace that treats the destination as plain text instead of respecting its XML structure. However, string matching and replacing on XML data is no fun and is error-prone.
Thus I propose a second approach, which is to use the XmlTask, as per the following example. Adding a prefix xml to our newly parsed properties makes it easier to distinguish them from the rest. For demo purposes we also let the build process log all the new properties under the xml prefix to the console by using the EchoProperties task.
<project name="SO63816092" default="default">
<taskdef name="xmltask" classname="com.oopsconsultancy.xmltask.ant.XmlTask" />
<xmlproperty file="data.xml" prefix="xml" />
<echoproperties prefix="xml"/>
<target name="default">
<xmltask source="display.xml" dest="display.xml" failWithoutMatch="true">
<replace path="//data/text()" withText="${xml.data}" />
</xmltask>
</target>
</project>
Using the second approach, while using the following input file data.xml:
<?xml version="1.0" encoding="UTF-8"?>
<data>123456789</data>
and the following destination file display.xml:
<?xml version="1.0" encoding="UTF-8"?>
<data>abcdefg</data>
I can successfully accomplish what you ask for and get display.xml to become:
<?xml version="1.0" encoding="UTF-8"?>
<data>123456789</data>
Just remember to place the XmlTask jar on the classpath of your Ant process. Note that you may need to change the XPath expression we use, //data/text(), to something more specific should your document structure demand it: as written, it replaces the value of every data element it finds anywhere in display.xml.
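To illustrate the point about the XPath scope outside of Ant, here is a small Java sketch using the JDK's built-in XPath support. The class name, the method name and the sample document are assumptions made up for the demo. It only counts what each expression selects and does not involve XmlTask.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathScopeDemo {
    /** Counts the nodes that an XPath expression selects in the given XML text. */
    static int countMatches(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document with a second, nested data element.
        String xml = "<root><data>a</data><group><data>b</data></group></root>";
        System.out.println(countMatches(xml, "//data/text()"));      // selects both data elements
        System.out.println(countMatches(xml, "/root/data/text()"));  // selects only the direct child
    }
}
```

With a single root-level data element, as in the question, both forms select the same node, so the difference only matters for larger documents.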
I am using EXIficient to convert XML data to EXI and back to XML. Here, I use their EXIficientDemo class. Sample code:
EXIficientDemo sample = new EXIficientDemo();
sample.parseAndProofFileLocations("FilePath");
sample.codeSchemaLess();
It first converts the XML file to EXI and then back to XML. When it generates XML from the previously generated EXI file, it loses some namespace information.
Actual XML File:
<?xml version="1.0" encoding="utf-8"?>
<tt xml:lang="ja" xmlns="http://www.w3.org/ns/ttml"
xmlns:tts="http://www.w3.org/ns/ttml#styling"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<body>
<div>
<p xml:id="s1">
<span tts:origin="somethings">somethings</span>
</p>
</div>
</body>
Generated XML File By EXIficient
<?xml version="1.0" encoding="UTF-8"?>
<ns3:tt xmlns:ns3="http://www.w3.org/ns/ttml"
xml:lang="ja"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns3:body><ns3:div>
<ns3:p xml:id="s1">
<ns3:span xmlns:ns4="http://www.w3.org/ns/ttml#styling"
ns4:origin="somethings">somethings</ns3:span>
</ns3:p>
</ns3:div></ns3:body>
In the generated XML file, it is missing xmlns:tts="http://www.w3.org/ns/ttml#styling"
How can I fix this problem? Any help is appreciated.
EXIficient may be suppressing unused namespaces. Your example doesn't show any use of the ttm namespace.
As you can see, it didn't retain the namespace prefix for the ttml namespace either (changed to ns3). The generated XML is perfectly valid if the ttml#metadata namespace is unused.
Update
With the updated question, where namespace ttml#styling is used by the origin attribute of the span element, the namespace is retained in the rebuilt XML, but it has been moved to the span element.
This is still a very valid XML document.
Namespace declarations (xmlns) can appear on any element in an XML document. A declaration applies to the element on which it appears and to all of its descendants (unless overridden, which is very unusual).
The same namespace can be declared many times on different elements. For simplicity and/or optimization, it is common to declare all namespaces up front, on the root element, using different prefixes, but it is not required to do so.
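A short sketch of why the two documents are equivalent to a namespace-aware consumer, using the JDK DOM parser. The class and method names are made up for the demo, and the XML strings are trimmed versions of the two files in the question. The lookup goes by namespace URI and local name, so neither the prefix nor the element carrying the declaration matters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceDemo {
    static final String TTML = "http://www.w3.org/ns/ttml";
    static final String STYLING = "http://www.w3.org/ns/ttml#styling";

    /** Returns the styling-namespace "origin" attribute of the first TTML span element. */
    static String originOfSpan(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element span = (Element) doc.getElementsByTagNameNS(TTML, "span").item(0);
            return span.getAttributeNS(STYLING, "origin");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Declarations on the root element, default namespace plus tts prefix.
        String original = "<tt xmlns='http://www.w3.org/ns/ttml' "
                + "xmlns:tts='http://www.w3.org/ns/ttml#styling'>"
                + "<body><span tts:origin='somethings'>somethings</span></body></tt>";
        // Generated prefixes, styling namespace declared on the span itself.
        String rebuilt = "<ns3:tt xmlns:ns3='http://www.w3.org/ns/ttml'><ns3:body>"
                + "<ns3:span xmlns:ns4='http://www.w3.org/ns/ttml#styling' "
                + "ns4:origin='somethings'>somethings</ns3:span></ns3:body></ns3:tt>";
        System.out.println(originOfSpan(original).equals(originOfSpan(rebuilt)));
    }
}
```

Code that matches on the literal prefix (for example "tts:origin" with a non-namespace-aware parser) would see a difference, which is the case where preserving prefixes becomes relevant.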
I read this question by accident and rather late unfortunately.
Just in case people are still struggling with this and are wondering what they can do.
As it was pointed out EXIficient behaves just fine with regards to namespace handling.
Having said that, the EXI specification allows one to preserve prefixes and namespaces (see Preserve Options).
In EXIficient one can set these options accordingly,
e.g.,
EXIFactory exiFactory = DefaultEXIFactory.newInstance();
exiFactory.getFidelityOptions().setFidelity(FidelityOptions.FEATURE_PREFIX, true);
I am getting the error below. Please help.
"parse error:
Error on line 1 of document :
The markup in the document preceding the root element must be well-formed.
Nested exception: The markup in the document preceding the root element must be well-formed.
XML is below
<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<'env:Envelope' xmlns>:env=\"http://www.w3.org/2003/05/soap-envelope\" xmlns:ns1=\"urn:zimbraAdmin\">
xmlns:ns2=\"urn:zimbraAdmin\"><env:Header><ns2:context/></env:Header><env:Body>
<ModifyAccountRequest xmlns=\"urn:zimbraAdmin\"><id>4d41ec71-d898-42b8-b522-3c3cdc5583a0</id>
<a n=\"zimbraIsAdminAccount\">TRUE</a>
</ModifyAccountRequest></env:Body></env:Envelope>
That was terribly malformed. Issues are highlighted below:
1. Every instance of \" should be replaced with a plain ". The backslash is an escape character in a Java string literal and does not belong in the XML itself.
2. There should be no single quotes around the element name in <'env:Envelope'. I honestly have no idea where they came from.
3. The closing angle bracket in xmlns>:env= should be removed, as should the one at the end of the physical line xmlns:ns1=\"urn:zimbraAdmin\">. Removing the latter brings the next namespace declaration (which seems unnecessarily identical to ns1) into the envelope tag.
I have no idea what caused the envelope to become so malformed, but you should read up on the purpose of the values and variables you were setting with the xmlns and namespace references, so that next time you at least understand what all the parts of the XML request do. This will help you troubleshoot your own documents in the future.
In the meantime, since you seem to be at a total loss, here is the XML with the errors above corrected.
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope" xmlns:ns1="urn:zimbraAdmin" xmlns:ns2="urn:zimbraAdmin">
<env:Header>
<ns2:context/>
</env:Header>
<env:Body>
<ModifyAccountRequest xmlns="urn:zimbraAdmin">
<id>4d41ec71-d898-42b8-b522-3c3cdc5583a0</id>
<a n="zimbraIsAdminAccount">TRUE</a>
</ModifyAccountRequest>
</env:Body>
</env:Envelope>
i need to parse a XML file that looks like this
1.<?xml version="1.0" encoding="UTF-8"?>
2.<Root>
3.<Record>
4.<in><![CDATA[<?xml version="1.0" encoding="UTF-8"?><XML><Attribute AttrID="A">Test</Attribute>-<Attribute AttrID="B"> <![CDATA[Aap Noot Mies]]> </Attribute>]]></XML></in>
5.<out><![CDATA[]]></out>
6.</Record>
7.</Root>
I am getting an error while parsing line number 4. Is there any way to escape a CDATA end token ( ]]> ) within a CDATA section in an XML document?
Your input is not well formed; there are several errors. I think you need to fix whatever generated it so that it produces something more like:
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<Record>
<in><![CDATA[<?xml version="1.0" encoding="UTF-8"?><!-- - --><XML><Attribute AttrID="A">Test</Attribute>-<Attribute AttrID="B"> <![CDATA[Aap Noot Mies]]]]><![CDATA[> </Attribute></XML>]]></in>
<out><![CDATA[]]></out>
</Record>
</Root>
Note that the outer CDATA needs <![CDATA[, not <!CDATA[. The first use of ]]> needs to be quoted (for example by stopping and starting the outer CDATA section, as here). The outer ]]> needs to be moved after the </XML> so that the end tag as well as the start tag of the element is quoted.
That makes the file technically well formed. However, names starting with xml (in upper or lower case), such as the element name XML, are reserved by the W3C for use in XML-related specifications and should not be used in user XML files unless they are a specific element or attribute defined by the W3C (such as xmlns).
In addition, I wrapped the dash after the XML declaration in a (quoted) comment. If that CDATA section were extracted and made into an XML document, a bare dash would make the resulting document non-well-formed, as only white space, comments and PIs are allowed before the first element.
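If the generator is under your control, the "stop and restart the CDATA section" trick can be applied mechanically. The following is a sketch under that assumption. The names CdataEscaper, wrapInCdata and roundTrip are invented for the example, and the round trip uses the JDK DOM parser only to confirm that the original text comes back.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataEscaper {
    /**
     * Wraps text in a CDATA section. Every "]]>" inside the text is split so that
     * "]]" ends one section and ">" starts the next one.
     */
    static String wrapInCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    /** Embeds the wrapped text in an <in> element, parses it, and returns the element's text. */
    static String roundTrip(String text) {
        try {
            String xml = "<in>" + wrapInCdata(text) + "</in>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String inner = "<Attribute AttrID=\"B\"><![CDATA[Aap Noot Mies]]></Attribute>";
        System.out.println(wrapInCdata(inner));
        System.out.println(roundTrip(inner).equals(inner));
    }
}
```

A parser reading the result sees two adjacent CDATA sections whose concatenated text is the original string, including the inner ]]>.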
Can you help me parse XML with nested <?xml version="1.0" encoding="utf-8"?> declarations? When I try to parse this XML, I get a parsing error.
<?xml version="1.0" encoding="utf-8"?>
<soap>
<soapenvBody>
<serviceResponse>
<?xml version="1.0" encoding="UTF-8"?>
<data>
<respCode>0</respCode>
</data>
</serviceResponse>
</soapenvBody>
</soap>
I don't think this is really a Java problem. Having a second XML declaration within the XML body is just illegal, so I don't think you'll be able to get any XML parsers to parse that. If you have control over the XML (it looks like you're generating it to store a response) then you could try wrapping the inner-XML document with CDATA:
<?xml version="1.0" encoding="utf-8"?>
<soap>
<soapenvBody>
<serviceResponse>
<![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<data>
<respCode>0</respCode>
</data>
]]>
</serviceResponse>
</soapenvBody>
</soap>
EDIT:
I'm thinking that you most likely don't want the extra XML declaration inside that response at all. Do you have control over the code that creates the response? My guess is that the XML snippet <data>...</data> is created as a separate DOM object and then the string is spliced in the middle of the response. Writing out the entire XML document object results in the XML declaration being included, but if you just grab the document root node object (<data>) and write that out as a string then it probably won't include the extra XML declaration that's causing you all this trouble.
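As a sketch of that idea, assuming the inner document is available as a DOM tree and is serialized with the JDK's Transformer API: the class and method names are hypothetical. Rather than relying on what the serializer does by default for an element node, the sketch sets OMIT_XML_DECLARATION explicitly.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FragmentWriter {
    /** Serializes a node as XML text without a leading XML declaration. */
    static String withoutDeclaration(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Demo helper: parses XML text and serializes its root element without a declaration. */
    static String rootWithoutDeclaration(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return withoutDeclaration(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String inner = "<?xml version='1.0' encoding='UTF-8'?><data><respCode>0</respCode></data>";
        System.out.println(rootWithoutDeclaration(inner));
    }
}
```

The resulting string can be spliced into the outer response without introducing a second declaration.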
It occurred to me that a parser made for dealing with HTML might be able to do what you want. Since HTML tends to be a total mess compared to strict XML, HTML parsers are usually much more error-tolerant. A quick search turned up jsoup. I was able to pull the respCode from your sample XML above with roughly this code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
String data = "your xml goes here";
Document doc = Jsoup.parse(data);
String respCodeRaw = doc.select("respCode").first().text();
int respCode = Integer.parseInt(respCodeRaw);
(I actually tested the library in the Clojure repl, but the code above should work!)
A tag that starts with <? is a processing instruction. <?xml ...?> is the XML declaration, and it can only appear at the very beginning of the XML content. It's not allowed in the XML body.
Why does your soap body contain this? Do you have the option of removing it?
I did not find any Java parser that can parse such embedded XML, as it is not valid XML, and I guess almost all parsers check the XML before parsing it. So I chose to preprocess the XML: I extracted the inner XML, parsed it with a SAX parser, and retrieved the values. Thanks for your replies, guys.
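For reference, a minimal sketch of one possible preprocessing step. It is a variant of the approach described, not the exact code used: instead of extracting the inner document, it removes every XML declaration that is not at the very start of the text and then parses the result with the JDK DOM parser. The class and method names are made up. A regex like this would also strip declarations that appear inside CDATA or comments, so it is only suitable when that cannot happen.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NestedDeclarationCleaner {
    private static final Pattern XML_DECL = Pattern.compile("<\\?xml\\s[^?]*\\?>");

    /** Keeps a declaration at offset 0 and removes any declaration found later in the text. */
    static String stripNestedDeclarations(String xml) {
        Matcher matcher = XML_DECL.matcher(xml);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            if (matcher.start() == 0) {
                continue; // the leading declaration is legal, keep it
            }
            out.append(xml, last, matcher.start());
            last = matcher.end();
        }
        out.append(xml, last, xml.length());
        return out.toString();
    }

    /** Cleans the text, parses it, and returns the text of the first respCode element. */
    static String respCode(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(stripNestedDeclarations(xml))));
            return doc.getElementsByTagName("respCode").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String response = "<?xml version='1.0' encoding='utf-8'?>\n<soap><serviceResponse>\n"
                + "<?xml version='1.0' encoding='UTF-8'?>\n"
                + "<data><respCode>0</respCode></data></serviceResponse></soap>";
        System.out.println(respCode(response));
    }
}
```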