Parsing XML data - Java

Does anyone know how to resolve this exception?
java.io.FileNotFoundException: F:\eclipse\WS\l\dblp.dtd (The system cannot find the file specified)
Even though I have given the correct path, I am still getting this exception.
Here is my XML:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<article mdate="2011-01-11" key="journals/acta/Saxena96">
<author>Sanjeev Saxena</author>
<title>Parallel Integer Sorting and Simulation Amongst CRCW Models. </title>
<pages>607-619</pages>
<year>1996</year>
<volume>33</volume>
<journal>Acta Inf.</journal>
<number>7</number>
<url>db/journals/acta/acta33.html#Saxena96</url>
<ee>http://dx.doi.org/10.1007/BF03036466</ee>
</article>
<article mdate="2011-01-11" key="journals/acta/Simon83">
<author>Hans-Ulrich Simon</author>
<title>Pattern Matching in Trees and Nets.</title>
<pages>227-248</pages>
<year>1983</year>
<volume>20</volume>
<journal>Acta Inf.</journal>
<url>db/journals/acta/acta20.html#Simon83</url>
<ee>http://dx.doi.org/10.1007/BF01257084</ee>
</article>
</dblp>

You are referencing "dblp.dtd" in your DOCTYPE - do you have that DTD file in the directory mentioned in the exception?
If not, and you cannot obtain the DTD, try removing the DOCTYPE line from the XML file, or override the parser's entity resolution so that it does not try to load the DTD, as in this answer.
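The entity-resolution override mentioned above can be sketched as follows. This is a minimal example, assuming the standard JAXP DOM parser is in use; the class name DtdSkippingParser and the method parseIgnoringDtd are hypothetical names chosen for illustration. The resolver answers every request for an external entity, including dblp.dtd, with an empty stream, so nothing is looked up on disk.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class DtdSkippingParser {

    // Parses XML text without loading any external DTD it references.
    static Document parseIgnoringDtd(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Answer every external entity request with an empty document
            // instead of letting the parser open the file named in DOCTYPE.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need not declare them.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<!DOCTYPE dblp SYSTEM \"dblp.dtd\">"
                + "<dblp><article key=\"journals/acta/Saxena96\">"
                + "<author>Sanjeev Saxena</author><year>1996</year>"
                + "</article></dblp>";
        Document doc = parseIgnoringDtd(xml);
        System.out.println(
                doc.getElementsByTagName("author").item(0).getTextContent());
    }
}
```

One caveat: if the document relies on entities that are declared in the DTD, those references can no longer be resolved with an empty DTD, so supplying the real dblp.dtd next to the XML file is the more robust fix.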

Related

Missing NameSpace Information In XML file using EXIficient

I am using EXIficient to convert XML data to EXI and back to XML, using their EXIficientDemo class. Sample code:
EXIficientDemo sample = new EXIficientDemo();
sample.parseAndProofFileLocations("FilePath");
sample.codeSchemaLess();
It first converts the XML file to EXI and then back to XML. When it regenerates the XML from the previously generated EXI file, some namespace information is lost.
Actual XML File:
<?xml version="1.0" encoding="utf-8"?>
<tt xml:lang="ja" xmlns="http://www.w3.org/ns/ttml"
xmlns:tts="http://www.w3.org/ns/ttml#styling"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<body>
<div>
<p xml:id="s1">
<span tts:origin="somethings">somethings</span>
</p>
</div>
</body>
XML file generated by EXIficient:
<?xml version="1.0" encoding="UTF-8"?>
<ns3:tt xmlns:ns3="http://www.w3.org/ns/ttml"
xml:lang="ja" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns3:body><ns3:div>
<ns3:p xml:id="s1">
<ns3:span xmlns:ns4="http://www.w3.org/ns/ttml#styling"
ns4:origin="somethings">somethings</ns3:span>
</ns3:p>
</ns3:div></ns3:body>
The generated XML file is missing xmlns:tts="http://www.w3.org/ns/ttml#styling" on the root element.
How can I fix this problem?

EXIficient may be suppressing unused namespaces. Your example doesn't show any use of the ttm namespace.
As you can see, it didn't retain the namespace prefix for the ttml namespace either (changed to ns3). The generated XML is perfectly valid if the ttml#metadata namespace is unused.
Update
With the updated question, where namespace ttml#styling is used by the origin attribute of the span element, the namespace is retained in the rebuilt XML, but it has been moved to the span element.
This is still a valid XML document.
A namespace declaration (xmlns) can appear on any element in an XML document. It applies to the element on which it appears and to all of its descendants (unless overridden, which is very unusual).
The same namespace can be declared many times on different elements. For simplicity and/or optimization, it is common to declare all namespaces up front, on the root element, using different prefixes, but it is not required to do so.
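To illustrate why the two documents are equivalent, here is a small sketch using the standard namespace-aware DOM parser. The class name NamespaceEquivalence and the method originOfFirstSpan are hypothetical, and the two XML strings are shortened versions of the documents in the question. The lookup uses only namespace URIs and local names, so it returns the same value no matter which prefix is used or where the declaration sits.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
final class NamespaceEquivalence {

    static final String TTML = "http://www.w3.org/ns/ttml";
    static final String STYLING = "http://www.w3.org/ns/ttml#styling";

    // Reads the styling-namespace "origin" attribute of the first TTML span,
    // addressing both by namespace URI rather than by prefix.
    static String originOfFirstSpan(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element span =
                    (Element) doc.getElementsByTagNameNS(TTML, "span").item(0);
            return span.getAttributeNS(STYLING, "origin");
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Declarations on the root element, as in the original file.
        String original = "<tt xmlns=\"" + TTML + "\" xmlns:tts=\"" + STYLING + "\">"
                + "<body><div><p><span tts:origin=\"somethings\">somethings</span>"
                + "</p></div></body></tt>";
        // Generated prefixes, styling namespace declared on the span itself.
        String rebuilt = "<ns3:tt xmlns:ns3=\"" + TTML + "\"><ns3:body><ns3:div><ns3:p>"
                + "<ns3:span xmlns:ns4=\"" + STYLING + "\" ns4:origin=\"somethings\">"
                + "somethings</ns3:span></ns3:p></ns3:div></ns3:body></ns3:tt>";
        System.out.println(originOfFirstSpan(original));
        System.out.println(originOfFirstSpan(rebuilt));
    }
}
```

Code that compares prefixes as plain strings (for example, searching for "tts:origin") would treat the two files differently, which is one reason to work with namespace URIs instead.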

I read this question by accident and rather late unfortunately.
Just in case people are still struggling with this and are wondering what they can do.
As was pointed out, EXIficient behaves correctly with regard to namespace handling.
Having said that, the EXI specification allows one to preserve prefixes and namespaces (see Preserve Options).
In EXIficient one can set these options accordingly,
e.g.,
EXIFactory exiFactory = DefaultEXIFactory.newInstance();
exiFactory.getFidelityOptions().setFidelity(FidelityOptions.FEATURE_PREFIX, true);

How to correct for no grammar constraints error

I have a simple XML document that is flagged (correctly) in Eclipse as having no grammar. I use the file to preload a database on initialisation. Is there a "generic" DTD or Schema that I could apply to this document (and similar - I have over 15 of them) to eliminate this warning and be more correct in my XML structure?
<AbnormalFlags>
<AbnormalFlag>
<code>H</code>
<description>High</description>
</AbnormalFlag>
<AbnormalFlag>
<code>L</code>
<description>Low</description>
</AbnormalFlag>
<AbnormalFlag>
<code>A</code>
<description>Abnormal</description>
</AbnormalFlag>
</AbnormalFlags>

Just add this at the top of your XML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>

how to reference the path to a DTD value in XML

I am a newbie when it comes to XML and DTD values, so forgive me if this is a simple question or if I am going about this in the wrong way. Can you specify a DTD value in the same way you can specify a path to a property in XML?
For instance, if you have the XML file below:
<!DOCTYPE ... SYSTEM "<path_to_file>">
<BOOK>
<AUTHOR>
<FIRST>John</FIRST>
<LAST>Quncy</LAST>
</AUTHOR>
<NAME>blah</NAME>
<DATE>12/23/13</DATE>
</BOOK>
You could specify the first name of the author by the path:
/BOOK/AUTHOR/FIRST
Is there any syntax to specify a DTD entity like the DOCTYPE in the same way?
Ultimately, I would like to use an in-house XML parser, already written in Java, to find a DTD entry that I specify and delete it from the XML file. For instance, with the above XML, I would like to specify DOCTYPE and have it removed from the XML. There is already code in place that, given a path, will delete that section from the XML file. I would like to leverage that to delete DTD entries as well, but I have no idea how to reference them.

No. DOCTYPE is a parsing and validation directive: DOCTYPE and the DTD affect how the document is parsed and validated, but they are not part of the document as separate entities after that. The XML data model does not contain DOCTYPE or DTD definitions, and they practically don't exist after the document has been parsed.
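As a sketch of what this means in practice: a path expression can reach /BOOK/AUTHOR/FIRST but has nothing to select for the DOCTYPE. If the in-house parser happens to build on the standard DOM API (an assumption, since the question does not say), the declaration is exposed separately through Document.getDoctype() and can be detached from there. The class DoctypeRemoval and its methods are hypothetical, and the sample uses a small internal DTD subset instead of an external file so that it runs on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
final class DoctypeRemoval {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Ordinary content is addressable by path.
    static String firstName(Document doc) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate("/BOOK/AUTHOR/FIRST", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // The DOCTYPE is not addressable by path; DOM exposes it separately.
    static void removeDoctype(Document doc) {
        DocumentType doctype = doc.getDoctype();
        if (doctype != null) {
            doc.removeChild(doctype);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE BOOK [<!ELEMENT BOOK ANY>]>"
                + "<BOOK><AUTHOR><FIRST>John</FIRST><LAST>Quncy</LAST></AUTHOR></BOOK>";
        Document doc = parse(xml);
        System.out.println(firstName(doc));
        removeDoctype(doc);
        System.out.println(doc.getDoctype() == null);
    }
}
```

If the existing deletion code works purely on paths, the DOCTYPE probably needs a special case like removeDoctype rather than a path of its own.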

Parse XML with nested xml opening tags <?xml ...?> in java

Can you help me parse XML that contains a nested <?xml version="1.0" encoding="utf-8"?> declaration? When I try to parse this XML, I get a parsing error.
<?xml version="1.0" encoding="utf-8"?>
<soap>
<soapenvBody>
<serviceResponse>
<?xml version="1.0" encoding="UTF-8"?>
<data>
<respCode>0</respCode>
</data>
</serviceResponse>
</soapenvBody>
</soap>

I don't think this is really a Java problem. Having a second XML declaration within the XML body is just illegal, so I don't think you'll be able to get any XML parsers to parse that. If you have control over the XML (it looks like you're generating it to store a response) then you could try wrapping the inner-XML document with CDATA:
<?xml version="1.0" encoding="utf-8"?>
<soap>
<soapenvBody>
<serviceResponse>
<![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<data>
<respCode>0</respCode>
</data>
]]>
</serviceResponse>
</soapenvBody>
</soap>
EDIT:
I'm thinking that you most likely don't want the extra XML declaration inside that response at all. Do you have control over the code that creates the response? My guess is that the XML snippet <data>...</data> is created as a separate DOM object and then the string is spliced in the middle of the response. Writing out the entire XML document object results in the XML declaration being included, but if you just grab the document root node object (<data>) and write that out as a string then it probably won't include the extra XML declaration that's causing you all this trouble.
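A minimal sketch of that idea, assuming the response is built with the standard DOM and javax.xml.transform APIs (the question does not show the generating code, so this is a guess). The class FragmentSerializer and the method withoutDeclaration are hypothetical names. The transformer is told to omit the XML declaration, so the resulting string can be spliced into an outer document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
final class FragmentSerializer {

    // Serializes a node (for example the <data> root element) to a string
    // without the leading <?xml ...?> declaration.
    static String withoutDeclaration(Node node) {
        try {
            Transformer transformer =
                    TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document inner = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                        "<data><respCode>0</respCode></data>")));
        // Safe to embed inside <serviceResponse>...</serviceResponse>.
        System.out.println(withoutDeclaration(inner.getDocumentElement()));
    }
}
```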

It occurred to me that a parser made for dealing with HTML might be able to do what you want. Since HTML tends to be a total mess compared to strict XML, HTML parsers are usually much more error-tolerant. A quick search turned up jsoup. I was able to pull the respCode from your sample XML above with roughly this code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
String data = "your xml goes here";
Document doc = Jsoup.parse(data);
String respCodeRaw = doc.select("respCode").first().text();
int respCode = Integer.valueOf(respCodeRaw);
(I actually tested the library in the Clojure repl, but the code above should work!)

A tag that starts with <? is a processing instruction. <?xml ...?> is the XML declaration, and it may appear only at the very beginning of the XML content. It is not allowed in the XML body.
Why does your soap body contain this? Do you have the option of removing it?

I did not find any parser in Java that can parse such embedded XML, since it is not valid XML and I guess almost all parsers check the XML for well-formedness while parsing it. So I chose to preprocess the XML: I selected the inner XML, then parsed it with a SAX parser and retrieved the values. Thanks for your replies, everyone.
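One possible preprocessing step can be sketched like this. It is a variant of the approach described above, not the poster's actual code: instead of cutting out the inner document, it deletes every XML declaration that is not at the very start of the text and then parses the result normally (DOM is used here for brevity). The class NestedDeclarationCleaner and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
final class NestedDeclarationCleaner {

    // Matches <?xml version=... ?> but not other processing instructions
    // such as <?xml-stylesheet ...?>.
    private static final Pattern XML_DECL = Pattern.compile("<\\?xml\\s[^?]*\\?>");

    // Keeps a declaration at offset 0 and removes all later ones.
    static String stripNestedDeclarations(String xml) {
        Matcher m = XML_DECL.matcher(xml);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            if (m.start() == 0) {
                continue; // the leading declaration is legal, keep it
            }
            out.append(xml, last, m.start());
            last = m.end();
        }
        out.append(xml, last, xml.length());
        return out.toString();
    }

    // Cleans the raw text, parses it, and returns the first respCode value.
    static String respCode(String rawXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(
                            new StringReader(stripNestedDeclarations(rawXml))));
            return doc.getElementsByTagName("respCode").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse cleaned XML", e);
        }
    }

    public static void main(String[] args) {
        String raw = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
                + "<soap><soapenvBody><serviceResponse>\n"
                + "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<data><respCode>0</respCode></data>\n"
                + "</serviceResponse></soapenvBody></soap>";
        System.out.println(respCode(raw));
    }
}
```

A text-level fix like this is only a workaround; it would also strip a declaration that appears literally inside a CDATA section or comment, so fixing the code that produces the response remains the better option.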

How do I take off the XML version tag in the XOM library for Java?

I'm writing a small application in Java that uses XOM to output XHTML.
The problem is that XOM places the following tag before all the html:
<?xml version="1.0" encoding="UTF-8"?>
I've read their documentation, but I can't seem to find how to remove this tag. Thanks guys.
Edit: I'm outputting to a file using XOM's Serializer class
Follow up: If it is good practice to use the XML tag before the DOCTYPE, why don't any websites use it? Also, why does the W3C validator give me an error when it sees the XML tag? Here is the error:
Illegal processing instruction target (found xml)
Finally, if I were to put the XML tag before my DOCTYPE, does this mean I don't have to specify <meta charset="UTF-8" /> in my html header?

The tag is valid as XML and XHTML, and good practice. There should be no reason to remove it.
Just leave it there ... or fix whatever it is that is expecting it not to be there.
If you don't believe me, take a look at this excerpt from the XHTML 1.1 spec.
"Example of an XHTML 1.1 document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html version="-//W3C//DTD XHTML 1.1//EN"
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/1999/xhtml
http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd"
>
<head>
<title>Virtual Library</title>
</head>
<body>
<p>Moved to example.org.</p>
</body>
</html>
Note that in this example, the XML declaration is included. An XML declaration like the one above is not required in all XML documents. XHTML document authors SHOULD use XML declarations in all their documents. XHTML document authors MUST use an XML declaration when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding is specified by a higher-level protocol."
By the way, the W3C validation service says that is OK ... but if there is any whitespace before the <?xml ...?> tag it complains.

Does this work? This is listed in the Javadoc:
protected void writeXMLDeclaration()
throws IOException
You could override it to do nothing.
Agreed, though, that you should normally output the prolog.

Assuming you wish to serve your XHTML as text/html content type, you are right to want to remove the XML declaration, because if you don't, it will throw IE6 into quirks mode.
Overriding writeXMLDeclaration() as suggested by MJB looks like a good way to do it.
But you should be aware that you may well hit other problems using an XML serializer and serving the output as text/html.
Most likely, the output will contain a tag like this: <script src="myscript.js" />. Browsers (except Safari) won't treat that as a self-closing script tag, but as a script start tag, and everything that follows will be treated as part of the script and not rendered by the browser.
You will probably need to override your serializer to make it HTML aware to resolve this. I suggest overriding the writeEmptyElementTag() function, and for all elements with names not in the list "area", "base", "basefont", "bgsound", "br", "col", "command", "embed", "frame", "hr", "isindex", "image", "img", "input", "keygen", "link", "meta", "param", "source", "spacer" and "wbr", call writeStartTag() and then writeEndTag() instead of the default behaviour.
"Finally, if I were to put the XML tag before my DOCTYPE, does this mean I don't have to specify <meta charset="UTF-8" /> in my html header?"
No it doesn't. When served as text/html, the XML declaration is simply ignored by browsers, so you will still need to provide the character encoding by some other means, either the meta tag, or in the HTTP headers.
