how to reference the path to a DTD value in XML - java

I am a newbie when it comes to XML and DTD values, so forgive me if this is a simple question or if I am going about this in the wrong way. Can you specify a DTD value in the same way you can specify a path to a property in XML?
For instance, if you have the XML file below:
<!DOCTYPE ... SYSTEM "<path_to_file>">
<BOOK>
<AUTHOR>
<FIRST>John</FIRST>
<LAST>Quncy</LAST>
</AUTHOR>
<NAME>blah</NAME>
<DATE>12/23/13</DATE>
</BOOK>
You could specify the first name of the author by the path:
/BOOK/AUTHOR/FIRST
Is there any syntax to specify a DTD entity like the DOCTYPE in the same way?
Ultimately, what I would like to do is use an in-house XML parser, already written in Java, to find a DTD entry that I specify and delete it from the XML file. For instance, with the above XML, I would like to specify DOCTYPE and have it removed from the XML. There is already code in place that, given a path, will delete that section from the XML file. I would like to leverage that to delete DTD entries as well, but I have no idea how to reference them.

No. DOCTYPE is a parsing and validation directive. That is, DOCTYPE and DTD affect parsing and validation, but they are not part of the document as separate entities after that. The XML data model that path expressions operate on does not contain DOCTYPE or DTD definitions, and they practically don't exist once the document has been parsed.
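Path expressions cannot reach the DOCTYPE, but an API sometimes can. The following is only a sketch, assuming the in-house parser hands back a standard org.w3c.dom.Document (the question does not say that it does). The class name DoctypeRemover and the inline sample with an internal DTD subset are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the asker's in-house parser.
class DoctypeRemover {
    // Sample with an internal DTD subset so that no external file is needed.
    static final String SAMPLE =
        "<!DOCTYPE BOOK [<!ELEMENT BOOK ANY>]>"
      + "<BOOK><AUTHOR><FIRST>John</FIRST></AUTHOR></BOOK>";

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Detaches the DOCTYPE node if there is one; returns true when it was removed.
    static boolean removeDoctype(Document doc) {
        DocumentType doctype = doc.getDoctype();
        if (doctype == null) {
            return false;
        }
        doc.removeChild(doctype);
        return true;
    }

    // The ordinary path from the question still works on the element tree.
    static String firstName(Document doc) throws Exception {
        return XPathFactory.newInstance().newXPath()
            .evaluate("/BOOK/AUTHOR/FIRST", doc);
    }
}
```

The DOCTYPE is reached through getDoctype() rather than through a path, so the existing delete-by-path code would need a special case for it.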

Related

Missing NameSpace Information In XML file using EXIficient

I am using EXIficient to convert XML data to EXI and back to XML. Here, I use their EXIficientDemo class. Sample code:
EXIficientDemo sample = new EXIficientDemo();
sample.parseAndProofFileLocations("FilePath");
sample.codeSchemaLess();
It first converts the XML file to EXI and then back to XML. When it generates the XML from the previously generated EXI file, it loses some namespace information.
Actual XML File:
<?xml version="1.0" encoding="utf-8"?>
<tt xml:lang="ja" xmlns="http://www.w3.org/ns/ttml"
xmlns:tts="http://www.w3.org/ns/ttml#styling"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<body>
<div>
<p xml:id="s1">
<span tts:origin="somethings">somethings</span>
</p>
</div>
</body>
</tt>
Generated XML File By EXIficient
<?xml version="1.0" encoding="UTF-8"?>
<ns3:tt xmlns:ns3="http://www.w3.org/ns/ttml"
xml:lang="ja" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns3:body><ns3:div>
<ns3:p xml:id="s1">
<ns3:span xmlns:ns4="http://www.w3.org/ns/ttml#styling"
ns4:origin="somethings">somethings</ns3:span>
</ns3:p>
</ns3:div></ns3:body>
</ns3:tt>
In the generated XML file, xmlns:tts="http://www.w3.org/ns/ttml#styling" is missing.
How can I fix this problem? Any help would be appreciated.
EXIficient may be suppressing unused namespaces. Your example doesn't show any use of the ttm namespace.
As you can see, it didn't retain the namespace prefix for the ttml namespace either (changed to ns3). The generated XML is perfectly valid if the ttml#metadata namespace is unused.
Update
With the updated question, where namespace ttml#styling is used by the origin attribute of the span element, the namespace is retained in the rebuilt XML, but it has been moved to the span element.
This is still a valid XML document.
A namespace declaration (xmlns) can appear on any element in an XML document, and it applies to the element on which it appears and to all of its subelements (unless overridden, which is very unusual).
The same namespace can be declared many times on different elements. For simplicity and/or optimization, it is common to declare all namespaces up front, on the root element, using different prefixes, but it is not required to do so.
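To illustrate why the two documents are equivalent, here is a small sketch using the JDK's namespace-aware DOM parser (not EXIficient). It looks up the origin attribute by namespace URI and local name, so the prefix and the place of the declaration do not matter. The class name NamespaceEquivalence and the trimmed-down samples are assumptions made for the example:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; the samples are shortened versions of the two files.
class NamespaceEquivalence {
    static final String TTML = "http://www.w3.org/ns/ttml";
    static final String STYLING = "http://www.w3.org/ns/ttml#styling";

    // Declarations up front on the root element, as in the original file.
    static final String ORIGINAL =
        "<tt xmlns=\"" + TTML + "\" xmlns:tts=\"" + STYLING + "\">"
      + "<span tts:origin=\"somethings\">somethings</span></tt>";

    // Generated prefixes, styling namespace declared on the span itself.
    static final String REBUILT =
        "<ns3:tt xmlns:ns3=\"" + TTML + "\">"
      + "<ns3:span xmlns:ns4=\"" + STYLING + "\" ns4:origin=\"somethings\">"
      + "somethings</ns3:span></ns3:tt>";

    // Reads the origin attribute by namespace URI and local name, ignoring prefixes.
    static String originValue(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Element span = (Element) doc.getElementsByTagNameNS(TTML, "span").item(0);
        return span.getAttributeNS(STYLING, "origin");
    }
}
```

Code that depends on the literal prefix text (for example, string matching on "tts:origin") would still break, which is the case where preserving prefixes matters.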
I read this question by accident and rather late unfortunately.
Just in case people are still struggling with this and are wondering what they can do.
As was pointed out, EXIficient behaves just fine with regard to namespace handling.
Having said that, the EXI specification allows one to preserve prefixes and namespaces (see Preserve Options).
In EXIficient one can set these options accordingly,
e.g.,
EXIFactory.getFidelityOptions().setFidelity(FidelityOptions.FEATURE_PREFIX, true);

Parsing xml data

Does anyone know how to resolve this exception?
java.io.FileNotFoundException: F:\eclipse\WS\l\dblp.dtd (The system cannot find the file specified)
Even though I have given the correct path, I am still getting this exception.
Here is my XML:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<article mdate="2011-01-11" key="journals/acta/Saxena96">
<author>Sanjeev Saxena</author>
<title>Parallel Integer Sorting and Simulation Amongst CRCW Models. </title>
<pages>607-619</pages>
<year>1996</year>
<volume>33</volume>
<journal>Acta Inf.</journal>
<number>7</number>
<url>db/journals/acta/acta33.html#Saxena96</url>
<ee>http://dx.doi.org/10.1007/BF03036466</ee>
</article>
<article mdate="2011-01-11" key="journals/acta/Simon83">
<author>Hans-Ulrich Simon</author>
<title>Pattern Matching in Trees and Nets.</title>
<pages>227-248</pages>
<year>1983</year>
<volume>20</volume>
<journal>Acta Inf.</journal>
<url>db/journals/acta/acta20.html#Simon83</url>
<ee>http://dx.doi.org/10.1007/BF01257084</ee>
</article>
</dblp>
You are referencing "dblp.dtd" in your DOCTYPE - do you have that DTD file in the directory mentioned in the exception?
If not and you don't have the DTD, try removing the DOCTYPE line from the xml file, or overriding the entity resolution to tell it not to try loading it, as in this answer.
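Overriding the entity resolution could look roughly like this. It is a minimal sketch using the JDK's DOM parser; the class and method names are made up, and it assumes the document does not rely on entities declared in the DTD (if it does, parsing will fail on the first undefined entity reference):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for parsing without the referenced DTD file.
class SkipDtdParser {
    static Document parseIgnoringDtd(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Answer every external entity request (including the DTD)
        // with empty content instead of reading it from disk.
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

With this resolver in place the parser no longer looks for dblp.dtd on disk, so the FileNotFoundException does not occur.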

How to correct for no grammar constraints error

I have a simple XML document that is flagged (correctly) in Eclipse as having no grammar. I use the file to preload a database on initialisation. Is there a "generic" DTD or Schema that I could apply to this document (and similar - I have over 15 of them) to eliminate this warning and be more correct in my XML structure?
<AbnormalFlags>
<AbnormalFlag>
<code>H</code>
<description>High</description>
</AbnormalFlag>
<AbnormalFlag>
<code>L</code>
<description>Low</description>
</AbnormalFlag>
<AbnormalFlag>
<code>A</code>
<description>Abnormal</description>
</AbnormalFlag>
</AbnormalFlags>
Just add this at the top of your XML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>

how to create a dtd for this xml?

I have been asked to create a simple DTD for this XML:
<?xml version='1.0' encoding='ISO-8859-1'?>
<QUERY>
<PORT>
<NB></NB>
</PORT>
<BLOCK>
<TAB></TAB>
</BLOCK>
<STAND>
<LEVEL></LEVEL>
</STAND>
</QUERY>
I am using Java. I have never written a DTD before, nor do I know precisely what it is.
I would like some guidance if possible. Thank you.
DTD stands for Document Type Definition; it is used to describe the structure of your XML document. Other representations include XML Schema, Relax NG, etc.:
http://en.wikipedia.org/wiki/Document_Type_Definition
It will look something like the following (although my syntax may not be quite right):
<!ELEMENT QUERY (PORT, BLOCK, STAND)>
<!ELEMENT PORT (NB)>
<!ELEMENT NB (#PCDATA)>
<!ELEMENT BLOCK (TAB)>
<!ELEMENT TAB (#PCDATA)>
<!ELEMENT STAND (LEVEL)>
<!ELEMENT LEVEL (#PCDATA)>
If you look at the definition for QUERY you see it defines that it contains the elements: "PORT", "BLOCK", and "STAND". If you look at the definition for NB, we have declared that it should contain text (parsed character data).
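Since you are using Java, one way to check a DTD like the one above is to let the JDK parser validate the document against it. This is only a sketch: the DTD is embedded as an internal subset so that no separate file is needed, and the class name DtdValidation is made up for the example:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class that validates a QUERY document against the DTD.
class DtdValidation {
    // The element declarations from above, wrapped in an internal DTD subset.
    static final String DOCTYPE =
        "<!DOCTYPE QUERY ["
      + "<!ELEMENT QUERY (PORT, BLOCK, STAND)>"
      + "<!ELEMENT PORT (NB)>"
      + "<!ELEMENT NB (#PCDATA)>"
      + "<!ELEMENT BLOCK (TAB)>"
      + "<!ELEMENT TAB (#PCDATA)>"
      + "<!ELEMENT STAND (LEVEL)>"
      + "<!ELEMENT LEVEL (#PCDATA)>"
      + "]>";

    static boolean isValid(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // DefaultHandler ignores validity errors unless rethrown
            }
        });
        try {
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

A document that leaves out BLOCK or STAND is rejected, because the QUERY declaration requires all three children in that order.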
XMLBeans comes with a tool called inst2xsd which can inspect an XML file and create an XSD for it that you can then edit/refine. I've used it with pretty good results.
Just read the installation guide for XMLBeans and when you install XMLBeans you'll have the inst2xsd tool installed as well.
edit - just realized you wanted a DTD and not an XSD, leaving this here in case an XSD (which is very similar in purpose) could actually solve your problem anyway
There are some DTD generators out there. Quick search yields this. Haven't used it, though.

XML parsing with java dom parser converts xml declaration into elements

Here is the xml file that I need to process as part of my assignment.
<?xml-stylesheet type="text/xsl" href="people.xsl"?>
<People>
<Person>
`...`
</Person>
`...`
</People>
I am using "javax.xml.parsers.DocumentBuilderFactory" to create a DOM parser. After parsing, the resulting document does not have People at the root; instead, there is a root whose children are xml-stylesheet and People.
Looks like this can be avoided.
<?xml-stylesheet ... ?> is not an XML declaration. It is a Processing Instruction (PI), and the DOM spec says that a Document node may contain zero or more of them.
One approach would be to code your application to deal appropriately with (e.g. ignore) any PIs in the Document node. Alternatively, just use the Document node's documentElement attribute / getter to get the root Element directly.
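A short sketch of the second option, using a cut-down version of the file from the question (the class name RootElementDemo and the sample string are made up for illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class showing the Document node versus the root element.
class RootElementDemo {
    static final String SAMPLE =
        "<?xml-stylesheet type=\"text/xsl\" href=\"people.xsl\"?>"
      + "<People><Person/></People>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Skips any processing instructions and returns the root element's name.
    static String rootName(Document doc) {
        return doc.getDocumentElement().getNodeName();
    }
}
```

Here the Document node has two children, the processing instruction and the People element, while getDocumentElement() goes straight to People.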
