I have to use an external DTD that specifies that a certain element can only have an id attribute:
<!ELEMENT x (y | z)>
<!ATTLIST x id ID #IMPLIED>
So something like this is valid:
<x id="x">...</x>
But if I try something like this:
<x id="x" custom="custom">...</x>
My parser gives me the following error:
Attribute "custom" must be declared for element type "x".
So I understand what the error says and why it's happening, but as I said, the DTD is external and sadly I can't change it. Is there a workaround or a hack that I can use to add my own custom attribute?
You can either disable DTD validation in your parser, or try defining an internal DTD subset that declares the extra attribute.
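The internal-DTD option can be sketched as follows. An internal subset may carry an additional ATTLIST declaration for an element that the external DTD already declares, and a validating parser merges the two. This is only a sketch under assumptions: it uses the JDK's built-in parser, the system identifier http://example.com/x.dtd is a placeholder, the external DTD is served from a string by a resolver instead of being fetched, and the declarations for y and z are invented because the question does not show them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class InternalSubsetDemo {
    // Stand-in for the external DTD that cannot be changed.
    // The y and z declarations are assumptions; the question only shows x.
    static final String EXTERNAL_DTD =
        "<!ELEMENT x (y | z)>\n"
      + "<!ELEMENT y EMPTY>\n"
      + "<!ELEMENT z EMPTY>\n"
      + "<!ATTLIST x id ID #IMPLIED>\n";

    static final String BODY = "<x id=\"x1\" custom=\"custom\"><y/></x>";

    // Document that relies on the external DTD only.
    static final String PLAIN =
        "<!DOCTYPE x SYSTEM \"http://example.com/x.dtd\">" + BODY;

    // Same document, with an internal subset declaring the extra attribute.
    static final String EXTENDED =
        "<!DOCTYPE x SYSTEM \"http://example.com/x.dtd\" ["
      + "<!ATTLIST x custom CDATA #IMPLIED>]>" + BODY;

    // Parses with DTD validation on and returns the validity errors (empty = valid).
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Serve the external DTD from memory instead of fetching the URL.
            builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader(EXTERNAL_DTD)));
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("external DTD only:    " + validate(PLAIN));
        System.out.println("with internal subset: " + validate(EXTENDED));
    }
}
```

The catch is that the DOCTYPE line of every instance document has to carry the internal subset, so this only helps when you control the documents even though you do not control the DTD.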
I am using EXIficient to convert XML data to EXI and back to XML. I use their EXIficientDemo class. Sample code:
EXIficientDemo sample = new EXIficientDemo();
sample.parseAndProofFileLocations("FilePath");
sample.codeSchemaLess();
It first converts the XML file to EXI and then back to XML. When it generates XML from the previously generated EXI file, some namespace information is lost.
Actual XML File:
<?xml version="1.0" encoding="utf-8"?>
<tt xml:lang="ja" xmlns="http://www.w3.org/ns/ttml"
xmlns:tts="http://www.w3.org/ns/ttml#styling"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<body>
<div>
<p xml:id="s1">
<span tts:origin="somethings">somethings</span>
</p>
</div>
</body>
Generated XML File By EXIficient
<?xml version="1.0" encoding="UTF-8"?>
<ns3:tt xmlns:ns3="http://www.w3.org/ns/ttml"
xml:lang="ja" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns3:body><ns3:div>
<ns3:p xml:id="s1">
<ns3:span xmlns:ns4="http://www.w3.org/ns/ttml#styling"
ns4:origin="somethings">somethings</ns3:span>
</ns3:p>
</ns3:div></ns3:body>
In the generated XML file, the declaration xmlns:tts="http://www.w3.org/ns/ttml#styling" is missing.
How can I fix this problem? Any help is appreciated.
EXIficient may be suppressing unused namespaces. Your example doesn't show any use of the ttm namespace.
As you can see, it didn't retain the namespace prefix for the ttml namespace either (changed to ns3). The generated XML is perfectly valid if the ttml#metadata namespace is unused.
Update
With the updated question, where namespace ttml#styling is used by the origin attribute of the span element, the namespace is retained in the rebuilt XML, but it has been moved to the span element.
This is still a valid XML document.
Namespace declarations (xmlns) can appear on any element in an XML document, and apply to the element on which they appear and to all of its subelements (unless overridden, which is very unusual).
The same namespace can be declared many times on different elements. For simplicity and/or optimization, it is common to declare all namespaces up front, on the root element, using different prefixes, but it is not required to do so.
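To see that the position of the declaration and the choice of prefix do not matter to a namespace-aware consumer, here is a small sketch using the JDK's DOM parser. The two strings are shortened versions of the original and the rebuilt document from the question; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespacePlacementDemo {
    static final String TTML = "http://www.w3.org/ns/ttml";
    static final String STYLING = "http://www.w3.org/ns/ttml#styling";

    // Declarations on the root element, author-chosen prefixes.
    static final String ORIGINAL =
        "<tt xmlns='" + TTML + "' xmlns:tts='" + STYLING + "'>"
      + "<body><div><p><span tts:origin='somethings'>somethings</span></p></div></body></tt>";

    // Styling declaration moved to the span, generated prefixes.
    static final String REBUILT =
        "<ns3:tt xmlns:ns3='" + TTML + "'><ns3:body><ns3:div><ns3:p>"
      + "<ns3:span xmlns:ns4='" + STYLING + "' ns4:origin='somethings'>somethings</ns3:span>"
      + "</ns3:p></ns3:div></ns3:body></ns3:tt>";

    // Returns "{namespace-uri}local-name=value" for the styling attribute on the first span.
    static String originAttribute(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace processing is off by default
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element span = (Element) doc.getElementsByTagNameNS(TTML, "span").item(0);
            Attr origin = span.getAttributeNodeNS(STYLING, "origin");
            return "{" + origin.getNamespaceURI() + "}" + origin.getLocalName()
                + "=" + origin.getValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(originAttribute(ORIGINAL));
        System.out.println(originAttribute(REBUILT));
    }
}
```

Both calls look the attribute up by namespace URI and local name, which is what a namespace-aware application should rely on; the prefix is only a local abbreviation.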
I read this question by accident and rather late unfortunately.
Just in case people are still struggling with this and are wondering what they can do.
As was pointed out, EXIficient behaves just fine with regard to namespace handling.
Having said that, the EXI specification allows one to preserve prefixes and namespaces (see Preserve Options).
In EXIficient one can set these options accordingly,
e.g.,
EXIFactory exiFactory = DefaultEXIFactory.newInstance();
exiFactory.getFidelityOptions().setFidelity(FidelityOptions.FEATURE_PREFIX, true);
I am a newbie when it comes to XML and DTD values, so forgive me if this is a simple question or if I am going about this in the wrong way. Can you specify a DTD value in the same way you can specify a path to a property in XML?
For instance, if you have the XML file below:
<!DOCTYPE ... SYSTEM "<path_to_file>">
<BOOK>
<AUTHOR>
<FIRST>John</FIRST>
<LAST>Quncy</LAST>
</AUTHOR>
<NAME>blah</NAME>
<DATE>12/23/13</DATE>
</BOOK>
You could specify the first name of the author by the path:
/BOOK/AUTHOR/FIRST
Is there any syntax to specify a DTD entity like the DOCTYPE in the same way?
Ultimately, what I would like to do is use an in-house XML parser already written in Java to find a DTD entry that I specify and delete it from the XML file. For instance, with the above XML, I would like to specify DOCTYPE and have it removed from the XML. There is already code in place that, given a path, will delete that section from the XML file. I would like to leverage that to delete DTD entries as well, but I have no idea how to reference them.
No. DOCTYPE is a parsing and validation directive. That is, DOCTYPE and DTD affect parsing and validation but are not part of the document as separate entities after that. The XML data model used by path expressions does not contain DOCTYPE or DTD definitions, and they practically don't exist after the document has been parsed.
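If the in-house parser builds a standard org.w3c.dom.Document (an assumption, since the question does not say), the DOCTYPE is not reachable through a path, but the DOM API exposes it through a dedicated accessor, Document.getDoctype(). A minimal sketch of removing it that way follows; the class name, the system identifier http://example.com/book.dtd and the resolver that substitutes an empty DTD are placeholders.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DoctypeDemo {
    static final String BOOK =
        "<!DOCTYPE BOOK SYSTEM \"http://example.com/book.dtd\">"
      + "<BOOK><AUTHOR><FIRST>John</FIRST></AUTHOR><NAME>blah</NAME></BOOK>";

    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Placeholder: resolve the external DTD to an empty one so nothing is fetched.
            builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Removes the DOCTYPE node if there is one; returns true when something was removed.
    static boolean stripDoctype(Document doc) {
        DocumentType doctype = doc.getDoctype();
        if (doctype == null) {
            return false;
        }
        doc.removeChild(doctype);
        return true;
    }

    public static void main(String[] args) {
        Document doc = parse(BOOK);
        System.out.println("doctype before: " + doc.getDoctype());
        stripDoctype(doc);
        System.out.println("doctype after:  " + doc.getDoctype());
    }
}
```

Individual declarations inside the DTD (elements, attribute lists, entities) are not editable nodes in this model, so removing a single DTD entry would still require working on the text of the DTD.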
I need to test XHTML code like <div>&nbsp;</div> using XmlUnit. The Diff constructor tells me this:
org.xml.sax.SAXParseException: The entity "nbsp" was referenced, but
not declared.
I know that the nbsp entity is not defined in XML, but the HTML code is not mine, so I cannot replace it with &#160; (that would be the obvious solution otherwise).
I don't want to modify the HTML code by adding <!DOCTYPE html [ <!ENTITY nbsp "&#160;"> ]>; I would prefer to leave the code unchanged.
Is there another way around this problem? I know there is an HTMLDocumentBuilder class in XmlUnit, but I wasn't able to find good documentation or examples.
You can use a DOCTYPE declaration that refers to the MathML DTD:
<!DOCTYPE math
PUBLIC "-//W3C//DTD MathML 3.0//EN"
"http://www.w3.org/Math/DTD/mathml3/mathml3.dtd">
or a local copy of the same.
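A variation on the DOCTYPE idea that leaves the HTML under test untouched is to prepend the declaration only inside the test harness, right before parsing. The sketch below is not taken from XmlUnit; it uses the plain JDK DOM parser, and the class name and the prolog constant are made up. It assumes the fragment has a single root element and no XML declaration or DOCTYPE of its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NbspEntityDemo {
    // Internal subset that declares nbsp as the no-break space character.
    // The DOCTYPE name only matters for validation, which is off here.
    static final String ENTITY_PROLOG = "<!DOCTYPE div [<!ENTITY nbsp \"&#160;\">]>";

    // Parses the fragment with the entity declaration prepended in memory.
    static Document parseWithNbsp(String fragment) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(ENTITY_PROLOG + fragment)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWithNbsp("<div>&nbsp;</div>");
        String text = doc.getDocumentElement().getTextContent();
        System.out.println("code point of the div text: " + (int) text.charAt(0));
    }
}
```

The resulting Document (or the prefixed string itself) can then be handed to whatever comparison code is in use.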
You can enable Feature "http://apache.org/xml/features/continue-after-fatal-error" to not throw an exception in case of unknown entities. This still gives a warning though:
documentBuilderFactory.setFeature(
"http://apache.org/xml/features/continue-after-fatal-error",
true);
Et voilà!
I have been asked to create a simple DTD for this XML:
<?xml version='1.0' encoding='ISO-8859-1'?>
<QUERY>
<PORT>
<NB></NB>
</PORT>
<BLOCK>
<TAB></TAB>
</BLOCK>
<STAND>
<LEVEL></LEVEL>
</STAND>
</QUERY>
I am using Java. I have never written a DTD before, nor do I know precisely what it means.
I would like some guidance if possible, thank you.
DTD stands for Document Type Definition, and is used to describe the structure of your XML document. Other representations include XML Schema, Relax NG, etc.:
http://en.wikipedia.org/wiki/Document_Type_Definition
It will look something like the following (although my syntax may not be quite right):
<!ELEMENT QUERY (PORT, BLOCK, STAND)>
<!ELEMENT PORT (NB)>
<!ELEMENT NB (#PCDATA)>
<!ELEMENT BLOCK (TAB)>
<!ELEMENT TAB (#PCDATA)>
<!ELEMENT STAND (LEVEL)>
<!ELEMENT LEVEL (#PCDATA)>
If you look at the definition for QUERY you see it defines that it contains the elements: "PORT", "BLOCK", and "STAND". If you look at the definition for NB, we have declared that it should contain text (parsed character data).
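Since the asker is using Java, one way to check a DTD like this is to put it into an internal subset and let a validating parser report problems. This is a sketch using the JDK's built-in parser; the class and method names are made up, and the XML declaration from the question is left out because the DOCTYPE has to come after it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class QueryDtdDemo {
    // The DTD from the answer above, wrapped in an internal subset.
    static final String DOCTYPE =
        "<!DOCTYPE QUERY [\n"
      + "<!ELEMENT QUERY (PORT, BLOCK, STAND)>\n"
      + "<!ELEMENT PORT (NB)>\n"
      + "<!ELEMENT NB (#PCDATA)>\n"
      + "<!ELEMENT BLOCK (TAB)>\n"
      + "<!ELEMENT TAB (#PCDATA)>\n"
      + "<!ELEMENT STAND (LEVEL)>\n"
      + "<!ELEMENT LEVEL (#PCDATA)>\n"
      + "]>\n";

    static final String GOOD =
        "<QUERY><PORT><NB></NB></PORT><BLOCK><TAB></TAB></BLOCK>"
      + "<STAND><LEVEL></LEVEL></STAND></QUERY>";

    // BLOCK and STAND are missing, so the content model of QUERY is violated.
    static final String BAD = "<QUERY><PORT><NB></NB></PORT></QUERY>";

    // Returns the validity errors reported for the given document body (empty = valid).
    static List<String> validate(String body) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("complete document:   " + validate(GOOD));
        System.out.println("incomplete document: " + validate(BAD));
    }
}
```

Without a custom ErrorHandler the validity errors are only printed, not thrown, which is why the sketch collects them in a list.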
XMLBeans comes with a tool called inst2xsd which can inspect an XML file and create an XSD for it that you can then edit/refine. I've used it with pretty good results.
Just read the installation guide for XMLBeans and when you install XMLBeans you'll have the inst2xsd tool installed as well.
edit - just realized you wanted a DTD and not an XSD, leaving this here in case an XSD (which is very similar in purpose) could actually solve your problem anyway
There are some DTD generators out there. Quick search yields this. Haven't used it, though.
The javadoc for the Document class has the following note under getElementById.
Note: Attributes with the name "ID" or "id" are not of type ID unless so defined
So, I read an XHTML doc into the DOM (using Xerces 2.9.1).
The doc has a plain old <p id='fribble'> in it.
I call getElementById("fribble"), and it returns null.
I use XPath to get "//*[@id='fribble']", and all is well.
So, the question is, what causes the DocumentBuilder to actually mark ID attributes as 'so defined?'
These attributes are special because of their type and not because of their name.
IDs in XML
Although it is easy to think of attributes as name="value" with the value being a simple string, that is not the full story -- there is also an attribute type associated with attributes.
This is easy to appreciate when there is an XML Schema involved, since XML Schema supports datatypes for both XML elements and XML attributes. The XML attributes are defined to be of a simple type (e.g. xs:string, xs:integer, xs:dateTime, xs:anyURI). The attributes being discussed here are defined with the xs:ID built-in datatype (see section 3.3.8 of the XML Schema Part 2: Datatypes).
<xs:element name="foo">
<xs:complexType>
...
<xs:attribute name="bar" type="xs:ID"/>
...
</xs:complexType>
</xs:element>
Although DTDs don't support the rich datatypes of XML Schema, they do support a limited set of attribute types (defined in section 3.3.1 of XML 1.0). The attributes being discussed here are defined with an attribute type of ID.
<!ATTLIST foo bar ID #IMPLIED>
With either the above XML Schema or DTD, the following element will be identified by the ID value of "xyz".
<foo bar="xyz"/>
Without knowing the XML Schema or DTD, there is no way to tell what is an ID and what is not:
Attributes with the name of "id" do not necessarily have an attribute type of ID; and
Attributes with names that are not "id" might have an attribute type of ID!
To improve this situation, xml:id was subsequently invented (see the xml:id W3C Recommendation). This is an attribute that always has the same prefix and name, and is intended to be treated as an attribute with attribute type of ID. However, whether it actually is treated that way depends on whether the parser being used is aware of xml:id. Since many parsers were written before xml:id was defined, it might not be supported.
IDs in Java
In Java, getElementById() finds elements by looking for attributes of type ID, not for attributes with the name of "id".
In the above example, getElementById("xyz") will return that foo element, even though the name of the attribute on it is not "id" (assuming the DOM knows that bar has an attribute type of ID).
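As a small illustration, the sketch below parses the foo element twice with the JDK's default DOM parser: once with an internal DTD subset that gives bar the attribute type ID, and once with no DTD at all. The class and method names are made up, and the behaviour shown is that of the JDK's bundled parser; other DOM implementations may differ.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DtdIdDemo {
    // The ATTLIST declaration from the text, placed in an internal subset.
    static final String WITH_DTD =
        "<!DOCTYPE foo [\n"
      + "<!ELEMENT foo EMPTY>\n"
      + "<!ATTLIST foo bar ID #IMPLIED>\n"
      + "]>\n"
      + "<foo bar=\"xyz\"/>";

    // Same element, but nothing tells the parser that bar is an ID.
    static final String WITHOUT_DTD = "<foo bar=\"xyz\"/>";

    // Parses the document and looks the id up; returns null when nothing matches.
    static Element lookup(String xml, String id) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementById(id);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with DTD:    " + lookup(WITH_DTD, "xyz"));
        System.out.println("without DTD: " + lookup(WITHOUT_DTD, "xyz"));
    }
}
```

Validation does not need to be switched on for this; the parser only has to read the attribute declaration.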
So how does the DOM know what attribute type an attribute has? There are three ways:
Provide an XML Schema to the parser (example)
Provide a DTD to the parser
Explicitly indicate to the DOM that it is treated as an attribute type of ID.
The third option is done using the setIdAttribute(), setIdAttributeNS() or setIdAttributeNode() methods of the org.w3c.dom.Element interface.
Document doc;
Element fooElem;
doc = ...; // load XML document instance
fooElem = ...; // locate the element node "foo" in doc
fooElem.setIdAttribute("bar", true); // without this, 'found' would be null
Element found = doc.getElementById("xyz");
This has to be done for each element node that has one of these attributes on it. There is no simple built-in method to make all occurrences of attributes with a given name (e.g. "id") be of attribute type ID.
This third approach is only useful in situations where the code calling the getElementById() is separate from that creating the DOM. If it was the same code, it already has found the element to set the ID attribute so it is unlikely to need to call getElementById().
Also, be aware that these methods were not in the original DOM specification: getElementById() was introduced in DOM Level 2, and the setIdAttribute() family in DOM Level 3.
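Since there is no built-in way to flag every attribute of a given name as an ID, a short recursive helper can do it. The following is a sketch, not a standard API: the names IdMarker and markIds are invented, and the helper does not check that the values are unique within the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class IdMarker {
    static final String XHTML = "<html><body><p id='fribble'>text</p></body></html>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Walks the subtree and flags every attribute called attrName as attribute type ID.
    static void markIds(Element element, String attrName) {
        // setIdAttribute throws NOT_FOUND_ERR when the attribute is absent, so check first.
        if (element.hasAttribute(attrName)) {
            element.setIdAttribute(attrName, true);
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                markIds((Element) child, attrName);
            }
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XHTML);
        System.out.println("before: " + doc.getElementById("fribble"));
        markIds(doc.getDocumentElement(), "id");
        System.out.println("after:  " + doc.getElementById("fribble"));
    }
}
```

This keeps the ID bookkeeping in the code that builds the DOM, so code that only receives the Document can call getElementById() as usual.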
IDs in XPath
The XPath in the original question gave a result because it was only matching the attribute name.
To match on attribute type ID values, the XPath id function needs to be used (it is one of the Node Set Functions from XPath 1.0):
id("xyz")
If that had been used, the XPath would have given the same result as getElementById() (i.e. no match found).
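The difference between matching on the attribute name and matching on the attribute type can be checked with the JDK's XPath implementation. The sketch below parses a small stand-in for the XHTML document from the question, without any DTD or schema; the class and method names are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathIdDemo {
    // No DTD or schema, so no attribute is known to be of type ID.
    static final String XHTML = "<html><body><p id='fribble'>text</p></body></html>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of nodes the expression selects, evaluated from the document node.
    static int count(Document doc, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XHTML);
        // Matches on the attribute name only.
        System.out.println("by name: " + count(doc, "//*[@id='fribble']"));
        // Matches on attribute type ID, which nothing has declared here.
        System.out.println("by type: " + count(doc, "id('fribble')"));
        System.out.println("DOM:     " + doc.getElementById("fribble"));
    }
}
```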
IDs in XML continued
Two important features of ID should be highlighted.
Firstly, the values of all attributes of attribute type ID must be unique to the whole XML document. In the following example, if personId and companyId both have attribute type of ID, it would be an error to add another company with companyId of id24601, because it will be a duplicate of an existing ID value. Even though the attribute names are different, it is the attribute type that matters.
<test1>
<person personId="id24600">...</person>
<person personId="id24601">...</person>
<company companyId="id12345">...</company>
<company companyId="id12346">...</company>
</test1>
Secondly, the attributes are defined on elements rather than on the entire XML document. So attributes with the same name on different elements might have different attribute types. In the following example XML document, if only alpha/@bar has an attribute type of ID (and no other attribute does), getElementById("xyz") will return an element, but getElementById("abc") will not (since beta/@bar is not of attribute type ID). Also, it is not an error for the attribute gamma/@bar to have the same value as alpha/@bar; that value is not considered in the uniqueness of IDs in the XML document because it is not of attribute type ID.
<test2>
<alpha bar="xyz"/>
<beta bar="abc"/>
<gamma bar="xyz"/>
</test2>
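The test2 example can be tried out by flagging only the bar attribute of alpha as an ID with setIdAttribute(), which stands in here for a DTD or schema that declares it. The class and method names are made up for the sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class PerElementIdDemo {
    static final String TEST2 =
        "<test2><alpha bar=\"xyz\"/><beta bar=\"abc\"/><gamma bar=\"xyz\"/></test2>";

    // Parses TEST2 and gives only alpha/@bar the attribute type ID.
    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(TEST2)));
            Element alpha = (Element) doc.getElementsByTagName("alpha").item(0);
            alpha.setIdAttribute("bar", true);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = build();
        // alpha is found; gamma carries the same value but its bar is not an ID.
        System.out.println("xyz -> " + doc.getElementById("xyz"));
        // beta/@bar is not an ID either, so this lookup finds nothing.
        System.out.println("abc -> " + doc.getElementById("abc"));
    }
}
```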
For the getElementById() call to work, the Document has to know the types of its nodes, and the target node must be of the XML ID type for the method to find it. It knows about the types of its elements via an associated schema. If the schema is not set, or does not declare the id attribute to be of the XML ID type, getElementById() will never find it.
My guess is that your document doesn't know the p element's id attribute is of the XML ID type (is it?). You can navigate to the node in the DOM using getChildNodes() and other DOM-traversal functions, and try calling Attr.isId() on the id attribute to tell for sure.
From the getElementById javadoc:
The DOM implementation is expected to
use the attribute Attr.isId to
determine if an attribute is of type
ID.
Note: Attributes with the name "ID" or
"id" are not of type ID unless so
defined.
If you are using a DocumentBuilder to parse your XML into a DOM, be sure to call setSchema(schema) on the DocumentBuilderFactory before calling newDocumentBuilder(), to ensure that the builder you get from the factory is aware of element types.
An ID attribute isn't an attribute whose name is "ID"; it's an attribute that is declared to be an ID attribute by a DTD or a schema. For example, the HTML 4 DTD declares it:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
The corresponding XPath expression would actually be id('fribble'), which should return the same result as getElementById. For this to work, the DTD or schema associated with your document has to declare the attribute as being of type ID.
If you are in control of the queried XML, you could also try renaming the attribute to xml:id as per http://www.w3.org/TR/xml-id/.
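The following will get an element by the name of its id attribute (id, Id, ID or iD), regardless of whether that attribute has the attribute type ID:
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Returns the first element whose id/Id/ID/iD attribute equals the given value, or null.
// The leading // makes the search cover the whole document that contains rootElement.
// An id containing a single quote breaks the expression, in which case null is returned.
public static Element getElementById(Element rootElement, String id)
{
try
{
String path = String.format("//*[@id = '%1$s' or @Id = '%1$s' or @ID = '%1$s' or @iD = '%1$s' ]", id);
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList)xPath.evaluate(path, rootElement, XPathConstants.NODESET);
return (Element) nodes.item(0);
}
catch (Exception e)
{
return null;
}
}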