XML parsing with java dom parser converts xml declaration into elements - java

Here is the xml file that I need to process as part of my assignment.
<?xml-stylesheet type="text/xsl" href="people.xsl"?>
<People>
<Person>
`...`
</Person>
`...`
</People>
I am using "javax.xml.parsers.DocumentBuilderFactory" to create a DOM parser. After parsing, the resulting document does not have People at the root; instead there is a root node whose children are xml-stylesheet and People.
Can this be avoided?

<?xml-stylesheet ... ?> is not an XML declaration. It is a Processing Instruction (PI), and the DOM spec says that a Document node may contain zero or more of them.
One approach would be to code your application to deal appropriately with (e.g. ignore) any PIs in the Document node. Alternatively, just use the Document node's documentElement attribute / getter to get the root Element directly.
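A minimal sketch of the second approach, using only the JDK's built-in parser. The class name RootElementDemo and the parse helper are made up for illustration, and the input is a cut-down version of the file from the question:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original question.
final class RootElementDemo {
    // Parses an XML string; checked parser exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml-stylesheet type=\"text/xsl\" href=\"people.xsl\"?>"
                + "<People><Person/></People>";
        Document doc = parse(xml);

        // The Document node has two children here: the PI and the root element.
        Node first = doc.getFirstChild();
        System.out.println(first.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE); // true

        // getDocumentElement() skips the PI and returns the root element directly.
        Element root = doc.getDocumentElement();
        System.out.println(root.getTagName()); // People
    }
}
```

The Document node itself is never an element, so code that walks doc.getChildNodes() has to expect PIs (and comments) next to the root element, while getDocumentElement() does not.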

Related

Java DOM API: processing instruction and doctype before XML prolog

I am working with the DOM API in Java and I have a problem: how can I add a processing instruction (XSLT stylesheet) and a DOCTYPE (document type) after the XML prolog,
each one on a new line, please?
e.g :
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FICHES SYSTEM "docform.dtd">
<?xml-stylesheet type="text/xsl" href="stylesheet.xsl"?>
You need to create the DOCTYPE when creating the document, see http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-102161490 which defines a method createDocumentType on the implementation to create a DOCTYPE and a method createDocument to create a document taking the DOCTYPE as one parameter.
Thus you need
DocumentType docType = implementation.createDocumentType("FICHES", null, "docform.dtd");
Document doc = implementation.createDocument(null, "FICHES", docType);
That way you now have a DOM document doc with a DOCTYPE node and a root element named FICHES, then you can create and insert the processing instruction:
doc.insertBefore(doc.createProcessingInstruction("xml-stylesheet", "type=\"text/xsl\" href=\"stylesheet.xsl\""), doc.getDocumentElement());
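Put together, the steps above might look like the following sketch. It assumes the DOMImplementation is obtained from a JDK DocumentBuilder (the snippet above does not say where implementation comes from), and the class name DoctypeAndPiDemo is made up:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class for illustration only.
final class DoctypeAndPiDemo {
    static Document build() {
        try {
            // Assumption: take the DOMImplementation from the default JDK parser.
            DOMImplementation implementation = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();

            // DOCTYPE with a system identifier only (public identifier is null).
            DocumentType docType = implementation.createDocumentType("FICHES", null, "docform.dtd");
            Document doc = implementation.createDocument(null, "FICHES", docType);

            // Insert the stylesheet PI between the DOCTYPE and the root element.
            Node pi = doc.createProcessingInstruction(
                    "xml-stylesheet", "type=\"text/xsl\" href=\"stylesheet.xsl\"");
            doc.insertBefore(pi, doc.getDocumentElement());
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM parser available", e);
        }
    }

    public static void main(String[] args) {
        // Print the top-level nodes in document order: doctype, PI, root element.
        NodeList children = build().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            System.out.println(children.item(i).getNodeType() + " " + children.item(i).getNodeName());
        }
    }
}
```

This only fixes the order of the nodes in the DOM tree. Whether each one ends up on its own line in the output file depends on the serializer you use to write the document.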

How to remove a specific xml attribute from org.w3c.dom.Document

I have this XML:
<Body xmlns:wsu="http://mynamespace">
<Ticket xmlns="http://othernamespace">
<Customer xmlns="">Robert</Customer>
<Products xmlns="">
<Product>a product</Product>
</Products>
</Ticket>
<Delivered xmlns="" />
<Payment xmlns="">cash</Payment>
</Body>
I am using Java to read it as a DOM document. I want to remove the empty namespace attributes (i.e., xmlns=""). Is there any way to do that?
You need to understand that xmlns is a very special attribute. Basically, the xmlns="" is so that your Customer element is in the "unnamed" namespace, rather than the http://othernamespace namespace (and likewise for other elements which would otherwise inherit a default namespace from their ancestors).
If you want to get rid of the xmlns="", you basically need to put the elements into the appropriate namespace, which means changing the element name. Older versions of the W3C DOM API don't let you change the name of an element, so there you need to create a new element with the appropriate namespaced name and copy the content; DOM Level 3 adds Document.renameNode, which can do the rename for you. Or if you're responsible for creating the document to start with, just use the right namespace.
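As a rough sketch of the "put the elements into the appropriate namespace" idea, the following uses the DOM Level 3 method Document.renameNode on a namespace-aware JDK parser. The class and method names (MoveIntoNamespaceDemo, moveIntoNamespace) are invented for this example, and it assumes you really do want the children to live in http://othernamespace, which changes what those elements are called:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class for illustration only.
final class MoveIntoNamespaceDemo {
    // Namespace-aware parse; checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Recursively moves every no-namespace element below 'parent' into namespace 'ns'.
    static void moveIntoNamespace(Document doc, Node parent, String ns) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before the node may be replaced
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                Node current = child;
                if (child.getNamespaceURI() == null) {
                    // Drop the explicit xmlns="" declaration, then rename into 'ns'.
                    ((Element) child).removeAttribute("xmlns");
                    current = doc.renameNode(child, ns, child.getLocalName());
                }
                moveIntoNamespace(doc, current, ns);
            }
            child = next;
        }
    }

    public static void main(String[] args) {
        String ns = "http://othernamespace";
        Document doc = parse("<Ticket xmlns=\"http://othernamespace\">"
                + "<Customer xmlns=\"\">Robert</Customer>"
                + "<Products xmlns=\"\"><Product>a product</Product></Products>"
                + "</Ticket>");
        moveIntoNamespace(doc, doc.getDocumentElement(), ns);
        Element customer = (Element) doc.getDocumentElement().getFirstChild();
        System.out.println(customer.getLocalName() + " is now in " + customer.getNamespaceURI());
    }
}
```

renameNode may either rename the node in place or replace it with a new node, which is why the sketch continues with the returned node rather than the original one.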

Parsing data inside CDATA element

I need to parse an XML file that looks like this:
1.<?xml version="1.0" encoding="UTF-8"?>
2.<Root>
3.<Record>
4.<in><![CDATA[<?xml version="1.0" encoding="UTF-8"?><XML><Attribute AttrID="A">Test</Attribute>-<Attribute AttrID="B"> <![CDATA[Aap Noot Mies]]> </Attribute>]]></XML></in>
5.<out><![CDATA[]]></out>
6.</Record>
7.</Root>
I am getting an error while parsing line number 4. Is there any way to escape a CDATA end token ( ]]> ) within a CDATA section in an XML document?
Your input is not well formed; there are several errors. I think you need to fix whatever generated that so it generates something more like
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<Record>
<in><![CDATA[<?xml version="1.0" encoding="UTF-8"?><!-- - --><XML><Attribute AttrID="A">Test</Attribute>-<Attribute AttrID="B"> <![CDATA[Aap Noot Mies]]]]><![CDATA[> </Attribute></XML>]]></in>
<out><![CDATA[]]></out>
</Record>
</Root>
Note that the outer CDATA needs <![CDATA[, not <!CDATA[, and the first use of ]]> needs to be quoted (for example by stopping and starting the outer CDATA section, as here). The outer ]]> needs to be moved after the </XML> so that the end as well as the start of the element is quoted.
That makes the file technically well formed, although element names of XML (or in general names starting with xml in upper or lower case) are reserved by the W3C for use in XML-related specifications and should not be used in user XML files unless it is a specific element or attribute (such as xmlns) defined by the W3C.
In addition, I added a (quoted) comment around the dash after the XML declaration: if that CDATA section were extracted and made into an XML document, the dash would make the resulting document non-well-formed, as only white space, comments and PIs are allowed before the first element.
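If the generator is under your control, the stop-and-restart trick can be applied mechanically. Here is a small sketch (the class CdataEscapeDemo and its methods are hypothetical names) that splits every embedded ]]> across two CDATA sections and checks that a JDK DOM parser reads the original text back:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class for illustration only.
final class CdataEscapeDemo {
    // Wraps text in a CDATA section. Each embedded "]]>" is split so that "]]"
    // ends one section and ">" starts the next one.
    static String toCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Embeds the text in an <in> element, parses it, and returns the text the parser sees.
    static String roundTrip(String text) {
        try {
            String xml = "<in>" + toCdata(text) + "</in>";
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse generated XML", e);
        }
    }

    public static void main(String[] args) {
        String inner = "<XML><Attribute AttrID=\"B\"> <![CDATA[Aap Noot Mies]]> </Attribute></XML>";
        System.out.println(toCdata(inner));
        // The parser hands back exactly the text that went in.
        System.out.println(inner.equals(roundTrip(inner))); // true
    }
}
```

The adjacent CDATA sections are just character data to the parser, so the element's text content is the concatenation of both pieces.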

How to set namespace only on first tag with XOM?

I am using XOM to build XML documents in Java.
I have created a simple XML document, and I want it to have an XML namespace. But when I set the namespace on the first tag, an empty namespace declaration like xmlns="" is set on the children. How can I get rid of this behaviour? I only want xmlns on the first tag.
I want this XML:
<request xmlns="http://my-namespace">
<type>Test</type>
<data>
<myData>test data</myData>
</data>
</request>
But this is the XML document output from XOM
<request xmlns="http://my-namespace">
<type xmlns="">Test</type>
<data xmlns="">
<myData>test data</myData>
</data>
</request>
This is my Java XOM code:
String namespace = "http://my-namespace";
Element request = new Element("request", namespace);
Element type = new Element("type");
type.appendChild("Test");
request.appendChild(type);
Element data = new Element("data");
request.appendChild(data);
Element myData = new Element("myData");
myData.appendChild("test data");
data.appendChild(myData);
Document doc = new Document(request);
doc.toXML();
This works for me. I'm a bit puzzled as to why the Element objects don't inherit the namespace of their parents, though. (I'm neither an XML nor a XOM expert.)
Code:
String namespace = "http://my-namespace";
Element request = new Element("request", namespace);
Element type = new Element("type", namespace);
type.appendChild("Test");
request.appendChild(type);
Element data = new Element("data", namespace);
request.appendChild(data);
Element myData = new Element("myData", namespace);
myData.appendChild("test data");
data.appendChild(myData);
Document doc = new Document(request);
System.out.println(doc.toXML());
Output:
<?xml version="1.0"?>
<request xmlns="http://my-namespace">
<type>Test</type>
<data>
<myData>test data</myData>
</data>
</request>
I ran into the same problem, and Google led me here.
@Michael - That's what it says in the javadoc, yes, but unfortunately, that's not how it works when you implement it. The child elements will continue to get blank xmlns attributes unless you use Catchwa's implementation.
Catchwa's implementation works just fine. Only the element I tell it to have a namespace, has a namespace. All empty xmlns attributes are gone. It's strange.
Is it a bug? I can't seem to figure that part out. Or is it just the way XOM works?
Don't confuse namespaces and namespace declarations. The namespace is an intrinsic property of each element. The namespace declaration is the `xmlns' attribute. They are not the same thing, although they are connected. When you create an element, you set its namespace, not its namespace declaration.
In the XOM data model namespaces are not attributes. They are an intrinsic property of the element itself. There is no rule in XML that requires children of an element to be in the same namespace as the parent. Indeed theoretically every element in the document could be in a different namespace.
In XOM you specify the namespace of an element or attribute at the same time you specify the local name. When you create an element, the element initially has no parent so there's no way for XOM to default to giving the element the same namespace as its parent, even if that's what was wanted (and it's not).
When the document is serialized the namespaces are represented by xmlns and xmlns:*prefix* attributes. XOM figures out where to put these attributes to match the namespaces you've assigned to each element. Just specify the namespace you want for each element in your code, and let XOM figure out where to put the namespace declarations.
In XOM you can add a namespace declaration to the root element.
Here's a short example with three different namespaces:
final String NS_XLINK = "http://www.w3.org/1999/xlink";
final String NS_OTHER = "http://other.com";
Element root = new Element("root", "http://root.com");
root.addNamespaceDeclaration("xlink", NS_XLINK);
root.addNamespaceDeclaration("other", NS_OTHER);
root.addAttribute(new Attribute("xlink:href", NS_XLINK, "http://somewhere.com"));
root.appendChild(new Element("other:alien", NS_OTHER));
Document doc = new Document(root);
System.out.println(doc.toXML());
which produces this result (with additional line breaks inserted for readability):
<?xml version="1.0"?>
<root
xmlns="http://root.com"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:other="http://other.com"
xlink:href="http://somewhere.com">
<other:alien />
</root>

The markup must be well-formed

First off, let me say I am new to SAX and Java.
I am trying to read information from an XML file that is not well formed.
When I try to use the SAX or DOM Parser I get the following error in response:
The markup in the document following the root element must be well-formed.
This is how I set up my XML file:
<format type="filename" t="13241">0;W650;004;AG-Erzgeb</format>
<format type="driver" t="123412">001;023</format>
...
Can I force the SAX or DOM parser to parse XML files even if they are not well-formed XML?
Thank you for your help. Much appreciated.
Haythem
Your best bet is to make the XML well-formed, probably by pre-processing it a bit. In this case, you can achieve that simply by putting an XML declaration on (and even that's optional) and providing a root element (which is not optional), like this:
<?xml version="1.0"?>
<wrapper>
<format type="filename" t="13241">0;W650;004;AG-Erzgeb</format>
<format type="driver" t="123412">001;023</format>
</wrapper>
There I've arbitrarily picked the name "wrapper" for the root element; it can be whatever you like.
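As a sketch of that pre-processing step in Java: the class name WrapFragmentsDemo and the method parseFragments are made up, and it assumes the input has no XML declaration or DOCTYPE of its own (those would no longer be legal once the content is wrapped) and is small enough to hold in a String.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class for illustration only.
final class WrapFragmentsDemo {
    // Wraps a sequence of sibling elements in an arbitrary root element and parses the result.
    static Document parseFragments(String fragments) {
        String wrapped = "<wrapper>" + fragments + "</wrapper>";
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is still not well-formed after wrapping", e);
        }
    }

    public static void main(String[] args) {
        String input = "<format type=\"filename\" t=\"13241\">0;W650;004;AG-Erzgeb</format>\n"
                + "<format type=\"driver\" t=\"123412\">001;023</format>\n";
        Document doc = parseFragments(input);

        // Each original top-level element is now a child of <wrapper>.
        NodeList formats = doc.getElementsByTagName("format");
        for (int i = 0; i < formats.getLength(); i++) {
            Element format = (Element) formats.item(i);
            System.out.println(format.getAttribute("type") + " = " + format.getTextContent());
        }
    }
}
```

For large files the same idea works on streams instead of strings, but wrapping only fixes the missing root element; any other well-formedness error in the input will still stop the parser.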
Hint: using SAX or StAX you can successfully parse a document that is not well formed up to the point where the FIRST well-formedness error is encountered.
(I know that this is not of too much help...)
Because the DOM parser scans your XML file and then builds a tree, it needs a single root node for that tree, like the wrapper element in the first answer. If the parser can't find a root element, it can't even build the tree. So it's better to pre-process the XML file before parsing it with DOM or SAX.
