Setting the order of attribute when writing XML Element [duplicate] - java

This question already has answers here:
Order of XML attributes after DOM processing
(12 answers)
Closed 9 years ago.
When I write the following in Java:
Element fieldEl = targetDocument.createElement("field");
fieldEl.setAttribute("Wine","Marlo");
fieldEl.setAttribute("Beer","Corona");
The order in which the attributes were added is not kept in the resulting XML file.
How can I control the order of the attributes inside an XML element (so that it is easy for a human to read)?

There is no defined order for attribute nodes according to the DOM standard:
Objects implementing the NamedNodeMap interface are used to represent collections of nodes that can be accessed by name. Note that NamedNodeMap does not inherit from NodeList; NamedNodeMaps are not maintained in any particular order. Objects contained in an object implementing NamedNodeMap may also be accessed by an ordinal index, but this is simply to allow convenient enumeration of the contents of a NamedNodeMap, and does not imply that the DOM specifies an order to these Nodes.
(emphasis added), nor is there one in the XML standard:
Note that the order of attribute specifications in a start-tag or empty-element tag is not significant.
I don't think that many DOM implementations support ordering of attributes at all. You'd have to write your own serialization mechanism in order to achieve ordering (no pun intended).
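A hand-rolled serializer along those lines could look roughly like the sketch below. It is only an illustration, not a complete XML writer: the class name OrderedStartTag, the helper sampleField and the preferredOrder parameter are made up for this example, and it handles only an empty element with plain (non-namespaced) attributes.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

// Hypothetical helper: writes one element as an empty-element tag,
// with its attributes in an order chosen by the caller.
class OrderedStartTag {

    static String write(Element el, List<String> preferredOrder) {
        // Start with the preferred names that are actually present on the element.
        List<String> names = new ArrayList<>();
        for (String name : preferredOrder) {
            if (el.hasAttribute(name) && !names.contains(name)) {
                names.add(name);
            }
        }
        // Append the remaining attributes in whatever order the DOM reports them.
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            String name = ((Attr) attrs.item(i)).getName();
            if (!names.contains(name)) {
                names.add(name);
            }
        }
        StringBuilder sb = new StringBuilder("<").append(el.getTagName());
        for (String name : names) {
            sb.append(' ').append(name).append("=\"")
              .append(escape(el.getAttribute(name))).append('"');
        }
        return sb.append("/>").toString();
    }

    // Escapes the characters that are unsafe inside a double-quoted attribute value.
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Builds the element from the question; wraps the checked parser exception.
    static Element sampleField() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element fieldEl = doc.createElement("field");
            fieldEl.setAttribute("Wine", "Marlo");
            fieldEl.setAttribute("Beer", "Corona");
            return fieldEl;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling OrderedStartTag.write(OrderedStartTag.sampleField(), Arrays.asList("Wine", "Beer")) puts Wine before Beer regardless of how the DOM stores them. A real serializer would also have to deal with child nodes, namespaces and character encoding.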


Howto parse xml-tags with prefixes using jersey and jaxb [duplicate]

I saw the following line in an XML file:
xmlns:android="http://schemas.android.com/apk/res/android"
I have also seen xmlns in many other XML files that I've come across.
What is it?
It means XML namespace.
Basically, every element (or attribute) in XML belongs to a namespace, a way of "qualifying" the name of the element.
Imagine you and I both invent our own XML. You invent XML to describe people, I invent mine to describe cities. Both of us include an element called name. Yours refers to the person’s name, and mine to the city name—OK, it’s a little bit contrived.
<person>
<name>Rob</name>
<age>37</age>
<homecity>
<name>London</name>
<lat>123.000</lat>
<long>0.00</long>
</homecity>
</person>
If our two XMLs were combined into a single document, how would we tell the two names apart? As you can see above, there are two name elements, but they both have different meanings.
The answer is that you and I would both assign a namespace to our XML, which we would make unique:
<personxml:person xmlns:personxml="http://www.your.example.com/xml/person"
xmlns:cityxml="http://www.my.example.com/xml/cities">
<personxml:name>Rob</personxml:name>
<personxml:age>37</personxml:age>
<cityxml:homecity>
<cityxml:name>London</cityxml:name>
<cityxml:lat>123.000</cityxml:lat>
<cityxml:long>0.00</cityxml:long>
</cityxml:homecity>
</personxml:person>
Now we’ve fully qualified our XML, there is no ambiguity as to what each name element means. All of the tags that start with personxml: are tags belonging to your XML, all the ones that start with cityxml: are mine.
There are a few points to note:
If you exclude any namespace declarations, things are considered to be in the default namespace.
If you declare a namespace without the identifier, that is, xmlns="http://somenamespace", rather than xmlns:rob="somenamespace", it specifies the default namespace for the document.
The actual namespace itself, often an IRI, is of no real consequence. It should be unique, so people tend to choose an IRI/URI that they own, but it has no greater meaning than that. Sometimes people will place the schema (definition) for the XML at the specified IRI, but that is a convention of some people only.
The prefix is of no consequence either. The only thing that matters is what namespace the prefix is defined as. Several tags beginning with different prefixes, all of which map to the same namespace are considered to be the same.
For instance, if the prefixes personxml and mycityxml both mapped to the same namespace (as in the snippet below), then it wouldn't matter whether you prefixed a given element with personxml or mycityxml; they'd both be treated as the same thing by an XML parser. The point is that an XML parser doesn't care what you've chosen as the prefix, only which namespace it maps to. The prefix is just an indirection pointing to the namespace.
<personxml:person
xmlns:personxml="http://example.com/same/url"
xmlns:mycityxml="http://example.com/same/url" />
Attributes can be qualified but are generally not. They also do not inherit their namespace from the element they are on, as opposed to elements (see below).
Also, element namespaces are inherited from the parent element. In other words I could equally have written the above XML as
<person xmlns="http://www.your.example.com/xml/person">
<name>Rob</name>
<age>37</age>
<homecity xmlns="http://www.my.example.com/xml/cities">
<name>London</name>
<lat>123.000</lat>
<long>0.00</long>
</homecity>
</person>
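As a rough illustration of the points above, the following sketch parses a document with the JDK's DOM parser and prints the expanded name (namespace URI plus local name) of the root element. The class name NamespaceDemo and its two methods are made up for this example. Note that namespace awareness has to be switched on explicitly on DocumentBuilderFactory.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper for inspecting how a namespace-aware parser sees an element.
class NamespaceDemo {

    // Parses the XML with namespace awareness and returns the document element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns "{namespaceURI}localName"; the prefix plays no part in it.
    static String expandedName(Element el) {
        return "{" + el.getNamespaceURI() + "}" + el.getLocalName();
    }
}
```

With this, <personxml:person xmlns:personxml="http://example.com/same/url"/>, the same element written with the mycityxml prefix, and <person xmlns="http://example.com/same/url"/> all yield the same expanded name, which is the sense in which the prefix is "of no consequence".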
It defines an XML Namespace.
In your example, the Namespace Prefix is "android" and the Namespace URI is "http://schemas.android.com/apk/res/android"
In the document, you see elements like: <android:foo />
Think of the namespace prefix as a variable with a short name alias for the full namespace URI. It is the equivalent of writing <http://schemas.android.com/apk/res/android:foo /> with regards to what it "means" when an XML parser reads the document.
NOTE: You cannot actually use the full namespace URI in place of the namespace prefix in an XML instance document.
Check out this tutorial on namespaces: http://www.sitepoint.com/xml-namespaces-explained/
I think the biggest source of confusion is that an XML namespace points to some kind of URL that doesn't hold any information. But the truth is that the person who invented the namespace below:
xmlns:android="http://schemas.android.com/apk/res/android"
could just as well have written it like this:
xmlns:android="asjkl;fhgaslifujhaslkfjhliuqwhrqwjlrknqwljk.rho;il"
This is just a unique identifier. By convention, though, you use a URL that is unique and could potentially point to a specification of the tags/attributes used in that namespace. It's not required, though.
Why should it be unique? Because the whole purpose of a namespace is to be unique, so that, for example, an attribute called background from your namespace can be distinguished from a background attribute from another namespace.
Because of that uniqueness, you do not need to worry about name collisions when you create a custom attribute.
xmlns - xml namespace. It's just a method to avoid element name conflicts. For example:
<config xmlns:rnc="URI1" xmlns:bsc="URI2">
<rnc:node>
<rnc:rncId>5</rnc:rncId>
</rnc:node>
<bsc:node>
<bsc:cId>5</bsc:cId>
</bsc:node>
</config>
Two different node elements in one XML file. Without namespaces, there would be no way to tell the two kinds of node element apart by name.
You have name spaces so you can have globally unique elements. However, 99% of the time this doesn't really matter, but when you put it in the perspective of The Semantic Web, it starts to become important.
For example, you could make an XML mash-up of different schemes just by using the appropriate xmlns. For example, mash up friend of a friend with vCard, etc.

XPath to check the namespace a prefix is bound to

Say I have the following XML file:
<a xmlns:foo="http://foo"></a>
I need to check whether the prefix foo is bound to http://foo or not. Here, "not bound" means either that the prefix does not exist at all or that it is bound to some other namespace URI.
I already have a library that takes a Document object and an XPath expression and returns a (possibly empty) List of Nodes that exist at that XPath.
So what would be an expression that would check for the presence of a prefix foo in the top-most element (document element) bound to the namespace http://foo and that would yield one node for the above XML and zero nodes for the following XMLs:
<a xmlns:fooX="http://foo"></a>
and
<a xmlns:foo="http://fooX"></a>
I tried, as a first step, to just get the value of that attribute using:
/*[@*[local-name()='foo']]
... but it seems that prefix-binding attributes are handled differently from "normal" attributes.
If you want to do it with XPath then you have to use the namespace axis: /*[namespace::foo[. = 'http://foo']]. DOM Level 3 might provide different ways treating the namespace declarations as attributes and resolving prefixes, see http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-lookupNamespaceURI.
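For the DOM Level 3 route, a minimal sketch could look like this. It does not use the XPath expression above; it calls Node.lookupNamespaceURI on the document element instead. The class name PrefixCheck and the method isBound are made up for this example, and the document is assumed to be parsed with namespace awareness enabled.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: checks a prefix binding on the document element via DOM Level 3.
class PrefixCheck {

    // True only if `prefix` is declared on the document element and bound to exactly `uri`.
    static boolean isBound(String xml, String prefix, String uri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so xmlns:* declarations are recognised
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // lookupNamespaceURI returns null when the prefix is not in scope.
            return uri.equals(doc.getDocumentElement().lookupNamespaceURI(prefix));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the three documents in the question, isBound(xml, "foo", "http://foo") is true for the first and false for the other two.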

Parsing XML for deeply nested data

I have an XML file that is structured something like this:
<element1>
<element2>
<element3>
<elementIAmInterestedIn attribute="data">
<element4>
<element5>
<element6>
<otherElementIAmInterestedIn>
<data1>text1</data1>
<data2>text2</data2>
<data3>text3</data3>
</otherElementIAmInterestedIn>
</element6>
</element5>
</element4>
</elementIAmInterestedIn>
<elementIAmInterestedIn attribute="data">
<element4>
<element5>
<element6>
<otherElementIAmInterestedIn>
<data1>text1</data1>
<data2>text2</data2>
<data3>text3</data3>
</otherElementIAmInterestedIn>
</element6>
</element5>
</element4>
</elementIAmInterestedIn>
<elementIAmInterestedIn attribute="data">
<element4>
<element5>
<element6>
<otherElementIAmInterestedIn>
<data1>text1</data1>
<data2>text2</data2>
<data3>text3</data3>
</otherElementIAmInterestedIn>
</element6>
</element5>
</element4>
</elementIAmInterestedIn>
</element3>
</element2>
</element1>
As you can see, I am interested in two elements, the first of which is deeply nested within the root element, and the second of which is deeply nested within that first element. There are multiple (sibling) elementIAmInterestedIn and otherElementIAmInterestedIn elements in the document.
I want to parse this XML file with Java and put the data from all the elementIAmInterestedIn and otherElementIAmInterestedIn elements into either a data structure or Java objects - it doesn't matter much to me as long as it is organized and I can access it later.
I'm able to write a recursive DOM parser method that does a depth-first traversal of the XML so that it touches every element. I also wrote a Java class with JAXB annotations that represents elementIAmInterestedIn. Then, in the recursive method, I can check when I get to an elementIAmInterestedIn and unmarshal it into an instance of the JAXB class. This works fine except that such an object should also contain multiple otherElementIAmInterestedIn.
This is where I'm stuck. How can I get the data out of otherElementIAmInterestedIn and assign it to the JAXB object? I've seen the @XmlWrapper annotation, but this seems to only work for one layer of nesting. Also, I cannot use @XmlPath.
Maybe I should scratch that idea and use a whole new approach. I'm really just getting started with XML parsing so perhaps I'm overlooking a more obvious solution. How would you parse an XML document structured like this and store the data in an organized way?
Maybe you should use a SAX parser instead of DOM. With DOM you load the whole document into memory, and in your case you only want to read two kinds of element, which is quite inefficient.
With a SAX parser you can pick out only the nodes you are interested in. Here is pseudocode for your task using the SAX parsing model:
1) Keep reading nodes until you get an <elementIAmInterestedIn> node
2) Store that element's data in your class
3) Keep on reading until you get an <otherElementIAmInterestedIn> node
4) Store that data too and save the object.
Loop from 1 to 4 until you reach the end of the document.
If you try this approach, I suggest first reading this document to understand how a SAX parser works, since it's very different from the DOM approach: How to Use SAX
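The steps above could be sketched with the JDK's SAX API roughly as follows. The class InterestingHandler, its nested Record class and the parse helper are made up for this example; the element and attribute names come from the question, and the child elements of otherElementIAmInterestedIn are assumed to contain only text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects one Record per elementIAmInterestedIn.
class InterestingHandler extends DefaultHandler {

    static class Record {
        String attribute;                                            // value of attribute="..."
        final List<Map<String, String>> others = new ArrayList<>();  // one map per nested block
    }

    final List<Record> records = new ArrayList<>();
    private Record current;                    // record being built, or null
    private Map<String, String> currentOther;  // nested block being built, or null
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("elementIAmInterestedIn")) {
            current = new Record();
            current.attribute = atts.getValue("attribute");
        } else if (qName.equals("otherElementIAmInterestedIn") && current != null) {
            currentOther = new LinkedHashMap<>();
        }
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("otherElementIAmInterestedIn") && currentOther != null) {
            current.others.add(currentOther);
            currentOther = null;
        } else if (qName.equals("elementIAmInterestedIn") && current != null) {
            records.add(current);
            current = null;
        } else if (currentOther != null) {
            // A child such as <data1>text1</data1> inside the nested block.
            currentOther.put(qName, text.toString().trim());
        }
        text.setLength(0);
    }

    // Convenience wrapper that parses a string and returns the collected records.
    static List<Record> parse(String xml) {
        try {
            InterestingHandler handler = new InterestingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.records;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the handler only tracks which of the two elements it is currently inside, the depth of the intermediate elements (element4, element5, element6) does not matter.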

Define the order of attributes in dom

I'm currently working with the DOM and I wonder how I can change the position of a tag's attributes,
for example
I have created element:
propElement = document.createElement("prop");
The prop is opening the tag.
Then
propElement.setAttribute("name", "name1");
propElement.setAttribute("name2", "name2");
The problem is that even though I call setAttribute for name2 after name1, name2 appears before name1 in the tag.
How can I change the order?
(Note; I'm using a Java DOM API, not JavaScript.)
You can't, the order of attributes on elements is not significant. In fact, in a live DOM, there is no order. Order only seems to exist in relation to the serialized form of a DOM (e.g., HTML markup and the like). And even then, the order doesn't have any meaning except in relation to invalid text (more below).
Attributes are basically simple properties of an object (the DOM element to which they're attached). There is absolutely no order to them, and in fact the representation of them in the DOM is a NamedNodeMap which is "...not maintained in any particular order."
It's important to remember that the DOM describes an object model. The serialized form of a DOM may be textual (for instance, an HTML document defining a DOM), but the DOM is not. In an HTML document, since it's linear text (top-to-bottom, left-to-right), naturally the text defining one attribute has to precede the text describing another, but that does not imply any kind of order to the attributes in the resulting DOM object, because they have no order at all. So this:
<div a="1" b="2">...</div>
describes exactly the same element as this:
<div b="2" a="1">...</div>
The resulting element is a div which has an attribute a with the value 1 and an attribute b with the value 2.
This is exactly the same as setting properties on an object in program source. Consider some hypothetical obj with a and b properties. This code:
obj.a = 1;
obj.b = 2;
...results in exactly the same object as this code:
obj.b = 2;
obj.a = 1;
...provided a and b really are simple fields (not hidden function calls that may have side effects), which is true of attributes in the DOM.
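Since the question is about the Java DOM API, the same point can be checked there with a small sketch. It parses the two div examples from above as XML and compares them with Node.isEqualNode, which the DOM Level 3 specification defines without regard to attribute order. The class name AttributeOrderDemo and its methods are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: compares two serialized elements structurally.
class AttributeOrderDemo {

    // Parses the XML and returns its document element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True if both strings describe equal elements: same name, attributes and children.
    static boolean sameElement(String xmlA, String xmlB) {
        return parseRoot(xmlA).isEqualNode(parseRoot(xmlB));
    }
}
```

Here sameElement("<div a=\"1\" b=\"2\">...</div>", "<div b=\"2\" a=\"1\">...</div>") reports the two elements as equal, while changing one of the attribute values makes them differ.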
There is one small way in which attribute order in the textual (serialized) form of a DOM may be significant, and it's only related to invalid text: If the same attribute is specified more than once, only the first value given is used, because it's invalid to specify the same attribute more than once. The values are not combined, and the subsequent value doesn't overwrite the previous one. The first one, only, is used.
So this invalid HTML:
<div class="foo" class="bar">...</div>
...actually results in a div with class "foo" ("bar" is not present at all). But this is just a coping mechanism for dealing with invalid serialized forms.

Java XML Library that keeps attributes in order?

Is there any java xml library that can create xml tags with attributes in a defined order?
I have to create large xml files with 5 to 20 attributes per tag and I want the attributes ordered for better readability. Attributes which are always created should be at the beginning, followed by common attributes and rarely used attributes should be at the end.
Here's an easy way to create a custom outputter based on jdom:
JDOM uses XMLOutputter to convert a DOM into text output. Attributes are printed with the method:
protected void printAttributes(Writer out, List attributes, Element parent,
NamespaceStack namespaces) throws IOException
As you see, the attributes are passed with a simple list. The (simple) idea would be to subclass this outputter and override the method:
@Override
protected void printAttributes(Writer out, List attributes, Element parent,
        NamespaceStack namespaces) throws IOException {
    // Sort by name before delegating; ByNameComparator is a Comparator you supply.
    Collections.sort(attributes, new ByNameComparator());
    super.printAttributes(out, attributes, parent, namespaces);
}
The implementation of a comparator shouldn't be that difficult.
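One possible shape for such a comparator, as a sketch only: it orders plain attribute names so that it stays independent of JDOM, and it puts a caller-supplied list of names first (for the "always created" attributes from the question), followed by everything else alphabetically. The class AttributeNameOrder and the method byPriority are made up for this example; to use it in the outputter above you would still need to adapt it to compare JDOM Attribute objects by their names.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: orders attribute names by an explicit priority list.
class AttributeNameOrder {

    // Names listed in `priority` sort first, in list order;
    // all other names follow in alphabetical order.
    static Comparator<String> byPriority(List<String> priority) {
        Comparator<String> rank = Comparator.comparingInt(name -> {
            int index = priority.indexOf(name);
            return index < 0 ? Integer.MAX_VALUE : index; // unlisted names go last
        });
        return rank.thenComparing(Comparator.<String>naturalOrder());
    }
}
```

For example, sorting the names zeta, id, alpha, name with byPriority(Arrays.asList("id", "name")) gives id, name, alpha, zeta.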
I'm not aware of any such library.
The problem is that the XML information model explicitly states that the order of the attributes of a tag is irrelevant. Therefore, any in-memory XML representation that goes to the effort of preserving the ordering will use more memory than is necessary and/or implement the attribute set/get methods suboptimally.
My advice would be to implement the ordering of attributes in the presentation of your XML content; e.g. in your XML editor.
I should point out that my Answer answers the Question as written - about "keeping the attributes in order". This could mean:
preserving the order of insertion, or
keeping the attributes sorted according to some ordering rule; e.g. ascending order.
From your comment below, it appears that you really just want to output the attributes in sorted order. This is a simpler problem, and can be addressed by customizing the code that serializes a DOM. Andreas_D explains how you can do this with JDOM, and there may be other options.
The other point is that something is a bit broken if the order of attributes in the input to a tool makes a significant difference to how it behaves. Because the order of XML attributes is supposed to be non-significant ...
