XPath to check the namespace a prefix is bound to - java

Say I have the following XML file:
<a xmlns:foo="http://foo"></a>
I need to check whether the prefix foo is bound to http://foo or not. Here, "not bound" means either that the prefix does not exist at all or that it is bound to some other namespace URI.
I already have a library that takes a Document object and an XPath expression and returns a (possibly empty) List of Nodes that exist at that XPath.
So what would be an expression that would check for the presence of a prefix foo in the top-most element (document element) bound to the namespace http://foo and that would yield one node for the above XML and zero nodes for the following XMLs:
<a xmlns:fooX="http://foo"></a>
and
<a xmlns:foo="http://fooX"></a>
I tried, as a first step, to just get the value of that attribute using:
/*[@*[local-name()='foo']]
... but it seems that prefix-binding attributes are handled differently from "normal" attributes.

If you want to do it with XPath, you have to use the namespace axis: /*[namespace::foo[. = 'http://foo']]. DOM Level 3 may offer alternatives, since it treats namespace declarations as attributes and can resolve prefixes; see http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-lookupNamespaceURI.
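Both routes mentioned above can be tried with the JDK alone. The following is only a sketch: the class and method names (PrefixBindingCheck, countMatches, isBound) are made up for illustration, and it assumes the JDK's built-in XPath 1.0 engine, which supports the namespace axis, plus a namespace-aware parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class PrefixBindingCheck {

    // Parses an XML string into a namespace-aware DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // XPath route: number of nodes selected by the namespace-axis expression.
    static int countMatches(String xml) {
        try {
            String expression = "/*[namespace::foo[. = 'http://foo']]";
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, parse(xml), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // DOM Level 3 route: resolve the prefix on the document element directly.
    static boolean isBound(String xml) {
        String uri = parse(xml).getDocumentElement().lookupNamespaceURI("foo");
        return "http://foo".equals(uri);
    }

    public static void main(String[] args) {
        System.out.println(countMatches("<a xmlns:foo=\"http://foo\"></a>"));
        System.out.println(countMatches("<a xmlns:fooX=\"http://foo\"></a>"));
        System.out.println(isBound("<a xmlns:foo=\"http://fooX\"></a>"));
    }
}
```

The XPath variant fits a library that only accepts an expression and returns nodes; the lookupNamespaceURI variant is an option when you hold the Document yourself.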

Related

Howto parse xml-tags with prefixes using jersey and jaxb [duplicate]

I saw the following line in an XML file:
xmlns:android="http://schemas.android.com/apk/res/android"
I have also seen xmlns in many other XML files that I've come across.
What is it?
It means XML namespace.
Basically, every element (or attribute) in XML belongs to a namespace, a way of "qualifying" the name of the element.
Imagine you and I both invent our own XML. You invent XML to describe people, I invent mine to describe cities. Both of us include an element called name. Yours refers to the person’s name, and mine to the city name—OK, it’s a little bit contrived.
<person>
<name>Rob</name>
<age>37</age>
<homecity>
<name>London</name>
<lat>123.000</lat>
<long>0.00</long>
</homecity>
</person>
If our two XMLs were combined into a single document, how would we tell the two names apart? As you can see above, there are two name elements, but they both have different meanings.
The answer is that you and I would both assign a namespace to our XML, which we would make unique:
<personxml:person xmlns:personxml="http://www.your.example.com/xml/person"
xmlns:cityxml="http://www.my.example.com/xml/cities">
<personxml:name>Rob</personxml:name>
<personxml:age>37</personxml:age>
<cityxml:homecity>
<cityxml:name>London</cityxml:name>
<cityxml:lat>123.000</cityxml:lat>
<cityxml:long>0.00</cityxml:long>
</cityxml:homecity>
</personxml:person>
Now we’ve fully qualified our XML, there is no ambiguity as to what each name element means. All of the tags that start with personxml: are tags belonging to your XML, all the ones that start with cityxml: are mine.
There are a few points to note:
If you leave out namespace declarations entirely, elements are treated as being in the default (empty) namespace.
If you declare a namespace without the identifier, that is, xmlns="http://somenamespace", rather than xmlns:rob="somenamespace", it specifies the default namespace for the document.
The actual namespace itself, often an IRI, is of no real consequence. It should be unique, so people tend to choose an IRI/URI that they own, but it has no greater meaning than that. Sometimes people will place the schema (definition) for the XML at the specified IRI, but that is only a convention that some people follow.
The prefix is of no consequence either. The only thing that matters is what namespace the prefix is defined as. Several tags beginning with different prefixes, all of which map to the same namespace are considered to be the same.
For instance, if the prefixes personxml and mycityxml both mapped to the same namespace (as in the snippet below), then it wouldn't matter whether you prefixed a given element with personxml or mycityxml; an XML parser would treat both as the same thing. The point is that an XML parser doesn't care what you've chosen as the prefix, only which namespace it maps to. The prefix is just an indirection pointing to the namespace.
<personxml:person
xmlns:personxml="http://example.com/same/url"
xmlns:mycityxml="http://example.com/same/url" />
Attributes can be qualified but are generally not. They also do not inherit their namespace from the element they are on, as opposed to elements (see below).
Also, element namespaces are inherited from the parent element. In other words I could equally have written the above XML as
<person xmlns="http://www.your.example.com/xml/person">
<name>Rob</name>
<age>37</age>
<homecity xmlns="http://www.my.example.com/xml/cities">
<name>London</name>
<lat>123.000</lat>
<long>0.00</long>
</homecity>
</person>
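The points above (prefixes are irrelevant, default namespaces are inherited by child elements, unprefixed attributes are in no namespace) can be checked with a namespace-aware DOM parser. This is a minimal sketch; NamespaceDemo and its helper methods are hypothetical names, and the urn: URIs in the usage are placeholders.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class NamespaceDemo {

    // Parses an XML string with namespace processing switched on.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts elements by (namespace URI, local name); the prefix plays no role.
    static int count(String xml, String namespaceUri, String localName) {
        return parse(xml).getElementsByTagNameNS(namespaceUri, localName).getLength();
    }

    // Namespace URI of an attribute on the document element, or null if it has none.
    static String attributeNamespace(String xml, String attributeName) {
        return parse(xml).getDocumentElement().getAttributeNode(attributeName).getNamespaceURI();
    }

    public static void main(String[] args) {
        String twoPrefixes = "<r xmlns:p='urn:same' xmlns:q='urn:same'><p:item/><q:item/></r>";
        System.out.println(count(twoPrefixes, "urn:same", "item"));

        String inherited = "<person xmlns='urn:person' id='1'><name/>"
                + "<homecity xmlns='urn:city'><name/></homecity></person>";
        System.out.println(count(inherited, "urn:person", "name"));
        System.out.println(count(inherited, "urn:city", "name"));
        System.out.println(attributeNamespace(inherited, "id"));
    }
}
```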
It defines an XML Namespace.
In your example, the Namespace Prefix is "android" and the Namespace URI is "http://schemas.android.com/apk/res/android"
In the document, you see elements like: <android:foo />
Think of the namespace prefix as a variable: a short alias for the full namespace URI. In terms of what it "means" when an XML parser reads the document, it is the equivalent of writing <http://schemas.android.com/apk/res/android:foo />.
NOTE: You cannot actually use the full namespace URI in place of the namespace prefix in an XML instance document.
Check out this tutorial on namespaces: http://www.sitepoint.com/xml-namespaces-explained/
I think the biggest source of confusion is that an XML namespace looks like a URL, yet that URL doesn't have to point to any information. The truth is that the person who invented the namespace below:
xmlns:android="http://schemas.android.com/apk/res/android"
could just as well have written it like this:
xmlns:android="asjkl;fhgaslifujhaslkfjhliuqwhrqwjlrknqwljk.rho;il"
This is just a unique identifier. By convention, however, you should use a URL that is unique and can potentially point to the specification of the tags/attributes used in that namespace. It's not required, though.
Why should it be unique? Because the whole purpose of a namespace is to be unique, so that, for example, an attribute called background from your namespace can be distinguished from the background of another namespace.
Because of that uniqueness, you do not need to worry about name collisions when you create a custom attribute.
xmlns - xml namespace. It's just a method to avoid element name conflicts. For example:
<config xmlns:rnc="URI1" xmlns:bsc="URI2">
<rnc:node>
<rnc:rncId>5</rnc:rncId>
</rnc:node>
<bsc:node>
<bsc:cId>5</bsc:cId>
</bsc:node>
</config>
Two different node elements in one xml file. Without namespaces this file would not be valid.
You have namespaces so that you can have globally unique elements. 99% of the time this doesn't really matter, but in the context of the Semantic Web it starts to become important.
For example, you could make an XML mash-up of different schemes just by using the appropriate xmlns. For example, mash up friend of a friend with vCard, etc.

How xpath works internally?

When I write an XPath expression, where does the browser get the XML of the page from? In short, how does the browser work internally with XPath?
I am learning Selenium and I am using XPath to identify WebElements.
In general, an XPath expression specifies a pattern that selects a set of XML nodes. XSLT templates then use those patterns when applying transformations.
(XPointer, on the other hand, adds mechanisms for defining a point or a range so that XPath expressions can be used for addressing).
The nodes in an XPath expression refer to more than just elements. They also refer to text and attributes, among other things. In fact, the XPath specification defines an abstract document model.
For more, see this link: How xpath works internally
In general an XPath processor takes as input (a) an XPath expression, and (b) a node used as the context node; it evaluates that expression against that context node, and returns a result to the calling application.
So an API for invoking XPath will generally look like
result = xpath.eval(expression, contextNode)
or perhaps
result = contextNode.evalXPath(expression)
or perhaps
result = xpath.compile(expression).eval(contextNode)
In a web browser environment the contextNode might implicitly be set to the HTML page by default.
In practice APIs for invoking XPath have additional complexities, for example to allow the namespace context to be set, and to allow external variables/parameters to be bound to values.
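In Java, the standard javax.xml.xpath package follows this general shape. The sketch below is an assumption-laden illustration: XPathContextDemo, its methods, and the sample XML are made up. It shows one call that uses the document as the context node and one that first selects a context node and then evaluates a compiled, relative expression against it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; the sample document is invented for illustration.
class XPathContextDemo {

    static final String XML = "<library>"
            + "<shelf id=\"a\"><book>One</book><book>Two</book></shelf>"
            + "<shelf id=\"b\"><book>Three</book></shelf>"
            + "</library>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // result = xpath.eval(expression, contextNode), with the document as context node.
    static String evalOnDocument(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, parse(xml));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // result = xpath.compile(expression).eval(contextNode), with a chosen context node.
    static String evalOnContext(String xml, String contextPath, String relativeExpression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(contextPath, parse(xml), XPathConstants.NODE);
            XPathExpression compiled = xpath.compile(relativeExpression);
            return compiled.evaluate(context);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evalOnDocument(XML, "/library/shelf[2]/book"));
        System.out.println(evalOnContext(XML, "/library/shelf[@id='a']", "book[2]"));
    }
}
```

The same relative expression gives different results depending on the context node, which is why the context node is part of every such API.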

Get object for element that has failed XSD validation

I'm validating an XML document against an XSD, and then want to delete the nodes that cause the document to fail.
I'm running into a problem: SAXParseException doesn't seem to contain any information about the failure that I can use to programmatically remove nodes.
Is there a way to get, from a SAXParseException, a reference to the element that can be used to remove it?
See the answers here: How to get the element of and invalid xml file with failed xsd Validation
Note that what you are proposing to do is unsafe in the general case. For a simple counter-example, take an element X of type integer that must occur at least once in its parent. If you put a string value in it, it will now fail validation. If you remove it, the document will violate the minOccurs constraint.
You could try to remove the element and restart validation from scratch, but you could end up in a very long loop and get no good result.
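The counter-example above can be reproduced with javax.xml.validation. This is a sketch under assumptions: the schema, the element names parent and X, and the class RemovalCounterExample are invented to match the description, not taken from the asker's files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo: an integer element X that must occur at least once.
class RemovalCounterExample {

    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"parent\"><xs:complexType><xs:sequence>"
            + "<xs:element name=\"X\" type=\"xs:integer\" minOccurs=\"1\"/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";

    // Returns true if the document validates against XSD, false otherwise.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the demo schema itself is broken", e);
        }
        try {
            // Without a custom ErrorHandler, validation errors are thrown as SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<parent><X>42</X></parent>"));  // well-typed X
        System.out.println(isValid("<parent><X>abc</X></parent>")); // X is not an integer
        System.out.println(isValid("<parent/>"));                   // offending X removed
    }
}
```

Removing the offending X only trades a type error for a missing-element error, so the document stays invalid either way.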

Define the order of attributes in dom

I am currently working with the DOM and I wonder how I can change the order of a tag's attributes.
For example, I have created an element:
propElement = document.createElement("prop");
This creates the prop tag.
Then I set two attributes:
propElement.setAttribute("name", "name1");
propElement.setAttribute("name2", "name2");
The problem is that although I set the second attribute after the first one, it appears before it in the resulting tag.
How can I change the order?
(Note: I'm using a Java DOM API, not JavaScript.)
You can't, the order of attributes on elements is not significant. In fact, in a live DOM, there is no order. Order only seems to exist in relation to the serialized form of a DOM (e.g., HTML markup and the like). And even then, the order doesn't have any meaning except in relation to invalid text (more below).
Attributes are basically simple properties of an object (the DOM element to which they're attached). There is absolutely no order to them, and in fact the representation of them in the DOM is a NamedNodeMap which is "...not maintained in any particular order."
It's important to remember that the DOM describes an object model. The serialized form of a DOM may be textual (for instance, an HTML document defining a DOM), but the DOM is not. In an HTML document, since it's linear text (top-to-bottom, left-to-right), naturally the text defining one attribute has to precede the text describing another, but that does not imply any kind of order to the attributes in the resulting DOM object, because they have no order at all. So this:
<div a="1" b="2">...</div>
describes exactly the same element as this:
<div b="2" a="1">...</div>
The resulting element is a div which has an attribute a with the value 1 and an attribute b with the value 2.
This is exactly the same as setting properties on an object in program source. Consider some hypothetical obj with a and b properties. This code:
obj.a = 1;
obj.b = 2;
...results in exactly the same object as this code:
obj.b = 2;
obj.a = 1;
...provided a and b really are simple fields (not hidden function calls that may have side effects), which is true of attributes in the DOM.
There is one small way in which attribute order in the textual (serialized) form of a DOM may be significant, and it's only related to invalid text: If the same attribute is specified more than once, only the first value given is used, because it's invalid to specify the same attribute more than once. The values are not combined, and the subsequent value doesn't overwrite the previous one. The first one, only, is used.
So this invalid HTML:
<div class="foo" class="bar">...</div>
...actually results in a div with class "foo" ("bar" is not present at all). But this is just a coping mechanism for dealing with invalid serialized forms.
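Since the question is about the Java DOM API, here is a small sketch of the same point in Java. AttributeOrderDemo and its methods are hypothetical names; the code builds the two div elements from above with their attributes set in opposite order and compares them with the DOM Level 3 isEqualNode method.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class for illustration only.
class AttributeOrderDemo {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <div a="1" b="2"> and <div b="2" a="1"> and compares them.
    static boolean sameRegardlessOfOrder() {
        Document doc = newDocument();

        Element first = doc.createElement("div");
        first.setAttribute("a", "1");
        first.setAttribute("b", "2");

        Element second = doc.createElement("div");
        second.setAttribute("b", "2");
        second.setAttribute("a", "1");

        // isEqualNode compares names, values and attributes, not insertion order.
        return first.isEqualNode(second);
    }

    // Attributes are looked up by name, so their position never matters to callers.
    static String lookupByName() {
        Element element = newDocument().createElement("div");
        element.setAttribute("b", "2");
        element.setAttribute("a", "1");
        return element.getAttribute("a");
    }

    public static void main(String[] args) {
        System.out.println(sameRegardlessOfOrder());
        System.out.println(lookupByName());
    }
}
```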

Setting the order of attribute when writing XML Element [duplicate]

When I write the following in Java:
Element fieldEl = targetDocument.createElement("field");
fieldEl.setAttribute("Wine","Marlo");
fieldEl.setAttribute("Beer","Corona");
The order in which the attributes were added is not kept in the resulting XML file.
How can I control the order of the attributes inside an XML element (so that it is easy for a human to read)?
There is no defined order for attribute nodes according to the DOM standard:
Objects implementing the NamedNodeMap interface are used to represent collections of nodes that can be accessed by name. Note that NamedNodeMap does not inherit from NodeList; NamedNodeMaps are not maintained in any particular order. Objects contained in an object implementing NamedNodeMap may also be accessed by an ordinal index, but this is simply to allow convenient enumeration of the contents of a NamedNodeMap, and does not imply that the DOM specifies an order to these Nodes.
(emphasis added), and the XML standard does not define one either:
Note that the order of attribute specifications in a start-tag or empty-element tag is not significant.
I don't think that many DOM implementations support ordering of attributes at all. You'd have to write your own serialization mechanism in order to achieve ordering (no pun intended).
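As a rough idea of what such a hand-written serialization step could look like, here is a sketch that emits only the start tag of an element, writing attributes in a caller-supplied order first and any remaining ones afterwards. OrderedAttributeWriter and all its methods are hypothetical, the escaping is deliberately minimal, and a real serializer would also have to handle children, namespaces and encodings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

// Hypothetical sketch of a custom serializer; it only writes a start tag.
class OrderedAttributeWriter {

    // Writes the start tag with attributes in preferredOrder, then any others.
    static String startTag(Element element, List<String> preferredOrder) {
        StringBuilder out = new StringBuilder("<").append(element.getTagName());
        Set<String> written = new HashSet<>();
        for (String name : preferredOrder) {
            if (element.hasAttribute(name)) {
                appendAttribute(out, name, element.getAttribute(name));
                written.add(name);
            }
        }
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            String name = attributes.item(i).getNodeName();
            if (!written.contains(name)) {
                appendAttribute(out, name, attributes.item(i).getNodeValue());
            }
        }
        return out.append('>').toString();
    }

    static void appendAttribute(StringBuilder out, String name, String value) {
        out.append(' ').append(name).append("=\"").append(escape(value)).append('"');
    }

    // Minimal escaping for double-quoted attribute values.
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Rebuilds the element from the question and serializes its start tag.
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element fieldEl = doc.createElement("field");
            fieldEl.setAttribute("Wine", "Marlo");
            fieldEl.setAttribute("Beer", "Corona");
            return startTag(fieldEl, Arrays.asList("Wine", "Beer"));
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints <field Wine="Marlo" Beer="Corona">
    }
}
```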
