Ignore namespace to create NodeList in Java

I'm parsing an XML string to generate nodes. Sometimes the tag comes with a namespace and sometimes without one. How can I ignore the namespace and get the nodes either way?
I tried the following, but it didn't work:
//NodeList idDetails = doc.getDocumentElement().getElementsByTagNameNS("*", "details");
NodeList idDetails = doc.getElementsByTagName("ns2:details");
Any ideas on how to do it?

The first (commented-out) approach should work:
NodeList nodes = doc.getDocumentElement().getElementsByTagNameNS("*", str);
But you also have to call setNamespaceAware(true) on your DocumentBuilderFactory instance before creating the DocumentBuilder; otherwise namespaces will not be detected.
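A minimal sketch of that answer, assuming the goal is simply to find `details` elements whether or not they carry a prefix. The class name `IgnoreNamespaceLookup`, the helper `countDetails` and the `urn:example` namespace URI are made up for illustration; only the JAXP/DOM calls are standard.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IgnoreNamespaceLookup {

    // Parses the XML with namespace support switched on, then matches "details"
    // by local name in any namespace ("*"), so the prefix no longer matters.
    static int countDetails(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // must be set before newDocumentBuilder()
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList details = doc.getDocumentElement().getElementsByTagNameNS("*", "details");
            return details.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical inputs: one with a prefixed tag, one with no namespace at all.
        String withNs = "<root xmlns:ns2=\"urn:example\"><ns2:details/></root>";
        String withoutNs = "<root><details/></root>";
        System.out.println(countDetails(withNs));
        System.out.println(countDetails(withoutNs));
    }
}
```

With namespace awareness enabled, the `"*"` namespace argument matches elements in any namespace, including elements that have no namespace, so both inputs should yield one match.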

Related

Java 9, INVALID_CHARACTER_ERR when trying to add URL as element in XML

I'm working with XML for the first time, trying to generate XML to send over to a client and I'm having a hell of a time doing it. Whenever I try to pass a URL, I get an INVALID_CHARACTER_ERR and nothing I've tried so far works.
I tried using character references like &#123; for the curly braces, and tried escaping everything that wasn't a letter, resulting in the abomination under my code. It seems to throw the error whenever the name contains any character that isn't a letter. Another thing I noticed is that the document's InputEncoding is null, but that seems to be because I'm creating the document in code. Does that mean it actually has no encoding? I haven't been able to find an easy way to set it either.
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document orders = dBuilder.newDocument();
Element order = orders.createElement("{https://secure.targeturl.com/foo/bar}tagpayload");
Element tOrder = orders.createElement("tagorder");
order.appendChild(tOrder);
Element header = orders.createElement("orderheader");
tOrder.appendChild(header);
Element billto = orders.createElement("billto");
header.appendChild(billto);
// The escaped element name I also tried:
"&#123;https&#58;&#47;&#47;secure&#46;targeturl&#46;com/foo&#47;bar&#125;tagpayload"
This is not the correct way to create a namespaced element:
Element order = orders.createElement("{https://secure.targeturl.com/foo/bar}tagpayload");
Instead, use the createElementNS method:
Element order = orders.createElementNS("https://secure.targeturl.com/foo/bar", "tagpayload");
You are seeing an exception because { is not a legal character in an XML element name. createElement has no awareness of namespaces or the “{uri}name” namespace notation.
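A minimal sketch of the createElementNS approach, rebuilding the question's tree and serializing it with the JDK's built-in Transformer so the output can be inspected. Putting the child elements (`tagorder`, `orderheader`, `billto`) in the same namespace is an assumption; the question created them with plain createElement, and whether they belong to that namespace depends on the client's schema. The class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NamespacedElementSketch {
    static final String NS = "https://secure.targeturl.com/foo/bar";

    // createElementNS takes the namespace URI and the element name separately,
    // so no "{uri}" text ever ends up inside the element name.
    static Document build() {
        try {
            Document orders = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element order = orders.createElementNS(NS, "tagpayload");
            orders.appendChild(order);
            // Assumption: the children live in the same namespace as the root.
            Element tOrder = orders.createElementNS(NS, "tagorder");
            order.appendChild(tOrder);
            Element header = orders.createElementNS(NS, "orderheader");
            tOrder.appendChild(header);
            header.appendChild(orders.createElementNS(NS, "billto"));
            return orders;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the document to a string without the XML declaration.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(build()));
    }
}
```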

How do I create an xPath statement from a NodeInfo?

I'm using the S9API with Saxon 9.7 HE, and I have a NodeInfo object. I need to create an xPath statement that uniquely corresponds to this NodeInfo. It's easy enough to get the prefix, the node name, and the parent:
String prefix = node.getPrefix();
String localPart = node.getLocalPart();
NodeInfo parent = node.getParent();
So I can walk up the tree, building the xPath as I go. But what I can't find is any way to get the positional predicate info. IOW, it's not sufficient to create:
/persons/person/child
because it might match multiple child elements. I need to create:
/persons/person[2]/child[1]
which will match only one child element. Any ideas on how to get the positional predicate info? Or maybe there's a better way to do this altogether?
BTW, for those who use the generic DOM and not the S9API, here's an easy solution to this problem: http://practicalxml.sourceforge.net/apidocs/net/sf/practicalxml/DomUtil.html#getAbsolutePath(org.w3c.dom.Element)
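For the plain-DOM case, the positional predicate can be computed by counting preceding element siblings with the same name. This is a hypothetical sketch of that idea (not the practicalxml implementation); it compares names as written, ignores namespaces, and omits the predicate on the root element, matching the `/persons/person[2]/child[1]` shape from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomPathSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Walks up from the element to the document, prepending one step per level.
    static String absolutePath(Element element) {
        StringBuilder path = new StringBuilder();
        Node current = element;
        while (current != null && current.getNodeType() == Node.ELEMENT_NODE) {
            String name = current.getNodeName();
            // 1-based position among preceding element siblings with the same name.
            int position = 1;
            for (Node sib = current.getPreviousSibling(); sib != null; sib = sib.getPreviousSibling()) {
                if (sib.getNodeType() == Node.ELEMENT_NODE && sib.getNodeName().equals(name)) {
                    position++;
                }
            }
            Node parent = current.getParentNode();
            // The document element is unique, so it gets no predicate.
            boolean isRoot = parent == null || parent.getNodeType() == Node.DOCUMENT_NODE;
            path.insert(0, isRoot ? "/" + name : "/" + name + "[" + position + "]");
            current = parent;
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<persons><person/><person><child/><child/></person></persons>");
        Element firstChild = (Element) doc.getElementsByTagName("child").item(0);
        System.out.println(absolutePath(firstChild)); // /persons/person[2]/child[1]
    }
}
```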
Edit: @Michael Kay's answer works. To add some meat to it:
XPathExpression xPathExpression = xPath.compile("./path()");
List<?> matches = (List<?>) xPathExpression.evaluate(node, XPathConstants.NODESET);
String pathToNode = matches.get(0).toString();
// If you want to remove the expanded QName syntax:
pathToNode = pathToNode.replaceAll("Q\\{.*?\\}", "");
This must be done using the same xPath object that was previously used to acquire the NodeInfo object.
In XPath 3.0 you can use fn:path().
Earlier Saxon releases offer saxon:path().
The challenge here is handling namespaces. fn:path() returns a path that's not sensitive to namespace-prefix bindings by using the new expanded-QName syntax:
/Q{}persons/Q{}person[2]/Q{}child[1]

How to get Node from XML without considering namespace name in Java?

I am writing a java program in which I am parsing input xml file which looks like this:
...
<ems:DeterminationRequest>
<ems:MessageInformation>
<ns17:MessageID xmlns:ns17="http://www.calheers.ca.gov/EHITSAWSInterfaceCommonSchema">1000225404</ns17:MessageID>
<ns17:MessageTimeStamp xmlns:ns17="http://www.calheers.ca.gov/EHITSAWSInterfaceCommonSchema">2015-07-28T01:17:04</ns17:MessageTimeStamp>
<ns17:SendingSystem xmlns:ns17="http://www.calheers.ca.gov/EHITSAWSInterfaceCommonSchema">CH</ns17:SendingSystem>
<ns17:ReceivingSystem xmlns:ns17="http://www.calheers.ca.gov/EHITSAWSInterfaceCommonSchema">LD</ns17:ReceivingSystem>
<ns17:ServicingFipsCountyCode xmlns:ns17="http://www.calheers.ca.gov/EHITSAWSInterfaceCommonSchema">037</ns17:ServicingFipsCountyCode>
</ems:MessageInformation>
</ems:DeterminationRequest>
...
I am trying to get the node "ems:MessageInformation" without considering the namespace prefix "ems", so I tried the following lines of code:
Document doc = db.parse(new FileInputStream(new File("D:\\test.xml")));
Node element = doc.getDocumentElement().getElementsByTagNameNS("*","MessageInformation").item(0);
System.out.println(element.getNodeName());
But it throws a NullPointerException because the call does not find the required node. I went through this link for reference. Can someone tell me what I am doing wrong here?
This is odd/buggy behaviour in the NodeList implementation returned by
doc.getDocumentElement().getElementsByTagNameNS("*","MessageInformation")
It allows you to access item(0) but returns a null object.
(If you are using a current JDK the NodeList implementation is com.sun.org.apache.xerces.internal.dom.DeepNodeListImpl which lazily loads its items and shows this buggy behaviour).
To prevent the NullPointerException you should first check if the returned NodeList has a length > 0:
NodeList result = doc.getDocumentElement().getElementsByTagNameNS("*","MessageInformation");
if (result.getLength() > 0) {
Element element = (Element) result.item(0);
...
}
Then you need to find out why getElementsByTagNameNS does not return the element.
One possible reason is that you parsed the document without namespace support. In that case the DOM elements carry no namespace information, and getElementsByTagNameNS finds nothing.
To turn on namespace support use:
factory.setNamespaceAware(true); // on your DocumentBuilderFactory instance, before newDocumentBuilder()
Alternatively, without namespace support, you could search for the prefixed name:
NodeList nl = doc.getDocumentElement().getElementsByTagName("ems:MessageInformation");
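Both suggestions can be sketched together, with the length check done before item(0). The trimmed-down input and its `urn:example:ems` namespace URI are made up (the question's real declaration is not shown), and `MessageInformationLookup`, `parse` and `firstOrNull` are hypothetical helper names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MessageInformationLookup {

    // Hypothetical input; the ems prefix must be declared for a namespace-aware parse.
    static final String XML =
          "<ems:DeterminationRequest xmlns:ems=\"urn:example:ems\">"
        + "<ems:MessageInformation/>"
        + "</ems:DeterminationRequest>";

    static Document parse(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Checks getLength() before touching item(0), returning null when nothing matched.
    static Element firstOrNull(NodeList list) {
        return list.getLength() > 0 ? (Element) list.item(0) : null;
    }

    public static void main(String[] args) {
        // Namespace-aware parse: match by local name in any namespace.
        Element a = firstOrNull(parse(XML, true).getDocumentElement()
                .getElementsByTagNameNS("*", "MessageInformation"));
        // Non-namespace-aware parse: the prefix is part of the name, so search for it literally.
        Element b = firstOrNull(parse(XML, false).getDocumentElement()
                .getElementsByTagName("ems:MessageInformation"));
        System.out.println(a == null ? "not found" : a.getNodeName());
        System.out.println(b == null ? "not found" : b.getNodeName());
    }
}
```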

Remove Element from JDOM document using removeContent()

Given the following scenario, where the XML file Geography.xml looks like this:
<Geography xmlns:ns="some valid namespace">
<Country>
<Region>
<State>
<City>
<Name></Name>
<Population></Population>
</City>
</State>
</Region>
</Country>
</Geography>
and the following sample Java code:
InputStream is = new FileInputStream("C:\\Geography.xml");
SAXBuilder saxBuilder = new SAXBuilder();
Document doc = saxBuilder.build(is);
XPath xpath = XPath.newInstance("/*/Country/Region/State/City");
Element el = (Element) xpath.selectSingleNode(doc);
boolean b = doc.removeContent(el);
The removeContent() method doesn't remove the City element from the content list of doc; the value of b is false.
I don't understand why it is not removing the element. I even tried deleting the Name and Population elements from the XML to see if that was the issue, but apparently it's not.
Another thing I tried (I know it's not essentially different, but just for the sake of it) was to use the Parent:
Parent p = el.getParent();
boolean s = p.removeContent(new Element("City"));
What might the problem be, and what is a possible solution? Also, if anyone can explain the real behaviour of removeContent(), please share; I suspect it has to do with the parent-child relationship.
Sure: removeContent(Content child) removes child only if child is one of the parent's immediate children, which it is not in your case. Use el.detach() instead.
If you want to remove the City element, get its parent and call removeContent:
XPath xpath = XPath.newInstance("/*/Country/Region/State/City");
Element el = (Element) xpath.selectSingleNode(doc);
el.getParent().removeContent(el);
The reason why doc.removeContent(el) does not work is because el is not a child of doc.
Check the javadocs for details. There are a number of overloaded removeContent methods there.
This works, keeping in mind that getParent() returns a Parent object rather than an Element, and that detach(), which removes the node itself, must be called on the Element being removed.
Instead do:
el.getParentElement().detach();
This will remove the parent element with all its children!
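The same parent-child rule exists in the standard org.w3c.dom API, which makes it easy to try without JDOM on the classpath. This is a swapped-in analogue, not JDOM code: `doc.removeChild(city)` plays the role of `doc.removeContent(el)` and is rejected because City is not a direct child of the document, while asking City's own parent to remove it succeeds. Class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class RemoveFromParentSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Analogue of doc.removeContent(el): el is not a direct child of the document,
    // so the DOM implementation throws (NOT_FOUND_ERR) and nothing is removed.
    static boolean removeFromDocument(Document doc, Element el) {
        try {
            doc.removeChild(el);
            return true;
        } catch (DOMException e) {
            return false;
        }
    }

    // Analogue of el.getParent().removeContent(el): the element's own parent removes it.
    static void removeFromParent(Element el) {
        el.getParentNode().removeChild(el);
    }

    public static void main(String[] args) {
        Document doc = parse("<Geography><Country><Region><State><City>"
                + "<Name/><Population/></City></State></Region></Country></Geography>");
        Element city = (Element) doc.getElementsByTagName("City").item(0);
        System.out.println(removeFromDocument(doc, city));
        removeFromParent(city);
        System.out.println(doc.getElementsByTagName("City").getLength());
    }
}
```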

Problems setting a new node value in java, dom, xml parsing

I have the following code:
DocumentBuilder dBuilder = dbFactory_.newDocumentBuilder();
StringReader reader = new StringReader(s);
InputSource inputSource = new InputSource(reader);
Document doc_ = dBuilder.parse(inputSource);
and then I would like to create a new element in that document, right under the root element, with this code:
Node node = doc_.createElement("New_Node");
node.setNodeValue("New_Node_value");
doc_.getDocumentElement().appendChild(node);
The problem is that the node gets created and appended, but the value isn't set. I don't think the value is just hidden somewhere when I look at my XML, because I've also tried getting the node value right after creating the node, and it returns null.
I'm new to xml and dom and I don't know where the value of the new node is stored. Is it like an attribute?
<New_Node value="New_Node_value" />
or does it put value here:
<New_Node> New_Node_value </New_Node>
Any help would be greatly appreciated,
Thanks, Josh
The following code:
Element node = doc_.createElement("New_Node");
node.setTextContent("This is the content"); //adds content
node.setAttribute("attrib", "attrib_value"); //adds an attribute
produces:
<New_Node attrib="attrib_value">This is the content</New_Node>
Hope this clarifies.
For clarification, when you create nodes use:
Attr x = doc.createAttribute(...);
Comment x = doc.createComment(...);
Element x = doc.createElement(...); // as @dogbane pointed out
Text x = doc.createTextNode(...);
instead of using the generic Node for what you get back from each method. It will make your code easier to read/debug.
Secondly, the getNodeValue() / setNodeValue() methods work differently depending on what type of Node you have; see the summary table in the Node class documentation. For an Element they are of no use (an Element's node value is always null, and setting it has no effect), whereas for a Text node they work.
As @dogbane pointed out, use setTextContent() for the text between this element's tags. Note that this replaces all existing child nodes, including child elements.
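A minimal sketch with the standard org.w3c.dom API showing the difference: setNodeValue() leaves an Element untouched, while setTextContent() and setAttribute() produce the markup from the first answer. The `root` document element and the class and method names are made-up placeholders.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NodeValueSketch {

    // Creates an empty document with a placeholder <root/> element.
    static Document newDocWithRoot() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("root"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Element buildNode(Document doc) {
        Element node = doc.createElement("New_Node");
        node.setNodeValue("New_Node_value");          // no effect: an Element's nodeValue stays null
        node.setTextContent("This is the content");   // creates a Text child
        node.setAttribute("attrib", "attrib_value");  // adds an attribute
        return node;
    }

    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocWithRoot();
        Element node = buildNode(doc);
        System.out.println(node.getNodeValue()); // null
        doc.getDocumentElement().appendChild(node);
        System.out.println(serialize(doc));
    }
}
```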
This is another solution. In my case it is the one that works, because the setTextContent() method does not exist there: I am working with Google Web Toolkit (GWT), a Java development framework, and I imported the XMLParser library so I can use a DOM parser.
import com.google.gwt.xml.client.Document;
import com.google.gwt.xml.client.Element;
import com.google.gwt.xml.client.XMLParser;
Document doc = XMLParser.createDocument();
Element node = doc.createElement("New_Node");
node.appendChild(doc.createTextNode("value")); // text goes in a Text child node
doc.appendChild(node);
The result is:
<New_Node>value</New_Node>
<New_Node value="New_Node_value" />
'value' is an attribute of the New_Node element. For getting into the DOM, I suggest http://www.w3schools.com/htmldom/default.asp
