Using XPath to get child elements - java

I'm trying to use DocumentBuilder and XPath to parse an XML document with structure like:
<questionnaire>
<item>
<question>How have you been?</question>
<response>Great</response>
<response>Good</response>
<response>So-so</response>
<response>Bad</response>
<response>Rather not answer</response>
</item>
</questionnaire>
To access question I've done this (which works):
expression = "/questionnaire/item[" + i + "]/question";
setQuestion(xmlReader.read(expression, XPathConstants.STRING).toString());
Now I need some way to create a list of string based on the response items. The number of responses is variable so one question could have any number of responses. Does anyone know how to do this?
Thanks

Something like this won't do it? You have to note that xmlReader.read probably return a collection in that case.
expression = "/questionnaire/item[" + i + "]/response";
setResponse(xmlReader.read(expression, XPathConstants.STRING));

Related

Java 9, INVALID_CHARACTER_ERR when trying to add URL as element in XML

I'm working with XML for the first time, trying to generate XML to send over to a client and I'm having a hell of a time doing it. Whenever I try to pass a URL, I get an INVALID_CHARACTER_ERR and nothing I've tried so far works.
I tried using replacements like & #123; and so on for the curly braces, and tried escaping everything that wasn't a letter, resulting in the abomination under my code. It seems to throw the error if I have any kind of character that isn't a letter. Another thing that I noticed is that the document's InputEncoding is null, but that seems to be because I'm creating it in code, does that mean that it actually doesn't have an encoding type? I haven't been able to find an easy way to set it either.
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document orders = dBuilder.newDocument();
Element order = orders.createElement("{https://secure.targeturl.com/foo/bar}tagpayload");
Element tOrder = orders.createElement("tagorder");
order.appendChild(tOrder);
Element header = orders.createElement("orderheader");
tOrder.appendChild(header);
Element billto = orders.createElement("billto");
header.appendChild(billto); ```
``` "& #123;https& #58;& #47;& #47;secure& #46;targeturl& #46;com/foo& #47;bar& #125;tagpayload" ```
This is not the correct way to create a namespaced element:
Element order = orders.createElement("{https://secure.targeturl.com/foo/bar}tagpayload");
Instead, use the createElementNS method:
Element order = orders.createElementNS("https://secure.targeturl.com/foo/bar", "tagpayload");
You are seeing an exception because { is not a legal character in an XML element name. createElement has no awareness of namespaces or the “{uri}name” namespace notation.

Parsing xml with multi childs using jsoup

I have an xml file that looks as follows - link.
I would like to get the title from it.
In order to do so, I did the following:
Document bookDoc = Jsoup.connect( url ).parser( Parser.xmlParser() ).get();
Node node = bookDoc.childNode( 2 ).childNode( 3 ).childNode( 3 );
This returns me this:
Now I have 2 questions:
Isnt there any simpler way to get this title instead of using all of these childNodes? My worry is that in some result the title wont exactly be at childNode(3) and all my code wont work.
How do I eventually get this title? Im stuck at this point and cant get the string of the title.
Thank you
You can use selectors to access elements. Here you want to select by tag name. Two ways to get the element you want:
String title1 = bookDoc.select("record>display>title").text();
String title2 = bookDoc.selectFirst("record").selectFirst("display").selectFirst("title").text();
If you want to select more complicated things read:
https://jsoup.org/cookbook/extracting-data/dom-navigation
https://jsoup.org/cookbook/extracting-data/selector-syntax
But you probably won't need them for parsing this XML.

Retrieve value of attribute using XPath

I am trying to retrieve the value of an attribute from an xmel file using XPath and I am not sure where I am going wrong..
This is the XML File
<soapenv:Envelope>
<soapenv:Header>
<common:TestInfo testID="PI1" />
</soapenv:Header>
</soapenv:Envelope>
And this is the code I am using to get the value. Both of these return nothing..
XPathBuilder getTestID = new XPathBuilder("local-name(/*[local-name(.)='Envelope']/*[local-name(.)='Header']/*[local-name(.)='TestInfo'])");
XPathBuilder getTestID2 = new XPathBuilder("Envelope/Header/TestInfo/#testID");
Object doc2 = getTestID.evaluate(context, sourceXML);
Object doc3 = getTestID2.evaluate(context, sourceXML);
How can I retrieve the value of testID?
However you're iterating within the java, your context node is probably not what you think, so remove the "." specifier in your local-name(.) like so:
/*[local-name()='Header']/*[local-name()='TestInfo']/#testID worked fine for me with your XML, although as akaIDIOT says, there isn't an <Envelope> tag to be seen.
The XML file you provided does not contain an <Envelope> element, so an expression that requires it will never match.
Post-edit edit
As can be seen from your XML snippet, the document uses a specific namespace for the elements you're trying to match. An XPath engine is namespace-aware, meaning you'll have to ask it exactly what you need. And, keep in mind that a namespace is defined by its uri, not by its abbreviation (so, /namespace:element doesn't do much unless you let the XPath engine know what the namespace namespace refers to).
Your first XPath has an extra local-name() wrapped around the whole thing:
local-name(/*[local-name(.)='Envelope']/*[local-name(.)='Header']
/*[local-name(.)='TestInfo'])
The result of this XPath will either be the string value "TestInfo" if the TestInfo node is found, or a blank string if it is not.
If your XML is structured like you say it is, then this should work:
/*[local-name()='Envelope']/*[local-name()='Header']/*[local-name()='TestInfo']/#testID
But preferably, you should be working with namespaces properly instead of (ab)using local-name(). I have a post here that shows how to do this in Java.
If you don't care for the namespaces and use an XPath 2.0 compatible engine, use * for it.
//*:Header/*:TestInfo/#testID
will return the desired input.
It will probably be more elegant to register the needed namespaces (not covered here, depends on your XPath engine) and query using these:
//soapenv:Header/common:TestInfo/#testID

How to extract xml tag value without using the tag name in java?

I am using java.I have an xml file which looks like this:
<?xml version="1.0"?>
<personaldetails>
<phno>1553294232</phno>
<email>
<official>xya#gmail.com</official>
<personal>bk#yahoo.com</personal>
</email>
</personaldetails>
Now,I need to check each of the tag values for its type using specific conditions,and put them in separate files.
For example,in the above file,i write conditions like 10 digits equals phone number,
something in the format of xxx#yy.com is an email..
So,what i need to do is i need to extract the tag values in each tag and if it matches a certain condition,it is put in the first text file,if not in the second text file.
in that case,the first text file will contain:
1553294232
xya#gmail.com
bk#yahoo.com
and the rest of the values in the second file.
i just don't know how to extract the tag values without using the tag name.(or without using GetElementsByTagName).
i mean this code should extract the email bk#yahoo.com even if i give <mailing> instead of <personal> tag.It should not depend on the tag name.
Hope i am not confusing.I am new to java using xml.So,pardon me if my question is silly.
Please Help.
Seems like a typical use case for XPath
XPath allows you to query XML in a very flexible way.
This tutorial could help:
http://www.javabeat.net/2009/03/how-to-query-xml-using-xpath/
If you're using Java script, which could to be the case, since you mention getElementsByTagName(), you could just use JQuery selectors, it will give you a consistent behavior across browsers, and JQuery library is useful for a lot of other things, if you are not using it already... http://api.jquery.com/category/selectors/
Here for example is information on this:
http://www.switchonthecode.com/tutorials/xml-parsing-with-jquery
Since you don't know your element name, I would suggest creating a DOM tree and iterating through it. As and when you get a element, you would try to match it against your ruleset (and I would suggest using regex for this purpose) and then write it to your a file.
This would be a sample structure to help you get started, but you would need to modify it based on your requirement:
public void parseXML(){
try{
DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc;
doc = documentBuilder.parse(new File("test.xml"));
getData(null, doc.getDocumentElement());
}catch(Exception exe){
exe.printStackTrace();
}
}
private void getData(Node parentNode, Node node){
switch(node.getNodeType()){
case Node.ELEMENT_NODE:{
if(node.hasChildNodes()){
NodeList list = node.getChildNodes();
int size = list.getLength();
for(int index = 0; index < size; index++){
getData(node, list.item(index));
}
}
break;
}
case Node.TEXT_NODE:{
String data = node.getNodeValue();
if(data.trim().length() > 0){
/*
* Here you need to check the data against your ruleset and perform your operation
*/
System.out.println(parentNode.getNodeName()+" :: "+node.getNodeValue());
}
break;
}
}
}
You might want to look at the Chain of Responsibility design pattern to design your ruleset.

Java DOM XML Parsing How to walk through multiple node levels

I have the following xml structure
<clinic>
<category>
<employees>
<medic>
<medic_details>
<medic_name />
<medic_address />
</medic_details>
<pacients>
<pacient>
<pacient_details>
<pacient_name> ...
<pacient_address> ...
</pacient_details>
<diagnostic>
<disease>
<disease_name>Disease</disease_name>
<treatment>Treatment</treatment>
</disease>
<disease>
<disease_name>Disease</disease_name>
<treatment>Treatment</treatment>
</disease>
</diagnostic>
</pacient>
</pacients>
<medic>
</employees>
</category>
</clinic>
I have a JTextArea where I want to show information from the xml file. For example, for showing each medic, with its name, adress, and treating pacients with their respective names, i have the following code:
NodeList medicNList = doc.getElementsByTagName("medic");
for (int temp = 0; temp < medicNList.getLength(); temp++) {
Node medicNode = medicNList.item(temp);
Element eElement = (Element) medicNode;
area.append("\n");
area.append("Medic Name : " + getTagValue("medic_name", eElement) + "\n");
area.append("Medic Address : " + getTagValue("medic_address", eElement) + "\n");
area.append("\n");
area.append("Pacients : \n");
area.append("Pacient Name : " + getTagValue("pacient_name", eElement) + "\n");
area.append("Pacient Name : " + getTagValue("pacient_address", eElement) + "\n");
}
My question is, if i want to have more than 1 disease per pacient, how do I display all of the diseases for each pacient? I don't know how to "walk" to the diagnostic node for each pacient and showing the relevant data inside
Your code looks incorrect as it is. You currently have multiple pacient (patients) per medic so you should be iterating the list of patients for each medic.
Then iterate diseases for each patient. You need to use the getElementsByTagName method for each nesting in the XML. Plus you need to skip over the pluralised elements such as <pacients>.
I would suggest you use an XPath library instead as it can make the code a lot easier to read. There are plenty of good ones out there. I would recommend jaxen
I would give htmlcleaner a try.
HTMLCleaner is Java library used to safely parse and transform any HTML found on web to well-formed XML. It is designed to be small, fast, flexible and independant. HtmlCleaner may be used in java code, as command line tool or as Ant task. Result of parsing is lightweight document object model which can easily be transformed to standards like DOM or JDom, or serialized to XML output in various ways (compact, pretty printed and so on).
You can use XPath with htmlcleaner to get contents within xml tags.Here is a nice
example Xpath Example

Categories

Resources