PostgreSQL Field that maps to a XML Property - java

We are building a website that is 100% data-driven. All possible field names will be stored in PostgreSQL, and all values for those fields come from a web service. The end user will be able to build their own page by clicking on the fields they want on their screen. I'm trying to come up with the best way for a text field in PostgreSQL to hold the full mapping into the XML data that comes back from the web service. Should I store something like root\property1\subproperty and have code loop through it, walking down the XML, or is there a more effective way?
EDIT: Replaced JSON with XML. I've been working with JSON so much lately that I misspoke and said JSON, when these REST web services return XML.
EDIT2: I found a partial solution, but as you can see in the example below, if a node name exists twice, it is returned as two nodes. I need to figure out whether I should store the path in the DB as newnode2\firstName and then loop through the nodes, looking for newnode2 first and then for firstName under it. I remember, many years ago, using an XML object in .NET where I could just do something like #nodex\subnode and get values. I might be overthinking this.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
Document doc = null;
try
{
DocumentBuilder builder = factory.newDocumentBuilder();
doc = builder.parse( new InputSource( new StringReader( "<response><responseStatus>Success</responseStatus><dataSet name=\"myvalue\"><newnode><firstName>John</firstName><lastName>Smith</lastName></newnode><newnode2><firstName>Bob</firstName><birthDate>11/11/1965</birthDate></newnode2></dataSet></response>" )) );
NodeList nList = doc.getElementsByTagName("firstName");
Node n;
String value;
for(int i=0; i<nList.getLength(); i++)
{
n = nList.item(i);
value = n.getTextContent();
}
} catch (Exception e) {
e.printStackTrace();
}

Ok, I came up with a recursive function. Please let me know if you see or know of a better way
String xml = "<response><responseStatus>Success</responseStatus><dataSet name=\"myvalue\"><newnode><firstName>John</firstName><lastName>Smith</lastName></newnode><newnode2><firstName>Bob</firstName><birthDate>11/11/1965</birthDate></newnode2></dataSet></response>";
String value;
value = getXMLValue(xml, "newnode2/firstName"); //example of multi-node
value = getXMLValue(xml, "birthDate"); //example of going directly to field if it's the only node with that name.
value = getXMLValue(xml, "dataSet/name"); //example of getting attribute
Functions:
public String getXMLValue(String xml, String searchNodes) {
String retVal = "";
String[] nodeSplit = searchNodes.split("/");
try
{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
Document doc;
NodeList nList;
DocumentBuilder builder = factory.newDocumentBuilder();
doc = builder.parse( new InputSource( new StringReader( xml )) );
nList = doc.getElementsByTagName(nodeSplit[0]);
retVal = GetNode(nList, searchNodes, 0);
} catch (Exception e) {
e.printStackTrace();
}
return retVal;
}
public String GetNode(NodeList nl, String searchNodes, int item)
{
String retVal = null;
String findNode = searchNodes.split("/")[item];
int count = searchNodes.split("/").length;
item++;
for(int i=0; i<nl.getLength(); i++) {
String foundNode = nl.item(i).getNodeName();
NamedNodeMap nnm = nl.item(i).getAttributes();
if(nnm!=null && nnm.getLength()>0 && count>item) {
Node attribute = nnm.getNamedItem(searchNodes.split("/")[item]);
if(attribute!=null) {
retVal = attribute.getTextContent();
break;
}
}
if(foundNode.equals(findNode) && count>item) {
retVal = GetNode(nl.item(i).getChildNodes(), searchNodes, item);
break;
} else if(foundNode.equals(findNode) && count==item) {
retVal = nl.item(i).getTextContent();
break;
}
}
return retVal;
}
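One possible alternative to the recursive walk, offered only as a sketch and not as the approach the poster settled on: store a standard XPath expression in the PostgreSQL text column and let javax.xml.xpath resolve it. The class name XPathFieldLookup and the stored expressions shown here are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: the mapping stored in the database is an XPath expression.
final class XPathFieldLookup {

    // Returns the string value of the first node matched by the expression,
    // or an empty string when nothing matches.
    static String getXMLValue(String xml, String xpathExpression) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return xPath.evaluate(xpathExpression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML: " + xpathExpression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<response><dataSet name=\"myvalue\"><newnode2>"
                + "<firstName>Bob</firstName></newnode2></dataSet></response>";
        System.out.println(getXMLValue(xml, "/response/dataSet/newnode2/firstName"));
        System.out.println(getXMLValue(xml, "/response/dataSet/@name"));
    }
}
```

With this layout an element path such as /response/dataSet/newnode2/firstName and an attribute path such as /response/dataSet/@name are both plain strings in the column, so no custom path syntax has to be parsed.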

Related

How to get data from XML node?

I am struggling to get the data out of the following XML node. I use DocumentBuilder to parse XML and I usually get the value of a node by defining the node but in this case I am not sure how the node would be.
<Session.openRs status="success" sessionID="19217B84:AA3649FE:B211FF37:E61A78F1:7A35D91D:48E90C41" roleBasedSecurity="1" entityID="1" />
This is how I am getting the values for other tags by the tag name.
public List<NYProgramTO> getNYPPAData() throws Exception {
this.getConfiguration();
List<NYProgramTO> to = dao.getLatestNYData();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
Document document = null;
// Returns chunkSize
/*List<NYProgramTO> myList = getNextChunk(to);
ExecutorService executor = Executors.newFixedThreadPool(myList.size());
myList.stream().parallel()
.forEach((NYProgramTO nyTo) ->
{
executor.execute(new NYExecutorThread(nyTo, migrationConfig , appContext, dao));
});
executor.shutdown();
executor.awaitTermination(300, TimeUnit.SECONDS);
System.gc();*/
try {
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource source = new InputSource();
for(NYProgramTO nyProgram: to) {
String reqXML = nyProgram.getRequestXML();
String response = RatingRequestProcessor.postRequestToDC(reqXML, URL);
// dao.storeData(nyProgram);
System.out.println(response);
if(response != null) {
source.setCharacterStream(new StringReader(response));
document = builder.parse(source);
NodeList list = document.getElementsByTagName(NYPG3Constants.SERVER);
for(int iterate = 0; iterate < list.getLength(); iterate++){
Node node = list.item(iterate);
if(node.getNodeType() == Node.ELEMENT_NODE) {
Element element = (Element) node;
nyProgram.setResponseXML(response);
nyProgram.setFirstName(element.getElementsByTagName(NYPG3Constants.F_NAME).item(0).getTextContent());
nyProgram.setLastName(element.getElementsByTagName(NYPG3Constants.L_NAME).item(0).getTextContent());
nyProgram.setPolicyNumber(element.getElementsByTagName(NYPG3Constants.P_NUMBER).item(0).getTextContent());
nyProgram.setZipCode(element.getElementsByTagName(NYPG3Constants.Z_CODE).item(0).getTextContent());
nyProgram.setDateOfBirth(element.getElementsByTagName(NYPG3Constants.DOB).item(0).getTextContent());
nyProgram.setAgencyCode(element.getElementsByTagName(NYPG3Constants.AGENCY_CODE).item(0).getTextContent());
nyProgram.setLob(element.getElementsByTagName(NYPG3Constants.LINE_OF_BUSINESS).item(0).getTextContent());
if(element.getElementsByTagName(NYPG3Constants.SUBMISSION_NUMBER).item(0) != null){
nyProgram.setSubmissionNumber(element.getElementsByTagName(NYPG3Constants.SUBMISSION_NUMBER).item(0).getTextContent());
} else {
nyProgram.setSubmissionNumber("null");
}
I need to get the value of sessionID. What I want to know is which node to use; I am sure it can't be . I am retrieving the values by tag name, so what would the tag name be in this case?
Thanks in advance
You should consider using XPath. At least for me, it is much easier to use. In your case, to get sessionID you could try something like this:
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "/Session.openRs/@sessionID";
String sessionID = xPath.evaluate(expression, document);
You can obtain 'document' like this:
Document document = builder.parse(source);
Hope this can help!!
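For comparison, here is a DOM-only sketch of the same lookup. It relies on the fact that sessionID is an attribute of the Session.openRs element, not a child element, so it is read with getAttribute and not through a tag name of its own. The class and method names are made up for this example, and the response is assumed to contain at least one Session.openRs element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
final class SessionIdReader {

    // Returns the sessionID attribute of the first Session.openRs element,
    // or null when the response contains no such element.
    static String readSessionId(String responseXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(responseXml)));
            Node node = document.getElementsByTagName("Session.openRs").item(0);
            if (node == null) {
                return null;
            }
            // Attributes are read from the element itself.
            return ((Element) node).getAttribute("sessionID");
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse response", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readSessionId("<Session.openRs status=\"success\" sessionID=\"abc\" />"));
    }
}
```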

How to get all attributes from each element separately?

Here's some basic xml doc:
<h1>My Heading</h1>
<p align = "center"> My paragraph
<img src="smiley.gif" alt="Smiley face" height="42" width="42"></img>
<img src="sad.gif" alt="Sad face" height="45" width="45"></img>
<img src="funny.gif" alt="Funny face" height="48" width="48"></img>
</p>
<p>My para</p>
What I am trying to do is find an element and all its attributes, and save the attribute name and attribute value for each element. Here's my code so far:
private Map <String, String> tag = new HashMap <String,String> ();
public Map <String, String> findElement () {
try {
FileReader fRead = new FileReader (sourcePage);
BufferedReader bRead = new BufferedReader (fRead);
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance ();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder ();
Document doc = docBuilder.parse(new FileInputStream (new File (sourcePage)));
XPathFactory xFactory = XPathFactory.newInstance ();
XPath xPath = xFactory.newXPath ();
NodeList nl = (NodeList) xPath.evaluate("//img/@*", doc, XPathConstants.NODESET);
for( int i=0; i<nl.getLength (); i++) {
Attr attr = (Attr) nl.item(i);
String name = attr.getName();
String value = attr.getValue();
tag.put (name,value);
}
bRead.close ();
fRead.close ();
}
catch (Exception e) {
e.printStackTrace();
System.err.println ("An error has occured.");
}
The problem appears when I look for the img attributes, because the attribute names are identical across elements. A HashMap is not suitable for this, since it overwrites values that share the same key. Maybe I'm using the wrong expression to find all attributes. Is there another way to get the attribute names and values of the nth img element?
First, let's level the field a little. I cleaned up your code a bit to get a compiling starting point. I removed the unnecessary code and fixed the method according to my best guess of what it is supposed to do. I also generalized it a little so that it accepts a tagName parameter. It's still the same code and makes the same mistake, but now it compiles (Java 7 features are used for convenience; switch back to Java 6 if you want). I also split the try-catch into multiple blocks just for the sake of it:
public Map<String, String> getElementAttributesByTagName(String tagName) {
Document document;
try (InputStream input = new FileInputStream(sourcePage)) {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
document = docBuilder.parse(input);
} catch (IOException | ParserConfigurationException | SAXException e) {
throw new RuntimeException(e);
}
NodeList attributeList;
try {
XPath xPath = XPathFactory.newInstance().newXPath();
attributeList = (NodeList)xPath.evaluate("//" + tagName + "/@*", document, XPathConstants.NODESET);
} catch (XPathExpressionException e) {
throw new RuntimeException(e);
}
Map<String, String> tagInfo = new HashMap<>();
for (int i = 0; i < attributeList.getLength(); i++) {
Attr attribute = (Attr)attributeList.item(i);
tagInfo.put(attribute.getName(), attribute.getValue());
}
return tagInfo;
}
When run against your example code above, it returns:
{height=48, alt=Funny face, width=48, src=funny.gif}
The solution depends on your expected output. You either want:
1. To get the attributes of only one of the <img> elements (say, the first one)
2. To get a list of all <img> elements and their attributes
For the first solution, it's enough to change your XPath expression to
//descendant::img[1]/@*
or
//descendant::" + tagName + "[1]/@*
with the tagName parameter. Beware that this is not the same as //img[1]/@*, even though it returns the same element in this particular case.
When changed this way, the method returns:
{height=42, alt=Smiley face, width=42, src=smiley.gif}
which are correctly returned attributes of the first <img> element.
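To make the warning about //img[1] concrete, here is a small sketch that counts matches on a document with <img> elements under two different parents. In XPath 1.0, //img[1] selects every <img> that is the first <img> child of its own parent, while the single-slash form /descendant::img[1] selects at most one element, the first in document order. The class name and the sample markup are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class for illustration only.
final class FirstImgDemo {

    // Counts how many nodes the given XPath expression selects in the XML.
    static int countMatches(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            Double count = (Double) xPath.evaluate(
                    "count(" + expression + ")", doc, XPathConstants.NUMBER);
            return count.intValue();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<body><p><img src=\"a.gif\"/><img src=\"b.gif\"/></p>"
                + "<p><img src=\"c.gif\"/></p></body>";
        System.out.println(countMatches(xml, "//img[1]"));            // 2: a.gif and c.gif
        System.out.println(countMatches(xml, "/descendant::img[1]")); // 1: a.gif only
    }
}
```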
Note that you don't even have to use XPath expression for this kind of work. Here's a non-XPath version:
public Map<String, String> getElementAttributesByTagNameNoXPath(String tagName) {
Document document;
try (InputStream input = new FileInputStream(sourcePage)) {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
document = docBuilder.parse(input);
} catch (IOException | ParserConfigurationException | SAXException e) {
throw new RuntimeException(e);
}
Node node = document.getElementsByTagName(tagName).item(0);
NamedNodeMap attributeMap = node.getAttributes();
Map<String, String> tagInfo = new HashMap<>();
for (int i = 0; i < attributeMap.getLength(); i++) {
Node attribute = attributeMap.item(i);
tagInfo.put(attribute.getNodeName(), attribute.getNodeValue());
}
return tagInfo;
}
The second solution needs to change things a bit. We want to return the attributes of all <img> elements in the document. Multiple elements means we'll use a List which will hold multiple Map<String, String> instances, where every Map represents one <img> element.
A complete XPath version in case you actually need some complex XPath expression:
public List<Map<String, String>> getElementsAttributesByTagName(String tagName) {
Document document;
try (InputStream input = new FileInputStream(sourcePage)) {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
document = docBuilder.parse(input);
} catch (IOException | ParserConfigurationException | SAXException e) {
throw new RuntimeException(e);
}
NodeList nodeList;
try {
XPath xPath = XPathFactory.newInstance().newXPath();
nodeList = (NodeList)xPath.evaluate("//" + tagName, document, XPathConstants.NODESET);
} catch (XPathExpressionException e) {
throw new RuntimeException(e);
}
List<Map<String, String>> tagInfoList = new ArrayList<>();
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
NamedNodeMap attributeMap = node.getAttributes();
Map<String, String> tagInfo = new HashMap<>();
for (int j = 0; j < attributeMap.getLength(); j++) {
Node attribute = attributeMap.item(j);
tagInfo.put(attribute.getNodeName(), attribute.getNodeValue());
}
tagInfoList.add(tagInfo);
}
return tagInfoList;
}
To get rid of the XPath part, you can simply switch it to a one-liner:
NodeList nodeList = document.getElementsByTagName(tagName);
Both these versions, when run against your test case above with an "img" parameter, return this: (formatted for clarity)
[ {height=42, alt=Smiley face, width=42, src=smiley.gif},
{height=45, alt=Sad face, width=45, src=sad.gif },
{height=48, alt=Funny face, width=48, src=funny.gif } ]
which is a correct list of all the <img> elements.
try using
Map <String, ArrayList<String>> tag = new HashMap <String, ArrayList<String>> ();
You can use a map as the key of another map:
Map<Map<Integer, String>, String> // Integer = some index (0, 1, etc.), inner String = attribute name (src), outer String = attribute value (smiley.gif)
OR
You can invert it and keep that in mind when using it, like:
Map<String, String> // key = smiley.gif, value = src

Node.getTextContent() is there a way to get text content of the current node, not the descendant's text

Node.getTextContent() returns the text content of the current node and its descendants.
Is there a way to get the text content of the current node only, without the descendants' text?
Example
<paragraph>
<link>XML</link>
is a
<strong>browser based XML editor</strong>
editor allows users to edit XML data in an intuitive word processor.
</paragraph>
expected output
paragraph = is a editor allows users to edit XML data in an intuitive word processor.
link = XML
strong = browser based XML editor
I tried the code below:
String str = "<paragraph>"+
"<link>XML</link>"+
" is a "+
"<strong>browser based XML editor</strong>"+
"editor allows users to edit XML data in an intuitive word processor."+
"</paragraph>";
org.w3c.dom.Document domDoc = null;
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder;
try {
docBuilder = docFactory.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes());
domDoc = docBuilder.parse(bis);
} catch (ParserConfigurationException e1) {
e1.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
DocumentTraversal traversal = (DocumentTraversal) domDoc;
NodeIterator iterator = traversal.createNodeIterator(
domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
String tagname = ((Element) n).getTagName();
System.out.println(tagname + "=" + ((Element)n).getTextContent());
}
but it gives the output like this
paragraph=XML is a browser based XML editoreditor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor
Note that the paragraph element contains the text of the link and strong tags, which I don't want.
Please suggest some ideas.
What you want is to filter the children of your <paragraph> node and keep only those with node type Node.TEXT_NODE.
Here is an example of a method that returns the desired content:
public static String getFirstLevelTextContent(Node node) {
NodeList list = node.getChildNodes();
StringBuilder textContent = new StringBuilder();
for (int i = 0; i < list.getLength(); ++i) {
Node child = list.item(i);
if (child.getNodeType() == Node.TEXT_NODE)
textContent.append(child.getTextContent());
}
return textContent.toString();
}
Within your example it means:
String str = "<paragraph>" + //
"<link>XML</link>" + //
" is a " + //
"<strong>browser based XML editor</strong>" + //
"editor allows users to edit XML data in an intuitive word processor." + //
"</paragraph>";
Document domDoc = null;
try {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes());
domDoc = docBuilder.parse(bis);
} catch (Exception e) {
e.printStackTrace();
}
DocumentTraversal traversal = (DocumentTraversal) domDoc;
NodeIterator iterator = traversal.createNodeIterator(domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
String tagname = ((Element) n).getTagName();
System.out.println(tagname + "=" + getFirstLevelTextContent(n));
}
Output:
paragraph= is a editor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor
What it does is iterate over all the children of a Node, keeping only text nodes (thus excluding comments, elements and so on) and accumulating their text content.
There is no direct method in Node or Element to get only the text content at first level.
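If XPath is already in use, the same filtering can be expressed with the text() node test, which selects only the text children of the context node. The following is a sketch under that assumption, not part of the answer above; the class and method names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
final class OwnText {

    // Concatenates the text nodes that are direct children of the given node.
    static String ownText(Node node) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList texts = (NodeList) xPath.evaluate("text()", node, XPathConstants.NODESET);
            StringBuilder content = new StringBuilder();
            for (int i = 0; i < texts.getLength(); i++) {
                content.append(texts.item(i).getNodeValue());
            }
            return content.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not read text nodes", e);
        }
    }

    // Convenience wrapper: parses the XML and reads the first element with that tag name.
    static String ownText(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node node = doc.getElementsByTagName(tagName).item(0);
            return node == null ? null : ownText(node);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<paragraph><link>XML</link> is a <strong>editor</strong></paragraph>";
        System.out.println("paragraph=" + ownText(xml, "paragraph"));
        System.out.println("link=" + ownText(xml, "link"));
    }
}
```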
If you change the last for loop to the following one, it behaves as you wanted:
for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
String tagname = ((Element) n).getTagName();
StringBuilder content = new StringBuilder();
NodeList children = n.getChildNodes();
for(int i=0; i<children.getLength(); i++) {
Node child = children.item(i);
if(child.getNodeName().equals("#text"))
content.append(child.getTextContent());
}
System.out.println(tagname + "=" + content);
}
I do this with Java 8 streams and a helper class:
import java.util.*;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class NodeLists
{
/** converts a NodeList to java.util.List of Node */
static List<Node> list(NodeList nodeList)
{
List<Node> list = new ArrayList<>();
for(int i=0;i<nodeList.getLength();i++) {list.add(nodeList.item(i));}
return list;
}
}
And then
NodeLists.list(node.getChildNodes())
.stream()
.filter(child -> child.getNodeType() == Node.TEXT_NODE)
.map(Node::getTextContent)
.reduce("", (s, t) -> s + t);
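There is no built-in function that returns only a node's own text, but a simple trick can approximate it: check whether node.getTextContent() contains "\n"; if it does, treat the node as having no text of its own. This is only a heuristic: it assumes that container elements are pretty-printed across several lines and that real text values never contain line breaks.
Hope this helps.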

How do I stop getNodeName() also printing the node type

I'm stumped, hopefully I've just done a dumb thing that I can fix easily.
I'm passing in a String full of XML, being 'XMLstring'. I want to get one of the elements and print the child nodes in a "name = value" on the console. The problem is that the console keeps printing garbage along with the element name that I cannot work out how to get rid of.
Anyway, this code:
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(XMLstring));
Document doc = db.parse(is);
NodeList nodes = doc.getElementsByTagName("client-details");
Node node = nodes.item(0);
NodeList client_details = node.getChildNodes();
for (int i = 0; i < client_details.getLength(); i++) {
System.out.println(client_details.item(i).getNodeName()+" = "+getTextContents(client_details.item(i)));
}
}
catch (Exception e) {
e.printStackTrace();
}
Gives me the following:
#text =
testing-mode = false
#text =
name = testman
#text =
age = 30
Why is it printing the "#text ="? How do I get rid of it?
I am using NetBeans if that helps.
You want to use getNodeValue() instead:
System.out.println(client_details.item(i).getNodeValue()+" = "+getTextContents(client_details.item(i)));
If you look in the table at the top of this page, you see that for Text nodes, getNodeName() returns #text.
I am curious to see what each of the two method calls in your System.out.println() prints separately, because the entire output should be on one line. One of those two is causing the problem, and I believe it may be internal to the method.
Otherwise, string.split("[=]") will split the line on the delimiter '=', so you can do:
String[] splitString = string.split("[=]");
System.out.println(splitString[0] + " = " + splitString[1]);
or, much more simply, make the one small edit that @retrodrone posted.
OK I managed to resolve this issue, for anyone else who cares. The problem with the code is that the Node needs to be cast to an Element before you can get the tag name out of it in this manner. Therefore:
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(XMLstring));
Document doc = db.parse(is);
NodeList nodes = doc.getElementsByTagName("client-details");
Node node = nodes.item(0);
NodeList client_details = node.getChildNodes();
Element elementary;
for (int i = 0; i < client_details.getLength(); i++) {
if(client_details.item(i).getNodeType() == Node.ELEMENT_NODE) {
elementary = (Element) client_details.item(i);
System.out.println(elementary.getTagName()+" = "+getTextContents(client_details.item(i)));
}
}
} catch (Exception e) {
e.printStackTrace();
}
Which produces the desired result, minus that "#text" bollocks :)
testing-mode = false
name = testman
age = 30
Notice the new "if" statement I added inside the for loop, which skips the whitespace #text nodes, and the cast of the node to an Element before calling getTagName().
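The same filtering can be packaged as a small helper that returns the child elements as name/value pairs. This is a sketch only: the poster's getTextContents() helper is not shown in the thread, so the standard Node.getTextContent() is used in its place, and the sample XML is an assumption reconstructed from the printed output.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
final class ChildElements {

    // Maps each child element of the first <parentTag> element to its text,
    // skipping whitespace text nodes, comments and other non-element children.
    static Map<String, String> childElementValues(String xml, String parentTag) {
        Map<String, String> values = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node parent = doc.getElementsByTagName(parentTag).item(0);
            if (parent == null) {
                return values;
            }
            for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    values.put(child.getNodeName(), child.getTextContent());
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<client-details>\n  <name>testman</name>\n  <age>30</age>\n</client-details>";
        childElementValues(xml, "client-details")
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

The line breaks and indentation between the child elements are what the parser reports as #text nodes, which is why the element-type check is needed.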

What's a simple way in java to evaluate an xpath on a string and return a result string

A simple answer needed to a simple question.
For example:
String xml = "<car><manufacturer>toyota</manufacturer></car>";
String xpath = "/car/manufacturer";
assertEquals("toyota",evaluate(xml, xpath));
How can I write the evaluate method in simple and readable way that will work for any given well-formed xml and xpath.
Obviously there are loads of ways this can be achieved but the majority seem very verbose.
Any simple ways I'm missing/libraries that can achieve this?
For cases where multiple nodes are returned I just want the string representation of this.
Here you go, the following can be done with Java SE:
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;
public class Demo {
public static void main(String[] args) throws Exception {
String xml = "<car><manufacturer>toyota</manufacturer></car>";
String xpath = "/car/manufacturer";
XPath xPath = XPathFactory.newInstance().newXPath();
assertEquals("toyota",xPath.evaluate(xpath, new InputSource(new StringReader(xml))));
}
}
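The two-argument evaluate call above returns the string value of the first matching node only. For the case in the question where several nodes match, one option is to request a node set and collect each node's text. The sketch below assumes that a list of strings is an acceptable "string representation"; the class and method names are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
final class XPathStrings {

    // Returns the text content of every node matched by the expression, in document order.
    static List<String> evaluateAll(String xml, String expression) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xPath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<cars><car><manufacturer>toyota</manufacturer></car>"
                + "<car><manufacturer>honda</manufacturer></car></cars>";
        System.out.println(evaluateAll(xml, "/cars/car/manufacturer")); // [toyota, honda]
    }
}
```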
For this use case the XMLUnit library may be a perfect fit:
http://xmlunit.sourceforge.net/userguide/html/index.html#Xpath%20Tests
It provides some additional assert methods.
For example:
assertXpathEvaluatesTo("toyota", "/car/manufacturer",
"<car><manufacturer>toyota</manufacturer></car>");
Using the Xml class from https://github.com/guppy4j/libraries/tree/master/messaging-impl :
Xml xml = new Xml("<car><manufacturer>toyota</manufacturer></car>");
assertEquals("toyota", xml.get("/car/manufacturer"));
I have written assertXPath() in three languages so far. Ruby and Python are the best because they can also parse HTML with its idiosyncrasies via libxml2 and then run XPaths on them. For XML, or for carefully controlled HTML that doesn't have glitches like < for a JavaScript "less than", here's my assertion suite:
private static final XPathFactory xpathFactory = XPathFactory.newInstance();
private static final XPath xpath = xpathFactory.newXPath();
private static @NonNull Document assertHtml(@NonNull String xml) {
try {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(false);
DocumentBuilder builder = factory.newDocumentBuilder();
ByteArrayInputStream stream = new ByteArrayInputStream(xml.replaceAll("i < len;", "i &lt; len;").getBytes()); // Because JavaScript ruined HTML's ability to someday be real XML...
return builder.parse(stream);
} catch (SAXParseException e) {
if (e.getLocalizedMessage().startsWith("Unexpected token") && !xml.startsWith("<xml>"))
return assertHtml("<xml>" + xml + "</xml>");
throw e; // a GOTO to 2 lines down...
}
} catch (Throwable e) {
fail(e.getLocalizedMessage());
}
return null;
}
private static @NonNull List<String> assertXPaths(@NonNull Node node, @NonNull String xpathExpression)
{
NodeList nodes = evaluateXPath(node, xpathExpression);
List<String> values = new ArrayList<>();
if (nodes != null)
for (int i = 0; i < nodes.getLength(); i++) {
Node item = nodes.item(i);
// item.getTextContent();
// item.getNodeName();
values.add(item.getNodeValue());
}
if (values.size() == 0)
fail("XPath not found: " + xpathExpression + "\n\nin: " + nodeToString(node) + "\n");
return values;
}
private static @NonNull Node assertXPath(@NonNull Node node, @NonNull String xpathExpression)
{
NodeList nodes = evaluateXPath(node, xpathExpression);
if (nodes != null && nodes.getLength() > 0)
return nodes.item(0);
fail("XPath not found: " + xpathExpression + "\n\nin: " + nodeToString(node) + "\n");
return null; // this can't happen
}
private static NodeList evaluateXPath(@NonNull Node node, @NonNull String xpathExpression) {
NodeList nodes = null;
try {
XPathExpression expr = xpath.compile(xpathExpression);
nodes = (NodeList) expr.evaluate(node, XPathConstants.NODESET);
} catch (XPathExpressionException e) {
fail(e.getLocalizedMessage());
}
return nodes;
}
private static void assertXPath(Node node, String xpathExpression, String reference) {
List<String> nodes = assertXPaths(node, xpathExpression);
assertEquals(1, nodes.size()); // CONSIDER decorate these assertion diagnostics with nodeToString(). And don't check for one text() - join them all together
assertEquals(reference, nodes.get(0).trim()); // CONSIDER same complaint: We need to see the nodeToString() here
}
private static void refuteXPath(@NonNull Node node, @NonNull String xpathExpression) {
NodeList nodes = evaluateXPath(node, xpathExpression);
if (nodes.getLength() != 0)
fail("XPath should not be found: " + xpathExpression); // CONSIDER decorate this with the contents of the node
}
private static @NonNull String nodeToString(@NonNull Node node) {
StringWriter sw = new StringWriter();
Transformer t = null;
try {
t = TransformerFactory.newInstance().newTransformer();
} catch (TransformerConfigurationException e) {
fail(e.getLocalizedMessage());
}
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.setOutputProperty(OutputKeys.INDENT, "yes");
try {
t.transform(new DOMSource(node), new StreamResult(sw));
} catch (TransformerException e) {
fail(e.getLocalizedMessage());
}
return sw.toString();
}
Use those recursively, like this:
Document doc = assertHtml(myHtml);
Node blockquote = assertXPath(doc, "//blockquote[ 'summary_7' = @id ]");
assertXPath(blockquote, ".//span[ contains(., 'Mammal') and strong/text() = 'anteater' ]");
The benefit of finding a node, then asserting a path relative to that node (via .//) is at failure time nodeToString() will only report the node contents, such as my <blockquote>. The assertion diagnostic message won't contain the entire document, making it very easy to read.
