I have an XML file that contains lots of different nodes. Some in particular are nested like this:
<emailAddresses>
<emailAddress>
<value>sambj1981@gmail.com</value>
<typeSource>WORK</typeSource>
<typeUser></typeUser>
<primary>false</primary>
</emailAddress>
<emailAddress>
<value>sambj@hotmail.co.uk</value>
<typeSource>HOME</typeSource>
<typeUser></typeUser>
<primary>true</primary>
</emailAddress>
</emailAddresses>
From the above node, what I want to do is go through each emailAddress, get the values inside it (value, typeSource, typeUser etc.) and put them in a POJO.
I tried the XPath expression "//emailAddress", but it doesn't return the tags inside it. Maybe I am doing it wrong; I am pretty new to XPath.
I could do something like this:
//emailAddress/value | //emailAddress/typeSource | .. but, if I'm not mistaken, that will list all the element values together, leaving me to work out when I have finished reading one emailAddress tag and moved on to the next.
To sum up, I basically want the results returned the way a bog-standard SQL query returns rows: if the query produces 10 emailAddress entries, each one comes back as a row, and I can simply iterate over each emailAddress and get the appropriate value by column name or index.
No, //emailAddress doesn't return the tags inside; that is correct. What it does return is a NodeList (a node-set). To actually get the values you can do something like this:
String emailpath = "//emailAddress";
String emailvalue = ".//value";
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
Document document;
public XpathStuff(String file) throws ParserConfigurationException, IOException, SAXException {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = docFactory.newDocumentBuilder();
BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
document = builder.parse(bis);
NodeList nodeList = getNodeList(document, emailpath);
for(int i = 0; i < nodeList.getLength(); i++){
System.out.println(getValue(nodeList.item(i), emailvalue));
}
bis.close();
}
public NodeList getNodeList(Document doc, String expr) {
try {
XPathExpression pathExpr = xpath.compile(expr);
return (NodeList) pathExpr.evaluate(doc, XPathConstants.NODESET);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return null;
}
//extracts the String value for the given expression
private String getValue(Node n, String expr) {
try {
XPathExpression pathExpr = xpath.compile(expr);
return (String) pathExpr.evaluate(n,
XPathConstants.STRING);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return null;
}
Maybe I should point out that, when iterating over the NodeList, the leading dot in .//value means the current context node. Without the dot, the expression is evaluated from the document root, so you would get the first node every time.
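To get the "one row per emailAddress" behaviour asked for in the question, the same idea can be sketched as follows. This is only a minimal sketch using the JDK's javax.xml APIs; the EmailAddressRows class, its nested EmailAddress POJO and the example.com addresses are made up for illustration and are not part of the original code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EmailAddressRows {

    // Hypothetical POJO: one instance per <emailAddress> "row".
    static class EmailAddress {
        String value;
        String typeSource;
        String typeUser;
        boolean primary;
    }

    static List<EmailAddress> read(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Step 1: select the "rows".
        NodeList rows = (NodeList) xpath.evaluate("//emailAddress", doc, XPathConstants.NODESET);

        List<EmailAddress> result = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            Node row = rows.item(i);
            EmailAddress email = new EmailAddress();
            // Step 2: relative paths are resolved against the current row,
            // much like reading columns by name.
            email.value = xpath.evaluate("value", row);
            email.typeSource = xpath.evaluate("typeSource", row);
            email.typeUser = xpath.evaluate("typeUser", row);
            email.primary = Boolean.parseBoolean(xpath.evaluate("primary", row));
            result.add(email);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<emailAddresses>"
                + "<emailAddress><value>work@example.com</value><typeSource>WORK</typeSource>"
                + "<typeUser></typeUser><primary>false</primary></emailAddress>"
                + "<emailAddress><value>home@example.com</value><typeSource>HOME</typeSource>"
                + "<typeUser></typeUser><primary>true</primary></emailAddress>"
                + "</emailAddresses>";
        for (EmailAddress email : read(xml)) {
            System.out.println(email.value + " " + email.typeSource + " " + email.primary);
        }
    }
}
```

An empty element such as typeUser simply comes back as an empty string, so no null checks are needed for the string fields.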
//emailAddress/*
will get these nodes in the document order.
It depends on how you want to iterate through the nodes. We do all our XML using XOM (http://www.xom.nu/) which is an easy reliable Java package. It's possible to write your own strategy using XOM calls.
If you use XStream you can set it up quite easily. Like so:
@XStreamAlias("emailAddress")
public class EmailAddress {
    @XStreamAlias("value")
    private String value;
    @XStreamAlias("typeSource")
    private String typeSource;
    @XStreamAlias("typeUser")
    private String typeUser;
    @XStreamAlias("primary")
    private boolean primary;
    // ... the rest omitted for brevity
}
You then marshal & unmarshal quite simply like so:
XStream xstream = new XStream();
xstream.processAnnotations( EmailAddress.class );
xstream.toXML( /* Object value here */ emailAddress );
xstream.fromXML( /* String xml value here */ "" );
IDK if you have to use XPath or not, but if not I'd consider an out of the box solution like this.
I am totally aware this is not what you were asking for, but you may want to consider JiBX, a tool for mapping human-readable XML to POJOs.
I believe you could quickly generate a mapping for your email structure and let JiBX do the work for you.
I am starting to write a program that uses dom4j to parse an XML file, save it to some tables and finally allow the user to manipulate the data.
Unfortunately I am stuck on the most basic step, the parsing.
Here is the portion of my XML I am attempting to include:
<?xml version="1.0"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.04">
<BkToCstmrDbtCdtNtfctn>
<GrpHdr>
<MsgId>000022222</MsgId>
When I attempt to find the root of my XML it does return the root correctly as "Document". When I attempt to get the child node from Document it also correctly gives me "BkToCstmrDbtCdtNtfctn". The problem is that when I try to go any further and get the child nodes from "Bk" I can't. I get this in the console:
org.dom4j.tree.DefaultElement@2b05039f [Element: <BkToCstmrDbtCdtNtfctn uri: urn:iso:std:iso:20022:tech:xsd:camt.054.001.04 attributes: []/>]
Here is my code; I would appreciate any feedback. Ultimately I want to get the "MsgId" element back, but in general I just want to figure out how to parse deeper into the XML, because in reality it probably has about 25 layers.
public static Document getDocument(final String xmlFileName){
Document document = null;
SAXReader reader = new SAXReader();
try{
document = reader.read(xmlFileName);
}
catch (DocumentException e)
{
e.printStackTrace();
}
return document;
}
public static void main(String args[]){
String xmlFileName = "C:\\Users\\jhamric\\Desktop\\Camt54.xml";
String xPath = "//Document";
Document document = getDocument(xmlFileName);
Element root = document.getRootElement();
List<Node> nodes = document.selectNodes(xPath);
for(Iterator i = root.elementIterator(); i.hasNext();){
Element element = (Element) i.next();
System.out.println(element);
}
for(Iterator i = root.elementIterator("BkToCstmrDbtCdtNtfctn");i.hasNext();){
Element bk = (Element) i.next();
System.out.println(bk);
}
}
}
The best approach is probably to use XPath, but since the XML document uses namespaces, you cannot use the "simple" selectNodes methods in the API. I would create a helper method to easily evaluate any XPath expression on either the Document or the Element level:
public static void main(String[] args) throws Exception {
Document doc = getDocument(...);
Map<String, String> namespaceContext = new HashMap<>();
namespaceContext.put("ns", "urn:iso:std:iso:20022:tech:xsd:camt.054.001.04");
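// Select the first GrpHdr element in document order; the parentheses matter,
// because //ns:GrpHdr[1] would match every GrpHdr that is the first one under its parent
Element element = (Element) select("(//ns:GrpHdr)[1]", doc, namespaceContext);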
System.out.println(element.asXML());
// Select the text content of the MsgId element
Text msgId = (Text) select("./ns:MsgId/text()", element, namespaceContext);
System.out.println(msgId.getText());
}
static Object select(String expression, Branch contextNode, Map<String, String> namespaceContext) {
XPath xp = contextNode.createXPath(expression);
xp.setNamespaceURIs(namespaceContext);
return xp.evaluate(contextNode);
}
Note that the XPath expression must use namespace prefixes that are mapped to the namespace URIs used in the input document, but the actual value of the prefix doesn't matter.
I've come across a problem that I've looked up on Stack Overflow, but none of the solutions seems to solve it for me.
I'm retrieving XML data from Yahoo and it comes back as below (truncated for brevity's sake).
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<fantasy_content xmlns="http://fantasysports.yahooapis.com/fantasy/v2/base.rng" xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" copyright="Data provided by Yahoo! and STATS, LLC" refresh_rate="31" time="55.814027786255ms" xml:lang="en-US" yahoo:uri="http://fantasysports.yahooapis.com/fantasy/v2/league/328.l.108462/settings">
<league>
<league_key>328.l.108462</league_key>
<league_id>108462</league_id>
<draft_status>postdraft</draft_status>
</league>
</fantasy_content>
I've been having a problem getting XPath to retrieve any elements so I've written a unit test to try to resolve it and it looks like:
final File file = new File("league-settings.xml");
javax.xml.parsers.DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
javax.xml.parsers.DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
org.w3c.dom.Document doc = dBuilder.parse(file);
javax.xml.xpath.XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext(new YahooNamespaceContext());
final String expression = "yfs:league";
final XPathExpression expr = xPath.compile(expression);
Object nodes = expr.evaluate(doc, XPathConstants.NODESET);
assert(nodes instanceof NodeList);
NodeList leagueNodes = (NodeList)nodes;
int leaguesLength = leagueNodes.getLength();
assertEquals(leaguesLength, 1);
The YahooNamespaceContext class I created to map the namespaces looks as follows:
public class YahooNamespaceContext implements NamespaceContext {
public static final String YAHOO_NS = "http://www.yahooapis.com/v1/base.rng";
public static final String DEFAULT_NS = "http://fantasysports.yahooapis.com/fantasy/v2/base.rng";
public static final String YAHOO_PREFIX = "yahoo";
public static final String DEFAULT_PREFIX = "yfs";
private final Map<String, String> namespaceMap = new HashMap<String, String>();
public YahooNamespaceContext() {
namespaceMap.put(DEFAULT_PREFIX, DEFAULT_NS);
namespaceMap.put(YAHOO_PREFIX, YAHOO_NS);
}
public String getNamespaceURI(String prefix) {
return namespaceMap.get(prefix);
}
public String getPrefix(String uri) {
throw new UnsupportedOperationException();
}
public Iterator<String> getPrefixes(String uri) {
throw new UnsupportedOperationException();
}
}
Any help from people with more experience with XML namespaces, or debugging tips for XPath compilation/evaluation, would be appreciated.
If the problem is that you're getting zero as the length of the result nodelist, have you tried changing
final String expression = "yfs:league";
to
final String expression = "//yfs:league";
?
It appears that the context for evaluating your XPath expressions, doc, is the root node of the document. dBuilder.parse(file) returns the document root node, not the outermost element (a.k.a. the document element). Remember, in XPath, a root node is not an element. So doc is not the yfs:fantasy_content element node but its (invisible) parent.
In that context, the XPath expression "yfs:league" will only select an element that is a direct child of that root node, of which there is no yfs:league -- only yfs:fantasy_content.
The XPath expression yfs:league is equivalent to child::yfs:league. It means: find direct child nodes (not descendants) of doc with the specified local name (league) and namespace URI (http://fantasysports.yahooapis.com/fantasy/v2/base.rng).
You must take into account the outermost element (fantasy_content) or search for descendant instead of child nodes.
Replacing
final String expression = "yfs:league";
with
final String expression = "yfs:fantasy_content/yfs:league";
or with
final String expression = "//yfs:league";
will solve the problem.
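A quick way to see the difference is to count the matches for each expression against a cut-down copy of the document. The following is a minimal sketch using only the JDK; the ChildVersusDescendant class and its count helper are hypothetical names, and the inline XML is a trimmed version of the response in the question.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildVersusDescendant {

    static final String YFS_NS = "http://fantasysports.yahooapis.com/fantasy/v2/base.rng";

    // Counts the nodes selected by the expression, using the document node as context.
    static int count(String expression, String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                // Only the "yfs" prefix is mapped in this sketch.
                return "yfs".equals(prefix) ? YFS_NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                throw new UnsupportedOperationException();
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                throw new UnsupportedOperationException();
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<fantasy_content xmlns=\"" + YFS_NS + "\">"
                + "<league><league_key>328.l.108462</league_key></league>"
                + "</fantasy_content>";
        // Child of the document node: only fantasy_content qualifies, so no match.
        System.out.println(count("yfs:league", xml));
        // Explicit path through the document element.
        System.out.println(count("yfs:fantasy_content/yfs:league", xml));
        // Descendant search from the root.
        System.out.println(count("//yfs:league", xml));
    }
}
```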
I'm struggling with extracting a value from a specific node in my XML document. I'm using the W3C DOM because I found many tutorials on it, but I can't find any good ones for this task, so I had to use XPath instead.
I always know the exact path leading to the node whose value (a string) I need to extract and return; I pass it as a string, for example "Car/Wheels/Wheel[@Index='x']/". (I convert the string into doubles and integers in other methods later.) The variable myDoc is declared as Document myDoc.
How do I get this value?
private String xPathValue(String path){
XPath myPath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile(path);
String result = (String)expr.evaluate(myDoc);
return result;
}
This, however, doesn't work, and I don't want to create a NodeList since I know the exact paths. I'm looking for something that works like Node.getTextContent().
You have two options:
1) Alter your XPath to return the value of the node instead of the node itself
Using expression: Car/Wheels/Wheel[@Index='x']/text()
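private String xPathValue(String path) throws XPathExpressionException {
    XPath myPath = XPathFactory.newInstance().newXPath();
    // Compile with the XPath instance created above
    XPathExpression expr = myPath.compile(path);
    // STRING returns the string value of the first matching node ("" if none)
    return (String) expr.evaluate(myDoc, XPathConstants.STRING);
}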
2) Use the same XPath query but return a node type
Using expression: Car/Wheels/Wheel[@Index='x']
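private String xPathValue(String path) throws XPathExpressionException {
    XPath myPath = XPathFactory.newInstance().newXPath();
    // Compile with the XPath instance created above
    XPathExpression expr = myPath.compile(path);
    Node result = (Node) expr.evaluate(myDoc, XPathConstants.NODE);
    // NODE yields null when nothing matches
    return result == null ? null : result.getTextContent();
}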
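Since the question mentions converting the result to doubles later, it may be worth knowing that XPath can also return a number directly via XPathConstants.NUMBER. Below is a minimal sketch under some assumptions: the WheelLookup class, its helpers and the Car/Wheels sample document are invented for illustration, because the original XML was not shown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WheelLookup {

    // Shared for brevity; XPath objects are not thread-safe.
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // String value of the first node matching the path, or "" if nothing matches.
    static String text(Document doc, String path) throws XPathExpressionException {
        return XPATH.evaluate(path, doc);
    }

    // Numeric value of the first node matching the path, or NaN if it is not a number.
    static double number(Document doc, String path) throws XPathExpressionException {
        return (Double) XPATH.evaluate(path, doc, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        // Assumed document shape, based only on the path given in the question.
        Document doc = parse("<Car><Wheels>"
                + "<Wheel Index=\"1\">17.5</Wheel>"
                + "<Wheel Index=\"2\">18</Wheel>"
                + "</Wheels></Car>");
        System.out.println(text(doc, "Car/Wheels/Wheel[@Index='1']"));
        System.out.println(number(doc, "Car/Wheels/Wheel[@Index='2']"));
    }
}
```

Note that the path has no trailing slash: an expression ending in "/" is not valid XPath and fails to compile.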
I have small Strings with XML, like:
String myxml = "<resp><status>good</status><msg>hi</msg></resp>";
which I want to query to get their content.
What would be the simplest way to do this?
XPath using Java 1.5 and above, without external dependencies:
String xml = "<resp><status>good</status><msg>hi</msg></resp>";
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
InputSource source = new InputSource(new StringReader(xml));
String status = xpath.evaluate("/resp/status", source);
System.out.println("status=" + status);
Using dom4j, similar to McDowell's solution:
String myxml = "<resp><status>good</status><msg>hi</msg></resp>";
Document document = new SAXReader().read(new StringReader(myxml));
String status = document.valueOf("/resp/status");
System.out.println("status = " + status);
XML handling is a bit simpler using dom4j. And several other comparable XML libraries exist. Alternatives to dom4j are discussed here.
Here is an example of how to do that with XOM:
String myxml = "<resp><status>good</status><msg>hi</msg></resp>";
Document document = new Builder().build(myxml, "test.xml");
Nodes nodes = document.query("/resp/status");
System.out.println(nodes.get(0).getValue());
I like XOM more than dom4j for its simplicity and correctness. XOM won't let you create invalid XML even if you want to ;-) (e.g. with illegal characters in character data)
You could try JXPath
After you're done with the simple ways to query XML in Java, look at XOM.
@The comments of this answer:
You can create a method to make it look simpler
String xml = "<resp><status>good</status><msg>hi</msg></resp>";
System.out.printf("status= %s\n", getValue("/resp/status", xml ) );
The implementation:
public String getValue( String path, String xml ) throws XPathExpressionException {
return XPathFactory
.newInstance()
.newXPath()
.evaluate( path , new InputSource(
new StringReader(xml)));
}
Convert this string into a DOM object and visit the nodes:
Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new InputSource(new StringReader(myxml)));
Element root = dom.getDocumentElement();
// Walk the direct children of the root element
for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling())
{
    System.err.println("Current node is:" + n);
}
Here is a code snippet of querying your XML with VTD-XML
import com.ximpleware.*;
public class simpleQuery {
public static void main(String[] s) throws Exception{
String myXML="<resp><status>good</status><msg>hi</msg></resp>";
VTDGen vg = new VTDGen();
vg.setDoc(myXML.getBytes());
vg.parse(false);
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/resp/status");
int i = ap.evalXPath();
if (i!=-1)
System.out.println(" result ==>"+vn.toString(i));
}
}
You can use Jerry to query XML similar to jQuery.
jerry(myxml).$("status")
I am parsing a XML file in Java using the W3C DOM.
I am stuck at a specific problem, I can't figure out how to get the whole inner XML of a node.
The node looks like that:
<td><b>this</b> is a <b>test</b></td>
What function do I have to use to get that:
"<b>this</b> is a <b>test</b>"
I know this was asked long ago but for the next person searching (was me today), this works with JDOM:
JDOMXPath xpath = new JDOMXPath("/td/node()");
String innerXml = (new XMLOutputter()).outputString(xpath.selectNodes(document));
This passes a list of all child nodes into outputString, which will serialize them out in order.
You have to use the transform/XSLT API, using your <b> node as the node to be transformed, and put the result into a new StreamResult(new StringWriter()). See how-to-pretty-print-xml-from-java.
What do you say about this?
I had the same problem today on Android, but I managed to write a simple "serializer":
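// Concatenates the serialized form of every child of the given node.
private String innerXml(Node node) {
    StringBuilder sb = new StringBuilder();
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
        sb.append(serializeNode(children.item(i)));
    }
    return sb.toString();
}
// Minimal serializer: handles elements, attributes and text only.
// Special characters (&, <, ") are written out without escaping.
private String serializeNode(Node node) {
    if (node.getNodeType() == Node.TEXT_NODE) {
        return node.getTextContent();
    }
    StringBuilder sb = new StringBuilder("<").append(node.getNodeName());
    NamedNodeMap attributes = node.getAttributes();
    if (attributes != null) {
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            // The leading space separates each attribute from what precedes it
            sb.append(' ').append(attribute.getNodeName())
              .append("=\"").append(attribute.getNodeValue()).append('"');
        }
    }
    NodeList children = node.getChildNodes();
    if (children.getLength() == 0) {
        return sb.append("/>").toString();
    }
    sb.append('>');
    for (int i = 0; i < children.getLength(); i++) {
        sb.append(serializeNode(children.item(i)));
    }
    return sb.append("</").append(node.getNodeName()).append('>').toString();
}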
er... you could also call toString() and just chop off the beginning and end tags, either manually or using regexps.
Edit: toString() doesn't do what I expected. The O'Reilly Java & XML book talks about the Load and Save module of the Java DOM.
See in particular the LSSerializer which looks very promising. You could either call writeToString(node) and chop off the beginning and end tags, as I suggested, or try to use LSSerializerFilter to not print the top node tags (not sure if that would work; I admit I've never used LSSerializer before.)
Reading the O'Reilly book seems to indicate doing something like this:
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS lsImpl =
(DOMImplementationLS)registry.getDOMImplementation("LS");
LSSerializer serializer = lsImpl.createLSSerializer();
String nodeString = serializer.writeToString(node);
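Instead of serializing the node itself and chopping off the outer tags, one variation is to serialize each child node and concatenate the results. Here is a minimal sketch of that idea using only the JDK; the InnerXml class and its innerXml helper are hypothetical names, and it assumes the DOM implementation supports the "LS" feature (the one bundled with the JDK does).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class InnerXml {

    // Serializes the children of the node, leaving out the node's own tags.
    // Assumes node is not the Document itself (getOwnerDocument() would be null).
    static String innerXml(Node node) {
        DOMImplementationLS ls = (DOMImplementationLS) node.getOwnerDocument()
                .getImplementation()
                .getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        // Suppress the <?xml ...?> declaration on every fragment.
        serializer.getDomConfig().setParameter("xml-declaration", false);

        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            sb.append(serializer.writeToString(children.item(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader("<td><b>this</b> is a <b>test</b></td>")));
        System.out.println(innerXml(doc.getDocumentElement()));
    }
}
```

This avoids any string surgery on the outer tags, at the cost of one writeToString call per child.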
node.getTextContent();
You ought to be using JDOM or dom4j to handle nodes, if for no other reason than to handle whitespace correctly.
To remove unnecessary tags, code like this can probably be used:
DOMConfiguration config = serializer.getDomConfig();
config.setParameter("canonical-form", true);
But it will not always work, because support for setting "canonical-form" to true is optional.