I have the following XML structure
<CodeSnippet>
<Code id="code1">
<Tags>button java</Tags>
<Snippet>sample code</Snippet>
</Code>
<Code id="code2">
<Tags>eclipse jbutton java</Tags>
<Snippet>sample code</Snippet>
</Code>
<.....>
</CodeSnippet>
Now I want to retrieve all the Snippet elements from the above XML when I search by tag. For instance, if I search for "java", then every Code node whose Tags contain "java" must return its Snippet.
My search query is:
//Code/Tags[contains(concat(' ',/text(),' '), ' "+ searchTags[0] +" ')]";
Here, searchTags[0] contains "java".
My result set should contain the Snippets of the selected nodes i.e. code1 and code2 from above xml structure.
Try this expression:
//Code/Tags[contains(., 'java')]/../Snippet
For retrieving all the "Tags" containing "java", I used the below xpath expression,
//Code/Tags/text()[contains(., 'java')]
For retrieving all the "Snippet" related to the tags "java", I used below expression
//Code/Tags[contains(./text(), 'java')]/parent::Code/Snippet/text()
Thanks to @dfsq for helping me out with his expression. Thanks a lot.
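One caveat with a plain contains(., 'java') is that it also matches substrings, so a search for "button" would hit "jbutton" as well. Here is a minimal sketch of the space-padding approach the question was aiming at, using the JDK's javax.xml.xpath. The class name, method name, and snippet texts are made up for illustration, and it assumes the search tag contains no quote characters.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SnippetSearch {
    // Hypothetical sample data modelled on the structure in the question.
    static final String XML =
        "<CodeSnippet>"
        + "<Code id=\"code1\"><Tags>button java</Tags><Snippet>first snippet</Snippet></Code>"
        + "<Code id=\"code2\"><Tags>eclipse jbutton java</Tags><Snippet>second snippet</Snippet></Code>"
        + "</CodeSnippet>";

    /** Returns the Snippet text of every Code whose Tags contain the whole word tag. */
    static List<String> snippetsFor(String tag) {
        // Assumption: tag contains no quote characters, because it is spliced into the expression.
        // Padding both sides with spaces turns the substring test into a whole-word test.
        String expr = "//Code[contains(concat(' ', normalize-space(Tags), ' '), ' "
                + tag + " ')]/Snippet";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(snippetsFor("java"));
        System.out.println(snippetsFor("button"));
    }
}
```

With this sample data, searching for "java" returns both snippets, while "button" returns only the first one because "jbutton" is not a whole-word match.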
You can write:
//Code[contains(Tags, 'java')]/Snippet
if you are using XPath.
Can you get the text() of a jxpath element or does it not work?
given some nice xml:
<?xml version="1.0" encoding="UTF-8"?>
<AXISWeb xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="AXISWeb.xsd">
<Action>
<Transaction>PingPOS</Transaction>
<PingPOS>
<PingStep>To POS</PingStep>
<PingDate>2012-11-15</PingDate>
<PingTime>16:35:57</PingTime>
</PingPOS>
<PingPOS>
<PingStep>POS.PROCESSOR18</PingStep>
<PingDate>2012-11-15</PingDate>
<PingTime>16:35:57</PingTime>
</PingPOS>
<PingPOS>
<PingStep>From POS</PingStep>
<PingDate>2012-11-15</PingDate>
<PingTime>16:35:57</PingTime>
</PingPOS>
</Action>
</AXISWeb>
//Does not work:
jxpc.getValue("/AXISWeb/Action/PingPOS[1]/PingStep/text()");
//Does not work:
jxpc.getValue("/action/pingPOS[1]/PingStep/text()");
//Does not work:
jxpc.getValue("/action/pingPOS[1]/PingStep[text()]");
I know I can get the text from using
jxpc.getValue("/action/pingPOS[1]/PingStep");
But that's not the point.
Shouldn't text() work? I could find no examples....
P.S. It's also very very picky about case and capitalization. Can you turn that off somehow?
Thanks,
-G
/AXISWeb/Action/PingPOS[1]/PingStep/text() is valid XPath for your document
But, from what I can see from the user guide of jxpath (note: I don't know jxpath at all), getValue() is already supposed to return the textual content of a node, so you don't need to use the XPath text() at all.
So you may use the following:
jxpc.getValue("/AXISWeb/Action/PingPOS[1]/PingStep");
Extracted from the user guide:
Consider the following XML document:
<?xml version="1.0" ?>
<address>
<street>Orchard Road</street>
</address>
With the same XPath, getValue("/address/street"), will return the string "Orchard Road", while
selectSingleNode("/address/street") - an object of type Element (DOM
or JDOM, depending on the type of parser used). The returned Element
is, of course, <street>Orchard Road</street>.
As for case-insensitive matching of tag names: if you are using XPath 2 you can compare against lower-case(name()), but this is not really recommended; you are better off using the correct names.
/*[lower-case(name())='axisweb']/*[lower-case(name())='action']/...
Or, if you are using XPath 1, you can use translate(), but it gets even worse:
/*[translate(name(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz') = 'axisweb']/*[translate(name(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz') = 'action']/...
All in all, try to make sure your query uses the correct names. You know it is case sensitive, so it's better to pay attention to it, just as in Java, where foo and fOo are not the same variable.
Edit:
As I said, XML, and thus XPath, is case sensitive, so pingStep cannot match PingStep; use the correct name to find it.
Concerning text(): it is part of XPath 1.0, so you don't need XPath 2 to use it. JXPath's getValue() already does the equivalent of text() for you. If you want to do it yourself, you will have to use selectSingleNode("//whatever/text()"), which returns an Object representing the text node (its exact type depends on the underlying parser).
So to sum up, the method JXPathContext.getValue() already does the work to select the node's text content for you, so you don't need to do it yourself and explicitly call XPath's text().
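To illustrate the points above without depending on JXPath, here is a small sketch using the JDK's own javax.xml.xpath (XPath 1.0), which is an assumption on my side since I cannot test against JXPath itself. It shows that asking for the element and asking for its text() give the same string, that a wrongly cased path matches nothing, and that the translate(name(), ...) workaround does match. The class and method names are made up, and the document is a shortened copy of the one in the question.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class PingStepLookup {
    // Shortened copy of the document from the question.
    static final String XML =
        "<AXISWeb><Action><Transaction>PingPOS</Transaction>"
        + "<PingPOS><PingStep>To POS</PingStep></PingPOS>"
        + "<PingPOS><PingStep>POS.PROCESSOR18</PingStep></PingPOS>"
        + "</Action></AXISWeb>";

    // XPath 1.0 has no lower-case(), so the element name is folded with translate().
    static final String LOWER_NAME =
        "translate(name(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')";

    /** Evaluates expr against XML and returns the string value of the result. */
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Element and text() node yield the same string value.
        System.out.println(eval("/AXISWeb/Action/PingPOS[1]/PingStep"));
        System.out.println(eval("/AXISWeb/Action/PingPOS[1]/PingStep/text()"));
        // Wrong case selects nothing, so the string value is empty.
        System.out.println("[" + eval("/action/pingPOS[1]/PingStep") + "]");
        // Case-insensitive match on the first two steps via translate().
        System.out.println(eval("/*[" + LOWER_NAME + "='axisweb']/*[" + LOWER_NAME
                + "='action']/PingPOS[2]/PingStep"));
    }
}
```

The translate() variant works, but as noted it is hard to read; getting the names right is the simpler fix.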
As in a post that I've answered before, the method .getTextContent() does the job for you.
No need to use "text()" when you evaluate the Xpath.
Example :
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("D:\\Loic_Workspace\\Test2\\res\\test.xml"));
System.out.println(doc.getElementsByTagName("retCode").item(0).getTextContent());
Otherwise, you will get the tag and the value. If you want to do more, take a look at this
I am having some troubles with parsing an XML file.
The Problem:
<verification appearance="4">
content="<myTag>test<myTag>/images/titleIcon.png"
</verification>
For parsing I used the following:
DocumentBuilder db;
db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
this.doc = db.parse()
If I access the content with [...]getChildNodes().item(1).getTextContent(),
it returns the value without the tags.
I assume the problem has something to do with db.parse(). More specifically, that it parses <myTag> as a node or something like that.
How can I get the full text content as a String (including tags etc.)?
Is there a way to tell the parser (if that's the problem) to ignore all content that is within two tags?
I have already googled a lot, but solutions like using &lt; for < aren't what I'm looking for.
To do this, the XML would have to look like this:
<verification appearance="4">
<![CDATA[
content="<myTag>test<myTag>/images/titleIcon.png"
]]>
</verification>
Then the parser will work as you want it to work.
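As a quick check of the difference, here is a minimal sketch with the JDK DOM parser. The class and method names are made up, and the non-CDATA variant uses a properly closed </myTag>, because the markup as posted would not be well-formed outside a CDATA section.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CdataContent {
    // The content wrapped in CDATA: the parser treats everything inside as plain text.
    static final String WITH_CDATA =
        "<verification appearance=\"4\">\n<![CDATA[\n"
        + "content=\"<myTag>test<myTag>/images/titleIcon.png\"\n"
        + "]]>\n</verification>";

    // Assumed well-formed variant without CDATA: <myTag> becomes a child element.
    static final String WITHOUT_CDATA =
        "<verification appearance=\"4\">"
        + "content=\"<myTag>test</myTag>/images/titleIcon.png\""
        + "</verification>";

    /** Parses xml and returns the trimmed text content of the root element. */
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf(WITH_CDATA));    // tags are kept as text
        System.out.println(textOf(WITHOUT_CDATA)); // tags are gone, only text remains
    }
}
```

With CDATA the tags survive as literal text; without it, getTextContent() concatenates the text nodes and the markup disappears.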
I have these XML files:
master.xml (which uses XInclude to include child1.xml and child2.xml)
child1.xml
child2.xml
Both child1.xml and child2.xml contain a <section> element with some text.
In the XSLT transformation, I want to add the name of the file each <section> element came from, so I get something like:
<section srcFile="child1.xml">Text from child 1.</section>
<section srcFile="child2.xml">Text from child 2.</section>
How do I retrieve the values child1.xml and child2.xml?
Unless you turn off that feature, all XInclude processors should add an @xml:base attribute
with the URL of the included file. So you don't have to do anything; it should already be:
<section xml:base="child1.xml">Text from child 1.</section>
<section xml:base="child2.xml">Text from child 2.</section>
(If you want, you can use XSLT to transform the @xml:base attribute into @srcFile.)
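If you want to see this for yourself outside XSLT, here is a rough sketch using the JDK's built-in parser with XInclude switched on. It is only an illustration under assumptions: the file contents and the <doc> wrapper element are made up, the files are written to a temporary directory, and the exact form of the xml:base value (relative or absolute) may differ between processors.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class XIncludeBase {
    private static void write(Path file, String content) throws Exception {
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds master/child files, resolves the includes, and returns each section's xml:base. */
    static List<String> sectionBases() {
        try {
            // Hypothetical files standing in for master.xml, child1.xml and child2.xml.
            Path dir = Files.createTempDirectory("xinclude-demo");
            write(dir.resolve("child1.xml"), "<section>Text from child 1.</section>");
            write(dir.resolve("child2.xml"), "<section>Text from child 2.</section>");
            write(dir.resolve("master.xml"),
                "<doc xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
                + "<xi:include href=\"child1.xml\"/>"
                + "<xi:include href=\"child2.xml\"/>"
                + "</doc>");

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XInclude processing needs namespaces
            factory.setXIncludeAware(true);
            Document doc = factory.newDocumentBuilder().parse(dir.resolve("master.xml").toFile());

            List<String> bases = new ArrayList<>();
            NodeList sections = doc.getElementsByTagName("section");
            for (int i = 0; i < sections.getLength(); i++) {
                Element section = (Element) sections.item(i);
                // xml:base lives in the reserved XML namespace.
                bases.add(section.getAttributeNS(XMLConstants.XML_NS_URI, "base"));
            }
            return bases;
        } catch (Exception e) {
            throw new IllegalStateException("XInclude demo failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sectionBases());
    }
}
```

In an XSLT stylesheet the same value is available on each included element as @xml:base, which is what you would copy into srcFile.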
I'm 99% sure that once xi:include has been processed, you have a single document (and single infoset) that won't let you determine which URL any given part of the document came from.
I think you will need to place that information directly in the individual included files. Having said that, you can still give document-uri a try, but I think all nodes will return the same URI.
I am reading an XML using dom4j by using XPath techniques for selecting desired nodes. Consider that my XML looks like this:
<Emp_Dir>
<Emp_Classification type ="Permanent" >
<Emp id= "1">
<name>jame</name>
<Emp_Bio>
<age>12</age>
<height>5.4</height>
<weight>78</weight>
</Emp_Bio>
<Emp_Details>
<salary>2000</salary>
<designation>developer</designation>
</Emp_Details>
</Emp>
<Emp id= "2">
<name>jame</name>
<Emp_Bio>
<age>12</age>
<height>5.4</height>
<weight>78</weight>
</Emp_Bio>
<Emp_Details>
<salary>2000</salary>
<designation>developer</designation>
</Emp_Details>
</Emp>
</Emp_Classification>
<Emp_Classification type ="Contract" >
.
.
.
</Emp_Classification>
<Emp_Classification type ="PartTime" >
.
.
.
</Emp_Classification>
</Emp_Dir>
Note: The above XML might look ugly to you, but I only created this dummy file for the sake of understanding and to keep my project confidential.
When I specify a simple XPath expression, like:
//Emp_Classification (or)
/Emp_Dir/Emp_Classification
then it works fine, but when I specify a more complex expression like:
/Emp_Dir/Emp_Classification/[@type='Permanent'] (or)
//Emp_Dir/Emp_Classification/[@type='Permanent']
then it gives me the following error:
"Invalid XPath expression: /Emp_Dir/Emp_Classification/[@type='Permanent'] Expected one of '.', '..', '@', '*', <QName>"
Could anybody tell me what is wrong with my XPath?
My second question is: how do I select the Emp_Bio node of Permanent employees only? Does this work?
//Emp_Dir/Emp_Classification/[@type='Permanent']/Emp/Emp_Bio
Use: //Emp_Dir/Emp_Classification[@type='Permanent']
(note the removal of the / before the predicate)
And then use this: //Emp_Dir/Emp_Classification[@type='Permanent']/Emp/Emp_Bio for the latter part of the question.
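The reason is that a predicate in square brackets belongs to the step it follows, so there must be no / between the element name and the [. Here is a small sketch that checks this with the JDK's javax.xml.xpath rather than dom4j (an assumption on my part; the XPath rules are the same, but the error message will differ). The class and method names are made up, and the XML is a cut-down version of the sample in the question with a hypothetical Contract employee added.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PermanentEmployees {
    // Cut-down sample; the Contract employee is hypothetical.
    static final String XML =
        "<Emp_Dir>"
        + "<Emp_Classification type=\"Permanent\">"
        + "<Emp id=\"1\"><Emp_Bio><age>12</age></Emp_Bio></Emp>"
        + "<Emp id=\"2\"><Emp_Bio><age>12</age></Emp_Bio></Emp>"
        + "</Emp_Classification>"
        + "<Emp_Classification type=\"Contract\">"
        + "<Emp id=\"3\"><Emp_Bio><age>30</age></Emp_Bio></Emp>"
        + "</Emp_Classification>"
        + "</Emp_Dir>";

    /** Returns true if expr compiles as an XPath expression. */
    static boolean isValid(String expr) {
        try {
            XPathFactory.newInstance().newXPath().compile(expr);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }

    /** Returns the number of nodes selected by expr in the sample document. */
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("/Emp_Dir/Emp_Classification/[@type='Permanent']"));
        System.out.println(count("/Emp_Dir/Emp_Classification[@type='Permanent']"));
        System.out.println(count("//Emp_Dir/Emp_Classification[@type='Permanent']/Emp/Emp_Bio"));
    }
}
```

The slash-before-bracket form is rejected at compile time, while the corrected expressions select the one Permanent classification and only the Emp_Bio nodes under it.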
I have this job in Talend that is supposed to retrieve a field and loop through it.
My big problem is that the code is looping through the XML fields but it's returning null.
Here is a sample of the XML:
<?xml version="1.0" encoding="ISO-8859-1"?>
<empresas>
<empresa>
<imoveis>
<imovel>
[-- some fields -- ]
<fotos>
<nome id="" order="">photo1</nome>
<nome id="" order=""></nome>
<nome id="" order=""></nome>
<nome id="" order=""></nome>
</fotos>
</imovel>
[ -- other entries here -- ]
</imoveis>
</empresa>
</empresas>
Now using the tExtractXMLField component I am trying to get the "fotos" element.
Here is what I have in the component:
I have tried to change the XPath query and the XPath loop query but the result is either I don't loop through the field or I get the null in the value field in the tMap.
Here is an image of the job:
You can see that I have retrieved 4 items from the XML but what I get is null in the "nome" field. There must be something wrong with the XPath but I can't seem to find the problem :(
Hope someone can help me out. Thanks
Note: I am using Talend v4.1.2 on Ubuntu 10.10 64-bit
If you want to loop on <nome> nodes your Loop XPath Query has to be
"/empresas/empresa/imoveis/imovel/fotos/nome"
and foto_nome XPath Query something like
"text()"
Take care: I also corrected an error in your XML that could bring issues (</imoveis> missing the "s").
There are two ways to go about it. One way is to use directly XMLinput and the instructions that bluish mentioned.
The other way is to continue on the path that you chose. In the XMLinput, make sure that your Loop XPath query is set to "/empresas/empresa/imoveis/imovel/fotos" and that you pass through the fotos element with the Get Nodes option checked. The XPath Query of your fotos element should be "../fotos" or ".".
Your extractXMLField component looks to be well configured.
Also, I don't know what tSetGlobalVar does in your design, but make sure it doesn't affect the fotos element that you're trying to pass through.
I have made a test job, this will help you definitely. If I'm not wrong you want to get all the "nome" under the "fotos" tag.
Try changing your loop XPath to the top level in the file, "empresas". Sometimes that works for me. I have also seen the <?xml version="1.0" encoding="ISO-8859-1"?> declaration cause problems before; you could try removing it.
Also make sure that the encoding is set correctly in the tFileInputXML.
I think you are confusing reading XML and extracting XML from XML.
Reading XML:
If the XML you have provided is the file read by your tFileInputXML, you don't need tExtractXMLField; just configure the tFileInputXML like this:
set the xpath loop to the <nome> elements, like this "//nome"
add 3 columns in the tFileInputXML component id, order and content
get content column with xpath query "."
get id value with xpath query "@id"
get order value with xpath query "@order"
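Outside Talend, the same loop-plus-relative-query idea can be sketched in plain Java with javax.xml.xpath: one expression selects the nodes to loop over, and each column is read with a query relative to the current node. This is only an analogy to how the component is configured, not Talend's implementation. The class name is made up, and the id/order values are hypothetical because they are empty in the question's sample.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NomeLoop {
    // Hypothetical id/order values; the structure follows the question.
    static final String XML =
        "<empresas><empresa><imoveis><imovel><fotos>"
        + "<nome id=\"1\" order=\"1\">photo1</nome>"
        + "<nome id=\"2\" order=\"2\">photo2</nome>"
        + "</fotos></imovel></imoveis></empresa></empresas>";

    /** Returns one "id|order|content" row per nome element. */
    static List<String> rows() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The "loop" query: one iteration per matching node.
            NodeList loop = (NodeList) xpath.evaluate(
                    "/empresas/empresa/imoveis/imovel/fotos/nome", doc, XPathConstants.NODESET);
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < loop.getLength(); i++) {
                Node nome = loop.item(i);
                // Column queries are evaluated relative to the current loop node.
                String id = xpath.evaluate("@id", nome);
                String order = xpath.evaluate("@order", nome);
                String content = xpath.evaluate(".", nome);
                rows.add(id + "|" + order + "|" + content);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("XPath loop failed", e);
        }
    }

    public static void main(String[] args) {
        rows().forEach(System.out::println);
    }
}
```

If the loop query stops one level too high (at fotos, say), a relative query such as "@id" has nothing to match on the current node, which is one way to end up with empty values.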
Extracting XML from XML:
That is the goal of the tExtractXMLField component:
It lets you parse XML data contained in a database column or in another XML document as if it were itself a data flow.
In a nutshell, tExtractXMLField creates a flow of data from a column containing XML.
It is very useful when parsing the result of a SOAP query: the server reply is usually provided as XML, like this one:
<arg2>
<![CDATA[
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<exportInscriptionEnLigneType>
<date>2015-04-10</date>
<nbDossiers>2</nbDossiers>
<reference>20150410100</reference>
<listeDossiers>
<dossier>
<numOrdre>1</numOrdre>
<identifiantDossier>AAAAA</identifiantDossier>
</dossier>
<dossier>
<numOrdre>2</numOrdre>
<identifiantDossier>BBBBB</identifiantDossier>
</dossier>
</listeDossiers>
</exportInscriptionEnLigneType>
]]>
</arg2>
In the XML above, the <arg2> element contains an XML document that you may need to parse.
tExtractXMLField has been created for this purpose.
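For comparison, here is roughly what that two-stage extraction looks like when done by hand in Java. This is a sketch of the idea, not of how Talend does it internally; the class and method names are made up. Note the trim(): the embedded document starts on a new line inside the CDATA section, and an XML declaration is only allowed at the very start of the input.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class EmbeddedXml {
    // Shortened version of the SOAP reply shown above.
    static final String REPLY =
        "<arg2>\n<![CDATA[\n"
        + "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
        + "<exportInscriptionEnLigneType>\n"
        + "<nbDossiers>2</nbDossiers>\n"
        + "<listeDossiers>\n"
        + "<dossier><numOrdre>1</numOrdre><identifiantDossier>AAAAA</identifiantDossier></dossier>\n"
        + "<dossier><numOrdre>2</numOrdre><identifiantDossier>BBBBB</identifiantDossier></dossier>\n"
        + "</listeDossiers>\n"
        + "</exportInscriptionEnLigneType>\n"
        + "]]>\n</arg2>";

    /** Extracts the embedded document from arg2 and returns every identifiantDossier. */
    static List<String> dossierIds() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Stage 1: parse the outer document and pull the CDATA text out of <arg2>.
            Document outer = builder.parse(new InputSource(new StringReader(REPLY)));
            String embedded = outer.getDocumentElement().getTextContent().trim();
            // Stage 2: parse the extracted text as a document of its own.
            Document inner = builder.parse(new InputSource(new StringReader(embedded)));
            NodeList ids = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "/exportInscriptionEnLigneType/listeDossiers/dossier/identifiantDossier",
                    inner, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < ids.getLength(); i++) {
                result.add(ids.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract embedded XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dossierIds());
    }
}
```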
I've written a tutorial on how to do this; please have a look at "how to extract xml from xml". It is in French, but the screenshots may help you understand the few comments provided.
Hope it will help.
Best regards,