Talend tExtractXMLField

Talend tExtractXMLField - java

I have this job in Talend that is supposed to retrieve a field and loop through it.
My big problem is that the code is looping through the XML fields but it's returning null.
Here is a sample of the XML:
<?xml version="1.0" encoding="ISO-8859-1"?>
<empresas>
<empresa>
<imoveis>
<imovel>
[-- some fields -- ]
<fotos>
<nome id="" order="">photo1</nome>
<nome id="" order=""></nome>
<nome id="" order=""></nome>
<nome id="" order=""></nome>
</fotos>
</imovel>
[ -- other entries here -- ]
</imoveis>
</empresa>
</empresas>
Now using the tExtractXMLField component I am trying to get the "fotos" element.
Here is what I have in the component:
I have tried to change the XPath query and the XPath loop query but the result is either I don't loop through the field or I get the null in the value field in the tMap.
Here is an image of the job:
You can see that I have retrieved 4 items from the XML but what I get is null in the "nome" field. There must be something wrong with the XPath but I can't seem to find the problem :(
Hope someone can help me out. Thanks
Notes: I am using talendv4.1.2 on ubuntu 10.10 64bit

If you want to loop on <nome> nodes your Loop XPath Query has to be
"/empresas/empresa/imoveis/imovel/fotos/nome"
and foto_nome XPath Query something like
"text()"
Take care: I also corrected an error in your XML that could bring issues (</imoveis> missing the "s").

There are two ways to go about it. One way is to use directly XMLinput and the instructions that bluish mentioned.
The other way is to continue on the path that you chose. In the XMLinput, make sure that your Loop XPath query is set to "/empresas/empresa/imoveis/imovel/fotos" and that you pass through the fotos element with the Get Nodes option checked. The XPath Query of your fotos element should be "../fotos" or ".".
Your extractXMLField component looks to be well configured.
Also, I don't know what tSetGlobalVar does in your design, but make sure it doesn't affect the fotos element that you're trying to pass through.

I have made a test job, this will help you definitely. If I'm not wrong you want to get all the "nome" under the "fotos" tag.

Try to change your loop xpath to the top level in the file, "empresas". Sometimes that works for me, also I have seem the "?xml version="1.0" encoding="ISO-8859-1"?" tag cause problems before, you could try to remove that.
Also make sure that the encoding is set correctly in the tFileInputXML.

I think you are confusing reading XML and extracting XML from XML.
Reading XML:
If the part of XML you have provided is the file readed by you tFileInputXML you don't need tExtractXMLField, just configure the tFileInputXML as this:
set the xpath loop to the <nome> elements, like this "//nome"
add 3 columns in the tFileInputXML component id, order and content
get content column with xpath query "."
get id value with xpath query "#id"
get order value with xpath query "#order"
Extracting XML from XML:
That is the goal of the tExtractXMLField component:
It allows to parse XML data contained in a database column or another XML document as if it was itself a data flow.
To put it in a nutshell, tExtractXMLField create a flow of data from a column record containing XML.
It is very useful when parsing soap query result: server reply is usually provided as xml, like this one:
<arg2>
<![CDATA[
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<exportInscriptionEnLigneType>
<date>2015-04-10</date>
<nbDossiers>2</nbDossiers>
<reference>20150410100</reference>
<listeDossiers>
<dossier>
<numOrdre>1</numOrdre>
<identifiantDossier>AAAAA</identifiantDossier>
</dossier>
<dossier>
<numOrdre>2</numOrdre>
<identifiantDossier>BBBBB</identifiantDossier>
</dossier>
</listeDossiers>
</exportInscriptionEnLigneType>
]]>
</arg2>
In XML above, arg2>element contains an XML document that you may need to parse.
tExtractXMLField has been created for this purpose.
I've written a tutorial on how to achieve this work, please have a look here "how to extract xml from xml". It is in french but screenshots may help understanding the few comments provided.
Hope it will help.
Best regards,

Related

Jasper Reports Parse XML column-blob

I am using postgresql as database. In db i have one column which contains xml with language codes. I want to parse that xml and get value trough the report language.
select o.name,o.price from bookdefinations o This o.name contains that xml value:
<?xml version="1.0" encoding="UTF-8"?>
<values>
<en-us>en value</en-us>
<es>es value</es>
<ru>ru value</ru>
<tr>tr value</tr>
</values>
Can i parse this with jasper's expressions or can i parse while selecting from db (i dont have any idea how to get report language in select query and parse xml in select)

There is a xmldatasource for jasperreport, you can do a subreport and send the blob to that subreport as an xmldatasource, then you can parse the xml in the subreport and show the content, I'm not sure if you want to show the values or the entire xml, if it is a report I guess you should show the values in a pretty way I mean not as xml, check this link
http://jasperreports.sourceforge.net/sample.reference/xmldatasource/

You could do this directly in postgresql during the select.
PostgreSql has an XML Type and various XML functions.
If you take a look at those functions, you'll see the xpath function which can be used to extract data from XML types during a select statement.
For a reference to xPath, take a look at the tutorial from W3Schools.

xml parsing using sax

I want to parse XML file using SaxParser. I'm trying to fetch the data associated to a tag or its attributes. The XML is in the following format.
<con>
<fig>
<abc>
<name xyz="">
<id>2</id>
</name>
</abc>
</fig>
</con>
I tried with couple of example but not succeeded in the fetching the data, I am requesting you to provide me any suggestion or and working example to increase my knowledge on parsing using SAX.

Use a DOM parser as your xml is very small and its easy to get what you are asking for.

There are my example and help available online, Please Google your question before posting on the stack over flow from next time.
Please check the following links which has good example for saxParsing
How to read XML file in Java – (SAX Parser)
Getting Attributes And its Value
hope this help you.

XML parsing /JAVA

I have an XML, for example
<root>
<config x="xxx" y="yyyy" z="zzz" />
<properties>blah blah blah </properties>
<example>
<name>...</name>
<decr>...</descr>
</example>
<example>
<name>...</name>
<decr>...</descr>
</example>
</root>
and I need to get nodes config, and properties and all values in it.
Thank you

Xpath can fetch you the data in the config tag. You need to create an expression first like this
expression="//root/config/#x", to get value of x,y,z.
For properties, follow this thread :
Parsing XML with XPath in Java
Hope this helps

DOM,DOM4J,SAX..
if the size of XML file is small,you can use DOM or DOM4J,but the size is big , you use the SAX

If you directly want to query or fetch data XPath can help, but if you want the data as Java Objects so that you can use it further then use JAXB

You can use SAX parser to read the xml manipulate its event based parsing and consumes more memory.
If your xml is big and requires lot of manipulations then go-for DOM/DOM4j either is good. DOM4L is very latest. DOM is widely used in industry.
Based on your requirement go for good parser.
Thanks,
Pavan

how to use text() in jxpath

Can you get the text() of a jxpath element or does it not work?
given some nice xml:
<?xml version="1.0" encoding="UTF-8"?>
<AXISWeb xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="AXISWeb.xsd">
<Action>
<Transaction>PingPOS</Transaction>
<PingPOS>
<PingStep>To POS</PingStep>
<PingDate>2012-11-15</PingDate>
<PingTime>16:35:57</PingTime>
</PingPOS>
<PingPOS>
<PingStep>POS.PROCESSOR18</PingStep>
<PingDate>2012-11-15</PingDate>
<PingTime>16:35:57</PingTime>
</PingPOS>
<PingPOS>
<PingStep>From POS</PingStep>
<PingDate>2012-11-15</PingDate>
<PingTime>16:35:57</PingTime>
</PingPOS>
</Action>
</AXISWeb>
//Does not work:
jxpc.getValue("/AXISWeb/Action/PingPOS[1]/PingStep/text()");
//Does not work:
jxpc.getValue("/action/pingPOS[1]/PingStep/text()");
//Does not work:
jxpc.getValue("/action/pingPOS[1]/PingStep[text()]");
I know I can get the text from using
jxpc.getValue("/action/pingPOS[1]/PingStep");
But that's not the point.
Shouldn't text() work? I could find no examples....
P.S. It's also very very picky about case and capitalization. Can you turn that off somehow?
Thanks,
-G

/AXISWeb/Action/PingPOS[1]/PingStep/text() is valid XPath for your document
But, from what I can see from the user guide of jxpath (note: I don't know jxpath at all), getValue() is already supposed to return the textual content of a node, so you don't need to use the XPath text() at all.
So you may use the following:
jxpc.getValue("/AXISWeb/Action/PingPOS[1]/PingStep");
Extracted from the user guide:
Consider the following XML document:
<?xml version="1.0" ?>
<address>
<street>Orchard Road</street>
</address>
With the same XPath, getValue("/address/street"), will return the string "Orchard Road", while
selectSingleNode("/address/street") - an object of type Element (DOM
or JDOM, depending on the type of parser used). The returned Element
is, of course, <street>Orchard Road</street>.
Now about case insensitive query on tag names, if you are using XPath 2 you can use lower-case() and node() but this is not really recommended, you may better use correct names.
/*[lower-case(node())='axisweb']/*[lower-case(node())='action']/...
or if using XPath 1, you may use translate() but it gets even worse:
/*[translate(node(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz') = 'axisweb']/*[translate(node(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz') = 'action']/...
All in all, try to ensure that you use correct query, you know it is case sensitive, so it's better to pay attention to it. As you would do in Java, foo and fOo are not the same variables.
Edit:
As I said, XML and thus XPath is case sensitive, so pingStep cannot match PingStep, use the correct name to find it.
Concerning text(), it is part of XPath 1.0, there is no need for XPath 2 to use it. The JXPath getValue() is already doing the call to text() for you. If you want to do it yourself you will have to use selectSingleNode("//whatever/text()") that will returns an Object of type TextElement (depending on the underlying parser).
So to sum up, the method JXPathContext.getValue() already does the work to select the node's text content for you, so you don't need to do it yourself and explicitly call XPath's text().

From a post that I've anserwed before the method .getTextContent() do the job for you.
No need to use "text()" when you evaluate the Xpath.
Example :
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("D:\\Loic_Workspace\\Test2\\res\\test.xml"));
System.out.println(doc.getElementsByTagName("retCode").item(0).getTextContent());
If not, you will get the tag and the value. If you want do more take a look at this

OAI Jaxen XPath problem

I'm having big problems with Xpath evaluation using Jaxen.
Here's part of XML i'm evaluating on:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2011-05-31T13:04:08+00:00</responseDate>
<request metadataPrefix="oai_dc" verb="ListRecords">http://citeseerx.ist.psu.edu/oai2</request>
<ListRecords>
<record>
<header>
<identifier>oai:CiteSeerXPSU:10.1.1.1.1484</identifier>
<datestamp>2009-05-24</datestamp>
</header>
<metadata>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Winner-Take-All..</dc:title>
<dc:relation>10.1.1.134.6077</dc:relation>
<dc:relation>10.1.1.65.2144</dc:relation>
<dc:relation>10.1.1.54.7277</dc:relation>
<dc:relation>10.1.1.48.5282</dc:relation>
</oai_dc:dc>
</metadata>
</record>
<resumptionToken>10.1.1.1.2041-1547151-500-oai_dc</resumptionToken>
</ListRecords>
</OAI-PMH>
I'm using Jaxen because in my use case it's much faster then Apache implementation. I'm using W3C DOM for XML representation.
I need to select all record arguments, and then on selected nodes evaluate other xpaths (it's needed because of my processing architecture).
I'm selecting all record nodes (this works):
/OAI-PMH/ListRecords/record
Then on every selected record node I'm evaluating other xpaths to get needed data:
Select identifier text value (this works):
header/identifier/text()
Select title text value (this does NOT work):
metadata/oai_dc:dc/dc:title/text()
I've registered namespaces prefixes with their URIs (oai_dc and dc). I also tried other xpaths but none of them work:
metadata/dc/title/text()
metadata//dc:title/text()
I've read other stackoverflow questions about xpaths, namespaces and solution to add prefix "oai" with URI "http://www.openarchives.org/OAI/2.0/". I tried adding that "oai:" prefix to nodes without defined prefix but as result I even didn't select record nodes. Any ideas what I'm doing wrong?
Solution:
Problem was about parser (thanks jasso). It wasn't set to be namespace aware - after changing that setting everything works fine, as expected.

I can't see how the XPath expression /OAI-PMH/ListRecords/record can possibly select anything, since your document does not have a {}OAI-PMH element, only a {http://www.openarchives.org/OAI/2.0/}OAI-PMH element. See http://jaxen.codehaus.org/faq.html

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Talend tExtractXMLField - java

If you want to loop on <nome> nodes your Loop XPath Query has to be "/empresas/empresa/imoveis/imovel/fotos/nome" and foto_nome XPath Query something like "text()" Take care: I also corrected an error in your XML that could bring issues (</imoveis> missing the "s").

I have made a test job, this will help you definitely. If I'm not wrong you want to get all the "nome" under the "fotos" tag.

Try to change your loop xpath to the top level in the file, "empresas". Sometimes that works for me, also I have seem the "?xml version="1.0" encoding="ISO-8859-1"?" tag cause problems before, you could try to remove that. Also make sure that the encoding is set correctly in the tFileInputXML.

Related

Jasper Reports Parse XML column-blob

xml parsing using sax

XML parsing /JAVA

how to use text() in jxpath

OAI Jaxen XPath problem

Categories

Resources