I have the following xml:
<config xmlns="http://www.someurl.com">
<product>
<brand>
<content />
</brand>
</product>
</config>
I'm reading it nicely into JDOM.
However, when I try to use Jaxen to grab the contents, I can't seem to get anything.
Here's an example of what doesn't seem to work:
XPath xpath = new JDOMXPath("config");
SimpleNamespaceContext namespaceContext = new SimpleNamespaceContext();
namespaceContext.addNamespace("", "http://www.someurl.com");
xpath.setNamespaceContext(namespaceContext);
assert xpath.selectNodes(document).size() > 0 : "should find more than 0";
This assertion always fails.
What am I doing wrong?
You have to assign a prefix. Make that call addNamespace("hopfrog", "http://...");
Then make the XPath ("hopfrog:config");
Keep in mind that the prefixes in XML aren't part of the real data model. The real data model assigns a URL, possibly blank, to each element and attribute. You can use any prefix you want in XPath so long as it's bound to the right URL. Since the URL you want it blank, you bind a prefix to 'blank'.
Related
I have this XML:
<Body xmlns:wsu="http://mynamespace">
<Ticket xmlns="http://othernamespace">
<Customer xlmns="">Robert</Customer>
<Products xmlns="">
<Product>a product</>
</Products>
</Ticket>
<Delivered xmlns="" />
<Payment xlmns="">cash</Payment>
</Body>
I am using Java to read it as a DOM document. I want remove the empty namespace attributes (i.e., xmlns=""). Is there any way to do that?
You need to understand that xmlns is a very special attribute. Basically, the xmlns="" is so that your Customer element is in the "unnamed" namespace, rather than the http://othernamespace namespace (and likewise for other elements which would otherwise inherit a default namespace from their ancestors).
If you want to get rid of the xmlns="", you basically need to put the elements into the appropriate namespace - so it's changing the element name. I don't think the W3C API lets you change the name of an element - you may well need to create a new element with the appropriate namespaced-name, and copy the content. Or if you're responsible for creating the document to start with, just use the right namespace.
Can you get the text() of a jxpath element or does it not work?
given some nice xml:
<?xml version="1.0" encoding="UTF-8"?>
<AXISWeb xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="AXISWeb.xsd">
<Action>
<Transaction>PingPOS</Transaction>
<PingPOS>
<PingStep>To POS</PingStep>
<PingDate>2012-11-15</PingDate>
<PingTime>16:35:57</PingTime>
</PingPOS>
<PingPOS>
<PingStep>POS.PROCESSOR18</PingStep>
<PingDate>2012-11-15</PingDate>
<PingTime>16:35:57</PingTime>
</PingPOS>
<PingPOS>
<PingStep>From POS</PingStep>
<PingDate>2012-11-15</PingDate>
<PingTime>16:35:57</PingTime>
</PingPOS>
</Action>
</AXISWeb>
//Does not work:
jxpc.getValue("/AXISWeb/Action/PingPOS[1]/PingStep/text()");
//Does not work:
jxpc.getValue("/action/pingPOS[1]/PingStep/text()");
//Does not work:
jxpc.getValue("/action/pingPOS[1]/PingStep[text()]");
I know I can get the text from using
jxpc.getValue("/action/pingPOS[1]/PingStep");
But that's not the point.
Shouldn't text() work? I could find no examples....
P.S. It's also very very picky about case and capitalization. Can you turn that off somehow?
Thanks,
-G
/AXISWeb/Action/PingPOS[1]/PingStep/text() is valid XPath for your document
But, from what I can see from the user guide of jxpath (note: I don't know jxpath at all), getValue() is already supposed to return the textual content of a node, so you don't need to use the XPath text() at all.
So you may use the following:
jxpc.getValue("/AXISWeb/Action/PingPOS[1]/PingStep");
Extracted from the user guide:
Consider the following XML document:
<?xml version="1.0" ?>
<address>
<street>Orchard Road</street>
</address>
With the same XPath, getValue("/address/street"), will return the string "Orchard Road", while
selectSingleNode("/address/street") - an object of type Element (DOM
or JDOM, depending on the type of parser used). The returned Element
is, of course, <street>Orchard Road</street>.
Now about case insensitive query on tag names, if you are using XPath 2 you can use lower-case() and node() but this is not really recommended, you may better use correct names.
/*[lower-case(node())='axisweb']/*[lower-case(node())='action']/...
or if using XPath 1, you may use translate() but it gets even worse:
/*[translate(node(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz') = 'axisweb']/*[translate(node(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz') = 'action']/...
All in all, try to ensure that you use correct query, you know it is case sensitive, so it's better to pay attention to it. As you would do in Java, foo and fOo are not the same variables.
Edit:
As I said, XML and thus XPath is case sensitive, so pingStep cannot match PingStep, use the correct name to find it.
Concerning text(), it is part of XPath 1.0, there is no need for XPath 2 to use it. The JXPath getValue() is already doing the call to text() for you. If you want to do it yourself you will have to use selectSingleNode("//whatever/text()") that will returns an Object of type TextElement (depending on the underlying parser).
So to sum up, the method JXPathContext.getValue() already does the work to select the node's text content for you, so you don't need to do it yourself and explicitly call XPath's text().
From a post that I've anserwed before the method .getTextContent() do the job for you.
No need to use "text()" when you evaluate the Xpath.
Example :
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("D:\\Loic_Workspace\\Test2\\res\\test.xml"));
System.out.println(doc.getElementsByTagName("retCode").item(0).getTextContent());
If not, you will get the tag and the value. If you want do more take a look at this
I'm having big problems with Xpath evaluation using Jaxen.
Here's part of XML i'm evaluating on:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2011-05-31T13:04:08+00:00</responseDate>
<request metadataPrefix="oai_dc" verb="ListRecords">http://citeseerx.ist.psu.edu/oai2</request>
<ListRecords>
<record>
<header>
<identifier>oai:CiteSeerXPSU:10.1.1.1.1484</identifier>
<datestamp>2009-05-24</datestamp>
</header>
<metadata>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Winner-Take-All..</dc:title>
<dc:relation>10.1.1.134.6077</dc:relation>
<dc:relation>10.1.1.65.2144</dc:relation>
<dc:relation>10.1.1.54.7277</dc:relation>
<dc:relation>10.1.1.48.5282</dc:relation>
</oai_dc:dc>
</metadata>
</record>
<resumptionToken>10.1.1.1.2041-1547151-500-oai_dc</resumptionToken>
</ListRecords>
</OAI-PMH>
I'm using Jaxen because in my use case it's much faster then Apache implementation. I'm using W3C DOM for XML representation.
I need to select all record arguments, and then on selected nodes evaluate other xpaths (it's needed because of my processing architecture).
I'm selecting all record nodes (this works):
/OAI-PMH/ListRecords/record
Then on every selected record node I'm evaluating other xpaths to get needed data:
Select identifier text value (this works):
header/identifier/text()
Select title text value (this does NOT work):
metadata/oai_dc:dc/dc:title/text()
I've registered namespaces prefixes with their URIs (oai_dc and dc). I also tried other xpaths but none of them work:
metadata/dc/title/text()
metadata//dc:title/text()
I've read other stackoverflow questions about xpaths, namespaces and solution to add prefix "oai" with URI "http://www.openarchives.org/OAI/2.0/". I tried adding that "oai:" prefix to nodes without defined prefix but as result I even didn't select record nodes. Any ideas what I'm doing wrong?
Solution:
Problem was about parser (thanks jasso). It wasn't set to be namespace aware - after changing that setting everything works fine, as expected.
I can't see how the XPath expression /OAI-PMH/ListRecords/record can possibly select anything, since your document does not have a {}OAI-PMH element, only a {http://www.openarchives.org/OAI/2.0/}OAI-PMH element. See http://jaxen.codehaus.org/faq.html
I've read XPath - how to select text and thought I had the general idea. But, as always, XPath rears up, hisses at me, and scuttles off to find the nearest bacteria-infested urinal to drown in.
I have a JPA orm.xml file. It looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<entity-mappings xmlns="http://java.sun.com/xml/ns/persistence/orm"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/persistence/orm
http://java.sun.com/xml/ns/persistence/orm_2_0.xsd"
version="2.0">
<persistence-unit-metadata>
<persistence-unit-defaults>
<schema>test</schema>
<catalog>test</catalog>
</persistence-unit-defaults>
</persistence-unit-metadata>
</entity-mappings>
The following XPath expression should, I would think, select the text from the <schema> element:
/entity-mappings/persistence-unit-metadata/persistence-unit-defaults/schema/text()
But using Java's XPath implementation, it does not.
More specifically, the following code fails (using JUnit asserts) on the last line. The value of the text variable is the empty string.
// Find the file: URL to the orm.xml I mentioned above.
final URL ormUrl = Thread.currentThread().getContextClassLoader().getResource("META-INF/orm.xml");
assertNotNull(ormUrl);
final XPathFactory xpf = XPathFactory.newInstance();
assertNotNull(xpf);
final XPath xpath = xpf.newXPath();
assertNotNull(xpath);
final XPathExpression expression = xpath.compile("/entity-mappings/persistence-unit-metadata/persistence-unit-defaults/schema/text()");
assertNotNull(expression);
final String text = expression.evaluate(new InputSource(ormUrl.openStream()));
assertEquals("test", text);
This seems to cast into doubt what little understanding I had of XPath expressions to begin with. Flailing around, I then wanted to see if a simple "/" would select the root element. Mercifully, this returned a non-null NodeList, but the NodeList was empty. I really don't want to hunt the authors of the Java XPath support down and string them up, but it's getting awfully difficult not to follow that course of action.
Please help me shoot XPath in the head once and for all. Thanks.
The problem is that the XML declares a default namespace
xmlns="http://java.sun.com/xml/ns/persistence/orm"
while in your XPath expression you have not provided a corresponding namespace context. See this link for details on how to work with namespace contexts. There's a lot of detail there, but in summary you have to write your own implementation of javax.xml.namespace.NamespaceContext that allows the XPath processor to map namespace prefixes to URIs. In your case you must provide a mapping for the default namespace to the appropriate URI.
I have this job in Talend that is supposed to retrieve a field and loop through it.
My big problem is that the code is looping through the XML fields but it's returning null.
Here is a sample of the XML:
<?xml version="1.0" encoding="ISO-8859-1"?>
<empresas>
<empresa>
<imoveis>
<imovel>
[-- some fields -- ]
<fotos>
<nome id="" order="">photo1</nome>
<nome id="" order=""></nome>
<nome id="" order=""></nome>
<nome id="" order=""></nome>
</fotos>
</imovel>
[ -- other entries here -- ]
</imoveis>
</empresa>
</empresas>
Now using the tExtractXMLField component I am trying to get the "fotos" element.
Here is what I have in the component:
I have tried to change the XPath query and the XPath loop query but the result is either I don't loop through the field or I get the null in the value field in the tMap.
Here is an image of the job:
You can see that I have retrieved 4 items from the XML but what I get is null in the "nome" field. There must be something wrong with the XPath but I can't seem to find the problem :(
Hope someone can help me out. Thanks
Notes: I am using talendv4.1.2 on ubuntu 10.10 64bit
If you want to loop on <nome> nodes your Loop XPath Query has to be
"/empresas/empresa/imoveis/imovel/fotos/nome"
and foto_nome XPath Query something like
"text()"
Take care: I also corrected an error in your XML that could bring issues (</imoveis> missing the "s").
There are two ways to go about it. One way is to use directly XMLinput and the instructions that bluish mentioned.
The other way is to continue on the path that you chose. In the XMLinput, make sure that your Loop XPath query is set to "/empresas/empresa/imoveis/imovel/fotos" and that you pass through the fotos element with the Get Nodes option checked. The XPath Query of your fotos element should be "../fotos" or ".".
Your extractXMLField component looks to be well configured.
Also, I don't know what tSetGlobalVar does in your design, but make sure it doesn't affect the fotos element that you're trying to pass through.
I have made a test job, this will help you definitely. If I'm not wrong you want to get all the "nome" under the "fotos" tag.
Try to change your loop xpath to the top level in the file, "empresas". Sometimes that works for me, also I have seem the "?xml version="1.0" encoding="ISO-8859-1"?" tag cause problems before, you could try to remove that.
Also make sure that the encoding is set correctly in the tFileInputXML.
I think you are confusing reading XML and extracting XML from XML.
Reading XML:
If the part of XML you have provided is the file readed by you tFileInputXML you don't need tExtractXMLField, just configure the tFileInputXML as this:
set the xpath loop to the <nome> elements, like this "//nome"
add 3 columns in the tFileInputXML component id, order and content
get content column with xpath query "."
get id value with xpath query "#id"
get order value with xpath query "#order"
Extracting XML from XML:
That is the goal of the tExtractXMLField component:
It allows to parse XML data contained in a database column or another XML document as if it was itself a data flow.
To put it in a nutshell, tExtractXMLField create a flow of data from a column record containing XML.
It is very useful when parsing soap query result: server reply is usually provided as xml, like this one:
<arg2>
<![CDATA[
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<exportInscriptionEnLigneType>
<date>2015-04-10</date>
<nbDossiers>2</nbDossiers>
<reference>20150410100</reference>
<listeDossiers>
<dossier>
<numOrdre>1</numOrdre>
<identifiantDossier>AAAAA</identifiantDossier>
</dossier>
<dossier>
<numOrdre>2</numOrdre>
<identifiantDossier>BBBBB</identifiantDossier>
</dossier>
</listeDossiers>
</exportInscriptionEnLigneType>
]]>
</arg2>
In XML above, arg2>element contains an XML document that you may need to parse.
tExtractXMLField has been created for this purpose.
I've written a tutorial on how to achieve this work, please have a look here "how to extract xml from xml". It is in french but screenshots may help understanding the few comments provided.
Hope it will help.
Best regards,