OAI Jaxen XPath problem

I'm having big problems with XPath evaluation using Jaxen.
Here's part of the XML I'm evaluating:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2011-05-31T13:04:08+00:00</responseDate>
<request metadataPrefix="oai_dc" verb="ListRecords">http://citeseerx.ist.psu.edu/oai2</request>
<ListRecords>
<record>
<header>
<identifier>oai:CiteSeerXPSU:10.1.1.1.1484</identifier>
<datestamp>2009-05-24</datestamp>
</header>
<metadata>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Winner-Take-All..</dc:title>
<dc:relation>10.1.1.134.6077</dc:relation>
<dc:relation>10.1.1.65.2144</dc:relation>
<dc:relation>10.1.1.54.7277</dc:relation>
<dc:relation>10.1.1.48.5282</dc:relation>
</oai_dc:dc>
</metadata>
</record>
<resumptionToken>10.1.1.1.2041-1547151-500-oai_dc</resumptionToken>
</ListRecords>
</OAI-PMH>
I'm using Jaxen because in my use case it's much faster than the Apache implementation. I'm using W3C DOM for the XML representation.
I need to select all record elements and then evaluate other XPaths on the selected nodes (my processing architecture requires this).
I'm selecting all record nodes (this works):
/OAI-PMH/ListRecords/record
Then on every selected record node I'm evaluating other XPaths to get the needed data:
Select the identifier text value (this works):
header/identifier/text()
Select the title text value (this does NOT work):
metadata/oai_dc:dc/dc:title/text()
I've registered the namespace prefixes with their URIs (oai_dc and dc). I also tried other XPaths, but none of them work:
metadata/dc/title/text()
metadata//dc:title/text()
I've read other Stack Overflow questions about XPath and namespaces, and the suggested solution of adding the prefix "oai" with the URI "http://www.openarchives.org/OAI/2.0/". I tried adding that "oai:" prefix to the nodes without a defined prefix, but then I didn't even select the record nodes. Any ideas what I'm doing wrong?
Solution:
The problem was the parser (thanks, jasso). It wasn't set to be namespace aware; after changing that setting, everything works as expected.
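For anyone hitting the same wall, here is a minimal sketch of the fix. It swaps in the JDK's built-in javax.xml.xpath API instead of Jaxen so it needs no extra dependency, but the two ingredients are the same ones the solution describes: a namespace-aware parser, plus prefix-to-URI bindings for every namespaced step, including the document's default OAI namespace. The class name OaiTitles and the "oai" prefix are illustrative choices; only the URIs have to match the document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class OaiTitles {

    // Prefix -> URI bindings used only by the XPath expressions. "oai" is our own
    // name for the document's default namespace; prefixes need not match the XML.
    private static final Map<String, String> NS = Map.of(
            "oai", "http://www.openarchives.org/OAI/2.0/",
            "oai_dc", "http://www.openarchives.org/OAI/2.0/oai_dc/",
            "dc", "http://purl.org/dc/elements/1.1/");

    private static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return NS.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return null; // reverse lookup is not needed to evaluate expressions
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return Collections.emptyIterator();
        }
    };

    /** Selects every record, then evaluates a relative XPath on each one. */
    public static List<String> titles(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // the fix: DocumentBuilderFactory defaults to false
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            XPath xp = XPathFactory.newInstance().newXPath();
            xp.setNamespaceContext(CONTEXT);

            // Unprefixed elements inherit the default OAI namespace, so every step needs "oai:".
            NodeList records = (NodeList) xp.evaluate(
                    "/oai:OAI-PMH/oai:ListRecords/oai:record", doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                // The record node is the context node for this relative expression.
                result.add(xp.evaluate("oai:metadata/oai_dc:dc/dc:title/text()", records.item(i)));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate the OAI-PMH response", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<OAI-PMH xmlns='http://www.openarchives.org/OAI/2.0/'><ListRecords><record>"
                + "<metadata><oai_dc:dc xmlns:dc='http://purl.org/dc/elements/1.1/'"
                + " xmlns:oai_dc='http://www.openarchives.org/OAI/2.0/oai_dc/'>"
                + "<dc:title>Winner-Take-All..</dc:title></oai_dc:dc></metadata>"
                + "</record></ListRecords></OAI-PMH>";
        System.out.println(titles(xml)); // [Winner-Take-All..]
    }
}
```

With Jaxen itself the idea should carry over: register all three prefixes, including one for the OAI default namespace, on the XPath object, and use it on every unprefixed step.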

I can't see how the XPath expression /OAI-PMH/ListRecords/record can possibly select anything, since your document does not have a {}OAI-PMH element, only a {http://www.openarchives.org/OAI/2.0/}OAI-PMH element. See http://jaxen.codehaus.org/faq.html
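The answer's point can be checked directly with the JDK's own DOM and XPath APIs (used here in place of Jaxen). This sketch parses with namespace awareness on, prints the root element's expanded name in the same {uri}local notation, and counts matches for the unprefixed and the prefixed path. The class name ExpandedNames and the "oai" prefix are illustrative choices.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ExpandedNames {

    static final String OAI = "http://www.openarchives.org/OAI/2.0/";

    /** Parses with namespace processing switched on. */
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Formats an element name as {namespace-uri}local-name; no namespace gives {}. */
    public static String expandedName(Element e) {
        String uri = e.getNamespaceURI();
        return "{" + (uri == null ? "" : uri) + "}" + e.getLocalName();
    }

    /** Counts matches, with our own prefix "oai" bound to the OAI-PMH namespace. */
    public static int count(Document doc, String expression) {
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "oai".equals(prefix) ? OAI : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                return Collections.emptyIterator();
            }
        });
        try {
            return ((NodeList) xp.evaluate(expression, doc, XPathConstants.NODESET)).getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<OAI-PMH xmlns='" + OAI + "'><ListRecords/></OAI-PMH>");
        System.out.println(expandedName(doc.getDocumentElement())); // {http://www.openarchives.org/OAI/2.0/}OAI-PMH
        System.out.println(count(doc, "/OAI-PMH"));     // 0: this asks for {}OAI-PMH
        System.out.println(count(doc, "/oai:OAI-PMH")); // 1
    }
}
```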

Related

Missing NameSpace Information In XML file using EXIficient

I am using EXIficient to convert XML data to EXI and back to XML. I use its EXIficientDemo class. Sample code:
EXIficientDemo sample = new EXIficientDemo();
sample.parseAndProofFileLocations("FilePath");
sample.codeSchemaLess();
It first converts the XML file to EXI and then back to XML. When it generates XML from the previously generated EXI file, it loses some namespace information.
Actual XML file:
<?xml version="1.0" encoding="utf-8"?>
<tt xml:lang="ja" xmlns="http://www.w3.org/ns/ttml"
xmlns:tts="http://www.w3.org/ns/ttml#styling"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<body>
<div>
<p xml:id="s1">
<span tts:origin="somethings">somethings</span>
</p>
</div>
</body>
</tt>
XML file generated by EXIficient:
<?xml version="1.0" encoding="UTF-8"?>
<ns3:tt xmlns:ns3="http://www.w3.org/ns/ttml"
xml:lang="ja" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns3:body><ns3:div>
<ns3:p xml:id="s1">
<ns3:span xmlns:ns4="http://www.w3.org/ns/ttml#styling"
ns4:origin="somethings">somethings</ns3:span>
</ns3:p>
</ns3:div></ns3:body>
</ns3:tt>
The generated XML file is missing xmlns:tts="http://www.w3.org/ns/ttml#styling".
How can I fix this problem? Any help is appreciated.

EXIficient may be suppressing unused namespaces. Your example doesn't show any use of the ttm namespace.
As you can see, it didn't retain the namespace prefix for the ttml namespace either (changed to ns3). The generated XML is perfectly valid if the ttml#metadata namespace is unused.
Update
With the updated question, where namespace ttml#styling is used by the origin attribute of the span element, the namespace is retained in the rebuilt XML, but it has been moved to the span element.
This is still a perfectly valid XML document.
Namespace declarations (xmlns) can appear on any element in an XML document, and they apply to the element on which they appear and to all of its descendants (unless overridden, which is very unusual).
The same namespace can be declared many times on different elements. For simplicity and/or optimization, it is common to declare all namespaces up front, on the root element, using different prefixes, but it is not required to do so.
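To make the "still valid" point concrete: a namespace-aware parser resolves each name to a namespace URI, so the prefix (tts vs. ns4) and the place where it is declared (root vs. span) stop mattering. This sketch uses only the JDK's DOM API; the class name PrefixIndependence is hypothetical, and the two inputs are condensed copies of the question's original and round-tripped files.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class PrefixIndependence {

    private static final String TTML = "http://www.w3.org/ns/ttml";

    /** Describes the span's "origin" attribute by namespace URI, ignoring its prefix. */
    public static String resolvedOrigin(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Element span = (Element) dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagNameNS(TTML, "span").item(0);
            NamedNodeMap attrs = span.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr a = (Attr) attrs.item(i);
                if ("origin".equals(a.getLocalName())) {
                    return "{" + a.getNamespaceURI() + "}origin=" + a.getValue();
                }
            }
            return null; // the span has no origin attribute
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String original = "<tt xml:lang='ja' xmlns='http://www.w3.org/ns/ttml'"
                + " xmlns:tts='http://www.w3.org/ns/ttml#styling'><body><div><p xml:id='s1'>"
                + "<span tts:origin='somethings'>somethings</span></p></div></body></tt>";
        String roundTripped = "<ns3:tt xmlns:ns3='http://www.w3.org/ns/ttml' xml:lang='ja'>"
                + "<ns3:body><ns3:div><ns3:p xml:id='s1'>"
                + "<ns3:span xmlns:ns4='http://www.w3.org/ns/ttml#styling' ns4:origin='somethings'>"
                + "somethings</ns3:span></ns3:p></ns3:div></ns3:body></ns3:tt>";
        // Both lines print {http://www.w3.org/ns/ttml#styling}origin=somethings
        System.out.println(resolvedOrigin(original));
        System.out.println(resolvedOrigin(roundTripped));
    }
}
```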

I read this question by accident and rather late unfortunately.
Just in case people are still struggling with this and are wondering what they can do.
As was pointed out, EXIficient behaves just fine with regard to namespace handling.
Having said that, the EXI specification allows one to preserve prefixes and namespaces (see Preserve Options).
In EXIficient one can set these options accordingly,
e.g.,
EXIFactory exiFactory = DefaultEXIFactory.newInstance();
exiFactory.getFidelityOptions().setFidelity(FidelityOptions.FEATURE_PREFIX, true);

XSLT attribute with namespace

Sorry in advance if this has been answered before, but I have checked and haven't found an answer to this simple issue:
<self-uri xlink:href="http://www.harmreductionjournal.com/content/1/1/5">
All I want to do is to select the value of the attribute "xlink:href"
Applying the following selector always returns empty result:
<xsl:value-of select="@xlink:href"/>
I have iterated through the attribute values during processing, and the attribute is there:
xlink:href: http://www.harmreductionjournal.com/content/1/1/5
My question is very simple: how can I get the value of an attribute that has a namespace? My understanding was that it should work like this.
If you can point me to the right question on SO that will also suffice.
Thanks in advance.
EDIT:
Based on the answers, I have checked my root stylesheet declaration, and it looks like this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:mml="http://www.w3.org/1998/Math/MathML"
exclude-result-prefixes="xlink mml">
I'm guessing the exclude-result-prefixes attribute has something to do with the issue. If I remove it, it still doesn't work, and the result XML is full of xlink namespace declarations in random places.

Your approach should work, as long as your XSL file has the same prefix mapped to the same namespace. In other words, your XSL file should have the namespace mapping
xmlns:xlink="..."
where ... is the same value as defined in your source document for that namespace prefix.

<xsl:template match="self-uri">
<xsl:value-of
select="@xlink:href"
xmlns:xlink="http://www.w3.org/1999/xlink"/>
</xsl:template>
should do, assuming the input document binds xlink to http://www.w3.org/1999/xlink (the XLink namespace, which XLink 1.1, http://www.w3.org/TR/xlink11/, also uses). Of course, you would usually simply put xmlns:xlink="http://www.w3.org/1999/xlink" on the xsl:stylesheet element of your code.
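A quick way to see that only the namespace URI matters, not the prefix spelling, is to run a tiny XSLT 1.0 stylesheet through the JDK's javax.xml.transform API twice: once with xlink bound to the same URI as the source, once with a different URI. XlinkHref and href are hypothetical names for this sketch.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XlinkHref {

    /** Selects self-uri/@xlink:href with "xlink" bound to the given URI in the stylesheet. */
    public static String href(String sourceXml, String xlinkUriInStylesheet) {
        String xsl = "<xsl:stylesheet version='1.0'"
                + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
                + " xmlns:xlink='" + xlinkUriInStylesheet + "'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/'>"
                + "<xsl:value-of select='//self-uri/@xlink:href'/>"
                + "</xsl:template>"
                + "</xsl:stylesheet>";
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String source = "<article xmlns:xlink='http://www.w3.org/1999/xlink'>"
                + "<self-uri xlink:href='http://www.harmreductionjournal.com/content/1/1/5'/>"
                + "</article>";
        // Same URI as the source document: prints the link.
        System.out.println(href(source, "http://www.w3.org/1999/xlink"));
        // Same prefix but a different URI: nothing matches, so this prints an empty line.
        System.out.println(href(source, "http://www.w3.org/TR/xlink11/"));
    }
}
```

Regarding the edit in the question: exclude-result-prefixes only controls which namespace declarations are copied to the output; it does not change how @xlink:href is resolved.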

What is the XPath expression to select text from orm.xml's <schema> element?

I've read XPath - how to select text and thought I had the general idea. But, as always, XPath rears up, hisses at me, and scuttles off to find the nearest bacteria-infested urinal to drown in.
I have a JPA orm.xml file. It looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<entity-mappings xmlns="http://java.sun.com/xml/ns/persistence/orm"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/persistence/orm
http://java.sun.com/xml/ns/persistence/orm_2_0.xsd"
version="2.0">
<persistence-unit-metadata>
<persistence-unit-defaults>
<schema>test</schema>
<catalog>test</catalog>
</persistence-unit-defaults>
</persistence-unit-metadata>
</entity-mappings>
The following XPath expression should, I would think, select the text from the <schema> element:
/entity-mappings/persistence-unit-metadata/persistence-unit-defaults/schema/text()
But using Java's XPath implementation, it does not.
More specifically, the following code fails (using JUnit asserts) on the last line. The value of the text variable is the empty string.
// Find the file: URL to the orm.xml I mentioned above.
final URL ormUrl = Thread.currentThread().getContextClassLoader().getResource("META-INF/orm.xml");
assertNotNull(ormUrl);
final XPathFactory xpf = XPathFactory.newInstance();
assertNotNull(xpf);
final XPath xpath = xpf.newXPath();
assertNotNull(xpath);
final XPathExpression expression = xpath.compile("/entity-mappings/persistence-unit-metadata/persistence-unit-defaults/schema/text()");
assertNotNull(expression);
final String text = expression.evaluate(new InputSource(ormUrl.openStream()));
assertEquals("test", text);
This seems to cast into doubt what little understanding I had of XPath expressions to begin with. Flailing around, I then wanted to see if a simple "/" would select the root element. Mercifully, this returned a non-null NodeList, but the NodeList was empty. I really don't want to hunt the authors of the Java XPath support down and string them up, but it's getting awfully difficult not to follow that course of action.
Please help me shoot XPath in the head once and for all. Thanks.

The problem is that the XML declares a default namespace
xmlns="http://java.sun.com/xml/ns/persistence/orm"
while your XPath expression provides no corresponding namespace context. See this link for details on how to work with namespace contexts. There's a lot of detail there, but in summary you have to write your own implementation of javax.xml.namespace.NamespaceContext that allows the XPath processor to map namespace prefixes to URIs. XPath 1.0 has no default element namespace, so in your case you must bind a prefix of your choosing to http://java.sun.com/xml/ns/persistence/orm and use that prefix on every element step of the expression.
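Here is a minimal sketch of that approach with the JDK's javax.xml.xpath API. The "orm" prefix and the OrmSchema class are arbitrary names. Unlike the question's code, it parses the document itself with namespace awareness switched on, so the result does not depend on how the XPath implementation parses an InputSource.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class OrmSchema {

    static final String ORM = "http://java.sun.com/xml/ns/persistence/orm";

    public static String schema(String ormXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(ormXml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // An unprefixed step such as /entity-mappings only matches elements in no
            // namespace, so bind our own prefix to the document's default namespace.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "orm".equals(prefix) ? ORM : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String namespaceURI) {
                    return null;
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceURI) {
                    return Collections.emptyIterator();
                }
            });
            XPathExpression expression = xpath.compile("/orm:entity-mappings/orm:persistence-unit-metadata"
                    + "/orm:persistence-unit-defaults/orm:schema/text()");
            return expression.evaluate(doc);
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String orm = "<entity-mappings xmlns='" + ORM + "' version='2.0'>"
                + "<persistence-unit-metadata><persistence-unit-defaults>"
                + "<schema>test</schema><catalog>test</catalog>"
                + "</persistence-unit-defaults></persistence-unit-metadata></entity-mappings>";
        System.out.println(schema(orm)); // test
    }
}
```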

Talend tExtractXMLField

I have a job in Talend that is supposed to retrieve a field and loop through it.
My big problem is that the job loops through the XML fields but returns null.
Here is a sample of the XML:
<?xml version="1.0" encoding="ISO-8859-1"?>
<empresas>
<empresa>
<imoveis>
<imovel>
[-- some fields -- ]
<fotos>
<nome id="" order="">photo1</nome>
<nome id="" order=""></nome>
<nome id="" order=""></nome>
<nome id="" order=""></nome>
</fotos>
</imovel>
[ -- other entries here -- ]
</imoveis>
</empresa>
</empresas>
Now using the tExtractXMLField component I am trying to get the "fotos" element.
Here is what I have in the component:
I have tried changing the XPath query and the loop XPath query, but either I don't loop through the field or I get null in the value field in the tMap.
Here is an image of the job:
You can see that I have retrieved 4 items from the XML but what I get is null in the "nome" field. There must be something wrong with the XPath but I can't seem to find the problem :(
Hope someone can help me out. Thanks
Notes: I am using Talend v4.1.2 on Ubuntu 10.10 64-bit.

If you want to loop on <nome> nodes your Loop XPath Query has to be
"/empresas/empresa/imoveis/imovel/fotos/nome"
and foto_nome XPath Query something like
"text()"
Take care: I also corrected an error in your XML that could cause problems (</imoveis> was missing the "s").

There are two ways to go about it. One way is to use the XML input component (tFileInputXML) directly, following the instructions that bluish gave.
The other way is to continue on the path you chose. In the XML input, make sure that your Loop XPath query is set to "/empresas/empresa/imoveis/imovel/fotos" and that you pass the fotos element through with the Get Nodes option checked. The XPath query for your fotos element should be "../fotos" or ".".
Your extractXMLField component looks to be well configured.
Also, I don't know what tSetGlobalVar does in your design, but make sure it doesn't affect the fotos element that you're trying to pass through.

I have made a test job that should help. If I understand correctly, you want to get all the "nome" elements under the "fotos" tag.

Try changing your loop XPath to the top level of the file, "empresas". Sometimes that works for me. I have also seen the <?xml version="1.0" encoding="ISO-8859-1"?> declaration cause problems before, so you could try removing it.
Also make sure that the encoding is set correctly in the tFileInputXML.

I think you are confusing reading XML with extracting XML from XML.
Reading XML:
If the XML you have provided is the file read by your tFileInputXML, you don't need tExtractXMLField; just configure the tFileInputXML like this:
set the loop XPath to the <nome> elements, like this: "//nome"
add 3 columns to the tFileInputXML component: id, order and content
get the content column with the XPath query "."
get the id value with the XPath query "@id"
get the order value with the XPath query "@order"
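As a rough illustration of what that configuration asks for (this is not the code Talend generates), here is a plain Java sketch with javax.xml.xpath: the loop query "//nome" selects the row nodes, and each column query is evaluated relative to the current one. NomeRows and Row are hypothetical names, and the id/order values in main are invented for the demo.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NomeRows {

    /** One output row with the three columns id, order and content. */
    public static final class Row {
        public final String id;
        public final String order;
        public final String content;

        Row(String id, String order, String content) {
            this.id = id;
            this.order = order;
            this.content = content;
        }

        @Override
        public String toString() {
            return "id=" + id + ", order=" + order + ", content=" + content;
        }
    }

    public static List<Row> rows(String xml) {
        XPath xp = XPathFactory.newInstance().newXPath();
        try {
            // Loop XPath query: each matching node becomes one row.
            NodeList loop = (NodeList) xp.evaluate(
                    "//nome", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<Row> rows = new ArrayList<>();
            for (int i = 0; i < loop.getLength(); i++) {
                Node nome = loop.item(i);
                // Column queries are relative to the current loop node.
                rows.add(new Row(xp.evaluate("@id", nome), xp.evaluate("@order", nome),
                        xp.evaluate(".", nome)));
            }
            return rows;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<empresas><empresa><imoveis><imovel><fotos>"
                + "<nome id='1' order='1'>photo1</nome><nome id='2' order='2'></nome>"
                + "</fotos></imovel></imoveis></empresa></empresas>";
        rows(xml).forEach(System.out::println);
    }
}
```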
Extracting XML from XML:
That is the goal of the tExtractXMLField component:
It lets you parse XML data contained in a database column or in another XML document as if it were itself a data flow.
In a nutshell, tExtractXMLField creates a flow of data from a column whose records contain XML.
It is very useful when parsing SOAP query results: the server reply is usually provided as XML, like this one:
<arg2>
<![CDATA[
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<exportInscriptionEnLigneType>
<date>2015-04-10</date>
<nbDossiers>2</nbDossiers>
<reference>20150410100</reference>
<listeDossiers>
<dossier>
<numOrdre>1</numOrdre>
<identifiantDossier>AAAAA</identifiantDossier>
</dossier>
<dossier>
<numOrdre>2</numOrdre>
<identifiantDossier>BBBBB</identifiantDossier>
</dossier>
</listeDossiers>
</exportInscriptionEnLigneType>
]]>
</arg2>
In the XML above, the <arg2> element contains an XML document that you may need to parse.
tExtractXMLField has been created for this purpose.
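Outside Talend, the same "XML inside XML" idea can be sketched in two parsing passes with javax.xml.xpath: to the outer parser the CDATA section is just text, so you take the string value of <arg2> and parse that string as a document of its own. NestedXml and dossierIds are hypothetical names; the sample mirrors the structure shown above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NestedXml {

    public static List<String> dossierIds(String outerXml) {
        XPath xp = XPathFactory.newInstance().newXPath();
        try {
            // Pass 1: the CDATA content is plain text; take arg2's string value.
            // trim() matters: an XML declaration must be the very first thing in a document.
            String inner = xp.evaluate("/arg2", new InputSource(new StringReader(outerXml))).trim();

            // Pass 2: parse that text as a separate document and loop over it.
            NodeList ids = (NodeList) xp.evaluate(
                    "/exportInscriptionEnLigneType/listeDossiers/dossier/identifiantDossier",
                    new InputSource(new StringReader(inner)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < ids.getLength(); i++) {
                result.add(ids.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String outer = "<arg2>\n<![CDATA[\n"
                + "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
                + "<exportInscriptionEnLigneType><listeDossiers>"
                + "<dossier><numOrdre>1</numOrdre><identifiantDossier>AAAAA</identifiantDossier></dossier>"
                + "<dossier><numOrdre>2</numOrdre><identifiantDossier>BBBBB</identifiantDossier></dossier>"
                + "</listeDossiers></exportInscriptionEnLigneType>\n]]>\n</arg2>";
        System.out.println(dossierIds(outer)); // [AAAAA, BBBBB]
    }
}
```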
I've written a tutorial on how to do this; please have a look here: "how to extract xml from xml". It is in French, but the screenshots may help you understand the few comments provided.
Hope it will help.
Best regards,

Finding all valid xpath from xml

I am trying to write a Java program that finds all the XPaths for a given XML document. I found an XPath generator online, but it does not work when an element repeats multiple times, for example with XML like the following:
<?xml version="1.0" encoding="UTF-8"?>
<Report>
<Name>
<FirstName>A</FirstName>
<LastName>B</LastName>
<MiddleName>C</MiddleName>
</Name>
<Name>
<FirstName>D</FirstName>
<LastName>E</LastName>
<MiddleName>S</MiddleName>
</Name>
</Report>
It produces the XPath
/Report/Name/FirstName for both FirstName nodes,
but the expected result is /Report/Name[1]/FirstName and /Report/Name[2]/FirstName.
Any ideas?

I think you may have to do this yourself.
Using a SAX parser will make it straightforward. Just maintain a stack of the elements you encounter and a count so you can increment the indexes (/Report/Name[1], /Report/Name[2]) easily.
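Here is a minimal sketch of that SAX approach using the JDK's built-in parser. Each stack frame holds the path of an open element plus a per-name count of the children seen so far, which yields the [n] index. IndexedPaths and Frame are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class IndexedPaths {

    /** One open element: its path plus how many children of each name it has had so far. */
    private static final class Frame {
        final String path;
        final Map<String, Integer> childCounts = new HashMap<>();

        Frame(String path) {
            this.path = path;
        }
    }

    /** Returns one indexed XPath per element, in document order. */
    public static List<String> paths(String xml) {
        List<String> paths = new ArrayList<>();
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame("")); // sentinel standing in for the document node

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                Frame parent = stack.peek();
                int index = parent.childCounts.merge(qName, 1, Integer::sum);
                String path = parent.path + "/" + qName + "[" + index + "]";
                paths.add(path);
                stack.push(new Frame(path));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                stack.pop();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
        return paths;
    }

    public static void main(String[] args) {
        String xml = "<Report><Name><FirstName>A</FirstName><LastName>B</LastName>"
                + "<MiddleName>C</MiddleName></Name><Name><FirstName>D</FirstName>"
                + "<LastName>E</LastName><MiddleName>S</MiddleName></Name></Report>";
        paths(xml).forEach(System.out::println);
    }
}
```

Because SAX streams the document, the handler cannot know when it sees an element whether a later sibling will share its name. This sketch therefore always writes the [n] index (for example /Report[1]/Name[2]/FirstName[1]). Dropping [1] where a name turns out to be unique would need a second pass.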
