How to parse SOAP XML with prefixes with jsoup?

This is a sample XML.
<env:Envelope xmlns:env='http://schemas.xmlsoap.org/soap/envelope/'>
<env:Header/>
<env:Body>
<ns0:NotifyRequest xmlns:ns3='http://dummyurl.com'>
<PartTotal>10</PartTotal>
<PartNo>2</PartNo>
</ns0:NotifyRequest>
</env:Body>
</env:Envelope>
My server accepts these requests and parses them with jsoup. I get the element by the tag "ns0:NotifyRequest" and then look for its sub-elements.
My problem is that the parser fails when the prefix changes: the tag "ns0:NotifyRequest" is hard-coded, so I get an error when the received XML uses, for example, "ns3:NotifyRequest".
Is there a way to ignore the prefix and still get the NotifyRequest element? I know I don't have to reach the inner elements through their immediate parent (BodyElement.getElementsByTag("PartTotal") does the same job as NotifyRequestElement.getElementsByTag("PartTotal")). But I would like to use a regex or something similar to ignore the arbitrary prefix and get the NotifyRequest element itself.
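
One way to look up an element regardless of its prefix is to compare only the part of the tag name after the colon. The sketch below uses the JDK's built-in DOM parser rather than jsoup, so it shows the idea, not the asker's setup; the class and method names (PrefixAgnosticLookup, findByLocalName, partTotal) are made up for illustration. jsoup's selector syntax also documents a `*|tag` form for matching a tag under any prefix, which may be worth checking against the jsoup version in use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PrefixAgnosticLookup {

    /** Returns the first element whose name, minus any prefix, equals localName, or null. */
    static Element findByLocalName(Document doc, String localName) {
        NodeList all = doc.getElementsByTagName("*"); // every element, in document order
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            String name = el.getTagName();
            // Drop everything up to and including the colon, if there is one.
            String bare = name.substring(name.indexOf(':') + 1);
            if (bare.equals(localName)) {
                return el;
            }
        }
        return null;
    }

    static Document parse(String xml) throws Exception {
        // Namespace awareness is left at the factory default (off), so a name such as
        // "ns0:NotifyRequest" is treated as one plain string.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Hypothetical helper: reads PartTotal from whichever *:NotifyRequest is present. */
    static String partTotal(String xml) throws Exception {
        Element request = findByLocalName(parse(xml), "NotifyRequest");
        return request.getElementsByTagName("PartTotal").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String template = "<env:Envelope xmlns:env='http://schemas.xmlsoap.org/soap/envelope/'>"
                + "<env:Header/><env:Body>"
                + "<%1$s:NotifyRequest xmlns:%1$s='http://dummyurl.com'>"
                + "<PartTotal>10</PartTotal><PartNo>2</PartNo>"
                + "</%1$s:NotifyRequest></env:Body></env:Envelope>";
        System.out.println(partTotal(String.format(template, "ns0")));
        System.out.println(partTotal(String.format(template, "ns3")));
    }
}
```

Matching on the bare name keeps the lookup independent of whatever prefix the sender chooses, at the cost of also matching a same-named element from an unrelated namespace.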

Related

XML Parse error

I am getting the error below. Please help.
parse error:
Error on line 1 of document :
The markup in the document preceding the root element must be well-formed.
Nested exception: The markup in the document preceding the root element must be well-formed.
The XML is below:
<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<'env:Envelope' xmlns>:env=\"http://www.w3.org/2003/05/soap-envelope\" xmlns:ns1=\"urn:zimbraAdmin\">
xmlns:ns2=\"urn:zimbraAdmin\"><env:Header><ns2:context/></env:Header><env:Body>
<ModifyAccountRequest xmlns=\"urn:zimbraAdmin\"><id>4d41ec71-d898-42b8-b522-3c3cdc5583a0</id>
<a n=\"zimbraIsAdminAccount\">TRUE</a>
</ModifyAccountRequest></env:Body></env:Envelope>

That was terribly malformed. Issues are highlighted below:
1. Every instance of \" should be replaced with a plain ", since the backslash is a Java string escape and does not belong in the XML itself.
2. There should be no single quotes around the element name in <'env:Envelope'; I honestly have no idea where they came from.
3. The closing angle bracket in xmlns>:env= should be removed, as should the one at the end of the physical line xmlns:ns1=\"urn:zimbraAdmin\">. Removing the latter brings the next namespace declaration (ns2, which maps to the same URI as ns1 and so seems redundant) into the Envelope tag.
I have no idea what caused the envelope to become so malformed, but you should read up on the purpose of the xmlns attributes and namespace references you were setting, so that next time you at least understand what each part of the XML request does. This will help you troubleshoot your own documents in the future.
In the meantime, since you seem to be at a total loss, here is the XML with the errors above corrected.
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope" xmlns:ns1="urn:zimbraAdmin" xmlns:ns2="urn:zimbraAdmin">
<env:Header>
<ns2:context/>
</env:Header>
<env:Body>
<ModifyAccountRequest xmlns="urn:zimbraAdmin">
<id>4d41ec71-d898-42b8-b522-3c3cdc5583a0</id>
<a n="zimbraIsAdminAccount">TRUE</a>
</ModifyAccountRequest>
</env:Body>
</env:Envelope>
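
A quick way to catch this kind of problem before sending a request is to run the document through a parser and look at the first fatal error. The following is a minimal sketch using the JDK's DOM parser; the class and method names (WellFormedCheck, firstError) are hypothetical, and the exact error text depends on the parser implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {

    /** Returns null if the XML parses, otherwise the message of the first fatal error. */
    static String firstError(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // also reject prefixes that were never declared
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr first.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String broken = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<'env:Envelope' xmlns>:env=\"http://www.w3.org/2003/05/soap-envelope\">"
                + "</env:Envelope>";
        String fixed = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<env:Envelope xmlns:env=\"http://www.w3.org/2003/05/soap-envelope\""
                + " xmlns:ns2=\"urn:zimbraAdmin\">"
                + "<env:Header><ns2:context/></env:Header>"
                + "<env:Body><ModifyAccountRequest xmlns=\"urn:zimbraAdmin\"/></env:Body>"
                + "</env:Envelope>";
        System.out.println("broken: " + firstError(broken));
        System.out.println("fixed:  " + firstError(fixed));
    }
}
```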

How to remove a specific xml attribute from org.w3c.dom.Document

I have this XML:
<Body xmlns:wsu="http://mynamespace">
<Ticket xmlns="http://othernamespace">
<Customer xmlns="">Robert</Customer>
<Products xmlns="">
<Product>a product</Product>
</Products>
</Ticket>
<Delivered xmlns="" />
<Payment xmlns="">cash</Payment>
</Body>
I am using Java to read it as a DOM document. I want to remove the empty namespace attributes (i.e., xmlns=""). Is there any way to do that?

You need to understand that xmlns is a very special attribute. Basically, the xmlns="" is so that your Customer element is in the "unnamed" namespace, rather than the http://othernamespace namespace (and likewise for other elements which would otherwise inherit a default namespace from their ancestors).
If you want to get rid of the xmlns="", you basically need to put the elements into the appropriate namespace, which means changing the element name. I don't think the older W3C DOM levels let you change the name of an element (DOM Level 3 added Document.renameNode for this), so you may well need to create a new element with the appropriate namespaced name and copy the content. Or, if you're responsible for creating the document to start with, just use the right namespace.
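
The following is a rough sketch of the "move the elements into the surrounding namespace" approach, using the DOM Level 3 method Document.renameNode from the JDK's DOM implementation. It assumes a namespace-aware parse and a cleaned-up version of the sample XML; the class and method names (DefaultNamespaceAdopter, adopt, parseAndAdopt) are invented for illustration. Note that this really does change the elements' names, which may or may not be what downstream consumers expect.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DefaultNamespaceAdopter {

    /** Removes xmlns="" and moves no-namespace elements into the inherited default namespace. */
    static void adopt(Element el, String inheritedNs) {
        // Drop an explicit reset to "no namespace"; other xmlns values are left alone.
        if (el.hasAttribute("xmlns") && el.getAttribute("xmlns").isEmpty()) {
            el.removeAttribute("xmlns");
        }
        Element current = el;
        if (el.getNamespaceURI() == null && inheritedNs != null) {
            // renameNode may return a different node object, so keep its result.
            current = (Element) el.getOwnerDocument()
                    .renameNode(el, inheritedNs, el.getTagName());
        }
        // Snapshot the children before recursing, since renaming can replace nodes.
        List<Element> children = new ArrayList<>();
        for (Node n = current.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                children.add((Element) n);
            }
        }
        // Simplification: only an unprefixed element hands its namespace down as the default.
        String nsForChildren = current.getPrefix() == null
                ? current.getNamespaceURI() : inheritedNs;
        for (Element child : children) {
            adopt(child, nsForChildren);
        }
    }

    static Document parseAndAdopt(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getNamespaceURI() to be meaningful
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        adopt(doc.getDocumentElement(), null);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Body xmlns:wsu='http://mynamespace'>"
                + "<Ticket xmlns='http://othernamespace'>"
                + "<Customer xmlns=''>Robert</Customer>"
                + "<Products xmlns=''><Product>a product</Product></Products>"
                + "</Ticket><Delivered xmlns=''/><Payment xmlns=''>cash</Payment></Body>";
        Document doc = parseAndAdopt(xml);
        Node customer = doc.getElementsByTagName("Customer").item(0);
        System.out.println("Customer namespace: " + customer.getNamespaceURI());
    }
}
```

Elements such as Delivered, whose parent has no default namespace, only lose the redundant xmlns="" attribute and stay in no namespace.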

In XSLT, how do I get the filepath of the xml file of a certain element if that xml file was included with xinclude?

I have these XML files:
master.xml (which uses XInclude to include child1.xml and child2.xml)
child1.xml
child2.xml
Both child1.xml and child2.xml contain a <section> element with some text.
In the XSLT transformation, I want to add the name of the file the <section> element came from, so I get something like:
<section srcFile="child1.xml">Text from child 1.</section>
<section srcFile="child2.xml">Text from child 2.</section>
How do I retrieve the values child1.xml and child2.xml?

Unless you turn that feature off, XInclude processors should add an @xml:base attribute
with the URL of the included file. So you don't have to do anything; it should already be:
<section xml:base="child1.xml">Text from child 1.</section>
<section xml:base="child2.xml">Text from child 2.</section>
(If you want, you can use XSLT to transform the @xml:base attribute into @srcFile.)
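
The attribute rewrite suggested here could look roughly like the sketch below: an identity transform plus one template that turns the xml:base attribute into srcFile, run through the JDK's built-in XSLT 1.0 processor. The input string stands in for a document that has already been through XInclude; the class name XmlBaseToSrcFile and the `doc` root element are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlBaseToSrcFile {

    // The second template matches the same attribute as @xml:base; it is spelled out with
    // namespace-uri() so it does not depend on how the processor resolves the xml prefix.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match='@*|node()'>"
            + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
            + "</xsl:template>"
            + "<xsl:template match=\"@*[local-name()='base' and "
            + "namespace-uri()='http://www.w3.org/XML/1998/namespace']\">"
            + "<xsl:attribute name='srcFile'><xsl:value-of select='.'/></xsl:attribute>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String included = "<doc>"
                + "<section xml:base='child1.xml'>Text from child 1.</section>"
                + "<section xml:base='child2.xml'>Text from child 2.</section>"
                + "</doc>";
        System.out.println(transform(included));
    }
}
```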

I'm 99% sure that once xi:include has been processed, you have a single document (and single infoset) that won't let you determine which URL any given part of the document came from.
I think you will need to place that information directly in the individual included files. Having said that, you can still give document-uri a try, but I think all nodes will return the same URI.

Talend tExtractXMLField

I have this job in Talend that is supposed to retrieve a field and loop through it.
My big problem is that the code is looping through the XML fields but it's returning null.
Here is a sample of the XML:
<?xml version="1.0" encoding="ISO-8859-1"?>
<empresas>
<empresa>
<imoveis>
<imovel>
[-- some fields -- ]
<fotos>
<nome id="" order="">photo1</nome>
<nome id="" order=""></nome>
<nome id="" order=""></nome>
<nome id="" order=""></nome>
</fotos>
</imovel>
[ -- other entries here -- ]
</imoveis>
</empresa>
</empresas>
Now using the tExtractXMLField component I am trying to get the "fotos" element.
Here is what I have in the component:
I have tried changing the XPath query and the XPath loop query, but either the job doesn't loop through the field at all or I get null in the value field in the tMap.
Here is an image of the job:
You can see that I have retrieved 4 items from the XML but what I get is null in the "nome" field. There must be something wrong with the XPath but I can't seem to find the problem :(
Hope someone can help me out. Thanks
Note: I am using Talend v4.1.2 on Ubuntu 10.10 64-bit.

If you want to loop on <nome> nodes your Loop XPath Query has to be
"/empresas/empresa/imoveis/imovel/fotos/nome"
and foto_nome XPath Query something like
"text()"
Take care: I also corrected an error in your XML that could bring issues (</imoveis> missing the "s").
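
To sanity-check these two queries outside Talend, they can be run against the sample with the JDK's XPath API. This is only an illustration of how the loop query and the relative text() query combine; the class and method names (FotosXPath, nomes) are hypothetical, and Talend's own evaluation may differ in details such as how empty elements are reported.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FotosXPath {

    static final String LOOP_QUERY = "/empresas/empresa/imoveis/imovel/fotos/nome";

    /** Evaluates the loop query, then "text()" relative to each node it returns. */
    static List<String> nomes(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList loop = (NodeList) xpath.evaluate(
                LOOP_QUERY, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < loop.getLength(); i++) {
            // An empty <nome> has no text node, so this yields an empty string.
            values.add(xpath.evaluate("text()", loop.item(i)));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<empresas><empresa><imoveis><imovel><fotos>"
                + "<nome id='' order=''>photo1</nome>"
                + "<nome id='' order=''></nome>"
                + "<nome id='' order=''></nome>"
                + "<nome id='' order=''></nome>"
                + "</fotos></imovel></imoveis></empresa></empresas>";
        System.out.println(nomes(xml));
    }
}
```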

There are two ways to go about it. One way is to use XMLinput directly, with the instructions that bluish mentioned.
The other way is to continue on the path that you chose. In the XMLinput, make sure that your Loop XPath query is set to "/empresas/empresa/imoveis/imovel/fotos" and that you pass through the fotos element with the Get Nodes option checked. The XPath Query of your fotos element should be "../fotos" or ".".
Your extractXMLField component looks to be well configured.
Also, I don't know what tSetGlobalVar does in your design, but make sure it doesn't affect the fotos element that you're trying to pass through.

I have made a test job, this will help you definitely. If I'm not wrong you want to get all the "nome" under the "fotos" tag.

Try changing your loop XPath to the top level in the file, "empresas". Sometimes that works for me. I have also seen the <?xml version="1.0" encoding="ISO-8859-1"?> declaration cause problems before, so you could try removing it.
Also make sure that the encoding is set correctly in the tFileInputXML.

I think you are confusing reading XML and extracting XML from XML.
Reading XML:
If the XML you have provided is the file read by your tFileInputXML, you don't need tExtractXMLField. Just configure the tFileInputXML like this:
set the XPath loop to the <nome> elements, like this: "//nome"
add 3 columns to the tFileInputXML component: id, order and content
get the content column with the XPath query "."
get the id value with the XPath query "@id"
get the order value with the XPath query "@order"
Extracting XML from XML:
That is the goal of the tExtractXMLField component:
It lets you parse XML data contained in a database column or in another XML document as if it were itself a data flow.
In a nutshell, tExtractXMLField creates a flow of data from a column that contains XML.
It is very useful when parsing a SOAP query result: the server reply is usually provided as XML, like this one:
<arg2>
<![CDATA[
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<exportInscriptionEnLigneType>
<date>2015-04-10</date>
<nbDossiers>2</nbDossiers>
<reference>20150410100</reference>
<listeDossiers>
<dossier>
<numOrdre>1</numOrdre>
<identifiantDossier>AAAAA</identifiantDossier>
</dossier>
<dossier>
<numOrdre>2</numOrdre>
<identifiantDossier>BBBBB</identifiantDossier>
</dossier>
</listeDossiers>
</exportInscriptionEnLigneType>
]]>
</arg2>
In the XML above, the <arg2> element contains an XML document that you may need to parse.
tExtractXMLField has been created for this purpose.
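
Outside Talend, the same two-stage idea (read the outer document, then parse the text of one element as a document in its own right) can be sketched in plain Java. The class and method names (EmbeddedXml, extractInner) are made up for illustration. One detail worth noting: the CDATA text starts with a line break, and an XML declaration must be the very first thing in a document, so the text is trimmed before the second parse.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EmbeddedXml {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Parses the outer XML, then parses the text content of the first wrapperTag element. */
    static Document extractInner(String outerXml, String wrapperTag) throws Exception {
        Document outer = parse(outerXml);
        // getTextContent() returns the CDATA payload as plain text.
        String inner = outer.getElementsByTagName(wrapperTag).item(0).getTextContent();
        // Trim so that "<?xml ...?>" is the first thing the second parser sees.
        return parse(inner.trim());
    }

    public static void main(String[] args) throws Exception {
        String reply = "<arg2><![CDATA[\n"
                + "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
                + "<exportInscriptionEnLigneType><nbDossiers>2</nbDossiers><listeDossiers>"
                + "<dossier><numOrdre>1</numOrdre><identifiantDossier>AAAAA</identifiantDossier></dossier>"
                + "<dossier><numOrdre>2</numOrdre><identifiantDossier>BBBBB</identifiantDossier></dossier>"
                + "</listeDossiers></exportInscriptionEnLigneType>\n]]></arg2>";
        Document inner = extractInner(reply, "arg2");
        System.out.println(inner.getElementsByTagName("dossier").getLength() + " dossier elements");
    }
}
```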
I've written a tutorial on how to do this; please have a look at "how to extract xml from xml". It is in French, but the screenshots may help you follow the few comments provided.
Hope it will help.
Best regards,

The markup must be well-formed

First off, let me say I am new to SAX and Java.
I am trying to read information from an XML file that is not well formed.
When I try to use the SAX or DOM Parser I get the following error in response:
The markup in the document following the root element must be well-formed.
This is how I set up my XML file:
<format type="filename" t="13241">0;W650;004;AG-Erzgeb</format>
<format type="driver" t="123412">001;023</format>
...
Can I force the SAX or DOM to parse XML files even if they are not well formed XML?
Thank you for your help. Much appreciated.
Haythem

Your best bet is to make the XML well-formed, probably by pre-processing it a bit. In this case, you can achieve that simply by putting an XML declaration on (and even that's optional) and providing a root element (which is not optional), like this:
<?xml version="1.0"?>
<wrapper>
<format type="filename" t="13241">0;W650;004;AG-Erzgeb</format>
<format type="driver" t="123412">001;023</format>
</wrapper>
There I've arbitrarily picked the name "wrapper" for the root element; it can be whatever you like.
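
In Java, that pre-processing step can be as small as concatenating a root element around the content before handing it to the parser. A minimal sketch, assuming the fragments fit in memory and contain no XML declaration or DOCTYPE of their own; FragmentWrapper and parseFragments are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FragmentWrapper {

    /** Wraps sibling elements in an arbitrary root element so a standard parser accepts them. */
    static Document parseFragments(String fragments) throws Exception {
        String wrapped = "<wrapper>" + fragments + "</wrapper>";
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)));
    }

    public static void main(String[] args) throws Exception {
        String fragments =
                "<format type=\"filename\" t=\"13241\">0;W650;004;AG-Erzgeb</format>\n"
                + "<format type=\"driver\" t=\"123412\">001;023</format>\n";
        NodeList formats = parseFragments(fragments).getElementsByTagName("format");
        for (int i = 0; i < formats.getLength(); i++) {
            Element format = (Element) formats.item(i);
            System.out.println(format.getAttribute("type") + " = " + format.getTextContent());
        }
    }
}
```

For files too large to hold in a string, the same effect can be had by chaining streams (for example with java.io.SequenceInputStream) so that the opening and closing wrapper tags are supplied around the file's bytes.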

Hint: using sax or stax you can successfully parse a not well formed xml document until the FIRST "well formed-ness" error is encountered.
(I know that this is not of too much help...)

A DOM parser scans your XML file and then builds a tree, so the tree needs a root node, like the <wrapper> element in the first answer. If the parser can't find a root element, it can't build the tree at all. So it's better to pre-process the XML file before parsing it with DOM or SAX.
