Parsing xml without namespace

I have a problem that appears when I try to parse a String containing XML into an org.w3c.dom.Document.
Here is an example of an XML String that I'm trying to parse:
<enviNFe xmlns="http://www.portalfiscal.inf.br/nfe" versao="2.00">
<idLote>123</idLote>
<NFe xmlns="http://www.portalfiscal.inf.br/nfe">
...
</NFe>
</enviNFe>
The problem is that after the String has been parsed by the following code:
private Document documentFactory(String xml) throws SAXException,
        IOException, ParserConfigurationException, DocumentException, TransformerException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    Document document = factory.newDocumentBuilder().parse(
            new ByteArrayInputStream(xml.getBytes()));
    return document;
}
The tag NFe loads without the namespace (xmlns="http://www.portalfiscal.inf.br/nfe")
I want to know why this happens, and what I could do to solve this.
Any help will be great.
Thanks, and sorry for my English.
------EDIT----
For better understanding:
This XML will be signed right after parsing and then sent to a government server (Brazil).
After this, I send another request to the server to check whether it was processed. If it was, I get a positive response, or an error message if something went wrong.
The first problem I had was that the XML was reported as malformed. This happened because I was sending the XML without that namespace on the NFe tag.
To solve this, I added the namespace directly in the file, after the XML had been signed.
That problem was solved, but another one occurred: the signature no longer matches,
because I sign the XML without the namespace and send it with the namespace.

From what I can put together from your various comments, I think you are misunderstanding how XML works. You indicate that you manually added the namespace to the NFe element. However, in your XML example, the NFe node already has that namespace.
In this xml:
<enviNFe xmlns="http://www.portalfiscal.inf.br/nfe" versao="2.00">
<idLote>123</idLote>
<NFe>
...
</NFe>
</enviNFe>
all of the nodes have the "http://www.portalfiscal.inf.br/nfe" namespace. By putting the xmlns="..." attribute on the parent node, the namespace is applied to that node and to all of its descendant elements with the same prefix (in this case, no prefix).
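A minimal sketch of how this could be checked with the JDK's DOM parser. The class and method names are made up for illustration, and the sample strings are shortened versions of the XML above. It parses one document where NFe relies on the inherited default namespace and one where NFe redeclares it, and prints the namespace URI the parser reports for each.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original question.
class NamespaceInheritanceDemo {
    static final String NFE_NS = "http://www.portalfiscal.inf.br/nfe";

    /** Returns the namespace URI the parser assigned to the first NFe element. */
    static String namespaceOfNFe(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Without namespace awareness, getNamespaceURI() returns null.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node nfe = doc.getElementsByTagName("NFe").item(0);
            return nfe.getNamespaceURI();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // NFe has no xmlns attribute of its own and inherits the default namespace.
        String inherited = "<enviNFe xmlns=\"" + NFE_NS + "\" versao=\"2.00\">"
                + "<idLote>123</idLote><NFe/></enviNFe>";
        // NFe repeats the same declaration explicitly.
        String redeclared = "<enviNFe xmlns=\"" + NFE_NS + "\" versao=\"2.00\">"
                + "<idLote>123</idLote><NFe xmlns=\"" + NFE_NS + "\"/></enviNFe>";

        System.out.println(namespaceOfNFe(inherited));
        System.out.println(namespaceOfNFe(redeclared));
    }
}
```

Both calls should report the same URI, which is why the parsed NFe element is already in the namespace even when a serializer leaves out the repeated xmlns attribute.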

It is returning the correct document. To test it you can just walk through your document.
doc.getFirstChild().getFirstChild().getNextSibling().getNextSibling().getNextSibling().getNamespaceURI();
Or try to get the tag by its name:
NodeList tags = doc.getElementsByTagNameNS("http://www.portalfiscal.inf.br/nfe", "NFe");

Related

Java XML Document converting &quot; to " (literal quote) upon parsing/converting to Document

I have a problem where I need to send a request to a SOAP web service that requires the root tag to contain XML data. This is the XML that I'm trying to send:
<root>&lt;test key=&quot;Applicants&quot;&gt;this is a data&lt;/test&gt;</root>
I need to append this to the SoapBody object as a document with this code:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setExpandEntityReferences(false);
DocumentBuilder builder = factory.newDocumentBuilder();
Document result = builder.parse(new ByteArrayInputStream(request.getRequest().getBytes()));
Then adding it to the SoapBody to be sent to the webservice.
However, upon sending this request and tracing the logs, it's actually converting the &quot; entity to a literal quote (")
This is the XML being sent:
<root>&lt;test key="Applicants"&gt;this is a data&lt;/test&gt;</root>
As you can see, the &quot; is being transformed to a literal quote. How can I keep the original data within the root tag (which has the &quot;)? It seems to be transformed when I convert it to a Document object.
Would appreciate any help. Thanks.
Edit:
The web service actually requires this format (from their documentation and sample XML requests). If this isn't possible, is it a limitation? Should I use another framework?

The &quot; and " are completely equivalent in this context. You haven't actually said whether this is causing a problem: if it is, then it's because some recipient of the XML isn't processing it correctly. Incidentally, it would also be legitimate to convert the &gt; to >.
When you parse XML and re-serialise it, irrelevant details like redundant whitespace get lost - just as if you copy this text into your text editor, the line-wrapping and font size gets lost.
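The equivalence can be illustrated with a small sketch. The class name and the two sample strings are assumptions made for this example (an escaped payload inside a root element, once with &quot; and once with a literal quote). It parses both forms and compares the text that a conforming parser hands to the application.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; the sample payloads are assumed for illustration.
class QuoteEquivalenceDemo {
    /** Parses the XML and returns the decoded text content of the root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Quotes written as the &quot; entity reference.
        String escaped = "<root>&lt;test key=&quot;Applicants&quot;&gt;"
                + "this is a data&lt;/test&gt;</root>";
        // Quotes written literally, which is equally legal in element content.
        String literal = "<root>&lt;test key=\"Applicants\"&gt;"
                + "this is a data&lt;/test&gt;</root>";

        System.out.println(rootText(escaped));
        System.out.println(rootText(escaped).equals(rootText(literal))); // true
    }
}
```

A receiver that uses a real XML parser sees the same characters either way. Only a receiver that compares raw bytes or does its own string matching would notice the difference.".equals(QuoteEquivalenceDemo.rootText(escaped))) throw new AssertionError("unexpected decoded text");
</test>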

Extracting elements from an HTTP XML response--HTTP Client & Java

So I've gotten help from here already so I figured why not try it out again!? Any suggestions would be greatly appreciated.
I'm using HTTP client and making a POST request; the response is an XML body that looks like the following:
<?xml version="1.0" encoding="UTF-8"?>
<CartLink
xmlns="http://api.gsicommerce.com/schema/ews/1.0">
<Name>vSisFfYlAPwAAAE_CPBZ3qYh</Name>
<Uri>carts/vSisFfYlAPwAAAE_CPBZ3qYh</Uri>
</CartLink>
Now...
I have an HttpEntity which is
[HttpResponse].getEntity().
Then I get a String representation of the response (which is XML in this case) by saying
String content = EntityUtils.toString(HttpEntity)
I tried following some of the suggestions on this post: How to create a XML object from String in Java? but it did not seem to work for me. When I built up the document it still appeared to be null.
MY END GOAL here is just to get the NAME field.. i.e. the "vSisFfYlAPwAAAE_CPBZ3qYh" part. So do I want to build up a document and then extract it...? Or is there a simpler way? I've been trying different things and I can't seem to get it to work.
Thanks for all of the help guys, it is most appreciated!!

Instead of trying to extract the value with string manipulation, use Java's built-in XML parsing. That's a much better approach. The service returns its response as XML for a reason. :)
Here's one way to solve your problem:
// Parse the response using DocumentBuilder so you can get at elements easily.
// 'content' is the String from EntityUtils.toString(...). Wrap it in an InputSource,
// because DocumentBuilder.parse(String) treats its argument as a URI, not as XML text.
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse(new InputSource(new StringReader(content)));
Element root = doc.getDocumentElement();
// Now let's say you have not one, but 'n' nodes that contain the value
// you're looking for. Use NodeList to get a list of all those nodes and just
// pull out the tag/attribute's value you want.
NodeList nameNodesList = doc.getElementsByTagName("Name");
// The list must be instantiated; leaving it null would throw a NullPointerException on add().
List<String> nameValues = new ArrayList<>();
// Now iterate through the NodeList to get the values you want.
for (int i = 0; i < nameNodesList.getLength(); i++) {
    nameValues.add(nameNodesList.item(i).getTextContent());
}
The list "nameValues" will now hold every value contained within "Name" tags. You could also create a HashMap to store tag names and their respective text contents as key/value pairs.
Hope this helps.
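For the single-value case in the question, a self-contained sketch might look like the following. The class and method names are made up, and the sample response is copied from the question. One thing worth knowing: with the JDK's built-in DOM implementation, printing a Document usually shows "[#document: null]", which can look like a failed parse even though the tree is fully populated.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration.
class CartLinkNameExtractor {
    static final String EWS_NS = "http://api.gsicommerce.com/schema/ews/1.0";

    /** Returns the text of the first Name element, or null if there is none. */
    static String extractName(String responseXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the response uses a default namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
            NodeList names = doc.getElementsByTagNameNS(EWS_NS, "Name");
            return names.getLength() > 0 ? names.item(0).getTextContent() : null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Response is not parseable XML", e);
        }
    }

    public static void main(String[] args) {
        String content = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<CartLink\n xmlns=\"" + EWS_NS + "\">\n"
                + "<Name>vSisFfYlAPwAAAE_CPBZ3qYh</Name>\n"
                + "<Uri>carts/vSisFfYlAPwAAAE_CPBZ3qYh</Uri>\n"
                + "</CartLink>";
        System.out.println(extractName(content)); // vSisFfYlAPwAAAE_CPBZ3qYh
    }
}
```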

Parsing 'pseudo' XML (that is, not well formed) in java?

I have some xml that looks like this:
<xml><name>oscar</name><race>puppet</race><class>grouch</class></xml>
The tags change and are variable, so there won't always be a 'name' tag.
I've tried 3 or 4 parsers and they all seem to choke on it. Any hints?

Just because it doesn't have a defined schema, doesn't mean it isn't "valid" XML - your sample XML is "well formed".
The dom4j library will do it for you. Once parsed (your XML will parse OK) you can iterate through child elements, no matter what their tag name, and work with your data.
Here's an example of how to use it:
import java.util.Iterator;
import org.dom4j.*;

String text = "<xml><name>oscar</name><race>puppet</race><class>grouch</class></xml>";
Document document = DocumentHelper.parseText(text);
Element root = document.getRootElement();
// Visit every child element, whatever its tag name is
for (Iterator i = root.elementIterator(); i.hasNext(); ) {
    Element element = (Element) i.next();
    String tagName = element.getName(); // getQName() returns a QName object, not a String
    String contents = element.getText();
    // do something
}
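If adding dom4j is not an option, the same idea can be sketched with the DOM parser that ships with the JDK. This is an alternative to the dom4j code above, not a replacement for it. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration.
class ChildElementLister {
    /**
     * Returns tag name -> text for each child element of the root, in document order.
     * If a tag name occurs more than once, the last occurrence wins.
     */
    static Map<String, String> childElements(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> result = new LinkedHashMap<>();
            Node root = doc.getDocumentElement();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                // Skip whitespace text nodes, comments and so on.
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    result.put(n.getNodeName(), n.getTextContent());
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String text = "<xml><name>oscar</name><race>puppet</race><class>grouch</class></xml>";
        System.out.println(childElements(text)); // {name=oscar, race=puppet, class=grouch}
    }
}
```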

This is valid xml; try adding an XML Schema that allows for optional elements. If you can write an xml schema, you can use JAXB to parse it. XML allows for having optional elements; it isn't too "strict" about it.

Your XML sample is well-formed XML, and if anything "chokes" on it then it would be useful for us to know exactly what the symptoms of the "choking" are.

JDOM - SaxBuilder - Content is not allowed in prolog

I am having trouble parsing an XML file into a JDOM Document instance using the SAXBuilder.
It throws the following exception:
[Fatal Error] :1:1: Content is not allowed in prolog.
I have found and read the related threads on Stack Exchange and elsewhere on the Internet, and tried various things to debug the error.
I have ended up with the following code snippet, which throws as well.
String template = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<server></server>";
InputStream in = new StringBufferInputStream(template);
return saxBuilder.build(in);
What's wrong with it?
I am ashamed to admit that but it turned out that the error wasn't produced by the snippet I have shown here but rather at a later point where I was comparing the parsed XML against another one using the XMLUnit library.
The thing that made me believe that the error was in the presented lines was the content of the error message.
I believe it would be appropriate to close (and delete, if that's possible) this question as it does not add any value.

This error usually means you have text before your xml declaration.
In your snippet the xml seems fine. The issue may not be in your document though. If you have a schema or other referenced xml file, the error could in fact refer to one of them.
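A small sketch of that failure mode, using the JDK's DOM parser rather than JDOM. The class name and the "junk" prefix are made up for illustration. It parses the template from the question once as-is and once with stray text in front of the XML declaration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class for illustration.
class PrologErrorDemo {
    /** Returns true if the text parses as XML, false if the parser rejects it. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Covers SAXParseException, e.g. "Content is not allowed in prolog."
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Unexpected parser setup or I/O failure", e);
        }
    }

    public static void main(String[] args) {
        String template = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<server></server>";
        System.out.println(parses(template));          // true
        // Any text before the declaration is content in the prolog.
        System.out.println(parses("junk" + template)); // false
    }
}
```

The default error handler of the JDK parser also prints a "[Fatal Error]" line to standard error for the second call, similar to the message quoted in the question.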

I suspect the problem is somewhere else. The following code (using dom4j) works for me:
public static void main(String[] args) throws DocumentException {
    String template = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<server></server>";
    SAXReader saxReader = new SAXReader();
    InputStream is = new StringBufferInputStream(template);
    Document document = saxReader.read(is);
    System.out.println(document.asXML());
}
Note also that StringBufferInputStream is deprecated. An alternative is
StringReader sr = new StringReader(template);
Document document = saxReader.read(sr);
So, the problem is not in your XML snippet, but probably in saxBuilder.build(...)

SAXParseException when &ldquo; is used in XML

I'm getting a "org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 26; The entity "ldquo" was referenced, but not declared." exception when reading an XML document. I'm reading it as follows:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xmlBody));
Document document = builder.parse(is);
And then there's an exception on builder.parse(is);
From searching I figured that it is necessary to declare some of those new entities externally, unfortunately, I cannot modify the original XML document.
How do I fix this problem?
Thanks

"From searching I figured that it is necessary to declare some of those new entities externally, unfortunately, I cannot modify the original XML document."
Well, unless you declare the entity then the document isn't XML and you won't be able to process it using an XML parser.
When you are asked to process input that isn't well-formed XML, the best approach is to fix the process that created the document (the whole idea of using XML for interchange relies on it being well-formed XML). The alternatives are to "repair" the document to turn it into well-formed XML (which you say you can't do), or to forget the fact that it was intended to be XML, and treat it as you would any proprietary non-XML format.
Not a pleasant set of choices - but that's the mess you get into when people pay lip-service to XML but fail to conform to the letter of the standard.
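If the stored document cannot be changed but the in-memory copy can, one possible "repair" is to add the missing declarations to the string just before parsing. The following is only a sketch under several assumptions: the input has no DOCTYPE of its own, only ldquo and rdquo are missing, and the helper names are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration.
class EntityDeclarationDemo {
    /**
     * Adds an internal DTD subset declaring ldquo and rdquo.
     * Assumes the input does not already contain a DOCTYPE.
     */
    static String withEntityDeclarations(String xml, String rootName) {
        String doctype = "<!DOCTYPE " + rootName + " ["
                + "<!ENTITY ldquo \"&#8220;\">"
                + "<!ENTITY rdquo \"&#8221;\">"
                + "]>";
        // A DOCTYPE has to come after the XML declaration, if there is one.
        if (xml.startsWith("<?xml")) {
            int end = xml.indexOf("?>") + 2;
            return xml.substring(0, end) + doctype + xml.substring(end);
        }
        return doctype + xml;
    }

    /** Parses the XML and returns the text content of the root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xmlBody = "<doc>&ldquo;hi&rdquo;</doc>"; // made-up sample input
        System.out.println(rootText(withEntityDeclarations(xmlBody, "doc")));
    }
}
```

This only works when the set of missing entities is known in advance. Fixing the producer of the document remains the cleaner option.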

Try
factory.setExpandEntityReferences(false);
This will prevent the parser from trying to expand entities.
EDIT: How about this http://xerces.apache.org/xerces2-j/features.html#dom.create-entity-ref-nodes -- The top of that page has an example of how to set features on the underlying parser. This should cause the parser to create entity-reference DOM nodes instead of trying to expand the entities.
