Java XML Document converting &quot; to " (literal quote) upon parsing/converting to Document


I need to send a request to a SOAP web service that requires the root tag to contain XML data. This is the XML that I'm trying to send:
<root>&lt;test key=&quot;Applicants&quot;&gt;this is a data&lt;/test&gt;</root>
I need to append this to the SoapBody object as a document with this code:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setExpandEntityReferences(false);
DocumentBuilder builder = factory.newDocumentBuilder();
Document result = builder.parse(new ByteArrayInputStream(request.getRequest().getBytes()));
Then adding it to the SoapBody to be sent to the webservice.
However, upon sending this request and tracing the logs, it's actually reverting the &quot; entity to literal quotes (").
This is the xml being sent:
<root>&lt;test key="Applicants"&gt;this is a data&lt;/test&gt;</root>
As you can see, the &quot; is being transformed to literal quotes. How can I keep the original data within the root tag (which has the &quot;)? It seems to be transformed when I convert it to a Document object.
Would appreciate any help. Thanks.
Edit:
The web service actually requires this format (according to their documentation and sample XML requests). If this isn't possible, is it a limitation? Should I use another framework?

The &quot; and " are completely equivalent in this context. You haven't actually said whether this is causing a problem: if it is, then it's because some recipient of the XML isn't processing it correctly. Incidentally, it would also be legitimate to convert the &gt; to >.
When you parse XML and re-serialise it, irrelevant details like redundant whitespace get lost - just as if you copy this text into your text editor, the line-wrapping and font size gets lost.
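The equivalence can be checked directly. The sketch below assumes the payload inside the root element is escaped markup (the strings are illustrative, not the asker's real request) and parses it twice, once with &quot; and once with a literal quote. The class and method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: shows that &quot; and " produce the same text node.
class QuoteEquivalenceDemo {

    // Parses the XML string and returns the text content of the root element.
    static String rootText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String withEntity =
                "<root>&lt;test key=&quot;Applicants&quot;&gt;data&lt;/test&gt;</root>";
        String withLiteral =
                "<root>&lt;test key=\"Applicants\"&gt;data&lt;/test&gt;</root>";
        // Both inputs describe the same character data once parsed.
        System.out.println(rootText(withEntity).equals(rootText(withLiteral)));
    }
}
```

A conforming XML consumer sees identical character data in both cases, which is why a serializer is free to write either form in element content.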

Related

Decoding a base64 XML cuts off the last part

I have a base64 encoded string, which represents an XML Schema (xsd). I decode this using Apache's Base64 utilities, put the resulting byte array into an InputSource and let an XMLSchemaCollection read this InputSource:
String base64String = ......
byte[] decoded = Base64.decodeBase64(base64String);
InputSource inputSource = new InputSource(new ByteArrayInputStream(decoded));
xmlSchemaCollection.read(inputSource, new ValidationEventHandler());
This gives an error:
XML document structure must start and end within the same entity
Which usually means the XML structure isn't valid. I performed two tests to see what the base64 actually holds. First is printing it out to the console:
System.out.println(new String(decoded,"UTF-8"));
In Eclipse, I see my XML is suddenly cut off, as if part of it is missing. However, if I use an online tool such as https://www.base64decode.org/ and paste in my base64, I see the complete XML. If I validate this XML, the validation succeeds. So I'm a bit confused as to why Eclipse seemingly cuts off my XML after decoding.

Errors like this are usually indicative of a badly formatted document:
XML document structures must start and end within the same entity...
A few things you can do to debug this:
1. Print out the XML document to a log and run it through some sort of XML validator.
2. Check to make sure that there are no invalid characters (e.g. UTF-16 characters in a UTF-8 document).
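The first debugging step can be done in code rather than by reading console output. This is only a sketch: it uses the JDK's java.util.Base64 instead of the Apache utilities from the question, and the class name, method names and sample XML are invented for illustration. It decodes the string, reports the decoded length, and asks a parser whether the bytes are well-formed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper: decode Base64 and check the result with an XML parser.
class Base64XmlCheck {

    // Decodes Base64 text; the MIME decoder tolerates line breaks in the input.
    static byte[] decode(String base64) {
        return Base64.getMimeDecoder().decode(base64);
    }

    // Returns true if the bytes parse as well-formed XML, false otherwise.
    static boolean isWellFormed(byte[] xmlBytes) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Sample data standing in for the real schema.
        String xml = "<schema><element name=\"a\"/></schema>";
        String encoded = Base64.getEncoder()
                .encodeToString(xml.getBytes(StandardCharsets.UTF_8));

        byte[] decoded = decode(encoded);
        System.out.println("decoded bytes: " + decoded.length);
        System.out.println("well-formed: " + isWellFormed(decoded));

        // Simulate a payload whose tail is missing.
        byte[] truncated = Arrays.copyOf(decoded, decoded.length - 5);
        System.out.println("truncated well-formed: " + isWellFormed(truncated));
    }
}
```

Comparing the decoded byte count with the size of the document produced by another decoder shows whether data is really missing or whether only the console display is shortened.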

Extracting elements from an HTTP XML response--HTTP Client & Java

So I've gotten help from here already so I figured why not try it out again!? Any suggestions would be greatly appreciated.
I'm using HTTP client and making a POST request; the response is an XML body that looks like the following:
<?xml version="1.0" encoding="UTF-8"?>
<CartLink
xmlns="http://api.gsicommerce.com/schema/ews/1.0">
<Name>vSisFfYlAPwAAAE_CPBZ3qYh</Name>
<Uri>carts/vSisFfYlAPwAAAE_CPBZ3qYh</Uri>
</CartLink>
Now...
I have an HttpEntity which is
[HttpResponse].getEntity().
Then I get a String representation of the response (which is XML in this case) by saying
String content = EntityUtils.toString(HttpEntity)
I tried following some of the suggestions on this post: How to create a XML object from String in Java? but it did not seem to work for me. When I built up the document it still appeared to be null.
MY END GOAL here is just to get the NAME field.. i.e. the "vSisFfYlAPwAAAE_CPBZ3qYh" part. So do I want to build up a document and then extract it...? Or is there a simpler way? I've been trying different things and I can't seem to get it to work.
Thanks for all of the help guys, it is most appreciated!!

Instead of trying to extract the value with string manipulation, use Java's built-in ability to parse XML. That's a much better approach: the response is XML, so let an XML parser read it. :)
Here's probably one way to solve your problem:
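import java.io.StringReader;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Parse the response using DocumentBuilder so you can get at elements easily.
// "content" is the String from EntityUtils.toString(...). Wrap it in an InputSource,
// because DocumentBuilder.parse(String) treats its argument as a URI, not as XML text.
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse(new InputSource(new StringReader(content)));
Element root = doc.getDocumentElement();
// Now let's say you have not one, but 'n' nodes that contain the value
// you're looking for. Use NodeList to get a list of all those nodes and just
// pull out the tag/attribute's value you want.
NodeList nameNodesList = doc.getElementsByTagName("Name");
// Initialise the list; leaving it null would throw a NullPointerException on add().
ArrayList<String> nameValues = new ArrayList<>();
// Now iterate through the NodeList to get the values you want.
for (int i = 0; i < nameNodesList.getLength(); i++) {
    nameValues.add(nameNodesList.item(i).getTextContent());
}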
The ArrayList "nameValues" will now hold every single value contained within "Name" tags. You could also create a HashMap to store a key value pair of Nodes and their respective text contents.
Hope this helps.

Parsing xml without namespace

I have a parsing problem that appears when I try to parse a String containing XML into an org.w3c.dom.Document.
Here is an example of an XML String that I'm trying to parse:
<enviNFe xmlns="http://www.portalfiscal.inf.br/nfe" versao="2.00">
<idLote>123</idLote>
<NFe xmlns="http://www.portalfiscal.inf.br/nfe">
...
</NFe>
</enviNFe>
The problem is that, after the String has been parsed by the following code:
private Document documentFactory(String xml) throws SAXException,
        IOException, ParserConfigurationException, DocumentException, TransformerException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    Document document = factory.newDocumentBuilder().parse(
            new ByteArrayInputStream(xml.getBytes()));
    return document;
}
The tag NFe loads without the namespace (xmlns="http://www.portalfiscal.inf.br/nfe")
I want to know why this happens, and what I could do to solve this.
Any help will be great.
Thanks, and sorry for my English.
------EDIT----
For better understanding:
This XML will be signed right after the parsing and will be sent to a government server (Brazil).
After this, I make another request to this server to verify whether it was processed or not. If it was, I will get a positive response in case of any error.
The first problem I had was that the XML was malformed. This happened because I was sending the XML without that namespace in the NFe tag.
To solve this, I added the namespace directly in the file, after the XML had been signed.
That problem was in fact solved, but another occurred: the signature no longer matches,
because I sign the XML without the namespace and send it with the namespace.

From what I can put together from your various comments, I think you are misunderstanding how XML works. You indicate that you manually added the namespace to the NFe element. However, in your XML example, the NFe node already has that namespace.
In this xml:
<enviNFe xmlns="http://www.portalfiscal.inf.br/nfe" versao="2.00">
<idLote>123</idLote>
<NFe>
...
</NFe>
</enviNFe>
all of the nodes have the "http://www.portalfiscal.inf.br/nfe" namespace. By putting the xmlns="..." attribute on the parent node, the namespace is applied to that node and all of the child nodes with the same prefix (in this case, no prefix).
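The inheritance of the default namespace can be confirmed with a namespace-aware parser. The following is a minimal sketch, with a made-up class name and a shortened version of the XML above in which the inner NFe element carries no xmlns attribute of its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: a child element inherits the parent's default namespace.
class DefaultNamespaceDemo {

    static final String NFE_NS = "http://www.portalfiscal.inf.br/nfe";

    // Returns the namespace URI reported for the first element with the given
    // local name, or null if no such element exists.
    static String namespaceOf(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagNameNS("*", localName);
            return matches.getLength() == 0 ? null : matches.item(0).getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // NFe has no xmlns attribute here; only the root declares the namespace.
        String xml = "<enviNFe xmlns=\"" + NFE_NS + "\" versao=\"2.00\">"
                + "<idLote>123</idLote><NFe></NFe></enviNFe>";
        System.out.println(NFE_NS.equals(namespaceOf(xml, "NFe")));
    }
}
```

The redundant xmlns attribute on NFe may be dropped when the document is serialized again, but the element's namespace is unchanged.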

It is returning the correct document. To test it you can just walk through your document.
doc.getFirstChild().getFirstChild().getNextSibling().getNextSibling().getNextSibling().getNamespaceURI();
Or try to get the tag by its name:
NodeList tags = doc.getElementsByTagNameNS("http://www.portalfiscal.inf.br/nfe", "NFe");

Get an XML from a url using dom

I have stored in a String variable (link) the URL from which I get the XML response, and I use a DOM parser to parse the XML data.
To be sure that I extract the data correctly, I first stored the XML on the local drive, built my parser and read the data:
document = builder.parse(new File(filepath));
So when I try to get it from url I used:
document = builder.parse(new URL(link).openStream());
And it didn't work. What am I missing?
The data from the XML is stored in a list, which is then shown in a JSF datatable.

Well, the above works just fine; the problem was the index of the elements in the NodeList. For some reason, when I was reading from the file I had to use:
obj.setattribute1(cDetails.item(1).getTextContent());
obj.setattribute2(cDetails.item(3).getTextContent());
Note that the item index increases by 2 each time.
Now that I read from a URL, the index increases by 1 each time.
I am sure there is a reason for this that I don't understand, probably because of my still limited knowledge, but the above works when the item index increases by 1 for the next item in the NodeList.
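A likely explanation, assuming the local file was indented and the URL response was not, is that a DOM parser keeps the whitespace between elements as text nodes. The sketch below uses invented sample XML and a made-up class name to list the child node kinds for both layouts.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: whitespace between elements becomes text nodes.
class WhitespaceNodesDemo {

    // Returns one letter per child of the root: E for element, T for text, ? otherwise.
    static String childKinds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            StringBuilder kinds = new StringBuilder();
            for (int i = 0; i < children.getLength(); i++) {
                short type = children.item(i).getNodeType();
                kinds.append(type == Node.ELEMENT_NODE ? 'E'
                        : type == Node.TEXT_NODE ? 'T' : '?');
            }
            return kinds.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String indented = "<c>\n  <a>1</a>\n  <b>2</b>\n</c>";
        String compact = "<c><a>1</a><b>2</b></c>";
        System.out.println(childKinds(indented)); // prints TETET: elements at 1 and 3
        System.out.println(childKinds(compact));  // prints EE: elements at 0 and 1
    }
}
```

Selecting children by element name, or skipping nodes whose type is not Node.ELEMENT_NODE, avoids depending on either layout.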

SAXParseException when &ldquo; is used in XML

I'm getting a "org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 26; The entity "ldquo" was referenced, but not declared." exception when reading an XML document. I'm reading it as follows:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xmlBody));
Document document = builder.parse(is);
And then there's an exception on builder.parse(is);
From searching I figured that it is necessary to declare some of those new entities externally, unfortunately, I cannot modify the original XML document.
How do I fix this problem?
Thanks

From searching I figured that it is necessary to declare some of those new entities externally, unfortunately, I cannot modify the original XML document.
Well, unless you declare the entity then the document isn't XML and you won't be able to process it using an XML parser.
When you are asked to process input that isn't well-formed XML, the best approach is to fix the process that created the document (the whole idea of using XML for interchange relies on it being well-formed XML). The alternatives are to "repair" the document to turn it into well-formed XML (which you say you can't do), or to forget the fact that it was intended to be XML, and treat it as you would any proprietary non-XML format.
Not a pleasant set of choices - but that's the mess you get into when people pay lip-service to XML but fail to conform to the letter of the standard.
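If the original document cannot be changed at its source, the "repair" option can still be applied to the in-memory string before parsing. The sketch below is one way to do that, under loud assumptions: the input has no XML declaration and no DOCTYPE of its own, only ldquo and rdquo are missing, and the class, method and sample names are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: declare missing entities in an internal DTD subset.
class EntityRepairDemo {

    // Prepends a DOCTYPE that declares the entities the document uses.
    // Assumes xmlBody has no XML declaration and no DOCTYPE already.
    static String declareEntities(String xmlBody, String rootName) {
        return "<!DOCTYPE " + rootName + " ["
                + "<!ENTITY ldquo \"&#8220;\">"
                + "<!ENTITY rdquo \"&#8221;\">"
                + "]>" + xmlBody;
    }

    // Parses the XML string and returns the text content of the root element.
    static String parseText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String body = "<doc><p>&ldquo;hi&rdquo;</p></doc>";
        String repaired = declareEntities(body, "doc");
        // The entities now expand to the curly quote characters.
        System.out.println(parseText(repaired).equals("\u201Chi\u201D"));
    }
}
```

This only works where DOCTYPE declarations are allowed by the parser configuration, so it may not suit code that disables DTDs for security reasons.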

Try
factory.setExpandEntityReferences(false);
This will prevent the parser from trying to expand entities.
EDIT: How about this http://xerces.apache.org/xerces2-j/features.html#dom.create-entity-ref-nodes -- The top of that page has an example of how to set features on the underlying parser. This should cause the parser to create entity-reference DOM nodes instead of trying to expand the entities.
