XML parsing tries to parse more lines than exist

I have a strange problem when parsing an XML request with JAXB: somehow it tries to parse more lines than exist in the string:
String xml; //content with 139 lines in xml format
MyReq req = JAXB.unmarshal(new StringReader(xml), MyReq.class);
Result:
Caused by: org.xml.sax.SAXParseException; lineNumber: 140; columnNumber: 1; Content is not allowed in trailing section.
What might be wrong here? The line that supposedly contains the error does not exist.
I can copy the XML as-is into soapUI and execute the request successfully, so I conclude the XML is valid in general.

You should check the XML content. Most of the time, the "Content is not allowed in trailing section" error means that something follows the closing tag of the root element, probably some stray characters at the end of the stream.
Print the XML between known delimiters to confirm that what you received is what you actually tested and expected, for example:
System.out.println("*"+xml+"*");
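Invisible characters are hard to spot even between delimiters, so it can help to print the code points of the last few characters as well. The following is a minimal Java sketch under that assumption; the class XmlTailInspector and its two methods are hypothetical helpers, not part of JAXB, and dropTrailingContent is only a workaround for cases where the sender cannot be fixed.

```java
class XmlTailInspector {

    // Lists the code units of the last n characters, for example "U+003E U+000A U+0058",
    // so that stray characters after the root element become visible.
    static String describeTail(String xml, int n) {
        int start = Math.max(0, xml.length() - n);
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < xml.length(); i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) xml.charAt(i)));
        }
        return sb.toString();
    }

    // Workaround: cut everything after the last '>' in the string.
    // Assumes the junk itself contains no '>' character.
    static String dropTrailingContent(String xml) {
        int end = xml.lastIndexOf('>');
        return end < 0 ? xml : xml.substring(0, end + 1);
    }

    public static void main(String[] args) {
        String xml = "<a/>\nX";
        System.out.println(describeTail(xml, 3));
        System.out.println("*" + dropTrailingContent(xml) + "*");
    }
}
```

If describeTail reports anything other than whitespace after the final U+003E, that content is the likely cause of the error on the nonexistent line.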

Related

Reading UTF-16 XML files with JCabi Java

I found this jcabi code snippet that works well with UTF-8 encoded XML files. It reads the XML file and then prints it as a string.
XML xml;
try {
    xml = new XMLDocument(new File("test8.xml"));
    String xmlString = xml.toString();
    System.out.println(xmlString);
} catch (FileNotFoundException e1) {
    e1.printStackTrace();
}
However, when I run the same code on a UTF-16 encoded XML file, it gives me the following error:
[Fatal Error] :1:1: Content is not allowed in prolog.
Exception in thread "AWT-EventQueue-0" java.lang.IllegalArgumentException: Can't parse, most probably the XML is invalid
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
I have read about this error. It means the parser does not recognize the prolog because, due to the encoding, it sees characters that are not supposed to be there.
I have tried other libraries that offer a way to tell the class which encoding the source file uses, but the only library I got to work to some degree was jcabi, and I could not find a way to tell it that my source file is encoded in UTF-16.
Thanks, any help is appreciated.

The jcabi XMLDocument class has several constructors, including one that takes a String. So one approach is:
Path path = Paths.get("test16_LE_with_bom.xml");
XML xml = new XMLDocument(Files.readString(path, StandardCharsets.UTF_16LE));
String xmlString = xml.toString();
System.out.println(xmlString);
This makes use of java.nio.charset.StandardCharsets and java.nio.file.Files (Files.readString requires Java 11 or later).
In my first test, my XML file was encoded as UTF-16-LE (and with a BOM at the start: FF FE for little-endian). The above approach handled the BOM OK.
My test file's prolog is as follows (with no explicit encoding - maybe that's a bad thing, here?):
<?xml version="1.0"?>
In my second test I removed the BOM and re-ran with the updated file - which also worked.
I used Notepad++ and a hex editor to verify/select encodings & to edit the test files.
Your file may be different from my test files (BE vs. LE).
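If it is not known in advance whether a file is UTF-16LE, UTF-16BE or UTF-8, the byte order mark can be inspected before decoding. The following is a rough Java sketch of that idea; the class BomAwareDecoder is a hypothetical helper, and falling back to UTF-8 when no BOM is present is an assumption that will not fit every file (a BOM-less UTF-16 file, as in the second test above, would still need an explicit charset).

```java
import java.nio.charset.StandardCharsets;

class BomAwareDecoder {

    // Decodes raw file bytes according to the BOM and drops the BOM itself.
    // Assumption: input without a BOM is treated as UTF-8.
    static String decode(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return new String(b, 3, b.length - 3, StandardCharsets.UTF_8);
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return new String(b, 2, b.length - 2, StandardCharsets.UTF_16LE);
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return new String(b, 2, b.length - 2, StandardCharsets.UTF_16BE);
        }
        return new String(b, StandardCharsets.UTF_8);
    }
}
```

The decoded string could then be passed to the String constructor of XMLDocument, with the bytes obtained from Files.readAllBytes(path).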

Getting Premature end of file when building an XML object from a URL

I'm trying to run the following code:
val podcastXml = XML.load(new URL(feed))
where the feed in question is https://fourfingerdiscount.podbean.com/feed/
I'm able to load the feed in my browser fine, but I get this error when I run that code against it:
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Premature end of file.
Interestingly enough, when I try to curl the feed URL, the response is empty.
Any idea what I'm doing wrong here? I feel like there's a config option that I'm somehow missing.
Also worth mentioning, some of the feeds work fine such as http://maximumfun.org/feeds/are.xml

Reading from HTTPS URLs is not as straightforward as calling new URL(feed). Here the call does not return the desired XML but an empty response, hence the SAXParseException.
The data can still be retrieved. One way is the plain HttpsURLConnection, but there are also wrapper libraries that make this easier, for example scalaj-http. With it, you can proceed as follows:
import scala.xml.XML
import scalaj.http.Http
val url = "https://fourfingerdiscount.podbean.com/feed/"
val xml = Http(url).execute(XML.load).body
// xml is a parsed scala.xml.Elem with the contents you were looking for
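Since the exception comes from handing an empty response to the parser, it can also help to check the body before parsing, whichever HTTP client fetched it. The following is a small sketch in Java rather than Scala, using only the JDK's built-in parser; the class FeedBodyParser is a hypothetical helper, and how the bytes are downloaded is left out on purpose.

```java
import java.io.ByteArrayInputStream;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class FeedBodyParser {

    // Returns an empty Optional for an empty body instead of letting the
    // parser fail with "Premature end of file".
    static Optional<Document> parseIfPresent(byte[] body) {
        if (body == null || body.length == 0) {
            return Optional.empty();
        }
        try {
            return Optional.of(DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(body)));
        } catch (Exception e) {
            // Non-empty but malformed input is still reported as an error.
            throw new IllegalArgumentException("Response body is not well-formed XML", e);
        }
    }
}
```

An empty result then points at the HTTP request (headers, redirects, TLS) rather than at the XML itself.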

Groovy: parse string with special symbols

I have this description that I get from the user:
sample description with special symbols >.
I want to turn this into a valid XML string to pass in my REST call. Currently, if I pass it as is, the third-party implementation throws an exception saying "it cannot handle any special symbols".
I have tried XmlParser and XmlSlurper, but both throw this exception:
[Fatal Error] :1:1: Content is not allowed in prolog.
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.

It needs to be escaped if you are sending it inside XML.
Approach 1:
Replace > with &gt; in the value.
Approach 2:
Put the string inside a CDATA section as shown below.
<![CDATA[sample description with special symbols >]]>
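Approach 1 can be sketched as a small escaping routine. The version below is written in Java, which Groovy code can call directly; the class XmlTextEscaper is a hypothetical helper that replaces the five characters with predefined XML entities, and it does not deal with control characters that XML forbids outright.

```java
class XmlTextEscaper {

    // Replaces the five characters that have predefined XML entities.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("sample description with special symbols >."));
    }
}
```

Escaping is meant for text that goes into an element or attribute value. The raw description is not an XML document on its own, which is why feeding it to an XML parser fails in the prolog.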

JAXB Getting error While parsing the XML

While parsing XML with JAXB I get the error "javax.xml.bind.UnmarshalException: An invalid XML character (Unicode: 0xffffffff) was found in the element content of the document". My XML nodes contain special characters such as "TRÊS2115". How can I handle this? I need to keep those special characters.

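This is a problem with your input data. The code that wrote the file failed to encode the accented character properly, so the bytes do not match the encoding the parser expects. Fix the writer so that the file is saved in its declared encoding, or escape the character within the XML file as a numeric character reference (&#xCA; for Ê).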
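When the producer of the file cannot be changed, one possible workaround is to decode the bytes yourself with the charset the file was really written in and hand the parser a Reader. The following is a minimal Java sketch of that idea; the class CharsetAwareXmlReader is a hypothetical helper, and ISO-8859-1 in the usage is purely an assumption about the input, since the question does not say which encoding was used.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class CharsetAwareXmlReader {

    // Decodes the bytes with the given charset and returns the text of the root element.
    static String rootText(byte[] xmlBytes, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(xmlBytes), charset)) {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML with charset " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Assumed input: the sample value written as ISO-8859-1 bytes.
        byte[] bytes = "<name>TR\u00CAS2115</name>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

A Reader built this way could equally be passed to JAXB.unmarshal, which accepts a Reader as its source.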

about SAXparseException: content is not allowed in prolog

I am using the GlassFish server and the following error keeps occurring:
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
at com.sun.enterprise.deployment.io.DeploymentDescriptorFile.read(DeploymentDescriptorFile.java:304)
at com.sun.enterprise.deployment.io.DeploymentDescriptorFile.read(DeploymentDescriptorFile.java:226)
at com.sun.enterprise.deployment.archivist.Archivist.readStandardDeploymentDescriptor(Archivist.java:480)
at com.sun.enterprise.deployment.archivist.Archivist.readDeploymentDescriptors(Archivist.java:305)
at com.sun.enterprise.deployment.archivist.Archivist.open(Archivist.java:213)
at com.sun.enterprise.deployment.archivist.ApplicationArchivist.openArchive(ApplicationArchivist.java:813)
at com.sun.enterprise.instance.WebModulesManager.getDescriptor(WebModulesManager.java:395)
... 65 more

Check this link
http://mark.koli.ch/2009/02/resolving-orgxmlsaxsaxparseexception-content-is-not-allowed-in-prolog.html
In short, some XML files contain the three-byte pattern 0xEF 0xBB 0xBF at the front (right before <?xml ...?>), which is the UTF-8 byte order mark (BOM). Java's default XML parser may fail to handle this case.
The quick and dirty solution is to strip any non-word characters in front of the first < in the XML string:
String xml = "<?xml ...";
xml = xml.replaceFirst("^([\\W]+)<","<");
Note that String.trim() is not enough, since it only removes characters up to U+0020, and the BOM character (U+FEFF) is not one of them.
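A narrower alternative to the regex is to remove only a leading BOM character and leave everything else untouched. The sketch below assumes the XML is already held in a String; the class LeadingBomStripper is a hypothetical helper, and it does not help when the descriptor file is read directly by the server, in which case the file itself has to be saved without a BOM.

```java
class LeadingBomStripper {

    private static final char BOM = '\uFEFF';

    // Removes a single leading BOM character if present.
    static String strip(String xml) {
        return (!xml.isEmpty() && xml.charAt(0) == BOM) ? xml.substring(1) : xml;
    }

    public static void main(String[] args) {
        String withBom = BOM + "<?xml version=\"1.0\"?><a/>";
        // trim() leaves the BOM in place, strip() removes it.
        System.out.println(withBom.trim().charAt(0) == BOM);
        System.out.println(strip(withBom).startsWith("<?xml"));
    }
}
```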
