I'm trying this:
<root>
text:
</root>
But parser says:
org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 12;
Character reference "" is an invalid XML character.
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
How to use it there?
You cannot ( in XML 1.0). In XML 1.1 there is a slightly larger range of characters you can use, and the characters are expressed differently, but even then, is 'restricted' (hex it is ) which, as far as I can tell, means that it is not valid XML, even though XML parsers should process it successfully. Note that the 'null' character () is never valid. Here's a Wiki article on these characters in XML
You can try forcing the XML document to XML 1.1 and see if your parser will process it successfully.... set the first line of the XML to:
<?xml version="1.1"?>
In fact, I did that, and it works:
<?xml version="1.1"?>
<root></root>
Related
I have found this JCabi snippet code that works well with UTF-8 xml encoded files, it basically reads the xml file and then prints it as a string.
XML xml;
try {
xml = new XMLDocument(new File("test8.xml"));
String xmlString = xml.toString();
System.out.println(xmlString);
} catch (FileNotFoundException e1) {
e1.printStackTrace();
}
However I need this to run this same code on a UTF-16 encoded xml it gives me the following error:
[Fatal Error] :1:1: Content is not allowed in prolog.
Exception in thread "AWT-EventQueue-0" java.lang.IllegalArgumentException: Can't parse, most probably the XML is invalid
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
I have read about this error and this means that the parser it is not recognizing the prolog because it's seeing characters that are not supposed to be there because of the encoding.
I have tried other libraries that offer a way to "tell" the class which encoding the source file is encoded in, but the only library I was able to get it to work to some degree was JCabi, but I was not able to find a way to tell it that my source file is encoded in UTF-16.
Thanks, any help is appreciated.
The jcabi XMLDocument has various constructors including one which takes a string. So one approach is to use:
Path path = Paths.get("test16_LE_with_bom.xml");
XML xml = new XMLDocument(Files.readString(path, StandardCharsets.UTF_16LE));
String xmlString = xml.toString();
System.out.println(xmlString);
This makes use of java.nio.charset.StandardCharsets and java.nio.file.Files.
In my first test, my XML file was encoded as UTF-16-LE (and with a BOM at the start: FF FE for little-endian). The above approach handled the BOM OK.
My test file's prolog is as follows (with no explicit encoding - maybe that's a bad thing, here?):
<?xml version="1.0"?>
In my second test I removed the BOM and re-ran with the updated file - which also worked.
I used Notepad++ and a hex editor to verify/select encodings & to edit the test files.
Your file may be different from my test files (BE vs. LE).
I have this description that I get from user:
sample description with special symbols >.
I want to parse this into a valid XML format string to pass it in my REST call. Currently, if I pass this as is, my third party implementation fires an exception, saying "it cannot handle any special symbols"
I have tried XMLParser, XmlSlurper but all fire exception as
[Fatal Error] :1:1: Content is not allowed in prolog.
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
It needs to be escaped if you are sending it inside xml.
Approach 1:
change from > to > in the value.
Approach 2:
Put the string inside CDATA as shown below.
<![CDATA[sample description with special symbols >]]>
While parsing the XML using JAXB am getting error as "javax.xml.bind.UnmarshalException: An invalid XML character (Unicode: 0xffffffff) was found in the element content of the document " . Because in my xml node have some special characters like "TRÊS2115". How to handle this scenario. I need that special character values too.
This is a problem with your input data. The accented character needs to be escaped within the XML file. The code that wrote the file failed to properly encode the character.
I have a strange problem when parsing an xml request with JAXB: somehow it tries to parse more lines then exists in the string:
String xml; //content with 139 lines in xml format
MyReq req = JAXB.unmarshal(new StringReader(xml), MyReq.class);
Result:
Caused by: org.xml.sax.SAXParseException; lineNumber: 140; columnNumber: 1; Content is not allowed in trailing section.
What might be wrong with this?? The lines does not exist that is supposed to be have an error...
I can copy the xml just as it is to soapUI and execute the request successfully, thus concluding the xml is valid in general.
You should check the xml content. Most of the time Content is not allowed in trailing section error is because the content is not valid, probably some bad characters at the end of the stream.
You should print the content of the xml, with some known delimiters, to ensure that what you received is what you actually tested/expected, something like:
System.out.println("*"+xml+"*");
I am using glassfish server and the following error keeps coming:
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
at com.sun.enterprise.deployment.io.DeploymentDescriptorFile.read(DeploymentDescriptorFile.java:304)
at com.sun.enterprise.deployment.io.DeploymentDescriptorFile.read(DeploymentDescriptorFile.java:226)
at com.sun.enterprise.deployment.archivist.Archivist.readStandardDeploymentDescriptor(Archivist.java:480)
at com.sun.enterprise.deployment.archivist.Archivist.readDeploymentDescriptors(Archivist.java:305)
at com.sun.enterprise.deployment.archivist.Archivist.open(Archivist.java:213)
at com.sun.enterprise.deployment.archivist.ApplicationArchivist.openArchive(ApplicationArchivist.java:813)
at com.sun.enterprise.instance.WebModulesManager.getDescriptor(WebModulesManager.java:395)
... 65 more
Check this link
http://mark.koli.ch/2009/02/resolving-orgxmlsaxsaxparseexception-content-is-not-allowed-in-prolog.html
In short, some XML file contains the three-byte pattern (0xEF 0xBB 0xBF) at the front (right before <?xml ...?>), which is the UTF-8 byte order mark. The java's default XML parser can't handle this case.
The quick and dirty solution is to remove the white space at the front of the XML file:
String xml = "<?xml ...";
xml = xml.replaceFirst("^([\\W]+)<","<");
note that the String.trim() dost not enough, since it only trim the limited whitespace characters.