While parsing XML with JAXB I am getting the error "javax.xml.bind.UnmarshalException: An invalid XML character (Unicode: 0xffffffff) was found in the element content of the document". It happens because one of my XML nodes contains special characters, for example "TRÊS2115". How can I handle this scenario? I need the values with the special characters too.
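This is a problem with your input data. The accented character is not stored correctly in the XML file: it has to be written in the encoding the file declares (or escaped as a character reference such as &#202;). The code that wrote the file failed to properly encode the character.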
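If the file cannot be regenerated, one workaround is to decode the bytes yourself and hand the parser characters instead of bytes. The sketch below assumes the file was actually written as ISO-8859-1 (that is a guess, not something the question confirms); the class and method names are made up for illustration, and a plain DOM parser stands in for JAXB so that it runs on the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class ExplicitEncodingDemo {

    // Decodes the bytes with the charset the file was really written in,
    // then parses from the Reader so the parser never guesses the encoding.
    static String rootText(byte[] xmlBytes, Charset actualCharset) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(xmlBytes), actualCharset)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // "TRÊS2115" stored as ISO-8859-1: the Ê is the single byte 0xCA,
        // which is not a complete UTF-8 sequence.
        byte[] latin1 = "<Nm>TR\u00CAS2115</Nm>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

With JAXB the same idea should carry over, since Unmarshaller.unmarshal also accepts a java.io.Reader.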
Related
For my customer I have to unmarshal an XML file (received from an external service) into Java entities and save them in a database.
For that I am using a simple JAXB method that does the job.
I have an issue with the XML file I received: I don't understand why the acute accent character doesn't show correctly in it.
It is encoded in UTF-8 in Unix (LF).
The acute accent is displayed like this in the file:
When I copy it and paste it into a new file, it is displayed correctly.
The problem is that when JAXB processes the file, I get this error:
org.springframework.dao.DataAccessResourceFailureException: Error reading XML stream; nested exception is javax.xml.stream.XMLStreamException: ParseError at [row,col]:[14642,669]
Message: The element type "Nm" must be terminated by the matching end-tag "</Nm>".
It's not an end-tag issue; the tag is correctly closed. When I replace this "XB4" character with another one, it works properly.
Java file encoding format is UTF-8.
Does anyone have an idea?
Thanks a lot.
I'm using org.w3c and javax.xml.parsers in Java for reading and writing xml files.
When I read an XML file, the &#10; escaped line breaks are replaced by real line breaks. When I write the content back to the file, I lose the escaping and the content of the file changes unintentionally.
So
<somenode>First line.&#10;Second line</somenode>
will be replaced by:
<somenode>First line.
Second line</somenode>
Before writing the XML content back to disk I tried:
String content = node.getTextContent().replace("\n", "&#10;");
node.setTextContent(content);
Of course it does not work: it will be escaped to &amp;#10; in the file.
I do not want to litter the file with CDATA tags!
What I want to do is legal XML output so there has to be a way to do it.
Thanks in advance for any ideas :)
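For what it's worth, inside element content the two spellings carry the same information. The sketch below assumes the escaped line break is the character reference &#10; and uses a made-up helper class; it only shows that a DOM parser returns the same text for both forms, which is why the serializer feels free to write a literal line break.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class LineBreakReferenceDemo {

    // Parses the XML string and returns the text content of the root element.
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String escaped = "<somenode>First line.&#10;Second line</somenode>";
        String literal = "<somenode>First line.\nSecond line</somenode>";
        // Both documents produce the same DOM text: "First line.\nSecond line"
        System.out.println(textOf(escaped).equals(textOf(literal)));
    }
}
```

Note that this equivalence holds for element content; in attribute values a literal line break is normalized to a space, so there the escaping does matter.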
Do it by setting the following property for the JAXB Marshaller:
marshaller.setProperty("jaxb.encoding", "Unicode");
How do you determine whether a Document object in Java contains valid XML? Is this checked when the object is constructed?
I can't seem to find any information on this in
http://docs.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/Document.html
How do you determine whether you have a valid XML Document without using external libraries?
Note: I received this Document object by parsing from an input stream with a DOM XML parser.
Use the Java DOM API. The parser accepts any well-formed XML document and throws an exception otherwise, so a Document that came out of a successful parse is already well-formed. You need no external libraries for DOM.
In case of an error the exception message looks like this:
[Fatal Error] MyXMLFile.xml:6:2: The end-tag for element type "lastname" must end with a '>' delimiter.
The end-tag for element type "lastname" must end with a '>' delimiter.
BUILD SUCCESSFUL (total time: 0 seconds)
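A minimal sketch of that check, using only the JDK. The class and method names are invented for the example, and the XML is passed in as a String for brevity; with an input stream you would pass the stream to parse() instead.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
class WellFormedCheck {

    // Returns true if the parser accepts the text, false if it reports a parse error.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing
            // "[Fatal Error] ..." to stderr as the built-in handler does.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<person><lastname>Doe</lastname></person>"));
        System.out.println(isWellFormed("<person><lastname>Doe</lastname</person>"));
    }
}
```

This only checks well-formedness. Validating against a DTD or schema is a separate step that this sketch does not attempt.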
While parsing an XML file, StAX produces an error:
Unicode(0xb) error-An invalid XML character (Unicode: 0xb) was found in the element content of the document.
The XML line in question contains a special character that shows up as "VI". It's not an alphabetical character: when you copy and paste it into Notepad, you get some symbol. I tried parsing it with StAX, and it showed the error above.
Please can somebody give me a solution for this?
Thanks in advance.
0xB (vertical tab) is not a valid character in XML. The only valid characters before ASCII 32 (0x20, space) are 0x9 (tab), 0xA (line feed) and 0xD (carriage return).
In short, what you are trying to parse is NOT XML.
This error appears whenever an invalid XML character ends up in the XML. When you open the file in Notepad++, such characters show up as VT, SOH, FF and the like. I am using XML version 1.0, and I validate text data before storing it in the database with this pattern:
// The escapes are left to the regex engine (double backslash): in Java source "\u000A" would break the string literal, and "\u10000" means U+1000 followed by "0".
Pattern p = Pattern.compile("[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]+");
retunContent = p.matcher(retunContent).replaceAll("");
It ensures that no invalid character gets into the XML.
According to the XML W3C Recommendation 0xb is not allowed in an XML file:
Character Range
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
So strictly speaking your input file is not an XML file.
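The Char production quoted above translates almost literally into a code point test. The following is a sketch only; the class and method names are invented, and silently dropping characters is just one possible policy (rejecting the input may be more appropriate).

```java
// Hypothetical helper, for illustration only.
class XmlChars {

    // Direct transcription of production [2] Char from the XML 1.0 Recommendation.
    static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Drops every code point that XML 1.0 does not allow.
    // Iterating by code point keeps valid surrogate pairs intact
    // and removes unpaired surrogates.
    static String stripInvalid(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
                .filter(XmlChars::isXmlChar)
                .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "VI\u000BTA"; // contains a vertical tab (0xB)
        System.out.println(stripInvalid(dirty));
    }
}
```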
I am using GlassFish server and the following error keeps coming up:
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
at com.sun.enterprise.deployment.io.DeploymentDescriptorFile.read(DeploymentDescriptorFile.java:304)
at com.sun.enterprise.deployment.io.DeploymentDescriptorFile.read(DeploymentDescriptorFile.java:226)
at com.sun.enterprise.deployment.archivist.Archivist.readStandardDeploymentDescriptor(Archivist.java:480)
at com.sun.enterprise.deployment.archivist.Archivist.readDeploymentDescriptors(Archivist.java:305)
at com.sun.enterprise.deployment.archivist.Archivist.open(Archivist.java:213)
at com.sun.enterprise.deployment.archivist.ApplicationArchivist.openArchive(ApplicationArchivist.java:813)
at com.sun.enterprise.instance.WebModulesManager.getDescriptor(WebModulesManager.java:395)
... 65 more
Check this link
http://mark.koli.ch/2009/02/resolving-orgxmlsaxsaxparseexception-content-is-not-allowed-in-prolog.html
In short, some XML files contain the three-byte pattern (0xEF 0xBB 0xBF) at the front (right before <?xml ...?>), which is the UTF-8 byte order mark. Java's default XML parser can trip over it, typically when the content reaches it as already-decoded text, so that the mark arrives as the character U+FEFF in front of the prolog.
The quick and dirty solution is to remove whatever sits in front of the first "<" of the XML:
String xml = "<?xml ...";
xml = xml.replaceFirst("^([\\W]+)<","<");
Note that String.trim() is not enough, since it only trims a limited set of whitespace characters, and the byte order mark is not one of them.
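A slightly more targeted variant removes only the byte order mark and leaves everything else alone. This is a sketch with invented names; it assumes the XML is already in a String (for example after decoding the file as UTF-8, which keeps the mark as the character U+FEFF).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class BomStripper {

    // Removes a single leading U+FEFF, if present; other content is untouched.
    static String stripBom(String xml) {
        return xml.startsWith("\uFEFF") ? xml.substring(1) : xml;
    }

    public static void main(String[] args) {
        // 0xEF 0xBB 0xBF followed by a minimal document, as it would sit on disk.
        byte[] fileBytes = {
            (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'
        };
        String decoded = new String(fileBytes, StandardCharsets.UTF_8);
        String cleaned = stripBom(decoded);
        System.out.println(decoded.length() + " -> " + cleaned.length());
    }
}
```

Unlike the regular expression, this does not touch stray whitespace or other characters in front of the prolog, so those would still have to be handled separately.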