Java XML parsing error : Content is not allowed in prolog - java

My code write a XML file with the LSSerializer class :
DOMImplementation impl = doc.getImplementation();
DOMImplementationLS implLS = (DOMImplementationLS) impl.getFeature("LS","3.0");
LSSerializer ser = implLS.createLSSerializer();
String str = ser.writeToString(doc);
System.out.println(str);
String file = racine+"/"+p.getNom()+".xml";
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(file),"UTF-8");
out.write(str);
out.close();
The XML is well-formed, but when I parse it, I get an error.
Parse code :
File f = new File(racine+"/"+filename);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(f);
XPathFactory xpfactory = XPathFactory.newInstance();
XPath xp = xpfactory.newXPath();
String expression;
expression = "root/nom";
String nom = xp.evaluate(expression, doc);
The error :
[Fatal Error] Terray.xml:1:40: Content is not allowed in prolog.
9 août 2011 19:42:58 controller.MakaluController activatePatient
GRAVE: null
org.xml.sax.SAXParseException: Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:208)
at model.MakaluModel.setPatientActif(MakaluModel.java:147)
at controller.MakaluController.activatePatient(MakaluController.java:59)
at view.ListePatientsPanel.jButtonOKActionPerformed(ListePatientsPanel.java:92)
...
Now, with some research, I found that this error is dure to a "hidden" character at the very beginning of the XML.
In fact, I can fix the bug by creating a XML file manually.
But where is the error in the XML writing ? (When I try to println the string, there is no space before ths
Solution : change the serializer
I run the solution of UTF-16 encoding for a while, but it was not very stable.
So I found a new solution : change the serializer of the XML document, so that the encoding is coherent between the XML header and the file encoding. :
DOMSource domSource = new DOMSource(doc);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
String file = racine+"/"+p.getNom()+".xml";
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(file),"UTF-8");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.INDENT,"yes");
transformer.transform(domSource, new StreamResult(out));

But where is the error in the XML writing ?
Looks like the error is not in the writing but the parsing. As you have already discovered there is a blank character at the beginning of the file, which causes the error in the parse call in your stach trace:
Document doc = builder.parse(f);
The reason you do not see the space when you print it out may be simply the encoding you are using. Try changing this line:
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(file),"UTF-8");
to use 'UTF-16' or 'US-ASCII'

I think that it is probably linked to BOM (Byte Order Mark). See Wikipedia
You can verify with Notepad++ by example : Open your file and check the "Encoding" Menu to see if you're in "UTF8 without BOM" or "UTF8 with BOM".

Using UTF-16 is the way to go,
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(fileName),"UTF-16");
This can read the file with no issues

Try this code:
InputStream is = new FileInputStream(file);
Document doc = builder.parse(is , "UTF-8");

Related

Format XML file in java

I already created an xml, and i was obliged to modify it using PrintWriter Class. (copying line by line the xml file and modify what i need) but i lost my indentation.
i tried this :
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new InputStreamReader(new FileInputStream(
"notes.xml"))));
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.setOutputProperty(OutputKeys.METHOD, "xml");
xformer.setOutputProperty(OutputKeys.INDENT, "yes");
xformer.setOutputProperty("b;http://xml.apache.org/xsltd;indent-amount", "4");
//xformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
Source source = new DOMSource(document);
Result result = new StreamResult(new File("notes.xml"));
xformer.transform(source, result);
but this isn't working : having an exception in the third line :
[Fatal Error] :3:6537: XML document structures must start and end within the same entity.
org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 6537; XML document structures must start and end within the same entity.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347)
So, Is there a way to format an already created XML file in java ?

Fixing format of print statement

I recently just started working with document builder to build a title with my XML. This is the following code I am using right now:
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
InputStream inputStream = new FileInputStream(new File("c:\\staff.xml"));
org.w3c.dom.Document doc = documentBuilderFactory.newDocumentBuilder().parse(inputStream);
StringWriter stw = new StringWriter();
Transformer serializer = TransformerFactory.newInstance().newTransformer();
serializer.transform(new DOMSource(doc), new StreamResult(stw));
String xmldata = (stw.toString());
System.out.println(xmldata);
It builds the title fine, but it starts writing the XML file on the same line due to the parse. Can someone show me how I can alter this code to get <company> onto the second line?
Here is my print:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><company>
<comp id="512">
<firstname>Brandon</firstname>
<lastname>Liens</lastname>
<empid>612</empid>
<rqid>51265</rqid>
</comp>
</company>
Staff.XML file:
<company>
<comp id="512">
<firstname>Brandon</firstname>
<lastname>Nyberg</lastname>
<empid>612</empid>
<rqid>51265</rqid>
</comp>
</company>
You can't add a newline after the XML declaration automatically, but if you can deal with removing the XML declaration entirely, you can add this line just after your Transformer serializer = ... line:
serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

Prevent transformer.transform( source, result ) from escaping special character

I'm updating node and text content of the xml using DOM parser. To save that DOM parser I'm using transformer.transform method.
Below is the sample code.
String xmlText = "<uc>abcd><name>mine</name>efgh\netg<tag>sd</tag></uc>";
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
InputSource inStream = new InputSource();
inStream.setCharacterStream(new StringReader(xmlText));
Document document = documentBuilder.parse(inStream);
Node node = document.getDocumentElement();
node.normalize();
NodeList childNodes = node.getChildNodes();
for(int i=0; i<childNodes.getLength(); i++) {
if(childNodes.item(i).getNodeType() == Node.TEXT_NODE) {
System.out.println(childNodes.item(i).getTextContent());
childNodes.item(i).setTextContent("123>");
}
}
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource( document );
OutputStream xml = new ByteArrayOutputStream();
StreamResult result = new StreamResult( xml );
transformer.transform( source, result );
String formattedXml = xml.toString();
System.out.println(formattedXml);
Since my updated document is having text content like ">", transformer.transform method is changing it to &g t;
Is there a way to get the output without escaping special characters.
I can't use other parser because of some project constraints.
I can't use StringEscapeUtils.unescapeXml(). The reason is xml can have &g t;. If i use this utility method, &g t; which was originally present in the xml will also get changed.
So i want a mechanism which will not escape any special character.
The transformer you create with
Transformer transformer = tFactory.newTransformer();
is initialized with a default stylesheed that implements the identity transformation. That means it will simply serialize your DOM to a well-formed XML document. Output escaping is automatically applied where necessary.
If you want better control over the output, and possibly generate something that does not adhere to XML document structures, you can use a custom stylesheet that switches the output method to text. This way you control more of the structure but can do more mistakes in the XML area.
More information at
https://docs.oracle.com/en/java/javase/11/docs/api/java.xml/javax/xml/transform/TransformerFactory.html#newTransformer()
https://www.w3.org/TR/xslt20/#element-output

How to load XML file from internet into a string

Currently I have a java application that loads XML from a local file into a string. My code looks like this
private String xmlFile = "D:\\mylocalcomputer\\extract-2339393.xml";
String fileStr = FileUtils.readFileToString(new File(xmlFile));
How can I get the contents of the XML file if it was located on the internet, at a URL like http://mydomain.com/xml/extract-2000.xml ?
try the sax interface
private String xmlURL = "http://mydomain.com/xml/extract-2000.xml";
XMLReader reader = XMLReaderFactory.createXMLReader();
reader.setContentHandler(handler);
reader.parse(new InputSource(new URL(xmlURL).openStream()));
For more information regarding SAX check this link
Check this code:
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
InputStream inputStream = new FileInputStream(new File("http://mydomain.com/xml/extract-2000.xml"));
org.w3c.dom.Document doc = documentBuilderFactory.newDocumentBuilder().parse(inputStream);
StringWriter stw = new StringWriter();
Transformer serializer = TransformerFactory.newInstance().newTransformer();
serializer.transform(new DOMSource(doc), new StreamResult(stw));
stw.toString();

Writing XML in different character encodings with Java

I am attempting to write an XML library file that can be read again into my program.
The file writer code is as follows:
XMLBuilder builder = new XMLBuilder();
Document doc = builder.build(bookList);
DOMImplementation impl = doc.getImplementation();
DOMImplementationLS implLS = (DOMImplementationLS) impl.getFeature("LS", "3.0");
LSSerializer ser = implLS.createLSSerializer();
String out = ser.writeToString(doc);
//System.out.println(out);
try{
FileWriter fstream = new FileWriter(location);
BufferedWriter outwrite = new BufferedWriter(fstream);
outwrite.write(out);
outwrite.close();
}catch (Exception e){
}
The above code does write an xml document.
However, in the XML header, it is an attribute that the file is encoded in UTF-16.
when i read in the file, i get the error:
"content not allowed in prolog"
this error does not occur when the encoding attribute is manually changed to UTF-8.
I am trying to get the above code to write an XML document encoded in UTF-8, or successfully parse a UTF-16 file.
the code for parsing in is
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
DocumentBuilder loader = factory.newDocumentBuilder();
Document document = loader.parse(filename);
the last line returns the error.
the LSSerializer writeToString method does not allow the Serializer to pick a encoding.
with the setEncoding method of an instance of LSOutput, LSSerializer's write method can be used to change encoding. the LSOutput CharacterStream can be set to an instance of the BufferedWriter, such that calls from LSSerializer to write will write to the file.

Categories

Resources