Forcing escaped characters when writing to XML - java

I'm using org.w3c and javax.xml.parsers in Java for reading and writing xml files.
When I read an XML file, the escaped line breaks are replaced by real line breaks. When I write the content back to the file, I lose the escaping, and the content of the file changes unintentionally.
So
<somenode>First line.&#10;Second line</somenode>
will be replaced by:
<somenode>First line.
Second line.</somenode>
Before writing the XML content back to disk, I tried:
String content = node.getTextContent().replace("\n", "&#10;");
node.setTextContent(content);
Of course it does not work: the ampersand gets escaped, so the file ends up containing &amp;#10;.
I do not want to litter the file with CDATA tags!
What I want to do is legal XML output so there has to be a way to do it.
Thanks in advance for any ideas :)

Do it by setting the following property for the JAXB Marshaller:
marshaller.setProperty("jaxb.encoding", "Unicode");
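The question itself builds and writes the document with the DOM API (org.w3c.dom, javax.xml.parsers) rather than JAXB. One DOM-side workaround, sketched here under the assumption that you control the serialization step, is a small hand-written serializer that writes '\n' and '\r' in text and attribute values as the character references &#10; and &#13;, so line breaks survive a read/write cycle without CDATA. The class and method names are hypothetical, and the sketch handles only elements, attributes and text: comments, processing instructions and the XML declaration are dropped, and CDATA content is written as escaped text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LineBreakEscapingWriter {

    // Escapes markup characters and writes line breaks as numeric character
    // references, so a '\n' in the DOM comes out as &#10; instead of a real newline.
    static String escape(String text, boolean inAttribute) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append(inAttribute ? "&quot;" : "\""); break;
                case '\n': sb.append("&#10;"); break;
                case '\r': sb.append("&#13;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Writes elements, attributes and text; other node types are skipped in this sketch.
    static void write(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                out.append('<').append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    out.append(' ').append(attr.getNodeName()).append("=\"")
                       .append(escape(attr.getNodeValue(), true)).append('"');
                }
                out.append('>');
                NodeList children = node.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    write(children.item(i), out);
                }
                out.append("</").append(node.getNodeName()).append('>');
                break;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                out.append(escape(node.getNodeValue(), false));
                break;
            default:
                break;
        }
    }

    static String serialize(Document doc) {
        StringBuilder out = new StringBuilder();
        write(doc.getDocumentElement(), out);
        return out.toString();
    }

    // Parses XML text and immediately serializes it again with the writer above.
    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return serialize(doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The parser turns &#10; into a real '\n'; the writer turns it back.
        System.out.println(roundTrip("<somenode>First line.&#10;Second line</somenode>"));
        // prints <somenode>First line.&#10;Second line</somenode>
    }
}
```

In a real program you would write serialize(doc) to the file instead of printing it; everything else in the DOM code can stay as it is.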

Related

CharConversionException while transforming xml file

I have a Java program that processes XML files. When transforming an XML file into another XML file based on a certain schema (XSD/XSL), it throws the following error.
This error is only thrown for one XML file, which has an XML tag like this:
<abc>xxx yyyy “ggggg vvvv” uuuu</abc>
But after removing or retyping the two quotes, it doesn't throw the error.
Could anybody please help me resolve this issue?
java.io.CharConversionException: Character larger than 4 bytes are not supported: byte 0x93 implies a length of more than 4 bytes
at org.apache.xmlbeans.impl.piccolo.xml.UTF8XMLDecoder.decode(UTF8XMLDecoder.java:162)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><xyz xmlns="http://pqr.yy"><Header><abc>aaa “cccc” aaaaa vvv</abc></Header></xyz>
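As others have reported in the comments, it failed because the typographical quotation marks are encoded in Windows-1252, not in UTF-8, so the parser couldn't decode them: in Windows-1252 the byte 0x93 is the left double quotation mark “, but in UTF-8 it is a continuation byte that cannot start a character.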
The encoding declared in the XML declaration must match the actual encoding used for the characters.
To find out how this error arose, and to prevent it happening again, we would need to know where this (wannabe) XML file came from, and how it was created.
My guess would be that someone used a "smart" editor; Microsoft editors in particular are notorious for changing what you type to what Microsoft think you wanted to type. If you're editing XML by hand it's best to use an XML-aware editor.
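To confirm the diagnosis, and to repair a file that was saved as Windows-1252 while claiming to be UTF-8, a sketch along these lines could help. It assumes the whole file really was written in Windows-1252; the class and method names are made up for illustration. Re-encoding the bytes as UTF-8 makes the existing encoding="UTF-8" declaration truthful again.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Windows1252Check {

    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // A strict UTF-8 decoder reports malformed input (such as a stray 0x93)
    // instead of silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Reinterprets bytes that were saved as Windows-1252 and re-encodes them as UTF-8.
    static byte[] windows1252ToUtf8(byte[] bytes) {
        return new String(bytes, WINDOWS_1252).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What a Windows-1252 editor saves for <abc>“x”</abc>: the quotes become 0x93 and 0x94.
        byte[] saved = "<abc>\u201Cx\u201D</abc>".getBytes(WINDOWS_1252);
        System.out.println(isValidUtf8(saved));                    // false
        System.out.println(isValidUtf8(windows1252ToUtf8(saved))); // true
    }
}
```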

Groovy script to find and replace characters in a XML file

I have the Groovy script snippet below. I want to replace some characters inside a list of XML files. How could I do this?
println "Remove Invisible characters in CustomMetadata"
def customMetadata = ant.fileScanner {
    fileset(dir: '${target.dir}') {
        include(name: 'customMetadata/*.md')
    }
}
// m is the file
for (m in customMetadata) {
    // Want to get the content of the file and replace any specified characters
    println("Found file $m")
}
If you need to replace characters in the whole file, just read it with
def content = new File('[your file name]').text
use replaceAll() to replace your characters via a regular expression, and write the file back with
new File('[your file name]').write(content)
For removing unnecessary whitespace, this should work.
A "cleaner" solution would be to parse the file, replace the characters in the XML content and write it back. This is more complicated and might lead to some problems with XML namespaces. To give it a try, search for XmlSlurper or XmlParser: http://www.groovy-lang.org/processing-xml.html
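For comparison, the same read, replace and write-back approach in plain Java might look like this sketch. It assumes the files are UTF-8 and that "invisible characters" means the zero-width space, the zero-width joiner and non-joiner, and the byte order mark; that character set and the class name are hypothetical, so adjust the pattern to whatever you actually need to strip.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class StripInvisibleChars {

    // Assumption: these are the "invisible" characters to remove.
    private static final Pattern INVISIBLE = Pattern.compile("[\\u200B\\u200C\\u200D\\uFEFF]");

    static String clean(String content) {
        return INVISIBLE.matcher(content).replaceAll("");
    }

    // Reads the whole file as UTF-8, removes the characters and writes it back,
    // mirroring the Groovy file.text / replaceAll() / write() approach.
    static void cleanFile(Path file) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.write(file, clean(content).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the files found by the file scanner as arguments.
        for (String name : args) {
            cleanFile(Paths.get(name));
        }
    }
}
```

Like the Groovy version, this treats the file as plain text, so it does not need to understand the XML structure.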

XSL to FO and HTML entities

I'm developing a Java class that converts an HTML string into a FO string via an XSLT document.
Then, the resulting FO string is processed by FOP to create a PDF file.
The problem is that when a special character is found by FOP, I get an error:
(e.g.) The entity "ldquo" was referenced, but not declared.
Now my solution is to replace all these special characters with their Unicode reference.
In this example, "&ldquo;" becomes "&#8220;".
Can I declare those entities in my XSLT file without doing zillions of StringUtils.replaceAll()?
Solved using JTidy with setXmlOut(true)
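JTidy solved it for the asker. As an alternative that was not used here: the undeclared-entity error comes from the XML parser reading the HTML input, which only knows XML's five predefined entities (amp, lt, gt, quot, apos). A DOCTYPE in the XSLT file only affects parsing of the stylesheet itself, but the entities can be declared in an internal DTD subset prepended to the input before it is parsed. A sketch, assuming the HTML is otherwise well-formed XHTML with no XML declaration or DOCTYPE of its own; the class name and the small entity table are hypothetical, and a real table would need every entity the input may contain.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class HtmlEntityDeclarations {

    // Assumption: only these HTML entities appear in the input; extend as needed.
    private static final Map<String, Integer> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("nbsp", 160);
        ENTITIES.put("ldquo", 8220);
        ENTITIES.put("rdquo", 8221);
        ENTITIES.put("hellip", 8230);
    }

    // Builds an internal DTD subset such as <!DOCTYPE p [<!ENTITY ldquo "&#8220;"> ...]>
    static String doctype(String rootName) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE ").append(rootName).append(" [");
        for (Map.Entry<String, Integer> e : ENTITIES.entrySet()) {
            sb.append("<!ENTITY ").append(e.getKey())
              .append(" \"&#").append(e.getValue()).append(";\">");
        }
        return sb.append("]>").toString();
    }

    // Assumes the markup has no XML declaration or DOCTYPE of its own.
    static Document parse(String rootName, String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(doctype(rootName) + markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("p", "<p>&ldquo;quoted&rdquo;</p>");
        // The entities are now plain characters; doc could be wrapped in a
        // javax.xml.transform.dom.DOMSource and passed to the HTML-to-FO transformer.
        System.out.println(doc.getDocumentElement().getTextContent().equals("\u201Cquoted\u201D")); // true
    }
}
```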

How to place an XML file in form of a String inside Java file

I am using the Eclipse IDE.
I have a big XML file.
I want to copy this XML file and provide it in the form of a String:
String XMLStringSource = "XML Content Here";
I am getting errors with the double quotes in the XML file. Please tell me how I can resolve this.
You should not do that. In fact, it is impossible beyond a certain size: a single string constant is limited to 65,535 bytes in the class file, and the bytecode of each method (including initializers) is also limited to 64 KB.
The correct way to do it is to put the XML file next to the source code and use Class.getResourceAsStream() to read the file.
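A minimal sketch of the getResourceAsStream() approach, assuming the XML file is UTF-8 and ends up on the classpath in the same package as the class. The class name and big.xml are placeholders, and InputStream.readAllBytes() requires Java 9 or later.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class XmlResourceLoader {

    // Reads a classpath resource into a String. A name without a leading '/'
    // is resolved relative to this class's package.
    static String readResource(String name) throws IOException {
        try (InputStream in = XmlResourceLoader.class.getResourceAsStream(name)) {
            if (in == null) {
                throw new FileNotFoundException("Resource not found: " + name);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // "big.xml" is a hypothetical file name next to this class.
        String xml = readResource("big.xml");
        System.out.println(xml.length());
    }
}
```

Because the XML is read at runtime, its double quotes never pass through the Java compiler, so no escaping is needed and the size limit on string constants does not apply.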
You can configure Eclipse to escape text when pasting into a string literal.
Go to Window > Preferences > Java > Editor > Typing.
Select the checkbox which says "Escape text when pasting into a string literal".
Press Apply.
Now create an empty String literal, e.g.
String xml = "";
Copy your xml and paste it inside the quotes. Eclipse will automatically escape it for you.
This is quite handy for small bits of xml or text.
If you have a large file, then you should read the file into a string instead.
You will need to escape the quotes. But this will change the look of the XML; it will become a mix of Java string syntax and XML. Also, if the XML file is as big as you say, you will need to search and replace every double quote " with an escaped quote \" before pasting it into the Java file.
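On Java 15 or later, a text block avoids the escaping problem for small snippets: double quotes inside """ ... """ need no backslashes, and the common leading indentation is stripped. A sketch with placeholder XML; a text block still compiles to an ordinary string constant, so the size limit for big files still applies.

```java
public class XmlTextBlock {

    // Java 15+ text block: the double quotes need no escaping.
    // The shared 8-space indentation is removed, so each line starts at '<'.
    static final String XML = """
        <?xml version="1.0" encoding="UTF-8"?>
        <root attr="value">
            <child>text</child>
        </root>
        """;

    public static void main(String[] args) {
        System.out.print(XML);
    }
}
```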
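import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

StringBuilder xml = new StringBuilder();
// try-with-resources closes the reader even if reading fails
try (BufferedReader reader = new BufferedReader(new FileReader("D:\\File.xml"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // quotes in the file are just characters at runtime; no escaping needed
        xml.append(line).append('\n');
    }
} catch (IOException e) {
    e.printStackTrace();
}
String xmlString = xml.toString();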
Is this acceptable? Just read the file line by line: the String variable will contain the double quotes, and they won't cause an error, because quotes only need escaping inside string literals in the source code.

How to determine whether a given string is an .xml file

I have an issue: I get a response as a String.
This String could be a normal string, a number, etc., or an XML file.
When I get an XML file, I want to treat it differently.
I am not able to distinguish between a plain string and an XML file.
Also, this XML file could have a syntax error.
Please suggest how I should proceed.
Code is like this:
Document document = reader.read(new StringReader(xml));
where xml can be a plain string or an XML document itself.
If xml is a plain string, that is fine, but if it is an XML document with a syntax error, then it should throw an exception.
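If it is a proper XML document, it should begin with an XML declaration. If that's there, it's intended to be a conforming XML document. If it's not there, the string can still be well-formed XML, because the XML 1.0 specification recommends the declaration but does not require it, so its absence is only a hint, not proof.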
If you are using a language like C#, you can use XmlDocument.LoadXml:
http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.loadxml.aspx
It throws an XmlException if the string is not well-formed XML.
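The same idea in Java, combined with a cheap check to decide whether a response is meant to be XML at all. This sketch uses the JDK's DOM parser rather than the dom4j reader from the question, and its heuristic (the first non-whitespace character is '<') is an assumption: a plain-text response that happens to start with '<' would be treated as malformed XML. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlOrText {

    // Heuristic (assumption): a response is meant to be XML if its first
    // non-whitespace character is '<' (this also covers an XML declaration).
    static boolean looksLikeXml(String response) {
        return response.trim().startsWith("<");
    }

    // Returns null for plain strings, the parsed Document for well-formed XML,
    // and throws IllegalArgumentException for XML-looking input that is malformed.
    static Document parseIfXml(String response) {
        if (!looksLikeXml(response)) {
            return null;
        }
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // Trimmed, because an XML declaration must be the very first thing in the input.
            return builder.parse(new InputSource(new StringReader(response.trim())));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalArgumentException("Response looks like XML but is malformed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIfXml("hello") == null); // true
        System.out.println(parseIfXml("<a><b/></a>").getDocumentElement().getNodeName()); // a
        try {
            parseIfXml("<a><b></a>");
        } catch (IllegalArgumentException e) {
            System.out.println("malformed: " + e.getCause().getMessage());
        }
    }
}
```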
