XML file loses its format after reading and writing in Java

I'm writing a Java program that reads an XML file, makes some modifications, and then writes the file back in the same format.
The following is the code block that reads and writes the XML file:
final Document fileDocument = parseFileAsDocument(file);
final OutputFormat format = new OutputFormat(fileDocument);
try {
final FileWriter out = new FileWriter(file);
final XMLSerializer serializer = new XMLSerializer(out,format);
serializer.serialize(fileDocument);
}
catch (final IOException e) {
System.out.println(e.getMessage());
}
This is the method used to parse the file:
private Document parseFileAsDocument(final File file) {
Document inputDocument = null;
try {
inputDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
}//catching some exceptions{}
return inputDocument;
}
I'm noticing two changes after the file is written:
Before I had a node similar to this:
<instance ref='filter'>
<value></value>
</instance>
After reading and writing, the node looks like this:
<instance ref="filter">
<value/>
</instance>
As you can see above, 'filter' has been changed to "filter", with double quotes.
The second change is that <value></value> has been changed to <value/>. This happens throughout the XML file wherever there is a node like <tag></tag> with no value in between. If we have something like <tag>somevalue</tag>, there is no issue.
Any thoughts on how to keep the XML node format the same after writing?
I'd appreciate it!

You can't, and you shouldn't try. It's a bit like complaining that when you add 0123 and 0234, you get 357 without the leading zeroes. Leading zeroes in integers aren't considered significant, so arithmetic operations don't preserve them. The same happens to insignificant details of your XML, like the distinction between double quotes and single quotes, and the distinction between a self-closing tag and a start/end tag pair for an empty element. If any consumer of the XML is depending on these details, they need to be sent for retraining.
The most usual reason for asking for lexical details to be preserved is that you want to detect changes. But this means you are doing your comparisons the wrong way: you should be comparing at the logical level, not the physical level. One way to do comparisons is to canonicalize the XML, so whenever there is an arbitrary choice to be made between equivalent representations, it is made the same way.
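As a rough illustration of comparing at the logical level, here is a minimal sketch that parses both versions and compares the resulting DOM trees with Node.isEqualNode. This is not full canonicalization (whitespace-only text nodes still count, for instance), and the class and method names are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: compares two XML strings by content, not by spelling.
class LogicalXmlCompare {

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Merge adjacent text nodes so both trees have the same shape.
            doc.getDocumentElement().normalize();
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // True when both documents have equal element trees (names, attributes, text).
    static boolean sameContent(String a, String b) {
        return parse(a).getDocumentElement().isEqualNode(parse(b).getDocumentElement());
    }

    public static void main(String[] args) {
        String before = "<instance ref='filter'>\n<value></value>\n</instance>";
        String after = "<instance ref=\"filter\">\n<value/>\n</instance>";
        System.out.println(sameContent(before, after));
    }
}
```

The quote style and the empty-element spelling never reach the DOM, so the two versions from the question compare as equal.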

Related

Two Xmls input, One output with XSL transform

I'm trying to write an XSL stylesheet that basically needs to take some values from one XML file and others from another, and output a single XML. I searched online for a solution and found that I have to put this <xsl:variable name='file' select="'file:///C:/Users/file.xml'"> inside my input XML, which is supposed to load another XML and store it in a variable, but from there I don't know how to get the tag values of the document.
The file.xml is this one
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<silosMediaObject>
<canBeDeleted>-1</canBeDeleted>
<checkedOut>-1</checkedOut>
<checkedOutBy>-1</checkedOutBy>
<deleted>-1</deleted>
<description>Traccia audio migrata da ASCN</description>
<externalResourcePath>TEST/ASCN/lq/3763_2015-05-05.mp3</externalResourcePath>
<fileName>3763_2015-05-05.mp3</fileName>
<framesPerSecond>-1</framesPerSecond>
<hasScheduledIngestion>false</hasScheduledIngestion>
<isArchived>-1</isArchived>
<isArchiving>-1</isArchiving>
<isAvailable>-1</isAvailable>
<isEncoding>-1</isEncoding>
<isRestoring>-1</isRestoring>
<isVerified>-1</isVerified>
<mediaObjectId>-1</mediaObjectId>
<mediaTypeId>-1</mediaTypeId>
<mosId>4347</mosId>
<resourceIsExternal>-1</resourceIsExternal>
<sourceMediaObjectId>-1</sourceMediaObjectId>
<state>AVAILABLE</state>
<versionLinkId>-1</versionLinkId>
</silosMediaObject>
The Java class I'm using to transform the file is this one:
public class TestMain {
public static void main(String[] args) throws IOException, URISyntaxException, TransformerException {
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(new File("C:\\Users\\xmltemplate_transformer.xsl"));
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new File("C:\\Users\\tobe_transformed.xml"));
transformer.transform(text, new StreamResult(new File("C:\\Users\\out.xml")));
}
}

I've searched online for some solution and I found that I've to put this <xsl:variable name='file' select="'file:///C:/Users/file.xml'"> inside my input XML which is supposed to load another XML and store it into a variable
I don't know where you got that idea, but you're confused. The select value is interpreted as an XPath expression. Yours is a string literal containing a URL with the file scheme. As far as XPath or XSLT is concerned, it is just a string. One might do something further to cause the file designated by that URL to be parsed, but what you've presented has no such effect.
In particular, you might have wanted to do this:
<xsl:variable name='file' select="document('file:///C:/Users/file.xml')"/>
The document() function is the secret sauce that actually causes the designated file to be read and parsed (if possible); when used as shown, its result is a node set containing the root node of the resulting document, or an empty node set if the designated document cannot be parsed and the processor elects not to signal an error.
Note: when you say you put the xsl:variable "inside my input XML", I presume you mean at an appropriate place inside your (XML-based) XSL stylesheet. If you actually mean that you have placed it in a different XML data file that you are processing, then it will have no direct effect there, other than to be included, as itself, in the input tree.
but from this I dont know how to get the tags value of the document.
Having successfully parsed the file, you can use the resulting node set anywhere that XSLT expects an expression that evaluates to a node set. In particular, within its scope, you can use a reference to the variable you've defined ($file) as an argument to XPath functions, or as a whole expression, such as the select expression of an xsl:apply-templates. Since you haven't said what, specifically, you want to do with the contents, I cannot be any more specific myself. See what you can do, and if you can't figure out the details then that could be a suitable topic for a new question.
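To make that concrete, here is a small self-contained sketch that runs a stylesheet using document() from Java. The primary input (<item><title>...) and the output element names are invented for the example, since the question doesn't show them, and the second file is written to a temporary location rather than C:/Users/file.xml.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; only the document() usage mirrors the answer above.
class DocumentFunctionDemo {

    static String transform(String primaryXml, String secondaryXml) {
        Path second = null;
        try {
            // Stand-in for file.xml from the question.
            second = Files.createTempFile("second", ".xml");
            Files.write(second, secondaryXml.getBytes(StandardCharsets.UTF_8));

            String xslt =
                "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
              + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
              // document() parses the second file; $file is its root node.
              + "<xsl:variable name='file' select=\"document('" + second.toUri() + "')\"/>"
              + "<xsl:template match='/'>"
              + "<merged>"
              + "<title><xsl:value-of select='/item/title'/></title>"
              + "<fileName><xsl:value-of select='$file/silosMediaObject/fileName'/></fileName>"
              + "</merged>"
              + "</xsl:template>"
              + "</xsl:stylesheet>";

            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(primaryXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        } finally {
            if (second != null) {
                try { Files.deleteIfExists(second); } catch (Exception ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        String primary = "<item><title>Audio track</title></item>";
        String secondary = "<silosMediaObject><fileName>3763_2015-05-05.mp3</fileName></silosMediaObject>";
        System.out.println(transform(primary, secondary));
    }
}
```

Inside the template, plain paths such as /item/title read from the primary input, while paths starting at $file read from the second document.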

Replace HWPFDocument paragraph text using Java results in strange output

I need to replace the text of a paragraph in a .doc file (HWPFDocument) when it contains a particular string, using Java. The text is replaced, but the output comes out in a strange way. Please help me fix this issue.
Code snippet used:
public static HWPFDocument processChange(HWPFDocument doc)
{
try
{
Range range = doc.getRange();
for (int i = 0; i < range.numParagraphs(); i++)
{
Paragraph paragraph = range.getParagraph(i);
if (paragraph.text().contains("Place Holder"))
{
String text = paragraph.text();
paragraph.replaceText(text, "*******");
}
}
}
catch (Exception ex)
{
ex.printStackTrace();
}
return doc;
}
Input:
Place Holder
Textvalue1
Textvalue2
Textvalue3
Output:
*******Textvalue1
Textvalue1
Textvalue2
Textvalue3

The HWPF library is not in a perfect state for changing / writing .doc files. (At least at the last time that I looked. Some time ago I developed a custom variant of HWPF for my client which - among many other things - provides correct replace and save operations, but that library is not publicly available.)
If you absolutely must use .doc files and Java you may get away by replacing with strings of exactly same length. For instance "12345" -> "abc__" (_ being spaces or whatever works for you). It might make sense to find the absolute location of the to be replaced string in the doc file (using HWPF) and then changing it in the doc file directly (without using HWPF).
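A tiny helper for the same-length idea could look like the following. It is plain Java with no HWPF calls, the names are made up, and truncating an over-long replacement is just one possible policy.

```java
// Hypothetical helper: forces a replacement string to the length of the original.
class SameLengthReplacement {

    static String fit(String original, String replacement) {
        int target = original.length();
        if (replacement.length() >= target) {
            // Too long: cut it down so the character count stays unchanged.
            return replacement.substring(0, target);
        }
        StringBuilder padded = new StringBuilder(replacement);
        while (padded.length() < target) {
            padded.append(' '); // pad with spaces, or whatever works for you
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + fit("12345", "abc") + "]");
    }
}
```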
Word file format is very complicated and "doing it right" is not a trivial task. Unless you are willing to spend many man months, it will also not be possible to fix part of the library so that just saving works. Many data structures must be handled very precisely and a single "slip up" lets Word crash on the generated output file.

Externalize XML construction from a stream of CSV in Java

I get a stream of values as CSV. Based on some condition, I need to generate an XML that includes only a subset of the values from the CSV. For example:
Input : a:value1, b:value2, c:value3, d:value4, e:value5.
if (condition1)
XML O/P = <Request><ValueOfA>value1</ValueOfA><ValueOfE>value5</ValueOfE></Request>
else if (condition2)
XML O/P = <Request><ValueOfB>value2</ValueOfB><ValueOfD>value4</ValueOfD></Request>
I want to externalize the process in a way that given a template the output XML is generated accordingly. String manipulation is the easiest way of implementing this but I do not want to mess up the XML if some special characters appear in the input, etc. Please suggest.

Perhaps you could benefit from a templating engine, something like Apache Velocity.

I would suggest creating an xsd and using JAXB to create the Java binding classes that you can use to generate the XML.

I recommend my own templating engine, JATL (http://code.google.com/p/jatl/). Although it's geared to (X)HTML, it's also very good at generating XML.
I didn't bother solving the whole problem for you (that is, double splitting the input on "," and then ":"), but this is how you would use JATL.
final String a = "stuff";
HtmlWriter html = new HtmlWriter() {
@Override
protected void build() {
//If condition1
start("Request").start("ValueOfA").text(a).end().end();
}
};
//Now write.
StringWriter writer = new StringWriter();
String results = html.write(writer).getBuffer().toString();
Which would generate
<Request><ValueOfA>stuff</ValueOfA></Request>
All the correct escaping is handled for you.
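If you'd rather stay within the JDK, the same escaping guarantee is available from javax.xml.stream.XMLStreamWriter. The sketch below is a different technique from JATL, not a port of it. It assumes the input looks like "a:value1, b:value2" and treats the "template" as nothing more than a list of keys to include; that representation and the class name are assumptions for the example.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: picks some key:value pairs and writes them as XML.
class CsvToXml {

    // Splits "a:value1, b:value2" first on "," and then on the first ":".
    static Map<String, String> parse(String csv) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String pair : csv.split(",")) {
            String[] keyValue = pair.trim().split(":", 2);
            if (keyValue.length == 2) {
                values.put(keyValue[0].trim(), keyValue[1].trim());
            }
        }
        return values;
    }

    // Writes <Request> with one <ValueOfX> child per requested key.
    static String build(String csv, List<String> keys) {
        try {
            Map<String, String> values = parse(csv);
            StringWriter out = new StringWriter();
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("Request");
            for (String key : keys) {
                xml.writeStartElement("ValueOf" + key.toUpperCase(Locale.ROOT));
                // writeCharacters escapes <, > and & in the value.
                xml.writeCharacters(values.getOrDefault(key, ""));
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }

    public static void main(String[] args) {
        String input = "a:value1, b:value2, c:value3, d:value4, e:value5";
        System.out.println(build(input, Arrays.asList("a", "e"))); // condition1
        System.out.println(build(input, Arrays.asList("b", "d"))); // condition2
    }
}
```

The key lists could be loaded from a properties file or similar, which keeps the choice of fields outside the code.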

Parsing an XML file without root in Java

I have this XML file which doesn't have a root node. Other than manually adding a "fake" root element, is there any way I would be able to parse an XML file in Java? Thanks.

I suppose you could create a new implementation of InputStream that wraps the one you'll be parsing from. This implementation would return the bytes of the opening root tag before the bytes from the wrapped stream and the bytes of the closing root tag afterwards. That would be fairly simple to do.
I may be faced with this problem too. Legacy code, eh?
Ian.
Edit: You could also look at java.io.SequenceInputStream which allows you to append streams to one another. You would need to put your prefix and suffix in byte arrays and wrap them in ByteArrayInputStreams but it's all fairly straightforward.

Your XML document needs a root xml element to be considered well formed. Without this you will not be able to parse it with an xml parser.

One way is to provide your own dummy wrapper document that pulls in the original (not well-formed) 'XML' as an external entity, without touching the original file:
Syntax
<!DOCTYPE some_root_elem SYSTEM "/home/ego/some.dtd"
[
<!ENTITY entity-name "Some value to be inserted at the entity">
]>
Example:
<!DOCTYPE dummy [
<!ENTITY data SYSTEM "http://wherever-my-data-is">
]>
<dummy>
&data;
</dummy>
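In Java, the wrapper above can be fed to the standard DocumentBuilder. The sketch below assumes the parser is left at its defaults, which in the JDK means external entities are resolved; a factory hardened against XXE will refuse to load the entity. The temp file and class name are only for the demonstration.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: parses a rootless fragment through an external entity.
class EntityWrapperDemo {

    static int countTopLevelElements(String fragment) {
        Path data = null;
        try {
            // The rootless "XML" stays untouched in its own file.
            data = Files.createTempFile("fragment", ".xml");
            Files.write(data, fragment.getBytes(StandardCharsets.UTF_8));

            String wrapper =
                "<!DOCTYPE dummy [\n"
              + "<!ENTITY data SYSTEM \"" + data.toUri() + "\">\n"
              + "]>\n"
              + "<dummy>&data;</dummy>";

            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(wrapper)));

            // The fragment's top-level elements are now children of <dummy>.
            NodeList children = doc.getDocumentElement().getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("Failed parsing wrapped XML", e);
        } finally {
            if (data != null) {
                try { Files.deleteIfExists(data); } catch (Exception ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(countTopLevelElements("<a>1</a><b>2</b>"));
    }
}
```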

You could use another parser like Jsoup. It can parse XML without a root.

I think even if any API would have an option for this, it will only return you the first node of the "XML" which will look like a root and discard the rest.
So the answer is probably to do it yourself. Scanner or StringTokenizer might do the trick.
Maybe some html parsers might help, they are usually less strict.

Here's what I did:
There's an old java.io.SequenceInputStream class, which is so old that it takes Enumeration rather than List or such.
With it, you can prepend and append the root element tags (<div> and </div> in my case) around your no-root XML stream. (You shouldn't do it by concatenating Strings due to performance and memory reasons.)
public void tryExtractHighestHeader(ParserContext context)
{
    String xhtmlString = context.getBody();
    if (xhtmlString == null || xhtmlString.isEmpty())
        return;
    // The XHTML needs to be wrapped, because it has no root element.
    InputStream divStart = new ByteArrayInputStream("<div>".getBytes(StandardCharsets.UTF_8));
    InputStream divEnd = new ByteArrayInputStream("</div>".getBytes(StandardCharsets.UTF_8));
    InputStream is = new ByteArrayInputStream(xhtmlString.getBytes(StandardCharsets.UTF_8));
    // java.util.Collections.enumeration adapts the list to the Enumeration that SequenceInputStream expects.
    Enumeration<InputStream> streams = Collections.enumeration(Arrays.asList(divStart, is, divEnd));
    try (SequenceInputStream wrapped = new SequenceInputStream(streams)) {
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        Document xmlDocument = builder.parse(wrapped);
        // From here you can do whatever you like, but keep in mind the extra element.
        XPath xPath = XPathFactory.newInstance().newXPath();
    }
    catch (Exception e) {
        // Keep the original exception as the cause so the stack trace is not lost.
        throw new RuntimeException("Failed parsing XML: " + e.getMessage(), e);
    }
}

Creating XML with Java

I need your expertise once again. I have a java class that searches a directory for xml files (displays the files it finds in the eclipse console window), applies the specified xslt to these and sends the output to a directory.
What I want to do now is create an XML file containing the file names and file format types. The format should be something like:
<file>
<fileName> </fileName>
<fileType> </fileType>
</file>
<file>
<fileName> </fileName>
<fileType> </fileType>
</file>
Where for every file it finds in the directory it creates a new <file>.
Any help is truly appreciated.

Use an XML library. There are plenty around, and the third party ones are almost all easier to use than the built-in DOM API in Java. Last time I used it, JDom was pretty good. (I haven't had to do much XML recently.)
Something like:
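Element rootElement = new Element("root"); // You didn't show what this should be
Document document = new Document(rootElement);
for (Whatever file : files)
{
    Element fileElement = new Element("file");
    fileElement.addContent(new Element("fileName").addContent(file.getName()));
    fileElement.addContent(new Element("fileType").addContent(file.getType()));
    // Attach each <file> to the root, otherwise it never reaches the output.
    rootElement.addContent(fileElement);
}
// outputString is an instance method, so an XMLOutputter has to be created first.
String xml = new XMLOutputter().outputString(document);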

Have a look at DOM and ECS. The following example was adapted to your requirements from here:
XMLDocument document = new XMLDocument();
for (File file : files) {
    document.addElement(new XML("file")
        .addXMLAttribute("fileName", file.getName())
        .addXMLAttribute("fileType", file.getType())
    );
}

You can use the StringBuilder approach suggested by Vinze, but one caveat is that you will need to make sure your filenames contain no special XML characters, and escape them if they do (for example, replace < with &lt;, and deal with quotes appropriately).
In this case it probably doesn't arise and you will get away without it, however if you ever port this code to reuse in another case, you may be bitten by this. So you might want to look at an XMLWriter class which will do all the escaping work for you.
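One way to get that escaping without third-party code is the JDK's own DOM and Transformer APIs. The following is a sketch only: the <files> root element and deriving the type from the file extension are assumptions, since the question doesn't specify either.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds the <file> listing with the built-in DOM API.
class FileListXml {

    static String toXml(List<String> fileNames) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("files"); // assumed root element name
            doc.appendChild(root);

            for (String name : fileNames) {
                // Assumption: the "type" is whatever follows the last dot.
                int dot = name.lastIndexOf('.');
                String type = dot >= 0 ? name.substring(dot + 1) : "";

                Element file = doc.createElement("file");
                Element fileName = doc.createElement("fileName");
                fileName.setTextContent(name); // stored raw, escaped on output
                Element fileType = doc.createElement("fileType");
                fileType.setTextContent(type);
                file.appendChild(fileName);
                file.appendChild(fileType);
                root.appendChild(file);
            }

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(Arrays.asList("report.xml", "a&b.xsl")));
    }
}
```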

Well just use a StringBuilder :
StringBuilder builder = new StringBuilder();
for(File f : files) {
builder.append("<file>\n\t<fileName>").append(f.getName()).append("</fileName>\n");
[...]
}
System.out.println(builder.toString());
