How to edit MS Word documents using Java?

I have a few Word templates, and my requirement is to replace some of the words/placeholders in the document based on user input, using Java. I tried a lot of libraries, including 2-3 versions of docx4j, but nothing worked well; they all just didn't do anything!
I know this question has been asked before, but I tried all the options I know. So, which Java library can I use to "really" replace/edit these templates? My preference goes to "easy to use / few lines of code" libraries.
I am using Java 8 and my MS Word templates are in MS Word 2007 format.
Update
This code is written by using the code sample provided by SO member Joop Eggen
public Main() throws URISyntaxException, IOException, ParserConfigurationException, SAXException
{
URI docxUri = new URI("C:/Users/Yohan/Desktop/yohan.docx");
Map<String, String> zipProperties = new HashMap<>();
zipProperties.put("encoding", "UTF-8");
FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties);
Path documentXmlPath = zipFS.getPath("/word/document.xml");
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(Files.newInputStream(documentXmlPath));
byte[] content = Files.readAllBytes(documentXmlPath);
String xml = new String(content, StandardCharsets.UTF_8);
//xml = xml.replace("#DATE#", "2014-09-24");
xml = xml.replace("#NAME#", StringEscapeUtils.escapeXml("Sniper"));
content = xml.getBytes(StandardCharsets.UTF_8);
Files.write(documentXmlPath, content);
}
However, this returns the error below:
java.nio.file.ProviderNotFoundException: Provider "C" not found
at java.nio.file.FileSystems.newFileSystem(FileSystems.java:341)
at java.nio.file.FileSystems.newFileSystem(FileSystems.java:276)

For docx (a zip containing XML and other files), one can use a Java zip file system together with XML or text processing.
URI docxUri = ,,, // "jar:file:/C:/... .docx"
Map<String, String> zipProperties = new HashMap<>();
zipProperties.put("encoding", "UTF-8");
try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties)) {
Path documentXmlPath = zipFS.getPath("/word/document.xml");
When using XML:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(Files.newInputStream(documentXmlPath));
//Element root = doc.getDocumentElement();
You can then use XPath to find the places, and write the XML back again.
You might not even need XML parsing and could simply replace the placeholders in the text:
byte[] content = Files.readAllBytes(documentXmlPath);
String xml = new String(content, StandardCharsets.UTF_8);
xml = xml.replace("#DATE#", "2014-09-24");
xml = xml.replace("#NAME#", StringEscapeUtils.escapeXml("Sniper"));
...
content = xml.getBytes(StandardCharsets.UTF_8);
Files.delete(documentXmlPath);
Files.write(documentXmlPath, content);
For quick development, rename a copy of the .docx so that it has the .zip file extension, and inspect the files.
Files.write should already apply StandardOpenOption.TRUNCATE_EXISTING, but I have added Files.delete because an error occurred. See comments.
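The pieces above can be put together roughly as follows. This is a minimal sketch, not the answer author's exact code: the class name `DocxPlaceholders`, the method `replaceInDocx`, and the hand-rolled `escapeXml` helper (standing in for `StringEscapeUtils.escapeXml` so that no extra library is needed) are all hypothetical. It builds the URI with the `jar:` scheme. A URI such as "C:/..." is parsed with scheme "C", which would explain the `Provider "C" not found` error in the question.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class DocxPlaceholders {

    // Hypothetical helper: minimal XML escaping without third-party libraries.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Replaces each placeholder (map key) in /word/document.xml with its escaped value.
    // The .docx file is modified in place.
    static void replaceInDocx(Path docx, Map<String, String> values) throws IOException {
        // The "jar:" scheme selects the zip file system provider.
        URI uri = URI.create("jar:" + docx.toUri());
        Map<String, String> env = new HashMap<>();
        env.put("encoding", "UTF-8");

        // try-with-resources closes the file system, which writes the changes to disk.
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, env)) {
            Path xmlPath = zipFs.getPath("/word/document.xml");
            String xml = new String(Files.readAllBytes(xmlPath), StandardCharsets.UTF_8);
            for (Map.Entry<String, String> entry : values.entrySet()) {
                xml = xml.replace(entry.getKey(), escapeXml(entry.getValue()));
            }
            Files.write(xmlPath, xml.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Because the file is edited in place, it is probably safest to call `replaceInDocx` on a copy of the template. Plain text replacement only finds a placeholder if Word stored it as one contiguous piece of text in document.xml.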

Try Apache POI. POI can work with doc and docx, but docx is better documented and therefore better supported.
UPD: You can use XDocReport, which uses POI. I also recommend using xlsx for templates because it is more suitable and better documented.

I spent a few days on this issue until I found that what makes the difference is the try-with-resources on the FileSystem instance, which appears in Joop Eggen's snippet but not in the question's snippet:
try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties))
Without such a try-with-resources block, the FileSystem resource is not closed (as explained in the Java tutorial), and the Word document is not modified.

Stepping back a bit, there are about 4 different approaches for editing words/placeholders:
MERGEFIELD or DOCPROPERTY fields (if you are having problems with this in docx4j, then you have probably not set up your input docx correctly)
content control databinding
variable replacement on the document surface (either at the DOM/SAX level, or using a library)
do stuff as XHTML, then import that
Before choosing one, you should decide whether you also need to be able to handle:
repeating data (eg adding table rows)
conditional content (eg entire paragraphs which will either be present or absent)
adding images
If you need these, then MERGEFIELD or DOCPROPERTY fields are probably out (though you can also use IF fields, if you can find a library which supports them). And adding images makes DOM/SAX manipulation, as advocated in one of the other answers, messier and more error-prone.
The other things to consider are:
your authors: how technical are they? What does that imply for the authoring UI?
the "user input" you mention for variable replacement, is this given, or is obtaining it part of the problem you are solving?

Try this to edit or replace a word in the document, using Apache POI:
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class UpdateDocument {
    public static void main(String[] args) throws IOException {
        UpdateDocument obj = new UpdateDocument();
        obj.updateDocument(
                "c:\\test\\template.docx",
                "c:\\test\\output.docx",
                "Piyush");
    }

    private void updateDocument(String input, String output, String name)
            throws IOException {
        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get(input)))) {
            // Iterate over the paragraphs and check each run for the replaceable text.
            // Note: Word may split a placeholder across several runs; this loop only
            // handles placeholders that sit inside a single run.
            for (XWPFParagraph paragraph : doc.getParagraphs()) {
                for (XWPFRun run : paragraph.getRuns()) {
                    String docText = run.getText(0);
                    // getText can return null for runs without text
                    if (docText != null && docText.contains("${name}")) {
                        run.setText(docText.replace("${name}", name), 0);
                    }
                }
            }
            // Save the modified document
            try (FileOutputStream out = new FileOutputStream(output)) {
                doc.write(out);
            }
        }
    }
}

Related

How to append a PDF file to an existing one with iText?

In an application I am trying to append multiple PDF files to a single, already existing file. Using iText I found this tutorial, which in my case doesn't seem to work.
Here are some ways I've tried to make it work.
String path = "path/to/destination.pdf";
PdfCopy mergedFile = new PdfCopy(pdf, new FileOutputStream(path));
PdfReader reader;
for(String toMergePath : toMergePaths){
reader = new PdfReader(toMergePath);
mergedFile.addDocument(reader);
mergedFile.freeReader(reader);
reader.close();
}
mergedFile.close();
When I try to add the document, logcat tells me that the document is not open.
But pdf (the original document) is already opened by other methods, and closed only after this one. And mergedFile is exactly like in the tutorial, which, I believe, must be right.
Has anyone experienced the same problem? Otherwise, does anyone know a better method to do what I want to do?
I've seen other solutions that copy the bytes from one page and append them to a new file, but I'm afraid this will "compile" the annotations, which I need.
Thank you for your help,
Cordially,
Matthieu Meunier

I hope this code helps. Note that the Document is opened after the PdfCopy has been created and before anything is added to it.
public static void mergePdfs() {
    try {
        String[] files = { "D:\\1.pdf", "D:\\2.pdf", "D:\\3.pdf", "D:\\4.pdf" };
        Document document = new Document();
        PdfCopy copy = new PdfCopy(document, new FileOutputStream("D:\\CombinedFile.pdf"));
        // Open the document before adding any content to the copy
        document.open();
        for (String file : files) {
            PdfReader reader = new PdfReader(file);
            copy.addDocument(reader);
            copy.freeReader(reader);
            reader.close();
        }
        // Closing the document finishes the combined file
        document.close();
    } catch (Exception e) {
        System.out.println(e);
    }
}

PDFbox saying PDDocument closed when its not

I am trying to populate repeated forms with PDFBox. I am using a TreeMap and populating the forms with individual records. The format of the PDF form is such that six records are listed on page one and a static page is inserted as page two. (For a TreeMap larger than six records, the process repeats.) The error I'm getting is specific to the size of the TreeMap, and therein lies my problem: I can't figure out why, when I populate the TreeMap with more than 35 entries, I get this warning:
Apr 23, 2018 2:36:25 AM org.apache.pdfbox.cos.COSDocument finalize
WARNING: Warning: You did not close a PDF Document
public class test {
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
File dataFile = new File("dataFile.csv");
File fi = new File("form.pdf");
Scanner fileScanner = new Scanner(dataFile);
fileScanner.nextLine();
TreeMap<String, String[]> assetTable = new TreeMap<String, String[]>();
int x = 0;
while (x <= 36) {
String lineIn = fileScanner.nextLine();
String[] elements = lineIn.split(",");
elements[0] = elements[0].toUpperCase().replaceAll(" ", "");
String key = elements[0];
key = key.replaceAll(" ", "");
assetTable.put(key, elements);
x++;
}
PDDocument newDoc = new PDDocument();
int control = 1;
PDDocument doc = PDDocument.load(fi);
PDDocumentCatalog cat = doc.getDocumentCatalog();
PDAcroForm form = cat.getAcroForm();
for (String s : assetTable.keySet()) {
if (control <= 6) {
PDField IDno1 = (form.getField("IDno" + control));
PDField Locno1 = (form.getField("locNo" + control));
PDField serno1 = (form.getField("serNo" + control));
PDField typeno1 = (form.getField("typeNo" + control));
PDField maintno1 = (form.getField("maintNo" + control));
String IDnoOne = assetTable.get(s)[1];
//System.out.println(IDnoOne);
IDno1.setValue(assetTable.get(s)[0]);
IDno1.setReadOnly(true);
Locno1.setValue(assetTable.get(s)[1]);
Locno1.setReadOnly(true);
serno1.setValue(assetTable.get(s)[2]);
serno1.setReadOnly(true);
typeno1.setValue(assetTable.get(s)[3]);
typeno1.setReadOnly(true);
String type = "";
if (assetTable.get(s)[5].equals("1"))
type += "Hydrotest";
if (assetTable.get(s)[5].equals("6"))
type += "6 Year Maintenance";
String maint = assetTable.get(s)[4] + " - " + type;
maintno1.setValue(maint);
maintno1.setReadOnly(true);
control++;
} else {
PDField dateIn = form.getField("dateIn");
dateIn.setValue("1/2019 Yearlies");
dateIn.setReadOnly(true);
PDField tagDate = form.getField("tagDate");
tagDate.setValue("2019 / 2020");
tagDate.setReadOnly(true);
newDoc.addPage(doc.getPage(0));
newDoc.addPage(doc.getPage(1));
control = 1;
doc = PDDocument.load(fi);
cat = doc.getDocumentCatalog();
form = cat.getAcroForm();
}
}
PDField dateIn = form.getField("dateIn");
dateIn.setValue("1/2019 Yearlies");
dateIn.setReadOnly(true);
PDField tagDate = form.getField("tagDate");
tagDate.setValue("2019 / 2020");
tagDate.setReadOnly(true);
newDoc.addPage(doc.getPage(0));
newDoc.addPage(doc.getPage(1));
newDoc.save("PDFtest.pdf");
Desktop.getDesktop().open(new File("PDFtest.pdf"));
}
}
I can't figure out for the life of me what I'm doing wrong. This is the first week I've been working with PDFBox, so I'm hoping it's something simple.
Updated Error Message
WARNING: Warning: You did not close a PDF Document
Exception in thread "main" java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:77)
at org.apache.pdfbox.cos.COSStream.createRawInputStream(COSStream.java:125)
at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1200)
at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:383)
at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:158)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:522)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:460)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:444)
at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1096)
at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:419)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1367)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1254)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1232)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1204)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1192)
at test.test.main(test.java:87)

The warning by itself
You appear to get the warning wrong. It says:
Warning: You did not close a PDF Document
So in contrast to what you think, "PDFbox saying PDDocument closed when its not", PDFBox says that you did not close a document!
After your edit one sees that it actually says that a COSStream has been closed and that a possible cause is that the enclosing PDDocument already has been closed. This is a mere possibility!
The warning in your case
That being said, by adding pages from one document to another you probably end up having references to those pages from both documents. In that case, in the course of closing both documents (e.g. automatically via garbage collection), the second one to close may indeed stumble across some already closed COSStream instances.
So my first advice, to simply close the documents at the end with
doc.close();
newDoc.close();
probably won't remove the warnings, merely change their timing.
Actually you don't merely create two documents doc and newDoc, you even create new PDDocument instances and assign them to doc again and again, in the process setting the former document objects in that variable free for garbage collection. So you eventually have a big bunch of documents to be closed as soon as not referenced anymore.
I don't think it would be a good idea to close all those documents in doc early, in particular not before saving newDoc.
But if your code will eventually be run as part of a larger application instead of as a small, one-shot test application, you should collect all those PDDocument instances in some Collection and explicitly close them right after saving newDoc and then clear the collection.
Actually your exception looks like one of those lost PDDocument instances has already been closed by garbage collection, so you should collect the documents even in case of a simple one-shot utility to keep them from being GC disposed.
(@Tilman, please correct me if I'm wrong...)
Importing pages
To prevent problems with different documents sharing pages, you can try and import the pages to the target document and thereafter add the imported page to the target document page tree. I.e. replace
newDoc.addPage(doc.getPage(0));
newDoc.addPage(doc.getPage(1));
by
newDoc.addPage(newDoc.importPage(doc.getPage(0)));
newDoc.addPage(newDoc.importPage(doc.getPage(1)));
This should allow you to close each PDDocument instance in doc before losing it.
There are certain drawbacks to this, though, cf. the method JavaDoc and this answer here.
An actual issue in your code
In your combined document you will have many fields with the same name (at least in case of a sufficiently high number of entries in your CSV file) which you initially set to different values. And you access the fields from the PDAcroForm of the respective original document but don't add them to the PDAcroForm of the combined result document.
This is asking for trouble! The PDF format does consider forms to be document-wide with all fields referenced (directly or indirectly) from the AcroForm dictionary of the document, and it expects fields with the same name to effectively be different visualizations of the same field and therefore to all have the same value.
Thus, PDF processors might handle your document fields in unexpected ways, e.g.
by showing the same value in all fields with the same name (as they are expected to have the same value) or
by ignoring your fields (as they are not in the document AcroForm structure).
In particular, programmatic reading of your PDF field values will fail because in that context the form is definitively considered document-wide and based in AcroForm. PDF viewers, on the other hand, might at first show the values you set and make things look OK.
To prevent this you should rename the fields before merging. You might consider using the PDFMergerUtility which does such a renaming under the hood. For an example usage of that utility class have a look at the PDFMergerExample.

Even though the above answer was marked as the solution to the problem, since the solution is buried in the comments, I wanted to add this answer at this level. I spent several hours searching for the solution.
My code snippets and comments.
// Collection solely for purpose of preventing premature garbage collection
List<PDDocument> sourceDocuments = new ArrayList<>( );
...
// Source document (actually inside a loop)
PDDocument docIn = PDDocument.load( artifactBytes );
// Add document to collection before using it to prevent the problem
sourceDocuments.add( docIn );
// Extract from source document
PDPage extractedPage = docIn.getPage( 0 );
// Add page to destination document
docOut.addPage( extractedPage );
...
// This was failing with "COSStream has been closed and cannot be read."
// Now it works.
docOut.save( bundleStream );

Marshalling CDATA elements with CDATA_SECTION_ELEMENTS adds carriage return characters

I'm working on an application that exports data from and imports data into a DB. The format of the data extract is XML, and I'm using JAXB for the serialization / (un)marshalling. I want some elements to be marshalled as CDATA elements and am using this solution, which sets OutputKeys.CDATA_SECTION_ELEMENTS in the Transformer properties.
So far this has worked quite well, but now I have come across a DB field that itself contains an XML string, which I would also like to place inside a CDATA element. For some reason the Transformer now adds unnecessary carriage return characters (\r) to each line end, so that the output looks like this:
[Screenshot: output in which each line ends with \r\r\n]
This is my code:
private static final String IDENT_LENGTH = "3";
private static final String CDATA_XML_ELEMENTS = "text definition note expression mandatoryExpression optionalExpression settingsXml";
public static void marshall(final Object rootObject, final Schema schema, final Writer writer) throws Exception {
final JAXBContext ctx = JAXBContext.newInstance(rootObject.getClass());
final Document document = createDocument();
final Marshaller marshaller = ctx.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.setSchema(schema);
marshaller.marshal(rootObject, document);
createTransformer().transform(new DOMSource(document), new StreamResult(writer));
}
private static Document createDocument() throws ParserConfigurationException {
// the DocumentBuilderFactory is actually held in a singleton
final DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
return builderFactory.newDocumentBuilder().newDocument();
}
private static Transformer createTransformer() throws TransformerConfigurationException, TransformerFactoryConfigurationError {
// the TransformerFactory is actually held in a singleton
final TransformerFactory transformerFactory = TransformerFactory.newInstance();
final Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, CDATA_XML_ELEMENTS);
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", IDENT_LENGTH);
return transformer;
}
I'm passing a FileWriter to the marshall method.
My annotated model class looks like this:
@XmlType
@XmlRootElement
public class DashboardSettings {
@XmlElement
private String settingsXml;
public String getSettingsXml() {
return settingsXml;
}
public void setSettingsXml(final String settingsXml) {
this.settingsXml = settingsXml;
}
}
NOTE:
The XML string coming from the DB has Windows-style line endings, i.e. \r and \n. I have the feeling that JAXB currently expects Linux-style input (i.e. only \n) and is therefore adding a \r character because I'm currently running on a Windows machine. So the question is actually: what's the best way to solve this? Is there any parameter I can pass to control the line ending characters when marshalling? Or should I convert the line endings to Linux style prior to marshalling? How will my program behave on different platforms (Windows, Linux, Mac OS)?
I don't necessarily need a platform independent solution, it's OK if the output is in Windows, Linux or whatever style. What I want to avoid is the combination \r\r\n as shown in the above screenshot.

I realise this question is pretty old, but I ran into a similar problem, so maybe an answer can help someone else.
It seems to be an issue with CDATA sections. In my case, I was using the createCDATASection method to create them. When the code was running on a Windows machine, an additional CR was added, as in your example.
I've tried a bunch of things to solve this "cleanly", to no avail.
In my project, the XML document was then exported to a string to POST to a Linux server. So once the string was generated, I just removed the CR characters, leaving only the LF:
myXmlString = myXmlString.replace("\r", "");
It might not be an appropriate solution for the specific question, but once again, it may help you (or someone else) find a solution.
Note: I'm stuck with Java 7 for this specific project, so it may have been fixed in a more recent version.
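The other option raised in the question, converting the line endings before marshalling, could look roughly like this. It is only a sketch using the JDK's own DOM and Transformer classes. The class `LineEndings` and its methods `toLf` and `serialize` are hypothetical names, and it assumes (as the question suspects) that the stray \r comes from \r\n already being present in the input text.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class LineEndings {

    // Converts Windows (\r\n) and lone \r line endings to \n.
    static String toLf(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Serializes one element whose text is written as a CDATA section,
    // normalizing the line endings of the text first.
    static String serialize(String elementName, String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement(elementName);
        root.setTextContent(toLf(text));
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, elementName);
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

In the question's setup, the same normalization could be applied in the setter or getter for settingsXml, so that the string handed to JAXB only ever contains \n.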

Using boilerpipe to extract non-english articles

I am trying to use the boilerpipe Java library to extract news articles from a set of websites.
It works great for texts in English, but for text with special characters, for example words with accent marks (história), these special characters are not extracted correctly. I think it is an encoding problem.
In the boilerpipe FAQ, it says "If you extract non-English text you might need to change some parameters" and then refers to a paper. I found no solution in this paper.
My question is: are there any parameters in boilerpipe where I can specify the encoding? Is there any way to work around this and get the text correctly?
How I'm using the library:
(first attempt, based on the URL):
URL url = new URL(link);
String article = ArticleExtractor.INSTANCE.getText(url);
(second attempt, on the HTML source code):
String article = ArticleExtractor.INSTANCE.getText(html_page_as_string);

You don't have to modify the inner Boilerpipe classes.
Just pass an InputSource object to the ArticleExtractor.INSTANCE.getText() method and force the encoding on that object. For example:
URL url = new URL("http://some-page-with-utf8-encodeing.tld");
InputSource is = new InputSource();
is.setEncoding("UTF-8");
is.setByteStream(url.openStream());
String text = ArticleExtractor.INSTANCE.getText(is);
Regards!

Well, from what I see, when you use it like that, the library will automatically choose which encoding to use. From the HTMLFetcher source:
public static HTMLDocument fetch(final URL url) throws IOException {
final URLConnection conn = url.openConnection();
final String ct = conn.getContentType();
Charset cs = Charset.forName("Cp1252");
if (ct != null) {
Matcher m = PAT_CHARSET.matcher(ct);
if(m.find()) {
final String charset = m.group(1);
try {
cs = Charset.forName(charset);
} catch (UnsupportedCharsetException e) {
// keep default
}
}
}
Try debugging their code a bit, starting with ArticleExtractor.getText(URL), and see if you can override the encoding.

Ok, got a solution.
As Andrei said, I had to change the class HTMLFetcher, which is in the package de.l3s.boilerpipe.sax.
What I did was to convert all the text that was fetched to UTF-8.
At the end of the fetch function, I had to add two lines and change the last one:
final byte[] data = bos.toByteArray(); // stays the same
byte[] utf8 = new String(data, cs).getBytes(StandardCharsets.UTF_8); // new line (conversion)
cs = StandardCharsets.UTF_8; // new line: set the charset to UTF-8
return new HTMLDocument(utf8, cs); // edited line
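The same idea, detecting the charset from the Content-Type header and then transcoding the fetched bytes to UTF-8, can be sketched with the JDK alone. The class `CharsetSketch`, its methods, and the regular expression are hypothetical; boilerpipe's own PAT_CHARSET may well be defined differently.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetSketch {

    // Hypothetical pattern for "charset=..." in a Content-Type header value.
    private static final Pattern CHARSET =
            Pattern.compile("charset=\"?([^;\\s\"]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in the header, or the fallback if none is usable.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return fallback;
        }
    }

    // Decodes the bytes with the source charset and re-encodes them as UTF-8.
    static byte[] toUtf8(byte[] data, Charset source) {
        return new String(data, source).getBytes(StandardCharsets.UTF_8);
    }
}
```

Transcoding only helps when the detected charset is the one the page was really written in; if the server sends no charset or the wrong one, forcing the encoding explicitly is still needed.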

Boilerpipe's ArticleExtractor uses some algorithms that have been specifically tailored to English - measuring number of words in average phrases, etc. In any language that is more or less verbose than English (ie: every other language) these algorithms will be less accurate.
Additionally, the library uses some English phrases to try and find the end of the article (comments, post a comment, have your say, etc) which will clearly not work in other languages.
This is not to say that the library will outright fail - just be aware that some modification is likely needed for good results in non-English languages.

Java:
import java.net.URL;
import org.xml.sax.InputSource;
import de.l3s.boilerpipe.extractors.ArticleExtractor;
public class Boilerpipe {
public static void main(String[] args) {
try{
URL url = new URL("http://www.azeri.ru/az/traditions/kuraj_pehlevanov/");
InputSource is = new InputSource();
is.setEncoding("UTF-8");
is.setByteStream(url.openStream());
String text = ArticleExtractor.INSTANCE.getText(is);
System.out.println(text);
}catch(Exception e){
e.printStackTrace();
}
}
}
Eclipse:
Run > Run Configurations > Common Tab. Set Encoding to Other(UTF-8), then click Run.

I had the same problem; cnr's solution works great. Just change the encoding from UTF-8 to ISO-8859-1. Thanks.
URL url = new URL("http://some-page-with-utf8-encodeing.tld");
InputSource is = new InputSource();
is.setEncoding("ISO-8859-1");
is.setByteStream(url.openStream());
String text = ArticleExtractor.INSTANCE.getText(is);

XML parsing java confirmation

This is the way I turned an XML file into Java object(s).
I used "xjc" and a valid XML schema, and I got back some "generated" *.java files.
I imported them into a different package in Eclipse.
I am now reading the XML file in two ways.
1) Loading the XML file:
System.out.println("Using FILE approach:");
File f = new File ("C:\\test_XML_files\\complex.apx");
JAXBElement felement = (JAXBElement) u.unmarshal(f);
MyObject fmainMyObject = (MyObject) felement.getValue ();
2) Using a DOM builder:
System.out.println("Using DOM BUILDER Approach:");
JAXBElement element = (JAXBElement) u.unmarshal(test());
MyObject mainMyObject = (MyObject ) element.getValue ();
The method "test()" contains the code below:
public static Node test(){
Document document = parseXmlDom();
return document.getFirstChild();
}
private static Document parseXmlDom() {
Document document = null;
try {
// getting the default implementation of DOM builder
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
// parsing the XML file
document = builder.parse(new File("C:\\test_XML_files\\MyXML_FILE.apx"));
} catch (Exception e) {
// catching all exceptions
System.out.println();
System.out.println(e.toString());
}
return document;
}
Is this the standard way of turning XML into a Java object?
I tested whether I could access the object, and everything works fine (so far).
Do you suggest a different approach, or is this one sufficient?

I don't know about a "standard way", but either way looks OK to me. The first way looks simpler ( less code ) so that's the way I'd probably do it, unless there were other factors / requirements.
FWIW, I'd expect that the unmarshal(File) method was implemented to do pretty much what you are doing in your second approach.
However, it is really up to you (and your co-workers) to make judgments about what is "sufficient" for your project.
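If the DOM route is kept, two details of the question's parseXmlDom()/test() pair may be worth tightening. The sketch below uses JDK classes only and a hypothetical `DomRoot.parseRoot` helper. It returns getDocumentElement() rather than getFirstChild(), because the first child of a Document can be a comment or processing instruction instead of the root element. It also makes the factory namespace-aware, which is assumed here to matter when the schema given to xjc declares a target namespace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomRoot {

    // Parses the XML text and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Without this, elements carry no namespace URI or local name.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Unlike getFirstChild(), this skips leading comments and processing instructions.
        return document.getDocumentElement();
    }
}
```

The returned Element is a Node, so it could be passed to u.unmarshal(...) in place of the result of test().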
