How to read / write into docx file using commons.io.FileUtils?


I need some quick help. I am trying to write a Java program to generate a report, and I have the report template in a docx file.
I want to use that docx file as a template, fill it with data once per record, and write the result to a new docx file. Most importantly, I want to preserve the formatting and indentation of the content inside the docx file, which is bulleted data. That is where the problem is.
Below is the code that handles this operation:
public void readWriteDocx(HashMap<String, String> detailsMap) {
    try {
        File reportTemplateFile = new File("ReportTemplate.docx");
        File actualReportFile = new File("ActualReport.docx");
        StringBuilder preReport = new StringBuilder();
        preReport.append("Some details about pre report goes here...: ");
        preReport.append(System.lineSeparator());
        String docxContent = "";
        for (Map.Entry<String, String> entry : detailsMap.entrySet()) {
            docxContent = FileUtils.readFileToString(reportTemplateFile, StandardCharsets.UTF_8);
            // code to fetch and get data to insert into docxContent
            docxContent = docxContent.replace("$filename", keyFilename);
            docxContent = docxContent.replace("$expected", expectedFile);
            docxContent = docxContent.replace("$actual", actualFile);
            docxContent = docxContent.replace("$reportCount", String.valueOf(reportCount));
            docxContent = docxContent.replace("$diffMessage", key);
            FileUtils.writeStringToFile(actualReportFile, docxContent, StandardCharsets.UTF_8, true);
        }
        preReport.append(FileUtils.readFileToString(actualReportFile, StandardCharsets.UTF_8));
        System.out.print(preReport.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }
}
As you can see, I am using the FileUtils read and write methods with UTF_8 encoding. That is just a guess; I am not sure it is right. I also append the contents of the newly generated docx file to a StringBuilder and print it to the console, but that is secondary. The main thing is that the docx should be written properly, and so far I have had no luck.
When this prints, it's all weird characters and nothing is readable. When I try to open the newly generated docx file, it doesn't even open.
Any idea what I should do to get the data in the proper format? I am attaching an image of how my ReportTemplate.docx, the template for this report, looks. I am using commons-io-2.4.jar.
Please guide me if you can. Thanks a lot.

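You can use Apache POI or docx4j to create and edit doc and docx files. A docx file is a ZIP archive of XML parts rather than plain text, so reading it into a String and writing it back damages it. Without one of these libraries there is no simple way to edit doc or docx files.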
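If a library is not an option, a rough JDK-only sketch of the same idea might look like the following. It copies every part of the docx archive unchanged and replaces placeholders only inside word/document.xml. The class name DocxTemplateSketch and the methods fill and escapeXml are made up for illustration, it assumes Java 9 or later (for readAllBytes) and a UTF-8 encoded document.xml, and it produces one filled document per call rather than appending records to one report. Word sometimes splits a placeholder such as $filename across several XML runs, in which case a plain text replace will not find it, so treat this as a starting point rather than a substitute for Apache POI or docx4j.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of commons-io, Apache POI or docx4j.
final class DocxTemplateSketch {

    // Copies every archive entry as raw bytes; only word/document.xml is
    // decoded, has its placeholders replaced, and is encoded again.
    static byte[] fill(byte[] templateDocx, Map<String, String> values) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(templateDocx));
             ZipOutputStream out = new ZipOutputStream(result)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] data = in.readAllBytes(); // reads to the end of the current entry
                if (entry.getName().equals("word/document.xml")) {
                    String xml = new String(data, StandardCharsets.UTF_8); // assumption: UTF-8
                    for (Map.Entry<String, String> value : values.entrySet()) {
                        xml = xml.replace(value.getKey(), escapeXml(value.getValue()));
                    }
                    data = xml.getBytes(StandardCharsets.UTF_8);
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                out.write(data);
                out.closeEntry();
            }
        }
        return result.toByteArray();
    }

    // Inserted values must not break the XML markup.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The template bytes could come from Files.readAllBytes(Paths.get("ReportTemplate.docx")) and the result could be written with Files.write, so the file never passes through a String as a whole.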

Related

Is it possible to read a tif file like a txt, delete some header rows, and save it back to a tif file?

I'm trying to delete the first 3 rows of a tif file generated by a scanner, because I can't open it correctly.
Example of the rows to delete:
------=_Part_23XX49_-1XXXX3073.1XXXXX20715
ID: documento<br>
MimeType: image/tiff
I have no problem changing the content, but when I save the new file, I still can't open it correctly.
System.out.println(new InputStreamReader(in).getEncoding());
This call tells me that the encoding of the source file is "Cp1252", so I added a JVM argument (-Dfile.encoding=Cp1252), but nothing seems to change.
This is what I do:
StringBuilder fileContent = new StringBuilder();
// working with content and save result content in fileContent variable
// save the file again
FileWriter fstreamWrite = new FileWriter(f.getAbsolutePath());
out = new BufferedWriter(fstreamWrite);
out.write(fileContent.toString());
Is it possible that something is going wrong with the encoding?
If I do the same operation with Notepad++, I get a correct tiff that I can open without problems.

I found the TIFF Java library, which may be useful for your requirements.
Please take a look at its readme to see how to read and write a tiff file.
I hope this helps.
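For the narrower task of dropping the header rows, it may be enough to treat the file as bytes rather than characters. A TIFF is binary data, and a Reader/Writer round trip can change bytes that have no character in the chosen charset (Cp1252 leaves a few byte values unmapped). The sketch below is only an illustration with the JDK; the class StripHeaderLines and its methods are hypothetical names, and it assumes the unwanted rows each end with a line feed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: removes leading text lines without decoding any bytes.
final class StripHeaderLines {

    // Returns the bytes that follow the first `lines` lines, where a line
    // ends at a line feed (so both "\n" and "\r\n" endings are handled).
    static byte[] skipLines(byte[] data, int lines) {
        int pos = 0;
        for (int skipped = 0; skipped < lines && pos < data.length; pos++) {
            if (data[pos] == '\n') {
                skipped++;
            }
        }
        return Arrays.copyOfRange(data, pos, data.length);
    }

    // Reads the source file as raw bytes and writes the remainder as raw bytes.
    static void strip(Path source, Path target, int lines) throws IOException {
        Files.write(target, skipLines(Files.readAllBytes(source), lines));
    }
}
```

Calling strip with the scanned file, a new output path, and 3 would drop the three rows shown in the question. If the scanner also appends a trailer after the image data, that would need separate handling.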

OpenCSV reads strange text out of file

I am using Android Studio and my application has to read in a CSV file which looks like this:
"Anmeldung";"1576017126809898";"1547126680978123";"";"";"Frau"
"Anmeldung";"1547126680911112";"1547126680978123";"";"";"Frau"
But as you can see in the following picture, OpenCSV reads some strange characters, and my List contains meaningless Strings that are not in the file it read.
This is how I read the data out of my file:
try {
    FileReader filereader = new FileReader(filePath);
    CSVParser parser = new CSVParserBuilder().withSeparator(';').build();
    CSVReader csvReader = new CSVReaderBuilder(filereader)
            .withSkipLines(1)
            .withCSVParser(parser)
            .build();
    List<String[]> allData = csvReader.readAll();
    MainActivity.setAllData(allData);
}
catch (Exception e) {
    e.printStackTrace();
}
Thank you

It looks like there is an encoding problem.
Make sure to open and parse the file with the proper encoding (for example utf-8 or utf-16). Same for viewing the data.

I figured it out. It might sound strange, but I took the file and replaced all ; with ;
I think the data I got was exported with UTF-16 encoding or from a Linux device.
tl;dr The file had the wrong encoding, and the way I opened and viewed it was correct.
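If re-exporting the file is not possible, another option is to name the charset when opening it, since FileReader in the question's code falls back to the platform default. The sketch below is an assumption-laden illustration: CharsetReaderSketch, open and firstLine are hypothetical names, and UTF-16 is only the guess made above, not a confirmed property of the file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper showing how to decode a stream with an explicit charset.
final class CharsetReaderSketch {

    // Wraps the raw stream in a reader that decodes with the given charset
    // instead of the platform default.
    static BufferedReader open(InputStream in, Charset charset) {
        return new BufferedReader(new InputStreamReader(in, charset));
    }

    // Decodes the bytes with the given charset and returns the first line,
    // which makes it easy to compare candidate charsets on a sample.
    static String firstLine(byte[] fileBytes, Charset charset) throws IOException {
        try (BufferedReader reader = open(new ByteArrayInputStream(fileBytes), charset)) {
            return reader.readLine();
        }
    }
}
```

The reader returned by open(new FileInputStream(filePath), StandardCharsets.UTF_16) could then be handed to CSVReaderBuilder in place of the FileReader.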

Text search through multiple file encoding

I am trying to find a specific word in a list of files, and these files can be ASCII, Unicode or some other format.
So far I can only work on ASCII files. Is there any way to do the same operation with other file encodings?
Scanner s = null;
try {
    s = new Scanner(new BufferedReader(new FileReader("C:\\New Microsoft Word Document.docx")));
    while (s.hasNext()) {
        // final String lineFromFile = s.nextLine();
        // if(lineFromFile.contains("DE")){
        System.out.println(s.next());
        // break;
        // }
    }
} finally {
    if (s != null) {
        s.close();
    }
}
I get the following results
Q[µM¡°‰”Ø÷Þ3{:½¹®’)xTÖä¬?µXFÚB™QÎÞ‡Ïé=K0SˆÊÈÙ?õº×W?áÂ&¤6˜³qî?s”cÐ3ëÀÐJi½?^ýˆ;!¿Äøm«uÇ¥5LHCô`ÝÎ”bR…¤?§Ï+gF,y\í‹Q9S:êãw~Pá¡Â=‰p®RRª?OM±Ç•®™2R.÷àX9¼!ð#
qe—i;`{¥fzU#2>¼Mä|f}Á
+'šªÎNÛ

docx is not a text format with a different encoding; it's a completely different, non-text file format. Basically, it's a zip archive of various files and folders (with the main data in some xml files). You can't just read it as a text file; you need to use a library such as Apache POI, or some kind of file converter, to obtain the text from it.

This has nothing to do with a different text encoding.
docx is a special format from Microsoft that holds various information about a document, packed as a zip archive.
You could read the file using Java's ZipFile and get the entry word/document.xml.
document.xml contains the text of the Word document. You can then read through this file and output specific lines.
Pseudocode:
ZipFile file = new ZipFile("doc.docx");
InputStream input = file.getInputStream(file.getEntry("word/document.xml"));
input now contains the text information.
EDIT: document.xml contains the text of the document, but there are many xml tags which you would have to filter out.
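A slightly fuller sketch of the ZipFile approach, including a crude tag filter, could look like this. DocxTextSketch and its methods are hypothetical names; the code assumes Java 9 or later (for readAllBytes) and a UTF-8 encoded document.xml, and the regex filter leaves XML entities such as &amp;amp; undecoded, so it is only good enough for a simple word search.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper built on java.util.zip only.
final class DocxTextSketch {

    // Opens the docx as a zip archive and returns word/document.xml as a String.
    static String readDocumentXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("No word/document.xml in " + docxPath);
            }
            try (InputStream input = zip.getInputStream(entry)) {
                return new String(input.readAllBytes(), StandardCharsets.UTF_8); // assumption: UTF-8
            }
        }
    }

    // Crude filter: ends each paragraph with a newline, then drops all tags.
    static String stripTags(String xml) {
        return xml.replace("</w:p>", "\n").replaceAll("<[^>]+>", "");
    }
}
```

Searching for a word then becomes stripTags(readDocumentXml(path)).contains("DE"). A library such as Apache POI remains the more robust choice, since it handles entities, tables, headers and footers.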

Apache POI Formatting issue

I was wondering if someone could help me figure out why my text is not lining up when I read a .doc file. So far in my code I am using WordExtractor, but I am having a formatting issue with text not lining up correctly. Here is my code, which was written using Java 1.7.
public class Doc {
    File docFile = null;
    WordExtractor docExtractor = null ;
    WordExtractor exprExtractor = null ;
    public void read(){
        docFile = new File("blue.doc");
        try{
            FileInputStream fis = new FileInputStream(docFile.getAbsolutePath());
            HWPFDocument doc=new HWPFDocument(fis);
            docExtractor = new WordExtractor(doc);
        }catch(Exception e){
            System.out.println(e.getMessage());
        }
        System.out.println(docExtractor.getText());
    }
}
How the program displays the document.
A E
I'm stuck in Folsom Prison, and time keeps draggin on.
It is supposed to be displayed like this
A E
I'm stuck in Folsom Prison, and time keeps draggin on.

This will not work as written. You are extracting the content of a document file into a String, which discards document formatting such as paragraph layout. You then print the text to the console, which cannot look exactly as it does in Microsoft Word.
Next, think about what you want to do. Assuming that you want to verify both the formatting and the content of the document, my answer follows. Converting a document into plain text using getText() gives you its content in a distorted format, which does not help you. With the POI library you should instead access each paragraph and table in the document and verify, read, or write whatever you need.
doc.getRange() gives you a Range object. Explore this object with the help of http://poi.apache.org/apidocs/org/apache/poi/hwpf/usermodel/Range.html and you will be able to access all paragraphs, tables and sections in the document. That should help you work with the Word document programmatically.

convert large csv to xml and print xml data to a GUI text area

I am developing a Java application that reads a .CSV file, displays the content in a GUI text area, and converts this content to XML data (also printed in a text area). This XML data is then transformed using XSLT.
My application accepts a .CSV file, but converting comma-separated values to XML has been a challenge for me. I have read loads of material on it and I still haven't grasped the concept. Can anyone point me to how I can do this?

You should make a java class that implements Serializable. Then as you read the csv file in, populate each field in that class. Then you can use the Java XMLEncoder to write to an XML file like this.
XMLEncoder encoder = null;
MyClass data = new MyClass();
data.setField1("field 1 from csv");
try {
    encoder = new XMLEncoder(new BufferedOutputStream(new FileOutputStream("c:/myfile.xml")));
    encoder.writeObject(data);
} catch (final IOException e) {
    logger.error(e.getMessage());
} finally {
    if (encoder != null) {
        encoder.close();
    }
}

From your question I gather that you're already able to process the csv files and that your xml schema is already defined (you mentioned an xslt that operates on the result of the csv->xml transformation).
I'd recommend using a small xml library like dom4j to create the xml document. The quick start guide for dom4j has a short example that shows the steps for Creating a new XML document and Converting to and from Strings.
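To show the general shape of the conversion without adding a dependency, here is a rough sketch that uses the JDK's built-in DOM and Transformer classes instead of dom4j. Everything named here (CsvToXmlSketch, toXml, the rows and row element names) is made up for illustration. It assumes the first line is a header whose values are valid XML element names, and it splits on plain commas, so quoted fields that contain commas are not handled.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper using only the JDK's XML APIs (not dom4j).
final class CsvToXmlSketch {

    // Treats the first line as the header; every later line becomes one <row>
    // whose child elements are named after the header columns.
    static String toXml(List<String> csvLines) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("rows");
        doc.appendChild(root);

        String[] header = csvLines.get(0).split(",", -1);
        for (String line : csvLines.subList(1, csvLines.size())) {
            String[] cells = line.split(",", -1); // naive split, no quoted fields
            Element row = doc.createElement("row");
            root.appendChild(row);
            for (int i = 0; i < header.length && i < cells.length; i++) {
                Element cell = doc.createElement(header[i].trim());
                cell.setTextContent(cells[i].trim()); // the DOM escapes &, < and >
                row.appendChild(cell);
            }
        }

        // Serialize the DOM tree to a String that can be shown in a text area.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

The returned String can be set on the text area directly, and the same Document could be passed to the XSLT step as a DOMSource. For a very large file, building the whole tree in memory may be too costly, in which case a streaming writer would be worth considering.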
