I have a .dot file that I use as a template to create a .doc file with placeholder values replaced.
E.g. the .dot file contains <claimId>.
I replace <claimId> with the real claim ID, say 1234, and generate a .doc file.
I am using Apache POI's HWPFDocument, but I run into problems when I replace text inside a table.
So I tried XWPFDocument, but it only accepts .dotx files.
I have no issues using a .dotx file with XWPFDocument and can generate .docx files successfully. Now I need to convert .dot files to .dotx files from Java.
Can someone help me with this?
There is no automated way to do this conversion in Apache POI as far as I know.
You can manually convert .dot files to .dotx by opening the file in Microsoft Word and saving it in the newer format.
I have an app that automatically processes a range of Excel files, but I have one issue. Some of the files appear to be HTML files with a .xls extension (opening them in Excel gives a corruption warning, and resaving shows that Excel wants to save them as HTML).
When using Apache POI:
try (Workbook wkbk = WorkbookFactory.create(myCorruptFile)) {
//myCorruptFile is of type File
This fails with the Apache POI NotOLE2FileException below:
Invalid header signature; read 0x0A0D3E6C6D74683C, expected 0xE11AB1A1E011CFD0 - Your file appears not to be a valid OLE2 document, { }
If I manually resave the file as .xls, it processes correctly, but is there a way to detect and resave/convert this file from Java 11? Manual conversion isn't an option for me; I need an automated one.
myCorruptFile.getContentType()
Gives content type as:
application/vnd.ms-excel
And using Apache Tika gives detected type as:
tika.detect(myCorruptFile.getBytes())
text/html
(My maven pom has no filtering)
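The value POI reports reading, 0x0A0D3E6C6D74683C, is the little-endian form of the bytes for "<html>" followed by CR LF, which agrees with what Tika detected. A minimal sketch of the detection step, assuming it is enough to look at the first few bytes of the file: the class name SpreadsheetSniffer and the method sniff are hypothetical, and the HTML check is only a heuristic. It does not convert anything; converting would still mean parsing the HTML table yourself and writing the cells out with POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of Apache POI or Tika.
class SpreadsheetSniffer {
    // First 8 bytes of every OLE2 container (the binary .xls format).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns "ole2", "zip", "html" or "unknown" based on the leading bytes. */
    static String sniff(byte[] head) {
        if (head.length >= 8 && Arrays.equals(Arrays.copyOf(head, 8), OLE2_MAGIC)) {
            return "ole2";
        }
        if (head.length >= 2 && head[0] == 'P' && head[1] == 'K') {
            return "zip"; // .xlsx and other OOXML files are ZIP archives
        }
        // Heuristic only: treat the bytes as text and look for typical HTML openers.
        String text = new String(head, StandardCharsets.ISO_8859_1)
                .trim().toLowerCase(Locale.ROOT);
        if (text.startsWith("<html") || text.startsWith("<!doctype html")
                || text.startsWith("<table")) {
            return "html";
        }
        return "unknown";
    }

    /** Reads up to 64 leading bytes of the file and classifies them (Java 11+). */
    static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(64));
        }
    }
}
```

With something like this, files classified as "html" could be routed to a separate HTML-table import path instead of WorkbookFactory.create.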
I want to append to an existing text file. I have tried many approaches (FileWriter, BufferedWriter, PrintWriter, RandomAccessFile, OutputStream, FileOutputStream, PrintStream), but I can't get the desired output.
I keep getting this error: java.io.FileNotFoundException: could not open file '//file:/usr/backupdata/5605.txt' using mode 'a+'. (I am working with Ewon Flexy hardware, which supports only javaetk 1.4.)
You are trying to use a URL as a file name. That won't work.
Use
/usr/backupdata/5605.txt
as the file name.
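A minimal sketch of appending with a plain filesystem path, assuming the standard java.io.FileWriter class is available (the Ewon javaetk runtime may restrict it, so treat this as an assumption). The names AppendExample and appendLine are hypothetical. The code avoids try-with-resources so that it stays within Java 1.4 syntax.

```java
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical helper for illustration only.
class AppendExample {
    /** Appends one line to the file at the given plain filesystem path. */
    static void appendLine(String path, String line) throws IOException {
        // The second constructor argument 'true' opens the file in append mode.
        FileWriter writer = new FileWriter(path, true);
        try {
            writer.write(line);
            writer.write("\n");
        } finally {
            writer.close(); // explicit close: try-with-resources needs Java 7+
        }
    }
}
```

The path argument is a bare path with no "file:" scheme in front of it.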
I'm trying to process a docx file with Apache POI. For now I simply read the file and then write it back out. Here is my code:
FileInputStream fileInputStream = new FileInputStream(inputFile);
XWPFDocument document = new XWPFDocument(OPCPackage.open(fileInputStream));
FileOutputStream fileOutputStream = new FileOutputStream(outputFile);
document.write(fileOutputStream);
fileOutputStream.flush();
fileOutputStream.close();
fileInputStream.close();
The problem is that the input file has a small image in its header. Because of that, after processing the input file with POI and opening the output file in Microsoft Word, I get a corrupted file error:
Microsoft Office cannot open this file because some parts are missing or invalid.
Location: Part: /word/settings.xml, Line: 2, Column: 0
Everything works in OO Writer, but not in Office.
The question is: what is wrong? Does Apache POI not handle files with an image in the header? Do you know any way to work around the problem?
I NEED to use Apache POI; I'm not considering other tools. I'm using POI 3.8.
The problem is not the image in the header but the Apache POI jar version. Use the latest jars:
poi-3.10-FINAL.jar
poi-ooxml-3.10-FINAL.jar
poi-ooxml-schemas-3.10-FINAL.jar
ooxml-schemas-1.1.jar
Having the above jars solved the issue for me.
I have to attach a PDF file to a pptx slide at runtime.
Tried the following:
Attached a pdf file in the pptx slide (Insert -> Object -> Adobe Acrobat Document).
Accessed the oleobject using the following code :
OleObjectBinaryPart oleObjectBinaryPart = new OleObjectBinaryPart(new PartName("/ppt/embeddings/oleObject1.bin"));
Updating the oleObjectBinaryPart using the following code:
oleObjectBinaryPart.setBinaryData(reportBlob.getBinaryStream());
Updating the pptx with the new oleobject:
pptMlPackage.getParts().getParts().put(new PartName("/ppt/embeddings/oleObject1.bin"), oleObjectBinaryPart);
pptMlPackage.save(new File("C:/test_report/pptx_out.pptx"));
After executing this code, the pptx_out.pptx file was generated without any errors. But when I try to open the embedded PDF in PowerPoint 2010, I get the following error:
The server application, source file, or item can't be found, or returned an unknown error. You may need to reinstall the server application.
Is it a problem with the oleobject when updating?
You can't just attach the PDF as a binary blob; it has to be in the correct OLE format.
See further this discussion.
I was trying to use Apache POI (Version 3.6) to parse Excel .xls file, but got only Exception:
java.io.IOException: Invalid header signature; read 0x07B1FD124BEDF108, expected 0xE11AB1A1E011CFD0
I Googled it, and the results basically said that "the file is actually not a valid Excel file (e.g. it is a .csv or similar) but has the suffix .xls". But I'm quite sure that my Excel file is valid (in Excel 97-2003 format).
For secrecy considerations, I couldn't post my excel, but when I use emacs hexl-mode to view this binary excel file, the header is:
D0CF 11E0 A1B1 1AE1
I think that is exactly what POI expects (0xE11AB1A1E011CFD0, with the bytes in the opposite order). So why do I get the exception?
BTW, if I use vim with the command %!xxd to view the same Excel file, I get a header different from the one in emacs:
C390 C38F 11C3 A0C2
And the whole binary file seems totally different. I cannot understand.
Thanks for any of your help!
If you get that exception, then your file really isn't a true .xls file. It will instead either be some other file, renamed to have a .xls extension, or a corrupted file.
I'd suggest you try opening the file in Excel and doing a Save As. That may give you a hint as to the file type. If not, save it as Excel .xls, and then you'll be able to open that file.
I don't know what your file is (I don't recognise the header), but I can assure you that it isn't the OLE2 header a valid .xls file would have.
It's possible that Apache Tika may be able to work out what kind of binary file it is, so you could always try with the Tika-App jar
Just an idea: if you are using Maven, make sure filtering is set to false in the resource tag of your pom.xml. Otherwise Maven tends to corrupt .xls files during the copying phase.
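To compare hex dumps against POI's message, it helps to know that POI prints the first 8 bytes as one little-endian number. The sketch below (HeaderBytes and its methods are hypothetical names) shows that the bytes seen in emacs, D0CF 11E0 A1B1 1AE1, do read as 0xE11AB1A1E011CFD0. It also shows that treating those bytes as Latin-1 text and re-encoding them as UTF-8 yields C390 C38F 11C3 A0C2..., which matches the vim output. That suggests vim converted the buffer's encoding before piping it to xxd, though this is an inference, not something confirmed in the thread. Neither view matches the value POI actually read (0x07B1FD124BEDF108), so the bytes POI received were presumably not the ones shown in emacs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class HeaderBytes {
    // The eight bytes the question saw in emacs hexl-mode.
    static final byte[] OLE2_HEADER = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Reads the first 8 bytes as a little-endian long, formatted like POI's message. */
    static String signature(byte[] head) {
        long value = ByteBuffer.wrap(head, 0, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
        return String.format("0x%016X", value);
    }

    /** Treats each byte as a Latin-1 character and re-encodes the text as UTF-8. */
    static String latin1ReencodedAsUtf8Hex(byte[] raw) {
        byte[] utf8 = new String(raw, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : utf8) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }
}
```

Calling signature on the first 8 bytes of the exact file your program opens (for example the copy under target/ in a Maven build) shows whether it still carries the OLE2 signature at that point.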