I have an app that automatically processes a range of Excel files, but I have one issue. Some of the files appear to be HTML files with a .xls extension (opening them in Excel gives a corruption warning, and resaving shows that Excel wants to save them as HTML).
When using Apache POI:
try (Workbook wkbk = WorkbookFactory.create(myCorruptFile)) {
//myCorruptFile is of type File
This fails with the Apache POI NotOLE2FileException below:
Invalid header signature; read 0x0A0D3E6C6D74683C, expected 0xE11AB1A1E011CFD0 - Your file appears not to be a valid OLE2 document, { }
If I manually resave the file as .xls, it processes correctly, but is there a way to detect and resave/convert such a file in Java 11? Converting the files by hand isn't an option for me; I need an automated solution.
myCorruptFile.getContentType()
Gives content type as:
application/vnd.ms-excel
And using Apache Tika gives detected type as:
tika.detect(myCorruptFile.getBytes())
text/html
(My Maven pom has no filtering.)
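For the detection part, one option is to look at the first bytes of the file before handing it to POI. The following is a minimal sketch using only the JDK. The names SpreadsheetSniffer, sniff and signatureToAscii are made up for illustration, and the HTML check is a rough heuristic rather than a full content detector. It also shows that the signature in the exception above, read as little-endian bytes, is simply the text at the start of an HTML page.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of POI or Tika.
class SpreadsheetSniffer {
    enum Kind { OLE2, OOXML, HTML, UNKNOWN }

    // OLE2 (.xls) files start with these 8 bytes.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // OOXML (.xlsx) files are ZIP archives: "PK" 0x03 0x04.
    private static final byte[] ZIP = { 'P', 'K', 0x03, 0x04 };

    // Classifies a file from its leading bytes (8 or more is enough).
    static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2)) return Kind.OLE2;
        if (startsWith(head, ZIP)) return Kind.OOXML;
        // Heuristic (assumption): HTML exports begin with <html or a doctype.
        String text = new String(head, StandardCharsets.ISO_8859_1)
                .trim().toLowerCase(Locale.ROOT);
        if (text.startsWith("<html") || text.startsWith("<!doctype html")) {
            return Kind.HTML;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Turns the 64-bit signature from the POI message back into file bytes.
    static String signatureToAscii(long signature) {
        byte[] bytes = ByteBuffer.allocate(8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(signature)
                .array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}
```

On Java 11 the leading bytes could be read with InputStream.readNBytes(16) and passed to sniff. A file classified as HTML would then need to go through an HTML table parser rather than WorkbookFactory; that conversion step is not shown here.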
I have a problem with Maven, Excel and the POI package.
I access an Excel file with this code:
Workbook workbook = WorkbookFactory.create(new File("src/main/resources/file.xlsx"));
Sheet sheet = workbook.getSheet(sheetName);
This code works correctly and I can read data inside later in my code.
Instead of new File(..), I have to use the code below so that resources are accessible both in dev mode and once the jar is built.
ClassLoader classLoader = ClassLoader.getSystemClassLoader();
String path = classLoader.getResource(fileName).toURI().getPath();
The resulting path points into target/classes, because Maven copies the file into myproject/target/classes of the current project (which is what I want).
However, the xlsx file copied by Maven is corrupted, and I can't open it even in Excel itself. The original file is 500 KB; the copy is more than 1 MB. (All other files, such as images and text files, are copied correctly; only the xlsx files are affected.)
I did a lot of searching and found some answers, such as:
FileInputStream vs ClassPathResource vs getResourceAsStream and file integrity
I tried every solution I could find, but none solved my problem and I always get the same error:
InvalidOperationException: Could not open the specified zip entry source stream
Or
java.io.FileNotFoundException: file.xlsx
Using the same classLoader approach, I can access my json, txt and image files.
Does anyone have an answer to this issue?
Why does Maven double the size of the xlsx files, and why are they corrupted?
Is there any way to solve this?
I need help
I am using Apache POI to read and write Excel files for both xls and xlsx formats.
When the code runs the following line on a file written by POI (my own code), it doesn't throw an exception, but for a file saved from Excel by a user I get:
org.apache.poi.ss.formula.FormulaParseException: Specified named range 'LOCAL_YEAR_FORMAT' does not exist in the current workbook.
The Exception is fired at:
postWB.getSheet(postSheet.getSheetName()).shiftRows(i, postSheet.getPhysicalNumberOfRows(), 1);
Judging from some of the questions here, this can be caused by jar incompatibilities or by a bug. I have updated all jars to the latest POI library and its dependencies.
Is there any workaround to shift rows without hitting this exception?
I use the R XLConnect package.
When I call XLConnect functions such as loadWorkbook() or readWorksheetFromFile(), this error message appears:
Error: IllegalArgumentException (Java): Your InputStream was neither
an OLE2 stream, nor an OOXML stream
How can I solve this problem?
Before using these functions, I followed the steps at http://www.r-bloggers.com/getting-r-and-java-1-8-to-work-together-on-osx/ to stop R and Java from crashing on Mac OS X.
I am on Mac OS X.
This message states that the file you passed to loadWorkbook was not recognized as an *.xls (BIFF-8) or *.xlsx (OOXML) file.
I am having the same issue following a Java update.
I was passing an .xlsx file to the loadWorkbook() function of the R XLConnect package.
I temporarily worked around the issue by loading an .xls file instead.
I also use OS X, and after this function had worked without problems for a while, the error appeared for no apparent reason. But the reason is really simple: Excel (actually, the whole MS Office suite) creates a temporary file while you have a document open. This file is hidden.
In my case, I list the .xlsx files in a folder and open them inside a loop, so the first file picked up was the hidden temporary file and the error was raised. Closing Excel (which deletes those files) avoids the error.
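When files are opened in a loop, the temporary files can also be filtered out up front instead of relying on Excel being closed. Below is a minimal Java sketch of the idea (the same filter could be applied to the file list in R). It assumes the temporary files follow Office's usual naming, where the name starts with ~$, and it also skips dot-files. WorkbookLister and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only.
class WorkbookLister {

    // True for .xlsx files that are not Office lock files or dot-files.
    static boolean isRealWorkbook(Path path) {
        String name = path.getFileName().toString();
        boolean isXlsx = name.toLowerCase(Locale.ROOT).endsWith(".xlsx");
        boolean isTemp = name.startsWith("~$") || name.startsWith(".");
        return isXlsx && !isTemp;
    }

    // Lists the workbooks in a directory, skipping temporary files.
    static List<Path> listWorkbooks(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(WorkbookLister::isRealWorkbook)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```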
I have a .dot template file from which I create a doc file with placeholder values replaced.
E.g. the .dot file contains <claimId>.
I replace <claimId> with the real claim ID, say 1234, and generate a doc file.
I am using Apache POI's HWPFDocument, but I run into issues when I replace text inside a table.
So I tried XWPFDocument, but it only accepts .dotx files.
I have no issues using a .dotx file with XWPFDocument and successfully generated docx files. Now I need to convert .dot files to .dotx files from Java.
Can someone help me with this?
There is no automated way to do this conversion in Apache POI as far as I know.
You can manually convert .dot files to .dotx by opening the file in Microsoft Word and saving it in the newer format.
I was trying to use Apache POI (version 3.6) to parse an Excel .xls file, but only got this exception:
java.io.IOException: Invalid header signature; read 0x07B1FD124BEDF108, expected 0xE11AB1A1E011CFD0
I Googled the error, and the results basically say that the file is not actually a valid Excel file (e.g. it is a .csv or similar) but merely has the .xls suffix. But I'm quite sure that my Excel file is valid (in Excel 97-2003 format).
For confidentiality reasons I can't post the file, but when I view this binary file in emacs hexl-mode, the header is:
D0CF 11E0 A1B1 1AE1
I think this is exactly what POI expects (E11AB1A1E011CFD0, just with the bytes in the opposite order). So why do I get the exception?
BTW, if I view the same file in vim with the command %!xxd, I get a header different from the one in emacs:
C390 C38F 11C3 A0C2
And the whole binary file looks totally different. I don't understand why.
Thanks for any of your help!
If you get that exception, then your file really isn't a true .xls file. It will instead either be some other file, renamed to have a .xls extension, or a corrupted file.
I'd suggest you try opening the file in Excel and doing a Save As. That may give you a hint as to the file type. If not, save it as an Excel .xls, and then you'll be able to open that file.
I don't know what your file is (I don't recognise the header), but I can assure you that it doesn't have the OLE2 header that a valid .xls file would have.
It's possible that Apache Tika may be able to work out what kind of binary file it is, so you could always try the Tika-App jar.
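A possible explanation for the vim output, offered as a guess rather than a diagnosis: the bytes C390 C38F 11C3 A0C2 are exactly what results when the OLE2 signature is decoded as Latin-1 text and re-encoded as UTF-8, which is what an editor may do when it treats a binary file as text. That would mean the xxd view shows vim's re-encoded buffer, not the bytes on disk. The sketch below reproduces the transformation; HeaderReencoding and its method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
class HeaderReencoding {

    // Decodes raw bytes as Latin-1 text, then encodes that text as UTF-8.
    static byte[] latin1ToUtf8(byte[] raw) {
        String asText = new String(raw, StandardCharsets.ISO_8859_1);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // Hex dump of the first `limit` bytes, without separators.
    static String hex(byte[] bytes, int limit) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(limit, bytes.length); i++) {
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The OLE2 signature as seen in emacs hexl-mode.
        byte[] ole2 = {
            (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
        };
        // Prints C390C38F11C3A0C2, matching the header shown by vim.
        System.out.println(hex(latin1ToUtf8(ole2), 8));
    }
}
```

Every byte of 0x80 or above becomes two bytes under this kind of re-encoding, so a binary file that is processed as text comes out both larger and unreadable.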
Just an idea: if you are using Maven, make sure filtering is set to false in the resource tag of your pom.xml. Otherwise Maven tends to corrupt xls files during the copy phase.