I'm having an issue with some Java code that uses PDFBox. I'm trying to populate a PDF with particular forms, chosen based on values read from an Excel spreadsheet. Below is my class file.
import java.io.FileInputStream;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.hssf.usermodel.*;
/**
* This is a test file for reading and populating a PDF with specific forms
*/
public class JU_TestFile {
PDPage Stick_Form;
PDPage IKE_Form;
PDPage BO_Form;
/**
* Constructor.
*/
public JU_TestFile() throws IOException
{
this.BO_Form = (PDPage) PDDocument.load(new File("C:\\Users\\saf\\Desktop\\JavaTest\\BO Pole Form.pdf")).getPage(0);
this.IKE_Form = (PDPage) PDDocument.load(new File("C:\\Users\\saf\\Desktop\\JavaTest\\IKE Form.pdf")).getPage(0);
this.Stick_Form = (PDPage) PDDocument.load(new File("C:\\Users\\saf\\Desktop\\JavaTest\\Sticking Form.pdf")).getPage(0);
}
public void buildFile(String fileName, String excelSheet) throws IOException {
// Create a Blank PDF Document and load in JU Excel Spreadsheet
PDDocument workingDocument = new PDDocument();
FileInputStream fis = new FileInputStream(new File(excelSheet));
// Load in the workbook
HSSFWorkbook JU_XML = new HSSFWorkbook(fis);
int sheetNumber = 0;
int rowNumber = 0;
String cellValue = "Starting Value";
HSSFSheet currentSheet = JU_XML.getSheetAt(sheetNumber);
// While we have not reached the 25th row in our current sheet
while (rowNumber <= 24) {
// Get the value in the current row, on the 8th column in the xls file
cellValue = currentSheet.getRow(rowNumber + 6).getCell(7).getStringCellValue();
// If it has stuff in it,
if (cellValue != "") {
// Check if it has the letters "IKE" and append the IKE form to our PDF
if (cellValue != "IKE") {
workingDocument.importPage(IKE_Form);
// If it is anything else (other than empty), append the Stick Form to our PDF
} else {
workingDocument.importPage(Stick_Form);
}
// Let's move on to the next row
rowNumber++;
// If the next row number is the "26th" row, we know we need to move on to the
// next sheet, and also reset the rows to the first row of that next sheet
if (rowNumber == 25) {
rowNumber = 0;
currentSheet = JU_XML.getSheetAt(++sheetNumber);
}
// if the 9th row is empty, we should break out of the loop and save/close our PDF, we are done
} else {
break;
}
}
workingDocument.save(fileName);
workingDocument.close();
}
}
I am getting the following error:
Exception in thread "main" java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
I've done research and it seems like a PDDocument is closing before I run the workingDocument.save(fileName) command. I'm not quite sure how to fix this, and I'm also a bit lost on how to find a workaround. I'm a bit rusty on my programming, so any help would be super appreciated! Also any feedback on how to make future posts more informative would be great.
Thanks in advance
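Please try this, where getDocument() returns your target document (workingDocument in your code) and file is the form PDF you want to append to it: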
PDFMergerUtility merger = new PDFMergerUtility();
PDDocument combine = PDDocument.load(file);
merger.appendDocument(getDocument(), combine);
merger.mergeDocuments();
combine.close();
Update:
Since merger.mergeDocuments() is deprecated in PDFBox 2.x, use one of its overloads that take a MemoryUsageSetting instead:
merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
or
merger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
Depending on your memory usage, you can fine-tune this further by passing a differently configured MemoryUsageSetting object, for example MemoryUsageSetting.setupMixed(maxMainMemoryBytes).
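If you would rather keep the question's importPage approach, the exception most likely comes from the constructor: the three PDDocument objects returned by PDDocument.load(...) are never stored, only their first pages are. PDFBox 2.x closes a document that is garbage-collected without having been closed (it logs "Warning: You did not close a PDF Document"), while the imported pages still refer to resources such as fonts inside those source documents, so save() later runs into closed streams. Below is a minimal sketch, assuming PDFBox 2.x or 3.x, that keeps the source documents open until the output has been saved. FormAssembler, isIkeRow and build are hypothetical names. The sketch also replaces the question's != comparisons, which compare String references rather than contents, and follows the code comments' intent that rows containing "IKE" get the IKE form.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

/** Hypothetical helper: owns the form templates and keeps them open while pages are imported. */
public class FormAssembler implements AutoCloseable {
    private final PDDocument ikeSource;   // template for rows containing "IKE"
    private final PDDocument stickSource; // template for every other non-empty row

    public FormAssembler(PDDocument ikeSource, PDDocument stickSource) {
        this.ikeSource = ikeSource;
        this.stickSource = stickSource;
    }

    /** Compares string contents with contains(), not references as != does. */
    public static boolean isIkeRow(String cellValue) {
        return cellValue != null && cellValue.contains("IKE");
    }

    /** Appends one form page per cell value and saves while the templates are still open. */
    public void build(List<String> cellValues, OutputStream out) throws IOException {
        try (PDDocument working = new PDDocument()) {
            for (String value : cellValues) {
                if (value == null || value.trim().isEmpty()) {
                    break; // an empty cell ends the data, as in the question's loop
                }
                PDDocument source = isIkeRow(value) ? ikeSource : stickSource;
                working.importPage(source.getPage(0));
            }
            working.save(out); // save BEFORE the template documents are closed
        }
    }

    /** Closes the templates; call this only after build() has saved the output. */
    @Override
    public void close() throws IOException {
        ikeSource.close();
        stickSource.close();
    }
}
```

To use it with the question's files, open the templates with PDDocument.load(new File(...)) in PDFBox 2.x (Loader.loadPDF(new File(...)) in 3.x), collect the values of the 8th column into a list, and call build(...) inside a try-with-resources block on the assembler, so the templates are closed only after the output has been written.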
I am using Apache POI to edit an existing file. This file contains several formulas that use the numbers I write into it through Apache POI. This is where I run into problems: when I write a number into a cell that is used in a formula, the file gets corrupted and the formula disappears.
Before: the cells showing 0 contain the formulas C7+D7, C8+D8, etc.
After: those cells contain a plain 0; the formulas were lost.
Here is the code I used to write to the Excel file:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.EncryptedDocumentException;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
public class write {
public static void main(String[] args) {
String excelFilePath = "C:\\Users\\jose_\\IdeaProjects\\writeExcel\\src\\JavaBooks.xlsx";
try {
FileInputStream inputStream = new FileInputStream(new File(excelFilePath));
Workbook workbook = WorkbookFactory.create(inputStream);
Sheet sheet = workbook.getSheetAt(0);
/*Cell cell2Update = sheet.getRow(1).getCell(3); // This updates a specific cell: row 0 cell 3
cell2Update.setCellValue(49);*/
Object[][] bookData = {
{2, 17},
{3, 27},
{4, 33},
{5, 44},
};
// int rowCount = sheet.getLastRowNum(); // Gets the last entry
int rowCount = 5;
for (Object[] aBook : bookData) {
Row row = sheet.createRow(++rowCount);
int columnCount = 1;
int lote = 1;
Cell cell = row.createCell(columnCount);
//cell.setCellValue(rowCount); // This sets the index for each entry
cell.setCellValue(lote);
for (Object field : aBook) {
cell = row.createCell(++columnCount);
if (field instanceof String) {
cell.setCellValue((String) field);
} else if (field instanceof Integer) {
cell.setCellValue((Integer) field);
}
}
}
inputStream.close();
FileOutputStream outputStream = new FileOutputStream("C:\\Users\\jose_\\IdeaProjects\\writeExcel\\src\\JavaBooks.xlsx");
workbook.write(outputStream);
workbook.close();
outputStream.close();
} catch (IOException | EncryptedDocumentException ex) {
ex.printStackTrace();
}
}
}
Is there a way to work around this or do I need to set all the formulas again through Apache POI?
You get the error because the line Row row = sheet.createRow(++rowCount); always creates a new, empty row. If a row already exists at that index, it is replaced and all of its cells are removed, including the cells that contain the formulas. Doing so damages the calculation chain, which is what Excel's repair messages are telling you.
You should not do this. Instead, always try to get the row first using Sheet.getRow, and create the row only if that returns null:
...
//Row row = sheet.createRow(++rowCount);
rowCount++; // same row index as the original ++rowCount
Row row = sheet.getRow(rowCount);
if (row == null) row = sheet.createRow(rowCount);
...
Additionally, please read Recalculation of Formulas in the Apache POI documentation. After changing cells that are referenced in formulas, always either call workbook.getCreationHelper().createFormulaEvaluator().evaluateAll(); or delegate recalculation to Excel using workbook.setForceFormulaRecalculation(true);.
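Both points together, as a minimal sketch assuming Apache POI 4.x or later. SafeRowWriter, getOrCreateRow, getOrCreateCell and writeAndRecalculate are hypothetical names, and the layout used in the usage (inputs in C7:D7, formula in E7) is only an illustration of the question's C7+D7 formulas. Applying the same get-before-create rule to cells keeps an existing formula cell from being replaced as well.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

public class SafeRowWriter {
    /** Returns the existing row, creating it only when the sheet has none at that index. */
    public static Row getOrCreateRow(Sheet sheet, int rowIndex) {
        Row row = sheet.getRow(rowIndex);
        return (row != null) ? row : sheet.createRow(rowIndex);
    }

    /** Same rule for cells, so an existing formula cell is not replaced by accident. */
    public static Cell getOrCreateCell(Row row, int columnIndex) {
        Cell cell = row.getCell(columnIndex);
        return (cell != null) ? cell : row.createCell(columnIndex);
    }

    /**
     * Writes numbers into consecutive rows starting at firstRowIndex (0-based),
     * beginning at firstColumnIndex, then recalculates the workbook's formulas.
     */
    public static void writeAndRecalculate(Workbook workbook, Sheet sheet,
                                           int firstRowIndex, int firstColumnIndex,
                                           double[][] values) {
        for (int r = 0; r < values.length; r++) {
            Row row = getOrCreateRow(sheet, firstRowIndex + r);
            for (int c = 0; c < values[r].length; c++) {
                getOrCreateCell(row, firstColumnIndex + c).setCellValue(values[r][c]);
            }
        }
        // Update the cached results of all formula cells now...
        workbook.getCreationHelper().createFormulaEvaluator().evaluateAll();
        // ...and additionally ask Excel to recalculate when the file is opened.
        workbook.setForceFormulaRecalculation(true);
    }
}
```

In the question's loop, rowCount starts at 5, so the first data row has index 6 (Excel row 7); writeAndRecalculate(workbook, sheet, 6, 2, values) would fill columns C and D from there on without touching the formula cells. Calling both evaluateAll() and setForceFormulaRecalculation(true) is a belt-and-braces choice; the answer recommends either one.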
Please find the attached code snippet and help me proceed with it. I am trying to read data from one Excel file and write the same data to another Excel file, but the code stops when it tries to write the file. While debugging, I could see that the values are fetched correctly, but the write does not work.
package Export;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class TestExport
{
XSSFWorkbook xlsxworkbook;
HSSFWorkbook xlsworkbook;
XSSFWorkbook xlsxworkbook1;
HSSFWorkbook xlsworkbook1;
Sheet sheet;
Sheet sheet1;
TestExport(){
xlsxworkbook=null;
xlsworkbook=null;
sheet=null;
xlsxworkbook1=null;
xlsworkbook1=null;
sheet1=null;
}
public void readExcel(String filePath,String fileName,String sheetName,String filePath1,String fileName1,String sheetName1)
{
try{
FileInputStream fs=new FileInputStream(new File("C:\\Users\\Susmitha-Phases\\Desktop\\TestWorkbook.xlsx"));
FileOutputStream fi=new FileOutputStream(new File("C:\\Users\\Susmitha-Phases\\Desktop\\TestWorkbook1.xlsx"));
fs.toString();
if(fileName.toLowerCase().endsWith("xlsx")){
xlsxworkbook = new XSSFWorkbook(fs);
sheet=xlsxworkbook.getSheet(sheetName);
xlsxworkbook1 = new XSSFWorkbook();
sheet1=xlsxworkbook.getSheet(sheetName1);
}
else{
xlsworkbook=new HSSFWorkbook(fs);
sheet=xlsworkbook.getSheet(sheetName);
xlsworkbook1=new HSSFWorkbook();
sheet1=xlsworkbook.getSheet(sheetName1);
}
int rowCount = sheet.getLastRowNum()-sheet.getFirstRowNum();
//Create a loop over all the rows of excel file to read it
for (int i = 0; i < rowCount+1; i++)
{
Row row = sheet.getRow(i);
Row row1=sheet1.getRow(i);
//Create a loop to print cell values in a row
for (int j = 0; j < row.getLastCellNum(); j++)
{
String temp= row.getCell(j).getStringCellValue();
row1.createCell(i).setCellValue(temp);
//Print Excel data in console
System.out.print(row1.getCell(j).getStringCellValue()+"|| ");
xlsworkbook.write(fi);
//System.out.print(row.getCell(j).getStringCellValue()+"|| ");
}
}
}
catch(Exception e){
System.out.println(e.getMessage());
}
}
public static void main(String[] args) throws IOException{
//Create an object of ReadGuru99ExcelFile class
TestExport objExcelFile = new TestExport();
//Prepare the path of excel file
String filePath = System.getProperty("C:\\Users\\Susmitha-Phases\\Desktop\\TestWorkbook.xlsx");
String filePath1 = System.getProperty("C:\\Users\\Susmitha-Phases\\Desktop\\TestWorkbook1.xlsx");
//Call read file method of the class to read data
objExcelFile.readExcel(filePath,"TestWorkbook.xlsx","Sheet1",filePath1,"TestWorkbook1.xlsx","Sheet1");
}
}
There are some problems with your code that can throw a NullPointerException.
When the file type is xlsx, you never create an instance of xlsworkbook, yet you call xlsworkbook.write(fi); this will definitely throw a NullPointerException. So you must change the logic so that it writes the workbook that matches the file type; for example, check the file type and then write the corresponding workbook.
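Besides the missing workbook instance, a few other things in the question's code would also go wrong: sheet1 is fetched from the source workbook (xlsxworkbook.getSheet(sheetName1)) instead of being created in the new workbook, so row1 is either null (no such sheet) or a row of the source workbook itself; createCell(i) uses the row index where the column index j was meant; the workbook is written once per cell inside the loop; and System.getProperty(...) is called with file paths, which returns null. Also, catch (Exception e) prints only e.getMessage(), which for a NullPointerException is null on older Java versions, so the failure can look as if the code simply stopped. Below is a minimal sketch, assuming Apache POI 4.x or later, that creates the output workbook in the same format as the input and writes it once at the end. CopySheetValues and its methods are hypothetical names, and values are copied as displayed text via DataFormatter.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class CopySheetValues {
    /** Creates an empty workbook of the same format (.xls or .xlsx) as the source. */
    public static Workbook newWorkbookLike(Workbook source) {
        return (source instanceof HSSFWorkbook) ? new HSSFWorkbook() : new XSSFWorkbook();
    }

    /** Copies every existing cell's displayed text from source into a new sheet of target. */
    public static Sheet copyValues(Sheet source, Workbook target, String targetSheetName) {
        Sheet dest = target.createSheet(targetSheetName);
        DataFormatter formatter = new DataFormatter();
        for (Row srcRow : source) {            // iterates only rows that exist
            Row destRow = dest.createRow(srcRow.getRowNum());
            for (Cell srcCell : srcRow) {      // iterates only cells that exist
                destRow.createCell(srcCell.getColumnIndex())
                       .setCellValue(formatter.formatCellValue(srcCell));
            }
        }
        return dest;
    }

    /** Reads one sheet from inFile and writes its values to outFile, writing only once. */
    public static void copyFile(File inFile, String sheetName, File outFile) throws IOException {
        try (InputStream is = new FileInputStream(inFile);
             Workbook in = WorkbookFactory.create(is);
             Workbook out = newWorkbookLike(in)) {
            Sheet source = in.getSheet(sheetName);
            if (source == null) {
                throw new IllegalArgumentException("No sheet named " + sheetName);
            }
            copyValues(source, out, sheetName);
            try (OutputStream os = new FileOutputStream(outFile)) {
                out.write(os); // a single write after all cells are copied
            }
        }
    }
}
```

For example, CopySheetValues.copyFile(new File("C:\\Users\\Susmitha-Phases\\Desktop\\TestWorkbook.xlsx"), "Sheet1", new File("C:\\Users\\Susmitha-Phases\\Desktop\\TestWorkbook1.xlsx")); the output file's extension should match the input format.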
The situation is as follows: I have a simple program that uses the Apache POI library to add one row of data at the end of an existing xlsx file. See below:
File file = new File(input);
XSSFWorkbook workbook = new XSSFWorkbook(file);
XSSFSheet sheet = workbook.getSheetAt(0);
XSSFRow row = sheet.createRow(sheet.getLastRowNum() + 1);
After this, I iterate over the row and set the cell values. The problem is that the second line of the code above throws an out-of-memory error. Is there a way to add a row of data to an existing xlsx file without reading the whole file?
(not enough reputation to add this as a comment)
Have you tried using SXSSFWorkbook instead of XSSFWorkbook?
You can try XSSF and SAX (Event API).
If creating the XSSFWorkbook fails with an out-of-memory error and you need to both read and write the workbook, then neither SXSSF nor the SAX parser will help: SXSSF is only for writing, and the SAX event API is only for reading.
Both of the following approaches require knowledge of the *.xlsx file format, Office Open XML. A *.xlsx file is a ZIP archive containing XML files and other files in a particular directory structure, so you can unzip it with any ZIP tool to look at the XML files. The file format was first standardized by Ecma, so for further research I prefer the Ecma Markup Language Reference; see, for example, its entry for Row.
The ReadAndWriteTest.xlsx used in both examples must have at least one worksheet, and the first worksheet must have at least one row. It must also already contain a shared strings part (/xl/sharedStrings.xml), because both examples fetch it with getPartsByName(...).get(0).
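Since an *.xlsx file is just a ZIP archive, you can also list its parts with the JDK alone before deciding which one to parse. A minimal sketch; XlsxParts and listParts are hypothetical names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class XlsxParts {
    /** Returns the names of all entries (parts) in an .xlsx stream, in archive order. */
    public static List<String> listParts(InputStream xlsx) throws IOException {
        List<String> names = new ArrayList<>();
        // An .xlsx file is a plain ZIP archive, so java.util.zip can read it directly.
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

For a typical Excel file the list includes entries such as [Content_Types].xml, xl/workbook.xml and xl/worksheets/sheet1.xml. Note that ZIP entry names have no leading slash, while OPCPackage part names such as /xl/sharedStrings.xml do.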
One approach could be using the DOM methods of XMLBeans. My favorite reference for this is grepcode.
Example:
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.xssf.model.SharedStringsTable;
import java.io.File;
import java.io.OutputStream;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTSheetData;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCell;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STCellType;
import org.openxmlformats.schemas.officeDocument.x2006.relationships.STRelationshipId;
import org.apache.xmlbeans.XmlOptions;
import javax.xml.namespace.QName;
import java.util.Map;
import java.util.HashMap;
import java.util.regex.Pattern;
class DOMReadAndWriteTest {
public static void main(String[] args) {
try {
File file = new File("ReadAndWriteTest.xlsx");
//we only open the OPCPackage, we don't create a Workbook
OPCPackage opcpackage = OPCPackage.open(file);
//if there are strings in the SheetData, we need the SharedStringsTable
PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
SharedStringsTable sharedstringstable = new SharedStringsTable();
sharedstringstable.readFrom(sharedstringstablepart.getInputStream());
//get the PackagePart of the first sheet
PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet1.xml")).get(0);
//get the worksheet from the first sheet's XML
//if it even fails while parsing this, then this approach is not usable
WorksheetDocument worksheetdocument = WorksheetDocument.Factory.parse(sheetpart.getInputStream());
CTWorksheet worksheet = worksheetdocument.getWorksheet();
CTSheetData sheetdata = worksheet.getSheetData();
//put some data in 10 new rows
for (int i = 0; i < 10; i++) {
int rowsCount = sheetdata.sizeOfRowArray();
CTCell ctcell= sheetdata.addNewRow().addNewC();
CTRst ctstr = CTRst.Factory.newInstance();
ctstr.setT("new Row " + (rowsCount + 1));
int sRef = sharedstringstable.addEntry(ctstr);
ctcell.setT(STCellType.S);
ctcell.setV(Integer.toString(sRef));
ctcell=sheetdata.getRowArray(rowsCount).addNewC();
ctcell.setV(""+rowsCount+"."+(i+1)+""+((i+2>9)?0:i+2));
}
//write the SharedStringsTable
OutputStream out = sharedstringstablepart.getOutputStream();
sharedstringstable.writeTo(out);
out.close();
//create XmlOptions for saving the worksheet
XmlOptions xmlOptions = new XmlOptions();
xmlOptions.setSaveOuter();
xmlOptions.setUseDefaultNamespace();
xmlOptions.setSaveAggressiveNamespaces();
xmlOptions.setCharacterEncoding("UTF-8");
xmlOptions.setSaveSyntheticDocumentElement(new QName(CTWorksheet.type.getName().getNamespaceURI(), "worksheet"));
Map<String, String> map = new HashMap<String, String>();
map.put(STRelationshipId.type.getName().getNamespaceURI(), "r");
xmlOptions.setSaveSuggestedPrefixes(map);
//save the worksheet
out = sheetpart.getOutputStream();
worksheet.save(out, xmlOptions);
out.close();
opcpackage.close();
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
This code writes 10 new rows to sheet1 of ReadAndWriteTest.xlsx without opening the whole workbook. It must, however, at least open and parse sheet1 and the SharedStringsTable; if even this fails, this approach is not usable.
Another approach is StAX. This API reads and writes XML in an event-driven way, and it uses streaming.
Example:
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;
import javax.xml.namespace.QName;
import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
class StaxReadAndWriteTest {
public static void main(String[] args) {
try {
File file = new File("ReadAndWriteTest.xlsx");
OPCPackage opcpackage = OPCPackage.open(file);
//if there are strings in the sheet data, we need the SharedStringsTable
//if it even fails while parsing this SharedStringsTable, then this approach is not usable
//then we must stream this XML event driven also.
PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
SharedStringsTable sharedstringstable = new SharedStringsTable();
sharedstringstable.readFrom(sharedstringstablepart.getInputStream());
PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet1.xml")).get(0);
XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(sheetpart.getInputStream());
XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(sheetpart.getOutputStream());
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
int rowsCount = 0;
while(reader.hasNext()){ //loop over all XML in sheet1.xml
XMLEvent event = (XMLEvent)reader.next();
writer.add(event); //by default write each event that was read
if(event.isStartElement()){
StartElement startElement = (StartElement)event;
QName startElementName = startElement.getName();
if(startElementName.getLocalPart().equalsIgnoreCase("row")) { //start element of row
boolean rowStart = true;
rowsCount++;
do {
event = (XMLEvent)reader.next(); //find this row's end
writer.add(event); //by default write each event that was read
if(event.isEndElement()){
EndElement endElement = (EndElement)event;
QName endElementName = endElement.getName();
if(endElementName.getLocalPart().equalsIgnoreCase("row")) { //end element of row
rowStart = false;
//we assume that there is nothing else (character data) between end element of row and next element
XMLEvent nextElement = (XMLEvent)reader.peek();
QName nextElementName = null;
if (nextElement.isStartElement()) nextElementName = ((StartElement)nextElement).getName();
else if (nextElement.isEndElement()) nextElementName = ((EndElement)nextElement).getName();
if(!nextElementName.getLocalPart().equalsIgnoreCase("row")) { //next is not start element of row
//we have the last row, so we write new rows now
for (int i = 0; i < 10; i++) {
StartElement newRowStart = eventFactory.createStartElement(new QName("row"), null, null);
writer.add(newRowStart);
//start cell A
Attribute attribute = eventFactory.createAttribute("t", "s");
List attributeList = Arrays.asList(attribute);
StartElement newCellStart = eventFactory.createStartElement(new QName("c"), attributeList.iterator(), null);
writer.add(newCellStart);
CTRst ctstr = CTRst.Factory.newInstance();
ctstr.setT("new Row " + (rowsCount +1));
int sRef = sharedstringstable.addEntry(ctstr);
StartElement newCellValue = eventFactory.createStartElement(new QName("v"), null, null);
writer.add(newCellValue);
Characters value = eventFactory.createCharacters(Integer.toString(sRef));
writer.add(value);
EndElement newCellValueEnd = eventFactory.createEndElement(new QName("v"), null);
writer.add(newCellValueEnd);
EndElement newCellEnd = eventFactory.createEndElement(new QName("c"), null);
writer.add(newCellEnd);
//end cell A
//start cell B
newCellStart = eventFactory.createStartElement(new QName("c"), null, null);
writer.add(newCellStart);
newCellValue = eventFactory.createStartElement(new QName("v"), null, null);
writer.add(newCellValue);
value = eventFactory.createCharacters(""+rowsCount+"."+(i+1)+""+((i+2>9)?0:i+2));
writer.add(value);
newCellValueEnd = eventFactory.createEndElement(new QName("v"), null);
writer.add(newCellValueEnd);
newCellEnd = eventFactory.createEndElement(new QName("c"), null);
writer.add(newCellEnd);
//end cell B
EndElement newRowEnd = eventFactory.createEndElement(new QName("row"), null);
writer.add(newRowEnd);
rowsCount++;
}
}
}
}
} while (rowStart);
}
}
}
writer.flush();
//write the SharedStringsTable
OutputStream out = sharedstringstablepart.getOutputStream();
sharedstringstable.writeTo(out);
out.close();
opcpackage.close();
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
This code also writes 10 new rows to sheet1 of ReadAndWriteTest.xlsx without opening the whole workbook. It must, however, at least open and parse the SharedStringsTable; if even this fails, this approach is not usable either. Of course, the SharedStringsTable could also be streamed using StAX, but as the row and cell generation in this example shows, that is much more complicated. Using the SharedStringsTable simply makes things easier here.
I am trying to view a Word file in my JEditorPane.
I tried this code:
import java.awt.Dimension;
import java.awt.GridLayout;
import java.io.File;
import java.io.FileInputStream;
import javax.swing.JEditorPane;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
public class editorpane extends JEditorPane
{
public editorpane(File file)
{
try
{
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
HWPFDocument hwpfd = new HWPFDocument(fis);
WordExtractor we = new WordExtractor(hwpfd);
String[] array = we.getParagraphText();
for (int i = 0; i < array.length; i++)
{
this.setPage(array[i]);
}
} catch (Exception e)
{
e.printStackTrace();
}
}
}
but it gives me:
org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:106)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174)
at frame1.editorpane.<init>(editorpane.java:24)
at this line:
HWPFDocument hwpfd = new HWPFDocument(fis);
How can I solve that?
Besides that, I am not sure about these lines:
for (int i = 0; i < array.length; i++)
{
this.setPage(array[i]);
}
Can someone confirm whether they are right?
You are trying to open a .docx file (XWPF) with code for .doc (HWPF) files. You can use XWPFWordExtractor for .docx files.
There is also an ExtractorFactory, which lets POI decide which of these formats applies and use the correct class to open the file. However, you then cannot iterate over paragraphs, because only the generic getText() method is available.
Use it like this:
POITextExtractor extractor = ExtractorFactory.createExtractor(file);
String text = extractor.getText();
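Regarding the loop you asked about: JEditorPane.setPage(String) treats its argument as a URL to load, so passing paragraph text typically throws a MalformedURLException, and each call would replace the previous page anyway; setText is the method for showing text. Below is a minimal sketch, assuming Apache POI 4.x or later with poi-ooxml (XWPF) and poi-scratchpad (HWPF) on the classpath, that picks the extractor by file extension and shows the result as plain text. WordText, extract, docxText and show are hypothetical names.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.swing.JEditorPane;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordText {
    /** Extracts plain text: XWPF for .docx (Office 2007+ XML), HWPF for legacy .doc. */
    public static String extract(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            if (file.getName().toLowerCase().endsWith(".docx")) {
                try (XWPFDocument docx = new XWPFDocument(in)) {
                    return docxText(docx);
                }
            }
            HWPFDocument doc = new HWPFDocument(in);
            return new WordExtractor(doc).getText();
        }
    }

    /** Text of an already opened .docx document. */
    public static String docxText(XWPFDocument docx) {
        return new XWPFWordExtractor(docx).getText();
    }

    /** Displays plain text; setPage() would interpret its argument as a URL instead. */
    public static void show(JEditorPane pane, String text) {
        pane.setContentType("text/plain");
        pane.setEditable(false);
        pane.setText(text);
    }
}
```

In the question's constructor, the HWPFDocument and setPage lines would become WordText.show(this, WordText.extract(file)); inside the existing try block. Choosing by extension is a simplification: a renamed file would still fail, which is where ExtractorFactory is more robust.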
I am trying to read this testfile with the Apache POI API (current version 3.10-FINAL). The following test code
import java.io.FileInputStream;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class ExcelTest {
public static void main(String[] args) throws Exception {
String filename = "testfile.xlsx";
XSSFWorkbook wb = new XSSFWorkbook(new FileInputStream(filename));
XSSFSheet sheet = wb.getSheetAt(0);
System.out.println(sheet.getFirstRowNum());
}
}
returns -1 as the first row number (and existing rows come back as null). The test file was created by Excel 2010 (I have no control over that part) and can be read with Excel without warnings or problems. If I open and save the file with my version of Excel (2013), it can be read exactly as expected.
Any hints as to why I can't read the original file, or how I can, are highly appreciated.
The testfile.xlsx was created with "SpreadsheetGear 7.1.1.120"; to see that, open the XLSX file with software that can handle ZIP archives and look into /xl/workbook.xml. In the /xl/worksheets/sheet?.xml files, notice that none of the row elements has a row number. If I add a row number to the first row tag, like <row r="1">, then Apache POI can read this row.
As to who is to blame for this: definitely both Apache POI and SpreadsheetGear ;-). Apache POI, because the r attribute of the row element is optional. But also SpreadsheetGear, because there is no reason to omit the r attribute when Excel itself always writes it.
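To find out whether a file has this problem before handing it to POI, a minimal JDK-only sketch (java.util.zip plus StAX) can count the row elements that lack an r attribute. RowRefCheck is a hypothetical name, and the entry name in the usage assumes the usual xl/worksheets/sheet1.xml layout.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RowRefCheck {
    /** Counts <row> elements without an r (row number) attribute in a worksheet XML stream. */
    public static int rowsWithoutRef(InputStream sheetXml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(sheetXml);
        int missing = 0;
        try {
            while (reader.hasNext()) {
                // a null namespace URI means the attribute's namespace is not checked
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "row".equals(reader.getLocalName())
                        && reader.getAttributeValue(null, "r") == null) {
                    missing++;
                }
            }
        } finally {
            reader.close();
        }
        return missing;
    }

    /** Same check on a worksheet entry inside an .xlsx file, e.g. "xl/worksheets/sheet1.xml". */
    public static int rowsWithoutRef(File xlsx, String entryName) throws IOException, XMLStreamException {
        try (ZipFile zip = new ZipFile(xlsx)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No entry " + entryName + " in " + xlsx);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return rowsWithoutRef(in);
            }
        }
    }
}
```

For example, RowRefCheck.rowsWithoutRef(new File("testfile.xlsx"), "xl/worksheets/sheet1.xml") returning a value greater than 0 indicates rows that the plain XSSFSheet row API may not see.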
If you cannot get testfile.xlsx in a format that Apache POI can read directly, then you must work with the underlying objects. The following works with your testfile.xlsx:
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.*;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.FileInputStream;
import java.io.InputStream;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTSheetData;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRow;
import java.util.List;
class Testfile {
public static void main(String[] args) {
try {
InputStream inp = new FileInputStream("testfile.xlsx");
Workbook wb = WorkbookFactory.create(inp);
Sheet sheet = wb.getSheetAt(0);
System.out.println(sheet.getFirstRowNum());
CTWorksheet ctWorksheet = ((XSSFSheet)sheet).getCTWorksheet();
CTSheetData ctSheetData = ctWorksheet.getSheetData();
List<CTRow> ctRowList = ctSheetData.getRowList();
Row row = null;
Cell[] cell = new Cell[2];
for (CTRow ctRow : ctRowList) {
row = new MyRow(ctRow, (XSSFSheet)sheet);
cell[0] = row.getCell(0);
cell[1] = row.getCell(1);
if (cell[0] != null && cell[1] != null && !cell[0].toString().isEmpty() && !cell[1].toString().isEmpty())
System.out.println(cell[0].toString()+"\t"+cell[1].toString());
}
} catch (InvalidFormatException ifex) {
ifex.printStackTrace();
} catch (FileNotFoundException fnfex) {
fnfex.printStackTrace();
} catch (IOException ioex) {
ioex.printStackTrace();
}
}
}
class MyRow extends XSSFRow {
MyRow(org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRow row, XSSFSheet sheet) {
super(row, sheet);
}
}
I have used:
org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet
org.openxmlformats.schemas.spreadsheetml.x2006.main.CTSheetData
org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRow
which are part of the Apache POI binary distribution poi-bin-3.10.1-20140818, inside poi-ooxml-schemas-3.10.1-20140818.jar.
For a documentation see http://grepcode.com/snapshot/repo1.maven.org/maven2/org.apache.poi/ooxml-schemas/1.1/
I have extended XSSFRow because its constructor is protected, so it cannot be called directly from Testfile.