Excel generation, unreadable content - java

I'm using Apache POI 3.12 (SXSSF workbook) to generate .xlsx files.
The generation itself succeeds, but when I open the file Excel shows this error message:
Excel found unreadable content in file.xlsx. Do you want to recover
the contents of this workbook? If you trust the source of this
workbook, click Yes.
After clicking Yes, the file opens and I get this notification:
Excel completed file level validation and repair. Some parts of this
workbook may have been repaired or discarded. Removed Records:
Comments from /xl/comments1.xml part (Comments) Repaired Records:
Comments from /xl/comments1.xml part (Comments)
After that, I unzipped the Excel file and checked comments1.xml. All my comments are present, all 216 of them.
The section of the code that generates the comments is the following:
String comment = _propertiesHolder.getComment();
String commentAuthor = _propertiesHolder.getCommentAuthor();
if(comment != null)
{
    int colIndex = cell.getColumnIndex();
    int rowIndex = cell.getRowIndex();
    CreationHelper helper = _workbook.getCreationHelper();
    ClientAnchor anchor = helper.createClientAnchor();
    anchor.setCol1(colIndex);
    anchor.setCol2(colIndex + 1);
    anchor.setRow1(rowIndex);
    anchor.setRow2(rowIndex + 3);
    // Create the comment and set the text+author
    Comment cellComment = _drawingPatriarch.createCellComment(anchor);
    if(commentAuthor != null)
    {
        cellComment.setAuthor(commentAuthor);
        RichTextString rs = helper.createRichTextString(commentAuthor + ": " + comment);
        cellComment.setString(rs);
    }
    else
    {
        cellComment.setString(helper.createRichTextString(comment));
    }
    cellComment.setRow(rowIndex);
    cellComment.setColumn(colIndex);
    // Assign the comment to the cell
    cell.setCellComment(cellComment);
}
Do you have any idea what could be the cause of this problem? Although no information was lost, clearly there is something wrong and I would like to fix it. The comments are retrieved from database (varchar datatype). The biggest comment is 138 characters long.
Update
Something I forgot to mention: I also ran the same extraction using the HSSF implementation and no errors were reported, so it seems safe to assume that the data is not the problem.

OK, I found the problem. It was with the author.
The problem is this line: cellComment.setAuthor(commentAuthor);
If for one comment we set
cellComment.setAuthor("test")
and then for another comment we set
cellComment.setAuthor("test ")
Excel will show the error when opening the file. Mind the trailing whitespace. The solution is to trim the author string before setting it.
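A minimal sketch of that trimming step, using plain Java only. The class and method names (CommentAuthors, normalize) are hypothetical and not part of Apache POI; mapping a blank author to null is also an assumption, made so that the existing null check in the question's code keeps working.

```java
// Hypothetical helper (not part of Apache POI): normalizes author names so that
// "test" and "test " end up as the same author string.
class CommentAuthors {

    /** Returns the author without surrounding whitespace, or null if nothing is left. */
    static String normalize(String rawAuthor) {
        if (rawAuthor == null) {
            return null;
        }
        String trimmed = rawAuthor.trim();
        // Assumption: a blank author is treated like a missing one.
        return trimmed.isEmpty() ? null : trimmed;
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("test ") + "]"); // [test]
        System.out.println("[" + normalize("test") + "]");  // [test]
    }
}
```

In the question's code this would presumably be applied once, where commentAuthor is read from _propertiesHolder, so that both setAuthor and the comment text use the trimmed value.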

Related

Java String contains/indexof fails due to wrong encoding from local file

EDIT:
I have a semi-working solution at the bottom.
Or, the original text:
I have a local CSV file. The file is encoded in utf16le. I want to read the file into memory in java, modify it, then write it out. I have been having incredibly strange problems for hours.
The source of the file is Facebook leads generation. It is a CSV. Each line of the file contains the text "2022-08-08". However when I read in the line with a buffered reader, all String methods fail. contains("2022-08-08") returns false. I print out the line directly after checking, and it indeed contains the text "2022-08-08". So the String methods are totally failing.
I think it's possibly due to encoding but I'm not sure. I tried pasting the code into this website for help, but any part of the code that includes copy pasted strings from the CSV file refuses to paste into my browser.
int i = s.indexOf("2022");
if (i < 0) {
    System.out.println(s.contains("2022") + ", "+s);
    continue;
}
Prints: false, 2022-08-08T19:57:51+07:00
There are tons of invisible characters in the CSV file and in my IDE everywhere I have copy pasted from the file. I know the characters are there because when I backspace them it deletes the invisible character instead of the actual character I would expect it to delete.
Please help me.
EDIT:
This code appears to fix the problem. I think partially the problem is Facebook's encoding of the file, and partially because the file is from user generated inputs and there are a few very strange inputs. If anyone has more to add or a better solution I will award it. Not sure exactly why it works. Combined from different sources that had sparse explanation.
Is there a way to determine the encoding automatically? Windows Notepad is able to do it.
BufferedReader fr = new BufferedReader(new InputStreamReader(new FileInputStream(new File("C:\\New folder\\form.csv")), "UTF-16LE"));
BufferedWriter fw = Files.newBufferedWriter(Paths.get("C:\\New folder", "form3.txt"));
String s;
while ((s = fr.readLine()) != null) {
    s = s.replaceAll("\\p{C}", "?").replaceAll("[^A-Za-z0-9],", "").replaceAll("[^\\x00-\\x7F]", "");
    // do stuff with s normally
}
You can verify what you're getting from the stream by
byte[] b = s.getBytes(StandardCharsets.UTF_16BE);
System.out.println(Arrays.toString(b));
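On the question of determining the encoding automatically: one common approach is to look for a byte order mark (BOM) at the start of the file. The sketch below is only that, a sketch. BomSniffer and detect are hypothetical names, it covers just the UTF-8 and UTF-16 BOMs, and it assumes the file actually starts with a BOM, which the question does not confirm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guesses a charset from a leading byte order mark.
class BomSniffer {

    /** Returns the charset indicated by a BOM at the start of head, or null if none is found. */
    static Charset detect(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE; // note: a UTF-32LE BOM also starts with FF FE
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        return null; // no BOM: the encoding has to be known or guessed some other way
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x32, 0x00};
        System.out.println(detect(sample)); // UTF-16LE
    }
}
```

Note that Java's "UTF-16LE" decoder keeps the BOM as a leading U+FEFF character in the decoded text, whereas the "UTF-16" charset reads the BOM to pick the byte order and drops it, so a leftover U+FEFF may be one of the invisible characters.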
I think the searching condition for indexOf could be wrong:
int i = s.indexOf("2022");
if (i < 0) {
    System.out.println(s.contains("2022") + ", "+s);
    continue;
}
Maybe the condition should be (i != -1), if I'm not mistaken.
It's a little tricky, because when (i < 0) the string should not contain "2022".
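One possible explanation for the symptom in the question (contains returns false although the printed line looks right) is that UTF-16LE bytes were decoded with a single-byte charset, which leaves an invisible NUL character after every ASCII character. This is an assumption, since the original failing reader code is not shown. The sketch below reproduces the effect in memory; Utf16Demo and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same UTF-16LE bytes decoded with the wrong and the right charset.
class Utf16Demo {

    /** Decodes the bytes as ISO-8859-1, i.e. one char per byte (the assumed mistake). */
    static String decodeWrong(byte[] utf16leBytes) {
        return new String(utf16leBytes, StandardCharsets.ISO_8859_1);
    }

    /** Decodes the bytes as UTF-16LE, i.e. one char per two bytes for this input. */
    static String decodeRight(byte[] utf16leBytes) {
        return new String(utf16leBytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] raw = "2022-08-08".getBytes(StandardCharsets.UTF_16LE);
        String wrong = decodeWrong(raw);
        String right = decodeRight(raw);
        System.out.println(wrong.contains("2022")); // false: NULs sit between the digits
        System.out.println(right.contains("2022")); // true
        System.out.println(wrong.length() + " vs " + right.length()); // 20 vs 10
    }
}
```

Many consoles do not render NUL characters, which would explain why the wrongly decoded line still looks like plain "2022-08-08" when printed.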

Getting question mark instead of multiple white spaces while exporting to excel in apache poi

I am getting a question mark symbol (?) instead of multiple white spaces in the output Excel file. I am using Apache POI 3.7. A single space works fine.
For example:
if my input is "a  b" then the generated output is "a? b".
Here a and b have two spaces in between.
This code snippet works just fine.
Can you compare it with your own code and post a code sample if you still have the problem?
Workbook book = new HSSFWorkbook();
Sheet sheet = book.createSheet();
Row oRow = sheet.createRow(1);
Cell oCell = oRow.createCell(1);
oCell.setCellValue("a b");
OutputStream out = new FileOutputStream("c:\\temp\\test.xls");
book.write(out);
out.close();
Try to open your generated spreadsheet output in Microsoft Excel.
It is an encoding issue. If your input contains multiple white spaces, OpenOffice may sometimes show them as "?".
For future reference, this solved my problem. As Eric pointed out, one should first find out which character codes are causing trouble; in my particular case they were zeroes.
String s = getStringFromSource();
s = s.replace('\u0000', '\u0020'); // check the values by converting decimal to hex first; u0020 means 32 (space)
cell.setCellValue(s);
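To find out which character codes are causing trouble, it can help to list every character outside printable ASCII together with its position. The following is a small sketch in plain Java; CharInspector and describeSuspicious are hypothetical names, and treating everything outside 0x20 to 0x7E as suspicious is an assumption that suits this kind of debugging, not a general rule.

```java
// Hypothetical debugging helper: reports characters outside printable ASCII.
class CharInspector {

    /** Returns e.g. "index 1: U+0000" for every code point outside 0x20..0x7E. */
    static String describeSuspicious(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x20 || cp > 0x7E) {
                if (sb.length() > 0) {
                    sb.append(", ");
                }
                sb.append(String.format("index %d: U+%04X", i, cp));
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describeSuspicious("a\u0000 b")); // index 1: U+0000
    }
}
```

Once the offending code points are known, they can be replaced or removed before the string is written to the cell.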

Updating TableView using a CSV

Much like this question, I am trying to update a TableView in JavaFX. I have adopted the solution using DataFX.
My code :
File file = new File(path);
if(file.exists() && file.canRead()) {
    DataSourceReader dsr1 = new FileSource(file);
    String[] columnsArray = {"firstName", "lastName"};
    CSVDataSource ds1 = new CSVDataSource(dsr1, columnsArray);
    System.out.println("CSV : " + ds1.getData().size()); // outputs 0
    //Below is commented out since I don't have data : source of the error
    //tblAthleteList.setItems(ds1.getData());
    //tblAthleteList.getColumns().addAll(ds1.getColumns());
}
Here is a view of my test .csv file :
firstName, lastName
first, last
test, tester
I am using JavaFX 2, DataFX 1.0 and building in e(fx)clipse
Edit
I changed the code a bit to use the FileSource(File f) constructor to see if that changes anything. Whenever I try to print something from the CSVDataSource I get a NullPointerException, so I assume the CSVDataSource doesn't get any data. From the examples I can find, this is being done correctly. I can read the file using a simple BufferedReader and a loop.
Edit 2
Edited the question... I am now specifying that the error is in the fact that no data gets pulled into the CSVDataSource from the .csv file. The line ds1.getData().size() returns 0. Posted a very simple .csv file I am using. EOL consists of CR + LF and edited in Notepad++ (no Excel superfluous characters).
Make sure the column names in columnsArray exactly match the column names in the CSV file (they are case sensitive).
I got a similar exception when I used year as the column name in code while my CSV file had Year.
Update according to the edit in the question:
Remove the space between , and lastName in the file, or use " lastName" as the column name in code :)
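To illustrate why the space matters: splitting the header line on commas yields " lastName" with a leading space, which is not equal to "lastName". The sketch below uses plain Java only and says nothing about how DataFX parses the file internally; CsvHeader and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows the header names with and without trimming.
class CsvHeader {

    /** Splits the header line on commas and keeps the names exactly as written. */
    static List<String> rawNames(String headerLine) {
        List<String> names = new ArrayList<>();
        for (String part : headerLine.split(",", -1)) {
            names.add(part);
        }
        return names;
    }

    /** Splits the header line on commas and trims whitespace around each name. */
    static List<String> trimmedNames(String headerLine) {
        List<String> names = new ArrayList<>();
        for (String part : headerLine.split(",", -1)) {
            names.add(part.trim());
        }
        return names;
    }

    public static void main(String[] args) {
        String header = "firstName, lastName";
        System.out.println(rawNames(header).contains("lastName"));     // false
        System.out.println(trimmedNames(header).contains("lastName")); // true
    }
}
```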

POI: How to validate empty cells in a range

I'm creating an Excel file in which, once it has been created and downloaded, the user must not leave empty cells in a specific column (because they will send the file back with the information they entered).
I'm using POI HSSFDataValidation with setEmptyCellAllowed(false).
But when the user downloads the file, they can still leave cells empty (after writing some text and deleting it).
Any suggestions?
Here's my code:
HSSFDataValidation dv = new HSSFDataValidation();
dv.setFirstColumn((short)19);
dv.setLastColumn((short)19);
dv.setFirstRow((short)4);
dv.setLastRow((short)24);
dv.setDataValidationType(DVConstraint.ValidationType.INTEGER);
dv.setOperator(DVConstraint.OperatorType.BETWEEN);
dv.setDataValidationType(HSSFDataValidation.DATA_TYPE_INTEGER);
dv.setOperator(HSSFDataValidation.OPERATOR_BETWEEN);
dv.setFirstFormula("0");
dv.setSecondFormula("1000");
//dv.setEmptyCellAllowed(true);
dv.setEmptyCellAllowed(false);
dv.setShowPromptBox(true);
dv.setSurppressDropDownArrow(false);
dv.setErrorStyle(HSSFDataValidation.ERROR_STYLE_STOP);
//dv.createErrorBox("", "");
//dv.createPromptBox("", "");
sheet.addValidationData(dv);
This does not answer your question directly, but you can use a try/catch block to write something in the cell yourself if the user has left it blank:

Can I get access to Lotus Notes embedded files without actually extracting them?

I'm working on a way of programmatically accessing a Lotus Notes database to gather information on the embedded attachments of records over a given period.
My goal is to find records over a given period, then use Apache-POI to get metadata about document size, character count, etc.
The POI part works fine, and so far, I've been able to access the Lotus Notes records thanks to this help:
lotus notes search by date with Java api
and this answer also shows me how to download/copy the attachments:
How do I get all the attachments from a .nsf(lotus notes) file using java
From there I could use my POI code to do its job and, at the end, just delete the copied attachments. This approach basically works, but I want to avoid the overhead of copying, saving and finally deleting my copies of the documents attached in the database.
I tried passing the result of the EmbeddedObject getSource() method as an input to my POI code and got a FileNotFoundException in the POI code that was expecting a String to make a File.
Is there a way of getting a File reference I can pass to POI, without copying and saving the attachment? Or, what I mean is, is it as simple as getting a File (+path) for the Lotus Notes EmbeddedObject attachment, and how do I do this?
I found the answer and posted it below.
Answering my own question...
...here's the solution I found a little while after posting the question above:
EmbeddedObject's getInputStream to the rescue...
//from the answer in the link in the question above
Database db = agentContext.getCurrentDatabase();
DocumentCollection dc = db.getAllDocuments();
Document doc = dc.getFirstDocument();
boolean saveFlag = false;
while (doc != null) {
    RichTextItem body = (RichTextItem) doc.getFirstItem("Body");
    System.out.println(doc.getItemValueString("Subject"));
    Vector embeddedObjs = body.getEmbeddedObjects();
    Enumeration e = embeddedObjs.elements();
    while (e.hasMoreElements()) {
        EmbeddedObject eo = (EmbeddedObject) e.nextElement();
        if (eo.getType() == EmbeddedObject.EMBED_ATTACHMENT) {
            //this next line gives Apache-POI access to the InputStream
            InputStream is = eo.getInputStream();
            POIFSFileSystem POIfs = HWPFDocument.verifyAndBuildPOIFS(is);
            POIOLE2TextExtractor extractor = ExtractorFactory.createExtractor(POIfs);
            System.out.println("extracted text: " + extractor.getText());
            is.close(); //closing InputStream
        }
        eo.recycle(); //recycling EmbeddedObject
        //thanks to rhsatrhs for the close() and recycle() tip!
    }
    //the posted snippet ends above; advancing to the next document is added here to close the loop
    doc = dc.getNextDocument();
}
