I have an Excel file with 2 pages.
The first page, "sheet_n1", has a formula that references another page: =MAX(sheet_n2!A1:sheet_n2!A3)
The second page, "sheet_n2", has a table with data.
After saving, I opened my Excel file as a ZIP archive. In the XML file for "sheet_n1", the formula has apostrophes around the sheet name in the second part of the range: =MAX(sheet_n2!A1:'sheet_n2'!A3)
This does not affect how the formula works in Excel, and I don't see these apostrophes in the Excel app.
But it does affect opening the file with the Apache POI library in my Java application: I get the apostrophes when I read the cell's formula.
Can someone explain where these apostrophes come from, and why they appear only in the second part of the formula?
=MAX(17, 'sheet_n2'!A3) is correct syntax. The sheet name is surrounded by single quotes (apostrophes). That's the rule.
However, Microsoft, in its never-ending effort to make things easier, determined that the quotes can be left out if there is no space in the tab name. The consequence is that Excel will remove the quotes if there is no space in the name, even if you type them. But if you drive Excel from other languages, including VBA, the quotes will not be removed even though they may not be needed. Apache POI is likely aware of this and would know how to deal with the quotes.
=MAX(sheet_n2!A1:'sheet_n2'!A3) is a special case because the second mention of the sheet isn't required. =MAX(sheet_n2!A1:A3) is adequate. So, Excel doesn't quite know what to do with the extra information and leaves it untouched.
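If the optional quotes get in the way after reading the formula string in Java, one workaround is to normalize the string yourself. The sketch below is not part of Apache POI; the class and method names are made up for illustration. It is deliberately naive: it only unquotes names made of letters, digits and underscores, and it ignores edge cases such as string literals inside the formula or sheet names that look like cell references (for example A1), which do need their quotes.

```java
import java.util.regex.Pattern;

class SheetQuoteNormalizer {
    // Matches 'name'! where name contains only letters, digits and underscores
    // and does not start with a digit. Names with spaces are left alone.
    private static final Pattern QUOTED_SIMPLE_SHEET =
            Pattern.compile("'([A-Za-z_][A-Za-z0-9_]*)'!");

    /** Removes the quotes around simple sheet names in a formula string. */
    static String stripOptionalQuotes(String formula) {
        return QUOTED_SIMPLE_SHEET.matcher(formula).replaceAll("$1!");
    }

    public static void main(String[] args) {
        // The formula as it appears in the sheet XML from the question.
        System.out.println(stripOptionalQuotes("MAX(sheet_n2!A1:'sheet_n2'!A3)"));
        // A name containing a space keeps its quotes.
        System.out.println(stripOptionalQuotes("MAX('my sheet'!A1:A3)"));
    }
}
```

The first call prints MAX(sheet_n2!A1:sheet_n2!A3); the second prints its input unchanged.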
Related
I am making a Java program where I input answers for a friendship survey. It spits out each student's top ten friends. However, I need to print out the results and give them to the students. The old way of doing it was to have the Java program write HTML; then we would open each file one at a time and print out the page. With 400+ students, that takes a while.
Since I am remaking the program, I would like it to produce Word files that I can print all at once. However, I don't know how to write to a Word file, and Notepad isn't stylish enough. Does anyone know how to make this possible, or another way that is easier?
I did a similar thing some years ago, using Rich Text Format. Its advantage is that it's a plain text format that can easily be manipulated.
I created the form document in Word with some unique placeholder strings where I'd later fill in the actual data and saved it as RTF.
With a text editor, I made sure that Word didn't split the placeholders by inserting some junk formatting directives, and corrected that manually where necessary.
Then, filling in the actual data just meant doing some simple text replacement (in my case, there was no risk of interfering with the formatting directives) and saving the resulting RTF file.
As Word typically opens RTF files just as easily as DOC or DOCX ones, this was an easy working solution for me.
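A minimal Java sketch of that replacement step might look like the following. The placeholder tokens (@@NAME@@, @@FRIEND1@@) and the class name are hypothetical; the template is held in a string here to keep the example self-contained, whereas in practice it would be read from the RTF file saved from Word. Values are escaped so that backslashes, braces and non-ASCII characters in a student's name cannot break the RTF markup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RtfTemplateFiller {
    /** Escapes RTF control characters and encodes non-ASCII characters as \\uN? */
    static String escapeRtf(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);
            } else if (c > 127) {
                // RTF expects a signed 16-bit decimal value plus a fallback character.
                sb.append("\\u").append((int) (short) c).append('?');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Replaces every placeholder key in the template with its escaped value. */
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace(e.getKey(), escapeRtf(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical template; a real one would come from Word's "Save as RTF".
        String template = "{\\rtf1\\ansi Top friends of @@NAME@@: @@FRIEND1@@\\par}";
        Map<String, String> values = new LinkedHashMap<>();
        values.put("@@NAME@@", "Zoé");
        values.put("@@FRIEND1@@", "Sam {Sammy}");
        System.out.println(fill(template, values));
    }
}
```

Calling fill once per student and writing each result to its own .rtf file gives a folder of documents that can be selected and printed together.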
I have a pdf textbook which has math equations like this:
However, if I attempt a simple text extraction, I get something along the lines of:
V(r) = - 3 - -
2R R2
This is not an image, it is text but I don't know how to preserve the way it looks and get the actual characters into a text file.
The problem you are running into is a frequently encountered one. PDF essentially doesn't care about structure. It has no notion of a column, paragraph, a line of text or even a word, let alone a mathematical formula with lots of special formatting.
PDF - essentially - is only interested in placing things on a page at a specific location. And that's exactly what it does with your formulas as well: it takes the characters and graphics needed for a formula and puts them somewhere on the page, without any additional information that you could use afterwards to figure out that these characters and graphics even belong to a formula, let alone to reconstruct it during text extraction.
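To make that concrete, here is a toy model in Java. It is not a PDF parser, and the Fragment class and coordinates are invented for illustration. It treats a page as a bag of text fragments with absolute positions and does what a simple extractor does: group fragments by baseline height and read each group left to right. A fraction such as 3 over 2R then falls apart, because its numerator and denominator sit on different baselines from the rest of the formula.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

class PositionedTextDemo {
    /** One piece of text placed at an absolute page position (hypothetical model). */
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Groups fragments by baseline (top of page first) and joins each group left to right. */
    static List<String> extractLines(List<Fragment> fragments) {
        // PDF coordinates start at the bottom-left, so a larger y is higher on the page.
        TreeMap<Double, List<Fragment>> byBaseline = new TreeMap<>(Comparator.reverseOrder());
        for (Fragment f : fragments) {
            byBaseline.computeIfAbsent(f.y, k -> new ArrayList<>()).add(f);
        }
        List<String> lines = new ArrayList<>();
        for (List<Fragment> line : byBaseline.values()) {
            line.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            StringBuilder sb = new StringBuilder();
            for (Fragment f : line) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(f.text);
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Fragment> page = new ArrayList<>();
        page.add(new Fragment(150, 690, "2R"));       // denominator, below the baseline
        page.add(new Fragment(100, 700, "V(r) = -")); // main baseline
        page.add(new Fragment(152, 710, "3"));        // numerator, above the baseline
        for (String line : extractLines(page)) {
            System.out.println(line);
        }
    }
}
```

Nothing in the input says that "3" and "2R" form a fraction; the extractor only sees three unrelated baselines.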
Two additional points:
1) If you share an example of such a PDF document, we could have a look if there is some useful information in it that could be used to extract this formula in a more competent way; but the chance is close to zero.
2) You would also have to define what a "useful way" from your point of view is. Formulas don't translate well to plain text files, so you probably need something like MathML to store them in.
Does anyone know whether the apache-poi library lets you change the decimal and thousands separators for Microsoft Excel?
I need to export some data from a web application to Excel, and the numbers are formatted according to the user's settings. So when the data is exported, the numbers should look exactly as they do on the application's page.
Thanks
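You need to set your CellStyle dataFormat in this way (if you use integers and want a thousands separator; workbook and cell are your existing Workbook and Cell):
CellStyle cellStyle = workbook.createCellStyle();
CreationHelper creationHelper = workbook.getCreationHelper();
cellStyle.setDataFormat(creationHelper.createDataFormat().getFormat("#,##0"));
cell.setCellStyle(cellStyle);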
I think that you need something like this (I didn't try it, so you may need to modify it a little): #,##0.00
Please note: it is very important that you use a comma, and not a dot, as the thousands symbol in the format string. If your locale is set correctly, you will see a dot.
Formatting in Excel is controlled through the Tools > Options > International dialogs, and is stored in local preferences, not in a file. So you can't control this through POI.
The only solution I can think of is to provide text rather than numbers. But it will prevent user from doing any calculation in Excel.
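As a rough sketch of that text-based fallback, the standard java.text classes can format a number with whatever separators the user has chosen. The class and method names below are made up for illustration, and the two separator characters stand in for the user's settings from the web application.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class UserNumberText {
    /** Formats a value with two decimals using the given grouping and decimal separators. */
    static String format(double value, char groupingSeparator, char decimalSeparator) {
        // Start from locale-neutral symbols, then apply the user's choices.
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator(groupingSeparator);
        symbols.setDecimalSeparator(decimalSeparator);
        // In the pattern, ',' and '.' are placeholders; the symbols decide what is printed.
        DecimalFormat format = new DecimalFormat("#,##0.00", symbols);
        return format.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(1234567.891, '.', ','));
        System.out.println(format(1234.5, ' ', ','));
    }
}
```

The resulting string would then be written with cell.setCellValue(String), so Excel shows it exactly as given, at the cost already mentioned: the cell holds text, not a number.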
The format string is only a pattern. In it, the comma is the symbol for the thousands separator and the dot is the symbol for the decimal separator. Whether you use "#,##0.00" or "#,##0" does not matter here: the separators that Excel actually displays come from its local settings, which apply to the application and not to a file, so you cannot override them via the API.
Remember, the sheet has predefined cell styles, and a cell only holds a reference to a style. If you change the style on one cell, you change it for all cells that use that style.
I have the same issue with cell formats. I think I will try the "setVBAProject" method on XSSFWorkbook.
https://social.technet.microsoft.com/Forums/office/en-US/eaa4c7f6-197a-4b33-bc5f-20896e5a7e3a/workbook-or-worksheet-specific-decimal-separator?forum=excel
I have a situation where I have been asked to write a program that essentially does an arbitrary SQL select over JDBC, convert the ResultSet to something loadable in Excel and send it as an attachment in an email.
The question is which data format to use so that the file can be loaded by as many different versions of Excel as possible.
I have considered:
XLS - native format, the simplest way to generate seems to be with JExcel.
CSV - comma separated format, must use semicolons instead of commas to cope with European decimal commas, and then there is all the quotation stuff.
HTML - it appears that Excel knows how to read an HTML table. It should be sufficient then to set the MIME-type to be application/vnd.ms-excel
but naturally there must be other interesting ways to do it.
My major concern is incorrect interpretation of the data:
Numbers with decimal commas get misinterpreted on systems with decimal points.
Character encoding issues (We cannot rely on the recipient using ISO-Latin-#).
Date interpretation - we have earlier found that the YYYY-MM-DD format is pretty robust.
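For the CSV option, the "quotation stuff" and the date format can be sketched in a few lines of Java. This is only an illustration under the assumptions stated in the question (semicolon as delimiter, YYYY-MM-DD dates); the class name is made up, and it does not solve the decimal-separator problem, which depends on the recipient's Excel settings. Writing the file as UTF-8 with a leading byte-order mark is a commonly used way to help Excel detect the encoding; that part is not shown.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class CsvRow {
    private static final String DELIMITER = ";";

    /** Quotes a field if it contains the delimiter, a quote or a line break. */
    static String quote(String field) {
        boolean needsQuotes = field.contains(DELIMITER) || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        // Embedded quotes are doubled, as is conventional for CSV.
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    /** Converts one value to its CSV text; dates are written as YYYY-MM-DD. */
    static String field(Object value) {
        if (value == null) {
            return "";
        }
        if (value instanceof LocalDate) {
            return ((LocalDate) value).format(DateTimeFormatter.ISO_LOCAL_DATE);
        }
        return quote(value.toString());
    }

    /** Joins the values of one row with the semicolon delimiter. */
    static String row(Object... values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(DELIMITER);
            }
            sb.append(field(values[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(row("Müller; Anna", 3, LocalDate.of(2010, 5, 17), "say \"hi\""));
    }
}
```

With a JDBC ResultSet, each row would be built by passing the column values (for example from getObject) to row, converting java.sql.Date values with toLocalDate first.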
My major concern is robustness. I don't mind it being tedious to code, if I can count on the result being good.
Suggestions and experiences to share?
I am aware of JSP generating Excel spreadsheet (XLS) to download - that page does not discuss robustness.
I'd recommend Andy Khan's JExcel. It's the best library for working with Excel in Java.
Apache HSSF
This has always been the chosen method where I've worked in Java development.
It's an acronym for Horrible SpreadSheet Format
The quick way to generate Excel files is to write out tab-delimited text and name it <name>.xls. Excel will open any text file ending in .xls as a single worksheet.
With a known formula extracted from a spreadsheet, is it possible to apply/evaluate the formula without having it reside in an actual cell?
I suppose one can create / locate a blank cell on the sheet (anyone have any ideas how this might be done efficiently?) and evaluate the formula this way, but is there a better way?
I'm not sure that POI is the way to go for this, given that it looks after creating/reading/writing spreadsheets. Have you looked at invoking the Excel COM object (via, say, JACOB) and running the formulas in Excel itself?
Excel does let you evaluate a formula without it having to reside in a cell. You can do it via the old XLM macro language with EVALUATE or through the C API, and via VBA with Application.Evaluate or Worksheet.Evaluate.
Of course, that information might be of no help if all you have is the extracted formula and not access to Excel. If you know the formulas will be simple enough, I can see evaluating them yourself or with another tool (although I don't know of anything specific). In general, though, you will need not only access to Excel, but also the actual document the formulas are in, since a formula can call user-written VBA/XLL functions, use defined names, etc.