docx4j / xlsx4j : create simple spreadsheet


I want to create a simple spreadsheet with docx4j / xlsx4j. It should contain only strings; no formulas are needed. The purpose is basically to switch from CSV to XLSX.
So I tried the example here: https://github.com/plutext/docx4j/blob/master/src/samples/xlsx4j/org/xlsx4j/samples/CreateSimpleSpreadsheet.java
Unfortunately it does not work, even after removing the deprecated parts ( http://pastebin.com/bUnJWmFD ).
Excel reports unreadable content and suggests a repair. After that I get the error: "Entfernte Datensätze: Zellinformationen von /xl/worksheets/sheet1.xml-Part", which means something like "Removed records: cell information from /xl/worksheets/sheet1.xml part".
This error occurs when createCell is called in line 58 (see GitHub, not Pastebin) or when cell.setV is called with "Hello World" instead of "1234".

I think you are raising two issues here:
The resulting XLSX needing repair: this was the result of a typo in cell2.setR, fixed at https://github.com/plutext/docx4j/commit/7d04a65057ad61f5197fb9a98168fc654220f61f
Calling setV with "Hello World": you shouldn't do that. Per http://webapp.docx4java.org/OnlineDemo/ecma376/SpreadsheetML/v.html
This element expresses the value contained in a cell. If the cell
contains a string, then this value is an index into the shared string
table, pointing to the actual string value. Otherwise, the value of
the cell is expressed directly in this element. .. For applications
not wanting to implement the shared string table, an 'inline string'
may be expressed in an <is> element under <c> (instead of a
<v> element under <c>),in the same way a string would be
expressed in the shared string table.
though I guess our setV method could detect misuse and either throw an exception or do one of those other things instead.
The CreateSimpleSpreadsheet sample as it stands shows you how to set an inline string, so you just need to test whether your input is a number or not.
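A minimal sketch of that number test, using only the JDK. The class name CellKind and its method are hypothetical and not part of docx4j; the returned strings mirror the SpreadsheetML cell type values n (number) and inlineStr (inline string). With xlsx4j you would then put numeric input into the v element and write everything else as an inline string, as the sample does.

```java
import java.math.BigDecimal;

// Hypothetical helper (not part of docx4j/xlsx4j): decides how a CSV field
// should be written to a SpreadsheetML cell.
final class CellKind {

    // Returns "n" if the text parses as a decimal number,
    // otherwise "inlineStr" (text that must not go into <v> directly).
    static String classify(String text) {
        if (text == null || text.trim().isEmpty()) {
            return "inlineStr";
        }
        try {
            new BigDecimal(text.trim());
            return "n";
        } catch (NumberFormatException e) {
            return "inlineStr";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("1234"));        // n
        System.out.println(classify("Hello World")); // inlineStr
    }
}
```

Treating empty fields as strings is a design choice of this sketch; you may prefer to skip such cells entirely.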

Related

Insert double Value for equation in PDF with pdfbox in Java

I'm struggling with a little Java project:
I wrote a program that autofills a PDF form. Almost everything works fine, but there is one problem: this PDF form (which is provided by my company, so I have to work with this document) contains a calculated field that computes the cost from the number of items and the unit price. When I insert the unit price as a String into my PDF
public void setEinzelpreis(String Einzelpreis)
{
try {
fieldList.get(30).setValue(Einzelpreis);
...
The unit price should appear in the empty field in the first row. The last cell of the row is auto-calculated by the PDF.
When I click into the "empty" field in the PDF, the value appears:
When I click on another field, the value disappears. This is my problem.
I'm getting the field list via PDFBox; the code for getting the field list of the PDF is:
try {
pdfTemplate = PDDocument.load(template);
PDDocumentCatalog docCatalog = pdfTemplate.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
if (acroForm != null)
{
// Get field names
fieldList = acroForm.getFields();
}
...
So, can anybody tell me what I'm doing wrong? Maybe the PDF wants a double value for the calculation and I am giving it a String? But I don't know how to write a double to the field list. Thanks a lot for every hint!
Edit:
The PDF File which I'm using:
https://1drv.ms/b/s!Av6exjPNXlgOioouAuXL6QV4eUGkqg?e=ocfhvC
And this is the file I generated:
https://1drv.ms/b/s!Av6exjPNXlgOioovK-HuRuXW2aRy_w?e=D1ZCA8
The strange thing is: when I change the value in the document by hand, everything behaves normally, even with a different document viewer.

First of all, the AcroForm form structure in your PDF is weird. It looks like someone used a graphical form generation tool he did not understand and clicked, dragged, dropped, copied, ... until the form in a viewer did what he wanted, not caring about it having become difficult to maintain.
In particular the Einzelpreis fields have a completely unnecessary structure of intermediate and final fields, e.g.
Thus, the field Einzelpreis in € exkl USt1 (the '€' is missing in the tree above) is not the one to fill in, it's merely an intermediary field. The actual form field to fill in is Einzelpreis in € exkl USt1.0.0.0.0.
Unfortunately, your code simply grabs the field at index 30 of the field list returned by PDAcroForm, and this field happens to be the intermediary field Einzelpreis in € exkl USt1; as an intermediary field it has no visible widgets of its own, so your setValue call doesn't change the visible Einzelpreis.
The JavaScript instruction calculating the Gesamtpreis uses the value from the final field, too:
AFSimple_Calculate("PRD", new Array ("Anzahl1", "Einzelpreis in € exkl USt1.0.0.0.0"));
But since the field value is inheritable and none of the .0 fields has a value of its own, the calculation sees the 100 once form calculation has been triggered and uses it.
Thus, you should fill the Einzelpreis in € exkl USt1.0.0.0.0 field instead. And the more robust way to retrieve it is not by index in a field list but by name:
PDField fieldByName = acroForm.getField("Einzelpreis in € exkl USt1.0.0.0.0");
(excerpt from FillInForm test testFill2020_04BeschaffungsantragEinzelpreis)
After filling that field, the "100" should be visible in your form.
The remaining problem, that the Gesamtpreis value is not calculated, is due to the fact already mentioned by @Tilman in a comment to the question: PDFBox doesn't execute JavaScript. Thus, you have to calculate those values yourself and update the fields in question accordingly.
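A minimal sketch of such a calculation using only the JDK. GesamtpreisCalc and total are hypothetical names, and the assumption that the form expects a plain decimal string with two fraction digits is mine, not taken from the PDF. The result would then be passed to setValue on the Gesamtpreis field.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper: reproduces the "PRD" (product) calculation that the
// form's JavaScript would perform in a viewer.
final class GesamtpreisCalc {

    // Multiplies quantity by unit price and rounds to two decimal places.
    static String total(String anzahl, String einzelpreis) {
        BigDecimal quantity = new BigDecimal(anzahl.trim());
        BigDecimal unitPrice = new BigDecimal(einzelpreis.trim());
        return quantity.multiply(unitPrice)
                .setScale(2, RoundingMode.HALF_UP)
                .toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(total("3", "100")); // 300.00
    }
}
```

Input that uses a decimal comma would have to be normalized before parsing, since BigDecimal only accepts a decimal point.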
If you need to know the correct name of a form field, you can do as Tilman proposed and use the PDFBox PDFDebugger. If you hover over the field there, it will display the name in the status bar at the bottom.
By the way, the AcroForm method getFields won't return the field required here anyway. As documented in its JavaDocs, this method returns all of the document's root fields, not fields further down in the hierarchy, at least not immediately. (From the user perspective the method name getFields is a misnomer. It is accurate, though, from the PDF specification perspective, as the corresponding entry in the AcroForm object has the key Fields.)
Beware, though: you will probably have to update your PDFBox version. In earlier versions PDFBox did not update appearances of fields with JavaScript actions (assuming some JavaScript would fill them in anyway). I used the current 3.0.0-SNAPSHOT, in which that behavior has been changed.

XPages - Lotus Domino Java - getDocumentByKey

In a Java class in my XPages application, I'm trying to get a handle on a Notes Document in a Notes View. The Notes View contains several Notes Documents. To get the Notes Document I want, I use 2 keys. This produces an error. If I use just one key, the first Notes Document in the Notes View is returned. The Notes View contains two sorted columns. The first column contains the empLang value, the second column contains the templateType value. Here is my code:
String empLang = "en";
String templateType = "C";
Database dbCurr = session.getCurrentDatabase();
String viewName = "vieAdminTemplates" + empLang;
View tview = dbCurr.getView(viewName);
Vector viewKey = new Vector();
viewKey.addElement(empLang);
viewKey.addElement(templateType); // this line causes the code to fail
Document templateDoc = tview.getDocumentByKey(viewKey);
What could be the cause of this problem?

A couple of ideas
1) You could concatenate the key into a single column since you said that worked. Something like 'en~C'
2) You could use the database.search method where you include a string of formula language that isolates the document you want. It returns a collection, and then you pull the document from there.

getDocumentByKey works with multiple columns. There's a known problem with doubles, but you're not hitting that there. One thing that stands out is the second column is just a single letter. That could be considered as a Char instead of a String, either when you do addElement or by the view.
I'd recommend printing out what data type they are. I think viewKey.get(1).getClass().getName() gives you the class it's stored as. Do the same for the view column value.
When you say it causes the code to fail, how does it fail? Does it just not return anything or throw an error?
My next step would be to try testing it where the View and the Vector contain more than one character, e.g. "CC", to help check if there's an underlying issue with Java getDocumentByKey and single characters.
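A minimal sketch of the type check suggested above, using only the JDK and no Domino session. It shows that a String literal stays a java.lang.String in the Vector even when it is one character long, while a char literal is boxed to java.lang.Character; the class name KeyTypeCheck is hypothetical.

```java
import java.util.Vector;

// Hypothetical demo class: reports the runtime class of each lookup key.
final class KeyTypeCheck {

    // Returns the class name of the element stored at the given index.
    static String typeAt(Vector<Object> keys, int index) {
        return keys.get(index).getClass().getName();
    }

    public static void main(String[] args) {
        Vector<Object> viewKey = new Vector<>();
        viewKey.addElement("en"); // String literal
        viewKey.addElement("C");  // still a String, even with one character
        viewKey.addElement('C');  // char literal, autoboxed to Character

        for (int i = 0; i < viewKey.size(); i++) {
            System.out.println(i + ": " + typeAt(viewKey, i));
        }
    }
}
```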

I'm very sorry. The problem here is that the view name in the code is incorrect. There is a view "vieAdminTemplates" but it does not have a second column containing the value "C". With the correct view, the code works fine. Thanks for taking the time to respond to my question.

Apache Cayenne - I cannot find code defining the constants for Token.kind field

I'm using Cayenne to parse SQL conditions, through org.apache.cayenne.exp.parser.ExpressionParser, which produces a series of org.apache.cayenne.exp.parser.Tokens, and I want to determine the type of each Token (like identifier, equal sign, number, string etc.).
The token type is definitely identified by the ExpressionParser, and it seems to me that it is stored in the int field Token.kind. The values that this field shows in my parsing tests are definitely consistent (e.g. = is always 5, literal strings are always 42, the and operator is always 2, etc.).
My problem is just that I cannot find the Java class containing the constants to compare Token.kind values with.
The Javadoc for field Token.kind says:
An integer that describes the kind of this token. This numbering
system is determined by JavaCCParser, and a table of these numbers is
stored in the file ...Constants.java.
It does not specify the full name of the file, so I downloaded JavaCCParser and checked several *Constants.* files found in javacc-5.0src.zip, javacc-6.0.zip, the two javacc.jar files contained in those two archives, and cayenne-3.0.2-src.tar.gz.
None of the classes I found there seems to have constants that consistently match the values I see in my tests.
The closest I was able to get was the class org.apache.cayenne.exp.parser.ExpressionParserConstants, which e.g. contains int PROPERTY_PATH = 34 and int SINGLE_QUOTED_STRING = 42. These definitely match the actual tokens of my test expressions, but other tokens have no corresponding constant in that class, e.g. the = sign (kind = 5) and the and operator (kind = 2).
So my question is whether anyone knows in which Java class those constants are defined.

First I should mention that ExpressionParser is designed to parse the very specific format of Cayenne expressions. It certainly cannot be used to parse SQL, so you might be looking in the wrong direction.
The parser itself is generated by JavaCC based on this grammar file. Tokens for the parser are formally defined at the bottom of this file, and are very specific to the task at hand.

Java Byte[] to String conversion dropping end quotes / weird side-effect

I am currently trying to perform some regex on the result of a DatagramPacket.getData() call.
Implemented as String myString = new String(thepkt.getData()):
But weirdly, Java is dropping the end quotation mark that it uses to encapsulate all data (see linked image below).
When I click the field in the variable inspector during a debug session and don't change anything, when I click off the variable field it corrects itself again without me changing anything. It even highlights the variable inspection field in yellow to signal change.
Its values are also displaying like it is still a byte array rather than a String object
http://i.imgur.com/8ZItsZI.png
It's throwing off my regex and I can't see anything that would cause it. It's a client server simulation and on the client side, the getData returns the data no problem.

I got it working by using the solution provided in:
https://stackoverflow.com/a/8557165/1700855
But I still don't understand how not specifying the length of the packet to the String constructor would cause it to drop the systematic end double quotes. Can anyone provide an explanation as I really like to understand solutions to my issues before moving on :)

The problem is that you didn't read the spec for DatagramPacket.getData:
Returns the data buffer. The data received or the data to be sent
starts from the offset in the buffer, and runs for length long.
So, to be correct, you should use
new String(thepkt.getData(), thepkt.getOffset(), thepkt.getLength())
Or, to not use the default charset:
new String(thepkt.getData(), thepkt.getOffset(), thepkt.getLength(), someCharset)
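The effect can be reproduced without any network traffic. This is a sketch using only the JDK; the class name PacketToString and the 16-byte buffer are assumptions for illustration. Decoding the whole buffer keeps the unused trailing zero bytes as NUL characters in the String, which is a likely reason the debugger display appears to lose the closing quote and the regex does not match.

```java
import java.net.DatagramPacket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: compares decoding the whole buffer with decoding
// only the bytes that belong to the packet.
final class PacketToString {

    // Decodes the entire backing buffer, including unused bytes.
    static String wholeBuffer(DatagramPacket pkt) {
        return new String(pkt.getData(), StandardCharsets.UTF_8);
    }

    // Decodes only the payload, as described by offset and length.
    static String payloadOnly(DatagramPacket pkt) {
        return new String(pkt.getData(), pkt.getOffset(), pkt.getLength(),
                StandardCharsets.UTF_8);
    }

    // Builds a packet whose 16-byte buffer is only partly filled.
    static DatagramPacket sample(String text) {
        byte[] msg = text.getBytes(StandardCharsets.UTF_8);
        byte[] buf = new byte[16];
        System.arraycopy(msg, 0, buf, 0, msg.length);
        return new DatagramPacket(buf, 0, msg.length);
    }

    public static void main(String[] args) {
        DatagramPacket pkt = sample("hello");
        System.out.println(wholeBuffer(pkt).length()); // 16
        System.out.println(payloadOnly(pkt).length()); // 5
    }
}
```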

Lucene TermFrequenciesVector

What do I obtain if I call IndexReader.getTermFrequenciesVector(...) on an index created with the TermVector.YES option?

The documentation already answers this, as Xodorap notes in a comment.
The TermFreqVector object returned can retrieve which terms (words produced by your analyzer) a field contains and how many times each of those terms exists within that field.
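As a conceptual illustration only, the sketch below models that information in plain Java rather than calling Lucene: it counts how often each term occurs in a field's text and exposes the result as parallel arrays of terms and frequencies. The class TermFreqSketch, its methods, and the lowercase/whitespace "analyzer" are hypothetical stand-ins, not Lucene APIs.

```java
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical model of a term frequency vector for one field.
final class TermFreqSketch {
    private final TreeMap<String, Integer> counts = new TreeMap<>();

    // Stand-in for an analyzer: lowercases and splits on whitespace.
    TermFreqSketch(String fieldText) {
        for (String term : fieldText.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!term.isEmpty()) {
                counts.merge(term, 1, Integer::sum);
            }
        }
    }

    // Distinct terms in sorted order.
    String[] terms() {
        return counts.keySet().toArray(new String[0]);
    }

    // Frequencies, parallel to terms().
    int[] frequencies() {
        return counts.values().stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        TermFreqSketch v = new TermFreqSketch("to be or not to be");
        String[] t = v.terms();
        int[] f = v.frequencies();
        for (int i = 0; i < t.length; i++) {
            System.out.println(t[i] + " -> " + f[i]);
        }
    }
}
```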
You can cast the returned TermFreqVector to the interface TermPositionVector if you index the field using TermVector.WITH_OFFSETS, TermVector.WITH_POSITIONS or TermVector.WITH_POSITIONS_OFFSETS. This gives you access to GetTermPositions, which allows you to check where in the field the term exists, and GetOffsets, which allows you to check where in the original content the term originated from. The latter allows, combined with Store.YES, highlighting of matching terms in a search query.
There are different contributed highlighters available under Contrib area found at the Lucene homepage.

Or you can implement proximity or first-occurrence type score contributions, which highlighting won't help you with at all.
