PDFBOX - header in all pages using easytable


I am using PDFBox and easytable (https://github.com/vandeseer/easytable) to create dynamic pages, which works great. But I want a header to be added on all pages. I have tried the following:
1) The TableBuilder is created before the rows are written, so we cannot build a perfect TableBuilder up front, since the rows are dynamic.
2) I tried to insert a header in the middle while creating the TableBuilder, which again is not perfect, since TableDrawer fits the rows onto a page according to row height.
Any idea/help would be appreciated.
I need output similar to this project: https://github.com/eduardohl/Paginated-PDFBox-Table-Sample. The only problem there is that the content is not dynamic like in easytable.

As an addition to @mkl's answer and its comments: in current versions of the library there is a dedicated class for this very requirement.
So your code basically boils down to something like:
try (final PDDocument document = new PDDocument()) {
    RepeatedHeaderTableDrawer.builder()
        .table(createTable())
        .startX(50)
        .startY(100F)
        .endY(50F) // note: if not set, table is drawn over the end of the page
        .build()
        .draw(() -> document, () -> new PDPage(PDRectangle.A4), 50f);
    document.save("your-awesome-document.pdf");
}

This answer was written at a time when easytable did not yet support repeating table headers. Meanwhile it does; see the answer by philonous, the easytable author.
easytable does not support repeating table headers or footers. Not yet, I should say, because this feature actually is easy to implement.
It is difficult, though, to implement on top of easytable because that library (like many others) suffers from excessive data hiding: many interesting member variables and methods are private, so extending the classes is not a viable option.
But what you can do is handle the header rows as a separate table which you draw again and again! The downside is some duplication of settings.
In case of the test code TwoPagesTableTest you referred to, it can be changed like this:
final Table.TableBuilder tableHeaderBuilder = Table.builder()
.addColumnOfWidth(200)
.addColumnOfWidth(200);
CellText dummyHeaderCell = CellText.builder()
.text("Header dummy")
.backgroundColor(Color.BLUE)
.textColor(Color.WHITE)
.borderWidth(1F)
.build();
tableHeaderBuilder.addRow(
Row.builder()
.add(dummyHeaderCell)
.add(dummyHeaderCell)
.build());
Table tableHeader = tableHeaderBuilder.build();
final Table.TableBuilder tableBuilder = Table.builder()
.addColumnOfWidth(200)
.addColumnOfWidth(200);
CellText dummyCell = CellText.builder()
.text("dummy")
.borderWidth(1F)
.build();
for (int i = 0; i < 50; i++) {
tableBuilder.addRow(
Row.builder()
.add(dummyCell)
.add(dummyCell)
.build());
}
TableDrawer drawer = TableDrawer.builder()
.table(tableBuilder.build())
.startX(50)
.endY(50F) // note: if not set, table is drawn over the end of the page
.build();
final PDDocument document = new PDDocument();
float startY = 100F;
do {
    TableDrawer headerDrawer = TableDrawer.builder()
        .table(tableHeader)
        .startX(50)
        .build();
    PDPage page = new PDPage(PDRectangle.A4);
    document.addPage(page);
    try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
        headerDrawer.startY(startY);
        headerDrawer.contentStream(contentStream).draw();
        drawer.startY(startY - tableHeader.getHeight());
        drawer.contentStream(contentStream).draw();
    }
    startY = page.getMediaBox().getHeight() - 50;
} while (!drawer.isFinished());
document.save("twoPageTable-repeatingHeader.pdf");
document.close();
(RepeatingTableHeaders test createTwoPageTableRepeatingHeader)
As you see, the code first creates a separate Table tableHeader containing only the header row. This table then is added first on each page and a part of the body rows table is added thereafter.
The result: table headers on each page...
A word of warning: This is a proof of concept, I have only tested with the table generation code from TwoPagesTableTest. For production code you should apply further tests.
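To see how the loop above distributes body rows once a header is drawn at the top of every page, the page-break arithmetic can be simulated without PDFBox. The sketch below is only an illustration: the class RepeatedHeaderLayout and its method are hypothetical names, and the rule that a row fits as long as its bottom edge stays at or above endY is an assumption about how the drawer breaks pages, not something taken from easytable's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of easytable or PDFBox.
class RepeatedHeaderLayout {

    // Returns how many body rows land on each page when a header of
    // headerHeight is drawn first on every page.
    // Assumption: a row fits as long as its bottom edge stays at or above endY.
    static List<Integer> rowsPerPage(float[] rowHeights, float headerHeight,
                                     float firstStartY, float laterStartY, float endY) {
        List<Integer> pages = new ArrayList<>();
        int next = 0;
        float startY = firstStartY;
        boolean firstPage = true;
        while (next < rowHeights.length) {
            float y = startY - headerHeight; // body starts below the header
            int count = 0;
            while (next < rowHeights.length && y - rowHeights[next] >= endY) {
                y -= rowHeights[next];
                next++;
                count++;
            }
            // A row that does not fit on a full page would loop forever.
            if (count == 0 && !firstPage) {
                throw new IllegalArgumentException("Row " + next + " never fits below the header");
            }
            pages.add(count);
            startY = laterStartY;
            firstPage = false;
        }
        return pages;
    }

    public static void main(String[] args) {
        float[] rows = {20f, 20f, 20f, 20f, 20f};
        System.out.println(rowsPerPage(rows, 20f, 100f, 150f, 50f));
    }
}
```

With these made-up numbers the first page (which starts at y = 100, as in the test code) has room for the header and a single row, and the remaining four rows go to the second page.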

Related

Column headers not visible in GridExporter add-on for Vaadin 23

Hi, I installed the GridExporter add-on for Vaadin 23. The add-on works, except that when I export in any format, the column names are not inserted in the first row; there is only a blank cell for each column. I use a persistence Tuple class in the grid to handle a generic query against a table.
Below is an example:
private Grid<Tuple> grd_report;
private void buildReport(ReportEntity re)
{
rg=new ReportGen(re);
List<Tuple> rows = rg.generate();
List<TupleElement<?>> elements = rows.get(0).getElements();
grd_report.removeAllColumns();
// HeaderRow headerRow;
if(grd_report.getHeaderRows().size()>0)
grd_report.getHeaderRows().clear();
else
grd_report.appendHeaderRow();
HeaderRow headerRow = grd_report.getHeaderRows().get(0);
for ( int idxCol = 0; idxCol < elements.size(); idxCol++ )
{
Integer xx=idxCol;
String ColumName=elements.get(idxCol).getAlias();
Grid.Column<Tuple> Column = grd_report.addColumn(te->te.get(xx)).setHeader(ColumName).setSortable(true).setKey(ColumName);
if(idxCol==0)
Column.setFooter("Total:" + rows.size());
headerRow.getCell(Column).setComponent(createFilterHeader(ColumName));
Column.setResizable(true);
}
grd_report.setItems(rows);
grd_report.setPageSize(30);
}
private void exportFile(ComboItem ci)
{
Anchor download=null;
exporter = GridExporter.createFor(grd_report);
exporter.setTitle(cmb_reports.getValue().getName());
exporter.setFileName("Export_" + new SimpleDateFormat("yyyyddMM").format(Calendar.getInstance().getTime()));
exporter.setAutoAttachExportButtons(false);
Can you help me?
Massimiliano
Were you able to resolve the problem?

The add-on does not support component headers:
Grid<String> grid = new Grid<>();
Column<String> col1 = grid.addColumn(x -> x).setHeader("Has text header");
Column<String> col2 = grid.addColumn(x -> x).setHeader(new Span("Text header"));
System.out.println(GridHelper.getHeader(grid, col1));
System.out.println(GridHelper.getHeader(grid, col2)); //prints an empty string
Since Vaadin does not provide an API for retrieving the header (see edit), we used GridHelpers, which implements a workaround that is only able to retrieve string headers:
it uses a lot of reflection to dig out the Renderer from the column, and then the Template from the Renderer.
That is implemented here. GridExporter just delegates to it (source):
protected List<Pair<String,Column<T>>> getGridHeaders(Grid<T> grid) {
return exporter.columns.stream().map(column -> ImmutablePair.of(GridHelper.getHeader(grid,column),column))
.collect(Collectors.toList());
}
GridExporter does not (currently) allow setting an "export header" independently from the Grid header.
I created an enhancement request for that.
==EDIT==
Vaadin 23.2 does provide column.getHeaderComponent(), and it's also possible to do getHeaderComponent().getElement().getTextRecursively(), but in most cases it will not be enough, thus the need for a custom export header still stands.

Values are getting overwritten in the table - PDFBox

I want to display a set of records in rows and columns. I am getting output, but the rows overlap. Should I modify the loop? Can someone please suggest a fix?
ArrayList<ResultRecord> Records = new ArrayList<ResultRecord>(MainRestClient.fetchResultRecords(this.savedMainLog));
for(j=0; j<Records.size(); j++)
{
Row<PDPage> row4 = table.createRow(100.0f);
Cell<PDPage> cell10 = row4.createCell(15.0f, temp.getNumber());
Cell<PDPage> cell11 = row4.createCell(45.0f, temp.getDescription());
Cell<PDPage> cell12 = row4.createCell(15.0f, temp.getStatus());
Cell<PDPage> cell13 = row4.createCell(25.0f, temp.getRemarks());
}
Below is the code for opening the PDF file. I want to show the set of records in the corresponding cells of row4, but the rows are written one on top of another.
Expected output:
The rows should be displayed one below the other.
Is the overlap caused by defining every row as row4?
try {
//table.draw();
cell.setFontSize(12);
} catch (Exception e) {
System.out.println(e.getMessage());
}
}

First of all, you should clarify which table drawing library you use. PDFBox is only the underlying PDF library. Considering the classes used, I assume you are using Boxable on top of it.
Furthermore, the reason why all the tables are printed over each other is that you start each table at the same position on the same page. You use
BaseTable table = new BaseTable(yPosition, yStartNewPage,
bottomMargin, tableWidth, margin, document, page, true, drawContent);
without ever changing yPosition or page.
To get one table after the other, you have to update yPosition and page accordingly, e.g. by using the return value of table.draw() and the state of table then, i.e. by replacing
table.draw();
by
yPosition = table.draw();
page = table.getCurrentPage();

How can I create an accessible PDF with Java PDFBox 2.0.8 library that is also verifiable with PAC 2 tool?

Background
I have small project on GitHub in which I am trying to create a section 508 compliant (section508.gov) PDF which has form elements within a complex table structure. The tool recommended to verify these PDFs is at http://www.access-for-all.ch/en/pdf-lab/pdf-accessibility-checker-pac.html and my program’s output PDF does pass most of these checks. I will also know what every field is meant for at runtime, so adding tags to structure elements should not be an issue.
The Problem
The PAC 2 tool seems to have an issue with two particular items in the output PDF. In particular, my radio buttons’ widget annotations are not nested inside of a form structure element and my marked content is not tagged (Text and Table Cells).
PAC 2 verifies the P structure element that is within top-left cell but not the marked content…
However, PAC 2 does identify the marked content as an error (i.e. Text/Path object not tagged).
Also, the radio button widgets are detected, but there seems to be no APIs to add them to a form structure element.
What I Have Tried
I have looked at several questions on this website and others on the subject including this one Tagged PDF with PDFBox, but it seems that there are almost no examples for PDF/UA and very little useful documentation (That I have found). The most useful tips that I have found have been at sites that explain specs for tagged PDFs like https://taggedpdf.com/508-pdf-help-center/object-not-tagged/.
The Question
Is it possible to create a PAC 2 verifiable PDF with Apache PDFBox that includes marked content and radio button widget annotations? If it is possible, is it doable using higher level (non-deprecated) PDFBox APIs?
Side Note: This is actually my first StackExchange question (Although I have used the site extensively) and I hope everything is in order! Feel free to add any necessary edits and ask any questions that I may need clarify. Also, I have an example program on GitHub which generates my PDF document at https://github.com/chris271/UAPDFBox.
Edit 1: Direct link to Output PDF Document
Edit 2: After using some of the lower-level PDFBox APIs and viewing raw data streams for fully compliant PDFs with PDFDebugger, I was able to generate a PDF with nearly identical content structure compared to the compliant PDF's content structure... However, the same errors appear that the text objects are not tagged and I really can't decide where to go from here... Any guidance would be greatly appreciated!
Edit 3: Side-by-side raw PDF content comparison.
Edit 4: Internal structure of the generated PDF
and the compliant PDF
Edit 5: I have managed to fix the PAC 2 errors for tagged path/text objects thanks in part to suggestions from Tilman Hausherr! I will add an answer if I manage to fix the issues regarding 'annotation widgets not being nested inside form structure elements'.

After going through a large amount of the PDF Spec and many PDFBox examples I was able to fix all issues reported by PAC 2. There were several steps involved to create the verified PDF (with a complex table structure) and the full source code is available here on github. I will attempt to do an overview of the major portions of the code below. (Some method calls will not be explained here!)
Step 1 (Setup metadata)
Various setup info like document title and language
//Setup new document
pdf = new PDDocument();
acroForm = new PDAcroForm(pdf);
pdf.getDocumentInformation().setTitle(title);
//Adjust other document metadata
PDDocumentCatalog documentCatalog = pdf.getDocumentCatalog();
documentCatalog.setLanguage("English");
documentCatalog.setViewerPreferences(new PDViewerPreferences(new COSDictionary()));
documentCatalog.getViewerPreferences().setDisplayDocTitle(true);
documentCatalog.setAcroForm(acroForm);
documentCatalog.setStructureTreeRoot(structureTreeRoot);
PDMarkInfo markInfo = new PDMarkInfo();
markInfo.setMarked(true);
documentCatalog.setMarkInfo(markInfo);
Embed all fonts directly into resources.
//Set AcroForm Appearance Characteristics
PDResources resources = new PDResources();
defaultFont = PDType0Font.load(pdf,
new PDTrueTypeFont(PDType1Font.HELVETICA.getCOSObject()).getTrueTypeFont(), true);
resources.put(COSName.getPDFName("Helv"), defaultFont);
acroForm.setNeedAppearances(true);
acroForm.setXFA(null);
acroForm.setDefaultResources(resources);
acroForm.setDefaultAppearance(DEFAULT_APPEARANCE);
Add XMP Metadata for PDF/UA spec.
//Add UA XMP metadata based on specs at https://taggedpdf.com/508-pdf-help-center/pdfua-identifier-missing/
XMPMetadata xmp = XMPMetadata.createXMPMetadata();
xmp.createAndAddDublinCoreSchema();
xmp.getDublinCoreSchema().setTitle(title);
xmp.getDublinCoreSchema().setDescription(title);
xmp.createAndAddPDFAExtensionSchemaWithDefaultNS();
xmp.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfa/ns/schema#", "pdfaSchema");
xmp.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfa/ns/property#", "pdfaProperty");
xmp.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfua/ns/id/", "pdfuaid");
XMPSchema uaSchema = new XMPSchema(XMPMetadata.createXMPMetadata(),
"pdfaSchema", "pdfaSchema", "pdfaSchema");
uaSchema.setTextPropertyValue("schema", "PDF/UA Universal Accessibility Schema");
uaSchema.setTextPropertyValue("namespaceURI", "http://www.aiim.org/pdfua/ns/id/");
uaSchema.setTextPropertyValue("prefix", "pdfuaid");
XMPSchema uaProp = new XMPSchema(XMPMetadata.createXMPMetadata(),
"pdfaProperty", "pdfaProperty", "pdfaProperty");
uaProp.setTextPropertyValue("name", "part");
uaProp.setTextPropertyValue("valueType", "Integer");
uaProp.setTextPropertyValue("category", "internal");
uaProp.setTextPropertyValue("description", "Indicates, which part of ISO 14289 standard is followed");
uaSchema.addUnqualifiedSequenceValue("property", uaProp);
xmp.getPDFExtensionSchema().addBagValue("schemas", uaSchema);
xmp.getPDFExtensionSchema().setPrefix("pdfuaid");
xmp.getPDFExtensionSchema().setTextPropertyValue("part", "1");
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true);
PDMetadata metadata = new PDMetadata(pdf);
metadata.importXMPMetadata(baos.toByteArray());
pdf.getDocumentCatalog().setMetadata(metadata);
Step 2 (Setup document tag structure)
You will need to add the root structure element and all necessary structure elements as children to the root element.
//Adds a DOCUMENT structure element as the structure tree root.
void addRoot() {
PDStructureElement root = new PDStructureElement(StandardStructureTypes.DOCUMENT, null);
root.setAlternateDescription("The document's root structure element.");
root.setTitle("PDF Document");
pdf.getDocumentCatalog().getStructureTreeRoot().appendKid(root);
currentElem = root;
rootElem = root;
}
Each marked content element (text and background graphics) will need to have an MCID and an associated tag for reference in the parent tree which will be explained in step 3.
//Assign an id for the next marked content element.
private void setNextMarkedContentDictionary(String tag) {
currentMarkedContentDictionary = new COSDictionary();
currentMarkedContentDictionary.setName("Tag", tag);
currentMarkedContentDictionary.setInt(COSName.MCID, currentMCID);
currentMCID++;
}
Artifacts (background graphics) will not be detected by the screen reader. Text needs to be detectable so a P structure element is used here when adding text.
//Set up the next marked content element with an MCID and create the containing TD structure element.
PDPageContentStream contents = new PDPageContentStream(
pdf, pages.get(pageIndex), PDPageContentStream.AppendMode.APPEND, false);
currentElem = addContentToParent(null, StandardStructureTypes.TD, pages.get(pageIndex), currentRow);
//Make the actual cell rectangle and set as artifact to avoid detection.
setNextMarkedContentDictionary(COSName.ARTIFACT.getName());
contents.beginMarkedContent(COSName.ARTIFACT, PDPropertyList.create(currentMarkedContentDictionary));
//Draws the cell itself with the given colors and location.
drawDataCell(table.getCell(i, j).getCellColor(), table.getCell(i, j).getBorderColor(),
x + table.getRows().get(i).getCellPosition(j),
y + table.getRowPosition(i),
table.getCell(i, j).getWidth(), table.getRows().get(i).getHeight(), contents);
contents.endMarkedContent();
currentElem = addContentToParent(COSName.ARTIFACT, StandardStructureTypes.P, pages.get(pageIndex), currentElem);
contents.close();
//Draw the cell's text as a P structure element
contents = new PDPageContentStream(
pdf, pages.get(pageIndex), PDPageContentStream.AppendMode.APPEND, false);
setNextMarkedContentDictionary(COSName.P.getName());
contents.beginMarkedContent(COSName.P, PDPropertyList.create(currentMarkedContentDictionary));
//... Code to draw actual text...//
//End the marked content and append its P structure element to the containing TD structure element.
contents.endMarkedContent();
addContentToParent(COSName.P, null, pages.get(pageIndex), currentElem);
contents.close();
Annotation Widgets (form objects in this case) will need to be nested within Form structure elements.
//Add a radio button widget.
if (!table.getCell(i, j).getRbVal().isEmpty()) {
PDStructureElement fieldElem = new PDStructureElement(StandardStructureTypes.FORM, currentElem);
radioWidgets.add(addRadioButton(
x + table.getRows().get(i).getCellPosition(j) -
radioWidgets.size() * 10 + table.getCell(i, j).getWidth() / 4,
y + table.getRowPosition(i),
table.getCell(i, j).getWidth() * 1.5f, 20,
radioValues, pageIndex, radioWidgets.size()));
fieldElem.setPage(pages.get(pageIndex));
COSArray kArray = new COSArray();
kArray.add(COSInteger.get(currentMCID));
fieldElem.getCOSObject().setItem(COSName.K, kArray);
addWidgetContent(annotationRefs.get(annotationRefs.size() - 1), fieldElem, StandardStructureTypes.FORM, pageIndex);
}
//Add a text field in the current cell.
if (!table.getCell(i, j).getTextVal().isEmpty()) {
PDStructureElement fieldElem = new PDStructureElement(StandardStructureTypes.FORM, currentElem);
addTextField(x + table.getRows().get(i).getCellPosition(j),
y + table.getRowPosition(i),
table.getCell(i, j).getWidth(), table.getRows().get(i).getHeight(),
table.getCell(i, j).getTextVal(), pageIndex);
fieldElem.setPage(pages.get(pageIndex));
COSArray kArray = new COSArray();
kArray.add(COSInteger.get(currentMCID));
fieldElem.getCOSObject().setItem(COSName.K, kArray);
addWidgetContent(annotationRefs.get(annotationRefs.size() - 1), fieldElem, StandardStructureTypes.FORM, pageIndex);
}
Step 3
After all content elements have been written to the content stream and tag structure has been setup, it is necessary to go back and add the parent tree to the structure tree root. Note: Some method calls (addWidgetContent() and addContentToParent()) in the above code setup the necessary COSDictionary objects.
//Adds the parent tree to root struct element to identify tagged content
void addParentTree() {
COSDictionary dict = new COSDictionary();
nums.add(numDictionaries);
for (int i = 1; i < currentStructParent; i++) {
nums.add(COSInteger.get(i));
nums.add(annotDicts.get(i - 1));
}
dict.setItem(COSName.NUMS, nums);
PDNumberTreeNode numberTreeNode = new PDNumberTreeNode(dict, dict.getClass());
pdf.getDocumentCatalog().getStructureTreeRoot().setParentTreeNextKey(currentStructParent);
pdf.getDocumentCatalog().getStructureTreeRoot().setParentTree(numberTreeNode);
}
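For orientation, the Nums array of a PDF number tree is a flat list of alternating keys and values in ascending key order. The plain-Java sketch below mirrors that layout, with strings standing in for the COS objects. The class ParentTreeLayout, its parameters and the placeholder strings are hypothetical, and it assumes (as the loop above suggests, but the excerpt does not show) that key 0 holds the page's array of structure elements indexed by MCID, while keys 1 and up each hold the structure element of one widget annotation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; strings stand in for COSArray/COSDictionary objects.
class ParentTreeLayout {

    // Builds [0, pageEntry, 1, annot1, 2, annot2, ...]:
    // keys and values alternate, keys ascending.
    static List<Object> buildNums(List<String> markedContentParents,
                                  List<String> annotationParents) {
        List<Object> nums = new ArrayList<>();
        nums.add(0);                    // key: assumed StructParents value of the page
        nums.add(markedContentParents); // value: one parent per MCID, in MCID order
        for (int i = 0; i < annotationParents.size(); i++) {
            nums.add(i + 1);                    // key: StructParent value of the annotation
            nums.add(annotationParents.get(i)); // value: its Form structure element
        }
        return nums;
    }

    public static void main(String[] args) {
        System.out.println(buildNums(
                Arrays.asList("P#0", "P#1"),
                Arrays.asList("Form#radio", "Form#text")));
    }
}
```

The point of the layout is that a reader can go from a marked-content ID or an annotation back to its structure element, which is what a checker needs in order to treat the content as tagged.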
If all widget annotations and marked content were added correctly to the structure tree and parent tree then you should get something like this from PAC 2 and PDFDebugger.
Thank you to Tilman Hausherr for pointing me in the right direction to solve this! I will most likely make some edits to this answer for additional clarity as recommended by others.
Edit 1:
If you want to have a table structure like the one I have generated you will also need to add correct table markup to fully comply with the 508 standard... The 'Scope', 'ColSpan', 'RowSpan', or 'Headers' attributes will need to be correctly added to each table cell structure element similar to this or this. The main purpose for this markup is to allow a screen reading software like JAWS to read the table content in an understandable way. These attributes can be added in a similar way as below...
private void addTableCellMarkup(Cell cell, int pageIndex, PDStructureElement currentRow) {
COSDictionary cellAttr = new COSDictionary();
cellAttr.setName(COSName.O, "Table");
if (cell.getCellMarkup().isHeader()) {
currentElem = addContentToParent(null, StandardStructureTypes.TH, pages.get(pageIndex), currentRow);
currentElem.getCOSObject().setString(COSName.ID, cell.getCellMarkup().getId());
if (cell.getCellMarkup().getScope().length() > 0) {
cellAttr.setName(COSName.getPDFName("Scope"), cell.getCellMarkup().getScope());
}
if (cell.getCellMarkup().getColspan() > 1) {
cellAttr.setInt(COSName.getPDFName("ColSpan"), cell.getCellMarkup().getColspan());
}
if (cell.getCellMarkup().getRowSpan() > 1) {
cellAttr.setInt(COSName.getPDFName("RowSpan"), cell.getCellMarkup().getRowSpan());
}
} else {
currentElem = addContentToParent(null, StandardStructureTypes.TD, pages.get(pageIndex), currentRow);
}
if (cell.getCellMarkup().getHeaders().length > 0) {
COSArray headerA = new COSArray();
for (String s : cell.getCellMarkup().getHeaders()) {
headerA.add(new COSString(s));
}
cellAttr.setItem(COSName.getPDFName("Headers"), headerA);
}
currentElem.getCOSObject().setItem(COSName.A, cellAttr);
}
Be sure to do something like currentElem.setAlternateDescription(currentCell.getText()); on each of the structure elements with text marked content for JAWS to read the text.
Note: Each of the fields (radio button and textbox) will need a unique name to avoid setting multiple field values. GitHub has been updated with a more complex example PDF with table markup and improved form fields!

Stop iText table from splitting on new page

I am developing an app for android that generates pdf.
I am using itextpdf to generate the pdf.
I have the following problem:
I have a table that has 3 rows and when this table is near the end of a page sometimes it puts one row on one page and two rows on the next page.
Is there a way to force this table to start on the next page so I can have the full table on the next page?
Thanks

As an alternative to Bruno's approach of nesting the table in a 1-cell table to prevent splitting, you can also use PdfPTable.setKeepTogether(true) to start the table on a new page when it doesn't fit the current page.
Using a similar example:
Paragraph p = new Paragraph("Test");
PdfPTable table = new PdfPTable(2);
for (int i = 1; i < 6; i++) {
table.addCell("key " + i);
table.addCell("value " + i);
}
for (int i = 0; i < 40; i++) {
document.add(p);
}
// Try to keep the table on 1 page
table.setKeepTogether(true);
document.add(table);
Both approaches (nesting in a 1-cell table and using setKeepTogether()) behave exactly the same in my tests. This includes when the table is too large to fit on the new page and still needs to be split, e.g. when adding 50 instead of 5 rows in the example above.
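The behavior described above comes down to a three-way decision, which can be written out as a small model. This is not iText code: the class KeepTogetherDecision, the enum and the parameter names are hypothetical, and the sketch only restates what the tests above showed (the table stays if it fits, moves to the next page if it does not, and is still split if it is taller than a whole page).

```java
// Hypothetical model of the placement decision; not iText API.
class KeepTogetherDecision {

    enum Placement { CURRENT_PAGE, NEXT_PAGE, MUST_SPLIT }

    // tableHeight:     total height of the table
    // remainingOnPage: free vertical space left on the current page
    // fullPageHeight:  usable height of an empty page
    static Placement place(float tableHeight, float remainingOnPage, float fullPageHeight) {
        if (tableHeight <= remainingOnPage) {
            return Placement.CURRENT_PAGE; // fits where it is
        }
        if (tableHeight <= fullPageHeight) {
            return Placement.NEXT_PAGE;    // moved as a whole
        }
        return Placement.MUST_SPLIT;       // too tall for any single page
    }

    public static void main(String[] args) {
        System.out.println(place(100f, 60f, 700f));
    }
}
```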

Please take a look at the Splitting example:
Paragraph p = new Paragraph("Test");
PdfPTable table = new PdfPTable(2);
for (int i = 1; i < 6; i++) {
table.addCell("key " + i);
table.addCell("value " + i);
}
for (int i = 0; i < 40; i++) {
document.add(p);
}
document.add(table);
We have a table with 5 rows, and in this case, we're adding some paragraphs so that the table is added at the end of a page.
By default, iText will try not to split rows, but if the full table doesn't fit, it will forward the rows that don't fit to the next page:
You want to avoid this behavior: you don't want the table to split.
Knowing that iText will try to keep full rows intact, you can work around this problem by nesting the table you don't want to split inside another table:
PdfPTable nesting = new PdfPTable(1);
PdfPCell cell = new PdfPCell(table);
cell.setBorder(PdfPCell.NO_BORDER);
nesting.addCell(cell);
document.add(nesting);
Now you get this result:
There was sufficient space on the previous page to render a couple of rows, but as we've wrapped the full table inside a row with a single column, iText will forward the complete table to the next page.

Tables in PDF with horizontal page breaks

Does someone know a (preferably open-source) PDF layout engine for Java, capable of rendering tables with horizontal page breaks? "Horizontal page breaking" is at least how the feature is named in BIRT, but to clarify: If a table has too many columns to fit across the available page width, I want the table to be split horizontally across multiple pages, e.g. for a 10-column table, the columns 1-4 to be output on the first page and columns 5-10 on the second page. This should of course also be repeated on the following pages, if the table has too many rows to fit vertically on one page.
So far, it has been quite difficult to search for products. I reckon that such a feature may be named differently in other products, making it difficult to use aunt Google to find a suitable solution.
So far, I've tried:
BIRT claims to support this, but the actual implementation is so buggy that it cannot be used. I thought it self-evident for such functionality that the row height is kept consistent across all pages, making it possible to align the rows when placing the pages next to each other. BIRT, however, calculates the required row height separately for each page.
Jasper has no support.
I also considered Apache FOP, but I don't find any suitable syntax for this in the XSL-FO specification.
iText is generally a little bit too "low level" for this task anyway (making it difficult to layout other parts of the intended PDF documents), but does not seem to offer support.
Since there seem to be some dozens other reporting or layout engines, which may or may not fit and I find it a little bit difficult to guess exactly what to look for, I was hoping that someone perhaps already had similar requirements and can provide at least a suggestion in the right direction. It is relatively important that the product can be easily integrated in a Java server application, a native Java library would be ideal.
Now, to keep the rows aligned across all pages, the row heights must be calculated as follows:
Row1.height = max(A1.height, B1.height, C1.height, D1.height)
Row2.height = max(A2.height, B2.height, C2.height, D2.height)
While BIRT currently seems to do something like:
Page1.Row1.height = max(A1.height, B1.height)
Page2.Row1.height = max(C1.height, D1.height)
Page1.Row2.height = max(A2.height, B2.height)
Page2.Row2.height = max(C2.height, D2.height)
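The difference between the two calculations can be made concrete with a short sketch. The class AlignedRowHeights and the sample cell heights are made up for illustration; columns A to D map to indices 0 to 3, and the second page is assumed to start at column C.

```java
// Hypothetical illustration of the two row-height strategies.
class AlignedRowHeights {

    // Row heights when only columns [fromCol, toColExclusive) are considered,
    // i.e. the per-page calculation.
    static float[] rowHeights(float[][] cellHeights, int fromCol, int toColExclusive) {
        float[] heights = new float[cellHeights.length];
        for (int row = 0; row < cellHeights.length; row++) {
            float max = 0f;
            for (int col = fromCol; col < toColExclusive; col++) {
                max = Math.max(max, cellHeights[row][col]);
            }
            heights[row] = max;
        }
        return heights;
    }

    // Row heights over all columns, so every horizontal page uses the same value.
    static float[] sharedRowHeights(float[][] cellHeights) {
        float[] heights = new float[cellHeights.length];
        for (int row = 0; row < cellHeights.length; row++) {
            heights[row] = rowHeights(new float[][]{cellHeights[row]}, 0, cellHeights[row].length)[0];
        }
        return heights;
    }

    public static void main(String[] args) {
        float[][] cells = {
            {30f, 10f, 10f, 10f}, // A1 is the tallest cell in row 1
            {10f, 40f, 10f, 20f}  // B2 is the tallest cell in row 2
        };
        System.out.println(java.util.Arrays.toString(sharedRowHeights(cells)));
        System.out.println(java.util.Arrays.toString(rowHeights(cells, 2, 4)));
    }
}
```

In this sample the shared heights are 30 and 40, while the per-page calculation for columns C and D alone gives 10 and 20, so rows on the second page would no longer line up with those on the first.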

It's possible to display a table the way you want with iText. You need to use custom table positioning and custom row and column writing.
I was able to adapt this iText example to write on multiple pages horizontally and vertically. The idea is to remember the first and last row that fit vertically on a page. I've included the whole code so you can easily run it.
public class Main {
public static final String RESULT = "results/part1/chapter04/zhang.pdf";
public static final float PAGE_HEIGHT = PageSize.A4.getHeight() - 100f;
public void createPdf(String filename)
throws IOException, DocumentException {
// step 1
Document document = new Document();
// step 2
PdfWriter writer
= PdfWriter.getInstance(document, new FileOutputStream(filename));
// step 3
document.open();
//setup of the table: first row is a really tall one
PdfPTable table = new PdfPTable(new float[] {1, 5, 5, 1});
StringBuilder sb = new StringBuilder();
for(int i = 0; i < 50; i++) {
sb.append("tall text").append(i + 1).append("\n");
}
for(int i = 0; i < 4; i++) {
table.addCell(sb.toString());
}
for (int i = 0; i < 50; i++) {
sb = new StringBuilder("some text");
table.addCell(sb.append(i + 1).append(" col1").toString());
sb = new StringBuilder("some text");
table.addCell(sb.append(i + 1).append(" col2").toString());
sb = new StringBuilder("some text");
table.addCell(sb.append(i + 1).append(" col3").toString());
sb = new StringBuilder("some text");
table.addCell(sb.append(i + 1).append(" col4").toString());
}
// set the total width of the table
table.setTotalWidth(600);
PdfContentByte canvas = writer.getDirectContent();
ArrayList<PdfPRow> rows = table.getRows();
//check every row height and split it if is taller than the page height
//can be enhanced to split if the row is 2,3, ... n times higher than the page
for (int i = 0; i < rows.size(); i++) {
PdfPRow currentRow = rows.get(i);
float rowHeight = currentRow.getMaxHeights();
if(rowHeight > PAGE_HEIGHT) {
PdfPRow newRow = currentRow.splitRow(table,i, PAGE_HEIGHT);
if(newRow != null) {
rows.add(++i, newRow);
}
}
}
List<Integer[]> chunks = new ArrayList<Integer[]>();
int startRow = 0;
int endRow = 0;
float chunkHeight = 0;
//determine how many rows fit on one page vertically
//and remember the first and last row that fit on that page
for (int i = 0; i < rows.size(); i++) {
PdfPRow currentRow = rows.get(i);
chunkHeight += currentRow.getMaxHeights();
endRow = i;
//verify against some desired height
if (chunkHeight > PAGE_HEIGHT) {
//remember start and end row
chunks.add(new Integer[]{startRow, endRow});
startRow = endRow;
chunkHeight = 0;
i--;
}
}
//last pair
chunks.add(new Integer[]{startRow, endRow + 1});
//render each pair of startRow - endRow on 2 pages horizontally, get to the next page for the next pair
for(Integer[] chunk : chunks) {
table.writeSelectedRows(0, 2, chunk[0], chunk[1], 236, 806, canvas);
document.newPage();
table.writeSelectedRows(2, -1, chunk[0], chunk[1], 36, 806, canvas);
document.newPage();
}
document.close();
}
public static void main(String[] args) throws IOException, DocumentException {
new Main().createPdf(RESULT);
}
}
I understand that iText may be too low-level just for reports, but it can be employed alongside standard reporting tools for special needs like this.
Update: Rows taller than the page height are now split first. The code doesn't split further if a row is 2, 3, ..., n times taller than the page, but it can be adapted for this too.
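The row-chunking step can also be looked at in isolation from iText. The following is a simplified variant rather than the exact loop from the code above: the class RowChunker is a hypothetical name, row heights are passed in as a plain array, and a row taller than the page is given a chunk of its own instead of being split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; works on plain heights instead of PdfPRow objects.
class RowChunker {

    // Returns {startRow, endRowExclusive} pairs so that the rows of each
    // chunk fit into pageHeight together.
    static List<int[]> chunkRows(float[] rowHeights, float pageHeight) {
        List<int[]> chunks = new ArrayList<>();
        int start = 0;
        float used = 0f;
        for (int i = 0; i < rowHeights.length; i++) {
            // Close the current chunk before the row that would overflow it.
            if (i > start && used + rowHeights[i] > pageHeight) {
                chunks.add(new int[]{start, i});
                start = i;
                used = 0f;
            }
            used += rowHeights[i];
        }
        if (start < rowHeights.length) {
            chunks.add(new int[]{start, rowHeights.length}); // last chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (int[] chunk : chunkRows(new float[]{30f, 30f, 30f, 30f, 30f}, 70f)) {
            System.out.println(chunk[0] + " to " + chunk[1]);
        }
    }
}
```

Each pair can then serve as the rowStart and rowEnd arguments of writeSelectedRows, whose end index is exclusive.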

Same idea here as Dev Blanked's, but using wkhtmltopdf (https://code.google.com/p/wkhtmltopdf/) and some JavaScript, you can achieve what you need. When running wkhtmltopdf against this fiddle you get the result shown below (screenshot of PDF pages). You can place the "break-after" class anywhere on the header row. We use wkhtmltopdf server-side in a Java EE web app to produce dynamic reports, and the performance is actually very good.
HTML
<body>
<table id="table">
<thead>
<tr><th >Header 1</th><th class="break-after">Header 2</th><th>Header 3</th><th>Header 4</th></tr>
</thead>
<tbody>
<tr valign="top">
<td>A1<br/>text<br/>text</td>
<td>B1<br/>text</td>
<td>C1</td>
<td>D1</td>
</tr>
<tr valign="top">
<td>A2</td>
<td>B2<br/>text<br/>text<br/>text</td>
<td>C2</td>
<td>D2<br/>text</td>
</tr>
</tbody>
</table>
</body>
Script
$(document).ready(function() {
var thisTable = $('#table'),
otherTable= thisTable.clone(false, true),
breakAfterIndex = $('tr th', thisTable).index($('tr th.break-after', thisTable)),
wrapper = $('<div/>');
wrapper.css({'page-break-before': 'always'});
wrapper.append(otherTable);
thisTable.after(wrapper);
$('tr', thisTable).find('th:gt(' + breakAfterIndex + ')').remove();
$('tr', thisTable).find('td:gt(' + breakAfterIndex + ')').remove();
$('tr', otherTable).find('th:lt(' + (breakAfterIndex + 1) + ')').remove();
$('tr', otherTable).find('td:lt(' + (breakAfterIndex + 1) + ')').remove();
$('tr', thisTable).each(function(index) {
var $this =$(this),
$otherTr = $($('tr', otherTable).get(index)),
maxHeight = Math.max($this.height(), $otherTr.height());
$this.height(maxHeight);
$otherTr.height(maxHeight);
});
});

Have you tried Flying Saucer (http://code.google.com/p/flying-saucer/)? It is supposed to convert HTML to PDF.

My advice is to use FOP transformer.
Here you can see some examples and how to use it.
Here you can find some examples with fop and tables.

"Jasper has no support."
According to the Jasper documentation it does have support, via:
column break element (i.e. a break element with a type=column attribute). This can be placed at any location in a report.
isStartNewColumn attribute on groups/headers
See http://books.google.com.au/books?id=LWTbssKt6MUC&pg=PA165&lpg=PA165&dq=jasper+reports+%22column+break%22&source=bl&ots=aSKZfqgHR5&sig=KlH4_OiLP-cNsBPGJ7yzWPYgH_k&hl=en&sa=X&ei=h_1kUb6YO6uhiAeNk4GYCw&redir_esc=y#v=onepage&q=column%20break&f=false
If you're really stuck, as a last resort you could use Excel / OpenOffice Calc: manually copy data into cells, manually format it as you desire, save as xls format. Then use apache POI from java to dynamically populate/replace the desired data & print to file/PDF. At least it gives very fine-grained control of column & row formatting/breaks/margins etc.
