Background
I have small project on GitHub in which I am trying to create a section 508 compliant (section508.gov) PDF which has form elements within a complex table structure. The tool recommended to verify these PDFs is at http://www.access-for-all.ch/en/pdf-lab/pdf-accessibility-checker-pac.html and my program’s output PDF does pass most of these checks. I will also know what every field is meant for at runtime, so adding tags to structure elements should not be an issue.
The Problem
The PAC 2 tool seems to have an issue with two particular items in the output PDF. In particular, my radio buttons’ widget annotations are not nested inside of a form structure element and my marked content is not tagged (Text and Table Cells).
PAC 2 verifies the P structure element that is within top-left cell but not the marked content…
However, PAC 2 does identify the marked content as an error (i.e. Text/Path object not tagged).
Also, the radio button widgets are detected, but there seems to be no APIs to add them to a form structure element.
What I Have Tried
I have looked at several questions on this website and others on the subject including this one Tagged PDF with PDFBox, but it seems that there are almost no examples for PDF/UA and very little useful documentation (That I have found). The most useful tips that I have found have been at sites that explain specs for tagged PDFs like https://taggedpdf.com/508-pdf-help-center/object-not-tagged/.
The Question
Is it possible to create a PAC 2 verifiable PDF with Apache PDFBox that includes marked content and radio button widget annotations? If it is possible, is it doable using higher level (non-deprecated) PDFBox APIs?
Side Note: This is actually my first StackExchange question (Although I have used the site extensively) and I hope everything is in order! Feel free to add any necessary edits and ask any questions that I may need clarify. Also, I have an example program on GitHub which generates my PDF document at https://github.com/chris271/UAPDFBox.
Edit 1: Direct link to Output PDF Document
*EDIT 2: After using some of the lower-level PDFBox APIs and viewing raw data streams for fully compliant PDFs with PDFDebugger, I was able to generate a PDF with nearly identical content structure compared to the compliant PDF's content structure... However, the same errors appear that the text objects are not tagged and I really can't decide where to go from here... Any guidance would be greatly appreciated!
Edit 3: Side-by-side raw PDF content comparison.
Edit 4: Internal structure of the generated PDF
and the compliant PDF
Edit 5: I have managed to fix the PAC 2 errors for tagged path/text objects thanks in part to suggestions from Tilman Hausherr! I will add an answer if I manage to fix the issues regarding 'annotation widgets not being nested inside form structure elements'.
After going through a large amount of the PDF Spec and many PDFBox examples I was able to fix all issues reported by PAC 2. There were several steps involved to create the verified PDF (with a complex table structure) and the full source code is available here on github. I will attempt to do an overview of the major portions of the code below. (Some method calls will not be explained here!)
Step 1 (Setup metadata)
Various setup info like document title and language
//Setup new document
pdf = new PDDocument();
acroForm = new PDAcroForm(pdf);
pdf.getDocumentInformation().setTitle(title);
//Adjust other document metadata
PDDocumentCatalog documentCatalog = pdf.getDocumentCatalog();
documentCatalog.setLanguage("English");
documentCatalog.setViewerPreferences(new PDViewerPreferences(new COSDictionary()));
documentCatalog.getViewerPreferences().setDisplayDocTitle(true);
documentCatalog.setAcroForm(acroForm);
documentCatalog.setStructureTreeRoot(structureTreeRoot);
PDMarkInfo markInfo = new PDMarkInfo();
markInfo.setMarked(true);
documentCatalog.setMarkInfo(markInfo);
Embed all fonts directly into resources.
//Set AcroForm Appearance Characteristics
PDResources resources = new PDResources();
defaultFont = PDType0Font.load(pdf,
new PDTrueTypeFont(PDType1Font.HELVETICA.getCOSObject()).getTrueTypeFont(), true);
resources.put(COSName.getPDFName("Helv"), defaultFont);
acroForm.setNeedAppearances(true);
acroForm.setXFA(null);
acroForm.setDefaultResources(resources);
acroForm.setDefaultAppearance(DEFAULT_APPEARANCE);
Add XMP Metadata for PDF/UA spec.
//Add UA XMP metadata based on specs at https://taggedpdf.com/508-pdf-help-center/pdfua-identifier-missing/
XMPMetadata xmp = XMPMetadata.createXMPMetadata();
xmp.createAndAddDublinCoreSchema();
xmp.getDublinCoreSchema().setTitle(title);
xmp.getDublinCoreSchema().setDescription(title);
xmp.createAndAddPDFAExtensionSchemaWithDefaultNS();
xmp.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfa/ns/schema#", "pdfaSchema");
xmp.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfa/ns/property#", "pdfaProperty");
xmp.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfua/ns/id/", "pdfuaid");
XMPSchema uaSchema = new XMPSchema(XMPMetadata.createXMPMetadata(),
"pdfaSchema", "pdfaSchema", "pdfaSchema");
uaSchema.setTextPropertyValue("schema", "PDF/UA Universal Accessibility Schema");
uaSchema.setTextPropertyValue("namespaceURI", "http://www.aiim.org/pdfua/ns/id/");
uaSchema.setTextPropertyValue("prefix", "pdfuaid");
XMPSchema uaProp = new XMPSchema(XMPMetadata.createXMPMetadata(),
"pdfaProperty", "pdfaProperty", "pdfaProperty");
uaProp.setTextPropertyValue("name", "part");
uaProp.setTextPropertyValue("valueType", "Integer");
uaProp.setTextPropertyValue("category", "internal");
uaProp.setTextPropertyValue("description", "Indicates, which part of ISO 14289 standard is followed");
uaSchema.addUnqualifiedSequenceValue("property", uaProp);
xmp.getPDFExtensionSchema().addBagValue("schemas", uaSchema);
xmp.getPDFExtensionSchema().setPrefix("pdfuaid");
xmp.getPDFExtensionSchema().setTextPropertyValue("part", "1");
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true);
PDMetadata metadata = new PDMetadata(pdf);
metadata.importXMPMetadata(baos.toByteArray());
pdf.getDocumentCatalog().setMetadata(metadata);
Step 2 (Setup document tag structure)
You will need to add the root structure element and all necessary structure elements as children to the root element.
//Adds a DOCUMENT structure element as the structure tree root.
void addRoot() {
PDStructureElement root = new PDStructureElement(StandardStructureTypes.DOCUMENT, null);
root.setAlternateDescription("The document's root structure element.");
root.setTitle("PDF Document");
pdf.getDocumentCatalog().getStructureTreeRoot().appendKid(root);
currentElem = root;
rootElem = root;
}
Each marked content element (text and background graphics) will need to have an MCID and an associated tag for reference in the parent tree which will be explained in step 3.
//Assign an id for the next marked content element.
private void setNextMarkedContentDictionary(String tag) {
currentMarkedContentDictionary = new COSDictionary();
currentMarkedContentDictionary.setName("Tag", tag);
currentMarkedContentDictionary.setInt(COSName.MCID, currentMCID);
currentMCID++;
}
Artifacts (background graphics) will not be detected by the screen reader. Text needs to be detectable so a P structure element is used here when adding text.
//Set up the next marked content element with an MCID and create the containing TD structure element.
PDPageContentStream contents = new PDPageContentStream(
pdf, pages.get(pageIndex), PDPageContentStream.AppendMode.APPEND, false);
currentElem = addContentToParent(null, StandardStructureTypes.TD, pages.get(pageIndex), currentRow);
//Make the actual cell rectangle and set as artifact to avoid detection.
setNextMarkedContentDictionary(COSName.ARTIFACT.getName());
contents.beginMarkedContent(COSName.ARTIFACT, PDPropertyList.create(currentMarkedContentDictionary));
//Draws the cell itself with the given colors and location.
drawDataCell(table.getCell(i, j).getCellColor(), table.getCell(i, j).getBorderColor(),
x + table.getRows().get(i).getCellPosition(j),
y + table.getRowPosition(i),
table.getCell(i, j).getWidth(), table.getRows().get(i).getHeight(), contents);
contents.endMarkedContent();
currentElem = addContentToParent(COSName.ARTIFACT, StandardStructureTypes.P, pages.get(pageIndex), currentElem);
contents.close();
//Draw the cell's text as a P structure element
contents = new PDPageContentStream(
pdf, pages.get(pageIndex), PDPageContentStream.AppendMode.APPEND, false);
setNextMarkedContentDictionary(COSName.P.getName());
contents.beginMarkedContent(COSName.P, PDPropertyList.create(currentMarkedContentDictionary));
//... Code to draw actual text...//
//End the marked content and append it's P structure element to the containing TD structure element.
contents.endMarkedContent();
addContentToParent(COSName.P, null, pages.get(pageIndex), currentElem);
contents.close();
Annotation Widgets (form objects in this case) will need to be nested within Form structure elements.
//Add a radio button widget.
if (!table.getCell(i, j).getRbVal().isEmpty()) {
PDStructureElement fieldElem = new PDStructureElement(StandardStructureTypes.FORM, currentElem);
radioWidgets.add(addRadioButton(
x + table.getRows().get(i).getCellPosition(j) -
radioWidgets.size() * 10 + table.getCell(i, j).getWidth() / 4,
y + table.getRowPosition(i),
table.getCell(i, j).getWidth() * 1.5f, 20,
radioValues, pageIndex, radioWidgets.size()));
fieldElem.setPage(pages.get(pageIndex));
COSArray kArray = new COSArray();
kArray.add(COSInteger.get(currentMCID));
fieldElem.getCOSObject().setItem(COSName.K, kArray);
addWidgetContent(annotationRefs.get(annotationRefs.size() - 1), fieldElem, StandardStructureTypes.FORM, pageIndex);
}
//Add a text field in the current cell.
if (!table.getCell(i, j).getTextVal().isEmpty()) {
PDStructureElement fieldElem = new PDStructureElement(StandardStructureTypes.FORM, currentElem);
addTextField(x + table.getRows().get(i).getCellPosition(j),
y + table.getRowPosition(i),
table.getCell(i, j).getWidth(), table.getRows().get(i).getHeight(),
table.getCell(i, j).getTextVal(), pageIndex);
fieldElem.setPage(pages.get(pageIndex));
COSArray kArray = new COSArray();
kArray.add(COSInteger.get(currentMCID));
fieldElem.getCOSObject().setItem(COSName.K, kArray);
addWidgetContent(annotationRefs.get(annotationRefs.size() - 1), fieldElem, StandardStructureTypes.FORM, pageIndex);
}
Step 3
After all content elements have been written to the content stream and tag structure has been setup, it is necessary to go back and add the parent tree to the structure tree root. Note: Some method calls (addWidgetContent() and addContentToParent()) in the above code setup the necessary COSDictionary objects.
//Adds the parent tree to root struct element to identify tagged content
void addParentTree() {
COSDictionary dict = new COSDictionary();
nums.add(numDictionaries);
for (int i = 1; i < currentStructParent; i++) {
nums.add(COSInteger.get(i));
nums.add(annotDicts.get(i - 1));
}
dict.setItem(COSName.NUMS, nums);
PDNumberTreeNode numberTreeNode = new PDNumberTreeNode(dict, dict.getClass());
pdf.getDocumentCatalog().getStructureTreeRoot().setParentTreeNextKey(currentStructParent);
pdf.getDocumentCatalog().getStructureTreeRoot().setParentTree(numberTreeNode);
}
If all widget annotations and marked content were added correctly to the structure tree and parent tree then you should get something like this from PAC 2 and PDFDebugger.
Thank you to Tilman Hausherr for pointing me in the right direction to solve this! I will most likely make some edits to this answer for additional clarity as recommended by others.
Edit 1:
If you want to have a table structure like the one I have generated you will also need to add correct table markup to fully comply with the 508 standard... The 'Scope', 'ColSpan', 'RowSpan', or 'Headers' attributes will need to be correctly added to each table cell structure element similar to this or this. The main purpose for this markup is to allow a screen reading software like JAWS to read the table content in an understandable way. These attributes can be added in a similar way as below...
private void addTableCellMarkup(Cell cell, int pageIndex, PDStructureElement currentRow) {
COSDictionary cellAttr = new COSDictionary();
cellAttr.setName(COSName.O, "Table");
if (cell.getCellMarkup().isHeader()) {
currentElem = addContentToParent(null, StandardStructureTypes.TH, pages.get(pageIndex), currentRow);
currentElem.getCOSObject().setString(COSName.ID, cell.getCellMarkup().getId());
if (cell.getCellMarkup().getScope().length() > 0) {
cellAttr.setName(COSName.getPDFName("Scope"), cell.getCellMarkup().getScope());
}
if (cell.getCellMarkup().getColspan() > 1) {
cellAttr.setInt(COSName.getPDFName("ColSpan"), cell.getCellMarkup().getColspan());
}
if (cell.getCellMarkup().getRowSpan() > 1) {
cellAttr.setInt(COSName.getPDFName("RowSpan"), cell.getCellMarkup().getRowSpan());
}
} else {
currentElem = addContentToParent(null, StandardStructureTypes.TD, pages.get(pageIndex), currentRow);
}
if (cell.getCellMarkup().getHeaders().length > 0) {
COSArray headerA = new COSArray();
for (String s : cell.getCellMarkup().getHeaders()) {
headerA.add(new COSString(s));
}
cellAttr.setItem(COSName.getPDFName("Headers"), headerA);
}
currentElem.getCOSObject().setItem(COSName.A, cellAttr);
}
Be sure to do something like currentElem.setAlternateDescription(currentCell.getText()); on each of the structure elements with text marked content for JAWS to read the text.
Note: Each of the fields (radio button and textbox) will need a unique name to avoid setting multiple field values. GitHub has been updated with a more complex example PDF with table markup and improved form fields!
I want to share diamond detail in table format(Plain text not HTML)
I have made this using this library but its prints the data properly using the system.out but while I sharing it, its format is changed:
Below is my code:
List<String> headersList = Arrays.asList("", "");
List<List<String>> rowsList = Arrays.asList(
Arrays.asList("Stone Id :", details[0]),
Arrays.asList("Lab", details[1]),
Arrays.asList("Shape", details[2]),
Arrays.asList("Carat", details[3]),
Arrays.asList("Clarity-Color", details[4]),
Arrays.asList("Cut-Pol-Sym-Flou", details[5]));
Board board = new Board(75);
Table table = new Table(board, 75, headersList, rowsList);
table.invalidate().setGridMode(Table.GRID_NON).setRowsList(rowsList);
List<Integer> colWidthsList = Arrays.asList(30, 14);
table.setColWidthsList(colWidthsList);
Block tableBlock = table.tableToBlocks();
board.setInitialBlock(tableBlock);
board.build();
String preview1 = board.getPreview();
System.out.print(preview1);
sharingIntent.putExtra(Intent.EXTRA_TEXT,preview1);
The console uses monospaced font which takes exactly same width for all characters. But your view isn't using it, so it looks messed up.
Use a monospaced font.
Or use a tabular format. Perhaps a ListView with each row having two text views side by side having fixed width.
Backgroud:
Use Java + BIRT to generate report.
Generate report in viewer and allow user to choose to export it to different format (pdf, xls, word...).
All program are in "Layout", no program in "Master Page".
Have 1 "Data Set". The fields in "Layout" refer to this DS.
There is Group in "Layout", gropu by one field.
In "Group Header", I create one cell to use as page number. "Page : MyPageNumber".
"MyPageNumber" is a field I define which would +1 in Group Header.
Problem:
When I use 1st method to generate report, "MyPageNumber" could not show correctly. Because group header only load one time for each group. It would always show 1.
Question:
As I know there is "restart page number in group" in Crystal report. How to restart page in BIRT?
I want to show data of different group in 1 report file, and the page number start from 1 for each group.
You can do it with BIRT reports using page variables. For example:
Add 2 page variables... Group_page, Group_name.
Add 1 report variable... Group_total_page.
In the report beforeFactory add the script:
prevGroupKey = "";
groupPageNumber = 1;
reportContext.setGlobalVariable("gGROUP_NAME", "");
reportContext.setGlobalVariable("gGROUP_PAGE", 1);
In the report onPageEnd add the script:
var groupKey = currGroup;
var prevGroupKey = reportContext.getGlobalVariable("gGROUP_NAME");
var groupPageNumber = reportContext.getGlobalVariable("gGROUP_PAGE");
if( prevGroupKey == null ){
prevGroupKey = "";
}
if (prevGroupKey == groupKey)
{
if (groupPageNumber != null)
{
groupPageNumber = parseInt(groupPageNumber) + 1;
}
else {
groupPageNumber = 1;
}
}
else {
groupPageNumber = 1;
prevGroupKey = groupKey;
}
reportContext.setPageVariable("GROUP_NAME", groupKey);
reportContext.setPageVariable("GROUP_PAGE", groupPageNumber);
reportContext.setGlobalVariable("gGROUP_NAME", groupKey);
reportContext.setGlobalVariable("gGROUP_PAGE", groupPageNumber);
var groupTotalPage = reportContext.getPageVariable("GROUP_TOTAL_PAGE");
if (groupTotalPage == null)
{
groupTotalPage = new java.util.HashMap();
reportContext.setPageVariable("GROUP_TOTAL_PAGE", groupTotalPage);
}
groupTotalPage.put(groupKey, groupPageNumber);
In a master page onRender script add the following script:
var totalPage = reportContext.getPageVariable("GROUP_TOTAL_PAGE");
var groupName = reportContext.getPageVariable("GROUP_NAME");
if (totalPage != null)
{
this.text = java.lang.Integer.toString(totalPage.get(groupName));
}
In the table group header onCreate event, add the following script, replacing 'COUNTRY' with the name of the column that you are grouping on:
currGroup = this.getRowData().getColumnValue("COUNTRY");
In the master page add a grid to the header or footer and add an autotext variable for Group_page and Group_total_page. Optionally add the page variable for the Group_name as well.
Check out these links for more information about BIRT page variables:
https://books.google.ch/books?id=aIjZ4FYJOQkC&pg=PA85&lpg=PA85&dq=birt+change+autotext&source=bl&ots=K0nCmF2hrD&sig=CBOr_otRW0B72sZoFS7LC_1Mrz4&hl=en&sa=X&ei=ZKNAVcnuLYLHsAXRmIHoCw&ved=0CEoQ6AEwBQ#v=onepage&q=birt%20change%20autotext&f=false
https://www.youtube.com/watch?v=lw_k1qHY_gU
http://www.eclipse.org/birt/phoenix/project/notable2.5.php#jump_4
https://bugs.eclipse.org/bugs/show_bug.cgi?id=316173
http://www.eclipse.org/forums/index.php/t/575172/
Alas, this is not supported with BIRT.
That's probably not the answer you've hoped for, but it's the truth.
This is one of the very few aspects where BIRT is way behind other report generator tools.
However, depending on how you have BIRT integrated into your environment, a workaround approach is possible for PDF export that we use in our solution with great success.
The idea is to let BIRT generate a PDF outline based on the grouping.
And the BIRT report creates information in the ReportContext about where and how it wants the page numbers to be displayed.
After BIRT generated the PDF, a custom PDFPostProcessor uses the PDF outline and the information from the ReportContext to add the page numbers with iText.
If this work-around is viable for you, feel free to contact me.
I have a form that runs a java agent on the WebQueryOpen event. This agent pulls data from a DB2 database and then puts them into the computed text fields I have placed on the form and are displayed whenever I open the form in the browser. This is working for me. However, when I try to use RichTextFields I get a ClassCastException error. No document is actually saved, I just open the form in the browser using this domino URL - https://company.com/database.nsf/sampleform?OpenForm
Sample code of simple text field - Displayed with w/o problems
Document sampledoc = agentContext.getDocumentContext();
String samplestr = "sample data from db2";
sampledoc.replaceItemValue("sampletextfield", samplestr);
When I tried using rich text field
Document sampledoc = agentContext.getDocumentContext();
String samplestr = "sample data from db2";
RichTextItem rtsample = (RichTextItem)sampledoc.getFirstItem('samplerichtextfield');
rtsample.appendText(samplestr); // ClassCastException error
Basically, I wanted to use rich text field so that it could accommodate more characters in case I pull a very long string data.
Screenshot of the field (As you can see it's a RichText)
The problem is that you're trying to access a regular Item as a RichTextItem.
The RichTextItem are special fields that are created with its own method just like this:
RichTextItem rtsample = (RichTextItem)sampledoc.createRichTextItem('samplerichtextfield');
It's different to the regular Items that can be created with a simple sampledoc.replaceItemValue(etc).
So, if you want to know if a item is RichTextItem and if it does not exist, create it, you can do this:
RichTextItem rti = null;
Item item = doc.getFirstItem("somefield");
if (item != null) {
if (item instanceof RichTextItem) {
//Yay!
rti = (RichTextItem) item;
} else {
//:-(
}
} else {
rti = doc.createRichTextItem("somefield");
//etc.
}
I am creating PPT files using the library docx4j. I have been able to create slides with text and images, but I have not been able to add notes to them.
I am creating the slide like this:
MainPresentationPart pp = (MainPresentationPart)presentationParts.get(new PartName("/ppt/presentation.xml"));
SlideLayoutPart layoutPart = (SlideLayoutPart)presentationParts.get(new PartName("/ppt/slideLayouts/slideLayout1.xml"));
SlidePart slidePart = PresentationMLPackage.createSlidePart(pp, layoutPart, new PartName("/ppt/slides/slide" + ++slideNumber + ".xml"));
so I can add text or images to the body, but when I try to access the field slidePart.notes it is null. I have tried to initialize it
slidePart.setPartShortcut(new NotesSlidePart());
but then everything inside notes is null and I have not achieved anything.
So, does anyone have a working example of how to add notes to a PPT file?
Many thanks
Its not enough to do:
slidePart.setPartShortcut(new NotesSlidePart());
You need to explicitly add the notes slide part to your slide part (so that the relationships get set up correctly), by invoking addTargetPart.
But there's more you have to do given the way the pptx format works. To see what parts are required, upload a pptx to the docx4j webapp. Here's the code I wrote just now based on doing that:
// Now add notes slide.
// 1. Notes master
NotesMasterPart nmp = new NotesMasterPart();
NotesMaster notesmaster = (NotesMaster)XmlUtils.unmarshalString(notesMasterXml, Context.jcPML);
nmp.setJaxbElement(notesmaster);
// .. connect it to /ppt/presentation.xml
Relationship ppRelNmp = pp.addTargetPart(nmp);
/*
* <p:notesMasterIdLst>
<p:notesMasterId r:id="rId3"/>
</p:notesMasterIdLst>
*/
pp.getJaxbElement().setNotesMasterIdLst(createNotesMasterIdListPlusEntry(ppRelNmp.getId()));
// .. NotesMasterPart typically has a rel to a theme
// .. can we get away without it?
// Nope .. read this in from a file
ThemePart themePart = new ThemePart(new PartName("/ppt/theme/theme2.xml"));
// TODO: read it from a string instead
themePart.unmarshal(
FileUtils.openInputStream(new File(System.getProperty("user.dir") + "/theme2.xml"))
);
nmp.addTargetPart(themePart);
// 2. Notes slide
NotesSlidePart nsp = new NotesSlidePart();
Notes notes = (Notes)XmlUtils.unmarshalString(notesXML, Context.jcPML);
nsp.setJaxbElement(notes);
// .. connect it to the slide
slidePart.addTargetPart(nsp);
// .. it also has a rel to the slide
nsp.addTargetPart(slidePart);
// .. and the slide master
nsp.addTargetPart(nmp);
You can find the complete example at https://github.com/plutext/docx4j/blob/master/src/samples/pptx4j/org/pptx4j/samples/SlideNotes.java