How to get rid of Helvetica in iText XMLWorker?


We're using iText to generate PDF files from Java code, which works pretty well in most cases. A few days ago we started to generate PDF/A instead of normal PDF files, which requires all fonts to be embedded. The iText Document is mostly built from custom PdfPTable and other classes where we control the fonts directly. All fonts used are created from TTF files loaded via the following code, which works just fine:
private BaseFont load(String path) {
try {
URL fontResource = PrintSettings.class.getResource(path);
if (fontResource == null) {
return null;
}
String fontPath = fontResource.toExternalForm();
BaseFont baseFont = BaseFont.createFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
baseFont.setSubset(true);
return baseFont;
}
catch (DocumentException ex) {
Logger.getLogger(PrintSettings.class).warn("...");
}
catch (IOException ex) {
Logger.getLogger(PrintSettings.class).warn("...");
}
return FontFactory.getFont(PrintSettings.FONT, "UTF-8", true, 8f, Font.NORMAL, PrintSettings.COLOR_TEXT).getBaseFont();
}
Now we use one specific content type in the PDF which is generated from HTML code. We use the XMLWorker to handle that part. This worked just fine as long as we didn't embed the fonts. But with PDF/A we need to embed all fonts, and now we struggle with an unknown source of Helvetica usage.
We've tried to solve this by using our own FontProvider class like this one:
public class PrintFontProvider extends FontFactoryImp {
@Override
public Font getFont(String fontName, String encoding, boolean embedded, float size, int style, BaseColor color, boolean cached) {
// LiberationSans – http://de.wikipedia.org/wiki/Liberation_(Schriftart) – http://scripts.sil.org/cms/scripts/page.php?item_id=OFL_web
if (style == Font.NORMAL) return new Font(this.load("fonts/Liberation/LiberationSans-Regular.ttf"), size, Font.NORMAL, color);
if (style == Font.BOLD) return new Font(this.load("fonts/Liberation/LiberationSans-Bold.ttf"), size, Font.NORMAL, color);
if (style == Font.BOLDITALIC) return new Font(this.load("fonts/Liberation/LiberationSans-BoldItalic.ttf"), size, Font.NORMAL, color);
if (style == Font.ITALIC) return new Font(this.load("fonts/Liberation/LiberationSans-Italic.ttf"), size, Font.NORMAL, color);
return new Font(this.load("fonts/Liberation/LiberationSans-Regular.ttf"), size, style, color);
}
private BaseFont load(String path) { ... }
}
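One detail in the provider above: the style argument is a bit mask, so an exact comparison such as style == Font.BOLD would miss any combined value (bold plus underline, say) if one is ever passed, and that case would fall through to the regular face. A minimal sketch of a mask-based lookup follows. It is only an illustration under assumptions: the class and method names are hypothetical, the constants mirror the values of iText 5's Font.BOLD (1), Font.ITALIC (2) and Font.UNDEFINED (-1), and the paths are the ones from the question.

```java
/** Hypothetical helper: maps an iText-style bit mask to one of the four font files from the question. */
final class StyleToFontFile {
    // These values mirror com.itextpdf.text.Font in iText 5 (assumption: unchanged in your version).
    static final int UNDEFINED = -1;
    static final int BOLD = 1;
    static final int ITALIC = 2;

    // Index = style & (BOLD | ITALIC): 0 regular, 1 bold, 2 italic, 3 bold italic.
    private static final String[] FILES = {
        "fonts/Liberation/LiberationSans-Regular.ttf",
        "fonts/Liberation/LiberationSans-Bold.ttf",
        "fonts/Liberation/LiberationSans-Italic.ttf",
        "fonts/Liberation/LiberationSans-BoldItalic.ttf"
    };

    /** Returns the resource path for the bold/italic part of the style. */
    static String fileFor(int style) {
        int known = (style == UNDEFINED) ? 0 : style;
        return FILES[known & (BOLD | ITALIC)];
    }

    /** Returns the style bits that the font file does not cover (for example underline). */
    static int remainingStyle(int style) {
        return (style == UNDEFINED) ? 0 : style & ~(BOLD | ITALIC);
    }
}
```

In the provider, the path from fileFor would go to load(...), and remainingStyle would be passed as the style of the new Font, so that bold and italic are never simulated on top of a real bold or italic face.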
It's connected with the XMLWorker using the following code:
HtmlPipelineContext html = new HtmlPipelineContext(null);
html.setTagFactory(Tags.getHtmlTagProcessorFactory());
CSSResolver css = XMLWorkerHelper.getInstance().getDefaultCssResolver(true);
// We need to control the FontProvider!
html.setCssAppliers(new CssAppliersImpl(new PrintFontProvider()));
Pipeline<?> pipeline = new CssResolverPipeline(css, new HtmlPipeline(html, new PdfWriterPipeline(this.document, writer)));
XMLWorker worker = new XMLWorker(pipeline, true);
XMLParser p = new XMLParser(worker);
p.parse(new ByteArrayInputStream(StringUtils.iTextHTML(string).getBytes()));
Most simple HTML elements work this way... but there are some which seem to ignore the FontProvider and keep using Helvetica, which won't be embedded in the PDF/A (we don't have that font). For example, <ol><li>...</li></ol> triggers this:
Caused by: com.itextpdf.text.pdf.PdfXConformanceException: All the fonts must be embedded. This one isn't: Helvetica
at com.itextpdf.text.pdf.internal.PdfXConformanceImp.checkPDFXConformance(PdfXConformanceImp.java:225)
at com.itextpdf.text.pdf.PdfWriter.addSimple(PdfWriter.java:2192)
at com.itextpdf.text.pdf.PdfContentByte.setFontAndSize(PdfContentByte.java:1444)
at com.itextpdf.text.pdf.PdfDocument.writeLineToContent(PdfDocument.java:1463)
at com.itextpdf.text.pdf.ColumnText.go(ColumnText.java:968)
at com.itextpdf.text.pdf.ColumnText.go(ColumnText.java:841)
at com.itextpdf.text.pdf.ColumnText.showTextAligned(ColumnText.java:1189)
at com.itextpdf.text.pdf.ColumnText.showTextAligned(ColumnText.java:1208)
at com.itextpdf.text.pdf.PdfDocument.flushLines(PdfDocument.java:1193)
at com.itextpdf.text.pdf.PdfDocument.newPage(PdfDocument.java:830)
at com.itextpdf.text.Document.newPage(Document.java:367)
I've run out of ideas how to get rid of Helvetica for now... trying to solve this for 8+ hours now... any more ideas?

I've dug a little deeper and traveled from OrderedUnorderedList over ListItem to List...
/**
* Adds an <CODE>Element</CODE> to the <CODE>List</CODE>.
*
* @param o the element to add.
* @return true if adding the object succeeded
* @since 5.0.1 (signature changed to use Element)
*/
@Override
public boolean add(final Element o) {
if (o instanceof ListItem) {
ListItem item = (ListItem) o;
if (this.numbered || this.lettered) {
Chunk chunk = new Chunk(this.preSymbol, this.symbol.getFont());
chunk.setAttributes(this.symbol.getAttributes());
int index = this.first + this.list.size();
if ( this.lettered )
chunk.append(RomanAlphabetFactory.getString(index, this.lowercase));
else
chunk.append(String.valueOf(index));
chunk.append(this.postSymbol);
item.setListSymbol(chunk);
}
else {
item.setListSymbol(this.symbol);
}
item.setIndentationLeft(this.symbolIndent, this.autoindent);
item.setIndentationRight(0);
return this.list.add(item);
}
else if (o instanceof List) {
List nested = (List) o;
nested.setIndentationLeft(nested.getIndentationLeft() + this.symbolIndent);
this.first--;
return this.list.add(nested);
}
return false;
}
This code refers to this.symbol.getFont(), and the font of symbol is left undefined at class initialization, so iText falls back to its default font, Helvetica:
public class List implements TextElementArray, Indentable {
[...]
/** This is the listsymbol of a list that is not numbered. */
protected Chunk symbol = new Chunk("- ");
I simply used another Chunk constructor which takes a Font of mine and voilà: solved. The numbered list no longer uses Helvetica but my own font, which gets embedded properly.
This took me ages! Another way might have been to implement a custom TagProcessor for <ol>, but we don't have the time for this anymore. I'll file a bug report for this... we'll see whether it gets fixed in a more flexible way.

Related

PDFBox: No glyph for U+0050 in extracted font

I'm trying to create a new page in a document and write some text to it, using a font contained in the file.
The font is extracted from the resources:
PDPage page = document.getPage(0);
PDResources res = page.getResources();
List<PDFont> fonts = new ArrayList<>();
for (COSName fontName : res.getFontNames()) {
PDFont font = res.getFont(fontName);
System.out.println(font);
fonts.add(font);
}
And later used to write some text:
stream.beginText();
stream.setFont(fonts.get(0), 12);
stream.setTextMatrix(Matrix.getTranslateInstance(20, 50));
stream.showText("Protokol");
stream.endText();
The showText method always fails with error
No glyph for U+0050 (P) in font QZHBRL+ArialMT
But the glyph is there, as verified by FontForge:
Also the method hasGlyph returns true.
The complete project including the PDF is available at github repository showing the issue

Here you actually ran into an open PDFBox TODO. Your stream.showText call eventually calls encode of the underlying CID font for each character, and there we have:
public class PDCIDFontType2 extends PDCIDFont
{
...
public byte[] encode(int unicode)
{
int cid = -1;
if (isEmbedded)
{
...
// otherwise we require an explicit ToUnicode CMap
if (cid == -1)
{
//TODO: invert the ToUnicode CMap?
// see also PDFBOX-4233
cid = 0;
}
}
...
if (cid == 0)
{
throw new IllegalArgumentException(
String.format("No glyph for U+%04X (%c) in font %s", unicode, (char) unicode, getName()));
}
return encodeGlyphId(cid);
}
...
}
(org.apache.pdfbox.pdmodel.font.PDCIDFontType2)
Where PDFBox could not otherwise determine a mapping from Unicode to glyph code (if (cid == -1)), the code comments point to another way to determine a glyph code: an inverse lookup of the ToUnicode map. If this were implemented, PDFBox could have determined a glyph ID and written your text.
Unfortunately it is not implemented yet.
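To illustrate what "inverting the ToUnicode CMap" means, here is a minimal sketch that works on a plain map from character code to Unicode string. It is not PDFBox code: the class name, the method name and the map-based representation are assumptions made for the example, and a real CMap has more structure than this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of an inverse ToUnicode lookup; not part of PDFBox. */
final class InverseToUnicode {
    /**
     * Turns "code -> Unicode string" into "Unicode string -> code".
     * If several codes map to the same string, the lowest code is kept.
     */
    static Map<String, Integer> invert(Map<Integer, String> toUnicode) {
        Map<String, Integer> inverse = new HashMap<>();
        // Iterate in ascending code order so the result is deterministic.
        for (Map.Entry<Integer, String> entry : new TreeMap<>(toUnicode).entrySet()) {
            inverse.putIfAbsent(entry.getValue(), entry.getKey());
        }
        return inverse;
    }
}
```

With such an inverse map, an encoder could look up the string "P" and obtain a code to write, instead of giving up with cid = 0.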

This has been fixed in issue PDFBOX-5103. This will be available in PDFBox 2.0.23 and until then, in a snapshot build.

Handle many Unicode characters with PDFBox

I am writing a Java function which takes a String as a parameter and produces a PDF as output with PDFBox.
Everything is working fine as long as I use latin characters.
However, I don't know in advance what will be the input, and it might be some English as well as Chinese or Japanese characters.
In the case of non-Latin characters, here is the error I get:
Exception in thread "main" java.lang.IllegalArgumentException: U+3053 ('kohiragana') is not available in this font Helvetica encoding: WinAnsiEncoding
at org.apache.pdfbox.pdmodel.font.PDType1Font.encode(PDType1Font.java:426)
at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:324)
at org.apache.pdfbox.pdmodel.PDPageContentStream.showTextInternal(PDPageContentStream.java:509)
at org.apache.pdfbox.pdmodel.PDPageContentStream.showText(PDPageContentStream.java:471)
at com.mylib.pdf.PDFBuilder.generatePdfFromString(PDFBuilder.java:122)
at com.mylib.pdf.PDFBuilder.main(PDFBuilder.java:111)
If I understand correctly, I have to use a specific font for Japanese, another one for Chinese and so on, because the one that I am using (Helvetica) doesn't cover all the required Unicode characters.
I could also use a font which handles all these Unicode characters, such as Arial Unicode. However, this font is under a specific license so I cannot use it, and I haven't found another one.
I found some projects that aim to overcome this issue, like the Google Noto project.
However, this project provides multiple font files, so I would have to choose, at runtime, the correct file to load depending on the input.
So I am facing 2 options, one of which I don't know how to implement properly:
Keep searching for a font that handles almost every Unicode character (where is this grail I am desperately seeking?!)
Try to detect which language is used and select a font depending on it.
Despite the fact that I don't know (yet) how to do that, I don't find it to be a clean implementation, as the mapping between the input and the font file will be hardcoded, meaning I will have to hardcode all the possible mappings.
Is there another solution?
Am I completely off track?
Thanks in advance for your help and guidance!
Here is the code I use to generate the PDF:
public static void main(String args[]) throws IOException {
String latinText = "This is latin text";
String japaneseText = "これは日本語です";
// This works fine
generatePdfFromString(latinText);
// This throws an exception
generatePdfFromString(japaneseText);
}
private static OutputStream generatePdfFromString(String content) throws IOException {
PDPage page = new PDPage();
try (PDDocument doc = new PDDocument();
PDPageContentStream contentStream = new PDPageContentStream(doc, page)) {
doc.addPage(page);
contentStream.setFont(PDType1Font.HELVETICA, 12);
// Or load a specific font from a file
// contentStream.setFont(PDType0Font.load(this.doc, new File("/fontPath.ttf")), 12);
contentStream.beginText();
contentStream.showText(content);
contentStream.endText();
contentStream.close();
OutputStream os = new ByteArrayOutputStream();
doc.save(os);
return os;
}
}

A better solution than waiting for a font or guessing a text's language is to have a multitude of fonts and to select the correct font on a glyph-by-glyph basis.
You already found the Google Noto Fonts which are a good base collection of fonts for this task.
Unfortunately, though, Google publishes the Noto CJK fonts only as OpenType fonts (.otf), not as TrueType fonts (.ttf), a policy that isn't likely to change, cf. the Noto fonts issue 249 and others. On the other hand PDFBox does not support OpenType fonts and isn't actively working on OpenType support either, cf. PDFBOX-2482.
Thus, one has to convert the OpenType font somehow to TrueType. I simply took the file shared by djmilch in his blog post FREE FONT NOTO SANS CJK IN TTF.
Font selection per character
So you essentially need a method which checks your text character by character and dissects it into chunks which can be drawn using the same font.
Unfortunately I don't see a better method to ask a PDFBox PDFont whether it knows a glyph for a given character than to actually try to encode the character and consider an IllegalArgumentException a "no".
I, therefore, implemented that functionality using the following helper class TextWithFont and method fontify:
class TextWithFont {
final String text;
final PDFont font;
TextWithFont(String text, PDFont font) {
this.text = text;
this.font = font;
}
public void show(PDPageContentStream canvas, float fontSize) throws IOException {
canvas.setFont(font, fontSize);
canvas.showText(text);
}
}
(AddTextWithDynamicFonts inner class)
List<TextWithFont> fontify(List<PDFont> fonts, String text) throws IOException {
List<TextWithFont> result = new ArrayList<>();
if (text.length() > 0) {
PDFont currentFont = null;
int start = 0;
for (int i = 0; i < text.length(); ) {
int codePoint = text.codePointAt(i);
int codeChars = Character.charCount(codePoint);
String codePointString = text.substring(i, i + codeChars);
boolean canEncode = false;
for (PDFont font : fonts) {
try {
font.encode(codePointString);
canEncode = true;
if (font != currentFont) {
if (currentFont != null) {
result.add(new TextWithFont(text.substring(start, i), currentFont));
}
currentFont = font;
start = i;
}
break;
} catch (Exception ioe) {
// font cannot encode codepoint
}
}
if (!canEncode) {
throw new IOException("Cannot encode '" + codePointString + "'.");
}
i += codeChars;
}
result.add(new TextWithFont(text.substring(start, text.length()), currentFont));
}
return result;
}
(AddTextWithDynamicFonts method)
Example use
Using the method and the class above like this
String latinText = "This is latin text";
String japaneseText = "これは日本語です";
String mixedText = "Tこhれiはs日 本i語sで すlatin text";
generatePdfFromStringImproved(latinText).writeTo(new FileOutputStream("Cccompany-Latin-Improved.pdf"));
generatePdfFromStringImproved(japaneseText).writeTo(new FileOutputStream("Cccompany-Japanese-Improved.pdf"));
generatePdfFromStringImproved(mixedText).writeTo(new FileOutputStream("Cccompany-Mixed-Improved.pdf"));
(AddTextWithDynamicFonts test testAddLikeCccompanyImproved)
ByteArrayOutputStream generatePdfFromStringImproved(String content) throws IOException {
try ( PDDocument doc = new PDDocument();
InputStream notoSansRegularResource = AddTextWithDynamicFonts.class.getResourceAsStream("NotoSans-Regular.ttf");
InputStream notoSansCjkRegularResource = AddTextWithDynamicFonts.class.getResourceAsStream("NotoSansCJKtc-Regular.ttf") ) {
PDType0Font notoSansRegular = PDType0Font.load(doc, notoSansRegularResource);
PDType0Font notoSansCjkRegular = PDType0Font.load(doc, notoSansCjkRegularResource);
List<PDFont> fonts = Arrays.asList(notoSansRegular, notoSansCjkRegular);
List<TextWithFont> fontifiedContent = fontify(fonts, content);
PDPage page = new PDPage();
doc.addPage(page);
try ( PDPageContentStream contentStream = new PDPageContentStream(doc, page)) {
contentStream.beginText();
for (TextWithFont textWithFont : fontifiedContent) {
textWithFont.show(contentStream, 12);
}
contentStream.endText();
}
ByteArrayOutputStream os = new ByteArrayOutputStream();
doc.save(os);
return os;
}
}
(AddTextWithDynamicFonts helper method)
I get
for latinText = "This is latin text"
for japaneseText = "これは日本語です"
and for mixedText = "Tこhれiはs日 本i語sで すlatin text"
Some asides
I retrieved the fonts as Java resources but you can use any kind of InputStream for them.
The font selection mechanism above can quite easily be combined with the line breaking mechanism shown in this answer and the justification extension thereof in this answer

Below is another implementation that splits plain text into chunks of TextWithFont objects. The algorithm encodes character by character and always tries the main font first; only if that fails does it proceed through the list of fallback fonts.
Main class with properties:
public class SplitByFontsProcessor {
/** Text to be processed */
private String text;
/** List of fonts to be used for processing */
private List<PDFont> fonts;
/** Main font to be used for processing */
private PDFont mainFont;
/** List of fallback fonts to be used for processing. It does not contain the main font. */
private List<PDFont> fallbackFonts;
........
}
Methods within the same class:
private List<TextWithFont> splitUsingFallbackFonts() throws IOException {
final List<TextWithFont> fontifiedText = new ArrayList<>();
final StringBuilder strBuilder = new StringBuilder();
boolean isHandledByMainFont = false;
// Iterator over Unicode codepoints in Java string
final PrimitiveIterator.OfInt iterator = text.codePoints().iterator();
while (iterator.hasNext()) {
int codePoint = iterator.nextInt();
final String stringCodePoint = new String(Character.toChars(codePoint));
// try to encode Unicode codepoint
try {
// Multi-byte encoding with 1 to 4 bytes.
mainFont.encode(stringCodePoint); // fails here if can not be handled by the font
strBuilder.append(stringCodePoint); // append if succeeded to encode
isHandledByMainFont = true;
} catch(IllegalArgumentException ex) {
// IllegalArgumentException is thrown if character can not be handled by a given Font
// Adding successfully handled characters so far
if (StringUtils.isNotEmpty(strBuilder.toString())) {
fontifiedText.add(new TextWithFont(strBuilder.toString(), mainFont));
strBuilder.setLength(0);// clear StringBuilder
}
handleByFallbackFonts(fontifiedText, stringCodePoint);
isHandledByMainFont = false;
} // end main font try-catch
}
// If this is the last successful run that was handled by main font, then add result
if (isHandledByMainFont) {
fontifiedText.add(new TextWithFont(strBuilder.toString(), mainFont));
}
return mergeAdjacents(fontifiedText);
}
Method handleByFallbackFonts():
private void handleByFallbackFonts(List<TextWithFont> fontifiedText, String stringCodePoint)
throws IOException {
final StringBuilder strBuilder = new StringBuilder();
boolean isHandledByFallbackFont = false;
// Retry with fallback fonts
final Iterator<PDFont> fallbackFontsIterator = fallbackFonts.iterator();
while(fallbackFontsIterator.hasNext()) {
try {
final PDFont fallbackFont = fallbackFontsIterator.next();
fallbackFont.encode(stringCodePoint); // fails here if can not be handled by the font
isHandledByFallbackFont = true;
strBuilder.append(stringCodePoint);
fontifiedText.add(new TextWithFont(strBuilder.toString(), fallbackFont));
break; // if successfully handled - break the loop
} catch(IllegalArgumentException exception) {
// do nothing, proceed to the next font
}
} // end while
// If character was not handled and this is the last font - throw an exception
if (!isHandledByFallbackFont) {
final String fontNames = fonts.stream()
.map(PDFont::getName)
.collect(Collectors.joining(", "));
int codePoint = stringCodePoint.codePointAt(0);
throw new TextProcessingException(
String.format("Unicode code point [%s] can not be handled by configured fonts: [%s]",
codePoint, fontNames));
}
}
Method splitUsingFallbackFonts() builds a list of TextWithFont objects in which adjacent runs with the same font do not necessarily end up in the same object. This happens because the algorithm always retries the main font first and, when that fails, creates a new object with the font capable of rendering the character. So we need to call a utility method, mergeAdjacents(), which merges them.
private static List<TextWithFont> mergeAdjacents(final List<TextWithFont> fontifiedText) {
final Deque<TextWithFont> result = new LinkedList<>();
for (TextWithFont elem : fontifiedText) {
final TextWithFont resElem = result.peekLast();
if (resElem == null || !resElem.getFont().equals(elem.getFont())) {
result.addLast(elem);
} else {
result.addLast(merge(result.pollLast(), elem));
}
}
return new ArrayList<>(result);
}
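The merge helper called by mergeAdjacents() is not shown above. A minimal sketch of what such a merge step could look like follows, written against a stand-in Run class instead of TextWithFont so that it runs without PDFBox. The names Run and MergeRunsSketch are hypothetical, and concatenating the two texts while keeping the first font is an assumption about the intended behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-alone version of the merging step; F stands in for PDFont. */
final class MergeRunsSketch {
    /** Stand-in for TextWithFont: a piece of text and the font chosen for it. */
    static final class Run<F> {
        final String text;
        final F font;

        Run(String text, F font) {
            this.text = text;
            this.font = font;
        }
    }

    /** Assumed behaviour of merge(): concatenate the texts, keep the (shared) font. */
    static <F> Run<F> merge(Run<F> first, Run<F> second) {
        return new Run<>(first.text + second.text, first.font);
    }

    /** Joins neighbouring runs that use the same font; runs with different fonts stay separate. */
    static <F> List<Run<F>> mergeAdjacents(List<Run<F>> runs) {
        List<Run<F>> result = new ArrayList<>();
        for (Run<F> run : runs) {
            int last = result.size() - 1;
            if (last >= 0 && Objects.equals(result.get(last).font, run.font)) {
                result.set(last, merge(result.get(last), run));
            } else {
                result.add(run);
            }
        }
        return result;
    }
}
```

Only neighbouring runs are merged, so the reading order of the text is preserved even when the same font appears again later.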

Render Type3 font character as image using PDFBox

In my project, I need to parse a PDF file that contains some characters rendered with Type3 fonts. What I need to do is render such characters into a BufferedImage for further processing.
I'm not sure if I'm looking in the right direction, but I'm trying to get the PDType3CharProc for such characters:
PDType3Font font = (PDType3Font)textPosition.getFont();
PDType3CharProc charProc = font.getCharProc(textPosition.getCharacterCodes()[0]);
and the input stream of this procedure contains the following data:
54 0 1 -1 50 43 d1
q
49 0 0 44 1.1 -1.1 cm
BI
/W 49
/H 44
/BPC 1
/IM true
ID
<some binary data here>
EI
Q
but unfortunately I have no idea how to use this data to render the character into an image using PDFBox (or any other Java library).
Am I looking in the right direction, and what can I do with this data?
If not, are there other tools that can solve this problem?

Unfortunately PDFBox out-of-the-box does not provide a class to render contents of arbitrary XObjects (like the type 3 font char procs), at least as far as I can see.
But it does provide a class for rendering complete PDF pages; thus, to render a given type 3 font glyph, one can simply create a page containing only that glyph and render this temporary page!
Assuming, for example, the type 3 font is defined on the first page of a PDDocument document and has name F1, all its char procs can be rendered like this:
PDPage page = document.getPage(0);
PDResources pageResources = page.getResources();
COSName f1Name = COSName.getPDFName("F1");
PDType3Font fontF1 = (PDType3Font) pageResources.getFont(f1Name);
Map<String, Integer> f1NameToCode = fontF1.getEncoding().getNameToCodeMap();
COSDictionary charProcsDictionary = fontF1.getCharProcs();
for (COSName key : charProcsDictionary.keySet())
{
COSStream stream = (COSStream) charProcsDictionary.getDictionaryObject(key);
PDType3CharProc charProc = new PDType3CharProc(fontF1, stream);
PDRectangle bbox = charProc.getGlyphBBox();
if (bbox == null)
bbox = charProc.getBBox();
Integer code = f1NameToCode.get(key.getName());
if (code != null)
{
PDDocument charDocument = new PDDocument();
PDPage charPage = new PDPage(bbox);
charDocument.addPage(charPage);
charPage.setResources(pageResources);
PDPageContentStream charContentStream = new PDPageContentStream(charDocument, charPage);
charContentStream.beginText();
charContentStream.setFont(fontF1, bbox.getHeight());
charContentStream.getOutput().write(String.format("<%02X> Tj\n", code).getBytes());
charContentStream.endText();
charContentStream.close();
File result = new File(RESULT_FOLDER, String.format("4700198773-%s-%s.png", key.getName(), code));
PDFRenderer renderer = new PDFRenderer(charDocument);
BufferedImage image = renderer.renderImageWithDPI(0, 96);
ImageIO.write(image, "PNG", result);
charDocument.close();
}
}
(RenderType3Character.java test method testRender4700198773)
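The loop above writes the glyph code directly into the content stream as a PDF hex string followed by the Tj operator. A hex string carries two hex digits per byte, so codes below 16 need a leading zero. The following sketch isolates just that formatting step; the class and method names are hypothetical, and the range check assumes single-byte character codes, as used by simple fonts such as Type 3.

```java
/** Hypothetical helper that formats a single-byte character code as a "show text" instruction. */
final class HexStringOperand {
    /** Returns e.g. "<05> Tj\n" for code 5; the zero flag pads the code to two hex digits. */
    static String showGlyph(int code) {
        if (code < 0 || code > 0xFF) {
            throw new IllegalArgumentException("Expected a single-byte code but got " + code);
        }
        return String.format("<%02X> Tj\n", code);
    }
}
```

Without the zero flag, a width of 2 pads with a space instead, so code 5 would be written as "< 5>" rather than "<05>".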
Considering the textPosition variable in the OP's code, he quite likely attempts this from a text extraction use case. Thus, he'll have to either pre-generate the bitmaps as above and simply look them up by name or adapt the code to match the available information in his use case (e.g. he might not have the original page at hand, only the font object; in that case he cannot copy the resources of the original page but instead may create a new resources object and add the font object to it).
Unfortunately the OP did not provide a sample PDF. Thus, for my test, I used one from another Stack Overflow question: 4700198773.pdf from "extract text with custom font result non readble". There obviously might remain issues with the OP's own files.

I stumbled upon the same issue and I was able to render Type3 font by modifying PDFRenderer and the underlying PageDrawer:
class Type3PDFRenderer extends PDFRenderer
{
private PDFont font;
public Type3PDFRenderer(PDDocument document, PDFont font)
{
super(document);
this.font = font;
}
@Override
protected PageDrawer createPageDrawer(PageDrawerParameters parameters) throws IOException
{
FontType3PageDrawer pd = new FontType3PageDrawer(parameters, this.font);
pd.setAnnotationFilter(super.getAnnotationsFilter());//as done in the super class
return pd;
}
}
class FontType3PageDrawer extends PageDrawer
{
private PDFont font;
public FontType3PageDrawer(PageDrawerParameters parameters, PDFont font) throws IOException
{
super(parameters);
this.font = font;
}
@Override
public PDGraphicsState getGraphicsState()
{
PDGraphicsState gs = super.getGraphicsState();
gs.getTextState().setFont(this.font);
return gs;
}
}
Simply use Type3PDFRenderer instead of PDFRenderer. Of course, if you have multiple fonts, this needs some more modification to handle them.
Edit: tested with pdfbox 2.0.9

iText add Watermark to selected pages

I need to add a watermark to every page that has certain text, such as "PROCEDURE DELETED".
Based on Bruno Lowagie's suggestion in Adding watermark directly to the stream, so far I have the PdfWatermark class with:
protected Phrase watermark = new Phrase("DELETED", new Font(FontFamily.HELVETICA, 60, Font.NORMAL, BaseColor.PINK));
ArrayList<Integer> arrPages = new ArrayList<Integer>();
boolean pdfChecked = false;
@Override
public void onEndPage(PdfWriter writer, Document document) {
if(pdfChecked == false) {
detectPages(writer, document);
pdfChecked = true;
}
int pageNum = writer.getPageNumber();
if(arrPages.contains(pageNum)) {
PdfContentByte canvas = writer.getDirectContentUnder();
ColumnText.showTextAligned(canvas, Element.ALIGN_CENTER, watermark, 298, 421, 45);
}
}
And this works fine if I add, say, the number 3 to the arrPages ArrayList in my custom detectPages method - it shows the desired watermark on page 3.
What I am having trouble with is how to search through the document for the text string, which I have access to here, only from the PdfWriter writer or the com.itextpdf.text.Document document sent to onEndPage method.
Here is what I have tried, unsuccessfully:
private void detectPages(PdfWriter writer, Document document) {
try {
//arrPages.add(3);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PdfWriter.getInstance(document, byteArrayOutputStream);
// the following code does not work
PdfReader reader = new PdfReader(writer.getDirectContent().getPdfDocument());
PdfContentByte canvas = writer.getDirectContent();
PdfImportedPage page;
for (int i = 0; i < reader.getNumberOfPages(); ) {
page = writer.getImportedPage(reader, ++i);
canvas.addTemplate(page, 1f, 0, 0.4f, 0.4f, 72, 50 * i);
canvas.beginText();
canvas.showTextAligned(Element.ALIGN_CENTER,
//search String
String.valueOf((char)(181 + i)), 496, 150 + 50 * i, 0);
//if(detected) arrPages.add(i);
canvas.endText();
}
Am I on the right track with this as a solution or do I need to back out?
Can anyone supply the missing link needed to scan the doc and pick out "PROCEDURE DELETED" pages?
EDIT: I am using iText 5.0.4 - cannot upgrade to 5.5.X at this time, but could probably upgrade to latest version below that.
EDIT2: More information: This is the approach to adding text to the document (doc):
String processed = processText(template);
List<Element> objects = HTMLWorker.parseToListLog(new StringReader(processed),
styles, interfaceProps, errors);
for (Element elem : objects) {
doc.add(elem);
}
That is called in an addText method I control. The template is simply HTML from a database LOB. The processText method checks the HTML for custom markers enclosed in curly braces, as in ${replaceMe}.
This seems to be the place to identify the "PROCEDURE DELETED" string during generation of the document, but I don't see the path to Chunk.setGenericTags().
EDIT3: Table difficulties
List<Element> objects = HTMLWorker.parseToListLog(new StringReader(processed),
styles, interfaceProps, errors);
for (Element elem : objects) {
// the code below does not work
if (elem instanceof PdfPTable) {
PdfPTable table = (PdfPTable) elem;
ArrayList<Chunk> chks = table.getChunks();
for(Chunk chk : chks){
if(chk.toString().contains("TEST DELETED")) {
chk.setGenericTag("delete_tag");
}
}
}
doc.add(elem);
}
Commenters mkl and Bruno suggested detecting the "PROCEDURE DELETED" keywords at the time they are added to the doc. However, since the keywords are necessarily inside a table, they have to be detected through PdfPTable rather than the simpler Element.
I could not do it with the code above. Any suggestions exactly how to find text inside a table cell and do a string comparison on it?
EDIT4: Based on some experimentation, I would like to make some assertions and please show me the way through them:
Using Chunk.setGenericTag() is required to trigger the handler onGenericTag
For some reason (PdfPTable) table.getChunks() does not return chunks, at least that my system picks up. This is counterintuitive and possibly there is a setup, version, or code bug causing this behavior.
Therefore, a selection text string inside a table cannot be used to trigger a watermark.

Thanks to all the help from @mkl and @Bruno, I finally got a workable solution. Anybody interested who can give a more elegant approach is welcome - there is certainly room.
Two classes were in play in all the snippets given in the original question. Below is partial code that embodies the working solution.
public class PdfExporter
//a custom class to build content
//create some content to add to pdf
String text = "<div>Lots of important content here</div>";
//assume the section is known to be deleted
if(section_deleted) {
//add a special marker to the text, in white-on-white
text = "<span style=\"color:white;\">.!^DEL^?.</span>" + text;
}
//based on some indicator, we know we want to force a new page for a new section
boolean ensureNewPage = true;
//use an instance of PdfDocHandler to add the content
pdfDocHandler.addText(text, ensureNewPage);
public class PdfDocHandler extends PdfPageEventHelper
private boolean isDEL;
public boolean addText(String text, boolean ensureNewPage)
if (ensureNewPage) {
//turn isDEL off: a forced pagebreak indicates a new section
isDEL = false;
}
//attempt to find the special DELETE marker in first chunk
//NOTE: this can be done several ways
ArrayList<Chunk> chks = elem.getChunks();
if(chks.size()>0) {
if(chks.get(0).getContent().contains(".!^DEL^?.")) {
//special delete marker found in text
isDEL = true;
}
}
//doc is an iText Document
doc.add(elem);
public void onEndPage(PdfWriter writer, Document document) {
if(isDEL) {
//set the watermark
Phrase watermark = new Phrase("DELETED", new Font(Font.FontFamily.HELVETICA, 60, Font.NORMAL, BaseColor.PINK));
PdfContentByte canvas = writer.getDirectContentUnder();
ColumnText.showTextAligned(canvas, Element.ALIGN_CENTER, watermark, 298, 421, 45);
}
}
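The partial code above boils down to a small piece of state: a flag that is cleared when a new section starts and set when the marker is found. The sketch below isolates that logic without any iText types so it can be tried on its own. The class and method names are hypothetical, and passing the content of the first chunk as a String is a simplification of the elem.getChunks() lookup.

```java
/** Hypothetical, iText-free model of the isDEL flag handling from the solution above. */
final class DeletedSectionTracker {
    /** The invisible marker prepended to the text of deleted sections. */
    static final String MARKER = ".!^DEL^?.";

    private boolean deleted;

    /** Mirrors addText(): a forced page break starts a new section, a marker flags it as deleted. */
    void addText(String firstChunkContent, boolean ensureNewPage) {
        if (ensureNewPage) {
            deleted = false; // a forced page break indicates a new section
        }
        if (firstChunkContent != null && firstChunkContent.contains(MARKER)) {
            deleted = true; // special delete marker found in the text
        }
    }

    /** Mirrors the check in onEndPage(): pages written while the flag is set get the watermark. */
    boolean shouldWatermark() {
        return deleted;
    }
}
```

The flag stays set for follow-up text that does not force a new page, which is what lets the watermark appear on every page of a deleted section and not only on its first page.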

iText monospaced fonts in html view with bold, italic from css style

I am using HTML to generate a PDF document; since it's an EMR record, I have to use monospaced fonts.
The PDF is generated fine, but the CSS styles for bold and italics are ignored, since I am using a single .otf file for the font and hence have no bold or italic variants.
I was wondering how to enable them. Below are the code snippets.
Font Factory:
public static class MyFontFactory implements FontProvider,Serializable {
public Font getFont(String fontname,
String encoding, boolean embedded, float size,
int style, BaseColor color) {
BaseFont bf3 = null;
try {
bf3 = BaseFont.createFont("Inconsolata.otf",BaseFont.CP1252, BaseFont.EMBEDDED);
} catch (Exception e) {
e.printStackTrace();
}
return new Font(bf3, 6);
}
public boolean isRegistered(String fontname) {
return false;
}
}
PDF Generation Code:
public void createPdf(Object object) throws Exception, DocumentException{
// step 1
Document document = new Document();
// step 2
PdfWriter.getInstance(document, new FileOutputStream(new File("test.pdf")));
// step 3
document.open();
// create extra properties
HashMap<String,Object> map = new HashMap<String, Object>();
map.put(HTMLWorker.FONT_PROVIDER, new MyFontFactory());
// step 4
String snippet;
// create the snippet
snippet = createHtmlSnippet(object);
Map<Object,Object> model = new HashMap<Object,Object>();
model.put("object", object);
StyleSheet css = new StyleSheet();
Map<String, String> stylemap = new HashMap<String, String>();
stylemap.put("font-style", "italic");
stylemap.put("font-size", "small");
stylemap.put("font-weight", "bold");
css.loadStyle("header",(HashMap<String, String>) stylemap);
css.loadStyle("strongClass", "text-decoration", "underline");
List<Element> objects = HTMLWorker.parseToList(new StringReader(snippet), css, map);
for (Element element : objects)
document.add(element);
// step 5
document.close();
}
In the above code, the supplied CSS has no effect on the output. As I mentioned, this is because only a single font is defined. If I want bold and italics, how can that be achieved?
I would really appreciate any pointers or help regarding this.
Thanks.
Note: If I remove the monospaced font, the CSS gets applied.

You are confusing a font family with a font.
Inconsolata is a font family consisting of different fonts:
Inconsolata regular as defined in inconsolata.ttf
Inconsolata bold as defined in inconsolata-Bold.ttf
See http://code.google.com/p/googlefontdirectory/source/browse/ofl/inconsolata/
I didn't know of any bold, italic or bold-italic version because I assumed "there is no bold or italic for Inconsolata." And if there is no font program for other styles, you shouldn't expect iText to support those styles (*).
Then I found a repository with a TTF for the bold font: http://code.google.com/p/googlefontdirectory/source/browse/ofl/inconsolata/
Searching Stack Overflow, I found the question about Inconsolata Italic in MacVim; unfortunately these fonts can't be used in iText.
(*) When a font doesn't support bold or italic, iText can mimic these styles by changing the render mode and/or the skew. However, you'll have better results by choosing another monospaced font.
