PDFBox leaves font files open after generating PDF

PDFBox leaves font files open after generating PDF - java

I am experiencing problems with TTF fonts loaded for generating PDFs which eventually result in too many open files on Linux. The PDFBox version I tested last was 2.0.0-RC3.
Basically for each PDF I create a document and load two fonts which I want to use. After the document is generated I close all the resources, but the file descriptors for both fonts remain open.
My question is how do I close these two files?
This is my basic code:
doc = new PDDocument();
page = new PDPage(PDRectangle.A4);
doc.addPage(page);
PDFont font = PDType0Font.load(doc, new File(settings.getProperty("font.location")));
PDFont boldFont = PDType0Font.load(doc, new File(settings.getProperty("bold.font.location")));
PDPageContentStream content = new PDPageContentStream(doc, page);
// add content stuff
content.close();
bos = new ByteArrayOutputStream();
doc.save(bos);
bos.flush();
byte[] bytes = bos.toByteArray();
doc.close();

Related

pdfbox: how to load a font once and use it many times?

I am trying to create a lot of pdf files in a loop.
for(int i=0; i<10000; ++i){
PDDocument doc = PDDocument.load(inputstream);
PDPage page = doc.getPage(0);
PDPageContentStream content = new PDPageContentStream(doc, page, PDPageContentStream.AppendMode.APPEND, true, true);
content.beginText();
//what happens here?
PDFont font = PDType0Font.load(doc, Thread.currentThread().getContextClassLoader().getResourceAsStream("font/simsun.ttf") );
content.setFont(font, 10);
//...
doc.save(outstream);
doc.close();
}
what does it happen by calling PDType0Font.load... ? Because the ttf file is large (10M), will it create ephemeral big objects of font 10000 times? If so, is there a way to make the font as embedded as PDType1Font, so I can just load it once and use it many times in the loop?
I encountered a full GC problem here, and I'm trying to figure it out.

Create the font at the fontbox level:
TrueTypeFont ttf = new TTFParser().parse(...);
You can now reuse ttf in different PDDocument objects like this:
PDFont font = PDType0Font.load(doc, ttf, true);
When done with all documents, don't forget to close ttf.
See also PDFontTest.testPDFBox3826() in the source code.

Barcode in PDF not displaying in FireFox [duplicate]

I am running into strange issue with generated pdf's from iText7. The generated pdf's are opening properly in Adobe reader and Chrome browser. But the same pdf is opening partially in the Firefox browser. I am getting the below message in Firefox. The strange thing is other pdf, which are not generated via iText are properly rendering in firefox.
Java code
public static byte[] createPdf(List<String> htmlPages, PageSize pageSize, boolean rotate) throws IOException {
ConverterProperties properties = new ConverterProperties();
// Register classpath protocol handler to be able to load HTML resources from class patch
org.apache.catalina.webresources.TomcatURLStreamHandlerFactory.register();
properties.setBaseUri("classpath:/");
// properties.setBaseUri(baseUri);
FontProvider fontProvider = new DefaultFontProvider(true,false,false);
properties.setFontProvider(fontProvider);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PdfDocument pdf = new PdfDocument(new PdfWriter(byteArrayOutputStream));
PdfMerger merger = new PdfMerger(pdf);
for (String htmlPage : htmlPages) {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfDocument temp = new PdfDocument(new PdfWriter(baos));
if(rotate) {
temp.setDefaultPageSize(pageSize.rotate()); /** Page Size and Orientation */
} else {
temp.setDefaultPageSize(pageSize); /** Page Size and Orientation */
}
HtmlConverter.convertToPdf(htmlPage, temp, properties);
temp = new PdfDocument(new PdfReader(new ByteArrayInputStream(baos.toByteArray())));
merger.merge(temp, 1, temp.getNumberOfPages());
temp.close();
}
pdf.close();
byteArrayOutputStream.flush(); // Tried this
byteArrayOutputStream.close(); // Tried this
byte[] byteArray = byteArrayOutputStream.toByteArray();
Timestamp timestamp = new Timestamp(System.currentTimeMillis());
try (FileOutputStream fileOuputStream = new FileOutputStream("D:\\Labels\\Label_"+timestamp.getTime()+".pdf")){
fileOuputStream.write(byteArray);
}
return byteArray;
}
Thanks in advance.
Edit 1:
You can find pdf and html/css for reproducing issue here.

When you embedded the images into your html using base64 URIs, something weird happened to the image of the barcode: Instead of the 205×59 bitmap image in labelData/barcode.png you embedded a 39578×44 image! (Yes, an image nearly a thousand times wider than high...)
The iText HtmlConverter embedded that image just fine but apparently Firefox has issues displaying an image with those dimensions even though (or probably because?) it is transformed into the desired dimensions (about four times wider than high) on the label. At least my Firefox installation stops drawing label contents right here. (Beware, the order of drawing in the PDF content is not identical to the of the HTML elements; in particular in the PDF the number 3232000... is drawn right before the barcode, not afterwards!)
On Firefox:
On Acrobat Reader:
Thus, you may want to check the transformation of the bar code image to the base64 image URI in your HTML file.

Subsetting OpenType Collection font in pdfbox

I'm trying to embed a subset of noto-regular in my code. but I keeping on getting:
java.lang.UnsupportedOperationException: OTF fonts do not have a glyf table
at org.apache.fontbox.ttf.OpenTypeFont.getGlyph(OpenTypeFont.java:66)
at org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:481)
at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:136)
at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:306)
at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:162)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1138)
I downloaded the font file NotoSansCJK-Regular.ttc from https://www.google.com/get/noto/help/cjk/
Font subsetting works for .ttf fonts, as I haven't had any issues if the document I saved contains no special characters.
EDIT
It appears that true type collection fonts can have shared glyf table (makes sense since the font collection contains Japanese glyphs). So the individual PDType0Font parsed from .ttc can't be treated as an individual font.
I loaded the font using:
ttc.processAllFonts((TrueTypeFont ttf) -> {
PDFont font = PDType0Font.load(doc, ttf, true);
fontList.add(font);
});
I'm guessing that there are extra work I needed to do to make this work, but I can't find any code samples anywhere.
EDIT2
Seems like the problem is that when subsetting specific OpenType font files, (which font collection contains) turns on an internal flag isPostScript. The flag is then checked and process is aborted when getGlyph() is called.
The following code generates the glyf table error when creating the pdf documents
// downloaded from Noto project site
String OTF_FILE = "./src/test/resources/NotoSansJP-Regular.otf";
PDDocument doc = new PDDocument();
PDFont otf = null;
try (InputStream inputStream = new FileInputStream(new File(OTF_FILE))) {
otf = PDType0Font.load(doc, new OTFParser().parse(inputStream), true);
PDPage page = new PDPage();
PDPageContentStream stream = new PDPageContentStream(doc, page);
stream.setFont(otf, 10f);
stream.beginText();
stream.newLineAtOffset(100f, 600f);
stream.showText("二ろほス反2化みた大第リきやね景手ハニエ者性ルヤリウ円脱");
stream.endText();
stream.close();
doc.addPage(page);
doc.save("test.pdf");
} catch (IOException iox) {
// failed
}
but it will generate the pdf fine as soon as I set the subsetting parameter to true in the PDType0Font.load call
Similarily if I load the otf font through the collection:
String OTF_FILE = "./src/test/resources/NotoSansCJK-Regular.ttc";
PDDocument doc = new PDDocument();
PDFont otf = null;
try (InputStream inputStream = new FileInputStream(new File(OTF_FILE))) {
TrueTypeCollection ttc = new TrueTypeCollection(inputStream);
otf = PDType0Font.load(doc, ttc.getFontByName("NotoSansCJKjp-Regular"), true);
PDPage page = new PDPage();
PDPageContentStream stream = new PDPageContentStream(doc, page);
stream.setFont(otf, 10f);
stream.beginText();
stream.newLineAtOffset(100f, 600f);
stream.showText("二ろほス反2化みた大第リきやね景手ハニエ者性ルヤリウ円脱");
stream.endText();
stream.close();
doc.addPage(page);
doc.save("test.pdf");
} catch (IOException iox) {
// failed
}
I either need to embed the whole font or subsetting will throw the error
EDIT 3
I ended up circumvent this by downloading the OTF font from "Language-specific OpenType/CFF (OTF)", which contains characters from all 4 regions and converted it using otf2ttf from fonttools

PDFBox incorrect text appearance after copy/paste

I’m using PDFBox 2.0.4 to create PDF documents with acroForms. Here is my test code example:
PDDocument document = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
document.addPage(page);
PDAcroForm acroForm = new PDAcroForm(document);
document.getDocumentCatalog().setAcroForm(acroForm);
String dir = "../testPdfBox/src/main/resources/fonts/";
PDType0Font font = PDType0Font.load(document, new File(dir + "Roboto-Regular.ttf"));
PDResources resources = new PDResources();
String fontName = resources.add(font).getName();
acroForm.setDefaultResources(resources);
String defaultAppearanceString = format("/%s 12 Tf 0 g", fontName);
acroForm.setDefaultAppearance(defaultAppearanceString);
PDTextField field = new PDTextField(acroForm);
field.setPartialName("SampleField");
field.setDefaultAppearance(defaultAppearanceString);
acroForm.getFields().add(field);
PDAnnotationWidget widget = field.getWidgets().get(0);
PDRectangle rect = new PDRectangle(50, 750, 200, 50);
widget.setRectangle(rect);
widget.setPage(page);
widget.setPrinted(true);
page.getAnnotations().add(widget);
field.setValue("Sample field 123456");
acroForm.flatten();
document.save("target/SimpleForm.pdf");
document.close();
Everything works fine. But when I try to copy text from the created document and paste it to the NotePad or Word it becomes squares.
􀀷􀁅􀁑􀁔􀁐􀁉􀀄􀁊􀁍􀁉􀁐􀁈􀀄􀀕􀀖􀀗􀀘􀀙􀀚
I search a lot about this problem. The most popular answer is that there is no toUnicode cmap in created PDF. So I explore my document with CanOpener for Acrobat:
Yes, there is no toUnicode cmap, but everything works properly, if not to use acroForm.flatten(). When form fields are not flattened, I can copy/paste text from the document and it looks correct. Nevertheless I need all fields to be flattened.
So, I have two questions:
Why there is a problem with copy/pasting text in flattened form, and everything is ok in non-flattened?
What can I do to avoid problem with text copy/pasting?
Is there only one solution - to create toUnicode CMap by my own, like in this example?
My test pdf files are available here.

Please replace
PDType0Font font = PDType0Font.load(document, new File(dir + "Roboto-Regular.ttf"));
with
PDType0Font font = PDType0Font.load(document, new FileInputStream(dir + "Roboto-Regular.ttf"), false);
This makes sure that the font is embedded in full and not just as a subset.

What is the recommended way to write a BufferedImage to a PDF using PDFBox?

I'm trying to convert a TIFF to a PDF using Apache Imaging and PDFBox. Everything I've tried results in a blank (but non-zero-byte) pdf.
There are some examples in other SO questions like Add BufferedImage to PDFBox document and PDFBox draw black image from BufferedImage which I've tried.
I know the buffered image that I'm reading from the TIFF is valid because I can display it and see it in a JFrame.
I've also tried a PDFJpeg instead of a PDFPixelMap, and contentStream.drawImage() instead of .drawXObject(), the result is always the same.
I also tried creating the PDXObjectImage before the content stream and writing multiple images as recommended in Can't add an image to a pdf using PDFBox, the result is still the same. The output file is larger when I write the image multiple times, so it's doing "something" but I don't know what it is.
How can I write a BufferedImage to a page on a PDF using PDFBox?
try(PDDocument document = new PDDocument();ByteArrayOutputStream outputStream = new ByteArrayOutputStream())
{
PDPage blankPage = new PDPage();
document.addPage( blankPage );
// PDXObjectImage pdImage = new PDJpeg(document, image);
PDXObjectImage pdImage = new PDPixelMap(document, image);
// contentStream.drawImage(pdImage, 5, 300);
PDPageContentStream contentStream = new PDPageContentStream(document, blankPage);
contentStream.drawXObject(pdImage, 5, 5, 100, 100);
document.save("test.pdf");
contentStream.close();
}
catch(COSVisitorException ex) {
throw new IOException(ex);
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

PDFBox leaves font files open after generating PDF - java

Related

pdfbox: how to load a font once and use it many times?

Barcode in PDF not displaying in FireFox [duplicate]

Subsetting OpenType Collection font in pdfbox

PDFBox incorrect text appearance after copy/paste

What is the recommended way to write a BufferedImage to a PDF using PDFBox?

Categories

Resources