Java FPDF libraries. Compression Issues

I am using Java FPDF libraries.
I successfully generated a document that has special characters, such as the Euro and copyright symbols.
The document renders as expected (yes, I managed to resolve the charset encoding issues), but only if I turn compression off with myDoc.setCompression(false);.
As soon as compression is set to true, the Euro symbol is rendered as â‚¬ and the copyright symbol is rendered as Â©.
I had a look at the library and everything looks fine to me. It uses the Java implementation of the ZLIB algorithm, and the command written to the output file is /Filter /FlateDecode, as per the PDF 1.3 reference (page 54 of that document).
I know that the document isn't corrupt, because the PDF viewer opens it and renders all the text and images as it should, apart from those special characters.
After many tests, I couldn't resolve the problem. My suspicion is that there is a problem either in the compression library or in the viewer itself.
If the problem is in the Java code, does anyone know how to fix it?
Thanks for your help.
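
The two garbled sequences in the question are exactly what UTF-8 bytes look like when read as windows-1252, which hints that the text may be converted to bytes with a different charset on the compressed path than on the uncompressed one. That is only a guess, since the library source is not shown here. The sketch below uses the JDK alone: misdecode reproduces the garbling, and deflate/inflate show that ZLIB returns the same bytes it was given. The class and method names are hypothetical and are not part of the FPDF library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class CompressionCharsetSketch {

    // Encode as UTF-8, then decode the same bytes as windows-1252.
    // This is the mismatch that turns the Euro sign into three characters.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    // Compress a byte array with the JDK's ZLIB implementation.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a complete ZLIB stream back into the original bytes.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input: stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not a valid ZLIB stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(misdecode("\u20ac")); // Euro sign, garbled
        System.out.println(misdecode("\u00a9")); // copyright sign, garbled

        // Encode the page text once, with an explicit single-byte charset,
        // and hand the very same byte array to the compressor.
        byte[] page = "Price: \u20ac 10 \u00a9".getBytes(Charset.forName("windows-1252"));
        byte[] restored = inflate(deflate(page));
        System.out.println(java.util.Arrays.equals(page, restored));
    }
}
```

If the round trip preserves the bytes, the place to look is wherever the page buffer is turned into a byte array before compression, for example a getBytes() call without an explicit charset.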

Related

Adobe PDFBox produces unexpected changes for load followed by save

I'm working on a tool to automatically generate PDF files based on a template file and other source data. I'm using Mac OS 10.14.4, Java 1.8 and PDFBox version 2.0.15.
As a basic test, I trimmed the open-and-save code down to two lines, which have an obvious problem for one particular PDF and more subtle issues for all other PDFs I've tried:
PDDocument targetPDF = PDDocument.load(new File(templatePath));
targetPDF.save(targetFileName);
The observed problem for one particular PDF is that unexpected characters are inserted at the top of the first page. (They appear to be in an alphabet which is not otherwise used, and are clipped.) Other PDFs are visually similar, but very different when I run them through diff. Is there something tricky I should do to save the files? Is this a problem with that one file? Is PDFBox doing something odd?
I've looked for similar reports and found a few that are concerned with the size of the output files: "In PDFBox, why does file size becomes extremely large after saving?" and "Split and merge pdf files using PDFBOX produces large file". I do see a noticeable increase in file size, but not as much as those questions report. In one case, the input and output files are visually different. In others, diff -y --text template.pdf target.pdf reports large differences, but I can't detect any differences by eye alone in Mac's built-in "Preview" document viewer. (The breaking "template.pdf" was created in Adobe Acrobat. I don't know about the non-breaking files.)
After comparison to https://issues.apache.org/jira/browse/PDFBOX-2690 and http://useof.org/java-open-source/org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject I tried adding targetPDF.close() after the targetPDF.save(), but that made no difference.
https://pdfbox.apache.org/2.0/faq.html seems to suggest closing a content stream before saving the file, but I don't know how to do that. (File itself doesn't have a close() method. None of the methods on PDDocument seem related to streams or closing them, except PDDocument.close() itself, which prevents saving.)
I would paste in some log files here, but I'm not getting any log messages from PDFBox classes....

Issue with russian-letter links in Aspose pdf-viewer

I am facing an encoding problem while using the embedded Aspose pdf-previewer and doc-to-pdf converter in my Java project.
When I convert a .doc file with clickable links that contain Russian symbols to a pdf file using the com.aspose.words.Document.saveToPdf(...) method, I get a good pdf file. But when I open this file in the standard Aspose pdf-previewer and follow these links with Russian symbols, I see a "wrong url" error.
The links themselves look okay (the Russian letters look correct), but in the mouseover tooltip I see wrongly encoded symbols instead of the Russian ones.
How can I deal with this problem?
Should I convert the doc file with some specific options, or should I configure the pdf-previewer in another way?

The Document.saveToPdf() method is no longer available in the latest library; you can use the Document.save("filename.ext") method to save to pdf or any other supported format.
Try the latest version, as chances are that this bug has already been fixed. When I converted a Word document with Russian letters in a link to Pdf, the encoding seemed to work fine.
I work as a Developer Evangelist for Aspose.

Displaying embedded fonts with PDFBox and Swing

I am using PDFBox to display PDF files inside a JInternalFrame. When opening PDF I get lots of warnings like this:
Changing font on <m> from <Tahoma Negrita> to the default font
I am aware that the fonts being reported are not part of the standard set of 14 fonts. So I decided to check whether those fonts are embedded in the PDF file (thinking that there shouldn't be a problem loading embedded fonts, right?).
So I opened the file in different readers and checked properties/fonts. I am not sure whether this section reports fonts required by the document or fonts actually embedded in the document.
The information that I get is as follows:
BAAAA+Tahoma-Bold (embedded Subset), type:TrueType, Encoding:
CAAAA+Tahoma (Embedded Subset), type:TrueType, Encoding:
Confused about this, I researched on how to embed fonts from OpenOffice and found that the PDF/A-1a option should be checked. So I made another PDF using this option (in case this was not used when making the original PDF file), yet I got the same results.
I would like your guidance understanding how this works. I would like to be able to open PDF files just as PDF readers do. I also read about the PDFBox_External_Fonts.properties but I am guessing this file shouldn't be modified since I am dealing with embedded fonts.
Thanks.

pdfbox is not able to parse embedded subsets of TrueType fonts.
As far as I understand it, embedded TrueType subsets are missing some metadata for the font file that pdfbox needs.
The bug is known but not easy to solve. Right now I can only advise to use embedded Type 1 Fonts if possible, pdfbox can deal with them.
You can also try to set the path to your complete font files in your pdfbox.jar under org/apache/pdfbox/resources/PDFBox_External_Fonts.properties, so if pdfbox cannot parse the subset, at least it can find a full path to the original font file. Maybe that works, but I have not tested this.
Good Luck!

International characters with Java

I am building an app that takes information from Java and builds an Excel spreadsheet. Some of the information contains international characters. I am having an issue where Russian characters, for example, are rendered correctly in Java, but when I send those characters to Excel, they are not rendered properly. I initially thought the issue was an encoding problem, but I am now thinking that the problem is simply not having the Russian language pack loaded on my Windows 7 machine.
I need to know if there is a way for a Java application to "force" Excel to show international characters.
Thanks

Check the file encoding you're using if characters don't show up. Before Java 18, Java defaults to the platform's native encoding (typically windows-1252 on Western Windows installations) instead of UTF-8. You can explicitly set the writers to use UTF-8 or a Cyrillic encoding.
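
A minimal sketch of what "explicitly set the writers" can look like with the JDK alone. It assumes the data is written as text (for example a CSV file that Excel later opens); the question does not say how the spreadsheet is produced, so the class and method names here are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetSketch {

    // Write text through a Writer whose charset is chosen explicitly,
    // so the result does not depend on the platform default encoding.
    static byte[] writeUtf8(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decode with the same charset that was used for writing.
    static String readUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442"; // a Russian greeting
        byte[] encoded = writeUtf8(russian);
        System.out.println(readUtf8(encoded).equals(russian));
    }
}
```

The same pattern applies to a FileOutputStream in place of the ByteArrayOutputStream. Whatever reads the file must also be told, or be able to detect, that it is UTF-8.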

Exporting a JasperReport to PDF, Characters Missing

I have a Java application that is generating JasperReports. It will create as many as three JasperPrints from a single report: one prints on the printer, one is serialized and saved to the database, and the third is exported to PDF using Jasper's built-in export capability.
The problem is that when exporting to PDF, characters containing 8 or more bits (i.e. not 7-bit ASCII) are showing up as empty squares, meaning Acrobat Reader is not able to display that character. The print version is correct, and loading the database version and printing it shows up correctly. If I change the PDF exported version to a different format, e.g. XML, the character shows up fine in a web browser.
Based on the evidence, I believe the issue is something specific to font handling in PDFs, but I am not sure what.
The font used is Lucida Sans Typewriter, a Unicode monospaced font. The Windows "font" directory is listed in the Java classpath: without this step, PDF exporting fails miserably with zero text at all, so I know it is finding the font.
The specific characters not displayed are accented characters used in Spanish text: á, é, í, ó, and ú. I haven't checked ñ but I am guessing that won't work either.
Any ideas what the problem is, areas of the system to check, or maybe parameters I need to send to the export process?

The PDF encoding used for exporting was UTF-8, and apparently the font didn't support that properly. When I changed it to ISO-8859-1, every character showed up correctly in the PDF output.
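
As a quick sanity check before switching encodings, the JDK can tell you whether a given string is representable in a charset at all. The sketch below is an illustration only: it says nothing about whether the font contains the glyphs or how JasperReports applies its PDF encoding setting, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodableCheckSketch {

    // True if every character of the text can be encoded in the charset.
    static boolean fitsCharset(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String spanish = "\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1"; // accented vowels and n with tilde
        System.out.println(fitsCharset(spanish, StandardCharsets.ISO_8859_1));
        System.out.println(fitsCharset("\u20ac", StandardCharsets.ISO_8859_1)); // Euro sign
    }
}
```

The Spanish accented letters all fit in ISO-8859-1, which is consistent with the fix above. Text outside that range, such as the Euro sign, would need a different encoding.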

In iReport, try setting the Pdf Embedded property of your TextFields to true.

I'm using Jasper Report 6. My team spent a few days trying to display Khmer Unicode. I finally found a solution, and everything works as expected.
Follow this: https://community.jaspersoft.com/wiki/custom-font-font-extension
After you export the font extension, upload the jar file to the lib folder and restart your Jasper server.
