How to render Asian characters in a PDF using xhtmlrenderer - java

I was wondering what steps are needed to render Asian characters using the Java-based xhtmlrenderer (Flying Saucer) library.
I want to render the following:
<html>
<body>同名の映画のモデ</body>
</html>
Without any font settings being added to the HTML, this renders fine in normal browsers, but I can't find any way to render it to PDF using the ITextRenderer portion of xhtmlrenderer.
After following various threads on the mailing list, I see lots of posts talking about adding .TTF files from the c:\windows\fonts directory. I have modified the examples to run on Linux ( https://gist.github.com/643173745182c9becc57 ), which shows me various fonts being displayed, but I don't see any Asian glyphs.
Does anyone have any decent pointers, or clean solutions to this problem? Or am I looking at the wrong problem with a really simple solution elsewhere?

You can also add the font information in CSS.
@font-face {
font-family: 'your_font_face_name';
src: url('your_font.ttf');
-fs-pdf-font-embed: embed;
-fs-pdf-font-encoding: Identity-H;
}
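A minimal sketch of where such a rule sits in the document, assuming the XHTML is assembled as a string in Java before being handed to the renderer. The class name, method name, the family name 'CJKFont' and the font path are all hypothetical, and the body text is assumed to be already XML-safe. Note that declaring the font is not enough on its own: an element has to select it through font-family, which the body rule does here.

```java
final class FontFaceXhtml {
    /** Builds an XHTML document whose body text uses the given embedded font. */
    public static String buildXhtml(String fontFamily, String fontUrl, String bodyText) {
        String css =
            "@font-face {\n" +
            "  font-family: '" + fontFamily + "';\n" +
            "  src: url('" + fontUrl + "');\n" +
            "  -fs-pdf-font-embed: embed;\n" +
            "  -fs-pdf-font-encoding: Identity-H;\n" +
            "}\n" +
            // The declared family must actually be applied to some element.
            "body { font-family: '" + fontFamily + "'; }\n";
        return "<html>\n<head>\n<style type=\"text/css\">\n" + css + "</style>\n</head>\n"
            + "<body>" + bodyText + "</body>\n</html>\n";
    }

    public static void main(String[] args) {
        // Hypothetical family name and font path; substitute a font that contains the glyphs.
        System.out.println(buildXhtml("CJKFont", "fonts/your_font.ttf", "同名の映画のモデ"));
    }
}
```

The resulting string is what would then be passed to the renderer; that step is left out so the sketch stays free of library dependencies.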

To support such a large character set you need to specify a font file that contains all of those characters. Once you've picked a font file, your application needs to point to that file explicitly. I've found that just putting the font files in your fonts directory doesn't work.
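Try embedding the font too, passing the Identity-H encoding so that glyphs outside the default Latin code page are mapped, e.g.
renderer.getFontResolver().addFont("your_font_file.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);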
This link has quite a few font files.

Related

PDFBox - show icon for embedded files in pdf

I developed a Java PDF viewer using Apache PDFBox. The problem is that when a page with file attachments is rendered, PDFBox shows no icon for them, whereas Adobe PDF reader shows a paper clip icon when the same file is opened.
Is it possible to have such icons appear automatically in the PDFBox rendering? I think I saw code for this some time ago, something like a single line that switches the behavior on and off, but I can't find it. Thanks.
This was fixed in PDFBOX-5394 and will be in version 2.0.26. However, only a single symbol will be shown at this time: a paperclip in a fixed size.

How to use custom (installed) font for displaying non-english text in Java

I am unable to use a custom font for displaying Urdu text in my AWT application. I am new to Java and I have searched the API and the internet thoroughly, but to no avail. Interestingly, Urdu text can be displayed with fonts like Times New Roman, Arial etc., but not with the font I want to use, despite the fact that it is a TTF and is able to display all the glyphs.
What packages/classes should I use to remedy this problem?
Any help would be greatly appreciated.

Displaying embedded fonts with PDFBox and Swing

I am using PDFBox to display PDF files inside a JInternalFrame. When opening PDF I get lots of warnings like this:
Changing font on <m> from <Tahoma Negrita> to the default font
I am aware that the fonts being reported are not part of the standard set of 14 fonts. So I decided to check whether those fonts are embedded in the PDF file (thinking that there shouldn't be a problem loading embedded fonts, right?).
So I opened the file in different readers and checked properties/fonts. I am not sure whether this section reports fonts required by the document or fonts actually embedded in the document.
The information that I get is as follows:
BAAAA+Tahoma-Bold (embedded Subset), type:TrueType, Encoding:
CAAAA+Tahoma (Embedded Subset), type:TrueType, Encoding:
Confused about this, I researched how to embed fonts from OpenOffice and found that the PDF/A-1a option should be checked. So I made another PDF using this option (in case it was not used when making the original PDF file), yet I got the same results.
I would like your guidance in understanding how this works. I would like to be able to open PDF files just as PDF readers do. I also read about PDFBox_External_Fonts.properties, but I am guessing this file shouldn't be modified since I am dealing with embedded fonts.
Thanks.
PDFBox is not able to parse embedded subsets of TrueType fonts.
As far as I understand it, embedded TrueType subsets are missing some font file metadata that PDFBox needs.
The bug is known but not easy to solve. Right now I can only advise using embedded Type 1 fonts if possible; PDFBox can deal with them.
You can also try setting the path to your complete font files in your pdfbox.jar under org/apache/pdfbox/resources/PDFBox_External_Fonts.properties, so that if PDFBox cannot parse the subset, it can at least find a full path to the original font file. Maybe that works, but I have not tested this.
Good Luck!

Custom PDF creation - Large images

Looking for a Java based PDF creation library. We're currently using Apache Velocity with HTML to render PDFs on the fly.
We'd like to be able to find a way to render large images (sometimes as big as 3000 x 1700) in a creative manner within the PDF container. For instance, a scrollable image pane within a PDF. This might not be possible within a PDF, I might be wrong.
Open source would be ideal.
For a good PDF library you should take a look at iText: http://itextpdf.com/
I have used images of around 5000x4000 with iText without any problems.
I don't know if it is possible to create a working scrollpane inside a PDF, unless of course you were doing it through a custom PDF creator/viewer.
iText is open source, but make sure to check out the AGPL license before you use it commercially: http://itextpdf.com/terms-of-use/agpl.php
For just creating PDF files from images, iText is a little oversized. Give xsPDF a chance; it has no limits on image sizes and seems to be appropriate for your problem.
Just a FYI for anyone that may run into this in the future:
I used a library called PDFBox (http://pdfbox.apache.org/) to open a pre-existing PDF and modify it with a custom-sized PDRectangle matching the dimensions of the image. I then inserted the image and rectangle into a new page and got the desired results.
I didn't realize you could have multiple page sizes in a single PDF.
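Sizing a page to an image comes down to converting pixel dimensions into PDF points. A minimal sketch of that arithmetic, assuming the default PDF user space of 72 points per inch and an assumed image resolution of 300 DPI; the class and method names are hypothetical, and the PDFBox calls themselves are left out. With these assumptions the 3000 x 1700 image from the question maps to a 720 by 408 point page.

```java
final class ImagePageSize {
    /** Default PDF user space: 72 units (points) per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /** Converts a pixel length to PDF points at the given image resolution. */
    public static double pixelsToPoints(int pixels, double dpi) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        return pixels * POINTS_PER_INCH / dpi;
    }

    public static void main(String[] args) {
        int widthPx = 3000;   // image size mentioned in the question
        int heightPx = 1700;
        double dpi = 300.0;   // assumed resolution, not given in the thread
        System.out.println("page width in points:  " + pixelsToPoints(widthPx, dpi));
        System.out.println("page height in points: " + pixelsToPoints(heightPx, dpi));
    }
}
```

Values like these are what would go into the width and height of the page rectangle. At 72 DPI the conversion is one to one, so the page would simply be 3000 by 1700 points.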

Accessing font files within PDF

We are currently working with a selection of publishers to generate online books from their PDFs. Our legacy app uses Flex, so for this we are converting the PDFs to SWF files using PDF2SWF by SWFTools.
The problem that we are having is that the text within the SWF document is not being highlighted by our flex reader when the user performs a search. After a quick investigation we found that when extracting text we need to embed the fonts that are used by the PDF document:
http://wiki.swftools.org/wiki/How_do_I_highlight_text_in_the_SWF%3F
pdf2swf -F $YOUR_FONTS_DIR$ -f input.pdf -o output.swf
As you can see from the code above, we need a path to a font directory containing the fonts found within that PDF.
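Since the app is written in Java, the same invocation could be assembled there for batch conversion. This is only a sketch that mirrors the command line quoted above argument for argument; the class name, method name and the example paths are hypothetical, and actually launching the process is left commented out because it needs pdf2swf on the PATH.

```java
import java.util.List;

final class Pdf2SwfCommand {
    /** Builds the argument list for the pdf2swf call shown above. */
    public static List<String> build(String fontsDir, String inputPdf, String outputSwf) {
        return List.of("pdf2swf", "-F", fontsDir, "-f", inputPdf, "-o", outputSwf);
    }

    public static void main(String[] args) {
        // Hypothetical paths, for illustration only.
        List<String> cmd = build("/path/to/fonts", "input.pdf", "output.swf");
        System.out.println(String.join(" ", cmd));
        // To run it for real (requires pdf2swf to be installed):
        // int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list rather than one concatenated string avoids quoting problems when a font directory or file name contains spaces.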
Since we will be converting a large number of PDFs, is it possible to access the font files directly through the PDF rather than having a lot of fonts stored within our app?
Additional Information
Our app is written in Java.
We are currently using PDFBox and Ghostscript within the app, so if any solutions use these libraries then that would be a preferred option, but we are open to all ideas.
PDF files don't contain font 'files'; they may not even contain any fonts at all, though this is rare. The embedded font data can be in a bewildering variety of formats:
type 1 PostScript fonts
type 3 PostScript fonts
TrueType fonts
PostScript CFF fonts
CIDFonts with type 1 PostScript outlines
CIDFonts with type 3 PostScript outlines
CIDFonts with TrueType outlines
CIDFonts with CFF outlines
CIDFonts with bitmap images
Will your application be able to read all these font formats? If you want to use fonts at all, you must use the ones embedded in the PDF file: these will very often be subset fonts supplied with a custom Encoding, which means that even if you have the original font, you can't use it because the Encoding will not be correct.
Of course it may be that these PDF files are all created in a consistent way and do not use embedded fonts, but I have my doubts....
