We are currently working with a selection of publishers to generate online books from their PDFs. Our legacy app uses Flex, so for this we are converting the PDFs to SWF files using PDF2SWF from SWFTools.
The problem we are having is that text within the SWF document is not highlighted by our Flex reader when the user performs a search. After a quick investigation, we found that when extracting text we need to embed the fonts used by the PDF document:
http://wiki.swftools.org/wiki/How_do_I_highlight_text_in_the_SWF%3F
pdf2swf -F $YOUR_FONTS_DIR$ -f input.pdf -o output.swf
As you can see from the command above, we need a path to a font directory containing the fonts used in that PDF.
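For reference, the pdf2swf call above can be driven from the Java app with the JDK's ProcessBuilder. This is a minimal sketch that assumes pdf2swf is on the PATH, and the class and method names are hypothetical. It only wraps the command and still needs a font directory, so it does not answer the question of pulling fonts out of the PDF itself.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical wrapper around the pdf2swf call above; assumes pdf2swf is on the PATH. */
final class Pdf2SwfRunner {

    private Pdf2SwfRunner() {}

    /** One list element per argument, so paths containing spaces need no shell quoting. */
    static List<String> buildCommand(String fontDir, String inputPdf, String outputSwf) {
        return List.of("pdf2swf",
                "-F", fontDir,      // font directory, as in the command above
                "-f", inputPdf,     // -f and the input file, as in the command above
                "-o", outputSwf);   // output SWF file
    }

    /** Runs the conversion and returns pdf2swf's exit code (0 normally means success). */
    static int convert(String fontDir, String inputPdf, String outputSwf)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(fontDir, inputPdf, outputSwf))
                .inheritIO()   // forward pdf2swf's console output to ours
                .start();
        return p.waitFor();
    }
}
```

Passing a list rather than a single command string keeps each path intact, which is useful when publisher file names contain spaces.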
Since we will be converting a large number of PDFs, is it possible to access the font files directly through the PDF rather than storing a lot of fonts within our app?
Additional Information
Our app is written in Java.
We are currently using PDFBox and Ghostscript within the app, so solutions that use these libraries would be preferred, but we are open to all ideas.
PDF files don't contain font 'files'; they may not even contain any fonts at all, though this is rare. The embedded font data can be in a bewildering variety of formats:
type 1 PostScript fonts
type 3 PostScript fonts
TrueType fonts
PostScript CFF fonts
CIDFonts with type 1 PostScript outlines
CIDFonts with type 3 PostScript outlines
CIDFonts with TrueType outlines
CIDFonts with CFF outlines
CIDFonts with bitmap images
Will your application be able to read all these font formats? If you want to use them, you must use the fonts embedded in the PDF file, because these will very often be subset fonts supplied with a custom Encoding. That means that even if you have the original font, you can't use it, because the Encoding will not be correct.
Of course it may be that these PDF files are all created in a consistent way and do not use embedded fonts, but I have my doubts...
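In the file itself, the font descriptor key that holds the embedded font program tells you which of these formats you are dealing with. Per the PDF specification, /FontFile holds a Type 1 program, /FontFile2 a TrueType program, and /FontFile3 a program whose format is given by its /Subtype (Type1C, CIDFontType0C or OpenType). Type 3 fonts have no font program at all; their glyphs are drawn by content streams in the font's /CharProcs. The sketch below only maps those keys to descriptions. The class name is hypothetical, and reading the keys out of a real PDF would need a parser such as PDFBox.

```java
/** Hypothetical classifier: which kind of font program a font descriptor embeds. */
final class EmbeddedFontKind {

    private EmbeddedFontKind() {}

    /**
     * @param streamKey font descriptor key holding the program: "FontFile", "FontFile2" or "FontFile3"
     * @param subtype   /Subtype of the FontFile3 stream (ignored for the other keys); may be null
     */
    static String describe(String streamKey, String subtype) {
        switch (streamKey) {
            case "FontFile":
                return "Type 1 font program";
            case "FontFile2":
                return "TrueType font program";
            case "FontFile3":
                // FontFile3 is a container; its /Subtype names the actual format
                if ("Type1C".equals(subtype)) return "Type 1 font in CFF format";
                if ("CIDFontType0C".equals(subtype)) return "CIDFont in CFF format";
                if ("OpenType".equals(subtype)) return "OpenType font program";
                return "FontFile3 with unrecognized Subtype " + subtype;
            default:
                return "not an embedded font program key";
        }
    }
}
```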
Related
I am using the Aspose Java jar for a web application and Aspose for Android via Java for a mobile application. By default, Aspose increases the font size when I create a PDF from mobile. Can someone tell me what exactly Aspose is doing on mobile?
Thanks for your inquiry. I suspect that the difference in font size is actually a difference in fonts. Android has a limited set of fonts, and when Aspose.Words cannot find some fonts while rendering a document to PDF, it tries to substitute them. You can configure font substitution rules using the FontSubstitutionSettings class.
If you need exact rendering, the same set of fonts must be used on all platforms. You can put the required fonts into a folder and use it as a font source in FontSettings.
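You can picture the substitution behaviour described above with a small JDK-only model. It uses the requested font if that font is installed; otherwise it takes the first installed substitute, and failing that, a default. This illustrates the idea only and is not Aspose's implementation. In Aspose.Words the actual configuration goes through FontSettings and FontSubstitutionSettings, and the class and font names here are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of font substitution for illustration; this is NOT the Aspose API. */
final class FontResolver {

    private FontResolver() {}

    static String resolve(String requested, Set<String> installed,
                          Map<String, List<String>> substitutes, String defaultFont) {
        if (installed.contains(requested)) {
            return requested;                       // exact match: no substitution needed
        }
        for (String candidate : substitutes.getOrDefault(requested, List.of())) {
            if (installed.contains(candidate)) {
                return candidate;                   // first installed substitute wins
            }
        }
        return defaultFont;                         // last resort: metrics may differ noticeably
    }
}
```

A substitute or default font with different metrics is exactly what can make the same document look larger on one platform than on another.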
Right now I'm working on displaying LaTeX-generated documents with Java.
Strictly speaking, LaTeX source can be used to directly generate two formats:
DVI using latex, the first one to be supported;
PDF using pdflatex, more recent.
However, as far as I know, rendering DVI or PDF is not available in Java.
Is there any way to handle those formats? Or perhaps other formats that make sense?
There are not enough details with regard to how you wish to "render" DVI or PDF from a LaTeX document. However, you could always generate the PDF using pdflatex and the DVI using latex, then use ICEpdf for viewing PDFs and javaDVI for viewing DVIs.
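From Java, you can generate either format by running the engine as an external process. This is a minimal sketch that assumes latex and pdflatex are on the PATH; the class name is hypothetical. -interaction=nonstopmode and -output-directory are standard options of the TeX Live and MiKTeX engines.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

/** Hypothetical helper that runs latex (DVI output) or pdflatex (PDF output). */
final class LatexBuild {

    private LatexBuild() {}

    static List<String> command(String engine, String outDir, String texFile) {
        return List.of(engine,
                "-interaction=nonstopmode",       // fail instead of waiting for keyboard input
                "-output-directory=" + outDir,    // where the .dvi or .pdf is written
                texFile);
    }

    /** Runs the engine and returns the file it should have produced. */
    static File build(String engine, File outDir, File texFile)
            throws IOException, InterruptedException {
        File tex = texFile.getAbsoluteFile();
        Process p = new ProcessBuilder(command(engine, outDir.getAbsolutePath(), tex.getPath()))
                .directory(tex.getParentFile())   // run next to the source so relative \input paths resolve
                .inheritIO()                      // show the engine's log on our console
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException(engine + " exited with code " + exit);
        }
        String base = tex.getName().replaceFirst("\\.tex$", "");
        String ext = "pdflatex".equals(engine) ? ".pdf" : ".dvi";
        return new File(outDir, base + ext);
    }
}
```

You would then hand the returned file to ICEpdf or javaDVI for display.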
Another neat hack to display a PDF in a panel is to pass the file path to an embedded web component in the application; the web component will use whatever PDF rendering tool is available on your machine (Acrobat, Foxit, Preview, etc.).
I remember there was a post about this a long time ago.
I don't think there's a generic way to preview the rendered output without generating the file itself. You could write your own LaTeX engine that caches the output every few seconds and displays it, but regardless of storage, you have to write the output somewhere physically and then render it separately using any of the approaches mentioned above.
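One way to keep such an "every few seconds" refresh cheap is to rerun LaTeX only when the source has actually changed. The following is a minimal sketch. It assumes a timer, for example a javax.swing.Timer, calls it periodically; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical change detector for a periodic LaTeX preview. */
final class PreviewCache {

    private byte[] lastDigest;   // SHA-256 of the last source that was built; null before the first build

    /** Returns true if the source differs from the last one seen, and remembers it. */
    boolean needsRebuild(String texSource) {
        byte[] digest = sha256(texSource);
        if (Arrays.equals(digest, lastDigest)) {
            return false;        // unchanged: keep showing the cached output
        }
        lastDigest = digest;
        return true;
    }

    private static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to provide SHA-256
            throw new IllegalStateException(e);
        }
    }
}
```

When needsRebuild returns true, run the build and reload the viewer. Otherwise the previously generated file is still current.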
Another approach is to convert the DVI output to an SVG image file and render that with an SVG library such as Apache Batik. That will produce nice scalable results. DVI files can be converted to SVG on the command line (or in a script) using:
dvisvgm --no-fonts input.dvi -o output.svg
For more conversion options, see this thread on how to convert PDF to clean SVG.
Currently I'm developing an application that allows users to create a template and generate a DOCX file from it. The application needs to show users the changes to the template as they create it.
The approach I tried uses the docx4j library (which allows manipulation of DOCX files) and ICEpdf to display the document in a Swing component, after first converting the DOCX into a PDF file. The problem with this approach is that it loads quite slowly, and some of the changes made in the DOCX file are not reflected in the PDF conversion (for example, dashed underlines and font changes). When I open the DOCX output in MS Word, it is displayed correctly, so I know the changes are there; it seems that ICEpdf just can't show them properly.
So I was wondering if anyone knows a Java library that allows DOCX files to be viewed directly in a Swing component instead of converting them into a PDF file first.
You can try docx4all or DocxEditorKit. Both of these are built around docx4j.
I am using PDFBox to display PDF files inside a JInternalFrame. When opening PDF I get lots of warnings like this:
Changing font on <m> from <Tahoma Negrita> to the default font
I am aware that the reported fonts are not part of the standard 14 fonts, so I decided to check whether those fonts are embedded in the PDF file (thinking that there shouldn't be a problem loading embedded fonts, right?).
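Every PDF viewer must supply the standard 14 base fonts itself. Any other /BaseFont, such as Tahoma here, has to be embedded in the file or found on the system. Checking a name against the standard 14 is a simple set lookup; this is a minimal sketch with a hypothetical class name.

```java
import java.util.Set;

/** Hypothetical helper: is a /BaseFont name one of the PDF standard 14 fonts? */
final class Standard14 {

    private Standard14() {}

    // The 14 standard Type 1 font names defined by the PDF specification.
    static final Set<String> NAMES = Set.of(
            "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
            "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
            "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
            "Symbol", "ZapfDingbats");

    static boolean isStandard14(String baseFont) {
        return NAMES.contains(baseFont);
    }
}
```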
So I opened the file in different readers and checked Properties > Fonts. I am unsure whether this section reports fonts required by the document or fonts actually embedded in it.
The information that I get is as follows:
BAAAA+Tahoma-Bold (embedded Subset), type:TrueType, Encoding:
CAAAA+Tahoma (Embedded Subset), type:TrueType, Encoding:
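The BAAAA+ and CAAAA+ prefixes in this listing are subset tags. Per the PDF specification, a tag followed by + marks an embedded subset containing only the glyphs the document uses, so these entries describe fonts that really are embedded in the file. The small sketch below detects the tag and recovers the base name, which you could use, for example, to look up a full font file as a fallback. The class is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for subset-tagged font names such as "CAAAA+Tahoma". */
final class SubsetTag {

    private SubsetTag() {}

    // The PDF specification prescribes exactly six uppercase letters followed by '+';
    // this sketch is deliberately lenient and accepts one to six letters.
    private static final Pattern TAG = Pattern.compile("^([A-Z]{1,6})\\+(.+)$");

    static boolean isSubset(String baseFont) {
        return TAG.matcher(baseFont).matches();
    }

    /** Strips the subset tag, or returns the name unchanged if there is none. */
    static String baseName(String baseFont) {
        Matcher m = TAG.matcher(baseFont);
        return m.matches() ? m.group(2) : baseFont;
    }
}
```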
Confused by this, I researched how to embed fonts from OpenOffice and found that the PDF/A-1a option should be checked. So I made another PDF with this option (in case it was not used when making the original PDF file), yet I got the same results.
I would like some guidance on how this works. I would like to be able to open PDF files just as PDF readers do. I also read about PDFBox_External_Fonts.properties, but I am guessing this file shouldn't be modified since I am dealing with embedded fonts.
Thanks.
PDFBox is not able to parse embedded subsets of TrueType fonts.
As far as I understand it, embedded TrueType subsets are missing some font metadata that PDFBox needs.
The bug is known but not easy to solve. Right now I can only advise using embedded Type 1 fonts if possible; PDFBox can deal with them.
You can also try setting the paths to your complete font files in your pdfbox.jar under org/apache/pdfbox/resources/PDFBox_External_Fonts.properties, so that if PDFBox cannot parse the subset, it can at least find the full path to the original font file. Maybe that works, but I have not tested it.
Good luck!
I would like to have a preview of a .pdf, .docx or .doc file inside a JDialog, but I'm unable to find previewers that allow nesting such previews inside a Swing application. Alternatively, are there any previewers that can transform such files into .html and then display them in a TextPane?
Fidelity isn't as much of an issue as embedding and ease of use. Also, I don't need one tool to be able to preview all types of files.
That's a tough one because of the formats you're dealing with. You might want to try ImageMagick for converting PDF to an image format for display in your TextPane. If that works well enough for PDFs, you could use JODConverter or Docmosis to get from DOC to PDF, then ImageMagick again for a display image. JODConverter and Docmosis are based on OpenOffice, which can also produce fairly rough HTML/XHTML output as another display option. The latest version of OpenOffice can also read DOCX, so all your bases are covered, and if fidelity is not too big a deal, as you've indicated, then JODConverter/Docmosis and ImageMagick might be a combination you can use.
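If a rough text-only preview is enough for .docx, you can read the file with nothing but the JDK. A .docx is a ZIP package whose main body is word/document.xml, with paragraphs in w:p elements and run text in w:t elements. The sketch below is minimal and uses a hypothetical class name. It ignores formatting, images and table layout, and it does not handle legacy .doc or PDF files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: plain-text preview of a .docx using only the JDK. */
final class DocxText {

    // WordprocessingML namespace used by the w: elements in document.xml
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private DocxText() {}

    /** Returns the document text, one line per paragraph. */
    static String extract(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if ("word/document.xml".equals(e.getName())) {
                    return paragraphs(zip.readAllBytes());   // reads only this entry
                }
            }
        }
        throw new IOException("word/document.xml not found; is this really a .docx?");
    }

    private static String paragraphs(byte[] xml) throws IOException {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);   // required for the *NS lookups below
            Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            NodeList paras = doc.getElementsByTagNameNS(W_NS, "p");
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < paras.getLength(); i++) {
                // concatenate the text of every run in this paragraph
                NodeList texts = ((Element) paras.item(i)).getElementsByTagNameNS(W_NS, "t");
                for (int j = 0; j < texts.getLength(); j++) {
                    out.append(texts.item(j).getTextContent());
                }
                out.append('\n');
            }
            return out.toString();
        } catch (ParserConfigurationException | SAXException ex) {
            throw new IOException("could not parse word/document.xml", ex);
        }
    }
}
```

You can put the returned String straight into a text component inside the JDialog, for example with textPane.setText(text).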