I have an existing PDF file that I would like to convert to an Excel file using a Python script. I am currently using PDFBox, but I get multiple errors similar to the following:
org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
No Unicode mapping for CID+24 (24) in font DroidSansFallback
Can I substitute the DroidSansFallback font, or replace it with another font, using PDFBox or another Java/Python script?
Please help.
Print the PDF to the Microsoft Print to PDF printer and use the resulting file. That should take care of the fonts.
Related
What ways are there to convert an RTF document that contains a table to PDF, on Windows or Unix, using Java?
The options we have tried are:
iText - The table inside the RTF document is not rendered properly once converted to PDF. In short, the PDF doesn't contain the table. Here is the code gist: iText RTF-to-PDF Java code gist
POI - Does Apache POI support RTF document parsing? I found that it is not supported: POI support for RTF
Tika - Using Tika I am able to read the document, but the table in the RTF is not parsed correctly and I don't know how to convert it to PDF: Tika Java code for reading RTF
We have looked into other options. Is it possible to convert RTF to PDF with Java?
Other options we looked into are in this link
Yes, it's possible. Take a look at JasperReports!
http://community.jaspersoft.com/project/jasperreports-library
There is also a good API available from Jaspersoft to code your custom PDF-engine and your custom datasource. Start with iReport (UI-Editor).
I am using PDFBox to display PDF files inside a JInternalFrame. When opening a PDF I get lots of warnings like this:
Changing font on <m> from <Tahoma Negrita> to the default font
I am aware that the fonts being reported are not part of the standard set of 14 fonts. So I decided to check whether those fonts are embedded in the PDF file (thinking that there shouldn't be a problem loading embedded fonts, right?).
So I opened the file in different readers and checked Properties/Fonts. I am unsure whether this section reports the fonts required by the document or the fonts actually embedded in it.
The information that I get is as follows:
BAAAA+Tahoma-Bold (embedded Subset), type:TrueType, Encoding:
CAAAA+Tahoma (Embedded Subset), type:TrueType, Encoding:
Confused by this, I researched how to embed fonts from OpenOffice and found that the PDF/A-1a option should be checked. So I made another PDF using this option (in case it was not used when making the original PDF file), yet I got the same results.
I would like your guidance in understanding how this works. I would like to be able to open PDF files just as PDF readers do. I also read about PDFBox_External_Fonts.properties, but I am guessing this file shouldn't be modified since I am dealing with embedded fonts.
Thanks.
PDFBox is not able to parse embedded subsets of TrueType fonts.
As far as I understand it, embedded TrueType subsets are missing some of the font file metadata that PDFBox needs.
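On the question of whether the reader's font list shows required or embedded fonts: the "Embedded Subset" label and the uppercase tag followed by `+` in front of each name both indicate that the font is embedded as a subset. A minimal sketch for spotting such names in plain Java is shown here. The class and method names are hypothetical, and it only inspects the name string, not the PDF itself. The PDF specification describes the tag as six uppercase letters; the names quoted in the question show five, so this sketch assumes any run of uppercase letters before the `+` counts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: recognises subset-tagged font names such as "BAAAA+Tahoma-Bold". */
final class SubsetFontName {
    // Uppercase tag, a '+', then the base font name.
    // Assumption: the tag length is not enforced (the spec says six letters).
    private static final Pattern SUBSET = Pattern.compile("^([A-Z]+)\\+(.+)$");

    /** Returns true when the name carries a subset tag prefix. */
    static boolean isSubset(String fontName) {
        return SUBSET.matcher(fontName).matches();
    }

    /** Strips the subset tag, or returns the name unchanged if there is none. */
    static String baseName(String fontName) {
        Matcher m = SUBSET.matcher(fontName);
        return m.matches() ? m.group(2) : fontName;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"BAAAA+Tahoma-Bold", "CAAAA+Tahoma", "Helvetica"}) {
            System.out.println(name + " subset=" + isSubset(name) + " base=" + baseName(name));
        }
    }
}
```

The base name obtained this way is the name you would look for if you go on to point PDFBox at a complete font file.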
The bug is known but not easy to solve. Right now I can only advise using embedded Type 1 fonts if possible, since PDFBox can deal with them.
You can also try setting the path to your complete font files in your pdfbox.jar under org/apache/pdfbox/resources/PDFBox_External_Fonts.properties, so that if PDFBox cannot parse the subset, it can at least find a full path to the original font file. Maybe that works, but I have not tested it.
Good Luck!
I have an application which generates PDFs. At the moment I'm using Apache FOP only to generate a document from scratch (XML+XSLT). My question: is there some kind of library or method that lets me treat my source PDF document as a template?
I mean, I create a document with Adobe Acrobat and place some markers in it like ${Name}, ${Surname}, ${Address}, and then I pass it to the library along with values for Name, Surname and Address.
Hope you can understand me.
Regards.
PDFBox, iText and PDFlib are PDF libraries that allow you to modify existing PDF files instead of only generating them like FOP does. This would allow you to load the template document and replace the placeholders with the actual values.
http://pdfbox.apache.org/
http://itextpdf.com/
http://www.pdflib.com/
PDFBox also provides sample code on how to replace a string in the document with another value: https://pdfbox.apache.org/apidocs/org/apache/pdfbox/examples/pdmodel/ReplaceString.html
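The substitution step itself can be sketched in plain Java, independent of which PDF library reads and writes the text. This is only an illustration under assumptions: `PlaceholderFiller` and `fill` are hypothetical names, and the sketch works on a string that has already been extracted. In a real PDF a placeholder may be split across several text operations or encoded with a subset font, so a plain string match is not guaranteed to find it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: fills ${Key} placeholders in a piece of text. */
final class PlaceholderFiller {
    // Matches ${Name}; group 1 captures the key between the braces.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Unknown placeholders are left untouched so gaps are easy to spot.
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("Name", "Ada");
        values.put("Surname", "Lovelace");
        values.put("Address", "12 Example Street");
        System.out.println(fill("${Name} ${Surname}, ${Address}", values));
    }
}
```

Leaving unknown keys in place is a design choice for this sketch; a stricter variant could throw an exception instead.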
I want to generate a PDF file from an RTF file.
I have tried the following.
iText
It's already outdated, and the new version doesn't support RTF.
JDocConverter
It uses OpenOffice in the background. It works fine, with one problem: OpenOffice doesn't support drawing objects in RTF.
Any other possible and reliable solutions?
Note: I would prefer not to use any commercial software.
Windows can natively convert RTF to PDF from the command line, although the conversion is limited to a degree: text and images are converted directly, but which drawn objects are supported depends on the RTF syntax. WordArt drawing objects need MS Word to print.
The output looks reasonable, but compared with the source in MS Word, the art was clearly not handled by the non-Word printout.
Under Windows you could print to CutePDF Writer. This freeware uses Ghostscript as a back end.
You may try Aspose.Words for Java to convert an RTF file to PDF format. You can load a file in RTF format into Aspose.Words for Java and then save it in PDF format. Please note that you should specify RTF as the LoadFormat value when loading and pass PDF as the SaveFormat value when saving the document. This doesn't require OpenOffice or any other software to be installed for the conversion to work.
Disclosure: I work as a developer evangelist at Aspose.
The best way to do it is to use MS Office, which is able to save files in PDF format (I think you need to install an add-in).
I am developing a standalone application in Java. I want to generate a PDF file using Java code. I have a display form in which all the details are fetched from a database and displayed in the window. The details are Customer Name, Order Details, etc.
Now I want to have a button there which says Convert to PDF.
I want to convert this to a PDF file with proper alignment and formatting, such as tables, fonts, etc.
What can be an ideal way to go about it?
I'd suggest using a reporting tool like JasperReports.
JasperReports is entirely written in Java and it is able to use data coming from any kind of data source and produce pixel-perfect documents that can be viewed, printed or exported in a variety of document formats including HTML, PDF, Excel, OpenOffice and Word.
Have a look at other open source projects (pdf api):
Apache PDFBox
Apache Tika (Toolkit for detecting and extracting metadata and structured text content from various documents using POI and PDFBOX parser libs.)
PDFjet
Use iText:
http://itextpdf.com/