I have run into an encoding problem while using the embedded Aspose PDF previewer and DOC-to-PDF converter in my Java project.
When I convert a .doc file containing clickable links with Russian characters to PDF using the com.aspose.words.Document.saveToPdf(...) method, I get a good PDF file. But when I open this file in the standard Aspose PDF previewer and follow these links, I see a "wrong url" error.
The links themselves look fine (the Russian letters are displayed correctly), but the mouseover tooltip shows wrongly encoded symbols instead of the Russian ones.
How can I deal with this problem?
Should I convert the .doc file with some specific options, or should I configure the PDF previewer differently?
The Document.saveToPdf() method is no longer available in the latest version of the library; you can simply use the Document.save("filename.ext") method to save to PDF or any other supported format.
Try the latest version; chances are this bug has already been fixed. When I converted a Word document with Russian letters in a link to PDF, the encoding worked fine.
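When comparing versions, it may also help to look at how the link target itself is encoded. The sketch below is not Aspose code and makes no claim about what the converter or the previewer does internally. It only uses java.net.URI from the JDK to show the UTF-8 percent-encoded form of a link containing Russian letters, and what the same UTF-8 bytes turn into when they are read with the wrong charset (the kind of garbling a tooltip might show). The class name, method names, and the example.com address are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

class LinkEncodingSketch {

    // Percent-encodes the non-ASCII characters of a link as UTF-8 bytes.
    static String toAsciiLink(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid link: " + path, e);
        }
    }

    // Simulates a viewer that reads UTF-8 bytes as ISO-8859-1:
    // every Cyrillic letter (two bytes in UTF-8) becomes two unrelated characters.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "privet" written in Cyrillic, as Unicode escapes so that the
        // source file compiles the same way under any source encoding.
        String word = "\u043f\u0440\u0438\u0432\u0435\u0442";
        System.out.println(toAsciiLink("http", "example.com", "/" + word));
        System.out.println(misread(word).length() + " characters after misreading");
    }
}
```

If the link stored in the source document is already in the percent-encoded ASCII form, there are no Russian characters left for a later step to mis-decode, which can help narrow down where the problem is introduced.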
I work as a Developer Evangelist for Aspose.
Related
I'm using docx4j (http://www.docx4java.org/trac/docx4j) to convert a Microsoft Word document into a PDF and then display it in a browser, and it works well for a preview. The problem I'm facing is that this conversion loses most of the Word document's formatting. Page breaks and fonts don't transfer into the PDF properly, and even though I'm using standard font types, docx4j doesn't come with them. In a Linux-hosted Tomcat scenario the fonts are not found, and exceptions are thrown as it falls back to sans serif or other generic types.
I have found this Microsoft tool to make documents render online, but I'm behind a firewall so I cannot include this tool as an option: https://products.office.com/en-us/office-online/view-office-documents-online
I'm open to suggestions on displaying a docx file as a preview and print option from within a browser. PDF conversion appears to be the most promising, but I run into formatting issues.
Any ideas are welcome!
Have a play with http://converter-eval.plutext.com/viewer.html
Consider it an alpha level preview. We haven't quite released it yet, but you will be able to host it behind a firewall.
It isn't open source, I'm afraid, and we're still working out pricing (and whether/how there could be a free edition).
If you only need to render a docx document in a browser, you can use Google Documents Viewer for this, like so:
<iframe src="http://docs.google.com/gview?url=pathOfDocx&embedded=true" />
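A minimal sketch of building that iframe in Java, assuming Java 10 or later for the URLEncoder.encode(String, Charset) overload. The viewer address is the one from the snippet above; the class name, method names, and the example.com document URL are hypothetical. The point is only that the document URL should be percent-encoded before it is placed in the url parameter. As far as I know the viewer fetches the file itself, so the document URL has to be reachable from outside, which may rule this out behind a firewall.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ViewerEmbedSketch {

    // Base address copied from the iframe in the answer above.
    static final String VIEWER = "http://docs.google.com/gview";

    // Builds the iframe src. The document URL is percent-encoded so that its
    // own ':', '/', '?' and '&' cannot be mistaken for viewer parameters.
    static String viewerSrc(String documentUrl) {
        String encoded = URLEncoder.encode(documentUrl, StandardCharsets.UTF_8);
        return VIEWER + "?url=" + encoded + "&embedded=true";
    }

    // Wraps the src in an iframe tag; '&' is written as '&amp;' inside
    // an HTML attribute value.
    static String iframe(String documentUrl) {
        return "<iframe src=\"" + viewerSrc(documentUrl).replace("&", "&amp;") + "\"></iframe>";
    }

    public static void main(String[] args) {
        // Hypothetical document location.
        System.out.println(iframe("http://example.com/docs/report.docx?rev=2&lang=en"));
    }
}
```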
I use the PDFBox 1.8.3 jar to print a PDF file on a hardware printer. I printed the PDF file both the normal way and from my program. When I print the PDF the normal way, the printed document matches the original PDF file. But when I use my code, the printed output does not match the original. I can see a couple of changes in the printed file; for example, the alignment, font, and ink differ from the original document.
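// ReadPDF: application helper (not shown) that returns a PDDocument.
ReadPDF readPDF = new ReadPDF();
PDDocument document = readPDF.loadPdf(path);
// Appends a blank page to the end of the loaded document.
document.addPage(new PDPage());
PrinterJob printerJob = PrinterJob.getPrinterJob();
// In PDFBox 1.8.x, PDDocument implements java.awt.print.Pageable.
printerJob.setPageable(document);
PrintRequestAttributeSet printRequestAttributeSet = new HashPrintRequestAttributeSet();
printRequestAttributeSet.add(new PageRanges(1, 3)); // print pages 1 to 3
printerJob.print(printRequestAttributeSet);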
I also tried to upgrade from the PDFBox 1.8.3 jar to the upcoming 2.0.0 jar, but I ran into a few difficulties (for example, in PDFBox 2.0.0 I'm unable to use printerJob.setPageable(document);). Could you please help me solve this issue?
This is sometimes related to the printer as well. Please try a different printer, just to check.
You can have a look at the answer to the Stack Overflow question below and make use of extracts from the explanation.
How to determine artificial bold style, artificial italic style and artificial outline style of a text using PDFBOX
Also, verify that the fonts used by the original PDF are also present in the container in which the application is running.
Shishir
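The font check suggested above can be sketched with the JDK alone. This is a rough diagnostic, not PDFBox code: it lists the font families that Java 2D can see in the running JVM and reports which of the families you expect are absent. The list of required families is hypothetical and has to be replaced with the fonts your PDF actually uses, and a PDF library may look fonts up through its own mechanism as well, so treat the result as a hint only.

```java
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class FontCheckSketch {

    // Returns the required family names that are not in the available
    // collection, comparing names case-insensitively.
    static List<String> missingFonts(Collection<String> required, Collection<String> available) {
        Set<String> known = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        known.addAll(available);
        List<String> missing = new ArrayList<>();
        for (String family : required) {
            if (!known.contains(family)) {
                missing.add(family);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Font families visible to Java 2D in this JVM (for example, inside the container).
        String[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();
        // Hypothetical list; use the fonts embedded in or referenced by your PDF.
        List<String> required = Arrays.asList("Arial", "Times New Roman");
        System.out.println("Missing: " + missingFonts(required, Arrays.asList(installed)));
    }
}
```

Running it once on the machine where printing looks right and once inside the container makes the difference between the two environments visible.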
I want to create something like this (code is here) in PDF format. I'm using Google Charts, and according to this forum, converting a chart to PDF is impossible. I've already tried iText + XMLWorker, but there are problems with CSS, and I think there is no JS support at all.
So, the questions are: how can I convert HTML+CSS+JS to a .pdf file? Or are there other ways to approach this?
As promised in the comment, I've asked Raf. This was his answer:
One way to use XML Worker for HTML+CSS+JS is to use a browser engine to preprocess the HTML. Examples of such a browser engine are WebKit (Chrome, Safari) and Gecko (Firefox). These can interpret the CSS and JS and give you HTML that is ready to be parsed by XML Worker.
Examples of competing products are:
wkhtmltopdf, a command line tool that uses WebKit as its rendering engine.
Prince XML supports HTML+CSS+JS to PDF using their own engine.
Maybe there are others, but this is what Raf told me. I hope this helps.
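For the wkhtmltopdf route, a minimal sketch of calling the command-line tool from Java is shown below. Note that this is the "competing product" path, not the XML Worker preprocessing path. It assumes wkhtmltopdf is installed and on the PATH; the file names and the one-second delay are made-up values, and the class and method names are hypothetical. The --javascript-delay option gives scripts such as chart libraries some time to finish drawing before the page is captured.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class WkhtmltopdfSketch {

    // Builds: wkhtmltopdf --javascript-delay <ms> <input> <output>
    static List<String> buildCommand(String input, String output, int jsDelayMs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("wkhtmltopdf");
        cmd.add("--javascript-delay");
        cmd.add(Integer.toString(jsDelayMs));
        cmd.add(input);   // local HTML file or URL
        cmd.add(output);  // PDF file to write
        return cmd;
    }

    // Runs the tool, forwarding its console output, and returns its exit code.
    static int convert(String input, String output, int jsDelayMs)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, output, jsDelayMs))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file names.
        int exitCode = convert("chart.html", "chart.pdf", 1000);
        System.out.println("wkhtmltopdf exited with " + exitCode);
    }
}
```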
I want to generate a PDF file from an RTF file.
I have tried the following:
iText
The version that supported RTF is outdated, and the new version doesn't support RTF.
JDocConverter
It uses OpenOffice in the background. It works fine, with only one problem: OpenOffice doesn't support drawing objects in RTF.
Are there any other possible and reliable solutions?
Note: I would prefer not to use any commercial software.
Windows can natively convert RTF to PDF from the command line, but the conversion is limited to a degree: text and images are converted directly, while which drawn objects are supported depends on the RTF syntax. WordArt drawing objects need MS Word to print.
The output looks reasonable, but here is the source in MS Word, where the art was clearly not handled by the non-Word printout.
Under Windows you could print to CutePDF Writer. This freeware uses Ghostscript as a back end.
You may try Aspose.Words for Java to convert RTF file to PDF format. You can load a file in RTF format into Aspose.Words for Java and then save it to PDF format. Please note that while loading specify RTF as LoadFormat value and pass PDF as SaveFormat value while saving the document. This doesn't require OpenOffice or any other software to be installed for the conversion to work.
Disclosure: I work as developer evangelist at Aspose.
The best way to do it is to use MS Office, which is able to save files in PDF format (I think you need to install an add-on).
I have an Arabic PDF, and I want to parse it into a text document using Java. I have tried many times, and the English words parse successfully, but the Arabic words don't.
Can anyone recommend a solution that will convert the Arabic words properly as well?
Several libraries come to mind. Apache Tika, iText, or PDFBox will all more or less solve your problem. I must put in a word for Tika, though, as it supports language detection and can also handle other document types.
I think you can use iText for pdf manipulation using Java. It supports Arabic too.