APIs for converting a file into PDF - java

I am looking for APIs and documentation that would let me convert any file into PDF.
The file may be a DOC, XLS, PPT, etc.
For example, I have a DOC file and I just want to convert it into PDF using Java.
Any suggestions would be helpful.

I would recommend taking a look at Flying Saucer (formerly xhtmlrenderer), which makes creating PDF files from XML and HTML files extremely easy (internally it uses iText).
HTML/XML can be used as an intermediate format, which makes this quite a flexible solution.
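To illustrate the intermediate-format idea, here is a minimal sketch that turns already-extracted text into well-formed XHTML using only the JDK. The class and method names (XhtmlSketch, toXhtml, escape) are made up for this example, and the Flying Saucer rendering step itself is not shown.

```java
import java.util.List;

// Hypothetical helper: wraps extracted text in well-formed XHTML that an
// XHTML-to-PDF renderer could take as input. Not part of any library.
final class XhtmlSketch {

    // Escape the characters that would break well-formedness in text content.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Build a small XHTML document with one <p> element per paragraph.
    static String toXhtml(String title, List<String> paragraphs) {
        StringBuilder sb = new StringBuilder();
        sb.append("<html xmlns=\"http://www.w3.org/1999/xhtml\">\n");
        sb.append("<head><title>").append(escape(title)).append("</title></head>\n");
        sb.append("<body>\n");
        for (String p : paragraphs) {
            sb.append("<p>").append(escape(p)).append("</p>\n");
        }
        sb.append("</body>\n</html>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toXhtml("Report", List.of("Profit & loss", "a < b")));
    }
}
```

The escaping matters because the renderer parses its input as XML, so a stray & or < in the extracted text would make the document unparseable.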

Use
http://pdfbox.apache.org/
and
http://poi.apache.org/

If you want to generate a PDF from an XML document, you can try Apache FOP, which follows the XSL-FO standard.
http://xmlgraphics.apache.org/fop/
So a sensible process could be: extract the data from your various document formats using POI, odftoolkit (for OpenDocument), or other tools; inject it into an XML container; and then translate it into PDF using FOP.
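The middle step of that pipeline (putting extracted text into an XML container that FOP can render) might look roughly like the sketch below. It builds a minimal XSL-FO document with the JDK's DOM and transformer APIs. The class name FoDocumentSketch, the A4 page settings, and the one-block-per-paragraph layout are assumptions for illustration; the extraction step and the FOP call itself are not shown.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: wraps extracted paragraphs in a minimal XSL-FO document.
final class FoDocumentSketch {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    static String toXslFo(List<String> paragraphs) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element root = doc.createElementNS(FO_NS, "fo:root");
            // Declare the prefix explicitly so the serialized output is namespace-correct.
            root.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:fo", FO_NS);
            doc.appendChild(root);

            // Page geometry: a single page master named "A4" (assumed values).
            Element masters = child(doc, root, "fo:layout-master-set");
            Element page = child(doc, masters, "fo:simple-page-master");
            page.setAttribute("master-name", "A4");
            page.setAttribute("page-height", "29.7cm");
            page.setAttribute("page-width", "21cm");
            page.setAttribute("margin", "2cm");
            child(doc, page, "fo:region-body");

            // Content: one fo:block per extracted paragraph.
            Element sequence = child(doc, root, "fo:page-sequence");
            sequence.setAttribute("master-reference", "A4");
            Element flow = child(doc, sequence, "fo:flow");
            flow.setAttribute("flow-name", "xsl-region-body");
            for (String p : paragraphs) {
                child(doc, flow, "fo:block").setTextContent(p); // escaped on serialization
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build XSL-FO document", e);
        }
    }

    private static Element child(Document doc, Element parent, String qualifiedName) {
        Element e = doc.createElementNS(FO_NS, qualifiedName);
        parent.appendChild(e);
        return e;
    }

    public static void main(String[] args) {
        System.out.println(toXslFo(List.of("First paragraph", "Total & tax")));
    }
}
```

Building the document through DOM rather than string concatenation means special characters in the extracted text are escaped by the serializer, so the result stays well-formed whatever the source documents contained.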

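Apache POI is the best API for reading the Office files (DOC, XLS, PPT) you want to convert. Note that POI itself does not write PDF, so you need to pair it with a PDF library.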

You can use iText. It is well documented and comes with a ton of examples.

Related

Extract text from PCL6 using Java

I don't have much knowledge regarding PCL6 file format. I wanted to know if there is any way to extract text out of PCL6 file using Java.
Thanks,
Usman
Convert the file to PDF (see Ghostscript/GhostPDL) and then use Apache Tika to extract the text.
The first step requires launching the external converter from Java, for example with Runtime.getRuntime().exec(...).
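A rough sketch of that first step, using ProcessBuilder instead of Runtime.exec because it takes the arguments as a list and makes it easy to forward the tool's output. The executable name gpcl6 is an assumption (it depends on how GhostPDL was installed), and the class name PclToPdfSketch is made up; the Tika step is not shown.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: runs a GhostPDL executable to turn a PCL file into a PDF.
final class PclToPdfSketch {

    // Build the command line; kept separate so it can be inspected or logged.
    static List<String> command(String executable, Path pcl, Path pdf) {
        return List.of(
            executable,                // e.g. "gpcl6" (assumed to be on the PATH)
            "-dNOPAUSE",               // do not prompt between pages
            "-sDEVICE=pdfwrite",       // write PDF output
            "-sOutputFile=" + pdf,     // target file
            pcl.toString());           // input file
    }

    // Run the converter and return its exit code (0 normally means success).
    static int convert(String executable, Path pcl, Path pdf)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(executable, pcl, pdf))
            .inheritIO()               // show the converter's messages on our console
            .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: PclToPdfSketch <input.pcl> <output.pdf>");
            return;
        }
        int exit = convert("gpcl6", Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("converter exit code: " + exit);
    }
}
```

Once the PDF exists, it can be handed to Tika (or PDFBox directly) for the text extraction.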

Parsing pdf files using apache camel

How do I read/parse PDF files using Apache Camel? Are there any specific examples or code snippets for parsing the file?
I appreciate your help.
Thanks in advance.
You could use the Apache Tika project to extract data from your PDF files. It is a generic tool for extracting data from various types of documents. It uses PDFBox under the hood for PDF.
Camel is not about parsing files at all. You may want to take a look at Apache PDFBox.
There is a camel-fop component: http://camel.apache.org/fop but it's only for rendering PDF files. There is no support for parsing a PDF file.
Actually, the Camel PDF component can also extract text. You can see an example of how to do that here: https://github.com/apache/camel/blob/master/components/camel-pdf/src/test/java/org/apache/camel/component/pdf/PdfTextExtractionTest.java
The component is based on Apache PDFBox:
https://camel.apache.org/components/latest/pdf-component.html

convert docx to doc with java

I have a piece of legacy software that produces an XML file and then, with the help of docx4j, a docx document. I must also create a Microsoft doc document from the XML file with Java.
How can I do that? I'd really appreciate any help.
Thanks
Look into POI. It's pretty much the de facto standard for modifying Microsoft documents with Java.
docx4j has POI as a dependency, and POI has reasonable support for the legacy binary doc format (hwpf). So you could use that to convert to doc without introducing additional dependencies. Basically, iterate through your content, and emit each paragraph/table/image in doc format. That would be the reverse of convert/in/Doc.java.
However, the devil is in the detail, and it would be a lot of work if your documents contain a variety of features. This assertion stands whether you were doing docx4j to binary doc (hwpf), or POI's own xwpf to hwpf, since POI doesn't have a common interface across the two of them.
So instead of using POI for this, I'd use JODConverter to drive LibreOffice (or OpenOffice, their docx features are a bit different) to convert docx to legacy binary .doc.
The JODConverter approach is definitely the path of least resistance, and will generally give good results. The downside with it is that if you find something which isn't supported properly, you'll have to wait for the LO/OO guys to fix it, which wouldn't be the case if you did decide to build binary doc output for docx4j using POI. If you did build this, we'd happily accept it as a contribution :-)
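JODConverter's own API is not shown here. As a rough stand-in that drives the same LibreOffice installation, the sketch below calls LibreOffice's headless command line directly. It assumes soffice is on the PATH; the class and method names (DocxToDocSketch, command, expectedOutput) are made up for this example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: asks a local LibreOffice to convert a .docx file to
// legacy binary .doc. This is plain command-line use, not the JODConverter API.
final class DocxToDocSketch {

    // soffice --headless --convert-to doc --outdir <dir> <file>
    static List<String> command(String soffice, Path docx, Path outDir) {
        return List.of(
            soffice, "--headless",
            "--convert-to", "doc",
            "--outdir", outDir.toString(),
            docx.toString());
    }

    // LibreOffice keeps the base name and swaps the extension in the output directory.
    static Path expectedOutput(Path docx, Path outDir) {
        String name = docx.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + ".doc");
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: DocxToDocSketch <input.docx> <output-dir>");
            return;
        }
        Path docx = Paths.get(args[0]);
        Path outDir = Paths.get(args[1]);
        int exit = new ProcessBuilder(command("soffice", docx, outDir))
            .inheritIO()
            .start()
            .waitFor();
        System.out.println("exit code " + exit + ", expected output " + expectedOutput(docx, outDir));
    }
}
```

Starting a new office process per file is slow, which is one reason to prefer JODConverter for anything beyond occasional conversions.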

Creating PDF, HTML, and optionally RTF documents from the same source using Java?

I was looking at using iText to create both a pdf and html version of a document with RTF as a possible option. According to this question this is no longer possible with iText. Is there a library that will allow me to create a document in Java and output it as both PDF and HTML? The ability to output RTF would be nice but is not required.
As that answer to the other question states, you can just use the iText RTF Library.
I have used PD4ML to convert HTML to PDF. Even though it is a commercial product, it is very reliable and supports CSS well.
JasperReports. If you look at this package it supports export to:
pdf
html
rtf
xls
xml
You have two options to create the documents:
via iReport - a visual designer for reports
via an API, where you construct everything with Java code.
Note that even though JasperReports's main function is to create reports, it can very well create other documents, with no tabular data for example.
You could also try Docmosis since that supports the output formats provided by OpenOffice (including the ones you specified) and you can often do the job with a lot less code.

Parse PDF file and write content to Word file using Java

How can I parse a PDF file and write the content to a Word file using Java?
For parsing a PDF file in Java, you can use Apache PDFBox: http://incubator.apache.org/pdfbox/
For reading/writing Word (or other Office) file formats in Java, try POI: http://poi.apache.org/
Both are free.
Try the iText java library:
iText is an ideal library for developers looking to enhance web- and other applications with dynamic PDF document generation and/or manipulation.
It can be used for your parsing step.
As for generating word documents - the OpenOffice Java API might be able to generate Word compatible docs (no personal experience with this API).
You might want to try any of these:
http://incubator.apache.org/pdfbox/
https://pdf-renderer.dev.java.net/
Once you can read the contents of the PDF file, you can store them in an ODT file or a text file. For ODT files, try http://odftoolkit.openoffice.org.
Best!
You could use iText if the source PDF is mostly text. Images and such are quite hard to handle while parsing. If it's text only, it's as easy as 10 lines of code. See the iText manual for examples.
For writing Word files there's only Apache POI. It can be a little tricky to figure out, but for such a simple task it shouldn't be a problem.
