Parse a PDF file and write its content to a Word file using Java

How do I parse a PDF file and write its content to a Word file using Java?

For parsing a PDF file in Java, you can use Apache PDFBox: http://pdfbox.apache.org/
For reading/writing Word (or other Office) file formats in Java, try POI: http://poi.apache.org/
Both are free.

Try the iText java library:
iText is an ideal library for developers looking to enhance web- and other applications with dynamic PDF document generation and/or manipulation.
It can be used for your parsing step.
As for generating Word documents, the OpenOffice Java API might be able to generate Word-compatible docs (I have no personal experience with this API).

You might want to try any of these:
http://pdfbox.apache.org/
https://pdf-renderer.dev.java.net/
Once you can read the contents of the PDF file, you can just as well store them in an ODT file or a text file. For ODT files, try http://odftoolkit.openoffice.org.
Best!

You could use iText if the source PDF is mostly text. Images and such are quite hard to handle while parsing. If it's text only, it's as easy as 10 lines of code. See the iText manual for examples.
For writing Word files there's only Apache POI. It can be a little tricky to figure out, but for such a simple task it shouldn't be a problem.

Related

How to extract data from a pdf file using JPedal?

I am trying to extract the data from a PDF file, but I couldn't find any example on the internet. Is it possible to use the JPedal library to open a PDF file and read its data?
You can use PDFBox from Apache.
I am not familiar with JPedal, but I write lots of code that generates and processes PDF files. I use iText and highly recommend it. If you have a specific question on how to process a PDF file, let me know.

APIs for converting a file into PDF

I am looking for APIs and documentation that would let me convert any file into PDF.
The file may be DOC, XLS, PPT, etc.
For example, I have a DOC file and I just want to convert it into PDF using Java.
Any suggestion will be helpful.
I would recommend taking a look at Flying Saucer (formerly xhtmlrenderer), which makes creating PDF files from XML and HTML files extremely easy (internally it uses iText).
HTML/XML can be used as an intermediate format, which makes this quite a flexible solution.
Use http://pdfbox.apache.org/ and http://poi.apache.org/
If you want to generate a PDF from an XML document, you can try Apache FOP, which follows the XSL-FO standard.
http://xmlgraphics.apache.org/fop/
So a smart process could be: extract data from your various document formats using POI, odftoolkit (for OpenDocument) or other tools, inject it into an XML container, and then translate that into PDF using FOP.
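A rough sketch of the "inject into an XML container" step, using only the JDK's built-in XML APIs. The element names (document, paragraph), the class name and the method name are made up for illustration; the extraction with POI and the XSL-FO stylesheet that FOP would apply are not shown.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class XmlContainerSketch {

    // Wraps already-extracted paragraphs in a simple, hypothetical XML layout.
    static String toXml(String title, List<String> paragraphs) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("document");
            root.setAttribute("title", title);
            doc.appendChild(root);
            for (String text : paragraphs) {
                Element p = doc.createElement("paragraph");
                // Markup characters such as & and < are escaped on serialization.
                p.setTextContent(text);
                root.appendChild(p);
            }
            // An identity transform serializes the DOM tree to a string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the XML container", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("Report", List.of("First paragraph", "Costs & benefits")));
    }
}
```

The resulting XML would then be paired with an XSL-FO stylesheet of your own and handed to FOP.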
Apache POI is a good choice for reading Office files, but note that POI itself does not write PDF, so it has to be paired with a PDF library.
You can use iText. It is well documented and comes with a ton of examples.

Java generate PDF from RTF

I want to generate a PDF file from an RTF file.
I have tried the following.
iText
It is already outdated, and the new version doesn't support RTF.
JDocConverter
It uses OpenOffice in the background. It works fine, with only one problem: OpenOffice doesn't support drawing objects in RTF.
Are there any other possible and reliable solutions?
Note: I would prefer not to use any commercial software.
Windows can natively convert RTF to PDF from the command line, but the result is limited to a degree: text and images are converted directly, while support for drawn objects depends on the RTF syntax used. WordArt drawing objects need MS Word to print.
The output looks reasonable, but compare it with the source in MS Word: the art was clearly not handled by the non-Word printout.
Under Windows you could print to CutePDF Writer. This freeware uses Ghostscript as a back end.
You may try Aspose.Words for Java to convert RTF file to PDF format. You can load a file in RTF format into Aspose.Words for Java and then save it to PDF format. Please note that while loading specify RTF as LoadFormat value and pass PDF as SaveFormat value while saving the document. This doesn't require OpenOffice or any other software to be installed for the conversion to work.
Disclosure: I work as developer evangelist at Aspose.
The best way to do it is to use MS Office. MS Office is able to save a file in PDF format (I think you need to install some add-ons).

How to manipulate .doc files

I need to create a little desktop app in Java that creates a .doc file and writes a bit of text into it. I found an interesting tool called Aspose, but I saw that it is not free.
Do you know what kind of free Java API I can use for that?
Is it possible to do that with only the Java SE libraries?
What do you think would be the easiest and fastest way to achieve this goal?
I suggest you have a look at the Apache POI framework, specifically the HWPF - Java API to Handle Microsoft Word Files:
HWPF is the name of our port of the Microsoft Word 97(-2007) file format to pure Java. It also provides limited read only support for the older Word 6 and Word 95 file formats.
If you are going with .doc, then as a learning exercise open a Word document with some content (ideally similar to what you want to create), save it as XML, and review the contents.
You will need to do some basic DOM parsing and management in your code to insert the right content.
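A rough sketch of that DOM step, using only the JDK's XML APIs. It assumes the template was saved from Word in the "Word 2003 XML" format, where text sits in w:t elements under the namespace shown below; check both against your own saved file. The @NAME@ placeholder, the class name and the method name are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class WordXmlTemplateSketch {

    // Assumed namespace of Word 2003 XML; verify it in the file Word saved.
    static final String W_NS = "http://schemas.microsoft.com/office/word/2003/wordml";

    // Replaces every w:t element whose whole text equals the placeholder.
    static String fill(String templateXml, String placeholder, String value) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(templateXml)));

            NodeList textNodes = doc.getElementsByTagNameNS(W_NS, "t");
            for (int i = 0; i < textNodes.getLength(); i++) {
                Node t = textNodes.item(i);
                if (placeholder.equals(t.getTextContent())) {
                    t.setTextContent(value);
                }
            }

            // Serialize the modified tree back to a string.
            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            out.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process the template", e);
        }
    }

    public static void main(String[] args) {
        // A tiny hand-written stand-in for a template saved from Word.
        String template = "<w:wordDocument xmlns:w=\"" + W_NS + "\"><w:body><w:p><w:r>"
                + "<w:t>@NAME@</w:t></w:r></w:p></w:body></w:wordDocument>";
        System.out.println(fill(template, "@NAME@", "Jane Doe"));
    }
}
```

Keep in mind that Word may split a piece of text across several runs, so a placeholder is easiest to find when it was typed in one go in the template.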
By .doc file, I assume you mean Microsoft Office? Reading and writing Office file formats is something of a black art. Does it have to be a .doc format file specifically? A lot easier would be to write out a Rich Text Format file (.rtf) that Word could load.
And if you don't need to use .doc specifically I would suggest you use .odt, http://www.jopendocument.org/.
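The Rich Text Format suggestion above needs nothing beyond Java SE. A minimal sketch, assuming plain text with a single font is enough; the class name, method names and output file name are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class RtfSketch {

    // Escapes the characters that have a special meaning in RTF.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);
            } else if (c == '\n') {
                sb.append("\\par ");
            } else if (c > 127) {
                // \uN takes a signed 16-bit decimal; '?' is the fallback character.
                sb.append("\\u").append((int) (short) c).append('?');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps the text in a minimal RTF document: one font, 12 pt (\fs24 is in half-points).
    static String toRtf(String text) {
        return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\f0\\fs24 " + escape(text) + "\\par}";
    }

    public static void main(String[] args) throws IOException {
        String rtf = toRtf("Hello from Java.\nSecond paragraph with {braces}.");
        // After escaping, the content is pure ASCII.
        Files.write(Paths.get("hello.rtf"), rtf.getBytes(StandardCharsets.US_ASCII));
    }
}
```

Word should open the resulting hello.rtf directly; anything beyond simple text (tables, images, styles) quickly becomes easier with a library such as POI.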

Java: Placing a Header in MS Word Document

We are converting a C++ project to Java in which we generate reports with a ".doc" extension. The problem is that we don't use any third-party library to generate a real MS Word document; we just write a file with a .doc extension. Everything works fine except that we can't seem to find a way to add a header at the beginning of every page. Using line numbers is not an option. Is there any other way it can be done?
Thank you.
The Apache POI library might be of some help.
It has facilities to read and modify Microsoft proprietary file formats such as MS Word .doc and MS Excel .xls.
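If no library can be added, one more possibility is worth checking. This only applies if the file you write with a .doc extension is actually RTF inside, which is an assumption, since the question doesn't say what the file contains. RTF has a \header group that Word repeats on every page. A minimal sketch with made-up class and method names:

```java
final class RtfHeaderSketch {

    // Escapes backslashes and braces, which are special in RTF.
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}");
    }

    // Builds an RTF document whose header group comes before the body text.
    static String withHeader(String header, String body) {
        return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}"
                // \header holds the content repeated at the top of every page; \qc centers it.
                + "{\\header\\pard\\qc\\f0\\fs20 " + escape(header) + "\\par}"
                + "\\pard\\f0\\fs24 " + escape(body) + "\\par}";
    }

    public static void main(String[] args) {
        System.out.println(withHeader("Monthly Report", "Report body goes here."));
    }
}
```

If the generated file is HTML or plain text instead, this does not apply and a library such as POI is the more realistic route.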
