Extract text from PCL6 using Java - java

I don't know much about the PCL6 file format. Is there any way to extract text from a PCL6 file using Java?
Thanks,
Usman

Convert the file to PDF (see Ghostscript/GhostPDL) and then use Apache Tika to extract the text from the PDF.
The first step requires launching an external converter process from Java, for example with Runtime.getRuntime().exec(...).
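
A minimal sketch of that first step, assuming a GhostPDL executable named gpcl6 is on the PATH. The executable name and the flags (-dNOPAUSE, -sDEVICE=pdfwrite, -sOutputFile=...) are assumptions to verify against your installed version, and ProcessBuilder is used here in place of Runtime.exec because it makes argument handling and output redirection easier. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

public class PclToPdf {

    // Builds the converter command line. The flags are assumed from
    // typical GhostPDL usage; check them against your installation.
    static List<String> buildCommand(String executable, Path input, Path output) {
        return List.of(
                executable,
                "-dNOPAUSE",
                "-sDEVICE=pdfwrite",
                "-sOutputFile=" + output,
                input.toString());
    }

    // Starts the converter, forwards its console output, and waits for it to exit.
    static void convert(String executable, Path input, Path output)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(executable, input, output))
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Converter exited with code " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java PclToPdf input.pxl output.pdf
        convert("gpcl6", Path.of(args[0]), Path.of(args[1]));
    }
}
```

Once the PDF exists, the second step is a separate call into Tika (for instance its Tika facade class and parseToString), which needs the Tika jars on the classpath and is not shown here.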

Related

How to modify .docx template in either Java or Python

I want to be able to generate .docx files through either Java or Python based off of a template .docx file. I need to be able to insert in simple text, some bullets and a table or two.
I would like suggestions on specific libraries/modules for either Python or Java that would allow me to load a template, insert basic text and tables and then save it.
I have been looking into JACOB for Java and docx for Python. Any alternatives? Or will one of these be able to do what I need?
Thanks in advance
If you want to generate a docx, then you might like docxtemplater, a library I maintain that generates docx files from a template (much like Mustache for HTML).
It runs on node but has a command line interface so you can use it from any language.
DocxTemplater Library
Demo Site
Consider docx4j as another option. It is an alternative to Apache POI for .docx files (it is built on JAXB rather than on POI), and I find its documentation better.
Did you look at the Apache POI project (the Java API for Microsoft Documents)?
http://poi.apache.org/
Good luck!

How to extract data from a pdf file using JPedal?

I am trying to extract data from a PDF file, but I couldn't find any example on the internet. Is it possible to use the JPedal library to open a PDF file and read its data?
You can use PDFBox from Apache.
I am not familiar with JPedal, but I write lots of code that generates and processes PDF files. I use iText and highly recommend it. If you have a specific question on how to process a PDF file, let me know.

APIs for converting a file into PDF

I am looking for APIs and documentation that would let me convert any file into PDF.
The file may be DOC, XLS, PPT, etc.
For example, I have a DOC file and I just want to convert it into PDF using Java.
Any suggestion will be helpful.
I would recommend taking a look at Flying Saucer (formerly xhtmlrenderer), which makes creating PDF files from XML and HTML files extremely easy (internally it uses iText).
HTML/XML can be used as an intermediate format, which makes this quite a flexible solution.
Use http://pdfbox.apache.org/ and http://poi.apache.org/
If you want to generate a PDF from an XML document, you can try Apache FOP, which follows the XSL-FO standard.
http://xmlgraphics.apache.org/fop/
So a smart process could be: extract the data from your various document formats using POI, odftoolkit (for OpenDocument) or other tools, inject it into an XML container, and then translate that into PDF using FOP.
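
The middle step of that process, the XML container, can be sketched with nothing but the JDK. The example below assumes the text has already been extracted into a list of paragraph strings and wraps it in a minimal XSL-FO document that FOP could then render. The class name, the master name "page", and the A4 page size are illustrative choices, not anything FOP requires.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FoBuilder {

    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Wraps each paragraph in an fo:block inside a single-page-master document.
    static String toXslFo(List<String> paragraphs) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();

        Element root = doc.createElementNS(FO_NS, "fo:root");
        doc.appendChild(root);

        // Page layout: one page master with a body region.
        Element masters = doc.createElementNS(FO_NS, "fo:layout-master-set");
        root.appendChild(masters);
        Element page = doc.createElementNS(FO_NS, "fo:simple-page-master");
        page.setAttribute("master-name", "page");
        page.setAttribute("page-height", "297mm");
        page.setAttribute("page-width", "210mm");
        masters.appendChild(page);
        page.appendChild(doc.createElementNS(FO_NS, "fo:region-body"));

        // Content: a page sequence whose flow targets the body region.
        Element sequence = doc.createElementNS(FO_NS, "fo:page-sequence");
        sequence.setAttribute("master-reference", "page");
        root.appendChild(sequence);
        Element flow = doc.createElementNS(FO_NS, "fo:flow");
        flow.setAttribute("flow-name", "xsl-region-body");
        sequence.appendChild(flow);

        for (String text : paragraphs) {
            Element block = doc.createElementNS(FO_NS, "fo:block");
            block.setTextContent(text); // special characters are escaped on output
            flow.appendChild(block);
        }

        // Serialize the DOM tree to a string with the identity transformer.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXslFo(List.of("First paragraph", "Tom & Jerry")));
    }
}
```

Building the document through DOM rather than string concatenation means characters such as & and < in the extracted text are escaped for you.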
Apache's POI API is a good fit for reading the source Office files. It does not write PDF itself, so it has to be paired with a PDF library.
You can use iText. It is well documented and comes with a ton of examples.

How to manipulate .doc files

I need to create a little desktop app in Java that creates a .doc file and writes a bit of text into it. I found an interesting tool called Aspose, but I saw that it is not free.
Do you know what free Java API I could use for this?
Is it possible to do it with only the Java SE libraries?
What do you think would be the easiest and fastest way to achieve this goal?
I suggest you have a look at the Apache POI framework, specifically the HWPF - Java API to Handle Microsoft Word Files:
HWPF is the name of our port of the Microsoft Word 97(-2007) file format to pure Java. It also provides limited read only support for the older Word 6 and Word 95 file formats.
If you are going with .doc, then as a learning exercise open a Word document with some content (ideally similar to what you want to create), save it as XML, and review the contents.
You will need to do some basic DOM parsing and manipulation in your code to insert the right content.
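
A rough sketch of that DOM approach using only the JDK. It assumes the document was saved in the Word 2003 XML format, where text lives in t elements under the namespace shown in W_NS; check the root element of your own file, since other Word XML flavours use a different namespace. The class name and the ${name} placeholder convention are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WordXmlEditor {

    // Assumed namespace of Word 2003 XML; verify against your saved file.
    static final String W_NS = "http://schemas.microsoft.com/office/word/2003/wordml";

    // Parses the XML text into a namespace-aware DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Replaces the placeholder in every text element; returns how many elements changed.
    static int replaceText(Document doc, String placeholder, String value) {
        NodeList textNodes = doc.getElementsByTagNameNS(W_NS, "t");
        int changed = 0;
        for (int i = 0; i < textNodes.getLength(); i++) {
            Node t = textNodes.item(i);
            String text = t.getTextContent();
            if (text.contains(placeholder)) {
                t.setTextContent(text.replace(placeholder, value));
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:wordDocument xmlns:w=\"" + W_NS + "\"><w:body><w:p><w:r>"
                + "<w:t>Hello ${name}</w:t></w:r></w:p></w:body></w:wordDocument>";
        Document doc = parse(xml);
        replaceText(doc, "${name}", "World");
        System.out.println(doc.getElementsByTagNameNS(W_NS, "t").item(0).getTextContent());
    }
}
```

One caveat: Word sometimes splits a single phrase across several runs, so a placeholder typed in one go may still end up in more than one t element. Reviewing the saved XML first, as suggested above, shows whether that happened.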
By .doc file, I assume you mean Microsoft Word? Reading and writing Office file formats is something of a black art. Does it have to be a .doc format file specifically? It would be a lot easier to write out a Rich Text Format file (.rtf) that Word can load.
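
To show how little is needed for the RTF route, here is a sketch that uses only Java SE. It writes each string as one paragraph in 12 pt Arial; the class name, the font, and the size are arbitrary choices for the example, and only plain text is handled (no bold, tables, or images).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class RtfWriter {

    // Escapes RTF control characters and encodes non-ASCII characters.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);
            } else if (c < 128) {
                sb.append(c);
            } else {
                // RTF Unicode escape: signed 16-bit decimal value, then '?'
                // as the fallback for readers that do not understand it.
                sb.append("\\u").append((int) (short) c).append('?');
            }
        }
        return sb.toString();
    }

    // Builds a complete RTF document with one paragraph per list entry.
    static String toRtf(List<String> paragraphs) {
        StringBuilder sb = new StringBuilder();
        // Header: RTF version 1, ANSI charset, font 0 = Arial, size 24 half-points.
        sb.append("{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\f0\\fs24 ");
        for (String paragraph : paragraphs) {
            sb.append(escape(paragraph)).append("\\par ");
        }
        sb.append('}');
        return sb.toString();
    }

    // Writes the document to disk; the escaped output is pure ASCII.
    static void write(Path target, List<String> paragraphs) throws IOException {
        Files.write(target, toRtf(paragraphs).getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        write(Path.of("hello.rtf"), List.of("Hello from Java", "Second paragraph"));
    }
}
```

Word opens the resulting hello.rtf directly, and from there it can be saved as .doc by hand if that format is really required.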
And if you don't need to use .doc specifically I would suggest you use .odt, http://www.jopendocument.org/.

Parse a PDF file and write its content to a Word file using Java

How can I parse a PDF file and write its content to a Word file using Java?
For parsing a PDF file in Java, you can use Apache PDFBox: http://incubator.apache.org/pdfbox/
For reading/writing Word (or other Office) file formats in Java, try POI: http://poi.apache.org/
Both are free.
Try the iText java library:
iText is an ideal library for developers looking to enhance web- and other applications with dynamic PDF document generation and/or manipulation.
It can be used for your parsing step.
As for generating word documents - the OpenOffice Java API might be able to generate Word compatible docs (no personal experience with this API).
You might want to try any of these:
http://incubator.apache.org/pdfbox/
https://pdf-renderer.dev.java.net/
Once you can read the contents of the PDF file, you can just as well store them in an ODT file or a text file. For an ODT file, try http://odftoolkit.openoffice.org.
Best!
You could use iText if the source PDF is mostly text. Images and such are quite hard to handle while parsing. If it's text only, it's as easy as 10 lines of code. See the iText manual for examples.
For writing Word files there's only Apache POI. It can be a little tricky to figure out, but for such a simple task it shouldn't be a problem.
