I wanted to ask how we can go about manipulating a PDF file using Java. I am familiar with the structure of a PDF file (how its objects are arranged, etc.). I would like to know how to proceed from scratch. By scratch I mean I don't want to use the freely available APIs and libraries; I want to strip a PDF down to its constituent objects.
Don't try to reinvent the wheel.
There is already excellent work done here
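If you still want to see what "stripping a PDF down to its objects" involves, here is a rough sketch that scans the raw bytes for indirect objects of the form `N G obj ... endobj`. It is only an illustration under strong assumptions: the class name `PdfObjectScanner` is made up, it ignores the cross-reference table, it will not see objects packed into compressed object streams (PDF 1.5 and later), and it can be fooled if the text `endobj` happens to occur inside binary stream data. A real parser has to handle all of that, which is why the existing libraries are worth using.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: naive scan for "N G obj ... endobj" blocks.
class PdfObjectScanner {

    // object number, generation number, then the body up to the first "endobj"
    private static final Pattern INDIRECT_OBJECT =
            Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b(.*?)endobj", Pattern.DOTALL);

    /** Returns object bodies keyed by "number generation", in file order. */
    static Map<String, String> scan(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data survives decoding.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<String, String> objects = new LinkedHashMap<>();
        Matcher m = INDIRECT_OBJECT.matcher(text);
        while (m.find()) {
            // If an id occurs twice, the later definition replaces the earlier one.
            objects.put(m.group(1) + " " + m.group(2), m.group(3).trim());
        }
        return objects;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: PdfObjectScanner <file.pdf>");
            return;
        }
        scan(Files.readAllBytes(Paths.get(args[0])))
                .forEach((id, body) -> System.out.println(id + " obj: " + body.length() + " chars"));
    }
}
```

Each body is still raw PDF syntax (dictionaries, arrays, streams), so this only gets you to the first layer; decoding stream filters and following references is where most of the work is.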
I am currently working at a project which generates contracts. The idea is that I put the data in a form and save it in a simple database.
So far, this has been my favorite place to search for good ideas and simple solutions.
Now I am facing another problem and I don't know how I can solve that. I want to create a PDF and replace some placeholders with some data from my form.
One idea was to use an existing Word template with some bookmarks and replace them with the data from my form. Maybe there is a way to do that, and I am just too stupid to find it.
Another idea was to use XML. I thought I was clever and just converted the Word template to a PDF, so that I could convert that PDF to XML. Attached you will find the XML file. But now I need the XSL file. Is there an easy way to create the XSL file?
Or maybe there is another simple solution to solve my problem.
In these attachments you find the PDF file, the Word template and the XML:
Thank you a lot :)
Using a template is a good idea - it makes some changes much quicker to make and then deploy. The comments above are focused on conversion, but don't forget you need to merge your data in (population) first.
If you can use Adobe tools, you can have a PDF template and use the Adobe tools to populate. This saves a "conversion" stage.
You mentioned using Word for templates. This means you have to run through two stages of processing:
population - docx is a zipped set of XML files - so you can process them with your own code or using a library.
conversion - you need PDF, so you have to convert the docx to PDF. You also have to watch out for fonts at this stage (i.e., make sure they are available on your host).
The population stage you could do yourself since you are familiar with XML. But it is definitely complicated. The conversion needs to use a tool that is ideal for it. There are a few mentioned in the comments already.
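To give a feel for the do-it-yourself population stage, here is a minimal sketch using only the JDK's zip classes. It copies the docx archive entry by entry and does a plain text substitution inside `word/document.xml`. The `${name}` placeholder syntax, the class name `DocxPopulator` and the sample value are assumptions for illustration, not something Word defines. It also assumes the part is UTF-8 encoded and that each placeholder sits inside a single text run; Word often splits typed text across several runs, which is one of the reasons this gets complicated. It needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: fills ${key} placeholders in the main document part of a docx.
class DocxPopulator {

    /** Copies the archive, substituting placeholders in word/document.xml only. */
    static byte[] populate(byte[] docx, Map<String, String> values) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(docx));
             ZipOutputStream out = new ZipOutputStream(result)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] content = in.readAllBytes(); // reads the current entry only
                if (entry.getName().equals("word/document.xml")) {
                    String xml = new String(content, StandardCharsets.UTF_8);
                    for (Map.Entry<String, String> e : values.entrySet()) {
                        xml = xml.replace("${" + e.getKey() + "}", escapeXml(e.getValue()));
                    }
                    content = xml.getBytes(StandardCharsets.UTF_8);
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                out.write(content);
                out.closeEntry();
            }
        }
        return result.toByteArray(); // the zip stream is closed, so the archive is complete
    }

    /** Escapes the characters that would otherwise break the XML text content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Returns one entry of a zip archive as UTF-8 text, or null if it is missing. */
    static String readEntry(byte[] zip, String name) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: DocxPopulator <template.docx> <output.docx>");
            return;
        }
        byte[] filled = populate(Files.readAllBytes(Paths.get(args[0])),
                Map.of("name", "Jane Doe")); // sample data, replace with your form values
        Files.write(Paths.get(args[1]), filled);
    }
}
```

This only covers simple value substitution. Repeating rows, conditional sections and headers or footers (which live in other parts of the archive) are where a library starts to pay off.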
There are some free/open-source and commercial tools that can do both parts:
docx4j
JOD Reports
Libre Office (using the Java Uno API) (I blogged this once - Java Convert Word to PDF with UNO)
Docmosis (please note I work for Docmosis)
I suggest starting with the simple example you have attached and proving that you can both populate and convert it. Then switch to a more complicated example to see if you can do the other things that might be required (e.g. repeating, conditions or other logic) during the population stage.
I have been banging my head against the wall with this one; I have researched and tried pretty much every library suggested to me. I am currently trying to write a program in Java that will extract text AND images from a PDF file and allow me to write the extracted content to a Word file. I have managed to extract the content using the ICEpdf library, but the problem is that I need to be able to write the content in the exact same order as it was read. So, to clarify, I need a library that will help me keep track of where exactly on the page the text and images are situated, so I can put them in the same place in my Word file.
A PDF to Word converter is a horribly complex proposition.
Your best bet will probably be to use Open Office to do it for you and not even try to handle the intermediate steps.
http://www.openoffice.org/api/
Look at this: Advanced PDF parser for Java
Off topic: to my knowledge there is also a Python parser that more or less converts the PDF to HTML (that way you can keep track of the ordering of the objects within the PDF). I know it's not Java, but you might be able to use the output.
http://www.unixuser.org/~euske/python/pdfminer/index.html
I'm trying to create an automated "spider diagram" like the ones created by VUE:
http://vue.tufts.edu/
VUE is open source, but the issue is that you create the maps in the program. I want to have a program that will pull the data from an Excel sheet and display the map automatically when run.
I know how to open and parse the data in files, so reading the file isn't the issue. I can program the behavior of how I want everything to "link up", but I just don't want to have to create an applet, then develop the software from scratch.
If I made anything unclear, let me know. I'm very tired today, so it's difficult to stay focused very long.
Many thanks!
-Justian
JGraph is a library to do that. You give it the node and edges and it figures out how to present them in a meaningful way. It is kind of like using graphviz but in Java.
For visualization of production runs we use graphviz out of process and show the images generated from that. It works fine, but a single process solution would be better.
Reading an Excel sheet exported as CSV should be straightforward. POI allows you to read the Excel files directly.
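As a rough sketch of the CSV-plus-graphviz route: the snippet below turns rows of the form `source,target` into DOT text that you could then hand to the `dot` program. The two-column layout and the class name `CsvToDot` are assumptions for illustration, and the naive `split(",")` does not handle quoted cells that contain commas.

```java
import java.util.List;

// Hypothetical helper: converts "source,target" CSV rows into a Graphviz DOT graph.
class CsvToDot {

    /** Builds an undirected DOT graph with one edge per CSV row. */
    static String toDot(List<String> csvLines) {
        StringBuilder dot = new StringBuilder("graph G {\n");
        for (String line : csvLines) {
            String[] cells = line.split(",", -1);
            if (cells.length < 2 || cells[0].trim().isEmpty() || cells[1].trim().isEmpty()) {
                continue; // skip blank or incomplete rows
            }
            dot.append("  ").append(quote(cells[0]))
               .append(" -- ").append(quote(cells[1])).append(";\n");
        }
        return dot.append("}\n").toString();
    }

    /** Wraps a label in double quotes, escaping any quotes inside it. */
    static String quote(String s) {
        return "\"" + s.trim().replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.print(toDot(List.of("Ideas,Research", "Ideas,Writing", "Research,Sources")));
    }
}
```

Saving the output to a file and running something like `dot -Tpng map.dot -o map.png` would produce the image; the file names here are just placeholders.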
I'm trying to generate some graphs with prefuse, and it seems like the easiest way to load the data into prefuse is to use a GraphML file.
Is there an easy way to write these files from my data?
Or is there an easier way to load my data into prefuse?
Thanks
yEd can export graphs in GraphML format and JGraphT has a GraphMLExporter. That leaves the problem of how to get your data into those products or libraries, but at least both can create the desired format.
On the other hand, GraphML is an XML format, so you can easily use jdom or dom4j to create a DOM, add the nodes based on your data and serialize it to an XML file. This shouldn't be too complicated.
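Here is a minimal sketch of that DOM approach. It uses the DOM and transformer classes that ship with the JDK rather than jdom or dom4j, so it runs without extra jars; the idea is the same with either library. The class name `GraphMlWriter` and the edge-list input are assumptions for illustration. It writes only bare nodes and edges; labels or other attributes would need GraphML `key` and `data` elements, which are left out here.

```java
import java.io.StringWriter;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: serializes an edge list as a bare-bones GraphML document.
class GraphMlWriter {

    static final String NS = "http://graphml.graphdrawing.org/xmlns";

    /** Each edge is a two-element array: {sourceId, targetId}. */
    static String toGraphMl(List<String[]> edges) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();

        Element root = doc.createElementNS(NS, "graphml");
        doc.appendChild(root);
        Element graph = doc.createElementNS(NS, "graph");
        graph.setAttribute("id", "G");
        graph.setAttribute("edgedefault", "undirected");
        root.appendChild(graph);

        // Collect the distinct node ids in first-seen order.
        Set<String> nodeIds = new LinkedHashSet<>();
        for (String[] e : edges) {
            nodeIds.add(e[0]);
            nodeIds.add(e[1]);
        }
        for (String id : nodeIds) {
            Element node = doc.createElementNS(NS, "node");
            node.setAttribute("id", id);
            graph.appendChild(node);
        }
        for (String[] e : edges) {
            Element edge = doc.createElementNS(NS, "edge");
            edge.setAttribute("source", e[0]);
            edge.setAttribute("target", e[1]);
            graph.appendChild(edge);
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toGraphMl(List.of(
                new String[] {"a", "b"},
                new String[] {"b", "c"})));
    }
}
```

Writing the returned string to a `.graphml` file gives you something to try loading; whether a particular reader needs extra attributes beyond this is worth checking against its documentation.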
You could use the Network Workbench, which allows you to load data in a lot of different forms including edge lists. Edge lists are usually the easiest format to generate.
I'm not completely sure if you can export from NWB to say GraphML, but NWB includes a number of visualizations, some of which are based on Prefuse.
If you want to do more with your data than just visualize it then NWB might help you.
Check PyGraphML, a basic Python library designed to parse and generate GraphML files. http://hadim.github.io/pygraphml/index.html
Can anyone tell me how to create doc files using java?
I know there's the POI library, but it seems like it can save only simple documents. You can read anything you wish, but you can't save it all back again. Or maybe I missed something? How can I save a whole document with pictures, tables and styles?
Docmosis lets you do heaps of styling easily via the template. It uses OpenOffice and Java to do the job. It's free and free to distribute.
It ain't cheap, but you could try Aspose.Words. It'll do everything you state above and more.
JODConverter will allow you to convert lots of different document formats
OpenOffice.org's Universal Network Objects (UNO) allow you to generate .doc and PDF files as well as OpenOffice documents. UNO supports several programming languages, such as Java, C++ and Visual Basic.
Some good things about it: it's free, open source and platform-independent.
You can build documents, spreadsheets, presentations, etc., either starting from scratch or using a template and filling in the gaps.
In order to use it you will need to include some libraries that come with the OpenOffice suite.
Useful links:
Open Office home
Open Office UNO Developer's Guide