It used to be, and remains, possible for a program to output (Encapsulated) PostScript simply by writing a few lines to a text file. To draw an 'x', for instance, one might write
%!PS
%%BoundingBox: 0 0 100 100
newpath 100 0 moveto 0 100 lineto stroke
newpath 0 0 moveto 100 100 lineto stroke
showpage
Is there an equivalent method to output PDF?
Edit
Please do suggest an inelegant, regular, or luxurious way to output PDF.
An inelegant method would be one that still goes through EPS. A regular one would be one that parallels the EPS text file above. A luxurious method would be a comfortable API/library.
Edit2
A "regular" solution is platform neutral, but a solution in the first or third category is not. So let me clarify that I am looking for a solution using Java on Android.
As discussed in the copious comments on your question, the ISO 32000-1 standard for the PDF file format can be found here. (Thanks to @mkl for the updated link.)
It may not be trivial, but it would certainly be possible to create PDF files from scratch by using the most appropriate parts of this document for your application.
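As a rough illustration of the from-scratch route, here is a minimal sketch that writes a one-page PDF drawing the same 'x' as the PostScript example in the question. The class name MinimalPdf, the method buildCrossPdf and the file name cross.pdf are invented for this example. It uses only java.io and java.util, so it should carry over to Android, but that has not been verified here. The main difference from PostScript is the bookkeeping: every object's byte offset has to be recorded in the cross-reference table at the end of the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class MinimalPdf {

    /** Builds a one-page, 100x100 pt PDF that strokes the two diagonals of the page. */
    static byte[] buildCrossPdf() {
        // Content stream: m = moveto, l = lineto, S = stroke (PDF path operators).
        String content = "100 0 m 0 100 l S\n0 0 m 100 100 l S";
        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 100 100]"
                + " /Resources << >> /Contents 4 0 R >>",
            "<< /Length " + content.length() + " >>\nstream\n" + content + "\nendstream"
        };

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] offsets = new int[objects.length];
        append(out, "%PDF-1.4\n");
        for (int i = 0; i < objects.length; i++) {
            offsets[i] = out.size(); // byte offset of "N 0 obj", needed by the xref table
            append(out, (i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n");
        }

        int xrefOffset = out.size();
        StringBuilder tail = new StringBuilder();
        tail.append("xref\n0 ").append(objects.length + 1).append('\n');
        tail.append("0000000000 65535 f \n"); // entry 0 is the head of the free list
        for (int offset : offsets) {
            // Each entry must be exactly 20 bytes, hence the space before '\n'.
            tail.append(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        tail.append("trailer\n<< /Size ").append(objects.length + 1).append(" /Root 1 0 R >>\n");
        tail.append("startxref\n").append(xrefOffset).append("\n%%EOF\n");
        append(out, tail.toString());
        return out.toByteArray();
    }

    private static void append(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) throws IOException {
        try (FileOutputStream file = new FileOutputStream("cross.pdf")) {
            file.write(buildCrossPdf());
        }
    }
}
```

Text, fonts and images need considerably more of the specification than this; for anything beyond simple line drawings a library is probably the more comfortable choice.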
Well, if I understood you correctly, I might have done something similar in one of my Android applications.
I implemented a domain-specific language (DSL) for interactive questionnaires whose results are exported as PDF. For the PDF creation I used the open source iText library. So you can create a DSL, plus an API that sits between your DSL and a PDF creation and manipulation library such as iText. However, I don't know whether you are interested in a ready-made solution or want to develop something from scratch, so I am not sure if this helps.
You can find the code here on GitHub, with some more details on the DSL (syntax etc.).
And here is the demo app on Google Play.
PDF is not a simple text format, so it is not that simple. But it should not be difficult to integrate a Java PDF library in your app, see iText, for example.
Related
I am currently working on a project which generates contracts. The idea is that I enter the data in a form and save it in a simple database.
So far, this has been my favorite place to search for good ideas and simple solutions.
Now I am facing another problem and I don't know how to solve it. I want to create a PDF and replace some placeholders with data from my form.
One idea was to use an existing Word template with some bookmarks and replace them with the data from my form. Maybe there is a way to do that, and I am just too stupid to find it.
Another idea was to use XML. I thought I was being clever and converted the Word template to a PDF, so that I could convert that PDF to XML. Attached, you will find the XML file. But now I need the XSL file - is there an easy way to create the XSL file?
Or maybe there is another simple solution to solve my problem.
In these attachments you find the PDF file, the Word template and the XML:
Thank you a lot :)
Using a template is a good idea - it makes some changes much quicker to make and then deploy. The comments above are focused on conversion, but don't forget you need to merge your data in (population) first.
If you can use Adobe tools, you can have a PDF template and use the Adobe tools to populate. This saves a "conversion" stage.
You mentioned using Word for templates. This means you have to run through two stages of processing:
population - docx is a zipped set of XML files, so you can process them with your own code or with a library.
conversion - you need PDF, so you have to convert the docx to PDF. You also have to watch out for fonts at this stage (i.e. make sure they are available on your host).
The population stage you could do yourself since you are familiar with XML. But it is definitely complicated. The conversion needs to use a tool that is ideal for it. There are a few mentioned in the comments already.
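To give a feel for the do-it-yourself population stage, here is a minimal sketch using only java.util.zip. It assumes the template contains placeholders written as ${key}; that syntax, the class name DocxPopulator and the method populate are invented for this example and are not part of Word or of any tool listed here. It also assumes each placeholder sits inside a single text run. Word frequently splits text across several runs, which is one reason this stage gets complicated in practice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class DocxPopulator {

    /** Copies the docx entry by entry, replacing ${key} placeholders in word/document.xml. */
    static byte[] populate(byte[] docx, Map<String, String> values) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(docx));
             ZipOutputStream out = new ZipOutputStream(result)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] data = readAll(in);
                if (entry.getName().equals("word/document.xml")) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    for (Map.Entry<String, String> e : values.entrySet()) {
                        // Values go into XML text nodes, so they must be escaped.
                        xml = xml.replace("${" + e.getKey() + "}", escapeXml(e.getValue()));
                    }
                    data = xml.getBytes(StandardCharsets.UTF_8);
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                out.write(data);
                out.closeEntry();
            }
        }
        // The zip is only complete once the output stream has been closed above.
        return result.toByteArray();
    }

    /** Reads the current zip entry to its end. */
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The result is still a docx, so the conversion stage to PDF is needed afterwards.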
There are some free/open-source and commercial tools that can do both parts:
docx4j
JOD Reports
Libre Office (using the Java Uno API) (I blogged this once - Java Convert Word to PDF with UNO)
Docmosis (please note I work for Docmosis)
I suggest starting with the simple example you have attached and proving you can both populate and convert that. Then switch to a more complicated example to see if you can do the other things that might be required (e.g. repeating sections, conditions or other logic) during the population stage.
I have checked multiple links and found two options for editing MS Visio files from Java code:
Apache POI - HDGF and XDGF - Java API To Access Microsoft Visio Format Files
Aspose.diagram APIs
Has anyone written Java code using either of the above options?
I am using the Eclipse IDE.
Also, please suggest a better third way to edit MS Visio files from Java code, if there is one.
If you are talking about libraries, these are basically the two. As far as I know, Apache POI can't create diagrams, only read them, if I am not mistaken - but please verify, maybe something has changed since I last looked at it ten years ago.
So this basically leaves you with a single choice. Or you can always spend a few years and write it all yourself. Well, one does not simply walk into Mordor, or create Visio files with Java.
Maybe you could consider using SVG instead, that can be generated and consumed by basically anything? Visio can also read and write SVG out of the box.
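To show how little is needed to generate SVG, here is a minimal sketch that emits two labelled boxes joined by a line using plain string formatting. The class SvgDiagram, its methods and the chosen coordinates are made up for this example, and how well Visio imports such a file has not been checked here.

```java
import java.util.Locale;

class SvgDiagram {

    /** One rectangle with a label centred inside it. */
    static String box(int x, int y, int width, int height, String label) {
        return String.format(Locale.ROOT,
                "  <rect x=\"%d\" y=\"%d\" width=\"%d\" height=\"%d\" fill=\"white\" stroke=\"black\"/>\n"
              + "  <text x=\"%d\" y=\"%d\" text-anchor=\"middle\" dominant-baseline=\"middle\">%s</text>\n",
                x, y, width, height, x + width / 2, y + height / 2, escape(label));
    }

    /** A straight connector between two points. */
    static String connector(int x1, int y1, int x2, int y2) {
        return String.format(Locale.ROOT,
                "  <line x1=\"%d\" y1=\"%d\" x2=\"%d\" y2=\"%d\" stroke=\"black\"/>\n",
                x1, y1, x2, y2);
    }

    /** A complete SVG document: left box, connector, right box. */
    static String twoBoxDiagram(String left, String right) {
        return "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"320\" height=\"100\""
                + " viewBox=\"0 0 320 100\">\n"
                + box(20, 30, 100, 40, left)
                + connector(120, 50, 200, 50)
                + box(200, 30, 100, 40, right)
                + "</svg>\n";
    }

    // Labels end up in XML text nodes, so the markup characters must be escaped.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.print(twoBoxDiagram("Start", "End"));
    }
}
```

Redirecting the output to a file with an .svg extension gives something a browser can display directly.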
Are there any Java APIs or tools that can convert handwritten scanned documents to text files?
I have tried Google Tesseract and a few other tools, but I am not getting satisfactory results for handwritten scanned documents.
Strange that the other answers here point to OCR tools while the question clearly asks about handwriting recognition.
Handwriting is an even more difficult area than OCR, and the number of technologies available is very small. I don't think you will be able to find any open source tool for that, but there are a few commercial vendors:
http://www.a2ia.com
http://www.parascript.com/
I don't know if they have a Java API, but the best way to start your research is to contact them.
You can try the Java OCR Project. I think you might have to write the part that saves the result to a text file yourself, though.
Also, hand writing tends to vary from one individual to another, so I guess you will need to select some good training data to get good results.
Have a look at these:
Java OCR
Java OCR is a suite of pure Java libraries for image processing and character recognition. It provides a modular structure for easier deployment.
GOCR
GOCR is an OCR program, developed under the GNU Public License. It converts scanned images of text back to text files.
I've been researching how to extract images from a big (> 300 MB) PDF file. I'm using pdfbox, but for some reason that I can't figure out, some pages are not extracted correctly.
I'm using the PDFToImage class of pdfbox as base for my code.
So, do you know another library that may help me to do this? I know that iText may be used, but I read that it can't be used for commercial products.
I've installed the packages xpdf and xpdf-utils, and the utility called pdfimages works perfectly. But I need to solve this problem from Java, and the solution should be portable.
I think you're talking about two different things here: extracting images from a PDF, and converting PDF pages to images. PDFToImage will output an image for every page, while pdfimages extracts all embedded images (e.g. a text document has 0 images).
Take a look at org.apache.pdfbox.tools.ExtractImages (source code) to see if it does what you want.
The most likely reason why it is hard to work with 300 MB PDFs is that you run out of memory. If it works well for smaller PDFs, I would have a closer look at why it fails for the large one.
Have you tried icepdf or JPedal (both pure java)?
I am using the iText API to generate RTF using Java. The RTF file is generated fine, but one requirement is adding a barcode. What I did is:
FontFactory.register("c:\\windows\\fonts\\FREE3OF9.ttf", "Free 3 of 9 Extended");
return FontFactory.getFont("Free 3 of 9 Extended",20, Font.NORMAL, Color.BLACK);
I tried loading other fonts; that was working fine, but it doesn't work when I use the barcode font (FREE3OF9.ttf).
The RTF file is generated, but the "Font name" shows as "New" instead of "Free 3 of 9 Extended" when I open it in MS Word. When I select the words and choose the font name manually, the barcode appears fine.
I think there is a problem with Free 3 of 9 Extended Font.
Odd. I don't really have The Answer, but I can certainly offer some advice.
Check the return value from getFont(...). It's entirely possible that the problem is in registration, and you're just getting back a default font. I don't see why it'd be called "New", but definitely worth checking.
If that's not the problem, have a look at your raw RTF output. Is the font in question really "New" or is MS mangling it on the way in?
Check MS's RTF output when you manually pick the font vs your own.
Get the iText source, step through it, see what's wrong.
The source link is to iText v2.1.7, the last version that supported RTF. The guy who worked on it vanished into the web some time prior to that, so we stopped supporting it with 5.0 (along with the licensing change, package rename, and so forth).
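For the "look at your raw RTF output" step, a small sketch like the following can list what the font table of a generated file actually declares, which makes it easier to compare against the file Word writes. RtfFontTable and fontNames are names made up for this example, and the parsing is deliberately simplistic: it only handles plain entries of the form {\fN\fcontrol... Name;} and ignorable {\*...} groups, not every construct the RTF format allows.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RtfFontTable {

    // \fN, optional control words such as \froman or \fcharset0, whitespace, then the name up to ';'.
    private static final Pattern FONT_ENTRY =
            Pattern.compile("\\\\f(\\d+)(?:\\\\[a-z]+-?\\d*)*\\s([^;{}\\\\]+);");

    /** Returns font number -> font name as declared in the {\fonttbl ...} group. */
    static Map<Integer, String> fontNames(String rtf) {
        Map<Integer, String> fonts = new LinkedHashMap<>();
        int start = rtf.indexOf("{\\fonttbl");
        if (start < 0) {
            return fonts; // no font table at all
        }
        // Find the end of the font table group by counting braces.
        int depth = 0;
        int end = start;
        for (; end < rtf.length(); end++) {
            char c = rtf.charAt(end);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                break;
            }
        }
        String table = rtf.substring(start, Math.min(end + 1, rtf.length()));
        // Replace ignorable groups like {\*\panose ...} with a space so the name still parses.
        table = table.replaceAll("\\{\\\\\\*[^{}]*\\}", " ");
        Matcher m = FONT_ENTRY.matcher(table);
        while (m.find()) {
            fonts.put(Integer.valueOf(m.group(1)), m.group(2).trim());
        }
        return fonts;
    }
}
```

Running it on both the iText output and a copy re-saved from Word after picking the font manually shows whether the declared names differ or whether the difference is elsewhere in the file.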