Any clue about a good library to programmatically produce a PDF in Java, using a PDF as a template?
Try iText. It has many useful features. There is also a book about it from Manning, iText in Action.
If you want to be able to edit the text in the template, you should set up the template carefully and use forms for the text content. You can't easily replace text in PDFs because they do not contain text structure.
There is a blog article highlighting some of the issues at http://pdf.jpedal.org/java-pdf-blog/bid/17370/Problems-editing-PDF-files
Related
I have a PDF file that was produced with iText and created with JasperReports (I don't know if it's relevant) and I was wondering if I can find some API or anything to see the structure because I need to extract text from it.
I tried with iText, PDFBox and other Java libraries but I only get text line by line and that's not what I need.
I also tried conversion to HTML, XML and DOM, but I get the same result as with text extraction: no structure is parsed.
If I open it as DOCX, I see that Word recognizes some structure. For example, an area that looks like a table in the PDF is an actual table after conversion to DOCX.
I need to understand how the PDF was created, if this is possible. I know that working with PDFs is not easy, but I need to start with something useful. Thanks!
PDFTron PDFGenie can do full semantic table and paragraph extraction from a PDF file. It can generate a reflowable HTML file containing all the appropriate HTML tags for tables and paragraphs.
See this blog for more details.
https://www.pdftron.com/blog/parsing-extraction/table-extraction-and-pdf-to-xml-with-pdfgenie/#a-idpart7aevaluating-accuracy-of-pdf-table-recognition
You can download Windows/macOS/Linux PDFGenie command line tool here.
https://www.pdftron.com/downloads/linux
One more option: Aspose.PDF can also extract text by paragraphs. If you are interested, have a look at the link below.
https://blog.aspose.com/2018/02/28/extract-text-by-paragraphs-and-convert-files-to-pdf-with-aspose.pdf/
In my application, users enter notes in the browser. These notes can be formatted (font, size, color, etc.) and are saved in the database as strings of HTML tags.
Now I want to export this formatted text to PPTX. Is there any solution for it? So far I have tried Apache POI, which supports formatted text but does not accept an HTML string as input.
I am looking for an open source library, so using Aspose is difficult. Somehow, I need to render this HTML text and then copy it as it is to PPTX.
Any solution or approach will be helpful.
EDIT: I am thinking of custom parsing the HTML string: using JAXB to convert the tags into objects and then using some Java logic to integrate POI with it. Any approach or help on achieving this will be appreciated.
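The custom-parsing idea from the EDIT could look roughly like the sketch below. It is only an illustration under assumptions: it uses a small regex tokenizer instead of JAXB, it handles only bold, italic, underline and line breaks (not font, size or color), and the names `HtmlRuns` and `Run` are made up for this example and are not part of POI.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: flattens a small HTML subset into styled text runs. */
final class HtmlRuns {

    /** One stretch of text with uniform formatting (illustrative type, not a POI class). */
    static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;
        final boolean underline;

        Run(String text, boolean bold, boolean italic, boolean underline) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
            this.underline = underline;
        }
    }

    // Matches either a tag such as <b>, </i> or <br/>, or a stretch of plain text.
    private static final Pattern TOKEN = Pattern.compile("<(/?)(\\w+)[^>]*>|([^<]+)");

    static List<Run> parse(String html) {
        List<Run> runs = new ArrayList<>();
        int bold = 0, italic = 0, underline = 0; // nesting depth of each style
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(3) != null) { // plain text: emit a run with the current styles
                runs.add(new Run(unescape(m.group(3)), bold > 0, italic > 0, underline > 0));
                continue;
            }
            int delta = m.group(1).isEmpty() ? 1 : -1; // opening tag adds, closing tag removes
            String tag = m.group(2).toLowerCase(Locale.ROOT);
            if (tag.equals("b") || tag.equals("strong")) {
                bold = Math.max(0, bold + delta);
            } else if (tag.equals("i") || tag.equals("em")) {
                italic = Math.max(0, italic + delta);
            } else if (tag.equals("u")) {
                underline = Math.max(0, underline + delta);
            } else if (tag.equals("br") && delta > 0) {
                runs.add(new Run("\n", false, false, false)); // line break as its own run
            }
            // Any other tag (font, span, ...) is ignored in this sketch.
        }
        return runs;
    }

    // Decodes only the most common entities; &amp; goes last to avoid double decoding.
    private static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&nbsp;", " ")
                .replace("&amp;", "&");
    }
}
```

Each `Run` could then be copied into the slide, for example by creating one POI text run per `Run` and setting its bold, italic and underline flags. A real HTML parser would be more robust than this regex for anything beyond simple, well-formed note markup.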
Aspose.Slides lets you import HTML text into a presentation and also export a presentation to HTML. I suggest visiting the following documentation link for this purpose. You are right that Aspose.Slide
I work as a developer evangelist at Aspose.
I have a project in which I have to get title and author information from inside the PDF file (not from the metadata). So I am trying to read text from the PDF at given coordinates and to get the fonts of the text.
Is there any way to do that? Can anyone give advice? Or is there another solution for my project?
Thanks for any help and thoughts you share with me.
There are multiple PDF libraries for Java which allow you to extract text, my favourite being iText. For examples of text parsing, have a look at ExtractPageContentArea and other examples from chapter 15 of iText in Action, 2nd edition.
Currently there is no example making use of the font information, but the information is available to the RenderListeners.
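Once position and font size have been collected for each piece of text (for instance inside a RenderListener), a simple heuristic can pick a title candidate. The sketch below is an assumption-laden illustration, not an iText example: `Chunk` and `TitleGuesser` are hypothetical types, and the rule "the largest font on the first page is the title" is only a guess that will fail on some documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical heuristic: treats the largest text on the first page as the title. */
final class TitleGuesser {

    /** A piece of extracted text with its position and font size (illustrative type). */
    static final class Chunk {
        final String text;
        final float x;
        final float y;
        final float fontSize;

        Chunk(String text, float x, float y, float fontSize) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.fontSize = fontSize;
        }
    }

    static String guessTitle(List<Chunk> firstPageChunks) {
        // Find the largest font size used on the page.
        float max = 0f;
        for (Chunk c : firstPageChunks) {
            max = Math.max(max, c.fontSize);
        }
        // Keep only the chunks set in (almost exactly) that size.
        List<Chunk> titleChunks = new ArrayList<>();
        for (Chunk c : firstPageChunks) {
            if (max - c.fontSize < 0.01f) {
                titleChunks.add(c);
            }
        }
        // PDF coordinates start at the bottom-left corner, so a larger y is higher
        // on the page: sort top to bottom, then left to right.
        titleChunks.sort(Comparator.comparingDouble((Chunk c) -> -c.y)
                .thenComparingDouble(c -> c.x));
        StringBuilder title = new StringBuilder();
        for (Chunk c : titleChunks) {
            if (title.length() > 0) {
                title.append(' ');
            }
            title.append(c.text);
        }
        return title.toString();
    }
}
```

The author line could be guessed in a similar way, for example as the first text below the title, but that is again only a heuristic.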
Which Java APIs help in extracting table metadata from a PDF and presenting that table in a web page?
The result should be that viewing the source of the page shows the HTML code of that table.
iText is useful in this context:
http://itextpdf.com/
I assume you need a PDF library for Java.
PDFBox is one of the popular libraries for PDF manipulation, and I think it is worth a look.
Try The Metadata Extract Tool, which extracts metadata from specific file types including PDF. Then you can parse the XML output with any Java XML parser. Once you're able to parse it, the elements can easily be laid out in your view page.
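As a rough sketch of the "parse the XML, then lay it out" step, the example below uses the JDK's built-in DOM parser to turn table-like XML into an HTML table. The input shape (`row` and `cell` elements) is purely an assumption for illustration; the real output of the extraction tool will use different element names, and the class name `XmlTableToHtml` is made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical converter from <row>/<cell> XML (assumed shape) to an HTML table. */
final class XmlTableToHtml {

    static String toHtmlTable(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Limit risky XML features when the input is not fully trusted.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        StringBuilder html = new StringBuilder("<table>\n");
        NodeList rows = doc.getElementsByTagName("row");
        for (int r = 0; r < rows.getLength(); r++) {
            html.append("  <tr>");
            NodeList cells = ((Element) rows.item(r)).getElementsByTagName("cell");
            for (int c = 0; c < cells.getLength(); c++) {
                html.append("<td>").append(escape(cells.item(c).getTextContent())).append("</td>");
            }
            html.append("</tr>\n");
        }
        return html.append("</table>").toString();
    }

    // Escapes the characters that would otherwise break the generated HTML.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The returned string can be written into the page by a servlet or template, so that viewing the page source shows the table markup.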
I was looking at using iText to create both a PDF and an HTML version of a document, with RTF as a possible option. According to this question, this is no longer possible with iText. Is there a library that will allow me to create a document in Java and output it as both PDF and HTML? The ability to output RTF would be nice but is not required.
As that answer to the other question states, you can just use the iText RTF Library.
I have used PD4ML to convert HTML to PDF. Even though it is a commercial app, it is very reliable and supports CSS well.
JasperReports. If you look at this package it supports export to:
pdf
html
rtf
xls
xml
You have two options to create the documents:
via iReport - a visual designer for reports
via an API, where you construct everything with Java code.
Note that even though JasperReports's main function is to create reports, it can very well create other documents, with no tabular data for example.
You could also try Docmosis since that supports the output formats provided by OpenOffice (including the ones you specified) and you can often do the job with a lot less code.