I am trying to convert a PDF file into a WordPress post, but I am not able to preserve the formatting. I first convert the PDF to HTML using a Java PDF-to-HTML library, and then I paste the HTML code directly into the WordPress WYSIWYG editor, but the formatting is lost.
Can anybody explain how I can preserve the HTML formatting in the WYSIWYG editor?
PDFs aren't web documents by nature, so their structural layout doesn't translate very well to a web environment. With WordPress it's really hit-or-miss: a lot of elements won't translate well, and you'll probably have to do some reformatting yourself.
My requirement is just to show the DVT hierarchyView in HTML format in a PDF; I don't want to embed the Flash content into the PDF.
Also, the links above seem to suggest using Flex, which we are not comfortable with.
Do you have any other pointers?
In my application, users enter notes in the browser. These notes can be formatted (font, size, color, etc.) and are saved in the database as HTML strings.
Now I want to export this formatted text into PPTX. Is there any solution for this? I have tried Apache POI, which supports formatted text but does not accept an HTML string as input.
I am looking for an open-source library, so using Aspose is difficult. Somehow, I need to render this HTML text and then copy it as-is into the PPTX.
Any solution or approach would be helpful.
EDIT: I am thinking of parsing the HTML string myself: using JAXB to convert the tags into objects and then some Java logic to feed them into POI. Any way out or help on achieving this will be appreciated.
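A minimal sketch of the custom-parsing idea, assuming Java 17 and assuming the notes use only a few simple inline tags (`<b>`/`<strong>`, `<i>`/`<em>`, `<u>`, and a `color` on `<font>` or `<span>`). Instead of JAXB, which needs well-formed XML, it uses a regex tokenizer and a style stack to flatten the HTML into formatted runs. `HtmlRuns` and `Run` are hypothetical names, and this is not a general HTML parser: other tags are ignored and badly nested tags are handled loosely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flattens a few inline HTML tags into styled runs (not a full HTML parser).
public class HtmlRuns {

    /** A piece of text together with the formatting active when it appeared. */
    public record Run(String text, boolean bold, boolean italic, boolean underline, String color) {}

    /** Formatting state pushed for each recognised start tag. */
    private record Style(String tag, boolean bold, boolean italic, boolean underline, String color) {}

    // Group 1: "/" for end tags, group 2: tag name, group 3: raw attributes.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z]+)([^>]*)>");
    // Matches color="..." (font) or color: ... (style attribute), but not background-color.
    private static final Pattern COLOR =
            Pattern.compile("(?<![-\\w])color\\s*[=:]\\s*[\"']?([#\\w]+)", Pattern.CASE_INSENSITIVE);

    public static List<Run> parse(String html) {
        List<Run> runs = new ArrayList<>();
        Deque<Style> stack = new ArrayDeque<>();
        stack.push(new Style("", false, false, false, null)); // default formatting
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            addRun(runs, html.substring(last, m.start()), stack.peek());
            last = m.end();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            Style cur = stack.peek();
            if (m.group(1).isEmpty()) {
                switch (name) {
                    case "b", "strong" -> stack.push(new Style(name, true, cur.italic(), cur.underline(), cur.color()));
                    case "i", "em" -> stack.push(new Style(name, cur.bold(), true, cur.underline(), cur.color()));
                    case "u" -> stack.push(new Style(name, cur.bold(), cur.italic(), true, cur.color()));
                    case "font", "span" -> {
                        Matcher c = COLOR.matcher(m.group(3));
                        String color = c.find() ? c.group(1) : cur.color();
                        stack.push(new Style(name, cur.bold(), cur.italic(), cur.underline(), color));
                    }
                    default -> { } // other tags (p, br, ...) are ignored in this sketch
                }
            } else if (stack.size() > 1 && stack.peek().tag().equals(name)) {
                stack.pop(); // only close the tag if it matches the innermost open one
            }
        }
        addRun(runs, html.substring(last), stack.peek());
        return runs;
    }

    private static void addRun(List<Run> runs, String raw, Style s) {
        if (raw.isEmpty()) return;
        // Decode a few common entities; &amp; last so "&amp;lt;" becomes "&lt;", not "<".
        String text = raw.replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", "\"")
                .replace("&nbsp;", " ").replace("&amp;", "&");
        runs.add(new Run(text, s.bold(), s.italic(), s.underline(), s.color()));
    }
}
```

Each `Run` could then be mapped onto an Apache POI text run, for example via `XSLFTextParagraph.addNewTextRun()` followed by `setText`, `setBold`, `setItalic`, `setUnderlined` and `setFontColor(java.awt.Color.decode(...))` on the returned `XSLFTextRun`; check those calls against the POI version you use. Named colors such as `red` would need their own lookup, since `Color.decode` only accepts numeric strings like `#ff0000`.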
Aspose.Slides lets you import HTML text into a presentation and also export a presentation to HTML. I suggest visiting the following documentation link for this purpose. You are right that Aspose.Slide
I work as developer evangelist at Aspose.
I know that PDFBox and iText already exist, but they lack visual content extraction and they require working with the PDF offline. Moreover, I want a way to do text and visual content extraction online; I do not want to download the PDF file first and then process it. What kind of API or library is there for Java?
EDIT: for those who find this unclear, here is some more explanation:
Imagine using an HTML parser: you can parse a page online, build a DOM tree (or process SAX events), go through the elements, and extract images and text based on the content of the nodes. For images you can get the corresponding HTML tags, and for text you get the tags plus the actual text. Now I want to know whether there is anything similar for PDFs: going through text and images without downloading the file.
Gnostice PDFOne (for Java) has a getPageElements() method that can parse a PDF page for text and image elements. Text in a PDF is not organized in a DOM like an HTML or XML document; it is simply drawn at various x-y coordinates and only looks well formatted when rendered. However, PDFOne has some text extraction methods that reconstruct those text elements into readable sentences. DISCLOSURE: I work for the company that makes this library.
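To illustrate what reconstructing text from positions involves, here is a generic sketch (not PDFOne's algorithm, which isn't described here) that assumes some library has already returned text fragments with their x-y positions. It groups fragments whose y coordinates lie within a tolerance into one line, orders lines from the top of the page down (PDF user space puts the origin at the bottom-left by default, so a larger y is higher on the page), and orders fragments left to right within each line. `LineBuilder`, `Fragment` and `buildLines` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: rebuilds lines of text from positioned fragments.
public class LineBuilder {

    /** A text fragment at (x, y) in PDF user space, where y grows upward. */
    public record Fragment(double x, double y, String text) {}

    public static List<String> buildLines(List<Fragment> fragments, double tolerance) {
        // Top of the page first: larger y means higher on the page.
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble(Fragment::y).reversed());

        // Start a new line whenever y moves further than `tolerance` from the line's first fragment.
        List<List<Fragment>> lines = new ArrayList<>();
        double lineY = 0;
        for (Fragment f : sorted) {
            if (lines.isEmpty() || Math.abs(f.y() - lineY) > tolerance) {
                lines.add(new ArrayList<>());
                lineY = f.y();
            }
            lines.get(lines.size() - 1).add(f);
        }

        // Within each line, order fragments left to right and join them with single spaces.
        List<String> result = new ArrayList<>();
        for (List<Fragment> line : lines) {
            line.sort(Comparator.comparingDouble(Fragment::x));
            result.add(line.stream().map(Fragment::text).collect(Collectors.joining(" ")));
        }
        return result;
    }
}
```

Real extraction also has to cope with multi-column layouts, rotated text, and deciding whether two fragments need a space between them; this sketch simply joins fragments with a single space.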
PDFImageStream can do that. There is a free version with only one restriction: it can only be used in single-threaded applications.
After googling, I found some charting tools like amCharts, FusionCharts, etc., and PDF creation libraries like iText; the former only create charts and the latter only write to PDF. amCharts has an option to export to PDF, but it exports only the chart, not the HTML content. My web application has both HTML content (tables, text in tags) and charts. I have to write both the HTML content and the chart into a single PDF file using JavaScript or Java. Is there any way to do this? I need to generate a chart from the data in the table and write both the table and the generated chart to a PDF. Please help me.
I don't think converting HTML to PDF is a simple task, because HTML and PDF structures are very different. You need some engine (or library, if you prefer) to render/print the HTML file into PDF. The HTML file can then contain anything (text, images, elements laid out to look like a chart, etc.), and a good enough engine can render it.
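One way to follow this approach is to put the table and the chart into the same HTML document and hand that single document to an HTML-to-PDF engine. The sketch below, assuming non-negative values, builds the table and an inline SVG bar chart from the same data using only the JDK; the class name `ReportHtml` and the chart dimensions are arbitrary choices for illustration. Whether inline SVG is rendered depends on the engine you pick, so check its documentation; if it is not supported, the chart could be rendered to an image instead.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: one HTML page holding a table and an SVG bar chart of the same data.
public class ReportHtml {

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Assumes non-negative values; the map's iteration order decides row and bar order. */
    public static String build(Map<String, Double> data) {
        StringBuilder html = new StringBuilder("<html><body>\n<table border=\"1\">\n");
        html.append("<tr><th>Label</th><th>Value</th></tr>\n");
        double max = 0;
        for (Map.Entry<String, Double> e : data.entrySet()) {
            html.append("<tr><td>").append(escape(e.getKey())).append("</td><td>")
                .append(e.getValue()).append("</td></tr>\n");
            max = Math.max(max, e.getValue());
        }
        html.append("</table>\n");

        int barHeight = 20, gap = 5, labelWidth = 100, maxBarWidth = 300;
        int height = data.size() * (barHeight + gap);
        html.append(String.format(Locale.ROOT,
            "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"%d\" height=\"%d\">\n",
            labelWidth + maxBarWidth, height));
        int row = 0;
        for (Map.Entry<String, Double> e : data.entrySet()) {
            int y = row * (barHeight + gap);
            row++;
            // Scale each bar relative to the largest value; avoid dividing by zero.
            long width = max > 0 ? Math.round(e.getValue() / max * maxBarWidth) : 0;
            html.append(String.format(Locale.ROOT, "<text x=\"0\" y=\"%d\">%s</text>", y + 15, escape(e.getKey())));
            html.append(String.format(Locale.ROOT,
                "<rect x=\"%d\" y=\"%d\" width=\"%d\" height=\"%d\" fill=\"steelblue\"/>\n",
                labelWidth, y, width, barHeight));
        }
        html.append("</svg>\n</body></html>\n");
        return html.toString();
    }
}
```

Passing a `LinkedHashMap` keeps the rows and bars in insertion order.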
For a Java HTML-to-PDF library, see the thread "A java library for converting xml/html to pdf".
For a JavaScript HTML-to-PDF library, see this thread: https://stackoverflow.com/questions/20029132/javascript-pdf-generator-library
PS: Some versions of amCharts work with Flash only, and those charts cannot be rendered into PDF with such a library. I recommend using something else, for example d3.js, an excellent JavaScript library for data visualization (such as interactive charts).
Which Java APIs help with extracting table metadata from a PDF and presenting that table in a web page?
The result should be that when the page source is viewed, it shows the HTML code of that table.
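For the presentation half, here is a sketch assuming that a PDF library has already given you the table as rows of cell strings; detecting table structure in a PDF is the hard part and is not shown. Because the markup is generated on the server (for example in a servlet or JSP), it appears in the page source, which meets the requirement above. `TableToHtml` is a hypothetical name, and treating the first row as the header is an assumption about the extracted data.

```java
import java.util.List;

// Hypothetical helper: renders already-extracted table rows as HTML.
public class TableToHtml {

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** The first row is treated as the header row. */
    public static String toHtml(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<table>\n");
        for (int r = 0; r < rows.size(); r++) {
            String cellTag = (r == 0) ? "th" : "td";
            sb.append("  <tr>");
            for (String cell : rows.get(r)) {
                // Escape cell text so characters like & and < show up literally in the page.
                sb.append('<').append(cellTag).append('>')
                  .append(escape(cell))
                  .append("</").append(cellTag).append('>');
            }
            sb.append("</tr>\n");
        }
        return sb.append("</table>\n").toString();
    }
}
```

The returned string can be written straight into the response of whatever server-side view renders the page.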
iText is useful in this context:
http://itextpdf.com/
I assume you need a PDF library for Java.
PDFBox is one of the popular libraries for PDF manipulation, and I think it is worth a look.
Try the Metadata Extract Tool, which extracts metadata from specific file types, including PDF. Then you can parse the XML output with any Java XML parser. Once you can parse it, the elements can easily be laid out in your view page.
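A minimal sketch of the parsing step using the JDK's built-in DOM parser. The tool's exact XML schema isn't shown here, so rather than assume element names, the code collects every leaf element as a name/value pair that a view page could then display. `MetadataXml` is a hypothetical name, and if the same element name appears more than once, later values overwrite earlier ones.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: flattens an XML metadata report into name -> value pairs.
public class MetadataXml {

    /** Returns each leaf element's tag name and trimmed text, in document order. */
    public static Map<String, String> leaves(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> result = new LinkedHashMap<>();
            collect(doc.getDocumentElement(), result);
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse metadata XML", e);
        }
    }

    private static void collect(Element element, Map<String, String> out) {
        NodeList children = element.getChildNodes();
        boolean hasChildElements = false;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                collect((Element) child, out); // recurse into nested elements
            }
        }
        if (!hasChildElements) {
            // Leaf element: record its text; a repeated name overwrites the earlier value.
            out.put(element.getTagName(), element.getTextContent().trim());
        }
    }
}
```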