My requirement is just to show the DVT hierarchyView as HTML in a PDF; I don't want to embed the Flash content in the PDF.
Also, the links above seem to suggest using Flex, which we are not comfortable with.
Do you have any other pointers, please?
Related
I am trying to convert a PDF file to a WordPress post, but I am not able to preserve the formatting. I first convert the PDF to HTML using a PDF-to-HTML Java library, and then I copy the HTML code directly into the WordPress WYSIWYG editor, where the formatting is lost.
Can anybody explain to me how I would be able to preserve the HTML formatting in the WYSIWYG editor?
PDFs aren't web documents by nature, so their structural layout doesn't translate very well to a web environment. With WordPress it's really hit-or-miss. A lot of elements won't translate well; you'll probably have to do some re-formatting yourself.
I know this may sound silly to some of you experienced guys out there, but it's really important for me and my group at school. We need to create software that lets the user create a new RTF document from scratch (like an editor where you can center text, change font size and style, save, and insert pictures). It also needs to be able to read a .docx document, including its images and formatting, and save it as an RTF document.
So far we are able to open the .docx document, extract the text without formatting, and write it out to an RTF document. In other words, using the docx4j library we have been able to transform the text of a .docx document to .rtf: no pictures, no formatting, just plain text surrounded by [ ].
We made some progress today, but we can't figure out the next steps. Since the delivery date is in 72 hours, I thought it would be a good idea to ask for help from people more experienced than us.
Please leave your answers or ask for more info about the project; we'll be glad to learn from you guys.
To convert a .docx to .rtf use a library like https://code.google.com/p/jodconverter/. It will do all the heavy lifting for you.
Anyway, now about your editor itself. If I had to do it as fast as I could, I would use JavaFX to make my interface. There is a rich text editor control called HTMLEditor (http://docs.oracle.com/javafx/2/ui_controls/editor.htm) which you can just put into your application.
The trick here is that you can actually extract the HTML of the editor using getHtmlText(), and then you can convert the HTML to RTF using... yes, a library. I suspect that jodconverter can do this too, but if not, you can look at this question: Convert HTML to RTF in java?.
This should give you a better idea of how to do your project. There are Java libraries to handle conversion between HTML and RTF, so you can use an HTML editor (provided by JavaFX). And of course, a .docx can be converted to HTML too. Let libraries do all the dirty work :).
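To illustrate the HTML-to-RTF step, here is a minimal sketch that uses only the Swing text kits shipped with the JDK instead of an external library. The class and method names (HtmlToRtfSketch, htmlToRtf) are made up for this example, and the input is assumed to be the string returned by getHtmlText(). Expect only basic text to survive; images and elaborate CSS will most likely be lost, so treat it as a starting point rather than a full converter.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper: converts an HTML string to RTF with the JDK's Swing text kits.
final class HtmlToRtfSketch {

    static String htmlToRtf(String html) {
        HTMLEditorKit htmlKit = new HTMLEditorKit();
        RTFEditorKit rtfKit = new RTFEditorKit();
        Document doc = htmlKit.createDefaultDocument();
        // We already have decoded characters, so a <meta charset> tag can be ignored.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            // Parse the HTML into the document model...
            htmlKit.read(new StringReader(html), doc, 0);
            // ...then serialize the same model as RTF.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            rtfKit.write(out, doc, 0, doc.getLength());
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("HTML to RTF conversion failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(htmlToRtf("<html><body><p>Hello <b>world</b></p></body></html>"));
    }
}
```

In the editor, the call would look something like htmlToRtf(editor.getHtmlText()), with the result written to a .rtf file.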
I know about PDFBox and iText, but they can't extract visual content, and they need a local copy of the PDF to work on. I want a way to do text and visual content extraction online, without downloading the PDF file first. What API or library is there for Java?
EDIT: For those who find this unclear, here is some more detail:
Imagine using an HTML parser: you can parse a page online, build the DOM or SAX tree, walk its elements, and extract photos and text based on the content of the nodes. For photos you can at least get the corresponding HTML tags, and for text you can get the tags plus the actual text. Is there anything similar for PDFs, that is, a way to go through the text and images without downloading the file?
Gnostice PDFOne (for Java) has a getPageElements() method that can parse a PDF page for text and image elements. Text in a PDF is not in a DOM like an HTML or XML document. Text just appears at various x-y coordinates and magically looks well-formatted. However, PDFOne has some PDF text extraction methods that reconstruct those text elements into user-friendly sentences. DISCLOSURE: I work for the company that makes this library.
PDFImageStream can do that. There is a free version with only one restriction: it can only be used in single-threaded applications.
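One point worth adding: any parser still has to receive the PDF's bytes over the network; what can be avoided is saving a file to disk. Below is a minimal sketch, using only the JDK, that reads a PDF from a URL into memory and checks for the %PDF- header. The class and method names are hypothetical, and it assumes the header sits at the very start of the file. The resulting byte array (or the stream itself) would then be handed to whichever PDF library accepts in-memory input.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: loads a PDF into memory without writing a file to disk.
final class PdfStreamSketch {

    // Reads the whole stream into a byte array and rejects non-PDF content.
    static byte[] readPdf(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            byte[] bytes = buffer.toByteArray();
            if (!looksLikePdf(bytes)) {
                throw new IllegalArgumentException("Missing %PDF- header");
            }
            return bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumption: the header is at offset 0 (some files have leading junk).
    static boolean looksLikePdf(byte[] bytes) {
        return bytes.length >= 5
                && new String(bytes, 0, 5, StandardCharsets.US_ASCII).equals("%PDF-");
    }

    // Opens the URL and streams the response body straight into memory.
    static byte[] fetchPdf(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return readPdf(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] pdf = fetchPdf(args[0]); // URL of the PDF, supplied by the caller
        System.out.println("Fetched " + pdf.length + " bytes into memory");
    }
}
```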
While googling, I found graph-creation software such as amcharts and fusion charts, and PDF-creation software such as iText, but these only create graphs or only write to PDF, respectively. Amcharts has an export-to-PDF option, but it exports only the graph data, not the HTML data. My web application has both HTML data (tables, text in tags) and graphs. I have to write both the HTML content and the graph data to a single PDF file using JavaScript or Java. Is there any way to do this? I need to generate a graph from the data in the table and write both the table and the generated graph to the PDF. Please help me.
I don't think that converting HTML to PDF is a simple task, because the HTML and PDF structures are very different. You need an engine (or a library, if you prefer) to render/print the HTML file into a PDF. The HTML file can then contain anything (text, images, elements laid out to look like a graph, etc.), and the engine can render it if it is good enough.
For an HTML-to-PDF Java library, see the thread "A java library for converting xml/html to pdf".
For an HTML-to-PDF JavaScript library, see this thread: https://stackoverflow.com/questions/20029132/javascript-pdf-generator-library
PS: Some versions of amcharts work with Flash only, and Flash content cannot be rendered into a PDF by a library. I recommend using something else, for example d3.js; it is an awesome JavaScript library for data visualization (such as interactive charts).
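If you go the Java route, one option is to render the chart as an image on the server and then place that image next to the table with a PDF library such as iText. Here is a minimal sketch of the chart-rendering half only, using plain java.awt. The class name, method names, colors and sizes are all assumptions made for the example, and the PDF-writing step is left to whichever library you pick.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper: draws a plain bar chart from table values and encodes it as PNG.
final class BarChartSketch {

    static final Color BAR_COLOR = new Color(0x33, 0x66, 0x99);

    // Renders one bar per value, scaled so the largest value nearly fills the height.
    static BufferedImage render(double[] values, int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            double max = 0;
            for (double v : values) {
                max = Math.max(max, v);
            }
            if (values.length == 0 || max <= 0) {
                return image; // nothing to draw
            }
            int slot = width / values.length;          // horizontal space per bar
            int barWidth = Math.max(1, slot * 3 / 4);  // leave a gap between bars
            g.setColor(BAR_COLOR);
            for (int i = 0; i < values.length; i++) {
                int barHeight = (int) Math.round(values[i] / max * (height - 10));
                int x = i * slot + (slot - barWidth) / 2;
                g.fillRect(x, height - barHeight, barWidth, barHeight);
            }
            return image;
        } finally {
            g.dispose();
        }
    }

    // Encodes the image as PNG bytes, ready to hand to a PDF library as an image.
    static byte[] toPng(BufferedImage image) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(image, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] png = toPng(render(new double[] {1, 3, 2}, 300, 100));
        System.out.println("PNG size: " + png.length + " bytes");
    }
}
```

The same values that fill the HTML table can be passed to render(), so the table and the chart in the PDF come from one data source.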
I am open to alternative solutions, so here is my problem.
I have 111 PDFs that contain information on various degree programs. I can convert them to HTML using freeware.
The problem with the HTML is that it contains CSS, JEditorPane doesn't display the webpage, and the PDF libraries are slow and bulky.
I want to have a JComboBox where users can select a page to view, and have it appear below the box.
Any ideas on the best method?
Use the iText API for PDFs and the DJ Project for HTML/web pages.
Most libraries that convert PDF to HTML can also convert PDF to image.
If you can convert to SVG and display it using one of the SVG libraries (e.g. Batik), that would be one way to display it without losing functionality such as zooming in and out.
Otherwise you can convert PDF to high-res JPG/PNG and display it in your app.
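For the image approach, here is a minimal Swing sketch of a JComboBox with the selected page shown below it. It assumes each PDF has already been converted to a PNG named after its program (for example pages/Biology.png); that naming scheme, the pages directory and all class and method names are made up for the example. Images are loaded only when selected, so the 111 pages are not all held in memory at once.

```java
import java.awt.BorderLayout;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import javax.imageio.ImageIO;
import javax.swing.ImageIcon;
import javax.swing.JComboBox;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.SwingUtilities;

// Hypothetical viewer: pick a program in the combo box, see its page image below.
final class PageViewerSketch {

    // Assumed naming scheme: one pre-converted "<program>.png" per PDF.
    static File imageFileFor(File dir, String program) {
        return new File(dir, program + ".png");
    }

    // Loads the image lazily and shows it, or a short message if it cannot be read.
    static void showPage(JLabel target, File png) {
        try {
            BufferedImage image = ImageIO.read(png); // null if the format is not decodable
            target.setIcon(image == null ? null : new ImageIcon(image));
            target.setText(image == null ? "Cannot decode " + png.getName() : null);
        } catch (IOException e) {
            target.setIcon(null);
            target.setText("Cannot read " + png.getName());
        }
    }

    static JPanel buildViewer(File dir, String[] programs) {
        JComboBox<String> selector = new JComboBox<>(programs);
        JLabel page = new JLabel();
        selector.addActionListener(e -> {
            Object selected = selector.getSelectedItem();
            if (selected != null) {
                showPage(page, imageFileFor(dir, (String) selected));
            }
        });
        if (programs.length > 0) {
            showPage(page, imageFileFor(dir, programs[0])); // initial selection
        }
        JPanel panel = new JPanel(new BorderLayout());
        panel.add(selector, BorderLayout.NORTH);
        panel.add(new JScrollPane(page), BorderLayout.CENTER); // scroll large pages
        return panel;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "pages");
        String[] names = dir.list((d, n) -> n.endsWith(".png"));
        String[] programs = names == null ? new String[0]
                : Arrays.stream(names).map(n -> n.substring(0, n.length() - 4))
                        .sorted().toArray(String[]::new);
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Degree programs");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(buildViewer(dir, programs));
            frame.setSize(800, 600);
            frame.setVisible(true);
        });
    }
}
```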