Open Source HTML to PDF in Java (2014) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I've been searching high and low for an up-to-date solution to this age-old problem.
Long story short, I want to take CSS + HTML -> PDF and do it in Java.
I don't want to use an external API, as the data is sensitive. Googling turns up countless sites and services that offer to do this, but I'm looking for a standalone tool that will work nicely from my Java server. I've found this awesome-looking command line tool, but spawning processes from a web server starts to get sketchy IMO (though I'm always willing to hear otherwise). Additionally, Flying Saucer seems to be a standard choice, but I've heard mixed reviews.
Here is a 5-year-old question on the subject, but I figure things have changed! Especially with all the work being done on front-end unit testing with DOM manipulation, I figure there might be some less-than-conventional solutions, and I'm willing to hear them all!
Any help would be greatly appreciated.

You might try a combination: CSSBox, which converts HTML+CSS to SVG, and then, for example, Batik to create the PDF, as proposed for example here. Flying Saucer could also do the job.
The choice depends on your further requirements. For example, are you processing "street HTML" or well-formed documents? What about page breaks in the resulting PDF? What about interactive elements in the HTML pages?
The only real way forward is to try at least a few options in practice; then you can ask more specific questions about particular problems.
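The "street HTML" versus well-formed distinction matters in practice, because Flying Saucer works on well-formed XHTML. A minimal sketch of how one might test an input for well-formedness using only the JDK's XML parser; the class and method names are made up for illustration, and real street HTML would more likely need a cleaning step than a simple rejection.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of any library mentioned above.
class WellFormedCheck {

    // Returns true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve external entities (such as a DOCTYPE's DTD) to empty input
            // so parsing never touches the network. As a consequence, named
            // entities that only the DTD defines will not be known to the parser.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            // Rethrow parse errors instead of printing them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e)
                        throws SAXParseException { throw e; }
                @Override public void fatalError(SAXParseException e)
                        throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (Exception e) {
            // Any parser or I/O failure is treated as "not well-formed".
            return false;
        }
    }
}
```

Calling `WellFormedCheck.isWellFormed(...)` on a document with an unclosed `<br>` returns false, which is a hint that the input needs tidying before it goes to an XML-based renderer.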

Related

Is there a Java library for all HTML tag names? [closed]

Closed 5 years ago.
I was wondering whether there is a Java library that contains all HTML tags. I am writing Selenium tests for fairly complex web sites using the Java binding, and I often need to find an element by tag name. I thought a class with constants for each tag name would be nice. Since there is a finite list of HTML tags, I'm thinking this must already exist. I could write my own, of course, but why reinvent the wheel if there is one out there? I have checked the Selenium Java API documentation but can't find one. Any suggestions?

No, I do not believe such a library is currently available.
Although there is a finite number of standard tags in HTML, it is also possible to have user-defined tags. There are also different versions of HTML (the current one is HTML5) that support different sets of tags; some tags are no longer supported in HTML5 but do exist in older versions of HTML. These two factors may make it difficult to create a definitive library of all tags.
The best option would probably be to create your own library of constants, tailored to the project.
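A project-local constants class could look roughly like this. It is only a sketch: the class name `HtmlTags` and the selection of tags are arbitrary choices for illustration, to be extended or trimmed to match the pages under test.

```java
// Hypothetical project-local constants; not taken from Selenium or any other library.
final class HtmlTags {

    private HtmlTags() { } // constants holder, not meant to be instantiated

    static final String A = "a";
    static final String BUTTON = "button";
    static final String DIV = "div";
    static final String FORM = "form";
    static final String IMG = "img";
    static final String INPUT = "input";
    static final String LI = "li";
    static final String SELECT = "select";
    static final String SPAN = "span";
    static final String TABLE = "table";
    static final String TD = "td";
    static final String TR = "tr";
    static final String UL = "ul";
}
```

With Selenium's Java binding this would be used as `driver.findElements(By.tagName(HtmlTags.DIV))`. Plain `String` constants are used rather than an enum so they can be passed straight to `By.tagName` without a conversion call.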

What is the best way for a Java developer to generate Javascript without writing Javascript [closed]

Closed 8 years ago.
I am an experienced Java programmer, and I'm trying to create a website with much of its content based on dynamic data from a database. The scope of the website is quite small, with only about 5 page designs required (although the user will see thousands of different pages generated from the data), but each page is quite complex.
I decided to go with plain old Java and Servlets, as I understand them well. I also understand HTML and CSS, so I have no real difficulty generating the basic HTML pages from the data.
My problem lies with adding JavaScript to improve the user interface. I've tried using JavaScript a few times over the years and always make very slow progress. If there is an off-the-shelf, well-documented solution such as a jQuery widget, then I'm okay, but if I need to modify it or create custom JavaScript, I always get stuck.
I'm looking for any alternative to writing pure JavaScript. I'm not looking to learn a new framework for the complete site, or for a way to abstract the HTML, because I understand HTML and I don't really like deploying generated code that I didn't write.
But in the case of JavaScript I would consider generated code. Is there a tool that I could use to generate JavaScript without writing JavaScript, which I could then reference from my web pages? Or is it impossible to consider JavaScript and HTML in isolation from each other?

Jeremy Ashkenas's public list of languages that compile to JS offers quite a few options (around a hundred).
The section for Java/JVM to JavaScript alone lists 15 choices.

CoffeeScript is a language that compiles to JavaScript. I haven't used it, but friends who develop in JavaScript have told me that CoffeeScript is a nice tool.

Convert PDF to Word in Java [closed]

Closed 7 years ago.
Is it possible to convert PDF to Word in Java? I'm not talking about parsing a PDF document and then custom-rendering it again as Word. I want a Java library that can convert it directly.

Reading PDF documents is a very involved process, and there are no good free libraries for extracting non-text information from PDF documents in Java. Worse yet, a lot of layout information is hard to reconstruct from a PDF; for example, a table in a Word document becomes some lines and a bunch of pieces of text in PDF.

It is almost impossible to recreate semantic information from an arbitrary PDF. If you have the same tool that wrote it, you have a somewhat better chance, but even so there is much uncertainty. The only thing you can be sure of in a (text) PDF is the position of each character on the page. (Note that some PDFs include bitmaps containing textual information, and extracting that has to rely on OCR.)
There are several groups in computer science departments and elsewhere who are spending very significant effort trying to extract semantic information. We collaborate with Penn State, one of the leaders, and they are working on extracting tables. In good cases they get 90%, in bad ones 50%.
So the formal answer is that you cannot, but you may occasionally be fortunate. (We do a lot of this for chemistry and count ourselves lucky if we get 50% on a regular basis.)
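To make "only character positions" concrete, here is a minimal sketch. It assumes the glyphs have already been pulled out of a page by some PDF library (not shown) as a character plus x/y coordinates in PDF's default coordinate system, where y grows upward. `Glyph`, `GlyphLines` and the tolerance value are hypothetical. Even this first step, grouping characters into lines, depends on a guessed tolerance, and it recovers no word spacing, let alone tables.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: rebuild text lines from positioned characters.
class GlyphLines {

    // One character with the position of its baseline origin on the page.
    static final class Glyph {
        final char ch;
        final double x;
        final double y;

        Glyph(char ch, double x, double y) {
            this.ch = ch;
            this.x = x;
            this.y = y;
        }
    }

    // Groups glyphs into rows by rounding y to multiples of tolerance, then
    // reads each row left to right. Glyphs that fall near a bucket boundary
    // can end up in the wrong row; a real tool needs something smarter.
    static List<String> toLines(List<Glyph> glyphs, double tolerance) {
        // Reverse order puts the highest y first, i.e. the top of the page.
        TreeMap<Long, List<Glyph>> rows = new TreeMap<>(Comparator.reverseOrder());
        for (Glyph g : glyphs) {
            long bucket = Math.round(g.y / tolerance);
            rows.computeIfAbsent(bucket, k -> new ArrayList<>()).add(g);
        }
        List<String> lines = new ArrayList<>();
        for (List<Glyph> row : rows.values()) {
            row.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder sb = new StringBuilder();
            for (Glyph g : row) {
                sb.append(g.ch); // no spaces: word gaps would have to be inferred from x
            }
            lines.add(sb.toString());
        }
        return lines;
    }
}
```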

You can try to do it with the iText library: read the PDF and then write it out as RTF.
This is not that simple, though, as you have to preserve the different styles that the PDF has.
You can use an external tool.
Install a free program like "Free PDF to Doc" and execute it from your Java program.
This works fine in most cases.
Use the Acrobat Pro SDK from your Java code.
Best of luck.

HTML rendering algorithm [closed]

Closed 3 months ago.
I am making an e-book reader for J2ME and I wonder if I could make it render HTML pages. For the moment, I am using some simplified styling of my own.
So, could anyone point me to a good in-depth tutorial or a specification of an open-source HTML engine? Of course, I have some idea about it all, i.e. the main steps involved, the use of finite-state machines and so forth, but it's not enough.
But why reinvent the wheel when it's complicated enough? Do you know of any HTML engine written purely in Java and light enough to be used as a library in a J2ME project?
P.S. For those who know J2ME:
Porting from Java SE to J2ME is not necessarily an issue for me
I am not yet concerned about the inability (or at least unsuitability) of using vector fonts
UPDATE
If you could only point me to a detailed guide on laying out HTML, I'd be more than grateful! I need to lay out some very simple HTML: text with basic styling, images, divs and tables. That's all.
(I know it's not trivial even though I only need simple layout; that's why I am asking.)

WebKit comes to mind.

I think Firefox uses the Gecko layout engine, which could prove helpful. More here:
https://developer.mozilla.org/en/docs/Gecko
https://wiki.mozilla.org/Gecko:Home_Page
For some videos, see http://redivide.com/blog/gecko-reflow-awesome-visualization-of-web-page-layout/

Dear me, I seem to be answering my own question.
The only possibilities that I found are:
J2ME Polish HTML Browser Component
J2MEHTML
Fire
Unfortunately, none of these seems flexible enough for me to adapt to my own purposes, which are:
render on any Graphics object
support for bitmap fonts
split content into pages
TeX hyphenation
be able to obtain the word (if any) at a given point on the image.
I have already done all of this myself, but the trouble is that my code does not render HTML, only my own limited custom styling.
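For readers after the layout side of this, the core of laying out plain text is line breaking followed by pagination. A minimal sketch, assuming a fixed-width bitmap font in which every character is `charWidth` pixels wide; it ignores hyphenation, images, tables and styling, it is written against Java SE collections rather than J2ME, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: greedy line breaking and pagination for a fixed-width font.
class SimpleTextLayout {

    // Breaks text into lines no wider than maxWidth pixels, assuming every
    // character is charWidth pixels wide. A word longer than a whole line is
    // put on a line of its own and left to overflow rather than hyphenated.
    static List<String> breakLines(String text, int maxWidth, int charWidth) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for empty or all-whitespace input
            }
            // Characters needed if the word joins the current line (plus one space).
            int needed = (line.length() == 0 ? 0 : line.length() + 1) + word.length();
            if (line.length() > 0 && needed * charWidth > maxWidth) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    // Splits lines into pages holding at most linesPerPage lines each.
    static List<List<String>> paginate(List<String> lines, int linesPerPage) {
        if (linesPerPage <= 0) {
            throw new IllegalArgumentException("linesPerPage must be positive");
        }
        List<List<String>> pages = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerPage) {
            int end = Math.min(i + linesPerPage, lines.size());
            pages.add(new ArrayList<>(lines.subList(i, end)));
        }
        return pages;
    }
}
```

The greedy strategy (fill each line as far as it goes) is the simplest choice. Proportional fonts would replace the `charWidth` multiplication with a per-character width lookup, and hyphenation would hook in where a word fails to fit.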

I googled and found Cobra.

Another option would be LWUIT.
It has an HTML component in the latest version (see http://www.nextgenmoco.com/2010/05/css-support-added-to-htmlcomponent.html).
LWUIT is a Swing-inspired set of UI components for J2ME. It's open source and had some sort of Sun support; I don't know whether Oracle will still support it.

JSP/HTML Page to PDF conversion [closed]

Closed 7 years ago.
How can I convert my JSP/HTML file to PDF?
I want to convert a particular part of my web page to a PDF file. Is it possible?

Yes. Take a good look at both Apache FOP and iText. No matter what you use, you'll probably have to do a little fiddling.

I used HTMLDoc a couple of years ago and had pretty good luck with it.

Try wkhtmltopdf. It is a command line utility that takes an HTML file or web address and a save location for the PDF. It is very easy to use and utilizes the same rendering engine as Safari. It works MUCH better than many of the other parsers that I have used (which don't always support CSS and other advanced layout features).
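Calling such a utility from Java comes down to launching a child process. A minimal sketch with `ProcessBuilder`, assuming a `wkhtmltopdf` executable on the PATH that is invoked as `wkhtmltopdf <input> <output>`; the class name, method names and the 30-second timeout are illustrative choices, not anything the tool prescribes.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper around the wkhtmltopdf command line tool.
class WkhtmltopdfRunner {

    // Builds the argument list: executable, input HTML file or URL, output PDF path.
    static List<String> buildCommand(String input, String outputPdf) {
        return Arrays.asList("wkhtmltopdf", input, outputPdf);
    }

    // Runs the converter and returns true only if it exits with code 0
    // within the (assumed) 30-second limit.
    static boolean convert(String input, String outputPdf)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(input, outputPdf));
        // Let the child write to this process's stdout/stderr so it cannot
        // block on a full pipe that nobody is reading.
        builder.inheritIO();
        Process process = builder.start();
        if (!process.waitFor(30, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // give up on a hung conversion
            return false;
        }
        return process.exitValue() == 0;
    }
}
```

Passing the arguments as a list rather than as one shell string means no shell is involved, so file names containing spaces or shell metacharacters are handed to the tool unchanged. On a server, the timeout and the forced kill keep a stuck conversion from tying up a request thread indefinitely.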

Take a look at html2ps (Perl) or html2ps (PHP). However, neither of the two is implemented in Java.
You might also want to read this article.

The Flying Saucer library is the best one to use. It works on top of iText and makes the conversion very easy.
