I have an HTML page that contains JavaScript code. The page needs to be rendered before it can be converted into an image.
I am aware of projects like wkhtmltoimage, PhantomJS, khtmltopng, webkit2png, PrinceXML and html2image. I have implemented a few of those but I am trying to find a pure Java solution that does not have to use Process to execute a command. Any help would be great, thanks!
Edit: I looked into Cobra; however, its JS support still seems to be in development, and it does not parse my HTML file properly.
Or if there are any other ways of doing this, please let me know. I am just trying to find the best solution possible.
There is no pure Java solution - no one has written a browser in Java that supports HTML 5.
I'd try either of these approaches:
Use env.js + rhino to simulate a browser in which you can run the JavaScript. That should give you a DOM which you can render using FlyingSaucer, for example.
Add SWT to your classpath (plus the binary for your platform). It contains a Browser component that uses your system's browser to render URLs or an HTML string.
You probably need SWTBot to run the browser in headless mode.
If that doesn't work and you're on Linux, you can start an in-memory X server (Xvfb) and open your browser in it. Alternatively, you can use vncserver to start a desktop on your server.
[EDIT] The project phantomjs might do what you want:
PhantomJS (www.phantomjs.org) is a headless WebKit scriptable with JavaScript or CoffeeScript.
[...]
Use cases: Headless web testing, Site scraping, Page rendering
Multiplatform, available on major operating systems: Windows, Mac OS X, Linux, other Unices
Fast and native implementation of web standards: DOM, CSS, JavaScript, Canvas, SVG. No emulation!
Pure headless (no X11) on Linux, ideal for continuous integration systems. Also runs on Amazon EC2.
The quickstart page explains how to load a web page and render it to an image.
I have found a solution using WebRenderer. WebRenderer is a paid product that comes in Swing, server, and desktop editions. The Swing edition is the only one that supports HTML5 as of 7/9/2012. However, the Swing edition can be used on a server to render the page to an image by instantiating the browser without creating a JFrame. See this question.
Related
I am a Java programmer. I would like to write a client-side Java program that works as a Firefox add-on and performs operations on the HTML received from a specific remote web site, BEFORE that HTML is displayed in the user's browser. The client-side Java program would have to:
Locate and read specific files on the local (end-user) machine on which it resides.
Check the URLs of web pages requested by Firefox.
If a URL requested through Firefox contains a specific domain:
Iterate through the HTML text looking for startcode and endcode.
Slice out the string between startcode and endcode.
Transform the string between startcode and endcode using information from a file on the local PC.
Replace the string between startcode and endcode with the transformed string.
Allow the Firefox browser window to display the modified HTML.
Basically, the Java program would intercept incoming HTML from a specific web site and alter the contents before the contents are displayed on the user's screen. How would I go about writing this kind of program?
Of course, I have administrative privileges on the computers that would run this program. But I have never written a browser add-on before. I would like to write it in Java, but the code would need to always be on the client computer. The code could never be on the server. I do not know where to start this project.
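The slice, transform, and replace steps listed in the question do not depend on the browser plumbing, so they can be sketched on their own. The following is a minimal illustration in Java, assuming startcode and endcode are plain literal strings. The class name MarkerRewriter and the method rewrite are hypothetical, and the transform is passed in as a function because the question does not say what it does. This is only the string-processing part, not a Firefox add-on.

```java
import java.util.function.UnaryOperator;

public class MarkerRewriter {
    /**
     * Replaces the text between every startCode/endCode pair with the
     * result of transform, keeping the markers themselves in place.
     */
    static String rewrite(String html, String startCode, String endCode,
                          UnaryOperator<String> transform) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int start = html.indexOf(startCode, pos);
            if (start < 0) {
                break; // no further start marker
            }
            int contentStart = start + startCode.length();
            int end = html.indexOf(endCode, contentStart);
            if (end < 0) {
                break; // unterminated marker: leave the rest untouched
            }
            // Copy everything up to and including the start marker.
            out.append(html, pos, contentStart);
            // Slice out the payload, transform it, and write it back.
            out.append(transform.apply(html.substring(contentStart, end)));
            out.append(endCode);
            pos = end + endCode.length();
        }
        // Copy whatever follows the last complete marker pair.
        out.append(html, pos, html.length());
        return out.toString();
    }
}
```

For example, MarkerRewriter.rewrite("a[[x]]b", "[[", "]]", String::toUpperCase) returns "a[[X]]b"; the [[ and ]] markers are placeholders chosen for this illustration.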
@Athafoud is correct in general. No browser supports Java out of the box.
Instead:
You can write browser extensions for Firefox, Chrome, Safari, Opera in JavaScript. E.g. the firefox-addon has a link list to get you started with Firefox extension development.
You can also write browser extensions for Firefox in C/C++ (to some extent) using either js-ctypes or XPCOM.
You can write some limited C++ stuff for Chrome via their NaCl APIs.
You could potentially write Java Applets for browsers that support the Java plugin and bundle them with and script them from your extension (to some extent) but that is a PITA.
Firefox extension APIs are the most capable, since anything Firefox can do, extensions can do too (incl. calling into external libraries). Other browsers have far more limited extensibility/extension-facing APIs (due to architectural issues and sometimes in the name of security, although that bold security claim is... well, bold).
As for the particular requirements you gave in your question:
Firefox extensions are capable of transforming raw HTTP responses (although this is a bit cumbersome), as well as the DOM once the HTML is parsed (from JavaScript). Firefox can read/write all files in the file system (subject to OS-level ACLs, of course).
Chrome extensions are not capable of transforming raw HTTP responses ATM, but you could modify the DOM once parsed. Also IIRC Chrome cannot read arbitrary files by default but you can manually enable read-access.
I don't think you can use Java to write a Firefox add-on. You can use JavaScript. A good place to start is the Mozilla documentation site.
There is also a good guide here: shortest-tutorial-for-firefox-extension. It is a bit old and the SDK has changed, but I think it is a good start.
And a more up-to-date one from Mozilla itself: how-to-develop-firefox-extension
I have an old tool an (ex-)colleague wrote a few years back with Jaxer, that I'd like to replace/rewrite.
Jaxer is an (abandoned) server-side framework based on a headless Mozilla/Gecko-Browser allowing you to use JavaScript and the DOM server-side.
Since Jaxer is abandoned and because I have big problems installing and running Aptana Studio 1.5 with Jaxer on a new computer, I'm looking for a library/framework/something on which I can base a new version.
This tool is only run locally inside Aptana Studio (the IDE for Jaxer) and was never intended to be an actual web app. It crawls our customers' websites by loading them page by page into the server-side Mozilla. In order to do that it uses jQuery and predefined CSS selectors to find the links in the menus and parse other information out of the pages. The final result is basically a glorified sitemap.
I'd like to keep this modus operandi if possible and continue using jQuery/JavaScript/the DOM to load and parse/access the pages, but it can be wrapped in a framework based on another language such as Java. I considered writing something based on Gecko myself, but that seems a bit over the top, so I'm open to other suggestions.
As far as HTML crawling/parsing goes:
http://ccil.org/~cowan/XML/tagsoup/
or
http://jsoup.org/
I needed a headless browser to parse pages.
HtmlUnit allowed me to set up a Heroku Java app to fulfill this purpose.
But now I'm running into a couple of issues.
The current one is URLs of the form "//path" instead of "/path" or "http(s)://path", which are rejected as malformed.
I downloaded the sources of version 2.9.4 and pushed tiny fixes into them ...
Modifying the standard sources is not a good approach, for obvious maintainability reasons.
So I'm wondering whether I'm digging in the wrong direction.
HtmlUnit is designed to browse pages for testing purposes. My goal is to behave like a browser, so that pages work as well as possible, especially because my damned target websites are the ultra-dirty kind that respect no standard at all...
What is your opinion on this?
HtmlUnit is used in Selenium 2/WebDriver for headless browser "simulation". It works fine there.
So I see no reason not to try HtmlUnit. Maybe you can have a look at Selenium 2/WebDriver too.
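On the "//path" issue specifically: a reference that starts with "//" is a scheme-relative (network-path) reference under RFC 3986, so it can be resolved against the URL of the page it came from rather than treated as broken. A minimal sketch using java.net.URI from the standard library follows; the class name UrlFixer and the example.com URLs are placeholders, and whether this can be hooked into HtmlUnit without patching its sources is not something this sketch shows.

```java
import java.net.URI;

public class UrlFixer {
    /**
     * Resolves an href found in a page against the URL of that page.
     * Handles "//host/path", "/path", relative paths, and absolute URLs.
     * Throws IllegalArgumentException if href is not a syntactically
     * valid URI (for example, if it contains unescaped spaces).
     */
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }
}
```

With a page at https://example.com/a/b.html, the reference //cdn.example.com/x.js resolves to https://cdn.example.com/x.js and /path resolves to https://example.com/path.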
IDEA: Embed a recent web browser in a Java application (for saved offline, non-server content).
The question is this: can a Java application embed a web browser with jQuery / HTML / CSS support?
So I am asking anyone who has played with JRex for advice: how complicated would it be to integrate an open source web browser into Java? I am not all that keen on the idea of building Mozilla from source. Is there a ready-made compiled version?
Is there a simple way to get the latest compiled version (the most current in terms of HTML, CSS and JavaScript support) and integrate it into an application?
Also: I appreciate the amount of work required to support HTML4, never mind HTML5, and CSS2 compliance. How close is JRex to that?
Application: My intention is to render web pages from offline content. It will not need to handle online content, and will simply be used for file-based display, e.g. file:///C:...
Does the web browser have to be wrapped in a server to function? How complicated is it to pass files to the browser to render? I am not keen on having to set up Jetty or another server application just for this.
If JRex is not the solution... what then? Is it possible to start a browser implementation within Java, and can Java interact with the page and traverse the DOM?
Or alternatively, is there an .hta equivalent in recent browsers like Firefox?
If you need the embedded browser to interact with your application code, you could try the SWT Browser control, which is actually maintained, as opposed to JRex. Browser uses either WebKit or Gecko or embedded IE as appropriate, or lets you choose which one you want, so it should run jQuery and familiar JavaScript. And since SWT is a JNI library to begin with, they probably already have guidance on how to deploy an app that uses JNI.
You can feed HTML into the control from a string (example) or a Java URL, which can point to local files or resource files in your JAR, which I assume will let you split your app into different files.
To call Java code, you need to expose it as JavaScript functions. example
To manipulate the HTML from Java code, you need to call JavaScript functions from Java. example
To make the previous two tasks easier, you might want to look into a JSON library to simplify passing around complex data.
Does it have to be implemented within a Java program? Could you let the user use the default browser on their machine (i.e. does it matter which browser)?
If it doesn't matter, I would use the Java Desktop API.
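// Enable the browse controls only if the platform can open a browser.
if (Desktop.isDesktopSupported()) {
    Desktop desktop = Desktop.getDesktop();
    if (desktop.isSupported(Desktop.Action.BROWSE)) {
        txtBrowserURI.setEnabled(true);
        btnLaunchBrowser.setEnabled(true);
    }
}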
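For the offline, file-based case described in the question, the same API can hand a local file to the default browser without any server. The following is a small self-contained sketch; the class name OpenLocalPage and its method names are hypothetical, and it assumes a desktop environment is available at run time (it returns false otherwise).

```java
import java.awt.Desktop;
import java.awt.GraphicsEnvironment;
import java.io.File;
import java.io.IOException;
import java.net.URI;

public class OpenLocalPage {
    /** Converts a local path to a file: URI that a browser can open. */
    static URI toFileUri(String path) {
        return new File(path).getAbsoluteFile().toURI();
    }

    /** Returns true if the page was handed to the default browser. */
    static boolean open(String path) throws IOException {
        // Desktop is unavailable on headless systems such as most servers.
        if (GraphicsEnvironment.isHeadless() || !Desktop.isDesktopSupported()) {
            return false;
        }
        Desktop desktop = Desktop.getDesktop();
        if (!desktop.isSupported(Desktop.Action.BROWSE)) {
            return false;
        }
        desktop.browse(toFileUri(path));
        return true;
    }
}
```

Note that this only launches the browser; the Java program gets no access to the DOM of the page it opens, so it does not replace an embedded browser when interaction is needed.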
If you are using Java 1.5 try http://javadesktop.org/articles/jdic/
Is there a way to embed a browser in Java?
I am working on an application in which I have to embed a web browser. Any ideas on how to achieve this? Also, will I be able to interact with JavaScript from within the code?
See this post: Embed a web browser within a java application
Check out Eclipse; it has an embedded browser which is configurable by the user (they support multiple browsers).
You can probably embed their browser even in an AWT-based application, using the SWT_AWT bridge, if you are writing a Java Swing application.
This article may help you get started.
Check out Lobo
[Lobo] is an open source web browser that is
written completely in Java. Lobo is
being actively developed with the aim
to fully support HTML 4, Javascript
and CSS2. Lobo also supports direct
JavaFX rendering.
For very simple pages, you can use JEditorPane, which is part of the Swing API: see doc
I'm afraid that embedding a real browser is your only option if you need JavaScript capability, AJAX, etc. Look at the JavaXPCOM API to see how easily you can embed Firefox. There is also JDIC, which will allow you to embed IE in Windows environments. You will need some DLL files, but the procedure is straightforward with both APIs and well documented.
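As a rough illustration of the JEditorPane route, the sketch below paints a small HTML string into a BufferedImage without showing any window. The class name HtmlToImage and the fixed width and height are assumptions made for this example. Swing's built-in HTML support targets HTML 3.2 and does not execute JavaScript, so this only suits simple, static pages.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import javax.swing.JEditorPane;

public class HtmlToImage {
    /** Renders simple static HTML into an image of the given size. */
    static BufferedImage render(String html, int width, int height) {
        JEditorPane pane = new JEditorPane("text/html", html);
        pane.setEditable(false);
        // The pane is never added to a window, so size it by hand.
        pane.setSize(width, height);

        BufferedImage image =
                new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            // print() paints the component without double buffering.
            pane.print(g);
        } finally {
            g.dispose();
        }
        return image;
    }
}
```

The result can be saved with ImageIO.write(image, "png", new File("page.png")). Swing components are normally meant to be used on the event dispatch thread, so longer-lived code may want to wrap the call in SwingUtilities.invokeAndWait.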
Let's admit that JEditorPane is fine, but mainly for HTML you have control over. I use it only to render HTML help files into my application. Once you start visiting sites with it things start getting nasty.