How do you render HTML using Java? - java

I am working on building a web browser in Java. I have used a parser (JTidy) to parse the HTML page into a DOM document. I know that rendering means producing the graphical representation of the DOM document, but I don't know what the engineering process is for HTML rendering in Java. Swing has JEditorPane (with HTMLEditorKit) to parse and render, but it does not render pages well. So I want a better solution.
My question is: what is the actual process of HTML rendering and showing the complete web page in Java?

You can try either the Cobra renderer & parser or Flying Saucer project.

There is the pretty much vaporware JWebPane that is part of JavaFX, but who knows when or if that will ever come out.
You can use MozSwing, which integrates the Mozilla rendering framework with the Java Swing toolkit.

Related

converting ppt to html

I want to implement a feature for viewing PowerPoint files on the web.
This could be done simply by converting the PowerPoint to images, but then I think video and audio would no longer work.
So the idea is to convert the PowerPoint to HTML and place it where I want. However, I am not able to implement the PowerPoint-to-HTML conversion from scratch myself. I have been looking for open-source tools or libraries that do this, but I have not found any yet.
The development environment is Java 8 + Spring Boot.
If you are OK with converting your PPT files to PDF before converting them to HTML, then pdf2htmlEX could be worth looking at. It is the best tool I could find for this kind of work, as it is capable of converting PDFs to HTML very precisely (have a look at the examples 1,2,3,4). You should be able to find wrapper libraries in the Maven repo so that you can call it from your Java applications.
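If no suitable wrapper library turns up, one option is to invoke the pdf2htmlEX command-line tool directly. The following is only a minimal sketch: it assumes the `pdf2htmlEX` binary is installed and on the PATH, and the class and method names (`Pdf2HtmlRunner`, `buildCommand`, `convert`) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of pdf2htmlEX or any wrapper library.
final class Pdf2HtmlRunner {

    // Builds the command line. Assumes "pdf2htmlEX" is resolvable via the PATH.
    // --dest-dir tells pdf2htmlEX where to write the generated HTML.
    static List<String> buildCommand(Path pdf, Path outputDir) {
        return Arrays.asList("pdf2htmlEX", "--dest-dir", outputDir.toString(), pdf.toString());
    }

    // Runs the tool, forwards its console output, and returns its exit code.
    static int convert(Path pdf, Path outputDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf, outputDir))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Pdf2HtmlRunner <input.pdf> <output-dir>
        int exitCode = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("pdf2htmlEX exited with code " + exitCode);
    }
}
```

In a Spring Boot application this call would probably belong in a background task rather than a request thread, since conversion of a large deck can take a while.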
If you are OK with using an iframe, you may use a Microsoft solution: https://products.office.com/it-IT/office-online/view-office-documents-online
You may use this code:
<iframe src='https://view.officeapps.live.com/op/embed.aspx?src=[you_ppt_url]' width='100%' height='600px' frameborder='0'></iframe>
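Since the question mentions Java 8 and Spring Boot, the embed markup could also be generated on the server. The sketch below assumes that the PowerPoint file is reachable at a public URL and that the `src` value should be URL-encoded; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building the Office Online embed markup.
final class OfficeEmbed {

    private static final String VIEWER = "https://view.officeapps.live.com/op/embed.aspx?src=";

    // URL-encodes the document address so it survives as a single query parameter.
    // The two-argument encode(String, String) form is used because it works on Java 8.
    static String embedUrl(String publicPptUrl) throws UnsupportedEncodingException {
        return VIEWER + URLEncoder.encode(publicPptUrl, "UTF-8");
    }

    // Produces the same iframe as above, with the placeholder filled in.
    static String iframeTag(String publicPptUrl) throws UnsupportedEncodingException {
        return "<iframe src='" + embedUrl(publicPptUrl)
                + "' width='100%' height='600px' frameborder='0'></iframe>";
    }

    public static void main(String[] args) throws Exception {
        // example.com is a stand-in for wherever the deck is actually hosted.
        System.out.println(iframeTag("https://example.com/slides/deck.pptx"));
    }
}
```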
There's an older Node package called PPTX2HTML. It outputs a bunch of garbled code on a canvas element, but it might work. They even have a demo website to try it out. They seem to have broken the PowerPoint up into parseable XML and rendered the elements.

PDF to wordpress

I am trying to convert a PDF file to a WordPress post, but I am not able to preserve the formatting. I first convert my PDF to HTML using a PDF-to-HTML Java library, and then I copy the HTML code directly into the WordPress WYSIWYG editor, where the formatting is lost.
Can anybody explain to me how I would be able to preserve the HTML formatting in the WYSIWYG editor?
PDFs aren't web documents by nature, so their structural layout doesn't translate very well to a web environment. So with WordPress it's really hit-or-miss. A lot of elements won't translate well; you'll probably have to do some re-formatting yourself.

Convert html+css+js to PDF

I want to create something like this (code is here) in PDF format. I'm using Google Charts, and according to this forum, converting a chart to PDF is impossible. I've already tried iText + XML Worker, but there are problems with the CSS, and I think there is no JS support at all.
So, the questions are: how can I convert HTML+CSS+JS to a PDF file? Or are there other ways to approach the problem?
As promised in the comment, I've asked Raf. This was his answer:
One way to use XML Worker for HTML+CSS+JS is to use a browser engine to preprocess the HTML. Examples of such a browser engine are WebKit (Chrome, Safari) and Gecko (Firefox). These can interpret the CSS and JS and give you HTML that is ready to be parsed by XML Worker.
Examples of competing products are:
wkhtmltopdf, a command line tool that uses WebKit as its rendering engine.
Prince XML supports HTML+CSS+JS to PDF using their own engine.
Maybe there are others, but this is what Raf told me. I hope this helps.
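For the wkhtmltopdf route, calling the tool from Java might look roughly like the sketch below. It assumes `wkhtmltopdf` is installed and on the PATH; the class name, method names, and the delay and timeout values are illustrative guesses. The `--javascript-delay` option gives scripts such as chart libraries some time to finish drawing before the page is captured.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of wkhtmltopdf itself.
final class WkHtmlToPdfRunner {

    // Builds the command line: wkhtmltopdf [options] <input url/file> <output file>.
    static List<String> buildCommand(String pageUrlOrFile, String outputPdf, int jsDelayMillis) {
        return Arrays.asList("wkhtmltopdf",
                "--javascript-delay", String.valueOf(jsDelayMillis),
                pageUrlOrFile, outputPdf);
    }

    // Returns true only if the tool finished within the timeout with exit code 0.
    static boolean convert(String pageUrlOrFile, String outputPdf,
                           int jsDelayMillis, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pageUrlOrFile, outputPdf, jsDelayMillis))
                .inheritIO()
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // give up on a hung conversion
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java WkHtmlToPdfRunner <page url or file> <output.pdf>
        boolean ok = convert(args[0], args[1], 2000, 60);
        System.out.println(ok ? "PDF written to " + args[1] : "conversion failed");
    }
}
```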

using jpedal to extract hyperlinks from html? --java

The JPedal library for Java is usually used to convert PDF to XML or HTML. However, I need to know whether it is possible to extract data from an HTML5 document and save it as XML using the JPedal API.
Is there any other possible alternative to this?
Also, I am trying to parse an HTML5 document using Java and store it as XML. Are there any good solutions for finding just specific tags and producing XML from them?
Please let me know. Thank you.
There are a number of Java HTML parsers out there, but I recommend using the HTML5 parser from validator.nu available for download from here: http://about.validator.nu/htmlparser/.
It was written by Henri Sivonen of Mozilla, one of the main protagonists of HTML5, and it implements the HTML5 parsing algorithm. You won't find a more reliable HTML parser, and it creates a true DOM that can be manipulated using standard XML tools and queried for hyperlinks using XPath. There are examples of how to use XSLT transformations with it and how to get an XML serialization of the created DOM.
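Once a DOM is available, the XPath query for hyperlinks can be written with the JDK's standard XML APIs. The sketch below deliberately swaps the parsing step: to stay dependency-free it uses the JDK's own XML parser on a well-formed snippet instead of the validator.nu parser, so it will not cope with real-world HTML5. With a DOM produced by the validator.nu parser, the elements may live in the XHTML namespace, in which case the XPath expression would need adjusting. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper showing only the XPath step.
final class LinkExtractor {

    // Collects the href attribute of every <a> element in the given DOM.
    static List<String> extractLinks(Document dom) throws Exception {
        NodeList hrefs = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//a/@href", dom, XPathConstants.NODESET);
        List<String> links = new ArrayList<>();
        for (int i = 0; i < hrefs.getLength(); i++) {
            links.add(hrefs.item(i).getNodeValue());
        }
        return links;
    }

    // Stand-in for the HTML5 parser: only accepts well-formed XML/XHTML input.
    static Document parseWellFormed(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document dom = parseWellFormed(
                "<html><body><a href='a.html'>A</a><a href='b.html'>B</a></body></html>");
        for (String link : extractLinks(dom)) {
            System.out.println(link);
        }
    }
}
```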

Are there any tools for parsing HTML using GWT

In my GWT application, on the client side, I have a string containing HTML. Is there a good way to go about parsing that, finding specific HTML tags within it, and returning the ids of those tags?
Any help would be much appreciated, thanks!
Check out GWT query. It is a jQuery-like API for GWT that allows easily traversing and manipulating HTML.
You could attach your HTML string to the DOM using Element.setInnerHTML(yourString). That way you're using the browser's parser. Attaching it to an invisible element or an invisible iframe should hide what's happening from the user.
For the querying you can use GWT's DOM functions if you want to stick with plain GWT. Using JavaScript directly or any JavaScript library like jQuery are also options. GWT query might also be an option, but I haven't used that yet.
UPDATE:
This approach can be abused by XSS (cross site scripting) attacks - so you must either trust or sanitize the HTML string.
