In a web browser written in Java, different types of parsers are used to parse the page and create a DOM document. During rendering, how does the browser turn the DOM into JComponents? Can anyone explain the whole process of mapping a DOM onto JComponents in order to display a complete web page in Java?
Here is a link where you can find how to display a DOM Hierarchy into JTree (subclass of JComponent) component:
http://download.oracle.com/docs/cd/E17802_01/j2ee/j2ee/1.4/docs/tutorial-update2/doc/JAXPDOM4.html#wp64186
Hope it will help you.
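As a rough illustration of the JTree approach from the linked tutorial, the sketch below mirrors the element nodes of a DOM document as DefaultMutableTreeNode objects, which a JTree can then display. The class and method names (DomTreeSketch, parse, toTreeNode) are made up for this example. It only shows the document structure; it does not lay out or render a page the way a browser does.

```java
import java.io.StringReader;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomTreeSketch {

    // Parses well-formed XML/XHTML text into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Recursively mirrors the element nodes of a DOM subtree as Swing tree nodes.
    static DefaultMutableTreeNode toTreeNode(Node domNode) {
        DefaultMutableTreeNode treeNode = new DefaultMutableTreeNode(domNode.getNodeName());
        NodeList children = domNode.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                treeNode.add(toTreeNode(child));
            }
        }
        return treeNode;
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><h1>Title</h1><p>Text</p></body></html>");
        DefaultMutableTreeNode root = toTreeNode(doc.getDocumentElement());
        // In a Swing UI, pass the root to new JTree(root) and put the tree in a JScrollPane.
        System.out.println(root + " has " + root.getChildCount() + " child element(s)");
    }
}
```

Text nodes and attributes are skipped here to keep the tree small; a fuller viewer would add them as leaf nodes.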
That is far too large a subject for this forum, unless you restrict the browser to a specific version of HTML without CSS, without JavaScript (or other scripting languages) and without any embedded objects.
You could look at existing code if you can work within the GPL and other licences.
Well, basically you implement the HTML and CSS standards. Doing so completely and correctly is a HUGE amount of work, several man-years at least. There are some projects that are attempting this, but none have been very successful so far.
Related
I need to access HTML elements from my Java program based on the id or className of the element (like getElementByID or getElementsByClassName). I also need to be able to click a few buttons on the page.
Main Points:
I am creating a desktop application. It is not a web app.
I need a browserless solution
My code needs to auto fill a form and submit it without opening a page.
Are there any libraries out there that could suit my needs, or could I achieve this in plain Java code? If my question isn't clear please let me know and I will try to explain it in a better way. Thank you.
Selenium is a browser-automation library with Java bindings. It can find HTML elements by id or class name and click them, and it is often used for testing. There is another library called Vaadin, which lets you build a web front end in server-side Java. It is similar to Java Swing but targets the web. Hope this helps!
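For the plain-Java route asked about in the question, a form can often be submitted without any browser by sending the same HTTP request the form would send. The sketch below assumes Java 11+ and a simple form that posts URL-encoded fields; the URL and field names are hypothetical, and pages that depend on JavaScript will need a browser emulator instead.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class FormPostSketch {

    // Encodes fields as application/x-www-form-urlencoded, the default encoding of an HTML POST form.
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds (but does not send) the POST request that submitting the form would produce.
    static HttpRequest buildRequest(String actionUrl, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(actionUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "alice");        // hypothetical field names and values
        fields.put("comment", "hello world");
        HttpRequest request = buildRequest("https://example.com/submit", fields);
        System.out.println(request.method() + " " + request.uri());
        // To actually submit the form:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

The field names and the action URL have to be read from the form's HTML beforehand; "clicking" the submit button then amounts to sending this request.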
Is it possible to make a JEditorPane understand more advanced HTML and CSS?
I mean, when you enter HTML or CSS such as <button>, <progress>, or most other tags, it does not understand them. Is there any way to make it understand and display more complex and advanced coding? If it is, how would I do it?
Thanks!
Yes. Java 7 EE has HTML5 functionality, although I'm not sure about specific tags. Also, there are external packages, like a freely available one from i-net software, and you can always extend the functionality yourself in some way or another.
Edit: addendum to my 2nd suggestion (implementing the topic in the linked question)
The i-net JWebEngine (which might be what you are already looking for) documentation says:
What are the advantages versus the built-in Java HTML editor?
The built-in Java HTML editor of Java is very limited. It only supports HTML 3.2 with some features of HTML 4 and CSS. The support of CSS is wrong in many cases, especially when it comes to cascading selectors. The text is often too small and on many web sites Java only shows an error instead of an incorrectly formatted page.
Here are some further differences:
JWebEngine can display the ACID 1 test correctly - Java does not
JWebEngine can display the ACID 2 test partly - Java only displays a blank page because an error occurs
JWebEngine basically shows HTML pages like a browser with JavaScript disabled
In cases where the CSS and HTML specifications are not explicit about implementation JWebEngine is designed to mirror Mozilla FireFox's behavior.
No, it only understands basic HTML 4.
You could use a JavaFX WebView inside a JFXPanel.
I am working on a project that translates the HTML of a web page into code for a specific JS library, using Java, so that the div blocks can have different dynamic behaviors.
To translate an HTML div into a JS object, I need to know its coordinates as well as its width and height.
I looked at several Java HTML parser libraries: http://java-source.net/open-source/html-parsers
But none of them has this functionality except Cobra http://lobobrowser.org/cobra/java-html-parser.jsp . It has a rendering engine that can provide the coordinates and dimensions of a div. But this library turns out to be really buggy. I cannot even get through the tests that come with the library.
Does anyone know how to handle this problem? I would really appreciate it if you could help!
Thanks in advance!
Phil
You could try some component of HtmlUnit, which emulates a browser. Honestly though, I think you need to think about your question more carefully. jQuery can do the 'different dynamic behaviours' thing you talk about by modifying the HTML DOM (Document Object Model) with JavaScript, and if you need anything from the HTML document, inspecting the DOM via JavaScript should be your first port of call. Java should not be required anywhere (unless you're using it server-side for page and input processing with JSP or some similar tech). Any responses to client input can be triggered server-side and sent to JavaScript on the client side, which triggers jQuery actions that modify the DOM.
So here's the challenge... I need to create clean HTML from random web pages out there in the wild. My goal is to read in a page and pass it off to a library which will in turn give me back perfectly well-formed HTML.
Doesn't sound so tough, right? After all, every browser on the market effectively deals with the challenge of malformed HTML and turning it into something render-able with nearly every page load. Each has its own slightly particular algorithm for cleaning up the contents (ahem...for HTML < 5 that is), but they tend to do a very good job of capturing what I like to refer to as the author's intention. So then, why can't I find a good Java library for this very task?
One thing to mention is that I'm not at all interested in parsing the HTML as XML. I've found that libraries such as NekoHTML, TagSoup, HtmlCleaner, and JTidy (to name a few) are more focused on solving the problem of converting HTML to valid XML, and in the process, they lose sight of how the poorly-formatted document should be re-structured. With nasty HTML they frequently don't capture the author's intention and spit out documents that render quite differently from the original source. And for this project, it's of the utmost importance that the two documents render similarly.
I am quite fond of Jericho HTML, but it doesn't seem to be the ideal candidate for this job...at least not without a lot of effort on my part. Also, native dependencies are a no-go, so the Mozilla parser is out.
Can anyone help me in my search for the perfect HTML parser? Thanks in advance!
JSoup I would say
See Also
which-html-parser-is-best
I have used HTML Tidy in the past.
TagSoup?
I have a project where they want me to embed a website into a java application and they want the website to have a similar color scheme as the rest of the application. I know a lot about CSS and building websites but I am not aware of a way to change the look of a website as it comes in on the fly. Is there someone who can help?
Update:
I don't have access to the header because it is not my website. To give more info about the project: we have a browser embedded in a Java client application. The user needs to access a website that displays the contents of a database. I have no access to the original HTML or CSS from the site.
What I need is to change the background color and font sizes of the incoming webpage to match the look and feel of the Java application.
One approach would be to replace their CSS with your own.
You could also take the approach used by the Stylish plugin, which involves a lot of !important declarations to override the site's CSS. Since this is a Java app, I assume the user will not have the opportunity to supply their own CSS, so using !important here doesn't precisely go against the standard.
In your particular situation, I'd look into data scraping. All you need to do is scrape the website for the data and then re-style it to present it how you want.
Good luck
The Greasemonkey add-on for Firefox does just this. You can write a bit of Javascript code and have it run when certain web pages load. One common thing to use it for is to make changes to the DOM to move page elements around, hide or resize elements, change colors, etc. There are a bunch of examples at userscripts.org if you want to get an idea of what I am talking about.
Your code would simply need to do something similar. Load the page (including the normal style sheets) and then use the DOM to make changes to style elements as desired. Browse through the source of the page to get the names/ids of important elements, and your code can key off of those. Loading an additional CSS file containing your changes is an option, but doing it programmatically will likely give you more flexibility in the event that the target website changes.
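A minimal sketch of the "use the DOM to change styles" idea, assuming the embedded browser exposes the loaded page as an org.w3c.dom.Document (JavaFX's WebEngine.getDocument() does; other components may differ). Here the document is parsed from a string so the example runs on its own, and the id "content" and the class name DomRestyleSketch are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomRestyleSketch {

    // Stand-in for the browser's document: parses well-formed XHTML text into a DOM.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed", e);
        }
    }

    // Sets an inline style on the first element with the given id; returns false if none is found.
    static boolean restyleById(Document doc, String id, String css) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (id.equals(el.getAttribute("id"))) {
                el.setAttribute("style", css);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><div id=\"content\">Data</div></body></html>");
        boolean changed = restyleById(doc, "content", "background-color: #f0f0f0; font-size: 12px");
        System.out.println("restyled: " + changed);
    }
}
```

Keying off ids keeps the change local, but it breaks if the target site renames its elements, so the false return value is worth checking.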
It depends on what you use to show the pages in Java. Most browser implementations support dynamic changes to the DOM, so you can simply add a CSS file as the last element of the head, and it will be applied.
You need to know the markup of the HTML/CSS so you can make the best skin.
You could theoretically do it by styling just the basic tags (h1...h6, p, etc.), but it would not be as good; it would sometimes fail to produce the best results and sometimes produce horrible things.
If you KNOW the site markup, then you can make a skin and simply use CSS/images to skin it as you want.
Just include your CSS LAST so that it overrides the CSS already present on the site that you want to skin differently.
It should not be a difficult thing per se. The skin itself is probably the bigger part of the job (it requires more effort).
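The "include your CSS last" advice could be sketched as follows, assuming the page is available as a string before it is handed to the embedded browser. The class name CssOverrideSketch and the sample rules are made up for illustration; the string handling is deliberately simple and is not a full HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CssOverrideSketch {

    private static final Pattern HEAD_END = Pattern.compile("</head\\s*>", Pattern.CASE_INSENSITIVE);

    // Inserts a <style> block just before </head> so its rules come after the site's own stylesheets.
    static String appendOverrideStyle(String html, String css) {
        String styleBlock = "<style type=\"text/css\">" + css + "</style>";
        Matcher m = HEAD_END.matcher(html);
        if (!m.find()) {
            // No closing head tag found: fall back to prepending the style block.
            return styleBlock + html;
        }
        return html.substring(0, m.start()) + styleBlock + html.substring(m.start());
    }

    public static void main(String[] args) {
        String page = "<html><head><link rel=\"stylesheet\" href=\"site.css\"></head>"
                + "<body><p>Data</p></body></html>";
        String css = "body { background: #f0f0f0 !important; font-size: 12px !important; }";
        System.out.println(appendOverrideStyle(page, css));
    }
}
```

Coming last only wins against rules of equal specificity, which is why the sample rules also use !important.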
"On the fly" should mean changing the fetched HTML, so parsing and replacing tokens seems to be the way.
You could change the locations of the style sheet files by replacing the href value in a link that points to a css file, and set the value to your style sheet (a different URI).
<link type="text/css" href="mylocalURI" rel="stylesheet" />
(this should be the result of a process/replacement)
I think you understand what should happen for inline styles.
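One way the href replacement might look, as a sketch only: it uses regular expressions on the fetched HTML, which is fragile compared with a real HTML parser, and it assumes quoted attribute values and Java 9+. The class name StylesheetRewriteSketch is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StylesheetRewriteSketch {

    private static final Pattern LINK_TAG =
            Pattern.compile("<link\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern REL_STYLESHEET =
            Pattern.compile("\\brel\\s*=\\s*[\"']?stylesheet", Pattern.CASE_INSENSITIVE);
    private static final Pattern HREF =
            Pattern.compile("(?<pre>\\bhref\\s*=\\s*)(?<q>[\"'])(?<val>.*?)\\k<q>", Pattern.CASE_INSENSITIVE);

    // Points every <link rel="stylesheet"> at newHref; other link tags are left unchanged.
    static String redirectStylesheets(String html, String newHref) {
        return LINK_TAG.matcher(html).replaceAll(match -> {
            String tag = match.group();
            if (!REL_STYLESHEET.matcher(tag).find()) {
                return Matcher.quoteReplacement(tag);
            }
            String rewritten = HREF.matcher(tag)
                    .replaceFirst("${pre}${q}" + Matcher.quoteReplacement(newHref) + "${q}");
            return Matcher.quoteReplacement(rewritten);
        });
    }

    public static void main(String[] args) {
        String page = "<head><link type=\"text/css\" href=\"site.css\" rel=\"stylesheet\" />"
                + "<link rel=\"icon\" href=\"favicon.ico\" /></head>";
        System.out.println(redirectStylesheets(page, "mylocalURI"));
    }
}
```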
I would use JTidy to normalize the original site HTML to XHTML, then use XSLT to filter only the interesting/relevant information, obtaining XML; and finally (since I wouldn't want to convert XML to objects), XSLT again to transform the "pure" XML into the HTML look & feel I need/want.
All of this can be assembled as streams, using roughly 4 KB of buffer per filter (12 KB total) per thread, which also means it will run fast enough. And it is all built on standard, open-source components.
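The XSLT step could look roughly like this, using the transformer that ships with the JDK. This is a sketch under several assumptions: the input is already well-formed (the JTidy step is not shown), the markup carries no default XHTML namespace (namespaced input would need prefixed match patterns), and filtering and restyling are folded into one hypothetical stylesheet instead of two passes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltRestyleSketch {

    // Hypothetical stylesheet: keeps only <h1> and <p> text and wraps it in restyled markup.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/'>"
        + "<div class='app-skin'><xsl:apply-templates select='//h1|//p'/></div>"
        + "</xsl:template>"
        + "<xsl:template match='h1'><h2><xsl:value-of select='.'/></h2></xsl:template>"
        + "<xsl:template match='p'><p class='app-text'><xsl:value-of select='.'/></p></xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to well-formed input and returns the resulting HTML fragment.
    static String transform(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><div id='nav'>menu</div><h1>Title</h1><p>Text</p></body></html>";
        System.out.println(transform(page));
    }
}
```

StreamSource and StreamResult also accept input and output streams, so the same code fits the streaming setup described above.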
Cheers.