What's the best way to externalize large quantities of HTML in a GWT app? We have a rather complicated GWT app of about 30 "pages"; each page has a sort of guide at the bottom that is several paragraphs of HTML markup. I'd like to externalize the HTML so that it can remain as "unescaped" as possible.
I know and understand how to use property files in GWT; that's certainly better than embedding the content in Java classes, but still kind of ugly for HTML (you need to backslashify everything, as well as escape quotes, etc.)
Normally this is the kind of thing you would put in a JSP, but I don't see any equivalent to that in GWT. I'm considering just writing a widget that will simply fetch the content from html files on the server and then add the text to an HTML widget. But it seems there ought to be a simpler way.
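For the fetch-from-server idea, here is a minimal sketch using GWT's RequestBuilder (the help-file path and the helpPanel widget are assumptions for illustration; the module must inherit com.google.gwt.http.HTTP):

import com.google.gwt.core.client.GWT;
import com.google.gwt.http.client.Request;
import com.google.gwt.http.client.RequestBuilder;
import com.google.gwt.http.client.RequestCallback;
import com.google.gwt.http.client.RequestException;
import com.google.gwt.http.client.Response;

// Fetch a static HTML fragment from the server and drop it into an HTML widget.
RequestBuilder rb = new RequestBuilder(RequestBuilder.GET,
        GWT.getModuleBaseURL() + "help/page1.html"); // hypothetical path
try {
    rb.sendRequest(null, new RequestCallback() {
        public void onResponseReceived(Request request, Response response) {
            helpPanel.setHTML(response.getText()); // helpPanel: a pre-existing HTML widget
        }
        public void onError(Request request, Throwable exception) {
            helpPanel.setHTML("Help is currently unavailable.");
        }
    });
} catch (RequestException e) {
    GWT.log("Could not request help content", e);
}

Note that setHTML() injects the fetched markup unescaped, so the help files must be trusted content.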
I've used ClientBundle in a similar setting. I've created a package my.resources and put my HTML document and the following class there:
package my.resources;

import com.google.gwt.core.client.GWT;
import com.google.gwt.resources.client.ClientBundle;
import com.google.gwt.resources.client.TextResource;

public interface MyHtmlResources extends ClientBundle {
    public static final MyHtmlResources INSTANCE = GWT.create(MyHtmlResources.class);

    @Source("intro.html")
    public TextResource getIntroHtml();
}
Then I get the content of that file by calling the following from my GWT client code:
HTML htmlPanel = new HTML();
String html = MyHtmlResources.INSTANCE.getIntroHtml().getText();
htmlPanel.setHTML(html);
See http://code.google.com/webtoolkit/doc/latest/DevGuideClientBundle.html for further information.
You can use a templating mechanism such as FreeMarker or Velocity. You keep your HTML in template files that the templating library retrieves and renders; the files can be named with their proper extensions (.html, .css, .js) and remain viewable on their own.
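As a rough illustration of the FreeMarker route (directory layout, template name, and model contents are all assumptions; process() throws TemplateException and IOException):

import freemarker.template.Configuration;
import freemarker.template.Template;
import java.io.File;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

// Server-side: merge a plain HTML template with a model and ship the result.
Configuration cfg = new Configuration(Configuration.VERSION_2_3_21);
cfg.setDirectoryForTemplateLoading(new File("templates")); // hypothetical directory
Template tpl = cfg.getTemplate("guide.html");              // hypothetical template

Map<String, Object> model = new HashMap<String, Object>();
model.put("pageName", "intro");

StringWriter out = new StringWriter();
tpl.process(model, out);
String html = out.toString(); // send this to the GWT client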
I'd say you should load the external HTML through a Frame.
Frame frame = new Frame();
frame.setUrl(GWT.getModuleBase() + getCurrentPageHelp());
add(frame);
You can arrange some convention or lookup for getCurrentPageHelp() to return the appropriate path (e.g. /manuals/myPage/help.html), as in the sketch below.
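A hypothetical lookup based on the history token (the naming convention is an assumption):

// Maps the current "page" to its help document by convention.
private String getCurrentPageHelp() {
    String page = History.getToken();         // e.g. "myPage"
    return "manuals/" + page + "/help.html";  // convention: manuals/<page>/help.html
}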
Here's an example of frame in action.
In GWT 2.0, you can do this using the UiBinder.
<ui:UiBinder xmlns:ui='urn:ui:com.google.gwt.uibinder'>
  <div>
    Hello, <span ui:field='nameSpan'/>, this is just good ol' HTML.
  </div>
</ui:UiBinder>
These files are kept separate from your Java code and can be edited as HTML. They also integrate with GWT widgets, so you can easily access elements within the HTML from your GWT code.
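For reference, a minimal owner class for a template like the one above might look like this (class and file names are assumptions; the template is assumed to live in HelloWidget.ui.xml):

import com.google.gwt.core.client.GWT;
import com.google.gwt.dom.client.DivElement;
import com.google.gwt.dom.client.SpanElement;
import com.google.gwt.uibinder.client.UiBinder;
import com.google.gwt.uibinder.client.UiField;
import com.google.gwt.user.client.ui.UIObject;

public class HelloWidget extends UIObject {
    interface MyUiBinder extends UiBinder<DivElement, HelloWidget> {}
    private static final MyUiBinder uiBinder = GWT.create(MyUiBinder.class);

    @UiField SpanElement nameSpan; // bound to ui:field='nameSpan' in the template

    public HelloWidget(String name) {
        setElement(uiBinder.createAndBindUi(this));
        nameSpan.setInnerText(name);
    }
}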
GWT 2.0, when released, should have a ClientBundle, which probably tackles this need.
You could try implementing a Generator to load external HTML from a file at compile time and build a class that emits it. There doesn't seem to be too much help online for creating generators but here's a post to the GWT group that might get you started: GWT group on groups.google.com.
I was doing similar research and, so far, I see that the best way to approach this problem is via DeclarativeUI/UiBinder. Unfortunately it's still in the incubator, so we need to work around the problem.
I solved it in a couple of different ways:
Active overlay, i.e. you create your standard HTML/CSS and inject the GWT code via a <script> tag. Wherever you need to access an element from GWT code, you write something like this:
RootPanel.get("element-name").setVisible(false);
You write your code 100% in GWT and then, if a big chunk of HTML is needed, you bring it to the client either via an IFRAME or via AJAX and inject it via an HTMLPanel, like this:
String html = "<div id='one' "
        + "style='border:3px dotted blue;'>"
        + "</div><div id='two' "
        + "style='border:3px dotted green;'"
        + "></div>";
HTMLPanel panel = new HTMLPanel(html);
panel.setSize("200px", "120px");
panel.addStyleName("demo-panel");
panel.add(new Button("Do Nothing"), "one");
panel.add(new TextBox(), "two");
RootPanel.get("demo").add(panel);
Why not use a good old IFRAME? Just create an iframe where you want the hint to appear and change its location when the GWT 'page' changes.
Advantages:
Hints are stored in separate, maintainable HTML files of any structure
AJAX-style loading with no coding at all on server side
If needed, application could still interact with loaded info
Disadvantages:
Each hint file needs a link to shared CSS for a common look-and-feel
Hard to internationalize
To make this approach a bit better, you might handle loading errors and redirect to a default language/topic on 404 errors. The search priority would then be:
Current topic for current language
Current topic for default language
Default topic for current language
Default error page
I think it's quite easy to create such a GWT component to encapsulate the iframe interaction; a rough sketch follows.
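A sketch of that fallback chain, under the assumption that each candidate is probed with a GET before pointing the iframe at it (paths, the lang/topic variables, and the helper name are hypothetical; imports as in GWT's RequestBuilder API):

// Try each candidate with a GET; show the first one that answers 200.
private void loadHelp(final String[] candidates, final int i, final Frame frame) {
    if (i == candidates.length - 1) {  // last resort: just show the error page
        frame.setUrl(candidates[i]);
        return;
    }
    RequestBuilder rb = new RequestBuilder(RequestBuilder.GET, candidates[i]);
    try {
        rb.sendRequest(null, new RequestCallback() {
            public void onResponseReceived(Request request, Response response) {
                if (response.getStatusCode() == 200) {
                    frame.setUrl(candidates[i]);
                } else {
                    loadHelp(candidates, i + 1, frame); // 404 etc.: fall through
                }
            }
            public void onError(Request request, Throwable exception) {
                loadHelp(candidates, i + 1, frame);
            }
        });
    } catch (RequestException e) {
        loadHelp(candidates, i + 1, frame);
    }
}

Usage, mirroring the priority list above:

String[] candidates = {
    "help/" + lang + "/" + topic + ".html", // current topic, current language
    "help/en/" + topic + ".html",           // current topic, default language
    "help/" + lang + "/index.html",         // default topic, current language
    "help/error.html"                       // default error page
};
loadHelp(candidates, 0, helpFrame);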
The GWT Portlets framework (http://code.google.com/p/gwtportlets/) includes a WebAppContentPortlet. This serves up any content from your web app (static HTML, JSPs etc.). You can put it on a page with additional functionality in other Portlets and everything is fetched with a single async call when the page loads.
Have a look at the source for WebAppContentPortlet and WebAppContentDataProvider to see how it is done or try using the framework itself. Here are the relevant bits of source:
WebAppContentPortlet (client side)
((HasHTML)getWidget()).setHTML(html == null ? "<i>Web App Content</i>" : html);
WebAppContentDataProvider (server side):
HttpServletRequest servletRequest = req.getServletRequest();
String path = f.path.startsWith("/") ? f.path : "/" + f.path;
RequestDispatcher rd = servletRequest.getRequestDispatcher(path);
BufferedResponse res = new BufferedResponse(req.getServletResponse());
try {
    rd.include(servletRequest, res);
    res.getWriter().flush();
    f.html = new String(res.toByteArray(), res.getCharacterEncoding());
} catch (Exception e) {
    log.error("Error including '" + path + "': " + e, e);
    f.html = "Error including '" + path +
            "'<br>(see server log for details)";
}
You can use servlets with JSPs for the HTML parts of the page and still include the JavaScript needed to run the GWT app on the page.
I'm not sure I understand your question, but I'm going to assume you've factored out this common summary into its own widget. If so, the problem is that you don't like the ugly way of embedding HTML in Java code.
GWT 2.0 has UiBinder, which allows you to define the GUI in raw HTMLish template, and you can inject values into the template from the Java world. Read through the dev guide and it gives a pretty good outline.
Take a look at
http://code.google.com/intl/es-ES/webtoolkit/doc/latest/DevGuideClientBundle.html
You can try a GWT app with HTML templates generated and bound at run time, not compile time.
Not knowing GWT: can't you define an anchor div tag in your app's HTML, then perform a GET against the HTML files that you need and append the result to the div? How different would this be from a micro-template?
UPDATE:
I just found this nice jQuery plugin in an answer to another StackOverflow question.
Related
In GWT we need to use # in a URL to navigate from one page to another, i.e. for creating history, e.g. www.abc.com/#questions/10245857. Because of this I am facing a problem sharing the URL: Google's crawlers only read the part of the URL before the #, i.e. www.abc.com.
Now I want to remove the # from my URL and keep it simply as www.abc.com/question/10245857.
I am unable to do so. How can I do this?
When the user navigates the app I use hash URLs and the History object (so as not to reload the pages). However, sometimes it's nice/needed to have a pretty URL (e.g. for sharing, showing in public, etc.), so I would like to know how to provide a pretty URL of the same page.
Note:
We have to do this to make our webpage URLs crawlable and to link the website with the outside world.
There are 3 issues here, and each can be solved:
The URL should appear prettier to the user
Going directly to the pretty URL should work.
WebCrawlers should be able to get the content
These may all seem like the same issue, but they are quite distinct in this context.
Display Pretty URLs
Can be done with a small javascript file which uses HTML5 state methods. You can see a simple demo here, with source here. This makes all changes to "#" appear without the "#" (on modern browsers).
Relevant code from the fiddle:
var stateObj = {locationHash: hash};
history.replaceState(stateObj, "Page Title", baseURL + hash.substring(1));
Respond to Pretty URLs
This is relatively simple, as long as you already have a listener in GWT that loads based on the "#" at page load. You can just throw up a simple redirect servlet which reinserts the "#" where it belongs when requests come in.
For a servlet, listening for the pretty URL:
if (request.getPathInfo() != null && request.getPathInfo().length() > 1) {
    response.sendRedirect("#" + request.getPathInfo());
    return;
}
Alternatively, you can serve up your GWT app directly from this servlet, and initialize it with parameters from the URL, but there's a bit of relative-path bookkeeping to be aware of.
WebCrawlers
This is the trickiest one. Basically you can't get around having static(ish) pages here. That's not too hard if there are a finite set of simple states that you're indexing. One simple scheme is to have a separate servlet which returns the raw content you normally fetch with GWT, in minimal formatted HTML. This servlet can have a different URL pattern like "/indexing/". These wouldn't be meant for humans, just for the webcrawlers. You can attach a simple javascript in the <head> to redirect users to the pretty url once the page loads.
Here's an example for the doGet method of such a servlet:
response.setContentType("text/html;charset=UTF-8");
response.setStatus(200);
pw = response.getWriter();
pw.println("<html>");
pw.println("<head><script>");
pw.println("window.location.href='http://www.example.com/#"
+ request.getPathInfo() + "';");
pw.println("</script></head>");
pw.println("<body>");
pw.println(getRawPageContent(request.getPathInfo()));
pw.println("</body>");
pw.println("</html>");
pw.flush();
pw.close();
return;
You should then just have some links to these indexing pages hidden somewhere on your main app URL (or behind a link on your main app URL).
I have read hundreds of SO posts and studied several of the Java HTTP proxy sources available... but I could not find a solution to my problem.
I wrote a web app that proxies HTTP requests. The web app is working, but links and referrers become broken because the "root" of the proxied page points to the root of my server and not to the path of my proxy servlet.
To make it more clear:
My ProxyServlet gets a Request "http://myserver.com/proxy/ProxyServlet?foo=bar"
The ProxyServlet then fetches the page content from server X (e.g. "http://original.com/test.html")
The content of the page is delivered to the browser by just reading and writing from one stream to the other and copying the headers.
The browser displays the page; the URL the browser shows is the original request ("http://myserver.com/proxy/ProxyServlet?foo=bar"), but all relative links now point to
"http://myserver.com/XXX.html" instead of "http://myserver.com/proxy/ProxyServlet/XXX.html"
Is there a response-header where I can change the "path" so that relative links correctly point to my ProxyServlet?
(Rewriting the page-content and replacing links would be too difficult, because the page contains relatively addressed elements such as javascript code and other active content...)
(Changing the mapping for my Servlet to "/*" is also not possible... it must be accessed via this path...)
You are inventing a "reverse proxy" and missing the "URL rewriting" feature...
Off the top of my search results, here's an open source proxy servlet that does this:
http://j2ep.sourceforge.net/docs/rewrite.html
Also, you should know there is probably something wrong with the system architecture if you have to do this. Dropping in a standalone proxy like Apache, nginx, or Varnish should always be an option, as you will HAVE to add one (or more!) once you start scaling.
It sounds like the page you're proxying uses absolute links, e.g. <a href="/XXX.html">, which means "no matter where this link is found, look for it relative to the document root". If you have control of it, the best thing is for the proxy target to be more lenient in its linking and instead use <a href="XXX.html">. If you can't do that, then you need to rewrite these URLs. Some example code, using jsoup:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Parse against the page's real URL so relative links can be resolved.
Document doc = Jsoup.parse(rawBody, getDisplayUrl());
// Rewrite stylesheet links and anchors to absolute URLs.
for (Element cssALink : doc.select("link[rel=stylesheet],a[href]")) {
    cssALink.attr("href", cssALink.absUrl("href"));
}
// Rewrite script and image sources to absolute URLs.
for (Element imgJsLink : doc.select("script[src],img[src]")) {
    imgJsLink.attr("src", imgJsLink.absUrl("src"));
}
return doc.toString();
I'm currently building a SmartGWT-based web application (using the Portlet layout). So I have several "Portlets", which basically extend the GWT Window with different content. Now I want a Portlet to display Dygraphs. So I've created an RPC service implementation which returns a JSON String (based on a DataTable object).
Since I cannot directly serialize a DataTable object I use
String json = JsonRenderer.renderDataTable(data, true, true).toString();
where "data" is of type DataTable.
Now this String gets correctly passed to the client side, where I want to create the Dygraph. In this thread, someone suggested using
public static native DataTable toDataTable(String json) /*-{
    return new $wnd.google.visualization.DataTable(eval("(" + json + ")"));
}-*/;
If I use this in my GWT client code, I get an error saying
com.google.gwt.core.client.JavaScriptException: (TypeError): $wnd.google.visualization is undefined
Am I missing some "import" of the Visualization API? Where do I have to instantiate it?
Or is there another way to get the JSON data string into the Dygraph? I can't find any examples...
Thank you for any hint!
I assume you have included the visualization.jar and the visualization namespace in your module's XML:
<inherits name="com.google.gwt.visualization.Visualization"/>
This will give you the classes. You have probably done this already; otherwise you would have gotten a compiler error.
However, you also have to include the actual visualization JavaScript from the Google servers (the visualization.jar is only a wrapper). This can be done in two different ways:
1.) Include it in the host page:
<script type="text/javascript" src="http://www.google.com/jsapi"></script> <!-- the jsapi loader defines google.load() -->
<script type="text/javascript">
  google.load("visualization", "1", {'packages' : ["corechart"] });
</script>
or
2.) Load it dynamically where you need it:
VisualizationUtils.loadVisualizationApi(onLoadCallback, MotionChart.PACKAGE);
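For instance, a minimal sketch of the dynamic variant (the callback body is an assumption about where your chart setup would go):

// Runs once the visualization API has been fetched from Google's servers.
Runnable onLoadCallback = new Runnable() {
    public void run() {
        // $wnd.google.visualization is defined from this point on, so it is
        // now safe to call toDataTable(json) and build the chart here.
    }
};
VisualizationUtils.loadVisualizationApi(onLoadCallback, MotionChart.PACKAGE);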
see http://code.google.com/docreader/#p=gwt-google-apis&s=gwt-google-apis&t=VisualizationGettingStarted
By the way, I have forked the Dygraphs project and changed the GWT wrapper to be more like the other visualization wrappers. You can check it out here: https://github.com/timeu/dygraphs
Edit: I have a new GWT wrapper for dygraphs that uses the GWT 2.8's new JsInterop: https://github.com/timeu/dygraphs-gwt
Note: I changed some behaviour in dygraphs and added some features which weren't available in the upstream code.
I'm looking for a class/util, etc., to sanitize HTML code, i.e. remove dangerous tags, attributes, and values to avoid XSS and similar attacks.
I get HTML code from a rich text editor (e.g. TinyMCE), but it could be sent another, malicious way, bypassing TinyMCE's validation ("data submitted from off-site").
Is there anything as simple to use as InputFilter in PHP? The perfect solution I can imagine would work like this (assuming the sanitizer is encapsulated in an HtmlSanitizer class):
String unsanitized = "...<...>...";            // some potentially dangerous HTML on input
HtmlSanitizer sat = new HtmlSanitizer();       // sanitizer util class created
String sanitized = sat.sanitize(unsanitized);  // voila - sanitized is safe...
Update: the simpler the solution, the better! A small util class with as few external dependencies on other libraries/frameworks as possible would be best for me. Any suggestions?
You can try OWASP Java HTML Sanitizer. It is very simple to use.
PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("a")
    .allowUrlProtocols("https")
    .allowAttributes("href").onElements("a")
    .requireRelNofollowOnLinks()
    .toFactory();
String safeHTML = policy.sanitize(untrustedHTML);
Thanks to @Saljack's answer. Just to elaborate more on the OWASP Java HTML Sanitizer: it worked out really well (and quickly) for me. I just added the following to the pom.xml in my Maven project:
<dependency>
<groupId>com.googlecode.owasp-java-html-sanitizer</groupId>
<artifactId>owasp-java-html-sanitizer</artifactId>
<version>20150501.1</version>
</dependency>
Check here for latest release.
Then I added this function for sanitization:
private String sanitizeHTML(String untrustedHTML) {
    PolicyFactory policy = new HtmlPolicyBuilder()
        .allowAttributes("src").onElements("img")
        .allowAttributes("href").onElements("a")
        .allowStandardUrlProtocols()
        .allowElements("a", "img")
        .toFactory();
    return policy.sanitize(untrustedHTML);
}
More tags can be added by extending the comma-delimited parameter of the allowElements method.
Just add this line before passing the bean off to save the data:
bean.setHtml(sanitizeHTML(bean.getHtml()));
That's it!
For more complex logic, this library is very flexible and it can handle more sophisticated sanitizing implementation.
You could use OWASP ESAPI for Java, which is a security library that is built to do such operations.
Not only does it have encoders for HTML, it also has encoders to perform JavaScript, CSS and URL encoding. Sample uses of ESAPI can be found in the XSS prevention cheatsheet published by OWASP.
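For example, a one-line sketch with ESAPI's encoder (assumes an ESAPI.properties configuration is on the classpath):

import org.owasp.esapi.ESAPI;

// Encode untrusted input so it renders as inert text in an HTML context.
String safe = ESAPI.encoder().encodeForHTML(untrustedInput);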
You could use the OWASP AntiSamy project to define a site policy that states what is allowed in user-submitted content. The site policy can be later used to obtain "clean" HTML that is displayed back. You can find a sample TinyMCE policy file on the AntiSamy downloads page.
HTML-escaping inputs works very well, but in some cases business rules require you NOT to escape the HTML. Regex is not fit for this task, and it is too hard to come up with a good solution using it.
The best solution I found was to use: http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer
It builds a DOM tree from the provided input and filters out any element not allowed by a whitelist. The API also has other functions for cleaning up HTML.
And it can also be used with javax.validation: @SafeHtml(whitelistType=, additionalTags=)
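A minimal sketch of that API (the input string is just an illustration):

import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

// Whitelist.basic() permits simple text formatting and links; everything
// else - including the script element below - is stripped.
String unsanitized = "<p>Hello <script>alert('xss')</script>world!</p>";
String sanitized = Jsoup.clean(unsanitized, Whitelist.basic());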
Regarding AntiSamy, you may want to check this regarding the dependencies:
http://code.google.com/p/owaspantisamy/issues/detail?id=95&can=1&q=redyetidave
What are the best Java libraries to fully download any webpage, render the built-in JavaScript, and then access the rendered webpage (that is, the DOM tree!) programmatically, getting the DOM tree as "HTML source"?
(Something similar to what Firebug does, in the end: it renders the page and gives me access to the fully rendered DOM tree, as the page looks in the browser! In contrast, if I click "show source" I only get the JavaScript source code. This is not what I want. I need access to the rendered page...)
(By rendering I mean only rendering the DOM tree, not visual rendering...)
This does not have to be one single library; it's OK to have several libraries that accomplish this together (one to download, one to render...), but due to the dynamic nature of JavaScript, the JavaScript library will most likely also need some kind of downloader of its own to fully render any asynchronous JS...
Background:
In the "good old days" HttpClient (Apache Library) was everything required to build your own very simple crawler. (A lot of cralwers like Nutch or Heretrix are still built around this core princible, mainly focussing on Standard HTML parsing, so I can't learn from them)
My problem is that I need to crawl some websites that rely heavily on JavaScript and that I can't parse with HttpClient as I defenitely need to execute the JavaScripts before...
You can use the JavaFX 2 WebEngine. Download the JavaFX SDK (you may already have it if you installed JDK 7u2 or later) and try the code below.
It will print the HTML with the JavaScript processed.
You can uncomment the lines in the middle to see the rendering as well.
import java.io.IOException;

import javafx.application.Application;
import javafx.beans.value.ChangeListener;
import javafx.beans.value.ObservableValue;
import javafx.scene.web.WebEngine;
import javafx.scene.web.WebView;
import javafx.stage.Stage;

// Xerces serializer classes bundled with the JDK.
import com.sun.org.apache.xml.internal.serialize.OutputFormat;
import com.sun.org.apache.xml.internal.serialize.XMLSerializer;

public class WebLauncher extends Application {

    @Override
    public void start(Stage stage) {
        final WebView webView = new WebView();
        final WebEngine webEngine = webView.getEngine();
        webEngine.load("http://stackoverflow.com");
        //stage.setScene(new Scene(webView));
        //stage.show();
        webEngine.getLoadWorker().workDoneProperty().addListener(new ChangeListener<Number>() {
            @Override
            public void changed(ObservableValue<? extends Number> observable, Number oldValue, Number newValue) {
                if (newValue.intValue() == 100 /* percent */) {
                    try {
                        // Serialize the post-JavaScript DOM back to HTML.
                        org.w3c.dom.Document doc = webEngine.getDocument();
                        new XMLSerializer(System.out, new OutputFormat(doc, "UTF-8", true)).serialize(doc);
                    } catch (IOException ex) {
                        ex.printStackTrace();
                    }
                }
            }
        });
    }

    public static void main(String[] args) {
        launch();
    }
}
This is a bit outside of the box, but if you are planning on running your code in a server where you have complete control over your environment, it might work...
Install Firefox (or XulRunner, if you want to keep things lightweight) on your machine.
Using the Firefox plugin system, write a small plugin which loads a given URL, waits a few seconds, then copies the page's DOM into a String.
From this plugin, use the Java LiveConnect API (see http://jdk6.java.net/plugin2/liveconnect/ and https://developer.mozilla.org/en/LiveConnect ) to push that string across to a public static function in some embedded Java code, which can either do the required processing itself or farm it out to some more complicated code.
Benefits: You are using a browser that most application developers target, so the observed behavior should be comparable. You can also upgrade the browser along the normal upgrade path, so your library won't become out-of-date as HTML standards change.
Disadvantages: You will need to have permission to start a non-headless application on your server. You'll also have the complexity of inter-process communication to worry about.
I have used the plugin API to call Java before, and it's quite achievable. If you'd like some sample code, you should take a look at the XQuery plugin - it loads XQuery code from the DOM, passes it across to the Java Saxon library for processing, then pushes the result back into the browser. There are some details about it here:
https://developer.mozilla.org/en/XQuery
The Selenium library is normally used for testing, but it does give you remote control of most standard browsers (IE, Firefox, etc.) as well as a headless, browser-free mode (using HtmlUnit). Because it is intended for UI verification by page scraping, it may well serve your purposes.
In my experience it can sometimes struggle with very slow JavaScript, but with careful use of "wait" commands you can get quite reliable results.
It also has the benefit that you can actually drive the page, not just scrape it. That means that if you perform some actions on the page before you get to the data you want (click the search button, click next, now scrape) then you can code that into the process.
I don't know if you'll be able to get the full DOM in a navigable form from Selenium, but it does provide XPath retrieval for the various parts of the page, which is what you'd normally need for a scraping application.
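As an illustration, a small sketch with Selenium WebDriver's HtmlUnit driver (the URL and XPath are placeholders):

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;

// Headless browser with JavaScript enabled, so scripts run before the DOM is read.
WebDriver driver = new HtmlUnitDriver(true);
driver.get("http://example.com");                          // placeholder URL

String rendered = driver.getPageSource();                  // post-JavaScript HTML
WebElement heading = driver.findElement(By.xpath("//h1")); // placeholder XPath
System.out.println(heading.getText());

driver.quit();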
You can use Java or Groovy, with or without Grails. Then use WebDriver, Selenium, Spock, and Geb. These are aimed at testing, but the libraries are useful for your case.
You can implement a crawler that won't open a new browser window but just uses the runtime of one of these browsers.
Selenium : http://code.google.com/p/selenium/
Webdriver : http://seleniumhq.org/projects/webdriver/
Spock : http://code.google.com/p/spock/
Geb : http://www.gebish.org/manual/current/testing.html
MozSwing could help: http://confluence.concord.org/display/MZSW/Home.
You can try JExplorer.
For more information see http://www.teamdev.com/downloads/jexplorer/docs/JExplorer-PGuide.html
You can also try Cobra, see http://lobobrowser.org/cobra.jsp
I haven't tried this project, but I have seen several implementations for Node.js that include JavaScript DOM manipulation.
https://github.com/tmpvar/jsdom