By multipage I mean separate HTML files, say index.html, admin.html etc.
Now one solution to achieve this is to have this in the EntryPoint class:
if (!Window.Location.getPath().toLowerCase().endsWith("myhtmlpage.html")) {
return;
}
What I want to understand is what happens when my GWT app contains a main app, an admin app, etc. The app's nocache.js file will tend to get bigger and thus take longer to load.
The question is: does the code above prevent unnecessary parts of the compiled GWT app from loading? That is, is the code for the Index EntryPoint and the Admin EntryPoint loaded separately?
No, your if/return statement would not prevent any unnecessary JavaScript code from loading.
The standard way to partition the UI javascript code is through code splitting.
The standard way to emulate multiple pages is by managing history and hyperlinks. Basically, use tokens to manage your app states, with a hash fragment at the end of the URL, e.g. #home, #admin.
A pattern I like is a combination of the above two. For a page that does not need to load initially, I hide it behind a GWT.runAsync code-splitting call to server with a distinct history token. For pages for which I want to dynamically control the content on the server side without having to recompile the javascript, I create a server call I fully control that returns html displayed on the browser through GWT HTMLPanel -- of course no need to recompile as long as the html structure and corresponding HTMLPanel code do not change. A side advantage of the latter is that you may control your server side logging to track page load statistics.
Finally, you may want to read up on GWT Activities & Places, which, from what I have read, is a standard way of dealing with history and such.
Right Patrick,
In addition, there is no way to use code splitting for JS libraries that are not part of your GWT project. So splitting into separate pages is the right way to avoid loading JS libraries that your admin code uses but your front office does not. That works if you include the JS in the page itself and don't inject it through GWT. Otherwise it's your responsibility to split the code (have a common package that is available to all, but individual loads per 'page').
In theory, they say, a good split point is an activity. I'm not convinced, since I have many activities in my pages and loading each script separately could be bad for performance, so it needs a case-by-case analysis. You can see everything that is included in each split in the compiler report.
Take the time to watch this video; it will save you a lot of trouble:
https://www.youtube.com/watch?v=0F5zc1UAt2Y
I'm working on learning JSP and the Play framework, and I understand that it runs on Scala and renders views based on templates, but what if I just want to use plain HTML rather than scala templates?
The situation I'm in is that I'm designing the site to match a visual template, so I'm using Dreamweaver to build the html files. I really like Play framework though, so I'd like to continue using it. So, what are my options here?
I don't get it. Play's views are not just nice HTML files. Of course you can (or even should) use your favorite tools for the design part, but you also have to learn how to include dynamic parts in them.
Of course you can use Dreamweaver for that task, as it has a feature for editing source code. But I can assure you from my own experience that there are better tools than DW for everyday work with Play's views.
You can also put plain HTML in your /public folder, but in that scenario you won't be able to make it dynamic, so it makes no sense: you could create the pages without any framework, just using static files created with DW.
In general: you need to verify your needs, because from your question I read: "I like Play framework, anyway I don't want to use it for its job..."
After-comments edit:
You don't have to make views dynamic. If you don't pass any arguments into the view and put pure HTML there, it will be a 'relatively cheap' way of displaying static pages as well. You just need to remember to leave the first line of the file empty. So you don't need to use File index = new File...; instead just put your bare HTML code into, e.g., app/views/staticContact.scala.html and then use an action:
public static Result staticContact(){
return ok(views.html.staticContact.render());
}
On the other hand, I was recently wondering whether it would be better to put the HTML code of the static pages into the DB. In that case you could create an editing page where you could change the HTML without redeploying the application. All you would need is to fetch the HTML from the DB and display it in one generic view. For better performance you can use the included Cache implementation.
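A minimal sketch of the DB-plus-cache idea in plain Java. The names CachedPageStore, loadFromDb and the slug key are hypothetical, and a ConcurrentHashMap stands in for Play's own Cache implementation, whose API is not shown here:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: serves static page HTML from a cache, falling back to the DB.
final class CachedPageStore {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loadFromDb; // assumed DB lookup: slug -> HTML

    CachedPageStore(Function<String, String> loadFromDb) {
        this.loadFromDb = loadFromDb;
    }

    // Returns the cached HTML, querying the DB only on the first request for a slug.
    String html(String slug) {
        return cache.computeIfAbsent(slug, loadFromDb);
    }

    // Call this from the editing page after saving, so the next request reloads from the DB.
    void invalidate(String slug) {
        cache.remove(slug);
    }
}
```

The generic view would then just output whatever html(slug) returns, and the editing page would call invalidate(slug) after each save.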
GET / controllers.Assets.at(path="/public/html", file="index.html")
This works in Play 2.0.1 for the /public/html/index.html file.
The requirement is to keep a copy of the complete web page on the server side, exactly as it was rendered in the client browser, as a past record. These records are revisited later.
We are trying to store the HTML of the rendered web page. The HTML is then rendered using resources such as JavaScript, CSS, and images present on the server side. These resources keep changing, so old records no longer render correctly.
Is there any other way to solve this? We are also thinking of converting the page to PDF using iText or Apache FOP, but they do not take the effect of JavaScript on the page into account during conversion. Are there any Java APIs available to achieve this?
So far, no approach works perfectly. Please suggest.
Edit:
In summary, the requirement is to create an exact copy of the rendered web page on the server side in order to store user activities on that page.
wkhtmltopdf should do this quite nicely for you. It will take a URL, and return a pdf.
code.google.com/p/wkhtmltopdf
Example:
wkhtmltopdf http://www.google.com google.pdf
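If you want to trigger this from Java, here is a rough sketch that shells out to the tool with ProcessBuilder. It assumes wkhtmltopdf is installed and on the PATH; the class and method names (PdfSnapshot, command, render) are made up for illustration:

```java
import java.io.IOException;
import java.util.List;

// Hypothetical wrapper around the wkhtmltopdf command line shown above.
final class PdfSnapshot {
    // Builds the same command as the example: wkhtmltopdf <url> <output.pdf>
    static List<String> command(String url, String outputPdf) {
        return List.of("wkhtmltopdf", url, outputPdf);
    }

    // Runs the tool and returns its exit code; assumes wkhtmltopdf is on the PATH.
    static int render(String url, String outputPdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(url, outputPdf))
                .inheritIO() // pass the tool's progress and error output through to our console
                .start();
        return process.waitFor();
    }
}
```

Calling PdfSnapshot.render("http://www.google.com", "google.pdf") would run the same command as above. Check the exit code before treating the PDF as a valid record.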
Depending on just how sophisticated your javascript is, and depending on how faithfully you want to capture what the client saw, you may be undertaking an impossible task.
At a high level, you have the following options:
1. Keep a copy of everything you send to the client.
2. Get the client to return exactly what it has rendered.
3. Build your system in such a way that you can actually fetch all historical versions of the constituent resources if/when you need to reproduce a browser's view.
You can do #1 using JSP filters etc, but it doesn't address issues like the javascript fetching dynamic html content during rendering on the client.
Getting the client to return what they are seeing (#2) is tricky, and bandwidth intensive.
So I would opt for #3. To version a website that renders dynamic content, you have to do several things. First, all data sources need to be versioned too, so any query would need to specify the version. "Version" can be a timestamp or some generation counter that you maintain. If you take this approach, you also need to ensure that any JavaScript you feed to the client does not fetch external resources directly. Rather, it should ask your system for any resources. Your system would in turn fetch the external content (or reuse it from a cache).
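To make the "every query specifies a version" idea concrete, here is a small in-memory sketch. VersionedStore, put and getAsOf are hypothetical names, and a real system would back this with a database rather than a map:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical in-memory store: keeps every version of each resource.
final class VersionedStore {
    // resource name -> (version -> content), with versions sorted ascending
    private final Map<String, TreeMap<Long, String>> history = new HashMap<>();

    // Records the content of a resource as of the given version (timestamp or counter).
    void put(String name, long version, String content) {
        history.computeIfAbsent(name, k -> new TreeMap<>()).put(version, content);
    }

    // Returns the content that was current at the requested version:
    // the entry with the greatest version that is <= the requested one.
    Optional<String> getAsOf(String name, long version) {
        TreeMap<Long, String> versions = history.get(name);
        if (versions == null) {
            return Optional.empty();
        }
        Map.Entry<Long, String> entry = versions.floorEntry(version);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }
}
```

To replay a page as seen at time T, every resource lookup (HTML, CSS, JS, data) would go through getAsOf(name, T) instead of reading the latest copy.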
The answer would depend on the server technology being used to write the HTML. Are you using Java/JSPs or Servlets or some sort of an HTTPResponse object to push the HTML/data to the browser?
If only the CSS/JS/HTML are changing, why don't you just take snapshots of your client-side codebase and store them as website versions?
If other data is involved (like XML/JSON) take a snapshot of those and version that as well. Then the snapshot of the client codebase as mentioned above with the contemporary snapshot of the data should together give you the exact rendering of your website as at that point of time.
A very resource-consuming requirement but...
You haven't written which application server and which framework you are using. If you're generating responses in your own code, you can just store them while generating.
Another possibility is to write a filter that wraps the servlet's OutputStream and logs everything written to it; you just have to make sure your filter is at the top of the hierarchy.
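The core of such a filter is a stream wrapper that copies every byte while still passing it on. A minimal sketch using only java.io follows; the class name RecordingOutputStream is hypothetical, and the servlet wiring (the Filter itself and a response wrapper that returns this stream) is left out:

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical "tee" stream: forwards everything to the real target and keeps a copy.
final class RecordingOutputStream extends FilterOutputStream {
    private final ByteArrayOutputStream copy = new ByteArrayOutputStream();

    RecordingOutputStream(OutputStream target) {
        super(target);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);   // send to the client as usual
        copy.write(b);  // and remember what was sent
    }

    @Override
    public void write(byte[] buffer, int offset, int length) throws IOException {
        out.write(buffer, offset, length);
        copy.write(buffer, offset, length);
    }

    // Everything written so far, ready to be stored as the record of this response.
    byte[] recorded() {
        return copy.toByteArray();
    }
}
```

Note that the copy is held in memory, so for large responses you would probably write it to a file instead.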
Another solution, very powerful, generic and the easiest to manage, but possibly the most resource-consuming: write a transparent proxy server that sits between the user and the application server, forwards each call to the app server and returns the exact response, additionally saving each request and response.
If you're storing the html page, why not the references to the js, css, and images too?
I don't know what your implementation is now, but you should create a filesystem with all of the html pages and resources, and create references to the locations in a db. You should be backing up the resources in the filesystem every time you change them!
I use this implementation for an image archive. When a client passes us the url of an image we want to be able to go back and check out exactly what the image was at that time they sent it (since it's a url it can change at any time). I have a script that will download the image as soon as we receive the url, store it in the filesystem, and then store the path to the file in the db along with other various details. This is similar to what you need, just a couple more rows in your table for the js, css, images paths.
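A rough sketch of the "store it in the filesystem, keep the path in the db" part. ResourceArchive is a hypothetical class, and naming files by their SHA-256 hash is my own assumption rather than what the script described above does. The download step and the db insert are left out:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical archive: writes each distinct version of a resource to its own file.
final class ResourceArchive {
    private final Path root;

    ResourceArchive(Path root) {
        try {
            Files.createDirectories(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.root = root;
    }

    // Stores the bytes under a name derived from their hash and returns the path,
    // which is what you would save in the db row next to the original URL.
    Path store(byte[] content, String extension) {
        Path target = root.resolve(sha256Hex(content) + extension);
        try {
            if (!Files.exists(target)) {
                Files.write(target, content);
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back a previously stored file.
    byte[] load(Path stored) {
        try {
            return Files.readAllBytes(stored);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the file name depends on the content, a changed js or css file gets a new path and the old version stays on disk, so old records keep pointing at exactly what they were rendered with.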
Still trying to decide which one will suit. Current options: JxBrowser vs. the SWT Browser widget.
Java application implements a webbrowser control like JxBrowser or SWT browser control. Both of these provide options to pass info from java to javascript.
Now I need to know: Is it possible not to save the html/css/javascript file into the cache? Is it possible to have java serve the content as input (looks like this is possible with SWT, unsure of JxBrowser).
Essentially I don't want to have temporary files, either in the cache or in temp folder, and I want to feed the information from an input stream.
Or do you have to roll/embed your own browser to avoid having all saved to cache?
Perhaps for clarity: I am asking if these two programs offer defined methods when implementing their own browser from within java, to not cache, and if you can stream input directly to serve the html / css/ javascript content.
I understand the no-cache methods in a webbrowser, here I am simply asking whether embedding the browser behaves in the same way. The documentation does not seem very specific about this issue, but perhaps I need to look more.
On further investigation, it looks like it saves files to the cache. Secondly, methods like clearing the cache are so non-specific that calling the clear-cache function ends up emptying the client's entire cache. Argh....
It's possible to render HTML from memory with the SWT Browser widget. I'm not sure how caching works for those pages (it may depend on the browser used), but it seems reasonable that those pages are not cached.
See SWT Browser snippets for additional info about rendering HTML from memory.
I'd like to store then later display user-entered content securely with minimal effort (my goal is a web app not writing a bunch of security-related code).
EDIT: Google App Engine for Java
I'm working with the same issue myself; but I haven't had the chance to get it out into the real world yet; so please just keep in mind that MY ANSWER IS NOT BATTLE TESTED. USE AT YOUR OWN RISK.
First, you need to ask yourself if you're going to be allowing the user to use ANY html markup. So, for example, can the user enter a link? What about make bold text?
If the answer is NO, then it is fairly simple. Here is the idea of how to set the filter up:
http://greatwebguy.com/programming/java/simple-cross-site-scripting-xss-servlet-filter/
But personally, I don't like the filter being used in the first example; I just put it there to show you how to set the filter up.
I would recommend using this filter:
http://xss-html-filter.sourceforge.net/
So basically:
Set up the example from the first link and get it working.
Download the example from the second link and put it in your project in such a way that you can access it from your code.
Rewrite the cleanXSS method to use what you downloaded from the second link. So probably something like:
private String cleanXSS(String value) {
return new HTMLInputFilter().filter(value);
}
If you do want to allow HTML (such as an anchor tag/etc) then it looks like the HTMLInputFilter has mechanisms to allow this; but it isn't documented so you'll have to figure it out by looking at the code yourself or provide your own way of filtering.
user-entered content securely with minimal effort (my goal is a web app not writing a bunch of security-related code).
How much security-related code you need to write depends on how much you are at risk (how likely it is that someone would want to attack your site, which itself is related to how popular your site is).
For example, if you're writing a public notepad that will have a total of 3 users, you can get away with the bare minimum. If, however, you're writing a "we hate China, Iran and all hackers/crackers" app dealing with $1,000,000 worth of transactions an hour and 3 billion users, you may be a bit more of a target.
Simply put, you shouldn't trust any data that comes from outside your app, including from the datastore. All this data should be checked to confirm that it's what you expect.
I've not validated incoming Java Strings against XSS; however, removing HTML is normally good enough, and Jsoup looks interesting for this (see Remove HTML tags from a String).
Also, to be sure, you should ensure that you're outputting what you expect to output and not some JavaScript.
Most templating engines, including django's (which is bundled with App Engine), provide facilities to escape output to make it safe to print in HTML. In newer versions of Django, this is done automatically unless you tell it not to; in 0.9.6 (still the default in webapp), you pass your output values to |escape in the template.
Escaping on output is universally the best way to do this, because it means you have the original unmodified text; if you modify your escaping or output formatting later, you can still format text entered before that.
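If you are on the Java runtime without a templating engine that escapes for you, escaping on output can be as small as the sketch below. HtmlEscaper is a hypothetical helper, not part of App Engine or any library, and it only covers text placed in HTML element content or quoted attribute values, not JavaScript or URL contexts:

```java
// Hypothetical helper: escapes the characters that are significant in HTML.
final class HtmlEscaper {
    // Apply this when printing user-entered text; store the original text unmodified.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, HtmlEscaper.escape("<b>hi</b>") gives "&lt;b&gt;hi&lt;/b&gt;", which the browser shows as literal text instead of interpreting it as markup.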
You can also use a service that proxies all connections and blocks any XSS attempts. I know of only one service like that, CloudFlare (which doesn't mean there aren't others). Unfortunately, the security features come with the Pro plan, which is paid :(
I need to screen scrape some data from a website, because it isn't available via their web service. When I've needed to do this previously, I've written the Java code myself using Apache's HTTP client library to make the relevant HTTP calls to download the data. I figured out the relevant calls I needed to make by clicking through the relevant screens in a browser while using the Charles web proxy to log the corresponding HTTP calls.
As you can imagine this is a fairly tedious process, and I'm wondering if there's a tool that can actually generate the Java code that corresponds to a browser session. I expect the generated code wouldn't be as pretty as code written manually, but I could always tidy it up afterwards. Does anyone know if such a tool exists? Selenium is one possibility I'm aware of, though I'm not sure if it supports this exact use case.
Thanks,
Don
I would also add +1 for HtmlUnit since its functionality is very powerful: if you are needing behaviour 'as though a real browser was scraping and using the page' that's definitely the best option available. HtmlUnit executes (if you want it to) the Javascript in the page.
It currently has full featured support for all the main Javascript libraries and will execute JS code using them. Corresponding with that you can get handles to the Javascript objects in page programmatically within your test.
If, however, the scope of what you are trying to do is smaller, more along the lines of reading some of the HTML elements without caring much about JavaScript, then NekoHTML should suffice. It's similar to JDom in giving programmatic, rather than XPath, access to the tree. You would probably need to use Apache's HttpClient to retrieve pages.
The manageability.org blog has an entry which lists a whole bunch of web page scraping tools for Java. However, I do not seem to be able to reach it right now, but I did find a text only representation in Google's cache here.
You should take a look at HtmlUnit - it was designed for testing websites but works great for screen scraping and navigating through multiple pages. It takes care of cookies and other session-related stuff.
I would say I personally like to use HtmlUnit and Selenium as my 2 favorite tools for Screen Scraping.
A tool called The Grinder allows you to script a session to a site by going through its proxy. The output is Python (runnable in Jython).