I have a complex invoice page with order items and other details. I will be working with these kinds of forms all the time. When the page loads I am currently using Java code (for loops etc.) in the JSP to write the HTML.
Another option is to return a JSON object and use JavaScript to build the form. Which is the better practice?
FYI, I am only using Java, JSPs, jQuery, and JSON.
These are two absolutely different approaches.
Building your page with Java/JSP is a server-side approach. That means you control on the server exactly which data is returned to clients; how that data is represented is a separate question.
JavaScript is a client-side technology: more flexible, less boilerplate, and often faster, but page generation then depends on the client's JavaScript engine. If JavaScript is disabled, you can run into problems.
If you are not tied to any technical requirements, I recommend choosing JavaScript, because your use case is fetching data from the server.
If you mainly need to send data to the server, a server-side approach is preferable.
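For the fetch-data-from-the-server case, here is a minimal sketch of a servlet that returns invoice order items as JSON. The class, URL, and field names are made up, and the JSON is built by hand only to avoid assuming a particular library; real code should use a JSON library and escape values properly.

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical JSON endpoint for the invoice page's order items.
public class OrderItemsServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String invoiceId = req.getParameter("invoiceId");
        List<String[]> items = loadOrderItems(invoiceId); // placeholder for your DAO/service call

        // Hand-rolled JSON purely for illustration.
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) {
                json.append(",");
            }
            String[] item = items.get(i);
            json.append("{\"name\":\"").append(item[0])
                .append("\",\"quantity\":").append(item[1]).append("}");
        }
        json.append("]");

        resp.setContentType("application/json");
        resp.setCharacterEncoding("UTF-8");
        resp.getWriter().write(json.toString());
    }

    private List<String[]> loadOrderItems(String invoiceId) {
        // Dummy data standing in for the real order items of the given invoice.
        return Arrays.asList(new String[] { "Widget", "2" }, new String[] { "Gadget", "5" });
    }
}

On the client, jQuery could then call something like $.getJSON("orderItems?invoiceId=42", renderRows) (renderRows being your own callback) and build the form rows from the returned array instead of looping in the JSP.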
There are many "options" for doing this. For web forms I prefer the route of javascript with a json object. js is very flexible and is very easy to implement.
I would suggest json, jquery plugins... More manageable & cleaner codes
Client-side rendering is generally considered slower (in terms of page load times); Twitter found this recently when it switched back from client-side to server-side rendering and saw its pages load in a fifth of the time.
It's not going to make much difference when you're running a site on a Core i7 computer with 16GB of RAM; where it does matter is when you're running a heavily client-side site on a mobile device or an older computer. More JavaScript means higher RAM usage and more work for the browser to do (downloading, parsing, running).
Don't get me wrong, I'm a big fan of client-side rendering, as it lets me use the awesome AngularJS. For me it's about finding a nice balance between the two: I want a fast-loading page with a rich UI experience.
So, for example, if I was creating a news listings page, I could have the first 10 or so results loaded with the page, from the server. Then I'd have something like a "Load more news articles" button at the bottom of the page which loads the next 10 news items asynchronously via some JavaScript call (e.g. jQuery $.ajax(), AngularJS $http, etc.) to a JSON service.
As opposed to:
Client-side only: load the initial 10 news items via JavaScript & JSON as your page is loading. Clicking the "Load more news articles" button uses the same service to load additional news items on request.
Server-side only: load the initial 10 news items with the page, from the server. Clicking the "Load more news articles" button triggers a page refresh and the next 10 news articles are returned from the server.
I am currently beginning development of a (UI?) backup of a web platform. It is not our platform and I don't have access to the source.
I only have the HTML-rendered view of the form data I entered.
So the task is to browse to the HTML, store the data (as XML/JSON), and then log in to the site later to fill out the forms again and resubmit the data...
At the moment I'm prototyping with C++ and QtWebEngine.
What's the best way to do such a task? What are good frameworks for "browsing" the web and analysing HTML?
Solutions in C++/Java/JavaScript (or a Firefox add-on?) are preferred.
Thanks for your help!
Like a DSL interpreter, work with the Document Object Model (DOM).
My advice: a C# Windows Forms app with the WebBrowser control:
WebBrowser.Navigate([url])
the WebBrowser.DocumentCompleted event
WebBrowser.Document (read the document; see the documentation for System.Windows.Forms.HtmlDocument)
You may also need to inject some JavaScript into the page.
/*
please don't use this information for hacking or attacks
*/
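Since the question also lists Java as an option, here is a hedged sketch of the same navigate / wait-for-document / read-and-fill-the-DOM flow using HtmlUnit instead of the WebBrowser control. The URL, form name, and field names are made up, and try-with-resources assumes a recent HtmlUnit version where WebClient is closeable.

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public class FormBackupSketch {

    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            // Let HtmlUnit run the page's JavaScript so the rendered form matches what a browser shows.
            webClient.getOptions().setJavaScriptEnabled(true);

            // 1. Load the page and take a snapshot of the rendered markup (store it as XML/JSON elsewhere).
            HtmlPage page = webClient.getPage("https://example.com/platform/form"); // made-up URL
            String snapshot = page.asXml();
            System.out.println(snapshot.length() + " characters captured");

            // 2. Later: log in again, refill the form from the stored values, and resubmit.
            HtmlForm form = page.getFormByName("entryForm");                 // made-up form name
            HtmlTextInput description = form.getInputByName("description"); // made-up field name
            description.setValueAttribute("restored value");
            form.getInputByName("submitButton").click();                     // made-up submit button name
        }
    }
}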
You could definitely do something like this using Firefox's Addon SDK. In particular you should look into the PageWorker module that allows you to load and run JS code against web pages without showing the page - everything happens in the background.
I am trying to scrape a website using WebClient. I am able to get the data on the first page and parse it, but I do not know how to read the data on the second page; the website uses JavaScript to navigate to the second page. Can anyone suggest how I can get the data from the next pages?
Thanks in advance
The problem you're going to have is that while you (a person) can read the JavaScript in the first page and see that it navigates to another page, having the computer do this is going to be hard.
If you could identify the block of code performing the navigation, you would then need to execute it in such a way that allowed your program to extract the URL. This again is going to be very specific to the structure of the JavaScript and would require a person to identify this.
In short, I think you're dead in the water with this one, though it serves as a good example of why the Unobtrusive JavaScript concept is so important.
This framework integrates HtmlUnit with its headless, JavaScript-enabled browser to fully support scraping multiple pages in the same WebClient session: https://github.com/subes/invesdwin-webproxy
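For illustration, a hedged HtmlUnit sketch of following a JavaScript-driven "next" link within one WebClient session; the URL and link text are made up, and a recent HtmlUnit version is assumed.

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class NextPageSketch {

    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            // JavaScript must stay enabled so the site's navigation code actually runs.
            webClient.getOptions().setJavaScriptEnabled(true);

            HtmlPage firstPage = webClient.getPage("https://example.com/results"); // made-up URL
            // ... parse firstPage here ...

            // Clicking the anchor executes its JavaScript and returns whatever page it navigates to.
            HtmlAnchor next = firstPage.getAnchorByText("Next"); // made-up link text
            HtmlPage secondPage = next.click();
            webClient.waitForBackgroundJavaScript(5000); // give any asynchronous JS time to finish
            // ... parse secondPage here ...
        }
    }
}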
I am about to start a totally new web project.
The project needs to have several small windows, each containing HTML generated from other web sites.
One important requirement is that when a user submits a form in one window, NO refresh should be triggered in the other windows.
My leader says we should look into JSR 286 portlets (because a portlet sounds like a window?). But after looking into some examples (Pluto portal / Jetspeed 2), none of them supports the requirement: whenever one window is submitted, the whole page is submitted.
My rough idea is to use an iframe in each window and let the iframe do the rest (point to the external website, handle the form submission).
Personally, I don't think iframes fit well into the JSR 286 portlet model. Most of the windows have nothing to do with each other, so processEvent is not required.
So my questions are:
For a new project with such a requirement (independent form submission), is it worth conforming to the JSR 286 portlet spec?
If it is, how do iframes work with the different portlet modes (VIEW/EDIT/HELP) or window states (MAXIMIZED/NORMAL/MINIMIZED)?
Thank you very much!
There's a good explanation here that you can point your team leader to. It says:
Mashups and portals are both content aggregation technologies. Portals are an older technology designed as an extension to traditional dynamic Web applications, in which the process of converting data content into marked-up Web pages is split into two phases: generation of markup "fragments" and aggregation of the fragments into pages. Each markup fragment is generated by a "portlet", and the portal combines them into a single Web page. Portlets may be hosted locally on the portal server or remotely on a separate server.
And, critically:
Portal technology is about server-side, presentation-tier aggregation.
So aggregation is done on the portal server (even when the portlet servers are separate; this is driven by the need to make the server side scalable on big sites, not by clients combining content from multiple sources). And that's why a submission refreshes the whole page: the new page has to be loaded from the portal.
That should help clear things up, since it sounds like what you are looking for is client-side aggregation (I don't think I'm telling you anything new here, but I'm giving you references in "enterprise speak" that might sound more convincing).
(So, just in case it's not clear: your requirements sound like you need a client-side mashup. Portlets won't work because they are assembled server-side. Iframes would work, but have some limitations (size, rescaling, styling / dynamic changes). I was going to suggest combining things on the client using JavaScript with Backbone, but I am worried you'll have issues pulling data from different sites because of the restrictions on what JavaScript within a web page can access. Looks like this article would be worth reading...)
The requirement is to keep a copy of the complete web page on the server side, exactly as it was rendered in the client's browser, as a historical record. These records are revisited later.
We are currently storing the HTML of the rendered web page. That HTML is later re-rendered using resources (JavaScript, CSS, and images) present on the server side. These resources keep changing, so old records no longer render perfectly.
Is there any other way to solve this? We are also considering converting the page to PDF using iText or the Apache FOP API, but they do not take the JavaScript effects on the page into account during conversion. Are there any APIs available in Java to achieve this?
So far, no approach works perfectly. Please advise.
Edit:
In summary, the requirement is to create an exact copy of the rendered web page on the server side in order to preserve the user's activity on that page.
wkhtmltopdf should do this quite nicely for you. It will take a URL and return a PDF.
code.google.com/p/wkhtmltopdf
Example:
wkhtmltopdf http://www.google.com google.pdf
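If you need to drive it from your Java code, a minimal sketch using ProcessBuilder, assuming the wkhtmltopdf binary is installed and on the PATH:

import java.io.IOException;

public class WkhtmltopdfRunner {

    public static void snapshot(String url, String outputPdf) throws IOException, InterruptedException {
        // Invokes the external binary: wkhtmltopdf <url> <output.pdf>
        Process process = new ProcessBuilder("wkhtmltopdf", url, outputPdf)
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("wkhtmltopdf failed with exit code " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        snapshot("http://www.google.com", "google.pdf");
    }
}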
Depending on just how sophisticated your javascript is, and depending on how faithfully you want to capture what the client saw, you may be undertaking an impossible task.
At a high level, you have the following options:
Keep a copy of everything you send to the client
Get the client to send back exactly what it has rendered
Build your system in such a way that you can actually fetch all historical versions of the constituent resources if/when you need to reproduce a browser's view.
You can do #1 using JSP/servlet filters etc., but it doesn't address issues like JavaScript fetching dynamic HTML content during rendering on the client.
Getting the client to return what it is seeing (#2) is tricky and bandwidth-intensive.
So I would opt for #3. To make a website that renders dynamic content fully versioned, you have to do several things. First, all data sources need to be versioned too, so any query would need to specify the version; the "version" can be a timestamp or a generation counter that you maintain. If you take this approach, you would also need to ensure that any JavaScript you send to the client does not fetch external resources directly. Rather, it should request all resources from your system, which would in turn fetch the external content (or reuse it from a cache).
The answer would depend on the server technology being used to write the HTML. Are you using Java/JSPs or servlets, or some sort of HTTP response object to push the HTML/data to the browser?
If only the CSS/JS/HTML are changing, why don't you just take snapshots of your client-side codebase and store them as website versions?
If other data is involved (like XML/JSON), take a snapshot of that and version it as well. The snapshot of the client codebase mentioned above, together with the contemporary snapshot of the data, should then give you the exact rendering of your website as at that point in time.
A very resource-consuming requirement but...
You haven't said which application server and framework you are using. If you're generating the responses in your own code, you can simply store them as they are generated.
Another possibility is to write a filter that wraps the servlet's output and logs everything written to it; you just need to make sure your filter sits at the top of the filter chain.
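A minimal sketch of that filter idea, assuming the javax.servlet API; archive(...) is a placeholder for whatever storage you choose, and for brevity it only captures output written through getWriter(), which is what JSPs use.

import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

public class ResponseArchivingFilter implements Filter {

    public void init(FilterConfig config) { }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        TeeResponseWrapper wrapper = new TeeResponseWrapper((HttpServletResponse) res);
        chain.doFilter(req, wrapper);
        archive(wrapper.getCapturedHtml()); // store together with the user, URL and timestamp
    }

    public void destroy() { }

    private void archive(String html) {
        // Placeholder: write the captured page to a versioned store (filesystem, database, ...).
    }

    // Wraps the response and copies every character written through getWriter().
    private static class TeeResponseWrapper extends HttpServletResponseWrapper {
        private final CharArrayWriter copy = new CharArrayWriter();
        private PrintWriter tee;

        TeeResponseWrapper(HttpServletResponse response) {
            super(response);
        }

        @Override
        public PrintWriter getWriter() throws IOException {
            if (tee == null) {
                final PrintWriter original = super.getWriter();
                tee = new PrintWriter(new Writer() {
                    @Override
                    public void write(char[] buf, int off, int len) {
                        original.write(buf, off, len); // pass through to the real response
                        copy.write(buf, off, len);     // and keep a copy for archiving
                    }

                    @Override
                    public void flush() {
                        original.flush();
                    }

                    @Override
                    public void close() {
                        original.close();
                    }
                });
            }
            return tee;
        }

        String getCapturedHtml() {
            return copy.toString();
        }
    }
}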
Another solution, very powerful, easy to manage, and generic, though possibly the most resource-consuming: write a transparent proxy server sitting between the user and the application server that forwards each call to the app server and returns the exact response, additionally saving each request and response.
If you're storing the HTML page, why not store the referenced JS, CSS, and images too?
I don't know what your implementation is now, but you could keep all of the HTML pages and resources in a filesystem and store references to their locations in a database. You should be backing up the resources in the filesystem every time you change them!
I use this implementation for an image archive. When a client passes us the URL of an image, we want to be able to go back and check exactly what the image was at the time they sent it (since it's a URL, it can change at any time). I have a script that downloads the image as soon as we receive the URL, stores it in the filesystem, and then stores the path to the file in the DB along with various other details. This is similar to what you need; just add a couple more rows in your table for the JS, CSS, and image paths.
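A minimal sketch of that download-and-record step in Java; the URL, directory, and class name are made up, and the database insert is left as a comment placeholder.

import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ResourceArchiver {

    // Downloads the resource at `url` into the archive directory and returns the stored path.
    // The caller would then insert that path into the DB row along with the timestamp and other details.
    public static Path archive(String url, Path archiveDir) throws Exception {
        Files.createDirectories(archiveDir);
        String fileName = System.currentTimeMillis() + "-" + Paths.get(new URL(url).getPath()).getFileName();
        Path target = archiveDir.resolve(fileName);
        try (InputStream in = new URL(url).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws Exception {
        Path stored = archive("https://example.com/style.css", Paths.get("archive")); // made-up URL
        System.out.println("Stored at " + stored);
    }
}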
I need to screen scrape some data from a website, because it isn't available via their web service. When I've needed to do this previously, I've written the Java code myself using Apache's HTTP client library to make the relevant HTTP calls to download the data. I figured out the relevant calls I needed to make by clicking through the relevant screens in a browser while using the Charles web proxy to log the corresponding HTTP calls.
As you can imagine this is a fairly tedious process, and I'm wondering if there's a tool that can actually generate the Java code that corresponds to a browser session. I expect the generated code wouldn't be as pretty as code written manually, but I could always tidy it up afterwards. Does anyone know if such a tool exists? Selenium is one possibility I'm aware of, though I'm not sure whether it supports this exact use case.
Thanks,
Don
I would also add +1 for HtmlUnit, since its functionality is very powerful: if you need behaviour "as though a real browser were scraping and using the page", it's definitely the best option available. HtmlUnit executes (if you want it to) the JavaScript in the page.
It currently has full-featured support for all the main JavaScript libraries and will execute JS code that uses them. Correspondingly, you can get handles to the JavaScript objects in the page programmatically within your test.
If, however, the scope of what you are trying to do is smaller, more along the lines of reading some of the HTML elements, and you don't much care about JavaScript, then NekoHTML should suffice. It's similar to JDOM, giving programmatic (rather than XPath) access to the tree. You would probably need to use Apache's HttpClient to retrieve the pages.
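To illustrate the retrieval step, a hedged sketch of fetching a page with Apache HttpClient (assuming the HttpComponents 4.x API); the URL is made up, and the parsing is left to whichever HTML parser you choose.

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class FetchPageSketch {

    public static void main(String[] args) throws Exception {
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpGet get = new HttpGet("https://example.com/listing"); // made-up URL
            try (CloseableHttpResponse response = client.execute(get)) {
                HttpEntity entity = response.getEntity();
                String html = EntityUtils.toString(entity);
                // Feed `html` into NekoHTML (or another parser) to walk the element tree.
                System.out.println("Fetched " + html.length() + " characters");
            }
        }
    }
}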
The manageability.org blog has an entry which lists a whole bunch of web page scraping tools for Java. However, I do not seem to be able to reach it right now, but I did find a text only representation in Google's cache here.
You should take a look at HtmlUnit - it was designed for testing websites but works great for screen scraping and navigating through multiple pages. It takes care of cookies and other session-related stuff.
I would say HtmlUnit and Selenium are personally my two favourite tools for screen scraping.
A tool called The Grinder allows you to script a session to a site by going through its proxy. The output is Python (runnable in Jython).