I am a beginner in Java and web app development. I am supposed to analyze and optimize some JSP pages that take a while to get data from the server.
My question is: can we load the supporting files (CSS, JS) while the JSP is waiting for the server response?
I think you misunderstand how HTML, JS and CSS work.
In short: the browser sends a request for a certain JSP page. This page is returned from the server and holds within it a number of link tags referring to the CSS and JS files for the page. The browser parses this page and sees that it needs extra resources in order to properly use the page, so it sends further requests to the server for the CSS and JS.
Because of this, it is impossible for the browser to know in advance what CSS and JS the JSP page would need, because these are determined by the contents of the page itself.
However, that does not mean that you are out of luck. The first page will always have to load its CSS and JS after it arrives, but it is possible to load the CSS and JS for the other pages in advance, as explained in "Pre-loading external files (CSS, JavaScript) for other pages". I have not tried these methods myself, but they seem valid.
Well, if I understand you correctly, why don't you just load the CSS/JS files and fire your other function when that's done? I'm not quite sure why you'd want that, though.
The program I am writing is in Java.
I am writing a little program that downloads the HTML of web pages and saves it. It works easily for basic pages that don't use JavaScript. But how can I download the page if I want it after a script has updated it? The page I am dealing with is actually updated by AJAX, which might be one step harder.
I understand that this is probably a difficult problem that involves setting up a JavaScript runtime environment of some kind. I am prepared for a solution of any level of difficulty, I just don't know exactly how to approach it or where to get started.
You can't do that with Java alone. Since the page you want to download is rendered with JavaScript, you must be able to execute that JavaScript to get the fully rendered page.
Because of this, you need a headless browser: a web browser that can access web pages but has no GUI output, and whose purpose is to hand the fully rendered content of a page over to a program or script.
You can start with the best-known ones: Selenium, HtmlUnit and PhantomJS.
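For instance, a minimal HtmlUnit sketch might look like this (the URL is a placeholder, and the com.gargoylesoftware package names are from the 2.x releases, where WebClient is AutoCloseable):

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class PageDownloader {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            // JavaScript is enabled by default; don't abort on script errors from third-party JS
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            // placeholder URL - the page whose rendered HTML you want to save
            HtmlPage page = webClient.getPage("http://example.com/");

            // give AJAX calls triggered on load a chance to finish (10 s is an arbitrary upper bound)
            webClient.waitForBackgroundJavaScript(10_000);

            // asXml() returns the DOM as it looks after the scripts have been executed
            System.out.println(page.asXml());
        }
    }
}
```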
I am facing a problem retrieving the contents of an HTML page using Java. I have described the problem below.
I am loading a URL in Java which returns an HTML page.
This page uses JavaScript. So when I load the URL in the browser, a JavaScript function call occurs AFTER the page has been loaded (in the body's onload handler) and modifies some content (the innerHTML of one of the divs) on the page. This change is obviously visible to me in the browser.
Now, when I try to do the same thing using Java, I only get the HTML content of the page BEFORE the JavaScript call has occurred.
What I want to do is fetch the contents of the HTML page after the JavaScript function call has occurred, and all of this has to be done in Java.
How can I do this? What should my approach be?
You need to use a server-side browser library that will also execute the JavaScript, so that you can get the JavaScript-updated DOM contents. The default URL-fetching mechanism doesn't execute scripts, which is why you don't get the expected result.
You should try Cobra: Java HTML Parser, which will execute your JavaScript. See here for the download and for the documentation on how to use it.
Cobra:
It is Javascript-aware. DOM modifications that occur during parsing will be reflected in the resulting DOM. However, Javascript can be disabled.
For anyone reading this answer, Scott's answer above was a starting point for me. The Cobra project is long dead and cannot handle pages which use complex JavaScript.
However, there is something called HtmlUnit which does exactly what I want.
Here is a small description:
HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.
It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating either Firefox or Internet Explorer depending on the configuration you want to use.
It is typically used for testing purposes or to retrieve information from web sites.
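A rough sketch of reading the div that the onload script modified, using HtmlUnit (the URL, div id and timeout are placeholder assumptions):

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class AfterOnloadFetcher {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            // placeholder URL
            HtmlPage page = webClient.getPage("http://example.com/page.html");

            // let the onload / AJAX JavaScript finish (5 s is an arbitrary upper bound)
            webClient.waitForBackgroundJavaScript(5_000);

            // read the div whose innerHTML was modified by the onload script
            // ("updatedDiv" is a hypothetical id; check for null on a real page)
            System.out.println(page.getElementById("updatedDiv").asXml());
        }
    }
}
```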
Java: my goal is to get the complete web page source (HTML), fill a form, and submit it.
These days web pages are very complex to read because they do not load in a single request; they execute scripts on page load to fetch data from the server and inject it into the page itself.
That's where I am having a problem: the page I am trying to work with makes multiple AJAX calls to load forms inside it, and I have to fill the form and submit it programmatically to get the result.
I tried "selenium" and used HtmlUnitDriver to do all things in background but selenium is failing on Javascript execution after enabling the js.
I want to get all page in one call regardless of if it have ajax calls to load different sections of the page.
One solution i am guessing may be if there is any thing like running server and requesting with url to get page which also maintain the session to submit form or some thing like that.
please feel free to share your thoughts.
I do a similar thing in my job. I use HtmlUnit (because it does no rendering, so it is faster than Selenium). The slightly tricky part is waiting until the AJAX loading is finished. I poll and check whether the expected parts of the HTML, the ones inserted by AJAX, are present in the page.
Once I am sure that all the needed parts have been loaded by AJAX, I fill the form and submit it.
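Roughly, the approach looks like this in HtmlUnit (the URL, element id and field names below are made up for illustration):

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public class FormSubmitter {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            HtmlPage page = webClient.getPage("http://example.com/start.html"); // placeholder URL

            // poll until the AJAX-inserted form is present ("searchForm" is a hypothetical id)
            for (int i = 0; i < 20 && page.getElementById("searchForm") == null; i++) {
                webClient.waitForBackgroundJavaScript(500);
            }

            // fill and submit the form (field names are hypothetical)
            HtmlForm form = page.getForms().get(0);
            HtmlTextInput query = form.getInputByName("query");
            query.setValueAttribute("some search term");
            HtmlSubmitInput submit = form.getInputByName("submitButton");
            HtmlPage result = submit.click();

            // the page returned after the submit
            System.out.println(result.asXml());
        }
    }
}
```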
In Java, is there any way to get the content of a web page which is an .aspx file?
I know how to read/write anything from a normal HTML page, but ASPX pages seem to have one URL for multiple pages, so it's not really possible to reach the desired page by URL.
I understand you can't/won't give me complete instructions right here, but could you maybe point me in the right direction?
Thanks in advance.
There is nothing special about ASPX pages compared to any other type of page; "plain" HTML pages could have been dynamically generated as well.
Just don't forget that the query string is also part of the URL. Many ASPX, PHP, etc. pages might not even be 'correct' to request without some query string value at all. And other sites don't have file extensions at all... like this site itself. You just have to be sure to get the entire URL for each unique 'page'.
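For example, a plain GET where the query string selects the 'page' (host, path and parameters below are invented):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class AspxFetcher {
    public static void main(String[] args) throws Exception {
        // the query string is part of the URL and selects which "page" you get back
        // (host, path and parameters are hypothetical)
        URL url = new URL("http://example.com/Products.aspx?category=books&page=2");

        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        System.out.println(html);
    }
}
```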
I'm not an expert on .asp, so I might be wrong. However, my impression is that a .asp page should ultimately return HTML (similar to what a .jsp page does), so you can fetch the content the same way you would for an HTML page.
However, you write that
asp pages seem to have one URL for multiple pages
this makes me think that perhaps your .asp page is using AJAX and so the page content may change while the URL doesn't. Is this your case?
I understand that you are trying to read the aspx from a client PC, not from the server.
If that's right, accessing an HTTP resource is independent of the technology used by the server; all you need to do is open an HTTP request and retrieve the result.
If you see multiple pages from one URL, then one of the following is happening:
1) POST data is sent to the .aspx page, and it renders different HTML depending on these parameters
2) You are not really looking at the inner page but at a page that provides the frames for the HTML being rendered
3) The page uses AJAX heavily in order to be rendered. The "contents" of the page are not downloaded through the initial request but later by JavaScript.
Generally, it is probably the first reason.
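If it is the first case, a plain POST from Java is enough; here is a sketch (the URL and field names are assumptions, and real ASPX forms often also expect hidden fields such as __VIEWSTATE copied from the form):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class AspxPostClient {
    public static void main(String[] args) throws Exception {
        // hypothetical target page and form fields
        URL url = new URL("http://example.com/Search.aspx");
        String body = "query=" + URLEncoder.encode("java", "UTF-8")
                + "&page=" + URLEncoder.encode("1", "UTF-8");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

        // send the form parameters
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }

        // read the HTML rendered for these POST parameters
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```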
I want to retrieve all the links in a web page, but the web page uses JavaScript and each page contains a number of links.
How can I go to the next page and read its content in a Java program?
Getting this info from a JavaScript-driven page can be a hard job. Your program must interpret the whole page and understand what the JS is doing. Not all web spiders do this.
Most modern JS libraries (jQuery, etc.) mostly manipulate CSS and the attributes of HTML elements. So first you have to generate the "flat" HTML from the HTML source plus the JS, and then you can run a classical web spider over the flat HTML code.
(For example, the Firefox Web Developer plugin lets you see both the original source code of a page and the generated code of the page once all JS has run.)
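One way to get the links of that "flat" (JavaScript-executed) DOM is a headless browser such as HtmlUnit; a sketch (the URL is a placeholder):

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class LinkExtractor {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            HtmlPage page = webClient.getPage("http://example.com/list.html"); // placeholder URL
            webClient.waitForBackgroundJavaScript(5_000); // let any JS finish adding links

            // all <a> elements of the JavaScript-generated ("flat") DOM
            for (HtmlAnchor anchor : page.getAnchors()) {
                System.out.println(anchor.getHrefAttribute());
            }
        }
    }
}
```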
What you are looking for is called a web spider engine. There are plenty of open-source web spider engines available. Check http://j-spider.sourceforge.net/ for example.