In Java, is there any way to get the content of a web page that is an .aspx file?
I know how to read/write anything from a normal HTML page, but ASP pages seem to have one URL for multiple pages, so it doesn't seem possible to reach the desired page by URL alone.
I understand you can't/won't give me complete instructions right here, but could you maybe point me in the right direction?
Thanks in advance.
There is nothing special about ASPX pages compared to any other type of page; "plain" html pages could have been dynamically generated as well.
Just don't forget that the query string is also part of the URL. Many ASPX, PHP, etc. pages might not even be valid to request without some query string value at all. And other sites don't have file extensions at all, like this site itself. You just have to be sure to get the entire URL for each unique 'page'.
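To make this concrete, here is a minimal sketch of requesting an ASPX page with its query string from Java using the standard `HttpURLConnection`. The URL `https://example.com/page.aspx` and the parameter `id` are placeholders; substitute the real URL you see in your browser's address bar.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    // Builds the full URL, query string included; an ASPX page often
    // needs these parameters to know which "page" to render.
    static String buildUrl(String base, String name, String value) throws Exception {
        return base + "?" + URLEncoder.encode(name, StandardCharsets.UTF_8.name())
                + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8.name());
    }

    // Fetches the response body as a String, exactly as for a plain HTML page.
    static String fetch(String fullUrl) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(fullUrl).openConnection();
        conn.setRequestMethod("GET");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch(buildUrl("https://example.com/page.aspx", "id", "42")));
    }
}
```

The key point is that the server never sees ".aspx" as anything special on your end: a different query string is a different resource.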
I'm not an expert on ASP, so I might be wrong. However, my impression is that an .asp page should ultimately return HTML (similarly to what a .jsp page does), so you can fetch the content the same way you would for an HTML page.
However, you write that
asp pages seem to have one URL for multiple pages
This makes me think that perhaps your .asp page is using AJAX, so the page content may change while the URL doesn't. Is this your case?
I understand that you are trying to read the aspx from a client PC, not from the server.
If that's right, accessing an HTTP resource is independent of the technology used by the server; all you need to do is open an HTTP request and retrieve the result.
If you see multiple pages from one URL, then one of the following is happening:
1) POST data is sent to the ASPX page, and it renders different HTML depending on these parameters
2) You are not really looking at the inner page but at a page that provides the frames for the HTML being rendered
3) The page relies heavily on Ajax to render. The "contents" of the page are not downloaded with the initial request but later by JavaScript.
Generally, it is probably the first reason.
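For the first case, you can reproduce the browser's POST from Java by writing a form-encoded body to the connection. This is only a sketch: the endpoint `https://example.com/page.aspx` and the field names `search` and `page` are hypothetical; copy the real ones from your browser's network log.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PostExample {

    // Encodes an application/x-www-form-urlencoded body as bytes.
    static byte[] formBody(String body) {
        return body.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint and field names; replace with the real ones.
        URL url = new URL("https://example.com/page.aspx");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        byte[] body = formBody("search=java&page=2");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}
```

Sending the same POST fields the browser sends should give you the same HTML the browser receives for that "page".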
Related
This API returns a whole HTML table. I'm trying to figure out how to add this table (as is) into my UI, but I've never seen an API return an HTML table. Browsing the Internet for an answer isn't giving me any hope either.
Is it possible to put it into a WebView, or any other UI object? My application sends a word to the API, and I get the table in return.
I'd appreciate some code example.
You can certainly just show that exact same page in a WebView. If you want to parse the table and display only certain information, there is a library called jsoup that makes it very convenient to parse HTML.
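As a small sketch of the parsing route, assuming the jsoup library is on the classpath, you can select the table rows with a CSS selector and pull out just the cells you want. The sample HTML here is made up for illustration; in practice you would pass in the response body from the API.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableParser {

    // Extracts the text of the first cell of each table row.
    static List<String> firstColumn(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element row : doc.select("table tr")) {
            Element cell = row.selectFirst("td");
            if (cell != null) {
                result.add(cell.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up response body standing in for what the API returns.
        String html = "<table><tr><td>word</td><td>meaning</td></tr></table>";
        System.out.println(firstColumn(html));
    }
}
```

If you only need the table rendered as-is, skip the parsing and hand the HTML straight to the WebView instead.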
It looks like you don't mind displaying the whole thing in a WebView - if that is acceptable, then you just load the page into a WebView widget. WebView will take care of rendering the page exactly as you see it in a browser. You only have to tell it what to load.
You parse the output like you would any other web request. If you wanted to include the table in your own webpage, you could. Or you could parse the response for the specific info you need.
Don't think of it as an API, think of it as a URL you're requesting and now you need to do something with the contents. That might help with your Googling. You're essentially doing page scraping.
I want to crawl the whole content of the following link with a Java program. The first page is no problem, but when I want to crawl the data of the next pages, I get the same source code as for page one. Therefore a simple HTTP GET does not help at all.
This is the link for the page I need to crawl.
The web site has active content that needs to be interpreted and executed by an HTML/CSS/JavaScript rendering engine. I have a simple solution with PhantomJS, but it is complicated to run PhantomJS code from Java.
Is there any easier way to read the whole content of the page with Java code? I already searched for a solution, but could not find anything suitable.
Appreciate your help,
kind regards.
Using the Chrome network log (or a similar tool in any other browser) you can identify the XHR request that loads the actual data displayed on the page. I have removed some of the query parameters, but essentially the request looks like this:
GET https://www.blablacar.de/search_xhr?fn=frankfurt&fcc=DE&tn=muenchen&tcc=DE&sort=trip_date&order=asc&limit=10&page=1&user_bridge=0&_=1461181945520
Helpfully, the query parameters look quite easy to understand. The order=asc&limit=10&page=1 part looks like it would be easy to adjust to return your desired results. You could adjust the page parameter to crawl successive pages of data.
The response is JSON, for which there are a ton of libraries available.
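A minimal sketch of that approach in plain Java: build the XHR URL for each page and fetch the JSON in a loop. The query parameters below are taken from the request shown above; the pages you actually need may require more of the parameters from the original network-log request.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class XhrCrawler {

    // Builds the XHR URL for a given result page; the other query
    // parameters are copied from the request seen in the network log.
    static String pageUrl(int page) {
        return "https://www.blablacar.de/search_xhr?fn=frankfurt&fcc=DE"
                + "&tn=muenchen&tcc=DE&sort=trip_date&order=asc&limit=10&page=" + page;
    }

    public static void main(String[] args) throws Exception {
        for (int page = 1; page <= 3; page++) {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(pageUrl(page)).openConnection();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                StringBuilder json = new StringBuilder();
                String line;
                while ((line = in.readLine()) != null) {
                    json.append(line);
                }
                // Hand the JSON off to a parser such as Jackson or Gson here.
                System.out.println("page " + page + ": " + json.length() + " chars");
            }
        }
    }
}
```

Because the data endpoint returns JSON directly, no JavaScript engine (PhantomJS or otherwise) is needed at all.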
I am retrieving the off time of a page (returned as offtimeQuery.toString()) and the page title (String resultPageTitle = resultPage.getTitle();) using Java.
I am sending an email to the content authors of all the pages that have reached their off time. How do I display this off time and page name in my HTML email using JavaScript?
It is nearly impossible to make an email run JavaScript. Even if you manage it, half of the email clients will strip it out to protect the client-side computer.
Instead (I assume you are hosting on a Linux box), you could do one of two things that would work. Use a bash script or equivalent to dynamically generate the page and fire it off at given times, immediately followed by an email containing the HTML of that page. That is pretty easy.
The other way would be to use a JS file to do the same, triggered either on a schedule (hard) or by you opening the page in a browser when you want it to go out. Again, dynamically create the page with JS and then have the system send the HTML of that page.
Don't use JavaScript. Almost no HTML email client will run JavaScript, because it is a huge security hole.
Instead, put the relevant data into the body of the email as you construct it in your Java code. Presumably you have those bits of data in your Java code, along with the HTML content you're sending as an email. Insert the data there; at its most basic, use String.format(template, data, ...). But if you are going to do anything beyond trivial replacement, use a proper HTML templating system.
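A minimal sketch of the String.format approach, with `pageTitle` and `offTime` standing in for the values of `resultPage.getTitle()` and `offtimeQuery.toString()` from the question:

```java
public class EmailBody {

    // A minimal HTML template; the %s placeholders are filled per page.
    static final String TEMPLATE =
            "<html><body>"
            + "<p>The page <b>%s</b> reached its off time at %s.</p>"
            + "</body></html>";

    static String render(String pageTitle, String offTime) {
        return String.format(TEMPLATE, pageTitle, offTime);
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for the real page data.
        System.out.println(render("Home Page", "2016-04-20 18:00"));
    }
}
```

The rendered string is static HTML, so every email client will display it without needing to execute anything.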
Don't try to include any JavaScript in an HTML email. You may be able to find an email client where it works, but it won't work for most of your recipients.
JavaScript gets stripped from email messages due to security concerns. If dynamic emails with JavaScript were possible, you could force a redirect, perform phishing attacks, and steal other sensitive information such as cookies from the domain the email was sent to.
If you are really interested in displaying dynamic content, and don't care how, think about creating a server script which returns an image. You could pass a static identifier to the script, and it could return a dynamic result.
See my project in php at https://github.com/TabLand/EmailTracker which generates dynamic images. Only the time string is dynamic. I'd show you the demonstration, but would end up logging your useragent and IP address!
I am a beginner in Java and web app development. I am supposed to analyze and optimize JSP pages that take a while to get data from the server.
My question is: can we load the supporting files while the JSP is waiting for the server's response?
I think you misunderstand how HTML, JS and CSS work.
In short: the browser sends a request for a certain JSP page. This page returns from the server and holds within it a number of link tags referring to the CSS and JS for the page. The browser parses this page and sees that it needs extra resources in order to properly use the page, so it sends further requests to the server for the CSS and JS.
Because of this, it is impossible for the browser to know in advance what CSS and JS the JSP page would need, because these are determined by the contents of the page itself.
However, that does not mean you are out of luck. The first page will always have to load its resources after the fact, but it is possible to load the CSS and JS for the other pages in advance, as explained in "Pre-loading external files (CSS, JavaScript) for other pages". I have not tried these methods myself, but they seem valid.
Well, if I understand you correctly, why don't you just load the CSS/JS files and fire your other function when that's done? I'm not quite sure why you'd want that, though.
I am facing a problem retrieving the contents of an HTML page using java. I have described the problem below.
I am loading a URL in java which returns an HTML page.
This page uses JavaScript. So when I load the URL in the browser, a JavaScript function call occurs AFTER the page has loaded (onBodyLoad of the HTML page) and modifies some content (one div's innerHTML) on the page. This change is obviously visible to me in the browser.
Now, when I try to do the same thing using Java, I only get the HTML content of the page BEFORE the JavaScript call has occurred.
What I want to do is, fetch the contents of the html page after the javascript function call has occurred and all this has to be done using java.
How can I do this? What should my approach be?
You need to use a server side browser library that will also execute the JavaScript, so you can get the JavaScript updated DOM contents. The default browser mechanism doesn't do this, which is why you don't get the expected result.
You should try Cobra: Java HTML Parser, which will execute your JavaScript. See here for the download and for the documentation on how to use it.
Cobra:
It is Javascript-aware. DOM modifications that occur during parsing will be reflected in the resulting DOM. However, Javascript can be disabled.
For anyone reading this answer: Scott's answer above was a starting point for me, but the Cobra project is long dead and cannot handle pages that use complex JavaScript.
However there is something called HTML Unit which does just exactly what I want.
Here is a small description:
HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.
It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating either Firefox or Internet Explorer depending on the configuration you want to use.
It is typically used for testing purposes or to retrieve information from web sites.
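As a small sketch of the HtmlUnit approach, assuming the HtmlUnit library is on the classpath: load the page, give background JavaScript (such as the onBodyLoad handler from the question) time to finish, then read the DOM. The URL is a placeholder for the real page.

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitFetch {

    public static void main(String[] args) throws Exception {
        try (WebClient client = new WebClient()) {
            // Many real pages have scripts that would otherwise abort the load.
            client.getOptions().setThrowExceptionOnScriptError(false);
            // Placeholder URL; replace with the page you need.
            HtmlPage page = client.getPage("https://example.com/");
            // Let background JavaScript (e.g. the onload handler) finish.
            client.waitForBackgroundJavaScript(5_000);
            // asXml() returns the DOM *after* JavaScript has modified it.
            System.out.println(page.asXml());
        }
    }
}
```

Unlike a plain HTTP fetch, the output here reflects any innerHTML changes the page's scripts have made.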