How to deal in Android with a RESTful API that returns an HTML table - java

This API returns a whole HTML table. I'm trying to figure out how to add this table (as is) to my UI, but I've never seen an API return an HTML table before. Browsing the Internet for an answer isn't giving me any hope either.
Is it possible to put it into a WebView? Or any other UI object? My application sends a word to the API and gets the table in return.
I'd appreciate some code example.

You can certainly just show that exact same page in a WebView. If you want to parse the table and display only certain information, there is a library called jsoup that makes it very convenient to parse HTML.
It looks like you don't mind displaying the whole thing in a WebView - if that is acceptable, then you just load the page into a WebView widget. WebView will take care of rendering the page exactly as you see it in a browser. You only have to tell it what to load.
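For example, a minimal sketch of loading the returned markup into a WebView, assuming the response body has already been read into a String and that R.id.webView is a WebView declared in your layout (both names are placeholders):

import android.webkit.WebView;

// Inside your Activity, once the API response has been read into htmlTable.
private void showTable(String htmlTable) {
    WebView webView = findViewById(R.id.webView);
    // Wrap the fragment so WebView renders it as a complete document;
    // loadDataWithBaseURL handles encoding more reliably than loadData.
    webView.loadDataWithBaseURL(
            null,                                           // no base URL needed
            "<html><body>" + htmlTable + "</body></html>",  // the table, as-is
            "text/html",
            "UTF-8",
            null);
}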

You parse the output like you would any other web request. If you wanted to include the table in your own webpage, you could. Or you could parse the response for the specific info you need.
Don't think of it as an API, think of it as a URL you're requesting and now you need to do something with the contents. That might help with your Googling. You're essentially doing page scraping.
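If you go the scraping route instead, a small sketch with jsoup shows the idea; the sample HTML below stands in for whatever table the API actually returns:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableScraper {
    public static void main(String[] args) {
        // Stand-in for the response body you read from the API.
        String html = "<table><tr><td>word</td><td>definition</td></tr></table>";
        Document doc = Jsoup.parse(html);
        // Walk every row of the table and print its cell text.
        for (Element row : doc.select("table tr")) {
            for (Element cell : row.select("td")) {
                System.out.print(cell.text() + "\t");
            }
            System.out.println();
        }
    }
}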

Related

Read full content of a web page in Java

I want to crawl the whole content of the following link with a Java program. The first page is no problem, but when I try to crawl the data of the next pages, I get the same source code as for page one. Therefore a simple HTTP GET does not help at all.
This is the link for the page I need to crawl.
The web site has active content that needs to be interpreted and executed by an HTML/CSS/JavaScript rendering engine. I have a simple solution with PhantomJS, but it is cumbersome to run PhantomJS code from Java.
Is there any easier way to read the whole content of the page with Java code? I already searched for a solution, but could not find anything suitable.
Appreciate your help,
kind regards.
Using the Chrome network log (or a similar tool in any other browser) you can identify the XHR request that loads the actual data displayed on the page. I have removed some of the query parameters, but essentially the request looks like this:
GET https://www.blablacar.de/search_xhr?fn=frankfurt&fcc=DE&tn=muenchen&tcc=DE&sort=trip_date&order=asc&limit=10&page=1&user_bridge=0&_=1461181945520
Helpfully, the query parameters look quite easy to understand. The order=asc&limit=10&page=1 part looks like it would be easy to adjust to return your desired results. You could adjust the page parameter to crawl successive pages of data.
The response is JSON, for which there are a ton of libraries available.
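As a rough sketch, assuming Java 11+ for the built-in HTTP client and reusing the URL shape captured above (the three-page range is arbitrary, and the query parameters that were removed from the example stay removed):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class XhrCrawler {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Walk the paged XHR endpoint by incrementing the page parameter.
        for (int page = 1; page <= 3; page++) {
            String url = "https://www.blablacar.de/search_xhr?fn=frankfurt&fcc=DE"
                    + "&tn=muenchen&tcc=DE&sort=trip_date&order=asc&limit=10&page=" + page;
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // The body is JSON; hand it to whichever JSON library you prefer.
            System.out.println("page " + page + ": " + response.body());
        }
    }
}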

Retrieving contents of URL after they have been changed by javascript

I am facing a problem retrieving the contents of an HTML page using java. I have described the problem below.
I am loading a URL in java which returns an HTML page.
This page uses JavaScript. So when I load the URL in the browser, a JavaScript function call occurs AFTER the page has been loaded (via the body's onload handler) and it modifies some content (one div's innerHTML) on the webpage. This change is obviously visible to me in the browser.
Now, when I try to do the same thing using Java, I only get the HTML content of the page BEFORE the JavaScript call has occurred.
What I want to do is fetch the contents of the HTML page after the JavaScript function call has occurred, and all of this has to be done in Java.
How can I do this? What should my approach be?
You need to use a server-side browser library that will also execute the JavaScript, so you can get the JavaScript-updated DOM contents. The default page-fetching mechanism doesn't execute scripts, which is why you don't get the expected result.
You should try Cobra: Java HTML Parser, which will execute your JavaScript. See here for the download and for the documentation on how to use it.
Cobra:
It is Javascript-aware. DOM modifications that occur during parsing will be reflected in the resulting DOM. However, Javascript can be disabled.
For anyone reading this answer, Scott's answer above was a starting point for me. The Cobra project is long dead and cannot handle pages which use complex JavaScript.
However, there is something called HtmlUnit which does exactly what I want.
Here is a small description:
HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.
It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating either Firefox or Internet Explorer depending on the configuration you want to use.
It is typically used for testing purposes or to retrieve information from web sites.
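A minimal sketch of fetching a JavaScript-modified page with HtmlUnit, assuming a 2.x release (where the classes live in com.gargoylesoftware.htmlunit; newer releases moved to org.htmlunit) and using a placeholder URL:

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class JsAwareFetch {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setJavaScriptEnabled(true);
            // Don't abort on script errors from third-party pages.
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            HtmlPage page = webClient.getPage("https://example.com/"); // placeholder
            // Give onload handlers and AJAX calls time to finish.
            webClient.waitForBackgroundJavaScript(5_000);
            // The DOM now reflects the JavaScript modifications.
            System.out.println(page.asXml());
        }
    }
}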

How to fill out form data on a website

I am looking to develop an app that will take login details from the user, go to a website, login, return values on the web page and then display them to the user on the phone.
Does Java have this functionality? Will I need to use JavaScript instead, maybe? Do these answers depend on the website that I am trying to access?
In my head I figure that I could just read in the parameters as strings or chars, parse the webpage for the appropriate form and "paste" the appropriate value into the form "box". However, I have never attempted anything like this in code, so I am completely new to the idea and don't really know where to start. I tried googling around but any information I found was either irrelevant or conflicting.
I'm not looking for the code to do it, because I won't really learn anything from that, but a finger in the right direction would be great. I really do want to get better at programming, so that's why I've started giving myself these little side projects.
Any help that can be offered would be great
Ian,
You can try using the http-client lib (http://hc.apache.org/httpclient-3.x/) from Apache. It lets you programmatically access a website from Java code. You will need to do the following things (sketched in the example below):
Use the http-client lib to POST the data to the web site.
Receive the html response.
Use some html parser or xpath to retrieve the values from the response html.
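A small sketch of the first two steps with the 3.x API; the login URL and field names here are hypothetical, so inspect the site's actual form markup for the real action URL and input names:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.PostMethod;

public class LoginPost {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        // Hypothetical endpoint and form fields.
        PostMethod post = new PostMethod("https://example.com/login");
        post.addParameter("username", "ian");
        post.addParameter("password", "secret");
        try {
            int status = client.executeMethod(post);      // step 1: POST the data
            String html = post.getResponseBodyAsString(); // step 2: read the HTML response
            System.out.println("status " + status);
            // Step 3 would feed `html` to an HTML parser such as jsoup.
        } finally {
            post.releaseConnection();
        }
    }
}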
You would need a script which accesses the webpage and enters the data, but in my opinion this is illegal, because you are accessing a secured area and are able to look into sensitive data. Also, accessing the page via a script is "botting" - most pages have safety precautions to prevent the execution of scripts, because most of them are harmful.
In my opinion there is no legal and easy solution to this.

How do i get Contents of an ASPX file through java?

In Java, is there any way to get the content of a webpage which is an .ASPX file?
I know how to read/write anything from a normal HTML page, but .aspx pages seem to have one URL for multiple pages, so it's not really possible to reach the desired page by URL.
I understand you can't/won't give me complete instructions right here, but could you maybe send me in the right direction ?
Thanks in advance.
There is nothing special about ASPX pages compared to any other type of page; "plain" html pages could have been dynamically generated as well.
Just don't forget that the query string is also part of the URL. Many ASPX, PHP, etc. pages might not even respond correctly without some query string value at all. And other sites don't have file extensions at all... like this site itself. You just have to be sure to get the entire URL for each unique 'page'.
I'm not an expert on .asp, so I might be wrong. However, my impression is that a .asp page should ultimately return HTML (similarly to what a .jsp page does), so you can fetch the content in the same way as you would do for an HTML page.
However, you write that
asp pages seem to have one URL for multiple pages
This makes me think that perhaps your .asp page is using AJAX, so the page content may change while the URL doesn't. Is this your case?
I understand that you are trying to read the aspx from a client PC, not from the server.
If that's right, accessing an HTTP resource is independent of the technology used by the server; all you need to do is open an HTTP request and retrieve the results.
If you see multiple pages from one URL, then one of the following is happening:
1) POST data is sent to the aspx, and it renders different HTML due to these parameters
2) You are not really looking at the inner page but at a page that provides the frames for the HTML being rendered
3) The page uses Ajax heavily in order to be rendered. The "contents" of the page are not downloaded through the initial request but later by JavaScript.
Generally, it is probably the first reason.
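For the first case, a sketch with plain HttpURLConnection is enough; the .aspx URL and the form field name below are placeholders, so check the page's form markup for the real ones:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class AspxPost {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://example.com/page.aspx"); // placeholder
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        // The POST body selects which "page" the server renders.
        byte[] body = "searchTerm=foo".getBytes(StandardCharsets.UTF_8);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // plain HTML, same as any other page
            }
        }
    }
}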

java html parser doesnt read all page

I'm parsing HTML pages to get specific information, but there are some pages where I can't get all the information displayed on the web page; for example, on this page
I can't get the review information.
By the way, if you look at the source code of the page there are a lot of empty lines, and the review information doesn't appear.
Do you know why?
Is there a library to read this type of page?
Thanks
I'm willing to bet they are using some sort of JavaScript to load in the review information. In order to access that information, you are going to need to either mimic the request or evaluate the JavaScript and then parse the resulting page. I would suggest examining their JavaScript and mimicking the request they use to download the review information, as that will be much easier than attempting to evaluate the JavaScript in your code.
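Once you have found that request in the browser's network log, replaying it is straightforward; here is a sketch using jsoup's connection API against a hypothetical review endpoint (the URL and header are assumptions that just illustrate the shape of such a call):

import org.jsoup.Jsoup;

public class ReviewFetch {
    public static void main(String[] args) throws Exception {
        // Replay the XHR directly instead of executing the page's JavaScript.
        String body = Jsoup.connect("https://example.com/ajax/reviews?id=123")
                .ignoreContentType(true)                      // response is JSON, not HTML
                .header("X-Requested-With", "XMLHttpRequest") // some sites check this
                .execute()
                .body();
        System.out.println(body); // parse with a JSON library from here
    }
}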
