Retrieve HTML content refreshed with Ajax - java

I tried to get the HTML content of a website, and this code works for it:
public String extractRoutes(String urlStringifyed) throws MalformedURLException, IOException {
    URL url = new URL(urlStringifyed);
    URLConnection c = url.openConnection();
    c.connect();
    StringBuilder sb = new StringBuilder();
    // Read from the connection that was just opened instead of opening
    // a second one with url.openStream(), and close the reader when done.
    try (BufferedReader br = new BufferedReader(new InputStreamReader(c.getInputStream()))) {
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line);
        }
    }
    return sb.toString();
}
Now I want to get the content from a specific page that is loaded with Ajax and protected with reCAPTCHA, but I can't.
Below are the URLs. I'm passing all the arguments, but the content I get back from the first link says that the service is temporarily down and that I should try again later. The thing I don't understand is that when I copy the URL and paste it into my browser, it works fine. The second link, which doesn't involve reCAPTCHA, returns the same message.
https://mersultrenurilor.infofer.ro/ro-RO/Itineraries?DepartureStationName=Ia%C8%99i&ArrivalStationName=Suceava&DepartureDate=21.01.2019&TimeSelectionId=0&MinutesInDay=0&ChangeStationName=&DepartureTrainRunningNumber=&ArrivalTrainRunningNumber=&ConnectionsTypeId=0&OrderingTypeId=0&g-recaptcha-response=03AO9ZY1ChGhLCoSKCnF49dyCskHENK7ZUYdJEK_UCDVPn7RYGp40CMRUxvA0Q_ni6fDhP9BRm6viymicOOudd78WJbaHb2vbbtCq0DLS7NzngWBAgBKaWBFBa94RKqetwMSR89p5G1a8oS3bknB6d2tyZ2zhUk1veesR2Ef-RNVXDMpy0GotKH_XGPylDTvL5ftIrDem1LmWb4lQYNY0CCJ7jFScQf6SRqSH18jBWHAGEXVSlsQjoK8X4Q6riSlo1LK_vMJR-F-HVig7vavBd6zTI6LjceGyBtlQZCK7tcIuj4cS9Yg-tMbRKn_laukwLkceOpN8Q88_Aafz9JPtyx-eJAN_5fMbuRw
http://mersultrenurilor.infofer.ro/ro-RO/Itineraries?DepartureStationName=Ia%C8%99i&ArrivalStationName=Suceava&DepartureDate=21.01.2019%200%3A00%3A00&AreOnlyTrainsWithReservation=False&ArrivalTrainRunningNumber=&DepartureTrainRunningNumber=&ConnectionsTypeId=0&MinutesInDay=0&OrderingTypeId=0&TimeSelectionId=0&ChangeStationName=&IsSearchWanted=False
How can I get the HTML content (I'm interested in the train routes that are shown) from this URL once it has loaded?

Related

Download AJAX generated content using java

I have a webpage on which a list of movies is displayed. The content is created using AJAX (as far as my limited knowledge would suggest...).
I want to download the content, in this case the movie playing times, using Java. I know how to download a simple website, but here my solution only gives me the following as a result instead of the playing times:
ajaxpage('http://data.cineradoplex.de/mod/AndyCineradoProg/extern',
"kinoprogramm");
How do I make my program download the results this AJAX function gives?
Here is the code I use:
String line = "";
BufferedReader in = null;
try {
    // new URL("...") is required here; the original assigned the bare
    // address to the variable and then passed an undefined name to new URL(...).
    URL myUrl = new URL("http://www.cineradoplex.de/programm/spielplan/");
    in = new BufferedReader(new InputStreamReader(myUrl.openStream()));
    while ((line = in.readLine()) != null) {
        System.out.println(line);
    }
} finally {
    if (in != null) {
        in.close();
    }
}
In your response you can see the address from which the actual data is retrieved:
http://data.cineradoplex.de/mod/AndyCineradoProg/extern
Request that URL's contents directly and parse them.
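For example, a minimal sketch in the same style as your code, assuming the endpoint serves the payload as plain text (the URL is the one visible in the response above):
URL dataUrl = new URL("http://data.cineradoplex.de/mod/AndyCineradoProg/extern");
try (BufferedReader in = new BufferedReader(new InputStreamReader(dataUrl.openStream()))) {
    String line;
    while ((line = in.readLine()) != null) {
        System.out.println(line); // the raw payload containing the playing times
    }
}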

How to print data from a webpage? Not the html code of the page.

In Java I am trying to read a webpage. I want to print only the data of the page, but my code prints the whole HTML source, which looks weird. I can see the exact data I want hiding in the HTML. How can I avoid printing the HTML code itself?
here is my code:
URL url = new URL("http://www.rxbd.info/Controller/Controller?action=details&drug=zorubicin&group=generic");
URLConnection con = url.openConnection();
InputStream is = con.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line = null;
while ((line = br.readLine()) != null) {
    System.out.println(line);
}
Have a look at Jericho. Its Renderer class can render the original HTML as formatted text, and its TextExtractor class can extract just the text.
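A minimal sketch of both, assuming the jericho-html jar is on the classpath:
import java.net.URL;
import net.htmlparser.jericho.Source;

public class PageText {
    public static void main(String[] args) throws Exception {
        Source source = new Source(new URL("http://www.rxbd.info/Controller/Controller?action=details&drug=zorubicin&group=generic"));
        // TextExtractor strips the tags and keeps only the text content.
        System.out.println(source.getTextExtractor().toString());
        // Renderer approximates how a browser would lay the text out.
        System.out.println(source.getRenderer().toString());
    }
}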

Screen scraping in Java

I'm trying to create an application, written in Java, that uses my university's class search function. I am using a simple HTTP GET request with the following code:
public static String GET_Request(String urlToRead) {
    java.net.CookieManager cm = new java.net.CookieManager();
    java.net.CookieHandler.setDefault(cm);
    URL url;
    HttpURLConnection conn;
    BufferedReader rd;
    String line;
    String result = "";
    try {
        url = new URL(urlToRead);
        conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        while ((line = rd.readLine()) != null) {
            result += line;
        }
        rd.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return result;
}
But it is not working.
Here is the url I am trying to scrape:
https://webapp4.asu.edu/catalog/classlist?c=TEMPE&s=CSE&n=100&t=2141&e=open&hon=F
I tried looking into jsoup, but when I go to their "try jsoup" tab and fetch the URL, it comes up with the same result as the GET request.
The repeated, failed result I get from both the HTTP GET request and jsoup is the university's search page, not the actual classes and information about whether they are open.
What I am ultimately looking for is a way to scrape the part of the site that shows whether the classes have open seats. Once I get the contents of the page I can parse through it; I'm just not getting any useful results.
Thanks!
You need to add a cookie to answer the initial course offerings question:
class search course catalog
Indicate which course offerings you wish to see
* ASU Campus
* ASU Online
You do this by simply adding
conn.setRequestProperty("Cookie", "onlineCampusSelection=C");
to the HttpURLConnection.
I found the cookie by using Google Chrome's Developer Tools (Ctrl-Shift-I), looking at the Resources tab, then expanding Cookies to see the webapp4.asu.edu cookies.
The following code (mostly yours) gets the HTML of the page you are looking for:
public static void main(String[] args) {
    System.out.println(download("https://webapp4.asu.edu/catalog/classlist?c=TEMPE&s=CSE&n=100&t=2141&e=open&hon=F"));
}

static String download(String urlToRead) {
    java.net.CookieManager cm = new java.net.CookieManager();
    java.net.CookieHandler.setDefault(cm);
    String result = "";
    try {
        URL url = new URL(urlToRead);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Cookie", "onlineCampusSelection=C");
        BufferedReader rd = new BufferedReader(new InputStreamReader(
                conn.getInputStream()));
        String line;
        while ((line = rd.readLine()) != null) {
            result += line + "\n";
        }
        rd.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return result;
}
Although I'd use a real parser like jsoup or HTML Parser for the actual parsing job.
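A minimal jsoup sketch, assuming the jsoup jar is on the classpath; it sends the same onlineCampusSelection cookie with the request. The CSS selector is a hypothetical placeholder; inspect the page to find the real markup for the class rows:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ClassListScraper {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("https://webapp4.asu.edu/catalog/classlist?c=TEMPE&s=CSE&n=100&t=2141&e=open&hon=F")
                .cookie("onlineCampusSelection", "C")
                .get();
        // "table tr" stands in for the page's actual structure.
        for (Element row : doc.select("table tr")) {
            System.out.println(row.text());
        }
    }
}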

Get jsp content after forward

How can I get the content of a JSP page after a servlet has made a forward? At the moment I'm trying the following:
request.getRequestDispatcher(DESTINATION_PAGE).forward(request, response);
URL teamsURL = new URL(request.getScheme(), request.getServerName(), request.getServerPort(), request.getContextPath() + DESTINATION_PAGE);
URLConnection teamsCon = teamsURL.openConnection();
String fileName = request.getServletContext().getRealPath("/") + System.currentTimeMillis() + ".html";
System.out.println(fileName);
try (BufferedReader in = new BufferedReader(new InputStreamReader(teamsCon.getInputStream()));
        PrintWriter out = new PrintWriter(fileName)) {
    String inputLine = null;
    while ((inputLine = in.readLine()) != null) {
        out.println(inputLine);
    }
}
I get the HTML with empty divs, but I want the same page I see in the browser.
Sorry for messy post, ask for what info you need, I'll update my post accordingly.
If you're trying to get the response body after you've written it, you'll need to use a custom HttpServletResponse wrapper that keeps track of what was written to the OutputStream directly or with the Writer.
You will do this in a servlet Filter after chain.doFilter(request, yourResponseWrapper) returns. A simple example can be found here.
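A minimal sketch of such a wrapper and filter, assuming the forwarded JSP writes through getWriter() (getOutputStream() would need a similar override):
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.*;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

class CapturingResponseWrapper extends HttpServletResponseWrapper {
    private final CharArrayWriter buffer = new CharArrayWriter();

    CapturingResponseWrapper(HttpServletResponse response) {
        super(response);
    }

    @Override
    public PrintWriter getWriter() {
        // Everything the forwarded JSP writes lands in the buffer.
        return new PrintWriter(buffer);
    }

    String getCapturedBody() {
        return buffer.toString();
    }
}

public class CaptureFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        CapturingResponseWrapper wrapper = new CapturingResponseWrapper((HttpServletResponse) res);
        chain.doFilter(req, wrapper);            // the forward renders into the buffer
        String html = wrapper.getCapturedBody(); // the same HTML the browser would receive
        res.getWriter().write(html);             // still deliver it to the client
    }

    @Override
    public void init(FilterConfig cfg) {}

    @Override
    public void destroy() {}
}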

How to get data from a URL?

I have data provided on a website, presented as in the following image:
This website offers the values of some parameters, which are updated periodically.
Now I want to retrieve these data by column name and row number in an Android app. I know how to open an HTTP connection, but unfortunately I have no idea where to start or how to read the data shown in the image.
Unless you have a special data source to work with, you have to read the website's contents and then process them manually. Here is a link from the Java tutorials on how to read from a URL connection.
import java.net.*;
import java.io.*;

public class URLConnectionReader {
    public static void main(String[] args) throws Exception {
        URL oracle = new URL("http://www.oracle.com/");
        URLConnection yc = oracle.openConnection();
        BufferedReader in = new BufferedReader(new InputStreamReader(
                yc.getInputStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}
EDIT:
If you are behind a proxy you should also set these system properties (to the appropriate values):
System.setProperty("http.proxyHost", "3.182.12.1");
System.setProperty("http.proxyPort", "1111");
If the data is plain text and the table format doesn't change, you can parse the entire table. For example, after reading the "------- ..." separator line, you can parse the values using a Scanner:
List<String> values = new ArrayList<>();
String inputLine;
while ((inputLine = in.readLine()) != null) {
    // Scan the line that was just read; the original code scanned an
    // undefined variable and had a stray closing parenthesis.
    Scanner s = new Scanner(inputLine).useDelimiter(" ");
    while (s.hasNext()) {
        values.add(s.next()); // collect all values in a list
    }
    s.close();
}
You have to parse the whole content. Couldn't you call a web service to get this data, or directly query the database that backs this view?
