URLConnection didn't return complete content of file

My code looks like this:
URL oracle = new URL(calURL);
FileWriter overall = new FileWriter("overall.txt");
HttpURLConnection yc = (HttpURLConnection) oracle.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
overall.append("\n"+inputLine);
}
It seems to return only about half of the content; I am not getting the full content.
Note: calURL is dynamically generated.

calURL takes a long time to load, and I guess my stream started reading before the content was ready. I added a timeout before the URL connection, and it now returns the full data.
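A minimal sketch of one way to make the download more robust, assuming the content is UTF-8 text. The class name `Downloader`, the helper names, and the timeout values are illustrative and not from the original post. It sets explicit connect and read timeouts and, separately, closes the writer with try-with-resources. The second point matters because the question's code never closes or flushes the `FileWriter`, and buffered output that is never flushed can also leave a file that looks truncated.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class Downloader {

    /** Copies every line from in to out, ending each with '\n'; returns the line count. */
    public static int copyLines(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            out.write(line);
            out.write('\n');
            count++;
        }
        return count;
    }

    /** Downloads calURL into targetFile. The timeout values are assumptions; tune them. */
    public static int download(String calURL, String targetFile) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(calURL).openConnection();
        conn.setConnectTimeout(10_000); // max wait to establish the connection, in ms
        conn.setReadTimeout(60_000);    // max wait for data on an open connection, in ms
        try (Reader in = new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(Paths.get(targetFile), StandardCharsets.UTF_8)) {
            // try-with-resources flushes and closes the writer even if reading fails
            return copyLines(in, out);
        } finally {
            conn.disconnect();
        }
    }
}
```

A call such as `Downloader.download(calURL, "overall.txt")` would replace the loop in the question. Unlike the original, this writes the line break after each line rather than before it.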

Related

Preloading a website before fetching HTML from the URL

I'm trying to get data off of a URL, but the information I need takes a few seconds to load, and only shows as LOADING in the HTML until it does load, so when I use this code I can't pull the data I need.
URL url = new URL("https://www.cardservices.uga.edu/fs_mobile/");
URLConnection con = url.openConnection();
InputStream is = con.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line = null;
while ((line = br.readLine()) != null){
System.out.println(line);
}
How could I go about allowing the URL to load for a set amount of time before pulling the HTML off of it?

The web page you are calling probably makes an AJAX call to fetch the data, which is why you won't get it with your approach.
You have two options to get that data:
1. Use the browser's developer tools (F12 in Chrome), find the AJAX call in the "Network" tab, and use that URL in your code instead of the one you are using now.
2. Load your URL with a headless library (e.g. ghoustjs) and scrape the data after the page has loaded.
IMO, option 1 is the better choice.

Here is a working alternative:
// This is the AJAX call that loads the data into the web page. You can find it by inspecting the network calls.
URL url = new URL("https://www.cardservices.uga.edu/fs_mobile/index.php/dashboard/occupancies/");
URLConnection con = url.openConnection();
InputStream is = con.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line = null;
while ((line = br.readLine()) != null){
System.out.println(line);
}
This gives you the JSON response containing the percentage.
Hope it helps.
Also, you can use Selenium to wait for the page to finish loading if you want the exact HTML output.

Reading HTML from URL in Java vs. Python

I'm trying to read the HTML from a particular URL and store it into a String for parsing. I referred to a previous post to help me out. When I print out what was read, all I get are special characters.
Here is my Java code (with try/catches left out) that reads from a URL and prints:
String path = "https://html1-f.scribdassets.com/913q5pjrsw60h9i4/pages/106-6b1bd15200.jsonp";
URL url = new URL(path);
InputStream in = url.openStream();
BufferedReader bw = new BufferedReader(new InputStreamReader(in, "UTF-8"));
String line;
while ((line = bw.readLine()) != null) {
System.out.println(line);
}
Program output:
�ĘY106-6b1bd15200.jsonpmP�r� �Ƨ�!�%m�vD"��Ra*��w�%����ݳ�sβ��MK�d�9+%�m��l^��މ����:���� ���8B�Vce�.A*��x$FCo���a�b�<����Xy��m�c�>t����� �Z������Gx�o� �J���oKe�0�5�kGYpb�*l����+|�U���-�N3��jBp�R�z5Cۥjh��o�;�~)����~��)~ɮhy��<c,=;tHW���'�c�=~�w���
Expected output:
window.page106_callback(["<div class=\"newpage\" id=\"page106\" style=\"width: 902px; height:1273px\">\n<div class=image_layer style=\"z-index: 1\">\n<div class=ie_fix>\n<img class=\"absimg\" style=\"left:18px;top:27px;width:860px;height:1077px;clip:rect(1px 859px 1076px 1px)\" orig=\"http://html.scribd.com/913q5pjrsw60h9i4/images/106-6b1bd15200.jpg\"/>\n</div>\n</div>\n</div>\n\n"]);
At first, I thought it was an issue with permissions or something that somehow encrypted the stream, but my friend wrote a small Python script to do the same thing and it worked, thereby ruling this out. This is what he wrote:
import requests
link = 'https://html1-f.scribdassets.com/913q5pjrsw60h9i4/pages/106-6b1bd15200.jsonp'
f = requests.get(link)
text = (f.text)
print(text)
So the question is, why is the Java version unable to correctly read and print from this particular URL? Note that I tried testing some other URLs from various websites and those worked fine. Maybe I should learn Python.

The response is gzip-encoded. You can do:
InputStream in = new GZIPInputStream(con.getInputStream());

@Maurice Perry is right. I tried it with the code below:
String url = "https://html1-f.scribdassets.com/913q5pjrsw60h9i4/pages/106-6b1bd15200.jsonp";
URL obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(new GZIPInputStream(con.getInputStream())));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
System.out.println(response.toString());
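The code above always wraps the stream in a `GZIPInputStream`, which fails on responses that are not compressed. A hedged sketch of a variant that checks the `Content-Encoding` header first is shown here; the class name `GzipAwareReader` and its method names are illustrative, and UTF-8 is assumed as in the question. This also suggests why the Python version worked: `requests` decompresses gzip responses transparently, while `URLConnection` hands back the raw bytes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

final class GzipAwareReader {

    /** Wraps raw in a GZIPInputStream only when the Content-Encoding value is "gzip". */
    public static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    /** Reads the whole stream as UTF-8 text, ending each line with '\n'. */
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    /** Fetches url and returns its body, decompressing it if the server used gzip. */
    public static String fetch(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        try {
            return readAll(decode(con.getInputStream(), con.getContentEncoding()));
        } finally {
            con.disconnect();
        }
    }
}
```

With this, `GzipAwareReader.fetch(path)` should work for both the gzip-encoded URL from the question and the other URLs that already worked.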

405 - method not allowed calling web service from struts

I have a data access object which calls a web service. In my browser I can hit the web service using a url and it is successful.
http://mycompany:9080/ReportingManager/service/repManHealth/importHistoryTrafficLightStatus.json
But when I try to execute the code below in my data access object, I get a 405 error saying the method is not allowed.
String requestURI = "http://mycompany:9080/ReportingManager/service/repManHealth/importHistoryTrafficLightStatus.json";
URL url = new URL(requestURI);
HttpURLConnection httpCon = (HttpURLConnection) url.openConnection();
httpCon.setDoOutput(true);
httpCon.setRequestMethod("GET");
httpCon.setRequestProperty("Accept", "application/json");
OutputStreamWriter out = new OutputStreamWriter(
httpCon.getOutputStream());
int responseCode = httpCon.getResponseCode();
String responseMessage = httpCon.getResponseMessage();
BufferedReader rd = new BufferedReader(new InputStreamReader(httpCon.getInputStream()));
StringBuffer sb = new StringBuffer();
String line = null;
while ((line = rd.readLine()) != null) {
sb.append(line);
}
rd.close();
String jsonResponse = sb.toString();
out.close();
httpCon.disconnect();
Can someone help me with what might be wrong here?
Also, is there a better way to call a web service in an external application and read the response using Struts, or is this method okay?
Thanks.

If you are using the GET method, try the code below.
string url = String.Format("http://somedomain.com/samplerequest?greeting={0}",param);
WebClient serviceRequest = new WebClient();
serviceRequest.Headers[HttpRequestHeader.ContentType] = "application/json";
string response = serviceRequest.DownloadString(new Uri(url));

Thanks for your ideas; however, none of them were quite right. I fixed the 405 using the code below:
String requestURI = "http://myserver:9080/ReportingManager/service/repManHealth/importHistoryTrafficLightStatus.json";
URL url = new URL(requestURI);
URLConnection conn = url.openConnection();
InputStream in = conn.getInputStream();
BufferedReader rd = new BufferedReader(new InputStreamReader(in));
StringBuffer sb = new StringBuffer();
String line = null;
while ((line = rd.readLine()) != null) {
sb.append(line);
}
rd.close();

By calling httpCon.getOutputStream() you are no longer sending an HTTP GET, but an HTTP POST.
Note: this assumes you end up with the implementation provided by Sun, which changes GET to POST for backwards compatibility.
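To illustrate the point, here is a minimal sketch of the original request with the output-stream calls removed, so the request stays a GET while still sending the `Accept` header. The class name `JsonGet` and its method names are illustrative, and the response is assumed to be UTF-8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class JsonGet {

    /** Reads the whole stream as UTF-8 text, preserving it character for character. */
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        char[] chunk = new char[4096];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            sb.append(chunk, 0, n);
        }
        return sb.toString();
    }

    /** Sends a plain GET. No setDoOutput(true) and no getOutputStream(), so it is not turned into a POST. */
    public static String get(String requestURI) throws IOException {
        HttpURLConnection httpCon = (HttpURLConnection) new URL(requestURI).openConnection();
        try {
            httpCon.setRequestMethod("GET");
            httpCon.setRequestProperty("Accept", "application/json");
            int responseCode = httpCon.getResponseCode();
            if (responseCode != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + responseCode);
            }
            try (InputStream in = httpCon.getInputStream()) {
                return readAll(in);
            }
        } finally {
            httpCon.disconnect();
        }
    }
}
```

`String jsonResponse = JsonGet.get(requestURI);` would then stand in for the body of the data access method.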

Reading http file on url

I am trying to read an XML file from an HTTP URL, but I keep getting "Unexpected end of file from server". The page is password protected, and I would like to know whether I am passing the authentication details in my URL correctly.
I tried with non protected pages and I can read them properly. Here is my code:
URL url = new URL("http://username:pass@0.0.0.0:0000/test.xml");
URLConnection yc = url.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();
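One thing worth trying, offered as an assumption rather than a confirmed diagnosis: as far as I know, `URLConnection` does not turn the `username:pass` part of a URL into an `Authorization` header on its own, so the credentials may never reach the server. The sketch below sends HTTP Basic credentials explicitly. The class name `BasicAuthFetch` and its method names are illustrative, and it assumes the server uses Basic authentication.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthFetch {

    /** Builds the value of an HTTP Basic Authorization header for the given credentials. */
    public static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /** Prints the protected resource line by line. The address must not contain user info. */
    public static void printProtected(String address, String user, String password)
            throws IOException {
        URLConnection yc = new URL(address).openConnection();
        yc.setRequestProperty("Authorization", basicAuthHeader(user, password));
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(yc.getInputStream(), StandardCharsets.UTF_8))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
            }
        }
    }
}
```

If the server uses a different scheme (Digest, for example), installing a `java.net.Authenticator` is the usual alternative.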

Get raw text from html

I'm at quite a basic level of Android development.
I would like to get the text from a page such as "http://www.google.com". (The page I will be using will only have text, so no pictures or anything like that.)
So, to be clear: I want to get the text written on a page into, e.g., a string in my application.
I tried this code, but I'm not even sure it does what I want.
URL url = new URL("http://www.google.com");
URLConnection connection = url.openConnection();
// Get the response
BufferedReader rd = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line = "";
I can't get any text from it anyhow. How should I do this?

From the sample code you gave, you are not even reading the response from the request. I would get the HTML with the following code:
URL u = new URL("http://www.google.com");
URLConnection conn = u.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(
conn.getInputStream()));
StringBuffer buffer = new StringBuffer();
String inputLine;
while ((inputLine = in.readLine()) != null)
buffer.append(inputLine);
in.close();
System.out.println(buffer.toString());
From there you would need to pass the string into some kind of HTML parser if you want only the text. From what I've heard, JTidy is a good library for this; however, I have never used any Java HTML parsing libraries.

You want to extract text from an HTML file? You can use a specialized tool such as the Jericho HTML parser library. I'm not sure whether it can be used directly in an Android app, since it is quite big, but it is open source, so you can take only the code you need for your task.

Here is one way:
public String scrape(String urlString) throws Exception {
URL url = new URL(urlString);
URLConnection connection = url.openConnection();
BufferedReader reader = new BufferedReader(new InputStreamReader(
connection.getInputStream()));
String line = null, data = "";
while ((line = reader.readLine()) != null) {
data += line + "\n";
}
return data;
}
