Java URL hangs and never reads in Website - java

I am trying to read in a website and save it to a string. I'm using this code below which works perfectly fine in Eclipse. But when I try to run the program via the command line in Windows like "java MyProgram", the program starts and just hangs and never reads in the URL. Anyone know why this would be happening?
URL link = new URL("http://www.yahoo.com");
BufferedReader in = new BufferedReader(new InputStreamReader(link.openStream()));
//InputStream in = link.openStream();
String inputLine = "";
int count = 0;
while ((inputLine = in.readLine()) != null)
{
site = site + "\n" + inputLine;
}
in.close();
...

It could be because you are behind a proxy, and Eclipse is automatically applying settings to configure it.
If you are behind a proxy, try setting the java.net.useSystemProxies property when running from the command prompt. You can also configure the proxy manually with the network properties http.proxyHost and http.proxyPort.
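As a rough sketch of the manual configuration mentioned above, the snippet below sets the two proxy properties before any connection is opened. The class name ProxySetup, the host proxy.example.com, and the port 8080 are placeholders and not taken from the question.

```java
public class ProxySetup {
    /** Applies explicit HTTP proxy settings; call this before the first connection is opened. */
    public static void configure(String host, int port) {
        // Hypothetical host and port, supplied by the caller
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
    }

    public static void main(String[] args) {
        configure("proxy.example.com", 8080);
        System.out.println(System.getProperty("http.proxyHost")
                + ":" + System.getProperty("http.proxyPort"));
    }
}
```

The same values can be passed on the command line instead, for example java -Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 MyProgram, or java -Djava.net.useSystemProxies=true MyProgram to pick up the operating system's settings.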

I ran into the same problem and found a solution.
Here is my working code:
// Create a URL for the desired page
URL url = new URL("your url");
// Get connection
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setConnectTimeout(5000); // 5-second connect timeout
connection.setReadTimeout(5000); // 5-second read (socket) timeout
// Connect
connection.connect(); // In my case, readLine() hung without this line
// Read from this connection: url.openStream() would open a second
// connection that ignores the timeouts set above
InputStreamReader isr = new InputStreamReader(connection.getInputStream(), "UTF-8");
BufferedReader in = new BufferedReader(isr);
String str;
while (true) {
str = in.readLine();
if(str==null){break;}
listItems.add(str);
}
// Closing all
in.close();
isr.close();
connection.disconnect();

If that's all your code is doing, there's no reason it shouldn't work from the command line. I suspect you've cut out the part that's broken. For example:
public static void main(String[] args) throws Exception {
String site = "";
URL link = new URL("http://www.yahoo.com");
BufferedReader in = new BufferedReader(new InputStreamReader(link.openStream()));
//InputStream in = link.openStream();
String inputLine = "";
int count = 0;
while ((inputLine = in.readLine()) != null) {
site = site + "\n" + inputLine;
}
in.close();
System.out.println(site);
}
works fine. Another possibility would be if you're running it in Eclipse and from the command line on two different computers, and the latter can't reach http://www.yahoo.com.

Related

Reading HTML from URL in Java vs. Python

I'm trying to read the HTML from a particular URL and store it into a String for parsing. I referred to a previous post to help me out. When I print out what was read, all I get are special characters.
Here is my Java code (with try/catches left out) that reads from a URL and prints:
String path = "https://html1-f.scribdassets.com/913q5pjrsw60h9i4/pages/106-6b1bd15200.jsonp";
URL url = new URL(path);
InputStream in = url.openStream();
BufferedReader bw = new BufferedReader(new InputStreamReader(in, "UTF-8"));
String line;
while ((line = bw.readLine()) != null) {
System.out.println(line);
}
Program output:
�ĘY106-6b1bd15200.jsonpmP�r� �Ƨ�!�%m�vD"��Ra*��w�%����ݳ�sβ��MK�d�9+%�m��l^��މ����:���� ���8B�Vce�.A*��x$FCo���a�b�<����Xy��m�c�>t����� �Z������Gx�o� �J���oKe�0�5�kGYpb�*l����+|�U���-�N3��jBp�R�z5Cۥjh��o�;�~)����~��)~ɮhy��<c,=;tHW���'�c�=~�w���
Expected output:
window.page106_callback(["<div class=\"newpage\" id=\"page106\" style=\"width: 902px; height:1273px\">\n<div class=image_layer style=\"z-index: 1\">\n<div class=ie_fix>\n<img class=\"absimg\" style=\"left:18px;top:27px;width:860px;height:1077px;clip:rect(1px 859px 1076px 1px)\" orig=\"http://html.scribd.com/913q5pjrsw60h9i4/images/106-6b1bd15200.jpg\"/>\n</div>\n</div>\n</div>\n\n"]);
At first, I thought it was an issue with permissions or something that somehow encrypted the stream, but my friend wrote a small Python script to do the same thing and it worked, thereby ruling this out. This is what he wrote:
import requests
link = 'https://html1-f.scribdassets.com/913q5pjrsw60h9i4/pages/106-6b1bd15200.jsonp'
f = requests.get(link)
text = (f.text)
print(text)
So the question is, why is the Java version unable to correctly read and print from this particular URL? Note that I tried testing some other URLs from various websites and those worked fine. Maybe I should learn Python.
The response is gzip-encoded. You can do:
InputStream in = new GZIPInputStream(con.getInputStream());
@Maurice Perry is right. I tried it with the code below:
String url = "https://html1-f.scribdassets.com/913q5pjrsw60h9i4/pages/106-6b1bd15200.jsonp";
URL obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(new GZIPInputStream(con.getInputStream())));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
System.out.println(response.toString());
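Wrapping every response in a GZIPInputStream fails when the server sends an uncompressed body, so a more defensive variant checks the Content-Encoding header first. The following is a minimal sketch under that assumption; the class and method names (GzipAwareReader, decode, readAll, gzip) are hypothetical, and main simulates the response in memory rather than contacting the server.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipAwareReader {
    /** Wraps the stream in a GZIPInputStream only when the body is declared gzip-encoded. */
    public static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    /** Reads the whole stream as UTF-8 text, one '\n' after each line. */
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    /** Helper for the demo: compresses text the way a gzip-encoding server would. */
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Simulated gzip-encoded response body
        InputStream body = decode(new ByteArrayInputStream(gzip("window.page106_callback([]);")), "gzip");
        System.out.print(readAll(body));
    }
}
```

With a real connection, the call would look like decode(con.getInputStream(), con.getContentEncoding()).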

Java client POST to php script - empty $_POST

I have researched extensively and cannot find a solution. I have been using the solutions provided to other users and it does not seem to work for me.
My java code:
public class Post {
public static void main(String[] args) {
String name = "Bobby";
String address = "123 Main St., Queens, NY";
String phone = "4445556666";
String data = "";
try {
// POST as urlencoded is basically key-value pairs
// create key=value&key=value.... pairs
data += "name=" + URLEncoder.encode(name, "UTF-8");
data += "&address=" +
URLEncoder.encode(address, "UTF-8");
data += "&phone=" +
URLEncoder.encode(phone, "UTF-8");
// convert string to byte array, as it should be sent
byte[] dataBytes = data.toString().getBytes("UTF-8");
// open a connection to the site
URL url = new URL("http://xx.xx.xx.xxx/yyy.php");
HttpURLConnection conn =
(HttpURLConnection) url.openConnection();
// tell the server this is POST & the format of the data.
conn.setDoOutput(true);
conn.setRequestProperty("Content-Type",
"application/x-www-form-urlencoded");
conn.setRequestMethod("POST");
conn.setFixedLengthStreamingMode(dataBytes.length);
conn.getOutputStream().write(dataBytes);
conn.getInputStream();
// Print out the echo statements from the php script
BufferedReader in = new BufferedReader(
new InputStreamReader(url.openStream()));
String line;
while((line = in.readLine()) != null)
System.out.println(line);
in.close();
} catch(Exception e) {
e.printStackTrace();
}
}
}
and the php
<?php
echo $_POST["name"];
?>
The output I receive is an empty line. I tested to see if it was a php/server side issue by making an html form that sends data over to a similar script and prints the data on the screen and that worked. But, for the life of me, I cannot get this to work with a remote client.
I am using Ubuntu server and Apache.
Thank you in advance.
The problem is in how you read the output. You are making two requests:
1) conn.getInputStream(); sends the POST request with the desired body
2) BufferedReader in = new BufferedReader(
new InputStreamReader(url.openStream())); sends a separate, empty GET request (!!)
Change it to:
// ...
conn.getOutputStream().write(dataBytes);
BufferedReader in = new BufferedReader(
new InputStreamReader(conn.getInputStream()));
and see the result.
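Putting the fix together, the sketch below builds the form body and reads the reply from the same connection that sent the POST. The class and method names (FormPost, encodeForm, post) are hypothetical; the server address is elided in the question, so main only prints the encoded body.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormPost {
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Builds a key=value&key=value body with URL-encoded keys and values. */
    public static String encodeForm(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    /** Sends the POST and reads the reply from the same connection. */
    public static String post(URL url, Map<String, String> fields) throws IOException {
        byte[] body = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        conn.setFixedLengthStreamingMode(body.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        StringBuilder reply = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                reply.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return reply.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Bobby");
        fields.put("phone", "4445556666");
        // Only the encoding step is shown; pass a real URL to post(...) to send it
        System.out.println(encodeForm(fields));
    }
}
```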

URLConnection didn't return complete content of file

My code looks like
URL oracle = new URL(calURL);
FileWriter overall = new FileWriter("overall.txt");
HttpURLConnection yc = (HttpURLConnection) oracle.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
overall.append("\n"+inputLine);
}
It seems to return only half of the content; I am not getting the full content.
Note: calURL is dynamically generated.
calURL takes a long time to load, and I guess my stream started reading before the content was ready. After I added a timeout before the URL connection, I now get the full data.
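A hedged sketch of a more defensive version follows. It sets explicit timeouts (the values are assumptions) and closes the writer with try-with-resources. The second point is not confirmed by the poster, but the code shown never closes or flushes the FileWriter, and an unclosed writer can leave buffered output unwritten, which also produces a truncated file. The names SaveUrl, copyLines, and download are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveUrl {
    /** Copies every line from source to target and returns the number of lines written. */
    public static int copyLines(Reader source, Writer target) throws IOException {
        BufferedReader in = new BufferedReader(source);
        int lines = 0;
        String line;
        while ((line = in.readLine()) != null) {
            target.write(line);
            target.write('\n');
            lines++;
        }
        target.flush();
        return lines;
    }

    /** Downloads the address into the file; both streams are closed even on failure. */
    public static int download(String address, Path file) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setConnectTimeout(10_000); // assumed value
        conn.setReadTimeout(60_000);    // assumed value, generous for a slow page
        try (Reader in = new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8);
             Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            return copyLines(in, out);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Local demonstration of the copy step; no network access needed
        Path file = Files.createTempFile("overall", ".txt");
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            copyLines(new StringReader("first\nsecond"), out);
        }
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```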

Accessing a URL using a loop

I have created a web application and hosted it on the server. Now I want to create a Java program that will access (or "hit") the URL of my application in a loop, so that I can check how much load my web application can handle. The program should also tell me when the URL was accessed successfully and when it wasn't.
I tried executing it without using a loop:
try {
URL url = new URL("https://example.com/");
BufferedReader in = new BufferedReader(
new InputStreamReader(url.openStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
System.out.println(inputLine);
}
in.close();
} catch (Exception e) {
System.out.println("e: " + e.toString());
}
But, I got this error:
e: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No subject alternative DNS name matching example.com found.
Use javax.net.ssl.HttpsURLConnection to connect to https:// URLs,
something like this (resource closing, exception handling, etc. are left to you):
final URL url = new URL("https://example.com");
final HttpsURLConnection con = (HttpsURLConnection)url.openConnection();
final BufferedReader br = new BufferedReader(new InputStreamReader(con.getInputStream()));
String input;
while ((input = br.readLine()) != null){
System.out.println(input);
}
However, there are many tools available for load testing.
You are just getting a stream from the URL, but this URL uses HTTPS, so you need the server's public key imported into your client application; otherwise the client and server won't be able to communicate with each other.
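For the loop the question asks about, a minimal sequential sketch might look like the following. It treats any 2xx status as success and any exception as failure. The names UrlProbe, hit, and countSuccesses and the 5-second timeouts are assumptions; a realistic load test would need concurrent requests, which dedicated tools handle better.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class UrlProbe {
    /** Returns true when the server answers with a 2xx status code. */
    public static boolean hit(String address) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(address).openConnection();
            conn.setConnectTimeout(5000); // assumed value
            conn.setReadTimeout(5000);    // assumed value
            int status = conn.getResponseCode();
            return status >= 200 && status < 300;
        } catch (IOException e) {
            // Covers malformed URLs, refused connections, timeouts, and TLS failures
            System.out.println("error: " + e);
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    /** Hits the address the given number of times and returns how many attempts succeeded. */
    public static int countSuccesses(String address, int attempts) {
        int ok = 0;
        for (int i = 1; i <= attempts; i++) {
            boolean success = hit(address);
            System.out.println("request " + i + ": " + (success ? "ok" : "failed"));
            if (success) {
                ok++;
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        String address = args.length > 0 ? args[0] : "https://example.com/";
        System.out.println(countSuccesses(address, 10) + " of 10 requests succeeded");
    }
}
```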

Get raw text from html

I'm at quite a basic level of Android development.
I would like to get the text from a page such as "http://www.google.com". (The page I will be using will only have text, so no pictures or anything like that.)
So, to be clear: I want to get the text written on a page into, e.g., a string in my application.
I tried this code, but I'm not even sure if it does what I want.
URL url = new URL("http://www.google.com");
URLConnection connection = url.openConnection();
// Get the response
BufferedReader rd = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line = "";
I can't get any text from it anyhow. How should I do this?
From the sample code you gave you are not even reading the response from the request. I would get the html with the following code
URL u = new URL("http://www.google.com");
URLConnection conn = u.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(
conn.getInputStream()));
StringBuffer buffer = new StringBuffer();
String inputLine;
while ((inputLine = in.readLine()) != null)
buffer.append(inputLine);
in.close();
System.out.println(buffer.toString());
From there you would need to pass the string into some kind of HTML parser if you want only the text. From what I've heard, JTidy is a good library for this; however, I have never used any Java HTML parsing libraries.
Do you want to extract text from an HTML file? You can use a specialized tool such as the Jericho HTML parser library. I'm not sure whether it can be used directly in an Android app, as it is quite big, but it is open source, so you can use its code and take only what you need for your task.
Here is one way:
public String scrape(String urlString) throws Exception {
URL url = new URL(urlString);
URLConnection connection = url.openConnection();
BufferedReader reader = new BufferedReader(new InputStreamReader(
connection.getInputStream()));
String line = null, data = "";
while ((line = reader.readLine()) != null) {
data += line + "\n";
}
return data;
}
Here is another.
