Java read from URL stream working selectively - java

Summary: Sample Java code that reads from a URLConnection works for only certain URLs, not others.
Details: I have this sample Java code that I am using to read from a URLConnection. When the URL is "http://www.example.com", the code reads the page content without any issues. However, if the URL is "http://www.cnn.com", the page content is not read.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class StackOverflow {
    public static void main(String[] args) throws Exception {
        BufferedReader inputStream = null;
        try {
            String urlStr = "http://www.cnn.com"; // Does not work
            // urlStr = "http://www.example.com"; // Works if this line is uncommented
            URL url = new URL(urlStr);
            inputStream = new BufferedReader(new InputStreamReader(url.openStream()));
            String textLine = null;
            while ((textLine = inputStream.readLine()) != null) {
                System.out.println(textLine);
            }
        }
        catch (Exception e) {
            e.printStackTrace();
        }
        finally {
            if (inputStream != null) inputStream.close();
        }
    }
}

CNN redirects from http to https, but your call doesn't follow the redirect. You are getting a 307 response with an empty body, so readLine() returns null immediately and your loop body is skipped. Try "https://www.cnn.com" instead.
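If you want to keep the http URL, a minimal sketch of following that redirect manually with HttpURLConnection (assuming the server sends a Location header):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class FollowRedirect {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL("http://www.cnn.com").openConnection();
        int status = conn.getResponseCode();
        // HttpURLConnection follows redirects within the same protocol,
        // but not from http to https, so handle that hop ourselves.
        if (status == HttpURLConnection.HTTP_MOVED_PERM
                || status == HttpURLConnection.HTTP_MOVED_TEMP
                || status == 307 || status == 308) {
            String location = conn.getHeaderField("Location");
            conn = (HttpURLConnection) new URL(location).openConnection();
        }
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}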

Related

How to make http call from standalone java application

I'm making a small dictionary-style app using Java Swing, and I'm using the Oxford Dictionary API for it. Is there any way to make a simple AJAX-like HTTP request in Java without using servlets and other advanced Java concepts? In Android we use HttpURLConnection to do this job. I googled a lot but couldn't find a solution, as every page shows results using servlets, and I only know core Java. If it is possible to make such a call without servlets, please help me. Thanks in advance.
Use the HttpURLConnection class to make the HTTP call.
If you need more help with that, go to the official Java documentation site here.
Example
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class JavaHttpUrlConnectionReader {
    public static void main(String[] args) throws IOException {
        String results = doHttpUrlConnectionAction("https://your.url.com/", "GET");
        System.out.println(results);
    }

    public static String doHttpUrlConnectionAction(String desiredUrl, String requestType) throws IOException {
        BufferedReader reader = null;
        StringBuilder stringBuilder;
        try {
            HttpURLConnection connection = (HttpURLConnection) new URL(desiredUrl).openConnection();
            connection.setRequestMethod(requestType); // Can be "GET", "POST", "DELETE", etc.
            connection.setReadTimeout(3 * 1000);
            connection.connect(); // Make the call
            reader = new BufferedReader(new InputStreamReader(connection.getInputStream())); // Read the response
            stringBuilder = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                stringBuilder.append(line).append("\n");
            }
            return stringBuilder.toString();
        } catch (IOException e) {
            throw new IOException("Problem in connection: ", e);
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException ioe) {
                    throw new IOException("Problem closing reader: ", ioe);
                }
            }
        }
    }
}
It will make the call and return the response as a String. If you want to make a POST call, you need to do a little extra work:
try {
    DataOutputStream wr = new DataOutputStream(connection.getOutputStream());
    wr.write(postParam.getBytes());
} catch (IOException e) {
    e.printStackTrace();
}
Note: Here postParam is a String with a value something like "someId=156422&someAnotherId=32651".
Put this portion before the connection.connect() statement.
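Putting it together, a minimal POST sketch (the URL and postParam values are placeholders; note that connection.setDoOutput(true) must be called before writing the request body):

import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class PostExample {
    public static void main(String[] args) throws Exception {
        String postParam = "someId=156422&someAnotherId=32651"; // placeholder form data
        HttpURLConnection connection = (HttpURLConnection) new URL("https://your.url.com/").openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true); // required before writing a request body

        // Write the form-encoded body
        try (DataOutputStream wr = new DataOutputStream(connection.getOutputStream())) {
            wr.write(postParam.getBytes("UTF-8"));
        }

        // Read the response
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}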

Reading HTML content into java program

I am trying to get recharge plan information from a service provider's website into my Java program. The website contains dynamic data, and when I fetch the URL using URLConnection I only get the static content. I want to automate reading the recharge plans of different websites in my program.
package com.fs.store.test;

import java.net.*;
import java.io.*;

public class MyURLConnection
{
    private static final String baseTataUrl = "https://www.tatadocomo/pre-paypacks";

    public MyURLConnection()
    {
    }

    public void getMeData()
    {
        URLConnection urlConnection = null;
        BufferedReader in = null;
        try
        {
            URL url = new URL(baseTataUrl);
            urlConnection = url.openConnection();
            HttpURLConnection connection = null;
            connection = (HttpURLConnection) urlConnection;
            in = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()/*,"UTF-8"*/));
            String currentLine = null;
            StringBuilder line = new StringBuilder();
            while ((currentLine = in.readLine()) != null)
            {
                System.out.println(currentLine);
                line = line.append(currentLine.trim());
            }
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        finally
        {
            try
            {
                if (in != null) in.close();
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args)
    {
        MyURLConnection test = new MyURLConnection();
        System.out.println("About to call getMeData()");
        test.getMeData();
    }
}
You could use one of the HTMLEditorKit classes, with JavaScript enabled, and then get the content.
See the examples at oreilly.
Inspect the traffic. Firefox has a TamperData plugin, for instance. Then you can communicate with the server more directly.
Use Apache's HttpClient to facilitate the communication, instead of a plain URL (a rough sketch follows below).
Maybe use a JSON library if JSON data comes back.
This is more work, but you might then be able to skip some of the page loading.
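A rough sketch of the HttpClient suggestion, assuming Apache HttpClient 4.x is on the classpath (the URL is a placeholder):

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpClientFetch {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes the client and the response for us
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse response = client.execute(new HttpGet("https://www.example.com/"))) {
            String body = EntityUtils.toString(response.getEntity());
            System.out.println(body);
        }
    }
}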

Issue with downloading webpage in java?

So I'm trying to download the text of an aspx webpage (Roblox) with Java. My code looks like this:
URL url;
InputStream is = null;
DataInputStream dis;
String line = "";

try {
    System.out.println("connecting");
    url = new URL("http://www.roblox.com");
    is = url.openStream(); // throws an IOException
    dis = new DataInputStream(new BufferedInputStream(is));
    while ((line = dis.readLine()) != null) {
        System.out.println(line);
    }
} catch (Exception ex) {
    ex.printStackTrace();
} finally {
    try {
        is.close();
    } catch (IOException ioe) {}
}
And it works for www.roblox.com. However, when I try to navigate to a different page - http://www.roblox.com/My/Money.aspx#/#TradeCurrency_tab
- it doesn't work, and just loads the www.roblox.com screen.
Could anyone help clarify this? Any help would be appreciated.
You are getting different content in Java than you see in the browser because the server adds the following header to the response:
Location=https://www.roblox.com/Login/Default.aspx?ReturnUrl=%2fMy%2fMoney.aspx
You should read the header values from the URLConnection and redirect manually if the 'Location' header is present. As far as I know, even if you used HttpURLConnection you would not be redirected automatically from http to https.
EDITED:
You could do it with something like this (I removed other code, like exception handling, to focus on the redirection, so don't take it as a proper coding example):
public static void main(String[] args) throws Exception {
    printPage("http://www.roblox.com/My/Money.aspx#/#TradeCurrency_tab");
}

public static void printPage(String address) throws Exception {
    String line = null;
    System.out.println("connecting to: " + address);
    URL url = new URL(address);
    URLConnection conn = url.openConnection();
    String redirectAddress = conn.getHeaderField("Location");
    if (redirectAddress != null) {
        printPage(redirectAddress);
    } else {
        // reuse the connection we already opened instead of opening a second one
        InputStream is = conn.getInputStream();
        DataInputStream dis = new DataInputStream(new BufferedInputStream(is));
        while ((line = dis.readLine()) != null) {
            System.out.println(line);
        }
    }
}
Judging by the URL and the use of #, I suspect this page is using JavaScript to create pages dynamically.
You can use something like http://seleniumhq.org/ to emulate a web browser (including cookies), and this is a far more reliable approach for any kind of dynamic web content.
// The Firefox driver supports javascript
WebDriver driver = new FirefoxDriver();
// Go to the roblox page
driver.get("http://www.roblox.com");
System.out.println(driver.getPageSource());
Of course, there are many better ways to access elements of a page via Selenium's WebDriver API: http://selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/selenium/WebDriver.html
Download the JAR and all the deps in one file: http://code.google.com/p/selenium/downloads/detail?name=selenium-server-standalone-2.27.0.jar
And note, you can navigate to other pages via code: http://seleniumhq.org/docs/03_webdriver.html -
WebElement link = driver.findElement(By.linkText("Click Here Or Whatever"));
link.click();
then
System.out.println(driver.getPageSource());
will get the page text on the next page.

How to handle problem with Network Connectivity in Java

I have some simple Java code which gets the HTML text from an input URL:
try {
    URL url = new URL("http://www.abc.com"); // the URL needs a protocol prefix
    // Get the response
    BufferedReader rd = new BufferedReader(new InputStreamReader(url.openStream()));
    String code = "";
    String line;
    while ((line = rd.readLine()) != null) {
        code = code + line;
    }
} catch (IOException e) {}
I am using this code in an Android project. The problem comes when there is no internet connectivity: the application just hangs and later gives an error.
Is there some way to break out of this after a fixed timeout, or even return some specific string after an exception is thrown? Can you please tell me how to do that?
Try this:
try
{
    URL url = new URL("http://www.abc.com");
    String newline = System.getProperty("line.separator");
    InputStream is = url.openStream();
    if (is != null)
    {
        BufferedReader rd = new BufferedReader(new InputStreamReader(is));
        StringBuilder contents = new StringBuilder();
        String line;
        while ((line = rd.readLine()) != null)
        {
            contents.append(line).append(newline);
        }
    }
    else
    {
        System.out.println("input stream was null");
    }
}
catch (Exception e)
{
    e.printStackTrace();
}
An empty catch block is asking for trouble.
I don't know what the default timeout is for URL, and a quick look at the javadocs doesn't seem to reveal anything. So try using HttpURLConnection directly instead http://download.oracle.com/javase/1.5.0/docs/api/java/net/HttpURLConnection.html. This lets you set timeout values:
public static void main(String[] args) throws Exception {
    URL url = new URL("http://www.google.com");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setConnectTimeout(5000); // 5 seconds
    conn.setRequestMethod("GET");
    conn.connect();
    BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    String line;
    while ((line = rd.readLine()) != null) {
        System.out.println(line);
    }
    conn.disconnect();
}
You can also set a read timeout, as well as specify the redirect behaviour and a few other things, as in the sketch below.
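For example, a minimal sketch of those extra settings (the values and URL are arbitrary):

import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutSettings {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL("http://www.google.com").openConnection();
        conn.setConnectTimeout(5000);          // fail if the connection cannot be established within 5 s
        conn.setReadTimeout(10000);            // fail if no data arrives for 10 s while reading
        conn.setInstanceFollowRedirects(true); // follow 3xx redirects on this connection
        System.out.println("Response code: " + conn.getResponseCode());
        conn.disconnect();
    }
}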
I think that in addition to timeouts it would also be smart to check Internet availability right before making the request:
public class ConnectivityHelper {
    public static boolean isAnyNetworkConnected(Context context) {
        return isWiFiNetworkConnected(context) || isMobileNetworkConnected(context);
    }

    public static boolean isWiFiNetworkConnected(Context context) {
        return getWiFiNetworkInfo(context).isConnected();
    }

    public static boolean isMobileNetworkConnected(Context context) {
        return getMobileNetworkInfo(context).isConnected();
    }

    private static ConnectivityManager getConnectivityManager(Context context) {
        return (ConnectivityManager) context.getSystemService(Context.CONNECTIVITY_SERVICE);
    }
}
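The getWiFiNetworkInfo and getMobileNetworkInfo helpers are not shown above; assuming they simply wrap ConnectivityManager.getNetworkInfo (with android.net.NetworkInfo imported), they might look like this:

private static NetworkInfo getWiFiNetworkInfo(Context context) {
    // assumed implementation, not part of the original answer
    return getConnectivityManager(context).getNetworkInfo(ConnectivityManager.TYPE_WIFI);
}

private static NetworkInfo getMobileNetworkInfo(Context context) {
    // assumed implementation, not part of the original answer
    return getConnectivityManager(context).getNetworkInfo(ConnectivityManager.TYPE_MOBILE);
}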
UPDATE: For timeouts, see kuester2000's excellent reply here.
Just a general tip on working with streams: always close them when they are no longer needed. I just wanted to post that, as it seems most people didn't take care of it in their examples.
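Since Java 7, try-with-resources is the simplest way to guarantee that; a minimal sketch (placeholder URL):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class ReadWithTryWithResources {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.example.com");
        // The reader (and the underlying stream) is closed automatically,
        // even if an exception is thrown while reading.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}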

How do you Programmatically Download a Webpage in Java

I would like to be able to fetch a web page's HTML and save it to a String, so I can do some processing on it. Also, how would I handle various types of compression?
How would I go about doing that using Java?
I'd use a decent HTML parser like Jsoup. It's then as easy as:
String html = Jsoup.connect("http://stackoverflow.com").get().html();
It handles GZIP and chunked responses and character encoding fully transparently. It offers more advantages as well, like HTML traversal and manipulation by CSS selectors, as jQuery does. You only have to grab it as a Document, not as a String.
Document document = Jsoup.connect("http://google.com").get();
You really don't want to run basic String methods or even regex on HTML to process it.
See also:
What are the pros and cons of leading HTML parsers in Java?
Here's some tested code using Java's URL class. I'd recommend doing a better job than I do here of handling the exceptions, or passing them up the call stack, though.
public static void main(String[] args) {
    URL url;
    InputStream is = null;
    BufferedReader br;
    String line;

    try {
        url = new URL("http://stackoverflow.com/");
        is = url.openStream(); // throws an IOException
        br = new BufferedReader(new InputStreamReader(is));
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
    } catch (MalformedURLException mue) {
        mue.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    } finally {
        try {
            if (is != null) is.close();
        } catch (IOException ioe) {
            // nothing to see here
        }
    }
}
Bill's answer is very good, but you may want to do some things with the request, like compression or user agents. The following code shows how you can add support for various types of compression to your requests.
URL url = new URL(urlStr);
HttpURLConnection conn = (HttpURLConnection) url.openConnection(); // cast shouldn't fail
HttpURLConnection.setFollowRedirects(true);
// allow both GZip and Deflate (ZLib) encodings
conn.setRequestProperty("Accept-Encoding", "gzip, deflate");
String encoding = conn.getContentEncoding();
InputStream inStr = null;
// create the appropriate stream wrapper based on the encoding type
if (encoding != null && encoding.equalsIgnoreCase("gzip")) {
    inStr = new GZIPInputStream(conn.getInputStream());
} else if (encoding != null && encoding.equalsIgnoreCase("deflate")) {
    inStr = new InflaterInputStream(conn.getInputStream(), new Inflater(true));
} else {
    inStr = conn.getInputStream();
}
To also set the user-agent add the following code:
conn.setRequestProperty ( "User-agent", "my agent name");
Well, you could go with the built-in libraries such as URL and URLConnection, but they don't give very much control.
Personally I'd go with the Apache HttpClient library.
Edit: HttpClient has reached end of life at Apache. The replacement is HttpComponents.
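As a rough sketch with HttpComponents, assuming the fluent-hc module of HttpClient 4.x is on the classpath (the URL is a placeholder):

import org.apache.http.client.fluent.Request;

public class FluentFetch {
    public static void main(String[] args) throws Exception {
        // One-liner GET that returns the response body as a String
        String html = Request.Get("https://www.example.com/")
                .connectTimeout(5000)
                .socketTimeout(5000)
                .execute()
                .returnContent()
                .asString();
        System.out.println(html);
    }
}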
All of the above-mentioned approaches do not download the web page text as it looks in the browser. These days a lot of data is loaded into browsers through scripts in HTML pages. None of the above techniques supports scripts; they just download the HTML text. HtmlUnit supports JavaScript, so if you are looking to download the web page text as it looks in the browser, you should use HtmlUnit.
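A minimal HtmlUnit sketch (assuming the HtmlUnit dependency is on the classpath; the URL is a placeholder):

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitFetch {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setJavaScriptEnabled(true); // execute page scripts
            HtmlPage page = webClient.getPage("https://www.example.com/");
            // asXml() returns the DOM after JavaScript has run; page.asText() gives the visible text
            System.out.println(page.asXml());
        }
    }
}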
You may need to extract code from a secure web page (https protocol). In the following example, the HTML file is saved into c:\temp\filename.html. Enjoy!
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

/**
 * <b>Get the HTML source from the secure URL</b>
 */
public class HttpsClientUtil {
    public static void main(String[] args) throws Exception {
        String httpsURL = "https://stackoverflow.com";
        String FILENAME = "c:\\temp\\filename.html";
        BufferedWriter bw = new BufferedWriter(new FileWriter(FILENAME));
        URL myurl = new URL(httpsURL);
        HttpsURLConnection con = (HttpsURLConnection) myurl.openConnection();
        con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0");
        InputStream ins = con.getInputStream();
        InputStreamReader isr = new InputStreamReader(ins, "Windows-1252");
        BufferedReader in = new BufferedReader(isr);
        String inputLine;
        // Write each line into the file
        while ((inputLine = in.readLine()) != null) {
            System.out.println(inputLine);
            bw.write(inputLine);
        }
        in.close();
        bw.close();
    }
}
To do so using the powerful NIO.2 Files.copy(InputStream in, Path target):
URL url = new URL( "http://download.me/" );
Files.copy( url.openStream(), Paths.get("downloaded.html" ) );
On a Unix/Linux box you could just run 'wget' but this is not really an option if you're writing a cross-platform client. Of course this assumes that you don't really want to do much with the data you download between the point of downloading it and it hitting the disk.
This class may help: it gets the page code and filters out some information.
public class MainActivity extends AppCompatActivity {
    EditText url;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        url = ((EditText) findViewById(R.id.editText));
        DownloadCode obj = new DownloadCode();
        try {
            String des = " ";
            String tag1 = "<div class=\"description\">";
            String l = obj.execute("http://www.nu.edu.pk/Campus/Chiniot-Faisalabad/Faculty").get();
            url.setText(l);
            url.setText(" ");
            String[] t1 = l.split(tag1);
            String[] t2 = t1[0].split("</div>");
            url.setText(t2[0]);
        }
        catch (Exception e)
        {
            Toast.makeText(this, e.toString(), Toast.LENGTH_SHORT).show();
        }
    }

    // input, extra function run in parallel, output
    class DownloadCode extends AsyncTask<String, Void, String>
    {
        @Override
        protected String doInBackground(String... WebAddress) // strings of web addresses separated by ','
        {
            String htmlcontent = " ";
            try {
                URL url = new URL(WebAddress[0]);
                HttpURLConnection c = (HttpURLConnection) url.openConnection();
                c.connect();
                InputStream input = c.getInputStream();
                int data;
                InputStreamReader reader = new InputStreamReader(input);
                data = reader.read();
                while (data != -1)
                {
                    char content = (char) data;
                    htmlcontent += content;
                    data = reader.read();
                }
            }
            catch (Exception e)
            {
                Log.i("Status : ", e.toString());
            }
            return htmlcontent;
        }
    }
}
Jetty has an HTTP client which can be used to download a web page.
package com.zetcode;

import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.ContentResponse;

public class ReadWebPageEx5 {
    public static void main(String[] args) throws Exception {
        HttpClient client = null;
        try {
            client = new HttpClient();
            client.start();
            String url = "http://example.com";
            ContentResponse res = client.GET(url);
            System.out.println(res.getContentAsString());
        } finally {
            if (client != null) {
                client.stop();
            }
        }
    }
}
The example prints the contents of a simple web page.
In the Reading a web page in Java tutorial I have written six examples of downloading a web page programmatically in Java using URL, JSoup, HtmlCleaner, Apache HttpClient, Jetty HttpClient, and HtmlUnit.
I used the answer to this post that uses the plain URL class, and wrote the output into a file.
package test;

import java.net.*;
import java.io.*;

public class PDFTest {
    public static void main(String[] args) throws Exception {
        try {
            URL oracle = new URL("http://www.fetagracollege.org");
            BufferedReader in = new BufferedReader(new InputStreamReader(oracle.openStream()));
            String fileName = "D:\\a_01\\output.txt";
            PrintWriter writer = new PrintWriter(fileName, "UTF-8");
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
                writer.println(inputLine);
            }
            writer.close(); // flush and close the writer, otherwise the file may end up empty
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
