How do I retrieve a URL from a web site using Java? - java

I want to use HTTP GET and POST commands to retrieve URLs from a website and parse the HTML. How do I do this?

You can use HttpURLConnection in combination with URL.
URL url = new URL("http://example.com");
HttpURLConnection connection = (HttpURLConnection)url.openConnection();
connection.setRequestMethod("GET");
connection.connect();
InputStream stream = connection.getInputStream();
// read the contents using an InputStreamReader
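As a minimal sketch of that last step, the stream can be drained into a String as shown below. The class and method names (StreamText, readAll) are hypothetical, and UTF-8 is an assumption; a real page may declare a different charset in its Content-Type header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamText {
    // Reads the whole stream as text and closes it afterwards.
    // Assumption: the content is UTF-8 encoded.
    static String readAll(InputStream stream) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

With the snippet above, `String html = StreamText.readAll(connection.getInputStream());` would give you the page source to hand to a parser.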

The easiest way to do a GET is to use the built-in java.net.URL. However, as mentioned, HttpClient is the better choice, since it gives you more control, for example over how redirects are handled.
For parsing the HTML, you can use HTML Parser.

Related

How to HTTP POST parameters with Android/Java?

I'm struggling to find good examples on how to POST key value pairs to a URL with Android in Java.
Here is what the Android documentation says (and pretty much every other example):
URL url = new URL(params[0]);
HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
try {
    urlConnection.setDoOutput(true);
    urlConnection.setChunkedStreamingMode(0);
    OutputStream out = new BufferedOutputStream(urlConnection.getOutputStream());
    writeStream(out);
    InputStream in = new BufferedInputStream(urlConnection.getInputStream());
    readStream(in);
} finally {
    urlConnection.disconnect();
}
How do I implement writeStream?
Many other examples with POST put the parameters in the URL (a=1&b=2&c=3...), but then I might as well use GET. I don't want to place the parameters in the URL, because that increases the chance of sensitive information being logged on the server side.
Chrome POSTs data as such (body):
------WebKitFormBoundaryyr0AtYZxcOCCp7hA
Content-Disposition: form-data; name="parameterNameHere"
valueHere
------WebKitFormBoundaryyr0AtYZxcOCCp7hA--
Does the Android framework support this?
If not, are there any good libraries?
EDIT:
This is not a duplicate of the suggested question. The suggested question in no way answers this one, because it does not show how to POST with parameters, which is what this question is about.
There are many libraries that would help you achieve this. One of the libraries I use the most is OkHttp. Include this library in your Gradle build and check the post from 'mauker' for an example of how to POST:
How to use OKHTTP to make a post request?
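If you would rather stay with HttpURLConnection, the writeStream method from the question could be sketched roughly as follows. This assumes a form-encoded body (application/x-www-form-urlencoded) rather than the multipart format Chrome sends, so the connection's Content-Type header would need to be set accordingly. FormPost and encodeForm are hypothetical names, and URLEncoder.encode(String, Charset) needs Java 10 or later; on older runtimes the (String, String) overload with "UTF-8" does the same job.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class FormPost {
    // Encodes key/value pairs as key1=value1&key2=value2,
    // percent-encoding both keys and values.
    static byte[] encodeForm(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8));
            body.append('=');
            body.append(URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString().getBytes(StandardCharsets.UTF_8);
    }

    // One possible writeStream: sends the encoded pairs as the request body.
    static void writeStream(OutputStream out, Map<String, String> params) throws IOException {
        out.write(encodeForm(params));
        out.flush();
    }
}
```

Because the pairs travel in the request body, they stay out of the URL and therefore out of typical access logs, which was the concern in the question.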

How to pass input to a web page using an automated script

How can I pass input to a PHP web page using an automated script? That is, I want to know how to pass arguments to text fields from a script, such as filling in the username and password fields of a web page and then pressing the submit button (also from the script).
Preferred language: Java
Try Selenium. Selenium is great at automating web browsers.
http://seleniumhq.org/
It has native Java support, along with bindings for other languages.
When it comes to custom methods, see ...
String urlParameters = "param1=a&param2=b&param3=c";
String request = "http://example.com/index.php";
URL url = new URL(request);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setDoOutput(true);
connection.setDoInput(true);
connection.setInstanceFollowRedirects(false);
connection.setRequestMethod("POST");
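connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=utf-8");
// Encode the body explicitly as UTF-8 (needs java.nio.charset.StandardCharsets and java.io.OutputStream)
byte[] postData = urlParameters.getBytes(StandardCharsets.UTF_8);
// Declares the body length up front, which sets Content-Length
connection.setFixedLengthStreamingMode(postData.length);
connection.setUseCaches(false);
try (OutputStream wr = connection.getOutputStream()) {
    wr.write(postData);
}
// Read the status so the response is actually received before disconnecting
int status = connection.getResponseCode();
connection.disconnect();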
source (Java - sending HTTP parameters via POST method easily)
If your web page uses the GET method to accept data (i.e. from the URL), just connect to the page with the data you want to pass in the query string:
http://www.mysite.com/mypage.html?data0=data0&data1=data1
If the web page uses POST, things get a little more complicated: you have to build an appropriate HTTP request with all your data in the request body (as the POST method requires).
You can use the Apache HTTPClient - see the example at:
http://hc.apache.org/httpclient-3.x/methods/post.html
This allows you to simulate submitting a fully filled form directly to the destination page and grab the results.
Remember that, after the call, you have to grab and store the session cookie from the response and resubmit it with every subsequent page you want to "visit" in order to stay "logged on".
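For the session cookie step, Java's built-in java.net.CookieManager can do the bookkeeping. The following is a minimal sketch, assuming Java 9 or later; SessionCookies and cookiesToSend are hypothetical names, and the ACCEPT_ALL policy is chosen only to keep the illustration simple.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.List;
import java.util.Map;

public class SessionCookies {
    // Stores the Set-Cookie values from a response and returns the
    // Cookie header values that would be sent back to the same URI.
    static List<String> cookiesToSend(URI uri, List<String> setCookieValues) {
        try {
            CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
            // Simulates receiving the response headers of the login request
            manager.put(uri, Map.of("Set-Cookie", setCookieValues));
            // Asks the manager which cookies apply to the next request
            Map<String, List<String>> requestHeaders = manager.get(uri, Map.of());
            return requestHeaders.getOrDefault("Cookie", List.of());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In practice you rarely need to call put and get yourself: installing a manager once with `CookieHandler.setDefault(new CookieManager())` makes HttpURLConnection store and resend cookies automatically.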
I would like to show how I would pass input to the HTML page. I usually use Python to send a request to the page where I need to enter the data. Before doing that, check whether you need to be logged in; if so, copy the session cookie so that you can supply it with the request. Once that is done, you need to know the field names of the input elements, as you will use them to POST or GET data from your script. Here is a sample usage.
# Python 2 (urllib2 was merged into urllib.request in Python 3)
import urllib
import urllib2
headers = {'Cookie': 'Your cookies, if needed'}
values = {'form_name':'sample text', 'submit':''}
data = urllib.urlencode(values)
req = urllib2.Request('website where you making request to',data,headers)
opener1 = urllib2.build_opener()
page1=opener1.open(req)
#OPTIONAL
htmlfile=page1.read()
fout = open('MYHTMLFILE.html', "wb")
fout.write(htmlfile)
fout.close()

Getting HTTP response code in Java

I need to find the HTTP response code of URLs in Java. I know this can be done using the URL and HttpURLConnection APIs, and I have gone through previous questions like this and this.
I need to do this for around 2000 links, so speed is the most important requirement. I have already crawled 150-250 of those pages using crawler4j, but I don't know of a way to get the response code from that library, so I would have to connect to those links again with another library to find it.
In Crawler4J, the class WebCrawler has a method handlePageStatusCode, which is exactly what you are looking for and what you would also have found if you had looked for it. Override it and be happy.
The answer behind your first link contains everything you need:
How to get HTTP response code for a URL in Java?
URL url = new URL("http://google.com");
HttpURLConnection connection = (HttpURLConnection)url.openConnection();
connection.setRequestMethod("GET");
connection.connect();
int code = connection.getResponseCode();
The response code is the HTTP code returned by the server.
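Since speed matters for around 2000 links, one variation worth trying is a HEAD request with explicit timeouts, so that no response body is downloaded and a slow server cannot stall the run. This is only a sketch: StatusCheck and statusOf are hypothetical names, the -1 return value is an arbitrary convention, and some servers do not support HEAD, in which case falling back to GET may be needed.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class StatusCheck {
    // Returns the HTTP status code for the address, or -1 if the
    // URL is malformed or the request fails or times out.
    static int statusOf(String address, int timeoutMillis) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) new URL(address).openConnection();
            connection.setRequestMethod("HEAD");          // headers only, no body
            connection.setConnectTimeout(timeoutMillis);  // limit time to connect
            connection.setReadTimeout(timeoutMillis);     // limit time waiting for the reply
            return connection.getResponseCode();
        } catch (IOException e) {
            return -1;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
```

Calling `StatusCheck.statusOf(link, 5000)` from a fixed-size thread pool would let several links be checked in parallel.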

Sending sms via java

I want to send SMS messages from Java. The problem is that the SMS gateway asks me to send requests in this format:
http://push1.maccesssmspush.com/servlet/com.aclwireless.pushconnectivity.listeners.TextListener?userId=xxxxx&pass=xxxx&appid=xxxx&subappid=xxxx&msgtype=1&contenttype=1&selfid=true&to=9810790590,9810549717&from=ACL&dlrreq=true&text=This+is+a+test+msg+from+ACL&alert=
How do I call this from a Java application? Is it possible, or does it need special libraries? Will HttpURLConnection do the job? Thank you.
Below is some sample code I have written. Is it correct?
URL sendSms1 = new URL("http://push1.maccesssmspush.com/servlet/com.aclwireless.pushconnectivity.listeners.TextListener?userId=xxxxx&pass=xxxx&appid=xxxx&subappid=xxxx&msgtype=1&contenttype=1&selfid=true&to=9810790590,9810549717&from=ACL&dlrreq=true&text=This+is+a+test+msg+from+ACL&alert=");
URLConnection smsConn1 = sendSms1.openConnection();
It's just an HTTP call, you don't need anything special in Java (or any modern language, I expect). Just build up the string as appropriate*, then make an HTTP request to that URL.
Take a peek at the Sun tutorial Reading from and Writing to a URLConnection if you need to pick up the basics of how to do the request part in Java. This uses the built-in classes. I'm sure there are dozens of libraries that handle connections in funky and/or convenient ways too, so by all means use one of those if you're familiar with it.
*One potential gotcha which might not have occurred to you - your query string arguments will have to be URL-encoded. So the + characters for example in the text parameter, are encoded spaces (which would have a different meaning in the URL). Likewise, if you wanted to send a ? character in one of your parameters, it would have to appear as %3F. Have a look at the accepted answer to HTTP URL Address Encoding in Java for an example of how you might build the URL string safely.
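The encoding gotcha above can be handled by building the query string with java.net.URLEncoder instead of concatenating raw values. A minimal sketch, assuming Java 10 or later; QueryStringBuilder, buildUrl, and the example.com address in the usage note are hypothetical, not part of the gateway's API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

public class QueryStringBuilder {
    // Appends the parameters to baseUrl as ?name=value&name=value,
    // URL-encoding every name and value (space becomes +, ? becomes %3F).
    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            query.add(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }
}
```

For example, passing a LinkedHashMap with `text` set to `Is this a test?` and the base `http://example.com/send` yields `http://example.com/send?text=Is+this+a+test%3F`. Note that URLEncoder also encodes commas as %2C, so check whether the gateway expects the `to` list encoded or literal.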
It looks like a simple GET request. You can use the Apache HttpClient library to execute such a request. Have a look at the tutorial by Vogella here: http://www.vogella.de/articles/ApacheHttpClient/article.html for sample source code and explanations.
You can try the java.net.URL class, like this:
// First build urlString, for example "http://push1.maccesssmspush.com/servlet/com.aclwireless.pushconnectivity.listeners.TextListener?userId=xxxxx&pass=xxxx&appid=xxxx&subappid=xxxx&msgtype=1&contenttype=1&selfid=true&to=9810790590,9810549717&from=ACL&dlrreq=true&text=This+is+a+test+msg+from+ACL&alert="
URL url = new URL(urlString);
// send sms
URLConnection urlConnection = url.openConnection();// open the url
// and you, also can get the feedback if you want
BufferedReader br = new BufferedReader(new InputStreamReader(
urlConnection.getInputStream()));
URL url = new URL("http://smscountry.com/SMSCwebservice.asp");
HttpURLConnection urlconnection = (HttpURLConnection) url.openConnection();
urlconnection.setRequestMethod("POST");
urlconnection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
urlconnection.setDoOutput(true);
// postData is the URL-encoded request body, built beforehand
OutputStreamWriter out = new OutputStreamWriter(urlconnection.getOutputStream());
out.write(postData);
out.close();
BufferedReader in = new BufferedReader(new InputStreamReader(urlconnection.getInputStream()));
// retval is a String declared earlier that collects the gateway's response
String decodedString;
while ((decodedString = in.readLine()) != null) {
    retval += decodedString;
}
in.close();

Alternative to java.net.URL for custom timeout setting

I need to set a timeout for remote data requests made using the java.net.URL class. After some googling, I found that there are two system properties that can be used to set timeouts for the URL class:
sun.net.client.defaultConnectTimeout
sun.net.client.defaultReadTimeout
I don't have control over all the systems and don't want everybody to have to set the system properties. Is there any other way of making remote requests that lets me set timeouts?
A solution available in Java itself, without any extra library, is preferable.
If you're opening a URLConnection from URL you can set the timeouts this way:
URL url = new URL(urlPath);
URLConnection con = url.openConnection();
con.setConnectTimeout(connectTimeout);
con.setReadTimeout(readTimeout);
InputStream in = con.getInputStream();
How are you using the URL or what are you passing it to?
A common replacement is Apache Commons HttpClient, which gives much more control over the entire process of fetching HTTP URLs.
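For the "without any library" preference, Java 11 and later also ship java.net.http.HttpClient, which takes timeouts as part of its builders rather than through system properties. A minimal sketch, with TimeoutConfig and its method names being hypothetical:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class TimeoutConfig {
    // Client-wide limit on how long establishing a connection may take.
    static HttpClient clientWithConnectTimeout(Duration connectTimeout) {
        return HttpClient.newBuilder()
                .connectTimeout(connectTimeout)
                .build();
    }

    // Per-request limit on how long to wait for the response.
    static HttpRequest getWithTimeout(String address, Duration responseTimeout) {
        return HttpRequest.newBuilder(URI.create(address))
                .timeout(responseTimeout)
                .GET()
                .build();
    }
}
```

Sending the request with `client.send(request, HttpResponse.BodyHandlers.ofString())` then throws an HttpTimeoutException if the response does not arrive within the configured time.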
