Java - Read page source from URL does not work

I am using the code below to read the page source from a URL. It works for almost all URLs, but for this URL it just returns the URL itself.
public static String getURLSource(String url) throws IOException
{
URL urlObject = new URL(url);
URLConnection urlConnection = urlObject.openConnection();
//urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
return toString(urlConnection.getInputStream());
}
private static String toString(InputStream inputStream) throws IOException
{
try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8")))
{
String inputLine;
StringBuilder stringBuilder = new StringBuilder();
while ((inputLine = bufferedReader.readLine()) != null)
{
stringBuilder.append(inputLine);
}
return stringBuilder.toString();
}
}
What is the problem and how can I modify the code to work properly? Thanks.

You must use an HttpsURLConnection, since the URL uses https.
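Note that `URL.openConnection()` already returns an `HttpsURLConnection` for an `https://` URL, so a cast on its own may not change the result. One possible cause of a stub response is a redirect from http to https, which `HttpURLConnection` does not follow automatically. The sketch below follows such redirects by hand. The class name `RedirectingFetcher`, the hop limit of 5 and the UTF-8 decoding are assumptions for illustration, not part of the original post.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: follows redirects manually, including http -> https.
final class RedirectingFetcher {
    private static final int MAX_HOPS = 5; // assumed limit, not from the post

    // True for the status codes that carry a Location header to follow.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolves an absolute or relative Location value against the current URL.
    static URL resolve(URL base, String location) throws IOException {
        return new URL(base, location);
    }

    static String fetch(String address) throws IOException {
        URL url = new URL(address);
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setInstanceFollowRedirects(false); // we handle redirects ourselves
            int status = con.getResponseCode();
            if (isRedirect(status)) {
                String location = con.getHeaderField("Location");
                con.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without Location header");
                }
                url = resolve(url, location);
                continue;
            }
            try (InputStream in = con.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                // Assumes the page is UTF-8 encoded.
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
        throw new IOException("Too many redirects");
    }
}
```

Calling `RedirectingFetcher.fetch(url)` in place of `getURLSource(url)` would show whether a redirect was the cause.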

Related

How to get error 403 internet explorer subcodes

I've this code in Java 8
String u = "https://test.com?WSDL";
URL url = new URL(u);
System.out.println("Link: " + u);
SSLContext sc = SSLContext.getInstance("TLSv1");
sc.init(null, null, new java.security.SecureRandom());
HttpsURLConnection con = (HttpsURLConnection) url.openConnection();
con.setSSLSocketFactory(sc.getSocketFactory());
System.out.println("After url connection");
con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
con.connect();
BufferedReader br = new BufferedReader(new InputStreamReader(con.getInputStream(), Charset.forName("UTF-8")));
String input;
while ((input = br.readLine()) != null) {
System.out.println(input);
}
br.close();
And I get error 403 Forbidden when I connect to an IIS web service. How can I get the Internet Explorer subcode of the error (e.g. 403.1 Execute access forbidden, 403.2 Read access forbidden)?
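As far as I know, the sub-status (403.1, 403.2, ...) is not part of the HTTP status line, so `getResponseCode()` reports only 403. It may appear in the error page body if the server is configured to send detailed errors. A minimal sketch, assuming the body contains text such as "HTTP Error 403.14"; the class `IisSubStatus` and its regular expression are illustrative and not an IIS or JDK API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting the body of an HTTP error response.
final class IisSubStatus {
    // Assumed format: "403." followed by one to three digits.
    private static final Pattern SUB_STATUS =
            Pattern.compile("\\b(403\\.\\d{1,3})\\b");

    // Reads the error body; getInputStream() would throw on a 403.
    static String readErrorBody(HttpURLConnection con) throws IOException {
        InputStream err = con.getErrorStream();
        if (err == null) {
            return ""; // no error, not connected, or no body sent
        }
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(err, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    // Returns the first sub-status found in the body, or null if none.
    static String findSubStatus(String body) {
        Matcher m = SUB_STATUS.matcher(body);
        return m.find() ? m.group(1) : null;
    }
}
```

Call `con.getResponseCode()` first, then pass `readErrorBody(con)` to `findSubStatus`. If the server sends only a generic error page, the sub-status is probably available only in the server's own logs.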

open the url link for any protocol (not only http and https)

public class Html {
public static List<String> extractLinks(String url) throws IOException{
Document doc = (Document) Jsoup.connect(url).get();
Elements links = doc.select("a[href]");
for(Element link : links)
{
System.out.println(" Link : "+link.attr("abs:href"));
Document doc1 = Jsoup.connect(link.attr("abs:href")).get();
String title = doc1.title();
if(doc1 != null)
{
System.out.println(" Title :"+title);
System.out.println("\n");
}
else
{
System.out.println("Not found");
}
}
return null;
}
public static void main(String[] args) throws IOException {
try
{
String site = "http://english.whut.edu.cn/";
Html.extractLinks(site);
}catch(Exception e)
{
System.out.println(e);
}
}
}
This code can open and read the title only for the http and https protocols, but I need to open and read other protocols too. Is there any specific method for that?
Maybe this can help:
urlSource = getURLSource("YourURL");
public static String getURLSource(String url) throws IOException{
URL urlObject = new URL(url);
URLConnection urlConnection = urlObject.openConnection();
urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
return toString(urlConnection.getInputStream());
}
public static String toString(InputStream inputStream) throws IOException{
try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"))){
String inputLine;
StringBuilder stringBuilder = new StringBuilder();
while ((inputLine = bufferedReader.readLine()) != null){
stringBuilder.append(inputLine);
}
return stringBuilder.toString();
}
}
With this function you get the source of the page at the given URL as a String.
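Since `Jsoup.connect` accepts only http and https, one option is to check each link's scheme first and route the rest elsewhere (`java.net.URL` typically also handles schemes such as file and jar, while mailto: or javascript: links have no page to read). A small sketch; the class `LinkSchemes` and its method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper for classifying links before trying to open them.
final class LinkSchemes {
    // Returns the lower-cased scheme, or "" for relative or malformed links.
    static String schemeOf(String link) {
        try {
            String scheme = new URI(link.trim()).getScheme();
            return scheme == null ? "" : scheme.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return "";
        }
    }

    // True when the link can be handed to an http(s)-only client such as Jsoup.
    static boolean isHttp(String link) {
        String scheme = schemeOf(link);
        return scheme.equals("http") || scheme.equals("https");
    }
}
```

In the loop over `links`, `LinkSchemes.isHttp(link.attr("abs:href"))` could guard the `Jsoup.connect` call, with other schemes passed to `getURLSource` or skipped.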

HttpUrlConnection to get content from aspx?

So, I want to get data from an aspx page, and I need to do the following steps:
1) Send a GET request and get __VIEWSTATE and __EVENTVALIDATION.
2) Send a POST request with parameters (to log in), then update __VIEWSTATE and __EVENTVALIDATION.
3) Send a POST request with parameters (to choose a field), then update the variables again.
4) Send a POST request with parameters (to press a button).
5) Parse the page content.
The page content is loaded dynamically by a script.
I tried to use HttpURLConnection:
public String sendGet(String url) throws Exception {
StringBuilder result = new StringBuilder();
URL ur = new URL(url);
HttpURLConnection connection = (HttpURLConnection) ur.openConnection();
connection.setRequestMethod("GET");
connection.setUseCaches(false);
connection.setRequestProperty("User-Agent","Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36");
connection.setRequestProperty("Accept",
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
connection.setRequestProperty("Accept-Language", "en-US,en;q=0.5");
BufferedReader br = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String inputLine;
while ((inputLine = br.readLine()) != null) {
result.append(inputLine);
}
br.close();
return result.toString();
}
public String sendPost(String url, List<NameValuePair> formParams) throws Exception {
URL ur = new URL(url);
HttpURLConnection connection = (HttpURLConnection) ur.openConnection();
connection.setReadTimeout(10000);
connection.setDoOutput(true);
connection.setDoInput(true);
connection.setConnectTimeout(10000);
connection.setRequestMethod("POST");
for (HttpCookie cookie : this.cookies) {
connection.addRequestProperty("Cookie", cookie.toString());
}
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36");
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
connection.setRequestProperty("Content-Language", "en-US");
connection.connect();
OutputStream os = connection.getOutputStream();
BufferedWriter writer = new BufferedWriter(
new OutputStreamWriter(os, "UTF-8"));
writer.write(getQuery(formParams));
writer.flush();
writer.close();
os.close();
BufferedReader br = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String response = "";
String line;
while ((line = br.readLine()) != null) {
response += line;
}
System.out.println(Jsoup.parse(response).text());
updateViewState(response);
updateSubSession(response);
return response;
}
And I am stuck at the second step. The subSession differs between the POST and the GET, although the Cookie is the same. So EVENTVALIDATION and VIEWSTATE are invalid, and I get the "login.aspx" page again. This is how I get the Cookie:
List<HttpCookie> cookies;
public void init() throws Exception {
CookieManager manager = new CookieManager();
manager.setCookiePolicy(CookiePolicy.ACCEPT_ALL);
CookieHandler.setDefault(manager);
String html = sendGet(url);
updateViewState(html);
updateSubSession(html);
CookieStore cookieJar = manager.getCookieStore();
cookies=cookieJar.getCookies();
}
So, I tried to send the POST with subSession:
public String sendPost(String url, List<NameValuePair> formParams) throws Exception {
URL ur = new URL(url+subSession);
HttpURLConnection connection = (HttpURLConnection) ur.openConnection();
And...
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 500 for URL: http://www.aogc2.state.ar.us:8080/DWClient/Login.aspx?DWSubSession=9852&v=1589
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1840)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
at com.company.UrlConnection.sendPost(UrlConnection.java:111)
at com.company.UrlConnection.login(UrlConnection.java:138)
at com.company.UrlConnection.start(UrlConnection.java:35)
at com.company.Main.main(Main.java:11)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
I thought that I was getting the cookie the wrong way, so I tried to get it like this:
connection.getHeaderFields().get("Set-Cookie");
but the cookie was null.
Where did I go wrong? Thanks a lot!
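The post does not show `updateViewState`, so here is one way step 1 could be sketched: pulling the hidden fields out of the returned HTML. It assumes the usual ASP.NET rendering where `name` comes before `value` inside the `<input>` tag. The class `AspNetForm` is hypothetical, and a real HTML parser such as Jsoup would be more robust than this regular expression.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading ASP.NET hidden form fields from HTML.
final class AspNetForm {
    // Returns the value attribute of the named input, or null if absent.
    // Assumes name="..." appears before value="..." in the tag.
    static String hiddenField(String html, String name) {
        Pattern p = Pattern.compile(
                "<input[^>]*name=\"" + Pattern.quote(name) + "\"[^>]*value=\"([^\"]*)\"",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Collects the two fields that must be echoed back on the next POST.
    static Map<String, String> stateFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : new String[] {"__VIEWSTATE", "__EVENTVALIDATION"}) {
            String value = hiddenField(html, name);
            if (value != null) {
                fields.put(name, value);
            }
        }
        return fields;
    }
}
```

These values contain characters such as `+`, `/` and `=`, so they need URL-encoding in the POST body. Also, once `CookieHandler.setDefault(manager)` has been called, `HttpURLConnection` sends the stored cookies by itself, so adding a `Cookie` header by hand may be unnecessary.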

Java 400 error but opens in browser

I am making a Facebook application, and for it I need to make an FQL query to get the friend list. I use the code below:
String url = "https://graph.facebook.com/fql?q=SELECT uid, name, pic_square FROM user WHERE uid = me() OR uid IN (SELECT uid2 FROM friend WHERE uid1 = me())&access_token="+access;
url = url.replace(" ", "%20");
URL obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
// optional default is GET
con.setRequestMethod("POST");
//add request header
String USER_AGENT = "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36";
con.setRequestProperty("User-Agent", USER_AGENT);
int responseCode = con.getResponseCode();
System.out.println(responseCode);
BufferedReader in = new BufferedReader(
new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
But I always seem to get a 400 error. I have tried replacing spaces with %20, but to no effect.
When I open the link in the browser, it opens fine.
I feel ashamed. The problem was the request method line, which should be GET instead of POST.
String url = "https://graph.facebook.com/fql?q=SELECT uid, name, pic_square FROM user WHERE uid = me() OR uid IN (SELECT uid2 FROM friend WHERE uid1 = me())&access_token="+access;
url = url.replace(" ", "%20");
URL obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
// optional default is GET
con.setRequestMethod("GET");
//add request header
String USER_AGENT = "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36";
con.setRequestProperty("User-Agent", USER_AGENT);
int responseCode = con.getResponseCode();
System.out.println(responseCode);
BufferedReader in = new BufferedReader(
new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
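Separately from the GET/POST fix, replacing only spaces leaves characters such as `=`, `(`, `)` and `,` unescaped in the query value. A hedged sketch of encoding the whole parameter with `URLEncoder` instead; the class `FqlUrl` and the method `build` are made-up names, and the endpoint is the one from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds the request URL with encoded parameters.
final class FqlUrl {
    private static final String ENDPOINT = "https://graph.facebook.com/fql";

    // Form-encodes a single parameter value (spaces become '+').
    static String encode(String value) throws UnsupportedEncodingException {
        return URLEncoder.encode(value, "UTF-8");
    }

    // Joins the encoded query and token onto the endpoint.
    static String build(String fql, String accessToken)
            throws UnsupportedEncodingException {
        return ENDPOINT + "?q=" + encode(fql)
                + "&access_token=" + encode(accessToken);
    }
}
```

The result of `FqlUrl.build(query, access)` can be passed to `new URL(...)` in place of the hand-assembled string.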

How to get a web page's source code from Java [duplicate]

This question already has answers here:
How do you Programmatically Download a Webpage in Java
(11 answers)
Closed 7 years ago.
I just want to retrieve any web page's source code from Java. I found lots of solutions so far, but I couldn't find any code that works for all the links below:
http://www.cumhuriyet.com.tr?hn=298710
http://www.fotomac.com.tr/Yazarlar/Olcay%20%C3%87ak%C4%B1r/2011/11/23/hesap-makinesi
http://www.sabah.com.tr/Gundem/2011/12/23/basbakan-konferansta-konusuyor#
The main problem for me is that some code retrieves the web page source, but with parts missing. For example, the code below does not work for the first link.
InputStream is = fURL.openStream(); //fURL can be one of the links above
BufferedReader buffer = null;
buffer = new BufferedReader(new InputStreamReader(is, "iso-8859-9"));
int byteRead;
while ((byteRead = buffer.read()) != -1) {
builder.append((char) byteRead);
}
buffer.close();
System.out.println(builder.toString());
Try the following code with an added request property:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
public class SocketConnection
{
public static String getURLSource(String url) throws IOException
{
URL urlObject = new URL(url);
URLConnection urlConnection = urlObject.openConnection();
urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
return toString(urlConnection.getInputStream());
}
private static String toString(InputStream inputStream) throws IOException
{
try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8")))
{
String inputLine;
StringBuilder stringBuilder = new StringBuilder();
while ((inputLine = bufferedReader.readLine()) != null)
{
stringBuilder.append(inputLine);
}
return stringBuilder.toString();
}
}
}
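The code above always decodes as UTF-8, while the question reads the pages as iso-8859-9. One way to avoid hard-coding either is to take the charset from the `Content-Type` response header when the server supplies one. A minimal sketch; `ContentTypeCharset` and `charsetOf` are hypothetical names, and pages that declare their charset only in a `<meta>` tag are not covered.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that picks a charset from a Content-Type header value.
final class ContentTypeCharset {
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Returns the declared charset, or the fallback if missing or unknown.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return fallback;
        }
    }
}
```

With this, `toString` could build its reader as `new InputStreamReader(inputStream, ContentTypeCharset.charsetOf(urlConnection.getContentType(), StandardCharsets.UTF_8))`, provided the connection is passed in alongside the stream.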
URL yahoo = new URL("http://www.yahoo.com/");
BufferedReader in = new BufferedReader(
new InputStreamReader(
yahoo.openStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();
I am sure that you have found a solution somewhere over the past 2 years but the following is a solution that works for your requested site
package javasandbox;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
/**
*
* @author Ryan.Oglesby
*/
public class JavaSandbox {
private static String sURL;
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws MalformedURLException, IOException {
sURL = "http://www.cumhuriyet.com.tr/?hn=298710";
System.out.println(sURL);
URL url = new URL(sURL);
HttpURLConnection httpCon = (HttpURLConnection) url.openConnection();
//set http request headers
httpCon.addRequestProperty("Host", "www.cumhuriyet.com.tr");
httpCon.addRequestProperty("Connection", "keep-alive");
httpCon.addRequestProperty("Cache-Control", "max-age=0");
httpCon.addRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
httpCon.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36");
httpCon.addRequestProperty("Accept-Encoding", "gzip,deflate,sdch");
httpCon.addRequestProperty("Accept-Language", "en-US,en;q=0.8");
//httpCon.addRequestProperty("Cookie", "JSESSIONID=EC0F373FCC023CD3B8B9C1E2E2F7606C; lang=tr; __utma=169322547.1217782332.1386173665.1386173665.1386173665.1; __utmb=169322547.1.10.1386173665; __utmc=169322547; __utmz=169322547.1386173665.1.1.utmcsr=stackoverflow.com|utmccn=(referral)|utmcmd=referral|utmcct=/questions/8616781/how-to-get-a-web-pages-source-code-from-java; __gads=ID=3ab4e50d8713e391:T=1386173664:S=ALNI_Mb8N_wW0xS_wRa68vhR0gTRl8MwFA; scrElm=body");
HttpURLConnection.setFollowRedirects(false);
httpCon.setInstanceFollowRedirects(false);
httpCon.setDoOutput(true);
httpCon.setUseCaches(true);
httpCon.setRequestMethod("GET");
BufferedReader in = new BufferedReader(new InputStreamReader(httpCon.getInputStream(), "UTF-8"));
String inputLine;
StringBuilder a = new StringBuilder();
while ((inputLine = in.readLine()) != null)
a.append(inputLine);
in.close();
System.out.println(a.toString());
httpCon.disconnect();
}
}
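One caveat with the headers above: the request advertises `Accept-Encoding: gzip`, and `HttpURLConnection` does not decompress the response by itself, so a server that honours the header would return compressed bytes that print as garbage. A hedged sketch of unwrapping the stream based on the `Content-Encoding` response header; the class `GzipBody` and its methods are assumptions, and only gzip is handled here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

// Hypothetical helper that transparently decompresses gzip response bodies.
final class GzipBody {
    // Wraps the stream when the server reports gzip, otherwise returns it as is.
    static InputStream unwrap(InputStream raw, String contentEncoding)
            throws IOException {
        if (contentEncoding != null
                && contentEncoding.toLowerCase(Locale.ROOT).contains("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads the whole stream and decodes it with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }
}
```

In the `main` above this would be used as `GzipBody.readAll(GzipBody.unwrap(httpCon.getInputStream(), httpCon.getContentEncoding()), StandardCharsets.UTF_8)`. Dropping the `Accept-Encoding` header is the simpler alternative.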
