I'm trying to read a webpage.
In a browser it just looks like this:
<b>Failure</b>
<b>Success</b>
But when I read it with my application, it gives me this:
http://pastebin.com/vJ6GDWpx
This is my code:
URL url = new URL("http://example.com/auth.php?username=" + username + "&password=" + password);
URLConnection urlconnection = url.openConnection();
urlconnection.setConnectTimeout(10000);
urlconnection.setReadTimeout(10000);
urlconnection.addRequestProperty("Host", "example.com");
urlconnection.addRequestProperty("Connection", "keep-alive");
urlconnection.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20120716 Firefox/15.0a2");
urlconnection.addRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
urlconnection.addRequestProperty("Accept-Language", "en-US,en;q=0.8");
urlconnection.addRequestProperty("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
BufferedReader br = new BufferedReader(new InputStreamReader(urlconnection.getInputStream()));
String result;
while ((result = br.readLine()) != null) {
System.out.println(result);
}
br.close();
How can I solve this problem?
It works with HtmlUnit, but that library is very big.
Is there a smaller solution?
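If the raw response only looks like garbage (I can't see the pastebin dump, so this is a guess), one common cause is that the server compressed the body. Instead of pulling in HtmlUnit, you can check the Content-Encoding header yourself and wrap the stream in a GZIPInputStream; the helper below is a minimal sketch along those lines:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class GzipAwareReader {

    // Wraps the stream in a GZIPInputStream when the response declares gzip encoding.
    static InputStream decode(InputStream in, String contentEncoding) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            return new GZIPInputStream(in);
        }
        return in;
    }

    // Reads the (possibly compressed) body of an already-open connection as text.
    static String readBody(URLConnection connection) throws IOException {
        InputStream in = decode(connection.getInputStream(), connection.getContentEncoding());
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Call readBody(urlconnection) in place of the manual BufferedReader loop; if the bytes are still unreadable, the cause is something else (a charset mismatch, for example).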
Related
I am currently trying to download the HTML of a website, but I ran into a problem where one website constantly gives me a 403 back. I've had that error in previous projects and was always able to fix it by adding a User-Agent, but this time nothing I tried helped. I even copied every single header from my browser, yet I still get a 403 in Java, while it works perfectly with wget and other programming languages. Maybe someone here can help me?
The URL I'm trying to download is: here
I'm using the following code (I copied the headers 1:1 from my request in Firefox):
if (file.exists()) {
Files.delete(file.toPath());
}
HttpURLConnection httpcon = (HttpURLConnection) url.openConnection();
httpcon.setRequestMethod("GET");
httpcon.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8");
httpcon.setRequestProperty("Accept-Encoding", "gzip, deflate, br");
httpcon.setRequestProperty("Accept-Language", "en-GB,en;q=0.5");
httpcon.setRequestProperty("Cache-Control", "max-age=0");
httpcon.setRequestProperty("Connection", "keep-alive");
httpcon.setRequestProperty("Host", "www.mediamarkt.de");
httpcon.setRequestProperty("Sec-Fetch-Dest", "document");
httpcon.setRequestProperty("Sec-Fetch-Mode", "navigate");
httpcon.setRequestProperty("Sec-Fetch-Site", "none");
httpcon.setRequestProperty("Sec-Fetch-User", "?1");
httpcon.setRequestProperty("TE", "trailers");
httpcon.setRequestProperty("Upgrade-Insecure-Requests", "1");
httpcon.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0");
InputStream is = httpcon.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
BufferedWriter bwr = new BufferedWriter(new FileWriter(file, true));
String line;
while ((line = br.readLine()) != null) {
bwr.write(line);
bwr.newLine(); // readLine() strips the line terminator, so restore it
}
is.close();
br.close();
bwr.close();
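One detail that may matter here (though it might not be the whole story, since some sites also block non-browser clients by TLS fingerprint): HttpURLConnection treats Host and Connection as restricted headers and silently ignores setRequestProperty for them unless the sun.net.http.allowRestrictedHeaders system property is set, so the request on the wire is not identical to the Firefox one. Below is a sketch of the same request with the newer java.net.http.HttpClient (Java 11+), which manages those headers itself:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ClientFetch {

    // Builds a GET request carrying browser-like headers; Host and Connection
    // are managed by the client and cannot be set manually.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent",
                        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0")
                .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
                .header("Accept-Language", "en-GB,en;q=0.5")
                .GET()
                .build();
    }

    // Sends the request and returns the response body as a string.
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

If this still returns 403, the block is almost certainly server-side bot detection rather than anything fixable in the headers.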
When I try to run this code
URL url = new URL("https://www.amazon.com");
BufferedReader bufr = new BufferedReader(new InputStreamReader(url.openStream()));
String data;
while ((data=bufr.readLine())!=null)
System.out.println(data);
It says: java.io.IOException: Server returned HTTP response code: 503 for URL: https://www.amazon.com
How can I search for a word in the Amazon page?
I read a couple of links and learned that a User-Agent value needs to be added to fix the 503 error. Below is the sample code.
URL url = new URL("https://www.amazon.com");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("User-Agent",
"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
BufferedReader bufr = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String data;
while ((data = bufr.readLine()) != null)
System.out.println(data);
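To then search for a word in the page, a small helper can scan the response line by line (the helper name and shape here are my own, not from the original post):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class PageSearch {

    // Returns true if any line read from the source contains the given word.
    static boolean containsWord(Reader source, String word) throws IOException {
        BufferedReader br = new BufferedReader(source);
        String line;
        while ((line = br.readLine()) != null) {
            if (line.contains(word)) {
                return true;
            }
        }
        return false;
    }
}
```

Pass new InputStreamReader(conn.getInputStream()) as the source to scan the live page.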
My code looks like this:
URL url = new URL("https://nominatim.openstreetmap.org/reverse?format=json&lat=44.400000&lon=26.088492&zoom=18&addressdetails=1");
HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("User-Agent", "Mozilla/5.0");
connection.setRequestProperty("Accept-Language","en-US");
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuilder json = new StringBuilder(1024);
String tmp;
while ((tmp = reader.readLine()) != null) json.append(tmp).append("\n");
reader.close();
JSONObject data = new JSONObject(json.toString());
However, I am getting a java.io.FileNotFoundException at the BufferedReader line. The address is correct, and any browser displays the JSON result. I need to get a human-readable address from lat and lon, also known as reverse geocoding. I have tried many things but nothing worked, so I would be very thankful if you could tell me what I am doing wrong. If possible, I would prefer not to use any external library.
I wrote this code block and found a solution. Take a look at the parameters of the setRequestProperty method:
String response = null;
try {
URL url = new URL("https://nominatim.openstreetmap.org/reverse?format=json&lat=44.400000&lon=26.088492&zoom=18&addressdetails=1");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.connect();
connection.getResponseCode(); //if you want to check response code
InputStream stream = connection.getErrorStream(); // non-null only for error responses
if (stream == null) {
stream = connection.getInputStream();
}
BufferedReader r = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
StringBuilder sb = new StringBuilder();
String line;
while ((line = r.readLine()) != null) {
sb.append(line);
}
System.out.println(sb.toString());
} catch (Exception e) {
e.printStackTrace();
}
In fact the problem seems to be gone for now. The only things I changed are addRequestProperty instead of setRequestProperty and the User-Agent string, though I don't think the latter is so important. I was not familiar with the difference between the two: setRequestProperty overwrites any existing value for that header, while addRequestProperty appends another value alongside it. That difference seems to matter in this case.
URL url = new URL("https://nominatim.openstreetmap.org/reverse?format=json&lat=44.400000&lon=26.088492&zoom=18&addressdetails=1");
HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
connection.setRequestMethod("GET"); // POST or GET both work here
connection.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0");
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuilder json = new StringBuilder(1024);
String tmp;
while ((tmp = reader.readLine()) != null) json.append(tmp).append("\n");
reader.close();
JSONObject data = new JSONObject(json.toString());
Thank you all for your answers, problem is solved!
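For what it's worth, the difference between the two calls can be shown without any network traffic, since openConnection() does not actually connect: setRequestProperty replaces any existing value for a header, while addRequestProperty appends another value. A minimal demonstration (the X-Demo header name is arbitrary):

```java
import java.net.URL;
import java.net.URLConnection;

public class HeaderDemo {
    public static void main(String[] args) throws Exception {
        // setRequestProperty: the second call replaces the first value.
        URLConnection set = new URL("http://example.com/").openConnection();
        set.setRequestProperty("X-Demo", "a");
        set.setRequestProperty("X-Demo", "b");
        System.out.println(set.getRequestProperties().get("X-Demo").size()); // 1

        // addRequestProperty: the second call keeps both values.
        URLConnection add = new URL("http://example.com/").openConnection();
        add.addRequestProperty("X-Demo", "a");
        add.addRequestProperty("X-Demo", "b");
        System.out.println(add.getRequestProperties().get("X-Demo").size()); // 2
    }
}
```

Note that Nominatim's usage policy also requires a User-Agent that identifies your application, so an identifying string of your own is a safer choice than a copied browser string.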
Please help. I'm trying to get the data from this Google Translate API URL, and it works only if the value is one word; if it's two, it gives me an error.
I mean, these values will work:
String sourceLang = "auto";
String targetLang = "en";
String sourceText = "olas";
String urlstring = "https://translate.googleapis.com/translate_a/single?client=gtx&sl=" + sourceLang + "&tl=" + targetLang + "&dt=t&q=" + sourceText;
but if I put in two words:
String sourceText = "olas olas";
it gives me a FileNotFoundException error.
This is the code:
URL url = new URL(urlstring);
HttpURLConnection httpURLconnection = (HttpURLConnection) url.openConnection();
httpURLconnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36");
InputStream inputStream = httpURLconnection.getInputStream();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
String line;
while ((line = bufferedReader.readLine()) != null) {
data = data + line;
}
Replace each space with "%20", like this:
urlstring=urlstring.replace(" ", "%20");
URL url = new URL(urlstring);
HttpURLConnection httpURLconnection = (HttpURLConnection) url.openConnection();
httpURLconnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36");
InputStream inputStream = httpURLconnection.getInputStream();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
String line;
while ((line = bufferedReader.readLine()) != null) {
data = data + line;
}
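Replacing spaces by hand works, but only for spaces; characters such as & or # in the text would still break the URL. The standard-library route is to encode the query value with URLEncoder before building the URL (it encodes a space as +, which is equally valid in a query string):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryBuilder {

    // Builds the translate URL with a properly encoded query value.
    static String buildUrl(String sourceLang, String targetLang, String sourceText) {
        String encoded = URLEncoder.encode(sourceText, StandardCharsets.UTF_8);
        return "https://translate.googleapis.com/translate_a/single?client=gtx&sl="
                + sourceLang + "&tl=" + targetLang + "&dt=t&q=" + encoded;
    }
}
```

The Charset overload of URLEncoder.encode requires Java 10+; on older versions use URLEncoder.encode(sourceText, "UTF-8") and handle the checked exception.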
I'm trying to read http://www.meuhumor.com.br/ in Java using this:
URL url;
HttpURLConnection connection = null;
try{
url = new URL(targetURL);
connection = (HttpURLConnection)url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
connection.setRequestProperty("Content-Language", "en-US");
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11");
connection.setUseCaches(false);
connection.setDoInput(true);
connection.setDoOutput(true);
DataOutputStream dataout = new DataOutputStream(connection.getOutputStream());
dataout.flush();
dataout.close();
InputStream is = connection.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line;
StringBuffer response = new StringBuffer();
while((line = br.readLine()) != null){
response.append(line);
response.append('\n');
}
br.close();
String html = response.toString();
I can access the website in any browser, but when I try to get the HTML with Java I get java.io.IOException: Server returned HTTP response code: 403 for URL:
Does anyone know a way to get the HTML?
You are most likely getting an HTTP 403 response because your POST request has no body. Your code looks like it is trying to submit a form. If your intention was simply to pull down the page content without submitting a form, try a GET request: remove the Content-Type header, remove connection.setDoOutput(true), and remove the three DataOutputStream lines.
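Applying those changes, a minimal sketch of the GET version could look like this (the readAll helper and class name are my own, not from the question):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;

public class PageFetcher {

    // Collects everything from a reader into one string, preserving line breaks.
    static String readAll(Reader source) throws IOException {
        BufferedReader br = new BufferedReader(source);
        StringBuilder response = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            response.append(line).append('\n');
        }
        return response.toString();
    }

    // Plain GET: no Content-Type header, no setDoOutput, no DataOutputStream.
    static String fetch(String targetURL) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(targetURL).openConnection();
        try {
            connection.setRequestMethod("GET");
            connection.setRequestProperty("User-Agent",
                    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11");
            return readAll(new InputStreamReader(connection.getInputStream()));
        } finally {
            connection.disconnect();
        }
    }
}
```

String html = PageFetcher.fetch("http://www.meuhumor.com.br/"); should then return the page, assuming the server accepts the GET.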