I'm trying to read a webpage.
In a browser it just looks like this:
<b>Failure</b>
<b>Success</b>
But when I read it with my application, it gives me this:
http://pastebin.com/vJ6GDWpx
This is my code:
URL url = new URL("http://example.com/auth.php?username=" + username + "&password=" + password);
URLConnection urlconnection = url.openConnection();
urlconnection.setConnectTimeout(10000);
urlconnection.setReadTimeout(10000);
urlconnection.addRequestProperty("Host", "example.com");
urlconnection.addRequestProperty("Connection", "keep-alive");
urlconnection.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20120716 Firefox/15.0a2");
urlconnection.addRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
urlconnection.addRequestProperty("Accept-Language", "en-US,en;q=0.8");
urlconnection.addRequestProperty("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
BufferedReader br = new BufferedReader(new InputStreamReader(urlconnection.getInputStream()));
String result;
while ((result = br.readLine()) != null) {
System.out.println(result);
}
br.close();
How can I solve this problem?
It works with HtmlUnit, but that library is very big.
Is there a smaller solution?
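If the raw response only looks like garbage (I can't see the pastebin dump, so this is a guess), one common cause is that the server compressed the body. Instead of pulling in HtmlUnit, you can check the Content-Encoding header yourself and wrap the stream in a GZIPInputStream; the helper below is a minimal sketch along those lines:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class GzipAwareReader {

    // Wraps the stream in a GZIPInputStream when the response declares gzip encoding.
    static InputStream decode(InputStream in, String contentEncoding) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            return new GZIPInputStream(in);
        }
        return in;
    }

    // Reads the (possibly compressed) body of an already-open connection as text.
    static String readBody(URLConnection connection) throws IOException {
        InputStream in = decode(connection.getInputStream(), connection.getContentEncoding());
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Call readBody(urlconnection) in place of the manual BufferedReader loop; if the bytes are still unreadable, the cause is something else (a charset mismatch, for example).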
Related
I am currently trying to download the HTML of a website, but I ran into a problem where one website constantly gives me a 403 back. I've had that error in previous projects and was always able to fix it by adding a User-Agent, but this time nothing I tried helped. I even copied every single header from my browser, yet I still get a 403 in Java, while it works perfectly with wget and other programming languages. Maybe someone here can help me?
The URL I'm trying to download is: here
I'm using the following code (I copied the headers 1:1 from my request in Firefox):
if (file.exists()) {
Files.delete(file.toPath());
}
HttpURLConnection httpcon = (HttpURLConnection) url.openConnection();
httpcon.setRequestMethod("GET");
httpcon.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8");
httpcon.setRequestProperty("Accept-Encoding", "gzip, deflate, br");
httpcon.setRequestProperty("Accept-Language", "en-GB,en;q=0.5");
httpcon.setRequestProperty("Cache-Control", "max-age=0");
httpcon.setRequestProperty("Connection", "keep-alive");
httpcon.setRequestProperty("Host", "www.mediamarkt.de");
httpcon.setRequestProperty("Sec-Fetch-Dest", "document");
httpcon.setRequestProperty("Sec-Fetch-Mode", "navigate");
httpcon.setRequestProperty("Sec-Fetch-Site", "none");
httpcon.setRequestProperty("Sec-Fetch-User", "?1");
httpcon.setRequestProperty("TE", "trailers");
httpcon.setRequestProperty("Upgrade-Insecure-Requests", "1");
httpcon.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0");
InputStream is = httpcon.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
BufferedWriter bwr = new BufferedWriter(new FileWriter(file, true));
String line;
while ((line = br.readLine()) != null) {
bwr.write(line);
bwr.newLine(); // readLine() strips the line terminator, so restore it
}
is.close();
br.close();
bwr.close();
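One detail that may matter here (though it might not be the whole story, since some sites also block non-browser clients by TLS fingerprint): HttpURLConnection treats Host and Connection as restricted headers and silently ignores setRequestProperty for them unless the sun.net.http.allowRestrictedHeaders system property is set, so the request on the wire is not identical to the Firefox one. Below is a sketch of the same request with the newer java.net.http.HttpClient (Java 11+), which manages those headers itself:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ClientFetch {

    // Builds a GET request carrying browser-like headers; Host and Connection
    // are managed by the client and cannot be set manually.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent",
                        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0")
                .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
                .header("Accept-Language", "en-GB,en;q=0.5")
                .GET()
                .build();
    }

    // Sends the request and returns the response body as a string.
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

If this still returns 403, the block is almost certainly server-side bot detection rather than anything fixable in the headers.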
When I try to run this code
URL url = new URL("https://www.amazon.com");
BufferedReader bufr = new BufferedReader(new InputStreamReader(url.openStream()));
String data;
while ((data=bufr.readLine())!=null)
System.out.println(data);
It says: java.io.IOException: Server returned HTTP response code: 503 for URL: https://www.amazon.com
How can I search for a word in the Amazon page?
I read a couple of links and learned that a User-Agent value needs to be added to fix the 503 error. Below is the sample code.
URL url = new URL("https://www.amazon.com");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("User-Agent",
"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
BufferedReader bufr = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String data;
while ((data = bufr.readLine()) != null)
System.out.println(data);
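To then search for a word in the page, a small helper can scan the response line by line (the helper name and shape here are my own, not from the original post):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class PageSearch {

    // Returns true if any line read from the source contains the given word.
    static boolean containsWord(Reader source, String word) throws IOException {
        BufferedReader br = new BufferedReader(source);
        String line;
        while ((line = br.readLine()) != null) {
            if (line.contains(word)) {
                return true;
            }
        }
        return false;
    }
}
```

Pass new InputStreamReader(conn.getInputStream()) as the source to scan the live page.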
My code looks like this:
URL url = new URL("https://nominatim.openstreetmap.org/reverse?format=json&lat=44.400000&lon=26.088492&zoom=18&addressdetails=1");
HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("User-Agent", "Mozilla/5.0");
connection.setRequestProperty("Accept-Language","en-US");
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuilder json = new StringBuilder(1024);
String tmp;
while ((tmp = reader.readLine()) != null) json.append(tmp).append("\n");
reader.close();
JSONObject data = new JSONObject(json.toString());
However, I am getting a java.io.FileNotFoundException at the BufferedReader line. The address is correct, and any browser displays the JSON result. I need to get a human-readable address from lat and lon, also known as reverse geocoding. I have tried many things but nothing worked, so I would be very thankful if you could tell me what I am doing wrong. If possible, I would prefer not to use any external library.
I wrote this code block and found a solution. Take a look at the parameters of the setRequestProperty method:
String response = null;
try {
URL url = new URL("https://nominatim.openstreetmap.org/reverse?format=json&lat=44.400000&lon=26.088492&zoom=18&addressdetails=1");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.connect();
connection.getResponseCode(); //if you want to check response code
InputStream stream = connection.getErrorStream(); // non-null only for error responses
if (stream == null) {
stream = connection.getInputStream();
}
BufferedReader r = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
StringBuilder sb = new StringBuilder();
String line;
while ((line = r.readLine()) != null) {
sb.append(line);
}
System.out.println(sb.toString());
} catch (Exception e) {
e.printStackTrace();
}
In fact the problem seems to be gone for now. The only things I changed are addRequestProperty instead of setRequestProperty and the User-Agent string, though I don't think the latter is so important. I was not familiar with the difference between the two: setRequestProperty overwrites any existing value for that header, while addRequestProperty appends another value alongside it. That difference seems to matter in this case.
URL url = new URL("https://nominatim.openstreetmap.org/reverse?format=json&lat=44.400000&lon=26.088492&zoom=18&addressdetails=1");
HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
connection.setRequestMethod("GET"); // POST or GET both work here
connection.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0");
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuilder json = new StringBuilder(1024);
String tmp;
while ((tmp = reader.readLine()) != null) json.append(tmp).append("\n");
reader.close();
JSONObject data = new JSONObject(json.toString());
Thank you all for your answers, problem is solved!
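For what it's worth, the difference between the two calls can be shown without any network traffic, since openConnection() does not actually connect: setRequestProperty replaces any existing value for a header, while addRequestProperty appends another value. A minimal demonstration (the X-Demo header name is arbitrary):

```java
import java.net.URL;
import java.net.URLConnection;

public class HeaderDemo {
    public static void main(String[] args) throws Exception {
        // setRequestProperty: the second call replaces the first value.
        URLConnection set = new URL("http://example.com/").openConnection();
        set.setRequestProperty("X-Demo", "a");
        set.setRequestProperty("X-Demo", "b");
        System.out.println(set.getRequestProperties().get("X-Demo").size()); // 1

        // addRequestProperty: the second call keeps both values.
        URLConnection add = new URL("http://example.com/").openConnection();
        add.addRequestProperty("X-Demo", "a");
        add.addRequestProperty("X-Demo", "b");
        System.out.println(add.getRequestProperties().get("X-Demo").size()); // 2
    }
}
```

Note that Nominatim's usage policy also requires a User-Agent that identifies your application, so an identifying string of your own is a safer choice than a copied browser string.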
Please help. I'm trying to get the data from this Google Translate API URL, and it works only if the value is one word; if it's two, it gives me an error.
I mean, these values will work:
String sourceLang = "auto";
String targetLang = "en";
String sourceText = "olas";
String urlstring = "https://translate.googleapis.com/translate_a/single?client=gtx&sl=" + sourceLang + "&tl=" + targetLang + "&dt=t&q=" + sourceText;
but if I put in two words:
String sourceText = "olas olas";
it gives me a FileNotFoundException error.
This is the code:
URL url = new URL(urlstring);
HttpURLConnection httpURLconnection = (HttpURLConnection) url.openConnection();
httpURLconnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36");
InputStream inputStream = httpURLconnection.getInputStream();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
String line;
while ((line = bufferedReader.readLine()) != null) {
data = data + line;
}
Replace each space with "%20", like this:
urlstring=urlstring.replace(" ", "%20");
URL url = new URL(urlstring);
HttpURLConnection httpURLconnection = (HttpURLConnection) url.openConnection();
httpURLconnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36");
InputStream inputStream = httpURLconnection.getInputStream();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
String line;
while ((line = bufferedReader.readLine()) != null) {
data = data + line;
}
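Replacing spaces by hand works, but only for spaces; characters such as & or # in the text would still break the URL. The standard-library route is to encode the query value with URLEncoder before building the URL (it encodes a space as +, which is equally valid in a query string):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryBuilder {

    // Builds the translate URL with a properly encoded query value.
    static String buildUrl(String sourceLang, String targetLang, String sourceText) {
        String encoded = URLEncoder.encode(sourceText, StandardCharsets.UTF_8);
        return "https://translate.googleapis.com/translate_a/single?client=gtx&sl="
                + sourceLang + "&tl=" + targetLang + "&dt=t&q=" + encoded;
    }
}
```

The Charset overload of URLEncoder.encode requires Java 10+; on older versions use URLEncoder.encode(sourceText, "UTF-8") and handle the checked exception.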
I'm trying to read http://www.meuhumor.com.br/ in Java using this:
URL url;
HttpURLConnection connection = null;
try{
url = new URL(targetURL);
connection = (HttpURLConnection)url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
connection.setRequestProperty("Content-Language", "en-US");
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11");
connection.setUseCaches(false);
connection.setDoInput(true);
connection.setDoOutput(true);
DataOutputStream dataout = new DataOutputStream(connection.getOutputStream());
dataout.flush();
dataout.close();
InputStream is = connection.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line;
StringBuffer response = new StringBuffer();
while((line = br.readLine()) != null){
response.append(line);
response.append('\n');
}
br.close();
String html = response.toString();
I can access the website in any browser, but when I try to get the HTML with Java I get java.io.IOException: Server returned HTTP response code: 403 for URL:
Does anyone know a way to get the HTML?
You are most likely getting an HTTP 403 response because your POST request has no body. Your code looks like it is trying to submit a form. If your intention was simply to pull down the page content without submitting a form, try a GET request: remove the Content-Type header, remove connection.setDoOutput(true), and remove the three DataOutputStream lines.
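Applying those changes, a minimal sketch of the GET version could look like this (the readAll helper and class name are my own, not from the question):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;

public class PageFetcher {

    // Collects everything from a reader into one string, preserving line breaks.
    static String readAll(Reader source) throws IOException {
        BufferedReader br = new BufferedReader(source);
        StringBuilder response = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            response.append(line).append('\n');
        }
        return response.toString();
    }

    // Plain GET: no Content-Type header, no setDoOutput, no DataOutputStream.
    static String fetch(String targetURL) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(targetURL).openConnection();
        try {
            connection.setRequestMethod("GET");
            connection.setRequestProperty("User-Agent",
                    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11");
            return readAll(new InputStreamReader(connection.getInputStream()));
        } finally {
            connection.disconnect();
        }
    }
}
```

String html = PageFetcher.fetch("http://www.meuhumor.com.br/"); should then return the page, assuming the server accepts the GET.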