I am trying to download the HTML of a website, but one site consistently returns 403. I have hit this error in previous projects and was always able to fix it by adding a User-Agent header, but this time nothing I tried helped. I even copied every header from my browser's request, yet I still get 403 in Java, while the same request works perfectly with wget and in other programming languages. Can anyone help?
The URL I'm trying to download is: here
I'm using the following code (the headers are copied 1:1 from my request in Firefox):
if (file.exists()) {
Files.delete(file.toPath());
}
HttpURLConnection httpcon = (HttpURLConnection) url.openConnection();
httpcon.setRequestMethod("GET");
httpcon.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8");
httpcon.setRequestProperty("Accept-Encoding", "gzip, deflate, br");
httpcon.setRequestProperty("Accept-Language", "en-GB,en;q=0.5");
httpcon.setRequestProperty("Cache-Control", "max-age=0");
httpcon.setRequestProperty("Connection", "keep-alive");
httpcon.setRequestProperty("Host", "www.mediamarkt.de");
httpcon.setRequestProperty("Sec-Fetch-Dest", "document");
httpcon.setRequestProperty("Sec-Fetch-Mode", "navigate");
httpcon.setRequestProperty("Sec-Fetch-Site", "none");
httpcon.setRequestProperty("Sec-Fetch-User", "?1");
httpcon.setRequestProperty("TE", "trailers");
httpcon.setRequestProperty("Upgrade-Insecure-Requests", "1");
httpcon.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0");
InputStream is = httpcon.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
BufferedWriter bwr = new BufferedWriter(new FileWriter(file, true));
String line;
while ((line = br.readLine()) != null) {
bwr.write(line);
}
is.close();
br.close();
bwr.close();
Related
When I try to run this code:
URL url = new URL("https://www.amazon.com");
BufferedReader bufr = new BufferedReader(new InputStreamReader(url.openStream()));
String data;
while ((data=bufr.readLine())!=null)
System.out.println(data);
It fails with: java.io.IOException: Server returned HTTP response code: 503 for URL: https://www.amazon.com
How can I search for a word in the page at the Amazon URL?
I read a couple of links and learned that a User-Agent value needs to be added to fix the 503 error. Below is the sample code.
URL url = new URL("https://www.amazon.com");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("User-Agent",
"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
BufferedReader bufr = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String data;
while ((data = bufr.readLine()) != null)
System.out.println(data);
My code looks like this:
URL url = new URL("https://nominatim.openstreetmap.org/reverse?format=json&lat=44.400000&lon=26.088492&zoom=18&addressdetails=1");
HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("User-Agent", "Mozilla/5.0");
connection.setRequestProperty("Accept-Language","en-US");
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuilder json = new StringBuilder(1024);
String tmp;
while ((tmp = reader.readLine()) != null) json.append(tmp).append("\n");
reader.close();
JSONObject data = new JSONObject(json.toString());
However, I am getting a java.io.FileNotFoundException at the BufferedReader line. The address is correct, and any browser displays the JSON result. I need to get the human-readable address from the latitude and longitude, also known as reverse geocoding. I have tried many things, but nothing worked, so I would be very thankful if you could tell me what I am doing wrong. If possible, I would prefer not to use any external library.
I wrote this code block and it solved the problem. Take a look at the parameters passed to the setRequestProperty method:
String response = null;
try {
URL url = new URL("https://nominatim.openstreetmap.org/reverse?format=json&lat=44.400000&lon=26.088492&zoom=18&addressdetails=1");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.connect();
connection.getResponseCode(); // call this if you want to check the response code
InputStream stream = connection.getErrorStream();
if (stream == null) {
stream = connection.getInputStream();
BufferedReader r = new BufferedReader(new InputStreamReader(stream, Charset.forName("UTF-8")));
StringBuilder sb = new StringBuilder();
String line;
while ((line = r.readLine()) != null) {
sb.append(line);
}
System.out.println(sb.toString());
}
} catch (Exception e) {
e.printStackTrace();
}
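As a side note on the FileNotFoundException from the question: as far as I know, HttpURLConnection.getInputStream() throws for error statuses (FileNotFoundException for 404 in the default JDK implementation), and the server's explanation is then only available through getErrorStream(). Below is a minimal sketch of checking the status first and reading whichever stream applies. The class and method names (ResponseReader, readAll, readResponse) are made up for illustration, and UTF-8 is an assumption about the response encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

public class ResponseReader {
    // Reads a stream line by line into a String; a null stream yields "".
    // Assumes the body is UTF-8 text.
    public static String readAll(InputStream in) {
        if (in == null) {
            return "";
        }
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the body for a successful response; for a status of 400 or
    // above, throws an IOException that includes the server's error body.
    public static String readResponse(HttpURLConnection connection) throws IOException {
        int status = connection.getResponseCode();
        InputStream body = status >= 400
                ? connection.getErrorStream()   // may be null if the server sent no body
                : connection.getInputStream();
        String text = readAll(body);
        if (status >= 400) {
            throw new IOException("HTTP " + status + ": " + text);
        }
        return text;
    }
}
```

Calling ResponseReader.readResponse(connection) instead of wrapping connection.getInputStream() directly should at least show what the server is complaining about, which is usually more helpful than the bare exception.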
In fact, the problem seems to be gone for now. The only things I changed are using addRequestProperty instead of setRequestProperty and the User-Agent data, which I don't think is so important. I am not very familiar with addRequestProperty and setRequestProperty and don't know exactly what the difference is, but it seems to be important in this case.
URL url = new URL("https://nominatim.openstreetmap.org/reverse?format=json&lat=44.400000&lon=26.088492&zoom=18&addressdetails=1");
HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
connection.setRequestMethod("GET"); // POST or GET, it doesn't matter
connection.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0");
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuilder json = new StringBuilder(1024);
String tmp;
while ((tmp = reader.readLine()) != null) json.append(tmp).append("\n");
reader.close();
JSONObject data = new JSONObject(json.toString());
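On the add versus set question: per the URLConnection documentation, setRequestProperty overwrites any existing value for the key, while addRequestProperty adds a further value without replacing the existing ones. For a header that is set only once on a fresh connection, the two should probably behave the same. The sketch below shows the difference without sending any request; RequestPropertyDemo and acceptValues are hypothetical names, and example.com is just a placeholder URL.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.List;

public class RequestPropertyDemo {
    // Returns the values recorded for the "Accept" header after two calls.
    // Nothing is sent: openConnection() only creates the connection object.
    public static List<String> acceptValues(boolean useAdd) {
        try {
            // Placeholder URL, not the one from the question.
            URLConnection c = URI.create("http://example.com/").toURL().openConnection();
            if (useAdd) {
                // add: the second call keeps the first value and adds another
                c.addRequestProperty("Accept", "text/html");
                c.addRequestProperty("Accept", "application/json");
            } else {
                // set: the second call replaces the first value
                c.setRequestProperty("Accept", "text/html");
                c.setRequestProperty("Accept", "application/json");
            }
            return c.getRequestProperties().get("Accept");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With useAdd set to false the list holds only the last value; with true it holds both (I would not rely on their order).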
Thank you all for your answers, problem is solved!
I'm trying to get the stream link for a video that is embedded in a website. First I get the HTML of the page containing the player, then I narrow this down to the embedded link, and from that I get the stream link. In the past I have been able to use Chrome to find the video player element and then look for it in Java. However, the element I found in Chrome is not in the HTML I get from Java.
(This method has worked in the past with different websites.)
I'm using Inspect Element in Chrome to find the player.
This is my code to find an element of a website in Java:
//Opens Connection
URL url = new URL(address);
//Gets Data
URLConnection connection = url.openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36");
InputStream is = connection.getInputStream();
//Creates buffered reader
BufferedReader in = new BufferedReader(new InputStreamReader(is));
String inputLine;
//Finds the line
while ((inputLine = in.readLine()) != null) {
if (inputLine.contains(target)) {
break;
}
}
//Closes The input stream and buffered reader
in.close();
is.close();
//Returns the found line
return inputLine;
Any help is appreciated.
I'm trying to read a webpage.
In a browser it just looks like this:
<b>Failure</b>
<b>Success</b>
But when I read it with my application, it gives me this:
http://pastebin.com/vJ6GDWpx
This is my code:
URL url = new URL("http://example.com/auth.php?username=" + username + "&password=" + password);
URLConnection urlconnection = url.openConnection();
urlconnection.setConnectTimeout(10000);
urlconnection.setReadTimeout(10000);
urlconnection.addRequestProperty("Host", "example.com");
urlconnection.addRequestProperty("Connection", "keep-alive");
urlconnection.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20120716 Firefox/15.0a2");
urlconnection.addRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
urlconnection.addRequestProperty("Accept-Language", "en-US,en;q=0.8");
urlconnection.addRequestProperty("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
BufferedReader br = new BufferedReader(new InputStreamReader(urlconnection.getInputStream()));
String result;
while ((result = br.readLine()) != null) {
System.out.println(result);
}
br.close();
How can I solve this problem?
It works with HtmlUnit, but that library is very large.
Is there a smaller solution?
I'm trying to read http://www.meuhumor.com.br/ in Java using this:
URL url;
HttpURLConnection connection = null;
try{
url = new URL(targetURL);
connection = (HttpURLConnection)url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
connection.setRequestProperty("Content-Language", "en-US");
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11");
connection.setUseCaches(false);
connection.setDoInput(true);
connection.setDoOutput(true);
DataOutputStream dataout = new DataOutputStream(connection.getOutputStream());
dataout.flush();
dataout.close();
InputStream is = connection.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line;
StringBuffer response = new StringBuffer();
while((line = br.readLine()) != null){
response.append(line);
response.append('\n');
}
br.close();
String html = response.toString();
I can access the website using any browser, but when I try to get the HTML with Java I get java.io.IOException: Server returned HTTP response code: 403 for URL:
Does anyone know a way to get the HTML?
You are most likely getting an HTTP 403 response because your POST request has no body. Your code looks like it's trying to submit a form. If your intention was to simply pull down the page content without submitting a form, try a GET request, remove the Content-Type header, remove connection.setDoOutput(true), and remove the 3 DataOutputStream lines.
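A minimal sketch of what that could look like, keeping the User-Agent from the question. PageFetcher, prepareGet and fetch are names made up for this example, and reading the body as UTF-8 is an assumption about the site's encoding. Whether the server then answers with 200 still depends on the site.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class PageFetcher {
    // Builds a plain GET connection; nothing is sent until the body is read.
    public static HttpURLConnection prepareGet(String address) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(address).toURL().openConnection();
            connection.setRequestMethod("GET"); // no form is submitted, so no POST
            connection.setRequestProperty("User-Agent",
                    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11");
            connection.setUseCaches(false);
            // Deliberately no Content-Type header, no setDoOutput(true),
            // and no DataOutputStream: a GET request carries no body here.
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sends the request and returns the page as text (assumes UTF-8).
    public static String fetch(String address) throws IOException {
        HttpURLConnection connection = prepareGet(address);
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line).append('\n');
            }
            return response.toString();
        } finally {
            connection.disconnect();
        }
    }
}
```

Usage would be something like String html = PageFetcher.fetch(targetURL); with targetURL being the variable from the question.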