I am writing a Java program that uses Apache HttpComponents to load a page and print its HTML to the console. However, the program prints only part of the HTML before throwing this error: Exception in thread "main" java.net.SocketException: socket closed. The portion of the HTML printed before the exception is exactly the same on every run, and the error occurs in this simplified example with Google, Yahoo and Craigslist:
String USERAGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.172 Safari/537.22";
DefaultHttpClient client = new DefaultHttpClient();
HttpGet get = new HttpGet("http://www.craigslist.org");
get.setHeader(HTTP.USER_AGENT,USERAGENT);
HttpResponse page = client.execute(get);
get.releaseConnection();
InputStream stream = page.getEntity().getContent();
try {
    BufferedReader br = new BufferedReader(new InputStreamReader(stream));
    String line = "";
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
}
finally {
    EntityUtils.consume(page.getEntity());
}
I've found that get.releaseConnection(); should not be called until after I've finished reading the HTML. Calling it immediately after EntityUtils.consume(page.getEntity()); fixes the above code.
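The ordering rule can be sketched without HttpClient itself. The helper below, readThenRelease, is a hypothetical name and not part of HttpComponents; it drains the body first and only then runs the release action, which in the code above would be get::releaseConnection. The in-memory stream in main is a stand-in for page.getEntity().getContent().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ReadThenRelease {

    // Hypothetical helper: reads the whole body, then runs the release action.
    static String readThenRelease(InputStream body, Runnable release) {
        StringBuilder html = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(body, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                html.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Runs only after reading has finished (or failed), never before.
            release.run();
        }
        return html.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the response body of a real request.
        InputStream body = new ByteArrayInputStream(
                "<html>\n<body>hi</body>\n</html>".getBytes(StandardCharsets.UTF_8));
        String html = readThenRelease(body, () -> System.out.println("connection released"));
        System.out.print(html);
    }
}
```

Putting the release in a finally block means the connection is still returned when reading fails halfway through.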
Related
I am trying to download the HTML of a website, but one site consistently returns 403. I have hit this error in previous projects and was always able to fix it by adding a User-Agent header; this time, nothing I tried helped. I even copied every header from my browser's request, but I still get 403 in Java, while the same request works perfectly with wget and in other programming languages. Can anyone help?
The URL I'm trying to download is: here
I'm using the following code (I copied the headers 1:1 from my request in Firefox):
if (file.exists()) {
    Files.delete(file.toPath());
}
HttpURLConnection httpcon = (HttpURLConnection) url.openConnection();
httpcon.setRequestMethod("GET");
httpcon.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8");
httpcon.setRequestProperty("Accept-Encoding", "gzip, deflate, br");
httpcon.setRequestProperty("Accept-Language", "en-GB,en;q=0.5");
httpcon.setRequestProperty("Cache-Control", "max-age=0");
httpcon.setRequestProperty("Connection", "keep-alive");
httpcon.setRequestProperty("Host", "www.mediamarkt.de");
httpcon.setRequestProperty("Sec-Fetch-Dest", "document");
httpcon.setRequestProperty("Sec-Fetch-Mode", "navigate");
httpcon.setRequestProperty("Sec-Fetch-Site", "none");
httpcon.setRequestProperty("Sec-Fetch-User", "?1");
httpcon.setRequestProperty("TE", "trailers");
httpcon.setRequestProperty("Upgrade-Insecure-Requests", "1");
httpcon.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0");
InputStream is = httpcon.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
BufferedWriter bwr = new BufferedWriter(new FileWriter(file, true));
String line;
while ((line = br.readLine()) != null) {
    bwr.write(line);
}
is.close();
br.close();
bwr.close();
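Separately from the 403, the Accept-Encoding header in this code invites a compressed reply, and HttpURLConnection does not decompress the body on its own, so readLine() could return binary data once the request does succeed. The following is a minimal sketch of decoding by Content-Encoding, assuming only gzip and deflate need handling; BodyDecoder, decode and roundTrip are hypothetical names, and br is rejected because the JDK ships no Brotli decoder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

final class BodyDecoder {

    // Wraps the raw body according to the Content-Encoding response header.
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return raw; // no encoding declared: the body is plain
        }
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "identity":
                return raw;
            case "gzip":
                return new GZIPInputStream(raw);
            case "deflate":
                return new InflaterInputStream(raw); // zlib-wrapped deflate
            default:
                // For example "br": there is no Brotli decoder in the JDK.
                throw new IOException("Unsupported Content-Encoding: " + contentEncoding);
        }
    }

    // Self-check without a network: gzip a string, then decode it again.
    static String roundTrip(String text) {
        try {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            try (InputStream in = decode(
                    new ByteArrayInputStream(compressed.toByteArray()), "gzip")) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<html>ok</html>"));
    }
}
```

With the code above this would be used as decode(httpcon.getInputStream(), httpcon.getContentEncoding()). The simpler alternative is not to send Accept-Encoding at all.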
I'm using a Django server and getting the error Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL
public static void main(String[] args) throws Exception {
    // Check that the user entered the right arguments on the command line.
    if (args.length != 2) {
        System.out.println("Usage: java Reverse "
                + "http://<location of your servlet/script> "
                + "string_to_reverse"); // display the usage message
        System.exit(1); // exit the program
    }
    /* The string that will be reversed may contain spaces or other
     * non-alphanumeric characters. These characters must be
     * encoded because the string is processed on its way to the server.
     * The URLEncoder class methods encode the characters. */
    String stringToReverse = URLEncoder.encode(args[1], "UTF-8");
    // Create a URL object for the address given on the command line.
    URL url = new URL(args[0]);
    // Configure the connection so that it can be written to.
    URLConnection connection = url.openConnection();
    connection.setDoOutput(true);
    connection.setReadTimeout(5000);
    connection.setConnectTimeout(5000);
    // Create an output stream on the connection
    // and open an OutputStreamWriter on it.
    OutputStreamWriter out = new OutputStreamWriter(connection.getOutputStream());
    // Write the required information to the output
    // stream and close the stream.
    out.write("string = " + stringToReverse);
    out.close();
    // Read the response from the specified URL.
    BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    String decodeString;
    while ((decodeString = in.readLine()) != null) {
        System.out.println(decodeString);
    }
    in.close();
}
Error:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: http://127.0.0.1:8000/test
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1913)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1509)
    at Reverse.main(Reverse.java:52)
I also tried the following to fix the error, but none of it worked:
connection.setRequestProperty("http.agent", "Chrome");
connection.setRequestProperty("User-agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.setRequestProperty("User-agent", "Mozilla/5.0");
connection.setRequestProperty("User-agent", "Mozilla");
Can somebody help me fix this?
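One way to narrow this down is to read the body the server sends along with the 403, since getInputStream() throws before you can see it. The sketch below checks getResponseCode() and then reads getErrorStream(). ErrorBodyDemo, statusAndBody and demo are hypothetical names, the local HttpServer is a stand-in for the real server, and the reason text it returns is made up for the demo. Django often answers 403 to POSTs that fail CSRF verification, but that is an assumption about this server, not something the question confirms.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class ErrorBodyDemo {

    // Returns "<status>: <body>", reading the error stream for 4xx/5xx replies.
    static String statusAndBody(String address) throws IOException {
        HttpURLConnection con = (HttpURLConnection)
                URI.create(address).toURL().openConnection(Proxy.NO_PROXY);
        con.setConnectTimeout(5000);
        con.setReadTimeout(5000);
        int status = con.getResponseCode();
        // getInputStream() throws for status >= 400; the body is on getErrorStream().
        InputStream in = status >= 400 ? con.getErrorStream() : con.getInputStream();
        if (in == null) {
            return status + ": "; // the server sent no body
        }
        try (InputStream body = in) {
            return status + ": " + new String(body.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Hypothetical stand-in server: always answers 403 with a short reason.
    static String demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/test", exchange -> {
                byte[] reason = "Forbidden: CSRF verification failed"
                        .getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(403, reason.length);
                exchange.getResponseBody().write(reason);
                exchange.close();
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                return statusAndBody("http://127.0.0.1:" + port + "/test");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The demo uses GET for brevity; the same status check works after writing a POST body, as long as getResponseCode() is called before getInputStream().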
My code looks like this:
URL url = new URL("https://nominatim.openstreetmap.org/reverse?format=json&lat=44.400000&lon=26.088492&zoom=18&addressdetails=1");
HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("User-Agent", "Mozilla/5.0");
connection.setRequestProperty("Accept-Language","en-US");
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuilder json = new StringBuilder(1024);
String tmp;
while ((tmp = reader.readLine()) != null) json.append(tmp).append("\n");
reader.close();
JSONObject data = new JSONObject(json.toString());
However, I am getting java.io.FileNotFoundException at the BufferedReader line. The address is correct and any browser displays the JSON result. I need to get the human-readable address from the latitude and longitude, also known as reverse geocoding. I have tried many things but nothing worked, so I would be very thankful if you could tell me what I am doing wrong. If possible, I would prefer not to use any external library.
I wrote this code block and it works for me. Look at the parameters passed to setRequestProperty:
String response = null;
try {
    URL url = new URL("https://nominatim.openstreetmap.org/reverse?format=json&lat=44.400000&lon=26.088492&zoom=18&addressdetails=1");
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    connection.connect();
    connection.getResponseCode(); // if you want to check the response code
    // getErrorStream() returns null when the server did not answer with an error body
    InputStream stream = connection.getErrorStream();
    if (stream == null) {
        stream = connection.getInputStream();
        BufferedReader r = new BufferedReader(new InputStreamReader(stream, Charset.forName("UTF-8")));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = r.readLine()) != null) {
            sb.append(line);
        }
        response = sb.toString();
        System.out.println(response);
    }
} catch (Exception e) {
    e.printStackTrace();
}
In fact the problem seems to be gone for now. The only things I changed are using addRequestProperty instead of setRequestProperty and the User-Agent value, though I don't think the latter matters much. I am not familiar with the difference between addRequestProperty and setRequestProperty, but it seems to matter in this case.
URL url = new URL("https://nominatim.openstreetmap.org/reverse?format=json&lat=44.400000&lon=26.088492&zoom=18&addressdetails=1");
HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
connection.setRequestMethod("GET"); // POST or GET, it doesn't matter
connection.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0");
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuilder json = new StringBuilder(1024);
String tmp;
while ((tmp = reader.readLine()) != null) json.append(tmp).append("\n");
reader.close();
JSONObject data = new JSONObject(json.toString());
Thank you all for your answers, problem is solved!
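On the difference between the two methods: setRequestProperty overwrites any value already stored for the key, while addRequestProperty appends another value under the same key. The sketch below shows this on an unconnected connection, so no network traffic is involved; AddVsSet and its methods are hypothetical names, and example.invalid is a placeholder host that is never contacted. On a fresh connection with no earlier User-Agent property, both calls should end up storing a single value, so the changed User-Agent string may be what actually made the difference here; that is a guess, not something verified against the server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.List;

final class AddVsSet {

    // Opens an unconnected URLConnection; nothing is sent over the network.
    static URLConnection open() {
        try {
            return URI.create("http://example.invalid/").toURL().openConnection();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // set + set: the second call replaces the first value.
    static List<String> afterSetTwice() {
        URLConnection c = open();
        c.setRequestProperty("Accept-Language", "en-US");
        c.setRequestProperty("Accept-Language", "de-DE");
        return c.getRequestProperties().get("Accept-Language");
    }

    // set + add: both values are kept under the same key.
    static List<String> afterSetThenAdd() {
        URLConnection c = open();
        c.setRequestProperty("Accept-Language", "en-US");
        c.addRequestProperty("Accept-Language", "de-DE");
        return c.getRequestProperties().get("Accept-Language");
    }

    public static void main(String[] args) {
        System.out.println("set, set: " + afterSetTwice().size() + " value(s)");
        System.out.println("set, add: " + afterSetThenAdd().size() + " value(s)");
    }
}
```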
I'm trying to get the stream link for a video that is embedded in a website. First I fetch the HTML of the page containing the player, then narrow that down to the embedded link, and from that I get the stream link. In the past I have been able to use Chrome to find the video player element and then look for it in Java. However, the element I found in Chrome is not in the HTML I get from Java.
(this method has worked in the past with different websites)
I'm using Inspect Element in Chrome to find the player.
This is my code to find an element of a website in Java:
// Opens the connection
URL url = new URL(address);
// Gets the data
URLConnection connection = url.openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36");
InputStream is = connection.getInputStream();
// Creates the buffered reader
BufferedReader in = new BufferedReader(new InputStreamReader(is));
String inputLine;
// Finds the first line containing the target; inputLine ends up null if none matches
while ((inputLine = in.readLine()) != null) {
    if (inputLine.contains(target)) {
        break;
    }
}
// Closes the buffered reader and the input stream
in.close();
is.close();
// Returns the found line
return inputLine;
Any help is appreciated.
I'm trying to write a program that connects to a site and pulls data from its source. Whenever I call this method, it stops at the line connection.setRequestProperty("Cookie", cookie); and throws "IllegalStateException: Already connected". I'm cycling through 123 different URLs, so the URL changes each time the method is called, and I don't understand why it says it's already connected when I'm connecting to a different URL. I've searched for a solution and cannot find one. Can anyone help? Thanks!
private void getUrlData(String u, String championName) throws IOException {
    List<String> data = new ArrayList<String>();
    try {
        BufferedWriter out = new BufferedWriter(new FileWriter("Other Stuff/Champion Data Test.txt"));
        out.write(championName);
        out.newLine();
        URL url = new URL(u);
        URLConnection connection = url.openConnection();
        String cookie = connection.getHeaderField("Set-Cookie");
        connection.setRequestProperty("Cookie", cookie);
        connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36");
        connection.connect();
        Scanner in = new Scanner(connection.getInputStream());
        String inputLine;
        while (in.hasNext()) {
            inputLine = in.nextLine();
            if (inputLine.contains("stat-label")) {
                out.write(in.nextLine());
                in.nextLine();
                in.nextLine();
                out.write(" " + in.nextLine());
            }
        }
    }
    catch (Exception e) {
        System.out.println(e);
    }
}
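I found the problem, although solving it raised new ones. The cause was my call to connection.getHeaderField("Set-Cookie"): reading a response header forces the connection to open and send the request, so the setRequestProperty call that follows fails with "Already connected".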
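One way around this is to use two connections: the first only to read Set-Cookie, the second with the Cookie header set before anything touches the response. The sketch below reproduces the failure and shows the two-step version against a local stand-in server. CookieThenFetch and its methods are hypothetical names, and the session=abc cookie and the response text are invented for the demo; whether the real site accepts a cookie replayed this way is an assumption.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class CookieThenFetch {

    // Reproduces the failure: reading a response header connects implicitly.
    static boolean setAfterHeaderReadFails(URL url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
        try {
            con.getHeaderField("Set-Cookie");        // sends the request
            con.setRequestProperty("Cookie", "x=1"); // too late: already connected
            return false;
        } catch (IllegalStateException alreadyConnected) {
            return true;
        } finally {
            con.disconnect();
        }
    }

    // Two connections: one to obtain the cookie, one to send it back.
    static String fetchWithCookie(URL url) throws IOException {
        HttpURLConnection first = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
        String setCookie = first.getHeaderField("Set-Cookie");
        first.disconnect();

        HttpURLConnection second = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
        if (setCookie != null) {
            // Send back only "name=value"; attributes such as Path are dropped.
            second.setRequestProperty("Cookie", setCookie.split(";", 2)[0]);
        }
        try (InputStream in = second.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            second.disconnect();
        }
    }

    // Hypothetical local server standing in for the real site.
    static String demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                String cookie = exchange.getRequestHeaders().getFirst("Cookie");
                if (cookie == null) {
                    exchange.getResponseHeaders().add("Set-Cookie", "session=abc; Path=/");
                }
                byte[] body = (cookie == null ? "no cookie" : "hello " + cookie)
                        .getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();
            try {
                URL url = URI.create(
                        "http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
                return fetchWithCookie(url)
                        + " | alreadyConnected=" + setAfterHeaderReadFails(url);
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

When looping over many URLs, the cookie from the first response can usually be kept in a variable and reused, so the extra request is needed only once.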