Couldn't download image - java

I'm trying to download images from a URL, but I get a java.io.IOException.
My code is:
public static void main(String[] args) throws MalformedURLException, IOException {
File picutreFile = new File("test.jpg");
FileUtils.copyURLToFile(new java.net.URL("http://paceoil.ca/files/includes/images/images-stories-presentation-october-icon-graphic.jpg"), picutreFile);
}
When run, it threw:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: http://paceoil.ca/files/includes/images/images-stories-presentation-october-icon-graphic.jpg
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
at org.apache.commons.io.FileUtils.copyURLToFile(FileUtils.java:1460)
at com.hrant.Test.main(Test.java:14)
This code works for some URLs, but I can't understand why it fails for this one.
Many thanks in advance.

Try setting the User-Agent request property on your connection and see if it helps, for example:
File pictureFile = new File("src/test.jpg");
URL url = new URL("http://paceoil.ca/files/includes/images/images-stories-presentation-october-icon-graphic.jpg");
URLConnection conn = url.openConnection();
// Many servers reject requests that lack a browser-like User-Agent header
conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0");
conn.connect();
FileUtils.copyInputStreamToFile(conn.getInputStream(), pictureFile);

Just a suggestion:
If this only happens for this URL (or domain), the HTTP server on the other end may be checking certain request headers before processing the request, and your request does not send them. The result is that the server responds with 403, for example: forbidden to an unknown client (e.g. missing or unexpected User-Agent).
This link shows how to add headers to your request:
https://stackoverflow.com/a/15555952/3887073
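A rough sketch of that approach (not tested against this particular server; the User-Agent string is just an example browser value, and the output file name is arbitrary):
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class DownloadWithHeaders {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://paceoil.ca/files/includes/images/images-stories-presentation-october-icon-graphic.jpg");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Present a browser-like User-Agent; many servers reject Java's default agent
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0");
        int code = conn.getResponseCode();
        if (code == HttpURLConnection.HTTP_OK) {
            try (InputStream in = conn.getInputStream()) {
                Files.copy(in, Paths.get("test.jpg"), StandardCopyOption.REPLACE_EXISTING);
            }
        } else {
            System.err.println("Server returned HTTP " + code);
        }
        conn.disconnect();
    }
}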

The server may reject requests that are missing the HTTP headers it expects. If you send those headers along with your request, the server should let you download the image.
Something like this:
URL myURL = new URL(serviceURL);
HttpURLConnection myURLConnection = (HttpURLConnection) myURL.openConnection();
// Attach whatever headers the server expects, e.g. a Basic Authorization header (uses java.util.Base64)
String userCredentials = "username:password";
String basicAuth = "Basic " + Base64.getEncoder().encodeToString(userCredentials.getBytes());
myURLConnection.setRequestProperty("Authorization", basicAuth);
myURLConnection.setRequestMethod("GET");

Server returned HTTP response code 503 for URL

I'm able to access the website kissmanga.com in a browser, yet I can't access it from my program. I fixed the 403 error I was getting before, but now I get error 503.
URL url = new URL("http://kissmanga.com/");
System.setProperty("http.agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.29 Safari/537.36");
BufferedReader bf = new BufferedReader(new InputStreamReader(url.openStream()));
String str;
while((str = bf.readLine()) != null){
System.out.println(str);
}
Error that I get:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 503 for URL: http://kissmanga.com/
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
at KissManga.main(KissManga.java:10)
Okay, this code works, with one small annoying problem: I don't get the full HTML, just about 2/3 of it.
HtmlUnitDriver driver = new HtmlUnitDriver();
driver.get("http://kissmanga.com/");
Thread.sleep(5000);
System.out.println(driver.getPageSource());
driver.quit();
You won't get any data this way, because the site checks whether JavaScript is enabled.
You should try tools that can emulate browser behaviour. For example, this is how you can get the page source with the Selenium HtmlUnit driver:
HtmlUnitDriver drv = new HtmlUnitDriver(BrowserVersion.FIREFOX_38);
drv.setJavascriptEnabled(true);
drv.get("http://kissmanga.com/");
drv.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
System.out.println(drv.getPageSource());
Error 503 means the server is reachable but returned an error status code.
503 stands for "Service Unavailable".
Either a temporary problem occurred on the server, or the server rejected your request for some reason.
It's because the site appears to use Cloudflare.
You can tell because when you visit the site you get a "please wait while we check your browser" page.
503 = HTTP 503 Service Unavailable.
This is Cloudflare telling you to hang on while it makes sure you aren't part of a DDoS.
You will need to code your client to inspect the response body and either wait out the redirect, or visit the page manually yourself.
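As a rough sketch of what that inspection could look like (the retry delay is arbitrary, and a real Cloudflare challenge also requires running its JavaScript, which plain HttpURLConnection cannot do):
import java.net.HttpURLConnection;
import java.net.URL;

public class RetryOn503 {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://kissmanga.com/");
        for (int attempt = 0; attempt < 3; attempt++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("User-Agent", "Mozilla/5.0");
            int code = conn.getResponseCode();
            if (code == 503) {
                // Cloudflare's "checking your browser" page answers with 503;
                // waiting alone is usually not enough, but this is where a retry would go
                System.out.println("Got 503, waiting before retry...");
                Thread.sleep(5000);
                continue;
            }
            System.out.println("HTTP " + code);
            break;
        }
    }
}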

HttpURLConnection responds 404 when the file clearly exists

I have a Java program that tries to read an arbitrary file from a URL. However, it returns a 404 error when the file clearly exists; try the URL for yourself. What is wrong?
URL url = new URL("http://images.all-free-download.com/images/graphiclarge/blue_abstract_background_310971.jpg");
HttpURLConnection myHTTPConTest = null;
myHTTPConTest = (HttpURLConnection) url.openConnection();
int responseCode = myHTTPConTest.getResponseCode(); // Returns 404
Added user-agent, no change:
myHTTPConTest.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401");
I think you just had bad luck and hit a server restart or something like that, because it returns 200 now.

Trying to get HTML text from Google URLs, but getting error 401

I am trying to retrieve some HTML text from a list of Google result pages. Most of them work fine, but URLs such as https://www.google.com/patents/US6034687 always give a 401 error, see below:
Server returned HTTP response code: 401 for URL: https://www.google.com/patents/US6034687
I am using Java, and I did look up this error code; it seems authentication-related, but this kind of URL can be accessed from any browser without asking for a login. So I am confused: why does only this kind of URL not work for me?
Here is my code for retrieving the HTML:
URL u = new URL(url);
StringBuilder html = new StringBuilder();
HttpURLConnection conn = (HttpURLConnection) u.openConnection();
conn.setRequestMethod("GET");
conn.setRequestProperty("Accept", "text/html");
BufferedReader br;
try {
    br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    String out = "";
    while ((out = br.readLine()) != null) {
        html.append(out + "\n");
    }
} catch (IOException e) {
    e.printStackTrace();
}
Any ideas?
Thanks.
Try sending a User-Agent header in the request. That 401 status is misleading. Some servers do not allow requests from non-browser clients.
conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 5.2; rv:21.0) Gecko/20100101 Firefox/21.0");
BTW, when you do openConnection() for an https scheme, the return value is HttpsURLConnection, which extends HttpURLConnection.
The request requires user authentication. The response MUST include a WWW-Authenticate header field containing a challenge applicable to the requested resource. The client MAY repeat the request with a suitable Authorization header field. If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials. If the 401 response contains the same challenge as the prior response, and the user agent has already attempted authentication at least once, then the user SHOULD be presented the entity that was given in the response, since that entity might include relevant diagnostic information. HTTP access authentication is explained in "HTTP Authentication: Basic and Digest Access Authentication".
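If the server really does issue an authentication challenge (not the case for the Google patents pages, where a User-Agent fixes it), a Basic Authorization header could be attached roughly like this, with placeholder credentials:
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthExample {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://www.google.com/patents/US6034687");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Placeholder credentials; only useful if the server actually issues a Basic challenge
        String credentials = "username:password";
        String encoded = Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + encoded);
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 5.2; rv:21.0) Gecko/20100101 Firefox/21.0");
        System.out.println("HTTP " + conn.getResponseCode());
    }
}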

How can I prevent a 403 HTTP error code in Java?

I use simple code to get the HTML for http://www.ip-adress.com, but it fails with HTTP error code 403.
If I try another website, like google.com, in the same program, it works. I can also open www.ip-adress.com in a browser, so why can't I access it from my Java program?
public class urlconnection
{
    public static void main(String[] args)
    {
        StringBuffer document = new StringBuffer();
        try
        {
            URL url = new URL("http://www.ip-adress.com");
            URLConnection conn = url.openConnection();
            BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            String line = null;
            while ((line = reader.readLine()) != null)
                document.append(line + " ");
            reader.close();
        }
        catch (MalformedURLException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        System.out.println(document.toString());
    }
}
java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.ip-adress.com/
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at urlconnection.main(urlconnection.java:14)
This is the line you need:
conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
Refer to this.
The web server can detect that it is not a regular browser accessing it, so it rejects your request. There are ways to fake the request headers to trick the server into thinking you are a browser.
I suppose the site checks the User-Agent header and blocks what looks like "a robot". You need to mimic a normal browser. Check the solution in "Setting user agent of a java URLConnection", or try to use Apache HttpClient and set the user agent there.
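A minimal sketch of the HttpClient route, assuming Apache HttpClient 4.x is on the classpath (the user-agent string is just an example):
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpClientUserAgent {
    public static void main(String[] args) throws Exception {
        // Build a client that sends a browser-like User-Agent on every request
        try (CloseableHttpClient client = HttpClients.custom()
                .setUserAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0")
                .build();
             CloseableHttpResponse response = client.execute(new HttpGet("http://www.ip-adress.com"))) {
            System.out.println(response.getStatusLine());
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}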
I don't believe that this is fundamentally a Java problem. You're doing the right thing to make an HTTP connection, and the server is doing "the right thing" from its perspective by responding to your request with a 403 response.
Let's be clear about this - the response you're getting is due to whatever logic is being employed by the target webserver.
So if you were to ask "how can I modify my request so that http://www.ip-address.com returns a 200 response", then people may be able to come up with workarounds that keep that server happy. But this is a host-specific process; your Java code is arguably correct, though it should have better error handling because you can always get non-2xx responses.
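For example, a rough sketch of that kind of error handling, which reads the error body on a non-2xx response instead of letting getInputStream() throw:
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class CheckResponseCode {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.ip-adress.com");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        int code = conn.getResponseCode();
        // On non-2xx responses getInputStream() throws, so fall back to the error stream
        InputStream body = (code >= 200 && code < 300) ? conn.getInputStream() : conn.getErrorStream();
        System.out.println("HTTP " + code);
        if (body != null) {
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(body))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }
}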
Try changing the connection's User-Agent to something browser-like; most of the time I use Mozilla/6.0 (Windows NT 6.2; WOW64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1.

java.io.IOException: Server returned HTTP response code: 403 for URL

I want to download the mp3 file from the URL http://upload13.music.qzone.soso.com/30671794.mp3, but I always get java.io.IOException: Server returned HTTP response code: 403 for URL. It works fine when I open the URL in a browser. Below is part of my code:
BufferedInputStream bis = null;
BufferedOutputStream bos = null;
try {
    URL url = new URL(link);
    URLConnection urlConn = url.openConnection();
    urlConn.addRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
    String contentType = urlConn.getContentType();
    System.out.println("contentType:" + contentType);
    InputStream is = urlConn.getInputStream();
    bis = new BufferedInputStream(is, 4 * 1024);
    bos = new BufferedOutputStream(new FileOutputStream(fileName.toString()));
Could anyone help me? Thanks in advance!
You can also use
System.setProperty("http.agent", "Chrome");
It worked for me.
Update / explanation:
HttpURLConnection reads the system property "http.agent" if it is set.
You can read about it here: https://www.innovation.ch/java/HTTPClient/advanced_info.html
Or you can look it up in the source code of the HttpURLConnection class:
String agent = java.security.AccessController.doPrivileged(new sun.security.action.GetPropertyAction("http.agent"));
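A small usage sketch; as far as I can tell the default HTTP handler reads "http.agent" only once, so set the property before opening the first connection (the URL and output file name are taken from the question above):
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class HttpAgentProperty {
    public static void main(String[] args) throws Exception {
        // Set the agent before any HTTP connection is made, since the
        // default HTTP handler reads "http.agent" only once
        System.setProperty("http.agent", "Chrome");
        URL url = new URL("http://upload13.music.qzone.soso.com/30671794.mp3");
        try (InputStream in = url.openStream()) {
            Files.copy(in, Paths.get("30671794.mp3"), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}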
Instead of using URLConnection, if you use HttpURLConnection and set a browser-like User-Agent, you should be able to access the requested web page from Java. Try the following code:
HttpURLConnection httpcon = (HttpURLConnection) url.openConnection();
httpcon.addRequestProperty("User-Agent", "Mozilla/4.76");
A plain URLConnection with the default Java user agent is often rejected by such servers; with a browser-like User-Agent the request goes through without the "Server returned HTTP response code: 403 for URL" exception.
EDIT (#Mordechai): No need to do the casting, just add the user agent.
When I access the URL with my browser I also get 403. Perhaps you're logged in to the site with your browser?
If that's the case you need to duplicate the cookie from your browser and send it along, perhaps even do more to replicate your browser's signature if the site does any extra checks.
You can set the cookie by adding:
urlConn.setRequestProperty("Cookie", "foo=bar");
Where foo=bar is the key-value pair you'll find when you locate the site's cookie in your browser.
The problem is indicated by the status code. 403 actually means "Forbidden" and implies the request was denied for a reason the server does not want to (or has no means to) indicate to the client, so the problem lies on the server side.
I would also check whether the server where the resource is located has an ACL or similar in place; we just resolved a "java.io.IOException: 403" issue that way.
403 errors are very generic, and you cannot really be sure of the cause, as it can be just about anything.
