Why does Java properly fetch one webpage's content, but not another?

I'm trying to fetch a CSV-formatted webpage to use as a rudimentary database. The test page is at http://prog.bhstudios.org/bhmi/database/get, and browsers open it without any problem. However, when I run the following code, Java throws a 403 error:
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.logging.Level;
import java.util.logging.Logger;

public class Main
{
    static
    {
        Logger.getGlobal().setLevel(Level.ALL);
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException
    {
        InputStream is = null;
        try
        {
            System.out.println("Starting...");
            URL url = new URL("http://prog.bhstudios.org/prog/bhmi/database/get/");
            URLConnection urlc = url.openConnection();
            urlc.connect();
            is = urlc.getInputStream();
            int data;
            while ((data = is.read()) != -1)
            {
                System.out.print((char)data);
            }
            System.out.println("\r\nSuccess!");
        }
        catch (IOException ex)
        {
            Logger.getGlobal().log(Level.SEVERE, ex.getMessage(), ex);
            System.out.println("\r\nFailure!");
        }
        finally
        {
            // Close the stream on every path, including unchecked exceptions
            if (is != null)
                is.close();
        }
    }
}
Here's the console output:
Starting...
Nov 18, 2013 3:01:48 PM org.bh.mi.Main main
SEVERE: Server returned HTTP response code: 403 for URL: http://prog.bhstudios.org/prog/bhmi/database/get/
java.io.IOException: Server returned HTTP response code: 403 for URL: http://prog.bhstudios.org/prog/bhmi/database/get/
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626)
at org.bh.mi.Main.main(Main.java:36)
Failure!
Note that 403 means the server is up and understood the request, but refuses to fulfill it.
Now here's the kicker: If I get, say, http://example.com, it works just fine!
How can I get my Java app to read this file from my webserver?

I tested against your server: if I submit the request (using TamperData) with User-Agent: Java/1.6.0_14 (I just picked a random Java version), your web server responds with 403 Forbidden.
My browser shows the following error message:
Error 1010
Access denied
What happened?
The owner of this website (prog.bhstudios.org) has banned your access based on your browser's signature (cf7ab9f58210755-ua21).
In other words, your server (or more likely your proxy, as the headers indicate the use of cloudflare-nginx and ASP.NET) filters based on user agent strings. This is probably done to prevent bots and screen scrapers from accessing your websites.
You either need to drop this filter (ask your proxy administrator) or set a different user agent for URLConnection; see Setting user agent of a java URLConnection and How to modify the header of a HttpUrlConnection.
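To see this mechanism without depending on the real site, here is a minimal sketch that starts a throwaway local server with the JDK's built-in com.sun.net.httpserver package. The server rejects any User-Agent that starts with "Java/", a hypothetical stand-in for the proxy's rule (the real filter may match something else). It then requests the same URL with HttpURLConnection's default agent, a browser-like agent, and an empty agent. The class and method names are made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Local reproduction of User-Agent filtering; the server is a hypothetical stand-in for the proxy. */
final class UserAgentFilterDemo {

    /** Starts a server on a free loopback port that answers 403 to any "Java/..." User-Agent. */
    static HttpServer startFilteringServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String agent = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean blocked = agent != null && agent.startsWith("Java/");
            byte[] body = (blocked ? "Access denied" : "id,name\n1,test\n")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(blocked ? 403 : 200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Returns the status code; passing null keeps HttpURLConnection's default User-Agent. */
    static int fetchStatus(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent); // must be set before connecting
        }
        try {
            int status = conn.getResponseCode(); // a 403 is reported here, not thrown
            InputStream body = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            if (body != null) {
                body.close();
            }
            return status;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startFilteringServer();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/get").toURL();
            System.out.println("default agent: " + fetchStatus(url, null));
            System.out.println("browser agent: " + fetchStatus(url, "Mozilla/5.0"));
            System.out.println("empty agent:   " + fetchStatus(url, ""));
        } finally {
            server.stop(0);
        }
    }
}
```

On a JVM where the http.agent property is not set, the default agent has the form "Java/<version>", so the first request should get 403 and the other two 200.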

Your server for some reason is configured to forbid access when the request header
User-Agent: Java/...
is present. I was able to reproduce the problem and also got it to work by doing
URLConnection urlc = url.openConnection();
urlc.setRequestProperty("User-Agent", "");
urlc.connect();

Related

RestApi post request to specific URL

I'm working on an integration with a REST API, and I need to call their URLs to receive the data.
I'm wondering whether it's possible to use a REST web service that is mapped to that specific URL instead of the local one; later on I will write the client side that makes these calls.
For example:
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;

@Path("/URL")
public class MessageRestService {
    @GET
    @Path("/{param}")
    public Response printMessage(@PathParam("param") String msg) {
        String result = "Restful example : " + msg;
        return Response.status(200).entity(result).build();
    }
}
I can't make direct API calls from the client side (for example, using AngularJS) because I get this error:
Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://localhost:63342' is therefore not allowed access. The response had HTTP status code 400.
I did find code samples for calling URLs directly from Java, but they look messy, especially when you have to write one for a lot of API calls:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class Connection {
    public static void main(String[] args) {
        try {
            URL url = new URL("INSERT URL HERE");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setDoOutput(true);
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            String messageToPost = "POST";
            // Writing the body connects implicitly, so no separate connect() call is needed
            try (OutputStream os = conn.getOutputStream()) {
                os.write(messageToPost.getBytes(StandardCharsets.UTF_8));
            }
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    conn.getInputStream(), StandardCharsets.UTF_8));
            String output;
            System.out.println("Output from Server .... \n");
            while ((output = br.readLine()) != null) {
                System.out.println(output);
            }
            br.close();
            conn.disconnect();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
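If the main complaint is the repetition, one option is to wrap the boilerplate in a single helper and call it once per endpoint. The sketch below is a hypothetical JsonHttp.postJson helper (not part of any library): it writes a UTF-8 JSON body, returns the status code together with the response body, and reads getErrorStream() for 4xx/5xx responses so error bodies are not lost. It assumes the API accepts application/json and replies in UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that hides HttpURLConnection boilerplate behind one call per request. */
final class JsonHttp {

    /** Minimal result holder: status code plus the raw response body. */
    static final class Result {
        final int status;
        final String body;

        Result(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    static Result postJson(URL url, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
            conn.setRequestProperty("Accept", "application/json");
            byte[] payload = json.getBytes(StandardCharsets.UTF_8);
            conn.setFixedLengthStreamingMode(payload.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload);
            }
            int status = conn.getResponseCode();
            // Error responses (4xx/5xx) expose their body through getErrorStream()
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            return new Result(status, in == null ? "" : readAll(in));
        } finally {
            conn.disconnect();
        }
    }

    /** Reads the whole stream and decodes it as UTF-8 (an assumption about the API). */
    private static String readAll(InputStream in) throws IOException {
        try (InputStream input = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = input.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Each API call then shrinks to something like JsonHttp.postJson(URI.create("https://api.example.com/messages").toURL(), "{\"text\":\"hi\"}"), where the URL is only a placeholder.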

You are facing a same origin policy issue.
This is because your client-side (web browser) application is fetched from Server-A, while it tries to interact with data on Server-B.
Server-A is wherever your application is fetched from (before it is displayed to the user in their web browser).
Server-B is localhost, where your mock service is deployed.
For security reasons, by default, only code originating from Server-B can talk to Server-B (over-simplifying a little bit). This is meant to prevent malicious code from Server-A from hijacking a legitimate application from Server-B and tricking it into manipulating data on Server-B behind the user's back.
To overcome this, if a legitimate application from Server-A needs to talk to Server-B, Server-B must explicitly allow it. For this you need to implement CORS (Cross-Origin Resource Sharing). Search for it and you will find plenty of resources that explain how to do it; https://www.html5rocks.com/en/tutorials/cors/ is also a great starting point.
However, as your Server-B/localhost service is just a mock service used during development and test, if your application is simple enough, you may get away with the mock service simply adding the following HTTP headers to all its responses:
Access-Control-Allow-Origin:*
Access-Control-Allow-Headers:Keep-Alive,User-Agent,Content-Type,Accept [extend with whatever your app uses]
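For a development-only mock, the header approach can be sketched without JAX-RS by using the JDK's built-in com.sun.net.httpserver server. This is a hypothetical stand-in for the question's MessageRestService: it serves /URL/{param}, adds the two headers above to every response, additionally sends Access-Control-Allow-Methods (a standard CORS header the answer does not mention), and answers the browser's OPTIONS preflight with 204 and no body. The port and header values are placeholders.

```java
import com.sun.net.httpserver.Headers;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical dev-only mock service that adds permissive CORS headers to every response. */
final class CorsMockService {

    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        // Mirrors the question's @Path("/URL") + @Path("/{param}") mapping
        server.createContext("/URL/", exchange -> {
            Headers h = exchange.getResponseHeaders();
            h.set("Access-Control-Allow-Origin", "*");
            h.set("Access-Control-Allow-Headers", "Keep-Alive,User-Agent,Content-Type,Accept");
            h.set("Access-Control-Allow-Methods", "GET,POST,OPTIONS");
            if ("OPTIONS".equals(exchange.getRequestMethod())) {
                // Preflight: headers only; -1 tells HttpServer there is no response body
                exchange.sendResponseHeaders(204, -1);
                exchange.close();
                return;
            }
            String msg = exchange.getRequestURI().getPath().substring("/URL/".length());
            byte[] body = ("Restful example : " + msg).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080);
        System.out.println("Mock service on http://127.0.0.1:8080/URL/{param}");
    }
}
```

The wildcard origin is convenient for local testing only; a real service would normally list the allowed origins explicitly.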
As an alternative solution (during dev/tests only!), you may try forcing the web browser to disregard the same origin policy (e.g. --disable-web-security for Chrome), but this is dangerous unless you are careful to use separate browser instances for your tests and for your regular web browsing.

Parse https with jsoup (java)

I am trying to parse a document with jsoup (Java). This is my Java code:
package test;

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class crawler {
    private static final int TIMEOUT_IN_MS = 5000;

    public static void main(String[] args) throws MalformedURLException, IOException
    {
        Document doc = Jsoup.parse(new URL("http://www.internet.com/"), TIMEOUT_IN_MS);
        System.out.println(doc.html());
    }
}
OK, this works. But when I try to parse an HTTPS site, I get this error message:
Document doc = Jsoup.parse(new URL("https://www.somesite.com/"), TIMEOUT_IN_MS);
System.out.println(doc.html());
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=https://www.somesite.com/
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:590)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:540)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:227)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:216)
at org.jsoup.Jsoup.parse(Jsoup.java:183)
at test.crawler.main(crawler.java:14)
I only get these error messages when I try to parse HTTPS; HTTP is working.

Jsoup supports https fine - it just uses Java's URLConnection under the hood.
A 403 server response indicates that the server has 'forbidden' the request, normally due to authorization issues. If you're getting a HTTP response status code, the TLS (https) negotiation has worked.
The issue here is probably not related to HTTPS; it's just that the URL you're having trouble fetching happens to be HTTPS. You need to understand why the server is giving you a 403. My guess is that either you need to send some authorization tokens (cookies or URL params), or it is blocking the request because of the user agent (which defaults to "Java" unless you specify it). Lots of services block requests that way. Try setting the user agent to a common browser string; the Connection returned by Jsoup.connect(url) has a userAgent(...) method for that.
(People won't be able to help you more without real example URLs, because we can't tell what the server is doing just with this info.)
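Because jsoup sits on top of URLConnection, the same idea can be sketched with plain HttpURLConnection. The hypothetical helper below prepares (but does not yet connect) a request carrying a browser-like User-Agent, an optional Cookie header, and timeouts. The agent string and the session=abc cookie used in the check are made-up examples, and there is no guarantee a given server accepts them. In jsoup itself the corresponding calls are Jsoup.connect(url).userAgent(...), .cookie(name, value) and .timeout(ms) before .get().

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper: prepares a request that looks less like a default Java client. */
final class BrowserLikeRequest {

    // Assumption: any common desktop browser string should do; this one is only an example
    static final String BROWSER_UA =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36";

    /** Opens (but does not connect) a connection with a browser UA and an optional Cookie header. */
    static HttpURLConnection prepare(URL url, String cookie, int timeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent", BROWSER_UA);
        if (cookie != null && !cookie.isEmpty()) {
            conn.setRequestProperty("Cookie", cookie); // e.g. a session token the site expects
        }
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        return conn;
    }
}
```

Calling getResponseCode() on the returned connection then shows whether the 403 persists before any HTML parsing happens.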

You would need to provide authentication when hitting the URL. Also, if the request works in a browser but not from Java code, try the solution in 403 Forbidden with Java but not web browser?

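You could also skip SSL certificate validation if that is what fails (it is not the cause of a 403, and disabling validation is insecure). In jsoup versions that still provide validateTLSCertificates (it was later removed), this looks like: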
Jsoup.connect("https://example.com").validateTLSCertificates(false).get()

Java HttpUrlConnection throws Connection Refused

I know there are several questions on this topic, but I didn't find an answer in any of them.
I'm trying to open a connection to my local server, but I keep getting connection refused.
I have the server running, and I tested the connection with the browser and with a Google app called Postman, and it works.
It fails when opening the connection, as if there were nothing to connect to. Or maybe something is blocking the connection? I tested with the firewall and antivirus disabled, with no luck.
Testing in Postman, the URL returns a User as it should.
If I replace the URL with "http://www.google.com", it works fine.
Here is my code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

/**
 *
 * @author Gabriel
 */
public class HttpConnection {

    public HttpConnection() {
    }

    public void makeRequest() throws MalformedURLException, IOException {
        String url = "http://localhost:8000/users/1";
        URL obj = new URL(url);
        HttpURLConnection con = (HttpURLConnection) obj.openConnection();

        // optional default is GET
        con.setRequestMethod("GET");

        //add request header
        con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36");
        con.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        // If the server honors this, the body may arrive compressed; HttpURLConnection does not decompress it
        con.setRequestProperty("Accept-Encoding", "gzip,deflate,sdch");
        con.setRequestProperty("Accept-Language", "en-US,en;q=0.8,es;q=0.6");
        // On current JDKs "Connection" and "Host" are restricted headers that are silently
        // ignored unless -Dsun.net.http.allowRestrictedHeaders=true is set
        con.setRequestProperty("Connection", "keep-alive");
        con.setRequestProperty("Host", "localhost:8000");

        int responseCode = con.getResponseCode();
        System.out.println("\nSending 'GET' request to URL : " + url);
        System.out.println("Response Code : " + responseCode);

        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
        String inputLine;
        StringBuffer response = new StringBuffer();
        while ((inputLine = in.readLine()) != null) {
            response.append(inputLine);
        }
        in.close();

        //print result
        System.out.println(response.toString());
    }
}

I faced exactly the same problem. Use this instead of localhost:
http://[::1]:8000/index.php

I have similar code that is working, but my request header is a lot simpler. Basically just:
con.setRequestProperty("User-Agent", "Mozilla/5.0");
If simplifying the header does not help, I would capture the traffic from your browser with something like Fiddler and then make the request look exactly like that.

I will make a wild guess at what the problem could be: it may be an IPv4/IPv6 problem.
If so, here are two possible solutions:
If the server is only listening on an IPv6 address, change it to listen on IPv4.
If the server is listening on IPv4, force Java to use IPv4 by starting the JVM with the system property
-Djava.net.preferIPv4Stack=true
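To check which case applies, a small diagnostic can list every address that "localhost" resolves to and try a TCP connection to each one. This is a sketch with a hypothetical class name; port 8000 comes from the question. If one address family accepts the connection and the other is refused, that points to the IPv4/IPv6 mismatch described above (and also shows whether the hosts file maps localhost to something unexpected).

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

/** Diagnostic sketch: which addresses does a host name resolve to, and which accept connections? */
final class LocalhostProbe {

    /** Maps every resolved address onto whether a TCP connect to the port succeeds. */
    static Map<InetAddress, Boolean> probe(String host, int port, int timeoutMs) throws IOException {
        Map<InetAddress, Boolean> result = new LinkedHashMap<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(address, port), timeoutMs);
                result.put(address, true);
            } catch (IOException refusedOrTimedOut) {
                result.put(address, false);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        probe("localhost", 8000, 1000).forEach((addr, open) ->
                System.out.println(addr.getHostAddress() + " -> " + (open ? "accepting" : "refused")));
    }
}
```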

You can try implementing CORS in the API you are trying to connect to, by setting the Access-Control-Allow-Origin: * header in its responses.

The code is good and works great, so the problem must be in the transport or network layer: I mean that you are not reaching the right server. If you use 127.0.0.1 instead of localhost, I think you won't have a problem. So my guess is that you have a problem in /etc/hosts or C:\Windows\System32\drivers\etc\hosts.
I advise you to try a simple test: ping the hostname and check in the output whether the IP address is correct.

Well, put http://localhost:8000/users/1 in your web browser: what do you get? A simple Connection Refused error. It's not you, it's the website. Also, URLs identify websites using protocol identifiers (http://, https://) and top-level domains (.com, .edu, .gov); that's another reason why you get an error.

You mentioned that you were opening a connection to your "local server"
I am assuming that you are doing this on the same computer that you're hosting the server on?
Try to open the connection to your local server using a different computer.

Can't read in HTML content from valid URL

I am trying out a simple program that reads the HTML content from a given URL. The URL I am trying in this case doesn't require any cookie/username/password, but I still get an io.IOException: Server returned HTTP response code: 403 error. Can anyone tell me what I am doing wrong here? (I know there are similar questions on SO, but they didn't help.)
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class urlcont {
    public static void main(String[] args) {
        try {
            URL u = new URL("http://www.amnesty.org/");
            URLConnection uc = u.openConnection();
            uc.addRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
            uc.connect();
            File f = new File("C:\\Users\\kausta\\Desktop\\urlcont.txt");
            // try-with-resources closes both streams, even if the copy fails
            try (InputStream in = uc.getInputStream();
                 OutputStream s = new FileOutputStream(f)) {
                int b;
                while ((b = in.read()) != -1) {
                    s.write(b);
                }
            }
        }
        catch (MalformedURLException e) {System.err.println(e);}
        catch (IOException e) {System.err.println(e);}
    }
}

If you can fetch the URL in a browser, but not via Java, that indicates, to me, that they are blocking programmatic access to the page via user-agent filtering. Try setting the user-agent on your connection so that your code appears, to the webserver, to be a web-browser.
See this thread for help on that: What is the proper way of setting headers in a URLConnection?

There is a permission problem:
A web server may return a 403 Forbidden HTTP status code in response to a request from a client for a web page or resource to indicate that the server refuses to allow the requested action

You are not doing anything "wrong"; the server you are trying to access is blocking your request because you are not allowed to access the file.
HTTP error 403 means Forbidden: the remote server blocks the request.
Check whether you need to authenticate to access the document you want, and in that case provide the credentials with the request ;)
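If the document does require credentials and the server uses HTTP Basic authentication (an assumption; many sites use a login form and session cookies instead), the credentials go in an Authorization header. The helper below is hypothetical, and the user name and password are placeholders.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Sketch: attaching HTTP Basic credentials, assuming the server actually uses Basic auth. */
final class BasicAuthRequest {

    /** Builds "Basic base64(user:password)" as defined for the Basic scheme. */
    static String basicAuthHeader(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    /** Opens (but does not connect) a connection that will send the Authorization header. */
    static HttpURLConnection open(URL url, String user, String password) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
        return conn;
    }
}
```

Basic credentials are only Base64-encoded, not encrypted, so they should be sent over HTTPS.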

URL is accessible with browser but still FileNotFoundException with URLConnection

I use an HttpURLConnection to connect to a website and receive a ResponseCode=404 (HTTP_NOT_FOUND). However, I have no problem opening the website in my browser (IE).
Why the difference, and what can I do about it?
Regards, Pavan
This is my Program
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

public class TestGet {
    private static URL source;

    public static void main(String[] args) {
        doGet();
    }

    public static void doGet() {
        try {
            source = new URL("http://localhost:8080/");
            System.out.println("Url is" + source.toString());
            URLConnection connection = source.openConnection();
            connection.setRequestProperty("User-Agent","Mozilla/5.0 ( compatible ) ");
            connection.setRequestProperty("Accept","*/*");
            connection.setDoInput(true);
            connection.setDoOutput(true);
            System.out.println(((HttpURLConnection) connection).getResponseCode());
            BufferedReader rdr = new BufferedReader(new InputStreamReader(
                    connection.getInputStream()));
            StringBuffer b = new StringBuffer();
            String line = null;
            while (true) {
                line = rdr.readLine();
                if (line == null)
                    break;
                b.append(line);
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.err.println(e.toString());
        }
    }
}
Stack Trace
Url ishttp://localhost:8080/
404
java.io.FileNotFoundException: http://localhost:8080/
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection$6.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at TestGet.doGet(TestGet.java:28)
at TestGet.main(TestGet.java:11)
Caused by: java.io.FileNotFoundException: http://localhost:8080/
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.HttpURLConnection.getResponseCode(Unknown Source)
at TestGet.doGet(TestGet.java:26)
... 1 more
java.io.FileNotFoundException: http://localhost:8080/

You are getting a 404 error, which means the requested resource was not found. First, make sure that a server is serving at http://localhost:8080/ and that it returns some content with code 200. If not, there is nothing we can do to help you.
The easiest way to test whether there is anything at the URL is to paste it into the web browser's address bar and press Enter. However, this does not guarantee that the Java code will be able to access it: for example, the server may be designed to respond with 404 if it does not find a web browser User-Agent header.
Since the server returns a status code (whether 200 or 404), this is not a firewall problem.
According to the latest edit of your question, you can view the page in the web browser but cannot download it with your Java code, and the header seems to be set correctly. There are only two problems I can see:
You should not call connection.setDoOutput(true);. This forces the connection to do an HTTP POST instead of a GET, and the server may not support POST.
Your server may always be returning 404 even when it should return 200. Since the web browser ignores the error status and renders whatever content it receives, the page appears to work in the browser. If so, fix the server to respond correctly first; otherwise, try reading the error stream instead, via HttpURLConnection#getErrorStream().
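Both points can be sketched together: a plain GET (without setDoOutput(true)) that reads the body from getErrorStream() whenever the status is 400 or above, so a page the server sends along with a 404 can still be inspected. The class name and the UTF-8 decoding are illustrative assumptions, not taken from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Sketch: read a page's status and body even when the status is 404. */
final class StatusAndBody {

    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /** Plain GET: no setDoOutput(true), since nothing is written to the request body. */
    static Response get(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "*/*");
        try {
            int status = conn.getResponseCode();
            // For status >= 400, getInputStream() throws (FileNotFoundException for 404);
            // whatever body the server sent is available from getErrorStream() instead
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            if (in == null) {
                return new Response(status, "");
            }
            try (InputStream input = in) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = input.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                // Assumes UTF-8; a real client would honor the charset in Content-Type
                return new Response(status, new String(buffer.toByteArray(), StandardCharsets.UTF_8));
            }
        } finally {
            conn.disconnect();
        }
    }
}
```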

I had a similar issue. For me it helped to inspect the packets using RawCap. RawCap is one of the few Windows packet sniffers that lets you sniff localhost.
In my case, the server was returning a 404 due to an authentication issue.

If the URL http://localhost:8080/ can be accessed in the web browser, the code should work. I ran the program on my machine and it works, so you should check whether the web server service is OK.

I know this is very late in the game, but I recently had the same issue and none of the solutions here worked for me. In my case, another process was running on the same port and stealing the requests from the Java app. Using yair's answer here, you can check for a process running on the same port: in the command prompt, run netstat -nao | find "8080" on Windows or netstat -nap | grep 8080 on Linux. It should show a line with LISTENING and 127.0.0.1:8080, followed by the process ID. Just terminate that process and you should be good to go.

I had this problem too. In my case, there was an invisible Unicode character in the URL string, so the connection couldn't open it (that is what the FileNotFoundException indicates). I removed it and it worked.
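A quick way to spot such a character is to scan the URL string before opening it. The hypothetical check below reports every code point outside printable ASCII (0x21 to 0x7E) together with its index. It is deliberately blunt, so legitimate Unicode in internationalized URLs would be reported too. String.trim() would not help here, because it only strips characters up to U+0020.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: list every code point outside printable ASCII (0x21..0x7E) in a URL string. */
final class UrlCharCheck {

    static List<String> suspiciousChars(String url) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < url.length(); ) {
            int cp = url.codePointAt(i);
            // Catches spaces, control characters and invisible ones such as U+200B or U+FEFF
            if (cp < 0x21 || cp > 0x7E) {
                found.add(String.format("index %d: U+%04X", i, cp));
            }
            i += Character.charCount(cp);
        }
        return found;
    }

    public static void main(String[] args) {
        String url = "http://localhost:8080/\u200B"; // hypothetical URL with a zero-width space
        System.out.println(suspiciousChars(url)); // [index 22: U+200B]
    }
}
```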

I had a similar scenario where the web service processed POST requests from the browser (in my case Postman, an API testing Chrome extension) correctly, but HttpURLConnection kept failing with a 404 for large payloads. I mistakenly assumed that the problem must be in my HttpURLConnection client code.
When I later tried to replicate the request from cURL with a large payload, I got the same 404 error. Even though I used the cURL code generated by Postman, which should therefore be identical to Postman's request, the web service reacted differently to the two requests. Some client middleware in Postman may have intercepted and modified the requests.
TL;DR
Check the web service; it may be the culprit. Try another barebones, non-browser HTTP client such as cURL to see how the web service reacts.
