Why can't I read some HTTPS pages with Java code? - java

I wrote a Java program like the one I saw here:
How to read the https page content using java?
but for some sites the code does not work.
I get the error: Server returned HTTP response code: 403 for URL: https://research.investors.com/stock-quotes/nyse-sailpoint-tech-holdings-sail.htm
It works for
url = "https://maven.apache.org/guides/mini/guide-repository-ssl.html";
Can someone help me?

The HTTP status 403 stands for "Forbidden". Most likely investors.com inspects your request headers and denies the resource.
Try modifying the request headers, using a User-Agent that the site might accept.

403 Forbidden
The request contained valid data and was understood by the server, but the server is refusing action. This may be due to the user not having the necessary permissions for a resource or needing an account of some sort, or attempting a prohibited action (e.g. creating a duplicate record where only one is allowed). This code is also typically used if the request provided authentication by answering the WWW-Authenticate header field challenge, but the server did not accept that authentication. The request should not be repeated.
So the website you want to scrape has probably restricted requests like yours (I mean requests that were not made from a browser).
But you can try Selenium.

OK, I solved it.
I used con.setRequestProperty to set "User-Agent", "Accept", "Content-Type" and "Accept-Language".
Thank you.
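
For readers who want to see what that fix might look like, here is a minimal sketch. It is not the asker's actual code: the class name BrowserLikeFetch, the helper names and the header values are assumptions chosen for illustration, and the page is assumed to be UTF-8. "Content-Type" is left out because a plain GET carries no body; add it back if the site turns out to need it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class BrowserLikeFetch {

    // Browser-like request headers. The values are illustrative assumptions;
    // copy the ones your own browser sends if the site still answers 403.
    static Map<String, String> browserLikeHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        headers.put("Accept", "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8");
        headers.put("Accept-Language", "en-US,en;q=0.5");
        return headers;
    }

    // Fetches the page as text, sending the headers above with the request.
    static String fetch(String address) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        con.setRequestMethod("GET");
        for (Map.Entry<String, String> header : browserLikeHeaders().entrySet()) {
            con.setRequestProperty(header.getKey(), header.getValue());
        }

        int status = con.getResponseCode();
        if (status >= 400) {
            throw new IOException("HTTP " + status + " for " + address);
        }

        StringBuilder page = new StringBuilder();
        // Assumption: the page is encoded as UTF-8.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java BrowserLikeFetch <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

Without the setRequestProperty calls, HttpURLConnection sends its own default User-Agent, which some sites reject. Whether these particular headers are enough depends on the site.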

Related

Rest Assured - Getting "Operation Timed Out" error while testing with RestAssured, but the same endpoint works fine in the browser

I have an endpoint to test with RestAssured. The same endpoint works fine when I open it in a browser or Postman, but when I test it with RestAssured
I get an Operation Timed Out error.
I had to connect through a proxy to make the endpoint work in the browser, so I used the same proxy in RestAssured.
Sample code below:
given().proxy("My_Proxy_URL_HERE",8080).when().get("My_API_URL_Here").then().log().all();
The response I get is
"Operation Timed Out" with status code 503.
What could the issue be, and how can I debug it? Any suggestion is appreciated. Thanks in advance.
There can be many reasons for this behavior:
The address is simply wrong. If a load balancer or proxy sits in front of the server, it can be configured to wait for a certain period of time and then respond with a 503 status code.
Note that 503 does not mean "request timed out" but "Service Unavailable".
The request URL is good, but the request lacks some headers, so the load balancer or proxy cannot route it to the required server.
How to check this? A few approaches can come in handy in this situation:
Check the access logs of the load balancer or proxy, and of your server too if possible, and look for the request.
If that doesn't help, compare the request coming from rest-assured with a regular request. You can use a tool like Burp (there are others), or you can even roll your own.
The idea is simple:
1. Start the "interceptor" on some port of your local computer (say, 9999).
2. Configure the interceptor to forward all requests to the proxy of your choice (identified by its URL, My_Proxy_URL_HERE, and port 8080).
3. Point rest-assured at localhost:9999, so that the request is intercepted by this tool. You will be able to inspect its contents: headers, body, HTTP method, everything.
4. Do the same for the browser request and compare.
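
A home-made tool can start smaller than a full forwarding interceptor. The sketch below is a deliberate simplification, not the interceptor described above: it does not forward anything to the proxy. It only listens on port 9999 (the example port from the steps), prints the request line and headers of whatever connects, and answers 200 with an empty body. The class name RequestDumper is made up for this example, request bodies are not read, and only plain HTTP is supported.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RequestDumper {

    // Reads the request line and headers, i.e. everything up to and including
    // the blank line (CRLF CRLF) that ends the HTTP request head.
    static String readRequestHead(InputStream in) throws IOException {
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        byte[] terminator = {'\r', '\n', '\r', '\n'};
        int matched = 0; // number of terminator bytes matched so far
        int b;
        while (matched < terminator.length && (b = in.read()) != -1) {
            head.write(b);
            if (b == terminator[matched]) {
                matched++;
            } else {
                // A lone CR may still be the start of the terminator.
                matched = (b == '\r') ? 1 : 0;
            }
        }
        return new String(head.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        int port = 9999; // example port taken from the steps above
        try (ServerSocket server = new ServerSocket(port)) {
            System.out.println("Dumping requests sent to http://localhost:" + port);
            while (true) {
                try (Socket client = server.accept()) {
                    System.out.println(readRequestHead(client.getInputStream()));
                    // Reply with an empty 200 so the client does not hang.
                    OutputStream out = client.getOutputStream();
                    out.write(("HTTP/1.1 200 OK\r\n"
                            + "Content-Length: 0\r\n"
                            + "Connection: close\r\n\r\n")
                            .getBytes(StandardCharsets.ISO_8859_1));
                    out.flush();
                }
            }
        }
    }
}
```

To use it, send the rest-assured request and then the browser request to http://localhost:9999/ with the same path, and compare the two dumps line by line. A header that appears only in the browser dump is a candidate for what the load balancer or proxy needs.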

request.getHeader("referer") not working in HTTPS

Can anyone please guide me on how to fetch request.getHeader("referer") in HTTPS mode?
Currently it returns null.
Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec15.html#sec15.1.3

Android Volley - Strange Error with HTTP code 401- java.io.IOException: No authentication challenges found

I hit this error when I send a request and get back a response with code 401:
com.android.volley.NoConnectionError: java.io.IOException: No authentication challenges found
Some people say that:
This error happens because the server sends a 401 (Unauthorized) but does not give a "WWW-Authenticate" header, which is a hint for the client about what to do next. The "WWW-Authenticate" header tells the client which kind of authentication is needed (either Basic or Digest). This is usually not very useful in headless HTTP clients, but that's how the standard is defined. The error occurs because the lib tries to parse the "WWW-Authenticate" header but can't.
( android - volley error No authentication challenges found )
But this is quite strange to me, because I don't want to use WWW-Authenticate at all. I just want to get the 401 code, yet I always get the exception.
How can I work around this problem? Any suggestion is really appreciated.
I did some research and came to the conclusion that this is a server issue: the server does not follow the convention.
From wiki:
401 Unauthorized (RFC 7235)
Similar to 403 Forbidden, but specifically for use when authentication is required and has failed or has not yet been provided. The response must include a WWW-Authenticate header field containing a challenge applicable to the requested resource. See Basic access authentication and Digest access authentication.
I think the best way to solve this problem is to fix it on the server by adding the header, something like:
WWW-Authenticate: xBasic realm=""
In my case I cannot change the server, so I have to check the error message to detect the 401 error:
// Null-safe comparison: getMessage() may return null for other errors.
if ("java.io.IOException: No authentication challenges found".equalsIgnoreCase(error.getMessage())) {
    showLoginError();
}
Not a very elegant solution, but it works for now.

403 Forbidden error when I click on the link. Works fine when I enter the URL in the browser directly

I have a problem opening a site through a link in Tomcat: I get a 403 Forbidden.
My steps are:
1. Launch http://hostname
2. Select https://hostname/Site
3. 403 forbidden message
When I go to the URL directly in the browser, like this: https://hostname/site, everything works fine.
Even selecting https://hostname/OtherSite works fine.
Can anyone help? Thanks in advance.
Here is my case.
Your original page (with the link) and the target page it links to are not on the same domain:
original-domain and target-domain.
If your target-domain does not allow cross-origin requests, you get the 403 Forbidden error.
Clicking a link and entering the URL send different requests.
I found that the difference is in the request headers.
When I click the link (403 Forbidden error),
the request headers contain this line:
Referer: http://original-domain/json2tree/ipfs/ipfsList.html
When I enter the URL (no 403 Forbidden),
the request headers do NOT contain the Referer line above.
I finally figured out how to fix this error!
On your original-domain web page, you have to add
<meta name="referrer" content="no-referrer" />
It prevents the browser from sending the Referer header, and it works both for links and for Ajax requests.

Error while calling a SOAP web service in Eclipse

java.io.IOException: Server returned HTTP response code: 500 for URL: https://***/fiwebservice/services/FIUsbWebService
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1459)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:234)
at com.abcde.testClient.TestClientTry.main(TestClientTry.java:109)
I have replaced the host in the URL with *** for security reasons, as it is confidential.
Why do I get an error when I call a SOAP web service in Eclipse?
Please help me with this.
It seems that there is an error in com.abcde.testClient.TestClientTry. Could you provide the logs and the source of the file?
HTTP 500 can mean many things. With Spring Security, I think it can mean that you did not have the appropriate authentication to reach the resource. Without knowing much about your server side, it is hard to say what the problem is or how to solve it.
What kind of technology do you have on the server?
HTTP status code 500 usually means that the web server code crashed.
If HttpURLConnection#getResponseCode() reports an error status, you need to read HttpURLConnection#getErrorStream() instead of getInputStream(), so check the status code first. The error stream may contain information about the problem.
If the host had blocked you, you would have received a 4xx status code, most likely 401 or 403.
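
The advice about getErrorStream() can be sketched like this. It is a minimal illustration, not the asker's client: the class name ErrorBodyReader and the method readBody are made up, and the body is assumed to be UTF-8 text. For a SOAP call you would configure the connection as a POST and write the envelope first, then pass the same connection to readBody.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class ErrorBodyReader {

    // Returns the response body, taking it from getErrorStream() when the
    // status is 4xx/5xx, because getInputStream() throws in that case.
    static String readBody(HttpURLConnection con) throws IOException {
        int status = con.getResponseCode();
        InputStream stream = status >= 400 ? con.getErrorStream() : con.getInputStream();
        if (stream == null) {
            return ""; // the server sent no body with the error
        }
        try (InputStream in = stream) {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                body.write(buffer, 0, n);
            }
            // Assumption: the body is UTF-8 text (for example a SOAP fault).
            return new String(body.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ErrorBodyReader <url>");
            return;
        }
        HttpURLConnection con =
                (HttpURLConnection) URI.create(args[0]).toURL().openConnection();
        String body = readBody(con);
        System.out.println("HTTP " + con.getResponseCode());
        System.out.println(body);
    }
}
```

With a 500 response from a SOAP service, the text printed this way is often a fault message that names the actual server-side problem, which the bare "Server returned HTTP response code: 500" exception hides.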
