I want to fetch a web page from an ASP.NET site that is only accessible from within a session. I'm using Apache HttpClient. I first open the main page of the site, then I search it for the link to the "goal" page, and then I send a GET request for the "goal" page. The problem is that the response to the second GET request is always the same (first) page. If I open the site with Firefox or Google Chrome, I get the "goal" page.
From the first response from the server I get the following headers:
HTTP/1.1 200 OK
Date: Sun, 12 Dec 2010 19:03:56 GMT
Server: Microsoft-IIS/6.0
Platform: Mobitel Pla.NET
Node: 4
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Set-Cookie: ASP.NET_SessionId=0vpgd055cifko3mnw4nkuimz; path=/
Cache-Control: no-cache, must-revalidate
Content-Type: text/html; charset=utf-8
Content-Length: 7032
I inspected the traffic with Wireshark and all headers look OK. I send the correct cookie back to the server on the second GET request.
I have only one instance of DefaultHttpClient and I reuse it for the second request. The cookie policy is set to BROWSER_COMPATIBILITY.
Any ideas?
You need to send this header back from the client (that is, return the cookie you received) in all your further requests:
Cookie: ASP.NET_SessionId=0vpgd055cifko3mnw4nkuimz; // and all other cookies
That should do the trick
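As a rough illustration of the advice above, the sketch below turns a received Set-Cookie value into the value of the Cookie request header using the JDK's java.net.HttpCookie. The class and method names (SessionCookieEcho, cookieHeaderFor) are made up for this example, and in practice Apache HttpClient's cookie store normally does this for you.

```java
import java.net.HttpCookie;
import java.util.List;

class SessionCookieEcho {
    /** Builds the value of a Cookie request header from one Set-Cookie header value. */
    static String cookieHeaderFor(String setCookieValue) {
        // HttpCookie.parse drops attributes such as path and keeps name/value pairs.
        List<HttpCookie> cookies = HttpCookie.parse(setCookieValue);
        StringBuilder sb = new StringBuilder();
        for (HttpCookie c : cookies) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(c.getName()).append('=').append(c.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String received = "ASP.NET_SessionId=0vpgd055cifko3mnw4nkuimz; path=/";
        System.out.println("Cookie: " + cookieHeaderFor(received));
    }
}
```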
I found my stupid mistake.
The mistake was that I was sending the second GET request to a link, without replacing the ampersand character codes.
Ex:
/(0vpgd055cifko3mnw4nkuimz)/Mp.aspx?ni=1482&amp;pi=72&amp;_72_url=925b9749-b7c7-4615-9f1a-9b613c344c82
That is wrong, because I send &amp; instead of &
The RIGHT way to do it is:
/(0vpgd055cifko3mnw4nkuimz)/Mp.aspx?ni=1482&pi=72&_72_url=925b9749-b7c7-4615-9f1a-9b613c344c82
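A link scraped from raw HTML source still carries the entity-encoded form of the ampersand (&amp;), so it has to be decoded before it is used as a request URL. The sketch below handles only that one entity; the class and method names are hypothetical, and a full HTML parser or entity decoder would be the more general choice.

```java
class HrefDecoder {
    /** Replaces the HTML entity for the ampersand with a literal '&'. Other entities are left alone. */
    static String decodeAmpersands(String href) {
        return href.replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String raw = "/(0vpgd055cifko3mnw4nkuimz)/Mp.aspx?ni=1482&amp;pi=72"
                + "&amp;_72_url=925b9749-b7c7-4615-9f1a-9b613c344c82";
        System.out.println(decodeAmpersands(raw));
    }
}
```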
I am using a Spring OAuth client setup on my frontend. I'm authenticating against my API, which returns this
{
"error": "unauthorized",
"error_description": "User does not exist"
}
when I use my REST client (Chrome plugin: Advanced REST Client), which is expected. When I use the Spring OAuth client setup, I expected the RestClientException error object to carry that JSON result in its response body, but the body is empty. See the attached image (the watch console shows what is in the response body).
Request sent
grant_type=c_password&username=test&password=test&client_id=test&client_secret=test
Response
Date: Sun, 26 Apr 2015 20:59:45 GMT
Connection: close
Cache-Control: no-store
Pragma: no-cache
Www-Authenticate: Bearer realm="api/", error="unauthorized", error_description="User does not exist"
Content-Type: application/json;charset=UTF-8
Server: Jetty(7.x.y-SNAPSHOT)
Via: 1.1 vegur
This may be a RestTemplate bug - your server (or the proxy) is using a combination of Connection: close and no Content-Length headers.
One way to confirm this: make your server write Content-Length headers (in Spring, adding a ShallowEtagHeaderFilter will do that).
If this workaround fixes this, then this bug has been fixed in SPR-8016 - upgrading the client application to Spring 4.1.5 will solve this.
It seems the actual exception you are getting is not a plain RestClientException. Debug the error; it is probably an HttpClientErrorException. Catch that exception instead of the one above. It has methods, such as getResponseBodyAsString(), to get the desired result.
I'm trying to retrieve links from this page: http://www.seas.harvard.edu/academics/areas
There is a link named "Computer Science" in the middle of the page. Its underlying link is given as "/academics/areas/computer-science". I'm able to convert it to an absolute URL with the Java built-in URL class, obtaining "http://www.seas.harvard.edu/academics/areas/computer-science".
When I click the link in Chrome browser, however, the absolute URL changes to "http://www.seas.harvard.edu/computer-science".
So my question is two-fold:
How does the URL redirect work in this page?
Is there any library or method in Java that would help me obtain the URL after redirect?
I need to obtain the URL after redirect because I want to read the source code of the page but the URL before redirect doesn't work for me. I'm using the JSoup library to read from the URL so I suspect it might be a javascript-based redirect.
From curl --dump-header [file] [URL] the file looked like:
HTTP/1.1 301 Moved Permanently
Age: 0
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0
Content-Type: text/html
Date: Tue, 13 Aug 2013 13:00:12 GMT
ETag: "1376398812"
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Last-Modified: Tue, 13 Aug 2013 13:00:12 GMT
Location: http://www.seas.harvard.edu/computer-science
Server: nginx
Vary: Accept-Encoding
Via: 1.1 varnish
X-AH-Environment: prod
X-Cache: MISS
X-Drupal-Cache: MISS
X-Redirect-ID: 44
X-Varnish: 2704315535
transfer-encoding: chunked
Connection: keep-alive
As you can see this is a 301 permanent redirect served from the server.
To obtain the data:
You can use HttpURLConnection to connect, but before connecting, call myConn.setInstanceFollowRedirects(true). Redirects are then followed, and you can get the input stream and read the final page from it.
To obtain the URL itself:
You can use HttpURLConnection to connect, but before connecting, call myConn.setInstanceFollowRedirects(false) so that redirects are not followed. The redirect target is then available in the Location header of the 301 response.
Besides the typed accessors such as getHeaderFieldDate, HttpURLConnection can return any header by name as a plain string: after making the connection, getHeaderField("Location") gives you the location.
Alternatively, you can iterate an integer index, calling getHeaderFieldKey after making the connection and checking whether the key equals Location; if it does, call getHeaderField with the same integer to get the location. That is more annoying, but it also works.
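The following is a minimal sketch of the non-following variant, not a definitive implementation. To stay runnable without network access it talks to a throwaway local server (the JDK's built-in com.sun.net.httpserver) that answers every request with a 301; that server and the names RedirectProbe, locationOf and startRedirectingServer are assumptions made for illustration only.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

class RedirectProbe {
    /** Returns the Location header of the first response, or null if it has none. */
    static String locationOf(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // stop at the 301/302 itself
            try {
                conn.getResponseCode();             // forces the request to be sent
                return conn.getHeaderField("Location");
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts a local stand-in server that answers every request with a 301 to the given target. */
    static HttpServer startRedirectingServer(String target) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                exchange.getResponseHeaders().add("Location", target);
                exchange.sendResponseHeaders(301, -1); // -1 means no response body
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startRedirectingServer("http://www.seas.harvard.edu/computer-science");
        try {
            int port = server.getAddress().getPort();
            System.out.println(locationOf("http://127.0.0.1:" + port + "/academics/areas/computer-science"));
        } finally {
            server.stop(0);
        }
    }
}
```

Against the real site, locationOf would simply be called with the original URL instead of the local one.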
I used Fiddler to investigate, and for the link http://www.seas.harvard.edu/academics/areas/computer-science the site returns an HTTP 301 response code, which performs the redirect.
If you want to get the real URL, you should send a real request to the harvard.edu web server and parse the response. (The redirect URL is in the Location field of the HTTP header.)
Sorry about your second question. I don't have any Java skills.
This SO question may help (httpclient-4-how-to-capture-last-redirect-url)
There is probably a server-side redirect, e.g. via .htaccess and mod_rewrite. Using Firefox's Console I could see the requests: the server is sending back a 301 Moved Permanently message. This tells the browser to redirect to the address returned in the Location header of the response.
The way you obtain the changed URL depends on the way you load the page:
If you use ready-made libraries and code to load the page into e.g. a DOM object, then you can use that HTTP stack to load the response. It will probably follow the redirect automatically, in which case you get the new URL from the URL of the loaded page. If it does not, then you must check for status code 301 or 302; when one of those is received, the changed URL is in the Location header of the response.
If you have your own code written to load the response via TCP sockets, then you must just load the response as normal, but again check for the 301 and 302 status codes and do as described in the previous section.
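The status check described in both cases can be sketched as a small pure function. It assumes you already have the status code and the Location header from whatever loader you use; the class name RedirectTarget and its resolve method are hypothetical. Resolving against the request URL also covers servers that send a relative Location.

```java
import java.net.URI;

class RedirectTarget {
    /**
     * Returns the absolute redirect target for a 301 or 302 response,
     * or null if the response is not one of those or has no Location header.
     */
    static String resolve(String requestUrl, int status, String location) {
        if ((status != 301 && status != 302) || location == null) {
            return null;
        }
        // URI.resolve leaves an absolute Location as it is and resolves a relative one.
        return URI.create(requestUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        String request = "http://www.seas.harvard.edu/academics/areas/computer-science";
        System.out.println(resolve(request, 301, "http://www.seas.harvard.edu/computer-science"));
        System.out.println(resolve(request, 301, "/computer-science"));
        System.out.println(resolve(request, 200, null));
    }
}
```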
I can only attempt to address Q1 since I'm not a Java programmer. The source code says they're using Drupal, so I speculate that they're using Drupal's global redirect module (SO discussion about Drupal redirect module here). Looking at the module's documentation might shed some light on how to obtain the correct url with Java.
There are also numerous ways in JavaScript to have URL requests automatically redirect to some base page (e.g., the CS homepage), while physically navigating the site lets the user advance to new pages. This is standard practice in many single-page web apps. If this is the case, then @hexafraction's suggestion might help you retrieve the desired URL, though I'm unfamiliar with the Java methods (s)he is suggesting.
You can get the redirect URL with the code below by setting followRedirects to false.
If you set it to true, which is Jsoup's default behavior, you get the source code of the redirected page instead.
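import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// With followRedirects(false), the 301 response itself is returned and its Location header can be read.
Connection con = Jsoup.connect("http://www.seas.harvard.edu/academics/areas/computer-science")
.userAgent("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36")
.followRedirects(false);
Connection.Response res = con.execute();
System.out.println("Redirected Url : " + res.header("Location")); // null if followRedirects is true
Document doc = res.parse(); // parses the body already fetched, avoiding a second request
System.out.println(doc.html());
System.out.println("=================================================");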
I'm using jCryption and JavaCryption, the server-side implementation of the jCryption JavaScript plug-in.
There appears to be an issue with the Java implementation: Firefox's Firebug reports a "not well-formed" error for the JSON that the server returns to the client. So, unlike in the thread "not well-formed" error in Firefox when loading JSON file with XMLHttpRequest, the error comes from the response object, NOT the request object.
I tried adding .JSON with application/json as a MIME type on my web server, IIS 7.5, but that didn't help. Then I tried .JSON with text/plain, and that didn't help either.
Do I have to edit the Java code to force application/json, when it sends it back to the client? Or, what can I do to resolve this issue?
Thank you for any help.
Here is the raw output for the first one causing the "not-well formed" error:
HTTP/1.1 200 OK
Content-Length: 294
Server: Microsoft-IIS/7.5
X-Powered-By: Nothing
Date: Tue, 23 Oct 2012 02:10:24 GMT
{"e":"10001","n":"b3fbbe3d2e3599e840a117be08f72726d8ee643dada3805ab24b9a9150d123a7a0902ae45f2f2e194e5462c4f5c3b91cca91b48d1f07c6cd7fab629a331148f66516df05dfa0bd95cc9f477069e60fa54eab8a5586d08436717758d9706b90c884eded7260af1ce5ff70f507b9c5ddb019b6e1313a77f4eab3b2d04a09934d8d","maxdigits":"131"}
Here is the second one:
HTTP/1.1 200 OK
Content-Length: 200
Server: Microsoft-IIS/7.5
X-Powered-By: Nothing
Date: Tue, 23 Oct 2012 02:10:24 GMT
{"challenge":"zf6iI5D8hVDCmMVuHIFy71ikKxcqVzkLplMDKP6Hgz7EPv2STfYjcBlf6ep1wu5OMCCsPKf4dRECpVvr7yIK8kCm0I5c4xTXCkmnyyzBXeHgbvkzGWVmaLzxj5RYajdWLFkvN1waV41FhR+PtK1tOmGe8k57wSZ/yyZUAsvh7NaJf6THc9P9rQ=="}
You need to look at what is in the actual response: what the response's Content-Type header currently says, and what the body of the response contains. Firebug can show you both.
There is a good chance that the response body is not JSON at all ... but an HTML error report about some problem with the request (as the server sees it).
Either way, you can't resolve the problem properly until you have worked out what is causing it. Simply assuming that it is content type problem is not a sound strategy.
Based on the response you posted, the problem is most likely due to the fact that there is no Content-Type header. If this response comes from Java, then you probably do need to modify the Java (or JSP) to set the missing header in the response.
Set Content-Type to application/json in jsp file
Setting a Content-type header in a servlet.
(There are other ways to do this if it is impossible to change the servlet or JSP code.)
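In a servlet the missing header would normally be set with response.setContentType("application/json"). Because the servlet API is not part of the JDK, the sketch below swaps in the JDK's built-in com.sun.net.httpserver as a stand-in to show the same idea: set Content-Type before writing the JSON body. All class and method names here are made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class JsonContentTypeDemo {
    /** Starts a local server that returns the given JSON with an explicit Content-Type header. */
    static HttpServer startJsonServer(String json) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                byte[] body = json.getBytes(StandardCharsets.UTF_8);
                // The header must be set before the response headers are sent.
                exchange.getResponseHeaders().set("Content-Type", "application/json; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fetches the URL and returns the Content-Type header the client sees. */
    static String contentTypeOf(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            try {
                return conn.getContentType();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startJsonServer("{\"e\":\"10001\",\"maxdigits\":\"131\"}");
        try {
            System.out.println(contentTypeOf("http://127.0.0.1:" + server.getAddress().getPort() + "/"));
        } finally {
            server.stop(0);
        }
    }
}
```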
I made a very simple HTTP server in Java. The response sent to the browser is
HTTP 1.1 200 OK
Server: OneFile 1.0
Content-Type: text/html; charset=utf-8
Content-Length: 202
Transfer-Encoding: chunked
<HTML><HEAD><TITLE>My website</TITLE></HEAD>
<BODY><H1>Document </H1>
</BODY></HTML>
Mozilla Firefox displays it as text/plain, although it should be text/html. Why?
I suspect the setup info is ignored. Does it make any difference to the browser that I make the connection on port 8080?
Thanks for any help
The browser will honor your headers. Unfortunately, your response is malformed for several reasons:
the response should start HTTP/1.1, not HTTP 1.1
you specify Transfer-Encoding: chunked, but your response does not follow the chunked format.
It appears that Firefox, quite sensibly, refuses to interpret such a malformed response and just shows it unchanged.
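For comparison, here is a minimal sketch of a response that avoids both problems: it starts with HTTP/1.1 and uses Content-Length alone, computed from the encoded body, instead of combining it with Transfer-Encoding: chunked. The class name SimpleHttpResponse and its build method are hypothetical; the header values are taken from the question.

```java
import java.nio.charset.StandardCharsets;

class SimpleHttpResponse {
    /** Builds a complete HTTP/1.1 response for the given HTML document. */
    static byte[] build(String html) {
        byte[] body = html.getBytes(StandardCharsets.UTF_8);
        String head = "HTTP/1.1 200 OK\r\n"
                + "Server: OneFile 1.0\r\n"
                + "Content-Type: text/html; charset=utf-8\r\n"
                // Length in bytes of the encoded body, not in characters.
                + "Content-Length: " + body.length + "\r\n"
                + "\r\n"; // blank line separates headers from body
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] response = new byte[headBytes.length + body.length];
        System.arraycopy(headBytes, 0, response, 0, headBytes.length);
        System.arraycopy(body, 0, response, headBytes.length, body.length);
        return response;
    }

    public static void main(String[] args) {
        String html = "<HTML><HEAD><TITLE>My website</TITLE></HEAD>\n"
                + "<BODY><H1>Document </H1>\n"
                + "</BODY></HTML>\n";
        System.out.print(new String(build(html), StandardCharsets.UTF_8));
    }
}
```

The returned bytes can be written to the socket's output stream as they are.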
I have a Java application running under Tomcat, which is fronted by an Apache web server.
In my code I set the cookie domain to
.example.com
but my cookies still show up under www.example.com instead of example.com in the client browser. What is strange is that the Google Analytics cookies show up under example.com, yet my own code cannot store cookies under example.com.
The Apache server is set up so that requests for example.com show up as www.example.com in the client browser's address bar, in case that is related to the issue. I do need this, because otherwise different session IDs are generated for example.com and www.example.com, which is bad for my application.
"Apache server is setup such that requests for example.com shows up as www.example.com in the client browser address bar if that is related to the issue ?"
I am not 100% sure, but this looks like the root of the problem. How does Apache make the client browser display www.example.com instead of example.com? Most probably by redirecting each request for example.com to www.example.com. When the browser processes the redirect, it sends a request for www.example.com and from that point on considers itself to be working with www.example.com.
Now, what happens when there is a Set-Cookie in the response header? The browser will obviously treat it as coming from www.example.com. There is no way a browser would allow such a cookie to set its domain to .example.com, because it would be a security problem. Imagine that mysite.somefreehosting.com sets a cookie for the domain .somefreehosting.com. Then someothersite.somefreehosting.com would receive this cookie, which may lead to a lot of trouble. The standard specifies that such a cookie should be rejected, but I wouldn't be surprised if some browsers are smart enough to handle such cases and treat .example.com as www.example.com.
To be sure, I recommend that you check what exactly your site sends to the browser by sending a request with something like the lwp-request script. You'll see which redirects happen and which headers are actually set in the response, like this:
alqualos@ubuntu:~$ lwp-request -sSed http://google.com/
GET http://google.com/ --> 301 Moved Permanently
GET http://www.google.com/ --> 302 Found
GET http://www.google.co.il/ --> 200 OK
Cache-Control: private, max-age=0
Connection: close
Date: Sat, 18 Dec 2010 18:54:57 GMT
Server: gws
Content-Type: text/html; charset=windows-1255
Content-Type: text/html; charset=windows-1255
Expires: -1
Client-Date: Sat, 18 Dec 2010 18:54:57 GMT
Client-Peer: 173.194.37.104:80
Client-Response-Num: 1
Set-Cookie: PREF=ID=368e9cfd56643257:FF=0:TM=1292698497:LM=1292698497:S=s-Jur84NgaNH5Mzx;
expires=Mon, 17-Dec-2012 18:54:57 GMT; path=/; domain=.google.co.il
Set-Cookie: NID=42=bZ6goDV_b2MiWlTMONwiijaON5U_TBGB2_yNheonEwA1GVLU77EhyfUhk9Wvj70xTFrpvGy4s_aBp1UZtvRRnsnYjacjz_UVx0_iSr9R3nYXMyRtwkS5qV98_Egb16pZ;
expires=Sun, 19-Jun-2011 18:54:57 GMT; path=/; domain=.google.co.il; HttpOnly
Title: Google
X-XSS-Protection: 1; mode=block