I need to log URLs that are linking to my site in a Java Servlet.
It's available in the HTTP referer header. You can get it in a servlet as follows:
String referrer = request.getHeader("referer"); // Yes, with the legendary misspelling.
You, however, need to realize that this is a client-controlled value and can thus be spoofed to something entirely different or even removed. Thus, whatever value it returns, you should not use it for any critical business processes in the backend, but only for presentation control (e.g. hiding/showing/changing certain pure layout parts) and/or statistics.
For the interested, background about the misspelling can be found in Wikipedia.
Actually it's:
request.getHeader("Referer"),
or even better, and to be 100% sure,
request.getHeader(HttpHeaders.REFERER),
where HttpHeaders is com.google.common.net.HttpHeaders
The URLs are passed in the request: request.getRequestURL().
If you mean other sites that are linking to you? You want to capture the HTTP Referrer, which you can do by calling:
request.getHeader("referer");
As all have mentioned it is
request.getHeader("referer");
I would like to add some more details about security aspect of referer header in contrast with accepted answer. In Open Web Application Security Project(OWASP) cheat sheets, under Cross-Site Request Forgery (CSRF) Prevention Cheat Sheet it mentions about importance of referer header.
More importantly for this recommended Same Origin check, a number of HTTP request headers can't be set by JavaScript because they are on the 'forbidden' headers list. Only the browsers themselves can set values for these headers, making them more trustworthy because not even an XSS vulnerability can be used to modify them.
The Source Origin check recommended here relies on three of these
protected headers: Origin, Referer, and Host, making it a pretty
strong CSRF defense all on its own.
You can refer Forbidden header list here. User agent(ie:browser) has the full control over these headers not the user.
Related
Title already says it, when i do an web-api call on my jetty servlet endpoint.
Is it enough to catch the
request.getMethod().equals("OPTIONS")
flag?
Or are there other possibilities to determine if the current request is of type preflight?
Preflight requests are sent as OPTIONS so you can check for that, yes.
But depending on what you are doing (I'm assuming some CORS filter?) you might also want to check if the following headers are present on the preflight: Access-Control-Request-Method (what HTTP method will be used when the actual request is made) and Access-Control-Request-Headers (what HTTP headers will be used when the actual request is made).
I need to write an API to check if a user name already exists in a database.
I want my server (Struts Action class instance in tomcat server) to return true/false.
Its something like this
checkUserName?userName=john
I want to know what is the standard way to do this?
Shall I return a JSON response with just one boolean value ... seems like a overkill.
Shall I do something like manually setting the HTTP header to 200 or 404 (for true/false), but that seems to violate the actual purpose of using the headers which I believe must only be used to indicate network failures etc.
(Too long for a comment.)
I don't see any reason not to return a standard JSON response with something indicating whether or not the user name exists. That's what APIs do: there's nothing "overkill" about providing a response useful across clients.
To your second point: headers do a lot more than "indicate network problems". A 404 isn't a network problem, it means the requested resource doesn't exist. It is not appropriate in your case, because you're not requesting a resource: the resource is checkUserName, which does exist. If instead your request was /userByName/john a 404 would be appropriate if the user didn't exist. That's not an appropriate request in this case, because you don't want to return the user.
A 401 isn't a network problem, it's an authentication issue. A 302 isn't a network problem, it's a redirect. Etc. Using HTTP response codes is entirely appropriate, if they match your requests.
I'm connecting to URLs with Java (HttpURLConnection).
I have noticed that in some cases, the response code is 3xx but the 'Location' header is empty.
How does a client browser know where to redirect after receiving this kind of HTTP response?
Thanks
Not all 3xx replies can redirect automatically.
300 provides multiple URLs in the response body, not in the Location header. The client/user has to decide which one to retrieve next.
301, 302, 303, and 307 provide a Location only if the next URL is known. Otherwise, the client/user has to decide what to do next.
304 is not a redirect. It is a response to a conditional GET, where the requested content has not changed since the requested criteria was last satisfied.
305 always provides a Location to the required proxy to connect to.
306 is not used anymore.
If you look at the HTTP spec on some of the 3xx status codes, some of them only SHOULD provide a Location header.
How does a client browser know where to redirect after receiving this
kind of HTTP response?
It doesn't. It's up to the client to handle what to do in that case.
Location headers redirect the user agent to retrieve another URI reference when used with 3xx redirection status codes, except for 304 Not Modified. Both absolute URIs and relative references can be provided, including empty references which refer to the current resource (see the URI specification for further information).
Still, only Firefox and old Edge accept empty Location headers; the new Edge and Chrome don't. Although HTTP redirects are only meant to redirect to different resources or URIs (see RFC 7231 section 6.4), all browsers implement non-empty Location headers that explicitly refer to the same page.
Whenever the user agent receives a redirection status code but no Location header (or an invalid Location header or in the case of Chrome, an empty Location header) it will not redirect but show the response body. This also applies when the user disables automatic redirection. Therefore the response body should also include the respective link.
Empty Location headers may obviously introduce redirect loops. Nevertheless, the status code 303 See Other could be used in conjunction with an empty Location header to implement the Post/Redirect/Get idiom using the very same URI. This idiom prevents users from resubmitting the same form using POST when reloading the page because 303 See Other requires the user agent to use the GET request method when following the new Location. 301 Moved Permanently and 302 Found may also change the request method to GET (but they also may not); 307 Temporary Redirect and 308 Permanent Redirect never change the request method.
While this use case seems kind of elegant I would not recommend to implement it because browser support diverges.
I am looking for a clean/simple way in HtmlUnit to request a webpage from a server in a specific language.
To do this i have been trying to request "bankofamerica.com" for their homepage in spanish instead of english.
This is what i have done so far:
I tried to set "Accept-Language" header to "es" in the Http request. I did this using:
myWebClient.addRequestHeader("Accept-Language" , "es");
It did not work. I then created a web request with the following code:
URL myUrl = new URL("https://www.bankofamerica.com/");
WebRequest myRequest = new WebRequest(myUrl);
myRequest.setAdditionalHeader("Accept-Language", "es");
HtmlPage aPage = myWebClient.getPage(myRequest);
Since this failed too i printed out the request object for this url , to check if these headers are being set.
[<url="https://www.bankofamerica.com/", GET, EncodingType[name=application/x-www-form-urlencoded], [], {Accept-Language=es, Accept-Encoding=gzip, deflate, Accept=*/*}, null>]
So the server is being requested for a spanish page but in response its sending the homepage in english (the response header has the value of Content-Language set to en-US)
I did find a hack to retrieve the BOA page in spanish. I visited this page and used the chrome developer tool to get the cookie value from the request
header. I used this value to do the following:
myRequest.setAdditionalHeader("Cookie", "TLTSID= ........._LOCALE_COOKIE=es-US; CONTEXT=es_US; INTL_LANG=es_US; LANG_COOKIE=es_US; hp_pf_anon=anon=((ct=+||st=+||fn=+||zc=+||lang=es_US));..........1870903; throttle_value=43");
I am guessing the answer lies somewhere here.
Here lies my next question. If i am writing a script to retrieve 100 different websites in Spanish (ie Assuming they all have their pages in the spanish) . Is there a clean way in HtmlUnit to accomplish this.
(If cookies is indeed a solution then to create them in htmlunit you need to specify the domain name. One would have to then create cookies for each of the 100 sites. As far as i know there is no way in HtmlUnit to do something like:
Cookie langCookie = new Cookie("All Domains","LANG_COOKIE","es_US");
myWebClient.getCookieManager().addCookie(langCookie);)
NOTE: I am using HtmlUnit 2.12 and setting BrowserVersion.CHROME in the webclient
Thanks.
Regarding your first concern the clear/simple(/only?) way of requesting a webpage in a particular language is, as you said, to set the HTTP Accept-Language request header to the locale(s) you want. That is it.
Now the fact that you request a page in a particular language doesn't mean that you will actually get a page in that language. The server has to be set up to process that HTTP header and respond accordingly. Even if a site has a whole section in spanish it doesn't mean that the site is responding to the HTTP header.
A clear example of this is the page you provided. I performed a quick test on it and found that it is clearly not responding accordingly to the Accept-Language I've set (which was es). Hitting the home page using es resulted in getting results in english. However, the page has a link that states En Español which means In Spanish the page does switch to spanish and you get redirected to https://www.bankofamerica.com?request_locale=es_US.
So you might be tempted to think that the page handles the locale by a request parameter. However, that is not (only) the case. Because if you then open the home page again (without the locale parameter) you will see the Spanish version again. That is clearly a proof that they are being stored somewhere else, most likely in the session, which will most likely be handled by cookies.
That can easily be confirmed by opening a private session or clearing the cookies and confirming this behaviour (I've just done that).
I think that explains the mystery of the webpage existing in Spanish but being fetched in English. (Note how most bank webpages do not conform to basic standards such as responding to simple HTTP requests... and they are handling our money!)
Regarding your second question, it would be like asking What is the recipe to not get ill ever?. It just doesn't depend on you. Also note that your first concerned used the word request while your second concern used the word retrieve. I think it should be clear by now that you can only be 100% sure of what you request but not of what you retrieve.
Regarding setting a value in a cookie manually, that is technically possible. However, that is just like adding another parameter in a get request: http://domain.com?login=yes. The parameter will only be processed by the server if it is expecting it. Otherwise, it will be ignored. That is what will happen to the value in your cookie.
Summary: There are standards to follow. You can try to use them but if the one in the other side doesn't then you won't get the results you expect. Your best choice: do your best and follow the standards.
I am having an issue with ajax caching, This was a problem in IE browser too but i fixed it by Writing the Following code.
response.setHeader("Cache-Control", "no-cache");
response.setHeader("expires","-1");
response.setHeader("pragma","no-cache");
But I see Safari4.0 on MAC is Caching the Ajax request(We have a requirment to support this). Fire Fox never a problem. Regarding this "Expire" i am setting it to -1, i see lot of places it is set 0 or some old date from past. Will it make a difference?
Send an extra parameter with your GET request that will never be the same, for example, the current timestamp. Something like:
url = url + '&nocache=' + new Date().getTime();
This will prevent caching.
First, a note on your Expires header. Your question doesn't specify what server framework you're using, so I'm not sure if this is applicable. However, it looks like you might be sending an invalid Expires header.
The RFC requires Expires to be a date, however you appear to be setting the header to a literal "-1". There are many frameworks that have an expires property on their HTTP response object that takes an int and automatically calculates a date that is that number of seconds from now.
Use a HTTP inspector to ensure that your server is sending a validly formatted date and not -1 in the Expires header.
You might try making your Cache-Control header more restrictive:
response.setHeader("Cache-Control", "private, no-cache, no-store, must-revalidate");
must-revalidate tells caches that they must obey any freshness information you give them. HTTP allows caches to serve stale representations under special conditions; by specifying this header, you’re telling the cache that you want it to strictly follow your rules. [1]
According to RFC 2616 section 9.5 about POST
Responses to this method are not cacheable, unless the response
includes appropriate Cache-Control or Expires header fields. However,
the 303 (See Other) response can be used to direct the user agent to
retrieve a cacheable resource.
So, the browser must not cache POST responses, unless the response specifies otherwise. In the same time, browsers may cache GET responses, unless the response specifies otherwise. So, for the requests that should not be cached, such as AJAX requests, POST is preferrable method.
If you, for any reason, don't want to use POSTs for AJAX, you should use the trick mentioned by minitech, it is in fact widely used to force browser to load current version of any resource.