HtmlUnit: Request website from server in a specific language - java

I am looking for a clean/simple way in HtmlUnit to request a webpage from a server in a specific language.
To do this i have been trying to request "bankofamerica.com" for their homepage in spanish instead of english.
This is what i have done so far:
I tried to set "Accept-Language" header to "es" in the Http request. I did this using:
myWebClient.addRequestHeader("Accept-Language" , "es");
It did not work. I then created a web request with the following code:
URL myUrl = new URL("https://www.bankofamerica.com/");
WebRequest myRequest = new WebRequest(myUrl);
myRequest.setAdditionalHeader("Accept-Language", "es");
HtmlPage aPage = myWebClient.getPage(myRequest);
Since this failed too i printed out the request object for this url , to check if these headers are being set.
[<url="https://www.bankofamerica.com/", GET, EncodingType[name=application/x-www-form-urlencoded], [], {Accept-Language=es, Accept-Encoding=gzip, deflate, Accept=*/*}, null>]
So the server is being requested for a spanish page but in response its sending the homepage in english (the response header has the value of Content-Language set to en-US)
I did find a hack to retrieve the BOA page in spanish. I visited this page and used the chrome developer tool to get the cookie value from the request
header. I used this value to do the following:
myRequest.setAdditionalHeader("Cookie", "TLTSID= ........._LOCALE_COOKIE=es-US; CONTEXT=es_US; INTL_LANG=es_US; LANG_COOKIE=es_US; hp_pf_anon=anon=((ct=+||st=+||fn=+||zc=+||lang=es_US));..........1870903; throttle_value=43");
I am guessing the answer lies somewhere here.
Here lies my next question. If i am writing a script to retrieve 100 different websites in Spanish (ie Assuming they all have their pages in the spanish) . Is there a clean way in HtmlUnit to accomplish this.
(If cookies is indeed a solution then to create them in htmlunit you need to specify the domain name. One would have to then create cookies for each of the 100 sites. As far as i know there is no way in HtmlUnit to do something like:
Cookie langCookie = new Cookie("All Domains","LANG_COOKIE","es_US");
myWebClient.getCookieManager().addCookie(langCookie);)
NOTE: I am using HtmlUnit 2.12 and setting BrowserVersion.CHROME in the webclient
Thanks.

Regarding your first concern the clear/simple(/only?) way of requesting a webpage in a particular language is, as you said, to set the HTTP Accept-Language request header to the locale(s) you want. That is it.
Now the fact that you request a page in a particular language doesn't mean that you will actually get a page in that language. The server has to be set up to process that HTTP header and respond accordingly. Even if a site has a whole section in spanish it doesn't mean that the site is responding to the HTTP header.
A clear example of this is the page you provided. I performed a quick test on it and found that it is clearly not responding accordingly to the Accept-Language I've set (which was es). Hitting the home page using es resulted in getting results in english. However, the page has a link that states En Español which means In Spanish the page does switch to spanish and you get redirected to https://www.bankofamerica.com?request_locale=es_US.
So you might be tempted to think that the page handles the locale by a request parameter. However, that is not (only) the case. Because if you then open the home page again (without the locale parameter) you will see the Spanish version again. That is clearly a proof that they are being stored somewhere else, most likely in the session, which will most likely be handled by cookies.
That can easily be confirmed by opening a private session or clearing the cookies and confirming this behaviour (I've just done that).
I think that explains the mystery of the webpage existing in Spanish but being fetched in English. (Note how most bank webpages do not conform to basic standards such as responding to simple HTTP requests... and they are handling our money!)
Regarding your second question, it would be like asking What is the recipe to not get ill ever?. It just doesn't depend on you. Also note that your first concerned used the word request while your second concern used the word retrieve. I think it should be clear by now that you can only be 100% sure of what you request but not of what you retrieve.
Regarding setting a value in a cookie manually, that is technically possible. However, that is just like adding another parameter in a get request: http://domain.com?login=yes. The parameter will only be processed by the server if it is expecting it. Otherwise, it will be ignored. That is what will happen to the value in your cookie.
Summary: There are standards to follow. You can try to use them but if the one in the other side doesn't then you won't get the results you expect. Your best choice: do your best and follow the standards.

Related

How to write a Java code to read fields from a website that requires login and uses POST request?

Need some help with fetching some data from a website.
Previously , we had following code in our application and it used to fetch the required data. We just used to read the required fields by forming a URL by passing username , password and search parameter (DEA number). The same URL (with parameters ) could also be hit from browser directly to see the results. It was a simple GET request:
{URL url = new URL(
"http://www.deanumber.com/Websvc/deaWebsvc.asmx/GetQuery?UserName="+getUsername()+"&Password="+getPassword()+"&DEA="
+ deaNumber
+ "&BAC=&BASC=&ExpirationDate=&Company=&Zip=&State=&PI=&MaxRows=");
Document document = parser.parse(url.toExternalForm());
// Ask the document for a list of all <sect1> tags it contains
NodeList sections = document.getElementsByTagName("DEA");
//Followed by a loop code to get each element by using sections.item(index).getFirstChild() etc.
}
Now, the website URL has got changed to following:
https://www.deanumber.com/RelId/33637/ISvars/default/Home.htm
I am able to login to the URL with credentials , go to the search page , enter the DEA number and search. The login page comes as a pop-up once I click 'Login' link on home page. Also, the final result comes as a pop-up. This is a POST request so I am unable to form the complete URL which I could use in my code.
I am not an expert in Web Services , but I think I need a web service URL like the one mentioned in the code above. Not sure how to get that !! Even if I get the URL , I am not sure how to perform the login through Java code and search the DEA number.
Also, it would be great if I could validate the URL manually before using in Java. Let me know if there is any way.
Or, in case there is any alternate approach in Java; kindly suggest.
Thanks in advance.
First of all, the previous approach provided by the website was completely wrong and insecure, because it passes the username and password as querystring parameters in plain text. I think, they would have realized this thing and changed their way of authentication.
Also, it looks like that they have restricted the direct URL based requests from the client applications like yours. For such requests from clients, they have published the web services. Check this link. They also have mentioned the rates for web service request counts.
So, you may need to open a formal communication channel to get authentication and other details to access their web services for this purpose. Depends on what they use for web service client authentication, you may code your client to access the web services.
I hope this helps.

URL not changing on page redirection in Jersey and MongoDB

My scenario is like this:
I'm building a website where I'm posting an ad regarding a topic. So, after the form filling of ad, the request goes to a REST service class as:
http://localhost:8080/cloudproject/postadvaction?title=tution&tag=tution&description=tution+%401000+%2F+month&category=TUTOR&location=indore
Here, the details of ad go in the database which is MongoDB. After all of this is done I'm redirecting to the profile page of user using Viewable model of jersey, where he can see all the ads posted by him. It is done as:
return new Viewable("/profile.jsp");
After this the response is redirected to profile page of the user.
But the problem is that, on redirecting the response to simply profile.jsp, the URL in the address bar has not changed to http://localhost:8080/profile.jsp, instead, it has remained the same as mentioned above. So, when user refreshes the page, the request of same ad post triggers and the whole process is followed again. Since, database is MongoDB, same ad is stored twice in it and same is displayed on the profile page of user with 2 identical ads.
So, how can I redirect to profile page without having the address of servlet in address bar?
Update: The question is related to PRG technique & Duplicate Form Submissions and not to just redirection.
See Post/Redirect/Get
When a web form is submitted to a server through an HTTP POST request, a web user that attempts to refresh the server response in certain user agents can cause the contents of the original HTTP POST request to be resubmitted, possibly causing undesired results, such as a duplicate web purchase.
To avoid this problem, many web developers use the PRG pattern[1] — instead of returning a web page directly, the POST operation returns a redirection command. The HTTP 1.1 specification introduced the HTTP 303 ("See other") response code to ensure that in this situation, the web user's browser can safely refresh the server response without causing the initial HTTP POST request to be resubmitted. However most common commercial applications in use today (new and old alike) still continue to issue HTTP 302 ("Found") responses in these situations.
With Jersey you can use
Response.seeOther(URI) - Create a new ResponseBuilder for a redirection. Used in the redirect-after-POST (aka POST/redirect/GET) pattern.
You just need to change your method signature to return a Response and return the built Response
return Response.seeOther(URI.create(...)).build();
Also stated about the URI parameter
the redirection URI. If a relative URI is supplied it will be converted into an absolute URI by resolving it relative to the base URI of the application (see UriInfo.getBaseUri()).

How to programmatically send a HTTP request with parameters? [duplicate]

This question already has answers here:
How to use java.net.URLConnection to fire and handle HTTP requests
(12 answers)
Closed 7 years ago.
If I use a browser to send information to the server (for example using a log-in, password page), I just fill the user text-box and the password text-box and clicking on the log-in button.
I would like to send this information but without having to use the browser. I would like to 'fill' the text-boxes but without having to do it manually in the browser. May be using a Servlet.
My question is: How to send information in text-boxes, for example, to a website, doing it from a Servlet?
why not just make a call to the URL from Java using a URL like http://your.domain.name/your/servlet/path?userFieldName=THE_VALUE_YOU_WANT_TO_PASS&passwdFieldName=PASSWORD
The servlet will feel like the values are coming from those boxes.
Or you may want to dive into Apache HTTP Client to mimick a request sent from an client.
uh..oh.. are you doing functional testing? Why not look into JMeter?
Updates as per comment
You need to know what actually form submission does? It basically forms a query string composed of Key-Values (KV) pair.
So, if you have a a text field named tfield where user has typed some text, and there is a drop down named, ddfield where user has selected optionX which has value optionX-Val. And this form gets submitted to a URL, http://my.domain.name/my/servlet -- the browser will send a request which will look like
http://my.domain.name/my/servlet?tfield=some%20text&ddfield=optionX-Val
If you want to mimic form submission, you will have to manually create a URL that has a request string containing all the fields and their values as FIELD_NAME=FIELDVALUE ordered pair separated by ampersand (&)
ah, great idea. If you use Firebug (a Firefox extension), open the NET panel in Firebug, make a manual submission of the form that you wanted to mimic. See what request is posted when you submitted the form. It will have exact URL format that you are after. Copy this URL, replace the values and make fake submissions as much as you want.
Hope this helps.
It is not clear to me what you really up to. I assume that the servlet will be the one who will send the data. Here some examples.
Using setAttribute then Forward the request
//On your servlet
request.setAttibute('user', 'admin');
request.setAttribute('password', '123');
getServletContext().getRequestDispatcher("page.jsp").forward(request, response);
//On your jsp page get the value using EL
<span>${user}</span>
Using session
//On your servlet
HttpSession session = request.getSession(true);
session.setAttribute('user', 'admin');
session.setAttribute('password', '123');
getServletContext().getRequestDispatcher("page.jsp").forward(request, response);
//On your jsp page get the value using EL
<span>${user}</span>
The above example is intended to work within the web application. To send information to another web application, which expecting a request. See sample below.
//On your jsp or servlet, you can also do the same within web application
request.sendRedirect('http://example.com?user=admin&password=123');
//on your jsp #example.com
<span>${param.user}</span>
If this is not what you mean, adding more details will be a help.
a servlet takes care of the other end: it's basically a handler for http requests that lives inside a servlet container. If I understand you correctly, you're wanting to send an http request. You can do that using command-line tools like curl, or if you want to stay within java land, you could try this example on exampledepot. Use your favourite search engine to search for more examples, e.g. with search terms such as "sending GET requests through a url".
In your situation, where you need to send information for username and password, you would need to look at the html and find the url for the form element's action attribute. Then you need to find the names of the username and password fields. Using these names as url parameters, you can construct a GET request that mimics sending a form.
NOTE: usually storing a password in plain text in code and/or sending it in plain text to a website is not a good thing to do.
Just in case anyone is interested, there is a plugin for Firefox called Tamper data. With it you can stop the sending of and http request and modify it. It will show you the "url" you need for sending the params, the values they currently have, and their name. You can check it out here. After that you can use a request.sendRedirect('url you got from Tamper Data');

Logging in remote site then testing the form

There's a website www.runescape.com and I'm trying to find a way to login the site and automate entering 2-3 letter words (from a-z0-9) to find a unique list of available usernames. (https://secure.runescape.com/m=displaynames/name.ws)
This is the only page I know of where you can view usernames that are available or taken.
In the form where you change your name--you type in a name and it queries it for you with no form submit needed. In about a second or two, the page will display a lime green "Available" or a red "Not available" in text.
Is there a way in java where I can login to see this page, then enter in values, and read if username(s) are Available or Not available? Can someone help point me in the right direction to do this?
Much appreciation.
By viewing the source code of the web site, I can see that the form is sent via POST to https://secure.runescape.com/m=weblogin/login.ws. You will want to send an HTTP Request to that address with the associated username and password parameters respectively.
There's a Java HTTP Request tutorial on SO Here: Using java.net.URLConnection to fire and handle HTTP requests
After you've sent the HTTP Request to the page, an HTTP Response object should be returned to you that has the body of the entire web page that you're looking for. Most likely, you'll need to use RegEx to scan through the body of text and filter out the data you want e.g. <li>([a-z0-9]{1,20})<\/li>
Handling the HTTP Response object is also available on the link provided.

Using hash symbol in forward URL with RequestDispatcher

I'm trying to forward a request to another URL which includes the hash symbol ('#'):
request.getRequestDispatcher("/some/path.jsp#somehash").forward(request, response);
Tomcat, however, tells me that "the requested resource is not available". If I remove the hash from the URL, everything works fine. Are hashes not allowed or am I not treating them right?
The # symbol is a browser thing, not a server thing. When you type a URL with a # into the browser, the browser doesn't send that part to the server. It sends the URL without it, then jumps to the named anchor when it gets the page back.
When you ask the container to get that URL for you, it doesn't treat the # any differently to any other URL - it has no special meaning for it, so it looks for a JSP page called /some/path.jsp#somehash, which of course doesn't exist.
You'll need to keep that jump-to-anchor logic on the client somehow. Perhaps you could put some javascript on the resulting page to scroll to that point in the document.
URL fragments are purely client side. The RequestDispatcher#forward() is entirely server-side. The in the forward given URL won't be sent to the client side. You can however redirect to the given URL using HttpServletResponse#sendRedirect(). The URL fragment will then be sent to the client side and reflected in browser address bar as well. Redirecting the request has however the disadvantage that the current request will be garbaged and a brand new one will be created. If that's not affordable, then you'll indeed have to look in the JavaScript corner for the solution.

Categories

Resources