How to programmatically access a web page in Java

There is a web page from which I want to retrieve a certain string. To do so, I need to log in, click some buttons, fill in a text box and click another button, and then the string appears.
How can I write a Java program to do that automatically? Are there any useful libraries for that purpose?
Thanks

Try HtmlUnit
HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.
Example code for submitting a form:
@Test
public void submittingForm() throws Exception {
    final WebClient webClient = new WebClient();
    // Get the first page
    final HtmlPage page1 = webClient.getPage("http://some_url");
    // Get the form that we are dealing with and within that form,
    // find the submit button and the field that we want to change.
    final HtmlForm form = page1.getFormByName("myform");
    final HtmlSubmitInput button = form.getInputByName("submitbutton");
    final HtmlTextInput textField = form.getInputByName("userid");
    // Change the value of the text field
    textField.setValueAttribute("root");
    // Now submit the form by clicking the button and get back the second page.
    final HtmlPage page2 = button.click();
    webClient.closeAllWindows();
}
For more details check:
http://htmlunit.sourceforge.net/gettingStarted.html

The super simple way to do this is using HtmlUnit here:
http://htmlunit.sourceforge.net/
and what you want to do can be as simple as:
@Test
public void homePage() throws Exception {
    final WebClient webClient = new WebClient();
    final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
    assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
}

Take a look at the Apache HttpClient project, or, if you need to run JavaScript on the page, try HttpUnit.

Pressing a button usually sends a request with the HTTP POST method, so you can use HttpClient to send the request and HtmlParser to parse the response page that contains the string you need.
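A minimal sketch of the POST step described above. To stay dependency-free it swaps in the JDK's HttpURLConnection for Apache HttpClient. The class name FormPost and its method names are made up for illustration, and the real URL and field names would have to be read from the form in the page's HTML. A login session would additionally need cookie handling, which is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper, not part of any library.
final class FormPost {

    // Encodes the fields the way a browser does for
    // Content-Type: application/x-www-form-urlencoded.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(encode(field.getKey())).append('=').append(encode(field.getValue()));
        }
        return body.toString();
    }

    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Sends the fields with POST and returns the response body as text
    // (assumed here to be UTF-8).
    static String post(String url, Map<String, String> fields) throws IOException {
        byte[] body = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = connection.getOutputStream()) {
            out.write(body);
        }
        try (InputStream in = connection.getInputStream();
             Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            connection.disconnect();
        }
    }
}
```

Calling FormPost.post with the form's action URL and a map such as userid=root would return the HTML of the response page, which can then be handed to an HTML parser.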

Yes:
java.net.URL#openConnection() lets you make HTTP requests and read the HTTP responses.
Apache HttpComponents is a library that makes it easier to work with HTTP.
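As a rough illustration of the first option, the sketch below reads the body behind a URL through URL#openConnection(). The class name UrlFetch, the ten-second timeouts and the UTF-8 decoding are assumptions made for the example; a real page may declare a different charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class UrlFetch {

    // Opens a connection to the URL and returns the whole response body as text.
    static String fetch(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(10_000); // milliseconds, arbitrary choice
        connection.setReadTimeout(10_000);
        try (InputStream in = connection.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

This only performs a plain GET and does not execute JavaScript, so it fits pages whose content is already in the HTML the server sends.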

Related

Open a web browser page after a POST request using Htmlunit library

I'm testing my website by navigating through it with the HtmlUnit library and Java. For example:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);
HtmlPage page1 = webClient.getPage(mypage);
// sent using POST
HtmlForm form = page1.getForms().get(0);
HtmlSubmitInput button = form.getInputByName("myButton");
HtmlPage page2 = button.click();
// I want to open page2 on a web browser and continue there using a function like
// continueOnBrowser(page2);
I filled in a form programmatically using HtmlUnit and then submitted it; the form uses the POST method. But I'd like to see the content of the response in a web browser. Opening the URL in a browser doesn't work, since the content is the response to a POST request.
This seems like the wrong approach to me: obviously, if you do something programmatically you can't expect to open the browser and continue there... I can't figure out how to solve my problem.
Do you have any suggestions?

Testing for concurrent users of a dynamic web application

I would like to test a web application which takes an input as parameter and produces output. I don't want to do load or stress testing, I would like to have some 100 users inputting the parameter and clicking the submit. How can we automate this?
The web application I would like to test is http://protein.rnet.missouri.edu:8080/MongoTest/
You can achieve such functionality by using HtmlUnit.
HtmlUnit is a "GUI-Less browser for Java programs". It models HTML
documents and provides an API that allows you to invoke pages, fill
out forms, click links, etc... just like you do in your "normal"
browser.
The way to do this is something like the following:
//set browser
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_10);
//not to throw exception on javascript error
webClient.setThrowExceptionOnScriptError(false);
//set page to access
final HtmlPage homepageEn = webClient.getPage("http://protein.rnet.missouri.edu:8080/MongoTest/");
//get the form by id
HtmlForm form = homepageEn.getFirstByXPath("//form[@id='input_form']");
//setup the fields to use
HtmlTextInput mailField = form.getInputByName("mail");
HtmlPasswordInput passwordField = form.getInputByName("password");
//define the submit button (defined by value)
HtmlSubmitInput submitButton = form.getInputByValue("submit");
//change the value of text fields
mailField.setValueAttribute("somemail@xyzmail.com");
passwordField.setValueAttribute("some_password");
//finally submit the form by clicking the button
final HtmlPage resultsPage = submitButton.click();
You can then simulate the 100 users, for example with a loop. How you do that is up to you.
Hope this helps...
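One possible way to run the 100 users at the same time rather than one after another is a thread pool from java.util.concurrent. The sketch below is only an outline under assumptions: the class name ConcurrentUsers is invented, and the userAction passed in stands for the form-filling code above. Each simulated user should probably create its own WebClient inside the action instead of sharing one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of HtmlUnit.
final class ConcurrentUsers {

    // Runs userAction once per simulated user. All users wait on a latch and
    // are released together; results come back in submission order.
    static <T> List<T> run(int users, Callable<T> userAction) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        CountDownLatch start = new CountDownLatch(1);
        try {
            List<Future<T>> pending = new ArrayList<>();
            for (int i = 0; i < users; i++) {
                Callable<T> task = () -> {
                    start.await(); // hold until every user is queued
                    return userAction.call();
                };
                pending.add(pool.submit(task));
            }
            start.countDown(); // release all users at once
            List<T> results = new ArrayList<>();
            for (Future<T> future : pending) {
                results.add(future.get()); // rethrows a failure of that user
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as ConcurrentUsers.run(100, action) would then return one result per user, for example the text of each results page.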

Input in the search box of a website programmatically using Java

I'm writing a Java program that programmatically enters data into the search field of a website and submits it.
After submission a new web page is opened.
E.g. if the website is www.pqr.net/index.php,
then after I submit the search I'm redirected to another page,
e.g. www.pqr.net/ind2.php
I know I can read data using URLConnection.
How do I get the URL of the page I'm redirected to? I want to read the contents of that page, but I can't do that without knowing its URL.
WebClient webClient = new WebClient();
// getPage needs a full URL including the scheme
HtmlPage page1 = webClient.getPage("http://www.pqr.net");
HtmlForm form = page1.getFormByName("f1");
final HtmlSubmitInput button = form.getInputByName("submitbutton");
final HtmlTextInput textField = form.getInputByName("searc");
textField.setValueAttribute("value");
final HtmlPage page2 = button.click();
The URL of the page you are redirected to is in a Location header of the response message. Please refer to the specification for the details, and to the HttpURLConnection javadoc for the method you should use to get a Header from the response.
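A small sketch of that idea with HttpURLConnection. It turns off automatic redirect following so the Location header stays visible, and resolves the header against the request URL because servers sometimes send a relative value. The class name RedirectTarget and its methods are invented for this example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

// Hypothetical helper, not part of any library.
final class RedirectTarget {

    // Resolves a Location header value (absolute or relative)
    // against the URL that was requested.
    static String resolve(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    // Returns the URL the server redirects to, or null if the
    // response is not a redirect.
    static String find(String requestUrl) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(requestUrl).openConnection();
        // Do not follow the redirect, so the 3xx response itself can be inspected.
        connection.setInstanceFollowRedirects(false);
        try {
            int status = connection.getResponseCode();
            String location = connection.getHeaderField("Location");
            if (status < 300 || status >= 400 || location == null) {
                return null;
            }
            return resolve(requestUrl, location);
        } finally {
            connection.disconnect();
        }
    }
}
```

With the URLs from the question, a Location value of ind2.php on a request to http://www.pqr.net/index.php would resolve to http://www.pqr.net/ind2.php.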

Getting the final JavaScript-rendered HTML as a String in Java

I want to fetch data from an HTML page (scrape it), but the page loads its reviews with JavaScript. With a normal Java URL fetch I only get the original HTML, without the JavaScript executed. I want the final page, after the JavaScript has run.
Example: http://www.glamsham.com/movies/reviews/rowdy-rathore-movie-review-cheers-for-rowdy-akki-051207.asp
This page has comments in a Facebook plugin, which are fetched by JavaScript.
The same applies to this page:
http://www.imdb.com/title/tt0848228/reviews
What should I do?
Use phantomjs: http://phantomjs.org
var page = require('webpage').create();
page.open("http://www.glamsham.com/movies/reviews/rowdy-rathore-movie-review-cheers-for-rowdy-akki-051207.asp")
setTimeout(function(){
// Where you want to save it
page.render("screenshoot.png")
// You can access its content using jQuery
var fbcomments = page.evaluate(function(){
return $(".fb-comments iframe").contents().find(".postContainer")
})
},10000)
You have to start phantomjs with the option --web-security=no to allow cross-domain interaction (i.e. for the Facebook iframe).
To communicate with other applications from phantomjs you can use a web server or make a POST request: https://github.com/ariya/phantomjs/blob/master/examples/post.js
You can use HtmlUnit, a Java-based "GUI-less browser". It loads the page the way a web browser does, so you can easily get the final rendered output of any page. You can disable this behaviour, though.
UPDATE: You asked for an example. You don't have to do anything extra:
Example:
WebClient webClient = new WebClient();
HtmlPage myPage = ((HtmlPage) webClient.getPage(myUrl));
UPDATE 2: You can get iframe as follows:
HtmlPage myFrame = (HtmlPage) myPage.getFrameByName(myIframeName).getEnclosedPage();
Please read the documentation at the link above. There is nothing you can't do in HtmlUnit when it comes to getting page content.
A simple way to solve that problem:
You can use HtmlUnit, a Java API. I think it can help you access the content produced by the executed JavaScript as plain HTML.
WebClient webClient = new WebClient();
HtmlPage myPage = (HtmlPage) webClient.getPage(new URL("YourURL"));
System.out.println(myPage.getVisibleText());

HtmlUnit, how to post form without clicking submit button?

I know that in HtmlUnit I can call fireEvent with a submit event on a form and it will be posted. But what if I have disabled JavaScript and would like to post a form using some built-in function?
I've checked the Javadoc and haven't found any way to do this. It is strange that there is no such function in HtmlForm...
I read the Javadoc and the tutorial on the HtmlUnit site, and I know that I can use getInputByName() and click the input. But sometimes a form has no submit-type button,
or it has one but without a name attribute.
I am asking for help with such situations; this is why I am using fireEvent, but it does not always work.
You can use a 'temporary' submit button:
WebClient client = new WebClient();
HtmlPage page = client.getPage("http://stackoverflow.com");
// create a submit button - it doesn't work with 'input'
HtmlElement button = page.createElement("button");
button.setAttribute("type", "submit");
// append the button to the form
HtmlElement form = ...;
form.appendChild(button);
// submit the form
page = button.click();
WebRequest requestSettings = new WebRequest(new URL("http://localhost:8080/TestBox"), HttpMethod.POST);
// Then we set the request parameters
requestSettings.setRequestParameters(Collections.singletonList(new NameValuePair(InopticsNfcBoxPage.MESSAGE, Utils.marshalXml(inoptics, "UTF-8"))));
// Finally, we can get the page
HtmlPage page = webClient.getPage(requestSettings);
final HtmlSubmitInput button = form.getInputByName("submitbutton");
final HtmlPage page2 = button.click();
From the htmlunit doc
@Test
public void submittingForm() throws Exception {
    final WebClient webClient = new WebClient();
    // Get the first page
    final HtmlPage page1 = webClient.getPage("http://some_url");
    // Get the form that we are dealing with and within that form,
    // find the submit button and the field that we want to change.
    final HtmlForm form = page1.getFormByName("myform");
    final HtmlSubmitInput button = form.getInputByName("submitbutton");
    final HtmlTextInput textField = form.getInputByName("userid");
    // Change the value of the text field
    textField.setValueAttribute("root");
    // Now submit the form by clicking the button and get back the second page.
    final HtmlPage page2 = button.click();
    webClient.closeAllWindows();
}
How about making use of the built-in JavaScript support? Just fire a submit event on the form:
HtmlForm form = page.getForms().get(0);
form.fireEvent(Event.TYPE_SUBMIT);
The code assumes you want to submit the first form on the page.
And if the submit forwards you to another page, just assign the response to the page variable:
HtmlForm form = page.getForms().get(0);
page = (HtmlPage) form.fireEvent(Event.TYPE_SUBMIT).getNewPage();
Although this question has good, working answers, none of them seems to emulate the actual user behaviour very well.
Think about it: what does a human do when there's no button to click? Simple, you hit enter (return). And that's just how it works in HtmlUnit:
// assuming page holds your current HtmlPage
HtmlForm form = page.getFormByName("yourFormName");
HtmlTextInput input = form.getInputByName("yourTextInputName");
// type something in
input.type("here goes the input");
// now hit enter and get the new page
page = (HtmlPage) input.type('\n');
Note that input.type("\n"); is not the same as input.type('\n');!
This works when you have disabled JavaScript execution and when there's no submit button available.
IMHO you should first think about how you want to submit the form. What scenario is it that you want to test? A user hitting return might be a different case than clicking some button (that might have some onClick). And submitting the form via JavaScript might be another test case.
Once you have figured that out, pick the appropriate way of submitting your form from the other answers (and this one, of course).
