I want to crawl a web page using HtmlUnit. My goal is to:
Load the page
Write something into a text field
Press the download button
Get the new page
This is the website: https://9xbuddy.com/
In a browser I can type a URL into the text field, press the download button, and get a download link.
My code is:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setCssEnabled(false);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
final HtmlPage page = webClient.getPage("https://9xbuddy.com/sites/fembed");
final HtmlForm form = page.getForms().get(0);
final HtmlInput urlInput = form.getInputByName("url");
urlInput.click();
urlInput.type(iframeUrl);
final List<HtmlButton> byXPath = (List<HtmlButton>) form.getByXPath("//button[@class='orange-gradient submit_btn']");
final HtmlPage click = byXPath.get(0).click();
webClient.waitForBackgroundJavaScript(15000);
The problem is:
When I press the download button it probably sends an Ajax request, because the title changes to "save" and after a few seconds changes to "Process completed". With the code above I want to wait for all Ajax requests, but what I finally get is the "save" title, which means HtmlUnit didn't wait for the Ajax calls to finish. How can I do this?
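One way to think about the wait described above is as a polling loop: check a condition repeatedly until it holds or a deadline passes, instead of relying on a single fixed wait. The sketch below is a generic helper using only the JDK; the class name PollingWait and the method waitUntil are made up for illustration, and the HtmlUnit condition mentioned in the comment is an assumption about how it could be applied to the page title, not something tested against this site.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a condition until it holds or a timeout elapses. */
final class PollingWait {

    /**
     * Returns true as soon as the condition is true, or false if the timeout
     * passes (or the thread is interrupted) before that happens.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in condition. With HtmlUnit the check might instead be
        // something like: () -> page.getTitleText().contains("Process completed")
        // (an assumption, not verified against the site in the question).
        final int[] checks = {0};
        boolean done = waitUntil(() -> ++checks[0] >= 3, 2_000, 10);
        System.out.println("done=" + done + " after " + checks[0] + " checks");
    }
}
```

The return value tells the caller whether the condition was met or the wait simply timed out, which is the case described in the question.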
I want to crawl a web page that has a download button. When I press it, the current page shows the download progress in the title and then shows a download link that can be clicked. I think this is done via Ajax, because I can see some requests in the developer console under Network -> XHR.
This is my code to crawl the site:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setCssEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
final HtmlPage page = webClient.getPage("https://9xbuddy.com/process?url=https://www.fembed.com/v/6mv22g3qfsdfsd");
// final ScriptResult scriptResult = page.executeJavaScript("beacon.js");
webClient.waitForBackgroundJavaScript(10000);
webClient.waitForBackgroundJavaScriptStartingBefore(10000);
But this code returns the page I get right after the button click, without the Ajax-loaded content. I know which Ajax requests the site makes; is there any way to call those Ajax requests manually?
You can construct the Ajax calls manually with HtmlUnit. If the Google Chrome console is not sufficient to identify them, you can use a tool such as Fiddler. Once you have identified the HTTP call, you can reconstruct it with HtmlUnit like below:
URL url = new URL(
"http://tws.target.com/searchservice/item/search_results/v1/by_keyword?callback=getPlpResponse&navigation=true&category=55krw&searchTerm=&view_type=medium&sort_by=bestselling&faceted_value=&offset=60&pageCount=60&response_group=Items&isLeaf=true&parent_category_id=55kug&custom_price=false&min_price=from&max_price=to");
WebRequest requestSettings = new WebRequest(url, HttpMethod.GET);
requestSettings.setAdditionalHeader("Accept", "*/*");
requestSettings.setAdditionalHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
requestSettings.setAdditionalHeader("Referer", "http://www.target.com/c/xbox-one-games-video/-/N-55krw");
requestSettings.setAdditionalHeader("Accept-Language", "en-US,en;q=0.8");
requestSettings.setAdditionalHeader("Accept-Encoding", "gzip,deflate,sdch");
requestSettings.setAdditionalHeader("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
Page page = webClient.getPage(requestSettings);
System.out.println(page.getWebResponse().getContentAsString());
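A long query string like the one above is easier to maintain when it is assembled from named parameters. The following is a minimal sketch using only the JDK; the class name QueryUrlBuilder and its build method are hypothetical, and the parameters shown are a subset of those in the URL above. The resulting string could then be passed to new URL(...) as in the snippet.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: appends form-encoded query parameters to a base URL. */
final class QueryUrlBuilder {

    /** Joins the parameters as name=value pairs in the map's iteration order. */
    static String build(String baseUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return query.isEmpty() ? baseUrl : baseUrl + "?" + query;
    }

    /** Form-style encoding: spaces become '+', reserved characters become %XX. */
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("callback", "getPlpResponse");
        params.put("category", "55krw");
        params.put("sort_by", "bestselling");
        params.put("offset", "60");
        System.out.println(build(
                "http://tws.target.com/searchservice/item/search_results/v1/by_keyword",
                params));
    }
}
```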
I tried hard to find a way to extract data from my LinkedIn account without
using the REST API, but without any result. Does anyone know if it's possible, and how?
When I tried this code in Eclipse, the result was either a
NullPointerException or null when I selected some fields from the response
HTML page.
Note that the selector path works well in the browser console.
Thank you very much.
String url = "https://www.linkedin.com/uas/login?goback=&trk=hb_signin";
final WebClient webClient = new WebClient();
webClient.getOptions().setJavaScriptEnabled(false);
webClient.getOptions().setCssEnabled(false);
HtmlPage loginPage = webClient.getPage(url);
final HtmlForm loginForm = loginPage.getFormByName("login");
final HtmlSubmitInput button = loginForm.getInputByName("signin");
final HtmlTextInput usernameTextField =
loginForm.getInputByName("session_key");
final HtmlPasswordInput passwordTextField =
loginForm.getInputByName("session_password");
usernameTextField.setValueAttribute("something@outlook.com");
passwordTextField.setValueAttribute("**************");
final HtmlPage response = button.click();
loginPage = webClient.getPage("https://www.linkedin.com/in/issa-hammoud-0a2802114/");
System.out.println(loginPage.querySelector("#profile-wrapper > div.pv-content.profile-view-grid.neptune-grid.two-column.ghost-animate-in > div.core-rail > section div > div > button > img"));
Since you are making a secured connection (HTTPS) you need to specify getOptions().setUseInsecureSSL(true);
Also make sure you enable cookies getCookieManager().setCookiesEnabled(true);
Having said that, you should really be using LinkedIn's REST API.
Hope that helps
I'm writing a Java program that programmatically inserts data into the search field of a website and submits it.
After submission a new web page is opened.
For example, if the website is www.pqr.net/index.php,
after I submit the search I'm redirected to another page,
e.g. www.pqr.net/ind2.php.
I know I can read data using URLConnection.
How do I get the URL of the page I'm redirected to? I want to read the contents of that page, and I can't do that unless I know its URL.
WebClient webClient = new WebClient();
HtmlPage page = webClient.getPage("http://www.pqr.net");
HtmlForm form = page.getFormByName("f1");
final HtmlSubmitInput button = form.getInputByName("submitbutton");
final HtmlTextInput textField = form.getInputByName("searc");
textField.setValueAttribute("value");
final HtmlPage page2 = button.click();
The URL of the page you are redirected to is in a Location header of the response message. Please refer to the specification for the details, and to the HttpURLConnection javadoc for the method you should use to get a Header from the response.
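As a rough sketch of what that could look like with HttpURLConnection: the request is sent with automatic redirect following switched off, and the Location header is read from the response. Because the header value may be relative, it is resolved against the request URL. The class name RedirectTarget and its method names are made up for this example, and only the resolve step runs in main, since fetch needs a reachable server.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper: finds the URL a response redirects to. */
final class RedirectTarget {

    /** Resolves a Location header value (absolute or relative) against the request URL. */
    static String resolve(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    /**
     * Sends one request without following redirects and returns the redirect
     * target, or null if the response is not a 3xx with a Location header.
     */
    static String fetch(String requestUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(requestUrl).toURL().openConnection();
        conn.setInstanceFollowRedirects(false);
        try {
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            boolean redirected = status >= 300 && status < 400 && location != null;
            return redirected ? resolve(requestUrl, location) : null;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // URLs taken from the question; no network access is needed for resolve().
        System.out.println(resolve("http://www.pqr.net/index.php", "ind2.php"));
        // prints "http://www.pqr.net/ind2.php"
    }
}
```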
I know that in HtmlUnit I can fire a submit event on a form and it will be posted. But what if I have disabled JavaScript and would like to post a form using some built-in function?
I've checked the javadoc and haven't found any way to do this. It is strange that there is no such function in HtmlForm...
I read the javadoc and the tutorial on the HtmlUnit page, and I know that I can use getInputByName() and click the result. But sometimes a form has no submit button,
or has one without a name attribute.
I am asking for help with that situation; this is why I am using fireEvent, but it does not always work.
You can use a 'temporary' submit button:
WebClient client = new WebClient();
HtmlPage page = client.getPage("http://stackoverflow.com");
// create a submit button - it doesn't work with 'input'
HtmlElement button = page.createElement("button");
button.setAttribute("type", "submit");
// append the button to the form
HtmlElement form = ...;
form.appendChild(button);
// submit the form
page = button.click();
WebRequest requestSettings = new WebRequest(new URL("http://localhost:8080/TestBox"), HttpMethod.POST);
// Then we set the request parameters
requestSettings.setRequestParameters(Collections.singletonList(new NameValuePair(InopticsNfcBoxPage.MESSAGE, Utils.marshalXml(inoptics, "UTF-8"))));
// Finally, we can get the page
HtmlPage page = webClient.getPage(requestSettings);
final HtmlSubmitInput button = form.getInputByName("submitbutton");
final HtmlPage page2 = button.click();
From the HtmlUnit documentation:
@Test
public void submittingForm() throws Exception {
final WebClient webClient = new WebClient();
// Get the first page
final HtmlPage page1 = webClient.getPage("http://some_url");
// Get the form that we are dealing with and within that form,
// find the submit button and the field that we want to change.
final HtmlForm form = page1.getFormByName("myform");
final HtmlSubmitInput button = form.getInputByName("submitbutton");
final HtmlTextInput textField = form.getInputByName("userid");
// Change the value of the text field
textField.setValueAttribute("root");
// Now submit the form by clicking the button and get back the second page.
final HtmlPage page2 = button.click();
webClient.closeAllWindows();
}
How about making use of the built-in JavaScript support? Just fire a submit event on the form:
HtmlForm form = page.getForms().get(0);
form.fireEvent(Event.TYPE_SUBMIT);
The code assumes you want to submit the first form on the page.
If the submit forwards you to another page, just assign the response to the page variable:
HtmlForm form = page.getForms().get(0);
page = (HtmlPage) form.fireEvent(Event.TYPE_SUBMIT).getNewPage();
Although this question has good, working answers, none of them seems to emulate the actual user behaviour very well.
Think about it: What does a human do when there's no button to click? Simple, you hit enter (return). And that's just how it works in HtmlUnit:
// assuming page holds your current HtmlPage
HtmlForm form = page.getFormByName("yourFormName");
HtmlTextInput input = form.getInputByName("yourTextInputName");
// type something in
input.type("here goes the input");
// now hit enter and get the new page
page = (HtmlPage) input.type('\n');
Note that input.type("\n"); is not the same as input.type('\n');!
This works when you have disabled JavaScript execution and when there's no submit button available.
IMHO you should first think about how you want to submit the form. What scenario do you want to test? A user hitting return might be a different case than clicking some button (that might have some onClick). And submitting the form via JavaScript might be yet another test case.
Once you have figured that out, pick the appropriate way of submitting your form from the other answers (and this one, of course).
I have a URL that I need to go to in my Java application and then get the source code of the page. The problem is that you need to be authorized on Facebook to access the page.
Is it possible to open a web browser, log in to Facebook, and then somehow run my application and have access to the page?
Or do I need to log into Facebook through my application? How do I do that? I have tried using this: code.google.com/p/facebook-java-api/ but I can't find any basic tutorials for noobs on how to set this up and most are outdated so please don't link me to anything.
I'd prefer to use only the official API if that's possible...
Thanks in advance.
The following code and this page should help you log in to Facebook with Java:
final WebClient webClient = new WebClient();
final HtmlPage page1 = webClient.getPage("http://www.facebook.com");
final HtmlForm form = page1.getFormByName("login_form");
// getInputByValue returns a single input; getInputsByValue returns a List
final HtmlSubmitInput button = form.getInputByValue("Log in");
final HtmlTextInput emailField = form.getInputByName("email");
emailField.setValueAttribute("youremailaddress@domain.com");
// the password field is an HtmlPasswordInput and needs its own variable
final HtmlPasswordInput passwordField = form.getInputByName("pass");
passwordField.setValueAttribute("yourPassword");
final HtmlPage page2 = button.click();