I want to fill the form at https://login.live.com/ but I could not. I don't want to use the Chromium Embedded Framework or Selenium for Java, because they open a browser window. Is there a way to do it without opening a browser?
I tried HtmlUnit, but a JavaScript problem occurred:
WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setJavaScriptEnabled(true);
final HtmlPage page1 = webClient.getPage("https://login.live.com/en");
final HtmlForm form = (HtmlForm) page1.getElementById("i0281");
final HtmlTextInput textField = form.getInputByName("loginfmt");
textField.setValueAttribute("email");
Error message:
Exception in thread "main" com.gargoylesoftware.htmlunit.ElementNotFoundException: elementName=[input] attributeName=[name] attributeValue=[loginfmt]
at com.gargoylesoftware.htmlunit.html.HtmlForm.getInputByName(HtmlForm.java:572)
It works for HTML pages without JavaScript.
If you don't want to post code, a hint is enough: a framework I should use, something to google, etc.
Thank you
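One likely cause, given the error above: the login form on login.live.com is built by JavaScript after the initial page load, so the `loginfmt` input does not exist yet when `getPage` returns. A minimal sketch of a workaround (the field name `loginfmt` is taken from the error message; the email address is a placeholder): wait for background scripts, then poll for the element before typing into it.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public class LiveLoginSketch {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            webClient.getOptions().setJavaScriptEnabled(true);
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            HtmlPage page = webClient.getPage("https://login.live.com/");
            // Give the scripts that build the form time to finish.
            webClient.waitForBackgroundJavaScript(10_000);
            // The input may only exist after the scripts ran; poll for it.
            HtmlTextInput email = null;
            for (int i = 0; i < 20 && email == null; i++) {
                email = page.getFirstByXPath("//input[@name='loginfmt']");
                if (email == null) Thread.sleep(500);
            }
            if (email != null) {
                email.type("user@example.com"); // placeholder address
            }
        }
    }
}
```

Note that login.live.com may still defeat HtmlUnit's JavaScript engine entirely; if the element never appears even after waiting, that points to a script-compatibility problem rather than a timing one.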
I want to crawl web page using HtmlUnit. My purpose is:
Load page
Write something to text field
Press download button
Get new page
This is the web site: https://9xbuddy.com/
Using a browser I can write a URL into the text field, press the download button, and get a download link.
My code is:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setCssEnabled(false);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
final HtmlPage page = webClient.getPage("https://9xbuddy.com/sites/fembed");
final HtmlForm form = page.getForms().get(0);
final HtmlInput urlInput = form.getInputByName("url");
urlInput.click();
urlInput.type(iframeUrl);
final List<HtmlButton> byXPath = (List<HtmlButton>) form.getByXPath("//button[@class='orange-gradient submit_btn']");
final HtmlPage click = byXPath.get(0).click();
webClient.waitForBackgroundJavaScript(15000);
The problem is:
When I press the download button it probably sends an Ajax request, because the title changes to "save" and after a few seconds changes to "Process completed". With the code above I want to wait for all Ajax requests, but what I finally get is the "save" title, which means HtmlUnit didn't wait for the Ajax call. What is the right way to do this?
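`waitForBackgroundJavaScript` returns once HtmlUnit's current job queue looks idle, but a site that chains Ajax calls can schedule new jobs after that, so a single call may return too early. One hedged workaround is to poll for the expected end state (here, the title text described in the question) with a small helper; `WaitUtil` is an assumption for illustration, not part of HtmlUnit:

```java
import java.util.function.BooleanSupplier;

public class WaitUtil {
    /** Polls until the condition holds or the timeout elapses; returns whether it held. */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollMs);
        }
        return condition.getAsBoolean();
    }
}
```

With the code above, something like `WaitUtil.waitUntil(() -> { webClient.waitForBackgroundJavaScript(500); return click.getTitleText().contains("completed"); }, 15_000, 100)` keeps pumping the background-job queue until the title changes or the timeout expires, instead of trusting one wait call.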
I want to crawl a web page that has a download button. When I press it, the current page shows download progress in the title and then shows a download link that can be pressed. I think it's done via Ajax, because I can see the requests in the developer console under Network → XHR.
This is my code to crawl the site:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setCssEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
final HtmlPage page = webClient.getPage("https://9xbuddy.com/process?url=https://www.fembed.com/v/6mv22g3qfsdfsd");
// final ScriptResult scriptResult = page.executeJavaScript("beacon.js");
webClient.waitForBackgroundJavaScript(10000);
webClient.waitForBackgroundJavaScriptStartingBefore(10000);
But this code returns the page I get right after the button click and doesn't load the Ajax content. I know which Ajax requests the site makes; is there any way to call them manually?
You can construct the Ajax calls manually with HtmlUnit. If you find that the Google Chrome console is not sufficient, you can use a tool such as Fiddler. Once you have identified the HTTP call, you can reconstruct it with HtmlUnit like below:
URL url = new URL(
"http://tws.target.com/searchservice/item/search_results/v1/by_keyword?callback=getPlpResponse&navigation=true&category=55krw&searchTerm=&view_type=medium&sort_by=bestselling&faceted_value=&offset=60&pageCount=60&response_group=Items&isLeaf=true&parent_category_id=55kug&custom_price=false&min_price=from&max_price=to");
WebRequest requestSettings = new WebRequest(url, HttpMethod.GET);
requestSettings.setAdditionalHeader("Accept", "*/*");
requestSettings.setAdditionalHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
requestSettings.setAdditionalHeader("Referer", "http://www.target.com/c/xbox-one-games-video/-/N-55krw");
requestSettings.setAdditionalHeader("Accept-Language", "en-US,en;q=0.8");
requestSettings.setAdditionalHeader("Accept-Encoding", "gzip,deflate,sdch");
requestSettings.setAdditionalHeader("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
Page page = webClient.getPage(requestSettings);
System.out.println(page.getWebResponse().getContentAsString());
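Because the example URL includes `callback=getPlpResponse`, the body comes back JSONP-wrapped, i.e. `getPlpResponse({...})`, rather than as plain JSON. A small helper (my own, not part of HtmlUnit) can strip that wrapper before the payload is handed to a JSON parser:

```java
public class Jsonp {
    /** Strips a JSONP wrapper like callbackName({...}) and returns the JSON payload. */
    public static String unwrap(String jsonp) {
        int open = jsonp.indexOf('(');
        int close = jsonp.lastIndexOf(')');
        if (open < 0 || close < open) {
            return jsonp; // not wrapped; return unchanged
        }
        return jsonp.substring(open + 1, close);
    }
}
```

Alternatively, dropping the `callback` parameter from the URL often makes such endpoints return plain JSON, though that depends on the server.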
I'm trying to extract statements from my gas company (PSNC Energy) at https://www.psncenergy.com
Code is as follows:
java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.INFO);
WebClient webClient = new WebClient(BrowserVersion.CHROME);
try {
HtmlPage page = webClient.getPage("https://www.psncenergy.com");
System.out.println(page.asXml());
HtmlInput userNameInput = page.getFirstByXPath("//input[@name='user-name']");
userNameInput.setValueAttribute("john");
HtmlPasswordInput password = page.getFirstByXPath("//input[@type='password']");
password.setText("doe");
HtmlButton loginButton = (HtmlButton) page.getElementById("login-button");
The page returned is entirely different from what I would get with either Firefox or Chrome. The OS platform is Linux FC 24. In particular, PSNC's page as reported by HtmlUnit says "You appear to be using a small screen."
I could try a different login page, but that gives a different (JavaScript) error, which again differs from what a real browser shows.
My goal is to sign in, download the latest bill from them, then sign off.
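The "You appear to be using a small screen" message suggests the site branches on the viewport width, and HtmlUnit's simulated window is small by default. One thing to try, assuming the site's responsive check reads the window dimensions: give HtmlUnit's window desktop-sized dimensions (via `WebWindow.setInnerWidth`/`setInnerHeight`) before loading the page.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class DesktopViewportSketch {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            // Pretend to be a desktop-sized window before loading the page,
            // so scripts that branch on screen width take the desktop path.
            webClient.getCurrentWindow().setInnerWidth(1920);
            webClient.getCurrentWindow().setInnerHeight(1080);
            HtmlPage page = webClient.getPage("https://www.psncenergy.com");
            System.out.println(page.getTitleText());
        }
    }
}
```

If the site instead sniffs the User-Agent or uses CSS media queries that HtmlUnit evaluates differently, this alone may not be enough, but it is a cheap first experiment.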
I'm testing my website by moving around inside it using the HtmlUnit library and Java, like this for example:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);
HtmlPage page1 = webClient.getPage(mypage);
// sent using POST
HtmlForm form = page1.getForms().get(0);
HtmlSubmitInput button = form.getInputByName("myButton");
HtmlPage page2 = button.click();
// I want to open page2 on a web browser and continue there using a function like
// continueOnBrowser(page2);
I filled a form programmatically using HtmlUnit, then submitted it; the form uses the POST method. But I'd like to see the content of the response inside a web browser page. Simply opening the URL doesn't work, since the page is the response to a POST request.
Maybe this is the wrong approach: obviously, if you do everything programmatically, you can't expect to just open the browser and continue there. But I can't figure out what would solve my problem.
Do you have any suggestions?
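One workaround for this last question: since page2 only exists inside HtmlUnit, dump its HTML to a temporary file and hand that file to the default browser. This is only a sketch of the `continueOnBrowser` idea from the comment; note that session cookies stay in HtmlUnit's `CookieManager`, so links on the saved page that need the POST session will not work in the real browser.

```java
import java.awt.Desktop;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContinueOnBrowser {
    /** Writes the HTML to a temp file and returns its path. */
    public static Path saveAsTempHtml(String html) throws IOException {
        Path file = Files.createTempFile("htmlunit-page", ".html");
        Files.writeString(file, html);
        return file;
    }

    /** Opens the saved page in the default browser (skipped where unsupported). */
    public static void continueOnBrowser(String html) throws IOException {
        Path file = saveAsTempHtml(html);
        if (Desktop.isDesktopSupported()) {
            Desktop.getDesktop().browse(file.toUri());
        }
    }
}
```

Usage would be `ContinueOnBrowser.continueOnBrowser(page2.asXml());`. To truly continue the session in a browser, you would also have to copy the cookies out of `webClient.getCookieManager()` into the browser, which is fiddly and browser-specific.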