I am using HtmlUnit (net.sourceforge.htmlunit) to simulate a web browser. I am trying to log in to the Steam web app, but I have run into a problem. After setting the credentials I wanted to use the click method:
final WebClient webClient = new WebClient();
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setRedirectEnabled(true);
webClient.setCookieManager(new CookieManager());
final HtmlPage loginPage = webClient.getPage(loginPageConfiguration.getLoginPageUrl());
final HtmlTextInput user = loginPage.getHtmlElementById(loginPageConfiguration.getLoginInputId());
user.setText(loginCredentials.getUsername());
final HtmlPasswordInput password = loginPage.getHtmlElementById(loginPageConfiguration.getPasswordInputId());
password.setText(loginCredentials.getPassword());
final HtmlPage afterLoginPage = loginPage.getHtmlElementById(loginPageConfiguration.getLoginButtonId()).click();
In a normal browser, a successful login redirects to http://store.steampowered.com/, but afterLoginPage is still the previous login page.
Without knowing the page and without credentials to access it, I can only guess. Maybe the application is a single-page application that replaces the visible content using Ajax (maybe in combination with a redirect). Because Ajax is asynchronous, it might help to wait a bit after the click before accessing the page.
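As a rough illustration of "wait a bit after the click", here is a minimal polling helper in plain Java. The names PollUntil and waitUntil are made up for this sketch and are not part of HtmlUnit; the helper only re-checks a condition until it holds or a timeout expires.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not an HtmlUnit class.
final class PollUntil {
    private PollUntil() {}

    /**
     * Re-evaluates the condition every intervalMillis until it returns true
     * or timeoutMillis has elapsed. Returns true if the condition held,
     * false on timeout or if the waiting thread was interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With HtmlUnit, the condition could for example check whether the URL of webClient.getCurrentWindow().getEnclosedPage() is no longer the login URL; which condition makes sense depends on the page.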
A good starting point for understanding what is going on is to check the HTTP communication of the real browser (using the developer tools or a web proxy like Charles). You can then compare this with the communication done by HtmlUnit (e.g. by enabling the HttpClient wire log).
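One possible way to switch on the wire log is sketched below. It assumes an HtmlUnit version built on Apache HttpClient 4.x (logger names under org.apache.http) and commons-logging with no other logging backend configured; with a different logging setup the equivalent logger level has to be set there instead. The class name WireLogSetup is hypothetical.

```java
// Hypothetical helper; the property names follow the Apache HttpClient 4.x logging guide.
final class WireLogSetup {
    private WireLogSetup() {}

    /** Call this before the first WebClient (and thus any logger) is created. */
    static void enableWireLog() {
        // Ask commons-logging to use its built-in SimpleLog implementation.
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        // Include timestamps so requests can be matched against the browser's network tab.
        System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
        // Log the raw data sent to and received from the server.
        System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.http.wire", "DEBUG");
    }
}
```

The same properties can also be passed as -D options on the java command line, which avoids any ordering problems with logger initialization.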
Another option might be a javascript error. Please check your log output.
When I open the same site with Google Chrome on one side and HtmlUnit.WebClient(BrowserVersion.CHROME) on the other, I do not see the same cookies on both sides. Cookies are checked here with Google-Chrome-Dev. Nine cookies vs four cookies for the same site.
The site is linckx.odoo.com.
Is there something missing in my HtmlUnit code?
final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setCssEnabled(false);
webClient.getCookieManager().setCookiesEnabled(true);
CookieManager cookieManager = webClient.getCookieManager();
final HtmlPage loginPage = webClient.getPage(url + "/en_US/web/login");
If I access the page https://linckx.odoo.com/en_US/web/login
with a fresh browser (all cookies removed beforehand), I get
nothing more.
Maybe the differences come from other requests you made in your browser before.
new WebClient() in HtmlUnit is like starting a browser with a completely new profile (all caches are empty, no cookies).
I am using HtmlUnit to load a webpage containing a dynamically updated ajax component using the following:-
WebClient webClient = new WebClient(BrowserVersion.CHROME);
URL url = new URL("https://live.xxx.com/en/ajax/getDetailedQuote/" + instrument);
WebRequest requestSettings = new WebRequest(url, HttpMethod.POST);
HtmlPage redirectPage = webClient.getPage(requestSettings);
This works and I get the contents of the page at the time of request.
I want however to be able to monitor and respond to changes on the page.
I tried the following:-
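webClient.addWebWindowListener(new WebWindowListener() {
    @Override
    public void webWindowOpened(WebWindowEvent event) { }
    @Override
    public void webWindowContentChanged(WebWindowEvent event) {
        System.out.println("Content changed ");
    }
    @Override
    public void webWindowClosed(WebWindowEvent event) { }
});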
But I only get "Content changed" when the page is first loaded, and not when it updates.
I fear there is not really a solution with HtmlUnit (at least not out of the box). The webWindowContentChanged hook is only called if the (whole) page inside the window is replaced.
You can try to implement a DomChangeListener and attach it to the page (or maybe to the body of the page).
If you would rather track the Ajax requests at the HTTP level, you have the option to intercept the requests (see https://htmlunit.sourceforge.io/faq.html#HowToModifyRequestOrResponse for more details).
If you need more, please open an issue and we can discuss.
I have the URL https://www.facebook.com/ads/library/?id=286238429359299 which gets redirected to https://www.facebook.com/ads/library/?active_status=all&ad_type=political_and_issue_ads&country=US&impression_search_field=has_impressions_lifetime&id=286238429359299&view_all_page_id=575939395898200 in the browser.
I'm using the following code:
@Test
public void createWebClient() throws IOException {
getLogger("com.gargoylesoftware").setLevel(OFF);
WebClient webClient = new WebClient(CHROME);
WebClientOptions options = webClient.getOptions();
options.setJavaScriptEnabled(true);
options.setRedirectEnabled(true);
webClient.waitForBackgroundJavaScriptStartingBefore(10000);
// IMPORTANT: Without the country/language selection cookie the redirection does not work!
URL s = webClient.getPage("https://www.facebook.com/ads/library/?id=286238429359299").getUrl();
}
The code above does not follow the redirection. Is there something I am missing? I need the final URL that the original URL resolves to.
Actually, the URL https://www.facebook.com/ads/library/?id=286238429359299 returns a page containing JavaScript. The JavaScript inspects the browser environment; for example, it detects whether the current browser is a headless browser and whether the web driver is legitimate. So I think the solution is to analyze the JavaScript, which will give you the final URL.
I think it never actually resolves to the final URL because the client is headless.
Please load the same page in a browser, open the page source, and search for "page_uri"; you will see exactly the URI you are looking for.
If you check the HtmlUnit output or print the page with
System.out.println(page.asXml());
you will see that "page_uri" contains the originally entered URL.
I suggest using Selenium WebDriver (not headless).
I am trying to scrape my router's web page to find the connected devices and information about them.
I wrote this code:
String searchUrl="http://192.168.1.1";
HtmlPage page=client.getPage(searchUrl);
System.out.println(page.asXml());
The problem is that the HTML returned by HtmlUnit differs from what Chrome shows. In HtmlUnit, the section that lists the connected devices is missing.
Try something like this:
try (final WebClient webClient = new WebClient()) {
// do not stop at js errors
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getPage(searchUrl);
// now wait for the js starting async (you can play with the timing)
webClient.waitForBackgroundJavaScript(10000);
// maybe the page was replaced from the async js
HtmlPage page = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
System.out.println(page.asXml());
}
Usually that helps.
If you are still facing problems, please open an issue on GitHub (https://github.com/HtmlUnit/htmlunit).
But keep in mind that I can only really help if I can run and debug your code here, which means your web app has to be publicly accessible.
I'm trying to use HtmlUnit to perform a login.
However, the login form first asks for the username and has a Next button; the form is then updated and the password has to be entered. My problem is that after I call button.click(), I still get the first form and not the second one where the password should be entered.
public void search() throws Exception {
WebClient wb = new WebClient();
HtmlPage p = wb.getPage(
"https://account.booking.com/sign-in?op_token=EgVvYXV0aCJHChQ2Wjcyb0hPZDM2Tm43emszcGlyaBIJYXV0aG9yaXplGhpodHRwczovL2FkbWluLmJvb2tpbmcuY29tLyoCe31CBGNvZGUqDDCgqZHe5rMjOgBCAA");
// HtmlPage p = (HtmlPage) wb.getPage(this.bUrl);
List<HtmlForm> form = p.getForms();
form.get(0).getInputByName("loginname").setValueAttribute("1234567");
HtmlForm fm = form.get(0);
System.out.println(form.get(0).getInputByName("loginname").getValueAttribute().toString());
List<Object> button = fm.getByXPath("//button[@type='submit']");
HtmlButton bt = (HtmlButton) button.get(0);
System.out.println(p.asText() + "\n+_________________");
bt.click();
System.out.println(p.asText());
}
The output appears to be the same before and after bt.click():
1234567
Booking.com Account
This website uses cookies. Click here for more information.
Close
Sign In to Manage Your Property
Username
1234567
Next
Having trouble signing in?
Questions about your property or the Extranet? Visit the Partner Help Center or ask another partner on the Partner Forum.
Add your property to Booking.com
Create a partner account to list and manage your property.
Register
By clicking "Allow access" you authorize Extranet to use your Booking.com account info according to Extranet Terms of service.
+_________________
Booking.com Account
This website uses cookies. Click here for more information.
Close
Sign In to Manage Your Property
Username
Enter your username
Next
Having trouble signing in?
Questions about your property or the Extranet? Visit the Partner Help Center or ask another partner on the Partner Forum.
Add your property to Booking.com
Create a partner account to list and manage your property.
Register
By clicking "Allow access" you authorize Extranet to use your Booking.com account info according to Extranet Terms of service.
Sorry, but your code is based on a fundamental misunderstanding of HTML and HtmlUnit.
HtmlPage p = wb.getPage(.....
retrieves a (html) page. This page is shown inside a browser window (same in HtmlUnit). If you interact with elements on this page like
form.get(0).getInputByName("loginname").setValueAttribute("1234567");
or better
form.get(0).getInputByName("loginname").type("1234567");
these elements change their state and, as a result, the page as a whole changes.
But:
Clicking a submit button is a totally different story. In this case the browser (and HtmlUnit as well) sends an HTTP request to the server and gets back a new HtmlPage. Usually this page is shown inside the same window.
In HtmlUnit this is reflected by the return value of the click method: the return value is the new page. As long as you do not assign this value to a page variable and perform your next steps on this new page, you are still working with the old one.
BTW: there is a commented sample on the Getting Started HtmlUnit page.
That covers the simplest version of form/submit handling. Today, however, things are a bit (in fact many bits) more complicated, because most pages out there do additional magic based on JavaScript (e.g. Ajax).
Suggestion:
If you send me some credentials via private mail, I can try to help you get this login working with HtmlUnit.
Suggestion 2:
Try to learn and understand the technical fundamentals of the web; without this you will be lost.