I am trying to get to the docSearch form of the https://eagletw.mohavecounty.us/treasurer/treasurerweb/search.jsp web page using the latest HtmlUnit release (2.37.0). As Firefox's DOM Inspector shows, the page does contain such a form. This is my code:
WebClient webClient = new WebClient();
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.setRefreshHandler(new RefreshHandler() {
public void handleRefresh(Page page, URL url, int arg) throws IOException {
System.out.println("handleRefresh");
}
});
HtmlPage page = (HtmlPage) webClient.getPage("https://eagletw.mohavecounty.us/treasurer/treasurerweb/search.jsp");
webClient.waitForBackgroundJavaScript(1000000);
webClient.waitForBackgroundJavaScriptStartingBefore(100000);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
page.getEnclosingWindow().getJobManager().waitForJobs(1000000);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.waitForBackgroundJavaScriptStartingBefore(1000000);
HtmlForm form = page.getFormByName("docSearch");
The last line of the above code gives me the following exception:
com.gargoylesoftware.htmlunit.ElementNotFoundException: elementName=[form] attributeName=[name] attributeValue=[docSearch]
Any tips on what I can try in my code to get to the docSearch form?
Do you believe this is a problem with HTMLUnit itself? Should I file this as an issue on HTMLUnit's GitHub site?
I have spent some time on this to build a complete sample. The page is only available from the US; I had to set up a VPN to access it. The sample contains some hints; hope that helps.
final String url = "https://eagletw.mohavecounty.us/treasurer/treasurerweb/search.jsp";
try (final WebClient webClient = new WebClient()) {
webClient.getOptions().setThrowExceptionOnScriptError(false);
// webClient.getOptions().setUseInsecureSSL(true);
// open the url, this will do a redirect to the login page
HtmlPage page = webClient.getPage(url);
// System.out.println(page.asXml());
// System.out.println("--------------------------------");
// click the Public User Login
for (DomElement elem : page.getElementById("middle_left").getElementsByTagName("input")) {
if (elem instanceof HtmlSubmitInput
&& "Login".equals(((HtmlSubmitInput) elem).getValueAttribute())) {
page = elem.click();
break;
}
}
// System.out.println(page.asXml());
// System.out.println("--------------------------------");
// search by owner name
HtmlInput ownerInput = (HtmlInput) page.getElementById("TaxAOwnerIDSearchString");
ownerInput.type("Trump");
// click submit
for (DomElement elem : page.getElementsByTagName("input")) {
if (elem instanceof HtmlSubmitInput) {
page = elem.click();
}
}
// System.out.println(page.asXml());
// System.out.println("--------------------------------");
System.out.println(page.asText());
}
Your code looks really desperate. It is usually more helpful to try to understand what is going on than to copy every snippet you can find into your code and hope it will help.
A good starting point is to understand how the page is working. Use a good web proxy like Charles (or Fiddler) to monitor what happens when you open the page with your browser. Sadly I cannot open your url because my browser reports server not found. Because of this the rest of this answer is more of a guess.
The next step is to create your web client and try to live with the default settings.
WebClient webClient = new WebClient();
webClient.getOptions().setThrowExceptionOnScriptError(false);
With these two lines your client is ready.
At the very least, your RefreshHandler setup completely breaks the handling of refresh cases, because it only prints a message and never performs the refresh.
Next step is to check the output after you got the page and compare with your browser/web proxy session.
HtmlPage page = (HtmlPage) webClient.getPage("https://eagletw.mohavecounty.us/treasurer/treasurerweb/search.jsp");
System.out.println(page.asXml());
Now you can check whether the form is there (in the output) or not. If not, you have to figure out with the proxy whether there is any kind of JS-based background reloading. Usually you will see the requests in your proxy output.
To wait for this you can call something like
webClient.waitForBackgroundJavaScriptStartingBefore(100_000);
Sometimes these background jobs replace the content of the current window. To take care of this, it is a good idea to get the current page content from the window before dumping.
page = (HtmlPage) page.getEnclosingWindow().getEnclosedPage();
System.out.println(page.asXml());
Hope that clarifies it a bit. If you need more help, I need to be able to access the page myself. Otherwise it is only guessing.
I am using HtmlUnit to load a webpage containing a dynamically updated ajax component using the following:-
WebClient webClient = new WebClient(BrowserVersion.CHROME);
URL url = new URL("https://live.xxx.com/en/ajax/getDetailedQuote/" + instrument);
WebRequest requestSettings = new WebRequest(url, HttpMethod.POST);
HtmlPage redirectPage = webClient.getPage(requestSettings);
This works and I get the contents of the page at the time of request.
I want however to be able to monitor and respond to changes on the page.
I tried the following:-
webClient.addWebWindowListener(new WebWindowListener() {
public void webWindowContentChanged(WebWindowEvent event) {
System.out.println("Content changed ");
}
});
But I only get "Content changed" when the page is first loaded, and not when it updates.
I fear there is not really a solution with HtmlUnit (at least not out of the box). The webWindowContentChanged hook is only called if the (whole) page inside the window is replaced.
You can try to implement a DomChangeListener and attach it to the page (or maybe the body of the page).
If you would like to track the ajax requests more at the HTTP level, you have the option to intercept the requests (see https://htmlunit.sourceforge.io/faq.html#HowToModifyRequestOrResponse for more details).
If you need more, please open an issue and we can discuss.
I am trying to scrape my router's web page to find the connected devices and information about them.
I have written this code:
String searchUrl="http://192.168.1.1";
HtmlPage page=client.getPage(searchUrl);
System.out.println(page.asXml());
The problem is that the code returned by HtmlUnit is different from the code in Chrome. In HtmlUnit I don't have the section of code that lists the connected devices.
Try something like this:
try (final WebClient webClient = new WebClient()) {
// do not stop at js errors
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getPage(searchUrl);
// now wait for the js starting async (you can play with the timing)
webClient.waitForBackgroundJavaScript(10000);
// maybe the page was replaced from the async js
HtmlPage page = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
System.out.println(page.asXml());
}
Usually that helps.
If you are still facing problems, you have to open an issue on GitHub (https://github.com/HtmlUnit/htmlunit).
But keep in mind I can only really help if I can run and debug your code here, which means your web app has to be public.
I am using htmlunit from net.sourceforge.htmlunit to simulate a web browser. I am trying to log in to the Steam web app, but I encountered a problem. After setting the credentials I wanted to use the click method:
final WebClient webClient = new WebClient();
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setRedirectEnabled(true);
webClient.setCookieManager(new CookieManager());
final HtmlPage loginPage = webClient.getPage(loginPageConfiguration.getLoginPageUrl());
final HtmlTextInput user = loginPage.getHtmlElementById(loginPageConfiguration.getLoginInputId());
user.setText(loginCredentials.getUsername());
final HtmlPasswordInput password = loginPage.getHtmlElementById(loginPageConfiguration.getPasswordInputId());
password.setText(loginCredentials.getPassword());
final HtmlPage afterLoginPage = loginPage.getHtmlElementById(loginPageConfiguration.getLoginButtonId()).click();
In a normal browser, a successful login redirects to http://store.steampowered.com/, but afterLoginPage is still the previous login page.
Without knowing the page and having credentials to access it, I can only guess. Maybe the application is a single page application that replaces the visual content using ajax (maybe in combination with redirecting). Because ajax is async, it might help to wait a bit after the click and before addressing the page.
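One way to "wait a bit" without guessing a fixed delay is to poll for the state you expect. The helper below is only a sketch in plain Java; the class name Polling and the method waitUntil are made up for illustration and are not part of HtmlUnit.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not an HtmlUnit class): polls a condition until it
// becomes true or the timeout elapses.
final class Polling {

    private Polling() {
    }

    /**
     * Checks the condition, sleeps for the given interval, and repeats.
     * Returns true as soon as the condition holds, false on timeout or
     * if the waiting thread is interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With HtmlUnit, the condition could for example check that webClient.getCurrentWindow().getEnclosedPage().getUrl() no longer points at the login page. Whether that is the right signal depends on how the application behaves, so treat it as an assumption.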
It also might be a good starting point to understand what is going on by checking the HTTP communication of the real browser (by using the developer tools or by using a web proxy like Charles). You can then compare this with the communication done by HtmlUnit (e.g. enable the HttpClient wire log).
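How the wire log is enabled depends on your logging setup. The sketch below assumes HtmlUnit 2.x on top of Apache HttpClient 4, with commons-logging falling back to java.util.logging because no other logging backend is on the classpath. If you use log4j or another backend, set the "org.apache.http.wire" category to DEBUG in its configuration instead. The class name WireLogSetup is made up for illustration.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: turns on the Apache HttpClient 4 wire log when
// commons-logging is backed by java.util.logging (an assumption).
final class WireLogSetup {

    // Keep a strong reference; java.util.logging holds loggers only weakly.
    private static final Logger WIRE = Logger.getLogger("org.apache.http.wire");

    private WireLogSetup() {
    }

    static Logger enableWireLog() {
        // commons-logging maps debug to FINE in java.util.logging.
        WIRE.setLevel(Level.FINE);
        if (WIRE.getHandlers().length == 0) {
            // The default console handler only prints INFO and above,
            // so attach one that lets everything through.
            ConsoleHandler handler = new ConsoleHandler();
            handler.setLevel(Level.ALL);
            WIRE.addHandler(handler);
            WIRE.setUseParentHandlers(false);
        }
        return WIRE;
    }
}
```

Call WireLogSetup.enableWireLog() once before creating the WebClient. The output is verbose, so it is best kept to debugging sessions.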
Another option might be a javascript error. Please check your log output.
I am using the GUI-less browser HtmlUnit to retrieve the web content of web pages, and the code is working fine for other sites except "http://www.xyzzzzzzz.com.sg/". Can anybody explain why this is happening? I already used the HtmlUnit webdriver with all three browsers (CHROME, FIREFOX and IE) as BrowserVersion, but nothing is working.
public class Test{
public static void main(String[] args) throws Exception {
String url = "http://www.xyzzzzzzz.com.sg/";
WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getCookieManager().setCookiesEnabled(true);
webClient.getOptions().setUseInsecureSSL(true);
HtmlPage currentPage = (HtmlPage) webClient.getPage(url);
String content = currentPage.asXml();
webClient.waitForBackgroundJavaScript(20000);
System.out.println(content); // NOT SHOWING PROPER CONTECT
}
}
Can you please describe what you mean by "NOT SHOWING PROPER CONTECT"? I don't think there is a mistake in the code.
Sometimes JavaScript execution causes problems for HtmlUnit, so also check with JavaScript disabled.
I'm developing a Java Swing application. When I click on a button, I want it to open a browser page with the HTML form fields already filled in (with the data that I want to pass from the Java application).
At the moment I'm using the HtmlUnit library, but if you know something better, I'm open to suggestions!
This is what i have:
WebClient webClient = new WebClient();
HtmlPage page1 = webClient.getPage("http://www.google.com");
HtmlForm form = page1.getFormByName("search");
HtmlButton button = form.getButtonByName("submitSearch");
HtmlTextInput textField = form.getInputByName("searchTxtField");
textField.setValueAttribute("TEST VALUE");
HtmlPage page2 = button.click(); //final line
On the final line I submit the form. Instead of that, I want it to open a web browser with the text "TEST VALUE" in "searchTxtField".
HELP ME, please...
HtmlUnit doesn't control any external web browser. It is a web browser (without any graphical UI).
Use Selenium to control an external web browser.
Since Java 6, you can use the public void browse(URI uri) method of java.awt.Desktop. You need to build a URI with some parameters.
try {
URI uri = new URI("http://www.google.com/#q=TEST+VALUE&oq=TEST+VALUE");
Desktop desktop = null;
if (Desktop.isDesktopSupported()) {
desktop = Desktop.getDesktop();
}
if (desktop != null)
desktop.browse(uri);
} catch (IOException | URISyntaxException e) {
// log error
}
Check Google Search URL Parameters – Query String Anatomy
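If the value comes from user input, it should be URL-encoded before it goes into the query string. The following is a minimal sketch, assuming Java 10 or later (for the URLEncoder overload that takes a Charset) and assuming the "https://www.google.com/search?q=" URL form. The class and method names are made up for illustration.

```java
import java.awt.Desktop;
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a search URI from arbitrary text and opens
// it in the system's default browser.
final class BrowserLauncher {

    private BrowserLauncher() {
    }

    /** Encodes the query so that spaces, '&' and similar characters are safe. */
    static URI buildSearchUri(String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        // The base URL is an assumption; adjust it for the target site.
        return URI.create("https://www.google.com/search?q=" + encoded);
    }

    /** Returns true if the browser was asked to open the URI. */
    static boolean openInBrowser(URI uri) {
        if (!Desktop.isDesktopSupported()) {
            return false;
        }
        Desktop desktop = Desktop.getDesktop();
        if (!desktop.isSupported(Desktop.Action.BROWSE)) {
            return false;
        }
        try {
            desktop.browse(uri);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

From a Swing button handler this could be called as BrowserLauncher.openInBrowser(BrowserLauncher.buildSearchUri("TEST VALUE")). Note that this only works for pages that accept their input via URL parameters; it cannot fill in an arbitrary form.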