I am writing a program in Java using HtmlUnit. The page has a radio button that needs to be clicked before a set of information can be filled out. I am currently having an issue finding the fields that need to be filled in after the radio button is clicked. My code is currently:
String url = "http://cpdocket.cp.cuyahogacounty.us/";
final WebClient webClient = new WebClient();
final HtmlPage page = webClient.getPage(url);
final HtmlForm form = page.getForms().get(0);
final HtmlElement button = form.getElementById("SheetContentPlaceHolder_btnYes");
final HtmlPage page2 = button.click();
try {
    synchronized (page2) {
        page2.wait(3000);
    }
} catch (InterruptedException e) {
    System.out.println("error");
}
//returns the first page after the security page
final HtmlForm form2 = page2.getForms().get(0);
final HtmlRadioButtonInput button2 = form2.getInputByValue("forcl");
button2.setDefaultChecked(true);
page2.refresh();
final HtmlForm form3 = page2.getForms().get(0);
form3.getInputByName("ctl00$SheetContentPlaceHolder$foreclosureSearch$txtZip").setValueAttribute("44106");
final HtmlSubmitInput button3 = form3.getInputByValue("Submit");
final HtmlPage page3 = button3.click();
try {
    synchronized (page3) {
        page3.wait(10000);
    }
} catch (InterruptedException e) {
    System.out.println("error");
}
The first page is a security page that needs to be bypassed; the second page is where I am running into the issue, as I am getting the error:
com.gargoylesoftware.htmlunit.ElementNotFoundException: elementName=[input] attributeName=[name] attributeValue=[ctl00$SheetContentPlaceHolder$foreclosureSearch$txtZip]
    at com.gargoylesoftware.htmlunit.html.HtmlForm.getInputByName(HtmlForm.java:463)
    at Courtscraper.scrapeWebsite(Courtscraper.java:58)
I believe this means that the input field cannot be found in the form. I have been referring to two websites as reference: Website1, Website2. I am not sure, but I believe I may have to create a new HtmlPage after setting the radio button to true.
Without knowing the page, it is impossible to say exactly why the error is happening. However, as you say, it is clear that getInputByName is not finding the element and is throwing the exception.
Given that code, and assuming there is no typo in the name string you pass to getInputByName, I would suggest removing this line:
page2.refresh();
Refreshing the page after making modifications to it might result in getting an unmodified page again.
Regarding creating a new HtmlPage after setting the radio button to true, that would only be necessary if the radio has an onchange or a similar event attached that fires a JavaScript AJAX call that modifies the DOM and creates the element that you are trying to fetch.
That's all I can suggest given that code.
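If the radio does have such a handler, here is a minimal sketch of what that could look like, building on the variables from the question; the element names are the ones used above, and the five-second wait is an arbitrary choice:
// Click the radio so any attached onclick/onchange handlers actually fire,
// then give background JavaScript time to update the DOM.
final HtmlRadioButtonInput radio = form2.getInputByValue("forcl");
final HtmlPage updatedPage = radio.click();
webClient.waitForBackgroundJavaScript(5000);
// Re-fetch the form from the (possibly re-rendered) page before looking up the field.
final HtmlForm searchForm = updatedPage.getForms().get(0);
searchForm.getInputByName("ctl00$SheetContentPlaceHolder$foreclosureSearch$txtZip").setValueAttribute("44106");
The click() call returns the page as it stands after any event handlers have run, which is why it is used here instead of refresh().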
In your code, after creating page2, make a WebRequest instead of creating a new page, like this:
String url = "http://cpdocket.cp.cuyahogacounty.us/Search.aspx";
String EventTarget = "ctl00$SheetContentPlaceHolder$rbCivilForeclosure";
String world = "ctl00$SheetContentPlaceHolder$UpdatePanel1|ctl00$SheetContentPlaceHolder$rbCivilForeclosure";
String Viewstate = page2.getElementById("__VIEWSTATE").getAttribute("value");
String EventValidation = page2.getElementById("__EVENTVALIDATION").getAttribute("value");
WebRequest req1 = new WebRequest(new URL(url));
req1.setHttpMethod(HttpMethod.POST);
req1.setAdditionalHeader("Origin", "http://cpdocket.cp.cuyahogacounty.us");
req1.setAdditionalHeader("Referer", "http://cpdocket.cp.cuyahogacounty.us/Search.aspx");
req1.setAdditionalHeader("X-Requested-With", "XMLHttpRequest");
String txtview1 = "ctl00$ScriptManager1=" + URLEncoder.encode(world)
        + "&__EVENTTARGET=" + URLEncoder.encode(EventTarget)
        + "&__EVENTARGUMENT=&__LASTFOCUS="
        + "&__VIEWSTATE=" + URLEncoder.encode(Viewstate)
        + "&__EVENTVALIDATION=" + URLEncoder.encode(EventValidation)
        + "&ctl00$SheetContentPlaceHolder$rbSearches=forcl&__ASYNCPOST=true&";
//System.out.println("this is text view =============== " + txtview1);
req1.setRequestBody(txtview1);
req1.setAdditionalHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
String re = webClient.getPage(req1).getWebResponse().getContentAsString();
System.out.println("========== " + re);
Once the above code runs successfully, you get a String containing the response.
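As a side note, here is a sketch of the same POST built with HtmlUnit's NameValuePair list instead of a hand-encoded body; this is only an alternative, under the assumption that letting HtmlUnit do the form encoding is acceptable for this endpoint. The parameter names and values are the ones from the code above.
// Needs com.gargoylesoftware.htmlunit.util.NameValuePair and java.util.Arrays.
WebRequest req2 = new WebRequest(new URL(url), HttpMethod.POST);
req2.setAdditionalHeader("X-Requested-With", "XMLHttpRequest");
req2.setRequestParameters(Arrays.asList(
        new NameValuePair("ctl00$ScriptManager1", world),
        new NameValuePair("__EVENTTARGET", EventTarget),
        new NameValuePair("__EVENTARGUMENT", ""),
        new NameValuePair("__LASTFOCUS", ""),
        new NameValuePair("__VIEWSTATE", Viewstate),
        new NameValuePair("__EVENTVALIDATION", EventValidation),
        new NameValuePair("ctl00$SheetContentPlaceHolder$rbSearches", "forcl"),
        new NameValuePair("__ASYNCPOST", "true")));
String response = webClient.getPage(req2).getWebResponse().getContentAsString();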
Related
I am trying to scrape a food store website, but I cannot find the elements in the HTML. I can see them in Chrome's Inspect Elements, but when I download the web content with:
HtmlPage page = webClient.getPage(url);
WebResponse response = page.getWebResponse();
String content = response.getContentAsString();
I cannot see all of the elements that I can see in Chrome Inspect.
What I want to do, for example with this URL: https://www.coop.se/handla/sok/?q=kvarg, is get the first search result, "Kvarg Mild Vanilj", which I can find in the element
<h3 class="ProductTeaser-heading">Kvarg Mild Vanilj</h3>
But that element is not in the content I got before.
So when I try:
HtmlElement price = (HtmlElement) page.getByXPath("//h3").get(0);
I just get an exception, as if it cannot find it:
java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
    at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100) ~[na:na]
    at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106) ~[na:na]
    at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302) ~[na:na]
So my question is: how can I find these elements using HtmlUnit, or are there other Java libraries for this?
Here is a picture of the element I am trying to get into my Java application.
I have tried different XPaths, but nothing finds them, even though the same XPaths work in Chrome Inspect.
final String url = "https://www.coop.se/handla/sok/?q=kvarg";
try (final WebClient webClient = new WebClient()) {
    webClient.getOptions().setJavaScriptEnabled(false);
    webClient.getOptions().setCssEnabled(false);
    HtmlPage page = webClient.getPage(url);
    // Finding this one
    HtmlElement input = (HtmlElement) page.getByXPath("//input").get(0);
    // Not finding this one, even though it is visible in Chrome Inspect
    HtmlElement price = (HtmlElement) page.getByXPath("//h3").get(0);
} catch (IOException e) {
    e.printStackTrace();
}
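Since those h3 headings appear to be rendered by JavaScript after the initial response, one hedged thing to try is leaving JavaScript enabled and giving the background scripts time to finish before querying the DOM. This is only a sketch; the ten-second wait is arbitrary, and the class name in the XPath is taken from the h3 shown above:
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
    webClient.getOptions().setJavaScriptEnabled(true);        // keep JS on so the product list is rendered
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setCssEnabled(false);
    HtmlPage page = webClient.getPage("https://www.coop.se/handla/sok/?q=kvarg");
    webClient.waitForBackgroundJavaScript(10000);             // let AJAX calls triggered by the page finish
    List<?> headings = page.getByXPath("//h3[@class='ProductTeaser-heading']");
    if (!headings.isEmpty()) {
        System.out.println(((HtmlElement) headings.get(0)).asNormalizedText());
    }
} catch (IOException e) {
    e.printStackTrace();
}
Heavily scripted single-page sites do not always render fully in HtmlUnit, so if this still comes back empty, a real-browser tool such as Selenium WebDriver is the usual alternative.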
I am trying to run the tutorial from here. The code looks like this:
public class Test {
    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient client = new WebClient(BrowserVersion.FIREFOX);
        HtmlPage page = client.getPage("https://google.com/");
        // Getting the form from the Google home page; tsf is the form's id
        HtmlForm form = page.getHtmlElementById("tsf"); // Error occurs here
        form.getInputByName("q").setValueAttribute("test");
        // Creating a virtual submit button
        HtmlButton submitButton = (HtmlButton) page.createElement("button");
        submitButton.setAttribute("type", "submit");
        form.appendChild(submitButton);
        // Submitting the form and getting the result
        HtmlPage newPage = submitButton.click();
        // Getting the result as text
        String text = newPage.asNormalizedText();
        System.out.println(text);
    }
}
But I am getting error message:
Exception in thread "main" com.gargoylesoftware.htmlunit.ElementNotFoundException: elementName=[*] attributeName=[id] attributeValue=[tsf]
at com.gargoylesoftware.htmlunit.html.HtmlPage.getHtmlElementById(HtmlPage.java:1670)
at Test.main(Test.java:20)
Since this tutorial is relatively old, the ID tsf might be outdated. However, when I check the Google home page, I can't figure out the form name. Maybe I don't understand what the HtmlForm object represents. (I am completely new to this topic.)
There is no element with the ID tsf anymore. The best way to check is to go to the site and use your browser's developer tools (F12 in most browsers); you can see the whole HTML document from there.
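If the goal is just to reach the search form, a hedged workaround is to locate the q input and take its enclosing form instead of relying on a hard-coded ID. This is only a sketch and assumes the search box is still named q:
WebClient client = new WebClient(BrowserVersion.FIREFOX);
client.getOptions().setThrowExceptionOnScriptError(false);
HtmlPage page = client.getPage("https://google.com/");
// Find the search box by its name attribute and walk up to the form that contains it.
HtmlElement searchBox = page.getFirstByXPath("//*[@name='q']");
if (searchBox != null) {
    HtmlForm form = searchBox.getEnclosingForm();
    // ... fill the field and append a virtual submit button as in the question's code
}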
I have a problem getting content by URL. I'm using HtmlUnit to parse an HTML page, but when I run my application I get the HTML before the JavaScript has executed, so the content I need is missing.
Can anyone help me, please?
Example code:
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38)) {
    webClient.waitForBackgroundJavaScript(30 * 1000);
    final HtmlPage page = webClient.getPage("http://.....some url");
    final String pageAsXml = page.asXml();
    final String pageAsText = page.asText();
}
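One hedged observation about that snippet: waitForBackgroundJavaScript only waits for jobs that already exist, so calling it before getPage has nothing to wait for yet. A sketch of the reordered calls (the 30-second figure is simply the value from the question):
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38)) {
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    // Load the page first, then wait for the background JavaScript it started.
    final HtmlPage page = webClient.getPage("http://.....some url");
    webClient.waitForBackgroundJavaScript(30 * 1000);
    // Read the content only after the scripts have had a chance to run.
    final String pageAsXml = page.asXml();
    final String pageAsText = page.asText();
}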
I'm trying to get the page content that the JavaScript function getWines() returns. The page I'm trying to get info from is http://hedonism.co.uk/wines/. So I'm using HtmlUnit and wrote the following code:
final WebClient webClient = new WebClient(
BrowserVersion.INTERNET_EXPLORER_10);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setJavaScriptEnabled(false);
final HtmlPage page = webClient.getPage(url);
String javaScriptCode = "getWines(1)";
ScriptResult result = page.executeJavaScript(javaScriptCode);
Page page1 = result.getNewPage();
StringBuffer p = WebGet.getBuffPageContent(page1.getUrl().toString(), true);
System.out.println(p.toString());
But this approach doesn't seem to work. I receive the same page I had before the function call, with the same source code, so I'm not able to get info such as the wine name. Am I doing something completely wrong?
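A hedged guess at the cause: JavaScript is disabled in that code, so executeJavaScript cannot have much effect. Here is a minimal sketch with JavaScript enabled, re-reading the page from the current window after the call; the ten-second wait is arbitrary:
final WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_10);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setJavaScriptEnabled(true);    // executeJavaScript needs JavaScript enabled
final HtmlPage page = webClient.getPage("http://hedonism.co.uk/wines/");
page.executeJavaScript("getWines(1)");
webClient.waitForBackgroundJavaScript(10000);          // let any AJAX triggered by getWines finish
// Read the possibly updated page from the current window rather than the old reference.
final HtmlPage updated = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
System.out.println(updated.asXml());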
I am using a WebClient to get page source. I have logged in successfully. After that, I use the same object to get the page source from a different URL, but it throws an exception like:
java.lang.ClassCastException: com.gargoylesoftware.htmlunit.UnexpectedPage cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlPage
This is the code which I am using:
forms = (List<HtmlForm>) firstPage.getForms();
form = firstPage.getFormByName("");
HtmlTextInput usernameInput = form.getInputByName("email");
HtmlPasswordInput passInput = form.getInputByName("password");
HtmlHiddenInput redirectInput = form.getInputByName("redirect");
HtmlHiddenInput submitInput = form.getInputByName("form_submit");
usernameInput.setValueAttribute(username);
passInput.setValueAttribute(password);
//Create Submit Button
HtmlElement button = firstPage.createElement("button");
button.setAttribute("type", "submit");
button.setAttribute("name", "submit");
form.appendChild(button);
System.out.println(form.asXml());
HtmlPage pageAfterLogin = button.click();
String sourc = pageAfterLogin.asXml();
System.out.println(pageAfterLogin.asXml());
/////////////////////////////////////////////////////////////////////////
The above code runs successfully and logs in.
After that I am using this code:
HtmlPage downloadPage = null;
downloadPage=(HtmlPage)webClient.getPage("url");
But I am getting this exception:
java.lang.ClassCastException: com.gargoylesoftware.htmlunit.UnexpectedPage cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlPage
The JavaDoc of UnexpectedPage states that it is:
A generic page that is returned whenever an unexpected content type is
returned by the server.
I would advise checking the content type of the response from webClient.getPage("url").
Instead of using
HtmlPage downloadPage = null;
downloadPage=(HtmlPage)webClient.getPage("url");
Use
UnexpectedPage downloadPage = null;
downloadPage = (UnexpectedPage) webClient.getPage("url");
It worked fine for me.
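For completeness, here is a minimal sketch of checking what the server actually returned before casting; the download file name is just a placeholder:
// Needs java.io.InputStream and java.nio.file.* imports.
// getPage returns the generic Page interface; inspect it before deciding how to read it.
Page result = webClient.getPage("url");
if (result instanceof HtmlPage) {
    System.out.println(((HtmlPage) result).asXml());
} else {
    // e.g. a file download: read the declared content type and the raw bytes instead.
    System.out.println("Content type: " + result.getWebResponse().getContentType());
    try (InputStream in = result.getWebResponse().getContentAsStream()) {
        Files.copy(in, Paths.get("download.bin"), StandardCopyOption.REPLACE_EXISTING);
    }
}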