I want to do this preferably in Java, or with Selenium WebDriver if there is a way. I don't want just the links present on one page. I want a result like the one https://www.xml-sitemaps.com/ gives: a list of all page URLs in a domain. I don't need it as a tree or as XML; plain URLs will do.
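You can look for a tags, read their href attributes, and store the links in a list. Note that this only gives you the links on the current page; to cover a whole domain you have to visit each collected link and repeat.
List<WebElement> anchors = driver.findElements(By.tagName("a")); // "a" is the tag name; href is an attribute
List<String> links = anchors.stream().map(a -> a.getAttribute("href")).collect(Collectors.toList());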
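Since the question asks for every page in a domain rather than the links on one page, here is a minimal breadth-first crawl sketch in plain Java (Java 11+, no Selenium). Assumptions: links are pulled out with a naive regular expression instead of a real HTML parser, only http(s) URLs on the start URL's host are kept, and the names SiteCrawler, crawl and fetch are hypothetical. The page fetcher is passed in as a function, so you could swap in a Selenium-based fetcher if the site builds its links with JavaScript.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SiteCrawler {
    // Naive href extractor: captures the value up to the closing quote or a '#' fragment
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    /** Breadth-first crawl returning every URL found on the start URL's host, up to maxPages. */
    static Set<String> crawl(String startUrl, Function<String, String> fetchHtml, int maxPages) {
        URI start = URI.create(startUrl);
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start.toString());
        queue.add(start.toString());
        while (!queue.isEmpty() && seen.size() < maxPages) {
            String current = queue.poll();
            String html = fetchHtml.apply(current);
            if (html == null) {
                continue; // not fetched or not HTML
            }
            Matcher m = HREF.matcher(html);
            while (m.find() && seen.size() < maxPages) {
                URI link;
                try {
                    link = URI.create(current).resolve(m.group(1).trim()).normalize();
                } catch (IllegalArgumentException e) {
                    continue; // skip hrefs that are not valid URIs
                }
                boolean web = "http".equalsIgnoreCase(link.getScheme())
                        || "https".equalsIgnoreCase(link.getScheme());
                if (web && start.getHost() != null && start.getHost().equalsIgnoreCase(link.getHost())) {
                    String url = link.toString();
                    if (seen.add(url)) {
                        queue.add(url); // first time we see this URL: schedule it for fetching
                    }
                }
            }
        }
        return seen;
    }

    /** Fetches a page over HTTP; returns null for errors and non-HTML responses. */
    static String fetch(String url) {
        try {
            HttpResponse<String> response = CLIENT.send(
                    HttpRequest.newBuilder(URI.create(url)).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            String type = response.headers().firstValue("Content-Type").orElse("");
            return response.statusCode() == 200 && type.contains("text/html") ? response.body() : null;
        } catch (IOException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

Usage would look like SiteCrawler.crawl("https://example.com/", SiteCrawler::fetch, 500), where the URL is a placeholder. The seen-set means each URL is queued and fetched at most once, and the maxPages cap keeps a large site from running forever.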
I am rendering an HTML page with a list of items, each with an option to edit it.
Clicking any item makes a server call that updates the database.
Now, what I want is this: when the list is long, the page is scrolled all the way to the bottom (e.g. ITEM 1000) and the user makes a server request,
the user should be scrolled back to that exact item after the page reloads. Is this possible?
What is a good way to approach this functionality?
I am aware of anchors (an anchor tag with a name attribute, then loading url.com/#anchorTagName). But for a server call we don't provide any URL; it's just a form.submit.
Any suggestions are much appreciated!
An anchor (url.com/#anchorTagName) is the best approach, but as you said, the form submit doesn't give you a URL to put it in.
Here are some other alternatives:
1. Session attribute:
Once the server call is made and the data is saved to the DB successfully, set a session attribute containing the ID of the edited element.
In the JSP, use a scriptlet to assign the session attribute to a JavaScript variable.
Then, on page load, have JavaScript read the variable and scroll to that element.
Example:
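<script type="text/javascript">
window.onload = function() {
  // "scrollTo" is a placeholder: use whatever attribute name the server code set
  var scrollNow = "<%= session.getAttribute("scrollTo") %>";
  if (scrollNow && scrollNow !== "null") {
    window.location.hash = scrollNow; // jumps to the element whose ID matches
  }
};
</script>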
Another option is to append the location to the query string. Read it the same way and let JavaScript do its part, just like before :)
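For the query-string option, the server side only has to put the edited item's element ID into the URL it sends the user back to. A minimal sketch, assuming a parameter named scrollTo and a helper named listUrlWithScrollTarget (both hypothetical):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ScrollRedirect {
    /**
     * Appends the edited item's element ID to the list page URL as a "scrollTo" parameter
     * (a hypothetical name), so the reloaded page knows where to scroll.
     */
    static String listUrlWithScrollTarget(String listUrl, String elementId) {
        // Add to an existing query string if there is one
        String separator = listUrl.contains("?") ? "&" : "?";
        return listUrl + separator + "scrollTo=" + URLEncoder.encode(elementId, StandardCharsets.UTF_8);
    }
}
```

After processing the form, a servlet could call response.sendRedirect(ScrollRedirect.listUrlWithScrollTarget("list.jsp", "item-1000")) (list.jsp and item-1000 are placeholders), and the page's onload script could read the value with new URLSearchParams(window.location.search).get("scrollTo"). Encoding the ID keeps characters such as spaces or & from breaking the query string.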
Another option:
Handle the form's submit event in JavaScript before the form is sent.
Build the action URL dynamically with the hash of the edited item's ID and submit to that URL.
The browser never sends the fragment to the server, so the server just processes the data as usual.
When the response page loads, the browser scrolls back to the previous location because the anchor is still in the URL :)
I have a paragraph with id="story" and I want to change its text dynamically from a servlet. How do I do this? I'm new, and using getWriter().println() seems to create a new document instead of appending to the existing one.
Thanks
Simple answer - you can't.
Server-side code cannot change a response which has already been sent to the client.
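To change the text inside an HTML element, use JavaScript in the browser, for example an AJAX request that fetches the new text from your servlet and puts it into the paragraph.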
See http://www.w3schools.com/ajax/
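To make the AJAX idea concrete: the server exposes the story text at its own URL, and a small script in the already-loaded page fetches it and replaces only the paragraph's text. The sketch below uses the JDK's built-in com.sun.net.httpserver.HttpServer instead of a servlet container so that it runs on its own; in a servlet you would write the text with response.getWriter() the same way, just from a servlet mapped to its own URL. The /story path, the class name and the story text are made up for illustration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

final class StoryServer {
    // The page fetches /story in the background and replaces only the paragraph's text
    static final String PAGE = "<!DOCTYPE html><html><body>"
            + "<p id=\"story\">Loading...</p>"
            + "<script>"
            + "fetch('/story').then(r => r.text())"
            + ".then(t => { document.getElementById('story').textContent = t; });"
            + "</script></body></html>";

    /** Starts the server on the given port (0 picks a free port). */
    static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
            server.createContext("/", ex -> send(ex, "text/html", PAGE));
            // Plays the role of the servlet: returns just the new text, not a whole document
            server.createContext("/story", ex -> send(ex, "text/plain", "Once upon a time..."));
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void send(HttpExchange ex, String type, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        ex.getResponseHeaders().set("Content-Type", type + "; charset=utf-8");
        ex.sendResponseHeaders(200, bytes.length);
        try (OutputStream os = ex.getResponseBody()) {
            os.write(bytes);
        }
    }
}
```

After StoryServer.start(8080), opening http://127.0.0.1:8080/ should show "Loading..." briefly and then the text from /story, without the page itself being replaced.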
What I need to do is browse to a webpage, log in, then browse to another page on that site that requires you to be logged in, so cookies need to be kept. After that, I need to click an element on that page, fill out the form that appears, and get the message that the webpage returns to me. The reason I need to actually go to the page and click the button, as opposed to just navigating directly to the link, is that you are assigned a session ID every time you log in and click the link, and it's always different. The button looks like this; it's not a normal href link:
<span id=":tv" idlink="" class="sA" tabindex="0" role="link">Next</span>
Anyway, what would be the easiest way to do this? Thanks.
Update:
After trying HtmlUnit and other headless browser libraries, it doesn't seem possible with anything "headless." Another thing I recently found out about this page is that all the HTML is in some weird format... It's all inside a script tag. Here is a sample:
"?ui\x3d2\x26view\x3dss\x26mset\x3dmain\x26ver\x3d-68igm85d1771\x26am\x3d!Zsl-0RZ-XLv0BO3aNKsL0sgMg3nH10t5WrPgJSU8CYS-KNWlyrLmiW3HvC5ykER_n_5dDw\x26fri"],"http://example.com/?ctx\x3d%67mail\x26hl\x3den",,0,"Gmail","Gmail",[["us","c130f0854ca2c2bb",[["n"],["m","New features!"],["u"],["k","0"],["p","1000:500000,10,200000,5,100000,3,75000,2,0,1"],["h","https://survey.googleratings.com/wix/p1679258.aspx?l\x3d1033"],["at","query,5,contacts,5,adv,5,cf,5,default,20"],["v","https://www.youtube.com/embed/Ra8HG6MkOXY?showinfo\x3d0"],
When I inspect the button element, the HTML I posted above for the button shows up, but not when I view the page source. Basically, I think I need some sort of GUI where the user navigates to the link and the program then fills out the info. Does anyone know how I can do this? Thanks.
Have a look at the 5 Minute Getting Started Guide for Selenium: http://code.google.com/p/selenium/wiki/GettingStarted
On the login page, look at the form's HTML to find the URL it posts to and the names of its parameters. Then request that URL with the same parameters filled in with correct values, and make sure to save all the cookies from the response headers so you can send them with the request for the second page. Then use an HTML parser to find your link. There are several HTML parsers available on SourceForge, and you could even try Java's built-in XML parsers, though they will fail if the site's HTML has even a tiny mistake, since HTML is usually not well-formed XML.
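A minimal sketch of that approach with Java 11's java.net.http.HttpClient: a CookieManager attached to the client stores the cookies from the login response and sends them on later requests automatically. The class name LoginSession is hypothetical, and you would use the action URL and input names from the real login form. It does not run the page's JavaScript, so it only helps when the second page works without it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

final class LoginSession {
    // The CookieManager stores Set-Cookie headers and replays them on later requests
    private final HttpClient client = HttpClient.newBuilder()
            .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .version(HttpClient.Version.HTTP_1_1) // plain HTTP/1.1 for maximum compatibility
            .build();

    /** Encodes form fields the way a browser does for application/x-www-form-urlencoded. */
    static String formBody(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** POSTs a form (e.g. the login form) and returns the response body. */
    String postForm(String actionUrl, Map<String, String> fields) {
        HttpRequest req = HttpRequest.newBuilder(URI.create(actionUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(fields)))
                .build();
        return send(req);
    }

    /** GETs a page; cookies received earlier are sent automatically. */
    String get(String url) {
        return send(HttpRequest.newBuilder(URI.create(url)).GET().build());
    }

    private String send(HttpRequest req) {
        try {
            return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Usage could look like s.postForm(loginActionUrl, fields) followed by s.get(secondPageUrl) on the same LoginSession instance, then handing the returned HTML to your parser. Reusing one client is what keeps the session cookie between the two requests.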
EDIT: I didn't notice that it is not a normal link. In that case you will need to look at the site's JavaScript to see where the link leads. If the link requires JavaScript to run, it gets more complicated: plain Java HTTP code cannot execute the page's JavaScript. However, I found a library called DJ Native Swing which includes a web browser component that you can add to a JFrame. It uses your native browser to render pages and to run JavaScript.
This should be possible in Selenium as others have noted.
I have used Selenium to log in, then crawl a site and discover every combination of values for every form on the site (30+ forms). These values are later used to fill in and submit each form with a specific combination of values. The site was very JS/jQuery-heavy, and I used Selenium's built-in JavascriptExecutor, CSS selectors, and XPath to accomplish this.
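Once the candidate values for each field have been collected (for example from the options of each select element), enumerating every combination is a cartesian product. A minimal sketch in plain Java; the class and method names are hypothetical, and the Selenium part that collects the values is left out:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FormCombinations {
    /**
     * Returns every combination of values, one value per field (the cartesian product).
     * The input maps each field name to its candidate values, in the order they should vary.
     */
    static List<Map<String, String>> allCombinations(Map<String, List<String>> valuesByField) {
        List<Map<String, String>> result = new ArrayList<>();
        result.add(new LinkedHashMap<>()); // start with one empty combination
        for (Map.Entry<String, List<String>> field : valuesByField.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : result) {
                for (String value : field.getValue()) {
                    // Extend each partial combination with every value of this field
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(field.getKey(), value);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }
}
```

Each resulting map could then drive one fill-and-submit pass, e.g. with sendKeys or Selenium's Select.selectByVisibleText. The number of combinations is the product of the option counts, so it grows quickly across 30+ forms.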
I also tried HtmlUnit and HttpUnit as faster alternatives, but found them less reliable than Selenium given how heavily the site I was crawling relied on JavaScript.
It's hard to give you code for this because your Selenium implementation will be quite page-specific, and I can't look at the page you're coding against to figure out what's going on with that button script. However, I have included some possibly relevant Selenium (Java) snippets:
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebElement;
WebElement element = driver.findElement(By.id(value)); //find a single element on the page
List<WebElement> buttons = parent.findElements(By.xpath("./tr/td/button")); //find child elements relative to parent
buttons.get(0).click(); //click the first matching button
element.submit(); //submit the enclosing form
element.sendKeys(text); //enter text in an input
String elementText = (String) ((JavascriptExecutor) driver).executeScript("return arguments[0].innerText || arguments[0].textContent", element); //interact with a Selenium element via JS
If you are coding similar functions on different pages, then PageObjects behind interfaces can help.
The link Anew posted is a good starting point and good ol' StackOverflow has answers to just about any Selenium problem ever.
Instead of trying to browse around programmatically, try executing the login request yourself, save the cookies it returns, and then send those cookies with the next request, the form post.
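If you send the cookies yourself instead of using a cookie manager, the value of the next request's Cookie header is the name=value part of each Set-Cookie header from the login response, joined with "; ". A minimal sketch; it ignores each cookie's Domain, Path and expiry, which a real cookie store would check, and the class and method names are hypothetical:

```java
import java.util.List;
import java.util.stream.Collectors;

final class CookieForwarding {
    /**
     * Turns the Set-Cookie headers of the login response into one Cookie header value.
     * Only the leading name=value pair of each header is kept; attributes such as
     * Path, Expires or HttpOnly are not sent back to the server.
     */
    static String cookieHeader(List<String> setCookieHeaders) {
        return setCookieHeaders.stream()
                .map(h -> h.split(";", 2)[0].trim()) // drop the attributes after the first ';'
                .filter(pair -> pair.contains("="))  // ignore malformed entries
                .collect(Collectors.joining("; "));
    }
}
```

With java.net.http, that could be cookieHeader(loginResponse.headers().allValues("Set-Cookie")), passed to HttpRequest.newBuilder(uri).header("Cookie", value) when building the form post.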
HtmlUnit is pretty bad at processing JavaScript: its Rhino JS engine often produces errors (in fact, a page without errors is the exception). I would advise using Selenium, which is basically a framework for controlling browsers such as Chrome and Firefox.
For your question, the following code (Selenium 1 / RC API) would do the work:
selenium.open(myurl);
selenium.click("id=:tv");
You then have to wait for the page to load (someTime is the timeout in milliseconds, passed as a String such as "30000"):
selenium.waitForPageToLoad(someTime);
I would recommend HtmlUnit any day. It's a great library.
First, check out their web page (http://htmlunit.sourceforge.net/) to get HtmlUnit up and running. Make sure you use the latest snapshot (2.12 at the time of writing).
Try these settings to ignore pretty much any obstacle:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_17);
webClient.getOptions().setRedirectEnabled(true);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getCookieManager().setCookiesEnabled(true);
Then, when fetching your page, make sure you wait for background JavaScript before doing anything with the page, such as posting a login form:
//Get Page
HtmlPage page1 = webClient.getPage("https://login-url/");
//Wait for background Javascript
webClient.waitForBackgroundJavaScript(10000);
//Get first form on page
HtmlForm form = page1.getForms().get(0);
//Get login input fields using input field name
HtmlTextInput userName = form.getInputByName("UserName");
HtmlPasswordInput password = form.getInputByName("Password");
//Set input values
userName.setValueAttribute("MyUserName");
password.setValueAttribute("MyPassword");
//Find the first button in the form using name, id or XPath; the leading "." keeps the search inside the form
HtmlElement button = (HtmlElement) form.getFirstByXPath(".//button");
//Post by clicking the button; the result is the page you land on after login, which you can process like page1
HtmlPage page2 = (HtmlPage) button.click();
//Profit
System.out.println(page2.asXml());
I hope this basic example will help you!
How can I access this element:
<input type="submit" value="Save as XML" onclick="some code goes here">
More info: I have to access a web page programmatically and simulate clicking a button on it, which then generates an XML file that I hope to save on the local machine.
I am trying to do this with the HtmlUnit library, but all the examples I could find use the getElementById() or getElementByName() methods. Unfortunately, this element has no name or ID, so I failed miserably. I figured I should use the getByXPath() method instead, but I got completely lost in the XPath documentation (this is all new to me).
I have been stuck on this for a couple of hours so I really need all the help I can get.
Thanks in advance.
There are several possible XPath expressions for selecting that input element.
Below is one option, which searches the whole document for an input element whose type attribute is "submit" and whose value attribute is "Save as XML". In XPath, the @ prefix refers to an attribute.
//input[@type='submit' and @value='Save as XML']
If you know more about the page's structure, you can write a more specific (and efficient) XPath. For instance, something like this might work:
/html/body//form//input[@type='submit' and @value='Save as XML']
You should be able to use the XPath with code like this:
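WebClient client = new WebClient(BrowserVersion.FIREFOX_3);
client.setJavaScriptEnabled(false); // in newer versions: client.getOptions().setJavaScriptEnabled(false)
HtmlPage page = client.getPage(url);
// getByXPath returns a List of all matches; getFirstByXPath returns just the first one
HtmlSubmitInput submitButton = page.getFirstByXPath("/html/body//form//input[@type='submit' and @value='Save as XML']");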
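To check what an expression matches without setting up HtmlUnit, the JDK's own javax.xml.xpath API can evaluate the same XPath. This sketch only works on well-formed markup, because the JDK parser is an XML parser; HtmlUnit's getByXPath and getFirstByXPath apply the expression to the DOM it builds from real, possibly sloppy HTML. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XPathCheck {
    /** Returns the first element matching xpath in well-formed markup, or null if none matches. */
    static Element findFirst(String xhtml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            // XPathConstants.NODE yields the first matching node in document order, or null
            return (Element) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODE);
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Markup is not well-formed XML or the XPath is invalid", e);
        }
    }
}
```

For example, on a page containing both a "Save" and a "Save as XML" submit button, findFirst(markup, "//input[@type='submit' and @value='Save as XML']") picks only the second one, because both attribute tests must hold.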
Although I would, in most cases, recommend using XPath, if you don't know anything about it you can try the getInputByValue(String value) method. This is an example based on your question:
// Fetch the form somehow
HtmlForm form = this.page.getForms().get(0);
// Get the input by its value
System.out.println(form.getInputByValue("Save as XML").asXml());