How to get dynamic body text in Selenium with Java

I need to get text value as indicated below:
<!DOCTYPE html>
<html lang="en">
<head>
</head>
<body>
<b>Some Text I can find using xPath</b>
<hr>
**TEXT I WOULD LIKE TO FIND THAT IS BEING ADDED DYNAMICALLY - it will be different number every time page loads**
<hr>
**some other text dynamically added**
</body>
</html>
I tried by using
driver.findElement(By.xpath("/html/body/text()[1]"));
with no luck.

It's not straightforward, because WebDriver doesn't handle anything but element nodes. A while ago I opened two issues: one against WebDriver and another against the W3C WebDriver specification. (Vote for them; it helps show that the user base needs this.)
Meanwhile, as a (painful) workaround, you will need to rely on the JavascriptExecutor capabilities of your WebDriver. There is an example (in another context, so it has to be adapted to your specifics) in one of my older answers.
Adapted to your case, with the caveat that it may contain bugs caused by typos (I haven't checked it):
WebElement contextNode=driver.findElement(By.xpath("/html/body"));
if(driver instanceof JavascriptExecutor) {
String jswalker=
"var tw = document.createTreeWalker("
+ "arguments[0],"
+ "NodeFilter.SHOW_TEXT,"
+ "{ acceptNode: function(node) { return NodeFilter.FILTER_ACCEPT;} },"
+ "false"
+ ");"
+ "var ret=null;"
// skip over the number of text nodes indicated by the arguments[1]
+ "var skip;"
+ "for(skip=0; tw.nextNode() && skip<arguments[1]; skip++);"
+ "if(skip==arguments[1]) { " // found before tw.nextNode() ran out
+ "ret=tw.currentNode.wholeText.trim();"
+ "}"
+ "return ret;"
;
int textNodeIndex=3; // there will be whitespace-only text nodes before and after <b>
Object val=((JavascriptExecutor) driver).executeScript(
jswalker, contextNode, textNodeIndex
);
String textThatINeed=(null!=val ? val.toString() : null);
}
Please let me know if/how it works.

Related

Is the standard library my best option for Java to load/read and edit/modify and save a html file with no reformatting?

I want to load/read, edit/modify, and save an HTML file located on my hard drive. I tried jsoup, but it kept reformatting the HTML file. I want to avoid reformatting.
I want to inject some JavaScript after the <script> and before var deviceReady = false; in the HTML file.
Do I need to parse the file?
Should I use default Java? (BufferedReader, FileReader, Scanner)
<!DOCTYPE html>
<html lang="en">
<head>
<meta name='viewport' content='initial-scale = 1, minimum-scale = 1, maximum-scale = 1'/>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="x-ua-compatible" content="IE=10">
<title>LX-XXX-KU</title>
<style type="text/css">#initialLoading{background:url(assets/htmlimages/loader.gif) no-repeat center
center;background-color:#ffffff;position:absolute;margin:auto;top:0;left:0;right:0;bottom:0;z-
index:10010;}</style>
"
<script>
var deviceReady = false;
var initCalled = false ;
var initialized = false;
function onBodyLoad()
{
if(typeof window.device === 'undefined')
{
document.addEventListener("deviceready", onDeviceReady, false);
}
else
{
onDeviceReady();
}
}
JavaScript I want to add after the <script> and before var deviceReady = false;
//adds numbers to TOC
window.addEventListener( 'moduleReadyEvent', function ( e )
{
var myText = document.getElementsByClassName('tocText');
for ( var i = 0; i < myText.length; i++ )
{
var getText = myText[ i ].childNodes;
var str = ( i + 1 ) + ' ' + getText[ 0 ].innerHTML;
getText[ 0 ].innerHTML = str;
}
});

This can be accomplished like so:
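File f = ...;
// Files.readAllBytes takes a Path, so convert the File first
String contents = new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);
int idx = contents.indexOf(insertBeforeStr);
// substring(idx), not idx + 1, so the first character of insertBeforeStr is kept
contents = contents.substring(0, idx) + contentToBeAdded + contents.substring(idx);
// write contents back to the disk.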
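For completeness, here is a minimal self-contained sketch of the same idea. The class and method names (MarkerInsert, insertBefore, insertIntoFile) are made up for illustration, and UTF-8 is an assumption taken from the charset declared in the question's HTML. It leaves the text untouched when the marker is not found.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class MarkerInsert {
    /** Returns text with addition placed immediately before the first occurrence of marker. */
    static String insertBefore(String text, String marker, String addition) {
        int idx = text.indexOf(marker);
        if (idx < 0) {
            return text; // marker not found: leave the text unchanged
        }
        return text.substring(0, idx) + addition + text.substring(idx);
    }

    /** Reads the file as UTF-8, inserts the addition, and writes the result back. */
    static void insertIntoFile(Path file, String marker, String addition) throws IOException {
        String contents = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String updated = insertBefore(contents, marker, addition);
        Files.write(file, updated.getBytes(StandardCharsets.UTF_8));
    }
}
```

Because nothing is parsed, every byte outside the insertion point stays exactly as it was, which is the point of this approach.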

If you turn off jsoup's pretty printing option, and use the XML parser instead of the validating HTML parser, the document and all of its text verbatim, including whitespace, is passed through pretty much unmolested, other than syntax fixes for attributes, missing end tags, and the like.
For example, running your input through Try jsoup with pretty-printing off and the XML parser selected gives output that is effectively the same as your original.
The code would be something like:
Document doc = Jsoup.parse("<script>\nSomething(); ", "", Parser.xmlParser());
doc.outputSettings().prettyPrint(false);
Element scriptEl = doc.selectFirst("script");
DataNode scriptData = scriptEl.dataNodes().get(0);
scriptData.setWholeData(scriptData.getWholeData() + "\nanotherFunction();");
System.out.println(doc.html());
Gives us (note that there's no HTML structure automatically created, due to using the XML parser):
<script>
Something();
anotherFunction();</script>
ControlAltDel's answer definitely works and means you can do it with just the Java base library. The benefit of using jsoup in this case (IMHO, as the author of jsoup) is that you're not trying to string-match HTML, and won't get caught by e.g. a <script> in a comment, or in this case a missing close </script> tag, etc. But of course YMMV.
Incidentally, once jsoup 1.14.1 is released (soon!) with the change #1419 (which for script elements, proxies text settings into data without escaping), the code will simplify to:
Element scriptEl = doc.selectFirst("script");
scriptEl.appendText("\nanotherFunction()");

How to use Selenium get text from an element not including its sub-elements

HTML
<div id='one'>
<button id='two'>I am a button</button>
<button id='three'>I am a button</button>
I am a div
</div>
Code
driver.findElement(By.id("one")).getText();

I've seen this question pop up a few times in the last maybe year or so and I've wanted to try writing this function... so here you go. It takes the parent element and removes each child's textContent until what remains is the textNode. I've tested this on your HTML and it works.
/**
* Takes a parent element, strips out the text of all child elements, and returns the text-node content only
*
* @param e
* the parent element
* @return the text from the child textNodes
*/
public static String getTextNode(WebElement e)
{
String text = e.getText().trim();
List<WebElement> children = e.findElements(By.xpath("./*"));
for (WebElement child : children)
{
// replaceFirst takes a regex; Pattern.quote (java.util.regex.Pattern) makes the child text match literally
text = text.replaceFirst(Pattern.quote(child.getText()), "").trim();
}
return text;
}
and you call it
System.out.println(getTextNode(driver.findElement(By.id("one"))));
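One thing to watch with this approach: String.replaceFirst treats its first argument as a regular expression, so child text containing characters such as ( or ? may fail to match or even throw. Below is a small sketch of the stripping step on plain strings, with no Selenium involved; ChildTextStripper and stripChildTexts are hypothetical names used only for this illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

class ChildTextStripper {
    /** Removes the first occurrence of each child text from the parent text, matching literally. */
    static String stripChildTexts(String parentText, List<String> childTexts) {
        String text = parentText.trim();
        for (String childText : childTexts) {
            // Pattern.quote turns the child text into a literal pattern,
            // so regex metacharacters in it are not interpreted
            text = text.replaceFirst(Pattern.quote(childText), "").trim();
        }
        return text;
    }
}
```

It still has the limitation of the original: if the parent's own text comes before an identical child text, the wrong occurrence is removed.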

Warning: the initial solution (deep below) won't work. I opened an enhancement request, 2840, against Selenium WebDriver and another one against the W3C WebDriver specification; the more votes, the sooner they'll get enough attention (one can hope). Until then, the solution suggested by @shivansh in the other answer (executing JavaScript via Selenium) remains the only alternative. Here's the Java adaptation of that solution (it collects all text nodes, discards those that are whitespace-only, and separates the rest with \t):
WebElement e=driver.findElement(By.xpath("//*[@id='one']"));
if(driver instanceof JavascriptExecutor) {
String jswalker=
"var tw = document.createTreeWalker("
+ "arguments[0],"
+ "NodeFilter.SHOW_TEXT,"
+ "{ acceptNode: function(node) { return NodeFilter.FILTER_ACCEPT;} },"
+ "false"
+ ");"
+ "var ret=null;"
+ "while(tw.nextNode()){"
+ "var t=tw.currentNode.wholeText.trim();"
+ "if(t.length>0){" // skip over all-white text values
+ "ret=(ret ? ret+'\t'+t : t);" // if many, tab-separate them
+ "}"
+ "}"
+ "return ret;" // will return null if no non-empty text nodes are found
;
Object val=((JavascriptExecutor) driver).executeScript(jswalker, e);
// ---- Pass the context node here ------------------------------^
String textNodesTabSeparated=(null!=val ? val.toString() : null);
// ----^ --- this is the result you want
}
References:
TreeWalker - supported by all browsers
Selenium Javascript Executor
Initial suggested solution - not working - see enhancement request: 2840
driver.findElement(By.id("one")).findElement(By.xpath("./text()")).getText();
In a single search
driver.findElement(By.xpath("//*[@id='one']/text()")).getText();
See the child::text() selector in the XPath spec, under Location Paths.

I use a function like below:
private static final String ALL_DIRECT_TEXT_CONTENT =
"var element = arguments[0], text = '';\n" +
"for (var i = 0; i < element.childNodes.length; ++i) {\n" +
" var node = element.childNodes[i];\n" +
" if (node.nodeType == Node.TEXT_NODE" +
" && node.textContent.trim() != '')\n" +
" text += node.textContent.trim();\n" +
"}\n" +
"return text;";
public String getText(WebDriver driver, WebElement element) {
return (String) ((JavascriptExecutor) driver).executeScript(ALL_DIRECT_TEXT_CONTENT, element);
}

var outerElement = driver.FindElement(By.XPath("a"));
var outerElementTextWithNoSubText = outerElement.Text.Replace(outerElement.FindElement(By.XPath("./*")).Text, "");

Similar solution to the ones given, but instead of JavaScript or setting text to "", I remove elements in the XML and then get the text.
Problem:
Need text from 'root element without children' where children can be x levels deep and the text in the root can be the same as the text in other elements.
The solution treats the webelement as an XML and replaces the children with voids so only the root remains.
The result is then parsed. In my cases this seems to be working.
I only verified this code in an environment with Groovy. No idea if it will work in Java without modifications. Essentially you need to replace the Groovy libraries for XML with Java libraries and off you go, I guess.
As for the code itself, I have two parameters:
WebElement el
boolean strict
When strict is true, then really only the root is taken into account. If strict is false, then markup tags will be left. I included in this whitelist p, b, i, strong, em, mark, small, del, ins, sub, sup.
The logic is:
Manage whitelisted tags
Get element as string (XML)
Parse to an XML object
Set all child nodes to void
Parse and get text
Up until now this seems to be working out.
You can find the code here: GitHub Code
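Since the linked code is Groovy, here is a rough Java sketch of the logic listed above, using only the JDK's built-in XML parser. It is not the linked code. RootTextExtractor and rootText are hypothetical names, the input is assumed to be well-formed XML (real page markup often is not), and treating strict=false as "also keep the text of whitelisted child tags" is my reading of the description.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RootTextExtractor {
    // Markup tags whose text is kept when strict is false (the whitelist from the answer)
    private static final Set<String> WHITELIST = new HashSet<>(Arrays.asList(
            "p", "b", "i", "strong", "em", "mark", "small", "del", "ins", "sub", "sup"));

    /** Returns the text owned by the root element, ignoring non-whitelisted children. */
    static String rootText(String xml, boolean strict) throws Exception {
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        StringBuilder sb = new StringBuilder();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue()); // direct text of the root
            } else if (!strict && child.getNodeType() == Node.ELEMENT_NODE
                    && WHITELIST.contains(child.getNodeName().toLowerCase())) {
                sb.append(child.getTextContent()); // text of a whitelisted markup tag
            }
        }
        // Collapse runs of whitespace left behind by the skipped children
        return sb.toString().trim().replaceAll("\\s+", " ");
    }
}
```

With Selenium, the XML string would presumably come from something like el.getAttribute("outerHTML"), which only parses here if the markup happens to be valid XML.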

Replace string with jsoup only in text portions

I have found several topics with similar questions and valuable answers, but I am still struggling with this:
I want to parse some html with Jsoup so I can replace, for example,
"changeme"
with
<changed>changeme</changed>
, but only if it appears in a text portion of the HTML, not if it is part of a tag. So, starting with this HTML:
<body>
<p>test changeme app</p>
</BODY>
</HTML>
I would want to get to this:
<body>
<p>test <changed>changeme</changed> app</p>
</BODY>
</HTML>
I have tried several approaches; this is the one that brings me closest to the desired result:
Document doc = null;
try {
doc = Jsoup.parse(new File("tmp1450348256397.txt"), "UTF-8");
} catch (Exception ex) {
}
Elements els = doc.body().getAllElements();
for (Element e : els) {
if (e.text().contains("changeme")) {
e.html(e.html().replaceAll("changeme","<changed>changeme</changed>"));
}
}
html = doc.toString();
System.out.println(html);
But with this approach I find two problems:
<body>
<p><a href="http://<changed>changeme</changed> .html">test
<changed>
changeme
</changed>
app</a></p>
</BODY>
</HTML>
Line breaks are inserted before and after the new element I am introducing. This is not a real problem, as I could get rid of them by using #changed# for the replacement and, after doc.toString(), replacing those markers with the desired value (with < >).
The real problem: The URL in the href has been modified, and I don't want it to happen.
Ideas? Thx.

Here is my solution:
String html=""
+"<p><a href=\"http://changeme.html\">"
+ "test changeme "
+ "<div class=\"changeme\">"
+ "inner text changeme"
+ "</div>"
+ " app</a>"
+"</p>";
Document doc = Jsoup.parse(html);
Elements els = doc.body().getAllElements();
for (Element e : els) {
List<TextNode> tnList = e.textNodes();
for (TextNode tn : tnList){
String orig = tn.text();
tn.text(orig.replaceAll("changeme","<changed>changeme</changed>"));
}
}
html = doc.toString();
System.out.println(html);
TextNodes are always leaf nodes, i.e. they do not contain more HTML elements. In your original approach you replace the HTML of an element with new HTML in which the changeme strings are replaced. You only check that changeme is part of the TextNode's contents, but you replace every occurrence in the HTML string of the element, including all occurrences outside TextNodes.
My solution basically works like yours, but I use the JSoup method textNodes(). This way I don't need to typecast.
P.S.
Of course, my solution as well as yours will contain &lt;changed&gt;changeme&lt;/changed&gt; instead of <changed>changeme</changed> in the end, because the text is escaped when it is set. This may or may not be what you want. If you do not want this, then your result is no longer valid HTML, since changed is not a valid HTML tag. Jsoup will not help you in this case. However, you can of course replace all &lt;changed&gt;changeme&lt;/changed&gt; in the resulting string again, outside jsoup.
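That last step, done outside jsoup on the serialized string, could look roughly like this. ChangedTagRestorer and restoreChangedTags are made-up names; the sketch assumes the serialized output escapes the inserted tags as &lt;changed&gt; and &lt;/changed&gt;, and note that it would also convert any such escaped literal that was already in the source text.

```java
class ChangedTagRestorer {
    /** Turns the escaped marker tags in the serialized HTML back into real tags. */
    static String restoreChangedTags(String serializedHtml) {
        return serializedHtml
                .replace("&lt;changed&gt;", "<changed>")
                .replace("&lt;/changed&gt;", "</changed>");
    }
}
```

It would be applied to the result of doc.toString(), for example html = ChangedTagRestorer.restoreChangedTags(doc.toString());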

I think your issue is that you're replacing the element's HTML rather than just its text. Change:
e.html(e.html().replaceAll("changeme","<changed>changeme</changed>"));
to
e.text(e.text().replaceAll("changeme","<changed>changeme</changed>"));
the line breaks issue can probably be solved by doing doc.outputSettings().prettyPrint(false); before doing html = doc.toString();

Finally I tried this solution (at the end of the question), using TextNodes:
How I can replace "text" in the each tag using Jsoup
This is the resulting code:
Elements els = doc.body().getAllElements();
for (Element e : els) {
for (Node child : e.childNodes()){
if (child instanceof TextNode && !((TextNode) child).isBlank()) {
((TextNode)child).text(((TextNode)child).text().replaceAll("changeme","<changed>changeme</changed>"));
}
}
}
Now the output is as expected, and it doesn't even introduce extra line breaks. In this case prettyPrint must be set to true.
The only problem is that I don't really understand the difference of using TextNode vs Element.text(). If someone wants to provide some info it will be much appreciated.
Thanks.

WebDriver : Automated Code Generation without using Selenium IDE

My company's website is compatible only with IE, so I cannot use the IDE for recording WebDriver scripts.
There are HTML pages which have about 100 or 200 (not an exact count) textboxes and dropdowns.
Writing Java code to automate this is very tedious.
Can someone point me to a tool or utility that reads the HTML file itself and generates the corresponding code?
Or guide me on how to develop a utility to meet my need?
For example :
Consider an html file like this
<html>
<body>
<input name = "employee_name" />
<select id = "designation">
<option value = "MD">MD</option>
<option value = "programmer"> Programmer </option>
<option value = "CEO"> CEO </option>
</select>
</body>
</html>
If I give this file as input to the utility, it will generate a Java file like this
WebDriver driver = new InternetExplorerDriver();
WebElement employee_name = driver.findElement(By.name("employee_name"));
employee_name.sendKeys("...");
Select designation = new Select(driver.findElement(By.id("designation")));
designation.selectByVisibleText("...");
Thanks in Advance !

You should be using "Selenium Builder" rather than "Selenium IDE", BUT, in theory, you could get all similar elements from a page in a group like so:
List<WebElement> bodyinputs = driver
.findElements( By.xpath("//div[@class='body']/input") );
List<WebElement> footeranchors = driver
.findElements( By.xpath("//div[@class='footer']/a") );
Then, for each of these groups, you can loop through the lists, use a JavascriptExecutor to work out the XPath for each element, and store the XPath in a hashtable with each element:
protected String getXPath(WebElement webElement) {
String jscript = "function getPathTo(node) {" +
" var stack = [];" +
" while(node.parentNode !== null) {" +
" stack.unshift(node.tagName);" +
" node = node.parentNode;" +
" }" +
" return stack.join('/');" +
"}" +
"return getPathTo(arguments[0]);";
// the driver has to be cast to JavascriptExecutor to run scripts
return (String) ((JavascriptExecutor) driver).executeScript(jscript, webElement);
}
Then, the final step, you can auto-generate "By locators" using the HashTable as input.
But even if you do that you still need to write code to intelligently figure out which By locators get which inputs and which ones don't.
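The final step could be sketched roughly as follows: a map from variable name to XPath goes in, and Java source lines come out. LocatorCodeGenerator and generate are hypothetical names, and the variable names are assumed to be valid Java identifiers already. The "intelligent" part about deciding what to do with each element is not covered.

```java
import java.util.Map;

class LocatorCodeGenerator {
    /** Emits one "WebElement name = driver.findElement(By.xpath(...));" line per map entry. */
    static String generate(Map<String, String> xpathsByName) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : xpathsByName.entrySet()) {
            // Escape backslashes and double quotes so the XPath is a valid Java string literal
            String escaped = entry.getValue().replace("\\", "\\\\").replace("\"", "\\\"");
            out.append("WebElement ").append(entry.getKey())
               .append(" = driver.findElement(By.xpath(\"").append(escaped).append("\"));\n");
        }
        return out.toString();
    }
}
```

Passing a LinkedHashMap keeps the generated lines in the same order as the elements were collected.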

How can I consistently remove the default text from an input element with Selenium?

I'm trying to use Selenium WebDriver to input text to a GWT input element that has default text, "Enter User ID". Here are a few ways I've tried to get this to work:
searchField.click();
if(!searchField.getAttribute("value").isEmpty()) {
// clear field, if not already empty
searchField.clear();
}
if(!searchField.getAttribute("value").isEmpty()) {
// if it still didn't clear, click away and click back
externalLinksHeader.click();
searchField.click();
}
searchField.sendKeys(username);
The strange thing is that the above only works some of the time. Sometimes, it ends up searching for "Enter User IDus", basically beginning to type "username" after the default text, and not even finishing that.
Any other better, more reliable ways to clear out default text from a GWT element?
Edited to add: The HTML of the input element. Unfortunately, there's not much to see, thanks to the JS/GWT hotness. Here's the field when it's unselected:
<input type="text" class="gwt-TextBox empty" maxlength="40">
After I've clicked it and given it focus manually, the default text and the "empty" class are removed.
The JS to setDefaultText() gets called both onBlur() and onChange() if the change results in an empty text field. Guess that's why the searchField.clear() isn't helping.
I've also stepped through this method in debug mode, and in that case, it never works. When run normally, it works the majority of the time. I can't say why, though.

Okay, the script obviously kicks in when the clear() method clears the input and leaves it empty. The solutions I came up with are given below.
The naïve one, presses Backspace 10 times:
String b = Keys.BACK_SPACE.toString();
searchField.sendKeys(b+b+b+b+b+b+b+b+b+b + username);
(StringUtils.repeat() from Apache Commons Lang or Google Guava's Strings.repeat() may come in handy)
The nicer one using Ctrl+A, Delete:
String del = Keys.chord(Keys.CONTROL, "a") + Keys.DELETE;
searchField.sendKeys(del + username);
Deleting the content of the input via JavaScript:
JavascriptExecutor js = (JavascriptExecutor)driver;
js.executeScript("arguments[0].value = '';", searchField);
searchField.sendKeys(username);
Setting the value of the input via JavaScript altogether:
JavascriptExecutor js = (JavascriptExecutor)driver;
// pass username as a script argument so quotes in it cannot break the script
js.executeScript("arguments[0].value = arguments[1];", searchField, username);
Note that javascript might not always work, as shown here: Why can't I clear an input field with javascript?
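For the Backspace approach above, the repeated key string can also be built with a small helper if you don't want to pull in a library. A minimal sketch; KeyRepeater and repeat are names made up for illustration, and the key is passed in as a plain string so the block has no Selenium dependency.

```java
class KeyRepeater {
    /** Returns the key string repeated the given number of times (empty if times <= 0). */
    static String repeat(String key, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(key);
        }
        return sb.toString();
    }
}
```

With Selenium on the classpath it would be used as searchField.sendKeys(KeyRepeater.repeat(Keys.BACK_SPACE.toString(), 10) + username); the count could also be taken from the length of the field's current value instead of a fixed 10.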

For what it is worth, I'm having a very similar issue with WebDriver 2.28.0 and Firefox 18.0.1.
I'm also using GWT but can reproduce it with simple HTML/JS:
<html>
<body>
<div>
<h3>Box one</h3>
<input id="boxOne" type="text" onfocus="if (this.value == 'foo') this.value = '';" onblur="if (this.value == '') this.value = 'foo';"/>
</div>
<div>
<h3>Box two</h3>
<input id="boxTwo" type="text" />
</div>
</body>
</html>
This test fails most of the time:
@Test
public void testTextFocusBlurDirect() throws Exception {
FirefoxDriver driver = new FirefoxDriver();
driver.navigate().to(getClass().getResource("/TestTextFocusBlur.html"));
for (int i = 0; i < 200; i++) {
String magic = "test" + System.currentTimeMillis();
driver.findElementById("boxOne").clear();
Thread.sleep(100);
driver.findElementById("boxOne").sendKeys(magic);
Thread.sleep(100);
driver.findElementById("boxTwo").clear();
Thread.sleep(100);
driver.findElementById("boxTwo").sendKeys("" + i);
Thread.sleep(100);
assertEquals(magic, driver.findElementById("boxOne").getAttribute("value"));
}
driver.quit();
}
It could just be the OS taking focus away from the browser in a way WebDriver can't control. We don't seem to get this issue on the CI server, so maybe that is the case.

I cannot add a comment yet, so I am putting it as an answer here. I want to inform you that if you want to use only JavaScript to clear and/or edit an input text field, then the JavaScript approach given by @slanec will not work. Here is an example: Why can't I clear an input field with javascript?

If you use C#, the solution would be:
// provide some text
webElement.SendKeys("aa");
// this is how you use this in C# , VS
String b = Keys.Backspace.ToString();
// then provide back space few times
webElement.SendKeys(b + b + b + b + b + b + b + b + b + b);
