Selenium: Extract Text of a div with cssSelector in Java

I am writing a JUnit test for a webpage, using Selenium, and I am trying to verify that the expected text exists within a page. The code of the webpage I am testing looks like this:
<div id="recipient_div_3" class="label_spacer">
<label class="nodisplay" for="Recipient_nickname"> recipient field: reqd info </label>
<span id="Recipient_nickname_div_2" class="required-field"> *</span>
Recipient:
</div>
I want to compare what is expected with what is on the page, so I want to use
Assert.assertTrue(). I know that to get everything from the div, I can do
String element = driver.findElement(By.cssSelector("div[id='recipient_div_3']")).getText().replaceAll("\n", " ");
but this will return "reqd info * Recipient:"
Is there any way to just get the text from the div ("Recipient") using cssSelector, without the other tags?

You can't do this with a CSS selector, because CSS selectors don't have a fine-grained enough approach to express "the text node contained in the DIV but not its other contents". You can do that with an XPath locator, though:
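driver.findElement(By.xpath("//div[@id='recipient_div_3']/text()")).getText()
That XPath expression selects only the text nodes that are direct children of the DIV, rather than all the text contained within it and its child elements. Be aware, though, that WebDriver's findElement() can only return elements. An XPath that selects a text node fails with an InvalidSelectorException, so in practice you have to evaluate such an expression in the browser, for example through JavascriptExecutor.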
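You can't get a text node back from WebDriver, but you can check what the XPath itself selects offline. The sketch below does not use Selenium. It runs the question's markup, which happens to be well-formed XML, through the JDK's built-in javax.xml.xpath and compares /text() with //text(). The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DirectTextDemo {
    // The question's markup; it is well-formed XML, so the JDK parser accepts it
    static final String HTML =
          "<div id=\"recipient_div_3\" class=\"label_spacer\">\n"
        + "<label class=\"nodisplay\" for=\"Recipient_nickname\"> recipient field: reqd info </label>\n"
        + "<span id=\"Recipient_nickname_div_2\" class=\"required-field\"> *</span>\n"
        + "Recipient:\n"
        + "</div>";

    // Evaluates the XPath and joins the text of every selected node, collapsing whitespace
    static String selectText(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                sb.append(nodes.item(i).getTextContent()).append(' ');
            }
            return sb.toString().replaceAll("\\s+", " ").trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the div's own (direct) text nodes: prints "Recipient:"
        System.out.println(selectText(HTML, "//div[@id='recipient_div_3']/text()"));
        // Every descendant text node: prints "recipient field: reqd info * Recipient:"
        System.out.println(selectText(HTML, "//div[@id='recipient_div_3']//text()"));
    }
}
```

The div also has whitespace-only text nodes between its children. That is why the helper collapses whitespace before comparing.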

I am not sure it is possible with a single CSS locator, but you can get the text of the div, then get the text of the div's child elements and subtract it. Something like this (untested):
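import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;

String element = driver.findElement(By.cssSelector("div[id='recipient_div_3']")).getText();
List<WebElement> tempElements = driver.findElements(By.cssSelector("div[id='recipient_div_3'] *"));
// Remove each child's text separately; a single joined string may not match the div's spacing
for (WebElement tempElement : tempElements) {
    String childText = tempElement.getText();
    if (!childText.isEmpty()) element = element.replace(childText, "");
}
element = element.replaceAll("\\s+", " ").trim();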
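This is for the case where you want to avoid XPath. With XPath you can select the div's own text node directly (although WebDriver's findElement() will not return a text node):
//div[@id='recipient_div_3']/text()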
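You can try the subtraction idea without a browser. In this sketch, plain strings stand in for getText() results. The div's text is the value reported in the question, and the child texts are assumed. Two hypothetical helpers compare two approaches: removing each child's text separately, and removing one concatenated string. The concatenated string starts with a space, so it fails to match.

```java
import java.util.List;

final class SubtractChildText {
    // Removes each child's text from the parent's text one at a time, then tidies whitespace
    static String ownText(String fullText, List<String> childTexts) {
        String result = fullText;
        for (String child : childTexts) {
            if (!child.isEmpty()) {
                result = result.replace(child, "");
            }
        }
        return result.replaceAll("\\s+", " ").trim();
    }

    // Joins the child texts first (each prefixed with a space) and removes the joined string
    static String concatenatedThenRemoved(String fullText, List<String> childTexts) {
        String temp = "";
        for (String child : childTexts) {
            temp += " " + child;
        }
        return fullText.replace(temp, "").trim();
    }

    public static void main(String[] args) {
        String divText = "reqd info * Recipient:";          // text reported in the question
        List<String> children = List.of("reqd info", "*");  // assumed child texts
        System.out.println(ownText(divText, children));                 // Recipient:
        System.out.println(concatenatedThenRemoved(divText, children)); // reqd info * Recipient:
    }
}
```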

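You could also get the element's HTML and strip out the child elements with a regular expression. Note that you should use a reluctant quantifier, so that each match stops at the first matching closing tag:
https://docs.oracle.com/javase/tutorial/essential/regex/quant.html
import org.openqa.selenium.WebElement;
String getTextContentWithoutTags(WebElement element) {
    // getText() returns rendered text with no markup, so read innerHTML and drop child elements with their content
    String html = element.getAttribute("innerHTML");
    return html.replaceAll("(?s)<(\\w+)[^>]*>.*?</\\1>", "").replaceAll("<[^>]*>", "").trim();
}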
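This plain-JDK sketch shows why the reluctant quantifier matters. It uses no Selenium, and the class name and sample HTML are illustrative. It applies a reluctant and a greedy version of the element-stripping pattern to a string shaped like innerHTML. When two sibling elements share a name, the greedy .* runs to the last closing tag and also swallows the text between them.

```java
import java.util.regex.Pattern;

final class StripChildElements {
    // Reluctant: .*? stops at the first closing tag with the same name as the opening tag
    static final Pattern RELUCTANT = Pattern.compile("(?s)<(\\w+)[^>]*>.*?</\\1>");
    // Greedy: .* runs to the last such closing tag, swallowing any text in between
    static final Pattern GREEDY = Pattern.compile("(?s)<(\\w+)[^>]*>.*</\\1>");

    // Drops child elements (with their content), then any leftover tags, then trims
    static String ownText(String innerHtml, Pattern childElement) {
        String withoutChildren = childElement.matcher(innerHtml).replaceAll("");
        return withoutChildren.replaceAll("<[^>]*>", "").trim();
    }

    public static void main(String[] args) {
        String html = "<span>a</span> Recipient: <span>b</span>";
        System.out.println(ownText(html, RELUCTANT)); // Recipient:
        System.out.println(ownText(html, GREEDY));    // prints an empty line
    }
}
```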

Related

How to locate element with no tag name in Selenium Webdriver Java

<span id="UniversalRepositoryExplorer_treeNode7_name" style="white-space:nowrap;">
<img id="UniversalRepositoryExplorer_treeNodeIcon_7" src="../../images/server_running.gif"
style="width:16px;height:16px;" alt border="0">
" Running "
The span tag contains an img tag, and the text Running below it has no tag of its own.
I have tried the XPath below, but it didn't work:
//img[@id='UniversalRepositoryExplorer_treeNodeIcon_7']
Can someone suggest how to get Running with XPath?
Basically, you're trying to get the text node.
xpath: '//span/text()[last()]'
Example:
document.evaluate("//span/text()[last()]", window.document, null, XPathResult.ANY_TYPE, null).iterateNext()
// output: " Running "
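You can also check the text()[last()] step outside the browser. This sketch uses the JDK's javax.xml.xpath on an assumed XML-compatible version of the span: the <img> is written as <img/>, and the whitespace before it is a guess. The class name is hypothetical. The whitespace before the <img> is a text node of its own, so text()[1] is blank and text()[last()] is the node holding Running.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class LastTextNodeDemo {
    // Assumed XML-compatible version of the span: whitespace, a self-closed <img/>, then the text
    static final String HTML =
          "<span id=\"UniversalRepositoryExplorer_treeNode7_name\" style=\"white-space:nowrap;\">\n"
        + "  <img id=\"UniversalRepositoryExplorer_treeNodeIcon_7\" src=\"../../images/server_running.gif\"/>\n"
        + "  Running \n"
        + "</span>";

    // Returns the XPath's string value (the text of the first selected node), trimmed
    static String eval(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String value = (String) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.STRING);
            return value.trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval(HTML, "//span/text()[last()]"));      // Running
        System.out.println(eval(HTML, "//span/text()[1]").isEmpty()); // true: whitespace only
    }
}
```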
Looking at the element you have noted above, if I were trying to evaluate it in a Cucumber step, I would write this for selenium-java:
@And("^I can view the \"([^\"]*)\" image$")
public void iCanViewTheImage(String text) throws Throwable {
    // The <img> has no text of its own, so locate the parent span by its id and full text
    String spanXpath = "//span[@id='UniversalRepositoryExplorer_treeNode7_name' and contains(., '" + text + "')]";
    WebElement span = driver.findElement(By.xpath(spanXpath));
    // Throws NoSuchElementException if the icon is not inside the span
    span.findElement(By.xpath(".//img[@id='UniversalRepositoryExplorer_treeNodeIcon_7']"));
    assertTrue(span.getText().contains(text));
}
where the regex "([^"]*)" would be "Running" in your cucumber step:
And I can view the "Running" image
Your XPath with the text, based on the above, will be:
//span[contains(., 'Running') and contains(@id, 'UniversalRepositoryExplorer_treeNode7_name')]//img[@id='UniversalRepositoryExplorer_treeNodeIcon_7']
or simply
//span[contains(., 'Running') and contains(@id, 'UniversalRepositoryExplorer_treeNode7_name')]
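There is a caveat on contains(text(), ...). Browsers and the JDK implement XPath 1.0, which converts a node-set argument to the string value of its first node. So only the span's first text node is tested. If that node is the whitespace before the <img>, the predicate fails, while contains(., ...) tests all of the span's text. Below is a JDK-only sketch on an assumed XML copy of the span. The class name is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ContainsTextDemo {
    // Assumed XML-compatible version of the span, with whitespace before the <img/>
    static final String HTML =
          "<span id=\"UniversalRepositoryExplorer_treeNode7_name\" style=\"white-space:nowrap;\">\n"
        + "  <img id=\"UniversalRepositoryExplorer_treeNodeIcon_7\" src=\"../../images/server_running.gif\"/>\n"
        + "  Running \n"
        + "</span>";

    // Counts how many nodes the XPath selects in the given XML
    static int count(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Double n = (Double) XPathFactory.newInstance().newXPath()
                    .evaluate("count(" + xpath + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // text() is converted to the FIRST text node only, which is whitespace here
        System.out.println(count(HTML, "//span[contains(text(), 'Running')]")); // 0
        // . is the string value of the whole span, so this matches
        System.out.println(count(HTML, "//span[contains(., 'Running')]"));      // 1
    }
}
```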
The problem with your locator is that an IMG tag is self-closing (there is no </img> tag) so it can't contain text. The parent SPAN is likely the element that contains the "Running" text.
You haven't posted valid HTML for the portion being discussed so the below HTML is what I'm guessing the general structure looks like.
<span id="UniversalRepositoryExplorer_treeNode7_name" style="white-space:nowrap;">
<img id="UniversalRepositoryExplorer_treeNodeIcon_7" src="../../images/server_running.gif" style="width:16px;height:16px;" alt border="0">
" Running "
</span>
Given that HTML, the code would look like
driver.findElement(By.id("UniversalRepositoryExplorer_treeNode7_name")).getText();
When you grab the SPAN and then use .getText(), it will get all the text contained within that SPAN. This is of course a guess since you haven't posted all the HTML under that span. This may get more than what you wanted but that's the best we can do with an incomplete question. If you add more details, we can adjust our answers.
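The text Running is in a text node. Assuming there is whitespace before the <img>, it is the third child node (index 2 in childNodes) of its parent <span> tag, after that whitespace text node and the <img>. findElement() cannot return a text node, so wait for the <span> with WebDriverWait and visibilityOfElementLocated(), then read the text node through JavascriptExecutor, using either of the following locator strategies: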
cssSelector:
System.out.println(((JavascriptExecutor)driver).executeScript("return arguments[0].childNodes[2].textContent;", new WebDriverWait(driver, 20).until(ExpectedConditions.visibilityOfElementLocated(By.cssSelector("span#UniversalRepositoryExplorer_treeNode7_name")))).toString());
xpath:
System.out.println(((JavascriptExecutor)driver).executeScript("return arguments[0].childNodes[2].textContent;", new WebDriverWait(driver, 20).until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//span[@id='UniversalRepositoryExplorer_treeNode7_name']")))).toString());
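The childNodes[2] index depends on how the parent's children are counted. This sketch uses the JDK's DOM, not the browser, and an assumed XML copy of the span with whitespace before the <img>. It lists the span's child nodes the way childNodes does: the whitespace text node at index 0, the img at 1, and Running at 2. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ChildNodesDemo {
    // Assumed XML-compatible version of the span, with whitespace before the <img/>
    static final String HTML =
          "<span id=\"UniversalRepositoryExplorer_treeNode7_name\" style=\"white-space:nowrap;\">\n"
        + "  <img id=\"UniversalRepositoryExplorer_treeNodeIcon_7\" src=\"../../images/server_running.gif\"/>\n"
        + "  Running \n"
        + "</span>";

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Node names of the root's children: "#text" for text nodes, the tag name for elements
    static List<String> childNodeNames(String xml) {
        NodeList children = parseRoot(xml).getChildNodes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    // Trimmed text of the child node at the given index, like childNodes[index].textContent
    static String childText(String xml, int index) {
        return parseRoot(xml).getChildNodes().item(index).getTextContent().trim();
    }

    public static void main(String[] args) {
        System.out.println(childNodeNames(HTML)); // [#text, img, #text]
        System.out.println(childText(HTML, 2));   // Running
    }
}
```

If the real page has no whitespace between the <span> and the <img>, the text would be at index 1 instead.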
References
You can find a couple of relevant detailed discussions in:
How to extract just the number from html?
How do I use selenium to scrape text from a text node within a class through Python
How to retrieve partial text from a text node using Selenium and Python

Jsoup Trouble extracting formatting from html tables

<tr>
<th align="LEFT" bgcolor="GREY"> <span class="smallfont">Higher-order
Theorems</span>
</th><th bgcolor="PINK"> <em><a href="http://www.tptp.org/CASC/J9/SystemDescriptions.html#Satallax---3.2">Satallax</a><br><span class="xxsmallfont">3.2</span></em>
</th><th bgcolor="SKYBLUE"> <a href="http://www.tptp.org/CASC/J9/SystemDescriptions.html#Satallax---3.3">Satallax</a><br><span class="xxsmallfont">3.3</span>
</th><th bgcolor="LIME"> <a href="http://www.tptp.org/CASC/J9/SystemDescriptions.html#Leo-III---1.3">Leo‑III</a><br><span class="xxsmallfont">1.3</span>
</th><th bgcolor="YELLOW"> <a href="http://www.tptp.org/CASC/J9/SystemDescriptions.html#LEO-II---1.7.0">LEO‑II</a><br><span class="xxsmallfont">1.7.0</span>
</th></tr>
So let's say I want to extract bgcolor, align, and what is contained in the span, for example GREY, LEFT, Higher-order Theorems.
If I only wanted to extract bgcolor at the very least, but ideally all three, how would I do so?
I was attempting to extract just the bgcolor, and I've tried doc.select("tr:contains([bgcolor]"), doc.select(th, [bgcolor), doc.select([bgcolor]), doc.select(tr:containsdata(bgcolor) , as well as doc.select([style]), and all have either returned no output or thrown a parse error. I can extract the text in the span just fine; the problem is also extracting bgcolor and align.
You just need to parse the HTML you want to scrape with jsoup and then read the attributes you want with the attr() method of each Element. That gives you the value of that attribute for every th tag in the HTML. To also retrieve the text between the span tags, select the nested span in the th and call .text().
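import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// html holds the page source; keep the enclosing <table>, since the HTML parser may drop a bare <tr>
Document document = Jsoup.parse(html);
System.out.println(document);
Elements elements = document.select("tr > th");
for (Element element : elements) {
    String align = element.attr("align");   // "" when the attribute is absent
    String color = element.attr("bgcolor");
    String spanText = element.select("span").text();
    System.out.println("Align is " + align +
        "\nBackground Color is " + color +
        "\nSpan Text is " + spanText);
}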
For any further information feel free to ask me! Hope this helped you!
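If you want to check the logic without adding jsoup, here is a hedged stand-in that does the same per-<th> extraction with the JDK's DOM parser. It works on an XML-compatible version of two of the cells, with <br> closed as <br/> and the row wrapped in <table>. Like attr() in jsoup, getAttribute() returns an empty string when the attribute is missing. The class name and the align|bgcolor|text output format are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class HeaderCells {
    // Two cells from the question's row, made XML-compatible
    static final String SAMPLE =
          "<table><tr>"
        + "<th align=\"LEFT\" bgcolor=\"GREY\"> <span class=\"smallfont\">Higher-order\nTheorems</span></th>"
        + "<th bgcolor=\"PINK\"> <em><a href=\"http://www.tptp.org/CASC/J9/SystemDescriptions.html#Satallax---3.2\">Satallax</a>"
        + "<br/><span class=\"xxsmallfont\">3.2</span></em></th>"
        + "</tr></table>";

    // One "align|bgcolor|spanText" entry per <th>
    static List<String> describe(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            List<String> rows = new ArrayList<>();
            NodeList cells = doc.getElementsByTagName("th");
            for (int i = 0; i < cells.getLength(); i++) {
                Element th = (Element) cells.item(i);
                String align = th.getAttribute("align");   // "" if absent
                String color = th.getAttribute("bgcolor");
                // Searches all descendants, so a span inside <em> is found too; takes the first one
                NodeList spans = th.getElementsByTagName("span");
                String spanText = spans.getLength() == 0 ? ""
                        : spans.item(0).getTextContent().replaceAll("\\s+", " ").trim();
                rows.add(align + "|" + color + "|" + spanText);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String row : describe(SAMPLE)) {
            System.out.println(row); // LEFT|GREY|Higher-order Theorems, then |PINK|3.2
        }
    }
}
```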
Updated answer, in response to a comment:
To do that, use this line inside the for-each loop:
String fullText = element.text();
That way you get all the text contained within the selected element's tags, but you should look up this blog and adapt your query to it. You will probably also need to check whether the string is empty, and run a separate query for each possible case using if conditions.
That means one query for this structure: tr > th > span, another for this one: tr > th > em, and another for: tr > th.

Is there any way to get the XPath or CSS path of an element using Java?

I am trying to get the XPath and CSS path of an element using Java. I have used jsoup to parse the HTML and I am getting the CSS path, but in some cases it is returning the wrong path. (I am matching it with Selenium generated paths.)
I am using the code below to generate the CSS path.
My element is "s-Rectangle_44":
<div id="s-Rectangle_44" class="rectangle firer click commentable">
<div class="clipping">
<div class="content">
<div class="valign">
<span id="rtr-s-Rectangle_44_0"></span>
</div>
</div>
</div>
</div>
Selenium gives the CSS path as css=#s-Rectangle_44 > div.clipping > div.content > div.valign, while I am getting an array index out of bounds exception. I also need to get the XPath. Is there any other method to get these? Can I use Firebug with Java?
public static String getCssPath(Element el) {
if (el == null)
return "";
if (!el.id().isEmpty())
return "#" + el.id();
StringBuilder selector = new StringBuilder(el.tagName());
String classes = StringUtil.join(el.classNames(), ".");
if (!classes.isEmpty())
selector.append('.').append(classes);
if (el.parent() == null)
return selector.toString();
selector.insert(0, " > ");
if (el.parent().select(selector.toString()).size() > 1) {
selector.append(String.format(":nth-child(%d)",
el.elementSiblingIndex() + 1));
}
return getCssPath(el.parent()) + selector.toString();
}
CSS path
According to the jsoup API reference an Element has a cssSelector() function, which returns the related CSS path.
This function simply returns #s-Rectangle_44 for your element with the ID s-Rectangle_44.
XPath
If you can assume that every element you want to match has an ID, then this code may be sufficient:
String getXPath(Element el)
{
return "//*[@id='" + el.id() + "']";
}
For your example, this would return //*[@id='s-Rectangle_44'] for your element with the ID s-Rectangle_44.
There is also an answer to a similar thread, which has code for absolute paths.
Firebug's code (written in JavaScript) has functions for getting the CSS path and functions for getting the XPath, which generate absolute paths and could be translated to Java.
Note that Firebug is under BSD license.
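For elements without an id, you can build an absolute path by walking up the tree, which is roughly what the Firebug functions mentioned above do. The sketch below is an independent plain-JDK version on org.w3c.dom nodes. It is not Firebug's code and not part of jsoup's API, and the sample document is assumed. It adds a [n] index only when same-name siblings exist, and checks the result by evaluating the generated path again.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class AbsoluteXPath {
    // Assumed sample: a sibling div precedes the target, so an index is needed
    static final String SAMPLE =
          "<html><body><div id=\"header\"/>"
        + "<div id=\"s-Rectangle_44\" class=\"rectangle firer click commentable\">"
        + "<div class=\"clipping\"><div class=\"content\"><div class=\"valign\">"
        + "<span id=\"rtr-s-Rectangle_44_0\"/></div></div></div></div>"
        + "</body></html>";

    // Walks from the element up to the root, emitting /name or /name[position] per step
    static String of(Node element) {
        StringBuilder path = new StringBuilder();
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            String name = n.getNodeName();
            Node parent = n.getParentNode();
            int sameName = 0;  // siblings (including n) that share this tag name
            int position = 1;  // n's 1-based position among them
            for (Node s = parent == null ? n : parent.getFirstChild(); s != null; s = s.getNextSibling()) {
                if (s.getNodeType() == Node.ELEMENT_NODE && s.getNodeName().equals(name)) {
                    sameName++;
                    if (s == n) {
                        position = sameName;
                    }
                }
            }
            // Only add an index when same-name siblings make one necessary
            path.insert(0, "/" + name + (sameName > 1 ? "[" + position + "]" : ""));
        }
        return path.toString();
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static Node find(Document doc, String xpath) {
        try {
            return (Node) XPathFactory.newInstance().newXPath().evaluate(xpath, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        Node target = find(doc, "//*[@id='s-Rectangle_44']");
        String path = of(target);
        System.out.println(path);                               // /html/body/div[2]
        System.out.println(find(doc, path).isSameNode(target)); // true
    }
}
```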
I would recommend writing your own XPath by hand, because sometimes you need to use attributes other than id and class, e.g. "data-id", when a developer uses them to store properties.
Another way to get an XPath for an element is the developer panel in Chrome: right-click the element in the Elements panel and choose Copy > Copy XPath. Note that this copies a path anchored on the nearest element with an ID; Copy full XPath gives an absolute path from /html instead.

Get several elements with the same class name with JSoup

Is there a way to get the HTML of several elements that share the same class name, using the Java library JSoup?
For example:
<div class="div_idalgo_content_result_date_match_local">
blablabla
</div>
<div class="div_idalgo_content_result_date_match_local">
123456789
</div>
I'd like get blablabla in one String and 123456789 in another.
I hope my question is understandable.
This can be done in several different ways.
If you want to select the divs with the class name above, you can simply use the following:
Elements div = doc.select("div.div_idalgo_content_result_date_match_local");
This will give you a collection of Element that you can iterate over.
If you then want to select only the first one, you can use the first() method or the :eq(0) selector.
Element firstDiv = div.first();
OR
Elements div = doc.select("div.div_idalgo_content_result_date_match_local:eq(0)");
Note that the second method selects from the document, while the first selects from the Elements collection. You can of course change the index in :eq(0) to match your element. Be aware, though, that :eq(n) matches on sibling index (the element's position among its parent's children), not on position in the result list. So it only behaves like first() when the div is the first child of its parent. There are many other useful selectors, and I have included a link to them at the end of the answer.
The following code will split your div's into two:
Elements div = doc.select("div.div_idalgo_content_result_date_match_local");
Element firstDiv = div.first();
Element secondDiv = div.get(1);
System.out.println("This is the first div: " + firstDiv.text());
System.out.println("This is the second div: " + secondDiv.text());
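For comparison, here is the same "collect every matching div's text" idea in plain JDK XPath. It is a stand-in for jsoup's doc.select(...) loop, not jsoup itself, and it assumes a <body> wrapper around the question's markup. The contains(concat(...)) test is the usual XPath 1.0 way to match a single class token, like div.classname in CSS. The class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SameClassTexts {
    // The question's two divs, wrapped in an assumed <body> so the XML has one root
    static final String SAMPLE =
          "<body>"
        + "<div class=\"div_idalgo_content_result_date_match_local\">\nblablabla\n</div>"
        + "<div class=\"div_idalgo_content_result_date_match_local\">\n123456789\n</div>"
        + "</body>";

    // Returns the trimmed text of every div whose class list contains className, in document order
    static List<String> texts(String xml, String className) {
        String xpath = "//div[contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')]";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList divs = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < divs.getLength(); i++) {
                result.add(divs.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> found = texts(SAMPLE, "div_idalgo_content_result_date_match_local");
        System.out.println(found.get(0)); // blablabla
        System.out.println(found.get(1)); // 123456789
    }
}
```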
JSoup Cookbook - Selector syntax

Locating child nodes of WebElements in selenium

I am using selenium to test my web application and I can successfully find tags using By.xpath. However now and then I need to find child nodes within that node.
Example:
<div id="a">
<div>
<span />
<input />
</div>
</div>
I can do:
WebElement divA = driver.findElement(By.xpath("//div[@id='a']"));
But now I need to find the input, so I could do:
driver.findElement(By.xpath("//div[@id='a']//input"));
However, at that point in code I only have divA, not its xpath anymore... I would like to do something like this:
WebElement input = driver.findElement( divA, By.xpath( "//input" ) );
But such a function does not exist.
Can I do this anyhow?
BTW: Sometimes I need to find a <div> that has a certain descendant node. How can I ask in xpath for "the <div> that contains a <span> with the text 'hello world'"?
According to JavaDocs, you can do this:
WebElement input = divA.findElement(By.xpath(".//input"));
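The leading dot in .//input is essential. An XPath that starts with // is evaluated from the document root even when the search is started from an element, so divA.findElement(By.xpath("//input")) can return an input outside divA. The sketch below shows the same rule with the JDK's javax.xml.xpath, using divA as the context node. The markup, with an extra input outside the div, and the class name are assumed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RelativeXPathDemo {
    // The question's structure plus an assumed second input outside div#a
    static final String HTML =
          "<body><div id=\"a\"><div><span/><input name=\"inside\"/></div></div>"
        + "<input name=\"outside\"/></body>";

    // Finds div#a, then evaluates the given XPath with div#a as the context node
    static List<String> inputNames(String xpathFromDivA) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(HTML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node divA = (Node) xpath.evaluate("//div[@id='a']", doc, XPathConstants.NODE);
            NodeList inputs = (NodeList) xpath.evaluate(xpathFromDivA, divA, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < inputs.getLength(); i++) {
                names.add(((Element) inputs.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(inputNames(".//input")); // [inside]: relative to divA
        System.out.println(inputNames("//input"));  // [inside, outside]: whole document
    }
}
```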
How can I ask in xpath for "the div-tag that contains a span with the
text 'hello world'"?
WebElement elem = driver.findElement(By.xpath("//div[span[text()='hello world']]"));
The XPath spec is a surprisingly good read on this.
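Note that span[...] inside the predicate only looks at direct span children. To match a div with the span anywhere below it, use .//span[...] instead. Here is a JDK-only sketch on assumed markup with one nested and one non-matching div. The class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class HasSpanDemo {
    // Assumed markup: the matching span sits two levels below div#outer
    static final String HTML =
          "<body>"
        + "<div id=\"outer\"><div id=\"inner\"><span>hello world</span></div></div>"
        + "<div id=\"other\"><span>goodbye</span></div>"
        + "</body>";

    // Returns the ids of the divs the XPath selects, comma-separated in document order
    static String ids(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(HTML)));
            NodeList divs = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < divs.getLength(); i++) {
                result.add(((Element) divs.item(i)).getAttribute("id"));
            }
            return String.join(",", result);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ids("//div[span[text()='hello world']]"));    // inner
        System.out.println(ids("//div[.//span[text()='hello world']]")); // outer,inner
    }
}
```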
If you have to wait, ExpectedConditions has a method presenceOfNestedElementLocatedBy that takes the "parent" element and a locator, e.g. a By.xpath:
WebElement subNode = new WebDriverWait(driver,10).until(
ExpectedConditions.presenceOfNestedElementLocatedBy(
divA, By.xpath(".//div/span")
)
);
To find all the child elements, you can use the snippet below:
List<WebElement> childs = MyCurrentWebElement.findElements(By.xpath("./child::*"));
for (WebElement e : childs)
{
System.out.println(e.getTagName());
}
Note that this will only give you the direct child elements, one level down.
For example, if you have a structure like this:
<Html>
<body>
<div> ---suppose this is current WebElement
<a>
<a>
<img>
<a>
<img>
<a>
It will give you the tag names of the 3 anchor tags only. If you want all the child elements recursively, you can replace the above code with
MyCurrentWebElement.findElements(By.xpath(".//*"));
Hope That Helps !!
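Here is a JDK-only sketch of the difference. It uses an assumed, well-formed structure: a div with three a children, two of which contain an img. The XPath runs through javax.xml.xpath rather than WebDriver, and the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ChildVsDescendant {
    // Assumed structure: the root div plays the role of the current WebElement
    static final String HTML = "<div id=\"current\"><a/><a><img/></a><a><img/></a></div>";

    // Tag names selected by a relative XPath evaluated from the root div
    static List<String> tagNames(String relativeXPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(HTML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(relativeXPath, doc.getDocumentElement(), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tagNames("./child::*")); // [a, a, a]
        System.out.println(tagNames(".//*"));       // [a, a, img, a, img]
    }
}
```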
I also found myself in a similar position a couple of weeks ago. You can also do this by creating a custom ElementLocatorFactory (or simply passing divA into the DefaultElementLocatorFactory) so that element lookups are scoped to the first div. You would then call the appropriate PageFactory initElements method.
In this case, you would do the following:
PageFactory.initElements(new DefaultElementLocatorFactory(divA), pageObjectInstance);
// The Page Object instance would then need a WebElement
// annotated with something like @FindBy(tagName = "input"), or a relative XPath such as .//input
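The toString() method of Selenium's By class produces something like
"By.xpath: //XpathFoo"
So, if you still have the By locator you used to find divA, you could take the substring after the colon with something like this:
By divALocator = By.xpath("//div[@id='a']");
String selector = divALocator.toString().substring(divALocator.toString().indexOf(":") + 2);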
With this, you could find your element inside your other element with this:
WebElement input = driver.findElement( By.xpath( selector + "//input" ) );
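The string handling can be tested on its own. This hypothetical helper takes a By.toString() value such as "By.xpath: //div[@id='a']" and returns what follows the first colon. It trims the result instead of skipping a fixed two characters. It only makes sense for locators whose toString() has that "By.xxx: value" shape, such as By.xpath.

```java
final class LocatorToString {
    // Hypothetical helper: returns what follows the first ':' in a By.toString() value
    static String selectorOf(String byToString) {
        int colon = byToString.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Unexpected locator format: " + byToString);
        }
        return byToString.substring(colon + 1).trim();
    }

    public static void main(String[] args) {
        String parent = selectorOf("By.xpath: //div[@id='a']");
        System.out.println(parent + "//input"); // //div[@id='a']//input
    }
}
```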
Advantage: You have to search only once on the actual SUT, so it could give you a bonus in performance.
Disadvantage: it's ugly... if you want to find the parent element with a CSS selector and use XPath for its children, you have to check the locator types before you concatenate...
In this case, Slanec's solution (using findElement on a WebElement) is much better.
