Get div by class containing two whitespaces in a row (Jsoup)

I'm trying to get a specific div by its class. The class attribute actually contains multiple classes separated by spaces, but the last class is separated by two spaces!
Ex: class=test[SPACE]test[SPACE]test[SPACE][SPACE]test
full:
listing[SPACE]category_templates[SPACE]clearfix[SPACE]shelfListing[SPACE][SPACE]multiSaveListing
Now how do I go about doing that?
Did not work (No Error thrown):
Elements divItemContainer = doc.select("div[class=listing category_templates clearfix shelfListing multiSaveListing]");
for (Element div : divItemContainer) {
Toast.makeText(ApplicationContextProvider.getContext(), "Got Div: ", Toast.LENGTH_SHORT).show();
}
Did not work (Thrown Error: String cannot contain whitespaces):
Elements divItemContainer = doc.select("div.listing.category_templates.clearfix.shelfListing..multiSaveListing");
for (Element div : divItemContainer) {
Toast.makeText(ApplicationContextProvider.getContext(), "Got Div: ", Toast.LENGTH_SHORT).show();
}
Did not work (No Error):
Elements divItemContainer = doc.select("div.listing.category_templates.clearfix.shelfListing.multiSaveListing");
for (Element div : divItemContainer) {
Toast.makeText(ApplicationContextProvider.getContext(), "Got Div: ", Toast.LENGTH_SHORT).show();
}
PS: The Toast is meant to purposely crash the app! It does nothing but kill it, and that's supposed to happen (at least at the moment).
Source:
<div class="listing category_templates clearfix shelfListing multiSaveListing"><div id="yousaveImage"></div><div class="multisave" id="multiSaveId"><a class="linksave" href="/promotion/2-for-250/ls85559"><span class="view-all">View all</span><span class="offer-2for3">2 for</span><span><span class="poundSign"></span><span class="ping-offer-finalValue">£2.50</span><span class="ping-offer-finalValue-1" style="display:none"></span><span class="pencep" style="display:none">p</span></span></a></div><div class="container"><div class="slider category_templates"><input id="itemId" value="1000000476716" type="hidden"><input id="maxQtyId" value="24.0" type="hidden"><div class="product active"><div class="slider"><div class="information active"><div class="imgContainer"><img class="" src="http://ui2.assets-asda.com:80/g/v5/501/375/5051413501375_130_IDShot_4.jpeg" data-original="http://ui2.assets-asda.com:80/g/v5/501/375/5051413501375_130_IDShot_4.jpeg" alt="ASDA Chosen By You Orange & Pineapple Double Strength Squash 2 FOR £2.50" title="" onerror="loadNoImage(this)"><span class="accessible"> Add to shopping list</span></div><p class="bundle-contains" style="display:none;"> Contains <span>0</span> <span>items</span></p><p class="subTitle">1.5LT</p></div></div><div class="product-content"><span class="bundle-banner" style="display:none;"> Bundle </span><span class="promoBanner"></span><span class="primaryBanner" style="display:none;">2 FOR £2.50</span><span class="title" id="productTitle"><a role="presentation" aria-hidden="true" tabindex="-1" href="/product/no-added-sugar/asda-chosen-by-you-orange-pineapple-double-strength-squash/1000000476716" title="ASDA Chosen By You Orange & Pineapple Double Strength Squash"><span>ASDA Chosen By You Orange & Pineapple Double Strength Squash</span></a></span><div class="product-type-icons" style="visibility:visible"><i data-contentid="" data-similarproducts="true" data-title="Suitable for Vegetarians" data-name="Suitable for Vegetarians" 
title="Vegetarian" class="type-icon icon-suitable-for-vegetarians" data-infoiconid="1215398078196" data-id="2854136">Vegetarian</i></div><div class="rating-static rating-50"><span class="star star1"></span><span class="star star2"></span><span class="star star3"></span><span class="star star4"></span><span class="star star5"></span></div><div class="prod-limit-Mask"></div><div class="quantity-info-Mask"><span class="qLimit-toolTip"></span> Close <div class="qLimit-popUp"><p id="quantityLimitText"><span class="qLimit-Sorry">Sorry...</span>You can't add more than <span class="max-qty-val">24</span> per order</p></div></div><div id="cartBground" class="addedbg"><div class="price-cart-block"><div class="price-wrap category_templates"><span class="price"><span>£1.40</span></span><span class="priceInformation"> (9.3p/100ml) </span></div>AddView bundle<div class="quantityOptions clearfix"><span>–</span><input aria-label="Quantity in your trolley" value="1" name="quantityInTrolley" class="prd-txt" maxlength="5" type="number"><span>+</span>Add<div id="qtySelect" class="qty-wrapper" style="display: none;"><div class="qty-select"><span class="qty-value" tabindex="0" title="Quantity">Q<span class="accessible">uanti</span>ty</span><span class="qty-select-icon"></span></div><ul class="qty-list" style="display:none"><li class="qtyAccessible"><span title="Quantity" data-salesunit="Qty">Q<span class="accessible">uanti</span>ty</span></li><li class="kgAccessible"><span title="Kilogram" data-salesunit="kg">k<span class="accessible">ilo</span>g<span class="accessible">ram</span></span></li></ul></div><p id="inTrolleyId">in your trolley</p></div></div><div id="itemAjaxLoader" class="ajaxLoader 1000000476716" style="display:none;"><img src="//ui3.assets-asda.com/theme/img/common/loader.svg" style="width: 32px;" onerror="this.src=//ui3.assets-asda.com/theme/img/common/ajax-loader.gif; this.onerror=null;"></div><div class="unavail-item-message"> Item unavailable<span 
class="qLimit-toolTip"></span></div><div class="unavail-item"><span class="unavailable-image"></span><span></span></div></div></div><div class="sectionMenu"></div></div></div></div></div>

This works, but it is fragile and there is no reason to use it: for it to match, the order of the classes and the whitespace between them must be identical to the attribute value. You say it doesn't work, but I've tested it and it does.
Elements divItemContainer = doc.select("div[class=listing category_templates clearfix shelfListing multiSaveListing]");
for (Element div : divItemContainer) {
Toast.makeText(ApplicationContextProvider.getContext(), "Got Div: ", Toast.LENGTH_SHORT).show();
}
This is the way to do it. Neither the order of the classes nor the whitespace between them matters. You say it doesn't work, but I've tested it and it does.
Elements divItemContainer = doc.select("div.listing.category_templates.clearfix.shelfListing.multiSaveListing");
for (Element div : divItemContainer) {
Toast.makeText(ApplicationContextProvider.getContext(), "Got Div: ", Toast.LENGTH_SHORT).show();
}
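To illustrate why the class selector is the more robust of the two, here is a minimal sketch in plain Java. It does not use Jsoup and is only a simplified model of the two matching strategies (Jsoup's real matching may differ in details such as case handling). ClassTokenDemo and its method names are made up for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ClassTokenDemo {

    // Splits a class attribute on runs of whitespace, so "a  b" and "a b" give the same tokens.
    static Set<String> classTokens(String classAttribute) {
        String trimmed = classAttribute.trim();
        if (trimmed.isEmpty()) {
            return new HashSet<>();
        }
        return new HashSet<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Simplified model of a selector like div.a.b.c: every wanted class must be present, in any order.
    static boolean matchesClassSelector(String classAttribute, List<String> wanted) {
        return classTokens(classAttribute).containsAll(wanted);
    }

    // Simplified model of div[class=...]: the whole attribute value is compared as one string.
    static boolean matchesExactAttribute(String classAttribute, String expected) {
        return classAttribute.equals(expected);
    }

    public static void main(String[] args) {
        // Two spaces before the last class, as described in the question.
        String attr = "listing category_templates clearfix shelfListing  multiSaveListing";
        List<String> wanted = Arrays.asList("multiSaveListing", "listing", "shelfListing");

        System.out.println(matchesClassSelector(attr, wanted));
        System.out.println(matchesExactAttribute(attr,
                "listing category_templates clearfix shelfListing multiSaveListing"));
    }
}
```

In this model the class-based check succeeds regardless of order or spacing, while the whole-string comparison fails as soon as the attribute contains a double space that the selector does not.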
For this one the error is correct.
Elements divItemContainer = doc.select("div.listing.category_templates.clearfix.shelfListing..multiSaveListing");
for (Element div : divItemContainer) {
Toast.makeText(ApplicationContextProvider.getContext(), "Got Div: ", Toast.LENGTH_SHORT).show();
}
Your query goes through validation before it is executed. The validation receives every class name you supply: the CSS selector is split at each ., so by typing consecutive dots you are creating empty class names.
public static void notEmpty(String string) {
if ((string == null) || (string.length() == 0))
throw new IllegalArgumentException("String must not be empty");
}
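The effect of the double dot can be reproduced with a simplified model in plain Java. This is not Jsoup's actual parser, only a sketch of the "split at every dot, then validate each piece" idea described above. SelectorSplitDemo and classNames are hypothetical names; notEmpty is the method quoted above.

```java
import java.util.ArrayList;
import java.util.List;

final class SelectorSplitDemo {

    // Same check as the validation method quoted above.
    static void notEmpty(String string) {
        if ((string == null) || (string.length() == 0)) {
            throw new IllegalArgumentException("String must not be empty");
        }
    }

    // Simplified model: everything after the tag name is split at '.', and each piece is validated.
    static List<String> classNames(String selector) {
        List<String> names = new ArrayList<>();
        int firstDot = selector.indexOf('.');
        if (firstDot < 0) {
            return names; // no class part at all
        }
        // The limit of -1 keeps empty pieces, which is what ".." produces.
        for (String name : selector.substring(firstDot + 1).split("\\.", -1)) {
            notEmpty(name);
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(classNames("div.shelfListing.multiSaveListing"));
        try {
            classNames("div.shelfListing..multiSaveListing");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The single-dot selector yields two class names, while the double-dot selector produces an empty piece between the dots and is rejected by the validation.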
The reason it doesn't work is not your selector. Try printing the response you get from the server: after Document doc = Jsoup.parse()..., print the doc. Does it contain the element you are searching for? I suspect it doesn't.
If I'm right that the element you are searching for is not present in the response you are getting, then there are two possibilities:
The server perceives your program as a bot and blocks it, or it serves you a mobile version of the page, so you get something different from what you see in the browser. If this is the case, the solution is to set a userAgent.
The element is not present because it is generated by JavaScript. Jsoup is just a parser, not a browser. It cannot execute JavaScript, so it cannot generate the dynamic content. To check whether the content you need is dynamic, navigate to the page, press Ctrl + U and check whether the element you need is in there. That's the content before any JavaScript is executed.

Related

Selenium WebDriver - Using Java - How can I check if error messages are visible or not in a webpage?

I am testing a webpage that does some user error validation. When the webpage first appears, no error messages should appear, so I need to check for that. Then, depending upon the error (sometimes after clicking “submit” other times after the user enters data), I need to verify that the correct error messages appear.
In the code below, no error message should appear when the webpage is first loaded, but if I don’t enter a date and click the submit button, the error message should appear.
<div id="showNotIE" style="display: none;">
<input id="txtImplantDate" class="ng-pristine ng-untouched ng-empty ng-invalid ng-invalid-required" type="date" required="" placeholder="YYYY-MM-DD" name="txtImplantDate" ng-model="ImplantInsert.ImplantImplantDate">
</div>
<div class="ng-scope" ng-if="showMsgs && (Currentdate < newDate)" style="">
<span style="color:red;">Implant Date cannot be greater than today and is required</span>
Using the Java code below, this seems to function properly (the first check in the IE 11 browser takes a REALLY LONG TIME, but it does appear to work).
//Confirming text is not visible
boolean isPresent = driver.findElements(By.xpath(textLocator)).size() > 0;
if (isPresent) {
//Write to log that text is present (FAIL)
} else {
//Write to log that text is not present (PASS)
} //end if
This code also seems to work:
//Confirming text is not visible
boolean isEmpty = driver.findElements(By.xpath(textLocator)).isEmpty();
if (isEmpty) {
//Write to log that text is not present (PASS)
} else {
//Write to log that text is present (FAIL)
} //end if
However, when I test against this HTML code and use the same Selenium WebDriver Java logic to test, I get the wrong results.
<select id="selPatientTypes" class="fontInput ng-pristine ng-untouched ng-empty ng-invalid ng-invalid-required" required="" name="selPatientTypes" ng-options="n.PatienTypeID as n.PatientTypeDescription for n in scPatientTypes | filter:FilterPatientTypes" ng-model="ImplantInsert.ImplantPatientTypeID">
<option class="" value="" selected="selected">-- Please select Patient Type --</option>
<option label="Adult" value="number:1">Adult</option>
<option label="Pediatric" value="number:2">Pediatric</option>
</select>
<span class="ng-hide" style="color:red;" ng-show="showMsgs && ImplantForm.selPatientTypes.$error.required">Patient Type is required</span>
If I try using this “isDisplayed” code, Java errors out.
try {
boolean xpathIsDisplayed = driver.findElement(By.xpath(fieldLocator[value])).isDisplayed();
// Write to log that text is present (FAIL)
} catch (Error e) {
//Write to log that text is not present (PASS)
} //end try
The error message is:
org.openqa.selenium.NoSuchElementException: Unable to find element with xpath == //div[2]/table[2]/tbody/tr[1]/td[2]/div[3]/span (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 30.06 seconds
For documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html (BTW: This URL doesn’t provide any useful information)
This is another type of error logic that is used on the webpage.
<input id="txtPatientBSA" class="fontInput ng-pristine ng-untouched ng-valid ng-empty ng-valid-maxlength" oninput="this.value = this.value.replace(/[^0-9.]/g, ''); this.value = this.value.replace(/(\..*)\./g, '$1'); " title="Patient BSA should be in range of 0 to 5.00 (X.XX)" ng-keyup="ValidateBSA()" style="direction: rtl" max="5.00" min="0.00" maxlength="4" size="4" ng-model="ImplantInsert.ImplantPatientBSA" name="txtPatientBSA" placeholder="X.XX">
<span id="ErrorMsgBSA" class="error error-keyup-1 ng-hide" style="color:red;" ng-show="ShowErrorMsgBSA"> Patient BSA should be in range of 0 to 5.00 (X.XX)</span>
Does anyone know if there is a way to check for all types of HTML error message logic and determine if they are visible or not visible on the webpage?
Thanks.
I'm guessing your first check takes a really long time because you have a long implicit wait set. An implicit wait makes the driver wait up to the specified time for an element to appear. For an error message that isn't there, it waits out the full timeout and then moves on, which makes your code execute slowly. I would remove the implicit wait and add explicit waits where needed. The second error just says that it can't find the element with the given XPath. Note that findElement throws NoSuchElementException, which is an Exception rather than an Error, so catch (Error e) never catches it. You probably need to update your XPath or use another locator.
Here's a Java function that you can pass your desired locator into and it will return true if the element exists and is visible.
/**
 * Returns whether an element is visible
 *
 * @param locator the locator to find the desired element
 * @return true if the element exists and is visible, false otherwise
 */
public boolean IsVisible(By locator)
{
    List<WebElement> elements = driver.findElements(locator);
    if (elements.isEmpty())
    {
        // element doesn't exist
        return false;
    }
    else
    {
        // element exists, check for visibility
        return elements.get(0).isDisplayed();
    }
}
For the element below,
<span id="ErrorMsgBSA" class="error error-keyup-1 ng-hide" style="color:red;" ng-show="ShowErrorMsgBSA"> Patient BSA should be in range of 0 to 5.00 (X.XX)</span>
you could use the locator, By.id("ErrorMsgBSA").
For the element below,
<span class="ng-hide" style="color:red;" ng-show="showMsgs && ImplantForm.selPatientTypes.$error.required">Patient Type is required</span>
you could use a CSS selector like, "span.ng-hide" which just means find a SPAN that contains a class (.) ng-hide.
A CSS Reference
CSS Selector Tips
You should wrap findElement in a try/catch block and use findElement instead of findElements (the XPath is unique).
Use a for loop. The point is that your element can be there, but Selenium may not be able to find it at once, so you need to check more than once and wait a little after every iteration.
WebElement element = null;
boolean flag = false;
for (int i = 0; i< 10; i++) {// iterating 10 times
try {
element = driver.findElement(By.xpath("yourXpath"));
if (element != null) {
flag = true;
break;
}
} catch (Exception e) {
//you can log or whatever you want
}
pause(1000); //after iteration wait for a second ,
//so it will give a time to load the DOM
}
if(flag){
//your code goes here
}else{
//your code goes here
}
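The same retry idea can be written as a small reusable helper. This is a sketch in plain Java with no Selenium dependency; Retry and poll are hypothetical names. In real use the supplier would be something like () -> driver.findElement(By.xpath("yourXpath")), whose NoSuchElementException is a RuntimeException and is therefore treated as "not found yet".

```java
import java.util.Optional;
import java.util.function.Supplier;

final class Retry {

    // Calls lookup up to maxAttempts times, sleeping delayMillis after each failed attempt.
    // Returns the first non-null result, or Optional.empty() if every attempt failed.
    static <T> Optional<T> poll(Supplier<T> lookup, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T result = lookup.get();
                if (result != null) {
                    return Optional.of(result);
                }
            } catch (RuntimeException e) {
                // Not found yet; fall through and retry. Log here if needed.
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // give the page time to update
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop retrying
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated lookup that fails twice before it succeeds.
        Optional<String> found = poll(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("not there yet");
            }
            return "element";
        }, 10, 50);
        System.out.println(found.isPresent() + " after " + calls[0] + " attempts");
    }
}
```

Returning an Optional instead of setting a flag keeps the "found or not" decision and the element itself in one value. Selenium's own explicit waits cover the same need and are usually preferable when available.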

How to get related classes and values in JSoup?

I have an HTML file, a part of which looks like this:
<a name="user_createtime"></a>
<p class="column">
<span class="coltitle">CreateTime</span> <span class="titleDesc"><span class='defPopupLink' onClick='popupDefinition(event, "datetime")'>datetime</span></span> <span class = "spaceandsize">(non-null)<sup><span class='glossaryLink' onclick="popupDefinition(event, '<b>non-null</b><br>The column cannot contain null values.')">?</span></sup></span>
<br>
<span class="desc">Timestamp when the object was created</span>
<a name="user_createuser"></a>
<p class="column">
<span class="coltitle">CreateUser</span> <span class="titleDesc">foreign key to User</span>
<span class = "spaceandsize">(database column: CreateUserID)</span>
<br>
<span class="desc">User who created the object</span>
There are many such coltitle, titleDesc and desc elements.
Now, if I get an input string like "CreateTime", I want the output to be:
CreateTime, datetime, Timestamp when the object was created
and if I get an input string "CreateUser", I want the output to be:
CreateUser, foreign key to User, User who created the object
I'm using Jsoup for this, and I have gotten this far:
Elements colElements = Jsoup.parse(html).getElementsByClass("coltitle").select("*");
System.out.println("your Col:");
for (Element element : colElements)
{
if(element.ownText().equalsIgnoreCase("CreateTime"))
System.out.println(element.text());
}
which just prints the selected coltitle. How do I parse the related classes and get their values? Or, are they not even related and am I just treading down the wrong path?
Can someone please help me get my desired output?
You are only selecting the coltitle <span> tags, and thus only printing the values they hold.
You can use the siblingElements()-method to get the siblings of the element that you first select.
Your HTML does not seem to be formatted correctly, but the following should work:
System.out.println("your Col:");
for (Element element : colElements) {
if (element.ownText().equalsIgnoreCase("CreateTime")) {
System.out.print(element.text());
for (Element sibling : element.siblingElements()) {
System.out.print(", " + sibling.text());
}
}
if (element.ownText().equalsIgnoreCase("CreateUser")) {
System.out.print("\n"+element.text());
for (Element sibling : element.siblingElements()) {
System.out.print(", " + sibling.text());
}
}
}
This will select the elements of the class 'coltitle'.
The if statements check whether the element is either of them and then print out its text. The inner loop then moves on to its siblings and prints out their texts.
According to the API docs, you can call children() on an Element, for example on each element in colElements.
http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#children()

Using JSoup to select a group of tags

I am attempting to use JSoup to scrape some information off a page, which can be identified by a group of tags in a particular order. The order of them is as follows:
<span class="sold" >Sold</span></td>
<td class='prc'>
<div class="g-b bidsold" itemprop="price">
AU $1.00</div>
I am looking to grab each value that is in place of the AU $1.00 field on the page, but they can only be identified by the span class="sold" selector that occurs a few tags beforehand.
I have tried something like select("span.sold:lt(4) + [itemprop=price]") but feel like I'm flailing around in the dark!
The code below should do the trick!!!
Document doc = Jsoup.connect("URL of your HTML document").get();
Element part = doc.body();
Elements parts = part.getElementsByTag("div");
String attValue;
String requiredContent;
for(Element ent : parts)
{
if(ent.hasAttr("class"))
{
attValue = ent.attr("class");
if(attValue.equals("g-b bidsold"))
{
System.out.println("\n");
requiredContent=ent.text();
System.out.println(requiredContent);
}
}
}
Just make sure to iterate and get the output in an array.
You could also do this:
Elements soldPrices = doc.select("td:has(.sold) + td [itemprop=price]");
That returns the elements (the DIVs) with itemprop=price that sit inside a TD whose immediately preceding TD contains an element (the SPAN) with class=sold.
See the Selector syntax for more details.

Can't find an amazon element with jsoup (Java) because I have little knowledge with web development

I'm currently trying to scrape amazon for a bunch of data. I'm using jsoup to help me do this, and everything has gone pretty smoothly, but for some reason I can't figure out how to pull the current number of sellers selling new products.
Here's an example of the url I'm scraping : http://www.amazon.com/dp/B006L7KIWG
I want to extract "39 new" from the following below:
<div id="secondaryUsedAndNew" class="mbcOlp">
<div class="mbcOlpLink">
<a class="buyAction" href="/gp/offer-listing/B006L7KIWG/ref=dp_olp_new_mbc?ie=UTF8&condition=new">
39 new
</a> from
<span class="price">$60.00</span>
</div>
</div>
This project is the first time I've used jsoup, so the coding may be a bit iffy, but here are some of the things I have tried:
String asinPage = "http://www.amazon.com/dp/" + getAsin();
try {
Document document = Jsoup.connect(asinPage).timeout(timeout).get();
.....
//get new sellers try one
Elements links = document.select("a[href]");
for (Element link : links) {
// System.out.println("Span olp:"+link.text());
String code = link.attr("abs:href");
String label = trim(link.text(), 35);
if (label.contains("new")) {
System.out.println(label + " : " + code);
}
}
//get new sellers try two
Elements links = document.select("div.mbcOlpLink");
for (Element link : links) {
// System.out.println("Span olp:"+link.text());
}
//about a million other failed attempts that you'll just have to take my word on.
I've been successful at scraping everything else I need on the page, but for some reason this particular element is being a pain. Any help would be GREAT! Thanks guys!
I would use
String s = document.select("div[id=secondaryUsedAndNew] a.buyAction").text().replace(" "," ");
This should leave you "42 new" as it says on the page at this moment.
Hope this works for you!

How can I consistently remove the default text from an input element with Selenium?

I'm trying to use Selenium WebDriver to input text to a GWT input element that has default text, "Enter User ID". Here are a few ways I've tried to get this to work:
searchField.click();
if(!searchField.getAttribute("value").isEmpty()) {
// clear field, if not already empty
searchField.clear();
}
if(!searchField.getAttribute("value").isEmpty()) {
// if it still didn't clear, click away and click back
externalLinksHeader.click();
searchField.click();
}
searchField.sendKeys(username);
The strange thing is that the above only works some of the time. Sometimes, it ends up searching for "Enter User IDus", basically beginning to type "username" after the default text, and not even finishing that.
Any other better, more reliable ways to clear out default text from a GWT element?
Edited to add: The HTML of the input element. Unfortunately, there's not much to see, thanks to the JS/GWT hotness. Here's the field when it's unselected:
<input type="text" class="gwt-TextBox empty" maxlength="40">
After I've clicked it and given it focus manually, the default text and the "empty" class are removed.
The JS to setDefaultText() gets called both onBlur() and onChange() if the change results in an empty text field. Guess that's why the searchField.clear() isn't helping.
I've also stepped through this method in debug mode, and in that case, it never works. When run normally, it works the majority of the time. I can't say why, though.
Okay, the script obviously kicks in when the clear() method clears the input and leaves it empty. The solutions I came up with are given below.
The naïve one, presses Backspace 10 times:
String b = Keys.BACK_SPACE.toString();
searchField.sendKeys(b+b+b+b+b+b+b+b+b+b + username);
(StringUtils.repeat() from Apache Commons Lang or Google Guava's Strings.repeat() may come in handy)
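On Java 11 or newer, the built-in String.repeat() does the same job without an extra library. A minimal sketch follows; BackspacePrefix and clearThenType are hypothetical names, and the BACK_SPACE constant is a stand-in for Keys.BACK_SPACE.toString() so the example runs without Selenium (its value, U+E003, is an assumption about what Selenium sends).

```java
final class BackspacePrefix {

    // Stand-in for Keys.BACK_SPACE.toString(); assumed to be the code point U+E003.
    static final String BACK_SPACE = "\uE003";

    // Builds the key sequence: the given number of Backspace presses followed by the text to type.
    static String clearThenType(int backspaces, String text) {
        return BACK_SPACE.repeat(backspaces) + text;
    }

    public static void main(String[] args) {
        String keys = clearThenType(10, "username");
        // With Selenium this would be passed on as searchField.sendKeys(keys).
        System.out.println(keys.length());
    }
}
```

The number of Backspace presses should be at least the length of the default text, otherwise part of it stays in the field.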
The nicer one using Ctrl+A, Delete:
String del = Keys.chord(Keys.CONTROL, "a") + Keys.DELETE;
searchField.sendKeys(del + username);
Deleting the content of the input via JavaScript:
JavascriptExecutor js = (JavascriptExecutor)driver;
js.executeScript("arguments[0].value = '';", searchField);
searchField.sendKeys(username);
Setting the value of the input via JavaScript altogether:
JavascriptExecutor js = (JavascriptExecutor)driver;
// pass the value as a script argument so quotes in username cannot break the script
js.executeScript("arguments[0].value = arguments[1];", searchField, username);
Note that javascript might not always work, as shown here: Why can't I clear an input field with javascript?
For what it is worth, I'm having a very similar issue with WebDriver 2.28.0 and Firefox 18.0.1.
I'm also using GWT but can reproduce it with simple HTML/JS:
<html>
<body>
<div>
<h3>Box one</h3>
<input id="boxOne" type="text" onfocus="if (this.value == 'foo') this.value = '';" onblur="if (this.value == '') this.value = 'foo';"/>
</div>
<div>
<h3>Box two</h3>
<input id="boxTwo" type="text" />
</div>
</body>
</html>
This test fails most of the time:
@Test
public void testTextFocusBlurDirect() throws Exception {
FirefoxDriver driver = new FirefoxDriver();
driver.navigate().to(getClass().getResource("/TestTextFocusBlur.html"));
for (int i = 0; i < 200; i++) {
String magic = "test" + System.currentTimeMillis();
driver.findElementById("boxOne").clear();
Thread.sleep(100);
driver.findElementById("boxOne").sendKeys(magic);
Thread.sleep(100);
driver.findElementById("boxTwo").clear();
Thread.sleep(100);
driver.findElementById("boxTwo").sendKeys("" + i);
Thread.sleep(100);
assertEquals(magic, driver.findElementById("boxOne").getAttribute("value"));
}
driver.quit();
}
It could just be the OS taking focus away from the browser in a way WebDriver can't control. We don't seem to get this issue on the CI server, so maybe that is the case.
I cannot add a comment yet, so I am putting it as an answer here. I want to inform you that if you want to use only JavaScript to clear and/or edit an input text field, then the JavaScript approach given by @slanec will not work. Here is an example: Why can't I clear an input field with javascript?
If you use C#, the solution would be:
// provide some text
webElement.SendKeys("aa");
// this is how you use this in C# , VS
String b = Keys.Backspace.ToString();
// then provide back space few times
webElement.SendKeys(b + b + b + b + b + b + b + b + b + b);
