Get several elements with the same class name with jsoup


Is there a way to get the HTML of several elements that share the same class name, using the Java library jsoup?
For example:
<div class="div_idalgo_content_result_date_match_local">
blablabla
</div>
<div class="div_idalgo_content_result_date_match_local">
123456789
</div>
I'd like to get blablabla in one String and 123456789 in another.
I hope my question is understandable.

This can be done in several different ways.
If you want to select the divs with the class name above, you can simply use the following:
Elements div = doc.select("div.div_idalgo_content_result_date_match_local");
This gives you an Elements collection that you can iterate over.
If you then want only the first one, you can use the :eq(0) selector or the first() method.
Element firstDiv = div.first();
OR
Elements div = doc.select("div.div_idalgo_content_result_date_match_local:eq(0)");
Note that with the second method you select from the document, while with the first method you select from the collection of Elements. You can of course also change the value in :eq(0) to match a different element. Strictly speaking, :eq(n) matches elements whose sibling index is n, so :eq(0) picks a div only if it is the first element child of its parent. There are many useful selectors you can use; I have included a link to them at the end of the answer.
The following code gets the two divs separately:
Elements div = doc.select("div.div_idalgo_content_result_date_match_local");
Element firstDiv = div.first();
Element secondDiv = div.get(1);
System.out.println("This is the first div: " + firstDiv.text());
System.out.println("This is the second div: " + secondDiv.text());
JSoup Cookbook - Selector syntax
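If jsoup isn't on the classpath, the same "collect every element with this class, then take each one's text" idea can be sketched with the JDK's built-in javax.xml.xpath API. This is a swapped-in technique, not jsoup, and it assumes the markup is well-formed XML; the <root> wrapper and the class name SameClassDivs are hypothetical. Note that @class='name' is an exact string match, so it would miss a div that carries additional classes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SameClassDivs {

    // The two divs from the question, wrapped in a hypothetical <root> element
    // because an XML parser needs a single document element.
    static final String SNIPPET = "<root>"
            + "<div class=\"div_idalgo_content_result_date_match_local\">blablabla</div>"
            + "<div class=\"div_idalgo_content_result_date_match_local\">123456789</div>"
            + "</root>";

    /** Returns the trimmed text of every div whose class attribute equals className. */
    static String[] textsByClass(String xml, String className) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList divs = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//div[@class='" + className + "']", doc, XPathConstants.NODESET);
            String[] texts = new String[divs.getLength()];
            for (int i = 0; i < divs.getLength(); i++) {
                texts[i] = divs.item(i).getTextContent().trim();
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate the XPath", e);
        }
    }

    public static void main(String[] args) {
        String[] texts = textsByClass(SNIPPET, "div_idalgo_content_result_date_match_local");
        String first = texts[0];  // counterpart of div.first() in the jsoup answer
        String second = texts[1]; // counterpart of div.get(1)
        System.out.println("This is the first div: " + first);
        System.out.println("This is the second div: " + second);
    }
}
```

In XPath itself, positional predicates are 1-based, so (//div[@class='div_idalgo_content_result_date_match_local'])[2] selects the second matching div in document order, the counterpart of div.get(1).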

Related

How to write an XPath using AND Operator to add multiple spans in a single XPath?

The page contains a product name: (3 OF 3) GOLDEN GLOW ( DELUXE ).
The product name is spread over 6 different spans, and we want to print the product name "GOLDEN GLOW ( DELUXE )", i.e. including all the spans. I tried using and multiple times inside the [] but it didn't work. Below is the XPath:
//*[@class='itemTitleCopy no-mobile' and contains(@class, 'no-mobile') and contains(@class, 'sizeDescriptionTitle no-mobile') contains(@class, 'no-mobile') ]
Below is the HTML code:
<span class="m-shopping-cart-item-header-number">
(
<span id="itemNo-1" class="itemNo">3</span>
of
<span id="totalItems-1" class="totalItems">3</span>
)
<span class="itemTitleCopy no-mobile" id="itemTitleCopy-1">Golden Glow</span>
<span class="no-mobile">(</span>
<span class="sizeDescriptionTitle no-mobile" id="sizeDescriptionTitle-1">Deluxe</span>
<span class="no-mobile">)</span>
</span>
Update
Code trials:
WebElement checkoutShippingProdName = new WebDriverWait(getDriver(), 20).until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//span[@class='m-shopping-cart-item-header-number']")));
String shipProdElementHtml = checkoutShippingProdName.getAttribute("innerHTML");
String[] shipProdElementHtmlHtmlSplit = shipProdElementHtml.split("span>");
String currentProd = shipProdElementHtmlHtmlSplit[shipProdElementHtmlHtmlSplit.length -1];
currentProd = StringEscapeUtils.unescapeHtml4(StringUtils.trim(currentProd));
System.out.println("The Product Name is:" + currentProd);

'//span[@class="totalItems"]/following-sibling::span'
should select all span nodes that follow the span with class="totalItems" under the same parent. How you extract the required text content depends on the Selenium binding.
Here is Python code that produces the required output:
from selenium.webdriver.common.by import By
text = " ".join([span.text for span in driver.find_elements(By.XPATH, '//span[@class="totalItems"]/following-sibling::span')])
print(text)
# 'Golden Glow ( Deluxe )'
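For the Java binding, here is a minimal sketch of what that XPath selects. It runs the expression with the JDK's javax.xml.xpath engine on the question's markup rather than through Selenium, so browser rendering and the whitespace handling of getText() are not involved. The class name FollowingSiblingText is hypothetical.

```java
import java.io.StringReader;
import java.util.StringJoiner;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FollowingSiblingText {

    // The cart header markup from the question (well-formed, so an XML parser accepts it).
    static final String CART_HEADER =
            "<span class=\"m-shopping-cart-item-header-number\">\n"
            + "(\n"
            + "<span id=\"itemNo-1\" class=\"itemNo\">3</span>\n"
            + "of\n"
            + "<span id=\"totalItems-1\" class=\"totalItems\">3</span>\n"
            + ")\n"
            + "<span class=\"itemTitleCopy no-mobile\" id=\"itemTitleCopy-1\">Golden Glow</span>\n"
            + "<span class=\"no-mobile\">(</span>\n"
            + "<span class=\"sizeDescriptionTitle no-mobile\" id=\"sizeDescriptionTitle-1\">Deluxe</span>\n"
            + "<span class=\"no-mobile\">)</span>\n"
            + "</span>";

    /** Joins, with single spaces, the text of every span after the totalItems span. */
    static String productName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList spans = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//span[@class='totalItems']/following-sibling::span",
                    doc, XPathConstants.NODESET);
            StringJoiner joiner = new StringJoiner(" ");
            for (int i = 0; i < spans.getLength(); i++) {
                joiner.add(spans.item(i).getTextContent().trim());
            }
            return joiner.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate the XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(productName(CART_HEADER)); // Golden Glow ( Deluxe )
    }
}
```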

As @Michael Kay has answered, what you need is the or operator!
You can do this with Selenium's findElements method.
It should look something like this:
driver.findElements(By.xpath("//*[@class='itemTitleCopy no-mobile' or contains(@class, 'no-mobile') or contains(@class, 'sizeDescriptionTitle no-mobile')]"))
This returns a list of WebElements; you can then iterate through them and join their text to create your desired string "GOLDEN GLOW ( DELUXE )".
All the credit goes to @Michael Kay; I just gave you the example.

You seem to be confused about the meaning of and and or. The and operator within a predicate means that both conditions must be true: it's more restrictive, so in general less data will be selected. The or operator means either condition must be true: it's more liberal, so more data will be selected.
You seem to be thinking of "and" as meaning "union": select X and (also select) Y. That's never its meaning in boolean logic. In XPath, the union of two node-sets is written with the | operator.
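A small sketch makes the difference concrete. It assumes the question's cart markup and evaluates the expressions with the JDK's javax.xml.xpath engine rather than Selenium; the class name AndVersusOr is hypothetical. The and version of the question's predicate matches nothing, because no single element has both class strings, while the or version matches all four no-mobile spans; an actual union of two node-sets uses |.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class AndVersusOr {

    // The cart header markup from the question.
    static final String CART_HEADER =
            "<span class=\"m-shopping-cart-item-header-number\">\n"
            + "(\n"
            + "<span id=\"itemNo-1\" class=\"itemNo\">3</span>\n"
            + "of\n"
            + "<span id=\"totalItems-1\" class=\"totalItems\">3</span>\n"
            + ")\n"
            + "<span class=\"itemTitleCopy no-mobile\" id=\"itemTitleCopy-1\">Golden Glow</span>\n"
            + "<span class=\"no-mobile\">(</span>\n"
            + "<span class=\"sizeDescriptionTitle no-mobile\" id=\"sizeDescriptionTitle-1\">Deluxe</span>\n"
            + "<span class=\"no-mobile\">)</span>\n"
            + "</span>";

    // Every condition must hold for the same element: more restrictive.
    static final String AND_VERSION =
            "//*[@class='itemTitleCopy no-mobile' and contains(@class, 'no-mobile')"
            + " and contains(@class, 'sizeDescriptionTitle no-mobile')]";

    // Any one condition may hold: more liberal.
    static final String OR_VERSION =
            "//*[@class='itemTitleCopy no-mobile' or contains(@class, 'no-mobile')"
            + " or contains(@class, 'sizeDescriptionTitle no-mobile')]";

    // Union of two separate node-sets.
    static final String UNION_VERSION = "//span[@class='itemNo'] | //span[@class='totalItems']";

    /** Returns how many nodes the expression selects in the given markup. */
    static int count(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Double n = (Double) XPathFactory.newInstance().newXPath().evaluate(
                    "count(" + expression + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate the XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("and:   " + count(CART_HEADER, AND_VERSION));   // and:   0
        System.out.println("or:    " + count(CART_HEADER, OR_VERSION));    // or:    4
        System.out.println("union: " + count(CART_HEADER, UNION_VERSION)); // union: 2
    }
}
```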

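Use this:
//*[@class=('itemTitleCopy no-mobile','sizeDescriptionTitle no-mobile','no-mobile')]
Note that comparing against a parenthesized sequence like this is XPath 2.0 syntax; browsers, and therefore Selenium, evaluate XPath 1.0, where you have to spell out the alternatives with or.
Hope this solves it.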

To extract the text Golden Glow ( Deluxe ) you can use the following Locator Strategy:
Using XPath:
String myString = driver.findElement(By.xpath("//span[@class='m-shopping-cart-item-header-number']")).getText();
// split right after each ")" (zero-width lookbehind), so parts[1] is everything after "(3 of 3)"
String[] parts = myString.split("(?<=\\))");
System.out.println(parts[1].trim());
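Here is the lookbehind split on its own, so it runs without a browser. The input string is an assumption about what getText() returns for the header span; the exact whitespace WebDriver produces may differ, which is why the result is trimmed. The class name SplitAfterParen is hypothetical.

```java
public class SplitAfterParen {

    /** Splits right after every ")" (zero-width lookbehind) and returns the part after "(3 of 3)". */
    static String productName(String headerText) {
        // "(?<=\\))" matches the empty position just after a ")", so no character is consumed;
        // the trailing empty string after the final ")" is dropped by split().
        String[] parts = headerText.split("(?<=\\))");
        return parts[1].trim();
    }

    public static void main(String[] args) {
        // Assumed getText() result for the header span (hypothetical input).
        String headerText = "(3 of 3) Golden Glow ( Deluxe )";
        System.out.println(productName(headerText)); // Golden Glow ( Deluxe )
    }
}
```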

Jsoup Select method returns null

I am trying to get the rating of each movie, but I can't seem to use the select method the right way. I am trying to get the 7.0 part from the webpage:
http://www.imdb.com/title/tt0800369/
<div class="star-box giga-star">
<div class="titlePageSprite star-box-giga-star"> 7.0 </div>
I am using this line in java:
Element rating = doc.select("star-box giga-star").first();
System.out.println(rating);
Thanks in advance!

You can select an element by its class using .star-box-giga-star, and use text() to get the textual content of the element.
doc.select(".star-box-giga-star").text();

The problem with your selector is that "star-box giga-star" is a descendant selector: it looks for <giga-star> elements inside <star-box> elements, instead of matching classes with .class or element.class, like div.star-box. Notice that to match multiple classes you need element.class1.class2, or just .class1.class2 if you don't want to specify the element.
Also, if you want to specify a parent-child relationship you have to use >, so try something like:
Document doc = Jsoup.connect("http://www.imdb.com/title/tt0800369/").get();
Element rating = doc
.select("div.star-box.giga-star > div.titlePageSprite.star-box-giga-star")
.first();
System.out.println(rating);
Unfortunately this will print
<div class="titlePageSprite star-box-giga-star">
7.0
</div>
so if you want only the text content of that element, use System.out.println(rating.text());
BTW, since there is only one element with class star-box-giga-star, you can just use
String rating = doc.select(".star-box-giga-star").text();
as shown in Alex's answer.

Select a non significant div tag using jsoup

I'm using jsoup for web scraping and have run into another issue. The div I need information from has no class, id or any other distinguishing attribute. It's buried in the page. Here it is:
<div class="column">
<div class="form-label">Rate: </div>
<div>11.082/11.167</div>
<div class="form-label padding-top">High/Low: </div>
<div>1005.0/0.0004</div>
</div>
I need to get the 1st set of numbers but I'm not sure how I can tell jsoup I want them specifically; does anyone have any advice?

Assuming doc is your Document object,
doc.select(".column > div:eq(1)");
should do the job. You select the parent div by class, then take its child divs, filtered so that only the element at index 1 is returned (the index is zero-based, so index 1 is the second element).
Personally, I'd switch to jQuery as it uses a far better selector engine, but each to their own...

Select all divs with class="column".
Loop through the selected elements and check whether the first div inside each one has the text Rate:.
If it does, your text is inside the second div.
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
public String getRate(Document document) {
    for (Element e : document.getElementsByClass("column")) {
        // child(i) returns direct child elements; getElementsByTag() would also include the column div itself
        // ownText() returns whitespace-normalized text, so compare against the trimmed label
        if (e.children().size() >= 2 && e.child(0).ownText().trim().equals("Rate:")) {
            return e.child(1).ownText();
        }
    }
    return null;
}
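The same "find the Rate: label, then take the next div" idea can also be written as a single XPath expression. This sketch evaluates it with the JDK's javax.xml.xpath engine on the question's markup (a swapped-in technique, not jsoup); the class and method names are hypothetical. It does not depend on the value's position, and it also shows the positional form, where XPath's div[2] is 1-based while jsoup's :eq(1) is zero-based.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RateLookup {

    // The markup from the question.
    static final String COLUMN = "<div class=\"column\">\n"
            + "<div class=\"form-label\">Rate: </div>\n"
            + "<div>11.082/11.167</div>\n"
            + "<div class=\"form-label padding-top\">High/Low: </div>\n"
            + "<div>1005.0/0.0004</div>\n"
            + "</div>";

    /** Evaluates an XPath expression and returns the trimmed string value of the result. */
    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate the XPath", e);
        }
    }

    /** Finds the label div by its normalized text (assumed to contain no apostrophe) and returns the next div's text. */
    static String valueAfterLabel(String xml, String label) {
        return evaluate(xml, "//div[@class='column']/div[normalize-space()='" + label + "']"
                + "/following-sibling::div[1]");
    }

    /** Positional alternative: XPath indices start at 1, so div[2] is the second child div. */
    static String secondChildDiv(String xml) {
        return evaluate(xml, "//div[@class='column']/div[2]");
    }

    public static void main(String[] args) {
        System.out.println(valueAfterLabel(COLUMN, "Rate:"));     // 11.082/11.167
        System.out.println(valueAfterLabel(COLUMN, "High/Low:")); // 1005.0/0.0004
        System.out.println(secondChildDiv(COLUMN));               // 11.082/11.167
    }
}
```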

Selenium: Extract Text of a div with cssSelector in Java

I am writing a JUnit test for a webpage, using Selenium, and I am trying to verify that the expected text exists within a page. The code of the webpage I am testing looks like this:
<div id="recipient_div_3" class="label_spacer">
<label class="nodisplay" for="Recipient_nickname"> recipient field: reqd info </label>
<span id="Recipient_nickname_div_2" class="required-field"> *</span>
Recipient:
</div>
I want to compare what is expected with what is on the page, so I want to use
Assert.assertTrue(). I know that to get everything from the div, I can do
String element = driver.findElement(By.cssSelector("div[id='recipient_div_3']")).getText().replaceAll("\n", " ");
but this will return "reqd info * Recipient:"
Is there any way to just get the text from the div ("Recipient") using cssSelector, without the other tags?

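You can't do this with a CSS selector, because CSS selectors aren't fine-grained enough to express "the text node contained in the DIV but not its other contents". You can express that with an XPath locator, though:
driver.findElement(By.xpath("//div[@id='recipient_div_3']/text()")).getText()
That XPath expression identifies only the text nodes that are direct children of the DIV, rather than all the text contained within it and its child nodes. Be aware, though, that WebDriver's findElement expects the expression to resolve to an element, so drivers generally reject a text-node result with an InvalidSelectorException; in that case you have to evaluate the expression through JavaScript or post-process the DIV's text instead.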
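To see what /text() selects independently of a browser, here is a sketch that evaluates the expression with the JDK's javax.xml.xpath engine on the question's markup; the class name DirectTextNodes is hypothetical. The whitespace-only text nodes between the child tags are selected too, which is why the concatenated result is trimmed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DirectTextNodes {

    // The DIV from the question; it is well-formed, so an XML parser accepts it.
    static final String RECIPIENT_DIV =
            "<div id=\"recipient_div_3\" class=\"label_spacer\">\n"
            + "<label class=\"nodisplay\" for=\"Recipient_nickname\"> recipient field: reqd info </label>\n"
            + "<span id=\"Recipient_nickname_div_2\" class=\"required-field\"> *</span>\n"
            + "Recipient:\n"
            + "</div>";

    /** Concatenates only the text nodes that are direct children of the DIV, then trims. */
    static String directText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//div[@id='recipient_div_3']/text()", doc, XPathConstants.NODESET);
            StringBuilder text = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                text.append(nodes.item(i).getNodeValue());
            }
            return text.toString().trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate the XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(directText(RECIPIENT_DIV)); // Recipient:
    }
}
```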

I am not sure whether it is possible with a single CSS locator, but you can get the text of the div, then get the text of the div's child elements and subtract it. Something like this (code not checked):
String element = driver.findElement(By.cssSelector("div[id='recipient_div_3']")).getText().replaceAll("\n", " ");
List<WebElement> tempElements = driver.findElements(By.cssSelector("div[id='recipient_div_3'] *"));
for (WebElement tempElement : tempElements) {
    // remove each child's text from the DIV's full text
    element = element.replace(tempElement.getText(), "");
}
element = element.trim();
This is for the case where you want to avoid XPath. With XPath you can select the text directly:
//div[@id='recipient_div_3']/text()

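You could also take the inner HTML of the element and remove the child tags, together with their content, with a regexp. Also notice: you should use the reluctant quantifier, so that each match stops at the nearest closing tag:
https://docs.oracle.com/javase/tutorial/essential/regex/quant.html
String getTextContentWithoutTags(WebElement element) {
    // getText() never contains markup, so work on the inner HTML instead
    return element.getAttribute("innerHTML")
            .replaceAll("(?s)<[^>]*?/>|<([a-zA-Z][a-zA-Z0-9]*)[^>]*>.*?</\\1\\s*>", "")
            .trim();
}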
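Here is the reluctant-quantifier idea applied to a plain string, so it runs without a browser. The input is an assumed innerHTML for the recipient DIV (a real browser may serialize attributes or whitespace differently), and the class name StripChildElements is hypothetical. Like any regex over HTML, it only handles child elements that don't nest another element with the same tag name.

```java
import java.util.regex.Pattern;

public class StripChildElements {

    // Either a self-closing tag, or an opening tag plus its content and the matching closing tag.
    // The reluctant .*? stops at the first closing tag with the same name.
    private static final Pattern CHILD_ELEMENT = Pattern.compile(
            "<[^>]*?/>|<([a-zA-Z][a-zA-Z0-9]*)[^>]*>.*?</\\1\\s*>", Pattern.DOTALL);

    // Assumed innerHTML of the recipient DIV from the question (hypothetical input).
    static final String INNER_HTML = "\n"
            + "<label class=\"nodisplay\" for=\"Recipient_nickname\"> recipient field: reqd info </label>\n"
            + "<span id=\"Recipient_nickname_div_2\" class=\"required-field\"> *</span>\n"
            + "Recipient:\n";

    /** Removes child elements with their content and returns the remaining trimmed text. */
    static String withoutChildElements(String innerHtml) {
        return CHILD_ELEMENT.matcher(innerHtml).replaceAll("").trim();
    }

    public static void main(String[] args) {
        System.out.println(withoutChildElements(INNER_HTML)); // Recipient:
    }
}
```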

retrieve data from a class inside another class

I have the following html code
<span class="tag" style="font-size: 12px;">Black Library<span class="count"> (1)</span> </span>
and I want to retrieve the number "(1)" from the element with class count inside the element with class tag. How can I do that with jsoup?
code like
Elements num = document.select(".tag count");
is not working.
In fact, I want both the "tag" text Black Library and the "count" (1). Any help with that?
PS: there is another element with class count, whose HTML is
<li class="gap">Reviews <span class="count">(0)</span></li>
but I don't want that result.

Elements num = document.select(".tag count");
This selects elements with class="tag" and then looks among their descendants for <count> elements. But you actually want elements with the class="count" attribute. Fix the CSS selector accordingly:
Elements num = document.select(".tag .count");
See also:
W3 CSS3 selectors specification
Jsoup CSS selector syntax cookbook
Jsoup Selector API javadoc
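For comparison, the same descendant relationship can be written in XPath 1.0 with the usual class-token idiom, which matches a whole class name the way .tag and .count do in CSS. This sketch runs with the JDK's javax.xml.xpath engine on the question's markup wrapped in a hypothetical <root> element (a swapped-in technique, not jsoup); the class and method names are hypothetical. Only the (1) inside the tag span is selected, not the (0) from the Reviews item. In jsoup itself, the Black Library text would come from ownText() on the .tag element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TagCount {

    // The two snippets from the question, wrapped in a hypothetical <root> element.
    static final String SNIPPET = "<root>\n"
            + "<span class=\"tag\" style=\"font-size: 12px;\">Black Library<span class=\"count\"> (1)</span> </span>\n"
            + "<li class=\"gap\">Reviews <span class=\"count\">(0)</span></li>\n"
            + "</root>";

    /** XPath 1.0 idiom for "has this whole class token", the counterpart of CSS .name. */
    static String hasClass(String name) {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";
    }

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Texts of .count elements that are descendants of a .tag element (like ".tag .count"). */
    static List<String> countsInsideTag(String xml) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//*[" + hasClass("tag") + "]//*[" + hasClass("count") + "]",
                    parse(xml), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate the XPath", e);
        }
    }

    /** The first text node directly inside the .tag element, i.e. its own label text. */
    static String tagLabel(String xml) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(
                    "//*[" + hasClass("tag") + "]/text()[1]", parse(xml)).trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate the XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tagLabel(SNIPPET));        // Black Library
        System.out.println(countsInsideTag(SNIPPET)); // [(1)]
    }
}
```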
