Jsoup Select method returns null - java

I am trying to get the rating of each movie but I cant seem to use the select method in the right way. I am trying to get the 7.0 part from the webpage:
http://www.imdb.com/title/tt0800369/
<div class="star-box giga-star">
<div class="titlePageSprite star-box-giga-star"> 7.0 </div>
I am using this line in java:
Element rating = doc.select("star-box giga-star").first();
System.out.println(rating);
Thanks in advance!

You can select an element by its class using .star-box-giga-star, and use text() to get the textual content of the element.
doc.select(".star-box-giga-star").text();

Problem with your selector is that you are using ancestor child selector instead of .class or element.class like div.star-box. Notice that to use multiple class you need to use element.class1.class2 or just .class1.class2 if you don't want to specify element.
Also if you want to specify parent child relationship you will have to use > so try maybe something like
Document doc = Jsoup.connect("http://www.imdb.com/title/tt0800369/").get();
Element rating = doc
.select("div.star-box.giga-star > div.titlePageSprite.star-box-giga-star")
.first();
System.out.println(rating);
Unfortunately this will print
<div class="titlePageSprite star-box-giga-star">
7.0
</div>
so if you want to get only text contend from that element use System.out.println(rating.text());
BTW since there is only one element with class star-box-giga-star you can just use
String rating = doc.select(".star-box-giga-star").text();
as shown in Alex answer

Related

How to find element having dynamic ID / Name changing on every page load using Selenium WebDriver

I found a few answers about this but they did not answer my question an so I am writing a new question.
I have HTML code having below kind of checkbox elements (in browser's inspect element)
<input role="checkbox" type="checkbox" id="jqg_TransactionFormModel501EditCollection2_147354_grid_-1274" class="cbox" name="jqg_TransactionFormModel501EditCollection2_147354_grid_-1274" value="true">
In my test case I want to click on checkbox using its ID using Selenium Webdriver.
here Id= "jqg_TransactionFormModel501EditCollection2_147354grid-1274" is dynamic.
in above id, Bold & Italic marked letters (dynamic) will change with different check boxes in same page as well as page refresh.
Bold marked letters (dynamic) will change on page refresh only (remain same through all the check boxes in same page.)
How shoud I format/write XPATH so that I can click on desired check boxes using below statement.
WebElement checkbox = webDriver.findElement(By.id("idOfTheElement"));
if (!checkbox.isSelected()) {
checkbox.click();
}
Thanks for your help in advance.. !
Here are a few examples of xpaths which you can use to find your checkbox
//input[contains(#id,'jqg_TransactionFormModel')]
OR, if you want more checks, try something like
//*[starts-with(#id,'jqg_TransactionFormModel') and contains(#id,'EditCollection2_')]
Additionally, you can try regex as well using matches
//*[matches(#id,'<regex matching your id>')]
You can use partial ID using cssSelector or xpath
webDriver.findElement(By.cssSelector("[id*='TransactionFormModel']"));
webDriver.findElement(By.xpath("//input[contains(#id, 'TransactionFormModel')]"));
You can replace TransactionFormModel with any other fixed part of the ID.
As a side note, no need to locate the element twice. You can do
WebElement checkbox = webDriver.findElement(By.id("idOfTheElement"));
if (!checkbox.isSelected()) {
checkbox.click();
}
You can write xpath like this :
//input[starts-with(#id,'jqg_TransactionFormModel')]
//input[contains(#id,'jqg_TransactionFormModel')]
I recommend not using ID's or xpath and adding a data attribute to your elements. This way, you can avoid the annoyance of xpath and have a strong selector that you feel confident will always be selectable.
For example, you could call the attribute: data-selector. You can assign a value of "transactionFormModelCheckbox". Then, when creating a new element, you create by css selector with the value referenced above.
No matter what type of formatting, you'll always be able to select that checkbox - as long as the element exists in the DOM. If you need to investigate other attributes, you can use the selector to do things like hasClass() or anything else you need.
Hope that helps !

Java - jsoup get element with specific string

I would like to select an element that matches a specific String
<img src='http://iblink.ch/resized/sjg63ngi3h3g4a.jpg' alt='tree'>
since I don't have a specific class or div to trigger I try to use getElementsContainingOwnText("resized")
method to get this element.
But it does not find it?
I also try: getElementsContainingText
Same output :(
Anyone have any idea?
The text is the part outside the tags: <tag attribute="value">Text</tag>
So you want to select Elements with a certain attribute value like this:
Elements els = doc.select("img[src*=resized]");
Have a look into CSS selectors as they are implemented in Jsoup.

How to get the specific information from an element in JSOUP?

I am trying to parse an html file using jsoup. Here is my code:
Document doc;
doc = Jsoup.connect("http://www.marketimyilmazlar.com/index.php?route=product/product&path=64_80&product_id=14102").get();
Elements elements = doc.getElementsByClass("price");
Then, when i look at the elements variable, its content is like the following:
<div class="price">
2.75 TL
<span class="kdv">KDV Dahil</span>
<br />
</div>7
Here, what i want to do is that, I want to get the value "2.75TL". I thought of using elements.get(int index) method, but do not know how to use index variable. Can anyone help me with this?
Thanks
You can use ownText method, e.g.
Elements elements = doc.getElementsByClass("price");
System.out.println(elements.get(0).ownText()); // 2.75 TL
Quite simple, you need to get the text nodes out of the element, and then take the first of it, so the solution is something like:
element.textNodes().get(0);

retrieve data from a class inside another class

I have the following html code
<span class="tag" style="font-size: 12px;">Black Library<span class="count"> (1)</span> </span>
and i want to retrieve number "(1)" from class count inside class tag... how can i do that with jsoup?
code like
Elements num = document.select(".tag count");
is not working.
In fact i want both the "tag" Black Library and the "count" 1.. Any help to do that?
PS. there is another class count, for which the html code is
<li class="gap">Reviews <span class="count">(0)</span></li>
but i dont want that result.
Elements num = document.select(".tag count");
This will select elements with class="tag" attribute and then it will in its children look for <count> elements. But you actually want to look for elements with class="count" attribute. Fix the CSS selector accordingly:
Elements num = document.select(".tag .count");
See also:
W3 CSS3 selectors specification
Jsoup CSS selector syntax cookbook
Jsoup Selector API javadoc

HtmlUnit accessing an element without id or Name

How can I access this element:
<input type="submit" value="Save as XML" onclick="some code goes here">
More info: I have to access programmatically a web page and simulate clicking on a button on it, which then will generate a xml file which I hope to be able to save on the local machine.
I am trying to do so by using HtmlUnit libraries, but all examples I could find use getElementById() or getElementByName() methods. Unfortunately, this exact element doesn't have a name or Id, so I failed miserably. I supposed then that the thing I have to do is use the getByXPath() method but I got completely lost into XPath documentation(this matter is all new to me).
I have been stuck on this for a couple of hours so I really need all the help I can get.
Thanks in advance.
There are several options for an XPATH to select that input element.
Below is one option, which looks throughout the document for an input element that has an attribute named type with the value "submit" and an attribute named value with the value "Save as XML".
//input[#type='submit' and #value='Save as XML']
If you could provide a little bit more structure, a more specific (and efficient) XPATH could be created. For instance, something like this might work:
/html/body//form//input[#type='submit' and #value='Save as XML']
You should be able to use the XPATH with code like this:
client = new WebClient(BrowserVersion.FIREFOX_3)
client.javaScriptEnabled = false
page = client.getPage(url)
submitButton = page.getByXPath("/html/body//form//input[#type='submit' and #value='Save as XML']")
Although I would, in most cases, recommend using XPath, if you don't know anything about it you can try the getInputByValue(String value) method. This is an example based on your question:
// Fetch the form somehow
HtmlForm form = this.page.getForms().get(0);
// Get the input by its value
System.out.println(form.getInputByValue("Save as XML").asXml());

Categories

Resources