retrieve data from a class inside another class - java

I have the following HTML code:
<span class="tag" style="font-size: 12px;">Black Library<span class="count"> (1)</span> </span>
and I want to retrieve the number "(1)" from the class count inside the class tag. How can I do that with jsoup?
code like
Elements num = document.select(".tag count");
is not working.
In fact, I want both the "tag" text, Black Library, and the "count", 1. Any help with that?
PS. There is another element with class count, whose HTML code is
<li class="gap">Reviews <span class="count">(0)</span></li>
but I don't want that result.

Elements num = document.select(".tag count");
This selects elements with the class="tag" attribute and then looks among their descendants for <count> elements. But you actually want elements with the class="count" attribute. Fix the CSS selector accordingly:
Elements num = document.select(".tag .count");
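To get both values the question asks for, here is a minimal sketch, assuming jsoup is on the classpath and the HTML is parsed from a string with Jsoup.parse instead of being fetched. The class name TagCountExample and the map-based return value are illustrative choices, not part of jsoup. Because the .count lookup starts from each .tag element, the <span class="count">(0)</span> inside li.gap is never visited.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TagCountExample {
    // Sample markup from the question: one count inside .tag, one inside .gap.
    static final String SAMPLE_HTML =
        "<span class=\"tag\" style=\"font-size: 12px;\">Black Library<span class=\"count\"> (1)</span> </span>"
        + "<li class=\"gap\">Reviews <span class=\"count\">(0)</span></li>";

    // Maps each .tag's own text (e.g. "Black Library") to the number in its nested .count.
    static Map<String, Integer> extract(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Element tag : doc.select(".tag")) {
            // ownText() ignores the nested <span class="count"> and trims, leaving "Black Library".
            String name = tag.ownText();
            // Searching from the tag element only finds .count descendants of this tag.
            Element count = tag.select(".count").first();
            if (count == null) {
                continue; // no count inside this tag
            }
            // text() gives "(1)"; keep only the digits before parsing.
            String digits = count.text().replaceAll("[^0-9]", "");
            if (!digits.isEmpty()) {
                result.put(name, Integer.parseInt(digits));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract(SAMPLE_HTML)); // {Black Library=1}
        // The original selector looks for <count> tags, so it matches nothing.
        System.out.println(Jsoup.parse(SAMPLE_HTML).select(".tag count").size()); // 0
    }
}
```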
See also:
W3 CSS3 selectors specification
Jsoup CSS selector syntax cookbook
Jsoup Selector API javadoc

Related

How to get the specific information from an element in JSOUP?

I am trying to parse an html file using jsoup. Here is my code:
Document doc;
doc = Jsoup.connect("http://www.marketimyilmazlar.com/index.php?route=product/product&path=64_80&product_id=14102").get();
Elements elements = doc.getElementsByClass("price");
Then, when i look at the elements variable, its content is like the following:
<div class="price">
2.75 TL
<span class="kdv">KDV Dahil</span>
<br />
</div>
Here, what I want is to get the value "2.75 TL". I thought of using the elements.get(int index) method, but I do not know which index to use. Can anyone help me with this?
Thanks
You can use the ownText method, e.g.
Elements elements = doc.getElementsByClass("price");
System.out.println(elements.get(0).ownText()); // 2.75 TL
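Quite simple: you need to get the text nodes out of the element and then take the first one. textNodes() returns TextNode objects, so call text() on the node and trim the surrounding whitespace:
String price = elements.get(0).textNodes().get(0).text().trim(); // "2.75 TL"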
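Both answers can be compared side by side in a small sketch, assuming jsoup is on the classpath and the markup from the question is parsed from a string rather than fetched from the site. PriceExample and its method names are hypothetical. It also shows why plain text() is not enough: it includes the nested "KDV Dahil" span.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class PriceExample {
    // Markup from the question (without the stray character after </div>).
    static final String PRICE_HTML =
        "<div class=\"price\">\n2.75 TL\n<span class=\"kdv\">KDV Dahil</span>\n<br />\n</div>";

    static Element priceDiv() {
        return Jsoup.parse(PRICE_HTML).getElementsByClass("price").get(0);
    }

    // Approach 1: ownText() joins only the element's direct text nodes and trims the result.
    static String viaOwnText() {
        return priceDiv().ownText();
    }

    // Approach 2: the first direct text node; text() collapses whitespace but does not trim.
    static String viaFirstTextNode() {
        return priceDiv().textNodes().get(0).text().trim();
    }

    public static void main(String[] args) {
        System.out.println(viaOwnText());       // 2.75 TL
        System.out.println(viaFirstTextNode()); // 2.75 TL
        // For comparison: text() also includes the nested <span class="kdv">.
        System.out.println(priceDiv().text());
    }
}
```

ownText() is the more robust choice if the price text is ever split across several direct text nodes, while textNodes().get(0) relies on the price being the first one.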

Jsoup Select method returns null

I am trying to get the rating of each movie, but I can't seem to use the select method the right way. I am trying to get the 7.0 part from the webpage:
http://www.imdb.com/title/tt0800369/
<div class="star-box giga-star">
<div class="titlePageSprite star-box-giga-star"> 7.0 </div>
I am using this line in java:
Element rating = doc.select("star-box giga-star").first();
System.out.println(rating);
Thanks in advance!
You can select an element by its class using .star-box-giga-star, and use text() to get the textual content of the element.
doc.select(".star-box-giga-star").text();
The problem with your selector is that "star-box giga-star" is a descendant selector over tag names: it looks for <giga-star> elements inside <star-box> elements, instead of using .class or element.class like div.star-box. Notice that to match multiple classes you need to use element.class1.class2, or just .class1.class2 if you don't want to specify the element.
Also, if you want to specify a parent-child relationship, you have to use >, so maybe try something like
Document doc = Jsoup.connect("http://www.imdb.com/title/tt0800369/").get();
Element rating = doc
.select("div.star-box.giga-star > div.titlePageSprite.star-box-giga-star")
.first();
System.out.println(rating);
Unfortunately this will print
<div class="titlePageSprite star-box-giga-star">
7.0
</div>
so if you want only the text content of that element, use System.out.println(rating.text());
BTW, since there is only one element with class star-box-giga-star, you can just use
String rating = doc.select(".star-box-giga-star").text();
as shown in Alex's answer.
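The three selectors discussed above can be checked offline against the snippet from the question. This is a sketch, assuming jsoup is on the classpath; the IMDb page itself has since changed, so the HTML is embedded as a string, and RatingExample and its method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class RatingExample {
    // Simplified copy of the question's snippet, with the outer div closed.
    static final String RATING_HTML =
        "<div class=\"star-box giga-star\">"
        + "<div class=\"titlePageSprite star-box-giga-star\"> 7.0 </div>"
        + "</div>";

    static Document doc() {
        return Jsoup.parse(RATING_HTML);
    }

    // The question's selector: <giga-star> elements inside <star-box> elements.
    // Nothing matches, so first() returns null.
    static Element brokenSelection() {
        return doc().select("star-box giga-star").first();
    }

    // Class selector on the inner div; text() trims the surrounding spaces.
    static String byClass() {
        return doc().select(".star-box-giga-star").text();
    }

    // Both classes on each div, joined with the child combinator '>'.
    static String byChildCombinator() {
        Element rating = doc()
            .select("div.star-box.giga-star > div.titlePageSprite.star-box-giga-star")
            .first();
        return rating == null ? null : rating.text();
    }

    public static void main(String[] args) {
        System.out.println(brokenSelection());   // null
        System.out.println(byClass());           // 7.0
        System.out.println(byChildCombinator()); // 7.0
    }
}
```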

Get several elements with the same class name with Jsoup

Is there a way to get the HTML from several elements with the same class name using the Java library jsoup?
For example:
<div class="div_idalgo_content_result_date_match_local">
blablabla
</div>
<div class="div_idalgo_content_result_date_match_local">
123456789
</div>
I'd like get blablabla in one String and 123456789 in another.
I hope my question is understandable.
This can be done in several different ways.
If you want to select the divs with the class name above, you can simply use the following:
Elements div = doc.select("div.div_idalgo_content_result_date_match_local");
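This gives you a collection of Element objects that you can iterate over.
If you would then like to select only the first one, you can use the :eq(0) selector or the first() method. Be aware that :eq(n) matches elements whose sibling index is n (their position among their parent's child elements), so :eq(0) only finds the div if it is the first child of its parent.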
Element firstDiv = div.first();
OR
Elements div = doc.select("div.div_idalgo_content_result_date_match_local:eq(0)");
Note that with the second method you select from the document, while with the first you select from the collection of Elements. You can of course change the index in :eq(0) to match your element's position among its siblings. There are many other useful selectors; I have included a link to them at the end of this answer.
The following code splits your divs into two variables:
Elements div = doc.select("div.div_idalgo_content_result_date_match_local");
Element firstDiv = div.first();
Element secondDiv = div.get(1);
System.out.println("This is the first div: " + firstDiv.text());
System.out.println("This is the second div: " + secondDiv.text());
JSoup Cookbook - Selector syntax
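If the number of matching divs is not fixed, a loop avoids hard-coding div.get(1), which throws IndexOutOfBoundsException when there are fewer than two matches. A minimal sketch, assuming jsoup is on the classpath and the two divs from the question are parsed from a string; SameClassExample and texts() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SameClassExample {
    static final String MATCH_HTML =
        "<div class=\"div_idalgo_content_result_date_match_local\">blablabla</div>"
        + "<div class=\"div_idalgo_content_result_date_match_local\">123456789</div>";

    // Collects the text of every div with the class, in document order.
    static List<String> texts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element div : doc.select("div.div_idalgo_content_result_date_match_local")) {
            result.add(div.text());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = texts(MATCH_HTML);
        String first = values.get(0);  // "blablabla"
        String second = values.get(1); // "123456789"
        System.out.println(first + " / " + second);
    }
}
```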

Jsoup - how to grab an href (URL/link) immediately preceding a <span class=*>?

Given the following :
<li class="med grey mkp2">
<span class="price bld">$28.15</span> new <span class="grey">(14 offers)</span> </li>
I need to grab the href, which sounds simple, right? However, the only way I can identify the correct list item is by its <span class="price bld">, and the href I need precedes it. It's similar to Extracting href from a class within other div/id classes with jsoup, but in reverse.
There can be many list items with the css class "med grey mkp2", but I only need content from the ones with the noted span with class="price bld".
How can I achieve this?
You need to select the target element (the <a>), not the child element (the <span>); otherwise the result would contain only <span> elements. In this particular case, you can use the :has() selector to check whether the target element contains the desired child element.
Elements elements = document.select("a:has(.price.bld)");
See also:
Jsoup selector cookbook
:has(selector): find elements that contain elements matching the selector; e.g. div:has(p)
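The snippet in the question does not show the <a> element, so this sketch assumes the price span is nested inside the link, which is what the a:has(.price.bld) selector implies. The markup, the example.com URLs, and the PriceLinkExample/priceLinks names are all hypothetical; jsoup is assumed to be on the classpath. The second list item has no price span and is skipped.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class PriceLinkExample {
    // Hypothetical markup: the price span sits inside the link; URLs are placeholders.
    static final String LIST_HTML =
        "<ul>"
        + "<li class=\"med grey mkp2\"><a href=\"http://example.com/offer/1\">"
        + "<span class=\"price bld\">$28.15</span></a> new <span class=\"grey\">(14 offers)</span></li>"
        + "<li class=\"med grey mkp2\"><a href=\"http://example.com/offer/2\">no price span</a></li>"
        + "</ul>";

    // Returns the href of every link inside li.med.grey.mkp2 that contains span.price.bld.
    static List<String> priceLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();
        for (Element link : doc.select("li.med.grey.mkp2 a:has(.price.bld)")) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        System.out.println(priceLinks(LIST_HTML)); // [http://example.com/offer/1]
    }
}
```

If the <a> turns out to be an earlier sibling of the span rather than its parent, a:has(...) will not match; one option in that case is to select span.price.bld and call previousElementSibling() on each match, which returns null when there is no earlier sibling element.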

get elements from html parser

I'm using jsoup and trying to get the div elements whose id starts with a particular string. For example:
<div id="test123">.
I need to check whether the id starts with the string "test" and get all such elements.
I looked at http://jsoup.org/cookbook/extracting-data/selector-syntax and tried multiple variations using:
doc.select("div:matches(test(*))");
But it still didn't work. Any help would be much appreciated.
Use the attribute-starts-with selector [attr^=value].
Elements elements = doc.select("div[id^=test]");
// ...
This will return all <div> elements with an id attribute starting with test.
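The question's attempt fails because :matches(regex) tests an element's text, not its attributes. A minimal sketch of the attribute-prefix selector, assuming jsoup is on the classpath; the sample ids other than test123, and the IdPrefixExample name, are hypothetical. The non-matching id and the <span> show what the selector excludes.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class IdPrefixExample {
    // Hypothetical sample: two matching divs, one div with another prefix, one span.
    static final String IDS_HTML =
        "<div id=\"test123\">a</div>"
        + "<div id=\"test456\">b</div>"
        + "<div id=\"other1\">c</div>"
        + "<span id=\"test789\">d</span>";

    static List<String> idsStartingWithTest(String html) {
        List<String> ids = new ArrayList<>();
        // [id^=test]: the id attribute starts with "test"; the div prefix excludes the span.
        for (Element div : Jsoup.parse(html).select("div[id^=test]")) {
            ids.add(div.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(idsStartingWithTest(IDS_HTML)); // [test123, test456]
    }
}
```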
