Jsoup get text from website

I can already navigate the site and collect all the links I want, but my main objective is to get the hotel reviews. The site I am using is http://www.booking.com/hotel/pt/park-italia-flat.pt-pt.html?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmaLsBiAEBmAEvuAEEyAEE2AEB6AEB-AEL;sid=637e7af0c3009aa9ea132a960e2d2d40;dcid=4;ucfs=1;room1=A,A;srfid=b8260a1c264a3873291a9061733a43536a4d35c2X979#tab-reviews
I can get there using jsoup with no problem, but I don't know how to get the text. I have already tried getElementsByTag, getText and other approaches. Can this be done with jsoup, or do I need another library?
This is how I am trying to get the text, but the text that appears is not what I want:
Document doc;
try {
    doc = Jsoup.connect(pair.getValue().toString() + "#tab-reviews").get();
    Elements scriptElements = doc.getElementsMatchingText("span");
    for (Element link : scriptElements) {
        System.out.printf(" Text: <%s> \n", link.text());
    }
} catch (IOException ex) {
    Logger.getLogger(GetComentsThread.class.getName()).log(Level.SEVERE, null, ex);
}
To get the URLs I am using something like this:
Pattern pattern = Pattern.compile("src=destinationfinder");
Document doc = Jsoup.connect(url).get();
Elements links = doc.select("a[href]");
for (Element link : links) {
    Matcher matcher = pattern.matcher(link.attr("abs:href"));
    if (matcher.find()) {
        dest = link.attr("abs:href");
        break;
    }
}
Now I can get some reviews, but only the positive ones, and I don't know why:
doc = Jsoup.connect(pair.getValue().toString() + "#tab-reviews").get();
//doc = Jsoup.connect("http://www.booking.com/hotel/pt/pestanaportohotel.pt-pt.html?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmaLsBiAEBmAEvuAEEyAEE2AEB6AEB-AEL;sid=cff2dddd95e71c0768847a554584c888;dcid=4;dist=0;group_adults=2;room1=A%2CA;sb_price_type=total;srfid=798bd6b01ea1dba53ee6b6b945dda1f623859730X2;type=total;ucfs=1&#tab-reviews").get();
String teste = "p.trackit";
Elements scriptElements = doc.select(teste);
for (Element link : scriptElements) {
    //System.out.printf(" Text: <%s> ...%s\n", link.text(),link.attr("class=\"review_pos\""));
    System.out.printf(" Text: <> ...%s\n", link.text());
}

The reviews are loaded through an AJAX request to another URL.
There you can get all the information you need.
Response:
<li class="
review_item
clearfix
">
<p class="review_item_date">
16 de Setembro de 2015
</p>
<div class="review_item_reviewer">
<h4>
Beatriz
</h4>
<span class="reviewer_country">
<span class="reviewer_country_flag sflag slang-br">
</span>
Brasil
</span>
</div>
<!-- .review_item_reviewer -->
<div class="review_item_review">
<div class="
review_item_review_container
lang_ltr
seo_reviews_item
">
<div class="review_item_review_header">
<div class="
review_item_header_score_container
">
<div class="review_item_review_score jq_tooltip high_score_tooltip" title="
Excepcional
">
9,6
</div>
</div>
<div class="review_item_header_content_container">
<div class="review_item_header_content seo_review_title">
Excepcional
</div>
</div>
</div>
<ul class="review_item_info_tags">
<li class="review_info_tag"><span class="bullet">•</span> Viagem de lazer</li>
<li class="review_info_tag"><span class="bullet">•</span> Família</li>
<li class="review_info_tag"><span class="bullet">•</span> Apartamento com Varanda</li>
<li class="review_info_tag"><span class="bullet">•</span> Ficou 5 noites</li>
<li class="review_info_tag"><span class="bullet">•</span> Submetido através de dispositivo móvel</li>
</ul>
<div class="review_item_review_content">
<p class="review_pos"><i class="review_item_icon">눇</i>Conforto, perto do centro, perto de um lindo mercado, bem decorado, com todo material necessário para fazer as refeições, Wi-Fi excelente</p>
</div>
</div>
</div>
</li>

It looks like you just need to use jsoup to get the content of the elements with class="review_pos" and class="review_neg".

Related

How do i loop through divs using jsoup

I'm using jsoup in a Java web application in IntelliJ. I'm trying to scrape port call events from a ship-tracking website and store the data in a MySQL database. The events are organised in divs with the class name table-group, and the values are in another div with the class name table-row. My problem is that the rows for all the vessels have the same class name, and I'm trying to loop through each row and push the data to the database. So far I have managed to create a Java class that scrapes the first row. How can I loop through each row and store those values in my database? Should I create an ArrayList to store the values?
This is my scraper class:
public class Scarper {
    private static Document doc;
    public static void main(String[] args) {
        final String url =
                "https://www.myshiptracking.com/ports-arrivals-departures/?mmsi=&pid=277&type=0&time=&pp=20";
        try {
            doc = Jsoup.connect(url).get();
        } catch (IOException e) {
            e.printStackTrace();
        }
        Events();
    }
    public static void Events() {
        Elements elm = doc.select("div.table-group:nth-of-type(2) > .table-row");
        List<String> arrayList = new ArrayList();
        for (Element ele : elm) {
            String event = ele.select("div.col:nth-of-type(2)").text();
            String time = ele.select("div.col:nth-of-type(3)").text();
            String port = ele.select("div.col:nth-of-type(4)").text();
            String vessel = ele.select(".td_vesseltype.col").text();
            Event ev = new Event();
            System.out.println(event);
            System.out.println(time);
            System.out.println(port);
            System.out.println(vessel);
        }
    }
}
A sample of the divs I want to scrape:
<div style="box-sizing: border-box;padding: 0px 10px 10px 10px;">
<div class="cs-table">
<div class="heading">
<div class="col" style="width: 10px"></div>
<div class="col" style="width: 110px">Event</div>
<div class="col" style="width: 120px">Time (<span class="tooltip" title="My Time: In your current TimeZone">MT</span>)</div>
<div class="col" style="width: 150px">Port</div>
<div class="col">Vessel</div>
</div>
<div class="table-group">
<div class="table-row">
<div class="col"><i class="fa fa-sign-out red"></i></div>
<div class="col">Departure</div>
<div class="col" style="text-align: center;">2022-02-14 <b>16:51</b></div>
<div class="col"><img class="flag_line tooltip" src="/icons/flags2/16/GB.png" title=" United Kingdom"/>BELFAST</div>
<div class="col td_vesseltype"><img src="/icons/icon7_511.png"><span class="padding_18">WILSON BLYTH [GB]</span></div>
</div>
</div>
<div class="table-group">
<div class="table-row">
<div class="col"><i class="fa fa-flag-checkered green"></i></div>
<div class="col">Arrival</div>
<div class="col" style="text-align: center;">2022-02-14 <b>16:51</b></div>
<div class="col"><img class="flag_line tooltip" src="/icons/flags2/16/GB.png" title=" United Kingdom"/>HUNTERS QUAY</div>
<div class="col td_vesseltype"><img src="/icons/icon6_511.png"><span class="padding_18">SOUND OF SOAY [GB]</span></div>
</div>
</div>
<div class="table-group">
<div class="table-row">
<div class="col"><i class="fa fa-sign-out red"></i></div>
<div class="col">Departure</div>
<div class="col" style="text-align: center;">2022-02-14 <b>16:51</b></div>
<div class="col"><img class="flag_line tooltip" src="/icons/flags2/16/GB.png" title=" United Kingdom"/>LARGS</div>
<div class="col td_vesseltype"><img src="/icons/icon6_511.png"><span class="padding_18">LOCH SHIRA [GB]</span></div>
</div>
</div>
<div class="table-group">
<div class="table-row">
<div class="col"><i class="fa fa-sign-out red"></i></div>
<div class="col">Departure</div>
<div class="col" style="text-align: center;">2022-02-14 <b>16:51</b></div>
<div class="col"><img class="flag_line tooltip" src="/icons/flags2/16/GB.png" title=" United Kingdom"/>RYDE</div>
<div class="col td_vesseltype"><img src="/icons/icon4_511.png"><span class="padding_18">ISLAND FLYER [GB]</span></div>
</div>
</div>

You can start by looping over the table's rows. The selector for the table is .cs-table, so you can get the table with Element table = doc.select(".cs-table").first();. Next, get the table's rows with the selector div.table-row: Elements rows = table.select("div.table-row");. Now you can loop over all the rows and extract the data from each one. The code should look like this:
Element table = doc.select(".cs-table").first();
// Search inside the table only, so rows from other parts of the page are ignored
Elements rows = table.select("div.table-row");
for (Element row : rows) {
    String event = row.select("div.col:nth-of-type(2)").text();
    String time = row.select("div.col:nth-of-type(3)").text();
    String port = row.select("div.col:nth-of-type(4)").text();
    String vessel = row.select(".td_vesseltype.col").text();
    System.out.println(event + "-" + time + " " + port + " " + vessel);
    System.out.println("---------------------------");
    // Do stuff with data here
}
Now it's up to you to decide whether to keep the data in an array or list inside the loop and use it later, or to insert it directly into your database.
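If you go with the list option, one way to do it is to wrap each row in a small value object and add it to a list. The sketch below is only an illustration: the PortEvent class and the collect method are hypothetical names, and each String[] stands in for the four values extracted per row, so the example runs without jsoup or a database.

```java
import java.util.ArrayList;
import java.util.List;

public class PortEventCollector {

    /** Hypothetical value object holding the four values of one table row. */
    static final class PortEvent {
        final String event;
        final String time;
        final String port;
        final String vessel;

        PortEvent(String event, String time, String port, String vessel) {
            this.event = event;
            this.time = time;
            this.port = port;
            this.vessel = vessel;
        }

        @Override
        public String toString() {
            return event + " | " + time + " | " + port + " | " + vessel;
        }
    }

    /**
     * Builds one PortEvent per row. Each String[] stands in for the values
     * extracted from a div.table-row (event, time, port, vessel).
     * Rows with fewer than four cells are skipped.
     */
    static List<PortEvent> collect(List<String[]> rows) {
        List<PortEvent> events = new ArrayList<>();
        for (String[] cells : rows) {
            if (cells.length < 4) {
                continue;
            }
            events.add(new PortEvent(cells[0], cells[1], cells[2], cells[3]));
        }
        return events;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Departure", "2022-02-14 16:51", "BELFAST", "WILSON BLYTH [GB]"});
        rows.add(new String[] {"Arrival", "2022-02-14 16:51", "HUNTERS QUAY", "SOUND OF SOAY [GB]"});

        for (PortEvent e : collect(rows)) {
            System.out.println(e);
        }
    }
}
```

Collecting first and writing afterwards keeps the scraping loop free of database code; the resulting list can then be handed to whatever persistence layer you use.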

JSoup not able to get links from html

I'm trying to get the links from the HTML of a site, but I'm unable to do so using Jsoup.
This is the HTML:
<div class="anime_muti_link">
<ul>
<li><div class="doamin">Domain</div><div class="link">Link</div></li>
<li class="anime">
Server m1</div><span>Watch This Link</span>
</li>
<li class="anime">
Server m2</div><span>Watch This Link</span>
</li>
<li class="xstreamcdn">
Xstreamcdn</div><span>Watch This Link</span>
</li>
<li class="mixdrop">
<div class="server mixdrop">Mixdrop</div><span>Watch This Link</span>
</li>
<li class="streamsb">
StreamSB</div><span>Watch This Link</span>
</li>
<li class="doodstream">
Doodstream</div><span>Watch This Link</span>
</li>
</ul>
</div>
This is the android code that I wrote which doesn't seem to work:
try {
    Document doc = Jsoup.connect(URL).get();
    Elements content = doc.getElementsByClass("anime_muti_link");
    Elements links = content.select("a");
    String[] urls = new String[links.size()];
    for (int i = 0; i < links.size(); i++) {
        urls[i] = links.get(i).attr("data-video");
        if (!urls[i].startsWith("https://")) {
            urls[i] = "https:" + urls[i];
        }
    }
    arrayList.addAll(Arrays.asList(urls));
    Log.d("CALLING_URL", "Links: " + Arrays.toString(urls));
} catch (IOException e) {
    e.getMessage();
}
Can someone please help me with this? Thanks
Edit: Basically, I'm trying to get those 6 links and add them to my list so I can use them within the app.
Edit 2:
I found another piece of HTML that seems better:
<div class="heading-servers">
<span><i class="fa fa-signal"></i> Servers</span>
<ul class="servers">
<li data-vs="https://example.com" class="server server-active" style="display: block;" onclick="return loadIframe('ifrm', this.getAttribute('data-vs'));">Netu</li>
<li data-vs="https://example.com" class="server" style="display: block;" onclick="return loadIframe('ifrm', this.getAttribute('data-vs'));">VideoVard</li>
<li data-vs="https://example.com" class="server" style="display: block;" onclick="return loadIframe('ifrm', this.getAttribute('data-vs'));">Doodstream</li>
<li data-vs="https://example.com" class="server" style="display: block;" onclick="return loadIframe('ifrm', this.getAttribute('data-vs'));">Okstream</li>
</ul>
</div>

As you can see, in this li definition you are including a nested div:
<li class="xstreamcdn">
Xstreamcdn</div><span>Watch This Link</span>
</li>
This causes the variable content, the HTML fragment with class anime_muti_link, to look like:
<div class="anime_muti_link">
<ul>
<li>
<div class="doamin">
Domain
</div>
<div class="link">
Link
</div></li>
<li class="anime"> <a href="#" class="active" rel="1" data-video="example.com">
<div class="server m1">
Server m1
</div><span>Watch This Link</span></a> </li>
<li class="anime"> <a href="#" rel="1" data-video="example.com">
<div class="server m1">
Server m2
</div><span>Watch This Link</span></a> </li>
<li class="xstreamcdn"> Xstreamcdn</li>
</ul>
</div>
A similar result will be obtained even if you tidy your HTML. I used this code from one of my previous answers:
Tidy tidy = new Tidy();
tidy.setXHTML(true);
tidy.setIndentContent(true);
tidy.setPrintBodyOnly(true);
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setSmartIndent(true);
tidy.setShowWarnings(false);
tidy.setQuiet(true);
tidy.setTidyMark(false);
org.w3c.dom.Document htmlDOM = tidy.parseDOM(new ByteArrayInputStream(html.getBytes()), null);
OutputStream out = new ByteArrayOutputStream();
tidy.pprint(htmlDOM, out);
String tidiedHtml = out.toString();
// System.out.println(tidiedHtml);
Document document = Jsoup.parse(tidiedHtml);
Elements content = document.getElementsByClass("anime_muti_link");
System.out.println(content);
This is why you are finding only three anchors.
Please try correcting your HTML, or select the anchor tags at the document level instead:
Document document = Jsoup.parse(html);
// Elements content = document.getElementsByClass("anime_muti_link");
// System.out.println(content);
Elements links = document.select("a");
String[] urls = new String[links.size()];
for (int i = 0; i < links.size(); i++) {
    urls[i] = links.get(i).attr("data-video");
    if (!urls[i].startsWith("https://")) {
        urls[i] = "https://" + urls[i];
    }
}
System.out.println(Arrays.asList(urls));
If the result obtained contains undesired links, perhaps you can try narrowing the selector used, something like:
document.select(".anime_muti_link a")
If this doesn't work, another possible alternative could be selecting the anchor elements with a data-video attribute, a[data-video]:
Document document = Jsoup.parse(html);
Elements videoLinks = document.select("a[data-video]");
String[] urls = new String[videoLinks.size()];
for (int i = 0; i < videoLinks.size(); i++) {
    urls[i] = videoLinks.get(i).attr("data-video");
    if (!urls[i].startsWith("https://")) {
        urls[i] = "https://" + urls[i];
    }
}
System.out.println(Arrays.asList(urls));
With your new test case, you can obtain the desired information with very similar code:
String html = "<div class=\"heading-servers\">\n" +
" <span><i class=\"fa fa-signal\"></i> Servers</span>\n" +
" <ul class=\"servers\">\n" +
" <li data-vs=\"https://example.com\" class=\"server server-active\" style=\"display: block;\" onclick=\"return loadIframe('ifrm', this.getAttribute('data-vs'));\">Netu</li>\n" +
" <li data-vs=\"https://example.com\" class=\"server\" style=\"display: block;\" onclick=\"return loadIframe('ifrm', this.getAttribute('data-vs'));\">VideoVard</li>\n" +
" <li data-vs=\"https://example.com\" class=\"server\" style=\"display: block;\" onclick=\"return loadIframe('ifrm', this.getAttribute('data-vs'));\">Doodstream</li>\n" +
" <li data-vs=\"https://example.com\" class=\"server\" style=\"display: block;\" onclick=\"return loadIframe('ifrm', this.getAttribute('data-vs'));\">Okstream</li>\n" +
" </ul>\n" +
" </div>";
Document document = Jsoup.parse(html);
Elements videoLinks = document.select("div.heading-servers ul.servers li.server");
String[] urls = new String[videoLinks.size()];
for (int i = 0; i < videoLinks.size(); i++) {
    urls[i] = videoLinks.get(i).attr("data-vs");
    if (!urls[i].startsWith("https://")) {
        urls[i] = "https://" + urls[i];
    }
}
System.out.println(Arrays.asList(urls));
The most important part is the definition of the selector that is applied to the parsed document, div.heading-servers ul.servers li.server in our case.
I provided a selector with many fragments, but depending on the actual HTML in use, it could be simplified to ul.servers li.server or even li.server.
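A side note on the startsWith("https://") check used in the loops above: the question prefixes "https:" (which suits protocol-relative values such as //host/path), while the answer snippets prefix "https://" (which suits bare hosts). The following is a minimal sketch of a helper that covers both cases. The class and method names are hypothetical, and rewriting http:// to https:// is an assumption that only holds if the target server actually supports HTTPS.

```java
public class VideoUrlNormalizer {

    /**
     * Turns an attribute value into an absolute https URL.
     * Handles absolute URLs, protocol-relative values ("//host/path")
     * and bare hosts ("host/path"). Returns null for null or blank input.
     */
    static String toHttps(String raw) {
        if (raw == null) {
            return null;
        }
        String value = raw.trim();
        if (value.isEmpty()) {
            return null;
        }
        if (value.startsWith("https://")) {
            return value;
        }
        if (value.startsWith("http://")) {
            // Assumption: the server also answers over HTTPS
            return "https://" + value.substring("http://".length());
        }
        if (value.startsWith("//")) {
            // Protocol-relative value: only the scheme is missing
            return "https:" + value;
        }
        // Bare host or path: add the full scheme prefix
        return "https://" + value;
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://example.com", "//example.com/v/1", "example.com", "http://example.com"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + toHttps(s));
        }
    }
}
```

Inside the loops, urls[i] = VideoUrlNormalizer.toHttps(videoLinks.get(i).attr("data-vs")) would then replace the inline check, with a null result marking an element that had no usable value.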

Jsoup css selector "not", not return anything

I'm trying to ignore one item so that Jsoup does not parse it, but the CSS :not selector is not working and I don't understand what is wrong.
My code:
MangaList list = new MangaList();
Document document = getPage("https://3asq.org/");
MangaInfo manga;
for (Element o : document.select("div.page-item-detail:not(.item-thumb#manga-item-5520)")) {
    manga = new MangaInfo();
    manga.name = o.select("h3").first().select("a").last().text();
    manga.path = o.select("a").first().attr("href");
    try {
        manga.preview = o.select("img").first().attr("src");
    } catch (Exception e) {
        manga.preview = "";
    }
    list.add(manga);
}
return list;
html code:
<div class="col-12 col-md-6 badge-pos-1">
<div class="page-item-detail manga">
<div id="manga-item-5520" class="item-thumb hover-details c-image-hover" data-post-id="5520">
<a href="https://3asq.org/manga/gosu/" title="Gosu">
<img width="110" height="150" src="https://3asq.org/wp-content/uploads/2020/03/IMG_4497-110x150.jpg" srcset="https://3asq.org/wp-content/uploads/2020/03/IMG_4497-110x150.jpg 110w, https://3asq.org/wp-content/uploads/2020/03/IMG_4497-175x238.jpg 175w" sizes="(max-width: 110px) 100vw, 110px" class="img-responsive" style="" alt="IMG_4497"/> </a>
</div>
<div class="item-summary">
<div class="post-title font-title">
<h3 class="h5">
<span class="manga-title-badges custom noal-manga">Noal-Manga</span> Gosu
</h3>

If I debug your code and print the HTML of the first match with:
System.out.println(document.select("div.page-item-detail").get(0))
(hint: use the expression evaluator in IntelliJ IDEA, Alt+F8, for in-session, real-time debugging)
I get:
<div class="page-item-detail manga">
<div id="manga-item-2003" class="item-thumb hover-details c-image-hover" data-post-id="2003">
<a href="http...
...
</div>
</div>
</div>
It looks like you want to select the inner div whose class contains item-thumb, but only if its id isn't manga-item-5520.
Here is what I did to exclude that one item:
document.select("div.page-item-detail div[class*=item-thumb][id!=manga-item-5520]")
Result size: 19
With the element included:
document.select("div.page-item-detail div[class*=item-thumb]")
Result size: 20
You can also try the following if you want the selection to stay anchored at the outer div rather than the inner one:
document.select("div.page-item-detail:has(div[class*=item-thumb][id!=manga-item-5520])")

Java and Selenium: Trouble getting contents of input field

I'm having problems getting the text contents of an input field. With the method I'm using, I only seem to get the text around it.
Snippet from the page:
(It's a list of items, including an input field in each row.)
The markup:
<ul class="budsjett budsjett--kompakt" id="sifobudsjett">
<li class="budsjett-post ng-isolate-scope ng-valid" id="SIFO_mat">
<div class="felt" >
<div class="felt-indre">
<div id="SIFO_mat-farge" class="sifo-farge farge-graa"></div>
<span class="budsjett-post-beskrivelse" >
<span tabindex="0" title="Vis hjelpetekst" role="button">
<span class="hjelpetekst-label" >Mat og drikke</span>
</span>
<span class="sifo-hjelp" aria-hidden="true"></span>
</span>
</span>
<span class="budsjett-post-verdi">
<span class="budsjett-post-verdi-endret" ng-show="!skrivebeskyttet" aria-hidden="false" style="">
<input id="SIFO_mat-input" name="SIFO_mat" type="number">
<span class="felt-enhet"><abbr id="SIFO_mat-enhet" title="kroner" translate=""><span class="ng-scope">kr</span></abbr></span>
</span>
</span>
</div>
</div>
</li>
The code:
List<WebElement> sifoliste = driver.findElement(By.id("sifobudsjett")).findElements(By.tagName("li"));
Result of first element: "Mat og drikke".
List<WebElement> sifoliste = driver.findElement(By.id("sifobudsjett")).findElements(By.tagName("input"));
Result of first element: ""
List<WebElement> sifoliste = driver.findElement(By.id("sifobudsjett")).findElements(By.className("budsjett-post-verdi-endret"));
Result of first element: "kr"
Any ideas?

The <input> tag doesn't have text; what you see in the UI is kept in the value attribute. It exists even if you can't see it in the HTML:
driver.findElement(By.id("SIFO_mat-input")).getAttribute("value");
For all the <input>s:
List<WebElement> sifoliste = driver.findElement(By.id("sifobudsjett")).findElements(By.tagName("input"));
String text = sifoliste.get(0).getAttribute("value"); // 2790

Try
String inputValue = driver.findElement(By.tagName("input")).getAttribute("value");

Issue on parsing Html with jsoup for java

I am trying to parse HTML using jsoup.
I used "try jsoup" to check whether the HTML is parsed correctly.
My code is:
URL url = new URL("http://tw.search.bid.yahoo.com/search/ac;_ylt=AtqkyTO06sgGHho20HzmPEX3_rF8?ei=UTF-8&p=%E8%A1%A3%E6%9C%8D");
Document doc;
try {
    doc = Jsoup.parse(url, 3000);
    Elements descriptions = doc.select("div#srp_sl_result" + " div.att-item");
    for (Element element : descriptions) {
        System.out.println(element.ownText());
        System.out.println("--------------");
    }
} catch (IOException e) {
    e.printStackTrace();
}
But the results come back empty.
I am getting the following output:
--------------
--------------
--------------
I am expecting output like:
女裝手套衣服＊艾爾莎＊暗釦長款披風式毛衣罩衫外套S~L【TAA1166】 出價 799 元 直購 799 元 運費80元 ｜
30 次 ｜ 剩 16小時 60分 賣家：艾爾莎時尚精品 (評價 25229) 在新北市
☆意樂舖☆【塑鋼衣架】ABS強化多功能神奇魔術衣架(收納衣服.領帶.皮帶.肩帶) 出價 35 元 直購 35 元 運費
55元 ｜ 8 次 ｜ 1天 6小時 賣家：意樂舖(創意樂園小舖) (評價 14613) 在新北市
HappyLife【YK1324】韓國超人氣乾濕兩用衣架 防滑魔術衣架 止滑衣架 衣服衣櫃衣櫥收納 出價 25 元 直購
25 元 運費70元 ｜ 16 次 ｜ 2天 3小時 賣家：HappyLife快樂生活網 (評價 14360) 在新北市
Here is some sample HTML from the search page:
<div class="att-item item yui3-g " data-url="https://login.yahoo.com/config/login?.intl=tw&.pd=c%3D3Chd7Yq72e502eh4R99sgUvi5Q--&.done=https%3A%2F%2Ftw.search.bid.yahoo.com%2Fsearch%2Fauction%2Fproduct%3Fei%3DUTF-8%26p%3D%25E8%25A1%25A3%25E6%259C%258D&rr=2465463942">
<div class="yui3-u">
<div class="srp-pdimage">
<img height="120" alt=" (DAJIN達錦衣服設計中心)棒壘球帽字凸繡200元，棒球帽，帽子，棒壘球服，棒球衣 " src="https://s.yimg.com/hg/ac/30/ea/e79010279-ac-4511xf9x0430x0600-s.jpg" />
</div>
</div>
</div>
What should I change in my code?
How can I achieve my goal?
Please help me!

You should use the text() method, not ownText(). As the documentation states, text():
Gets the combined text of this element and all its children.
Here is an updated example:
public static void main(String[] args) throws MalformedURLException {
    URL url = new URL("http://tw.search.bid.yahoo.com/search/"
            + "ac;_ylt=AtqkyTO06sgGHho20HzmPEX3_rF8?ei=UTF-8&p=%E8%A1%A3%E6%9C%8D");
    Document doc;
    try {
        doc = Jsoup.parse(url, 3000);
        Elements descriptions = doc.select("div#srp_sl_result div.att-item");
        for (Element element : descriptions) {
            System.out.println(element.text());
            System.out.println("--------------");
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

I've visited the page you are trying to parse and in the browser console I've written:
$('div#srp_sl_result div.att-item')
The search returned a div:
<div class="att-item item yui3-u" data-url="https://login.yahoo.com/config/login?.intl=tw&.pd=c%3D3Chd7Yq72e502eh4R99sgUvi5Q--&.done=https%3A%2F%2Ftw.search.bid.yahoo.com%2Fsearch%2Fauction%2Fproduct%3Fei%3DUTF-8%26p%3D%25E8%25A1%25A3%25E6%259C%258D&rr=3456505015" id="yui_3_14_1_3_1394093660536_452">
<div class="wrap" id="yui_3_14_1_3_1394093660536_451">
<div class="srp-pdimage" id="yui_3_14_1_3_1394093660536_450">
<a href="https://tw.page.bid.yahoo.com/tw/auction/f61398121;_ylt=Ali1FeHY3kStUUeBmGO4vupyFbN8;_ylv=3?u=Y2583393636" id="yui_3_14_1_3_1394093660536_456">
<img width="200" alt=" HappyLife【SP323】納川6+1家庭裝真空收納袋/真空袋/壓縮袋/棉被衣物衣服收納~附吸氣管 " src="https://s.yimg.com/hg/ac/b6/51/f61398121-ac-6849xf8x0600x0400-s.jpg" id="yui_3_14_1_3_1394093660536_455">
</a>
</div>
<div class="srp-pdhead">
<div class="srp-pdinfo">
<a class="srp-bid" href="https://tw.page.bid.yahoo.com/tw/show/bid_hist;_ylt=Ahu0X7QeYNL6gEwV.IhDhWlyFbN8;_ylv=3?aID=f61398121">6 次</a>
<span>出價</span>
<em>399</em>
<span>元</span>
<span class="sep">｜</span>
</div>
<div class="srp-pdprice">
<span>直購</span>
<em>399</em>
<span>元</span>
</div>
</div>
<div class="srp-pdtitle">
HappyLife【SP323】納川6+1家庭裝真空收納袋/真空袋/壓縮袋/棉被衣物衣服收納~附吸氣管
</div>
<div class="srp-pdftitle">
HappyLife【SP323】納川6+1家庭裝真空收納袋/真空袋/壓縮袋/棉被衣物衣服收納~附吸氣管
</div>
<div class="srp-pdstore">
<a class="srp-ico" href="https://tw.help.yahoo.com/auct/policy/protection.html#reward" alt="享買賣家五萬保障"></a>
HappyLife快樂生活網
</div>
</div>
</div>
So I don't understand why so many elements are returned. In any case, element.ownText() returns only the text that belongs directly to that div, excluding any inner elements, so nothing should be printed: that div has no text of its own, only child elements.
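To make the difference between "own text" and "combined text" concrete, here is a small sketch that uses the JDK's built-in XML DOM parser as a stand-in for jsoup, so it runs without any extra library. The ownText and allText helpers are hypothetical and only approximate what jsoup's ownText() and text() do; the fragment is a simplified, well-formed version of the att-item markup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class OwnTextDemo {

    /** Text of the element's direct text-node children only (roughly jsoup's ownText()). */
    static String ownText(Element el) {
        StringBuilder sb = new StringBuilder();
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    /** Text of the element and all its descendants (roughly jsoup's text()). */
    static String allText(Element el) {
        return el.getTextContent().trim().replaceAll("\\s+", " ");
    }

    /** Parses a well-formed fragment and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse fragment", e);
        }
    }

    public static void main(String[] args) {
        Element item = parseRoot(
                "<div class=\"att-item\">"
                + "<div class=\"srp-pdtitle\">Title</div> "
                + "<div class=\"srp-pdstore\">Store</div>"
                + "</div>");
        System.out.println("own: [" + ownText(item) + "]"); // prints "own: []"
        System.out.println("all: [" + allText(item) + "]"); // prints "all: [Title Store]"
    }
}
```

The outer div contributes no text of its own, which matches the empty lines in the question's output; all the visible text lives in the nested elements.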
