How can I parse ul elements with a specific class in an HTML document using Java?
I want to parse this section of the HTML:
<ul class="news-list">
<li>
<a onclick="AjaxStatManager('Content','1258')" href="http://www.gyte.edu.tr/icerik/120/1258/kim-101-final-mazeret-sinavi.aspx" target="_self">
<div class="text">
<h2>KİM 101 Final Mazeret Sınavı</h2>
<p></p>
</div>
</a>
</li>
<li>
<a onclick="AjaxStatManager('Content','1248')" href="http://www.gyte.edu.tr/icerik/120/1248/butunleme-sinav-tarihleri.aspx" target="_self">
<div class="text">
<h2>Bütünleme Sınav Tarihleri</h2>
<p></p>
</div>
</a>
</li>
<li>
<a onclick="AjaxStatManager('Content','1242')" href="http://www.gyte.edu.tr/icerik/120/1242/bil-374-internet-teknolojileri-final-sinavi.aspx" target="_self">
<div class="text">
<h2>Bil 374 İnternet Teknolojileri Final Sınavı</h2>
<p></p>
</div>
</a>
</li>
<li>
<a onclick="AjaxStatManager('Content','1241')" href="http://www.gyte.edu.tr/icerik/120/1241/kim101-final-sinavi.aspx" target="_self">
<div class="text">
<h2>Kim101 Final Sınavı </h2>
<p></p>
</div>
</a>
</li>
<li>
<a onclick="AjaxStatManager('Content','1222')" href="/Files/UserFiles/85/duyurular/yeterlilik.pdf" target="_self">
<div class="text">
<h2>Doktora Yeterlilik Sınav Tarihleri</h2>
<p></p>
</div>
</a>
</li>
<li>
<a onclick="AjaxStatManager('Content','1221')" href="/Files/UserFiles/85/duyurular/duyuru-dokt-seminer.pdf" target="_self">
<div class="text">
<h2>Doktora Programı Adaylarına Önemli Duyuru</h2>
<p></p>
</div>
</a>
</li>
<li>
<a onclick="AjaxStatManager('Content','1127')" href="http://www.gyte.edu.tr/icerik/120/1127/20122013-egitimogretim-yili-guz-yari-yili--final-programi.aspx" target="_self">
<div class="text">
<h2>2012-2013 Eğitim-Öğretim Yılı Güz Yarı Yılı Final Programı</h2>
<p></p>
</div>
</a>
</li>
<li>
<a onclick="AjaxStatManager('Content','1109')" href="/Files/UserFiles/85/duyurular/Yüksek Lisans Doktora Seminer I ve II Sunum Takvimi.pdf" target="_self">
<div class="text">
<h2>Yüksek Lisans / Doktora Seminer I ve II Sunum Takvimi</h2>
<p></p>
</div>
</a>
</li>
<li>
<a onclick="AjaxStatManager('Content','998')" href="http://www.gyte.edu.tr/icerik/120/998/bilgisayar-muhendisligi-bolumu-20122013-guz-yari-yili-ders-programlari.aspx" target="_self">
<div class="text">
<h2>Bilgisayar Mühendisliği Bölümü 2012-2013 Güz Yarı Yılı Ders Programları</h2>
<p>Bilgisayar Mühendisliği Bölümü 2012-2013 Güz Yarı Yılı Ders Programları</p>
</div>
</a>
</li>
<li>
<a onclick="AjaxStatManager('Content','1101')" href="http://www.gyte.edu.tr/icerik/120/1101/kim-101-kimya-dersi---ii-vizesi.aspx" target="_self">
<div class="text">
<h2>KİM 101 Kimya Dersi II .vizesi</h2>
<p></p>
</div>
</a>
</li>
<li>
<a onclick="AjaxStatManager('Content','1073')" href="/Files/duyuru/bilgisayar_muh/Yuksek_lisans_-_Doktora_Seminer_I_-_II.pdf" target="_self">
<div class="text">
<h2>Yüksek Lisans/Doktora Seminer I ve II Ders Planı</h2>
<p></p>
</div>
</a>
</li>
<li>
<a onclick="AjaxStatManager('Content','1058')" href="/Files/duyuru/bilgisayar_muh/bil495-496syl.pdf" target="_self">
<div class="text">
<h2>BIL 495/496 Bitirme Projesi Ders Planı</h2>
<p></p>
</div>
</a>
</li>
<li>
<a onclick="AjaxStatManager('Content','1006')" href="/Files/duyuru/bilgisayar_muh/duy-ders2013guz_1.doc" target="_self">
<div class="text">
<h2>G.Y.T.E. Lisans Üstü Öğrencilerinin Dikkatine</h2>
<p></p>
</div>
</a>
</li>
<li>
<a onclick="AjaxStatManager('Content','984')" href="http://www.gyte.edu.tr/icerik/120/984/bil-341-programlama-dilleri-butunleme-sinavi.aspx" target="_self">
<div class="text">
<h2>BİL 341 Programlama Dilleri bütünleme sınavı</h2>
<p></p>
</div>
</a>
</li>
</ul>
I have the following code to parse it, but it does not work:
try {
URL url = new URL("http://www.gyte.edu.tr/kategori/120/0/duyurular.aspx");
HTMLEditorKit kit = new HTMLEditorKit();
HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
Reader HTMLReader = new InputStreamReader(url.openConnection().getInputStream());
kit.read(HTMLReader, doc, 0);
ElementIterator it = new ElementIterator(doc);
Element elem;
while ((elem = it.next()) != null) {
AttributeSet as = elem.getAttributes();
if (as.containsAttribute("class", "news-list")) {
int c = elem.getElementCount();
System.out.println("Element count = " + c);
}
}
} catch (IOException | BadLocationException e) {
e.printStackTrace();
return e.getMessage();
}
return "Success!";
You could load it into a Document object. This will read in the HTML for you and you can iterate/query using available methods.
I think this is a job for an XPath query.
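XPath xpath = XPathFactory.newInstance().newXPath();
// Attributes are addressed with '@' in XPath; '#class' is not valid syntax.
String expression = "//ul[@class='news-list']";
// The input must be well-formed XML/XHTML for the JAXP parser to accept it.
InputSource inputSource = new InputSource("your.html");
// JAXP returns an org.w3c.dom.NodeList for XPathConstants.NODESET.
NodeList nodes = (NodeList) xpath.evaluate(expression, inputSource, XPathConstants.NODESET);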
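For reference, here is a minimal self-contained sketch of the XPath approach using only the JDK. It assumes the markup is well-formed XHTML (real-world HTML often is not, and then the parser will reject it). The class name NewsListXPath, the method newsLinks, and the sample markup are made up for illustration. Note that the attribute test is written @class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NewsListXPath {

    // Returns the href of every <a> directly inside an <li> of <ul class="news-list">.
    // Assumption: the input is well-formed XHTML; otherwise evaluate() fails.
    public static List<String> newsLinks(String xhtml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "//ul[@class='news-list']/li/a";
        InputSource source = new InputSource(new StringReader(xhtml));
        try {
            NodeList nodes = (NodeList) xpath.evaluate(expression, source, XPathConstants.NODESET);
            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                hrefs.add(((Element) nodes.item(i)).getAttribute("href"));
            }
            return hrefs;
        } catch (XPathExpressionException e) {
            // Thrown for an invalid expression or unparsable input.
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, shaped like the list in the question.
        String xhtml = "<html><body>"
                + "<ul class=\"news-list\">"
                + "<li><a href=\"/a.pdf\"><h2>First</h2></a></li>"
                + "<li><a href=\"/b.pdf\"><h2>Second</h2></a></li>"
                + "</ul>"
                + "<ul class=\"other\"><li><a href=\"/c.pdf\">Ignored</a></li></ul>"
                + "</body></html>";
        System.out.println(newsLinks(xhtml)); // prints [/a.pdf, /b.pdf]
    }
}
```

The predicate [@class='news-list'] matches only when the class attribute is exactly that string; a ul carrying several classes would need something like contains(@class, 'news-list') instead.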
Here is the Jsoup solution:
try {
Document doc = Jsoup.parse(new URL("http://www.gyte.edu.tr/kategori/120/0/duyurular.aspx"), 1000000);
Elements elements = doc.getElementsByAttributeValue("class", "news-list");
System.out.println(elements.size());
for (Element e : elements) {
System.out.println(e.toString());
}
} catch (Exception e) {
e.printStackTrace();
}
and the output:
<ul class="news-list">
<li> <a onclick="AjaxStatManager('Content','1258')" href="http://www.gyte.edu.tr/icerik/120/1258/kim-101-final-mazeret-sinavi.aspx" target="_self">
<div class="text">
<h2>KİM 101 Final Mazeret Sınavı</h2>
<p></p>
</div> </a> </li>
<li> <a onclick="AjaxStatManager('Content','1248')" href="http://www.gyte.edu.tr/icerik/120/1248/butunleme-sinav-tarihleri.aspx" target="_self">
<div class="text">
<h2>Bütünleme Sınav Tarihleri</h2>
<p></p>
</div> </a> </li>
<li> <a onclick="AjaxStatManager('Content','1242')" href="http://www.gyte.edu.tr/icerik/120/1242/bil-374-internet-teknolojileri-final-sinavi.aspx" target="_self">
<div class="text">
<h2>Bil 374 İnternet Teknolojileri Final Sınavı</h2>
<p></p>
</div> </a> </li>
<li> <a onclick="AjaxStatManager('Content','1241')" href="http://www.gyte.edu.tr/icerik/120/1241/kim101-final-sinavi.aspx" target="_self">
<div class="text">
<h2>Kim101 Final Sınavı </h2>
<p></p>
</div> </a> </li>
<li> <a onclick="AjaxStatManager('Content','1222')" href="/Files/UserFiles/85/duyurular/yeterlilik.pdf" target="_self">
<div class="text">
<h2>Doktora Yeterlilik Sınav Tarihleri</h2>
<p></p>
</div> </a> </li>
<li> <a onclick="AjaxStatManager('Content','1221')" href="/Files/UserFiles/85/duyurular/duyuru-dokt-seminer.pdf" target="_self">
<div class="text">
<h2>Doktora Programı Adaylarına Önemli Duyuru</h2>
<p></p>
</div> </a> </li>
<li> <a onclick="AjaxStatManager('Content','1127')" href="http://www.gyte.edu.tr/icerik/120/1127/20122013-egitimogretim-yili-guz-yari-yili--final-programi.aspx" target="_self">
<div class="text">
<h2>2012-2013 Eğitim-Öğretim Yılı Güz Yarı Yılı Final Programı</h2>
<p></p>
</div> </a> </li>
<li> <a onclick="AjaxStatManager('Content','1109')" href="/Files/UserFiles/85/duyurular/Yüksek Lisans Doktora Seminer I ve II Sunum Takvimi.pdf" target="_self">
<div class="text">
<h2>Yüksek Lisans / Doktora Seminer I ve II Sunum Takvimi</h2>
<p></p>
</div> </a> </li>
<li> <a onclick="AjaxStatManager('Content','998')" href="http://www.gyte.edu.tr/icerik/120/998/bilgisayar-muhendisligi-bolumu-20122013-guz-yari-yili-ders-programlari.aspx" target="_self">
<div class="text">
<h2>Bilgisayar Mühendisliği Bölümü 2012-2013 Güz Yarı Yılı Ders Programları</h2>
<p>Bilgisayar Mühendisliği Bölümü 2012-2013 Güz Yarı Yılı Ders Programları</p>
</div> </a> </li>
<li> <a onclick="AjaxStatManager('Content','1101')" href="http://www.gyte.edu.tr/icerik/120/1101/kim-101-kimya-dersi---ii-vizesi.aspx" target="_self">
<div class="text">
<h2>KİM 101 Kimya Dersi II .vizesi</h2>
<p></p>
</div> </a> </li>
<li> <a onclick="AjaxStatManager('Content','1073')" href="/Files/duyuru/bilgisayar_muh/Yuksek_lisans_-_Doktora_Seminer_I_-_II.pdf" target="_self">
<div class="text">
<h2>Yüksek Lisans/Doktora Seminer I ve II Ders Planı</h2>
<p></p>
</div> </a> </li>
<li> <a onclick="AjaxStatManager('Content','1058')" href="/Files/duyuru/bilgisayar_muh/bil495-496syl.pdf" target="_self">
<div class="text">
<h2>BIL 495/496 Bitirme Projesi Ders Planı</h2>
<p></p>
</div> </a> </li>
<li> <a onclick="AjaxStatManager('Content','1006')" href="/Files/duyuru/bilgisayar_muh/duy-ders2013guz_1.doc" target="_self">
<div class="text">
<h2>G.Y.T.E. Lisans Üstü Öğrencilerinin Dikkatine</h2>
<p></p>
</div> </a> </li>
<li> <a onclick="AjaxStatManager('Content','984')" href="http://www.gyte.edu.tr/icerik/120/984/bil-341-programlama-dilleri-butunleme-sinavi.aspx" target="_self">
<div class="text">
<h2>BİL 341 Programlama Dilleri bütünleme sınavı</h2>
<p></p>
</div> </a> </li>
</ul>
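Some of the href values in this output are relative (for example /Files/UserFiles/85/duyurular/yeterlilik.pdf), so they probably need to be resolved against the page URL before they can be fetched. A minimal sketch with the JDK's java.net.URI follows; the class name HrefResolver and its resolve method are made up for illustration, and escaping only spaces is a simplification, not full URL encoding.

```java
import java.net.URI;

public class HrefResolver {

    // Resolves an href taken from the page against the URL of that page.
    // Absolute hrefs come back unchanged; relative ones get the page's scheme and host.
    public static String resolve(String pageUrl, String href) {
        // java.net.URI rejects raw spaces, so percent-encode them first.
        // Assumption: spaces are the only illegal characters in the href.
        String cleaned = href.trim().replace(" ", "%20");
        return URI.create(pageUrl).resolve(cleaned).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.gyte.edu.tr/kategori/120/0/duyurular.aspx";
        // prints http://www.gyte.edu.tr/Files/UserFiles/85/duyurular/yeterlilik.pdf
        System.out.println(resolve(page, "/Files/UserFiles/85/duyurular/yeterlilik.pdf"));
    }
}
```

With Jsoup itself, calling absUrl("href") on the a element serves the same purpose when the document was loaded from a URL, as it is in the snippet above.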
I have a problem that I do not know how to solve. While fetching data I get a NullPointerException (NPE). This is odd, because the same code works for other book categories.
String romancesCategoryEmpikURL = "https://www.empik.com/ksiazki/poradniki";
Document document = Jsoup.connect(romancesCategoryEmpikURL).get();
List<Element> siteElements = document.select("div.productBox__info");
List<Book> romanceCategoryBooks = new ArrayList<>();
for (int i = 0; i < 15; i++) {
String author = siteElements.get(i).select("span > a").first().ownText();
romanceCategoryBooks.add(new Book.BookBuilder()
.withAuthor(author)
.withPrice(price)
.withTitle(title)
.withProductID(productID)
.withBookURL(BookURL)
.build());
}
The NPE occurs when fetching the author from this page: https://www.empik.com/ksiazki/poradniki
HTML code:
<div class="productBox__info">
<a href="/jak-uratowac-swiat-czyli-co-dobrego-mozesz-zrobic-dla-planety-szpura-areta,p1223701396,ksiazka-p" class="productBox seoTitle" title="Jak uratować świat? Czyli co dobrego możesz zrobić dla planety - Szpura Areta" data-product-id="p1223701396">
<span class="productBox__title">
<span class="productBox__number">1</span>
Jak uratować świat? Czyli co dobrego możesz zrobić dla planety
</span>
</a>
<span class="productBox__subtitle">
<a href="/szukaj/produkt?author=szpura+areta" class="smartAuthor" title="Szpura Areta - wszystkie produkty">
Szpura Areta </a>
</span>
<div class="rating">
<ul class="ratingStars"><li class="rate"><i class="fa fa-fw fa-star active"></i></li><li class="rate"><i class="fa fa-fw fa-star active"></i></li><li class="rate"><i class="fa fa-fw fa-star active"></i></li><li class="rate"><i class="fa fa-fw fa-star active"></i></li><li class="rate"><i class="fa fa-fw fa-star active"></i></li></ul>
<div class="score">
4.7/5
</div>
</div>
<div class="productBox__price">
<div class="productBox__priceItem productBox__priceItem--promotion ta-productlist-price ">
37,49 zł </div>
<div class="productBox__priceItem productBox__priceItem--old ta-productlist-oldprice">
49,99 zł </div>
</div>
</div>
I want to fetch the author, which here is Szpura Areta.
I have the following HTML code for a listbox:
<div role="listbox" aria-expanded="false" class="quantumWizMenuPaperselectEl docssharedWizSelectPaperselectRoot freebirdFormviewerViewItemsSelectSelect freebirdThemedSelectDarkerDisabled" jscontroller="YwHGTd" jsaction="click:cOuCgd(LgbsSe); keydown:I481le; keypress:Kr2w4b; mousedown:UX7yZ(LgbsSe),npT2md(preventDefault=true); mouseup:lbsD7e(LgbsSe); mouseleave:JywGue; touchstart:p6p2H(LgbsSe); touchmove:FwuNnf; touchend:yfqBxc(LgbsSe|preventMouseEvents=true|preventDefault=true); touchcancel:JMtRjd(LgbsSe); focus:AHmuwe; blur:O22p3e;b5SvAb:TvD9Pc;" jsshadow="" jsname="W85ice" aria-describedby="i.desc.709120473 i.err.709120473" aria-labelledby="i73">
<div jsname="LgbsSe" role="presentation">
<div class="quantumWizMenuPaperselectOptionList" jsname="d9BH4c" role="presentation">
<div class="quantumWizMenuPaperselectOption freebirdThemedSelectOptionDarkerDisabled exportOption isSelected isPlaceholder" jsname="wQNmvb" jsaction="" data-value="" aria-selected="true" role="option" tabindex="0">
<div class="quantumWizMenuPaperselectRipple exportInk" jsname="ksKsZd"></div>
<content class="quantumWizMenuPaperselectContent exportContent">Auswählen</content>
</div>
<div class="quantumWizMenuPaperselectOptionSeparator" role="presentation"></div>
<div class="quantumWizMenuPaperselectOption freebirdThemedSelectOptionDarkerDisabled exportOption" jsname="wQNmvb" jsaction="" data-value="140 cm" aria-selected="false" role="option" tabindex="-1">
<div class="quantumWizMenuPaperselectRipple exportInk" jsname="ksKsZd"></div>
<content class="quantumWizMenuPaperselectContent exportContent">140 cm</content>
</div>
<div class="quantumWizMenuPaperselectOption freebirdThemedSelectOptionDarkerDisabled exportOption" jsname="wQNmvb" jsaction="" data-value="141 cm" aria-selected="false" role="option" tabindex="-1">
<div class="quantumWizMenuPaperselectRipple exportInk" jsname="ksKsZd"></div>
<content class="quantumWizMenuPaperselectContent exportContent">141 cm</content>
</div>
<div class="quantumWizMenuPaperselectOption freebirdThemedSelectOptionDarkerDisabled exportOption" jsname="wQNmvb" jsaction="" data-value="142 cm" aria-selected="false" role="option" tabindex="-1">
<div class="quantumWizMenuPaperselectRipple exportInk" jsname="ksKsZd"></div>
<content class="quantumWizMenuPaperselectContent exportContent">142 cm</content>
</div>
<div class="quantumWizMenuPaperselectOption freebirdThemedSelectOptionDarkerDisabled exportOption" jsname="wQNmvb" jsaction="" data-value="143 cm" aria-selected="false" role="option" tabindex="-1">
<div class="quantumWizMenuPaperselectRipple exportInk" jsname="ksKsZd"></div>
<content class="quantumWizMenuPaperselectContent exportContent">143 cm</content>
</div>
</div>
<div class="quantumWizMenuPaperselectDropDown exportDropDown" role="presentation"></div>
</div>
<div class="exportSelectPopup quantumWizMenuPaperselectPopup" jsaction="click:dPTK6c(wQNmvb); mousedown:uYU8jb(wQNmvb); mouseup:LVEdXd(wQNmvb); mouseover:nfXz1e(wQNmvb); touchstart:Rh2fre(wQNmvb); touchmove:hvFWtf(wQNmvb); touchend:MkF9r(wQNmvb|preventMouseEvents=true)" role="presentation" jsname="V68bde" style="display:none;"></div>
</div>
I am writing a program in Java that has to select an element of this listbox automatically (such as "140 cm" or "141 cm", as you can see in the code). I tried to access the listbox itself with the following code:
WebElement checkBox = driver.findElement(By.cssSelector("div[aria-labelledby*=i73]"));
checkBox.click();
That worked, but now I have to select an element of this listbox somehow. I tried it with the Select class, which did not work:
Select listbox = new Select(checkBox);
listbox.selectByVisibleText("140 cm");
I also tried clicking the specific div with the '140 cm' text after waiting for it to become clickable, but I get a timeout exception because the element never becomes clickable within the wait.
WebElement boxElement = driver.findElement(By.cssSelector("div[data-value*='140']"));
WebDriverWait wait = new WebDriverWait(driver, 10);
boxElement = wait.until(ExpectedConditions.elementToBeClickable(By.cssSelector("div[data-value*='140']")));
boxElement.click();
I am desperate and do not know what to do. Can anyone help me? I am thankful for every answer!
Greetings
The items in my itemList are incomplete! For some reason, from the 10th iteration of my loop to the last,
el.select(".item").select(".img").select(".pic").select(".picRind").select(".picCore").attr("src")
returns an empty string, and I can't understand why.
Iterations 0 through 9 are perfectly fine, though. I went through the HTML, and my code should work for every li I'm iterating through.
private Document getHtmlDocument() throws IOException {
document = Jsoup.connect(url).get();
return document;
}
public List<AliExpressItem> getAliExpressItemList() throws IOException {
Document document;
Element ul;
Elements ulLi;
document = getHtmlDocument();
ul = document.getElementById("hs-below-list-items");
ulLi = ul.getElementsByClass("list-item");
List<AliExpressItem> itemList = new ArrayList<>();
for(Element el : ulLi) {
AliExpressItem item = new AliExpressItem();
item.setImage(el.select(".item")
.select(".img")
.select(".pic")
.select(".picRind")
.select(".picCore")
.attr("src"));
item.setDescription(el.select(".item")
.select(".info")
.select("h3")
.select("a")
.text());
item.setPrice(el.select(".item")
.select(".info")
.select(".price")
.select(".value")
.text());
itemList.add(item);
}
return itemList;
}
There's a ul with 48 li elements inside. The above code should work for all 48 of them. Here is one of them:
<li qrdata="|32805326364|cn1511315262" pub-catid="200247142" sessionid="201711160635492248862329348280002056372" class="list-item list-item-first ">
<div class="item">
<div class="img img-border">
<div class="pic">
<a class="picRind history-item j-p4plog" href="//www.aliexpress.com/item/Hot-Sale-Novelty-Toys-Hand-Spinner-Anti-stress-toys/32805326364.html?spm=2114.search0204.3.1.Lwk2KD&s=p&ws_ab_test=searchweb0_0,searchweb201602_5_10152_10065_10151_10344_10068_10130_10345_10324_10342_10547_10325_10343_10546_10340_10341_10548_10545_10541_10562_10084_10083_10307_5680011_10178_10060_10155_10154_10056_10055_10539_10312_10059_10313_10314_10534_10533_100031_10103_10073_10102_10594_10557_10558_10596_10142_10107,searchweb201603_14,ppcSwitch_5_ppcChannel&btsid=6350c066-2194-4756-b1f7-ed7e1b0028e1&rmStoreLevelAB=0" target="_blank" data-spm-anchor-id="2114.search0204.3.1"><img class="picCore pic-Core-v" src="//ae01.alicdn.com/kf/HTB1RUjgQFXXXXayXXXXq6xXFXXX4/Hot-Sale-Novelty-Toys-Hand-font-b-Spinner-b-font-Anti-stress-toys-fidget-font-b.jpg_220x220.jpg" alt="Hot Sale Novelty Toys Hand Spinner Anti stress toys fidget spinners For Autism and ADHD reliever stress spinner(China)"></a>
</div>
</div>
<div class="info">
<h3>
<a class="history-item product j-p4plog" href="//www.aliexpress.com/item/Hot-Sale-Novelty-Toys-Hand-Spinner-Anti-stress-toys/32805326364.html?spm=2114.search0204.3.2.Lwk2KD&s=p&ws_ab_test=searchweb0_0,searchweb201602_5_10152_10065_10151_10344_10068_10130_10345_10324_10342_10547_10325_10343_10546_10340_10341_10548_10545_10541_10562_10084_10083_10307_5680011_10178_10060_10155_10154_10056_10055_10539_10312_10059_10313_10314_10534_10533_100031_10103_10073_10102_10594_10557_10558_10596_10142_10107,searchweb201603_14,ppcSwitch_5_ppcChannel&btsid=6350c066-2194-4756-b1f7-ed7e1b0028e1&rmStoreLevelAB=0" title="Hot Sale Novelty Toys Hand Spinner Anti stress toys fidget spinners For Autism and ADHD reliever stress spinner" target="_blank" data-spm-anchor-id="2114.search0204.3.2">Hot Sale Novelty Toys Hand <font><b>Spinner</b></font> Anti stress toys fidget <font><b>spinners</b></font> For Autism and ADHD reliever stress <font><b>spinner</b></font></a>
</h3>
<span class="price price-m">
<span class="value" itemprop="price">US $1.99</span>
<span class="separator">/</span>
<span class="unit">unidad</span>
</span>
<strong class="free-s">Envío gratis</strong>
<div class="rate-history">
<span rel="nofollow" class="order-num">
<a class="order-num-a j-p4plog" href="//www.aliexpress.com/item/Hot-Sale-Novelty-Toys-Hand-Spinner-Anti-stress-toys/32805326364.html?spm=2114.search0204.3.3.Lwk2KD&s=p&ws_ab_test=searchweb0_0,searchweb201602_5_10152_10065_10151_10344_10068_10130_10345_10324_10342_10547_10325_10343_10546_10340_10341_10548_10545_10541_10562_10084_10083_10307_5680011_10178_10060_10155_10154_10056_10055_10539_10312_10059_10313_10314_10534_10533_100031_10103_10073_10102_10594_10557_10558_10596_10142_10107,searchweb201603_14,ppcSwitch_5_ppcChannel&btsid=6350c066-2194-4756-b1f7-ed7e1b0028e1&rmStoreLevelAB=0#thf" rel="nofollow" target="_blank" data-spm-anchor-id="2114.search0204.3.3"><em title="Pedido totales"> Ventas (0)</em></a>
</span>
</div>
</div>
<div class="info-more">
<div class="aplus-sp-main">
<div class="sp-box">
</div>
</div>
<div class="store-name-chat">
<div class="store-name util-clearfix">
Alisa's cabin
</div>
</div>
<a class="score-dot" href="//www.aliexpress.com/store/feedback-score/1308215.html?spm=2114.search0204.3.5.Lwk2KD" rel="nofollow" data-spm-anchor-id="2114.search0204.3.5"><span class="score-icon-new score-level-22" id="score1" feedbackscore="1,276" sellerpositivefeedbackpercentage="93.7"></span></a>
<div class="add-to-wishlist">
<a class="atwl-button j-p4plog" href="javascript:;" data-product-id="32805326364" data-batman-id="ja2kvte8" data-spm-anchor-id="2114.search0204.3.6">Añadir a Lista Deseos</a>
</div>
<input class="atc-product-id" type="hidden" value="32805326364">
<input class="atc-product-standard" type="hidden" value="">
</div>
</div>
I want to extract some data from many Xbox game pages. The problem I am running into is that the section showing the price has a different structure when the game is discounted, for example.
The code I have written to scrape the price:
String urlPage = "https://www.microsoft.com/en-us/store/p/call-of-duty-advanced-warfare-gold-edition/c20hl06x0v8w" ;
System.out.println("Comprobando entradas de: "+urlPage);
if (getStatusConnectionCode(urlPage) == 200) {
Document document = getHtmlDocument(urlPage);
Elements entradas = document.select("div.m-product-detail-hero-product-placement div.price-info");
for (Element elem : entradas) {
String titulo = elem.getElementsByClass("srv_saleprice").text();
}
}else{
System.out.println("El Status Code no es OK es: "+getStatusConnectionCode(urlPage));
}
The HTML for a game that has no discount:
URL for first case
<div class="price-info">
<div class="c-price">
<div class="price-text srv_price">
<div class="ea-vault-message hidden x-hidden">
<div>
Available in The Vault
</div>
<div>
or
</div>
</div>
<span>$59.99</span>
<sup>+</sup>
</div>
<div class="srv_microdata" itemprop="offers" itemscope itemtype="http://schema.org/Offer">
<meta itemprop="price" content="59.99">
<meta itemprop="priceCurrency" content="USD">
</div>
</div>
</div>
And for a game with discount:
URL for the second case
<div class="price-info">
<div class="c-price">
<div class="price-text srv_price">
<div class="ea-vault-message hidden x-hidden">
<div>
Available in The Vault
</div>
<div>
or
</div>
</div>
<s class="srv_saleprice" aria-label="Full price was $159.99">$159.99</s>
<span> </span>
<div class="price-disclaimer">
<span>$135.99</span>
<sup>+</sup>
</div>
<span> </span>
<span></span>
</div>
<div class="caption text-muted srv_countdown">
<span class="sub">save $24.00</span>
</div>
<div class="srv_microdata" itemprop="offers" itemscope itemtype="http://schema.org/Offer">
<meta itemprop="price" content="135.99">
<meta itemprop="priceCurrency" content="USD">
</div>
</div>
</div>
In this second example the value inside the span element is $135.99, but that is not the game's base price ($159.99 in this case).
How could I extract only the base price for every game, with or without a discount?
I want to select all children of the following ul with Jsoup.
<ul class="breadcrumbs xlarge-12 columns hide-for-small-only">
<li>
<a href="http://www.thalia.de/shop/home/show/">
Home
</a>
</li>
<li>
<a href="http://www.thalia.de/shop/buecher/show/">
Bücher
</a>
</li>
<li>
<a href="http://www.thalia.de/shop/fachbuecher-115/show/">
Fachbücher
</a>
</li>
<li>
<a href="http://www.thalia.de/shop/chemie-143/show/">
Chemie
</a>
</li>
</ul>
Here is my code, but this gives me only the first two elements. What am I doing wrong?
Elements category = doc.select("div.ncMain.productMainView");
Elements category2 = category.select("ul.breadcrumbs.xlarge-12.columns.hide-for-small-only");
Elements category3 = category2.select("ul li a");
String categoryString = category3.text();