Jsoup parsing from String - java

I am new to Jsoup and I am having trouble getting the text values from a div with the class name text as separate strings.
This is the HTML that I want to scrape:
<body>
<div class="details ">
<div class="title turquoise2">
AAC-Olympia
</div>
<div class="subhead turquoise2">
Correspondentie-adres:
</div>
<div class="text">
Rijdt 37
<br /> 6631AP HORSSEN
<br /> 0487-541339
</div>
<div class="subhead turquoise2">
Accommodatie:
</div>
<div class="text">
Sportpark De Polenkamp
<br /> Bredestraat 3
<br /> 6631BC HORSSEN
<br /> 0487-541339
</div>
<div class="subhead turquoise2">
Opgericht:
</div>
<div class="text">
01-07-2011
</div>
<div class="subhead turquoise2">
Tenue:
</div>
<div class="text">
Shirt: Wit
<br /> Broek: Zwart
<br /> Kousen: Zwart
</div>
<div class="subhead turquoise2">
Regio:
</div>
<div class="text">
Veldregio: Regio 4 veld
<br /> Zaalregio:
</div>
<div class="subhead turquoise2">
Info:
</div>
<div class="text">
Relatienummer: NXTG36Z
<br /> Email:
janberg37#Caiway.nl
<br /> Website:
http://www.aac-olympia.nl
<br /> District: Oost
</div>
<div class="subhead turquoise2">
Klasse(s):
</div>
<div class="text">
Klasse za:
<br /> Klasse zon: 5e klasse
<br /> Klasse zaal:
<br /> Junioren: Nee
<br /> Pupillen: Nee
<br /> Vrouwen: Nee
<br /> G-Voetbal: Nee
</div>
<div class="text">
Overzicht indeling district Oost
</div>
</div>
<div class="details details-functionaris">
<div class="title turquoise2">
AAC-Olympia
</div>
<div class="voorzitter">
</div>
<div class="secretaris">
</div>
<div class="penningmeester">
</div>
<div class="functionarissen">
</div>
</div>
</body>
From the second div with the class name text, I want to get each piece of information separately. I tried the following code, but it gives me empty strings:
Element Adres = finalDocument.getElementsByClass("text").get(1);
String AllTextValue = Adres.text(); // This gives me all the information from the div
But I want all four text values separately:
String firstText; // For this one I have no idea what I need to do
String SecondText = Adres.getElementsByTag("br").get(0).text(); // Returns an empty value
String ThirdText = Adres.getElementsByTag("br").get(1).text(); // Returns an empty value
String FourthText = Adres.getElementsByTag("br").get(2).text(); // Returns an empty value
Can somebody help me?
Thanks a lot.

Elements implements the List interface so just use:
Elements Email = finalDocument.getElementsByTag("a");
String emailAddress = Email.get(0).text();
Naming the Elements object Email is slightly misleading. I would recommend the following refactored code:
Elements anchors = finalDocument.getElementsByTag("a");
String email = anchors.get(0).text();
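As for separating the lines of the second div: a br element has no text of its own, so calling text() on it returns an empty string. The lines live between the br tags. The following is only a minimal sketch in plain Java, assuming the div's inner HTML is already available as a string (for example from Jsoup's Element.html()). The class name BrSplitter and the method splitOnBr are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BrSplitter {
    // Splits an HTML fragment on <br>, <br/> or <br /> (case-insensitive)
    // and returns the trimmed, non-empty pieces in order.
    static List<String> splitOnBr(String innerHtml) {
        List<String> parts = new ArrayList<>();
        for (String piece : innerHtml.split("(?i)<br\\s*/?>")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Inner HTML of the second div.text from the question.
        String innerHtml = "Sportpark De Polenkamp\n<br /> Bredestraat 3\n"
                + "<br /> 6631BC HORSSEN\n<br /> 0487-541339";
        for (String line : splitOnBr(innerHtml)) {
            System.out.println(line);
        }
    }
}
```

With this approach, the first value is simply element 0 of the returned list, and the remaining three follow in order. It only suits fragments like the one above that contain plain text and br tags, since any other markup would be left in the pieces.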

Related

How to iterate over multiple arrays simultaneously inside the same loop

I am new to Thymeleaf, and I want to display three values from three different arrays at the same index inside the same div.row. I tried several ways, but I could only iterate over one array at a time without errors. Below is my controller side:
public String index(Model model) {
String[] table0 = {"0","1","2","3"};
String[] table1 = {"14","21","25","75"};
String[] table2 = {"7","63","57","87"};
model.addAttribute("table0", table0);
model.addAttribute("table1", table1);
model.addAttribute("table2", table2);
return "index";
}
Inside the HTML file, table0 is the first array and iterates without errors. I don't know how to change the following code to display all three arrays (table0, table1 and table2) at the same time:
<div class="row" th:each="v0 : ${table0}" >
<div class="cell" th:text="${v0}">
<!-- Here I could display a value from table0 -->
</div>
<div class="cell" >
<!-- Here I need to display the value of table1 having the same index as v0 -->
</div>
<div class="cell" >
<!-- Here I need to display the value of table2 having the same index as v0 -->
</div>
</div>
What you are looking for is Thymeleaf's iteration status. Add a second variable after the iteration variable in th:each and use its index property to get the current index.
For example:
<div class="row" th:each="v0,iter : ${table0}" >
<div class="cell">
<!-- Displays the current value from table0 -->
<span th:text="${v0}"></span>
</div>
<div class="cell" >
<span th:text="${table1[iter.index]}"></span>
</div>
<div class="cell" >
<span th:text="${table2[iter.index]}"></span>
</div>
</div>
You can use Thymeleaf's iterStat to do this.
Assuming the following input data:
String[] table0 = {"0", "1", "2", "3"};
String[] table1 = {"14", "21", "25", "75"};
String[] table2 = {"7", "63", "57", "87"};
You can use the following Thymeleaf markup:
<div class="row" th:each="val,iterStat : ${table0}" >
<div class="cell" th:text="${val}">
</div>
<div class="cell" th:text="${table1[iterStat.index]}">
</div>
<div class="cell" th:text="${table2[iterStat.index]}">
</div>
</div>
This produces a column of numbers as follows (I don't have any CSS so it's just the raw output):
0
14
7
1
21
63
2
25
57
3
75
87
The resulting HTML looks like this:
<div class="row">
<div class="cell">0</div>
<div class="cell">14</div>
<div class="cell">7</div>
</div>
<div class="row">
<div class="cell">1</div>
<div class="cell">21</div>
<div class="cell">63</div>
</div>
<div class="row">
<div class="cell">2</div>
<div class="cell">25</div>
<div class="cell">57</div>
</div>
<div class="row">
<div class="cell">3</div>
<div class="cell">75</div>
<div class="cell">87</div>
</div>
The iterStat status variable is described in the Thymeleaf documentation. It keeps track of your iterations, and since you want the same index for each table, it is a good fit for your needs.
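For comparison, the same index-based pairing can be written in plain Java. This is only a sketch of the idea behind table1[iterStat.index], not something the template requires. It assumes all three arrays have the same length, and the class name ParallelArrays and the method rows are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class ParallelArrays {
    // Builds one row per index of table0, taking the element at the same
    // index from table1 and table2 (assumes equal array lengths).
    static List<String[]> rows(String[] table0, String[] table1, String[] table2) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < table0.length; i++) {
            rows.add(new String[] {table0[i], table1[i], table2[i]});
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] table0 = {"0", "1", "2", "3"};
        String[] table1 = {"14", "21", "25", "75"};
        String[] table2 = {"7", "63", "57", "87"};
        for (String[] row : rows(table0, table1, table2)) {
            // Prints "0 14 7", "1 21 63", "2 25 57", "3 75 87", one per line.
            System.out.println(String.join(" ", row));
        }
    }
}
```

One possible design choice is to build such rows in the controller and pass a single list to the model, so the template only needs one th:each without any index lookups.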

Selecting Elements from a 'special' listbox with Selenium in Java

I have the following HTML code for a listbox:
<div role="listbox" aria-expanded="false" class="quantumWizMenuPaperselectEl docssharedWizSelectPaperselectRoot freebirdFormviewerViewItemsSelectSelect freebirdThemedSelectDarkerDisabled" jscontroller="YwHGTd" jsaction="click:cOuCgd(LgbsSe); keydown:I481le; keypress:Kr2w4b; mousedown:UX7yZ(LgbsSe),npT2md(preventDefault=true); mouseup:lbsD7e(LgbsSe); mouseleave:JywGue; touchstart:p6p2H(LgbsSe); touchmove:FwuNnf; touchend:yfqBxc(LgbsSe|preventMouseEvents=true|preventDefault=true); touchcancel:JMtRjd(LgbsSe); focus:AHmuwe; blur:O22p3e;b5SvAb:TvD9Pc;" jsshadow="" jsname="W85ice" aria-describedby="i.desc.709120473 i.err.709120473" aria-labelledby="i73">
<div jsname="LgbsSe" role="presentation">
<div class="quantumWizMenuPaperselectOptionList" jsname="d9BH4c" role="presentation">
<div class="quantumWizMenuPaperselectOption freebirdThemedSelectOptionDarkerDisabled exportOption isSelected isPlaceholder" jsname="wQNmvb" jsaction="" data-value="" aria-selected="true" role="option" tabindex="0">
<div class="quantumWizMenuPaperselectRipple exportInk" jsname="ksKsZd"></div>
<content class="quantumWizMenuPaperselectContent exportContent">Auswählen</content>
</div>
<div class="quantumWizMenuPaperselectOptionSeparator" role="presentation"></div>
<div class="quantumWizMenuPaperselectOption freebirdThemedSelectOptionDarkerDisabled exportOption" jsname="wQNmvb" jsaction="" data-value="140 cm" aria-selected="false" role="option" tabindex="-1">
<div class="quantumWizMenuPaperselectRipple exportInk" jsname="ksKsZd"></div>
<content class="quantumWizMenuPaperselectContent exportContent">140 cm</content>
</div>
<div class="quantumWizMenuPaperselectOption freebirdThemedSelectOptionDarkerDisabled exportOption" jsname="wQNmvb" jsaction="" data-value="141 cm" aria-selected="false" role="option" tabindex="-1">
<div class="quantumWizMenuPaperselectRipple exportInk" jsname="ksKsZd"></div>
<content class="quantumWizMenuPaperselectContent exportContent">141 cm</content>
</div>
<div class="quantumWizMenuPaperselectOption freebirdThemedSelectOptionDarkerDisabled exportOption" jsname="wQNmvb" jsaction="" data-value="142 cm" aria-selected="false" role="option" tabindex="-1">
<div class="quantumWizMenuPaperselectRipple exportInk" jsname="ksKsZd"></div>
<content class="quantumWizMenuPaperselectContent exportContent">142 cm</content>
</div>
<div class="quantumWizMenuPaperselectOption freebirdThemedSelectOptionDarkerDisabled exportOption" jsname="wQNmvb" jsaction="" data-value="143 cm" aria-selected="false" role="option" tabindex="-1">
<div class="quantumWizMenuPaperselectRipple exportInk" jsname="ksKsZd"></div>
<content class="quantumWizMenuPaperselectContent exportContent">143 cm</content>
</div>
</div>
<div class="quantumWizMenuPaperselectDropDown exportDropDown" role="presentation"></div>
</div>
<div class="exportSelectPopup quantumWizMenuPaperselectPopup" jsaction="click:dPTK6c(wQNmvb); mousedown:uYU8jb(wQNmvb); mouseup:LVEdXd(wQNmvb); mouseover:nfXz1e(wQNmvb); touchstart:Rh2fre(wQNmvb); touchmove:hvFWtf(wQNmvb); touchend:MkF9r(wQNmvb|preventMouseEvents=true)" role="presentation" jsname="V68bde" style="display:none;"></div>
</div>
I am writing a Java program that has to select an option of this listbox automatically (such as "140 cm" or "141 cm", as you can see in the code). I tried to access the listbox itself with the following code:
WebElement checkBox = driver.findElement(By.cssSelector("div[aria-labelledby*=i73]"));
checkBox.click();
That worked, but now I have to select one of the listbox options. I tried the Select class, which did not work:
Select listbox = new Select(checkBox);
listbox.selectByVisibleText("140 cm");
I also tried clicking the specific div with the "140 cm" text after waiting for it to become clickable, but I get a timeout exception because the element never becomes clickable:
WebElement boxElement = driver.findElement(By.cssSelector("div[data-value*='140']"));
WebDriverWait wait = new WebDriverWait(driver, 10);
boxElement = wait.until(ExpectedConditions.elementToBeClickable(By.cssSelector("div[data-value*='140']")));
boxElement.click();
I am desperate and do not know what to do. Can anyone help me? I am thankful for every answer!
Greetings

Why does my loop only work on some of its iterations? (using Jsoup to extract data)

The items in my itemList are incomplete! For some reason, from the 10th iteration of my loop to the last,
el.select(".item").select(".img").select(".pic").select(".picRind").select(".picCore").attr("src")
returns an empty string, and I can't understand why.
Iterations 0 through 9 are perfectly fine, though. I went through the HTML, and my code should work for every li I'm iterating through.
private Document getHtmlDocument() throws IOException {
document = Jsoup.connect(url).get();
return document;
}
public List<AliExpressItem> getAliExpressItemList() throws IOException {
Document document;
Element ul;
Elements ulLi;
document = getHtmlDocument();
ul = document.getElementById("hs-below-list-items");
ulLi = ul.getElementsByClass("list-item");
List<AliExpressItem> itemList = new ArrayList<>();
for(Element el : ulLi) {
AliExpressItem item = new AliExpressItem();
item.setImage(el.select(".item")
.select(".img")
.select(".pic")
.select(".picRind")
.select(".picCore")
.attr("src"));
item.setDescription(el.select(".item")
.select(".info")
.select("h3")
.select("a")
.text());
item.setPrice(el.select(".item")
.select(".info")
.select(".price")
.select(".value")
.text());
itemList.add(item);
}
return itemList;
}
There is a ul with 48 li elements inside. The code above should work for all 48 of them.
<li qrdata="|32805326364|cn1511315262" pub-catid="200247142" sessionid="201711160635492248862329348280002056372" class="list-item list-item-first ">
<div class="item">
<div class="img img-border">
<div class="pic">
<a class="picRind history-item j-p4plog" href="//www.aliexpress.com/item/Hot-Sale-Novelty-Toys-Hand-Spinner-Anti-stress-toys/32805326364.html?spm=2114.search0204.3.1.Lwk2KD&s=p&ws_ab_test=searchweb0_0,searchweb201602_5_10152_10065_10151_10344_10068_10130_10345_10324_10342_10547_10325_10343_10546_10340_10341_10548_10545_10541_10562_10084_10083_10307_5680011_10178_10060_10155_10154_10056_10055_10539_10312_10059_10313_10314_10534_10533_100031_10103_10073_10102_10594_10557_10558_10596_10142_10107,searchweb201603_14,ppcSwitch_5_ppcChannel&btsid=6350c066-2194-4756-b1f7-ed7e1b0028e1&rmStoreLevelAB=0" target="_blank" data-spm-anchor-id="2114.search0204.3.1"><img class="picCore pic-Core-v" src="//ae01.alicdn.com/kf/HTB1RUjgQFXXXXayXXXXq6xXFXXX4/Hot-Sale-Novelty-Toys-Hand-font-b-Spinner-b-font-Anti-stress-toys-fidget-font-b.jpg_220x220.jpg" alt="Hot Sale Novelty Toys Hand Spinner Anti stress toys fidget spinners For Autism and ADHD reliever stress spinner(China)"></a>
</div>
</div>
<div class="info">
<h3>
<a class="history-item product j-p4plog" href="//www.aliexpress.com/item/Hot-Sale-Novelty-Toys-Hand-Spinner-Anti-stress-toys/32805326364.html?spm=2114.search0204.3.2.Lwk2KD&s=p&ws_ab_test=searchweb0_0,searchweb201602_5_10152_10065_10151_10344_10068_10130_10345_10324_10342_10547_10325_10343_10546_10340_10341_10548_10545_10541_10562_10084_10083_10307_5680011_10178_10060_10155_10154_10056_10055_10539_10312_10059_10313_10314_10534_10533_100031_10103_10073_10102_10594_10557_10558_10596_10142_10107,searchweb201603_14,ppcSwitch_5_ppcChannel&btsid=6350c066-2194-4756-b1f7-ed7e1b0028e1&rmStoreLevelAB=0" title="Hot Sale Novelty Toys Hand Spinner Anti stress toys fidget spinners For Autism and ADHD reliever stress spinner" target="_blank" data-spm-anchor-id="2114.search0204.3.2">Hot Sale Novelty Toys Hand <font><b>Spinner</b></font> Anti stress toys fidget <font><b>spinners</b></font> For Autism and ADHD reliever stress <font><b>spinner</b></font></a>
</h3>
<span class="price price-m">
<span class="value" itemprop="price">US $1.99</span>
<span class="separator">/</span>
<span class="unit">unidad</span>
</span>
<strong class="free-s">Envío gratis</strong>
<div class="rate-history">
<span rel="nofollow" class="order-num">
<a class="order-num-a j-p4plog" href="//www.aliexpress.com/item/Hot-Sale-Novelty-Toys-Hand-Spinner-Anti-stress-toys/32805326364.html?spm=2114.search0204.3.3.Lwk2KD&s=p&ws_ab_test=searchweb0_0,searchweb201602_5_10152_10065_10151_10344_10068_10130_10345_10324_10342_10547_10325_10343_10546_10340_10341_10548_10545_10541_10562_10084_10083_10307_5680011_10178_10060_10155_10154_10056_10055_10539_10312_10059_10313_10314_10534_10533_100031_10103_10073_10102_10594_10557_10558_10596_10142_10107,searchweb201603_14,ppcSwitch_5_ppcChannel&btsid=6350c066-2194-4756-b1f7-ed7e1b0028e1&rmStoreLevelAB=0#thf" rel="nofollow" target="_blank" data-spm-anchor-id="2114.search0204.3.3"><em title="Pedido totales"> Ventas (0)</em></a>
</span>
</div>
</div>
<div class="info-more">
<div class="aplus-sp-main">
<div class="sp-box">
</div>
</div>
<div class="store-name-chat">
<div class="store-name util-clearfix">
Alisa's cabin
</div>
</div>
<a class="score-dot" href="//www.aliexpress.com/store/feedback-score/1308215.html?spm=2114.search0204.3.5.Lwk2KD" rel="nofollow" data-spm-anchor-id="2114.search0204.3.5"><span class="score-icon-new score-level-22" id="score1" feedbackscore="1,276" sellerpositivefeedbackpercentage="93.7"></span></a>
<div class="add-to-wishlist">
<a class="atwl-button j-p4plog" href="javascript:;" data-product-id="32805326364" data-batman-id="ja2kvte8" data-spm-anchor-id="2114.search0204.3.6">Añadir a Lista Deseos</a>
</div>
<input class="atc-product-id" type="hidden" value="32805326364">
<input class="atc-product-standard" type="hidden" value="">
</div>
</div>

Finding previous sibling element using selenium xpath dynamically

I am writing Selenium scripts for the following markup:
<div id="abc" class="ui-selectmanycheckbox ui-widget hpsapf-chechkbox">
<div class="ui-chkbox ui-widget">
<div class="ui-helper-hidden-accessible">
<input id="abc:0" name="abc" type="checkbox" value="0" checked="checked">
</div>
<div class="ui-chkbox-box ui-widget ui-corner-all ui-state-default ui-state-active">
<span class="ui-chkbox-icon ui-icon ui-icon-check ui-c"></span>
</div>
</div>
<span class="hpsapf-radio-label">
<label for="abc:0">Herr</label>
</span>
<div class="ui-chkbox ui-widget">
<div class="ui-helper-hidden-accessible">
<input id="abc:1" name="abc" type="checkbox" value="1">
</div>
<div class="ui-chkbox-box ui-widget ui-corner-all ui-state-default">
<span class="ui-chkbox-icon ui-icon ui-icon-blank ui-c"></span>
</div>
</div>
<span class="hpsapf-radio-label">
<label for="abc:1">Frau</label>
</span>
</div>
These are checkboxes. The number of checkboxes changes according to the database values.
In my code, I first check whether the "Frau" checkbox is selected, so I tried the following snippet:
WebElement mainElement= driver.findElement(By.id("abc"));
WebElement label=mainElement.findElement(By.xpath(".//label[contains(@for,'abc')][text() = 'Frau']"));
WebElement parent = label.findElement(By.xpath(".."));
WebElement div = parent.findElement(By.xpath("preceding-sibling::::div"));
WebElement checkBox = div.findElement(By.className("ui-chkbox-box"));
String css = checkBox.getAttribute("class");
if(css.contains("ui-state-active")) {
return "checked";
}
else
{
return "unchecked";
}
But when I execute this script, WebElement div = parent.findElement(By.xpath("preceding-sibling::::div")); gives me the first div tag and not the preceding one. I want the preceding sibling.
Use :: and an index, not ::::. preceding-sibling is a reverse axis, so [1] selects the nearest preceding sibling:
WebElement div = parent.findElement(By.xpath("preceding-sibling::div[1]"));
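How the index behaves on the preceding-sibling axis can be checked without a browser. The sketch below uses the JDK's built-in javax.xml.xpath API on a simplified, hypothetical fragment (the ids box0, box1 and the class name PrecedingSiblingDemo are made up for illustration). It is not Selenium code, but the XPath position semantics are the same.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class PrecedingSiblingDemo {
    // Simplified stand-in for the page: each checkbox div is followed by its label span.
    static final String XML =
            "<div id='abc'>"
            + "<div id='box0'/><span>Herr</span>"
            + "<div id='box1'/><span>Frau</span>"
            + "</div>";

    // Finds the span with the given text, evaluates relativeXPath from it,
    // and returns the id of the element that the expression selects.
    static String siblingId(String labelText, String relativeXPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Element span = (Element) xpath.evaluate(
                    "//span[text()='" + labelText + "']", doc, XPathConstants.NODE);
            Element div = (Element) xpath.evaluate(relativeXPath, span, XPathConstants.NODE);
            return div.getAttribute("id");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // [1] counts backwards from the context node: nearest preceding div.
        System.out.println(siblingId("Frau", "preceding-sibling::div[1]"));      // box1
        // [last()] is the farthest one, i.e. the first div in document order.
        System.out.println(siblingId("Frau", "preceding-sibling::div[last()]")); // box0
    }
}
```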

How can I extract information from HTML depending on the structure

I want to extract some data from many Xbox store links. The problem I am experiencing is that the structure of the section where the price is shown differs depending on, for example, whether the game is discounted.
The code I have written to scrape the price:
String urlPage = "https://www.microsoft.com/en-us/store/p/call-of-duty-advanced-warfare-gold-edition/c20hl06x0v8w" ;
System.out.println("Checking entries of: "+urlPage);
if (getStatusConnectionCode(urlPage) == 200) {
Document document = getHtmlDocument(urlPage);
Elements entradas = document.select("div.m-product-detail-hero-product-placement div.price-info");
for (Element elem : entradas) {
String titulo = elem.getElementsByClass("srv_saleprice").text();
}
}else{
System.out.println("The status code is not OK, it is: "+getStatusConnectionCode(urlPage));
}
The HTML for a game that has no discount:
URL for first case
<div class="price-info">
<div class="c-price">
<div class="price-text srv_price">
<div class="ea-vault-message hidden x-hidden">
<div>
Available in The Vault
</div>
<div>
or
</div>
</div>
<span>$59.99</span>
<sup>+</sup>
</div>
<div class="srv_microdata" itemprop="offers" itemscope itemtype="http://schema.org/Offer">
<meta itemprop="price" content="59.99">
<meta itemprop="priceCurrency" content="USD">
</div>
</div>
</div>
And for a game with discount:
URL for the second case
<div class="price-info">
<div class="c-price">
<div class="price-text srv_price">
<div class="ea-vault-message hidden x-hidden">
<div>
Available in The Vault
</div>
<div>
or
</div>
</div>
<s class="srv_saleprice" aria-label="Full price was $159.99">$159.99</s>
<span> </span>
<div class="price-disclaimer">
<span>$135.99</span>
<sup>+</sup>
</div>
<span> </span>
<span></span>
</div>
<div class="caption text-muted srv_countdown">
<span class="sub">save $24.00</span>
</div>
<div class="srv_microdata" itemprop="offers" itemscope itemtype="http://schema.org/Offer">
<meta itemprop="price" content="135.99">
<meta itemprop="priceCurrency" content="USD">
</div>
</div>
</div>
In this second example, the value inside the elements is $135.99, which is not the game's base price ($159.99 in this case).
How can I extract only the base price for every game, with or without a discount?
