If I have code like this:
Elements e = d.select("div[id=result_52]");
System.out.println("elemeeeeeee" + e);
the output of e is as follows:
elemeeeeeee<div id="result_52" class="rsltGrid prod celwidget" name="B00BF9MZ44">
<div class="linePlaceholder"></div>
<div class="image imageContainer">
<a href="http://www.abcdefg.com/VIZIO-E241i-A1-24-Inch-1080p-tilted/dp/B00BF9MZ44/ref=lp_6459736011_1_53/190-4904523-2326018?s=tv&ie=UTF8&qid=1405599829&sr=1-53">
<div class="imageBox">
<img src="http://ecx.images-abcdefg.com/images/I/51PhLnnk7NL._AA160_.jpg" class="productImage cfMarker" alt="Product Details" />
</div>
I want both URLs that appear inside it: the link's href and the image's src.
You can use # to select the element by its id:
Element e = d.select("div#result_52").get(0);
String firstURL = e.select("a").attr("href"); //select the `a` tag and the `href` attribute.
String secondURL = e.select("img").attr("src");
I'm trying to skip one item when parsing with Jsoup, but the CSS :not selector isn't working and I don't understand what is wrong.
My code:
MangaList list = new MangaList();
Document document = getPage("https://3asq.org/");
MangaInfo manga;
for (Element o : document.select("div.page-item-detail:not(.item-thumb#manga-item-5520)")) {
manga = new MangaInfo();
manga.name = o.select("h3").first().select("a").last().text();
manga.path = o.select("a").first().attr("href");
try {
manga.preview = o.select("img").first().attr("src");
} catch (Exception e) {
manga.preview = "";
}
list.add(manga);
}
return list;
HTML code:
<div class="col-12 col-md-6 badge-pos-1">
<div class="page-item-detail manga">
<div id="manga-item-5520" class="item-thumb hover-details c-image-hover" data-post-id="5520">
<a href="https://3asq.org/manga/gosu/" title="Gosu">
<img width="110" height="150" src="https://3asq.org/wp-content/uploads/2020/03/IMG_4497-110x150.jpg" srcset="https://3asq.org/wp-content/uploads/2020/03/IMG_4497-110x150.jpg 110w, https://3asq.org/wp-content/uploads/2020/03/IMG_4497-175x238.jpg 175w" sizes="(max-width: 110px) 100vw, 110px" class="img-responsive" style="" alt="IMG_4497"/> </a>
</div>
<div class="item-summary">
<div class="post-title font-title">
<h3 class="h5">
<span class="manga-title-badges custom noal-manga">Noal-Manga</span> Gosu
</h3>
If I debug your code and print the HTML of the first match:
System.out.println(document.select("div.page-item-detail").get(0))
(hint: use the expression evaluator in IntelliJ IDEA, Alt+F8, to try selectors in real time while debugging)
I get:
<div class="page-item-detail manga">
<div id="manga-item-2003" class="item-thumb hover-details c-image-hover" data-post-id="2003">
<a href="http...
...
</div>
</div>
</div>
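The :not(...) in your selector is tested against div.page-item-detail itself, but the id manga-item-5520 sits on the child div with class item-thumb, so nothing is ever excluded. It looks like you want the inner div whose class contains item-thumb, but only if its id isn't manga-item-5520.
So here's what I did to exclude that one item ([attr!=value] is a jsoup extension, not standard CSS):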
document.select("div.page-item-detail div[class*=item-thumb][id!=manga-item-5520]")
Result size: 19
With the element included:
document.select("div.page-item-detail div[class*=item-thumb]")
Result size: 20
If you want to keep selecting the outer div rather than the inner one, you can also try:
document.select("div.page-item-detail:has(div[class*=item-thumb][id!=manga-item-5520])")
I want to retrieve the visitor ID from the element with id "visitor.VisitorId". The code below runs without any error, but the value I get back is null.
HTML Code:-
<ul class="sidebar-menu">
<li id="visitorView" class="treeview active">
<a>
<ul id="visitorViewMenu" class="treeview-menu menu-open" style="display: block;">
<!-- ngRepeat: visitor in Visitors -->
<li class="ng-scope" ng-repeat="visitor in Visitors" style="">
<a id="visitor.VisitorId" class="ng-binding" ng-click="select(visitor)">
<countryflag class="flagimg ng-isolate-scope" visitor="visitor">
<span class="chattabname"/>
A
<span class="timmer1 pull-right" runtimer="{"VisitorID":"c2c45b4d-5077-492f-afd6-88ab3bba99cd","Name":"A","StartTime":"2016-09-09 10:33:21","WidgetId":"7fcf22c6-4a9d-4701-9865-b8a85d597862","ConnectionId":"edc7d72b-8217-4961-81ff-f4ef4138bc3b","TimeZone":"Asia/Colombo","CountryCode":"lk","VisitorName":null,"Department":null,"CompanyId":"a4afbd8b-1de9-49d9-8fe6-4ec8119f4bb8"}">
</a>
</li>
<!-- end ngRepeat: visitor in Visitors -->
<li>
</ul>
</li>
<li class="treeview">
<li class="treeview">
</ul>
Selenium Code:-
**1st method :-**
WebElement cityField = driver.findElement(By.cssSelector("a[ng-click='select(visitor)']"));
**2nd method :-**
WebElement cityField = driver.findElement(By.cssSelector("a[id='visitor.VisitorId']"));
**Output**
System.out.println("+++-- "+cityField.getAttribute("value"));
The <a> element has no value attribute, so getAttribute("value") returns null. Try getText() instead, which returns the visible text of the <a> element:
WebElement cityField = driver.findElement(By.id("visitor.VisitorId"));
System.out.println("+++-- " + cityField.getText());
Or, if you want the span whose runtimer attribute contains the visitor ID, locate that span and read its runtimer attribute:
WebElement cityField = driver.findElement(By.cssSelector("a[id = 'visitor.VisitorId'] span.timmer1"));
String runtimeData = cityField.getAttribute("runtimer");
// Now parse runtimeData to retrieve the visitor ID
The runtimer attribute data looks like JSON, so you can convert it into an org.json.JSONObject (a separate library that must be on the classpath) and retrieve any value by its key:
import org.json.JSONException;
import org.json.JSONObject;
public static Object getValue(String data, String key) throws JSONException {
JSONObject jObject = new JSONObject(data);
return jObject.get(key);
}
String visitorID = (String) getValue(runtimeData, "VisitorID");
System.out.println(visitorID);
Output :-
c2c45b4d-5077-492f-afd6-88ab3bba99cd
As the OP suggested, we can also use split() to break the data into pieces:
String[] splitS = runtimeData.split(",");
for(int i =0; i < splitS.length; i++)
{
System.out.println("splitS" + splitS[i]);
}
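If org.json is not available, the split() idea can be taken one step further with the standard library only. The following is a rough sketch, not a real JSON parser: it assumes the runtimer value is a flat object whose values are plain strings or null, with no escaped quotes inside them. The class name RuntimerFields and the method parse are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuntimerFields {
    // Matches "key":"value" or "key":null in a flat, one-level JSON object.
    // Assumption: values never contain escaped double quotes.
    private static final Pattern FIELD =
            Pattern.compile("\"([^\"]+)\"\\s*:\\s*(?:\"([^\"]*)\"|null)");

    // Returns the fields in document order; a JSON null becomes a Java null.
    static Map<String, String> parse(String data) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(data);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened copy of the attribute value shown in the question.
        String runtimeData = "{\"VisitorID\":\"c2c45b4d-5077-492f-afd6-88ab3bba99cd\","
                + "\"Name\":\"A\",\"StartTime\":\"2016-09-09 10:33:21\",\"VisitorName\":null}";
        Map<String, String> fields = parse(runtimeData);
        System.out.println(fields.get("VisitorID"));
        System.out.println(fields.get("StartTime"));
    }
}
```

Unlike a plain split(","), matching whole key/value pairs keeps values such as the StartTime timestamp intact even though they contain colons. For anything nested or escaped, a proper JSON library is still the safer choice.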
If I understand correctly, the value you are looking for is in the runtimer attribute of an element nested inside the element with id="visitor.VisitorId", so that is the attribute name to pass to getAttribute():
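// The id contains a dot, so match it with an attribute selector (#visitor.VisitorId would be read as id "visitor" plus class "VisitorId")
WebElement cityField = driver.findElement(By.cssSelector("a[id='visitor.VisitorId'] > .timmer1"));
String attributeData = cityField.getAttribute("runtimer");
String visitorId = attributeData.split(",")[0]; // split() returns an array; take the first piece
System.out.println("+++-- " + visitorId);
Output: +++-- {"VisitorID":"c2c45b4d-5077-492f-afd6-88ab3bba99cd"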
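A note on ids that contain a dot, such as visitor.VisitorId: written as #visitor.VisitorId, a CSS parser reads it as "id visitor with class VisitorId", which does not match the element in the question. Either use an attribute selector like [id='visitor.VisitorId'] or escape the dot with a backslash. Below is a small sketch of the escaping approach. The helper name forId is hypothetical, and it is deliberately simplified: it only backslash-escapes ASCII punctuation and assumes the id does not start with a digit and contains no whitespace or control characters.

```java
class CssIdSelector {
    // Builds "#<id>" with ASCII punctuation escaped, e.g. "." becomes "\.".
    // Simplified: assumes the id does not begin with a digit.
    static String forId(String id) {
        StringBuilder sb = new StringBuilder("#");
        for (char c : id.toCharArray()) {
            boolean asciiPlain = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '_' || c == '-';
            if (c < 128 && !asciiPlain) {
                sb.append('\\'); // escape characters that CSS would otherwise treat as syntax
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(forId("visitor.VisitorId") + " > .timmer1");
    }
}
```

The resulting string can be passed to By.cssSelector(...) in place of a hand-written selector.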
I have an HTML file like this:
<div id ="test"> <u>s</u> </div>
I want to change it to this using Java:
<div id ="test"> <b>Test</b> </div>
Is this possible in jsoup?
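Yes, it is possible:
Element el = doc.select("div#test").first();
// remove the existing child elements (here the <u>)
for (Element elC : el.children()) {
elC.remove();
}
// append a new <b> element and set its text
Element nel = el.appendElement("b");
nel.text("Test");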
Using Jsoup:
Element movie_div = doc.select("div.movie").first();
I have HTML like this:
<div class="movie">
<div>
<div>
<strong>Year:</strong> 2014
</div>
<div>
<strong>Country:</strong> USA
</div>
</div>
</div>
How can I use jsoup to extract the country and the year?
For the example html I want the extracted values to be "2014" and "USA".
Thanks.
Use the text nodes of the inner divs:
Element e = doc.select("div.movie").first().child(0);
List<TextNode> textNodes = e.child(0).textNodes();
String year = textNodes.get(textNodes.size()-1).text().trim();
textNodes = e.child(1).textNodes();
String country = textNodes.get(textNodes.size()-1).text().trim();
Did you try something like:
Element movie_div = doc.select("div.movie strong").first();
Note that movie_div.text() returns only the label ("Year:"). To get the value that follows it, read the parent div's own text:
movie_div.parent().ownText();
My problem is that I need to get a div inside a div inside a div, and there are 4 instances of the classes with the same names but different data.
I can currently get the first nested div, but I need to be able to access the other elements within it as well. For example:
docTide = Jsoup.connect("http://www.mhpa.co.uk/search-tide-times/").timeout(600000).get();
Elements tideTableRows = docTide.select("div.tide_row.odd");
Element firstDiv = tideTableRows.first();
Element secondDiv = tideTableRows.get(1);
System.out.println("This is the first div: " + firstDiv.text());
System.out.println("This is the second div: " + secondDiv.text());
But this is the structure of the webpage, where the odd and even rows each appear twice and I need to access each of them, e.g.:
<div class="tide_row odd">
<div class="time">00:57</div>
<div class="height_m">4.9</div>
<div class="height_f">16,1</div>
<div class="range_m">1.9</div>
<div class="range_f">6,3</div>
</div>
<div class="tide_row even">
<div class="time">07:23</div>
<div class="height_m">2.9</div>
<div class="height_f">9,6</div>
<div class="range_m">2</div>
<div class="range_f">6,7</div>
</div>
<div class="tide_row odd">
<div class="time">13:46</div>
<div class="height_m">5.1</div>
<div class="height_f">16,9</div>
<div class="range_m">2.2</div>
<div class="range_f">7,3</div>
</div>
<div class="tide_row even">
<div class="time">20:23</div>
<div class="height_m">2.8</div>
<div class="height_f">9,2</div>
<div class="range_m">2.3</div>
<div class="range_f">7,7</div>
</div>
So basically it has nested divs inside separate divs with the same class names. How can I construct the correct selector to return the data from each one separately?
This is quite hard to explain!
Edit: This is how I managed to extract the information from the nested classes:
docTide = Jsoup.connect("http://www.mhpa.co.uk/search-tide-times/").timeout(600000).get();
Elements tideTimeOdd = docTide.select("div.tide_row.odd div:eq(0)");
Elements tideTimeEven = docTide.select("div.tide_row.even div:eq(0)");
Elements tideHightOdd = docTide.select("div.tide_row.odd div:eq(2)");
Elements tideHightEven = docTide.select("div.tide_row.even div:eq(2)");
Element firstTideTime = tideTimeOdd.first();
Element secondTideTime = tideTimeEven.first();
Element thirdTideTime = tideTimeOdd.get(1);
Element fourthTideTime = tideTimeEven.get(1);
Element firstTideHight = tideHightOdd.first();
Element secondTideHight = tideHightEven.first();
Element thirdTideHight = tideHightOdd.get(1);
Element fourthTideHight = tideHightEven.get(1);
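Since the odd and even rows alternate on the page, the two result lists can be merged back into page order instead of being unpacked into numbered variables. A minimal sketch using only the standard library follows. The class name TideOrder and the method interleave are made up for illustration, and the lists are filled by hand from the sample HTML; in the real code they would presumably come from the selected elements, for example via Elements.eachText().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TideOrder {
    // Alternates odd[0], even[0], odd[1], even[1], ... and keeps any leftovers in order.
    static <T> List<T> interleave(List<T> odd, List<T> even) {
        List<T> merged = new ArrayList<>(odd.size() + even.size());
        int n = Math.max(odd.size(), even.size());
        for (int i = 0; i < n; i++) {
            if (i < odd.size()) {
                merged.add(odd.get(i));
            }
            if (i < even.size()) {
                merged.add(even.get(i));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Times copied from the sample HTML in the question.
        List<String> oddTimes = Arrays.asList("00:57", "13:46");
        List<String> evenTimes = Arrays.asList("07:23", "20:23");
        System.out.println(interleave(oddTimes, evenTimes));
        // prints [00:57, 07:23, 13:46, 20:23]
    }
}
```

This keeps working if the page later shows more than four rows, which the fixed first/second/third/fourth variables would not.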
It would work fine by just doing:
docTide = Jsoup.connect("http://www.mhpa.co.uk/search-tide-times/").timeout(600000).get();
Elements tideTableRows = docTide.select("div.tide_row.odd");
Elements firstDiv = tideTableRows.select("div.time"); // select() returns Elements, one entry per odd row
Elements secondDiv = tideTableRows.select("div.height_m"); // the page has height_m and height_f, no plain "height" class
You should try to access elements by class name or id rather than by position if you can. It makes your code a lot simpler, and if you have, say, 50 headers in the same container, you don't have to count them all.
Separate elements:
docTide = Jsoup.connect("http://www.mhpa.co.uk/search-tide-times/").timeout(600000).get();
Elements oddRows = docTide.select("div.tide_row.odd");
Element tideTableRows = oddRows.first(); // first odd row
Element firstDiv1 = tideTableRows.select("div.time").first();
Element secondDiv1 = tideTableRows.select("div.height_m").first();
Element tideTableRows2 = oddRows.get(1); // second odd row (Elements has no second() method)
Element firstDiv2 = tideTableRows2.select("div.time").first();
Element secondDiv2 = tideTableRows2.select("div.height_m").first();
You can try this:
docTide = Jsoup.connect("http://www.mhpa.co.uk/search-tide-times/").timeout(600000).get();
Elements tideTableRows = docTide.select("div.tide_row");
for (Element tideTableRow : tideTableRows){
if (tideTableRow.hasClass("odd")){
//do the odd stuff...
}
Elements innerDivs = tideTableRow.children(); // inner divs of the current row only
for (Element innerDiv : innerDivs){
//do whatever you need
}
}
Note: The code is not tested.
Update: I showed you how to access the odd rows only. From there you should be able to work out the rest yourself, I hope.