HtmlUnit: get a div element by class inside a DomElement (Java)

Hello, I am using the HtmlUnit library and I need to get the href attribute of an a tag inside this div:
<div class="threadpostedin td alt">
<p>Forum:<br>
<a href="programming/website-development/"
title="Website Development">Website
Development</a></p>
</div>
This div is located inside an <li>, which is located inside an <ol>.
To get the ol, I did this:
HtmlOrderedList l = (HtmlOrderedList) this.page.getElementById("searchbits");
The HTML of the ol:
<ol class="searchbits" id="searchbits" start="1">
Now, from the div I posted, I need to get the href "programming/website-development/", but I am not sure how to do this. The div does have a class name, but if I do
for (DomElement ele : l.getChildElements()) {
    System.out.println(ele.getByXPath("//div[@class='threadpostedin td alt']").size());
    break;
}
it prints 15, because there are 15 list items in the ol and each of them contains one div with class threadpostedin td alt. What I need is the one div with class threadpostedin td alt inside the DomElement from the current iteration, not the list of all divs with that class.
Is there a way to do this with HtmlUnit?

This assumes the div may contain more than one link:
HtmlElement element = page.getFirstByXPath("//div[@class='threadpostedin td alt']");
DomNodeList<DomNode> nodes = element.querySelectorAll("a");
for (DomNode a : nodes) {
    // skip anchors that have no href attribute
    if (a.getAttributes().getNamedItem("href") != null) {
        // resolve the (possibly relative) href against the URL the page was loaded from
        String href = page.getFullyQualifiedUrl(a.getAttributes().getNamedItem("href").getNodeValue()).toString().toLowerCase();
        String baseUrl = page.getBaseURL().toString();
        System.out.println(href);
    }
}
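The 15 in the question comes from the leading //: in XPath it always starts at the document root, even when getByXPath is called on a child element, whereas .// only searches below the current node. The sketch below illustrates this with the JDK's built-in javax.xml.xpath rather than HtmlUnit, on a hypothetical, well-formed copy of the markup with two list items instead of fifteen (the second href is made up). With HtmlUnit the same relative expression would presumably be passed to ele.getByXPath(...) or ele.getFirstByXPath(...) inside the loop; that part is not exercised here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativeXPathDemo {
    // Hypothetical sample modelled on the question: two <li> items, each holding one matching div.
    static final String HTML = "<ol id='searchbits'>"
        + "<li><div class='threadpostedin td alt'><p>Forum:<br/>"
        + "<a href='programming/website-development/'>Website Development</a></p></div></li>"
        + "<li><div class='threadpostedin td alt'><p>Forum:<br/>"
        + "<a href='programming/java/'>Java</a></p></div></li>"
        + "</ol>";

    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Parses well-formed markup into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of nodes matched by expr when evaluated with context as the context node.
    static int count(String expr, Element context) {
        try {
            return ((NodeList) XPATH.evaluate(expr, context, XPathConstants.NODESET)).getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // href of the link inside the div that belongs to this <li> only.
    static String hrefIn(Element li) {
        try {
            return XPATH.evaluate(".//div[@class='threadpostedin td alt']/p/a/@href", li);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NodeList items = parse(HTML).getElementsByTagName("li");
        for (int i = 0; i < items.getLength(); i++) {
            Element li = (Element) items.item(i);
            // A leading "//" restarts at the document root, so every matching div is found...
            System.out.println("absolute: " + count("//div[@class='threadpostedin td alt']", li));
            // ...while ".//" only searches below the current <li>.
            System.out.println("relative: " + count(".//div[@class='threadpostedin td alt']", li));
            System.out.println("href: " + hrefIn(li));
        }
    }
}
```

For each list item the absolute count stays at the total number of divs, while the relative count is 1 and the href belongs to that item alone.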

Related

Get text without class and from span

Hi, I can't get the text out of this HTML. I want to get the text "This is a test text":
<div class="rehou">
<span class="tlid-t t">
<span title="" class="">This is a test text</span>
</span>
<span class="tlid-t-v" style="" role="button"></span>
</div>
My Java code:
Document doc = Jsoup.connect(url).get();
Elements ele= doc.select("span.tlid-t t");
textass = ele.text();
The span has two different classes, tlid-t and t. In a CSS selector a space is the descendant combinator, so span.tlid-t t looks for a <t> element inside span.tlid-t. To require both classes on the same span, use span.tlid-t.t instead of span.tlid-t t:
Elements ele = doc.select("span.tlid-t.t");
String textass = ele.text();
System.out.println(textass);
This prints "This is a test text".
Note, however, that this selects the outer span, so if the HTML changes (for example, if more text is added to the outer span), the content of textass changes too. If you only want the text of the inner span, use span.tlid-t.t span:
Elements ele = doc.select("span.tlid-t.t span");
String textass = ele.text();
System.out.println(textass);
This will also print This is a test text.
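As a plain-Java illustration of why span.tlid-t.t matches and span.tlid-t t does not: a compound selector such as .tlid-t.t requires both class names on the same element, whatever their order in the class attribute. The helper below, hasAllClasses, is a hypothetical, simplified model of that class check (it splits the class attribute on whitespace and ignores details such as case handling); it is not part of Jsoup's API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ClassSelectorDemo {
    // Simplified model of what ".tlid-t.t" checks: split the class attribute on whitespace
    // and require every wanted class to be one of the resulting tokens.
    static boolean hasAllClasses(String classAttr, String... required) {
        Set<String> tokens = new HashSet<>(Arrays.asList(classAttr.trim().split("\\s+")));
        return tokens.containsAll(Arrays.asList(required));
    }

    public static void main(String[] args) {
        System.out.println(hasAllClasses("tlid-t t", "tlid-t", "t"));   // true
        System.out.println(hasAllClasses("t   tlid-t", "tlid-t", "t")); // true: order and spacing do not matter
        System.out.println(hasAllClasses("tlid-t-v", "tlid-t"));        // false: tlid-t-v is a different class
    }
}
```

Matching whole tokens is also why the sibling span with class tlid-t-v is not picked up by span.tlid-t.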

How do I find the list of child tag names using a reference to the parent element?

Can anyone help me find the list of child tag names using a reference to the parent element?
I have a table in which each row has 14 columns, and each column contains inner tags such as span, span, input. I need to find the list of items under column td[11], for which I have written the code below:
element = driver.findElement(shopviewtableid);
items = element.findElements(shopviewrow);
if (items.size() > 0) {
    for (WebElement ele : items) {
        columnvalues = ele.findElements(shopviewcolumn);
        for (WebElement item : columnvalues) {
            System.out.println("Inside Tag name of each column" + item.toString());
        }
    }
}
In the code above I pass the table id as shopviewtableid, the tag name tr as shopviewrow, and the XPath //td[11] as shopviewcolumn. After fetching td[11] for each row, I then fetch the list of items under td[11].
Under td[11] the items have the tag names span, span, span and input (see the HTML below). How do I get the names of these tags from the list?
I tried item.getTagName() for each item, but it returns td as the tag name, not the names of the elements inside td[11].
It would be great if anyone could help me with this issue.
Here is my HTML structure:
<td role="gridcell" style="width: 11%;" class="jqnoDetails">
<span id="detailsForm:j_id_5q:0:minQuantity" class="hidden">1</span>
<span id="detailsForm:j_id_5q:0:incQuantity" class="hidden">1</span>
<span id="detailsForm:j_id_5q:0:originalQuantity" class="hidden">1.0000</span>
<input id="detailsForm:j_id_5q:0:quantity" name="detailsForm:j_id_5q:0:quantity" type="text" value="1" min="0" inc="1" onblur="PrimeFaces.bcn(this,event,[function(event){handleQuantityChanged(this); updatePrice($(this), 11.23);},function(event){jsf.ajax.request('detailsForm:j_id_5q:0:quantity',event,{execute:'#this ',render:'#this ','CLIENT_BEHAVIOR_RENDERING_MODE':'OBSTRUSIVE','javax.faces.behavior.event':'blur'})}])" style="width: 85%;" aria-required="true" class="ui-inputfield ui-inputtext ui-widget ui-state-default ui-corner-all jqQuantityInput textCenter" role="textbox" aria-disabled="false" aria-readonly="false">
</td>
I am assuming that you already have the mechanism to find the cell by iterating as shown. Now try to get all child elements of that grid cell using an XPath like this:
List<WebElement> childElements = item.findElements(By.xpath(".//child::*"));
// ".//*" is equivalent (all descendants); use "./*" for direct children only
Once you have the list of elements, you can iterate over it with a for loop and get the tag name with getTagName(), or other data with the getAttribute method of WebElement.
The code below prints the tag names of each element inside a cell:
element = driver.findElement(shopviewtableid);
items = element.findElements(shopviewrow);
if (items.size() > 0) {
    for (WebElement ele : items) {
        columnvalues = ele.findElements(shopviewcolumn);
        for (WebElement item : columnvalues) {
            System.out.println("Inside Tag name of each column");
            // ".//*" is relative to the cell; "//*" would return every element on the page
            insideElements = item.findElements(By.xpath(".//*"));
            for (WebElement inside : insideElements) {
                System.out.println("Inside Element tag:" + inside.getTagName());
            }
        }
    }
}
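getTagName() on the cell returns td because it describes the cell itself; the tag names you want belong to its children. Below is a minimal sketch of the same idea using the JDK's org.w3c.dom API instead of Selenium, run on a trimmed, well-formed copy of the cell from the question (attributes shortened, <input> self-closed). In Selenium the corresponding list would come from item.findElements(By.xpath("./*")) for direct children, or ".//*" for all descendants, followed by getTagName() on each element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildTagNames {
    // Trimmed, well-formed copy of the question's cell (attributes shortened, <input> self-closed).
    static final String CELL = "<td class='jqnoDetails'>"
        + "<span class='hidden'>1</span>"
        + "<span class='hidden'>1</span>"
        + "<span class='hidden'>1.0000</span>"
        + "<input type='text' value='1'/>"
        + "</td>";

    // Tag names of the direct element children of parent, in document order;
    // text nodes such as whitespace are skipped.
    static List<String> childTagNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(((Element) child).getTagName());
            }
        }
        return names;
    }

    // Parses the markup and returns its root element (the <td>).
    static Element parseCell(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childTagNames(parseCell(CELL))); // [span, span, span, input]
    }
}
```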

How to fetch data from ul li in Android Studio with Jsoup

I am trying to get Product 1 and Product 2, but I can't get them. Please help.
I am using Jsoup and Volley.
<ul id="searched-products">
<li>
<div class="gd-col navUnitContainer1 gu4">
<div class="product_name">
<a>Prodict 1</a>
</div>
</div>
</li>
<li>
<div class="gd-col navUnitContainer1 gu4">
<div class="product_name">
<a>Prodict 2</a>
</div>
</div>
</li>
</ul>
I have tried this:
Elements itemElements = doc.select("ul#searched-products li");
but it's not selecting the li elements. I have also tried this:
Elements itemElements = doc.select("ul#searched-products"); //this line works
Element e1 = itemElements.get(i);
e1.select("li"); // or e1.getElementsByTag("li");
Still no good.
There are hundreds of li elements on the page, so I can't just do this:
doc.select("li");
Any suggestions?
Like this:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class JsoupList {
    public static void main(String[] brawwwr) {
        String html = "<ul id=\"searched-products\">" +
                "<li>" +
                "<div class=\"gd-col navUnitContainer1 gu4\">" +
                "<div class=\"product_name\">" +
                "<a>Prodict 1</a>" +
                "</div>" +
                "</div>" +
                "</li>" +
                "<li>" +
                "<div class=\"gd-col navUnitContainer1 gu4\">" +
                "<div class=\"product_name\">" +
                "<a>Prodict 2</a>" +
                "</div>" +
                "</div>" +
                "</li>" +
                "</ul>";
        Document doc = Jsoup.parse(html);
        // only the <li> elements inside the list with id searched-products
        Elements itemElements = doc.select("ul#searched-products li");
        for (Element elem : itemElements) {
            // query inside the current <li> only
            System.out.println(elem.select("div div a").text());
        }
    }
}
This prints:
Prodict 1
Prodict 2
You can think of each repeated block of tags as a little page of its own: select the <li> elements first, then query inside each one.
Try this code.
Elements itemElements = doc.select("ul#searched-products");
itemElements = itemElements.select("li");
for(Element ele : itemElements){
String text = ele.text();
System.out.println(text); //this will return Prodict 1 and Prodict 2
}
// or select the first <a> inside each <li>
for(Element ele : itemElements){
String text = ele.select("a").first().text();
System.out.println(text); //this will also return Prodict 1 and Prodict 2
}
To exclude <li> or <a> tags outside the list, you need to restrict the selector to match only inside the list. The best way is to use the ID (#searched-products). Then do not select <li> or <a> tags from the doc, but from the selected <ul> element.
You can get your text with any of the following selectors (not a complete list):
#searched-products li a
#searched-products a
#searched-products .product_name a
#searched-products .product_name
Even the last one is okay, since you need only the text, and div.product_name contains only the <a> tag.
for(Element e: doc.select("#searched-products .product_name")) {
String t = e.text(); // Prodict N
}
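The same scoping idea can be expressed as XPath. This minimal sketch uses the JDK's javax.xml.xpath instead of Jsoup, on a hypothetical page that adds an unrelated <ul id="menu"> with its own <li> outside the product list: the unscoped query also returns the stray Home link, while the query anchored on the id searched-products returns only the products.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ScopedListDemo {
    // Hypothetical page: the product list plus an unrelated <li> elsewhere (a menu).
    static final String PAGE = "<html><body>"
        + "<ul id='menu'><li><a>Home</a></li></ul>"
        + "<ul id='searched-products'>"
        + "<li><div class='gd-col navUnitContainer1 gu4'><div class='product_name'><a>Prodict 1</a></div></div></li>"
        + "<li><div class='gd-col navUnitContainer1 gu4'><div class='product_name'><a>Prodict 2</a></div></div></li>"
        + "</ul></body></html>";

    // Text content of every node matched by expr, in document order.
    static List<String> texts(String html, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(html)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unscoped: also picks up the menu link.
        System.out.println(texts(PAGE, "//li//a"));                              // [Home, Prodict 1, Prodict 2]
        // Scoped to the list's id, comparable to "#searched-products li a".
        System.out.println(texts(PAGE, "//ul[@id='searched-products']/li//a")); // [Prodict 1, Prodict 2]
    }
}
```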
By the way, your original approach of selecting <li> tags inside ul#searched-products should have worked. If it doesn't return anything, the list may be generated dynamically on that page. You can test this easily by printing out the HTML that Jsoup has (doc.html() or doc.select("#searched-products").html()).
If that really is the case, Jsoup is not the right tool for you. I suggest using Selenium, possibly with a headless browser (HtmlUnit or PhantomJS). These can return, and even interact with, dynamically created elements, so other parts of your crawl process may be simplified too.

get href value (WebDriver)

How do I get the value of an href attribute?
For example:
<div id="cont"><div class="bclass1" id="idOne">Test</div>
<div id="testId"><a href="**NEED THIS VALUE AS STRING**">
<img src="img1.png" class="clasOne" />
</a>
</div>
</div>
</div>
I need that value as string.
I've tried with this:
String e = driverCE.findElement(By.xpath("//div[@id='testId']")).getAttribute("href");
JOptionPane.showMessageDialog(null, e);
But it just returns null.
Your XPath points to the div instead of the a element. The div has no href attribute, so getAttribute("href") returns null.
Try the code below:
String e = driverCE.findElement(By.xpath("//div[@id='testId']/a")).getAttribute("href");
If there is more than one anchor tag, the following snippet prints the href of every link on the page:
//find all anchor tags in the page
List<WebElement> refList = driver.findElements(By.tagName("a"));
//iterate over web elements and use the getAttribute method to
//find the hypertext reference's value.
for(WebElement we : refList) {
System.out.println(we.getAttribute("href"));
}
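To see why the first attempt returned null, this sketch evaluates both XPath expressions with the JDK's javax.xml.xpath (not Selenium) against a well-formed copy of the question's markup, with the hypothetical URL https://example.com/target standing in for the placeholder. Selecting @href on the div yields nothing (an empty string in this API, null from Selenium's getAttribute), while /a/@href yields the link.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class HrefDemo {
    // Well-formed copy of the question's markup; the URL is a hypothetical placeholder value.
    static final String HTML = "<div id='cont'><div class='bclass1' id='idOne'>Test</div>"
        + "<div id='testId'><a href='https://example.com/target'>"
        + "<img src='img1.png' class='clasOne'/></a></div></div>";

    // Evaluates an XPath expression against the markup and returns its string value.
    static String eval(String html, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(html)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The div itself has no href attribute, so this prints [].
        System.out.println("[" + eval(HTML, "//div[@id='testId']/@href") + "]");
        // Stepping down to the <a> returns the attribute value.
        System.out.println(eval(HTML, "//div[@id='testId']/a/@href"));
    }
}
```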

Jsoup parse HTML including span tags

I have HTML in the following format:
<article class="cik" id="100">
<a class="ci" href="/abc/1001/STUFF">
<img alt="Micky Mouse" src="/images/1001.jpg" />
<span class="mick vtEnabled"></span>
</a>
<div>
Micky Mouse
<span class="FP">$88.00</span> <span class="SP">$49.90</span>
</div>
</article>
In the HTML above, the a tag inside the article contains a span with class="mick vtEnabled" and no text. I want to check whether a span with these class names is present within the article tag. How do I do that? I tried select("> a[href] > span.mick vtEnabled") and checked the size, but it stays 0 for every article tag, whether or not the span is present. Any ideas?
It helps to start from the individual article tags:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
final String test = "<article class=\"cik\" id=\"100\"><a class=\"ci\" href=\"/abc/1001/STUFF\"><img alt=\"Micky Mouse\" src=\"/images/1001.jpg\" /></a><div>Micky Mouse<span class=\"FP\">$88.00</span> <span class=\"SP\">$49.90</span></div></article>";
final Elements articles = Jsoup.parse(test).select("article");
for (final Element article : articles) {
    // image inside the link that is a direct child of the article
    final Elements articleImages = article.select("> a[href] > img[src]");
    for (final Element image : articleImages) {
        System.out.println(image.attr("src"));
    }
    // the link is a direct child of the article, not of the div
    final Elements articleLinks = article.select("> a[href]");
    for (final Element link : articleLinks) {
        System.out.println(link.attr("href"));
    }
    // the name is the div's own text, without the text of the price spans
    final Elements articleDivs = article.select("> div");
    for (final Element div : articleDivs) {
        System.out.println(div.ownText());
    }
    final Elements articleFPSpans = article.select("> div > span.FP");
    for (final Element span : articleFPSpans) {
        System.out.println(span.text());
    }
    // this must stay inside the article loop, where article is in scope
    final Elements articleSPSpans = article.select("> div > span.SP");
    for (final Element span : articleSPSpans) {
        System.out.println(span.text());
    }
}
This prints:
/images/1001.jpg
/abc/1001/STUFF
Micky Mouse
$88.00
$49.90
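The selector in the question, span.mick vtEnabled, fails because the space is a descendant combinator: it looks for an element named vtEnabled inside span.mick. In Jsoup, joining the classes, as in article.select("> a[href] > span.mick.vtEnabled"), and testing the result with isEmpty() should express the intended check. As a library-independent sketch, the following does the same with the JDK's javax.xml.xpath and the usual XPath 1.0 class-token idiom (concat plus normalize-space), on a hypothetical page with one article that has the span and one that does not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SpanPresenceDemo {
    // Hypothetical page: article 100 has the marker span, article 101 does not.
    static final String PAGE = "<body>"
        + "<article class='cik' id='100'><a class='ci' href='/abc/1001/STUFF'>"
        + "<img alt='Micky Mouse' src='/images/1001.jpg'/><span class='mick vtEnabled'></span></a></article>"
        + "<article class='cik' id='101'><a class='ci' href='/abc/1002/STUFF'>"
        + "<img alt='Other' src='/images/1002.jpg'/></a></article>"
        + "</body>";

    // XPath counterpart of "> a[href] > span.mick.vtEnabled": each class must appear as a whole token.
    static final String MARKER = "a[@href]/span["
        + "contains(concat(' ', normalize-space(@class), ' '), ' mick ') and "
        + "contains(concat(' ', normalize-space(@class), ' '), ' vtEnabled ')]";

    // True if the article has the marker span under a link that is its direct child.
    static boolean hasMarker(Element article) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate("boolean(" + MARKER + ")", article, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // All <article> elements of the parsed page.
    static NodeList articles(String html) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(html)));
            return doc.getElementsByTagName("article");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NodeList list = articles(PAGE);
        for (int i = 0; i < list.getLength(); i++) {
            Element article = (Element) list.item(i);
            // prints "100 -> true", then "101 -> false"
            System.out.println(article.getAttribute("id") + " -> " + hasMarker(article));
        }
    }
}
```

Padding the class attribute with spaces before calling contains prevents partial matches, so a class such as mickey would not count as mick.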
