I'm trying to select the value of the next node (the number 4) after the span tag in the HTML below. How can I do that?
<tr valign="top">
<td></td>
<td> 1 </td>
<td> 2 </td>
<td><span> 3 </span></td>
<td> 4 </td>
<td> 5 </td>
<td> 6 </td>
</tr>
final String html = "<tr valign=\"top\">\n"
+ " <td></td>\n"
+ " <td> 1 </td>\n"
+ " <td> 2 </td>\n"
+ " <td><span> 3 </span></td>\n"
+ " <td> 4 </td>\n"
+ " <td> 5 </td>\n"
+ " <td> 6 </td>\n"
+ "</tr>";
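// A bare <tr> fragment loses its <td> tags in the default HTML parser, so parse it as XML (org.jsoup.parser.Parser).
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
// The span sits inside a <td>, so step up to that cell before asking for the next sibling.
Element nextToSpan = doc.select("span").first().parent().nextElementSibling();
Explained:
doc.select("span") // Select the span tags of doc
.first() // Retrieve the first one
.parent() // Move up to the <td> that contains it
.nextElementSibling(); // Get the element next to that <td>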
Documentation: http://jsoup.org/cookbook/extracting-data/selector-syntax
I've got a table and need to find a specific row number. In this case, I'm interested in the 2nd row (2nd tr).
Here's the HTML:
<table>
<thead>
<tbody>
<tr class="classes mico_models_classes_7 listRowWhite">
<tr class="classes mico_models_classes_8 listRowDark">
<td style="width:130px;">9:00am - 10:30am</td>
<td class="noprint enrolledFull" style="width:70px;height:40px;text-align:center;"> </td>
<td style="width:200px;">
<a class="eventName" data-eventdescription="Very long spin class" data-eventname="Spinning 90 min" href="javascript://">Spinning 90 min</a>
<br>
with
<a class="classesEmployeeName" title="John Doe" href="javascript://" data-employeeid="5117">John Doe</a>
</td>
<td style="text-align:right;width:72px;">0 of 20</td>
<td>Spin Class</td>
</tr>
<tr class="classes mico_models_classes_9 listRowWhite">
<tr class="classes mico_models_classes_10 listRowDark">
</tbody>
</table>
Unfortunately, the following returns null for dataRowIndex:
String classTitle = "Spinning 90 min";
String dataRowIndex = driver.findElement(By.cssSelector("[data-eventname='" + classTitle + "']")).getAttribute("rowIndex");
You can solve this by iterating over the tr elements while counting them. Each element is also available inside the loop, so you can act on it as you go.
List<WebElement> element = driver.findElements(By.cssSelector("tr"));
int row = 0;
for( WebElement w : element){
String elemText = w.getText();
System.out.println(elemText);
String clickText = "Spinning 90 min";
if(elemText.contains(clickText)){
w.click(); //do something with the element
System.out.println("Text in row " + row + " is " + clickText + " so i clicked it!");
}
System.out.println("this was row " + row + "\n");
row++;
}
Will yield:
this was row 0
9:00am - 10:30am Spinning 90 min
with John Doe 0 of 20 Spin Class
Text in row 1 is Spinning 90 min so i clicked it!
this was row 1
this was row 2
this was row 3
You might want to encapsulate your specific logic in a method later on. Hope this helps ^^-d
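One way to encapsulate that loop is a small helper that returns the zero-based index of the first matching row. This is only a sketch: the helper name indexOfFirst is hypothetical, and plain strings stand in for each row's getText() so the example runs without a browser. With Selenium you would presumably pass driver.findElements(By.cssSelector("tr")) and a predicate such as w -> w.getText().contains(title) instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class RowIndexSketch {
    // Returns the zero-based index of the first item matching the predicate, or -1 if none does.
    static <T> int indexOfFirst(List<T> items, Predicate<? super T> matches) {
        for (int i = 0; i < items.size(); i++) {
            if (matches.test(items.get(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-ins for the text of the four rows in the table above (assumption: collapsed rows are empty).
        List<String> rowTexts = Arrays.asList(
                "",
                "9:00am - 10:30am Spinning 90 min with John Doe 0 of 20 Spin Class",
                "",
                "");
        int row = indexOfFirst(rowTexts, (String text) -> text.contains("Spinning 90 min"));
        System.out.println("Found in row " + row); // prints "Found in row 1"
    }
}
```

Returning -1 for "not found" mirrors List.indexOf, which keeps the caller's check simple.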
I am trying to process a large amount of data for a research project. I have one HTML file loaded through Jsoup, but the problem is that the table I need to evaluate does not have an id or class. I have searched Stack Overflow, but I can't find an answer on how to reach each <tr> and get the information out of its <td>s.
<table>
<tr>
<td align="center">inf1</td>
<td align="center">date</td>
<td align="center">time</td>
<td align="center">group</td>
<td align="center">name</td>
<td align="center">---</td>
<td align="center">room</td>
<td align="center">---</td>
<td align="center">---</td>
<td> </td>
<td align="center">reason</td>
<td align="center"> </td>
</tr>
</table>
(The empty <td>'s and the "---" are just for displaying purposes in this table and don't have any value for my project)
I need to sort the <tr> rows (all structured in the same way) by group and inf1, keeping the other data linked to them, so that I can use the data in an Android Studio project where it will be displayed differently.
Thank you in advance for your help :)
You can use Jsoup CSS selectors and a custom class that implements Comparable to keep the records. Something like this:
String html = ""
+"<table>"
+" <tr>"
+" <td align=\"center\">inf1</td>"
+" <td align=\"center\">date</td>"
+" <td align=\"center\">time</td>"
+" <td align=\"center\">group1</td>"
+" </tr> "
+"</table>"
+"<table>"
+" <tr>"
+" <td align=\"center\">inf1</td>"
+" <td align=\"center\">date</td>"
+" <td align=\"center\">time</td>"
+" <td align=\"center\">group0</td>"
+" </tr> "
+"</table>"
+"<table>"
+" <tr>"
+" <td align=\"center\">inf2</td>"
+" <td align=\"center\">date</td>"
+" <td align=\"center\">time</td>"
+" <td align=\"center\">group0</td>"
+" </tr> "
+"</table>"
;
Document doc = Jsoup.parse(html);
class TableRecord implements Comparable<TableRecord>{
public String inf = "";
public String grp = "";
@Override
public int compareTo(TableRecord arg0) {
int cmpGrp = arg0.grp.compareTo(this.grp);
if (cmpGrp==0){
return arg0.inf.compareTo(this.inf);
}
return cmpGrp;
}
@Override
public String toString(){
return "grp="+grp+":inf="+inf;
}
}
List<TableRecord> tableRecords = new ArrayList<>();
Elements trs = doc.select("table tr");
for (Element tr : trs){
Elements tds = tr.select("td");
TableRecord tableRecord = new TableRecord();
tableRecord.inf = tds.get(0).text();
tableRecord.grp = tds.get(3).text();
tableRecords.add(tableRecord);
}
Collections.sort(tableRecords);
for (TableRecord tableRecord:tableRecords){
System.out.println(tableRecord);
}
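Note that the compareTo above compares arg0 against this, so the list comes out in descending order. If ascending order by group and then by inf1 is what you want, a Comparator chain is one alternative. A minimal sketch, in which the Row class and the sortByGroupThenInf name are hypothetical stand-ins for TableRecord and the sorting step:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class RowSortSketch {
    // Hypothetical stand-in for TableRecord: the two cells of a <tr> that drive the sort.
    static final class Row {
        private final String inf;
        private final String grp;

        Row(String inf, String grp) {
            this.inf = inf;
            this.grp = grp;
        }

        String getInf() { return inf; }
        String getGrp() { return grp; }
    }

    // Returns a new list sorted ascending by group, then ascending by inf.
    static List<Row> sortByGroupThenInf(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing(Row::getGrp).thenComparing(Row::getInf));
        return sorted;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("inf1", "group1"),
                new Row("inf1", "group0"),
                new Row("inf2", "group0"));
        for (Row row : sortByGroupThenInf(rows)) {
            System.out.println("grp=" + row.getGrp() + ":inf=" + row.getInf());
        }
    }
}
```

Keeping the ordering in a Comparator rather than in compareTo also makes it easy to add a second ordering later without touching the record class.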
Below is the HTML structure of my page.
<tr>
<td class="checkCol">
<td align="center">
<td> 8 </td>
<td> Add </td>
<td>
<td> Route Translation </td>
<td title=""> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> Force Complete </td>
<td>
<td>
<td>
<td>
</tr>
I am using the code below to retrieve the TD element values.
List<WebElement> numOfRows = sppOrder_table.findElements(By.tagName("tr"));
if (numOfRows.size() == 1) {
System.out.println("No Record");
} else {
// Excluding header row
for (int i = 1; i <= numOfRows.size() - 1; i++) {
List<WebElement> numOfColumns = ((WebElement) numOfRows.get(i)).findElements(By.tagName("td"));
for (WebElement td : numOfColumns) {
System.out.println("Column Value === "+td.getText());
}
}
}
My table XPath is correct. The code prints nothing with HtmlUnitDriver but works fine with Firefox. Please suggest a resolution for this issue.
It works with the latest version. Your example is missing the header tr element.
The below prints the expected result:
<table id='myid'>
<tr></tr>
<tr>
<td class="checkCol">
<td align="center">
<td> 8 </td>
<td> Add </td>
<td>
<td> Route Translation </td>
<td title=""> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> Force Complete </td>
<td>
<td>
<td>
<td>
</tr>
</table>
HtmlUnitDriver driver = new HtmlUnitDriver();
driver.get(the_url);
WebElement sppOrder_table = driver.findElement(By.id("myid"));
List<WebElement> numOfRows = sppOrder_table.findElements(By.tagName("tr"));
if (numOfRows.size() == 1) {
System.out.println("No Record");
} else {
// Excluding header row
for (int i = 1; i <= numOfRows.size() - 1; i++) {
List<WebElement> numOfColumns = ((WebElement) numOfRows.get(i)).findElements(By.tagName("td"));
for (WebElement td : numOfColumns) {
System.out.println("Column Value === "+td.getText());
}
}
}
I'm parsing this page segment:
<tr valign="middle">
<td class="inner"><span style=""><span class="" title=""></span> 2 <span class="icon ok" title="Verified"></span> </span><span class="icon cat_tv" title="Video » TV" style="bottom:-2;"></span> VALUE </td>
<td width="1%" align="center" nowrap="nowrap" class="small inner" >VALUE</td>
<td width="1%" align="right" nowrap="nowrap" class="small inner" >VALUE</td>
<td width="1%" align="center" nowrap="nowrap" class="small inner" >VALUE</td>
</tr>
I have this segment in variable tv: HtmlElement tv = tr.get(i);
I read tag VALUE in this way:
HtmlElement a = tv.getElementsByTagName("a").get(0);
object.name.value(a.getTextContent());
url = a.getAttribute("href");
object.url_detail.value(myBase + url);
How can I read only the VALUE field of the other <td>...</td> cells?
I would suggest using XPath, which is a common way of querying XML/HTML documents.
Reference: How to read XML using XPath in Java
Also take a look at this question: RegEx match open tags except XHTML self-contained tags
Update
If I understood correctly, you need the "VALUE" from each td, right?
If so, your XPath would be something like this:
//td[@class="small inner"]/text()
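The XPath suggestion can be tried with the JDK's built-in javax.xml.xpath API. The sketch below assumes the fragment is well-formed XML, which real-world HTML often is not, and uses a simplified copy of the row. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathCellSketch {
    // Returns the trimmed text of every <td class="small inner"> in a well-formed fragment.
    static List<String> smallInnerValues(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("//td[@class=\"small inner\"]/text()", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue().trim());
            }
            return values;
        } catch (Exception e) {
            // Parsing fails on markup that is not well-formed XML.
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Simplified version of the row from the question (assumption: nested spans omitted).
        String xml = "<tr valign=\"middle\">"
                + "<td class=\"inner\">VALUE</td>"
                + "<td width=\"1%\" class=\"small inner\">VALUE</td>"
                + "<td width=\"1%\" class=\"small inner\">VALUE</td>"
                + "<td width=\"1%\" class=\"small inner\">VALUE</td>"
                + "</tr>";
        for (String value : smallInnerValues(xml)) {
            System.out.println(value);
        }
    }
}
```

The predicate matches the class attribute exactly, so the first cell with class="inner" is skipped and only the three "small inner" cells are returned.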
You could also try jsoup, a Java library for parsing HTML.
UPDATE: using the package, you can solve the problem like this:
String html = "<tr valign=\"middle\">"
+ " <td class=\"inner\">"
+ " <span style=\"\"><span class=\"\" title=\"\"></span> 2 <span class=\"icon ok\" title=\"Verified\"></span> </span><span class=\"icon cat_tv\" title=\"Video » TV\" style=\"bottom:-2;\"></span>"
+ " VALUE "
+ " </td>"
+ " <td width=\"1%\" align=\"center\" nowrap=\"nowrap\" class=\"small inner\" >VALUE</td>"
+ " <td width=\"1%\" align=\"right\" nowrap=\"nowrap\" class=\"small inner\" >VALUE</td>"
+ " <td width=\"1%\" align=\"center\" nowrap=\"nowrap\" class=\"small inner\" >VALUE</td>"
+ "</tr>";
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
Elements labelPLine = doc.select("a[href]");
System.out.println("value 1:" + labelPLine.text());
Elements labelPLine2 = doc.select("td[width=1%]");
Iterator<Element> it = labelPLine2.iterator();
int n = 2;
while (it.hasNext()) {
System.out.println("value " + (n++) + ":" + it.next().text());
}
The result would be:
value 1:VALUE
value 2:VALUE
value 3:VALUE
value 4:VALUE
I have an HTML page that I want to extract information from:
<table class="students">
<tbody>
<tr class="rz" style="color:red;" onclick="location.href='//andy.pvt.com';">
<td>
<a title="Display Andy website,Andy" href="//andy.pvt.com">15</a>
</td>
<td>Andy jr</td>
<td align="right">44.31</td>
<td align="right">23.79</td>
<td align="right">57</td>
<td align="right">1,164,700</td>
<td align="right">0.12</td>
<td align="center">
<td align="left">0.99</td>
<td align="right">
</tr>
I want to get Andy, 15, and andy.pvt.com.
I am able to extract this table using doc.select(table).get
However, I am not able to extract the information I am looking for.
What should go in tables.select("xxxx")?
Can you please help me with the xxxx I am missing?
You state:
I tried tables = doc.select("table").get(0); then tables.select("a title").
You want something more along the lines of
tables.select("a[href]").attr("href"); // to get your String
and
tables.select("a[href]").text(); // to get your number
e.g.,
Elements tables = doc.select("table");
String hrefAttr = tables.select("a[href]").attr("href");
System.out.println("href attribute: " + hrefAttr);
String number = tables.select("a[href]").text();
System.out.println("number: " + number);