Parsing an HTML table in Android studio


I am trying to process a large amount of data for a research project. I have one HTML file loaded through Jsoup, but the problem is that the table I need to evaluate does not have an id or class. I have searched Stack Overflow, but I can't find an answer as to how I can reach each <tr> and get the information out of its <td>s.
<table>
<tr>
<td align="center">inf1</td>
<td align="center">date</td>
<td align="center">time</td>
<td align="center">group</td>
<td align="center">name</td>
<td align="center">---</td>
<td align="center">room</td>
<td align="center">---</td>
<td align="center">---</td>
<td> </td>
<td align="center">reason</td>
<td align="center"> </td>
</tr>
</table>
(The empty <td>s and the "---" cells are only there for display purposes and have no value for my project.)
I need to sort the <tr> rows (all structured the same way) by group and inf1, keeping the other data linked to them, so that I can use the data in an Android Studio project where it will be displayed differently.
Thank you in advance for your help :)

You can use Jsoup CSS selectors and a custom class that implements Comparable to keep the records. Something like this:
String html = ""
+"<table>"
+" <tr>"
+" <td align=\"center\">inf1</td>"
+" <td align=\"center\">date</td>"
+" <td align=\"center\">time</td>"
+" <td align=\"center\">group1</td>"
+" </tr> "
+"</table>"
+"<table>"
+" <tr>"
+" <td align=\"center\">inf1</td>"
+" <td align=\"center\">date</td>"
+" <td align=\"center\">time</td>"
+" <td align=\"center\">group0</td>"
+" </tr> "
+"</table>"
+"<table>"
+" <tr>"
+" <td align=\"center\">inf2</td>"
+" <td align=\"center\">date</td>"
+" <td align=\"center\">time</td>"
+" <td align=\"center\">group0</td>"
+" </tr> "
+"</table>"
;
Document doc = Jsoup.parse(html);
class TableRecord implements Comparable<TableRecord>{
public String inf = "";
public String grp = "";
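@Override
public int compareTo(TableRecord arg0) {
// arg0 is compared against this, so the list ends up in descending order;
// swap the operands for ascending order.
int cmpGrp = arg0.grp.compareTo(this.grp);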
if (cmpGrp==0){
return arg0.inf.compareTo(this.inf);
}
return cmpGrp;
}
@Override
public String toString(){
return "grp="+grp+":inf="+inf;
}
}
List<TableRecord> tableRecords = new ArrayList<>();
Elements trs = doc.select("table tr");
for (Element tr : trs){
Elements tds = tr.select("td");
TableRecord tableRecord = new TableRecord();
tableRecord.inf = tds.get(0).text();
tableRecord.grp = tds.get(3).text();
tableRecords.add(tableRecord);
}
Collections.sort(tableRecords);
for (TableRecord tableRecord:tableRecords){
System.out.println(tableRecord);
}
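The ordering logic can also be kept outside the record class with a Comparator. The following is a minimal sketch of that alternative, not the code from the answer: the class names RowSortSketch and Row are hypothetical, the rows are hard-coded instead of coming from Jsoup, and the sort is ascending by group and then by inf (the compareTo above sorts in descending order).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RowSortSketch {

    // Hypothetical holder for the two cells the question sorts on.
    static final class Row {
        final String inf;
        final String grp;

        Row(String inf, String grp) {
            this.inf = inf;
            this.grp = grp;
        }

        @Override
        public String toString() {
            return "grp=" + grp + ":inf=" + inf;
        }
    }

    // Returns a new list sorted ascending by group, then by inf.
    static List<Row> sortByGroupThenInf(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((Row r) -> r.grp).thenComparing(r -> r.inf));
        return sorted;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("inf1", "group1"));
        rows.add(new Row("inf1", "group0"));
        rows.add(new Row("inf2", "group0"));
        for (Row row : sortByGroupThenInf(rows)) {
            System.out.println(row);
        }
    }
}
```

In the Jsoup loop above, each Row would be built from tds.get(0).text() and tds.get(3).text() instead of the hard-coded values.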

Related

How to read empty cell from the web table

How do I read an empty cell from a web table while iterating through each row to get the other data?
The HTML is as follows:
<table>
<tbody>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>6</td>
</tr>
<tr>
<td>5</td>
<td>6</td>
</tr>
<tr>
<td>6</td>
<td>7</td>
<td>8</td>
</tr>
<tr>
<td>7</td>
</tr>
</tbody>
</table>
I want to print the values of the table cells from 1 to 7.
I am able to print all the values that exist; however, I am stuck at reading the empty (missing) cells of the web table, and the code throws a "WebElement not found" exception.
The code I have written:
List<WebElement> cols = driver.findElements(By.xpath("//tbody//tr//td[1]"));
System.out.println("No of columns are : " + cols.size());
WebElement baseTable = driver.findElement(By.tagName("table"));
for(int i=1;i<=cols.size();i++) {
for(int j=1;j<=3;j++) {
WebElement tableRow = driver.findElement(By.xpath("//tbody//tr[" + i + "]//td[" + j + "]"));
String rowtext = tableRow.getText();
System.out.println("text in the row of table : "+rowtext);
}
}
When it comes to
//tbody//tr[3]//td[3]
and
//tbody//tr[5]//td[2]
and
//tbody//tr[5]//td[3]
a WebElementNotFound exception is thrown.
How can I fetch the values of the empty cells?
Any help is appreciated!
Thanks.
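The loop in the question asks for three cells in every row, but rows 3 and 5 have fewer, so the lookup for a cell that does not exist fails. One way around this is to fetch each row first and then iterate over whatever cells that row actually contains. The sketch below shows the idea with the XPath engine that ships with the JDK rather than Selenium, so it runs on its own; the class name RaggedTableSketch is hypothetical. In Selenium the analogous step would be calling findElements(By.tagName("td")) on each row element, which returns an empty list instead of throwing when nothing matches.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RaggedTableSketch {

    // Returns one list of cell texts per <tr>; short rows simply yield shorter lists.
    static List<List<String>> readRows(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xpath.evaluate(
                    "//tbody/tr", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<List<String>> result = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                // Relative expression: only the <td> children of this row.
                NodeList cells = (NodeList) xpath.evaluate(
                        "td", rows.item(i), XPathConstants.NODESET);
                List<String> texts = new ArrayList<>();
                for (int j = 0; j < cells.getLength(); j++) {
                    texts.add(cells.item(j).getTextContent().trim());
                }
                result.add(texts);
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Same table as in the question, written on one line.
        String xml = "<table><tbody>"
                + "<tr><td>1</td><td>2</td><td>3</td></tr>"
                + "<tr><td>4</td><td>5</td><td>6</td></tr>"
                + "<tr><td>5</td><td>6</td></tr>"
                + "<tr><td>6</td><td>7</td><td>8</td></tr>"
                + "<tr><td>7</td></tr>"
                + "</tbody></table>";
        List<List<String>> rows = readRows(xml);
        for (int i = 0; i < rows.size(); i++) {
            System.out.println("row " + (i + 1) + ": " + rows.get(i));
        }
    }
}
```

Rows 3 and 5 come back with two cells and one cell respectively, and no lookup is made for cells that are not there.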

Selenium date picker issue

I need to pick a date from a calendar. To do that I collect all the dates using the code below, but I only want the <td> tags from the calendar, and right now I get the <td> tags from the entire page. Here is the code:
List<WebElement> listofCalendardates=CommonBrowserSetup.driver.findElements(By.tagName("td"));
for(int i=0 ; i < listofCalendardates.size() ; i++) {
System.out.println("the data is :: " + listofCalendardates.get(i).getText());
if (listofCalendardates.get(i).getText().equals(finalDateValue)) {
listofCalendardates.get(i).click();
break;
}
}
Is there a way by which I can get only the dates from the calendar and no other data from the page? Thanks
<table class=" table-condensed">
<thead>
<tbody>
<tr>
<td class="old day">30</td>
<td class="day">1</td>
<td class="day">2</td>
<td class="day">3</td>
<td class="day">4</td>
<td class="day">5</td>
<td class="day">6</td>
</tr>
<tr>

You need to locate the calendar element first and then search for the <td> tags inside it:
WebElement calendarElement = CommonBrowserSetup.driver.findElement(...);
List<WebElement> listofCalendardates = calendarElement.findElements(By.tagName("td"));

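Use the following XPath to get only the <td> tags of the date picker. The class attribute in the markup has a leading space, so contains() is used instead of an exact match:
List<WebElement> listofCalendardates=CommonBrowserSetup.driver.findElements(By.xpath(".//table[contains(@class,'table-condensed')]//td"));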
for(int i=0 ; i < listofCalendardates.size() ; i++) {
System.out.println("the data is :: " + listofCalendardates.get(i).getText());
if (listofCalendardates.get(i).getText().equals(finalDateValue)) {
listofCalendardates.get(i).click();
break;
}
}
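One detail worth checking is the class attribute in the question's markup: it is " table-condensed" with a leading space, so an exact comparison such as @class='table-condensed' would not match it. The sketch below compares both predicates using the JDK's built-in XPath engine on a cut-down copy of the markup. The class name ScopedCellsSketch and the second table are assumptions made for illustration, and a browser-backed Selenium session may differ in other respects.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ScopedCellsSketch {

    // Counts the nodes that the given XPath expression selects in the document.
    static int countMatches(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Calendar table from the question plus a hypothetical unrelated table.
        String xml = "<div>"
                + "<table class=\" table-condensed\"><tbody><tr>"
                + "<td class=\"old day\">30</td><td class=\"day\">1</td><td class=\"day\">2</td>"
                + "<td class=\"day\">3</td><td class=\"day\">4</td><td class=\"day\">5</td>"
                + "<td class=\"day\">6</td>"
                + "</tr></tbody></table>"
                + "<table><tr><td>other</td></tr></table>"
                + "</div>";
        System.out.println("all cells: " + countMatches(xml, "//td"));
        System.out.println("exact class match: "
                + countMatches(xml, "//table[@class='table-condensed']//td"));
        System.out.println("contains(): "
                + countMatches(xml, "//table[contains(@class,'table-condensed')]//td"));
    }
}
```

Only the contains() variant selects the seven calendar cells; the exact match selects none because of the leading space.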

How to select multiple tags in an xpath

Please visit the website "http://www.cricbuzz.com/cricket-series/2223/icc-cricket-world-cup-2015/points-table"
<table class="table cb-srs-pnts">
<thead>
<tr class="cb-srs-gray-strip">
<th class="cb-col-20 cb-srs-pnts-th text-left" style="padding-left: 6px;">Pool B</th>
<td class="cb-srs-pnts-th">Mat</td>
<td class="cb-srs-pnts-th">Won</td>
<td class="cb-srs-pnts-th">Lost</td>
<td class="cb-srs-pnts-th">Tied</td>
<td class="cb-srs-pnts-th">NR</td>
<th class=" cb-srs-pnts-th">Pts</th>
<td class="cb-srs-pnts-th">NRR</td>
<th/>
</tr>
</thead>
<tbody>
</table>
I have to print all the table headings of the first table. It has a combination of "td" and "th" tags.
I am using the following XPath to retrieve them:
//h3/../table[1]/thead/tr/*[self::td or self::th]
All the values are printed except for the text "Pool B".
Can somebody tell me why the "Pool B" text is not selected?
Code to print the output:
driver.get("cricbuzz.com/cricket-series/2223/icc-cricket-world-cup-2015/…);
System.out.println(driver.findElement(By.xpath("//h3/../table[1]/thead/tr/th")).getText());
List<WebElement> tableHeading = driver.findElements(By.xpath("//h3/../table[1]/thead/tr/*[self::td or self::th]"));
for (int i = 1; i < tableHeading.size(); i++)
{
System.out.println(i+""+tableHeading.get(i).getText());
}

Start the for loop at index 0. List indices are zero-based, so starting at 1 skips the first heading ("Pool B"):
for (int i = 0; i < tableHeading.size(); i++)
{
System.out.println(i+""+tableHeading.get(i).getText());
}
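To see that the XPath itself is fine and only the loop index drops "Pool B", the expression can be tried against the header row from the question with the JDK's own XPath engine. This is a minimal sketch: the class name HeadingXPathSketch is hypothetical, the header row is copied without its attributes, and the //h3/.. prefix is left out because the surrounding page is not part of the snippet.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HeadingXPathSketch {

    // Header row from the question, attributes removed for brevity.
    static final String HEADER_XML = "<table><thead><tr>"
            + "<th>Pool B</th><td>Mat</td><td>Won</td><td>Lost</td><td>Tied</td>"
            + "<td>NR</td><th>Pts</th><td>NRR</td><th/>"
            + "</tr></thead></table>";

    // Collects the text of every td or th child of the header row, starting at startIndex.
    static List<String> headingTexts(String xml, int startIndex) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList cells = (NodeList) xpath.evaluate(
                    "//thead/tr/*[self::td or self::th]",
                    new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = startIndex; i < cells.getLength(); i++) {
                texts.add(cells.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("from index 0: " + headingTexts(HEADER_XML, 0));
        System.out.println("from index 1: " + headingTexts(HEADER_XML, 1));
    }
}
```

Starting at index 0 the list begins with "Pool B"; starting at index 1 it begins with "Mat", which matches the behaviour described in the question.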

How to extract href detail from links using Jsoup?

I have an html with this form:
<table>
<tbody>
<tr>
<td class="t1"><img class="png" src="" alt="site1"></td>
<td class="t2 up">INFORMATION</td>
<td class="t2 down">INFORMATION</td>
<td class="t2 up mark">INFORMATION</td>
</tr>
<tr>
<td class="t1"><img class="png" src="" alt="site2"></td>
<td class="t2 down">INFORMATION</td>
<td class="t2 stable">INFORMATION</td>
<td class="t2 up">INFORMATION</td>
</tr>
.
.
.
</tbody>
</table>
and I want to extract either the value of href (/click/site1) or the value of alt (site1).
How can I do this using Jsoup?
Thanks.
Edit:
This is the code that I wrote:
for(Element table : doc.select("table"))
{
for(Element row : table.select("tr"))
{
System.out.print(table.attr("href").toString());
Elements column = row.select("td");
{
System.out.println(column.text());
}
}
System.out.println();
}
but the line System.out.print(table.attr("href").toString()); doesn't print anything.

This process is described in jsoup cookbook.
http://jsoup.org/cookbook/extracting-data/working-with-urls
Document doc = Jsoup.connect("http://jsoup.org").get();
Element link = doc.select("a").first();
String relHref = link.attr("href"); // == "/"
String absHref = link.attr("abs:href"); // "http://jsoup.org/"
In your question you try to get the href attribute from the table, but the table doesn't have an href attribute. Either search for all a tags, or select the td inside your row and then the link inside of that.
I changed your example and added some code that writes only the links:
for(Element table : doc.select("table")) {
for(Element row : table.select("tr")) {
Elements column = row.select("td");
Elements atag = column.get(0).select("a");
System.out.print(atag.get(0).attr("href").toString());
System.out.print(" ");
System.out.println(column.text());
}
System.out.println();
}
for(Element link : doc.select("a")) {
System.out.println(link.attr("href")); // == "/"
}

jsoup to get a particular element in Table

I have an HTML page that I want to extract information from:
<table class="students">
<tbody>
<tr class="rz" style="color:red;" onclick="location.href='//andy.pvt.com';">
<td>
<a title="Display Andy website,Andy" href="//andy.pvt.com">15</a>
</td>
<td>Andy jr</td>
<td align="right">44.31</td>
<td align="right">23.79</td>
<td align="right">57</td>
<td align="right">1,164,700</td>
<td align="right">0.12</td>
<td align="center">
<td align="left">0.99</td>
<td align="right">
</tr>
I want to get Andy, 15, and andy.pvt.com.
I am able to extract this table using doc.select(table).get, but I am not able to extract the information I am looking for.
What should go in tables.select("xxxx")?
Can you please help me with the xxxx I am missing?

You state:
I tried: tables = doc.select("table").get(0); then tables.select("a title").
You want something more along the lines of
tables.select("a[href]").attr("href"); // to get your String
and
tables.select("a[href]").text(); // to get your number
e.g.,
Elements tables = doc.select("table");
String hrefAttr = tables.select("a[href]").attr("href");
System.out.println("href attribute: " + hrefAttr);
String number = tables.select("a[href]").text();
System.out.println("number: " + number);
