I have the following HTML page:
</div><div id="page_content_list01" class="grid_12">
<h2><strong class="floatleft">TEXT1</strong></h2><br>
<table>
<tbody>
<tr>
<th class="no_width">
<p class="floatleft">Attachments:</p>
</th>
<td class="link_azure">
<a target="_blank" href="http://www.example.com">TEXT2</a><br/>
</td>
</tr>
</tbody>
</table><h2><strong class="floatleft">TEXT3</strong></h2><br>
<table>
<tbody>
<tr>
<th class="no_width">
<p class="floatleft">Atachments:</p>
</th>
<td class="link_azure">
<a target="_blank" href="http://www.example2.com">TEXT4</a><br/>
</td>
</tr>
</tbody>
</table><h2><strong class="floatleft">TEXT5</strong></h2><br>
<table>
<tbody>
<tr>
Currently I'm doing:
Elements rows = document.select("div#page_content_list01");
Now I want to select the "TEXT" headings and the links. I want to build a clickable link, so I'm using:
for (Element eleme : rows) {
Elements elements = eleme.select("a");
for (Element elem : elements) {
String url = elem.attr("href");
String title = elem.text();
}
}
and I'm getting:
url = "http://www.example.com";
title = "TEXT2";
That works, but this way I can't read "TEXT1" and "TEXT3".
Can someone help me, please?
I think you need to work on the selectors. First, your primary selector
Elements rows = document.select("div#page_content_list01");
returns a list with only ONE element, because it selects the div itself, not the tables or table rows. I would instead do this to get all the relevant info:
Elements tables = document.select("div#page_content_list01 > table");
for (Element table : tables) {
// The element directly before each table is the <br>, so walk back to the <h2>
Element h2 = table.previousElementSibling();
while (h2 != null && !"h2".equals(h2.tagName())) {
h2 = h2.previousElementSibling();
}
String titleStr = (h2 != null) ? h2.text() : "";
Element a = table.select("a").first();
String linkStr = (a != null) ? a.attr("href") : "";
}
Note that the text in the h2 elements is on the same level as the table, not inside a common div, which is why I use the previous-sibling calls. Since a <br> sits between each h2 and its table, the loop steps back until it reaches the h2. Also note that I wrote this from memory and it is untested, but you should get the idea.
I have an html with this form:
<table>
<tbody>
<tr>
<td class="t1"><img class="png" src="" alt="site1"></td>
<td class="t2 up">INFORMATION</td>
<td class="t2 down">INFORMATION</td>
<td class="t2 up mark">INFORMATION</td>
</tr>
<tr>
<td class="t1"><img class="png" src="" alt="site2"></td>
<td class="t2 down">INFORMATION</td>
<td class="t2 stable">INFORMATION</td>
<td class="t2 up">INFORMATION</td>
</tr>
.
.
.
</tbody>
</table>
and I want to extract either the value of href (/click/site1) or the value of alt (site1).
How can I do this using Jsoup?
Thanks.
Edit:
This is the code that I wrote:
for(Element table : doc.select("table"))
{
for(Element row : table.select("tr"))
{
System.out.print(table.attr("href").toString());
Elements column = row.select("td");
{
System.out.println(column.text());
}
}
System.out.println();
}
but this line, System.out.print(table.attr("href").toString());, doesn't print anything.
This process is described in the jsoup cookbook:
http://jsoup.org/cookbook/extracting-data/working-with-urls
Document doc = Jsoup.connect("http://jsoup.org").get();
Element link = doc.select("a").first();
String relHref = link.attr("href"); // == "/"
String absHref = link.attr("abs:href"); // "http://jsoup.org/"
In your question you try to get the href attribute from the table, but the table doesn't have an href attribute. Either search for all a tags, or select the td inside your row and then the link inside of that.
I changed your example a bit and added some code that writes only the links.
for(Element table : doc.select("table")) {
for(Element row : table.select("tr")) {
Elements column = row.select("td");
Elements atag = column.get(0).select("a");
System.out.print(atag.get(0).attr("href").toString());
System.out.print(" ");
System.out.println(column.text());
}
System.out.println();
}
for(Element link : doc.select("a")) {
System.out.println(link.attr("href")); // == "/"
}
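To make the abs:href lookup from the cookbook snippet less of a black box, here is a rough sketch of the same idea using only the JDK's java.net.URI. It is not jsoup's implementation, and the class name HrefResolver and the page URL are made up for illustration.

```java
import java.net.URI;

// Sketch only: mimics the idea behind jsoup's "abs:" attribute prefix with the JDK.
class HrefResolver {
    // Resolves a possibly relative href against the URL of the page it was found on.
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page URL; the question does not say where the table lives.
        String page = "http://www.example.com/list/page.html";
        System.out.println(toAbsolute(page, "/click/site1"));
        System.out.println(toAbsolute(page, "item.html"));
    }
}
```

With jsoup itself you would normally just ask for attr("abs:href"), which needs the document to have a base URI (it does when loaded through Jsoup.connect).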
First, I want to apologise for my English. I am new to programming in Java and also to Jsoup. I want to get some data from a website, where the information is given in an HTML table. I don't need all the fields from the table. I use this:
Document doc = Jsoup.connect("http://www.emo.nl/barges/en.html")
.data("query", "Java")
.userAgent("Mozilla")
.cookie("auth", "token")
.timeout(3000)
.post();
Element table1 = doc.select("table").first();
String body = table1.toString();
Document docb = Jsoup.parseBodyFragment(body);
Element bbd = docb.body();
String hhk = bbd.toString();
System.out.println(hhk);
The result of this code gives me the whole table as a String, as follows:
<body>
<table>
<tbody>
<tr>
<th>Name</th>
<th>Bargeno.</th>
<th>Reported present</th>
<th>Busy</th>
<th>Starting</th>
<th>Harbour</th>
</tr>
<tr>
<td>AMETHYST</td>
<td>2327085</td>
<td>*</td>
<td>Busy</td>
<td>19-03-2014 spil 1</td>
<td>HH</td>
</tr>
<tr>
<td>AMETHYST 2</td>
<td>2327086</td>
<td>*</td>
<td>Busy</td>
<td>19-03-2014 spil 1</td>
<td>HH</td>
</tr>
<tr>
<td>AQUAPOLIS</td>
<td>6105002</td>
<td>*</td>
<td> </td>
<td>19-03-2014 spil 1</td>
<td>HH</td>
</tr>
</tbody>
</table>
</body>
This is too much information for me. I want to make two variables, let's say:
private String naam;
private String date;
In the naam variable I want to store the first <td> tag (AMETHYST),
and in the date variable I want to put the fifth <td> tag (19-03-2014).
Is there any way to do this? Thanks a lot for any help.
One way to do it would be to read the elements at the specified index:
String naam = bbd.getElementsByTag("td").get(0).text();
String date = bbd.getElementsByTag("td").get(4).text();
System.out.println(naam + " " + date);
Gives,
AMETHYST 19-03-2014 spil 1
EDIT:
Since the td contains &nbsp;spil 1 after the date, that part gets retrieved too. If you want to drop it, and it is consistently present, then:
System.out.println(naam + " " + date.substring(0, date.indexOf('\u00A0') - 1));
Gives,
AMETHYST 19-03-2014
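The substring arithmetic depends on exactly which whitespace sits between the date and "spil 1", and that is not visible in the question. As a hedged alternative, this small sketch takes everything up to the first whitespace or non-breaking space, whichever comes first. The class and method names are made up for illustration.

```java
// Sketch only: extracts the leading date token from a cell text such as "19-03-2014 spil 1".
class BargeDate {
    // Returns the text before the first run of whitespace and/or non-breaking spaces.
    static String firstToken(String cellText) {
        // \s covers ordinary whitespace; \u00A0 is the character that &nbsp; decodes to.
        String[] parts = cellText.trim().split("[\\s\\u00A0]+");
        return parts[0];
    }

    public static void main(String[] args) {
        System.out.println(firstToken("19-03-2014\u00A0spil 1"));
    }
}
```

You would call it as BargeDate.firstToken(date) in place of the substring expression.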
EDIT 2: Based on the OP's query about getting all the first tds within the table, use something like:
Elements tds = table1.select(" > tbody > tr > td:eq(0)");
for (Element el : tds) {
System.out.println(el.text());
}
Here > tbody > tr > td:eq(0) pulls out the td at index 0 of every tr within your table1.
Output,
AQUAPOLIS
AQUAPOLIS
IMPERIAL 7
CHIMO
...
For more information, refer to the jsoup selector syntax documentation.
<table id="tblListViewHeader" class="adminlist" cellspacing="1" cellpadding="0" style="table-layout: fixed; width: 1003px;">
<tbody>
</table>
</td>
</tr>
<tr>
<td>
<div id="divListView" style="width: 100%; height: 300px; overflow: auto; display: block;">
<table id="tblListView" class="adminlist" cellspacing="1" cellpadding="0" style="table-layout: fixed; width: 100%;">
<tbody data-bind="template: { name: 'ActiveGradeTemplate', foreach: ActiveGrade }">
<tr class="row0">
<td data-bind="text:$index()+1" style="width: 5%;">1</td>
<td data-bind="text: GradeName" style="width: 20%;">Vantage Point</td>
<td align="right" data-bind="text: DisplayCreatedDate" style="width: 10%;">27 Mar 2013</td>
<td align="right" data-bind="text: CreatedByUser" style="width: 10%;">Name</td>
<td align="right" data-bind="text: DisplayModifiedDate" style="width: 10%;">27 Mar 2013</td>
<td align="right" data-bind="text: ModifiedByUser" style="width: 10%;">Name</td>
<td align="center" data-bind="text: Status" style="width: 5%;">Active</td>
<td align="center" style="width: 10%;">
<a id="lnkEdit_7" data-bind="click: $root.lnkEdit, attr:{'id':'lnkEdit_' + GradeID}" href="#">Edit</a>
<span id="spanEdit_7" data-bind="attr:{'id':'spanEdit_' + GradeID}"></span>
</td>
</tr>
<tr class="row0">
<td data-bind="text:$index()+1" style="width: 5%;">2</td>
<td data-bind="text: GradeName" style="width: 20%;">test grade</td>
<td align="right" data-bind="text: DisplayCreatedDate" style="width: 10%;">Yesterday</td>
<td align="right" data-bind="text: CreatedByUser" style="width: 10%;">Name</td>
<td align="right" data-bind="text: DisplayModifiedDate" style="width: 10%;">Yesterday</td>
<td align="right" data-bind="text: ModifiedByUser" style="width: 10%;">Name</td>
<td align="center" data-bind="text: Status" style="width: 5%;">Active</td>
<td align="center" style="width: 10%;">
<a id="lnkEdit_11" data-bind="click: $root.lnkEdit, attr:{'id':'lnkEdit_' + GradeID}" href="#">Edit</a>
<span id="spanEdit_11" data-bind="attr:{'id':'spanEdit_' + GradeID}"></span>
</td>
</tr>
How can I retrieve the td values for each and every row? The table is generated dynamically. All the tr class names are the same: <tr class="row0">. How do I retrieve the table data for a table formatted as above?
Try the code below; it prints the data of all the cells:
// Grab the table
WebElement table = driver.findElement(By.id("divListView"));
// Now get all the TR elements from the table
List<WebElement> allRows = table.findElements(By.tagName("tr"));
// And iterate over them, getting the cells
for (WebElement row : allRows) {
List<WebElement> cells = row.findElements(By.tagName("td"));
// Print the contents of each cell
for (WebElement cell : cells) {
System.out.println(cell.getText());
}
}
// Grab the table
WebElement table = driver.findElement(By.id("table-6"));
//Get number of rows in table
int numOfRow = table.findElements(By.tagName("tr")).size();
//Get number of columns In table.
int numOfCol = driver.findElements(By.xpath("//*[@id='table-6']/tbody/tr[1]/td")).size();
//Split the XPath into three parts so the row and column numbers can be inserted.
String first_part = "//*[@id='table-6']/tbody/tr[";
String second_part = "]/td[";
String third_part = "]";
//take the second column values
int j=2;
//List to store the second column
List<String> secondColumnList=new ArrayList<String>();
//Loop through the rows and get the second column and put it in a list
for (int i=1; i<=numOfRow; i++){
//Prepared final xpath of specific cell as per values of i and j.
String final_xpath = first_part+i+second_part+j+third_part;
//Will retrieve value from located cell and print It.
String test_name = driver.findElement(By.xpath(final_xpath)).getText();
secondColumnList.add(test_name);
System.out.println(test_name);
}
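The three-part string concatenation can also be written as one small helper. This is only a sketch: the class name CellXPath is made up, and it assumes the same tbody/tr/td layout as the code above. Note that XPath attribute tests are written with @, as in @id.

```java
// Sketch only: builds the XPath of a single table cell from its row and column numbers.
class CellXPath {
    // row and col are 1-based, because XPath positions start at 1.
    static String of(String tableId, int row, int col) {
        return String.format("//*[@id='%s']/tbody/tr[%d]/td[%d]", tableId, row, col);
    }

    public static void main(String[] args) {
        // Second column of the third row of the table with id "table-6".
        System.out.println(CellXPath.of("table-6", 3, 2));
    }
}
```

Inside the loop this would replace the final_xpath line, e.g. By.xpath(CellXPath.of("table-6", i, j)).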
Capturing dynamic table data:
1. First of all, capture the table header count.
[int tHeadCount = driver.findElements(By.xpath("//table//tr//th")).size();]
2. Capture the index of the table row in which your actual data exists.
[In my case I need the data from the first row itself, so I am hard-coding it to zero.]
If you want, add a for loop over the rows to the existing code.
3. The actual solution starts here.
The following is the function call; "Deposited By" is the heading text of the column whose data is wanted.
String tableDataValue = managePackageTableData("Deposited By");
public String managePackageTableData(String columnName) {
//In Following line i am capturing table contains how many headers.
int tHeadCount = driver.findElements(By.xpath("//table//tr//th")).size();
int statusIndex = 0;
for(int i=0;i<tHeadCount;i++) // check every header, including the last one
{
String theadValue = driver.findElements(By.className("table")).get(0).findElements(By.tagName("tr")).get(0).findElements(By.tagName("th")).get(i).getText();
if(theadValue.equalsIgnoreCase(columnName))
{
statusIndex = i;
break;
}
}
String tableData = driver.findElements(By.tagName("tbody")).get(0).findElements(By.tagName("tr")).get(0).findElements(By.tagName("td")).get(statusIndex).getText();
return tableData;
}
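The header lookup is the heart of this approach, so here it is on its own, separated from WebDriver. This is a sketch under assumptions: the class name is made up, and the header texts are assumed to have already been read into a list with getText(). Returning -1 for a missing column avoids silently falling back to column 0.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: finds which column a heading belongs to, given the header texts.
class ColumnIndex {
    // Returns the 0-based position of columnName among the headers, or -1 if it is absent.
    static int find(List<String> headers, String columnName) {
        for (int i = 0; i < headers.size(); i++) {
            if (headers.get(i).trim().equalsIgnoreCase(columnName)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical headers; only "Deposited By" appears in the answer above.
        List<String> headers = Arrays.asList("Package", "Status", "Deposited By");
        System.out.println(find(headers, "deposited by"));
    }
}
```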
You have many options, but here's mine.
Collect all the tds into a list.
List<WebElement> tdlist = driver.findElements(By.cssSelector("table[id='tblListView'] tr td"));
If you want the values, loop over the list.
for(WebElement el: tdlist) {
System.out.println(el.getText());
}
Check this out:
The most common challenge automation testers face is iterating through tables and lists. They often want to find some value in a table cell or list and then perform an action on that value, or find a corresponding element in the same block and perform an action on it.
http://qeworks.com/iterate-table-lists-selenium-webdriver/
I first wrote this code using TestComplete and have now replicated it with Selenium C#. I learnt this the hard way, but it will work for any table control, and you don't have to hard-code any XPath elements in it. It also covers a nested table control within a td, as in the example below (this happens in complex tables when you use Developer Express grids or Angular grid controls). In this case each td tag contains a nested table, which holds the same data again. You can either capture such data or skip it, depending on the case, using the code below.
Html
<table>
<tr>
<td>Account #
<table>
<tr>
<td>
Account #
</td>
</tr>
</table>
</td>
<td>Name
<table>
<tr>
<td>
Name
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>1234
<table>
<tr>
<td>
1234
</td>
</tr>
</table>
</td>
<td>Bharat
<table>
<tr>
<td>
Bharat
</td>
</tr>
</table>
</td>
</tr>
</table>
Code
public List<TableDataCollection> StoreHtmlTableToList(IWebElement tblObj)
{
DataTable dataTbl = new DataTable();
int rowIndex = 1;
try
{
_tblDataCollection = new List<TableDataCollection>();
var tblRows = ((IJavaScriptExecutor)DriverContext.Driver).ExecuteScript("return arguments[0].rows; ", tblObj);
if (tblRows != null)
{
//Iterate through each row of the table
foreach (IWebElement tr in (IEnumerable)tblRows)
{
int colIndx = 1;
// Iterate through each cell of the table row
var tblCols = ((IJavaScriptExecutor)DriverContext.Driver).ExecuteScript("return arguments[0].cells; ", tr);
foreach (IWebElement td in (IEnumerable)tblCols)
{
//loop through any child or nested table structures if you want using the same approach
//Write Table to List : This part is not done yet
//Print the values
Console.WriteLine("Row[" + rowIndex.ToString() + "] Col[" + colIndx.ToString() + "] : " + td.Text);
colIndx++;
}
rowIndex++;
}
}
}
catch (Exception)
{
throw;
}
return _tblDataCollection;
}
@Ripon Al Wasim
The code below will help you find values column by column.
WebElement customtable = t.driver.findElement(By.cssSelector("div.custom-table"));
List<WebElement> r = customtable.findElements(By.tagName("tr"));
for (WebElement row : r) {
List<WebElement> d = row.findElements(By.tagName("td"));
for(int i = 0; i<d.size(); i++) {
if(i==0) {
WebElement x =d.get(i);
JavascriptExecutor js = (JavascriptExecutor) t.driver;
js.executeScript("arguments[0].scrollIntoView();", x);
System.out.println(i+"."+d.get(i).getText()+"\n");
if(d.get(i).getText().contains(searchtext)) {
System.out.println(i+".yes\n");
}
else
{
System.out.println("No\n");
}
}
}
}
It's working for me.
//tbody/tr
This will give you all the rows, and so the total number of rows.
//tbody/tr/td
This will give you all the cells of the above rows, and you can iterate over them based on your requirements.
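To see what these two expressions select without starting a browser, here is a small sketch that runs them through the JDK's built-in XPath engine on a cut-down, well-formed copy of the table. It is not Selenium code; with WebDriver you would pass the same strings to By.xpath. The class name and the sample markup are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch only: counts the nodes an XPath expression selects in a well-formed XML string.
class XPathCount {
    static int count(String xml, String expression) {
        try {
            org.w3c.dom.Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            // Malformed XML or a bad expression: surface it as an unchecked exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String table = "<table><tbody>"
                + "<tr><td>1</td><td>Vantage Point</td></tr>"
                + "<tr><td>2</td><td>test grade</td></tr>"
                + "</tbody></table>";
        System.out.println(count(table, "//tbody/tr"));     // number of rows
        System.out.println(count(table, "//tbody/tr/td"));  // number of cells in all rows
    }
}
```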
Below are different approaches we can follow to handle dynamic data in an application [not for dynamic elements]:
1. Using the Excel approach:
a. Get the web element of the field.
b. Get its text.
c. Store the data in Excel and validate it against the actual result pattern.
Note: Excel offers multiple data validation options, such as comparing columns, finding duplicates, etc.
2. Using a collection:
a. Get the web element of the field.
b. Put its data into a collection [List, Set, Map, etc.].
c. Write the Java code to compare the pattern of the application data.
i. The pattern can be data type, data length, data range, decimal places of an amount field, currency type, date and time pattern, etc.
ii. You can write Java conditions to verify the values of charts/graphs/dashboards if your application uses them.
d. Compare the actual data pattern [from the collection] with the expected data pattern [from the Java code].
3. Use the JDBC API to handle it through the database, where you can check the actual data with different queries.
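As a rough illustration of the collection approach, the sketch below checks collected cell texts against an expected pattern and reports the ones that do not fit. The date format ("27 Mar 2013"), the class name and the sample values are assumptions for the example; in a real test the list would be filled from getText() calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: validates collected cell values against an expected pattern.
class CellPatternCheck {
    // Assumed format: day, three-letter month, four-digit year, e.g. "27 Mar 2013".
    static final Pattern DATE = Pattern.compile("\\d{1,2} [A-Z][a-z]{2} \\d{4}");

    // Returns the values that do NOT match the expected pattern, in their original order.
    static List<String> mismatches(List<String> values, Pattern expected) {
        List<String> bad = new ArrayList<>();
        for (String value : values) {
            if (!expected.matcher(value.trim()).matches()) {
                bad.add(value);
            }
        }
        return bad;
    }
}
```

An empty result means every value followed the pattern; anything else is the list of cells to report in the test failure.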
<table width="100%" border="0" cellpadding="0" cellspacing="1" class="table_border" id="center_table">
<tbody>
<tr>
<td width="25%" class="heading_table_top">S. No.</td>
<td width="45%" class="heading_table_top">
Booking Status (Coach No , Berth No., Quota)
</td>
<td width="30%" class="heading_table_top">
* Current Status (Coach No , Berth No.)
</td>
</tr>
</tbody>
</table>
I scrape a webpage and store the response in a string.
I then parse it into a jsoup document:
Document doc = Jsoup.parse(result);
Then I select the table using
Element table=doc.select("table[id=center_table]").first();
Now I need to replace the text "Booking Status (Coach No , Berth No., Quota)" in the td with "Booking Status" using jsoup. Could anybody help?
I tried
table.children().text().replaceAll(RegEx to select the text?????, "Booking Status");
Elements tds=doc.select("table[id=center_table] td"); // select the tds from your table
for(Element td : tds) { // loop through them
if(td.text().contains("Booking Status")) { // found the one you want
td.text("Booking Status"); // Replace with your text
}
}
Then you can use doc.toString() to get the HTML back as text, to save to disk, send to a WebView, or whatever else you want to do with it.
Elements tablecells=doc.select("table tbody tr td");
will give you 3 cells.
Use a loop to get each element with
Element e=tablecells.get(index);
Use e.text() to get the String.
Compare or replace strings with String.equals(), String.contains(), String.replace().
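For the regex the question was looking for, one possible sketch is to strip a trailing parenthesised note from the cell text. Keep in mind that Java strings are immutable, so calling replaceAll on the result of text() never changes the document; the cleaned value has to be written back with td.text(...). The class and method names are made up for illustration.

```java
// Sketch only: removes a trailing "(...)" note from a heading text.
class HeaderCleanup {
    // "Booking Status (Coach No , Berth No., Quota)" becomes "Booking Status".
    static String stripParenthesised(String text) {
        // Optional whitespace, then "(", anything except ")", then ")" at the end of the text.
        return text.replaceAll("\\s*\\([^)]*\\)\\s*$", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripParenthesised("Booking Status (Coach No , Berth No., Quota)"));
    }
}
```

Inside the loop over the cells this would be used as td.text(HeaderCleanup.stripParenthesised(td.text())).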