Jsoup remove specific TR id

I am trying to remove a "TR" with a specific id from a table. I fetch the document with:
String url = "http://citraslider.blogspot.com/2014/03/table-model.html";
Document doc = Jsoup.connect(url).get();
System.out.println(doc);
This prints:
<table id="mt" border="0" cellpadding="2" cellspacing="0" width="100%" class="hbtbl">
<tr id="1"><td class="aa"><div class="bb">11</div><b class="nme pn_std">bua</b>: ask 1 ?</td></tr>
<tr id="2"><td class="a"><div class="b">12</div><b class="nme pn_std">bua</b>: ask 2 ?</td></tr>
<tr id="3"><td class="aa"><div class="bb">13</div><b class="nme pn_std">bua</b>: ask 3 ?</td></tr>
<tr id="-1"><td class="a"><div align="center">[prev]</div></td></tr>
</table>
Here is my code:
Elements elemen = doc.select("tr");
Elements el = doc.getElementsByAttributeValue("id", "-1");
el.remove(0);
System.out.println(el); // this works: the row is gone from el
for (Element e : elemen) {
    System.out.println(e.text() + ":" + e.attr("id")); // but this line still prints [prev], so tr id="-1" is still there
    e.getElementsByAttributeValue("id", "-1").remove();
}
So how can I remove the row with id="-1" from the loop results?
<tr id="-1"><td class="a"><div align="center">[prev]</div></td></tr>

Just change your selector to exclude the tr with id -1:
Elements elemen = doc.select("tr").not("tr#-1");
More information is available in the jsoup selector syntax documentation.
The code becomes:
Elements elemen = doc.select("tr").not("tr#-1");
for (Element e : elemen) {
    System.out.println(e.text() + ":" + e.attr("id"));
}
which gives:
11bua: ask 1 ?:1
12bua: ask 2 ?:2
13bua: ask 3 ?:3
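If the row should be removed from the document itself rather than just filtered out of the selection, something along these lines may work. This is a minimal sketch, assuming jsoup is on the classpath; the class name RemoveRowById, the helper remainingRows, and the trimmed sample markup are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class RemoveRowById {

    // Hypothetical sample, trimmed from the table in the question.
    static final String SAMPLE_HTML = "<table id=\"mt\">"
            + "<tr id=\"1\"><td><div>11</div><b>bua</b>: ask 1 ?</td></tr>"
            + "<tr id=\"2\"><td><div>12</div><b>bua</b>: ask 2 ?</td></tr>"
            + "<tr id=\"-1\"><td><div align=\"center\">[prev]</div></td></tr>"
            + "</table>";

    // Removes the row with id="-1" from the DOM and returns "text:id"
    // for every row that is still in the document afterwards.
    static List<String> remainingRows(String html) {
        Document doc = Jsoup.parse(html);
        // Elements.remove() (no arguments) detaches each matched element from the DOM.
        doc.select("tr[id=-1]").remove();
        List<String> rows = new ArrayList<>();
        for (Element row : doc.select("tr")) {
            rows.add(row.text() + ":" + row.attr("id"));
        }
        return rows;
    }

    public static void main(String[] args) {
        remainingRows(SAMPLE_HTML).forEach(System.out::println);
    }
}
```

The important detail is the order: the row is removed before the rows are selected, so a later doc.select("tr") no longer sees it. In the question the rows were selected first, so the loop still iterated over the old selection.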

Related

HREF + TEXT with Jsoup

I have the following HTML page:
</div><div id="page_content_list01" class="grid_12">
<h2><strong class="floatleft">TEXT1</strong></h2><br>
<table>
<tbody>
<tr>
<th class="no_width">
<p class="floatleft">Attachments:</p>
</th>
<td class="link_azure">
<a target="_blank" href="http://www.example.com">TEXT2</a><br/>
</td>
</tr>
</tbody>
</table><h2><strong class="floatleft">TEXT3</strong></h2><br>
<table>
<tbody>
<tr>
<th class="no_width">
<p class="floatleft">Atachments:</p>
</th>
<td class="link_azure">
<a target="_blank" href="http://www.example2.com">TEXT4</a><br/>
</td>
</tr>
</tbody>
</table><h2><strong class="floatleft">TEXT5</strong></h2><br>
<table>
<tbody>
<tr>
Currently I'm doing:
Elements rows = document.select("div#page_content_list01");
Now I want to select the "TEXT" headings and the links. I want to make clickable links, so I'm using:
for (Element eleme : rows) {
    Elements elements = eleme.select("a");
    for (Element elem : elements) {
        String url = elem.attr("href");
        String title = elem.text();
    }
}
and I'm getting:
url = "http://www.example.com";
title = "TEXT2";
which is fine, but this way I can't read "TEXT1" and "TEXT3".
Can someone help me, please?

I think you need to work on the selectors. First, your primary selector
Elements rows = document.select("div#page_content_list01");
returns a list of ONE element only, since it selects the div, not the tables or table rows. I would instead do this to get all the relevant info:
Elements tables = document.select("div#page_content_list01 > table");
for (Element table : tables) {
    // The h2 is not the immediate previous sibling (a <br> sits in between), so walk back to it.
    Element heading = table.previousElementSibling();
    while (heading != null && !heading.tagName().equals("h2")) {
        heading = heading.previousElementSibling();
    }
    String titleStr = heading != null ? heading.text() : "";
    Element a = table.select("a").first();
    String linkStr = a != null ? a.attr("href") : "";
}
Note that the text in the h2 elements is on the same level as the table, not inside a common div. This is why I use previous-sibling navigation. Also note that I wrote this from memory and it is untested, but you should get the idea.

How to extract href detail from links using Jsoup?

I have HTML of this form:
<table>
<tbody>
<tr>
<td class="t1"><img class="png" src="" alt="site1"></td>
<td class="t2 up">INFORMATION</td>
<td class="t2 down">INFORMATION</td>
<td class="t2 up mark">INFORMATION</td>
</tr>
<tr>
<td class="t1"><img class="png" src="" alt="site2"></td>
<td class="t2 down">INFORMATION</td>
<td class="t2 stable">INFORMATION</td>
<td class="t2 up">INFORMATION</td>
</tr>
.
.
.
</tbody>
</table>
and I want to extract either the value of href (/click/site1) or the value of alt (site1).
How can I do this using Jsoup?
Thanks.
Edit:
This is the code that I wrote:
for(Element table : doc.select("table"))
{
for(Element row : table.select("tr"))
{
System.out.print(table.attr("href").toString());
Elements column = row.select("td");
{
System.out.println(column.text());
}
}
System.out.println();
}
but this line, System.out.print(table.attr("href").toString());, doesn't print anything.

This process is described in the jsoup cookbook:
http://jsoup.org/cookbook/extracting-data/working-with-urls
Document doc = Jsoup.connect("http://jsoup.org").get();
Element link = doc.select("a").first();
String relHref = link.attr("href"); // == "/"
String absHref = link.attr("abs:href"); // "http://jsoup.org/"
In your question you try to read the href attribute from the table, but the table has no href attribute. Either search for all a tags, or select the td inside your row and then the link inside that.
I changed your example and added some code that writes only the links.
for (Element table : doc.select("table")) {
    for (Element row : table.select("tr")) {
        Elements column = row.select("td");
        if (column.isEmpty()) {
            continue; // e.g. a header row that has only th cells
        }
        // Look for a link in the first cell; skip the href if there is none.
        Elements atag = column.get(0).select("a");
        if (!atag.isEmpty()) {
            System.out.print(atag.get(0).attr("href"));
            System.out.print(" ");
        }
        System.out.println(column.text());
    }
    System.out.println();
}
// Or simply print the href of every link in the document:
for (Element link : doc.select("a")) {
    System.out.println(link.attr("href"));
}
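The question also asks for the alt value, which lives on the img in the first cell of each row. A minimal sketch for that case, assuming jsoup is on the classpath; the class name ExtractAltPerRow, the helper altValues, and the shortened sample markup are made up for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class ExtractAltPerRow {

    // Hypothetical sample, shortened from the table in the question.
    static final String SAMPLE_HTML = "<table><tbody>"
            + "<tr><td class=\"t1\"><img class=\"png\" src=\"\" alt=\"site1\"></td>"
            + "<td class=\"t2 up\">INFORMATION</td></tr>"
            + "<tr><td class=\"t1\"><img class=\"png\" src=\"\" alt=\"site2\"></td>"
            + "<td class=\"t2 down\">INFORMATION</td></tr>"
            + "</tbody></table>";

    // Collects the alt attribute of every img that sits in a td with class "t1".
    static List<String> altValues(String html) {
        Document doc = Jsoup.parse(html);
        List<String> alts = new ArrayList<>();
        // img[alt] only matches images that actually carry an alt attribute.
        for (Element img : doc.select("td.t1 img[alt]")) {
            alts.add(img.attr("alt"));
        }
        return alts;
    }

    public static void main(String[] args) {
        altValues(SAMPLE_HTML).forEach(System.out::println);
    }
}
```

The same pattern should work for href if the real page wraps the image in a link: select "td.t1 a[href]" and read attr("href"), or attr("abs:href") for an absolute URL when the document was loaded with a base URI.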

Selenium webdriver : exclude child node

Here is the sample HTML code:
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<tr class="tinyfont">
<tr height="2px">
<tr height="1px">
<tr height="1px">
<tr>
<tr height="2px">
<tr height="1px">
<tr height="1px">
<tr height="2px">
</tbody>
</table>
I am using Selenium WebDriver.
The code below gives me all the child elements, but now I want to exclude one particular child element. How can I exclude one of the child elements from my list?
I want to exclude the tr[6] child element.
List<WebElement> list = driver.findElements(By.xpath("/html/body/table/tbody //*"));
ArrayList<String> al1 = new ArrayList<String>();
for(WebElement ele:list){
String className = ele.getAttribute("class");
System.out.println("Class name = "+className);
al1.add(className);
}
Thanks in Advance!!

Either omit the 6th table row, then select all descendants:
/html/body/table/tbody/tr[position() != 6]//*
or select only the table rows that are at position 1 or have an attribute (and then select their descendants):
/html/body/table/tbody/tr[position() = 1 or @*]//*
or, to be more specific, also check the attribute name:
/html/body/table/tbody/tr[position() = 1 or @height or @class]//*
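The row predicates can be checked without a browser using the XPath engine that ships with the JDK. The sketch below is only an illustration under a few assumptions: the class name XPathRowFilter and the helper countMatches are made up, the sample is a small XML stand-in for the table (ten empty rows, with rows 1 and 6 carrying no attributes), and the paths are rooted at /table and stop at the tr level because the sample rows have no children.

```java
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;

class XPathRowFilter {

    // Ten rows; rows 1 and 6 have no attributes, mirroring the question.
    static final String SAMPLE_XML = "<table><tbody>"
            + "<tr/><tr class=\"tinyfont\"/><tr height=\"2px\"/><tr height=\"1px\"/>"
            + "<tr height=\"1px\"/><tr/><tr height=\"2px\"/><tr height=\"1px\"/>"
            + "<tr height=\"1px\"/><tr height=\"2px\"/>"
            + "</tbody></table>";

    // Evaluates the XPath expression against the XML and returns the number of matched nodes.
    static int countMatches(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/table/tbody/tr",
            "/table/tbody/tr[position() != 6]",
            "/table/tbody/tr[position() = 1 or @*]",
            "/table/tbody/tr[position() = 1 or @height or @class]"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + countMatches(SAMPLE_XML, expression));
        }
    }
}
```

On this sample all three predicates drop exactly one row, the attribute-less sixth one. The attribute-based variants only behave like position() != 6 as long as row 6 is the only row after the first that has no attributes.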

Is it always element 6 that you want to avoid? If it is, use a for loop with a counter and skip element 6 with an if statement.
// Count the rows themselves so that tr[i] exists for every i in the loop.
int numOfRows = driver.findElements(By.xpath("/html/body/table/tbody/tr")).size();
ArrayList<String> al1 = new ArrayList<String>();
for (int i = 1; i <= numOfRows; i++) {
    if (i != 6) {
        String className = driver.findElement(By.xpath("/html/body/table/tbody/tr[" + i + "]")).getAttribute("class");
        System.out.println("Class name = " + className);
        al1.add(className);
    }
}
This may not sound like the solution you are looking for, but it is still a roundabout way to achieve what you want. Off the top of my head, I can't think of another way unless you have an attribute to compare against or to exclude on.

Parsing table data with jsoup

I am using jsoup in my Android app to parse HTML, but now I need to parse table data and I cannot get it to work. I have tried many ways without success, so I am trying my luck here in case anyone has experience with this.
Here is part of my HTML:
<div id="editacia_jedla">
<h2>My header</h2>
<h3>My sub header</h3>
<table border="0" class="jedalny_listok_tabulka" cellpadding="2" cellspacing="1">
<tr>
<td width="100" class="menu_nazov neparna" align="left">Food Menu 1</td>
<td class="jedlo neparna" align="left">vegetable and beef
<div class="jedlo_box_alergeny">Allergens: 1, 3</div>
</td>
</tr>
<tr>
<td width="100" class="menu_nazov parna" align="left">Food Menu 2</td>
<td class="jedlo parna" align="left">Potato salad and pork
<div class="jedlo_box_alergeny">Allergens: 6</div>
</td>
</tr>
</table>
etc
</div>
My Java/Android code:
try {
String tableHtmlCode="";
Document fullHtmlDocument = Jsoup.connect(urlOfFoodDay).get();
Element elm1 = fullHtmlDocument.select("#editacia_jedla").first();
for( Element element : elm1.children() )
{
tableHtmlCode+=element.getElementsByIndexEquals(2); //this set table content because 0=h2, 1=h3
}
Document parsedTableDocument = Jsoup.parse(tableHtmlCode);
//Element th = parsedTableDocument.select("td[class=jedlo neparna]").first(); THIS IS BAD
String foodContent="";
String foodAllergens="";
}
So now I want to extract the text "vegetable and beef" and save it to the string foodContent, and save the numbers "1, 3" (together) from the div with class jedlo_box_alergeny to the string foodAllergens. Can someone help? I would be very grateful for any ideas.

Select the table with class jedalny_listok_tabulka and loop over its td tags.
The td tag is the parent of the a (link) tags that hold the allergen values, so you loop over the a elements to get your numbers, something like:
Elements myElements = doc.getElementsByClass("jedalny_listok_tabulka")
.first().getElementsByTag("td");
for (Element element : myElements) {
if (element.className().contains("jedlo")) {
String foodContent = element.ownText();
String foodAllergen = "";
for (Element href : element.getElementsByTag("a")) {
foodAllergen += " " + href.text();
}
System.out.println(foodContent + " : " + foodAllergen);
}
}
Output:
vegetable and beef : 1 3
Potato salad and pork : 6
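In the HTML shown in the question, the allergen numbers appear as plain text inside the div with class jedlo_box_alergeny rather than inside links. For that shape of markup, a variant along these lines may be closer to what is needed. It is a minimal sketch, assuming jsoup 1.11 or later (for selectFirst); the class name FoodTableParser, the helper parse, the " | " separator, and the compacted sample markup are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class FoodTableParser {

    // Hypothetical sample, compacted from the HTML in the question.
    static final String SAMPLE_HTML = "<div id=\"editacia_jedla\"><h2>My header</h2><h3>My sub header</h3>"
            + "<table class=\"jedalny_listok_tabulka\">"
            + "<tr><td class=\"menu_nazov neparna\">Food Menu 1</td>"
            + "<td class=\"jedlo neparna\">vegetable and beef"
            + "<div class=\"jedlo_box_alergeny\">Allergens: 1, 3</div></td></tr>"
            + "<tr><td class=\"menu_nazov parna\">Food Menu 2</td>"
            + "<td class=\"jedlo parna\">Potato salad and pork"
            + "<div class=\"jedlo_box_alergeny\">Allergens: 6</div></td></tr>"
            + "</table></div>";

    // Returns one "foodContent | foodAllergens" entry per food cell.
    static List<String> parse(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element cell : doc.select("table.jedalny_listok_tabulka td.jedlo")) {
            // ownText() returns the cell's own text without the text of the nested div.
            String foodContent = cell.ownText().trim();
            Element box = cell.selectFirst("div.jedlo_box_alergeny");
            // Drop the "Allergens:" label and keep only the numbers.
            String foodAllergens = box == null
                    ? ""
                    : box.text().replaceFirst("^Allergens:\\s*", "");
            result.add(foodContent + " | " + foodAllergens);
        }
        return result;
    }

    public static void main(String[] args) {
        parse(SAMPLE_HTML).forEach(System.out::println);
    }
}
```

Selecting td.jedlo directly avoids the className().contains("jedlo") check, and it works on the full document, so there is no need to rebuild a second document from the table's HTML.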

Replacing text inside tags using Jsoup

<table width="100%" border="0" cellpadding="0" cellspacing="1" class="table_border" id="center_table">
<tbody>
<tr>
<td width="25%" class="heading_table_top">S. No.</td>
<td width="45%" class="heading_table_top">
Booking Status (Coach No , Berth No., Quota)
</td>
<td width="30%" class="heading_table_top">
* Current Status (Coach No , Berth No.)
</td>
</tr>
</tbody>
</table>
I scrape a web page and store the response in a string.
I then parse it into a jsoup document:
Document doc = Jsoup.parse(result);
Then I select the table using
Element table = doc.select("table[id=center_table]").first();
Now I need to replace the text "Booking Status (Coach No , Berth No., Quota)" in the td with "Booking Status" using jsoup. Could anybody help?
I tried
table.children().text().replaceAll(RegEx to select the text?????, "Booking Status");

Elements tds=doc.select("table[id=center_table] td"); // select the tds from your table
for(Element td : tds) { // loop through them
if(td.text().contains("Booking Status")) { // found the one you want
td.text("Booking Status"); // Replace with your text
}
}
Then you can use doc.toString() to get the HTML back as text to save to disk, send to a WebView, or whatever else you want to do with it.

Elements tablecells = doc.select("table tbody tr td");
gives you 3 cells.
Use a loop to get each element with
Element e = tablecells.get(index);
Use e.text() to get the String.
Compare or replace strings with String.equals(), String.contains(), and String.replace().
