Capturing Id attribute to an Array by JSoup? - java

I'm using the Jsoup library. I have a string containing two HTML elements, each with an id attribute, and I want to capture both IDs in an array.
String chain = "<div id='stylized' class='myform' style='margin:20px auto;'>"
+ "<div id='material_comprado' > </div> ";
I tried this for loop, but it failed:
int i = 0;
Elements values = doc.getElementsByAttribute("id");
String s[] = new String[values.size()];
for(Element el : values){
s[i++] = el.attr("id");
System.out.println("==> "+s[i]);
}
Can anyone help me?

Your JSoup code itself is fine.
The problem is the array index. s[i++] increments i before the println, so the println reads the next slot of s, which is still null. On the last iteration that slot is past the end of the array, which throws an ArrayIndexOutOfBoundsException. Increment the index only after you've finished accessing the array:
for (Element el : values){
s[i] = el.attr("id");
System.out.println("==> " + s[i]);
i++; // now safe to increment
}
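If you don't actually need the index, collecting into a List avoids the manual bookkeeping altogether. The following is a minimal sketch, assuming jsoup is on the classpath (e.g. the org.jsoup:jsoup Maven artifact); the class IdCollector and method collectIds are hypothetical names chosen for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class IdCollector {

    // Hypothetical helper: returns the id of every element that has one, in document order.
    static String[] collectIds(String html) {
        Document doc = Jsoup.parse(html);
        List<String> ids = new ArrayList<>();
        for (Element el : doc.getElementsByAttribute("id")) {
            ids.add(el.attr("id"));
        }
        // Convert to an array only at the end, once the final size is known.
        return ids.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String chain = "<div id='stylized' class='myform' style='margin:20px auto;'>"
                + "<div id='material_comprado' > </div> ";
        String[] s = collectIds(chain);
        for (String id : s) {
            System.out.println("==> " + id);
        }
        System.out.println(Arrays.toString(s)); // [stylized, material_comprado]
    }
}
```

The List grows as IDs are found, so there is no index to get wrong; the array is produced once with toArray.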

Related

Iterating through elements in jsoup and parsing href

I was having trouble getting just the href from rows of table data. Although I was able to get it working, I am wondering if anyone can explain why my code here works.
for (Element element : result.select("tr")) {
if (element.select("tr.header.left").isEmpty()) {
Elements tds = element.select("td");
//The line below is what I don't understand
String link = tds.get(0).getElementsByAttribute("href").first().attr("href");
String position = tds.get(1).text();
}
}
The line that I was using before, that did not work is below:
String link = tds.get(0).attr("href");
Why does this line return an empty string? I'm assuming it has to do with how I am iterating through the elements as I've selected by "tr". However, I'm not familiar with how Elements vs Element are structured.
Thanks for your help!
Elements is a subclass of ArrayList<Element>, with a few convenience methods (such as attr() and text()) added on top.
The reason you have to write that extra code is that <td> doesn't have an href attribute, so tds.get(0).attr("href") returns an empty string. You're presumably trying to capture the href from an <a> inside the cell. The longer, working code says:
For the first cell in the row, get the first element with an href attribute (i.e. a link), and get its href attribute.
Try the following example (with a sample document), which shows more clearly how to access the child links:
Element result = Jsoup.parse("<html><body><table><tr><td><a href=\"http://a.com\" /></td><td>Label1</td></tr><tr><td><a href=\"http://b.com\" /></td><td>Label2</td></tr></table></body></html>");
for (Element element : result.select("tr")) {
if (element.select("tr.header.left").isEmpty()) {
Elements tds = element.select("td");
String link = tds.get(0).getElementsByTag("a").attr("href");
String position = tds.get(1).text();
System.out.println(link + ", " + position);
}
}
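To make the difference between the cell and the link inside it explicit, here is a small sketch (assuming jsoup on the classpath; CellLinks and linksWithLabels are hypothetical names). Instead of the tr.header.left check it simply skips rows with fewer than two cells, and it uses the a[href] selector to reach the link:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

final class CellLinks {

    // Hypothetical helper: for each row, returns "href, label" from the first two cells.
    static List<String> linksWithLabels(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element row : doc.select("tr")) {
            Elements tds = row.select("td");
            if (tds.size() < 2) {
                continue; // skip header or malformed rows
            }
            // tds.get(0).attr("href") would be "": the <td> has no href.
            // The href lives on the <a> element inside the cell.
            Element link = tds.get(0).select("a[href]").first();
            if (link == null) {
                continue; // no link in the first cell
            }
            out.add(link.attr("href") + ", " + tds.get(1).text());
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<table>"
                + "<tr><td><a href='http://a.com'>A</a></td><td>Label1</td></tr>"
                + "<tr><td><a href='http://b.com'>B</a></td><td>Label2</td></tr>"
                + "</table>";
        // Prints "http://a.com, Label1" and "http://b.com, Label2"
        linksWithLabels(html).forEach(System.out::println);
    }
}
```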

how to build an string array from a string and integers in java

I have to put some values from several parsed HTML pages into a string array. The first value is a name and all the others are numbers. Then I must return the array to main to print it. Obviously I'm doing something wrong.
This is part of my newbie code:
String[] ret = null;
int y = 0;
for (Element h1 : h1s) {
// Using Jsoup to scrape the html file and find H1 text
h1_id = h1.className();
// I put here the text of H1
h1_text = h1.text();
if (h1_id.equals("ezomat-logo-text ezCSS")) {
// jump to the next h1
} else {
// I want to put the txt as the first array place
ret[y] = "'" + h1_text + "'";
}
i = 0;
// found the number values single integers with comma
for (Element image : images) {
Imm[i] = "," + imageName;
i++;
}
i = 0;
y = 1;
// y = 1 because I want to start from the second position.
for (Element image : images) {
ret[y] = Imm[i];
i++;
y++;
}
}
return ret;
You can't dynamically resize an array; you have to initialize it with a fixed size.
So you have to initialize it with
String[] ret = new String[size];
where size has to be the number of elements you are going to put into your array.
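In the question's case the size can be computed up front: one slot for the name plus one per number. A minimal sketch of the fixed-size approach, where FixedSizeRow and buildRow are hypothetical names and the numbers are passed in as an int[] instead of being scraped:

```java
import java.util.Arrays;

final class FixedSizeRow {

    // Hypothetical helper: the name goes in slot 0, the numbers in slots 1..n.
    // The size (1 + numbers.length) is known before the array is created.
    static String[] buildRow(String name, int[] numbers) {
        String[] ret = new String[1 + numbers.length];
        ret[0] = "'" + name + "'";
        for (int i = 0; i < numbers.length; i++) {
            ret[i + 1] = String.valueOf(numbers[i]);
        }
        return ret;
    }

    public static void main(String[] args) {
        String[] row = buildRow("Example", new int[] {3, 1, 4});
        System.out.println(Arrays.toString(row)); // ['Example', 3, 1, 4]
    }
}
```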
Or, the better approach: use an ArrayList<String> instead. Initialize it with
ArrayList<String> ret = new ArrayList<String>();
and add your Items with ret.add("whatever");.
On the first line of your code you try to define an array without a size, but you never actually create one: you just assign null, so the first ret[y] = ... throws a NullPointerException.
Also, it's impossible to dynamically add elements to such an array.
For these scenarios we have List.
To define a List that stores Strings use the following code:
List<String> ret = new ArrayList<String> ();
And then add elements to the list like so:
ret.add ("," + imageName);
To retrieve a value from an index in the list do the following:
ret.get(index);
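Putting the List approach together for the question's "name first, then numbers" layout, and converting back to a String[] only if main really needs an array, might look like this (GrowingRow and buildRow are hypothetical names; the numbers are passed in rather than scraped):

```java
import java.util.ArrayList;
import java.util.List;

final class GrowingRow {

    // Hypothetical helper: the List grows as items are added, so no size is needed up front.
    static List<String> buildRow(String name, int[] numbers) {
        List<String> ret = new ArrayList<>();
        ret.add("'" + name + "'");       // first position: the name
        for (int n : numbers) {
            ret.add(String.valueOf(n)); // following positions: the numbers
        }
        return ret;
    }

    public static void main(String[] args) {
        List<String> row = buildRow("Example", new int[] {3, 1, 4});
        System.out.println(row.get(0));            // 'Example'
        System.out.println(String.join(",", row)); // 'Example',3,1,4
        // If the caller needs a String[], convert once at the end.
        String[] asArray = row.toArray(new String[0]);
        System.out.println(asArray.length);        // 4
    }
}
```

String.join adds the separators when printing, so there is no need to store "," inside each element as the original code does.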
Java does not allow arrays with variable length. I think that this is your main problem.
There are two choices:
Obtain the array length first and instantiate the array accordingly
String[] ret = new String[100];
Use an ArrayList
ArrayList<String> ret = new ArrayList<String>();
You can add elements to the ArrayList like this: ret.add(value);
The Java Tutorial: Arrays
java.util.ArrayList reference

Extracting Table Data with JSoup on Yahoo Finance

Trying to practice extracting data from tables using JSoup. Can't figure out why I can't pull the "Shares Outstanding" field from
https://finance.yahoo.com/q/ks?s=AAPL+Key+Statistics
Here are two attempts, where 's' is AAPL:
public class YahooStatistics {
String sharesOutstanding = "Shares Outstanding:";
public YahooStatistics(String s) {
String keyStatisticsURL = ("https://finance.yahoo.com/q/ks?s="+s+"+Key+Statistics");
//Attempt 1
try {
Document doc = Jsoup.connect(keyStatisticsURL).get();
for (Element table : doc.select("table.yfnc_datamodoutline1")) {
for (Element row : table.select("tr")) {
Elements tds = row.select("td");
for (Element td : tds.select(sharesOutstanding)) {
System.out.println(td.ownText());
}
}
}
}
catch (IOException ex) {
ex.printStackTrace();
}
//Attempt 2
try {
Document doc = Jsoup.connect(keyStatisticsURL).get();
for (Element table : doc.select("table.yfnc_datamodoutline1")) {
for (Element row : table.select("tr")) {
Elements tds = row.select("td");
for (int j = 0; j < tds.size() - 1; j++) {
Element td = tds.get(j);
if ((td.ownText()).equals(sharesOutstanding)) {
System.out.println(tds.get(j+1).ownText());
}
}
}
}
}
catch(IOException ex) {
ex.printStackTrace();
}
}
}
Both attempts print nothing; the output is just BUILD SUCCESSFUL.
I've disabled JavaScript on my browser and the table still shows, so I'm assuming this is not written in JavaScript but HTML.
Any suggestions are appreciated.
Notes about your source after the edit:
You should compare ownText() rather than text(). text() gives you the combined text of all the element and all its sub-elements. In this case the element contains Shares Outstanding<font size="-1"><sup>5</sup></font>:, so its combined text is "Shares Outstanding5:". If you use ownText it will just be "Shares Outstanding:".
Note the colon (:). Update the value in sharesOutstanding accordingly.
You are passing it the wrong URL. There should be a + following the AAPL.
Your current query (at least the second attempt) is returning the element twice, because there is a nested table so it finds the TDs twice.
You can either break out of your loops once you have found a match, go back to your original version (with the corrections above; see the note below), or try a more specific query that matches only once:
Elements elems = doc.select("td.yfnc_tablehead1:containsOwn("+sharesOutstanding+") + td.yfnc_tabledata1");
if ( ! elems.isEmpty() ) {
System.out.println( elems.get(0).ownText() );
}
This selector gives you all the td elements whose class is yfnc_tabledata1, whose immediate preceding sibling is a td element whose class is yfnc_tablehead1 and whose own text contains the "Shares Outstanding:" string. This should basically select the exact TD you need.
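Since the live Yahoo page has changed since this was written, the selector and the ownText()/text() difference can be checked offline against a small piece of markup. This is a sketch assuming jsoup on the classpath; the HTML below is hypothetical, modelled on the structure described above, and the value 123.45M is made up:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

final class SharesOutstandingDemo {

    static final String LABEL = "Shares Outstanding:";

    // Returns the own text of the data cell right after the matching label cell, or null.
    static String findValue(Document doc) {
        Elements elems = doc.select(
                "td.yfnc_tablehead1:containsOwn(" + LABEL + ") + td.yfnc_tabledata1");
        return elems.isEmpty() ? null : elems.get(0).ownText();
    }

    public static void main(String[] args) {
        // Hypothetical markup; the footnote <sup> is what breaks a text() comparison.
        String html = "<table><tr>"
                + "<td class='yfnc_tablehead1'>Shares Outstanding"
                + "<font size='-1'><sup>5</sup></font>:</td>"
                + "<td class='yfnc_tabledata1'>123.45M</td>"
                + "</tr></table>";
        Document doc = Jsoup.parse(html);
        Element label = doc.select("td.yfnc_tablehead1").first();
        System.out.println(label.text());    // Shares Outstanding5:
        System.out.println(label.ownText()); // Shares Outstanding:
        System.out.println(findValue(doc));  // 123.45M
    }
}
```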
Note: the previous version of this answer was a long ramble about the difference between Elements.select() and Element.select(). It turns out that I was dead wrong, and your original version should have worked if you had corrected the four points above. So, to set the record straight: select() on an Elements does look inside each element, and the resulting list may contain descendants of any of the elements in the original list that match the selector. Sorry about that.

How to check the order of breadcrumbs in selenium

I have a header with breadcrumbs defined as First / Second / Third. I would like to check, using Selenium and with as little code as possible, whether the elements are displayed in the correct order.
<ol class="breadcrumb">
<li class="break-all">
First
<span class="divider">/</span>
</li>
<li class="break-all">
Second
<span class="divider">/</span>
</li>
<li class="break-all">
Third
<span class="divider">/</span>
</li>
</ol>
Now when I do findBy("//ol[@class='breadcrumb']"), I get the whole element.
First you need to find all breadcrumb elements, for example, via a cssSelector(). Then, for every WebElement in the list call getText() to get the actual text:
List<String> expected = Arrays.asList("First", "Second", "Third");
List<WebElement> breadcrumbs = driver.findElements(By.cssSelector("ol.breadcrumb li a"));
for (int i = 0; i < expected.size(); i++) {
String breadcrumb = breadcrumbs.get(i).getText();
if (breadcrumb.equals(expected.get(i))) {
System.out.println("passed on: " + breadcrumb);
} else {
System.out.println("failed on: " + breadcrumb);
}
}
findBy("//ol[@class='breadcrumb']").getText().equals("First / Second / Third");
This should work.
To solve your problem, you can follow the process below:
1- Create an ArrayList and add all the items that need to be compared.
2- Retrieve the link texts and put them in a new ArrayList.
3- Assert that the two ArrayLists match.
The code below should work for you:
//Adding all the list items to compare in an ArrayList
ArrayList<String> alist = new ArrayList<String>();
alist.add("First");
alist.add("Second");
alist.add("Third");
//Checking the Arraylist's data
System.out.println("The list values are as under: ");
for(String list_item: alist)
System.out.println(list_item);
//Creating an ArrayList to store the retrieved link texts
ArrayList<String> List_Compare = new ArrayList<String>();
//Retrieving the link texts and putting them into the Arraylist so created
List<WebElement> New_List = driver.findElements(By.xpath("//a[@class='break-all']"));
for(WebElement list_item: New_List){
List_Compare.add(list_item.getText());
}
//Checking the new Arraylist's data
System.out.println("The Retrieved list values are as under: ");
for(String list_item: List_Compare)
System.out.println(list_item);
//Asserting the original Arraylist matches to the Arraylist with retrieved Link Texts
try{
Assert.assertEquals(alist, List_Compare);
System.out.println("Equal lists");
}catch(Throwable e){
System.err.println("Lists are not equal. "+e.getMessage());
}
NOTE: Import the Assert class (import org.junit.Assert; or the older, deprecated import junit.framework.Assert;) for the last part of the code to work.
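The comparison itself does not need Selenium, so it can be pulled into a plain method and tested on its own. In this sketch (BreadcrumbCheck, inExpectedOrder and firstMismatch are hypothetical names), List.equals already checks both the size and the order; the Selenium lookup is shown only as a comment and reuses the //a[@class='break-all'] XPath from the answers, which assumes the breadcrumbs are <a> elements in the real page:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BreadcrumbCheck {

    // List.equals is true only if both lists have the same size and
    // equal elements in the same order, which is exactly the breadcrumb check.
    static boolean inExpectedOrder(List<String> expected, List<String> actual) {
        return expected.equals(actual);
    }

    // Returns the index of the first mismatch, or -1 if the lists match.
    static int firstMismatch(List<String> expected, List<String> actual) {
        int n = Math.min(expected.size(), actual.size());
        for (int i = 0; i < n; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return i;
            }
        }
        return expected.size() == actual.size() ? -1 : n;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("First", "Second", "Third");
        // In a real test, actual would be filled from Selenium, e.g.:
        //   for (WebElement e : driver.findElements(By.xpath("//a[@class='break-all']")))
        //       actual.add(e.getText());
        List<String> actual = new ArrayList<>(Arrays.asList("First", "Third", "Second"));
        System.out.println(inExpectedOrder(expected, actual)); // false
        System.out.println(firstMismatch(expected, actual));   // 1
    }
}
```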
You need to create a list of known values to compare with, and then use findElements() to find all the elements that should match. You also need to write the selector carefully so that it grabs exactly the list of expected elements.
//a[@class='break-all'] can be used to grab the list of elements you want:
String[] expected = {"First", "Second", "Third"};
List<WebElement> allOptions = driver.findElements(By.xpath("//a[@class='break-all']"));
// make sure you found the right number of elements
if (expected.length != allOptions.size()) {
System.out.println("fail, wrong number of elements found");
}
// make sure that the text of every breadcrumb element equals the expected value
for (int i = 0; i < expected.length; i++) {
String optionValue = allOptions.get(i).getText();
if (optionValue.equals(expected[i])) {
System.out.println("passed on: " + optionValue);
} else {
System.out.println("failed on: " + optionValue);
}
}
Implementation taken from here

Parsing links for href value using JSoup works for a single link, but not for an array of links

I have managed to successfully grab the href links using JSoup. I have also managed to grab the relative value and absolute value of a href for a single link. As shown below:
//works perfectly, website: bbc.co.uk
Document document = Jsoup.connect(url).get();
Element link = document.select("a").last();
String relHref = link.attr("href");
String absHref = link.attr("abs:href");
System.out.println(relHref);
System.out.println(absHref);
//output:
relHref: /help/web/links/
absHref: http://www.bbc.co.uk/help/web/links/
I can also use Element link = document.select("a").first(); and that works too. However, when I try to put this in a loop that iterates through all of the grabbed links and prints each one, I don't get the expected results. Here is my code:
//not working
Elements links = document.select("a");
for(int i=0; i<links.size(); i++){
String relHref = links.attr("href");
String absHref = links.attr("abs:href");
System.out.println(relHref);
System.out.println(absHref);
}
//output
http://m.bbc.co.uk
http://m.bbc.co.uk
http://m.bbc.co.uk
....
I know the links array of type Elements has the correct data, and if I try and print the elements in the links array it displays all of the href tags i.e.
for (Element link : links) {
System.out.println(link);
}
//output 116 links:
mobile site
<img src="http://static.bbci.co.uk/frameworks/barlesque/2.72.5/orb/4/img/bbc-blocks-dark.png" width="84" height="24" alt="BBC">
Skip to content
<a id="orb-accessibility-help" href="/accessibility/">Accessibility Help</a>
....
But how do I get the relHref and absHref for an array to work? Instead my code just prints out the first link over and over again. I've been going at this for hours, so I'm probably making a silly mistake somewhere but help is appreciated!
Thanks.
On this line:
String relHref = links.attr("href");
...how is it supposed to know you're talking about the ith link? (It doesn't: Elements#attr returns the value from the first element in the collection that has the attribute, no matter what i is.)
You want
String relHref = links.get(i).attr("href");
...which gets the specific link you're interested in via Elements#get, then uses Node#attr on it.
That said, though, I would just use the enhanced for loop:
for (Element link : document.select("a")) {
String relHref = link.attr("href");
String absHref = link.attr("abs:href");
System.out.println(relHref);
System.out.println(absHref);
}
...unless you need i for something.
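The behaviour can be reproduced without hitting bbc.co.uk. In this sketch (assuming jsoup on the classpath; LinkLister, absoluteHrefs and the example.com URLs are hypothetical), the page is parsed with an explicit base URI, which is what "abs:href" resolves against. Jsoup.connect(url).get() sets that base URI for you, whereas Jsoup.parse(html) without one gives "" for relative links:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

final class LinkLister {

    // Returns the absolute URL of every <a href> in the document, one per link.
    static List<String> absoluteHrefs(Document doc) {
        List<String> out = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            // attr() on the single Element, not on the Elements collection.
            out.add(link.attr("abs:href"));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<a href='/help/'>Help</a><a href='news/'>News</a>"
                + "<a href='http://other.example/x'>Other</a>";
        Document doc = Jsoup.parse(html, "http://www.example.com/");
        Elements links = doc.select("a[href]");
        // Collection-level attr(): only the first element's value, the bug in the question.
        System.out.println(links.attr("href")); // /help/
        // One line per link: http://www.example.com/help/, .../news/, http://other.example/x
        for (String abs : absoluteHrefs(doc)) {
            System.out.println(abs);
        }
    }
}
```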
You need to use the Elements method, get(int index) inside of your for loop to get each Element held by your Elements.
e.g.,
Elements links = document.select("a");
for(int i=0; i < links.size(); i++) {
Element ele = links.get(i);
/// use ele here to extract info from each Element
}
