I'm new to Selenium and I would like to download all the pdf, ppt(x) and doc(x) files from a website. I have written the following code, but I'm confused about how to get to the inner links:
import java.io.*;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
public class WebScraper {
String loginPage = "https://blablah/login";
static String userName = "11";
static String password = "11";
static String mainPage = "https://blahblah";
public WebDriver driver = new FirefoxDriver();
ArrayList<String> visitedLinks = new ArrayList<>();
public static void main(String[] args) throws IOException {
System.setProperty("webdriver.gecko.driver", "E:\\geckodriver.exe");
WebScraper webScraper = new WebScraper();
webScraper.openTestSite();
webScraper.login(userName, password);
webScraper.getText(mainPage);
webScraper.saveScreenshot();
webScraper.closeBrowser();
}
/**
* Open the test website.
*/
public void openTestSite() {
driver.navigate().to(loginPage);
}
/**
* Logs into the website by entering the provided username and password.
*
* @param username
* @param Password
*/
public void login(String username, String Password) {
WebElement userName_editbox = driver.findElement(By.id("IDToken1"));
WebElement password_editbox = driver.findElement(By.id("IDToken2"));
WebElement submit_button = driver.findElement(By.name("Login.Submit"));
userName_editbox.sendKeys(username);
password_editbox.sendKeys(Password);
submit_button.click();
}
/**
* grabs the status text and saves that into status.txt file
*
* @throws IOException
*/
public void getText(String website) throws IOException {
driver.navigate().to(website);
try {
Thread.sleep(10000);
} catch (InterruptedException e) {
e.printStackTrace();
}
List<WebElement> allLinks = driver.findElements(By.tagName("a"));
System.out.println("Total no of links Available: " + allLinks.size());
for (int i = 0; i < allLinks.size(); i++) {
String fileAddress = allLinks.get(i).getAttribute("href");
System.out.println(allLinks.get(i).getAttribute("href"));
if (fileAddress.contains("download")) {
driver.get(fileAddress);
} else {
// getText(allLinks.get(i).getAttribute("href"));
}
}
}
/**
* Saves the screenshot
*
* @throws IOException
*/
public void saveScreenshot() throws IOException {
File scrFile = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
FileUtils.copyFile(scrFile, new File("screenshot.png"));
}
public void closeBrowser() {
driver.close();
}
}
I have an if clause which checks whether the current link is a downloadable file (i.e. its address contains the word "download"). If it is, I get it; if not, what should I do? That part is my problem. I tried to implement a recursive function that retrieves the nested links and repeats the steps for them, but with no success.
In the meantime, the first link found when giving https://blahblah as the input is https://blahblah/#, which refers to the same page as https://blahblah. That can also cause a problem, but currently I'm stuck on the other problem, namely the implementation of the recursive function. Could you please help me?
You are not far off. Answering your question: grab all the links into a list of elements, then iterate, click and wait. In C# it looks something like this:
IList<IWebElement> listOfLinks = _driver.FindElements(By.XPath("//a"));
foreach (var link in listOfLinks)
{
if(link.GetAttribute("href").Contains("download"))
{
link.Click();
WaitForSecs(); //Thread.Sleep(1000)
}
}
JAVA
List<WebElement> listOfLinks = webDriver.findElements(By.xpath("//a"));
for (WebElement link :listOfLinks ) {
if(link.getAttribute("href").contains("download"))
{
link.click();
//WaitForSecs(); //Thread.Sleep(1000)
}
}
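If you also want to follow the nested (non-download) links rather than only the ones on the first page, a minimal recursive sketch in the same spirit might look like this. It assumes the driver and mainPage fields from your own class, keeps a visited set so the crawl terminates, and uses the same "contains download" heuristic, which is only illustrative:
// Sketch only: assumes the driver and mainPage fields from the question's class.
// Set and HashSet come from java.util.
private final Set<String> visited = new HashSet<>();

public void crawl(String url, int depth) {
    if (depth <= 0 || !visited.add(url)) {
        return; // too deep, or already seen
    }
    driver.navigate().to(url);
    // Collect the hrefs first; navigating away would make the WebElements stale.
    List<String> hrefs = new ArrayList<>();
    for (WebElement link : driver.findElements(By.tagName("a"))) {
        String href = link.getAttribute("href");
        if (href != null && !href.isEmpty()) {
            hrefs.add(href);
        }
    }
    for (String href : hrefs) {
        if (href.contains("download")) {
            driver.get(href);          // fetch the downloadable file
        } else if (href.startsWith(mainPage)) {
            crawl(href, depth - 1);    // recurse only into internal links
        }
    }
}
Seeding it with crawl(mainPage, 2) visits the main page and one level of internal links.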
One option is to embed Groovy in your Java code if you want to search depth-first. When HTTPBuilder parses a page, it gives you an XML-like document, and you can then traverse as deep as you like using GPath in Groovy. Your test.groovy would look like this:
@Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.7' )
import groovyx.net.http.HTTPBuilder
import static groovyx.net.http.Method.GET
import static groovyx.net.http.ContentType.JSON
import groovy.json.*
import org.cyberneko.html.parsers.SAXParser
import groovy.util.XmlSlurper
import groovy.json.JsonSlurper
urlValue="http://yoururl.com"
def http = new HTTPBuilder(urlValue)
//parses the page and provides an XML tree; it even handles malformed HTML
def parsedText = http.get([:])
// number of a tags. "**" will parse depth-first
aCount= parsedText."**".findAll {it.name()=='a'}.size()
Then you just call test.groovy from Java like this:
static void runWithGroovyShell() throws Exception {
new GroovyShell().parse( new File( "test.groovy" ) ).invokeMethod( "hello_world", null ) ;
}
More info on parsing html with groovy
Addition: when you evaluate Groovy within Java and want to access Groovy variables in the Java environment through Groovy bindings, have a look here.
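As a small hedged sketch of that binding approach (it assumes test.groovy assigns its results to script-level variables such as aCount without def, as in the script above, so they end up in the binding):
import groovy.lang.Binding;
import groovy.lang.GroovyShell;
import java.io.File;

public class GroovyRunner {
    public static void main(String[] args) throws Exception {
        Binding binding = new Binding();
        binding.setVariable("urlValue", "http://yoururl.com"); // visible inside test.groovy
        GroovyShell shell = new GroovyShell(binding);
        shell.evaluate(new File("test.groovy"));               // run the script
        Object aCount = binding.getVariable("aCount");         // read a script variable back
        System.out.println("Number of <a> tags: " + aCount);
    }
}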
Related
I am a beginner in Selenium; I do not have any hands-on experience with it. Last month I enrolled in a Selenium beginner-to-advanced course that has a few activities I can practise on.
I am stuck at a certain place. Let me explain my issue.
This is the activity description:
RelativeXpathLocator
URL: http://webapps.tekstac.com/Shopify/
Test Procedure:
Use the template code.
Don't make any changes in DriverSetup file.
Only in the suggested section add the code to,
Invoke the driver using getWebDriver() method defined in DriverSetup()
Identify the web element of the value 'SivaKumar' using xpath locator and return it.
Using the same web element, get the text and return it.
The code that I wrote for this:
//Add required imports
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
public class RelativeXpathLocator //DO NOT Change the class Name
{
static WebDriver driver;
static String baseUrl = "http://webapps.tekstac.com/Shopify/";
public WebDriver createDriver() //DO NOT change the method signature
{
DriverSetup ds = new DriverSetup();
return ds.getWebDriver();
//Implement code to create Driver from DriverSetup and return it
}
public WebElement getRelativeXpathLocator(WebDriver driver)//DO NOT change the method signature
{
WebElement l = driver.findElement(By.xpath("//*[@id='tbrow']/td[3]"));
return (l);
/*Replace this comment by the code statement to get the Web element */
/*Find and return the element */
}
public String getName(WebElement element)//DO NOT change the method signature
{
return element.getAttribute("tbrow");
//Get the attribute value from the element and return it
}
public static void main(String[] args){
RelativeXpathLocator pl=new RelativeXpathLocator();
driver = pl.createDriver();
//WebElement les = pl.getRelativeXpathLocator(driver);
//String las = pl.getName(les);
//Add required code
}
}
I'm kind of stuck here. I'm not sure what mistake I've made in getName() or main().
The last part throws an error while compiling; it says "Unable to locate name using xpath expected: but was:".
Please advise.
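The working version below differs from the original in two places: the XPath addresses the row explicitly (//tr[@id='tbrow']/td[3]), and getName() returns the element's visible text with getText() instead of calling getAttribute("tbrow") (the cell has no attribute called tbrow, so that call returns nothing, which is most likely why the comparison reports an empty actual value).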
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
public class RelativeXpathLocator //DO NOT Change the class Name
{
static WebDriver driver;
static String baseUrl = "http://webapps.tekstac.com/Shopify/";
public WebDriver createDriver() //DO NOT change the method signature
{
DriverSetup ds = new DriverSetup();
return ds.getWebDriver();
//Implement code to create Driver from DriverSetup and return it
}
public WebElement getRelativeXpathLocator(WebDriver driver)//DO NOT change the method signature
{
WebElement element = driver.findElement(By.xpath("//tr[@id='tbrow']/td[3]"));
return element;
/*Replace this comment by the code statement to get the Web element */
/*Find and return the element */
}
public String getName(WebElement element)//DO NOT change the method signature
{
return element.getText();
//Get the attribute value from the element and return it
}
public static void main(String[] args){
RelativeXpathLocator pl=new RelativeXpathLocator();
driver = pl.createDriver();
//WebElement les = pl.getRelativeXpathLocator(driver);
//String las = pl.getName(les);
//Add required code
}
}
I am trying to automate a test case where I have to take a screenshot of a particular screen that exists on different websites. Specifically, I am trying to test whether a particular checkbox is aligned or not. Below is my script; I am using AShot to take the screenshots. The script logs into the three systems and clicks on the link I want it to click, but there is only a single screenshot from the last URL instead of a screenshot from every URL. Please help me understand how I can make AShot iterate so that it takes a screenshot for every website instead of what it is doing right now. Essentially all the steps are iterated except taking the screenshot, and I want the script to iterate through the screenshots as well.
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.Assert;
import org.testng.annotations.*;
import ru.yandex.qatools.ashot.AShot;
import ru.yandex.qatools.ashot.Screenshot;
import ru.yandex.qatools.ashot.shooting.ShootingStrategies;
public class checkboxAlignment {
String driverPath = "C:\\Users\\xxx\\Desktop\\Work\\chromedriver.exe";
public WebDriver driver;
public String expected = null;
public String actual = null;
@BeforeTest
public void launchBrowser() {
System.out.println("launching chrome browser");
System.setProperty("webdriver.chrome.driver", driverPath);
driver = new ChromeDriver();
}
@Test(dataProvider = "URLprovider")
private void notePrice(String url) throws IOException {
driver.get(url);
System.out.println(driver.getCurrentUrl());
WebElement email = driver.findElement(By.xpath("//input[@id='Email']"));
WebElement password = driver.findElement(By.xpath("//input[@id='PWD']"));
WebElement submit = driver.findElement(By.xpath("//button[@type='submit']"));
email.sendKeys("xxx#xxx.com");
password.sendKeys("xxx");
submit.click();
System.out.println(driver.getTitle());
driver.manage().window().maximize();
//click on the PI tab
driver.findElement(By.id("IDpi")).click();
// This does not iterate; only one screenshot is taken by AShot
Screenshot fpScreenshot = new AShot().shootingStrategy(ShootingStrategies.viewportPasting(1000)).takeScreenshot(driver);
ImageIO.write(fpScreenshot.getImage(),"PNG",new File("C://Users//dir//eclipse-workspace//someDir//screenshots//checkbox.jpg"));
}
@DataProvider(name = "URLprovider")
private Object[][] getURLs() {
return new Object[][] { { "http://www.someURL.com/A" }, { "http://www.someurl.com/B" },
{ "http://www.someurl.com/C" } };
}
}
You are saving all the screenshots to the same file, checkbox.jpg; that is why each earlier screenshot is replaced by the last one. Name the file differently for every screenshot, and save it with a .png extension, since that is the actual file type. Note that the current URL contains characters such as ':' and '/' that are not valid in file names, so strip or replace them first.
Try this for saving the image:
ImageIO.write(fpScreenshot.getImage(), "PNG", new File("C://Users//dir//eclipse-workspace//someDir//screenshots//checkbox-" + driver.getCurrentUrl().replaceAll("[^a-zA-Z0-9.-]", "_") + ".png"));
I'm doing something like this
#Step("Захват страницы для хранилища")
protected void capturePageToVault(String pageName, String url, int scrollTime) throws IOException {
open(url);
expected = capturePage(scrollTime);
ImageIO.write(expected.getImage(), "png", expectedImg(pageName));
attach = new FileInputStream(expectedImg(pageName));
Allure.addAttachment("Exemplar", "image/png", attach, ".png");
attach.close();
}
I have just started out with automation and I am stuck on how to enter the day, month and year, which are in three different drop-downs with different XPaths, in a more efficient way, so that I do not have to use the Select class for every single one of them.
Here is the code :
package com.singh.assignment;
import java.io.FileReader;
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.Select;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
public class Json
{
public static void main(String args[])
{
JsonParser parser = new JsonParser();
try
{
Object obj = parser.parse(new FileReader("C:\\Users\\dell\\eclipse-workspace\\Assignment\\data.json"));
JsonObject jsonObject = (JsonObject) obj;
String fname = (String) jsonObject.get("fname").getAsString();
String lname = (String) jsonObject.get("lname").getAsString();
String baseurl = (String) jsonObject.get("baseurl").getAsString();
String mstatus = (String) jsonObject.get("mstatus").getAsString();
System.setProperty("webdriver.gecko.driver","E:\\WORK\\geckodriver.exe\\");
WebDriver driver = new FirefoxDriver();
driver.get(baseurl);
driver.findElement(By.partialLinkText("Registration")).click();
driver.findElement(By.xpath("//input[#id =
'name_3_firstname']")).sendKeys(fname);
driver.findElement(By.xpath("//input[#id =
'name_3_lastname']")).sendKeys(lname);
List<WebElement> martial = driver.findElements(By.name("radio_4[]"));
{
for(WebElement radio : martial)
{
if(radio.getAttribute("value").equalsIgnoreCase(mstatus))
{
radio.click();
}
}
}
driver.findElement(By.xpath("//input[#value = 'reading']")).click();
WebElement cntry = driver.findElement(By.xpath("//select[@id = 'dropdown_7']"));
Thread.sleep(3000);
Select index = new Select(cntry);
index.selectByVisibleText("India");
WebElement month = driver.findElement(By.id("mm_date_8"));
Select index1 = new Select(month);
index1.selectByVisibleText("9");
WebElement date = driver.findElement(By.id("dd_date_8"));
Select index2 = new Select(date);
index2.selectByVisibleText("15");
WebElement year = driver.findElement(By.id("dd_date_8"));
Select index3 = new Select(year);
index3.selectByVisibleText("1995");
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
And here is the JSON File :
{
"baseurl": "http://demoqa.com/",
"fname": "AKASHDEEP",
"lname": "SINGH",
"mstatus": "single",
"hobby": ["Dance", "Reading", "Cricket"]
}
I'd say that you need to completely revise your test implementation approach. An ideal test case shouldn't know anything about WebDriver, locators or hardcoded data. You should try building several abstraction layers that encapsulate driver calls within the framework, locators within page objects, and test data within external storage (plus entities and data providers).
If we talk about some basic optimization for your particular scenario, I'd start by creating an abstract page that hides explicit interactions with the WebDriver:
import java.util.function.Function;

import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.ExpectedCondition;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.Select;
import org.openqa.selenium.support.ui.WebDriverWait;

public abstract class AbstractPage {
private final WebDriverWait wait;
public AbstractPage() {
// assuming some external driver provider
this.wait = new WebDriverWait(getDriver(), 10);
}
public void selectByVisibleText(final By locator, final String text) {
new Select(waitFor(locator, ExpectedConditions::visibilityOfElementLocated)).selectByVisibleText(text);
}
public void selectByVisibleText(final By locator, final int number) {
selectByVisibleText(locator, String.valueOf(number));
}
private WebElement waitFor(final By locator, final Function<By, ExpectedCondition<WebElement>> condition) {
return wait.until(condition.apply(locator));
}
}
Then you can just create a page object for your domain logic, which reuses the common dropdown selection method:
import java.time.LocalDate;

import org.openqa.selenium.By;

public class HomePage extends AbstractPage {
private By dropdownDate = By.id("date");
private By dropdownMonth = By.id("month");
private By dropdownYear = By.id("year");
public HomePage selectDate(final LocalDate date) {
selectByVisibleText(dropdownMonth, date.getMonthValue());
selectByVisibleText(dropdownDate, date.getDayOfMonth());
selectByVisibleText(dropdownYear, date.getYear());
return this;
}
}
And in your test case you'd just call selectDate(date), which is a much more concise and human-readable way to express the business logic.
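For instance, the test could then express the whole date selection in one call; this is just a sketch, with the hardcoded values taken from your original script and the driver wiring assumed to come from the framework:
// Sketch: assumes HomePage obtains its WebDriver from the framework, as above.
LocalDate dateOfBirth = LocalDate.of(1995, 9, 15);
new HomePage().selectDate(dateOfBirth);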
Consider a URL such as www.example.com. It may have plenty of links; some may be internal and others external. I want to get a list of all the sub-links, not the sub-sub-links, but only the direct sub-links.
E.g. if there are four links as follows:
1)www.example.com/images/main
2)www.example.com/data
3)www.example.com/users
4)www.example.com/admin/data
Then out of the four, only 2 and 3 are of use, as they are sub-links rather than sub-sub (and deeper) links. Is there a way to achieve this through jsoup? If it cannot be achieved through jsoup, can someone introduce me to some other Java API?
Also note that it should be a link of the parent URL which is initially sent (i.e. www.example.com).
If I understand correctly, a sub-link contains exactly one slash, so you can attempt this by counting the number of slashes, for example:
List<String> list = new ArrayList<>();
list.add("www.example.com/images/main");
list.add("www.example.com/data");
list.add("www.example.com/users");
list.add("www.example.com/admin/data");
for(String link : list){
if((link.length() - link.replaceAll("[/]", "").length()) == 1){
System.out.println(link);
}
}
link.length() counts all the characters in the link; link.replaceAll("[/]", "").length() counts the characters after the slashes have been removed, so the difference between the two is the number of slashes.
If the difference equals one, it is a direct sub-link; otherwise it is not.
EDIT
How will I scan the whole website for sub-links?
One answer to this is the robots.txt file (the Robots Exclusion Standard), in which a site lists many of its top-level paths; see for example https://stackoverflow.com/robots.txt. The idea is to read this file and extract the sub-links from it. Here is a piece of code that can help you:
public static void main(String[] args) throws Exception {
//Your web site
String website = "http://stackoverflow.com";
//We will read the URL https://stackoverflow.com/robots.txt
URL url = new URL(website + "/robots.txt");
//List of your sub-links
List<String> list;
//Read the file with BufferedReader
try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
String subLink;
list = new ArrayList<>();
//Loop throw your file
while ((subLink = in.readLine()) != null) {
//Check if the sub-link is match with this regex, if yes then add it to your list
if (subLink.matches("Disallow: \\/\\w+\\/")) {
list.add(website + "/" + subLink.replace("Disallow: /", ""));
}else{
System.out.println("not match");
}
}
}
//Print your result
System.out.println(list);
}
This will show you :
[https://stackoverflow.com/posts/, https://stackoverflow.com/posts?,
https://stackoverflow.com/search/, https://stackoverflow.com/search?,
https://stackoverflow.com/feeds/, https://stackoverflow.com/feeds?,
https://stackoverflow.com/unanswered/,
https://stackoverflow.com/unanswered?, https://stackoverflow.com/u/,
https://stackoverflow.com/messages/, https://stackoverflow.com/ajax/,
https://stackoverflow.com/plugins/]
Here is a demo of the regex that I use.
Hope this can help you.
To scan the links on a web page you can use the Jsoup library.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
class read_data {
public static void main(String[] args) {
try {
Document doc = Jsoup.connect("**your_url**").get();
Elements links = doc.select("a");
List<String> list = new ArrayList<>();
for (Element link : links) {
list.add(link.attr("abs:href"));
}
} catch (IOException ex) {
}
}
}
list can be used as suggested in the previous answer.
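Putting the two answers together, here is a hedged sketch that collects the absolute links with Jsoup and then keeps only those that belong to the parent URL and have a single path segment. It uses java.net.URI to inspect the path instead of counting slashes in the whole URL, because an absolute URL already contains slashes in its scheme and host part; the parent URL here is just a placeholder:
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SubLinkFilter {
    public static void main(String[] args) throws Exception {
        String parent = "http://www.example.com";
        Document doc = Jsoup.connect(parent).get();

        List<String> subLinks = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            String url = link.attr("abs:href");
            if (!url.startsWith(parent)) {
                continue; // external link, skip it
            }
            String path = URI.create(url).getPath(); // e.g. "/data" or "/images/main"
            if (path == null || path.length() <= 1) {
                continue; // the parent page itself
            }
            if (path.endsWith("/")) {
                path = path.substring(0, path.length() - 1); // ignore a trailing slash
            }
            // keep links with exactly one path segment: "/data" but not "/images/main"
            if (path.indexOf('/', 1) == -1) {
                subLinks.add(url);
            }
        }
        subLinks.forEach(System.out::println);
    }
}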
The code for reading all the links on a website is given below. I have used http://stackoverflow.com/ for illustration. I would recommend that you go through the company's terms of use before scraping its website.
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
public class readAllLinks {
public static Set<String> uniqueURL = new HashSet<String>();
public static String my_site;
public static void main(String[] args) {
readAllLinks obj = new readAllLinks();
my_site = "stackoverflow.com";
obj.get_links("http://stackoverflow.com/");
}
private void get_links(String url) {
try {
Document doc = Jsoup.connect(url).get();
Elements links = doc.select("a");
links.stream().map((link) -> link.attr("abs:href")).forEachOrdered((this_url) -> {
boolean add = uniqueURL.add(this_url);
if (add && this_url.contains(my_site)) {
System.out.println(this_url);
get_links(this_url);
}
});
} catch (IOException ex) {
}
}
}
You will get list of all the links in uniqueURL field.
I am trying to click on a web element using Selenium WebDriver(2.21.0).
When I drive it through the Selenium IDE it works properly, but when I try the same set of actions using the Java implementation of the Firefox driver, it leads to the wrong page.
If I manually scroll to the desired element while the code is running, it works.
I am making sure that the web element is visible and enabled using:
By by = By.xpath("(//a[contains(#href, 'javascript:void(0);')])[26]"); //**Edit:** this is how i
//am getting the locator
WebElement element = driver.findElement(by);
return (element.isEnabled() || element.isDisplayed()) ? element : null;
which returns some element, but not the one I am expecting.
This looks strange to me, as Selenium WebDriver usually scrolls to an element by itself (if it is not visible on the screen) and performs the required interaction.
I have tried some answers like one and two, with no success.
Thanks in advance!
EDIT: here is the IDE's exported code (Java / JUnit 4 / WebDriver):
package com.example.tests;
import java.util.regex.Pattern;
import java.util.concurrent.TimeUnit;
import org.junit.*;
import static org.junit.Assert.*;
import static org.hamcrest.CoreMatchers.*;
import org.openqa.selenium.*;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.Select;
public class Bandar {
private WebDriver driver;
private String baseUrl;
private StringBuffer verificationErrors = new StringBuffer();
@Before
public void setUp() throws Exception {
driver = new FirefoxDriver();
baseUrl = "http://e.weibo.com/";
driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
}
@Test
public void testBandar() throws Exception {
driver.get(baseUrl + "/nescafechina");
driver.findElement(By.xpath("(//a[contains(#href, 'javascript:void(0);')])[26]")).click();
driver.findElement(By.xpath("(//a[contains(#href, 'javascript:void(0);')])[12]")).click();
}
@After
public void tearDown() throws Exception {
driver.quit();
String verificationErrorString = verificationErrors.toString();
if (!"".equals(verificationErrorString)) {
fail(verificationErrorString);
}
}
private boolean isElementPresent(By by) {
try {
driver.findElement(by);
return true;
} catch (NoSuchElementException e) {
return false;
}
}
}
Ishank,
What I have done is gone through and created a test that shows the different kinds of asserts you can use in your testing. They do look a little different from what you are working with. I feel your main problem is WebElement element = driver.findElement(by); because you are not giving it a proper locator. The (by) part is a locator built from a string that can be found on the page. Acceptable locators would be id("gbfqb"), xpath("(//a[contains(@href, 'javascript:void(0);')])[26]"), or even name("find-button").
/**
* Test the main Google page.
* @throws InterruptedException
*
*/
@Test
public void signUp() throws InterruptedException {
String testId = "TestStack01";
entered(testId);
webDriver.get("www.google.com");
webDriver.findElement(By.id("gbqfq")).clear();
webDriver.findElement(By.id("gbqfq")).sendKeys("Test");
assertEquals("", webDriver.findElement(By.id("gbqfb")).getText());
WebElement whatyourlookingfor = webDriver.findElement(By.id("gbqfb"));
assertTrue(selenium.isElementPresent("gbqfb"));
assertTrue(whatyourlookingfor.isEnabled());
assertTrue(whatyourlookingfor.isDisplayed());
assertFalse(whatyourlookingfor.isSelected());
webDriver.findElement(By.id("gbqfb")).click();
leaving(testId);
}
I hope that this has helped you work out which element is being returned.
Curtis Miller