How to prevent getting duplicate hrefs using getAttribute Selenium

I keep getting the same links from the site I'm testing.
This is my code:
List<WebElement> activeLinks = new ArrayList<WebElement>();
// 2. Iterate over the link list: skip links/images that have no href attribute
//    and skip hrefs that contain "javascript".
for (WebElement link : AllTheLinkList) {
    try {
        // Read the attribute once instead of calling getAttribute three times
        String href = link.getAttribute("href");
        if (href != null && !href.contains("javascript") && href.contains("pharmacy")) {
            activeLinks.add(link);
        }
    } catch (org.openqa.selenium.StaleElementReferenceException ex) {
        // The element went stale; skip it and continue with the next link
    }
}
// Number of other (filtered-out) links
log.info("Other Links ---> " + (AllTheLinkList.size() - activeLinks.size()));
// Number of active pharmacy links and images on the page
log.info("Size of active links and images in pharmacy ---> " + activeLinks.size());
for (int j = 0; j < activeLinks.size(); j++) {
    HttpURLConnection connection = (HttpURLConnection) new URL(activeLinks.get(j).getAttribute("href")).openConnection();
    connection.setConnectTimeout(4000);
    connection.connect();
    String response = connection.getResponseMessage(); // e.g. "OK"
    int code = connection.getResponseCode();
    connection.disconnect();
    log.info((j + 1) + "/" + activeLinks.size() + " " + activeLinks.get(j).getAttribute("href") + "---> status:" + response + " ----> code:" + code);
}
And this is my output:
I'm getting the same links again and again; they keep repeating.
Can anybody help me with this?

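Try copying the href values into a Set, because a Set does not allow duplicates. Note that the Set has to hold the href strings, not the WebElement objects: two different anchors that point to the same URL are still different elements, so a Set<WebElement> would keep both.
For example:
WebDriver driver = new ChromeDriver();
List<WebElement> anchors = driver.findElements(By.tagName("a"));
// LinkedHashSet drops repeated hrefs and keeps the order in which they were found
Set<String> hrefs = new LinkedHashSet<String>();
for (WebElement anchor : anchors) {
    String href = anchor.getAttribute("href");
    if (href != null) {
        hrefs.add(href);
    }
}
for (String href : hrefs) {
    System.out.println(href);
}
Hope this helps.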
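As a minimal sketch of the de-duplication step on its own, assuming the href values have already been read into a list of strings: the class and method names below (HrefDeduplicator, uniqueHrefs) are hypothetical and not part of Selenium, and the URLs are made-up placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of Selenium.
class HrefDeduplicator {

    // Returns the hrefs without repeats, in first-seen order.
    // Null values and hrefs containing "javascript" are skipped,
    // mirroring the filter used in the question.
    static List<String> uniqueHrefs(List<String> hrefs) {
        Set<String> seen = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href != null && !href.contains("javascript")) {
                seen.add(href); // add() ignores values already in the set
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Placeholder URLs standing in for values read via getAttribute("href")
        List<String> raw = Arrays.asList(
                "https://example.com/pharmacy/a",
                "https://example.com/pharmacy/b",
                "https://example.com/pharmacy/a",
                null,
                "javascript:void(0)");
        System.out.println(uniqueHrefs(raw));
    }
}
```

Checking each unique URL once also avoids opening a second HTTP connection for a link that was already tested.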

Related

I can log in to a webpage (aspx) with Python but I can't with Jsoup

import requests as rq
url = "https://ogrotomasyon.uludag.edu.tr/login.aspx?un="
data = { "__EVENTTARGET" : "", "__EVENTARGUMENT" : "",
"__VIEWSTATE" : "",
"__VIEWSTATEGENERATOR" : "C2EE9ABB",
"__EVENTVALIDATION" : "",
"un":"USERNAME", "pw":"PASSWORD", "ok22": "Giriş"}
s = rq.Session()
a = s.get(url, verify=0)
print(a.text)
d1 = input()
d2 = input()
data["__VIEWSTATE"] = d1
data["__EVENTVALIDATION"] = d2
a = s.post(url, data=data, verify=0)
s.get(a.url)
print(a.url)
I can log in to the system with this code when I copy __VIEWSTATE and __EVENTVALIDATION correctly.
Careful: I'm using a single session. If I just use rq.get("URL"), I cannot log in. I have to use one and the same session for every request.
This is my Java code:
public static void main(String[] args) {
try {
Connection.Response first = Jsoup.connect("https://ogrotomasyon.uludag.edu.tr/login.aspx")
.validateTLSCertificates(false)
.method(Connection.Method.GET)
.execute();
Elements form = first.parse().select("form");
Elements inputs = form.select("input");
System.out.println(inputs.get(0).id() + " " + inputs.get(0).val());
System.out.println(inputs.get(1).id() + " " + inputs.get(1).val());
System.out.println(inputs.get(2).id() + " " + inputs.get(2).val());
System.out.println(inputs.get(3).id() + " " + inputs.get(3).val());
System.out.println(inputs.get(4).id() + " " + inputs.get(4).val());
System.out.println(inputs.get(7).id() + " " + inputs.get(7).val());
Document second = Jsoup.connect("https://ogrotomasyon.uludag.edu.tr/login.aspx")
.validateTLSCertificates(false)
.data(inputs.get(0).id(), inputs.get(0).val())
.data(inputs.get(1).id(), inputs.get(1).val())
.data(inputs.get(2).id(), inputs.get(2).val())
.data(inputs.get(3).id(), inputs.get(3).val())
.data(inputs.get(4).id(), inputs.get(4).val())
.data("un", "USERNAME")
.data("pw", "PASSWORD")
.data(inputs.get(7).id(), inputs.get(7).val())
.post();
System.out.println(second.text());
} catch (IOException e) {
e.printStackTrace();
}
}
}
But I can't log in with this code. Actually, sometimes I can, but a few times it throws java.net.UnknownHostException.
How can I log in with the second piece of code?

Request parameters coming from JSPs are changed when two different users access the code at the same time

public String generateDataPDF() {
System.out.println("Inside generate PDF");
String filePath = "";
HttpSession sess = ServletActionContext.getRequest().getSession();
try {
sess.setAttribute("msg", "");
if (getCrnListType().equalsIgnoreCase("F")) {
try {
filePath = getModulePath("CRNLIST_BASE_LOCATION") + File.separator + getCrnFileFileName();
System.out.println("File stored path : " + filePath);
target = new File(filePath);
FileUtils.copyFile(crnFile, target);
} catch (Exception e) {
System.out.println("File path Exception " + e);
}
}
System.out.println("Values from jsp are : 1)Mode of Generation : " + getCrnListType() + " 2)Policy Number : " + getCrnNumber() + " 3)Uploaded File Name : " + getCrnFileFileName() + " 4)LogoType : " + getLogoType()
+ " 5)Output Path : " + getOutputPath() + " 6)Type of Generation : " + getOptionId() + " 7)PDF Name : " + getPdfName());
String srtVAL = "";
String arrayVaue[] = new String[]{getCrnListType(), getCrnListType().equalsIgnoreCase("S") ? getCrnNumber() : filePath, getLogoType().equalsIgnoreCase("WL") ? "0" : "1",
getOutputPath(), getGenMode(), getRenType()};
//INS DB Connection
con = getInsjdbcConnection();
ArrayList selectedCRNList = new ArrayList();
String selectedCRNStr = "";
selectedCRNStr = getSelectedVal(selectedCRNStr, arrayVaue[1]);
String[] fileRes = selectedCRNStr.split("\\,");
if (fileRes[0].equalsIgnoreCase("FAIL")) {
System.out.println("fileRes is FAIL beacause of other extension file.");
sess.setAttribute("pr", "Please upload xls or csv file.");
return SUCCESS;
}
System.out.println("List file is : " + selectedCRNStr);
String st[] = srtVAL.split("[*]");
String billDateStr = DateUtil.getStrDateProc(new Date());
Timestamp strtPasrsingTm = new Timestamp(new Date().getTime());
String minAMPM = DateUtil.getTimeDate(new Date());
String str = "";
String batchID = callSequence();
try {
System.out.println("Inside Multiple policy Generation.");
String userName=sess.getAttribute("loginName").toString();
String list = getProcessesdList(userName);
if (list != null) {
System.out.println("list is not null Users previous data is processing.....");
//setTotalPDFgNERATEDmSG("Data is processing please wait.");
sess.setAttribute("pr","Batch Id "+list+" for User " + userName + " is currently running.Please wait till this Process complete.");
return SUCCESS;
}
String[] policyNo = selectedCRNStr.split("\\,");
int l = 0, f = 0,counter=1;
for (int j = 0; j < policyNo.length; j++,counter++) {
String pdfFileName = "";
int uniqueId=counter;
globUniqueId=uniqueId;
insertData(batchID, new Date(), policyNo[j], getOptionId(), userName,uniqueId);
System.out.println("Executing Proc one by one.");
System.out.println("policyNo[j]" + policyNo[j]);
System.out.println("getOptionId()" + getOptionId());
System.out.println("seqValue i.e batchId : " + batchID);
str = callProcedure(policyNo[j], getOptionId(), batchID);
String[] procResponse = str.split("\\|");
for (int i = 0; i < procResponse.length; i++) {
System.out.println("Response is : " + procResponse[i]);
}
if (procResponse[0].equals("SUCCESS")) {
Generator gen = new Generator();
if (getPdfName().equalsIgnoreCase("true")) {
System.out.println("Checkbox is click i.e true");
pdfFileName = procResponse[1];
} else {
System.out.println("Checkbox is not click i.e false");
String POLICY_SCH_GEN_PSS = getDetailsForFileName(userName, policyNo[j], batchID);
String[] fileName = POLICY_SCH_GEN_PSS.split("\\|");
if (getLogoType().equals("0") || getLogoType().equals("2")) {
System.out.println("If logo is 0 or 2");
pdfFileName = fileName[1];
} else if (getLogoType().equals("1")) {
System.out.println("If logo is 1");
pdfFileName = fileName[0];
}
}
b1 = gen.genStmt(procResponse[1], procResponse[2], "2", getLogoType(), "0", pdfFileName,"1",userName,batchID);
l++;
updateData(uniqueId,batchID, "Y");
} else {
f++;
updateData(uniqueId,batchID, "F");
}
}
sess.setAttribute("pr","Total "+l+" "+getGenericModulePath("PDF_RES1") + " " + " " + getGenericModulePath("PDF_RES2") + " " + f);
}catch (Exception e) {
updateData(globUniqueId,batchID, "F");
System.out.println("Exception in procedure call");
setTotalPDFgNERATEDmSG("Fail");
e.printStackTrace();
sess.setAttribute("pr", "Server Error.");
return SUCCESS;
}
}catch (Exception ex) {
ex.printStackTrace();
sess.setAttribute("pr", "Server Error.");
return SUCCESS;
}
System.out.println("Above second return");
return SUCCESS;
}
The generateDataPDF method generates PDFs based on the request parameters, i.e. the product type (GenMode) and the CRN list (uploaded as an Excel file). The code works fine when only a single user generates PDFs. But if two different users (users and roles are assigned in the application) start the process at the same time, the request parameters are overridden. Suppose the first user requests PDFs for 50 customers for product 1. While User1's process is still running, a second user submits a request for product 2. Now User1's PDFs are generated, but for product 2! The batchId is unique for every single request. One table is maintained that holds the batch_id, all PDFs, and the generation flags. How do I solve this?

As per your comment, this is what I would do. It's probably not the best way to do it!
Firstly: create a function that collects all your data at the beginning. You should not modify, update, or create anything while you are generating a PDF. I.e. array/list collectPDFData(), which should return an array/list.
Secondly: use a synchronized method, such as synchronized boolean generatePDF(array/list).
Synchronized methods use a monitor lock (also called an intrinsic lock) to manage synchronization, so all synchronized instance methods of an object share that object's monitor.
NB: If you use synchronized, collecting all your data separately is probably unnecessary, but I think it's good practice to write small functions dedicated to a specific task.
Thus, your code should be refactored a little bit.
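The two steps could be sketched roughly as follows. This is only an illustration under assumptions: it assumes the overriding comes from request values held in shared mutable fields, and every name here (PdfRequest, PdfBatchGenerator, collectPdfData, generatePdf) is hypothetical rather than taken from the asker's application. The "PDF" is reduced to a string so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable snapshot of one user's request.
final class PdfRequest {
    final String batchId;
    final String productType;
    final List<String> policyNumbers;

    PdfRequest(String batchId, String productType, List<String> policyNumbers) {
        this.batchId = batchId;
        this.productType = productType;
        // Defensive copy: later changes to the caller's list cannot leak in
        this.policyNumbers = Collections.unmodifiableList(new ArrayList<>(policyNumbers));
    }
}

// Hypothetical generator; it stands in for the real PDF code.
class PdfBatchGenerator {

    // Step 1: collect everything the generation needs before it starts.
    static PdfRequest collectPdfData(String batchId, String productType, List<String> policyNumbers) {
        return new PdfRequest(batchId, productType, policyNumbers);
    }

    // Step 2: only one thread at a time runs this method on a given instance,
    // and it reads nothing except the snapshot passed in.
    synchronized List<String> generatePdf(PdfRequest request) {
        List<String> generated = new ArrayList<>();
        for (String policy : request.policyNumbers) {
            generated.add(request.batchId + ":" + request.productType + ":" + policy);
        }
        return generated;
    }

    public static void main(String[] args) throws InterruptedException {
        PdfBatchGenerator generator = new PdfBatchGenerator();
        PdfRequest first = collectPdfData("B1", "product1", Arrays.asList("P100", "P101"));
        PdfRequest second = collectPdfData("B2", "product2", Arrays.asList("P200"));

        Map<String, List<String>> results = new ConcurrentHashMap<>();
        Thread t1 = new Thread(() -> results.put(first.batchId, generator.generatePdf(first)));
        Thread t2 = new Thread(() -> results.put(second.batchId, generator.generatePdf(second)));
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // Each batch contains only its own product, whatever the thread timing
        System.out.println("B1 -> " + results.get("B1"));
        System.out.println("B2 -> " + results.get("B2"));
    }
}
```

Because each call works on its own snapshot, the second user's request cannot change what the first user's run reads. The synchronized keyword then only serializes the generation itself, at the cost of making concurrent users wait for each other.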

Can't get text with Selenium

I have a problem with Selenium WebDriver in Java. When I use this code (without using element.click();) it works:
public static void main(String[] args) {
try {
File salida= new File("salidas/Salida.txt");
FileWriter fw = new FileWriter(salida);
PrintWriter volcado = new PrintWriter(fw);
System.setProperty("webdriver.chrome.driver", "path to\\chromedriver.exe");
WebDriver driver = new ChromeDriver();
driver.get("http://ranking-empresas.eleconomista.es/REPSOL-PETROLEO.html");
String name = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[1]/td[2]")).getText();
String obj_soc = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[2]/td[2]")).getText();
String direcc = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[3]/td[2]")).getText();
String loc = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[4]/td[2]")).getText();
String tel = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[5]/td[2]")).getText();
String url = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[8]/td[2]")).getText();
String actividad = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[9]/td[2]")).getText();
volcado.print(name + " " + obj_soc + " " + direcc + " " + loc + " " + tel + " " + url + " " + actividad);
volcado.close();
driver.close();
}
catch(Exception e) {
e.printStackTrace();
}
}
But the problem came when I wanted to reach that page from the previous page with element.click(), like this:
System.setProperty("webdriver.chrome.driver", "path to\\chromedriver.exe");
WebDriver driver = new ChromeDriver();
driver.get("http://ranking-empresas.eleconomista.es/ranking_empresas_nacional.html");
WebElement element = driver.findElement(By.xpath("//*[@id=\"tabla-ranking\"]/table/tbody/tr[1]/td[7]/a"));
element.click();
String name = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[1]/td[2]")).getText();
String obj_soc = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[2]/td[2]")).getText();
String direcc = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[3]/td[2]")).getText();
String loc = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[4]/td[2]")).getText();
String tel = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[5]/td[2]")).getText();
String url = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[8]/td[2]")).getText();
String actividad = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[9]/td[2]")).getText();
volcado.print(name+" "+obj_soc+" "+direcc+" "+loc+" "+tel+" "+url+" "+actividad);
volcado.close();
driver.close();
}
catch(Exception e){
e.printStackTrace();
}}
Selenium opens the browser and the pages, but my variables don’t get the text of the XPath expression.

The data is not yet present on the page at the time you are trying to get the text. Wait for the data before reading it, and it should be fine:
WebDriver driver = new ChromeDriver();
WebDriverWait wait = new WebDriverWait(driver, 10);
driver.get("http://ranking-empresas.eleconomista.es/ranking_empresas_nacional.html");
driver.findElement(By.xpath("//*[@id=\"tabla-ranking\"]/table/tbody/tr[1]/td[7]/a")).click();
// Wait for the data to be present
wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("business-profile")));
String name = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[1]/td[2]")).getText();
String obj_soc = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[2]/td[2]")).getText();
String direcc = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[3]/td[2]")).getText();
String loc = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[4]/td[2]")).getText();
String tel = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[5]/td[2]")).getText();
String url = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[8]/td[2]")).getText();
String actividad = driver.findElement(By.xpath("//*[@id=\"business-profile\"]/div[17]/div[1]/div[2]/table/tbody/tr[9]/td[2]")).getText();
volcado.print(name + " " + obj_soc + " " + direcc + " " + loc + " " + tel + " " + url + " "+actividad);
volcado.close();
However, a much cleaner alternative would be to get all the cells with a single XPath expression:
driver.get("http://ranking-empresas.eleconomista.es/ranking_empresas_nacional.html");
driver.findElement(By.xpath("//*[@id=\"tabla-ranking\"]/table/tbody/tr[1]/td[7]/a")).click();
// Wait for the data to be present
List<WebElement> cells = wait.until(
ExpectedConditions.presenceOfAllElementsLocatedBy(
By.xpath("//h3[.='Datos comerciales de REPSOL PETROLEO SA']/following::tbody[1]/tr/td[2]")));
String name = cells.get(0).getText();
String obj_soc = cells.get(1).getText();
String direcc = cells.get(2).getText();
String loc = cells.get(3).getText();
String tel = cells.get(4).getText();
String url = cells.get(7).getText();
String actividad = cells.get(8).getText();
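To illustrate what an explicit wait does conceptually, here is a small browser-free sketch of the poll-until-ready pattern. It is not Selenium's implementation: PollingWait and waitUntil are hypothetical names, and the supplier in main merely simulates a page whose data appears on the third check.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper showing the polling idea behind an explicit wait.
class PollingWait {

    // Calls the condition repeatedly until it returns a non-null value.
    // Throws IllegalStateException if the timeout passes first.
    static <T> T waitUntil(Supplier<T> condition, long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            T value = condition.get();
            if (value != null) {
                return value;
            }
            if (System.currentTimeMillis() >= deadline) {
                throw new IllegalStateException("Condition not met within " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        // Simulated page: the text is only "there" from the third check on
        String text = waitUntil(
                () -> attempts.incrementAndGet() < 3 ? null : "cell text",
                2000, 50);
        System.out.println(text + " (checks needed: " + attempts.get() + ")");
    }
}
```

Reading the text immediately after click() corresponds to a single check with no retry, which is why the variables can come back empty while the new page is still loading.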

Getting a connect time out exception in jsoup

I am trying to read a lot of html pages using jsoup. I have an arraylist called "allPageLinks" that keeps html page links. Here is my code:
Document doc;
for (int i = 0; i < allPageLinks.size(); i++) {
try {
doc = Jsoup.connect(allPageLinks.get(i)).timeout(0).get();
Element page_clips = doc.getElementById("page_clips");
Element page_clip_content = page_clips
.getElementById("content");
Elements product_grid = page_clip_content
.select(".product-list.margin-left-5");
Elements products = product_grid.get(0).children();
for (int j = 0; j < products.size(); j++) {
try {
String productName = products.get(j)
.getElementsByClass("name").text();
String productPrice = products.get(j)
.getElementsByClass("price").text();
String productLink = products.get(j)
.getElementsByClass("image").select("a")
.first().attr("href");
Document newDoc = Jsoup.connect(productLink).get();
Elements elements = newDoc.getElementsByClass("left");
Elements productNameElement = elements.get(0)
.getElementsByClass("colorbox");
String productImage = productNameElement.attr("href");
elements = newDoc.getElementsByClass("right");
String productId = elements.get(0)
.getElementsByClass("field").get(1).text();
writer.append(productName);
writer.append(';');
writer.append(productPrice);
writer.append(';');
writer.append(productId);
writer.append(';');
writer.append(productImage);
writer.append(';');
writer.append(productLink);
writer.append('\n');
} catch (Exception ex) {
System.out.println(ex.getMessage() + " " + i + " "
+ allPageLinks.get(i) + " ICTEKICATCH");
}
}
} catch (Exception ex) {
System.out.println(ex.getMessage() + " " + i + " "
+ allPageLinks.get(i));
}
}
Even though I set the connection timeout to zero, I am getting a lot of connect timeout exceptions for most of the links. Can anyone help me get rid of that exception?
Thanks

You forgot to specify the timeout for the connection inside the inner loop:
Document newDoc = Jsoup.connect(productLink).get();
Should be:
Document newDoc = Jsoup.connect(productLink).timeout(0).get();
That is where the timeout exception is most likely occurring.

unable to get theaders

I'm currently having a problem getting the header titles of a table for a validation. It works great up to column 6, but when it moves on to the next column, which isn't visible, .getText() returns blank. I know the XPath is correct.
public void getAndSaveDataOfTable (final String tabla) throws FileNotFoundException{
WebElement element = driver.findElement(By.xpath(tabla));
List<WebElement> elements = element.findElements(By.xpath("th"));
Assert.assertTrue(elements.size() > 1);
int cantelements = elements.size();
for (int i = 1; i <= cantelements; i++) {
String data = driver.findElement(
By.xpath(".//div[@class='ui-datatable-scrollable-header-box']//table/thead/tr/th["+ i + "]/span[1]")).getText();
System.out.println("EL nombre del encabezado " + i + " " + data);
datosFila.put(i, data);
}
So between columns 7 and 20 I can't get the text of the header.

If you are going to be using the JavascriptExecutor several times (you said your issue is on multiple columns 7-20) I would suggest you can optimize by running JS only once and then iterating through in Java, rather than running JS again for every operation. The JavascriptExecutor is slow, and you'll save whole seconds on the total test time by avoiding even 12+ extra JS executions.
// JavaScript to get text from tHeads (columns 1 to 20, as in the question)
String getTheadTexts =
    "var tHeads = []; " +
    "for (var i = 1; i <= 20; i++) { " +
    "var cssSelector = 'div.ui-datatable-scrollable-header-box table thead tr th:nth-of-type(' + i + ') > span:nth-of-type(1)'; " +
    "var elem = document.querySelector(cssSelector); " +
    // No such header cell: push an empty string so positions still match the column index
    "if (!elem) { " +
    "tHeads.push(''); " +
    "} else if ((elem.textContent) && (typeof (elem.textContent) != 'undefined')) { " +
    "tHeads.push(elem.textContent); " +
    "} else { " +
    "tHeads.push(elem.innerText); " +
    "} " +
    "} " +
    "return tHeads; ";
// Execute the JS once to populate a List
JavascriptExecutor js = (JavascriptExecutor) driver;
List<String> tHeadTexts = (List<String>) js.executeScript(getTheadTexts);
// Do some operation on each List item; keys start at 1 to match the column numbers
int i = 1;
for (String data : tHeadTexts) {
    System.out.println("EL nombre del encabezado " + i + " " + data);
    datosFila.put(i, data);
    i++;
}

I ran into what sounds like the same problem. Take a look at my post on
WebElement getText() is an empty string in Firefox if element is not physically visible on the screen
I ended up solving it by using a little bit of JavaScript to scroll the columns "into view". Hopefully this helps you out.
private void scrollToElement(WebElement element){
((JavascriptExecutor) driver).executeScript("arguments[0].scrollIntoView(true);", element);
}
