I want to get the current captcha that is displayed on a website. An example of this would be
http://top100arena.com/in.asp?id=58978
How would I get the image link of the captcha that is displayed, other than by right-clicking it and choosing "open image in new page"?
You are looking for the div with the id "recaptcha_image". Extract the src attribute of the img element inside this div.
To do this, you can either use simple string operations or an HTML parsing library like JSoup.
Here is an example of such an extracted URL:
http://www.google.com/recaptcha/api/image?c=03AHJ_VutGj3wvhGoQGxu6FUnG3uOWJdyB2RpSb2N5v9AQJyakMy1kKMPeDoRfADhjAj5rLqekuOzXe3cRChnA_sEN7PL68em4pI_kE3wFKUhhkqFF9jQzKJerX__InwD_DB0Ox1mKQmZVRl97yuSL62tZhYyhSqtuIta-3n0KvytB9QqSn8nXgw8
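As a sketch of the string-operation route (instead of JSoup), the img src inside the captcha div could be pulled out with a regular expression. The class and method names (`CaptchaImageLink`, `imgSrcInside`) are hypothetical, the div id is passed in as a parameter, and the sample markup in `main` is made up. Regex matching on HTML is brittle: it assumes the div's opening tag is followed by an `<img ... src="...">` tag.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptchaImageLink {

    // Returns the src of the first <img> after the opening tag of the element
    // with the given id. Plain regex matching: assumes markup shaped like
    // <div id="..."> ... <img ... src="..."> and breaks on unusual formatting.
    public static Optional<String> imgSrcInside(String html, String id) {
        Pattern p = Pattern.compile(
                "id=[\"']" + Pattern.quote(id) + "[\"'][^>]*>.*?<img[^>]*?\\ssrc=[\"']([^\"']+)[\"']",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical markup; the id "recaptcha_image" is an assumption
        String html = "<div id=\"recaptcha_image\"><img style=\"display:block\" "
                + "src=\"http://www.google.com/recaptcha/api/image?c=abc123\"></div>";
        System.out.println(imgSrcInside(html, "recaptcha_image").orElse("not found"));
    }
}
```

Running `main` prints `http://www.google.com/recaptcha/api/image?c=abc123`. For real pages, a parser such as JSoup is the more robust choice.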
Actually, it seems that the captcha box is an iframe. So search for an iframe whose src string contains "captcha". Example of such an iframe:
<iframe src="http://www.google.com/recaptcha/api/noscript?k=6LeyFroSAAAAAJTmR7CLZ5an7pcsS5eJ3wEoWHhJ"
height="300" width="500" frameborder="0"></iframe><br/>
So, once you have extracted that URL, use JSoup again to fetch that page and find the URL of the image. In the fetched page the image sits inside a center element, so look for the center element and get the img element out of it.
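The two steps above could also be sketched with plain string matching instead of JSoup: first pick the iframe whose src contains "captcha", then, in the page loaded from that URL, take the img inside the center element. Fetching the iframe page over HTTP is left out, the class name `CaptchaIframe` is hypothetical, and the sample HTML in `main` is made up; as with any regex approach, it only works while the markup keeps this shape.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptchaIframe {

    private static final Pattern IFRAME_SRC = Pattern.compile(
            "<iframe[^>]*?\\ssrc=[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    private static final Pattern CENTER_IMG_SRC = Pattern.compile(
            "<center[^>]*>.*?<img[^>]*?\\ssrc=[\"']([^\"']+)[\"'].*?</center>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Step 1: src of the first iframe whose src contains "captcha"
    public static Optional<String> captchaIframeSrc(String html) {
        Matcher m = IFRAME_SRC.matcher(html);
        while (m.find()) {
            String src = m.group(1);
            if (src.toLowerCase(Locale.ROOT).contains("captcha")) {
                return Optional.of(src);
            }
        }
        return Optional.empty();
    }

    // Step 2: in the page loaded from the iframe URL, src of the img inside <center>
    public static Optional<String> centerImageSrc(String iframeHtml) {
        Matcher m = CENTER_IMG_SRC.matcher(iframeHtml);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical outer page with an unrelated iframe and the captcha iframe
        String page = "<iframe src=\"http://ads.example.com/banner\"></iframe>"
                + "<iframe src=\"http://www.google.com/recaptcha/api/noscript?k=KEY\""
                + " height=\"300\" width=\"500\" frameborder=\"0\"></iframe>";
        System.out.println(captchaIframeSrc(page).orElse("no captcha iframe"));

        // Hypothetical content of the page the iframe points to
        String iframePage = "<center><img width=\"300\" height=\"57\" "
                + "src=\"image?c=abc123\"></center>";
        System.out.println(centerImageSrc(iframePage).orElse("no image"));
    }
}
```

The image src in the sample is relative, so it would still need to be resolved against the iframe URL before downloading the image.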
Try using Firebug in Firefox (https://addons.mozilla.org/es/firefox/addon/firebug/). It's easy to use: in the Net panel you'll find a filter named Images, and the captcha image will be listed there.
I'm trying to get the site's scripts using 'Jsoup.connect(url).get().html()',
but the script I want doesn't appear. Does anyone know how I can get this script?
[Screenshot of the script I want to get]
It doesn't appear in the source because that video is inside an iframe. That iframe has its own src attribute (visible on your screenshot). Try getting that page instead.
EDIT:
Get the first page and parse it. Then select the iframe's src, and once you have that second URL, do the same again: get that page and parse it:
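import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
// 1. fetch the outer page; absUrl("src") turns a relative iframe src into a full URL
String iframeUrl = Jsoup.connect(url).get().selectFirst("#option-1 iframe").absUrl("src");
System.out.println(iframeUrl);
// 2. fetch and parse the page the iframe points to
Document document = Jsoup.connect(iframeUrl).get();
System.out.println(document.html());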
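If the iframe's src is relative (for example `/embed/abc` or `//player.example.net/...`), it has to be resolved against the outer page's URL before it can be fetched; jsoup's `absUrl("src")` does this for you. A minimal standard-library sketch of the same resolution with `java.net.URI` is below. The class name and all URLs are hypothetical, and a src containing characters that are illegal in a URI (such as unencoded spaces) makes `URI.create`/`resolve` throw `IllegalArgumentException`.

```java
import java.net.URI;

public class IframeUrlResolver {

    // Resolves an iframe src (absolute, root-relative, protocol-relative or
    // path-relative) against the URL of the page that contains the iframe.
    // Pass the full page URL, including its path, as the base.
    public static String resolve(String pageUrl, String iframeSrc) {
        return URI.create(pageUrl).resolve(iframeSrc.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "https://example.com/watch/episode-1"; // hypothetical page URL
        System.out.println(resolve(page, "/embed/abc"));                 // https://example.com/embed/abc
        System.out.println(resolve(page, "//player.example.net/e/abc")); // https://player.example.net/e/abc
        System.out.println(resolve(page, "embed/abc"));                  // https://example.com/watch/embed/abc
        System.out.println(resolve(page, "https://other.example/e/1"));  // https://other.example/e/1
    }
}
```

An absolute src is returned unchanged, so the same call can be applied to every iframe without checking its form first.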
Here I am unable to click the iframe link (href) using PageFactory, and when I try separately without PageFactory I am still unable to click the href.
[Screenshot of the DOM]
@FindBy(xpath = "//*[@id='content']/div/ul/li[2]/a")
WebElement iFramelabeltext;
I tried various approaches:
//a[normalize-space()='iFrame']
//a[@href='/iframe']
Can anyone help me solve this?
First, you need to understand what a frame actually is: it is a separate HTML document embedded inside the page's HTML, so Selenium cannot access its elements directly.
You need to switch to that Frame and then perform needed actions.
For how to deal with frames, see https://www.guru99.com/handling-iframes-selenium.html
I need to be able to click on a link (I am using Selenium and Java). I am searching for the link using XPath, but for some reason I am not getting most of the web page, just a bunch of whitespace. In the image you can see the highlighted link I am looking for.
I tried:
System.out.println(driver.findElement(By.xpath("//*[@class='titre_1']/a")).getText());
System.out.println(driver.findElement(By.xpath("//*[@id='li-7']/div/a")).getText());
I get: org.openqa.selenium.NoSuchElementException: no such element: Unable to locate element
If I do:
System.out.println(driver.findElement(By.xpath("//*")).getText());
I only get a few elements from the page and a bunch of whitespace. What could be wrong?
Please help. I couldn't fit the entire HTML source here to show you; I hope that's OK.
[Screenshot of the HTML source]
If the element is inside an iframe, you can get it by switching to the iframe first and then calling findElement. See the code below:
WebElement iframeElement = driver.findElement(By.id("id_of_the_iframe"));
driver.switchTo().frame(iframeElement);
Then you can find the element with your xpath:
System.out.println(driver.findElement(By.xpath("//*[@class='titre_1']/a")).getText());
driver.switchTo().defaultContent(); // switch back to the main page when done
The problem is that some of the URLs I am trying to pull from have JavaScript slideshow containers that haven't loaded yet. I want to get the images in the slideshow, but since the slideshow hasn't loaded yet, the elements aren't grabbed. Is there any way to do this? This is what I have so far:
Document doc = Jsoup.connect("http://news.nationalgeographic.com/news/2013/03/pictures/130316-gastric-brooding-frog-animals-weird-science-extinction-tedx/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+ng%2FNews%2FNews_Main+%28National+Geographic+News+-+Main%29").get();
Elements jpg = doc.select("img[src$=.jpg]");
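jsoup can't execute JavaScript, so anything a script adds to the page after loading is missing from the parsed document. You can use an additional library for this; see these related questions: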
Parse JavaScript with jsoup
Trying to parse html hidden by javascript
I am continuing work on a project that I've been at for some time now, and I have been struggling to pull some data from a website. The website has an iframe that pulls in some data from an unknown source. The data is in the iframe in a tag something like this:
<DIV id="number_forecast"><LABEL id="lblDay">9,000</LABEL></DIV>
There is a BUNCH of other crap above it but this div id / label is totally unique and is not used anywhere else in the code.
jsoup is probably what you want; it excels at extracting data from an HTML document.
There are many examples available showing how to use the API: http://jsoup.org/cookbook/extracting-data/selector-syntax
The process will be in two steps:
parse the page and find the url of the iframe
parse the content of the iframe and extract the information you need
The code would look like this:
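import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
// let's find the iframe (inputstream and url are the outer page's stream and its URL)
Document document = Jsoup.parse(inputstream, "iso-8859-1", url);
Elements elements = document.select("iframe");
Element iframe = elements.first();
// now load the iframe; absUrl resolves a relative src against the outer page URL
URL iframeUrl = new URL(iframe.absUrl("src"));
document = Jsoup.parse(iframeUrl, 15000); // 15-second timeout
// extract the div, then the label's text (e.g. "9,000")
Element div = document.getElementById("number_forecast");
String value = div.selectFirst("#lblDay").text();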
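The label's text (read with jsoup's `text()`, for example) comes back as a string like "9,000". If you need it as a number, a locale-aware parse handles the thousands separator. This sketch assumes US-style formatting (comma as the grouping separator); `ForecastValue` and `parseForecast` are hypothetical names.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

public class ForecastValue {

    // Parses label text such as "9,000" into a long.
    // Assumption: the site formats numbers US-style (comma = thousands separator).
    public static long parseForecast(String labelText) {
        String text = labelText.trim();
        NumberFormat format = NumberFormat.getIntegerInstance(Locale.US);
        ParsePosition pos = new ParsePosition(0);
        Number value = format.parse(text, pos);
        // Reject empty input or trailing characters the format could not read
        if (value == null || pos.getIndex() != text.length()) {
            throw new IllegalArgumentException("Not a number: \"" + labelText + "\"");
        }
        return value.longValue();
    }

    public static void main(String[] args) {
        System.out.println(parseForecast("9,000")); // prints 9000
    }
}
```

Using a `ParsePosition` instead of `parse(String)` makes the check strict: the plain overload would silently accept input like "9,000abc" and return 9000.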
In your page that contains the iframe, change the source of the iframe to your own URL. That URL is handled by your own controller, which reads the content, parses it, extracts everything you need, and writes the result to the response. This should work as long as the references inside your iframe are absolute.