Download the source of a webpage that uses a loading spinner - java

I need to download the source code of this webpage: https://app.zonky.cz/#/marketplace/ so my code can check whether a new loan is available. Unfortunately for me, the page shows a loading spinner while the content loads in the background. When I try to download the page's source using:
String url = "https://app.zonky.cz/#/marketplace/";
StringBuilder text = new StringBuilder();
try {
    URL pageURL = new URL(url);
    Scanner scanner = new Scanner(pageURL.openStream(), "utf-8");
    try {
        while (scanner.hasNextLine()) {
            text.append(scanner.nextLine() + "\n");
        }
    } finally {
        scanner.close();
    }
} catch (Exception ex) {
    //
}
System.out.println(text.toString());
I get the page's source from the moment the spinner is shown. Do you know of a better approach?
Solution:
public static String getSource() {
    WebDriver driver = new FirefoxDriver();
    driver.get("https://app.zonky.cz/#/marketplace/");
    String output = driver.getPageSource();
    driver.close();
    return output;
}

You could always wait until the page has finished loading by checking whether an element exists (one that is only rendered after the spinner disappears).
Also, have you looked into using Selenium? It can be really useful for interacting with websites and handling tricky procedures such as waiting for elements :P
Edit: a pretty simple tutorial for Selenium waiting can be found here - http://docs.seleniumhq.org/docs/04_webdriver_advanced.jsp#explicit-and-implicit-waits
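For example, here is a minimal sketch of that explicit-wait idea in Java (the ".loan-list" selector is only a guess; inspect the page for an element that is rendered once the spinner is gone):
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class MarketplaceSource {
    public static String getSource() {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("https://app.zonky.cz/#/marketplace/");
            // Wait up to 30 seconds for an element that only exists after
            // the spinner disappears; ".loan-list" is a placeholder selector.
            WebDriverWait wait = new WebDriverWait(driver, 30);
            wait.until(ExpectedConditions.presenceOfElementLocated(
                    By.cssSelector(".loan-list")));
            return driver.getPageSource();
        } finally {
            driver.quit();
        }
    }
}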

Related

I need assistance in creating a function that will read URLs from a CSV file and open them in new browser tabs, in Java (Selenium)

I am trying to learn Java and Selenium by myself, creating a robot that will scan job/career pages for a certain string (a job name, e.g. QA, developer...).
I'm trying to create Java code using Selenium that will read URL links from a CSV file and open each one in a new tab.
The main goal is to add several URLs to the CSV and assert/locate a certain string at the designated URLs; for example, if there is a "Careers" link at a URL, the test will pass for that specific URL.
So far I have:
Created a Selenium project
Created a new ChromeDriver
Created a CSV with 3 columns (ID, company's name, URL) and added it to the project
import org.openqa.selenium.chrome.ChromeDriver;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class URLSearch {
    public static void main(String[] args) {
        ChromeDriver driver = new ChromeDriver();
        driver.manage().window().maximize();
        String fileName = "JobURLList.csv";
        File file = new File(fileName); // read from file
        try {
            Scanner inputStream = new Scanner(file);
            while (inputStream.hasNext()) {
                String data = inputStream.next();
                System.out.println(data);
            }
            inputStream.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}
The first line in the CSV holds the titles: id, name, url.
Read the URLs from the second line onward - e.g. https://careers.google.com/jobs/
Open a browser tab and start going over the URL list (from the CSV).
Locate a hardcoded string (e.g. "developer", "qa"...) at each URL.
If such a string is found, write to the console the URL for which the test turned out positive (the string was found at one of the URLs).
If no such string is found, skip to the next URL.
To open the new tab, do something like this (this assumes the "driver" object is your WebDriver):
((JavascriptExecutor) driver).executeScript("window.open('about:blank', '_blank');");
Set<String> tab_handles = driver.getWindowHandles();
int number_of_tabs = tab_handles.size();
int new_tab_index = number_of_tabs - 1;
driver.switchTo().window(tab_handles.toArray()[new_tab_index].toString());
You could then create a function that takes a list of key/value pairs, with the URL and the term to search for, and loop through it. Do you want to use a HashMap for this, or maybe an ArrayList of a class (id/name/url)? The code for finding the text would be something like this (assumes you've defined a boolean field named "Pass"):
driver.get([var for URL]);
// driver will wait for page-ready state, so you may not need the
// WebDriverWait used below; depends on whether the page populates
// data after the page-ready state
String xpather = "//*[contains(text(), '" + [string var for text to search for] + "')]";
try {
    WebDriverWait wait = new WebDriverWait(driver, 10);
    List<WebElement> element = wait.until(ExpectedConditions.visibilityOfAllElementsLocatedBy(By.xpath(xpather)));
    this.Pass = false;
    if (element.size() > 0) {
        this.Pass = true;
    }
} catch (Exception ex) {
    this.Pass = false;
    System.out.println("Exception finding text: " + ex.toString());
}
Then add logic for if (this.Pass == true or false)...
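To tie the pieces together, here is a rough sketch of the whole loop (note that the question's snippet reads with Scanner.next(), which splits on whitespace rather than lines; the id,name,url column layout follows the question, the search term is hardcoded as an example, and the naive split(",") assumes the CSV has no quoted commas):
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class URLSearchLoop {
    public static void main(String[] args) throws FileNotFoundException {
        WebDriver driver = new ChromeDriver();
        Scanner inputStream = new Scanner(new File("JobURLList.csv"));
        if (inputStream.hasNextLine()) {
            inputStream.nextLine(); // skip the header row: id, name, url
        }
        while (inputStream.hasNextLine()) {
            String[] columns = inputStream.nextLine().split(",");
            String url = columns[2].trim(); // third column holds the URL
            driver.get(url);
            // Same contains-text check as the XPath above
            String xpather = "//*[contains(text(), 'developer')]";
            if (!driver.findElements(By.xpath(xpather)).isEmpty()) {
                System.out.println("Found match at: " + url);
            }
        }
        inputStream.close();
        driver.quit();
    }
}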

How to load and collect all comments with Selenium and Java

I have a Java application which uses Selenium WebDriver to crawl/scrape information from Google Play Store applications. I have about 30 links to apps, and I have a problem with collecting ALL comments from each application.
For example, this application needs a lot of scrolling to load all comments, but other applications need less/more scrolling.
How can I dynamically load all comments for each app?
Since you have not shared sample code, I will share a JavaScript snippet and then provide a C# implementation that you can use as a reference for your Java Selenium project.
Sample JavaScript code:
let i = 0;
var element = document.querySelectorAll("div>span[jsname='bN97Pc']")[i];
var timer = setInterval(function() {
    console.log(element);
    element.scrollIntoView();
    i++;
    element = document.querySelectorAll("div>span[jsname='bN97Pc']")[i];
    if (element === undefined)
        clearInterval(timer); // stop once no more comments are found
}, 500);
Running the above code in the console, once you are on the application page with comments that you shared, will scroll until the end of the page while printing out each comment on the console.
Sample code with Selenium C# bindings:
static void Main(string[] args)
{
    ChromeDriver driver = new ChromeDriver();
    driver.Navigate().GoToUrl("https://play.google.com/store/apps/details?id=com.plokia.ClassUp&hl=en&showAllReviews=true");
    ExtractComments(driver);
    driver.Quit();
}

private static void ExtractComments(ChromeDriver driver, int startingIndex = 0)
{
    IEnumerable<IWebElement> comments = driver.FindElementsByCssSelector("div>span[jsname='bN97Pc']");
    if (comments.Count() <= startingIndex)
        return; // no more new comments, hence return
    if (startingIndex > 0)
        comments = comments.Skip(startingIndex); // skip already processed elements
    // process located comments
    foreach (var comment in comments)
    {
        string commentText = comment.Text;
        Console.WriteLine(commentText);
        (driver as IJavaScriptExecutor).ExecuteScript("arguments[0].scrollIntoView()", comment);
        Thread.Sleep(250);
        startingIndex++;
    }
    Thread.Sleep(2000); // let more comments load once we have consumed the existing ones
    ExtractComments(driver, startingIndex); // recursively process comments loaded after scrolling
}
Hope this helps.
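If a Java version helps, a rough adaptation of the same approach could look like this (untested; the jsname='bN97Pc' selector comes from the snippets above and may have changed on the Play Store since):
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class CommentScroller {
    static void extractComments(WebDriver driver, int startingIndex)
            throws InterruptedException {
        List<WebElement> comments =
                driver.findElements(By.cssSelector("div>span[jsname='bN97Pc']"));
        if (comments.size() <= startingIndex) {
            return; // no new comments appeared, stop recursing
        }
        // Process only the comments we have not seen yet
        for (int i = startingIndex; i < comments.size(); i++) {
            WebElement comment = comments.get(i);
            System.out.println(comment.getText());
            ((JavascriptExecutor) driver)
                    .executeScript("arguments[0].scrollIntoView()", comment);
            Thread.sleep(250);
            startingIndex++;
        }
        Thread.sleep(2000); // let more comments load after scrolling
        extractComments(driver, startingIndex);
    }
}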

Jsoup code that works for Eclipse but not Android Studio (httpurlconnectionimpl)

I am working on a small app for myself, and I just don't understand why my code works in Eclipse but not on my phone using Android Studio.
public static ArrayList<Link> getLinksToChoose(String searchUrl) {
    ArrayList<Link> linkList = new ArrayList<Link>();
    try {
        System.out.println(searchUrl);
        Document doc = Jsoup.connect(searchUrl).timeout(3000).userAgent("Chrome").get();
        Elements links = doc.select("tr");
        links.remove(0);
        for (Element link : links) {
            Link newLink = new Link(getURL(link), getName(link), getLang(link));
            linkList.add(newLink);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return linkList;
}
The problem is that I can't even get the Document. I always get an HttpURLConnectionImpl in the line where I try to get the HTML doc. I have read a bit about Jsoup on Android. Some people suggest using an AsyncTask, but it doesn't seem like that would solve my problem.
The loading of the content must happen outside the main thread, e.g. in an AsyncTask.
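A minimal sketch of what that could look like, assuming the call happens inside an Activity (showLinks() is a hypothetical method that updates your UI with the result):
// Run the blocking Jsoup call on a worker thread so Android's
// NetworkOnMainThreadException is avoided.
private void loadLinksInBackground(final String searchUrl) {
    new Thread(new Runnable() {
        @Override
        public void run() {
            final ArrayList<Link> links = getLinksToChoose(searchUrl);
            // Hand the result back to the UI thread before touching views
            runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    showLinks(links); // hypothetical UI-update method
                }
            });
        }
    }).start();
}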

How to wait for download to finish using Webdriver

Is there any way to wait for a download to finish in WebDriver?
Basically, I want to verify that the downloaded file is stored correctly on the hard drive, and to verify that, I need to wait until the download finishes. Please help if anyone has come across such a scenario before.
Not a direct answer to your question, but I hope it will help.
Poll the file on your hard drive (compute an MD5 of it). Once it is correct, you can go further with your test. Fail your test if the file is not correct after a timeout.
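A rough sketch of that polling idea in Java (the file path, expected hash, and timeout are placeholders you would supply from your test):
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

public class DownloadPoller {
    // Polls the file until its MD5 matches or the timeout expires.
    static boolean waitForFile(Path file, String expectedMd5, long timeoutMillis)
            throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (Files.exists(file)) {
                byte[] digest = MessageDigest.getInstance("MD5")
                        .digest(Files.readAllBytes(file));
                StringBuilder hex = new StringBuilder();
                for (byte b : digest) {
                    hex.append(String.format("%02x", b));
                }
                if (hex.toString().equalsIgnoreCase(expectedMd5)) {
                    return true; // download finished with the expected content
                }
            }
            Thread.sleep(500); // poll twice a second
        }
        return false; // fail the test on timeout
    }
}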
Poll the configured download directory for the absence of a partial file.
Example code for Chrome below:
File destinationDir = new File("blah");
Map<String, Object> prefs = new HashMap<>();
prefs.put("download.default_directory", destinationDir.getAbsolutePath());

DesiredCapabilities desiredCapabilities = DesiredCapabilities.chrome();
ChromeOptions options = new ChromeOptions();
options.setExperimentalOption("prefs", prefs);
desiredCapabilities.setCapability(ChromeOptions.CAPABILITY, options);
WebDriver webDriver = new ChromeDriver(desiredCapabilities);

// Initiate download...

do {
    Thread.sleep(5000L);
} while (!org.apache.commons.io.FileUtils.listFiles(destinationDir, new String[]{"crdownload"}, false).isEmpty());
As answered on Wait for Download to finish in selenium webdriver JAVA
private void waitForFileDownload(int totalTimeoutInMillis, String expectedFileName) throws IOException {
    FluentWait<WebDriver> wait = new FluentWait<>(this.funcDriver.driver)
            .withTimeout(totalTimeoutInMillis, TimeUnit.MILLISECONDS)
            .pollingEvery(200, TimeUnit.MILLISECONDS);
    File fileToCheck = getDownloadsDirectory()
            .resolve(expectedFileName)
            .toFile();
    wait.until((WebDriver wd) -> fileToCheck.exists());
}

public synchronized Path getDownloadsDirectory() {
    if (downloadsDirectory == null) {
        try {
            downloadsDirectory = Files.createTempDirectory("selleniumdownloads_");
        } catch (IOException ex) {
            throw new RuntimeException("Failed to create temporary downloads directory");
        }
    }
    return downloadsDirectory;
}
Then you can use a library like Apache Tika to do the actual file handling and check that the file is stored correctly (that could mean comparing file sizes, MD5 hashes, or even checking the content of the file, which Tika can actually do as well).
public void fileChecker() throws Exception {
    waitForFileDownload(20000, "filename_here");
    File file = downloadsDirectory.resolve(expectedFileName).toFile();
    AutoDetectParser parser = new AutoDetectParser();
    parser.setParsers(new HashMap<MediaType, Parser>());
    Metadata metadata = new Metadata();
    metadata.add(TikaMetadataKeys.RESOURCE_NAME_KEY, file.getName());
    try (InputStream stream = new FileInputStream(file)) {
        parser.parse(stream, (ContentHandler) new DefaultHandler(), metadata, new ParseContext());
    }
    String actualHash = metadata.get(HttpHeaders.CONTENT_MD5);
    assertTrue("There was a hash mismatch for file xyz:", actualHash.equals("expectedHash"));
}
I am using the below code in Python + Firefox:
browser.get('about:downloads')  # Open the downloads page.
# Wait for all icons to change from "X" (cancel download).
WebDriverWait(browser, URL_LOAD_TIMEOUT * 40).until_not(
    EC.presence_of_element_located((By.CLASS_NAME, 'downloadIconCancel')))
For small files, I currently either use an implicit wait or wait for the JS callback that my file has downloaded before moving on. The code below was posted on SO by another individual; I can't find the post right away, so I won't take credit for it.
public static void WaitForPageToLoad(IWebDriver driver)
{
    TimeSpan timeout = new TimeSpan(0, 0, 2400);
    WebDriverWait wait = new WebDriverWait(driver, timeout);
    IJavaScriptExecutor javascript = driver as IJavaScriptExecutor;
    if (javascript == null)
        throw new ArgumentException("driver", "Driver must support javascript execution");
    wait.Until((d) =>
    {
        try
        {
            string readyState = javascript.ExecuteScript("if (document.readyState) return document.readyState;").ToString();
            return readyState.ToLower() == "complete";
        }
        catch (InvalidOperationException e)
        {
            // Window is no longer available
            return e.Message.ToLower().Contains("unable to get browser");
        }
        catch (WebDriverException e)
        {
            // Browser is no longer available
            return e.Message.ToLower().Contains("unable to connect");
        }
        catch (Exception)
        {
            return false;
        }
    });
}
It should wait for your file to finish if it is small. Unfortunately, I haven't tested this on larger files (> 5 MB).
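If you want the same readyState probe from Java, a rough equivalent might look like this (a sketch, not a tested drop-in port of the C# above):
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.ExpectedCondition;
import org.openqa.selenium.support.ui.WebDriverWait;

public class PageLoadWait {
    static void waitForPageToLoad(WebDriver driver) {
        new WebDriverWait(driver, 2400).until(new ExpectedCondition<Boolean>() {
            @Override
            public Boolean apply(WebDriver d) {
                // Same document.readyState check as the C# version
                Object state = ((JavascriptExecutor) d)
                        .executeScript("return document.readyState");
                return "complete".equals(state);
            }
        });
    }
}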

Selenium takes a lot of time to get the dynamic page of a given URL

I am doing a project in Java.
In this project I have to work with the DOM.
For that, I first load a dynamic page from any given URL using Selenium.
Then I parse it using Jsoup.
I want to get the dynamic page source code of a given URL.
Code snapshot:
public static void main(String[] args) throws IOException {
    // Selenium
    WebDriver driver = new FirefoxDriver();
    driver.get("ANY URL HERE");
    String html_content = driver.getPageSource();
    driver.close();

    // Jsoup makes DOM here by parsing HTML content
    Document doc = Jsoup.parse(html_content);

    // OPERATIONS USING DOM TREE
}
But the problem is that Selenium takes around 95% of the whole processing time, which is undesirable.
Selenium first opens Firefox, then loads the given page, then gets the dynamic page source code.
Can you tell me how I can reduce the time taken by Selenium, perhaps by replacing this tool with a more efficient one? Any other advice would also be welcome.
Edit No. 1
There is some code given on this link.
FirefoxProfile profile = new FirefoxProfile();
profile.setPreference("general.useragent.override", "some UA string");
WebDriver driver = new FirefoxDriver(profile);
But what does the second line here do? I didn't understand it, and Selenium's documentation is also very poor.
Edit No. 2
System.out.println("Fetching " + url1 + "...");
System.out.println("Fetching " + url2 + "...");
WebDriver driver = new FirefoxDriver(createFirefoxProfile());
driver.get(url1);
String html1 = driver.getPageSource();
driver.get(url2);
String html2 = driver.getPageSource();
driver.close();
Document doc1 = Jsoup.parse(html1);
Document doc2 = Jsoup.parse(html2);
Try this:
public static void main(String[] args) throws IOException {
    // Selenium
    WebDriver driver = new FirefoxDriver(createFirefoxProfile());
    driver.get("ANY URL HERE");
    String html_content = driver.getPageSource();
    driver.close();

    // Jsoup makes DOM here by parsing HTML content
    // OPERATIONS USING DOM TREE
}

private static FirefoxProfile createFirefoxProfile() {
    File profileDir = new File("/tmp/firefox-profile-dir");
    if (profileDir.exists())
        return new FirefoxProfile(profileDir);
    FirefoxProfile firefoxProfile = new FirefoxProfile();
    File dir = firefoxProfile.layoutOnDisk();
    try {
        profileDir.mkdirs();
        FileUtils.copyDirectory(dir, profileDir);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return firefoxProfile;
}
The createFirefoxProfile() method creates a profile if one doesn't exist, and reuses the profile if it already exists, so Selenium doesn't need to create the profile directory structure each and every time.
If you are sure and confident about your code, you can go with PhantomJS. It is a headless browser and will get your results with quick hits; Firefox will take more time to execute.
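A minimal sketch of the PhantomJS route (assumes the GhostDriver/phantomjsdriver binding is on the classpath and the phantomjs binary is on the PATH):
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.phantomjs.PhantomJSDriver;

public class HeadlessFetch {
    public static void main(String[] args) {
        // PhantomJS has no visible browser window, which avoids
        // Firefox's startup and rendering overhead.
        WebDriver driver = new PhantomJSDriver();
        driver.get("ANY URL HERE");
        String html = driver.getPageSource();
        driver.quit();
        System.out.println(html.length() + " characters fetched");
    }
}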
