How to load and collect all comments with Selenium and Java

I have a Java application that uses Selenium WebDriver to crawl/scrape information from Google Play Store applications. I have about 30 app links, and I have a problem collecting ALL comments from each application.
For example, this application needs a lot of scrolling to load all comments, while other applications need less or more scrolling.
How can I dynamically load all comments for each app?

Since you have not shared sample code, I will share a JavaScript snippet and then provide a C# implementation that you can use as a reference for your Java Selenium project.
Sample JavaScript code:
// Scroll to each loaded comment in turn; scrolling makes the page load more comments.
let i = 0;
let element = document.querySelectorAll("div>span[jsname='bN97Pc']")[i];
const timer = setInterval(function () {
    console.log(element);
    element.scrollIntoView();
    i++;
    element = document.querySelectorAll("div>span[jsname='bN97Pc']")[i];
    if (element === undefined)
        clearInterval(timer); // stop once no further comment has been loaded
}, 500);
Running the above code in the browser console, once you are on the application page with comments that you shared, scrolls to the end of the page while printing each comment to the console.
Sample code with Selenium C# bindings:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
static void Main(string[] args)
{
    ChromeDriver driver = new ChromeDriver();
    driver.Navigate().GoToUrl("https://play.google.com/store/apps/details?id=com.plokia.ClassUp&hl=en&showAllReviews=true");
    ExtractComments(driver);
    driver.Quit();
}
private static void ExtractComments(ChromeDriver driver, int startingIndex = 0)
{
    IEnumerable<IWebElement> comments = driver.FindElementsByCssSelector(@"div>span[jsname='bN97Pc']");
    if (comments.Count() <= startingIndex)
        return; // no more new comments, hence return
    if (startingIndex > 0)
        comments = comments.Skip(startingIndex); // skip already processed elements
    // process located comments
    foreach (var comment in comments)
    {
        string commentText = comment.Text;
        Console.WriteLine(commentText);
        // scrolling to the comment makes the page load the next batch
        (driver as IJavaScriptExecutor).ExecuteScript("arguments[0].scrollIntoView()", comment);
        Thread.Sleep(250);
        startingIndex++;
    }
    Thread.Sleep(2000); // let more comments load once we have consumed the existing ones
    ExtractComments(driver, startingIndex); // recursively call self to process any comments loaded after scrolling
}
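For the Java side, the same idea can be sketched without tying it to a particular page. This is only a minimal sketch, not a tested scraper: IncrementalCollector, collectAll, locate and scrollTo are names made up for illustration. In a Selenium project, locate would presumably wrap driver.findElements(By.cssSelector("div>span[jsname='bN97Pc']")) and scrollTo would wrap ((JavascriptExecutor) driver).executeScript("arguments[0].scrollIntoView()", element). It uses a loop instead of recursion, which avoids a deep call stack on apps with many comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Collects items from a list that keeps growing as the page is scrolled. */
class IncrementalCollector {

    /**
     * Repeatedly asks locate for everything currently on the page, handles
     * only the items not seen in earlier passes, and scrolls to each of them
     * so that the page loads more. Stops when a pass finds nothing new.
     */
    static <T> List<T> collectAll(Supplier<List<T>> locate,
                                  Consumer<T> scrollTo,
                                  long pauseMillis) {
        List<T> collected = new ArrayList<>();
        while (true) {
            List<T> current = locate.get();
            if (current.size() <= collected.size()) {
                return collected; // nothing new was loaded, so we are done
            }
            // Skip the items that were handled in earlier passes.
            List<T> fresh = current.subList(collected.size(), current.size());
            for (T item : fresh) {
                collected.add(item);
                scrollTo.accept(item); // scrolling triggers loading of further items
            }
            try {
                Thread.sleep(pauseMillis); // give the page time to load more
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return collected; // stop early if the thread was interrupted
            }
        }
    }
}
```

The pause plays the role of the Thread.Sleep(2000) call in the C# version; how long it needs to be depends on how quickly the page loads the next batch.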
Hope this helps.

Related

Microsoft Graph API: A question about IDriveItemCollectionPage

I have looked at the docs for this and am struggling to understand how IDriveItemCollectionPage works.
I am currently doing the following to list all child DriveItems of the root drive of a site, given its drive ID, with the Java SDK:
public ArrayList<DriveItem> getDriveItemChildrenFoldersOfRootDrive(String rootDriveId){
//gets the children folder driveI
IDriveItemCollectionPage driveChildren= mGraphServiceClient.drives().byId(rootDriveId).root().children().buildRequest().get();
ArrayList<DriveItem> results = new ArrayList<DriveItem>();
results.addAll(driveChildren.getCurrentPage());
return results;
}
I realize that if getNextPage returns null there are no more results, but do you have to make another API call to get the next page if there is one? How do I do that with the above setup?
I agree with Brad. I tested this on my side and it worked. Here is my code:
public static void main(String[] args) {
    IGraphServiceClient client = GetClient();
    IDriveCollectionPage page = client.drives().buildRequest().get();
    // Copy the first page so that later pages can be appended safely
    List<Drive> drives = new ArrayList<>(page.getCurrentPage());
    // Each getNextPage().buildRequest().get() is another API call
    while (page.getNextPage() != null) {
        page = page.getNextPage().buildRequest().get();
        drives.addAll(page.getCurrentPage());
    }
    System.out.println(drives.size());
}
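The loop above follows a general pattern: keep requesting the next page until there is none. Here is a minimal sketch of that pattern using a made-up Page interface rather than the Graph SDK types. Page, ListPage and Paging are hypothetical names; in the SDK, each step to the next page corresponds to the getNextPage().buildRequest().get() call, which is a further request.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal stand-in for a paged collection; not part of the Graph SDK. */
interface Page<T> {
    List<T> items(); // items on this page
    Page<T> next();  // next page, or null when this is the last one
}

/** In-memory page, used only to demonstrate the loop. */
class ListPage<T> implements Page<T> {
    private final List<T> items;
    private final Page<T> next;

    ListPage(List<T> items, Page<T> next) {
        this.items = items;
        this.next = next;
    }

    @Override public List<T> items() { return items; }
    @Override public Page<T> next() { return next; }
}

class Paging {
    /** Walks the chain of pages and returns every item, in page order. */
    static <T> List<T> collectAll(Page<T> first) {
        List<T> all = new ArrayList<>();
        for (Page<T> page = first; page != null; page = page.next()) {
            all.addAll(page.items());
        }
        return all;
    }
}
```

Applied to the question's method, the same loop would run over the IDriveItemCollectionPage returned by children().buildRequest().get().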

Jsoup code that works for Eclipse but not Android Studio (httpurlconnectionimpl)

I am working on a small app for myself and I just don't understand why my code is working in Eclipse but not on my phone using Android Studio.
public static ArrayList<Link> getLinksToChoose(String searchUrl) {
    ArrayList<Link> linkList = new ArrayList<Link>();
    try {
        System.out.println(searchUrl);
        Document doc = Jsoup.connect(searchUrl).timeout(3000).userAgent("Chrome").get();
        Elements links = doc.select("tr");
        links.remove(0); // drop the first row
        for (Element link : links) {
            Link newLink = new Link(getURL(link), getName(link), getLang(link));
            linkList.add(newLink);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return linkList;
}
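The problem is that I can't even get the Document. I always get an httpurlconnectionimpl error on the line where I try to fetch the HTML document. I have read a bit about Jsoup on Android. Some people suggest using AsyncTask, but it doesn't seem like that would solve my problem.
The content must be loaded outside the main (UI) thread, e.g. in an AsyncTask. Android throws a NetworkOnMainThreadException when network I/O runs on the main thread, which is why the same code works in a plain Java program started from Eclipse.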
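As a rough illustration of moving the blocking call off the main thread, here is a sketch that uses a plain ExecutorService from java.util.concurrent instead of AsyncTask. BackgroundLoader and load are hypothetical names; in the app, the task passed in would wrap the Jsoup.connect(...).get() call, and the result would still have to be handed back to the UI thread before touching any views.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Runs blocking work on a single background thread. */
class BackgroundLoader {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Submits the blocking task to the worker thread and returns a handle to its result. */
    <T> Future<T> load(Callable<T> blockingTask) {
        return worker.submit(blockingTask);
    }

    /** Lets the worker thread finish queued tasks and then exit. */
    void shutdown() {
        worker.shutdown();
    }
}
```

Calling get() on the returned Future from the main thread would block it again, so in an Android app the result is better delivered through a callback posted to the UI thread.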

Download a webpage's source which uses a loading spinner

I need to download the source code of this webpage: https://app.zonky.cz/#/marketplace/ so that my code can check whether a new loan is available. Unfortunately for me, the web page shows a loading spinner while the page is being loaded in the background. When I try to download the page's source using:
String url = "https://app.zonky.cz/#/marketplace/";
StringBuilder text = new StringBuilder();
try
{
URL pageURL = new URL(url);
Scanner scanner = new Scanner(pageURL.openStream(), "utf-8");
try {
while (scanner.hasNextLine()){
text.append(scanner.nextLine() + "\n");
}
}
finally{
scanner.close();
}
}
catch(Exception ex)
{
//
}
System.out.println(text.toString());
I get the page's source from the moment the spinner is being shown. Do you know of a better approach?
Solution:
public static String getSource() {
WebDriver driver = new FirefoxDriver();
driver.get("https://app.zonky.cz/#/marketplace/");
String output = driver.getPageSource();
driver.close();
return output;
}
You could always wait until the page has finished loading by checking whether an element exists (one that is only loaded after the spinner disappears).
Also, have you looked into using Selenium? It can be really useful for interacting with websites and handling tricky procedures such as waiting for elements :P
Edit: a pretty simple tutorial for Selenium waiting can be found here - http://docs.seleniumhq.org/docs/04_webdriver_advanced.jsp#explicit-and-implicit-waits
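The idea of waiting for an element boils down to polling a condition until it holds or a timeout expires. Selenium ships its own implementation of this (WebDriverWait together with ExpectedConditions), so the following is only a sketch of the principle. Poller and waitUntil are made-up names, and the condition would be whatever check tells you the spinner is gone.

```java
import java.util.function.BooleanSupplier;

class Poller {
    /**
     * Polls the condition until it is true or the timeout elapses.
     * Returns true if the condition became true in time, false otherwise.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition holding
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false; // treat interruption as "gave up waiting"
            }
        }
    }
}
```

Only once such a wait has succeeded would it make sense to call getPageSource(); otherwise the source may still be the spinner page.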

WebDriver.getWindowHandle() method

I'm new to Selenium. The WebDriver.getWindowHandle() documentation is not very clear to me, and the example in my book does not work as described, so I would like to confirm the value returned by this method.
1) Let's say I am on page PAGE1. So getWindowHandle() should return handle to PAGE1. (Correct)
2) Now from this page, I go to PAGE2 (by hyperlink and opening a new window). My book says now getWindowHandle() should return handle to PAGE2. However my program still returns handle to PAGE1.
Selenium v2.43
Reproducible on Firefox and Chrome both.
Question: What is the exact value that getWindowHandle() should return?
WebDriver wd = new ChromeDriver();
wd.get("file://D:/Projects/Selenium/Startup/web/ch3/switch_main.html");
String h1 = wd.getWindowHandle();// original handle
System.out.println("First handle = " + h1);
WebElement clickhere = wd.findElement(By.id("clickhere"));
clickhere.click();//moved to a new child page
String h2 = wd.getWindowHandle();
System.out.println("Second handle = " + h2);// this handle is not different than h1
getWindowHandle() returns the handle of the window the WebDriver is currently controlling. This handle is a unique identifier for that window. It is different every time you open a page, even if it is the same URL.
getWindowHandles() (don't forget the 's') gives you the handles of all the windows that the WebDriver knows are open. When you put these in a list they usually appear in the order in which the windows were opened, but the method returns a Set and that order is not guaranteed.
You can use SwitchTo().Window("handle") (switchTo().window("handle") in the Java bindings) to switch to the window you desire.
You can use SwitchTo().Window("mywindowID"), if you know the window ID.
SwitchTo().Window("") will always go back to the base/main window.
SwitchTo().Frame("popupFrame") will get to the popup that came from the window the WebDriver is currently controlling.
If the link opens a new window, you should have a new window handle in the WebDriver. You can loop over the current window handles with getWindowHandles().
See this example from http://www.thoughtworks.com/products/docs/twist/13.3/help/how_do_i_handle_popup_in_selenium2.html
String parentWindowHandle = browser.getWindowHandle(); // save the current window handle.
WebDriver popup = null;
Iterator<String> windowIterator = browser.getWindowHandles().iterator();
while (windowIterator.hasNext()) {
    String windowHandle = windowIterator.next();
    popup = browser.switchTo().window(windowHandle);
    if (popup.getTitle().equals("Google")) {
        break;
    }
}
When you open the new window, the WebDriver doesn't automatically switch to it. You need to use the switchTo() method to switch to the new window, either using the name of the new window, or its handle (which you can get with getWindowHandles() and searching for the one that's not the current window).
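Finding "the one that's not the current window" is a set difference. Here is a small sketch, assuming you record the handles before the click and again afterwards. WindowHandles and findNewHandle are hypothetical helper names, not part of Selenium.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

class WindowHandles {
    /** Returns a handle that is present in after but not in before, if there is one. */
    static Optional<String> findNewHandle(Set<String> before, Set<String> after) {
        Set<String> added = new LinkedHashSet<>(after); // copy, so the inputs stay untouched
        added.removeAll(before);
        return added.stream().findFirst();
    }
}
```

With a driver, before and after would both come from driver.getWindowHandles(), and the returned handle would be passed to driver.switchTo().window(...).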
I have used this code for my project
String oldTab = driver.getWindowHandle();
public static void switchingToNewTabUsingid(WebDriver driver,WebDriverWait wait,String id,String oldTab)
{
wait.until(ExpectedConditions.elementToBeClickable(By.id(id)));
driver.findElement(By.id(id)).click();
ArrayList<String> newTab = new ArrayList<String>(driver.getWindowHandles());
newTab.remove(oldTab);
driver.switchTo().window(newTab.get(0));
}
//Perform operation here on switched tab
public static void comingBackToOldTab(WebDriver driver,String oldTab)
{
driver.close();
driver.switchTo().window(oldTab);
}
With Selenium 2.53.1 using Firefox 47.0.1 as the WebDriver in Java: you need to open the separate windows/browsers each in its own driver. I was having the same problem. No matter how many windows or tabs I opened, driver.getWindowHandles() would only return one handle, so it was impossible to switch between tabs. I found Chrome worked much better for me.
Once I started using Chrome 51.0, I could get all handles. The following code shows how to access multiple drivers and multiple tabs within each driver.
// INITIALIZE TWO DRIVERS (THESE REPRESENT SEPARATE CHROME WINDOWS/BROWSERS)
driver1 = new ChromeDriver();
driver2 = new ChromeDriver();
// LOOP TO OPEN AS MANY TABS AS YOU WISH
for (int i = 0; i < TAB_NUMBER; i++) {
    driver1.findElement(By.cssSelector("body")).sendKeys(Keys.CONTROL + "t");
    // SLEEP FOR SPLIT SECOND TO ALLOW DRIVER TIME TO OPEN TAB
    Thread.sleep(100);
}
// STORE TAB HANDLES IN ARRAY LIST FOR EASY ACCESS
ArrayList<String> tabs1 = new ArrayList<String>(driver1.getWindowHandles());
// REPEAT FOR THE SECOND DRIVER (SECOND CHROME BROWSER WINDOW)
// LOOP TO OPEN AS MANY TABS AS YOU WISH
for (int i = 0; i < TAB_NUMBER; i++) {
    driver2.findElement(By.cssSelector("body")).sendKeys(Keys.CONTROL + "t");
    // SLEEP FOR SPLIT SECOND TO ALLOW DRIVER TIME TO OPEN TAB
    Thread.sleep(100);
}
// STORE TAB HANDLES IN ARRAY LIST FOR EASY ACCESS
ArrayList<String> tabs2 = new ArrayList<String>(driver2.getWindowHandles());
// NOW PERFORM DESIRED TASKS WITH FIRST BROWSER IN ANY TAB
for (String tab : tabs1) {
    driver1.switchTo().window(tab);
    // LOGIC FOR THAT DRIVER'S CURRENT TAB
}
// PERFORM DESIRED TASKS WITH SECOND BROWSER IN ANY TAB
for (String tab : tabs2) {
    driver2.switchTo().window(tab);
    // LOGIC FOR THAT DRIVER'S CURRENT TAB
}
Hopefully that gives you a good idea of how to manipulate multiple tabs in multiple browser windows.

Selenium takes lots of time to get dynamic page of given URL

I am doing a project in Java.
In this project I have to work with the DOM.
For that, I first load the dynamic page of a given URL using Selenium.
Then I parse it using Jsoup.
I want to get the dynamic page source code of the given URL.
Code snapshot:
public static void main(String[] args) throws IOException {
// Selenium
WebDriver driver = new FirefoxDriver();
driver.get("ANY URL HERE");
String html_content = driver.getPageSource();
driver.close();
// Jsoup makes DOM here by parsing HTML content
Document doc = Jsoup.parse(html_content);
// OPERATIONS USING DOM TREE
}
But the problem is that Selenium takes around 95% of the whole processing time, which is undesirable.
Selenium first opens Firefox, then loads the given page, then gets the dynamic page source code.
Can you tell me how I can reduce the time taken by Selenium, for example by replacing it with a more efficient tool? Any other advice would also be welcome.
Edit NO. 1
There is some code given on this link.
FirefoxProfile profile = new FirefoxProfile();
profile.setPreference("general.useragent.override", "some UA string");
WebDriver driver = new FirefoxDriver(profile);
But I didn't understand what the second line does here, and the Selenium documentation is also very poor.
Edit No. 2
System.out.printf("Fetching %s...%n", url1);
System.out.printf("Fetching %s...%n", url2);
WebDriver driver = new FirefoxDriver(createFirefoxProfile());
driver.get(url1); // pass the variable, not the literal string "url1"
String html1 = driver.getPageSource();
driver.get(url2);
String html2 = driver.getPageSource();
driver.close();
Document doc1 = Jsoup.parse(html1);
Document doc2 = Jsoup.parse(html2);
Try this:
public static void main(String[] args) throws IOException {
// Selenium
WebDriver driver = new FirefoxDriver(createFirefoxProfile());
driver.get("ANY URL HERE");
String html_content = driver.getPageSource();
driver.close();
// Jsoup makes DOM here by parsing HTML content
// OPERATIONS USING DOM TREE
}
private static FirefoxProfile createFirefoxProfile() {
File profileDir = new File("/tmp/firefox-profile-dir");
if (profileDir.exists())
return new FirefoxProfile(profileDir);
FirefoxProfile firefoxProfile = new FirefoxProfile();
File dir = firefoxProfile.layoutOnDisk();
try {
profileDir.mkdirs();
FileUtils.copyDirectory(dir, profileDir);
} catch (IOException e) {
e.printStackTrace();
}
return firefoxProfile;
}
The createFirefoxProfile() method creates a profile if one doesn't exist, and reuses the profile if one already exists. So Selenium doesn't need to create the profile directory structure each and every time.
If you are confident about your code, you can go with PhantomJS. It is a headless browser and will get your results quickly. Firefox will take more time to execute.
