Headless Chrome - getting blank page source - java

I'm trying to load a website with Chrome in headless mode using Selenium WebDriver, and I face an issue with some specific websites. The page loads, shows a page saying "please enable javascript..." for the first 2-3 seconds, and after about 3 seconds the page source goes blank.
I have been using Selenium, and especially Chrome, for a long time and I am familiar with the platform. For this case I'm using Chrome 73.0.3683.86 and ChromeDriver 2.46.628411 (which are compatible according to "Which ChromeDriver version is compatible with which Chrome Browser version?") on macOS. The Selenium Java version is the latest, 3.141.59.
I suspect that headless Chrome cannot handle specific content types such as SVG, or other GUI-related HTTP responses.
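import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.addArguments("--headless");
WebDriver driver = new ChromeDriver(chromeOptions);
driver.get("https://identity.tescobank.com/login");
Thread.sleep(3000); // give the page's scripts time to run
System.out.println(driver.getPageSource());
driver.quit();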
The expected result is the same page source that I get in non-headless mode.

Headless Chrome should be able to handle everything the normal Chrome can do:
It brings all modern web platform features provided by Chromium and the Blink rendering engine to the command line.
(see https://developers.google.com/web/updates/2017/04/headless-chrome)
Since only the login page of a bank causes you trouble, my guess is that the page's security measures detect an anomaly and decide not to serve you.
One way they can do that is by looking at the User-Agent string, which contains "HeadlessChrome" in headless mode.
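To check whether a session actually advertises itself as headless, a minimal plain-Java sketch can look for the "HeadlessChrome/" product token. The sample User-Agent strings below are illustrative assumptions, not captured from the bank's site; in a live session you could read the real string through JavascriptExecutor, as noted in the comment.

```java
/**
 * Minimal sketch: checks whether a User-Agent string advertises headless Chrome.
 * In a real Selenium session the string could be read with
 * ((JavascriptExecutor) driver).executeScript("return navigator.userAgent;")
 */
final class UserAgentCheck {

    private UserAgentCheck() {}

    /** Returns true if the UA contains the "HeadlessChrome/" product token. */
    static boolean isHeadlessChrome(String userAgent) {
        return userAgent != null && userAgent.contains("HeadlessChrome/");
    }

    public static void main(String[] args) {
        // Illustrative UA strings (assumed format), not taken from a real capture.
        String headless = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) "
                + "AppleWebKit/537.36 (KHTML, like Gecko) "
                + "HeadlessChrome/73.0.3683.86 Safari/537.36";
        String regular = headless.replace("HeadlessChrome/", "Chrome/");
        System.out.println(isHeadlessChrome(headless)); // true
        System.out.println(isHeadlessChrome(regular));  // false
    }
}
```

A site can perform the same kind of check on the server side, which is why headless sessions may be treated differently from normal ones.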
That said, unless you're writing integration tests for the bank, your behavior is suspicious at the very least. If you have a valid and legal concern, clear it with the bank first. Otherwise, they might take action against you, for example by blocking your IP address (which could affect many people) or by asking the police to have a word with you.

I was facing a similar issue in my script after login. For some reason, refreshing the page resolved it:
driver.navigate().refresh();
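If you use the refresh trick more generally, it may be worth refreshing only when the source actually looks blank. This is a sketch under stated assumptions, not the poster's code: RefreshRetry and its looksBlank heuristic are hypothetical, and the helper is plain Java so the retry logic can be tested without a browser. With Selenium you would pass driver::getPageSource and () -> driver.navigate().refresh().

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: re-reads the page source and refreshes while it looks blank.
 * The blank-page heuristic is an assumption, not a Selenium feature.
 */
final class RefreshRetry {

    private RefreshRetry() {}

    /** Heuristic: blank if null/whitespace or if it contains an empty <body>. */
    static boolean looksBlank(String source) {
        if (source == null || source.trim().isEmpty()) {
            return true;
        }
        String compact = source.replaceAll("\\s+", "");
        return compact.contains("<body></body>");
    }

    /**
     * Reads the source, then refreshes up to maxRefreshes times while it looks blank.
     * Returns the last source read, which may still be blank.
     */
    static String sourceWithRefresh(Supplier<String> pageSource, Runnable refresh, int maxRefreshes) {
        String source = pageSource.get();
        for (int i = 0; i < maxRefreshes && looksBlank(source); i++) {
            refresh.run();
            source = pageSource.get();
        }
        return source;
    }
}
```

Bounding the number of refreshes keeps a page that is blank for a different reason (for example, deliberate blocking) from looping forever.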

Related

What is the difference between Selenium Webdriver and GeckoDriver?

I use Selenium every five years or so, and every time it has changed beyond recognition. I just started a new Selenium project and googled some quickstart guides such as https://www.toolsqa.com/selenium-webdriver/run-selenium-test/ (written in September 2020) and https://www.guru99.com/first-webdriver-script.html (© 2020). Both seem to use WebDriver, e.g., by starting their examples with WebDriver driver = new FirefoxDriver();, although the latter has a disclaimer saying that from Firefox 35 onwards (I have 82) you should use Geckodriver.
I use Selenium for Java 3.141.59, downloaded from https://www.selenium.dev/downloads/, but it has only two references to Geckodriver (at least that is all that shows up when I type Ge and autocomplete in my IDE): GeckoDriverInfo and GeckoDriverService (for comparison, there are nine references to WebDriver).
I have read the information here https://github.com/mozilla/geckodriver but it didn't make me any wiser, nor did https://en.wikipedia.org/wiki/Selenium_(software)#Selenium_WebDriver (Geckodriver isn't even mentioned on this Wikipedia page).
What is the difference between Webdriver and Geckodriver?
If one downloads the newest version of everything, why isn't Geckodriver included, if it has been the recommended tool for several (?) years?
Why do recently written guides use WebDriver if Geckodriver is the way to go?
I think I have done a reasonable amount of research before asking this question, but feel free to suggest improvements because I am genuinely confused.
WebDriver is a specification. It defines how browser user interfaces can be automated. GeckoDriver is an implementation of that specification: it is the WebDriver implementation for the Firefox browser.
So basically, a WebDriver implementation is a server that exposes a REST API on one side and knows how to control the browser on the other side.
Here is a short explanation of the end-to-end flow (for Firefox and Java):
You download the Selenium Java library. It provides a Java client for interacting with web drivers
You download GeckoDriver
In your Java code you call WebDriver driver = new FirefoxDriver();
The Selenium library starts the GeckoDriver executable as a native OS process
In your Java code you call driver.get("http://my.url")
The Selenium library makes a REST call to the server started by GeckoDriver, invoking the endpoint defined in the corresponding section of the specification.
GeckoDriver then translates this command into something Firefox understands, so that the browser navigates to the required page.
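To make the "REST call" step concrete, here is a sketch of the request that corresponds to driver.get(...) under the W3C WebDriver "Navigate To" command: POST /session/{session id}/url with a JSON body containing "url". It only builds the path and body as strings; the session id and port are made-up example values, and the real Selenium client does more (creating the session first, setting headers) that is not shown.

```java
/**
 * Sketch of the HTTP request behind driver.get(url), following the W3C WebDriver
 * "Navigate To" command: POST /session/{session id}/url with body {"url": "..."}.
 * It only builds strings; it does not talk to a running GeckoDriver.
 */
final class NavigateToRequest {

    private NavigateToRequest() {}

    static String path(String sessionId) {
        return "/session/" + sessionId + "/url";
    }

    /** Builds the JSON body, escaping only backslashes and double quotes (minimal sketch). */
    static String body(String url) {
        String escaped = url.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"url\": \"" + escaped + "\"}";
    }

    public static void main(String[] args) {
        String sessionId = "example-session-id"; // hypothetical; a real id comes from POST /session
        // Port 4444 is an assumed example; Selenium normally starts the driver on a free port.
        System.out.println("POST http://localhost:4444" + path(sessionId));
        System.out.println(body("http://my.url"));
    }
}
```

Because the protocol is plain HTTP and JSON, any client (Java, Python, curl) can drive any conforming driver, which is what lets Selenium stay browser-agnostic.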
So basically you need 3 things to make everything work:
The Selenium Java library, which is basically a Java client for the WebDriver REST API
GeckoDriver (which implements the REST API according to the WebDriver specification and translates it into commands the Firefox browser can understand)
Firefox browser
WebDriver is the common interface implemented by ChromeDriver (for Chrome), FirefoxDriver (which drives Firefox through GeckoDriver), InternetExplorerDriver and RemoteWebDriver, and possibly more if they are supported. So GeckoDriver is used to control a Firefox browser instance, while the Java class that talks to it implements the methods defined in the WebDriver interface.
GeckoDriver is not included because it is specific to Firefox, and other users may want to use other browsers.
This keeps the flexibility of swapping out the implementation for different browsers. :)
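Because each browser's driver executable is downloaded separately, the Java bindings need to be told where it lives, typically via a per-browser system property (or the PATH). In this sketch only the property names are Selenium's; the DriverBinaries helper and the executable path are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Sketch: maps a browser name to the system property the Selenium Java bindings
 * read to locate that browser's driver executable.
 */
final class DriverBinaries {

    private DriverBinaries() {}

    // Property names consulted by the Chrome, Gecko and IE driver services.
    private static final Map<String, String> PROPERTY_BY_BROWSER = Map.of(
            "chrome", "webdriver.chrome.driver",
            "firefox", "webdriver.gecko.driver",
            "ie", "webdriver.ie.driver");

    static String propertyFor(String browser) {
        String key = PROPERTY_BY_BROWSER.get(browser.toLowerCase(Locale.ROOT));
        if (key == null) {
            throw new IllegalArgumentException("Unsupported browser: " + browser);
        }
        return key;
    }

    public static void main(String[] args) {
        // Hypothetical location of a separately downloaded geckodriver.
        System.setProperty(propertyFor("firefox"), "/opt/drivers/geckodriver");
        // With Selenium on the classpath, the rest of the code can stay browser-agnostic:
        // WebDriver driver = new FirefoxDriver();   // or new ChromeDriver(), ...
        System.out.println(System.getProperty("webdriver.gecko.driver"));
    }
}
```

Code written against the WebDriver interface only changes at the line that constructs the driver.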

How to enable flash in Selenium with headless chrome

I'm trying to automate some interactions with our Flash app as part of our CI process. I'm running into trouble enabling Flash when running Chrome headlessly (via xvfb-run) with Selenium Standalone Server. I've done a lot of searching, but so far I haven't found anything that works.
I'm currently using this, but am open to switching to different versions if there's a known working config somewhere...
Selenium Standalone Server 3.11
Chromedriver 2.33
Chrome 65.0.3325.181
Java 8
When I first got this started I would get a warning on the page saying I needed to enable Adobe Flash Player. I got "past" that message by using the following from https://sqa.stackexchange.com/questions/30312/enable-flash-player-on-chrome-62-while-running-selenium-test:
import java.util.HashMap;
import java.util.Map;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

ChromeOptions options = new ChromeOptions();
options.addArguments("headless");
Map<String, Object> prefs = new HashMap<>();
prefs.put("profile.default_content_setting_values.plugins", 1);
prefs.put("profile.content_settings.plugin_whitelist.adobe-flash-player", 1);
prefs.put("profile.content_settings.exceptions.plugins.*,*.per_resource.adobe-flash-player", 1);
// Enable Flash for this site
prefs.put("PluginsAllowedForUrls", "ourapp.com");
options.setExperimentalOption("prefs", prefs);
WebDriver driver = new ChromeDriver(options);
// driver.get() needs an absolute URL including the scheme
driver.get("https://ourapp.com");
When loading our app, the page now gives a slightly different message which I haven't been able to get past. Is there a way to get around this, or is there any other way to enable Flash by default?
Restart Chrome to enable Adobe Flash Player
Thanks in advance for the help!
Thanks to a coworker for pointing out this post, which indicates that plugins don't work in headless Chrome: https://groups.google.com/a/chromium.org/forum/#!searchin/headless-dev/flash%7Csort:date/headless-dev/mC0REfsA7vo/rKAZdRrCCQAJ
Fortunately in my case I was already using xvfb as a virtual display, so removing the "headless" argument from my ChromeOptions was all I needed to get everything running.
EDIT:
While it's true that I did not need headless mode while using xvfb, I ended up finding that the real 'solution' (more of a workaround) to my problem was to upload a custom Chrome profile into my Docker image. Doing so allowed me to set all the preferences I needed, and none of the code posted in the original question is required. I would much prefer to achieve this programmatically, but this at least gets me what I need for now. Figured I'd post in case someone else runs into this in the future.
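One way to wire such a baked-in profile into Selenium is Chrome's real --user-data-dir switch, dropping --headless when running under xvfb. This sketch assumes that approach; ChromeProfileArgs and the profile path are hypothetical, and the list would be passed to new ChromeOptions().addArguments(...).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: builds Chrome command-line arguments that point at a pre-built profile
 * directory (for example one copied into a Docker image).
 */
final class ChromeProfileArgs {

    private ChromeProfileArgs() {}

    static List<String> chromeArgs(String profileDir, boolean headless) {
        List<String> args = new ArrayList<>();
        if (headless) {
            // Plugins such as Flash did not load in headless Chrome, so xvfb users pass false.
            args.add("--headless");
        }
        // Reuse the preferences stored in the prepared profile.
        args.add("--user-data-dir=" + profileDir);
        return args;
    }
}
```

For example, chromeArgs("/opt/chrome-profile", false) yields a single --user-data-dir argument and leaves the browser visible to xvfb.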

Selenium WD: different pages appear when opening them manually and when using Selenium

I have run into the following problem: when I open the login page of my application manually (e.g. **hello**.example.com/login), I see one page style and DOM structure.
But when I start the application using Selenium WebDriver (so Chrome is controlled by automated test software), I suddenly see another page that I have on this domain (e.g. **hi**.example.com/login) with that page's DOM structure. The URL in the browser is the one I expected (the first link), but the appearance and DOM are as if I were on the other page (the second link).
I tried using an incognito window without any plugins or settings, but it didn't help.
Chrome Version 62.0.3202.94 (64-bit);
Chromedriver version 2.33.506120;
Win 10.
What could the problem be?
Thanks in advance.

Selenium Chrome Driver Limitations Web Scraping at Scale

I'm planning to use Selenium ChromeDriver in a project that scrapes multiple public websites (something like Kayak or Skyscanner). There will be a REST GET endpoint where my backend launches headless Chrome to scrape multiple websites and eventually returns a manipulated JSON.
I want to know how scalable ChromeDriver is, since it sounds like a headless Chrome instance needs to be launched whenever a request comes in.
Update: the question now uses headless Google Chrome.
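On the scalability concern, launching one Chrome per request can exhaust CPU and memory under load. One common mitigation, sketched here as an assumption rather than a ChromeDriver feature, is to cap concurrent browser sessions with java.util.concurrent.Semaphore so that extra requests wait for a free slot. BrowserLimiter and the limit you choose are hypothetical; each task would create a ChromeDriver, scrape, and call quit() in a finally block.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/**
 * Sketch: limits how many browser-backed scrape tasks run at the same time,
 * so a burst of requests queues up instead of launching unbounded Chrome instances.
 */
final class BrowserLimiter {

    private final Semaphore slots;

    BrowserLimiter(int maxConcurrentBrowsers) {
        // Fair semaphore: waiting requests are served in arrival order.
        this.slots = new Semaphore(maxConcurrentBrowsers, true);
    }

    /** Runs the task once a slot is free; the slot is released even if the task throws. */
    <T> T run(Supplier<T> scrapeTask) throws InterruptedException {
        slots.acquire(); // blocks while all browser slots are busy
        try {
            return scrapeTask.get();
        } finally {
            slots.release();
        }
    }

    int freeSlots() {
        return slots.availablePermits();
    }
}
```

Reusing long-lived browser sessions from a pool is another option; the semaphore approach is simpler but still pays Chrome's startup cost on every request.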
Here are the pros and cons of PhantomJS that I noticed during implementation. Hope this helps.
Cons:
1) It fails to recognize browser elements by id, XPath or CSS selector more often than ChromeDriver does.
2) If you have a login mechanism, redirects won't work as you expect, compared to ChromeDriver.
3) If you need screenshots for test failures, you have to implement the custom logic yourself.
4) Switching between multiple drivers like Chrome, HtmlUnit etc. is very difficult.
Pros:
1) Test case execution is faster compared to ChromeDriver.
2) No browser is required; it runs without a GUI.
3) Not much configuration is needed compared to ChromeDriver.
You can also go with HtmlUnitDriver, which is quite a bit faster than PhantomJS, but it too has its own limitations that you need to take care of before implementation.
I am not sure that you really need to use PhantomJS.
Chrome implemented "headless" mode a couple of months ago.
Headless Chrome does the same job as PhantomJS, and does it better.
I heard that the PhantomJS authors even said that they will not support it anymore.
You can enable headless mode in Selenide with just one line:
Configuration.headless = true;
Have you considered headless Chrome?

Selenium 2.32, Java 1.6.0_07, IE Webdriver (32 and 64 Bit), IE9 - getWindowHandles returns only one browser

I am using Selenium 2.32, Java JDK 1.6.0_07 and IE9 on Windows 7. Here is the problem:
When I use the 32-bit IE WebDriver and click a link that opens a new browser window containing a PDF, the PDF opens in the browser itself, which is fine, but the new window is not identified when I call driver.getWindowHandles(). It always returns only the parent window. When I use the same code with IE8, it works perfectly fine and I am able to get the URL of the new window.
Since it is Windows 7 and IE9, I thought I should use the 64-bit IE WebDriver, so I used IE WebDriver version 2.32.3 (64-bit). With this driver, when I click the link, the new window pops up, but the PDF is not opened in the browser; instead it is opened as a separate PDF file. Even in this case, the new window is not identified and driver.getWindowHandles() returns only one window.
Not just the PDF windows but also normal windows are not returned by driver.getWindowHandles().
I use a 10-second wait for the new window to load, so there is no load/sync issue.
I want to identify the new browser and get the URL of the new browser. Please help.
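For reference, the usual way to pick out a newly opened window is to compare driver.getWindowHandles() before and after the click. This plain-Java sketch (NewWindowFinder is a hypothetical helper) shows only the set difference; it cannot help if the IE driver never reports the new window in the first place, which is the situation described above.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

/**
 * Sketch: returns the window handle present after an action but not before it.
 */
final class NewWindowFinder {

    private NewWindowFinder() {}

    static Optional<String> newHandle(Set<String> before, Set<String> after) {
        Set<String> added = new LinkedHashSet<>(after);
        added.removeAll(before);
        return added.stream().findFirst();
    }

    // Intended use with Selenium (not compiled here):
    // Set<String> before = driver.getWindowHandles();
    // link.click();
    // Optional<String> handle = newHandle(before, driver.getWindowHandles());
    // handle.ifPresent(h -> {
    //     driver.switchTo().window(h);
    //     System.out.println(driver.getCurrentUrl());
    // });
}
```

An empty Optional here means the driver did not report any new window, which points at a driver or browser configuration problem rather than a timing issue.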
The problem here is that you are making things too complicated. From your comments, it does not seem you are doing things the "typical" and "recommended" way. If you are following advice, then you are taking the slightly harder route. My advice is still to simplify further.
If I were to guess at your issue, though: I notice that you say you are using "IEDriverServer". That tells me that you may be using WebDriver improperly. When you are using a Grid Hub and a separate Grid Node (see my link here for sample launch instructions: https://gist.github.com/djangofan/5174433), you should be invoking RemoteWebDriver rather than a local driver, like so (or similar):
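import java.net.URL;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;
// Connect to the Grid hub; new URL(...) can throw MalformedURLException
WebDriver driver = new RemoteWebDriver(
        new URL("http://localhost:4444/wd/hub"),
        DesiredCapabilities.firefox()
);
driver.get("http://www.google.com");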
This worked for me:
Root cause: On IE 7 or higher on Windows Vista or Windows 7, you must set the Protected Mode settings for each zone to be the same value. The value can be on or off, as long as it is the same for every zone. To set the Protected Mode settings, choose "Internet Options..." from the Tools menu, and click on the Security tab. For each zone, there will be a check box at the bottom of the tab labeled "Enable Protected Mode".
Hope it works for you.
