I am working on a Java project in which I have to work with the DOM.
To do that, I first load the dynamic page for a given URL using Selenium.
Then I parse it using Jsoup.
I want to get the dynamic page source of the given URL.
Code snapshot:
public static void main(String[] args) throws IOException {
// Selenium
WebDriver driver = new FirefoxDriver();
driver.get("ANY URL HERE");
String html_content = driver.getPageSource();
driver.close();
// Jsoup makes DOM here by parsing HTML content
Document doc = Jsoup.parse(html_content);
// OPERATIONS USING DOM TREE
}
The problem is that Selenium takes around 95% of the total processing time, which is undesirable.
Selenium first opens Firefox, then loads the given page, and only then returns the dynamic page source.
How can I reduce the time Selenium takes, or replace it with a more efficient tool? Any other advice is also welcome.
Edit No. 1
There is some code given on this link.
FirefoxProfile profile = new FirefoxProfile();
profile.setPreference("general.useragent.override", "some UA string");
WebDriver driver = new FirefoxDriver(profile);
But I don't understand what the second line does, and the Selenium documentation is very poor.
Edit No. 2
System.out.printf("Fetching %s...%n", url1);
System.out.printf("Fetching %s...%n", url2);
WebDriver driver = new FirefoxDriver(createFirefoxProfile());
driver.get(url1);
String html1 = driver.getPageSource();
driver.get(url2);
String html2 = driver.getPageSource();
driver.close();
Document doc1 = Jsoup.parse(html1);
Document doc2 = Jsoup.parse(html2);
Try this:
public static void main(String[] args) throws IOException {
// Selenium
WebDriver driver = new FirefoxDriver(createFirefoxProfile());
driver.get("ANY URL HERE");
String html_content = driver.getPageSource();
driver.close();
// Jsoup makes DOM here by parsing HTML content
// OPERATIONS USING DOM TREE
}
private static FirefoxProfile createFirefoxProfile() {
File profileDir = new File("/tmp/firefox-profile-dir");
if (profileDir.exists())
return new FirefoxProfile(profileDir);
FirefoxProfile firefoxProfile = new FirefoxProfile();
File dir = firefoxProfile.layoutOnDisk();
try {
profileDir.mkdirs();
FileUtils.copyDirectory(dir, profileDir);
} catch (IOException e) {
e.printStackTrace();
}
return firefoxProfile;
}
The createFirefoxProfile() method creates a profile if one doesn't exist and reuses the existing one otherwise, so Selenium doesn't need to create the profile directory structure on every run.
If you are confident in your code, you can use PhantomJS instead. It is a headless browser, so it returns results quickly, whereas Firefox takes time to run.
Hi, I want to scrape information from a website, so I tried to use Jsoup (and also HttpClient). I realized that neither of them can "see" certain content of the HTML page: when I print the parsed HTML, I get an empty div like the one below, while other divs print just fine.
Here's my code:
class Main {
public static void main(String args[]) throws IOException, InterruptedException {
Document doc = Jsoup.connect(url).get();
System.out.println(doc.getElementsByClass("needed content"));
}
}
the result in the terminal is:
<div class="needed content"></div>
I searched for answers on Stack Overflow. Some recommend using the Jackson library:
Java - How do I access a child of Div using JSoup
Some recommend embedding a browser in Java:
Is there a way to embed a browser in Java?
Some recommend using HtmlUnit:
Fail to get full content of page with JSoup
I just tried combining Jsoup with HtmlUnit, with the same result. Here's the code:
try(WebClient wc = new WebClient()){
wc.getOptions().setJavaScriptEnabled(true);
wc.getOptions().setCssEnabled(false);
wc.getOptions().setThrowExceptionOnScriptError(false);
wc.getOptions().setTimeout(10000);
HtmlPage page = wc.getPage("https://chainlinklabs.com/jobs");
String pageXml = page.asXml();
Document doc2 = Jsoup.parse(pageXml, url);
System.out.println(doc2.getElementsByClass("needed content"));
System.out.println("Thank God!");
}
My interpretation of the problem is that Jsoup is not showing part of the HTML content because it contains JavaScript; am I heading in the right direction?
There is no need to re-parse the page from HtmlUnit into jsoup; it is a waste of resources. All the select options are also available in HtmlUnit (see https://htmlunit.sourceforge.io/gettingStarted.html), and maybe more.
This simple code works for me. Parts of the page are generated by a JS script that runs asynchronously, so you have to wait for these scripts before accessing the page.
public static void main(String[] args) throws IOException {
String url = "https://chainlinklabs.com/jobs";
try (final WebClient webClient = new WebClient()) {
webClient.getOptions().setThrowExceptionOnScriptError(false);
HtmlPage page = webClient.getPage(url);
webClient.waitForBackgroundJavaScriptStartingBefore(10_000);
// System.out.println("--------------------------------");
// System.out.println(page.asXml());
// System.out.println("--------------------------------");
System.out.println("- Jobs -------------------------");
final DomNodeList<DomNode> jobTitles = page.querySelectorAll(".job-title");
for (DomNode domNode : jobTitles) {
System.out.println(domNode.asNormalizedText());
}
System.out.println("--------------------------------");
}
}
Is there a way to get Selenium screenshots that include the browser header? I've tried the code below, but the screenshot does not have a header.
I have a test case that requires clicking a link and verifying that the action opens a new tab, so as evidence I have to attach a capture showing that there are two tabs.
public static void main (String args[]) throws IOException {
DesiredCapabilities dc = new DesiredCapabilities();
RemoteWebDriver driver;
URL url = new URL("http://localhost:4444/wd/hub");
dc.setCapability(CapabilityType.BROWSER_NAME, BrowserType.CHROME);
dc.setCapability(CapabilityType.PLATFORM, "MAC");
driver = new RemoteWebDriver(url, dc);
driver.manage().window().maximize();
driver.get("https://google.com");
new WebDriverWait(driver, 20).until(ExpectedConditions.presenceOfElementLocated(By.name("q")));
File getImage = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
FileUtils.copyFile(getImage, new File("/Users/path/screenshot.jpg"));
driver.quit();
}
Current result
Expected result
No, you can't. The screenshot functionality in Selenium takes an image of the rendered DOM. The browser chrome (i.e. the UI components of the browser rendered by the OS the browser is running on) is not part of the DOM, so Selenium is unaware of it.
The next question is why do you want the browser chrome in your image? If you are just trying to find out the displayed URL (as your question implies) you can use the command driver.getCurrentUrl(); instead.
As @Ardesco suggested, it is not possible to capture the browser chrome with Selenium's screenshot functionality.
However, I think you can use the java.awt.Robot class to capture the screen. It takes a screenshot of the current screen.
Here's a snippet that captures a screenshot using java.awt:
public void getScreenshot(int timeToWait) throws Exception {
Rectangle rectangle = new Rectangle(
Toolkit.getDefaultToolkit().getScreenSize());
Robot robot = new Robot();
BufferedImage img = robot.createScreenCapture(rectangle);
ImageIO.write(img, "jpg", setupFileNamePath());
}
I have a URL with an embedded video, just like YouTube URLs, and I want to validate that the video loads and streams.
The difficulty I am having is that I don't have the tag name, id, or anything else for the video element, so how can I check in such a case?
Code using Selenium:
public class URLCheck
{
public static void main(String args[]){
File file = new File("C:\\Users\\MB0000038\\Desktop\\chromedriver\\chromedriver.exe");
System.setProperty("webdriver.chrome.driver", file.getAbsolutePath());
WebDriver driver=new ChromeDriver();
driver.get("https://www.youtube.com/watch?v=FtsrtcagbOQ");
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
// JavascriptExecutor js = (JavascriptExecutor) driver;
// WebElement video = driver.findElement(By.tagName("video"));
driver.quit();
}
}
This code works, but it only opens a Chrome tab displaying the video.
Thanks in advance
I have checked your video; there are two elements you can use to verify it.
driver.get("https://www.youtube.com/watch?v=59rEMnKWoS4");
driver.manage().window().maximize();
WebElement pauseAndPlay = driver.findElement(By.xpath("//button[@class='ytp-play-button ytp-button']"));
// Get the aria-label attribute of the pauseAndPlay element, which tells you the current state of the video (pause/play)
String videoState=pauseAndPlay.getAttribute("aria-label");
if(videoState.equalsIgnoreCase("Pause")){
System.out.println("Video is currently in play state");
Thread.sleep(3000);
//pausing my video and checking the current play time
pauseAndPlay.click();
Thread.sleep(2000);
WebElement currentTimeElement=driver.findElement(By.xpath("//span[@class='ytp-time-current']"));
String currentTime=currentTimeElement.getText();
if(!"0:00".equals(currentTime)){
System.out.println("my video is getting progressed and currently at:"+currentTime);
}else{
System.out.println("my video is not getting played");
}
}else if(videoState.equalsIgnoreCase("Play")){
System.out.println("Video is currently paused");
}
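A side note on the elapsed-time check in the snippet above: in Java, == and != on strings compare object references rather than contents, so text read from the page should be compared with equals() or parsed into a number first. The following is only a minimal sketch and not part of the original answer; ElapsedTimeCheck, toSeconds and hasProgressed are hypothetical names, and it assumes the player shows the time as m:ss or h:mm:ss.

```java
final class ElapsedTimeCheck {

    /** Converts "m:ss" or "h:mm:ss" text into a total number of seconds. */
    static int toSeconds(String text) {
        int total = 0;
        for (String part : text.trim().split(":")) {
            // Each field to the right is worth 1/60 of the one before it.
            total = total * 60 + Integer.parseInt(part);
        }
        return total;
    }

    /** True when the displayed time has moved past zero. */
    static boolean hasProgressed(String displayedTime) {
        return toSeconds(displayedTime) > 0;
    }

    public static void main(String[] args) {
        // A distinct String object with the same contents as the literal.
        String fromPage = new String("0:00");
        System.out.println(fromPage != "0:00");       // true: different references
        System.out.println(!"0:00".equals(fromPage)); // false: same contents
        System.out.println(hasProgressed("0:03"));    // true
    }
}
```

Comparing seconds instead of raw text also makes it possible to check that the time keeps increasing between two reads.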
I need to download the source code of this webpage: https://app.zonky.cz/#/marketplace/ so that my code can check whether a new loan is available. Unfortunately for me, the web page shows a loading spinner while the page is being loaded in the background. When I try to download the page's source using:
String url = "https://app.zonky.cz/#/marketplace/";
StringBuilder text = new StringBuilder();
try
{
URL pageURL = new URL(url);
Scanner scanner = new Scanner(pageURL.openStream(), "utf-8");
try {
while (scanner.hasNextLine()){
text.append(scanner.nextLine() + "\n");
}
}
finally{
scanner.close();
}
}
catch(Exception ex)
{
//
}
System.out.println(text.toString());
I get the page's source from the moment the spinner is being shown. Do you know of a better approach?
Solution:
public static String getSource() {
WebDriver driver = new FirefoxDriver();
driver.get("https://app.zonky.cz/#/marketplace/");
String output = driver.getPageSource();
driver.close();
return output;
}
You could always wait until the page has finished loading by checking whether an element exists (one that is only loaded after the spinner disappears).
Also, have you looked into using Selenium? It can be really useful for interacting with websites and handling tricky procedures such as waiting for elements :P
Edit: a pretty simple tutorial on waiting in Selenium can be found here - http://docs.seleniumhq.org/docs/04_webdriver_advanced.jsp#explicit-and-implicit-waits
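The wait-for-an-element idea above boils down to polling a condition until it holds or a timeout expires. Here is a minimal sketch of that loop in plain Java, under stated assumptions: PollingWait and waitUntil are hypothetical names (not Selenium API), and the condition used in main is only a stand-in for a real check such as "the element behind the spinner exists". Selenium's explicit waits, covered in the tutorial linked above, provide this behaviour out of the box.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class PollingWait {

    /**
     * Re-checks the condition every `interval` until it is true or `timeout` elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 200 ms after start.
        boolean ready = waitUntil(
                () -> System.currentTimeMillis() - start >= 200,
                Duration.ofSeconds(2),
                Duration.ofMillis(50));
        System.out.println("ready = " + ready);
    }
}
```

Only once the condition has held would you read the page source, so the spinner markup is no longer what you get back.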
I'm using Selenium WebDriver to try to insert an external javascript file into the DOM, rather than type out the entire thing into executeScript.
It looks like it properly places the node into the DOM, but then it just disregards the source, i.e. the function on said source js file doesn't run.
Here is my code:
import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
public class Example {
public static void main(String[] args) {
WebDriver driver = new FirefoxDriver();
driver.get("http://google.com");
JavascriptExecutor js = (JavascriptExecutor) driver;
js.executeScript("document.getElementsByTagName('head')[0].innerHTML += '<script src=\"<PATH_TO_FILE>\" type=\"text/javascript\"></script>';");
}
}
The code of the javascript file I am linking to is
alert("hi Nate");
I've placed the js file on my localhost, I called it using file:///, and I tried it on an external server. No dice.
Also, in the Java portion, I tried appending 'scr'+'ipt' using that trick, but it still didn't work. When I inspect the DOM using Firefox's inspect element, I can see it loads the script node properly, so I'm quite confused.
I also tried this solution, which apparently was made for another version of Selenium (not webdriver) and thus didn't work in the least bit: Load an external js file containing useful test functions in selenium
According to this: http://docs.seleniumhq.org/docs/appendix_migrating_from_rc_to_webdriver.jsp
You might be using the browserbot to obtain a handle to the current
window or document of the test. Fortunately, WebDriver always
evaluates JS in the context of the current window, so you can use
“window” or “document” directly.
Alternatively, you might be using the browserbot to locate elements.
In WebDriver, the idiom for doing this is to first locate the element,
and then pass that as an argument to the Javascript. Thus:
So does the following work in webdriver?
WebDriver driver = new FirefoxDriver();
((JavascriptExecutor) driver)
.executeScript("var s=window.document.createElement('script');"
+ "s.src='somescript.js';"
+ "window.document.head.appendChild(s);");
Injecting our JS-File into DOM
Inject our JS file into the browser's application from our local server, so that we can access our function through the document object.
injectingToDOM.js
var getHeadTag = document.getElementsByTagName('head')[0];
var newScriptTag = document.createElement('script');
newScriptTag.type='text/javascript';
newScriptTag.src='http://localhost:8088/WebApplication/OurOwnJavaScriptFile.js';
// adding <script> to <head>
getHeadTag.appendChild(newScriptTag);
OurSeleniumCode.java
String baseURL = "http://-----/";
driver = new FirefoxDriver();
driver.navigate().to(baseURL);
JavascriptExecutor jse = (JavascriptExecutor) driver;
Scanner sc = new Scanner(new FileInputStream(new File("injectingToDOM.js")));
String inject = "";
while (sc.hasNext()) {
String[] s = sc.next().split("\r\n");
for (int i = 0; i < s.length; i++) {
inject += s[i];
inject += " ";
}
}
jse.executeScript(inject);
jse.executeScript("return document.ourFunction();");
OurOwnJavaScriptFile.js
document.ourFunction = function(){ .....}
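Note: if you pass the JS file to executeScript() as a String, remove all comments from the JavaScript code first (for example, the comment in injectingToDOM.js). The Scanner loop above joins the whole file onto a single line, so a // comment would comment out all the code that follows it.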
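To illustrate the note above: the Scanner loop in OurSeleniumCode.java joins every token with a space, so the script ends up on a single line and a // comment hides everything after it. Below is a minimal sketch of the effect using only the JDK; ScriptLoader, flatten and load are hypothetical names. Reading the file with its line breaks intact, as load does, should be one way to keep comments harmless, since each // comment then still ends at its own line break.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

final class ScriptLoader {

    /** Mimics the token-by-token loop: all whitespace, including newlines, becomes one space. */
    static String flatten(String source) {
        StringBuilder out = new StringBuilder();
        try (Scanner sc = new Scanner(source)) {
            while (sc.hasNext()) {
                out.append(sc.next()).append(' ');
            }
        }
        return out.toString();
    }

    /** Reads the whole file as UTF-8 text, keeping line breaks as they are. */
    static String load(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String js = "// adding <script> to <head>\n"
                  + "getHeadTag.appendChild(newScriptTag);\n";
        // Prints a single line: the appendChild call now sits behind the // comment.
        System.out.println(flatten(js));
    }
}
```

The string returned by load could then be passed to executeScript() in place of the inject variable.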