Jsoup getting image source of widget from google search [duplicate] - java

Can I fill out forms, execute events and Javascript functions in Jsoup? If yes how can I? Or should I go for another parser.

JSoup is just an HTML parser/"tidyfier" - not a browser emulator. To interact with HTML pages (execute javascript, fill out forms, etc.) you should use a tool like HtmlUnit or Selenium.

Use Selenium - if you use Selenium 2 WebDriver API, the main classes there are WebDriver, FirefoxDriver, and JavascriptExecutor.

Related

Can we automate pdf using selenium?

Is there any way to get text from PDF pages using selenium/java apart from reading through input file stream?
In my application a report opens in PDF format, I need to get data from it.
When opened in Firefox it shows DOM structure but I wasn't able to locate element using that.
No. Selenium automates browsers to exercise web applications and run tests. What you are asking for is not part of the Selenium API. Third-party APIs are available, but they don't work 100% of the time. Check out:
How to extract text from a PDF?

Perform click on Web page element before parsing in Java

I'm trying to parse HTML page with DOM parser and jsoup library.
The problem that I'm facing is this:
On Web site there are two buttons which show two different tables.
I need to parse the table which is shown when the second button is clicked.
There are different attribute values set after clicking the second button.
When I call Jsoup.connect("example.com"), I get the response as if the first button were selected, and I don't need that data.
Is there a way to perform click on second button, and then start parsing and retrieving data from Web site?
Jsoup is just a parser, i.e. it can't handle events such as clicking on buttons. Have a look at browser automation tools (e.g. Selenium) to perform this kind of job.
JSoup is an HTML parser, not a browser alternative. Take a look at HtmlUnit.
HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.
JSoup can't control the web page, only parse the content. For manipulation and interaction, there are some tools. I recommend Geb, which uses a Groovy DSL with a JQuery like syntax, making it very fluent. It's also pretty easy to parse xml/html with it.

java: get html contents

I have an HTML file containing some JavaScript tags. When I run this file in a browser such as IE, some contents are cached from its source and displayed in the browser (for example, the weather of some cities). How can I run this HTML file and get the contents of the web page as it was displayed in the browser? I don't want to display the contents in my application; I want to parse the returned data and extract specific contents (for example, the weather of each city).
Can anyone guide me, please?
What you're trying to do is called html scraping.
Your best option is to get help in the form of a library, since this is a common and complex task.
See this question: Options for HTML scraping?
Selenium is a good bet. It supports HtmlUnit, Firefox, Chrome amongst other browsers.
Link: http://seleniumhq.org/

How get a text element from an xpath expression in Selenium?

When Selenium navigates through web pages, I want to save some of the text from those pages. I use Selenium with Java. Is there a way to extract the text at a specific XPath from those pages?
selenium.getText("//div[@id='debugState']");
Visit, http://release.seleniumhq.org/selenium-remote-control/0.9.0/doc/java/com/thoughtworks/selenium/DefaultSelenium.html#getText%28java.lang.String%29
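To see what an XPath attribute test such as `//div[@id='debugState']` selects, here is a minimal sketch that runs outside Selenium. It swaps in the JDK's built-in `javax.xml.xpath` API as a stand-in for `selenium.getText`, so it only works on well-formed markup and says nothing about how Selenium resolves locators internally. The class name `XPathTextDemo`, the helper `textAt`, and the sample markup are made up for illustration. Note that the attribute axis is written `@id`; `#id` is not valid XPath 1.0 syntax.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathTextDemo {

    // Hypothetical helper: evaluates an XPath expression against well-formed
    // markup and returns the trimmed string value of the first matching node
    // (an empty string when nothing matches).
    static String textAt(String markup, String xpathExpr) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String value = xpath.evaluate(xpathExpr, new InputSource(new StringReader(markup)));
        return value.trim();
    }

    public static void main(String[] args) throws Exception {
        // Sample markup (an assumption, not taken from any real page).
        String page = "<html><body><div id='debugState'> ready </div></body></html>";

        // Same expression as in the answer above, with the @id attribute test.
        System.out.println(textAt(page, "//div[@id='debugState']"));
    }
}
```

Real-world HTML is often not well-formed XML, which is one reason to evaluate the expression through Selenium against the live page rather than with an XML parser.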
