I have an HTML file containing some JavaScript tags. When I open this file in a browser such as IE, some content is loaded from its source and displayed in the browser (for example, the weather of some cities). How can I run this HTML file and get the contents of the web page as they would be displayed in a web browser? I don't want to display the contents in my application; I want to parse the returned data and extract specific content (for example, the weather of each city).
Can anyone guide me, please?
What you're trying to do is called HTML scraping.
Your best option is to get help in the form of a library, since this is a common and complex task.
See this question: Options for HTML scraping?
Selenium is a good bet. It supports HtmlUnit, Firefox, Chrome amongst other browsers.
Link: http://seleniumhq.org/
Related
Is there any way to get text from PDF pages using Selenium/Java, apart from reading through an input file stream?
In my application a report opens in PDF format, and I need to get data from it.
When opened in Firefox it shows a DOM structure, but I wasn't able to locate an element using that.
Big no. Selenium automates browsers, mocks web applications, and runs tests. What you are asking for is not part of the Selenium API. Third-party APIs are available, but they don't work 100% of the time. Check out:
How to extract text from a PDF?
I am developing a Java web application (JSP/servlet) using Tomcat. I need to display a PDF file from the local machine. Can you suggest the best way to display it?
I used an iframe to display the PDF file.
<iframe src="resume.pdf" width="100%" style="height:60em">
[Your browser does <em>not</em> support <code>iframe</code>,
or has been configured not to display inline frames.
You can access the document
via a link though.]
</iframe>
I think you can try a library called Xpdf, which I believe can convert a PDF to an HTML page. The second option is simply to let the user open a link to the file (www.yourwebsite.com/pdffolder/somepdf.pdf).
If you need to display a PDF file using Tomcat, you can access the file directly in your browser through the URL where it is located, which depends on the path where you put the file; for example, 127.0.0.1/files/test.pdf. If you need to generate a PDF, I think the best tool is iText. Here is an easy example of how to use it: Introducing PDF and iText
I want a Java app that captures all the images (and preferably data in other tags too) from a web page and writes their links to an Excel file.
While I know my way around Excel files and Java, I was wondering if there's any way to capture images from web pages.
A quick Google search didn't help.
Obviously there is.
Since images are referenced in the page source, you can start with the simplest solution: get the page source, retrieve the image links, and download them.
KISS ;-)
You probably need to parse the HTML of the web page and get the links that refer to images from the respective HTML tags.
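To make the "get the source, pull out the image links" idea concrete, here is a minimal Java sketch. It assumes you already have the page source as a string, and it uses a regular expression rather than a real HTML parser, which is a simplification: it only handles quoted src attributes and would also pick up attributes such as data-src. The class name, method name, and example URLs are made up for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImageLinkExtractor {

    // Matches the src attribute of an <img> tag, quoted with " or '.
    // This is a simplification; a dedicated HTML parser is more robust.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the absolute URL of every quoted <img src> found in the HTML. */
    public static List<String> extractImageLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            // Resolve relative paths such as "/logo.png" against the page URL.
            // URI.resolve throws IllegalArgumentException for malformed values.
            links.add(base.resolve(m.group(2).trim()).toString());
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical page source; in practice this comes from an HTTP request.
        String html = "<html><body>"
                + "<img src=\"/logo.png\" alt=\"logo\">"
                + "<IMG alt='x' SRC='http://cdn.example.com/a.jpg'>"
                + "</body></html>";
        for (String link : extractImageLinks(html, "http://example.com/page.html")) {
            System.out.println(link);
        }
    }
}
```

The returned list can then be written to the Excel file one row per link. Note that images added to the page by JavaScript will not appear in the raw source, so this approach only finds what is in the HTML itself.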
I already tested the window.print() command for this purpose, but it does not fulfill my requirement.
I also tried printing the content of an iframe whose source is a PDF file, but that only works in Chrome, not in other browsers.
I want to print PDF files automatically from code, instead of opening each file and printing it.
For example, if there are two files such as 1.pdf and 2.pdf in a directory and their source paths are given, how can I print both files using JavaScript, PHP, or both?
My requirement is shown in the image:
Million thanks in advance.
This is not possible, since most browsers, unlike Google Chrome (where it works), don't have a built-in PDF viewer.
Printing a PDF document is up to the PDF reader, whether or not it is installed as a browser plugin, and not up to the browser.
I fixed this issue by merging multiple PDFs, images, or both using ImageMagick.
With the command below we can merge a PDF and an image:
<?php
// Merge test.pdf and test.jpeg into final.pdf with ImageMagick's convert.
$cmd = "test.pdf test.jpeg final.pdf";
exec("convert $cmd");
?>
After the merge completes, open final.pdf automatically from code so that the user can print it easily.
You can find more.
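For anyone doing the same merge from Java rather than PHP, here is a rough sketch of the equivalent call. It assumes ImageMagick's convert executable is on the PATH and reuses the file names from the command above; the class and method names are made up for illustration. Passing the arguments as a list, instead of one shell string, avoids quoting problems when file names contain spaces.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PdfMerger {

    /** Builds the argument list for: convert <inputs...> <output>. */
    public static List<String> buildMergeCommand(List<String> inputs, String output) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("at least one input file is required");
        }
        List<String> command = new ArrayList<>();
        command.add("convert");   // ImageMagick executable, assumed to be on the PATH
        command.addAll(inputs);   // PDFs and images, in the order they should appear
        command.add(output);      // merged PDF to create
        return command;
    }

    /** Runs the merge and returns the exit code of the convert process. */
    public static int merge(List<String> inputs, String output)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildMergeCommand(inputs, output))
                .inheritIO()      // show ImageMagick's own messages on the console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        int exitCode = merge(Arrays.asList("test.pdf", "test.jpeg"), "final.pdf");
        System.out.println("convert exited with code " + exitCode);
    }
}
```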
I would like some guidance on the best way of launching PDFs from a browser.
I have a JSP that takes some parameters and based on this downloads a PDF from my Documentum server. The files are stored on my local file system. I then provide the user with a bunch of links to the PDFs so they can click on them to launch the PDF.
Is there a best practice method of doing this?
Thanks.
There is nothing special to do: a link is sufficient. But you can add target="_blank" if you want the browser to open the PDF in a new window.
Just make sure that the Content-Type returned for the PDF is application/pdf.
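As a rough illustration of the Content-Type point, here is a small Java sketch that serves one PDF with the right header. It uses the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in for your JSP/servlet container, so treat it as an assumption-laden example rather than how your application should be structured; the path files/test.pdf, the port, and the class name are hypothetical. In a servlet the corresponding call is response.setContentType("application/pdf").

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class PdfServer {

    /** Content type to send for a file name; only PDF is special-cased here. */
    public static String contentTypeFor(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".pdf")
                ? "application/pdf"
                : "application/octet-stream";
    }

    /** Sends one file to the client with the matching Content-Type header. */
    static void sendFile(HttpExchange exchange, Path file) throws IOException {
        byte[] body = Files.readAllBytes(file);
        exchange.getResponseHeaders()
                .set("Content-Type", contentTypeFor(file.getFileName().toString()));
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }

    public static void main(String[] args) throws IOException {
        Path pdf = Paths.get("files", "test.pdf");   // hypothetical file location
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/files/test.pdf", exchange -> sendFile(exchange, pdf));
        server.start();
    }
}
```

With the header set this way, a plain link to /files/test.pdf lets the browser decide whether to show the document inline or hand it to an external PDF reader.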