Is there any way to get text from PDF pages using Selenium/Java other than reading the file through an input stream?
In my application a report opens in PDF format, and I need to get data from it.
When it is opened in Firefox, it shows a DOM structure, but I wasn't able to locate any elements using it.
Big NO. Selenium automates browsers to mock web applications and run tests. What you are asking for is not part of the Selenium API. Third-party APIs are available, although they don't work 100%. Check out:
How to extract text from a PDF?
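Since Selenium cannot read the PDF itself, the usual route is to download the file the browser opened and hand it to one of those third-party libraries. A minimal JDK-only sketch (Java 11+) of that first step is below: it fetches the bytes behind a URL and checks that they really are a PDF. The class name `PdfFetch` and its methods are hypothetical; the URL would typically come from Selenium's `driver.getCurrentUrl()`, and if the report sits behind a login you would also need to copy the session cookies from `driver.manage().getCookies()` into the request, which this sketch does not do.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: downloads the PDF a browser is showing so a PDF library can read it. */
public class PdfFetch {

    // Every PDF file starts with the ASCII header "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Fetches the raw bytes behind a URL (e.g. the value of driver.getCurrentUrl()). */
    public static byte[] fetch(URI pdfUrl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(pdfUrl).GET().build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + pdfUrl);
        }
        return response.body();
    }

    /** Returns true if the bytes begin with the PDF header. */
    public static boolean looksLikePdf(byte[] data) {
        return data.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(data, PDF_MAGIC.length), PDF_MAGIC);
    }
}
```

A library such as Apache PDFBox can then extract the text from these bytes.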
I am developing a Java web application (JSP/servlet) on Tomcat. I need to display a PDF file from the local machine. Can you suggest the best way to display it?
I used an iframe to display the PDF file.
<iframe src="resume.pdf" width="100%" style="height:60em">
[Your browser does <em>not</em> support <code>iframe</code>,
or has been configured not to display inline frames.
You can access the document
via a <a href="resume.pdf">link</a> though.]
</iframe>
You could try a library called XPDF, which I think can convert a PDF into an HTML page. The second option is simply to let the user open a link to the file (www.yourwebsite.com/pdffolder/somepdf.pdf).
If you need to display a PDF file with Tomcat, you can open the file directly in your browser using the URL that corresponds to where you put it, for example 127.0.0.1/files/test.pdf. If you need to generate a PDF, I think the best tool is iText; here is an easy example of how to use it: Introducing PDF and iText
I have an HTML file containing some JavaScript tags. When I open this file in a browser such as IE, some content is loaded from its source and displayed in the browser (for example, the weather in some cities). How can I run this HTML file and get the content of the web page as it was displayed in the browser? I don't want to display the content in my application; I want to parse the returned data and extract specific content (for example, the weather for each city).
Can anyone guide me, please?
What you're trying to do is called HTML scraping.
Your best option is to get help in the form of a library, since this is a common and complex task.
See this question: Options for HTML scraping?
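Selenium is a good bet: it drives a real browser (or HtmlUnit), so the page's JavaScript runs and you can read the resulting content. It supports HtmlUnit, Firefox, and Chrome, among other browsers.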
Link: http://seleniumhq.org/
This question is related to another one I've posted recently: Check printing with Java/JSP
We're looking for alternatives to how we currently print checks in a Java web application via an applet. It seems the consensus is to use PDF for printing and that iText offers the ability to do so with Java.
However, it's important in our particular case that the checks are "print-only": the user should not have any ability in the application to save the check (I know a savvy user could do a PrintScreen, but we want to cover our rears and provide no native functionality in the app for saving checks).
I haven't been successful in searching the web to find out whether it's possible to create a PDF with iText in this manner. I have seen posts on restricting permissions in a PDF, but what I'm really looking for is a way to disable the ability to save a PDF locally using iText.
Does this functionality exist? If so, could you point me to documentation/code samples on it?
I'm presuming that you're serving this PDF and wish to print it from within a web application / web site where no out-of-the-ordinary client-side plug-ins are installed.
If you print the PDF by conventional means (e.g. Adobe Reader), the PDF MUST be downloaded to the browser's cache to be opened and printed. There is no way around that.
You can probably prevent the average Joe from saving the PDF locally with the following technique, but any savvy user will be able to inspect your HTML source and download the PDF directly.
1. Output your PDF from iText so that a print action occurs automatically when the PDF is opened.
2. Put an IFRAME on your HTML page that loads this PDF but is not visible to the user.
When the user loads your HTML page, the PDF will be loaded in the IFRAME and sent to the user's printer (presuming that Adobe Reader is installed in the browser). Yes, the PDF will end up in the browser cache, but the user would have to be savvy enough both to recognize this and then to hunt it down in the browser's cache.
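To show what "a PDF that prints itself when opened" contains, here is a rough JDK-only illustration; it is not iText's API, only a hand-written file with the same kind of open action. It writes a blank one-page PDF whose catalog carries an `/OpenAction` running the Acrobat JavaScript `this.print(true);`, where `true` asks the viewer to show its print dialog. The class name `AutoPrintPdf` is hypothetical, a real check would need a page content stream, and whether the viewer honours the action at all depends on the PDF reader and its security settings.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a minimal PDF whose /OpenAction asks the viewer to print. */
public class AutoPrintPdf {

    public static byte[] build() {
        String[] objects = {
            // 1: catalog; /OpenAction points at the JavaScript action in object 4
            "<< /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >>",
            // 2: page tree with a single page
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            // 3: an empty US Letter page (a real check would add a /Contents stream)
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
            // 4: JavaScript action; parentheses in the PDF literal string are backslash-escaped
            "<< /S /JavaScript /JS (this.print\\(true\\);) >>"
        };

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        write(out, "%PDF-1.4\n");
        for (int i = 0; i < objects.length; i++) {
            offsets.add(out.size()); // byte offset of "N 0 obj", needed by the xref table
            write(out, (i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n");
        }

        // Cross-reference table: every entry must be exactly 20 bytes long.
        int xrefStart = out.size();
        StringBuilder xref = new StringBuilder();
        xref.append("xref\n0 ").append(objects.length + 1).append('\n');
        xref.append("0000000000 65535 f \n");
        for (int offset : offsets) {
            xref.append(String.format("%010d 00000 n \n", offset));
        }
        xref.append("trailer\n<< /Size ").append(objects.length + 1)
            .append(" /Root 1 0 R >>\nstartxref\n").append(xrefStart).append("\n%%EOF\n");
        write(out, xref.toString());
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        out.writeBytes(s.getBytes(StandardCharsets.US_ASCII));
    }
}
```

Served with `Content-Type: application/pdf` as the `src` of the hidden IFRAME, such a file gives the behaviour described above, with the same caveat that it still lands in the browser cache.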
If this is not acceptable, you're going to have to look at converting the PDF to another file type (e.g. pages rendered to images displayed in the browser, or perhaps a Flash/Java object that sends each page of the document to the printer directly).
iText's PdfWriter class provides static constants for certain permission options (such as PdfWriter.ALLOW_PRINTING): PdfWriter
And here is another SO post that might help: iText disable printing/Copying/Saving
I already tried the window.print() command for this purpose, but it does not fulfill my requirement.
I also tried printing the content of an iframe whose source is a PDF file, but that only works in Chrome, not in other browsers.
I want to print PDF files automatically from code instead of opening each file and printing it.
For example, given two files such as 1.pdf and 2.pdf in some directory, how can I print both files using JavaScript, PHP, or both?
A million thanks in advance.
This is not possible, since most browsers, unlike Google Chrome (where it works), don't have a built-in PDF viewer.
Printing a PDF document is up to the PDF reader, not the browser, whether or not the reader is installed as a browser plugin.
I fixed this issue by using ImageMagick to merge the PDFs, images, or both into a single file.
The following command merges a PDF and an image:
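<?php
// Merge the inputs into final.pdf with ImageMagick's convert; escapeshellarg() quotes each path
$files = array_map('escapeshellarg', ['test.pdf', 'test.jpeg', 'final.pdf']);
exec('convert ' . implode(' ', $files), $output, $status);
if ($status !== 0) {
    echo "convert failed with status $status\n";
}
?>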
After the merge completes, open final.pdf automatically from code; the user can then print it easily.
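For the Java back ends discussed elsewhere on this page, the same ImageMagick merge can be run with `ProcessBuilder`. This is a hedged sketch: the class `ImageMagickMerge` is hypothetical, and it assumes the `convert` binary is on the `PATH` (ImageMagick 7 installs `magick` as its main command) with Ghostscript available for reading PDF input. Passing each file as a separate argument avoids shell quoting entirely.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: merges PDFs and images into one PDF by running ImageMagick's convert. */
public class ImageMagickMerge {

    /** Builds ["convert", in1, in2, ..., out]; each element becomes one argv entry, no shell involved. */
    public static List<String> buildCommand(List<String> inputs, String output) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("need at least one input file");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("convert");
        cmd.addAll(inputs);
        cmd.add(output);
        return cmd;
    }

    /** Runs the command and fails if ImageMagick exits with a non-zero status. */
    public static void merge(List<String> inputs, String output)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(inputs, output))
                .inheritIO() // show ImageMagick's messages on this process's console
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("convert exited with status " + exit);
        }
    }
}
```

For example, `ImageMagickMerge.merge(List.of("test.pdf", "test.jpeg"), "final.pdf")` mirrors the PHP command above. Note that ImageMagick rasterizes PDF pages, so any text in final.pdf ends up as an image.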
We have Flex on the front end and Java on the back end. When a user requests a PDF file, the request goes to the Java back end, where the PDF is generated using Jasper Reports. What we don't know is how to display this PDF in the browser, since we don't want to use JSP/servlets etc.; it has to be Flex only. Any suggestions?
Flash Player cannot natively render PDF files. This is possible using Adobe AIR, but not in a Flex application. Your best bet is to call navigateToURL() and open a servlet in a new browser tab/window. The servlet can simply write the contents of the PDF file to the OutputStream and set the appropriate HTTP headers.
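A minimal sketch of the back-end half of that answer: an HTTP endpoint that writes the PDF bytes with the appropriate headers, which the Flex client could open with `navigateToURL()`. To stay self-contained it uses the JDK's built-in `com.sun.net.httpserver` server instead of a servlet container; in a real servlet the same headers would be set with `response.setContentType(...)` and `response.setHeader(...)`. The class name `PdfServer`, the `/report.pdf` path, and reading the PDF from a file rather than from Jasper Reports are all assumptions.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: serves one PDF with the headers a browser needs to show or download it. */
public class PdfServer {

    /** "inline" asks the browser to display the PDF; "attachment" asks it to download the file. */
    public static String contentDisposition(String fileName, boolean inline) {
        return (inline ? "inline" : "attachment") + "; filename=\"" + fileName + "\"";
    }

    public static HttpServer start(int port, Path pdf) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/report.pdf", (HttpExchange exchange) -> {
            // Stand-in for the bytes Jasper Reports would generate.
            byte[] body = Files.readAllBytes(pdf);
            exchange.getResponseHeaders().set("Content-Type", "application/pdf");
            exchange.getResponseHeaders().set("Content-Disposition",
                    contentDisposition(pdf.getFileName().toString(), true));
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Even with `inline`, whether the PDF appears in the browser window or is downloaded still depends on the user's browser and reader settings.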
I know this question is old, but it may help others: there's a new library developed by Jasper Forge themselves that works with JasperReports directly. It's not a PDF viewer but a JasperReports exporting tool; you can download it from here.
I tried it through JasperServer: when viewing reports you can choose from different export options, one of which is Flash, and it works nicely.
Well, for starters, PDFs don't always display in the browser; it depends on the user's settings. You essentially send the PDF file with the appropriate headers, and either the user downloads it or a program like Acrobat Reader opens in the browser to display it.
I'm not sure how this is done in Flex, but I would imagine that if you're using Java, one simple servlet could do it.