I'm trying to download thousands of Excel files from a website using Python. I'd normally use urllib2 for this, but unfortunately the actual downloading takes place through a Java applet, and the URLs don't change correspondingly. E.g., filling out a query and hitting download doesn't change the URL until the file is actually downloading, and when it does change, the URL is always the same regardless of the query. So, in sum, I'm trying to use Python to download a bunch of files that are normally queried through a Java applet. Thanks in advance!
Related
In my application I have a webpage that displays content from MySQL Workbench in tabular format. The application is built with Spring MVC (Eclipse IDE). On this page I have created an export button; when it is clicked, the contents of the table should be exported to PDF or Excel format, and the generated file should be downloaded to the download folder.
Can anyone help with:
extract these table content
export the content to pdf/excel(xls)
download of the file on click of the button.
I am totally confused about how to start. Any reference on how to read the webpage content and proceed from there would be helpful.
Not exactly sure what you want to do, but extracting a website's content is easy with Jsoup. It is a great HTML parsing library and can be downloaded here. To produce an Excel file you could use Apache POI, a library developed by Apache. A tutorial on how to do this can be found here.
Here is what I think you can do:
Since you are displaying the contents in a webpage, you already have an API that fetches the data from the data source. You can use the same API to get the data. You would then need to turn that data into CSV format. On top of this you can have a small REST service that serves content of type text/csv. Your download button in the UI can invoke this REST service to get the CSV file downloaded. Hope this helps.
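The "turn that data into CSV" step could look roughly like the sketch below. It only uses the JDK; the class name CsvExport and its methods are made up for illustration, and it assumes the rows are already available as lists of strings.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: turns table rows into CSV text with RFC 4180 style quoting. */
final class CsvExport {

    /** Quotes a field if it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        // Inside a quoted field, a double quote is written as two double quotes.
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    /** Joins the fields of each row with commas and ends every row with CRLF. */
    static String toCsv(List<List<String>> rows) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            out.append(row.stream().map(CsvExport::escape).collect(Collectors.joining(",")));
            out.append("\r\n");
        }
        return out.toString();
    }
}
```

The REST endpoint would then return this string with Content-Type: text/csv and a Content-Disposition: attachment header, so the browser saves it as a file instead of rendering it.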
I am developing a Java project with a sub-module that needs to extract content (text, images, colors) from a webpage and compare it with another webpage. I am planning to use WinHTTrack to download the webpage locally, but the problem is that it doesn't save it as HTML. How can I download a webpage with an .html extension using software such as WinHTTrack (or is just saving the webpage with Ctrl+S enough)? I am also planning to use an HTML parser to extract the three content types (text, images, colors) after downloading the webpage locally. Which parser should I go with?
Well, I use HTTrack and it fetches HTML files as well. You are probably taking the WinHTTrack project file as the only output file, but if you check inside the project directory there are HTML files (together with images, etc.). I would suggest using http://htmlparser.sourceforge.net/. It is a Java library, and since your project is a Java project it should be fairly easy to use. You can also save the whole website locally using org.htmlparser.parserapplications.SiteCapturer (and specify whether resources such as images should be captured as well). Hope it helps.
Is it possible to send an image or file from a Java applet to a PHP script that produces a PDF?
So far my Java applet lets the user save a screenshot of the applet in a directory of their choice, and the user can save the test results as a PDF. But I want to combine both into one file, and I would like to avoid working with a database.
Yes, it wouldn't even make sense to use a database for this (unless that is the source of the test results). Just upload the screenshot/results as one POST request and then generate PDF with PHP.
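As a rough sketch of what "one POST request" could mean on the applet side, the helper below builds a multipart/form-data body that carries both the results text and the screenshot bytes. The class name and the field names "results" and "screenshot" are assumptions for illustration, and the screenshot is assumed to be a PNG.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: packs test results and a PNG screenshot into one multipart body. */
final class MultipartBody {

    static byte[] build(String boundary, String results, String fileName, byte[] png) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Part 1: the test results as a plain form field (field name is an assumption).
        write(out, "--" + boundary + "\r\n");
        write(out, "Content-Disposition: form-data; name=\"results\"\r\n\r\n");
        write(out, results + "\r\n");
        // Part 2: the screenshot as a file upload (field name is an assumption).
        write(out, "--" + boundary + "\r\n");
        write(out, "Content-Disposition: form-data; name=\"screenshot\"; filename=\""
                + fileName + "\"\r\n");
        write(out, "Content-Type: image/png\r\n\r\n");
        out.write(png, 0, png.length);
        // The closing boundary ends the whole body.
        write(out, "\r\n--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }
}
```

The applet would send these bytes with an HttpURLConnection whose Content-Type is "multipart/form-data; boundary=" followed by the same boundary string. With these field names, the PHP script would find the text in $_POST['results'] and the uploaded image in $_FILES['screenshot'].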
So what you want is to render a PDF combining both the image and the test results, right? The Processing framework has a nice Render2PDF function. Integrate it into your project and you won't need this PHP stuff, which comes directly from hell.
I already tested the window.print() command for this purpose, but it does not fulfill my requirement.
I also tried printing the content of an iframe whose source is a PDF file, but that only works in Chrome, not in other browsers.
I want to print PDF files automatically from code instead of opening each file and printing it.
For example, given two files such as 1.pdf and 2.pdf in some directory whose path is known, how can I print both files using JavaScript, PHP, or both?
Million thanks in advance.
This is not possible since most browsers, unlike Google Chrome (where it works), don't have a built-in PDF viewer.
Printing a PDF document is up to the PDF reader (whether or not it is installed as a browser plugin), not the browser.
I fixed this issue by merging multiple PDFs, images, or both into one file using ImageMagick.
The command below merges a PDF and an image:
<?php
// Merge test.pdf and test.jpeg into final.pdf with ImageMagick's convert tool.
$cmd = "test.pdf test.jpeg final.pdf";
exec("convert $cmd");
?>
Once the merge has completed, open final.pdf automatically from code so the user can print it easily.
You can find more.
We have Flex on the front end and Java on the back end. When a user requests a PDF file, the request goes to the Java back end, where the PDF file is generated using JasperReports. What we don't know is how to display this PDF file in the browser, since we don't want to use JSP/Servlets etc. It has to be Flex only. Any suggestions?
Flash Player cannot natively render PDF files. This is possible using Adobe AIR but not in a Flex application. Your best bet is to call navigateToURL() and open a Servlet in a new browser tab/window. The Servlet can simply write contents of the PDF file to the OutputStream and set the appropriate HTTP headers.
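For the "appropriate HTTP headers" part, here is a small sketch of what the servlet would typically need to set for a PDF. The helper class is hypothetical and deliberately avoids the Servlet API so it stands on its own; in a real servlet each entry would be passed to response.setHeader(name, value) before the PDF bytes are written to the OutputStream.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: the response headers for sending a PDF to the browser. */
final class PdfResponseHeaders {

    /**
     * @param inline true to ask the browser to display the PDF,
     *               false to ask it to offer a download instead
     */
    static Map<String, String> forPdf(String fileName, long lengthInBytes, boolean inline) {
        // Replace quotes and line breaks so the file name cannot break the header.
        String safeName = fileName.replaceAll("[\"\\r\\n]", "_");
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "application/pdf");
        headers.put("Content-Disposition",
                (inline ? "inline" : "attachment") + "; filename=\"" + safeName + "\"");
        headers.put("Content-Length", Long.toString(lengthInBytes));
        return headers;
    }
}
```

Whether "inline" actually shows the PDF in the browser still depends on the user's browser and PDF reader settings.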
I think this question is old, but this may help others. There's a new library developed by JasperForge themselves that deals with JasperReports directly. I mean, it's not a PDF viewer but a JasperReports exporting tool. You can download it from here.
I tried it using JasperServer: when viewing reports you can choose from different export options, one of which is Flash, and it works nicely.
Well, for starters, PDFs don't always display in the browser. It depends on the user's settings. You essentially send them the PDF file with the appropriate headers, and either they download it or a program like Acrobat Reader opens in the browser to display it.
I'm not sure how this is done in Flex, but I would imagine that if you're using Java, one simple servlet could do it.