Not able to get the pdf from internet - java

I am trying to download a PDF from the internet and save it to a local file.
I am using iText to read the PDF and a Java stream to write it.
After the file is written, the new file cannot be opened in a PDF reader.
PdfReader reader = new PdfReader(strURL);
FileOutputStream fos = new FileOutputStream(new File(fileName));
fos.write(reader.getPageContent(1));
fos.flush();
fos.close();
I am trying to get the PDF from this link.
I have debugged a few things. Here are the findings.
reader.getEofPos()
gives 291633, which is the same as the file length. But
reader.getPageContent(1).length;
gives only 42360 bytes. Clearly, fewer bytes are read than the actual file size.
The PDF contains only one page:
reader.getNumberOfPages() =1
Do I need to specify anything else for the reader to read the entire PDF file?

If all you're trying to do is download a PDF from the internet and save it locally, this can be accomplished using a simple HTTP web request. An internet download is normally just an HTTP GET request, which you can accomplish by doing something like this:
URLConnection connection = new URL(url).openConnection();
connection.setRequestProperty("Accept-Charset", charset);
InputStream response = connection.getInputStream();
Once you get the response, you can save the bytes to a path of your choosing.
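As a rough illustration of the "save the bytes" step, the sketch below copies the response stream straight to disk with `Files.copy`. The class name `DownloadSketch` and the `save`/`download` helpers are hypothetical names chosen for this example, and the URL and target path are assumed to come from the caller. Error handling, redirects and authentication are not covered.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; not part of iText or any other library.
class DownloadSketch {

    // Copies every byte of the stream to the target file, replacing an existing file.
    static long save(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Opens the URL with a plain GET request and saves the raw response body.
    static long download(String url, Path target) throws IOException {
        URLConnection connection = new URL(url).openConnection();
        try (InputStream response = connection.getInputStream()) {
            return save(response, target);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: source URL, args[1]: destination path (both supplied by the caller)
        long bytes = download(args[0], Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes");
    }
}
```

Because the bytes are written unchanged, the saved file should have the same length as the original document, unlike the output of `getPageContent(1)`, which holds only the content of a single page.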

Related

Not able to generate multiple documents using ServletOutputStream in Java [duplicate]

For example, I would like to download one zip file and one CSV file in a single response. Is there any way to do this other than compressing the two files into one zip file?
Although ServletResponse is not meant to do this, we can programmatically tweak it to send multiple files, which all client browsers except IE seem to handle properly. A sample code snippet is given below.
response.setContentType("multipart/x-mixed-replace;boundary=END");
ServletOutputStream out = response.getOutputStream();
out.println("--END");
for (File f : files) {
// Open the file for the current iteration (the loop variable is f)
FileInputStream fis = new FileInputStream(f);
BufferedInputStream fif = new BufferedInputStream(fis);
int data = 0;
out.println("--END");
while ((data = fif.read()) != -1) {
out.write(data);
}
fif.close();
out.println("--END");
out.flush();
}
out.flush();
out.println("--END--");
out.close();
This will not work in IE browsers.
N.B. try/catch blocks are not included.
The code developed by Jason Hunter to handle servlet requests and responses with multiple parts has been the de facto standard for years. You can find it at servlets.com.
No, you cannot do that. The reason is that whenever you want to send any data in a request, you use the stream available in the request and retrieve the data using request.getRequestParameter("streamParamName").getInputStream(). Also note that once you have consumed this stream, you will not be able to read it again.
The example mentioned above is a tweak that Google also uses when sending multipart email with multiple attachments. To achieve that, they define a boundary for each attachment, and the client has to take care of these boundaries while retrieving the information and rendering it.

Decode Base64InputStream from String and unzip it with GZIPInputStream in Java

I am trying to create a simple app that decodes zipped data from a string. The user pastes the zipped data into one textarea, and after a button click the data is decoded and shown in another textarea.
If I use files, it works perfectly:
zis = new GZIPInputStream(new Base64InputStream(new FileInputStream(inZippedFile)));
where inZippedFile is a file. The result is then saved to outputFile.
However, if I use a string as the source of the InputStream, it never finishes.
String input = "...";
InputStream in = IOUtils.toInputStream(input, "UTF-8");
zis = new GZIPInputStream(new Base64InputStream(in));
For IOUtils I am using the Apache Commons IO 2.4 jars. Is there anything I am doing wrong?
Thanks
Decoding and unzipping the string data works correctly; there was just a mistake in parsing the data into the correct form, which caused the long run time.
So this works, with no need to specify UTF-8:
new GZIPInputStream(new Base64InputStream(IOUtils.toInputStream(input)));
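For comparison, here is a minimal sketch of the same decode-then-unzip step using only JDK classes (`java.util.Base64` instead of the commons-codec `Base64InputStream` used above). The class name `GzipBase64Sketch` and both method names are hypothetical, and the sketch assumes the original text was UTF-8. The `encode` method exists only to produce sample input for trying out `decode`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class for illustration only.
class GzipBase64Sketch {

    // Gzips the text and Base64-encodes the compressed bytes (used to build test input).
    static String encode(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Base64-decodes the pasted string, then gunzips the result back to text.
    static String decode(String base64) throws IOException {
        // The MIME decoder skips line breaks and other non-Base64 characters,
        // which is convenient for text pasted into a textarea.
        byte[] compressed = Base64.getMimeDecoder().decode(base64.trim());
        try (InputStream zis = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int len;
            while ((len = zis.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

If the pasted data is malformed, `GZIPInputStream` would normally fail with an exception rather than hang, so a run that never finishes points at the surrounding parsing code, as the answer above found.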

Downloading a PDF file from a protected webpage

I've been trying this for a couple of days now and I have no time left, since the project is due tomorrow. I was wondering if someone could help me out. I'm trying to download a PDF file from this link, which points to a page that serves PDF content. I have tried Jsoup, but Jsoup does not support pages that are served as PDF. This is the code I've been trying to use:
System.out.println("opening connection");
URL url = new URL("https://www.capitaliq.com/CIQDotNet/Filings/DocumentRedirector.axd?versionId=1257051021&type=pdf&forcedownload=false");
InputStream in = url.openStream();
FileOutputStream fos = new FileOutputStream("/Users/HIDDEN/Desktop/fullreport.pdf");
System.out.println("reading file...");
int length = -1;
byte[] buffer = new byte[1024];// buffer for portion of data from
// connection
while ((length = in.read(buffer)) > -1) {
fos.write(buffer, 0, length);
}
fos.close();
in.close();
System.out.println("file was downloaded");
The problem with this code is that the request is automatically redirected to a login page where you have to type your username and password. Therefore, I have to find a way to log in to my account and connect to the page without using Jsoup (as mentioned earlier, it cannot read PDF content). If someone could alter this code, based on the HTML of the login page, so that I can log in and then download the PDF, I would be eternally grateful. Thank you!
HtmlUnit is what I use for stuff like this, especially when speed is not critical.
Here's a random-ish piece of pseudocode from another one of my answers:
WebClient wc = new WebClient(BrowserVersion.CHROME);
HtmlPage p = wc.getPage(url);
((HtmlTextInput) p.getElementById(userNameId)).setText(userName);
((HtmlTextInput) p.getElementById(passId)).setText(pass);
p = ((HtmlElement) p.getElementById(submitBtnId)).click();
// Just as an example for something I've had to do, I use
// UnexpectedPage when the "content-type" is "application/zip"
UnexpectedPage up = ((HtmlElement) p.getElementById(downloadBtn)).click();
InputStream in = up.getInputStream();
...
Use another library for reading the pdf

How to efficiently download large csv file using java

I need to provide a feature that lets users download reports in Excel/CSV format from my web application. I once wrote a module that created an Excel file on disk, then read it and sent it to the browser, and it worked correctly. This time I don't want to generate an Excel file, as I don't have that level of control over the file system. I guess one way is to build the content in a StringBuffer and set the correct content type (I am not sure about this approach). Another team also has this feature, but they struggle when the data is very large. What is the best way to provide this feature, considering that the data could be huge? Is it possible to send the data in chunks without the client noticing (except for the delay in downloading)?
One issue I forgot to mention: very large data also causes problems on the server side (CPU utilization and memory consumption). Is it possible to read a fixed number of records, say 500, send them to the client, then read another 500, and so on until done?
You can also generate HTML instead of CSV and still set the content type to Excel. This is nice for colouring and styled text.
You can also use gzip compression when the client accepts that compression. Normally there are standard means, like a servlet filter.
Never use a StringBuffer, or the better StringBuilder, for this; stream the output instead. If you do not (or cannot) call setContentLength, the output is sent chunked (so the client cannot show predictive progress).
URL url = new URL("http://localhost:8080/Works/images/address.csv");
response.setHeader("Content-Type", "text/csv");
response.setHeader("Content-disposition", "attachment;filename=myFile.csv");
URLConnection connection = url.openConnection();
InputStream stream = connection.getInputStream();
BufferedOutputStream outs = new BufferedOutputStream(response.getOutputStream());
int len;
byte[] buf = new byte[1024];
while ((len = stream.read(buf)) > 0) {
outs.write(buf, 0, len);
}
outs.close();
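The "read 500 records, send them, read the next 500" idea from the question could be sketched roughly as follows. This is an assumption-laden illustration, not the answerer's code: `CsvStreamSketch` and `writeCsv` are hypothetical names, the rows are assumed to arrive through an `Iterator` (for example one backed by a database cursor), and the values are joined naively without CSV quoting or escaping. In a servlet, the `OutputStream` would be `response.getOutputStream()`.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class for illustration only.
class CsvStreamSketch {

    // Writes rows one at a time and flushes after every batchSize rows,
    // so the full report is never held in memory on the server.
    static int writeCsv(Iterator<List<String>> rows, OutputStream target, int batchSize)
            throws IOException {
        Writer out = new BufferedWriter(new OutputStreamWriter(target, StandardCharsets.UTF_8));
        int count = 0;
        while (rows.hasNext()) {
            // Naive join: assumes values contain no commas, quotes or line breaks.
            out.write(String.join(",", rows.next()));
            out.write("\r\n");
            if (++count % batchSize == 0) {
                out.flush(); // push the finished batch towards the client
            }
        }
        out.flush(); // send the final partial batch
        return count;
    }
}
```

The caller stays responsible for closing the stream, and memory use depends on how the iterator fetches its rows rather than on the total size of the report.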

Programmatically Downloading CSV Files with Java

Scenario: A website I use to research stock data has a link on the page to Export Data to Spreadsheet. The URL displayed when hovering over the export link is of the form http://www.stocksite.com/historical/export.php?symbol=C .
Question: Rather than manually visiting the page for each stock, I would like to automate the task. From Java, how can I programmatically call the site with a stock symbol and save the exported CSV file? The URL and URLConnection classes seem like the obvious place to start, but I'm unsure where to go from there.
All you need to do is get the CSV as an InputStream.
InputStream input = new URL("http://example.com/file.csv").openStream();
Then you can feed it to any decent Java CSV parser API. Almost all of them accept input as an InputStream or Reader. If necessary, you can easily wrap the InputStream in a Reader using InputStreamReader, which you can then feed to the CSV parser.
Reader reader = new InputStreamReader(input, "UTF-8");
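To show what consuming such a Reader might look like without picking a particular parser library, here is a deliberately simple sketch. `CsvReadSketch` and `readRows` are hypothetical names, and the comma split assumes no quoted fields or embedded commas; a real CSV parser would be the safer choice for anything beyond simple exports.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only.
class CsvReadSketch {

    // Reads the whole Reader line by line and splits each line on commas.
    static List<List<String>> readRows(Reader reader) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader lines = new BufferedReader(reader)) {
            String line;
            while ((line = lines.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // The -1 limit keeps trailing empty columns.
                rows.add(Arrays.asList(line.split(",", -1)));
            }
        }
        return rows;
    }
}
```

Passing the `reader` created above to `readRows` would return one list of column values per line of the exported file.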
