How to download a dataset using Java

I want to make an app that takes a domain as input and downloads all the papers mentioned in the image. I have been able to download a single file from another website, but I was unable to download a paper from the ACM Digital Library.
What I need to do is download the entire data set.
Here is the code that I used to download a single file.
String fileName = "1.txt";
URL link = new URL("http://shayconcepts.com");
InputStream in = new BufferedInputStream(link.openStream());
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buf = new byte[2048];
int n = 0;
while (-1!=(n=in.read(buf)))
{
out.write(buf, 0, n);
}
out.close();
in.close();
byte[] response = out.toByteArray();
FileOutputStream fos = new FileOutputStream(fileName);
fos.write(response);
fos.close();
How can I modify it to download the entire data set?
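One possible direction, offered only as a sketch: if the URLs of the individual files are already known, the single-file code can be wrapped in a helper and called in a loop. The class name BatchDownloader, the method names, and the 1.dat, 2.dat naming scheme are all hypothetical, and how to obtain the list of paper URLs from the ACM Digital Library is not covered here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

final class BatchDownloader {
    // Streams one URL straight to a file on disk instead of buffering it in memory.
    static void download(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Downloads every URL in the list into targetDir as 1.dat, 2.dat, and so on.
    // The naming scheme is an assumption made for this sketch.
    static void downloadAll(List<URL> sources, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        int index = 1;
        for (URL source : sources) {
            download(source, targetDir.resolve(index + ".dat"));
            index++;
        }
    }
}
```

Streaming with Files.copy avoids holding each file in a ByteArrayOutputStream, which matters once the data set is large.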

Related

Amazon Merchant Fulfilment Create Shipment Label

I am trying to get the shipment label from Amazon Merchant Fulfillment, following the instructions on the Amazon pages:
"To obtain the actual PDF document, you must decode the Base64-encoded string, save it as a binary file with a “.zip” extension, and then extract the PDF file from the ZIP file."
Has anyone got this to work? I have tried a couple of things, but every time I get a blank PDF.
Here is my code. Can somebody please tell me whether I am doing it correctly?
byte[] decodedBytes = Base64.decodeBase64(contents);
ZipOutputStream zos = new ZipOutputStream(new BufferedOutputStream(new FileOutputStream("c:\\output\\asdwd.zip")));
//now create the entry in zip file
ZipEntry entry = new ZipEntry("asd.pdf");
zos.putNextEntry(entry);
zos.write(decodedBytes);
zos.close();
The instructions say to save the bytes as a binary file with the extension .zip.
What you are actually doing is creating a ZIP file with the contents of the byte array as an entry.
According to my reading of the instructions, your code should do this:
byte[] decodedBytes = Base64.decodeBase64(contents);
FileOutputStream fos = new FileOutputStream("c:\\output\\asdwd.zip");
fos.write(decodedBytes);
fos.close();
Or better still:
byte[] decodedBytes = Base64.decodeBase64(contents);
try (FileOutputStream fos = new FileOutputStream("c:\\output\\asdwd.zip")) {
fos.write(decodedBytes);
}
Then using a ZIP tool or a web browser, open asdwd.zip, find the entry containing the PDF, and extract it or print it.
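If you would rather extract the entry in code than with a ZIP tool, something like the following might work. It is a sketch under two assumptions: the decoded bytes really are a ZIP archive, and java.util.Base64 from the JDK is an acceptable substitute for the Base64.decodeBase64 call used above. The class name LabelExtractor and the method name are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class LabelExtractor {
    // Decodes the Base64 text and writes the first file entry of the archive to target.
    // Returns the entry name, or null if the archive contains no file entries.
    static String extractFirstEntry(String base64Contents, Path target) throws IOException {
        // The MIME decoder tolerates line breaks inside the encoded text.
        byte[] zipBytes = Base64.getMimeDecoder().decode(base64Contents);
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // Reading from zip stops at the end of the current entry.
                    Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                    return entry.getName();
                }
            }
        }
        return null;
    }
}
```

If this returns null for a real label, the payload may not be a ZIP archive at all; the follow-up code further down reads the same file with GZIPInputStream instead.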
Here is the code to generate a shipping label in case somebody needs it.
byte[] decoded = Base64.decodeBase64(contents);
// try-with-resources closes the stream, so no explicit close() is needed
try (FileOutputStream fos = new FileOutputStream(zipFilePath + amazonOrderId + zipFileName)) {
fos.write(decoded);
}
file = new File(destDirectory + amazonOrderId + pngFile);
if (file.exists()) {
file.delete();
}
try (OutputStream out = new FileOutputStream(destDirectory + amazonOrderId + pngFile)) {
try (InputStream in = new GZIPInputStream(
new FileInputStream(zipFilePath + amazonOrderId + zipFileName))) {
byte[] buffer = new byte[65536];
int noRead;
while ((noRead = in.read(buffer)) != -1) {
out.write(buffer, 0, noRead);
}
}
}

Java InputStream Downloading Slowly

I am using this code to download an encrypted file:
InputStream frame = new URL(url).openStream();
ByteArrayOutputStream data = new ByteArrayOutputStream();
int count;
byte[] buff = new byte[10000];
while((count = frame.read(buff)) > 0){
data.write(buff, 0, count);
System.out.println(count);
}
This code sometimes works great, but at other times it can take up to a minute for a ~36 kilobyte file. Is there a better way of doing this, or is the problem simply the connection to the download server?

How to download an Image directly from the URL in some websites?

I was trying to download this image directly from its full path. When I try it through Firefox, I'm redirected to the initial page.
But inside some manga pages I could see the image URL, and it is that one.
I'm using the code below. Does anyone know why this is happening?
try {
URL url = new URL("http://mangas.centraldemangas.com.br/death_note/death_note001-01.jpg");
InputStream in = new BufferedInputStream(url.openStream());
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buf = new byte[1024];
int n = 0;
while (-1!=(n=in.read(buf)))
{
out.write(buf, 0, n);
}
out.close();
in.close();
byte[] response = out.toByteArray();
String nomeArquivo = "teste.jpg";
FileOutputStream fos = new FileOutputStream("C://ImagensAnime//"+nomeArquivo);
fos.write(response);
fos.close();
} catch (IOException e) {
e.printStackTrace();
}
Is there any way to download it using this URL directly?
Most likely they are checking the HTTP Referer header (with mod_rewrite, maybe), and if it doesn't match their domain they redirect you to the homepage.
Try setting the Referer header to http://centraldemangas.com.br/ and see if that works.
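In Java, a request header can be set through URLConnection.setRequestProperty before the stream is opened. The sketch below assumes the Referer check is really what triggers the redirect; the class name RefererDownload and its method names are hypothetical. Note that the header name is spelled "Referer" in HTTP.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class RefererDownload {
    // Prepares a connection with the Referer header set.
    // openConnection() does not contact the server yet.
    static URLConnection openWithReferer(URL url, String referer) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setRequestProperty("Referer", referer);
        return connection;
    }

    // Sends the request with the Referer header and saves the response body to target.
    static void download(URL url, String referer, Path target) throws IOException {
        URLConnection connection = openWithReferer(url, referer);
        try (InputStream in = connection.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

For the question's case the call would pass the image URL and "http://centraldemangas.com.br/" as the referer. If the site also checks cookies or the User-Agent header, this alone may not be enough.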

Java servlet and IO: Create a file without saving to disk and sending it to the user

I'm hoping someone can help me out with a file creation/response question.
I know how to create and save a file. I know how to send that file back to the user via a ServletOutputStream.
But what I need is to create a file, without saving it on the disk, and then send that file via the ServletOutputStream.
The code below shows the parts that I have. Any help appreciated. Thanks in advance.
// This Creates a file
//
String text = "These days run away like horses over the hill";
File file = new File("MyFile.txt");
Writer writer = new BufferedWriter(new FileWriter(file));
writer.write(text);
writer.close();
// Missing link goes here
//
// This sends file to browser
//
InputStream inputStream = null;
inputStream = new FileInputStream("C:\\MyFile.txt");
byte[] buffer = new byte[8192];
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int bytesRead;
while ( (bytesRead = inputStream.read(buffer)) != -1)
baos.write(buffer, 0, bytesRead);
response.setContentType("text/html");
response.addHeader("Content-Disposition", "attachment; filename=Invoice.txt");
byte[] outBuf = baos.toByteArray();
stream = response.getOutputStream();
stream.write(outBuf);
You don't need to save a file; just use a ByteArrayInputStream. Try something like this:
inputStream = new ByteArrayInputStream(text.getBytes());
Or, even simpler, just do:
stream.write(text.getBytes());
As cHao suggests, use text.getBytes("UTF-8") or something similar to specify a charset other than the system default. The list of available charsets is available in the API docs for Charset.
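Putting the two suggestions together, the text can be encoded with an explicit charset and written straight to the output stream. This is a minimal sketch; the class name InMemoryResponse and the method writeText are hypothetical, and in the servlet the stream argument would be the one returned by response.getOutputStream().

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class InMemoryResponse {
    // Encodes the text as UTF-8 and writes it directly to the given stream.
    // No temporary file is involved.
    static void writeText(String text, OutputStream stream) throws IOException {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        stream.write(body);
        stream.flush();
    }
}
```

Using StandardCharsets.UTF_8 instead of the string "UTF-8" avoids the checked UnsupportedEncodingException.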

How do I get the InputStream of decompressed data from an InputStream of GZIPed data?

I call a service which returns a gzipped file. I have the data as an InputStream (courtesy of javax.activation.DataHandler.getInputStream();) from the response.
What I would like to do is, without writing anything to disk, get an InputStream of the decompressed data in the file that is in the archive. The compressed file in this case is an xml document that I am trying to unmarshal using javax.xml.bind.Unmarshaller which takes an InputStream.
I'm currently trying to write the InputStream to an OutputStream (decompressing the data) and then I'll need to write it back to an InputStream. It's not working yet so I thought I would see if there was a better (I would hope so) approach.
I can write the initial InputStream to disk and get a gz file, and then read that file, get the compressed file out of it and go from there, but I'd rather keep it all in memory if possible.
Update 1: Here is my current (not working - get a "Not in GZIP format" exception):
ByteArrayInputStream xmlInput = null;
try {
InputStream in = dh.getInputStream(); //dh is a javax.activation.DataHandler
BufferedInputStream bis = new BufferedInputStream(in);
ByteArrayOutputStream bo = new ByteArrayOutputStream();
int bytes_read = 0;
byte[] dataBuf = new byte[4096];
while ((bytes_read = bis.read(dataBuf)) != -1) {
bo.write(dataBuf, 0, bytes_read);
}
ByteArrayInputStream bin = new ByteArrayInputStream(bo.toByteArray());
GZIPInputStream gzipInput = new GZIPInputStream(bin);
ByteArrayOutputStream out = new ByteArrayOutputStream();
dataBuf = new byte[4096];
bytes_read = 0;
while ((bytes_read = gzipInput.read(dataBuf)) > 0) {
out.write(dataBuf, 0, bytes_read);
}
xmlInput = new ByteArrayInputStream(out.toByteArray());
If, instead of writing to a ByteArrayOutputStream, I write to a FileOutputStream the first time around, I get a compressed file (which I can open manually to get the xml file within). The service (eBay) says it should be a gzip file, so I'm not sure why I get a "Not in GZIP format" error.
Update 2: I tried something a little different - same error ("Not in GZIP format"). Wow, I just tried to end that parenthesis with a semi-colon. Anyways, here is my second attempt, which still does not work:
ByteArrayInputStream xmlInput = null;
try {
GZIPInputStream gzipInput = new GZIPInputStream(dh.getInputStream());
ByteArrayOutputStream bo = new ByteArrayOutputStream();
int bytes_read = 0;
byte[] dataBuf = new byte[4096];
while ((bytes_read = gzipInput.read(dataBuf)) != -1) {
bo.write(dataBuf, 0, bytes_read);
}
xmlInput = new ByteArrayInputStream(bo.toByteArray());
Decorate the input stream with a GZIPInputStream.
InputStream decompressed = new GZIPInputStream(compressed);
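A small in-memory round trip shows the decoration at work without touching the disk. It is only a sketch: the class name GzipStreams and its helper methods are hypothetical, and the gzip helper exists just to produce sample input. It assumes the incoming bytes really are in GZIP format, which the "Not in GZIP format" error in the question suggests may not be the case for that service.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipStreams {
    // Wraps a compressed stream so that callers read decompressed bytes on demand.
    static InputStream decompress(InputStream compressed) throws IOException {
        return new GZIPInputStream(compressed);
    }

    // Compresses a byte array in memory; used only to build sample input.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(plain);
        }
        return buffer.toByteArray();
    }

    // Reads a stream to the end and returns its bytes.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```

The stream returned by decompress can be handed directly to an API that accepts an InputStream, so the intermediate ByteArrayOutputStream copy in the question is not needed.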
The following code should work. Keep in mind you'll have to handle exceptions properly.
OutputStream out = null;
InputStream in = null;
try {
out = /* some output stream */;
in = new java.util.zip.GZIPInputStream(/*some stream*/);
byte[] buffer = new byte[4096];
int c = 0;
while (( c = in.read(buffer, 0, 4096)) > 0) {
out.write(buffer, 0, c);
}
} finally {
if (in != null) {
in.close();
}
if (out != null) {
out.close();
}
}
Take a look at GZIPInputStream. The class handles this very transparently; it takes almost no work to use.
