How to download an image directly from the URL on some websites? - Java

I was trying to download this image directly from its full path. When I try it through Firefox, I'm redirected to the site's initial page.
But on some manga pages I could see the image URL, and it's that one...
I'm using the code below... Does anyone know why this is happening?
try {
    URL url = new URL("http://mangas.centraldemangas.com.br/death_note/death_note001-01.jpg");
    InputStream in = new BufferedInputStream(url.openStream());
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    int n;
    while (-1 != (n = in.read(buf))) {
        out.write(buf, 0, n);
    }
    out.close();
    in.close();
    byte[] response = out.toByteArray();
    String nomeArquivo = "teste.jpg";
    FileOutputStream fos = new FileOutputStream("C:/ImagensAnime/" + nomeArquivo);
    fos.write(response);
    fos.close();
} catch (IOException e) {
    e.printStackTrace();
}
Is there any way to download it using this URL directly?

Most likely they are checking the HTTP Referer header (with mod_rewrite, perhaps), and if it doesn't match their domain they redirect you to the homepage.
Try setting the Referer header to http://centraldemangas.com.br/ and see if that works.
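For example, with a plain URLConnection (a sketch; untested against that site, and note the header is spelled "Referer" in HTTP):
URL url = new URL("http://mangas.centraldemangas.com.br/death_note/death_note001-01.jpg");
URLConnection conn = url.openConnection();
// Pretend the request came from a page on their site
conn.setRequestProperty("Referer", "http://centraldemangas.com.br/");
try (InputStream in = new BufferedInputStream(conn.getInputStream());
        FileOutputStream fos = new FileOutputStream("C:/ImagensAnime/teste.jpg")) {
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
        fos.write(buf, 0, n);
    }
}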

Related

Blank pages in PDF after downloading it from the web

I am trying to download a PDF file with HttpClient. It downloads the PDF, but the pages are blank. I can see the bytes on the console if I print them from the response, but when I write them to a file the result is blank.
FileUtils.writeByteArrayToFile(new File(outputFilePath), bytes);
The files show the correct sizes of 103 KB and 297 KB as expected, but they're just blank!
I tried with an output stream as well, like:
FileOutputStream fileOutputStream = new FileOutputStream(outFile);
fileOutputStream.write(bytes);
I also tried writing with UTF-8 encoding, like:
Writer out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(outFile), "UTF-8"));
String str = new String(bytes, StandardCharsets.UTF_8);
try {
    out.write(str);
} finally {
    out.close();
}
Nothing is working for me. Any suggestion is highly appreciated.
Update: I am using DefaultHttpClient.
HttpGet httpget = new HttpGet(targetURI);
HttpResponse response = null;
String htmlContents = null;
try {
    httpget = new HttpGet(url);
    response = httpclient.execute(httpget);
    InputStreamReader dataStream = new InputStreamReader(response.getEntity().getContent());
    byte[] bytes = IOUtils.toByteArray(dataStream);
    ...
You do:
InputStreamReader dataStream=new InputStreamReader(response.getEntity().getContent());
byte[] bytes = IOUtils.toByteArray(dataStream);
As has already been mentioned in the comments, using a Reader class can damage binary data such as PDF files. Thus, you should not wrap your content in an InputStreamReader.
Since your content can be used to construct an InputStreamReader, I assume response.getEntity().getContent() returns an InputStream. Such an InputStream can usually be passed directly as the argument to IOUtils.toByteArray.
So:
InputStream dataStream=response.getEntity().getContent();
byte[] bytes = IOUtils.toByteArray(dataStream);
should already work for you!
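Putting it together, the download part of the question's code might then look like this (a sketch reusing the question's httpclient, url, and outputFilePath, plus the FileUtils and IOUtils calls already shown):
HttpGet httpget = new HttpGet(url);
HttpResponse response = httpclient.execute(httpget);
try (InputStream dataStream = response.getEntity().getContent()) {
    // Raw bytes, no Reader in between, so the PDF data stays intact
    byte[] bytes = IOUtils.toByteArray(dataStream);
    FileUtils.writeByteArrayToFile(new File(outputFilePath), bytes);
}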
Here is a method I use to download a PDF file from a specific URL. The method requires two string arguments: a URL string (for example: "https://www.ibm.com/support/knowledgecenter/SSWRCJ_4.1.0/com.ibm.safos.doc_4.1/Planning_and_Installation.pdf") and a destination folder path to download the PDF file (or whatever) into. If the destination path does not exist within the local file system, it is created automatically:
public boolean downloadFile(String urlString, String destinationFolderPath) {
    boolean result = false; // will turn to true if download is successful
    if (!destinationFolderPath.endsWith("/") && !destinationFolderPath.endsWith("\\")) {
        destinationFolderPath += "/";
    }
    // If the destination path does not exist then create it.
    File foldersToMake = new File(destinationFolderPath);
    if (!foldersToMake.exists()) {
        foldersToMake.mkdirs();
    }
    try {
        // Open connection
        URL url = new URL(urlString);
        // Get just the file name from the URL
        String fileName = new File(url.getPath()).getName();
        // Try with resources...
        try (InputStream in = url.openStream(); FileOutputStream outStream =
                new FileOutputStream(new File(destinationFolderPath + fileName))) {
            // Read from resource and write to file...
            int length = -1;
            byte[] buffer = new byte[1024]; // buffer for portion of data from connection
            while ((length = in.read(buffer)) > -1) {
                outStream.write(buffer, 0, length);
            }
        }
        // File successfully downloaded
        result = true;
    }
    catch (MalformedURLException ex) { ex.printStackTrace(); }
    catch (IOException ex) { ex.printStackTrace(); }
    return result;
}
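Usage might look like this (the destination folder is a hypothetical example):
boolean ok = downloadFile(
        "https://www.ibm.com/support/knowledgecenter/SSWRCJ_4.1.0/com.ibm.safos.doc_4.1/Planning_and_Installation.pdf",
        "C:/Downloads/Manuals");
System.out.println(ok ? "Downloaded." : "Download failed.");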

How to download a dataset using Java

I want to make an app that will download all the papers mentioned in the image by taking the domain as input. I have been able to download a single file from another website, but I was unable to download a paper from the ACM Digital Library.
What I need to do is download the entire data set.
Here is the code I used to download a single file:
String fileName = "1.txt";
URL link = new URL("http://shayconcepts.com");
InputStream in = new BufferedInputStream(link.openStream());
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buf = new byte[2048];
int n;
while (-1 != (n = in.read(buf))) {
    out.write(buf, 0, n);
}
out.close();
in.close();
byte[] response = out.toByteArray();
FileOutputStream fos = new FileOutputStream(fileName);
fos.write(response);
fos.close();
How can I modify it to download the entire data set?

HTTP request to download a file in Java

I'm really new to Java, so I hope you can help me.
I want to program an HTTP request to download a file (PDF) from a website.
The website uses cookies, so I have to send a first request to get a cookie and then a second one to download the file.
I want to use no external libraries if possible.
This is what I've got so far:
URLConnection request = null;
String url = str;
CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
request = new URL(url).openConnection();
System.setProperty("http.maxRedirects", "100");
InputStream in = request.getInputStream();
File downloadedFile = File.createTempFile("test", ".pdf");
FileOutputStream out = new FileOutputStream(downloadedFile);
byte[] buffer = new byte[1024];
int len = in.read(buffer);
while (len != -1) {
    out.write(buffer, 0, len);
    len = in.read(buffer);
    if (Thread.interrupted()) {
        try {
            throw new InterruptedException();
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
in.close();
out.close();
return downloadedFile.getAbsolutePath();
My code seems to do nothing...
Best wishes, M1K
In case you are using the Spring Framework in your application, see if you can use the copy method of FileCopyUtils to copy a FileInputStream to the response OutputStream, like below:
FileCopyUtils.copy(new FileInputStream(file), response.getOutputStream());
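Separately, since the question describes a two-step flow (a first request to obtain the cookie, a second to download the file), a JDK-only sketch of that flow might look like this (both URLs are placeholders):
import java.io.*;
import java.net.*;

public class CookiePdfDownload {
    public static void main(String[] args) throws IOException {
        // One shared in-memory cookie jar for all URLConnections in this JVM
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));

        // Step 1: hit the site once so the CookieManager captures any Set-Cookie headers
        new URL("https://example.com/").openConnection().getInputStream().close();

        // Step 2: the stored cookies are attached automatically to this request
        URLConnection request = new URL("https://example.com/file.pdf").openConnection();
        File downloadedFile = File.createTempFile("test", ".pdf");
        try (InputStream in = request.getInputStream();
                OutputStream out = new FileOutputStream(downloadedFile)) {
            byte[] buffer = new byte[8192];
            for (int len; (len = in.read(buffer)) != -1; ) {
                out.write(buffer, 0, len);
            }
        }
        System.out.println(downloadedFile.getAbsolutePath());
    }
}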

Read first bytes of a file

I need a very simple function that allows me to read the first 1 KB of a file through FTP. I want to use it in MATLAB to read the first lines and, according to some parameters, eventually download only the files I really need. I found some examples online that unfortunately do not work. Below is sample code where I'm trying to download one single file (I'm using the Apache Commons Net library).
FTPClient client = new FTPClient();
FileOutputStream fos = null;
try {
    client.connect("data.site.org");
    // filename to be downloaded.
    String filename = "filename.Z";
    fos = new FileOutputStream(filename);
    // Download file from FTP server
    InputStream stream = client.retrieveFileStream("/pub/obs/2008/021/ab120210.08d.Z");
    byte[] b = new byte[1024];
    stream.read(b);
    fos.write(b);
} catch (IOException e) {
    e.printStackTrace();
} finally {
    try {
        if (fos != null) {
            fos.close();
        }
        client.disconnect();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
The problem is that stream comes back empty. I know I'm passing the folder name in the wrong way, but I can't figure out how it should be done. I've tried many variations.
I've also tried with the URL's Java classes as:
URL url;
url = new URL("ftp://data.site.org/pub/obs/2008/021/ab120210.08d.Z");
URLConnection con = url.openConnection();
BufferedInputStream in = new BufferedInputStream(con.getInputStream());
FileOutputStream out = new FileOutputStream("C:\\filename.Z");
int i;
byte[] bytesIn = new byte[1024];
if ((i = in.read(bytesIn)) >= 0) {
    out.write(bytesIn);
}
out.close();
in.close();
but it throws an error when I close the InputStream in!
I'm definitely stuck. Any comments would be very useful!
Try this test:
InputStream is = new URL("ftp://test:test@ftp.secureftp-test.com/bookstore.xml").openStream();
byte[] a = new byte[1000];
int n = is.read(a);
is.close();
System.out.println(new String(a, 0, n));
It definitely works.
In my experience, when you read bytes from a stream acquired via ftpClient.retrieveFileStream, a single read is not guaranteed to fill your buffer on the first call. You should either check the return value of stream.read(b) and loop until the 1024-byte buffer is full (or end-of-file is reached), or use a utility library that fills it for you:
InputStream stream = null;
try {
    // Download file from FTP server
    stream = client.retrieveFileStream("/pub/obs/2008/021/ab120210.08d.Z");
    byte[] b = new byte[1024];
    IOUtils.read(stream, b); // calls stream.read() repeatedly until the buffer is full or end-of-file is reached
    fos.write(b);
} catch (IOException e) {
    e.printStackTrace();
} finally {
    IOUtils.closeQuietly(stream);
}
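For reference, the manual loop alluded to above (plain JDK, no commons-io; reusing the stream and fos from the snippet) would be roughly:
byte[] b = new byte[1024];
int off = 0;
while (off < b.length) {
    int n = stream.read(b, off, b.length - off);
    if (n == -1) {
        break; // end of stream before the buffer was filled
    }
    off += n;
}
fos.write(b, 0, off); // write only the bytes actually read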
I couldn't understand why it didn't work. I found this link where they used the Apache library to read 4096 bytes at a time. I read the first 1024 bytes and it eventually worked; the only thing is that if completePendingCommand() is used, the program hangs forever, so I removed it and everything works fine.

java.net.URL read stream to byte[]

I'm trying to read an image from a URL (with the Java class java.net.URL) into a byte[]. "Everything" works fine, except that the content isn't being entirely read from the stream (the image is corrupt; it doesn't contain all the image data)... The byte array is being persisted in a database (BLOB). I really don't know what the correct approach is; maybe you can give me a tip. :)
This is my first approach (code formatted, removed unnecessary information...):
URL u = new URL("http://localhost:8080/images/anImage.jpg");
int contentLength = u.openConnection().getContentLength();
InputStream openStream = u.openStream();
byte[] binaryData = new byte[contentLength];
openStream.read(binaryData);
openStream.close();
My second approach was this one (as you'll see, the content length is fetched in another way):
URL u = new URL(content);
openStream = u.openStream();
int contentLength = openStream.available();
byte[] binaryData = new byte[contentLength];
openStream.read(binaryData);
openStream.close();
Both snippets result in a corrupted image...
I already read this post from Stack Overflow.
There's no guarantee that the content length you're provided is actually correct. Try something akin to the following:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
InputStream is = null;
try {
    is = url.openStream();
    byte[] byteChunk = new byte[4096]; // Or whatever size you want to read in at a time.
    int n;
    while ((n = is.read(byteChunk)) > 0) {
        baos.write(byteChunk, 0, n);
    }
} catch (IOException e) {
    System.err.printf("Failed while reading bytes from %s: %s", url.toExternalForm(), e.getMessage());
    e.printStackTrace();
    // Perform any other exception handling that's appropriate.
} finally {
    if (is != null) { is.close(); }
}
You'll then have the image data in baos, from which you can get a byte array by calling baos.toByteArray().
This code is untested (I just wrote it in the answer box), but it's a reasonably close approximation to what I think you're after.
Just extending Barnard's answer with commons-io. Separate answer because I cannot format code in comments.
InputStream is = null;
try {
    is = url.openStream();
    byte[] imageBytes = IOUtils.toByteArray(is);
} catch (IOException e) {
    System.err.printf("Failed while reading bytes from %s: %s", url.toExternalForm(), e.getMessage());
    e.printStackTrace();
    // Perform any other exception handling that's appropriate.
} finally {
    if (is != null) { is.close(); }
}
http://commons.apache.org/io/api-1.4/org/apache/commons/io/IOUtils.html#toByteArray(java.io.InputStream)
Here's a clean solution:
private byte[] downloadUrl(URL toDownload) {
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    try (InputStream stream = toDownload.openStream()) { // try-with-resources closes the stream
        byte[] chunk = new byte[4096];
        int bytesRead;
        while ((bytesRead = stream.read(chunk)) > 0) {
            outputStream.write(chunk, 0, bytesRead);
        }
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
    return outputStream.toByteArray();
}
I am very surprised that nobody here has mentioned the problem of connection and read timeouts. It can happen (especially on Android and/or with poor network connectivity) that the request hangs and waits forever.
The following code (which also uses Apache Commons IO) takes this into account and waits at most 5 seconds before failing:
public static byte[] downloadFile(URL url) {
    try {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        conn.connect();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        IOUtils.copy(conn.getInputStream(), baos);
        return baos.toByteArray();
    } catch (IOException e) {
        // Log the error and return null, some default, or rethrow as a runtime exception
        return null;
    }
}
byte[] b = IOUtils.toByteArray((new URL( )).openStream()); //idiom
Note, however, that the stream is not closed in the above example.
If you want the result Base64-encoded in 76-character chunks (using Commons Codec)...
byte[] b = Base64.encodeBase64(IOUtils.toByteArray((new URL( )).openStream()), true);
Use commons-io IOUtils.toByteArray(URL):
String url = "http://localhost:8080/images/anImage.jpg";
byte[] fileContent = IOUtils.toByteArray(new URL(url));
Maven dependency:
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.6</version>
</dependency>
The content length is just an HTTP header; you cannot trust it. Just read everything you can from the stream.
available() is definitely wrong: it's just the number of bytes that can be read without blocking.
Another issue is your resource handling: closing the stream has to happen in every case, and try/catch/finally will do that.
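Applied to the question's first snippet, that advice might look like this (a sketch; the URL is the one from the question):
URL u = new URL("http://localhost:8080/images/anImage.jpg");
byte[] binaryData;
try (InputStream in = u.openStream();
        ByteArrayOutputStream buf = new ByteArrayOutputStream()) {
    byte[] chunk = new byte[8192];
    int n;
    while ((n = in.read(chunk)) != -1) { // ignore Content-Length; read until EOF
        buf.write(chunk, 0, n);
    }
    binaryData = buf.toByteArray();
}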
It's important to specify timeouts, especially when the server is slow to respond. With pure Java, without using any dependency:
public static byte[] copyURLToByteArray(final String urlStr,
        final int connectionTimeout, final int readTimeout) throws IOException {
    final URL url = new URL(urlStr);
    final URLConnection connection = url.openConnection();
    connection.setConnectTimeout(connectionTimeout);
    connection.setReadTimeout(readTimeout);
    try (InputStream input = connection.getInputStream();
            ByteArrayOutputStream output = new ByteArrayOutputStream()) {
        final byte[] buffer = new byte[8192];
        for (int count; (count = input.read(buffer)) > 0;) {
            output.write(buffer, 0, count);
        }
        return output.toByteArray();
    }
}
Using dependencies, e.g., HC Fluent:
public byte[] copyURLToByteArray(final String urlStr,
        final int connectionTimeout, final int readTimeout) throws IOException {
    return Request.Get(urlStr)
            .connectTimeout(connectionTimeout)
            .socketTimeout(readTimeout)
            .execute()
            .returnContent()
            .asBytes();
}
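Usage, with the question's image URL and five-second timeouts:
byte[] fileContent = copyURLToByteArray("http://localhost:8080/images/anImage.jpg", 5000, 5000);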
