Copy binary data from URL to file in Java without intermediate copy

I'm updating some old code to grab some binary data from a URL instead of from a database (the data is about to be moved out of the database and will be accessible by HTTP instead). The database API seemed to provide the data as a raw byte array directly, and the code in question wrote this array to a file using a BufferedOutputStream.
I'm not at all familiar with Java, but a bit of googling led me to this code:
URL u = new URL("my-url-string");
URLConnection uc = u.openConnection();
uc.connect();
InputStream in = uc.getInputStream();
ByteArrayOutputStream out = new ByteArrayOutputStream();
final int BUF_SIZE = 1 << 8;
byte[] buffer = new byte[BUF_SIZE];
int bytesRead = -1;
while ((bytesRead = in.read(buffer)) > -1) {
    out.write(buffer, 0, bytesRead);
}
in.close();
fileBytes = out.toByteArray();
That seems to work most of the time, but I have a problem when the data being copied is large - I'm getting an OutOfMemoryError for data items that worked fine with the old code.
I'm guessing that's because this version of the code has multiple copies of the data in memory at the same time, whereas the original code didn't.
Is there a simple way to grab binary data from a URL and save it in a file without incurring the cost of multiple copies in memory?

Instead of writing the data to a byte array and then dumping it to a file, you can directly write it to a file by replacing the following:
ByteArrayOutputStream out = new ByteArrayOutputStream();
With:
FileOutputStream out = new FileOutputStream("filename");
If you do so, there is no need for the out.toByteArray() call at the end. Just make sure you close the FileOutputStream object when done, like this:
out.close();
See the documentation of FileOutputStream for more details.
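Putting that together, a minimal sketch of the whole streaming approach (the URL and file name are placeholders):
URL u = new URL("my-url-string");
URLConnection uc = u.openConnection();
uc.connect();
InputStream in = uc.getInputStream();
FileOutputStream out = new FileOutputStream("filename");
try {
    // Only one small buffer is ever held in memory, no matter how big the download is.
    byte[] buffer = new byte[8192];
    int bytesRead;
    while ((bytesRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, bytesRead);
    }
} finally {
    in.close();
    out.close();
}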

I don't know what you mean by "large" data, but try using the JVM parameter
java -Xmx256m ...
which sets the maximum heap size to 256 MB (or any value you like).

If you need the Content-Length and your web server is reasonably standards-conforming, it should provide you a "Content-Length" header.
URLConnection#getContentLength() should give you that information upfront so that you are able to create your file. (Be aware that if your HTTP server is misconfigured or under the control of a malicious party, that header may not match the number of bytes actually received. In that case, why not stream to a temp file first and copy that file afterwards?)
In addition to that: a ByteArrayOutputStream is a horrible memory allocator. It always doubles the buffer size, so if you read a 32 MB + 1 byte file, you end up with a 64 MB buffer. It might be better to implement your own, smarter byte-array stream, like this one:
http://source.pentaho.org/pentaho-reporting/engines/classic/trunk/core/source/org/pentaho/reporting/engine/classic/core/util/MemoryByteArrayOutputStream.java
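If you know the length upfront you can also pre-size the in-memory buffer and avoid the doubling entirely; a rough sketch (assuming the server actually sends the header; the URL is a placeholder):
URLConnection uc = new URL("my-url-string").openConnection();
uc.connect();
int length = uc.getContentLength(); // -1 when the header is missing
// Pre-size the buffer when the length is known, otherwise fall back to a default.
ByteArrayOutputStream out = (length > 0)
        ? new ByteArrayOutputStream(length)
        : new ByteArrayOutputStream(32 * 1024);
InputStream in = uc.getInputStream();
byte[] buffer = new byte[8192];
int n;
while ((n = in.read(buffer)) != -1) {
    out.write(buffer, 0, n);
}
in.close();
This avoids the repeated doubling described above, at the cost of trusting the header.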

Subclassing ByteArrayOutputStream gives you access to the buffer and the number of bytes in it.
But of course, if all you want to do is store the data in a file, you are better off using a FileOutputStream.
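If you do need the raw buffer, a minimal sketch of that subclass (buf and count are protected fields of ByteArrayOutputStream, so this is all it takes):
class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
    // Returns the internal array without copying; only the first validBytes() entries are meaningful.
    byte[] rawBuffer() {
        return buf;
    }

    int validBytes() {
        return count;
    }
}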

Related

How to efficiently download large csv file using java

I need to provide a feature where users can download reports in Excel/CSV format from my web application. I once built a module in the web application that created an Excel file, then read it back and sent it to the browser. It worked correctly. This time I don't want to generate an Excel file, as I don't have that level of control over the file system. I guess one way is to build the content in a StringBuffer and set the correct content type (I am not sure about this approach). Another team has this feature too, but they struggle when the data is very large. What is the best way to provide this feature, given that the data could be very large? Is it possible to send the data in chunks without the client noticing (except for a delay in downloading)?
One issue I forgot to add: when the data is very large, it also causes problems on the server side (CPU utilization and memory consumption). Is it possible to read a fixed number of records, say 500, send them to the client, then read another 500, and so on until complete?
You can also generate HTML instead of CSV and still set the content type to Excel. This is nice for colouring and styled text.
You can also use gzip compression when the client accepts that compression. Normally there are standard means, like a servlet filter.
Never build the whole thing in a StringBuffer (or the better StringBuilder); stream it out instead. If you do not (or cannot) call setContentLength, the output is sent chunked (without a predictable download progress).
URL url = new URL("http://localhost:8080/Works/images/address.csv");
response.setHeader("Content-Type", "text/csv");
response.setHeader("Content-disposition", "attachment;filename=myFile.csv");
URLConnection connection = url.openConnection();
InputStream stream = connection.getInputStream();
BufferedOutputStream outs = new BufferedOutputStream(response.getOutputStream());
int len;
byte[] buf = new byte[1024];
while ((len = stream.read(buf)) > 0) {
    outs.write(buf, 0, len);
}
outs.close();
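As for sending the data in fixed-size batches of, say, 500 records: a rough sketch, where Record, fetchRecords and toCsvLine are hypothetical placeholders for however you page through your data:
response.setContentType("text/csv");
response.setHeader("Content-Disposition", "attachment;filename=myFile.csv");
PrintWriter writer = response.getWriter();
int offset = 0;
List<Record> batch;
// Page through the data 500 rows at a time so the full result set never sits in server memory.
while (!(batch = fetchRecords(offset, 500)).isEmpty()) {
    for (Record r : batch) {
        writer.println(toCsvLine(r));
    }
    writer.flush(); // pushes this chunk to the client
    offset += batch.size();
}
The client only sees a steadily growing download; without a Content-Length it cannot show a percentage, but the transfer itself is seamless.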

How can I use ReadableByteChannel to get file contents and store it in byteBuffer?

Below is the code that I have written. I want to do a simple thing: store binary file data in a ByteBuffer.
File file = new File(fileName);
try {
    ReadableByteChannel channel = new FileInputStream(fileName).getChannel();
    // allocateDirect takes an int, so the long returned by file.length() needs a cast
    ByteBuffer buf = ByteBuffer.allocateDirect((int) file.length());
    // How can I use read to get all the contents?
} catch (Exception e) {
}
I was wondering:
how I can use read to get all the data from the channel and store it in the ByteBuffer, and
whether there is a more elegant way to allocate the ByteBuffer, other than using a File object to get the length of the file.
I prefer to use memory mapping.
FileChannel channel = new FileInputStream(fileName).getChannel();
ByteBuffer buf = channel.map(MapMode.READ_ONLY,0,channel.size());
If the file is greater than 2 GB, you have to use more than one mapping. On the plus side, this takes around 10 ms and doesn't use much heap or direct memory, regardless of the size of the file.
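For files above the 2 GB limit you would map the file in slices; a rough sketch:
FileChannel channel = new FileInputStream(fileName).getChannel();
long size = channel.size();
long chunk = Integer.MAX_VALUE; // a single MappedByteBuffer cannot cover more than 2 GB
for (long pos = 0; pos < size; pos += chunk) {
    MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, pos, Math.min(chunk, size - pos));
    // process buf here
}
channel.close();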
From the ReadableByteChannel Javadocs
read(ByteBuffer dst)
An attempt is made to read up to r bytes from the channel, where r is the number of bytes remaining in the buffer, that is, dst.remaining(), at the moment this method is invoked.
So ... channel.read(buf);
As for your second question, if you want to read the entire contents of the file into memory at once that seems like a reasonable approach.
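Spelled out, something like this should fill the buffer (the cast is needed because allocateDirect takes an int):
File file = new File(fileName);
FileChannel channel = new FileInputStream(file).getChannel();
ByteBuffer buf = ByteBuffer.allocateDirect((int) file.length());
// A single read() may not fill the buffer, so loop until it is full or the channel hits end-of-stream.
while (buf.hasRemaining() && channel.read(buf) != -1) {
    // keep reading
}
buf.flip(); // make the loaded bytes readable from position 0
channel.close();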

FileOutputStream is really slow

I am downloading databases from the network, which are between 100 KB and 500 KB in size. Here is my code (useless code removed):
URLConnection uConnection = downloadUrl.openConnection();
InputStream iS = uConnection.getInputStream();
BufferedInputStream bIS = new BufferedInputStream(iS);
byte[] buffer = new byte[1024];
FileOutputStream fOS = new FileOutputStream(db);
int bufferLength = 0;
while ((bufferLength = bIS.read(buffer)) > 0) {
    fOS.write(buffer, 0, bufferLength);
}
fOS.close();
My problem is that it takes a long time to finish the while loop. Have I messed up the code somewhere? It shouldn't take that long for such small files, should it? I'm talking about a minute for three files no larger than 1 MB altogether... Thanks in advance!
"Slow" is really rather ambiguous. That being said, considering what you're trying to do you shouldn't be using a BufferedInputStream and your buffer is way too small.
The buffered wrappers are for optimizing small reads/writes. Since all you're doing is trying to read a ton of data as fast as you can, you should just read directly from the InputStream, and use a large buffer (Say, 64k since the underlying native code is probably going to chunk at that size anyway).
byte[] buffer = new byte[65536];
...
while ((bufferLength = iS.read(buffer, 0, buffer.length)) > 0) {
...
I've found the real solution in JDK 1.7: it is reliable, fast and simple, and will almost certainly cast a pitying veil over the older java.io solutions. Although the web is still full of examples of copying files in Java using in/out streams, I'd warmly suggest everyone use one simple method: java.nio.file.Files.copy(Path origin, Path destination), with optional parameters for replacing the destination, migrating file metadata attributes and even attempting an atomic move of files (if permitted by the underlying OS). That's a really good job, waited for so long! You can easily convert code from copy(File file1, File file2) by appending .toPath() to each File instance (e.g. file1.toPath(), file2.toPath()). Note also that the boolean method isSameFile(file1.toPath(), file2.toPath()) is already used inside the copy method above, but it is easy to use on its own wherever you need it. For every case where you can't upgrade to 1.7, using the community libraries from Apache (commons-io) or Google (Guava) is still suggested.
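For the download-to-file problem in this thread, the overload that takes an InputStream is a good fit; a sketch assuming Java 7+ (the URL and file name are placeholders):
try (InputStream in = new URL("my-url-string").openStream()) {
    // Streams straight to disk; REPLACE_EXISTING overwrites any previous copy.
    Files.copy(in, Paths.get("filename"), StandardCopyOption.REPLACE_EXISTING);
}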

how to write a file without allocating the whole byte array into memory?

This is a newbie question, I know. Can you guys help?
I'm talking about big files, of course, above 100MB. I'm imagining some kind of loop, but I don't know what to use. Chunked stream?
One thing is certain: I don't want something like this (pseudocode):
File file = new File(existing_file_path);
byte[] theWholeFile = new byte[file.length()]; //this allocates the whole thing into memory
File out = new File(new_file_path);
out.write(theWholeFile);
To be more specific, I have to rewrite an applet that downloads a base64-encoded file and decodes it into the "normal" file. Because it's done with byte arrays, it holds twice the file size in memory: one copy base64-encoded and the other decoded. My question is not about base64; it's about saving memory.
Can you point me in the right direction?
Thanks!
From the question, it appears that you are reading the base64-encoded contents of a file into one array and decoding it into another array before finally saving it.
That is quite an overhead in terms of memory, especially given that Base64 encoding is in use. It can be made a bit more efficient by:
Reading the contents of the file using a FileInputStream, preferably decorated with a BufferedInputStream.
Decoding on the fly. Base64 encoded characters can be read in groups of 4 characters, to be decoded on the fly.
Writing the output to the file, using a FileOutputStream, again preferably decorated with a BufferedOutputStream. This write operation can also be done after every single decode operation.
The buffering of read and write operations is done to prevent frequent IO access. You could use a buffer size that is appropriate to your application's load; usually the buffer size is chosen to be some power of two, because such a number does not have an "impedance mismatch" with the physical disk buffer.
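A sketch of that streaming decode, assuming Java 8's java.util.Base64 is available (file names are placeholders; the MIME decoder tolerates the line breaks usually found in encoded files):
try (InputStream in = Base64.getMimeDecoder().wrap(
         new BufferedInputStream(new FileInputStream("encoded.b64")));
     OutputStream out = new BufferedOutputStream(new FileOutputStream("decoded.bin"))) {
    byte[] buffer = new byte[8192];
    int n;
    // Decoding happens inside wrap(), so only one buffer of decoded bytes is in memory at a time.
    while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
    }
}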
Perhaps a FileInputStream on the file, reading off fixed length chunks, doing your transformation and writing them to a FileOutputStream?
Perhaps a BufferedReader? Javadoc: http://download-llnw.oracle.com/javase/1.4.2/docs/api/java/io/BufferedReader.html
Use this base64 encoder/decoder, which will wrap your file input stream and handle the decoding on the fly:
InputStream input = new Base64.InputStream(new FileInputStream("in.txt"));
OutputStream output = new FileOutputStream("out.txt");
try {
    byte[] buffer = new byte[1024];
    int bytesRead;
    // read() returns -1 at end of stream; available() is not a reliable end-of-data check
    while ((bytesRead = input.read(buffer)) != -1) {
        output.write(buffer, 0, bytesRead);
    }
} finally {
    input.close();
    output.close();
}
You can use org.apache.commons.io.FileUtils. This utility class provides other options too, besides what you are looking for. For example:
FileUtils.copyFile(final File srcFile, final File destFile)
FileUtils.copyFile(final File input, final OutputStream output)
FileUtils.copyFileToDirectory(final File srcFile, final File destDir)
And so on. You can also follow this tutorial.
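commons-io also has a one-liner for the download case at the top of this thread (assuming a reasonably recent commons-io version; the URL and file name are placeholders):
// Streams the URL straight to the file; overloads with connection and read timeouts also exist.
FileUtils.copyURLToFile(new URL("my-url-string"), new File("filename"));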

Corrupt file when using Java to download file

This problem seems to happen inconsistently. We are using a java applet to download a file from our site, which we store temporarily on the client's machine.
Here is the code that we are using to save the file:
URL targetUrl = new URL(urlForFile);
InputStream content = (InputStream)targetUrl.getContent();
BufferedInputStream buffered = new BufferedInputStream(content);
File savedFile = File.createTempFile("temp",".dat");
FileOutputStream fos = new FileOutputStream(savedFile);
int letter;
while ((letter = buffered.read()) != -1)
    fos.write(letter);
fos.close();
Later, I try to access that file by using:
ObjectInputStream keyInStream = new ObjectInputStream(new FileInputStream(savedFile));
Most of the time it works without a problem, but every once in a while we get the error:
java.io.StreamCorruptedException: invalid stream header: 0D0A0D0A
which makes me believe that it isn't saving the file correctly.
I'm guessing that the operations you've done with getContent and BufferedInputStream have treated the file like an ASCII file, which has converted newlines or carriage returns into carriage return + newline (0x0D0A), and that has confused ObjectInputStream (which expects serialized data objects).
If you are using an FTP URL, the transfer may be occurring in ASCII mode.
Try appending ";type=I" to the end of your URL.
Why are you using ObjectInputStream to read it?
As per the javadoc:
An ObjectInputStream deserializes primitive data and objects previously written using an ObjectOutputStream.
Probably the error comes from the fact you didn't write it with ObjectOutputStream.
Try reading it with a FileInputStream only.
Here's a sample for binary (although not the most efficient way).
Here's another used for text files.
There are 3 big problems in your sample code:
You're not just treating the input as bytes
You're needlessly pulling the entire object into memory at once
You're doing multiple method calls for every single byte read and written -- use the array based read/write!
Here's a redo:
URL targetUrl = new URL(urlForFile);
InputStream is = targetUrl.openStream();
File savedFile = File.createTempFile("temp", ".dat");
FileOutputStream fos = new FileOutputStream(savedFile);
int count;
byte[] buff = new byte[16 * 1024];
while ((count = is.read(buff)) != -1) {
    fos.write(buff, 0, count);
}
fos.close();
is.close();
You could also step back from the code and check to see if the file on your client is the same as the file on the server. If you get both files on an XP machine, you should be able to use the FC utility to do a compare (check FC's help if you need to run this as a binary compare as there is a switch for that). If you're on Unix, I don't know the file compare program, but I'm sure there's something.
If the files are identical, then you're looking at a problem with the code that reads the file.
If the files are not identical, focus on the code that writes your file.
Good luck!
