Corrupt file when using Java to download file

This problem seems to happen inconsistently. We are using a Java applet to download a file from our site, which we store temporarily on the client's machine.
Here is the code that we are using to save the file:
URL targetUrl = new URL(urlForFile);
InputStream content = (InputStream)targetUrl.getContent();
BufferedInputStream buffered = new BufferedInputStream(content);
File savedFile = File.createTempFile("temp",".dat");
FileOutputStream fos = new FileOutputStream(savedFile);
int letter;
while ((letter = buffered.read()) != -1) {
    fos.write(letter);
}
fos.close();
Later, I try to access that file by using:
ObjectInputStream keyInStream = new ObjectInputStream(new FileInputStream(savedFile));
Most of the time it works without a problem, but every once in a while we get the error:
java.io.StreamCorruptedException: invalid stream header: 0D0A0D0A
which makes me believe that it isn't saving the file correctly.

I'm guessing that the operations you've done with getContent and BufferedInputStream have treated the file like an ASCII file, converting newlines or carriage returns into carriage return + newline pairs (0x0D0A), which has confused ObjectInputStream (which expects serialized data objects).

If you are using an FTP URL, the transfer may be occurring in ASCII mode.
Try appending ";type=I" to the end of your URL.
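For example (a minimal sketch; urlForFile is the question's variable, and ";type=I" is the FTP URL type code for binary/image transfer mode):
// Ask the FTP URL handler for a binary (image-mode) transfer
URL targetUrl = new URL(urlForFile + ";type=I");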

Why are you using ObjectInputStream to read it?
As per the javadoc:
An ObjectInputStream deserializes primitive data and objects previously written using an ObjectOutputStream.
Probably the error comes from the fact you didn't write it with ObjectOutputStream.
Try reading it with FileInputStream only.
Here's a sample for binary reading (although not the most efficient way); a text-file version would wrap the stream in a Reader instead.
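A minimal sketch of that binary read, assuming the question's savedFile (byte-at-a-time, mirroring the original code, so not the most efficient approach):
FileInputStream fis = new FileInputStream(savedFile);
int b;
while ((b = fis.read()) != -1) {
    // handle each raw byte here
}
fis.close();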

There are 3 big problems in your sample code:
You're not simply treating the input as raw bytes: getContent() may run the response through a content handler instead of handing you the stream untouched
You're needlessly pulling the entire object into memory at once
You're doing multiple method calls for every single byte read and written -- use the array-based read/write!
Here's a redo:
URL targetUrl = new URL(urlForFile);
InputStream is = targetUrl.openStream(); // openStream() == openConnection().getInputStream()
File savedFile = File.createTempFile("temp", ".dat");
FileOutputStream fos = new FileOutputStream(savedFile);
int count;
byte[] buff = new byte[16 * 1024];
while ((count = is.read(buff)) != -1) {
    fos.write(buff, 0, count);
}
fos.close();
is.close();

You could also step back from the code and check whether the file on your client is the same as the file on the server. If you get both files onto an XP machine, you can use the FC utility to compare them (FC /B runs a binary compare). If you're on Unix, cmp or diff will do the same job.
If the files are identical, then you're looking at a problem with the code that reads the file.
If the files are not identical, focus on the code that writes your file.
Good luck!

Related

When do I need to specify the encoding while writing the file to the disk?

I have a sample method which copies one file to another using InputStream and OutputStream. In this case, the source file is encoded in 'UTF-8'. Even if I don't specify the encoding while writing to the disk, the destination file has the correct encoding. But if I have to write a java.lang.String to a file, I need to specify the encoding. Why is that?
public static void copyFile() {
    String sourceFilePath = "C://my_encoded.txt";
    InputStream inStream = null;
    OutputStream outStream = null;
    try {
        String targetFilePath = "C://my_target.txt";
        File sourceFile = new File(sourceFilePath);
        outStream = new FileOutputStream(targetFilePath);
        inStream = new FileInputStream(sourceFile);
        byte[] buffer = new byte[1024];
        int length;
        // copy the file content in bytes
        while ((length = inStream.read(buffer)) > 0) {
            outStream.write(buffer, 0, length);
        }
        inStream.close();
        outStream.close();
        System.out.println("File " + targetFilePath + " was copied successfully!");
    } catch (IOException e) {
        e.printStackTrace();
    }
}
My guess is that since the source file has the correct encoding and since we read and write one byte at a time, it works fine. And java.lang.String is 'UTF-16' by default, so if we write it to the file, it reads one byte at a time instead of two bytes, hence garbage values. Is that correct, or am I completely wrong in my understanding?
You are copying the file byte per byte, so you don't need to care about character encoding.
As a rule of thumb:
Use the various InputStream and OutputStream implementations for byte-wise processing (like file copy).
There are some convenience methods to handle text directly like PrintStream.println(). Be careful because most of them use the default platform specific encoding.
Use the various Reader and Writer implementations for reading and writing text.
If you need to convert between byte-wise and text processing use InputStreamReader and OutputStreamWriter with explicit file encoding.
Do not rely on the default encoding. The default character encoding is platform specific (e.g. Windows-ANSI aka Cp1252 for Windows, usually UTF-8 on Linux).
Example: If you need to read a UTF-8 text file:
BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(inFile), "UTF-8"));
Avoid using a FileReader, because a FileReader always uses the default encoding.
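The writing side is symmetric (a minimal sketch; outFile is assumed by analogy with inFile above):
Writer writer = new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream(outFile), "UTF-8"));
writer.write("some text"); // encoded as UTF-8, regardless of the platform default
writer.close();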
A special case: If you need random access to a file you should use RandomAccessFile. With it you can read and write data blocks at arbitrary positions. You can read and write raw byte blocks or you can use convenience methods to read and write text. But you should read the documentation carefully. E.g. the methods readUTF() and writeUTF() use a modified UTF-8 encoding.
InputStream, OutputStream, Reader, Writer and RandomAccessFile form the basic IO functionality, enough for most use cases. For advanced IO (e.g. memory mapped files, ...) have a look at package java.nio.
Just read your code! (For the copy part at least ;-) )
When you copy the file, you copy it byte by byte; there is thus no conversion to String.
When you write a String into a file, you need to convert it (sometimes indirectly) into an array of bytes (byte[]). There you need to specify your encoding.
When you read a file to get a String, you need to know its encoding in order to do it properly. Java doesn't 'skip' any byte, but you need to make the conversion once again: from a byte[] to a String.
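Both conversions in one minimal sketch (the literal text is only an illustration):
// String -> byte[]: the encoding must be given, or the platform default kicks in
byte[] bytes = "héllo".getBytes("UTF-8");
// byte[] -> String: the same encoding must be known to decode correctly
String text = new String(bytes, "UTF-8");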

Decode Base64InputStream from String and unzip it with GZIPInputStream in Java

I am trying to create a simple app to decode zipped data from a string. The user will paste the zipped data into one textarea; after a button click the data will be decoded and shown in another textarea.
If I use files, it works perfectly:
zis = new GZIPInputStream(new Base64InputStream(new FileInputStream(inZippedFile)));
where inZippedFile is file. Then result is saved to outputFile.
However, if I feed the string in through an InputStream, it never finishes.
String input = "..."
InputStream in = IOUtils.toInputStream(input, "UTF-8");
zis = new GZIPInputStream(new Base64InputStream(in));
For IOUtils I am using the Apache Commons IO 2.4 jars. Is there anything I am doing wrong?
Thanks
Decoding and unzipping the string data works correctly; there was just a mistake in parsing the data into the correct form, which caused the long run time.
So this works, with no need to set UTF-8:
new GZIPInputStream(new Base64InputStream(IOUtils.toInputStream(input)));
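Expanded into a minimal end-to-end sketch (assuming Commons IO's IOUtils and Commons Codec's Base64InputStream, as in the question):
InputStream in = IOUtils.toInputStream(input);
GZIPInputStream zis = new GZIPInputStream(new Base64InputStream(in));
String unzipped = IOUtils.toString(zis, "UTF-8"); // the inflated text
zis.close();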

Reading the contents of a zip entry from a zip that is in the form of bytes

I'm calling into some code that returns an HTTP response. I can get the contents of the response, which comes back as a byte array. The bytes represent a zip file from which I would like to extract the contents of a single file (the zip only contains one file).
Currently I have some messy code (I'll need to clean it up if I keep it) that seems to work:
byte[] bytes = response.out.toByteArray();
ByteArrayInputStream input = new ByteArrayInputStream(bytes);
ZipInputStream zip = new ZipInputStream(input);
ByteArrayOutputStream output = new ByteArrayOutputStream();
zip.getNextEntry();
int data;
while ((data = zip.read()) != -1) {
    output.write(data);
}
output.close();
zip.close();
input.close();
byte[] kmlBytes = output.toByteArray();
String contents = new String(kmlBytes, "UTF-8");
but was wondering whether there was a cleaner way to do the same thing because the above looks incredibly ugly.
In order to avoid "reinventing the wheel" and reduce clutter and to focus on your app's business logic rather than low-level code like this, you can use Apache Commons Compress.
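For instance (a hedged sketch, assuming Commons Compress is on the classpath and, as in the question, the zip holds a single UTF-8 text entry):
// ZipArchiveInputStream and IOUtils come from org.apache.commons.compress
ZipArchiveInputStream zip = new ZipArchiveInputStream(new ByteArrayInputStream(bytes));
zip.getNextZipEntry(); // position the stream on the archive's single entry
String contents = new String(IOUtils.toByteArray(zip), "UTF-8");
zip.close();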

how to write a file without allocating the whole byte array into memory?

This is a newbie question, I know. Can you guys help?
I'm talking about big files, of course, above 100MB. I'm imagining some kind of loop, but I don't know what to use. Chunked stream?
One thing is for certain: I don't want something like this (pseudocode):
File file = new File(existing_file_path);
byte[] theWholeFile = new byte[file.length()]; //this allocates the whole thing into memory
File out = new File(new_file_path);
out.write(theWholeFile);
To be more specific, I have to rewrite an applet that downloads a base64-encoded file and decodes it into the "normal" file. Because it is built around byte arrays, it holds twice the file size in memory: one copy base64 encoded and the other decoded. My question is not about base64; it's about saving memory.
Can you point me in the right direction?
Thanks!
From the question, it appears that you are reading the base64 encoded contents of a file into an array, decoding it into another array before finally saving it.
This is a fair amount of overhead in memory terms, especially given that Base64 encoding is in use. It can be made a bit more efficient by:
Reading the contents of the file using a FileInputStream, preferably decorated with a BufferedInputStream.
Decoding on the fly. Base64 encoded characters can be read in groups of 4 characters, to be decoded on the fly.
Writing the output to the file, using a FileOutputStream, again preferably decorated with a BufferedOutputStream. This write operation can also be done after every single decode operation.
The buffering of read and write operations is done to prevent frequent IO access. You could use a buffer size that is appropriate to your application's load; usually the buffer size is chosen to be some power of two, because such a number does not have an "impedance mismatch" with the physical disk buffer.
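A minimal sketch of that streaming approach, using java.util.Base64's wrapping decoder (available since Java 8; the file names are placeholders):
try (InputStream in = Base64.getDecoder().wrap(
             new BufferedInputStream(new FileInputStream("encoded.b64")));
     OutputStream out = new BufferedOutputStream(new FileOutputStream("decoded.bin"))) {
    byte[] buffer = new byte[8192]; // a power of two, per the note above
    int n;
    while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n); // decoded bytes go straight to disk
    }
}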
Perhaps a FileInputStream on the file, reading off fixed length chunks, doing your transformation and writing them to a FileOutputStream?
Perhaps a BufferedReader? Javadoc: http://download-llnw.oracle.com/javase/1.4.2/docs/api/java/io/BufferedReader.html
Use this base64 encoder/decoder, which will wrap your file input stream and handle the decoding on the fly:
InputStream input = new Base64.InputStream(new FileInputStream("in.txt"));
OutputStream output = new FileOutputStream("out.txt");
try {
    byte[] buffer = new byte[1024];
    int bytesRead;
    // read() fills the buffer from offset 0 on each pass; write out
    // exactly the number of bytes actually read
    while ((bytesRead = input.read(buffer)) != -1) {
        output.write(buffer, 0, bytesRead);
    }
} finally {
    input.close();
    output.close();
}
You can use org.apache.commons.io.FileUtils. This util class provides other options too, besides what you are looking for. For example:
FileUtils.copyFile(final File srcFile, final File destFile)
FileUtils.copyFile(final File input, final OutputStream output)
FileUtils.copyFileToDirectory(final File srcFile, final File destDir)
And so on. You can also follow this tutorial.
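Usage is a one-liner (a sketch; the file names are placeholders):
// Streams the copy internally, so the whole file is never held in memory
FileUtils.copyFile(new File("big-input.bin"), new File("big-output.bin"));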

Copy binary data from URL to file in Java without intermediate copy

I'm updating some old code to grab some binary data from a URL instead of from a database (the data is about to be moved out of the database and will be accessible by HTTP instead). The database API seemed to provide the data as a raw byte array directly, and the code in question wrote this array to a file using a BufferedOutputStream.
I'm not at all familiar with Java, but a bit of googling led me to this code:
URL u = new URL("my-url-string");
URLConnection uc = u.openConnection();
uc.connect();
InputStream in = uc.getInputStream();
ByteArrayOutputStream out = new ByteArrayOutputStream();
final int BUF_SIZE = 1 << 8;
byte[] buffer = new byte[BUF_SIZE];
int bytesRead = -1;
while((bytesRead = in.read(buffer)) > -1) {
out.write(buffer, 0, bytesRead);
}
in.close();
fileBytes = out.toByteArray();
That seems to work most of the time, but I have a problem when the data being copied is large - I'm getting an OutOfMemoryError for data items that worked fine with the old code.
I'm guessing that's because this version of the code has multiple copies of the data in memory at the same time, whereas the original code didn't.
Is there a simple way to grab binary data from a URL and save it in a file without incurring the cost of multiple copies in memory?
Instead of writing the data to a byte array and then dumping it to a file, you can directly write it to a file by replacing the following:
ByteArrayOutputStream out = new ByteArrayOutputStream();
With:
FileOutputStream out = new FileOutputStream("filename");
If you do so, there is no need for the call out.toByteArray() at the end. Just make sure you close the FileOutputStream object when done, like this:
out.close();
See the documentation of FileOutputStream for more details.
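Putting that together (a minimal sketch; the URL string and file name are the question's placeholders):
URL u = new URL("my-url-string");
InputStream in = u.openConnection().getInputStream();
FileOutputStream out = new FileOutputStream("filename");
try {
    byte[] buffer = new byte[1 << 8];
    int bytesRead;
    while ((bytesRead = in.read(buffer)) > -1) {
        out.write(buffer, 0, bytesRead); // straight to disk, no in-memory copy
    }
} finally {
    in.close();
    out.close();
}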
I don't know what you mean by "large" data, but try using the JVM parameter
java -Xmx256m ...
which sets the maximum heap size to 256 MB (or any value you like).
If you need the Content-Length and your web server is somewhat standards-conforming, then it should provide you a "Content-Length" header.
URLConnection#getContentLength() should give you that information upfront so that you are able to create your file. (Be aware that if your HTTP server is misconfigured or under the control of an evil entity, that header may not match the number of bytes received. In that case, why don't you stream to a temp file first and copy that file later?)
In addition to that: a ByteArrayOutputStream is a horrible memory allocator. It always doubles the buffer size, so if you read a 32 MB + 1 byte file, you end up with a 64 MB buffer. It might be better to implement your own, smarter byte-array stream, like this one:
http://source.pentaho.org/pentaho-reporting/engines/classic/trunk/core/source/org/pentaho/reporting/engine/classic/core/util/MemoryByteArrayOutputStream.java
Subclassing ByteArrayOutputStream gives you access to the buffer and the number of bytes in it.
But of course, if all you want to do is store the data in a file, you are better off using a FileOutputStream.
