android write to disk unreliable - written file.length !=expected.length

I have a write method that will write a byte[] to disk. On very few devices I'm running into some strange problems where the written file.length() != byte[].length after a successful write operation.
Code and Problem
The code to write a file to disk
private static boolean writeByteFile(File file, byte[] byteData) throws IOException {
    if (!file.exists()) {
        boolean fileCreated = file.createNewFile();
        if (!fileCreated) {
            return false;
        }
    }
    FileOutputStream fos = new FileOutputStream(file);
    BufferedOutputStream bos = new BufferedOutputStream(fos);
    bos.write(byteData);
    bos.flush();
    fos.getFD().sync(); // sync to disk as recommended: http://android-developers.blogspot.com/2010/12/saving-data-safely.html
    fos.close();
    if (file.length() != byteData.length) {
        // md is a MessageDigest that is declared elsewhere
        final byte[] originalMD5Hash = md.digest(byteData);
        InputStream is = new FileInputStream(file);
        BufferedInputStream bis = new BufferedInputStream(is);
        byte[] buffer = new byte[4096];
        int bytesRead;
        while ((bytesRead = bis.read(buffer)) > -1) {
            md.update(buffer, 0, bytesRead); // hash only the bytes actually read
        }
        is.close();
        final byte[] writtenFileMD5Hash = md.digest();
        if (!Arrays.equals(originalMD5Hash, writtenFileMD5Hash)) {
            String message = String.format(
                "After an fsync, the file's length is not equal to the number of bytes we wrote!\npath=%s, expected=%d, actual=%d. >> " +
                "Original MD5 Hash: %s, written file MD5 hash: %s",
                file.getAbsolutePath(), byteData.length, file.length(),
                digestToHex(originalMD5Hash), digestToHex(writtenFileMD5Hash));
            throw new GiantWtfException(message);
        }
    }
    return true;
}
I'm running into the if-statement where I compare file length on a few devices.
One example output:
After an fsync, the file's length is not equal to the number of bytes we wrote! path=/mnt/sdcard/.folder/filename, expected=233510, actual=229376 >> Original MD5 Hash: f1d298c0484672c52d9c26d04a3a21dc, written file MD5 hash: ab30660bd2b476d9551c15b340207a8a
I currently see this problem on 5 devices as I'm slowly rolling out the code. Some device data:
Question
Is there anything else I can do or improve?
More stats and observations
Current system version
2.3.5
2.3.6
Model
N860 (LG)
GT-I9100G (Samsung)
GT-S5300 (Samsung)
GT-S7500 (Samsung)
LG-VS410PP (LG)
Other stats
In the general crash analytics (from Crittercism) there is always more than enough free disk space at the time the problem happens. Still, some (not all) of the devices have thrown IOExceptions about no free disk space at a different point in time.
As always I've never been able to reproduce that problem on any test phone I have.
Assumptions / Observations:
Generally I would expect an IOException when the disk is full. Still, in every failure that I catch, fewer bytes have been written than there should have been.
Interestingly enough, the number of bytes actually written to disk is always a multiple of 2^15 (for example, 229376 = 7 * 32768).
EDIT:
I added an MD5 checksum validation that also fails and simplified the example code a little for better readability. It still fails in the wild with different MD5 hashes.

philipp, file.length() is the file size as reported by the OS. It might be the space the file takes up on disk or the number of bytes in the file.
If the number returned is the size on disk, it is related to the number of clusters that hold the file. For example, NTFS generally uses 4KB clusters. If you save a text document with 3 ASCII-encoded characters on an NTFS-formatted volume, the size of the file is 3 bytes, but the size of the file on disk is 4096 bytes. On NTFS with a 4KB cluster, all files take up a multiple of 4096 bytes on disk. See http://en.wikipedia.org/wiki/Data_cluster for more.
If the number returned is the length of the file in bytes (from the underlying file-system's meta-data) then you should have an exact match to how many bytes you wrote, though I wouldn't bet my life on it.
Android uses YAFFS or EXT4, if that helps at all.
I strongly agree with admdrew, use a hash. MD5 would work great. SHA or even CRC should work fine for this task. As you write bytes to the disk, feed the stream to your hash algorithm as well. Once the file is written, read it back and feed that to your hasher. Compare the results. If you want to be sure the data is clean, file size is not enough.
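The approach described above (hash while writing, then read the file back and hash again) could be sketched as follows. This is only an illustration, assuming a Java 7+ toolchain for try-with-resources; the class name VerifiedWriter and the method writeAndVerify are hypothetical and not part of the original code.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper, not from the original post.
class VerifiedWriter {
    /** Writes data to file, syncs it, reads it back, and reports whether the MD5 digests match. */
    static boolean writeAndVerify(File file, byte[] data)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest writeDigest = MessageDigest.getInstance("MD5");
        try (FileOutputStream fos = new FileOutputStream(file);
             DigestOutputStream dos = new DigestOutputStream(fos, writeDigest)) {
            dos.write(data);      // every byte written also feeds writeDigest
            dos.flush();
            fos.getFD().sync();   // ask the OS to push the data to the storage device
        }
        byte[] expected = writeDigest.digest();

        MessageDigest readDigest = MessageDigest.getInstance("MD5");
        try (InputStream in = new DigestInputStream(new FileInputStream(file), readDigest)) {
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // Reading drives the digest; only the bytes actually read are hashed.
            }
        }
        return Arrays.equals(expected, readDigest.digest());
    }
}
```

A caller could retry the write, or report the failure, whenever writeAndVerify returns false.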

Related

BufferedInputStream hanging (not reaching end of file)

I have Java SSL/TLS server and client sockets. My client simply sends a file to the server, and the server receives it. Here is my code:
My client method:
static boolean writeData(BufferedOutputStream bos, File data) throws IOException {
    FileInputStream fis = new FileInputStream(data);
    BufferedInputStream bis = new BufferedInputStream(fis);
    byte[] bytes = new byte[512];
    int count = 0;
    while ((count = bis.read(bytes, 0, bytes.length)) > 0) {
        System.out.println("Sending file...");
        bos.write(bytes, 0, count);
        System.out.println(count);
    }
    bis.close();
    bos.flush();
    System.out.println("File Sent");
    return true;
}
My server method:
static boolean receiveData(BufferedInputStream bis, File data) throws IOException {
    byte[] bytes = new byte[512];
    int count = 0;
    while ((count = bis.read(bytes, 0, bytes.length)) > 0) {
        System.out.println("Receiving file...");
        // Do something..
        System.out.println(count);
    }
    bos.flush(); // bos is not declared in this snippet; presumably the stream the received file is written to
    System.out.println("File Received");
    return true;
}
The problem is, the server hangs inside the while loop.. It never reaches the "File Received" message.
Even if the file is small, the bis.read() method never returns -1 at the end of file for some reason. I tested the methods with a file size of 16 bytes, and the output is as follows:
Client terminal:
> Sending file...
> 16
> File Sent
Server terminal:
> Receiving file...
> 16
As you can see, the server never reaches the "File Received" message and hangs inside the loop even after the end of stream is reached.. Can anyone guess the reason for this?
Thanks

Your server never detects that the file has been sent, because read() returns -1 only at end of stream, and on a socket that happens only once the other end has closed the connection.
But you never close the connection; you only flush it.
Replace bos.flush() with bos.close() in the writeData method and it should work.
If you don't want to close the connection, because you want to do more work with it, you have to add a protocol of some sort, because there is no default way to do that.
One thing you could do, which is one of the easier ways to implement this, is to send the length of the file as a 32-bit or 64-bit integer before the file.
Then the server knows how many bytes it should read before it can consider the file fully sent.
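A minimal sketch of the length-prefix idea, using DataOutputStream and DataInputStream from the standard library. The class name LengthPrefixed and its methods are hypothetical, and the payload is held in a byte array only to keep the example short; a real file transfer would copy in chunks rather than load the whole file.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper illustrating a 64-bit length prefix.
class LengthPrefixed {
    /** Sends the payload length as a 64-bit integer, followed by the payload itself. */
    static void send(OutputStream rawOut, byte[] payload) throws IOException {
        DataOutputStream out = new DataOutputStream(rawOut);
        out.writeLong(payload.length);
        out.write(payload);
        out.flush(); // the connection stays open for further messages
    }

    /** Reads one length-prefixed payload; never waits for end of stream. */
    static byte[] receive(InputStream rawIn) throws IOException {
        DataInputStream in = new DataInputStream(rawIn);
        long length = in.readLong();
        if (length < 0 || length > Integer.MAX_VALUE) {
            throw new IOException("Unsupported payload length: " + length);
        }
        byte[] payload = new byte[(int) length];
        in.readFully(payload); // blocks until exactly 'length' bytes have arrived
        return payload;
    }
}
```

Because the receiver knows how many bytes to expect, several payloads can be sent over the same connection without closing it.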
If you don't know the length of the file, there are many options. I'm not sure if there is a consensus on the most effective way to do this, but given that many protocols take different approaches, I don't think that there is.
These are just a few suggestions, which you can tune.
Before any piece of data, you send the length of the data you want to send as a 32-bit (signed) integer. So a file will be sent as multiple pieces of data. Sending a negative number means that the previous piece was the last piece and the file has ended. (If you need to send a piece that is larger than the maximum that you can represent in a signed 32-bit integer, you need to split it into several pieces.)
You think of a random number, with a long-enough length (something like 16 bytes or 32 bytes) that it will never occur in your data. You send that number before the file and when the file is done, you send it again to indicate that event. This is similar to the MIME multi-part encoding.
You take a byte or a number of bytes that indicates whether the file has ended (like 0xFF). But to ensure that you can still legitimately send 0xFF as part of the file, you add the rule that 0xFF 0xFF means that the file has ended, but 0xFF 0x00 means "just a literal 0xFF" in the file.
There are many more ways to do it.
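As one illustration of the first suggestion (length-prefixed pieces terminated by a negative length), a rough sketch might look like this. The class ChunkedFraming and its method names are hypothetical, and error handling is kept to a minimum.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: each piece is preceded by its length; -1 marks the end of the file.
class ChunkedFraming {
    static void sendChunked(InputStream source, OutputStream rawOut, int chunkSize)
            throws IOException {
        DataOutputStream out = new DataOutputStream(rawOut);
        byte[] chunk = new byte[chunkSize];
        int n;
        while ((n = source.read(chunk)) != -1) {
            out.writeInt(n);          // length of this piece
            out.write(chunk, 0, n);   // the piece itself
        }
        out.writeInt(-1);             // negative length: the previous piece was the last one
        out.flush();
    }

    static void receiveChunked(InputStream rawIn, OutputStream sink) throws IOException {
        DataInputStream in = new DataInputStream(rawIn);
        byte[] chunk = new byte[0];
        int n;
        while ((n = in.readInt()) >= 0) {
            if (n > chunk.length) {
                chunk = new byte[n];  // grow the buffer to fit the announced piece
            }
            in.readFully(chunk, 0, n);
            sink.write(chunk, 0, n);
        }
    }
}
```

The sender does not need to know the total length in advance, and the receiver stops at the terminator without waiting for the connection to close.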

File behaves differently when loaded from jar or directory

I'm a java newbie with a real hair puller. Hope someone can help.
I have a binary file that loads ok from the applet's directory,
but which only partially loads from the applet's jar file.
The code below loads the file both ways and compares them. They
should be identical, but the output is "divergence at byte 8181".
int spx_data_length = 158994;
byte[] spx_buf = new byte[spx_data_length];
byte[] spx_buf2 = new byte[spx_data_length];
// binary file in jar
InputStream is = Vocals.class.getResourceAsStream("0.raw");
is.read(spx_buf, 0, spx_data_length);
is.close();
// same binary file in applet directory
URL srcURL = new URL(getCodeBase(), "0.raw");
URLDataSource u_dat = new URLDataSource(srcURL);
is = u_dat.getInputStream();
is.read(spx_buf2, 0, spx_data_length);
is.close();
// compare them
for (int i = 0; i < spx_data_length; i++) {
    if (spx_buf[i] != spx_buf2[i]) {
        Obj[0] = "divergence at byte " + i; win.call("show_string", Obj);
        i = spx_data_length;
    }
}

InputStream.read(byte[], int, int) will read up to spx_data_length bytes, but may very well read fewer. Particularly in the case of compressed data (i.e. reading from the JAR), it might return one decompression buffer's worth of data at a time. You should either loop until the read returns -1, or use something like DataInputStream.readFully(byte[], int, int). And you should compare the number of bytes read: if that differs, there is little point in comparing the bytes past the smaller of these counts.
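The loop described above could look roughly like this. StreamReads and readUpTo are hypothetical names; unlike DataInputStream.readFully, this sketch returns the number of bytes actually read instead of throwing EOFException, so the two counts can be compared.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not from the original post.
class StreamReads {
    /** Reads until buf is full or the stream ends; returns the number of bytes read. */
    static int readUpTo(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            // A single read() may return fewer bytes than requested, so keep asking.
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream before the buffer was filled
            }
            total += n;
        }
        return total;
    }
}
```

In the code from the question, each is.read(...) call would be replaced by a call like this, and the two returned counts compared before comparing the buffers.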

How do I use a ChannelBufferOutputStream to check compression size

In a java program I am compressing an InputStream like this:
ChannelBufferOutputStream outputStream = new ChannelBufferOutputStream(ChannelBuffers.dynamicBuffer(BUFFER_SIZE));
GZIPOutputStream compressedOutputStream = new GZIPOutputStream(outputStream);
try {
    IOUtils.copy(inputStream, compressedOutputStream);
} finally {
    // this should print the byte size after compression
    System.out.println(outputStream.writtenBytes());
}
I am testing this code with a JSON file that is ~31,000 bytes uncompressed and ~7,000 bytes compressed on disk. When I pass an InputStream wrapping the uncompressed JSON file to the code above, outputStream.writtenBytes() returns 10, which would indicate that it compressed down to only 10 bytes. That seems wrong, so I wonder where the problem is. The ChannelBufferOutputStream javadoc says: "Returns the number of written bytes by this stream so far." So it should be working.

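Try calling GZIPOutputStream.finish() (or closing the stream) before counting bytes. Until then, most of the compressed data is still held by the compressor, and the 10 bytes you see are just the GZIP header. flush() alone may not be enough, because by default it does not force the compressor to emit its pending data.
If that does not work, you can create a proxy stream whose only job is to count the number of bytes that pass through it.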
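Both suggestions can be combined in a small sketch. It uses only java.io and java.util.zip rather than the ChannelBufferOutputStream from the question; CountingOutputStream and GzipSize are hypothetical names written for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical proxy stream that counts the bytes passing through it.
class CountingOutputStream extends FilterOutputStream {
    private long count;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        count += len;
    }

    long getCount() {
        return count;
    }
}

class GzipSize {
    /** Returns the size in bytes of the GZIP-compressed form of input. */
    static long compressedSize(byte[] input) throws IOException {
        CountingOutputStream counter = new CountingOutputStream(new ByteArrayOutputStream());
        GZIPOutputStream gzip = new GZIPOutputStream(counter);
        gzip.write(input);
        gzip.finish(); // writes the remaining compressed data and the GZIP trailer
        return counter.getCount();
    }
}
```

The count is read only after finish(), which is the point at which all compressed bytes have reached the underlying stream.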

how to write a file without allocating the whole byte array into memory?

This is a newbie question, I know. Can you guys help?
I'm talking about big files, of course, above 100MB. I'm imagining some kind of loop, but I don't know what to use. Chunked stream?
One thing is certain: I don't want something like this (pseudocode):
File file = new File(existing_file_path);
byte[] theWholeFile = new byte[file.length()]; //this allocates the whole thing into memory
File out = new File(new_file_path);
out.write(theWholeFile);
To be more specific, I have to rewrite an applet that downloads a base64-encoded file and decodes it to the "normal" file. Because it's made with byte arrays, it holds twice the file size in memory: one copy base64-encoded and the other decoded. My question is not about base64. It's about saving memory.
Can you point me in the right direction?
Thanks!

From the question, it appears that you are reading the base64 encoded contents of a file into an array, decoding it into another array before finally saving it.
This is wasteful in terms of memory, especially given that Base64 encoding is in use. It can be made a bit more efficient by:
Reading the contents of the file using a FileInputStream, preferably decorated with a BufferedInputStream.
Decoding on the fly. Base64 encoded characters can be read in groups of 4 characters, to be decoded on the fly.
Writing the output to the file, using a FileOutputStream, again preferably decorated with a BufferedOutputStream. This write operation can also be done after every single decode operation.
The buffering of read and write operations is done to prevent frequent IO access. You could use a buffer size that is appropriate to your application's load; usually the buffer size is chosen to be some power of two, because such a number does not have an "impedance mismatch" with the physical disk buffer.
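The three steps above can be sketched with the decoder built into newer JDKs. This assumes Java 8 or later (java.util.Base64 did not exist when the question was asked); the class StreamingBase64 and its decode method are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;

// Hypothetical helper: decodes Base64 on the fly with a fixed-size buffer.
class StreamingBase64 {
    /** Decodes Base64 text from encoded into decoded; returns the number of decoded bytes. */
    static long decode(InputStream encoded, OutputStream decoded) throws IOException {
        long total = 0;
        try (InputStream in = Base64.getDecoder().wrap(new BufferedInputStream(encoded));
             OutputStream out = new BufferedOutputStream(decoded)) {
            byte[] buffer = new byte[8192]; // only this much decoded data is in memory at once
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

Passing a FileInputStream and a FileOutputStream keeps memory use at roughly the buffer size regardless of file size. If the input contains line breaks, Base64.getMimeDecoder() would be the decoder to wrap instead, since the basic decoder rejects them.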

Perhaps a FileInputStream on the file, reading off fixed length chunks, doing your transformation and writing them to a FileOutputStream?

Perhaps a BufferedReader? Javadoc: http://download-llnw.oracle.com/javase/1.4.2/docs/api/java/io/BufferedReader.html

Use this base64 encoder/decoder, which will wrap your file input stream and handle the decoding on the fly:
InputStream input = new Base64.InputStream(new FileInputStream("in.txt"));
OutputStream output = new FileOutputStream("out.txt");
try {
    byte[] buffer = new byte[1024];
    int bytesRead;
    // read() returns -1 at end of stream; available() is not a reliable end-of-stream test
    while ((bytesRead = input.read(buffer)) != -1) {
        output.write(buffer, 0, bytesRead);
    }
} finally {
    input.close();
    output.close();
}

You can use org.apache.commons.io.FileUtils. This util class provides other options too beside what you are looking for. For example:
FileUtils.copyFile(final File srcFile, final File destFile)
FileUtils.copyFile(final File input, final OutputStream output)
FileUtils.copyFileToDirectory(final File srcFile, final File destDir)
And so on. You can also follow this tutorial.

Copy binary data from URL to file in Java without intermediate copy

I'm updating some old code to grab some binary data from a URL instead of from a database (the data is about to be moved out of the database and will be accessible by HTTP instead). The database API seemed to provide the data as a raw byte array directly, and the code in question wrote this array to a file using a BufferedOutputStream.
I'm not at all familiar with Java, but a bit of googling led me to this code:
URL u = new URL("my-url-string");
URLConnection uc = u.openConnection();
uc.connect();
InputStream in = uc.getInputStream();
ByteArrayOutputStream out = new ByteArrayOutputStream();
final int BUF_SIZE = 1 << 8;
byte[] buffer = new byte[BUF_SIZE];
int bytesRead = -1;
while ((bytesRead = in.read(buffer)) > -1) {
    out.write(buffer, 0, bytesRead);
}
in.close();
fileBytes = out.toByteArray();
That seems to work most of the time, but I have a problem when the data being copied is large - I'm getting an OutOfMemoryError for data items that worked fine with the old code.
I'm guessing that's because this version of the code has multiple copies of the data in memory at the same time, whereas the original code didn't.
Is there a simple way to grab binary data from a URL and save it in a file without incurring the cost of multiple copies in memory?

Instead of writing the data to a byte array and then dumping it to a file, you can directly write it to a file by replacing the following:
ByteArrayOutputStream out = new ByteArrayOutputStream();
With:
FileOutputStream out = new FileOutputStream("filename");
If you do so, there is no need for the call out.toByteArray() at the end. Just make sure you close the FileOutputStream object when done, like this:
out.close();
See the documentation of FileOutputStream for more details.
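Put together, the change might look like the sketch below, which copies the stream to a file through a small fixed-size buffer. UrlToFile and download are hypothetical names, and try-with-resources assumes Java 7 or later.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

// Hypothetical helper: streams the content behind a URL straight into a file.
class UrlToFile {
    /** Copies the data at url into target; returns the number of bytes copied. */
    static long download(URL url, File target) throws IOException {
        long total = 0;
        try (InputStream in = url.openStream();
             OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
            byte[] buffer = new byte[8192]; // the only copy of the data held in memory
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        } // both streams are closed here, even if an exception is thrown
        return total;
    }
}
```

Memory use stays at roughly the buffer size no matter how large the downloaded data is.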

I don't know what you mean with "large" data, but try using the JVM parameter
java -Xmx256m ...
which sets the maximum heap size to 256 MByte (or any value you like).

If you need the content length and your web server is reasonably standards-compliant, it should provide a "Content-Length" header.
URLConnection#getContentLength() should give you that information up front so that you are able to create your file. (Be aware that if your HTTP server is misconfigured or under the control of an evil entity, that header may not match the number of bytes received. In that case, why don't you stream to a temp file first and copy that file later?)
In addition to that: a ByteArrayOutputStream is a horrible memory allocator. It always doubles the buffer size, so if you read a 32MB + 1 byte file, you end up with a 64MB buffer. It might be better to implement your own, smarter byte-array stream, like this one:
http://source.pentaho.org/pentaho-reporting/engines/classic/trunk/core/source/org/pentaho/reporting/engine/classic/core/util/MemoryByteArrayOutputStream.java

Subclassing ByteArrayOutputStream gives you access to the buffer and the number of bytes in it.
But of course, if all you want to do is store the data in a file, you are better off using a FileOutputStream.
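For illustration, such a subclass could be as small as this. ByteArrayOutputStream keeps its data in the protected fields buf and count; the class name ExposedByteArrayOutputStream and its accessor methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical subclass exposing the internal buffer without the copy made by toByteArray().
class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
    ExposedByteArrayOutputStream(int initialCapacity) {
        super(initialCapacity);
    }

    /** Returns the internal buffer itself; only the first length() bytes are valid. */
    synchronized byte[] buffer() {
        return buf;
    }

    /** Returns the number of valid bytes in the buffer. */
    synchronized int length() {
        return count;
    }
}
```

This avoids the extra copy made by toByteArray(), but the buffer still grows inside the heap, so it does not help with data that is too large to fit in memory.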
