Reading larger files crashes silently without error - Java

I'm a noob to Java so I may be missing something obvious, but I have a function which works for files in the 200-300 KB range without issue; once I get to 1.4 MB, it falls over silently!
Here's the code:
private String readOutputFile(String filename) {
    if (filename == null) {
        return null;
    }
    File file = new File(filename);
    FileInputStream fis = null;
    String fileContent = "";
    this.logger.log("Reading " + filename + " from filesystem.");
    try {
        fis = new FileInputStream(file);
        System.out.println("Total file size to read (in bytes) : " + fis.available());
        int content;
        while ((content = fis.read()) != -1) {
            fileContent += (char) content;
        }
    } catch (IOException e) {
        this.logger.log("IO Problem reading ITMS output file\n");
        e.printStackTrace();
        throw new Error("io-error/itms-output");
    } finally {
        try {
            if (fis != null)
                fis.close();
        } catch (IOException ex) {
            this.logger.log("IO Problem reading and/or closing ITMS output file\n");
            ex.printStackTrace();
            throw new Error("io-error/finally-block");
        }
    }
    this.logger.log("File content has been read in");
    String compressed = this.compress(this.cleanXML(fileContent));
    this.logger.log("The compressed file size is :" + compressed.length() + " bytes");
    return compressed;
}
When it hits the size threshold that causes it to fail, it seems to stay within the while loop. At least that's my assumption: it does report the "Total file size to read ..." line to the console, but it never reaches the "File content has been read in" logging.

You are creating many temporary String objects by performing character concatenation in your loop. I would use a StringBuilder instead. I would also prefer try-with-resources. And if at all possible, I would stream from the InputStream to an OutputStream directly, instead of reading the whole file into memory. Anyway, based on what is here:
private String readOutputFile(String filename) {
    if (filename == null) {
        return null;
    }
    File file = new File(filename);
    StringBuilder sb = new StringBuilder();
    this.logger.log("Reading " + filename + " from filesystem.");
    try (FileInputStream fis = new FileInputStream(file)) {
        System.out.println("Total file size to read (in bytes) : " + fis.available());
        int content;
        while ((content = fis.read()) != -1) {
            sb.append((char) content);
        }
    } catch (IOException e) {
        this.logger.log("IO Problem reading ITMS output file\n");
        e.printStackTrace();
        throw new Error("io-error/itms-output");
    }
    this.logger.log("File content has been read in");
    String compressed = this.compress(this.cleanXML(sb.toString()));
    this.logger.log("The compressed file size is : " + compressed.length() + " bytes");
    return compressed;
}
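As an aside, on Java 7+ the whole read loop can be replaced by a single java.nio.file.Files call. A minimal sketch, assuming the file fits comfortably in memory; the file name and UTF-8 charset here are placeholders, not from the question:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadWholeFile {
    public static void main(String[] args) throws IOException {
        // Read the entire file in one call instead of byte-by-byte.
        String content = new String(
                Files.readAllBytes(Paths.get("itms-output.xml")),
                StandardCharsets.UTF_8);
        System.out.println("Read " + content.length() + " chars");
    }
}

The unbuffered fis.read() in the original also makes one call into the stream per byte, so this is dramatically faster as well as shorter.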

Related

File md5 hash changes when chunking it (for netty transfer)

Question at the bottom
I'm using netty to transfer a file to another server.
I limit my file chunks to 1024*64 bytes (64 KB) because of the WebSocket protocol. The following method is a local example of what happens to the file:
public static void rechunck(File file1, File file2) {
    FileInputStream is = null;
    FileOutputStream os = null;
    try {
        byte[] buf = new byte[1024*64];
        is = new FileInputStream(file1);
        os = new FileOutputStream(file2);
        while(is.read(buf) > 0) {
            os.write(buf);
        }
    } catch (IOException e) {
        Controller.handleException(Thread.currentThread(), e);
    } finally {
        try {
            if(is != null && os != null) {
                is.close();
                os.close();
            }
        } catch (IOException e) {
            Controller.handleException(Thread.currentThread(), e);
        }
    }
}
The file is loaded by the InputStream into a byte buffer and written directly to the OutputStream.
The content of the file cannot change during this process.
To get the md5 hashes of the file I've written the following method:
public static String checksum(File file) {
    InputStream is = null;
    try {
        is = new FileInputStream(file);
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read = 0;
        while((read = is.read(buffer)) > 0) {
            digest.update(buffer, 0, read);
        }
        return new BigInteger(1, digest.digest()).toString(16);
    } catch(IOException | NoSuchAlgorithmException e) {
        Controller.handleException(Thread.currentThread(), e);
    } finally {
        try {
            is.close();
        } catch(IOException e) {
            Controller.handleException(Thread.currentThread(), e);
        }
    }
    return null;
}
So, in theory, it should return the same hash, shouldn't it? The problem is that it returns two different hashes, and those hashes are consistent across runs; the file size stays the same and so does the content.
When I run the method once with in: file-1, out: file-2, and again with in: file-2, out: file-3, the hashes of file-2 and file-3 are the same! So the method changes the file in the same, deterministic way every time.
1. 58a4a9fbe349a9e0af172f9cf3e6050a
2. 7b3f343fa1b8c4e1160add4c48322373
3. 7b3f343fa1b8c4e1160add4c48322373
Here is a little test that compares all buffers for equality. The test passes, so there aren't any differences.
File file1 = new File("controller/templates/Example.zip");
File file2 = new File("controller/templates2/Example.zip");
try {
    byte[] buf1 = new byte[1024*64];
    byte[] buf2 = new byte[1024*64];
    FileInputStream is1 = new FileInputStream(file1);
    FileInputStream is2 = new FileInputStream(file2);
    boolean run = true;
    while(run) {
        int read1 = is1.read(buf1), read2 = is2.read(buf2);
        String result1 = Arrays.toString(buf1), result2 = Arrays.toString(buf2);
        boolean test = result1.equals(result2);
        System.out.println("1: " + result1);
        System.out.println("2: " + result2);
        System.out.println("--- TEST RESULT: " + test + " ----------------------------------------------------");
        if(!(read1 > 0 && read2 > 0) || !test) run = false;
    }
} catch (IOException e) {
    e.printStackTrace();
}
Question: Can you help me chunking the file without changing the hash?
while(is.read(buf) > 0) {
    os.write(buf);
}
The read() method with the array argument returns the number of bytes read from the stream. When the file length isn't an exact multiple of the byte array length, the last return value will be smaller than the array length because you've reached the end of the file.
However, your os.write(buf); call writes the whole byte array to the stream, including the stale bytes left over past the end of the last read. This means the written file ends up bigger, and therefore the hash changes.
Interestingly you didn't make the mistake when you updated the message digest:
while((read = is.read(buffer)) > 0) {
    digest.update(buffer, 0, read);
}
You just have to do the same when you "rechunk" your files.
Your rechunck method has a bug in it. Since you use a fixed-size buffer, the file is split into byte-array parts, but the last part of the file can be smaller than the buffer, which is why you write too many bytes into the new file. That's also why you no longer get the same checksum. The error can be fixed like this:
public static void rechunck(File file1, File file2) {
    FileInputStream is = null;
    FileOutputStream os = null;
    try {
        byte[] buf = new byte[1024*64];
        is = new FileInputStream(file1);
        os = new FileOutputStream(file2);
        int length;
        while((length = is.read(buf)) > 0) {
            os.write(buf, 0, length);
        }
    } catch (IOException e) {
        Controller.handleException(Thread.currentThread(), e);
    } finally {
        try {
            if(is != null)
                is.close();
            if(os != null)
                os.close();
        } catch (IOException e) {
            Controller.handleException(Thread.currentThread(), e);
        }
    }
}
Thanks to the length variable, the write method knows that only the first length bytes of the array belong to the file; beyond that, the array still contains old bytes from the previous read that no longer belong to it.
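The same fix reads more compactly with try-with-resources (Java 7+), which also guarantees both streams are closed even if an exception is thrown; a minimal sketch:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class Rechunk {
    public static void rechunk(File file1, File file2) throws IOException {
        try (FileInputStream is = new FileInputStream(file1);
             FileOutputStream os = new FileOutputStream(file2)) {
            byte[] buf = new byte[1024 * 64];
            int length;
            while ((length = is.read(buf)) > 0) {
                os.write(buf, 0, length); // write only the bytes actually read
            }
        }
    }
}

Note that the original finally block only closes the streams when both are non-null, so a failure opening the output stream would leak the input stream; try-with-resources avoids that too.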

java OutOfMemoryError about FileOutputStream?

Thanks everyone ^_^, the problem is solved: a single line was too big (over 400 MB; I had downloaded a damaged file without realizing it), so an OutOfMemoryError was thrown.
I want to split a file using Java, but it always throws OutOfMemoryError: Java heap space. I searched the whole Internet, but nothing seems to help :(
ps. the file's size is 600 MB, it has over 30,000,000 lines, and every line is no longer than 100 chars.
(maybe you can generate a "level file" like this: {
id:0000000001,level:1
id:0000000002,level:2
....(over 30 million lines)
})
pss. setting the JVM memory size larger does not work :(
psss. I changed to another PC; the problem remains /(ㄒoㄒ)/~~
No matter how large I set -Xms or -Xmx, the output file's size is always the same (and Runtime.getRuntime().totalMemory() really does change).
here's the stack trace:
Heap Size = 2058027008
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
at java.lang.StringBuffer.append(StringBuffer.java:306)
at java.io.BufferedReader.readLine(BufferedReader.java:345)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at com.xiaomi.vip.tools.ptupdate.updator.Spilt.main(Spilt.java:39)
...
here's my code:
package com.updator;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;

public class Spilt {
    public static void main(String[] args) throws Exception {
        long heapSize = Runtime.getRuntime().totalMemory();
        // Print the jvm heap size.
        System.out.println("Heap Size = " + heapSize);
        String mainPath = "/home/work/bingo/";
        File mainFilePath = new File(mainPath);
        FileInputStream inputStream = null;
        FileOutputStream outputStream = null;
        try {
            if (!mainFilePath.exists())
                mainFilePath.mkdir();
            String sourcePath = "/home/work/bingo/level.txt";
            inputStream = new FileInputStream(sourcePath);
            BufferedReader bufferedReader = new BufferedReader(new FileReader(
                    new File(sourcePath)));
            String savePath = mainPath + "tmp/";
            Integer i = 0;
            File file = new File(savePath + "part"
                    + String.format("%0" + 5 + "d", i) + ".txt");
            if (!file.getParentFile().exists())
                file.getParentFile().mkdir();
            file.createNewFile();
            outputStream = new FileOutputStream(file);
            int count = 0, total = 0;
            String line = null;
            while ((line = bufferedReader.readLine()) != null) {
                line += '\n';
                outputStream.write(line.getBytes("UTF-8"));
                count++;
                total++;
                if (count > 4000000) {
                    outputStream.flush();
                    outputStream.close();
                    System.gc();
                    count = 0;
                    i++;
                    file = new File(savePath + "part"
                            + String.format("%0" + 5 + "d", i) + ".txt");
                    file.createNewFile();
                    outputStream = new FileOutputStream(file);
                }
            }
            outputStream.close();
            file = new File(mainFilePath + "_SUCCESS");
            file.createNewFile();
            outputStream = new FileOutputStream(file);
            outputStream.write(i.toString().getBytes("UTF-8"));
        } finally {
            if (inputStream != null)
                inputStream.close();
            if (outputStream != null)
                outputStream.close();
        }
    }
}
I think maybe the memory is not released when outputStream.close() is called?
So you open the original file, create a BufferedReader, and keep a counter for the lines.
char[] buffer = new char[5120];
BufferedReader reader = Files.newBufferedReader(Paths.get(sourcePath), StandardCharsets.UTF_8);
int lineCount = 0;
Now you read into your buffer, and write the characters as they come in.
int read;
BufferedWriter writer = Files.newBufferedWriter(Paths.get(fileName), StandardCharsets.UTF_8);
while((read = reader.read(buffer, 0, 5120)) > 0){
    int offset = 0;
    for(int i = 0; i < read; i++){
        char c = buffer[i];
        if(c == '\n'){
            lineCount++;
            if(lineCount == maxLineCount){
                // write the range from offset to i to the old writer,
                // then start a new part file
                writer.write(buffer, offset, i - offset);
                writer.close();
                offset = i;
                lineCount = 0;
                writer = Files.newBufferedWriter(Paths.get(newName), StandardCharsets.UTF_8);
            }
        }
    }
    // write whatever remains of this chunk to the current writer
    writer.write(buffer, offset, read - offset);
}
writer.close();
That should keep the memory usage low and prevent you from reading too large a line at once. You could go without BufferedWriters and control the memory even more, but I don't think that is necessary.
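Pulling those fragments together, here is a self-contained sketch of the chunked splitter (a sketch only: the source path and 4,000,000-line threshold mirror the question, while the part-file naming helper is an illustrative placeholder). Because it never accumulates a whole line in memory, even the 400 MB damaged line from the asker's follow-up cannot trigger an OutOfMemoryError:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ChunkedSplit {
    public static void main(String[] args) throws IOException {
        int maxLineCount = 4000000; // lines per output part (from the question)
        char[] buffer = new char[5120];
        int lineCount = 0, part = 0;
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get("/home/work/bingo/level.txt"), StandardCharsets.UTF_8)) {
            BufferedWriter writer = newPartWriter(part);
            int read;
            while ((read = reader.read(buffer, 0, buffer.length)) > 0) {
                int offset = 0;
                for (int i = 0; i < read; i++) {
                    // count newlines; rotate the output file every maxLineCount lines
                    if (buffer[i] == '\n' && ++lineCount == maxLineCount) {
                        writer.write(buffer, offset, i - offset + 1); // up to and including '\n'
                        writer.close();
                        offset = i + 1;
                        lineCount = 0;
                        writer = newPartWriter(++part);
                    }
                }
                writer.write(buffer, offset, read - offset); // remainder of this chunk
            }
            writer.close();
        }
    }

    // Illustrative placeholder for naming and opening each output part.
    private static BufferedWriter newPartWriter(int part) throws IOException {
        Path path = Paths.get(String.format("/home/work/bingo/tmp/part%05d.txt", part));
        Files.createDirectories(path.getParent());
        return Files.newBufferedWriter(path, StandardCharsets.UTF_8);
    }
}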
I've tested this with a large text file (250 MB) and it works well.
You need to add try/catch blocks for the file streams.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Scanner;

public class MyTest {
    public static void main(String[] args) {
        String mainPath = "/home/work/bingo/";
        File mainFilePath = new File(mainPath);
        FileInputStream inputStream = null;
        FileOutputStream outputStream = null;
        try {
            if (!mainFilePath.exists())
                mainFilePath.mkdir();
            String sourcePath = "/home/work/bingo/level.txt";
            inputStream = new FileInputStream(sourcePath);
            Scanner scanner = new Scanner(inputStream, "UTF-8");
            String savePath = mainPath + "tmp/";
            Integer i = 0;
            File file = new File(savePath + "part" + String.format("%0" + 5 + "d", i) + ".txt");
            if (!file.getParentFile().exists())
                file.getParentFile().mkdir();
            file.createNewFile();
            outputStream = new FileOutputStream(file);
            int count = 0, total = 0;
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine() + "\n";
                outputStream.write(line.getBytes("UTF-8"));
                count++;
                total++;
                if (count > 4000000) {
                    outputStream.flush();
                    outputStream.close();
                    count = 0;
                    i++;
                    file = new File(savePath + "part" + String.format("%0" + 5 + "d", i) + ".txt");
                    file.createNewFile();
                    outputStream = new FileOutputStream(file);
                }
            }
            outputStream.close();
            file = new File(mainFilePath + "_SUCCESS");
            file.createNewFile();
            outputStream = new FileOutputStream(file);
            outputStream.write(i.toString().getBytes("UTF-8"));
        } catch (FileNotFoundException e) {
            System.out.println("ERROR: FileNotFoundException :: " + e.getStackTrace());
        } catch (IOException e) {
            System.out.println("ERROR: IOException :: " + e.getStackTrace());
        } finally {
            try {
                if (inputStream != null)
                    inputStream.close();
                if (outputStream != null)
                    outputStream.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
If the problem still occurs, increase the Java heap size with the following command at the shell prompt:
java -Xmx1g MyTest
where -Xmx1g sets a 1 GB heap and MyTest is the class name.

Files downloaded as Binary with Java are corrupted

I have written a downloader which should be used to download text files as well as images. So I download the files as binaries. Many of the downloads work very well, but some parts of the text files and many image files are corrupted. The errors always occur in the same files and at the same places (as far as I can tell when analysing the text files). I used this code for downloading:
public File downloadFile(HttpURLConnection connection) {
    return writeFileDataToFile(getFileData(connection));
}

//downloads the data of the file and returns the content as string
private List<Byte> getFileData(HttpURLConnection connection) {
    List<Byte> fileData = new ArrayList<>();
    try (InputStream input = connection.getInputStream()) {
        byte[] fileChunk = new byte[8*1024];
        int bytesRead;
        do {
            bytesRead = input.read(fileChunk);
            if (bytesRead != -1) {
                fileData.addAll(Bytes.asList(fileChunk));
                fileChunk = new byte[8*1024];
            }
        } while (bytesRead != -1);
        return fileData;
    } catch (IOException e) {
        System.out.println("Receiving file at " + url.toString() + " failed");
        System.exit(1);
        return null; //shouldn't be reached
    }
}

//writes data to the file
private File writeFileDataToFile(List<Byte> fileData) {
    if (!this.file.exists()) {
        try {
            this.file.getParentFile().mkdirs();
            this.file.createNewFile();
        } catch (IOException e) {
            System.out.println("Error while creating file at " + file.getPath());
            System.exit(1);
        }
    }
    try (OutputStream output = new FileOutputStream(file)) {
        output.write(Bytes.toArray(fileData));
        return file;
    } catch (IOException e) {
        System.out.println("Error while accessing file at " + file.getPath());
        System.exit(1);
        return null;
    }
}
I would suggest not going through a List<Byte>: you build a list of Byte objects from an array only to turn it back into an array of bytes, which is not really efficient.
Moreover, you wrongly assume every chunk is full (a read does not necessarily return 8192 bytes).
Why don't you just do something as:
private File writeFileDataToFile(HttpURLConnection connection) {
    if (!this.file.exists()) {
        try {
            this.file.getParentFile().mkdirs();
            //this.file.createNewFile(); // not needed, will be created at FileOutputStream
        } catch (IOException e) {
            System.out.println("Error while creating file at " + file.getPath());
            //System.exit(1);
            // instead throw an error or return null
            throw new YourException(message);
        }
    }
    OutputStream output = null;
    InputStream input = null;
    try {
        output = new FileOutputStream(file);
        input = connection.getInputStream();
        byte[] fileChunk = new byte[8*1024];
        int bytesRead;
        while ((bytesRead = input.read(fileChunk)) != -1) {
            output.write(fileChunk, 0, bytesRead);
        }
        return file;
    } catch (IOException e) {
        System.out.println("Receiving file at " + url.toString() + " failed");
        // System.exit(1); // you should avoid such an exit
        // instead throw an error or return null
        throw new YourException(message);
    } finally {
        if (input != null) {
            try {
                input.close();
            } catch (Exception e2) {} // ignore
        }
        if (output != null) {
            try {
                output.close();
            } catch (Exception e2) {} // ignore
        }
    }
}
The failure was adding the whole fileChunk array to the file data, even if it wasn't completely filled by the read operation.
Fix:
//downloads the data of the file and returns the content as string
private List<Byte> getFileData(HttpURLConnection connection) {
    List<Byte> fileData = new ArrayList<>();
    try (InputStream input = connection.getInputStream()) {
        byte[] fileChunk = new byte[8*1024];
        int bytesRead;
        do {
            bytesRead = input.read(fileChunk);
            if (bytesRead != -1) {
                fileData.addAll(Bytes.asList(Arrays.copyOf(fileChunk, bytesRead)));
            }
        } while (bytesRead != -1);
        return fileData;
    } catch (IOException e) {
        System.out.println("Receiving file at " + url.toString() + " failed");
        System.exit(1);
        return null; //shouldn't be reached
    }
}
Where the relevant change is turning

if (bytesRead != -1) {
    fileData.addAll(Bytes.asList(fileChunk));
    fileChunk = new byte[8*1024];
}

into

if (bytesRead != -1) {
    fileData.addAll(Bytes.asList(Arrays.copyOf(fileChunk, bytesRead)));
}
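For completeness, the intermediate List<Byte> can be dropped entirely. A minimal sketch using java.nio.file.Files.copy to stream the response body straight to disk (the method shape and target path are illustrative, not the asker's API):

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class Downloader {
    public static Path downloadTo(HttpURLConnection connection, Path target) throws IOException {
        try (InputStream input = connection.getInputStream()) {
            // Files.copy consumes the stream chunk by chunk and writes only
            // the bytes actually read, so partial buffers cannot corrupt the file.
            Files.copy(input, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}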

Incorrectly reading int from binary file Java

I'm trying to read a date (a set of 6 integers) and a temperature (double) from a binary .dat file.
After multiple tries I finally got to the stage where the file is being read, but it's returning ints in a format I cannot recognize. E.g. the date 2017-03-02 11:33 and temperature 3.8 are read as:
Measure : 515840-1024-1024 2816 8512 241591910
temperature: 1.9034657819129845E185
Any ideas, how to change the code?
public void readFile() {
    try {
        DataInputStream dis = null;
        BufferedInputStream bis = null;
        try {
            FileInputStream fis = new FileInputStream(fileLocation);
            int b;
            bis = new BufferedInputStream(fis);
            dis = new DataInputStream(fis);
            while ((b = dis.read()) != -1) {
                System.out.println("Measure : " + dis.readInt() + "-"
                        + dis.readInt() + "-" + dis.readInt() + " " +
                        dis.readInt() + " " + dis.readInt() + " "
                        + dis.readInt() + " Temperature: " + dis.readDouble());
            }
        } finally {
            dis.close();
        }
    } catch (FileNotFoundException ex) {
        ex.printStackTrace();
    } catch (EOFException f) {
        f.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
} // readFile
while ((b = dis.read()) != -1) {
The problem is here. This reads and discards a byte of the file on every iteration, so all subsequent reads are out of sync.
The correct way to loop with a DataInputStream or ObjectInputStream is with a while (true) loop, and terminating it when read() returns -1, readLine() returns null, or readXXX() for any other X throws EOFException.
Note that you don't normally need to log or print a stack trace on EOFException, as it's a normal loop termination condition ... unless you had reason to expect more data, e.g. your file started with a record count that you haven't reached yet, which might indicate that the file was truncated and therefore corrupt.
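A minimal sketch of the loop shape described above, assuming the record layout from the question (six ints followed by a double) and a placeholder file name. Note that the DataInputStream here wraps the BufferedInputStream, which the original code created but never actually used:

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

public class ReadMeasures {
    public static void main(String[] args) throws IOException {
        try (DataInputStream dis = new DataInputStream(
                new BufferedInputStream(new FileInputStream("measures.dat")))) {
            while (true) {
                try {
                    // read one complete record: no stray dis.read() in between
                    int year = dis.readInt(), month = dis.readInt(), day = dis.readInt();
                    int hour = dis.readInt(), minute = dis.readInt(), second = dis.readInt();
                    double temperature = dis.readDouble();
                    System.out.printf("Measure: %d-%02d-%02d %02d:%02d:%02d temperature: %s%n",
                            year, month, day, hour, minute, second, temperature);
                } catch (EOFException eof) {
                    break; // normal termination: end of file reached
                }
            }
        }
    }
}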

Reading data from multiple files and writing the data to a new file giving unexpected result?

I've split an mp3 file of 10 MB into 10 parts of 1 MB each, in mp3 format, on my Android device. Each part plays successfully in the player, but after reading the data of all 10 files and writing it to a single file, the new file's total size is more than 17 MB and it doesn't play. Following is the code:
CODE FOR FILE SPLIT :
File file = new File(Environment.getExternalStorageDirectory()
        + "/MusicFile.mp3");
try {
    FileInputStream fis = new FileInputStream(file);
    FileOutputStream fos = null;
    int size = 1048576; // 1 MB of data
    byte buffer[] = new byte[size];
    int count = 0;
    int i = 0;
    while (true) {
        i = fis.read(buffer, 0, size);
        if (i == -1) {
            break;
        }
        File filename = getSplitFileName("split_" + count);
        fos = new FileOutputStream(filename);
        fos.write(buffer, 0, i);
        ++count;
    }
    fis.close();
    fos.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e1) {
    e1.printStackTrace();
} catch (Exception e2) {
    e2.printStackTrace();
}
CODE FOR FILE JOIN :
File folder = new File(cacheDirSplit.getAbsolutePath());
File files[] = folder.listFiles();
BufferedReader bufReader = null;
BufferedWriter bufWriter = null;
if (files.length > 1) {
    try {
        File fileName = getJoinedFileName("NewMusicFile");
        String data;
        for (int i = 0; i < files.length; i++) {
            long dataSize = 0;
            bufReader = new BufferedReader(new FileReader(
                    files[i]));
            bufWriter = new BufferedWriter(new FileWriter(
                    fileName, true));
            while ((data = bufReader.readLine()) != null) {
                bufWriter.write(data);
                dataSize = dataSize + data.getBytes().length;
            }
            Log.i("TAG", "File : " + files[i] + " size ==> "
                    + dataSize);
        }
        bufReader.close();
        bufWriter.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
What I don't understand is that each file is read as 1.7 MB according to the LOGCAT output, but when I check on the device, each split file is only 1 MB. Is there anything wrong with the code, or is there something else I'm missing? Thanks in advance.
You cannot use readLine() on the content of an mp3 file; readLine() is for text files only. It decodes bytes through a charset and drops the line terminators, both of which corrupt binary data. And if the ten parts were really playable, standalone mp3 files, you would have to strip their headers first, as Onur explained.
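A minimal sketch of a byte-level join, assuming the parts should simply be concatenated as-is (the method shape and buffer size are placeholders); using streams instead of readers means no charset decoding or line-ending mangling can corrupt the data:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class JoinParts {
    public static void join(File[] parts, File target) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
            byte[] buf = new byte[64 * 1024];
            for (File part : parts) {
                try (InputStream in = new BufferedInputStream(new FileInputStream(part))) {
                    int read;
                    while ((read = in.read(buf)) > 0) {
                        out.write(buf, 0, read); // write only the bytes actually read
                    }
                }
            }
        }
    }
}

Also note that listFiles() returns the parts in no guaranteed order, so sort them by name before joining.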
