Reading a >4GB file in Java

I have a mainframe data file that is larger than 4GB. I need to read and process the data in 500-byte records. I have tried using FileChannel, but I get an error with the message Integer.Max_VALUE exceeded
public void getFileContent(String fileName) {
RandomAccessFile aFile = null;
FileChannel inChannel = null;
try {
aFile = new RandomAccessFile(Paths.get(fileName).toFile(), "r");
inChannel = aFile.getChannel();
ByteBuffer buffer = ByteBuffer.allocate(500 * 100000);
while (inChannel.read(buffer) > 0) {
buffer.flip();
for (int i = 0; i < buffer.limit(); i++) {
byte[] data = new byte[500];
buffer.get(data);
processData(new String(data));
buffer.clear();
}
}
} catch (Exception ex) {
// TODO
} finally {
try {
inChannel.close();
aFile.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
Can you help me out with a solution?

The worst problem of your code is the
catch (Exception ex) {
// TODO
}
part, which implies that you won’t notice any exceptions thrown by your code. Since nothing in the JRE prints an “Integer.Max_VALUE exceeded” message, that problem must originate in your processData method.
It is also worth noting that this method will be invoked far too often, with repeated data.
Your loop
for (int i = 0; i < buffer.limit(); i++) {
implies that you iterate as many times as there are bytes within the buffer, up to 500 * 100000 times. You are extracting 500 bytes from the buffer in each iteration, processing a total of up to 500 * 500 * 100000 bytes after each read, but since you have a misplaced buffer.clear(); at the end of the loop body, you will never experience a BufferUnderflowException. Instead, you will invoke processData each of the up to 500 * 100000 times with the first 500 bytes of the buffer.
But the whole conversion from bytes to a String is unnecessarily verbose and contains unnecessary copy operations. Instead of implementing this yourself, you can and should just use a Reader.
Besides that, your code makes a strange detour. It starts with a Java 7 API, Paths.get, only to convert the result to a legacy File object and create a legacy RandomAccessFile, from which it eventually acquires a FileChannel. If you have a Path and want a FileChannel, you should open it directly via FileChannel.open. And, of course, use a try(…) { … } statement to ensure proper closing.
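If the records really have to be handled as raw bytes, opening the channel directly could look roughly like the sketch below. The class name RecordReader and the RecordHandler callback are hypothetical stand-ins for the question's processData; the buffer size is an arbitrary multiple of the 500-byte record size.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RecordReader {
    static final int RECORD_SIZE = 500;

    /** Hypothetical callback standing in for the question's processData. */
    interface RecordHandler {
        void handle(byte[] record, int length);
    }

    /** Reads the file in RECORD_SIZE-byte records and returns how many were handled. */
    static long readRecords(Path file, RecordHandler handler) throws IOException {
        long count = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Capacity is a multiple of the record size, so a full buffer holds whole records only.
            ByteBuffer buffer = ByteBuffer.allocate(RECORD_SIZE * 1000);
            byte[] record = new byte[RECORD_SIZE];
            boolean eof = false;
            while (!eof) {
                // One read() may deliver fewer bytes than requested: fill until full or EOF.
                while (buffer.hasRemaining()) {
                    if (channel.read(buffer) < 0) {
                        eof = true;
                        break;
                    }
                }
                buffer.flip();
                while (buffer.remaining() >= RECORD_SIZE) {
                    buffer.get(record);
                    handler.handle(record, RECORD_SIZE);
                    count++;
                }
                // Only at EOF can a shorter trailing record be left over.
                if (eof && buffer.hasRemaining()) {
                    int rest = buffer.remaining();
                    buffer.get(record, 0, rest);
                    handler.handle(record, rest);
                    count++;
                }
                buffer.clear();
            }
        }
        return count;
    }
}
```

Note that buffer.clear() is called once per refill, after all records in the buffer have been consumed, not inside the per-record loop. The handler receives the same reusable array on every call, so it has to copy the bytes if it wants to keep them.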
But, as said, if you want to process the contents as Strings, you surely want to use a Reader instead:
public void getFileContent(String fileName) {
    try( Reader reader=Files.newBufferedReader(Paths.get(fileName)) ) {
        CharBuffer buffer = CharBuffer.allocate(500 * 100000);
        while(reader.read(buffer) > 0) {
            buffer.flip();
            while(buffer.remaining()>500) {
                processData(buffer.slice().limit(500).toString());
                buffer.position(buffer.position()+500);
            }
            buffer.compact();
        }
        // there might be a remaining chunk of less than 500 characters
        if(buffer.position()>0) {
            processData(buffer.flip().toString());
        }
    } catch(Exception ex) {
        // the *minimum* to do:
        ex.printStackTrace();
        // TODO real exception handling
    }
}
There is no problem with processing files >4GB; I just tested it with an 8GB file. Note that the code above uses the UTF-8 encoding. If you want to retain the behavior of your original code of using whatever happens to be your system’s default encoding, you may create the Reader using
Files.newBufferedReader(Paths.get(fileName), Charset.defaultCharset())
instead.

Related

I'm having trouble copying large files

I'm having problems with my code. I'm encrypting a file of more than 300 MB in Base64, but my application gives errors when I open the encrypted file.
This is my code; it crashes on the byte array, and I don't understand why.
private void encript(final File file) {
new AsyncTask<Void, Void, Void>() {
@Override
protected Void doInBackground(Void[] p) {
File new_file = null;
try {
new_file = new File(file.getAbsolutePath() + ".enc.txt");
if (!new_file.exists()) {
new_file.createNewFile();
}
BufferedInputStream mInputStream = new BufferedInputStream(new FileInputStream(file));
OutputStream mOutputStream = new DataOutputStream(new FileOutputStream(new_file));
byte[] data = new byte[mInputStream.available()];
int len = 0;
while (true) {
len = mInputStream.read(data);
if (len > 0) {
mOutputStream.write(Base64.encode(data, 0, len, Base64.DEFAULT));
}
break;
}
mOutputStream.flush();
if (mOutputStream != null) {
mOutputStream.close();
}
if (mInputStream != null) {
mInputStream.close();
}
} catch (Exception io) {
Toast.makeText(MainActivity.this, io.toString(), Toast.LENGTH_LONG).show();
}
return null;
}
@Override
protected void onPostExecute(Void res) {
Toast.makeText(MainActivity.this, "Sucesss", Toast.LENGTH_LONG).show();
}
}.execute(new Void[0]);
}
Note that what you are doing here is Base64 encoding the file contents. Don't imagine that someone can't trivially crack this (so-called) "encryption".
There are lots of things wrong with your attempt. I shall go through the more important ones:
@Override
protected Void doInBackground(Void[] p) {
File new_file = null;
try {
Problem: You should be using try with resources to avoid resource leaks.
new_file = new File(file.getAbsolutePath() + ".enc.txt");
if (!new_file.exists()) {
new_file.createNewFile();
}
Problems:
On the one hand, there is no need to use createNewFile to pre-create an output file. Opening the file using FileOutputStream will create it if it doesn't exist already.
On the other hand, this won't prevent (or report) errors in cases where the file's parent directory doesn't exist, is not writeable and so on.
It would be better to use java.nio.file.Path and java.nio.file.Files from Java 7 / Android API 26. Path and Files are better APIs and they will report problems as exceptions so that you can (hypothetically) report them to the user via your exception handler.
There are even some Files.copy methods, though they are not directly applicable to your use-case since you are encoding the data as you copy it.
BufferedInputStream mInputStream =
new BufferedInputStream(new FileInputStream(file));
OutputStream mOutputStream =
new DataOutputStream(new FileOutputStream(new_file));
Problem:
I don't think you need a DataOutputStream. It won't actually be doing anything.
byte[] data = new byte[mInputStream.available()];
Problem:
The available() method should not be used for this. It returns the number of bytes that are "available" to be read right now. The value you get is context dependent. For a socket stream it is typically the number of bytes that are currently in the kernel buffers ready to read. For a "regular" file it may be the length of the input file.
So if you are copying a "really big" file, then you may be attempting to allocate a buffer that will hold the entire file. In the worst case, that will cause your app to OOME!
NOTE - Such an OOME might be the "out of nowhere" problem that you are seeing.
The "best" way is debatable, but I would just use a fixed buffer size ... if I was doing an explicit read / write copy of a stream. The size of the buffer affects throughput, but if you are looking for ultimate performance you shouldn't be doing it this way.
int len = 0;
while (true) {
len = mInputStream.read(data);
if (len > 0) {
mOutputStream.write(
Base64.encode(data, 0, len, Base64.DEFAULT));
}
break;
}
Problem: This loop is simply wrong. You are unconditionally breaking on the first iteration. You should be doing something like this:
int len;
while ((len = mInputStream.read(data)) > 0) {
mOutputStream.write(Base64.encode(data, 0, len, Base64.DEFAULT));
}
In other words, keep reading / writing until read returns a non-positive result.
Note: I'm not sure which Base64 class you are using there. It doesn't appear to be java.util.Base64
mOutputStream.flush();
if (mOutputStream != null) {
mOutputStream.close();
}
if (mInputStream != null) {
mInputStream.close();
}
Problems:
The flush() is not necessary. Closing the stream will flush. And besides, what happens with your attempted flush if mOutputStream is null?
This version leaks resources (file descriptors). If an exception has been thrown, these statements won't be executed, and the stream objects will not be closed.
This is all unnecessary if you use try with resources instead.
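As a rough illustration of the points so far (try with resources, a fixed-size buffer, a correct read loop), here is a minimal sketch. It assumes java.util.Base64 is acceptable, which on Android requires API 26; the question appears to use a different Base64 class. The class name Base64FileEncoder and the 8192-byte buffer are illustrative choices. The sketch wraps the output stream in an encoder instead of encoding each buffer separately, because separately encoded pieces only concatenate into valid Base64 when every piece except the last has a length that is a multiple of 3.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class Base64FileEncoder {
    /** Copies source to target, Base64-encoding on the fly with a fixed-size buffer. */
    static void encode(Path source, Path target) throws IOException {
        // Both streams are closed automatically, even if an exception is thrown.
        // Closing the wrapping stream writes any final padding and closes the file stream.
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Base64.getEncoder().wrap(Files.newOutputStream(target))) {
            byte[] buffer = new byte[8192]; // fixed size, independent of the file length
            int len;
            while ((len = in.read(buffer)) > 0) {
                out.write(buffer, 0, len);
            }
        }
    }
}
```

Memory use stays constant regardless of the input size, and problems such as a missing parent directory surface as an IOException for the caller to handle.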
} catch (Exception io) {
Toast.makeText(MainActivity.this, io.toString(),
Toast.LENGTH_LONG).show();
}
return null;
}
Problems:
Catching Exception is a bad idea. A better idea is to catch and handle the expected exceptions, and let the unexpected ones propagate so that they can be handled further up the stack.
In this case, it looks like you are assuming that the exception will be some sort of I/O exception. In fact, it could also be an unchecked exception such as an NPE. (An OOME is also possible, though this catch wouldn't catch it, because OutOfMemoryError is an Error, not an Exception.)
You are throwing away the exception details. Unexpected exceptions should be logged so that you can diagnose them via logcat.

How to read file part by part while writing it?

I'm working on an app that records video, and I need to send the data already written to the video file to a server as a Base64 string without stopping the recording. Does anyone know how to do this with low memory consumption?
For now I'm doing it this way:
private void sendNewVideos(String path) {
try {
Log.i(TAG, "VIDEO PATH - " + path);
FileWriter fileWriter = new FileWriter(new File(pathToFolder + "/temp.txt"));
String base64String = new String();
File file = new File(path);
Long size = 0L;
base64String = Base64.encodeToString(readFile(file, size), Base64.DEFAULT);
fileWriter.append(base64String);
fileWriter.flush();
boolean flag = true;
while (flag) {
if (size < file.length()) {
base64String = Base64.encodeToString(readFile(file, size), Base64.DEFAULT);
fileWriter.append(base64String);
fileWriter.flush();
size = file.length();
}
}
fileWriter.close();
} catch (IOException e) {
e.printStackTrace();
}
}
private byte[] readFile(File file, Long size) {
try {
RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");
randomAccessFile.seek(size);
FileChannel fileChannel = randomAccessFile.getChannel();
ByteBuffer buffer = ByteBuffer.allocate(1024 * 1024 * 2);
while (fileChannel.read(buffer) > 0) {
buffer.flip();
byte[] temp = new byte[buffer.limit()];
for (int i = 0; i < buffer.limit(); i++) {
temp[i] = buffer.get(i);
}
buffer.clear();
return temp;
}
fileChannel.close();
randomAccessFile.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
Writing to a file is just to check how it works. But after some time the recording stops. Sometimes LogCat shows something like this:
I/art: Thread[3,tid=23425,WaitingInMainSignalCatcherLoop,Thread*=0x7fe42c410800,peer=0x22c08080,"Signal Catcher"]: reacting to signal 3
I/art: Wrote stack traces to '/data/anr/traces.txt'
I think that's because of either a memory leak or an out-of-memory problem.
Some suggestions:
Don't use Base64 to encode video for sending over the network (even Wi-Fi): it increases the amount of data by about a third (4 output bytes for every 3 input bytes), which is bad for battery life and could kill or hang your process/service.
Avoid reading a file that is still being written, as it can slow down I/O.
If you still need to send data from such a file, use an approach like this:
open the file (for example with a buffered input stream);
read part of the file into a buffer;
do as little work with it as possible. For example, send the buffer to the server from a separate thread with HttpURLConnection.
Keep memory use under control, otherwise the system may try to kill your process.
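The steps above could be sketched as follows, assuming the data must still be delivered as Base64. GrowingFileReader, nextChunk and the chunkSize parameter are hypothetical names, and java.util.Base64 is used here (Android API 26 and later). The reader remembers how many bytes it has already handed out and allocates at most one chunk per call.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Base64;

public class GrowingFileReader {
    private final Path file;
    private final int chunkSize; // should be a multiple of 3 so encoded chunks can be concatenated
    private long offset = 0;     // number of bytes already handed out

    GrowingFileReader(Path file, int chunkSize) {
        this.file = file;
        this.chunkSize = chunkSize;
    }

    /**
     * Returns the next chunk as Base64, or null if no full chunk is available yet.
     * With flush == true (recording finished) a shorter final chunk is returned as well.
     */
    String nextChunk(boolean flush) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long available = channel.size() - offset;
            if (available <= 0 || (available < chunkSize && !flush)) {
                return null;
            }
            ByteBuffer buffer = ByteBuffer.allocate((int) Math.min(available, chunkSize));
            // Positional reads do not depend on the channel's own position.
            while (buffer.hasRemaining()) {
                if (channel.read(buffer, offset + buffer.position()) < 0) {
                    break;
                }
            }
            buffer.flip();
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            offset += bytes.length;
            return Base64.getEncoder().encodeToString(bytes);
        }
    }
}
```

A caller would poll nextChunk(false) from a background thread while recording and call nextChunk(true) until it returns null once recording has stopped. The busy loop from the question should be replaced by some form of waiting between polls.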

How to read a Java socket InputStream without Thread.sleep() in the code below?

public static void waitUntil(String prompt, InputStream instr) {
while (true) {
try {
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
if (instr.available() >= 5) {
byte[] buff = new byte[1024];
int ret_read = 0;
ret_read = instr.read(buff);
if (ret_read > 0) {
if ((new String(buff, 0, ret_read)).contains(prompt)
&& flag) {
break;
}
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
If I remove that Thread.sleep(1000), or even reduce it to less than 1000, it does not work properly.
Question: How can I read a Java socket InputStream without Thread.sleep() until all incoming bytes have arrived?
if (instr.available() >= 5) {
Don't do that.
Instead of checking how many bytes are available, just try to read some into a buffer.
That will block until at least one byte is available, and then return as many as there are (that also fit into the buffer).
If that does not return all the bytes you need, loop until you get them.
If you just want to read everything, check out this thread: Convert InputStream to byte array in Java . Personally, I use Commons IO for this.
Use DataInputStream.readFully() with a buffer size of 5 (in this case, or more generally the size of data you're expecting), and get rid of both the sleep and the available() test.
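A minimal sketch of the blocking-read approach, without sleep() or available(), might look like this. The class name PromptWaiter and the boolean return value are illustrative additions, and the sketch assumes the prompt consists of ASCII or Latin-1 characters.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class PromptWaiter {
    /** Blocks until the received data contains prompt; returns false if the stream ends first. */
    static boolean waitUntil(String prompt, InputStream in) throws IOException {
        StringBuilder received = new StringBuilder();
        byte[] buffer = new byte[1024];
        int n;
        // read() blocks until at least one byte is available and returns -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            // ISO-8859-1 maps every byte to exactly one char, so data arriving in
            // arbitrary pieces is accumulated without decoding problems.
            received.append(new String(buffer, 0, n, StandardCharsets.ISO_8859_1));
            if (received.indexOf(prompt) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```

Accumulating the data means a prompt that arrives split across two reads is still found, which the per-read contains() check in the question would miss. For long sessions the accumulated text should be trimmed, for example by keeping only the last prompt.length() characters after each check.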

How to use a chunk delimiter in a raw data file?

I want to save raw data chunks to a file, And later on read those chunks one by one. This is no big deal except the following doubt:
What exact bytes to use as a delimiter, i.e to identify end of one chunk and beginning of next ? Given that chunk data might also contain such a sequence of bytes by random chance.
Notes: chunks are of variable size and contain random data. They are jpeg images actually.
You could first write the length of the chunk to the file as a fixed-size value, e.g. a 4-byte integer, followed by the data itself:
public void appendChunk(byte[] data, File file) throws IOException {
    DataOutputStream stream = null;
    try {
        stream = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(file, true)));
        stream.writeInt(data.length);
        stream.write(data);
    } finally {
        if (stream != null) {
            try {
                stream.close();
            } catch (IOException e) {
                // ignore
            }
        }
    }
}
If you later have to read the chunks back from that file, you start by reading the length of the first chunk. You now can decide whether to read the chunk data, or whether to skip it and continue with the next chunk.
public void processChunks(File file) throws IOException {
    DataInputStream stream = null;
    try {
        stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
        while (true) {
            try {
                int length = stream.readInt();
                byte[] data = new byte[length];
                stream.readFully(data);
                // todo: do something with the data
            } catch (EOFException e) {
                // end of file reached
                break;
            }
        }
    } finally {
        if (stream != null) {
            try {
                stream.close();
            } catch (IOException e) {
                // ignore
            }
        }
    }
}
You can also add other meta-data about the chunks, like writing the original name of the file with stream.writeUTF(...). You only have to make sure that you write and read the same data in the same order.
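To illustrate the point about writing and reading the same fields in the same order, here is a small sketch that stores a name in front of each chunk. NamedChunkFile and its method names are hypothetical, and returning the chunks as a map is only for demonstration; for large images you would process each chunk as it is read.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class NamedChunkFile {
    /** Appends one record: name, then length, then data. */
    static void append(File file, String name, byte[] data) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file, true)))) {
            out.writeUTF(name);
            out.writeInt(data.length);
            out.write(data);
        }
    }

    /** Reads all records back, in exactly the order the fields were written. */
    static Map<String, byte[]> readAll(File file) throws IOException {
        Map<String, byte[]> chunks = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (true) {
                String name;
                try {
                    name = in.readUTF();
                } catch (EOFException e) {
                    break; // EOF while reading the next name: treat as end of data
                }
                byte[] data = new byte[in.readInt()];
                in.readFully(data); // EOF here means a truncated record; the exception propagates
                chunks.put(name, data);
            }
        }
        return chunks;
    }
}
```

Swapping the order of readUTF and readInt on the reading side would silently misinterpret the bytes, which is why the two methods are best kept next to each other.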
Create a second file in which you save the byte ranges of your chunks in the chunk file, or add that information to the header of your chunk file. I did something similar once; don't forget that the byte ranges then have an additional offset equal to the length of the header.
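import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.util.List;
// Writes every chunk to chunkFile and one "index startByte endByte" line per chunk to structureFile.
// endByte is exclusive, so each chunk starts exactly where the previous one ended.
static void writeChunks(List<byte[]> chunks, File chunkFile, File structureFile) throws IOException {
    long startByte = 0;
    int chunkCount = 0;
    try (OutputStream data = new BufferedOutputStream(new FileOutputStream(chunkFile));
         PrintWriter index = new PrintWriter(structureFile, "UTF-8")) {
        for (byte[] chunk : chunks) {
            data.write(chunk);
            long endByte = startByte + chunk.length;
            index.println(chunkCount + " " + startByte + " " + endByte);
            chunkCount++;
            startByte = endByte;
        }
    }
}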

Why does this function sometimes return a null output stream?

This function, when called in a loop, sometimes gives null as the output stream and sometimes does not. Any reason why? I am writing the output stream into a text file, and sometimes I get an empty text file. If I run the loop 20 times, I get an empty text file on 2 to 4 random occasions. What should I do?
public void decrypt(InputStream in, OutputStream out) {
try {
// Bytes read from in will be decrypted
in = new CipherInputStream(in, dcipher);
// Read in the decrypted bytes and write the cleartext to out
int numRead = 0;
while ((numRead = in.read(buf)) >= 0) {
out.write(buf, 0, numRead);
}
out.close();
}
catch (java.io.IOException e) {
}
}
I think this happens because you are closing the output stream in your function. This way, the next iteration of your cycle will try to write to an already closed output stream. It will throw an IOException but you are ignoring it. Try closing the output stream after your loop and not in the method.
InputStream in = null;
OutputStream out = null;
try {
in = Initialize input stream
out = Initialize output stream
for (int i = 0; i < 10; i++) {
decrypt(in, out);
}
}finally {
try {
if (out != null)
out.close();
}finally {
if (in != null)
in.close();
}
}
If an exception is thrown by any code in your try block, it is ignored (since you have nothing in your catch clause).
You might want to:
actually do something in the catch clause (at least print the exception, e.g. with e.printStackTrace())
instead of calling out.close() in the try block, do it in a finally clause after the catch block (so that it happens even if there is an error)
also, as pointed out by bruno, if you're always reusing the same output stream for every call of decrypt, you should not close it inside the function. However, you might want to flush() it inside your loop.
You should definitely fix this part of your code:
catch (java.io.IOException e) {
}
and do at least some logging there. That way you'll find out why you have the problem you described.
"Never close something that you haven't opened" - I don't know if that's a golden rule, but it nearly always leads to trouble when you close a resource in a subroutine: either the resource is already closed the next time you need it, or it is never closed because you changed the code.
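Putting the advice from these answers together, a version of decrypt that neither closes the caller's stream nor swallows exceptions could be sketched like this. The wrapper class StreamDecrypter and its constructor are assumptions made so the example is self-contained; the question only shows the method together with its dcipher and buf fields.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;

public class StreamDecrypter {
    private final Cipher dcipher; // must already be initialized for decryption
    private final byte[] buf = new byte[1024];

    StreamDecrypter(Cipher dcipher) {
        this.dcipher = dcipher;
    }

    /**
     * Decrypts everything from in and writes the cleartext to out.
     * Flushes out but does not close it: whoever opened the stream closes it.
     * I/O errors propagate to the caller instead of being ignored.
     */
    void decrypt(InputStream in, OutputStream out) throws IOException {
        InputStream decrypted = new CipherInputStream(in, dcipher);
        int numRead;
        while ((numRead = decrypted.read(buf)) >= 0) {
            out.write(buf, 0, numRead);
        }
        out.flush();
    }
}
```

The caller would open both streams in a try-with-resources statement around the call, so they are closed exactly once, and would log or handle the IOException at that level.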
