Is there a cleaner and faster way to do this:
BufferedReader inputReader = new BufferedReader(new InputStreamReader(context.openFileInput("data.txt")));
String inputString;
StringBuilder stringBuffer = new StringBuilder();
while ((inputString = inputReader.readLine()) != null) {
stringBuffer.append(inputString + "\n");
}
text = stringBuffer.toString();
byte[] data = text.getBytes();
Basically I'm trying to convert a file into a byte[], except if the file is large enough I run into an OutOfMemoryError. I've been looking around SO for a solution; I tried the approach here, and it didn't work. Any help would be appreciated.
A few suggestions:
You don't need to create a string builder; you can read bytes directly from the file.
If you read multiple files, check whether those byte[] arrays remain in memory even when they are no longer required.
Lastly, increase the maximum memory for your Java process using the -Xmx option.
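For example, a minimal sketch of the first suggestion, assuming the Android Context and the "data.txt" file from the question, and that the file fits in memory:
// Needs java.io.File and java.io.DataInputStream.
File f = context.getFileStreamPath("data.txt");
byte[] data = new byte[(int) f.length()];
try (DataInputStream in = new DataInputStream(context.openFileInput("data.txt"))) {
    in.readFully(data); // fills the whole array or throws EOFException
}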
Since we know the size of the file, roughly half of the memory can be saved by allocating the byte array at the right size directly rather than growing it:
byte[] data = new byte[(int) file.length()];
FileInputStream fin = new FileInputStream(file);
int offset = 0, n;
while (offset < data.length && (n = fin.read(data, offset, data.length - offset)) > 0) {
    offset += n;
}
fin.close();
This avoids allocating unnecessary additional structures. The byte array is allocated only once and has the correct size from the beginning. The while loop ensures all the data is loaded (read(byte[], offset, length) may read only part of the file, but it returns the number of bytes read, so we advance the offset accordingly).
Clarification: when a StringBuilder runs out of capacity, it allocates a new buffer that is twice as large as the current one. At that moment we are using about twice the amount of memory that would minimally be required. In the most degenerate case (one last byte does not fit into an already big buffer), nearly three times the minimal amount of RAM may be required.
If you don't have enough memory to hold the whole file, you can try rethinking your algorithm so that it processes the file data while reading it, without building a large byte[] data array.
If you have already tried increasing the Java memory with the -Xmx parameter, then there is no solution that will let you hold in memory data that simply cannot fit there because of its size.
This is similar to File to byte[] in Java
You're currently reading in bytes, converting them to characters, and then trying to turn them back into bytes. From the InputStreamReader class in the Java API:
An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and decodes them into characters.
It would be way more efficient to just read in bytes.
One way would be to read the bytes directly from the InputStream returned by context.openFileInput(), or use Apache Commons IO's IOUtils.toByteArray(InputStream), or if you're on JDK 7 you can use Files.readAllBytes(Path).
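For illustration (hedged sketches; the exact path and stream are placeholders, not taken from the question):
// JDK 7+: one call for a file on the filesystem.
byte[] fromFile = Files.readAllBytes(Paths.get("/some/path/data.txt"));

// Commons IO: drain any InputStream into a byte[].
byte[] fromStream = IOUtils.toByteArray(context.openFileInput("data.txt"));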
You are copying bytes into chars (which use twice the space) and back into bytes again.
InputStream in = context.openFileInput("data.txt");
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] bytes = new byte[8192];
for (int len; (len = in.read(bytes)) > 0; )
    baos.write(bytes, 0, len);
in.close();
return baos.toByteArray();
This will halve your memory requirement, but it can still mean you run out of memory. In that case you have to either
increase your maximum heap size
process the file progressively instead of all at once
use memory-mapped files, which allow you to "load" a file without using much heap.
The 'cleaner and faster way' is not to do it at all. It doesn't scale. Process the file a piece at a time.
This solution will test the free memory before loading...
File test = new File("c:/tmp/example.txt");
long freeMemory = Runtime.getRuntime().freeMemory();
if (test.length() < freeMemory) {
    byte[] bytes = new byte[(int) test.length()];
    FileChannel fc = new FileInputStream(test).getChannel();
    MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
    mbb.get(bytes); // the buffer is exactly file-sized, so one bulk get reads it all
    fc.close();
}
public String loadJSONFromAsset(String path) {
    String json = null;
    try {
        InputStream is = this.getAssets().open(path);
        int size = is.available();
        Log.d("Size: ", "" + size);
        byte[] buffer = new byte[size];
        is.read(buffer);
        is.close();
        json = new String(buffer, "UTF-8");
    } catch (IOException ex) {
        ex.printStackTrace();
    }
    return json;
}
This code converts an asset file into a JSON string. It does work and produces the JSON, but the size of "is" is approximately 8 MB:
D/Size:: 7827533
and OutOfMemory Error occurs at most devices such as
java.lang.OutOfMemoryError
at java.lang.String.<init>(String.java:255)
at java.lang.String.<init>(String.java:228)
at com.example.fkn.projecttr.List.loadJSONFromAsset(List.java:255)
How can I handle this? How can it be coded more efficiently? Runtime is not a problem, but it consumes too much memory on the device, so when the device has no more memory available the program crashes.
I noticed this:
int size = is.available();
and thought it was a little strange. So I went and looked at the JavaDoc for InputStream.available and this is what it had to say:
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
So you have one of two conditions:
Your file size is actually 8MB.
If you really have this much JSON, you need to rethink what is in there and what you are using it for. One option that I don't see a lot of developers use is JsonReader, which allows you to parse through the JSON without loading the entire stream into memory first (a rough sketch is shown after these two cases).
Your file size is much smaller than 8MB
Just read the file differently, see How do I create a Java string from the contents of a file?
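A rough sketch of the JsonReader approach (it assumes the asset is a JSON array of objects and that only a hypothetical "name" field is needed; neither is taken from the question):
InputStream is = getAssets().open(path);
JsonReader reader = new JsonReader(new InputStreamReader(is, "UTF-8"));
try {
    reader.beginArray();
    while (reader.hasNext()) {
        reader.beginObject();
        while (reader.hasNext()) {
            if ("name".equals(reader.nextName())) {
                String name = reader.nextString();
                // use name here, one object at a time
            } else {
                reader.skipValue();
            }
        }
        reader.endObject();
    }
    reader.endArray();
} finally {
    reader.close();
}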
Keep the bytes compressed: use a GZIPOutputStream on a ByteArrayOutputStream (or ship a gzip-compressed asset).
Then always process that, for output, parsing or whatever, through a GZIPInputStream.
That would save at least a factor of 20 (roughly a factor of 10 from compression, and about 3 times less overhead).
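A hedged sketch of the first variant, reading an uncompressed asset and keeping it gzip-compressed in memory (the 8 KB copy buffer is just an illustrative choice):
ByteArrayOutputStream compressed = new ByteArrayOutputStream();
try (InputStream in = getAssets().open(path);
     GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
    byte[] buf = new byte[8192];
    for (int n; (n = in.read(buf)) > 0; ) {
        gz.write(buf, 0, n);
    }
}
byte[] gzBytes = compressed.toByteArray(); // held instead of the 8 MB array / 16 MB String

// Whenever the JSON is needed, stream it back out through a GZIPInputStream:
Reader json = new InputStreamReader(
        new GZIPInputStream(new ByteArrayInputStream(gzBytes)), "UTF-8");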
For a less invasive change (and more memory consumption): right now there is an 8 MB byte array and a 16 MB String. The String could be parsed immediately into a DOM-style JSON object, discarding the whitespace and mapping equal String values to a single String instance (for instance via a Map<String, String> idmap). How much that helps depends on how much repetition there is in the data.
I'm new to Java and working on reading very large files; I need some help understanding the problem and solving it. We have some legacy code which has to be optimized to run properly. The file size can vary from 10 MB to 10 GB. The trouble only starts when the file size goes beyond 800 MB.
InputStream inFileReader = channelSFtp.get(path); // file reading from ssh.
byte[] localbuffer = new byte[2048];
ByteArrayOutputStream bArrStream = new ByteArrayOutputStream();
int i = 0;
while (-1 != (i = inFileReader.read(localbuffer))) {
bArrStream.write(localbuffer, 0, i);
}
byte[] data = bArrStream.toByteArray();
inFileReader.close();
bArrStream.close();
We are getting the error
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
Any help would be appreciated.
Try to use java.nio.MappedByteBuffer.
http://docs.oracle.com/javase/7/docs/api/java/nio/MappedByteBuffer.html
You can map a file's contents into memory without copying it manually. Modern operating systems offer memory mapping, and Java has an API to use this feature.
If my understanding is correct, memory mapping does not load a file's entire content into memory (it is loaded and unloaded partially, as necessary), so I guess a 10 GB file won't eat up your memory.
Even though you can increase the JVM memory limit, it is needless, and allocating huge memory like 10 GB just to process a file sounds like overkill and is resource intensive.
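A hedged sketch of the idea (the file path and window size are illustrative; a single mapping is limited to 2 GB, so this maps a sliding window over the file):
try (FileChannel fc = FileChannel.open(Paths.get("/path/to/bigfile"), StandardOpenOption.READ)) {
    long size = fc.size();
    long window = 256L * 1024 * 1024; // 256 MB mapping windows; tune as needed
    for (long pos = 0; pos < size; pos += window) {
        MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, pos, Math.min(window, size - pos));
        while (mbb.hasRemaining()) {
            byte b = mbb.get(); // process bytes (or bulk-get into a small array) here
        }
    }
}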
Currently you are using a ByteArrayOutputStream, which keeps the data in an internal in-memory buffer. This line in your code keeps appending the last 2 KB chunk read from the file to the end of that buffer:
bArrStream.write(localbuffer, 0, i);
bArrStream keeps growing and eventually you run out of memory.
Instead you should reorganize your algorithm and process the file in a streaming way:
InputStream inFileReader = channelSFtp.get(path); // file reading from ssh.
byte[] localbuffer = new byte[2048];
int i = 0;
while (-1 != (i = inFileReader.read(localbuffer))) {
//Deal with the current read 2KB file chunk here
}
inFileReader.close();
The Java virtual machine (JVM) runs with a fixed upper memory limit, which you can modify thus:
java -Xmx1024m ....
e.g. the above option (-Xmx...) sets the limit to 1024 megabytes. You can amend as necessary (within limits of your machine, OS etc.) Note that this is different from traditional applications which would allocate more and more memory from the OS upon demand.
However a better solution is to rework your application such that you don't need to load the whole file into memory at one go. That way you don't have to tune your JVM, and you don't impose a huge memory footprint.
You can't read a 10 GB text file into memory. You have to read X MB first, do something with it, and then read the next X MB.
The problem is inherent in what you're doing. Reading entire files into memory is always and everywhere a bad idea. You're really not going to be able to read a 10GB file into memory with current technology unless you have some pretty startling hardware. Find a way to process them line by line, record by record, chunk by chunk, ...
Is it mandatory to get the entire byte array of the output stream?
byte[] data = bArrStream.toByteArray();
The best approach is to read line by line and write line by line. You can use a BufferedReader or a Scanner to read large files, as below.
import java.io.*;
import java.util.*;

public class FileReadExample {
    public static void main(String args[]) throws FileNotFoundException {
        File fileObj = new File(args[0]);

        long t1 = System.currentTimeMillis();
        // BufferedReader object for reading the file (try-with-resources closes it)
        try (BufferedReader br = new BufferedReader(new FileReader(fileObj))) {
            // Reading each line of the file using the BufferedReader class
            String str;
            while ((str = br.readLine()) != null) {
                System.out.println(str);
            }
        } catch (Exception err) {
            err.printStackTrace();
        }
        long t2 = System.currentTimeMillis();
        System.out.println("Time taken for BufferedReader:" + (t2 - t1));

        t1 = System.currentTimeMillis();
        // Scanner object for reading the file
        try (Scanner scnr = new Scanner(fileObj)) {
            // Reading each line of the file using the Scanner class
            while (scnr.hasNextLine()) {
                String strLine = scnr.nextLine();
                // print data on console
                System.out.println(strLine);
            }
        }
        t2 = System.currentTimeMillis();
        System.out.println("Time taken for scanner:" + (t2 - t1));
    }
}
You can replace System.out with your ByteArrayOutputStream in above example.
Please have a look at the article below for more details: Read Large File
Have a look at related SE question:
Scanner vs. BufferedReader
ByteArrayOutputStream writes to an in-memory buffer. If this is really how you want it to work, then you have to size the JVM heap according to the maximum possible size of the input. Also, if possible, you can check the input size before you even start processing, to save time and resources.
The alternative is a streaming solution, where the amount of memory used at runtime is known (maybe configurable, but still known before the program starts). Whether that is feasible depends entirely on your application's domain (because you can no longer use an in-memory buffer), and maybe on the architecture of the rest of your code if you can't or don't want to change it.
Try using a larger read buffer, maybe 10 MB, and then check.
Read the file iteratively, line by line. This would significantly reduce memory consumption. Alternatively you may use
FileUtils.lineIterator(theFile, "UTF-8");
provided by Apache Commons IO.
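For reference, the usual Commons IO pattern looks roughly like this (theFile stands in for whatever File you are reading):
LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // process the line; only one line is held in memory at a time
    }
} finally {
    LineIterator.closeQuietly(it);
}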
FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}
Run Java with the command-line option -Xmx, which sets the maximum size of the heap.
See here for details.
Assuming that you are reading a large text file where the data is laid out line by line, use a line-by-line reading approach. As far as I know you can read up to 6 GB this way, maybe more.
...
// Open the file
FileInputStream fstream = new FileInputStream("textfile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));

String strLine;

// Read the file line by line
while ((strLine = br.readLine()) != null) {
    // Print the content on the console
    System.out.println(strLine);
}

// Close the input stream
br.close();
Reference for the code fragment
Short answer:
Without doing anything, you can push the current limit by a factor of 1.5. That means that if you are able to process 800 MB, you can process 1200 MB. It also means that if by some trick with java -Xmx ... you can move to a point where your current code can process 7 GB, your problem is solved, because the 1.5 factor will take you to 10.5 GB, assuming you have that space available on your system and that the JVM can get it.
Long answer:
The error is pretty self-descriptive. You hit the practical memory limit of your configuration. There is a lot of speculation about the limit you can have with the JVM; I do not know enough about that, since I cannot find any official information. However, you will somehow be limited by constraints like the available swap, kernel address-space usage, memory fragmentation, etc.
What is happening now is that ByteArrayOutputStream objects are created with a default buffer of size 32 if you do not supply any size (which is your case). Whenever you call the write method on the object, internal machinery starts up. The OpenJDK implementation release 7u40-b43, which seems to match the output of your error exactly, uses an internal method ensureCapacity to check that the buffer has enough room for the bytes you want to write. If there is not enough room, another internal method, grow, is called to grow the size of the buffer. grow determines the appropriate size and calls the method copyOf from the class Arrays to do the job.
The appropriate size of the buffer is the maximum of twice the current size and the size required to hold all the content (the existing content plus the new content to be written).
The method copyOf from the class Arrays (follow the link) allocates the space for the new buffer, copies the content of the old buffer into the new one, and returns it to grow.
Your problem occurs at the allocation of the space for the new buffer. After enough writes, you reach a point where the available memory is exhausted: java.lang.OutOfMemoryError: Java heap space.
If we look into the details, you are reading in chunks of 2048. So:
your first write grows the buffer from 32 to 2048
your second call will double it to 2*2048
your third call takes it to 2^2*2048; you then have time for two more writes before the next allocation is needed
then 2^3*2048; you will have time for four more writes before allocating again
at some point your buffer will be of size 2^18*2048, which is 2^19*1024 or 2^9*2^20 (512 MB)
then 2^19*2048, which is 1024 MB or 1 GB
Something that is unclear in your description is that you can somehow read up to 800 MB but cannot go beyond. You have to explain that to me.
I expect your limit to be exactly a power of 2 (or close, if we use power-of-10 units somewhere). In that regard, I expect you to start having trouble immediately above one of these: 256 MB, 512 MB, 1 GB, 2 GB, etc.
When you hit that limit, it does not mean that you are out of memory; it simply means that it is not possible to allocate another buffer of twice the size of the buffer you already have. This observation opens room for improvement in your work: find the maximum buffer size you can allocate and reserve it upfront by calling the appropriate constructor:
ByteArrayOutputStream bArrStream = new ByteArrayOutputStream(myMaxSize);
This has the advantage of reducing the background memory allocation that happens under the hood to keep you happy. By doing this, you will be able to go to 1.5 times the limit you have right now. That is simply because the last time the buffer was increased, it went from half the current size to the current size, and at some point you had both the current buffer and the old one in memory together. But you will not be able to go beyond 3 times the limit you have now; the explanation is exactly the same.
That being said, I do not have any magic suggestion to solve the problem apart from processing your data in chunks of a given size, one chunk at a time. Another good approach is to follow the suggestion of Takahiko Kawasaki and use MappedByteBuffer. Keep in mind that in any case you will need at least 10 GB of physical memory or swap to be able to load a 10 GB file.
After thinking about it, I decided to post a second answer. I weighed the advantages and disadvantages of doing so, and the advantages make it worth it. So here it is.
Most of the suggested considerations forget a given fact: there is a built-in limit on the size of arrays (including the one inside ByteArrayOutputStream) in Java. That limit is dictated by the biggest int value, which is 2^31 - 1 (a little less than 2 giga). This means you can only read a maximum of 2 GB (minus 1 byte) into a single ByteArrayOutputStream. The limit may actually be smaller for the array size if the VM wants more control.
My suggestion is to use an ArrayList of byte[] instead of a single byte[] holding the full content of the file, and also to remove the unnecessary step of writing into a ByteArrayOutputStream before putting the data into a final array. Here is an example based on your original code:
InputStream inFileReader = channelSFtp.get(path); // file reading from ssh.

// good habits are good, define a buffer size
final int BUF_SIZE = (int) Math.pow(2, 30); // 1 GB, let's not go close to the limit

List<byte[]> data = new ArrayList<byte[]>();
byte[] localbuffer = new byte[BUF_SIZE];
int i = 0;
while (-1 != (i = inFileReader.read(localbuffer))) {
    if (i < BUF_SIZE) {
        data.add(Arrays.copyOf(localbuffer, i));
        // No need to reallocate the reading buffer, we copied the data
    } else {
        data.add(localbuffer);
        // reallocate the reading buffer
        localbuffer = new byte[BUF_SIZE];
    }
}
inFileReader.close();

// Process your data, keeping in mind that you have a list of buffers.
// So you need to loop over the list.
Simply running your program should work fine on a 64-bit system with enough physical memory or swap. Now, if you want to speed it up and help the VM size the heap correctly at the beginning, run with the -Xms and -Xmx options. For example, if you want a 12 GB heap to be able to handle a 10 GB file, use java -Xms12288m -Xmx12288m YourApp
I am trying to read a large file (>150MB) and return the file content as a ByteArrayOutputStream. This is my code...
private ByteArrayOutputStream readfileContent(String url) throws IOException {
    log.info("Entering readfileContent ");
    ByteArrayOutputStream writer = null;
    FileInputStream reader = null;
    try {
        reader = new FileInputStream(url);
        writer = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int bytesRead = reader.read(buffer);
        while (bytesRead > -1) {
            writer.write(buffer, 0, bytesRead);
            buffer = new byte[1024];
        }
    }
    finally {
        writer.close();
    }
    log.info("Exiting readfileContent ");
    return writer;
}
I am getting an java.lang.OutOfMemoryError: Java heap space exception. I have tried increasing the java heap size, but it still happens. Could someone please assist with this problem.
You should return a BufferedInputStream and let the caller read from it. What you are doing is copying the whole file into memory as a ByteArrayOutputStream.
Your question does not say what you want to do with the file content, so we can only guess. There was a ServletOutputStream commented out; did you want to write to that originally? Writing to it instead of the ByteArrayOutputStream should work.
There is an error in the while loop: bytesRead is never updated inside it, so the same first chunk is written forever. Change it to
while (bytesRead > -1) {
    writer.write(buffer, 0, bytesRead);
    bytesRead = reader.read(buffer);
}
Also don't forget to close reader.
(It will still need quite large amount of memory.)
Your approach is going to use at least the same amount of memory as the file, but because ByteArrayOutputStream uses a byte array as storage, it potentially has to resize (and copy) itself many times as it grows towards 150 MB, which is not efficient. Upping the heap size to twice your file size and increasing the buffer size to something much larger may allow it to run, but as other posters have said, it's far better to read from the file as you go rather than read it all in at once.
Since you know how many bytes you are going to read, you can save time and space by creating the ByteArrayOutputStream with a size. That will save the time and space overhead of "growing" the ByteArrayOutputStream's backing storage. (I haven't looked at the code, but it is probably using the same strategy as StringBuilder, i.e. doubling the allocation each time it runs out. That strategy may end up using up to 3 times the file size at peak usage.)
(And frankly, putting the output into a ByteArrayOutputStream when you know the size seems somewhat pointless. Just allocate a byte array big enough and read directly into that.)
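A rough sketch of those two options, assuming url is really a filesystem path as used in the question:
File f = new File(url);

// Option 1: read straight into a correctly sized array.
byte[] data = new byte[(int) f.length()];
try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
    in.readFully(data); // fills the whole array or throws EOFException
}

// Option 2: if a ByteArrayOutputStream is really required, pre-size it.
ByteArrayOutputStream out = new ByteArrayOutputStream((int) f.length());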
Apart from that, the answer is that you need to make the heap bigger.
I have seen similar issues in C# on Windows caused by not having enough contiguous virtual memory on the host. If you're on Windows, you can try increasing the VM space.
I want to read a large InputStream and return it as a String.
This InputStream is a large one, so it normally takes a lot of time and a lot of memory while it is executing.
The following code is the one that I've developed so far.
I need to change this code so that it does the job consuming less time and less memory.
Can you give me any ideas for doing this?
BufferedReader br =
new BufferedReader(
new InputStreamReader(
connection.getInputStream(),
"UTF-8")
);
StringBuilder response = new StringBuilder(1000);
char[] buffer = new char[4096];
int n = 0;
while(n >= 0){
n = br.read(buffer, 0, buffer.length);
if(n > 0){
response.append(buffer, 0, n);
}
}
return response.toString();
Thank you!
When you are doing buffered I/O you can just read one char at a time from the buffered reader. Then build up the string, and do a toString() at the end.
You may find that for large files on some operating systems, mmaping the file via FileChannel.map will give you better performance - map the file and then create a string out of the mapped ByteBuffer. You'll have to benchmark though, as it may be that 'traditional' IO is faster in some cases.
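A rough sketch of that idea, assuming the content is UTF-8, the source is an ordinary file rather than the network connection from the question, and the whole text fits in a single String:
try (FileChannel fc = FileChannel.open(Paths.get("/path/to/large.txt"), StandardOpenOption.READ)) {
    MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
    String text = StandardCharsets.UTF_8.decode(mbb).toString();
    // use text ...
}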
Do you know in advance the likely maximum length of your string? You currently specify an initial capacity of 1000 for your builder. If what you read is much bigger than that, you'll pay some cost in allocating larger internal buffers.
If you have control over the life-cycle of what you're reading, perhaps you could allocate a single re-usable byte array as the buffer. Hence avoiding garbage collection.
Increase the size of your buffer. The bigger the buffer, the faster all the data can be read. If you know (or can work out) how many bytes are available in the stream, you could even allocate a buffer of the same size up-front.
You could run the code in a separate thread... it won't run any faster but at least your program will be able to do some other work instead of waiting for data from the stream.
Is this:
ByteBuffer buf = ByteBuffer.allocate(1000);
...the only way to initialize a ByteBuffer?
What if I have no idea how many bytes I need to allocate..?
Edit: More details:
I'm converting one image file format to a TIFF file. The problem is that the starting file format can be any size, but I need to write the data to the TIFF in little endian. So I'm reading the stuff I'm eventually going to write to the TIFF file into the ByteBuffer first so I can put everything in little endian, then I'm going to write it to the outfile. I guess since I know how long IFDs and headers are, and I can probably figure out how many bytes are in each image plane, I can just use multiple ByteBuffers during this whole process.
The types of places that you would use a ByteBuffer are generally the types of places that you would otherwise use a byte array (which also has a fixed size). With synchronous I/O you often use byte arrays, with asynchronous I/O, ByteBuffers are used instead.
If you need to read an unknown amount of data using a ByteBuffer, consider using a loop with your buffer and append the data to a ByteArrayOutputStream as you read it. When you are finished, call toByteArray() to get the final byte array.
Any time when you aren't absolutely sure of the size (or maximum size) of a given input, reading in a loop (possibly using a ByteArrayOutputStream, but otherwise just processing the data as a stream, as it is read) is the only way to handle it. Without some sort of loop, any remaining data will of course be lost.
For example:
final byte[] buf = new byte[4096];
int numRead;
// Use try-with-resources to auto-close streams.
try(
final FileInputStream fis = new FileInputStream(...);
final ByteArrayOutputStream baos = new ByteArrayOutputStream()
) {
while ((numRead = fis.read(buf)) > 0) {
baos.write(buf, 0, numRead);
}
final byte[] allBytes = baos.toByteArray();
// Do something with the data.
}
catch( final Exception e ) {
// Do something on failure...
}
If you instead wanted to write Java ints, or other things that aren't raw bytes, you can wrap your ByteArrayOutputStream in a DataOutputStream:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos);
while (thereAreMoreIntsFromSomewhere()) {
int someInt = getIntFromSomewhere();
dos.writeInt(someInt);
}
byte[] allBytes = baos.toByteArray();
Depends.
Library
Converting file formats tends to be a solved problem for most problem domains. For example:
Batik can transcode between various image formats (including TIFF).
Apache POI can convert between office spreadsheet formats.
Flexmark can generate HTML from Markdown.
The list is long. The first question should be, "What library can accomplish this task?" If performance is a consideration, your time is likely better spent optimising an existing package to meet your needs than writing yet another tool. (As a bonus, other people get to benefit from the centralised work.)
Known Quantities
Reading a file? Allocate file.size() bytes.
Copying a string? Allocate string.length() bytes.
Copying a TCP packet? Allocate 1500 bytes, for example.
Unknown Quantities
When the number of bytes is truly unknown, you can do a few things:
Make a guess.
Analyze example data sets you would buffer; use the average length.
Example
Java's StringBuffer, unless otherwise instructed, uses an initial buffer sized to hold 16 characters. Once those 16 characters are filled, a new, longer array is allocated, and the original 16 characters are copied over. If the StringBuffer had an initial size of 1024 characters, then the reallocation would not happen as early or as often.
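For instance (the numbers are illustrative):
StringBuffer small = new StringBuffer();      // default capacity: 16 chars, grows and copies as it fills
StringBuffer sized = new StringBuffer(1024);  // no reallocation until 1024 chars have been appended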
Optimization
Either way, this is probably a premature optimization. Typically you would allocate a set number of bytes when you want to reduce the number of internal memory reallocations that get executed.
It is unlikely that this will be the application's bottleneck.
The idea is that it's only a buffer - not the whole of the data. It's a temporary resting spot for data as you read a chunk, process it (possibly writing it somewhere else). So, allocate yourself a big enough "chunk" and it normally won't be a problem.
What problem are you anticipating?