Performance : BufferedOutputStream vs FileOutputStream in Java

I have read that the BufferedOutputStream class improves efficiency and should be used together with FileOutputStream, like this:
BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream("myfile.txt"));
For writing to the same file, the statement below also works:
FileOutputStream fout = new FileOutputStream("myfile.txt");
The recommended way is to use a buffer for reading and writing, and that is the only reason I prefer to use one as well.
My question is: how can I measure the performance of the two statements above? Is there a tool (or some other technique, I don't know exactly what) that would be useful for analyzing their performance?
As I am new to Java, I am very curious about this.

Buffering is only helpful if you are doing inefficient reading or writing. For reading, it lets you read line by line efficiently, even though you could gobble up bytes/chars faster using read(byte[]) or read(char[]). For writing, it lets you collect many small pieces of output in the buffer and send them through I/O only when the buffer fills up or on flush (see the autoFlush parameter of the PrintWriter and PrintStream constructors).
But if you are already reading or writing in large blocks as fast as you can, buffering doesn't improve performance.
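To make the write-side point concrete, here is a minimal sketch (the class and method names are made up for illustration) that counts how many write calls actually reach the underlying stream when 10,000 single bytes are written with and without a BufferedOutputStream. A counting stream stands in for FileOutputStream, where each such call would typically be a separate system call.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class WriteCallCounter {

    /** Hypothetical stand-in for FileOutputStream: it only counts the calls it receives. */
    static final class CountingOutputStream extends OutputStream {
        long calls;
        long bytes;

        @Override
        public void write(int b) {
            calls++;          // one call per single byte
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;          // one call per block of bytes
            bytes += len;
        }
    }

    /** Writes n single bytes, optionally through a BufferedOutputStream, and returns the sink. */
    static CountingOutputStream writeBytes(int n, boolean buffered) throws IOException {
        CountingOutputStream sink = new CountingOutputStream();
        OutputStream out = buffered ? new BufferedOutputStream(sink, 8192) : sink;
        for (int i = 0; i < n; i++) {
            out.write(i & 0xFF);  // one tiny write per byte
        }
        out.close();  // flushes whatever is still sitting in the buffer
        return sink;
    }

    public static void main(String[] args) throws IOException {
        CountingOutputStream direct = writeBytes(10_000, false);
        CountingOutputStream buffered = writeBytes(10_000, true);
        System.out.println("unbuffered calls: " + direct.calls);
        System.out.println("buffered calls:   " + buffered.calls);
    }
}
```

Running main should show 10,000 calls for the unbuffered case and only a handful for the buffered one; the exact buffered count depends on the buffer size and JDK version.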
For an example of efficient reading from a file:
File f = ...;
try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
    byte[] bytes = new byte[(int) f.length()]; // the file must be smaller than 2 GB (Integer.MAX_VALUE bytes)
    in.readFully(bytes); // a single read(bytes) may return fewer bytes; readFully keeps reading until the array is full
}
Versus inefficient reading:
File f = ...;
try (BufferedReader in = new BufferedReader(new FileReader(f))) {
    String line;
    while ((line = in.readLine()) != null) {
        // If every readLine() call read directly from the file system / hard drive,
        // it would slow things down tremendously. That's why having a buffer
        // capture the file contents, and then reading from the buffer, is
        // more efficient.
    }
}

These numbers came from a MacBook Pro laptop using an SSD.
BufferedFileStreamArrayBatchRead (809716.60-911577.03 bytes/ms)
BufferedFileStreamPerByte (136072.94 bytes/ms)
FileInputStreamArrayBatchRead (121817.52-1022494.89 bytes/ms)
FileInputStreamByteBufferRead (118287.20-1094091.90 bytes/ms)
FileInputStreamDirectByteBufferRead (130701.87-956937.80 bytes/ms)
FileInputStreamReadPerByte (1155.47 bytes/ms)
RandomAccessFileArrayBatchRead (120670.93-786782.06 bytes/ms)
RandomAccessFileReadPerByte (1171.73 bytes/ms)
Where there is a range in the numbers, it varies based on the size of the buffer being used. A larger buffer results in more speed up to a point, typically somewhere around the size of the caches within the hardware and operating system.
As you can see, reading bytes individually is always slow. Batching the reads into chunks is easily the way to go: it can be the difference between roughly 1,000 bytes/ms and 136,000 bytes/ms (or more).
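A minimal sketch of the two access patterns compared in the list above, assuming a plain FileInputStream (the class name and the buffer size used in the test are arbitrary choices, not taken from the benchmark code): one read() per byte versus one read(byte[]) per chunk into a reusable array. Both return the file length; wrapping each call in System.nanoTime() gives a rough timing, although a proper benchmark harness is the better tool for real measurements.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;

class ChunkedRead {

    /** Reads the whole file one byte at a time: one read() call per byte. */
    static long countBytesPerByte(Path file) throws IOException {
        long total = 0;
        try (InputStream in = new FileInputStream(file.toFile())) {
            while (in.read() != -1) {
                total++;
            }
        }
        return total;
    }

    /** Reads the file in chunks into a reusable array: one read() call per chunk. */
    static long countBytesChunked(Path file, int bufferSize) throws IOException {
        long total = 0;
        byte[] buffer = new byte[bufferSize];
        try (InputStream in = new FileInputStream(file.toFile())) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;   // process buffer[0..n-1] here
            }
        }
        return total;
    }
}
```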
These numbers are a little old, and they will vary wildly by setup, but they will give you an idea. The code for generating the numbers can be found here; edit Main.java to select the tests that you want to run.
An excellent (and more rigorous) framework for writing benchmarks is JMH. A tutorial for learning how to use JMH can be found here.
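For a first rough answer to the original question, here is a hand-rolled timing sketch (a hypothetical harness, not JMH) that times the two statements from the question by writing many single bytes with and without a BufferedOutputStream. Timings like these are easily skewed by JIT warm-up and OS caching, which is exactly what JMH takes care of, so treat the numbers as indicative only.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class CrudeWriteBenchmark {

    /** Writes `count` single bytes to `file`, directly or through a BufferedOutputStream; returns elapsed ns. */
    static long timeWrites(Path file, int count, boolean buffered) throws IOException {
        long start = System.nanoTime();
        try (OutputStream out = buffered
                ? new BufferedOutputStream(new FileOutputStream(file.toFile()))
                : new FileOutputStream(file.toFile())) {
            for (int i = 0; i < count; i++) {
                out.write('x');
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bench", ".txt");
        int count = 200_000;
        // Several rounds give the JIT a chance to warm up; only the later rounds mean much.
        for (int round = 0; round < 5; round++) {
            long plain = timeWrites(file, count, false);
            long buffered = timeWrites(file, count, true);
            System.out.printf("round %d: FileOutputStream %d ms, BufferedOutputStream %d ms%n",
                    round, plain / 1_000_000, buffered / 1_000_000);
        }
        Files.delete(file);
    }
}
```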

Related

Java OutOfMemoryError in reading a large text file

I'm new to Java and working on reading very large files; I need some help to understand the problem and solve it. We have some legacy code which has to be optimized to make it run properly. The file size can vary from 10 MB to 10 GB. The trouble only starts when the file is larger than 800 MB.
InputStream inFileReader = channelSFtp.get(path); // file reading from ssh.
byte[] localbuffer = new byte[2048];
ByteArrayOutputStream bArrStream = new ByteArrayOutputStream();
int i = 0;
while (-1 != (i = inFileReader.read(localbuffer))) {
bArrStream.write(localbuffer, 0, i);
}
byte[] data = bArrStream.toByteArray();
inFileReader.close();
bArrStream.close();
We are getting the error
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
Any help would be appreciated.

Try to use java.nio.MappedByteBuffer.
http://docs.oracle.com/javase/7/docs/api/java/nio/MappedByteBuffer.html
You can map a file's content into memory without copying it manually. Modern operating systems offer memory mapping, and Java has an API to use this feature.
If my understanding is correct, memory mapping does not load a file's entire content into memory (pages are loaded and unloaded by the OS as necessary), so I guess a 10 GB file won't eat up your memory. Note, however, that a single MappedByteBuffer can cover at most Integer.MAX_VALUE bytes (about 2 GB), so a 10 GB file has to be mapped in several regions.
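A minimal sketch of mapping a large file region by region with FileChannel.map, assuming that summing byte values is an acceptable stand-in for your real processing; the class name and regionSize are hypothetical (regionSize must not exceed Integer.MAX_VALUE).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRegions {

    /**
     * Walks a file of any size by mapping it one region at a time.
     * Returns the sum of all byte values as a stand-in for real processing.
     */
    static long sumBytes(Path file, int regionSize) throws IOException {
        long sum = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long position = 0; position < size; position += regionSize) {
                long length = Math.min(regionSize, size - position);
                MappedByteBuffer region = channel.map(FileChannel.MapMode.READ_ONLY, position, length);
                while (region.hasRemaining()) {
                    sum += region.get() & 0xFF;   // process one byte of the mapped region
                }
            }
        }
        return sum;
    }
}
```

Mapped regions are released only when the buffers are garbage-collected, so very large mappings can still put pressure on the process address space even though they do not use the Java heap.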

Even though you can increase the JVM memory limit, it is unnecessary, and allocating a huge amount of memory like 10 GB to process a file is overkill and resource-intensive.
Currently you are using a ByteArrayOutputStream, which keeps all the data in an internal in-memory buffer. This line in your code keeps appending the most recently read 2 KB chunk of the file to the end of that buffer:
bArrStream.write(localbuffer, 0, i);
bArrStream keeps growing, and eventually you run out of memory.
Instead, you should reorganize your algorithm and process the file in a streaming way:
InputStream inFileReader = channelSFtp.get(path); // file reading from ssh.
byte[] localbuffer = new byte[2048];
int i = 0;
while (-1 != (i = inFileReader.read(localbuffer))) {
//Deal with the i bytes just read (localbuffer[0] to localbuffer[i - 1]) here
}
inFileReader.close();
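As an illustration of the streaming approach, here is a sketch that computes a CRC32 checksum of the stream 2 KB at a time; the checksum and the class name are hypothetical stand-ins for whatever you actually need to do with each chunk, and any InputStream (such as the one returned by channelSFtp.get(path)) can be passed in. Memory use stays at one 2 KB buffer no matter how large the file is.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

class StreamingChecksum {

    /** Consumes the stream in fixed 2 KB chunks; memory use is constant regardless of stream size. */
    static long crc32Of(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] localbuffer = new byte[2048];
        int i;
        while ((i = in.read(localbuffer)) != -1) {
            crc.update(localbuffer, 0, i);   // handle only the i bytes just read
        }
        return crc.getValue();
    }
}
```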

The Java virtual machine (JVM) runs with a fixed upper memory limit, which you can modify thus:
java -Xmx1024m ....
e.g. the above option (-Xmx...) sets the limit to 1024 megabytes. You can amend as necessary (within limits of your machine, OS etc.) Note that this is different from traditional applications which would allocate more and more memory from the OS upon demand.
However a better solution is to rework your application such that you don't need to load the whole file into memory at one go. That way you don't have to tune your JVM, and you don't impose a huge memory footprint.
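To check which limit the running JVM actually ended up with, you can ask the Runtime. A minimal sketch (the class name is made up):

```java
class HeapLimit {

    /** Returns the maximum heap the JVM will attempt to use, in megabytes (reflects -Xmx). */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as java -Xmx1024m HeapLimit should print a value close to 1024; on some collectors the reported figure is a little lower than the -Xmx value, and maxMemory() returns Long.MAX_VALUE if there is no limit.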

You can't read a 10 GB text file into memory. You have to read X MB first, do something with it, and then read the next X MB.

The problem is inherent in what you're doing. Reading entire files into memory is always and everywhere a bad idea. You're really not going to be able to read a 10GB file into memory with current technology unless you have some pretty startling hardware. Find a way to process them line by line, record by record, chunk by chunk, ...

Is it really necessary to get the entire byte array from the output stream?
byte[] data = bArrStream.toByteArray();
The best approach is to read line by line and write line by line. You can use BufferedReader or Scanner to read large files, as below.
import java.io.*;
import java.util.*;
public class FileReadExample {
    public static void main(String args[]) throws FileNotFoundException {
        File fileObj = new File(args[0]);
        // Note: printing to the console dominates both timings below
        long t1 = System.currentTimeMillis();
        // BufferedReader object for reading the file; try-with-resources closes it
        try (BufferedReader br = new BufferedReader(new FileReader(fileObj))) {
            // Reading each line of file using BufferedReader class
            String str;
            while ((str = br.readLine()) != null) {
                System.out.println(str);
            }
        } catch (IOException err) {
            err.printStackTrace();
        }
        long t2 = System.currentTimeMillis();
        System.out.println("Time taken for BufferedReader:" + (t2 - t1));
        t1 = System.currentTimeMillis();
        // Scanner object for reading the file
        try (Scanner scnr = new Scanner(fileObj)) {
            // Reading each line of file using Scanner class
            while (scnr.hasNextLine()) {
                String strLine = scnr.nextLine();
                // print data on console
                System.out.println(strLine);
            }
        }
        t2 = System.currentTimeMillis();
        System.out.println("Time taken for scanner:" + (t2 - t1));
    }
}
You can replace System.out with your ByteArrayOutputStream in the above example.
See this article for more details: Read Large File
Also have a look at this related SE question:
Scanner vs. BufferedReader

ByteArrayOutputStream writes to an in-memory buffer. If this is really how you want it to work, then you have to size the JVM heap according to the maximum possible size of the input. Also, if possible, you could check the input size before you even start processing, to save time and resources.
The alternative approach is a streaming solution, where the amount of memory used at runtime is known (maybe configurable, but still known before the program starts). Whether that is feasible depends entirely on your application's domain (because you can't use an in-memory buffer anymore) and maybe on the architecture of the rest of your code, if you can't or don't want to change it.

Try using a larger read buffer size, maybe 10 MB, and then check.

Read the file iteratively, line by line. This significantly reduces memory consumption. Alternatively, you may use
FileUtils.lineIterator(theFile, "UTF-8");
provided by Apache Commons IO.
// try-with-resources closes the Scanner, which also closes the underlying stream
try (FileInputStream inputStream = new FileInputStream(path);
     Scanner sc = new Scanner(inputStream, "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses IOExceptions, so check for one explicitly
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
}

Run Java with the command-line option -Xmx, which sets the maximum size of the heap.
See here for details.

Assuming that you are reading a large text file and the data is organized line by line, use a line-by-line reading approach. As far as I know, you can read files of 6 GB, maybe more, this way.
...
// Open the file
FileInputStream fstream = new FileInputStream("textfile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println (strLine);
}
//Close the input stream
br.close();
Reference for the code fragment

Short answer,
with one small change (pre-sizing the buffer, explained below), you can push the current limit by a factor of 1.5. This means that if you are able to process 800 MB, you can process 1200 MB. It also means that if, by some trick with java -Xm ...., you can get to a point where your current code can process 7 GB, your problem is solved, because the 1.5 factor will take you to 10.5 GB, assuming you have that space available on your system and the JVM can get it.
Long answer:
The error is pretty self-descriptive: you hit the practical memory limit of your configuration. There is a lot of speculation about the limits you can have with the JVM; I do not know enough about that, since I cannot find any official information. However, you will somehow be limited by constraints like the available swap, the kernel address space usage, memory fragmentation, etc.
What is happening now is that ByteArrayOutputStream objects are created with a default buffer of size 32 if you do not supply any size (this is your case). Whenever you call the write method on the object, some internal machinery is started. The OpenJDK 7u40-b43 implementation, which seems to match your stack trace perfectly, uses an internal method ensureCapacity to check that the buffer has enough room for the bytes you want to write. If there is not enough room, another internal method, grow, is called to enlarge the buffer. The grow method computes the new size and calls the copyOf method of the Arrays class to do the job.
The new size of the buffer is the maximum of twice the current size and the size required to hold all the content (the present content plus the new content to be written).
The copyOf method of the Arrays class allocates the space for the new buffer, copies the content of the old buffer into the new one and returns it to grow.
Your problem occurs when allocating the space for the new buffer. After some writes, you get to a point where the available memory is exhausted: java.lang.OutOfMemoryError: Java heap space.
If we look into the details, you are reading in chunks of 2048 bytes. So:
your first write grows the buffer from 32 to 2048 bytes;
your second write doubles it to 2*2048;
your third write takes it to 2^2*2048, so writes 3 and 4 fit before the next allocation;
then 2^3*2048, and writes 5 to 8 fit before allocating again;
at some point, your buffer will be of size 2^18*2048, which is 2^19*1024 or 2^9*2^20 (512 MB);
then 2^19*2048, which is 1024 MB or 1 GB.
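The growth sequence above can be reproduced with a small simulation. This sketch (class and method names are hypothetical) models the JDK 7/8 ByteArrayOutputStream.grow policy, new capacity = max(2 × old capacity, required capacity); it does not allocate the real buffers, it only tracks the capacity after each write.

```java
import java.util.ArrayList;
import java.util.List;

class GrowthSimulation {

    /** Returns the buffer capacity after each of `writes` writes of `chunkSize` bytes. */
    static List<Long> capacities(long initialCapacity, int chunkSize, int writes) {
        List<Long> result = new ArrayList<>();
        long capacity = initialCapacity;
        long count = 0;
        for (int w = 0; w < writes; w++) {
            long needed = count + chunkSize;
            if (needed > capacity) {
                capacity = Math.max(capacity * 2, needed);   // grow(): double, or just enough
            }
            count = needed;
            result.add(capacity);
        }
        return result;
    }
}
```

For example, capacities(32, 2048, 5) gives [2048, 4096, 8192, 8192, 16384], and after 262,144 writes (512 MB of data) the next write asks for a 1 GB array while the 512 MB one is still alive.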
Something that is unclear in your description is that you can somehow read up to 800 MB, but cannot go beyond that. You would have to explain that to me.
I would expect your limit to be exactly a power of 2 (or close to one, if we use power-of-10 units somewhere). In that regard, I expect you to start having trouble immediately above one of these: 256 MB, 512 MB, 1 GB, 2 GB, etc.
When you hit that limit, it does not mean that you are out of memory, it simply means that it is not possible to allocate another buffer of twice the size of the buffer you already have. This observation opens room for improvement in your work: find the maximum size of buffer that you can allocate and reserve it upfront by calling the appropriate constructor
ByteArrayOutputStream bArrStream = new ByteArrayOutputStream(myMaxSize);
This has the advantage of removing the background memory allocations that happen under the hood. By doing this, you will be able to go to 1.5 times the limit you have right now. This is because the last time the buffer grew successfully, it went from half the current size to the current size, so at that moment both the old and the new buffer were in memory together (1.5 times the current size in total); a single pre-sized buffer can use all of that space. But you will not be able to go beyond 3 times the current limit, for the same reason: the next growth step, which failed, needed the current buffer plus one twice its size, 3 times the current size in total.
That being said, I do not have any magic suggestion to solve the problem apart from processing your data in chunks of a given size, one chunk at a time. Another good approach would be to follow Takahiko Kawasaki's suggestion and use MappedByteBuffer. Keep in mind that, in any case, you will need at least 10 GB of physical memory or swap to load a 10 GB file.

After thinking about it, I decided to post a second answer. I weighed the advantages and disadvantages of doing so, and the advantages are worth it. So here it is.
Most of the suggestions so far overlook one fact: there is a built-in limit on the size of arrays in Java (and therefore on ByteArrayOutputStream, which is backed by a single array). That limit is dictated by the biggest int value, which is 2^31 - 1 (a little less than 2 GB). This means that you can read at most 2 GB (minus 1 byte) and put it in a single ByteArrayOutputStream. In practice the limit may be slightly smaller, because some VMs reserve header words in an array.
My suggestion is to use an ArrayList of byte[] instead of a single byte[] holding the full content of the file, and also to remove the unnecessary step of going through a ByteArrayOutputStream before putting the data into a final array. Here is an example based on your original code:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
InputStream inFileReader = channelSFtp.get(path); // file reading from ssh.
// good habits are good, define a buffer size
final int BUF_SIZE = 1 << 30; // 1 GB, let's not go close to the array size limit
List<byte[]> data = new ArrayList<>();
byte[] localbuffer = new byte[BUF_SIZE];
int i = 0;
while (-1 != (i = inFileReader.read(localbuffer))) {
    if (i < BUF_SIZE) {
        data.add(Arrays.copyOf(localbuffer, i));
        // No need to reallocate the reading buffer, we copied the data
    } else {
        data.add(localbuffer);
        // the list now owns this array, so allocate a new reading buffer
        localbuffer = new byte[BUF_SIZE];
    }
}
inFileReader.close();
// Process your data, keep in mind that you have a list of buffers.
// So you need to loop over the list
Simply running your program should work fine on a 64-bit system with enough physical memory or swap. If you want to speed it up by helping the VM size the heap correctly from the beginning, run with the -Xms and -Xmx options. For example, if you want a 12 GB heap to be able to handle a 10 GB file, use java -Xms12288m -Xmx12288m YourApp

What is the result of buffering a buffered stream in java?

I was writing the javadoc for:
/**
* ...Buffers the input stream so do not pass in a BufferedInputStream ...
*/
public static void meth(InputStream is) throws IOException {
BufferedInputStream bis = new BufferedInputStream(is,
INPUT_STREAM_BUFFER_SIZE);
// rest omitted
}
But is it really a problem to pass a buffered input stream in? That is, would this:
InputStream is = new BufferedInputStream(new FileInputStream("C:/file"), SIZE);
meth(is);
buffer is into bis, or would Java detect that is is already buffered and set bis = is? If yes, would different buffer sizes make a difference? If no, why not?
NB: I am talking about input streams, but the question applies to output streams too.

But is it really a problem to pass a buffered input stream in ?
Not really. There is potentially a small overhead in doing this, but it is negligible compared with the overall cost of reading input.
If you look at the code of BufferedInputStream, (e.g. the read1 method) you will see that block reads are implemented to be efficient when buffered streams are stacked.
[Re the example code:] would java detect that is is already buffered and set bis = is ?
No.
If no, why not ?
Because Java (the language, the compiler) generally doesn't understand the semantics of Java library classes. And in this case, since the benefit of such an optimization would be negligible, it is not worth implementing.
Of course, you are free to write your meth method to do this kind of thing explicitly ... though I predict that it will make little difference.
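A minimal sketch of doing that check explicitly; INPUT_STREAM_BUFFER_SIZE is given a hypothetical value here because the question doesn't show it, and the class and method names are made up. Note that instanceof only recognizes BufferedInputStream and its subclasses; other streams that buffer internally will still be wrapped.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;

class BufferOnce {

    // Hypothetical size; the question's actual INPUT_STREAM_BUFFER_SIZE isn't shown.
    static final int INPUT_STREAM_BUFFER_SIZE = 8192;

    /** Wraps the stream in a BufferedInputStream only if it isn't one already. */
    static BufferedInputStream buffered(InputStream is) {
        if (is instanceof BufferedInputStream) {
            return (BufferedInputStream) is;   // reuse the caller's buffer (and its size)
        }
        return new BufferedInputStream(is, INPUT_STREAM_BUFFER_SIZE);
    }
}
```

With this, meth would start with BufferedInputStream bis = BufferOnce.buffered(is); and a caller's own buffer size wins whenever one is already present.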
I do not quite get why in read1 they "bother" to copy to the input buffer only if the requested length is less than the buf.length (or if there is a marked position in the input stream)
I assume that you are referring to this code (in read1):
if (len >= getBufIfOpen().length && markpos < 0) {
return getInIfOpen().read(b, off, len);
}
The first part is saying that if the user is asking for less than the stream's configured buffer size, we don't want to short-circuit the buffering. (Otherwise, we'd have the problem that doing a read(byte[], int, int) with a small requested length would be pessimal.)
The second part has to do with the way that mark / reset is implemented. Instead of using mark / reset on the underlying stream (which may or may not be supported), a BufferedInputStream uses the buffer to implement it. What you are seeing is part of that logic. (You can work out the details for yourself by reading the comments in the source code.)

If you buffer the stream twice then it will use more memory and be slower than if you only did so once, but it will still work.
It's certainly worth documenting that your stream does buffering so that users will know they don't need to do so themselves.
Generally it's best to discourage rather than actively prevent this sort of misuse.

The answer is no, Java would not detect the double buffering.
It is up to the user to avoid this problem. The BufferedInputStream has no way of knowing whether the InputStream you pass into the constructor is buffered or not.
Here is the source code for the BufferedInputStream constructor:
public BufferedInputStream(InputStream in, int size) {
super(in);
if (size <= 0) {
throw new IllegalArgumentException("Buffer size <= 0");
}
buf = new byte[size];
}
EDIT
From the comments: is it a problem to double-buffer a stream?
The short answer is yes.
The idea of buffering is to increase speed so that data is spooled into memory and written out (usually to very slow IO) in chunks. If you double buffer you spool data into memory and then flush that data back into memory somewhere else. This certainly has a cost in terms of speed...

How to deal with reading and processing huge text files without getting OutofMemoryError

I wrote some straightforward code to read text files (>1 GB) and do some processing on Strings.
However, I run into Java heap space problems, because I append Strings (using StringBuilder) that at some point get too big in terms of memory usage. I know that I can increase my heap space with, e.g., '-Xmx1024m', but I would like to work with only a little memory here. How could I change my code below to manage my operations?
I am still a Java novice and maybe I made some mistakes in my code which may seem obvious to you.
Here's the code snippet:
private void setInputData() {
Pattern pat = Pattern.compile("regex");
BufferedReader br = null;
Matcher mat = null;
try {
File myFile = new File("myFile");
FileReader fr = new FileReader(myFile);
br = new BufferedReader(fr);
String line = null;
String appendThisString = null;
String processThisString = null;
StringBuilder stringBuilder = new StringBuilder();
while ((line = br.readLine()) != null) {
mat = pat.matcher(line);
if (mat.find()) {
appendThisString = mat.group(1);
}
if (line.contains("|")) {
processThisString = line.replace(" ", "").replace("|", "\t");
stringBuilder.append(processThisString).append("\t").append(appendThisString);
stringBuilder.append("\n");
}
}
// doSomethingWithTheString(stringBuilder.toString());
} catch (Exception ex) {
ex.printStackTrace();
} finally {
try {
if (br != null)br.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
Here's the error message:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
at java.lang.StringBuilder.append(StringBuilder.java:132)
at Test.setInputData(Test.java:47)
at Test.go(Test.java:18)
at Test.main(Test.java:13)

You could do a dry run, without appending, but counting the total string length.
If doSomethingWithTheString is sequential there would be other solutions.
You could tokenize the string, reducing its size. For instance, dictionary-based compression such as LZW reads a char, looks for an already-present sequence, possibly extends the table, and then yields a table index. (The open-source OmegaT translation tool uses such a strategy in one spot for tokens.) So it depends on the processing you want to do. Since you are reading a kind of CSV, a dictionary seems feasible.
In general I would use a database.
P.S. You can save half the memory by writing everything to a file and then rereading the file into one string. Or use a java.nio ByteBuffer on the file, i.e. a memory-mapped file.

You can't use StringBuilder in this case. It holds data in memory.
I think you should consider writing each result line to a file as you go,
i.e. use a FileWriter instead of a StringBuilder.

The method doSomethingWithTheString() should probably be changed so that it accepts an InputStream instead. While reading the original file content and transforming it line by line, you should write the transformed content to a temporary file line by line. Then an input stream for that temporary file could be passed to the doSomethingWithTheString() method. The method should probably be renamed to doSomethingWithInputStream().

From your example it is not clear what you are going to do with your enormous string once you have modified it. However since your modifications do not appear to span multiple lines I'd just write the modified data to a new file.
To do that, create and open a new FileWriter object before your while loop, move your stringBuilder declaration to the beginning of the loop body, and write the stringBuilder's contents to your new file at the end of each iteration.
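A minimal sketch of that approach, reusing the transformation from the question but writing each result line straight to a Writer instead of a StringBuilder. The class and method names are made up, and the pattern is passed in because the question's "regex" is only a placeholder.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineByLineTransform {

    /** Same logic as the question's loop, but output is written immediately, so memory use stays flat. */
    static void transform(BufferedReader br, Writer out, Pattern pat) throws IOException {
        String appendThisString = null;   // last group(1) match, carried across lines as in the question
        String line;
        while ((line = br.readLine()) != null) {
            Matcher mat = pat.matcher(line);
            if (mat.find()) {
                appendThisString = mat.group(1);
            }
            if (line.contains("|")) {
                String processThisString = line.replace(" ", "").replace("|", "\t");
                out.write(processThisString + "\t" + appendThisString + "\n");
            }
        }
        out.flush();
    }
}
```

In real use you would pass in, for example, Files.newBufferedReader(inputPath) and Files.newBufferedWriter(outputPath) opened in a try-with-resources block, inputPath and outputPath being your own paths.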
If, on the other hand, you do need to combine data coming from different lines, consider using a database. Which kind depends on the nature of your data. If it has a record-like organization, you might adopt a relational database such as Apache Derby or MySQL; otherwise you might check out so-called NoSQL databases, such as Cassandra or MongoDB.

The general strategy is to design your application so that it doesn't need to hold the entire file (or too large a proportion of it) in memory.
Depending on what your application does:
You could write the intermediate data to a file and read it back again a line at a time to process it.
You could pass each line read to the processing algorithm; e.g. by calling doSomethingWithTheString(...) on each line individually rather than all of them.
But if you need to have the entire file in memory, you are between a rock and a hard place.
The other thing to note is that using a StringBuilder like that may require up to 6 times as much memory as the file size. It goes like this.
When the StringBuilder needs to expand its internal buffer, it does this by making a char array twice the size of the current buffer and copying from the old to the new. At that point you have 3 times as much buffer space allocated as you had before the expansion started. Now suppose that there was just one more character to append to the buffer.
If the file is in ASCII (or another 8-bit charset), the StringBuilder's buffer needs twice that amount of memory, because it consists of char values, not byte values. (This applies up to Java 8; since Java 9, compact strings let a StringBuilder that holds only Latin-1 characters use one byte per character.)
If you have a good estimate of the number of characters that will be in the final string (e.g. from the file size), you can avoid the x3 multiplier by giving a capacity hint when you create the StringBuilder. However, you mustn't underestimate, because if you underestimate even slightly, the buffer will still be doubled at the end.
You could also use a byte-oriented buffer (e.g. a ByteArrayOutputStream) instead of a StringBuilder, and then read it with a ByteArrayInputStream / InputStreamReader / BufferedReader pipeline.
But ultimately, holding a large file in memory doesn't scale as the file size increases.

Are you sure there is a line terminator in the file? If not, readLine() will try to return the whole file as a single line, which by itself can exhaust the heap. If so, it might be worth reading a fixed number of characters at a time so that the buffer doesn't grow without bound.

I suggest using Guava's FileBackedOutputStream. You gain the advantage of an OutputStream that uses disk I/O instead of main memory once its configured threshold is exceeded. Of course access will be slower due to the disk I/O, but if you are dealing with such a large stream and you are unable to chunk it into a more manageable size, it is a good option.

Memory required by JVM for creating CSV files and zip it on the fly

I am creating two CSV files using String buffers and byte arrays.
I use ZipOutputStream to generate the zip file. Each CSV file will have 20K records with 14 columns. The records are fetched from the DB and stored in an ArrayList. I have to iterate over the list, build a StringBuffer, and convert the StringBuffer to a byte array to write it to the zip entry.
I want to know how much memory the JVM needs for the entire process, starting from storing the records in the ArrayList.
I have provided the code snippet below.
StringBuffer responseBuffer = new StringBuffer();
String response = new String();
response = "Hello, sdksad, sfksdfjk, World, Date, ask, askdl, sdkldfkl, skldkl, sdfklklgf, sdlksldklk, dfkjsk, dsfjksj, dsjfkj, sdfjkdsfj\n";
for(int i=0;i<20000;i++){
responseBuffer.append(response);
}
response = responseBuffer.toString();
byte[] responseArray = response.getBytes();
res.setContentType("application/zip");
ZipOutputStream zout = new ZipOutputStream(res.getOutputStream());
ZipEntry parentEntry = new ZipEntry("parent.csv");
zout.putNextEntry(parentEntry);
zout.write(responseArray);
zout.closeEntry();
ZipEntry childEntry = new ZipEntry("child.csv");
zout.putNextEntry(childEntry);
zout.write(responseArray);
zout.closeEntry();
zout.close();
Please help me with this. Thanks in advance.

I'm guessing you've already tried counting how many bytes will be allocated to the StringBuffer and the byte array. But the problem is that you can't really know how much memory your app will use unless you have upper bounds on the sizes of the CSV records. If you want your software to be stable, robust and scalable, I'm afraid you're asking the wrong question: you should strive to perform the task using a fixed amount of memory, which in your case seems easily possible.
The key is, that in your case the processing is entirely FIFO - you read records from the database, and then write them (in the same order) into a FIFO stream (OutputStream in that case). Even zip compression is stream-based, and uses a fixed amount of memory internally, so you're totally safe there.
Instead of buffering the entire input in a huge String, then converting it to a huge byte array, then writing it to the output stream - you should read each response element separately from the database (or chunks of fixed size, say 100 records at a time), and write it to the output stream. Something like
res.setContentType("application/zip");
ZipOutputStream zout = new ZipOutputStream(res.getOutputStream());
ZipEntry parentEntry = new ZipEntry("parent.csv");
zout.putNextEntry(parentEntry);
while (... fetch entries ...)
zout.write(...data...)
zout.closeEntry();
The advantage of this approach is that because it works with small chunks you can easily estimate their sizes, and allocate enough memory for your JVM so it never crashes. And you know it will still work if your CSV files become much more than 20K lines in the future.
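A runnable version of the sketch above, under the assumption that each database row can be represented as a String[] delivered by an Iterator (a hypothetical stand-in for your result set; the class and method names are made up too). Only the current row is held in memory, and the values are joined with commas without any CSV quoting or escaping.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class StreamingCsvZip {

    /** Writes one CSV entry, one row at a time. */
    static void writeCsvEntry(ZipOutputStream zout, String name, Iterator<String[]> rows) throws IOException {
        zout.putNextEntry(new ZipEntry(name));
        // Deliberately not closed: closing this writer would close the whole ZipOutputStream.
        Writer w = new OutputStreamWriter(zout, StandardCharsets.UTF_8);
        while (rows.hasNext()) {
            w.write(String.join(",", rows.next()));
            w.write('\n');
        }
        w.flush();          // push buffered characters into the current entry
        zout.closeEntry();
    }

    static void writeZip(OutputStream target, Iterator<String[]> parentRows, Iterator<String[]> childRows)
            throws IOException {
        try (ZipOutputStream zout = new ZipOutputStream(target)) {
            writeCsvEntry(zout, "parent.csv", parentRows);
            writeCsvEntry(zout, "child.csv", childRows);
        }
    }
}
```

In the servlet you would call writeZip(res.getOutputStream(), parentRows, childRows) after setting the content type; closing the ZipOutputStream also closes the response stream.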

To analyze the memory usage you can use a Profiler.
JProfiler or YourKit is very good at doing this.
VisualVM is also good to an extent.

You can measure the memory with the MemoryTestbench.
http://www.javaspecialists.eu/archive/Issue029.html
This article describes what to do. It's simple and accurate to 1 byte; I often use it.
It can even be run from a JUnit test case, which makes it very useful, whereas a profiler cannot be run
from a JUnit test case.
With that approach, you can even measure the memory size of one Integer object.
But with zip there is one special thing: ZipOutputStream uses a native C library, so in that case the MemoryTestbench may not measure that memory, only the Java part.
You should try both variants: the MemoryTestbench and profilers (jprof).

Tuning the performance of reading a large InputStream in java

I want to read a large InputStream and return it as a String.
This InputStream is a large one, so it normally takes a long time and a lot of memory while executing.
The following code is the one that I've developed so far.
I need to change this code so that it does the job in less time and with less memory.
Can you give me any ideas on how to do this?
BufferedReader br =
new BufferedReader(
new InputStreamReader(
connection.getInputStream(),
"UTF-8")
);
StringBuilder response = new StringBuilder(1000);
char[] buffer = new char[4096];
int n = 0;
while(n >= 0){
n = br.read(buffer, 0, buffer.length);
if(n > 0){
response.append(buffer, 0, n);
}
}
return response.toString();
Thank you!

When you are doing buffered I/O you can just read one char at a time from the buffered reader. Then build up the string, and do a toString() at the end.

You may find that for large files on some operating systems, memory-mapping the file via FileChannel.map will give you better performance: map the file and then create a string out of the mapped ByteBuffer. You'll have to benchmark, though, as 'traditional' I/O may be faster in some cases.

Do you know in advance the likely maximum length of your string? You currently specify an initial capacity of 1000 for your buffer. If what you read is much bigger than that, you'll pay some cost in allocating larger internal buffers.
If you have control over the life-cycle of what you're reading, perhaps you could allocate a single re-usable byte array as the buffer. Hence avoiding garbage collection.
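A minimal sketch of giving the StringBuilder a capacity hint, assuming you have some estimate of the length. URLConnection.getContentLengthLong() returns the Content-Length header (or -1 if unknown); it counts bytes, and for UTF-8 text the number of chars is at most the number of bytes, so it works as an upper bound. The helper name is made up, and trusting a very large header value would allocate a correspondingly large buffer up front.

```java
import java.io.IOException;
import java.io.Reader;

class ReadAll {

    /** Reads everything from `reader`; `expectedChars` is a length estimate (or -1 if unknown). */
    static String readAll(Reader reader, long expectedChars) throws IOException {
        // Clamp the hint to a sane range: at least 16, below the maximum array size.
        int capacity = (int) Math.min(Math.max(expectedChars, 16), Integer.MAX_VALUE - 8);
        StringBuilder response = new StringBuilder(capacity);
        char[] buffer = new char[8192];
        int n;
        while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
            response.append(buffer, 0, n);
        }
        return response.toString();
    }
}
```

With the question's code this would be called as ReadAll.readAll(br, connection.getContentLengthLong()).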

Increase the size of your buffer. The bigger the buffer, the faster all the data can be read. If you know (or can work out) how many bytes are available in the stream, you could even allocate a buffer of the same size up-front.

You could run the code in a separate thread... it won't run any faster but at least your program will be able to do some other work instead of waiting for data from the stream.
