Input stream reads large files very slowly, why?

I am trying to submit a 500 MB file.
I can load it but I want to improve the performance.
This is the slow code:
File dest = getDestinationFile(source, destination);
if(dest == null) return false;
in = new BufferedInputStream(new FileInputStream(source));
out = new BufferedOutputStream(new FileOutputStream(dest));
byte[] buffer = new byte[1024 * 20];
int i = 0;
// this while loop is very slow
while((i = in.read(buffer)) != -1){
out.write(buffer, 0, i); //<-- SLOW HERE
out.flush();
}
How can I find out why it is slow?
Isn't the byte array (buffer) size sufficient?
Do you have any ideas for improving the performance?
Thanks in advance for any help.

You should not flush inside the loop.
You are using a BufferedOutputStream, which means that it collects data in its own buffer and writes it to the file once enough has accumulated.
Your code kills performance by forcing a flush after every small write.
Try it like this:
while((i = in.read(buffer)) != -1){
out.write(buffer, 0, i);
}
out.flush();
Edit, in response to the comment below:
In my opinion you do not need your own buffer at all. You are using BufferedInputStream and BufferedOutputStream, which have their own internal buffers: they read a block of data from disk and write a block of data at a time. I'm not 100% sure how an additional buffer affects performance, but I want to show you how I would do it:
File dest = getDestinationFile(source, destination);
if(dest == null) return false;
in = new BufferedInputStream(new FileInputStream(source));
out = new BufferedOutputStream(new FileOutputStream(dest));
int i;
while((i = in.read()) != -1){
out.write(i);
}
out.flush();
In my version you read a single byte at a time. The method returns an int, but the value is just one byte, or -1 at end of stream; see the documentation: http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read()
There is no need to read a whole buffer, so you don't need to worry about its size.
You should probably read more about streams to better understand what is necessary when working with them.
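A minimal sketch of the first suggestion (keep the byte array, flush once after the loop), assuming Java 7 or later. The class name `StreamCopy`, the method names, and the 64 KiB buffer size are illustrative choices, not taken from the question; the right buffer size is best found by measuring.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class StreamCopy {
    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // assumed size; tune by measuring
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the bytes actually read
            total += n;
        }
        out.flush(); // flush once, after the loop
        return total;
    }

    /** Copies a file; try-with-resources closes both streams even on failure. */
    static long copyFile(Path source, Path dest) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(dest)) {
            return copy(in, out);
        }
    }
}
```

If all you need is a plain file copy, `java.nio.file.Files.copy(source, dest)` from the standard library does the same job without a hand-written loop.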

Related

InputStream via ReadableByteChannel does not read to end

I have an existing problem where I am using InputStreams and I want to increase the performance of reading from them. Therefore I read through a ReadableByteChannel.
As a result the reading is much faster with this code:
public static String readAll(InputStream is, String charset, int size) throws IOException{
try(ByteArrayOutputStream bos = new ByteArrayOutputStream()){
java.nio.ByteBuffer buffer = java.nio.ByteBuffer.allocate(size);
try(ReadableByteChannel channel = Channels.newChannel(is)){
int bytesRead = 0;
do{
bytesRead = channel.read(buffer);
bos.write(buffer.array(), 0, bytesRead);
buffer.clear();
}
while(bytesRead >= size);
}
catch(Exception ex){
ex.printStackTrace();
}
String ans = bos.toString(charset);
return ans;
}
}
The problem is that it does not always read to the end. If I read a file, it works well. If I read from a network socket (to request a web page manually, for example), it sometimes stops somewhere in between.
What can I do to read to the end?
I don't want to use something like this:
StringBuilder result = new StringBuilder();
while(true){
int ans = is.read();
if(ans == -1) break;
result.append((char)ans);
}
return result.toString();
because this implementation is slow.
I hope you can help me with my problem. Maybe I have made a mistake in my code.

This causes the problem:
... } while (bytesRead >= size);
A read from a socket may return as soon as at least one byte has been read (or even with no bytes, in the non-blocking case). So if the OS socket buffer does not yet hold enough bytes, the condition ends the loop even though the full content has obviously not been read. If size is the expected length to be received, keep a running total (total += bytesRead) and leave the loop when total reaches size, or when you reach end of stream, of course.
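The running-total idea can be sketched as follows, assuming a blocking channel such as the one returned by Channels.newChannel(is). The class and method names are hypothetical, and the buffer's position plays the role of the running total instead of a separate variable.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

final class ChannelReads {
    /**
     * Reads until `size` bytes have arrived or the stream ends.
     * Returns the bytes actually received, which may be fewer than `size`.
     * Assumes a blocking channel; a non-blocking one could return 0 repeatedly.
     */
    static byte[] readUpTo(ReadableByteChannel channel, int size) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(size);
        // A short read is normal on a socket; only -1 means end of stream.
        while (buffer.hasRemaining()) {
            if (channel.read(buffer) == -1) {
                break;
            }
        }
        byte[] result = new byte[buffer.position()]; // position == bytes received so far
        buffer.flip();
        buffer.get(result);
        return result;
    }
}
```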

Your copy loop is completely wrong. There's no reason why bytesRead should ever be >= size, and it misbehaves at end of stream. It should be something like this:
while ((bytesRead = channel.read(buffer)) > 0)
{
bos.write(buffer.array(), 0, bytesRead);
buffer.clear();
}
with suitable adjustments for limiting the transfer to size bytes, which are non-trivial.
But layering all this over an existing InputStream cannot possibly be 'much faster' than using the InputStream directly, except because of the premature termination, or unless your idea of using an InputStream is what you posted, which is horrifically slow. Try that with a BufferedInputStream.
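For comparison, here is a sketch of reading a plain InputStream to its end in blocks rather than one byte at a time. The class name `StreamText` is made up for illustration, and the method takes a Charset instead of the charset name used in the question. Note that end of stream on a socket only arrives once the peer closes or shuts down its side of the connection.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

final class StreamText {
    /** Reads the stream to its end and decodes the bytes with the given charset. */
    static String readAll(InputStream is, Charset charset) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // Loop until -1; a short read just means fewer bytes were available right now.
        while ((n = is.read(buffer)) != -1) {
            bos.write(buffer, 0, n);
        }
        // Decode once at the end so multi-byte characters are never split across reads.
        return new String(bos.toByteArray(), charset);
    }
}
```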

Is it possible to read images without ImageIO?

I am trying to read an image and deliver it through a Java socket, but some bytes do not match. When viewing the result in a diff tool I realized that all values bigger than 127 were truncated.
So I wanted to just convert it to a char[] array and return that instead. Now I'm getting a completely different image, perhaps due to the size of char.
try (PrintWriter out = new PrintWriter(this.socket.getOutputStream(), true);
BufferedInputStream in = new BufferedInputStream(new FileInputStream(filename), BUFSIZ)) {
byte[] buffer = new byte[BUFSIZ];
while (in.read(buffer) != -1) {
response.append(new String(buffer));
out.print(response.toString());
response.setLength(0);
}
} catch (IOException e) {
System.err.println(e.getMessage());
}
This is my reading and delivering code.
I've read many times to use ImageIO but I want to do it without, since I don't know whether it's an image or not. (And what about other file types like executables?)
So, is there any way to convert it to something like an unsigned byte that'll be delivered correctly on the client? Do I have to use something different than read() to achieve that?

Writers are for character data. Use the OutputStream. And you're making the usual mistake of assuming that read() filled the buffer.
The following loop will copy anything correctly. Memorize it.
int count;
byte[] buffer = new byte[8192];
while ((count = in.read(buffer)) > 0)
{
out.write(buffer, 0, count);
}

Repeat after me: a char is not a byte and it's not a code point.
Repeat after me: a Writer is not an OutputStream.
try (OutputStream out = this.socket.getOutputStream();
BufferedInputStream in = new BufferedInputStream(new FileInputStream(filename), BUFSIZ)) {
byte[] buffer = new byte[BUFSIZ];
int len;
while ((len = in.read(buffer)) != -1) {
out.write(buffer, 0, len);
}
} catch (IOException e) {
System.err.println(e.getMessage());
}
(this is from memory, check the args for write()).
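To see why routing binary data through a String or a Writer damages it, consider this small sketch. US_ASCII is an assumption chosen to make the effect deterministic; the code in the question uses the platform default charset, where the damage differs in detail. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

final class CharVsByte {
    /** Decodes bytes as text and encodes them again, roughly what a Writer-based copy does. */
    static byte[] throughString(byte[] raw) {
        // Bytes that are not valid in the charset are replaced during decoding,
        // so the original values cannot be recovered afterwards.
        String text = new String(raw, StandardCharsets.US_ASCII);
        return text.getBytes(StandardCharsets.US_ASCII);
    }
}
```

Pure ASCII input survives the round trip, while any byte above 127 comes back changed, which matches the symptom described in the question.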

java: read large binary file

I need to read a given large file that contains 500000001 binary numbers. Afterwards I have to translate them into ASCII.
My problem occurs when trying to store the numbers in a large array. I get this error at the definition of the array ioBuf:
"The literal 16000000032 of type int is out of range."
I have no clue how to store these numbers so that I can work with them. Does anybody have an idea?
Here is my code:
public byte[] read(){
try{
BufferedInputStream in = new BufferedInputStream(new FileInputStream("data.dat"));
ByteArrayOutputStream bs = new ByteArrayOutputStream();
BufferedOutputStream out = new BufferedOutputStream(bs);
byte[] ioBuf = new byte[16000000032];
int bytesRead;
while ((bytesRead = in.read(ioBuf)) != -1){
out.write(ioBuf, 0, bytesRead);
}
out.close();
in.close();
return bs.toByteArray();
}

An array length is an int, so an array can hold at most Integer.MAX_VALUE elements, and 16000000032 is greater than Integer.MAX_VALUE:
Integer.MAX_VALUE = 2^31-1 = 2147483647
2147483647 < 16000000032
You could work around this by checking whether the array is full, creating another one, and continuing to read.
But I'm not sure your approach is the best way to do this. byte[Integer.MAX_VALUE] is huge ;)
Maybe you can split the input file into smaller chunks and process them one at a time.
EDIT: This is how you could read a single int from your file. You can resize the buffer to the amount of data you want to read at a time, rather than trying to read the whole file at once.
//Allocate buffer with 4byte = 32bit = Integer.SIZE
byte[] ioBuf = new byte[4];
int bytesRead;
while ((bytesRead = in.read(ioBuf)) != -1){
//if bytesRead == 4 you read 1 int
//do your stuff
}

If you need to declare a large constant, append an 'L' to it, which tells the compiler that it is a long literal. However, as mentioned in another answer, you can't declare arrays that large.
I suspect the purpose of the exercise is to learn how to use the java.nio.Buffer family of classes.

I made some progress by starting from scratch! But I still have a problem.
My idea is to read the first 32 bytes and convert them to an int, then the next 32 bytes, and so on. Unfortunately I only get the first number and don't know how to proceed.
I discovered the following method for converting these bytes to an int:
public static int byteArrayToInt(byte[] b){
final ByteBuffer bb = ByteBuffer.wrap(b);
bb.order(ByteOrder.LITTLE_ENDIAN);
return bb.getInt();
}
so now I have:
BufferedInputStream in=null;
byte[] buf = new byte[32];
try {
in = new BufferedInputStream(new FileInputStream("ndata.dat"));
in.read(buf);
System.out.println(byteArrayToInt(buf));
in.close();
} catch (IOException e) {
System.out.println("error while reading ndata.dat file");
}
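One way to proceed is to read one int after another in a loop. This is a sketch under two assumptions that the thread does not confirm: each number occupies 4 bytes (32 bits, not 32 bytes), and the file is little-endian, as the byteArrayToInt method above suggests. The names `IntFileReader` and `forEachInt` are hypothetical.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.function.IntConsumer;

final class IntFileReader {
    /**
     * Reads 4-byte little-endian ints until end of stream and passes each one
     * to the handler. Returns how many complete ints were read.
     */
    static long forEachInt(InputStream in, IntConsumer handler) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] raw = new byte[Integer.BYTES]; // assumed record size: 4 bytes
        ByteBuffer view = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        long count = 0;
        while (true) {
            try {
                data.readFully(raw); // keeps reading until all 4 bytes have arrived
            } catch (EOFException endOfStream) {
                return count; // no further complete int; trailing bytes are ignored
            }
            handler.accept(view.getInt(0)); // absolute get, so the view never needs resetting
            count++;
        }
    }
}
```

For a file, pass something like `new BufferedInputStream(new FileInputStream("ndata.dat"))` so that the many small reads are served from memory. Handling each number as it arrives avoids holding the whole file in one array.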

difference between input.read and input.read(array, offset, length)

I'm trying to understand how inputstreams work. The following block of code is one of the many ways to read data from a text file:-
File file = new File("./src/test.txt");
InputStream input = new BufferedInputStream (new FileInputStream(file));
int data = 0;
while (data != -1) // -1 means we reached the end of the file
{
data = input.read(); // returns the next byte as an int (so 'a' is 97, 'b' is 98), or -1 at end of file
System.out.println(data + " " + (char) data); // prints the number, a space, then the character
}
input.close();
Now, to use input.read(byte[], offset, length), I have this code. I got it from here
File file = new File("./src/test.txt");
InputStream input = new BufferedInputStream (new FileInputStream(file));
int totalBytesRead = 0, bytesRemaining, bytesRead;
byte[] result = new byte[ ( int ) file.length()];
while ( totalBytesRead < result.length )
{
bytesRemaining = result.length - totalBytesRead;
bytesRead = input.read ( result, totalBytesRead, bytesRemaining );
if ( bytesRead > 0 )
totalBytesRead = totalBytesRead + bytesRead;
//printing integer version of bytes read
for (int i = 0; i < bytesRead; i++)
System.out.print(result[i] + " ");
System.out.println();
//printing character version of bytes read
for (int i = 0; i < bytesRead; i++)
System.out.print((char)result[i]);
}
input.close();
I'm assuming, based on the name bytesRead, that this read method returns the number of bytes read. The documentation says that the method will try to read as many bytes as possible, so there must be cases in which it cannot.
My first question is: What are these reasons?
I could replace that entire while loop with one line of code: input.read(result, 0, result.length)
I'm sure the creator of the article thought about this. It's not about the output because I get the same output in both cases. So there has to be a reason. At least one. What is it?

The documentation of read(byte[], int, int) says that it:
Reads up to len bytes of data.
An attempt is made to read as many as len bytes
A smaller number may be read.
Since we are working with files that are right there on our hard disk, it seems reasonable to expect that the attempt will read the whole file, but input.read(result, 0, result.length) is not guaranteed to do so (the documentation says nothing of the kind). Relying on undocumented behavior is a source of bugs when that behavior changes.
For instance, the file stream may be implemented differently in other JVMs, some operating systems may impose a limit on the number of bytes that you can read at once, the file may be located on the network, or you may later use that piece of code with another stream implementation that doesn't behave in that way.
Alternatively, if you are reading the whole file into an array, you could use DataInputStream.readFully.
About the loop with read(), it reads a single byte each time. That reduces performance if you are reading a big chunk of data, since each call to read() will perform several tests (has the stream ended? etc) and may ask the OS for one byte. Since you already know that you want file.length() bytes, there is no reason for not using the other more efficient forms.
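A short sketch of the DataInputStream.readFully alternative mentioned above. The wrapper class `ExactReads` and its method name are made up for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ExactReads {
    /** Reads exactly `length` bytes, or throws EOFException if the stream ends first. */
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] result = new byte[length];
        // readFully repeats the partial reads internally until the array is full.
        new DataInputStream(in).readFully(result);
        return result;
    }
}
```

For a file that fits in memory, `java.nio.file.Files.readAllBytes(path)` is a standard-library shortcut that serves the same purpose.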

Imagine you are reading from a network socket, not from a file. In this case you have no information about the total number of bytes in the stream. You would allocate a buffer of fixed size and read from the stream in a loop. In any one iteration of the loop you can't expect BUFFERSIZE bytes to be available in the stream. So you fill the buffer as much as possible and iterate again until the buffer is full. This can be useful if you have data blocks of fixed size, for example serialized objects.
ArrayList<MyObject> list = new ArrayList<MyObject>();
try {
InputStream input = socket.getInputStream();
byte[] buffer = new byte[1024];
int bytesRead;
int off = 0;
int len = 1024;
while(true) {
bytesRead = input.read(buffer, off, len);
if(bytesRead == len) {
list.add(createMyObject(buffer));
// reset variables
off = 0;
len = 1024;
continue;
}
if(bytesRead == -1) break;
// buffer is not full, adjust size
off += bytesRead;
len -= bytesRead;
}
} catch(IOException io) {
// stream was closed
}
P.S. The code is not tested and is only meant to show how this method can be useful.

You specify the number of bytes to read because you might not want to read the entire file at once, or you might not be able or willing to create a buffer as large as the file.

Compressing and decompressing large size data in java?

I need to compress and decompress different types of files contained in a folder. The size of that folder might be more than 10-11 GB.
I used the following code, but it takes a long time to compress the data.
BufferedReader in = new BufferedReader(new FileReader("D:/ziptest/expansion1.MPQ"));
BufferedOutputStream out = new BufferedOutputStream(
new GZIPOutputStream(new FileOutputStream("test.gz")));
int c;
while ((c = in.read()) != -1)
out.write(c);
in.close();
out.close();
Please suggest a fast compression and decompression library for Java. I also want to split the large file into parts, for example chunks of 100 MB each.

Reader/Writer is only for text; if you try to read binary data with these, it will get corrupted.
Instead I suggest you use FileInputStream. The fastest way to copy the data is to use your own buffer.
InputStream in = new FileInputStream("D:/ziptest/expansion1.MPQ");
OutputStream out = new GZIPOutputStream(
new BufferedOutputStream(new FileOutputStream("test.gz")));
byte[] bytes = new byte[32*1024];
int len;
while((len = in.read(bytes)) > 0)
out.write(bytes, 0, len);
in.close();
out.close();
Since you are reading large chunks of bytes, it is more efficient not to use BufferedInputStream/BufferedOutputStream for them, as this removes one copy. There is a BufferedOutputStream after the GZIPOutputStream because you cannot control the size of the data it produces.
BTW: If you are only reading this with Java, you can use DeflaterOutputStream; it's slightly faster and produces slightly smaller output, but AFAIK it is only supported by Java.
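The same approach can be written with try-with-resources so that the streams are closed even when an exception occurs. This is a sketch only: the class `GzipStreams`, its method names, and the 32 KiB buffer size are assumptions, and it does not cover splitting the output into 100 MB parts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipStreams {
    private static final int BUFFER_SIZE = 32 * 1024; // assumed size; tune by measuring

    /** Compresses raw into compressed. Closing the GZIP stream also closes `compressed`. */
    static void compress(InputStream raw, OutputStream compressed) throws IOException {
        try (GZIPOutputStream gzip = new GZIPOutputStream(compressed, BUFFER_SIZE)) {
            pump(raw, gzip);
        } // close() writes the GZIP trailer; without it the output is incomplete
    }

    /** Decompresses compressed into raw. Closing the GZIP stream also closes `compressed`. */
    static void decompress(InputStream compressed, OutputStream raw) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(compressed, BUFFER_SIZE)) {
            pump(gzip, raw);
        }
        raw.flush();
    }

    /** Block-wise copy: bytes are moved as bytes, never converted to chars. */
    private static void pump(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
}
```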
