Dude, I'm using the following code to read a large file (2 MB or more) and do some processing on the data.
I have to read 128 bytes on each read call.
At first I used this code (no problems, works well):
InputStream is;//= something...
int read=-1;
byte[] buff=new byte[128];
while(true){
for(int idx=0;idx<128;idx++){
read=is.read(); if(read==-1){return;}//end of stream
buff[idx]=(byte)read;
}
process_data(buff);
}
Then I tried this code, and problems appeared (errors, and sometimes weird responses):
InputStream is;//= something...
int read=-1;
byte[] buff=new byte[128];
while(true){
//ERROR! java doesn't read 128 bytes while it's available
if((read=is.read(buff,0,128))==128){process_data(buff);}else{return;}
}
The above code doesn't work all the time. I'm sure that enough data is available, but read() sometimes returns 127, 125, or 123. What is the problem?
I also found code that uses DataInputStream#readFully(byte[]), which works too, but I'm wondering why the second solution doesn't fill the array even though the data is available.
Thanks buddy.
Consulting the javadoc for FileInputStream (I'm assuming that's what you have, since you're reading from a file):
Reads up to len bytes of data from this input stream into an array of bytes. If len is not zero, the method blocks until some input is available; otherwise, no bytes are read and 0 is returned.
The key here is that the method only blocks until some data is available. The returned value tells you how many bytes were actually read. You may get fewer than 128 bytes because of a slow drive or implementation-defined behavior.
For a proper read sequence, you should check that read() does not return -1 (end of stream) and keep writing into the buffer until the required amount of data has been read.
Example of a proper implementation of your code:
InputStream is; // = something...
int read;
int read_total;
byte[] buf = new byte[128];
// Loop until the end of the stream
while(true){
    read_total = 0;
    // Repeatedly perform reads until the buffer is full or end of stream, offsetting at the last read position in the array
    while((read = is.read(buf, read_total, buf.length - read_total)) != -1){
        // Add the amount just read to the running total.
        read_total = read_total + read;
        // Break once read_total reaches the buffer length (128)
        if(read_total == buf.length){
            break;
        }
    }
    if(read_total != buf.length){
        // Incomplete read before 128 bytes: end of stream was reached, so stop
        break;
    }else{
        process_data(buf);
    }
}
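To see the short-read behavior in isolation, here is a minimal sketch. TricklingInputStream and fill are hypothetical names made up for this illustration; the wrapper deliberately returns at most 3 bytes per call to imitate a slow source, and fill is the same kind of loop as above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ShortReadDemo {

    /** Hypothetical stream that hands out at most 3 bytes per read call. */
    static final class TricklingInputStream extends InputStream {
        private final InputStream source;

        TricklingInputStream(InputStream source) {
            this.source = source;
        }

        @Override
        public int read() throws IOException {
            return source.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Deliberately return fewer bytes than requested.
            return source.read(b, off, Math.min(len, 3));
        }
    }

    /**
     * Reads until buf is full or the stream ends. Returns the number of
     * bytes stored; it is less than buf.length only at end of stream.
     */
    static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new TricklingInputStream(new ByteArrayInputStream(new byte[300]));
        byte[] buf = new byte[128];
        System.out.println("single read: " + in.read(buf, 0, buf.length)); // prints "single read: 3"
        System.out.println("looped read: " + fill(in, buf));               // prints "looped read: 128"
    }
}
```

A single read() is allowed to stop after 3 bytes even though 300 are available; only the loop guarantees a full buffer.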
Edit:
Don't try to use available() as an indicator of data availability (sounds weird, I know). Again, from the javadoc:
Returns an estimate of the number of remaining bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. Returns 0 when the file position is beyond EOF. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
In some cases, a non-blocking read (or skip) may appear to be blocked when it is merely slow, for example when reading large files over slow networks.
The key word there is "estimate". Don't work with estimates.
Since the accepted answer was provided, a new option has become available. The InputStream class now has two methods named readNBytes that eliminate the need for the programmer to write a read loop: readNBytes(byte[], int, int) since Java 9 and readNBytes(int) since Java 11. For example, your method could look like this:
public static void some_method(String[] args) throws IOException {
    // try-with-resources closes the stream when the loop is done
    try (InputStream is = new FileInputStream(args[1])) {
        byte[] buff = new byte[128];
        while (true) {
            int numRead = is.readNBytes(buff, 0, buff.length);
            if (numRead == 0) {
                break;
            }
            // The last read before end-of-stream may read fewer than 128 bytes.
            process_data(buff, numRead);
        }
    }
}
or the slightly simpler
public static void some_method(String[] args) throws IOException {
    // try-with-resources closes the stream when the loop is done
    try (InputStream is = new FileInputStream(args[1])) {
        while (true) {
            byte[] buff = is.readNBytes(128);
            if (buff.length == 0) {
                break;
            }
            // The last read before end-of-stream may read fewer than 128 bytes.
            process_data(buff);
        }
    }
}
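As a quick check of that behavior, the following sketch runs the readNBytes loop over an in-memory stream of 300 bytes and records the size of each chunk. ReadNBytesDemo and chunkSizes are hypothetical names for this illustration, and Java 9 or later is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class ReadNBytesDemo {

    /** Reads the stream in chunks of chunkSize and returns the size of each chunk. */
    static List<Integer> chunkSizes(InputStream in, int chunkSize) throws IOException {
        List<Integer> sizes = new ArrayList<>();
        byte[] buff = new byte[chunkSize];
        while (true) {
            // Blocks until buff is full or end of stream is reached.
            int numRead = in.readNBytes(buff, 0, buff.length);
            if (numRead == 0) {
                break;
            }
            sizes.add(numRead);
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // prints [128, 128, 44]
        System.out.println(chunkSizes(new ByteArrayInputStream(new byte[300]), 128));
    }
}
```

Only the final chunk is shorter than 128 bytes, which is why the processing method needs to know how many bytes are valid.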
Related
I'm reading about buffered streams. I searched and found many answers that cleared up the concepts, but I still have a few questions.
After searching, I have learned that a buffer is temporary memory (RAM) that lets a program read data quickly instead of going to the hard disk each time, and that the native input API is called when the buffer is empty.
After reading a little more, I found this answer:
Reading data from disk byte-by-byte is very inefficient. One way to
speed it up is to use a buffer: instead of reading one byte at a time,
you read a few thousand bytes at once, and put them in a buffer, in
memory. Then you can look at the bytes in the buffer one by one.
I have two points of confusion.
1: How is the buffer filled, and by whom (the native API, and how)? In the quote above, who reads the few thousand bytes at once? Won't that take the same time? Suppose I have 5 MB of data, it is loaded into the buffer in 5 seconds, and then the program uses the data from the buffer in another 5 seconds: 10 seconds in total. But if I skip buffering, the program gets the data directly from the hard disk at 1 MB per 2 seconds, which is also 10 seconds in total. Please clear up this confusion.
2: How does this line work?
BufferedReader inputStream = new BufferedReader(new FileReader("xanadu.txt"));
My thinking is that FileReader writes data to the buffer, and then BufferedReader reads data from the buffer memory. Please explain this too.
Thanks.
As for performance, buffering probably has little effect on the raw disk I/O, since the OS caches too. However, buffering reduces the number of calls to the OS, and that does have an impact.
When you add other operations on top, such as character encoding/decoding or compression/decompression, the impact is greater as those operations are more efficient when done in blocks.
Your second question said:
My thinking is that FileReader writes data to the buffer, and then BufferedReader reads data from the buffer memory. Please explain this too.
I believe your thinking is wrong. Yes, technically the FileReader will write data to a buffer, but the buffer is not defined by the FileReader, it's defined by the caller of the FileReader.read(buffer) method.
The operation is initiated from outside, when some code calls BufferedReader.read() (any of the overloads). BufferedReader will then check its buffer, and if enough data is available in the buffer, it will return the data without involving the FileReader. If more data is needed, the BufferedReader will call the FileReader.read(buffer) method to get the next chunk of data.
It's a pull operation, not a push, meaning the data is pulled out of the readers by the caller.
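The pull behavior can be observed with a small sketch. CountingReader is a hypothetical wrapper written for this illustration; it counts how often BufferedReader asks the underlying reader for data, and a StringReader stands in for the FileReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class PullDemo {

    /** Hypothetical wrapper that counts requests made to the underlying reader. */
    static final class CountingReader extends Reader {
        private final Reader source;
        int calls = 0;

        CountingReader(Reader source) {
            this.source = source;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            calls++;
            return source.read(cbuf, off, len);
        }

        @Override
        public void close() throws IOException {
            source.close();
        }
    }

    /** Reads text one char at a time; returns {chars read, calls to the underlying reader}. */
    static int[] readAll(String text) throws IOException {
        CountingReader counting = new CountingReader(new StringReader(text));
        int chars = 0;
        try (BufferedReader br = new BufferedReader(counting)) {
            while (br.read() != -1) {
                chars++;
            }
        }
        return new int[] { chars, counting.calls };
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append('x');
        }
        int[] result = readAll(sb.toString());
        System.out.println(result[0] + " chars read, " + result[1] + " calls to the underlying reader");
    }
}
```

The caller makes a thousand read() calls, but the wrapped reader should only be asked for data a handful of times, each time because the BufferedReader found its buffer empty.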
All the work is done by a private method named fill(). I'm including it here for educational purposes, but any Java IDE lets you see the source code yourself:
private void fill() throws IOException {
int dst;
if (markedChar <= UNMARKED) {
/* No mark */
dst = 0;
} else {
/* Marked */
int delta = nextChar - markedChar;
if (delta >= readAheadLimit) {
/* Gone past read-ahead limit: Invalidate mark */
markedChar = INVALIDATED;
readAheadLimit = 0;
dst = 0;
} else {
if (readAheadLimit <= cb.length) {
/* Shuffle in the current buffer */
// here copy the read chars in a memory buffer named cb
System.arraycopy(cb, markedChar, cb, 0, delta);
markedChar = 0;
dst = delta;
} else {
/* Reallocate buffer to accommodate read-ahead limit */
char ncb[] = new char[readAheadLimit];
System.arraycopy(cb, markedChar, ncb, 0, delta);
cb = ncb;
markedChar = 0;
dst = delta;
}
nextChar = nChars = delta;
}
}
int n;
do {
n = in.read(cb, dst, cb.length - dst);
} while (n == 0);
if (n > 0) {
nChars = dst + n;
nextChar = dst;
}
}
I want to read from a network stream and write the bytes directly to a file.
But every time I run the program, only a few bytes are actually written to the file.
Java:
InputStream in = uc.getInputStream();
int clength=uc.getContentLength();
byte[] barr = new byte[clength];
int offset=0;
int totalwritten=0;
int i;
int wrote=0;
OutputStream out = new FileOutputStream("file.xlsx");
while(in.available()!=0) {
wrote=in.read(barr, offset, clength-offset);
out.write(barr, offset, wrote);
offset+=wrote;
totalwritten+=wrote;
}
System.out.println("Written: "+totalwritten+" of "+clength);
out.flush();
That's because available() doesn't do what you think it does. Read its API documentation. You should simply read until the number of bytes read, returned by read(), is -1. Or even simpler, use Files.copy():
Files.copy(in, new File("file.xlsx").toPath());
Using a buffer that has the size of the input stream also pretty much defeats the purpose of using a buffer, which is to only have a few bytes in memory.
If you want to reimplement copy(), the general pattern is the following:
byte[] buffer = new byte[4096]; // number of bytes in memory
int numberOfBytesRead;
while ((numberOfBytesRead = in.read(buffer)) >= 0) {
out.write(buffer, 0, numberOfBytesRead);
}
You're using .available() wrong. From Java documentation:
available() returns an estimate of the number of bytes that can be read
(or skipped over) from this input stream without blocking by the next
invocation of a method for this input stream
That means that the first time your stream is slower than your file writing speed (very soon, in all probability), available() returns 0 and the while loop ends.
You should either prepare a thread that waits for the input until it has read all the expected content length (with a sizable timeout, of course) or just block your program in the wait, if user interaction is not a big deal.
On the client side, the read code is:
byte[] bytes = new byte[50]; //TODO should reuse buffer, for test only
ByteBuffer dst = ByteBuffer.wrap(bytes);
int ret = 0;
int readBytes = 0;
boolean fail = false;
try {
while ((ret = socketChannel.read(dst)) > 0) {
readBytes += ret;
System.out.println("read " + ret + " bytes from socket " + dst);
if (!dst.hasRemaining()) {
break;
}
}
int pos = dst.position();
byte[] data = new byte[pos];
dst.flip();
dst.get(data);
System.out.println("read data: " + StringUtil.toHexString(data));
} catch (Exception e) {
fail = true;
handler.onException(e);
}
The problem is that socketChannel.read() always returns a positive value. I checked the returned buffer, and the data is duplicated N times, as if the position of the low-level socket buffer does not move forward. Any ideas?
If the server only returned 48 bytes, your code must have blocked in the read() method trying to get the 49th and 50th bytes. So either your '50' is wrong or you will have to restructure your code to read and process whatever you get as you get it rather than trying to fill buffers first. And this can't possibly be the code where you think you always got the same data. The explanation for that would be failure to compact the buffer after the get, if you reuse the same buffer for the next read, which you should do, but your posted code doesn't do.
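For reference, reusing one ByteBuffer across reads usually follows a read, flip, get, compact cycle. The sketch below is only an illustration under assumptions: CompactDemo and drain are hypothetical names, an in-memory channel stands in for the SocketChannel (so it shows the blocking-style flow only), and the consumer deliberately takes at most 5 bytes per pass so that compact() has leftovers to preserve.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

class CompactDemo {

    /** Drains a channel through one reused 8-byte buffer, consuming at most 5 bytes per pass. */
    static byte[] drain(ReadableByteChannel channel) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Keep going while the channel has data or unconsumed bytes remain in the buffer.
        while (channel.read(buf) != -1 || buf.position() > 0) {
            buf.flip();                          // switch from writing into buf to reading from it
            int n = Math.min(buf.remaining(), 5);
            byte[] chunk = new byte[n];
            buf.get(chunk);                      // consume part of the data
            out.write(chunk, 0, n);
            buf.compact();                       // keep the unread bytes, ready for the next read
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] input = new byte[20];
        for (int i = 0; i < input.length; i++) {
            input[i] = (byte) i;
        }
        byte[] result = drain(Channels.newChannel(new ByteArrayInputStream(input)));
        System.out.println(result.length + " bytes drained");
    }
}
```

Without the compact() (or clear()) call, the next read would write after stale data and the same bytes would be seen again, which matches the duplicated data described in the question.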
1 : This might not be a bug !
[assuming that there is readable data in the buffer]...
You would expect a -1 at the end of the stream... See http://docs.oracle.com/javase/1.4.2/docs/api/java/nio/channels/SocketChannel.html#read%28java.nio.ByteBuffer%29
If you are continually receiving a positive value from the read() call, then you will need to determine why data is being read continually.
Of course, the mystery ultimately lies in the source data (i.e. the SocketChannel you are reading from).
2: Explanation of your possible problems
If your socket channel is backed by a REAL file, which is finite, then the file is probably just very big, and the read() operation will eventually return -1.
If, on the other hand, your socket channel is listening to a source of data which you EXPECT to be finite (i.e. a serialized object stream, for example), I would double check the source --- maybe your finite stream is simply producing more and more data... and you are correctly consuming it.
3: Finally some advice
A trick for debugging this type of error is to play with the ByteBuffer you pass to your read method: the nice thing about java.nio's ByteBuffers is that, since they are more object-oriented than the older byte[] writers, you can get very fine-grained debugging of their operations (position, limit, remaining).
I am using DataInputStream to read some bytes from a socket. I have an expected number of bytes to read from the stream (after decoding a header, I know how many bytes are in the message). It works 99% of the time, but occasionally the number of bytes read is less than len.
int numRead = dis.read(buffer, 0, len);
What could cause numRead to be less than len? It's not -1. I would expect the behavior of read to block until the stream is closed or EOF is reached, but if it's a socket underlying the streams this shouldn't happen unless the socket closes, right?
Is there a way of reading bytes from a socket that will always ensure that you read len bytes?
Thanks
EDIT: For a general stream, you just keep reading until you've read everything you want to, basically. For an implementation of DataInput (such as DataInputStream) you should use readFully, as suggested by Peter Lawrey. Consider the rest of this answer to be relevant for the general case where you just have an InputStream.
It's entirely reasonable for an InputStream of any type to give you less data than you asked for, even if more is on its way. You should always code for this possibility - with the possible exception of ByteArrayInputStream. (That can still return less data than was requested, of course, if there's less data left than you asked for.)
Here's the sort of loop I'm talking about:
byte[] data = new byte[messageSize];
int totalRead = 0;
while (totalRead < messageSize) {
int bytesRead = stream.read(data, totalRead, messageSize - totalRead);
if (bytesRead < 0) {
// Change behaviour if this isn't an error condition
throw new IOException("Data stream ended prematurely");
}
totalRead += bytesRead;
}
You can use DataInputStream this way.
byte[] bytes = new byte[len];
dis.readFully(bytes);
This will either return with all the data read or throw an IOException.
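As a small illustration of that contract, here is a sketch over an in-memory stream (ReadFullyDemo and canReadFully are hypothetical names). When the stream ends before the array is full, readFully throws EOFException, which is a subclass of IOException.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

class ReadFullyDemo {

    /** Returns true if exactly len bytes could be read, false if the stream ended first. */
    static boolean canReadFully(byte[] source, int len) throws IOException {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(source))) {
            byte[] bytes = new byte[len];
            dis.readFully(bytes); // blocks until bytes is full, or throws
            return true;
        } catch (EOFException e) {
            // The stream ended before len bytes were available.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canReadFully(new byte[10], 10)); // prints true
        System.out.println(canReadFully(new byte[10], 11)); // prints false
    }
}
```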
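read returns each time with whatever bytes were available at that moment, and -1 when done. You are typically supposed to do something like this (total is assumed here to be a ByteArrayOutputStream that collects the data):
ByteArrayOutputStream total = new ByteArrayOutputStream();
while (true) {
    int numRead = dis.read(buffer, 0, len);
    if (numRead == -1) break; // end of stream
    total.write(buffer, 0, numRead); // keep only the bytes actually read
}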
I would expect the behavior of read to
block until the stream is closed or
EOF is reached.
Then you need to check the Javadocs. The contract of read() is that it will read at least one byte, blocking if necessary until it has done so, or until EOS or an exception occurs. There is nothing in the specification that says it will read the entire length you requested. That's why it returns a length.
Is there a way of reading bytes from a socket that will always ensure that you read len bytes?
You can use IOUtils from Apache Commons IO. It has a method that does exactly what you need:
byte[] buf = new byte[BUFFER_SIZE];
int length;
do {
length = IOUtils.read(inputStream, buf);
if (length > 0) {
//do something with buf
}
} while (length == BUFFER_SIZE);
Why does the following method hang?
public void pipe(Reader in, Writer out) {
CharBuffer buf = CharBuffer.allocate(DEFAULT_BUFFER_SIZE);
while( in.read(buf) >= 0 ) {
out.append(buf.flip());
}
}
Answering my own question: you have to call buf.clear() between reads. Presumably, read is hanging because the buffer is full. The correct code is
public void pipe(Reader in, Writer out) throws IOException {
    CharBuffer buf = CharBuffer.allocate(DEFAULT_BUFFER_SIZE);
    while( in.read(buf) >= 0 ) {
        out.append(buf.flip());
        buf.clear(); // reset position and limit so the next read can fill the buffer
    }
}
I would assume that it is a deadlock. The in.read(buf) locks the CharBuffer and prevents the out.append(buf) call.
That is assuming that CharBuffer uses locks (of some kind) in its implementation. What does the API say about the class CharBuffer?
Edit: Sorry, some kind of short circuit in my brain... I confused it with something else.
CharBuffers don't work with Readers and Writers as cleanly as you might expect. In particular, there is no Writer.append(CharBuffer buf) method. The method called by the question snippet is Writer.append(CharSequence seq), which just calls seq.toString(). The CharBuffer.toString() method does return the string value of the buffer, but it doesn't drain the buffer. The subsequent call to Reader.read(CharBuffer buf) gets an already full buffer and therefore returns 0, forcing the loop to continue indefinitely.
Though this feels like a hang, it is in fact appending the first read's buffer contents to the writer every pass through the loop. So you'll either start to see a lot of output in your destination or the writer's internal buffer will grow, depending on how the writer is implemented.
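The claim that toString() does not drain the buffer is easy to check in isolation. In this sketch (CharBufferToStringDemo and remainingAroundToString are hypothetical names), remaining() is sampled before and after the call.

```java
import java.nio.CharBuffer;

class CharBufferToStringDemo {

    /** Returns {remaining before toString(), remaining after toString()}. */
    static int[] remainingAroundToString(String text) {
        CharBuffer buf = CharBuffer.allocate(16);
        buf.put(text);
        buf.flip();                  // prepare the buffer for reading
        int before = buf.remaining();
        String s = buf.toString();   // copies the remaining chars, position is unchanged
        int after = buf.remaining();
        if (!s.equals(text)) {
            throw new IllegalStateException("unexpected content: " + s);
        }
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] r = remainingAroundToString("hello");
        System.out.println(r[0] + " -> " + r[1]); // prints "5 -> 5"
    }
}
```

Because the position never advances, the buffer has to be reset with clear() (or its position moved explicitly) before it is handed back to the reader.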
As annoying as it is, I'd recommend a char[] implementation if only because the CharBuffer solution winds up building at least two new char[] every pass through the loop.
public void pipe(Reader in, Writer out) throws IOException {
char[] buf = new char[DEFAULT_BUFFER_SIZE];
int count = in.read(buf);
while( count >= 0 ) {
out.write(buf, 0, count);
count = in.read(buf);
}
}
I'd recommend only using this if you need to support converting between two character encodings, otherwise a ByteBuffer/Channel or byte[]/IOStream implementation would be preferable even if you're piping characters.