According to the Java documentation, the readlimit parameter of the mark method in the class InputStream serves to set "the maximum limit of bytes that can be read before the mark position becomes invalid."
I have a file named sample.txt whose content is "hello", and I wrote this code:
import java.io.*;

public class MarkDemo { // renamed from "InputStream", which shadowed java.io.InputStream and would not compile
    public static void main(String[] args) throws IOException {
        InputStream reader = new FileInputStream("sample.txt");
        BufferedInputStream bis = new BufferedInputStream(reader);
        bis.mark(1);
        bis.read();
        bis.read();
        bis.read();
        bis.read();
        bis.reset();
        System.out.println((char) bis.read());
    }
}
The output is "h". But if I read more than one byte after calling mark, shouldn't I get an error from the invalid reset call?
I would put this down to a documentation error.
The method-level doc for BufferedInputStream.mark is "See the general contract of the mark method of InputStream," which to me indicates that BufferedInputStream does not behave differently, the parameter doc notwithstanding.
And the general contract, as specified by InputStream, is
The readlimit arguments tells this input stream to allow that many bytes to be read before the mark position gets invalidated [...] the stream is not required to remember any data at all if more than readlimit bytes are read from the stream
In other words, readlimit is a suggestion; the stream is free to under-promise and over-deliver.
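This leniency is easy to demonstrate; here is a minimal sketch of the same experiment using an in-memory stream in place of sample.txt:

```java
import java.io.*;

public class LenientMarkDemo {
    public static void main(String[] args) throws IOException {
        BufferedInputStream bis = new BufferedInputStream(
                new ByteArrayInputStream("hello".getBytes()));
        bis.mark(1);            // promise: mark valid for at least 1 byte
        bis.read(); bis.read(); bis.read(); bis.read();  // read 4 bytes anyway
        bis.reset();            // still succeeds: the default 8K buffer holds everything
        System.out.println((char) bis.read());   // prints h
    }
}
```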
If you look at the source, particularly the fill() method, you can see (after a while!) that it only invalidates the mark when it absolutely has to, i.e. it is more tolerant than the documentation might suggest.
...
} else if (pos >= buffer.length) {  /* no room left in buffer */
    if (markpos > 0) {              /* can throw away early part of the buffer */
        int sz = pos - markpos;
        System.arraycopy(buffer, markpos, buffer, 0, sz);
        pos = sz;
        markpos = 0;
    } else if (buffer.length >= marklimit) {
        markpos = -1;               /* buffer got too big, invalidate mark */
        pos = 0;                    /* drop buffer contents */
    }
...
The default buffer size is relatively large (8K), so invalidation won't be triggered in your example.
Looking at the implementation of BufferedInputStream, the JavaDoc of the protected markpos field describes the significance of the mark position:
[markpos is] the value of the pos field at the time the last mark method was called.
This value is always in the range -1 through pos. If there is no marked position in the input stream, this field is -1. If there is a marked position in the input stream, then buf[markpos] is the first byte to be supplied as input after a reset operation. If markpos is not -1, then all bytes from positions buf[markpos] through buf[pos-1] must remain in the buffer array (though they may be moved to another place in the buffer array, with suitable adjustments to the values of count, pos, and markpos); they may not be discarded unless and until the difference between pos and markpos exceeds marklimit.
Hope this helps. Take a peek at the definitions of read, reset and the private method fill in the class to see how it all ties together.
In short, only when the class retrieves more data to fill its buffer will the mark position be taken into account. It will be correctly invalidated if more bytes are read than the call to mark allowed. As a result, calls to read will not necessarily trigger the behaviour advertised in the public JavaDoc comments.
This looks like a subtle bug. If you reduce the buffer size, you'll get an IOException:
public static void main(String[] args) throws IOException {
    InputStream reader = new ByteArrayInputStream(new byte[]{1, 2, 3, 4, 5, 6, 7, 8});
    BufferedInputStream bis = new BufferedInputStream(reader, 3);
    bis.mark(1);
    bis.read();
    bis.read();
    bis.read();
    bis.read();
    bis.reset();
    System.out.println((char) bis.read());
}
Related
I think the only change between the two versions is how the value of i is used, so why does fileInputStream.read() print a different result?
import java.io.*;

public class FileStream_byte1 {
    public static void main(String[] args) throws IOException {
        FileOutputStream fOutputStream = new FileOutputStream("FileStream_byte1.txt");
        fOutputStream.write(-1);
        fOutputStream.close();
        FileInputStream fileInputStream = new FileInputStream("FileStream_byte1.txt");
        System.out.println(" " + fileInputStream.read());
        fileInputStream.close();
    }
}
//The result is 255
import java.io.*;

public class FileStream_byte1 {
    public static void main(String[] args) throws IOException {
        FileOutputStream fOutputStream = new FileOutputStream("FileStream_byte1.txt");
        fOutputStream.write(-1);
        fOutputStream.close();
        FileInputStream fileInputStream = new FileInputStream("FileStream_byte1.txt");
        int i;
        while ((i = fileInputStream.read()) != -1)
            System.out.println(" " + fileInputStream.read());
        fileInputStream.close();
    }
}
//The result is -1
The reason why you read 255 (in the first case) despite writing -1 can be seen in the documentation of OutputStream.write(int) (emphasis mine):
Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored.
FileOutputStream gives no indication of changing that behavior.
Basically, InputStream.read and OutputStream.write(int) use ints to allow the use of unsigned "bytes". They both expect the int to be in the range of 0-255 (the range of a byte). So while you called write(-1) it will only write the "eight low-order bits" of -1 which results in writing 255 to the file.
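A minimal sketch of this round-trip, using in-memory streams instead of a file:

```java
import java.io.*;

public class LowOrderBitsDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(-1);              // only the eight low-order bits (0xFF) are kept
        InputStream in = new ByteArrayInputStream(out.toByteArray());
        System.out.println(in.read());  // 255: the byte comes back as an unsigned value
        System.out.println(in.read());  // -1: end of stream
    }
}
```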
The reason you get -1 in the second case is because you are calling read twice but there is only one byte in the file and you print the result of the second read.
From the documentation of InputStream.read (emphasis mine):
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
You can find that out by reading the documentation, quoting:
public int read() throws IOException
Reads a byte of data from this input stream. This method blocks if no input is yet available.
Specified by: read in class InputStream.
Returns: the next byte of data, or -1 if the end of the file is reached.
Throws: IOException - if an I/O error occurs.
It returns -1 because it reaches the end of the file.
In your while loop, the condition reads the byte you wrote without printing it; the println then calls read again, finds no more data, and gets -1, which is what it prints before the loop condition fails and the method exits.
In your code you are calling read twice: the first call returns 255, and the next returns -1, indicating that the stream has ended.
Try:
while ((i = fileInputStream.read()) != -1)
    System.out.println(" " + i);
The BufferedInputStream#mark(int) function takes as its argument the limit of bytes that can be read; once that many bytes have been read, the mark becomes invalidated.
In the OCP book mark(int) is described as:
...you can call mark(int) with a read-ahead limit value. You can then
read as many bytes as you want up to the limit value.
So the code below sets the limit value to 1 byte; after reading that byte, the mark should be invalidated, and calling .reset() should throw a RuntimeException, yet that is not happening. Is the JVM somehow overriding the argument passed to the mark function?
import java.io.*;
import java.nio.file.*;

public class Main {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("Java8_IOandNIO\\src\\main\\resources\\abcd.txt");
        File f = new File(path.toString());
        FileInputStream fis = new FileInputStream(f);
        BufferedInputStream bis = new BufferedInputStream(fis);
        bis.mark(1);
        System.out.println((char) bis.read());
        System.out.println((char) bis.read());
        System.out.println((char) bis.read());
        bis.reset();
        System.out.println("called reset");
        System.out.println((char) bis.read());
        System.out.println((char) bis.read());
        System.out.println((char) bis.read());
    }
}
Each time, the code prints the data from the sample file:
A
B
C
called reset
A
B
C
Well, the documentation (original contract from InputStream) states:
If the method mark has not been called since the stream was created, or the number of bytes read from the stream since mark was last called is larger than the argument to mark at that last call, then an IOException might be thrown.
(Emphasis mine)
This means that the limit is a recommendation. It is not mandatory that the mark will be invalidated after that number of bytes have been read.
Because:
the OCP you quoted doesn't say anything about throwing a RuntimeException;
it doesn't say you can't necessarily read more;
the OCP isn't a normative reference;
the real normative reference doesn't say so either; and
the stream is buffered, so it can support a mark of up to its internal buffer size, which is currently 8192 bytes.
I'm reading about buffered streams. I searched and found many answers that clarified my concepts, but I still have a few more questions.
After searching, I have come to know that a buffer is temporary memory (RAM) which helps a program read data quickly instead of from the hard disk, and that when the buffer is empty the native input API is called.
After reading a little more, I got this answer from here:
Reading data from disk byte-by-byte is very inefficient. One way to
speed it up is to use a buffer: instead of reading one byte at a time,
you read a few thousand bytes at once, and put them in a buffer, in
memory. Then you can look at the bytes in the buffer one by one.
I have two confusions.
1: How, and by whom, are the buffers filled? (The native API, but how?) As quoted above, who fills those thousands of bytes at once? And won't it consume the same time? Suppose I have 5MB of data, and the 5MB is loaded into the buffer in 5 seconds, and then the program uses this data from the buffer in 5 seconds: 10 seconds total. But if I skip buffering, the program reads directly from the hard disk at 1MB/2sec, the same 10 seconds total. Please clear up this confusion.
2: Second, how does this line work?
BufferedReader inputStream = new BufferedReader(new FileReader("xanadu.txt"));
As I'm thinking, FileReader writes data to a buffer, and then BufferedReader reads data from that buffer memory? Please also explain this.
Thanks.
As for the performance of using buffering during read/write, its impact is probably minimal since the OS will cache too; however, buffering reduces the number of calls to the OS, which does have an impact.
When you add other operations on top, such as character encoding/decoding or compression/decompression, the impact is greater as those operations are more efficient when done in blocks.
Your second question said:
As I'm thinking FileReader write data to buffer, then BufferedReader read data from buffer memory? Also explain this.
I believe your thinking is wrong. Yes, technically the FileReader will write data to a buffer, but the buffer is not defined by the FileReader, it's defined by the caller of the FileReader.read(buffer) method.
The operation is initiated from outside, when some code calls BufferedReader.read() (any of the overloads). BufferedReader will then check its buffer, and if enough data is available in the buffer, it will return the data without involving the FileReader. If more data is needed, the BufferedReader will call the FileReader.read(buffer) method to get the next chunk of data.
It's a pull operation, not a push, meaning the data is pulled out of the readers by the caller.
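To illustrate the pull model, here is a toy buffering wrapper; the class name and structure are my own simplification, not the real BufferedReader:

```java
import java.io.*;

// Toy illustration of the pull model: the caller pulls characters from the
// wrapper, and the wrapper pulls a whole chunk from the underlying Reader
// only when its buffer runs dry.
class ToyBufferedReader {
    private final Reader in;
    private final char[] buf = new char[8192];
    private int pos = 0, count = 0;

    ToyBufferedReader(Reader in) { this.in = in; }

    int read() throws IOException {
        if (pos >= count) {                       // buffer exhausted: refill it
            count = in.read(buf, 0, buf.length);  // one bulk call to the source
            pos = 0;
            if (count == -1) return -1;           // underlying stream is done
        }
        return buf[pos++];
    }
}

public class PullDemo {
    public static void main(String[] args) throws IOException {
        ToyBufferedReader r = new ToyBufferedReader(new StringReader("hi"));
        System.out.print((char) r.read());   // h
        System.out.print((char) r.read());   // i
        System.out.println(r.read());        // -1 at end of stream
    }
}
```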
All the work is done by a private method named fill(), which I include here for educational purposes; any Java IDE will let you see the source code yourself:
private void fill() throws IOException {
    int dst;
    if (markedChar <= UNMARKED) {
        /* No mark */
        dst = 0;
    } else {
        /* Marked */
        int delta = nextChar - markedChar;
        if (delta >= readAheadLimit) {
            /* Gone past read-ahead limit: Invalidate mark */
            markedChar = INVALIDATED;
            readAheadLimit = 0;
            dst = 0;
        } else {
            if (readAheadLimit <= cb.length) {
                /* Shuffle in the current buffer */
                // here copy the read chars in a memory buffer named cb
                System.arraycopy(cb, markedChar, cb, 0, delta);
                markedChar = 0;
                dst = delta;
            } else {
                /* Reallocate buffer to accommodate read-ahead limit */
                char ncb[] = new char[readAheadLimit];
                System.arraycopy(cb, markedChar, ncb, 0, delta);
                cb = ncb;
                markedChar = 0;
                dst = delta;
            }
            nextChar = nChars = delta;
        }
    }
    int n;
    do {
        n = in.read(cb, dst, cb.length - dst);
    } while (n == 0);
    if (n > 0) {
        nChars = dst + n;
        nextChar = dst;
    }
}
Dude, I'm using the following code to read a large file (2MB or more) and do some business logic with the data.
I have to read 128 bytes for each data read call.
At first I used this code (no problem, works well):
InputStream is; // = something...
int read = -1;
byte[] buff = new byte[128];
while (true) {
    for (int idx = 0; idx < 128; idx++) {
        read = is.read();
        if (read == -1) { return; } // end of stream
        buff[idx] = (byte) read;
    }
    process_data(buff);
}
Then I tried this code, with which problems appeared (errors! weird responses sometimes):
InputStream is; // = something...
int read = -1;
byte[] buff = new byte[128];
while (true) {
    // ERROR! Java doesn't read 128 bytes even though they're available
    if ((read = is.read(buff, 0, 128)) == 128) { process_data(buff); } else { return; }
}
The above code doesn't work all the time; I'm sure the data is available, but read sometimes reads 127, 125, or 123 bytes. What is the problem?
I also found that DataInputStream#readFully(buff:byte[]):void works too, but I'm just wondering why the second solution doesn't fill the array while the data is available.
Thanks buddy.
Consulting the javadoc for FileInputStream (I'm assuming since you're reading from file):
Reads up to len bytes of data from this input stream into an array of bytes. If len is not zero, the method blocks until some input is available; otherwise, no bytes are read and 0 is returned.
The key here is that the method only blocks until some data is available. The returned value gives you how many bytes was actually read. The reason you may be reading less than 128 bytes could be due to a slow drive/implementation-defined behavior.
For a proper read sequence, you should check that read() does not return -1 (end of stream) and keep writing into the buffer until the desired amount of data has been read.
Example of a proper implementation of your code:
InputStream is; // = something...
int read;
int read_total;
byte[] buf = new byte[128];
// Infinite loop
while (true) {
    read_total = 0;
    // Repeatedly read until the buffer is full or end of stream,
    // offsetting each read at the last position written in the array
    while ((read = is.read(buf, read_total, buf.length - read_total)) != -1) {
        // Add the amount read to the running total
        read_total = read_total + read;
        // Stop once read_total reaches the buffer length (128)
        if (read_total == buf.length) {
            break;
        }
    }
    if (read_total != buf.length) {
        // Incomplete read before 128 bytes: end of stream reached
        break;
    } else {
        process_data(buf);
    }
}
Edit:
Don't try to use available() as an indicator of data availability (sounds weird, I know); again, from the javadoc:
Returns an estimate of the number of remaining bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. Returns 0 when the file position is beyond EOF. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
In some cases, a non-blocking read (or skip) may appear to be blocked when it is merely slow, for example when reading large files over slow networks.
The key word there is estimate; don't work with estimates.
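As the question already notes, DataInputStream#readFully is the classic way to guarantee a completely filled buffer; here is a minimal sketch using an in-memory stream:

```java
import java.io.*;

public class ReadFullyDemo {
    public static void main(String[] args) throws IOException {
        byte[] source = new byte[300];
        for (int i = 0; i < source.length; i++) source[i] = (byte) i;

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(source));
        byte[] buff = new byte[128];
        in.readFully(buff);                    // blocks until all 128 bytes are read
        System.out.println((int) buff[127]);   // 127: last byte of the first chunk
        in.readFully(buff);                    // next 128 bytes
        System.out.println(buff[0] & 0xFF);    // 128: first byte of the second chunk
        // A third readFully would throw EOFException: only 44 bytes remain.
    }
}
```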
Since the accepted answer was provided, a new option has become available: Java 9 added InputStream.readNBytes(byte[], int, int), and Java 11 added readNBytes(int), both of which eliminate the need for the programmer to write a read loop. For example, your method could look like:
public static void some_method(String filename) throws IOException {
    // take the file name as a parameter (the original used args[1], which isn't in scope here)
    InputStream is = new FileInputStream(filename);
    byte[] buff = new byte[128];
    while (true) {
        int numRead = is.readNBytes(buff, 0, buff.length);
        if (numRead == 0) {
            break;
        }
        // The last read before end-of-stream may read fewer than 128 bytes.
        process_data(buff, numRead);
    }
}
or the slightly simpler
public static void some_method(String filename) throws IOException {
    InputStream is = new FileInputStream(filename);
    while (true) {
        byte[] buff = is.readNBytes(128);
        if (buff.length == 0) {
            break;
        }
        // The last read before end-of-stream may read fewer than 128 bytes.
        process_data(buff);
    }
}
A BufferedInputStream that I have isn't marking correctly. This is my code:
public static void main(String[] args) throws Exception {
    byte[] b = "HelloWorld!".getBytes();
    BufferedInputStream bin = new BufferedInputStream(new ByteArrayInputStream(b));
    bin.mark(3);
    while (true) {
        byte[] buf = new byte[4096];
        int n = bin.read(buf);
        if (n == -1) break;
        System.out.println(n);
        System.out.println(new String(buf, 0, n));
    }
}
This is outputting:
11
HelloWorld!
I want it to output
3
Hel
8
loWorld!
I also tried the code with just a pure ByteArrayInputStream as bin, and it didn't work either.
I think you're misunderstanding what mark does.
The purpose of mark is to cause the stream to remember its current position, so you can return to it later using reset(). The argument isn't how many bytes will be read next -- it's how many bytes you'll be able to read afterward before the mark is considered invalid (ie: you won't be able to reset() back to it; you'll either get an exception or end up at the start of the stream instead).
See the docs on InputStream for details. Readers' mark methods work quite similarly.
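A short sketch of what mark/reset actually does, remembering a position and returning to it later, using an in-memory stream for illustration:

```java
import java.io.*;

public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(
                new ByteArrayInputStream("HelloWorld!".getBytes()));
        in.mark(16);                  // remember the current position (the start)
        byte[] first = new byte[3];
        in.read(first, 0, 3);
        System.out.println(new String(first));   // Hel
        in.reset();                   // jump back to the marked position
        byte[] again = new byte[3];
        in.read(again, 0, 3);
        System.out.println(new String(again));   // Hel again
    }
}
```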
That's not what mark() does. You need to re-read the documentation. Mark lets you go backward through the stream.