How can I read a file multiple times in Java

This is my understanding regarding reading a file using BufferedReader in Java. Please correct me if I am wrong somewhere...
Recently I had a requirement where we are required to read a file multiple times.
The usual way I use is setting a mark() and doing a reset(). But the input parameter to mark() is an int, so it cannot accept a long value. Is there a way in which we can read the file a large number of times?
In C++ we can do a seekg on an fstream and read the contents once again, irrespective of the number of times we want to do so. Is there anything in Java of this nature?

Just close the file and read it again.
But review your requirement. Why can't you process it in one pass?
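A minimal sketch of the close-and-reopen approach (the file name and per-pass processing are placeholders):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class MultiPassRead {
    public static void main(String[] args) throws IOException {
        int passes = 3; // however many passes you need
        for (int pass = 0; pass < passes; pass++) {
            // Open a fresh reader for every pass; try-with-resources closes it afterwards.
            try (BufferedReader reader = new BufferedReader(new FileReader("data.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // process the line for this pass
                    System.out.println("pass " + pass + ": " + line);
                }
            }
        }
    }
}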

Not much of a good answer, but if you want to do random reading and writing then you can use Channels in the java.nio package.
BufferedReader is for reading a file when you logically see it as a series of records, and records are generally accessed sequentially.
Channels allow you to view your file as a series of blocks. Blocks are meant to be read randomly. :)
Using a subclass of Channel, FileChannel, you can read what you want from wherever you want. You need to specify two things:
Where to read from.
How much to read.
It has a read(dst, pstn) method, where dst is a ByteBuffer and pstn is a long position.
Don't worry that FileChannel is abstract; you obtain an instance via FileChannel.open() (Files.newByteChannel() also returns a channel backed by the file), which does all the voodoo needed to make it work :)
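A minimal sketch of such a positional read (the file name, offset and buffer size are made up; FileChannel.open() is used here to obtain a FileChannel directly):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class PositionalRead {
    public static void main(String[] args) throws IOException {
        try (FileChannel channel = FileChannel.open(Paths.get("data.bin"), StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(128); // how much to read
            long position = 1024L;                        // where to read from
            int bytesRead = channel.read(buffer, position); // does not move the channel's own position
            buffer.flip(); // prepare the buffer for reading what was just filled
            System.out.println("read " + bytesRead + " bytes at offset " + position);
        }
    }
}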

Related

Confused about inputstreams and reading from files

I tried to understand the logic behind inputstreams and reading from files, but I fail to understand how you can read from a file using an inputstream.
My understanding is that when using input devices like a keyboard, you send input data through the input stream to the system. If you are reading from an input stream, aren't you reading the input data that's being sent to the system at that time?
If we are creating an inputstream with the following code:
FileInputStream test = new FileInputStream("loremipsum.txt");
And if we try to read from the newly created input stream with test.read(), how is there any data flowing through the input stream? No input data is being entered from an input device at that moment; it was all entered way beforehand. Is there something I'm missing? It almost seems to me that input streams are used in two different ways: Java using input streams to read data from a source, and input devices using them to send data to a source.
Java streams are a general concept / interface - a stream of data that you need to open, then read the data from (or write data to for output streams), then close. The basic stream only supports sequential reading / writing, no random access. Also, the data may or may not be readily available when you attempt to read from the stream, so the read may or may not block.
This abstraction allows us to use the same approach regardless of where we read the data from - it might be a keyboard, a file, a network connection, output from another program, or even some kind of generator that produces an endless sequence of data. Simply put, reading input from a file behaves the same as if someone in the background had opened the file and typed its content on the keyboard really fast.
There are ways in Java to read a file in other ways (e.g. random access instead of sequential), but if you need to read the file from start to end, streams are a useful abstraction.
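For instance, the same read loop works whether the bytes come from a file or from any other stream (the file name is just an example):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamRead {
    public static void main(String[] args) throws IOException {
        // The loop below would look identical if 'in' were System.in or a socket stream.
        try (InputStream in = new FileInputStream("loremipsum.txt")) {
            int b;
            while ((b = in.read()) != -1) { // read() returns -1 at the end of the stream
                System.out.print((char) b);
            }
        }
    }
}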

Handling write and read at the same time in java

I have a thread which will be continuously logging into a file. I have a function getLines() which when called will return last 100 lines of the log file.
My question is whether implementing a simple BufferedReader inside getLines() is enough? I'm mainly concerned whether reading is valid while a write is going on. I don't mind missing a few lines that were written during the read, though.
Thanks
Since Java FileOutputStream / FileInputStream open files in shared mode, reading will not interfere with writing. Though in my view it would be better and more efficient to implement a logger that holds the last 100 written lines and returns them on demand.
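A rough sketch of that idea, with illustrative names (this class is not part of any library):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TailLogger {
    private static final int MAX_LINES = 100;
    private final Deque<String> lastLines = new ArrayDeque<>();

    public synchronized void log(String line) {
        // ...also append the line to the log file here if you still need it on disk...
        if (lastLines.size() == MAX_LINES) {
            lastLines.removeFirst(); // drop the oldest line
        }
        lastLines.addLast(line);
    }

    public synchronized List<String> getLines() {
        return new ArrayList<>(lastLines); // return a copy so callers don't see later changes
    }
}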

Need help using java threads to download file parts

I am trying to download a file from a server in a user-specified number of parts (n). So a file of x bytes is divided into n parts, with each part downloading a piece of the whole file at the same time. I am using threads to implement this, but I have not worked with HTTP before and do not really understand how downloading a file works. I have read up on it and it seems the "Range" header needs to be used, but I do not know how to download the different parts and append them without corrupting the data.
(Since it's a homework assignment I will only give you a hint)
Appending to a single file will not help you at all, since this will mess up the data. You have two alternatives:
Download from each thread to a separate temporary file and then merge the temporary files in the right order to create the final file. This is probably easier to conceive, but a rather ugly and inefficient approach.
Do not stick to the usual stream-style semantics - use random access to write data from each thread straight to the right location within the output file (see the sketch below).
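A rough sketch of that second approach, with one worker per part; the URL, part boundaries and error handling are simplified placeholders, and the server must actually honour Range requests:

import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URL;

public class PartDownloader implements Runnable {
    private final String url;
    private final String outputFile;
    private final long start;
    private final long end; // inclusive

    public PartDownloader(String url, String outputFile, long start, long end) {
        this.url = url;
        this.outputFile = outputFile;
        this.start = start;
        this.end = end;
    }

    @Override
    public void run() {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            // Ask the server for just this byte range.
            conn.setRequestProperty("Range", "bytes=" + start + "-" + end);
            try (InputStream in = conn.getInputStream();
                 RandomAccessFile out = new RandomAccessFile(outputFile, "rw")) {
                out.seek(start); // jump to this part's slot in the shared output file
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}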

reading a file while it's being written

I've read some posts on stackoverflow about this topic but I'm still confused. When reading a file that is currently being written in Java, how do you keep track of how many lines have actually been written so that you don't get weird read results?
EDIT: sorry, I should have mentioned that the program writing the file is in C++ and the one reading it is in Java, so variables can't really be shared easily
When reading a file that is currently being written in Java, how do you keep track of how many lines have actually been written so that you don't get weird read results?
The problem is that you can never be sure that the current last character of the file is the end of a line. If it is a line terminator, you are OK. If it is not, BufferedReader.readLine() will interpret it as a complete line even without a line terminator ... and weird results will ensue.
What you need to do is to implement your own line buffering. When you get an EOF you wait until the file grows some more and then resume reading the line.
Alternatively, if you are using Java 7 or later, the file watcher APIs allow you to watch for file writes without polling the file's size.
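A very rough sketch of that line buffering idea (single-byte encoding assumed for simplicity; the file name and sleep interval are arbitrary, and reading byte by byte is not efficient):

import java.io.IOException;
import java.io.RandomAccessFile;

public class FollowFile {
    public static void main(String[] args) throws IOException, InterruptedException {
        StringBuilder current = new StringBuilder(); // holds a possibly incomplete line
        try (RandomAccessFile file = new RandomAccessFile("app.log", "r")) {
            while (true) {
                int b = file.read();
                if (b == -1) {
                    Thread.sleep(500); // at end of file: wait for the writer, then try again
                } else if (b == '\n') {
                    System.out.println("complete line: " + current);
                    current.setLength(0);
                } else {
                    current.append((char) b);
                }
            }
        }
    }
}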
By the way, there is an Apache commons class that is designed for doing this kind of thing:
http://commons.apache.org/io/api-2.0/org/apache/commons/io/input/Tailer.html
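A brief sketch of how Tailer is typically used, assuming commons-io 2.x (the file name and polling delay are placeholders):

import java.io.File;
import org.apache.commons.io.input.Tailer;
import org.apache.commons.io.input.TailerListenerAdapter;

public class LogFollower {
    public static void main(String[] args) {
        TailerListenerAdapter listener = new TailerListenerAdapter() {
            @Override
            public void handle(String line) {
                // called for each new complete line appended to the file
                System.out.println(line);
            }
        };
        // Creates and starts a background thread that polls the file every 500 ms.
        Tailer tailer = Tailer.create(new File("app.log"), listener, 500);
        // ... later, when done: tailer.stop();
    }
}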
If I understand, the file is being written in C++ by some process and another Java process wants to read it while it is being written.
Look at the File Monitoring section of the tail command documentation. But I want to warn you that when I used the cygwin tail on Windows recently to follow log files that were rolling over, it sometimes failed under heavy load. Other implementations may be more robust.
To have a count of the number of lines, just keep a counter on the side that's doing the writing.
So, every time you write a line, increment a counter, and make that counter readable via a method, something like public int getNumLinesWritten().
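For instance, if the writing side were Java, it could be as simple as this sketch (names are purely illustrative):

import java.util.concurrent.atomic.AtomicInteger;

public class CountingWriter {
    private final AtomicInteger linesWritten = new AtomicInteger();

    public void writeLine(String line) {
        // ... actually write the line to the log file here ...
        linesWritten.incrementAndGet();
    }

    public int getNumLinesWritten() {
        return linesWritten.get();
    }
}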
The obvious answer to me... Why not use a buffer? Use a string or whatever you need. (You could use a list/array of strings if you want, one for each line maybe?) Append to the string just as you would write to the file, then instead of reading from the file, read from that string. Would that work for you?

Reading a gz file and keeping track of position in file

So, here is the situation:
I have to read big .gz archives (GBs) and kind of "index" them so that I can later retrieve specific pieces using random access.
In other words, I wish to read the archive line by line, and be able to get the specific location in the file for any such line. (so that I can jump directly to these specific locations upon request). (PS: ...and it's UTF-8 so we cannot assume 1 byte == 1 char.)
So, basically, what I just need is a BufferedReader which keeps track of its location in the file. However, this doesn't seem to exist.
Is there anything available or do I have to roll my own?
A few additional comments:
I cannot use BufferedReader directly since the file location corresponds to what has been buffered so far. In other words, a multiple of the internal buffer size instead of the line location.
I cannot use InputStreamReader directly for performance reasons. Unbuffered would be way too slow, and, btw, it lacks convenience methods to read lines.
I cannot use RandomAccessFile since 1. it's zipped, and 2. RandomAccessFile uses "modified" UTF-8
I guess the best would be to use a kind of buffered reader that keeps track of the file location and buffer offset ... but this sounds quite cumbersome. But maybe I missed something. Perhaps there is already something existing to do that, to read files line by line and keep track of the location (even if zipped).
Thanks for tips,
Arnaud
I think jzran could be pretty much what you're looking for:
It's a Java library based on the zran.c sample from zlib. You can preprocess a large gzip archive, producing an "index" that can be used for random read access. You can balance between index size and access speed.
What you are looking for is called mark(), markSupported() and skip().
These methods are declared both in InputStream and Reader, so you are welcome to use them.
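A rough sketch of how skip() could be combined with a first indexing pass: record each line's uncompressed byte offset while reading the archive once, then skip() to a recorded offset on a fresh GZIPInputStream when a line is requested. The class and method names are made up, and because gzip itself cannot be seeked, every lookup still decompresses everything before the offset:

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

public class GzLineIndex {
    private final List<Long> lineOffsets = new ArrayList<>(); // uncompressed offset of each line start
    private final String gzFile;

    public GzLineIndex(String gzFile) throws IOException {
        this.gzFile = gzFile;
        try (InputStream in = new BufferedInputStream(
                new GZIPInputStream(new FileInputStream(gzFile)))) {
            long offset = 0;
            lineOffsets.add(0L);
            int b;
            while ((b = in.read()) != -1) { // count bytes as they are consumed, not as they are buffered
                offset++;
                if (b == '\n') {
                    lineOffsets.add(offset); // the next line starts right after the '\n'
                }
            }
        }
    }

    public String getLine(int lineNumber) throws IOException {
        try (InputStream in = new BufferedInputStream(
                new GZIPInputStream(new FileInputStream(gzFile)))) {
            long toSkip = lineOffsets.get(lineNumber);
            while (toSkip > 0) {
                long skipped = in.skip(toSkip); // skip() may skip fewer bytes than requested
                if (skipped <= 0) {
                    break;
                }
                toSkip -= skipped;
            }
            ByteArrayOutputStream lineBytes = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                lineBytes.write(b);
            }
            // Safe for UTF-8: the '\n' byte never occurs inside a multi-byte sequence.
            return new String(lineBytes.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}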
GZIP compression does not support seeking. Previous data blocks are needed to build compression tables...
