I've read some posts on stackoverflow about this topic but I'm still confused. When reading a file that is currently being written in Java, how do you keep track of how many lines have actually been written so that you don't get weird read results?
EDIT: sorry, I should have mentioned that the program writing the file is in C++ and the one reading it is in Java, so variables can't really be shared easily
When reading a file that is currently being written in Java, how do you keep track of how many lines have actually been written so that you don't get weird read results?
The problem is that you can never be sure that the current last character of the file is the end of a line. If it is a line terminator, you are OK. If it is not, BufferedReader.readLine() will interpret the partial line as a complete line ... and weird results will ensue.
What you need to do is implement your own line buffering. When you get an EOF, wait until the file grows some more and then resume reading the line.
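A minimal sketch of that buffering idea (the file name and sleep interval are assumptions, and it assumes an ASCII-compatible encoding since it reads byte by byte):

import java.io.RandomAccessFile;

public class TailReader {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("app.log", "r")) {
            StringBuilder partial = new StringBuilder();
            while (true) {
                int b = file.read();
                if (b == -1) {
                    Thread.sleep(500); // EOF mid-line: wait for the writer to append more
                } else if (b == '\n') {
                    System.out.println(partial); // only now is the line complete
                    partial.setLength(0);
                } else if (b != '\r') {
                    partial.append((char) b);
                }
            }
        }
    }
}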
Alternatively, if you are using Java 7 or later, the file watcher APIs allow you to watch for file writes without polling the file's size.
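A hedged sketch of that watcher approach (the directory path is an assumption; you still re-read the tail of the file yourself when an event fires):

import java.nio.file.*;

public class WatchForWrites {
    public static void main(String[] args) throws Exception {
        Path dir = Paths.get("/var/log"); // the directory containing the growing file
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_MODIFY);
        while (true) {
            WatchKey key = watcher.take(); // blocks until something in the directory changes
            for (WatchEvent<?> event : key.pollEvents()) {
                System.out.println("Modified: " + event.context());
                // resume reading the tail of the file here
            }
            key.reset(); // re-arm the key for further events
        }
    }
}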
By the way, there is an Apache commons class that is designed for doing this kind of thing:
http://commons.apache.org/io/api-2.0/org/apache/commons/io/input/Tailer.html
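For illustration, a minimal Tailer sketch (the file name and poll delay are assumptions); it handles the partial-line buffering for you:

import java.io.File;
import org.apache.commons.io.input.Tailer;
import org.apache.commons.io.input.TailerListenerAdapter;

public class LogTailer {
    public static void main(String[] args) throws Exception {
        TailerListenerAdapter listener = new TailerListenerAdapter() {
            @Override
            public void handle(String line) {
                System.out.println(line); // invoked only with complete lines
            }
        };
        Tailer tailer = Tailer.create(new File("app.log"), listener, 500); // poll every 500 ms
        Thread.sleep(60_000); // tail for a minute in this sketch
        tailer.stop();
    }
}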
If I understand correctly, the file is being written in C++ by some process and another Java process wants to read it while it is being written.
Look at the File Monitoring section on the tail command here. But I want to warn you that when I used cygwin's tail on Windows recently to follow log files that were rolling over, it sometimes failed under heavy load. Other implementations may be more robust.
To have a count of the number of lines, just keep a counter on the side that's doing the writing.
So, every time you write a line, increment a counter, and make that counter readable via a method, something like public int getNumLinesWritten().
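A minimal sketch of that idea in Java (the class name is made up, and note that the asker's writer is actually C++, where the same pattern applies):

import java.util.concurrent.atomic.AtomicInteger;

public class CountingWriter {
    private final AtomicInteger linesWritten = new AtomicInteger();

    public void writeLine(String line) {
        // ... append the line to the file here ...
        linesWritten.incrementAndGet(); // count only after the write completes
    }

    public int getNumLinesWritten() {
        return linesWritten.get();
    }
}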
The obvious answer to me... Why not use a buffer? Use a string or whatever you need. (You could use a list/array of strings if you want, one for each line maybe?) Append to the string just as you would write to the file, then instead of reading from the file, read from that string. Would that work for you?
In Java 8 I see a new method called lines() was added to the Files class, which can be used to read a file line by line in Java. Does it work for huge files? I mean, can we load the first 1000 lines, then the second set of 1000 lines? I have a huge file of 1 GB; will it work?
Could someone share a code snippet showing how to use it?
Does it work for huge files? [...] I have a huge file of 1 GB; will it work?
As far as I can see it should work well for big files as well (but I haven't tried):
try (Stream<String> lines = Files.lines(path)) {
    lines.filter(...).map(...).forEach(...);
}
I mean, can we load the first 1000 lines, then the second set of 1000 lines?
How many lines are read at one time is implementation-specific to Files.lines (which probably uses a BufferedReader, but I might be wrong).
From the API docs (emphasis mine):
Read all lines from a file as a Stream. Unlike readAllLines, this method does not read all lines into a List, but instead populates lazily as the stream is consumed.
This very strongly suggests that you can use this on any arbitrarily sized file, assuming your code doesn't hold all of the content in memory.
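If you really do want the lines in fixed batches of 1000, a hedged sketch (the file name, batch size and process() method are assumptions) is to drive a BufferedReader yourself:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class BatchLineReader {
    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("big.txt"))) {
            List<String> batch = new ArrayList<>(1000);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == 1000) {
                    process(batch);
                    batch.clear(); // never more than one batch in memory
                }
            }
            if (!batch.isEmpty()) {
                process(batch); // the final, partial batch
            }
        }
    }

    private static void process(List<String> lines) {
        System.out.println("Processing " + lines.size() + " lines");
    }
}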
For a project I am working on, I am trying to count the vowels in text file as fast as possible. In order to do so, I am trying a concurrent approach. I was wondering if it is possible to concurrently read a text file as a way to speed up the counting? I believe the bottleneck is the I/O, and since right now I am reading the file in via a buffered reader and processing line by line, I was wondering if it was possible to read multiple sections of the file at once.
My original thought was to use
Split File - Java/Linux
but apparently MappedByteBuffers are not great performance-wise, and I would still need to read line by line from each MappedByteBuffer once I split.
Another option is to split after reading a certain number of lines, but that defeats the purpose.
Would appreciate any help.
The following will NOT split the file - but can help in concurrently processing it!
Using Streams in Java 8 you can do things like:
Stream<String> lines = Files.lines(Paths.get(filename));
lines.filter(StringUtils::isNotEmpty) // ignore empty lines
and if you want to run in parallel you can do:
lines.parallel().filter(StringUtils::isNotEmpty)
In the example above I was filtering out empty lines (StringUtils::isNotEmpty comes from Apache Commons Lang) - but of course you can adapt it to your use case (counting vowels) by implementing your own method and calling it.
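As a hedged sketch of that adaptation (this is my illustration, not the answerer's code; the file name comes from the command line):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class VowelCount {
    public static void main(String[] args) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            long vowels = lines.parallel()
                .flatMapToInt(String::chars) // every character as an int
                .filter(c -> "aeiouAEIOU".indexOf(c) >= 0)
                .count();
            System.out.println(vowels + " vowels");
        }
    }
}

Bear in mind that the stream returned by Files.lines splits poorly for parallel work, since line boundaries aren't known up front; if the bottleneck really is I/O, parallel() may not buy much.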
I have a thread which will be continuously logging to a file. I have a function getLines() which, when called, will return the last 100 lines of the log file.
My question is whether implementing a simple BufferedReader inside getLines() is enough. I'm mainly concerned with whether reading is valid while a write is going on. I don't mind missing a few lines of the log that were written during the read, though.
Thanks
Since Java's FileOutputStream / FileInputStream open files in shared mode, reading will not interfere with writing. Though in my view it would be better and more efficient to implement a logger that holds the last 100 written lines and returns them on demand.
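A minimal sketch of such a logger (class and method names are made up; the actual file append is elided):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RecentLinesLogger {
    private final Deque<String> recent = new ArrayDeque<>();
    private final int maxLines;

    public RecentLinesLogger(int maxLines) {
        this.maxLines = maxLines;
    }

    public synchronized void log(String line) {
        if (recent.size() == maxLines) {
            recent.removeFirst(); // drop the oldest line
        }
        recent.addLast(line);
        // ... also append the line to the log file here ...
    }

    public synchronized List<String> getLines() {
        return new ArrayList<>(recent); // snapshot of the last maxLines lines
    }
}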
This is my understanding regarding reading a file using BufferedReader in Java. Please correct me if I am wrong somewhere...
Recently I had a requirement where we are required to read a file multiple times.
The usual way I do this is setting a mark() and doing a reset(). But the input parameter to mark() is an int, and it cannot accept a long. Is there a way we can read the file a large number of times?
In C++ we can do a seekg on the fstream and read the contents again, irrespective of the number of times we want to do so. Is there anything in Java of this nature?
Just close the file and read it again.
But review your requirement. Why can't you process it in one pass?
Not much of a good answer, but if you want to do random reading and writing then you can use Channels in the java.nio package.
BufferedReader is for reading a file when you logically see it as a series of records and records are generally accessed sequentially.
Channels allow you to view your file as a series of blocks. Blocks are meant to be read randomly. :)
Using a subclass of Channel, FileChannel, you can read what you want from wherever you want. You need to specify two things:
Where to read from.
How much to read.
It has a read(dst, pstn) method where dst is a ByteBuffer and pstn is a long position.
Don't worry that it is abstract, because you use it via Files.newByteChannel(), which does all the voodoo needed to make it work :)
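A small sketch of a positioned read (the file name, offset and buffer size are assumptions; FileChannel.open is an equivalent way to obtain the channel):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class PositionedRead {
    public static void main(String[] args) throws IOException {
        try (FileChannel channel = FileChannel.open(Paths.get("data.txt"), StandardOpenOption.READ)) {
            ByteBuffer dst = ByteBuffer.allocate(64); // how much to read
            channel.read(dst, 128);                   // where to read from: byte offset 128
            dst.flip();                               // switch the buffer from writing to reading
            System.out.print(StandardCharsets.UTF_8.decode(dst));
        }
    }
}

You can call read(dst, pstn) as many times as you like, from any position, which is the seekg-style behaviour the asker wanted.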
I hope there is an operation for this, because I don't want to loop over the file once again. I'd like to read the file from a specific location, say a line number, and then read the file with many more threads than just one.
Any idea?
Thanks in advance!!
To my knowledge there isn't anything like this in the standard Java API. You could use LineIterator (or even just a basic BufferedReader) to build a custom class that does what you need, like this guy did.
Note that a RandomAccessFile sounds promising but unfortunately for you, the seek() method takes an offset in bytes and not in lines, so unless your lines are always all the same length, this won't work for you.
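A hedged sketch of the custom skip-to-line approach (the file name and starting line are assumptions):

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SkipToLine {
    public static void main(String[] args) throws IOException {
        long startLine = 1000; // the line number to start reading from
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("big.txt"))) {
            for (long i = 0; i < startLine && reader.readLine() != null; i++) {
                // read and discard everything before the starting point
            }
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}

Each thread would still have to scan from the start to find its line, which is why fixed-length records (or a prebuilt index of line offsets) are the usual workaround.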