What would be a good way to read new entries from a text file that is continuously being updated by another thread, without putting too much strain on the system?
Only new entries are appended, and I want to read them line by line, process them, and then write them out to a new file.
The log file could grow to a few GB and have around 5 million lines before being deleted.
Both the log file and this code will run on 64-bit Windows 7 with Java 7 (if that matters).
I am very new to programming, so would someone be willing to provide some sample code? Thanks.
Apache Commons has a Tailer class.
Other than that, I guess you could open a BufferedReader, and sleep if readLine comes back null (in other words, the end of the file has been reached).
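A minimal sketch of that polling loop, assuming made-up file names "input.log" and "output.log" and a half-second sleep:

import java.io.*;

public class TailLoop {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader("input.log"));
        PrintWriter out = new PrintWriter(new FileWriter("output.log", true));
        while (true) {
            String line = in.readLine();
            if (line == null) {
                Thread.sleep(500);  // end of file reached; wait for new data
            } else {
                out.println(line);  // "process" the line; here we just copy it
                out.flush();
            }
        }
    }
}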
But the Apache Tailer class is much better, handles rolling log files and such.
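For reference, a minimal Tailer sketch (commons-io is the assumed dependency; the file name is made up):

import java.io.File;
import org.apache.commons.io.input.Tailer;
import org.apache.commons.io.input.TailerListenerAdapter;

public class LogTail {
    public static void main(String[] args) throws Exception {
        TailerListenerAdapter listener = new TailerListenerAdapter() {
            @Override
            public void handle(String line) {
                System.out.println(line); // process each new line here
            }
        };
        // poll every second; Tailer copes with the file being rotated underneath it
        Tailer.create(new File("app.log"), listener, 1000);
        Thread.sleep(Long.MAX_VALUE); // keep the JVM alive while the tailer's daemon thread runs
    }
}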
In a Clojure program, how do you read from standard out? I want to do that, or pipe the standard output, to an input stream that I create. The standard output in Clojure is a java.io.PrintWriter.
I have a Samza job, started by a Clojure program. There's also an nrepl server to which I can remotely connect. After connecting, I need to be able to tap into and tail standard out (to which jobs write their output).
1) As per this SO question, with-out-str (see here) lets us temporarily bind *out* (to a java.io.StringWriter), so that the executed code writes to a string. But that doesn't let me tap into the existing *out*.
2) If you look at clojure.java.shell (see here), it gets the JVM's Runtime and exec's a Process on it. From that process, you can get its standard output stream. But again, that's not the default standard out (*out*) I'm looking for.
3) This SO question gets close to what I'm trying to do. But again, I'm connecting to an existing process, and want to tail out its standard output.
Is this possible in Clojure (see here)? Has anyone solved this?
Process output is not a publish-subscribe model: when a process puts a character into its output buffer, exactly one process gets to pull it off that buffer. If a program was started by a shell, that shell process is reading its output and writing it to a terminal (or reading it and ignoring it). If you attach to the process after the one that started it and start trying to grab the data, you will most likely not get anything, because the parent process will get it first. I just tried this from two terminals:
Terminal 1:
cat
Terminal 2:
ps -ef | grep cat
tail -f /proc/24547/fd/2
Terminal 1:
hello
Terminal 2:
< nothing >
The string "hello" printed to terminal 1, the process that started it.
It's tempting then to say, "well, what if nobody reads the output? Then it will be there for me to get". While this sounds good, it runs into the problem that these are fixed-size buffers: as soon as the output buffer is full, the process trying to write to it blocks (is prevented from running at all) until someone reads the output to unblock it.
The general solution is to pipe the process you want to tail later through the tee command, which writes the output to a file and passes it on to whatever was reading it.
command-to-watch arg1 arg2 | tee logfile.potentially-huge
Though if you go this route, you should rotate the log file before your disk fills. Make sure you empty the log file with exactly this command:
echo > logfile.potentially-huge
or use your program to make a truncate call on the file. Simply deleting the file will only remove its name from the directory without actually deleting it: it will silently continue to grow, taking up disk space, and the new file will never get any output.
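If you want to do the truncation from Java instead, a minimal sketch (reusing the made-up file name above): opening the file for writing without append truncates it to zero length.

import java.io.FileOutputStream;
import java.io.IOException;

public class Truncate {
    public static void main(String[] args) throws IOException {
        // opening for write without append truncates the file to zero length
        new FileOutputStream("logfile.potentially-huge").close();
    }
}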
This is basically why we built log libraries like log4j (in the 90s) and syslog (in the 80s).
If you still want to get hackishly crazy with this, turn to tmux; it can do anything, and it changes the way people work with text. In all seriousness, though, you should change the way the other process creates its output to make it easier to get at.
I've got a huge CSV file that keeps growing forever [although sometimes it gets reset]. I know that's not good, but unfortunately I can't change the design, since it's another application that keeps adding stuff to it.
I have to split this file into new, smaller files, bearing in mind that new entries will keep appearing in the CSV file; for example, one new file for every 1000 values or something like that.
I'm thinking about writing a small program to do it and running it periodically via Windows Scheduled Tasks. Is that the best way of fixing this problem? If so, can you help me with the code [Java, VB, C#...]? If it's not the best solution, which path should I follow?
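One possible approach, as a minimal sketch: remember the byte offset reached on the previous run in a side file, read only what is new, and cut it into chunk files. All file names here are made up, and the offset bookkeeping is one assumption among several reasonable designs.

import java.io.*;

public class CsvSplitter {
    static final int LINES_PER_CHUNK = 1000;

    public static void main(String[] args) throws IOException {
        File source = new File("huge.csv");
        File offsetFile = new File("huge.csv.offset");

        long offset = 0;
        if (offsetFile.exists()) {
            BufferedReader r = new BufferedReader(new FileReader(offsetFile));
            offset = Long.parseLong(r.readLine().trim());
            r.close();
        }
        if (source.length() < offset) {
            offset = 0; // the file was reset; start over from the beginning
        }

        RandomAccessFile raf = new RandomAccessFile(source, "r");
        raf.seek(offset);

        long stamp = System.currentTimeMillis();
        int chunkIndex = 0;
        int lineCount = LINES_PER_CHUNK; // force a new chunk on the first line
        PrintWriter chunk = null;
        String line;
        while ((line = raf.readLine()) != null) {
            if (lineCount == LINES_PER_CHUNK) {
                if (chunk != null) chunk.close();
                chunk = new PrintWriter(new FileWriter("chunk-" + stamp + "-" + chunkIndex++ + ".csv"));
                lineCount = 0;
            }
            chunk.println(line);
            lineCount++;
        }
        if (chunk != null) chunk.close();

        PrintWriter w = new PrintWriter(new FileWriter(offsetFile));
        w.println(raf.getFilePointer()); // remember where this run stopped
        w.close();
        raf.close();
    }
}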
I have a FileReader object in Java.
When we load the file in, is the entire file loaded, or only a file pointer, like in C? I ask because I have an application (the producer) that keeps writing to the file, and another application (the consumer) that will start to read the file after a few minutes. I want the consumer to read all the data that the producer writes.
I tried to search but could not get the answer.
The best and most canonical approach would be to use Java NIO memory-mapped files. Use MappedByteBuffer.force() on each write to ensure the dirty pages are flushed so the consumer process sees them. Consumers (in another process if necessary) can map the same file in read mode.
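A minimal sketch of the producer side, assuming a made-up file name and a fixed mapping size:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedProducer {
    public static void main(String[] args) throws Exception {
        RandomAccessFile file = new RandomAccessFile("shared.dat", "rw");
        FileChannel channel = file.getChannel();
        // map one megabyte of the file into memory, read-write
        MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1024 * 1024);
        buf.put("hello\n".getBytes("UTF-8")); // write into the mapped region
        buf.force();                          // flush the dirty pages to the file
        channel.close();
        file.close();
    }
}

A consumer in another process would map the same file with MapMode.READ_ONLY and poll it for new bytes.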
A FileReader works a lot like a FileInputStream, which in turn works a lot like its C analogue: when you first open a file, very little data is actually loaded immediately. The file's contents generally aren't read in until you first read some of them, at which point the runtime reads enough to fill the request, or the buffer if you're using a BufferedReader (possibly plus enough to fill up a block, depending on the OS and JVM).
When I say it "works like its C analogue", though, I mean it. A FileReader is opened for reading, and a FileWriter is opened for writing. There can be issues with having a file open for reading and writing at the same time; Windows, in particular, isn't too fond of it. If you care about portability, you may want to find another way -- alphazero's answer sounds promising, but I've never tried it.
I don't understand what you mean when you wrote, "when we load the file in". Sample code would be helpful here. Please see How to Ask.
If you've called anything that sounds like "read" on your Reader then you've got data in memory. If this is text data (assuming you're using FileReader) then you're better off using a BufferedReader. Not only is it buffered, it's more convenient for text files.
If I have the answer all wrong then please post some code so I know what you're talking about. And welcome!
I am generating a log file, and what I want is to read the data periodically without having to read from the beginning each time. Can anyone help?
Open the file and have a loop which:
gets the size and compares it with the size you have already read;
if the size has grown, reads that many bytes and no more (doing this means you can read more later);
if the size has shrunk, closes the file and starts again.
You can use FileInputStream or RandomAccessFile.
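A minimal sketch of that loop with RandomAccessFile; the file name and poll interval are assumptions, and the int cast assumes less than 2 GB of growth per poll:

import java.io.IOException;
import java.io.RandomAccessFile;

public class LogPoller {
    public static void main(String[] args) throws IOException, InterruptedException {
        long lastRead = 0;
        while (true) {
            RandomAccessFile file = new RandomAccessFile("app.log", "r");
            long length = file.length();
            if (length < lastRead) {
                lastRead = 0;            // the file has shrunk: it was rotated, start again
            }
            if (length > lastRead) {
                file.seek(lastRead);
                byte[] chunk = new byte[(int) (length - lastRead)];
                file.readFully(chunk);   // read only the bytes we know exist
                lastRead = length;
                System.out.print(new String(chunk, "UTF-8")); // process the new data
            }
            file.close();
            Thread.sleep(1000);
        }
    }
}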
Use the Unix command 'tail'; the '-f' and '-F' options of the same command are very handy for this as well.
See http://www.thegeekstuff.com/2009/08/10-awesome-examples-for-viewing-huge-log-files-in-unix/ for examples, or just google around for more.
If you want to run a program that reads your log file periodically, you can use a scheduler like Quartz Scheduler to run it at a fixed interval.
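For illustration, a minimal Quartz 2.x sketch; the job identity and the five-minute interval are made up:

import org.quartz.*;
import org.quartz.impl.StdSchedulerFactory;

public class LogReadJob implements Job {
    public void execute(JobExecutionContext context) throws JobExecutionException {
        // read the new part of the log file here
    }

    public static void main(String[] args) throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = JobBuilder.newJob(LogReadJob.class).withIdentity("logReader").build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .startNow()
                .withSchedule(SimpleScheduleBuilder.simpleSchedule()
                        .withIntervalInMinutes(5)
                        .repeatForever())
                .build();
        scheduler.scheduleJob(job, trigger);
        scheduler.start(); // the job now runs every five minutes
    }
}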
RandomAccessFile is a good option. If you exit the application, you will have to persist the position of your last read before leaving, in order to avoid rereading information.
Log files, on the other hand, tend to become quite large under heavy event flow. Rotating log files will allow you to shift your problem a little towards file naming. You can configure your system to produce one log file per day, like here:
app_access.2011-11-28.log,
app_access.2011-11-29.log,
app_access.2011-11-30.log,
...
If the files you get are still very large, you can rotate them by date and time, so that the hour is also part of the file name. Your files could then rotate, let's say, every three hours or even every hour. This will give you more log files to read, but they will be smaller and thus easier to process. The date and time range you want to seek will be part of the file name.
You could additionally rotate by file size. If you select a maximum file size you can deal with, you can avoid randomly accessing a huge file altogether.
I'm trying to edit a configuration file in Java. What I really need to do is change a single line, so reading the whole file and writing it back would be a waste of time, since the configuration file can be big.
Is there a more efficient way to do this, other than reading in/editing/writing out the file? I thought of converting the entire file to a string, replacing the line I want, and writing it back.
I don't know how efficient that would be; can someone give me some other suggestions, or is the approach I mentioned OK? Execution time is important.
I would recommend using the Preferences API instead. On the Windows platform your preferences are then stored in the registry; on other platforms the corresponding way to save application preferences is used. See also the Preferences API Overview.
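A minimal sketch; the node class and the key/value names are made up for illustration:

import java.util.prefs.Preferences;

public class Config {
    public static void main(String[] args) throws Exception {
        Preferences prefs = Preferences.userNodeForPackage(Config.class);
        prefs.put("server.host", "localhost");  // change a single setting
        prefs.flush();                          // persist it (e.g. to the registry on Windows)
        System.out.println(prefs.get("server.host", "default-host"));
    }
}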
How big a configuration file are we talking here? 1k lines? 10k? 1m lines? If the line you want to edit is the last line, just seek to the start of the line, truncate the file there, and write the new one. If it's not... you will need to read the whole file and write it again.
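For the last-line case, a minimal sketch with RandomAccessFile; the offset, file name, and replacement line are made up, and finding the real offset of the last line is up to you:

import java.io.RandomAccessFile;

public class LastLineEditor {
    public static void main(String[] args) throws Exception {
        long lastLineOffset = 1024; // hypothetical byte offset where the last line starts
        RandomAccessFile file = new RandomAccessFile("app.conf", "rw");
        file.seek(lastLineOffset);
        file.setLength(lastLineOffset);                        // truncate the file at the line start
        file.write("new-setting=value\n".getBytes("UTF-8"));   // write the replacement line
        file.close();
    }
}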
Oh, and the 2 options you mention are actually the same (read/edit/write).
On the third hand, I think it's irrelevant (unless you have weird constraints, like a flash storage device which takes too long to write, and has limited write cycles), given the sizes of most config files.