Reading a log file which gets rolled over - java

I am trying to use a simple program to read from a log file. The code used is as follows:
RandomAccessFile in = new RandomAccessFile("/home/hduser/Documents/Sample.txt", "r");
String line;
while (true) {
    if ((line = in.readLine()) != null) {
        System.out.println(line);
    } else {
        Thread.sleep(2000);
    }
}
The code works well for new lines being added to the log file, but it does not handle rollover: when the content of the log file is cleared, I expect the Java console to continue reading text from the first line newly written to the log. Could that be possible? What changes need to be made to the existing code to achieve that?

At my work I had to deal with the processing of logs that can be rolled over without missing any data. What I do is store a tiny memo file that contains:
A hash of the first 1024 bytes (or less) of the log (I used SHA-1 or something because it's easy)
The number of bytes used to generate the hash
The current file position
I close the log file after processing all lines, or some maximum number of lines, and update the memo file. I sleep for a tiny bit and then open the log file again. This allows me to check whether a rollover has occurred. A rollover is detected when:
The current file is smaller than the last file position
The hash is not the same
In my case, I can use the hash to find the correct log file, and work backwards to get up to date. Once I know I've picked up where I left off in the correct file, I can continue reading and memoizing my position. I don't know if this is relevant to what you want to do, but maybe that gives you ideas.
If you don't have any persistence requirements, you probably don't need to store any memo files. If your 'rollover' just clears the log and doesn't move it away, you probably don't need to remember any file hashes.
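If it helps, here's a rough sketch of that memo bookkeeping in Java (untested; the class and method names are just illustrative, and I'm using SHA-1 over the first 1024 bytes as described):
LogMemo sketch:
import java.io.RandomAccessFile;
import java.security.MessageDigest;
import java.util.Arrays;

class LogMemo {
    byte[] headHash;  // SHA-1 of the first headLength bytes of the log
    int headLength;   // number of bytes hashed (1024 or fewer)
    long position;    // where reading should resume

    // Note: this seeks to 0; the caller should seek back to `position` afterwards.
    static LogMemo capture(RandomAccessFile log, long position) throws Exception {
        byte[] head = new byte[1024];
        log.seek(0);
        int n = Math.max(0, log.read(head)); // may read fewer than 1024 bytes
        LogMemo memo = new LogMemo();
        memo.headLength = n;
        memo.headHash = MessageDigest.getInstance("SHA-1").digest(Arrays.copyOf(head, n));
        memo.position = position;
        return memo;
    }

    // A rollover is suspected when the file shrank or its head no longer matches.
    boolean rolledOver(RandomAccessFile log) throws Exception {
        if (log.length() < position) return true;
        LogMemo now = capture(log, position);
        return now.headLength != headLength || !Arrays.equals(now.headHash, headHash);
    }
}
Persist the three fields to the memo file after each batch of lines, and run rolledOver on reopen to decide whether to seek to the saved position or start from scratch.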

I am sorry... My Bad.. I don't want it to go blank.. I just want the next new line written to the log to be read.
Since what you need is to be able to read from the beginning when your file is cleared, you will need to monitor the length of the file and reset the cursor when the length decreases. You can reset the cursor using the seek(..) method.
See the code below:
RandomAccessFile in = new RandomAccessFile("/home/hduser/Documents/Sample.txt", "r");
String line;
long length = 0; // used to check the file length
while (true) {
    if (in.length() < length) { // reset position if the file length is reduced
        in.seek(0);
    }
    if ((line = in.readLine()) != null) {
        System.out.println(line);
        length = in.length();
    } else {
        Thread.sleep(2000);
    }
}

it does not handle rollover: when the content of the log file is cleared I expect the Java console to continue reading text from the first line newly written to the log. Could that be possible?
Struggling with this as well. +1 to @paddy for the hash idea.
Another solution (depending on your operating system) is to use the inode of the file, although this may only work under Unix:
Long inode = (Long)Files.getAttribute(logFile.toPath(), "unix:ino");
This returns the inode of the file in the underlying file system associated with the log file. If the inode changes, then the file is a brand-new file. This assumes that when the log is rolled over, it is moved aside and the same file is not written over.
To make this work, you would record the inode of the file you are reading, then check whether the inode has changed if you haven't gotten any new data in some period of time.
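A minimal sketch of that check (untested; Unix only, and the class and method names are just for illustration):
import java.io.File;
import java.nio.file.Files;

class RolloverCheck {
    // Remember the inode at open time, then re-check it whenever reads stall.
    // Unix-specific: the "unix:ino" attribute is not available on Windows.
    static boolean fileWasRolled(File logFile, long inodeAtOpen) throws Exception {
        long inodeNow = (Long) Files.getAttribute(logFile.toPath(), "unix:ino");
        return inodeNow != inodeAtOpen; // a new inode means a brand-new file
    }
}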


Java File is disappearing from the path /tmp/hsperfdata_*username*/

This is a very confusing problem.
We have a Java application (Java 8, running on JBoss 6.4) that loops over a certain number of objects, writing some rows to a file on each round.
On each round we check whether we received the File object as a parameter, and if we did not, we create a new object and create the physical file:
if (file == null) {
    file = new File(filename);
    try {
        file.createNewFile();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
So the idea is that the file gets created only once; after that the step is skipped and we proceed straight to writing. The variable filename is not a path, it's just a file name with no path, so the file gets created under the path jboss_root/tmp/hsperfdata_username/
edit1. I'll also add the methods used for writing, in case they are relevant:
fw = new FileWriter(indeksiFile, true); // append = true
bw = new BufferedWriter(fw);
out = new PrintWriter(bw);
// ...
out.println(..)
// ...
out.flush();
out.close(); // this flushes as well -> line above is redundant
So now the problem is that occasionally, quite rarely though, the physical file disappears from the path in the middle of the process. The Java object reference is never lost, but it seems that the file itself disappears, because the code automatically creates the file again at the same path and keeps on writing to it. This would not happen unless the condition file == null evaluated to true. The effect, obviously, is that we lose the rows that were written to the previous file. The Java application does not notice any errors and keeps on working.
So I have three questions, which are strongly related, and for which I was not able to find an answer on Google:
If we call the method File.createNewFile(), is the resulting file a permanent file in the filesystem or some JVM proxy file?
If it's a permanent file, do you have any idea why it's disappearing? The default behavior in our case is that at some point the file is always deleted from the path. My guess is that the same mechanism is deleting the file too early; I just don't know how to control that mechanism.
My best guess is that this is related to the path jboss_root/tmp/hsperfdata_username/, which is some temp-data folder created by the JVM, and probably there is some default behavior that cleans that path. Am I even close?
Help appreciated! Thanks!
I have never used File.createNewFile in my code: it is not needed.
When you afterwards actually write to the file, the write either creates it anew or appends.
In every case there is a race on the file system, and since these are not atomic actions,
you might end up with something unstable.
So you want to write to a file, either appending to an existing file or creating it.
For UTF-8 text:
Path path = Paths.get(filename);
try (PrintWriter out = new PrintWriter(
        Files.newBufferedWriter(path, StandardOpenOption.CREATE, StandardOpenOption.APPEND),
        false)) {
    out.println("Killroy was here");
}
After comment
Honestly, as you are interested in the cause, it is hard to say. An application restart or I/O exceptions are things one would find in the logs. Add logging to a specific log for appends to those files, plus a (logged) periodic check for those files' existence.
Safe-guard
Here we are doing repeated physical access to the file system.
To prevent appending to a file twice at the same time (from which I would expect an exception), one can make a critical section in some form.
// For 16 semaphores:
final int semaphoreCount = 16;
final int semaphoreMask = 0xF;
Semaphore[] semaphores = new Semaphore[semaphoreCount];
for (int i = 0; i < semaphores.length; ++i) {
    semaphores[i] = new Semaphore(1, true); // fair: FIFO
}

int hash = filename.hashCode() & semaphoreMask; // use toLowerCase() on Windows
Semaphore semaphore = semaphores[hash];
semaphore.acquire(); // acquire before try, so finally never releases an unheld permit
try {
    // ... append to the file
} finally {
    semaphore.release();
}
File locks would be a more technical solution, but one I would not like to propose.
The best solution, which you perhaps already have, would be to queue messages per file.
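If it's useful, here is a rough sketch of that queue-per-file idea (untested; all names are illustrative). Each file gets exactly one drainer thread, so no two threads ever append to the same file concurrently:
import java.util.concurrent.*;

class PerFileAppender {
    private final ConcurrentHashMap<String, BlockingQueue<String>> queues =
            new ConcurrentHashMap<>();
    private final ExecutorService workers = Executors.newCachedThreadPool();

    void append(String filename, String line) {
        queues.computeIfAbsent(filename, f -> {
            BlockingQueue<String> q = new LinkedBlockingQueue<>();
            workers.submit(() -> drain(f, q)); // exactly one drainer per file
            return q;
        }).add(line);
    }

    private void drain(String filename, BlockingQueue<String> q) {
        // Runs for the life of the program; shutdown handling omitted for brevity.
        try (java.io.PrintWriter out = new java.io.PrintWriter(
                new java.io.FileWriter(filename, true))) { // append mode
            while (true) {
                out.println(q.take()); // blocks until the next message arrives
                out.flush();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}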

Writing and reading a file from two processes

You have:
A process (READER) that opens a text file (TEXTFILE), reads all the lines until the EOF and waits for new lines to appear.
The READER is implemented in Java and the waiting part uses java.nio.file.WatchService, which, if I understand correctly, uses inotify on Linux. I am not sure which of the two is more relevant to the question.
The implementation is quite simple (exception handling and some ifs left out for brevity):
WatchService watcher = FileSystems.getDefault().newWatchService();
Path logFolder = Paths.get("/p/a/t/h");
logFolder.register(watcher, ENTRY_MODIFY);
BufferedReader reader = Files.newBufferedReader(Paths.get("TEXTFILE"),
        Charset.forName("US-ASCII")); // newBufferedReader takes a Path, not a String
WatchKey key = watcher.take(); // was: watchService.take() - the variable is watcher
for (WatchEvent<?> event : key.pollEvents()) {
    WatchEvent.Kind<?> kind = event.kind();
    doSomethingWithTheNewLine(reader.readLine());
}
key.reset(); // re-arm the key to receive further events
Now, if I run READER and
Open TEXTFILE in an editor, add a line and save it, the result is that the READER doesn't seem to get the new line
If, on the other hand, I do something like this in bash
while true; do echo $(date) ; sleep 2; done >> TEXTFILE
then the READER does get the new lines
EDIT:
As far as I can see, the difference that may matter here is that in the first case the editor loads the content of the file, closes it (I assume), and on saving opens the file again and synchronizes the content with the file system, while the bash line keeps the file open the whole time... how that would make any difference, I am not sure.
I suppose the simple question is why???
The way I understood scenarios like this is that Linux uses some sort of locking when more than one process needs access to the same file on the filesystem at the same time. I also thought that when a process A opens a file descriptor to a file at time t0, it gets, let's say, a snapshot of the file content at t0. Even if process A doesn't close the file descriptor (which seems to be the case here) and a process B appends to that file at some time t0 + delta, process A would have to reopen the file descriptor to see the changes; it could not hold on to the same file descriptor and get new data appended to that file... though it's obvious that what I've observed contradicts that assumption...
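That snapshot assumption is easy to check with a toy program (untested sketch, single JVM, but the same applies across processes; the file name is arbitrary): a reader opened before an append still sees the appended line, so an open descriptor is not a snapshot.
import java.io.*;
import java.nio.file.*;

public class FdNotASnapshot {
    public static void main(String[] args) throws Exception {
        Path file = Files.write(Paths.get("demo.txt"), "first\n".getBytes());
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            System.out.println(reader.readLine());   // "first"
            try (FileWriter w = new FileWriter(file.toFile(), true)) {
                w.write("second\n");                 // append after the reader opened
            }
            System.out.println(reader.readLine());   // "second", not null
        }
    }
}
The editor case behaves differently because most editors save by replacing the file (a new inode), not by appending to the one the reader holds open.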

BufferedReader was never closed, but the file could be deleted

Recently, I reviewed our application code, and I found one issue in our code.
/**
 * truncate cat tree(s) from the import file
 */
private void truncateCatTreesInFile(File file, String userImplCode) throws Exception
{
    String rowStr = null, treeCode = null;
    BufferedReader reader = new BufferedReader(new FileReader(file)); // note: never closed
    rowStr = reader.readLine(); // skip 1st row - header
    Impl impl;
    List<String> row = null;
    Set<String> truncatedTrees = new HashSet<String>();
    while ((rowStr = reader.readLine()) != null)
    {
        row = CrudServiceHelper.getRowFromFile(rowStr);
        if (row == null) continue;
        impl = getCatImportImpl(row.get(ECatTreeExportImportData.IMPL.getIndex()), userImplCode);
        treeCode = row.get(ECatTreeExportImportData.TREE_CODE.getIndex());
        if (truncatedTrees.contains(treeCode)) continue;
        truncatedTrees.add(treeCode);
        CatTree catTree = _treeDao.findByCodeAndImpl(treeCode, impl.getId());
        if (catTree != null) _treeDao.makeTransient(catTree);
    }
    _treeDao.flush();
}
Looking at the above code, the reader is never closed. I thought it could be an issue, but actually it just works fine: Tomcat is able to delete the file.
Basically, what I am trying to do is upload a file from the browser and generate SQL based on the file to insert data into our database. After it's all done, delete the file.
I am surprised this code works fine. Does anybody have an idea? I tried to google it, but I did not get anywhere.
Thanks,
Jack
Not closing a reader may result in a resource leak. Deleting an open file may still be perfectly fine.
Under Linux (and other Unix variants), deleting a file is just unlinking a name from it. A file with no names left actually gets freed. So opening a file, deleting it (removing its name), and then reading and writing to it is a well-known way to obtain a temporary file. Once the file is closed, the space is freed, but not earlier.
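For example (a toy sketch of that Unix behavior; the file name is arbitrary):
import java.io.*;

public class ReadAfterUnlink {
    public static void main(String[] args) throws Exception {
        File f = new File("scratch.txt");
        try (FileWriter w = new FileWriter(f)) {
            w.write("still readable\n");
        }
        FileInputStream in = new FileInputStream(f); // hold an open descriptor
        System.out.println(f.delete());              // true: the name is gone
        System.out.println(f.exists());              // false
        BufferedReader r = new BufferedReader(new InputStreamReader(in));
        System.out.println(r.readLine());            // "still readable"
        r.close();                                   // only now is the space freed
    }
}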
Under Windows, certain programs lock files they read, which prevents other processes from removing such a file. But not all programs do so. I don't have a Windows machine around to actually test how Java handles this.
The fact that the code does not crash does not mean that it works completely correctly. The problem you noticed might become visible only much later, if the app consumes more and more RAM due to the leak. This is unlikely, though: the garbage collector will eventually close readers, and probably soon enough, because reader is local and never leaks out of the method.
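The leak itself is trivial to avoid with try-with-resources; here is a sketch of the relevant part of the method above:
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
    reader.readLine(); // skip 1st row - header
    String rowStr;
    while ((rowStr = reader.readLine()) != null) {
        // ... process the row as before
    }
} // reader (and the underlying file descriptor) is closed here, even on exceptions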

Using PrintWriter with files

Say I have the following code:
private PrintWriter m_Writer;
m_Writer = new PrintWriter(new FileWriter(k_LoginHistoryFile));
I am writing to a local file on the server whose name is k_LoginHistoryFile.
Now, as my program runs, it keeps writing to this file, so how can I delete all the file content between writes?
I think this is important, as I don't want to end up with a file that has current, updated information at its beginning plus out-of-date info at its end.
Thanks in advance
This expression:
new FileWriter(k_LoginHistoryFile)
will truncate your file if it exists. It won't just overwrite the start of the file. It's not clear how often this code is executing, but each time it does execute, you'll start a new file (and effectively delete the old contents).
I think this is important, as I don't want to end up with a file that has current, updated information at its beginning plus out-of-date info at its end.
If you want to keep a running output file (and you can't keep the file open), consider this constructor: FileWriter(String, boolean)
If the boolean is true, your updated information will be at the end of the file.
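For example (a toy sketch; the file name is made up):
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class AppendDemo {
    public static void main(String[] args) throws IOException {
        for (int i = 0; i < 3; i++) {
            try (PrintWriter out = new PrintWriter(
                    new FileWriter("login_history.log", true))) { // true = append
                out.println("login event " + i);
            }
        }
        // login_history.log now contains all three lines, oldest first.
    }
}
With false (or the single-argument constructor), each open would instead truncate the file back to empty.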

Check if archives are identical

I'm using a shell script to automatically create a zipped backup of various directories every hour. If I haven't been working on any of them for quite some time, this creates a lot of duplicate archives. MD5 hashes of the archives don't match, because they have different filenames, creation dates, etc.
Other than making sure there won't be duplicates in the first place, another option is checking whether file sizes match, but equal sizes don't necessarily mean the files are duplicates.
Filenames look like so:
Qt_2012-03-15_23_00.tgz
Qt_2012-03-16_00_00.tgz
So maybe it would be an option to check whether consecutive files have identical file sizes (if "consecutive" is the right word for it).
Pseudo code:
int previoussize = 0;
String previouspath = null;
String Filename = null;
String workDir = "/path/to/workDir";
String processedDir = "/path/to/processedDir";

// Loop over all files
for file in workDir
{
    // Match
    if (file.size() == previoussize)
    {
        if (previouspath != null) // skip first loop
        {
            rm previouspath; // delete file
        }
    }
    else // no match
    {
        /* If there's no match, we can move the previous file
           to another directory so it doesn't get checked again */
        if (previouspath != null) // skip first loop
        {
            mv previouspath processedDir/Filename;
        }
    }
    previoussize = file.size();
    previouspath = file.path();
    Filename = file.name();
}
Example:
Qt_2012-03-15_23_00.tgz 10KB
Qt_2012-03-16_00_00.tgz 10KB
Qt_2012-03-16_01_00.tgz 10KB
Qt_2012-03-16_02_00.tgz 15KB
Qt_2012-03-16_03_00.tgz 10KB
Qt_2012-03-16_04_00.tgz 10KB
If I'm correct this would only delete the first 2 and the second to last one. The third and the fourth should be moved to the processedDir.
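In real Java rather than pseudocode, the same loop might look like this (an untested sketch; the paths are placeholders, and I'm assuming files are visited in name order, which matches the timestamped names):
import java.io.File;
import java.util.Arrays;

public class DedupBySize {
    public static void main(String[] args) {
        File workDir = new File("/path/to/workDir");         // placeholder
        File processedDir = new File("/path/to/processedDir"); // placeholder
        File[] files = workDir.listFiles(); // assumes the directory exists
        Arrays.sort(files); // zero-padded hourly names sort chronologically
        long previousSize = -1;
        File previous = null;
        for (File file : files) {
            if (previous != null) {
                if (file.length() == previousSize) {
                    previous.delete(); // same size as the next archive: drop it
                } else {
                    previous.renameTo(new File(processedDir, previous.getName()));
                }
            }
            previousSize = file.length();
            previous = file;
        }
        // As in the pseudocode, the newest file stays in workDir for the next run.
    }
}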
So I guess I have 2 questions:
Would my pseudo code work the way I intend it to? (I find these things rather confusing.)
Is there a better/simpler/faster way? Because even though the chance of accidentally deleting non-identicals like that is very small, it's still a chance.
I can think of a couple of alternatives:
Deploy a version control system such as Git, Subversion, etc, and write a script that periodically checks in any changes. This will save a lot of space because only files that have actually changed get saved, and because changes to text files will be stored as diffs.
Use an incremental backup tool. This article lists a number of alternatives.
Normal practice is to put the version control system / backups on a different machine, but you don't have to do that.
It's not clear whether this needs to run as a batch. If it's manual, you can run Beyond Compare or any decent comparison tool to diff the two archives.
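Another angle, if the archives differ only because of the .tgz wrapper: gzip embeds a timestamp (and often a name) in its header, so two byte-identical tars still produce different .tgz files. Hashing the decompressed tar stream sidesteps that. A sketch (untested; it assumes the archived files' own mtimes haven't changed, since tar records those too):
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;
import java.util.zip.GZIPInputStream;

public class TarDigest {
    // Hash the *decompressed* tar bytes, ignoring the gzip header entirely.
    static byte[] digest(String tgzPath) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (InputStream in = new GZIPInputStream(new FileInputStream(tgzPath))) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                md5.update(buf, 0, n);
            }
        }
        return md5.digest();
    }

    public static void main(String[] args) throws Exception {
        boolean same = MessageDigest.isEqual(
                digest("Qt_2012-03-15_23_00.tgz"),
                digest("Qt_2012-03-16_00_00.tgz"));
        System.out.println(same ? "duplicates" : "different");
    }
}
Unlike the size check, a digest match here means the archived content really is identical, so there's no risk of deleting non-identical archives by accident.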
