Log4J able to recover from disk full? - java

We have several Java application servers running here, with several apps. They all log with Log4J to the same file system, which we created just for that purpose.
From time to time it happens that the file system runs out of space and the app gets
log4j:ERROR Failed to flush writer,
java.io.IOException
Unfortunately Log4J does not recover from this error, so even after space is freed on the file system, no more logs are written by that app. Are there any options, programmatic or configuration-wise, to get Log4J going again besides restarting the app?

I didn't test this, but the website of logback states:
Graceful recovery from I/O failures
Logback's FileAppender and all its sub-classes, including
RollingFileAppender, can gracefully recover from I/O failures. Thus,
if a file server fails temporarily, you no longer need to restart your
application just to get logging working again. As soon as the file
server comes back up, the relevant logback appender will transparently
and quickly recover from the previous error condition.
I assume the same would be true for the above situation.

What do you see as an acceptable outcome here? I'd consider writing a new Appender that wraps whichever appender is accessing the disk and tries to do something sensible when it detects IOExceptions. Maybe have it wrap the underlying Appender's write methods in a try-catch block and send you or a sysadmin an email.
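A rough, untested sketch of such a wrapper for log4j 1.x (the class and the notifyAdmin hook are hypothetical; note that log4j 1.x appenders often report I/O problems to their ErrorHandler rather than throwing, so you may also need a custom ErrorHandler on the delegate):

    import org.apache.log4j.Appender;
    import org.apache.log4j.AppenderSkeleton;
    import org.apache.log4j.spi.LoggingEvent;

    // Hypothetical wrapper: delegates to another appender and reacts to failures.
    public class ResilientAppenderWrapper extends AppenderSkeleton {

        private final Appender delegate;

        public ResilientAppenderWrapper(Appender delegate) {
            this.delegate = delegate;
        }

        @Override
        protected void append(LoggingEvent event) {
            try {
                delegate.doAppend(event);
            } catch (RuntimeException e) {
                // Do something sensible: alert a sysadmin, switch to a fallback appender, ...
                notifyAdmin(e);
            }
        }

        // Hypothetical hook -- e.g. send an email or bump a metric.
        private void notifyAdmin(Exception e) {
            System.err.println("Logging delegate failed: " + e);
        }

        @Override
        public boolean requiresLayout() {
            return false;
        }

        @Override
        public void close() {
            delegate.close();
            closed = true;
        }
    }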

Limit the size of your logs and try using a custom appender to archive logs to a backup machine with lots of disk space.

Related

Log4j Console Appender Limitations

I am curious about the numbers around the limitations for the Log4j console appender. I have a service that processes messages using a thread pool and logs each event after processing. Before the thread pool approach, the service would just use the main thread for processing all of the messages. This took too long so I implemented a thread pool so each thread can process a subset of the messages as they are independent of each other.
However, I started running into an issue where apparently the async queue is full and the threads discard logs until queue capacity becomes available. I tracked down where this log message is coming from, and it is due to the discarding policy: https://logging.apache.org/log4j/2.x/log4j-core/apidocs/src-html/org/apache/logging/log4j/core/async/DiscardingAsyncQueueFullPolicy.html#line.49
This is a problem as I need the logs and I need to use a console appender. I added a config to instead use the default policy so we don’t discard logs: https://logging.apache.org/log4j/2.x/log4j-core/apidocs/src-html/org/apache/logging/log4j/core/async/DefaultAsyncQueueFullPolicy.html#line.29
But now the issue is that processing the messages takes too long, which makes sense: when the queue is full, the thread spends time sending logs to the console instead of returning to process another batch of messages.
My questions:
Is there anything I can do to address this issue if I need to use a console appender? Would more CPU/memory help the threads in this case?
Why exactly does the queue fill up so quickly? When using the main thread to process ALL of the messages (so no batches) we don't run into this issue, but when using the threads to batch-process the messages we do. Also, can we check the log4j queue size programmatically?
Can we configure the size of the log4j queue if we're using a console appender?
Is there a maximum logs-per-second figure to expect from a console appender, so we can compare it and see if we're somehow logging much more than that?
We want to log the events to the console, so we haven't tried a different appender such as a file appender. Would that be our only solution if we are trying to log too many logs per second?
The reason your logging queue is filling up is that you increased the performance of your application too much :-). Now you have a bottleneck in the logging appender.
There are some benchmarks comparing a FileAppender to a ConsoleAppender in Log4j2 documentation. Your figures might vary, but due to the synchronization mechanisms in System.out, the console appender is around 20 times slower than the file appender. The bottleneck is not really the filesystem, since redirecting stdout to /dev/null gives similar results.
To improve performance, you can set the direct property on the console appender to true (cf. documentation). This basically replaces System.out with new FileOutputStream(FileDescriptor.out) and provides performance on par with the file appender.
You can fine-tune the LMAX disruptor using Log4j2 properties (cf. documentation). E.g. log4j2.asyncLoggerRingBufferSize allows you to increase the size of the async queue (ring buffer in LMAX terminology).
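A minimal sketch of both tweaks; the attribute and property names come from the documentation referenced above, while the surrounding values are illustrative. In log4j2.xml, enable direct writes on the console appender:

    <Console name="Console" target="SYSTEM_OUT" direct="true">
        <PatternLayout pattern="%d %p %c{1} [%t] %m%n"/>
    </Console>

And in log4j2.component.properties (or as a -D system property), enlarge the ring buffer used by the async loggers, for example:

    log4j2.asyncLoggerRingBufferSize=1048576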

Extending log4j/slf4j Logger

I am in a situation where multiple threads (from the same JVM) are writing to the same file (logging by using Logger).
I need to delete this file at some point, and the next use of the logger will create the file again and continue logging.
The logging library is synchronized, therefore I do not need to worry about concurrent logging to the same file.
But... I want to add an external operation on this file, namely deleting it, so I have to somehow synchronize the logging (Logger) with this delete operation, because I do not want to delete the file while the Logger is doing work.
Things I thought of:
Use FileChannel.lock to lock the file, something the Logger does as well. I decided against this because of this:
File locks are held on behalf of the entire Java virtual machine. They
are not suitable for controlling access to a file by multiple threads
within the same virtual machine.
Which means in my case (same JVM, multiple threads) this will not cause the effect I want.
What are my options?
Am I missing something vital here?
Perhaps there is a way to do this using the already existing stuff in the Logger?
It seems you are looking for log rolling and log archiving functionality. Log rolling is a common feature in Log4j and Logback (SLF4J as well).
You can configure the logging library to create a new log file based on size of the current file or the time of day. You can configure the file name format for the rolled file and then have the external process archive or delete old rolled log files.
You can refer to the Log4j 2 configuration given in this answer.
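As a starting point, a rolling configuration in Log4j 2 might look roughly like this (file names, patterns, sizes and retention are illustrative):

    <RollingFile name="Rolling" fileName="logs/app.log"
                 filePattern="logs/app-%d{yyyy-MM-dd}-%i.log.gz">
        <PatternLayout pattern="%d %p %c{1} [%t] %m%n"/>
        <Policies>
            <TimeBasedTriggeringPolicy/>
            <SizeBasedTriggeringPolicy size="10 MB"/>
        </Policies>
        <DefaultRolloverStrategy max="10"/>
    </RollingFile>

The rolled files (compressed to .gz here by the filePattern) can then be archived or deleted by your external process.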
Filesystems are generally synchronized by the OS, so you can simply delete the file without having to worry about locks or anything. Depending on how log4j locks the file, that delete might fail though, and you would need to add a retry loop.
    // Try to delete the log file, retrying a few times in case log4j still holds it open.
    int attempts = 3;
    final File logfile = new File(theLogFilePath);
    while ((attempts > 0) && logfile.exists() && !logfile.delete()) {
        --attempts;
        try {
            // Give the logger a moment to release the file before retrying.
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop retrying.
            Thread.currentThread().interrupt();
            attempts = 0;
        }
    }
This isn't exactly clean code, but then what you're doing isn't clean anyway. ;)
You interfere with the logging process rather rudely, but since a user could also delete that file at any time, log4j should handle it gracefully. Worst case, my guess is that a message that was about to be logged will get lost, but that's probably not an issue considering that you delete the log file anyway.
For a cleaner implementation see this question.
A trick I've used in the past when there is no other option (see Saptarshi Basu's log-rolling suggestion https://stackoverflow.com/a/53011323/823393) is to just rename the current log file.
After the rename, any outstanding logging that is queued up for it continues into the renamed one. Usually, any new log requests will create a new file.
All that remains is to clean up the renamed one. You can usually manage this using some external process or just delete any old log files whenever this process triggers.
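A minimal sketch of that rename step (paths are illustrative, and on Windows the rename may fail while log4j still has the file open):

    import java.io.File;

    // Rename the active log file; new log requests should end up in a freshly created file.
    File current = new File("logs/app.log");
    File archived = new File("logs/app-" + System.currentTimeMillis() + ".log");
    if (current.renameTo(archived)) {
        // The external process can now archive or delete the renamed file safely.
        System.out.println("Archived to " + archived);
    }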

Working of Appenders in Logging utilities (log4j) in the context of a web app

How do appenders work in a web app when the log output is to be written to a single file?
Referring to how servlets work: each request is served by a different servlet thread, and in a web application there are numerous requests occurring in parallel.
How is logging handled when multiple instances of the servlets want to log to the same file ?
Is this problem handled like the critical section problem in shared resources ?
Since the operation involved is a write and the resource is a single log file, would it not slow down the web app when serving requests?
It depends on the logging framework. In log4j, the AppenderSkeleton.doAppend method is synchronized, allowing only one thread at a time to write a log statement. The good news is that unless you are logging thousands of messages per second (and if you are, you are probably doing something wrong or using the wrong framework), this is not a problem.
Remember that the actual writing to the file might not occur immediately, as there is most likely a buffer that keeps the critical-section turnaround rather quick.
The scenario you mentioned is not the only case where synchronization is necessary. For example, multiple appenders can be configured at once, and the logging system has to ensure that no appenders are added or removed in the middle of a log event, to avoid a situation where the same event would be logged by one appender but not by another.
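To illustrate what that synchronization gives you, here is a minimal, hypothetical log4j 1.x appender; because AppenderSkeleton.doAppend is synchronized, the append method below can update its plain counter without any extra locking:

    import org.apache.log4j.AppenderSkeleton;
    import org.apache.log4j.spi.LoggingEvent;

    // Hypothetical appender: counts events and writes them to stdout.
    public class CountingConsoleAppender extends AppenderSkeleton {

        private long eventCount; // guarded by the synchronized doAppend in AppenderSkeleton

        @Override
        protected void append(LoggingEvent event) {
            eventCount++;
            System.out.println(eventCount + ": " + layout.format(event));
        }

        @Override
        public boolean requiresLayout() {
            return true; // a Layout must be configured for this appender
        }

        @Override
        public void close() {
            closed = true;
        }
    }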

JBOSS Console Logging, recommended in production environment?

I've been looking for an answer to this for a while. In the company where I work we have a highly concurrent system, but we recently found that the logging configuration of the web server (JBoss) includes the Console appender. The application loggers also go to the console. We started to get deadlocks on the logging actions, most of them in the console appender (I know that Log4j has a really nasty synchronization bug, but I'm almost sure we don't have any synchronization in the related code). Another thing we found is that the IT guys regularly access the console with a PuTTY session, pause the output to check the logs, and then just close the PuTTY window.
Is it possible that the console appender, and the use of the console for logging and monitoring in a production environment, are causing deadlocks and race conditions in the system? My understanding is that the console should only be used during development with an IDE, because on a highly concurrent system it is just another resource to contend for (slow because of unbuffered I/O) and subject to race conditions.
Thanks.
From the Best practices for Performance tuning JBoss Enterprise Application Platform 5, page 9
Turn off console logging in production
Turn down logging verbosity
Use asynchronous logging.
Wrap debug log statements with if (isDebugEnabled())
I heavily recommend the first and last points in production, because Log4J builds the log message before deciding whether to log it; i.e. if MyClass#toString() is a heavy operation, Log4J will first compute that String (yes, it will perform the heavy operation) and only then check whether the String must be logged (pretty bad, indeed =).
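The usual guard looks like this; the expensive argument construction only runs when DEBUG is actually enabled (the logger variable and the buildExpensiveReport method are illustrative):

    // Without the guard, buildExpensiveReport() runs even when DEBUG is disabled,
    // because the argument String is built before Log4J decides whether to log it.
    if (log.isDebugEnabled()) {
        log.debug("Report: " + buildExpensiveReport());
    }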
Also, tell the IT guys to use the less command when checking the log files so they don't block them; don't read the console directly =). The less command works on Linux; if your server is in a Unix environment, the command would be tail (based on @Toni's comment).
IMO, an official JBoss performance guide is the best argument for dropping console logging in production (even if this still doesn't prove your deadlock problem).

How to handle disk full errors while logging in logback?

I am using slf4j + logback for logging in our application. Earlier we were using JCL + log4j and recently migrated.
Due to the high amount of logging in our application, there is a chance of the disk becoming full in the production environment. In such cases we need to stop logging while the application keeps working fine. From what I found on the web, we would need to poll logback's StatusManager for such errors, but that adds a dependency on logback to the application.
For log4j, I found that we can create an Appender that stops logging in such scenarios. That again creates an application dependency on log4j.
Is there a way to configure this with only slf4j or is there any other mechanism to handle this?
You do not have to do or configure anything. Logback is designed to handle this situation quite nicely. Once the target disk is full, logback's FileAppender will stop writing to it for a short amount of time. Once that delay elapses, it will attempt to recover. If the recovery attempt fails, the waiting period is increased gradually, up to a maximum of 1 hour. If the recovery attempt succeeds, FileAppender will start logging again.
The process is entirely automatic and extends seamlessly to RollingFileAppender. See also graceful recovery.
On a more personal note, graceful recovery is one of my favorite logback features.
You may try implementing a wrapper around the slf4j Logger, specifically the info, debug, trace and other methods, and manually query the available space (via File.getUsableSpace()) before every call.
That way you will not need any additional application dependency.
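A minimal sketch of that idea, using only the SLF4J API plus java.io (the class name, path and threshold are made up; only two logging methods are shown, the rest would be delegated the same way):

    import java.io.File;
    import org.slf4j.Logger;

    // Hypothetical wrapper: skips logging when the log partition is nearly full.
    public final class DiskAwareLogger {

        private static final long MIN_FREE_BYTES = 50L * 1024 * 1024; // assumed 50 MB floor

        private final Logger delegate;
        private final File logDir;

        public DiskAwareLogger(Logger delegate, File logDir) {
            this.delegate = delegate;
            this.logDir = logDir;
        }

        public void info(String msg) {
            if (hasSpace()) {
                delegate.info(msg);
            }
        }

        public void debug(String msg) {
            if (hasSpace()) {
                delegate.debug(msg);
            }
        }

        private boolean hasSpace() {
            // getUsableSpace() returns 0 if the path does not exist, which also disables logging here.
            return logDir.getUsableSpace() > MIN_FREE_BYTES;
        }
    }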
Two real options:
Add a cron task on Linux (or a scheduled task on Windows) to clean up your mess (including gzipping some logs, if need be).
Buy a larger hard disk and perform the maintenance manually.
Either way, consider reducing logging.
A full disk is like an OOM: you can't know what will fail first when you catch it. You deal with running out of memory (or disk) by preventing it. There can be many other places where extra disk space is needed, and those tasks would fail as well.
