I'm looking for a way to prevent some sensitive data from being logged.
Ideally I would like to prevent / capture things like:
String sensitive = "";
log.info("This should be prevented or caught by something: {}", sensitive);
This post is a bit of a long shot; I'm willing to investigate any lead:
annotations, new types, Sonar rules, logger hacking, etc.
Thanks for your brainstorming :)
guillaume
Create a custom type for it.
Make sure that its toString doesn't return the actual content.
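A minimal sketch of such a type (the name Sensitive is made up, not from any library):

// A wrapper whose toString() never leaks the wrapped value, so logging the
// object (e.g. as an SLF4J {} argument) prints a mask instead of the secret.
public final class Sensitive {
    private final String value;

    public Sensitive(String value) {
        this.value = value;
    }

    // Explicit, deliberate access only.
    public String reveal() {
        return value;
    }

    @Override
    public String toString() {
        return "****";
    }
}

With this, log.info("value: {}", new Sensitive(raw)) prints value: **** because the logger only ever calls toString().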
I imagine there are multiple ways to do this, but one way is to use the Logback configuration file to specify a message provider for the "arguments" and "message" fields. In those providers, you define a writeTo method that looks for particular patterns in the output and masks them.
This is the path to a solution, but I obviously don't provide many details here. I'm not aware of any "standard" solutions for this.
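I don't have provider code at hand, but here is a rough sketch of the same masking idea using a plain Logback conversion word rather than the provider API (class name and pattern are my own; register it in logback.xml with a <conversionRule> element and use %maskedMsg in place of %msg in the pattern):

import java.util.regex.Pattern;
import ch.qos.logback.classic.pattern.ClassicConverter;
import ch.qos.logback.classic.spi.ILoggingEvent;

// Masks anything matching a known pattern before the message is written out.
public class MaskingConverter extends ClassicConverter {
    // Hypothetical pattern (US-style SSNs); tune to whatever "sensitive" means for you.
    private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    @Override
    public String convert(ILoggingEvent event) {
        return SSN.matcher(event.getFormattedMessage()).replaceAll("***-**-****");
    }
}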
Another possibility would avail itself if your architecture has services running in transient containers and the log output is sent to a centralized log aggregator, like Splunk. If you were OK with the initial logs written in the container containing sensitive data, you could have the log aggregator look for patterns to mask.
I would suggest two options. First, can you split your PII data into a separate log and then store that log securely?
If not, consider something like Cribl LogStream. Point your log shipper at it and let it strip away any PII you are concerned about. LogStream makes it very easy to remove/mask/encrypt sensitive data, and it has all sorts of other features as well.
At my last job we used LogStream as the router to make decisions about the data based on its content. PII data was detected, and one copy was pushed to a secure, PII-certified logging platform while another copy was pushed to the operational logging platform with the PII masked, so a wider audience could use the logs with no risk. It was a very useful workflow that solved a lot of problems.
Related
We're currently running into an interesting problem regarding the sanitization of error logs being printed into our server logs. We have proper global error handling set up and have custom error messages that are sent back as responses from our OSGi java servlets.
We use dockerized containers as server instances that are autoscaled, so we're thinking about setting up a log aggregator and storing our exceptions within a DB in the cloud, that way we can also track metrics about our exceptions and pinpoint how we could improve our development process to reduce certain types of errors, etc.
I did a bit of research about how that should be done and I found this: the OWASP Logging Cheat Sheet. It mentions that passwords should never be logged, among a few other things. That brings us to my question:
How do I go about properly sanitizing my logs without using some janky text processing or manually covering up all the potential cases?
Example stacktrace:
pkg.exceptions.CustomException: some registration error
ERROR: duplicate key value violates unique constraint "x_username_org_id_key"
Detail: Key (username, org_id)=(SOME EMAIL, 1) already exists.
Query: with A as (some query) insert into someTable (..values...) Parameters: [X, X, X, X, X, SOME_EMAIL, THE_PASSWORD]
at somepkg.etc
This is a pretty common error with registration systems that happens due to username collisions. Sure, there are ways this specific case can be avoided, by ensuring the username isn't taken before the insertion is attempted and handling that case separately, but that's just a single case among many others.
After looking around, there doesn't seem to be an obvious way to solve the problem, and I'm wondering whether everyone out there has simply implemented their own version of a log sanitizer. We could simply purge the stacktrace if some troublesome strings are present, but that's not the best solution. Any suggestions?
If you only store and pass around password hashes, you won't need to sanitize the logs for passwords. In cases where a password must be held temporarily in code, use a char[] rather than a String. This is a more secure approach in general and is considered a best practice; the standard library APIs all use character arrays for passwords.
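For illustration, the usual char[] pattern looks something like this (a sketch; authenticate is a hypothetical method that accepts a char[]):

import java.util.Arrays;

// Hold the password in a char[] and wipe it when done, so it does not
// linger on the heap the way an immutable String would.
void login(String username, char[] password) {
    try {
        authenticate(username, password); // hypothetical, takes char[]
    } finally {
        Arrays.fill(password, '\0'); // overwrite the secret after use
    }
}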
I would like to create rolling files that contain statistical data about my service.
For example, logging each request that contained parameter X with a certain result set.
I have to write these files to comply with other systems' statistical data:
Roll the file every half hour
Each file has to have column headers
I have to follow a strict file name convention such as tracking.display.1314116577.done
My service is written in Java.
Since I need to roll files, using loggers seems like a good direction, so I tried an approach where I log the data using a Logback logger (my logger of choice). However, the conventional rolling file appender cannot roll the file every half hour (or am I wrong?), cannot add column headers, and has a strict naming convention of its own.
I have tried to write my own RollingPolicy, but I can't find enough resources or examples of how it's done.
Can anyone show/refer me how to accomplish this?
If not, would you recommend a different approach?
Thank you!
Yes, you can do it with a Logback appender.
Take a look at the TimeBasedRollingPolicy of RollingFileAppender; with it you can roll the file every half hour.
To write the header, you can extend RollingFileAppender and add the header based on your needs.
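For the header, here is an untested sketch of a slightly different route than subclassing the appender: Logback writes a layout's getFileHeader() at the top of every newly opened file, including after a roll, so overriding it in a custom layout may be enough (the column names here are hypothetical):

import ch.qos.logback.classic.PatternLayout;

// Emits a column-header line at the top of each freshly opened log file.
public class HeaderedCsvLayout extends PatternLayout {
    @Override
    public String getFileHeader() {
        return "timestamp,request_id,parameter_x,result";
    }
}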
A very dumb answer: getting logging frameworks to do exactly this costs a lot of time, so it may be easier to implement your own logger from scratch. You seem to have very rigid requirements. For instance that ".done" suffix: you probably first write the file as ".part" and then rename it when done.
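If you do roll your own, the .part-then-rename trick is only a few lines (a sketch, reusing the file name from your question):

import java.io.IOException;
import java.nio.file.*;
import java.util.List;

// Write the half-hour chunk under a temporary name, then rename it, so
// downstream systems only ever pick up complete files.
void flush(List<String> lines) throws IOException {
    Path part = Paths.get("tracking.display.1314116577.part");
    Files.write(part, lines); // 'lines' holds the buffered statistics rows
    Files.move(part, Paths.get("tracking.display.1314116577.done"),
            StandardCopyOption.ATOMIC_MOVE);
}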
I am using ActiveMQ and Spring.
Is there any way I can keep track of all processed messages? I have to record which messages have been processed, and I also want to review these processed messages at a later stage.
Should I use a database for this?
Is there any good library that can make this operation easy?
I do not want to make a table in the database for every kind of model object.
In general, I would suggest that you either log and/or record the messages into the database. If you simply want to review the messages later, simple logging may suffice. If you need to do transactional rollup/searching through a UI, then the database is better.
However, you can also achieve what you want with ActiveMQ virtual destinations. With this, you can have 1 destination forward to 2 other destinations. Then your app could listen on 1 destination, and a copy of the message would sit on the other for your review. For example:
<broker persistent="false" useJmx="false" xmlns="http://activemq.apache.org/schema/core">
  <destinationInterceptors>
    <virtualDestinationInterceptor>
      <virtualDestinations>
        <compositeQueue name="MY.QUEUE">
          <forwardTo>
            <queue physicalName="MY.QUEUE.PROCESS" />
            <topic physicalName="MY.QUEUE.REVIEW" />
          </forwardTo>
        </compositeQueue>
      </virtualDestinations>
    </virtualDestinationInterceptor>
  </destinationInterceptors>
</broker>
This would define a queue MY.QUEUE where each message ends up BOTH on the .PROCESS queue and on the .REVIEW topic.
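For completeness, the consumer side of the .PROCESS copy is plain JMS; a sketch (the broker URL is assumed):

import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

// Listens on the forwarded MY.QUEUE.PROCESS queue; the broker keeps the
// parallel copy on the MY.QUEUE.REVIEW topic for later inspection.
ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
Connection connection = factory.createConnection();
connection.start();
Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
MessageConsumer consumer = session.createConsumer(session.createQueue("MY.QUEUE.PROCESS"));
consumer.setMessageListener(message -> {
    // normal processing goes here
});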
I would use a database.
Perhaps you could use an ORM such as Hibernate, but plain JDBC or Spring's JdbcTemplate may be better.
Rather than making a separate table for each model object, make a 'message' table and serialize the uncommon portions into a payload blob (or text column). You could then use a utility to deserialize the message for review (or playback) later.
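A sketch of what that single-table write might look like with plain JDBC (the table layout and serialization are made up for illustration):

import java.sql.*;
import javax.sql.DataSource;

// One generic table, e.g. message(id, msg_type, received_at, payload),
// instead of a table per model class. The payload bytes come from whatever
// serialization you pick (Java serialization, JSON, XML...).
void recordMessage(DataSource ds, Object msg, byte[] payload) throws SQLException {
    String sql = "INSERT INTO message (msg_type, received_at, payload) VALUES (?, ?, ?)";
    try (Connection con = ds.getConnection();
         PreparedStatement ps = con.prepareStatement(sql)) {
        ps.setString(1, msg.getClass().getName());
        ps.setTimestamp(2, new Timestamp(System.currentTimeMillis()));
        ps.setBytes(3, payload);
        ps.executeUpdate();
    }
}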
Kartik,
This is a good programming question, but it's more of a "what should I program" question than a "how can I do this" question. It's hard to answer a "what should I program" question because what you should program depends directly on what you need. At best, we can only guess at what you really need.
If you need to update the processed JMS messages, then a database will make it easy to update. If you need to prove that nobody updated a "logged" entry, then a database might not do the job.
Let's say this log is used to see which very-slow-to-process messages still need to complete. Then a database will provide easy searching, provided that the person searching knows SQL. However, if the log is more of an archive, then the database just adds overhead to the entire process; a structured file will do.
In Java there is JDBC for writing to and retrieving from databases, and it is not a hard API to use. Then again, there are also a number of decent logging frameworks, and of course there is always FileOutputStream. Without knowing how this log is to be used, it is very difficult to determine which techniques are overkill; likewise it's not possible to know which techniques are not quite enough.
Go back and review how the log is to be used, and then evaluate if the features that databases provide are overkill.
Cheers,
Ed
In GWT one typically loads i18n strings using an interface like this:
public interface StatusMessage extends Messages {
String error(String username);
:
}
which then loads the actual strings from a StatusMessage.properties file:
error=User: {0} does not have access to resource
This is a great solution; however, my client is adamant in his demand that the i18n strings be stored in a database so they can be changed at runtime (though it's not a requirement that they be changed in real time).
One solution is to create an async service which takes a message ID and user locale and returns a string. I have implemented this and find it terribly ugly (it introduces a huge amount of extra communication with the server, plus it makes property-placeholder replacement rather complicated).
So my question is this: can I in some nice way implement a custom message provider that loads the messages from the backend in one big swoop (for the current user session)? If it can also hook into the default GWT message mechanism, then I would be completely happy (i.e. so I can create an interface like the one above and keep using the nice {0}, {1}... placeholder format).
Other suggestions for clean database driven messages in GWT are also welcome.
GWT's built-in Dictionary class is the best way to move forward. Here's the official documentation on how to use it.
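Roughly, it works like this (a sketch; the messages object name is arbitrary). The host page embeds a JS object rendered server-side from the database, and the GWT code reads it through Dictionary:

import com.google.gwt.i18n.client.Dictionary;

// The host page (generated from the DB) would embed something like:
//   <script>var messages = { "error": "User: {0} does not have access to resource" };</script>
Dictionary messages = Dictionary.getDictionary("messages");
String template = messages.get("error"); // MissingResourceException if the key is absent

// Dictionary hands back raw strings, so the {0}-style substitution that
// Messages gives you has to be done by hand:
String text = template.replace("{0}", username);

One caveat: Dictionary does not hook into the Messages interface, so you lose the compile-time checking of the generated interfaces.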
Let's say your application has 500 messages per locale at an average of 60 characters per message. I wouldn't think twice about loading all of these when the user logs in or selects their language: it's <50k of data and should not be an issue if you can assume broadband connectivity is available... your "one swoop" suggestion. I already do that in one GWT application, although it's not messages but properties that are read from the database.
I think you might find this article useful:
http://googlewebtoolkit.blogspot.com/2010/02/putting-test-data-in-its-place.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+blogspot/NWLT+(Google+Web+Toolkit+Blog)&utm_content=Google+Reader
What you could do is set up a TextResource and then just change the text at runtime. I haven't tried this, but I am fairly confident it would work.
To optimize performance, you can put your messages in a JS resource, for example http://host.com/app/js/messages.js?lang=en, then map this resource to a servlet which takes the messages dictionary from your cache (a singleton bean, for instance) and writes it to the response.
To optimize even more, you can:
- add a parameter to the resource URL, for example: .../messages.js?lang=en&version={last updated date of messages}
- {last updated date of messages} is stored somewhere in the DB
- whenever a user updates the messages, {last updated date of messages} will change
- in the response to the browser, set the Cache-Control header to tell the browser to cache your messages.
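A sketch of the servlet end (MessageCache is a hypothetical singleton holding the per-locale message map):

import java.io.IOException;
import java.util.Map;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Serves /app/js/messages.js?lang=en as a JS dictionary built from the cache.
public class MessagesJsServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        Map<String, String> msgs = MessageCache.get(req.getParameter("lang")); // hypothetical cache
        resp.setContentType("text/javascript");
        resp.setHeader("Cache-Control", "public, max-age=86400"); // let the browser cache it
        StringBuilder js = new StringBuilder("var messages = {");
        String sep = "";
        for (Map.Entry<String, String> e : msgs.entrySet()) {
            js.append(sep).append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue().replace("\"", "\\\"")).append('"');
            sep = ",";
        }
        js.append("};");
        resp.getWriter().write(js.toString());
    }
}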
I have a basic facility for allowing users to remotely apply changes to the logging files in my application. Some logs are configured using java.util.logging properties files, and some are configured using log4j/log4cplus-style properties files. I'd like to do some basic validation of the properties that users try to apply. Namely, I want to assure the following:
Every logging.properties file must always contain at least a root logger/logging level
The logger/level must be set to a valid value. That is, they should not be able to set .level = GIBBERISH or anything like that.
I'll probably allow them to set MaxFileSize and MaxBackupIndex (log4j), and .limit and .count properties (java.util.logging), too.
What's the best way to accomplish this? I can obviously just loop over the keys and values in a Properties object and check them against a hard-coded Map or some other data structure that describes the valid properties, but I'm trying to come up with a solution that's a little more elegant than that.
The problem with running any set of partial syntax checks against the properties files is that they'll always be inadequate by definition, unless you capture every variation acceptable to the logging system, in which case you'll have recreated a portion of the logging system. No matter what properties you choose to validate, there's bound to be additional ways to submit broken files.
Rather than testing for individual properties, why not create an additional (temporary, for the scope of the check only) logger object based on the input file and detect if it throws an error?
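For the java.util.logging flavour, for example, you can lean on the JDK itself instead of a hand-rolled whitelist; a sketch:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.logging.Level;

// Loads the candidate file and lets the JDK reject bad values: Level.parse
// throws IllegalArgumentException on names like "GIBBERISH".
public class LoggingConfigValidator {
    public static void validate(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        if (props.getProperty(".level") == null) {
            throw new IllegalArgumentException("missing root .level entry");
        }
        for (String key : props.stringPropertyNames()) {
            if (key.endsWith(".level")) { // covers ".level" itself and per-logger levels
                Level.parse(props.getProperty(key).trim());
            }
        }
    }
}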
The "elegant" solution would be to write a rule-based engine for checking sets of name-value pairs. But IMO that is totally over the top for this use-case ... unless the checks are far more complex than I imagine.
I'd say that the simple (inelegant) solution is best in this case.