Is there any heuristic/pattern for logging user actions - java

I have a GWT/Java/Hibernate/MySQL application (but I think any web pattern could be valid) that does CRUD on several objects. Each object is stored in a table in the database. I want to implement an action logger. For example, for Object A I want to know who created and modified it, and for User B, which actions they performed.
My idea is to have a History table that stores: UserId, ObjectId, ActionName. The UserId and ObjectId are foreign keys. Am I on the right track?

I also think this is the right direction.
However, bear in mind that in an application with lots of traffic, these logs can become an overhead.
I would suggest the following in this case -
A. Don't use hibernate for this "action logging" - Hibernate has better performance for "mostly read DB"
B. Consider DB that is better in "mostly write" scenario for the action logging table.
You can try to look for a NoSQL solution for this.
C. If you use such a NoSQL DB, but still want to keep the logged actions in the relational DB, have an offline process (that runs once a day, for example) that will query your "action logging DB" and insert the results into the relational DB.
D. If it's ok that your system might lose some action logging, consider using producer/consumer pattern (for example - use a queue between producer and consumer thread) - the threads that need to log actions will not log them synchronously, but will log them asynchronously.
E. In addition, don't forget that such logging table has the potential to be over-flooded in time, causing queries on it to take a long time. For these issues consider the following:
E.1. Every day remove really old logs - let's say - older than month, or move them to some "backup" table.
E.2. Index the fields that you mostly use in action logging queries (for example, maybe an action_type field).
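To make point D concrete, here is a minimal sketch of the producer/consumer idea using a bounded queue. The class names (ActionEntry, AsyncActionLogger) and the sink parameter are made up for illustration; the sink stands in for whatever really writes the row (a JDBC insert, a NoSQL client, and so on). Note that log() drops the entry when the queue is full, which is only acceptable if losing some log entries is OK.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** One row of the History table from the question (hypothetical class name). */
final class ActionEntry {
    final long userId;
    final long objectId;
    final String actionName;

    ActionEntry(long userId, long objectId, String actionName) {
        this.userId = userId;
        this.objectId = objectId;
        this.actionName = actionName;
    }
}

/** Producer/consumer logger: request threads enqueue, one worker thread writes. */
final class AsyncActionLogger {
    private final BlockingQueue<ActionEntry> queue;
    private final Thread worker;
    private volatile boolean running = true;

    /** The sink is an assumption: it stands in for the real write to storage. */
    AsyncActionLogger(int capacity, Consumer<ActionEntry> sink) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            try {
                // Keep going until shutdown() was called and the backlog is empty.
                while (running || !queue.isEmpty()) {
                    ActionEntry entry = queue.poll(50, TimeUnit.MILLISECONDS);
                    if (entry != null) {
                        sink.accept(entry);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "action-logger");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Never blocks the caller; returns false if the queue was full and the entry was dropped. */
    boolean log(long userId, long objectId, String actionName) {
        return queue.offer(new ActionEntry(userId, objectId, actionName));
    }

    /** Lets the worker drain what is already queued, then waits for it to finish. */
    void shutdown() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The request thread only pays for an in-memory offer(); the cost of the actual write moves to the worker thread. Entries still in the queue are lost if the JVM dies, which is the trade-off mentioned in D.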

If only changes to specific fields, e.g., something like status in a users table, should be tracked, I would use a user_status_histories table being referenced from the users table via foreign key. The user_status_histories table would contain fields such as current_status, date and something like admin_who_modified_the_status.
Whenever a status change is made, a new record would be inserted into the user_status_histories table. This would allow easy querying of all status changes.
Of course, querying a user would then require a (LEFT or INNER) JOIN with the user_status_histories table in order to get the last record (= the current status).
Depending on your needs, you might think of a current_status field in the users table (besides the status serving as foreign key) for fast access, which would be maintained parallel to the user_status_histories table.
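As a rough illustration of the "history table plus cached current_status" idea, here is an in-memory sketch. The class names (StatusChange, UserStatus) are hypothetical, and a real implementation would do both writes against the database inside one transaction.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One row of the user_status_histories table (hypothetical class name). */
final class StatusChange {
    final String status;
    final Instant date;
    final String modifiedBy; // admin_who_modified_the_status

    StatusChange(String status, Instant date, String modifiedBy) {
        this.status = status;
        this.date = date;
        this.modifiedBy = modifiedBy;
    }
}

/** In-memory stand-in for one users row together with its history rows. */
final class UserStatus {
    private final List<StatusChange> history = new ArrayList<>();
    private String currentStatus; // denormalized copy for fast reads

    /** Appends a history row and updates the cached current status together. */
    void changeStatus(String newStatus, String admin, Instant when) {
        history.add(new StatusChange(newStatus, when, admin));
        currentStatus = newStatus;
    }

    /** Fast path: no need to look at the history to get the current status. */
    String currentStatus() {
        return currentStatus;
    }

    /** All status changes in the order they were made (read-only view). */
    List<StatusChange> history() {
        return Collections.unmodifiableList(history);
    }
}
```

The point of routing every change through a single method is that the cached value and the history cannot drift apart.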

Yes you are. Another very similar framework is one which supports undo and redo. These frameworks track user actions and have the additional ability to restore state to the way it was before the user action.

Related

Table data overrides

I'm currently sourcing some static data from a third party. It's a simple one-to-many, like this
garage:
id
name
desc
location
garage_price:
id
garage_id
price_type
price
Sometimes, the data is incorrect, and I will need to correct it. At the same time, I'd like to preserve the original sourced data somewhere and potentially run some queries to show the changes.
My question is whether someone is doing something like this with SQL, Java and Hibernate, and what's the approach you've taken, or would take.
I could add a boolean column, "original_data", to both tables, and before an update happens, run a trigger to copy the row from garage or garage_price into an "original_garage" or "original_price" table as long as original_data is true. Then set original_data to false, and all further updates will just happen on the garage/garage_price tables.
Anything wrong with that approach, and how do people typically work with multiple tables with the same data in Hibernate/JPA? Previously, I'd create a class that holds all the data, and subclass it twice, once per each table, while setting
@Inheritance(strategy=InheritanceType.TABLE_PER_CLASS)
on the parent.
As so often there are various options:
Use Hibernate Envers. It will keep a complete history of changes, so if you do multiple changes each will result in a row in the auditing tables. These tables are separate from your main data tables which might be a pro or a con, depending on your requirements.
Use the approach that you described: Write the original dataset, copy it before modifying it. You'll need two additional attributes:
A flag marking the original and a technical id to have a unique primary key.
Just as the second version, but you could actually do that in a trigger in the database. That is probably faster and works no matter how the data gets inserted, and copying rows in the database is actually really easy, while it feels rather cumbersome in Java. Of course, writing triggers is considered a PITA in itself by many Java developers. If your application doesn't usually use triggers and stored procedures, it is also really easy to forget about the trigger and be rather confused about where these additional rows come from.

Data structure/Java Technique for managing a list of sequential commands

I'm not sure if something special exists for this use case - but it felt like a case where someone was likely to have made some sort of useful structure/technique/design-pattern.
My Situation
I have a set of SQL commands executed from middle tier (Java) to insert/update/delete data to any of a set of very large tables via joins from a related staging table.
I have more SQL commands which update various derived tables based on the staging table/actual table contents. Different tables will interact with different derived tables via different queries (as usual). These commands may have to be interleaved with the first set depending on the use case - so, I can't necessarily execute set 1 then set 2 all at once.
My Question
So, I need to build a chain of commands that get executed sequentially, and I need to trigger a rollback if any of them fail. I'd like to do this in the most clear, documented way possible.
Does anyone know a standard way of coding this? I'm sure anyone migrating from stored procedure code to middle tier code has done this before and I don't want to reinvent the wheel if there are good options out there.
Additional Information
One of my main concerns is making everything clear. To elaborate, I'll have a set of queries specifically designed to:
Truncate staging table A' and populate it with primary keys targeting deletion records
Delete from actual table A based on join with A'
Truncate staging table A' and populate it with full data for upserts
Update/Insert records from A' to A based on joins
The same logic will apply to tables B, C, D, etc. Unfortunately, it can be the case where just A and C need an extra step, like syncing deletes to a certain derived table, to be done after the deletions but before the upserts.
I'd obviously like to group all the logic for updating a table, and I'd like to group all the logic for updating a derived table as well, but at execution time they have to be intelligently interleaved and this sounds messy to me.
Don't write such a thing yourself. This is what JTA was born for.
You can use either JPA or Spring to do it.
Annotate the unit of work as transactional and let the database and JDBC handle it.
If you must do it yourself, follow the aspect-oriented approach and make it a decorative "before & after" implementation.
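If you do end up rolling it yourself, the "before & after" wrapper could look roughly like this. Everything here (Tx, Action, Step, ChainFailedException, CommandChain) is a hypothetical name for the sketch, not part of JTA, JPA or Spring; with plain JDBC, Tx would wrap Connection.setAutoCommit(false), commit() and rollback().

```java
import java.util.List;

/** Hypothetical transaction handle; not a JTA or Spring type. */
interface Tx {
    void begin();
    void commit();
    void rollback();
}

/** The work of one step; may throw, e.g. an SQLException. */
interface Action {
    void run() throws Exception;
}

/** A named step such as "delete from A based on join with A'". */
final class Step {
    final String name;
    final Action action;

    Step(String name, Action action) {
        this.name = name;
        this.action = action;
    }
}

/** Unchecked wrapper that records which step broke the chain. */
final class ChainFailedException extends RuntimeException {
    final String stepName;

    ChainFailedException(String stepName, Throwable cause) {
        super("Step failed: " + stepName, cause);
        this.stepName = stepName;
    }
}

final class CommandChain {
    /** Runs the steps in order; commits only if every step succeeds. */
    static void execute(Tx tx, List<Step> steps) {
        tx.begin();
        for (Step step : steps) {
            try {
                step.action.run();
            } catch (Exception e) {
                tx.rollback(); // undo everything done since begin()
                throw new ChainFailedException(step.name, e);
            }
        }
        tx.commit();
    }
}
```

Because the chain is just a list, the per-table steps and the derived-table steps can be defined in separate places and interleaved when the list is assembled, and the step names document the order that was actually run.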

Exploring user specific data in webapps

I am busy practicing on designing a simple todo list webapp whereby a user can authenticate into the app and save todo list items. The user is also only able to view/edit the todo list items that they added.
This seems to be a general feature (authenticated user only views their own data) in most web applications (or applications in general).
To me what is important is having knowledge of the different options for accomplishing this. What I would like to achieve is a solution that can handle lots of users' data effectively. At the moment I am doing this using a Relational Database, but noSQL answers would be useful to me as well.
The following ideas came to mind:
Add a user_id column each time this "feature" is needed.
Add an association table (in the example above a user_todo_list_item table) that associates the data.
Design in such a way that you have a table per user per "feature" ... so you would have a todolist_userABC table. It's an option but I do not like it much since a thousand users means a thousand tables?!
Add row level security to the specific "feature". I am not familiar with how this works but it seems to be a valid option. I am also not sure whether this is database vendor specific.
Of my choices I went with the user_id column on the todolist_item table. Although it can do the job, I feel that a user_id column might be problematic when reading data if the data within the table gets large enough. One could add an index I guess but I am not sure of the index's effectiveness.
What I don't like about it is that I need to have a user_id for every table where I desire this type of feature which doesn't seem correct to me? It also seems that when I implement the database layer I would have to add this to my queries for every feature (unless I use some AOP)?
I had a look around (How does Trello store data in MongoDB? (Collection per board?)), but it does not speak about the techniques regarding user_id columns or things like that. I also tried reading about this in some security frameworks (Spring Security to be specific) but it seems that it only goes into privileges/permissions on a table level and not a row level?
So the question is whether my choice was appropriate and if there are better techniques to do this?
Your choice is the natural thing to do.
The table-per-user is a non-starter (anything that modifies the database structure in response to user action is usually suspect).
Row-level security isn't really an option for webapps - it requires each user session to have a separate, persistent connection to the database, which is rarely practical. And yes, it is vendor-specific.
How you index your tables depends entirely on your usage patterns and types of queries you want to run. Is 'show all TODOs for a user' a query you want to support (seems like it would be)? Then an index on the user id is obviously needed.
Why does having a user_id column seem wrong to you? If you want to restrict access by user, you need to be able to identify which user the record belongs to. Doesn't actually mean that every table needs it - for example, if one record composes another (say, your TODOs have 'steps', each step belongs to a single TODO), only the root of the object graph needs the user id.
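To show what "the root of the object graph carries the user id" means in the data layer, here is a small in-memory sketch. TodoItem and TodoRepository are hypothetical names, and the list stands in for the todolist_item table; the comments show the kind of query each method would map to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/** One row of the todolist_item table (hypothetical class name). */
final class TodoItem {
    final long id;
    final long userId; // owner, i.e. the user_id column
    final String text;

    TodoItem(long id, long userId, String text) {
        this.id = id;
        this.userId = userId;
        this.text = text;
    }
}

/** Every read takes the caller's user id, so no method can return another user's rows. */
final class TodoRepository {
    private final List<TodoItem> rows = new ArrayList<>();
    private long nextId = 1;

    TodoItem add(long userId, String text) {
        TodoItem item = new TodoItem(nextId++, userId, text);
        rows.add(item);
        return item;
    }

    /** Like: SELECT * FROM todolist_item WHERE user_id = ? */
    List<TodoItem> findAllFor(long userId) {
        return rows.stream()
                .filter(r -> r.userId == userId)
                .collect(Collectors.toList());
    }

    /** Like: SELECT * FROM todolist_item WHERE id = ? AND user_id = ? */
    Optional<TodoItem> findOne(long userId, long id) {
        return rows.stream()
                .filter(r -> r.id == id && r.userId == userId)
                .findFirst();
    }
}
```

Putting the user id in the method signatures, rather than filtering afterwards in the web layer, makes it hard to forget the restriction in any one query.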

How to build a change tracking system - not audit system

I have a requirement in which I need to capture data changes (not auditing) and life cycle states on inventory.
Technology:
Java, Oracle, Hibernate + JPA
For the data changes, we have been given a list of data elements that are to be monitored. If the element changes we are to notify a given 3rd party vendor. What I want to do is make this a generic service that we can provide to any of our current and future 3rd party vendors.
We don't care who made the change or what the new value is just that it changed.
The thought is that the data layer of our application would use annotation on each of the data elements. If that data element changed, then it would place a message into a queue. The message bean would then read the queue and make an entry in a table.
Table to look something like the following:
Table Name: ATL_CHANGE_TRACKER
Key columns
INVENTORY_ID Inventory Id of the vehicle
SALEEVENT_ITEM_ID SaleEvent item of the vehicle
FIELD_CHANGED_ID Id of the field that got changed or action. Link to subscription
UPDATE_DTM Indicates the date time when change occurred.
For a given inventory, we could have up to 200 entries in this table (monitoring 200 fields across many tables).
Then a daemon for the given 3rd party would read from this table based on the fields that it has subscribed to (could be all the fields). It would then read whatever tables it needs in order to create the message to be sent to the 3rd party. This decouples the provider of the data from the user of the data.
Identify the list of fields/actions that are available
Table Name: ATL_FIELD_ACTION
Key columns
ID
NAME Name of the field/action - Example Color,Make
REC_CRE_TIME_STAMP
REC_CRE_USER_ID
LAST_UPDATE_USER_ID
LAST_UPDATE_TIME_STAMP
Subscription table, if 3rd Party company xyz is interested in 60 fields, the 60 fields will be mapped to this table.
ATL_FIELD_ACTION_SUBSCRIPTION
Key columns
ATL_FIELD_ACTION_ID ID of the atl_field_action table
CONSUMER 3rd Party Name
FUNCTION Name of the 3rd Party Transmission that it is used for
STATUS
REC_CRE_TIME_STAMP
REC_CRE_USER_ID
LAST_UPDATE_USER_ID
LAST_UPDATE_TIME_STAMP
The second part is that there will be actions on the life cycle of the inventory which will need to be recorded also. In this case, when the state of the inventory changes a message will be placed on the same queue and that entry will be entered in the same table.
Again, the daemon will have subscribed to these states and will collect the ones it is interested in.
The goal here is to not have the business tier/data tier care who wants the data - just that it needs to provide it so those interested can get it.
Wonder if anyone has done something like this - any gotchas - off the shelf - open source solutions to do this.
For a high-level discussion on the topic, I would suggest reading this article by Martin Fowler.
It sounds like you have write-once, read-many type of data, it might produce large volumes of data, and the data is different for different clients. If you ask me, it sounds like this may be a good place to make use of either a NOSQL database or hack your Oracle database to act as a NOSQL database. See here for a discussion on how someone did this with MySQL.
Otherwise, you may look at creating an "immutable" database table and have Hibernate write new records every time it does an update as described here.
Couple things.
First, you get to do all of this work yourself. While the JPA/Hibernate lifecycle listeners have an event for when an update occurs, you aren't passed the "old" object and the "new" object. So, you're going to have to keep track of what fields change using some other method.
Second, again with lifecycle listeners, be careful inside of them, as the transaction state is a bit murky. At least on Glassfish/EclipseLink, I've had "strange" problems using either the JPA or JMS from a lifecycle listener. Just weird behavior. We went to a non-transactional queue to capture all of our information that we track from the lifecycle events.
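One possible "other method" for the first point is to snapshot the monitored fields when the entity is loaded and compare against a second snapshot before the update. A minimal sketch, assuming the snapshots are plain maps of field name to value (ChangeDetector and its method are hypothetical names, not a JPA API):

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

final class ChangeDetector {
    /**
     * Returns the names of the monitored fields whose value differs between
     * the two snapshots. Fields that are not monitored are ignored.
     */
    static Set<String> changedFields(Map<String, Object> before,
                                     Map<String, Object> after,
                                     Set<String> monitored) {
        Set<String> changed = new TreeSet<>();
        for (String field : monitored) {
            // Objects.equals treats two nulls as equal and null vs value as a change.
            if (!Objects.equals(before.get(field), after.get(field))) {
                changed.add(field);
            }
        }
        return changed;
    }
}
```

Since the requirement is only "that it changed", the result is just a set of field names; each name could then become one message on the queue.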
If having the change data committed on its own transaction is acceptable, then there is value in pushing the data on to a faster, internal queue (which can feed a listener that posts it to an MDB). This just gets the auditing "out of band" with your transaction, giving you better transaction throughput. But if you need to have the change information committed with the same transaction, this won't work. For example, you could put something on the queue and then the transaction may be rolled back (for whatever reason), leaving the change on the queue showing it happened, when it in fact failed. That's a potential issue with this.
But if you're posting a lot of audit information, then this can be a concern.
If the auditing information has a short life span (with respect to the rest of the data), then you should probably make an effort to cull the audit tables, they can get pretty large.
Also, if practical, don't disregard the use of DB triggers for this. They can be quite efficient and effective at this process.

database audit table

I have an existing application that I am working w/ and the customer has defined the table structure they would like for an audit log. It has the following columns:
storeNo
timeChanged
user
tableChanged
fieldChanged
BeforeValue
AfterValue
Usually I just have simple audit columns on each table that provide a userChanged, and timeChanged value. The application that will be writing to these tables is a java application, and the calls are made via jdbc, on an oracle database. The question I have is what is the best way to get the before/after values. I hate to compare objects to see what changes were made to populate this table, this is not going to be efficient. If several columns change in one update, then this new table will have several entries. Or is there a way to do this in oracle? What have others done in the past to track not only changes but changed values?
This is traditionally what oracle triggers are for. Each insert or update triggers a stored procedure which has access to the "before and after" data, which you can do with as you please, such as logging the old values to an audit table. It's transparent to the application.
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:59412348055
If you use Oracle 10g or later, you can use built in auditing functions. You paid good money for the license, might as well use it.
Read more at http://www.oracle.com/technology/pub/articles/10gdba/week10_10gdba.html
"the customer has defined the table structure they would like for an audit log"
Dread words.
Here is how you would implement such a thing:
create or replace trigger emp_bur before update on emp for each row
begin
    -- note: != is never true when either side is NULL,
    -- so changes to or from NULL need extra handling
    if :new.ename != :old.ename then
        insert_audit_record('EMP', 'ENAME', :old.ename, :new.ename);
    end if;
    if :new.sal != :old.sal then
        insert_audit_record('EMP', 'SAL', :old.sal, :new.sal);
    end if;
    if :new.deptno != :old.deptno then
        insert_audit_record('EMP', 'DEPTNO', :old.deptno, :new.deptno);
    end if;
end;
/
As you can see, it involves a lot of repetition, but that is easy enough to handle, with a code generator built over the data dictionary. But there are more serious problems with this approach.
It has a sizeable overhead: a single update which touches ten fields will generate ten insert statements.
The BeforeValue and AfterValue columns become problematic when we have to handle different datatypes - even dates and timestamps become interesting, let alone CLOBs.
It is hard to reconstruct the state of a record at a point in time. We need to start with the earliest version of the record and apply the subsequent changes incrementally.
It is not immediately obvious how this approach would handle INSERT and DELETE statements.
Now, none of those objections are a problem if the customer's underlying requirement is to monitor changes to a handful of sensitive columns: EMPLOYEES.SALARY, CREDIT_CARDS.LIMIT, etc. But if the requirement is to monitor changes to every table, a "whole record" approach is better: just insert a single audit record for each row affected by the DML.
I'll ditto on triggers.
If you have to do it at the application level, I don't see how it would be possible without going through these steps:
start a transaction
SELECT FOR UPDATE of the record to be changed
for each field to be changed, pick up the old value from the record and the new value from the program logic
for each field to be changed, write an audit record
update the record
end the transaction
If there's a lot of this, I think I would be creating an update-record function to do the compares, either at a generic level or a separate function for each table.
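A generic version of that update-record compare might look something like the sketch below. It assumes the old row (as read by the SELECT FOR UPDATE) and the new values are available as maps of column name to value; AuditRecord and AuditDiff are hypothetical names, and storeNo is left out to keep it short. Values are stored as text, which runs into the datatype problem mentioned above for dates and CLOBs.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** One row of the customer's audit table (storeNo omitted in this sketch). */
final class AuditRecord {
    final Instant timeChanged;
    final String user;
    final String tableChanged;
    final String fieldChanged;
    final String beforeValue;
    final String afterValue;

    AuditRecord(Instant timeChanged, String user, String tableChanged,
                String fieldChanged, String beforeValue, String afterValue) {
        this.timeChanged = timeChanged;
        this.user = user;
        this.tableChanged = tableChanged;
        this.fieldChanged = fieldChanged;
        this.beforeValue = beforeValue;
        this.afterValue = afterValue;
    }
}

final class AuditDiff {
    /** Produces one audit record per column whose new value differs from the old row. */
    static List<AuditRecord> diff(String table, String user, Instant when,
                                  Map<String, Object> oldRow,
                                  Map<String, Object> newValues) {
        List<AuditRecord> records = new ArrayList<>();
        for (Map.Entry<String, Object> e : newValues.entrySet()) {
            Object oldValue = oldRow.get(e.getKey());
            Object newValue = e.getValue();
            if (!Objects.equals(oldValue, newValue)) {
                records.add(new AuditRecord(when, user, table, e.getKey(),
                        text(oldValue), text(newValue)));
            }
        }
        return records;
    }

    /** Naive text conversion; null stays null. */
    private static String text(Object value) {
        return value == null ? null : value.toString();
    }
}
```

The returned records would be inserted in the same transaction as the update itself, between the SELECT FOR UPDATE and the commit.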
