Exploring user specific data in webapps

I am busy practicing designing a simple todo list webapp whereby a user can authenticate into the app and save todo list items. The user is also only able to view/edit the todo list items that they added.
This seems to be a general feature (authenticated user only views their own data) in most web applications (or applications in general).
To me what is important is having knowledge of the different options for accomplishing this. What I would like to achieve is a solution that can handle lots of users' data effectively. At the moment I am doing this using a Relational Database, but noSQL answers would be useful to me as well.
The following ideas came to mind:
Add a user_id column each time this "feature" is needed.
Add an association table (in the example above a user_todo_list_item table) that associates the data.
Design in such a way that you have a table per user per "feature" ... so you would have a todolist_userABC table. It's an option, but I do not like it much since a thousand users means a thousand tables?!
Add row level security to the specific "feature". I am not familiar with how this works, but it seems to be a valid option. I am also not sure whether this is database vendor specific.
Of these options, I went with the user_id column on the todolist_item table. Although it can do the job, I feel that a user_id column might be problematic when reading data if the table gets large enough. One could add an index, I guess, but I am not sure how effective the index would be.
What I don't like about it is that I need a user_id column in every table where I want this type of feature, which doesn't seem right to me. It also seems that when I implement the database layer I would have to add this condition to my queries for every feature (unless I use some AOP)?
I had a look around (How does Trello store data in MongoDB? (Collection per board?)), but it does not speak about the techniques regarding user_id columns or things like that. I also tried reading about this in some security frameworks (Spring Security to be specific) but it seems that it only goes into privileges/permissions on a table level and not a row level?
So the question is whether my choice was appropriate and if there are better techniques to do this?

Your choice is the natural thing to do.
The table-per-user is a non-starter (anything that modifies the database structure in response to user action is usually suspect).
Row-level security isn't really an option for webapps - it requires each user session to have a separate, persistent connection to the database, which is rarely practical. And yes, it is vendor-specific.
How you index your tables depends entirely on your usage patterns and the types of queries you want to run. Is 'show all TODOs for a user' a query you want to support (it seems like it would be)? Then an index on the user id is obviously needed.
Why does having a user_id column seem wrong to you? If you want to restrict access by user, you need to be able to identify which user the record belongs to. That doesn't mean every table needs it: for example, if one record composes another (say, your TODOs have 'steps', and each step belongs to a single TODO), only the root of the object graph needs the user id.
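To make the user_id approach concrete, here is a minimal in-memory sketch (plain Java, no database) of a repository where every lookup takes the owner's id, so a user can only reach their own items. The `HashMap` keyed by user id stands in for an index on `user_id`, and `Step` carries no user id because it is only reachable through its owning `Todo`. All class and method names are hypothetical; in a real app the same filter would be a `WHERE user_id = ?` clause.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model of a todo table with a user_id column.
class TodoStore {
    // A step belongs to exactly one todo, so it needs no user id of its own.
    record Step(String text) {}

    // Only the root of the object graph records its owner.
    record Todo(long id, long userId, String title, List<Step> steps) {}

    private final Map<Long, Todo> byId = new HashMap<>();
    // Stand-in for an index on user_id: user id -> that user's todos.
    private final Map<Long, List<Todo>> byUser = new HashMap<>();
    private long nextId = 1;

    Todo add(long userId, String title, List<Step> steps) {
        Todo todo = new Todo(nextId++, userId, title, List.copyOf(steps));
        byId.put(todo.id(), todo);
        byUser.computeIfAbsent(userId, k -> new ArrayList<>()).add(todo);
        return todo;
    }

    // Equivalent of: SELECT * FROM todolist_item WHERE user_id = ?
    List<Todo> findAllForUser(long userId) {
        return List.copyOf(byUser.getOrDefault(userId, List.of()));
    }

    // Equivalent of: SELECT * FROM todolist_item WHERE id = ? AND user_id = ?
    // Asking for another user's item yields "not found" rather than the record.
    Optional<Todo> findForUser(long todoId, long userId) {
        Todo todo = byId.get(todoId);
        return (todo != null && todo.userId() == userId) ? Optional.of(todo) : Optional.empty();
    }

    public static void main(String[] args) {
        TodoStore store = new TodoStore();
        Todo alice = store.add(1, "Buy milk", List.of(new Step("Find shop")));
        store.add(2, "Write report", List.of());
        System.out.println(store.findAllForUser(1).size());              // 1
        System.out.println(store.findForUser(alice.id(), 2).isPresent()); // false
    }
}
```

Putting the owner check inside the repository methods means callers cannot forget it, which addresses the concern about adding the condition to every query by hand.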

Related

Table data overrides

I'm currently sourcing some static data from a third party. It's a simple one-to-many, like this:
garage: id, name, desc, location
garage_price: id, garage_id, price_type, price
Sometimes, the data is incorrect, and I will need to correct it. At the same time, I'd like to preserve the original sourced data somewhere and potentially run some queries to show the changes.
My question is whether anyone is doing something like this with SQL, Java and Hibernate, and what approach you've taken, or would take.
I could add a boolean column, "original_data", to both tables, and before an update happens, run a trigger to copy the row from garage or garage_price into an "original_garage" or "original_price" table as long as original_data is true. Then set original_data to false, and all further updates will just happen on the garage/garage_price tables.
Is there anything wrong with that approach, and how do people typically work with multiple tables holding the same data in Hibernate/JPA? Previously, I'd create a class that holds all the data and subclass it twice, once per table, while setting
@Inheritance(strategy=InheritanceType.TABLE_PER_CLASS)
on the parent.

As so often, there are various options:
Use Hibernate Envers. It will keep a complete history of changes, so if you do multiple changes each will result in a row in the auditing tables. These tables are separate from your main data tables which might be a pro or a con, depending on your requirements.
Use the approach that you described: write the original dataset, and copy it before modifying it. You'll need two additional attributes:
A flag marking the original, and a technical id to have a unique primary key.
The same as the second option, but you could do the copying in a database trigger. This is probably faster, works no matter how the data gets inserted, and copying rows in the database is really easy, while it feels rather cumbersome in Java. Of course, many Java developers consider writing triggers a PITA in itself. If your application doesn't usually use triggers and stored procedures, it is also really easy to forget about the trigger and be confused about where these additional rows come from.
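The second option (keep the sourced row, copy it before the first modification) can be sketched in plain Java as follows. This is an in-memory illustration only, assuming a hypothetical `Garage` record and `GarageTable` class; in practice you would do the same with two real tables, a Hibernate entity listener, or the trigger described above. The per-row flag mirrors the `original_data` column from the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: preserve the sourced row the first time it is corrected.
class GarageTable {
    record Garage(long id, String name, String location) {}

    // Stand-ins for the garage and original_garage tables.
    private final Map<Long, Garage> garage = new HashMap<>();
    private final Map<Long, Garage> originalGarage = new HashMap<>();
    // Mirrors the original_data flag: true until the first correction.
    private final Map<Long, Boolean> originalData = new HashMap<>();

    void importFromSource(Garage g) {
        garage.put(g.id(), g);
        originalData.put(g.id(), true);
    }

    void correct(Garage updated) {
        long id = updated.id();
        // Copy the sourced version only once; later corrections just overwrite.
        if (originalData.getOrDefault(id, false)) {
            originalGarage.put(id, garage.get(id));
            originalData.put(id, false);
        }
        garage.put(id, updated);
    }

    Garage current(long id) {
        return garage.get(id);
    }

    Garage original(long id) {
        // A row that was never corrected is still its own original.
        return originalGarage.getOrDefault(id, garage.get(id));
    }
}
```

Because only the first correction triggers the copy, the original table holds exactly one sourced version per row, which keeps "show me what changed" queries simple; Envers would instead keep every intermediate version.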

Adding custom fields in my application

I have a SaaS product, which is built with Spring MVC and Hibernate. SaaS products generally allow users to customize the product, for example by adding extra fields to a table. So I want to give users the flexibility to create custom fields in the tables for themselves. Please provide all the viable solutions to achieve this. Thank you so much for your help.

I'm guessing you're trying to back this with a relational database. The primary problem is that relational databases store things in tables, and tables don't really handle free-form data well.
So one solution is to use a document structure that is flexible, like XML (and perhaps ditch the database) but databases have features which are nice, so let's also consider the database-using approaches.
You could create a "custom field" table with a composite primary key of (ExtendedTable, ColumnName), but you'd also have to store the data somewhere, e.g. in a table keyed by (ExtendedTable, ColumnName, ExtendedKey) that holds a DataItem.
And now we get into the really nasty bits. How would you apply constraints to this data? I mean, what would the type of a DataItem be? A general solution would be quite complex (it amounts to a kind of free-form database). Hopefully you could limit the solution to the problems you actually need solved.
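One way to limit that complexity is to store a declared type with each custom field definition and validate values against it before saving. The sketch below is a hypothetical in-memory version of the two tables described above (field definitions keyed by extended table and column name; values keyed additionally by the extended row's key). It supports only three types and is not a general solution.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical EAV-style custom fields with a declared type per field.
class CustomFields {
    enum FieldType { TEXT, INTEGER, BOOLEAN }

    record FieldKey(String extendedTable, String columnName) {}
    record ValueKey(String extendedTable, String columnName, long extendedKey) {}

    private final Map<FieldKey, FieldType> definitions = new HashMap<>();
    // DataItem values are stored as strings, as a single generic column would hold them.
    private final Map<ValueKey, String> values = new HashMap<>();

    void define(String table, String column, FieldType type) {
        definitions.put(new FieldKey(table, column), type);
    }

    void set(String table, String column, long rowKey, String dataItem) {
        FieldType type = definitions.get(new FieldKey(table, column));
        if (type == null) {
            throw new IllegalArgumentException("Undefined field " + table + "." + column);
        }
        if (!isValid(type, dataItem)) {
            throw new IllegalArgumentException("Value '" + dataItem + "' is not a " + type);
        }
        values.put(new ValueKey(table, column, rowKey), dataItem);
    }

    String get(String table, String column, long rowKey) {
        return values.get(new ValueKey(table, column, rowKey));
    }

    // The type "constraint" that the relational schema can no longer enforce for us.
    private static boolean isValid(FieldType type, String value) {
        switch (type) {
            case INTEGER:
                try {
                    Long.parseLong(value);
                    return true;
                } catch (NumberFormatException e) {
                    return false;
                }
            case BOOLEAN:
                return "true".equals(value) || "false".equals(value);
            default:
                return true;
        }
    }
}
```

Every check that a column type or constraint would normally give you has to be rebuilt by hand like this, which is exactly the complexity the answer warns about.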
Another approach is to use a single "extra" column that contains an XML record which embeds its own "column and value" extensions, but if you wanted to display a table of them efficiently, you'd have to parse every XML document in every field, which is not ideal.
Neither one of these approaches will work well with the existing SQL query language, so you'll then start building your own query language.
I suggest you go back and look at real data requirements, instead of sweeping them under the table with a "and anything else one might want" set of columns on your table.

Your requirement is a use case best suited to NoSQL databases (like MongoDB).
Dynamically creating relational database tables and columns (modifying schemas) on user request is not a best practice, because these are DDL operations, which are very powerful; if you don't handle them carefully, the whole application's database can end up in an inconsistent state.

Is there any heuristic/pattern for logging user actions

I have a GWT/Java/Hibernate/MySQL application (but I think any web pattern could be valid) that does CRUD on several objects. Each object is stored in a table in the database. I want to implement an action logger. For example, for Object A I want to know who created and modified it, and for User B, what actions he performed.
My idea is to have a History table that stores : UserId, ObjectId, ActionName. The UserId and ObjectId are foreign keys. Am I on the right track ?

I also think this is the right direction.
However, bear in mind that in an application with lots of traffic, these logs can become an overhead.
I would suggest the following in this case -
A. Don't use hibernate for this "action logging" - Hibernate has better performance for "mostly read DB"
B. Consider DB that is better in "mostly write" scenario for the action logging table.
You can try to look for a NoSQL solution for this.
C. If you use such a NoSQL DB but still want to keep the logged actions in the relational DB, have an offline process (that runs once a day, for example) that queries your "action logging DB" and inserts the entries into the relational DB.
D. If it's OK that your system might lose some action logs, consider using the producer/consumer pattern (for example, a queue between producer and consumer threads): the threads that need to log actions will not log them synchronously, but asynchronously.
E. In addition, don't forget that such a logging table can become flooded over time, causing queries on it to take a long time. For these issues, consider the following:
E.1. Every day, remove really old logs (say, older than a month) or move them to some "backup" table.
E.2. Index the fields that you mostly use in action logging queries (for example, an action_type field).
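Point D can be sketched with a `BlockingQueue` from `java.util.concurrent`: request threads enqueue log entries and return immediately, while a single background thread drains the queue and writes them out. In this sketch the "write" just appends to an in-memory list standing in for the History table; the `ActionLog` and `LogEntry` names are hypothetical. As the answer notes, entries still sitting in the queue are lost if the process dies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical asynchronous action logger (producer/consumer).
class ActionLog {
    record LogEntry(long userId, long objectId, String actionName) {}

    // Sentinel that tells the consumer thread to stop (compared by identity).
    private static final LogEntry POISON = new LogEntry(-1, -1, "STOP");

    private final BlockingQueue<LogEntry> queue = new LinkedBlockingQueue<>();
    // Stand-in for the History table (UserId, ObjectId, ActionName).
    private final List<LogEntry> history = Collections.synchronizedList(new ArrayList<>());
    private final Thread consumer = new Thread(this::drain, "action-log-writer");

    ActionLog() {
        consumer.setDaemon(true);
        consumer.start();
    }

    // Called by request threads: enqueues and returns without touching the database.
    void log(long userId, long objectId, String actionName) {
        queue.offer(new LogEntry(userId, objectId, actionName));
    }

    private void drain() {
        try {
            while (true) {
                LogEntry entry = queue.take();
                if (entry == POISON) {
                    return;
                }
                history.add(entry); // a real writer would batch INSERTs here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Lets the writer finish everything queued so far, then stops it.
    List<LogEntry> shutdown() throws InterruptedException {
        queue.put(POISON);
        consumer.join();
        return List.copyOf(history);
    }
}
```

With a single consumer and a FIFO queue, entries are written in the order they were logged; batching several entries per INSERT is a natural next step for the "mostly write" load the answer describes.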

If only changes to specific fields, e.g., something like status in a users table, should be tracked, I would use a user_status_histories table being referenced from the users table via foreign key. The user_status_histories table would contain fields such as current_status, date and something like admin_who_modified_the_status.
Whenever a status change is made, a new record would be inserted into the user_status_histories table. This would allow easy querying of all status changes.
Of course, querying a user would then require a (LEFT or INNER) JOIN with the user_status_histories table in order to get the last record (= the current status).
Depending on your needs, you might consider a current_status field in the users table (besides the status serving as foreign key) for fast access, which would be maintained in parallel with the user_status_histories table.
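A minimal sketch of that design, assuming hypothetical `User` and `StatusChange` types: every change appends a history record, and `currentStatus` is updated in the same method so that reading a user's status needs no join. In a database you would perform both writes in one transaction so the two never disagree.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of users + user_status_histories.
class UserStatusTracker {
    record StatusChange(String status, Instant date, String modifiedBy) {}

    static class User {
        final long id;
        String currentStatus; // denormalized copy for fast reads
        final List<StatusChange> history = new ArrayList<>(); // user_status_histories rows

        User(long id, String initialStatus, String createdBy) {
            this.id = id;
            changeStatus(initialStatus, createdBy);
        }

        // Append a history row and keep currentStatus in sync with it.
        void changeStatus(String newStatus, String admin) {
            history.add(new StatusChange(newStatus, Instant.now(), admin));
            currentStatus = newStatus;
        }

        // Equivalent of fetching the latest history row via a JOIN.
        String latestFromHistory() {
            return history.get(history.size() - 1).status();
        }
    }
}
```

Routing every status change through one method is what keeps the denormalized field and the history consistent; direct writes to `currentStatus` would bypass the history.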

Yes, you are. Another very similar framework is one which supports undo and redo. These frameworks track user actions and have the additional ability to restore state to the way it was before the user action.

Is it a bad practice to expose DB internal IDs in URLs?

For example, suppose I have a users table with some IDs (primary key) for each row. Would exposing the URL myapp.com/accountInfo.html?userId=5, where 5 is an actual primary key, be considered a "bad thing" and why?
Also assume that we properly defend against SQL injections.
I am mostly interested in answers related to the Java web technology stack (hence the java tag), but general answers will also be very helpful.
Thanks.

That depends on the way you parse the URL. If you allow blind SQL injection, that is bad. You only have to validate the id from the user input.
Stack Exchange also puts the id of the row into the URL, as you can see in your address bar. The trick is to parse that part and get rid of any possible SQL. The simplest way is to check that the id is a number.
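Checking that the id is a number can be as simple as parsing it strictly before it reaches any query. A minimal sketch with a hypothetical `parseId` helper; the parsed value should still be passed as a bind parameter (e.g. with `PreparedStatement.setLong`) rather than concatenated into SQL. Note that this only guards against injection; it does not stop a user from requesting somebody else's id.

```java
import java.util.OptionalLong;

// Hypothetical helper: accept only positive decimal ids from the URL.
class IdParam {
    static OptionalLong parseId(String raw) {
        // Reject missing values and anything too long to fit safely in a long.
        if (raw == null || raw.isEmpty() || raw.length() > 18) {
            return OptionalLong.empty();
        }
        // ASCII digits only, so input like "5 OR 1=1" or "5;--" is rejected outright.
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c < '0' || c > '9') {
                return OptionalLong.empty();
            }
        }
        long id = Long.parseLong(raw);
        return id > 0 ? OptionalLong.of(id) : OptionalLong.empty();
    }
}
```

The explicit ASCII check is used instead of `Character.isDigit` because the latter also accepts non-ASCII digits.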

It isn't a bad thing to pass it through in the URL, as it doesn't mean much to the end user; it's only bad if you rely on that value in the running of your application. For example, you don't want the user to notice that userId=5 and change it to userID=10 to display another person's account.
It would be much safer to store this information in a session on the server. For example, when the user logs in, their userID value is stored in the session on the server, and you use this value whenever you query the database. If you do it this way, there usually wouldn't be any need to pass through the userID in the URL, however it wouldn't hurt because it isn't used by your DB-querying code.
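The session-based approach can be sketched without any servlet API: the account lookup takes the user id from the server-side session and ignores any `userId` request parameter, so editing the URL has no effect. `Session`, `AccountService` and the map-based storage are hypothetical stand-ins for `HttpSession` and your data access layer.

```java
import java.util.Map;

// Hypothetical stand-ins for HttpSession and the accounts table.
class AccountService {
    record Session(long loggedInUserId) {}
    record Account(long userId, String email) {}

    private final Map<Long, Account> accounts;

    AccountService(Map<Long, Account> accounts) {
        this.accounts = accounts;
    }

    // Any userId in requestParams is deliberately ignored.
    Account accountInfo(Session session, Map<String, String> requestParams) {
        return accounts.get(session.loggedInUserId());
    }

    // If an id must come from the URL, check it against the session user first.
    Account accountById(Session session, long requestedUserId) {
        if (requestedUserId != session.loggedInUserId()) {
            throw new SecurityException("Not your account");
        }
        return accounts.get(requestedUserId);
    }
}
```

With either method, tampering with `userId=5` in the URL cannot reveal another user's account, because the authoritative id comes from the session.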

Using the database ID in URLs is good, because this ID should never change during an object's (DB row's) lifetime. Thus the URL is durable, which is the most important aspect of a URL. See also Cool URIs don't change.

Yes, it is a bad thing. You are exposing an implementation detail. How bad? That depends. It forces you to do unneeded checks of the user input. If other applications start depending on it, you are no longer free to change the database schema.

PKs are meant for the system.
To the user, it may represent a different meaning.
For example:
Consider the following links, each of which uses a primary key to display an item under productA, productB and productC:
(A)http://blahblahsite.com/browse/productA/111 (pkey)
(B)http://blahblahsite.com/browse/productB/112 (pkey)
(C)http://blahblahsite.com/browse/productC/113 (pkey)
User on link B may feel there are 112 items under ProductB, which is misleading.
It will also cause problems when merging tables, since the PKs are auto-incremented.

What are the best practices to separate data from users

For a customer we were developing a big application that was open to all users, if you will, meaning all users could see each other's data.
Now suddenly the customer is saying that they want only users belonging to the same organization to be able to view each other's data.
So we came up with this data model:
So now the question is: How is it best to separate the data?
This is the only alternative I see:
SQL JOIN on ALL relevant tables (all tables that hold data should now always join on Organization)
-- All queries should now add an extra join to Organization, and if the join path doesn't exist, we need to create a new foreign key.
But I feel an extra join (we have around 20 tables that need an extra join) is quite costly.
I hope there are some other best practices or solutions we can consider.
PS: This is a Web application developed using Java/JSF/Seam (but I don't know if that is relevant)
UPDATE
I want to clarify something. My concern is not security but performance. We have added a foreign key to organization to all relevant tables that have shared data, and we are using the logged-in user's organization to filter the data.
All I want to know is if this is a good architectural solution (inner join) or if we should do something else (ie: Load all shared data, and filter in memory instead of sql join).

You really have to understand the difference between the persistency layer and the application layer.
It doesn't matter how you define your database tables, as anyone with database access will have access to all the users' data. What does matter is how you define the behavior in your application.
Changing the database design should only be done for performance reasons, not for security - which should be handled in the application.

I would reckon that the best pattern would be to only expose the user details through the web application, so at that point it's a case of restricting the data exposed to each user. This will allow you to build the required security into the application.
Alternatively if you are allowing direct database access then you will need to create a login/user (depends on database used) for each organization or user and then restrict the access of these login/user entities to parameterized stored procedures rather than the base tables. This will push security back onto the database, which is riskier but still do-able.
As to meta changes to support the organization column, parameterizing the stored procedures will be fairly trivial:
select @organizationId = organizationId from User where User.id = @currentUserId
select * from User where organizationId = @organizationId
(depending on the SQL flavour, you will need to enclose some entities, e.g. `User`, [User], etc.)

I see no reason that Organization has to be 'joined' at all.
If your 'data' tables all have OrganizationID columns, then you can lookup the 'organizationID' from the user and then add this as a condition to the join.
EX:
select @organizationId = organizationId from User where User.id = @currentUserId
select * from datatable a .... where .... AND a.organizationID = @organizationId
See: no join.
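To avoid forgetting that extra condition in one of the roughly 20 affected queries, the data access layer can add it in one place. The sketch below is a hypothetical helper (not a Hibernate, JSF or Seam API) that takes a query's SELECT/FROM part and its WHERE condition separately, wraps the condition in parentheses so an `OR` inside it cannot bypass the filter, appends `AND <alias>.organizationID = ?`, and binds the logged-in user's organization id as a parameter. Hibernate's own `@Filter`/`@FilterDef` annotations serve a similar purpose.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper that scopes every data query to one organization.
class OrgScopedQuery {
    final String sql;
    final List<Object> params;

    private OrgScopedQuery(String sql, List<Object> params) {
        this.sql = sql;
        this.params = params;
    }

    // selectFrom: e.g. "SELECT * FROM datatable a"; condition: the query's own WHERE logic.
    static OrgScopedQuery scope(String selectFrom, String condition, List<?> conditionParams,
                                String alias, long organizationId) {
        // The alias is concatenated into SQL, so allow plain identifiers only.
        if (!alias.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Bad alias: " + alias);
        }
        List<Object> params = new ArrayList<>(conditionParams);
        params.add(organizationId); // bound as a parameter, never concatenated
        String sql = selectFrom + " WHERE (" + condition + ") AND " + alias + ".organizationID = ?";
        return new OrgScopedQuery(sql, Collections.unmodifiableList(params));
    }
}
```

The parentheses matter: without them, `a.status = ? OR a.priority = ? AND a.organizationID = ?` would return rows from other organizations whenever the first comparison matched, because `AND` binds more tightly than `OR`.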
With respect to performance, there are different types of joins, and SQL Server allows you to hint at the type of join. So in some cases a merge join is best, whereas in a scenario like this one a loop join would be best. I'm not sure if these choices are available in MySQL.
With respect to all of your tables needing a join or condition (see above), there is a logical answer and an implementation answer. The implementation answer depends on your indexing. If adding that condition limits the dataset the most, you will benefit. But if the join with another table that has already been filtered does a better job of reducing rows, then the condition will be worthless (or, in the worst case, it will cause the wrong index to be used). This assumes you have indexes on your join and condition columns.
Logically, only data that isn't fully dependent on a table that is filtered by organizationID needs that extra condition. If you have a car table, and carparts table, then you only have to filter the car table. Unless for some reason you don't need to join with the car table for some joins, in which case you will need that organizationID on the parts table too.
