factors to consider before dropping a column from a table

factors to consider before dropping a column from a table - java

We are supporting deletion of columns from a table through java layer. What are the factors that should be considered before doing so like
1.no of instances in table
2.behavior of different database vendors etc

Are you sure you're doing the right thing? Unless you're using throwaway tables for people to mess around with or use for studying, this sounds like a not so good design.
Once tables are defined, their column number shouldn't change. Otherwise you'd get denormalized tables; foreign keys can break and all hell breaks loose unless you've got pretty good constraints placed on your columns.
Tracking what columns exist and what queries you can execute will put much more burden on your JDBC code than needed.
This isn't a Java or JDBC question, it's more of a database design question. You should speak with your DBA about this.

constraints on the columns. If you have PKs or FKs on the columns then they may not (or cannot) be dropped (easily) depending on db vendor. While the cols may not be 'used' by the user, they may be dependent on by other cols/tables.
Also, total agree with duffymo. VERY dangerous to allow users to chose to drop cols.
Oracle does have the ability to restore dropped cols, but really, do you want to go down that path?
Auto generated drop statements have always been fraught with peril.

The biggest question has little to do with the database and everything to do with clients that use it. Your Java app might be able to check to see if there are any non-null entries in the column, but it can't tell which clients are expecting the column to be there for SELECTs and UPDATEs.
I don't know your precise use case, but I'd say this is usually an activity for a DBA and not a user of your app. I'd advise caution.

Related

Table data overrides

I'm currently sourcing some static data from a third party. It's a simple one-to-many, like this
garage:
id
name
desc
location
garage_price:
id
garage_id
price_type
price
Sometimes, the data is incorrect, and I will need to correct it. At the same time, I'd like to preserve the original sourced data somewhere and potentially run some queries to show the changes.
My question is whether someone is doing something like this with SQL, Java and Hibernate, and what's the approach you've taken, or would take.
I could add a boolean column, "original_data", to both tables, and before an update happens, run a trigger to copy the row from garage or garage_price into an "original_garage" or "original_price" table as long as original_data is true. Then set original_data to false, and all further updates will just happen on the garage/garage_price tables.
Anything wrong with that approach, and how do people typically work with multiple tables with the same data in Hibernate/JPA? Previously, I'd create a class that holds all the data, and subclass it twice, once per each table, while setting
#Inheritance(strategy=InheritanceType.TABLE_PER_CLASS)
on the parent.

As so often there are various options:
Use Hibernate Envers. It will keep a complete history of changes, so if you do multiple changes each will result in a row in the auditing tables. These tables are separate from your main data tables which might be a pro or a con, depending on your requirements.
Use the approach that you described: Write the original dataset, copy it before modifying it. You'll need two additional attributes:
A flag marking the original and a technical id do have a unique primary key.
Just as the second version, but you could actually do that in a trigger in the database. Which probably is faster, works no matter how the data gets inserted and to copy rows in the database is actually really easy, while it feels rather cumbersome in Java. Of course, writing triggers is considered a PITA in itself by many Java developers. If your application doesn't usually use triggers and stored procedures it is also really easy to forget about the trigger and being rather confused where these additional rows come from.

~4K rows insertion to DB per user - design & performance

I'm writing an application that allows each user to label English words in three categories (some lexical exercise).
The main DB table, Word, contains ~4K different rows of words.
The Label table contains 3 labels.
--> The Word-Label table (that contains 3 columns: word_id, label_id, user_id) will add 4K rows per user (let's assume all the words starts with some pre-defined label when user register to the system).
The problem is that the table will grow very fast. 1:4000 (user/row) is bad in my opinion.
What can you suggest here to eliminate such a huge table? I've read that table-per-user is also considered bad practice.
In addition, I'm using Spring & Hibernate and the 4K insertions after the user get registered for the first time is pretty tough and takes time.
I can consider some NoSQL solution or another tool than Hibernate, but I'm consisting to use Spring & Java - so suggest something properly.
Will be glad for your help here!

There is no issue with data size. You may have an issue with Hibernate, but that is another issue.
If you end up with thousands of users, you'll have a few tens of millions of rows. That is not a large number of rows. If you want to insert default labels for a new user, then the code would look something like this:
insert into userLabels (userId, wordId, label)
select :userId, w.wordId, <default label>
from words w;
I would be surprised if this took more than a second or two.
If you knew that you would be having millions of users, then size might be more of an issue. The best solution would require better understanding of the application. The solution might vary from partitioning the tables, using arrays, or coming up with a different structure for representing your data.
You probably want various indexes on your tables to speed performance, but that depends on the queries you want to run. You might consider using a native interface to the database. Your use-case doesn't seem particularly complicated, so I don't know what advantage Hibernate or similar layers gets you.

First approach, you will just add new row to word-label for user after action. So, not every user will probably have 4k rows in that table. Now, when your database - query and stuff around that functionality will be a problem (bottleneck) then try to fix the issue and improve performance.
There are many performance tricks in sql databases you can use. For example, you wrote about table per user. That's not quite the best solution, next example, in mysql, u can create table patitions and it will be handled as one table but with performance improvement.
Second approach, for this type of data, of cource some NoSQL like MongoDB would perform great.

you could encode the user responsse-map into a 4000 entry bit-array, or string if you don't need the relational capabilities of the database
then it would be one record per user.
create table user_words (userid int, wiorddata text);
insert into user_words values (1,'YNYYNmmmYY'/* ... */ );
you application would need to have the list of words and kniow which wird each character refers to.

Adding custom fields in my application

I have a SAAS product, which is build by Spring MVC and Hibernate. Generally SAAS products allow user's to customize the product like adding extra fields to the table. So i want to give the flexibility to users, to create custom fields in the tables for themselves. Please provide all the viable solutions to achieve it. Thank you so much for your help.

I'm guessing your trying to back this to a Relational database. The primary problem is that relational databases store things in tables, and tables don't really handle free form data well.
So one solution is to use a document structure that is flexible, like XML (and perhaps ditch the database) but databases have features which are nice, so let's also consider the database-using approaches.
You could create a "custom field" table which would have columns (composite primary key) for
ExtendedTable
ColumnName
but you'd also have to store the data somewhere
(ExtendedKey)
DataItem
And now we get into the really nasty bits. How would you apply constraints to this data? I mean, what would the type be of a DataItem? A general solution would be quite complex (being a type of free form database). Hopefully you could limit the solution to solve only the problems you require solved.
Another approach is to use a single "extra" column that contains an XML record which embeds it's own "column and value" extensions, but if you wanted to display a table of the efficiently, you'd have to parse out every XML document in every field, which is not ideal.
Neither one of these approaches will work well with the existing SQL query language, so you'll then start building your own query language.
I suggest you go back and look at real data requirements, instead of sweeping them under the table with a "and anything else one might want" set of columns on your table.

Your requirement is best suited use case for NoSQL databases (like MongoDB).
Dynamically creating relational database tables & columns (modifying schemas) upon user requests in an application is not a best practice as these involve DDL operations, which are very powerful and in case if you don't handle them carefully, the whole application's database goes to the inconsistent state.

Exploring user specific data in webapps

I am busy practicing on designing a simple todo list webapp whereby a user can authenticate into the app and save todo list items. The user is also only able to to view/edit the todo list items that they added.
This seems to be a general feature (authenticated user only views their own data) in most web applications (or applications in general).
To me what is important is having knowledge of the different options for accomplishing this. What I would like to achieve is a solution that can handle lots of users' data effectively. At the moment I am doing this using a Relational Database, but noSQL answers would be useful to me as well.
The following ideas came to mind:
Add a user_id column each time this "feature" is needed.
Add an association table (in the example above a user_todo_list_item table) that associates the data.
Design in such a way that you have a table per user per "feature" ... so you would have a todolist_userABC table. It's an option but I do not like it much since a thousand user's means a thousand tables?!
Add row level security to the specific "feature". I am not familiar on how this works but it seems to be a valid option. I am also not sure whether this is database vendor specific.
Of my choices I went with the user_id column on the todolist_item table. Although it can do the job, I feel that a user_id column might be problematic when reading data if the data within the table gets large enough. One could add an index I guess but I am not sure of the index's effectiveness.
What I don't like about it is that I need to have a user_id for every table where I desire this type of feature which doesn't seem correct to me? It also seems that when I implement the database layer I would have to add this to my queries for every feature (unless I use some AOP)?
I had a look around (How does Trello store data in MongoDB? (Collection per board?)), but it does not speak about the techniques regarding user_id columns or things like that. I also tried reading about this in some security frameworks (Spring Security to be specific) but it seems that it only goes into privileges/permissions on a table level and not a row level?
So the question is whether my choice was appropriate and if there are better techniques to do this?

Your choice is the natural thing to do.
The table-per-user is a non-starter (anything that modifies the database structure in response to user action is usually suspect).
Row-level security isn't really an option for webapps - it requires each user session to have a separate, persistent connection to the database, which is rarely practical. And yes, it is vendor-specific.
How you index your tables depends entirely on your usage patterns and types of queries you want to run. Is 'show all TODOs for a user' a query you want to support (seems like it would be)? Then and index on the user id is obviously needed.
Why does having a user_id column seem wrong to you? If you want to restrict access by user, you need to be able to identify which user the record belongs to. Doesn't actually mean that every table needs it - for example, if one record composes another (say, your TODOs have 'steps', each step belongs to a single TODO), only the root of the object graph needs the user id.

Advanced database modification "script" - how to do it

I have to go through a database and modify it according to a logic. The problem looks something like this. I have a history table in my database and I have to modify.
Before modifying anything I have to look at whether an object (which has several rows in the history table) had a certain state, say 4 or 9. If it had state 4 or 9 then I have to check the rows between the currently found row and the next state 4 or 9 row. If such a row (between those states) has a specific value in a specific column then I do something in the next row. I hope this is simple enough to give you an idea. I have to do this check for all the objects. Keep in mind that any object can be modified anywhere in its life cycle (of course until it reaches a final state).
I am using a SQL Sever 2005 and Hibernate. AFAIK I can not do such a complicated check in Transact SQL! So what would you recommend for me to do? So far I have been thinking on doing it as JUnit test. This would have the advantage of having Hibernate to help me do the modifications and I would have Java for lists and other data structures I might need and don't exist in SQL. If I am doing it as a JUnit test I am not loosing my mapping files!
I am curious what approaches would you use?

I think you should be able to use cursors to manage the complicated checks in SQL Server. You didn't mention how frequently you need to do this, but if this is a one-time thing, you can either do it in Java or SQL Server, depending on your comfort level.
If this check needs to be applied on every CRUD operation, perhaps database trigger is the way to go. If the logic may change frequently over the time, I would much rather writing the checks in Hibernate assuming no one will hit the database directly.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.