Save as object or just a row in database?


As a practice project, I'm creating a calorie tracking app. Part of that requires creating individual food items that can be saved for reuse. A single food item would consist of a name, a calorie count, a serving size, etc. I'm thinking about the best way to save these food items, and it seems reasonable to connect the project to a database. Specifically I'm using MySQL.
The question I have, then, is whether each food item should simply exist in storage as a row in a database with a column for each field (name, calories, etc.), or whether I should create a foodItem class with an instance for each food item. This seems redundant, but at the same time, if I don't create an object, that seems counter to OOP principles, and database use would almost always replace class/object creation. Couldn't a user just write straight into the database without creating an object? What am I missing here?

As a design practice, I like to have "software objects" such as FoodItem which, among other things, know how to "persist themselves" in an underlying database. They know how to populate themselves, and how and when to update the database. As much as possible, all of the details of "database representation" are hidden within the objects' implementations.
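A minimal sketch of a FoodItem that persists itself, assuming plain JDBC against a hypothetical MySQL table food_item(id BIGINT AUTO_INCREMENT PRIMARY KEY, name, calories, serving_size). The table, column and method names are illustrative, not taken from the question. Callers work with the object, while the SQL stays hidden inside the class.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Sketch of a food item that knows how to load and save itself.
 * Assumes a hypothetical table: food_item(id, name, calories, serving_size).
 */
class FoodItem {
    private Long id;              // null until the item has been inserted
    private final String name;
    private int calories;
    private final String servingSize;

    FoodItem(String name, int calories, String servingSize) {
        this.name = name;
        this.calories = calories;
        this.servingSize = servingSize;
    }

    Long getId() { return id; }
    String getName() { return name; }
    int getCalories() { return calories; }
    String getServingSize() { return servingSize; }
    void setCalories(int calories) { this.calories = calories; }

    /** Inserts a new row the first time, updates the existing row afterwards. */
    void save(Connection conn) throws SQLException {
        if (id == null) {
            String sql = "INSERT INTO food_item (name, calories, serving_size) VALUES (?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS)) {
                bindFields(ps);
                ps.executeUpdate();
                try (ResultSet keys = ps.getGeneratedKeys()) {
                    if (keys.next()) {
                        id = keys.getLong(1); // remember the auto-increment key
                    }
                }
            }
        } else {
            String sql = "UPDATE food_item SET name = ?, calories = ?, serving_size = ? WHERE id = ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                bindFields(ps);
                ps.setLong(4, id);
                ps.executeUpdate();
            }
        }
    }

    /** Loads a food item by primary key, or returns null if no row matches. */
    static FoodItem load(Connection conn, long id) throws SQLException {
        String sql = "SELECT name, calories, serving_size FROM food_item WHERE id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;
                }
                FoodItem item = new FoodItem(
                        rs.getString("name"), rs.getInt("calories"), rs.getString("serving_size"));
                item.id = id;
                return item;
            }
        }
    }

    /** Binds the three data columns in the order used by both INSERT and UPDATE. */
    private void bindFields(PreparedStatement ps) throws SQLException {
        ps.setString(1, name);
        ps.setInt(2, calories);
        ps.setString(3, servingSize);
    }
}
```

Usage would look like new FoodItem("Oatmeal", 150, "40 g").save(connection), with the Connection obtained however the app already does it (for example from DriverManager or a connection pool). The row is the durable copy; the object is what the rest of the code works with.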
The actual manner in which you represent the data in the database also depends very much on your reporting requirements. Think about the sort of summary statistics, catalogs, and other things which you might need to produce from these data using separate tools.

Related

Table data overrides

I'm currently sourcing some static data from a third party. It's a simple one-to-many, like this:
garage (id, name, desc, location)
garage_price (id, garage_id, price_type, price)
Sometimes, the data is incorrect, and I will need to correct it. At the same time, I'd like to preserve the original sourced data somewhere and potentially run some queries to show the changes.
My question is whether anyone is doing something like this with SQL, Java and Hibernate, and what approach you've taken, or would take.
I could add a boolean column, "original_data", to both tables, and before an update happens, run a trigger to copy the row from garage or garage_price into an "original_garage" or "original_price" table as long as original_data is true. Then set original_data to false, and all further updates will just happen on the garage/garage_price tables.
Is there anything wrong with that approach, and how do people typically work with multiple tables holding the same data in Hibernate/JPA? Previously, I'd create a class that holds all the data and subclass it twice, once per table, while setting
@Inheritance(strategy=InheritanceType.TABLE_PER_CLASS)
on the parent.

As so often, there are various options:
Use Hibernate Envers. It will keep a complete history of changes, so if you do multiple changes each will result in a row in the auditing tables. These tables are separate from your main data tables which might be a pro or a con, depending on your requirements.
Use the approach that you described: Write the original dataset, copy it before modifying it. You'll need two additional attributes:
A flag marking the original, and a technical id so that you have a unique primary key.
Just as the second option, but do the copying in a database trigger. This is probably faster, it works no matter how the data gets inserted, and copying rows in the database is really easy, while it feels rather cumbersome in Java. Of course, many Java developers consider writing triggers a PITA in itself. And if your application doesn't usually use triggers and stored procedures, it is also really easy to forget about the trigger and then be confused about where these additional rows come from.
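To make the copy-before-modify logic concrete (the part that either Java code or a trigger would implement), here is a plain-Java sketch in which two in-memory maps stand in for the garage_price and original_price tables. It mirrors the question's design with a separate original table; the answer's single-table variant would instead keep both rows in garage_price, told apart by the flag and a technical id. GaragePrice, GaragePriceTables and their methods are hypothetical names.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One garage_price row; originalData mirrors the proposed boolean column. */
record GaragePrice(long id, long garageId, String priceType, BigDecimal price, boolean originalData) {}

/** In-memory stand-in (for illustration only) for garage_price and original_price. */
class GaragePriceTables {
    private final Map<Long, GaragePrice> garagePrice = new LinkedHashMap<>();
    private final Map<Long, GaragePrice> originalPrice = new LinkedHashMap<>();

    /** Loads a row from the third-party feed, flagged as original data. */
    void importSourced(long id, long garageId, String priceType, BigDecimal price) {
        garagePrice.put(id, new GaragePrice(id, garageId, priceType, price, true));
    }

    /** Corrects a price, copying the sourced row aside only on the first correction. */
    void correctPrice(long id, BigDecimal newPrice) {
        GaragePrice current = garagePrice.get(id);
        if (current == null) {
            throw new IllegalArgumentException("No garage_price row with id " + id);
        }
        if (current.originalData()) {
            // Same job as the proposed trigger: preserve the sourced row exactly once.
            originalPrice.put(id, current);
        }
        // Later corrections only touch the working row, because the flag is now false.
        garagePrice.put(id, new GaragePrice(id, current.garageId(), current.priceType(), newPrice, false));
    }

    GaragePrice current(long id) { return garagePrice.get(id); }

    GaragePrice original(long id) { return originalPrice.get(id); }

    /** Reports "id: original -> current" for every corrected row. */
    List<String> changes() {
        List<String> out = new ArrayList<>();
        for (GaragePrice orig : originalPrice.values()) {
            GaragePrice now = garagePrice.get(orig.id());
            out.add(orig.id() + ": " + orig.price() + " -> " + now.price());
        }
        return out;
    }
}
```

After two corrections to the same row, original(id) still returns the sourced price and changes() compares it with the latest value, which is the "show the changes" query from the question.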

database or ObjectOutputStream, Object specific member or actual object for reference

I'm working on an application for a pharmacy. Basically, this application has a class "item" and another class "selling invoices" which logs sales.
So my question is: if the pharmacy is expected to have about ten thousand products in stock, and I store these products in a linked list of type Item and the invoices in a linked list as well, then save them with an object output stream when the app closes and reload them on startup, is that bad practice? Should I use a database instead?
My second question is: if I continue using a linked list and an object output stream, what is better for performance and memory: storing the actual item as a field member in the invoice class, or storing just its ID and then looking up the item by that ID when it is needed?
Thanks in advance.

It is a bad idea to use ObjectOutputStream like that.
Here are some of the reasons:
If your application crashes (or the power fails) before you "save", then all changes are lost.
Saving "all objects" is expensive.
Serialized objects are opaque. It is only practical to look at them from Java code.
Serialized objects are fragile. If your application classes change, you may find that old serialized objects can no longer be read. That's bad enough, but now consider what happens if your client wants to look at pharmacy records from 5 years ago ... from a backup tape.
Serialized objects provide no way of searching ... apart from reading all of the objects one at a time.
Designs which involve reading all objects into memory do not scale. You are liable to run out of memory, or to have to compromise on your requirements to avoid running out of memory.
By contrast:
A database won't lose any changes that have been committed. Databases are much more resilient to things like application errors and system-level failures.
Committing database changes is not as expensive, because you only write data that has changed.
Typical databases can be viewed, queried, and if necessary repaired using an off-the-shelf database tool.
Changing Java code doesn't break the database. And for some schema changes, there are ways to migrate the existing schema and records to match the updated design.
Databases have indexes and query languages for implementing efficient search.
Databases scale because the primary copy of the data is on disk, not in memory.
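A small experiment (not a benchmark) shows the "save everything on close" cost: changing one item still means rewriting the whole snapshot. The Item class below is a hypothetical stand-in for the question's class, with a made-up quantityInStock field.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical stand-in for the question's Item class. */
class Item implements Serializable {
    // Pinning the version helps with compatible class changes; incompatible ones still break old files.
    private static final long serialVersionUID = 1L;
    final long id;
    final String name;
    int quantityInStock;

    Item(long id, String name, int quantityInStock) {
        this.id = id;
        this.name = name;
        this.quantityInStock = quantityInStock;
    }
}

class SnapshotCost {
    /** Builds a LinkedList of n items, as in the question. */
    static List<Item> stock(int n) {
        List<Item> items = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            items.add(new Item(i, "Product " + i, 100));
        }
        return items;
    }

    /** Serializes the whole list, which is what saving on close has to do every time. */
    static int snapshotBytes(List<Item> items) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(items);
        }
        return buffer.size();
    }
}
```

With 10,000 items the snapshot is several hundred kilobytes, and changing a single stock level produces a snapshot of exactly the same size, because everything is written again. In a database, the same change would be a single-row UPDATE.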

How to store database data with lots of attributes into cache?

Let's say that I have a table with columns TABLE_ID, CUSTOMER_ID, ACCOUNT_NUMBER, PURCHASE_DATE, PRODUCT_CATEGORY, PRODUCT_PRICE.
This table contains all purchases made in some store.
Please don't concentrate on changing the database model (there are obvious improvement possibilities) because this is a made-up example and I can't change the actual database model, which is far from perfect.
The only thing I can change is the code which uses the already existing database model.
Now, I don't want to access the database all the time, so I have to store the data in a cache and then read it from there. The problem is that my program has to support all sorts of things:
What is the total value of purchases made by customer X on date Y?
What is the total value of purchases made for products from category X?
Give me a list of total amounts spent grouped by customer_id.
etc.
I have to be able to preserve this hierarchy in my cache.
One possible solution is to have a map inside a map inside a map... etc.
However, that gets messy very quickly, because I need an extra nesting level for every attribute in the table.
Is there a smarter way to do this?

Have you already established that you need a cache? Are you sure the performance of your application requires it? The database itself can optimize queries, have things in memory, etc.
If you're sure you need a cache, you also need to think about cache invalidation: is the data changing beneath your feet? That is, is another process changing the data in the database, is the database data immutable, or is your application the only process modifying your data?
What do you want your cache to do? Just keep track of queries and results that have been requested, so the second time a query is run you can return the result from the cache? Or do you want to aggressively precalculate some aggregates? Can the cached data fit into your app's memory, or do you want to use something like ReferenceMaps that shrink when memory gets tight?
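The first option (remember queries that have been asked and answer repeats from memory) can be sketched with a ConcurrentHashMap. QueryCache is a hypothetical name, and the loader function would be whatever method actually runs the SQL. Invalidation is left to the caller, which is exactly the hard part described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Remembers the results of queries that have already been run; a sketch, not a production cache. */
class QueryCache<K, V> {
    private final Map<K, V> results = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // runs the real query on a cache miss

    QueryCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached result, calling the loader only on the first request for this key. */
    V get(K key) {
        return results.computeIfAbsent(key, loader);
    }

    /** Call this whenever the underlying rows may have changed. */
    void invalidateAll() {
        results.clear();
    }
}
```

Note that this cache only ever grows until invalidateAll() is called; bounding its size or using soft references is a separate decision.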
For your actual question, why do you need maps inside maps? You should probably design something that's closer to your business model, and store objects that represent the data in a meaningful way. You could represent each query (PurchasesByCustomer, PurchasesByCategory) as an object and store them in different maps so you get some type safety. Similarly, don't use maps for the results; use the actual objects you want.
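One way to follow that advice, sketched under the assumption that the purchases fit in memory: a record per table row, a small record as the key for "customer X on date Y", and one flat, typed map per query instead of nested maps. Purchase, CustomerDay and PurchaseAggregates are hypothetical names; the columns come from the question.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** One row of the purchases table described in the question. */
record Purchase(long tableId, long customerId, String accountNumber,
                LocalDate purchaseDate, String productCategory, BigDecimal productPrice) {}

/** Typed key for "total spent by customer X on date Y". */
record CustomerDay(long customerId, LocalDate date) {}

/** One flat, typed map per query instead of maps nested inside maps. */
final class PurchaseAggregates {
    final Map<CustomerDay, BigDecimal> totalByCustomerDay;
    final Map<String, BigDecimal> totalByCategory;
    final Map<Long, BigDecimal> totalByCustomer;

    PurchaseAggregates(List<Purchase> purchases) {
        totalByCustomerDay = sumBy(purchases, p -> new CustomerDay(p.customerId(), p.purchaseDate()));
        totalByCategory = sumBy(purchases, Purchase::productCategory);
        totalByCustomer = sumBy(purchases, Purchase::customerId);
    }

    /** Total for customer X on date Y; zero when there were no purchases. */
    BigDecimal totalFor(long customerId, LocalDate date) {
        return totalByCustomerDay.getOrDefault(new CustomerDay(customerId, date), BigDecimal.ZERO);
    }

    /** Groups purchases by an arbitrary key and sums their prices. */
    private static <K> Map<K, BigDecimal> sumBy(List<Purchase> purchases, Function<Purchase, K> key) {
        return purchases.stream().collect(Collectors.groupingBy(
                key, Collectors.reducing(BigDecimal.ZERO, Purchase::productPrice, BigDecimal::add)));
    }
}
```

Adding a new query means adding one key type and one map, rather than another level of nesting.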
Sorry, your question is quite vague, but hopefully I've given you some food for thought.

Exploring user specific data in webapps

I am practicing by designing a simple todo list webapp in which a user can authenticate and save todo list items. The user is only able to view/edit the todo list items that they added.
This seems to be a general feature (authenticated user only views their own data) in most web applications (or applications in general).
To me, what is important is knowing the different options for accomplishing this. What I would like to achieve is a solution that can handle lots of users' data effectively. At the moment I am doing this with a relational database, but NoSQL answers would be useful to me as well.
The following ideas came to mind:
Add a user_id column each time this "feature" is needed.
Add an association table (in the example above a user_todo_list_item table) that associates the data.
Design it so that you have a table per user per "feature" ... so you would have a todolist_userABC table. It's an option, but I don't like it much, since a thousand users means a thousand tables?!
Add row-level security to the specific "feature". I am not familiar with how this works, but it seems to be a valid option. I am also not sure whether this is database-vendor specific.
Of these options, I went with the user_id column on the todolist_item table. Although it does the job, I feel that a user_id column might be problematic when reading data once the table gets large enough. One could add an index, I guess, but I am not sure how effective the index would be.
What I don't like is that I need a user_id column in every table where I want this type of feature, which doesn't seem right to me. It also seems that when I implement the database layer, I would have to add this condition to my queries for every feature (unless I use some AOP)?
I had a look around (How does Trello store data in MongoDB? (Collection per board?)), but that doesn't discuss techniques involving user_id columns or anything like that. I also tried reading about this in some security frameworks (Spring Security, to be specific), but it seems to cover only privileges/permissions at the table level, not the row level?
So the question is whether my choice was appropriate and if there are better techniques to do this?

Your choice is the natural thing to do.
The table-per-user is a non-starter (anything that modifies the database structure in response to user action is usually suspect).
Row-level security isn't really an option for webapps - it requires each user session to have a separate, persistent connection to the database, which is rarely practical. And yes, it is vendor-specific.
How you index your tables depends entirely on your usage patterns and the types of queries you want to run. Is 'show all TODOs for a user' a query you want to support (it seems like it would be)? Then an index on the user id is obviously needed.
Why does having a user_id column seem wrong to you? If you want to restrict access by user, you need to be able to identify which user each record belongs to. That doesn't mean every table needs it: for example, if one record composes another (say your TODOs have 'steps', and each step belongs to a single TODO), only the root of the object graph needs the user id.
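A sketch of that idea, with an in-memory repository standing in for the todolist_item table: only the root Todo stores the owner, every read is scoped to the requesting user, and steps are reached only through their owning TODO so the same check covers them. Todo, Step and TodoRepository are hypothetical names; in SQL the scoped lookups correspond to WHERE user_id = ? conditions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** A step belongs to exactly one TODO, so it carries no user id of its own. */
record Step(String text) {}

/** Root of the object graph: the only place that stores the owner. */
record Todo(long id, long userId, String title, List<Step> steps) {}

/** In-memory stand-in for the todolist_item table; every read is scoped to a user. */
class TodoRepository {
    private final Map<Long, Todo> rows = new LinkedHashMap<>();

    void save(Todo todo) {
        rows.put(todo.id(), todo);
    }

    /** Like SELECT ... WHERE user_id = ?, the query an index on user_id would serve. */
    List<Todo> findAllForUser(long userId) {
        List<Todo> result = new ArrayList<>();
        for (Todo t : rows.values()) {
            if (t.userId() == userId) {
                result.add(t);
            }
        }
        return result;
    }

    /** Like SELECT ... WHERE id = ? AND user_id = ?; another user's TODO looks absent. */
    Optional<Todo> findForUser(long todoId, long userId) {
        return Optional.ofNullable(rows.get(todoId)).filter(t -> t.userId() == userId);
    }

    /** Steps are reached through their owning TODO, so the ownership check covers them too. */
    List<Step> stepsForUser(long todoId, long userId) {
        return findForUser(todoId, userId).map(Todo::steps).orElse(List.of());
    }
}
```

Routing every read through methods that take the user id is one way to avoid forgetting the condition in individual queries, without reaching for AOP.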

Should I always retrieve full object from a database?

This is a very simple question that applies to programming web interfaces with Java. Say I am not using an ORM (or even if I am), and I've got this Car (id, name, color, type, blah, blah) entity in my app and a CAR table to represent this entity in the database. Now say I need to update only a subset of fields on a bunch of cars. I understand that the typical flow would be:
A DAO class (CarDAO) - getCarsForUpdate()
Iterate over all Car objects, update just the color to say green or something.
Another DAO call to updateCars(Cars cars).
Now, isn't this beating around the bush a little for what would be a simple select and update query? In step 1 above, I would be retrieving the entire object's data from the database ("select id, name, color, type, blah, blah ... from CAR where ...") instead of "select id, color from CAR where ...". So why should I retrieve those extra fields when, after the DAO call, I would never use anything other than "color"? The same applies to step 3. Or, say I query just for the id and color (select id, color) and create a Car object with only id and color populated: that is perfectly OK, isn't it? The Car object is anemic anyway?
Doesn't all this (object oriented-ness) seem a little fake?

For one, if the RDBMS can handle your queries, I would prefer to let it. The reason is that you don't want your JVM to do all the work, especially when running an enterprise application (where you have many concurrent connections needing the same resource).
If you particularly want to update an object (e.g. set the car colour to green) in the database, I would suggest SQL like
UPDATE CAR SET COLOR = 'GREEN';
(Notice I haven't used a WHERE clause.) This updates every row in the CAR table, and I didn't need to pull all the Car objects, call setColor("Green") and then do an update.
In short, what I'm trying to say is: apply engineering knowledge. Your DAO should simply do fast selects, updates, etc. and let the RDBMS handle all the SQL "work".
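The statement above updates every car. To touch only "a bunch of cars", as the question asks, one option is still a single statement: a parameterized UPDATE with an IN list, sent without loading any Car objects. This sketch assumes plain JDBC and a CAR table with ID and COLOR columns; CarDao and its methods are hypothetical. Very long id lists may run into driver or database limits on bind parameters.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical DAO: one UPDATE for a set of cars, no Car objects loaded. */
class CarDao {
    /** Builds "UPDATE CAR SET COLOR = ? WHERE ID IN (?, ?, ...)" with one placeholder per id. */
    static String updateColorSql(int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("need at least one id");
        }
        StringJoiner placeholders = new StringJoiner(", ", "(", ")");
        for (int i = 0; i < idCount; i++) {
            placeholders.add("?");
        }
        return "UPDATE CAR SET COLOR = ? WHERE ID IN " + placeholders;
    }

    /** Sets the color on the given cars and returns the number of rows updated. */
    static int updateColor(Connection conn, String color, List<Long> carIds) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(updateColorSql(carIds.size()))) {
            ps.setString(1, color);
            for (int i = 0; i < carIds.size(); i++) {
                ps.setLong(i + 2, carIds.get(i)); // JDBC parameters are 1-based; index 1 is the color
            }
            return ps.executeUpdate();
        }
    }
}
```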

From my experience, what I can say is :
As long as you're not doing join operations, i.e. just querying columns from the same table, the number of columns you fetch will make almost no difference to performance. What really affects performance is how many rows you get, and the where clause. Fetching 2 or 20 columns changes so little you won't see any difference.
The same goes for updating.

I think that in certain situations, it is useful to request a subset of the fields of an object. This can be a performance win if you have a large number of columns or if there are some large BLOB columns that would impact performance if they were hydrated. Although the database usually reads in an entire row of information whenever there is a match, it is typical to store BLOB and other large fields in different locations with non-trivial IO requirements.
It might also make sense if you are iterating across a large table and doing some sort of processing. Although the savings might be insignificant on a single row, it might be measurable across a large table.
Also, if you are only using fields that are in indexes, I believe that the row itself will never be read and it will use the fields from the index itself. Not sure in your example if color would be indexed however.
All this said, if you are only persisting objects that are relatively simple without BLOB or other large database fields, then this could turn into premature optimization, since the query processing, row IO, JDBC overhead, and object creation are most likely going to take a lot more time compared to hydrating a subset of the fields in the row. Converting database objects into the final Java class is typically a small portion of the load of each query.
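When fetching a subset of columns is worthwhile, it does not have to mean a half-populated Car. One option (not something the answers above prescribe) is a small read-only record holding just the fetched columns. This sketch assumes plain JDBC and the CAR table with ID and COLOR columns, filtered by a TYPE column chosen only for illustration; CarColor and CarColorQueries are hypothetical names.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Read model holding only the columns this use case needs. */
record CarColor(long id, String color) {}

class CarColorQueries {
    /** Fetches two columns instead of the whole row. */
    static final String SQL = "SELECT ID, COLOR FROM CAR WHERE TYPE = ?";

    /** Returns the id and color of every car of the given type. */
    static List<CarColor> findByType(Connection conn, String type) throws SQLException {
        List<CarColor> result = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, type);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(new CarColor(rs.getLong("ID"), rs.getString("COLOR")));
                }
            }
        }
        return result;
    }
}
```

The design choice here is that code receiving a CarColor cannot mistake it for a fully loaded Car, which is the risk with an object whose other fields were simply never populated.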
