Best practices regarding empty fields and nulls in postgresql

I'm writing a simple webapp to show my coding skills to potential employers. It connects with an API and receives a JSON file which is then deserialized using Jackson and displayed in a table form in the browser. I want to enable the user to persist the Java object in a Postgres database using Hibernate. I got it to work and it does the job nicely but I want to make it more efficient.
Whenever there is no data in the JSON response to put in the object's field (right now all the possible JSON attributes are present in the Java class/Hibernate entity in the form of String fields) I put an empty String ('') and then, with all fields having something and no null objects, it is stored in the database.
Should I only store what I have and put no empty strings in the DB (using nulls instead) or is what I'm doing now the right way?

Null is the absence of a value. An empty string is a value. The difference has little impact on storage, though. If you display the data repeatedly and don't want to convert null to an empty string on retrieval, you can store the empty string ''.
But if you want a unique constraint that applies only to values other than the empty string '', use null (PostgreSQL lets a unique column hold many NULLs but only one '').
Sometimes null and the empty string '' are used to distinguish whether the data was known: use the empty string for data that is known but not available, and null for data that is unknown.

Use NULL when there isn't a known value.
Never use the empty string.
For example, if you have a customer who didn't supply an address, don't say the address is ''; say it is NULL. NULL unambiguously states "no value".
For database columns that must have a value for your web application to work, create the backing table with NOT NULL data constraints on those columns.
In your unit tests, pass NULL in a test named something like ..._address_is_null_ and check for success or failure (depending on whether the case should pass without errors or throw an exception).
The use of '' in databases as a sentinel, a special value that means something other than '', is discouraged, because nobody else will know what you meant it to mean. There might also be more than one special case, and if you use '' for the first, restructuring to add others becomes more difficult (unless you fall into the really bad practice of using even more special strings to enumerate other special cases, like "deleted" and so on).
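A minimal sketch of what this advice could look like on the Java side, assuming the entity's fields are plain String values filled in from the JSON response: turn empty or whitespace-only strings into null before the object is persisted. The class name NullNormalizer and the method emptyToNull are hypothetical and not part of Hibernate or Jackson.

```java
// Hypothetical helper: not part of Hibernate or Jackson.
final class NullNormalizer {
    private NullNormalizer() {}

    // Returns null for null, empty, or whitespace-only input;
    // any other input is returned unchanged.
    static String emptyToNull(String value) {
        if (value == null || value.trim().isEmpty()) {
            return null;
        }
        return value;
    }
}
```

Calling such a helper in each setter, or once just before saving, keeps '' out of the table, so a NOT NULL constraint on a required column then rejects missing data instead of silently accepting an empty string.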

Related

Flink: how to write null values as empty through writeAsCsv in sink

I have a POJO which I create at runtime, and it can contain null values. When I try to write the object's values to a CSV file with dataset.writeAsCsv, the following exception appears:
org.apache.flink.types.NullFieldException: Field 0 is null, but expected to hold a value.
In this case my integer is null, but the same happens with a Date.
Is there any way to write null values as empty back to CSV output file?

Since you can only call the writeAsCsv method on Datasets of Tuples, there must be a place in your code where your Dataset<Pojo> is transformed into a Dataset<TupleN>.
Tuples can hold null values, but are not serializable when holding them. (The javadoc more or less warns about this.) If you look at the surrounding lines of your exception, you may find that it is thrown at serialization. At least this is what I could reproduce:
org.apache.flink.types.NullFieldException: Field 1 is null, but expected to hold a value.
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize
So I guess you will have to determine what a null value means in the domain of your POJO and replace it accordingly before you write your CSV file. A solution could be to transform your values to Strings and replace null with "". However, depending on the meaning of the value other substitutions could be more appropriate.
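The substitution suggested above can be sketched in plain Java, assuming the POJO-to-Tuple step produces a Tuple of Strings. The class CsvFields, its methods, and the yyyy-MM-dd date pattern are assumptions for illustration; none of them are part of Flink.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Objects;

// Hypothetical helper: converts nullable POJO fields to non-null Strings
// so that the resulting Tuple never holds a null field.
final class CsvFields {
    private CsvFields() {}

    // null becomes "", anything else uses its toString() form.
    static String orEmpty(Object value) {
        return Objects.toString(value, "");
    }

    // Dates get an explicit pattern (assumed here: yyyy-MM-dd); null becomes "".
    static String dateOrEmpty(Date value) {
        return value == null ? "" : new SimpleDateFormat("yyyy-MM-dd").format(value);
    }
}
```

These helpers would be called inside the function that maps each POJO to its Tuple, before writeAsCsv is invoked. Whether "" is the right replacement still depends on what a missing value means in the domain.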

Best way to represent set of string values in DB and UI

I'm looking for opinions, so I guess this is a 'which is better' question. I have a webapp built in JavaScript/jQuery and Struts that uses Hibernate to access data in a relational DB (MySQL). When an object/database field has a limited set of strings for values, is it better to use the full string in the object/DB or a 'code' for that string, like a single CHAR instead of the entire string?
class User {
    int id;
    String userName;
    String type; // Values of 'Administrator', 'Regular'
    // OR
    char type; // Values of 'A', 'R'
    // OR
    char type; // Values of 'A', 'R'
    String typeString; // Can be returned on the fly based on 'type' or by DB in SQL CASE statement
}
If the database has the full text string, then it's easy coding all the way around, but it's wasting space (in the DB and in data transfer) on something that only has a few values.
If the database has just a 'code', then when presenting this field to a user (like in a grid of existing users, or a dropdown selection list when creating a new user) the char value must be converted to the full string. Then the question is where that conversion should be done. It could be at the DB level, where Hibernate can fill in the full string value from a CASE statement. This saves DB space, but not data transfer or memory. It could be at the object level, where it's done in the getter/setter for the 'type' field. Or it could be all the way in the GUI, where JavaScript converts the 'char' to the appropriate string for the user to see.
Also... if either method is OK to use, what might influence the choice you make? The number of different values? The max length of the strings? How many rows are expected in the table?
I'm sure every DB/programmer has come across this situation many times and probably has a preference.

If you only have a fixed set of user types like Admin and Regular, I think it will be easier to use a static HashMap in your code and just store A and R in your database. Something like:
// requires: import java.util.HashMap;
static HashMap<Character, String> userRoles = new HashMap<>();
static {
    // keys are char literals so they match the Character key type
    userRoles.put('A', "Admin");
    userRoles.put('R', "Regular");
}
Whenever you get a result from the DB, you can just do userRoles.get(type) to look up the actual type. This saves space and it's also readable.

I would put the full name in the database alongside an associated short code or ID in some kind of lookup table. Use the shortcode/ID as the primary key for the lookup table, and as a foreign key from other tables. If someone needs to investigate the database layer, or someone needs to use the database for reporting, data warehousing, or analytics this will simplify things greatly.
It's commonly seen as bad practice to name variables, database tables, database columns, functions, etc. with unclear names or abbreviations that not everyone will understand - short codes like this should be seen the same way.

I think it's better to do the conversion from the type code to the type (and vice versa) as close to the database interaction as possible, in this case in Hibernate. Your application logic becomes more readable and intuitive if it uses the explicit types.
In my opinion, if(BMW.equals(carTypeCode)) {} is a lot more readable than if("X".equals(carTypeCode)) {}.
I am not very familiar with Hibernate, but it would be awesome if you could leverage Hibernate for the mapping of String to DB representation and vice versa (maybe using CASE as you mentioned). Personally, I would probably have modeled these Strings as enums and used something like Hibernate's enum type mapping. Also, you should think about making these type codes a little more readable by making them at least a few characters long. That comes in handy when you are debugging an issue by looking at a DB dump, because you don't have to consult your code-to-type conversion chart.
I don't think either choice would affect performance much in the average case.
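The enum idea from this answer can be sketched as follows, assuming the two user types from the question. The enum UserType and its methods are hypothetical names. The sketch only shows the code-to-type conversion itself; wiring it into Hibernate (for instance through a JPA AttributeConverter, if the version in use supports one) is left out.

```java
// Hypothetical enum: each constant carries the short code stored in the
// database and the label shown to users.
enum UserType {
    ADMINISTRATOR('A', "Administrator"),
    REGULAR('R', "Regular");

    private final char code;
    private final String label;

    UserType(char code, String label) {
        this.code = code;
        this.label = label;
    }

    char code() { return code; }

    String label() { return label; }

    // Looks up the constant for a stored code; unknown codes are rejected
    // with an exception rather than mapped to a default.
    static UserType fromCode(char code) {
        for (UserType type : values()) {
            if (type.code == code) {
                return type;
            }
        }
        throw new IllegalArgumentException("Unknown user type code: " + code);
    }
}
```

Application code can then compare against UserType.ADMINISTRATOR instead of the literal 'A', while the database column keeps the compact code.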

Handling null in Freemarker

Situation:
An old Java project using FreeMarker has many finished templates that work great.
Every template uses data from a Transaction object.
This Transaction object is very large, because it wraps all the data about a transaction.
The templates contain a lot of expressions like this:
get("object1").getNestedObject2().getNestedObject3().getValue();
Problem:
A new requirement has appeared: all templates have to be processed for preview with no real data. All numbers should be zero and all strings should be ---.
Unsatisfactory solutions:
Remake all templates to check null values. (Lot of work and not safe)
Create Transaction object that contains all default value. (Lot of work)
So my question is: can I tell FreeMarker that if it finds null, or hits null anywhere along the way, it should use 0 where it expected a number or --- where it expected a String?
Or do you see a better solution?

If you need to show a dummy data model to the templates, your best bet is probably a custom ObjectWrapper (see Configuration.setObjectWrapper). Everything that reads the data model runs through the TemplateModel-s, and the root TemplateModel is made by the ObjectWrapper, thus it can control what values the templates get for what names. But the question is, when you have to return a dummy value for a name, how can you tell what its type will be? It's not just about finding out whether it will be a string or a number, but also whether it will be a method (like getNestedObject2) or a hash (something that can be followed by .). What can help there is that FreeMarker allows a value to have multiple types, so you can return a value that can be used as a method and as a hash and as a string, for example. Depending on the application, that hack might be good enough, except that you still have to decide whether the value is a string or a number, because ${} will print the numerical value if the value is both a string and a number.

Ignoring some entity fields during saving in Objectify 4

I am trying to use Objectify's @IgnoreSave annotation along with a simple If condition (IfEmpty, IfNull), but it seems that it is not working. Without an If condition the actual value is not persisted, as expected. However, when I use an If condition, it is always persisted (e.g. if the IfNull condition is used and a null value is provided, it is persisted and hence the original value in the datastore is deleted).
...
@IgnoreSave(IfNull.class)
private String email;
...
...
this.objectify.save().entity(userDetails).now();
...
Is there any additional configuration needed? Or has anyone experienced the same?

From "hence original value in datastore deleted" it sounds like you misunderstand a fundamental characteristic of the GAE datastore: entities are stored whole. If you @IgnoreSave a field, it will be ignored during save and thus the field will not be present in the datastore. You do not get to update some fields and not others.
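To illustrate what "stored whole" implies, here is a plain-Java sketch that deliberately uses no Objectify API. It assumes that the way to keep an old field value is to load the stored entity, copy over only the non-null incoming fields, and save the complete entity again. UserDetails, WholeEntityStore, and the in-memory map are hypothetical stand-ins for the real entity and datastore.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity with two data fields; stands in for userDetails.
final class UserDetails {
    final String id;
    String name;
    String email;

    UserDetails(String id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }
}

// Hypothetical in-memory stand-in for the datastore: save() replaces the
// whole entity stored under the same id.
final class WholeEntityStore {
    private final Map<String, UserDetails> entities = new HashMap<>();

    void save(UserDetails entity) { entities.put(entity.id, entity); }

    UserDetails load(String id) { return entities.get(id); }

    // Load the stored entity, copy over only the non-null incoming fields,
    // then save the complete entity again.
    void update(UserDetails incoming) {
        UserDetails stored = load(incoming.id);
        if (stored == null) {
            save(incoming);
            return;
        }
        if (incoming.name != null) stored.name = incoming.name;
        if (incoming.email != null) stored.email = incoming.email;
        save(stored);
    }
}
```

With this shape, an update that carries a null email leaves the previously stored email in place, because the merge happens in application code before the whole entity is written back.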

java dynamic attributes table

I am developing a java application which needs a special component for dynamic attributes. The arguments are serialized (using JSON) and stored in a database and then deserialized at runtime. All attributes are displayed in a JTable with 3 columns (attribute name, attribute type and attribute value) and stored in a hashmap.
I have currently two problems to solve:
The hashmap can also store objects, and the objects can be set to null. If an object is set to null, I don't know which class it belongs to. How could I store objects even if they are null and still know which class they belong to? Do I need to wrap each object in a class that holds the class of the stored object?
The objects are deserialized from JSON at runtime. The problem is that there are many different types of objects, and I don't actually know all the object types that will be stored in the hashmap. So I am looking for a way to dynamically deserialize objects. Is there such a way? Would I have to store the class of the object in the serialized JSON string?
Thanks!

Take a look at the Null Object Pattern. You can use an extra class to represent a null instance of your type that can still contain information about itself.
There is also something called a class token, which is the use of Class objects as keys for heterogeneous containers. Take a look at Effective Java by Joshua Bloch, Item 29. I'm not sure how well this approach would work for you, since you may have many instances of the same type, but I leave it as a reference.
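One way to combine both ideas, as a rough sketch: a small wrapper that stores the Class object next to a possibly-null value, so the type survives even when the value is null. The name TypedValue is hypothetical. Writing type().getName() next to the serialized value would be one possible way to tell a deserializer which class to use, though that is an assumption rather than something the answer spells out.

```java
// Hypothetical wrapper: remembers the declared class even when the value is null.
final class TypedValue<T> {
    private final Class<T> type;
    private final T value; // may be null

    TypedValue(Class<T> type, T value) {
        if (type == null) {
            throw new IllegalArgumentException("type must not be null");
        }
        this.type = type;
        this.value = value;
    }

    Class<T> type() { return type; }

    T value() { return value; }

    boolean isNull() { return value == null; }
}
```

A Map<String, TypedValue<?>> can then back the table: the type column reads type().getSimpleName() and the value column reads value(), which may be null.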

First of all, can you explain why you use JSON serialization for your attributes?
In my opinion this method is disadvantageous in many ways: it can cause problems with database search and indexing, make database viewing painful, and cause unnecessary code in your application. These problems may not be an issue; it depends on how you want to use your attributes.
My solution for situations like these is a simple table containing columns like:
id - int
attribute_name - varchar
And then add columns for each supported data type:
string_value - varchar
integer_value - int
date_value - date
... and any other types you want.
This design allows for very good performance using simple and typesafe ORM mapping without any serialization or other boilerplate. It can store values of any supported type: you just set the column that matches the attribute's type and leave all the others null. You can represent a null value by leaving all data columns null. Indexing and searching also become a piece of cake.
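A row of such a table could be mirrored in Java roughly like this. AttributeRow and its value() method are hypothetical names, ORM annotations are left out, and only the three typed columns listed above are covered.

```java
import java.util.Date;

// Hypothetical row class mirroring the table: one nullable field per typed column.
final class AttributeRow {
    int id;
    String attributeName;
    String stringValue;   // string_value
    Integer integerValue; // integer_value
    Date dateValue;       // date_value

    // Returns whichever typed column is set, or null when all of them are null.
    Object value() {
        if (stringValue != null) return stringValue;
        if (integerValue != null) return integerValue;
        return dateValue;
    }
}
```

Because at most one typed field is expected to be set per row, value() simply returns the first non-null one; a row with every data column null represents a null attribute.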
