How should I design my DAO layer

Let's say I wanted a web page that would represent a zoo. There should be a list of enclosures (about ten thousand of them) and it should be possible to display it in three ways:
all enclosures,
only enclosures that the currently logged in user has marked as favorite,
only enclosures that the currently logged in user has commented on.
In all of these cases the list could be too long to fit on a single page and therefore should be divided into multiple pages with a pagination bar.
In order to ease searching for a particular enclosure, all three modes should support additional filtering by a keyword (full-text search in enclosure names). For example, the user should be able to display all enclosures marked as favorite that contain a given string in their names. Of course, the list can still be too large, so pagination applies here as well.
The question is: how should I design the DAO layer to avoid code duplication and spaghetti code full of conditions? Also, it would be good to have the code divided into layers/areas of abstraction, so that e.g. the code for building the final SQL queries would not be scattered inconsistently across many different classes from different abstraction layers.

Assuming a traditional request/response web application style here is a sketch:
Represent the various filtering options as classes in supporting code for your DAO. Have the web client specify URL parameters representing the filtering options. You'll need a way to ensure that the filtering options are always sent in on each request, or store them on the user's session.
Map the filtering parameters to the filtering options and pass the options to your DAO. In your DAO's queries, "expand" the filtering options into the appropriate WHERE clause(s) against the database.
For paging, have the concept of a paging "window". For example, you could have a class that represents the starting row and how many rows to return. Again, expand that class into a predicate executed against the database.
There are other ways to accomplish this (perhaps with one of the million frameworks that are around), but this is how I'd approach it if I had to develop it all from scratch.
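As a minimal sketch of the approach above, the listing mode, the keyword and the paging window can be expanded into one parameterized query in a single place. Everything named here is an assumption made for illustration: the tables enclosure, favorite and comment, their columns, and the LIMIT/OFFSET syntax (which is dialect-specific). LIKE stands in for whatever full-text search the database actually offers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical query builder; table and column names are assumptions. */
final class EnclosureQuery {
    enum Mode { ALL, FAVORITES, COMMENTED }

    /** Paging "window": first row to return and how many rows. */
    record PageWindow(int offset, int limit) {}

    /** SQL text plus the values to bind to its '?' placeholders, in order. */
    record Sql(String text, List<Object> params) {}

    static Sql build(Mode mode, long userId, String keyword, PageWindow page) {
        StringBuilder sql = new StringBuilder("SELECT e.id, e.name FROM enclosure e");
        List<String> where = new ArrayList<>();
        List<Object> params = new ArrayList<>();

        // Expand the listing mode into a predicate.
        switch (mode) {
            case FAVORITES -> {
                where.add("EXISTS (SELECT 1 FROM favorite f WHERE f.enclosure_id = e.id AND f.user_id = ?)");
                params.add(userId);
            }
            case COMMENTED -> {
                where.add("EXISTS (SELECT 1 FROM comment c WHERE c.enclosure_id = e.id AND c.user_id = ?)");
                params.add(userId);
            }
            case ALL -> { }
        }
        // Expand the optional keyword filter (LIKE is a stand-in for full-text search).
        if (keyword != null && !keyword.isBlank()) {
            where.add("e.name LIKE ?");
            params.add("%" + keyword.trim() + "%");
        }
        if (!where.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", where));
        }
        // Expand the paging window (LIMIT/OFFSET syntax varies by database).
        sql.append(" ORDER BY e.name LIMIT ? OFFSET ?");
        params.add(page.limit());
        params.add(page.offset());
        return new Sql(sql.toString(), List.copyOf(params));
    }
}
```

A DAO method would pass the resulting text to a PreparedStatement and bind the parameters in order, so all three listing modes share one code path and none of the values are concatenated into the SQL.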

Editing my original answer since I misread your criteria. Your DAO will be the same as any other basic DAO. It will (essentially) have a GET method for each of the three queries. If the user wants to narrow down the criteria after that, I would suggest using a jQuery plugin like DataTables, assuming the amount of data returned by the DAO methods isn't outrageously large. That plugin lets you add filters to each column that update as you type, and it also has sort, search, and paginate functionality.

Related

What is the proper way to setup and seed a database with artificial data for integration testing

Let's say I have 2 tables in a database, one called students and the other called departments. students looks like the following:
department_id, student_id, class, name, age, gender, rank
and departments looks like:
department_id, department_name, campus_id, number_of_faculty
I have an API that can query the database and retrieve various information from the 2 tables. For example, I have an end point that can get number of students on each campus by joining the 2 tables.
I want to do integration testing for my API end points. To do that, I spin up a local database, run migration of the database schemas to create the tables, then populate each table with artificial records such that I know exactly what is in the database. But coming up with a good seeding process has proven to be anything but easy. For the simple example I described above, my current approach involves generating multiple distinct records for each column. For example, I need at least 2 campuses (say main and satellite), and 3 departments (say Electrical Engineering and Mathematics for main campus and English for satellite campus). Then I need at least 2 students in each department or 6 students in total. And if I mix in gender, age and rank, you can easily see that the number of artificial records grows exponentially. Coming up with all these artificial records is manual and thus tedious to maintain.
So my question is: what is the proper way to set up and seed a database for integration testing in general?

First, I do not know any public tool that automates the task of generating test data for arbitrary scenarios.
Actually, this is a hard task in general. You might look for scientific papers and books on the topic. There are many of those. Unfortunately, I have no recommendation on a set of "good" ones.
A quite trivial approach is generating random data drawn from a set of potential values per field (column in the database case). (This is what you did already.) For smaller sets you may even generate the full set of potential combinations. E.g. you might have a look at the following test data generator for an example applying a variant of such an approach.
However, this might not be appropriate for the following reasons:
the resulting data will exhibit significant redundancy, while it may still not cover all interesting cases.
it might create inconsistent data with respect to logical constraints your application would enforce otherwise (e.g. referential integrity)
You might address such issues by adding some constraints into the process of generating test data for eliminating invalid or redundant combinations (with respect to your application).
The restrictions that are possible (and make sense), however, depend on your business and use cases, so there is no general rule for them. E.g. if your API treats age values differently depending on gender, combinations of age and gender are important for your tests; if no such distinction exists, any combination of age and gender will be OK.
As long as you are looking for white box test scenarios, you will need to put in your implementation (or at least specification) details.
For black box testing a full set of combinatorial data will be sufficient. The only remaining issue is reducing the test data to keep the runtime of the tests within some maximum.
When dealing with white box testing, you might explicitly add corner cases. E.g. in your case: a department without any students, a department with a single student, students without a department, as long as such a scenario makes sense for your testing purposes (e.g. when testing error handling or testing how your application would deal with inconsistent data).
In your case you are looking at your API as the main view to the data. The database content just is the input necessary for achieving all interesting output from that API. The actual task of identifying a proper database content may be described by the mathematical problem of providing an inverse to the mapping provided by your application (from database content to API result).
In the absence of a ready-made tool, you might apply the following steps:
start with a simple combinatorial data generator
apply some restrictions eliminating useless or illegal records
run tests capturing coverage data
add extra data records for improving coverage
repeat testing until coverage is OK
review and adjust data after any change to your code or schema
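The first two steps could look roughly like the sketch below: build every combination of the candidate values per column, then filter out the rows that make no sense for the application. The class name, the column values and the restriction used in the usage note are all made up for illustration; this is not taken from any particular tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper for combinatorial test data. */
final class TestDataGenerator {
    /** Builds the Cartesian product of the given value lists, one list per column. */
    static List<List<String>> combinations(List<List<String>> columns) {
        List<List<String>> rows = new ArrayList<>();
        rows.add(List.of());                       // start with a single empty row
        for (List<String> values : columns) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> row : rows) {
                for (String value : values) {
                    List<String> extended = new ArrayList<>(row);
                    extended.add(value);           // append one value of the current column
                    next.add(extended);
                }
            }
            rows = next;
        }
        return rows;
    }

    /** Keeps only the rows that satisfy an application-specific restriction. */
    static List<List<String>> restrict(List<List<String>> rows, Predicate<List<String>> isValid) {
        return rows.stream().filter(isValid).toList();
    }
}
```

For example, three departments, two genders and two ranks give 3 x 2 x 2 = 12 rows; an assumed restriction such as "the English department has no seniors" removes two of them. The surviving rows would then be turned into INSERT statements or fixture files.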

I think DbUnit might be the right tool for what you're trying to do. You can specify the state of your database before the tests and check the expected state after.

If you need to initialize a database with tables and dummy data from JUnit, you can use Unitils or DbUnit, which is what I use.
The data in Unitils can be loaded from XML files inside your resources folder, so once the test runner starts, it loads all content from the XML files and inserts it into the database. Please look at the examples on their website.

How to apply code logic to database records/values?

I'm maintaining a system (in Java, with Tomcat, Spring MVC, and Hibernate) where I have to set access rules for user groups. These rules are saved in a database (PostgreSQL) as records / rows. The logic is very simple. Each user of a company's team belongs (is connected) to a group, and each group has a set of rules.
I have to allow administrators to configure (through a web application) rules for groups, so that each rule has a logic and this is recognized and reproduced on the server side.
I need to define rules with parameters, such as:
Authentications only on weekends.
Authentications only on weekdays.
Authentications only at a certain time (from time X to time Y).
X authentications per day.
Account expiration from date X
And so on...
My intention is that the company team can organize itself dynamically, just setting up any rules they want at any time, without the need for maintenance every time their policies change.
I've been searching on Google and found nothing about it. I know I can do this in Java code, but I would have to tie the Java code to the rule names present in the database, which could change in the future (or between companies), and this does not seem right to me. I'm not sure if this is correct, or preferable (maintainable). I appreciate any suggestions, ideas, or corrections (for real).
Note: Team/Groups names may change, but its rules should remain the same (if desired).
EDIT
The database is already modeled and ready. Groups and rules represent values from two different tables, with no logic at all. Querying these values is trivial. However, as I'm maintaining a web application, I'm in charge of creating code or a procedure that applies logic to the choice of rule values.
I was very clear in my question, but I will add more things:
Imagine that my clients (companies) want a website (a web application) that can manage their employees. Every company has teams of employees (groups), each with its function. In addition, some employees are sometimes hired as temporary employees.
My duty is to restrict access to the accounts of users who are part of company teams. This will allow business leaders to restrict things according to their policies.
For any company, the process works something like this:
The person in charge defines groups (with names and descriptions).
The same person defines restrictions rules for each group.
User accounts are created and linked to groups with rules.
The accounts are given (assigned) to each person who is part of the company team, each according to their function.
Why should this be done?
Management
Control
Security
Speaking more technically now, I do not know where or how I should implement this properly. I know of a way to accomplish this, which is in programming code (Java, in my case), but again, I do not know if this is appropriate.
I also know that it is possible to define users and groups on the database side. But creating and deleting such definitions each time an employee is hired or their length of service expires is not practical. My intention is to avoid, as far as possible, companies having to spend more money on maintenance (although sometimes this is obviously impossible).
My question based on a real case can be answered indicating to me an ideal way / approach for this type of scenario, either the solution being something that should be implemented in the database, or something done in the application layer, or both, or something else (I do not have experience to solve this kind of situation properly, so I'm here).
For practical purposes, I have decided to describe what technologies I am using in this system. If you want more information, I'll be happy to show you here.
Also, as this is a question that covers a larger context, not specifically databases, and also not specifically web applications, I have decided to put it here (instead of other StackExchange communities).
Thank you.

Oracle DISTINCT vs Java (CQEngine/Set): which leads to better performance?

I have a table from which I extract 8 columns, said columns will be properties of a pojo, say MyPojo.
I want to remove duplicates.
I came up with three strategies.
1-Let oracle take care of this with distinct keyword
select distinct c1,c2...c8 from TABLE where...
2-Do this in java with cqengine (https://code.google.com/p/cqengine/wiki/DeduplicationStrategies#Logical_Elimination_Strategy):
DeduplicationOption deduplication = deduplicate(DeduplicationStrategy.LOGICAL_ELIMINATION);
ResultSet<Car> results = cars.retrieve(query, queryOptions(deduplication));
3-Do this in java with a set
simply storing rows inside of a Set<MyPojo>
From a performance point of view which one is better?

Let the database do the work. In this case you don't send unnecessary data over the network which will - probably - have the biggest positive impact on performance.
Also it is the most compact solution in terms of code size.

The best way to decide these things is to model it.
What are the access patterns in your application?
If this would be a one-off request: have the database do the filtering.
If you expect to get many such identical requests: have the database do the filtering, and consider caching results in the application.
If you expect to get a variety of queries on the same dataset, consider caching the unfiltered dataset into the application tier, and querying it with CQEngine.
There is no rule of thumb such as "always have the database do the work". If your application operates at any kind of scale, you will not want every request to hit the database. You need to scale out your application tier.
On the other hand, you should not over-engineer. The answer depends on the traffic volume and data access patterns that you expect.
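If deduplication does end up in the application tier, the Set option from the question only works when MyPojo has value-based equals() and hashCode(). A minimal sketch, assuming a Java record as the row type and showing three hypothetical columns instead of all eight for brevity:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class Dedup {
    /** Hypothetical row type; a record gets value-based equals() and hashCode() for free. */
    record MyPojo(String c1, String c2, int c3) {}

    /** Removes duplicate rows while keeping first-seen order. */
    static List<MyPojo> distinct(List<MyPojo> rows) {
        Set<MyPojo> unique = new LinkedHashSet<>(rows);
        return List.copyOf(unique);
    }
}
```

With a plain class instead of a record, equals() and hashCode() would have to be written over all extracted columns; otherwise two rows with the same values count as different set elements. This says nothing about which option is faster, which still depends on the access patterns discussed above.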

Design pattern for java wrapper for Jquery datatables

I have found the Jquery datatables plug in extremely useful for simple, read only applications where I'd like to give the user pagination, sorting and searching of very large sets of data (millions of rows using server side processing).
I have a system for reusing this code, but I end up doing the same thing over and over a lot. I'd like to write a very generalized API for which I essentially just need to configure the SQL needed to retrieve the data used in the table. I am looking for a good design pattern/approach to do this. I've seen articles like this http://www.codeproject.com/Articles/359750/jQuery-DataTables-in-Java-Web-Applications and have a complete understanding of how server side processing works (have done it in Java and ASP.NET many times). For someone to answer you will probably need to have a deep understanding of how server side processing works in Java, but here are some issues that come up when attempting to do this:
I generally run three separate queries: a count without the search clause, a count with the clause included, and the query for the actual data. I haven't found an efficient way to do all three at once, and doing so requires a lot of extra data to come back from the database (i.e. counts over and over). The API needs to support behavior based on these three different queries, and complex queries at that. I generally use ROW_NUMBER() OVER an indexed column so that pagination is relatively speedy with large data.
*where clause changes dynamically (user can search over a variable number of rows).
*order by clause changes for the same reason.
Overall, each case is often pretty specific to the data we need. Is there a good way to abstract this so that I can do minimal work when I want to use the plug-in server side?
So, the steps are as follows in most projects:
*extract the params the plug-in sends to the server (a lot of times my own are added, mostly date ranges)
*build the unfiltered count query (this is rarely dynamic).
*build the filtered count query (is dynamic)
*build the data query
*construct a model object of the table and return it as json.
A lot of the issues occur setting the prepared statements with a variable number of parameters. Dynamically generating the sql in a general way (say based on just column names) seems unlikely. I am wondering if someone else has created something they are using for this or if it sounds like a specific pattern is applicable. It has just occurred to me that creating a reusable filter may be helpful in java. Any advice would be greatly appreciated. Feel free to be language agnostic as the architecture is what I'm trying to figure out.

We have a base search criteria class where all request parameters relevant to DataTables are mapped onto class properties (fields), and a custom search criteria class that extends the base and contains the business-specific fields for custom search. On the server side we also have a repository class that takes the custom search criteria as an argument and queries the database.
If you are familiar with C#, you could check out custom binding code and example of usage.
You could do such custom binding in your Java code as well.
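A rough Java sketch of that arrangement follows. It is not the linked C# code: the class names, the Criteria fields and the LIKE-over-every-column search are assumptions, all columns are assumed to be text, and LIMIT/OFFSET is used in place of the ROW_NUMBER() paging mentioned in the question. It produces the three queries (total count, filtered count, data page) together with their parameter lists, so the number of bound parameters always matches the generated SQL.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical query factory for one table shown through the plug-in. */
final class TableQueries {
    /** Paging, search and sort values taken from the plug-in's request (illustrative names). */
    record Criteria(int start, int length, String search, int orderColumn, boolean ascending) {}

    /** SQL text plus the values to bind to its '?' placeholders, in order. */
    record Sql(String text, List<Object> params) {}

    private final String table;
    private final List<String> columns;   // fixed whitelist; never taken from the request

    TableQueries(String table, List<String> columns) {
        this.table = table;
        this.columns = List.copyOf(columns);
    }

    /** Count without the search clause. */
    Sql totalCount() {
        return new Sql("SELECT COUNT(*) FROM " + table, List.of());
    }

    /** Count with the search clause included. */
    Sql filteredCount(Criteria c) {
        List<Object> params = new ArrayList<>();
        return new Sql("SELECT COUNT(*) FROM " + table + where(c, params), params);
    }

    /** Query for the rows of the requested page. */
    Sql page(Criteria c) {
        List<Object> params = new ArrayList<>();
        // The sort column is looked up by index in the whitelist, so no request text reaches the SQL.
        String orderBy = columns.get(c.orderColumn()) + (c.ascending() ? " ASC" : " DESC");
        String text = "SELECT " + String.join(", ", columns) + " FROM " + table
                + where(c, params) + " ORDER BY " + orderBy + " LIMIT ? OFFSET ?";
        params.add(c.length());
        params.add(c.start());
        return new Sql(text, params);
    }

    /** Builds the shared WHERE clause and appends one parameter per placeholder. */
    private String where(Criteria c, List<Object> params) {
        if (c.search() == null || c.search().isBlank()) return "";
        List<String> terms = new ArrayList<>();
        for (String column : columns) {
            terms.add(column + " LIKE ?");
            params.add("%" + c.search() + "%");
        }
        return " WHERE (" + String.join(" OR ", terms) + ")";
    }
}
```

A custom criteria class with extra fields such as date ranges would add its own terms and parameters in where(), which keeps the filtered count and the data query in sync.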

Exploring user specific data in webapps

I am busy practicing designing a simple todo list webapp whereby a user can authenticate into the app and save todo list items. The user is also only able to view/edit the todo list items that they added.
This seems to be a general feature (authenticated user only views their own data) in most web applications (or applications in general).
To me what is important is having knowledge of the different options for accomplishing this. What I would like to achieve is a solution that can handle lots of users' data effectively. At the moment I am doing this using a Relational Database, but noSQL answers would be useful to me as well.
The following ideas came to mind:
Add a user_id column each time this "feature" is needed.
Add an association table (in the example above a user_todo_list_item table) that associates the data.
Design in such a way that you have a table per user per "feature" ... so you would have a todolist_userABC table. It's an option but I do not like it much since a thousand users means a thousand tables?!
Add row level security to the specific "feature". I am not familiar with how this works but it seems to be a valid option. I am also not sure whether this is database vendor specific.
Of my choices I went with the user_id column on the todolist_item table. Although it can do the job, I feel that a user_id column might be problematic when reading data if the data within the table gets large enough. One could add an index I guess but I am not sure of the index's effectiveness.
What I don't like about it is that I need to have a user_id for every table where I desire this type of feature which doesn't seem correct to me? It also seems that when I implement the database layer I would have to add this to my queries for every feature (unless I use some AOP)?
I had a look around (How does Trello store data in MongoDB? (Collection per board?)), but it does not speak about the techniques regarding user_id columns or things like that. I also tried reading about this in some security frameworks (Spring Security to be specific) but it seems that it only goes into privileges/permissions on a table level and not a row level?
So the question is whether my choice was appropriate and if there are better techniques to do this?

Your choice is the natural thing to do.
The table-per-user is a non-starter (anything that modifies the database structure in response to user action is usually suspect).
Row-level security isn't really an option for webapps - it requires each user session to have a separate, persistent connection to the database, which is rarely practical. And yes, it is vendor-specific.
How you index your tables depends entirely on your usage patterns and the types of queries you want to run. Is 'show all TODOs for a user' a query you want to support (seems like it would be)? Then an index on the user id is obviously needed.
Why does having a user_id column seem wrong to you? If you want to restrict access by user, you need to be able to identify which user the record belongs to. Doesn't actually mean that every table needs it - for example, if one record composes another (say, your TODOs have 'steps', each step belongs to a single TODO), only the root of the object graph needs the user id.
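To make the last point concrete, here is a small in-memory stand-in for the two tables (a sketch only; the type and method names are made up). Only Todo carries the user id, and every lookup of a Step goes through the owning Todo, which is where the ownership check happens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for a todo table and a step table. */
final class TodoStore {
    record Todo(long id, long userId, String title) {}
    record Step(long id, long todoId, String text) {}   // no userId: ownership comes from the parent Todo

    private final Map<Long, Todo> todos = new HashMap<>();
    private final List<Step> steps = new ArrayList<>();

    void addTodo(Todo todo) { todos.put(todo.id(), todo); }
    void addStep(Step step) { steps.add(step); }

    /** Returns the todo only if it belongs to the given user. */
    Optional<Todo> findTodo(long userId, long todoId) {
        return Optional.ofNullable(todos.get(todoId)).filter(t -> t.userId() == userId);
    }

    /** Steps are reachable only through a todo that the user owns. */
    List<Step> findSteps(long userId, long todoId) {
        if (findTodo(userId, todoId).isEmpty()) return List.of();
        return steps.stream().filter(s -> s.todoId() == todoId).toList();
    }
}
```

In SQL terms, the step query would join to the todo table and filter on its user_id column, so the step table itself needs no such column.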
