I am currently evaluating authentication / authorization frameworks.
Apache Shiro seems to be very nice but I am missing row-level security features.
E.g. there might be special rows in a database which should only be visible and accessible to users with special privileges.
To avoid unnecessary round-trips, we currently modify the SQL queries to join with our authorization data to get only the visible rows for the current user.
But this concept doesn't feel 'right' to me, because we mix business code with security-related code, which should be orthogonal and independent of each other.
What solutions are available/possible?
How do you implement row-level security (especially in combination with JPA)?
UPDATE:
Target database is mostly Oracle 10g/11g, but a database-independent solution would be preferred if there are no big drawbacks.
Row level security is really best done in the database itself. The database has to be told what your user context is when you grab a connection. That user is associated with one or more security groups. The database then automatically appends filters to user-supplied queries to filter out what can't be seen from the security groups. This of course means that the solution is specific to each database product.
Oracle has pretty good Row Level Security support, see http://www.orafusion.com/art_fgac.htm as an example.
We implemented it as a JDBC wrapper.
This wrapper simply parses and transforms SQL.
A Hibernate filter is a good idea too, but we have many reports and ad-hoc queries, and Hibernate is not the only tool used to access data in our applications.
jsqlparser is an excellent open source SQL parser, but we had to fork it to fix some issues and to add support for some advanced SQL features, e.g. ROLLUP for reporting purposes: https://github.com/jbaliuka/sql-analytic
This reporting tool is also available on github but there is no dependency on row level security infrastructure https://github.com/jbaliuka/x4j-analytic
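To give an idea of what "parses and transforms SQL" means in the wrapper described above, here is a deliberately naive sketch. It is not the code from the linked repositories: the class name, the policy map and the predicate are all made up for illustration, it uses a regex instead of a real parser such as jsqlparser, and it assumes tables are referenced without aliases and that predicates come from trusted configuration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Naive illustration of SQL rewriting for row-level security.
 * Assumption: tables appear as "FROM t" or "JOIN t" without aliases.
 * A real wrapper would walk a parsed SQL tree instead of using a regex.
 */
class RowFilterRewriter {
    private static final Pattern TABLE_REF = Pattern.compile(
            "\\b(from|join)\\s+([A-Za-z_][A-Za-z0-9_]*)", Pattern.CASE_INSENSITIVE);

    // Hypothetical policy store: lower-cased table name -> row predicate.
    private final Map<String, String> policies = new HashMap<>();

    void protect(String table, String predicate) {
        policies.put(table.toLowerCase(), predicate);
    }

    /** Replaces every protected table by a filtered inline view named after the table. */
    String rewrite(String sql) {
        Matcher m = TABLE_REF.matcher(sql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String table = m.group(2);
            String predicate = policies.get(table.toLowerCase());
            String replacement = predicate == null
                    ? m.group() // unprotected table: keep the reference as it is
                    : m.group(1) + " (SELECT * FROM " + table
                        + " WHERE " + predicate + ") " + table;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because the rewriting happens below the application code, reports and ad-hoc queries pass through the same transformation as ORM-generated SQL.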
There is a helpful article: http://mattfleming.com/node/243
The idea is that you can implement row-level functionality in two ways: directly setting restrictions in your repository or binding the restrictions via AOP. The latter is preferred because the security layer should be separated from business logic (orthogonal concerns).
In Hibernate you can use the concept of filters, which are applied transparently so that the repository doesn't know about them. You can add such filters via AOP. The other way is intercepting session.createCriteria() and adding Restrictions to the Criteria transparently using AOP.
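The separation itself can be shown without Hibernate or an AOP framework. The sketch below uses a plain JDK dynamic proxy as a stand-in for the aspect, and it filters the returned list in memory; with Hibernate the interceptor would enable a filter or add Restrictions so that the database does the filtering. Document, DocumentRepository and SecuredRepositories are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical domain object; "owner" is what the security rule looks at. */
class Document {
    final String title;
    final String owner;
    Document(String title, String owner) { this.title = title; this.owner = owner; }
}

/** Business-facing repository: it contains no security code at all. */
interface DocumentRepository {
    List<Document> findAll();
}

class SecuredRepositories {
    /** Wraps a repository so that every returned list only holds the user's own rows. */
    @SuppressWarnings("unchecked")
    static DocumentRepository secure(DocumentRepository target, String currentUser) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = method.invoke(target, args);
            if (result instanceof List) {
                // In-memory stand-in for a restriction added to the query itself.
                List<Document> visible = new ArrayList<>();
                for (Document d : (List<Document>) result) {
                    if (d.owner.equals(currentUser)) {
                        visible.add(d);
                    }
                }
                return visible;
            }
            return result;
        };
        return (DocumentRepository) Proxy.newProxyInstance(
                DocumentRepository.class.getClassLoader(),
                new Class<?>[] {DocumentRepository.class},
                handler);
    }
}
```

The caller works with a DocumentRepository either way, so the security rule can change without touching the repository implementation.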
There are technically two questions here, but they are tightly coupled :)
I'm using Hibernate in a new project. It's a POS project.
It uses Oracle database.
We have decided to use Hibernate because the project is large, and because it provides (the most popular) ORM capabilities.
Spring is, for now, out of the question. The reason: the project is a Swing client-server application, and Spring would add needless complexity. Also, Spring is supposed to be very hungry for hardware resources.
There is a possibility of throwing away Hibernate and using JDBC. Why? The project requirement is precise database interaction. Meaning, we should have complete control over the connections, sessions and transactions (and, yes, going as low as unoptimized queries).
The first question is: what are your opinions on this, given the requirement above?
The second question revolves around Hibernate.
We developed a simple Hibernate pilot project.
Another project requirement is: one database user, one connection per user, one session per user, and flexible transactions (we can end them when we want, like sessions).
Multiple users can log in to the application at the same time.
We achieved something like that. To be precise, we achieved the full described functionality without the multiple users requirement.
Now, looking at the available resources, I came to the conclusion that if we are to have multiple users on the database (on the same schema), we will end up using multiple SessionFactory instances, implementing a dynamic ConnectionProvider for new user connections. Why?
The users' hashed passwords are in the database, so we need to dynamically add a user to the list of current users.
The second question is: can this be done more easily? It seems weird that Hibernate doesn't support such configurations.
Thank you.
If you're pondering whether to use Hibernate or JDBC, honestly, go for JDBC. If your domain model is not too complex, you don't really get a lot of advantages from using Hibernate. On the other hand, using JDBC can greatly improve performance, as you have better control over your queries, and you get a lot less memory usage from not having all the Hibernate overhead. Balance this by making as detailed a first sketch of your model as possible. If you're able to sketch it all from the start (no parts that are likely to change wildly during the project), and if said model doesn't look too involved, JDBC will be your friend.
About your users and sessions: I think you might be mistaken (though it could just be me), but I don't think you need multiple SessionFactories to have multiple sessions. A SessionFactory is a heavy object to initialize, but once you have one you can get multiple Hibernate Session objects from it, which are lightweight.
As a final remark, if you do stick with an ORM solution (for whatever reason), choose the EclipseLink JPA 2 implementation if possible. JPA 2 has more features than Hibernate, and the EclipseLink implementation is less buggy than Hibernate.
So, as far as Hibernate goes, I still don't know whether the only way to dynamically change database users (change database connections) is to create multiple session factories, but I presume it is.
We have lowered our requirements and decided to use Hibernate with only one database user (one connection) and one session per user (multiple sessions for multiple "logical" users). We created a couple of Java classes to wrap that functionality. The resources on how this can be done can be found here.
Why did we use Hibernate eventually? Using JDBC is more precise and more flexible, but the effort of mapping the ResultSet values into objects again amounts to the same ORM approach, done manually.
For example, if I have a GUI that needs to save a Page, first I have to fetch all the Page's Articles and then, after I save the Page, update every Article's FK to that Page. Notice that I'm speaking in nouns (objects), and I don't see any other way to wrap the Page/Articles, except by using global state. This is the one thing I wouldn't like to see in my application, and we are, after all, using Java, an OO language.
When we already have an ORM mapper that can be configured (forced would be the more precise word in this particular example) to process these things itself, why program it ourselves?
Also, we decided to use Google Guice: it's much faster, typesafe, and could significantly simplify our development, maintenance and testing.
The standard example is probably where you offer a service to multiple companies on the same hosted instance and want employees to be able to see data only from other employees of the same company, not of potentially competitive companies.
I'm using JBossAS7 with Hibernate 4.x.
I could push the company information down from the UI layer and have the (stateless) persistence layer filter on that, but it seems like a bad idea to me; I'd rather have it done in one place, closer to the database.
I'm guessing there must be a standard, secure solution for this, maybe around security domains or hibernate sessions? Thoughts? Thanks in advance.
You seem to be building a "multi-tenant application". Hibernate's support for multi-tenancy is quite restricted at the moment, with feature request 5697 having recently been completed in 4.0.0.Alpha2. Note that this feature request does not address the addition of tenant discriminator columns to the entities, which, going by the discussion in JIRA, would arrive in 4.0.0.Alpha3 or 4.1.0. At the moment, you can store the data related to various tenants in different databases or schemas.
You can also read this related blog post, on various options regarding achieving multi-tenancy in Hibernate; this is quite old compared to the work done in HHH-5697, and does not discuss how one would create a multi-tenant application with tenant discriminator columns in the entity model.
I'm not sure of any standard, but have worked on two systems where it was important. These pre-dated tools like Hibernate and our use of J2EE.
In all systems I've worked on we've had to code this ourselves - using company as part of our keys in requests.
One possibility is a whole different "whatever your database calls its partition" for each customer. (Schema if you're in Oracle). Sounds more complex but it does guarantee isolation between companies and it does also allow some management of scaling or new/delete company. In my previous place of work I remember legal types felt nervous if anyone mentioned keeping more than one company's data in the same table - so that kept them happy.
You could either have your app server connect to the database as a trusted user who can access all, or make sure you pass the end user's credentials down when you connect. I've heard of this. It sounds good from a security point of view and means in a database like Oracle the right thing will just happen. I've not seen it done and wonder how well connection pooling would work if at all.
Edit: Vineet's answer above seems to cover it well. It's an area I'll have to look at more. We've probably got too much legacy code here to change.
I have an existing Java EE 6 application (deployed in Glassfish v 3.1) and want to support multiple tenants. Technologies/APIs I'm currently using in my app are
EJB (including the EJB timer service)
JPA 2.0 (EclipseLink)
JSF 2.0
JMS
JAX-RS
I plan to use CDI as well
As far as I know, adding multi-tenancy support affects only the persistence layer. My question: Has anybody done this before? What are the steps to convert the application? Will this affect layers other than persistence?
There will be a high number of tenants, therefore, all data will reside in the same DB schema.
Persistence Layer
Start with the persistence layer. Roll upwards through your architecture once you have that done.
The schema that you are proposing would have an ID that identifies the tenant (e.g. TenantId). Each table would have this ID. In all of your queries you would have to ensure that the TenantId matches the logged-in user's TenantId.
The difficulty with this is that it is a very manual process.
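One way to make it less manual is to bind the tenant once per request and build every query through a single helper. A minimal sketch, assuming a tenant_id column on each table; TenantContext and TenantQuery are hypothetical helpers, not part of JPA or any provider:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Holds the tenant of the logged-in user for the current thread. */
final class TenantContext {
    private static final ThreadLocal<Long> CURRENT = new ThreadLocal<>();

    static void set(long tenantId) { CURRENT.set(tenantId); }

    static long require() {
        Long id = CURRENT.get();
        if (id == null) {
            throw new IllegalStateException("No tenant bound to this thread");
        }
        return id;
    }

    static void clear() { CURRENT.remove(); }
}

/** SQL text plus positional parameters, ready to be bound to a PreparedStatement. */
final class TenantQuery {
    final String sql;
    final List<Object> params;

    private TenantQuery(String sql, List<Object> params) {
        this.sql = sql;
        this.params = params;
    }

    /** Builds "SELECT * FROM table WHERE tenant_id = ?" plus an optional extra condition. */
    static TenantQuery select(String table, String condition, Object... args) {
        StringBuilder sql = new StringBuilder("SELECT * FROM ")
                .append(table).append(" WHERE tenant_id = ?");
        List<Object> params = new ArrayList<>();
        params.add(TenantContext.require()); // fails fast if nobody is logged in
        if (condition != null && !condition.isEmpty()) {
            sql.append(" AND (").append(condition).append(")");
            params.addAll(Arrays.asList(args));
        }
        return new TenantQuery(sql.toString(), params);
    }
}
```

The tenant would be set once after login (for example in a request filter) and cleared at the end of the request, so that pooled threads do not leak it to the next user.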
If you go with Hibernate as your JPA provider then there are some tools that will help with this; namely Hibernate Filters.
These are commonly used to restrict access on multi-tenant Schemas (see here and here for some more)
I haven't used EclipseLink but it does look like it has good support for Multi-Tenancy as well. The DiscriminatorColumn looks like a very similar concept to Hibernate Filters.
Service Layer
I assume that you're using JAX-RS and JMS for a Service Layer. If so then you will also need to think about how you are going to pass the tenantId around and authenticate your Tenants. How are you going to prevent one tenant from accessing the REST service of another? Same thing for JMS.
UI Layer
You are going to have to hook up your login in your UI to a Bean (Hibernate or Eclipselink) that sets the TenantId for the Filter/Discriminator.
Tell us about the number and the degree of separation and customization necessary for different tenants.
If you have a small number of tenants, I would propose creating a customizable "white-label" product. This gives you the opportunity to create some specific things for one tenant without overcomplicating matters. Plus, separating the applications per tenant helps you in maintenance. We did this for a product with a handful of different tenants.
If you have many tenants, this is of course no longer practical. We did a generic version of the same product. All we did then was distinguish tenants by id after login, thus separating the data from others. But still, there was nothing to do in terms of changing the application or a layer within it: the id was all that was needed to separate the data, and the workflow is automatically separated by having different instances of beans or other managed objects.
There are several ways you can go with this, depending on the level of separation you want to achieve and how many concurrent tenants you want to support. At one extreme, you can create a new schema for each tenant and therefore ensure database-level isolation of data. For most practical purposes it's usually sufficient to have a logical partitioning of your data by assigning a tenant_id to every entity in your domain model and maintaining foreign-key constraints. Of course this means you'll probably want to always pass your current session's tenant_id to every query / finder method so that it can restrict the data set based on that. You'll want to make sure that users cannot access another tenant's data by entering a tenant id (or an entity id) that does not belong to them in the URL.
Go message oriented.
If you choose messaging as the strategic approach and refactor (if necessary) business logic around JMS, then other options remain viable and locally applicable.
With this approach, you pay a specific fixed cost (refactor) in your existing (single tenant) system. You can then apply approaches of various degrees of complexity, ranging from simple sharding (@Geziefer's id-based association) to a full-blown shared-core-schema + extended-tenant-specific-schemas approach, without impacting the system architecture or requiring additional refactoring.
You will further have orthogonal control over your system data flows via the messaging layer (applying routers, filters, special processing paths, etc.)
[edit per request]
There is nothing per se in M.T. that explicitly suggests message orientation. But as a general problem, we are looking at widening interfaces and enriched data flows. With an API-based approach, you would need to carefully inject the appropriate tenant discriminator into all required interfaces (e.g. methods). A message-based approach (or alternatively a context-based API approach) allows for a normative (stable) interface (e.g. message.send()) and at the same time allows for explicit specialized data flows. If switching to a message-based backbone is not on the table, I strongly suggest injecting a uniform context parameter (e.g. "RequestContext") into your APIs. This single extension should cover all your future specialization needs.
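A rough idea of what such a context parameter could look like. Only the name "RequestContext" comes from the suggestion above; the fields, the attribute map and the OrderService interface are assumptions made for the example:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Uniform, immutable context handed to every service call. */
final class RequestContext {
    private final String tenantId;
    private final String userId;
    private final Map<String, String> attributes;

    RequestContext(String tenantId, String userId) {
        this(tenantId, userId, Collections.<String, String>emptyMap());
    }

    private RequestContext(String tenantId, String userId, Map<String, String> attributes) {
        this.tenantId = tenantId;
        this.userId = userId;
        this.attributes = attributes;
    }

    String tenantId() { return tenantId; }
    String userId() { return userId; }
    String attribute(String key) { return attributes.get(key); }

    /** Returns a copy carrying one more attribute; the original stays unchanged. */
    RequestContext with(String key, String value) {
        Map<String, String> copy = new HashMap<>(attributes);
        copy.put(key, value);
        return new RequestContext(tenantId, userId, copy);
    }
}

/** The signature stays stable: new cross-cutting data travels inside the context. */
interface OrderService {
    int countOpenOrders(RequestContext ctx);
}
```

If a later requirement needs, say, a locale or a feature flag, it becomes another attribute on the context rather than another parameter on every method.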
I am currently looking into converting a single-tenant Java based web-app that uses Spring, GWT, Hibernate, Jackrabbit, Hibernate Search / Lucene (among others) into a fully fledged SaaS style app.
I stumbled across an article that highlights the following 7 "things" as important changes to make to a single-tenant app to make it a SaaS app:
The application must support multi-tenancy.
The application must have some level of self-service sign-up.
There must be a subscription/billing mechanism in place.
The application must be able to scale efficiently.
There must be functions in place to monitor, configure, and manage the application and tenants.
There must be a mechanism in place to support unique user identification and authentication.
There must be a mechanism in place to support some level of customization for each tenant.
My question is: has anyone implemented any of the above 7 things in a SaaS / multi-tenant app using technologies similar to those I have listed? I am keen to get as much input as possible on the best ways to do so before I go down the path that I am currently considering.
As a start I am quite sure that I have a good handle on how to handle multiple tenants at a model level. I am thinking of adding a tenant ID to all of our tables and then using a Hibernate filter (and a Full Text Filter for Hibernate Search) to filter based on the logged on user's tenant ID for all queries.
I do however have some concerns around performance as well especially when our number of tenants grows quite high.
Any suggestions on how to implement such a solution will be greatly appreciated (and I apologise if this question is a bit too open-ended).
I would recommend that you architect your application to support all 4 types of tenant isolation, namely a separate database for each tenant, a separate schema for each tenant, a separate table for each tenant, and a shared table for all tenants with a tenant ID. This will give you the flexibility to horizontally partition your database as you grow, having multiple databases each holding a group of smaller tenants, and also the ability to have a separate database for some large tenants. Some of your large tenants could also insist that their data (database) reside on their premises, while the application runs in the cloud.
Here is an exhaustive checklist of non-functional and infrastructure-level features that you may want to consider while architecting your application (some of them you may not need immediately, but think about how you would handle such a need if your competition starts offering it):
tenant level customization of a) UI themes and logos, b) forms and grids, c) data model extensions and custom fields, d) notification templates, e) pick lists and master data
tenant level creation and administration of roles and privileges, field level access permissions, data scope policies
tenant level access control settings for modules and features, so that specific modules and features could be enabled / disabled depending on the subscription package.
Metering and monitoring of tasks / events / transactions and restriction of access once the purchased quota is exceeded. The ability to meter any new entity in the future if and when your business model changes.
Externalising the business rules and workflows out of your code base and representing them as meta data, so that you can customize them for each tenant group / tenant.
Query builder for creating custom reports that is aware of the tenant as well as custom fields added by specific tenants.
Tenant encapsulation and framework level connection string management such that your developers do not have to worry about tenant IDs while writing queries.
All these are based on our experience in building a general-purpose multi-tenant framework that can be used for any domain or application. Unfortunately, you cannot use our framework as it is based on .NET.
But the engineering needs of any multi-tenant SaaS product (new or migrated) are the same irrespective of the technology stack that you use.
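To make the metering item from the checklist a bit more concrete in Java terms, here is a small sketch of a per-tenant quota counter. It is not taken from our framework; QuotaMeter and its methods are invented for the example, and a real system would persist the counters rather than keep them in memory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Counts metered events per tenant and refuses them once the purchased quota is used up. */
class QuotaMeter {
    private final Map<String, Long> quotaByTenant = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> usedByTenant = new ConcurrentHashMap<>();

    void setQuota(String tenant, long maxEvents) {
        quotaByTenant.put(tenant, maxEvents);
    }

    /** Records one event; returns false and records nothing once the quota is reached. */
    boolean tryConsume(String tenant) {
        long max = quotaByTenant.getOrDefault(tenant, 0L); // unknown tenants get no quota
        AtomicLong used = usedByTenant.computeIfAbsent(tenant, t -> new AtomicLong());
        while (true) {
            long current = used.get();
            if (current >= max) {
                return false;
            }
            if (used.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    long used(String tenant) {
        AtomicLong used = usedByTenant.get(tenant);
        return used == null ? 0 : used.get();
    }
}
```

Keying the meter by a plain string also leaves room for metering a new kind of entity later, for instance by using "tenant:entity" keys.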
All of the technologies that you listed are quite common and reasonable for both single- and multi-tenant applications. I'd say supporting the 7 "things" for SaaS is much more of a function of how you use the technologies than which. It sounds like you already have a single-tenant application that works. So there's probably not much reason to deviate from the technology selections there unless something is just not working very well already. Your question is otherwise fairly open-ended though, so it's hard to be too much more specific there.
I do have some feedback on splitting the database (and perhaps other things) by tenant ID though. If you know you might eventually have a lot of tenants (say many thousands or more, particularly if they're small) then what you suggest is perhaps best. If however you'll have a smaller number of tenants (particularly if they're large) you might want to consider a database per tenant, so they each have their own table space. By that I mean a single database installation with multiple instances of the same schema inside of it, one per tenant.
There are a few reasons this can be an advantage. One is performance, as you mentioned. Adding a tenant ID to every single table adds overhead to disk access and query time and increases code complexity. Every index in the database will need to include the tenant ID as well. You run an additional risk of mixing data between tenants if you're not careful (although a Hibernate filter would help mitigate that). With a database per tenant you could restrict access to only the correct one. Porting your current application will probably be a lot easier too: you basically just need to intercept your request somewhere early to decide the tenant based on the URL and point to the right database. Backups are also easy to do per tenant, which is particularly useful if you ever intend to allow tenants to download a backup.
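The "decide the tenant based on the URL" step could look roughly like this. The sketch assumes the tenant is the first label of the host name (acme.example.com); the class name and the JDBC URLs are placeholders, and in a real application the result would select a DataSource rather than return a string.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Maps an incoming host name to the database of the tenant it belongs to. */
class TenantRouter {
    // Hypothetical registry: tenant key -> JDBC URL of that tenant's database.
    private final Map<String, String> jdbcUrlByTenant = new ConcurrentHashMap<>();

    void register(String tenant, String jdbcUrl) {
        jdbcUrlByTenant.put(tenant.toLowerCase(), jdbcUrl);
    }

    /** Extracts the tenant from the first host label, e.g. "acme" from "acme.example.com". */
    static String tenantFromHost(String host) {
        int dot = host.indexOf('.');
        if (dot <= 0) {
            throw new IllegalArgumentException("No tenant subdomain in host: " + host);
        }
        return host.substring(0, dot).toLowerCase();
    }

    /** Resolves the database for a request; unknown tenants are rejected, never defaulted. */
    String jdbcUrlFor(String host) {
        String tenant = tenantFromHost(host);
        String url = jdbcUrlByTenant.get(tenant);
        if (url == null) {
            throw new IllegalArgumentException("Unknown tenant: " + tenant);
        }
        return url;
    }
}
```

Rejecting unknown tenants instead of falling back to a default database avoids sending a mistyped host name to someone else's data.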
On the other hand there are reasons not to do this. You'll have a lot of database schemas to deal with and they'll have to be updated independently (which can actually be an advantage if you want to avoid taking all tenants down for a schema change, you can roll them out incrementally). It lets you have special cases that could deviate from treating the platform as a true multi-tenant SaaS deployment that's upgraded all at once, resulting in management of multiple versions in production. Lastly I've heard there is a breaking point with just about every database vendor out there in the number of schema instances they'll support in one installation (supposedly some can go to hundreds of thousands though).
It really depends on your use case of course. You mentioned single-tenant which leads me to believe you don't have too many tenants right now, however you do mention growing to lots of tenants. I'm not sure if you mean hundreds or millions, yet either way I hope this helps some with your considerations. Best of luck!
There is no simple answer. I can describe my own solution. It may serve as an inspiration for the others.
tenant per database (postgres)
one additional database shared between tenants
Spring + MyBatis
Spring Security authentication
Details here: http://blog.trixi.cz/2012/01/multitenancy-using-spring-and-postgresql/
For (1): Hibernate supports multi-tenant configurations out of the box since version 4.
At the time of writing, DB-per-tenant and schema-per-tenant are supported; keeping all tenants in the same DB using a discriminator is not yet supported. We have used this functionality successfully in our application (DB-per-client approach).
For (3): After some investigation we decided to go with Braintree to implement billing. Other solutions many people recommend: Authorize.net, Stripe, PayPal.
For (4): We have used a clustered configuration with Hibernate/Spring and JBoss Cache for 2nd-level caching. These days this has become "common", and with PaaS services like Jelastic you can even get it pre-configured out of the box.
What you describe is a full-service SaaS-style application serving multiple tenants. There are a few things you have to decide, such as how critical data isolation is. If you are building for a medical or financial domain, data isolation is a critical factor.
Well, I cannot help answer all your points, but I would suggest looking at the database-per-tenant approach for your application, as it provides the highest level of data isolation.
Since you are using the Java, Spring, Hibernate stack, I can help you with a small example application I wrote. It is a working example which you can quickly run on your local laptop. I have shared it here. Do take a look and let me know if it answers some of your questions.
The system I am currently working on requires some role-based security, which is well catered for in the Java EE stack. The system intends to be a framework for business domain experts to write their code on top of.
However, there is also a requirement for data security. That is, what information is visible to an end user.
This effectively means reducing visibility to rows (and perhaps even columns) in the database.
We are using Hibernate for our persistence. However, we are using our own annotations so as not to expose our persistence choice to the business domain experts.
For row-based security this means we could add an annotation such as @Secured at the entity level, which would cause an extra column to be added to the underlying table to constrain our selects?
For column-based security, we could perhaps have @Secured either assist in query generation, or perhaps use an aspect to filter the information returned?
I'm curious to know how this might affect Hibernate's caching mechanisms as well?
I'm sure a lot of others will have had the same issue, and I was wondering how you approached this?
Much appreciated...
Hibernate has a filter mechanism that may work for you. The filters will rewrite the queries hibernate generates to include an additional clause to limit the rows returned. I'm not aware of anything in hibernate to mask/hide columns.
Your database may also have support for this functionality. Oracle, for example, has the Virtual Private Database (VPD) which will rewrite your queries at the database level. This solution has the added benefit that any external program (e.g. reporting tools) that goes against your db will have your security restrictions enforced. VPD also has support to mask restricted columns with NULLs.
Unfortunately, the above solutions have not been adequate to support the security requirements for the types of projects I typically work on. There is usually some sort of context that cannot be easily expressed in the above solutions. For example, users can view data that they have created, or that has been marked as public, or that belongs to a project which they manage.
We typically create query/finder/DAO objects where we pass in the values required to enforce the security and then create the query accordingly.
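As an illustration of such a finder, using the three rules from the example above (created by the user, marked public, or in a managed project). The table and column names are invented, and the result is plain SQL plus parameters rather than a Hibernate query, so take it as a sketch of the shape rather than our actual code:

```java
import java.util.ArrayList;
import java.util.List;

/** Builds the security-aware query from values the caller passes in explicitly. */
final class VisibleDocumentsQuery {
    final String sql;
    final List<Object> params;

    private VisibleDocumentsQuery(String sql, List<Object> params) {
        this.sql = sql;
        this.params = params;
    }

    /** Rows the user created, rows marked public, and rows in projects the user manages. */
    static VisibleDocumentsQuery forUser(long userId, List<Long> managedProjectIds) {
        List<Object> params = new ArrayList<>();
        StringBuilder where = new StringBuilder("created_by = ? OR is_public = 1");
        params.add(userId);
        if (!managedProjectIds.isEmpty()) {
            where.append(" OR project_id IN (");
            for (int i = 0; i < managedProjectIds.size(); i++) {
                where.append(i == 0 ? "?" : ", ?"); // one placeholder per project id
                params.add(managedProjectIds.get(i));
            }
            where.append(")");
        }
        return new VisibleDocumentsQuery(
                "SELECT * FROM document WHERE (" + where + ")", params);
    }
}
```

Further business conditions can then be appended with AND after the parenthesised security clause without weakening it.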
I hope this helps
When using Hibernate filters you need to be aware that the additional restrictions will not be applied to SQL statements generated by the load() or get() methods.
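One way to cover those primary-key lookups is an explicit check after loading. A minimal sketch, with an in-memory map standing in for the session.get() call; Invoice, its tenantId field and GuardedInvoiceLookup are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical entity that carries the tenant owning it. */
final class Invoice {
    final long id;
    final long tenantId;

    Invoice(long id, long tenantId) {
        this.id = id;
        this.tenantId = tenantId;
    }
}

/** Lookup by primary key with the ownership check that a filter would not apply. */
class GuardedInvoiceLookup {
    // Stands in for the persistence layer; with Hibernate this would be session.get(...).
    private final Map<Long, Invoice> store = new HashMap<>();

    void save(Invoice invoice) {
        store.put(invoice.id, invoice);
    }

    /** Returns the row only if it exists and belongs to the caller's tenant. */
    Optional<Invoice> findById(long id, long currentTenantId) {
        Invoice found = store.get(id);
        if (found == null || found.tenantId != currentTenantId) {
            return Optional.empty(); // same answer for "missing" and "not yours"
        }
        return Optional.of(found);
    }
}
```

Returning the same empty result in both cases means a caller cannot probe for ids that exist but belong to someone else.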