Pros and cons of multi-tenancy through multiple deployments and separate schemas

I am trying to create a Java web application using Spring MVC. The purpose of this application is to serve different groups of users from different business units in the enterprise. So, for instance, if you think of it as a shopping-experience kind of application, where the application is functionally about
picking what you want
adding what you pick to the cart
checking out from the cart
then I need to list plumbing items for the plumbing department, electrical items for the electrical department, and so on.
So, I decided to have two different schemas with identical table structure. Schema 'PLUMB' will store the plumbing department users who can use the application, and the items related to plumbing, in the USERS and ITEMS tables of the PLUMB schema. Similarly, the electrical department has its own schema. That's for multitenancy on the database side.
For the application/deployment side, the code of the webapp remains the same except for the one property that tells the application which schema it needs to query (this would obviously be different on each instance). So, I am thinking about deploying
http://mycompany.com/plumbingapp
http://mycompany.com/electricalapp
Are there any known anti-patterns to this kind of architecture? One downside I see is that I will now have multiple environments to manage, like dev.mycompany.com/plumbingapp and test.mycompany.com/plumbingapp. Other than that, I think this allows for cleaner separation than having one single app that authenticates the user, asks him to pick which department he wants to go to, and then populates the webpage depending on the department he picks.
Have you used this kind of structure before? Are there any known downsides to this kind of design/architecture? If I deploy multiple instances, is it still a multi-tenant application?

Depending on the user and his rights, after login you create either the /plum or the /electrical ModelAndView.
In your DB you could create separate tables plum_table and elec_table, while keeping the user and user_roles tables shared.
The other way is to create virtual machines and proxy to a different machine depending on /plum or /electrical.

The approach listed in the question works well for two or a few tenants, but it is not easy to scale.
There is no need for a separate deployment bundle. Tenant identification and separation can happen only at the level of the DB schema, roles and any business rules, and the rest of the system components can be common. There is no binding reason to create a separate app for each tenant.
These posts can help:
Databse architecture (single db vs client specific db) for Building Enterprise Web (RIA) application on cloud
Architecture for SaaS based online portal
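To make the "one deployment, tenant decided per request" idea concrete, here is a minimal sketch in plain Java. It assumes the tenant can be read from the first URL path segment, as in the URLs from the question. The class name TenantSchemaResolver and the schema name ELEC are made up for illustration (the question only names PLUMB).

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: maps the first URL path segment to a tenant schema. */
final class TenantSchemaResolver {
    // Assumed mapping. "PLUMB" comes from the question, "ELEC" is invented here.
    private static final Map<String, String> SCHEMA_BY_PATH = Map.of(
            "plumbingapp", "PLUMB",
            "electricalapp", "ELEC");

    private TenantSchemaResolver() {}

    /** Returns the schema for a path such as "/plumbingapp/cart", if known. */
    static Optional<String> schemaFor(String requestPath) {
        if (requestPath == null) {
            return Optional.empty();
        }
        // Drop a leading slash, then keep everything up to the next slash.
        String trimmed = requestPath.startsWith("/") ? requestPath.substring(1) : requestPath;
        int slash = trimmed.indexOf('/');
        String firstSegment = slash < 0 ? trimmed : trimmed.substring(0, slash);
        return Optional.ofNullable(SCHEMA_BY_PATH.get(firstSegment));
    }
}
```

A request filter or interceptor could call schemaFor once per request and reject the request when the result is empty. The rest of the application then stays identical for every tenant.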

Related

Using LDAP for storing business data rather than organizational data only

The business domain of a solution contains the following objects:
users
resources (strings representing ids of real-world objects like products)
groups of users
groups of resources
rights granted to users to act upon resources in a certain way
I need to evaluate whether the solution shall use a relational database or if we can do everything in LDAP with a custom schema.
The proposed solution so far uses some UI framework for the front-end and a REST API to connect to the backend. The business objects on the back end are realized as EJBs. For persistence, the EJBs use either only LDAP with a custom schema (option 1) or a "standard" LDAP plus an RDBMS (option 2).
I have no experience with custom LDAP schemas, and I feel uncomfortable persisting data to LDAP that is related not to the organization (like people, equipment, etc.) but to its business (like products etc.). On the other hand, I do not see any concrete reason why this would be a bad idea.
Here are some of the key requirements:
web front-end
users and user groups are managed in LDAP
possibility for the future to extend the solution in such a way that the resources contain not only ids of real-world objects but also have BLOBs associated with them
writes to the database are far less frequent than reads
no transactions are required
on-premise solution that can easily be integrated into various IT environments

Dynamically select database based on session

I am using JPA and WildFly 10.
Imagine that two different companies each have subscriptions to use my application. Ideally, I'd like to split up the data of the two companies into separate databases, or at the very least, separately prefixed tables, using JPA. If a user logs in on http://example.com/company1/, it will seem as though the application is completely separate from the application running at http://example.com/company2/, when in actuality, the "two" applications are part of one application.
Multiple users may be logged in at the same time from different companies, so the database being used should be session based.
The main reason for splitting up the data into separate tables or databases is for better organization. The reason for using one application rather than multiple is to allow for horizontal scaling.
I have seen answers which state to create multiple persistence.xml files, however my application should dynamically create and drop the databases or tables based on the current subscriptions that are active. Think of it like a web hosting company, where as soon as you pay, you can login and begin working on your website; that's the direction I am moving in. Therefore, I cannot hard-code the different companies into the program.
How can this be done with JPA? Is it even possible? Or is there a better way to accomplish what I am seeking?

You are trying to make your application multi tenant capable. One way I know is:
use a field in every database table which contains a predefined value for a tenant.
set current tenant when the request comes from this tenant
then use Hibernate filtering capability to filter tenant-specific data automatically. Once the filter is activated every query will return only data specific to the current tenant.
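Note: a Hibernate filter is set on the Hibernate Session object. If you are using pure JPA with Hibernate as the provider, the EntityManager interface has a method called getDelegate() which returns the underlying Hibernate Session object (since JPA 2.0, unwrap(Session.class) is the type-safe way to get it).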
Here is a link to the information about Hibernate filters. There are also tutorials on how to use them on the Internet.
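The three steps above can be sketched without any framework. This is only a plain-Java stand-in for what a Hibernate filter would do against real tables, not Hibernate code. The names TenantContext, Invoice and InvoiceRepository are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Holds the tenant of the request being handled on the current thread. */
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TenantContext() {}

    static void set(String tenantId) { CURRENT.set(tenantId); }
    static String get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }
}

/** A row that carries the tenant discriminator field (step 1). */
final class Invoice {
    final String tenantId;
    final String number;

    Invoice(String tenantId, String number) {
        this.tenantId = tenantId;
        this.number = number;
    }
}

/** In-memory stand-in for a table whose reads are always tenant-filtered (step 3). */
final class InvoiceRepository {
    private final List<Invoice> rows = new ArrayList<>();

    void save(Invoice invoice) { rows.add(invoice); }

    /** Returns only the invoice numbers that belong to the current tenant. */
    List<String> findAllNumbers() {
        String tenant = TenantContext.get();
        if (tenant == null) {
            throw new IllegalStateException("no current tenant set");
        }
        List<String> numbers = new ArrayList<>();
        for (Invoice row : rows) {
            if (tenant.equals(row.tenantId)) {
                numbers.add(row.number);
            }
        }
        return numbers;
    }
}
```

In a web application, step 2 would typically be TenantContext.set(...) at the start of each request and TenantContext.clear() in a finally block at the end, so that a pooled thread never carries over the tenant of a previous request.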

Google App Engine Multitenancy API

According to the GAE docs on the Multitenancy API:
Multitenancy is the name given to a software architecture in which one instance of an application, running on a remote server, serves many client organizations (also known as tenants).
But isn't this what every web application is? Dozens, hundreds, maybe even thousands of users all logging in to the system, accessing the same software, but from inside the context of their own "user accounts"? Or is Google's Multitenancy API some kind of API for developing generic data abstraction layers that can be used as backends for multiple apps?
I guess I don't get the meaning of a "Google multi-tenant" app, and as such, don't understand the purpose or usefulness of the Multitenancy API. Thanks in advance for any clarity here!

Consider the standard way that multitenancy is implemented: You add a "tenant ID" field to one or more tables, then include that ID in a WHERE clause. And you index that field.
You could take the same approach in App Engine, adding an indexed property to some of your entities to hold a tenant ID, and carefully including that ID in GQL WHERE clauses (or filters). This will cost you a bit more on writes (for the two indexes on that property), and more if the ID participates in queries that include other filters, as those would require additional composite indexes that include the ID.
Or you use our multitenancy API, which gives you the same effect without the additional costs for index writes. You get slightly simpler code, and less expense.

Multitenancy here doesn't refer to users of your app as such, but 'instances' of your app with 'separate' datastores.
They aren't really separate instances or separate datastores, as those requests might be served by a shared instance and they are definitely talking to the same datastore. However, by using the API you can set up your app so that the data is partitioned into separate namespaces which don't pollute each other.
If you have only one user on your app, then multi-users and multi-tenanting is pretty much the same thing. If you have multiple users, then generally you'll be sharing data between the users. If so, you can use multitenancy to share data within only a certain group of users and partition the rest off in their own tenancy.
As jtahlborn rightly states, each of our GAE apps is already a tenant on the GAE infrastructure. We aren't able to share data between different apps because they are completely partitioned from each other.
As Dave says, we could implement multitenancy ourselves by adding some kind of domain name or partition id to all our data. The API just gives an easier way to do that.
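To illustrate what "partitioned into separate namespaces" means, here is a small in-memory sketch. It is not the App Engine API (in the Java SDK the current namespace is, as far as I know, selected with NamespaceManager.set(...)). The NamespacedStore class below is purely hypothetical and only mimics the behaviour.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical key/value store where every read and write is scoped to a namespace. */
final class NamespacedStore {
    // One inner map per namespace, so identical keys never collide across tenants.
    private final Map<String, Map<String, String>> partitions = new HashMap<>();
    private String currentNamespace = "";

    /** Selects the namespace (tenant) used by subsequent put/get calls. */
    void setNamespace(String namespace) {
        currentNamespace = namespace;
    }

    void put(String key, String value) {
        partitions.computeIfAbsent(currentNamespace, ns -> new HashMap<>()).put(key, value);
    }

    /** Returns the value stored under the key in the current namespace, or null. */
    String get(String key) {
        Map<String, String> partition = partitions.get(currentNamespace);
        return partition == null ? null : partition.get(key);
    }
}
```

The calling code never adds a tenant ID to its keys or queries. It only switches the namespace once, and the same key can hold different values for different tenants.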

The difference is whose tenants you are talking about. GAE was multi-tenant from day one in that each program (tenant) ran in a common GAE infrastructure. However, initially (when GAE was first released), your program itself just managed one body of data. The GAE "multi-tenancy API" enables your single program to manage its (your) own tenants (so your tenants as opposed to GAE's tenants).
To state it concisely and confusingly: the "multi-tenancy API" allows you to manage your own tenants (users) within a single GAE program, which is in turn hosted as a tenant (program) within the GAE infrastructure.
In theory, of course, you could always have done this from day one in GAE, but all the work of managing the data between your tenants would have had to be handled in your code. The "multi-tenancy API" attempts to remove that pain from the programmer and make it much simpler to segment the data within your program.

Multi tenancy support in Java EE 6

I have an existing Java EE 6 application (deployed in Glassfish v 3.1) and want to support multiple tenants. Technologies/APIs I'm currently using in my app are
EJB (including the EJB timer service)
JPA 2.0 (EclipseLink)
JSF 2.0
JMS
JAX-RS
I plan to use CDI as well
As far as I know, adding multi-tenancy support affects only the persistence layer. My question: Has anybody done this before? What are the steps to convert the application? Will this affect other layers other than persistence?
There will be a high number of tenants, therefore, all data will reside in the same DB schema.

Persistence Layer
Start with the persistence layer. Roll upwards through your architecture once you have that done.
The Schema that you are proposing would have an ID that identifies the tenant (eg. TenantId). Each table would have this ID. In all of your queries you would have to ensure that the TenantId matches the logged in User's TenantId.
The difficulty with this is that it is a very manual process.
If you go with Hibernate as your JPA provider then there are some tools that will help with this; namely Hibernate Filters.
These are commonly used to restrict access on multi-tenant Schemas (see here and here for some more)
I haven't used EclipseLink but it does look like it has good support for Multi-Tenancy as well. The DiscriminatorColumn looks like a very similar concept to Hibernate Filters.
Service Layer
I assume that you're using JAX-RS and JMS for a Service Layer. If so then you will also need to think about how you are going to pass the tenantId around and authenticate your Tenants. How are you going to prevent one tenant from accessing the REST service of another? Same thing for JMS.
UI Layer
You are going to have to hook up your login in your UI to a Bean (Hibernate or Eclipselink) that sets the TenantId for the Filter/Discriminator.

Tell us about the number and the degree of separation and customization necessary for different tenants.
If you have a small number of tenants, I would propose creating a customizable "white-label" product. This gives you the opportunity to create some specific things for one tenant without overcomplicating matters. Plus, separating the applications per tenant helps you in maintenance. We did this for a product with a handful of different tenants.
If you have many tenants, this is of course no longer practical. We did a generic version of the same product. All we did then was distinguish tenants by id after login, thus separating the data from others. But still, there was nothing to do in terms of changing the application or a layer within, the id was all what was needed to separate the data and the workflow is automatically separated by having different instances of beans or other managed objects.

There are several ways you can go with this, depending on the level of separation you want to achieve and how many concurrent tenants you want to support. At one extreme, you can create a new schema for each tenant and thereby ensure database-level isolation of data. For most practical purposes it is usually sufficient to partition your data logically by assigning a tenant_id to every entity in your domain model and maintaining foreign-key constraints. Of course, this means you will probably want to pass your current session's tenant_id to every query / finder method so that it can restrict the data set accordingly. You will also want to make sure that users cannot access another tenant's data by entering a tenant id (or an entity id) that does not belong to them in the URL.
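A minimal sketch of such a finder method, assuming an in-memory map in place of the real table. The names Order and OrderFinder are made up for illustration. The point is that the lookup takes the session's tenant id as well as the entity id, and treats a row owned by another tenant exactly like a missing row.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical entity carrying a tenant_id next to its own id. */
final class Order {
    final long id;
    final String tenantId;
    final String description;

    Order(long id, String tenantId, String description) {
        this.id = id;
        this.tenantId = tenantId;
        this.description = description;
    }
}

/** Hypothetical finder that never returns rows of another tenant. */
final class OrderFinder {
    private final Map<Long, Order> ordersById = new HashMap<>();

    void add(Order order) {
        ordersById.put(order.id, order);
    }

    /**
     * Looks up an order by id, restricted to the tenant of the current session.
     * An id that exists but belongs to another tenant yields an empty result,
     * so guessing ids in the URL reveals nothing.
     */
    Optional<Order> findById(String sessionTenantId, long orderId) {
        Order order = ordersById.get(orderId);
        if (order == null || !order.tenantId.equals(sessionTenantId)) {
            return Optional.empty();
        }
        return Optional.of(order);
    }
}
```

The sessionTenantId argument should come from the authenticated session, never from a request parameter, otherwise the check can be bypassed by the same URL manipulation it is meant to stop.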

Go message oriented.
If you choose messaging as the strategic approach and refactor (if necessary) business logic around JMS, then other options remain viable and locally applicable.
With this approach, you pay a specific fixed cost (refactor) in your existing (single tenant) system. You then can apply approaches of various degrees of complexity, ranging from simple sharding (#Geziefer's id based association) to a full blown shared-core-schema + extended-tenant-specific-schemas approach, without impacting system architecture and additional refactoring.
You will further have orthogonal control over your system data flows via the messaging layer (applying routers, filters, special processing paths, etc.)
[edit per request]
There is nothing per se in M.T. that explicitly suggests message orientation. But as a general problem, we are looking at widening interfaces and enriched data flows. With an API-based approach, you would need to carefully inject the appropriate tenant discriminant into all required interfaces (e.g. methods). A message-based approach (or alternatively a context-based API approach) allows for a normative (stable) interface (e.g. message.send()) and at the same time allows for explicit specialized data flows. If switching to a message-based backbone is not on the table, I strongly suggest you consider injecting a uniform context (e.g. "RequestContext") parameter into your APIs. This single extension should cover all your future specialization needs.
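As a rough sketch of the context-parameter variant: every service method takes one context object instead of a growing list of discriminator arguments. Only the name RequestContext comes from the text above. Its fields and the AuditLog service are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Uniform context passed to every API call. New fields can be added without changing signatures. */
final class RequestContext {
    private final String tenantId;
    private final String userId;

    RequestContext(String tenantId, String userId) {
        this.tenantId = tenantId;
        this.userId = userId;
    }

    String tenantId() { return tenantId; }
    String userId() { return userId; }
}

/** Hypothetical service whose interface stays stable while the context evolves. */
final class AuditLog {
    private final List<String> entries = new ArrayList<>();

    /** Records an action together with the tenant and user taken from the context. */
    void record(RequestContext context, String action) {
        entries.add(context.tenantId() + ":" + context.userId() + ":" + action);
    }

    /** Returns a copy of the recorded entries. */
    List<String> entries() {
        return new ArrayList<>(entries);
    }
}
```

If a later requirement needs, say, a locale or a subscription tier, it becomes one more field on RequestContext, and record(...) and every other method signature stay untouched.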

SaaS / Multi-Tenancy approaches for Java-based (GWT, Spring, Hibernate) web applications

I am currently looking into converting a single-tenant Java based web-app that uses Spring, GWT, Hibernate, Jackrabbit, Hibernate Search / Lucene (among others) into a fully fledged SaaS style app.
I stumbled across an article that highlights the following 7 "things" as important changes to make to a single tenant app to make it an SaaS app:
The application must support multi-tenancy.
The application must have some level of self-service sign-up.
There must be a subscription/billing mechanism in place.
The application must be able to scale efficiently.
There must be functions in place to monitor, configure, and manage the application and tenants.
There must be a mechanism in place to support unique user identification and authentication.
There must be a mechanism in place to support some level of customization for each tenant.
My question is has anyone implemented any of the above 7 things in a SaaS /multi-tenant app using similar technologies to those that I have listed? I am keen to get as much input regarding the best ways to do so before I go down the path that I am currently considering.
As a start I am quite sure that I have a good handle on how to handle multiple tenants at a model level. I am thinking of adding a tenant ID to all of our tables and then using a Hibernate filter (and a Full Text Filter for Hibernate Search) to filter based on the logged on user's tenant ID for all queries.
I do however have some concerns around performance as well especially when our number of tenants grows quite high.
Any suggestions on how to implement such a solution will be greatly appreciated (and I apologise if this question is a bit too open-ended).

I would recommend that you architect your application to support all the 4 types of tenant isolation namely separate database for each tenant, separate schema for each tenant, separate table for each tenant and shared table for all tenants with a tenant ID. This will give you the flexibility to horizontally partition your database as you grow, having multiple databases each having a group of smaller tenants and also the ability to have a separate database for some large tenants. Some of your large tenants could also insist that their data (database) should reside in their premise, while the application can run off the cloud.
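One way to keep all four options open is to hide the choice behind a single strategy type, so that query code asks the strategy where a tenant's table lives. The enum below is only a sketch. Its name, the naming conventions (tenant.table, tenant_table) and the assumption that tenant names are already validated SQL identifiers are all mine.

```java
/** Hypothetical enumeration of the four isolation levels listed above. */
enum IsolationStrategy {
    DATABASE_PER_TENANT,
    SCHEMA_PER_TENANT,
    TABLE_PER_TENANT,
    SHARED_TABLE;

    /**
     * Returns the table reference a query would use for the given tenant.
     * Assumes tenant and table are validated identifiers, never raw user input.
     */
    String tableReference(String tenant, String table) {
        switch (this) {
            case DATABASE_PER_TENANT:
                return table;                 // the connection already points at the tenant's database
            case SCHEMA_PER_TENANT:
                return tenant + "." + table;  // schema-qualified name
            case TABLE_PER_TENANT:
                return tenant + "_" + table;  // tenant-prefixed table in a shared schema
            case SHARED_TABLE:
                return table;                 // rows are separated by a tenant ID column instead
            default:
                throw new AssertionError(this);
        }
    }

    /** Only the shared-table strategy needs a tenant ID predicate in every query. */
    boolean needsTenantIdPredicate() {
        return this == SHARED_TABLE;
    }
}
```

With something like this in place, moving a large tenant from the shared table to its own schema or database becomes a configuration change for that tenant plus a data migration, rather than a rewrite of the queries.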
Here is an exhaustive checklist of non-functional and infrastructure-level features that you may want to consider while architecting your application (you may not need some of them immediately, but think about how you would handle such a need if your competition started offering it):
tenant level customization of a) UI themes and logos b) forms and grids, c) data model extensions and custom fields, d) notification templates, e) pick up lists and master data
tenant level creation and administration of roles and privileges, field level access permissions, data scope policies
tenant level access control settings for modules and features, so that specific modules and features could be enabled / disabled depending on the subscription package.
Metering and monitoring of tasks / events / transactions and restriction of access control once the purchased quota is exceeded. The ability to meter any new entity in the future if and when your business model changes.
Externalising the business rules and workflows out of your code base and representing them as meta data, so that you can customize them for each tenant group / tenant.
Query builder for creating custom reports that is aware of the tenant as well as custom fields added by specific tenants.
Tenant encapsulation and framework level connection string management such that your developers do not have to worry about tenant IDs while writing queries.
All these are based on our experience in building a general purpose multi-tenant framework that can be used for any domain or application. Unfortunately, you cannot use our framework as it is based on .NET
But the engineering needs of any multi-tenant SaaS product (new or migrated) are the same irrespective of the technology stack that you use.

All of the technologies that you listed are quite common and reasonable for both single- and multi-tenant applications. I'd say supporting the 7 "things" for SaaS is much more of a function of how you use the technologies than which. It sounds like you already have a single-tenant application that works. So there's probably not much reason to deviate from the technology selections there unless something is just not working very well already. Your question is otherwise fairly open-ended though, so it's hard to be too much more specific there.
I do have some feedback on splitting the database (and perhaps other things) by tenant ID though. If you know you might eventually have a lot of tenants (say many thousands or more, particularly if they're small) then what you suggest is perhaps best. If however you'll have a smaller number of tenants (particularly if they're large) you might want to consider a database per tenant, so they each have their own table space. By that I mean a single database installation with multiple instances of the same schema inside of it, one per tenant.
There are a few reasons this can be an advantage. One is performance as you mentioned. Adding a tenant ID to every single table is overhead on disk access, query time and increases code complexity. Every index in the database will need to include the tenant ID as well. You run an additional risk of mixing data between tenants if you're not careful (although a Hibernate filter would help mitigate that). With a database per tenant you could restrict access to only the correct one. Porting your current application will probably be a lot easier too, you basically just need to intercept your request somewhere early to decide the tenant based on the URL and point to the right database. Backups are also easy to do per tenant, particularly useful if you ever intend on allowing them to download a backup.
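The "intercept the request early and point to the right database" step could look roughly like this. The class name, the URL layout (/tenant/...) and the JDBC URL template are assumptions for the sketch. In a Spring application the same decision is usually plugged in through AbstractRoutingDataSource rather than hand-built URLs.

```java
/** Hypothetical router: derives the tenant from the URL and picks that tenant's database. */
final class TenantDatabaseRouter {
    private final String jdbcUrlTemplate;

    /** The template must contain one %s placeholder for the tenant's database name. */
    TenantDatabaseRouter(String jdbcUrlTemplate) {
        this.jdbcUrlTemplate = jdbcUrlTemplate;
    }

    /** Extracts the first non-empty path segment, e.g. "company1" from "/company1/orders". */
    static String tenantFromPath(String path) {
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                return segment;
            }
        }
        throw new IllegalArgumentException("no tenant segment in path: " + path);
    }

    /** Returns the JDBC URL of the database that serves the tenant named in the path. */
    String jdbcUrlFor(String path) {
        String tenant = tenantFromPath(path);
        // Reject anything that is not a plain identifier before it reaches a connection string.
        if (!tenant.matches("[a-z0-9_]+")) {
            throw new IllegalArgumentException("invalid tenant name: " + tenant);
        }
        return String.format(jdbcUrlTemplate, tenant);
    }
}
```

A real deployment would look the tenant up in a registry of active subscriptions instead of trusting the path segment directly, and would keep one connection pool per tenant database.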
On the other hand there are reasons not to do this. You'll have a lot of database schemas to deal with and they'll have to be updated independently (which can actually be an advantage if you want to avoid taking all tenants down for a schema change, you can roll them out incrementally). It lets you have special cases that could deviate from treating the platform as a true multi-tenant SaaS deployment that's upgraded all at once, resulting in management of multiple versions in production. Lastly I've heard there is a breaking point with just about every database vendor out there in the number of schema instances they'll support in one installation (supposedly some can go to hundreds of thousands though).
It really depends on your use case of course. You mentioned single-tenant which leads me to believe you don't have too many tenants right now, however you do mention growing to lots of tenants. I'm not sure if you mean hundreds or millions, yet either way I hope this helps some with your considerations. Best of luck!

There is no simple answer. I can describe my own solution. It may serve as an inspiration for the others.
tenant per database (postgres)
one additional database shared between tenants
Spring + MyBatis
Spring Security authentication
Details here: http://blog.trixi.cz/2012/01/multitenancy-using-spring-and-postgresql/

For (1): Hibernate supports multi-tenant configurations out of the box from version 4.
At the time of writing, DB-per-tenant and schema-per-tenant are supported, while keeping all tenants in the same DB using a discriminator is not yet supported. We have used this functionality successfully in our application (DB-per-client approach).
For (3): After some investigation we decided to go with Braintree to implement billing. Other solutions many people recommend: Authorize.net, Stripe, PayPal.
For (4): We have used a clustered configuration with Hibernate/Spring and JBoss Cache for 2nd-level caching. These days this has become "common", and with PaaS services like Jelastic you can even get it pre-configured out of the box.

What you describe is a full-service SaaS-style application serving multiple tenants. There are a few things you have to decide, such as how critical data isolation is. If you are building for a medical or financial domain, data isolation is a critical factor.
Well, I cannot help answer all your points, but I would suggest looking at database-per-tenant approach for your application as it provides the highest level of data isolation.
Since you are using the Java, Spring, Hibernate stack, I can help you with a small example application I wrote. It is a working example which you can quickly run in your local laptop. I have shared it here. Do take a look and let me know if it answers some of your questions.
