Lucene / Solr security when OLS (Oracle Label Security) is required

Lucene / Solr security when OLS (Oracle Label Security) is required - java

This is going to be a bit open-ended initially. My team has a requirement to utilize Oracle Label Security (OLS). Because we would like to enable "fast" search capabilities (Solr/Lucene) how can we correctly retrieve data that is cached (Lucene/ Solr) based on the OLS policy in place?

One way you can use external systems like your OLS is Solr's PostFilter interface. A very good write up how to do use this has been published in the article Custom security filtering in Solr by Erik Hatcher.
Basically you have a hook after all search and filtering has been done. There you can open a connection to your database and filter the search results according to the user's access rights.
To speed this up, you should consider to place some security relevant artifacts into you index, which you then include as ordinary filter. That way you can do a pre-filtering, so that you do not overwhelm the PostFilter.
Currently there is nothing pre-build by the community, but you could kick-off something on GitHub.

Related

Authorizing using WSO2 permissions

WSO2 seems to support 2 scenarios/models of authorization (e.g. as explained here): First is Database based Permission Store and the other is XACML based authorization using defined policies. The first allows you to define permission in a nice UI tree per each role while the later requires more complex policy definitions (but more flexible as it is fine grained permissions).
As far as I found it, those are separate mechanisms, and XACML queries does not consider the permissions defined by the first method. Since I need to support a simple RBAC model, I wish to concentrate on the first kind.
Using Java, I have found how to use the SDK to check the user decision using EntitlementServiceStub SDK, however I failed to find the SDK that can be used to check if the user has permissions of the first kind (I was just able to get the UI definitions, but I'm looking for something that can answer what is the decision/result, e.g. given "user1" & "/permission/protected/server-admin/homepage" can answer "true" - I suspect RemoteAuthorizationManagerServiceStub but not sure it is).
What is the SDK I'm looking for?

According to the post, it claims that database permission store is only available for Carbon based Server. And I think you are trying to do the same for identity server. In the documentation, it is not mentioned which means it does not support this feature for identity server

Apache velocity for user login system, is this a good language choice?

First off, is this even possible? From what iv'e used velocity servlets for in the past, I have never used them to retain information or sessions , just to be queried and return data to the page contexts. I have been doing a project in Velocity and now need to make the choice to either use velocity (if it is possible for this task) or something else. If I can't use velocity, or if there is a better option, what would it be?

Apache Velocity is not a language, it is a template engine with a distinct syntax.

I understand your need: should you use Apache Velocity as a template engine for your dynamic site, how would you then add the authentication feature to your site?
Velocity itself only does the rendering of the pages - to use it in the context of a web application, you need some kind of framework. There are tons of frameworks out there, the simplest one being Apache Velocity Tools, which exposes to the Velocity view several tools belonging to one of the three nested scopes (request, session and application).
Then, to add authentication to your site, you will need some kind of storage for logins and passwords. If you go the database way, you can try Velosurf, a specific Velocity Tool dedicated to database access. Velosurf contains an AuthenticationFilter which can fulfills your needs.
Disclaimer: I am the author of Velosurf.

How to implement row-level security in Java?

I am currently evaluating authentication / authorization frameworks.
Apache Shiro seems to be very nice but I am missing row-level security features.
E.g. there might be special rows in a database which should only visible and accessible by users with special privileges.
To avoid unnecessary round-trips, we currently modify the SQL queries to join with our authorization data to get only the visible rows for the current user.
But this concepts doesn't feel 'right' to me, because we mix business code with security related code which should be orthogonal and independent from each other.
What solutions are available/possible?
How do you implement row-level security (especially in combination with jpa)?
UPDATE:
Target database is mostly Oracle 10g/11g
- but a database independent solution would be preferred if there are no big drawbacks

Row level security is really best done in the database itself. The database has to be told what your user context is when you grab a connection. That user is associated with one or more security groups. The database then automatically appends filters to user supplied queries to filter out what can't be seen from the security groups. This of course means that this is a per database-type solution.
Oracle has pretty good Row Level Security support, see http://www.orafusion.com/art_fgac.htm as an example.

We implemented it as JDBC wrapper.
This wrapper simply parses and transforms SQL.
Hibernate filter is good idea too but we have many reports and ad-hoc queries, Hibernate is not the only tool to access data in our applications.
jsqlparser is an excellent open source SQL parser but we have to fork it to fix some issues and to add support of some advanced SQL features e.g. ROLLUP for reporting purposes https://github.com/jbaliuka/sql-analytic
This reporting tool is also available on github but there is no dependency on row level security infrastructure https://github.com/jbaliuka/x4j-analytic

There is a helpful article: http://mattfleming.com/node/243
The idea is that you can implement row level functionality in two ways: directly setting restrictions in your repository or binding the restrictions via AOP. The latter is preferred because security layer should be separated from business logic (orthogonal concerns).
In Hibernate you can use the concept of filters which are applied transparently and repository doesn't know about them. You can add such filters via AOP. The other way is intercepting session.createCriteria() and adding Restrictions to the Criteria transparently using AOP.

SaaS / Multi-Tenancy approaches for Java-based (GWT, Spring, Hibernate) web applications

I am currently looking into converting a single-tenant Java based web-app that uses Spring, GWT, Hibernate, Jackrabbit, Hibernate Search / Lucene (among others) into a fully fledged SaaS style app.
I stumbled across an article that highlights the following 7 "things" as important changes to make to a single tenant app to make it an SaaS app:
The application must support multi-tenancy.
The application must have some level of self-service sign-up.
There must be a subscription/billing mechanism in place.
The application must be able to scale efficiently.
There must be functions in place to monitor, configure, and manage the application and tenants.
There must be a mechanism in place to support unique user identification and authentication.
There must be a mechanism in place to support some level of customization for each tenant.
My question is has anyone implemented any of the above 7 things in a SaaS /multi-tenant app using similar technologies to those that I have listed? I am keen to get as much input regarding the best ways to do so before I go down the path that I am currently considering.
As a start I am quite sure that I have a good handle on how to handle multiple tenants at a model level. I am thinking of adding a tenant ID to all of our tables and then using a Hibernate filter (and a Full Text Filter for Hibernate Search) to filter based on the logged on user's tenant ID for all queries.
I do however have some concerns around performance as well especially when our number of tenants grows quite high.
Any suggestions on how to implement such a solution will be greatly appreciated (and I apologise if this question is a bit too open-ended).

I would recommend that you architect your application to support all the 4 types of tenant isolation namely separate database for each tenant, separate schema for each tenant, separate table for each tenant and shared table for all tenants with a tenant ID. This will give you the flexibility to horizontally partition your database as you grow, having multiple databases each having a group of smaller tenants and also the ability to have a separate database for some large tenants. Some of your large tenants could also insist that their data (database) should reside in their premise, while the application can run off the cloud.
Here is an exaustive check list of non-functional and infrastructure level features that you may want to consider while architecting your application (some of them you may not need immediately, but think of a business situation of how you will handle such a need if your competition starts offering it)
tenant level customization of a) UI themes and logos b) forms and grids, c) data model extensions and custom fields, d) notification templates, e) pick up lists and master data
tenant level creation and administration of roles and privileges, field level access permissions, data scope policies
tenant level access control settings for modules and features, so that specific modules and features could be enabled / disabled depending on the subscription package.
Metering and monitoring of tasks / events / transactions and restriction of access control once the purchased quota is exceeded. The ability to meter any new entity in the future if and when your business model changes.
Externalising the business rules and workflows out of your code base and representing them as meta data, so that you can customize them for each tenant group / tenant.
Query builder for creating custom reports that is aware of the tenant as well as custom fields added by specific tenants.
Tenant encapsulation and framework level connection string management such that your developers do not have to worry about tenant IDs while writing queries.
All these are based on our experience in building a general purpose multi-tenant framework that can be used for any domain or application. Unfortunately, you cannot use our framework as it is based on .NET
But the engineering needs of any multi-tenant SaaS product (new or migrated) are the same irrespective of the technology stack that you use.

All of the technologies that you listed are quite common and reasonable for both single- and multi-tenant applications. I'd say supporting the 7 "things" for SaaS is much more of a function of how you use the technologies than which. It sounds like you already have a single-tenant application that works. So there's probably not much reason to deviate from the technology selections there unless something is just not working very well already. Your question is otherwise fairly open-ended though, so it's hard to be too much more specific there.
I do have some feedback on splitting the database (and perhaps other things) by tenant ID though. If you know you might eventually have a lot of tenants (say many thousands or more, particularly if they're small) then what you suggest is perhaps best. If however you'll have a smaller number of tenants (particularly if they're large) you might want to consider a database per tenant, so they each have their own table space. By that I mean a single database installation with multiple instances of the same schema inside of it, one per tenant.
There are a few reasons this can be an advantage. One is performance as you mentioned. Adding a tenant ID to every single table is overhead on disk access, query time and increases code complexity. Every index in the database will need to include the tenant ID as well. You run an additional risk of mixing data between tenants if you're not careful (although a Hibernate filter would help mitigate that). With a database per tenant you could restrict access to only the correct one. Porting your current application will probably be a lot easier too, you basically just need to intercept your request somewhere early to decide the tenant based on the URL and point to the right database. Backups are also easy to do per tenant, particularly useful if you ever intend on allowing them to download a backup.
On the other hand there are reasons not to do this. You'll have a lot of database schemas to deal with and they'll have to be updated independently (which can actually be an advantage if you want to avoid taking all tenants down for a schema change, you can roll them out incrementally). It lets you have special cases that could deviate from treating the platform as a true multi-tenant SaaS deployment that's upgraded all at once, resulting in management of multiple versions in production. Lastly I've heard there is a breaking point with just about every database vendor out there in the number of schema instances they'll support in one installation (supposedly some can go to hundreds of thousands though).
It really depends on your use case of course. You mentioned single-tenant which leads me to believe you don't have too many tenants right now, however you do mention growing to lots of tenants. I'm not sure if you mean hundreds or millions, yet either way I hope this helps some with your considerations. Best of luck!

There is no simple answer. I can describe my own solution. It may serve as an inspiration for the others.
tenant per database (postgres)
one additional database shared between tenants
Spring + MyBatis
Spring Security authentication
Details here: http://blog.trixi.cz/2012/01/multitenancy-using-spring-and-postgresql/

For (1): Hibernate supporting multi-tenant configurations out of the box from version 4.
At the moment of writing supported are DB-per-tenant and schema-per-tenant and keeping all tenants in a same DB using discriminator is not yet supported. We have used this functionality successfully in our application (DB-per-client approach).
For (3): After some investigation done we decided to go with Braintree to implement billing. Another solutions many people recommend: Authorize.net, Stripe, PayPal.
For (4): We have used clustered configuration with Hibernate/Spring and JBoss Cache for 2nd level caching. At these days this became "common" and using PaaS services like Jelastic you can even get it pre-configured out of the box.

What you describe is a full service Saas style application serving multiple tenants. There are a few things you have to decide like how critical is data isolation? If you are building for a medical or financial domain, data isolation is a critical factor.
Well, I cannot help answer all your points, but I would suggest looking at database-per-tenant approach for your application as it provides the highest level of data isolation.
Since you are using the Java, Spring, Hibernate stack, I can help you with a small example application I wrote. It is a working example which you can quickly run in your local laptop. I have shared it here. Do take a look and let me know if it answers some of your questions.

What JDBC tools do you use for synchronization of data sources?

I'm hoping to find out what tools folks use to synchronize data between databases. I'm looking for a JDBC solution that can be used as a command-line tool.
There used to be a tool called Sync4J that used the SyncML framework but this seems to have fallen by the wayside.

I have heard that the Data Replication Service provided by Db4O is really good. It allows you to use Hibernate to back onto a RDBMS - I don't think it supports JDBC tho (http://www.db4o.com/about/productinformation/drs/Default.aspx?AspxAutoDetectCookieSupport=1)
There is an open source project called Daffodil, but I haven't investigated it at all. (https://daffodilreplicator.dev.java.net/)
The one I am currently considering using is called SymmetricDS (http://symmetricds.sourceforge.net/)
There are others, they each do it slightly differently. Some use triggers, some poll, some use intercepting JDBC drivers. You need to decide what technical limitations you are under to determine which one you really want to use.
Wikipedia provides a nice overview of different techniques (http://en.wikipedia.org/wiki/Multi-master_replication) and also provides a link to another alternative DBReplicator (http://dbreplicator.org/).

If you have a model and DAO layer that exists already for your codebase, you can just create your own sync framework, it isn't hard.
Copy data is as simple as:
read an object from database A
remove database metadata (uuid, etc)
insert into database B
Syncing has some level of knowledge about what has been synced already. You can either do it at runtime by getting a list of uuids from TableInA and TableInB and working out which entries are new, or you can have a table of items that need to be synced (populate with a trigger upon insert/update in TableInA), and run from that. Your tool can be a TimerTask so databases are kept synced at the time granularity that you desire.
However there is probably some tool out there that does it all without any of this implementation faff, and each implementation would be different based on business needs anyway. In addition at the database level there will be replication tools.

True synchronization requires some data that I hope your database schema has (you can read the SyncML doc to see how they proceed). Sync4J won't help you much, it's really high-level and XML oriented. If you don't foresee any conflicts (which means: really easy synchronisation), you could try with a lightweight ETL like Enhydra Octopus.

I'm primarily using Oracle at the moment, and the most full-featured route I've come across is Red Gate's Data Compare:
http://www.red-gate.com/products/oracle-development/data-compare-for-oracle/
This old blog gives a good summary of the solution routes available:
http://www.novell.com/coolsolutions/feature/17995.html
The JDBC-specific offerings I've come across have been very basic. The solution mentioned by Aidos seems the most feature complete if you want to go down the publish-subscribe route:
http://symmetricds.codehaus.org/
Hope this helps.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.