Scalable http session management (java, linux)

Scalable http session management (java, linux) - java

Is there a best-practice for scalable http session management?
Problem space:
Shopping cart kind of use case. User shops around the site, eventually checking out; session must be preserved.
Multiple data centers
Multiple web servers in each data center
Java, linux
I know there are tons of ways doing that, and I can always come up with my own specific solution, but I was wondering whether stackoverflow's wisdom of crowd can help me focus on best-practices
In general there seem to be a few approaches:
Don't keep sessions; Always run stateless, religiously [doesn't work for me...]
Use j2ee, ejb and the rest of that gang
use a database to store sessions. I suppose there are tools to make that easier so I don't have to craft all by myself
Use memcached for storing sessions (or other kind of intermediate, semi persistent storage)
Use key-value DB. "more persistent" than memcached
Use "client side sessions", meaning all session info lives in hidden form fields, and passed forward and backward from client to server. Nothing is stored on the server.
Any suggestions?
Thanks

I would go with some standard distributed cache solution.
Could be your application server provided, could be memcached, could be terracotta
Probably doesn't matter too much which one you choose, as long as you are using something sufficiently popular (so you know most of the bugs are already hunted down).
As for your other ideas:
Don't keep session - as you said not possible
Client Side Session - too unsecure - suppose someone hacks the cookie to put discount prices in the shopping cart
Use database - databases are usually the hardest bottleneck to solve, don't put any more there than you absolutely have to.
Those are my 2 cents :)
Regarding multiple data centers - you will want to have some affinity of the session to the data center it started on. I don't think there are any solutions for distributed cache that can work between different data centers.

You seem to have missed out vanilla replicated http sessions from your list. Any servlet container worth its salt supports replication of sessions across the cluster. As long as the items you put into the session aren't huge, and are serializable, then it's very easy to make it work.
http://tomcat.apache.org/tomcat-6.0-doc/cluster-howto.html
edit: It seems, however, that tomcat session replication doesn't scale well to large clusters. For that, I would suggest using JBoss+Tomcat, which gives the idea of "buddy replication":
http://www.jboss.org/community/wiki/BuddyReplicationandSessionData

I personally haven't managed such clusters, but when I took a J2EE course at the university the lecturer said to store sessions in a database and don't try to cache it. (You can't meaningfully cache dynamic pages anyway.) Http sessions are client-side by the definition, as the session-id is a cookie. If the client refuses to store cookies (e.g. he's paranoid about tracking), then he can't have a session.
You can get this id by calling HttpSession.getId().
Of course database is a bottleneck, so you'll end up with two clusters: an application server cluster and a database cluster.
As far as I know, both stateful message beans and regular servlet http sessions exist only in memory without load balancing built in.
Btw. I wouldn't store e-mail address or usernames in a hidden field, but maybe the content of the cart isn't that sensitive data.

I would rather move away from storing user application state in an HTTP session, but that would require a different way of thinking how the application works and use a RESTful stateless architecture. This normally involves dropping support for earlier versions of browsers that do not support MVWW architectures on the client side.
The shopping cart isn't a user application state it is an application state which means it would be stored on a database and managed as such. There can be an association table that would link the user to one or many shopping carts assuming the sharing of carts is possible.
Your biggest hurdle would likely be how to authenticate the user for every request if it is stateless. BASIC auth is the simplest approach that does not involve sessions, FORM-auth will require sessions regardless. A JASPIC implementation (like HTTP Headers or OAuth) will be able to mitigate your authentication concerns elsewhere, in which case a cookie can be used to manage your authentication token (like FORM-auth) or HTTP header like SiteMinder or Client Side Certificates with Apache.
The more expensive databases like DB2 have High Availability and Disaster Recovery features that work across multiple data centers. Note that it is not meant for load balancing the database, since there'd be a large impact due to network traffic.

Related

REST completely stateless, possible?

Can anyone give me an example of truly stateless RESTful endpoints? a simple question, if server is completely stateless, how do we invalidate previous tokens? I consider saving state to DB as bad practice. lets say there are hundreds of requests per second, that would mean hundreds of queries to DB per second (if you save state to DB) and that's bad news. if you save state to server, you'll run into session transfer problem when using multiple servers and load balancers.

Well one example of course would be endpoints that don't need authenticating, and dependent on your structure there are others. For example if you are using something like AngularJS you don't need to have authorization in the same way as you would use it with something like a developer API - you can use session variables which can be signed and stateless.
If you are worried about performance of performing database queries on simple state things like this, it is worth looking at some solutions like Redis, which you can send hundreds of queries to with very little strain.

Stateless restful endpoints by definition wouldn't use tokens (having a lifetime) or state. If you need those, then you don't have a truly stateless restful endpoint.
As an answer to your question, a web server without authentication or a similar mechanism could be considered truly stateless rest endpoint. It would just deliver a file from disk on GET request to anyone requesting it.
Also, if authentication is hardcoded basic auth or similar mechanism sending login details on every request, it would be still stateless. When you start adding tokens that expire you definitely already have state.
For details on doing authentication in a stateless REST manner you can read up on this discussion.

How to manage sessions in a distributed application

I have a Java web application which is deployed on two VMs. and NLB (Network Load Balancing) is set for these VMs. My Application uses sessions. I am confused that how the user session is managed in both VMs. i.e. For Example- If I make a request that goes to VM1 and create a user session. Now the second time I make request and it goes to VM2 and want to access the session data. How would it find the session which has been created in VM1.
Please Help me to clear this confusion.

There are several solutions:
configure the load balancer to be sticky: i.e. requests belonging to the same session would always go to the same VM. The advantage is that this solution is simple. The disadvantage is that if one VM fails, half of the users lose their session
configure the servers to use persistent sessions. If sessions are saved to a central database and loaded from this central database, then both VMs will see the same data in the session. You might still want to have sticky sessions to avoid concurrent accesses to the same session
configure the servers in a cluster, and to distribute/replicate the sessions on all the nodes of the cluster
avoid using sessions, and just use an signed cookie to identify the users (and possibly contain a few additional information). A JSON web token could be a good solution. Get everything else from the database when you need it. This ensures scalability and failover, and, IMO, often makes things simpler on the server instead of making it more complicated.
You'll have to look in the documentation of your server to see what is possible with that server, or use a third-party solution.

We can use distributed Redis to store the session and that could solve this problem.

What is the best approach to build a system with high amount of data communication?

Hello
I have a cache server (written with Java+Lucene Framework) which keeps large amount of data and provides them according to request query.
It basically works like this:
On the startup, it connects DB and stores all tables to the RAM.
It listens for requests and provides the proper data as array lists (about 1000 - 20000 rows)
When a user visits to the web page, it connects to the cache server, requests, and show the server response.
I planned to run web and cache applications in different instances because of memory issues. Cache Server is as service and web is on Tomcat.
What is your suggestion about how the communication should be built between web side and cache server ?
I need to pass large amount of data with array lists from one instance to another. Should I think web services (xml communication), nio socket communication (maybe Apache MINA) or the solutions like CORBA ?
Thanks.

It really depends very much on considerations you have not specified.
What are the clients? for example, if your clients are javascript running AJAX, obviously something over HTTP is more useful than a proprietary UDP solution.
What network is it working on? Local networks behave differently than internet, and mobile internet is quite different than both.
How elaborate use can you make of caching? If you use HTTP you can have a rather good control (through HTTP headers) of both client cache and network caches, and a plethora of existing software that can make use of both.
There are many other considerations to be taken into account, and there are many existing implementations of systems matching the more-common needs. From your (not very detailed) description you gave, I would recommend having a look at Redis.

can i use LDAP servers like ApacheDS as a persistence solution for desktop applications?

a fella recommended me to use ApacheDS as a replacement for my database (MySQL) you can find the discussion here
i am completely new to LDAP and ApacheDS (actually i had no idea about it yesterday), i searched about it and read some articles , finally i got this page.
considering LDAP a network protocol (if it is) is it possible or is it a wise choice to use LDAP Servers like ApacheDS as a persistence solution for desktop applications ?
doesn't LDAP need an application server (like tomcat) to run?
can you please light me up :)
thnx

LDAP needs an LDAP service to run, like ApacheDS, OpenLDAP or the like. It doesn't need anything else.
There are two advantages of LDAP has over an SQL database.
One is much finer access controls e.g. you can have a "column" which can be updated by anyone in the "adminstrator" group and readable by the user and his/her manager only. The LDAP database can implement your security policy which ensures it is centrally auditable.
LDAP databases tend to have better query and read performance (sometimes by an order magnitude), but much lower write performance (also sometimes by an order of magnitude). This is on the assumption that you use it to look up details e.g. username/password far more often than you change them.
I wouldn't use an LDAP database for logging for this reason.

There are many uses of LDAP as a data store for other things than users. As matter of fact, LDAP is often considered as one of the first NoSQL servers.
I know of a teleconference software vendor who used an LDAP directory server to replace a SQL database to gain High Availability and distribution. With their software deployed in several locations worldwide, having a single database wouldn't scale, and created issue at the network level. With LDAP and the multi-master replication capabilities of the server, they were able to have a server in each location, to control the replication flows and even leveraged the distributed nature of data to increase their services.
Java based LDAP directory servers like Apache DS or OpenDJ (opendj.org) give you flexibility in the deployment and can even be embedded in Java applications such as Web applications.
Finally while LDAP servers were designed for many reads and few writes, servers now are capable of heavy writes (although I would not use them for write only activities such as logging). OpenDJ for example has been tested with up to 15000 modifications / second on a 10 millions users database. The same configuration was able to handle over 60000 searches per second. To be fair, the JVM heap size was 32GB.
Regards,
Ludovic.

For deploy LDAP you must ldap server only. For example openldap or ApacheDS.
I used openldap as a persistence solution for web application and it worked.
There is an important difference: sql is relation but ldap is the tree!

what does it mean when they say http is stateless

I am studing java for web and it mentions http is stateless.
what does that mean and how it effects the programming
I was also studying the spring framework and there it mentions some beans have to declared as inner beans as their state changes . What does that means?

HTTP -- that is the actual transport protocol between the server and the client -- is "stateless" because it remembers nothing between invocations. EVERY resource that is accessed via HTTP is a single request with no threaded connection between them. If you load a web page with an HTML file that within it contains three <img> tags hitting the same server, there will be four TCP connections negotiated and opened, four data transfers, four connections closed. There is simply no state kept at the server at the protocol level that will have the server know anything about you as you come in.
(Well, that's true for HTTP up to 1.0 at any rate. HTTP 1.1 adds persistent connection mechanisms of various sorts because of the inevitable performance problems that a truly stateless protocol engenders. We'll overlook this for the moment because they don't really make HTTP stateful, they just make it dirty-stateless instead of pure-stateless.)
To help you understand the difference, imagine that a protocol like Telnet or SSH were stateless. If you wanted to get a directory listing of a remote file, you would have to, as one atomic operation, connect, sign in, change to the directory and issue the ls command. When the ls command finished displaying the directory contents, the connection would close. If you then wanted to display the contents of a specific file you would have to again connect, sign in, change to the directory and now issue the cat command. When the command displaying the file finished, the connection would again close.
When you look at it that way, though the lens of Telnet/SSH, that sounds pretty stupid, doesn't it? Well, in some ways it is and in some ways it isn't. When a protocol is stateless, the server can do some pretty good optimizations and the data can be spread around easily. Servers using stateless protocols can scale very effectively, so while the actual individual data transfers can be very slow (opening and closing TCP connections is NOT cheap!) an overall system can be very, very efficient and can scale to any number of users.
But...
Almost anything you want to do other than viewing static web pages will involve sessions and states. When HTTP is used for its original purpose (sharing static information like scientific papers) the stateless protocol makes a lot of sense. When you start using it for things like web applications, online stores, etc. then statelessness starts to be a bother because these are inherently stateful activities. As a result people very rapidly came up with ways to slather state on top of the stateless protocol. These mechanisms have included things like cookies, like encoding state in the URLs and having the server dynamically fire up data based on those, like hidden state requests, like ... well, like a whole bunch of things up to and including the more modern things like Web Sockets.
Here are a few links you can follow to get a deeper understanding of the concepts:
http://en.wikipedia.org/wiki/Stateless_server
http://en.wikipedia.org/wiki/HTTP
http://en.wikipedia.org/wiki/HTTP_persistent_connection

HTTP is stateless - this means that when using HTTP the end point does not "remember" things (such as who you are). It has no state. This is in contrast to a desktop application - if you have a form and you go to a different form, then go back, the state has been retained (so long as you haven't shut down the application).
Normally, in order to maintain state in web application, one uses cookies.

A stateless protocol does not require the server to retain information or status about each user for the duration of multiple requests. For example, when a web server is required to customize the content of a web page for a user, the web application may have to track the user's progress from page to page.
A common solution is the use of HTTP cookies. Other methods include server side sessions, hidden variables (when the current page is a form), and URL-rewriting using URI-encoded parameters, e.g., /index.php?session_id=some_unique_session_code.
here

HTTP is called a stateless protocol because each command is executed independently, without any knowledge of the commands that came before it.
This shortcoming of HTTP is being addressed in a number of new technologies, including cookies.

When it's said that something is stateless it usually means that you can't assume that the server tracks any state between interactions.
By default the HTTP protocol assumes a truly stateless server. Every request is treated as an independent request.
In practice this is fixed by some servers (most of them) using a tracking cookie in the request to match some state on the server with a specific client. This works because the way cookies work (they are posted to server on each subsequent requests once they have been set on the client).
Basically a server that isn't stateless is an impediment to scale. You need to either make sure that you route all the requests from a specific browser to the same instance or to do backend replication of the states. This usually is a limiting factor when trying to scale an application.
There are some other solutions for keeping track of state (see rails's encrypted state cookie) but basically if you want to grow you need to figure a way to avoid tracking state on the server :).

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.