Advantages and disadvantages of Tomcat clustering


We currently have the same web application deployed on 4 different Tomcat instances, each running on an independent machine. A load balancer distributes requests to these servers. Our web application makes database calls and maintains a cache (key-value pairs). All Tomcat instances read the same data (XML) from the same data source (another server) and serve it to clients. In the future, we are planning to collect some usage data from requests, process it, and store it in a database. This functionality should be common (one module) to all Tomcat servers.
Now we are thinking of using Tomcat clustering. I have done some research, but I am not able to figure out how to separate the data-fetching part, i.e. reading the same data (XML) from the same data source (another server), from the individual Tomcat web apps and make it common, so that once one server fetches the data, it keeps it (maybe in a cache) and the same data can be used by the other servers to serve clients. This particular functionality can be implemented using a distributed cache, but there are other modules that could be made common to all Tomcat instances.
So basically: is there any advantage to using Tomcat clustering? And if yes, how can I implement modules that are common to all Tomcat servers?

Read the Tomcat configuration reference and the clustering guide. The available clustering features are as follows:
The tomcat cluster implementation provides session replication,
context attribute replication and cluster wide WAR file deployment.
So, by clustering, you'll gain:
High availability: when one node fails, another will be able to take over without losing access to the data. For example, an HTTP session can still be handled without the user noticing the failure.
Farm deployment: you can deploy your .war to a single node, and the rest will synchronize automatically.
The costs are mainly in performance:
Replication implies object serialization between the nodes. This may be undesirable in some cases, but it can be fine-tuned.
If you just want to share some state between the nodes, then you don't need clustering at all (unless you're going to use context or session replication). Just use a database and/or a distributed cache such as Ehcache (or anything else).
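As a rough illustration of the "fetch once, then reuse" idea from the question, here is a minimal sketch in plain Java. The class name FetchOnceCache and the loader function are hypothetical, and the ConcurrentHashMap is only a local stand-in: with a real distributed cache, the map would be replaced by that product's cluster-aware map so that all Tomcat nodes see the same entries.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical read-through cache: the loader runs only on a cache miss. */
class FetchOnceCache {
    // Local stand-in for a distributed cache (assumption for this sketch).
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    // Fetches the XML from the remote data source; supplied by the caller.
    private final Function<String, String> loader;
    // Counts how often the remote source was actually contacted.
    private final AtomicInteger loads = new AtomicInteger();

    FetchOnceCache(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, loading it from the source on first access. */
    String get(String key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads.get();
    }
}
```

Calling get("feed") twice on the same instance invokes the loader only once; the second call is served from the map. None of this needs Tomcat clustering, which is the point of the answer above.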

Related

Design considerations for J2EE webapp on Tomcat in Amazon WebServices

My project is looking to deploy a new J2EE application to Amazon's cloud. Elastic Beanstalk supports Tomcat apps, which seems perfect. Are there any particular design considerations to keep in mind when writing such an app that differ from running a standalone Tomcat on a server?
For example, I understand that the server is meant to scale automatically. Is this like a cluster? Our application framework tends to stick state in the HttpSession. Is that a problem? Or when it says it scales automatically, does that just mean memory and CPU?

Automatic scaling on AWS is done via adding more servers, not adding more CPU/RAM. You can add more CPU/RAM manually, but it requires shutting down the server for a minute to make the change, and then configuring any software running on the server to take advantage of the added RAM, so that's not the way automatic scaling is done.
Elastic Beanstalk is basically a management interface for Amazon EC2 servers, Elastic Load Balancers and Auto Scaling Groups. It sets all that up for you and provides a convenient way of deploying new versions of your application easily. Elastic Beanstalk will create EC2 servers behind an Elastic Load Balancer and use an Auto Scaling configuration to add more servers as your application load increases. It handles adding the servers to the load balancer when they are ready to receive traffic, and removing them from the load balancer and deleting the extra servers when they are no longer needed.
For your Java application running on Tomcat you have a few options to handle horizontal scaling well. You can enable sticky sessions on the Load Balancer so that all requests from a specific user will go to the same server, thus keeping the HttpSession tied to the user. The main problem with this is that if a server is removed from the pool you may lose some HttpSessions and cause any users that were "stuck" to that server to be logged out of your application. The solution to this is to configure your Tomcat instances to store sessions in a shared location. There are Tomcat session store implementations out there that work with AWS services like ElastiCache (Redis) and DynamoDB. I would recommend using one of those, probably the Redis implementation if you aren't already familiar with DynamoDB.
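To show why a shared session location survives the removal of a server, here is a small model in plain Java. Everything in it is hypothetical: SessionStore stands in for an external store such as Redis or DynamoDB, InMemorySessionStore fakes it with a map, and AppNode stands in for a Tomcat instance. It is not the API of any real Tomcat session store implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical abstraction over an external session store. */
interface SessionStore {
    void save(String sessionId, Map<String, String> attributes);
    Optional<Map<String, String>> load(String sessionId);
}

/** In-memory fake used only for this sketch. */
class InMemorySessionStore implements SessionStore {
    private final Map<String, Map<String, String>> data = new ConcurrentHashMap<>();

    public void save(String sessionId, Map<String, String> attributes) {
        data.put(sessionId, new HashMap<>(attributes));
    }

    public Optional<Map<String, String>> load(String sessionId) {
        return Optional.ofNullable(data.get(sessionId));
    }
}

/** Stand-in for one application server that keeps no session state of its own. */
class AppNode {
    private final SessionStore store;

    AppNode(SessionStore store) {
        this.store = store;
    }

    void login(String sessionId, String user) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("user", user);
        store.save(sessionId, attributes);
    }

    /** Any node sharing the same store can answer this, not only the one that handled the login. */
    String currentUser(String sessionId) {
        return store.load(sessionId).map(a -> a.get("user")).orElse("anonymous");
    }
}
```

If two AppNode instances are built on the same store, a login handled by the first is visible to the second, so the first can be removed from the pool without logging the user out. A node with its own private store would report "anonymous" instead, which is the sticky-session failure mode described above.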
Another consideration for moving a Java application to AWS is that you cannot use any tools or libraries that rely on multicast. You may not be using multicast for anything, but in my experience every Java app I've had to migrate to AWS relied on multicast for clustering, and I had to modify it to use a different clustering method.
Also, for a successful migration to AWS I suggest you read up a bit on VPCs, private IP versus public IP, and Security Groups. A solid understanding of those topics is key to setting up your network so that your web servers can communicate with your DB and cache servers in a secure and performant manner.

About the WebSphere Application Server cluster?

Cluster:
A logical grouping of one or more functionally identical application server processes. A cluster provides ease of deployment, configuration, workload balancing, and fallback redundancy. A cluster is a collection of servers working together as a single system to ensure that mission-critical applications and resources remain available to clients.
Clusters provide scalability. For more information, refer to additional documentation that customer support may provide that describes vertical and horizontal clustering in the WebSphere Application Server distributed environment.
Above is the explanation of a WebSphere cluster. In the WebSphere world, a cell can have one or many clusters. I want to know in which cases one application should be deployed to more than one cluster in WebSphere.

You cannot deploy exactly the same application to more than one cluster. If you need more processing power, you just add members to the cluster.
One of the few reasons that come to mind for deploying to a second cluster would be to run a different application version. Check this post for more details and restrictions on deploying multiple versions of the same application.

How to load existing http sessions from an existing server to another server?

I would like to know if the Servlet specification provides a way to load HTTP sessions into my web application.
The idea is simple: every time a new HTTP client connects, a new session is created, and I send this session and its values to a database (for the time being, this step is easy to do).
If this "master server" dies, another machine will take over its IP address, so HTTP clients will now send their requests to this new machine (let's call it the "slave server").
Here I would like my slave server to retrieve the sessions from the old server, but I don't know which method from the Servlet specification can "add" a session. Is there a way to do it?
PS: it's for a university project, so I cannot use already existing modules like Tomcat's mod_jk for this homemade load balancer.
EDIT:
I think a lot of people believe I am crazy not to use already existing tools. It's a university project, and I have to build it with my bare hands in order to show my professors the low-level mechanisms that I have used. I already know it would be crazy to use what I am doing in production; when this project is finished, it will be thrown in the trash.
For the moment, I haven't found a "standard way" to do it with the Servlet specification, but maybe I can do it with the Manager and Session interfaces from Tomcat's own classes. How can I get instances of those interfaces?

This isn't exactly a new idea and is called session replication. There are a couple of ways to do this. The easiest ones imho are (in ascending order of preference):
Jetty's Session clustering with a database
Tomcat's Session clustering. I personally prefer the BackupManager, which makes sure that a session lives on 2 servers in a cluster at any given point in time and forwards clients accordingly. This reduces the network traffic for session replication to a bare minimum.
Session replication with a distributed cache like Hazelcast or Ehcache. There are plugins for both Jetty and Tomcat to do this. Since a cache is used most of the time anyway, this is the best solution for me. What I tend to do is put 2 round-robin balanced Varnish servers in front of such a cluster, which serve the dual purpose of load balancing the cluster and serving static content from an in-memory cache.
As for your university project, I'd turn in an embedded Jetty with automatic session replication which connects to other servers via broadcast using Hazelcast. Useful, not overcomplicated (IIRC, you need to implement 2 relatively simple interfaces), yet powerful. Put a Varnish in front of your test machines and you should be good to go.
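If the sessions really have to be copied by hand, as the question requires, the low-level mechanism underneath any of these options is serializing the session attributes and restoring them on the other machine. The following is a minimal sketch of that step only, using standard Java serialization. The class name SessionSnapshot is hypothetical, and the code deliberately leaves out the servlet side: on the master you would fill the map from the session's attributes, and on the slave you would copy the restored entries into a newly created session.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: turns session attributes into bytes for a database column and back. */
class SessionSnapshot {

    /** Serializes a copy of the attributes; every value must be Serializable. */
    static byte[] toBytes(Map<String, Serializable> attributes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new HashMap<String, Serializable>(attributes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores the attributes on the machine that takes over. */
    @SuppressWarnings("unchecked")
    static Map<String, Serializable> fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Map<String, Serializable>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            // The restoring server does not have the attribute's class on its classpath.
            throw new IllegalStateException(e);
        }
    }
}
```

The restored map is equal to the original one as long as the attribute classes implement equals, which is the case for strings and boxed numbers. Note that the restored session will normally get a new session ID, so the homemade load balancer has to map the old ID to the new one.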

This feature is supported out of the box by all major Java EE application server vendors, so you shouldn't need to implement anything yourself. As Markus wrote, it is referred to as session replication or session persistence. You can take a look at WebSphere Liberty, which is available for free for development. It supports this out of the box, without the need to implement anything. You just need to:
install Liberty (see "Download just the Liberty profile runtime")
configure session replication (see "Configuring session persistence for the Liberty profile")
install and configure IBM HTTP Server for load balancing (see "Configuring a web server plug-in for the Liberty profile")

Load Balancing Tomcat 7 for Application Deployment

I am serving a Java app through Apache mod_jk and Tomcat 7. I want to be able to deploy a new instance of the application (on a separate Tomcat instance) that will accept all new sessions, while all existing sessions continue to be served by the old Tomcat. Then, after all users have logged off or after a certain time, the old server will be shut down and all traffic will be handled by the new Tomcat (I don't expect the load balancer to do this). This will allow me to deploy without disrupting any connected users.
I have read about mod_jk load balancing, which provides the sticky sessions that I need, but I have not found out how to force all new sessions to be served by the new application. It looks simple enough to set up round robin, but that is not what I want.
So the formal question is:
Are there any load balancers for tomcat7/apache that will allow me to customize balancing rules to respect sticky sessions but preferentially serve from one node?
Any thoughts on how to best achieve this?

Each node manages its own session data. To remove a node with minimal disruption to connected users, you need to share session data across all nodes. Tomcat provides session replication for this. Even with replication, it is possible that a node may crash before it has shared its data. There are other solutions, as discussed here.

Tomcat supports running multiple versions of the one web application with the Parallel Deployment feature. When a new session is created, it will be using the most recent version of the web application. Existing sessions will continue to use the version of the web application that was the most recent at the session creation time.
Here is an article that discusses Parallel Deployment: http://www.objectpartners.com/2012/04/17/tomcat-v7-parallel-deployment/
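The routing rule described above can be modelled in a few lines of Java. This is only a sketch of the rule as stated in this answer, not Tomcat's implementation; the class VersionRouter, its method names, and the version strings are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of "new sessions go to the newest version, old sessions stay put". */
class VersionRouter {
    // Remembers which application version each session was created on.
    private final Map<String, String> sessionToVersion = new HashMap<>();
    private String latestVersion;

    VersionRouter(String initialVersion) {
        this.latestVersion = initialVersion;
    }

    /** Deploying a new version only changes where new sessions land. */
    void deploy(String version) {
        latestVersion = version;
    }

    /** Existing sessions keep their version; unknown sessions are bound to the latest one. */
    String route(String sessionId) {
        return sessionToVersion.computeIfAbsent(sessionId, id -> latestVersion);
    }

    /** Called when a session ends (logout or timeout). */
    void expire(String sessionId) {
        sessionToVersion.remove(sessionId);
    }

    /** An old version can be removed once no session is bound to it any more. */
    boolean canUndeploy(String version) {
        return !version.equals(latestVersion) && !sessionToVersion.containsValue(version);
    }
}
```

With this model, a session created before deploy("002") keeps being routed to "001", a session created afterwards goes to "002", and canUndeploy("001") becomes true once the last old session has expired. That is the behaviour the question asks for, without any custom load balancer rules.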

How to run multiple tomcats against the same Database with load balancing

What are the different ways of achieving load balancing on the database when more than one Tomcat instance is accessing the same database?
Thanks.

This is a detailed example of using multiple Tomcat instances with Apache-based load balancing.
Note: if you have a hardware load balancer, that is the preferable option in my opinion (put it in place of Apache).
In short it works like this:
A request comes from a client to the Apache web server or hardware load balancer.
The web server determines which node it wants to forward the request to for further processing.
The web server calls Tomcat, and Tomcat receives the request.
Tomcat processes the request and sends the response back.
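The node-selection step in this flow can be sketched as follows. This is a minimal model of round-robin selection combined with sticky sessions, not the algorithm of mod_jk or of any particular hardware balancer; the class StickyRoundRobinBalancer and the node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical balancer: round robin for new clients, sticky for known sessions. */
class StickyRoundRobinBalancer {
    private final List<String> nodes;
    // Remembers which node a session was first sent to.
    private final Map<String, String> pinned = new HashMap<>();
    private int next = 0;

    StickyRoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    /** Picks the node for a request; sessionId may be null for clients without a session. */
    String pick(String sessionId) {
        if (sessionId != null && pinned.containsKey(sessionId)) {
            return pinned.get(sessionId); // sticky: same node as before
        }
        String node = nodes.get(next);    // round robin for everything else
        next = (next + 1) % nodes.size();
        if (sessionId != null) {
            pinned.put(sessionId, node);
        }
        return node;
    }
}
```

With two nodes, the first two new sessions land on different nodes, and a repeated request for the first session returns to its original node regardless of how many other requests came in between.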
Regarding the database :
- Tomcat itself has nothing to do with your database. It's your application that talks to the DB, not Tomcat.
Independently of your application layer, you can set up a cluster of database servers (for example, search for Oracle RAC, but that is an entirely different story).
In general, when implementing load balancing at the application layer, make sure that the application's shared state gets replicated across the nodes.
The technique called "sticky sessions" partially handles this issue, but in general you should be aware of it.
Hope this helps
