My project is looking to deploy a new J2EE application to Amazon's cloud. Elastic Beanstalk supports Tomcat apps, which seems perfect. Are there any particular design considerations to keep in mind when writing such an app that might differ from deploying to a standalone Tomcat on a server?
For example, I understand that the environment is meant to scale automatically. Is this like a cluster? Our application framework tends to store state in the HttpSession; is that a problem? Or, when it says it scales automatically, does that just mean memory and CPU?
Automatic scaling on AWS is done by adding more servers, not by adding more CPU/RAM. You can add more CPU/RAM manually, but that requires stopping the server for a minute or so to change its instance type, and then configuring any software running on it to take advantage of the extra RAM, so that is not how automatic scaling works.
Elastic Beanstalk is basically a management interface for Amazon EC2 servers, Elastic Load Balancers and Auto Scaling Groups. It sets all that up for you and provides a convenient way of deploying new versions of your application easily. Elastic Beanstalk will create EC2 servers behind an Elastic Load Balancer and use an Auto Scaling configuration to add more servers as your application load increases. It handles adding the servers to the load balancer when they are ready to receive traffic, and removing them from the load balancer and deleting the extra servers when they are no longer needed.
For your Java application running on Tomcat you have a few options to handle horizontal scaling well. You can enable sticky sessions on the Load Balancer so that all requests from a specific user will go to the same server, thus keeping the HttpSession tied to the user. The main problem with this is that if a server is removed from the pool you may lose some HttpSessions and cause any users that were "stuck" to that server to be logged out of your application. The solution to this is to configure your Tomcat instances to store sessions in a shared location. There are Tomcat session store implementations out there that work with AWS services like ElastiCache (Redis) and DynamoDB. I would recommend using one of those, probably the Redis implementation if you aren't already familiar with DynamoDB.
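Whichever shared store you choose, anything you put in the HttpSession generally has to be serializable, because the store turns session attributes into bytes before writing them to Redis or DynamoDB and rebuilds them on whichever node handles the next request. A minimal sketch, assuming plain Java serialization and a hypothetical `UserCart` attribute; some store implementations use other serializers, so check the one you pick:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class SessionSerializationCheck {
    // Hypothetical session attribute. It (and everything it references) must be Serializable,
    // or a serializing shared session store cannot write it out.
    static class UserCart implements Serializable {
        private static final long serialVersionUID = 1L;
        private final List<String> itemIds = new ArrayList<>();

        void add(String itemId) { itemIds.add(itemId); }
        List<String> items() { return itemIds; }
    }

    // Roughly what a shared store does when saving the session: turn the attribute into bytes.
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Roughly what another Tomcat node does when the user's next request lands on it.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        UserCart cart = new UserCart();
        cart.add("sku-42");
        UserCart restored = (UserCart) fromBytes(toBytes(cart));
        System.out.println(restored.items()); // prints [sku-42]
    }
}
```

Running something like this against your real session attributes before migrating is a cheap way to find a non-serializable field: `ObjectOutputStream.writeObject` throws `NotSerializableException` when it reaches one.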
Another consideration when moving a Java application to AWS is that you cannot use any tools or libraries that rely on multicast. You may not be using multicast for anything, but in my experience every Java app I've had to migrate to AWS relied on multicast for clustering, and I had to modify it to use a different clustering method.
Also, for a successful migration to AWS I suggest you read up a bit on VPCs, private IP versus public IP, and Security Groups. A solid understanding of those topics is key to setting up your network so that your web servers can communicate with your DB and cache servers in a secure and performant manner.
I have a typical stateless Java application that provides a REST API and performs updates (CRUD) on a PostgreSQL database.
However, the number of clients is growing and I feel the need to:
Increase redundancy, so that if one server fails another takes its place
For this I will probably need a load balancer?
Increase response speed by not flooding the network and CPU of a single server (but then how will the load balancer itself not get flooded?)
Maybe I will need to distribute the database?
I want to be able to update my app seamlessly (I have seen something called Kubernetes do this): kill each redundant node one by one and immediately replace it with an updated version
My app also stores some image files, which are growing fast in disk size, and I need to be able to distribute them
All of this must be backed up
This is a diagram of what I have now (both the Java app and the DB are on the same server):
What is the best/correct way of scaling this?
Thanks!
Web Servers:
Run your app on multiple servers behind a load balancer. Use AWS Elastic Beanstalk, or roll your own solution with EC2 + Auto Scaling Groups + ELB.
You mentioned a concern about the load balancer getting "flooded", but Amazon's Elastic Load Balancer service scales automatically to handle your traffic, so you don't need to worry about that.
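To make the redundancy point concrete, here is a toy round-robin balancer that skips nodes whose health check has failed, which is conceptually what the load balancer does in front of your servers. It is a hypothetical illustration only (the node names and health-check hooks are made up); ELB's actual routing algorithm depends on the load balancer type, and you would not write this yourself:

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinBalancer {
    private final List<String> nodes;                                    // hypothetical node names
    private final Set<String> unhealthy = ConcurrentHashMap.newKeySet(); // nodes failing health checks
    private final AtomicInteger next = new AtomicInteger();              // rotating position

    RoundRobinBalancer(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
    }

    // Hooks a health checker would call when a node fails or recovers.
    void markUnhealthy(String node) { unhealthy.add(node); }
    void markHealthy(String node) { unhealthy.remove(node); }

    // Return the next healthy node in rotation, giving up after nodes.size() attempts.
    String pick() {
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            String node = nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
            if (!unhealthy.contains(node)) {
                return node;
            }
        }
        throw new IllegalStateException("no healthy nodes");
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(List.of("app-1", "app-2", "app-3"));
        lb.markUnhealthy("app-2"); // app-2 went down; traffic keeps flowing to the others
        for (int request = 0; request < 4; request++) {
            System.out.println(lb.pick()); // app-1, app-3, app-1, app-3
        }
    }
}
```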
Database Servers:
Move your database to RDS and enable Multi-AZ failover. This creates a hot standby server that your database automatically fails over to if there are issues with your primary server. Optionally, add read replicas to scale out your database's read capacity.
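With Multi-AZ, your application keeps connecting to the same database endpoint and RDS repoints it during a failover, so that part needs no code change. Read replicas are different: each has its own endpoint, so the application has to decide where each query goes. A minimal sketch, assuming hypothetical JDBC URLs, that sends writes to the primary and spreads read-only work across the replicas:

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

class ReadWriteRouter {
    private final String primaryUrl;        // writes, and reads that must see the latest data
    private final List<String> replicaUrls; // read-only traffic; replicas may lag slightly

    ReadWriteRouter(String primaryUrl, List<String> replicaUrls) {
        this.primaryUrl = primaryUrl;
        this.replicaUrls = List.copyOf(replicaUrls);
    }

    // Pick a JDBC URL for a unit of work; with no replicas configured, everything goes to the primary.
    String urlFor(boolean readOnly) {
        if (!readOnly || replicaUrls.isEmpty()) {
            return primaryUrl;
        }
        return replicaUrls.get(ThreadLocalRandom.current().nextInt(replicaUrls.size()));
    }

    public static void main(String[] args) {
        // Hypothetical endpoints; use the ones shown in the RDS console for your instances.
        ReadWriteRouter router = new ReadWriteRouter(
            "jdbc:postgresql://mydb.example.com:5432/app",
            List.of("jdbc:postgresql://mydb-replica-1.example.com:5432/app"));
        System.out.println(router.urlFor(false)); // primary
        System.out.println(router.urlFor(true));  // the replica
    }
}
```

Replicas are updated asynchronously, so a read that must see a write the same user just made is safer on the primary, which is why passing `readOnly = false` is also the right call for read-after-write cases.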
Start caching your database queries in Redis if you aren't already. There are plugins out there to do this with Hibernate fairly easily. This will take a huge load off your database servers if your app performs the same queries regularly. Use AWS ElastiCache or RedisLabs for your Redis server(s).
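The usual pattern for this is cache-aside: check the cache first, and only on a miss run the query and store the result with an expiry. A minimal sketch, with an in-memory map standing in for Redis and a hypothetical `loader` function standing in for the database query; in a real setup you would use a Redis client, or a Hibernate second-level cache provider backed by Redis, instead of the map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class CacheAside<K, V> {
    // Cached value plus expiry time; stands in for a Redis key set with a TTL.
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>(); // stand-in for Redis
    private final Function<K, V> loader; // hypothetical: runs the real database query
    private final long ttlMillis;
    private final AtomicInteger databaseHits = new AtomicInteger();

    CacheAside(Function<K, V> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    V get(K key) {
        long now = System.currentTimeMillis();
        Entry<V> cached = cache.get(key);
        if (cached != null && cached.expiresAtMillis > now) {
            return cached.value; // hit: no database round trip
        }
        V value = loader.apply(key); // miss or expired: query the database
        databaseHits.incrementAndGet();
        cache.put(key, new Entry<>(value, now + ttlMillis));
        return value;
    }

    // Call after writing to the database so later reads don't serve stale data.
    void evict(K key) {
        cache.remove(key);
    }

    int databaseHits() {
        return databaseHits.get();
    }

    public static void main(String[] args) {
        CacheAside<Integer, String> users = new CacheAside<>(id -> "user-" + id, 60_000);
        users.get(1);
        users.get(1);
        System.out.println(users.databaseHits()); // prints 1
    }
}
```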
Images:
Stop storing your image files on your web servers! That creates lots of scalability issues. Move them to S3 and serve them directly from there. S3 gives you effectively unlimited storage space, highly durable storage that is replicated across multiple facilities, and the ability to serve the images straight to clients, which reduces the load on your web servers.
Deployments:
There are so many solutions here that it really comes down to personal preference. If you use Elastic Beanstalk, it provides a deployment solution for you. If you don't use EB, there are hundreds of solutions to choose from. I'd recommend designing your environment first, then choosing an automated deployment solution that works with the environment you have designed.
Backups:
If you do this right, there shouldn't be much on your web servers to back up. With Elastic Beanstalk, all you need to rebuild your web servers is the code and configuration files you have checked into Git. If you do end up having to back up EC2 servers, look into EBS snapshots.
For database backups, RDS performs a daily backup automatically. If you want backups outside RDS, you can schedule them yourself by running pg_dump from a cron job.
For images, you can enable S3 versioning and Cross-Region Replication.
CDN:
You didn't mention this, but you should look into a CDN. It will let your application be served faster while reducing the load on your servers. AWS provides the CloudFront CDN, and I would also recommend looking at Cloudflare.
I am serving a Java app through Apache mod_jk and Tomcat 7. I want to be able to deploy a new instance of the application (on a separate Tomcat instance) that will accept all new sessions, while all existing sessions continue to be served by the old Tomcat. Then, after all users have logged off or after a certain time, the old server will be shut down and all traffic will be handled by the new Tomcat (I don't expect the load balancer to do this part). This will allow me to deploy without disrupting any connected users.
I have read about mod_jk load balancing, which provides the sticky sessions I need, but I have not found how to force all new sessions to be served by the new application. It looks simple enough to set up round robin, but that is not what I want.
So the formal question is:
Are there any load balancers for Tomcat 7/Apache that will let me customize balancing rules to respect sticky sessions but preferentially serve new sessions from one node?
Any thoughts on how to best achieve this?
Each node manages its own session data. To remove a node with minimal disruption to connected users, you need to share session data across all nodes. Tomcat provides session replication for this. Even with replication, it is possible that a node may crash before it has shared its data. There are other solutions, as discussed here.
Tomcat supports running multiple versions of the one web application with the Parallel Deployment feature. When a new session is created, it will be using the most recent version of the web application. Existing sessions will continue to use the version of the web application that was the most recent at the session creation time.
Here is an article that discusses Parallel Deployment: http://www.objectpartners.com/2012/04/17/tomcat-v7-parallel-deployment/
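Parallel deployment is driven by the WAR file name: a versioned deployment is named `context##version.war` (for example `myapp##001.war`), and Tomcat's documentation describes comparing the version part as a plain String. The sketch below is not Tomcat's code; it just applies that ordering to show which deployment would receive new sessions, and why zero-padded version numbers are the safer choice:

```java
import java.util.Comparator;
import java.util.List;

class ParallelDeploymentVersions {
    // Version part of a WAR name such as "myapp##002.war"; "" if the name has no "##".
    static String version(String warName) {
        String base = warName.endsWith(".war")
            ? warName.substring(0, warName.length() - ".war".length())
            : warName;
        int marker = base.indexOf("##");
        return marker < 0 ? "" : base.substring(marker + 2);
    }

    // The deployment that would receive new sessions if versions are ordered by plain String comparison.
    static String newestForNewSessions(List<String> warNames) {
        return warNames.stream()
            .max(Comparator.comparing(ParallelDeploymentVersions::version))
            .orElseThrow();
    }

    public static void main(String[] args) {
        // Unpadded numbers compare as text, so "9" sorts after "10".
        System.out.println(newestForNewSessions(List.of("myapp##9.war", "myapp##10.war")));   // myapp##9.war
        // Zero-padded versions sort the way you would expect.
        System.out.println(newestForNewSessions(List.of("myapp##009.war", "myapp##010.war"))); // myapp##010.war
    }
}
```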
Currently we have the same web application deployed on 4 different Tomcat instances, each running on an independent machine. A load balancer distributes requests to these servers. Our web application makes database calls and maintains a cache (key-value pairs). All Tomcat instances read the same data (XML) from the same data source (another server) and serve it to clients. In the future, we are planning to collect some usage data from requests, process it, and store it in a database. This functionality should be common (one module) across all Tomcat servers.
Now we are thinking of using Tomcat clustering. I have done some research, but I can't figure out how to separate the data-fetching operation (reading the same XML data from the same data source) from all the Tomcat web apps so that it becomes common: once one server fetches the data, it keeps it (perhaps in a cache) and other servers can use that same data to serve clients. This could be implemented with a distributed cache, but there are other modules that could also be made common across all Tomcat instances.
So basically, is there any advantage to using Tomcat clustering? And if so, how can I implement modules that are common to all Tomcat servers?
Read the Tomcat configuration reference and clustering guide. The available clustering features are as follows:
The tomcat cluster implementation provides session replication, context attribute replication and cluster wide WAR file deployment.
So, by clustering, you'll gain:
High availability: when one node fails, another can take over without losing access to the data. For example, an HTTP session can still be handled without the user noticing the failure.
Farm deployment: you can deploy your .war to a single node, and the rest will synchronize automatically.
The costs are mainly in performance:
Replication implies object serialization between the nodes. This may be undesirable in some cases, but it can be fine-tuned.
If you just want to share some state between the nodes, you don't need clustering at all (unless you're going to use context or session replication). Just use a database and/or a distributed cache such as Ehcache (or anything else).
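One common way to fine-tune replication cost is to keep session attributes small and mark anything that can be rebuilt as `transient`, so Java serialization skips it. A minimal sketch comparing the serialized size of two hypothetical session attributes that differ only in that keyword; the exact byte counts depend on the JVM and class names, only the difference matters:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ReplicationPayload {
    // Hypothetical attribute that replicates a large cached report along with the user id.
    static class HeavyState implements Serializable {
        private static final long serialVersionUID = 1L;
        final String userId;
        byte[] cachedReport;

        HeavyState(String userId, byte[] cachedReport) {
            this.userId = userId;
            this.cachedReport = cachedReport;
        }
    }

    // Same data, but the rebuildable report is transient: serialization skips it,
    // so it arrives as null on the other node and must be rebuilt lazily when needed.
    static class LeanState implements Serializable {
        private static final long serialVersionUID = 1L;
        final String userId;
        transient byte[] cachedReport;

        LeanState(String userId, byte[] cachedReport) {
            this.userId = userId;
            this.cachedReport = cachedReport;
        }
    }

    // Bytes produced by Java serialization, i.e. roughly what replication has to ship per attribute.
    static int serializedSize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] report = new byte[100_000];
        System.out.println("heavy: " + serializedSize(new HeavyState("u1", report)) + " bytes");
        System.out.println("lean:  " + serializedSize(new LeanState("u1", report)) + " bytes");
    }
}
```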
I'm looking into hosting a standard Java web app on AWS and the new Elastic Beanstalk (http://aws.amazon.com/elasticbeanstalk/) seems to have most of what we want. The one thing I can't figure out is how to do distributed caching. It seems that AWS doesn't allow multicast discovery of new nodes, so I'm not sure how new nodes started by the auto-scaling process should be integrated into an existing distributed cache. Any suggestions / best practices appreciated.
Update: Ideally this would be a cache local to each application server instance. The best-case scenario would be a Hibernate second-level cache configuration for something like Ehcache or Terracotta.
Another route today (after 2011-08-23) is Amazon ElastiCache, which is protocol-compliant with Memcached and runs in the cloud for you. It makes it easy to put things into an in-memory cache.
Here are some of my thoughts:
Suppose you have a distributed caching layer such as memcached running on some EC2 instances, and you use AWS Elastic IPs mapped to these instances. Because an Elastic IP is essentially a static IP address, you can pre-configure your new web app instances to locate the memcached instances by those IPs.
During auto-scaling, new instances can then locate your memcached servers.
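Memcached clients typically spread keys across the configured servers by hashing the key, so every web app instance configured with the same server list (for example, the Elastic IPs above) agrees on where a key lives without any discovery protocol. A simplified sketch, assuming a hypothetical fixed address list; it uses plain modulo hashing for clarity, whereas real clients usually use consistent hashing so that adding or removing a server moves fewer keys:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

class MemcachedServerSelector {
    private final List<String> servers; // hypothetical "host:port" list, identical on every app instance

    MemcachedServerSelector(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no memcached servers configured");
        }
        this.servers = List.copyOf(servers);
    }

    // Deterministic: same key + same server list => same server, on every instance.
    String serverFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return servers.get((int) (crc.getValue() % servers.size()));
    }

    public static void main(String[] args) {
        // Hypothetical fixed addresses (documentation IP range), e.g. Elastic IPs of the cache instances.
        List<String> configured = List.of("203.0.113.10:11211", "203.0.113.11:11211");
        MemcachedServerSelector instanceA = new MemcachedServerSelector(configured);
        MemcachedServerSelector instanceB = new MemcachedServerSelector(configured); // a newly auto-scaled instance
        String key = "user:42";
        System.out.println(instanceA.serverFor(key).equals(instanceB.serverFor(key))); // prints true
    }
}
```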
If you want a separate cache on every instance, I would recommend Multicontainer Docker environments on EB as a way to simplify setting up your app and your caching layer on every node. The Elastic Beanstalk part will work just as it does on a normal dedicated platform, though some reconfiguration may be needed if you use private libraries, depending on your app's details. But if you want the caching layers on separate nodes to talk to each other, it may not be easy to achieve...
Why do we need a Java application server like JBoss or WebSphere? Is it possible to develop a large-scale website with only Java (Apache Tomcat), where thousands of users connect to the site at the same time? An example is a B2B website.
What is the cost of an application server? I would be grateful if you could compare prices among different application servers and highlight any free versions.
Application Servers are mostly used if you want to use advanced features like transaction management, hot code swapping and advanced persistence.
There are application servers that are open source. E.g. GlassFish and JBoss.
I don't think you need an application server for building a popular web site, you'll also be fine with a servlet container like Tomcat or Jetty.
In short, application servers provide you with a few services, such as
Transaction Management
Load Balancing
Security
Threading
etc.
You have to take care of these things yourself in a Web Server.
There are a few open source application servers that are free of charge.
I have used GlassFish.
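Transaction management is a good example of what "taking care of it yourself" means in a plain servlet container: without container-managed transactions, you demarcate them in code. A minimal JDBC sketch with a hypothetical `Work` callback; the recording `Connection` in `main` is only a stand-in so the example runs without a database:

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class ManualTransaction {
    // Hypothetical unit of work that runs one or more statements on the same connection.
    @FunctionalInterface
    interface Work {
        void run(Connection conn) throws SQLException;
    }

    // Commit if the work succeeds, roll back if it throws, and restore auto-commit afterwards.
    static void inTransaction(Connection conn, Work work) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try {
            work.run(conn);
            conn.commit();
        } catch (SQLException | RuntimeException e) {
            try {
                conn.rollback();
            } catch (SQLException rollbackFailure) {
                e.addSuppressed(rollbackFailure); // keep the original failure as the main exception
            }
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }

    public static void main(String[] args) throws SQLException {
        List<String> calls = new ArrayList<>();
        // Stand-in Connection that only records method names, for demonstration;
        // in a real app this would come from a connection pool / DataSource.
        Connection recording = (Connection) Proxy.newProxyInstance(
            ManualTransaction.class.getClassLoader(),
            new Class<?>[] {Connection.class},
            (proxy, method, methodArgs) -> {
                calls.add(method.getName());
                return method.getName().equals("getAutoCommit") ? Boolean.TRUE : null;
            });
        inTransaction(recording, conn -> { /* INSERT/UPDATE statements would go here */ });
        System.out.println(calls); // [getAutoCommit, setAutoCommit, commit, setAutoCommit]
    }
}
```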
Apart from the answers given above, app servers are required for EJBs.
You need an application server for the following:
It provides useful services such as automatic transactions, authentication, authorization and lifecycle management.
To remember large amounts of user data across pages using EJBs tied to a specific client.
To load-balance user requests and business logic.
To interact with different client UIs, such as Java Swing and browsers.
It is possible to handle the HTTP headers yourself; we have written socket servers in Java for 20 years. You do not need a container for Java Swing.
Persistence can be done through databases or server-side files unless you need really high-speed stuff. I have yet to find a real requirement for an EJB, except that some systems simply require them.
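As an illustration of the "you don't strictly need a container" point, the JDK ships a small built-in HTTP server in the `com.sun.net.httpserver` package. A minimal sketch (the `/hello` path and response body are made up for the example); whether something this bare is enough for a large site is exactly the trade-off discussed in this thread:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class TinyHttpServer {
    // Start a server on 127.0.0.1:port (port 0 = any free port) that answers requests to /hello.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = "hello\n".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length); // status and headers, then a fixed-length body
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        // No executor is set, so requests are handled on the server's own thread;
        // call server.setExecutor(...) to use a thread pool instead.
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Try http://127.0.0.1:" + server.getAddress().getPort() + "/hello");
    }
}
```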
This may be because JBoss and similar vendors can provide better after-sales service, including operation and maintenance support, which may be why many large companies choose commercial versions of servers.
But you should know that Tomcat and Netty are not bad. For example, many large B2B, C2C or B2C companies still use Tomcat, including Internet companies such as Alibaba.
When choosing a server, consider:
Operation and maintenance costs
Scalability costs
Server cost