How to share static variable data across multiple instances of Tomcat?

My company has outgrown its single-instance Tomcat server and so needs to run multiple instances. However, the code contains many static variables that keep track of various data, including objects. (Sure, let the jabs and jeers begin. :) I have considered moving the static variables into Redis, but that could potentially be a lot of work. Is there an easier way to share mutable static-variable information across instances of Tomcat?
Update: per suggestion by @jake, a note on the nature of the variables. These variables mostly store aggregate information for the service as a whole, such as info on the users logged in within the past hour, various features accessed, etc.

"[..] many static variables" keeping track of mutable information across the webapp sounds terrifying, frankly. If concurrency isn't being handled well, this is likely a source of all kinds of errors/problems. But more importantly even if concurrency is being done properly, that alone could be contributing to resource contention performance problems that are making your app outgrow its single-instance life. In this case even if you address the implementation by replacing with some type of more robust data store (which you probably should), the added network latency will likely exacerbate that problem substantially.
I would suggest reviewing what those variables are and determining whether each of them really needs to be shared across the entire distributed application. If they turn out to be things that are just cached to avoid recalculation or re-fetching from a database, then maybe you can avoid putting those in a centralized datastore, and do something simpler like just putting them in application scope as @EJP recommended.
We'd need more specific use scenarios about some of those variables to make better recommendations.

You are talking about running a cluster, so you need a new set of paradigms. The original answer showed an example image here and linked to further reading on clustering and load balancing.

You cannot do this, because different Tomcat instances are different processes (possibly on different machines altogether), each with its own heap space.
This is the same problem as storing session variables. Tomcat does allow session variables to be replicated across instances, but as you scale up, that replication becomes a considerable overhead and a performance bottleneck.
So even if you could manage to replicate everything, you would only be postponing the problem, not solving it. For scaling, any type of server-side state is discouraged. If it is absolutely necessary, a distributed cache is the right direction. Or you could store everything in a DB (with the added overhead of DB calls).
Reference
In Java, are static class members shared among programs?
Session-in-cookie pattern
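
One way to keep the migration cost down is to hide each piece of static state behind a small interface first, and only later decide what backs it. The following is a minimal sketch under that assumption; `FeatureUsageStore` and `InMemoryFeatureUsageStore` are hypothetical names, not part of Tomcat or any library. The in-memory version behaves like the current static variables (one JVM only), while a Redis- or database-backed class could later implement the same interface.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical abstraction: callers stop touching static fields directly.
interface FeatureUsageStore {
    void recordAccess(String feature);
    long accessCount(String feature);
}

// Single-JVM implementation. Its state is NOT visible to other Tomcat instances;
// a shared implementation (distributed cache, database) would replace this class.
final class InMemoryFeatureUsageStore implements FeatureUsageStore {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    @Override
    public void recordAccess(String feature) {
        // computeIfAbsent is atomic on ConcurrentHashMap, so no external locking is needed.
        counts.computeIfAbsent(feature, k -> new LongAdder()).increment();
    }

    @Override
    public long accessCount(String feature) {
        LongAdder adder = counts.get(feature);
        return adder == null ? 0L : adder.sum();
    }
}

final class FeatureUsageDemo {
    public static void main(String[] args) {
        FeatureUsageStore store = new InMemoryFeatureUsageStore();
        store.recordAccess("search");
        store.recordAccess("search");
        store.recordAccess("export");
        System.out.println(store.accessCount("search")); // prints 2
    }
}
```

With this in place, the choice of backing store becomes a single wiring decision rather than a change scattered across every class that used the static variable.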

Related

Creating cache shall I use file system or the memory?

I have millions of rows to read from a database, and many users come in each day to read the same data, so I want to create a cache so that I don't have to go to the database again for the same data.
I have seen many options but couldn't figure out which approach to use.
Should I create my own cache, saving the data of a query result by writing it to a file, or
use a third-party in-memory cache such as
Guava CacheBuilder, LRUMap caching, whirlycache, or cache4j?

You are not the first person to have requirements like this, which is why there are dozens of cache implementations available as open source projects, and even a standard set of Java APIs for caching (JCache). If your needs go beyond those solutions, there are even commercial solutions that handle tens of terabytes of data transparently across RAM, flash, database, etc. If none of those are sufficient, then you should definitely write your own.

It depends entirely on multiple factors, and I think the answer will be based on the environment, the size of the data, etc. Here are the main points:
You want to keep the cache in RAM as much as possible, because it is faster to access than the file system.
You can also use OS memory-mapped files, which balance access speed against memory utilization. I suggest any proven solution over creating your own.
If you are running low on memory, you may need to ask what is more important to keep, such as caching the most frequently accessed data, since it is the most likely to be requested by clients.
So there is no sure or definite answer; you have to decide based on your constraints. Hope this helps.

I think you are overengineering the problem. It isn't trivial to write a performant, transparent cache, unless you only need a simple HashMap to hold some values. You should focus on writing code to solve your domain problem and not on writing too much framework code.
Stop reinventing the wheel: use either an in-memory cache (e.g. Infinispan or Redis) or a database (e.g. PostgreSQL). You will have less pain and better performance.
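
For the "simple HashMap" end of the spectrum mentioned above, the JDK alone is enough for a bounded least-recently-used cache. This is a minimal sketch, not a replacement for the libraries named in this thread; `LruCache` is a hypothetical class name. It relies on `LinkedHashMap` in access-order mode together with its `removeEldestEntry` hook.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded LRU cache built on LinkedHashMap's access-order mode.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // third argument: true = order entries by last access
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least recently used entry.
        return size() > maxEntries;
    }
}

final class LruCacheDemo {
    public static void main(String[] args) {
        // LinkedHashMap is not thread-safe, so wrap it before sharing between threads.
        Map<String, String> cache = Collections.synchronizedMap(new LruCache<>(2));
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a" so that "b" becomes the eldest entry
        cache.put("c", "3"); // exceeds the limit, evicts "b"
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

This has no expiry, statistics, or loader support, which is where a library such as Guava's cache starts to pay off.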

Sharing JVM session

I have a Java application that uses the JVM as session storage. Recently, once the number of users exceeds a certain threshold, the application goes down because the JVM runs out of memory.
I want to add a new application server and put a load balancer in front, but because the session is tied to the JVM, I cannot share it with another application server.
It would be great if I could dedicate one JVM instance to session storage and access it from multiple application servers. How can I do that?
I am using Java Spring in the project. Is my plan OK for accommodating a lot of user requests?
Thanks in advance.

There is a third-party application called Terracotta. I tried it and it works fine with a Spring application.
You can find the configuration details at the link below.
http://www.terracotta.org/documentation/4.1/terracotta-server-array/introduction
Leave a comment if you need any help.

First, make sure you know what is causing the out-of-memory condition. If it really is related to having many sessions, you may want to change the way sessions are managed. Instead of keeping sessions in memory, you could save them in a database. That approach would reduce memory use, and after you add other machines, a session would not be tied to any one of them.

It sounds like you are holding (large amounts of) session data in memory ... for performance reasons.
"Is my plan ok to accommodate lot of users requests?"
Ultimately you will run out of:
physical memory to hold all of the session data in one JVM, or
CPU and I/O bandwidth to satisfy the requests for session data from the other application servers, and/or
CPU resources for simply managing the data. (Hint: the time taken to do a full GC is proportional to the total amount of reachable data.)
If your architecture uses a single JVM for all session data, you will eventually hit a wall. That suggests that you should make it possible to replicate that part of your system. However, it is not possible to suggest the best way to do that ... without a much deeper analysis of your application ... and its real need for scalability.
Bottom line: there are no simple one-size-fits-all solutions for scalability.

Reliable distributed cache on app engine (Java)

I need to keep some values in memory, as a sort of in-memory DB. In terms of reliability, I am not afraid of system failure; I can live with that. However, I cannot use the memcache service, because the values can be evicted at any time. I need the values to be available on other machines when the application scales. I suppose that App Engine will not make memory scale, or will it (e.g. if I keep a value in an ordinary Java collection)?
What I am trying to achieve here is a "pick a nickname" service. This works in two steps. First, the user reserves a nickname. Then he registers the nickname. Nicknames are stored under one entity group (sic!). Therefore I need to avoid datastore contention.
As far as I understand from https://developers.google.com/appengine/articles/scaling/memcache, I can rely to a certain extent on values in memcache not being evicted for arbitrary reasons. However, I have to expect that this will happen from time to time (e.g. under high memory pressure), and such losses of values are very unpleasant for my users.

Your application shares a single instance of Memcache, it is not local to a "machine" (or rather instance of your application).
So if you are running 2 instances and they both retrieve the same value from memcache they will both get the same value.
Running an "in memory" database is not feasible in the cloud - what memory is it you were planning to use, the memory in the instance that's about to shut down?
https://developers.google.com/appengine/articles/scaling/memcache
When designing your application, take the time to consider which datasets can be cached for future reuse. These could be commonly viewed pages or often read datastore entities, just to name a few. There may also be some data in your application which you would like to have shared among all instances of your app but does not need to be persisted forever. In such cases, memcache can improve the scalability of your app by providing a fast and efficient distributed storage system for transient data. Adding memcache logic to your server side code is often well worth the few extra lines of code.

You can use App Engine NDB if you are on the Python 2.7 runtime. NDB is a datastore library with automatic caching and much more.
Other machines? You mean shared between instances of the same app?

JVM heap replication between two machines

What are the basic principles by which two separate computers, connected within the same network and running the same Java application, maintain the same state by syncing their heaps with each other?
I believe Terracotta does this, but I have no idea what pseudocode describing its core functions would look like.
I'm just looking for understanding of this technology.

Terracotta DSO works by manipulating the byte code of your classes (and the JDK's classes etc). The instructions on how and when to do this are part of the Terracotta configuration file.
The bytecode modification looks for certain bytecodes such as a field read or write, or a monitor enter or exit. Whenever those instructions occur, code is added around that location that performs the appropriate action in the distributed store. For example, when a monitor is obtained due to synchronization, a distributed lock is obtained as well (whether it is a read or write lock depends on the configuration). If a field in a shared object is written, the distributed system must verify that a write lock is being held, and then the data value is sent to the clustered server, which stores it on disk or shares it over the network as appropriate.
Note that Terracotta does not share the entire heap, only the graph of objects indicated by the configuration. In general, there would be little point in sharing an entire heap. It is better instead for the application to describe the domain objects needed across the distributed application.
There are many optimizations employed to make the operations above efficient: only field deltas are sent over the wire and in a form much more efficient than Java serialization, many deltas can be bundled and sent in batches, locks are actually "checked out" to a particular client so that if the application data is partitioned across clients, most distributed locks are actually a local operation not involving a network call, etc.
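
To make the batching idea concrete, here is a toy sketch of "record field writes as deltas, ship them in one batch when the lock is released". It is an illustration only and not Terracotta's code or API: all class names are hypothetical, the explicit `writeField` and `unlock` calls stand in for what bytecode instrumentation would inject, and locking, networking, and serialization are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One recorded field write: which object, which field, and the new value.
final class FieldDelta {
    final String objectId;
    final String field;
    final Object value;

    FieldDelta(String objectId, String field, Object value) {
        this.objectId = objectId;
        this.field = field;
        this.value = value;
    }
}

// Stand-in for the clustered server that holds the authoritative field values.
final class ToyClusterServer {
    private final Map<String, Object> fields = new HashMap<>();
    private int batchesReceived = 0;

    synchronized void applyBatch(List<FieldDelta> batch) {
        for (FieldDelta d : batch) {
            fields.put(d.objectId + "." + d.field, d.value);
        }
        batchesReceived++; // one "network round trip" per batch, not per write
    }

    synchronized Object read(String objectId, String field) {
        return fields.get(objectId + "." + field);
    }

    synchronized int batchesReceived() {
        return batchesReceived;
    }
}

// Client side: buffers writes made under a lock and sends them together on unlock.
final class ToyClusterClient {
    private final ToyClusterServer server;
    private final List<FieldDelta> pending = new ArrayList<>();

    ToyClusterClient(ToyClusterServer server) {
        this.server = server;
    }

    // In an instrumented system this would be injected around each field write.
    void writeField(String objectId, String field, Object value) {
        pending.add(new FieldDelta(objectId, field, value));
    }

    // In an instrumented system this would be injected at monitor exit.
    void unlock() {
        if (!pending.isEmpty()) {
            server.applyBatch(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

A short usage example: two writes followed by one `unlock()` reach the server as a single batch.

```java
final class ToyClusterDemo {
    public static void main(String[] args) {
        ToyClusterServer server = new ToyClusterServer();
        ToyClusterClient client = new ToyClusterClient(server);
        client.writeField("user:1", "name", "alice");
        client.writeField("user:1", "visits", 3);
        client.unlock();
        System.out.println(server.read("user:1", "name")); // prints alice
        System.out.println(server.batchesReceived());      // prints 1
    }
}
```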

Terracotta can indeed handle that if you tell it to - see the description of its DSO - Distributed Shared Objects.
It sounds cool, but I would prefer something like Ehcache (which can itself be backed by Terracotta), which works at a somewhat higher level.

One emerging technology that somehow tackles this problem is Distributed Software Transactional Memory. You get strong data consistency guarantees (i.e. 1-copy serializability) and a powerful concurrency control mechanism: transactions.
AFAIK, there is no mature solution out there, but it is promising.

I would recommend that you investigate http://www.jboss.org/infinispan and see if it will fulfill your needs.

Hold most of the objects in cache/memory instead of the database?

It just occurred to me: why not load most of the objects into a cache (memory) when an application starts,
if it's not that large a web application? Or have a setting for how much I want to put in the cache/memory.
My guess is that it would require something like 1 GB of RAM or a lot less.
All of this is to speed up the application even more by not querying the database.
Is it a good idea?

Caching is definitely a good idea and is widely used, but it has to be implemented correctly. There are plenty of pitfalls if done incorrectly. Try looking into one of the big proven systems, like memcached.

Caching is definitely a good idea.
Databases are not a catch-all solution either, but you have to be careful about consistency between runs of your program: what if you change the data in memory but your program crashes before you write it to the database?
There are also lightweight memory-resident databases that let you keep your current queries for now but serve much of the work from memory. Using an ORM tool instead of raw SQL is particularly effective here, since the switch is almost transparent.

It quickly becomes a not-so-good idea when some other node starts updating the database.
In that case your cache will be holding stale data.

You can maintain a cache of frequently used objects in memory; just don't forget to add methods to refresh the cache when the underlying database state changes.
E.g. if you have a users table and you need user names on many pages, load the entire table into the cache at application startup, and make sure to update the cache when you add new users online or modify or delete entries in the users table.
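
The user-name example can be sketched as a cache that is loaded once at startup and updated on every write that passes through it. This is a minimal sketch with hypothetical names (`UserTable`, `InMemoryUserTable`, `UserNameCache`); the in-memory table merely stands in for real database access so that the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for database access to the users table.
interface UserTable {
    Map<Long, String> loadAll();
    void save(long id, String name);
    void delete(long id);
}

// Fake table kept in memory so the sketch is runnable without a database.
final class InMemoryUserTable implements UserTable {
    private final Map<Long, String> rows = new ConcurrentHashMap<>();

    @Override
    public Map<Long, String> loadAll() {
        return new HashMap<>(rows);
    }

    @Override
    public void save(long id, String name) {
        rows.put(id, name);
    }

    @Override
    public void delete(long id) {
        rows.remove(id);
    }
}

// Loads every user name at construction and keeps the cache in step with writes.
final class UserNameCache {
    private final UserTable table;
    private final Map<Long, String> names = new ConcurrentHashMap<>();

    UserNameCache(UserTable table) {
        this.table = table;
        names.putAll(table.loadAll()); // "load the entire table at startup"
    }

    Optional<String> nameOf(long id) {
        return Optional.ofNullable(names.get(id)); // served from memory, no table access
    }

    void saveUser(long id, String name) {
        table.save(id, name); // write to the table first...
        names.put(id, name);  // ...then refresh the cached copy
    }

    void deleteUser(long id) {
        table.delete(id);
        names.remove(id);
    }
}
```

Note that this only stays correct while every write goes through this one object in this one JVM; writes from another node still leave the cache stale.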

You don't persist objects to database. What you persist is object's state. So that you can have exactly the same state even after your app stops/closes/restarts. If you want to keep states of your objects persisted, you have no choice, but to use db (or anything else, that allows you to write data to file system).

The details are beyond the scope of an answer here, but we have had good experience using Ehcache ( http://ehcache.org/ )
The combination of support for distributed caches and overflow to disk has allowed us to keep large numbers of computationally heavy, but fairly unchanging, pages in the cache for a site being served from multiple Tomcat instances.
Distribution addresses the question of staleness (if you invalidate your items correctly) and the disk overflow allows us to basically cache everything which was just not feasible with an in-memory cache.
Of course the implementation is not trivial for a real world application, but it improved our performance significantly once the caches were bubbling.
