JVM heap replication between two machines

JVM heap replication between two machines - java

What are the basic principles of how two separable computers connected within the same network running the same Java application maintain the same state by syncing their heap between each other?
I believe Terracotta does this task but I have no idea how would some pseudo code look like that would describe its core functions.
I'm just looking for understanding of this technology.

Terracotta DSO works by manipulating the byte code of your classes (and the JDK's classes etc). The instructions on how and when to do this are part of the Terracotta configuration file.
The bytecode modification looks for certain byte codes such as a field read or write or a monitor enter or exit. Whenever those instructions occur, code is added around that location that does the appropriate action in the distributed store. For example when a monitor is obtained due to synchronization, a distributed lock is obtained as well (whether it is a read or write lock is dependent on the configuration). If a field in a shared object is written, the distributed system must verify that a write lock is being held and then send the data value is sent to the clustered server, which stores it on disk or shares it over the network as appropriate.
Note that Terracotta does not share the entire heap, only the graph of objects indicated by the configuration. In general, there would be little point in sharing an entire heap. It is better instead for the application to describe the domain objects needed across the distributed application.
There are many optimizations employed to make the operations above efficient: only field deltas are sent over the wire and in a form much more efficient than Java serialization, many deltas can be bundled and sent in batches, locks are actually "checked out" to a particular client so that if the application data is partitioned across clients, most distributed locks are actually a local operation not involving a network call, etc.

Terracotta can indeed handle that if you tell it to - see the description of its DSO - Distributed Shared Objects.
It sounds cool but I would prefer something like EHcache (can be backed by Terracotta again) which functions on a bit more high level.

One emerging technology that somehow tackles this problem is Distributed Software Transactional Memory. You get strong data consistency guarantees (i.e. 1-copy serializability) and a powerful concurrency control mechanism: transactions.
AFAIK, there is no mature solution out there, but it is promising.

I would recommend that you investigate http://www.jboss.org/infinispan and see if it will fulfill your needs.

Related

How to share static variable data across multiple instances of Tomcat?

My company has outgrown its single-instance Tomcat server and so needs to run multiple instances. However, in the code there are many static variables that keep track of various data including objects. (Sure, let the jabs and jeers begin.:) I have considered moving the static variables to be stored in Redis, but that could potentially be a lot of work. Is there an easier way to share mutable static-variable information across instances of Tomcat?
Update: per suggestion by #jake, a note on the nature of the variables. These variables mostly store aggregate information for the service as a whole, such as info on the users logged in within the past hour, various features accessed, etc.

"[..] many static variables" keeping track of mutable information across the webapp sounds terrifying, frankly. If concurrency isn't being handled well, this is likely a source of all kinds of errors/problems. But more importantly even if concurrency is being done properly, that alone could be contributing to resource contention performance problems that are making your app outgrow its single-instance life. In this case even if you address the implementation by replacing with some type of more robust data store (which you probably should), the added network latency will likely exacerbate that problem substantially.
I would suggest reviewing what those variables are and determine if each of them really actually does need to be shared across the entire distributed application. If they turn out to be things that are just cached to avoid recalculations or re-fetching from a database, then maybe you can avoid putting those in a centralized datastore, and do something simpler like just putting them in application scope as #EJP recommended.
We'd need more specific use scenarios about some of those variables to make better recommendations.

You are talking about to have a cluster. So you need a set of new paradigms. You can see this image as an example:
Refer to this link to read more about clustering and load balancing.

You cannot do this as different tomcat instances are basically different process (possibly on different machines altogether) having separate heap space.
This is the same problem as storing session variables (Although Tomcat allows replicating session variable across instances) as you scale up replicating across instances becomes considerable overhead (and performance bottleneck).
So even if you could manage to replicate everything you are only delaying your problem for later and not solving it. For scaling any type server side storage is discouraged. If absolutely necessary distributed cache is the right direction. Or you could store everything in DB (added overhead of DB calls).
Reference
In Java, are static class members shared among programs?
session in cookis pattern

Java shared-address-space and message-passing parallel programming paradigms

In the shared-address-space model there is a common address space shared between
processes and represented as data structures in memory (like ConcurrentHashMap). This gives the advantage of very fast data sharing as shared objects are located on a single computer (let us suppose so for simplicity). As processes may collide, various lock mechanisms (mutex) are helpful to ensure mutual exclusion in accessing shared memory.
This scheme lacks of scalability as an increase in processor number can raise geometrically the traffic on shared memory and a single computer can not have more than say 8 processors.
In the message-passing model there is no sense of global address space. Each
process has its one private local memory. Processes can communicate with each
other via passing messages. Unlike shared-address-space, the message-
passing model offers scalability between processors and memory, although requires the common data to be replicated. An increase in processors will proportionally increase the memory (for that data) size as well, though no lock mechanisms are required in this case.
Reading "Thinking in Java" for inspiration I find only a talk about the shared-address-space model with synchronization principles. As my problem grows in complexity I'm going to try the message-passing paradigm, which as far as I'm not blind, is not presented in the book.
Could you please recommend Java native classes or any proved external library to work with the message-passing model, something like MPI in C++? Any link to that source would be highly appreciated!

Akka is a commonly-used actor framework for the JVM - available for both Java and Scala.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.
http://hadoop.apache.org/

What is Terracotta?

What is Terracotta?
What services does it offer?
What problems does it solve?
What other products solve problems similar to those that Terracotta solves?

Find a great article about Terracotta and how it works at InfoQ written directly by Orion Letizi, co-founder and software engineer at Terracotta:
http://www.infoq.com/articles/open-terracotta-intro
It helped me to prepare for a webcast about terracotta and how it can be used for clustering and scaling grails applications and gave me a good overview about Terracotta.

I like to think about Terracottas DSO in terms of advanced parallel architectures: Terracotta turns your message-passing multicomputer into a usual unified memory multiprocessor. Multicomputers are different from multiprocessors in that processors share memory and, therefore, are easier to program because you just write into memory in usual multithreading way. Though, you it means that you need to explicitly synchronize access to the shared data using a lock, system saves you from the need to explicitly message-passing data marshaling and resolves the biggest parallel programming issue -- the cache coherence -- for you. Multiprocessor marshals the data for you when you take/release the lock. It is, therefore, desirable. But, initially you have a bunch of computers -- a multicomputer.
The magic is achieved by injecting some code into your classes at object field/lock access points. To correspond DB world, Terracotta considers all updates done under a lock atomic (transaction). Likewise multiprocessors can have a global storage, Terracotta allows to back up the locally updated data to disk.

What other products solve problems similar to those that Terracotta solves?
Try Hazelcast, It is super simple to use. Peer to peer, highly scalable, fully open source clustering technology for Java. It is simply distributed Map, Queue, MultiMap, ExecutorService. You can use its Map as a distributed cache.

I found an article in JavaWorld about Terracotta at http://www.javaworld.com/javaworld/jw-01-2009/jw-01-osjp-terracotta.html.

Java distributed objects with locality?

I am evaluating various Java object distribution libraries (Terracotta, JCS, JBoss, Hazelcast ...) for an application server and I'm having trouble understanding their behavior on various axes.
My requirements for distributed objects are not many -- they boil down to one-to-one and one-to-many messaging. There's more, but for the rest we just use JDBC and I assume I can plop a cache in front of this using any of the available libraries.
I would like a system that distributes objects and exhibits locality properties -- in other words, a server that grabs an object tends to hold onto it without excess communication to other nodes. Hazelcast looks simple (and peer-to-peer is nice) but seems to require objects are distributed evenly across all nodes.
I'd like a way to persist objects, preferably transparently. I plan on using EC2, so I have the option of temporary, free, limited local storage (the disk) and permanent, non-free, unlimited storage (S3). It'd be great not to worry about OutOfMemoryErrors.
I like the simplicity and "magic" of Terracotta but it scares the beejeezus out of me. Also in order to truly scale you have to spend $$$$, otherwise you're communicating with a single hub.
I'm cheap and I want something not only free but mature and with a large userbase.
Thanks for any input.

Terracotta seems like a perfect fit for your situation.
It's simple to setup
it can be configured to be persistent (use an EBS volume for EC2)
it's closely integrated with Ehcache (actually Terracotta bought Ehcache) for great distributed caching performance
the free offering scales pretty well with several clients.
Just start playing around with it. I bet you'll love it. To ease your performance fears, simply run a through put test for message passing. This shouldn't take much more than an afternoon of your time.
I have to admit that I haven't used Terracotta for a year and that I don't know the others you suggested.

Terracotta does fit the bill. I understand your objections, but here's my comments:
1) Terracotta does exhibit locality - and is probably the best system at it compared to those you mentioned. Objects are only brought in to a local JVM where requested. Locking for reads or writes is performed using a leasing mechanism. This means if you exhibit perfect locality in your system then you will incur very little network overhead.
2) Terracotta provides disk persistence out of the box - in the OSS version (you don't have to pay $$$$)
3) Why does it scare you so much? Just use EHCache as a cache, or the Hibernate 2nd Level Plugin. It's incredibly easy to setup and use.
4) Yes, Terracotta FX requires you to pay (for scale-out servers). However I would suggest that if you have a system that is mostly read and exhibits true locality then I don't think you'll have a problem getting the scale you are looking for. With Terracotta 3.2 the performance of the Hibernate 2nd Level Cache is 100,000 ops/s using 8 application servers and one Terracotta server at 100/0 read/write ratio and 12,000 ops/s using the same config at 95/5 read/write ratio.
(I just did a talk for the Bay Area SDForum on these numbers so I happen to have them handy)

Yes Hazelcast will distribute your objects across the cluster. However you can enable near cache if you want to reduce the communication cost.
http://www.hazelcast.com/documentation.jsp#MapNearCache

Btw, it's not clear what you are looking for (messaging is not the same as clustering/distributed objects).
If you are looking for messaging in Java I recommend you have a look at RabbitMQ (it's Erlang based but that doesn't matter).

Any concept of shared memory in Java

AFAIK, memory in Java is based on heap from which the memory is allotted to objects dynamically and there is no concept of shared memory.
If there is no concept of shared memory, then the communication between Java programs should be time consuming. In C where inter-process communication is quicker via shared memory compared to other modes of communication.
Correct me if I'm wrong. Also what is the quickest way for 2 Java progs to talk to each other.

A few ways:
RAM Drive
Apache APR
OpenHFT Chronicle Core
Details here and here with some performance measurements.

Since there is no official API to create a shared memory segment, you need to resort to a helper library/DDL and JNI to use shared memory to have two Java processes talk to each other.
In practice, this is rarely an issue since Java supports threads, so you can have two "programs" run in the same Java VM. Those will share the same heap, so communication will be instantaneous. Plus you can't get errors because of problems with the shared memory segment.

Java Chronicle is worth looking at; both Chronicle-Queue and Chronicle-Map use shared memory.
These are some tests that I had done a while ago comparing various off-heap and on-heap options.

One thing to look at is using memory-mapped files, using Java NIO's FileChannel class or similar (see the map() method). We've used this very successfully to communicate (in our case one-way) between a Java process and a C native one on the same machine.
I'll admit I'm no filesystem expert (luckily we do have one on staff!) but the performance for us is absolutely blazingly fast -- effectively you're treating a section of the page cache as a file and reading + writing to it directly without the overhead of system calls. I'm not sure about the guarantees and coherency -- there are methods in Java to force changes to be written to the file, which implies that they are (sometimes? typically? usually? normally? not sure) written to the actual underlying file (somewhat? very? extremely?) lazily, meaning that some proportion of the time it's basically just a shared memory segment.
In theory, as I understand it, memory-mapped files CAN actually be backed by a shared memory segment (they're just file handles, I think) but I'm not aware of a way to do so in Java without JNI.

Shared memory is sometimes quick. Sometimes its not - it hurts CPU caches and synchronization is often a pain (and should it rely upon mutexes and such, can be a major performance penalty).
Barrelfish is an operating system that demonstrates that IPC using message passing is actually faster than shared memory as the number of cores increases (on conventional X86 architectures as well as the more exotic NUMA NUCA stuff you'd guess it was targeting).
So your assumption that shared memory is fast needs testing for your particular scenario and on your target hardware. Its not a generic sound assumption these days!

There's a couple of comparable technologies I can think of:
A few years back there was a technology called JavaSpaces but that never really seemed to take hold, a shame if you ask me.
Nowadays there are the distributed cache technologies, things like Coherence and Tangosol.
Unfortunately neither will have the out right speed of shared memory, but they do deal with the issues of concurrent access, etc.

The easiest way to do that is to have two processes instantiate the same memory-mapped file. In practice they will be sharing the same off-heap memory space. You can grab the physical address of this memory and use sun.misc.Unsafe to write/read primitives. It supports concurrency through the putXXXVolatile/getXXXVolatile methods. Take a look on CoralQueue which offers IPC easily as well as inter-thread communication inside the same JVM.
Disclaimer: I am one of the developers of CoralQueue.

Similar to Peter Lawrey's Java Chronicle, you can try Jocket.
It also uses a MappedByteBuffer but does not persist any data and is meant to be used as a drop-in replacement to Socket / ServerSocket.
Roundtrip latency for a 1kB ping-pong is around a half-microsecond.

MappedBus (http://github.com/caplogic/mappedbus) is a library I've added on github which enable IPC between multiple (more than two) Java processes/JVMs by message passing.
The transport can be either a memory mapped file or shared memory. To use it with shared memory simply follow the examples on the github page but point the readers/writers to a file under "/dev/shm/".
It's open source and the implementation is fully explained on the github page.

The information provided by Cowan is correct. However, even shared memory won't always appear to be identical in multiple threads (and/or processes) at the same time. The key underlying reason is the Java memory model (which is built on the hardware memory model). See Can multiple threads see writes on a direct mapped ByteBuffer in Java? for a quite useful discussion of the subject.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.