During some testing with multiple memcached instances I realized that the spymemcached Java client was evenly distributing the key data across the configured instances. I know that memcached is distributed, but is there a way to configure a client to write key data to all configured instances? I know that in-memory cache approaches like this are not designed to replace persistent storage (DB), but I have zero need for persistent storage and need a lightweight way to synchronize basic key/value data between two or more instances of my service.
The test Java code I prototyped worked great, and I feel the spymemcached API would integrate well, but I need to replicate the data between memcached instances. I assumed that if I specified multiple memcached instances the data would be written to all of them, not distributed across them. Thanks.
There are some memcached clients that allow data replication among multiple memcached servers. From what I can tell, SpyMemcached is not one of them.
I do not understand, however, why you want this. Lightweight synchronization works just as well without replication. Memcached clients (including SpyMemcached) generally use consistent hashing to map from a key to a server, so every instance of your service will look for a given key on the same server.
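As a minimal sketch of that behavior (spymemcached 2.8+ API names; the host names are placeholders), each instance of the service builds its client over the same server list with the Ketama consistent-hashing locator. Because both clients hash keys identically, a key written by one instance is found by the other without any replication:

import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.DefaultHashAlgorithm;
import net.spy.memcached.MemcachedClient;

public class SharedCacheExample {
    public static void main(String[] args) throws Exception {
        // Every service instance uses the same server list and the same hashing
        // configuration, so "user:42" always lands on the same memcached node.
        MemcachedClient client = new MemcachedClient(
                new ConnectionFactoryBuilder()
                        .setLocatorType(ConnectionFactoryBuilder.Locator.CONSISTENT)
                        .setHashAlg(DefaultHashAlgorithm.KETAMA_HASH)
                        .build(),
                AddrUtil.getAddresses("cache1:11211 cache2:11211")); // placeholder hosts

        client.set("user:42", 3600, "some value"); // instance A writes...
        Object value = client.get("user:42");      // ...instance B's identical client reads
        System.out.println(value);
        client.shutdown();
    }
}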
Related
Two instances of my Java application are deployed on a server. One of the instances is live at any one point and the other is standby. The live instance receives data from some receivers and does some processing. If the live instance shuts down due to an error, the standby becomes live.
Can the data (map/list) maintained/collected in the first instance somehow be shared with the second instance?
You can do this by using some kind of distributed caching mechanism like Redis, Hazelcast, Ignite, etc.
You can maintain distributed collections in the cache itself. For example, Hazelcast provides Java-like abstractions of collections.
Similarly, the Redisson Java client (on top of Redis) also provides distributed implementations of the Java collections and much more.
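As a rough sketch of the Redisson route (the Redis address and map name are placeholders), both instances connect to the same Redis server and see the same map:

import org.redisson.Redisson;
import org.redisson.api.RMap;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class SharedStateExample {
    public static void main(String[] args) {
        // Point both application instances at the same Redis server.
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379"); // placeholder address

        RedissonClient redisson = Redisson.create(config);

        // RMap implements java.util.concurrent.ConcurrentMap, so it can be
        // used like an ordinary map while the data lives in Redis; the standby
        // instance sees whatever the live instance has put here.
        RMap<String, String> shared = redisson.getMap("receiverData");
        shared.put("lastEvent", "processed");

        redisson.shutdown();
    }
}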
I have a case where I need to frequently update and retrieve the values of a map. This variable should have the same keys and values across all four servers: if one server updates the map, the change should be reflected on the other servers.
I believe I should be caching this.
Can I have some example code showing how I should achieve this?
Thank you.
You need a distributed cache. Choosing one is another issue...
see here.
Example of using EhCache - here.
I would suggest using a distributed cache for this, e.g. Hazelcast's implementation of a distributed map.
You could set up a Hazelcast cluster and implement MapStore.
You will also need to configure Hazelcast clients on each Tomcat server. These clients will load the distributed map and take care of syncing the data.
Hazelcast has great documentation and plenty of examples, so it should be easy for you to work with.
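A minimal sketch of the client side, assuming a Hazelcast cluster is already running (Hazelcast 4+/5 package names; the member addresses and map name are placeholders):

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

public class TomcatCacheClient {
    public static void main(String[] args) {
        // Each Tomcat server runs a client that joins the same cluster.
        ClientConfig clientConfig = new ClientConfig();
        clientConfig.getNetworkConfig().addAddress("10.0.0.1:5701", "10.0.0.2:5701"); // placeholders

        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);

        // All four servers see the same distributed map; an update made by
        // one server is visible to the others.
        IMap<String, String> shared = client.getMap("sharedMap");
        shared.put("key", "value");
        System.out.println(shared.get("key"));

        client.shutdown();
    }
}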
In Java, I have a HashMap containing objects (which can be serializable, if it helps). Elsewhere on a network, I have another HashMap in another copy of the application that I would like to stay in sync with the first.
For example, if on computer A someone runs myMap.put("Hello", "World"); and on computer B someone runs myMap.put("foo", "bar");, then after some time delay for changes to propagate, both computers would have myMap.get("Hello") == "World" and myMap.get("foo") == "bar".
Is this requirement met by an existing facility in the Java language, a library, or some other program? If this is already a "solved problem" it would be great not to have to write my own code for this.
If there are multiple ways of achieving this I would prefer, in priority order:
Changes are guaranteed to propagate 100% of the time (doesn't matter how long it takes)
Changes propagate rapidly
Changes propagate with minimal bandwidth use between computers.
(Note: I have had trouble searching for solutions as results are dominated by questions about synchronizing access to a Map from multiple threads in the same application. This is not what my question is about.)
You could look at the Hazelcast in-memory data grid.
It's an open source solution designed for distributed architectures.
It maps really well to your problem, since Hazelcast's IMap extends java.util.Map.
Link: Hazelcast IMap
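A small sketch of the embedded approach (Hazelcast 4+/5 package names; the map name is arbitrary): each copy of the application starts a member of the same cluster, and an entry listener shows changes arriving from the other node:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;
import com.hazelcast.map.listener.EntryAddedListener;

public class SyncedMapExample {
    public static void main(String[] args) {
        // Each copy of the application embeds a cluster member; members
        // discover each other and keep the map contents consistent.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, String> myMap = hz.getMap("myMap");

        // Fires on this node whenever any node adds an entry (true = include value).
        myMap.addEntryListener((EntryAddedListener<String, String>) event ->
                System.out.println("added: " + event.getKey() + " = " + event.getValue()), true);

        myMap.put("Hello", "World"); // becomes visible on every node after propagation
    }
}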
What you are trying to do is called clustering between two nodes.
Here are some solutions:
You can achieve your requirement using serialization: make your map serializable, then read and write the state of the map at each interval of time and sync it. This is the core and basic way to achieve your functionality, but with serialization you have to manage the syncing of the map manually (i.e. you have to write code for that); a minimal sketch follows below.
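A minimal sketch of that manual approach, assuming one node periodically writes the map state to a file (or a socket) that the other node reads; the file path and the interval are placeholders:

import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class MapSnapshotSync {
    // Writer side: serialize the whole map state.
    static void saveState(Map<String, String> map, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(new HashMap<>(map)); // copy to get a stable snapshot
        }
    }

    // Reader side: deserialize and replace the local state.
    @SuppressWarnings("unchecked")
    static Map<String, String> loadState(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Map<String, String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> map = new HashMap<>();
        map.put("Hello", "World");
        File snapshot = new File("map-state.ser"); // placeholder path, e.g. a shared mount
        saveState(map, snapshot);                  // run periodically on the live node
        System.out.println(loadState(snapshot));   // run periodically on the standby node
    }
}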
Hazelcast is an open source distributed caching mechanism. It is a good API with a rich library for achieving a clustered environment and sharing data between different nodes.
Coherence*Web, from Oracle, also provides a mechanism to achieve clustering.
Ehcache is a cache library introduced in 2003 to improve performance by reducing the load on underlying resources. Ehcache can be used both for general-purpose caching and for caching with Hibernate (as a second-level cache), data access objects, security credentials, and web pages. It can also be used for SOAP and RESTful server caching, application persistence, and distributed caching.
Among all of the above, Hazelcast is the best API; go through it and it will surely help you.
Here is a situation I have encountered: I have two similar Java applications running on different servers. Both applications obtain data from the same website using the web service provided. But the site doesn't know, of course, that the first app has taken the same piece of data as the second app. After fetching, the data should be saved in a database. So I have the problem of saving the same data twice in the database.
How can I avoid duplicate entries in my DB?
There are probably two ways:
1) use the database side: write something that looks like "insert if unique" (see the sketch after this list).
2) use the server side: write some intermediate service that will receive responses from the two data fetchers and process them somehow.
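A minimal sketch of option 1, assuming a UNIQUE constraint on the natural key of the data and a PostgreSQL-style ON CONFLICT clause (the connection string, table, and column names are placeholders); whichever app inserts second simply does nothing:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class UniqueInsertExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string and credentials.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {

            // Requires a UNIQUE constraint on entries(external_id);
            // ON CONFLICT DO NOTHING makes duplicate inserts silent no-ops.
            String sql = "INSERT INTO entries (external_id, payload) "
                       + "VALUES (?, ?) ON CONFLICT (external_id) DO NOTHING";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "item-123");
                ps.setString(2, "fetched data");
                int inserted = ps.executeUpdate(); // 0 if the row already existed
                System.out.println(inserted == 1 ? "stored" : "duplicate skipped");
            }
        }
    }
}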
I suppose the second solution is more efficient.
Can you advise something on this topic?
How would you implement that intermediate service? How would you implement communication between the services? If we used HashMaps to store received data, how could we estimate the maximum size of HashMap that our system can handle?
Do you really need to fetch data on two servers simultaneously? Checking whether every entry is already present during insert could be expensive, and merging several fetches can be time consuming as well. Is there any benefit to fetching in parallel? Consider having one fetcher at a time.
The problem you will face is that you have to choose which one of your distributed processes should perform the data fetching and store it in the DB.
This is a kind of leader election problem.
Take a look at Apache ZooKeeper, which is a distributed coordination service.
There is a recipe for how to implement leader election with ZooKeeper.
There are a lot of frameworks that have already implemented this recipe. I'd recommend you use Netflix Curator. More details about leader election with Curator are available on the wiki.
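A minimal sketch using Curator's LeaderLatch recipe (the ZooKeeper connection string, latch path, and the fetchAndSave() call are placeholders); only the process that wins the latch performs the fetch:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class FetchLeaderExample {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper connection string.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Both servers create a latch on the same path; only one wins.
        LeaderLatch latch = new LeaderLatch(client, "/myapp/fetch-leader");
        latch.start();
        latch.await(); // blocks until this process becomes the leader

        try {
            System.out.println("I am the leader; fetching and saving data...");
            // fetchAndSave(); // hypothetical application logic
        } finally {
            latch.close();  // releases leadership so the other node can take over
            client.close();
        }
    }
}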
There are distributed frameworks for this sort of problem.
Hazelcast - will allow you to have a single distributed ConcurrentMap across multiple JVMs.
Terracotta - using its DSO (Distributed Shared Objects) it will maintain a Map implementation across JVMs.
We have a web application that loads a User object from the database. It's a high volume application with thousands of concurrent users, so we're looking at ways to cache the User objects to minimise database load.
We are currently using Ehcache but are looking at memcached to lower the memory requirements of the application and make it more scalable.
The problem we are currently having with memcached is the CPU load that serializing the User instance brings. We're looking at ways to speed up the serialization, but are also considering whether we could use a smaller Ehcache cache backed by a memcached server.
Has anyone had any experience using Ehcache backed by memcached (i.e. first look in Ehcache; if the user is not there, look in memcached; if not there, look in the database)?
Any downsides to this kind of approach?
If you're willing to move away from Ehcache, you could consider Infinispan, which now includes integration with memcached. It's a bit more of a faff to get working than Ehcache, but not too much.
Starting with version 4.1, Infinispan distribution contains a server module that implements the memcached text protocol. This allows memcached clients to talk to one or several Infinispan backed memcached servers. These servers could either be working standalone just like memcached does where each server acts independently and does not communicate with the rest, or they could be clustered where servers replicate or distribute their contents to other Infinispan backed memcached servers, thus providing clients with failover capabilities.
It does make sense to do what you're suggesting. We've experienced the same issue with memcached in that the overhead to serialize objects back and forth isn't worth using it alone for a high volume application. Having a local cache reduces load on the application side while memcached reduces load on the database side. The downside comes with the additional complexity of writing two layers of caches and maintaining cache coherency. I'd try to minimize where you need to use it.
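A rough sketch of that two-layer read path (Ehcache 2.x and spymemcached APIs; the User and UserDao types, the key scheme, and the TTL are placeholder assumptions):

import net.sf.ehcache.Cache;
import net.sf.ehcache.Element;
import net.spy.memcached.MemcachedClient;

public class TwoTierUserCache {
    // Hypothetical application types.
    public interface UserDao { User load(String id); }
    public static class User implements java.io.Serializable { /* fields omitted */ }

    private final Cache localCache;          // L1: in-process Ehcache
    private final MemcachedClient memcached; // L2: shared memcached
    private final UserDao userDao;           // L3: database

    public TwoTierUserCache(Cache localCache, MemcachedClient memcached, UserDao userDao) {
        this.localCache = localCache;
        this.memcached = memcached;
        this.userDao = userDao;
    }

    public User getUser(String id) {
        // 1. Check the small in-process cache first: no serialization cost.
        Element element = localCache.get(id);
        if (element != null) {
            return (User) element.getObjectValue();
        }
        // 2. Fall back to memcached: pays serialization, saves a DB hit.
        User user = (User) memcached.get(id);
        if (user == null) {
            // 3. Finally load from the database and populate memcached.
            user = userDao.load(id);
            if (user != null) {
                memcached.set(id, 3600, user); // 1 hour TTL, placeholder
            }
        }
        if (user != null) {
            localCache.put(new Element(id, user));
        }
        return user;
    }
}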
Infinispan can store objects as instances and minimize serialization overhead. Also, instead of replicating the data on each node, it can distribute data to make better use of your memory, or you can limit the number of entries to keep in memory.
You can also have it just send invalidation messages to the other nodes when you update a value, instead of sending the serialized values around.
In addition, when it still needs to serialize, it uses a very efficient Marshaller instead of Java's serialization, and since version 5 you can plug in your own custom Externalizers to customize the wire format of some types to give it an extra push (generally not needed, but nice to have).
In case you were looking at memcached for other reasons as well, be aware that Infinispan also "speaks" the memcached text protocol, so if you have other clients around you can still integrate with it.
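For illustration, a custom Externalizer might look roughly like this (the package names follow recent Infinispan versions and may differ in older ones; the User type and its fields are placeholders):

import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import org.infinispan.commons.marshall.Externalizer;
import org.infinispan.commons.marshall.SerializeWith;

// Tell Infinispan to marshal User with the externalizer below
// instead of using plain Java serialization.
@SerializeWith(User.UserExternalizer.class)
public class User {
    final String name;
    final int age;

    public User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public static class UserExternalizer implements Externalizer<User> {
        @Override
        public void writeObject(ObjectOutput output, User user) throws IOException {
            // Write only the fields, in a fixed order: a compact wire format.
            output.writeUTF(user.name);
            output.writeInt(user.age);
        }

        @Override
        public User readObject(ObjectInput input) throws IOException, ClassNotFoundException {
            return new User(input.readUTF(), input.readInt());
        }
    }
}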
You could pretty simply override net.sf.ehcache.Cache.createDiskStore():

Cache cache = new Cache(cacheConfiguration) { // pass your CacheConfiguration here
    @Override
    protected Store createDiskStore() {
        if (isDiskStore()) {
            // default: return DiskStore.create(this, diskStorePath);
            MemcachedStore store = new MemcachedStore(/* memcached settings */);
            getCacheConfiguration().addConfigurationListener(store);
            return store;
        } else {
            return null;
        }
    }
};
MemcachedStore is a custom implementation of net.sf.ehcache.store.Store that you'll have to write yourself. That's not trivial, but then again, starting from DiskStore shouldn't be too difficult.
You can't replace the DiskStore in Ehcache because it's final. You can implement a new OffHeapStore and plug it in like that. This is how BigMemory works. There is an Apache project called DirectMemory doing the same thing.
See my post here for more detail:
http://forums.terracotta.org/forums/posts/list/0/8833.page#40635
This article specifies how we can use an in-process cache in front of a distributed cache in a Spring application by defining our own MultiTieredCacheManager and MultiTieredCache:
Multi Tiered Caching - Using in-process Cache in front of Distributed Cache