We have a web application that loads a User object from the database. It's a high-volume application with thousands of concurrent users, so we're looking at ways to cache the User objects to minimise database load.
We are currently using ehcache but are looking at memcached to lower the memory requirements of the application and make it more scalable.
The problem we are currently having with memcached is the CPU load that serializing the User instance brings. We're looking at ways to speed up the serialization, but are also considering whether we could use a smaller ehcache cache backed by a memcached server.
Has anyone had any experience using ehcache backed by memcached (i.e. first look in ehcache; if the user is not there, look in memcached; if not there, look in the database)?
Any downsides to this kind of approach?
If you're willing to move away from Ehcache, you could consider Infinispan, which now includes integration with memcache. It's a bit more of a faff to get working than Ehcache, but not too much.
Starting with version 4.1, Infinispan distribution contains a server module that implements the memcached text protocol. This allows memcached clients to talk to one or several Infinispan backed memcached servers. These servers could either be working standalone just like memcached does where each server acts independently and does not communicate with the rest, or they could be clustered where servers replicate or distribute their contents to other Infinispan backed memcached servers, thus providing clients with failover capabilities.
It does make sense to do what you're suggesting. We've experienced the same issue with memcached in that the overhead to serialize objects back and forth isn't worth using it alone for a high volume application. Having a local cache reduces load on the application side while memcached reduces load on the database side. The downside comes with the additional complexity of writing two layers of caches and maintaining cache coherency. I'd try to minimize where you need to use it.
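As a rough illustration of the lookup order described above (local cache, then shared cache, then database), here is a minimal sketch using only the JDK. The names TwoTierCache, RemoteCache and FakeRemoteCache are hypothetical: RemoteCache stands in for whatever memcached client you use, and the database is represented by a plain function. It ignores expiry and cache coherency, which is where the real complexity lies.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class TwoTierCache<K, V> {
    /** Hypothetical stand-in for a memcached client; not a real client API. */
    interface RemoteCache<K, V> {
        V get(K key);
        void put(K key, V value);
    }

    /** In-memory fake so the sketch runs without a memcached server. */
    static class FakeRemoteCache<K, V> implements RemoteCache<K, V> {
        private final Map<K, V> data = new ConcurrentHashMap<>();
        public V get(K key) { return data.get(key); }
        public void put(K key, V value) { data.put(key, value); }
    }

    private final Map<K, V> local;           // tier 1: small, bounded, holds live instances
    private final RemoteCache<K, V> remote;  // tier 2: shared cache, pays serialization cost
    private final Function<K, V> database;   // final fallback: the database lookup

    TwoTierCache(final int localCapacity, RemoteCache<K, V> remote, Function<K, V> database) {
        // Access-ordered LinkedHashMap gives a simple LRU bound for the local tier.
        this.local = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > localCapacity;
            }
        };
        this.remote = remote;
        this.database = database;
    }

    synchronized V get(K key) {
        V value = local.get(key);
        if (value != null) {
            return value;                    // local hit: no serialization at all
        }
        value = remote.get(key);
        if (value == null) {
            value = database.apply(key);     // miss in both tiers: load from the database
            if (value == null) {
                return null;
            }
            remote.put(key, value);          // share the result with other app servers
        }
        local.put(key, value);               // promote to the local tier
        return value;
    }

    public static void main(String[] args) {
        AtomicInteger dbLoads = new AtomicInteger();
        TwoTierCache<String, String> cache = new TwoTierCache<String, String>(
                1,
                new FakeRemoteCache<String, String>(),
                id -> { dbLoads.incrementAndGet(); return "user:" + id; });
        cache.get("a");
        cache.get("b");   // evicts "a" from the local tier (capacity 1)
        cache.get("a");   // served by the remote tier, not the database
        System.out.println("database loads: " + dbLoads.get());
    }
}
```

Updates and deletes would need to touch both tiers (and the local tiers of other servers), which is the coherency cost mentioned above.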
Infinispan can store objects as instances, which minimizes serialization overhead. Instead of replicating the data on each node, it can also distribute the data to make better use of your memory, or you can limit the number of entries kept in memory.
You can also have it just send invalidation messages to other nodes when you update a value, instead of sending the serialized values around.
In addition, when it still needs to serialize, it uses a very efficient Marshaller instead of Java's serialization, and since version 5 you can plug in your own Externalizers to customize the wire format of some types to give it an extra push (generally not needed, but nice to have).
In case you were looking at memcached for other reasons as well, be aware that Infinispan also "speaks" the memcached text protocol, so if you have other clients around you can still integrate with it.
You could pretty simply override net.sf.ehcache.Cache.createDiskStore():
    new Cache(..) {
        protected Store createDiskStore() {
            if (isDiskStore()) {
                // default: return DiskStore.create(this, diskStorePath);
                MemcachedStore store = new MemcachedStore(..);
                getCacheConfiguration().addConfigurationListener(store);
                return store;
            } else {
                return null;
            }
        }
    }
MemcachedStore is a custom implementation of net.sf.ehcache.store.Store that you'll have to write yourself. That's not trivial, but starting from DiskStore it shouldn't be too difficult.
You can't replace the DiskStore in ehcache because it's final. You can implement a new OffHeapStore and plug it in like that. This is how BigMemory works. There is an Apache project called DirectMemory doing the same thing.
See my post here for more detail:
http://forums.terracotta.org/forums/posts/list/0/8833.page#40635
This article describes how to use an in-process cache in front of a distributed cache in a Spring application by defining our own MultiTieredCacheManager and MultiTieredCache:
Multi Tiered Caching - Using in-process Cache in front of Distributed Cache
During some testing of multiple memcached instances I realized that the spymemcached Java client was evenly distributing the key data across the configured instances. I know that memcached is distributed, but is there a way to configure a client to write key data to all configured instances? I know that memory cache approaches like this are not designed to replace persistent storage (DB), but I have zero need for persistent storage and need a lightweight way to synchronize basic key/value data between two or more instances of my service.
The test Java code I prototyped worked great, and I feel the spymemcached API would integrate well, but I need to replicate the data between memcached instances. I assumed that if I specified multiple MC instances the data would be distributed to all of them, not across all available. Thanks.
There are some memcached clients that allow data replication among multiple memcached servers. From what I can tell, SpyMemcached is not one of them.
I do not understand however, why you want this. Lightweight synchronization works just as well without replication. Memcached clients generally (this includes SpyMemcached) use consistent hashing to map from a key to a server, so every instance of your service will look for a key on the same server.
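To make the key-to-server mapping concrete, here is a minimal consistent-hashing sketch using only the JDK. It is an illustration of the general technique, not SpyMemcached's actual locator code; the class name HashRing, the server addresses and the number of points per server are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HashRing {
    // Each server is placed on the ring at several points to smooth the distribution.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    HashRing(List<String> servers, int pointsPerServer) {
        for (String server : servers) {
            for (int i = 0; i < pointsPerServer; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    /** Returns the server owning the first ring point at or after the key's hash. */
    String serverFor(String key) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        if (entry == null) {
            entry = ring.firstEntry();   // wrap around to the start of the ring
        }
        return entry.getValue();
    }

    /** First 8 bytes of the MD5 digest, interpreted as a long. */
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);   // MD5 is required on every Java platform
        }
    }

    public static void main(String[] args) {
        // Hypothetical server addresses.
        List<String> servers = Arrays.asList("mc1:11211", "mc2:11211", "mc3:11211");
        HashRing serviceA = new HashRing(servers, 100);
        HashRing serviceB = new HashRing(servers, 100);
        // Two independent service instances agree on where a key lives.
        System.out.println(serviceA.serverFor("session:42")
                .equals(serviceB.serverFor("session:42")));
    }
}
```

Because the mapping depends only on the key and the server list, every service instance configured with the same servers reads and writes a given key on the same memcached node, so no replication is needed for them to see each other's data.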
In Java, I have a HashMap containing objects (which can be serializable, if it helps). Elsewhere on a network, I have another HashMap in another copy of the application that I would like to stay in sync with the first.
For example, if on computer A someone runs myMap.put("Hello", "World"); and on computer B someone runs myMap.put("foo", "bar");, then after some time delay for changes to propagate, both computers would have myMap.get("Hello") == "World" and myMap.get("foo") == "bar".
Is this requirement met by an existing facility in the Java language, a library, or some other program? If this is already a "solved problem" it would be great not to have to write my own code for this.
If there are multiple ways of achieving this I would prefer, in priority order:
Changes are guaranteed to propagate 100% of the time (doesn't matter how long it takes)
Changes propagate rapidly
Changes propagate with minimal bandwidth use between computers.
(Note: I have had trouble searching for solutions as results are dominated by questions about synchronizing access to a Map from multiple threads in the same application. This is not what my question is about.)
You could look at the hazelcast in-memory database.
It's an open source solution designed for distributed architectures.
It maps really well to your problem since the hazelcast IMap extends java.util.Map.
Link: Hazelcast IMap
What you are trying to do is called clustering between two nodes. Here are some options:
Serialization: make your map serializable, then read and write the state of the map at regular intervals and sync it. This is the most basic way to achieve the functionality, but you have to manage the synchronization of the map yourself (i.e. you have to write the code for it).
Hazelcast: an open source distributed caching mechanism. It has a rich library for building a clustered environment and sharing data between nodes.
Coherence Web, by Oracle, also provides a mechanism to achieve clustering.
Ehcache: a cache library introduced in 2003 to improve performance by reducing the load on underlying resources. Ehcache is used for both general-purpose caching and caching Hibernate (second-level cache), data access objects, security credentials, and web pages. It can also be used for SOAP and RESTful server caching, application persistence, and distributed caching.
Of all the above, I think Hazelcast is the best option; going through it will surely help you.
I have a web app that makes external web service calls on behalf of its clients. I want to cache the data returned by some web services in the web app so that other clients can reuse this data and run filters and queries on the cached data.
The current architecture of the web app uses Apache Camel, Spring and Jetty. I'm looking for options (pros/cons) of in-memory database options.
Hazelcast (Java API) - you can distribute the in-memory datagrid (with map, multimap, sets, lists, queues, topics) over multiple nodes very easily & use load/store interface implementation with a disk based DB. You can do something similar with EHCache.
Redis is another option (use the Java client to access it). You can simply configure the conf file to write data to disk (or avoid it altogether) & should not have to write your own load/store classes.
Besides these, there are a number of options you could use. Not sure if you are only looking at open source options, looking at distributed options or not.
Hope it helps.
Have you considered using MemCached? It is not a database, but a caching system you can control from inside your application.
Here are a few more thoughts about in-memory databases. First almost every modern RDBMS has a memory caching system inside it. The more memory you give to the database server (and configure it for caching) the more that it will store in memory for later. If you put together a system with enough memory to cache all the tables, you will have an "in memory" cache without the overhead of another database.
Most fully "in memory" databases are used for high-volume/large-data systems where performance is the key requirement. And, because they are for extreme-performance systems, you are going to pay for them. Or more specifically, pay extra for them. For example, the SAP/Sybase DBs that support full in-memory can cost you from 40% to 300% more than their existing products.
So, in answer to your question, do you really need one?
Try Redisson - distributed and scalable familiar Java data structures (Set, Map, ConcurrentMap, List, Queue, Lock, AtomicLong, CountDownLatch, Publish / Subscribe) on top of the in-memory db Redis.
I need to implement a utility server that tracks a few custom variables that will be sent from any other server. To track the variables, a key-value collection, either JDK-defined or custom, needs to be used.
Here are a few considerations:
Keeping all the variables in memory of the server all the time is memory intensive.
This server needs to be a very lightweight server and I do not want heavy database operations.
Is there a pre-defined streaming collection which can serialize the data to disk once memory use passes a threshold and retrieve it on demand?
I hope I am clear in defining the problem statement.
Please suggest any better approach.
This thing looks very promising, but it is in the development stage...
JDBM3
Edit: the current version of the file-backed collections is MapDB.
Database
What you've described sounds exactly like you should use a database (i.e. indexed key/value store, too big for memory but want performance benefits of in-memory caching where possible).
I'd recommend a lightweight embedded database such as H2 - it's small, fast and should suit your purposes very well.
Have you thought of using an off-the-shelf NoSQL key-value store? Redis, for example?
If you want it Java-only, you have the option of using a library like ehcache, which would have the functionality you need.
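To show what "keep a bounded number of entries in memory and push the rest to disk" amounts to, here is a minimal JDK-only sketch. It is not how MapDB, H2 or ehcache work internally; the class SpillingMap, its one-file-per-key layout and the Base64 file naming are assumptions chosen to keep the example short. A real library adds indexing, compaction and crash safety.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

class SpillingMap<V extends Serializable> {
    private final Path dir;                       // where evicted entries are written
    private final LinkedHashMap<String, V> hot;   // bounded, access-ordered in-memory part

    SpillingMap(final int maxInMemory, Path dir) {
        this.dir = dir;
        this.hot = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                if (size() <= maxInMemory) {
                    return false;
                }
                spill(eldest.getKey(), eldest.getValue());  // write to disk before eviction
                return true;
            }
        };
    }

    /** Convenience factory that spills into a fresh temporary directory. */
    static <V extends Serializable> SpillingMap<V> inTempDir(int maxInMemory) {
        try {
            return new SpillingMap<V>(maxInMemory, Files.createTempDirectory("spill"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized void put(String key, V value) {
        try {
            Files.deleteIfExists(fileFor(key));   // drop any stale on-disk copy
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        hot.put(key, value);
    }

    synchronized V get(String key) {
        V value = hot.get(key);
        if (value != null) {
            return value;
        }
        Path file = fileFor(key);
        if (!Files.exists(file)) {
            return null;
        }
        V loaded;
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            @SuppressWarnings("unchecked")
            V cast = (V) in.readObject();
            loaded = cast;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        put(key, loaded);                         // bring it back into memory
        return loaded;
    }

    synchronized int inMemorySize() {
        return hot.size();
    }

    private void spill(String key, V value) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(fileFor(key)))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // URL-safe Base64 keeps arbitrary keys valid as file names (very long keys would not fit).
    private Path fileFor(String key) {
        String name = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(key.getBytes(StandardCharsets.UTF_8));
        return dir.resolve(name + ".ser");
    }

    public static void main(String[] args) {
        SpillingMap<String> vars = SpillingMap.inTempDir(2);
        vars.put("a", "1");
        vars.put("b", "2");
        vars.put("c", "3");                       // "a" is written to disk here
        System.out.println(vars.get("a") + " (in memory: " + vars.inMemorySize() + ")");
    }
}
```

Java serialization per entry is slow and the file-per-key layout does not scale, which is a large part of why the answers above point to an embedded database or a purpose-built library instead.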
Currently we have 2 app servers; each has an application-level cache, and they share a centralized database server. To keep both servers' app caches in sync we have set up a JMS broker in between. When a cache entry is cleared on one server, it sends a message to JMS; since the other server is registered as a listener, it receives the message and clears the particular entry based on the message content.
Since this messaging system adds latency in clearing the cache entry, for some amount of time there will be inconsistency between application level caches.
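The invalidation scheme described above can be sketched as follows. This is a simplified, JDK-only illustration: InvalidationBus is a hypothetical in-process stand-in for the JMS topic and delivers messages synchronously, whereas a real broker delivers them asynchronously, which is exactly where the window of inconsistency comes from.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

class NodeCache {
    /** Hypothetical stand-in for a JMS topic: every subscriber receives every message. */
    static class InvalidationBus {
        private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

        void subscribe(Consumer<String> subscriber) {
            subscribers.add(subscriber);
        }

        void publish(String key) {
            // Synchronous here; a real broker delivers later, so other nodes
            // can serve the stale entry until the message arrives.
            for (Consumer<String> subscriber : subscribers) {
                subscriber.accept(key);
            }
        }
    }

    private final Map<String, String> entries = new ConcurrentHashMap<>();
    private final InvalidationBus bus;

    NodeCache(InvalidationBus bus) {
        this.bus = bus;
        // Each app server registers a listener that evicts the named entry.
        bus.subscribe(key -> entries.remove(key));
    }

    void put(String key, String value) {
        entries.put(key, value);
    }

    String get(String key) {
        return entries.get(key);
    }

    /** Clears the entry on every node, including this one, via the bus. */
    void invalidate(String key) {
        bus.publish(key);
    }

    public static void main(String[] args) {
        InvalidationBus bus = new InvalidationBus();
        NodeCache server1 = new NodeCache(bus);
        NodeCache server2 = new NodeCache(bus);
        server1.put("config:x", "old");
        server2.put("config:x", "old");
        server1.invalidate("config:x");
        System.out.println(server2.get("config:x"));   // entry is gone on the other node
    }
}
```

Only the key travels over the wire, so messages stay small; the next read on each node reloads the value from the database.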
So we thought of having a centralized cache server to avoid all the extra work needed to keep the caches in sync.
We are thinking of using Ehcache/Terracotta or Hazelcast. These caches hold result sets, lock info, and some system-specific variables.
Please suggest the best cache solution for us.
I probably can't suggest the best solution for you but I'll try to give some ideas:
Hazelcast: offers a very easy-to-use distributed map (and lots of other things worth having a look at; the distributed SQL query is very neat):
Map<String, Object> map = Hazelcast.getMap("xxx");
and you are done. Work on the map using the standard APIs. Hazelcast config/setup is quite easy (compared to Ehcache/TC). The monitoring webapp is also easy to use and helpful, but there are things missing. Performance should be more than sufficient for a small cluster (like your 2 servers).
Ehcache/Terracotta: would introduce a new infrastructure component to your setup (Terracotta Server) - may be a downside. Using this setup is in my experience quite intense in terms of things to learn and try out. The promise is enterprise class level performance and monitoring facilities.
If you don't have extremely high performance requirements, I personally would go for Hazelcast and avoid the complexity of Ehcache/TC.
We have been using a centralized Memcached server (as Hibernate 2nd-level cache and for other caching requirements) and it's working well for us. We are using Memcached with the XMemcached client, and so far it's working without any problem.