Options for In-memory databases (Open source and Java-based)


I have a web app that makes external web service calls on behalf of its clients. I want to cache the data returned by some of these web services in the web app so that other clients can reuse it and run filters and queries on the cached data.
The current architecture of the web app uses Apache Camel, Spring and Jetty. I'm looking for in-memory database options, with their pros and cons.

Hazelcast (Java API) - you can distribute the in-memory datagrid (with map, multimap, sets, lists, queues, topics) over multiple nodes very easily & use load/store interface implementation with a disk based DB. You can do something similar with EHCache.
Redis is another option (use the Java client to access it). You can simply configure the conf file to write data to disk (or avoid it altogether) & should not have to write your own load/store classes.
Besides these, there are a number of other options you could use. I'm not sure whether you are only looking at open source options, or whether you need a distributed solution.
Hope it helps.

Have you considered using MemCached? It is not a database, but a caching system you can control from inside your application.
Here are a few more thoughts about in-memory databases. First, almost every modern RDBMS has a memory caching system inside it. The more memory you give to the database server (and configure for caching), the more data it will keep in memory for later use. If you put together a system with enough memory to cache all the tables, you will have an "in memory" cache without the overhead of another database.
Most total "in memory" databases are used for high volume/large data systems where performance is totally key. And, because they are for extreme performance systems, you are going to pay for them. Or more specifically, pay extra for them. For example, the SAP/Sybase DB's that support full in-memory can cost you from 40% to 300% more than our existing products.
So, in answer to your question, do you really need one?

Try Redisson - distributed and scalable familiar Java data structures (Set, Map, ConcurrentMap, List, Queue, Lock, AtomicLong, CountDownLatch, Publish / Subscribe) on top of the in-memory db Redis.

Related

Custom caching implementation in Java

I want to implement some sort of lightweight caching in Java which is easy to integrate and easy to deploy with a Java application.
The cache layer will sit between the application and the database layer: no database caching, no Spring, no Hibernate, no EHcache, no http caching.
We can use the file system or a nano database so that the cache can be restored after the process restarts.
I tried LRU Cache:
http://stackoverflow.com/questions/224868/easy-simple-to-use-lru-cache-in-java
http://www.programcreek.com/2013/03/leetcode-lru-cache-java/
But I am not sure what to do after overflow: should I save the overflowing data into a database (and which database would be best for fast inserts and lookups), or should I use the file system?
Does anyone have better input on implementing a caching mechanism in Java?

But I am not sure what to do after overflow: should I save the overflowing data into a database (and which database would be best for fast inserts and lookups), or should I use the file system?
It depends on the use case. If your cached values are very big, you can store each of them in its own file and use the hash of the cache key as the file name.
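The file-per-entry idea can be sketched roughly as follows. This is only an illustration, not a tested design: the class name FilePerEntryStore, the choice of SHA-256 and the write-then-move step are all assumptions of this sketch, and it does nothing about eviction or expiry.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical store: one file per cached value, named by a hash of the cache key. */
class FilePerEntryStore {
    private final Path dir;

    FilePerEntryStore(Path dir) {
        try {
            this.dir = Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** File name for a key: hex-encoded SHA-256 of the key's UTF-8 bytes (an assumed choice). */
    static String fileNameFor(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    void put(String key, byte[] value) {
        try {
            // Write to a temp file first, then move it into place, so that a
            // concurrent reader is less likely to see a half-written value.
            Path tmp = Files.createTempFile(dir, "entry", ".tmp");
            Files.write(tmp, value);
            Files.move(tmp, dir.resolve(fileNameFor(key)), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the stored bytes, or null if there is no file for this key. */
    byte[] get(String key) {
        try {
            return Files.readAllBytes(dir.resolve(fileNameFor(key)));
        } catch (NoSuchFileException e) {
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void remove(String key) {
        try {
            Files.deleteIfExists(dir.resolve(fileNameFor(key)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Hashing the key keeps file names at a fixed length and free of characters the file system might reject. Two different keys could in principle hash to the same name, so a more careful version would also store the key inside the file and compare it on read.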
If your values are small, storing them as separate files would add a lot of overhead, so it is better to store the cached entries in one or a few files. To implement this you need to learn about "external indexes" and "memory management" or "free space management" (e.g. best fit, next fit and compaction strategies). This actually leads to the implementation of a tiny database, so maybe just use one :) Some that come to my mind: LevelDB, MapDB, LMDB, RocksDB
Keep in mind that cache operations come in concurrently from the application, so the cache may evict a value while a request for the same key comes in at the same time. Will you implement just the basic operations like Cache.get and Cache.put, or also CAS operations like Cache.putIfAbsent? Do you want to make efficient use of multi-core systems, as they are common today?
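To make the difference between basic and CAS-style operations concrete, here is a minimal sketch that delegates to java.util.concurrent.ConcurrentHashMap. The class name CasCacheSketch and the getOrLoad method are made up for this illustration; there is no eviction or expiry here, and those are exactly the parts that make a real cache hard.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache facade: basic and CAS-style operations, no eviction. */
class CasCacheSketch<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();

    // Basic operations: last writer wins.
    V get(K key) {
        return map.get(key);
    }

    void put(K key, V value) {
        map.put(key, value);
    }

    /** Stores the value only if the key has no mapping; returns true if it was stored. */
    boolean putIfAbsent(K key, V value) {
        return map.putIfAbsent(key, value) == null;
    }

    /** Compare-and-swap: replaces the value only if the current one equals 'expected'. */
    boolean replace(K key, V expected, V update) {
        return map.replace(key, expected, update);
    }

    /** Loads a missing value atomically, so concurrent callers for the same key run the loader once. */
    V getOrLoad(K key, Function<? super K, ? extends V> loader) {
        return map.computeIfAbsent(key, loader);
    }
}
```

With only get and put, two threads that both miss on the same key will both load the value and both store it. putIfAbsent, replace and computeIfAbsent let the cache itself decide the winner.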
Still, when using a tiny database, you will need to prepare for some months of engineering work.
Does anyone have better input on implementing a caching mechanism in Java?
You can read my blog at cruftex.net for some more input to implement lightweight and fast caching in Java.
For a cache implementation with overflow you can take a look at imcache. But imcache is not a fully-fledged generic cache because, for example, CAS operations are missing; see the Cache interface.
My own high performance Java cache implementation, cache2k, features CAS operations, events, loaders & writers, expiry, etc., and it will eventually get some overflow to disk, too. However, I am not sure about the time frame... If you are interested in working in this area: contributions are welcome!

Is there a standard way of synchronising a Map of objects across a network?

In Java, I have a HashMap containing objects (which can be serializable, if it helps). Elsewhere on a network, I have another HashMap in another copy of the application that I would like to stay in sync with the first.
For example, if on computer A someone runs myMap.put("Hello", "World"); and on computer B someone runs myMap.put("foo", "bar");, then after some time delay for changes to propagate, both computers would have myMap.get("Hello") == "World" and myMap.get("foo") == "bar".
Is this requirement met by an existing facility in the Java language, a library, or some other program? If this is already a "solved problem" it would be great not to have to write my own code for this.
If there are multiple ways of achieving this I would prefer, in priority order:
Changes are guaranteed to propagate 100% of the time (doesn't matter how long it takes)
Changes propagate rapidly
Changes propagate with minimal bandwidth use between computers.
(Note: I have had trouble searching for solutions as results are dominated by questions about synchronizing access to a Map from multiple threads in the same application. This is not what my question is about.)

You could look at the Hazelcast in-memory data grid.
It's an open source solution designed for distributed architectures.
It maps really well to your problem since the Hazelcast IMap extends java.util.Map.
Link: Hazelcast IMap

What you are trying to do is called clustering between two nodes. Here are some solutions:
Serialization: make your map serializable, write and read the state of the map at regular intervals, and sync it. This is the core, basic way to achieve your functionality, but with serialization you have to manage the sync of the map manually (i.e. you have to write code for that).
Hazelcast: an open source distributed caching mechanism. It has a good API and a rich library for building a cluster environment and sharing data between different nodes.
Coherence Web: Oracle also provides a mechanism to achieve clustering.
Ehcache: a cache library introduced in 2003 to improve performance by reducing the load on underlying resources. Ehcache is used both for general-purpose caching and for caching Hibernate (second-level cache), data access objects, security credentials, and web pages. It can also be used for SOAP and RESTful server caching, application persistence, and distributed caching.
Among all of the above, Hazelcast has the best API; go through it and it will surely help you.

Can a streaming collection be implemented in Java?

I need to implement a utility server that tracks a few custom variables sent from any other server. To track the variables, a key-value collection, either JDK-defined or custom, needs to be used.
Here are a few considerations:
Keeping all the variables in memory of the server all the time is memory intensive.
This server needs to be a very lightweight server and I do not want heavy database operations.
Is there a pre-defined streaming collection which can serialize the data once memory use passes a threshold and retrieve it as needed?
I hope I have defined the problem clearly.
Please suggest a better approach if there is one.

This looks very promising, but it is still in the development stage:
JDBM3
Edit: the current version of the file-backed collections is MapDB.

Database
What you've described sounds exactly like you should use a database (i.e. indexed key/value store, too big for memory but want performance benefits of in-memory caching where possible).
I'd recommend a lightweight embedded database such as H2 - it's small, fast and should suit your purposes very well.

Have you thought of using an off-the-shelf NoSQL key-value store? Redis, for example?
If you want it to be Java only, you have the option of using a library like Ehcache, which would have the functionality you need.

Centralized cache server. (Ehcache or Hazelcast)

Currently we have 2 app servers; each has an application-level cache, and they share a centralized database server. To keep both servers' app caches in sync we have set up a JMS broker in between. When a cache entry is cleared on one server, it sends a message to JMS; since the other server is registered as a listener, it gets the message and clears the particular entry based on the message content.
Since this messaging system adds latency to clearing the cache entry, for some amount of time there will be inconsistency between the application-level caches.
So we thought of having a centralized cache server to avoid all this extra work needed to keep the caches in sync.
We are thinking of using Ehcache/Terracotta or Hazelcast. These caches hold result sets, lock info, and some system-specific variables.
Please suggest the best cache solution for us.

I probably can't suggest the best solution for you but I'll try to give some ideas:
Hazelcast: offers a very easy to use distributed map (and lots of other things worth a look - the distributed SQL query is very neat):
Map<String, Object> map = Hazelcast.getMap("xxx");
and you are done. Work on the map using the standard APIs. Hazelcast config/setup is quite easy (compared to Ehcache/TC). The monitoring webapp is also easy to use and helpful, but there are things missing. Performance should be more than sufficient for a small cluster (like your 2 servers).
Ehcache/Terracotta: would introduce a new infrastructure component to your setup (Terracotta Server), which may be a downside. In my experience, using this setup is quite intense in terms of things to learn and try out. The promise is enterprise-class performance and monitoring facilities.
If you don't have extremely high performance requirements, I personally would go for Hazelcast and avoid the complexity of Ehcache/TC.

We have been using a centralized Memcached server (as Hibernate 2nd level cache and for other caching requirements) and it's working well for us. We are using Memcached with the XMemcached client and so far it's working without any problems.

Using ehcache in front of memcached

We have a web application that loads a User object from the database. It's a high volume application with thousands of concurrent users, so we're looking at ways to cache the User objects to minimise database load.
We are currently using ehcache but are looking at memcached to lower the memory requirements of the application and make it more scalable.
The problem we are currently having with memcached is the CPU load that serializing the User instance brings. We're looking at ways to speed up the serialization, but are also considering whether we could use a smaller ehcache cache backed by a memcached server.
Has anyone had any experience using ehcache backed by memcached (ie. first look in ehcache, if user not there, look in memcache, if not there look in database)?
Any downsides to this kind of approach?

If you're willing to move away from Ehcache, you could consider Infinispan, which now includes integration with memcache. It's a bit more of a faff to get working than Ehcache, but not too much.
Starting with version 4.1, Infinispan distribution contains a server module that implements the memcached text protocol. This allows memcached clients to talk to one or several Infinispan backed memcached servers. These servers could either be working standalone just like memcached does where each server acts independently and does not communicate with the rest, or they could be clustered where servers replicate or distribute their contents to other Infinispan backed memcached servers, thus providing clients with failover capabilities.

It does make sense to do what you're suggesting. We've experienced the same issue with memcached in that the overhead to serialize objects back and forth isn't worth using it alone for a high volume application. Having a local cache reduces load on the application side while memcached reduces load on the database side. The downside comes with the additional complexity of writing two layers of caches and maintaining cache coherency. I'd try to minimize where you need to use it.
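As a rough illustration of the lookup order discussed here (local cache, then memcached, then database), the following sketch uses only the JDK. The names TwoTierCache, remote and database are made up: remote is a plain Map standing in for a memcached client, and database is a Function standing in for the real query. It is an assumption-laden sketch, not how ehcache or any memcached client actually wires this up.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical two-tier read-through cache: small local LRU, then a remote tier, then the database. */
class TwoTierCache<K, V> {
    private final Map<K, V> local;          // tier 1: small in-process LRU, holds live objects
    private final Map<K, V> remote;         // tier 2: stand-in for a memcached client (assumption)
    private final Function<K, V> database;  // stand-in for the real database lookup (assumption)

    TwoTierCache(int localCapacity, Map<K, V> remote, Function<K, V> database) {
        // Access-ordered LinkedHashMap that drops its eldest entry once over capacity.
        this.local = Collections.synchronizedMap(new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > localCapacity;
            }
        });
        this.remote = remote;
        this.database = database;
    }

    V get(K key) {
        V value = local.get(key);
        if (value != null) {
            return value;                   // local hit: no serialization cost at all
        }
        value = remote.get(key);            // a real client would deserialize here
        if (value == null) {
            value = database.apply(key);
            if (value == null) {
                return null;                // unknown key; nothing is cached
            }
            remote.put(key, value);
        }
        local.put(key, value);
        return value;
    }

    /** Clears both tiers on this node only; other nodes' local tiers stay stale. */
    void invalidate(K key) {
        local.remove(key);
        remote.remove(key);
    }
}
```

The comment on invalidate is where the coherency cost mentioned above shows up: each application server has its own local tier, so an update on one node needs some way to reach the others, for example a short local expiry or an invalidation message.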

Infinispan can store objects as instances and minimize serialization overhead, also instead of replicating the data on each node it can distribute data to make better usage of your memory, or you can limit the amount of entries to keep in memory.
You can also have it just send invalidation messages to other nodes when you update a value, instead of sending the serialized values around.
In addition, when it still needs to serialize, it uses a very efficient Marshaller instead of Java's serialization, and since version 5 you can plug in your own Externalizers to customize the wire format of some types to give it an extra push (generally not needed, but nice to have).
In case you were looking at memcached for other reasons as well, be aware that Infinispan also "speaks" the memcached text protocol, so if you have other clients around you can still integrate with it.
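The idea of sending invalidation messages instead of serialized values can be illustrated with a small in-process sketch. This is not Infinispan's API: InvalidationBus and InvalidatingNode are hypothetical names, and the bus is a plain list of listeners standing in for a real cluster transport.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical in-process stand-in for a cluster transport; it carries keys only, never values. */
class InvalidationBus<K> {
    private final List<InvalidatingNode<K, ?>> nodes = new CopyOnWriteArrayList<>();

    void register(InvalidatingNode<K, ?> node) {
        nodes.add(node);
    }

    void broadcast(InvalidatingNode<K, ?> sender, K key) {
        for (InvalidatingNode<K, ?> node : nodes) {
            if (node != sender) {
                node.onInvalidate(key);
            }
        }
    }
}

/** Hypothetical cache node that keeps live objects locally and only tells peers which key changed. */
class InvalidatingNode<K, V> {
    private final Map<K, V> local = new ConcurrentHashMap<>();
    private final InvalidationBus<K> bus;

    InvalidatingNode(InvalidationBus<K> bus) {
        this.bus = bus;
        bus.register(this);
    }

    V get(K key) {
        return local.get(key);
    }

    /** Caches a value read from the backing store; nothing is sent to other nodes. */
    void cacheLocally(K key, V value) {
        local.put(key, value);
    }

    /** Updates the local copy and tells the other nodes to drop theirs. */
    void update(K key, V value) {
        local.put(key, value);
        bus.broadcast(this, key);
    }

    /** Called when another node changed this key: drop the stale copy, reload on next access. */
    void onInvalidate(K key) {
        local.remove(key);
    }
}
```

Only the key travels between nodes, so the value never has to be serialized for replication. The trade-off is that the other nodes must reload the value from the backing store the next time they need it.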

You could pretty simply override net.sf.ehcache.Cache.createDiskStore():
new Cache(..) {
    protected Store createDiskStore() {
        if (isDiskStore()) {
            // default: return DiskStore.create(this, diskStorePath);
            MemcachedStore store = new MemcachedStore(..);
            getCacheConfiguration().addConfigurationListener(store);
            return store;
        } else {
            return null;
        }
    }
}
MemcachedStore is a custom implementation of net.sf.ehcache.store.Store that you'll have to write yourself. That's not trivial, but then again, starting from DiskStore it shouldn't be too difficult.

You can't replace the DiskStore in ehcache because it's final. You can implement a new OffHeapStore and plug it in like that. This is how BigMemory works. There is an Apache project called DirectMemory doing the same thing.
See my post here for more detail:
http://forums.terracotta.org/forums/posts/list/0/8833.page#40635

This article describes how to use an in-process cache in front of a distributed cache in a Spring application by defining our own MultiTieredCacheManager and MultiTieredCache:
Multi Tiered Caching - Using in-process Cache in front of Distributed Cache
