Distributed cache for key-value pairs - java

I'm looking for a distributed cache for key-value pairs with these features -
Persistence to disk
Open Source
Java Interface
Fast read/write with minimum memory utilisation
Easy to add more machines to the database (Horizontally Scalable)
What are the databases that fit the bill?

The Redisson framework also provides distributed caching capabilities on top of Redis.

There are a lot of options that you can make use of.
Redis - the one you've stated yourself. It's a distinct process: very fast and key-value for sure, but it's not an "in-memory with your application" solution; you'll always do socket I/O to reach the Redis process.
It's not written in Java, but it provides a decent Java driver to work with, and there is a Spring integration as well.
If you want a Java-based solution, consider the following:
memcached - a distributed cache (written in C, but with mature Java clients)
Hazelcast - a data grid; it's much more than a simple key-value store, but you might be interested in it as well
Infinispan - created by the folks at JBoss
Ehcache - a popular distributed cache
Hope this helps

Related

Distributed Keying / Partitioning / Sharding Java library

I receive HTTP requests on my N frontend machines and want to have them processed by my K backend machines, routed by a certain key in the data.
The keying has to be stable and consistent. I also want to scale the frontend and backend machines with the load, without interruption. I am fine with losing very little data while scaling.
I think I could achieve my goal with Kafka or Apache Flink. Maybe Hazelcast could be used as well, but they all seem heavyweight and too much for my case.
Is there a library that just solves the aspect of keying / partitioning / sharding in a distributed way?
Bonus points for an Rx integration library.
What makes you think Hazelcast is heavier?
Hazelcast actually provides everything within a single environment: sharding, consistent hashing, partitioning, high availability of the data, and so on. Plus, the easy and straightforward APIs take away a lot of the pain of writing code. All you are required to do is start an HC cluster using the startup scripts and invoke APIs like map.put(key, value) / map.get(key); it's that simple, as everything else is taken care of by Hazelcast behind the scenes.
In this kind of scenario I usually use a cluster technology that tracks membership (Hazelcast, or my favorite JGroups, which is much lighter than Hazelcast).
Then combine the current cluster size/members with a consistent hashing function like Guava's (see https://github.com/google/guava/wiki/HashingExplained ).
The consistent hash would take your data as the key and the current cluster member count as the number of buckets, and you would get back a consistent answer for that same number of buckets.
Then use the computed bucket to route your request.
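The routing step described above can be sketched with the JDK alone. This is the jump consistent hash algorithm (Lamping & Veach), which is what Guava's `Hashing.consistentHash` implements under the hood; the class and method names here are my own, illustrative choices:

```java
// Sketch of consistent-hash request routing (stdlib only).
// Implements the jump consistent hash algorithm that Guava's
// Hashing.consistentHash uses; names are illustrative.
class ConsistentRouter {

    /** Maps a 64-bit key to a bucket in [0, buckets). */
    static int bucketFor(long key, int buckets) {
        long b = -1, j = 0;
        while (j < buckets) {
            b = j;
            key = key * 2862933555777941757L + 1;
            j = (long) ((b + 1) * (2147483648.0 / ((key >>> 33) + 1)));
        }
        return (int) b;
    }

    public static void main(String[] args) {
        // Route a request key to one of 5 backend machines.
        long requestKey = "user-42".hashCode();
        System.out.println(bucketFor(requestKey, 5));
        // Growing the cluster to 6 nodes moves only keys that land
        // in the new bucket; all others keep their assignment.
        System.out.println(bucketFor(requestKey, 6));
    }
}
```

The key property for this use case is monotonicity: when the member count grows from N to N+1, a key either keeps its bucket or moves to the new one, so very little data is reshuffled while scaling.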

How to start Hazelcast as a simple cache tool inside a Spring application

Here is an example of how to start Hazelcast without networking. For my purposes I need to run Hazelcast embedded, only as a simple cache. But the original question was about testing - can I use that code in production when I do not need a separate Hazelcast server?
Hazelcast is designed to be a distributed system; it wasn't designed to be an in-process cache. Because of its distributed nature, many design decisions don't make it a good candidate for your use case. You will see overhead from serialization and the network stack (even in local single-node embedded mode).
We're planning to improve this situation by providing optimizations for the local cache use case, but there is no ETA at this point. You will see some features related to this use case in the next couple of releases.
I would suggest taking a look at Caffeine. It has JCache and Spring Boot integrations. I would suggest sticking to the JCache integration because it will make your code portable. If in the future you decide to go distributed, you just need to replace the Caffeine jars with Hazelcast.
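To make the "in-process cache" distinction concrete, here is a minimal sketch of a bounded in-heap cache using only the JDK. It only illustrates the idea; Caffeine provides far better eviction (Window TinyLFU), loading, statistics, and thread safety, and is what you'd actually use:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal in-process LRU cache sketch (illustrative only, not
// thread-safe). A real application would use Caffeine instead.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true -> LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-used entry once the bound is hit.
        return size() > maxEntries;
    }
}
```

Note there is no serialization and no network hop here: every lookup is a plain in-heap map access, which is exactly the overhead an embedded Hazelcast node cannot avoid.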
Feel free to ask if you have any questions.
Thank you

Java collections over arangoDB

I am relatively new to ArangoDB. Is there any library that implements Java collections over ArangoDB?
i.e. one that stores values in the database and extracts them as and when needed. I am looking for something similar to Redisson (https://github.com/mrniko/redisson), which is implemented over Redis.
Sadly it is not possible at the moment. But if you want to, you can modify the ArangoDB Java driver (github.com/arangodb/arangodb-java-driver).
Everyone can contribute to the project and if you need any help with the work just ask the ArangoDB Team.
It's important to note that Redis is an in-memory (but persistent-on-disk) database. This makes for screaming fast read/write operations, but at a high memory cost. ArangoDB, on the other hand, compromises some speed to limit the memory footprint, and does so quite well.
However, because of this difference, it does not necessarily make sense to do for ArangoDB what Redisson does for Redis - that is, expose its own Java Collection implementations which allow more direct interaction with the in-memory entities. You would most likely run into unwanted memory issues. Since memory optimization is an important (and nice!) feature of ArangoDB, I would avoid going down this path.
That being said, there are newer Java libraries available to help you easily integrate with ArangoDB.
JNoSQL is a solid "JPA or ORM-like" framework written specifically for NoSQL databases. ArangoDB is one of many that is supported. It exposes convenient annotations and easily supports classic DAO/Repository patterns. There are some good code examples that'll help point you in the right direction.
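The Redisson-style idea from the question, exposing a `java.util.Map` backed by a remote store, can be sketched as a thin facade over any key-value backend. Everything below is hypothetical illustration: a real version would delegate to the ArangoDB Java driver rather than the in-memory backend used here, and would need to weigh the memory trade-offs discussed above.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the pattern Redisson uses over Redis:
// a java.util.Map whose operations delegate to a key-value store.
interface KeyValueBackend {
    String get(String key);
    void put(String key, String value);
    void remove(String key);
    Set<String> keys();
}

class StoreBackedMap extends AbstractMap<String, String> {
    private final KeyValueBackend backend;

    StoreBackedMap(KeyValueBackend backend) { this.backend = backend; }

    @Override public String get(Object key) {
        return backend.get((String) key);
    }

    @Override public String put(String key, String value) {
        String old = backend.get(key);
        backend.put(key, value);
        return old; // Map contract: return the previous value
    }

    @Override public String remove(Object key) {
        String old = backend.get((String) key);
        backend.remove((String) key);
        return old;
    }

    @Override public Set<Entry<String, String>> entrySet() {
        // Snapshot the store; AbstractMap derives size()/isEmpty()
        // and the other read operations from this view.
        Map<String, String> snapshot = new HashMap<>();
        for (String k : backend.keys()) {
            String v = backend.get(k);
            if (v != null) snapshot.put(k, v);
        }
        return snapshot.entrySet();
    }
}
```

Each call translates to one or two round trips to the store, which is why such facades are convenient but need care in hot paths.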

Second level cache for java web app and its alternatives

Between the transitions of the web app I use a Session object to save my objects in.
I've heard there's a program called memcached, but there's no compiled version of it on the site, and some people think there are real disadvantages to it.
Now I want to ask you:
What are the alternatives, and the pros and cons of the different approaches?
Is memcached painful for sysadmins to install? Is it difficult to embed into existing infrastructure from a sysadmin's perspective?
What about using a database to hold temporary data between web app transitions?
Is that a normal practice?
Databases indeed have a cache already. A well-designed application should try to leverage it to reduce disk I/O.
The database cache works at the data level. That's why other caching mechanisms can be used to address different levels. At the Java level, you can use the second-level cache of Hibernate, which can cache entities and query results. This can notably reduce the network I/O between the app server and the database.
Then you may want to address horizontal scalability, that is, adding servers to manage the load. In this case, the second-level cache needs to be distributed across the nodes. This exists (see JBoss Cache), but can get slightly complicated to manage.
Distributed caches tend to work better if they have a simpler scheme based on key/value pairs. That's what memcached is, but there are also other similar solutions. The biggest problem with distributed caches is invalidation of outdated entries -- which can itself turn into a performance bottleneck.
Don't think that you can use a distributed cache as-is to make your performance problems vanish. Designing a scalable distributed architecture requires experience and is always a matter of trade-offs between what to optimize and what not to.
To come back to your question: for a regular application, there is IMHO no need for a distributed cache. Decent disk I/O and network I/O usually lead to decent performance.
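Whatever the tier (database cache, Hibernate second-level cache, memcached), the pattern being layered is cache-aside: check the cache first and only hit the slower tier on a miss. A minimal single-JVM sketch with plain JDK types (all names here are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch: go to the expensive tier (database, remote
// service) only on a cache miss. Illustrative names only.
class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. a database query

    CacheAside(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        // computeIfAbsent consults the cache first and invokes the
        // loader only when the key is missing.
        return cache.computeIfAbsent(key, loader);
    }

    void invalidate(K key) {
        // Called when the underlying data changes; stale-entry
        // invalidation is the hard part once the cache is distributed.
        cache.remove(key);
    }
}
```

The invalidation method is exactly where the "outdated entries" problem mentioned above lives: in a single JVM it's one `remove`, but across nodes it requires messaging or TTLs.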
EDIT
For non-persistent objects, you have several options:
The HttpSession. Objects need to implement Serializable. The exact way the session is managed depends on the container. In a cluster, the session is usually replicated twice, so that if one node crashes you still have one copy. Session affinity then routes requests to the server that has the session in memory.
Distributed cache. A system like memcached may indeed make sense, but I don't know the details.
Database. You could of course dump any Serializable object into the database as a BLOB. Can be an option if the web servers are not as reliable as the database server.
Again, for a regular application, I would try to go as far as possible with the HttpSession.
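The database option above, dumping a Serializable object into a BLOB, relies on Java serialization, which the JDK provides out of the box. A minimal codec sketch (class and method names are my own):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Sketch: turn a Serializable session object into a byte[] suitable
// for a BLOB column, and restore it later. Illustrative names only.
class BlobCodec {

    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj); // obj must implement Serializable
        }
        return bos.toByteArray();
    }

    static Object fromBytes(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }
}
```

The same Serializable requirement applies to HttpSession replication, so objects designed for option 1 work unchanged with option 3.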
How about Ehcache? It's an easy-to-use, pure-Java solution ready to plug into Hibernate. As far as I remember, it's supported by containers.
It's quite painless in my experience.
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache
This page should have everything that you need (hopefully !)

Terracotta and Hibernate Search

Does anyone have experience with using Terracotta with Hibernate Search to satisfy application Queries?
If so:
What magnitude of "object updates" can it handle? (How's the performance?)
What kind of performance do the queries have?
Is it possible to use Terracotta Hibernate Search without even having a backing database, to satisfy all "queries" in memory?
I am Terracotta's CTO. I spent some time last month looking at Hibernate Search. It is not built in a way to be clustered transparently by Terracotta. Here's why in a nutshell: Hibernate has a custom-built JMS replication of Lucene indexes across JVMs.
The basic idea in Search is that talking to local disk under Lucene works really well, whereas fragmenting or partitioning Lucene indexes across the network introduces so much latency as to make Lucene seem slow when it is not Lucene's fault at all. To that end, Hibernate Search doesn't rely on JBossCache or any in-memory partitioning / caching schemes; it instead relies on JMS and each JVM's local disk in order to provide up-to-date indexing across a cluster with simultaneously low latency. The beauty of Hibernate Search is that standard Hibernate queries and more can then be launched through Hibernate at these natural-language indexes on each machine.
At Terracotta it turns out we had a similar idea to Emmanuel's and built a SearchableMap product on top of Compass. Each machine gets its own Compass store, and the store is configured to spill to disk locally. Terracotta is used to create a multi-master writing capability where any JVM can add to the index, and the delta is sent through Terracotta to be replayed / reapplied locally to each disk. It works just like Hibernate Search, but with DSO as the networking protocol in place of JMS, and without the nice Hibernate interfaces but with Compass interfaces instead.
I think we will support Hibernate Search with help from JBoss (they would need to factor out the JMS implementation as pluggable) by the end of the year.
Now to your questions directly:
1. Object updates per second in Hibernate or SearchableMap should be quite high, because both send only deltas. In Hibernate's case it is a function of your JMS provider. In Terracotta it is scalable just by adding more Terracotta servers to the array.
2. Query performance in both is very fast. Local memory performance in most cases. And if you need to page in from disk, it turns out most OSes do a good job and can respond far faster to queries than any network-based clustering can.
3. It will be, I think, once we get JBoss to factor out their JMS assumptions, etc.
Cheers,
--Ari
Since people on the Hibernate forums keep referring to this post, I feel the need to point out that while Ari's comments were correct at the beginning of 2009, we have been developing and improving a lot since.
Hibernate Search provides a set of backend channels out of the box, like the already mentioned JMS-based one and a more recent addition using JGroups, but we have also made it pretty easy to plug in alternative implementations or override parts of them.
In addition to using a custom backend, it's possible since version 4 to replace the whole strategy: instead of changing only the backend implementation, you can use an IndexManager which follows a different design and doesn't use a backend at all. At this time we have only two IndexManagers, but we're working on more alternatives; again, the idea is to provide nice implementations for the most common use cases.
It does have an Infinispan-based backend for very quick distribution of the index across different nodes, and it should be straightforward to contribute one based on Terracotta or any other clustering technology. More solutions are coming.
