CouchbaseClient vs CouchbaseCluster - Java

I am trying to implement Couchbase in my application. I am confused between
com.couchbase.client.CouchbaseClient
and
com.couchbase.client.java.CouchbaseCluster.
I tried to google CouchbaseClient vs CouchbaseCluster but couldn't find which one is better, or their pros and cons.
I know there are three types of Couchbase clients: one is vBucket-aware, and another is the traditional old client that supports auto clustering via a Moxi server.
Can someone who has already used Couchbase provide me with links or detailed information about these two Java clients?
I have done some homework on CouchbaseClient and CouchbaseCluster, such as inserting, updating, and deleting documents via both.
With CouchbaseClient, stored documents are serialized and cannot be viewed or edited via the Couchbase Admin Console, whereas documents such as StringDocument, JsonDocument, and JsonArrayDocument stored via CouchbaseCluster can be viewed and edited in the Couchbase Admin Console.
My requirement is a client that is auto-configuring (vBucket-aware): if I add new nodes to the cluster, it should auto-detect them; if any node fails, it should auto-detect that and not throw exceptions. Further, if I add a new cluster, I'd like it to be auto-detected and used. I don't want to modify the application code for any of this.

There are now two generations of official Couchbase Java SDKs:
Generation 1 (currently 1.4.x, not sure of the patch version) is derived from an old memcached client, spymemcached... it now receives bug fixes only, and it's the one where CouchbaseClient is the primary API.
Generation 2 is a rewrite, layered into a core artifact and a java-client artifact in Maven. The current version is 2.1.3. This is the one where you deal with CouchbaseCluster.
In the old one, you'd have to instantiate one CouchbaseClient for each bucket you deal with.
In the new generation, the notions of cluster and bucket are first-class citizens, and you can (and should) reuse the same Cluster instance to open references to different Buckets. The Buckets should also be reused (don't open the same bucket several times). Resources are better shared this way.
Also, the new generation has more coherent APIs, uses RxJava for asynchronous processing, etc. It is cluster-aware and will receive updates to the cluster topology (new nodes, failing nodes, etc.).
Note that the two generations are different artifacts in Maven (the old one is couchbase-client, while the new one is java-client).
There's no way you can get such a notification if you "add new cluster", but that operation doesn't really make sense to me...
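For illustration, here is a minimal sketch of the generation-2 style described above, assuming the java-client 2.x API; the seed node, bucket name, key, and document content are placeholders:

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;

public class CouchbaseExample {
    public static void main(String[] args) {
        // One Cluster instance for the whole application; it tracks topology changes itself.
        Cluster cluster = CouchbaseCluster.create("127.0.0.1");

        // Open each bucket once and reuse the reference.
        Bucket bucket = cluster.openBucket("default");

        // JsonDocument contents are visible and editable in the Admin Console.
        JsonObject user = JsonObject.create().put("name", "alice").put("age", 30);
        bucket.upsert(JsonDocument.create("user::1", user));

        JsonDocument loaded = bucket.get("user::1");
        System.out.println(loaded.content());

        cluster.disconnect();
    }
}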

Objectify v5 and v6 at the same time in a Google App Engine Java 8 standard project

We want to do a zero-downtime migration of a Google App Engine Java 8 standard project to another region.
Unfortunately Google does not support this, so it has to be done manually.
One could export the Datastore and import it again, but there must be no downtime and the data must always be consistent.
So the idea came up to create the project in the new region and embed Objectify 5 there with all entities (definitions, not data) used in the old project. Any new data goes in the "new Datastore" attached to this new project.
All data not found in this new Datastore would be queried (if necessary) using Objectify 6 connected to the "old" project via the Datastore API.
The advantage would be to not export any data manually at all and only migrate the most important data on the fly, using the mechanism above. (There's a lot of unused garbage we did not do housekeeping for, but also some very vital data that must be on the new system.)
Is this a valid approach? I know I'll probably have to integrate Objectify from source and change package names to avoid problems on the "code side".
If there is a better approach to migrating a project to another region, we're happy to hear it.
We searched for hours without a proper result.
Edit: I'm aware that we must instantly stop requests to the old service / disable writes there. We'd solve this by redirecting traffic (HTTP) from the old project to the new one and disabling writes.
This is a valid approach for migration. The traffic from the new project can continue to do reads from the old Datastore and writes to the new one. I would like to add one more point.
Soon after this switchover you should also plan a data migration from the old Datastore to the new one through mass export and import. The app will then have to be pointed to the new Datastore even for reads: https://cloud.google.com/datastore/docs/export-import-entities
This can be done gracefully by introducing proxy connection logic in Java for connecting to the new Datastore. That is, during the data migration you put a condition in the Objectify 6 code path that checks the new Datastore for the entity; if it is not available there, you read it from the old one. This ensures zero downtime, and in the background you can silently and safely turn off the old Datastore, assuming you already have a full export of it.
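A rough sketch of that fallback pattern, using hypothetical lookup functions in place of the actual Objectify 5/6 calls (the class and function names here are illustrative, not an existing API):

import java.util.function.Function;

// Hypothetical read-through fallback: newStoreLookup would be backed by Objectify 6 on the
// new project, oldStoreLookup by the old project's Datastore access.
public class FallbackEntityReader<T> {

    private final Function<Long, T> newStoreLookup;
    private final Function<Long, T> oldStoreLookup;

    public FallbackEntityReader(Function<Long, T> newStoreLookup,
                                Function<Long, T> oldStoreLookup) {
        this.newStoreLookup = newStoreLookup;
        this.oldStoreLookup = oldStoreLookup;
    }

    public T load(long id) {
        T entity = newStoreLookup.apply(id);   // try the new Datastore first
        if (entity != null) {
            return entity;
        }
        return oldStoreLookup.apply(id);       // fall back to the old project's data
    }
}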
Reading from both the old data source and the new data source is a valid way to do migrations.

SolrCloud & data import handler

I am planning to upgrade Solr from the single-instance option to the cloud option. Currently I have 5 cores, and each one is configured with a data import handler. I have deployed a web application along with solr.war inside the Tomcat folder, which triggers full imports and delta imports periodically according to my project's needs.
Now I am planning to create 2 shards for this application, keeping half of the data from my 5 cores in each shard. I don't understand how the DIH will work in SolrCloud.
Is it fine if I start full indexing from both shards?
Or do I need to do full indexing from only one shard?
The architecture will look like the diagram below.
It all depends on how you set up your SolrCloud: using composite ID or implicit routing. Composite ID routing will take care of spreading the documents across all available shards. You can initiate the import from any SolrCloud node. In the end the cloud environment will contain the imported document indices spread across all shards.
If you use implicit routing, you have control over which shard each document's index is kept on.
You do not have to use the DIH. Alternatively, you can write a small app that uses the Solr client to populate the index, which gives you more control.
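As a rough illustration of that alternative, here is a minimal SolrJ sketch, assuming the SolrJ 6.x CloudSolrClient builder; the ZooKeeper addresses, collection name, and field names are placeholders:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexLoader {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("zk1:2181,zk2:2181,zk3:2181")
                .build()) {
            client.setDefaultCollection("mycollection");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            doc.addField("title", "Example document");

            client.add(doc);    // the client routes the document to the correct shard
            client.commit();
        }
    }
}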
After lots of googling and reading I finally decided to implement DIH as follows. Please let me know your comments if you feel there will be issues with this architecture.

WebSphere propagate changes across all nodes in cluster

I have a problem with a product that I am currently working on. Essentially, there is some very commonly used (and very seldom updated) information that is retrieved from the database on server startup. We do not want to query the database every time this information is needed, because that happens very frequently. There is a way to update this information through the application (only by an admin). When this method is used, the data in the database is updated and the cached data on that single server (1 of 4) is updated. Unfortunately, if a user hits any of the other servers they will not see the updated information. Restarting the cluster remedies the problem; however, that is not a feasible solution for our production environment. Now that I have explained the situation, I am open to suggestions. Thank you for your time.
For a simple solution, you can go to the cluster in the admin console and ripple-start it. That stops/starts the nodes gracefully and one at a time. The only impact is a 25% reduction in capacity while it is working.
IBM WebSphere Application Server has a Dynamic Cache that you can use to store Java objects. The cache can be set up to use replication over a replication domain so it can be shared across a cluster.
Your code would use the DistributedMap interface to interact with the cache. All settings for the dynamic cache can be included with your application or it can be pre-configured. Examples are included in the javadoc link.
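For illustration, a minimal sketch of looking up and using the DistributedMap, assuming the default cache instance JNDI name (a custom cache instance configured with its own JNDI name would be looked up under that name instead):

import javax.naming.InitialContext;
import com.ibm.websphere.cache.DistributedMap;

public class SharedReferenceData {

    public Object read(String key) throws Exception {
        DistributedMap cache = lookupCache();
        return cache.get(key);      // served from the local replica of the shared cache
    }

    public void update(String key, Object value) throws Exception {
        DistributedMap cache = lookupCache();
        cache.put(key, value);      // replicated to the other cluster members via the replication domain
    }

    private DistributedMap lookupCache() throws Exception {
        return (DistributedMap) new InitialContext().lookup("services/cache/distributedmap");
    }
}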
(Similar to Java EE Application-scoped variables in a clustered environment (Websphere)?)
That is, I think the standard answer would be a "distributed object store". But a crude alternative (that we use) is to configure a list of server:port combinations to contact, so that each cluster member is told to update its own copy of the data.

Is there an embeddable Java alternative to Redis?

According to this thread, Jedis is the best thing to use if I want to use Redis from Java.
However, I was wondering if there are any libraries/packages providing similarly efficient set operations to those that already exist in Redis, but that can be embedded directly in a Java application without the need to set up a separate server (much like using embedded Jetty as the web server).
To be more precise, I would like to be able to do the following efficiently:
There is a large set of M users (M not known in advance).
There is a large set of N items.
We want users to examine items, one user/item at a time, which produces a stored result (in a normal database).
Each time a user arrives, we want to assign to that user the item with the least number of existing results that the user has not already seen. This produces an approximately round-robin assignment of the items over all arriving users, since we just care about getting all items looked at approximately the same number of times.
The above happens in a parallelized fashion. When M and N are large, Redis accomplishes the above much more efficiently than SQL queries. Is there some way to do this using an embeddable Java library that is a bit more lightweight than starting a Redis server?
I recognize that it's possible to write a pile of code using Java's concurrency libraries that would roughly approximate this (and to some extent, I have done that), but that's not exactly what I'm looking for here.
Have a look at Project Voldemort. It's a distributed key-value store created by LinkedIn, and it supports being embedded.
The quick start guide has a small example of running the server embedded vs. stand-alone.
// Load the server configuration from the environment (the quick start uses the VOLDEMORT_HOME variable)
VoldemortConfig config = VoldemortConfig.loadFromEnvironmentVariable();
// Start Voldemort embedded in the current JVM instead of as a stand-alone process
VoldemortServer server = new VoldemortServer(config);
server.start();
I don't know much about Redis, so I can't compare them feature for feature. In the project where we used Voldemort, we used its read-only backing store with great results. It allowed us to "precompile" a twice-daily database in our processing data center and "ship it" out to the edge data centers. That way each edge data center had a local copy of its dataset.
EDIT: After rereading your question, I wanted to add Guava's Table. This Table data structure may also be something you're looking for, and it is similar to what you get with many NoSQL databases.
Hazelcast provides a number of distributed data structure implementations that can be used as a pure Java alternative to Redis' services. You could then ship a single JAR with all the required dependencies to run your application. You may have to adjust for the slightly different primitives relative to Redis in your own application.
Commercial solutions in this space include Terracotta's Enterprise Ehcache and Oracle Coherence.
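For illustration, a minimal sketch of Hazelcast running embedded, assuming the Hazelcast 3.x API and package layout; the set name is a placeholder:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ISet;

public class EmbeddedSetExample {
    public static void main(String[] args) {
        // Starts a Hazelcast member inside this JVM; no separate server process is needed.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // A distributed set, shared automatically with any other members that join the cluster.
        ISet<String> seen = hz.getSet("seen-items");
        seen.add("item-1");
        System.out.println(seen.contains("item-1"));   // true

        hz.shutdown();
    }
}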
Take a look at LMDB (Lightning Memory-Mapped Database); I needed exactly the same thing. I deploy a Dropwizard application into a container, and adding Redis or another external dependency is painful. It seems to perform well and the project has good activity. FYI, though, I have not yet used it in production.
https://github.com/lmdbjava/lmdbjava
Google's Guava library provides friendly versions of the same set operators that Redis provides (and more).
https://code.google.com/p/guava-libraries/wiki/CollectionUtilitiesExplained
e.g.
Guava                      Redis
Sets.intersection(a, b)    sinter a b
a.size()                   scard a
Sets.difference(a, b)      sdiff a b
Sets.union(a, b)           sunion a b
Multisets are a reasonably straightforward proxy for Redis sorted sets as well.
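A small sketch of those operations with Guava (the element values are placeholders):

import com.google.common.collect.HashMultiset;
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Multiset;
import com.google.common.collect.Sets;
import java.util.Set;

public class GuavaSetOps {
    public static void main(String[] args) {
        Set<String> a = ImmutableSet.of("x", "y", "z");
        Set<String> b = ImmutableSet.of("y", "z", "w");

        System.out.println(Sets.intersection(a, b)); // [y, z]       ~ sinter a b
        System.out.println(Sets.union(a, b));        // [x, y, z, w] ~ sunion a b
        System.out.println(Sets.difference(a, b));   // [x]          ~ sdiff a b
        System.out.println(a.size());                // 3            ~ scard a

        // A Multiset keeps per-element counts, a rough stand-in for sorted-set scores.
        Multiset<String> counts = HashMultiset.create();
        counts.add("item-1", 3);
        System.out.println(counts.count("item-1"));  // 3
    }
}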

How to share a library for data access in tomcat 7?

I'm fairly new to the whole web programming stuff and have the following problem:
I have 2 webapps: one is an Axis web service and the other is a Spring application. Both should get a set of data from a library that holds the data in memory. This data is large, so copying it for each app is not an option.
What I have done so far is develop the library, which loads and holds the data in a static container. The plan was that both apps instantiate the class containing the container and can then access the data.
Sadly, this doesn't work. I get an exception because the objects I want to use live in different classloaders.
My question is: how can I provide such a shared container for both web applications in Tomcat 7?
BTW: a database is not an option, because it's too slow.
Edit: I should have been clearer about the data. The data is a topic map stored in a topic map engine (see http://www.isotopicmaps.org). The engine is used to access the data and is therefore the access point to it. We have our own engine, which holds the data in memory and is faster than a database backend.
I want to have a servlet that provides the configuration and loading of topic maps, and then the two applications above should be able to read and modify a topic map. That's why I need a sort of shared access point to the engine.
This is what distributed caches, key-value stores, document stores, and NoSQL databases are built for. There are many options, and new ones appear every day. The free and open-source options are likely to meet your needs and provide you with as much support as you will need. The one that is currently my favorite is Membase.
So you want a distributed in-memory cache for a server cluster. You can use, among others, Terracotta for this. You can find a nice introduction to Terracotta here.
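One common way to use Terracotta from Java is through the Ehcache API. Below is a minimal sketch assuming the Ehcache 2.x API; the actual Terracotta clustering is enabled through ehcache.xml configuration (not shown), and the cache name "topicMapCache" is just an illustrative placeholder:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class SharedCacheExample {
    public static void main(String[] args) {
        // Picks up ehcache.xml from the classpath; that is where a Terracotta-clustered
        // cache would be declared.
        CacheManager cacheManager = CacheManager.create();
        cacheManager.addCache("topicMapCache");

        Cache cache = cacheManager.getCache("topicMapCache");
        cache.put(new Element("topicmap::config", "some shared value"));

        Element element = cache.get("topicmap::config");
        System.out.println(element.getObjectValue());

        cacheManager.shutdown();
    }
}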
Update: I actually disagree with the argument that a database is "too slow". If it's slow, then the data model and/or data access code is simply badly designed.
