Cassandra Hector Load balancing


I recently set up a Cassandra cluster with two nodes. The replication factor is set to 2, and both nodes seem to work well when both are turned on.
How can I use Hector so that it keeps working as long as at least one node is up? Currently I have something like the following:
CassandraHostConfigurator cassandraHostConfigurator = new CassandraHostConfigurator(
"localhost:9160,xx.xx.13.22:9160");
cassandraHostConfigurator.setMaxActive(20);
cassandraHostConfigurator.setMaxIdle(5);
cassandraHostConfigurator.setCassandraThriftSocketTimeout(3000);
cassandraHostConfigurator.setMaxWaitTimeWhenExhausted(4000);
Cluster cluster = HFactory.getOrCreateCluster("structspeech",
cassandraHostConfigurator);
Keyspace keyspace = HFactory.createKeyspace("structspeech", cluster);
....
Say host xx.xx.13.22 goes down: I get the following message in my console, and all my inserts fail until that node comes back up:
Downed xx.xx.13.22(xx.xx.13.22):9160 host still appears to be down: Unable to open transport to xx.xx.13.22(xx.xx.13.22):9160 , java.net.ConnectException: Connection refused: connect
This is how my keyspace is defined:
update keyspace structspeech with placement_strategy =
'org.apache.cassandra.locator.SimpleStrategy'
and strategy_options =[{replication_factor:2}];
I am sure I am missing something very trivial; any help will be greatly appreciated.
Thanks

By default, Hector uses a consistency level of QUORUM, so if one of your nodes is down this level cannot be satisfied.
When RF = 2, a quorum means you need to read from and write to both nodes, so if one of them is down the operation cannot execute.
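To make the arithmetic concrete, here is a small Java sketch of how many replica acknowledgements each consistency level needs. It is a simplified model, not Hector or Cassandra code, and the class and method names are made up for illustration. QUORUM is a strict majority, floor(RF / 2) + 1, so with RF = 2 it needs both replicas:

```java
// Simplified model of how many replica acknowledgements each consistency
// level needs for a given replication factor (RF). Names are hypothetical.
class QuorumMath {
    enum Level { ONE, QUORUM, ALL }

    // QUORUM is a strict majority of the replicas: floor(RF / 2) + 1.
    static int required(Level level, int rf) {
        switch (level) {
            case ONE:    return 1;
            case QUORUM: return rf / 2 + 1;
            default:     return rf; // ALL
        }
    }

    // An operation can succeed only if enough replicas are alive to acknowledge it.
    static boolean canSucceed(Level level, int rf, int liveReplicas) {
        return liveReplicas >= required(level, rf);
    }

    public static void main(String[] args) {
        int rf = 2;   // replication factor from the question
        int live = 1; // one of the two nodes is down
        for (Level level : Level.values()) {
            System.out.println(level + ": needs " + required(level, rf)
                    + ", succeeds=" + canSucceed(level, rf, live));
        }
    }
}
```

With one of the two nodes down, it reports that ONE succeeds while QUORUM and ALL, both needing 2 replicas, fail.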
Here's a nice online tool that demonstrates NRW (N = replication factor, R = read consistency and W = write consistency) http://www.ecyrd.com/cassandracalculator/
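The calculator's key rule is that a read is guaranteed to see the latest successful write when the read and write replica sets must overlap, i.e. R + W > N. A minimal Java sketch of that check (the helper name is hypothetical):

```java
// N = replication factor, R = replicas read, W = replicas written.
class NrwCheck {
    // A read overlaps the latest successful write in at least one replica
    // whenever R + W > N.
    static boolean readsSeeLatestWrite(int n, int r, int w) {
        return r + w > n;
    }

    public static void main(String[] args) {
        System.out.println(readsSeeLatestWrite(2, 1, 1)); // ONE/ONE with RF=2: false
        System.out.println(readsSeeLatestWrite(2, 2, 2)); // QUORUM/QUORUM with RF=2: true
        System.out.println(readsSeeLatestWrite(3, 2, 2)); // QUORUM/QUORUM with RF=3: true
    }
}
```

This is the trade-off of dropping to ONE with RF = 2: writes keep working with one node down, but R + W = 2 is not greater than N = 2, so a read may return stale data.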
To change the consistency level used for reads and writes, pass a ConsistencyLevelPolicy, for example AllOneConsistencyLevelPolicy, to HFactory.createKeyspace(String, Cluster, ConsistencyLevelPolicy).

What consistency level are you using when you insert? If you are writing at QUORUM or ALL with a replication factor of 2, both nodes must be up for a write to succeed (a quorum of 2 replicas is 2, which is why typical Cassandra clusters use an odd replication factor).
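The point about odd replication factors can be checked with the quorum formula: the number of replicas that can be down while QUORUM is still reachable is RF - (floor(RF / 2) + 1). A quick Java sketch, for illustration only:

```java
class OddReplicationFactor {
    // Replicas that can be down while QUORUM (floor(RF / 2) + 1) is still reachable.
    static int toleratedFailures(int rf) {
        return rf - (rf / 2 + 1);
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            System.out.println("RF=" + rf + " tolerates "
                    + toleratedFailures(rf) + " down replica(s) at QUORUM");
        }
    }
}
```

It prints 0, 0, 1, 1, 2 for RF = 1 to 5: going from an odd RF to the next even one adds a replica without adding any tolerance for failures at QUORUM.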

Related

Hazelcast Cache - Printing too many logs ( Ignoring join check from [10.10.10.10]:5702, because this node is not master...)

I'm using Hazelcast Cache for my application.
I have two JBoss nodes on two different machines.
Each node has two deployments.
Each deployment has its own Hazelcast cache.
I want each application to form a cluster across the two nodes. Below is my configuration:
Config config = new Config();
config.setClusterName("uniqueClusterName");
config.getNetworkConfig().getJoin().getTcpIpConfig().addMember("10.100.101.82,10.100.101.83").setEnabled(true);
config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
manager = Hazelcast.newHazelcastInstance(config);
The configuration above works: for each application, both nodes form a cluster.
However, the following log messages are printed continuously:
INFO [com.hazelcast.internal.cluster.impl.operations.SplitBrainMergeValidationOp] (hz.cocky_jackson.priority-generic-operation.thread-0) [10.100.101.82]:5702 [losce_qa] [4.1] Ignoring join check from [10.100.101.83]:5702, because this node is not master...
INFO [com.hazelcast.internal.cluster.impl.operations.SplitBrainMergeValidationOp] (hz.hungry_hofstadter.priority-generic-operation.thread-0) [10.100.101.82]:5701 [losce_qa] [4.1] Ignoring join check from [10.100.101.83]:5702, because this node is not master...
INFO [com.hazelcast.internal.cluster.impl.operations.SplitBrainMergeValidationOp] (hz.cocky_jackson.generic-operation.thread-1) [10.100.101.82]:5702 [losce_qa] [4.1] Ignoring join check from [10.100.101.83]:5702, because this node is not master...
Is there a workaround? How can I avoid these logs, or am I doing something wrong here?
TIA

Two clusters sharing the same hardware isn't ideal, as they contend for machine resources.
But if you do, you don't want them clashing, which is what will happen with the default port allocation. By default, a member tries to listen on port 5701; if that port is busy, it tries 5702, and so on. It also looks for other cluster members assuming they are on 5701 as well.
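A toy Java model of that auto-increment behaviour shows why two clusters on one machine end up on adjacent ports. It illustrates the idea only and is not Hazelcast's code; the bind helper and the range of 100 ports are assumptions:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model (not Hazelcast's implementation): start at a base port and
// auto-increment until a free port is found on this machine.
class PortAllocation {
    static int bind(Set<Integer> busy, int basePort, int portCount) {
        for (int port = basePort; port < basePort + portCount; port++) {
            if (busy.add(port)) { // add() returns false if the port is already taken
                return port;
            }
        }
        throw new IllegalStateException("no free port in range");
    }

    public static void main(String[] args) {
        Set<Integer> busy = new HashSet<>();
        int appA = bind(busy, 5701, 100); // first deployment on the machine
        int appB = bind(busy, 5701, 100); // second deployment, a different cluster
        System.out.println("appA=" + appA + ", appB=" + appB); // appA=5701, appB=5702
    }
}
```

The second deployment's member lands on 5702, so members of the other cluster probing from 5701 upward on that machine reach it, which is presumably the kind of cross-cluster contact behind the repeated join-check log lines in the question.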
To make it work:
(1) Give them unique names, as you've done
config.setClusterName("uniqueClusterName");
&
config.setClusterName("uniqueClusterName2");
As they have different cluster names, members from one cluster won't be able to join the other. This won't stop them trying, though, and those attempts are what cause the unwanted log messages.
(2) Assign predictable ports
Try
config.getNetworkConfig().setPort(6701);
&
config.getNetworkConfig().setPort(7701);
They will both try to find ports starting from different offsets, which will allow for predictability.
Without this, both clusters will try to use the default 5701 as the first port, and whichever cluster starts first will succeed.
With this, the first cluster's member will try to get port 6701 and should succeed. The second cluster's member will try to get port 7701 and should succeed.
(3) Specify addresses and ports for connectivity attempts
Try
config.getNetworkConfig().getJoin().getTcpIpConfig()
.addMember("10.100.101.82:6701,10.100.101.83:6701")
and
config.getNetworkConfig().getJoin().getTcpIpConfig()
.addMember("10.100.101.82:7701,10.100.101.83:7701")
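Steps (2) and (3) can be sanity-checked with a small Java sketch. The helpers are hypothetical, not Hazelcast API: one builds each cluster's member list from its own base port, the other checks that the two auto-increment ranges don't overlap. The range size of 100 is an assumption; Hazelcast lets you configure it (NetworkConfig.setPortCount), so check the value your version uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers for checking per-cluster port plans.
class ClusterPorts {
    // Build a "host:port,host:port" member list so members only probe their own cluster's port.
    static String memberList(List<String> hosts, int port) {
        List<String> members = new ArrayList<>();
        for (String host : hosts) {
            members.add(host + ":" + port);
        }
        return String.join(",", members);
    }

    // Auto-increment ranges [base, base + count) must not overlap between clusters.
    static boolean rangesOverlap(int baseA, int baseB, int count) {
        return baseA < baseB + count && baseB < baseA + count;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("10.100.101.82", "10.100.101.83");
        System.out.println(memberList(hosts, 6701)); // 10.100.101.82:6701,10.100.101.83:6701
        System.out.println(memberList(hosts, 7701)); // 10.100.101.82:7701,10.100.101.83:7701
        System.out.println(rangesOverlap(6701, 7701, 100)); // false
    }
}
```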

Multiple Endpoints in Cassandra cluster Connection

I want to give multiple Cassandra endpoints from the config file to my Java application.
For example:
cassandra host: "host1, host2"
I tried addContactPoints(host), but it did not work. If one of the Cassandra nodes goes down, I don't want my application to go down.
cluster = Cluster.builder()
.withClusterName(cassandraConfig.getClusterName())
.addContactPoints(cassandraConfig.getHostName())
.withSocketOptions(new SocketOptions().setConnectTimeoutMillis(30000).setReadTimeoutMillis(30000))
.withPoolingOptions(poolingOptions).build();

The Java driver is resilient to one of the provided contact points being unavailable. Contact points are used for establishing an initial connection [*]. As long as the driver is able to communicate with one contact point, it should be able to query the system.peers and system.local tables to discover the rest of the nodes in the cluster.
* They are also added to a list of initial hosts in the cluster, but typically the contact points provided map to a node in the system.peers table.
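One thing worth checking in the question's code: if cassandraConfig.getHostName() returns the raw config value "host1, host2", the driver receives a single contact point named "host1, host2" rather than two hosts. A minimal sketch, assuming the config value is a comma-separated string (the ContactPoints helper is hypothetical), that splits it into trimmed host names for the driver's varargs addContactPoints(String...):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turn a config value such as "host1, host2" into
// separate, trimmed host names, skipping empty entries.
class ContactPoints {
    static String[] parse(String configValue) {
        List<String> hosts = new ArrayList<>();
        for (String part : configValue.split(",")) {
            String host = part.trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] hosts = parse("host1, host2");
        System.out.println(String.join("|", hosts)); // host1|host2
        // The array could then be passed as Cluster.builder().addContactPoints(hosts)
    }
}
```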

Cassandra behavior on contact point based on data center

Cassandra is set up in three data centers (dc1, dc2 and dc3) forming one cluster.
A Java application runs in dc1.
The dc1 application's Cassandra connection points only at dc1 (only the IPs of the dc1 Cassandra nodes are given to the application).
When the dc1 Cassandra nodes are turned off, the application throws an exception like:
All host(s) tried for query failed (no host was tried)
More info:
cassandra-driver-core-3.0.8.jar
netty-3.10.5.Final.jar
netty-buffer-4.0.37.Final.jar
netty-codec-4.0.37.Final.jar
netty-common-4.0.37.Final.jar
netty-handler-4.0.37.Final.jar
netty-transport-4.0.37.Final.jar
Keyspace : Network topology
Replication : dc1:2, dc2:2, dc3:2
Cassandra Version : 3.11.4

Here are some things I have found out with connections and Cassandra (and BTW, I believe Cassandra has one of the best HA configurations of any database I've worked with over the past 25 years).
1) Ensure you have all of the components specified in your connection configuration. Here is an example of some of the connection components; there are others as well (maybe you've already done this):
cluster = Cluster.builder()
.addContactPoints(nodes.split(","))
.withCredentials(username, password)
.withPoolingOptions(poolingOptions)
.withLoadBalancingPolicy(
new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder()
.withLocalDc("MYLOCALDC")
.withUsedHostsPerRemoteDc(1)
.allowRemoteDCsForLocalConsistencyLevel()
.build()
)
).build();
2) Unless the entire DC you're "working in" is down, you could receive errors. Cassandra doesn't fail over to alternate DCs unless every node in the DC is down. If fewer than all nodes are down and your client can't satisfy the client CL settings, you will receive errors. When I tested this a while back, I was hoping that if the client CL couldn't be achieved in the LOCAL DC (even with some nodes in the current DC still up) but could be achieved in alternate DCs, it would automatically fail over, but this was not the case (as of when I last tested).
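The "no host was tried" message in the question is consistent with an empty query plan. Below is a simplified Java model of how a DC-aware plan is assembled, based on my reading of the 3.x driver's behaviour rather than its actual code (all names are hypothetical): live local hosts first, then at most usedHostsPerRemoteDc live hosts per remote DC, and remote hosts for LOCAL_* consistency levels only when allowRemoteDCsForLocalConsistencyLevel() is set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a DC-aware query plan (not the driver's implementation).
class DcAwarePlan {
    static List<String> plan(Map<String, List<String>> liveHostsByDc, String localDc,
                             int usedHostsPerRemoteDc, boolean localConsistencyLevel,
                             boolean allowRemoteForLocalCl) {
        // Live hosts in the local DC always come first.
        List<String> plan = new ArrayList<>(
                liveHostsByDc.getOrDefault(localDc, Collections.<String>emptyList()));
        // LOCAL_* levels stay in the local DC unless explicitly allowed to go remote.
        if (localConsistencyLevel && !allowRemoteForLocalCl) {
            return plan;
        }
        for (Map.Entry<String, List<String>> entry : liveHostsByDc.entrySet()) {
            if (entry.getKey().equals(localDc)) {
                continue;
            }
            List<String> hosts = entry.getValue();
            plan.addAll(hosts.subList(0, Math.min(usedHostsPerRemoteDc, hosts.size())));
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, List<String>> live = new LinkedHashMap<>();
        live.put("dc1", new ArrayList<String>()); // every dc1 node is down
        live.put("dc2", Arrays.asList("dc2-a", "dc2-b"));
        live.put("dc3", Arrays.asList("dc3-a", "dc3-b"));

        // Zero remote hosts per DC: the plan is empty, so no host is tried.
        System.out.println(plan(live, "dc1", 0, true, false)); // []
        // One remote host per DC, remote DCs allowed for LOCAL_* levels.
        System.out.println(plan(live, "dc1", 1, true, true));  // [dc2-a, dc3-a]
    }
}
```

In this model, with every dc1 node down only the second configuration yields any hosts, which mirrors why the builder above sets withUsedHostsPerRemoteDc(1) and allowRemoteDCsForLocalConsistencyLevel().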
Maybe that helps?
-Jim

Usage of the LOCAL_QUORUM consistency level in Datastax driver

For some reason I need to query a particular data center within my Cassandra cluster. According to the documentation, I can use the LOCAL_QUORUM consistency level:
Returns the record after a quorum of replicas in the current
datacenter as the coordinator has reported. Avoids latency of
inter-datacenter communication.
Do I understand correctly that, in order to target a particular data center for the current query, I have to build a cluster on an endpoint belonging to that DC?
Say I have two DCs with the following nodes:
DC1: 172.0.1.1, 172.0.1.2
DC2: 172.0.2.1, 172.0.2.2
So, to work with DC1, I build a cluster as:
Cluster cluster = Cluster.builder().addContactPoint("172.0.1.1").build();
Session session = cluster.connect();
Statement statement = session.prepare("select * from ...").bind().setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
ResultSet resultSet = session.execute(statement);
Is this the proper way to do it?

By itself, DCAwareRoundRobinPolicy has to infer the local data center (in the 3.x driver, from the first contact point it connects to). To ensure it uses the data center you want, you should specify the DC as a parameter.
Here is how I tell our dev teams to do it:
Builder builder = Cluster.builder()
.addContactPoints(nodes)
.withQueryOptions(new QueryOptions()
.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
.withLoadBalancingPolicy(new TokenAwarePolicy(
new DCAwareRoundRobinPolicy.Builder()
.withLocalDc("DC1").build()))
.withPoolingOptions(options);
Note: this may or may not be applicable to your situation, but I do recommend using the TokenAwarePolicy with the DCAwareRoundRobinPolicy nested inside it (specifying the local DC). That way, any operation that specifies the partition key is routed directly to a replica that owns it, avoiding the extra hop through a coordinator node that does not hold the data.
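To illustrate what routing to the correct node means, here is a toy token ring in Java. It is not the driver's implementation: real clusters use the Murmur3 partitioner, vnodes and usually RF > 1, and the tokens and node assignments below are made up. A token-aware client looks up which node owns the key's token and sends the request there directly:

```java
import java.util.Map;
import java.util.TreeMap;

// Toy token ring: each node owns the range ending at its token. A key's
// replica is the first node whose token is >= the key's token, wrapping
// around to the lowest token.
class TokenRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int token, String node) {
        ring.put(token, node);
    }

    String replicaFor(int keyToken) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("empty ring");
        }
        Map.Entry<Integer, String> owner = ring.ceilingEntry(keyToken);
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        TokenRing tokenRing = new TokenRing();
        tokenRing.addNode(100, "172.0.1.1");
        tokenRing.addNode(200, "172.0.1.2");
        System.out.println(tokenRing.replicaFor(150)); // 172.0.1.2
        System.out.println(tokenRing.replicaFor(250)); // wraps around: 172.0.1.1
    }
}
```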

According to the Cluster class documentation:
A cluster object maintains a permanent connection to one of the
cluster nodes which it uses solely to maintain information on the
state and current topology of the cluster
Also, because the default load balancing policy is DCAwareRoundRobinPolicy, this approach should work as expected.

Auto clustering in hazelcast

I tested with hazelcast-default.xml.
I started a node on 192.X.1.1 with port 5701, and it comes up and works fine.
Meanwhile, I started a node on 192.X.1.2, also with port 5701, and to my surprise the two nodes discover each other and join together. How can I avoid that?
Does setting the cluster.min parameter to 1 solve the problem?

I am assuming that by the cluster min setting you mean hazelcast.initial.min.cluster.size. That is unrelated to this issue. This property simply requires a given number of nodes to join the cluster before your application starts.
What you are looking for depends on whether you are using multicast or TCP-IP to discover nodes.
See this book for details: http://hazelcast.com/resources/mastering-hazelcast/
With multicast, you need to set groups and put the nodes in different groups.
You could also simply define interfaces such as:
192.168.24.*
with the range of IPs you want to be accepted by your cluster.
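For illustration, here is how a wildcard pattern like 192.168.24.* covers a range of addresses, sketched in Java. The matcher is written for this example only and is not Hazelcast's own parsing:

```java
// Illustration only: match an IPv4 address against a pattern such as
// "192.168.24.*", where "*" matches any single octet.
class InterfaceMatcher {
    static boolean matches(String pattern, String ip) {
        String[] p = pattern.split("\\.");
        String[] a = ip.split("\\.");
        if (p.length != 4 || a.length != 4) {
            return false;
        }
        for (int i = 0; i < 4; i++) {
            if (!p[i].equals("*") && !p[i].equals(a[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("192.168.24.*", "192.168.24.17")); // true
        System.out.println(matches("192.168.24.*", "192.168.25.17")); // false
    }
}
```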
Finally, if you are using TCP-IP, you need to define the IPs of the nodes that will join your cluster.
A simple example:
<hz:join>
<hz:multicast enabled="false" />
<hz:tcp-ip enabled="true">
<hz:members>192.168.0.1</hz:members>
</hz:tcp-ip>
</hz:join>
(The example uses Spring configuration.)
