Auto clustering in hazelcast

I tested with hazelcast-default.xml.
I started a node on 192.X.1.1 with port 5701, and it came up and ran fine.
Meanwhile, I started a node on 192.X.1.2 with port 5701, and to my surprise the two nodes discovered each other and joined into one cluster. How can I avoid that?
Would setting the cluster.min parameter to '1' solve the problem?

I am assuming that by the cluster min setting you mean hazelcast.initial.min.cluster.size. That is unrelated to this issue. This property simply requires a given number of nodes to join the cluster before your application starts.
What you are looking for depends on whether you are using multicast or TCP-IP to discover nodes.
See this book for details: http://hazelcast.com/resources/mastering-hazelcast/
With multicast you need to define groups and put the nodes in different groups.
You could also simply define interfaces such as:
192.168.24.*
with the range of IPs you want to be accepted by your cluster.
Finally, if you are using TCP-IP you need to define the IPs of the nodes that will join your cluster.
A simple example:
<hz:join>
    <hz:multicast enabled="false" />
    <hz:tcp-ip enabled="true">
        <hz:members>192.168.0.1</hz:members>
    </hz:tcp-ip>
</hz:join>
(The example shown uses Spring configuration.)
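
To make the interface restriction above more concrete, here is a minimal sketch of how a wildcard pattern such as 192.168.24.* can be read. It is only an illustration in plain Java, not Hazelcast's own matching code, and the class and method names are hypothetical. It handles only the `*` wildcard per octet.

```java
// Illustrative only: a simplified matcher for patterns like "192.168.24.*".
// This is NOT Hazelcast code; InterfacePattern and matches are hypothetical names.
final class InterfacePattern {
    private InterfacePattern() {}

    /** Returns true if every octet of the address equals the pattern octet or the pattern octet is "*". */
    static boolean matches(String pattern, String address) {
        String[] patternOctets = pattern.split("\\.");
        String[] addressOctets = address.split("\\.");
        if (patternOctets.length != 4 || addressOctets.length != 4) {
            return false; // only plain dotted IPv4 forms are handled in this sketch
        }
        for (int i = 0; i < 4; i++) {
            if (!patternOctets[i].equals("*") && !patternOctets[i].equals(addressOctets[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("192.168.24.*", "192.168.24.7"));
        System.out.println(matches("192.168.24.*", "192.168.25.7"));
    }
}
```

Under this reading, a node on 192.168.25.7 would fall outside the pattern, while any node in 192.168.24.x would fall inside it.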

Hazelcast Cache - Printing too many logs ( Ignoring join check from [10.10.10.10]:5702, because this node is not master...)

I'm using Hazelcast Cache for my application.
I have two JBoss nodes on two different machines.
Each node has two deployments.
Each deployment has its own Hazelcast cache.
I want each application to form a cluster across the two nodes. My configuration is below:
Config config = new Config();
config.setClusterName("uniqueClusterName");
config.getNetworkConfig().getJoin().getTcpIpConfig().addMember("10.100.101.82,10.100.101.83").setEnabled(true);
config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
manager = Hazelcast.newHazelcastInstance(config);
The configuration above works, and the two nodes form a cluster for each application.
However, the log messages below are printed continuously:
INFO [com.hazelcast.internal.cluster.impl.operations.SplitBrainMergeValidationOp] (hz.cocky_jackson.priority-generic-operation.thread-0) [10.100.101.82]:5702 [losce_qa] [4.1] Ignoring join check from [10.100.101.83]:5702, because this node is not master...
INFO [com.hazelcast.internal.cluster.impl.operations.SplitBrainMergeValidationOp] (hz.hungry_hofstadter.priority-generic-operation.thread-0) [10.100.101.82]:5701 [losce_qa] [4.1] Ignoring join check from [10.100.101.83]:5702, because this node is not master...
INFO [com.hazelcast.internal.cluster.impl.operations.SplitBrainMergeValidationOp] (hz.cocky_jackson.generic-operation.thread-1) [10.100.101.82]:5702 [losce_qa] [4.1] Ignoring join check from [10.100.101.83]:5702, because this node is not master...
Is there a workaround? How can I avoid these logs, or am I doing something wrong here?
TIA

Two clusters sharing the same hardware isn't ideal, as they contend for machine resources.
But if you do this, you don't want them clashing, which is what will happen with the default port allocation. The default is to try to listen on port 5701 and, if that is busy, to try 5702 and so on, and to look for other cluster members on the assumption that they are on 5701 as well.
To make it work:
(1) Give them unique names, as you've done
config.setClusterName("uniqueClusterName");
&
config.setClusterName("uniqueClusterName2");
As they have different cluster names, members from one cluster won't be able to join the other. This won't stop them trying, which is what causes the unwanted log messages.
(2) Assign predictable ports
Try
config.getNetworkConfig().setPort(6701);
&
config.getNetworkConfig().setPort(7701);
Each cluster will then look for ports starting from a different offset, which makes the allocation predictable.
Without this, both clusters will try to use the default 5701 as the first port, and whichever cluster starts first will succeed.
With this, the first cluster's member will try, and should succeed, to get 6701. The second cluster's member will try, and should succeed, to get 7701.
(3) Specify addresses and ports for connectivity attempts
Try
config.getNetworkConfig().getJoin().getTcpIpConfig()
.addMember("10.100.101.82:6701,10.100.101.83:6701")
and
config.getNetworkConfig().getJoin().getTcpIpConfig()
.addMember("10.100.101.82:7701,10.100.101.83:7701")
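
The port behaviour described in this answer (start at a base port, move to the next one if it is busy) can be sketched in plain Java as below. This is a simplified model for illustration, not Hazelcast code: the class name, the method name and the set of busy ports are all hypothetical, and no real sockets are opened.

```java
import java.util.Set;

// Illustrative model of "try the base port, then the next one, and so on".
// NOT Hazelcast code; PortProbe and firstFreePort are hypothetical names.
final class PortProbe {
    private PortProbe() {}

    /**
     * Returns the first port in [basePort, basePort + maxAttempts) that is not
     * in busyPorts, or -1 if every candidate is taken.
     */
    static int firstFreePort(int basePort, int maxAttempts, Set<Integer> busyPorts) {
        for (int offset = 0; offset < maxAttempts; offset++) {
            int candidate = basePort + offset;
            if (!busyPorts.contains(candidate)) {
                return candidate;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Both deployments start from the default: the second one is pushed to 5702.
        System.out.println(firstFreePort(5701, 100, Set.of(5701)));
        // With distinct base ports, each deployment gets its own predictable port.
        System.out.println(firstFreePort(6701, 100, Set.of(5701, 5702)));
    }
}
```

With a shared base port, which deployment ends up on 5701 and which on 5702 depends on start order. With separate base ports such as 6701 and 7701, the member addresses listed in step (3) stay valid regardless of start order.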

Cassandra behavior on contact point based on data center

Cassandra is set up in 3 data centers (dc1, dc2 and dc3) forming one cluster.
A Java application runs in dc1.
The dc1 application has its Cassandra connectors pointed at dc1 (only the IPs of the Cassandra nodes in dc1 are given to the application).
When the dc1 Cassandra nodes are turned off, the application throws an exception like:
All host(s) tried for query failed (no host was tried)
More Info:
cassandra-driver-core-3.0.8.jar
netty-3.10.5.Final.jar
netty-buffer-4.0.37.Final.jar
netty-codec-4.0.37.Final.jar
netty-common-4.0.37.Final.jar
netty-handler-4.0.37.Final.jar
netty-transport-4.0.37.Final.jar
Keyspace : Network topology
Replication : dc1:2, dc2:2, dc3:2
Cassandra Version : 3.11.4

Here are some things I have found out with connections and Cassandra (and BTW, I believe Cassandra has one of the best HA configurations of any database I've worked with over the past 25 years).
1) Ensure you have all of the components specified in your connection configuration. Here is an example of some of the connection components, but there are others as well (maybe you've already done this):
cluster = Cluster.builder()
    .addContactPoints(nodes.split(","))
    .withCredentials(username, password)
    .withPoolingOptions(poolingOptions)
    .withLoadBalancingPolicy(
        new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder()
            .withLocalDc("MYLOCALDC")
            .withUsedHostsPerRemoteDc(1)
            .allowRemoteDCsForLocalConsistencyLevel()
            .build()
        )
    ).build();
2) Unless the entire DC you're "working in" is down, you could receive errors. Cassandra doesn't fail over to alternate DCs unless every node in the DC is down. If fewer than all nodes are down and your client can't satisfy its CL settings, you will receive errors. When I tested this a while back, I was hoping that if client CL couldn't be achieved in the LOCAL DC (even with some nodes in that DC still up) and alternate DCs could satisfy it, the driver would fail over automatically, but this was not the case (as of when I last tested).
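
To illustrate the failover behaviour described above, here is a simplified model in plain Java of how a DC-aware policy could order hosts: live local hosts first, then at most a configured number of hosts from each remote DC. It is a sketch under assumptions, not the DataStax driver's code. The class and method names are hypothetical, and consistency levels are ignored entirely. My understanding is that in the 3.x driver the remote-host limit defaults to 0 when withUsedHostsPerRemoteDc is not called, which would leave the plan empty once every local node is down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified model of a DC-aware query plan. NOT driver code;
// DcAwarePlan and queryPlan are hypothetical names used for illustration.
final class DcAwarePlan {
    private DcAwarePlan() {}

    /**
     * Builds a host order: all live hosts of the local DC first, then at most
     * usedHostsPerRemoteDc live hosts from each other DC.
     */
    static List<String> queryPlan(String localDc,
                                  Map<String, List<String>> liveHostsByDc,
                                  int usedHostsPerRemoteDc) {
        List<String> plan = new ArrayList<>(liveHostsByDc.getOrDefault(localDc, List.of()));
        for (Map.Entry<String, List<String>> entry : liveHostsByDc.entrySet()) {
            if (entry.getKey().equals(localDc)) {
                continue; // local hosts are already at the front of the plan
            }
            List<String> remote = entry.getValue();
            int limit = Math.max(0, Math.min(usedHostsPerRemoteDc, remote.size()));
            plan.addAll(remote.subList(0, limit));
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dc1Down = Map.of(
                "dc1", List.of(),
                "dc2", List.of("dc2-node1", "dc2-node2"));
        System.out.println(queryPlan("dc1", dc1Down, 0)); // no remote hosts allowed
        System.out.println(queryPlan("dc1", dc1Down, 1)); // one remote host per DC
    }
}
```

In this model, an empty plan corresponds to a query for which no host can be tried at all.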
Maybe that helps?
-Jim

Usage of the LOCAL_QUORUM consistency level in Datastax driver

For some reasons I need to query a particular datacenter within my cassandra cluster. According to the documentation, I can use the LOCAL_QUORUM consistency level:
Returns the record after a quorum of replicas in the current
datacenter as the coordinator has reported. Avoids latency of
inter-datacenter communication.
Do I understand correctly that, in order to target a particular datacenter for the current query, I have to build a cluster from an endpoint belonging to that DC?
Say I have two DCs with the following nodes:
DC1: 172.0.1.1, 172.0.1.2
DC2: 172.0.2.1, 172.0.2.2
So, to work with DC1, I build a cluster as:
Cluster cluster = Cluster.builder().addContactPoint("172.0.1.1").build();
Session session = cluster.connect();
Statement statement = session.prepare("select * from ...").bind().setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
ResultSet resultSet = session.execute(statement);
Is this the proper way to do that?

By itself, DCAwareRoundRobinPolicy will pick the data center that it finds with the "least network distance" algorithm. To ensure it connects where you want, you should specify the DC as a parameter.
Here is how I tell our dev teams to do it:
Builder builder = Cluster.builder()
    .addContactPoints(nodes)
    .withQueryOptions(new QueryOptions()
        .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
    .withLoadBalancingPolicy(new TokenAwarePolicy(
        new DCAwareRoundRobinPolicy.Builder()
            .withLocalDc("DC1").build()))
    .withPoolingOptions(options);
Note: this may or may not be applicable to your situation, but I do recommend using the TokenAwarePolicy with the DCAwareRoundRobinPolicy nested inside it (specifying the local DC). That way, any operation that specifies the partition key is routed directly to a node holding the data, skipping the extra hop through a coordinator node.

According to the Cluster class documentation:
A cluster object maintains a permanent connection to one of the
cluster nodes which it uses solely to maintain information on the
state and current topology of the cluster
Also, because the default load balancing policy is DCAwareRoundRobinPolicy, this approach should work as expected.

JGroups function to list available Groups or Clusters

I have a series of clients which communicate with each other using the JGroups library. They basically create a communication channel attached to a cluster name:
communicationChannel = new JChannel(AutoDiscovery.class.getResource("/resource/udp.xml"));
communicationChannel.connect("cluster1");
Now I would like them to first list the available clusters and let the user decide which cluster to connect to, without hardwiring the cluster name in the code as above.
The API has getName(), which returns the logical name of the channel if set, but there appears to be no method to retrieve the clusters that have been set up.
I thought that using org.jgroups.Message.getHeaders() and reading the headers would yield the active clusters, but it yields nothing.
Any help, please?

There's no way to find the currently available clusters. I suggest maintaining some extra state which stores (in memory) all cluster names and their associated configuration.
One thing you could do, though, is develop a custom protocol (insert it below GMS) which does the following:
- Catches down(Event evt): if evt.getType() == Event.CONNECT*** (4 events), grab the cluster name ((String)evt.getArg()) and add it to a set
- Catches down(Event evt): if evt.getType() == Event.DISCONNECT, grab the current cluster name and remove it from the set
This doesn't give you the config info; you could get this too if you subclassed JChannel and overrode connectXXX() and disconnect().
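
The "extra state" suggested above could look roughly like the sketch below: an in-memory registry that is told about every connect and disconnect and can list the names currently in use. It is plain Java with hypothetical names (ClusterNameRegistry, onConnect, onDisconnect), not part of the JGroups API. Wiring it into a custom protocol or a JChannel subclass is left out, and it only knows about channels that report to this same registry instance.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// In-memory bookkeeping of cluster names. NOT JGroups API;
// all names here are hypothetical and chosen for illustration.
final class ClusterNameRegistry {
    // cluster name -> number of channels currently connected to it
    private final Map<String, Integer> channelCounts = new ConcurrentHashMap<>();

    /** Call when a channel connects to the given cluster. */
    void onConnect(String clusterName) {
        channelCounts.merge(clusterName, 1, Integer::sum);
    }

    /** Call when a channel disconnects; the name is dropped when its count reaches zero. */
    void onDisconnect(String clusterName) {
        channelCounts.computeIfPresent(clusterName, (name, count) -> count > 1 ? count - 1 : null);
    }

    /** Sorted snapshot of the cluster names that still have at least one channel. */
    Set<String> activeClusters() {
        return new TreeSet<>(channelCounts.keySet());
    }

    public static void main(String[] args) {
        ClusterNameRegistry registry = new ClusterNameRegistry();
        registry.onConnect("cluster1");
        registry.onConnect("cluster2");
        registry.onDisconnect("cluster2");
        System.out.println(registry.activeClusters());
    }
}
```

Counting channels per name, rather than keeping a bare set, avoids dropping a cluster name while another channel is still connected to it.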

Cassandra Hector Load balancing

I recently set up a Cassandra cluster with two nodes. The replication factor is set to 2, and both nodes seem to work well as long as both are turned on.
How can I use Hector so that it keeps working as long as at least one node is up? At the moment I have something like the following:
CassandraHostConfigurator cassandraHostConfigurator = new CassandraHostConfigurator(
"localhost:9160,xx.xx.13.22:9160");
cassandraHostConfigurator.setMaxActive(20);
cassandraHostConfigurator.setMaxIdle(5);
cassandraHostConfigurator.setCassandraThriftSocketTimeout(3000);
cassandraHostConfigurator.setMaxWaitTimeWhenExhausted(4000);
Cluster cluster = HFactory.getOrCreateCluster("structspeech",
cassandraHostConfigurator);
Keyspace keyspace = HFactory.createKeyspace("structspeech", cluster);
....
If host xx.xx.13.22 goes down, I get the following message in my console and all my inserts fail until that node comes back up:
Downed xx.xx.13.22(xx.xx.13.22):9160 host still appears to be down: Unable to open transport to xx.xx.13.22(xx.xx.13.22):9160 , java.net.ConnectException: Connection refused: connect
This is how my keyspace is defined:
update keyspace structspeech with placement_strategy =
'org.apache.cassandra.locator.SimpleStrategy'
and strategy_options =[{replication_factor:2}];
I am sure I am missing something very trivial, any help will be greatly appreciated.
Thanks

By default Hector uses a consistency level of QUORUM, so if one of your nodes is down this level cannot be satisfied.
When RF = 2, quorum means you need to read from and write to both nodes, so if one of them is down the operation cannot execute.
Here's a nice online tool that demonstrates NRW (N = replication factor, R = read consistency and W = write consistency): http://www.ecyrd.com/cassandracalculator/
To change the consistency level used for writes and reads, pass a ConsistencyLevelPolicy such as AllOneConsistencyLevelPolicy to HFactory.createKeyspace(String, Cluster, ConsistencyLevelPolicy).

What consistency level are you using when you insert? If you are writing at QUORUM or ALL, you need both nodes to be up to write with a replication factor of 2 (a quorum of 2 replicas is 2, which is why typical Cassandra clusters use an odd number for the replication factor).
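
The quorum arithmetic behind both answers can be checked with a few lines of Java. This is a minimal sketch of the usual formula, quorum = floor(RF / 2) + 1, with hypothetical class and method names. It is not Hector or Cassandra code.

```java
// Replica-count arithmetic for QUORUM. Illustrative only;
// QuorumMath and its methods are hypothetical names.
final class QuorumMath {
    private QuorumMath() {}

    /** Number of replicas that must respond for QUORUM: floor(rf / 2) + 1. */
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    /** Number of replicas that can be down while QUORUM is still reachable. */
    static int tolerableFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            System.out.println("RF=" + rf
                    + " quorum=" + quorum(rf)
                    + " tolerable failures=" + tolerableFailures(rf));
        }
    }
}
```

For RF = 2 the quorum is 2, so no replica may be down. For RF = 3 the quorum is still 2, so one replica may be down, which is the reason odd replication factors are usually preferred.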
