Ignite Server Node with CacheJdbcPojoStore / Delay Client Node - java

I'm using Spring and Ignite Spring to run a pretty simple cluster consisting of one server node and multiple client nodes with varying domain logic.
Ignite configuration:
All nodes connect via TcpCommuncationSpi, TcpDiscoverySpi and TcpDiscoveryVmIpFinder. TcpDiscoveryVmIpFinder.setAddresses only contain the server node.
The server node has serveral CacheJdbcPojoStore configured, the database data is loaded after calling Ignition.start(cfg) using ignite.cache("mycachename").loadCache(null).
After a client node is connected to the server node, it does some domain specific checks to verify data integrity. This works very well if I first start the server node, wait for all data to be loaded and then start the client nodes.
My problem: if I first start the client nodes, they can't connect to the server node as it is not yet started. They patiently wait for the server node to come up. If I now start the server node, the client nodes connect directly after Ignition.start(cfg) is done on the server node, but BEFORE the CacheJdbcPojoStores are done loading their data. Thus the domain specific integrity checks do fail as there is no data present in the caches yet.
My goal: I need a way to ensure the client nodes are only able to connect to the server node AFTER all data is loaded in the server node. This would simply the deployment process as well as local development a lot, as there would be no strict ordering of starting the nodes.
What I tried so far: Fiddling around with ignite.cluster().state(ClusterState.INACTIVE) as well as setting up a manually controlled BaselineTopology. Sadly, both methods simple declare the cluster as not being ready yet and thus I can't even create the caches, let alone load data.
My question: Is there any way to achieve either:
hook up into the the startup process of the server node so I can load the data in BEFORE it joins the cluster
Declare the server node "not yet ready" until the data is loaded.

Use one of the available data structures like AtomicLong or Semaphore to simulate readiness state. There is no need for internal hooks.
Added an example below:
Client:
while (true) {
// try get "myAtomic" and check its value
IgniteAtomicLong atomicLong = ignite.atomicLong("myAtomic", 0, false);
if (atomicLong != null && atomicLong.get() == 1) {
// initialization is complete
break;
}
// not ready
Thread.sleep(1000);
}
Server:
...
//loading is complete, create atomic sequence and set its value to 1
ignite.atomicLong("myAtomic", 1, true);
...
UPDATE#2
instead of having a self-written-loop including Thread.sleep, latch.await() can be called. See: countdownlatch

Related

Hazelcast Cache - Printing too many logs ( Ignoring join check from [10.10.10.10]:5702, because this node is not master...)

I'm using Hazelcast Cache for my application.
I have two nodes of Jboss on two different Machines.
Each nodes have two deployments.
Each deployment file has their own hazelcast cache.
I want to cluster between two nodes for each application and below is my configurations,
Config config = new Config();
config.setClusterName("uniqueClusterName");
config.getNetworkConfig().getJoin().getTcpIpConfig().addMember("10.100.101.82,10.100.101.83").setEnabled(true);
config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
manager = Hazelcast.newHazelcastInstance(config);
My above configuration is working fine and both the nodes are making cluster on each application.
But I have found below logs, and these logs are printing continuously
INFO [com.hazelcast.internal.cluster.impl.operations.SplitBrainMergeValidationOp] (hz.cocky_jackson.priority-generic-operation.thread-0) [10.100.101.82]:5702 [losce_qa] [4.1] Ignoring join check from [10.100.101.83]:5702, because this node is not master...
INFO [com.hazelcast.internal.cluster.impl.operations.SplitBrainMergeValidationOp] (hz.hungry_hofstadter.priority-generic-operation.thread-0) [10.100.101.82]:5701 [losce_qa] [4.1] Ignoring join check from [10.100.101.83]:5702, because this node is not master...
INFO [com.hazelcast.internal.cluster.impl.operations.SplitBrainMergeValidationOp] (hz.cocky_jackson.generic-operation.thread-1) [10.100.101.82]:5702 [losce_qa] [4.1] Ignoring join check from [10.100.101.83]:5702, because this node is not master...
Any work around? How to avoid these logs or I'm doing something wrong here?
TIA
Two clusters sharing the same hardware isn't ideal, as they contend for machine resources.
But if you do, you don't want them clashing, which is what will happen with the default port allocation. The default being to try to listen on port 5701, if this is busy try 5702 and so on. And to try to find other cluster members assuming they are on 5701 also.
To make it work:
(1) Give them unique names, as you've done
config.setClusterName("uniqueClusterName");
&
config.setClusterName("uniqueClusterName2");
As they have different cluster names, members from one cluster won't be able to
join the other. This won't stop them trying, which is causing unwanted log messages.
(2) Assign predictable ports
Try
config.getNetworkConfig().setPort(6701);
&
config.getNetworkConfig().setPort(7701);
They will both try to find ports starting from different offsets, which will allow for predictability.
Without this, both clusters will try to use the default 5701 as the first port, and whichever cluster starts first will success.
With this, the first cluster's member will try and should succeed to get 6701. The second cluster's member will try and should succeed to get 7701.
(3) Specify addresses and ports for connectivity attempts
Try
config.getNetworkConfig().getJoin().getTcpIpConfig()
.addMember("10.100.101.82:6701,10.100.101.83:6701")
and
config.getNetworkConfig().getJoin().getTcpIpConfig()
.addMember("10.100.101.82:7701,10.100.101.83:7701")

Multiple Endpoints in Cassandra cluster Connection

I want to give multiple Cassandra endpoints from the config file to my Java application.
Ex:
cassandra host: "host1, host2"
I tried addContactPoints(host), but it did not work. If one of the Cassandra node goes down, I don't want my application to go down.
cluster = Cluster.builder()
.withClusterName(cassandraConfig.getClusterName())
.addContactPoints(cassandraConfig.getHostName())
.withSocketOptions(new SocketOptions().setConnectTimeoutMillis(30000).setReadTimeoutMillis(30000))
.withPoolingOptions(poolingOptions).build();
The java driver is resilient to one of the contact points provided not being available. Contact points are used for establishing an initial connection [*]. As long as the driver is able to communicate with one contact point, it should be able to query the system.peers and system.local table to discover the rest of the nodes in the cluster.
* They are also added to a list of initial hosts in the cluster, but typically the contact points provided map to a node in the system.peers table.

How to detect when Apache Zookeeper sessions are lost or timed out?

Assume we have an Apache Zookeeper quorum up and running and n client nodes connected (using Apache Curator). Is it possible to receive notifications on one of the nodes (the one we are observing) from zookeeper when any of the other nodes sessions are terminated or a timeout is reached? If so, how is this accomplished?
The answer is fairly simple and can be accomplished using Ephemeral nodes and PathChildrenCache. Zookeeper will detect when a node times out (in this example we set the timeout to 10s) and the associated ephemeral node will disappear from the tree. This will fire off an event which we can listen for.
First establish a connection with the curator client and start it up on all nodes
CuratorFramework curator =
CuratorFrameworkFactory
.newClient(zkConnectionString, 10000, 10000, retryPolicy);
curator.start();
curator.getZookeeperClient().blockUntilConnectedOrTimedOut();
Next use PathChildrenCache to assign listeners for zookeeper events. Event types include CHILD_ADDED, CHILD_UPDATED, and CHILD_REMOVED. The event object in the callback will contain the relevant information (and possible associated payload) of the node which went down.
PathChildrenCache pathCache = new PathChildrenCache(curator, "/nodes", true);
pathCache
.getListenable()
.addListener((curator, event) -> {
if (event.getType() == Type.CHILD_REMOVED) {
System.out.println("Child has been removed");
}
});
pathCache.start();
Now on a remote node, add the ephemeral node (here we give it an ID of 33 without a payload)
curator
.create()
.creatingParentsIfNeeded()
.withMode(CreateMode.EPHEMERAL)
.forPath("/nodes/33");
Now pull the plug on the remote node and the event should be detected where the listeners have been assigned.

elasticsearch clustering issue

I am using two ES nodes (ES version 1.0.1) in cluster and I need clarification for following:
When I start application and it connects to both nodes and I can see requests served by both nodes but when I stop one of server it throws exception and other node still works but 50% of requests still get exception and whole traffic is not diverted to running node.
I have following configuration for cluster:
1st Node:
discovery.zen.minimum_master_nodes: 1
node.data: true
discovery.zen.ping.unicast.hosts: ["product-elasticsearch-1","product-elasticsearch-2"]
node.master: true
couchbase.maxConcurrentRequests: 1024
2nd Node
discovery.zen.minimum_master_nodes: 1
node.data: true
discovery.zen.ping.unicast.hosts: ["product-elasticsearch","product-elasticsearch-2"]
node.master: false
couchbase.maxConcurrentRequests: 1024
Following is code for transport client:
settings = ImmutableSettings.settingsBuilder().put("cluster.name", clusterName)
.put("es.http.timeout",timeout)
.put("client.transport.ping_timeout",pingTimeout)
.put("es.http.retries",retries)
.build();
for (String host : hostList) {
transportAddressList.add(new InetSocketTransportAddress(host,port));
}
Collections.shuffle(transportAddressList);
// Using Transport Client
trasportClient = new TransportClient(settings).addTransportAddresses(transportAddressList.toArray(addressArray));
Could someone please let me know when I stop one ES process why all requests are not served by running node?
Which one are you bringing down? The second node has the following property:
node.master: false
This could result in problems if only that node is running.
Hope it helps
Thanks for reply. So another observation/solution is that if we run in master-master mode where node.master : true for both ES nodes it works meaning if one of node goes down then other will work. But if we run two ES nodes in master-slave mode where for slave node.master is false and master is down then slave will not work without master. Is there any way we can make master-slave model working with only slave when master is down?

JGroups function to list available Groups or Clusters

I have a series of clients which communicate with each other using JGroups library, they basically create a communication channel attached to a cluster name:
communicationChannel = new JChannel(AutoDiscovery.class.getResource("/resource/udp.xml"));
communicationChannel.connect("cluster1");
Now I would like them to first list available clusters to connect to and let the user decide which cluster connect to without hardwiring the name of the cluster in the code as above.
Apparently the API has getName() which returns the logical name of the channel if set but there's no method to retrieve set up clusters.
I though using the org.jgroups.Message.getHeaders() and reading the header would yield the active clusters but nothing.
Any help please?
There's no way to find the currently available clusters, I suggest maintaining some extra state which stores (in-memory) all cluster names and their associated configuration.
Once thing you could do though is develop a custom protocol (insert it below GMS), which does the following:
- Catches down(Event evt): if evt.getType() == Event.CONNECT*** (4 events), grab the cluster name ((String)evt.getArg()) and add it to a set
- Catches down(Event evt): if evt.getType() == Event.DISCONNECT, grab the currently cluster name and remove it from the set
This doesn't give you the config info; you could get this too, if you subclasses JChannel and overwrote connectXXX() and disconnect().

Categories

Resources