Cassandra single-node connection error (Java)

I am trying to use Cassandra as the database for an app I am working on. The app is a NetBeans Platform app.
To start the Cassandra server on my localhost I call Runtime.getRuntime().exec(command),
where command is the string that starts the Cassandra server, and then I connect to the Cassandra server with the DataStax driver. However, I get this error:
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.TransportException: [/127.0.0.1:9042] Cannot connect))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:199)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:80)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1154)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:318)
at org.dhviz.boot.DatabaseClient.connect(DatabaseClient.java:43)
at org.dhviz.boot.Installer.restored(Installer.java:67)
....
I figured out that the server needs some time to start, so I added the line Thread.sleep(MAX_DELAY_SERVER), which seems to resolve the problem.
Is there a more elegant way to handle this?
Thanks.
The code is below.
public class Installer extends ModuleInstall {
private final int MAX_DELAY_SERVER = 12000;
//private static final String pathSrc = "/org/dhviz/resources";
@Override
public void restored() {
/*
-*-*-*-*-*DESCRIPTION*-*-*-*-*-*
IMPLEMENT THE CASSANDRA DATABASE
*********************************
*/
DatabaseClient d = new DatabaseClient();
// launch an instance of the cassandra server
d.loadDatabaseServer();
/*wait for MAX_DELAY_SERVER milliseconds before launching the other instructions.
*/
try {
Thread.sleep(MAX_DELAY_SERVER);
Logger.getLogger(Installer.class.getName()).log(Level.INFO, "wait for MAX_DELAY_SERVER milliseconds before the connect database");
} catch (InterruptedException ex) {
Exceptions.printStackTrace(ex);
Logger.getLogger(Installer.class.getName()).log(Level.INFO, "exception in thread sleep");
}
d.connect("127.0.0.1");
}
}
public class DatabaseClient {
private Cluster cluster;
private Session session;
private ShellCommand shellCommand;
private final String defaultKeyspace = "dhviz";
final private String LOAD_CASSANDRA = "launchctl load /usr/local/Cellar/cassandra/2.1.2/homebrew.mxcl.cassandra.plist";
final private String UNLOAD_CASSANDRA = "launchctl unload /usr/local/Cellar/cassandra/2.1.2/homebrew.mxcl.cassandra.plist";
public DatabaseClient() {
shellCommand = new ShellCommand();
}
public void connect(String node) {
//this connect to the cassandra database
cluster = Cluster.builder()
.addContactPoint(node).build();
// cluster.getConfiguration().getSocketOptions().setConnectTimeoutMillis(12000);
Metadata metadata = cluster.getMetadata();
System.out.printf("Connected to cluster: %s\n",
metadata.getClusterName());
for (Host host
: metadata.getAllHosts()) {
System.out.printf("Datacenter: %s; Host: %s; Rack: %s\n",
host.getDatacenter(), host.getAddress(), host.getRack());
}
session = cluster.connect();
Logger.getLogger(DatabaseClient.class.getName()).log(Level.INFO, "connected to server");
}
public void loadDatabaseServer() {
if (shellCommand == null) {
shellCommand = new ShellCommand();
}
shellCommand.executeCommand(LOAD_CASSANDRA);
Logger.getLogger(DatabaseClient.class.getName()).log(Level.INFO, "database cassandra loaded");
}
public void unloadDatabaseServer() {
if (shellCommand == null) {
shellCommand = new ShellCommand();
}
shellCommand.executeCommand(UNLOAD_CASSANDRA);
Logger.getLogger(DatabaseClient.class.getName()).log(Level.INFO, "database cassandra unloaded");
}
}

If you are calling cassandra without any parameters in Runtime.getRuntime().exec(command), it is likely that this spawns Cassandra as a background process and returns before the node has fully started and is listening.
I'm not sure why you are trying to embed Cassandra in your app, but you may find cassandra-unit useful as a mechanism for doing so. It is primarily used for running tests that require a Cassandra instance, but it may also meet your use case.
The wiki provides a helpful example of how to start an embedded Cassandra instance using cassandra-unit:
EmbeddedCassandraServerHelper.startEmbeddedCassandra();
In my experience cassandra-unit waits until the server is up and listening before returning. You could also write a method that waits until a socket is in use, using the opposite of the logic in this answer.
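The "wait until a socket is in use" idea can be sketched with the standard library alone. This is only an illustration: the class name PortWaiter, the method waitForPort, and the timeout and poll values are hypothetical names and numbers, not part of the DataStax driver or cassandra-unit. It polls the port until a TCP connection succeeds or a deadline passes. A successful TCP connect shows only that something is listening on the port, so the driver may still need a moment before it can connect.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: blocks until host:port accepts TCP connections.
final class PortWaiter {
    private PortWaiter() {}

    /**
     * Polls host:port until a TCP connection succeeds or timeoutMillis elapses.
     * Returns true if the port accepted a connection in time, false otherwise.
     */
    static boolean waitForPort(String host, int port, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try (Socket socket = new Socket()) {
                // Each attempt has its own 1-second connect timeout.
                socket.connect(new InetSocketAddress(host, port), 1000);
                return true; // something is listening
            } catch (IOException notListeningYet) {
                // Connection refused or timed out: the server is not up yet.
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up after the overall timeout
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

In the question's code, a call such as PortWaiter.waitForPort("127.0.0.1", 9042, 60000, 500) could replace the fixed Thread.sleep(MAX_DELAY_SERVER) before d.connect("127.0.0.1"), so the wait ends as soon as the port opens.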

I have changed the code to the following, taking inspiration from the answers. Thanks for your help!
cluster = Cluster.builder()
.addContactPoint(node).build();
cluster.getConfiguration().getSocketOptions().setConnectTimeoutMillis(50000);
boolean serverConnected = false;
while (serverConnected == false) {
try {
try {
Thread.sleep(MAX_DELAY_SERVER);
} catch (InterruptedException ex) {
Exceptions.printStackTrace(ex);
}
cluster = Cluster.builder()
.addContactPoint(node).build();
cluster.getConfiguration().getSocketOptions().setConnectTimeoutMillis(50000);
session = cluster.connect();
serverConnected = true;
} catch (NoHostAvailableException ex) {
Logger.getLogger(DatabaseClient.class.getName()).log(Level.INFO, "trying connection to cassandra server...");
serverConnected = false;
}
}
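The loop above retries forever with a fixed delay. A bounded variant with exponential backoff might look like the sketch below. The class RetryWithBackoff and its parameters are hypothetical names introduced for illustration, and the helper is deliberately generic (it takes a Callable) so it does not assume any particular driver API.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: retries an action a limited number of times,
// doubling the delay between attempts up to maxDelayMillis.
final class RetryWithBackoff {
    private RetryWithBackoff() {}

    static <T> T retry(Callable<T> attempt, int maxAttempts,
                       long initialDelayMillis, long maxDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and propagate.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
            if (i < maxAttempts) {
                Thread.sleep(delay);
                delay = Math.min(delay * 2, maxDelayMillis);
            }
        }
        throw last; // every attempt failed: surface the last error
    }
}
```

With the code above, the Callable would build the Cluster, call cluster.connect(), and return the Session. It would also be sensible to close the Cluster inside the Callable when connect() throws, so that failed attempts do not leave driver resources behind.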

Related

Amazon Keyspaces (Cassandra) query: no node was available to execute the query

I'm using Amazon Keyspaces (Cassandra 3.11.2) from Apache Flink running on AWS EMR. Sometimes the query below throws an exception. The same code used on AWS Lambda also throws the same NoHost exception. What did I do wrong?
String query = "INSERT INTO TEST (field1, field2) VALUES(?, ?)";
PreparedStatement prepared = CassandraConnector.prepare(query);
int i = 0;
BoundStatement bound = prepared.bind().setString(i++, "Field1").setString(i++, "Field2")
.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
ResultSet rs = CassandraConnector.execute(bound);
at com.datastax.oss.driver.api.core.NoNodeAvailableException.copy(NoNodeAvailableException.java:40)
at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
at com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:53)
at com.test.manager.connectors.CassandraConnector.execute(CassandraConnector.java:16)
at com.test.repository.impl.BackupRepositoryImpl.insert(BackupRepositoryImpl.java:36)
at com.test.service.impl.BackupServiceImpl.insert(BackupServiceImpl.java:18)
at com.test.flink.function.AsyncBackupFunction.processMessage(AsyncBackupFunction.java:78)
at com.test.flink.function.AsyncBackupFunction.lambda$asyncInvoke$0(AsyncBackupFunction.java:35)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
This is my code:
CassandraConnector.java:
Because the cost of initializing a PreparedStatement is high, I cache it.
public class CassandraConnector {
private static final ConcurrentHashMap<String, PreparedStatement> preparedStatementCache = new ConcurrentHashMap<String, PreparedStatement>();
public static ResultSet execute(BoundStatement bound) {
CqlSession session = CassandraManager.getSessionInstance();
return session.execute(bound);
}
public static ResultSet execute(String query) {
CqlSession session = CassandraManager.getSessionInstance();
return session.execute(query);
}
public static PreparedStatement prepare(String query) {
PreparedStatement result = preparedStatementCache.get(query);
if (result == null) {
CqlSession session = CassandraManager.getSessionInstance();
result = session.prepare(query);
preparedStatementCache.putIfAbsent(query, result);
}
return result;
}
}
CassandraManager.java:
I'm using a singleton with double-checked locking for the session object.
public class CassandraManager {
private static final Logger logger = LoggerFactory.getLogger(CassandraManager.class);
private static final String SSL_CASSANDRA_PASSWORD = "password";
private static volatile CqlSession session;
static {
try {
initSession();
} catch (Exception e) {
logger.error("Error CassandraManager getSessionInstance", e);
}
}
private static void initSession() {
List<InetSocketAddress> contactPoints = Collections.singletonList(InetSocketAddress.createUnresolved(
"cassandra.ap-southeast-1.amazonaws.com", 9142));
DriverConfigLoader loader = DriverConfigLoader.fromClasspath("application.conf");
Long start = BaseHelper.getTime();
session = CqlSession.builder().addContactPoints(contactPoints).withConfigLoader(loader)
.withAuthCredentials(AppUtil.getProperty("cassandra.username"),
AppUtil.getProperty("cassandra.password"))
.withSslContext(getSSLContext()).withLocalDatacenter("ap-southeast-1")
.withKeyspace(AppUtil.getProperty("cassandra.keyspace")).build();
logger.info("End connect: " + (new Date().getTime() - start));
}
public static CqlSession getSessionInstance() {
if (session == null || session.isClosed()) {
synchronized (CassandraManager.class) {
if (session == null || session.isClosed()) {
initSession();
}
}
}
return session;
}
public static SSLContext getSSLContext() {
InputStream in = null;
try {
KeyStore ks = KeyStore.getInstance("JKS");
in = CassandraManager.class.getClassLoader().getResourceAsStream("cassandra_truststore.jks");
ks.load(in, SSL_CASSANDRA_PASSWORD.toCharArray());
TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
tmf.init(ks);
SSLContext ctx = SSLContext.getInstance("TLS");
ctx.init(null, tmf.getTrustManagers(), null);
return ctx;
} catch (Exception e) {
logger.error("Error CassandraConnector getSSLContext", e);
} finally {
if (in != null) {
try {
in.close();
} catch (IOException e) {
logger.error("", e);
}
}
}
return null;
}
}
application.conf
datastax-java-driver {
basic.request {
timeout = 5 seconds
consistency = LOCAL_ONE
}
advanced.connection {
max-requests-per-connection = 1024
pool {
local.size = 1
remote.size = 1
}
}
advanced.reconnect-on-init = true
advanced.reconnection-policy {
class = ExponentialReconnectionPolicy
base-delay = 1 second
max-delay = 60 seconds
}
advanced.retry-policy {
class = DefaultRetryPolicy
}
advanced.protocol {
version = V4
}
advanced.heartbeat {
interval = 30 seconds
timeout = 1 second
}
advanced.session-leak.threshold = 8
advanced.metadata.token-map.enabled = false
}
There are two scenarios where the driver would report NoNodeAvailableException:
Nodes are unresponsive/unavailable and the driver has marked all of them as down.
All the contact points provided are invalid.
If some inserts work but eventually run into NoNodeAvailableException, that indicates to me that the nodes are getting overloaded and eventually become unresponsive, so the driver can no longer pick a coordinator because all nodes are marked as "down".
If none of the requests work at all, it means that the contact points are unreachable or unresolvable, so the driver can't connect to the cluster. Cheers!
The NoHostAvailableException is a client-side exception thrown by the open source driver after it has retried the available hosts. The driver wraps the root cause of the retries, which can be confusing.
I suggest first improving your observability by setting up these CloudWatch metrics. You can use this prebuilt CloudFormation template to get started; it only takes a few seconds.
Here is a setup for keyspace and table metrics for Amazon Keyspaces using CloudWatch:
https://github.com/aws-samples/amazon-keyspaces-cloudwatch-cloudformation-templates
You can also replace the retry policy with one of the examples in this helper project. The retry policy in this project will either retry or throw the original exception, which removes the occurrences of NoHostAvailableException and gives your application better transparency. Here's the link to the GitHub repo: https://github.com/aws-samples/amazon-keyspaces-java-driver-helpers
If you're using a private VPC endpoint, you will want to add the following permissions to enable more entries in the system.peers table.
Amazon Keyspaces recently announced new functionality that provides more connection points when establishing a session through a private VPC endpoint.
Here is a link describing how Keyspaces now automatically optimizes client connections made through AWS PrivateLink to improve availability and read and write performance: https://aws.amazon.com/about-aws/whats-new/2021/07/amazon-keyspaces-for-apache-cassandra-now-automatically-optimi/
This link covers using Amazon Keyspaces with interface VPC endpoints: https://docs.aws.amazon.com/keyspaces/latest/devguide/vpc-endpoints.html . To enable this new functionality you will need to grant additional permissions for DescribeNetworkInterfaces and DescribeVpcEndpoints.
{
"Version":"2012-10-17",
"Statement":[
{
"Sid":"ListVPCEndpoints",
"Effect":"Allow",
"Action":[
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeVpcEndpoints"
],
"Resource":"*"
}
]
}
I suspect that this:
.withLocalDatacenter(AppUtil.getProperty("cassandra.localdatacenter"))
Pulls back a data center name which either does not match the keyspace replication definition or the configured data center name:
nodetool status | grep Datacenter
Basically, if your connection is defined with a local data center which does not exist, it will still try to read/write with replicas in that data center. This will fail, because it obviously cannot find nodes in a non-existent data center.
Similar question here: NoHostAvailable error in cqlsh console

How to disable connection pooling in Gremlin server

I am using Gremlin Server with HBase as the storage.backend.
When I try to connect to the Gremlin Server from my Spark code, the message below is logged and after some time the connection times out.
Opening connection pool on Host{address = 'ip:8182' ,, hostUri=ws:/ip:8182/gremlin} with core size of 2
The following code is used to get the client instance for each partition:
private static Cluster cluster;
private static Client client;
Logger logger = LoggerFactory.getLogger(GremlinSeverConnection.class);
public Client getGraph(GraphConf conf) {
if (client == null) {
try {
// cluster = Cluster.build(new File(conf)).create();
cluster = Cluster.build(conf.getGraphHost()).port(Integer.parseInt(conf.getGraphPort()))
.serializer(getserializer(conf.getGraphSerializer())).create();
client = cluster.connect();
logger.info("connected to graph database");
} finally {
//cluster.close();
//client.close();
}
}
return client;
}
public Serializers getserializer(String serializer) {
return Serializers.GRAPHSON;
}
You could set the min and max size of the connection pool to 1:
Cluster cluster = Cluster.build().maxConnectionPoolSize(1)
.minConnectionPoolSize(1).create();
That should force the client to use a single connection.

Connect to local Cassandra nodes using the DataStax Java driver?

I am using DataStax Java driver 3.1.0 to connect to a Cassandra cluster, and my Cassandra cluster version is 2.0.10.
Below is the singleton class I am using to connect to cassandra cluster.
public class CassUtil {
private static final Logger LOGGER = Logger.getInstance(CassUtil.class);
private Session session;
private Cluster cluster;
private static class Holder {
private static final CassUtil INSTANCE = new CassUtil();
}
public static CassUtil getInstance() {
return Holder.INSTANCE;
}
private CassUtil() {
List<String> servers = TestUtils.HOSTNAMES;
String username =
TestUtils.loadCredentialFile().getProperty(TestUtils.USERNAME);
String password =
TestUtils.loadCredentialFile().getProperty(TestUtils.PASSWORD);
// is this right setting?
PoolingOptions poolingOptions = new PoolingOptions();
poolingOptions.setConnectionsPerHost(HostDistance.LOCAL, 4, 10).setConnectionsPerHost(
HostDistance.REMOTE, 2, 4);
Builder builder = Cluster.builder();
cluster =
builder
.addContactPoints(servers.toArray(new String[servers.size()]))
.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
.withPoolingOptions(poolingOptions)
.withReconnectionPolicy(new ConstantReconnectionPolicy(100L))
.withLoadBalancingPolicy(
DCAwareRoundRobinPolicy
.builder()
.withLocalDc(
!TestUtils.isProduction() ? "DC2" : TestUtils.getCurrentLocation()
.get().name().toLowerCase()).build())
.withCredentials(username, password).build();
try {
session = cluster.connect("testkeyspace");
StringBuilder sb = new StringBuilder();
Set<Host> allHosts = cluster.getMetadata().getAllHosts();
for (Host host : allHosts) {
sb.append("[");
sb.append(host.getDatacenter());
sb.append(host.getRack());
sb.append(host.getAddress());
sb.append("]");
}
LOGGER.logInfo("connected: " + sb.toString());
} catch (NoHostAvailableException ex) {
LOGGER.logError("error= ", ExceptionUtils.getStackTrace(ex));
} catch (Exception ex) {
LOGGER.logError("error= " + ExceptionUtils.getStackTrace(ex));
}
}
public void shutdown() {
LOGGER.logInfo("Shutting down the whole cassandra cluster");
if (null != session) {
session.close();
}
if (null != cluster) {
cluster.close();
}
}
public Session getSession() {
if (session == null) {
throw new IllegalStateException("No connection initialized");
}
return session;
}
public Cluster getCluster() {
return cluster;
}
}
What settings do I need to use to connect to local Cassandra nodes first and only talk to remote nodes if the local ones are down? Also, are the pooling configuration options I am using in the above code right?
By default the DataStax drivers will only connect to nodes in the local DC. If you do not use withLocalDc, the driver will attempt to discern the local data center from the DC of the contact point it is able to connect to.
If you want the driver to fail over to hosts in remote data centers, you should use withUsedHostsPerRemoteDc, i.e.:
Cluster.builder()
.withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder()
.withLocalDc("DC1")
.withUsedHostsPerRemoteDc(3).build())
With this configuration, the driver will establish connections to 3 hosts in each remote DC, and only send queries to them if all hosts in the local data center are down.
There are other strategies for failover to remote data centers. For example, you could run your application clients in the same physical data centers as your C* data centers, and when a physical data center fails, fail over at a higher level (such as your load balancer).
Also, are the pooling configuration options I am using in the above code right?
I think what you have is fine. The defaults are fine too.

OrientDB. Connection leaks or just not closing connections

We have a large multithreaded Java EE application running on Wildfly 8.
We are using OrientDB 2.1.19.
We have some problems with connection leaks. At some point the OrientDB server stops responding and all threads working with the database get stuck waiting for a new connection.
The configuration is as follows:
OGlobalConfiguration.CLIENT_CONNECT_POOL_WAIT_TIMEOUT.setValue(5000);
OGlobalConfiguration.CLIENT_DB_RELEASE_WAIT_TIMEOUT.setValue(5000);
OGlobalConfiguration.CLIENT_CHANNEL_MAX_POOL.setValue(1000);
OGlobalConfiguration.DB_POOL_MIN.setValue(100);
OGlobalConfiguration.DB_POOL_MAX.setValue(5000);
OGlobalConfiguration.STORAGE_LOCK_TIMEOUT.setValue(5000);
OGlobalConfiguration.DB_POOL_IDLE_TIMEOUT.setValue(5000);
OGlobalConfiguration.DB_POOL_IDLE_CHECK_DELAY.setValue(1000);
We get the OPartitionedDatabasePool through OPartitionedDatabasePoolFactory:
poolFactory = new OPartitionedDatabasePoolFactory();
documentPool = poolFactory.get(orientDBPath, username, password);
Then, depending on our needs, we use ODatabaseDocumentTx or OObjectDatabaseTx, obtained with
documentPool.acquire();
We pass the acquired ODatabaseDocumentTx or OObjectDatabaseTx to a so-called executor, which executes a Runnable or Callable.
public void run(ORunnable<DB> oRunnable) {
// db is the private member of executor containing acquired db.
try {
//...
oRunnable.run(db);
//...
} catch (Exception e) {
} finally {
if(!db.isClosed()) {
db.close();
}
}
}
As you can see, we close the db in the finally block, and we also fully detach all documents received from the db. But in some cases the number of connections to the DB does not drop, and after that we get the following exception:
2016-07-28 23:10:45,957 FINE [com.orientechnologies.orient.client.remote.ORemoteConnectionManager] (default task-46) Network connection pool is receiving a closed connection to reuse: discard it
2016-07-28 23:10:45,957 FINE [com.orientechnologies.orient.client.remote.ORemoteConnectionManager] (default task-46) Cannot unlock connection lock: java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:155) [rt.jar:1.7.0_13]
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260) [rt.jar:1.7.0_13]
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:460) [rt.jar:1.7.0_13]
at com.orientechnologies.common.concur.lock.OAdaptiveLock.unlock(OAdaptiveLock.java:123) [orientdb-core-2.1.19.jar:2.1.19]
at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.unlock(OChannelBinaryAsynchClient.java:371) [orientdb-enterprise-2.1.19.jar:2.1.19]
at com.orientechnologies.orient.client.remote.ORemoteConnectionManager.remove(ORemoteConnectionManager.java:128) [orientdb-client-2.1.19.jar:2.1.19]
at com.orientechnologies.orient.client.remote.ORemoteConnectionManager.release(ORemoteConnectionManager.java:119) [orientdb-client-2.1.19.jar:2.1.19]
at com.orientechnologies.orient.client.remote.OStorageRemote.endResponse(OStorageRemote.java:1643) [orientdb-client-2.1.19.jar:2.1.19]
at com.orientechnologies.orient.client.remote.OStorageRemote.command(OStorageRemote.java:1240) [orientdb-client-2.1.19.jar:2.1.19]
at com.orientechnologies.orient.client.remote.OStorageRemoteThread.command(OStorageRemoteThread.java:453) [orientdb-client-2.1.19.jar:2.1.19]
at com.orientechnologies.orient.core.sql.query.OSQLQuery.run(OSQLQuery.java:72) [orientdb-core-2.1.19.jar:2.1.19]
at com.orientechnologies.orient.core.sql.query.OSQLSynchQuery.run(OSQLSynchQuery.java:85) [orientdb-core-2.1.19.jar:2.1.19]
at com.orientechnologies.orient.core.query.OQueryAbstract.execute(OQueryAbstract.java:33) [orientdb-core-2.1.19.jar:2.1.19]
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.query(ODatabaseDocumentTx.java:717) [orientdb-core-2.1.19.jar:2.1.19]
Here is the piece of code for the exception above
ODocument storedSession = orient.onDocuments().call(new ODocCallable<ODocument>() {
@Override
public ODocument call(ODatabaseDocumentTx db) {
try {
List<ODocument> list = db.query(new OSQLSynchQuery<ODocument>("select from " + CLAZZ + " where storedSession_sessionID = ?"), sessionId);
if (list.isEmpty()){
return null;
}
return documentUnpin(list.get(0));
} catch (Exception e) {
// PersistentSession may not be in DB on request (for example in workspaces)
}
return null;
}
});
Any suggestions on how to improve our connection handling?
Thanks.

How can I check if a hazelcast cluster is alive from a java client?

We use Hazelcast in client-server mode. The Hazelcast cluster contains 2 Hazelcast nodes and we have about 25 clients connected to the cluster.
What I am looking for now is a simple check that tries to figure out if the cluster is still alive. It should be a rather cheap operation because this check will occur on every client quite frequently (once every second, I could imagine).
What is the best way to do so?
The simplest way would be to register a LifecycleListener on the client HazelcastInstance:
HazelcastInstance client = HazelcastClient.newHazelcastClient();
client.getLifecycleService().addLifecycleListener(new LifecycleListener() {
@Override
public void stateChanged(LifecycleEvent event) {
}
});
The client uses a periodic heartbeat to detect if the cluster is still running.
You can use the LifecycleService.isRunning() method as well:
HazelcastInstance hzInstance = HazelcastClient.newHazelcastClient();
boolean running = hzInstance.getLifecycleService().isRunning();
As isRunning() may be true even if the cluster is down, I'd go for the following approach (a mixture of @konstantin-zyubin's answer and this). This doesn't need an event listener, which is an advantage in my setup:
if (!hazelcastInstance.getLifecycleService().isRunning()) {
return Health.down().build();
}
int parameterCount;
LocalTopicStats topicStats;
try {
parameterCount = hazelcastInstance.getMap("parameters").size();
topicStats = hazelcastInstance.getTopic("myTopic").getLocalTopicStats();
} catch (Exception e) {
// instance may run but cluster is down:
Health.Builder builder = Health.down();
builder.withDetail("Error", e.getMessage());
return builder.build();
}
Health.Builder builder = Health.up();
builder.withDetail("parameterCount", parameterCount);
builder.withDetail("receivedMsgs", topicStats.getReceiveOperationCount());
builder.withDetail("publishedMsgs", topicStats.getPublishOperationCount());
return builder.build();
I have found a more reliable way to check Hazelcast availability, because
client.getLifecycleService().isRunning()
always returns true when you use async reconnection mode, as was mentioned.
@Slf4j
public class DistributedCacheServiceImpl implements DistributedCacheService {
private HazelcastInstance client;
@Autowired
protected ConfigLoader<ServersConfig> serversConfigLoader;
@PostConstruct
private void initHazelcastClient() {
ClientConfig config = new ClientConfig();
if (isCacheEnabled()) {
ServersConfig.Hazelсast hazelcastConfig = getWidgetCacheSettings().getHazelcast();
config.getGroupConfig().setName(hazelcastConfig.getName());
config.getGroupConfig().setPassword(hazelcastConfig.getPassword());
for (String address : hazelcastConfig.getAddresses()) {
config.getNetworkConfig().addAddress(address);
}
config.getConnectionStrategyConfig()
.setAsyncStart(true)
.setReconnectMode(ClientConnectionStrategyConfig.ReconnectMode.ASYNC);
config.getNetworkConfig()
.setConnectionAttemptLimit(0) // infinite (Integer.MAX_VALUE) attempts to reconnect
.setConnectionTimeout(5000);
client = HazelcastClient.newHazelcastClient(config);
}
}
@Override
public boolean isCacheEnabled() {
ServersConfig.WidgetCache widgetCache = getWidgetCacheSettings();
return widgetCache != null && widgetCache.getEnabled();
}
@Override
public boolean isCacheAlive() {
boolean aliveResult = false;
if (isCacheEnabled() && client != null) {
try {
IMap<Object, Object> defaultMap = client.getMap("default");
if (defaultMap != null) {
defaultMap.size(); // will throw Hazelcast exception if cluster is down
aliveResult = true;
}
} catch (Exception e) {
log.error("Connection to hazelcast cluster is lost. Reason : {}", e.getMessage());
}
}
return aliveResult;
}
}
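Both answers above probe the cluster with a real operation such as map.size(). Such a call can block while the client is reconnecting, so it may help to bound the probe with a timeout. The sketch below is one way to do that with the standard library only. The class LivenessProbe and the method isAlive are hypothetical names, and the probe is passed in as a Callable so that nothing here assumes a specific Hazelcast API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a probe and reports "alive" only if it
// completes without an exception within the given timeout.
final class LivenessProbe {
    // Shared pool of daemon threads so probes never keep the JVM running.
    private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "liveness-probe");
        t.setDaemon(true);
        return t;
    });

    private LivenessProbe() {}

    static boolean isAlive(Callable<?> probe, long timeoutMillis) {
        Future<?> result = EXECUTOR.submit(probe);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true; // probe finished in time without throwing
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false; // probe failed or took too long
        } finally {
            result.cancel(true); // interrupt a probe that is still running
        }
    }
}
```

In the class above, isCacheAlive() could then call LivenessProbe.isAlive(() -> client.getMap("default").size(), 1000), where the one-second timeout is an assumed value chosen to match the once-per-second check described in the question.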
