hazelcast-kubernetes Network Discovery: How to use with Multiple Nodes - java

We have a Kubernetes cluster which spins up 4 instances of our application. We'd like to have it share a Hazelcast data grid and keep in sync between these nodes. According to https://github.com/hazelcast/hazelcast-kubernetes the configuration is straightforward. We'd like to use the DNS approach rather than the Kubernetes API.
With DNS we are supposed to be able to add the DNS name of our app as described here. So this would be something like myservice.mynamespace.svc.cluster.local.
The problem is that although we have 4 VMs spun up, only one Hazelcast network member is found; thus we see the following in the logs:
Members [1] {
Member [192.168.187.3]:5701 - 50056bfb-b710-43e0-ad58-57459ed399a5 this
}
It seems that there aren't any errors, it just doesn't see any of the other network members.
Here's my configuration. I've tried both using an XML file, like the example in the hazelcast-kubernetes git repo, and configuring it programmatically. Neither attempt appears to work.
I'm using hazelcast 3.8.
Using hazelcast.xml:
<hazelcast>
  <properties>
    <!-- only necessary prior to Hazelcast 3.8 -->
    <property name="hazelcast.discovery.enabled">true</property>
  </properties>
  <network>
    <join>
      <!-- deactivate normal discovery -->
      <multicast enabled="false"/>
      <tcp-ip enabled="false"/>
      <!-- activate the Kubernetes plugin -->
      <discovery-strategies>
        <discovery-strategy enabled="true"
            class="com.hazelcast.HazelcastKubernetesDiscoveryStrategy">
          <properties>
            <!-- configure discovery service API lookup -->
            <property name="service-dns">myapp.mynamespace.svc.cluster.local</property>
            <property name="service-dns-timeout">10</property>
          </properties>
        </discovery-strategy>
      </discovery-strategies>
    </join>
  </network>
</hazelcast>
Using the XmlConfigBuilder to construct the instance:
Properties properties = new Properties();
XmlConfigBuilder builder = new XmlConfigBuilder();
builder.setProperties(properties);
Config config = builder.build();
this.instance = Hazelcast.newHazelcastInstance(config);
And programmatically (personal preference, if I can get it to work):
Config cfg = new Config();
NetworkConfig networkConfig = cfg.getNetworkConfig();
networkConfig.setPort(hazelcastNetworkPort);
networkConfig.setPortAutoIncrement(true);
networkConfig.setPortCount(100);
JoinConfig joinConfig = networkConfig.getJoin();
joinConfig.getMulticastConfig().setEnabled(false);
joinConfig.getTcpIpConfig().setEnabled(false);
DiscoveryConfig discoveryConfig = joinConfig.getDiscoveryConfig();
HazelcastKubernetesDiscoveryStrategyFactory factory = new HazelcastKubernetesDiscoveryStrategyFactory();
DiscoveryStrategyConfig strategyConfig = new DiscoveryStrategyConfig(factory);
strategyConfig.addProperty("service-dns", kubernetesSvcsDnsName);
strategyConfig.addProperty("service-dns-timeout", kubernetesSvcsDnsTimeout);
discoveryConfig.addDiscoveryStrategyConfig(strategyConfig);
this.instance = Hazelcast.newHazelcastInstance(cfg);
Is anyone familiar with this setup? I have ports 5701 - 5800 open. It seems Kubernetes starts up and recognizes that discovery mode is on, but it only finds the one (local) node.
Here's a snippet from the logs, for what it's worth. This was while using the XML file for config:
2017-03-15 08:15:33,688 INFO [main] c.h.c.XmlConfigLocator [StandardLoggerFactory.java:49] Loading 'hazelcast-default.xml' from classpath.
2017-03-15 08:15:33,917 INFO [main] c.g.a.c.a.u.c.HazelcastCacheClient [HazelcastCacheClient.java:112] CONFIG: Config{groupConfig=GroupConfig [name=dev, password=********], properties={}, networkConfig=NetworkConfig{publicAddress='null', port=5701, portCount=100, portAutoIncrement=true, join=JoinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[], loopbackModeEnabled=false], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[127.0.0.1, 127.0.0.1], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-west-1', securityGroupName='hazelcast-sg', tagKey='type', tagValue='hz-nodes', hostHeader='ec2.amazonaws.com', iamRole='null', connectionTimeoutSeconds=5}, discoveryProvidersConfig=com.hazelcast.config.DiscoveryConfig#3c153a1}, interfaces=InterfacesConfig{enabled=false, interfaces=[10.10.1.*]}, sslConfig=SSLConfig{className='null', enabled=false, implementation=null, properties={}}, socketInterceptorConfig=SocketInterceptorConfig{className='null', enabled=false, implementation=null, properties={}}, symmetricEncryptionConfig=SymmetricEncryptionConfig{enabled=false, iterationCount=19, algorithm='PBEWithMD5AndDES', key=null}}, mapConfigs={default=MapConfig{name='default', inMemoryFormat=BINARY', backupCount=1, asyncBackupCount=0, timeToLiveSeconds=0, maxIdleSeconds=0, evictionPolicy='NONE', mapEvictionPolicy='null', evictionPercentage=25, minEvictionCheckMillis=100, maxSizeConfig=MaxSizeConfig{maxSizePolicy='PER_NODE', size=2147483647}, readBackupData=false, hotRestart=HotRestartConfig{enabled=false, fsync=false}, nearCacheConfig=null, mapStoreConfig=MapStoreConfig{enabled=false, className='null', factoryClassName='null', writeDelaySeconds=0, writeBatchSize=1, implementation=null, factoryImplementation=null, properties={}, initialLoadMode=LAZY, writeCoalescing=true}, mergePolicyConfig='com.hazelcast.map.merge.PutIfAbsentMapMergePolicy', wanReplicationRef=null, entryListenerConfigs=null, mapIndexConfigs=null, mapAttributeConfigs=null, quorumName=null, queryCacheConfigs=null, cacheDeserializedValues=INDEX_ONLY}}, topicConfigs={}, reliableTopicConfigs={default=ReliableTopicConfig{name='default', topicOverloadPolicy=BLOCK, executor=null, readBatchSize=10, statisticsEnabled=true, listenerConfigs=[]}}, queueConfigs={default=QueueConfig{name='default', listenerConfigs=null, backupCount=1, asyncBackupCount=0, maxSize=0, emptyQueueTtl=-1, queueStoreConfig=null, statisticsEnabled=true}}, multiMapConfigs={default=MultiMapConfig{name='default', valueCollectionType='SET', listenerConfigs=null, binary=true, backupCount=1, asyncBackupCount=0}}, executorConfigs={default=ExecutorConfig{name='default', poolSize=16, queueCapacity=0}}, semaphoreConfigs={default=SemaphoreConfig{name='default', initialPermits=0, backupCount=1, asyncBackupCount=0}}, ringbufferConfigs={default=RingbufferConfig{name='default', capacity=10000, backupCount=1, asyncBackupCount=0, timeToLiveSeconds=0, inMemoryFormat=BINARY, ringbufferStoreConfig=RingbufferStoreConfig{enabled=false, className='null', properties={}}}}, wanReplicationConfigs={}, listenerConfigs=[], partitionGroupConfig=PartitionGroupConfig{enabled=false, groupType=PER_MEMBER, memberGroupConfigs=[]}, managementCenterConfig=ManagementCenterConfig{enabled=false, url='http://localhost:8080/mancenter', updateInterval=3}, securityConfig=SecurityConfig{enabled=false, 
memberCredentialsConfig=CredentialsFactoryConfig{className='null', implementation=null, properties={}}, memberLoginModuleConfigs=[], clientLoginModuleConfigs=[], clientPolicyConfig=PermissionPolicyConfig{className='null', implementation=null, properties={}}, clientPermissionConfigs=[]}, liteMember=false}
2017-03-15 08:15:33,949 INFO [main] c.h.i.DefaultAddressPicker [StandardLoggerFactory.java:49] [LOCAL] [dev] [3.8] Prefer IPv4 stack is true.
2017-03-15 08:15:33,960 INFO [main] c.h.i.DefaultAddressPicker [StandardLoggerFactory.java:49] [LOCAL] [dev] [3.8] Picked [192.168.187.3]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
2017-03-15 08:15:34,000 INFO [main] c.h.system [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Hazelcast 3.8 (20170217 - d7998b4) starting at [192.168.187.3]:5701
2017-03-15 08:15:34,001 INFO [main] c.h.system [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Copyright (c) 2008-2017, Hazelcast, Inc. All Rights Reserved.
2017-03-15 08:15:34,001 INFO [main] c.h.system [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Configured Hazelcast Serialization version : 1
2017-03-15 08:15:34,507 INFO [main] c.h.s.i.o.i.BackpressureRegulator [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Backpressure is disabled
2017-03-15 08:15:35,170 INFO [main] c.h.i.Node [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Creating MulticastJoiner
2017-03-15 08:15:35,339 INFO [main] c.h.s.i.o.i.OperationExecutorImpl [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Starting 8 partition threads
2017-03-15 08:15:35,342 INFO [main] c.h.s.i.o.i.OperationExecutorImpl [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Starting 5 generic threads (1 dedicated for priority tasks)
2017-03-15 08:15:35,351 INFO [main] c.h.c.LifecycleService [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] [192.168.187.3]:5701 is STARTING
2017-03-15 08:15:37,463 INFO [main] c.h.system [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Cluster version set to 3.8
2017-03-15 08:15:37,466 INFO [main] c.h.i.c.i.MulticastJoiner [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8]
Members [1] {
Member [192.168.187.3]:5701 - 50056bfb-b710-43e0-ad58-57459ed399a5 this
}

Could you try with the service DNS name as:
myapp.mynamespace.endpoints.cluster.local
Please reply whether it works or not, and also post your full log.

I know this happened quite a long time ago, but the problem here was using the wrong class name for the discovery strategy.
It should be com.hazelcast.kubernetes.HazelcastKubernetesDiscoveryStrategy
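For reference, here is a minimal sketch of the same fix applied to the programmatic variant from the question, passing the fully qualified class name instead of the factory (the service-dns value is the question's own placeholder, not a working address):
import com.hazelcast.config.Config;
import com.hazelcast.config.DiscoveryStrategyConfig;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class KubernetesDiscoveryFix {
    public static void main(String[] args) {
        Config config = new Config();
        // Only necessary prior to Hazelcast 3.8
        config.setProperty("hazelcast.discovery.enabled", "true");

        JoinConfig join = config.getNetworkConfig().getJoin();
        join.getMulticastConfig().setEnabled(false);
        join.getTcpIpConfig().setEnabled(false);

        // Note the full package name: com.hazelcast.kubernetes.*
        DiscoveryStrategyConfig strategy = new DiscoveryStrategyConfig(
                "com.hazelcast.kubernetes.HazelcastKubernetesDiscoveryStrategy");
        strategy.addProperty("service-dns", "myapp.mynamespace.svc.cluster.local");
        strategy.addProperty("service-dns-timeout", "10");
        join.getDiscoveryConfig().addDiscoveryStrategyConfig(strategy);

        HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);
    }
}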

Related

com.hazelcast.cp.exception.NotLeaderException no leader election on 3 nodes

I am using hazelcast-4.2.1 – not enterprise – on 3 nodes. I am trying to enable the CP subsystem with the Raft consensus protocol by changing the config on each node and restarting Hazelcast:
<cp-subsystem>
  <cp-member-count>3</cp-member-count>
  <group-size>3</group-size>
  <session-time-to-live-seconds>300</session-time-to-live-seconds>
  <session-heartbeat-interval-seconds>5</session-heartbeat-interval-seconds>
  <missing-cp-member-auto-removal-seconds>14400</missing-cp-member-auto-removal-seconds>
  <fail-on-indeterminate-operation-state>false</fail-on-indeterminate-operation-state>
  <raft-algorithm>
    <leader-election-timeout-in-millis>2000</leader-election-timeout-in-millis>
    <leader-heartbeat-period-in-millis>5000</leader-heartbeat-period-in-millis>
    <max-missed-leader-heartbeat-count>5</max-missed-leader-heartbeat-count>
    <append-request-max-entry-count>100</append-request-max-entry-count>
    <commit-index-advance-count-to-snapshot>10000</commit-index-advance-count-to-snapshot>
    <uncommitted-entry-count-to-reject-new-appends>100</uncommitted-entry-count-to-reject-new-appends>
    <append-request-backoff-timeout-in-millis>100</append-request-backoff-timeout-in-millis>
  </raft-algorithm>
</cp-subsystem>
The application code is simple:
hazelcastInstance.getCPSubsystem().getLock("my-lock").lock(); // getLock() requires a lock name; "my-lock" is a placeholder
But I got the warning "Leader N/A" on each of the 3 nodes:
2021-11-23 11:45:45 INFO [MetadataRaftGroupManager] - [172.18.20.166]:5701 [dev] [4.2.1] CP Subsystem is waiting for 3 members to join the cluster. Current member count: 1
2021-11-23 11:45:48 INFO [ClusterService] - [172.18.20.166]:5701 [dev] [4.2.1]
Members {size:3, ver:3} [
Member [172.18.20.166]:5701 - 66ef130c-a666-4ec0-8e99-cec4cd504bac this
Member [172.18.20.167]:5701 - 5039c2ea-22dd-4f7c-a134-9367fab4e767
Member [172.18.20.168]:5701 - 47dd80d4-b983-4ae7-b6bc-c2352416eada
]
2021-11-23 11:45:48 INFO [PartitionStateManager] - [172.18.20.166]:5701 [dev] [4.2.1] Initializing cluster partition table arrangement...
2021-11-23 11:45:49 INFO [RaftService] - [172.18.20.166]:5701 [dev] [4.2.1] RaftNode[CPGroupId{name='METADATA', seed=0, groupId=0}] is created with [RaftEndpoint{uuid='47dd80d4-b983-4ae7-b6bc-c2352416eada'}, RaftEndpoint{uuid='5039c2ea-22dd-4f7c-a134-9367fab4e767'}, RaftEndpoint{uuid='66ef130c-a666-4ec0-8e99-cec4cd504bac'}]
2021-11-23 11:45:49 INFO [RaftNode(METADATA)] - [172.18.20.166]:5701 [dev] [4.2.1] Status is set to: ACTIVE
2021-11-23 11:45:49 INFO [AuthenticationMessageTask] - [172.18.20.166]:5701 [dev] [4.2.1] Received auth from Connection[id=12, /172.18.20.166:5701->/172.18.20.166:53635, qualifier=null, endpoint=[172.18.20.166]:53635, alive=true, connectionType=MCJVM, planeIndex=-1], successfully authenticated, clientUuid: 7c86a343-8b52-4652-8235-7c0dfdb2f5ad, client version: 4.2
2021-11-23 11:45:51 INFO [PreVoteRequestHandlerTask(METADATA)] - [172.18.20.166]:5701 [dev] [4.2.1] Granted pre-vote for PreVoteRequest{candidate=RaftEndpoint{uuid='47dd80d4-b983-4ae7-b6bc-c2352416eada'}, nextTerm=1, lastLogTerm=0, lastLogIndex=0}
2021-11-23 11:45:51 INFO [VoteRequestHandlerTask(METADATA)] - [172.18.20.166]:5701 [dev] [4.2.1] Moving to new term: 1 from current term: 0 after VoteRequest{candidate=RaftEndpoint{uuid='47dd80d4-b983-4ae7-b6bc-c2352416eada'}, term=1, lastLogTerm=0, lastLogIndex=0, disruptive=false}
2021-11-23 11:45:51 INFO [RaftNode(METADATA)] - [172.18.20.166]:5701 [dev] [4.2.1]
CP Group Members {groupId: METADATA(0), size:3, term:1, logIndex:0} [
CPMember{uuid=47dd80d4-b983-4ae7-b6bc-c2352416eada, address=[172.18.20.168]:5701}
CPMember{uuid=5039c2ea-22dd-4f7c-a134-9367fab4e767, address=[172.18.20.167]:5701}
CPMember{uuid=66ef130c-a666-4ec0-8e99-cec4cd504bac, address=[172.18.20.166]:5701} - FOLLOWER this
]
2021-11-23 11:45:51 INFO [VoteRequestHandlerTask(METADATA)] - [172.18.20.166]:5701 [dev] [4.2.1] Granted vote for VoteRequest{candidate=RaftEndpoint{uuid='47dd80d4-b983-4ae7-b6bc-c2352416eada'}, term=1, lastLogTerm=0, lastLogIndex=0, disruptive=false}
2021-11-23 11:45:51 INFO [AppendRequestHandlerTask(METADATA)] - [172.18.20.166]:5701 [dev] [4.2.1] Setting leader: RaftEndpoint{uuid='47dd80d4-b983-4ae7-b6bc-c2352416eada'}
2021-11-23 11:45:51 INFO [RaftNode(METADATA)] - [172.18.20.166]:5701 [dev] [4.2.1]
CP Group Members {groupId: METADATA(0), size:3, term:1, logIndex:0} [
CPMember{uuid=47dd80d4-b983-4ae7-b6bc-c2352416eada, address=[172.18.20.168]:5701} - LEADER
CPMember{uuid=5039c2ea-22dd-4f7c-a134-9367fab4e767, address=[172.18.20.167]:5701}
CPMember{uuid=66ef130c-a666-4ec0-8e99-cec4cd504bac, address=[172.18.20.166]:5701} - FOLLOWER this
]
2021-11-23 11:45:53 INFO [MetadataRaftGroupManager] - [172.18.20.166]:5701 [dev] [4.2.1] CP Subsystem is initialized with: [CPMember{uuid=47dd80d4-b983-4ae7-b6bc-c2352416eada, address=[172.18.20.168]:5701}, CPMember{uuid=5039c2ea-22dd-4f7c-a134-9367fab4e767, address=[172.18.20.167]:5701}, CPMember{uuid=66ef130c-a666-4ec0-8e99-cec4cd504bac, address=[172.18.20.166]:5701}]
2021-11-23 11:45:54 INFO [HealthMonitor] - [172.18.20.166]:5701 [dev] [4.2.1] processors=2, physical.memory.total=5.5G, physical.memory.free=844.4M, swap.space.total=2.0G, swap.space.free=1.2G, heap.memory.used=101.1M, heap.memory.free=857.4M, heap.memory.total=958.5M, heap.memory.max=958.5M, heap.memory.used/total=10.55%, heap.memory.used/max=10.55%, minor.gc.count=2, minor.gc.time=34ms, major.gc.count=2, major.gc.time=96ms, load.process=100.00%, load.system=100.00%, load.systemAverage=1.01, thread.count=47, thread.peakCount=53, cluster.timeDiff=0, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.client.query.size=0, executor.q.client.blocking.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operations.size=0, executor.q.priorityOperation.size=0, operations.completed.count=116, executor.q.mapLoad.size=0, executor.q.mapLoadAllKeys.size=0, executor.q.cluster.size=0, executor.q.response.size=0, operations.running.count=0, operations.pending.invocations.percentage=0.00%, operations.pending.invocations.count=0, proxy.count=1, clientEndpoint.count=10, connection.active.count=12, client.connection.count=10, connection.count=12
2021-11-23 11:46:12 WARN [Invocation] - [172.18.20.166]:5701 [dev] [4.2.1] Retrying invocation: Invocation{op=com.hazelcast.cp.internal.operation.DefaultRaftReplicateOp{serviceName='hz:core:raft', identityHash=309878934, partitionId=185, replicaIndex=0, callId=3364, invocationTime=1637649972379 (2021-11-23 11:46:12.379), waitTimeout=-1, callTimeout=60000, tenantControl=com.hazelcast.spi.impl.tenantcontrol.NoopTenantControl#0, groupId=CPGroupId{name='default', seed=0, groupId=6960}, op=com.hazelcast.cp.internal.session.operation.HeartbeatSessionOp{serviceName='hz:core:raftSession', sessionId=10}}, tryCount=250, tryPauseMillis=500, invokeCount=100, callTimeoutMillis=60000, firstInvocationTimeMs=1637649936372, firstInvocationTime='2021-11-23 11:45:36.372', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 05:00:00.000', target=[172.18.20.168]:5701, pendingResponse={VOID}, backupsAcksExpected=-1, backupsAcksReceived=0, connection=Connection[id=11, /172.18.20.166:5701->/172.18.20.168:51871, qualifier=null, endpoint=[172.18.20.168]:5701, alive=true, connectionType=MEMBER, planeIndex=0]}, Reason: com.hazelcast.cp.exception.NotLeaderException: RaftEndpoint{uuid='47dd80d4-b983-4ae7-b6bc-c2352416eada'} is not LEADER of CPGroupId{name='default', seed=0, groupId=6960}. Known leader is: N/A
2021-11-23 11:46:12 WARN [Invocation] - [172.18.20.166]:5701 [dev] [4.2.1] Retrying invocation: Invocation{op=com.hazelcast.cp.internal.operation.DefaultRaftReplicateOp{serviceName='hz:core:raft', identityHash=1764520567, partitionId=185, replicaIndex=0, callId=3366, invocationTime=1637649972380 (2021-11-23 11:46:12.380), waitTimeout=-1, callTimeout=60000, tenantControl=com.hazelcast.spi.impl.tenantcontrol.NoopTenantControl#0, groupId=CPGroupId{name='default', seed=0, groupId=6960}, op=com.hazelcast.cp.internal.session.operation.HeartbeatSessionOp{serviceName='hz:core:raftSession', sessionId=10}}, tryCount=250, tryPauseMillis=500, invokeCount=100, callTimeoutMillis=60000, firstInvocationTimeMs=1637649936136, firstInvocationTime='2021-11-23 11:45:36.136', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 05:00:00.000', target=[172.18.20.168]:5701, pendingResponse={VOID}, backupsAcksExpected=-1, backupsAcksReceived=0, connection=Connection[id=11, /172.18.20.166:5701->/172.18.20.168:51871, qualifier=null, endpoint=[172.18.20.168]:5701, alive=true, connectionType=MEMBER, planeIndex=0]}, Reason: com.hazelcast.cp.exception.NotLeaderException: RaftEndpoint{uuid='47dd80d4-b983-4ae7-b6bc-c2352416eada'} is not LEADER of CPGroupId{name='default', seed=0, groupId=6960}. Known leader is: N/A
2021-11-23 11:46:12 WARN [Invocation] - [172.18.20.166]:5701 [dev] [4.2.1] Retrying invocation: Invocation{op=com.hazelcast.cp.internal.operation.DefaultRaftReplicateOp{serviceName='hz:core:raft', identityHash=1136581224, partitionId=185, replicaIndex=0, callId=3416, invocationTime=1637649972878 (2021-11-23 11:46:12.878), waitTimeout=-1, callTimeout=60000, tenantControl=com.hazelcast.spi.impl.tenantcontrol.NoopTenantControl#0, groupId=CPGroupId{name='default', seed=0, groupId=6960}, op=com.hazelcast.cp.internal.session.operation.HeartbeatSessionOp{serviceName='hz:core:raftSession', sessionId=10}}, tryCount=250, tryPauseMillis=500, invokeCount=100, callTimeoutMillis=60000, firstInvocationTimeMs=1637649936875, firstInvocationTime='2021-11-23 11:45:36.875', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 05:00:00.000', target=[172.18.20.167]:5701, pendingResponse={VOID}, backupsAcksExpected=-1, backupsAcksReceived=0, connection=Connection[id=10, /172.18.20.166:5701->/172.18.20.167:36039, qualifier=null, endpoint=[172.18.20.167]:5701, alive=true, connectionType=MEMBER, planeIndex=0]}, Reason: com.hazelcast.cp.exception.NotLeaderException: RaftEndpoint{uuid='5039c2ea-22dd-4f7c-a134-9367fab4e767'} is not LEADER of CPGroupId{name='default', seed=0, groupId=6960}. Known leader is: N/A
And the log keeps flooding with retried invocations failing with com.hazelcast.cp.exception.NotLeaderException until the end of the log.
Question: how do I set up the CP subsystem with Raft enabled on 3 nodes?
If you restart all members of the cluster, the groupId changes, because the CP state is not consistent without persistence mode, which is only available in the enterprise version. And you need to restart the clients too.
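As a hedged sketch of what correct CP lock usage can look like (assuming Hazelcast 4.x open-source APIs; the lock name and timeout are illustrative), waiting for CP discovery before taking the lock avoids hammering the group while no leader has been elected yet:
import java.util.concurrent.TimeUnit;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.cp.CPSubsystemManagementService;
import com.hazelcast.cp.lock.FencedLock;

public class CpLockSketch {
    public static void useCpLock(HazelcastInstance hz) throws InterruptedException {
        CPSubsystemManagementService mgmt =
                hz.getCPSubsystem().getCPSubsystemManagementService();
        // Block until the 3 configured CP members have discovered each other;
        // before that, invocations are retried with NotLeaderException as in the log.
        mgmt.awaitUntilDiscoveryCompleted(2, TimeUnit.MINUTES);

        FencedLock lock = hz.getCPSubsystem().getLock("my-lock"); // illustrative name
        lock.lock();
        try {
            // critical section
        } finally {
            lock.unlock();
        }
    }
}
If the CP state from a previous run is stale after a full-cluster restart, the same management service also exposes a destructive reset() call that wipes CP subsystem state (if I remember the 4.x API correctly), after which clients must reconnect, matching the caveat above.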
https://github.com/hazelcast/hazelcast/issues/17436

Apache Ignite: Slow Node join and failure

We have an Ignite setup with 3 servers, with persistence and therefore baselining enabled. From time to time we have the issue that the servers take a long time to rebuild the cluster after all nodes are restarted. Ignite runs embedded in the application.
20.11.2020 08:18:17.678 WARN [main] org.apache.ignite.internal.util.typedef.G:290 - Ignite work directory is not provided, automatically resolved to: D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work
20.11.2020 08:18:17.709 WARN [main] org.apache.ignite.internal.util.typedef.G:295 - Consistent ID is not set, it is recommended to set consistent ID for production clusters (use IgniteConfiguration.setConsistentId property)
20.11.2020 08:18:18.053 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Config URL: n/a
20.11.2020 08:18:18.084 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - IgniteConfiguration [igniteInstanceName=null, pubPoolSize=8, svcPoolSize=8, callbackPoolSize=8, stripedPoolSize=8, sysPoolSize=8, mgmtPoolSize=4, igfsPoolSize=4, dataStreamerPoolSize=8, utilityCachePoolSize=8, utilityCacheKeepAliveTime=60000, p2pPoolSize=2, qryPoolSize=8, sqlQryHistSize=1000, dfltQryTimeout=0, igniteHome=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite, igniteWorkDir=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work, mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer#78c03f1f, nodeId=0e60d50b-ee2e-46ed-8d76-5cb51791011b, marsh=BinaryMarshaller [], marshLocJobs=false, daemon=false, p2pEnabled=true, netTimeout=5000, netCompressionLevel=1, sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=10000, metricsUpdateFreq=2000, metricsExpTime=9223372036854775807, discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=0, ackTimeout=0, marsh=null, reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=5, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false], segPlc=STOP, segResolveAttempts=2, waitForSegOnStart=true, allResolversPassReq=true, segChkFreq=10000, commSpi=TcpCommunicationSpi [connectGate=null, connPlc=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$FirstConnectionPolicy#522ba524, chConnPlc=null, enableForcibleNodeKill=false, enableTroubleshootingLog=false, locAddr=null, locHost=null, locPort=47100, locPortRange=100, shmemPort=-1, directBuf=true, directSndBuf=false, idleConnTimeout=600000, connTimeout=5000, maxConnTimeout=600000, reconCnt=10, sockSndBuf=32768, sockRcvBuf=32768, msgQueueLimit=0, slowClientQueueLimit=0, nioSrvr=null, shmemSrv=null, usePairedConnections=false, connectionsPerNode=1, tcpNoDelay=true, filterReachableAddresses=false, ackSndThreshold=32, unackedMsgsBufSize=0, sockWriteTimeout=2000, boundTcpPort=-1, boundTcpShmemPort=-1, selectorsCnt=4, selectorSpins=0, addrRslvr=null, ctxInitLatch=java.util.concurrent.CountDownLatch#29c5ee1d[Count = 1], stopping=false, metricsLsnr=null], evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi#15cea7b0, colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [], indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi#1e6cc850, addrRslvr=null, encryptionSpi=org.apache.ignite.spi.encryption.noop.NoopEncryptionSpi#7e7f0f0a, clientMode=false, rebalanceThreadPoolSize=4, rebalanceTimeout=10000, rebalanceBatchesPrefetchCnt=3, rebalanceThrottle=0, rebalanceBatchSize=524288, txCfg=TransactionConfiguration [txSerEnabled=false, dfltIsolation=REPEATABLE_READ, dfltConcurrency=PESSIMISTIC, dfltTxTimeout=0, txTimeoutOnPartitionMapExchange=0, deadlockTimeout=10000, pessimisticTxLogSize=0, pessimisticTxLogLinger=10000, tmLookupClsName=null, txManagerFactory=null, useJtaSync=false], cacheSanityCheckEnabled=true, discoStartupDelay=60000, deployMode=SHARED, p2pMissedCacheSize=100, locHost=null, timeSrvPortBase=31100, timeSrvPortRange=100, failureDetectionTimeout=10000, sysWorkerBlockedTimeout=null, clientFailureDetectionTimeout=30000, metricsLogFreq=0, hadoopCfg=null, connectorCfg=ConnectorConfiguration [jettyPath=null, host=null, port=11211, noDelay=true, directBuf=false, sndBufSize=32768, rcvBufSize=32768, idleQryCurTimeout=600000, idleQryCurCheckFreq=60000, sndQueueLimit=0, selectorCnt=4, idleTimeout=7000, sslEnabled=false, sslClientAuth=false, sslCtxFactory=null, sslFactory=null, portRange=100, threadPoolSize=8, 
msgInterceptor=null], odbcCfg=null, warmupClos=null, atomicCfg=AtomicConfiguration [seqReserveSize=1000, cacheMode=PARTITIONED, backups=1, aff=null, grpName=null], classLdr=null, sslCtxFactory=null, platformCfg=null, binaryCfg=null, memCfg=null, pstCfg=null, dsCfg=DataStorageConfiguration [sysRegionInitSize=10485760, sysRegionMaxSize=52428800, pageSize=0, concLvl=0, dfltDataRegConf=DataRegionConfiguration [name=default, maxSize=858886144, initSize=10485760, swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=true, metricsSubIntervalCount=5, metricsRateTimeInterval=60000, persistenceEnabled=true, checkpointPageBufSize=0, lazyMemoryAllocation=true], dataRegions=DataRegionConfiguration[] [DataRegionConfiguration [name=persistent, maxSize=52428800, initSize=10485760, swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=true, metricsSubIntervalCount=5, metricsRateTimeInterval=60000, persistenceEnabled=true, checkpointPageBufSize=0, lazyMemoryAllocation=true]], storagePath=null, checkpointFreq=180000, lockWaitTime=10000, checkpointThreads=4, checkpointWriteOrder=SEQUENTIAL, walHistSize=20, maxWalArchiveSize=1073741824, walSegments=4, walSegmentSize=10485760, walPath=db/wal, walArchivePath=db/wal/archive, metricsEnabled=false, walMode=LOG_ONLY, walTlbSize=131072, walBuffSize=0, walFlushFreq=2000, walFsyncDelay=1000, walRecordIterBuffSize=67108864, alwaysWriteFullPages=false, fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory#59429fac, metricsSubIntervalCnt=5, metricsRateTimeInterval=60000, walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=false, walCompactionEnabled=false, walCompactionLevel=1, checkpointReadLockTimeout=null, walPageCompression=DISABLED, walPageCompressionLevel=null], activeOnStart=true, autoActivation=true, longQryWarnTimeout=3000, sqlConnCfg=null, cliConnCfg=ClientConnectorConfiguration [host=null, port=10800, portRange=100, sockSndBufSize=0, sockRcvBufSize=0, tcpNoDelay=true, maxOpenCursorsPerConn=128, threadPoolSize=8, idleTimeout=0, handshakeTimeout=10000, jdbcEnabled=true, odbcEnabled=true, thinCliEnabled=true, sslEnabled=false, useIgniteSslCtxFactory=true, sslClientAuth=false, sslCtxFactory=null, thinCliCfg=ThinClientConfiguration [maxActiveTxPerConn=100]], mvccVacuumThreadCnt=2, mvccVacuumFreq=5000, authEnabled=false, failureHnd=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]], commFailureRslvr=null]
20.11.2020 08:18:18.084 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Daemon mode: off
...
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Remote Management [restart: off, REST: on, JMX (remote: on, port: 8071, auth: off, ssl: off)]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Logger: JavaLogger [quiet=true, config=null]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - IGNITE_HOME=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - VM arguments: [-Dcom.sun.management.jmxremote, -Dcom.sun.management.jmxremote.port=8071, -Dcom.sun.management.jmxremote.authenticate=false, -Dcom.sun.management.jmxremote.ssl=false, -Djava.rmi.server.hostname=127.0.0.1, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=log/dump.hprof, -XX:+UseG1GC, -XX:+UseStringDeduplication, --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED, --add-exports=java.base/sun.nio.ch=ALL-UNNAMED, --add-exports=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED, --add-exports=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED, --add-exports=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED, --illegal-access=permit, -Xmx500m]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - System cache's DataRegion size is configured to 10 MB. Use DataStorageConfiguration.systemRegionInitialSize property to change the setting.
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Configured caches [in 'sysMemPlc' dataRegion: ['ignite-sys-cache']]
20.11.2020 08:18:18.100 WARN [main] org.apache.ignite.internal.IgniteKernal:295 - Peer class loading is enabled (disable it in production for performance and deployment consistency reasons)
20.11.2020 08:18:18.100 WARN [main] org.apache.ignite.internal.IgniteKernal:295 - Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - 3-rd party licenses can be found at: D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\libs\licenses
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [BUILD_VERSION=2.1.4]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [NODE_NAME=EESRV-LBXC03]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [BUILD_NUMBER=848]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [NODE_TYPE=LABBOX]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [VERSION=0]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [BUILD_TIME=1604577743000]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [APPLICATION_NAME=Labbox]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [BUILD_GIT_HASH=ff2f1f3]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [KEY=_OL2;f~.C3n}yo6p<Zx=BE4I2P:lDL"f]
20.11.2020 08:18:18.163 WARN [pub-#19] org.apache.ignite.internal.GridDiagnostic:295 - This operating system has been tested less rigorously: Windows Server 2012 R2 6.3 amd64. Our team will appreciate the feedback if you experience any problems running ignite in this environment.
20.11.2020 08:18:18.163 WARN [pub-#22] org.apache.ignite.internal.GridDiagnostic:295 - Initial heap size is 64MB (should be no less than 512MB, use -Xms512m -Xmx512m).
20.11.2020 08:18:18.334 INFO [main] o.a.i.i.p.plugin.IgnitePluginProcessor:285 - Configured plugins:
20.11.2020 08:18:18.334 INFO [main] o.a.i.i.p.plugin.IgnitePluginProcessor:285 - ^-- Authentication 1.0.0
20.11.2020 08:18:18.334 INFO [main] o.a.i.i.p.plugin.IgnitePluginProcessor:285 - ^-- null
20.11.2020 08:18:18.334 INFO [main] o.a.i.i.p.plugin.IgnitePluginProcessor:285 -
20.11.2020 08:18:18.334 INFO [main] o.a.i.i.processors.failure.FailureProcessor:285 - Configured failure handler: [hnd=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]
20.11.2020 08:18:18.600 INFO [main] o.a.i.s.communication.tcp.TcpCommunicationSpi:285 - Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=4, selectorSpins=0, pairedConn=false]
20.11.2020 08:18:18.678 WARN [main] o.a.i.s.communication.tcp.TcpCommunicationSpi:295 - Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
20.11.2020 08:18:18.694 WARN [main] o.a.i.spi.checkpoint.noop.NoopCheckpointSpi:295 - Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
20.11.2020 08:18:18.741 WARN [main] o.a.i.i.m.collision.GridCollisionManager:295 - Collision resolution is disabled (all jobs will be activated upon arrival).
20.11.2020 08:18:18.741 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Security status [authentication=off, tls/ssl=off]
20.11.2020 08:18:18.866 INFO [main] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0, locNodeId=0e60d50b-ee2e-46ed-8d76-5cb51791011b]
20.11.2020 08:18:18.866 INFO [main] o.a.i.i.p.c.p.filename.PdsFoldersResolver:285 - Successfully locked persistence storage folder [D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5]
20.11.2020 08:18:18.866 INFO [main] o.a.i.i.p.c.p.filename.PdsFoldersResolver:285 - Consistent ID used for local node is [1dbddb2c-ef76-4811-b7d3-46da82061bc5] according to persistence data storage folders
20.11.2020 08:18:18.866 INFO [main] o.a.i.i.p.c.b.CacheObjectBinaryProcessorImpl:285 - Resolved directory for serialized binary metadata: D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\binary_meta\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5
20.11.2020 08:18:19.631 INFO [main] o.a.i.i.p.c.p.file.FilePageStoreManager:285 - Resolved page store work directory: D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5
20.11.2020 08:18:19.694 INFO [main] o.a.i.i.p.c.p.w.f.FileHandleManagerImpl:285 - Initialized write-ahead log manager [mode=LOG_ONLY]
20.11.2020 08:18:19.772 WARN [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:295 - DataRegionConfiguration.maxWalArchiveSize instead DataRegionConfiguration.walHistorySize would be used for removing old archive wal files
20.11.2020 08:18:19.803 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Configured data regions initialized successfully [total=5]
20.11.2020 08:18:19.834 INFO [main] o.a.i.i.p.c.d.d.t.PartitionsEvictManager:285 - Evict partition permits=2
20.11.2020 08:18:19.850 INFO [main] o.a.i.i.p.odbc.ClientListenerProcessor:285 - Client connector processor has started on TCP port 10800
20.11.2020 08:18:20.006 INFO [main] o.a.i.i.p.r.protocols.tcp.GridTcpRestProtocol:285 - Command protocol successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0, port=11211]
20.11.2020 08:18:20.115 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Non-loopback local IPs: 192.168.92.177, fe80:0:0:0:6859:37c8:f543:8087%eth4
20.11.2020 08:18:20.115 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Enabled local MACs: 00000000000000E0, 005056BD5072
20.11.2020 08:18:20.131 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Read checkpoint status [startMarker=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5\cp\1605855371041-8b5aaf2a-7867-47b0-879c-85791363041f-START.bin, endMarker=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5\cp\1605855371041-8b5aaf2a-7867-47b0-879c-85791363041f-END.bin]
20.11.2020 08:18:20.147 INFO [main] o.a.i.i.p.c.p.pagemem.PageMemoryImpl:285 - Started page memory [memoryAllocated=50,0 MiB, pages=12404, tableSize=988,2 KiB, checkpointBuffer=50,0 MiB]
20.11.2020 08:18:20.147 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Checking memory state [lastValidPos=FileWALPointer [idx=512, fileOff=3672982, len=99269], lastMarked=FileWALPointer [idx=512, fileOff=3672982, len=99269], lastCheckpointId=8b5aaf2a-7867-47b0-879c-85791363041f]
20.11.2020 08:18:20.225 WARN [main] o.a.i.i.p.c.p.wal.FileWriteAheadLogManager:290 - WAL segment tail reached. [idx=512, isWorkDir=true, serVer=org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer#5853495b, actualFilePtr=FileWALPointer [idx=512, fileOff=3772251, len=0]]
20.11.2020 08:18:20.256 WARN [main] o.a.i.i.p.c.p.wal.FileWriteAheadLogManager:290 - WAL segment tail reached. [idx=512, isWorkDir=true, serVer=org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer#21f459fc, actualFilePtr=FileWALPointer [idx=512, fileOff=3772251, len=0]]
20.11.2020 08:18:20.256 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Found last checkpoint marker [cpId=8b5aaf2a-7867-47b0-879c-85791363041f, pos=FileWALPointer [idx=512, fileOff=3672982, len=99269]]
20.11.2020 08:18:20.350 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=512, fileOff=3672982, len=99269], lastCheckpointId=8b5aaf2a-7867-47b0-879c-85791363041f]
20.11.2020 08:18:20.365 WARN [main] o.a.i.i.p.c.p.wal.FileWriteAheadLogManager:290 - WAL segment tail reached. [idx=512, isWorkDir=true, serVer=org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer#6c15e8c7, actualFilePtr=FileWALPointer [idx=512, fileOff=3772251, len=0]]
20.11.2020 08:18:20.381 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Finished applying WAL changes [updatesApplied=0, time=31 ms]
20.11.2020 08:18:20.381 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Restoring partition state for local groups.
20.11.2020 08:18:20.381 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Finished restoring partition state for local groups [groupsProcessed=0, partitionsProcessed=0, time=0ms]
20.11.2020 08:18:20.412 INFO [main] o.a.i.i.p.cluster.GridClusterStateProcessor:285 - Restoring history for BaselineTopology[id=12]
20.11.2020 08:18:20.522 INFO [main] o.a.i.i.c.DistributedBaselineConfiguration:285 - Baseline parameter 'baselineAutoAdjustEnabled' was changed from 'null' to 'true'
20.11.2020 08:18:20.522 INFO [main] o.a.i.i.c.DistributedBaselineConfiguration:285 - Baseline parameter 'baselineAutoAdjustTimeout' was changed from 'null' to '300000'
20.11.2020 08:18:20.522 INFO [main] o.a.i.i.p.c.p.file.FilePageStoreManager:285 - Cleanup cache stores [total=1, left=0, cleanFiles=false]
20.11.2020 08:18:20.522 INFO [main] o.a.i.i.p.c.p.pagemem.PageMemoryImpl:285 - Started page memory [memoryAllocated=50,0 MiB, pages=12404, tableSize=988,2 KiB, checkpointBuffer=50,0 MiB]
20.11.2020 08:18:20.537 INFO [main] o.a.i.i.p.c.p.pagemem.PageMemoryImpl:285 - Started page memory [memoryAllocated=50,0 MiB, pages=12404, tableSize=988,2 KiB, checkpointBuffer=50,0 MiB]
20.11.2020 08:18:20.537 INFO [main] o.a.i.i.p.c.p.pagemem.PageMemoryImpl:285 - Started page memory [memoryAllocated=50,0 MiB, pages=12404, tableSize=988,2 KiB, checkpointBuffer=50,0 MiB]
20.11.2020 08:18:20.537 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Configured data regions started successfully [total=5]
20.11.2020 08:18:20.537 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Starting binary memory restore for: [166757441, -1947899996, -8785046, -2100569601, 1793235927, -499392514, 30677022, 129211407, 1139332309, 1725334265]
20.11.2020 08:18:21.334 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Read checkpoint status [startMarker=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5\cp\1605855371041-8b5aaf2a-7867-47b0-879c-85791363041f-START.bin, endMarker=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5\cp\1605855371041-8b5aaf2a-7867-47b0-879c-85791363041f-END.bin]
20.11.2020 08:18:21.334 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Checking memory state [lastValidPos=FileWALPointer [idx=512, fileOff=3672982, len=99269], lastMarked=FileWALPointer [idx=512, fileOff=3672982, len=99269], lastCheckpointId=8b5aaf2a-7867-47b0-879c-85791363041f]
20.11.2020 08:18:21.365 WARN [main] o.a.i.i.p.c.p.wal.FileWriteAheadLogManager:290 - WAL segment tail reached. [idx=512, isWorkDir=true, serVer=org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer#317e9c3c, actualFilePtr=FileWALPointer [idx=512, fileOff=3772251, len=0]]
20.11.2020 08:18:21.397 WARN [main] o.a.i.i.p.c.p.wal.FileWriteAheadLogManager:290 - WAL segment tail reached. [idx=512, isWorkDir=true, serVer=org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer#31a3f4de, actualFilePtr=FileWALPointer [idx=512, fileOff=3772251, len=0]]
20.11.2020 08:18:21.397 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Found last checkpoint marker [cpId=8b5aaf2a-7867-47b0-879c-85791363041f, pos=FileWALPointer [idx=512, fileOff=3672982, len=99269]]
20.11.2020 08:18:21.412 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Binary memory state restored at node startup [restoredPtr=FileWALPointer [idx=512, fileOff=3772251, len=0]]
20.11.2020 08:18:21.428 INFO [main] o.a.i.i.p.c.p.pagemem.PageMemoryImpl:285 - Started page memory [memoryAllocated=50,0 MiB, pages=12404, tableSize=988,2 KiB, checkpointBuffer=50,0 MiB]
20.11.2020 08:18:21.568 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=license, id=166757441, dataRegionName=persistent, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
20.11.2020 08:18:21.584 INFO [main] o.a.i.i.p.c.p.pagemem.PageMemoryImpl:285 - Started page memory [memoryAllocated=819,1 MiB, pages=203256, tableSize=15,8 MiB, checkpointBuffer=256,0 MiB]
20.11.2020 08:18:21.584 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=commservices, id=-8785046, dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
20.11.2020 08:18:21.615 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=ignite-sys-cache, id=-2100569601, dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=2147483647, mvcc=false]
20.11.2020 08:18:21.615 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=machinespecifications, id=1793235927, dataRegionName=persistent, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
20.11.2020 08:18:21.615 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=nxisPorts, id=-499392514, dataRegionName=persistent, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
20.11.2020 08:18:21.631 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=datastructures_ATOMIC_PARTITIONED_1#labqueue, id=1205724040, group=labqueue, dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
20.11.2020 08:18:21.631 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=ignite-sys-atomic-cache#labqueue, id=-327698687, group=labqueue, dataRegionName=default, mode=PARTITIONED, atomicity=TRANSACTIONAL, backups=1, mvcc=false]
20.11.2020 08:18:21.631 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=machinemaxbatchno, id=30677022, dataRegionName=persistent, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
20.11.2020 08:18:21.646 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=machineconfiguration, id=129211407, dataRegionName=persistent, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
20.11.2020 08:18:21.646 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=specimentracer, id=1139332309, dataRegionName=persistent, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
20.11.2020 08:18:21.646 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=machinestatus, id=1725334265, dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
20.11.2020 08:18:21.646 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Binary recovery performed in 1109 ms.
20.11.2020 08:18:21.646 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Read checkpoint status [startMarker=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5\cp\1605855371041-8b5aaf2a-7867-47b0-879c-85791363041f-START.bin, endMarker=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5\cp\1605855371041-8b5aaf2a-7867-47b0-879c-85791363041f-END.bin]
20.11.2020 08:18:21.662 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=512, fileOff=3672982, len=99269], lastCheckpointId=8b5aaf2a-7867-47b0-879c-85791363041f]
20.11.2020 08:18:21.693 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Finished applying WAL changes [updatesApplied=0, time=31 ms]
20.11.2020 08:18:21.693 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Restoring partition state for local groups.
20.11.2020 08:18:21.943 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Finished restoring partition state for local groups [groupsProcessed=10, partitionsProcessed=5220, time=235ms]
20.11.2020 08:18:22.021 INFO [main] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Connection check threshold is calculated: 10000
20.11.2020 08:19:19.373 INFO [tcp-disco-srvr-[:47500]-#3] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - TCP discovery accepted incoming connection [rmtAddr=/192.168.92.175, rmtPort=56962]
20.11.2020 08:19:19.389 INFO [tcp-disco-srvr-[:47500]-#3] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - TCP discovery spawning a new thread for connection [rmtAddr=/192.168.92.175, rmtPort=56962]
20.11.2020 08:19:19.389 INFO [tcp-disco-sock-reader-[]-#4] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Started serving remote node connection [rmtAddr=/192.168.92.175:56962, rmtPort=56962]
20.11.2020 08:19:19.389 INFO [tcp-disco-sock-reader-[9f44068b 192.168.92.175:56962 client]-#4] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Initialized connection with remote client node [nodeId=9f44068b-b8ca-4d8b-bb32-efd2e2a1940c, rmtAddr=/192.168.92.175:56962]
20.11.2020 08:19:19.498 INFO [tcp-disco-sock-reader-[9f44068b 192.168.92.175:56962 client]-#4] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Finished serving remote node connection [rmtAddr=/192.168.92.175:56962, rmtPort=56962
20.11.2020 08:20:21.287 INFO [tcp-disco-srvr-[:47500]-#3] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - TCP discovery accepted incoming connection [rmtAddr=/192.168.92.176, rmtPort=55941]
20.11.2020 08:20:21.287 INFO [tcp-disco-srvr-[:47500]-#3] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - TCP discovery spawning a new thread for connection [rmtAddr=/192.168.92.176, rmtPort=55941]
20.11.2020 08:20:21.287 INFO [tcp-disco-sock-reader-[]-#5] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Started serving remote node connection [rmtAddr=/192.168.92.176:55941, rmtPort=55941]
20.11.2020 08:20:21.287 INFO [tcp-disco-sock-reader-[6a50abff 192.168.92.176:55941]-#5] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Initialized connection with remote server node [nodeId=6a50abff-8cfd-4b3a-b894-54fa9d405d36, rmtAddr=/192.168.92.176:55941]
20.11.2020 08:20:21.287 INFO [tcp-disco-sock-reader-[6a50abff 192.168.92.176:55941]-#5] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Finished serving remote node connection [rmtAddr=/192.168.92.176:55941, rmtPort=55941
20.11.2020 08:20:26.239 INFO [tcp-disco-srvr-[:47500]-#3] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - TCP discovery accepted incoming connection [rmtAddr=/192.168.92.175, rmtPort=56996]
... it continues like that until the join or the failure
The logs look the same on all servers. In this case servers 1 and 2 formed a cluster after 7 minutes; server 3 failed after 9 minutes due to an incompatible baseline topology. After resetting the failed server it could rejoin the cluster. This behavior only happens sometimes; most of the time the servers rebuild the cluster without problems.
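One hint is visible in the log itself: the warning that no consistent ID is set. With persistence and baselining, a stable consistent ID per node lets the baseline topology recognize a returning node instead of treating it as a new one. A minimal sketch (the ID value and the activation call are illustrative, not the poster's actual setup):
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class IgniteNodeSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Stable identity across restarts, as recommended by the startup warning.
        cfg.setConsistentId("labbox-node-1"); // illustrative value, one per server

        DataStorageConfiguration storage = new DataStorageConfiguration();
        storage.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
        cfg.setDataStorageConfiguration(storage);

        Ignite ignite = Ignition.start(cfg);
        // With persistence the cluster starts inactive; activate once the
        // expected baseline nodes have joined.
        ignite.cluster().active(true);
    }
}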

Unable to set up a connection between Kafka (Hortonworks sandbox) and IntelliJ IDEA (local Windows system)

Here is the exception:
exception:java.nio.channels.ClosedChannelException
The whole log from the console:
[main] INFO kafka.utils.Log4jControllerRegistration$ - Registered kafka:type=kafka.Log4jController MBean
[main] INFO kafka.utils.VerifiableProperties - Verifying properties
[main] INFO kafka.utils.VerifiableProperties - Property metadata.broker.list is overridden to xxx.xxx.xxx.xxx:6667
[main] INFO kafka.utils.VerifiableProperties - Property request.required.acks is overridden to 1
[main] INFO kafka.utils.VerifiableProperties - Property serializer.class is overridden to kafka.serializer.StringEncoder
[Thread-0] INFO kafka.client.ClientUtils$ - Fetching metadata from broker BrokerEndPoint(0,xxx.xxx.xxx.xxx,6667) with correlation id 0 for 1 topic(s) Set(test)
[Thread-0] INFO kafka.producer.SyncProducer - Connected to xxx.xxx.xxx.xxx:6667 for producing
[Thread-0] INFO kafka.producer.SyncProducer - Disconnecting from xxx.xxx.xxx.xxx:6667
[Thread-0] WARN kafka.client.ClientUtils$ - Fetching topic metadata with correlation id 0 for topics [Set(test)] from broker [BrokerEndPoint(0,xxx.xxx.xxx.xxx,6667)] failed
java.nio.channels.ClosedChannelException
I have found some answers online that told me I should set advertised.host.name in server.properties, but I really don't know which IP to set as advertised.host.name.
I am totally lost in this situation. Here is the Java code I wrote in IntelliJ; I just want to know which hostname I should put in BROKER_LIST to make a connection between the Hortonworks sandbox and my local machine.
public class KafkaProperties {
    public static final String ZK = "127.0.0.1:2181";
    public static final String TOPIC = "test";
    public static final String BROKER_LIST = "xx.xxx.xxx.xxx:6667";
}
server.properties:
# Generated by Apache Ambari. Sun Mar 1 19:04:58 2020
auto.create.topics.enable=true
auto.leader.rebalance.enable=true
compression.type=producer
controlled.shutdown.enable=true
controlled.shutdown.max.retries=3
controlled.shutdown.retry.backoff.ms=5000
controller.message.queue.size=10
controller.socket.timeout.ms=30000
default.replication.factor=1
delete.topic.enable=true
external.kafka.metrics.exclude.prefix=kafka.network.RequestMetrics,kafka.server.DelayedOperationPurgatory,kafka.server.BrokerTopicMetrics.BytesRejectedPerSec
external.kafka.metrics.include.prefix=kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetCommit.98percentile,kafka.network.RequestMetrics.ResponseQueueTimeMs.request.Offsets.95percentile,kafka.network.RequestMetrics.ResponseSendTimeMs.request.Fetch.95percentile,kafka.network.RequestMetrics.RequestsPerSec.request
fetch.purgatory.purge.interval.requests=10000
kafka.ganglia.metrics.group=kafka
kafka.ganglia.metrics.host=localhost
kafka.ganglia.metrics.port=8671
kafka.ganglia.metrics.reporter.enabled=true
kafka.metrics.reporters=
kafka.timeline.metrics.host_in_memory_aggregation=
kafka.timeline.metrics.host_in_memory_aggregation_port=
kafka.timeline.metrics.host_in_memory_aggregation_protocol=
kafka.timeline.metrics.hosts=
kafka.timeline.metrics.maxRowCacheSize=10000
kafka.timeline.metrics.port=
kafka.timeline.metrics.protocol=
kafka.timeline.metrics.reporter.enabled=true
kafka.timeline.metrics.reporter.sendInterval=5900
kafka.timeline.metrics.truststore.password=
kafka.timeline.metrics.truststore.path=
kafka.timeline.metrics.truststore.type=
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10
listeners=PLAINTEXT://sandbox-hdp.hortonworks.com:6667
log.cleanup.interval.mins=10
log.dirs=/kafka-logs
log.index.interval.bytes=4096
log.index.size.max.bytes=10485760
log.retention.bytes=-1
log.retention.check.interval.ms=600000
log.retention.hours=168
log.roll.hours=168
log.segment.bytes=1073741824
message.max.bytes=1000000
min.insync.replicas=1
num.io.threads=8
num.network.threads=3
num.partitions=1
num.recovery.threads.per.data.dir=1
num.replica.fetchers=1
offset.metadata.max.bytes=4096
offsets.commit.required.acks=-1
offsets.commit.timeout.ms=5000
offsets.load.buffer.size=5242880
offsets.retention.check.interval.ms=600000
offsets.retention.minutes=86400000
offsets.topic.compression.codec=0
offsets.topic.num.partitions=50
offsets.topic.replication.factor=1
offsets.topic.segment.bytes=104857600
port=6667
producer.metrics.enable=false
producer.purgatory.purge.interval.requests=10000
queued.max.requests=500
replica.fetch.max.bytes=1048576
replica.fetch.min.bytes=1
replica.fetch.wait.max.ms=500
replica.high.watermark.checkpoint.interval.ms=5000
replica.lag.max.messages=4000
replica.lag.time.max.ms=10000
replica.socket.receive.buffer.bytes=65536
replica.socket.timeout.ms=30000
sasl.enabled.mechanisms=GSSAPI
sasl.mechanism.inter.broker.protocol=GSSAPI
security.inter.broker.protocol=PLAINTEXT
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
ssl.client.auth=none
ssl.key.password=
ssl.keystore.location=
ssl.keystore.password=
ssl.truststore.location=
ssl.truststore.password=
zookeeper.connect=sandbox-hdp.hortonworks.com:2181
zookeeper.connection.timeout.ms=25000
zookeeper.session.timeout.ms=30000
zookeeper.sync.time.ms=2000
Now the code in IntelliJ:
public class KafkaProperties {
    public static final String ZK = "sandbox-hdp.hortonworks.com:2181";
    public static final String TOPIC = "yanzhao";
    public static final String BROKER_LIST = "sandbox-hdp.hortonworks.com:6667";
}
/etc/hosts file:
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.18.0.2 sandbox-hdp.hortonworks.com sandbox-hdp
A part of the jps command output:
7846 QuorumPeerMain /usr/hdp/current/zookeeper-server/conf/zoo.cfg
363 AmbariServer
6444 JournalNode
25581 Kafka /usr/hdp/3.0.1.0-187/kafka/config/server.properties
connection:
[root@sandbox-hdp ~]# netstat -lpn | grep 6667
tcp 0 0 172.18.0.2:6667 0.0.0.0:* LISTEN 25581/java
The command I ran in the sandbox (to set up the consumer):
kafka-console-consumer.sh --bootstrap-server sandbox-hdp.hortonworks.com:6667 --topic yanzhao
exception logs:
[main] INFO kafka.utils.Log4jControllerRegistration$ - Registered kafka:type=kafka.Log4jController MBean
[main] INFO kafka.utils.VerifiableProperties - Verifying properties
[main] INFO kafka.utils.VerifiableProperties - Property metadata.broker.list is overridden to sandbox-hdp.hortonworks.com:6667
[main] INFO kafka.utils.VerifiableProperties - Property request.required.acks is overridden to 1
[main] INFO kafka.utils.VerifiableProperties - Property serializer.class is overridden to kafka.serializer.StringEncoder
[Thread-0] INFO kafka.client.ClientUtils$ - Fetching metadata from broker BrokerEndPoint(0,sandbox-hdp.hortonworks.com,6667) with correlation id 0 for 1 topic(s) Set(yanzhao)
[Thread-0] INFO kafka.producer.SyncProducer - Connected to sandbox-hdp.hortonworks.com:6667 for producing
[Thread-0] INFO kafka.producer.SyncProducer - Disconnecting from sandbox-hdp.hortonworks.com:6667
[Thread-0] WARN kafka.client.ClientUtils$ - Fetching topic metadata with correlation id 0 for topics [Set(yanzhao)] from broker [BrokerEndPoint(0,sandbox-hdp.hortonworks.com,6667)] failed
java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:112)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:80)
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:79)
at kafka.producer.SyncProducer.send(SyncProducer.scala:124)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:63)
at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:83)
at kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:76)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:85)
at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:76)
at kafka.producer.Producer.send(Producer.scala:78)
at kafka.javaapi.producer.Producer.send(Producer.scala:35)
at com.yanzhao.spark.kafka.KafkaProducer.run(KafkaProducer.java:32)
[Thread-0] INFO kafka.producer.SyncProducer - Disconnecting from sandbox-hdp.hortonworks.com:6667
[Thread-0] ERROR kafka.utils.CoreUtils$ - fetching topic metadata for topics [Set(yanzhao)] from broker [ArrayBuffer(BrokerEndPoint(0,sandbox-hdp.hortonworks.com,6667))] failed
kafka.common.KafkaException: fetching topic metadata for topics [Set(yanzhao)] from broker [ArrayBuffer(BrokerEndPoint(0,sandbox-hdp.hortonworks.com,6667))] failed
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:77)
at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:83)
at kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:76)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:85)
at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:76)
at kafka.producer.Producer.send(Producer.scala:78)
at kafka.javaapi.producer.Producer.send(Producer.scala:35)
at com.yanzhao.spark.kafka.KafkaProducer.run(KafkaProducer.java:32)
Caused by: java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:112)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:80)
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:79)
at kafka.producer.SyncProducer.send(SyncProducer.scala:124)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:63)
... 7 more
You need to use the hostname of the machine on which your Kafka broker is running (not the IP of the machine the client is running on).
Your client then needs to use the address the Kafka broker advertises publicly. This address is configured through advertised.listeners:
Listeners to publish to ZooKeeper for clients to use, if different than the listeners config property. In IaaS environments, this may need to be different from the interface to which the broker binds. If this is not set, the value for listeners will be used. Unlike listeners it is not valid to advertise the 0.0.0.0 meta-address.
Therefore, your client should use this address. If advertised.listeners is not configured in server.properties, you can probably still use the listeners address.
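For illustration, a hedged sketch of the relevant server.properties entries (the PLAINTEXT listener name and the 0.0.0.0 bind address are assumptions; adjust to your environment):
listeners=PLAINTEXT://0.0.0.0:6667
advertised.listeners=PLAINTEXT://sandbox-hdp.hortonworks.com:6667
Note that binding listeners to 0.0.0.0 is fine, but, as the quote above says, advertised.listeners must carry a resolvable hostname or IP, never 0.0.0.0.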
On a final note, I can see that you have used "127.0.0.1:2181" as the ZooKeeper address. Likewise, you need to use the hostname of the machine where ZooKeeper is running.
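On the client side, a minimal sketch of the legacy-producer wiring implied by the stack trace above (the property values match what the logs show being overridden; the class body itself is an assumption, since KafkaProducer.java is not shown):
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class KafkaProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Use the hostname the broker advertises, never 127.0.0.1
        props.put("metadata.broker.list", KafkaProperties.BROKER_LIST);
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");
        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>(KafkaProperties.TOPIC, "test message"));
        producer.close();
    }
}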

Hazelcast web clustered session service: Retrying the connection

Hazelcast is unable to connect; the messages received are as follows:
2018-03-03 10:27:51,074 INFO c.h.i.DefaultAddressPicker [LOCAL] [dev] [3.6] Picked Address[127.0.0.1]:5703, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5703], bind any local is true
2018-03-03 10:28:01.078 INFO 19478 --- [.ensureInstance] c.hazelcast.web.ClusteredSessionService : Retrying the connection!!
2018-03-03 10:28:01,078 INFO c.h.w.ClusteredSessionService Retrying the connection!!
2018-03-03 10:28:01.079 INFO 19478 --- [.ensureInstance] com.hazelcast.config.XmlConfigLocator : Loading 'hazelcast-default.xml' from classpath.
2018-03-03 10:28:01,079 INFO c.h.c.XmlConfigLocator Loading 'hazelcast-default.xml' from classpath.
2018-03-03 10:28:01.085 INFO 19478 --- [.ensureInstance] c.hazelcast.web.HazelcastInstanceLoader : Creating a new HazelcastInstance for session replication
2018-03-03 10:28:01,085 INFO c.h.w.HazelcastInstanceLoader Creating a new HazelcastInstance for session replication
2018-03-03 10:28:01.086 INFO 19478 --- [.ensureInstance] c.h.instance.DefaultAddressPicker : [LOCAL] [dev] [3.6] Picked Address[127.0.0.1]:5703, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5703], bind any local is true

Cassandra Java Driver Cold to Hot in 500ms?

The first ("cold") use of Cluster and Session against a local data source (Cassandra) takes 640 ms. Any additional connect takes 80 to 100 ms, so the overhead of the first connect is about 500+ ms. Is that normal, and is there anything I can do to get this figure down somehow? I use a T410 (i5 2.5 GHz).
[Update]
23:27:11.453 [main] DEBUG c.d.driver.core.SystemProperties - com.datastax.driver.NEW_NODE_DELAY_SECONDS is undefined, using default value 1
23:27:11.460 [main] DEBUG c.d.driver.core.SystemProperties - com.datastax.driver.NON_BLOCKING_EXECUTOR_SIZE is undefined, using default value 4
23:27:11.463 [main] DEBUG c.d.driver.core.SystemProperties - com.datastax.driver.NOTIF_LOCK_TIMEOUT_SECONDS is undefined, using default value 60
23:27:11.607 [main] DEBUG com.datastax.driver.core.Cluster - Starting new cluster with contact points [localhost/127.0.0.1:9042]
23:27:11.905 [main] DEBUG com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-1, inFlight=0, closed=false] Transport initialized and ready
23:27:11.906 [main] DEBUG c.d.driver.core.ControlConnection - [Control connection] Refreshing node list and token map
23:27:11.969 [main] DEBUG c.d.driver.core.ControlConnection - [Control connection] Refreshing schema
23:27:12.016 [main] DEBUG c.d.driver.core.ControlConnection - [Control connection] Refreshing node list and token map
23:27:12.051 [main] DEBUG c.d.driver.core.ControlConnection - [Control connection] Successfully connected to localhost/127.0.0.1:9042
23:27:12.052 [main] INFO c.d.d.c.p.DCAwareRoundRobinPolicy - Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
23:27:12.053 [main] INFO com.datastax.driver.core.Cluster - New Cassandra host localhost/127.0.0.1:9042 added
23:27:12.076 [Cassandra Java Driver worker-0] DEBUG com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-2, inFlight=0, closed=false] Transport initialized and ready
23:27:12.077 [Cassandra Java Driver worker-0] DEBUG com.datastax.driver.core.Session - Added connection pool for localhost/127.0.0.1:9042
23:27:12.097 [main] DEBUG com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-2, inFlight=0, closed=true] closing connection
23:27:12.103 [main] DEBUG com.datastax.driver.core.Cluster - Shutting down
23:27:12.105 [main] DEBUG com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-1, inFlight=0, closed=true] closing connection
23:27:12.123 [main] DEBUG com.datastax.driver.core.Cluster - Starting new cluster with contact points [/127.0.0.1:9042]
23:27:12.132 [main] DEBUG com.datastax.driver.core.Connection - Connection[/127.0.0.1:9042-1, inFlight=0, closed=false] Transport initialized and ready
23:27:12.132 [main] DEBUG c.d.driver.core.ControlConnection - [Control connection] Refreshing node list and token map
23:27:12.138 [main] DEBUG c.d.driver.core.ControlConnection - [Control connection] Refreshing schema
23:27:12.168 [main] DEBUG c.d.driver.core.ControlConnection - [Control connection] Refreshing node list and token map
23:27:12.192 [main] DEBUG c.d.driver.core.ControlConnection - [Control connection] Successfully connected to /127.0.0.1:9042
23:27:12.192 [main] INFO c.d.d.c.p.DCAwareRoundRobinPolicy - Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
23:27:12.192 [main] INFO com.datastax.driver.core.Cluster - New Cassandra host /127.0.0.1:9042 added
23:27:12.201 [Cassandra Java Driver worker-0] DEBUG com.datastax.driver.core.Connection - Connection[/127.0.0.1:9042-2, inFlight=0, closed=false] Transport initialized and ready
23:27:12.202 [Cassandra Java Driver worker-0] DEBUG com.datastax.driver.core.Session - Added connection pool for /127.0.0.1:9042
As one can see, the first connection attempt takes up to 600 ms or more, depending on how one reads the figures.
My guess is this has to do with connection initialization. In all currently released versions of the Java driver, connections are initialized one after another, synchronously. Fortunately, individual host pools are initialized in parallel, but the connections within a pool are not. If you are using 2.0.9, which has a default of 8 core connections per host, that could explain why you are seeing slow initialization times. Also, if you are using password authentication, that will slow things down quite a bit as well (from ~0-10 ms per connection to ~60-120 ms).
In Java driver 2.0.10, which will be released soon, all connections are initialized in parallel, which greatly improves Session initialization time. For more information, see JAVA-701.
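As a stopgap on 2.0.9, a hedged sketch: shrinking the core pool size reduces how many connections are opened synchronously on the first connect (PoolingOptions, setCoreConnectionsPerHost, and withPoolingOptions are standard driver 2.0.x API; the local contact point matches the logs above):
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

public class ColdConnectSketch {
    public static void main(String[] args) {
        // Fewer core connections per host => fewer synchronous handshakes at init
        PoolingOptions pooling = new PoolingOptions()
                .setCoreConnectionsPerHost(HostDistance.LOCAL, 1);

        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withPoolingOptions(pooling)
                .build();
        Session session = cluster.connect();
        session.close();
        cluster.close();
    }
}
Another option is simply to build the Cluster and Session once at application startup, so that later requests always hit a warm pool.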
