We have a Ignite setup with 3 Servers and Persistence and therefore Baselining enabled. From time to time we have the issue that the Servers take a long time to rebuild the cluster after all Nodes are restarted. Ignite runs embedded in the application.
20.11.2020 08:18:17.678 WARN [main] org.apache.ignite.internal.util.typedef.G:290 - Ignite work directory is not provided, automatically resolved to: D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work
20.11.2020 08:18:17.709 WARN [main] org.apache.ignite.internal.util.typedef.G:295 - Consistent ID is not set, it is recommended to set consistent ID for production clusters (use IgniteConfiguration.setConsistentId property)
20.11.2020 08:18:18.053 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Config URL: n/a
20.11.2020 08:18:18.084 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - IgniteConfiguration [igniteInstanceName=null, pubPoolSize=8, svcPoolSize=8, callbackPoolSize=8, stripedPoolSize=8, sysPoolSize=8, mgmtPoolSize=4, igfsPoolSize=4, dataStreamerPoolSize=8, utilityCachePoolSize=8, utilityCacheKeepAliveTime=60000, p2pPoolSize=2, qryPoolSize=8, sqlQryHistSize=1000, dfltQryTimeout=0, igniteHome=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite, igniteWorkDir=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work, mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer#78c03f1f, nodeId=0e60d50b-ee2e-46ed-8d76-5cb51791011b, marsh=BinaryMarshaller [], marshLocJobs=false, daemon=false, p2pEnabled=true, netTimeout=5000, netCompressionLevel=1, sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=10000, metricsUpdateFreq=2000, metricsExpTime=9223372036854775807, discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=0, ackTimeout=0, marsh=null, reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=5, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false], segPlc=STOP, segResolveAttempts=2, waitForSegOnStart=true, allResolversPassReq=true, segChkFreq=10000, commSpi=TcpCommunicationSpi [connectGate=null, connPlc=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$FirstConnectionPolicy#522ba524, chConnPlc=null, enableForcibleNodeKill=false, enableTroubleshootingLog=false, locAddr=null, locHost=null, locPort=47100, locPortRange=100, shmemPort=-1, directBuf=true, directSndBuf=false, idleConnTimeout=600000, connTimeout=5000, maxConnTimeout=600000, reconCnt=10, sockSndBuf=32768, sockRcvBuf=32768, msgQueueLimit=0, slowClientQueueLimit=0, nioSrvr=null, shmemSrv=null, usePairedConnections=false, connectionsPerNode=1, tcpNoDelay=true, filterReachableAddresses=false, ackSndThreshold=32, unackedMsgsBufSize=0, sockWriteTimeout=2000, boundTcpPort=-1, boundTcpShmemPort=-1, selectorsCnt=4, selectorSpins=0, addrRslvr=null, ctxInitLatch=java.util.concurrent.CountDownLatch#29c5ee1d[Count = 1], stopping=false, metricsLsnr=null], evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi#15cea7b0, colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [], indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi#1e6cc850, addrRslvr=null, encryptionSpi=org.apache.ignite.spi.encryption.noop.NoopEncryptionSpi#7e7f0f0a, clientMode=false, rebalanceThreadPoolSize=4, rebalanceTimeout=10000, rebalanceBatchesPrefetchCnt=3, rebalanceThrottle=0, rebalanceBatchSize=524288, txCfg=TransactionConfiguration [txSerEnabled=false, dfltIsolation=REPEATABLE_READ, dfltConcurrency=PESSIMISTIC, dfltTxTimeout=0, txTimeoutOnPartitionMapExchange=0, deadlockTimeout=10000, pessimisticTxLogSize=0, pessimisticTxLogLinger=10000, tmLookupClsName=null, txManagerFactory=null, useJtaSync=false], cacheSanityCheckEnabled=true, discoStartupDelay=60000, deployMode=SHARED, p2pMissedCacheSize=100, locHost=null, timeSrvPortBase=31100, timeSrvPortRange=100, failureDetectionTimeout=10000, sysWorkerBlockedTimeout=null, clientFailureDetectionTimeout=30000, metricsLogFreq=0, hadoopCfg=null, connectorCfg=ConnectorConfiguration [jettyPath=null, host=null, port=11211, noDelay=true, directBuf=false, sndBufSize=32768, rcvBufSize=32768, idleQryCurTimeout=600000, idleQryCurCheckFreq=60000, sndQueueLimit=0, selectorCnt=4, idleTimeout=7000, sslEnabled=false, sslClientAuth=false, sslCtxFactory=null, sslFactory=null, portRange=100, threadPoolSize=8, msgInterceptor=null], odbcCfg=null, warmupClos=null, atomicCfg=AtomicConfiguration [seqReserveSize=1000, cacheMode=PARTITIONED, backups=1, aff=null, grpName=null], classLdr=null, sslCtxFactory=null, platformCfg=null, binaryCfg=null, memCfg=null, pstCfg=null, dsCfg=DataStorageConfiguration [sysRegionInitSize=10485760, sysRegionMaxSize=52428800, pageSize=0, concLvl=0, dfltDataRegConf=DataRegionConfiguration [name=default, maxSize=858886144, initSize=10485760, swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=true, metricsSubIntervalCount=5, metricsRateTimeInterval=60000, persistenceEnabled=true, checkpointPageBufSize=0, lazyMemoryAllocation=true], dataRegions=DataRegionConfiguration[] [DataRegionConfiguration [name=persistent, maxSize=52428800, initSize=10485760, swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=true, metricsSubIntervalCount=5, metricsRateTimeInterval=60000, persistenceEnabled=true, checkpointPageBufSize=0, lazyMemoryAllocation=true]], storagePath=null, checkpointFreq=180000, lockWaitTime=10000, checkpointThreads=4, checkpointWriteOrder=SEQUENTIAL, walHistSize=20, maxWalArchiveSize=1073741824, walSegments=4, walSegmentSize=10485760, walPath=db/wal, walArchivePath=db/wal/archive, metricsEnabled=false, walMode=LOG_ONLY, walTlbSize=131072, walBuffSize=0, walFlushFreq=2000, walFsyncDelay=1000, walRecordIterBuffSize=67108864, alwaysWriteFullPages=false, fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory#59429fac, metricsSubIntervalCnt=5, metricsRateTimeInterval=60000, walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=false, walCompactionEnabled=false, walCompactionLevel=1, checkpointReadLockTimeout=null, walPageCompression=DISABLED, walPageCompressionLevel=null], activeOnStart=true, autoActivation=true, longQryWarnTimeout=3000, sqlConnCfg=null, cliConnCfg=ClientConnectorConfiguration [host=null, port=10800, portRange=100, sockSndBufSize=0, sockRcvBufSize=0, tcpNoDelay=true, maxOpenCursorsPerConn=128, threadPoolSize=8, idleTimeout=0, handshakeTimeout=10000, jdbcEnabled=true, odbcEnabled=true, thinCliEnabled=true, sslEnabled=false, useIgniteSslCtxFactory=true, sslClientAuth=false, sslCtxFactory=null, thinCliCfg=ThinClientConfiguration [maxActiveTxPerConn=100]], mvccVacuumThreadCnt=2, mvccVacuumFreq=5000, authEnabled=false, failureHnd=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]], commFailureRslvr=null]
20.11.2020 08:18:18.084 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Daemon mode: off
...
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Remote Management [restart: off, REST: on, JMX (remote: on, port: 8071, auth: off, ssl: off)]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Logger: JavaLogger [quiet=true, config=null]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - IGNITE_HOME=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - VM arguments: [-Dcom.sun.management.jmxremote, -Dcom.sun.management.jmxremote.port=8071, -Dcom.sun.management.jmxremote.authenticate=false, -Dcom.sun.management.jmxremote.ssl=false, -Djava.rmi.server.hostname=127.0.0.1, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=log/dump.hprof, -XX:+UseG1GC, -XX:+UseStringDeduplication, --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED, --add-exports=java.base/sun.nio.ch=ALL-UNNAMED, --add-exports=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED, --add-exports=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED, --add-exports=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED, --illegal-access=permit, -Xmx500m]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - System cache's DataRegion size is configured to 10 MB. Use DataStorageConfiguration.systemRegionInitialSize property to change the setting.
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Configured caches [in 'sysMemPlc' dataRegion: ['ignite-sys-cache']]
20.11.2020 08:18:18.100 WARN [main] org.apache.ignite.internal.IgniteKernal:295 - Peer class loading is enabled (disable it in production for performance and deployment consistency reasons)
20.11.2020 08:18:18.100 WARN [main] org.apache.ignite.internal.IgniteKernal:295 - Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - 3-rd party licenses can be found at: D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\libs\licenses
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [BUILD_VERSION=2.1.4]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [NODE_NAME=EESRV-LBXC03]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [BUILD_NUMBER=848]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [NODE_TYPE=LABBOX]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [VERSION=0]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [BUILD_TIME=1604577743000]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [APPLICATION_NAME=Labbox]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [BUILD_GIT_HASH=ff2f1f3]
20.11.2020 08:18:18.100 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Local node user attribute [KEY=_OL2;f~.C3n}yo6p<Zx=BE4I2P:lDL"f]
20.11.2020 08:18:18.163 WARN [pub-#19] org.apache.ignite.internal.GridDiagnostic:295 - This operating system has been tested less rigorously: Windows Server 2012 R2 6.3 amd64. Our team will appreciate the feedback if you experience any problems running ignite in this environment.
20.11.2020 08:18:18.163 WARN [pub-#22] org.apache.ignite.internal.GridDiagnostic:295 - Initial heap size is 64MB (should be no less than 512MB, use -Xms512m -Xmx512m).
20.11.2020 08:18:18.334 INFO [main] o.a.i.i.p.plugin.IgnitePluginProcessor:285 - Configured plugins:
20.11.2020 08:18:18.334 INFO [main] o.a.i.i.p.plugin.IgnitePluginProcessor:285 - ^-- Authentication 1.0.0
20.11.2020 08:18:18.334 INFO [main] o.a.i.i.p.plugin.IgnitePluginProcessor:285 - ^-- null
20.11.2020 08:18:18.334 INFO [main] o.a.i.i.p.plugin.IgnitePluginProcessor:285 -
20.11.2020 08:18:18.334 INFO [main] o.a.i.i.processors.failure.FailureProcessor:285 - Configured failure handler: [hnd=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]
20.11.2020 08:18:18.600 INFO [main] o.a.i.s.communication.tcp.TcpCommunicationSpi:285 - Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=4, selectorSpins=0, pairedConn=false]
20.11.2020 08:18:18.678 WARN [main] o.a.i.s.communication.tcp.TcpCommunicationSpi:295 - Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
20.11.2020 08:18:18.694 WARN [main] o.a.i.spi.checkpoint.noop.NoopCheckpointSpi:295 - Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
20.11.2020 08:18:18.741 WARN [main] o.a.i.i.m.collision.GridCollisionManager:295 - Collision resolution is disabled (all jobs will be activated upon arrival).
20.11.2020 08:18:18.741 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Security status [authentication=off, tls/ssl=off]
20.11.2020 08:18:18.866 INFO [main] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0, locNodeId=0e60d50b-ee2e-46ed-8d76-5cb51791011b]
20.11.2020 08:18:18.866 INFO [main] o.a.i.i.p.c.p.filename.PdsFoldersResolver:285 - Successfully locked persistence storage folder [D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5]
20.11.2020 08:18:18.866 INFO [main] o.a.i.i.p.c.p.filename.PdsFoldersResolver:285 - Consistent ID used for local node is [1dbddb2c-ef76-4811-b7d3-46da82061bc5] according to persistence data storage folders
20.11.2020 08:18:18.866 INFO [main] o.a.i.i.p.c.b.CacheObjectBinaryProcessorImpl:285 - Resolved directory for serialized binary metadata: D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\binary_meta\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5
20.11.2020 08:18:19.631 INFO [main] o.a.i.i.p.c.p.file.FilePageStoreManager:285 - Resolved page store work directory: D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5
20.11.2020 08:18:19.694 INFO [main] o.a.i.i.p.c.p.w.f.FileHandleManagerImpl:285 - Initialized write-ahead log manager [mode=LOG_ONLY]
20.11.2020 08:18:19.772 WARN [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:295 - DataRegionConfiguration.maxWalArchiveSize instead DataRegionConfiguration.walHistorySize would be used for removing old archive wal files
20.11.2020 08:18:19.803 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Configured data regions initialized successfully [total=5]
20.11.2020 08:18:19.834 INFO [main] o.a.i.i.p.c.d.d.t.PartitionsEvictManager:285 - Evict partition permits=2
20.11.2020 08:18:19.850 INFO [main] o.a.i.i.p.odbc.ClientListenerProcessor:285 - Client connector processor has started on TCP port 10800
20.11.2020 08:18:20.006 INFO [main] o.a.i.i.p.r.protocols.tcp.GridTcpRestProtocol:285 - Command protocol successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0, port=11211]
20.11.2020 08:18:20.115 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Non-loopback local IPs: 192.168.92.177, fe80:0:0:0:6859:37c8:f543:8087%eth4
20.11.2020 08:18:20.115 INFO [main] org.apache.ignite.internal.IgniteKernal:285 - Enabled local MACs: 00000000000000E0, 005056BD5072
20.11.2020 08:18:20.131 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Read checkpoint status [startMarker=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5\cp\1605855371041-8b5aaf2a-7867-47b0-879c-85791363041f-START.bin, endMarker=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5\cp\1605855371041-8b5aaf2a-7867-47b0-879c-85791363041f-END.bin]
20.11.2020 08:18:20.147 INFO [main] o.a.i.i.p.c.p.pagemem.PageMemoryImpl:285 - Started page memory [memoryAllocated=50,0 MiB, pages=12404, tableSize=988,2 KiB, checkpointBuffer=50,0 MiB]
20.11.2020 08:18:20.147 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Checking memory state [lastValidPos=FileWALPointer [idx=512, fileOff=3672982, len=99269], lastMarked=FileWALPointer [idx=512, fileOff=3672982, len=99269], lastCheckpointId=8b5aaf2a-7867-47b0-879c-85791363041f]
20.11.2020 08:18:20.225 WARN [main] o.a.i.i.p.c.p.wal.FileWriteAheadLogManager:290 - WAL segment tail reached. [idx=512, isWorkDir=true, serVer=org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer#5853495b, actualFilePtr=FileWALPointer [idx=512, fileOff=3772251, len=0]]
20.11.2020 08:18:20.256 WARN [main] o.a.i.i.p.c.p.wal.FileWriteAheadLogManager:290 - WAL segment tail reached. [idx=512, isWorkDir=true, serVer=org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer#21f459fc, actualFilePtr=FileWALPointer [idx=512, fileOff=3772251, len=0]]
20.11.2020 08:18:20.256 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Found last checkpoint marker [cpId=8b5aaf2a-7867-47b0-879c-85791363041f, pos=FileWALPointer [idx=512, fileOff=3672982, len=99269]]
20.11.2020 08:18:20.350 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=512, fileOff=3672982, len=99269], lastCheckpointId=8b5aaf2a-7867-47b0-879c-85791363041f]
20.11.2020 08:18:20.365 WARN [main] o.a.i.i.p.c.p.wal.FileWriteAheadLogManager:290 - WAL segment tail reached. [idx=512, isWorkDir=true, serVer=org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer#6c15e8c7, actualFilePtr=FileWALPointer [idx=512, fileOff=3772251, len=0]]
20.11.2020 08:18:20.381 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Finished applying WAL changes [updatesApplied=0, time=31 ms]
20.11.2020 08:18:20.381 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Restoring partition state for local groups.
20.11.2020 08:18:20.381 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Finished restoring partition state for local groups [groupsProcessed=0, partitionsProcessed=0, time=0ms]
20.11.2020 08:18:20.412 INFO [main] o.a.i.i.p.cluster.GridClusterStateProcessor:285 - Restoring history for BaselineTopology[id=12]
20.11.2020 08:18:20.522 INFO [main] o.a.i.i.c.DistributedBaselineConfiguration:285 - Baseline parameter 'baselineAutoAdjustEnabled' was changed from 'null' to 'true'
20.11.2020 08:18:20.522 INFO [main] o.a.i.i.c.DistributedBaselineConfiguration:285 - Baseline parameter 'baselineAutoAdjustTimeout' was changed from 'null' to '300000'
20.11.2020 08:18:20.522 INFO [main] o.a.i.i.p.c.p.file.FilePageStoreManager:285 - Cleanup cache stores [total=1, left=0, cleanFiles=false]
20.11.2020 08:18:20.522 INFO [main] o.a.i.i.p.c.p.pagemem.PageMemoryImpl:285 - Started page memory [memoryAllocated=50,0 MiB, pages=12404, tableSize=988,2 KiB, checkpointBuffer=50,0 MiB]
20.11.2020 08:18:20.537 INFO [main] o.a.i.i.p.c.p.pagemem.PageMemoryImpl:285 - Started page memory [memoryAllocated=50,0 MiB, pages=12404, tableSize=988,2 KiB, checkpointBuffer=50,0 MiB]
20.11.2020 08:18:20.537 INFO [main] o.a.i.i.p.c.p.pagemem.PageMemoryImpl:285 - Started page memory [memoryAllocated=50,0 MiB, pages=12404, tableSize=988,2 KiB, checkpointBuffer=50,0 MiB]
20.11.2020 08:18:20.537 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Configured data regions started successfully [total=5]
20.11.2020 08:18:20.537 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Starting binary memory restore for: [166757441, -1947899996, -8785046, -2100569601, 1793235927, -499392514, 30677022, 129211407, 1139332309, 1725334265]
20.11.2020 08:18:21.334 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Read checkpoint status [startMarker=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5\cp\1605855371041-8b5aaf2a-7867-47b0-879c-85791363041f-START.bin, endMarker=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5\cp\1605855371041-8b5aaf2a-7867-47b0-879c-85791363041f-END.bin]
20.11.2020 08:18:21.334 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Checking memory state [lastValidPos=FileWALPointer [idx=512, fileOff=3672982, len=99269], lastMarked=FileWALPointer [idx=512, fileOff=3672982, len=99269], lastCheckpointId=8b5aaf2a-7867-47b0-879c-85791363041f]
20.11.2020 08:18:21.365 WARN [main] o.a.i.i.p.c.p.wal.FileWriteAheadLogManager:290 - WAL segment tail reached. [idx=512, isWorkDir=true, serVer=org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer#317e9c3c, actualFilePtr=FileWALPointer [idx=512, fileOff=3772251, len=0]]
20.11.2020 08:18:21.397 WARN [main] o.a.i.i.p.c.p.wal.FileWriteAheadLogManager:290 - WAL segment tail reached. [idx=512, isWorkDir=true, serVer=org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer#31a3f4de, actualFilePtr=FileWALPointer [idx=512, fileOff=3772251, len=0]]
20.11.2020 08:18:21.397 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Found last checkpoint marker [cpId=8b5aaf2a-7867-47b0-879c-85791363041f, pos=FileWALPointer [idx=512, fileOff=3672982, len=99269]]
20.11.2020 08:18:21.412 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Binary memory state restored at node startup [restoredPtr=FileWALPointer [idx=512, fileOff=3772251, len=0]]
20.11.2020 08:18:21.428 INFO [main] o.a.i.i.p.c.p.pagemem.PageMemoryImpl:285 - Started page memory [memoryAllocated=50,0 MiB, pages=12404, tableSize=988,2 KiB, checkpointBuffer=50,0 MiB]
20.11.2020 08:18:21.568 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=license, id=166757441, dataRegionName=persistent, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
20.11.2020 08:18:21.584 INFO [main] o.a.i.i.p.c.p.pagemem.PageMemoryImpl:285 - Started page memory [memoryAllocated=819,1 MiB, pages=203256, tableSize=15,8 MiB, checkpointBuffer=256,0 MiB]
20.11.2020 08:18:21.584 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=commservices, id=-8785046, dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
20.11.2020 08:18:21.615 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=ignite-sys-cache, id=-2100569601, dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=2147483647, mvcc=false]
20.11.2020 08:18:21.615 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=machinespecifications, id=1793235927, dataRegionName=persistent, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
20.11.2020 08:18:21.615 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=nxisPorts, id=-499392514, dataRegionName=persistent, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
20.11.2020 08:18:21.631 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=datastructures_ATOMIC_PARTITIONED_1#labqueue, id=1205724040, group=labqueue, dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
20.11.2020 08:18:21.631 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=ignite-sys-atomic-cache#labqueue, id=-327698687, group=labqueue, dataRegionName=default, mode=PARTITIONED, atomicity=TRANSACTIONAL, backups=1, mvcc=false]
20.11.2020 08:18:21.631 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=machinemaxbatchno, id=30677022, dataRegionName=persistent, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
20.11.2020 08:18:21.646 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=machineconfiguration, id=129211407, dataRegionName=persistent, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
20.11.2020 08:18:21.646 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=specimentracer, id=1139332309, dataRegionName=persistent, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
20.11.2020 08:18:21.646 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Started cache in recovery mode [name=machinestatus, id=1725334265, dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
20.11.2020 08:18:21.646 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Binary recovery performed in 1109 ms.
20.11.2020 08:18:21.646 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Read checkpoint status [startMarker=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5\cp\1605855371041-8b5aaf2a-7867-47b0-879c-85791363041f-START.bin, endMarker=D:\IntegrationSolutions\Services\LabDeviceHUB\Labbox\.\..\userdata\labbox\ignite\work\db\node00-1dbddb2c-ef76-4811-b7d3-46da82061bc5\cp\1605855371041-8b5aaf2a-7867-47b0-879c-85791363041f-END.bin]
20.11.2020 08:18:21.662 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=512, fileOff=3672982, len=99269], lastCheckpointId=8b5aaf2a-7867-47b0-879c-85791363041f]
20.11.2020 08:18:21.693 INFO [main] o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:285 - Finished applying WAL changes [updatesApplied=0, time=31 ms]
20.11.2020 08:18:21.693 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Restoring partition state for local groups.
20.11.2020 08:18:21.943 INFO [main] o.a.i.i.processors.cache.GridCacheProcessor:285 - Finished restoring partition state for local groups [groupsProcessed=10, partitionsProcessed=5220, time=235ms]
20.11.2020 08:18:22.021 INFO [main] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Connection check threshold is calculated: 10000
20.11.2020 08:19:19.373 INFO [tcp-disco-srvr-[:47500]-#3] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - TCP discovery accepted incoming connection [rmtAddr=/192.168.92.175, rmtPort=56962]
20.11.2020 08:19:19.389 INFO [tcp-disco-srvr-[:47500]-#3] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - TCP discovery spawning a new thread for connection [rmtAddr=/192.168.92.175, rmtPort=56962]
20.11.2020 08:19:19.389 INFO [tcp-disco-sock-reader-[]-#4] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Started serving remote node connection [rmtAddr=/192.168.92.175:56962, rmtPort=56962]
20.11.2020 08:19:19.389 INFO [tcp-disco-sock-reader-[9f44068b 192.168.92.175:56962 client]-#4] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Initialized connection with remote client node [nodeId=9f44068b-b8ca-4d8b-bb32-efd2e2a1940c, rmtAddr=/192.168.92.175:56962]
20.11.2020 08:19:19.498 INFO [tcp-disco-sock-reader-[9f44068b 192.168.92.175:56962 client]-#4] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Finished serving remote node connection [rmtAddr=/192.168.92.175:56962, rmtPort=56962
20.11.2020 08:20:21.287 INFO [tcp-disco-srvr-[:47500]-#3] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - TCP discovery accepted incoming connection [rmtAddr=/192.168.92.176, rmtPort=55941]
20.11.2020 08:20:21.287 INFO [tcp-disco-srvr-[:47500]-#3] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - TCP discovery spawning a new thread for connection [rmtAddr=/192.168.92.176, rmtPort=55941]
20.11.2020 08:20:21.287 INFO [tcp-disco-sock-reader-[]-#5] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Started serving remote node connection [rmtAddr=/192.168.92.176:55941, rmtPort=55941]
20.11.2020 08:20:21.287 INFO [tcp-disco-sock-reader-[6a50abff 192.168.92.176:55941]-#5] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Initialized connection with remote server node [nodeId=6a50abff-8cfd-4b3a-b894-54fa9d405d36, rmtAddr=/192.168.92.176:55941]
20.11.2020 08:20:21.287 INFO [tcp-disco-sock-reader-[6a50abff 192.168.92.176:55941]-#5] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - Finished serving remote node connection [rmtAddr=/192.168.92.176:55941, rmtPort=55941
20.11.2020 08:20:26.239 INFO [tcp-disco-srvr-[:47500]-#3] o.a.ignite.spi.discovery.tcp.TcpDiscoverySpi:285 - TCP discovery accepted incoming connection [rmtAddr=/192.168.92.175, rmtPort=56996]
... it continues like that till the join or failure
The logs log the same on all servers. In this case server 1 and 2 create a cluster after 7 minutes. server 3 fails after 9 minutes due to incompatible baseline topology. After reseting the failed server it can rejoin the cluster. The behavior only happens sometimes. Most of the time the servers rebuild the cluster without problem.
I started to work with Hazelcast cache and I want to expose metrics for them and I don't know how to do it.
My java-config
`#Configuration
public class HazelcastConfiguration {
#Bean
public Config config(){
return new Config()
.setInstanceName("hazelcast-instace")
.addMapConfig(
new MapConfig()
.setName("testing")
.setMaxSizeConfig(new MaxSizeConfig(10, MaxSizeConfig.MaxSizePolicy.FREE_HEAP_SIZE))
.setEvictionPolicy(EvictionPolicy.LRU)
.setTimeToLiveSeconds(1000)
.setStatisticsEnabled(true)
);
}
}`
During startup application, I see only this logs
2019-11-30 19:56:01.579 INFO 13444 --- [ main] com.hazelcast.instance.AddressPicker : [LOCAL] [dev] [3.12.4] Prefer IPv4 stack is true, prefer IPv6 addresses is false
2019-11-30 19:56:01.671 INFO 13444 --- [ main] com.hazelcast.instance.AddressPicker : [LOCAL] [dev] [3.12.4] Picked [192.168.43.2]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
2019-11-30 19:56:01.694 INFO 13444 --- [ main] com.hazelcast.system : [192.168.43.2]:5701 [dev] [3.12.4] Hazelcast 3.12.4 (20191030 - eab1290) starting at [192.168.43.2]:5701
2019-11-30 19:56:01.695 INFO 13444 --- [ main] com.hazelcast.system : [192.168.43.2]:5701 [dev] [3.12.4] Copyright (c) 2008-2019, Hazelcast, Inc. All Rights Reserved.
2019-11-30 19:56:02.037 INFO 13444 --- [ main] c.h.s.i.o.impl.BackpressureRegulator : [192.168.43.2]:5701 [dev] [3.12.4] Backpressure is disabled
2019-11-30 19:56:02.761 INFO 13444 --- [ main] com.hazelcast.instance.Node : [192.168.43.2]:5701 [dev] [3.12.4] Creating MulticastJoiner
2019-11-30 19:56:02.998 INFO 13444 --- [ main] c.h.s.i.o.impl.OperationExecutorImpl : [192.168.43.2]:5701 [dev] [3.12.4] Starting 4 partition threads and 3 generic threads (1 dedicated for priority tasks)
2019-11-30 19:56:02.999 INFO 13444 --- [ main] c.h.internal.diagnostics.Diagnostics : [192.168.43.2]:5701 [dev] [3.12.4] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
2019-11-30 19:56:03.007 INFO 13444 --- [ main] com.hazelcast.core.LifecycleService : [192.168.43.2]:5701 [dev] [3.12.4] [192.168.43.2]:5701 is STARTING
2019-11-30 19:56:05.085 INFO 13444 --- [ main] c.h.internal.cluster.ClusterService : [192.168.43.2]:5701 [dev] [3.12.4]
Members {size:1, ver:1} [
Member [192.168.43.2]:5701 - 6ed511ff-b20b-4875-9b39-2dc734d4a9aa this
]
2019-11-30 19:56:05.142 INFO 13444 --- [ main] com.hazelcast.core.LifecycleService : [192.168.43.2]:5701 [dev] [3.12.4] [192.168.43.2]:5701 is STARTED
2019-11-30 19:56:05.295 INFO 13444 --- [e.HealthMonitor] c.h.internal.diagnostics.HealthMonitor : [192.168.43.2]:5701 [dev] [3.12.4] processors=4, physical.memory.total=23,9G, physical.memory.free=11,8G, swap.space.total=27,2G, swap.space.free=10,3G, heap.memory.used=306,9M, heap.memory.free=357,1M, heap.memory.total=664,0M, heap.memory.max=5,3G, heap.memory.used/total=46,23%, heap.memory.used/max=5,63%, minor.gc.count=0, minor.gc.time=0ms, major.gc.count=0, major.gc.time=0ms, load.process=100,00%, load.system=100,00%, load.systemAverage=n/a thread.count=37, thread.peakCount=37, cluster.timeDiff=0, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.client.query.size=0, executor.q.client.blocking.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operations.size=0, executor.q.priorityOperation.size=0, operations.completed.count=1, executor.q.mapLoad.size=0, executor.q.mapLoadAllKeys.size=0, executor.q.cluster.size=0, executor.q.response.size=0, operations.running.count=0, operations.pending.invocations.percentage=0,00%, operations.pending.invocations.count=0, proxy.count=0, clientEndpoint.count=0, connection.active.count=0, client.connection.count=0, connection.count=0
after autowired class CacheManager I and run on it function getCacheNames I see an empty list, and I don't know why?
and during first added to the cache in the log display
2019-11-30 20:11:12.760 INFO 16464 --- [nio-8080-exec-6] c.h.i.p.impl.PartitionStateManager : [192.168.43.2]:5701 [dev] [3.12.4] Initializing cluster partition table arrangement...
In Metrics Config file I have this and nothing in metrics will diplay:
#Autowired
private CacheMetricsRegistrar cacheMetricsRegistrar;
anyone have ide why its doesn't work?
The cache is initialized when you insert the first entry into it. That is why you don't see any cache in CacheManager at the beginning. After you insert the first value you should see HazelcastCache in cacheManager.caches.
The same with CacheMetricsRegistrar, I just tried and see HazelcastCacheMeterBinderProvider in binderProviders.
For Spring Boot 2.x create your registration component:
#Component
#AllArgsConstructor
public class CacheMetricsRegistrator {
private final CacheMetricsRegistrar cacheMetricsRegistrar;
private final CacheManager cacheManager;
private final Config cacheConfig;
#PostConstruct
public void register() {
this.cacheConfig.getMapConfigs().keySet().forEach(
cacheName -> this.cacheMetricsRegistrar.bindCacheToRegistry(
this.cacheManager.getCache(cacheName))
);
}
}
for your cache configuration of Hazelcast:
#EnableCaching
#Configuration
public class HazelcastConfig {
private MapConfig mapPortfolioCache() {
return new MapConfig()
.setName("my-entity-cache")
.setEvictionConfig(new EvictionConfig().setMaxSizePolicy(MaxSizePolicy.FREE_HEAP_SIZE).setSize(200))
.setTimeToLiveSeconds(60*15);
}
#Bean
public Config hazelCastConfig() {
Config config = new Config()
.setInstanceName("my-application-hazelcast")
.setNetworkConfig(new NetworkConfig().setJoin(new JoinConfig().setMulticastConfig(new MulticastConfig().setEnabled(false))))
.addMapConfig(mapPortfolioCache());
SubZero.useAsGlobalSerializer(config);
return config;
}
}
I have an external OPC UA server, from which I would like to read data. I use username and password authentication, so my client is initialized like follows:
public class MyClient {
// ...
public MyClient() throws Exception {
EndpointDescription[] endpoints =
UaTcpStackClient.getEndpoints(OPCConstants.OPC_SERVER_URI).get();
// using the first endpoint
EndpointDescription endpoint = endpoints[0];
// client configuration
OpcUaClientConfig config = OpcUaClientConfig.builder()
.setApplicationName(LocalizedText.english("Example Client"))
.setApplicationUri(String.format("some:example-client:%s",
UUID.randomUUID()))
.setIdentityProvider(new UsernameProvider(USERNAME, PWD))
.setEndpoint(endpoint)
.build();
}
}
The client's request is the following:
public CompletableFuture<DataValue> getData(NodeId nodeId) {
LOGGER.debug("Sending request");
return client.readValue(60000000.0, TimestampsToReturn.Server, nodeId);
}
I call this request from the main method after initializing the client and connecting it to the server:
MyClient client = new MyClient();
NodeId requestedData = new NodeId(DATA_ID, DATA_KEY);
LOGGER.info("Sending synchronous TestStackRequest NodeId={}",
requestedData);
client.connect();
DataValue response = client.getData(requestedData).get();
LOGGER.info("Received response value={}", response.getValue());
client.disconnect();
However, this code doesn't work (the session is closed when trying to read informations from the server). I get the following output:
2018-04-12 17:43:27,765 DEBUG --- [ua-netty-event-loop-0] Recycler : -Dio.netty.recycler.maxCapacity.default: 262144
2018-04-12 17:43:27,777 DEBUG --- [ua-netty-event-loop-0] UaTcpClientAcknowledgeHandler : Sent Hello message on channel=[id: 0xfd9519e3, L:/172.20.100.54:55805 - R:/172.20.100.135:4840].
2018-04-12 17:43:27,786 DEBUG --- [ua-netty-event-loop-0] UaTcpClientAcknowledgeHandler : Received Acknowledge message on channel=[id: 0xfd9519e3, L:/172.20.100.54:55805 - R:/172.20.100.135:4840].
2018-04-12 17:43:27,793 DEBUG --- [ua-netty-event-loop-0] UaTcpClientMessageHandler : OpenSecureChannel timeout scheduled for +5s
2018-04-12 17:43:27,946 DEBUG --- [ua-shared-pool-0] UaTcpClientMessageHandler : Sent OpenSecureChannelRequest (Issue, id=0, currentToken=-1, previousToken=-1).
2018-04-12 17:43:27,951 DEBUG --- [ua-netty-event-loop-0] UaTcpClientMessageHandler : OpenSecureChannel timeout canceled
2018-04-12 17:43:27,961 DEBUG --- [ua-shared-pool-0] UaTcpClientMessageHandler : Received OpenSecureChannelResponse.
2018-04-12 17:43:27,967 DEBUG --- [ua-shared-pool-0] UaTcpClientMessageHandler : SecureChannel id=1698234671, currentTokenId=1, previousTokenId=-1, lifetime=3600000ms, createdAt=DateTime{utcTime=131680285857690000, javaDate=Thu Apr 12 19:43:05 CEST 2018}
2018-04-12 17:43:27,968 DEBUG --- [ua-netty-event-loop-0] UaTcpClientMessageHandler : 0 message(s) queued before handshake completed; sending now.
2018-04-12 17:43:27,968 DEBUG --- [ua-shared-pool-1] ClientChannelManager : Channel bootstrap succeeded: localAddress=/172.20.100.54:55805, remoteAddress=/172.20.100.135:4840
2018-04-12 17:43:27,996 DEBUG --- [ua-shared-pool-0] ClientChannelManager : disconnect(), currentState=Connected
2018-04-12 17:43:27,997 DEBUG --- [ua-shared-pool-1] ClientChannelManager : Sending CloseSecureChannelRequest...
2018-04-12 17:43:28,000 DEBUG --- [ua-netty-event-loop-0] ClientChannelManager : channelInactive(), disconnect complete
2018-04-12 17:43:28,001 DEBUG --- [ua-netty-event-loop-0] ClientChannelManager : disconnect complete, state set to Idle
2018-04-12 17:43:28,011 INFO --- [main] OpcUaClient : Eclipse Milo OPC UA Stack version: 0.2.1
2018-04-12 17:43:28,011 INFO --- [main] OpcUaClient : Eclipse Milo OPC UA Client SDK version: 0.2.1
2018-04-12 17:43:28,056 DEBUG --- [main] OpcUaClient : Added ServiceFaultListener: org.eclipse.milo.opcua.sdk.client.session.SessionFsm$FaultListener#46d59067
2018-04-12 17:43:28,066 DEBUG --- [main] OpcUaClient : Added SessionActivityListener: org.eclipse.milo.opcua.sdk.client.subscriptions.OpcUaSubscriptionManager$1#78452606
2018-04-12 17:43:28,189 INFO --- [main] CommunicationMain : Sending synchronous TestStackRequest NodeId=NodeId{ns=6, id=::opcua:opcData.outGoing.basic.cycleStep}
2018-04-12 17:43:28,189 DEBUG --- [main] ClientChannelManager : connect(), currentState=NotConnected
2018-04-12 17:43:28,190 DEBUG --- [main] ClientChannelManager : connect() while NotConnected
java.lang.Exception
at org.eclipse.milo.opcua.stack.client.ClientChannelManager.connect(ClientChannelManager.java:67)
at org.eclipse.milo.opcua.stack.client.UaTcpStackClient.connect(UaTcpStackClient.java:127)
at org.eclipse.milo.opcua.sdk.client.OpcUaClient.connect(OpcUaClient.java:313)
at com.mycompany.opcua.participants.MyClient.connect(MyClient.java:147)
at com.mycompany.opcua.participants.CommunicationMain.testClient(CommunicationMain.java:69)
at com.mycompany.opcua.participants.CommunicationMain.main(CommunicationMain.java:51)
2018-04-12 17:43:28,190 DEBUG --- [main] MyClient : Sending request
2018-04-12 17:43:28,197 DEBUG --- [ua-netty-event-loop-1] UaTcpClientAcknowledgeHandler : Sent Hello message on channel=[id: 0xd9b3f832, L:/172.20.100.54:55806 - R:/172.20.100.135:4840].
2018-04-12 17:43:28,204 DEBUG --- [ua-netty-event-loop-1] UaTcpClientAcknowledgeHandler : Received Acknowledge message on channel=[id: 0xd9b3f832, L:/172.20.100.54:55806 - R:/172.20.100.135:4840].
2018-04-12 17:43:28,205 DEBUG --- [ua-netty-event-loop-1] UaTcpClientMessageHandler : OpenSecureChannel timeout scheduled for +5s
2018-04-12 17:43:28,205 DEBUG --- [ua-shared-pool-0] UaTcpClientMessageHandler : Sent OpenSecureChannelRequest (Issue, id=0, currentToken=-1, previousToken=-1).
2018-04-12 17:43:28,208 DEBUG --- [ua-netty-event-loop-1] UaTcpClientMessageHandler : OpenSecureChannel timeout canceled
2018-04-12 17:43:28,208 DEBUG --- [ua-shared-pool-0] UaTcpClientMessageHandler : Received OpenSecureChannelResponse.
2018-04-12 17:43:28,209 DEBUG --- [ua-shared-pool-0] UaTcpClientMessageHandler : SecureChannel id=1698234672, currentTokenId=1, previousTokenId=-1, lifetime=3600000ms, createdAt=DateTime{utcTime=131680285860260000, javaDate=Thu Apr 12 19:43:06 CEST 2018}
2018-04-12 17:43:28,209 DEBUG --- [ua-netty-event-loop-1] UaTcpClientMessageHandler : 0 message(s) queued before handshake completed; sending now.
2018-04-12 17:43:28,209 DEBUG --- [ua-shared-pool-1] ClientChannelManager : Channel bootstrap succeeded: localAddress=/172.20.100.54:55806, remoteAddress=/172.20.100.135:4840
2018-04-12 17:43:28,210 DEBUG --- [ua-shared-pool-0] SessionFsm : S(Inactive) x E(CreateSessionEvent) = S'(Creating)
Exception in thread "main" java.util.concurrent.ExecutionException: UaException: status=Bad_SessionClosed, message=The session was closed by the client.
at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)2018-04-12 17:43:28,212 DEBUG --- [ua-shared-pool-1] SessionFsm : Sending CreateSessionRequest...
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
at com.mycompany.opcua.participants.CommunicationMain.testClient(CommunicationMain.java:70)
at com.mycompany.opcua.participants.CommunicationMain.main(CommunicationMain.java:51)
Caused by: UaException: status=Bad_SessionClosed, message=The session was closed by the client.
at org.eclipse.milo.opcua.stack.core.util.FutureUtils.failedUaFuture(FutureUtils.java:100)
at org.eclipse.milo.opcua.stack.core.util.FutureUtils.failedUaFuture(FutureUtils.java:88)
at org.eclipse.milo.opcua.sdk.client.session.states.Inactive.(Inactive.java:28)
at org.eclipse.milo.opcua.sdk.client.session.SessionFsm.(SessionFsm.java:69)
at org.eclipse.milo.opcua.sdk.client.OpcUaClient.(OpcUaClient.java:159)2018-04-12 17:43:28,212 INFO --- [NonceUtilSecureRandom] NonceUtil : SecureRandom seeded in 0ms.
at com.mycompany.opcua.participants.MyClient.(MyClient.java:112)
at com.mycompany.opcua.participants.CommunicationMain.testClient(CommunicationMain.java:60)
... 1 more
I use Eclipse milo 0.2.1 as OPC UA library.
Could you please tell me hat can cause this issue and how to fix it? Can it be a race condition related to this?
I can connect to the same server using other client (UaExpert).
Thank you in advance.
All of the calls you're making (connect(), disconnect(), and readValues()) are asynchronous, so what's likely happening here is you're not connected when you attempt the read.
Make sure for these examples you block for the result before moving to the next step. (you're not doing this on connect())
We have a Kubernetes cluster which spins up 4 instances of our application. We'd like to have it share a Hazelcast data grid and keep in synch between these nodes. According to https://github.com/hazelcast/hazelcast-kubernetes the configuration is straightforward. We'd like to use the DNS approach rather than the kubernetes api.
With DNS we are supposed to be able to add the DNS name of our app as described here. So this would be something like myservice.mynamespace.svc.cluster.local.
The problem is that although we have 4 VMs spun up, only one Hazelcast network member is found; thus we see the following in the logs:
Members [1] {
Member [192.168.187.3]:5701 - 50056bfb-b710-43e0-ad58-57459ed399a5 this
}
It seems that there aren't any errors, it just doesn't see any of the other network members.
Here's my configuration. I've tried both using an xml file, like the example on the hazelcast-kubernetes git repo, as well as programmatically. Neither attempt appear to work.
I'm using hazelcast 3.8.
Using hazelcast.xml:
<hazelcast>
<properties>
<!-- only necessary prior Hazelcast 3.8 -->
<property name="hazelcast.discovery.enabled">true</property>
</properties>
<network>
<join>
<!-- deactivate normal discovery -->
<multicast enabled="false"/>
<tcp-ip enabled="false" />
<!-- activate the Kubernetes plugin -->
<discovery-strategies>
<discovery-strategy enabled="true"
class="com.hazelcast.HazelcastKubernetesDiscoveryStrategy">
<properties>
<!-- configure discovery service API lookup -->
<property name="service-dns">myapp.mynamespace.svc.cluster.local</property>
<property name="service-dns-timeout">10</property>
</properties>
</discovery-strategy>
</discovery-strategies>
</join>
</network>
</hazelcast>
Using the XmlConfigBuilder to construct the instance.
Properties properties = new Properties();
XmlConfigBuilder builder = new XmlConfigBuilder();
builder.setProperties(properties);
Config config = builder.build();
this.instance = Hazelcast.newHazelcastInstance(config);
And Programmatically (personal preference if I can get it to work):
Config cfg = new Config();
NetworkConfig networkConfig = cfg.getNetworkConfig();
networkConfig.setPort(hazelcastNetworkPort);
networkConfig.setPortAutoIncrement(true);
networkConfig.setPortCount(100);
JoinConfig joinConfig = networkConfig.getJoin();
joinConfig.getMulticastConfig().setEnabled(false);
joinConfig.getTcpIpConfig().setEnabled(false);
DiscoveryConfig discoveryConfig = joinConfig.getDiscoveryConfig();
HazelcastKubernetesDiscoveryStrategyFactory factory = new HazelcastKubernetesDiscoveryStrategyFactory();
DiscoveryStrategyConfig strategyConfig = new DiscoveryStrategyConfig(factory);
strategyConfig.addProperty("service-dns", kubernetesSvcsDnsName);
strategyConfig.addProperty("service-dns-timeout", kubernetesSvcsDnsTimeout);
discoveryConfig.addDiscoveryStrategyConfig(strategyConfig);
this.instance = Hazelcast.newHazelcastInstance(cfg);
Is anyone farmiliar with this setup? I have ports 5701 - 5800 open. It seems kubernetes starts up and recognizes that discovery mode is on, but only finds the one (local) node.
Here's a snippet from the logs for what it's worth. This was while using the xml file for config:
2017-03-15 08:15:33,688 INFO [main] c.h.c.XmlConfigLocator [StandardLoggerFactory.java:49] Loading 'hazelcast-default.xml' from classpath.
2017-03-15 08:15:33,917 INFO [main] c.g.a.c.a.u.c.HazelcastCacheClient [HazelcastCacheClient.java:112] CONFIG: Config{groupConfig=GroupConfig [name=dev, password=********], properties={}, networkConfig=NetworkConfig{publicAddress='null', port=5701, portCount=100, portAutoIncrement=true, join=JoinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[], loopbackModeEnabled=false], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[127.0.0.1, 127.0.0.1], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-west-1', securityGroupName='hazelcast-sg', tagKey='type', tagValue='hz-nodes', hostHeader='ec2.amazonaws.com', iamRole='null', connectionTimeoutSeconds=5}, discoveryProvidersConfig=com.hazelcast.config.DiscoveryConfig#3c153a1}, interfaces=InterfacesConfig{enabled=false, interfaces=[10.10.1.*]}, sslConfig=SSLConfig{className='null', enabled=false, implementation=null, properties={}}, socketInterceptorConfig=SocketInterceptorConfig{className='null', enabled=false, implementation=null, properties={}}, symmetricEncryptionConfig=SymmetricEncryptionConfig{enabled=false, iterationCount=19, algorithm='PBEWithMD5AndDES', key=null}}, mapConfigs={default=MapConfig{name='default', inMemoryFormat=BINARY', backupCount=1, asyncBackupCount=0, timeToLiveSeconds=0, maxIdleSeconds=0, evictionPolicy='NONE', mapEvictionPolicy='null', evictionPercentage=25, minEvictionCheckMillis=100, maxSizeConfig=MaxSizeConfig{maxSizePolicy='PER_NODE', size=2147483647}, readBackupData=false, hotRestart=HotRestartConfig{enabled=false, fsync=false}, nearCacheConfig=null, mapStoreConfig=MapStoreConfig{enabled=false, className='null', factoryClassName='null', writeDelaySeconds=0, writeBatchSize=1, implementation=null, factoryImplementation=null, properties={}, initialLoadMode=LAZY, writeCoalescing=true}, mergePolicyConfig='com.hazelcast.map.merge.PutIfAbsentMapMergePolicy', wanReplicationRef=null, entryListenerConfigs=null, mapIndexConfigs=null, mapAttributeConfigs=null, quorumName=null, queryCacheConfigs=null, cacheDeserializedValues=INDEX_ONLY}}, topicConfigs={}, reliableTopicConfigs={default=ReliableTopicConfig{name='default', topicOverloadPolicy=BLOCK, executor=null, readBatchSize=10, statisticsEnabled=true, listenerConfigs=[]}}, queueConfigs={default=QueueConfig{name='default', listenerConfigs=null, backupCount=1, asyncBackupCount=0, maxSize=0, emptyQueueTtl=-1, queueStoreConfig=null, statisticsEnabled=true}}, multiMapConfigs={default=MultiMapConfig{name='default', valueCollectionType='SET', listenerConfigs=null, binary=true, backupCount=1, asyncBackupCount=0}}, executorConfigs={default=ExecutorConfig{name='default', poolSize=16, queueCapacity=0}}, semaphoreConfigs={default=SemaphoreConfig{name='default', initialPermits=0, backupCount=1, asyncBackupCount=0}}, ringbufferConfigs={default=RingbufferConfig{name='default', capacity=10000, backupCount=1, asyncBackupCount=0, timeToLiveSeconds=0, inMemoryFormat=BINARY, ringbufferStoreConfig=RingbufferStoreConfig{enabled=false, className='null', properties={}}}}, wanReplicationConfigs={}, listenerConfigs=[], partitionGroupConfig=PartitionGroupConfig{enabled=false, groupType=PER_MEMBER, memberGroupConfigs=[]}, managementCenterConfig=ManagementCenterConfig{enabled=false, url='http://localhost:8080/mancenter', updateInterval=3}, securityConfig=SecurityConfig{enabled=false, memberCredentialsConfig=CredentialsFactoryConfig{className='null', implementation=null, properties={}}, memberLoginModuleConfigs=[], clientLoginModuleConfigs=[], clientPolicyConfig=PermissionPolicyConfig{className='null', implementation=null, properties={}}, clientPermissionConfigs=[]}, liteMember=false}
2017-03-15 08:15:33,949 INFO [main] c.h.i.DefaultAddressPicker [StandardLoggerFactory.java:49] [LOCAL] [dev] [3.8] Prefer IPv4 stack is true.
2017-03-15 08:15:33,960 INFO [main] c.h.i.DefaultAddressPicker [StandardLoggerFactory.java:49] [LOCAL] [dev] [3.8] Picked [192.168.187.3]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
2017-03-15 08:15:34,000 INFO [main] c.h.system [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Hazelcast 3.8 (20170217 - d7998b4) starting at [192.168.187.3]:5701
2017-03-15 08:15:34,001 INFO [main] c.h.system [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Copyright (c) 2008-2017, Hazelcast, Inc. All Rights Reserved.
2017-03-15 08:15:34,001 INFO [main] c.h.system [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Configured Hazelcast Serialization version : 1
2017-03-15 08:15:34,507 INFO [main] c.h.s.i.o.i.BackpressureRegulator [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Backpressure is disabled
2017-03-15 08:15:35,170 INFO [main] c.h.i.Node [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Creating MulticastJoiner
2017-03-15 08:15:35,339 INFO [main] c.h.s.i.o.i.OperationExecutorImpl [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Starting 8 partition threads
2017-03-15 08:15:35,342 INFO [main] c.h.s.i.o.i.OperationExecutorImpl [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Starting 5 generic threads (1 dedicated for priority tasks)
2017-03-15 08:15:35,351 INFO [main] c.h.c.LifecycleService [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] [192.168.187.3]:5701 is STARTING
2017-03-15 08:15:37,463 INFO [main] c.h.system [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8] Cluster version set to 3.8
2017-03-15 08:15:37,466 INFO [main] c.h.i.c.i.MulticastJoiner [StandardLoggerFactory.java:49] [192.168.187.3]:5701 [dev] [3.8]
Members [1] {
Member [192.168.187.3]:5701 - 50056bfb-b710-43e0-ad58-57459ed399a5 this
}
Could you try with service dns name as:
myapp.mynamespace.endpoints.cluster.local
Please reply it's work or not and also post your full log.
I know it's kinda happened long time ago. But the problem here was using wrong class name for discovery strategy.
It should be com.hazelcast.kubernetes.HazelcastKubernetesDiscoveryStrategy