I am working with Java in a Maven project. I was using Couchbase 2.3.1, but in trying to resolve this issue I rolled back to 2.2.8, to no avail.
The issue is that while data does get through to my Couchbase cluster, I am seeing a lot of this:
java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:75)
at com.couchbase.client.java.CouchbaseBucket.upsert(CouchbaseBucket.java:359)
at com.couchbase.client.java.CouchbaseBucket.upsert(CouchbaseBucket.java:354)
Below are the settings for my couchbase environment:
CouchbaseEnvironment: {sslEnabled=false, sslKeystoreFile='null', sslKeystorePassword='null', queryEnabled=false, queryPort=8093, bootstrapHttpEnabled=true, bootstrapCarrierEnabled=true, bootstrapHttpDirectPort=8091, bootstrapHttpSslPort=18091, bootstrapCarrierDirectPort=11210, bootstrapCarrierSslPort=11207, ioPoolSize=24, computationPoolSize=24, responseBufferSize=16384, requestBufferSize=16384, kvServiceEndpoints=1, viewServiceEndpoints=1, queryServiceEndpoints=1, searchServiceEndpoints=1, ioPool=NioEventLoopGroup, coreScheduler=CoreScheduler, eventBus=DefaultEventBus, packageNameAndVersion=couchbase-java-client/2.2.8 (git: 2.2.8, core: 1.2.9), dcpEnabled=false, retryStrategy=BestEffort, maxRequestLifetime=75000, retryDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=100, upper=100000}, reconnectDelay=ExponentialDelay{growBy 1.0 MILLISECONDS, powers of 2; lower=32, upper=4096}, observeIntervalDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=10, upper=100000}, keepAliveInterval=30000, autoreleaseAfter=2000, bufferPoolingEnabled=true, tcpNodelayEnabled=true, mutationTokensEnabled=false, socketConnectTimeout=1000, dcpConnectionBufferSize=20971520, dcpConnectionBufferAckThreshold=0.2, dcpConnectionName=dcp/core-io, callbacksOnIoPool=false, queryTimeout=75000, viewTimeout=75000, kvTimeout=2500, connectTimeout=5000, disconnectTimeout=25000, dnsSrvEnabled=false}
I'm not really sure what to look at here. As far as I can tell, there should be a decent connection between the server where the app is running and the Couchbase cluster. Any help or direction on this would be appreciated. Here is a snippet from where the error is being thrown.
LockableItem<InnerVertex> lv = this.getInnerVertex(id);
lv.lock();
try {
    String content;
    try {
        content = mapper.writeValueAsString(lv.item);
    } catch (JsonProcessingException e) {
        LOG.warning(e.getMessage());
        return;
    }
    RawJsonDocument d = RawJsonDocument.create(VertexId.toKey(id), content);
    bucket.upsert(d);
} finally {
    lv.unlock();
}
I searched for an answer and found many solutions, all discussing this exception. I also checked the client JAR's source, which confirms where the timeout exception is thrown.
Root Cause Analysis
The error occurs in the following section of the Couchbase Java client: https://github.com/couchbase/couchbase-java-client/blob/master/src/main/java/com/couchbase/client/java/util/Blocking.java#L71
public static <T> T blockForSingle(final Observable<? extends T> observable, final long timeout,
        final TimeUnit tu) {
    final CountDownLatch latch = new CountDownLatch(1);
    TrackingSubscriber<T> subscriber = new TrackingSubscriber<T>(latch);
    observable.subscribe(subscriber);
    try {
        if (!latch.await(timeout, tu)) { // From here, this error occurs.
            throw new RuntimeException(new TimeoutException());
        }
    }
If the timeout kicks in, a TimeoutException nested in a RuntimeException is thrown to be fully compatible with the Observable.timeout(long, TimeUnit) behavior.
Resource Link:
http://docs.couchbase.com/sdk-api/couchbase-java-client-2.2.0/com/couchbase/client/java/util/Blocking.html
Analysis of your configuration, and the solution:
Your Couchbase environment's connectTimeout is 5000 ms (5 s), which is the default connection timeout.
Increase this value to 10000 ms or greater; that should solve the problem:
//this tunes the SDK (to customize connection timeout)
CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
        .connectTimeout(10000) //10000ms = 10s, default is 5s
        .build();
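Note (an addition beyond the quoted answer): connectTimeout governs bootstrap and bucket opening, while blocking key/value calls such as the upsert in the stack trace are bounded by the environment's kvTimeout (2500 ms in the dump above). A minimal sketch of raising that as well, with illustrative 10-second values; the per-call overload in the comment also exists on Bucket in the 2.x SDK:

CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
        .connectTimeout(10000) // bootstrap/connect timeout, in ms
        .kvTimeout(10000)      // timeout for blocking key/value ops such as upsert, in ms
        .build();

// Or override the timeout for a single call instead:
// bucket.upsert(d, 10, TimeUnit.SECONDS);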
A full solution
Simonbasle has given a full solution in this answer:
From the short log, it looks like the SDK is able to connect to the node, but takes a bit too much time to open the bucket. How good is the network link between the two machines? Is this a VM/cloud machine?
What you can try is increasing the connect timeout:
public class NoSQLTest {

    public static void main(String[] args) {
        try {
            //this tunes the SDK (to customize connection timeout)
            CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
                    .connectTimeout(10000) //10000ms = 10s, default is 5s
                    .build();

            System.out.println("Create connection");
            //use the env during cluster creation to apply
            Cluster cluster = CouchbaseCluster.create(env, "10.115.224.94");

            System.out.println("Try to openBucket");
            Bucket bucket = cluster.openBucket("beer-sample"); //you can also force a greater timeout here (cluster.openBucket("beer-sample", 10, TimeUnit.SECONDS))

            System.out.println("disconnect");
            cluster.disconnect();
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
As a side note, you should always reuse the CouchbaseEnvironment,
CouchbaseCluster and Bucket instances once created (usually by making
them public static somewhere, or a Spring singleton, etc...). These
are thread safe and should be shared (and they are expensive to create
anyway).
Resource Link:
Couchbase connection timeout with Java SDK
Thanks for the question, and for @SkyWalker's answer.
They helped when I encountered this annoying timeout.
For Spring Data Couchbase 2, adding the following to application.properties solved it:
spring.couchbase.env.timeouts.connect=20000
Related
I have a Java app using Spring Batch. I've got a table used as a queue, which contains information on jobs requested by clients (when a client requests that a task be executed, a row is added to this queue).
In one of my classes, a while loop runs until someone deactivates a flag:
protected void runJobLaunchingLoop() {
    while (!isTerminated()) {
        try {
            if (isActivated()) {
                QueueEntryDTO queueEntry = dequeueJobEntry();
                launchJob(queueEntry);
            }
        }
        catch (EmptyQueueException ignored) {}
        catch (Exception exception) {
            logger.error("There was a problem while de-queuing a job ('" + exception.getMessage() + "').");
        }
        finally {
            pauseProcessor();
        }
    }
}
The pauseProcessor() method calls Thread.sleep(). When I run this app in a Docker container, it looks like the number of threads run by the application keeps increasing. The threads are named "Timer-X", where X is an auto-incrementing integer.
I looked at the stack trace of one of these:
"Timer-14" - Thread t#128
java.lang.Thread.State: WAITING
at java.base#11.0.6/java.lang.Object.wait(Native Method)
- waiting on <25e60c31> (a java.util.TaskQueue)
at java.base#11.0.6/java.lang.Object.wait(Unknown Source)
at java.base#11.0.6/java.util.TimerThread.mainLoop(Unknown Source)
- locked <25e60c31> (a java.util.TaskQueue)
at java.base#11.0.6/java.util.TimerThread.run(Unknown Source)
Locked ownable synchronizers:
- None
Any idea what could be the cause of this? If I don't run the app in a container but locally from IntelliJ, the problem does not seem to occur. I'm not sure, though, because sometimes it takes a while before the thread count starts increasing.
EDIT: Some relevant code ...
protected QueueEntryDTO dequeueJobEntry() {
    Collection<QueueEntryDTO> collection = getQueueService().dequeueEntry();
    if (collection.isEmpty())
        throw new EmptyQueueException();
    return collection.iterator().next();
}

@Transactional
public Collection<QueueEntryDTO> dequeueEntry() {
    Optional<QueueEntry> optionalEntry = this.queueEntryDAO.findTopByStatusCode(QueueStatusEnum.WAITING.getStatusCode());
    if (optionalEntry.isPresent()) {
        QueueEntry entry = optionalEntry.get();
        QueueEntry updatedEntry = this.saveEntryStatus(entry, QueueStatusEnum.PROCESSING, (String) null);
        return Collections.singleton(this.queueEntryDTOMapper.toDTO(updatedEntry));
    } else {
        return new ArrayList<>();
    }
}

private void pauseProcessor() {
    try {
        Long sleepDuration = generalProperties.getQueueProcessingSleepDuration();
        sleepDuration = Objects.requireNonNullElseGet(
                sleepDuration,
                () -> Double.valueOf(Math.pow(2.0, getRetries()) * 1000.0).longValue());
        Thread.sleep(sleepDuration);
        if (getRetries() < 4)
            setRetries(getRetries() + 1);
    }
    catch (Exception ignored) {
        logger.warn("Failed to pause job queue processor.");
    }
}
It seems like this was caused by a bug that was resolved in a more recent version of DB2 than I was using.
Applications get a large number of timer threads when the API timerLevelforQueryTimeout value is not set explicitly in an application using JCC driver version 11.5 GA (JCC 4.26.14) or later.
This issue is fixed in 11.5 M4 FP0 (JCC 4.27.25).
I updated the version to a newer one (11.5.6) in my POM file, but this didn't fix the issue: it turned out my K8s pod was still using 11.5.0 because of how Maven resolved the dependency. I then applied this technique (using dependencyManagement in the POM file) and the newer version was loaded.
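For reference, a minimal sketch of such a dependencyManagement block, assuming the com.ibm.db2:jcc coordinates of the JCC driver; the version shown is illustrative:

<dependencyManagement>
  <dependencies>
    <!-- Pin the DB2 JCC driver so transitive resolution cannot drag in an older version -->
    <dependency>
      <groupId>com.ibm.db2</groupId>
      <artifactId>jcc</artifactId>
      <version>11.5.6.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>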
I am trying to make it so I can redeploy a JBoss 7.1.0 cluster with a WAR that contains Apache Ignite.
I am starting the cache like this:
System.setProperty("IGNITE_UPDATE_NOTIFIER", "false");
igniteConfiguration = new IgniteConfiguration();

int failureDetectionTimeout = Integer.parseInt(getProperty("IGNITE_TCP_DISCOVERY_FAILURE_DETECTION_TIMEOUT", "60000"));
igniteConfiguration.setFailureDetectionTimeout(failureDetectionTimeout);

String igniteVmIps = getProperty("IGNITE_VM_IPS");
List<String> addresses = Arrays.asList("127.0.0.1:47500");
if (StringUtils.isNotBlank(igniteVmIps)) {
    addresses = Arrays.asList(igniteVmIps.split(","));
}

int networkTimeout = Integer.parseInt(getProperty("IGNITE_TCP_DISCOVERY_NETWORK_TIMEOUT", "60000"));
boolean failureDetectionTimeoutEnabled = Boolean.parseBoolean(getProperty("IGNITE_TCP_DISCOVERY_FAILURE_DETECTION_TIMEOUT_ENABLED", "true"));
int tcpDiscoveryLocalPort = Integer.parseInt(getProperty("IGNITE_TCP_DISCOVERY_LOCAL_PORT", "47500"));
int tcpDiscoveryLocalPortRange = Integer.parseInt(getProperty("IGNITE_TCP_DISCOVERY_LOCAL_PORT_RANGE", "0"));

TcpDiscoverySpi tcpDiscoverySpi = new TcpDiscoverySpi();
tcpDiscoverySpi.setLocalPort(tcpDiscoveryLocalPort);
tcpDiscoverySpi.setLocalPortRange(tcpDiscoveryLocalPortRange);
tcpDiscoverySpi.setNetworkTimeout(networkTimeout);
tcpDiscoverySpi.failureDetectionTimeoutEnabled(failureDetectionTimeoutEnabled);

TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
ipFinder.setAddresses(addresses);
tcpDiscoverySpi.setIpFinder(ipFinder);
igniteConfiguration.setDiscoverySpi(tcpDiscoverySpi);

Ignite ignite = Ignition.start(igniteConfiguration);
ignite.cluster().active(true);
Then I am stopping the cache when the application undeploys:
ignite.close();
When I try to redeploy, I get the following error during initialization.
org.apache.ignite.spi.IgniteSpiException: Failed to marshal custom event: StartRoutineDiscoveryMessage [startReqData=StartRequestData [prjPred=org.apache.ignite.internal.cluster.ClusterGroupAdapter$CachesFilter#7385a997, clsName=null, depInfo=null, hnd=org.apache.ignite.internal.GridEventConsumeHandler#2aec6952, bufSize=1, interval=0, autoUnsubscribe=true], keepBinary=false, deserEx=null, routineId=bbe16e8e-2820-4ba0-a958-d5f644498ba2]
If I fully restart the server, it starts up fine.
Am I missing some magic in the shutdown process?
I see what I did wrong, and it was code I omitted from the ticket.
ignite.events(ignite.cluster().forCacheNodes(cacheConfig.getKey())).remoteListen(locLsnr, rmtLsnr,
EVT_CACHE_OBJECT_PUT, EVT_CACHE_OBJECT_READ, EVT_CACHE_OBJECT_REMOVED);
When this code tried to register the listener a second time, it caused that strange error.
I put a try-catch-ignore around it for now and things seem to be OK.
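A cleaner alternative to swallowing the exception, sketched on the assumption that the listener should live for one deployment: remoteListen returns a UUID, which can be passed to stopRemoteListen in the undeploy hook so the subscription is removed before the node shuts down (locLsnr, rmtLsnr and cacheConfig are the same variables as in the snippet above):

// Keep the subscription id returned by remoteListen...
IgniteEvents events = ignite.events(ignite.cluster().forCacheNodes(cacheConfig.getKey()));
UUID listenerId = events.remoteListen(locLsnr, rmtLsnr,
        EVT_CACHE_OBJECT_PUT, EVT_CACHE_OBJECT_READ, EVT_CACHE_OBJECT_REMOVED);

// ...and unregister it on undeploy, before closing Ignite:
events.stopRemoteListen(listenerId);
ignite.close();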
I'm using Java 11 and kafka-client 2.0.0.
I'm using the following code to create a consumer:
public Consumer createConsumer(Properties properties, String regex) {
    log.info("Creating consumer and listener..");
    Consumer consumer = new KafkaConsumer<>(properties);
    ConsumerRebalanceListener listener = new ConsumerRebalanceListener() {
        @Override
        public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
            log.info("The following partitions were revoked from consumer : {}", Arrays.toString(partitions.toArray()));
        }

        @Override
        public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
            log.info("The following partitions were assigned to consumer : {}", Arrays.toString(partitions.toArray()));
        }
    };
    consumer.subscribe(Pattern.compile(regex), listener);
    log.info("consumer subscribed");
    return consumer;
}
My poll loop is in a different place in the code:
public <K, V> void startWorking(Consumer<K, V> consumer) {
    try {
        while (true) {
            ConsumerRecords<K, V> records = consumer.poll(600);
            if (records.count() > 0) {
                log.info("Polled {} records", records.count());
            } else {
                log.info("polled 0 records.. going to sleep..");
                Thread.sleep(200);
            }
        }
    } catch (WakeupException | InterruptedException e) {
        log.error("Consumer is shutting down", e);
    } finally {
        consumer.close();
    }
}
When I run the code and use this function, the consumer is created and the log contains the following messages :
Creating consumer and listener..
consumer subscribed
polled 0 records.. going to sleep..
polled 0 records.. going to sleep..
polled 0 records.. going to sleep..
The log doesn't contain any info regarding the partition assignment/revocation.
In addition, I can see in the log the properties that the consumer uses (group.id is set):
2020-07-09 14:31:07.959 DEBUG 7342 --- [ main] o.a.k.clients.consumer.ConsumerConfig : ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = latest
bootstrap.servers = [server1:9092]
check.crcs = true
client.id =
group.id = mygroupid
key.deserializer = ..
value.deserializer = ..
So I tried to use the kafka-console-consumer with the same configuration, in order to consume from one of the topics that the regex (mytopic.*) should catch (in this case I used the topic mytopic-1):
/usr/bin/kafka-console-consumer.sh --bootstrap-server server1:9092 --topic mytopic-1 --property print.timestamp=true --consumer.config /data/scripts/kafka-consumer.properties --from-beginning
I have a poll loop in another part of my code that is timing out every 10 minutes. So the bottom line: the problem is that partitions aren't assigned to the Java consumer. The prints inside the listener never happen and the consumer doesn't have any partitions to listen to.
It turned out I was missing the SSL property in my properties file. Don't forget to specify security.protocol=SSL if the cluster uses SSL. It seems that the kafka-client API doesn't throw an exception if Kafka uses SSL and you try to access it without the SSL parameters configured.
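A minimal sketch of the relevant client properties, assuming a truststore-only (server-authentication) SSL setup; the path and password are placeholders:

Properties properties = new Properties();
properties.put("bootstrap.servers", "server1:9092");
properties.put("group.id", "mygroupid");
// Without this line the client attempts a plaintext connection to the SSL listener;
// polls keep returning empty and no partitions are ever assigned.
properties.put("security.protocol", "SSL");
properties.put("ssl.truststore.location", "/path/to/truststore.jks"); // placeholder
properties.put("ssl.truststore.password", "changeit");                // placeholder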
The code below runs locally but not on the cluster. It hangs in the GroupReduceFunction and does not terminate even after hours (for the large data, it takes ~9 minutes to compute locally). The last message in the log:
GroupReduce (GroupReduce at main(MyClass.java:80)) (1/1) (...) switched from DEPLOYING to RUNNING.
The code fragment:
DataSet<MyData1> myData1 = env.createInput(new UserDefinedFunctions.MyData1Set());
DataSet<MyData2> myData2 = DataSetUtils.sampleWithSize(myData1, false, 8, Long.MAX_VALUE)
        .reduceGroup(new GroupReduceFunction<MyData1, MyData2>() {
            @Override
            public void reduce(Iterable<MyData1> itrbl, Collector<MyData2> clctr) throws Exception {
                int id = 0;
                for (MyData1 myData1 : itrbl) {
                    clctr.collect(new MyData2(id++, myData1));
                }
            }
        });
Any ideas how I could run this segment in parallel? Thanks in advance!
public class GemfireTest {
    public static void main(String[] args) throws NameResolutionException, TypeMismatchException, QueryInvocationTargetException, FunctionDomainException {
        ServerLauncher serverLauncher = new ServerLauncher.Builder()
                .setMemberName("server1")
                .setServerPort(40404)
                .set("start-locator", "127.0.0.1[9090]")
                .build();
        serverLauncher.start();

        String queryString = "SELECT * FROM /gemregion";
        ClientCache cache = new ClientCacheFactory().create();
        QueryService queryService = cache.getQueryService();
        Query query = queryService.newQuery(queryString);
        SelectResults results = (SelectResults) query.execute();
        int size = results.size();
        System.out.println(size);
    }
}
I am trying to run a locator and a server inside my Java application, and I am getting the exception below:
Exception in thread "main" java.lang.IllegalStateException: A
connection to a distributed system already exists in this VM. It has
the following configuration: ack-severe-alert-threshold="0"
ack-wait-threshold="15" archive-disk-space-limit="0"
archive-file-size-limit="0" async-distribution-timeout="0"
async-max-queue-size="8" async-queue-timeout="60000"
bind-address="" cache-xml-file="cache.xml"
cluster-configuration-dir="" cluster-ssl-ciphers="any"
cluster-ssl-enabled="false" cluster-ssl-keystore=""
cluster-ssl-keystore-password="" cluster-ssl-keystore-type=""
cluster-ssl-protocols="any"
cluster-ssl-require-authentication="true" cluster-ssl-truststore=""
cluster-ssl-truststore-password="" conflate-events="server"
conserve-sockets="true" delta-propagation="true"
deploy-working-dir="C:\Users\Saranya\IdeaProjects\Gemfire"
disable-auto-reconnect="false" disable-tcp="false"
distributed-system-id="-1" distributed-transactions="false"
durable-client-id="" durable-client-timeout="300"
enable-cluster-configuration="true"
enable-network-partition-detection="true"
enable-time-statistics="false" enforce-unique-host="false"
gateway-ssl-ciphers="any" gateway-ssl-enabled="false"
gateway-ssl-keystore="" gateway-ssl-keystore-password=""
gateway-ssl-keystore-type="" gateway-ssl-protocols="any"
gateway-ssl-require-authentication="true" gateway-ssl-truststore=""
gateway-ssl-truststore-password="" groups=""
http-service-bind-address="" http-service-port="7070"
http-service-ssl-ciphers="any" http-service-ssl-enabled="false"
http-service-ssl-keystore="" http-service-ssl-keystore-password=""
http-service-ssl-keystore-type="" http-service-ssl-protocols="any"
http-service-ssl-require-authentication="false"
http-service-ssl-truststore=""
http-service-ssl-truststore-password="" jmx-manager="false"
jmx-manager-access-file="" jmx-manager-bind-address=""
jmx-manager-hostname-for-clients="" jmx-manager-http-port="7070"
jmx-manager-password-file="" jmx-manager-port="1099"
jmx-manager-ssl-ciphers="any" jmx-manager-ssl-enabled="false"
jmx-manager-ssl-keystore="" jmx-manager-ssl-keystore-password=""
jmx-manager-ssl-keystore-type="" jmx-manager-ssl-protocols="any"
jmx-manager-ssl-require-authentication="true"
jmx-manager-ssl-truststore="" jmx-manager-ssl-truststore-password=""
jmx-manager-start="false" jmx-manager-update-rate="2000"
load-cluster-configuration-from-dir="false" locator-wait-time="0"
locators="127.0.0.1[9090]" (wanted "") lock-memory="false"
log-disk-space-limit="0"
log-file="C:\Users\Saranya\IdeaProjects\Gemfire\server1.log"
(wanted "") log-file-size-limit="0" log-level="config" max-num-reconnect-tries="3" max-wait-time-reconnect="60000"
mcast-address="/239.192.81.1" mcast-flow-control="1048576, 0.25,
5000" mcast-port="0" mcast-recv-buffer-size="1048576"
mcast-send-buffer-size="65535" mcast-ttl="32"
member-timeout="5000" membership-port-range="[1024,65535]"
memcached-bind-address="" memcached-port="0"
memcached-protocol="ASCII" name="server1" (wanted "")
off-heap-memory-size="" redis-bind-address="" redis-password=""
redis-port="0" redundancy-zone="" remote-locators=""
remove-unresponsive-client="false" roles=""
security-client-accessor="" security-client-accessor-pp=""
security-client-auth-init="" security-client-authenticator=""
security-client-dhalgo="" security-log-file=""
security-log-level="config" security-manager=""
security-peer-auth-init="" security-peer-authenticator=""
security-peer-verifymember-timeout="1000" security-post-processor=""
security-shiro-init="" security-udp-dhalgo=""
serializable-object-filter="!" server-bind-address=""
server-ssl-ciphers="any" server-ssl-enabled="false"
server-ssl-keystore="" server-ssl-keystore-password=""
server-ssl-keystore-type="" server-ssl-protocols="any"
server-ssl-require-authentication="true" server-ssl-truststore=""
server-ssl-truststore-password="" socket-buffer-size="32768"
socket-lease-time="60000" ssl-ciphers="any" ssl-cluster-alias=""
ssl-default-alias="" ssl-enabled-components="[]"
ssl-gateway-alias="" ssl-jmx-alias="" ssl-keystore=""
ssl-keystore-password="" ssl-keystore-type="" ssl-locator-alias=""
ssl-protocols="any" ssl-require-authentication="true"
ssl-server-alias="" ssl-truststore="" ssl-truststore-password=""
ssl-truststore-type="" ssl-web-alias=""
ssl-web-require-authentication="false" start-dev-rest-api="false"
start-locator="127.0.0.1[9090]" (wanted "")
statistic-archive-file="" statistic-sample-rate="1000"
statistic-sampling-enabled="true" tcp-port="0"
udp-fragment-size="60000" udp-recv-buffer-size="1048576"
udp-send-buffer-size="65535" use-cluster-configuration="true"
user-command-packages="" validate-serializable-objects="false"
at org.apache.geode.distributed.internal.InternalDistributedSystem.validateSameProperties(InternalDistributedSystem.java:2959)
at org.apache.geode.distributed.DistributedSystem.connect(DistributedSystem.java:199)
at org.apache.geode.cache.client.ClientCacheFactory.basicCreate(ClientCacheFactory.java:243)
at org.apache.geode.cache.client.ClientCacheFactory.create(ClientCacheFactory.java:214)
at GemfireTest.main(GemfireTest.java:61)
How to solve this exception?
The error here is pretty self-explanatory: you can't have more than one connection to a distributed system within a single JVM. In this particular case you're starting both a server cache (through ServerLauncher) and a client cache (through ClientCacheFactory) within the same JVM, which is not supported.
To solve the issue, use two different applications or JVMs, one for the server and another one for the client executing the query.
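As an illustrative sketch (not code from the question), the client could run in its own JVM and reach the cache through the locator; the host, port and region name are taken from the snippet above:

public class GemfireClientTest {
    public static void main(String[] args) throws Exception {
        // Connect through the locator started by the server JVM (127.0.0.1[9090] above)
        ClientCache cache = new ClientCacheFactory()
                .addPoolLocator("127.0.0.1", 9090)
                .create();

        QueryService queryService = cache.getQueryService();
        Query query = queryService.newQuery("SELECT * FROM /gemregion");
        SelectResults results = (SelectResults) query.execute();
        System.out.println(results.size());

        cache.close();
    }
}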
Cheers.