I have a Java application using Spring Batch. I've got a table used as a queue that contains information on jobs requested by clients (when a client requests a task to be executed, a row is added to this queue).
In one of my classes, a while loop runs until someone deactivates a flag:
protected void runJobLaunchingLoop() {
    while (!isTerminated()) {
        try {
            if (isActivated()) {
                QueueEntryDTO queueEntry = dequeueJobEntry();
                launchJob(queueEntry);
            }
        }
        catch (EmptyQueueException ignored) {}
        catch (Exception exception) {
            logger.error("There was a problem while de-queuing a job ('" + exception.getMessage() + "').");
        }
        finally {
            pauseProcessor();
        }
    }
}
The pauseProcessor() method calls Thread.sleep(). When I run this app in a Docker container, the number of threads run by the application keeps increasing. The threads are named "Timer-X", where X is an auto-incrementing integer.
I looked at the stack trace of one of these :
"Timer-14" - Thread t#128
java.lang.Thread.State: WAITING
at java.base#11.0.6/java.lang.Object.wait(Native Method)
- waiting on <25e60c31> (a java.util.TaskQueue)
at java.base#11.0.6/java.lang.Object.wait(Unknown Source)
at java.base#11.0.6/java.util.TimerThread.mainLoop(Unknown Source)
- locked <25e60c31> (a java.util.TaskQueue)
at java.base#11.0.6/java.util.TimerThread.run(Unknown Source)
Locked ownable synchronizers:
- None
Any idea what could be the cause of this? I'm not sure, but if I run the app locally from IntelliJ instead of in a container, the problem does not seem to occur. I say "not sure" because it sometimes takes a while before the thread count starts increasing.
EDIT: Some relevant code ...
protected QueueEntryDTO dequeueJobEntry() {
    Collection<QueueEntryDTO> collection = getQueueService().dequeueEntry();
    if (collection.isEmpty())
        throw new EmptyQueueException();
    return collection.iterator().next();
}
@Transactional
public Collection<QueueEntryDTO> dequeueEntry() {
    Optional<QueueEntry> optionalEntry = this.queueEntryDAO.findTopByStatusCode(QueueStatusEnum.WAITING.getStatusCode());
    if (optionalEntry.isPresent()) {
        QueueEntry entry = optionalEntry.get();
        QueueEntry updatedEntry = this.saveEntryStatus(entry, QueueStatusEnum.PROCESSING, (String) null);
        return Collections.singleton(this.queueEntryDTOMapper.toDTO(updatedEntry));
    } else {
        return new ArrayList<>();
    }
}
private void pauseProcessor() {
    try {
        Long sleepDuration = generalProperties.getQueueProcessingSleepDuration();
        sleepDuration = Objects.requireNonNullElseGet(
                sleepDuration,
                () -> Double.valueOf(Math.pow(2.0, getRetries()) * 1000.0).longValue());
        Thread.sleep(sleepDuration);
        if (getRetries() < 4)
            setRetries(getRetries() + 1);
    }
    catch (Exception ignored) {
        logger.warn("Failed to pause job queue processor.");
    }
}
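The fallback sleep above is a simple exponential backoff (2^retries seconds, capped once retries reaches 4). As a sanity check of the arithmetic, here is an isolated sketch using the same formula:

```java
public class BackoffDemo {
    // Same formula as pauseProcessor's fallback: 2^retries seconds, in ms.
    static long backoffMillis(int retries) {
        return Double.valueOf(Math.pow(2.0, retries) * 1000.0).longValue();
    }

    public static void main(String[] args) {
        // retries is capped at 4 in the loop above, so the sleep tops out at 16 s.
        for (int r = 0; r <= 4; r++) {
            System.out.println(r + " -> " + backoffMillis(r) + " ms");
        }
    }
}
```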
It seems like this was caused by a bug that was resolved in a more recent version of the DB2 JCC driver than the one I was using:
Applications are getting large number of timer threads when API
timerLevelforQueryTimeout value is not set explicitly in an
application using JCC driver version 11.5 GA (JCC 4.26.14) or
later.
This issue is fixed in 11.5 M4 FP0(JCC 4.27.25).
I updated the version to a newer one (11.5.6) in my POM file, but this didn't fix the issue: my K8s pod was still using 11.5.0 because of how Maven resolved the dependency. I then pinned the version using dependencyManagement in the POM file, and the newer version was loaded.
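For reference, a dependencyManagement pin looks roughly like this (com.ibm.db2:jcc are the usual Maven coordinates for the JCC driver, but verify the groupId/artifactId/version against your repository):

```xml
<dependencyManagement>
  <dependencies>
    <!-- Force every transitive reference to the JCC driver onto 11.5.6 -->
    <dependency>
      <groupId>com.ibm.db2</groupId>
      <artifactId>jcc</artifactId>
      <version>11.5.6.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```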
Related
The code below runs locally but not on the cluster. It hangs on the GroupReduceFunction and does not terminate even after hours (large data takes ~9 minutes to compute locally). The last message in the log:
GroupReduce (GroupReduce at main(MyClass.java:80)) (1/1) (...) switched from DEPLOYING to RUNNING.
The code fragment:
DataSet<MyData1> myData1 = env.createInput(new UserDefinedFunctions.MyData1Set());
DataSet<MyData2> myData2 = DataSetUtils.sampleWithSize(myData1, false, 8, Long.MAX_VALUE)
        .reduceGroup(new GroupReduceFunction<MyData1, MyData2>() {
            @Override
            public void reduce(Iterable<MyData1> itrbl, Collector<MyData2> clctr) throws Exception {
                int id = 0;
                for (MyData1 myData1 : itrbl) {
                    clctr.collect(new MyData2(id++, myData1));
                }
            }
        });
Any ideas how I could run this segment in parallel? Thanks in advance!
If I throw enough traffic at a performance environment, I can cause out-of-memory errors. If I take a thread dump on this instance, it reports a deadlock:
Found one Java-level deadlock:
=============================
"http-nio-8080-exec-9":
waiting to lock monitor 0x00007fb528003098 (object 0x00000000826d6018, a com.mysql.jdbc.LoadBalancedConnectionProxy),
which is held by "Finalizer"
"Finalizer":
waiting to lock monitor 0x00007fb528002fe8 (object 0x00000000826d5f98, a com.mysql.jdbc.ReplicationConnectionProxy),
which is held by "http-nio-8080-exec-9"
I think the OOM happens because the Finalizer thread is deadlocked and can't finish garbage-collecting unused instances. I'm not sure how to troubleshoot the deadlock, though. The http-nio-8080-exec-9 thread is doing this:
"http-nio-8080-exec-9":
at com.mysql.jdbc.MultiHostConnectionProxy$JdbcInterfaceProxy.invoke(MultiHostConnectionProxy.java:104)
- waiting to lock <0x00000000826d6018> (a com.mysql.jdbc.LoadBalancedConnectionProxy)
at com.sun.proxy.$Proxy60.setPingTarget(Unknown Source)
at com.mysql.jdbc.ReplicationConnectionProxy.invokeMore(ReplicationConnectionProxy.java:311)
at com.mysql.jdbc.MultiHostConnectionProxy.invoke(MultiHostConnectionProxy.java:457)
- locked <0x00000000826d5f98> (a com.mysql.jdbc.ReplicationConnectionProxy)
at com.sun.proxy.$Proxy57.prepareStatement(Unknown Source)
at org.apache.commons.dbcp2.DelegatingConnection.prepareStatement(DelegatingConnection.java:291)
at org.apache.commons.dbcp2.DelegatingConnection.prepareStatement(DelegatingConnection.java:291)
my code here
At the line marked "my code here", my code is doing this:
preparedStatement = connect.prepareStatement(sqlQuery);
The finalizer thread is doing this:
"Finalizer":
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:6597)
- waiting to lock <0x00000000826d5f98> (a com.mysql.jdbc.ReplicationConnectionProxy)
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:851)
at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.mysql.jdbc.MultiHostConnectionProxy$JdbcInterfaceProxy.invoke(MultiHostConnectionProxy.java:108)
- locked <0x00000000826d6018> (a com.mysql.jdbc.LoadBalancedConnectionProxy)
at com.sun.proxy.$Proxy61.close(Unknown Source)
at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.mysql.jdbc.MultiHostConnectionProxy$JdbcInterfaceProxy.invoke(MultiHostConnectionProxy.java:108)
- locked <0x00000000826d6018> (a com.mysql.jdbc.LoadBalancedConnectionProxy)
at com.sun.proxy.$Proxy62.close(Unknown Source)
at org.apache.commons.dbcp2.DelegatingResultSet.close(DelegatingResultSet.java:169)
at org.apache.commons.dbcp2.DelegatingStatement.close(DelegatingStatement.java:149)
at org.apache.commons.dbcp2.DelegatingStatement.finalize(DelegatingStatement.java:549)
at java.lang.System$2.invokeFinalize(System.java:1270)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:98)
at java.lang.ref.Finalizer.access$100(Finalizer.java:34)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:210)
The ResultSetImpl.realClose is doing this when it deadlocks:
synchronized (locallyScopedConn.getConnectionMutex()) {
Both seem related to the MySQL JDBC driver. We are using org.apache.commons.dbcp2.BasicDataSource for our connection pooling. Here's the code where we set up our connection:
private static final BasicDataSource dataSource = new BasicDataSource();

private void setUpConnectionPool() {
    final String JDBC_CONNECTION_STRING = System.getProperty("JDBC_CONNECTION_STRING");
    final String DB_USER_STRING = System.getProperty("DB_USER_STRING");
    final String DB_PASSWORD_STRING = System.getProperty("DB_PASSWORD_STRING");
    final int MAX_CONNECTIONS = System.getProperty("MAX_CONNECTIONS") == null ? 100 : Integer.valueOf(System.getProperty("MAX_CONNECTIONS"));
    try {
        ReplicationDriver driver = new ReplicationDriver();
        dataSource.setUrl(JDBC_CONNECTION_STRING);
        dataSource.setDriver(driver);
        dataSource.setUsername(DB_USER_STRING);
        dataSource.setPassword(DB_PASSWORD_STRING);
        dataSource.setMaxTotal(MAX_CONNECTIONS);
        dataSource.setConnectionProperties("autoReconnect=true;roundRobinLoadBalance=true;");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The above method is called when the context is initialized. Then, all code gets its connection by calling this method:
public static Connection getConnection() throws SQLException {
return dataSource.getConnection();
}
I'm hoping there's something obviously wrong with this code, but I don't see an obvious reason why it would cause a deadlock.
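Not necessarily the fix for this driver-level deadlock, but worth ruling out: the finalizer stack trace shows a Statement being closed by the Finalizer, which means close order is decided by the GC thread. Closing Statements and ResultSets explicitly with try-with-resources keeps the close on the request thread, in deterministic reverse order. A minimal sketch with stand-in resources (Res is a hypothetical class, not JDBC):

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesDemo {
    // Stand-in for a pooled Connection/Statement; records when it is closed.
    static class Res implements AutoCloseable {
        private final String name;
        private final List<String> log;
        Res(String name, List<String> log) { this.name = name; this.log = log; }
        @Override public void close() { log.add("closed " + name); }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // Resources are closed in reverse order of creation, on the caller's
        // thread, as soon as the block exits -- never on the Finalizer thread.
        try (Res conn = new Res("connection", log);
             Res stmt = new Res("statement", log)) {
            log.add("work done");
        }
        System.out.println(log); // [work done, closed statement, closed connection]
    }
}
```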
I have a Java application with the properties below:
kafkaProperties = new Properties();
kafkaProperties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBrokersList);
kafkaProperties.put(ConsumerConfig.GROUP_ID_CONFIG, consumerGroupName);
kafkaProperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
kafkaProperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
kafkaProperties.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, consumerSessionTimeoutMs);
kafkaProperties.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, maxPartitionFetchBytes);
kafkaProperties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
I've created 15 consumer threads and let them process the runnable below. I don't have any other consumer consuming with this consumer group name.
@Override
public void run() {
    try {
        logger.info("Starting ConsumerWorker, consumerId={}", consumerId);
        consumer.subscribe(Arrays.asList(kafkaTopic), offsetLoggingCallback);
        while (true) {
            boolean isPollFirstRecord = true;
            logger.debug("consumerId={}; about to call consumer.poll() ...", consumerId);
            ConsumerRecords<String, String> records = consumer.poll(pollIntervalMs);
            Map<Integer, Long> partitionOffsetMap = new HashMap<>();
            for (ConsumerRecord<String, String> record : records) {
                if (isPollFirstRecord) {
                    isPollFirstRecord = false;
                    logger.info("Start offset for partition {} in this poll : {}", record.partition(), record.offset());
                }
                messageProcessor.processMessage(record.value(), record.offset());
                partitionOffsetMap.put(record.partition(), record.offset());
            }
            if (!records.isEmpty()) {
                logger.info("Invoking commit for partition/offset : {}", partitionOffsetMap);
                consumer.commitAsync(offsetLoggingCallback);
            }
        }
    } catch (WakeupException e) {
        logger.warn("ConsumerWorker [consumerId={}] got WakeupException - exiting ... Exception: {}",
                consumerId, e.getMessage());
    } catch (Exception e) {
        logger.error("ConsumerWorker [consumerId={}] got Exception - exiting ... Exception: {}",
                consumerId, e.getMessage());
    } finally {
        logger.warn("ConsumerWorker [consumerId={}] is shutting down ...", consumerId);
        consumer.close();
    }
}
I also have an OffsetCommitCallbackImpl like the one below. It maintains the partitions and their committed offsets as a map, and logs whenever an offset is committed.
@Override
public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
    if (exception == null) {
        offsets.forEach((topicPartition, offsetAndMetadata) -> {
            partitionOffsetMap.put(topicPartition, offsetAndMetadata);
            logger.info("Offset position during the commit for consumerId : {}, partition : {}, offset : {}", Thread.currentThread().getName(), topicPartition.partition(), offsetAndMetadata.offset());
        });
    } else {
        offsets.forEach((topicPartition, offsetAndMetadata) -> logger.error("Offset commit error, and partition offset info : {}, partition : {}, offset : {}", exception.getMessage(), topicPartition.partition(), offsetAndMetadata.offset()));
    }
}
Problem/Issue:
I noticed that I miss events/messages whenever I restart the application (bring it down and back up). When I looked closely at the logging, comparing the offsets committed before shutdown (via the OffsetCommitCallback logging) against the offsets picked up for processing after restart, I saw that for certain partitions we did not resume from the offset where we left off before shutdown. Sometimes the start offsets for certain partitions are about 1000 higher than the committed offsets.
NOTE: This happens to roughly 8 out of 40 partitions.
If you look closely at the logging in the run method, there is one log statement where I print the offset before invoking the async commit. For example, if that last log before shutdown shows 10 for partition 1, then after restart the first offset we process for partition 1 is something like 100, and I validated that we are missing exactly 90 messages.
Can anyone think of a reason why this would be happening?
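One convention worth double-checking, sketched here with plain maps rather than the real TopicPartition/OffsetAndMetadata classes: Kafka's committed offset is by convention the position of the next record to read, i.e. last processed offset + 1. If an explicit commit map is ever built straight from record.offset() without the +1, the consumer re-reads one record per partition rather than skipping any, so it doesn't explain a gap by itself, but the bookkeeping is cheap to verify in isolation:

```java
import java.util.HashMap;
import java.util.Map;

public class CommitOffsetModel {
    /**
     * Given the last offset actually processed per partition, build the map
     * of offsets to commit. Kafka's convention: the committed offset is the
     * offset of the NEXT record to consume, so add 1 to the last processed.
     */
    static Map<Integer, Long> offsetsToCommit(Map<Integer, Long> lastProcessed) {
        Map<Integer, Long> toCommit = new HashMap<>();
        lastProcessed.forEach((partition, offset) -> toCommit.put(partition, offset + 1));
        return toCommit;
    }

    public static void main(String[] args) {
        // Partition 0 last processed offset 41, partition 1 last processed 10.
        Map<Integer, Long> lastProcessed = Map.of(0, 41L, 1, 10L);
        // After restart, consumption resumes exactly at the committed offset:
        // nothing is skipped and nothing is reprocessed.
        System.out.println(offsetsToCommit(lastProcessed));
    }
}
```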
I am working with Java in a Maven project. I was using Couchbase 2.3.1, but in trying to resolve this issue I rolled back to 2.2.8, to no avail.
The issue is that while data does get through to my Couchbase cluster, I am seeing a lot of this:
java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:75)
at com.couchbase.client.java.CouchbaseBucket.upsert(CouchbaseBucket.java:359)
at com.couchbase.client.java.CouchbaseBucket.upsert(CouchbaseBucket.java:354)
Below are the settings for my couchbase environment:
CouchbaseEnvironment: {sslEnabled=false, sslKeystoreFile='null', sslKeystorePassword='null', queryEnabled=false, queryPort=8093, bootstrapHttpEnabled=true, bootstrapCarrierEnabled=true, bootstrapHttpDirectPort=8091, bootstrapHttpSslPort=18091, bootstrapCarrierDirectPort=11210, bootstrapCarrierSslPort=11207, ioPoolSize=24, computationPoolSize=24, responseBufferSize=16384, requestBufferSize=16384, kvServiceEndpoints=1, viewServiceEndpoints=1, queryServiceEndpoints=1, searchServiceEndpoints=1, ioPool=NioEventLoopGroup, coreScheduler=CoreScheduler, eventBus=DefaultEventBus, packageNameAndVersion=couchbase-java-client/2.2.8 (git: 2.2.8, core: 1.2.9), dcpEnabled=false, retryStrategy=BestEffort, maxRequestLifetime=75000, retryDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=100, upper=100000}, reconnectDelay=ExponentialDelay{growBy 1.0 MILLISECONDS, powers of 2; lower=32, upper=4096}, observeIntervalDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=10, upper=100000}, keepAliveInterval=30000, autoreleaseAfter=2000, bufferPoolingEnabled=true, tcpNodelayEnabled=true, mutationTokensEnabled=false, socketConnectTimeout=1000, dcpConnectionBufferSize=20971520, dcpConnectionBufferAckThreshold=0.2, dcpConnectionName=dcp/core-io, callbacksOnIoPool=false, queryTimeout=75000, viewTimeout=75000, kvTimeout=2500, connectTimeout=5000, disconnectTimeout=25000, dnsSrvEnabled=false}
I'm not really sure what to look at here. As far as I can tell there should be a decent connection between the server where the app is running and the Couchbase cluster. Any help or direction would be appreciated. Here is a snippet from where the error is being thrown:
LockableItem<InnerVertex> lv = this.getInnerVertex(id);
lv.lock();
try {
    String content;
    try {
        content = mapper.writeValueAsString(lv.item);
    } catch (JsonProcessingException e) {
        LOG.warning(e.getMessage());
        return;
    }
    RawJsonDocument d = RawJsonDocument.create(VertexId.toKey(id), content);
    bucket.upsert(d);
} finally {
    lv.unlock();
}
I searched for an answer and found many solutions, all discussing this exception. I also checked the jar's code, which confirms this is a timeout exception.
Root Cause Analysis
The error occurred in the following section of Couchbase: https://github.com/couchbase/couchbase-java-client/blob/master/src/main/java/com/couchbase/client/java/util/Blocking.java#L71
public static <T> T blockForSingle(final Observable<? extends T> observable, final long timeout,
        final TimeUnit tu) {
    final CountDownLatch latch = new CountDownLatch(1);
    TrackingSubscriber<T> subscriber = new TrackingSubscriber<T>(latch);
    observable.subscribe(subscriber);
    try {
        if (!latch.await(timeout, tu)) { // From here, this error occurs.
            throw new RuntimeException(new TimeoutException());
        }
    }
    // ...
If the timeout kicks in, a TimeoutException nested in a
RuntimeException is thrown to be fully compatible with the
Observable.timeout(long, TimeUnit) behavior.
Resource Link:
http://docs.couchbase.com/sdk-api/couchbase-java-client-2.2.0/com/couchbase/client/java/util/Blocking.html
Your configuration analysis and solution:
Your Couchbase environment's connectTimeout is 5000 ms (5 s), which is the default connection timeout.
Increase this value to 10000 ms or greater and your problem should be solved.
//this tunes the SDK (to customize connection timeout)
CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
.connectTimeout(10000) //10000ms = 10s, default is 5s
.build();
A full solution
Simonbasle has given a full solution in this tutorial:
From the short log, it looks like the SDK is able to connect to the
node, but takes a little much time to open the bucket. How good is the
network link between the two machines? Is this a VM/cloud machine?
What you can try to do is increase the connect timeout:
public class NoSQLTest {
    public static void main(String[] args) {
        try {
            //this tunes the SDK (to customize connection timeout)
            CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
                    .connectTimeout(10000) //10000ms = 10s, default is 5s
                    .build();
            System.out.println("Create connection");
            //use the env during cluster creation to apply
            Cluster cluster = CouchbaseCluster.create(env, "10.115.224.94");
            System.out.println("Try to openBucket");
            Bucket bucket = cluster.openBucket("beer-sample"); //you can also force a greater timeout here (cluster.openBucket("beer-sample", 10, TimeUnit.SECONDS))
            System.out.println("disconnect");
            cluster.disconnect();
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
As a side note, you should always reuse the CouchbaseEnvironment,
CouchbaseCluster and Bucket instances once created (usually by making
them public static somewhere, or a Spring singleton, etc...). These
are thread safe and should be shared (and they are expensive to create
anyway).
Resource Link:
Couchbase connection timeout with Java SDK
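The reuse advice above (keep one shared CouchbaseEnvironment/Cluster/Bucket for the whole application) can be sketched with the initialization-on-demand holder idiom; ExpensiveClient below is a stand-in, not a Couchbase class:

```java
public class ClientHolder {
    // Stand-in for an expensive, thread-safe client (CouchbaseEnvironment,
    // Cluster, Bucket, ...). Created once, shared by every caller.
    static class ExpensiveClient {
        static int constructions = 0;
        ExpensiveClient() { constructions++; }
    }

    // Initialization-on-demand holder: the JVM guarantees the instance is
    // created lazily, exactly once, and safely published across threads.
    private static class Holder {
        static final ExpensiveClient INSTANCE = new ExpensiveClient();
    }

    public static ExpensiveClient get() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        ExpensiveClient a = get();
        ExpensiveClient b = get();
        System.out.println(a == b);                        // true: same instance
        System.out.println(ExpensiveClient.constructions); // 1: built only once
    }
}
```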
Thanks for the question, and for @SkyWalker's answer.
They helped when I encountered this annoying timeout.
For Spring Data Couchbase 2, adding the following to application.properties solved it
spring.couchbase.env.timeouts.connect=20000
How can I remove a couple of records in one transaction?
Configs:
EnvironmentConfig myEnvConfig = new EnvironmentConfig();
StoreConfig storeConfig = new StoreConfig();
myEnvConfig.setReadOnly(readOnly);
storeConfig.setReadOnly(readOnly);
// If the environment is opened for write, then we want to be
// able to create the environment and entity store if
// they do not exist.
myEnvConfig.setAllowCreate(!readOnly);
storeConfig.setAllowCreate(!readOnly);
// Allow transactions if we are writing to the store.
myEnvConfig.setTransactional(!readOnly);
storeConfig.setTransactional(!readOnly);
// Open the environment and entity store
bklEnv = new Environment(envHome, myEnvConfig);
//bklEnv.openDatabase(null, envHome.getAbsolutePath(), myEnvConfig);
bklstore = new EntityStore(bklEnv, entryStore, storeConfig);
Clear old data cyclically. Here we clear data by getting firstKey() from the index:
public void clearOldDBData(Integer maxCount) throws DatabaseException {
    TransactionConfig config = new TransactionConfig();
    config.setReadUncommitted(true);
    Transaction txn = berkeleyDbEnv.getBklEnv().beginTransaction(null, config);
    txn.setTxnTimeout(1000);
    Long keyV = null;
    try {
        PrimaryIndex<Long, MemoryBTB> memoryBTBIndex =
                berkeleyDbEnv.getBklstore().getPrimaryIndex(Long.class, MemoryBTB.class);
        if (!memoryBTBIndex.sortedMap().isEmpty() && memoryBTBIndex.sortedMap().keySet().size() > maxCount) {
            for (int i = 0; i < memoryBTBIndex.sortedMap().keySet().size() - maxCount; i++) {
                log.trace(BERKELEYDB_CLEAR_DATA);
                System.out.println("**************************************************");
                PrimaryIndex<Long, MemoryBTB> memoryBTBIndexInternal =
                        berkeleyDbEnv.getBklstore().getPrimaryIndex(Long.class, MemoryBTB.class);
                memoryBTBIndexInternal.delete(txn, memoryBTBIndexInternal.sortedMap().firstKey());
            }
        }
        txn.commit();
        System.out.println("+++++++++++++++++++++++++++++++++++++++++++++++++++");
    } catch (DatabaseException dbe) {
        // one more time deleting
        try {
            Thread.sleep(100);
            dataAccessor.getMemoryBTB().delete(txn, keyV);
            txn.commit();
        } catch (DatabaseException dbeInternal) {
            log.trace(String.format(TXN_ABORT, dbeInternal.getMessage()));
            txn.abort();
        } catch (InterruptedException e) {
            e.printStackTrace();
            throw dbe;
        }
    }
}
Stacktrace:
[12/12 10:35:20] - TRACE - BerkeleyRepository - Berkeley DB clear data
**************************************************
[12/12 10:35:20] - TRACE - BerkeleyRepository - Berkeley DB clear data
**************************************************
[12/12 10:35:21] - TRACE - MemService - Berkeley DB JSON produce error: (JE 3.3.75) Lock expired. Locker 7752330 -1_Thread-295_ThreadLocker: waited for lock on database=persist#MemoryEntityStore#com.company.memcheck.persists.MemoryBTB LockAddr:1554328 node=333 type=READ grant=WAIT_NEW timeoutMillis=500 startTime=1386862520718 endTime=1386862521218
Owners: [<LockInfo locker="31510392 17395_Thread-295_Txn" type="WRITE"/>]
Waiters: []
com.sleepycat.util.RuntimeExceptionWrapper: (JE 3.3.75) Lock expired. Locker 7752330 -1_Thread-295_ThreadLocker: waited for lock on database=persist#MemoryEntityStore#com.company.memcheck.persists.MemoryBTB LockAddr:1554328 node=333 type=READ grant=WAIT_NEW timeoutMillis=500 startTime=1386862520718 endTime=1386862521218
Owners: [<LockInfo locker="31510392 17395_Thread-295_Txn" type="WRITE"/>]
Waiters: []
at com.sleepycat.collections.StoredContainer.convertException(StoredContainer.java:466)
at com.sleepycat.collections.StoredSortedMap.getFirstOrLastKey(StoredSortedMap.java:216)
at com.sleepycat.collections.StoredSortedMap.firstKey(StoredSortedMap.java:185)
at com.company.memcheck.repository.BerkeleyRepositoryImpl.clearOldDBData(BerkeleyRepositoryImpl.java:142)
at com.company.memcheck.service.MemServiceImpl.removeOldData(MemServiceImpl.java:305)
at com.company.memcheck.service.MemServiceImpl.access$3(MemServiceImpl.java:299)
at com.company.memcheck.service.MemServiceImpl$2.run(MemServiceImpl.java:129)
at java.lang.Thread.run(Thread.java:662)
So as we can see, only one loop iteration (marked by the "*******" line) was successful; the others failed.
Should I use a cursor for this?
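For what it's worth, the intended trimming logic (delete the oldest keys until only maxCount entries remain, in one pass) can be modeled with a plain TreeMap to check the loop arithmetic in isolation; TrimOldest and its names are illustrative, not Berkeley DB API:

```java
import java.util.TreeMap;

public class TrimOldest {
    /**
     * Remove the smallest keys until at most maxCount entries remain.
     * Computing the number of deletions once up front avoids re-evaluating
     * the map's size inside the loop condition on every pass.
     */
    static void trim(TreeMap<Long, String> map, int maxCount) {
        int toDelete = map.size() - maxCount;
        for (int i = 0; i < toDelete; i++) {
            map.remove(map.firstKey()); // oldest key first
        }
    }

    public static void main(String[] args) {
        TreeMap<Long, String> map = new TreeMap<>();
        for (long k = 1; k <= 10; k++) map.put(k, "entry-" + k);
        trim(map, 4);
        System.out.println(map.size());     // 4 entries remain
        System.out.println(map.firstKey()); // oldest surviving key is 7
    }
}
```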