I am working on something where I need to pull data from MariaDB (using HikariCP) and then publish it through Redis. Eventually, when I try to pull from the database, connections start leaking. This only happens after the application has been running for a while, and then suddenly.
Here is the full log from when the leak started happening: https://hastebin.com/sekiximehe.makefile
Here is some debug info:
21:04:40 [INFO] 21:04:40.680 [HikariPool-1 housekeeper] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Before cleanup stats (total=6, active=2, idle=4, waiting=0)
21:04:40 [INFO] 21:04:40.680 [HikariPool-1 housekeeper] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - After cleanup stats (total=6, active=2, idle=4, waiting=0)
21:04:40 [INFO] 21:04:40.682 [HikariPool-1 connection adder] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Added connection org.mariadb.jdbc.MariaDbConnection#4b7a5e97
21:04:40 [INFO] 21:04:40.682 [HikariPool-1 connection adder] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - After adding stats (total=7, active=2, idle=5, waiting=0)
21:05:05 [INFO] 21:05:05.323 [HikariPool-1 housekeeper] WARN com.zaxxer.hikari.pool.ProxyLeakTask - Connection leak detection triggered for org.mariadb.jdbc.MariaDbConnection#52ede989 on thread Thread-272, stack trace follows
java.lang.Exception: Apparent connection leak detected
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:123)
at us.survivewith.bungee.database.FetchPlayerInfo.run(FetchPlayerInfo.java:29)
at java.lang.Thread.run(Thread.java:748)
21:05:10 [INFO] 21:05:10.681 [HikariPool-1 housekeeper] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Before cleanup stats (total=7, active=2, idle=5, waiting=0)
21:05:10 [INFO] 21:05:10.681 [HikariPool-1 housekeeper] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - After cleanup stats (total=7, active=2, idle=5, waiting=0)
21:05:39 [INFO] 21:05:39.352 [HikariPool-1 housekeeper] WARN com.zaxxer.hikari.pool.ProxyLeakTask - Connection leak detection triggered for org.mariadb.jdbc.MariaDbConnection#3cba7850 on thread Thread-274, stack trace follows
java.lang.Exception: Apparent connection leak detected
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:123)
at us.survivewith.bungee.database.FetchPlayerInfo.run(FetchPlayerInfo.java:29)
at java.lang.Thread.run(Thread.java:748)
Here is the FetchPlayerInfo.run() method:
@Override
public void run()
{
String select = "SELECT `Rank`,`Playtime` FROM `Players` WHERE PlayerUUID=?;";
// This is line 29. How can this possibly be causing a leak?
try(Connection connection = Database.getHikari().getConnection())
{
// Get the data by querying the Players table
try(PreparedStatement serverSQL = connection.prepareStatement(select))
{
serverSQL.setString(1, player);
// Execute statement
try(ResultSet serverRS = serverSQL.executeQuery())
{
// If a row exists
if(serverRS.next())
{
String rank = serverRS.getString("Rank");
Jedis jPublisher = Redis.getJedis().getResource();
jPublisher.publish("playerconnections", player + "~" + serverRS.getInt("Playtime") + "~" + rank);
}
else
{
Jedis jPublisher = Redis.getJedis().getResource();
jPublisher.publish("playerconnections", player + "~" + 0 + "~DEFAULT");
}
}
}
}
catch(SQLException e)
{
//Print out any exception while trying to prepare statement
e.printStackTrace();
}
}
This is how I've set up my Database class:
/**
* This class is used to connect to the database
*/
public class Database
{
private static HikariDataSource hikari;
/**
* Connects to the database
*/
public static void connectToDatabase(String address,
String db,
String user,
String password,
int port)
{
// Setup main Hikari instance
hikari = new HikariDataSource();
hikari.setMaximumPoolSize(20);
hikari.setLeakDetectionThreshold(60 * 1000);
hikari.setDataSourceClassName("org.mariadb.jdbc.MariaDbDataSource");
hikari.addDataSourceProperty("serverName", address);
hikari.addDataSourceProperty("port", port);
hikari.addDataSourceProperty("databaseName", db);
hikari.addDataSourceProperty("user", user);
hikari.addDataSourceProperty("password", password);
}
/**
* Returns an instance of Hikari.
* This instance is connected to the database that contains all data.
* The stats table is only used in this database every other day
*
@return The main HikariDataSource
*/
public static HikariDataSource getHikari()
{
return hikari;
}
}
And this is how I am calling the FetchPlayerInfo class:
new Thread(new FetchPlayerInfo(player.getUniqueId().toString())).start();
EDIT:
The problem still persists after using a synchronized getConnection() method from the Database class.
A Jedis instance obtained from a JedisPool is also a resource you should close:
// Jedis implements Closeable, so the instance is returned to the pool when the try block ends.
try (Jedis jedis = pool.getResource()) {
    // use jedis here
}
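Applied to the question's run() method, the publish step could look like this (a sketch; it assumes Redis.getJedis() returns a JedisPool, as the question's code suggests, and a Jedis version recent enough to implement Closeable):
// Inside the if(serverRS.next()) branch of run(); the Jedis instance goes back
// to the pool even if publish() throws.
try (Jedis jPublisher = Redis.getJedis().getResource())
{
    jPublisher.publish("playerconnections",
            player + "~" + serverRS.getInt("Playtime") + "~" + rank);
}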
What version of HikariCP? It is possible that the leak is not actually a leak. The leak will be reported when the connection is out of the pool for longer than the threshold, but it may actually be returned later. Newer versions of HikariCP will log "unleaked" connections.
EDIT: I am as close to 100% certain as I can be that there is no race condition in HikariCP. This scenario is far too simple, and HikariCP is used by far too many users (millions) for such a fundamental flaw not to have surfaced before.
The only thing that makes sense, looking at the code above and the logs generated, is that one of the calls inside of the outer try-catch is hanging (blocking). I suggest getting a stack dump when the condition occurs, to find if there is a thread blocked inside of FetchPlayerInfo.run().
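If external tooling is unavailable, a small helper along these lines can capture the dump from inside the process (a sketch; running jstack against the JVM's pid gives the same information):
import java.util.Map;

public class ThreadDumper
{
    // Print every live thread's stack; call this when the leak warning appears
    // to see whether a thread is blocked inside FetchPlayerInfo.run().
    public static void dumpAllStacks()
    {
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet())
        {
            System.out.println(entry.getKey());
            for (StackTraceElement frame : entry.getValue())
            {
                System.out.println("    at " + frame);
            }
        }
    }
}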
In my Java agent, I start an HttpServer:
public static void premain(String agentArgs, Instrumentation inst) throws InstantiationException, IOException {
HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
server.createContext("/report", new ReportHandler());
server.createContext("/data", new DataHandler());
server.createContext("/stack", new StackHandler());
ExecutorService es = Executors.newCachedThreadPool(new ThreadFactory() {
int count = 0;
@Override
public Thread newThread(Runnable r) {
Thread t = new Thread(r);
t.setDaemon(true);
t.setName("JDBCLD-HTTP-SERVER" + count++);
return t;
}
});
server.setExecutor(es);
server.start();
// how to properly close ?
Runtime.getRuntime().addShutdownHook(new Thread() {
@Override
public void run() {
server.stop(5);
log.info("internal httpserver has been closed.");
es.shutdown();
try {
if (!es.awaitTermination(60, TimeUnit.SECONDS)) {
log.warn("executor service of internal httpserver not closing in 60 seconds");
es.shutdownNow();
if (!es.awaitTermination(60, TimeUnit.SECONDS))
log.error("executor service of internal httpserver not closing in 120 seconds, give up");
}else {
log.info("executor service of internal httpserver closed.");
}
} catch (InterruptedException ie) {
log.warn("thread interrupted, shutdown executor service of internal httpserver");
es.shutdownNow();
Thread.currentThread().interrupt();
}
}
});
// other instrumention code ignored ...
}
Testing program:
public class AgentTest {
public static void main(String[] args) throws SQLException {
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:oracle:thin:@172.31.27.182:1521/pas");
config.setUsername("pas");
config.setPassword("pas");
HikariDataSource ds = new HikariDataSource(config);
Connection c = ds.getConnection();
Connection c1 = ds.getConnection();
c.getMetaData();
try {
Thread.sleep(1000 * 60 * 10);
} catch (InterruptedException e) {
e.printStackTrace();
c.close();
c1.close();
ds.close();
}
c.close();
c1.close();
ds.close();
}
}
When the target JVM exits, I want to stop that HttpServer. But when my test program finishes, the main thread stops, yet the whole JVM process won't terminate and the shutdown hook in the code above never executes. If I click the 'terminate' button in the Eclipse IDE, Eclipse shows an error, but at least the JVM exits and my shutdown hook gets invoked.
According to the Javadoc of java.lang.Runtime:

The Java virtual machine shuts down in response to two kinds of events: the program exits normally, when the last non-daemon thread exits or when the exit (equivalently, System.exit) method is invoked; or the virtual machine is terminated in response to a user interrupt, such as typing ^C, or a system-wide event such as user logoff or system shutdown.
com.sun.net.httpserver.HttpServer starts a non-daemon dispatcher thread, and that thread only exits when HttpServer#stop is called, so I am facing a circular dependency:

non-daemon thread not finished -> shutdown hook not triggered -> can't stop server -> non-daemon thread not finished

Any good ideas? Please note that I can't modify the code of the target application.
UPDATE after applying kriegaex's answer
I added some logging to the watchdog thread; here is the output:
2021-09-22 17:30:00.967 INFO - Connnection#1594791957 acquired by 40A4F128987F8BD9C0EE6749895D1237
2021-09-22 17:30:00.968 DEBUG - Stack#40A4F128987F8BD9C0EE6749895D1237:
java.lang.Throwable:
at com.zaxxer.hikari.pool.ProxyConnection.<init>(ProxyConnection.java:102)
at com.zaxxer.hikari.pool.HikariProxyConnection.<init>(HikariProxyConnection.java)
at com.zaxxer.hikari.pool.ProxyFactory.getProxyConnection(ProxyFactory.java)
at com.zaxxer.hikari.pool.PoolEntry.createProxyConnection(PoolEntry.java:97)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:192)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100)
at agenttest.AgentTest.main(AgentTest.java:19)
2021-09-22 17:30:00.969 INFO - Connnection#686560878 acquired by 464555C270688B747CA211DE489B7730
2021-09-22 17:30:00.969 DEBUG - Stack#464555C270688B747CA211DE489B7730:
java.lang.Throwable:
at com.zaxxer.hikari.pool.ProxyConnection.<init>(ProxyConnection.java:102)
at com.zaxxer.hikari.pool.HikariProxyConnection.<init>(HikariProxyConnection.java)
at com.zaxxer.hikari.pool.ProxyFactory.getProxyConnection(ProxyFactory.java)
at com.zaxxer.hikari.pool.PoolEntry.createProxyConnection(PoolEntry.java:97)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:192)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100)
at agenttest.AgentTest.main(AgentTest.java:20)
2021-09-22 17:30:00.971 DEBUG - Connnection#1594791957 used by getMetaData
2021-09-22 17:30:01.956 DEBUG - there is still 12 active threads, keep wathcing
2021-09-22 17:30:01.956 DEBUG - Reference Handler#true Finalizer#true Signal Dispatcher#true main#false server-timer#true Thread-2#false jdbcld-watch-dog#false Timer-0#true oracle.jdbc.driver.BlockSource.ThreadedCachingBlockSource.BlockReleaser#true InterruptTimer#true HikariPool-1 housekeeper#true HikariPool-1 connection adder#true
2021-09-22 17:30:02.956 DEBUG - there is still 12 active threads, keep wathcing
2021-09-22 17:30:02.956 DEBUG - Reference Handler#true Finalizer#true Signal Dispatcher#true main#false server-timer#true Thread-2#false jdbcld-watch-dog#false Timer-0#true oracle.jdbc.driver.BlockSource.ThreadedCachingBlockSource.BlockReleaser#true InterruptTimer#true HikariPool-1 housekeeper#true HikariPool-1 connection adder#true
2021-09-22 17:30:03.957 DEBUG - there is still 12 active threads, keep wathcing
2021-09-22 17:30:03.957 DEBUG - Reference Handler#true Finalizer#true Signal Dispatcher#true main#false server-timer#true Thread-2#false jdbcld-watch-dog#false Timer-0#true oracle.jdbc.driver.BlockSource.ThreadedCachingBlockSource.BlockReleaser#true InterruptTimer#true HikariPool-1 housekeeper#true HikariPool-1 connection adder#true
2021-09-22 17:30:04.959 DEBUG - there is still 12 active threads, keep wathcing
2021-09-22 17:30:04.959 DEBUG - Reference Handler#true Finalizer#true Signal Dispatcher#true main#false server-timer#true Thread-2#false jdbcld-watch-dog#false Timer-0#true oracle.jdbc.driver.BlockSource.ThreadedCachingBlockSource.BlockReleaser#true InterruptTimer#true HikariPool-1 housekeeper#true HikariPool-1 connection adder#true
2021-09-22 17:30:05.959 DEBUG - there is still 12 active threads, keep wathcing
2021-09-22 17:30:05.960 DEBUG - Reference Handler#true Finalizer#true Signal Dispatcher#true main#false server-timer#true Thread-2#false jdbcld-watch-dog#false Timer-0#true oracle.jdbc.driver.BlockSource.ThreadedCachingBlockSource.BlockReleaser#true InterruptTimer#true HikariPool-1 housekeeper#true HikariPool-1 connection adder#true
2021-09-22 17:30:06.960 DEBUG - there is still 11 active threads, keep wathcing
2021-09-22 17:30:06.960 DEBUG - Reference Handler#true Finalizer#true Signal Dispatcher#true main#false server-timer#true Thread-2#false jdbcld-watch-dog#false Timer-0#true oracle.jdbc.driver.BlockSource.ThreadedCachingBlockSource.BlockReleaser#true InterruptTimer#true HikariPool-1 housekeeper#true
2021-09-22 17:30:07.961 DEBUG - there is still 11 active threads, keep wathcing
2021-09-22 17:30:07.961 DEBUG - Reference Handler#true Finalizer#true Signal Dispatcher#true main#false server-timer#true Thread-2#false jdbcld-watch-dog#false Timer-0#true oracle.jdbc.driver.BlockSource.ThreadedCachingBlockSource.BlockReleaser#true InterruptTimer#true HikariPool-1 housekeeper#true
2021-09-22 17:30:08.961 DEBUG - there is still 11 active threads, keep wathcing
2021-09-22 17:30:08.961 DEBUG - Reference Handler#true Finalizer#true Signal Dispatcher#true main#false server-timer#true Thread-2#false jdbcld-watch-dog#false Timer-0#true oracle.jdbc.driver.BlockSource.ThreadedCachingBlockSource.BlockReleaser#true InterruptTimer#true HikariPool-1 housekeeper#true
2021-09-22 17:30:09.962 DEBUG - there is still 11 active threads, keep wathcing
2021-09-22 17:30:09.962 DEBUG - Reference Handler#true Finalizer#true Signal Dispatcher#true main#false server-timer#true Thread-2#false jdbcld-watch-dog#false Timer-0#true oracle.jdbc.driver.BlockSource.ThreadedCachingBlockSource.BlockReleaser#true InterruptTimer#true HikariPool-1 housekeeper#true
2021-09-22 17:30:10.962 DEBUG - there is still 11 active threads, keep wathcing
2021-09-22 17:30:10.963 DEBUG - Reference Handler#true Finalizer#true Signal Dispatcher#true main#false server-timer#true Thread-2#false jdbcld-watch-dog#false Timer-0#true oracle.jdbc.driver.BlockSource.ThreadedCachingBlockSource.BlockReleaser#true InterruptTimer#true HikariPool-1 housekeeper#true
2021-09-22 17:30:10.976 INFO - Connnection#1594791957 released
2021-09-22 17:30:10.976 DEBUG - set connection count to 0 by stack hash 40A4F128987F8BD9C0EE6749895D1237
2021-09-22 17:30:10.976 INFO - Connnection#686560878 released
2021-09-22 17:30:10.976 DEBUG - set connection count to 0 by stack hash 464555C270688B747CA211DE489B7730
2021-09-22 17:30:11.963 DEBUG - there is still 10 active threads, keep wathcing
2021-09-22 17:30:11.963 DEBUG - Reference Handler#true Finalizer#true Signal Dispatcher#true server-timer#true Thread-2#false jdbcld-watch-dog#false Timer-0#true oracle.jdbc.driver.BlockSource.ThreadedCachingBlockSource.BlockReleaser#true InterruptTimer#true DestroyJavaVM#false
2021-09-22 17:30:12.964 DEBUG - there is still 10 active threads, keep wathcing
Update
I want to support all kinds of Java applications, including web applications running in servlet containers and standalone Java SE applications.
Here is a little MCVE illustrating ewrammer's idea. I used the byte-buddy-agent helper library to dynamically attach an agent, which makes my example self-contained by starting the Java agent right from the main method. I omitted the three trivial no-op dummy handler classes necessary to run this example.
package org.acme.agent;
import com.sun.net.httpserver.HttpServer;
import net.bytebuddy.agent.ByteBuddyAgent;
import java.io.IOException;
import java.lang.instrument.Instrumentation;
import java.net.InetSocketAddress;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
public class Agent {
public static void premain(String agentArgs, Instrumentation inst) throws IOException {
HttpServer httpServer = HttpServer.create(new InetSocketAddress(8000), 0);
ExecutorService executorService = getExecutorService(httpServer);
Runtime.getRuntime().addShutdownHook(getShutdownHook(httpServer, executorService));
// other instrumention code ignored ...
startWatchDog();
}
private static ExecutorService getExecutorService(HttpServer server) {
server.createContext("/report", new ReportHandler());
server.createContext("/data", new DataHandler());
server.createContext("/stack", new StackHandler());
ExecutorService executorService = Executors.newCachedThreadPool(new ThreadFactory() {
int count = 0;
@Override
public Thread newThread(Runnable r) {
Thread t = new Thread(r);
t.setDaemon(true);
t.setName("JDBCLD-HTTP-SERVER" + count++);
return t;
}
});
server.setExecutor(executorService);
server.start();
return executorService;
}
private static Thread getShutdownHook(HttpServer httpServer, ExecutorService executorService) {
return new Thread(() -> {
httpServer.stop(5);
System.out.println("Internal HTTP server has been stopped");
executorService.shutdown();
try {
if (!executorService.awaitTermination(60, TimeUnit.SECONDS)) {
System.out.println("Executor service of internal HTTP server not closing in 60 seconds");
executorService.shutdownNow();
if (!executorService.awaitTermination(60, TimeUnit.SECONDS))
System.out.println("Executor service of internal HTTP server not closing in 120 seconds, giving up");
}
else {
System.out.println("Executor service of internal HTTP server closed");
}
}
catch (InterruptedException ie) {
System.out.println("Thread interrupted, shutting down executor service of internal HTTP server");
executorService.shutdownNow();
Thread.currentThread().interrupt();
}
});
}
private static void startWatchDog() {
ThreadGroup threadGroup = Thread.currentThread().getThreadGroup();
while (threadGroup.getParent() != null)
threadGroup = threadGroup.getParent();
final ThreadGroup topLevelThreadGroup = threadGroup;
// Plus 1, because of the monitoring thread we are going to start right below
final int activeCount = topLevelThreadGroup.activeCount() + 1;
new Thread(() -> {
do {
try {
Thread.sleep(1000);
}
catch (InterruptedException ignored) {}
} while (topLevelThreadGroup.activeCount() > activeCount);
System.exit(0);
}).start();
}
public static void main(String[] args) throws IOException {
premain(null, ByteBuddyAgent.install());
Random random = new Random();
for (int i = 0; i < 5; i++) {
new Thread(() -> {
int threadDurationSeconds = 1 + random.nextInt(10);
System.out.println("Starting thread with duration " + threadDurationSeconds + " s");
try {
Thread.sleep(threadDurationSeconds * 1000);
System.out.println("Finishing thread after " + threadDurationSeconds + " s");
}
catch (InterruptedException ignored) {}
}).start();
}
}
}
As you can see, this is basically your example code, refactored into a few helper methods for readability, plus the new watchdog method. It is quite straightforward.
This produces a console log like:
Starting thread with duration 6 s
Starting thread with duration 6 s
Starting thread with duration 8 s
Starting thread with duration 7 s
Starting thread with duration 5 s
Finishing thread after 5 s
Finishing thread after 6 s
Finishing thread after 6 s
Finishing thread after 7 s
Finishing thread after 8 s
Internal HTTP server has been stopped
Executor service of internal HTTP server closed
I have a requirement to process messages from Kafka without losing any message and also to maintain the message order. Therefore, I used transactions and enabled the 'exactly_once' processing guarantee in my Kafka Streams topology, assuming that topology processing is 'all or nothing', i.e. the message offset is committed only after the last node has successfully processed the message.
However, in a failure scenario, for example when the database is down, the processor fails to store the message and throws an exception. At this point the topology dies as intended and is recreated automatically on rebalance. I assumed that the topology would either re-consume the original message from the Kafka topic, or re-consume it on application restart. However, it seems that the original message disappears and is never consumed or processed after the topology died.
What do I need to do to reprocess the original message sent to the Kafka topic? Which Kafka configuration needs to change? Do I need to manually assign a state store and keep track of processed messages in a changelog topic?
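For reference, the exactly-once guarantee expressed in code-based Streams configuration would look roughly like this (a sketch only; the settings actually used here come from the YAML config shown further below):
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

class ExactlyOnceConfigSketch {
    static Properties streamsProperties() {
        // processing.guarantee=exactly_once is what the YAML "config" block below sets.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eventsTopology");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093");
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        return props;
    }
}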
Topology:
@Singleton
public class EventTopology extends Topology {
private final Deserializer<String> deserializer = Serdes.String().deserializer();
private final Serializer<String> serializer = Serdes.String().serializer();
private final EventLogMessageSerializer eventLogMessageSerializer;
private final EventLogMessageDeserializer eventLogMessageDeserializer;
private final EventLogProcessorSupplier eventLogProcessorSupplier;
@Inject
public EventTopology(EventsConfig eventsConfig,
EventLogMessageSerializer eventLogMessageSerializer,
EventLogMessageDeserializer eventLogMessageDeserializer,
EventLogProcessorSupplier eventLogProcessorSupplier) {
this.eventLogMessageSerializer = eventLogMessageSerializer;
this.eventLogMessageDeserializer = eventLogMessageDeserializer;
this.eventLogProcessorSupplier = eventLogProcessorSupplier;
init(eventsConfig);
}
private void init(EventsConfig eventsConfig) {
var topics = eventsConfig.getTopicConfig().getTopics();
String eventLog = topics.get("eventLog");
addSource("EventsLogSource", deserializer, eventLogMessageDeserializer, eventLog)
.addProcessor("EventLogProcessor", eventLogProcessorSupplier, "EventsLogSource");
}
}
Processor:
@Singleton
@Slf4j
public class EventLogProcessor implements Processor<String, EventLogMessage> {
private final EventLogService eventLogService;
private ProcessorContext context;
@Inject
public EventLogProcessor(EventLogService eventLogService) {
this.eventLogService = eventLogService;
}
@Override
public void init(ProcessorContext context) {
this.context = context;
}
@Override
public void process(String key, EventLogMessage value) {
log.info("Processing EventLogMessage={}", value);
try {
eventLogService.storeInDatabase(value);
context.commit();
} catch (Exception e) {
log.warn("Failed to process EventLogMessage={}", value, e);
throw e;
}
}
@Override
public void close() {
}
}
Configuration:
eventsConfig:
saveTopicsEnabled: false
topologyConfig:
environment: "LOCAL"
broker: "localhost:9093"
enabled: true
initialiseWaitInterval: 3 seconds
applicationId: "eventsTopology"
config:
auto.offset.reset: latest
session.timeout.ms: 6000
fetch.max.wait.ms: 7000
heartbeat.interval.ms: 5000
connections.max.idle.ms: 7000
security.protocol: SSL
key.serializer: org.apache.kafka.common.serialization.StringSerializer
value.serializer: org.apache.kafka.common.serialization.StringSerializer
max.poll.records: 5
processing.guarantee: exactly_once
metric.reporters: com.simple.metrics.kafka.DropwizardReporter
default.deserialization.exception.handler: org.apache.kafka.streams.errors.LogAndContinueExceptionHandler
enable.idempotence: true
request.timeout.ms: 8000
acks: all
batch.size: 16384
linger.ms: 1
enable.auto.commit: false
state.dir: "/tmp"
topicConfig:
topics:
eventLog: "EVENT-LOG-LOCAL"
kafkaTopicConfig:
partitions: 18
replicationFactor: 1
config:
retention.ms: 604800000
Test:
Feature: Feature covering the scenarios to process event log messages produced by external client.
Background:
Given event topology is healthy
Scenario: event log messages produced are successfully stored in the database
Given database is down
And the following event log messages are published
| deptId | userId | eventType | endDate | eventPayload_partner |
| dept-1 | user-1234 | CREATE | 2021-04-15T00:00:00Z | PARTNER-1 |
When database is up
And database is healthy
Then event log stored in the database as follows
| dept_id | user_id | event_type | end_date | event_payload |
| dept-1 | user-1234 | CREATE | 2021-04-15T00:00:00Z | {"partner":"PARTNER-1"} |
Logs:
INFO [data-plane-kafka-request-handler-1] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Preparing to rebalance group eventsTopology in state PreparingRebalance with old generation 0 (__consumer_offsets-0) (reason: Adding new member eventsTopology-57fdac0e-09fb-4aa0-8b0b-7e01809b31fa-StreamThread-1-consumer-96a3e980-4286-461e-8536-5f04ccb2c778 with group instance id None)
INFO [executor-Rebalance] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Stabilized group eventsTopology generation 1 (__consumer_offsets-0)
INFO [data-plane-kafka-request-handler-2] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Assignment received from leader for group eventsTopology for generation 1
INFO [data-plane-kafka-request-handler-1] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_0 with producerId 0 and producer epoch 0 on partition __transaction_state-4
INFO [data-plane-kafka-request-handler-6] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_1 with producerId 1 and producer epoch 0 on partition __transaction_state-3
...
INFO [data-plane-kafka-request-handler-0] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_16 with producerId 17 and producer epoch 0 on partition __transaction_state-37
INFO [data-plane-kafka-request-handler-4] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_1 with producerId 18 and producer epoch 0 on partition __transaction_state-42
INFO [data-plane-kafka-request-handler-6] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_0 with producerId 19 and producer epoch 0 on partition __transaction_state-43
...
INFO [data-plane-kafka-request-handler-3] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_17 with producerId 34 and producer epoch 0 on partition __transaction_state-45
INFO [data-plane-kafka-request-handler-5] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_16 with producerId 35 and producer epoch 0 on partition __transaction_state-46
INFO [pool-26-thread-1] ManagerClient - Manager request {uri:http://localhost:8081/healthcheck, method:GET, body:'', headers:{}}
INFO [pool-26-thread-1] ManagerClient - Manager response from with body {"Database":{"healthy":true},"eventsTopology":{"healthy":true}}
INFO [dw-admin-130] KafkaConnectionCheck - successfully connected to kafka broker: localhost:9093
INFO [kafka-producer-network-thread | EVENT-LOG-LOCAL-test-client-id] LocalTestEnvironment - Message: ProducerRecord(topic=EVENT-LOG-LOCAL, partition=null, headers=RecordHeaders(headers = [], isReadOnly = true), key=null, value={"endDate":1618444800000,"deptId":"dept-1","userId":"user-1234","eventType":"CREATE","eventPayload":{"previousEndDate":null,"partner":"PARTNER-1","info":null}}, timestamp=null) pushed onto topic: EVENT-LOG-LOCAL
INFO [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] EventLogProcessor - Processing EventLogMessage=EventLogMessage(endDate=Thu Apr 15 01:00:00 BST 2021, deptId=dept-1, userId=user-1234, eventType=CREATE, eventPayload=EventLogMessage.EventPayload(previousEndDate=null, partner=PARTNER-1, info=null))
WARN [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] EventLogProcessor - Failed to process EventLogMessage=EventLogMessage(endDate=Thu Apr 15 01:00:00 BST 2021, deptId=dept-1, userId=user-1234, eventType=CREATE, eventPayload=EventLogMessage.EventPayload(previousEndDate=null, partner=PARTNER-1, info=null))
exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
at manager.service.EventLogService.storeInDatabase(EventLogService.java:24)
at manager.topology.processor.EventLogProcessor.process(EventLogProcessor.java:47)
at manager.topology.processor.EventLogProcessor.process(EventLogProcessor.java:19)
at org.apache.kafka.streams.processor.internals.ProcessorNode.lambda$process$2(ProcessorNode.java:142)
at org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:836)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:142)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:236)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:216)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:168)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:96)
at org.apache.kafka.streams.processor.internals.StreamTask.lambda$process$1(StreamTask.java:679)
at org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:836)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:679)
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1033)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:690)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
ERROR [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] org.apache.kafka.streams.processor.internals.TaskManager - stream-thread [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] Failed to process stream task 0_8 due to the following error:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_8, processor=EventsLogSource, topic=EVENT-LOG-LOCAL, partition=8, offset=0, stacktrace=exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
ERROR [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] Encountered the following exception during processing and the thread is going to shut down:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_8, processor=EventsLogSource, topic=EVENT-LOG-LOCAL, partition=8, offset=0, stacktrace=exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
ERROR [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] org.apache.kafka.streams.KafkaStreams - stream-client [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3] All stream threads have died. The instance will be in error state and should be closed.
Exception: java.lang.IllegalStateException thrown from the UncaughtExceptionHandler in thread "eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1"
INFO [executor-Heartbeat] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Member eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1-consumer-f11ca299-2a68-4317-a559-dd1b96cd431f in group eventsTopology has failed, removing it from the group
INFO [executor-Heartbeat] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Preparing to rebalance group eventsTopology in state PreparingRebalance with old generation 1 (__consumer_offsets-0) (reason: removing member eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1-consumer-f11ca299-2a68-4317-a559-dd1b96cd431f on heartbeat expiration)
INFO [data-plane-kafka-request-handler-2] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Stabilized group eventsTopology generation 2 (__consumer_offsets-0)
INFO [data-plane-kafka-request-handler-6] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Assignment received from leader for group eventsTopology for generation 2
INFO [data-plane-kafka-request-handler-0] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_0 with producerId 0 and producer epoch 1 on partition __transaction_state-4
...
INFO [data-plane-kafka-request-handler-0] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_16 with producerId 35 and producer epoch 1 on partition __transaction_state-46
INFO [main] Cluster - New databse host localhost/127.0.0.1:59423 added
com.jayway.awaitility.core.ConditionTimeoutException: Condition defined as a lambda expression in steps.EventLogSteps
Expecting:
<0>
to be equal to:
<1>
but was not. within 20 seconds.
I am using CloseableHttpClient with a PoolingHttpClientConnectionManager to make POST requests to a single URL. The PoolingHttpClientConnectionManager is configured with a "max total" of 3 connections, a TTL of 5 seconds, and connection/socket timeouts of 5 seconds. Here is what I see (note that the requests are made sequentially, not concurrently):
POST request #1: 1 connection in the pool
POST request #2: 2 connections in the pool
POST request #3: 3 connections in the pool
POST request #4: last used connection is closed, a new connection is created
I'm not sure why the last used connection is closed and a new connection is established for the fourth request. Why doesn't the connection manager reuse one of the existing connections?
Here is what I see in the log:
2020-05-20 22:34:12 DEBUG o.a.h.i.c.PoolingHttpClientConnectionManager - Connection request: [route: {s}->https://xxxxx.com:443][total kept alive: 2; total allocated: 2 of 3]
2020-05-20 22:34:12 DEBUG o.a.h.i.c.LoggingManagedHttpClientConnection - http-outgoing-3: Close connection
I traced it down to the following code in HttpComponents' AbstractConnPool.java (httpcore-4.4.4.jar), in getPoolEntryBlocking():
int maxPerRoute = this.getMax(route);
int excess = Math.max(0, pool.getAllocatedCount() + 1 - maxPerRoute);
int totalUsed;
if (excess > 0) {
for(totalUsed = 0; totalUsed < excess; ++totalUsed) {
E lastUsed = pool.getLastUsed();
if (lastUsed == null) {
break;
}
lastUsed.close();
this.available.remove(lastUsed);
pool.remove(lastUsed);
}
}
Can someone explain why it needs to close the connection (lastUsed.close())?
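The maxPerRoute value in that excess calculation comes from the connection manager's per-route limit, which defaults to 2 in PoolingHttpClientConnectionManager. A sketch of configuring both limits explicitly (hypothetical values; whether this is the trigger in your setup also depends on the 5-second TTL):
import java.util.concurrent.TimeUnit;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

class PoolConfigSketch {
    static CloseableHttpClient newClient() {
        // 5 s TTL as in the question; raise the per-route limit to match maxTotal,
        // otherwise the excess loop above closes the least recently used connection
        // once allocated + 1 exceeds the per-route maximum for the single route.
        PoolingHttpClientConnectionManager cm =
                new PoolingHttpClientConnectionManager(5, TimeUnit.SECONDS);
        cm.setMaxTotal(3);
        cm.setDefaultMaxPerRoute(3);
        return HttpClients.custom().setConnectionManager(cm).build();
    }
}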
Dropping a table using the DataStax driver for Cassandra doesn't seem to be working. The create table works, but the drop table does not and does not throw an exception. 1) Am I doing the drop correctly? 2) Has anyone else seen this behavior?
In the output you can see the table gets created and apparently dropped, as it does not appear in the second table listing of the first run. However, when I reconnect (second run) the table is still there, resulting in an exception.
import java.util.Collection;
import com.datastax.driver.core.*;
public class Fail {
SimpleStatement createTableCQL = new SimpleStatement("create table test_table(testfield varchar primary key)");
SimpleStatement dropTableCQL = new SimpleStatement("drop table test_table");
Session session = null;
Cluster cluster = null;
public Fail()
{
System.out.println("First Run");
this.run();
System.out.println("Second Run");
this.run();
}
private void run()
{
try
{
cluster = Cluster.builder().addContactPoints("10.48.8.43 10.48.8.47 10.48.8.53")
.withCredentials("394016","394016")
.withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.ALL))
.build();
session = cluster.connect("gid394016");
}
catch(Exception e)
{
System.err.println(e.toString());
System.exit(1);
}
//create the table
System.out.println("createTableCQL");
this.session.execute(createTableCQL);
//list tables in the keyspace
System.out.println("Table list:");
Collection<TableMetadata> results1 = cluster.getMetadata().getKeyspace("gid394016").getTables();
for (TableMetadata tm : results1)
{
System.out.println(tm.toString());
}
//drop the table
System.out.println("dropTableCQL");
this.session.execute(dropTableCQL);
//list tables in the keyspace
System.out.println("Table list:");
Collection<TableMetadata> results2 = cluster.getMetadata().getKeyspace("gid394016").getTables();
for (TableMetadata tm : results2)
{
System.out.println(tm.toString());
}
session.close();
cluster.close();
}
public static void main(String[] args) {
new Fail();
}
}
Console output:
First Run
[main] INFO com.datastax.driver.core.NettyUtil - Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
[main] INFO com.datastax.driver.core.policies.DCAwareRoundRobinPolicy - Using data-center name 'Cassandra' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
[main] INFO com.datastax.driver.core.Cluster - New Cassandra host /10.48.8.51:9042 added
[main] INFO com.datastax.driver.core.Cluster - New Cassandra host /10.48.8.47:9042 added
[main] INFO com.datastax.driver.core.Cluster - New Cassandra host /10.48.8.53:9042 added
[main] INFO com.datastax.driver.core.Cluster - New Cassandra host /10.48.8.49:9042 added
[main] INFO com.datastax.driver.core.Cluster - New Cassandra host 10.48.8.43 10.48.8.47 10.48.8.53/10.48.8.43:9042 added
createTableCQL
Table list:
CREATE TABLE gid394016.test_table (testfield text, PRIMARY KEY (testfield)) WITH read_repair_chance = 0.0 AND dclocal_read_repair_chance = 0.1 AND gc_grace_seconds = 864000 AND bloom_filter_fp_chance = 0.01 AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' } AND comment = '' AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' } AND compression = { 'sstable_compression' : 'org.apache.cassandra.io.compress.LZ4Compressor' } AND default_time_to_live = 0 AND speculative_retry = '99.0PERCENTILE' AND min_index_interval = 128 AND max_index_interval = 2048;
dropTableCQL
Table list:
Second Run
[main] INFO com.datastax.driver.core.policies.DCAwareRoundRobinPolicy - Using data-center name 'Cassandra' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
[main] INFO com.datastax.driver.core.Cluster - New Cassandra host /10.48.8.51:9042 added
[main] INFO com.datastax.driver.core.Cluster - New Cassandra host /10.48.8.47:9042 added
[main] INFO com.datastax.driver.core.Cluster - New Cassandra host /10.48.8.53:9042 added
[main] INFO com.datastax.driver.core.Cluster - New Cassandra host /10.48.8.49:9042 added
[main] INFO com.datastax.driver.core.Cluster - New Cassandra host 10.48.8.43 10.48.8.47 10.48.8.53/10.48.8.43:9042 added
createTableCQL
Exception in thread "main" com.datastax.driver.core.exceptions.AlreadyExistsException: Table gid394016.test_table already exists
at com.datastax.driver.core.exceptions.AlreadyExistsException.copy(AlreadyExistsException.java:111)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:217)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:54)
at com.bdcauto.cassandrachecks.Fail.run(Fail.java:38)
at com.bdcauto.cassandrachecks.Fail.<init>(Fail.java:17)
at com.bdcauto.cassandrachecks.Fail.main(Fail.java:65)
Caused by: com.datastax.driver.core.exceptions.AlreadyExistsException: Table gid394016.test_table already exists
at com.datastax.driver.core.exceptions.AlreadyExistsException.copy(AlreadyExistsException.java:130)
at com.datastax.driver.core.Responses$Error.asException(Responses.java:118)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:151)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:175)
at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:44)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:801)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:617)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1014)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:937)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:276)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:263)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.AlreadyExistsException: Table gid394016.test_table already exists
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:69)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37)
at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:230)
at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:221)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
... 14 more
You are running this code with the table already present in the database, and that's why you are getting the "already exists" error. Please connect to the database using cqlsh and check that yourself.
Create, alter and drop table statements are propagated throughout the cluster asynchronously. Even though you receive a response from the coordinator, you still need to wait for schema agreement.
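One way to do that with the 3.x driver is to check schema agreement after the DDL call before trusting the metadata or reconnecting (a sketch; it uses the same com.datastax.driver.core types as the code above):
// Sketch: run a DDL statement, then wait until all hosts report the same
// schema version before reading cluster.getMetadata() or reconnecting.
static void executeDdlAndWaitForAgreement(Session session, Cluster cluster, Statement ddl)
        throws InterruptedException {
    ResultSet rs = session.execute(ddl);
    if (!rs.getExecutionInfo().isSchemaInAgreement()) {
        while (!cluster.getMetadata().checkSchemaAgreement()) {
            Thread.sleep(200);
        }
    }
}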
I want to execute two tasks on scheduled time (23:59 CET and 08:00 CET). I have created an EJB singleton bean that maintains those methods:
@Singleton
public class OfferManager {
#Schedule(hour = "23", minute = "59", timezone = "CET")
#AccessTimeout(value = 0) // concurrent access is not permitted
public void fetchNewOffers() {
Logger.getLogger(OfferManager.class.getName()).log(Level.INFO, "Fetching new offers started");
// ...
Logger.getLogger(OfferManager.class.getName()).log(Level.INFO, "Fetching new offers finished");
}
#Schedule(hour="8", minute = "0", timezone = "CET")
public void sendMailsWithReports() {
Logger.getLogger(OfferManager.class.getName()).log(Level.INFO, "Generating reports started");
// ...
Logger.getLogger(OfferManager.class.getName()).log(Level.INFO, "Generating reports finished");
}
}
The problem is that both tasks are executed twice. The server is WildFly Beta1, configured in UTC time.
Here are some server logs that might be useful:
2013-10-20 11:15:17,684 INFO [org.jboss.as.server] (XNIO-1 task-7) JBAS018559: Deployed "crawler-0.3.war" (runtime-name : "crawler-0.3.war")
2013-10-20 21:59:00,070 INFO [com.indeed.control.OfferManager] (EJB default - 1) Fetching new offers started
....
2013-10-20 22:03:48,608 INFO [com.indeed.control.OfferManager] (EJB default - 1) Fetching new offers finished
2013-10-20 23:59:00,009 INFO [com.indeed.control.OfferManager] (EJB default - 2) Fetching new offers started
....
2013-10-20 23:59:22,279 INFO [com.indeed.control.OfferManager] (EJB default - 2) Fetching new offers finished
What might be the cause of such behaviour?
I solved the problem by specifying the scheduled time in server time (UTC).
So
#Schedule(hour = "23", minute = "59", timezone = "CET")
was replaced with:
#Schedule(hour = "21", minute = "59")
I don't know the cause of this behaviour; maybe the early release of WildFly is the issue.
I had the same problem with TomEE Plume 7.0.4. In my case the solution was to change @Singleton to @Stateless.
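A minimal sketch of that change applied to the bean from the question (method bodies assumed unchanged; whether it fixes the duplicate runs depends on the container, as noted above):
import javax.ejb.Schedule;
import javax.ejb.Stateless;

// Same schedules as the original bean, but declared as a stateless session bean,
// which is the change that stopped the duplicate executions on TomEE Plume here.
@Stateless
public class OfferManager {

    @Schedule(hour = "23", minute = "59", timezone = "CET")
    public void fetchNewOffers() {
        // ... same body as in the question ...
    }

    @Schedule(hour = "8", minute = "0", timezone = "CET")
    public void sendMailsWithReports() {
        // ... same body as in the question ...
    }
}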