Customize the automatic reconnection settings to IBM MQ

I have written code to connect to IBM MQ, and I am using ConnectionNameList, which automatically reconnects to IBM MQ.
I want to customize this reconnection, which currently happens implicitly. I have read many articles online, but I have not been able to figure it out.
This is my queue manager config:
@Configuration
public class QM1Config {
    public String queueManager;
    public String queue;
    public String channel;
    public String connName;
    public String user;
    public String password;
    private static final int RECONNECT_TIMEOUT = 10;
    @Autowired
    MQService config;
    @Bean
    public MQConnectionFactory mqQueueConnectionFactory() {
        this.channel = config.getHosts().get(0).getChannel();
        this.user = config.getHosts().get(0).getUser();
        this.password = config.getHosts().get(0).getPassword();
        this.queueManager = config.getHosts().get(0).getQueueManager();
        this.queue = config.getHosts().get(0).getQueue();
        this.connName = config.getHosts().get(0).getConnName();
        System.out.println(channel+" "+connName+" "+queueManager+" "+user);
        MQConnectionFactory mqQueueConnectionFactory = new MQConnectionFactory();
        try {
            mqQueueConnectionFactory.setTransportType(WMQConstants.WMQ_CM_CLIENT);
            mqQueueConnectionFactory.setBooleanProperty(WMQConstants.USER_AUTHENTICATION_MQCSP, false);
            mqQueueConnectionFactory.setCCSID(1208);
            mqQueueConnectionFactory.setChannel(channel);
            mqQueueConnectionFactory.setStringProperty(WMQConstants.USERID, user);
            mqQueueConnectionFactory.setStringProperty(WMQConstants.PASSWORD, password);
            mqQueueConnectionFactory.setQueueManager(queueManager);
            mqQueueConnectionFactory.setConnectionNameList(connName);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return mqQueueConnectionFactory;
    }
    @Bean
    public JmsListenerContainerFactory<?> qm1JmsListenerContainerFactory(
            @Qualifier("mqQueueConnectionFactory") MQConnectionFactory mqQueueConnectionFactory,
            DefaultJmsListenerContainerFactoryConfigurer configurer) throws InterruptedException {
        DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
        this.queue = config.getHosts().get(0).getQueue();
        configurer.configure(factory, mqQueueConnectionFactory);
        return factory;
    }
    @Bean("jmsTemplate1")
    public JmsTemplate jmsTemplate(@Qualifier("mqQueueConnectionFactory") MQConnectionFactory mqQueueConnectionFactory) {
        JmsTemplate jmsTemplate1 = new JmsTemplate(mqQueueConnectionFactory);
        return jmsTemplate1;
    }
}
When I stop the queue manager, I get the following exceptions every 5 seconds:
2022-04-24 01:17:43.194 WARN 6644 --- [enerContainer-1] o.s.j.l.DefaultMessageListenerContainer : Setup of JMS message listener invoker failed for destination 'Q5' - trying to recover. Cause: JMSWMQ2002: Failed to get a message from destination 'Q5'.; nested exception is com.ibm.mq.MQException: JMSCMQ0001: IBM MQ call failed with compcode '2' ('MQCC_FAILED') reason '2009' ('MQRC_CONNECTION_BROKEN').
2022-04-24 01:17:43.232 ERROR 6644 --- [enerContainer-1] o.s.j.l.DefaultMessageListenerContainer : Could not refresh JMS Connection for destination 'Q5' - retrying using FixedBackOff{interval=5000, currentAttempts=0, maxAttempts=unlimited}. Cause: JMSWMQ0018: Failed to connect to queue manager 'QM5' with connection mode 'Client' and host name 'Client'.; nested exception is com.ibm.mq.MQException: JMSCMQ0001: IBM MQ call failed with compcode '2' ('MQCC_FAILED') reason '2059' ('MQRC_Q_MGR_NOT_AVAILABLE').
2022-04-24 01:17:48.243 ERROR 6644 --- [enerContainer-1] o.s.j.l.DefaultMessageListenerContainer : Could not refresh JMS Connection for destination 'Q5' - retrying using FixedBackOff{interval=5000, currentAttempts=1, maxAttempts=unlimited}. Cause: JMSWMQ0018: Failed to connect to queue manager 'QM5' with connection mode 'Client' and host name 'Client'.; nested exception is com.ibm.mq.MQException: JMSCMQ0001: IBM MQ call failed with compcode '2' ('MQCC_FAILED') reason '2538' ('MQRC_HOST_NOT_AVAILABLE').
2022-04-24 01:17:53.245 ERROR 6644 --- [enerContainer-1] o.s.j.l.DefaultMessageListenerContainer : Could not refresh JMS Connection for destination 'Q5' - retrying using FixedBackOff{interval=5000, currentAttempts=2, maxAttempts=unlimited}. Cause: JMSWMQ0018: Failed to connect to queue manager 'QM5' with connection mode 'Client' and host name 'Client'.; nested exception is com.ibm.mq.MQException: JMSCMQ0001: IBM MQ call failed with compcode '2' ('MQCC_FAILED') reason '2538' ('MQRC_HOST_NOT_AVAILABLE').
2022-04-24 01:17:58.250 ERROR 6644 --- [enerContainer-1] o.s.j.l.DefaultMessageListenerContainer : Could not refresh JMS Connection for destination 'Q5' - retrying using FixedBackOff{interval=5000, currentAttempts=3, maxAttempts=unlimited}. Cause: JMSWMQ0018: Failed to connect to queue manager 'QM5' with connection mode 'Client' and host name 'Client'.; nested exception is com.ibm.mq.MQException: JMSCMQ0001: IBM MQ call failed with compcode '2' ('MQCC_FAILED') reason '2538' ('MQRC_HOST_NOT_AVAILABLE').
So I want the first 3 reconnection attempts to be logged as WARN messages instead of ERROR messages as shown in the logs above, and from the 4th attempt onwards I want them logged as ERROR messages. I also want a reconnection attempt every 10/15 seconds.
How do I configure these reconnection settings?
Any help would be greatly appreciated! Thanks!
EDIT: I have added an exception listener as follows:
public class MQExceptionListener implements ExceptionListener {
    private static final Logger LOGGER = LoggerFactory.getLogger(MQExceptionListener.class);
    int count = -1;
    @Override
    public void onException(JMSException ex) {
        if(count > 2) {
            System.out.println("COUNT - "+ count);
            count++;
            LOGGER.error("***********************************************");
            LOGGER.error(ex.toString()+" THIS IS EX TO STRING");
            if (ex.getLinkedException() != null) {
                LOGGER.error(ex.getLinkedException().toString()+" THIS IS getLinkedException TO STRING");
            }
            LOGGER.error("================================================");
        }else {
            System.out.println("COUNT - "+ count);
            count++;
            LOGGER.warn("***********************************************");
            LOGGER.warn(ex.toString()+" THIS IS EX TO STRING");
            if (ex.getLinkedException() != null) {
                LOGGER.warn(ex.getLinkedException().toString()+" THIS IS getLinkedException TO STRING");
            }
            LOGGER.warn("================================================");
        }
    }
}
Now my logs are as follows:
COUNT - 1
2022-04-24 14:41:04.905 WARN 9268 --- [enerContainer-1] com.mq.sslMQ.MQExceptionListener : ***********************************************
2022-04-24 14:41:04.905 WARN 9268 --- [enerContainer-1] com.mq.sslMQ.MQExceptionListener : com.ibm.msg.client.jms.DetailedIllegalStateException: JMSWMQ0018: Failed to connect to queue manager 'QM5' with connection mode 'Client' and host name 'Client'.
Check the queue manager is started and if running in client mode, check there is a listener running. Please see the linked exception for more information. THIS IS EX TO STRING
2022-04-24 14:41:04.905 WARN 9268 --- [enerContainer-1] com.mq.sslMQ.MQExceptionListener : com.ibm.mq.MQException: JMSCMQ0001: IBM MQ call failed with compcode '2' ('MQCC_FAILED') reason '2538' ('MQRC_HOST_NOT_AVAILABLE'). THIS IS getLinkedException TO STRING
2022-04-24 14:41:04.905 WARN 9268 --- [enerContainer-1] com.mq.sslMQ.MQExceptionListener : ================================================
2022-04-24 14:41:04.905 ERROR 9268 --- [enerContainer-1] o.s.j.l.DefaultMessageListenerContainer : Could not refresh JMS Connection for destination 'Q5' - retrying using FixedBackOff{interval=5000, currentAttempts=1, maxAttempts=unlimited}. Cause: JMSWMQ0018: Failed to connect to queue manager 'QM5' with connection mode 'Client' and host name 'Client'.; nested exception is com.ibm.mq.MQException: JMSCMQ0001: IBM MQ call failed with compcode '2' ('MQCC_FAILED') reason '2538' ('MQRC_HOST_NOT_AVAILABLE').
I don't want the DefaultMessageListenerContainer log message to be printed to the console. How do we achieve that?

The IBM docs say:
By default, the reconnection attempts happen at the following intervals:
The first attempt is made after an initial delay of 1 second, plus a random element up to 250 milliseconds.
The second attempt is made 2 seconds, plus a random interval of up to 500 milliseconds, after the first attempt fails.
The third attempt is made 4 seconds, plus a random interval of up to 1 second, after the second attempt fails.
The fourth attempt is made 8 seconds, plus a random interval of up to 2 seconds, after the third attempt fails.
The fifth attempt is made 16 seconds, plus a random interval of up to 4 seconds, after the fourth attempt fails.
The sixth attempt, and all subsequent attempts are made 25 seconds, plus a random interval of up to 6 seconds and 250 milliseconds after the previous attempt fails.
The reconnection attempts are delayed by intervals that are partly fixed and partly random. This is to prevent all of the IBM MQ classes for JMS applications that were connected to a queue manager that is no longer available from reconnecting simultaneously.
If you need to increase the default values, to more accurately reflect the amount of time that is required for a queue manager to recover, or a standby queue manager to become active, modify the ReconDelay attribute in the Channel stanza of the client configuration file.
You can read more about this attribute here.
It sounds like you need to put the following into your mqclient.ini file:
CHANNELS:
ReconDelay=(10000,5000)
That is requesting a delay of 10 seconds plus a random element up to 5 seconds, which is my interpretation of your request for 10/15 seconds. You haven't asked for any of the reconnection attempts to be different in timing, although you can do this if you need to.
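To make the effect of a ReconDelay setting more concrete, here is a minimal plain-Java sketch that models a schedule as a list of (fixed, random) pairs in milliseconds, with the last pair repeating for all later attempts. The class ReconDelaySchedule and its members are hypothetical names made up for this illustration; this is not how the IBM MQ client is implemented, only a model of the intervals quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical model of a ReconDelay-style schedule; not part of the IBM MQ API. */
final class ReconDelaySchedule {

    /** One (fixed delay, maximum random extra) pair, both in milliseconds. */
    static final class Step {
        final int fixedMillis;
        final int randomMillis;

        Step(int fixedMillis, int randomMillis) {
            if (fixedMillis < 0 || randomMillis < 0) {
                throw new IllegalArgumentException("delays must not be negative");
            }
            this.fixedMillis = fixedMillis;
            this.randomMillis = randomMillis;
        }
    }

    /** Default intervals as quoted from the IBM documentation above. */
    static final List<Step> DEFAULT = Collections.unmodifiableList(Arrays.asList(
            new Step(1_000, 250), new Step(2_000, 500), new Step(4_000, 1_000),
            new Step(8_000, 2_000), new Step(16_000, 4_000), new Step(25_000, 6_250)));

    private final List<Step> steps;
    private final Random random;

    ReconDelaySchedule(List<Step> steps, Random random) {
        if (steps.isEmpty()) {
            throw new IllegalArgumentException("at least one step is required");
        }
        this.steps = new ArrayList<>(steps);
        this.random = random;
    }

    /** Delay before the given attempt (1-based); the last step repeats for later attempts. */
    long delayMillis(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt is 1-based");
        }
        Step step = steps.get(Math.min(attempt, steps.size()) - 1);
        // nextInt(bound) excludes the bound, so add 1 to allow the full random range.
        return (long) step.fixedMillis + random.nextInt(step.randomMillis + 1);
    }

    public static void main(String[] args) {
        // Models ReconDelay=(10000,5000): every attempt waits between 10 and 15 seconds.
        ReconDelaySchedule custom = new ReconDelaySchedule(
                Collections.singletonList(new Step(10_000, 5_000)), new Random());
        for (int attempt = 1; attempt <= 3; attempt++) {
            System.out.println("attempt " + attempt + ": wait " + custom.delayMillis(attempt) + " ms");
        }
    }
}
```

With a single (10000,5000) pair, every modelled delay falls between 10 and 15 seconds, which matches the interpretation of the request given above.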
Note, it is not possible to change the WARN/ERROR status of the messages.
Remember that you can always turn off automatic reconnect and implement whatever you need yourself by catching the connection failures in your application. Automatic reconnect was designed for applications that were unable (or unwilling) to catch the connection failures.

I would add that the reconnect attempts at 5-second intervals are the DefaultMessageListenerContainer trying to reconnect. Its default recovery interval is 5 seconds (DEFAULT_RECOVERY_INTERVAL), so I don't think this involves the MQ reconnect mechanism.
With the exception handler listed above in place, you could programmatically call setRecoveryInterval() on the DefaultMessageListenerContainer, or use setBackOff() to control the backoff interval.
As to disabling the logging, setting the log level for the DefaultMessageListenerContainer logger to FATAL should do it. If the logging backend is Logback, which has no FATAL level, OFF may be needed instead.
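As a rough illustration of the "first three attempts as WARN, later ones as ERROR" behaviour the question asks for, the following plain-Java sketch keeps a failure counter and derives a severity from it. RecoveryAttemptTracker and its methods are hypothetical names, not part of Spring or IBM MQ. In a real application the counter could be driven from the ExceptionListener shown in the question and reset once a connection is re-established, and the interval could be handed to setRecoveryInterval().

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: counts failed recovery attempts and picks a log severity. */
final class RecoveryAttemptTracker {

    enum Severity { WARN, ERROR }

    private final int warnAttempts;
    private final long intervalMillis;
    // AtomicInteger because listener callbacks may arrive on different threads.
    private final AtomicInteger failures = new AtomicInteger();

    RecoveryAttemptTracker(int warnAttempts, long intervalMillis) {
        if (warnAttempts < 0 || intervalMillis < 0) {
            throw new IllegalArgumentException("arguments must not be negative");
        }
        this.warnAttempts = warnAttempts;
        this.intervalMillis = intervalMillis;
    }

    /** Records one failed attempt and returns the severity it should be logged with. */
    Severity recordFailure() {
        int attempt = failures.incrementAndGet();
        return attempt <= warnAttempts ? Severity.WARN : Severity.ERROR;
    }

    /** Call after a successful reconnect so the next outage starts at WARN again. */
    void reset() {
        failures.set(0);
    }

    int failureCount() {
        return failures.get();
    }

    /** Interval between attempts, e.g. the value to pass to setRecoveryInterval(). */
    long intervalMillis() {
        return intervalMillis;
    }

    public static void main(String[] args) {
        RecoveryAttemptTracker tracker = new RecoveryAttemptTracker(3, 10_000L);
        for (int attempt = 1; attempt <= 5; attempt++) {
            System.out.println("attempt " + attempt + " -> " + tracker.recordFailure());
        }
    }
}
```

With warnAttempts set to 3, the first three failures map to WARN and the fourth onwards to ERROR. Starting the counter at zero and incrementing before the comparison keeps the threshold easy to reason about.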

Related

Problem at creating EMS application supporting Failover/FaultTolerance

I am starting to study how I can implement an application supporting failover/fault tolerance on top of JMS, more precisely EMS.
I configured two EMS servers, both with fault tolerance enabled:
For the EMS instance running on server1, I have
in tibemsd.conf
ft_active = tcp://server2:7232
in factories.conf
[GenericConnectionFactory]
type = generic
url = tcp://server1:7232
[FTTopicConnectionFactory]
type = topic
url = tcp://server1:7232,tcp://server2:7232
[FTQueueConnectionFactory]
type = queue
url = tcp://server1:7232,tcp://server2:7232
And for the EMS instance running on server2, I have
in tibemsd.conf
ft_active = tcp://server1:7232
in factories.conf
[GenericConnectionFactory]
type = generic
url = tcp://server2:7232
[FTTopicConnectionFactory]
type = topic
url = tcp://server2:7232,tcp://server1:7232
[FTQueueConnectionFactory]
type = queue
url = tcp://server2:7232,tcp://server1:7232
I am not a TIBCO EMS expert, but my config seems to be fine. When I start EMS on server1, I get:
$ tibemsd -config tibemsd.conf
...
2022-07-20 23:04:58.566 Server is active.
2022-07-20 23:05:18.563 Standby server 'SERVERNAME@server1' has connected.
then if I start EMS on server2, I get
$ tibemsd -config tibemsd.conf
...
2022-07-20 23:05:18.564 Accepting connections on tcp://server2:7232.
2022-07-20 23:05:18.564 Server is in standby state for 'tcp://server1:7232'
Moreover, if I kill active EMS on server1, I immediately get the following message on server2:
2022-07-20 23:21:52.891 Connection to active server 'tcp://server1:7232' has been lost.
2022-07-20 23:21:52.891 Server activating on failure of 'tcp://server1:7232'.
...
2022-07-20 23:21:52.924 Server is now active.
Up to here, everything looks OK: the active/standby EMS servers seem to be correctly configured.
Things get more complicated when I write a piece of code that is supposed to connect to these EMS servers and periodically publish messages. Let's try with the following code sample:
@Test
public void testEmsFailover() throws JMSException, InterruptedException {
    int NB = 1000;
    TibjmsConnectionFactory factory = new TibjmsConnectionFactory();
    factory.setServerUrl("tcp://server1:7232,tcp://server2:7232");
    Connection connection = factory.createConnection();
    Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
    connection.start();
    for (int i = 0; i < NB; i++) {
        LOG.info("sending message");
        Queue queue = session.createQueue(QUEUE__CLIENT_TO_FRONTDOOR__CONNECTION_REQUEST);
        MessageProducer producer = session.createProducer(queue);
        MapMessage mapMessage = session.createMapMessage();
        mapMessage.setStringProperty(PROPERTY__CLIENT_KIND, USER.toString());
        mapMessage.setStringProperty(PROPERTY__CLIENT_NAME, "name");
        producer.send(mapMessage);
        LOG.info("done!");
        Thread.sleep(1000);
    }
}
If I run this code while both the active and standby servers are up, everything looks good:
23:26:32.431 [main] INFO JmsEndpointTest - sending message
23:26:32.458 [main] INFO JmsEndpointTest - done!
23:26:33.458 [main] INFO JmsEndpointTest - sending message
23:26:33.482 [main] INFO JmsEndpointTest - done!
Now, if I kill the active EMS server, I would expect that the standby server would immediately become the active one, and that my code would continue to publish as if nothing had happened.
However, in my code I get the following error:
javax.jms.JMSException: Connection is closed
at com.tibco.tibjms.TibjmsxLink.sendRequest(TibjmsxLink.java:307)
at com.tibco.tibjms.TibjmsxLink.sendRequestMsg(TibjmsxLink.java:261)
at com.tibco.tibjms.TibjmsxSessionImp._createProducer(TibjmsxSessionImp.java:1004)
at com.tibco.tibjms.TibjmsxSessionImp.createProducer(TibjmsxSessionImp.java:4854)
at JmsEndpointTest.testEmsFailover(JmsEndpointTest.java:103)
...
and in the logs of the server (the previous standby server supposed to be now the active one) I get
2022-07-20 23:32:44.447 [anonymous@cersei]: connect failed: server not in active state
2022-07-20 23:33:02.969 Connection to active server 'tcp://server2:7232' has been lost.
2022-07-20 23:33:02.969 Server activating on failure of 'tcp://server2:7232'.
2022-07-20 23:33:02.969 Server rereading configuration.
2022-07-20 23:33:02.971 Recovering state, please wait.
2022-07-20 23:33:02.980 Recovered 46 messages.
2022-07-20 23:33:02.980 Server is now active.
2022-07-20 23:33:03.545 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:04.187 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:04.855 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:05.531 [anonymous@cersei]: reconnect failed: connection unknown for id=8
I would appreciate any help to enhance my code
Thank you

I think I found the origin of my problem.
According to the page Tibco-Ems Failover Issue, the error message
reconnect failed: connection unknown for id=8
means: "the store (ems db) wasn't shared between the active and the standby node, so when the active ems failed, the new active ems wasn't able to recover connections and messages."
I realized that it is painful to configure a shared store. To avoid it, I configured two tibemsd instances on the same host, following the page Step By Step How to Setup TIBCO EMS In Fault Tolerant Mode:
two tibemsd.conf configuration files
configure a different listen port in each file
configure ft_active with the URL of the other server
configure factories.conf
With this setup, I can rerun my test and it works as expected.

JUnit test case for Camel route for ActiveMQ

I have a camel route in MyRouteBuilder.java file which is consuming messages from ActiveMQ:
from("activemq:queue:myQueue" )
.process(consumeDroppedMessage)
.log(">>> I am here");
I wrote a test case for it like this:
@Override
public RouteBuilder createRouteBuilder() throws Exception {
    return new MyRouteBuilder();
}
@Test
void testMyTest() throws Exception {
    String queueInputMessage = "My Msg";
    template.sendBody("activemq:queue:myQueue", queueInputMessage);
    assertMockEndpointsSatisfied();
}
When I run the unit test case I get this strange error:
17:53:26.175 [main] DEBUG org.apache.camel.impl.engine.InternalRouteStartupManager - Route: route1 >>> Route[activemq://queue:null -> null]
17:53:26.175 [main] DEBUG org.apache.camel.impl.engine.InternalRouteStartupManager - Starting consumer (order: 1000) on route: route1
17:53:26.175 [main] DEBUG org.apache.camel.support.DefaultConsumer - Build consumer: Consumer[activemq://queue:null]
17:53:26.185 [main] DEBUG org.apache.camel.support.DefaultConsumer - Init consumer: Consumer[activemq://queue:null]
17:53:26.185 [main] DEBUG org.apache.camel.support.DefaultConsumer - Starting consumer: Consumer[activemq://queue:null]
17:53:26.213 [main] DEBUG org.apache.activemq.thread.TaskRunnerFactory - Initialized TaskRunnerFactory[ActiveMQ Task] using ExecutorService: java.util.concurrent.ThreadPoolExecutor@3fffff43[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
17:53:26.215 [main] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Reconnect was triggered but transport is not started yet. Wait for start to connect the transport.
17:53:26.334 [main] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Started unconnected
17:53:26.334 [main] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Waking up reconnect task
17:53:26.335 [ActiveMQ Task-1] DEBUG org.apache.activemq.transport.failover.FailoverTransport - urlList connectionList:[tcp://localhost:61616], from: [tcp://localhost:61616]
17:53:26.339 [main] DEBUG org.apache.camel.component.jms.DefaultJmsMessageListenerContainer - Established shared JMS Connection
17:53:26.340 [main] DEBUG org.apache.camel.component.jms.DefaultJmsMessageListenerContainer - Resumed paused task: org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker@58c34bb3
17:53:26.372 [ActiveMQ Task-1] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Attempting 0th connect to: tcp://localhost:61616
17:53:28.393 [ActiveMQ Task-1] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Connect fail to: tcp://localhost:61616, reason: {}
I am especially stumped to see these messages:
Route: route1 >>> Route[activemq://queue:null -> null]
and
urlList connectionList:[tcp://localhost:61616], from: [tcp://localhost:61616]
Why is the queue coming up as null though I have a proper queue name? Also why is the broker url tcp://localhost:61616?
I want this unit test to run properly in all environments, such as local, DIT, SIT and PROD, so I cannot afford for the broker URL to be tcp://localhost:61616.
Any ideas as to what I am doing wrong here and what I should be doing?
EDIT 1:
One of the issues that I am seeing is even before the test class is called, the MyRouteBuilder() inside createRouteBuilder() is invoked, leading to the issues that I see in the log.

The "activemq:queue:.." is telling Camel to use the auto-configure magic behind the scenes (which uses default url) and your use case is beyond that.
You need to configure a connection factory (ActiveMQConnectionFactory) and configure a camel-jms component to use that connection factory.
The connection factory allows you to specify url, userName, password, default connection settings and setup SSL.
A best practice is to externalize the url, userName, password and queue to a properties file so you can change them across environments (local, DIT, SIT, prod, etc.).
NOTE: Use org.apache.camel/camel-jms component, and not the org.apache.activemq/activemq-camel component. activemq-camel is deprecated and being removed in ActiveMQ 5.17.x.
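A minimal sketch of externalizing the broker settings to a properties file, using only java.util.Properties. The property keys (broker.url, broker.user, broker.password, broker.queue), the class BrokerSettings and the sample host name are assumptions made for this example, not names defined by Camel or ActiveMQ. The loaded values would then be passed to the connection factory and used to build the endpoint URI.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical holder for broker settings read from a per-environment properties file. */
final class BrokerSettings {
    final String url;
    final String user;
    final String password;
    final String queue;

    private BrokerSettings(String url, String user, String password, String queue) {
        this.url = url;
        this.user = user;
        this.password = password;
        this.queue = queue;
    }

    /** Loads settings from any Reader, e.g. a FileReader over an environment-specific file. */
    static BrokerSettings load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read broker settings", e);
        }
        return new BrokerSettings(
                require(props, "broker.url"),
                props.getProperty("broker.user", ""),      // credentials are optional here
                props.getProperty("broker.password", ""),
                require(props, "broker.queue"));
    }

    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required property: " + key);
        }
        return value.trim();
    }

    /** Endpoint URI in the same form as the route in the question. */
    String endpointUri() {
        return "activemq:queue:" + queue;
    }

    public static void main(String[] args) {
        // Inline sample content; a real setup would read a file chosen per environment.
        String sample = "broker.url=tcp://broker.example.test:61616\n"
                + "broker.user=app\n"
                + "broker.password=secret\n"
                + "broker.queue=myQueue\n";
        BrokerSettings settings = BrokerSettings.load(new StringReader(sample));
        System.out.println(settings.url + " -> " + settings.endpointUri());
    }
}
```

Failing fast on a missing url or queue makes a misconfigured environment visible at startup rather than silently falling back to a default such as tcp://localhost:61616.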

Instead of setting up an explicit ActiveMQ broker, I started using a VM broker.
@Override
protected RoutesBuilder createRouteBuilder() throws Exception {
    return new RouteBuilder() {
        @Override
        public void configure() {
            ConnectionFactory connectionFactory = new ActiveMQConnectionFactory("vm://localhost?broker.persistent=false");
            ActiveMQComponent activeMQComponent = new ActiveMQComponent();
            activeMQComponent.setConnectionFactory(connectionFactory);
            context.addComponent("activemq", activeMQComponent);
            from("activemq:queue:myQueue").to("mock:collector");
        }
    };
}
Also, I had mistaken Camel's JUnit support for traditional JUnit. We don't need to explicitly call the actual route builder class. Instead, after setting up my ActiveMQ component as above, I was able to write my test methods, mock the endpoints for my queue, send messages and assert on them. Camel is truly versatile, though it requires a lot of study.

KafkaStream EXACTLY_ONCE in stream application results in failed to re-balance if one broker is down

I have a Kafka streaming application with kafka-streams and kafka-clients both at 2.4.0,
with the following configs:
properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
properties.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
brokers = ip1:port1,ip2:port2,ip3:port3
topic partitions: 3
topic replication: 3
Scenario 1: I start only 2 brokers (the stream app still lists all three broker IPs in its broker setting), and when I start my stream app the following error occurs:
2020-02-13 13:28:19.711 WARN 18756 --- [-1-0_0-producer] org.apache.kafka.clients.NetworkClient : [Producer clientId=my-app1-a4c8867f-b914-49bb-bc58-203349700828-StreamThread-1-0_0-producer, transactionalId=my-app1-0_0] Connection to node -2 (/ip2:port2) could not be established. Broker may not be available.
and then, after 1 minute:
org.apache.kafka.streams.errors.StreamsException: stream-thread [my-app1-a4c8867f-b914-49bb-bc58-203349700828-StreamThread-1] Failed to rebalance.
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:852)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:743)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread [my-app1-a4c8867f-b914-49bb-bc58-203349700828-StreamThread-1] task [0_0] Failed to initialize task 0_0 due to timeout.
at org.apache.kafka.streams.processor.internals.StreamTask.initializeTransactions(StreamTask.java:966)
at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:254)
at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:176)
at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:355)
at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:313)
at org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.createTasks(StreamThread.java:298)
at org.apache.kafka.streams.processor.internals.TaskManager.addNewActiveTasks(TaskManager.java:160)
at org.apache.kafka.streams.processor.internals.TaskManager.createTasks(TaskManager.java:120)
at org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsAssigned(StreamsRebalanceListener.java:77)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:272)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:400)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:421)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:340)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:471)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1267)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1231)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1211)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:843)
... 3 common frames omitted
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId
I was testing high-availability scenarios. I think Kafka should still work, as the replicas are present on the two running brokers (I have checked using a Kafka GUI tool).
Scenario 2: Today I noticed that when I start only 2 brokers and give the app only the IPs of these two brokers (i.e. the stream app only has the IPs of the two working brokers), the following error occurs:
2020-02-16 16:18:24.818 INFO 5741 --- [-StreamThread-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=my-app-0a357371-525b-46cf-9fe1-34ee94fa4158-StreamThread-1-consumer, groupId=my-app] Group coordinator ip2:port2 (id: 2147483644 rack: null) is unavailable or invalid, will attempt rediscovery
2020-02-16 16:18:24.818 ERROR 5741 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [my-app-0a357371-525b-46cf-9fe1-34ee94fa4158-StreamThread-1] Encountered the following unexpected Kafka exception during processing, this usually indicate Streams internal errors:
org.apache.kafka.streams.errors.StreamsException: stream-thread [my-app-0a357371-525b-46cf-9fe1-34ee94fa4158-StreamThread-1] Failed to rebalance.
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:852)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:743)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread [my-app-0a357371-525b-46cf-9fe1-34ee94fa4158-StreamThread-1] task [0_0] Failed to initialize task 0_0 due to timeout.
at org.apache.kafka.streams.processor.internals.StreamTask.initializeTransactions(StreamTask.java:966)
at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:254)
at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:176)
at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:355)
at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:313)
at org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.createTasks(StreamThread.java:298)
at org.apache.kafka.streams.processor.internals.TaskManager.addNewActiveTasks(TaskManager.java:160)
at org.apache.kafka.streams.processor.internals.TaskManager.createTasks(TaskManager.java:120)
at org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsAssigned(StreamsRebalanceListener.java:77)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:272)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:400)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:421)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:340)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:471)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1267)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1231)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1211)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:843)
... 3 common frames omitted
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId
Note: This is not the case if I don't set EXACTLY_ONCE in the properties. Then it works as intended.
I tried increasing the retries and the maximum back-off (ms) settings, but that didn't help.
Can anyone explain what I am missing?
Logs of broker 2 when broker 1 is down:
[2020-02-17 02:29:00,302] INFO [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0] Retrying leaderEpoch request for partition __consumer_offsets-36 as the leader reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
The Kafka logs are filled with the above line.
Now, one major observation:
When I turn off broker 2 (i.e. broker 1 and broker 3 are running), my stream application runs fine.
My app shuts down only when broker 1 is down. I'm guessing some critical information that should be distributed across all brokers is only stored on broker 1.

Rabbit SimpleMessageListenerContainer won't shut down

Following on from this question, we have a scenario where Rabbit credentials become invalidated, and we need to call resetConnection() on our CachingConnectionFactory to pick up a fresh set of credentials.
We're doing this in a ShutdownSignalException handler, and it basically works. What doesn't work is that we also need to restart our listeners. We have a few of these:
@RabbitListener(
    id = ABC,
    bindings = @QueueBinding(value = @Queue(value="myQ", durable="true"),
        exchange = @Exchange(value="myExchange", durable="true"),
        key = "myKey"),
    containerFactory = "customQueueContainerFactory"
)
public void process(...) {
    ...
}
The impression given by this answer (also this) is that we just need to do:
@Autowired RabbitListenerEndpointRegistry registry;
@Autowired CachingConnectionFactory connectionFactory;
@Override
public void shutdownCompleted(ShutdownSignalException cause) {
    refreshRabbitMQCredentials();
}
public void refreshRabbitMQCredentials() {
    registry.stop(); // do this first
    // Fetch credentials, update username/pass
    connectionFactory.resetConnection(); // then this
    registry.start(); // finally restart
}
The problem is that having debugged my way through SimpleMessageListenerContainer, when the very first of these containers has doShutdown() called, Spring tries to cancel the BlockingQueueConsumer.
Because the underlying Channel still reports as being open - even though the RabbitMQ UI doesn't report any connections or channels being open - a Cancel event is sent to the broker inside ChannelN.basicCancel(), but the channel then blocks forever for a reply, and as a result container shutdown is completely blocked.
I've tried injecting a TaskExecutor (an Executors.newCachedThreadPool()) into the containers and calling shutdownNow() or interrupting them, but none of this affects the channel's blocking wait.
It looks like my only option to unblock the channel is to trigger an additional ShutdownSignalException during cancellation, but (a) I don't know how I can do that, and (b) it looks like I would have to initiate cancellation of all listeners in parallel before trying to shut down again.
// com.rabbitmq.client.impl.ChannelN
@Override
public void basicCancel(final String consumerTag) throws IOException
{
    // [snip]
    rpc(new Basic.Cancel(consumerTag, false), k);
    try {
        k.getReply(); // <== BLOCKS HERE
    } catch(ShutdownSignalException ex) {
        throw wrap(ex);
    }
    metricsCollector.basicCancel(this, consumerTag);
}
I'm not sure why this is proving so difficult. Is there a simpler way to force SimpleMessageListenerContainer shutdown?
Using Spring Rabbit 1.7.6; AMQP Client 4.0.3; Spring Boot 1.5.10-RELEASE
UPDATE
Some logs to demonstrate the theory that the message containers are restarting before connection refresh has completed, and that this might be why they don't reconnect:
ERROR o.s.a.r.c.CachingConnectionFactory - Channel shutdown: channel error; protocol method: #method<channel.close>(reply-code=403, reply-text=ACCESS_REFUSED - access to queue 'amq.gen-4-bqGxbLio9mu8Kc7MMexw' in vhost '/' refused for user 'cert-configserver-feb6e103-76a8-f5bf-3f23-1e8150812bc4', class-id=50, method-id=10)
INFO u.c.c.c.r.ReauthenticatingChannelListener - Channel shutdown: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=403, reply-text=ACCESS_REFUSED - access to queue 'amq.gen-4-bqGxbLio9mu8Kc7MMexw' in vhost '/' refused for user 'cert-configserver-feb6e103-76a8-f5bf-3f23-1e8150812bc4', class-id=50, method-id=10)
INFO u.c.c.c.r.ReauthenticatingChannelListener - Channel closed with reply code 403. Assuming credentials have been revoked and refreshing config server properties to get new credentials. Cause: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=403, reply-text=ACCESS_REFUSED - access to queue 'amq.gen-4-bqGxbLio9mu8Kc7MMexw' in vhost '/' refused for user 'cert-configserver-feb6e103-76a8-f5bf-3f23-1e8150812bc4', class-id=50, method-id=10)
WARN u.c.c.c.r.ReauthenticatingChannelListener - Shutdown signalled: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=403, reply-text=ACCESS_REFUSED - access to queue 'amq.gen-4-bqGxbLio9mu8Kc7MMexw' in vhost '/' refused for user 'cert-configserver-feb6e103-76a8-f5bf-3f23-1e8150812bc4', class-id=50, method-id=10)
INFO u.c.c.c.r.RabbitMQReauthenticator - Refreshing Rabbit credentials for XXXXXXXX
INFO o.s.c.c.c.ConfigServicePropertySourceLocator - Fetching config from server at: http://localhost:8888/configuration
INFO u.c.c.c.r.ReauthenticatingChannelListener - Got ListenerContainerConsumerFailedEvent: Consumer raised exception, attempting restart
INFO o.s.a.r.l.SimpleMessageListenerContainer - Restarting Consumer#2db55dec: tags=[{amq.ctag-ebAfSnXLbw_W1hlZ5ag7sQ=consumer.myQ}], channel=Cached Rabbit Channel: AMQChannel(amqp://cert-configserver-feb6e103-76a8-f5bf-3f23-1e8150812bc4#127.0.0.1:5672/,2), conn: Proxy#12de62aa Shared Rabbit Connection: SimpleConnection#56c95789 [delegate=amqp://cert-configserver-feb6e103-76a8-f5bf-3f23-1e8150812bc4#127.0.0.1:5672/, localPort= 50052], acknowledgeMode=AUTO local queue size=0
INFO o.s.c.c.c.ConfigServicePropertySourceLocator - Located environment: name=myApp, profiles=[default], label=null, version=null, state=null
INFO com.zaxxer.hikari.HikariDataSource - XXXXXXXX - Shutdown initiated...
INFO com.zaxxer.hikari.HikariDataSource - XXXXXXXX - Shutdown completed.
INFO u.c.c.c.r.RabbitMQReauthenticator - Refreshed username: 'cert-configserver-feb6e103-76a8-f5bf-3f23-1e8150812bc4' => 'cert-configserver-d7b54af2-0735-a9ed-7cc4-394803bf5e58'
INFO u.c.c.c.r.RabbitMQReauthenticator - CachingConnectionFactory reset, proceeding...
UPDATE 2:
This does seem to be a race condition of sorts. Having removed the container stop / starts, if I add a thread-only breakpoint to SimpleMessageListenerContainer.restart() to let the resetConnection() race past, and then release the breakpoint, then I can see things start to come back:
16:18:47,208 INFO u.c.c.c.r.RabbitMQReauthenticator - CachingConnectionFactory reset
// Get ready to release the SMLC.restart() breakpoint...
16:19:02,072 INFO o.s.a.r.c.CachingConnectionFactory - Attempting to connect to: rabbitmq.service.consul:5672
16:19:02,083 INFO o.s.a.r.c.CachingConnectionFactory - Created new connection: connectionFactory#7489bca4:1/SimpleConnection#68546c13 [delegate=amqp://cert-configserver-132a07c2-94f3-0099-4de1-f0b1a9875d5a#127.0.0.1:5672/, localPort= 33350]
16:19:02,086 INFO o.s.amqp.rabbit.core.RabbitAdmin - Auto-declaring a non-durable, auto-delete, or exclusive Queue ...
16:19:02,095 DEBUG u.c.c.c.r.ReauthenticatingChannelListener - Active connection check succeeded for channel AMQChannel(amqp://cert-configserver-132a07c2-94f3-0099-4de1-f0b1a9875d5a#127.0.0.1:5672/,1)
16:19:02,120 INFO o.s.amqp.rabbit.core.RabbitAdmin - Auto-declaring a non-durable, auto-delete, or exclusive Queue (springCloudBus...
That being the case I now have to work out either how to delay the container restarts until the refresh is done (i.e. my ShutdownSignalException handler completes), or make the refresh blocking somehow...
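One way to make the restarts wait for the refresh can be sketched in plain Java, independent of Spring AMQP: a gate that the ShutdownSignalException handler closes before fetching credentials and reopens afterwards, and that any restart path waits on. RefreshGate and its methods are hypothetical names, and how awaitRefreshed() would be hooked into the container restart is an assumption that is not shown here.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper: holds back restart attempts while a credential refresh is in flight. */
final class RefreshGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition refreshed = lock.newCondition();
    private boolean refreshing = false;

    /** Call at the start of the shutdown handler, before fetching new credentials. */
    void beginRefresh() {
        lock.lock();
        try {
            refreshing = true;
        } finally {
            lock.unlock();
        }
    }

    /** Call once the connection factory has the new credentials and has been reset. */
    void endRefresh() {
        lock.lock();
        try {
            refreshing = false;
            refreshed.signalAll(); // wake every waiting restart path
        } finally {
            lock.unlock();
        }
    }

    /**
     * Call from whatever restarts a consumer. Returns true if no refresh is pending
     * (or it finished in time); false if the timeout elapsed or the thread was interrupted.
     */
    boolean awaitRefreshed(long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (refreshing) {
                if (nanos <= 0L) {
                    return false;
                }
                nanos = refreshed.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

The timeout keeps a failed refresh from blocking restarts forever; the caller decides what to do when awaitRefreshed() returns false.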
UPDATE 3:
My overall problem, of which this was a symptom, was solved with: https://stackoverflow.com/a/49392990/954442

It's not at all clear why the channel would report as open; this works fine for me; it recovers after deleting user foo...
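import org.springframework.amqp.core.Message;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.amqp.rabbit.listener.RabbitListenerEndpointRegistry;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
@SpringBootApplication
public class So49323291Application {
    public static void main(String[] args) {
        SpringApplication.run(So49323291Application.class, args);
    }
    @Bean
    public ApplicationRunner runner(RabbitListenerEndpointRegistry registry, CachingConnectionFactory cf,
            RabbitTemplate template) {
        return args -> {
            // Start with the first user's credentials
            cf.setUsername("foo");
            cf.setPassword("bar");
            registry.start();
            doSends(template); // returns once sending fails (user foo deleted)
            registry.stop();
            cf.resetConnection();
            // Switch to the second user's credentials and restart
            cf.setUsername("baz");
            cf.setPassword("qux");
            registry.start();
            doSends(template);
        };
    }
    // Sends a message every 5 seconds until an exception is thrown
    public void doSends(RabbitTemplate template) {
        while (true) {
            try {
                template.convertAndSend("foo", "Hello");
                Thread.sleep(5_000);
            }
            catch (Exception e) {
                e.printStackTrace();
                break;
            }
        }
    }
    @RabbitListener(queues = "foo", autoStartup = "false")
    public void in(Message in) {
        System.out.println(in);
    }
}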
(Body:'Hello' MessageProperties [headers={}, contentType=text/plain, contentEncoding=UTF-8, contentLength=0, receivedDeliveryMode=PERSISTENT, priority=0, redelivered=false, receivedExchange=, receivedRoutingKey=foo, deliveryTag=4, consumerTag=amq.ctag-9zt3wUGYSJmoON3zw03wUw, consumerQueue=foo])
2018-03-16 11:24:01.451 ERROR 11867 --- [ 127.0.0.1:5672] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error; protocol method: #method(reply-code=320, reply-text=CONNECTION_FORCED - user 'foo' is deleted, class-id=0, method-id=0)
...
Caused by: com.rabbitmq.client.AuthenticationFailureException: ACCESS_REFUSED - Login was refused using authentication mechanism PLAIN. For details see the broker logfile.
2018-03-16 11:24:01.745 ERROR 11867 --- [cTaskExecutor-2] o.s.a.r.l.SimpleMessageListenerContainer : Stopping container from aborted consumer
2018-03-16 11:24:03.740 INFO 11867 --- [cTaskExecutor-3] o.s.a.r.c.CachingConnectionFactory : Created new connection: rabbitConnectionFactory#2c4d1ac:3/SimpleConnection#5e9c036b [delegate=amqp://baz#127.0.0.1:5672/, localPort= 59346]
(Body:'Hello' MessageProperties [headers={}, contentType=text/plain, contentEncoding=UTF-8, contentLength=0, receivedDeliveryMode=PERSISTENT, priority=0, redelivered=false, receivedExchange=, receivedRoutingKey=foo, deliveryTag=1, consumerTag=amq.ctag-ljnY00TBuvy5cCAkpD3r4A, consumerQueue=foo])
However, you really don't need to stop/start the registry; just reconfigure the connection factory with the new credentials and call resetConnection(), and the containers will recover.

Tibjms javax.jms.JMSException: Connection unknown by server

I am using the Tibjms jar for my JMS connection. It works fine in the normal case, but I have a problem when the connection to the JMS provider is lost and then comes back. To reproduce the issue, I performed the following steps:
Connect to the intranet and start the server. Works fine.
Disconnect from the intranet. The client starts trying to reconnect to the server. Fine.
Connect to the intranet again. It throws an unknown exception and never connects again. Problem.
So my problem is "javax.jms.JMSException: Connection unknown by server", which does not tell me much. It appears at the end of the following logs:
2017-10-13 15:40:52,333 [ http-nio-8080-exec-2] INFO org.springframework.web.servlet.DispatcherServlet - FrameworkServlet 'dispatcherServlet': initialization completed in 37 ms
2017-10-13 15:41:29,293 [k Reader (Server-3285015)] ERROR com.example.jms.PaxJmsClient - Exception received from jms
javax.jms.JMSException: Disconnected from ssl://10.10.10.10:5071, will attempt to reconnect
at com.tibco.tibjms.TibjmsConnection._invokeOnExceptionCallback(TibjmsConnection.java:2132)
at com.tibco.tibjms.TibjmsConnection._reconnect(TibjmsConnection.java:1912)
at com.tibco.tibjms.TibjmsConnection$ServerLinkEventHandler.onEventReconnect(TibjmsConnection.java:387)
at com.tibco.tibjms.TibjmsxLinkTcp._doReconnect(TibjmsxLinkTcp.java:598)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.work(TibjmsxLinkTcp.java:317)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.run(TibjmsxLinkTcp.java:259)
2017-10-13 15:42:29,334 [k Reader (Server-3285015)] ERROR com.example.jms.PaxJmsClient - Exception received from jms
javax.jms.JMSException: Reconnecting to ssl://11.11.11.11:5071, attempt 1 out of 100
at com.tibco.tibjms.TibjmsConnection._invokeOnExceptionCallback(TibjmsConnection.java:2132)
at com.tibco.tibjms.TibjmsConnection._reconnect(TibjmsConnection.java:1975)
at com.tibco.tibjms.TibjmsConnection$ServerLinkEventHandler.onEventReconnect(TibjmsConnection.java:387)
at com.tibco.tibjms.TibjmsxLinkTcp._doReconnect(TibjmsxLinkTcp.java:598)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.work(TibjmsxLinkTcp.java:317)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.run(TibjmsxLinkTcp.java:259)
2017-10-13 15:42:32,335 [k Reader (Server-3285015)] ERROR com.example.jms.PaxJmsClient - Exception received from jms
javax.jms.JMSException: Reconnecting to ssl://10.10.10.10:5071, attempt 1 out of 100
at com.tibco.tibjms.TibjmsConnection._invokeOnExceptionCallback(TibjmsConnection.java:2132)
at com.tibco.tibjms.TibjmsConnection._reconnect(TibjmsConnection.java:1975)
at com.tibco.tibjms.TibjmsConnection$ServerLinkEventHandler.onEventReconnect(TibjmsConnection.java:387)
at com.tibco.tibjms.TibjmsxLinkTcp._doReconnect(TibjmsxLinkTcp.java:598)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.work(TibjmsxLinkTcp.java:317)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.run(TibjmsxLinkTcp.java:259)
2017-10-13 15:43:35,358 [k Reader (Server-3285015)] ERROR com.example.jms.PaxJmsClient - Exception received from jms
javax.jms.JMSException: Reconnecting to ssl://11.11.11.11:5071, attempt 2 out of 100
at com.tibco.tibjms.TibjmsConnection._invokeOnExceptionCallback(TibjmsConnection.java:2132)
at com.tibco.tibjms.TibjmsConnection._reconnect(TibjmsConnection.java:1975)
at com.tibco.tibjms.TibjmsConnection$ServerLinkEventHandler.onEventReconnect(TibjmsConnection.java:387)
at com.tibco.tibjms.TibjmsxLinkTcp._doReconnect(TibjmsxLinkTcp.java:598)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.work(TibjmsxLinkTcp.java:317)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.run(TibjmsxLinkTcp.java:259)
2017-10-13 15:43:38,359 [k Reader (Server-3285015)] ERROR com.example.jms.PaxJmsClient - Exception received from jms
javax.jms.JMSException: Reconnecting to ssl://10.10.10.10:5071, attempt 2 out of 100
at com.tibco.tibjms.TibjmsConnection._invokeOnExceptionCallback(TibjmsConnection.java:2132)
at com.tibco.tibjms.TibjmsConnection._reconnect(TibjmsConnection.java:1975)
at com.tibco.tibjms.TibjmsConnection$ServerLinkEventHandler.onEventReconnect(TibjmsConnection.java:387)
at com.tibco.tibjms.TibjmsxLinkTcp._doReconnect(TibjmsxLinkTcp.java:598)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.work(TibjmsxLinkTcp.java:317)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.run(TibjmsxLinkTcp.java:259)
2017-10-13 15:44:41,368 [k Reader (Server-3285015)] ERROR com.example.jms.PaxJmsClient - Exception received from jms
javax.jms.JMSException: Reconnecting to ssl://11.11.11.11:5071, attempt 3 out of 100
at com.tibco.tibjms.TibjmsConnection._invokeOnExceptionCallback(TibjmsConnection.java:2132)
at com.tibco.tibjms.TibjmsConnection._reconnect(TibjmsConnection.java:1975)
at com.tibco.tibjms.TibjmsConnection$ServerLinkEventHandler.onEventReconnect(TibjmsConnection.java:387)
at com.tibco.tibjms.TibjmsxLinkTcp._doReconnect(TibjmsxLinkTcp.java:598)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.work(TibjmsxLinkTcp.java:317)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.run(TibjmsxLinkTcp.java:259)
2017-10-13 15:44:45,951 [k Reader (Server-3285015)] ERROR com.example.jms.PaxJmsClient - Exception received from jms
javax.jms.JMSException: Reconnecting to ssl://10.10.10.10:5071, attempt 3 out of 100
at com.tibco.tibjms.TibjmsConnection._invokeOnExceptionCallback(TibjmsConnection.java:2132)
at com.tibco.tibjms.TibjmsConnection._reconnect(TibjmsConnection.java:1975)
at com.tibco.tibjms.TibjmsConnection$ServerLinkEventHandler.onEventReconnect(TibjmsConnection.java:387)
at com.tibco.tibjms.TibjmsxLinkTcp._doReconnect(TibjmsxLinkTcp.java:598)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.work(TibjmsxLinkTcp.java:317)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.run(TibjmsxLinkTcp.java:259)
2017-10-13 15:44:50,525 [k Reader (Server-3285015)] ERROR com.example.jms.PaxJmsClient - Exception received from jms
javax.jms.JMSException: Connection unknown by server
at com.tibco.tibjms.Tibjmsx.buildException(Tibjmsx.java:659)
at com.tibco.tibjms.TibjmsConnection._invokeOnExceptionCallback(TibjmsConnection.java:2114)
at com.tibco.tibjms.TibjmsConnection._onDisconnected(TibjmsConnection.java:2487)
at com.tibco.tibjms.TibjmsConnection$ServerLinkEventHandler.onEventDisconnected(TibjmsConnection.java:367)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.work(TibjmsxLinkTcp.java:328)
at com.tibco.tibjms.TibjmsxLinkTcp$LinkReader.run(TibjmsxLinkTcp.java:259)
My code -
@PostConstruct
public void configurePaxJmsClient() {
    try {
        // create Topic Connection Factory
        TibjmsTopicConnectionFactory cf = new TibjmsTopicConnectionFactory(serverUrl);
        cf.setSSLTrustedCertificate(sslCertificatePath);
        cf.setSSLEnableVerifyHostName(false);
        cf.setUserName(username);
        cf.setUserPassword(password);
        cf.setReconnAttemptCount(100);
        cf.setReconnAttemptDelay(60000);
        cf.setReconnAttemptTimeout(10000);
        cf.setConnAttemptCount(100);
        cf.setConnAttemptDelay(60000);
        cf.setConnAttemptTimeout(10000);
        Tibjms.setExceptionOnFTEvents(true);
        Tibjms.setExceptionOnFTSwitch(true);
        // create the connection and install an exception handler
        connection = cf.createTopicConnection(username, password);
        connection.setExceptionListener(this);
        // You might also use CLIENT_ACKNOWLEDGE here
        session = connection.createTopicSession(false, javax.jms.Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic(topicName);
        // Create the subscriber and install the listener
        TopicSubscriber ts;
        /*if (dsName == null || dsName.length() == 0) {
            ts = session.createSubscriber(topic);
        } else {
            ts = session.createDurableSubscriber(topic, dsName);
        }*/
        if (dsName == null || dsName.length() == 0) {
            ts = session.createSubscriber(topic, messageSelector, false);
        } else {
            ts = session.createDurableSubscriber(topic, dsName, messageSelector, false);
        }
        //
        ts.setMessageListener(this);
        connection.start();
    } catch (JMSException e) {
        LOGGER.error("Failed to connect with message:" + e.getMessage(), e);
        releaseResources();
    }
}
@Override
public void onException(JMSException e) {
    LOGGER.error("Exception received from jms", e);
}
Can you tell me what the problem is here, or point me in the right direction?
Also, is it fine to do the JMS connection initialization in the @PostConstruct method of a Spring bean?

Why EMS reports “reconnect failed: connection unknown for id=xxxxx”?
This message indicates that the EMS server does not have, or no longer has, the client's connection information when the client attempts to reconnect.
There are two possible reasons:
The “ft_reconnect_timeout” parameter is not high enough: the server has already purged the connection before the client reconnects.
This can be resolved by setting a higher value for the “ft_reconnect_timeout” parameter in tibemsd.conf. The default value is 60 seconds.
“ft_reconnect_timeout” is the amount of time (in seconds) that a backup server waits for clients to reconnect after it assumes the role of primary server in a failover situation; in other words, it specifies how long the server keeps pending connections.
If a client does not reconnect within this period, the server removes its state from the shared state files.
If the client then tries to reconnect after the time set in “ft_reconnect_timeout”, the server no longer has the client's connection information and prints the "reconnect failed: connection unknown" message.
I suggest setting the value according to your environment and testing it.
Also, if the “ft_reconnect_timeout” value is high, a lot of connections and connection-related objects are kept in memory for a long time, and you may run into memory issues. If the connection uses a clientID, you may also run into a “clientID already exists” issue.
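As a rough sanity check, the client's reconnect delay can be compared with the server's ft_reconnect_timeout. The sketch below assumes the client starts its first reconnect attempt one full reconnect delay after the disconnect, which is what the logs in the question suggest (disconnect at 15:41:29, attempt 1 at 15:42:29, with setReconnAttemptDelay(60000)). ReconnectWindowCheck and its method are hypothetical names, and the 300-second value is only illustrative.

```java
import java.time.Duration;

/** Hypothetical helper comparing client reconnect timing with the server's ft_reconnect_timeout. */
final class ReconnectWindowCheck {

    /**
     * Returns true when the first reconnect attempt is expected to start strictly
     * before the server discards the connection state.
     * Assumption: the first attempt starts one full reconnect delay after the disconnect.
     */
    static boolean firstAttemptBeatsTimeout(Duration reconnAttemptDelay, Duration ftReconnectTimeout) {
        return reconnAttemptDelay.compareTo(ftReconnectTimeout) < 0;
    }

    public static void main(String[] args) {
        Duration delay = Duration.ofMillis(60_000);      // cf.setReconnAttemptDelay(60000) from the question
        Duration serverDefault = Duration.ofSeconds(60); // default ft_reconnect_timeout per the answer
        Duration raised = Duration.ofSeconds(300);       // illustrative higher value, not a recommendation

        System.out.println(firstAttemptBeatsTimeout(delay, serverDefault)); // false
        System.out.println(firstAttemptBeatsTimeout(delay, raised));        // true
    }
}
```

With the values from the question, the delay equals the default timeout, so the first attempt cannot be expected to arrive in time; either a shorter client delay or a higher server timeout would change that, subject to the memory and clientID caveats above.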
