Recovery of the connection in RabbitMQ for Clustered environment

I'm trying to recover the connection to RabbitMQ in a clustered environment, but the client does not recover in code, and no exception is raised that my code could catch.
For example: initially node 1 is connected and our messages flow successfully. To test failover, we brought up node 2 and stopped node 1. The connections are lost, which is expected, but no retry happens even though node 2 is up.
When I restart my service, I get this exception:
"Rabbit MQ Message Exception : Error = 'connection is already closed due to
connection error; cause: java.net.SocketException: Connection reset'"
Can anyone please suggest how to recover in such a case?
I have used the configuration below in my code (AMQP client):
factory.setAutomaticRecoveryEnabled(true);
factory.setNetworkRecoveryInterval(5000);
factory.setTopologyRecoveryEnabled(true);
factory.setRequestedHeartbeat(60);
With Lyra, connection recovery does occur, using the following config:
.withRetryPolicy(new RetryPolicy()
.withMaxAttempts(30)
.withInterval(Duration.seconds(1))
.withMaxDuration(Duration.minutes(5)));
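
One thing worth checking with the plain AMQP client: as far as I know, automatic recovery only reconnects to the addresses the connection was created with, so a factory that only knows node 1 has nowhere else to go. The Java client accepts a list of addresses in newConnection(...) for this purpose. The idea of cycling through cluster nodes until one accepts can be sketched with the JDK alone. ClusterFailover, connect and the node names below are hypothetical, and the connector function is an assumption standing in for whatever actually opens the connection; this is not how the client library implements recovery.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class ClusterFailover {

    /**
     * Tries the nodes in turn, cycling through the list, until the connector
     * returns a connection or maxAttempts is exhausted.
     * The connector is a hypothetical stand-in for the real connect call.
     */
    public static <C> Optional<C> connect(List<String> nodes,
                                          Function<String, Optional<C>> connector,
                                          int maxAttempts,
                                          long intervalMillis) {
        if (nodes.isEmpty()) {
            return Optional.empty();
        }
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Round-robin over the configured nodes.
            String node = nodes.get(attempt % nodes.size());
            Optional<C> connection = connector.apply(node);
            if (connection.isPresent()) {
                return connection;
            }
            try {
                // Wait before the next attempt, like a recovery interval.
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed node names; node1 is "down" in this simulation.
        List<String> nodes = List.of("node1:5672", "node2:5672");
        Optional<String> result = connect(
                nodes,
                node -> node.startsWith("node2")
                        ? Optional.of("connected to " + node)
                        : Optional.<String>empty(),
                30,
                0);
        System.out.println(result.orElse("no node reachable"));
    }
}
```

The point of the sketch is only that the retry loop needs the whole node list up front; retrying against a single stopped node can never succeed.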

Related

Avoiding error messages in connecting to active/passive ActiveMQ

I am creating a program to send messages to ActiveMQ. This server is in active/passive configuration with one active and two standby nodes.
My code for creating connections is as follows:
String furl = "failover:(tcp://aa.myamq-01:61616,tcp://aa.myamq-02:61616,tcp://aa.myamq-03:61616)";
ConnectionFactory connectionFactory = new ActiveMQConnectionFactory(furl);
Connection connection = connectionFactory.createConnection();
connection.start();
Everything works as expected. However I get the following output in the console.
ERROR | Connect fail to tcp://aa.myamq-01:61616, error message : Connection refused:Connect
ERROR | Connect fail to tcp://aa.myamq-02:61616, error message : Connection refused:Connect
INFO |Successfully connected to tcp://aa.myamq-03:61616
Is there a way the active server can be identified and connection attempted to it only? Alternatively, can the error messages be suppressed?

AFAIK there isn't in this scenario. Inactive (standby) ActiveMQ nodes do not accept connections, so to a client they look like they are down.
If you need to find out which node is the master, you can look it up in the database, provided ActiveMQ uses a DB backend.
As for suppressing these messages, I would not even try: if a real failure occurred, you would not notice it.
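
For reference, the failover transport expects the broker URIs separated by commas inside the parentheses. A minimal sketch that builds such a URI from a host list follows; FailoverUri and build are hypothetical names, not part of ActiveMQ. The randomize option is a failover transport option: with randomize=false the client tries the brokers in the listed order rather than a random one. It does not silence the "Connect fail" lines for standby nodes.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FailoverUri {

    /** Builds a failover URI from "host:port" entries (hypothetical helper). */
    public static String build(List<String> hostPorts, boolean randomize) {
        String brokers = hostPorts.stream()
                .map(hostPort -> "tcp://" + hostPort)
                .collect(Collectors.joining(","));
        return "failover:(" + brokers + ")?randomize=" + randomize;
    }

    public static void main(String[] args) {
        // Host names taken from the question.
        System.out.println(build(
                List.of("aa.myamq-01:61616", "aa.myamq-02:61616", "aa.myamq-03:61616"),
                false));
    }
}
```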

HA registry : how to avoid exception at boot?

I have successfully created 2 registries in HA mode. I followed the instructions here and it's working, but since each registry needs the other at boot, it throws exceptions until both are ready:
java.net.ConnectException: Connection refused: connect
After a few seconds, the registries connect to each other and everything works fine (each registry has the other as a replica), but it's annoying to have those exceptions at boot...
Is there a way to avoid this?

Camunda DB connection closed but picked by camunda engine

We are using Camunda with RDS/MySQL as the DB. It works fine, but sometimes it reports that the DB connection is closed and throws a ProcessEngineException.
Here is what I understood from our config and logs:
We have 5 active connections in our pool at any time (specified in the datasource config).
There was a scenario where a connection had been closed.
We saw an error like:
Request received Context path: /engine-rest Request received Path
Info: /user PathInfo: /user ExceptionHandler:
org.camunda.bpm.engine.ProcessEngineException: Process engine
persistence exception at
org.camunda.bpm.engine.impl.interceptor.CommandInvocationContext.rethrow(CommandInvocationContext.java:148)
at org.camunda.bpm.engine.impl.interceptor.CommandContext.close(CommandContext.java:173)
at org.camunda.bpm.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:113)
at org.camunda.bpm.engine.impl.interceptor.ProcessApplicationContextInterceptor.execute(ProcessApplicationContextInterceptor.java:66)
at org.camunda.bpm.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:30)
...... Caused by: org.apache.ibatis.exceptions.PersistenceException:
Error querying database. Cause:
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException:
No operations allowed after connection closed. The error may exist in
org/camunda/bpm/engine/impl/mapping/entity/User.xml The error may
involve
org.camunda.bpm.engine.impl.persistence.entity.UserEntity.selectUserByQueryCriteria
The error occurred while executing a query SQL: select distinct RES.*
from ACT_ID_USER RES
order by RES.ID_ asc LIMIT ? OFFSET ? Cause:
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException:
No operations allowed after connection closed.
Our Tomcat props specify: minIdle = 5;
My best guess: the connection is closed on the server, but we keep it in the local pool because of the above property.
Per the Tomcat docs (https://tomcat.apache.org/tomcat-8.0-doc/jdbc-pool.html):
testOnBorrow = true;
validationQuery = "select 1";
These two props should fix it, since the pool then validates a connection before handing it out.
Questions I am trying to figure out:
How can I reproduce this issue, other than by keeping a connection idle for several hours (the scenario in which it happened)?
Does the AWS RDS server close the connection? If so, can we control it?

As mentioned by @Zelldon, the connection timeout can be reduced to reproduce the problem. With the fix in place, it works as expected.
Just to be sure, I ran two instances of Camunda, one with this fix and one without, and could see that the fix worked.
Regarding RDS, it does close the connection, but I could not find any documentation on it.
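
What testOnBorrow with a validationQuery buys you can be pictured roughly as follows. This is a simplified sketch of the validate-on-borrow idea, not Tomcat's implementation; ValidatingPool and its methods are hypothetical names. With JDBC, the validator would be something along the lines of Connection.isValid(timeout) or running "select 1".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class ValidatingPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final Predicate<T> validator;

    public ValidatingPool(Supplier<T> factory, Predicate<T> validator) {
        this.factory = factory;
        this.validator = validator;
    }

    /** Hands out an idle resource only if it passes validation. */
    public T borrow() {
        T candidate;
        while ((candidate = idle.poll()) != null) {
            if (validator.test(candidate)) {
                return candidate;
            }
            // Stale resource (e.g. closed by the server): drop it, try the next.
        }
        // Nothing usable in the pool: create a fresh resource.
        return factory.get();
    }

    /** Returns a resource to the idle queue. */
    public void release(T resource) {
        idle.addLast(resource);
    }

    public static void main(String[] args) {
        // Strings stand in for connections; "stale" ones fail validation.
        ValidatingPool<String> pool =
                new ValidatingPool<>(() -> "fresh", c -> !c.startsWith("stale"));
        pool.release("stale-1");
        pool.release("live-1");
        System.out.println(pool.borrow());
        System.out.println(pool.borrow());
    }
}
```

Without the validation step, the first borrow would hand out the stale entry, which matches the "No operations allowed after connection closed" symptom.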

Failed to connect to queue manager 'QUEUE-NAME' with connection mode 'Client' and host

I have developed a subscribe (topic) concept using Camel. It works fine in my local Tomcat, but it does not work in my test environment's Tomcat, where it produces the error below. Kindly help me resolve the issue and tell me how to debug it.
Is it related to the server configuration?
Error:
org.apache.camel.component.jms.JmsMessageListenerContainer refreshConnectionUntilSuccessful
SEVERE: Could not refresh JMS Connection for destination 'TOPIC-NAME' - retrying in 5000 ms. Cause: JMSWMQ0018: Failed to
connect to queue manager 'QUEUE-MANAGER' with connection mode 'Client' and
host name 'HOST-NAME'.; nested exception is com.ibm.mq.MQException:
JMSCMQ0001: WebSphere MQ call failed with compcode '2' ('MQCC_FAILED')
reason '2059' ('MQRC_Q_MGR_NOT_AVAILABLE').
regards,
Gnana

There is almost no information to go on here and therefore no way to answer with any confidence. Instead, I'll provide a diagnostic process and hopefully you will find the problem. Note that in the future if you have similar issues, it would help to list the diagnostics you have already tried so that people responding can narrow down their answers.
In order for this to work, the QMgr must be running a listener, have a channel defined and available, have authorizations set up to allow the connection, and be able to resolve the queue or topic requested. With that in mind, the things I normally check, in the order I check them, are as follows:
Is the QMgr running?
Is the listener running? On what port?
Can I telnet to the QMgr on the listener port? i.e. telnet mqhost 1414.
Is the channel defined? If so, is it available?
Do the sample client programs work? In this case, amqspubc is the one to try.
There are other considerations and if all of the above work, it is time to look into the client code and configuration, the versions of the client and server, authorizations, etc. But until you know that the basic configuration is in place to support a client connection (which was not indicated in the question) then these are the things to start with.
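
If telnet is not available on the test server, the listener-port check from the list above can be approximated from Java. This is a minimal sketch using only the JDK; PortCheck and isReachable are hypothetical names, and the host and port are whatever your listener uses (mqhost and 1414 in the telnet example). A successful TCP connect only shows that something is listening; it says nothing about the channel or authorizations.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unknown host: treat all as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: java PortCheck <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable: "
                + isReachable(host, port, 3000));
    }
}
```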

I need some help with Sakai 2.7.1 and Tomcat 5.5.33, in regards to SQL issues

Today I managed to recreate the farms with Scalr.net, and after restarting Tomcat a few times and fixing issues, I get this error once again. I was using MySQL with a clean install on the entire server, including Java 6.1_24, Tomcat 5.5.33 and Sakai 2.7.1. The issue I keep running into is access denied for the user, even though this user exists in the MySQL instance and I have given it complete remote access with sakai@%. It is not working now, although it was working about an hour before I made this post.
... Continued from above log, everything before logs just fine
2011-03-31 18:31:14,120 WARN main org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy - Could not retrieve default auto-commit and transaction isolation settings
org.apache.commons.dbcp.SQLNestedException: Error preloading the connection pool
... continued over 400+ lines...
Here is another error in regards to the access denied error...
2011-03-31 18:31:16,854 WARN main org.hibernate.cfg.SettingsFactory - Could not obtain connection metadata
java.sql.SQLException: Access denied for user 'sakai'@'ec2-50-17-184-70.compute-1.amazonaws.com' (using password: YES)
.... continued....
I now get this error whenever I start up, even with a fresh install of Tomcat/Sakai:
SEVERE: Unable to set localhost. This prevents creation of a GUID. Cause was: ec2-72-44-56-167.compute-1.amazonaws.com: ec2-72-44-56-167.compute-1.amazonaws.com
java.net.UnknownHostException: ec2-72-44-56-167.compute-1.amazonaws.com: ec2-72-44-56-167.compute-1.amazonaws.com
(This most recent error (localhost) was simply fixed by restarting the Amazon AWS instance, thankfully.) However, I keep getting the same errors even with a fresh install, almost as if the information were being refreshed from a cache or something.

As with the last question you posted on this topic, the error message seems very clear: the user 'sakai'@... does not have permission to log in to the database you have pointed it at. I recommend taking a look at the MySQL documentation on administering user accounts to find out whether you've missed a setting that would allow this account access.

I believe I may have figured out how to fix this problem. It has nothing to do with MySQL or the Apache server itself; it comes from Scalr.net failing to initialize the IP, or something of that sort. After doing some research, I found HostInit failures such as:
Cannot deliver message 'HostInit' (message_id: af9dcfdb-a09e-4971-bdb7-7871b3f7e21c) via REST to server '50.17.135.98' (server_id: e49cfec9-5bcb-44d1-bbc5-fde32450fc89). Error: 0 Timeout was reached; connect() timed out! (http://50.17.135.98:8013/control)
Cannot deliver message 'BlockDeviceAttached' (message_id: a153d83f-3d96-4d53-920a-ccb80701675a) via REST to server '50.17.135.98' (server_id: e49cfec9-5bcb-44d1-bbc5-fde32450fc89). Error: 0 Timeout was reached; connect() timed out! (http://50.17.135.98:8013/control)
Cannot deliver message 'HostUp' (message_id: 1adde27c-9982-4551-b266-c3c432d1dd44) via REST to server '50.17.135.98' (server_id: e49cfec9-5bcb-44d1-bbc5-fde32450fc89). Error: 0 Timeout was reached; connect() timed out! (http://50.17.135.98:8013/control)
Cannot deliver message 'HostInit' (message_id: f1aa4b14-ef57-4361-ae56-87702d674b11) via REST to server '50.17.135.98' (server_id: e49cfec9-5bcb-44d1-bbc5-fde32450fc89). Error: 0 Timeout was reached; connect() timed out! (http://50.17.135.98:8013/control)
So I made a snapshot image of the Apache server, MySQL, etc. and terminated the instances, which allowed them to be recreated, and this managed to solve the problem in one manner.
