Debezium flush timeout and OutOfMemoryError errors with MySQL - java

Using Debezium 0.7 to read from MySQL, but getting flush timeout and OutOfMemoryError errors during the initial snapshot phase. Looking at the logs below, it seems the connector is trying to write too many messages in one go:
WorkerSourceTask{id=accounts-connector-0} flushing 143706 outstanding messages for offset commit [org.apache.kafka.connect.runtime.WorkerSourceTask]
WorkerSourceTask{id=accounts-connector-0} Committing offsets [org.apache.kafka.connect.runtime.WorkerSourceTask]
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: Java heap space
WorkerSourceTask{id=accounts-connector-0} Failed to flush, timed out while waiting for producer to flush outstanding 143706 messages [org.apache.kafka.connect.runtime.WorkerSourceTask]
I wonder what the correct settings (http://debezium.io/docs/connectors/mysql/#connector-properties) are for sizeable databases (>50GB). I didn't have this issue with smaller databases. Simply increasing the timeout doesn't seem like a good strategy. I'm currently using the default connector settings.
Update
Changed the settings as suggested below and it fixed the problem (a sketch of how these can be wired up follows the list):
OFFSET_FLUSH_TIMEOUT_MS: 60000 # default 5000
OFFSET_FLUSH_INTERVAL_MS: 15000 # default 60000
MAX_BATCH_SIZE: 32768 # default 2048
MAX_QUEUE_SIZE: 131072 # default 8192
HEAP_OPTS: '-Xms2g -Xmx2g' # default '-Xms1g -Xmx1g'
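For anyone wondering where these values go: a minimal docker-compose sketch, assuming the debezium/connect Docker image; the image tag, topic names, and bootstrap address are placeholders, not from my setup:
connect:
  image: debezium/connect:0.7
  environment:
    BOOTSTRAP_SERVERS: kafka:9092
    GROUP_ID: 1
    CONFIG_STORAGE_TOPIC: connect_configs
    OFFSET_STORAGE_TOPIC: connect_offsets
    OFFSET_FLUSH_TIMEOUT_MS: 60000
    OFFSET_FLUSH_INTERVAL_MS: 15000
    MAX_BATCH_SIZE: 32768
    MAX_QUEUE_SIZE: 131072
    HEAP_OPTS: '-Xms2g -Xmx2g'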

This is a very complex question - first of all, the default memory settings for the Debezium Docker images are quite low, so if you are using them it might be necessary to increase them.
Next, there are multiple factors at play. I recommend the following steps:
Increase max.batch.size and max.queue.size - reduces number of commits
Increase offset.flush.timeout.ms - gives Connect time to process accumulated records
Decrease offset.flush.interval.ms - should reduce the amount of accumulated offsets
Unfortunately there is an issue, KAFKA-6551, lurking in the background that can still play havoc here.

I can confirm that the answer posted above by Jiri Pechanec solved my issues. These are the configurations I am using:
Kafka Connect worker configs, set in the worker.properties config file:
offset.flush.timeout.ms=60000
offset.flush.interval.ms=10000
max.request.size=10485760
Debezium configs passed through the curl request used to initialize the connector (a sketch of the full request follows the values):
max.queue.size = 81290
max.batch.size = 20480
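For reference, a hedged sketch of what that curl request can look like against the Connect REST API; the host, credentials, server id, and server name are placeholders, and the usual database.history.* settings are omitted for brevity:
curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "accounts-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql-host",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "********",
    "database.server.id": "184054",
    "database.server.name": "accounts",
    "max.queue.size": "81290",
    "max.batch.size": "20480"
  }
}'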
We didn't run into this issue with our staging MySQL db (~8GB) because that dataset is a lot smaller. For the production dataset (~80GB), we had to adjust these configurations.
Hope this helps.

To add onto what Jiri said:
There is now an open issue in the Debezium bugtracker; if you have any more information about root causes, logs, or reproduction, feel free to provide it there.
For me, changing the values that Jiri mentioned in his comment did not solve the issue. The only working workaround was to create multiple connectors on the same worker that are responsible for a subset of all tables each. For this to work, you need to start connector 1, wait for the snapshot to complete, then start connector 2 and so on. In some cases, an earlier connector will fail to flush when a later connector starts to snapshot. In those cases, you can just restart the worker once all snapshots are completed and the connectors will pick up from the binlog again (make sure your snapshot mode is "when_needed"!).
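To make that workaround concrete, here is a hedged sketch of the per-connector split; the table and connector names are made up, and the connection settings are omitted. table.whitelist was the relevant property in Debezium 0.7:
# connector 1 -- start this first and wait for its snapshot to complete
{
  "name": "accounts-connector-part1",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "snapshot.mode": "when_needed",
    "table.whitelist": "accounts.users,accounts.orders"
  }
}
# connector 2 -- start only after connector 1 has finished snapshotting
{
  "name": "accounts-connector-part2",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "snapshot.mode": "when_needed",
    "table.whitelist": "accounts.invoices,accounts.payments"
  }
}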

Related

jetty threads increasing linearly

Hi all, I have an Apache Fusion server and have configured Jetty for it.
Using New Relic, I can see that the thread count is increasing linearly. After a while the threads reach a limit and cause an out-of-memory exception until I restart my proxy server.
Please find below the start.ini configuration I use to regulate the number of threads.
--module=server
jetty.threadPool.minThreads=10
jetty.threadPool.maxThreads=150
jetty.threadPool.idleTimeout=5000
jetty.server.dumpAfterStart=false
jetty.server.dumpBeforeStop=false
jetty.httpConfig.requestHeaderSize=32768
etc/jetty-stop-timeout.xml
--module=continuation
--module=deploy
--module=jsp
--module=ext
--module=resources
--module=client
--module=annotations
--module=servlets
etc/jetty-logging.xml
--module=jmx
--module=stats
I tried adding a thread-enabled property too, but it didn't work. Can anyone help me limit these threads? With the same configuration on other servers, New Relic shows the threads are not increasing and stay well within range.
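Not an answer, but one way to see which pool the growing threads belong to, independent of New Relic, is to dump the live thread names from inside the JVM. A minimal sketch (run it inside the server JVM, e.g. via a debug servlet, or just use jstack <pid> from the shell); Jetty's pool threads are typically named qtp...:
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadCounter {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Overall count: if this grows without bound, something is leaking threads.
        System.out.println("Live threads: " + mx.getThreadCount());
        // Print each thread's name and state to spot which pool keeps growing.
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info != null) {
                System.out.println(info.getThreadName() + " -> " + info.getThreadState());
            }
        }
    }
}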

Tomcat jdbc stalls during amazon multi-az failover

SOLVED
Ultimately the solution noted in the similar question linked below worked for us. When we tried it initially, our configuration parser was mangling the URL and removing &connectTimeout=15000&socketTimeout=60000 from it, which invalidated that test.
I'm having trouble getting a Tomcat JDBC connection pool to fail over to a new DB server using Amazon RDS' Multi-AZ feature. When fail-over occurs, the application server gets hung up trying to borrow a connection from the pool. It's similar to this question, however the solution to this user's problem did not help me, so I suspect it's not quite the same: Configure GlassFish JDBC connection pool to handle Amazon RDS Multi-AZ failover
The sequence goes a bit like this:
Successful request is made, log output as expected
fail-over initiated (via reboot with failover in RDS)
Request made that times out; log messages from the request appear as expected up until a connection is borrowed from the pool.
Subsequent requests generate no log messages; they time out as well.
After some amount of time, the daemon will eventually start printing more log messages as if it had successfully connected to the database and is performing operations. This can take just over 16 minutes to occur; the client has long since timed out.
If I wait about 50 minutes and try again eventually the system will finally accept connections again.
Notes
If I tell Tomcat to shut down, there are exceptions in the logs about it being unable to clean up resources, and about my servlet still processing a request. Most notable (in my opinion) is the following stack trace, indicating at least one thread is stuck waiting for communication over a socket from MySQL.
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
java.net.SocketInputStream.read(SocketInputStream.java:171)
java.net.SocketInputStream.read(SocketInputStream.java:141)
com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3116)
com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3573)
com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3562)
com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4113)
com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2570)
com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731)
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2812)
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2761)
com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:894)
com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:732)
org.apache.tomcat.jdbc.pool.PooledConnection.validate(PooledConnection.java:441)
org.apache.tomcat.jdbc.pool.PooledConnection.validate(PooledConnection.java:384)
org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:716)
org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:579)
org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:174)
org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:111)
<...>
If I restart Tomcat, things return to normal as soon as it comes back. Obviously, long term this is probably not a workable solution for maintaining uptime.
Environment Details
Database: MySQL (using mysql-connector-java 5.1.26) (server 5.5.53)
Tomcat 8.0.45
I've gone through several changes to my configuration while trying to solve this issue. At the time of this writing the following related settings are in place:
jre/lib/security/java.security -- I'm under the impression that the default value for Oracle Java 1.8 is 30s with no security manager. I set these to zero just to be positive this isn't the issue.
networkaddress.cache.ttl: 0
networkaddress.cache.negative.ttl: 0
connection pool settings
testOnBorrow: true
testOnConnect: true
testOnReturn: true
jdbc url parameters
connectTimeout:60000
socketTimeout:60000
autoReconnect:true
Update
Still no solution found.
Added logging to confirm that this was not a DNS caching issue. The IP address logged immediately before the timeout matches the IP address of the 'new' master RDS host.
For reference the following block represents the properties I'm using to initialize my data source. I’m configuring the pool in code rather than JNDI, with some elements pulled out of our app's config file. I’ve pasted the code below along with comments indicating what the config values are for the tests I’ve been running.
import org.apache.tomcat.jdbc.pool.PoolProperties;

PoolProperties p = new PoolProperties();
p.setUrl(config.get("JDBC_URL")); // jdbc:mysql://host:3306/dbname
p.setDriverClassName("com.mysql.jdbc.Driver");
p.setUsername(config.get("JDBC_USER"));
p.setPassword(config.get("JDBC_PASSWORD"));
p.setJmxEnabled(true);
p.setTestWhileIdle(false);
p.setTestOnBorrow(true);
p.setValidationQuery("SELECT 1");
p.setValidationInterval(30000);
p.setTimeBetweenEvictionRunsMillis(30000);
p.setMaxActive(Integer.parseInt(config.get("MAX_ACTIVE"))); //45
p.setInitialSize(10);
p.setMaxWait(5); // max time (ms) to wait for a connection from the pool
p.setRemoveAbandonedTimeout(Integer.parseInt(config.get("REMOVE_ABANDONED_TIMEOUT"))); //600
p.setMinEvictableIdleTimeMillis(Integer.parseInt(config.get("DB_EVICTION_TIMEOUT"))); //60000
p.setMinIdle(Integer.parseInt(config.get("DB_MIN_IDLE"))); //50
p.setLogAbandoned(Boolean.parseBoolean(config.get("LOG_ABANDONED"))); //true
p.setRemoveAbandoned(true);
p.setJdbcInterceptors(
"org.apache.tomcat.jdbc.pool.interceptor.ConnectionState;"+
"org.apache.tomcat.jdbc.pool.interceptor.StatementFinalizer;"+
"org.apache.tomcat.jdbc.pool.interceptor.ResetAbandonedTimer");
// make sure new connections have auto commit true
p.setDefaultAutoCommit(true);
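Per the SOLVED note at the top, the fix boiled down to making sure the timeout parameters actually survive into the URL. Assuming the same placeholder host and database name as in the comment above, the working URL looks something like:
jdbc:mysql://host:3306/dbname?connectTimeout=15000&socketTimeout=60000
socketTimeout bounds exactly the kind of blocking read seen in the socketRead0 frame of the stack trace above, so after fail-over the validation query fails fast instead of hanging indefinitely.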

Postgres query executed using Hibernate gets dropped if the query takes a long time, without any timeout exception

I am running a postgres query that takes more than two hours.
This query is executed using hibernate in a java program.
After about 1.5 hours the query stops showing up in the server status in pg_admin.
Since the query disappeared from the list of active queries on the database, I was expecting either success or a timeout exception. But I get neither (no exception), and my thread is stuck in the wait state.
I know the query has not finished because it was supposed to do some inserts in a table and I cannot find the expected rows in the table.
I am using pgbouncer for the connection pooling and the query_timeout is disabled.
Had it been a Hibernate timeout, I should have gotten an exception.
OS parameters on the DB machine and the client machine (the machine running the Java program):
tcp_keepalive_time is 7200 (seconds)
tcp_keepalive_intvl = 75
tcp_keepalive_probes = 9 (number of probes)
Both the machines run RHEL operating system.
I am unable to put my finger on the issue.
I found that the issue was caused by the TCP connection getting dropped while the client kept hanging, waiting for the response.
I altered the following parameter at the OS level:
/proc/sys/net/ipv4/tcp_keepalive_time = 2700
Default value was 7200.
This causes a keep-alive check every 2700 seconds instead of every 7200 seconds.
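For anyone applying the same change, a sketch of the usual way to do it on Linux; the paths are standard, the value is the one above:
# apply immediately (resets on reboot)
sysctl -w net.ipv4.tcp_keepalive_time=2700

# persist across reboots: add this line to /etc/sysctl.conf
net.ipv4.tcp_keepalive_time = 2700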
I am sure you would have already looked at the following resources:
PostgreSQL Timeout Docs
PgBouncer timeout (which you already mention).
Hibernate timeout parameters, if any.
Once that is done (just like triaging permission issues during a new installation), I recommend that you try the following SQL in the different scenarios given below and ascertain what is actually causing this timeout:
SELECT pg_sleep(7200);
Log in to the server (via psql) and see whether this SQL times out.
Log in to PgBouncer (again via psql) and see whether PgBouncer times out.
Execute this SQL via Hibernate (via PgBouncer), and see whether there is a timeout.
This should allow you to clearly isolate the cause for this.
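For step 3, a minimal sketch of running the probe through Hibernate (assuming a Hibernate 5.2+ style API and an already-configured SessionFactory; the class and method names are made up):
import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class SleepProbe {
    // Runs the sleep probe over the full client -> PgBouncer -> PostgreSQL path.
    static void probe(SessionFactory sessionFactory) {
        try (Session session = sessionFactory.openSession()) {
            // If this thread hangs forever with no exception, the TCP connection
            // is likely being dropped silently mid-query, as described above.
            session.createNativeQuery("SELECT pg_sleep(7200)").getResultList();
        }
    }
}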

Unable to fill pool (no buffer space available)

I'm using Wildfly 8.2 and fire a series of DB requests when a certain web page is opened. All queries are invoked through the JPA Criteria API and return results as expected - and none of them produces a warning, error, or exception. It all runs in Parallel Plesk.
Now, I've noticed that within 2 to 3 days the following error appears and the site becomes unresponsive. I restart, and I wait approximately another 3 days until it happens again (depending on the number of requests I get).
I checked tcpsndbuf on my Linux server and noticed it is constantly at its max unless I restart Wildfly. Apparently Wildfly fails to release the connections.
The connections are managed by JPA/Hibernate and the Wildfly container. I don't do any special or custom transaction handling, e.g. open, close, etc. I leave it all to Wildfly.
The MySQL Driver I'm using is 5.1.21 (mysql-connector-java-5.1.21-bin.jar)
In the standalone.xml I have defined the following datasource values (among others):
<transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
<pool>
<min-pool-size>3</min-pool-size>
<max-pool-size>10</max-pool-size>
</pool>
<statement>
<prepared-statement-cache-size>32</prepared-statement-cache-size>
<shared-prepared-statements>true</shared-prepared-statements>
</statement>
Has anyone experienced the same rise in tcpsndbuf values (or this error)? In case you require more config or log files, let me know. Thanks!
UPDATE
Despite the additional timeout settings, it still runs into the hang, and it will then use 100% CPU time whenever the max tcpsndbuf is reached.
Try adding this Hibernate property:
<property name="hibernate.connection.release_mode">after_transaction</property>
By default, JTA mandates that the connection be released after each statement, which is undesirable for most use cases. Most drivers don't allow multiplexing a connection over multiple XA transactions anyway.
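If you configure Hibernate through JPA rather than a hibernate.cfg.xml, the same property goes into persistence.xml; a sketch, with the persistence-unit name as a placeholder:
<persistence-unit name="my-unit">
  <properties>
    <!-- release the JDBC connection after each transaction instead of
         after each statement (the JTA default described above) -->
    <property name="hibernate.connection.release_mode" value="after_transaction"/>
  </properties>
</persistence-unit>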
Do you use OpenVZ? I think this question should be asked on Server Fault, as it is related to Linux configuration. You can read up on tcpsndbuf. You should count open sockets and check the condition:

RabbitMQ connection in blocking state?

When I connect to the RabbitMQ server, my connection shows up in the blocking state and I am not able to publish new messages.
I have about 6 GB of RAM free, and about 8 GB of free disk space.
How do I configure the disk space limit in RabbitMQ?
I had the same problem. It seems the RabbitMQ server was using more memory than the threshold:
http://www.rabbitmq.com/memory.html
I ran the following command to unblock these connections:
rabbitmqctl set_vm_memory_high_watermark 0.6
(default value is 0.4)
By default, disk_free_limit (source: [1]) must exceed 1.0 times the available RAM. This is true in your case, so you may want to check what exactly is blocking the flow. To do that, read the rabbitmqctl man page (source: [2]) and inspect the last_blocked_by value for your connections (see the command sketch below). That should tell you the cause of the blocking.
Assuming it is memory (and you somehow didn't calculate your free disk space correctly): to change disk_free_limit, read about configuring rabbitmq.config (source: [1]), then open your rabbitmq.config file and add the line {rabbit, [{disk_free_limit, {mem_relative, 0.1}}]} inside the config declaration. My rabbitmq.config file looks as follows:
[
{rabbit, [{disk_free_limit, {mem_relative, 0.1}}]}
].
The specific number is up to you, of course.
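Putting the inspection commands in one place, a sketch to run on the broker host (last_blocked_by and last_blocked_age are standard rabbitmqctl connection info items):
# which resource alarm (memory or disk) last blocked each connection, and how long ago
rabbitmqctl list_connections name state last_blocked_by last_blocked_age

# current alarms plus the effective memory and disk limits
rabbitmqctl status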
Sources
http://www.rabbitmq.com/configure.html#configuration-file
http://www.rabbitmq.com/man/rabbitmqctl.1.man.html
