When developing an application which consumes an external webservice I have generated the sources from the wsdl-url and then created a client:
GeoIPServiceClient service = new GeoIPServiceClient();
GeoIPServiceSoap geoIPClient = service.getGeoIPServiceSoap();
Since the creation of this proxy takes some time I set the client as an attribute in my service class.
But I'm worried that the client isn't thread safe and this webservice is heavily used in the application by concurrent threads (webapp). I can't find any documentation on this.
As a precaution I've started to use an object pool of soap clients instead of a shared one.
Is this an unnecessary precaution? What is the best practice when writing xfire clients?
I suspect some kind of concurrency problem with xfire since I regularly, under high load, get blocked threads and as a result of this the application crashes. Here's a partial thread dump:
"http-xx.xx.xx.xx-80-17" daemon prio=10 tid=0x00007f560d437000 nid=0x66cb waiting for monitor entry [0x00000000412b8000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.sun.xml.bind.v2.runtime.reflect.opt.Injector.inject(Injector.java:174)
- waiting to lock <0x00007f561d44e1c0> (a com.sun.xml.bind.v2.runtime.reflect.opt.Injector)
at com.sun.xml.bind.v2.runtime.reflect.opt.Injector.inject(Injector.java:85)
at com.sun.xml.bind.v2.runtime.reflect.opt.AccessorInjector.prepare(AccessorInjector.java:87)
at com.sun.xml.bind.v2.runtime.reflect.opt.OptimizedAccessorFactory.get(OptimizedAccessorFactory.java:165)
at com.sun.xml.bind.v2.runtime.reflect.Accessor$FieldReflection.optimize(Accessor.java:253)
at com.sun.xml.bind.v2.runtime.reflect.TransducedAccessor$CompositeTransducedAccessorImpl.<init>(TransducedAccessor.java:231)
at com.sun.xml.bind.v2.runtime.reflect.TransducedAccessor.get(TransducedAccessor.java:173)
at com.sun.xml.bind.v2.runtime.property.SingleElementLeafProperty.<init>(SingleElementLeafProperty.java:83)
at sun.reflect.GeneratedConstructorAccessor165.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.sun.xml.bind.v2.runtime.property.PropertyFactory.create(PropertyFactory.java:124)
at com.sun.xml.bind.v2.runtime.ClassBeanInfoImpl.<init>(ClassBeanInfoImpl.java:171)
at com.sun.xml.bind.v2.runtime.JAXBContextImpl.getOrCreate(JAXBContextImpl.java:481)
at com.sun.xml.bind.v2.runtime.JAXBContextImpl.<init>(JAXBContextImpl.java:315)
at com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:139)
at com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:117)
at com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:188)
at sun.reflect.GeneratedMethodAccessor176.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:128)
at javax.xml.bind.ContextFinder.find(ContextFinder.java:277)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:372)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:337)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:244)
at org.codehaus.xfire.jaxb2.JaxbType.getJAXBContext(JaxbType.java:306)
- locked <0x00007f565b3aee60> (a org.codehaus.xfire.jaxb2.JaxbType)
at org.codehaus.xfire.jaxb2.JaxbType.writeObject(JaxbType.java:230)
at org.codehaus.xfire.aegis.AegisBindingProvider.writeParameter(AegisBindingProvider.java:229)
at org.codehaus.xfire.service.binding.AbstractBinding.writeParameter(AbstractBinding.java:273)
at org.codehaus.xfire.service.binding.WrappedBinding.writeMessage(WrappedBinding.java:90)
at org.codehaus.xfire.soap.SoapSerializer.writeMessage(SoapSerializer.java:80)
at org.codehaus.xfire.transport.http.HttpChannel.writeWithoutAttachments(HttpChannel.java:56)
at org.codehaus.xfire.transport.http.OutMessageRequestEntity.writeRequest(OutMessageRequestEntity.java:51)
at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499)
at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.codehaus.xfire.transport.http.CommonsHttpMessageSender.send(CommonsHttpMessageSender.java:369)
at org.codehaus.xfire.transport.http.HttpChannel.sendViaClient(HttpChannel.java:123)
at org.codehaus.xfire.transport.http.HttpChannel.send(HttpChannel.java:48)
at org.codehaus.xfire.handler.OutMessageSender.invoke(OutMessageSender.java:26)
at org.codehaus.xfire.handler.HandlerPipeline.invoke(HandlerPipeline.java:131)
at org.codehaus.xfire.client.Invocation.invoke(Invocation.java:79)
at org.codehaus.xfire.client.Invocation.invoke(Invocation.java:114)
at org.codehaus.xfire.client.Client.invoke(Client.java:336)
at org.codehaus.xfire.client.XFireProxy.handleRequest(XFireProxy.java:77)
at org.codehaus.xfire.client.XFireProxy.invoke(XFireProxy.java:57)
at $Proxy143.getMyMethod(Unknown Source)
The thread dump contains a lot of blocked threads that look like this.
I guess as you get a lot of blocked threads, the client is actually thread-safe as object data is not corrupted :). But I agree it's not handling the concurrency in a good way.
1) One observation is that the final lock seems to be in JAXB implementation and not in XFire. What if you try using different JAXB implementation like JaxMe?
2) Also the method getJAXBContext in JaxbType is synchronised. And most likely because your threads are accessing the same JaxbType instance they may be blocked.
Looking at that method I would actually moved the synchronisation into the method after context presense is checked:
if (context == null) {
synchronized (this) {
...
This will allow for clients that already have JAXBContext initialised to skip expensive synchronisation.
My suggestion is either try fixing the code yourself and make a test or submit a bug to XFire or do both :).
Depends on the version of Xfire you are using, as they have fixed few Thread Safety issues in version 1.2.5. You can check the bug raised at http://jira.codehaus.org/browse/XFIRE-886 , and see more details on the release notes at hxxp://xfire.codehaus.org/XFire+1.2.5+Release+Notes
Related
I use Async log4j2 over slf4j in our app and i was sure that is non-blocking. But after integrating BlockHound i got a surprise:
java.lang.Exception: [worker-1-8] Blocking call: sun.misc.Unsafe#park
at reactor.blockhound.BlockHound$Builder.lambda$install$6(BlockHound.java:318)
at reactor.blockhound.BlockHoundRuntime.checkBlocking(BlockHoundRuntime.java:46)
at sun.misc.Unsafe.park(Unsafe.java)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at com.lmax.disruptor.TimeoutBlockingWaitStrategy.signalAllWhenBlocking(TimeoutBlockingWaitStrategy.java:62)
at com.lmax.disruptor.MultiProducerSequencer.publish(MultiProducerSequencer.java:218)
at com.lmax.disruptor.RingBuffer.translateAndPublish(RingBuffer.java:990)
at com.lmax.disruptor.RingBuffer.tryPublishEvent(RingBuffer.java:538)
at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor.tryEnqueue(AsyncLoggerConfigDisruptor.java:392)
at org.apache.logging.log4j.core.async.AsyncLoggerConfig.logToAsyncDelegate(AsyncLoggerConfig.java:135)
at org.apache.logging.log4j.core.async.AsyncLoggerConfig.log(AsyncLoggerConfig.java:116)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:460)
at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:82)
at org.apache.logging.log4j.core.Logger.log(Logger.java:162)
at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2190)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2144)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2127)
at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2020)
at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1891)
at org.apache.logging.slf4j.Log4jLogger.info(Log4jLogger.java:184)
Is this expected behavior or am i missing something?
Log4j2's async logging just hands off the I/O part to a separate thread via the disruptor (a kind of ring buffer). The producer signals the consumer using Java's built-in notify(), which has to acquire a lock.
I am using AsyncHttpClient 2.3.0 and Default configuration.
I've noticed that AHC created two types of threads (from the thread dump):
1)
AsyncHttpClient-timer-478-1" - Thread t#30390 java.lang.Thread.State: TIMED_WAITING
at java.lang.Thread.$$YJP$$sleep(Native Method)
at java.lang.Thread.sleep(Thread.java)
at io.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:560)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:459)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
2)
AsyncHttpClient-3-4" - Thread t#20320 java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.$$YJP$$epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <16163575> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <49280039> (a java.util.Collections$UnmodifiableSet)
- locked <2decd496> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
I've expected that AsyncHttpClient uses a few threads under the hood. But after a few days of running, AsyncHttpClient creates ~500 of AsyncHttpClient-timer-xxx-x threads and a few AsyncHttpClient-x-x.
It's called not very intensively, probably also ~500 times per this period.
Only executeRequest is used (execute request and get on returned future) https://static.javadoc.io/org.asynchttpclient/async-http-client/2.3.0/org/asynchttpclient/AsyncHttpClient.html#executeRequest-org.asynchttpclient.Request-org.asynchttpclient.AsyncHandler-:
<T> ListenableFuture<T> executeRequest(Request request, AsyncHandler<T> handler);
I've seen a page about connection pool configuration (https://github.com/AsyncHttpClient/async-http-client/wiki/Connection-pooling) but nothing about thread pool configuration.
What is the difference between both types of thread and what can cause a large number of threads created? Is there any configuration I should apply?
AHC has two types of threads:
For I/O operation.
On your screen, it's AsyncHttpClient-x-x
threads. AHC creates 2*core_number of those.
For timeouts.
On your screen, it's AsyncHttpClient-timer-1-1 thread. Should be
only one.
Any different number means you’re creating multiple clients.
Source: issue on GitHub https://github.com/AsyncHttpClient/async-http-client/issues/1658
"pool-11-thread-1" prio=10 tid=0x0a974c00 nid=0x7210 runnable [0x3f3ad000]
java.lang.Thread.State: RUNNABLE
at oracle.jdbc.driver.T2CStatement.t2cDefineExecuteFetch(Native Method)
at oracle.jdbc.driver.T2CPreparedStatement.doDefineExecuteFetch(T2CPreparedStatement.java:878)
at oracle.jdbc.driver.T2CPreparedStatement.executeForRows(T2CPreparedStatement.java:760)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1062)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1126)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3339)
at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:3445)
- locked <0x69579fb0> (a oracle.jdbc.driver.T2CPreparedStatement)
- locked <0x66157d68> (a oracle.jdbc.driver.T2CConnection)
at org.jboss.resource.adapter.jdbc.CachedPreparedStatement.execute(CachedPreparedStatement.java:216)
at org.jboss.resource.adapter.jdbc.WrappedPreparedStatement.execute(WrappedPreparedStatement.java:209)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.executeQuery(SqlExecutor.java:180)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.sqlExecuteQuery(GeneralStatement.java:205)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryWithCallback(GeneralStatement.java:173)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryForObject(GeneralStatement.java:104)
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(SqlMapExecutorDelegate.java:561)
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(SqlMapExecutorDelegate.java:536)
at com.ibatis.sqlmap.engine.impl.SqlMapSessionImpl.queryForObject(SqlMapSessionImpl.java:93)
at org.springframework.orm.ibatis.SqlMapClientTemplate$1.doInSqlMapClient(SqlMapClientTemplate.java:273)
at org.springframework.orm.ibatis.SqlMapClientTemplate.execute(SqlMapClientTemplate.java:209)
at org.springframework.orm.ibatis.SqlMapClientTemplate.queryForObject(SqlMapClientTemplate.java:271)
at com.alipay.bipgw.common.dal.bankchannel.ibatis.IbatisBipBusiOrderDAO.queryOrderOutTime(IbatisBipBusiOrderDAO.java:319)
at sun.reflect.GeneratedMethodAccessor3333.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:310)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
at com.alipay.bipgw.common.dal.monitor.DalMonitorInterceptor.invoke(DalMonitorInterceptor.java:60)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy79.queryOrderOutTime(Unknown Source)
at com.alipay.bipgw.prodcore.repository.impl.BusiOrderRepositoryImpl.queryOrderOutTime(BusiOrderRepositoryImpl.java:402)
at com.alipay.bipgw.prodcore.listener.ProdStatusChangeTimeoutTaskListener.execute(ProdStatusChangeTimeoutTaskListener.java:148)
at com.alipay.bipgw.prodcore.listener.ProdStatusChangeTimeoutTaskListener.access$000(ProdStatusChangeTimeoutTaskListener.java:60)
at com.alipay.bipgw.prodcore.listener.ProdStatusChangeTimeoutTaskListener$1.run(ProdStatusChangeTimeoutTaskListener.java:104)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
the java programm like this :
public void onUniformEvent(UniformEvent message, UniformEventContext uContext) {
try {
// single thread running
service.execute(new Runnable() {
public void run() {
try{
execute();
}catch(Exception e ){
logger.error("working error",e);
}
}
});
} catch (RejectedExecutionException e) {
logger.error("ProdStatusChangeTimeoutTaskListener:error", e);
} catch (Exception e) {
logger.error("ProdStatusChangeTimeoutTaskListener:error", e);
}
}
//omit the body
private void execute() {.....}
and the execute method will not start any thread.
in two days i dump several thread dump
2013-03-04 16:54:12
- locked <0x695f91f0> (a oracle.jdbc.driver.T2CPreparedStatement)
- locked <0x6615a2d0> (a oracle.jdbc.driver.T2CConnection)
2013-03-04 17:20:53
- locked <0x695f91f0> (a oracle.jdbc.driver.T2CPreparedStatement)
- locked <0x6615a2d0> (a oracle.jdbc.driver.T2CConnection)
2013-03-05 10:58:30
- locked <0x6957bec8> (a oracle.jdbc.driver.T2CPreparedStatement)
- locked <0x66157e90> (a oracle.jdbc.driver.T2CConnection)
2013-03-05 17:16:31
- locked <0x69579fb0> (a oracle.jdbc.driver.T2CPreparedStatement)
- locked <0x66157d68> (a oracle.jdbc.driver.T2CConnection)
seems like the lock hold by jdbc client has changed, but the first two in 2013-03-04 16:54:12 and 2013-03-04 17:20:53, they are the same
I am using a Excutors.newSingleThreadExecutor() doing a query job in backgroud, and the following task will be submit to this executor Service in 20 minutes interval, but the work thread seems to hangs while executing the query , so the following task will not be executed. It last for several days , no exception occur and no log output at all, somebody can help me ? thanks
The problem is likely to lie in your actual query - is it a long running task? Do you know how long on average it takes to complete?
Its possible (but this is dependent on the query itself) that a previous query is locking tables which in turn block later ones.
The first thing I would do it to verify that the query does indeed complete within 20 minutes, and does so consistently. After you know that it does you can then investigate other possible causes for the hanging behavior you're seeing.
When the query is running I'd also suggest that you check the explain plan for your query (see what its doing - whats taking the most time etc). I'd also check that the relevant statistics are up to date (perhaps less of an issue on newer versions of Oracle)
This problem has been solved~
after several real time executes , out dba monitor the db server side, and find that the query time is 210s which execeeds the query-out-time(180s), but the cancel request of the stamentent may not be effective and the db server may not response to that request . so the thread which porform the original query hangs. the similar situation see [When I call PreparedStatement.cancel() in a JDBC application, does it actually kill it in an Oracle database?
the solution can been done by following:
add sqlprofile to the target table,and make sql using skip scan to short the query time.
but the deepest reason why hanging will occur is still unknown.
When I try to lease tasks from a pull queue in my application, I get the error below. This didn't happen previously with my code, so something has changed. I suspect that it's a threading issue - these calls are made in a separate thread (created using ThreadManager.createThreadForCurrentRequest) - has that recently been disallowed?
uk.org.jaggard.myapp.BlahBlahBlahRunnable run: Exception leasing tasks. Already tried 0 times.
com.google.apphosting.api.ApiProxy$CancelledException: The API call taskqueue.QueryAndOwnTasks() was explicitly cancelled.
at com.google.apphosting.runtime.ApiProxyImpl.doSyncCall(ApiProxyImpl.java:218)
at com.google.apphosting.runtime.ApiProxyImpl.access$000(ApiProxyImpl.java:68)
at com.google.apphosting.runtime.ApiProxyImpl$1.run(ApiProxyImpl.java:182)
at com.google.apphosting.runtime.ApiProxyImpl$1.run(ApiProxyImpl.java:180)
at java.security.AccessController.doPrivileged(Native Method)
at com.google.apphosting.runtime.ApiProxyImpl.makeSyncCall(ApiProxyImpl.java:180)
at com.google.apphosting.runtime.ApiProxyImpl.makeSyncCall(ApiProxyImpl.java:68)
at com.google.appengine.tools.appstats.Recorder.makeSyncCall(Recorder.java:323)
at com.googlecode.objectify.cache.TriggerFutureHook.makeSyncCall(TriggerFutureHook.java:154)
at com.google.apphosting.api.ApiProxy.makeSyncCall(ApiProxy.java:105)
at com.google.appengine.api.taskqueue.QueueApiHelper.makeSyncCall(QueueApiHelper.java:44)
at com.google.appengine.api.taskqueue.QueueImpl.leaseTasksInternal(QueueImpl.java:709)
at com.google.appengine.api.taskqueue.QueueImpl.leaseTasks(QueueImpl.java:731)
at uk.org.jaggard.myapp.BlahBlahBlahRunnable.run(BlahBlahBlahRunnable.java:53)
at com.google.apphosting.runtime.ApiProxyImpl$CurrentRequestThreadFactory$1$1.run(ApiProxyImpl.java:997)
at java.security.AccessController.doPrivileged(Native Method)
at com.google.apphosting.runtime.ApiProxyImpl$CurrentRequestThreadFactory$1.run(ApiProxyImpl.java:994)
at java.lang.Thread.run(Thread.java:679)
at com.google.apphosting.runtime.ApiProxyImpl$CurrentRequestThreadFactory$2$1.run(ApiProxyImpl.java:1031)
I suspect the warnings in your application's logs will contain
"Thread was interrupted, throwing CancelledException."
Java's InterruptedException is discussed in
http://www.ibm.com/developerworks/java/library/j-jtp05236/index.html
and the book "Java Concurrency in Practice".
I combined large parts of the netty examples HexdumpProxy (http://netty.io/docs/stable/xref/org/jboss/netty/example/proxy/) and SecureChat (http://netty.io/docs/stable/xref/org/jboss/netty/example/securechat) to form an SSL (and non-SSL) capable proxy (to a non-SSL back end). This seemed like a good idea, proved to be simple enough and it is exactly what I need, currently.
The proxy example code uses a trafficlock lock as a solution to a race condition reported in 2010 (http://markmail.org/message/x7jc6mqx6ripynqf) happening around saturated channels and setting the writeable and readable state of the incoming and outgoing channels.
Now, in my combined example, under higher load, this leads to a deadlock because another lock "handshakelock" in the SSL code is intertwined. See the profiler diagnostic output below.
I fear that, even after reading the original discussion, I do not understand the underlying issue with the traffic lock good enough to find a straightforward solution for this deadlock.
(this was with netty 3.2.6)
Java-level deadlock has been detected
This means that some threads are blocked waiting to enter a synchronization block or
waiting to reenter a synchronization block after an Object.wait() call, where each thread
owns one monitor while trying to obtain another monitor already held by another thread.
Deadlock:
New I/O client worker #1-2 is waiting to lock java.lang.Object#7900f3c9 which is held by New I/O server worker #1-2
New I/O server worker #1-2 is waiting to lock java.lang.Object#2d854f2f which is held by New I/O client worker #1-2
Thread stacks
New I/O client worker #1-2 [BLOCKED; waiting to lock java.lang.Object#7900f3c9]
org.jboss.netty.handler.ssl.SslHandler.wrap(SslHandler.java:665) <== sync handshakelock
org.jboss.netty.handler.ssl.SslHandler.handleDownstream(SslHandler.java:461)
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:776)
org.jboss.netty.channel.Channels.write(Channels.java:632)
org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:70)
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
org.jboss.netty.channel.Channels.write(Channels.java:611)
org.jboss.netty.channel.Channels.write(Channels.java:578)
org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:251)
com.activevideo.frontend.ProxyInboundHandler$OutboundHandler.messageReceived(ProxyInboundHandler.java:162) <== sync trafficlock
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783)
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317)
org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299)
org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351)
org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202)
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(unknown source)
java.util.concurrent.ThreadPoolExecutor$Worker.run(unknown source)
java.lang.Thread.run(unknown source)
New I/O server worker #1-2 [BLOCKED; waiting to lock java.lang.Object#2d854f2f]
com.activevideo.frontend.ProxyInboundHandler.channelInterestChanged(ProxyInboundHandler.java:138) <== sync trafficlock
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:116)
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783)
org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelInterestChanged(SimpleChannelUpstreamHandler.java:192)
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:116)
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783)
org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelInterestChanged(SimpleChannelUpstreamHandler.java:192)
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:116)
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783)
org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelInterestChanged(SimpleChannelUpstreamHandler.java:192)
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:116)
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
org.jboss.netty.channel.Channels.fireChannelInterestChanged(Channels.java:335)
org.jboss.netty.channel.socket.nio.NioWorker.setInterestOps(NioWorker.java:728)
org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:129)
org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:76)
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:771)
org.jboss.netty.handler.ssl.SslHandler.handleDownstream(SslHandler.java:430)
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:776)
org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:60)
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
org.jboss.netty.channel.Channels.setInterestOps(Channels.java:652)
org.jboss.netty.channel.AbstractChannel.setInterestOps(AbstractChannel.java:222)
org.jboss.netty.channel.AbstractChannel.setReadable(AbstractChannel.java:244)
com.activevideo.frontend.SSLProxyInboundHandler$BackendConnector$1.operationComplete(SSLProxyInboundHandler.java:92)
org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:381)
org.jboss.netty.channel.DefaultChannelFuture.addListener(DefaultChannelFuture.java:148)
com.activevideo.frontend.SSLProxyInboundHandler$BackendConnector.operationComplete(SSLProxyInboundHandler.java:87)
org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:381)
org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:367)
org.jboss.netty.channel.DefaultChannelFuture.setSuccess(DefaultChannelFuture.java:316)
org.jboss.netty.handler.ssl.SslHandler.setHandshakeSuccess(SslHandler.java:1040)
org.jboss.netty.handler.ssl.SslHandler.wrapNonAppData(SslHandler.java:838)
org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:907) <=== sync handshakelock
org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:620)
org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:214)
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351)
org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202)
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(unknown source)
java.util.concurrent.ThreadPoolExecutor$Worker.run(unknown source)
java.lang.Thread.run(unknown source)
Okay, I think I fixed it.
The solution to the earlier race condition in the Proxy example unnecessarily includes the call to write() in the synchronized block.
The original problem there was this (rare) race condition between the outgoing thread (TO) and incoming thread (TI):
TO: inboundChannel.write() (at messageReceived)
TO: inboundChannel.isWritable() returns false (at messageReceived)
Then pending writes issued at (1) are flushed
TI: inboundChannel.isWritable() returns true (at channelInterestChanged)
TI: outboundChannel.setReadable(true) (at channelInterestChanged)
TO: outboundChannel.setReadable(false) (at messageReceived)
The solution in 2010 was to introduce synchronizing (using 'trafficlock') around setting the readable flag in the messageReceived() handler of both the inboundChannel and outboundChannel (and also in the InterestChanged handler), like so:
synchronized (trafficLock) {
outboundChannel.write(msg);
// If outboundChannel is saturated, do not read until notified in
// OutboundHandler.channelInterestChanged().
if (!outboundChannel.isWritable()) {
e.getChannel().setReadable(false);
}
}
This indeed solves the race condition, as you prevent steps 3, 4 and 5 to interfere with the steps 2 and 6. However, it is safe to leave step 1, the write(), out of the synchronized block.
The call to write() caused the deadlock, because way down that call in the SSLHandler, it uses another lock handshakelock.
So.. I moved the call to write() outside the synchronized block in both locations. The deadlock is now gone. I suggest the 'official' proxy example is changed accordingly.