Jetty Websocket IdleTimeout - java

I've been working on annotated WebSockets lately with the Jetty API (release 9.4.5) and built a chat application with it.
However, I've hit an issue: after 5 minutes (which I believe is the default timeout), the session is closed (not because of an error).
The only solution I've found so far is to catch the close event on my socket and reopen the connection in a new socket.
However, I've read on Stack Overflow that setting the idle timeout in the WebSocketPolicy should avoid the issue:
I've tried setting it to 3600000, for instance, but the behavior does not change at all.
I also tried setting it to -1, but I get the following error: IdleTimeout [-1] must be a greater than or equal to 0
private ServletContextHandler setupWebsocketContext() {
    ServletContextHandler websocketContext = new AmosContextHandler(ServletContextHandler.SESSIONS | ServletContextHandler.SECURITY);
    WebSocketHandler socketCreator = new WebSocketHandler() {
        @Override
        public void configure(WebSocketServletFactory factory) {
            factory.getPolicy().setIdleTimeout(-1);
            factory.getPolicy().setMaxTextMessageBufferSize(MAX_MESSAGE_SIZE);
            factory.getPolicy().setMaxBinaryMessageBufferSize(MAX_MESSAGE_SIZE);
            factory.getPolicy().setMaxTextMessageSize(MAX_MESSAGE_SIZE);
            factory.getPolicy().setMaxBinaryMessageSize(MAX_MESSAGE_SIZE);
            factory.setCreator(new UpgradedSocketCreator());
        }
    };
    ServletHolder sh = new ServletHolder(new WebsocketChatServlet());
    websocketContext.addServlet(sh, "/*");
    websocketContext.setContextPath("/Chat");
    websocketContext.setHandler(socketCreator);
    websocketContext.getSessionHandler().setMaxInactiveInterval(0);
    return websocketContext;
}
I've also tried changing the policy directly in the OnConnect event by calling session.getPolicy().setIdleTimeout(), but it had no visible effect.
Is this expected behavior, or am I missing something? Thanks for your help.
EDIT:
Log on the closure:
Client Side:
2017-07-03T12:48:00.552 DEBUG HttpClient#179313750-scheduler Ignored idle endpoint SocketChannelEndPoint#2fb4b627{localhost/127.0.0.1:5080<->/127.0.0.1:53835,OPEN,fill=-,flush=-,to=1/300000}{io=0/0,kio=0,kro=1}->WebSocketClientConnection#e0198ece[ios=IOState#3ac0ec79[CLOSING,in,!out,close=CloseInfo[code=1000,reason=null],clean=false,closeSource=LOCAL],f=Flusher[queueSize=0,aggregateSize=0,failure=null],g=Generator[CLIENT,validating],p=Parser#65c4d838[ExtensionStack,s=START,c=0,len=187,f=null]]
Server side:
2017-07-03T12:48:00.595 DEBUG Idle pool thread onClose WebSocketServerConnection#e0033d54[ios=IOState#10d40dca[CLOSED,!in,!out,finalClose=CloseInfo[code=1000,reason=null],clean=true,closeSource=REMOTE],f=Flusher[queueSize=0,aggregateSize=0,failure=null],g=Generator[SERVER,validating],p=Parser#317213f3[ExtensionStack,s=START,c=0,len=2,f=CLOSE[len=2,fin=true,rsv=...,masked=true]]]<-SocketChannelEndPoint#690dfbfb'{'/127.0.0.1:53835<->/127.0.0.1:5080,CLOSED,fill=-,flush=-,to=1/360000000}'{'io=0/0,kio=-1,kro=-1}->WebSocketServerConnection#e0033d54[ios=IOState#10d40dca[CLOSED,!in,!out,finalClose=CloseInfo[code=1000,reason=null],clean=true,closeSource=REMOTE],f=Flusher[queueSize=0,aggregateSize=0,failure=null],g=Generator[SERVER,validating],p=Parser#317213f3[ExtensionStack,s=START,c=0,len=2,f=CLOSE[len=2,fin=true,rsv=...,masked=true]]]
2017-07-03T12:48:00.595 DEBUG Idle pool thread org.eclipse.jetty.util.thread.Invocable$InvocableExecutor#4f13dee2 invoked org.eclipse.jetty.io.ManagedSelector$$Lambda$193/682154970#551e133a
2017-07-03T12:48:00.595 DEBUG Idle pool thread EatWhatYouKill#6ba355e4/org.eclipse.jetty.io.ManagedSelector$SelectorProducer#7b1559f1/PRODUCING/0/1 produce exit
2017-07-03T12:48:00.595 DEBUG Idle pool thread ran EatWhatYouKill#6ba355e4/org.eclipse.jetty.io.ManagedSelector$SelectorProducer#7b1559f1/PRODUCING/0/1
2017-07-03T12:48:00.595 DEBUG Idle pool thread run EatWhatYouKill#6ba355e4/org.eclipse.jetty.io.ManagedSelector$SelectorProducer#7b1559f1/PRODUCING/0/1
2017-07-03T12:48:00.595 DEBUG Idle pool thread EatWhatYouKill#6ba355e4/org.eclipse.jetty.io.ManagedSelector$SelectorProducer#7b1559f1/PRODUCING/0/1 run
2017-07-03T12:48:00.597 DEBUG Idle pool thread 127.0.0.1 has disconnected !
2017-07-03T12:48:00.597 DEBUG Idle pool thread Disconnected: 127.0.0.1 (127.0.0.1) (statusCode= 1,000 , reason=null)

Annotated WebSockets have their own timeout settings in the annotation.
@WebSocket(maxIdleTime=30000)

The @WebSocket annotation has this option:
int maxIdleTime() default -2;
It is not obvious what that default means.
If you check the implementation, you will find:
if (anno.maxIdleTime() > 0)
{
    this.policy.setIdleTimeout(anno.maxIdleTime());
}
The method implementation:
/**
 * The time in ms (milliseconds) that a websocket may be idle before closing.
 *
 * @param ms
 *            the timeout in milliseconds
 */
public void setIdleTimeout(long ms)
{
    assertGreaterThan("IdleTimeout",ms,0);
    this.idleTimeout = ms;
}
and finally:
/**
 * The time in ms (milliseconds) that a websocket may be idle before closing.
 * <p>
 * Default: 300000 (ms)
 */
private long idleTimeout = 300000;
Conclusion: a negative value (or zero) in the annotation leaves the default behavior in place (300000 ms). You need to configure 'idleTimeout' according to your business needs.
PS: I solved my case with:
@WebSocket(maxIdleTime = Integer.MAX_VALUE)
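To make the conclusion concrete, here is a small stand-alone model of the logic quoted above. It is not Jetty code: the class IdleTimeoutModel and its method names are made up for illustration. Only the `> 0` check, the 300000 ms default, and the wording of the error message (taken from the question) come from this thread; that 0 is accepted is inferred from that message.

```java
/** Stand-alone model of the quoted snippets; NOT Jetty's classes (all names are hypothetical). */
final class IdleTimeoutModel {
    /** Default taken from the quoted WebSocketPolicy field. */
    static final long DEFAULT_IDLE_TIMEOUT_MS = 300_000L;

    private long idleTimeout = DEFAULT_IDLE_TIMEOUT_MS;

    /** Models the quoted setter: negative values are rejected (assumption based on the error text). */
    void setIdleTimeout(long ms) {
        if (ms < 0) {
            throw new IllegalArgumentException(
                "IdleTimeout [" + ms + "] must be a greater than or equal to 0");
        }
        this.idleTimeout = ms;
    }

    /** Models the quoted annotation handling: only positive values override the default. */
    void applyAnnotation(int maxIdleTime) {
        if (maxIdleTime > 0) {
            setIdleTimeout(maxIdleTime);
        }
    }

    long getIdleTimeout() {
        return idleTimeout;
    }

    /** Timeout that ends up in the policy for a given annotation value. */
    static long effectiveTimeout(int annotationMaxIdleTime) {
        IdleTimeoutModel policy = new IdleTimeoutModel();
        policy.applyAnnotation(annotationMaxIdleTime);
        return policy.getIdleTimeout();
    }

    public static void main(String[] args) {
        System.out.println(effectiveTimeout(-2));                // annotation default: 300000
        System.out.println(effectiveTimeout(30_000));            // explicit value: 30000
        System.out.println(effectiveTimeout(Integer.MAX_VALUE)); // 2147483647 (about 24.8 days)
    }
}
```

In this model the annotation default of -2 never reaches the setter, which is why the 5-minute default stays in effect until a positive maxIdleTime is given.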


MarkLogic Java API deadlock detection

One of our applications recently suffered from some nasty deadlocks. I had quite a hard time recreating the problem because the deadlock (or a stack trace) did not show up immediately in my Java application logs.
To my surprise, the MarkLogic Java API retries failing requests (e.g. because of a deadlock). This might make sense if your request is not part of a multi-statement transaction, but otherwise I'm not sure it does.
So let's stick with this deadlock problem. I created a simple code snippet in which I create a deadlock on purpose. The snippet creates a document test.xml, reads it in two different transactions, and then writes to it from each transaction on its own thread.
public static void main(String[] args) throws Exception {
    final Logger root = (Logger) LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME);
    final Logger ok = (Logger) LoggerFactory.getLogger(OkHttpServices.class);
    root.setLevel(Level.ALL);
    ok.setLevel(Level.ALL);
    final DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8000, new DatabaseClientFactory.DigestAuthContext("username", "password"));
    final StringHandle handle = new StringHandle("<doc><name>Test</name></doc>")
        .withFormat(Format.XML);
    client.newTextDocumentManager().write("test.xml", handle);
    root.info("t1: opening");
    final Transaction t1 = client.openTransaction();
    root.info("t1: reading");
    client.newXMLDocumentManager()
        .read("test.xml", new StringHandle(), t1);
    root.info("t2: opening");
    final Transaction t2 = client.openTransaction();
    root.info("t2: reading");
    client.newXMLDocumentManager()
        .read("test.xml", new StringHandle(), t2);
    new Thread(() -> {
        root.info("t1: writing");
        client.newXMLDocumentManager().write("test.xml", new StringHandle("<doc><t>t1</t></doc>").withFormat(Format.XML), t1);
        t1.commit();
    }).start();
    new Thread(() -> {
        root.info("t2: writing");
        client.newXMLDocumentManager().write("test.xml", new StringHandle("<doc><t>t2</t></doc>").withFormat(Format.XML), t2);
        t2.commit();
    }).start();
    TimeUnit.MINUTES.sleep(5);
    client.release();
}
This code will produce the following log:
14:12:27.437 [main] DEBUG c.m.client.impl.OkHttpServices - Connecting to localhost at 8000 as admin
14:12:27.570 [main] DEBUG c.m.client.impl.OkHttpServices - Sending test.xml document in transaction null
14:12:27.608 [main] INFO ROOT - t1: opening
14:12:27.609 [main] DEBUG c.m.client.impl.OkHttpServices - Opening transaction
14:12:27.962 [main] INFO ROOT - t1: reading
14:12:27.963 [main] DEBUG c.m.client.impl.OkHttpServices - Getting test.xml in transaction 5298588351036278526
14:12:28.283 [main] INFO ROOT - t2: opening
14:12:28.283 [main] DEBUG c.m.client.impl.OkHttpServices - Opening transaction
14:12:28.286 [main] INFO ROOT - t2: reading
14:12:28.286 [main] DEBUG c.m.client.impl.OkHttpServices - Getting test.xml in transaction 8819382734425123844
14:12:28.289 [Thread-1] INFO ROOT - t1: writing
14:12:28.289 [Thread-1] DEBUG c.m.client.impl.OkHttpServices - Sending test.xml document in transaction 5298588351036278526
14:12:28.289 [Thread-2] INFO ROOT - t2: writing
14:12:28.290 [Thread-2] DEBUG c.m.client.impl.OkHttpServices - Sending test.xml document in transaction 8819382734425123844
Neither t1 nor t2 gets committed. The MarkLogic logs confirm that there actually is a deadlock:
==> /var/opt/MarkLogic/Logs/8000_AccessLog.txt <==
127.0.0.1 - admin [24/Nov/2018:14:12:30 +0000] "PUT /v1/documents?txid=5298588351036278526&category=content&uri=test.xml HTTP/1.1" 503 1034 - "okhttp/3.9.0"
==> /var/opt/MarkLogic/Logs/ErrorLog.txt <==
2018-11-24 14:12:30.719 Info: Deadlock detected locking Documents test.xml
This would not be a problem if one of the requests failed and threw an exception, but that is not the case. The MarkLogic Java API retries every request for up to 120 seconds, and one of the updates times out after roughly 120 seconds:
Exception in thread "Thread-1" com.marklogic.client.FailedRequestException: Service unavailable and maximum retry period elapsed: 121 seconds after 65 retries
at com.marklogic.client.impl.OkHttpServices.putPostDocumentImpl(OkHttpServices.java:1422)
at com.marklogic.client.impl.OkHttpServices.putDocument(OkHttpServices.java:1256)
at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:920)
at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:758)
at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:717)
at Scratch.lambda$main$0(scratch.java:40)
at java.lang.Thread.run(Thread.java:748)
What are possible ways to overcome this problem? One way might be to set a maximum time to live for a transaction (say 5 seconds), but this feels hacky and unreliable. Any other ideas? Are there any other settings I should check out?
I'm on MarkLogic 9.0-7.2 and using marklogic-client-api:4.0.3.
Edit: One way to avoid the deadlock is to synchronize the calling function, which is how I solved it in my case (see comments). But I think the underlying problem still exists. A deadlock in a multi-statement transaction should not be hidden behind a 120-second timeout. I would rather have a request that fails immediately than a 120-second lock on one of my documents plus 64 failing retries per thread.
Deadlocks are usually resolvable by retrying. Internally, the server runs an inner retry loop because deadlocks are usually transient and incidental, lasting a very short time. In your case, you have constructed a scenario that will never succeed with any timeout that is equal for both threads.
Deadlocks can be avoided at the application layer by avoiding multi-statement transactions when using the REST API (which is what the Java API uses).
Multi-statement transactions over REST cannot be implemented 100% safely, because the client is responsible for managing the transaction ID and the server cannot detect client-side errors or client-side identity. Very subtle problems can and do occur unless you are aggressively proactive about handling errors and multithreading. If you 'push' the logic to the server (XQuery or JavaScript), the server is able to manage things much better.
As for whether it is 'good' for the Java API to implement retries in this case, that's debatable either way. The compromise for a seemingly easy-to-use interface is that many things that would otherwise be options are decided for you by convention. There's generally no one-size-fits-all answer. In this case I presume the thinking was that a deadlock is more likely caused 'by accident' by independent code than by identical code running in tandem, and a retry would be a good choice in that case. In your example it's not, but then an earlier error would still fail predictably until you change your code to 'not do that'.
If it doesn't already exist, a feature request for configurable timeout and retry behaviour seems reasonable. I would recommend, however, trying to avoid any REST calls that result in an open transaction. That is inherently problematic, particularly if you don't notice the problem up front (then it's more likely to bite you in production). Unlike JDBC, which keeps a connection open so that the server can detect client disconnects, HTTP and the ML REST API do not, which leads to a different programming model than traditional database coding in Java.
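For the "synchronize the calling function" workaround mentioned in the question's edit, a minimal sketch might look like the following. It uses only the JDK; UriSerializer and withUriLock are hypothetical names, and the counter update merely stands in for the read/write/commit sequence. Note the big assumption: this only serializes callers inside one JVM and does nothing about other clients touching the same document.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical helper: lets only one caller at a time work on a given document URI (per JVM). */
final class UriSerializer {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs the work while holding the lock that belongs to the URI. */
    <T> T withUriLock(String uri, Supplier<T> work) {
        ReentrantLock lock = locks.computeIfAbsent(uri, k -> new ReentrantLock());
        lock.lock();
        try {
            return work.get();
        } finally {
            lock.unlock();
        }
    }

    /** Demo: two threads perform a read-modify-write on the same "document". */
    static int demo() {
        UriSerializer serializer = new UriSerializer();
        int[] counter = {0};
        Runnable writer = () -> {
            for (int i = 0; i < 1_000; i++) {
                serializer.withUriLock("test.xml", () -> {
                    int read = counter[0];  // stands in for "open transaction and read"
                    counter[0] = read + 1;  // stands in for "write and commit"
                    return null;
                });
            }
        };
        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2000: no update is lost because the work is serialized
    }
}
```

With the real client, the body passed to withUriLock would be the whole open/read/write/commit sequence for that URI, so two transactions never hold read locks on the same document at the same time.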

Taking 5 seconds to shutdown a java grpc ManagedChannel

I have a client that needs to disconnect from one server and connect to another. It's taking about 16 seconds. I still haven't debugged the connection logic, but I can see that shutting down the channel takes 5 seconds. Is this expected behavior, or should I be looking for thread starvation in my code?
LOG.debug("==============SHUTTING DOWN MANAGED CHANNEL");
long startTime=System.currentTimeMillis();
channel.shutdown().awaitTermination(20, SECONDS);
long endTime=System.currentTimeMillis();
LOG.debug("Time to shutdown channel ms = {}",endTime-startTime);
LOG.debug("==============RETURN FROM SHUTTING DOWN MANAGED CHANNEL");
From the log:
2018-07-09 14:41:23,143 DEBUG [com.ticomgeo.ftc.client.FTCClient] (EE-ManagedExecutorService-singleThreaded-Thread-1) ==============SHUTTING DOWN MANAGED CHANNEL
2018-07-09 14:41:28,151 INFO [io.grpc.internal.ManagedChannelImpl] (grpc-default-worker-ELG-1-1) [io.grpc.internal.ManagedChannelImpl-1] Terminated
2018-07-09 14:41:28,152 DEBUG [com.ticomgeo.ftc.client.FTCClient] (EE-ManagedExecutorService-singleThreaded-Thread-1) Time to shutdown channel ms = 5009
2018-07-09 14:41:28,152 DEBUG [com.ticomgeo.ftc.client.FTCClient] (EE-ManagedExecutorService-singleThreaded-Thread-1) ==============RETURN FROM SHUTTING DOWN MANAGED CHANNEL
There are two shutdown functions, shutdown and shutdownNow. Is there any chance you have calls in flight that are blocking shutdown? You may be better served by shutdownNow.
shutdown
Initiates an orderly shutdown in which preexisting calls continue but new calls are rejected.
shutdownNow
Initiates a forceful shutdown in which preexisting and new calls are rejected. Although forceful, the shutdown process is still not instantaneous; isTerminated() will likely return false immediately after this method returns.
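A common way to combine the two is to try an orderly shutdown with a short grace period and only then force it. The sketch below shows that pattern on a plain java.util.concurrent.ExecutorService as a stand-in, because it needs no gRPC dependency; ManagedChannel exposes methods with the same names (shutdown, shutdownNow, awaitTermination, isTerminated). GracefulShutdown and the timeout values are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper showing "orderly first, forceful second" on an ExecutorService stand-in. */
final class GracefulShutdown {
    /** Returns true if the service terminated within the grace period or after being forced. */
    static boolean shutdownWithGrace(ExecutorService service, long graceMillis, long forceMillis)
            throws InterruptedException {
        service.shutdown(); // orderly: running work continues, new work is rejected
        if (service.awaitTermination(graceMillis, TimeUnit.MILLISECONDS)) {
            return true;
        }
        service.shutdownNow(); // forceful: running work is interrupted
        return service.awaitTermination(forceMillis, TimeUnit.MILLISECONDS);
    }

    /** Demo: a long-running task would block an orderly shutdown for 30 seconds. */
    static boolean demo() {
        ExecutorService service = Executors.newSingleThreadExecutor();
        service.submit(() -> {
            try {
                TimeUnit.SECONDS.sleep(30); // stands in for a call that is still in flight
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            return shutdownWithGrace(service, 100, 2_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true: terminated after the forceful step
    }
}
```

With a channel, the same three calls would be made on the ManagedChannel instead of the executor. Whether that removes the 5 seconds depends on what is actually keeping the channel busy.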

Connecting to dcm4chee using dcm4che from a JAVA program

Update
I dug deeper in dcm4che's source code and found that an IncompatibleConnectionException is thrown if either
a connection is "not installed"
or the types of protocols are not set or don't match.
I don't know what it means that a connection is "installed" but this flag can be set manually, so I set it for both the local and remote connections to true (even checked them with getInstalled() whether they are "installed" - and yes they are now - previously this property was null).
And as to the protocols, they weren't specified, so for both connections I set them to DICOM.
Results: I still get the same Exception.
I'd like to establish a DICOM association between dcm4chee (2.18.3) and my JAVA application using the dcm4che (5.12.0) toolkit.
The problem is that there doesn't seem to be any documentation on how to use dcm4che in a Java application, so all I can do is read dcm4che's source code and try to figure out what its classes and methods are for, but I'm stuck. If someone already has a working example, it would be very helpful.
So far I have:
import org.dcm4che3.net.ApplicationEntity;
import org.dcm4che3.net.Association;
import org.dcm4che3.net.Connection;
import org.dcm4che3.net.Device;
import org.dcm4che3.net.pdu.AAssociateRQ;
import org.dcm4che3.net.pdu.PresentationContext;
...
ApplicationEntity locAE = new ApplicationEntity();
locAE.setAETitle("THIS_JAVA_APP");
Connection localConn = new Connection();
localConn.setCommonName("loc_conn");
localConn.setHostname("localhost");
localConn.setPort(11112);
localConn.setProtocol(Connection.Protocol.DICOM);
localConn.setInstalled(true);
locAE.addConnection(localConn);
ApplicationEntity remAE = new ApplicationEntity();
remAE.setAETitle("DCM4CHEE");
Connection remoteConn = new Connection();
remoteConn.setCommonName("rem_conn");
remoteConn.setHostname("localhost");
remoteConn.setPort(11112);
remoteConn.setProtocol(Connection.Protocol.DICOM);
remoteConn.setInstalled(true);
remAE.addConnection(remoteConn);
AAssociateRQ assocReq = new AAssociateRQ();
assocReq.setCalledAET(remAE.getAETitle());
assocReq.setCallingAET(locAE.getAETitle());
assocReq.setApplicationContext("1.2.840.10008.3.1.1.1");
assocReq.setImplClassUID("1.2.40.0.13.1.3");
assocReq.setImplVersionName("dcm4che-5.12.0");
assocReq.setMaxPDULength(16384);
assocReq.setMaxOpsInvoked(0);
assocReq.setMaxOpsPerformed(0);
assocReq.addPresentationContext(new PresentationContext(
1, "1.2.840.10008.1.1", "1.2.840.10008.1.2"));
Device device = new Device("device");
device.addConnection(localConn);
device.addApplicationEntity(locAE);
Association assoc = locAE.connect(remAE, assocReq);
but I don't know whether I'm on the right path doing it.
The error I get:
org.dcm4che3.net.IncompatibleConnectionException: No compatible connection to DCM4CHEE available on THIS_JAVA_APP
at org.dcm4che3.net.ApplicationEntity.findCompatibelConnection(ApplicationEntity.java:646)
at org.dcm4che3.net.ApplicationEntity.connect(ApplicationEntity.java:651)
Could it be that you are missing a Device instance in your setup? It seems that you need a Device to which you attach both the ApplicationEntity and the Connection.
Looking at FindSCU.java in the dcm4che source:
private final Device device = new Device("findscu");
private final ApplicationEntity ae = new ApplicationEntity("FINDSCU");
private final Connection conn = new Connection();
public FindSCU() throws IOException {
    device.addConnection(conn);
    device.addApplicationEntity(ae);
    ae.addConnection(conn);
}
I also think that the local Connection object can perhaps be instantiated without any parameters, as the FindSCU example demonstrates. Maybe the parameters are confusing it somehow, especially considering that both your local and remote connections point to localhost:11112.
But yes, one has to agree that the documentation for the dcm4che3 API is totally inadequate.
Here is the working code: (I don't know if it's the minimal solution, feel free to experiment with it...)
ApplicationEntity locAE = new ApplicationEntity();
locAE.setAETitle("THIS_JAVA_APP");
locAE.setInstalled(true);
Connection localConn = new Connection();
localConn.setCommonName("loc_conn");
localConn.setHostname("localhost");
localConn.setPort(11112);
localConn.setProtocol(Connection.Protocol.DICOM);
localConn.setInstalled(true);
locAE.addConnection(localConn);
ApplicationEntity remAE = new ApplicationEntity();
remAE.setAETitle("DCM4CHEE");
remAE.setInstalled(true);
Connection remoteConn = new Connection();
remoteConn.setCommonName("rem_conn");
remoteConn.setHostname("localhost");
remoteConn.setPort(11112);
remoteConn.setProtocol(Connection.Protocol.DICOM);
remoteConn.setInstalled(true);
remAE.addConnection(remoteConn);
AAssociateRQ assocReq = new AAssociateRQ();
assocReq.setCalledAET(remAE.getAETitle());
assocReq.setCallingAET(locAE.getAETitle());
assocReq.setApplicationContext("1.2.840.10008.3.1.1.1");
assocReq.setImplClassUID("1.2.40.0.13.1.3");
assocReq.setImplVersionName("dcm4che-5.12.0");
assocReq.setMaxPDULength(16384);
assocReq.setMaxOpsInvoked(0);
assocReq.setMaxOpsPerformed(0);
assocReq.addPresentationContext(new PresentationContext(
1, "1.2.840.10008.1.1", "1.2.840.10008.1.2"));
Device device = new Device("device");
device.addConnection(localConn);
device.addApplicationEntity(locAE);
Executor exec = (Runnable command) -> {};
device.setExecutor(exec);
Association assoc = locAE.connect(localConn, remoteConn, assocReq);
And the relevant dcm4chee log:
2018-03-02 23:21:42,832 INFO THIS_JAVA_APP->DCM4CHEE (TCPServer-1) [org.dcm4cheri.net.FsmImpl] received AAssociateRQ
appCtxName: 1.2.840.10008.3.1.1.1/DICOM Application Context Name
implClass: 1.2.40.0.13.1.3
implVersion: dcm4che-5.12.0
calledAET: DCM4CHEE
callingAET: THIS_JAVA_APP
maxPDULen: 16378
asyncOpsWindow:
pc-1: as=1.2.840.10008.1.1/Verification SOP Class
ts=1.2.840.10008.1.2/Implicit VR Little Endian
2018-03-02 23:21:42,843 INFO THIS_JAVA_APP->DCM4CHEE (TCPServer-1) [org.dcm4cheri.net.FsmImpl] sending AAssociateAC
appCtxName: 1.2.840.10008.3.1.1.1/DICOM Application Context Name
implClass: 1.2.40.0.13.1.1.1
implVersion: dcm4che-1.4.34
calledAET: DCM4CHEE
callingAET: THIS_JAVA_APP
maxPDULen: 16352
asyncOpsWindow:
pc-1: 0 - acceptance
ts=1.2.840.10008.1.2/Implicit VR Little Endian
After you have the association, see this other post for how to perform a C-FIND.
Edit
Apparently, I solved the problem. Changing the executor from
Executor exec = (Runnable command) -> {};
device.setExecutor(exec);
to
ExecutorService executorService = Executors.newSingleThreadExecutor();
ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
device.setExecutor(executorService);
device.setScheduledExecutor(scheduledExecutorService);
made it so my application correctly received the association response from the server. This might serve as reference for someone else.
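For anyone wondering why that change matters: an Executor written as `(Runnable command) -> {}` accepts every task and silently drops it. My guess (not verified against the dcm4che source) is that the library hands the job of reading the server's reply to the Device's executor, so with the no-op version nothing ever reads the A-ASSOCIATE-AC. The JDK-only sketch below just demonstrates the executor behaviour; NoOpExecutorDemo and its methods are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo: a no-op Executor never runs what it is given, a real one does. */
final class NoOpExecutorDemo {
    /** Submits one task and reports whether it actually ran within the timeout. */
    static boolean taskRuns(Executor executor, long timeoutMillis) {
        CountDownLatch ran = new CountDownLatch(1);
        executor.execute(ran::countDown);
        try {
            return ran.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Same executor as in the non-working code: the task is discarded. */
    static boolean noOpRuns() {
        Executor noOp = (Runnable command) -> {};
        return taskRuns(noOp, 200);
    }

    /** Same kind of executor as in the fix: the task runs on a worker thread. */
    static boolean realRuns() {
        ExecutorService real = Executors.newSingleThreadExecutor();
        try {
            return taskRuns(real, 2_000);
        } finally {
            real.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(noOpRuns()); // false
        System.out.println(realRuns()); // true
    }
}
```

If that guess is right, it would also explain the logs further down: the server reaches Sta6 while the client stays in Sta5, because the reply is sent but never processed on the client side.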
Thank you for sharing your code. It was really helpful to me.
Original Post
I am unable to establish the connection with code similar to the solution you proposed. I am trying to request an association with a dcm4chee-arc-light using dcm4che (both 5.14.1), and I have the following:
Device device = new Device(deviceName);
ApplicationEntity locAE = new ApplicationEntity(localAE);
Connection conn = new Connection();
Connection remote = new Connection();
AAssociateRQ rq = new AAssociateRQ();
device.addConnection(conn);
device.addApplicationEntity(locAE);
locAE.addConnection(conn);
ApplicationEntity remAE = new ApplicationEntity();
remAE.setAETitle(remoteAE);
remote.setCommonName("rem_conn");
remote.setHostname(remoteIP);
remote.setPort(remotePort);
remote.setProtocol(Connection.Protocol.DICOM);
remAE.addConnection(remote);
rq.setCalledAET(remAE.getAETitle());
rq.setCallingAET(locAE.getAETitle());
rq.setApplicationContext("1.2.840.10008.3.1.1.1");
rq.setImplClassUID("1.2.40.0.13.1.3");
rq.setImplVersionName("dcm4che-5.14.1");
rq.setMaxPDULength(16384);
rq.setMaxOpsInvoked(0);
rq.setMaxOpsPerformed(0);
rq.addPresentationContext(new PresentationContext(
1, "1.2.840.10008.5.1.4.1.2.2.1", "1.2.840.10008.1.2"));
Executor exec = (Runnable command) -> {};
device.setExecutor(exec);
//Opens association and connects to remote server
Association as = locAE.connect(conn, remote, rq);
But when trying to connect to a remote AET, my application doesn't seem to receive the A-ASSOCIATE-AC response from it. My Java application hangs in Sta5 (awaiting the association response) while the server sits in Sta6 (ready for data transfer).
Java log:
[main] INFO org.dcm4che3.net.Connection - Initiate connection from 0.0.0.0/0.0.0.0:0 to localhost:11112
[main] INFO org.dcm4che3.net.Connection - Established connection Socket[addr=localhost/127.0.0.1,port=11112,localport=50101]
[main] DEBUG org.dcm4che3.net.Association - /127.0.0.1:50101>localhost/127.0.0.1:11112(1): enter state: Sta4 - Awaiting transport connection opening to complete
[main] INFO org.dcm4che3.net.Association - DEVICEAE->DCMQRSCP(1) << A-ASSOCIATE-RQ
[main] DEBUG org.dcm4che3.net.Association - A-ASSOCIATE-RQ[
calledAET: DCMQRSCP
callingAET: DEVICEAE
applicationContext: 1.2.840.10008.3.1.1.1 - DICOM Application Context Name
implClassUID: 1.2.40.0.13.1.3
implVersionName: dcm4che-5.14.1
maxPDULength: 16378
maxOpsInvoked/maxOpsPerformed: 1/1
PresentationContext[id: 1
as: 1.2.840.10008.5.1.4.1.2.2.1 - Study Root Query/Retrieve Information Model - FIND
ts: 1.2.840.10008.1.2 - Implicit VR Little Endian
]
]
[main] DEBUG org.dcm4che3.net.Association - DEVICEAE->DCMQRSCP(1): enter state: Sta5 - Awaiting A-ASSOCIATE-AC or A-ASSOCIATE-RJ PDU
Server log:
19:11:29,397 INFO - Accept connection Socket[addr=/127.0.0.1,port=50101,localport=11112]
19:11:29,397 DEBUG - /127.0.0.1:11112<-/127.0.0.1:50101(3): enter state: Sta2 - Transport connection open
19:11:29,416 INFO - DCMQRSCP<-DEVICEAE(3) >> A-ASSOCIATE-RQ
19:11:29,416 DEBUG - A-ASSOCIATE-RQ[
calledAET: DCMQRSCP
callingAET: DEVICEAE
applicationContext: 1.2.840.10008.3.1.1.1 - DICOM Application Context Name
implClassUID: 1.2.40.0.13.1.3
implVersionName: dcm4che-5.14.1
maxPDULength: 16378
maxOpsInvoked/maxOpsPerformed: 1/1
PresentationContext[id: 1
as: 1.2.840.10008.5.1.4.1.2.2.1 - Study Root Query/Retrieve Information Model - FIND
ts: 1.2.840.10008.1.2 - Implicit VR Little Endian
]
]
19:11:29,419 DEBUG - DCMQRSCP<-DEVICEAE(3): enter state: Sta3 - Awaiting local A-ASSOCIATE response primitive
19:11:29,419 INFO - DCMQRSCP<-DEVICEAE(3) << A-ASSOCIATE-AC
19:11:29,419 DEBUG - A-ASSOCIATE-AC[
calledAET: DCMQRSCP
callingAET: DEVICEAE
applicationContext: 1.2.840.10008.3.1.1.1 - DICOM Application Context Name
implClassUID: 1.2.40.0.13.1.3
implVersionName: dcm4che-5.14.1
maxPDULength: 16378
maxOpsInvoked/maxOpsPerformed: 1/1
PresentationContext[id: 1
result: 0 - acceptance
ts: 1.2.840.10008.1.2 - Implicit VR Little Endian
]
]
19:11:29,427 DEBUG - DCMQRSCP<-DEVICEAE(3): enter state: Sta6 - Association established and ready for data transfer
I feel like I am missing something, but I cannot find the source of the problem. Any help is appreciated, as I am still new to dcm4che and the DICOM protocol.
Thank you.

Hazelcast Operation Heartbeat Timeouts appearing sporadically

We have a Hazelcast client (3.7.4):
//Initializes Hazelcast client config
ClientConfig aHazelcastClientConfig = new ClientConfig();
String aHazelcastUrl = this.getHost()+":"+this.getPort().toString();
ClientNetworkConfig aHazelcastNetworkConfig=
aHazelcastClientConfig.getNetworkConfig();
aHazelcastNetworkConfig.addAddress(aHazelcastUrl);
GroupConfig group = new GroupConfig (getGroupName(),getGroupPassword());
aHazelcastClientConfig.setGroupConfig(group);
HazelcastInstance aHazelcastClient=
HazelcastClient.newHazelcastClient(aHazelcastClientConfig);
...
IMap aMonitoredMap = aHazelcastClient.getMap(getMonitoredMap());
that periodically checks one Hazelcast server (3.7.4). We have observed that the following exception sometimes appears on the client side:
InitializeDistributedObjectOperation invocation failed to complete due to operation-heartbeat-timeout. Current time: 2017-02-07 18:07:30.329. Total elapsed time: 120189 ms. Last operation heartbeat: never. Last operation heartbeat from member: 2017-02-07 18:05:37.489. Invocation{op=com.hazelcast.spi.impl.proxyservice.impl.operations.InitializeDistributedObjectOperation{serviceName='hz:impl:mapService', identityHash=9759664, partitionId=-1, replicaIndex=0, callId=0, invocationTime=1486487130140 (2017-02-07 18:05:30.140), waitTimeout=-1, callTimeout=60000}, tryCount=1, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=60000, firstInvocationTimeMs=1486487130140, firstInvocationTime='2017-02-07 18:05:30.140', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 01:00:00.000', target=[10.118.152.82]:5720, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=7, /172.22.191.200:5720->/10.118.152.82:42563, endpoint=[10.118.152.82]:5720, alive=true, type=MEMBER]}
It seems the maximum call waiting timeout (60000 ms by default) is being reached. In the example above, the total elapsed time is more than 2 minutes (120189 ms).
The problem appears sporadically, without any regular pattern.
The network seems to be working correctly when it appears, so we can rule out a network connectivity issue.
Any hints or recommendations about what could be causing it?
Thanks a lot.
Best Regards,
Jorge

Terracotta Ehcache: server disconnects during debug

I found out that when I attach a debugger to the application and start debugging,
the connection to the Terracotta server is lost (?) and the following messages appear in the Terracotta server logs:
2012-03-30 13:45:06,758 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Server - 127.0.0.1:55112 might be in Long GC. GC count since last ping reply : 1
2012-03-30 13:45:27,761 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Server - 127.0.0.1:55112 might be in Long GC. GC count since last ping reply : 1
2012-03-30 13:45:31,761 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Server - 127.0.0.1:55112 might be in Long GC. GC count since last ping reply : 2
...
2012-03-30 13:46:37,768 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] ERROR com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Server - 127.0.0.1:55112 might be in Long GC. GC count since last ping reply : 10. But its too long. No more retries
2012-03-30 13:46:38,768 [HealthChecker] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Server - 127.0.0.1:55112 is DEAD
2012-03-30 13:46:38,768 [HealthChecker] ERROR com.tc.net.protocol.transport.ConnectionHealthCheckerImpl: DSO Server - Declared connection dead ConnectionID(1.0b1994ac80f14b7191080bdc3f38582a) idle time 45317ms
2012-03-30 13:46:38,768 [L2_L1:TCWorkerComm # 0_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(1.0b1994ac80f14b7191080bdc3f38582a): CLOSE EVENT : com.tc.net.core.TCConnectionJDK14#5158277: connected: false, closed: true local=127.0.0.1:9510 remote=127.0.0.1:55112 connect=[Fri Mar 30 13:34:22 BST 2012] idle=2001ms [207584 read, 229735 write]. STATUS : DISCONNECTED
...
2012-03-30 13:46:38,799 [L2_L1:TCWorkerComm # 0_R] INFO com.tc.objectserver.persistence.sleepycat.SleepycatPersistor - Deleted client state for ChannelID=[1]
2012-03-30 13:46:38,801 [WorkerThread(channel_life_cycle_stage, 0)] INFO com.tc.objectserver.handler.ChannelLifeCycleHandler - : Received transport disconnect. Shutting down client ClientID[1]
2012-03-30 13:46:38,801 [WorkerThread(channel_life_cycle_stage, 0)] INFO com.tc.objectserver.persistence.impl.TransactionStoreImpl - shutdownClient() : Removing txns from DB : 0
After this happens, any cache operation, such as getWithLoader, simply does not respond until the Terracotta server is restarted.
Question: how can this be fixed or reconfigured? I assume it can also happen in production (and actually sometimes does) if the application hangs or stalls for any reason.
This is just to get you started.
TC connections between server and client are considered dead when the applicable HealthCheck fails. The default values for the HealthCheck assume a very stable and performant network. I recommend you familiarize yourself with the details and the calculations at
http://www.terracotta.org/documentation/3.5.2/terracotta-server-array/high-availability#85916
So typically you begin with
a) making sure your network doesn't hiccup occasionally
b) setting the TC HealthCheck values a bit higher
If the problem persists, I'd recommend posting directly on the TC forums (they'll help you even if you only use the open-source edition, though it may take a few days to get a reply).
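To get a feel for what "a bit higher" buys you, here is a rough calculator. Everything in it is an assumption to check against the linked page: the formula (idle time plus socketConnectCount rounds of ping probes and socket-connect waits) is my recollection of that documentation, the parameter names only loosely follow its property names, and the numbers are example inputs rather than confirmed defaults.

```java
/** Hypothetical calculator for the HealthCheck detection window; formula recalled from the docs. */
final class HealthCheckWindow {
    /**
     * Assumed formula:
     * idletime + socketConnectCount * (interval * probes + socketConnectTimeout * interval)
     */
    static long maxDetectionMillis(long pingIdleMillis, long pingIntervalMillis, int pingProbes,
                                   int socketConnectTimeout, int socketConnectCount) {
        long perRound = pingIntervalMillis * pingProbes + socketConnectTimeout * pingIntervalMillis;
        return pingIdleMillis + socketConnectCount * perRound;
    }

    public static void main(String[] args) {
        // Example inputs only; read the real values from your tc-config / tc.properties.
        System.out.println(maxDetectionMillis(5_000, 1_000, 3, 5, 10));  // 85000
        // Doubling the ping interval roughly doubles the window in this model.
        System.out.println(maxDetectionMillis(5_000, 2_000, 3, 5, 10));  // 165000
    }
}
```

If the formula holds, a debugger pause longer than the computed window would get the client declared dead, so raising the values on a development setup is one way to debug without being disconnected.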
