Update
I dug deeper in dcm4che's source code and found that an IncompatibleConnectionException is thrown if either
a connection is "not installed"
or the types of protocols are not set or don't match.
I don't know what it means for a connection to be "installed", but the flag can be set manually, so I set it to true for both the local and the remote connection. I also verified with getInstalled() that both are now "installed" (previously this property was null).
And as to the protocols, they weren't specified, so for both connections I set them to DICOM.
Results: I still get the same Exception.
I'd like to establish a DICOM association between dcm4chee (2.18.3) and my Java application using the dcm4che (5.12.0) toolkit.
The problem is that there doesn't seem to be any documentation on how to use dcm4che in a Java application, so all I can do is read dcm4che's source code and try to figure out what its classes and methods are for, but I'm stuck. If someone already has a working example, it would be very helpful.
So far I have:
import org.dcm4che3.net.ApplicationEntity;
import org.dcm4che3.net.Association;
import org.dcm4che3.net.Connection;
import org.dcm4che3.net.Device;
import org.dcm4che3.net.pdu.AAssociateRQ;
import org.dcm4che3.net.pdu.PresentationContext;
...
ApplicationEntity locAE = new ApplicationEntity();
locAE.setAETitle("THIS_JAVA_APP");
Connection localConn = new Connection();
localConn.setCommonName("loc_conn");
localConn.setHostname("localhost");
localConn.setPort(11112);
localConn.setProtocol(Connection.Protocol.DICOM);
localConn.setInstalled(true);
locAE.addConnection(localConn);
ApplicationEntity remAE = new ApplicationEntity();
remAE.setAETitle("DCM4CHEE");
Connection remoteConn = new Connection();
remoteConn.setCommonName("rem_conn");
remoteConn.setHostname("localhost");
remoteConn.setPort(11112);
remoteConn.setProtocol(Connection.Protocol.DICOM);
remoteConn.setInstalled(true);
remAE.addConnection(remoteConn);
AAssociateRQ assocReq = new AAssociateRQ();
assocReq.setCalledAET(remAE.getAETitle());
assocReq.setCallingAET(locAE.getAETitle());
assocReq.setApplicationContext("1.2.840.10008.3.1.1.1");
assocReq.setImplClassUID("1.2.40.0.13.1.3");
assocReq.setImplVersionName("dcm4che-5.12.0");
assocReq.setMaxPDULength(16384);
assocReq.setMaxOpsInvoked(0);
assocReq.setMaxOpsPerformed(0);
assocReq.addPresentationContext(new PresentationContext(
1, "1.2.840.10008.1.1", "1.2.840.10008.1.2"));
Device device = new Device("device");
device.addConnection(localConn);
device.addApplicationEntity(locAE);
Association assoc = locAE.connect(remAE, assocReq);
but I don't know whether I'm on the right path doing it.
The error I get:
org.dcm4che3.net.IncompatibleConnectionException: No compatible connection to DCM4CHEE available on THIS_JAVA_APP
at org.dcm4che3.net.ApplicationEntity.findCompatibelConnection(ApplicationEntity.java:646)
at org.dcm4che3.net.ApplicationEntity.connect(ApplicationEntity.java:651)
Could it be that you are missing a Device instance in your setup? It seems that you need a Device to which you attach both the ApplicationEntity and the Connection.
Looking at FindSCU.java in the dcm4che source:
private final Device device = new Device("findscu");
private final ApplicationEntity ae = new ApplicationEntity("FINDSCU");
private final Connection conn = new Connection();
public FindSCU() throws IOException {
device.addConnection(conn);
device.addApplicationEntity(ae);
ae.addConnection(conn);
}
I also think that the local Connection object can probably be instantiated without any parameters, as the FindSCU example above demonstrates. The parameters may be confusing it somehow, especially since both your local and remote connections point to localhost:11112.
But yes, one has to agree that the documentation for the dcm4che3 API is totally inadequate.
Here is the working code: (I don't know if it's the minimal solution, feel free to experiment with it...)
ApplicationEntity locAE = new ApplicationEntity();
locAE.setAETitle("THIS_JAVA_APP");
locAE.setInstalled(true);
Connection localConn = new Connection();
localConn.setCommonName("loc_conn");
localConn.setHostname("localhost");
localConn.setPort(11112);
localConn.setProtocol(Connection.Protocol.DICOM);
localConn.setInstalled(true);
locAE.addConnection(localConn);
ApplicationEntity remAE = new ApplicationEntity();
remAE.setAETitle("DCM4CHEE");
remAE.setInstalled(true);
Connection remoteConn = new Connection();
remoteConn.setCommonName("rem_conn");
remoteConn.setHostname("localhost");
remoteConn.setPort(11112);
remoteConn.setProtocol(Connection.Protocol.DICOM);
remoteConn.setInstalled(true);
remAE.addConnection(remoteConn);
AAssociateRQ assocReq = new AAssociateRQ();
assocReq.setCalledAET(remAE.getAETitle());
assocReq.setCallingAET(locAE.getAETitle());
assocReq.setApplicationContext("1.2.840.10008.3.1.1.1");
assocReq.setImplClassUID("1.2.40.0.13.1.3");
assocReq.setImplVersionName("dcm4che-5.12.0");
assocReq.setMaxPDULength(16384);
assocReq.setMaxOpsInvoked(0);
assocReq.setMaxOpsPerformed(0);
assocReq.addPresentationContext(new PresentationContext(
1, "1.2.840.10008.1.1", "1.2.840.10008.1.2"));
Device device = new Device("device");
device.addConnection(localConn);
device.addApplicationEntity(locAE);
Executor exec = (Runnable command) -> {};
device.setExecutor(exec);
Association assoc = locAE.connect(localConn, remoteConn, assocReq);
And the relevant dcm4chee log:
2018-03-02 23:21:42,832 INFO THIS_JAVA_APP->DCM4CHEE (TCPServer-1) [org.dcm4cheri.net.FsmImpl] received AAssociateRQ
appCtxName: 1.2.840.10008.3.1.1.1/DICOM Application Context Name
implClass: 1.2.40.0.13.1.3
implVersion: dcm4che-5.12.0
calledAET: DCM4CHEE
callingAET: THIS_JAVA_APP
maxPDULen: 16378
asyncOpsWindow:
pc-1: as=1.2.840.10008.1.1/Verification SOP Class
ts=1.2.840.10008.1.2/Implicit VR Little Endian
2018-03-02 23:21:42,843 INFO THIS_JAVA_APP->DCM4CHEE (TCPServer-1) [org.dcm4cheri.net.FsmImpl] sending AAssociateAC
appCtxName: 1.2.840.10008.3.1.1.1/DICOM Application Context Name
implClass: 1.2.40.0.13.1.1.1
implVersion: dcm4che-1.4.34
calledAET: DCM4CHEE
callingAET: THIS_JAVA_APP
maxPDULen: 16352
asyncOpsWindow:
pc-1: 0 - acceptance
ts=1.2.840.10008.1.2/Implicit VR Little Endian
After you have the association, see this other post for how to perform a C-FIND.
Edit
Apparently, I solved the problem. Changing the executor from
Executor exec = (Runnable command) -> {};
device.setExecutor(exec);
to
ExecutorService executorService = Executors.newSingleThreadExecutor();
ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
device.setExecutor(executorService);
device.setScheduledExecutor(scheduledExecutorService);
made it so my application correctly received the association response from the server. This might serve as reference for someone else.
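A plausible explanation, stated here as an assumption rather than something confirmed from the dcm4che source, is that the library hands background work (such as reading the association response) to the Device's executor, and the lambda `(Runnable command) -> {}` silently throws every task away. The following JDK-only sketch shows the difference between such a no-op executor and a real one; `ExecutorDemo` and `taskRan` are hypothetical names used only for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ExecutorDemo {

    /** Submits one task and reports whether it actually ran within the timeout. */
    static boolean taskRan(Executor executor, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        executor.execute(done::countDown);
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Accepts tasks but never runs them, like the lambda in the question.
        Executor noOp = command -> {};
        ExecutorService real = Executors.newSingleThreadExecutor();
        try {
            System.out.println("no-op executor ran the task: " + taskRan(noOp, 200));
            System.out.println("real executor ran the task:  " + taskRan(real, 2000));
        } finally {
            real.shutdown();
        }
    }
}
```

With the no-op executor the caller simply waits until its timeout expires, which matches the symptom of a client stuck waiting for a response that the server has already sent.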
Thank you for sharing your code. It was really helpful to me.
Original Post
I am unable to establish the connection with code similar to the solution you proposed. I am trying to request an association with a dcm4chee-arc-light using dcm4che (both 5.14.1), and my code is as follows:
Device device = new Device(deviceName);
ApplicationEntity locAE = new ApplicationEntity(localAE);
Connection conn = new Connection();
Connection remote = new Connection();
AAssociateRQ rq = new AAssociateRQ();
device.addConnection(conn);
device.addApplicationEntity(locAE);
locAE.addConnection(conn);
ApplicationEntity remAE = new ApplicationEntity();
remAE.setAETitle(remoteAE);
remote.setCommonName("rem_conn");
remote.setHostname(remoteIP);
remote.setPort(remotePort);
remote.setProtocol(Connection.Protocol.DICOM);
remAE.addConnection(remote);
rq.setCalledAET(remAE.getAETitle());
rq.setCallingAET(locAE.getAETitle());
rq.setApplicationContext("1.2.840.10008.3.1.1.1");
rq.setImplClassUID("1.2.40.0.13.1.3");
rq.setImplVersionName("dcm4che-5.14.1");
rq.setMaxPDULength(16384);
rq.setMaxOpsInvoked(0);
rq.setMaxOpsPerformed(0);
rq.addPresentationContext(new PresentationContext(
1, "1.2.840.10008.5.1.4.1.2.2.1", "1.2.840.10008.1.2"));
Executor exec = (Runnable command) -> {};
device.setExecutor(exec);
//Opens association and connects to remote server
Association as = locAE.connect(conn, remote, rq);
But when it tries to connect to a remote AET, it doesn't seem to receive the A-ASSOCIATE response from the remote AET. My Java application hangs in Sta5 (waiting for the association response) while the server sits in Sta6 (ready for data transfer).
Java log:
[main] INFO org.dcm4che3.net.Connection - Initiate connection from 0.0.0.0/0.0.0.0:0 to localhost:11112
[main] INFO org.dcm4che3.net.Connection - Established connection Socket[addr=localhost/127.0.0.1,port=11112,localport=50101]
[main] DEBUG org.dcm4che3.net.Association - /127.0.0.1:50101>localhost/127.0.0.1:11112(1): enter state: Sta4 - Awaiting transport connection opening to complete
[main] INFO org.dcm4che3.net.Association - DEVICEAE->DCMQRSCP(1) << A-ASSOCIATE-RQ
[main] DEBUG org.dcm4che3.net.Association - A-ASSOCIATE-RQ[
calledAET: DCMQRSCP
callingAET: DEVICEAE
applicationContext: 1.2.840.10008.3.1.1.1 - DICOM Application Context Name
implClassUID: 1.2.40.0.13.1.3
implVersionName: dcm4che-5.14.1
maxPDULength: 16378
maxOpsInvoked/maxOpsPerformed: 1/1
PresentationContext[id: 1
as: 1.2.840.10008.5.1.4.1.2.2.1 - Study Root Query/Retrieve Information Model - FIND
ts: 1.2.840.10008.1.2 - Implicit VR Little Endian
]
]
[main] DEBUG org.dcm4che3.net.Association - DEVICEAE->DCMQRSCP(1): enter state: Sta5 - Awaiting A-ASSOCIATE-AC or A-ASSOCIATE-RJ PDU
Server log:
19:11:29,397 INFO - Accept connection Socket[addr=/127.0.0.1,port=50101,localport=11112]
19:11:29,397 DEBUG - /127.0.0.1:11112<-/127.0.0.1:50101(3): enter state: Sta2 - Transport connection open
19:11:29,416 INFO - DCMQRSCP<-DEVICEAE(3) >> A-ASSOCIATE-RQ
19:11:29,416 DEBUG - A-ASSOCIATE-RQ[
calledAET: DCMQRSCP
callingAET: DEVICEAE
applicationContext: 1.2.840.10008.3.1.1.1 - DICOM Application Context Name
implClassUID: 1.2.40.0.13.1.3
implVersionName: dcm4che-5.14.1
maxPDULength: 16378
maxOpsInvoked/maxOpsPerformed: 1/1
PresentationContext[id: 1
as: 1.2.840.10008.5.1.4.1.2.2.1 - Study Root Query/Retrieve Information Model - FIND
ts: 1.2.840.10008.1.2 - Implicit VR Little Endian
]
]
19:11:29,419 DEBUG - DCMQRSCP<-DEVICEAE(3): enter state: Sta3 - Awaiting local A-ASSOCIATE response primitive
19:11:29,419 INFO - DCMQRSCP<-DEVICEAE(3) << A-ASSOCIATE-AC
19:11:29,419 DEBUG - A-ASSOCIATE-AC[
calledAET: DCMQRSCP
callingAET: DEVICEAE
applicationContext: 1.2.840.10008.3.1.1.1 - DICOM Application Context Name
implClassUID: 1.2.40.0.13.1.3
implVersionName: dcm4che-5.14.1
maxPDULength: 16378
maxOpsInvoked/maxOpsPerformed: 1/1
PresentationContext[id: 1
result: 0 - acceptance
ts: 1.2.840.10008.1.2 - Implicit VR Little Endian
]
]
19:11:29,427 DEBUG - DCMQRSCP<-DEVICEAE(3): enter state: Sta6 - Association established and ready for data transfer
I feel like I am missing something, but I cannot find the source of the problem. Any help is appreciated, as I am still new to dcm4che and DICOM protocol.
Thank you.
I am starting to study how I can implement an application supporting failover/fault tolerance on top of JMS, more precisely EMS.
I configured two EMS servers, both with fault tolerance enabled:
For the EMS instance running on server1 I have
in tibemsd.conf
ft_active = tcp://server2:7232
in factories.conf
[GenericConnectionFactory]
type = generic
url = tcp://server1:7232
[FTTopicConnectionFactory]
type = topic
url = tcp://server1:7232,tcp://server2:7232
[FTQueueConnectionFactory]
type = queue
url = tcp://server1:7232,tcp://server2:7232
And for the EMS instance running on server2 I have
in tibemsd.conf
ft_active = tcp://server1:7232
in factories.conf
[GenericConnectionFactory]
type = generic
url = tcp://server2:7232
[FTTopicConnectionFactory]
type = topic
url = tcp://server2:7232,tcp://server1:7232
[FTQueueConnectionFactory]
type = queue
url = tcp://server2:7232,tcp://server1:7232
I am not a TIBCO EMS expert but my config seems to be good: When I start EMS on server1 I get:
$ tibemsd -config tibemsd.conf
...
2022-07-20 23:04:58.566 Server is active.
2022-07-20 23:05:18.563 Standby server 'SERVERNAME#server1' has connected.
then if I start EMS on server2, I get
$ tibemsd -config tibemsd.conf
...
2022-07-20 23:05:18.564 Accepting connections on tcp://server2:7232.
2022-07-20 23:05:18.564 Server is in standby state for 'tcp://server1:7232'
Moreover, if I kill active EMS on server1, I immediately get the following message on server2:
2022-07-20 23:21:52.891 Connection to active server 'tcp://server1:7232' has been lost.
2022-07-20 23:21:52.891 Server activating on failure of 'tcp://server1:7232'.
...
2022-07-20 23:21:52.924 Server is now active.
Up to this point everything looks OK: the active/standby EMS servers seem to be correctly configured.
Things get more complicated when I write a piece of code that is supposed to connect to these EMS servers and periodically publish messages. Let's try with the following code sample:
@Test
public void testEmsFailover() throws JMSException, InterruptedException {
int NB = 1000;
TibjmsConnectionFactory factory = new TibjmsConnectionFactory();
factory.setServerUrl("tcp://server1:7232,tcp://server2:7232");
Connection connection = factory.createConnection();
Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
connection.start();
for (int i = 0; i < NB; i++) {
LOG.info("sending message");
Queue queue = session.createQueue(QUEUE__CLIENT_TO_FRONTDOOR__CONNECTION_REQUEST);
MessageProducer producer = session.createProducer(queue);
MapMessage mapMessage = session.createMapMessage();
mapMessage.setStringProperty(PROPERTY__CLIENT_KIND, USER.toString());
mapMessage.setStringProperty(PROPERTY__CLIENT_NAME, "name");
producer.send(mapMessage);
LOG.info("done!");
Thread.sleep(1000);
}
}
If I run this code while both active and standby servers are up, everything looks good
23:26:32.431 [main] INFO JmsEndpointTest - sending message
23:26:32.458 [main] INFO JmsEndpointTest - done!
23:26:33.458 [main] INFO JmsEndpointTest - sending message
23:26:33.482 [main] INFO JmsEndpointTest - done!
Now If I kill the active EMS server, I would expect that
the standby server would instantaneously become the active one
my code would continue to publish such as if nothing had happened
However, in my code I get the following error:
javax.jms.JMSException: Connection is closed
at com.tibco.tibjms.TibjmsxLink.sendRequest(TibjmsxLink.java:307)
at com.tibco.tibjms.TibjmsxLink.sendRequestMsg(TibjmsxLink.java:261)
at com.tibco.tibjms.TibjmsxSessionImp._createProducer(TibjmsxSessionImp.java:1004)
at com.tibco.tibjms.TibjmsxSessionImp.createProducer(TibjmsxSessionImp.java:4854)
at JmsEndpointTest.testEmsFailover(JmsEndpointTest.java:103)
...
and in the logs of the server (the previous standby server supposed to be now the active one) I get
2022-07-20 23:32:44.447 [anonymous#cersei]: connect failed: server not in active state
2022-07-20 23:33:02.969 Connection to active server 'tcp://server2:7232' has been lost.
2022-07-20 23:33:02.969 Server activating on failure of 'tcp://server2:7232'.
2022-07-20 23:33:02.969 Server rereading configuration.
2022-07-20 23:33:02.971 Recovering state, please wait.
2022-07-20 23:33:02.980 Recovered 46 messages.
2022-07-20 23:33:02.980 Server is now active.
2022-07-20 23:33:03.545 [anonymous#cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:04.187 [anonymous#cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:04.855 [anonymous#cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:05.531 [anonymous#cersei]: reconnect failed: connection unknown for id=8
I would appreciate any help to enhance my code
Thank you
I think I found the origin of my problem:
according to the page Tibco-Ems Failover Issue, the error message
reconnect failed: connection unknown for id=8
means: "the store (ems db) was'nt share between the active and the standby node, so when the active ems failed, the new active ems was'nt able to recover connections and messages."
I realized that it is painful to configure a shared store. To avoid it, I configured two tibemsd instances on the same host, following the page Step By Step How to Setup TIBCO EMS In Fault Tolerant Mode:
two tibemsd.conf configuration files
configure a different listen port in each file
configure ft_active with url of other server
configure factories.conf
By doing so, I can replay my test and it works as expected
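As a rough sketch of the four steps above, the two instances could be configured along these lines. The file names, the server name, the port numbers and the store path are all assumptions chosen for illustration; only `listen`, `ft_active`, `server` and `store` are standard tibemsd.conf parameters. Both instances point at the same store directory, which is easy on a single host.

```
# tibemsd1.conf (first instance)
server    = EMS-FT
listen    = tcp://localhost:7222
ft_active = tcp://localhost:7224
store     = /path/to/datastore

# tibemsd2.conf (second instance, same server name and store)
server    = EMS-FT
listen    = tcp://localhost:7224
ft_active = tcp://localhost:7222
store     = /path/to/datastore

# factories.conf
[FTConnectionFactory]
type = generic
url  = tcp://localhost:7222,tcp://localhost:7224
```

A client would then use the comma-separated URL `tcp://localhost:7222,tcp://localhost:7224`, in the same way as the test above uses the two-server URL.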
I have a camel route in MyRouteBuilder.java file which is consuming messages from ActiveMQ:
from("activemq:queue:myQueue" )
.process(consumeDroppedMessage)
.log(">>> I am here");
I wrote a test case for it like this:
@Override
public RouteBuilder createRouteBuilder() throws Exception {
return new MyRouteBuilder();
}
@Test
void testMyTest() throws Exception {
String queueInputMessage = "My Msg";
template.sendBody("activemq:queue:myQueue", queueInputMessage);
assertMockEndpointsSatisfied();
}
When I run the unit test case I get this strange error:
17:53:26.175 [main] DEBUG org.apache.camel.impl.engine.InternalRouteStartupManager - Route: route1 >>> Route[activemq://queue:null -> null]
17:53:26.175 [main] DEBUG org.apache.camel.impl.engine.InternalRouteStartupManager - Starting consumer (order: 1000) on route: route1
17:53:26.175 [main] DEBUG org.apache.camel.support.DefaultConsumer - Build consumer: Consumer[activemq://queue:null]
17:53:26.185 [main] DEBUG org.apache.camel.support.DefaultConsumer - Init consumer: Consumer[activemq://queue:null]
17:53:26.185 [main] DEBUG org.apache.camel.support.DefaultConsumer - Starting consumer: Consumer[activemq://queue:null]
17:53:26.213 [main] DEBUG org.apache.activemq.thread.TaskRunnerFactory - Initialized TaskRunnerFactory[ActiveMQ Task] using ExecutorService: java.util.concurrent.ThreadPoolExecutor@3fffff43[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
17:53:26.215 [main] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Reconnect was triggered but transport is not started yet. Wait for start to connect the transport.
17:53:26.334 [main] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Started unconnected
17:53:26.334 [main] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Waking up reconnect task
17:53:26.335 [ActiveMQ Task-1] DEBUG org.apache.activemq.transport.failover.FailoverTransport - urlList connectionList:[tcp://localhost:61616], from: [tcp://localhost:61616]
17:53:26.339 [main] DEBUG org.apache.camel.component.jms.DefaultJmsMessageListenerContainer - Established shared JMS Connection
17:53:26.340 [main] DEBUG org.apache.camel.component.jms.DefaultJmsMessageListenerContainer - Resumed paused task: org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker@58c34bb3
17:53:26.372 [ActiveMQ Task-1] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Attempting 0th connect to: tcp://localhost:61616
17:53:28.393 [ActiveMQ Task-1] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Connect fail to: tcp://localhost:61616, reason: {}
I am especially stumped to see these messages:
Route: route1 >>> Route[activemq://queue:null -> null]
and
urlList connectionList:[tcp://localhost:61616], from: [tcp://localhost:61616]
Why is the queue coming up as null though I have a proper queue name? Also why is the broker url tcp://localhost:61616?
I want this unit test to run properly in all environments (local, DIT, SIT, PROD, etc.), so I cannot afford to have the broker URL fixed at tcp://localhost:61616.
Any ideas as to what I am doing wrong here and what I should be doing?
EDIT 1:
One of the issues that I am seeing is even before the test class is called, the MyRouteBuilder() inside createRouteBuilder() is invoked, leading to the issues that I see in the log.
The "activemq:queue:.." is telling Camel to use the auto-configure magic behind the scenes (which uses default url) and your use case is beyond that.
You need to configure a connection factory (ActiveMQConnectionFactory) and configure a camel-jms component to use that connection factory.
The connection factory allows you to specify url, userName, password, default connection settings and setup SSL.
A best practice is to externalize the url, userName, password and queue to a properties file so you can change those across the environments-- local, DIT, SIT and prod, etc.
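A minimal sketch of that externalization, using only `java.util.Properties`, might look like the following. The property keys (`broker.url`, `broker.userName`, `broker.password`, `broker.queue`) and the `BrokerSettings` class are hypothetical names, not anything Camel or ActiveMQ defines; the values would then be passed to the connection factory and the route.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class BrokerSettings {
    final String url;
    final String userName;
    final String password;
    final String queue;

    private BrokerSettings(String url, String userName, String password, String queue) {
        this.url = url;
        this.userName = userName;
        this.password = password;
        this.queue = queue;
    }

    /** Reads the four settings; fails early if one is missing. */
    static BrokerSettings fromProperties(Properties props) {
        return new BrokerSettings(
                required(props, "broker.url"),
                required(props, "broker.userName"),
                required(props, "broker.password"),
                required(props, "broker.queue"));
    }

    /** Convenience for text in .properties syntax (a file would use a FileReader instead). */
    static BrokerSettings parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fromProperties(props);
    }

    private static String required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value.trim();
    }

    /** Endpoint URI in the form the route in the question uses. */
    String queueEndpoint() {
        return "activemq:queue:" + queue;
    }

    public static void main(String[] args) {
        BrokerSettings settings = parse(
                "broker.url=tcp://broker.example.test:61616\n"
                + "broker.userName=app\n"
                + "broker.password=secret\n"
                + "broker.queue=myQueue\n");
        System.out.println(settings.url + " -> " + settings.queueEndpoint());
    }
}
```

Each environment then ships its own properties file, and the route and test code stay unchanged.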
NOTE: Use org.apache.camel/camel-jms component, and not the org.apache.activemq/activemq-camel component. activemq-camel is deprecated and being removed in ActiveMQ 5.17.x.
Instead of setting up an explicit ActiveMQ broker, I started using a VM broker.
@Override
protected RoutesBuilder createRouteBuilder() throws Exception {
return new RouteBuilder() {
@Override
public void configure() {
ConnectionFactory connectionFactory = new ActiveMQConnectionFactory("vm://localhost?broker.persistent=false");
ActiveMQComponent activeMQComponent = new ActiveMQComponent();
activeMQComponent.setConnectionFactory(connectionFactory);
context.addComponent("activemq", activeMQComponent);
from("activemq:queue:myQueue").to("mock:collector");
}
};
}
Also, I mistook Camel JUnit for traditional JUnit. We don't need to call the actual route builder class explicitly. Instead, after setting up my ActiveMQ component as above, I was able to write my test methods, mock my endpoints for the queue, send messages and assert on them. Camel is truly versatile. It requires a lot of study, though.
One of our applications just suffered from some nasty deadlocks. I had quite a hard time recreating the problem because the deadlock (or a stack trace) did not show up immediately in my Java application logs.
To my surprise, the MarkLogic Java API retries failing requests (e.g. because of a deadlock). This might make sense if your request is not a multi-statement request, but otherwise I'm not sure it does.
So let's stick with this deadlock problem. I created a simple code snippet in which I create a deadlock on purpose. The snippet creates a document test.xml and then tries to read and write it from two different transactions, each on a new thread.
public static void main(String[] args) throws Exception {
final Logger root = (Logger) LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME);
final Logger ok = (Logger) LoggerFactory.getLogger(OkHttpServices.class);
root.setLevel(Level.ALL);
ok.setLevel(Level.ALL);
final DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8000, new DatabaseClientFactory.DigestAuthContext("username", "password"));
final StringHandle handle = new StringHandle("<doc><name>Test</name></doc>")
.withFormat(Format.XML);
client.newTextDocumentManager().write("test.xml", handle);
root.info("t1: opening");
final Transaction t1 = client.openTransaction();
root.info("t1: reading");
client.newXMLDocumentManager()
.read("test.xml", new StringHandle(), t1);
root.info("t2: opening");
final Transaction t2 = client.openTransaction();
root.info("t2: reading");
client.newXMLDocumentManager()
.read("test.xml", new StringHandle(), t2);
new Thread(() -> {
root.info("t1: writing");
client.newXMLDocumentManager().write("test.xml", new StringHandle("<doc><t>t1</t></doc>").withFormat(Format.XML), t1);
t1.commit();
}).start();
new Thread(() -> {
root.info("t2: writing");
client.newXMLDocumentManager().write("test.xml", new StringHandle("<doc><t>t2</t></doc>").withFormat(Format.XML), t2);
t2.commit();
}).start();
TimeUnit.MINUTES.sleep(5);
client.release();
}
This code will produce the following log:
14:12:27.437 [main] DEBUG c.m.client.impl.OkHttpServices - Connecting to localhost at 8000 as admin
14:12:27.570 [main] DEBUG c.m.client.impl.OkHttpServices - Sending test.xml document in transaction null
14:12:27.608 [main] INFO ROOT - t1: opening
14:12:27.609 [main] DEBUG c.m.client.impl.OkHttpServices - Opening transaction
14:12:27.962 [main] INFO ROOT - t1: reading
14:12:27.963 [main] DEBUG c.m.client.impl.OkHttpServices - Getting test.xml in transaction 5298588351036278526
14:12:28.283 [main] INFO ROOT - t2: opening
14:12:28.283 [main] DEBUG c.m.client.impl.OkHttpServices - Opening transaction
14:12:28.286 [main] INFO ROOT - t2: reading
14:12:28.286 [main] DEBUG c.m.client.impl.OkHttpServices - Getting test.xml in transaction 8819382734425123844
14:12:28.289 [Thread-1] INFO ROOT - t1: writing
14:12:28.289 [Thread-1] DEBUG c.m.client.impl.OkHttpServices - Sending test.xml document in transaction 5298588351036278526
14:12:28.289 [Thread-2] INFO ROOT - t2: writing
14:12:28.290 [Thread-2] DEBUG c.m.client.impl.OkHttpServices - Sending test.xml document in transaction 8819382734425123844
Neither t1 nor t2 gets committed. The MarkLogic logs confirm that there actually is a deadlock:
==> /var/opt/MarkLogic/Logs/8000_AccessLog.txt <==
127.0.0.1 - admin [24/Nov/2018:14:12:30 +0000] "PUT /v1/documents?txid=5298588351036278526&category=content&uri=test.xml HTTP/1.1" 503 1034 - "okhttp/3.9.0"
==> /var/opt/MarkLogic/Logs/ErrorLog.txt <==
2018-11-24 14:12:30.719 Info: Deadlock detected locking Documents test.xml
This would not be a problem if one of the requests failed and threw an exception, but that is not the case. The MarkLogic Java API keeps retrying each request for up to about 120 seconds, after which one of the updates times out:
Exception in thread "Thread-1" com.marklogic.client.FailedRequestException: Service unavailable and maximum retry period elapsed: 121 seconds after 65 retries
at com.marklogic.client.impl.OkHttpServices.putPostDocumentImpl(OkHttpServices.java:1422)
at com.marklogic.client.impl.OkHttpServices.putDocument(OkHttpServices.java:1256)
at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:920)
at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:758)
at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:717)
at Scratch.lambda$main$0(scratch.java:40)
at java.lang.Thread.run(Thread.java:748)
What are possible ways to overcome this problem? One way might be to set a maximum time to live for a transaction (like 5 seconds), but this feels hacky and unreliable. Any other ideas? Are there any other settings I should check out?
I'm on MarkLogic 9.0-7.2 and using marklogic-client-api:4.0.3.
Edit: One way to solve the deadlock is to synchronize the calling function, which is how I solved it in my case (see comments). But I think the underlying problem still exists. A deadlock in a multi-statement transaction should not be hidden away behind a 120-second timeout. I would rather have an immediately failing request than a 120-second lock on one of my documents plus 64 failing retries per thread.
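The synchronization workaround from the edit can be narrowed to one lock per document URI, so that unrelated documents are not serialized against each other. This is only a sketch under the assumption that all writers run in the same JVM; `DocumentLocks`, `withLock` and `runDemo` are hypothetical names, and the work passed in would be the open/read/write/commit sequence from the question.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

final class DocumentLocks {
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs the work while holding the lock that belongs to the given document URI. */
    <T> T withLock(String uri, Supplier<T> work) {
        ReentrantLock lock = locks.computeIfAbsent(uri, key -> new ReentrantLock());
        lock.lock();
        try {
            return work.get();
        } finally {
            lock.unlock();
        }
    }

    /** Several threads update an unsynchronized counter, guarded only by the URI lock. */
    static int runDemo(int threads, int incrementsPerThread) {
        DocumentLocks locks = new DocumentLocks();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    locks.withLock("test.xml", () -> ++counter[0]);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("updates applied: " + runDemo(2, 1000));
    }
}
```

This does nothing for writers in other processes or on other hosts, so it sidesteps the deadlock only for a single application instance.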
Deadlocks are usually resolvable by retrying. Internally, the server runs an inner retry loop, because deadlocks are usually transient and incidental and last only a very short time. You have constructed a case that will never succeed with any timeout that is equal for both threads.
Deadlocks can be avoided at the application layer by avoiding multi-statement transactions when using the REST API (which is what the Java API uses).
Multi-statement transactions over REST cannot be implemented 100% safely, because the client is responsible for managing the transaction ID and the server cannot detect client-side errors or client-side identity. Very subtle problems can and do occur unless you are aggressively proactive about handling errors and multithreading. If you push the logic to the server (XQuery or JavaScript), the server can manage things much better.
Whether it is 'good' for the Java API to retry in this case is debatable. (The compromise for a seemingly easy-to-use interface is that many things that would otherwise be options are decided for you by convention. There is generally no one-size-fits-all answer. Here I presume the thinking was that a deadlock is more likely caused by accident, by independent code or logic, than by identical code running in tandem, and a retry would be a good choice in that case. In your example it is not, but then an earlier error would still fail predictably until you change your code to 'not do that'.)
If it doesn't already exist, a feature request for configurable timeout and retry behaviour seems reasonable. I would recommend, however, trying to avoid any REST calls that result in an open transaction. That is inherently problematic, particularly if you don't notice the problem up front (then it is more likely to bite you in production). Unlike JDBC, which keeps a connection open so that the server can detect client disconnects, HTTP and the ML REST API do not, which leads to a different programming model than traditional database coding in Java.
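Until the client library offers such a setting, the caller can bound the wait itself by running the request on its own executor and giving up after a fixed limit. The sketch below uses only `java.util.concurrent`; `BoundedCall` is a hypothetical helper, and note that cancelling the future only stops the client from waiting. It does not roll back anything on the server, so the open transaction would still need an explicit rollback.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedCall {

    /** Returns the result of the work, or throws IllegalStateException if it fails or exceeds the limit. */
    static <T> T call(ExecutorService executor, Callable<T> work, Duration limit) {
        Future<T> future = executor.submit(work);
        try {
            return future.get(limit.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread; the server side is unaffected
            throw new IllegalStateException("Gave up after " + limit.toMillis() + " ms", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Work failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for a write that the client would otherwise keep retrying.
            call(pool, () -> { Thread.sleep(5_000); return "late"; }, Duration.ofMillis(500));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        } finally {
            pool.shutdownNow();
        }
    }
}
```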
I have set up a replica set using three machines (192.168.122.21, 192.168.122.147 and 192.168.122.148) and I am interacting with the MongoDB Cluster using the Java SDK:
ArrayList<ServerAddress> addrs = new ArrayList<ServerAddress>();
addrs.add(new ServerAddress("192.168.122.21", 27017));
addrs.add(new ServerAddress("192.168.122.147", 27017));
addrs.add(new ServerAddress("192.168.122.148", 27017));
this.mongoClient = new MongoClient(addrs);
this.db = this.mongoClient.getDB(this.db_name);
this.collection = this.db.getCollection(this.collection_name);
After the connection is established I do multiple inserts of a simple test document:
for (int i = 0; i < this.inserts; i++) {
try {
this.collection.insert(new BasicDBObject(String.valueOf(i), "test"));
} catch (Exception e) {
System.out.println("Error on inserting element: " + i);
e.printStackTrace();
}
}
When simulating a node crash of the master server (power-off), the MongoDB cluster does a successful failover:
19:08:03.907+0100 [rsHealthPoll] replSet info 192.168.122.21:27017 is down (or slow to respond):
19:08:03.907+0100 [rsHealthPoll] replSet member 192.168.122.21:27017 is now in state DOWN
19:08:04.153+0100 [rsMgr] replSet info electSelf 1
19:08:04.154+0100 [rsMgr] replSet couldn't elect self, only received -9999 votes
19:08:05.648+0100 [conn15] replSet info voting yea for 192.168.122.148:27017 (2)
19:08:10.681+0100 [rsMgr] replSet not trying to elect self as responded yea to someone else recently
19:08:10.910+0100 [rsHealthPoll] replset info 192.168.122.21:27017 heartbeat failed, retrying
19:08:16.394+0100 [rsMgr] replSet not trying to elect self as responded yea to someone else recently
19:08:22.876+.
19:08:22.912+0100 [rsHealthPoll] replset info 192.168.122.21:27017 heartbeat failed, retrying
19:08:23.623+0100 [SyncSourceFeedbackThread] replset setting syncSourceFeedback to 192.168.122.148:27017
19:08:23.917+0100 [rsHealthPoll] replSet member 192.168.122.148:27017 is now in state PRIMARY
This is also recognized by the MongoDB Driver on the Client Side:
Dec 01, 2014 7:08:16 PM com.mongodb.ConnectionStatus$UpdatableNode update
WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException - message: Read timed out
WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException - message: couldn't connect to [/192.168.122.21:27017] bc:java.net.SocketTimeoutException: connect timed out
Dec 01, 2014 7:08:36 PM com.mongodb.DBTCPConnector setMasterAddress
WARNING: Primary switching from /192.168.122.21:27017 to /192.168.122.148:27017
But it still keeps trying to connect to the old node (forever):
Dec 01, 2014 7:08:50 PM com.mongodb.ConnectionStatus$UpdatableNode update
WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException - message: couldn't connect to [/192.168.122.21:27017] bc:java.net.NoRouteToHostException: No route to host
.....
Dec 01, 2014 7:10:43 PM com.mongodb.ConnectionStatus$UpdatableNode update
WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException -message: couldn't connect to [/192.168.122.21:27017] bc:java.net.NoRouteToHostException: No route to host
The document count in the database stays the same from the moment the primary fails, even after a secondary becomes primary. Here is the output from the same node during the process:
"rs0":SECONDARY> db.test_collection.find().count()
12260161
"rs0":PRIMARY> db.test_collection.find().count()
12260161
Update:
Using WriteConcern Unacknowledged it works as designed. Insert operations are also performed on the new master, and all operations issued during the election process are lost.
With WriteConcern Acknowledged it seems that the operation waits indefinitely for an ACK from the crashed master. This could explain why the program continues after the crashed server boots up again and joins the cluster as a secondary. But in my case I don't want the driver to wait forever; it should raise an error after a certain time.
Update:
WriteConcern Acknowledged is also working as expected when killing the mongod process on the primary. In this case the failover only takes ~3 Seconds. During this time no inserts are done, and after the new primary is elected the insert operations continue.
So I only get the problem when simulating a node failure (power off/network down). In this case the operation hangs until the failed node starts up again.
Does your app still work? Since that server is still in your seed list, the driver will try to connect to it as far as I know. Your app should still work so long as any of the other servers in your seed list can gain primary status.
Explicitly specifying a connection timeout value solved the problem. See also: http://api.mongodb.org/java/2.7.0/com/mongodb/MongoOptions.html
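The requirement from the question, that a blocked insert should fail after a fixed time instead of hanging, can also be sketched without any driver-specific API. The following is a minimal sketch using only java.util.concurrent. The names BoundedCall and callWithTimeout, and the sleeping stand-in task, are hypothetical and not part of the MongoDB driver; in a real application the task would wrap the insert call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCall {

    // Runs the task on a daemon worker thread and stops waiting after timeoutMillis.
    // Throws TimeoutException if the task has not finished in time.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck task must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; blocking I/O may ignore this
            throw e;
        } catch (ExecutionException e) {
            // Surface the task's own exception where possible
            Throwable cause = e.getCause();
            throw cause instanceof Exception ? (Exception) cause : e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for an insert that is acknowledged immediately
        System.out.println(callWithTimeout(() -> "ok", 200));

        // Stand-in for an insert that waits on an unreachable primary
        try {
            callWithTimeout(() -> { Thread.sleep(5000); return "late"; }, 200);
        } catch (TimeoutException e) {
            System.out.println("timed out after 200 ms");
        }
    }
}
```

This only bounds how long the caller waits. The driver-level timeouts described on the linked MongoOptions page are still what let the underlying socket itself give up.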
I'm a newbie in programming with dcm4che2 libraries and I'm writing a simple program to query a PACS server, by setting Query/Retrieve Level to Patient/Series/Image.
The code is very simple and, in some cases, it works fine:
dcmqr.setCalledAET("AET_REMOTE", true);
dcmqr.setRemoteHost("aa.bb.cc.dd");
dcmqr.setRemotePort(xxxx);
dcmqr.getKeys();
dcmqr.setDateTimeMatching(true);
dcmqr.setCFind(true);
dcmqr.setCGet(false);
dcmqr.configureTransferCapability(true);
dcmqr.setQueryLevel(DcmQR.QueryRetrieveLevel.IMAGE);
dcmqr.addMatchingKey(new int[]{Tag.PatientName},sPatientName);
dcmqr.addMatchingKey(new int[]{Tag.Modality},sModality);
dcmqr.addMatchingKey(new int[]{Tag.AccessionNumber},sAccession);
dcmqr.addMatchingKey(new int[]{Tag.SeriesNumber},sSeriesNumber);
dcmqr.addReturnKey(new int[]{Tag.SeriesDescription});
dcmqr.addReturnKey(new int[]{Tag.StudyDescription});
dcmqr.addReturnKey(new int[]{Tag.PatientBirthDate});
dcmqr.addReturnKey(new int[]{Tag.PatientSex});
List<DicomObject> result = null;
try {
    dcmqr.start();
    dcmqr.open();
    result = dcmqr.query();
    dcmqr.stop();
    dcmqr.close();
}
catch (Exception e) {
    ...
}
However, in some cases (and whenever I set the Query/Retrieve Level to "Image"), the query() method fails ("unexpected message ID in DIMSE RSP") and the association is aborted with an A-ABORT, as shown in the log below:
...
[main] INFO org.dcm4che2.net.PDUEncoder - AET_REMOTE(1) << 3:C-FIND-RQ[pcid=1, prior=0
cuid=xyz/Study Root Query/Retrieve Information Model - FIND
ts=xyz/Implicit VR Little Endian]
[AE_TITLE_X] INFO org.dcm4che2.net.PDUDecoder - AET_REMOTE(1) >> 2:C-FIND-RSP[
pcid=1, status=0H cuid=xyz/Study Root Query/Retrieve Information Model - FIND]
[main] INFO org.dcm4che2.tool.dcmqr.DcmQR - Send Query Request #3/15 using .../Study Root Query/Retrieve Information Model - FIND:
(0008,0052) CS #6 [IMAGE] Query/Retrieve Level
(0008,0060) CS #2 [CT] Modality
(0010,0010) PN #12 [xxx^yyyy] Patient's Name
(0020,000D) UI #42 [x.y.z.zyx...] Study Instance UID
(0020,000E) UI #56 [y.x.z.zyx...] Series Instance UID
[AE_TITLE_X] WARN org.dcm4che2.net.Association - unexpected message ID in DIMSE RSP:
(0000,0002) UI #28 [x.y.z.zax...] Affected SOP Class UID
(0000,0100) US #2 [32800] Command Field
(0000,0120) US #2 [2] Message ID Being Responded To
(0000,0800) US #2 [257] Command Data Set Type
(0000,0900) US #2 [0] Status
[AE_TITLE_X] INFO org.dcm4che2.net.PDUEncoder - AET_REMOTE(1) << A-ABORT[source=0, reason=0]
[AE_TITLE_X] INFO org.dcm4che2.net.Association - AET_REMOTE(1): close Socket[addr=/aa.bb.cc.dd,port=xxx,localport=yyy]
[main] INFO org.dcm4che2.net.PDUEncoder - AET_REMOTE(1) << 4:C-FIND-RQ[pcid=1, prior=0
cuid=.../Study Root Query/Retrieve Information Model - FIND
ts=.../Implicit VR Little Endian]
[main] WARN org.dcm4che2.net.Association - unable to send P-DATA-TF in state: Sta1
I can't understand what this error means or figure out a solution.
I guess it's a communication issue.
Could anyone help me?
Thanks.
Your logging indicates that you've made query request #3, then received the response for query request #2. If the listener is now expecting a response for 3, then it will throw an exception because it has received a message ID for message 2.
If you are looping over the query call to do this, you could try specifying the instances as a list instead:
addMatchingKey( new int[] { Tag.SeriesInstanceUID }, "uid1\\uid2\\uid3" );
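As a small illustration of the suggestion above, the backslash-separated value can be built from a list of UIDs with plain Java. This is only a sketch: the class UidListKey, the helper joinUids, and the UIDs themselves are made up for the example, and whether the remote PACS honours a list of UIDs in a single C-FIND is an assumption that needs checking against its conformance statement.

```java
import java.util.Arrays;
import java.util.List;

class UidListKey {

    // DICOM separates the values of a multi-valued attribute with a backslash.
    static String joinUids(List<String> uids) {
        return String.join("\\", uids);
    }

    public static void main(String[] args) {
        // Made-up series UIDs, standing in for the ones collected from earlier queries
        List<String> seriesUids = Arrays.asList("1.2.3.1", "1.2.3.2", "1.2.3.3");
        String matchingValue = joinUids(seriesUids);
        System.out.println(matchingValue); // prints 1.2.3.1\1.2.3.2\1.2.3.3

        // The value would then be passed to the existing call from the answer:
        // dcmqr.addMatchingKey(new int[] { Tag.SeriesInstanceUID }, matchingValue);
    }
}
```

With this, one query carries all the UIDs, so there is no loop that sends a new request while an earlier response is still outstanding.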