I want my two JVM applications to speak to each other on the same machine. I considered using RMI, but then I found Chronicle Queue, which claims to be very fast. I wonder whether I can use Chronicle to invoke a method on the other JVM and wait for the return value. Are there any use cases for that?
It's doable, but might be overkill (especially if you don't have to keep the history of the requests/responses). Imagine a simple scenario with two processes: C (client) and S (server). Create two IndexedChronicles:
Q1 for sending requests from C to S
Q2 for sending responses from S to C
The server has a thread that is polling (busy spin with back-off) on Q1. When it receives a request (with id=x) it does whatever is needed and writes out the response to Q2 (with id=x). C polls Q2 with some policy and reads out responses as they appear. It uses the id to tie responses to requests.
The main task would be devising a wire-level protocol for serialising your commands (the equivalent of the method calls) from the client. This is application specific and can be done efficiently with the Chronicle tools.
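For illustration, here is a minimal sketch of that request/response flow using the current Chronicle Queue API (ChronicleQueue / ExcerptAppender / ExcerptTailer, the successors of IndexedChronicle). The queue paths, field names and the "getQuote" command are made up, and a real client and server would of course run in separate JVMs and poll in loops:

import net.openhft.chronicle.queue.ChronicleQueue;
import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.ExcerptTailer;

public class ChronicleRequestReplySketch {
    public static void main(String[] args) {
        try (ChronicleQueue requests  = ChronicleQueue.singleBuilder("q1-requests").build();
             ChronicleQueue responses = ChronicleQueue.singleBuilder("q2-responses").build()) {

            // client side (C): write a request tagged with an id to Q1
            ExcerptAppender reqAppender = requests.acquireAppender();
            reqAppender.writeDocument(w -> w.write("id").int64(42L)
                                            .write("command").text("getQuote"));

            // server side (S): poll Q1, handle the request, write the response with the same id to Q2
            ExcerptTailer reqTailer = requests.createTailer();
            reqTailer.readDocument(r -> {
                long id = r.read("id").int64();
                String command = r.read("command").text();
                responses.acquireAppender().writeDocument(w -> w.write("id").int64(id)
                                                                .write("result").text("handled " + command));
            });

            // client side (C): poll Q2 and tie the response back to the request by id
            ExcerptTailer respTailer = responses.createTailer();
            respTailer.readDocument(r ->
                System.out.println("response for id " + r.read("id").int64()
                        + ": " + r.read("result").text()));
        }
    }
}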
Other issues to consider:
what should the client do with the historical responses on startup?
some heartbeat system so that client knows the server is alive
archiving of the old queues (VanillaChronicle makes it easier at some cost)
How to use TBufferedTransport of TThreadedSelectorServer in java?
In the Python client:
self.tsocket = TSocket.TSocket(self.host, self.port)
self.transport = TTransport.TBufferedTransport(self.tsocket)
protocol = TBinaryProtocol(self.transport)
client = Handler.Client(protocol)
self.transport.open()
In the Java server:
TNonblockingServerSocket serverTransport = new TNonblockingServerSocket(port);
TProcessor tprocessor = new ExecutionService.Processor<ExecutionService.Iface>(handler);
TThreadedSelectorServer.Args tArgs = new TThreadedSelectorServer.Args(serverTransport);
tArgs.processor(tprocessor);
tArgs.protocolFactory(new TBinaryProtocol.Factory());
this.server = new TThreadedSelectorServer(tArgs);
The Python client uses TBufferedTransport while the Java server uses TFramedTransport, which causes an exception:
AbstractNonblockingServer$FrameBuffer Read an invalid frame size of -2147418111. Are you using TFramedTransport on the client side?
For various reasons the client cannot be modified, so I want to change the Java server to use TBufferedTransport.
thanks!!!
The TThreadedSelectorServer requires TFramedTransport (reference):
A Half-Sync/Half-Async server with a separate pool of threads to handle non-blocking I/O. Accepts are handled on a single thread, and a configurable number of nonblocking selector threads manage reading and writing of client connections. ... Like TNonblockingServer, it relies on the use of TFramedTransport.
This applies to the other non-blocking server classes deriving from TNonblockingServer as well (reference):
A nonblocking TServer implementation. This allows for fairness amongst all connected clients in terms of invocations. This server is inherently single-threaded. If you want a limited thread pool coupled with invocation-fairness, see THsHaServer. To use this server, you MUST use a TFramedTransport at the outermost transport, otherwise this server will be unable to determine when a whole method call has been read off the wire. Clients must also use TFramedTransport.
If you cannot use TFramedTransport on the client side, you therefore have to use a blocking server, i.e. TThreadPoolServer (reference):
Server which uses Java's built in ThreadPool management to spawn off a worker pool that deals with client connections in blocking way.
Your code would then look like this:
TServerSocket serverTransport = new TServerSocket(9090);
TThreadPoolServer.Args tArgs = new TThreadPoolServer.Args(serverTransport);
tArgs.processor(processor);
tArgs.protocolFactory(new TBinaryProtocol.Factory());
TThreadPoolServer server = new TThreadPoolServer(tArgs);
server.serve(); // blocks, serving clients from the thread pool
To detail the differences between the blocking and the non-blocking servers (for general reference, apologies if the difference is already clear to you): blocking means that when data is read from a socket, no other operation can be done while reading. So when the data arrives partially, the current thread waits until the remaining data arrives. When a blocking server only has a single thread, only one client can be handled at a time, and the time spent waiting for further data from a client cannot be used to serve other clients.
To support multiple clients, multiple threads can be added (as done for TThreadPoolServer). Each thread can only handle one client at a time as before, so the number of clients that can be served simultaneously is limited by the number of threads. You could of course spawn many threads, but this does not scale well: the threads used by the Java ThreadPool which backs the TThreadPoolServer are system-level threads, so they come with some resource overhead for creation and for switching between threads. Creating a large number of threads to serve a large number of clients therefore means more time is spent on OS bookkeeping.
Non-blocking servers (deriving from TNonblockingServer) are meant to solve this problem by using the time spent waiting for data from one client to read data from other clients. This way a single thread can handle multiple clients, reading from whichever client currently has data available. A non-blocking server can of course also have multiple threads, each handling multiple clients. This way the number of threads does not have to scale with the number of clients. Instead, the number of threads can be chosen proportionally to the number of CPU cores, and each thread running on a core can then read as much data as the I/O bandwidth and CPU speed allow. For this reason, a non-blocking server scales better to high client counts.
So if you have to handle a large number of clients simultaneously, using TNonblockingServer would be preferable, and it would be better to find a way to switch the client to TFramedTransport. If your use case involves only a limited number of clients, then using TThreadPoolServer without modifying the client should be fine, even if each client produces a lot of data.
I'm working on a program that will have multiple threads requiring information from a web-service that can handle requests such as:
"Give me [Var1, Var2, Var3] for [Object1, Object2, ... Object20]"
and the resulting reply will give me, in this case, a 20-node XML document (one node for each object), each node with 3 sub-nodes (one for each var).
My challenge is that each request made of this web-service costs the organization money, and the cost is the same whether it is for 1 var for 1 object or for 20 vars for 20 objects.
So, that being the case, I'm looking for an architecture that will:
Create a request on each thread as data is required
Have a middle-tier "aggregator" that gets all the requests
Once X requests have been aggregated (or a time limit has been reached), the middle-tier performs a single request of the web-service
Middle-tier receives reply from web-service
Middle-tier routes information back to waiting objects
Currently, my thoughts are to use a library such as NetMQ with my middle-tier as a server and each thread as a poller, but I'm getting stuck on the actual implementation and, before going too far down the rabbit-hole, am hoping there's already a design pattern / library out there that does this substantially more efficiently than I'm conceiving of.
Please understand that I'm a noob, so ANY help / guidance would be greatly appreciated!!
Thanks!!!
Overview
From the architectural point of view, you just sketched out a good approach for the problem (a minimal code sketch follows the list below):
Insert a proxy between the requesting applications and the remote web service
In the proxy, put the requests in the request queue, until at least one of the following events occurs
The request queue reaches a given length
The oldest request in the request queue reaches a certain age
Group all requests in the request queue in one single request, removing duplicate objects or attributes
Send this request to the remote web service
Move the requests into the (waiting for) response queue
Wait for the response until one of the following occurs
the oldest request in the response queue reaches a certain age (time out)
a response arrives
Get the response (if applicable) and map it to the corresponding requests in the response queue
Answer all requests in the response queue that have an answer
Send a timeout error for all requests older than the timeout limit
Remove all answered requests from the response queue
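As referenced above, here is a minimal Java sketch of that batching loop. BatchingProxy, callRemoteService, the batch size and the timeout are all made up for illustration (and it uses Java 16+ records); a real implementation would call your paid web service and use your own request/response types:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.*;
import java.util.stream.Collectors;

public class BatchingProxy {
    private static final int MAX_BATCH = 20;       // flush when this many requests are queued
    private static final long MAX_WAIT_MS = 200;   // or when the oldest request is this old

    private record Pending(String objectId, CompletableFuture<String> reply) {}

    private final BlockingQueue<Pending> queue = new LinkedBlockingQueue<>();

    // Called by the worker threads; returns a future that completes when the batch is answered.
    public CompletableFuture<String> request(String objectId) {
        Pending p = new Pending(objectId, new CompletableFuture<>());
        queue.add(p);
        return p.reply();
    }

    // Single background loop: collect up to MAX_BATCH requests or wait MAX_WAIT_MS, then flush.
    public void runLoop() throws InterruptedException {
        while (true) {
            List<Pending> batch = new ArrayList<>();
            batch.add(queue.take());                            // block until there is work
            long deadline = System.currentTimeMillis() + MAX_WAIT_MS;
            while (batch.size() < MAX_BATCH) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) break;
                Pending next = queue.poll(remaining, TimeUnit.MILLISECONDS);
                if (next == null) break;
                batch.add(next);
            }
            flush(batch);
        }
    }

    private void flush(List<Pending> batch) {
        // One request to the paid web service for all distinct object ids in the batch.
        List<String> ids = batch.stream().map(Pending::objectId).distinct().collect(Collectors.toList());
        Map<String, String> results = callRemoteService(ids);   // placeholder for the real call
        for (Pending p : batch) {
            p.reply().complete(results.getOrDefault(p.objectId(), "no data"));
        }
    }

    private Map<String, String> callRemoteService(List<String> ids) {
        // Stand-in for the actual (costly) web-service call.
        return ids.stream().collect(Collectors.toMap(id -> id, id -> "xml-for-" + id));
    }

    public static void main(String[] args) {
        BatchingProxy proxy = new BatchingProxy();
        new Thread(() -> { try { proxy.runLoop(); } catch (InterruptedException ignored) {} }).start();
        proxy.request("Object1").thenAccept(System.out::println);
    }
}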
Technology
You probably won't find an off-the-shelf product or a framework that exactly matches your requirements. But there are several frameworks / architectural patterns that you can use to build a solution.
C#: RX and LINQ
If you want to use C#, you could use Reactive Extensions (Rx) to get the timing and the grouping right.
You could then use LINQ to select the attributes from the requests to build the response, and to select the requests in the response queue that either match a certain part of a response or that have timed out.
Scala/Java: Akka
You could model the solution as an actor system, using several actors:
An actor as the gateway for the requests
An actor holding the request queue
An actor sending the request to the remote web service and getting the response back
An actor holding the response queue
An actor sending out the responses or the timeouts
An actor system makes it easy to deal with concurrency and to separate the concerns in a testable way.
When using Scala, you could use its "monadic" collection API (filter, map, flatMap) to do basically the same as with LINQ in the C# approach.
The actor approach really shines when you want to test the individual elements. It is very easy to test each actor individually, without having to mock the whole workflow.
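For a rough idea of what that actor decomposition could look like with Akka's classic Java API (Gateway and RequestQueue are hypothetical names; the flush-on-size/age logic and the web-service actor are omitted for brevity):

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class AkkaAggregatorSketch {

    // Gateway actor: forwards incoming requests to the request-queue actor.
    static class Gateway extends AbstractActor {
        private final ActorRef requestQueue;
        Gateway(ActorRef requestQueue) { this.requestQueue = requestQueue; }
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .match(String.class, req -> requestQueue.forward(req, getContext()))
                    .build();
        }
    }

    // Request-queue actor: collects requests; a real version would flush on size/age
    // and hand the batch to a web-service actor.
    static class RequestQueue extends AbstractActor {
        private final java.util.List<String> pending = new java.util.ArrayList<>();
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .match(String.class, req -> {
                        pending.add(req);
                        getSender().tell("queued: " + req, getSelf());
                    })
                    .build();
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("aggregator");
        ActorRef queue = system.actorOf(Props.create(RequestQueue.class), "requestQueue");
        ActorRef gateway = system.actorOf(Props.create(Gateway.class, queue), "gateway");
        gateway.tell("give me Var1 for Object1", ActorRef.noSender());
    }
}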
Erlang/Elixir: Actor System
This is similar to the Akka approach, just with a different (functional!) language. Erlang / Elixir has a lot of support for distributed actor systems, so when you need an ultra stable or scalable solution, you should look into this one.
NetMQ / ZeroMQ
This is probably too low-level and brings too little infrastructure on its own. If you use an actor system, you could try to bring in NetMQ / ZeroMQ as the transport system.
Your idea of using a queue looks good to me.
This is one possible solution to your problem and I'm sure there are countless other solutions that can do what you need.
Have a "publish queue" (PQ) and a "consume queue" (CQ)
Clients subscribe to CQ and MT subscribes to PQ
Clients publish the requests to PQ
MT listens to PQ, aggregates requests and dispatches them to the farm on a worker thread
Once the results are back, this thread separates the results into req/res pairs
It then publishes the req/res pairs to the CQ
Each client picks the correct message and processes it
Long(er) version:
Have your "middle tier" to listen to a queue (to which, the clients publish messages) and aggregate the requests until N number of requests have come through or X amount of time has passed.
One you are ready, offload the aggregated request to a thread to call your farm and get the results. A bigger problem will most likely arise when you need to communicate this back to the clients.
For that, you probably need another queue that all your clients subscribe to and once your result batch is ready (say 20 responses in XML) from the farm, the thread that called the farm will separate the XML results into their corresponding request/response pair and publish to this queue. Each client will need to pick up the correct request/response pair from the queue and process it.
This will not be a webservice in the traditional sense, since the wait times can be prohibitively long and you don't want to maintain a connection, which is why I suggest the queue.
You can also make your consume queue topic-based, meaning you only publish the req/res pairs to the consumer that asked for them instead of broadcasting (so the client doesn't have to "pick the correct req/res"; it is taken care of by the topic name). Almost all queueing systems support this.
I wonder if I can do request-reply with this:
1 hazelcast instance/member (central point)
1 application with hazelcast-client sending requests through a queue
1 application with hazelcast-client waiting for requests on the queue
The first application also receives the response, posted by the second application, on another queue.
Is it a good way to proceed? Or do you think of a better solution?
Thanks!
Over the last couple of days I also worked on an "SOA-like" solution using Hazelcast queues to communicate between different processes on different machines.
My main goals were to have
"one to one-of-many" communication with garanteed reply of one-of-the-many's
"one to one" communication one way
"one to one" communication with answering in a certain time
To make a long story short, I dropped this approach today for the following reasons:
lots of complicated code with executor services, callables, runnables, InterruptedExceptions, shutdown handling, Hazelcast transactions, etc.
dangling messages in the case of "one to one" communication when the receiver has a shorter lifetime than the sender
losing messages if I kill certain cluster member(s) at just the right time
all cluster members must be able to deserialize the message, because it could be stored anywhere. Therefore the messages can't be "specific" for certain clients and services.
I switched over to a much simpler approach:
all "services" register themselves in a MultiMap ("service registry") using the hazelcast cluster member UUID as key. Each entry contains some meta information like service identifier, load factor, starttime, host, pid, etc
clients pick a UUID of one of the entries in that MultiMap and use a DistributedTask (distributed executor service) for the choosen specific cluster member to invoke the service and optionally get a reply (in time)
only the service client and the service must have the specific DistributedTask implementation in their classpath, all other cluster members are not bothered
clients can easily figure out dead entries in the service registry themselves: if they can't see a cluster member with the specific UUID (hazelcastInstance.getCluster().getMembers()), the service died probably unexpected. Clients can then pick "alive" entries, entries which fewer load factor, do retries in case of idempotent services, etc
Programming gets very easy and powerful using the second approach (e.g. timeouts or cancellation of tasks), and there is much less code to maintain.
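A rough sketch of that second approach, assuming the Hazelcast 3.x API (IExecutorService is the successor of the older DistributedTask); the registry name, the metadata string and EchoTask are made up for illustration:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IExecutorService;
import com.hazelcast.core.Member;
import com.hazelcast.core.MultiMap;

import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;

public class RegistrySketch {

    // A task that both the service client and the service member have on their classpath.
    public static class EchoTask implements Callable<String>, Serializable {
        private final String payload;
        public EchoTask(String payload) { this.payload = payload; }
        @Override public String call() { return "echo: " + payload; }
    }

    public static void main(String[] args) throws Exception {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // Service side: register this member under its UUID with some metadata.
        MultiMap<String, String> registry = hz.getMultiMap("service-registry");
        String myUuid = hz.getCluster().getLocalMember().getUuid().toString();
        registry.put(myUuid, "serviceId=echo;load=0.1;host=localhost");

        // Client side: pick a registered member that is still alive, then run the
        // task on exactly that member and wait for the reply.
        IExecutorService executor = hz.getExecutorService("service-calls");
        for (Member member : hz.getCluster().getMembers()) {
            if (registry.containsKey(member.getUuid().toString())) {
                Future<String> reply = executor.submitToMember(new EchoTask("hello"), member);
                System.out.println(reply.get());   // add a timeout in real code
                break;
            }
        }
        hz.shutdown();
    }
}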
Hope this helps!
In the past we built an SOA system that uses Hazelcast queues as a bus. Here are some of the headlines.
a. Each service has an incoming queue; the service name is simply the name of the queue. You can have as many service providers as you wish, and you can scale up and down. All you need is for these service providers to poll this queue and process the arriving requests.
b. Since the system is fully asynchronous, there is also a call id on both the request and the response to correlate them.
c. Each client sends a request into the queue of the service that it wants to call. The request has all the parameters for the service, the name of the queue to send the response to, and a call id. The queue name can simply be the address of the client; this way each client has its own unique queue.
d. Upon receiving the request, a service provider processes it and sends the response to the answer queue.
e. Each client also continuously polls its input queue to receive the answers for the requests it sent.
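To make those headlines concrete, here is a minimal sketch using Hazelcast 3.x IQueues; the queue names and the Request/Response classes are illustrative only, and the client and provider sides are squeezed into one main method for brevity:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IQueue;

import java.io.Serializable;
import java.util.UUID;

public class QueueBusSketch {

    static class Request implements Serializable {
        final String callId, replyQueue, payload;
        Request(String callId, String replyQueue, String payload) {
            this.callId = callId; this.replyQueue = replyQueue; this.payload = payload;
        }
    }

    static class Response implements Serializable {
        final String callId, payload;
        Response(String callId, String payload) { this.callId = callId; this.payload = payload; }
    }

    public static void main(String[] args) throws InterruptedException {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // (c) client: send a request to the service queue, naming its own reply queue and a call id
        String replyQueueName = "client-" + UUID.randomUUID();
        String callId = UUID.randomUUID().toString();
        IQueue<Request> serviceQueue = hz.getQueue("accountInfoService");
        serviceQueue.put(new Request(callId, replyQueueName, "account=4711"));

        // (a, d) service provider: poll the service queue and answer on the reply queue
        Request req = serviceQueue.take();
        IQueue<Response> answerQueue = hz.getQueue(req.replyQueue);
        answerQueue.put(new Response(req.callId, "balance=42"));

        // (e) client: poll its own queue and correlate the answer by call id
        IQueue<Response> clientQueue = hz.getQueue(replyQueueName);
        Response resp = clientQueue.take();
        if (resp.callId.equals(callId)) {
            System.out.println("got answer: " + resp.payload);
        }
        hz.shutdown();
    }
}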
The major drawback of this design is that queues are not as scalable as maps, so it is not very scalable. However, it can still process 5K requests per second.
I made a test for myself and validated that it works well, with certain limitations.
The architecture is Producer-Hazelcast_node-Consumer(s).
Using two Hazelcast queues, one for Request and one for Response, I could measure a round trip of under 1 ms.
Load balancing works fine if I put several consumers on the Request queue.
If I add another node, and connect the clients to each node, then the round trip is above 15ms. This is due to replication between the 2 hazelcast nodes. If I kill a node, the clients continue to work. So failover is working at the cost of time.
Can't you use the correlation id to perform request-reply on a single queue in hazelcast? That's the id that should uniquely define a conversation between 2 providers/consumers of a queue.
What is the purpose of this setup, @unludo? I am just curious.
I'm writing a process which must connect to several hundred remote peers (and keep the connections alive) and manage messaging / control over them.
I made two versions of this software: the first with the classic "thread-per-connection" model, the second using standard Java NIO and selectors (to reduce thread allocation, but it has problems). Then, looking around, I found that Netty can give a big boost in most cases, so I started a third version using it. My goal is to keep resource usage quite low while keeping it fast.
Having written the pipeline factory with custom events and dynamic handler switching, I got stuck on the most superficial part: its allocation.
All the examples I read use a single client with a single connection, which leaves me in doubt: I set up a ChannelFactory and a PipelineFactory, so every (new ClientBootstrap(factory)).connect(address) makes a new channel with a new pipeline. Is it possible to have a shared pipeline and defer business logic to a thread pool?
If so, how?
Using standard Java NIO I managed with two small thread pools (threads < remote peers) by taking advantage of selectors; I had, however, trouble recycling listened channels for writing.
Communication should happen through a single channel which can receive timed messages from the remote peer or make a 3-way control (command-ack-ok).
On the other hand: once the event has reached the last handler, what happens? Is that where I extract it, or can I extract a message at any point?
You should only have one bootstrap (i.e. one ChannelFactory and one ChannelPipelineFactory). Pipelines, or even individual channel handlers, may be shared, but they are usually created uniquely per channel.
You can have an ExecutionHandler in your pipeline to transfer execution from the IO worker threads to a thread pool.
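A minimal sketch, assuming Netty 3.x (which the ClientBootstrap / ChannelFactory names in the question suggest): one shared ChannelFactory and one shared ExecutionHandler, while each connect() still gets its own pipeline built by the factory. The host, port and handler logic are placeholders:

import org.jboss.netty.bootstrap.ClientBootstrap;
import org.jboss.netty.channel.*;
import org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory;
import org.jboss.netty.handler.execution.ExecutionHandler;
import org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor;

import java.net.InetSocketAddress;
import java.util.concurrent.Executors;

public class SharedBootstrapSketch {
    public static void main(String[] args) {
        ChannelFactory factory = new NioClientSocketChannelFactory(
                Executors.newCachedThreadPool(),    // boss threads
                Executors.newCachedThreadPool());   // I/O worker threads

        // Shared across all channels: moves handler execution off the I/O threads.
        final ExecutionHandler executionHandler =
                new ExecutionHandler(new OrderedMemoryAwareThreadPoolExecutor(16, 1048576, 1048576));

        ClientBootstrap bootstrap = new ClientBootstrap(factory);
        bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
            public ChannelPipeline getPipeline() {
                // A new pipeline per channel, but it reuses the shared ExecutionHandler.
                return Channels.pipeline(
                        executionHandler,
                        new SimpleChannelUpstreamHandler() {   // business logic runs on the pool
                            @Override
                            public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) {
                                System.out.println("got: " + e.getMessage());
                            }
                        });
            }
        });

        // Each connect() creates a new channel with its own pipeline instance.
        bootstrap.connect(new InetSocketAddress("remote-peer.example", 9000));
    }
}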
But why don't you read the exhaustive documentation at http://netty.io/wiki/ ? You'll find answers to every question of yours there.
I'm working on an existing application that uses a transport layer with point-to-point MQ communication.
For each account in a given list we need to retrieve some information.
Currently we have something like this to communicate with MQ:
responseObject getInfo(requestObject) {
    // code to send message to MQ
    // code to retrieve message from MQ
}
As you can see we wait until it finishes completely before proceeding to the next account.
Due to performance issues we need to rework it.
There are 2 possible scenarios that I can think of at the moment.
1) Within the application, create a bunch of threads that each execute the transport adapter for one account, then get the data from each task. I prefer this method, but some of the team members argue that the transport layer is a better place for such a change and that we should place the extra load on MQ instead of our application.
2) Rework transport layer to use publish/subscribe model.
Ideally I want something like this:
void send(requestObject) {
    // code to send message to MQ
}

responseObject receive() {
    // code to retrieve message from MQ
}
Then I will just send requests in a loop, and later retrieve the data in a loop. The idea is that while the first request is being processed by the back-end system we don't have to wait for the response, but can instead send the next request.
My question, is it going to be a lot faster than current sequential retrieval?
The question title frames this as a choice between P2P and pub/sub but the question body frames it as a choice between threaded and pipelined processing. These are two completely different things.
Either code snippet provided could just as easily use P2P or pub/sub to put and get messages. The decision should not be based on speed but rather whether the interface in question requires a single message to be delivered to multiple receivers. If the answer is no then you probably want to stick with point-to-point, regardless of your application's threading model.
And, incidentally, the answer to the question posed in the title is "no." When you use the point-to-point model your messages resolve immediately to a destination or transmit queue and WebSphere MQ routes them from there. With pub/sub your message is handed off to an internal broker process that resolves zero to many possible destinations. Only after this step does the published message get put on a queue where, for the remainder of its journey, it is then handled like any other point-to-point message. Although pub/sub is not normally noticeably slower than point-to-point, the code path is longer and therefore, all other things being equal, it will add a bit more latency.
The other part of the question is about parallelism. You proposed either spinning up many threads or breaking the app up so that requests and replies are handled separately. A third option is to have multiple application instances running. You can combine any or all of these in your design. For example, you can spin up multiple request threads and multiple reply threads and then have application instances processing against multiple queue managers.
The key to this question is whether the messages have affinity to each other, to order dependencies or to the application instance or thread which created them. For example, if I am responding to an HTTP request with a request/reply then the thread attached to the HTTP session probably needs to be the one to receive the reply. But if the reply is truly asynchronous and all I need to do is update a database with the response data then having separate request and reply threads is helpful.
In either case, the ability to dynamically spin up or down the number of instances is helpful in managing peak workloads. If this is accomplished with threading alone then your performance scalability is bound to the upper limit of a single server. If this is accomplished by spinning up new application instances on the same or different server/QMgr then you get both scalability and workload balancing.
Please see the following article for more thoughts on these subjects: Mission:Messaging: Migration, failover, and scaling in a WebSphere MQ cluster
Also, go to the WebSphere MQ SupportPacs page and look for the Performance SupportPac for your platform and WMQ version. These are the ones with names beginning with MP**. These will show you the performance characteristics as the number of connected application instances varies.
It doesn't sound like you're thinking about this the right way. Regardless of the model you use (point-to-point or publish/subscribe), if your performance is bounded by a slow back-end system, neither will help speed up the process. If, however, you could theoretically issue more than one request at a time against the back-end system and expect to see a speed up, then you still don't really care if you do point-to-point or publish/subscribe. What you really care about is synchronous vs. asynchronous.
Your current approach for retrieving the data is clearly synchronous: you send the request message, and wait for the corresponding response message. You could do your communication asynchronously if you simply sent all the request messages in a row (perhaps in a loop) in one method, and then had a separate method (preferably on a different thread) monitoring the incoming topic for responses. This would ensure that your code would no longer block on individual requests. (This roughly corresponds to option 2, though without pub/sub.)
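As a hedged sketch of that asynchronous variant using the JMS API (which WebSphere MQ provides a client for): requests are fired in a loop, and a listener on a separate session correlates replies by JMSCorrelationID. The queue names and the use of the account id as correlation id are placeholders:

import javax.jms.*;

public class AsyncRequestReplySketch {

    public void run(ConnectionFactory cf, java.util.List<String> accounts) throws JMSException {
        Connection connection = cf.createConnection();

        // Separate sessions: a JMS session is single-threaded, and the reply listener
        // runs on a provider thread while the main thread keeps sending requests.
        Session sendSession = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Session receiveSession = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        Queue requestQueue = sendSession.createQueue("ACCOUNT.REQUEST");
        Queue replyQueue = sendSession.createQueue("ACCOUNT.REPLY");

        MessageConsumer consumer = receiveSession.createConsumer(receiveSession.createQueue("ACCOUNT.REPLY"));
        consumer.setMessageListener(message -> {
            try {
                System.out.println("reply for " + message.getJMSCorrelationID()
                        + ": " + ((TextMessage) message).getText());   // e.g. update a database here
            } catch (JMSException e) {
                e.printStackTrace();
            }
        });
        connection.start();

        // Send every request up front instead of blocking on each reply in turn.
        MessageProducer producer = sendSession.createProducer(requestQueue);
        for (String account : accounts) {
            TextMessage request = sendSession.createTextMessage("getInfo " + account);
            request.setJMSCorrelationID(account);   // lets the listener tie reply to request
            request.setJMSReplyTo(replyQueue);
            producer.send(request);
        }
    }
}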
I think option 1 could get pretty unwieldy, depending on how many requests you actually have to make, though it, too, could be implemented without switching to a pub/sub channel.
The reworked approach will use fewer threads. Whether that makes the application faster depends on whether the overhead of managing a lot of threads is currently slowing you down. If you have fewer than 1000 threads (this is a very, very rough order-of-magnitude estimate!), I would guess it probably isn't. If you have more than that, it might well be.