Hello!
There is a big web application with lots of threads processing data back and forth. One part of it is the service that processes trades (TradeProcessingService). When a trade is received, it is validated and sent on to other services for further processing; the TradeProcessingService is thus the entry point of this component of the web app.
Each trade is associated with exactly one exchange. Since all processing is based on the trade's exchange, such processing must run in parallel for different exchanges.
Alongside the functionality described above, there is a scheduling service (ExchangeDataUpdaterService) that updates exchange data (one exchange at a time) every 10 seconds. Since this data is used during trade processing, the processing and updating operations must be synchronized.
So it is necessary not only to synchronize each processing thread (across the whole chain of service method calls) on an exchange, but also to synchronize those threads with the updating threads (again, per exchange).
I am not experienced with tasks like this. It seems there should be some shared monitor objects (say, one per exchange) used by both the processing and updating threads...
Could you please suggest some best practices for dealing with the above scenario?
First of all, thanks to OldCurmudgeon, Martin James and Peter Lawrey for the suggested solutions.
I'll describe the approach I implemented (just an overview, without any disaster-recovery or resource-releasing mechanisms).
As I wrote before, the main idea is to process each exchange in parallel.
First, I created a map with exchanges as keys and ExchangeTaskProcessorEnv objects as values. Each ExchangeTaskProcessorEnv contains a BlockingQueue (of ExchangeAwareTask objects) and an ExchangeAwareTaskProcessor (a Runnable implementation that calls the 'invoke' method of each ExchangeAwareTask taken from the queue and performs some additional processing). This map lives in a singleton class, ExchangeTaskHolder.
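A minimal sketch of that layout, with class names taken from the description above; the threading and lifecycle details (daemon threads, no shutdown or error handling) are my assumptions:

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Each task knows its exchange and how to process itself.
abstract class ExchangeAwareTask {
    final String exchange;
    ExchangeAwareTask(String exchange) { this.exchange = exchange; }
    abstract void invoke();
}

// One processor per exchange drains that exchange's queue sequentially,
// so tasks for the same exchange never overlap, while different
// exchanges run in parallel.
class ExchangeAwareTaskProcessor implements Runnable {
    private final BlockingQueue<ExchangeAwareTask> queue;
    ExchangeAwareTaskProcessor(BlockingQueue<ExchangeAwareTask> queue) { this.queue = queue; }
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                queue.take().invoke(); // blocks until a task arrives
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class ExchangeTaskProcessorEnv {
    final BlockingQueue<ExchangeAwareTask> queue = new LinkedBlockingQueue<>();
    final ExchangeAwareTaskProcessor processor = new ExchangeAwareTaskProcessor(queue);
}

// Singleton holder: one env (queue plus processor thread) per exchange.
enum ExchangeTaskHolder {
    INSTANCE;
    private final Map<String, ExchangeTaskProcessorEnv> envs = new ConcurrentHashMap<>();

    void submit(ExchangeAwareTask task) {
        ExchangeTaskProcessorEnv env = envs.computeIfAbsent(task.exchange, ex -> {
            ExchangeTaskProcessorEnv e = new ExchangeTaskProcessorEnv();
            Thread t = new Thread(e.processor, "processor-" + ex);
            t.setDaemon(true); // sketch only; real code needs orderly shutdown
            t.start();
            return e;
        });
        env.queue.add(task);
    }
}
```

Because each exchange has exactly one processor thread draining its queue, tasks for the same exchange are serialized while different exchanges proceed in parallel, which also covers the original requirement of synchronizing trade processing with the exchange-data updates.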
Second, there are a number of operations that can be performed for different business cases: updating an exchange's info, updating a trade's info, synchronizing data with an FTP server, and so on. For each of these cases I created a task that extends ExchangeAwareTask and overrides the 'invoke' method (each task knows how it must be processed). It is worth mentioning that each task holds a reference to its corresponding exchange.
Third, I introduced a static factory for creating the required task for a given exchange.
Finally, when a user or the scheduler needs to perform some action, it creates the required task via the factory and adds it to the ExchangeTaskHolder, which assigns it to the corresponding exchange.
After a while I realized there are a number of special cases where an action does not correspond to any exchange, or corresponds to all exchanges. No problem: in the first case an additional map bucket can be added (the exchange-aware processing mechanism is not affected), while in the second case a special method of the ExchangeTaskHolder creates the required task for each ExchangeAwareTaskProcessor (i.e., for each exchange).
Some time later another requirement arrived: each task must have a certain weight. No problem: I just replaced the ExchangeTaskHolder's BlockingQueue with a PriorityBlockingQueue.
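Since both queues implement the BlockingQueue interface, that swap is a one-liner plus a weight comparator. A small sketch (the WeightedTask class and the lower-weight-first ordering are assumptions of this example):

```java
import java.util.Comparator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical weighted task; here lower weight means higher priority.
class WeightedTask {
    final int weight;
    final String name;
    WeightedTask(int weight, String name) { this.weight = weight; this.name = name; }
}

class WeightedQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // Same BlockingQueue interface as LinkedBlockingQueue, so the holder
        // code does not change; only take()'s ordering does.
        BlockingQueue<WeightedTask> queue = new PriorityBlockingQueue<>(
                11, Comparator.comparingInt((WeightedTask t) -> t.weight));
        queue.add(new WeightedTask(5, "low"));
        queue.add(new WeightedTask(1, "urgent"));
        queue.add(new WeightedTask(3, "normal"));
        System.out.println(queue.take().name); // urgent
        System.out.println(queue.take().name); // normal
        System.out.println(queue.take().name); // low
    }
}
```

Note that PriorityBlockingQueue is unbounded and makes no guarantee about the order of tasks with equal weight.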
Current implementation:
There is a queue from which messages are pushed to a component; from there the messages are written to a DB and processed further. This involves many DB calls and takes considerable time, so the approach needs to change.
One such solution: a ConcurrentHashMap with the header id as key and a ConcurrentLinkedQueue of messages as value.
Dispatcher – segregates the incoming messages by their header id and places them in a ConcurrentHashMap, with the id as key and a ConcurrentLinkedQueue of messages as value.
Worker – a scheduled thread that invokes the processor repeatedly with a specified delay. It hands each queue grouped under a header id to the processor through an executor. Once a particular header id's queue is empty, it removes that entry from the map.
Processor – polls the messages one by one from the ConcurrentLinkedQueue and processes them.
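The three roles above could be sketched roughly like this (the class and field names, the String message type, and the counter standing in for the real processing are all illustrative; real code would also need the DB access and a ScheduledExecutorService driving sweep()):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class MessageRouter {
    private final Map<String, ConcurrentLinkedQueue<String>> byHeaderId = new ConcurrentHashMap<>();
    private final ExecutorService processors = Executors.newFixedThreadPool(4);
    final AtomicInteger processed = new AtomicInteger(); // visible stand-in for the real work

    // Dispatcher: segregate incoming messages by header id.
    void dispatch(String headerId, String message) {
        byHeaderId.computeIfAbsent(headerId, id -> new ConcurrentLinkedQueue<>()).add(message);
    }

    // Worker: meant to run on a schedule; hands each per-id queue to the
    // processor pool and drops ids whose queues have drained.
    void sweep() {
        byHeaderId.forEach((id, queue) -> {
            if (queue.isEmpty()) {
                byHeaderId.remove(id, queue); // only if still mapped to this queue
            } else {
                processors.submit(() -> process(queue));
            }
        });
    }

    // Processor: poll messages one by one from that id's queue.
    private void process(ConcurrentLinkedQueue<String> queue) {
        for (String msg; (msg = queue.poll()) != null; ) {
            processed.incrementAndGet(); // real processing / DB write goes here
        }
    }

    boolean awaitIdle(long seconds) throws InterruptedException {
        processors.shutdown();
        return processors.awaitTermination(seconds, TimeUnit.SECONDS);
    }
}
```

One caveat: between the isEmpty() check and the removal, a dispatcher may still add a message to the queue being removed, and that message would be orphaned. The remove(id, queue) form only protects against removing a replaced queue, so a production version must re-check after removal or drain inside the map operation.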
Also, one of my colleagues commented: "The approach should be scalable, since we have to account for another instance of the component running on a different host."
Please shed some light on this. How can it be done? Any direction, link, or other help is much appreciated.
Maybe you need something like Hazelcast (https://hazelcast.com/), e.g. its distributed IMap: http://docs.hazelcast.org/docs/3.8.3/javadoc/com/hazelcast/core/IMap.html
I am working on a simple project that sends multiple HTTP requests to retrieve some data, parses the response from each URL, and returns a response containing the original URL and some information about the data (the reason for using threads is obviously the multiple HTTP requests).
I am wondering if there is a best practice for this scenario; here are the options that come to mind:
1. Have each thread send an HTTP request, parse the data to extract the required information, and return that information itself (via a Future<SomeDataType>, or a simple DataType getInformation() call made after the thread completes), then create the URL-SomeDataType pair in the original thread.
2. Have each thread take an additional argument: a synchronized list/map to which the thread adds its URL-information pair (the same list/map instance being shared across all threads).
3. Less likely: have each thread just pull the raw data and return it in either way mentioned in 1/2, then parse all the information in the main thread (which will reduce performance but requires almost no synchronization handling, which is nice).
Is there a best practice for a similar scenario?
Thanks!
In my opinion, Option 1 is the cleanest and aligns with best practice. The preferred way to implement it would be the executor framework (thread pools and Callables). Reasons for the choice:
Separation of concerns: each thread returns the result of its work independently. After that, it's the main thread's job to take that result and process it further however it likes (e.g. put it in a map, or merge it into something else). If you later find a better/cleaner way of aggregating the results, that change will most likely not affect what the worker threads themselves do or return.
Option 2 would involve unnecessary synchronization (although you could use a ConcurrentHashMap to keep it minimal). The bigger problem is that it mixes concerns between the main thread and the worker threads: the workers now "know" a bit about what to do with the result, when their only concern should be producing it.
Option 3, as you said, would degrade performance. If the information fetched by each thread is independent of the others, it makes sense to let each thread parse its own data and then return it.
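A sketch of Option 1 with the executor framework; the fetchAndParse stub stands in for the real HTTP call, which is an assumption of this example:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FetchAll {
    // Stand-in for the real "send HTTP request and parse the body" step.
    static String fetchAndParse(String url) {
        return "data-for-" + url;
    }

    // Option 1: each Callable returns its own URL->info pair; the main
    // thread aggregates, so there is no shared mutable state between workers.
    static Map<String, String> fetchAll(List<String> urls) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, urls.size()));
        List<Future<Map.Entry<String, String>>> futures = new ArrayList<>();
        for (String url : urls) {
            Callable<Map.Entry<String, String>> task = () ->
                    new AbstractMap.SimpleEntry<>(url, fetchAndParse(url));
            futures.add(pool.submit(task));
        }
        Map<String, String> results = new LinkedHashMap<>();
        for (Future<Map.Entry<String, String>> f : futures) {
            Map.Entry<String, String> e = f.get(); // blocks until that task is done
            results.put(e.getKey(), e.getValue());
        }
        pool.shutdown();
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchAll(Arrays.asList("http://a.example", "http://b.example")));
    }
}
```

Each Callable returns its own pair, and only the main thread touches the result map, so no synchronization of shared state is needed.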
I am going through the different concurrency models used in multi-threaded environments (http://tutorials.jenkov.com/java-concurrency/concurrency-models.html).
The article highlights three concurrency models.
Parallel Workers
The first concurrency model is what I call the parallel worker model. Incoming jobs are assigned to different workers.
Assembly Line
The workers are organized like workers at an assembly line in a factory. Each worker only performs a part of the full job. When that part is finished the worker forwards the job to the next worker.
Each worker is running in its own thread, and shares no state with other workers. This is also sometimes referred to as a shared nothing concurrency model.
Functional Parallelism
The basic idea of functional parallelism is that you implement your program using function calls. Functions can be seen as "agents" or "actors" that send messages to each other, just like in the assembly line concurrency model (AKA reactive or event driven systems). When one function calls another, that is similar to sending a message.
Now I want to map Java API support onto these three concepts:
Parallel Workers: is this the ExecutorService, ThreadPoolExecutor, and CountDownLatch APIs?
Assembly Line: sending events to a messaging system like JMS, using the messaging concepts of queues and topics.
Functional Parallelism: ForkJoinPool to some extent, and Java 8 streams. The ForkJoin pool is easier to understand than streams.
Am I correct in mapping these concurrency models? If not, please correct me.
Each of those models describes how the work is split up and performed from a general point of view, but when it comes to implementation it really depends on your exact problem. Generally I see it like this:
Parallel Workers: a producer creates new jobs somewhere (e.g. in a BlockingQueue) and many threads (via an ExecutorService) process those jobs in parallel. Of course, you could also use a CountDownLatch, but that implies you want to trigger an action after exactly N subproblems have been processed (e.g. you know your big problem can be split into N smaller problems; check the second example here).
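For illustration, a minimal parallel-workers setup along those lines (the integer jobs, the poison-pill shutdown, and the counter standing in for real processing are all placeholders):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelWorkers {
    // Each worker takes a whole job from the shared queue and processes it
    // start to finish; a negative "poison pill" tells a worker to stop.
    static int processAll(int jobCount, int workerCount) throws InterruptedException {
        BlockingQueue<Integer> jobs = new LinkedBlockingQueue<>();
        AtomicInteger done = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        for (int w = 0; w < workerCount; w++) {
            workers.submit(() -> {
                try {
                    int job;
                    while ((job = jobs.take()) >= 0) {
                        done.incrementAndGet(); // "process" the job here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (int i = 0; i < jobCount; i++) jobs.put(i);    // producer side
        for (int w = 0; w < workerCount; w++) jobs.put(-1); // one pill per worker
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processAll(10, 3)); // 10
    }
}
```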
Assembly Line: for every intermediate step you have a BlockingQueue and one thread (or an ExecutorService). At each step, jobs are taken from one BlockingQueue and put into the next one to be processed further. As for your idea with JMS: JMS is meant to connect distributed components, is part of Java EE, and was not designed for a highly concurrent context (messages are usually kept on disk before being processed).
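A toy assembly line with two such stages, each a thread pumping from its input queue to the next (the string-transform "work" is a stand-in):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

class AssemblyLine {
    // Shared-nothing worker: owns no state, just transforms a job and
    // forwards it to the next stage's queue.
    static void pump(BlockingQueue<String> in, BlockingQueue<String> out,
                     UnaryOperator<String> step) {
        try {
            while (true) out.put(step.apply(in.take()));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop when interrupted
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> stage1 = new LinkedBlockingQueue<>();
        BlockingQueue<String> stage2 = new LinkedBlockingQueue<>();
        BlockingQueue<String> done = new LinkedBlockingQueue<>();

        // Two stages, each running in its own thread with its own input queue.
        Thread wash = new Thread(() -> pump(stage1, stage2, s -> s + "-washed"));
        Thread paint = new Thread(() -> pump(stage2, done, s -> s + "-painted"));
        wash.start();
        paint.start();

        stage1.put("car");
        System.out.println(done.take()); // car-washed-painted
        wash.interrupt();
        paint.interrupt();
    }
}
```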
Functional Parallelism: ForkJoinPool is a good example of how you could implement this.
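For example, the classic divide-and-conquer sum with RecursiveTask (the 1,000-element threshold is arbitrary):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Classic fork/join: split a range until small, solve leaves, join results.
class SumTask extends RecursiveTask<Long> {
    private final long from, to; // half-open range [from, to)
    SumTask(long from, long to) { this.from = from; this.to = to; }

    @Override
    protected Long compute() {
        if (to - from <= 1_000) {             // small enough: compute directly
            long sum = 0;
            for (long i = from; i < to; i++) sum += i;
            return sum;
        }
        long mid = (from + to) / 2;
        SumTask left = new SumTask(from, mid);
        SumTask right = new SumTask(mid, to);
        left.fork();                          // run left half asynchronously
        return right.compute() + left.join(); // compute right here, then join
    }
}

class ForkJoinDemo {
    public static void main(String[] args) {
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(0, 1_000_000));
        System.out.println(sum); // 499999500000
    }
}
```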
An excellent question, to which the answer might not be quite as satisfying. The concurrency models listed show some of the ways you might go about implementing a concurrent system. The API provides tools for implementing any of these models.
Let's start with ExecutorService. It allows you to submit tasks to be executed in a non-blocking way. The ThreadPoolExecutor implementation then limits the maximum number of threads available. An ExecutorService does not require a task to perform the complete process, as you might expect of a parallel worker. A task may be limited to a specific part of the process and send a message upon completion that starts the next step in an assembly line.
Used together, CountDownLatch and ExecutorService provide a means to block until all workers have completed, which may come in handy when a process has been divided into concurrent sub-tasks.
The point of JMS is to provide messaging between components. It does not enforce a specific concurrency model. Queues and topics denote how a message travels from a publisher to a subscriber: with a queue the message is delivered to exactly one subscriber, while a topic broadcasts the message to all subscribers of the topic.
Similar behavior could be achieved within a single component by for example using the observer pattern.
ForkJoinPool is actually an implementation of ExecutorService (which might highlight the difficulty of matching a model to an implementation detail). It just happens to be optimized for working with a large number of small tasks.
Summary: There are multiple ways to implement a certain concurrency model in the Java environment. The interfaces, classes and frameworks used in implementing a program may vary regardless of the concurrency model chosen.
The actor model is another example of an assembly line, e.g. Akka.
There is one controlling entity and several 'worker' entities. The controlling entity requests certain data from the worker entities, which fetch and return it in their own manner.
Since the controlling entity can be agnostic about the worker entities (and workers can be added or removed at any point), putting a JMS provider between them sounds like a good idea. That's the assumption, at least.
Since it is a one-to-many relation (controller -> workers), a JMS topic seems the right solution. But since the controlling entity depends on the workers' return values, request/reply functionality would be nice as well (somewhere I read about the TopicRequestor, but I cannot seem to find a working example). Request/reply is typically queue functionality.
As an attempt to use topics in a request/reply sort of way, I created two JMS topics: request and response. The controller publishes to the request topic and subscribes to the response topic. Every worker subscribes to the request topic and publishes to the response topic. To match requests and responses, the controller subscribes to the response topic with a per-request filter (using a session id as the value), and the messages workers publish to the response topic carry that session id.
Now this does not feel like a solution; rather, it uses JMS as a hammer and treats the problem (and then some) as a nail. Is JMS a solution at all in this situation? Or are there other solutions I'm overlooking?
Your approach sort of makes sense to me; a messaging system could work, but I think using topics is wrong. Take a look at the wiki page for Enterprise Service Bus. It's a little more complicated than you need, but the basic idea for your use case is that you have a worker that can read from one queue, do some processing, and add the processed data to another queue.
The problem with a topic is that all workers get the message at the same time and all work on it independently. It sounds like you only want one worker at a time working on each request. I suspect you chose a topic so that different types of workers could listen to the same channel and respond only to certain requests. For that, you are better off creating a new queue for each type of work. You could have them in pairs, e.g. a work_a_request queue and a work_a_response queue. Or, if your controller can figure out the type of response from the data itself, all workers can write to a single response queue.
If you haven't chosen a message-queue vendor yet, I would recommend RabbitMQ: it's easy to set up, easy to add new queues (especially dynamically), and has really good Spring support (although most major messaging systems have Spring support, and you may not even be using Spring).
I'm also not sure what you are accomplishing with the filters. If the messages to the workers contain all the information needed to do the work, and the response messages contain all the information your controller needs to finish processing, I don't think you need them.
I would simply use two JMS queues.
The first one is where all of the requests go. The workers listen to that queue and process the requests in their own time, in their own way.
Once complete, they bundle the request with its response and put that on the second queue for the final process to handle. This way there's no need for the submitting process to retain the requests; they simply travel along with the entire procedure. A final process listens to the second queue and handles the request/response pairs appropriately.
If there's no need for the messages to be reliable, or for the processes to span JVMs or machines, then this can all be done within a single process using standard Java threading (such as BlockingQueues and ExecutorServices).
If there's a need to accumulate related responses, then you'll need to capture whatever grouping data is necessary and have the Queue 2 listening process accumulate results. Or you can persist the results in a database.
For example, if you know your working set has five elements, you can queue up the requests with that information (1 of 5, 2 of 5, etc.). As each one finishes, the final process can update the database, counting elements. When it sees all of the pieces have been completed (in any order), it marks the result as complete. Later you would have some audit process scan for incomplete jobs that have not finished within some time (perhaps one of the messages erred out), so you can handle them better. Or the original processors can write the request to a separate "this one went bad" queue for mitigation and resubmission.
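The "1 of 5, 2 of 5" bookkeeping reduces to an atomic counter per job; here is an in-memory sketch standing in for the database update (class and method names are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for the database the final process would update:
// count "i of n" completions per job and flag the job done at n.
class CompletionTracker {
    private final ConcurrentMap<String, Integer> completed = new ConcurrentHashMap<>();

    // Returns true exactly when this piece is the last of the group.
    boolean pieceFinished(String jobId, int totalPieces) {
        int seen = completed.merge(jobId, 1, Integer::sum); // atomic increment
        return seen == totalPieces;
    }
}
```

Because merge() is atomic, pieces may finish in any order and on any thread, and exactly one caller sees the count reach totalPieces and marks the job complete.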
If you use JMS with transactions and one of the processors fails, the transaction rolls back and the message is retained on the queue for processing by one of the surviving processors; that's another advantage of JMS.
The trick with this kind of processing is to push the state along with the message, or to externalize it and send references to the state, thus making each component effectively stateless. This aids scaling and reliability, since any component can fail (besides catastrophic JMS failure, naturally) and processing just picks up where it left off once you resolve the problem and restart the components.
If you're in a request/response mode (such as a servlet needing to respond), you can use Servlet 3.0 async servlets to easily put things on hold, or you can put a local object in an internal map, keyed by something such as the session ID, and then Object.wait() on that object. Your Queue 2 listener will then get the response, finalize the processing, and use the session ID (sent with the message and retained throughout the pipeline) to look up the object being waited on; it can then simply Object.notify() it to tell the servlet to continue.
Yes, this ties up a thread in the servlet container while waiting; that's why the new async stuff is better, but you work with the hand you're dealt. You can also add a timeout to the Object.wait(): if it times out, the processing took too long and you can gracefully alert the client.
This basically frees you from filters, reply queues, and the like. It's pretty simple to set it all up.
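The park-and-wake part of this can be sketched in plain Java (the session-id keyed map and the class name are assumptions; a real version must also handle a reply that arrives before the caller registers):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of "park the caller until the reply arrives" via wait/notify.
class ReplyWaiter {
    private final Map<String, Object[]> pending = new ConcurrentHashMap<>();

    // Caller (e.g. the servlet thread): register, then wait with a timeout.
    Object awaitReply(String sessionId, long timeoutMs) throws InterruptedException {
        Object[] slot = new Object[1]; // slot[0] will hold the reply
        pending.put(sessionId, slot);
        synchronized (slot) {
            if (slot[0] == null) slot.wait(timeoutMs); // may time out
        }
        pending.remove(sessionId);
        return slot[0]; // null means the wait timed out
    }

    // Queue 2 listener thread: look the waiter up by session id and wake it.
    void deliver(String sessionId, Object reply) {
        Object[] slot = pending.get(sessionId);
        if (slot != null) {
            synchronized (slot) {
                slot[0] = reply;
                slot.notify();
            }
        }
    }
}
```

A null return signals the timeout case, where the servlet would gracefully alert the client as described above.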
Well, the actual answer should depend on whether your worker entities are external parties physically located outside your network, how long a worker entity is expected to take to finish its work, and so on. But the problem you are trying to solve is one-to-many communication. You added JMS to your system either because you want all entities to be able to talk the JMS protocol, or because you want asynchrony. The former reason does not make much sense; if it is the latter, you can choose another communication style, such as one-way web service calls.
You can use the latest Java concurrency APIs to make multi-threaded, asynchronous one-way web service calls to the different worker entities.
I have a component that I wish to write, and it feels like a common pattern. I was hoping to find the common name for the pattern, if there is one, and examples of how to implement it.
I have a service that queues requests and processes them one at a time. I have a number of client threads which make the requests. The key is that the calling threads must block until their own particular request is serviced.
E.g. if there are 10 threads all making a request, the 10th thread will block the longest while it waits for its request to reach the front of the queue and be processed. In brief pseudocode, a call would be as simple as:
service.processMessage(myMessage); /* block whilst it enqueues, waits, */
/* processes and returns */
I know what you're thinking - why bother having threads at all? Let's just say there are design constraints well outside my control.
Also, this should run on JavaME, which means an infuriating subset of real Java, and no swanky external libraries.
If you have no requirements on the total ordering of request handling (i.e., you don't mind arbitrarily interleaving requests from different threads, regardless of their arrival order), you could simply make processMessage() synchronized, I guess.
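A sketch of that suggestion, using only language-level features available on CLDC-era Java ME (no java.util.concurrent; the service class and its state are illustrative):

```java
// The service serializes requests simply by making the processing method
// synchronized: each calling thread blocks until its own request has been
// fully handled and the result returned to it.
class MessageService {
    private int processedCount = 0; // illustrative shared state

    synchronized String processMessage(String message) {
        // Only one thread is ever in here at a time; the caller blocks
        // until its own message has been processed.
        processedCount++;
        return "processed:" + message;
    }

    synchronized int getProcessedCount() {
        return processedCount;
    }
}
```

One caveat: intrinsic locks are not fair, so there is no guarantee that waiting threads acquire the lock in FIFO order; if strict queue order matters, you need an explicit queue of per-request monitor objects instead.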