I have an app where, upon pressing a Start button, a service begins polling a few sensors and stores the sensor data into an object whenever the sensor values change. Every 10ms, a database insert occurs that takes the object's current values and stores them in the database. This happens for 30 minutes.
Given the speed and duration of insertion, I want to run this in a separate thread from the UI thread so navigation doesn't take a hit. So my service will offer data to the thread by adding it to a queue, and the other thread (the consumer) will take from the queue and insert into the database.
When the Stop button is pressed, I need to make sure to process the rest of the queue before killing off the thread.
It seems that everywhere I look, some sort of blocking queue is recommended for producer/consumer situations (e.g. LinkedBlockingQueue vs ConcurrentLinkedQueue, or What's the difference between LinkedBlockingQueue and ConcurrentLinkedQueue?).
My question is, does a blocking queue make sense in my situation?
The most vital thing in this app is that all data gets inserted into the db. From what I understand (please correct me if I'm wrong), if the queue becomes full and the consumer thread can't do inserts quickly enough to free up space, then the producer is blocked from adding things to the queue. If that's right, then by the time the queue has free space, a few sensor readings will have gone by and they won't be inserted into the db because of the blocking.
At the end of the day, I just need the best way to ensure that data gets inserted every 10ms without skipping a beat. In my mind it makes sense to dump the values into some unbounded queue every 10ms and have the consumer poll it as soon as it's able. Then, when Stop is pressed, drain the rest of the queue before killing the thread.
So what is the correct way to handle this in a 1 producer/1 consumer situation?
If I were you, I would use a single-thread executor for this task: it comes with exactly the functionality you need out of the box. Its work queue is unbounded, so submissions never block, and shutdown() lets the already-queued inserts finish before the thread dies.
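As a minimal sketch of that approach (SensorSnapshot, insertRow, and the one-minute drain timeout are illustrative placeholders, not from the original post), the executor's built-in unbounded work queue plays the role of the producer/consumer queue:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    class SensorRecorder {
        // Single worker thread with an unbounded work queue: submissions never block.
        private final ExecutorService dbWriter = Executors.newSingleThreadExecutor();

        // Called every 10ms with a snapshot of the current sensor values.
        void record(final SensorSnapshot snapshot) {
            dbWriter.submit(() -> insertRow(snapshot));
        }

        // Called when Stop is pressed: stop accepting work, then drain the queue.
        void stop() throws InterruptedException {
            dbWriter.shutdown();                            // no new tasks accepted
            dbWriter.awaitTermination(1, TimeUnit.MINUTES); // queued inserts finish
        }

        private void insertRow(SensorSnapshot s) { /* database insert goes here */ }
    }

    class SensorSnapshot { /* immutable copy of the current sensor values */ }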
I have an architecture where a set of "daemon" processes form my platform. These daemon processes are full Hazelcast members and are the datastore for all data in the application. The actual business logic is segregated from the daemons and resides in a large number of microservice-style components located either physically on the same server or on different machines (vms, containers, etc). The services can modify data in the datastore and subscribe to events in the datastore from the daemons, but the model is quite different and abstracted from Hazelcast's map view, so my events are not as simple as listening to map modifications; they are generated when multiple maps are modified in certain ways. The service clients (Hazelcast lite members) define the events that they want to listen to. The catch is, multiple instances (any number) of each flavour of service component could be running, and I only want one instance (any one) to handle each event (i.e. round-robin or load balancing).
My current solution is to use a Hazelcast queue. The daemons listen to events on maps and decide when to trigger an event based on those maps. The daemon that owns the key is the one that triggers the event, so that the event is only triggered in one place. I push this event onto a queue to which each instance of a listener for this event is connected. Thus, whoever gets to the event first processes it.
For example, I have a datasource microservice called IncomingBondPrices that puts the prices into the daemon datastore. I have 10 instances of a separate microservice called priceProcessor. When a price reaches a certain threshold the daemons trigger an event (let's call it "PriceThresholdReached"). I want one and only one of the 10 instances of priceProcessor to handle each event so if I am streaming in hundreds or thousands of prices the load of handling the events is split across my instances of priceProcessor.
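As a concrete sketch of that queue-based fan-out (imports follow the Hazelcast 3.x package layout; the PriceEvent type and handle method are illustrative assumptions):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IQueue;

    public class PriceThresholdQueue {
        // Daemon side: the owner of the key pushes exactly one event onto the queue.
        static void trigger(HazelcastInstance hz, PriceEvent event) {
            IQueue<PriceEvent> events = hz.getQueue("PriceThresholdReached");
            events.offer(event);
        }

        // priceProcessor side: every instance blocks on take(), so whichever
        // instance takes an event first is the only one that handles it.
        static void consumeLoop(HazelcastInstance hz) throws InterruptedException {
            IQueue<PriceEvent> events = hz.getQueue("PriceThresholdReached");
            while (true) {
                handle(events.take());
            }
        }

        static void handle(PriceEvent e) { /* business logic placeholder */ }
    }

    class PriceEvent implements java.io.Serializable { /* price, instrument, ... */ }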
My concern is what happens if there are no consumers? I can't find any way to count the number of consumers on a Hazelcast queue. The system is entirely dynamic: the services start up and send the definitions of the events they're interested in to the daemons. It is possible for 1, 2, 20, or 100 instances of any given service to be started, and it is possible that they may all be shut down so there are no longer any subscribers for the event. If there are currently no subscribers to a given event, I'd like to destroy the queue and not push any events to it. I do not want events to queue up if there are no subscribers.
How could I go about managing this? The only way I can come up with is to keep a count of the subscribers for each event type in the daemons and destroy the queues when that count drops to 0. But my concern is that services will most likely be killed without a graceful shutdown, so they won't have a chance to explicitly tell the daemon they're not listening anymore. Managing this would require me to explicitly check that all members are still alive, or subscribe to the events Hazelcast fires when a member has disconnected and then track down all of that member's subscriptions to end them. Is there a better way to do this? It seems overly complex. Ideally, I would like some way to find out how many members are currently running a take() on the queue at any given time; if that is 0 and there is no data on the queue, then destroy it.
Thank-you,
Troy.
What I can suggest is creating a dedicated ISet (or IMap) with a name such as "registerConsumers". Each consumer writes its id into the set and removes it in a shutdown hook.
Producers initially check the set and register an ItemListener to stay updated. What should you do if a listener process dies without a graceful shutdown? Rely on load balancing: a new instance will start and you will see a new entry appear. If you use an IMap instead, each consumer can periodically refresh a timestamp stored as its map value, while the producer periodically checks the last update and removes entries that have stopped refreshing. This way, if you see that there are no consumers, you simply persist the data in another store until a consumer becomes available. There is no need to destroy the queues; eventually a consuming microservice will start.
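A rough sketch of that heartbeat variant (Hazelcast 3.x layout; the map name reuses the post's "registerConsumers", while the 10s heartbeat and 30s liveness window are illustrative assumptions):

    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;
    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    class ConsumerRegistry {
        // Consumer side: refresh a heartbeat every 10s and deregister on shutdown.
        static void register(HazelcastInstance hz) {
            IMap<String, Long> consumers = hz.getMap("registerConsumers");
            String id = UUID.randomUUID().toString();
            ScheduledExecutorService beat = Executors.newSingleThreadScheduledExecutor();
            beat.scheduleAtFixedRate(
                    () -> consumers.put(id, System.currentTimeMillis()),
                    0, 10, TimeUnit.SECONDS);
            Runtime.getRuntime().addShutdownHook(new Thread(() -> consumers.remove(id)));
        }

        // Producer side: evict entries that stopped heartbeating; a killed consumer
        // disappears within the 30s window even without a graceful shutdown.
        static boolean anyConsumersAlive(HazelcastInstance hz) {
            IMap<String, Long> consumers = hz.getMap("registerConsumers");
            long cutoff = System.currentTimeMillis() - 30_000;
            for (Map.Entry<String, Long> e : consumers.entrySet()) {
                if (e.getValue() < cutoff) {
                    consumers.remove(e.getKey());
                }
            }
            return !consumers.isEmpty();
        }
    }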
I need a blocking queue with a size of 1, where every put replaces the previous value with the new one. The consumers would be a thread pool in which every thread needs to read each message as it is put on the queue and decide what to do with it, but none of them should be able to take from the queue, since all of them need to read it.
I was considering just taking and putting every time the producer sends out a new message, but having only peek in the run method of the consumers will result in them constantly peeking, won't it? Ideally the message would disappear as soon as the peeking stops, but I don't want to use a timed poll, as it's not guaranteed that every consumer will peek the message in time.
My other option at the moment is to iterate over the collection of consumers and call a public method on each of them with the message, but I really don't want to do that, since the system relies on real-time updates and a large collection will take a while to iterate through if each call runs sequentially.
After some consideration, I think you're best off with each consumer having its own queue and the producer putting its messages on all queues.
If there are few consumers, then putting the messages on those few queues will not take too long (except when the producer blocks because a consumer can't keep up).
If there are many consumers, this situation will be highly preferable to one where many consumers are in contention with each other.
At the very least this would be a good measure to compare alternate solutions against.
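For illustration, a minimal sketch of the per-consumer-queue idea (the Broadcaster class and the queue capacity of 64 are assumptions, not from the answer):

    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.CopyOnWriteArrayList;

    class Broadcaster<T> {
        private final List<BlockingQueue<T>> queues = new CopyOnWriteArrayList<>();

        // Each consumer registers once and then loops on its own queue's take().
        BlockingQueue<T> subscribe() {
            BlockingQueue<T> q = new ArrayBlockingQueue<>(64);
            queues.add(q);
            return q;
        }

        // Producer: put the message on every queue; this blocks only when a
        // particular consumer falls behind and its queue fills up.
        void publish(T message) throws InterruptedException {
            for (BlockingQueue<T> q : queues) {
                q.put(message);
            }
        }
    }

Each consumer then sees every message exactly once, with no contention between consumers on a shared queue.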
Previously, when I used the single-producer mode of the disruptor, e.g.
new Disruptor<ValueEvent>(ValueEvent.EVENT_FACTORY, 2048,
        moranContext.getThreadPoolExecutor(), ProducerType.SINGLE,
        new BlockingWaitStrategy())
the performance was good. Now I am in a situation where multiple threads write to a single ring buffer, and I found that ProducerType.MULTI makes the code several times slower than single-producer mode. That performance hit is not acceptable for me. So, should I keep single-producer mode and have multiple threads invoke the same event-publish method under a lock? Is that OK? Thanks.
I'm somewhat new to the Disruptor, but after extensive testing and experimenting, I can say that ProducerType.MULTI is more accurate and faster for 2 or more producer threads.
With 14 producer threads on a MacBook, ProducerType.SINGLE shows more events published than consumed, even though my test code waits for all producers to end (which they do after a 10s run) and then waits for the disruptor to shut down. Not very accurate: where do those additional published events go?
Driver start: PID=38619 Processors=8 RingBufferSize=1024 Connections=Reuse Publishers=14[SINGLE] Handlers=1[BLOCK] HandlerType=EventHandler<Event>
Done: elpased=10s eventsPublished=6956894 eventsProcessed=4954645
Stats: events/sec=494883.36 sec/event=0.0000 CPU=82.4%
Using ProducerType.MULTI, fewer events are published than with SINGLE, but more events are actually consumed in the same 10 seconds than with SINGLE. And with MULTI, all of the published events are consumed, just what I would expect due to the careful way the driver shuts itself down after the elapsed time expires:
Driver start: PID=38625 Processors=8 RingBufferSize=1024 Connections=Reuse Publishers=14[MULTI] Handlers=1[BLOCK] HandlerType=EventHandler<Event>
Done: elpased=10s eventsPublished=6397109 eventsProcessed=6397109
Stats: events/sec=638906.33 sec/event=0.0000 CPU=30.1%
Again: 2 or more producers: Use ProducerType.MULTI.
By the way, each Producer publishes directly to the ring buffer by getting the next slot, updating the event, and then publishing the slot. And the handler gets the event whenever its onEvent method is called. No extra queues. Very simple.
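That is the standard Disruptor claim/update/publish idiom; as a sketch (the Event type and its setValue method are placeholders):

    // Standard claim/update/publish idiom on com.lmax.disruptor.RingBuffer.
    void publish(com.lmax.disruptor.RingBuffer<Event> ringBuffer, long value) {
        long seq = ringBuffer.next();            // claim the next slot
        try {
            ringBuffer.get(seq).setValue(value); // update the preallocated event
        } finally {
            ringBuffer.publish(seq);             // make the slot visible to handlers
        }
    }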
IMHO, a single producer accessed by multiple threads with a lock won't resolve your problem, because it simply shifts the locking from the disruptor's side to your own program.
The solution to your problem depends on the type of event model you need, i.e. whether the events must be consumed chronologically, merged, or with some other special requirement. Since you are dealing with a disruptor and multiple producers, that sounds to me very much like an FX trading system :-) Anyway, based on my experience, assuming you need chronological order per producer but don't care about mixing events between producers, I would recommend a queue-merging thread (see the sketch after this list). The structure is:
Each producer produces data and puts it into its own named queue.
A worker thread constantly examines the queues. For each queue it removes one or several items and puts them to the single producer of your single-producer disruptor.
Note that in the above scenario,
Each producer queue is a single producer single consumer queue.
The disruptor is a single producer multi consumer disruptor.
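A hedged sketch of that merging thread (the queues' Long payload and the ValueEvent setter are illustrative assumptions; parking the thread, as discussed next, is left out for brevity):

    import java.util.List;
    import java.util.Queue;
    import com.lmax.disruptor.RingBuffer;

    class QueueMerger implements Runnable {
        private final List<Queue<Long>> producerQueues;  // one SPSC queue per producer
        private final RingBuffer<ValueEvent> ringBuffer; // single-producer disruptor

        QueueMerger(List<Queue<Long>> queues, RingBuffer<ValueEvent> ringBuffer) {
            this.producerQueues = queues;
            this.ringBuffer = ringBuffer;
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                for (Queue<Long> q : producerQueues) {
                    Long value;
                    while ((value = q.poll()) != null) { // drain this producer's queue
                        long seq = ringBuffer.next();    // sole publisher: SINGLE is safe
                        try {
                            ringBuffer.get(seq).setValue(value); // assumed setter
                        } finally {
                            ringBuffer.publish(seq);
                        }
                    }
                }
            }
        }
    }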
Depending on your needs, to avoid a forever-running thread: if the worker thread finds that, say, 100 consecutive runs leave all queues empty, it can set some flag and go into wait(), and the event producers can notify() it when they see it is waiting.
I think this resolves your problem. If not, please post the event-processing pattern you need and let's see.
I have a requirement to send alerts when a record in the db is not updated/changed within a specified interval. For example, if a received purchase order isn't processed within one hour, a reminder should be sent to the delivery manager.
The reminder/alert should be sent exactly at the interval (to the second). If the last modified time is 13:55:45, the alert should be triggered at 14:55:45. There could be millions of rows that need to be tracked.
A simple approach would be implementing a custom scheduler with which all the records are registered. But it would have to poll the database every second to look for changes, which will lead to performance problems.
UPDATE:
Another basic approach would be creating a thread for each record and putting it to sleep for 1 hour, or using some queuing concept with a timeout. But that still has performance problems.
Any thoughts on a better approach to implement this?
Using an internal JMS queue would probably be a better solution; for example, you may want to use the scheduled-message feature of HornetQ: http://docs.jboss.org/hornetq/2.2.2.Final/user-manual/en/html/examples.html#examples.scheduled-message
You can ask the broker to publish the alert message after exactly 1h. On the other hand, while processing the activity you can manually delete this message, meaning that the activity has been processed without errors.
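A minimal sketch of that scheduled-message idea, assuming the HornetQ 2.x JMS client ("_HQ_SCHED_DELIVERY" is HornetQ's scheduled-delivery header; session and producer setup are elided):

    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    // Schedule an alert for delivery exactly one hour from now; if the order is
    // processed in time, the pending message can be removed instead.
    void scheduleAlert(Session session, MessageProducer producer, String orderId)
            throws JMSException {
        TextMessage alert = session.createTextMessage(orderId);
        long inOneHour = System.currentTimeMillis() + 60L * 60 * 1000;
        alert.setLongProperty("_HQ_SCHED_DELIVERY", inOneHour);
        producer.send(alert);
    }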
Use a Timer for each reminder. I.e., if the last modified time is 17:49:45, the alert should be triggered at 18:49:45: simply create a timer schedule dynamically for each task, and it will fire exactly one hour later.
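For illustration, a sketch of that per-record timer (ReminderScheduler and sendAlert are placeholders; note that java.util.Timer runs all its tasks on a single thread, so sendAlert should stay short):

    import java.util.Timer;
    import java.util.TimerTask;

    class ReminderScheduler {
        // One daemon Timer can host many scheduled tasks.
        private final Timer timer = new Timer(true);

        // Fire exactly one hour after the record's last modification.
        void scheduleReminder(final long orderId) {
            timer.schedule(new TimerTask() {
                @Override public void run() { sendAlert(orderId); }
            }, 60L * 60 * 1000);
        }

        void sendAlert(long orderId) { /* placeholder notification */ }
    }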
It is not possible in Java if you really insist on exact real-time behaviour. In Java you may encounter the garbage collector's stop-the-world phases, so you can never guarantee the exact time.
If approximate timing is permissible, then use some kind of scheduled queue as proposed in the other answers; if not, then use real-time Java or a native call.
If we can assume that the orders are entered with increasing time, then:
You can use a Queue with elements that have the properties time-of-order and order-id.
Each new entry that is added to the DB is also enqueued to this Queue.
You can check the element at the start of the Queue each minute.
When checking the element at the start of the Queue, if an hour has passed from the time-of-order, then search for the entry with order-id in the DB.
If it is found and was not updated, then send a notification; otherwise dequeue it from the Queue.
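A minimal sketch of this ordered-queue sweep, assuming orders arrive in time order (OrderRef, isStillUnprocessed, and sendAlert are placeholders):

    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.TimeUnit;

    class OrderWatcher {
        static final class OrderRef {
            final long orderId;
            final long timeOfOrder;
            OrderRef(long orderId, long timeOfOrder) {
                this.orderId = orderId;
                this.timeOfOrder = timeOfOrder;
            }
        }

        private static final long HOUR_MS = TimeUnit.HOURS.toMillis(1);
        private final ConcurrentLinkedQueue<OrderRef> pending = new ConcurrentLinkedQueue<>();

        // Called whenever a new order row is inserted (arrival is time-ordered).
        void onNewOrder(long orderId, long timeOfOrder) {
            pending.add(new OrderRef(orderId, timeOfOrder));
        }

        // Run every minute: only the head can be due, because arrival is ordered.
        void sweep() {
            long now = System.currentTimeMillis();
            OrderRef head;
            while ((head = pending.peek()) != null && now - head.timeOfOrder >= HOUR_MS) {
                pending.poll();
                if (isStillUnprocessed(head.orderId)) { // DB lookup, placeholder
                    sendAlert(head.orderId);
                }
            }
        }

        boolean isStillUnprocessed(long orderId) { return true; /* placeholder */ }
        void sendAlert(long orderId) { /* placeholder */ }
    }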
I'm really new to programming and having performance problems with my software. Basically I get some data, run a 100-iteration loop on it (i = 0; i < 100; i++), and during that loop my program makes 1 of 3 decisions: keep the data it's working on, discard it, or send a version of it back to the queue to process. The individual work each thread does is very small, but there's a lot of it (which is why I'm using a queue server to scale horizontally).
My problem is that it never comes close to using my entire cpu; my program runs at around 40% per core. After profiling, it seems the majority of the time is spent sending/receiving data from the queue (approx. 64% in com.rabbitmq.client.impl.Frame.readFrom(DataInputStream) and com.rabbitmq.client.impl.SocketFrameHandler.readFrame(); approx. 17% getting it into the format for the queue, which I brought down from 40%; and the rest is spent on my program's logic). Obviously I want my work to be done faster and want it to spend less time in the queue, so I'm wondering if there's a better design I can use.
My code is actually quite large, but here's an overview of what it does:
I create a connection to the queue server (rabbitmq and java)
I fork as many threads as I have cpu cores (using the same connection)
The data flow in each thread is:
Each thread creates its own channel to the queue server using the shared connection.
There's a while loop that polls the server and gets X messages without acknowledging them.
Once I get a message, I use a thread executor to send the acknowledgment while my job is running.
I parse the message and run my loop
If data is sent back to the queue, I send it to a thread executor that sends it back so my program can proceed with the next data set.
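For reference, a hedged sketch of the per-thread setup in the steps above, using the RabbitMQ Java client's polling API (the "work" queue name and the process placeholder are assumptions; sharing a channel between the worker and the ack executor mirrors the question's approach, though channels are generally not meant to be shared across threads):

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.GetResponse;
    import java.io.IOException;
    import java.util.concurrent.ExecutorService;

    // One worker thread: own channel on the shared connection, polling without
    // auto-ack and acknowledging via an executor, as the question describes.
    void workerLoop(Connection connection, ExecutorService ackExecutor)
            throws IOException {
        final Channel ch = connection.createChannel();
        while (!Thread.currentThread().isInterrupted()) {
            GetResponse r = ch.basicGet("work", false);  // poll, no auto-ack
            if (r == null) continue;                     // queue empty right now
            final long tag = r.getEnvelope().getDeliveryTag();
            ackExecutor.submit(() -> {                   // ack while the job runs
                try {
                    ch.basicAck(tag, false);
                } catch (IOException ignored) {
                }
            });
            process(r.getBody());                        // the 100-iteration loop
        }
    }

    void process(byte[] body) { /* placeholder for the real work */ }

One design note: basicGet is a synchronous round-trip per message, which may be part of why so much time shows up in readFrame(); a push-based basicConsume with a basicQos prefetch usually cuts that per-message overhead.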
One weird thing I did: although I use a thread executor for acknowledgments and for sending to the queue, my main worker threads are just forked threads (using public void run()). Because my program is dedicated to this single process, I did that to make sure there were always X threads ready to work (with no shutting down/respawning of them). The rest is in threads because I figured the rest could wait/be queued while my main program runs.
I'm not sure how to design it better so it spends less time gathering/sending data. Are there any designs, or RabbitMQ or Java techniques, I can use to help?
If it's not IO wait, then I suspect that it's down to some locking going on inside those methods.
It looks to me like your threads are spending a significant amount of time waiting for those calls to return. Somewhat counter-intuitively, you might well be able to increase your performance by cutting down the number of threads, since they'll spend less time tripping over each other and more time actively doing something.
Give it a try and see what effect it has on the profile.