I have around 1,000 entries in my datastore, and this is likely to grow to around 10,000 over time. My task is to update certain properties of each entity and save it back, and this has to be done every 24 hours.
So, what should I use?
First, you create a cron job that runs every 24 hours.
Second, you need to decide what this cron job will do. The simplest option is to update all 1,000 records. You can retrieve and save all entities in large batches (e.g. 500 per call). If this is a simple update of values, it will take just a few seconds.
Since cron jobs are not retried if they fail, a better option is to create a task and add it to the queue. All updates will happen within that task.
NB: Make sure that if your task is retried, it won't mess up the data. If this is not possible, you will have to use some kind of flag (e.g. a timestamp of the last update) to separate entities that have already been updated from those that still need updating.
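A minimal sketch of the timestamp-flag idea in Java, using a hypothetical in-memory Row class in place of real datastore entities (Row, NightlyUpdate and the batch size of 500 are illustrative assumptions, not App Engine API). The flag is checked before each update, so a retried task skips entities it has already handled:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a datastore entity; field names are illustrative.
class Row {
    final long id;
    int counter;           // the property the nightly job updates
    LocalDate lastUpdated; // flag: the day this entity was last updated (null = never)

    Row(long id) { this.id = id; }
}

class NightlyUpdate {
    static final int BATCH_SIZE = 500; // e.g. 500 entities per batch get/put

    // Updates every row that has not yet been updated on 'runDate'.
    // Because the flag is checked first, running this again for the same date
    // (e.g. after a task retry) leaves already-updated rows untouched.
    // Returns the number of rows changed in this run.
    static int run(List<Row> all, LocalDate runDate) {
        int changed = 0;
        for (int start = 0; start < all.size(); start += BATCH_SIZE) {
            List<Row> batch = all.subList(start, Math.min(start + BATCH_SIZE, all.size()));
            List<Row> toSave = new ArrayList<>();
            for (Row r : batch) {
                if (runDate.equals(r.lastUpdated)) continue; // already done today
                r.counter++;             // the actual property update
                r.lastUpdated = runDate; // saved together with the update
                toSave.add(r);
            }
            // With a real datastore, 'toSave' would be written with one batch put here.
            changed += toSave.size();
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        for (long i = 0; i < 1200; i++) rows.add(new Row(i));
        LocalDate today = LocalDate.of(2024, 1, 15);
        System.out.println(run(rows, today)); // prints 1200
        System.out.println(run(rows, today)); // prints 0: a retry changes nothing
    }
}
```

Writing the flag in the same put as the updated property matters: if they were saved separately, a failure between the two writes could still cause a double update on retry.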
As your data set grows, your cron job can start multiple tasks to update, for example, 1,000 records in each task.
In the task queue, tasks have to be added manually through code. If you want this to happen automatically at a fixed interval, what you need is a cron job.
You need both:
a cron job to start your batch update job every 24 hours, and
task queues to process your records.
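The cron-plus-tasks split can be sketched with plain JDK executors as stand-ins: a ScheduledExecutorService plays the 24-hour cron job and a thread pool plays the task queue. This is an analogy under that assumption, not App Engine's cron or Task Queue API; BatchFanOut, planTasks and runOnce are hypothetical names, and each task covers 1,000 records as suggested above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BatchFanOut {
    static final int RECORDS_PER_TASK = 1000;

    // Splits [0, total) into [start, end) ranges of at most 'perTask' records.
    static List<int[]> planTasks(int total, int perTask) {
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < total; start += perTask) {
            ranges.add(new int[] {start, Math.min(start + perTask, total)});
        }
        return ranges;
    }

    // What the "cron job" does: enqueue one task per range on the worker pool
    // (standing in for the task queue) and wait for all of them to finish.
    // Returns the number of records processed.
    static int runOnce(int total, int threads) {
        ExecutorService taskQueue = Executors.newFixedThreadPool(threads);
        AtomicInteger processed = new AtomicInteger();
        List<Future<?>> tasks = new ArrayList<>();
        for (int[] range : planTasks(total, RECORDS_PER_TASK)) {
            tasks.add(taskQueue.submit(() -> {
                // Hypothetical: load and update records range[0] .. range[1]-1 here.
                processed.addAndGet(range[1] - range[0]);
            }));
        }
        try {
            for (Future<?> f : tasks) f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            taskQueue.shutdown();
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService cron = Executors.newSingleThreadScheduledExecutor();
        // Fire once now, then every 24 hours.
        cron.scheduleAtFixedRate(
            () -> System.out.println("processed " + runOnce(10_000, 4)), 0, 24, TimeUnit.HOURS);
        Thread.sleep(1_000); // demo only: let the first run finish, then stop
        cron.shutdownNow();
    }
}
```

The cron side stays tiny (it only plans and enqueues), so growing from 1,000 to 10,000 records just means more tasks, not a longer-running cron job.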
I have an event system that I can subscribe to for when a specific object is changed. After receiving this event, I want to execute a task for this object.
It is possible that multiple objects are changed at the same time. E.g. if I change 1000 objects, I get 1000 events. The problem is that it takes far longer for the task I want to execute to process 1 object 1000 times than to process 1000 objects once. I cannot change the way the events are generated.
So I thought about batching these events as I receive them, e.g. collecting 1000 items in a queue and then executing the task on all objects from the collected events.
The problem is: what happens when only 999 objects are changed? Then my task is never executed. So I also want to drain the queue e.g. 5 seconds after the first object was inserted.
Is there any library for this specific task? Or do I have to build this myself with a Queue and some logic to do the things I want?
I'm fairly sure there is no library specifically for this. When I needed the same strategy for events like yours, I created a queue (or a repository) to store the events and started a ScheduledExecutorService with a task running at a fixed rate to consume them; if there were no events to consume, I simply skipped that execution. You can also add a check in the store's add method to see whether it holds 1000 or more unprocessed events, so you can fire the task immediately.
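A minimal sketch of that approach using only the JDK (EventBatcher and its parameters are hypothetical names): events go into a buffer, a fixed-rate task drains whatever has accumulated and skips empty runs, and add() triggers an immediate flush once the buffer reaches the size limit. Note that with a fixed-rate tick a partial batch waits up to one period, not exactly 5 seconds after the first event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Collects events and hands them to 'processor' in batches: as soon as 'maxSize'
// events are buffered, or on the next fixed-rate tick (whatever has accumulated).
class EventBatcher<T> implements AutoCloseable {
    private final int maxSize;
    private final Consumer<List<T>> processor;
    // All periodic and size-triggered flushes run on this single thread.
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private List<T> buffer = new ArrayList<>();

    EventBatcher(int maxSize, long periodMillis, Consumer<List<T>> processor) {
        this.maxSize = maxSize;
        this.processor = processor;
        scheduler.scheduleAtFixedRate(this::flush, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void add(T event) {
        boolean reachedLimit;
        synchronized (this) {
            buffer.add(event);
            reachedLimit = buffer.size() == maxSize;
        }
        // Size threshold hit: schedule an immediate flush instead of waiting for the tick.
        if (reachedLimit) scheduler.execute(this::flush);
    }

    private void flush() {
        List<T> batch;
        synchronized (this) {
            if (buffer.isEmpty()) return; // nothing to consume: skip this execution
            batch = buffer;
            buffer = new ArrayList<>();
        }
        processor.accept(batch); // outside the lock, so add() is not blocked meanwhile
    }

    @Override
    public void close() {
        scheduler.shutdown(); // cancels the periodic tick; queued flushes still run
        try {
            scheduler.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        flush(); // drain whatever is left
    }

    public static void main(String[] args) throws InterruptedException {
        try (EventBatcher<String> batcher = new EventBatcher<>(1000, 5_000,
                batch -> System.out.println("processing " + batch.size() + " events"))) {
            for (int i = 0; i < 999; i++) batcher.add("event-" + i);
            Thread.sleep(6_000); // the 5-second tick picks up the 999 events
        }
    }
}
```

Processing every batch on the single scheduler thread (plus one final drain in close() after that thread has stopped) means the processor is never called concurrently, which keeps things simple if the downstream task is not thread-safe. Batches can be slightly larger than maxSize if events keep arriving before the queued flush runs.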
I have a few Quartz (2.2) jobs running. Let's say one runs every 5 seconds and another runs every 10 minutes.
I don't want two jobs to execute at the same time. I've seen @DisallowConcurrentExecution, but it only prevents concurrent execution of multiple instances of the same job definition, whereas I don't want any two jobs (of any kind) to overlap.
Edit:
All the jobs work with the same database, which is why it's important that they don't run at the same time. Each job does different things.
The simplest way is to configure the underlying thread pool to use a single thread, which achieves your goal. Add the following property to your quartz.properties configuration file:
org.quartz.threadPool.threadCount = 1
The number of threads available for concurrent execution of jobs. You can specify any positive integer, although only numbers between 1 and 100 are practical. If you only have a few jobs that fire a few times a day, then one thread is plenty. If you have tens of thousands of jobs, with many firing every minute, then you want a thread count more like 50 or 100 (this highly depends on the nature of the work that your jobs perform, and your systems resources).
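Quartz is not part of the JDK, so as an analogy the same principle is shown here with a single-threaded ScheduledExecutorService (SingleWorkerDemo and the timings are made up for illustration, this is not Quartz code): two periodic jobs share one worker thread, and the demo records how many jobs ever run at once.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SingleWorkerDemo {
    static final AtomicInteger running = new AtomicInteger();    // jobs executing right now
    static final AtomicInteger maxRunning = new AtomicInteger(); // highest value seen
    static final AtomicInteger completed = new AtomicInteger();

    // Simulated job that "uses the database" for 'workMillis' and records overlap.
    static Runnable job(long workMillis) {
        return () -> {
            maxRunning.accumulateAndGet(running.incrementAndGet(), Math::max);
            try {
                Thread.sleep(workMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                running.decrementAndGet();
                completed.incrementAndGet();
            }
        };
    }

    // Runs two periodic jobs on ONE worker thread for 'durationMillis' and returns
    // the maximum number of jobs observed executing at the same time.
    static int maxConcurrency(long durationMillis) {
        ScheduledExecutorService oneThread = Executors.newSingleThreadScheduledExecutor();
        oneThread.scheduleAtFixedRate(job(20), 0, 50, TimeUnit.MILLISECONDS);   // the "frequent" job
        oneThread.scheduleAtFixedRate(job(30), 10, 100, TimeUnit.MILLISECONDS); // the "rare" job
        try {
            Thread.sleep(durationMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            oneThread.shutdownNow();
        }
        return maxRunning.get();
    }

    public static void main(String[] args) {
        System.out.println(maxConcurrency(1_000)); // prints 1: the jobs never overlap
    }
}
```

The trade-off is the same in Quartz: with one thread, a job whose trigger fires while another job is running has to wait, and if it waits too long Quartz may treat the trigger as a misfire.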
I am using Spring 4.2.x and Quartz 2.1.
My requirement is to poll the database at a certain interval (say every 30 seconds), pull 5 records at a time and schedule them for processing. At any given time, at most 5 should be submitted for processing.
For example, if 5 are submitted and one of them completes, then I need to pull another one and submit it, so that 5 are constantly being processed.
I can write a cron trigger that polls the database, pulls 5 records at a time and submits them as jobs for processing. But if none of the first batch of 5 has finished (i.e. all are still being processed), I should not submit any more. I have to wait until at least one of them finishes, then submit enough jobs to bring the total back up to 5.
What is the right approach? Do I need to use a CountDownLatch, or is there something within Spring/Quartz that can handle this scenario?
Does Quartz thread count (org.quartz.threadPool.threadCount) help here?
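One way to express this requirement is a semaphore with 5 permits, one per processing slot: each poll fetches only as many records as there are free permits, and a permit is released as soon as a record finishes. This is a plain-JDK sketch (BoundedPoller, fetch, process and the in-memory queue standing in for the database are all hypothetical), not a Spring or Quartz feature; in a real application poll() could be driven by whatever scheduler you already use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPoller {
    static final int MAX_IN_FLIGHT = 5;
    private final Semaphore slots = new Semaphore(MAX_IN_FLIGHT); // one permit per free slot
    private final ExecutorService workers = Executors.newFixedThreadPool(MAX_IN_FLIGHT);
    private final Queue<Integer> table; // hypothetical stand-in for the database table
    final AtomicInteger inFlight = new AtomicInteger();
    final AtomicInteger maxInFlight = new AtomicInteger();
    final AtomicInteger done = new AtomicInteger();

    BoundedPoller(Queue<Integer> table) { this.table = table; }

    // Hypothetical "SELECT ... LIMIT n": takes up to n pending records.
    private List<Integer> fetch(int n) {
        List<Integer> rows = new ArrayList<>();
        Integer r;
        while (rows.size() < n && (r = table.poll()) != null) rows.add(r);
        return rows;
    }

    // One poll (e.g. every 30 s): fetch only as many records as there are free slots.
    int poll() {
        int free = slots.drainPermits();    // 0..5 idle slots
        List<Integer> batch = fetch(free);
        slots.release(free - batch.size()); // return slots that could not be filled
        for (Integer record : batch) {
            workers.execute(() -> {
                maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                try {
                    process(record);
                } finally {
                    inFlight.decrementAndGet();
                    done.incrementAndGet();
                    slots.release(); // this slot can be refilled on the next poll
                }
            });
        }
        return batch.size();
    }

    private static void process(int record) {
        try {
            Thread.sleep(5 + record % 7); // simulated work of varying length
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Demo driver: polls every 'pollMillis' until 'expected' records are done.
    void runUntil(int expected, long pollMillis) {
        ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();
        poller.scheduleWithFixedDelay(this::poll, 0, pollMillis, TimeUnit.MILLISECONDS);
        try {
            while (done.get() < expected) Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            poller.shutdownNow();
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        Queue<Integer> table = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 23; i++) table.add(i);
        BoundedPoller p = new BoundedPoller(table);
        p.runUntil(23, 20);
        System.out.println(p.done.get() + " done, max in flight " + p.maxInFlight.get());
    }
}
```

If none of the 5 slots is free, drainPermits() returns 0 and that poll simply fetches nothing, which is exactly the "wait until at least one finishes" behaviour, without a CountDownLatch.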
Good Day,
I am required to write a Java server that performs an action every X minutes. The action is to check a database to see whether the current system time matches any of the times stored in it, pull out the matching items, and send a TCP message to each of them.
The database is local on the machine, so that is not a problem. However, at least 10 TCP calls need to be sent out simultaneously, so the tick may need to run on its own thread. Can anyone offer some suggestions?
Do I need a thread pool?
One thing you can do is create a scheduler job and run it every X minutes; the work to be performed is defined inside that job.
I would use a Timer or else I would use the Quartz Scheduler - the former is more lightweight, while the latter is (optionally) durable (meaning that scheduled tasks will be saved to a database and reloaded when your program restarts).
Either a TimerTask or a ScheduledExecutorService implementation would be a good option for this task. And yes, a thread pool is a good idea, because then you don't need to create 10 new threads every X minutes.
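A sketch of the ScheduledExecutorService variant, assuming hypothetical dueTargets (the database lookup) and send (the TCP call) functions: one thread runs the tick, and a fixed pool of 10 threads is reused for the outgoing messages, so no new threads are created on each tick. sendTcp is only an illustrative sender built on java.net.Socket; the "host:port" target format and the newline-terminated message are assumptions.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.time.LocalTime;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

class TickServer {
    private final ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
    private final ExecutorService senders = Executors.newFixedThreadPool(10); // reused every tick
    private final Function<LocalTime, List<String>> dueTargets; // hypothetical DB lookup
    private final Consumer<String> send;                        // hypothetical TCP sender

    TickServer(Function<LocalTime, List<String>> dueTargets, Consumer<String> send) {
        this.dueTargets = dueTargets;
        this.send = send;
    }

    // One tick: look up everything due at 'now' and hand each send to the pool,
    // so up to 10 TCP calls run in parallel while the tick thread returns quickly.
    void tick(LocalTime now) {
        for (String target : dueTargets.apply(now)) {
            senders.execute(() -> send.accept(target));
        }
    }

    void start(long periodMinutes) {
        ticker.scheduleAtFixedRate(() -> tick(LocalTime.now()), 0, periodMinutes, TimeUnit.MINUTES);
    }

    // Stops ticking and waits for in-progress sends to finish.
    void stop() {
        ticker.shutdownNow();
        senders.shutdown();
        try {
            senders.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Illustrative sender: opens a TCP connection to "host:port" and writes one line.
    static void sendTcp(String hostAndPort, String message) {
        String[] parts = hostAndPort.split(":");
        try (Socket socket = new Socket(parts[0], Integer.parseInt(parts[1]));
             OutputStream out = socket.getOutputStream()) {
            out.write((message + "\n").getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TickServer server = new TickServer(
            now -> Arrays.asList("10.0.0.1:9000", "10.0.0.2:9000"),   // hypothetical targets
            target -> System.out.println("would send to " + target)); // swap in sendTcp(...)
        server.tick(LocalTime.now()); // a real server would call start(X) instead
        server.stop();
    }
}
```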
I'm about to create a small application which will be responsible for sending out various reports to various users at various intervals. We might be talking about 50 or 100 different reports going to different people. Some reports need to be generated every day, some every week, and some every month.
I have used the Quartz library before to run tasks at regular intervals. However, to keep things simple, I like the idea of a single Quartz thread taking care of all reports. That is, the thread should loop through all reports, say every 15 minutes, and determine whether it is time for one or more of them to be generated and sent. It does not matter whether a report is generated at 12:00 or 12:15.
I'm wondering whether it would be possible for each report to specify schedule times such as "mon#12:00,wed#12:00" or "fri#09:30". Based on that, the thread would determine whether it is time to send a report.
My question is: has anyone else done something like this, and do any libraries exist that make this easy to implement?
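For the "mon#12:00,wed#12:00" format proposed in the question, a minimal java.time sketch could look like this (ReportSchedule, Slot and isDue are hypothetical names). Instead of matching the current 15-minute window, it treats a report as due if one of its slots has occurred since the report was last sent, so a late or skipped check does not lose a report:

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.TemporalAdjusters;
import java.util.ArrayList;
import java.util.List;

class ReportSchedule {
    // One "day#HH:mm" entry, e.g. "fri#09:30".
    static final class Slot {
        final DayOfWeek day;
        final LocalTime time;
        Slot(DayOfWeek day, LocalTime time) { this.day = day; this.time = time; }
    }

    // Parses e.g. "mon#12:00,wed#12:00"; times must be zero-padded HH:mm.
    static List<Slot> parse(String spec) {
        List<Slot> slots = new ArrayList<>();
        for (String part : spec.split(",")) {
            String[] dayAndTime = part.trim().split("#");
            slots.add(new Slot(dayOf(dayAndTime[0]), LocalTime.parse(dayAndTime[1])));
        }
        return slots;
    }

    // Maps "mon", "tue", ... to DayOfWeek by comparing the first three letters.
    static DayOfWeek dayOf(String abbrev) {
        for (DayOfWeek d : DayOfWeek.values()) {
            if (d.name().substring(0, 3).equalsIgnoreCase(abbrev)) return d;
        }
        throw new IllegalArgumentException("unknown day: " + abbrev);
    }

    // Latest occurrence of 'slot' at or before 'now'.
    static LocalDateTime mostRecent(Slot slot, LocalDateTime now) {
        LocalDateTime t = now.toLocalDate()
            .with(TemporalAdjusters.previousOrSame(slot.day))
            .atTime(slot.time);
        return t.isAfter(now) ? t.minusWeeks(1) : t;
    }

    // A report is due if one of its slots has occurred since it was last sent.
    static boolean isDue(List<Slot> slots, LocalDateTime lastSent, LocalDateTime now) {
        for (Slot s : slots) {
            if (mostRecent(s, now).isAfter(lastSent)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Slot> slots = parse("mon#12:00,wed#12:00");
        LocalDateTime check = LocalDateTime.of(2024, 1, 15, 12, 10); // a Monday
        System.out.println(isDue(slots, check.minusMinutes(15), check)); // prints true
        System.out.println(isDue(slots, check, check));                  // prints false
    }
}
```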
Why not simply register a separate Quartz job for each report and let Quartz handle all the scheduling for you? That is, after all, the whole point of Quartz.
You can create a single thread that checks a "job schedule" data structure at a fixed interval to see whether it needs to run a report. If so, it runs the report; otherwise it takes a short nap and checks again after the specified sleep time.
This will cause problems if one job takes too long to complete and jobs start to accumulate.
The job schedule data structure would keep its records sorted by timestamp.
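A minimal sketch of that single-thread design, assuming a hypothetical Entry type with a fixed repeat interval: a PriorityQueue keeps entries ordered by next run time, so each ping only has to look at the head of the queue. To soften the accumulation problem mentioned above, missed occurrences are skipped rather than replayed; a single slow report will still delay the others, since everything runs on one thread.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ReportLoop {
    // Hypothetical schedule entry: a report name, its next run time and its repeat interval.
    static final class Entry {
        final String report;
        final Duration every; // must be positive
        Instant nextRun;

        Entry(String report, Instant firstRun, Duration every) {
            this.report = report;
            this.nextRun = firstRun;
            this.every = every;
        }
    }

    // The "job schedule" structure, ordered so the earliest next run is at the head.
    private final PriorityQueue<Entry> schedule =
        new PriorityQueue<>(Comparator.comparing((Entry e) -> e.nextRun));

    void add(Entry e) { schedule.add(e); }

    // One ping: run every report whose time has come, then reschedule it past 'now'.
    // Missed occurrences are skipped rather than replayed. Returns the reports that ran.
    List<String> runDue(Instant now) {
        List<String> ran = new ArrayList<>();
        while (!schedule.isEmpty() && !schedule.peek().nextRun.isAfter(now)) {
            Entry e = schedule.poll();
            ran.add(e.report); // hypothetical: generate and send the report here
            while (!e.nextRun.isAfter(now)) e.nextRun = e.nextRun.plus(e.every);
            schedule.add(e);   // now strictly in the future, so not picked up again
        }
        return ran;
    }

    // The single worker thread: ping, then take a short nap, until interrupted.
    void loop(Duration nap) {
        while (!Thread.currentThread().isInterrupted()) {
            runDue(Instant.now());
            try {
                Thread.sleep(nap.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit the loop
            }
        }
    }

    public static void main(String[] args) {
        ReportLoop loop = new ReportLoop();
        Instant now = Instant.now();
        loop.add(new Entry("daily-sales", now, Duration.ofDays(1)));
        loop.add(new Entry("weekly-summary", now.plus(Duration.ofHours(1)), Duration.ofDays(7)));
        System.out.println(loop.runDue(now)); // prints [daily-sales]
    }
}
```

To run it for real, start the loop on its own thread, e.g. new Thread(() -> loop.loop(Duration.ofMinutes(15))).start(). Monthly reports would need calendar-based rescheduling instead of a fixed Duration.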