I'm working on a project that will record data on real-time events using Java on a Linux system.
I have all of the HTML scraping working. What I need to figure out is the scheduling and management of the tasks.
There are potentially up to forty events each week, at varying times, and an event can last up to three hours.
I can create and update the calendar of these events at will, my problem is how to:
Schedule a process to scrape each event at the right time, and update the schedule if there's a change.
Ensure once the scrape process has begun that it stays running for the entire (indeterminate) duration of the event.
Can anyone advise how best to approach this? I'm not sure where I need to start.
Thanks!
a) Schedule a process to scrape each event at the right time, and
update the schedule if there's a change.
If you do not want to use a library, a good starting point for scheduling your tasks is ScheduledExecutorService. You may also find a scheduling framework useful for your problem. Quartz in particular gives you flexibility in scheduling the next task based on the results of the current execution. It also provides cron-style triggers, so if your schedule is fixed you can take advantage of a fixed calendar.
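As a minimal sketch of the ScheduledExecutorService approach, the class below keeps one pending task per event and replaces it when the start time changes. The names EventScheduler, schedule and isPending are made up for illustration, and the pool size of 4 is an assumption. Since a scrape can occupy a thread for hours, the pool should be sized for the number of events that can overlap.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: keeps one pending scrape task per event ID.
class EventScheduler {
    // Pool size is an assumption; size it for the number of overlapping events.
    private final ScheduledExecutorService executor = Executors.newScheduledThreadPool(4);
    private final Map<String, ScheduledFuture<?>> pending = new ConcurrentHashMap<>();

    // Schedules the scrape for an event, replacing any earlier schedule for it.
    void schedule(String eventId, Instant startTime, Runnable scrapeTask) {
        long delayMillis = Math.max(0, Duration.between(Instant.now(), startTime).toMillis());
        ScheduledFuture<?> next = executor.schedule(scrapeTask, delayMillis, TimeUnit.MILLISECONDS);
        ScheduledFuture<?> previous = pending.put(eventId, next);
        if (previous != null) {
            // Drop the old start time; false means a scrape already running is not interrupted.
            previous.cancel(false);
        }
    }

    // True while a task for this event is scheduled or running.
    boolean isPending(String eventId) {
        ScheduledFuture<?> task = pending.get(eventId);
        return task != null && !task.isDone();
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

When the calendar changes, calling schedule again with the same event ID and the new start time is enough; the earlier task is cancelled if it has not started yet.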
b) Ensure once the scrape process has begun that it stays running for
the entire (indeterminate) duration of the event.
Assuming that you're using a library for HTML scraping, you don't need to do anything special to keep it running, since the scrape will be a Java task object started from your own application and will run until it returns.
We are using the Spring scheduler with
@Scheduled(cron = "0 15 10 15 * ?")
The problem is that sometimes we have maintenance and the system is down when the job is scheduled to run.
Is there another scheduler we can use? Maybe a parameter that checks whether a scheduled job didn't run during maintenance and runs it when the system is back up?
Or a recommendation for a different scheduler to use?
Thanks
M. Deinum mentioned Quartz as a possible solution. It is a very advanced scheduling product that can handle scheduling across multiple nodes, ensuring that a job runs on only one node. It has many other features. I haven't used it in a long while, so you should look into whether it is something you want to use.
However, I have dealt with your particular case in a simpler way. On each run, part of the scheduled job's responsibility was to write to a DB table the last scheduled time (the one in the past that triggered the current run), the next scheduled time, and the actual last execution time. Then, when the server starts up after downtime, it checks whether the next scheduled time is in the past (the last execution time will also be older than the next scheduled time). If so, that is your flag that the job missed its run due to downtime (or any other reason), and you can reschedule it or run it immediately.
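The startup check described above could look roughly like this. It is only a sketch: MissedRunCheck and missedRun are hypothetical names, and the two stored timestamps are assumed to have been read from the DB table already.

```java
import java.time.Instant;

// Hypothetical startup check; the timestamps are assumed to come from the DB table.
class MissedRunCheck {
    // True when the run planned for nextScheduled never happened:
    // that time is already in the past, and the last execution predates it.
    static boolean missedRun(Instant nextScheduled, Instant lastExecution, Instant now) {
        return nextScheduled.isBefore(now) && lastExecution.isBefore(nextScheduled);
    }
}
```

On startup you would call missedRun(nextScheduled, lastExecution, Instant.now()) and, if it returns true, run the job immediately or reschedule it.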
P.S. This will not address your actual problem, but I wrote my own scheduler and published it as part of an open-source library. It lets you set time intervals in a more human-readable form, such as "4h" for 4 hours or "30m" for 30 minutes. It can also schedule multiple tasks and lets you specify the number of threads that will handle all of them. You can read about it here. The library is called MgntUtils and you can get it as Maven artifacts or from the GitHub repository releases (with source code and Javadoc included). You can read an article about the library that describes some of its features here.
I need to execute a task in the future, just once.
Requirements:
- The environment is clustered, so I need to handle contention at the moment the task fires; it cannot execute twice;
- The task can be scheduled a month ahead and cannot just be scheduled in memory, because a node can be restarted or even destroyed at certain moments (it's an Amazon Elastic Beanstalk environment);
Any suggestions will be welcome.
One idea: Instead of trying to get the cron/timer tool to only execute once, you could schedule the task on all of the nodes, but then use some kind of coordination between nodes to decide which one will actually execute the task.
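One way to picture that coordination is a claim step: every node fires its timer, but only the node that wins the claim runs the task. The sketch below uses made-up names (ClaimStore, tryClaim, runIfClaimed), and the in-memory store is only a stand-in. In a real cluster the claim would have to live in something all nodes share, for example a database row protected by a unique constraint.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical abstraction over a store shared by all nodes.
interface ClaimStore {
    // Returns true only for the first node that claims the task.
    boolean tryClaim(String taskId, String nodeId);
}

// In-memory stand-in for illustration only; it does not coordinate separate JVMs.
class InMemoryClaimStore implements ClaimStore {
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    @Override
    public boolean tryClaim(String taskId, String nodeId) {
        // putIfAbsent is atomic: exactly one caller sees null for a given task ID.
        return owners.putIfAbsent(taskId, nodeId) == null;
    }
}

class ClusterTask {
    // Every node calls this when its local timer fires; only the claim winner runs the task.
    static boolean runIfClaimed(ClaimStore store, String taskId, String nodeId, Runnable task) {
        if (!store.tryClaim(taskId, nodeId)) {
            return false; // another node already owns this task
        }
        task.run();
        return true;
    }
}
```

The persistence requirement still applies: the schedule itself has to be stored somewhere durable so that a restarted node can re-create its timer.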
I have a user workflow where at a specific time a webservice is called, and the results are presented to the user.
According to the search request and the queried results, I want to perform some database updates and statistic logging.
As the workflow pauses while the webservice is requested, I thought about creating some kind of background thread that performs these database actions, while the user can already continue the workflow without having to wait for database actions to complete.
Do you think this is good practice? How could I create such one-off background threads?
If you only want to run in the background, then an Executor service is a good solution.
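A minimal sketch of the executor approach, assuming the database work can be wrapped in a Runnable. BackgroundUpdates and submitUpdate are hypothetical names and the pool size of 2 is arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical holder for a small pool that runs updates off the request thread.
class BackgroundUpdates {
    private final ExecutorService executor = Executors.newFixedThreadPool(2);

    // Returns immediately; the work runs later on a pool thread.
    void submitUpdate(Runnable dbWork) {
        executor.execute(() -> {
            try {
                dbWork.run();
            } catch (RuntimeException e) {
                // Nobody is waiting for this task, so failures must be reported here.
                System.err.println("Background update failed: " + e);
            }
        });
    }

    // Stops accepting new work; tasks already queued still run.
    void shutdown() {
        executor.shutdown();
    }
}
```

Queued work lives only in memory here, so anything still waiting is lost if the server goes down.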
If you need to ensure that queued requests survive events like a server restart, then you need a persistent queue like a JMS Queue. There are some nice, free open source JMS implementations that serve this purpose.
If the service call takes little time (say 1 or 2 seconds), then it is a waste of time to develop such a feature.
If it takes a significant amount of time, you should do this in the background.
I need to run asynchronous jobs, such as downloading scores from the website or sending emails after some critical task completes.
Right now, when we download scores, we have to wait on the current page for the response page or for the file to download.
Is it possible to click on "download scores" and have it happen in the background, so that I can navigate to other parts of the website and check the status of the job in the meantime? Or to schedule a job for later and get its execution results via email?
Ours is a Struts 2 web application with the Hibernate 3.5 ORM. After browsing some Java scheduling libraries, I found some information on Quartz.
But is Quartz the right library for the above requirements, or is there another library that I should try?
Please guide me in the right direction.
You will need some sort of asynchronous processing support. You can use:
quartz-scheduler - this library is very comprehensive and allows you to schedule all sorts of jobs. If you only want to run jobs in the background immediately, it might be overkill
a thread pool; see the Executors class
a JMS queue; message-driven beans (MDBs) can listen for requests and process them asynchronously
Finally, you can take advantage of @Async/@Asynchronous support in Spring or EJB
Then you must somehow deliver the results, depending on whether you want to show them directly in the browser or send them via e-mail:
every time you render a page, check whether there are any completed or in-progress jobs. If there are completed jobs, display an extra link somewhere on the page (a sort of notification). If a job is in progress, start an AJAX request and poll every other second, or use long-polling/Comet to receive the result immediately
if you want to send results by e-mail, just send it after the job finishes. Much simpler but less user-friendly IMHO.
Quartz is certainly one way to do that - and works well if you want to schedule a job to run at a particular time or with a particular frequency.
If you just want to kick something off in the background in response to a user action, and check its status, there are a few other ways to do it which may be better suited to this pattern:
the java.util.concurrent package: you can set up a ThreadPoolExecutor and submit tasks to it that implement Callable. You get back a Future<T> object that you can check for completion (isDone) and whose result you can fetch when complete (get).
with EJB or Spring, there is also the concept of a (session) bean method being @Async or @Asynchronous; such a method returns a Future<T> as well and behaves as above. Basically this just abstracts the thread-pool creation and management away from your code and moves it into the container or framework.
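Here is a small sketch of the java.util.concurrent option: submit a Callable, keep the Future, and ask it for status later. ScoreDownloads, startDownload and status are hypothetical names, and the status strings are just an illustration of what a status page might show.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical wrapper around a thread pool for long-running downloads.
class ScoreDownloads {
    // newFixedThreadPool is backed by a ThreadPoolExecutor; size 2 is arbitrary.
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    // Starts the job in the background and returns a handle to it right away.
    Future<String> startDownload(Callable<String> download) {
        return pool.submit(download);
    }

    // Non-blocking status check, e.g. for a status page or an AJAX poll.
    static String status(Future<String> job) {
        if (job.isCancelled()) {
            return "cancelled";
        }
        if (!job.isDone()) {
            return "in progress";
        }
        try {
            return "done: " + job.get(); // does not block, the job has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause();
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

In a web application the Future would typically be kept somewhere the next request can find it, such as the user's session, keyed by a job ID.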
I'm writing an application for a doctor, who should be able to define Notifications that will show up on the patient's computer. These Notifications are scheduled by the doctor, so he/she can choose when each one is going to show up. For example: "Remember to take your pills", shown once a week, from January to July 2010.
So it would be something like Google Calendar's event scheduler, but with much richer timing conditions. I'm wondering what the recommended solution/tool is for:
A Notification scheduler on the client side. The client application is a Java-based application. It should have a background event scheduler that checks for new Notifications and whether their timing conditions apply.
A Notification designer/manager on the server side. The doctor's application should provide a visual tool to define the timing conditions (in Java too). The Notifications are stored in a database for remote access via a web service.
Is there an open source tool available for this kind of issue? Also, I've been reading about Drools, but it's a completely new topic to me. Any recommendation on this?
There are various open source schedulers available.
Quartz is one of them; it gives fine-grained control over scheduling tasks.
It sounds like you have 3 separate but related issues:
The scheduling of one or more future events.
The persistence of the schedule and related contextual data.
A push model to [re-]deliver a scheduling event from the server to the client.
More or less right?
For scheduling and persistence, I recommend you look at Quartz. It will provide you with a clean API for scheduling (one-time or recurring) with some flexibility, including fixed-period or cron triggers. It will also persist schedule data and context (referred to as a Job) to a JDBC database.
As for #3, I am not clear on how you want this to work, but one possibility is that when the client connects to the server, it non-persistently caches the server-provided scheduled events applicable to that client (or user, etc.). When the client shuts down, these events are discarded, but they are renewed on the next connection. Once the events are loaded in the client, the client assumes responsibility for firing them with its own local scheduler (Quartz or even the simpler ScheduledThreadPoolExecutor).
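To make the client side a bit more concrete, here is a rough sketch using ScheduledThreadPoolExecutor. Everything in it is an assumption for illustration: the ClientNotifications and Event names, the idea that the server hands over a message plus a fire time, and the choice to skip events whose time has already passed.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical client-side cache of server-provided events.
class ClientNotifications {
    // Assumed shape of an event as delivered by the server.
    static final class Event {
        final String message;
        final Instant fireAt;

        Event(String message, Instant fireAt) {
            this.message = message;
            this.fireAt = fireAt;
        }
    }

    private final ScheduledThreadPoolExecutor scheduler = new ScheduledThreadPoolExecutor(1);

    // Called after connecting: schedules every event that is still in the future
    // and returns how many were scheduled.
    int load(List<Event> events, Consumer<String> display) {
        Instant now = Instant.now();
        int scheduled = 0;
        for (Event event : events) {
            if (event.fireAt.isBefore(now)) {
                continue; // skipping past events is an assumption of this sketch
            }
            long delayMillis = Duration.between(now, event.fireAt).toMillis();
            scheduler.schedule(() -> display.accept(event.message), delayMillis, TimeUnit.MILLISECONDS);
            scheduled++;
        }
        return scheduled;
    }

    // Called on client shutdown: nothing is persisted locally, so pending events
    // are simply dropped. Returns the number of events that never fired.
    int discardAll() {
        return scheduler.shutdownNow().size();
    }
}
```

A recurring Notification such as "once a week" could either be expanded into individual events on the server or scheduled locally with scheduleAtFixedRate; which one fits better depends on how rich the timing conditions get.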
Drools is an excellent rules engine, but might be overkill for what you are trying to do.
//Nicholas