Is there a Quartz feature to store and read instance-specific results with a job instance? The simplest example is a job's status (running, finished OK, finished with errors + message). This would also mean storing a history of jobs that have already run.
I have only seen examples of storing job data that is not instance-specific.
Thanks,
Karsten
How can I access the job creation time using TaskContext?
I'm planning to get this time in the different executors and persist it with the data, which is helpful in later processing. Since the job creation time is unique even when retrieved from different executors, it helps to keep track of the persisted data within one job.
Is it possible to get it from TaskMetrics?
How do I access the JobData class?
I could do this using listeners. I had to extend the SparkListener class, track when a job (i.e., all the tasks for a job) starts and ends, and perform actions based on that.
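For illustration, a minimal sketch of that listener approach in Java (the class name and log messages are mine; it assumes an existing SparkContext sc):

import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerJobEnd;
import org.apache.spark.scheduler.SparkListenerJobStart;

public class JobLifecycleListener extends SparkListener {
    @Override
    public void onJobStart(SparkListenerJobStart jobStart) {
        // jobStart.time() is the job submission time in epoch millis -
        // a stable value you can persist to correlate data for one job.
        System.out.println("Job " + jobStart.jobId() + " started at " + jobStart.time());
    }

    @Override
    public void onJobEnd(SparkListenerJobEnd jobEnd) {
        System.out.println("Job " + jobEnd.jobId() + " finished: " + jobEnd.jobResult());
    }
}

// Registered once on the driver: sc.addSparkListener(new JobLifecycleListener());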
I want to know how Quartz ensures that only one node in the cluster runs a particular job. I know it can be enabled by setting some properties in quartz.properties (JobStoreTX, etc.), but what is the internal implementation that supports it?
Clustering in Quartz is done through the database, so if your job store is the JDBC job store (JobStoreTX), clustering will happen.
Only schedulers with the same scheduler name and a distinct instance name will participate and form a cluster.
1. A ClusterManager thread is used to manage the cluster. This thread ensures that each instance's last check-in time is updated at a set interval; if an instance fails to check in within that interval, it is deemed to be down.
2. Quartz uses StdRowLockSemaphore for database-level locking between Quartz instances. In this class Quartz takes a pessimistic lock on a row of the locks table using:
SELECT * FROM {0}LOCKS WHERE SCHED_NAME = {1} AND LOCK_NAME = ? FOR UPDATE
where {0} is the configured table prefix (QRTZ_ by default).
If you look at the JobStoreSupport class, you will see that after the trigger firing completes, the row lock is released by committing the transaction.
So if two scheduler instances compete for the same trigger, only one will be able to succeed.
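For reference, a minimal sketch of the quartz.properties settings that enable this database-backed clustering (the data source name and check-in interval are illustrative):

# All nodes must share the scheduler name; instance ids must be distinct.
org.quartz.scheduler.instanceName = MyClusteredScheduler
org.quartz.scheduler.instanceId = AUTO

# JDBC job store with clustering enabled.
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.tablePrefix = QRTZ_
org.quartz.jobStore.dataSource = myDS
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 20000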
Hope this helps you in connecting the dots.
Suppose I have 10 records and some of them are corrupted; how will Spring handle a restart?
For example, suppose records no. 3 and 7 are corrupt and they go to different reducers; how will Spring handle the restart?
1. How will it maintain the queue to track where it last failed?
2. What are the different ways we can solve this?
Spring Batch will do exactly what you tell it to do.
For Spring Batch, a restart means running the same job that failed, with the same set of input parameters; however, a new instance (execution) of the job will be created.
The job will run on the same data set that the failed instance of the job ran on.
In general, it is not a good idea to modify the input data set for your job; the input data of a MapReduce job must be immutable (I assume you will not modify the same data set you use as input).
In your case the job is likely to complete with BatchStatus.COMPLETED unless you put some very specific logic in the last step of your Spring Batch job.
That last step would validate all records and, if any broken records are detected, artificially set the status of the job to BatchStatus.FAILED, like below:
jobExecution.setStatus(BatchStatus.FAILED);
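A minimal sketch of such a final validation step, assuming it is implemented as a Tasklet (findBrokenRecords() is a hypothetical placeholder for your own check):

import java.util.List;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class ValidationTasklet implements Tasklet {
    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        List<String> broken = findBrokenRecords(); // hypothetical validation helper
        if (!broken.isEmpty()) {
            // Artificially fail the whole job so that it becomes restartable.
            chunkContext.getStepContext().getStepExecution()
                        .getJobExecution().setStatus(BatchStatus.FAILED);
        }
        return RepeatStatus.FINISHED;
    }

    private List<String> findBrokenRecords() {
        return List.of(); // replace with your own record validation
    }
}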
Now, how to restart the job is a good question, which I will answer in a moment.
However, before restarting, the question you need to ask is: if neither the input data set for your MapReduce job nor the code of your MapReduce job has changed, how is a restart going to help you?
I think you need some kind of data set where you dump all the bad records that the original MapReduce job failed to process. Then how to process these broken records is for you to decide.
Anyway, restarting a Spring Batch job is easy once you know the ID of the failed jobExecution. Below is the code:
final Long restartId = jobOperator.restart(failedJobId);
final JobExecution restartExecution = jobExplorer.getJobExecution(restartId);
EDIT
Read about the ItemReader, ItemWriter and ItemProcessor interfaces.
I think you can achieve the tracking by using a CompositeItemProcessor.
In Hadoop, every record in a file must have a unique ID. So I think you can store the list of IDs of the bad records in the job context. Update a JobParameter that you would have created when the job starts for the first time; call it badRecordsList. Now when you restart/resume your job, you read the value of badRecordsList and have a reference.
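A sketch of one way to keep such a list across executions, using the job-level ExecutionContext, which Spring Batch carries over to the restarted execution (note that JobParameters themselves are immutable once the job is launched, so the ExecutionContext is the natural place for this; the key name badRecordsList and the badRecordIds variable are illustrative, and ExecutionContext is org.springframework.batch.item.ExecutionContext):

// In the failing execution, record the IDs of the bad records.
ExecutionContext ctx = jobExecution.getExecutionContext();
ctx.putString("badRecordsList", String.join(",", badRecordIds));

// After a restart, read them back to know which records to skip or reroute.
String badRecords = restartExecution.getExecutionContext().getString("badRecordsList");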
The Quartz API provides a way to create a job and add it to the scheduler for future use by doing something like:
StdSchedulerFactory.getDefaultScheduler().addJob(jobDetail, false);
This gives me the flexibility to create jobs, store them with the scheduler, and use them at a later stage.
I am wondering whether there is any way I can create triggers and add them to the scheduler to be used in the future.
Not sure if this is a valid requirement, but if it's not possible then all I have to do is associate the trigger with a given/existing job.
In Quartz there is a one-to-many relationship between jobs and triggers, which is understandable: one job can be run by several different triggers, but one trigger can only run a single job. If you need one trigger to run several jobs, create a composite job that runs those jobs manually.
Back to your question. Creating a job without associated triggers is a valid use case: you have a piece of logic, and later you will attach one or more triggers to execute it at different points in time.
The opposite situation is weird: you want to create a trigger that will run something at a given time, but you don't yet know what. I can't imagine a use case for that.
Note that you can create a trigger for future use (with its next fire time far in the future), but it must have a job attached.
Finally, check out How-To: Storing a Job for Later Use in the official documentation.
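A minimal sketch of that flow with the Quartz 2.x fluent API (the job class MyJob, the job/trigger names, and the one-hour start offset are illustrative; scheduler is an existing org.quartz.Scheduler):

// Store a durable job now, without any trigger attached.
JobDetail job = JobBuilder.newJob(MyJob.class)
        .withIdentity("reportJob", "reports")
        .storeDurably()   // required for a job stored without triggers
        .build();
scheduler.addJob(job, false);

// Later: attach a trigger to the stored job by its key.
Trigger trigger = TriggerBuilder.newTrigger()
        .forJob("reportJob", "reports")
        .startAt(DateBuilder.futureDate(1, DateBuilder.IntervalUnit.HOUR))
        .build();
scheduler.scheduleJob(trigger);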
In my web service, all method calls submit jobs to a queue. These operations take a long time to execute, so each one submits a job to the queue and returns a status saying "Submitted". The client then keeps polling another service method to check the status of the job.
Presently, what I do is create my own Queue and Job classes that are Serializable and persist these jobs (i.e., their serialized byte streams) into the database. So an UpdateLogistics operation just queues an "UpdateLogisticsJob" and returns. I have written my own JobExecutor which wakes up every N seconds, scans the database table for pending jobs, and executes them. Note that the jobs have to be persisted because they must survive app-server crashes.
This was done a long time ago, and I used bespoke classes for my queues, jobs, executors, etc. But now I would like to know whether someone has done something similar before. In particular:
Are there frameworks available for this? Something in Spring/Apache, etc.?
Any framework that is easy to adapt/debug and plays well with libraries like Spring would be great.
EDIT - Quartz
Sorry if I had not explained enough: Quartz is good for stateless jobs (and also for some stateful ones), but the key for me is very stateful, persisted "job instances" (not just jobs or tasks). So, for example, an executeWorkflow("SUBMIT_LEAVE") operation might actually create 5 job instances, each with at least 5-10 parameters (userId, accountId, etc.) to be saved into the database.
I was looking for some support around that area, where job instances can be saved into the DB and recreated, etc.
Take a look at JBoss jBPM. It's a workflow definition package that lets you mix automated and manual processes. Tasks are persisted to a DB back end, and it looks like it has some asynchronous execution properties.
I haven't used Quartz for a long time, but I suspect it would be capable of everything you want to do.
spring-batch plus quartz
Depending upon the nature of your job, you might also look into spring-integration to assist with queue processing. But spring-batch will probably handle most of your requirements.
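If spring-batch fits, here is a sketch of how the stateful part maps onto it: every launch with a distinct set of JobParameters creates a new JobInstance that is persisted in the metadata tables, together with its parameters and execution status (the jobLauncher and submitLeaveJob beans and the parameter names below are illustrative):

// Assumes existing jobLauncher (JobLauncher) and submitLeaveJob (Job) beans.
JobParameters params = new JobParametersBuilder()
        .addString("workflow", "SUBMIT_LEAVE")
        .addLong("userId", 42L)
        .addLong("accountId", 7L)
        .toJobParameters();
JobExecution execution = jobLauncher.run(submitLeaveJob, params);
// The JobInstance, its parameters, and the execution status live in the DB,
// so they survive app-server crashes and can be inspected or restarted later.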
Please try ted-driver (https://github.com/labai/ted).
Its purpose is similar to what you need: you create a task (or many of them), which is saved in the DB, and then ted-driver is responsible for executing it. On error, you can postpone a retry for later or finish with an error status.
Unlike other Java frameworks, here tasks live in a simple, clear structure in the database, where you can manually search or update them using standard SQL.