I am looking for a library that can retry failed jobs after a server restart. For example, my API exposes an endpoint that allows an end user to upload a photo, and I then need to upload this photo asynchronously to a third-party API.
Spring Retry seemed like a perfect option, but I can't be sure a failed job will resume after an application restart. I tried to implement a RetryContextCache that stores serialized objects in the database, but it does not work.
Is there a production-ready library, or any other way, to achieve this?
You can take a look at the JobRunr and Quartz Scheduler libraries.
Alternatively, you could just use Spring's scheduler and check manually whether there are new or unfinished jobs.
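To make the "check manually" idea concrete, here is a minimal sketch in plain Java. It assumes job state is persisted somewhere that survives a restart; the in-memory map below only stands in for a database table, and UploadJob, JobStore and RetryingWorker are hypothetical names, not part of Spring, JobRunr or Quartz. The worker would be called once on startup and then from a scheduled task.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical job record; in a real application this would be a database row. */
final class UploadJob {
    enum Status { PENDING, DONE, FAILED }

    final String id;
    final String photoPath;
    Status status = Status.PENDING;
    int attempts = 0;

    UploadJob(String id, String photoPath) {
        this.id = id;
        this.photoPath = photoPath;
    }
}

/** Stand-in for a persistent table; the map only simulates storage that survives a restart. */
final class JobStore {
    private final Map<String, UploadJob> rows = new LinkedHashMap<>();

    void save(UploadJob job) {
        rows.put(job.id, job);
    }

    /** Returns a copy of every job that has not reached DONE (PENDING or FAILED). */
    List<UploadJob> findUnfinished() {
        List<UploadJob> result = new ArrayList<>();
        for (UploadJob job : rows.values()) {
            if (job.status != UploadJob.Status.DONE) {
                result.add(job);
            }
        }
        return result;
    }
}

final class RetryingWorker {
    private final JobStore store;
    private final Predicate<UploadJob> uploader; // returns true when the third-party upload succeeded
    private final int maxAttempts;

    RetryingWorker(JobStore store, Predicate<UploadJob> uploader, int maxAttempts) {
        this.store = store;
        this.uploader = uploader;
        this.maxAttempts = maxAttempts;
    }

    /** Re-runs every unfinished job once and returns how many completed in this pass. */
    int runUnfinished() {
        int completed = 0;
        for (UploadJob job : store.findUnfinished()) {
            if (job.attempts >= maxAttempts) {
                continue; // retries exhausted; leave the row for manual inspection
            }
            job.attempts++;
            boolean ok;
            try {
                ok = uploader.test(job);
            } catch (RuntimeException e) {
                ok = false; // treat any exception from the upload as a failed attempt
            }
            job.status = ok ? UploadJob.Status.DONE : UploadJob.Status.FAILED;
            store.save(job); // persist the outcome so a later restart sees it
            if (ok) {
                completed++;
            }
        }
        return completed;
    }
}
```

The important part is that the status is written back after every attempt, so a fresh process only needs the store to find out what is left to do.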
Related
I have a Spring Batch app and I want an admin portal to manage failed jobs and see other job-related activity. I saw there was a Spring Batch Admin package, but it was deprecated in 2017 and I have to use Spring Cloud Data Flow instead, as mentioned here. For Spring Cloud Data Flow, is it a dependency we add to the project as an artifact, or is it a separate standalone service that needs to be set up?
My batch app has a dozen cron jobs. Can I just give my jar to Cloud Data Flow and let it take care of the rest, or do I need to configure each and every job there? Any samples are appreciated, as I want to know how much effort it will take to set all this up.
On a side note: my app is a combination of some REST controllers and some batch jobs. Does it make sense to use Cloud Data Flow? If not, is there a better console manager for batch jobs (a portal to restart or cancel jobs, etc.)?
Spring Cloud Data Flow requires a running server, to which you deploy your jars (registering tasks as wrappers over Spring Batch jobs). This server is responsible for orchestration and for deployment to the runtime. If you have a massive workload, you probably need to go with a cluster and Kubernetes, which supports scheduling via cron. But if you currently have a single server that handles everything and you don't have performance issues, you can simplify things by using Local mode. With Local mode, however, you have to manage scheduling yourself, with Quartz for example.
https://dataflow.spring.io/docs/feature-guides/batch/scheduling/
So having SCDF just for monitoring may be complicated and probably requires rethinking your application design. Also, as far as I can see, SCDF is a good fit when you have dependencies between tasks.
Maybe it would be easier for you to write a couple of REST endpoints to fetch failed jobs and re-run them. It all depends on what you need and how big your app is.
PS:
I'm also currently deciding between a plain Spring Batch monolith and cloud-ready tasks :)
I have an app that contains two dozen Spring Batch cron jobs. There is no REST controller, as it is an analytics app that runs daily, reads data from a database, processes it, and then stores aggregated data in another database. I want to collect Spring's built-in metrics for the jobs using Micrometer and push them to Prometheus. As my app is not a web server app, will Micrometer still publish results on HOST:8080? Will Actuator automatically start a new server on HOST:8080, or do we need to have an application server running on 8080?
My understanding is that Actuator and the application server can run on different ports, as these are different processes? Whether or not an application server is present, Actuator should be able to use either the same port as the application server or a different port?
So if my application is not a web-server-based app, can I still access metrics at localhost:8080/actuator/ and publish them to Prometheus?
Prometheus is a pull-based system, meaning you give it a URL from your running application and it will go and pull metrics from it. If your application is an ephemeral batch application, it does not make sense to turn it into a web app for the sole purpose of exposing a URL for a short period of time. That's exactly why the Prometheus folks created the Pushgateway, see When to use the Push Gateway.
With this in mind, for your batch applications to send metrics to Prometheus, you need:
A Prometheus server
A Pushgateway server
An optional metrics dashboard (Grafana or similar; Prometheus also provides a built-in UI)
Make your batch applications push metrics to the gateway
A complete example of this setup can be found in Batch metrics with Micrometer. This example is actually similar to your use case. It shows two jobs scheduled to run every few seconds, which store metrics in Micrometer's main registry, and a background task that regularly pushes metrics from Micrometer's registry to Prometheus's gateway.
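To illustrate what "pushing to the gateway" amounts to on the wire, here is a minimal sketch in plain Java (11+). It is not the Micrometer-based approach from the linked example; it only renders a few gauges in the Prometheus text format and builds an HTTP PUT to the Pushgateway's /metrics/job/<job_name> path. The class name, the metric name and the gateway address are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

final class PushgatewaySketch {

    /** Renders gauges in the Prometheus text exposition format, sorted by metric name. */
    static String render(Map<String, Double> gauges) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, Double> e : new TreeMap<>(gauges).entrySet()) {
            body.append("# TYPE ").append(e.getKey()).append(" gauge\n");
            body.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return body.toString();
    }

    /**
     * Builds (but does not send) a PUT request for the given job's metric group.
     * gatewayBaseUrl is an assumption, e.g. "http://localhost:9091" for a local Pushgateway.
     */
    static HttpRequest buildPush(String gatewayBaseUrl, String jobName, Map<String, Double> gauges) {
        URI uri = URI.create(gatewayBaseUrl + "/metrics/job/" + jobName);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "text/plain; version=0.0.4")
                .PUT(HttpRequest.BodyPublishers.ofString(render(gauges), StandardCharsets.UTF_8))
                .build();
    }
}
```

A batch job would call buildPush at the end of a run (or periodically for long runs) and send the request with java.net.http.HttpClient. In a Spring Boot application you would normally let Micrometer do this instead of hand-rolling it.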
Another option is to use the RSocket protocol, which is provided for free if you use Spring Cloud Data Flow.
For Spring Boot, there are no Actuator endpoints for Spring Batch; please refer to Actuator endpoint for Spring Batch for more details about the reasons behind this decision.
@Mahmoud I think there are valid use cases for optionally exposing the health endpoints. The first question to consider is: when we say a batch operation runs for a short time, how short is that time? For a few minutes, I agree there's no need, but how about jobs that run for a few hours? For some jobs it's important that we get metrics, especially when such jobs are bound by a business SLA and the operator needs to know whether the job is processing at the required operations per second, has the right connection pool size, etc.
There are also a variety of implementation details of the running platform - we can use Spring Batch without SCDF, not be in control of the Prometheus gateway to be able to use push, run in a cloud where Istio will pull the metrics automatically etc.
As for the OP's question: in general, one can run a Spring Batch job in a web instance. In my experience of using Spring Batch with a web instance, the application does shut down after the job completes.
I want to run cron jobs and use the same code base. I found a few solutions, but they don't appear ideal. For example, with Heroku, you can add a Scheduler element and fill in the commands to run in a web page.
http://blog.rotaready.com/scheduled-tasks-elastic-beanstalk-cron/
It seems overly complicated for load-balanced instances.
It makes use of require('async') in Node, but what would be a Java Spring Boot equivalent?
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
There doesn't appear to be any security. Anyone on the net could POST to the /path and execute the job, causing a denial-of-service attack.
It mentions cron.yaml, which doesn't make sense as the app is deployed via a WAR/ZIP file to a Tomcat instance (Spring Boot).
It mentions Amazon DynamoDB, which we don't use. We use MySQL.
It doesn't specify whether the load balancer connection draining timeout is in effect for these jobs (10s).
It mentions "Worker Configuration card on the Configuration page in the environment management console" but there is no Worker Configuration card under Configuration page.
Running a cron job in Elastic Beanstalk
For Python/Django - uses cron.yaml.
I thought of just having a dedicated EC2 instance, but how can I deploy the latest code changes there?
This may also belong on SoftwareEngineering.StackExchange.
There is an easy way to do this using other AWS systems.
You can use CloudWatch to set up scheduled events (https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/WhatIsCloudWatchEvents.html). You can create a rule that fires the event on a fixed schedule.
You then have at least two options:
Set the event to publish an SNS message and use that SNS topic to call a webhook on your server. There are many examples of how to do this, but you will have to make sure you check the signature to ensure the web API is being called by the signed SNS message. This would use a public API, though, and may not be something you are comfortable with.
Set the event to publish an SQS message. Then set up an Elastic Beanstalk worker to process the SQS message, or just run a background script on your main server that is basically an infinite loop polling SQS for work to do.
I'm not sure how familiar you are with these systems, so I'm not sure whether it is clear what I am talking about, but there is no way to give a detailed solution here, so I hope this is enough to give you ideas.
I know that by sending an HTTP POST request to http://host:port/shutdown we can shut down a Spring Boot application. Is it possible to restart the whole Spring Boot application by sending an HTTP request in a production environment, so that we don't need to log in to the server to do it? Thank you.
I don't think such a thing exists, but I'll be glad to be proven otherwise:
Spring Boot doesn't make any assumptions about the environment it runs in. So when the Spring Boot process gets shut down, restarting it is outside the competence of the Spring Boot infrastructure, which is just a bunch of Java classes running inside a JVM process.
You can find here a list of the endpoints exposed by Spring Boot. It includes the "shutdown" endpoint that you've mentioned, but there is no "restart" functionality exposed.
Now there are other techniques that probably can help:
If the application gets shut down because of an illegal state in some Spring bean, maybe it makes sense to expose an endpoint that will "clean up" the state and make the application operational again. If the application has to be restarted due to changes in configuration files or similar, then you might want to consider using Spring Cloud's Refresh Scope for beans. It's hard to provide more information here, because you haven't mentioned the reason for shutting down the application, but I guess you've got the direction.
Having said that, there are probably some different ways to achieve what you want depending on the environment your application runs in:
If you're running in AWS for example, you can take advantage of their autoscaling policies, shut down the application remotely and AWS will run another instance for you. I'm not an expert in AWS, but I saw this working in ECS for example.
If you're running "java -jar" on some server and want to make sure that when your process ends (by using 'shutdown') it is started again, it's possible to use some kind of wrapper that wraps the process in a service and tracks the service's availability. There are even ready-made solutions for this, like the Tanuki wrapper (I'm not affiliated with this product, but I once used its free version and it served us well).
If you're using Docker infrastructure, you can change the restart policy so that the container is restarted automatically when it gets shut down. I haven't used this myself, but according to This excellent blog post it is perfectly doable.
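For the "wrapper" option, the core of such a supervisor is just a loop around the child process. Below is a rough sketch in plain Java, not how Tanuki or Docker implement it. The stopExitCode convention (a special exit code meaning "do not restart") and the maxLaunches cap are assumptions added so the loop can terminate; RestartSupervisor is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.function.IntSupplier;

final class RestartSupervisor {

    /** Returns a launcher that starts the command, waits for it to exit, and yields its exit code. */
    static IntSupplier processLauncher(List<String> command) {
        return () -> {
            try {
                return new ProcessBuilder(command).inheritIO().start().waitFor();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for the process", e);
            }
        };
    }

    /**
     * Launches the process again every time it exits, until it exits with stopExitCode
     * or maxLaunches is reached. Returns the number of launches performed.
     */
    static int supervise(IntSupplier launcher, int stopExitCode, int maxLaunches) {
        int launches = 0;
        while (launches < maxLaunches) {
            launches++;
            int exitCode = launcher.getAsInt();
            if (exitCode == stopExitCode) {
                break; // the application asked to stay down
            }
        }
        return launches;
    }
}
```

Usage would look something like supervise(processLauncher(List.of("java", "-jar", "app.jar")), 3, 100), where app.jar and the exit code 3 are placeholders. A real service wrapper also handles logging, back-off between restarts and OS service integration, which this sketch leaves out.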
You should look at Spring boot jenkins. You will also find a small article explaining how to configure the project on Jenkins.
I have a process that processes an input file, uses 100% of the processor (uses 16 cores), and 8 GB of RAM. I currently run it directly from the console. But I need to call this process from a REST service. The service must be asynchronous, and there will be another service to consult the output of the first service called. The input files must be queued, because it can only be processed one at a time.
I use RestEasy on Wildfly.
My query is:
What architecture do you suggest to call this process?
I have these possible solutions.
Call the JAR from my EJB with Runtime, and keep a queue of files in a database.
Transform my JAR into a daemon that constantly monitors a directory. The files would be stored there, and the daemon would take them one by one in order of arrival.
Copy the classes into my EAR project, call them as a simple EJB, and let WildFly manage the resources. This would also imply having a file queue in a database.
Do you have any other suggestions?
Instead of writing a queue, use a JMS implementation or Kafka. Cloud solutions exist from Google and AWS. Have your REST endpoint publish to the queue and your "daemon" receive from it.
From there it's easy to pull these apart into a microservice architecture.
Technically the solution is simple: have the REST endpoint call another Stateless EJB with the @Asynchronous annotation. If you want to track the status, use some kind of Job entity that you create (synchronously) when you receive the request, then start the async job with this Job entity as a parameter and update it from your async task.
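The shape of that solution can be sketched without any container. The example below uses plain java.util.concurrent instead of an @Asynchronous EJB, and an in-memory map instead of a persisted Job entity, so treat it as an illustration of the flow only. FileJobService and its method names are hypothetical. The single-thread executor also covers the "one file at a time" requirement, since queued submissions run in order.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

final class FileJobService {
    enum Status { QUEUED, RUNNING, DONE, FAILED }

    // One worker thread: submitted files are processed strictly one at a time, in order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // Stand-in for the Job entity table; a real service would persist this.
    private final Map<Long, Status> jobs = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();
    private final Consumer<String> processor; // the heavy file-processing step

    FileJobService(Consumer<String> processor) {
        this.processor = processor;
    }

    /** Called by the first REST endpoint: records the job synchronously, then returns immediately. */
    long submit(String inputFile) {
        long id = ids.incrementAndGet();
        jobs.put(id, Status.QUEUED);
        worker.submit(() -> {
            jobs.put(id, Status.RUNNING);
            try {
                processor.accept(inputFile);
                jobs.put(id, Status.DONE);
            } catch (RuntimeException e) {
                jobs.put(id, Status.FAILED);
            }
        });
        return id;
    }

    /** Called by the second REST endpoint to poll the outcome; null for an unknown id. */
    Status status(long id) {
        return jobs.get(id);
    }

    /** Stops accepting work and waits for queued jobs; returns false on timeout or interrupt. */
    boolean shutdownAndWait(long timeoutSeconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Inside an application server you would normally use the container's own facilities (the @Asynchronous EJB above, or a managed executor) rather than creating threads yourself.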
There is, however, a more conceptual problem: why do you need to invoke such a long-running operation on your web server? With a requirement like this it is going to be very difficult to scale your app or handle failover (what happens if your server crashes during processing?).
If you need to do an IO-heavy or computation-heavy task like this in a Java environment, have a look at batch processing: JBeret or Spring Batch will do the job. That way you can process your file in chunks, parallelize the computation and spread the load more evenly. If you are doing video processing or a similar task, consider using dedicated machines with some kind of job queue (Celery, Kafka...) for that, and just let your server handle the REST layer and job monitoring.