Spring Batch Processing using Spring Cloud Data Flow - java

I have a Spring Batch app and I want an admin portal to manage failed jobs and see other job-related activity. I saw there was a Spring Batch Admin portal package in Spring, but it was deprecated in 2017 and I have to use Spring Cloud Data Flow instead, as mentioned here. For Spring Cloud Data Flow, is this some dependency we need to add to the project as an artifact, or is it a separate standalone service that needs to be set up?
My batch app has a dozen cron jobs. Can I just give my jar to Cloud Data Flow and it will take care of the rest, or do I need to configure each and every job there? Any samples are appreciated, as I want to know how big an effort it will be to set all this up.
On a side note: my app is a combination of some REST controllers and some batch jobs. Does it make sense to use Cloud Data Flow? If not, is there a better console manager for batch jobs (a portal to restart or cancel jobs, etc.)?

Spring Cloud Data Flow requires a server to be running, where you deploy your jars (registering tasks as wrappers over Spring Batch jobs). This server is responsible for orchestration and for deploying to a runtime. If you have a massive workload you probably need to go with a cluster and Kubernetes, which supports scheduling via cron; but if for now a single server handles everything and you don't have performance issues, you can simplify things by using Local mode. With Local mode, however, you have to manage scheduling yourself, for example with Quartz.
https://dataflow.spring.io/docs/feature-guides/batch/scheduling/
So introducing SCDF just for monitoring may be complicated and probably requires rethinking your application design. Also, from what I can see, SCDF is a good fit when you have dependencies between tasks.
Maybe it would be easier for you to write a couple of REST endpoints to fetch failed jobs and re-run them - everything depends on what you need and how big your app is.
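For reference, here is a minimal sketch of what such endpoints could look like, assuming a JobOperator bean is configured and using Spring Batch's JobExplorer to find failed executions (the controller name and paths are made up for illustration):

```java
import java.util.List;
import java.util.stream.Collectors;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobOperator;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/jobs")
public class FailedJobController {

    private final JobExplorer jobExplorer;
    private final JobOperator jobOperator;

    public FailedJobController(JobExplorer jobExplorer, JobOperator jobOperator) {
        this.jobExplorer = jobExplorer;
        this.jobOperator = jobOperator;
    }

    // List the execution ids of FAILED runs of a given job
    // (scans the 100 most recent job instances).
    @GetMapping("/{jobName}/failed")
    public List<Long> failedExecutions(@PathVariable String jobName) {
        return jobExplorer.getJobInstances(jobName, 0, 100).stream()
                .flatMap(instance -> jobExplorer.getJobExecutions(instance).stream())
                .filter(execution -> execution.getStatus() == BatchStatus.FAILED)
                .map(JobExecution::getId)
                .collect(Collectors.toList());
    }

    // Restart a failed execution; Spring Batch resumes from the failed step.
    @PostMapping("/executions/{executionId}/restart")
    public Long restart(@PathVariable long executionId) throws Exception {
        return jobOperator.restart(executionId);
    }
}
```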
PS:
I'm currently also weighing a plain Spring Batch monolith against cloud-ready tasks :)

Related

Java retry logic after migration or application restart

I am looking for a library that can retry failed jobs after a server restart. For example, my API exposes an endpoint that allows the end user to upload a photo, and then I need to asynchronously upload this photo to a third-party API.
Spring Retry seemed like a perfect option for me, but I can't be sure the failed job will resume after an application restart. I tried to implement a RetryContextCache that stores serialized objects in the database, but it does not work.
Is there any production-ready library, or any other way I can achieve this?
You can take a look at the JobRunr and Quartz Scheduler libraries.
Alternatively, you could just use the Spring scheduler and check manually whether there are new/unfinished jobs.
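As a rough illustration of the JobRunr approach (a sketch, not a drop-in solution; the service and method names are invented): JobRunr serializes each enqueued job to a StorageProvider such as your database, so pending jobs survive a restart and are retried by a background worker.

```java
import org.jobrunr.scheduling.JobScheduler;
import org.springframework.stereotype.Service;

@Service
public class PhotoUploadService {

    // Auto-configured by the jobrunr-spring-boot-starter.
    private final JobScheduler jobScheduler;

    public PhotoUploadService(JobScheduler jobScheduler) {
        this.jobScheduler = jobScheduler;
    }

    // Called from the upload endpoint: persist the job and return immediately.
    public void scheduleUpload(long photoId) {
        // The job details are stored in the database; if the server restarts,
        // a background worker picks the job up again and applies JobRunr's
        // default retry policy on failure.
        jobScheduler.enqueue(() -> uploadToThirdParty(photoId));
    }

    public void uploadToThirdParty(long photoId) {
        // call the third-party API here (hypothetical)
    }
}
```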

Publishing Spring Batch metrics using Micrometer

I have an app that contains two dozen Spring Batch cron jobs. There is no REST controller, as it is an analytics app: it runs daily, reads data from a DB, processes it, and then stores aggregated data in another DB. I want to have Spring's built-in metrics on the jobs using Micrometer and push them to Prometheus. As my app is not a webserver app, will Micrometer still publish results on HOST:8080? Will Actuator automatically start a new server on HOST:8080, or do we need to have an application server running on 8080?
My understanding is that Actuator and the application server can run on different ports, as these are different processes? Whether or not an application server is there, Actuator should be able to either use the same port as the application server or a different one?
So if my application is not a webserver-based app, can I still access metrics at localhost:8080/actuator/ and publish them to Prometheus?
Prometheus is a pull-based system, meaning you give it a URL from your running application and it will go pull metrics from it. If your application is an ephemeral batch application, it does not make sense to make it a webapp for the only sake of exposing a URL for a short period of time. That's exactly why Prometheus folks created the Push gateway, see When to use the Push Gateway.
Now with this in mind, in order for your batch applications to send metrics to Prometheus, you need:
A Prometheus server
A Pushgateway server
An optional metrics dashboard (Grafana or similar; Prometheus also provides a built-in UI)
A way to make your batch applications push metrics to the gateway
A complete example with this setup can be found in Batch metrics with Micrometer. That example is actually similar to your use case: it shows two jobs scheduled to run every few seconds that store metrics in Micrometer's main registry, and a background task that regularly pushes metrics from Micrometer's registry to Prometheus's gateway.
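Condensed, the push step looks roughly like this (a sketch assuming a Pushgateway reachable at localhost:9091 and a job name of your choosing; the wiring follows Micrometer's standard Prometheus support):

```java
import io.micrometer.core.instrument.Metrics;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import io.prometheus.client.exporter.PushGateway;

public class MetricsPusher {

    public static void main(String[] args) throws Exception {
        // Collect application metrics (including Spring Batch's
        // spring.batch.* meters) in a Prometheus registry.
        PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
        Metrics.addRegistry(registry);

        // ... run the batch job(s) here ...

        // Push everything to the gateway; Prometheus then scrapes the
        // gateway instead of the short-lived batch JVM.
        PushGateway pushGateway = new PushGateway("localhost:9091");
        pushGateway.pushAdd(registry.getPrometheusRegistry(), "analytics-jobs");
    }
}
```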
Another option is to use the RSocket protocol, which you get for free if you use Spring Cloud Data Flow.
As for Spring Boot, there are no Actuator endpoints for Spring Batch; please refer to Actuator endpoint for Spring Batch for more details about the reasons for this decision.
@Mahmoud I think there are valid use cases for exposing the health endpoints optionally. The first question to consider is, when we say a batch operation runs for a short time, how short is that - a few minutes? Then I agree there's no need. But what about jobs that run for a few hours? For some jobs it is important to get metrics, especially when they are bound by a business SLA and the operator needs to know whether the job is processing at the required operations per second, has the right connection pool size, etc.
There is also a variety of implementation details of the running platform: we may use Spring Batch without SCDF, not be in control of the Prometheus gateway and so be unable to push, run in a cloud where Istio will pull the metrics automatically, etc.
As for the OP's question: in general one can run a Spring Batch job in a web instance, though as far as I have used Spring Batch with a web instance, the application does shut down after job completion.

Multifunctional SpringBoot JAVA Application (REST/BATCH/LAMBDA)

I have a Java Spring Boot application that runs a job to upload data to a database after polling a message from SQS, and the same application also contains a REST API over that database.
Now I need to decouple the upload functionality and the REST API.
The upload functionality would be done by an AWS Batch job triggered by a Lambda.
The REST API would stay as it was before.
The challenge is that I need to do all of this within the same code repo, to avoid having three repositories: one for the REST API, another for the AWS Batch job, and the last for the AWS Lambda handler.
So I am trying to find out what solutions Spring Boot provides to run the same application in different modes. Please help.
I wouldn't recommend using Spring Boot for the Lambda - technically you can, but it's a waste of money: Spring Boot adds overhead for Java, requires more memory, and is therefore more expensive to run.
You need to create a multi-module Maven application. The modules would be:
The existing Spring Boot app.
The batch job.
Common code, used by modules 1 & 2.
A simple new Lambda (see the sketch after this list).
... more modules if you need ...
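As a hedged sketch of what that plain Lambda module could look like without Spring Boot (the job, queue, and handler names are placeholders; this uses the aws-lambda-java-core RequestHandler and the AWS SDK Batch client):

```java
import com.amazonaws.services.batch.AWSBatch;
import com.amazonaws.services.batch.AWSBatchClientBuilder;
import com.amazonaws.services.batch.model.SubmitJobRequest;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;

// Plain handler, no Spring: small to package and cheap to cold-start.
public class UploadTriggerHandler implements RequestHandler<SQSEvent, Void> {

    private final AWSBatch batch = AWSBatchClientBuilder.defaultClient();

    @Override
    public Void handleRequest(SQSEvent event, Context context) {
        // Submit one AWS Batch job per received SQS message
        // (job name, queue, and definition are hypothetical).
        event.getRecords().forEach(message ->
                batch.submitJob(new SubmitJobRequest()
                        .withJobName("upload-job")
                        .withJobQueue("upload-queue")
                        .withJobDefinition("upload-job-def")));
        return null;
    }
}
```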
But if you are still sure that, for some reason, you want to wrap the existing Spring Boot app into a Lambda, this library will help you:
https://github.com/awslabs/aws-serverless-java-container/wiki/Quick-start---Spring-Boot

Spring Cloud Data Flow: is it possible to run without any messaging middleware (kafka/rabbit) or with a db instead of a queue?

I am new to Spring Cloud Data Flow and I've been reading through the tutorials, trying to set up a project locally (https://dataflow.spring.io/docs/installation/local/manual/).
Am I right to assume that a queuing system is a prerequisite for the servers to run?
How is this messaging middleware used by the Data Flow server and by the Skipper server?
Is there a way to use a DB to store state instead of passing it from one app to the next using a queue?
You can run it without messaging middleware. In that case the streams features are disabled, but you can still work with Spring Cloud Task and Spring Batch jobs.
Essentially, in such a setup you only need the Data Flow server and a database (e.g. MySQL).
To do so, just set the feature toggle spring.cloud.dataflow.features.streams-enabled to false. See also: https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#configuration-local
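For example, when starting the server locally per the manual installation guide, the toggle can be passed as a command-line property (the jar version and datasource values are placeholders):

```
java -jar spring-cloud-dataflow-server-<version>.jar \
    --spring.cloud.dataflow.features.streams-enabled=false \
    --spring.datasource.url=jdbc:mysql://localhost:3306/dataflow \
    --spring.datasource.username=<user> \
    --spring.datasource.password=<password>
```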
Hope that helps!

Is it possible to restart a springboot application?

I know that by sending an HTTP POST request to http://host:port/shutdown we can shut down a Spring Boot application. Is it possible to restart the whole Spring Boot application by sending an HTTP request in a production environment, so we don't need to log in to the server to do that? Thank you.
I don't think such a thing exists; I'll be glad to be proven otherwise.
Spring Boot doesn't make any assumptions about the environment it runs in. So when the Spring Boot process gets shut down, restarting it again is "out of the competence" of the Spring Boot infrastructure, which is just a bunch of Java classes running inside a JVM process.
You can find here a list of endpoints exposed by Spring Boot. There is the "shutdown" endpoint that you've mentioned, but no "restart" functionality is exposed.
Now, there are other techniques that can probably help:
If the application gets shut down because of some illegal state of a Spring bean, maybe it makes sense to expose an endpoint that will "clean up" that state and make the application operational again. If the application has to be restarted due to changes in configuration files or similar, you might want to consider using Spring Cloud's Refresh Scope for beans. It's hard to provide more information here because you haven't mentioned the reason for shutting down the application, but I guess you've got the direction.
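To illustrate the Refresh Scope idea, a minimal sketch (assumes the Spring Cloud Context dependency is on the classpath and the refresh actuator endpoint is exposed; the property name is invented):

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.cloud.context.config.annotation.RefreshScope;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// This bean is re-created with fresh configuration when
// POST /actuator/refresh is called - no process restart needed.
@RefreshScope
@RestController
public class MessageController {

    @Value("${app.message:hello}")
    private String message; // re-resolved after a refresh

    @GetMapping("/message")
    public String message() {
        return message;
    }
}
```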
Having said that, there are probably different ways to achieve what you want, depending on the environment your application runs in:
If you're running in AWS, for example, you can take advantage of autoscaling policies: shut down the application remotely and AWS will run another instance for you. I'm not an expert in AWS, but I've seen this work in ECS, for example.
If you're running "java -jar" on some server and want to make sure that when your process ends (via "shutdown") it is started again, it is possible to use some kind of wrapper that wraps the process in a service and tracks the service's availability. There are even ready-made solutions for this, like the Tanuki wrapper (I'm not affiliated with this product, but I once used its free version and it served us well).
If you're using Docker infrastructure, you can change the restart policy so that the container automatically restarts when it gets shut down. I haven't used this myself, but according to this excellent blog post it is perfectly doable.
You should look at Spring Boot Jenkins. You will also find a small article there explaining how to configure the project on Jenkins.
