Publishing Spring Batch metrics using Micrometer

I have an app that contains about two dozen Spring Batch cron jobs. There is no REST controller: it is an analytics app that runs daily, reads data from a database, processes it, and stores the aggregated data in another database. I want to collect Spring's built-in metrics for the jobs using Micrometer and push them to Prometheus. Since my app is not a web server app, will Micrometer still publish results on HOST:8080? Will Actuator automatically start a new server on HOST:8080, or do we need an application server running on 8080?
My understanding is that Actuator and the application server can run on different ports, as these are different processes? Whether or not an application server is present, Actuator should be able to either share the application server's port or use a different one?
So if my application is not a web-server-based app, can I still access metrics at localhost:8080/actuator/ and publish them to Prometheus?

Prometheus is a pull-based system: you give it a URL exposed by your running application, and it pulls metrics from that URL. If your application is an ephemeral batch application, it does not make sense to turn it into a web app for the sole purpose of exposing a URL for a short period of time. That is exactly why the Prometheus team created the Pushgateway; see "When to use the Pushgateway".
With this in mind, for your batch applications to send metrics to Prometheus, you need:
A Prometheus server
A Pushgateway server
An optional metrics dashboard (Grafana or similar; Prometheus also provides a built-in UI)
Batch applications that push their metrics to the gateway
A complete example of this setup can be found in "Batch metrics with Micrometer". This example is similar to your use case: it shows two jobs scheduled to run every few seconds that store metrics in Micrometer's main registry, and a background task that regularly pushes metrics from Micrometer's registry to Prometheus's Pushgateway.
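The push step can be sketched without any framework. This is a minimal sketch, not the code of the example mentioned above: it assumes a Pushgateway reachable at a base URL such as http://localhost:9091 (its default port), and the class name, job name, and metric names are hypothetical. It renders samples in the Prometheus text exposition format and sends them with an HTTP PUT to the gateway's /metrics/job/<job> path. In a real Micrometer setup the payload would come from the Prometheus registry rather than from a hand-built map.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;

// Hypothetical helper for illustration; not part of Micrometer or Spring Batch.
final class PushgatewaySketch {

    // Renders each sample as a gauge in the Prometheus text exposition format.
    static String render(Map<String, Double> samples) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> sample : samples.entrySet()) {
            sb.append("# TYPE ").append(sample.getKey()).append(" gauge\n");
            sb.append(sample.getKey()).append(' ').append(sample.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Builds the push URL; assumes jobName needs no URL escaping.
    static URI pushUri(String baseUrl, String jobName) {
        return URI.create(baseUrl + "/metrics/job/" + jobName);
    }

    // PUT replaces all metrics previously pushed for this job; returns the HTTP status.
    static int push(String baseUrl, String jobName, Map<String, Double> samples) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(pushUri(baseUrl, jobName))
                .header("Content-Type", "text/plain; version=0.0.4")
                .PUT(HttpRequest.BodyPublishers.ofString(render(samples)))
                .build();
        HttpResponse<Void> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }
}
```

A job could call, for example, PushgatewaySketch.push("http://localhost:9091", "dailyReport", Map.of("batch_job_duration_seconds", 1.5)) as its last step, so the metrics outlive the short-lived process.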
Another option is to use the RSocket protocol, which is provided for free if you use Spring Cloud Data Flow.
As for Spring Boot, there are no Actuator endpoints for Spring Batch; please refer to "Actuator endpoint for Spring Batch" for the reasons behind this decision.

@Mahmoud I think there are valid use cases for optionally exposing the health endpoints. The first question to consider is how short "a short time" really is when we say a batch operation runs for a short time. A few minutes? Then I agree there is no need. But what about jobs that run for a few hours? For some jobs it is important to get metrics, especially when they are bound by a business SLA and the operator needs to know whether the job is processing at the required operations per second, has the right connection pool size, etc.
There is also a variety of implementation details on the running platform: we may use Spring Batch without SCDF, not be in control of the Prometheus gateway and so be unable to push, run in a cloud where Istio pulls the metrics automatically, etc.
As for the OP's question: in general, one can run a Spring Batch job in a web instance. In my experience of using Spring Batch with a web instance, the application does shut down after job completion.

Related

Spring Batch Processing using Spring Cloud data flow

I have a Spring Batch app and I want an admin portal to manage failed jobs and see other job-related activity. I saw that there was a Spring Batch Admin package, but it was deprecated in 2017 and I have to use Spring Cloud Data Flow instead, as mentioned here. For Spring Cloud Data Flow, is it a dependency we add to the project as an artifact, or is it a separate standalone service that needs to be set up?
My batch app has a dozen cron jobs. Can I just give my jar to Spring Cloud Data Flow and let it take care of the rest, or do I need to configure each and every job there? Any samples are appreciated, as I want to know how much effort it will take to set all this up.
On a side note: my app is a combination of some REST controllers and some batch jobs. Does it make sense to use Spring Cloud Data Flow? If not, is there a better console manager for batch jobs (a portal to restart or cancel jobs, etc.)?

Spring Cloud Data Flow requires a running server to which you deploy your jars (registering tasks as wrappers around Spring Batch jobs). This server is responsible for orchestration and for deploying to the runtime. If you have a massive workload, you probably need to go with a cluster and Kubernetes, which supports scheduling via cron. But if you currently have a single server that handles everything and you have no performance issues, you can simplify things by using Local mode. With Local mode, however, you have to manage scheduling yourself, with Quartz for example.
https://dataflow.spring.io/docs/feature-guides/batch/scheduling/
So having SCDF just for monitoring may be complicated and probably requires rethinking your application design. Also, as far as I can see, SCDF is a good fit when you have dependencies between tasks.
Maybe it would be easier for you to write a couple of REST endpoints to fetch failed jobs and re-run them. Everything depends on what you need and how big your app is.
PS:
I'm currently also deciding between a plain Spring Batch monolith and cloud-ready tasks :)

I am trying to create a separate HTTPServer alongside the one running for management and metrics exposure. Is it a good idea?

I am running a Spring Boot application, and I have added Micrometer to expose metrics for Prometheus to scrape. My requirement is that the Prometheus URL should neither be beneath /actuator nor be anything like /prometheus. It should be accessible by hitting / on a port other than the one specified for management via management.server.port. So I thought of creating a com.sun.net.httpserver HTTP server on a separate thread, which will listen on my desired port and respond to requests made to / with the metrics. This is feasible; however, I am wondering whether it is a good idea in terms of:
Scalability.
Accuracy in reading and exposing the system metrics.
What if the thread gets interrupted at some point and therefore stops?
Do Spring's management APIs such as healthcheck, heartbeat, and prometheus (prometheus when Micrometer is added) run on a different instance of Tomcat?
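The standalone server described in this question could look roughly like the following. It is a minimal sketch that only shows the approach is feasible, not a recommendation: the class name and thread name are hypothetical, and the metrics text is passed in as a Supplier<String>, on the assumption that a Micrometer setup would hand over something like the Prometheus registry's scrape() method. Daemon threads are used so the extra server never keeps the JVM alive on its own.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical class for illustration: serves metrics text at "/" on its own port.
final class RootMetricsServer {

    // Starts the server; port 0 asks the OS for any free port.
    static HttpServer start(int port, Supplier<String> scrape) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            // The "/" context matches every request path on this server.
            server.createContext("/", exchange -> {
                byte[] body = scrape.get().getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders()
                        .set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            // Small daemon pool: a failed request does not stop the server thread.
            server.setExecutor(Executors.newFixedThreadPool(2, runnable -> {
                Thread thread = new Thread(runnable, "metrics-http");
                thread.setDaemon(true);
                return thread;
            }));
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling RootMetricsServer.start(9100, registry::scrape) would expose the metrics at http://HOST:9100/, where 9100 and registry are placeholders for the desired port and the application's Prometheus registry.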

Spring Cloud Data Flow: is it possible to run without any messaging middleware (Kafka/RabbitMQ) or with a db instead of a queue?

I am new to Spring Cloud Data Flow and I've been reading through the tutorials, trying to set up a project locally. (https://dataflow.spring.io/docs/installation/local/manual/)
Am I right to assume that a queuing system is a prerequisite for the servers to run?
How is this messaging middleware used by the Data Flow server and by the Skipper server?
Is there a way to use a db to store state instead of passing it from one app to the next using a queue?

You can run it without messaging middleware. In that case the streams features are disabled, but you can still work with Spring Cloud Task applications and Spring Batch jobs.
Essentially, in such a setup you only need the Data Flow server and a database (e.g. MySQL).
To do so, just set the feature toggle spring.cloud.dataflow.features.streams-enabled to false. See also: https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#configuration-local
Hope that helps!

How do you use cron jobs using Elastic Beanstalk and Java?

I want to run cron jobs and use the same code base. I found a few solutions, but they don't appear ideal. For example, with Heroku, you can add a Scheduler element and fill in the commands to run in a web page.
http://blog.rotaready.com/scheduled-tasks-elastic-beanstalk-cron/
It seems overly complicated for load-balanced instances.
It makes use of require('async') in Node, but what would be a Java Spring Boot equivalent?
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
There doesn't appear to be any security. Anyone on the net could access the /path to POST and execute the job, causing a denial-of-service attack.
It mentions cron.yaml, which doesn't make sense as the app is deployed via a WAR/ZIP file to a Tomcat instance (Spring Boot).
It mentions Amazon DynamoDB, which we don't use. We use MySQL.
It doesn't specify whether the load balancer connection draining timeout is in effect for these jobs (10s).
It mentions "Worker Configuration card on the Configuration page in the environment management console" but there is no Worker Configuration card under Configuration page.
Running a cron job in Elastic Beanstalk
For Python/Django - uses cron.yaml.
I thought of just having a dedicated EC2 instance, but how can I deploy the latest code changes there?
This may also belong on SoftwareEngineering.StackExchange.

There is an easy way to do this using other AWS systems.
You can use CloudWatch to set up scheduled events (https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/WhatIsCloudWatchEvents.html). You can define a rule that fires the event on a fixed schedule.
You then have at least two options:
Set the event to publish an SNS message and use that SNS topic to call a webhook on your server. There are many examples of how to do this, but you must check the message signature to ensure the web API is really being called by the signed SNS message. This does use a public API, though, which may not be something you are comfortable with.
Set the event to publish an SQS message. Then set up an Elastic Beanstalk worker to process the SQS messages, or just run a background script on your main server that polls SQS for work in an infinite loop.
I am not sure how familiar you are with these systems, so this may not be entirely clear, but there is no room here for a detailed solution; I hope this is enough to give you ideas.

Integrating weekly e-mail delivery/newsletter with Spring Framework

For my Spring-based web application, I now have the requirement to send out weekly e-mails to my application's users.
What are elegant solutions to this requirement?
Up until now, I have come up with the following possible solutions:
A dedicated cron job that I schedule to run once a week, running independently of my web application's JVM process and outside the web application's servlet container. This process takes care of sending out the weekly e-mails. To send personalized e-mails, it reuses domain classes (such as my User class) that I have already developed for my web application. This dedicated process accesses my application's MySQL database concurrently with the running Spring Web MVC servlet.
A scheduled mechanism inside my Spring Web MVC servlet or inside my servlet container.
In this setup, the e-mail sending happens inside the same JVM and the same servlet container as my web-serving Spring Web MVC servlet. Maybe this setup has (irrelevant?) advantages such as sharing the database connection pool, transactions, and classes with the servlet hosted in the same environment.
Using or not using Spring Batch for either of the setups above. I have no experience with Spring Batch yet, so I cannot judge whether it is an adequate tool for my requirement.
Maybe there are other solutions as well?
I am especially interested in answers that can give insights and guide in making an educated decision.
It is irrelevant for this particular question whether e-mails get sent with my own infrastructure or with a third party e-mail SaaS service.

From your description, the code for generating newsletters must share a common code base with your main application. So the natural solution is to develop this code within your main application. The open question is how this code is triggered:
From cron. You start a script from cron that somehow triggers the function within your application. This "somehow" may be a process listening on a specific port or, what is quite natural for a web application, a dedicated URL that triggers the newsletter. Just make sure that the URL cannot be called from outside, only from localhost (check the caller's IP, for example). You must, however, deal with the situation where your app is down (restarting, for example) when cron launches the script.
From within the application, for example using Quartz. The downside is that you need to include a new library and create database tables for Quartz. The upside is that Quartz will handle the situation where a task was scheduled for a moment when the application was down, because it stores information about what was launched in the database.
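The "only from localhost" check in the cron option can be sketched as follows. This is a minimal illustration built on the JDK's own HTTP server rather than on a servlet container; the class name and the /trigger-newsletter path are hypothetical. In a Spring MVC application the same check would be made against the servlet request's remote address, and behind a reverse proxy that address would be the proxy's, so the check alone would not be enough there.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical class for illustration: a trigger URL that only local callers may use.
final class LocalTriggerEndpoint {

    // True only for callers on the loopback interface (127.0.0.0/8 or ::1).
    static boolean isLocalCaller(InetAddress caller) {
        return caller != null && caller.isLoopbackAddress();
    }

    static HttpServer start(int port, Runnable job) {
        try {
            // Binding to the loopback address already keeps remote hosts out;
            // the explicit check in handle() is a second guard.
            HttpServer server = HttpServer.create(
                    new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 0);
            server.createContext("/trigger-newsletter", exchange -> handle(exchange, job));
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void handle(HttpExchange exchange, Runnable job) throws IOException {
        boolean allowed = "POST".equals(exchange.getRequestMethod())
                && isLocalCaller(exchange.getRemoteAddress().getAddress());
        if (allowed) {
            job.run(); // runs synchronously; a real job might be handed to a worker thread
        }
        // 204 on success, 403 otherwise; -1 means the response has no body.
        exchange.sendResponseHeaders(allowed ? 204 : 403, -1);
        exchange.close();
    }
}
```

The cron script would then issue a POST to http://localhost:PORT/trigger-newsletter, where PORT is whatever port the endpoint was started on.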

We always use cron to fire a JMS message to a queue and have a dedicated process that consumes these messages. You can add the email contents to the message or just use the message as a trigger. The nice thing about this approach is that you can fire a JMS message from anywhere and have multiple handlers for lots of different email scenarios. The only downside is installing a JMS broker, if you don't already have one...

I am building a Spring MVC based web application which is required to send a weekly newsletter to a small group of people. I am using Spring's built-in scheduling mechanism. http://static.springsource.org/spring/docs/3.0.x/reference/scheduling.html
Yes, in this setup the e-mail sending happens inside the same JVM and the same servlet container, and the solution is quite easy and handy to implement. I am still observing the stability and reliability of this mechanism and cannot give more feedback on it yet.
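To illustrate the in-JVM scheduling idea without Spring, here is a minimal sketch that uses the JDK's ScheduledExecutorService in place of Spring's scheduler. It is not the code from the answer above; the class name and the weekday and time parameters are hypothetical. It keeps no persistent state, so a run that falls while the JVM is down is simply skipped.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalTime;
import java.time.ZonedDateTime;
import java.time.temporal.TemporalAdjusters;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical class for illustration: runs a task once a week inside the application JVM.
final class WeeklyNewsletterScheduler {

    // Time from 'now' until the next occurrence of the given weekday and time.
    static Duration delayUntilNext(ZonedDateTime now, DayOfWeek day, LocalTime time) {
        ZonedDateTime next = now.with(TemporalAdjusters.nextOrSame(day)).with(time);
        if (!next.isAfter(now)) {
            next = next.plusWeeks(1); // today's slot has already passed
        }
        return Duration.between(now, next);
    }

    static ScheduledExecutorService schedule(Runnable sendNewsletter, DayOfWeek day, LocalTime time) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        long initialDelaySeconds = delayUntilNext(ZonedDateTime.now(), day, time).toSeconds();
        // A fixed 7-day period ignores daylight-saving shifts; acceptable for a sketch.
        executor.scheduleAtFixedRate(() -> {
            try {
                sendNewsletter.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so report and continue.
                System.err.println("Newsletter run failed: " + e);
            }
        }, initialDelaySeconds, TimeUnit.DAYS.toSeconds(7), TimeUnit.SECONDS);
        return executor;
    }
}
```

For example, WeeklyNewsletterScheduler.schedule(task, DayOfWeek.MONDAY, LocalTime.of(8, 0)) would run task every Monday at 08:00 in the JVM's default time zone; the returned executor should be shut down when the application stops.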
