How to monitor a cluster using Prometheus? - java

Initially we had a single-node application, and we pointed Prometheus at it by setting the metrics path like this:
- job_name: 'spring-actuator'
  metrics_path: '/prometheus'
  scrape_interval: 5s
Now we have switched to a cloud application, and if we point Prometheus at the load balancer, it will hit a different node on each scrape, so the metrics will be a mess. Is there a way to aggregate metrics from the cluster using Prometheus?

You should use Prometheus to gather metrics from the individual backends and then either aggregate at query time or pre-aggregate the data (using Prometheus recording rules).
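For example, once every node is scraped individually, a query-time aggregation across all instances could look like this (a PromQL sketch; the metric name is illustrative):

sum without (instance) (rate(http_server_requests_seconds_count[5m]))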
Prometheus has a number of service discovery mechanisms built in, and they can be used to automatically find and scrape all the endpoints your app runs on.
For a taste of what the configuration can look like, see for example https://github.com/prometheus/prometheus/blob/release-2.15/config/testdata/conf.good.yml#L199
Depending on which cloud service you use, you'll use a different _sd_config directive. All available ones are described in the documentation: https://prometheus.io/docs/prometheus/latest/configuration/configuration/
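For instance, if the nodes run on EC2, the scrape job from the question might be extended roughly like this (a sketch; the region and port values are placeholders):

- job_name: 'spring-actuator'
  metrics_path: '/prometheus'
  scrape_interval: 5s
  ec2_sd_configs:
    - region: eu-west-1
      port: 8080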

Related

Spring Boot + k8s - autodiscovery solution

Let's say I have the following architecture based on Java Spring Boot + Kubernetes:
N pods with a similar purpose (let's say: order-executors) - a GROUP of pods
other pods with other business logic
I want to create a solution where:
One (central) pod can communicate with all pods / a specific GROUP of pods to get some information about their state (REST or any other way)
Some pods can of course be replicated, i.e. x5, and the central pod should communicate with all replicas
Is this possible with any technology? If every order-executor pod has a k8s Service, is there a way to communicate with all replicas of this pod to get some info about all of them? The central pod only has the service URL, so it doesn't know which replica pod is on the other side.
Is there any solution to autodiscover every pod on the cluster and communicate with them without changing any configuration? Spring Cloud? Eureka? Consul?
P.S.
My architecture also has etcd and rabbitmq deployed. Maybe they can be used as part of the solution?
You can use a "headless Service", one with clusterIP: None. The result is that when you do a DNS lookup for the normal service DNS name, instead of a single A record with the magic proxy-mesh IP, you get separate A records for every pod that matches the selector and is ready (i.e. the IPs that would normally be the backends). You can also fetch these from the Kubernetes API via the Endpoints type (or EndpointSlices if you somehow need to support groups with thousands of pods, but for just 5 it would be Endpoints) using the Kubernetes Java client library. In either case, you then have a list of IPs and the rest is up to you to figure out :)
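To illustrate the DNS side, a minimal Java sketch (the service name "order-executors" and the "default" namespace are assumptions taken from the question):

import java.net.InetAddress;

public class PeerDiscovery {
    public static void main(String[] args) throws Exception {
        // With a headless Service (clusterIP: None), this resolves to one
        // A record per ready pod backing the Service.
        for (InetAddress addr :
                InetAddress.getAllByName("order-executors.default.svc.cluster.local")) {
            System.out.println(addr.getHostAddress());
        }
    }
}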
I'm not familiar with Java, but the concept is something I've done before. There are a few approaches you can take. One of them is using Kubernetes events.
Your application should listen to Kubernetes events (using a websocket). Whenever a pod with a specific label or label set is created, removed, enters a terminating state, etc., you will get updates about its state, including the pod IPs.
You can then do whatever you like in your application without having to contact the Kubernetes API yourself.
You can even use a sidecar pod which listens to those events and writes them to a file. Using a shared volume, your application can read that file and use its contents.
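As a sketch of the event-listening idea in Java (using the fabric8 kubernetes-client here as one option, which is an assumption; the label and namespace are illustrative):

import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.Watcher;
import io.fabric8.kubernetes.client.WatcherException;

public class PodWatch {
    public static void main(String[] args) {
        KubernetesClient client = new KubernetesClientBuilder().build();
        // The watch delivers ADDED / MODIFIED / DELETED events with pod IPs.
        client.pods().inNamespace("default").withLabel("app", "order-executor")
              .watch(new Watcher<Pod>() {
                  @Override
                  public void eventReceived(Action action, Pod pod) {
                      System.out.println(action + " " + pod.getMetadata().getName()
                              + " ip=" + pod.getStatus().getPodIP());
                  }

                  @Override
                  public void onClose(WatcherException cause) {
                      // reconnect logic would go here
                  }
              });
    }
}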

Publishing Spring Batch metrics using Micrometer

I have an app that contains two dozen Spring Batch cron jobs. There is no REST controller, as it is an analytics app: it runs daily, reads data from a db, processes it, and then stores aggregated data in another db. I want to have Spring's built-in metrics on the jobs using Micrometer and push them to Prometheus. As my app is not a webserver app, will Micrometer still publish results on HOST:8080? Will Actuator automatically start a new server on HOST:8080, or do we need an application server running on 8080?
My understanding is that the actuator and the application server can run on different ports, as these are different processes? Whether an application server is present or not, the actuator should be able to either use the same port as the application server or a different one?
So if my application is not a webserver-based app, can I still access metrics at localhost:8080/actuator/ and publish them to Prometheus?
Prometheus is a pull-based system: you give it a URL from your running application and it will pull metrics from it. If your application is an ephemeral batch application, it does not make sense to turn it into a webapp for the sole sake of exposing a URL for a short period of time. That's exactly why the Prometheus folks created the Push gateway, see When to use the Push Gateway.
Now with this in mind, in order for your batch applications to send metrics to Prometheus, you need:
A Prometheus server
A Pushgateway server
An optional metrics dashboard (Grafana or similar; Prometheus also provides a built-in UI)
Your batch applications configured to push metrics to the gateway
A complete example of this setup can be found in the Batch metrics with Micrometer sample. That example is actually similar to your use case: it shows two jobs scheduled to run every few seconds that store metrics in Micrometer's main registry, and a background task that regularly pushes metrics from Micrometer's registry to Prometheus's gateway.
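As a rough sketch of the push step itself (assuming Micrometer's Prometheus registry, the Prometheus simpleclient Pushgateway client, and a gateway on localhost:9091; the metric and job names are illustrative):

import io.micrometer.core.instrument.Counter;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import io.prometheus.client.exporter.PushGateway;

public class BatchMetricsPush {
    public static void main(String[] args) throws Exception {
        PrometheusMeterRegistry registry =
                new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

        // Metrics recorded during the batch run accumulate in the registry.
        Counter processed = registry.counter("records.processed");
        processed.increment(42);

        // Push the accumulated metrics when the job finishes (the linked
        // sample instead pushes periodically from a background task).
        new PushGateway("localhost:9091")
                .pushAdd(registry.getPrometheusRegistry(), "analytics-batch-job");
    }
}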
Another option is to use the RSocket protocol, which is provided for free if you use Spring Cloud Dataflow.
For Spring Boot, there are no Actuator endpoints for Spring Batch; please refer to Actuator endpoint for Spring Batch for more details about the reasons behind this decision.
@Mahmoud I think there are valid use cases for exposing the health endpoints optionally. The first question to consider is: when we say a batch operation runs for a short time, how short is that time - a few minutes? I agree there's no need then; but what about jobs that run for a few hours? For some jobs it's important that we get metrics, especially when a job is bound by a business SLA and the operator needs to know whether it is processing at the required operations per second, has the right connection pool size, etc.
There is also a variety of implementation details of the running platform: we can use Spring Batch without SCDF, not be in control of the Prometheus gateway and so be unable to use push, run in a cloud where Istio will pull the metrics automatically, etc.
As for the OP's question: in general one can run a Spring Batch job in a web instance; as far as I have used Spring Batch with a web instance, the application does shut down after job completion.

Is there any way to get all pods in a cluster without RBAC?

I looked at the https://github.com/kubernetes-client/java library, but it requires RBAC enabled in the cluster. Is there any other way to retrieve pods in Kubernetes programmatically?
As per the Kubernetes Java Client library, you can find there:
InClusterClient example (configure a client while running inside the Kubernetes cluster).
KubeConfigFileClient example (configure a client to access a Kubernetes cluster from outside).
The first example, from inside the cluster, uses the service account applied to the pod.
The second example, from outside the cluster, uses a kubeconfig file.
In the official docs you can find a Java example of Accessing the Kubernetes API Using the Java Client; it uses the kubeconfig file stored by default in $HOME/.kube/config. In addition, you can find there other examples of how to programmatically access the Kubernetes API, along with the lists of officially-supported and community-maintained Kubernetes client libraries.
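A brief sketch of the two configuration modes with the official client (package names can vary slightly between client versions):

import java.io.FileReader;
import io.kubernetes.client.openapi.ApiClient;
import io.kubernetes.client.util.ClientBuilder;
import io.kubernetes.client.util.KubeConfig;

public class ClientSetup {
    public static void main(String[] args) throws Exception {
        // Inside the cluster: authenticates with the pod's service account.
        ApiClient inCluster = ClientBuilder.cluster().build();

        // Outside the cluster: reads the kubeconfig file.
        String path = System.getProperty("user.home") + "/.kube/config";
        ApiClient fromKubeConfig = ClientBuilder
                .kubeconfig(KubeConfig.loadKubeConfig(new FileReader(path)))
                .build();
    }
}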
Please refer also to the Authorization Modes
Kubernetes RBAC allows admins to configure and control access to Kubernetes resources as well as the operations that can be performed on those resources.
RBAC can be enabled by starting the API server with --authorization-mode=RBAC
Kubernetes includes a built-in role-based access control (RBAC) mechanism that allows you to configure fine-grained and specific sets of permissions that define how a given GCP user, or group of users, can interact with any Kubernetes object in your cluster, or in a specific Namespace of your cluster.
Additional resources:
Using RBAC Authorization
Accessing Clusters
Configure Service Accounts for Pods
Authorization Modes
Kubernetes in Production
Hope this helps.

Java health monitoring in clustered environment

I am working on a back-end service which runs in a clustered environment (three instances running in parallel to distribute a calculation job). I am using Hazelcast for creating the cluster and distributing jobs.
I want to create a REST endpoint to do some health checks of the service. As this service runs in clustered mode, I need to run the health check on all instances.
How would I achieve this kind of health check across the cluster?
Is there any recommended library for this?
One approach is to “push” health indicators to a db (all instances need to know or “discover” the db).
Another approach is to use Consul (or similar solutions) to register services with health checks.
Consul has a few Java clients you can choose from.
The Java platform has the JMX feature: you implement JMX beans for your services which expose application metrics. Then you can use one of the existing solutions to monitor JMX metrics (Zabbix, Grafana, ELK, etc.) or implement your own service which polls or consumes JMX data from each instance in your cluster and provides access to this data via a REST API.
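A minimal sketch of such a bean (names are illustrative; the health check body is a placeholder):

import java.lang.management.ManagementFactory;
import javax.management.ObjectName;

public class HealthJmx {

    public interface HealthMBean {
        boolean isHealthy();
    }

    public static class Health implements HealthMBean {
        @Override
        public boolean isHealthy() {
            return true; // replace with a real check (db ping, queue depth, ...)
        }
    }

    public static void main(String[] args) throws Exception {
        // Register with the platform MBean server so JMX-aware tools
        // (Zabbix, jconsole, JMX exporters for Grafana/ELK) can read it.
        ManagementFactory.getPlatformMBeanServer().registerMBean(
                new Health(), new ObjectName("com.example:type=Health"));
        Thread.currentThread().join(); // keep the JVM alive for the demo
    }
}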

Custom metrics using the CloudWatch agent with the StatsD protocol

I have a web application running on an EC2 instance. It has different API endpoints, and I want to count the number of times each API is called. The web application is in Java.
Can anyone point me to some articles with a proper Java implementation for integrating statsD with CloudWatch?
Refer to their docs page, https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-custom-metrics-statsd.html, which covers publishing the metrics; for the client side, see https://github.com/etsy/statsd/wiki#client-implementations.
Usually I follow a simpler approach without statsd: log the events to a file and sync the file to CloudWatch. In CloudWatch you can configure metric filters and, based on those filters, increment custom metrics.
Install CloudWatch Agent on your EC2 instance
Locate and open the CW Agent config file
Add a statsd section to the config file (JSON format):
{
  ....,
  "statsd": {
    "metrics_aggregation_interval": 60,
    "metrics_collection_interval": 10,
    "service_address": ":8125"
  }
}
The AWS CloudWatch agent is smart enough to understand custom tags, which helps you correctly split statistics gathered from different API methods ("correctly" here means splitting API method stats by dimension name, not by metric name). So you need a Java client library that supports tags, for example the DataDog client.
Configure the client instance as explained in the package documentation, and that's it. Now you can do something like this at the beginning of each of your REST API operations:
statsd.incrementCounter("InvocationCount", "host:YOUR-EC2-INSTANCE-NAME", "operation:YOUR-REST-OPERATION-NAME");
CloudWatch will handle everything else automatically. You will be able to see your metrics data flowing into the AWS CloudWatch Console under the "CWAgent" namespace. Be aware that the average delay between a statsd client call and the data becoming visible in the CW Console is about 10-15 minutes.
Manually writing statsd calls in each REST API operation may not be a good idea; decorators can help you instrument it automatically with just a few lines of code.
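For completeness, a minimal client-setup sketch using DataDog's java-dogstatsd-client (an assumption; any tag-capable statsd client works), pointing at the agent's statsd port configured above:

import com.timgroup.statsd.NonBlockingStatsDClient;
import com.timgroup.statsd.StatsDClient;

public class ApiMetrics {
    // "myapp" is an illustrative prefix; 8125 matches the agent config above.
    private static final StatsDClient statsd =
            new NonBlockingStatsDClient("myapp", "localhost", 8125);

    public static void countInvocation(String operation) {
        statsd.incrementCounter("InvocationCount",
                "host:YOUR-EC2-INSTANCE-NAME",
                "operation:" + operation);
    }
}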
