I looked at the https://github.com/kubernetes-client/java library, but it requires RBAC enabled in the cluster. Is there any other way to retrieve pods in Kubernetes programmatically?
In the Kubernetes Java client library you can find:
InClusterClient Example (configure a client while running inside the Kubernetes cluster).
KubeConfigFileClient Example (configure a client to access a Kubernetes cluster from outside).
The first example, run from inside the cluster, uses the service account applied to the pod.
The second example, run from outside the cluster, uses a kubeconfig file.
In the official docs you can find a Java example of accessing the Kubernetes API using the Java client; it uses a kubeconfig file, stored by default in $HOME/.kube/config. In addition, you can find there other examples of how to programmatically access the Kubernetes API, together with the lists of officially-supported Kubernetes client libraries and community-maintained client libraries.
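To give a concrete flavor, here is a minimal sketch of listing pods with the official Java client. The exact parameter list of listPodForAllNamespaces varies between client versions, so treat this as illustrative rather than definitive:

import io.kubernetes.client.openapi.ApiClient;
import io.kubernetes.client.openapi.Configuration;
import io.kubernetes.client.openapi.apis.CoreV1Api;
import io.kubernetes.client.openapi.models.V1Pod;
import io.kubernetes.client.openapi.models.V1PodList;
import io.kubernetes.client.util.Config;

public class ListPods {
    public static void main(String[] args) throws Exception {
        // Config.defaultClient() uses $HOME/.kube/config outside the cluster
        // and the pod's service account when running in-cluster.
        ApiClient client = Config.defaultClient();
        Configuration.setDefaultApiClient(client);

        CoreV1Api api = new CoreV1Api();
        // Signature shown for recent client versions; older versions take fewer arguments.
        V1PodList pods = api.listPodForAllNamespaces(
                null, null, null, null, null, null, null, null, null, null);
        for (V1Pod pod : pods.getItems()) {
            System.out.println(pod.getMetadata().getNamespace() + "/" + pod.getMetadata().getName());
        }
    }
}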
Please also refer to the Authorization Modes documentation.
Kubernetes RBAC allows admins to configure and control access to Kubernetes resources as well as the operations that can be performed on those resources.
RBAC can be enabled by starting the API server with --authorization-mode=RBAC
Kubernetes includes a built-in role-based access control (RBAC) mechanism that allows you to configure fine-grained and specific sets of permissions that define how a given user, or group of users, can interact with any Kubernetes object in your cluster, or in a specific namespace of your cluster.
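If RBAC is enabled, granting your client permission to list pods is a small amount of YAML. A minimal sketch (the service account name pod-lister is hypothetical):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: ServiceAccount
  name: pod-lister        # hypothetical service account used by the client pod
  namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io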
Additional resources:
Using RBAC Authorization
Accessing Clusters
Configure Service Accounts for Pods
Authorization Modes
Kubernetes in Production
Hope this helps.
Let's say I have the following architecture based on Java Spring Boot + Kubernetes:
N pods with a similar purpose (let's say: order-executors) - a GROUP of pods
other pods with other business implementations
I want to create a solution where:
One (central) pod can communicate with all/a specific GROUP of pods to get some information about the state of these pods (REST or any other way)
Some pods can of course be replicated, e.g. x5, and the central pod should communicate with all replicas
Is this possible with any technology? If every order-executor pod has a k8s Service, is there a way to communicate with all replicas of this pod to get some info about all of them? The central pod has only the Service URL, so it doesn't know which replica pod is on the other side.
Is there any solution to autodiscover every pod in the cluster and communicate with them without changing any configuration? Spring Cloud? Eureka? Consul?
P.S.
In my architecture, etcd and RabbitMQ are also deployed. Maybe they can be used as part of the solution?
You can use a "headless Service", one with clusterIP: None. The result is that when you do a DNS lookup for the normal Service DNS name, instead of a single A record with the magic proxy mesh IP, you'll get separate A records for every pod that matches the selector and is ready (i.e. the IPs that would normally be the backends). You can also fetch these from the Kubernetes API via the Endpoints type (or EndpointSlices if you somehow need to support groups with thousands of pods, but for just 5 Endpoints is fine) using the Kubernetes Java client library. In either case, you then have a list of IPs and the rest is up to you to figure out :)
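A minimal sketch of such a headless Service (the name and label follow the order-executor example above and are otherwise hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: order-executor
spec:
  clusterIP: None            # headless: DNS returns one A record per ready pod
  selector:
    app: order-executor
  ports:
  - port: 8080

And the plain-Java lookup that then returns one address per ready replica (namespace "default" assumed):

import java.net.InetAddress;

public class ReplicaLookup {
    public static void main(String[] args) throws Exception {
        // One InetAddress per ready pod behind the headless Service.
        for (InetAddress ip : InetAddress.getAllByName(
                "order-executor.default.svc.cluster.local")) {
            System.out.println(ip.getHostAddress());
        }
    }
}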
I'm not familiar with Java, but the concept is something I've done before. There are a few approaches you can take. One of them is using Kubernetes events.
Your application should listen to Kubernetes events (using a websocket). Whenever a pod with a specific label or label set is created, removed, enters a terminating state, etc., you will get updates about its state, including the pod IPs.
You can then do whatever you like in your application without having to contact the Kubernetes API yourself each time.
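With the official Java client this is the Watch API. A rough sketch (the listNamespacedPodCall parameter list varies between client versions, and the label selector app=order-executor is hypothetical):

import com.google.gson.reflect.TypeToken;
import io.kubernetes.client.openapi.ApiClient;
import io.kubernetes.client.openapi.apis.CoreV1Api;
import io.kubernetes.client.openapi.models.V1Pod;
import io.kubernetes.client.util.Config;
import io.kubernetes.client.util.Watch;

public class PodWatcher {
    public static void main(String[] args) throws Exception {
        ApiClient client = Config.defaultClient();
        client.setReadTimeout(0); // watches are long-lived; disable the read timeout
        CoreV1Api api = new CoreV1Api(client);

        // Parameter order shown for recent client versions; treat as illustrative.
        Watch<V1Pod> watch = Watch.createWatch(
                client,
                api.listNamespacedPodCall("default", null, null, null, null,
                        "app=order-executor", null, null, null, null, Boolean.TRUE, null),
                new TypeToken<Watch.Response<V1Pod>>() {}.getType());

        for (Watch.Response<V1Pod> event : watch) {
            // event.type is ADDED, MODIFIED or DELETED; the pod IP is in the status.
            System.out.println(event.type + " " + event.object.getMetadata().getName()
                    + " " + (event.object.getStatus() == null
                            ? "?" : event.object.getStatus().getPodIP()));
        }
    }
}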
You can even use a sidecar container which listens to those events and writes them to a file. By using a shared volume, your application can read that file and use its content.
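For the sidecar variant, a sketch of the pod layout with a shared emptyDir volume (all image names and the file path are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: app-with-watcher
spec:
  volumes:
  - name: shared
    emptyDir: {}
  containers:
  - name: app
    image: my-app:latest            # hypothetical application image
    volumeMounts:
    - name: shared
      mountPath: /shared            # the app reads /shared/pods.txt
  - name: event-watcher
    image: my-pod-watcher:latest    # hypothetical sidecar writing pod IPs to /shared/pods.txt
    volumeMounts:
    - name: shared
      mountPath: /shared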
Initially we had a single-node application, and we used Prometheus with the metrics path URL pointing to it like this:
- job_name: 'spring-actuator'
  metrics_path: '/prometheus'
  scrape_interval: 5s
Now we have switched to a cloud application, and if we point at the load balancer path it will hit a different node each time, so we will see some kind of mess. Is there a way to aggregate metrics from the cluster using Prometheus?
You should use Prometheus to gather metrics from the individual backends and then either aggregate in the query or pre-aggregate the data (using Prometheus recording rules).
Prometheus has a number of service discovery mechanisms built in, and they can be used to automatically find and scrape all endpoints your app runs on.
For a taste of what the configuration can look like, see for example https://github.com/prometheus/prometheus/blob/release-2.15/config/testdata/conf.good.yml#L199
Depending on which cloud service you use, you'll be using a different _sd_config directive. All available ones are described in the documentation: https://prometheus.io/docs/prometheus/latest/configuration/configuration/
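For Kubernetes specifically, a sketch adapting the scrape config above to kubernetes_sd_configs (the app label used in the keep rule is hypothetical):

scrape_configs:
- job_name: 'spring-actuator'
  metrics_path: '/prometheus'
  scrape_interval: 5s
  kubernetes_sd_configs:
  - role: pod                 # discover every pod in the cluster
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app]
    regex: spring-app         # hypothetical pod label; keep only our app's pods
    action: keep

Queries can then aggregate across all discovered replicas, e.g. sum by (job) of the relevant metric.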
What server-side technologies would I need to learn to develop a cloud-based storage system for users of my service?
Currently I am using Java Spring and Hibernate and have developed a login system. I am wondering how I would be able to store users' files on my server separately for each user and allow access to files accordingly.
It seems that you are looking for a document-oriented database: https://en.m.wikipedia.org/wiki/Document-oriented_database
In case you're not allowed to use a fully managed service such as S3, there are options like MongoDB cluster: https://docs.aws.amazon.com/quickstart/latest/mongodb/architecture.html
"The following AWS components are deployed and configured as part of this reference deployment:
A VPC configured with public and private subnets across three Availability Zones.*
In the public subnets, NAT gateways to allow outbound internet connectivity for resources (MongoDB instances) in the private subnets. (For more information, see the Amazon VPC Quick Start.)*
In the public subnets, bastion hosts in an Auto Scaling group with Elastic IP addresses to allow inbound Secure Shell (SSH) access. One bastion host is deployed by default, but this number is configurable. (For more information, see the Linux bastion host Quick Start.)*
An AWS Identity and Access Management (IAM) instance role with fine-grained permissions for access to AWS services necessary for the deployment process.
Security groups to enable communication within the VPC and to restrict access to only necessary protocols and ports.
In the private subnets, a customizable MongoDB cluster with the option of running standalone or in replica sets, along with customizable Amazon EBS storage. The Quick Start launches each member of the replica set in a different Availability Zone. However, if you choose an AWS Region that doesn’t provide three or more Availability Zones, the Quick Start reuses one of the zones to create the third subnet.
You can choose to launch the Quick Start for a new VPC or use your existing VPC. The template that deploys the Quick Start into an existing VPC skips the creation of components marked by asterisks and prompts you for your existing configuration."
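If a managed object store such as S3 is allowed, per-user separation is often done with key prefixes rather than a database. A minimal sketch with the AWS SDK for Java v2 (the bucket name is hypothetical, and real code would also verify ownership before serving downloads):

import java.nio.file.Path;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class UserFileStore {
    private static final String BUCKET = "my-user-files"; // hypothetical bucket name
    private final S3Client s3 = S3Client.create();

    // Each user's files live under their own key prefix, e.g. "42/report.pdf",
    // so authorization is a simple prefix check against the logged-in user id.
    public void upload(long userId, String fileName, Path file) {
        PutObjectRequest request = PutObjectRequest.builder()
                .bucket(BUCKET)
                .key(userId + "/" + fileName)
                .build();
        s3.putObject(request, RequestBody.fromFile(file));
    }
}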
My question relates to using either Zookeeper or HashiCorp's Vault as a back-end data store for Spring Cloud Config Server.
We're currently running a number of Spring Boot micro-services that rely on a Spring Config Server to serve each service's configuration. This works well and we have no issues with it.
Initially, config server ran on the native profile and had the config files embedded in the application. This doesn't work well, as each time we make a configuration change to any of the applications we need to redeploy config-server.
Using Git is obviously more robust, and we were in the process of switching to a standalone Git backend when we were asked to look into using Zookeeper or Vault instead.
Which brings me the question:- is it at all possible to use Vault/Zookeeper as the back-end data store for Config Server without needing each application to talk to Vault/Zookeeper directly?
Thanks
Yes, it's possible to use a different backend (like Vault or SVN; in Spring Cloud Config these are called EnvironmentRepository implementations) without touching your clients.
See the reference docs for more details.
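For illustration, switching the server to a Vault backend is a change to the Config Server's own configuration only; a sketch of its application.yml (host and port are hypothetical):

spring:
  profiles:
    active: vault              # select the Vault EnvironmentRepository
  cloud:
    config:
      server:
        vault:
          host: vault.example.com   # hypothetical Vault host
          port: 8200
          scheme: https
          kvVersion: 2              # KV secrets engine version

Client applications keep pointing at the Config Server URL exactly as before.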
To update this:
We switched out the Zookeeper backend for Consul, as we were able to use SSL for the connection between Vault and Consul. This isn't currently available when using Zookeeper as the storage backend.
We now have a working configuration stack comprising Consul, Vault and Spring Cloud Config Server, with SSL enabled between all three. Additionally, Consul and Vault both run in clustered mode with replication between all nodes in the cluster.
Working well thus far.
We have developed a few applications (written in Java) that consume the services of Kafka, Spark, YARN and HDFS. Currently our cluster is not security-enabled, and we are planning to enable security shortly. These applications are run using spark-submit (through YARN). Once these services are Kerberized, I would like to know if we can use JAAS-based single sign-on (SSO) to access them. Our plan is to create a common Java package that will handle this authentication and authorization for all the services. Is that possible? An example would really help. I have already looked at examples from the Oracle site and other blogs, but they always point to LDAP authorization, nothing specific to Hadoop.
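For reference, the core of what such a common package would wrap is a single keytab-based login that the Hadoop-side clients then reuse. A minimal sketch using Hadoop's UserGroupInformation (principal and keytab path are hypothetical; Kafka additionally needs a JAAS KafkaClient section pointing at the same keytab):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLogin {
    // Log in once from a keytab; subsequent HDFS/YARN client calls in this JVM
    // reuse the same Kerberos credentials.
    public static void login() throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(
                "app@EXAMPLE.COM",                     // hypothetical principal
                "/etc/security/keytabs/app.keytab");   // hypothetical keytab path
    }
}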