I have a Lagom service which writes events for persistent entities to Cassandra.
I would like to process those events from another application (not Lagom-based) by embedding a read-side processor that connects to the same Cassandra.
I did not find any documentation on how to embed Lagom services or read-side processors into existing Java/Scala applications. Is it possible?
It isn't recommended to have multiple services sharing the same database. This causes tight coupling between the services and can make it hard to upgrade in the future.
Instead, you can use the Lagom Message Broker Topic Producer API to publish events to a Kafka topic. Then, you can consume these from another application, either using Lagom with the Message Broker Consumer API, or using any other Kafka client.
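For reference, publishing entity events with the Lagom (Java DSL) Topic Producer API looks roughly like the sketch below. The service, event and message names (OrderService, OrderEvent, OrderMessage) are placeholders, and it assumes a single event tag.

import akka.japi.Pair;
import com.lightbend.lagom.javadsl.api.broker.Topic;
import com.lightbend.lagom.javadsl.broker.TopicProducer;
import com.lightbend.lagom.javadsl.persistence.PersistentEntityRegistry;

public class OrderServiceImpl implements OrderService {

    private final PersistentEntityRegistry registry;

    public OrderServiceImpl(PersistentEntityRegistry registry) {
        this.registry = registry;
    }

    @Override
    public Topic<OrderMessage> orderEvents() {
        // Stream the persisted events for the (hypothetical) OrderEvent.TAG and
        // convert each internal event into the public message type for the topic.
        return TopicProducer.singleStreamWithOffset(offset ->
                registry.eventStream(OrderEvent.TAG, offset)
                        .map(pair -> Pair.create(toMessage(pair.first()), pair.second())));
    }

    private OrderMessage toMessage(OrderEvent event) {
        // mapping from the internal event to the public API message (placeholder)
        return new OrderMessage(event.toString());
    }
}

The service descriptor would then expose this with something like .withTopics(topic("order-events", this::orderEvents)), and any Kafka client, Lagom-based or not, can consume that topic.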
I found these 3 ways of implementing messaging with Google Pub/Sub:
1. with client libraries
https://cloud.google.com/pubsub/docs/publisher
2. with Spring Integration message channels and the PubSubTemplate API
https://dzone.com/articles/spring-boot-and-gcp-cloud-pubsub
3. without message channels but with the PubSubTemplate API
https://medium.com/bb-tutorials-and-thoughts/gcp-how-to-subscribe-and-send-pubsub-messages-in-spring-boot-app-b27e2e8863e3
I want to understand the differences between them, when each is best to use, and which would be useful for my case.
I have to implement a single topic and a single subscription to get the queue functionality. I'd rather not use Spring message channels if they're not needed; they seem to intermediate the communication between the Pub/Sub topic and the subscription, and I don't want that. I want to keep things simple, so I think option 3 would be best, but I am also wondering about option 1.
Option 1, client libraries, is universal. You don't need Spring to run it, and you can also use this library in Groovy or in Kotlin.
Option 2 is deeply integrated with Spring. It's quite invisible, but if you have something special to do, it's tricky to override this implementation.
Option 3 is a light Spring integration. The PubSubTemplate (in fact the client) is loaded automatically for you at startup, like any other bean, and you can use it easily in your code. It's my preferred option when I use Spring.
Google Cloud Pub/Sub using client libraries:
Using Google Cloud Pub/Sub with the client libraries is one of the standard and easiest ways to implement Cloud Pub/Sub.
A producer publishes messages to a Pub/Sub topic; a subscriber client then creates a subscription to that topic and consumes the messages.
You need to install the client libraries. You can follow this setup and tutorial for further information.
Here you won't require Spring integration; you can use the client library directly to publish messages and pull them from the subscription.
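A minimal publish sketch with the Java client library (google-cloud-pubsub) might look like this; the project and topic names are placeholders, and a Subscriber would be set up similarly for pulling from the subscription:

import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;

public class PublishExample {
    public static void main(String[] args) throws Exception {
        // "my-project" and "my-topic" are placeholders
        TopicName topicName = TopicName.of("my-project", "my-topic");
        Publisher publisher = Publisher.newBuilder(topicName).build();
        try {
            PubsubMessage message = PubsubMessage.newBuilder()
                    .setData(ByteString.copyFromUtf8("hello"))
                    .build();
            // publish() is asynchronous and returns a future with the message id
            String messageId = publisher.publish(message).get();
            System.out.println("Published message " + messageId);
        } finally {
            publisher.shutdown();
        }
    }
}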
Spring Integration using Spring channels:
This use case involves a deeper integration of a Spring Boot application with Google Cloud Pub/Sub, using Spring Integration to send and receive Pub/Sub messages, i.e. Pub/Sub acts as the intermediate messaging system.
Here the Spring application sends messages to a Cloud Pub/Sub topic through Spring channels, and the application in turn receives messages from Pub/Sub through these channels.
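As a rough sketch of what that looks like with spring-cloud-gcp (exact package names vary by version), assuming a hypothetical my-topic topic and my-subscription subscription:

import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.cloud.gcp.pubsub.core.PubSubTemplate;
import org.springframework.cloud.gcp.pubsub.integration.inbound.PubSubInboundChannelAdapter;
import org.springframework.cloud.gcp.pubsub.integration.outbound.PubSubMessageHandler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.MessageHandler;

@Configuration
public class PubSubChannelConfig {

    // Channel that inbound Pub/Sub messages are placed on
    @Bean
    public MessageChannel pubsubInputChannel() {
        return new DirectChannel();
    }

    // Inbound adapter: pulls from the subscription and forwards to the channel
    @Bean
    public PubSubInboundChannelAdapter inboundAdapter(
            @Qualifier("pubsubInputChannel") MessageChannel channel,
            PubSubTemplate pubSubTemplate) {
        PubSubInboundChannelAdapter adapter =
                new PubSubInboundChannelAdapter(pubSubTemplate, "my-subscription");
        adapter.setOutputChannel(channel);
        return adapter;
    }

    // Outbound adapter: anything sent to "pubsubOutputChannel" is published to the topic
    @Bean
    @ServiceActivator(inputChannel = "pubsubOutputChannel")
    public MessageHandler outboundAdapter(PubSubTemplate pubSubTemplate) {
        return new PubSubMessageHandler(pubSubTemplate, "my-topic");
    }
}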
Pub/Sub messages in a Spring Boot app:
This use case is a simple and valid example of integrating Cloud Pub/Sub with a Spring Boot application.
It demonstrates how to subscribe to a subscription and send messages to topics from a Spring Boot application.
A message is published to the topic, queued in the respective subscription, and then received by the subscribing Spring Boot application.
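A sketch of this PubSubTemplate approach (option 3); the topic/subscription names are placeholders and the PubSubTemplate package depends on the spring-cloud-gcp version:

import org.springframework.cloud.gcp.pubsub.core.PubSubTemplate;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;

@Component
public class QueueClient {

    private final PubSubTemplate pubSubTemplate;

    public QueueClient(PubSubTemplate pubSubTemplate) {
        this.pubSubTemplate = pubSubTemplate;
    }

    public void send(String payload) {
        // publishes the payload to the topic (asynchronously, returns a future with the message id)
        pubSubTemplate.publish("my-topic", payload);
    }

    @PostConstruct
    public void listen() {
        // pulls messages from the subscription and acks them after processing
        pubSubTemplate.subscribe("my-subscription", message -> {
            System.out.println("Received: " + message.getPubsubMessage().getData().toStringUtf8());
            message.ack();
        });
    }
}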
I am using the Spring Kafka implementation and I need to start and stop my Kafka consumer through a REST API. For that I am using the KafkaListenerEndpointRegistry (endpointRegistry):
endpointRegistry.getListenerContainer("consumer1").stop();
endpointRegistry.getListenerContainer("consumer1").start();
We are deploying the microservice on Kubernetes pods, so there may be multiple replicas of the same microservice. How could I manage to start and stop the consumer on all the containers?
Kubernetes offers nothing to automatically broadcast an HTTP request to all pods of a service, so you have to do it yourself.
Broadcasting over Kafka
You can publish the start/stop command from the single instance that receives the HTTP request to a topic dedicated to broadcasting commands between all instances.
Of course, you must make sure that each instance can read all messages on that topic, so you need to prevent the partitions from being balanced between these instances. You can achieve that by setting a unique group id (e.g. by suffixing your normal groupId with a UUID) on the consumer for that topic.
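A sketch of that idea, assuming a dedicated consumer-control topic and Spring Boot's ${random.uuid} placeholder to get a unique group id per instance (the listener id "consumer1" is the one from the question):

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.listener.MessageListenerContainer;
import org.springframework.stereotype.Component;

@Component
public class ConsumerControlListener {

    private final KafkaListenerEndpointRegistry endpointRegistry;

    public ConsumerControlListener(KafkaListenerEndpointRegistry endpointRegistry) {
        this.endpointRegistry = endpointRegistry;
    }

    // Unique group id per instance, so every replica receives every command
    @KafkaListener(topics = "consumer-control", groupId = "consumer-control-${random.uuid}")
    public void onCommand(String command) {
        MessageListenerContainer container = endpointRegistry.getListenerContainer("consumer1");
        if ("start".equalsIgnoreCase(command)) {
            container.start();
        } else if ("stop".equalsIgnoreCase(command)) {
            container.stop();
        }
    }
}

The instance that handles the REST request then simply does kafkaTemplate.send("consumer-control", "stop") instead of stopping its own container directly.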
Broadcasting over HTTP
Kubernetes knows which pods are listening on which endpoints, and you can get that information in your service. Spring Cloud Kubernetes (https://cloud.spring.io/spring-cloud-static/spring-cloud-kubernetes/2.0.0.M1/reference/html/#ribbon-discovery-in-kubernetes) makes it easy to get at that information; there are probably lots of different ways to do it, but with Spring Cloud Kubernetes it would go something like this:
Receive the command on the randomly selected pod, get the ServerList from Ribbon (it contains all the instances and the ip address/port where they can be reached) for your service, and send a new HTTP request to each of them.
I would prefer the Kafka approach for its robustness; the HTTP approach might be easier to implement if you're already using Spring Cloud.
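A sketch of that HTTP variant, using the plain Spring Cloud DiscoveryClient rather than the Ribbon ServerList (either works; the service name "my-consumer-service" and the /consumers/... paths are made up here):

import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.client.discovery.DiscoveryClient;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class BroadcastController {

    private final DiscoveryClient discoveryClient;
    private final RestTemplate restTemplate = new RestTemplate();

    public BroadcastController(DiscoveryClient discoveryClient) {
        this.discoveryClient = discoveryClient;
    }

    @PostMapping("/consumers/{id}/{action}/broadcast")
    public void broadcast(@PathVariable String id, @PathVariable String action) {
        // one ServiceInstance per pod backing the service
        for (ServiceInstance instance : discoveryClient.getInstances("my-consumer-service")) {
            restTemplate.postForLocation(instance.getUri() + "/consumers/" + id + "/" + action, null);
        }
    }
}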
My current understanding is that both of these projects are under Spring Cloud Data Flow and serve as components of the pipeline. However, both can be made recurring (a stream is by definition recurring, whereas a task can be run at regular intervals). In addition, both can be configured to communicate with the rest of the pipeline through the message broker. Currently there is this unanswered question, so I've yet to find a clear answer.
Please see my responses below:
My current understanding is that both of these projects are under Spring Cloud Data Flow and serve as components of the pipeline.
Neither Spring Cloud Stream nor Spring Cloud Task is under Spring Cloud Data Flow; instead, they can be used as standalone projects, and Spring Cloud Data Flow just uses them.
Spring Cloud Stream lets you bind your event-driven, long-running applications to a messaging middleware or a streaming platform. As a developer, you have to choose your binder (there are binder implementations for RabbitMQ, Apache Kafka, etc.) to stream your events or data from/to the messaging middleware you bind to.
Spring Cloud Task doesn't bind your application to a messaging middleware. Instead, it provides abstractions and lifecycle management to run your ephemeral or finite-duration applications (tasks). It also provides the foundation for developing Spring Batch applications.
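As an illustration, a minimal Spring Cloud Stream application in the functional style could look like the sketch below; the binder is chosen by the dependency you add (Kafka or RabbitMQ), and the destination names are assumptions:

import java.util.function.Function;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class StreamApplication {

    public static void main(String[] args) {
        SpringApplication.run(StreamApplication.class, args);
    }

    // Bound automatically to the uppercase-in-0 / uppercase-out-0 destinations on the binder
    @Bean
    public Function<String, String> uppercase() {
        return String::toUpperCase;
    }
}

The bindings are then mapped to real destinations in configuration, e.g. spring.cloud.stream.bindings.uppercase-in-0.destination=input-topic.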
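By contrast, a minimal Spring Cloud Task sketch is just a finite piece of work whose execution gets recorded by the task repository:

import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;
import org.springframework.context.annotation.Bean;

@EnableTask
@SpringBootApplication
public class TaskApplication {

    public static void main(String[] args) {
        SpringApplication.run(TaskApplication.class, args);
    }

    @Bean
    public CommandLineRunner work() {
        // runs once, then the application exits; the run is recorded in the task repository
        return args -> System.out.println("Doing the finite piece of work");
    }
}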
However, both can be made recurring (a stream is by definition recurring, whereas a task can be run at regular intervals).
A task application can be triggered/scheduled to make it recurring, whereas a streaming application is long-running, not recurring.
In addition, both can be configured to communicate with the rest of the pipeline through the message broker.
Though a task application can be configured to communicate with a messaging middleware, the concept of a pipeline is different for streams vs. tasks (batch). For streaming applications, the pipeline refers to communication via the messaging middleware, while for task applications, the concept of composed tasks lets you create a conditional workflow of multiple task applications. For more information on composed tasks, you can refer to the documentation.
I have several apps developed using Spring Boot. Some apps call other apps, which in turn call other apps; it is getting hard to manage and scale. I need to be able to distribute them across a network and also combine the apps into different 'flows' with minimal changes to the apps.
Ideally I would like to wrap the apps and abstract them into components that have N inputs and M outputs. At boot time I would use some configuration to wire the inputs and outputs to real Kafka topics.
For instance, input A to an app can come from several Kafka topics, and output B from the same app can go to another set of Kafka topics.
I would like to be able to change the queues without having to recompile the apps, and with no extra network hops to send/receive from/to multiple queues; this should happen in the same process and be multi-threaded.
Does anybody know if something similar exists already? Can Spring Integration do this? Apache Camel? Or am I better off writing it myself?
See Spring for Apache Kafka. There is also a spring-integration-kafka extension that sits on top of spring-kafka.
The Spring for Apache Kafka (spring-kafka) project applies core Spring concepts to the development of Kafka-based messaging solutions. It provides a "template" as a high-level abstraction for sending messages. It also provides support for message-driven POJOs with @KafkaListener annotations and a "listener container". These libraries promote the use of dependency injection and declarative configuration. In all of these cases, you will see similarities to the JMS support in the Spring Framework and the RabbitMQ support in Spring AMQP.
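For illustration, a minimal spring-kafka sketch combining the template and a message-driven POJO; the topic and group id are placeholders, and Spring Boot auto-configures the KafkaTemplate and listener container factory:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class GreetingMessaging {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public GreetingMessaging(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void send(String payload) {
        // KafkaTemplate is the high-level abstraction for sending
        kafkaTemplate.send("greetings", payload);
    }

    // Message-driven POJO: the listener container invokes this for each record
    @KafkaListener(topics = "greetings", groupId = "greetings-group")
    public void listen(String payload) {
        System.out.println("Received: " + payload);
    }
}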
I am building a Spring Cloud-based microservice ML pipeline.
I have a data ingestion service that (currently) takes in data from SQL; this data needs to be used by the prediction service.
The general consensus is that writes should use async message-based communication using Kafka/RabbitMQ.
What I am not sure about is how do I orchestrate these services?
Should I use an API gateway that invokes ingestion that starts the pipeline?
Typically you would build a service with REST endpoints (Spring Boot) to ingest the data. This service can then be deployed multiple times behind an API gateway (Zuul, Spring Cloud) that takes care of routing. This is the default Spring Cloud microservices setup. The ingest service can then convert the data and produce it to RabbitMQ or Kafka. I recommend using Spring Cloud Stream for the interaction with the queue; it's an abstraction on top of RabbitMQ and Kafka that can be configured using starters/binders.
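A sketch of such an ingest endpoint using Spring Cloud Stream's StreamBridge; the binding name ingest-out-0 and the String payload are assumptions, and the bound destination and binder come from configuration:

import org.springframework.cloud.stream.function.StreamBridge;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class IngestController {

    private final StreamBridge streamBridge;

    public IngestController(StreamBridge streamBridge) {
        this.streamBridge = streamBridge;
    }

    @PostMapping("/ingest")
    public ResponseEntity<Void> ingest(@RequestBody String record) {
        // hand the record off to the broker (Kafka or RabbitMQ, depending on the binder)
        streamBridge.send("ingest-out-0", record);
        return ResponseEntity.accepted().build();
    }
}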
Spring Cloud Data Flow is a declarative approach for orchestrating your queues and also takes care of deployment on several cloud services/platforms. It could also be used here, but it might add extra complexity to your use case.