I am building a spring cloud-based microservice ML pipeline.
I have a data ingestion service that (currently) takes in data from SQL, this data needs to be used by the prediction service.
The general consensus is that writes should use async message-based communication using kafka/rabbitmq.
What I am not sure about is how do I orchestrate these services?
Should I use an API gateway that invokes ingestion that starts the pipeline?
Typically you would build a service with rest endpoints (Spring Boot) to ingest the data. This service can then be deployed multiple times behind a api gateway (Zuul, Spring Cloud) that takes care about routing. This is the default spring cloud microservices setup. The ingest service can then convert the data and produce it to a RabbitMQ or Kafka. I recommend using Spring Cloud Stream for the interaction with the queue, it's abstraction on top of RabbitMQ and Kafka, which can be configured using starters/binders.
Spring Cloud Dataflow is a declarative approach for orchestration of your queues and also takes care of deployment on several cloud services/platforms. This can also be used but might add extra complexity to your use case.
Related
I found these 3 ways for implementing messaging with Google Pub Sub:
with client libraries
https://cloud.google.com/pubsub/docs/publisher
with spring integration message channels and PubSubTemplate API
https://dzone.com/articles/spring-boot-and-gcp-cloud-pubsub
without message channels but with PubSubTemplate API
https://medium.com/bb-tutorials-and-thoughts/gcp-how-to-subscribe-and-send-pubsub-messages-in-spring-boot-app-b27e2e8863e3
I want to understand the differences between them / when each is best to use and which would be useful for my case.
I have to implement a single Topic and a single Subscription to get the queue functionality. I think I'd rather not use Spring message channels if not needed , they seem to intermediate the communication between Pub Sub topic and the subscription and I don't want that. I want things simple , so I think the option 3 would be best but I am also wondering about option 1.
Option 1, client libraries, is universal. You don't need Spring to run it, you can use this library in Groovy or in Kotlin also.
Option 2, it's deeply integrated to Spring. It's quite invisible but if you have special thing to do, it's tricky to override this implementation
Option 3, it's a light spring integration. PubSubTemplate (the client in fact) is loaded automatically for you at startup, as any bean and you can use it easily in your code. It's my preferred option when I use Spring.
Google Cloud Pub/Sub Using Client Libraries :
Using Google Cloud Pub/Sub with Client libraries is one of the standard and easiest way to implement Cloud Pub/Sub.
A producer of the data publishes messages to Pub/Sub topic, a subscriber client then creates a subscription to that topic and consumes messages.
You need to install the client libraries. You can follow this setup and tutorial for further information.
Here you won't require Spring integration, you can directly use the client library to publish messages and pull it from subscription.
Spring Integration using spring channels :
This use case involves intensive integration of Spring Boot Application with Google Cloud Pub/Sub using Spring Integration to send and receive Pub/Sub messages. ie. Pub/Sub acts as intermediate messaging system
Here The Spring Application sends messages to Cloud Pub/Sub topic utilizing spring channels and the Application further receives messages from Pub/Sub through these channels.
Pub/Sub message in Spring-Boot App :
This use case is a simple and valid example of integrating Cloud Pub/Sub with Spring boot application.
The use case demonstrates how to subscribe to a subscription and send message to topics using Spring Boot Application
Message is published to the topic, queued in the respective subscription and then received by the subscriber Spring Boot Application
I have a Lagom service which writes events for persistent entities to cassandra.
I would like to process those events from another application (not lagom-based) by embedding a read-side processor which would connect to the same cassandra.
I did not find any documentation on how to embed lagom services or read-side processors into existing java/scala applications, is it possible?
It isn't recommended to have multiple services sharing the same database. This causes tight coupling between the services and can make it hard to upgrade in the future.
Instead, you can use the Lagom Message Broker Topic Producer API to publish events to a Kafka topic. Then, you can consume these from another application, either using Lagom with the Message Broker Consumer API, or using any other Kafka client.
My current understanding is that both of these projects are under Spring Cloud Dataflow, and serve as components of the pipeline. However, both can be made recurring (a stream is by definition recurring, where a task can run every certain time interval). In addition, both can be configured to communicate with the rest of the pipeline through the message broker. Currently there is this unanswered question, so I've yet to find a clear answer.
Please see my response as below:
My current understanding is that both of these projects are under Spring Cloud Dataflow, and serve as components of the pipeline.
Both Spring Cloud Stream and Spring Cloud Task are not under Spring Cloud Data Flow, instead, they can be used as standalone projects and Spring Cloud Data Flow just uses them.
Spring Cloud Stream lets you bind your event-driven long-running applications into a messaging middleware or a streaming platform. As a developer, you have to choose your binder (the binder implementations for RabbitMQ, Apache Kafka etc.,) to stream your events or data from/to the messaging middleware you bind to.
Spring Cloud Task doesn't bind your application into a messaging middleware. Instead, it provides abstractions and lifecycle management to run your ephemeral or finite duration applications (tasks). It also provides the foundation for developing Spring Batch applications.
However, both can be made recurring (a stream is by definition recurring, where a task can run every certain time interval)
A task application can be triggered/scheduled to make it a recurring one whereas the streaming application is a long-running, not a recurring one.
In addition, both can be configured to communicate with the rest of the pipeline through the message broker.
Though a task application can be configured to communicate to a messaging middleware, the concept of pipeline is different when it comes to stream vs task (batch). For the streaming applications, the pipeline refers to the communication via the messaging middleware while for the task applications, the concept of composed tasks lets you create a conditional workflow of multiple task applications. For more information on composed tasks, you can refer to the documentation.
I have several apps developed using spring boot. Some apps call another apps, that in time call other apps, it is getting hard to manage and scale. I need to be able to distribute them in a network and also combine the apps in different 'flows' with minimun changes to the apps.
Ideally I would like to wrap the apps and abstract them into components that have N inputs and M outputs. At boot time I would use some configuration to wire the inputs and outputs to real kafka topic queues.
For instance, input A to an app can come from several kafka topic queues, and the output B from the same app can go to other set of kafka topic queues.
I would like to be able to change the queues without having to recompile the apps, also no extra network hops to send/receive from/to multiple queues, this should in the same process and multi threaded.
Does anybody knows if something similar exists already? Can spring integration do this? Apache Camel? Or am I better off writing it myself?
See Spring for Apache Kafka. There is also a spring-integration-kafka extension that sits on top of spring-kafka.
The Spring for Apache Kafka (spring-kafka) project applies core Spring concepts to the development of Kafka-based messaging solutions. It provides a "template" as a high-level abstraction for sending messages. It also provides support for Message-driven POJOs with #KafkaListener annotations and a "listener container". These libraries promote the use of dependency injection and declarative. In all of these cases, you will see similarities to the JMS support in the Spring Framework and RabbitMQ support in Spring AMQP.
I have two spring boot apps running in the same local network and they need to communicate with each other. An obvious answer is to leverage REST API and make http calls, but I would like to use Spring Integration project for this purpose.
That said, I have several architectural questions regarding that:
Should I setup a standalone messaging framework (e.g. Rabbit MQ) or embedded should also work (e.g. messaging will be embedded to one of the two apps).
If standalone, what messaging framework should I choose: ActiveMQ, RabbitMQ or something else?
Welcome to the Messaging Microservices world!
You go right way, but forget the embedded middleware if you are going to production. Especially when your application will be distributed geographically.
So, right you need some Message Broker and that should be definitely external one.
It's really your choice which one is better for your purpose. For example you can even take into account Apache Kafka or Redis.
If we talk here about Spring Integration it might be better for you to consider to use our new product - Spring Cloud Stream.
With that you just have your applications as Spring Boot Microservices which are able to connect to the external middleware transparently for the application. You just deal with message channels in the application!