My current understanding is that both of these projects are under Spring Cloud Dataflow, and serve as components of the pipeline. However, both can be made recurring (a stream is by definition recurring, while a task can be run at some fixed interval). In addition, both can be configured to communicate with the rest of the pipeline through the message broker. Currently there is this unanswered question, so I've yet to find a clear answer.
Please see my response below:
My current understanding is that both of these projects are under Spring Cloud Dataflow, and serve as components of the pipeline.
Neither Spring Cloud Stream nor Spring Cloud Task is under Spring Cloud Data Flow; instead, they can be used as standalone projects, and Spring Cloud Data Flow just uses them.
Spring Cloud Stream lets you bind your event-driven, long-running applications to a messaging middleware or a streaming platform. As a developer, you have to choose your binder (there are binder implementations for RabbitMQ, Apache Kafka, etc.) to stream your events or data to/from the messaging middleware you bind to.
Spring Cloud Task doesn't bind your application to a messaging middleware. Instead, it provides abstractions and lifecycle management for running ephemeral or finite-duration applications (tasks). It also provides the foundation for developing Spring Batch applications.
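To make the distinction concrete, here is a minimal sketch (a hedged illustration, not official sample code; it assumes Spring Cloud Stream's functional binding model and Spring Cloud Task's @EnableTask, and the bean names are made up):

```java
// In a Spring Cloud Stream app: a long-running function that the binder
// wires to input/output destinations on the middleware (Kafka, Rabbit, ...).
@Bean
public Function<String, String> uppercase() {
    return String::toUpperCase; // invoked for every incoming message
}

// In a Spring Cloud Task app (@EnableTask on the main class): a finite job;
// start time, end time and exit code are recorded in the task repository.
@Bean
public CommandLineRunner cleanup() {
    return args -> System.out.println("cleanup done"); // runs once, then the JVM exits
}
```

The stream function keeps running for as long as the app is deployed; the task runner executes once per launch and the application then shuts down.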
However, both can be made recurring (a stream is by definition recurring, while a task can be run at some fixed interval)
A task application can be triggered or scheduled to make it recurring, whereas a streaming application is long-running, not recurring.
In addition, both can be configured to communicate with the rest of the pipeline through the message broker.
Though a task application can be configured to communicate with a messaging middleware, the concept of a pipeline differs between stream and task (batch). For streaming applications, the pipeline refers to communication via the messaging middleware, while for task applications the concept of composed tasks lets you create a conditional workflow of multiple task applications. For more information on composed tasks, you can refer to the documentation.
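For example, in the Spring Cloud Data Flow shell a composed task can be defined with the task DSL, where `&&` means "run the next task only if the previous one succeeded" (the task names here are hypothetical):

```
dataflow:> task create import-and-report --definition "import-task && report-task"
dataflow:> task launch import-and-report
```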
Related
I found these 3 ways of implementing messaging with Google Pub/Sub:
with client libraries
https://cloud.google.com/pubsub/docs/publisher
with Spring Integration message channels and the PubSubTemplate API
https://dzone.com/articles/spring-boot-and-gcp-cloud-pubsub
without message channels but with the PubSubTemplate API
https://medium.com/bb-tutorials-and-thoughts/gcp-how-to-subscribe-and-send-pubsub-messages-in-spring-boot-app-b27e2e8863e3
I want to understand the differences between them, when each is best to use, and which would be useful for my case.
I have to implement a single topic and a single subscription to get the queue functionality. I'd rather not use Spring message channels unless necessary; they seem to sit between the Pub/Sub topic and the subscription, and I don't want that. I want to keep things simple, so I think option 3 would be best, but I am also wondering about option 1.
Option 1, the client libraries, is universal. You don't need Spring to use them; you can use the library from Groovy or Kotlin as well.
Option 2 is deeply integrated with Spring. It's largely invisible, but if you have something special to do, it's tricky to override the implementation.
Option 3 is a light Spring integration. PubSubTemplate (the client, in fact) is loaded automatically for you at startup, like any other bean, and you can use it easily in your code. It's my preferred option when I use Spring.
Google Cloud Pub/Sub using client libraries:
Using Google Cloud Pub/Sub with the client libraries is one of the standard and easiest ways to implement Cloud Pub/Sub.
A producer of the data publishes messages to a Pub/Sub topic; a subscriber client then creates a subscription to that topic and consumes the messages.
You need to install the client libraries. You can follow this setup and tutorial for further information.
Here you won't require Spring integration; you can use the client library directly to publish messages and pull them from the subscription.
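As a rough sketch of the publishing side with the google-cloud-pubsub Java client (the project and topic ids are placeholders; the library jar and GCP credentials are required, so this is illustrative rather than copy-paste runnable):

```java
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;

public class PublishExample {
    public static void main(String[] args) throws Exception {
        // "my-project" and "my-topic" are placeholders for your own ids
        TopicName topic = TopicName.of("my-project", "my-topic");
        Publisher publisher = Publisher.newBuilder(topic).build();
        try {
            PubsubMessage message = PubsubMessage.newBuilder()
                    .setData(ByteString.copyFromUtf8("hello"))
                    .build();
            publisher.publish(message); // returns an ApiFuture with the message id
        } finally {
            publisher.shutdown(); // flush pending messages and release resources
        }
    }
}
```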
Spring Integration using Spring channels:
This use case involves deeper integration of a Spring Boot application with Google Cloud Pub/Sub, using Spring Integration to send and receive Pub/Sub messages, i.e. Pub/Sub acts as the intermediate messaging system.
Here the Spring application sends messages to a Cloud Pub/Sub topic through Spring channels, and receives messages from Pub/Sub through those same channels.
Pub/Sub messages in a Spring Boot app:
This use case is a simple, valid example of integrating Cloud Pub/Sub with a Spring Boot application.
It demonstrates how to subscribe to a subscription and send messages to topics from a Spring Boot application.
A message is published to the topic, queued in the respective subscription, and then received by the subscribing Spring Boot application.
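With the Spring Cloud GCP starter, this option boils down to a few PubSubTemplate calls (a hedged sketch; the topic and subscription names are placeholders, and `pubSubTemplate` is assumed to be autowired from the starter's auto-configuration):

```java
// Publish: "my-topic" is a placeholder for your topic name
pubSubTemplate.publish("my-topic", "hello");

// Subscribe: the consumer receives each message and acknowledges it explicitly
pubSubTemplate.subscribe("my-subscription", message -> {
    System.out.println("Received: "
            + message.getPubsubMessage().getData().toStringUtf8());
    message.ack();
});
```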
I have a Lagom service which writes events for persistent entities to Cassandra.
I would like to process those events from another application (not Lagom-based) by embedding a read-side processor which would connect to the same Cassandra database.
I did not find any documentation on how to embed lagom services or read-side processors into existing java/scala applications, is it possible?
It isn't recommended to have multiple services share the same database. This causes tight coupling between the services and can make them hard to upgrade in the future.
Instead, you can use the Lagom Message Broker Topic Producer API to publish events to a Kafka topic. Then, you can consume these from another application, either using Lagom with the Message Broker Consumer API, or using any other Kafka client.
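In the Lagom Java API, publishing the entity's event stream to a topic looks roughly like this (a sketch; `OrderEvent`, its tag, and `toApi` are placeholders for your own event class and mapping code):

```java
// In the Lagom service implementation, backing a Topic declared in the
// service descriptor. Each persisted event is mapped to an API message
// and published to Kafka together with its offset.
@Override
public Topic<OrderEvent> ordersTopic() {
    return TopicProducer.singleStreamWithOffset(offset ->
        persistentEntityRegistry
            .eventStream(OrderEvent.TAG, offset)
            .map(pair -> Pair.create(toApi(pair.first()), pair.second())));
}
```

The other application then consumes the Kafka topic with whatever client it likes, without ever touching the service's Cassandra keyspace.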
I have several apps developed using Spring Boot. Some apps call other apps, which in turn call other apps, and it is getting hard to manage and scale. I need to be able to distribute them over a network and also combine the apps into different 'flows' with minimum changes to the apps.
Ideally I would like to wrap the apps and abstract them into components that have N inputs and M outputs. At boot time I would use some configuration to wire the inputs and outputs to real Kafka topics.
For instance, input A to an app can come from several Kafka topics, and output B from the same app can go to another set of Kafka topics.
I would like to be able to change the queues without having to recompile the apps, and with no extra network hops to send/receive to/from multiple queues; this should happen in the same process, multi-threaded.
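With Spring Cloud Stream, that wiring can live entirely in externalized configuration, so the topics can change without recompiling. A sketch (the binding names `inputA`/`outputB` and the topic names are examples):

```
# input A: a consumer binding can subscribe to several topics (comma-separated)
spring.cloud.stream.bindings.inputA.destination=topicA1,topicA2
spring.cloud.stream.bindings.inputA.group=my-app

# output B: a producer binding publishes to one destination;
# declare additional output bindings to fan out to several topics
spring.cloud.stream.bindings.outputB.destination=topicB1
```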
Does anybody know if something similar already exists? Can Spring Integration do this? Apache Camel? Or am I better off writing it myself?
See Spring for Apache Kafka. There is also a spring-integration-kafka extension that sits on top of spring-kafka.
The Spring for Apache Kafka (spring-kafka) project applies core Spring concepts to the development of Kafka-based messaging solutions. It provides a "template" as a high-level abstraction for sending messages. It also provides support for message-driven POJOs with @KafkaListener annotations and a "listener container". These libraries promote the use of dependency injection and declarative configuration. In all of these cases, you will see similarities to the JMS support in the Spring Framework and the RabbitMQ support in Spring AMQP.
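The two sides look roughly like this (a sketch, assuming a Spring Boot app with spring-kafka on the classpath; the topic, group, and field names are examples):

```java
// Sending: KafkaTemplate is auto-configured by Spring Boot
@Autowired
private KafkaTemplate<String, String> kafkaTemplate;

public void send(String payload) {
    kafkaTemplate.send("my-topic", payload);
}

// Receiving: a message-driven POJO; the listener container
// polls the topic and invokes this method per record
@KafkaListener(topics = "my-topic", groupId = "my-group")
public void listen(String payload) {
    System.out.println("Received: " + payload);
}
```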
I'm building a small app that models a city's public transport network. The idea is that each bus stop is a Sink and listens to messages from other bus stops, thus calculating the times the bus will show up.
Bus stops with unique ids are stored in the database and I need to generate and run exactly the number of sinks with unique ids. How do I do that?
My guess is that this can be done using Spring Cloud Data Flow, launching the .jar files with an (--id) property that'll be injected via the @Value annotation. But I can't understand how to implement that.
Also found this but it didn't help.
You got some of the concepts right, but your implementation may need some help.
So Spring Cloud Dataflow is an orchestration engine that deploys boot applications and connects them using a middleware.
Those apps can be streaming apps, which means they use Spring Cloud Stream as an abstraction layer to communicate with a middleware (Rabbit or Kafka). At its core there are three types of apps: sources (data emitters), processors (data transformers) and sinks (data receivers).
You use Data Flow to combine those and deploy them to a runtime (Local, Cloud Foundry, K8S, YARN).
So yes, SCDF can be used for your assignment; however, you do not want to create one sink per bus stop, as this would waste your resources.
You can have a simple stream that captures the data from your buses (the source), perhaps does some transformation, and sinks it to a DB.
You can then create a tap on that stream if you are interested in processing the same messages further.
You can tap that information and have a client that broadcasts it downstream (to your display at each bus stop).
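In the Data Flow shell this could look something like the following, using the standard app starters and the tap syntax (the stream name and app choices are examples, not a prescription):

```
dataflow:> stream create bus-positions --definition "http | transform | jdbc" --deploy
dataflow:> stream create bus-positions-tap --definition ":bus-positions.transform > websocket" --deploy
```

The tap consumes the same messages flowing out of the `transform` step without disturbing the main stream.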
So, for example, you can have just a single sink, but with a websocket where each client connects and passes an id. You can then forward the received events, filtered by that id, to that specific client.
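Stripped of the websocket plumbing, the filtering idea is just a registry of clients keyed by stop id (a minimal framework-free sketch; class and method names are made up):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// A single "sink" that fans events out to clients by stop id,
// instead of deploying one sink per bus stop.
class StopBroadcaster {
    private final Map<String, List<Consumer<String>>> clientsByStopId = new HashMap<>();

    // A client (e.g. a websocket session) registers interest in one stop id.
    public void register(String stopId, Consumer<String> client) {
        clientsByStopId.computeIfAbsent(stopId, id -> new ArrayList<>()).add(client);
    }

    // The sink receives every event and forwards it only to matching clients.
    public void onEvent(String stopId, String payload) {
        clientsByStopId.getOrDefault(stopId, List.of())
                       .forEach(client -> client.accept(payload));
    }
}
```

Each websocket session would call `register` with the id it was given, and the one sink calls `onEvent` for every message it receives from the stream.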
This is a much more efficient way to deal with that.
I am building a spring cloud-based microservice ML pipeline.
I have a data ingestion service that (currently) takes in data from SQL; this data needs to be used by the prediction service.
The general consensus is that writes should use async message-based communication using kafka/rabbitmq.
What I am not sure about is how do I orchestrate these services?
Should I use an API gateway that invokes ingestion that starts the pipeline?
Typically you would build a service with REST endpoints (Spring Boot) to ingest the data. This service can then be deployed multiple times behind an API gateway (Zuul, Spring Cloud) that takes care of routing. This is the default Spring Cloud microservices setup. The ingest service can then convert the data and produce it to RabbitMQ or Kafka. I recommend using Spring Cloud Stream for the interaction with the queue; it's an abstraction on top of RabbitMQ and Kafka that can be configured using starters/binders.
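That last step can be sketched with Spring Cloud Stream's StreamBridge (a hedged illustration; the endpoint, binding name, and class name are made up, and the binding maps to a RabbitMQ exchange or Kafka topic in configuration):

```java
// Ingestion endpoint: accepts data over REST and produces it to the
// middleware through the "predictions-in" binding.
@RestController
public class IngestController {

    private final StreamBridge streamBridge;

    public IngestController(StreamBridge streamBridge) {
        this.streamBridge = streamBridge;
    }

    @PostMapping("/ingest")
    public void ingest(@RequestBody String record) {
        // the destination behind "predictions-in" is set in
        // application configuration, not in code
        streamBridge.send("predictions-in", record);
    }
}
```

The prediction service then consumes from the same destination with its own consumer binding, which keeps the write path asynchronous.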
Spring Cloud Data Flow is a declarative approach to orchestrating your queues, and it also takes care of deployment on several cloud services/platforms. It could also be used here, but might add extra complexity for your use case.