I'm building a small app that models a city public transport network. The idea is that each bus stop is a Sink and listens to messages from other bus stops, thus calculating the times at which buses will show up.
Bus stops with unique ids are stored in the database, and I need to generate and run exactly that number of sinks, each with a unique id. How do I do that?
My guess is that this can be done with Spring Cloud Data Flow, which would launch the .jar files with an --id property injected via the @Value annotation. But I can't understand how to implement that.
I also found this, but it didn't help.
You got some of the concepts right, but your implementation may need some help.
Spring Cloud Data Flow is an orchestration engine that deploys Boot applications and connects them using a messaging middleware.
Those apps can be streaming apps, which means they use Spring Cloud Stream as an abstraction layer to communicate with the middleware (Rabbit or Kafka), and at its core there are three types of apps: sources (data emitters), processors (data transformation) and sinks (data receivers).
You use Data Flow to combine those and deploy them to a runtime (Local, Cloud Foundry, Kubernetes, YARN).
So yes, SCDF can be used for your assignment; however, you do not want to create one sink per bus stop, as that would waste your resources.
You can have a simple stream that captures the data from your buses (the source), maybe does some transformation, and sinks it to a DB.
You can then create a tap that listens to the messages being stored in the DB if you are interested in processing them further.
You can tap that information and have a client that broadcasts it downstream (your display at each bus stop).
So, for example, you can have just one single sink that exposes a WebSocket endpoint where each client connects and passes an id. You can then filter the received events by that id and forward them to that specific client, as in the sketch below.
This is a much more efficient way to deal with that.
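A minimal sketch of that idea, assuming the annotation-based Spring Cloud Stream model plus Spring's STOMP/WebSocket messaging; BusEvent and the /topic/stops/{id} destination are illustrative names, not something from your setup:

```java
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Sink;
import org.springframework.messaging.simp.SimpMessagingTemplate;

// One sink instance serves every bus stop; filtering happens per WebSocket destination.
@EnableBinding(Sink.class)
public class BusStopBroadcaster {

    private final SimpMessagingTemplate websocket;

    public BusStopBroadcaster(SimpMessagingTemplate websocket) {
        this.websocket = websocket;
    }

    @StreamListener(Sink.INPUT)
    public void handle(BusEvent event) {
        // Route each event to the topic of the stop it belongs to, so only
        // clients subscribed to that stop's id receive it.
        websocket.convertAndSend("/topic/stops/" + event.getStopId(), event);
    }

    // Illustrative payload; in practice it matches whatever the upstream apps emit.
    public static class BusEvent {
        private String stopId;
        private long estimatedArrivalEpochMillis;

        public String getStopId() { return stopId; }
        public void setStopId(String stopId) { this.stopId = stopId; }
        public long getEstimatedArrivalEpochMillis() { return estimatedArrivalEpochMillis; }
        public void setEstimatedArrivalEpochMillis(long t) { this.estimatedArrivalEpochMillis = t; }
    }
}
```

Each display client would subscribe to /topic/stops/&lt;its own id&gt; over STOMP, so you deploy a single sink and scale it by adding instances instead of deploying one sink per stop.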
I am new to Spring Cloud Data Flow and I've been reading through the tutorials, trying to set up a project locally. (https://dataflow.spring.io/docs/installation/local/manual/)
Am I right to assume that a queuing system is a prerequisite for the servers to run?
How is this messaging middleware used by the Data Flow server and by the Skipper server?
Is there a way to use a DB to store state instead of passing it from one app to the next using a queue?
You can run it without a messaging middleware. In that case the streams feature is disabled, but you can still work with Spring Cloud Tasks and Spring Batch jobs.
Essentially, in such a setup you only need the Data Flow server and a database (e.g. MySQL).
To do so, just set the feature toggle spring.cloud.dataflow.features.streams-enabled to false. See also: https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#configuration-local
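For example, a minimal sketch of that toggle in the Data Flow server's configuration (the property name is the one from the docs linked above):

```properties
# Disable the streams feature so no messaging middleware (Rabbit/Kafka) is needed;
# tasks and batch jobs keep working against the configured database.
spring.cloud.dataflow.features.streams-enabled=false
```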
Hope that helps!
My current understanding is that both of these projects are under Spring Cloud Dataflow, and serve as components of the pipeline. However, both can be made recurring (a stream is by definition recurring, whereas a task can be run at a set time interval). In addition, both can be configured to communicate with the rest of the pipeline through the message broker. Currently there is this unanswered question, so I've yet to find a clear answer.
Please see my response below:
My current understanding is that both of these projects are under Spring Cloud Dataflow, and serve as components of the pipeline.
Spring Cloud Stream and Spring Cloud Task are not under Spring Cloud Data Flow; instead, they can be used as standalone projects, and Spring Cloud Data Flow just uses them.
Spring Cloud Stream lets you bind your event-driven, long-running applications to a messaging middleware or a streaming platform. As a developer, you have to choose your binder (the binder implementations for RabbitMQ, Apache Kafka, etc.) to stream your events or data from/to the messaging middleware you bind to.
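For illustration, a minimal Spring Cloud Stream application in the functional style; with a Rabbit or Kafka binder on the classpath, the single Function bean is bound to an input and an output destination automatically (the uppercasing logic is only a placeholder):

```java
import java.util.function.Function;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class UppercaseProcessorApplication {

    public static void main(String[] args) {
        SpringApplication.run(UppercaseProcessorApplication.class, args);
    }

    @Bean
    public Function<String, String> uppercase() {
        // Long-running: keeps consuming from the bound input destination and
        // publishing the result to the bound output destination.
        return payload -> payload.toUpperCase();
    }
}
```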
Spring Cloud Task doesn't bind your application into a messaging middleware. Instead, it provides abstractions and lifecycle management to run your ephemeral or finite duration applications (tasks). It also provides the foundation for developing Spring Batch applications.
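And, for comparison, a minimal Spring Cloud Task application; it runs its finite unit of work, records the execution in the task repository, and exits (the printed message is just a placeholder):

```java
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;
import org.springframework.context.annotation.Bean;

@EnableTask            // records start, end and exit code of this short-lived run
@SpringBootApplication
public class SampleTaskApplication {

    public static void main(String[] args) {
        SpringApplication.run(SampleTaskApplication.class, args);
    }

    @Bean
    public CommandLineRunner work() {
        // The task does its finite piece of work and then the JVM exits.
        return args -> System.out.println("task finished");
    }
}
```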
However, both can be made recurring (a stream is by definition recurring, whereas a task can be run at a set time interval)
A task application can be triggered or scheduled to make it recurring, whereas a streaming application is long-running, not recurring.
In addition, both can be configured to communicate with the rest of the pipeline through the message broker.
Though a task application can be configured to communicate with a messaging middleware, the concept of a pipeline differs between stream and task (batch) applications. For streaming applications, the pipeline refers to communication via the messaging middleware, while for task applications the concept of composed tasks lets you create a conditional workflow of multiple task applications. For more information on composed tasks, you can refer to the documentation.
I must create a small IoT platform based on Spring Boot/Java 8.
Context: devices send various pieces of information to the platform. I must save them and later consume them in an analysis algorithm.
Constraint: I want it all to be async, and the platform must be based on Java 8/Spring technologies or be easily integrated into a Spring Boot app.
What do I have in mind? I thought of sending the devices' information to an async Spring REST controller and saving it asynchronously in MongoDB.
I already have the analysis algorithm, based on the Google Guava EventBus.
To sum up, I have device data in a MongoDB database and an algorithm based on Java POJOs, and the missing part is transforming the device data into Java POJOs.
With which technologies can I do that? Spring Reactor? RxJava? Something else? And how can I put this in place?
I am looking for something simple to put in place that can easily scale, for example by duplicating instances. For the moment, I think the Spring Cloud technologies are a bit too big for my purpose.
You should have a look at the Spring XD engine.
Spring XD provides different sources (HTTP, FTP, MQTT, file, etc.), transformers, filters, and sinks (HTTP, FTP, MQTT, file, etc.).
Please check this post on a small IoT project based on Spring XD and the Twitter API.
I am going to integrate some applications using RabbitMQ, and I am facing a design issue. Right now I have one application producing messages and one application consuming them (more are possible in the future). Both applications have access to the same database. Application A is a kind of registration application: when it receives a registration request, it sends a message to RabbitMQ. Application B receives this message, and its task is to load the registration data into an Elasticsearch server. I have some options:
1) The consumer reads the message and id from the queue, loads the data, and sends it to the Elasticsearch server. This gives the fastest throughput, because things move asynchronously: another process, possibly running on a separate server, loads the data and sends it to the Elasticsearch server.
2) The consumer reads the message and id from the queue and then calls a REST service to load the company data. This takes more time per request because of the network overhead; although it saves time on the data load, it adds network delay. It also bypasses the ESB (message broker). (I personally think that if I am using an ESB in my application, it is not necessary to use it for every single method call.)
3) Send all the registration data in the message. The consumer receives it and just uploads it to the Elasticsearch server.
Which approach should I follow?
Apparently there are many components to your application setup, which makes it hard to take everything into account and suggest a straightforward answer. I would suggest you look into each design and identify the I/O points, the calls over the network, and the volume of data exchanged over the network. Then, depending on the load you expect and the volume of data you expect to store over time, rank these bottlenecks, giving each a higher score the more severe it is. Identify the solution with the lowest score and go with that.
I would also suggest you benchmark the difference between sending only the id and sending the whole object. I would expect the difference to be negligible.
One more suggestion: make your objects immutable, as in the sketch below. It is not directly relevant to what you are describing, but in situations like yours, where components operate "blindly", knowing that an object has not changed state is a big assurance.
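A minimal sketch of what that could look like for the registration message (the field names are only illustrative):

```java
// Immutable value object: all fields are final and set once in the constructor,
// so no component downstream can observe a change of state after construction.
public final class RegistrationMessage {

    private final String registrationId;
    private final String companyId;

    public RegistrationMessage(String registrationId, String companyId) {
        this.registrationId = registrationId;
        this.companyId = companyId;
    }

    public String getRegistrationId() { return registrationId; }
    public String getCompanyId() { return companyId; }
}
```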
I am looking to push large amounts of data from my Java web application to AWS. Within my Java application I have some flexibility in the approach/technology to use. Generally, I am trying to dump large amounts of system data into an AWS store so that it can eventually be reported on and serve audit/historical purposes.
1) The Java Web app (N nodes) will push system-diagnostic information to AWS in near-real time.
2) System-diagnostic information will be collected by a custom plugin for the system and pushed to some AWS endpoint for aggregation.
3) New information to push to AWS will be available approximately every second.
4) Multiple Java web apps will be collecting and pushing information to a central server.
I am looking for the best way to transport the data from the Java apps to AWS. Ideally the solution would integrate well on the AWS side and not be overly complex to implement on the Java web app side (e.g. I do not want to have to run some other app/DS to provide an intermediary store). I do not have strong opinions on the AWS storage technology yet, either.
Example ideas: batch HTTP POST data from the Java web app to AWS, use a JMS solution to send the data out, or leverage some logger technology to "write" to an AWS datastore.
Assuming that the diagnostic information is not too big I would consider SQS. If you have different classes of data, you can push the different types to different queues. You can then consume the messages in the queue(s) either from servers running in EC2 or on your own servers.
SQS will deliver each message at least once, but you have to be ready for a given message to be delivered multiple times. Duplicates do happen occasionally.
If your payloads are large, you will want to drop them in S3. If you have to go this route, you might want to use SQS as well: create a file in S3 and push a message to SQS with the S3 filename so you make sure all the payloads get processed.
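A rough sketch of that pattern with the AWS SDK for Java (v1); the bucket name, key prefix and queue URL below are placeholders:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;

public class DiagnosticsPublisher {

    private static final String BUCKET = "my-diagnostics-bucket";                 // placeholder
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/diagnostics";       // placeholder

    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    private final AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();

    public void publish(String diagnosticsJson) {
        // Store the (possibly large) payload in S3 ...
        String key = "diagnostics/" + System.currentTimeMillis() + ".json";
        s3.putObject(BUCKET, key, diagnosticsJson);
        // ... and enqueue only the object key, so the consumer knows what to fetch.
        sqs.sendMessage(QUEUE_URL, key);
    }
}
```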
I would imagine that you will push the data packets into SQS and then have a separate process that will consume the messages and insert into a database or other store in a format that supports whatever reporting/aggregation requirements you might have. The queue provides scalable flow control so you size the message consumption/processing for your average data rate, even though your data production rate will likely vary greatly during the day.
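And a corresponding consumer sketch (again SDK v1, same placeholder queue URL; the store step stands in for whatever database or reporting store you pick):

```java
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class DiagnosticsConsumer {

    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/diagnostics";       // placeholder

    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        while (true) {
            // Long polling (up to 20s) keeps the number of empty receives, and the cost, down.
            ReceiveMessageRequest request = new ReceiveMessageRequest(QUEUE_URL).withWaitTimeSeconds(20);
            for (Message message : sqs.receiveMessage(request).getMessages()) {
                store(message.getBody());                                   // write to your DB/store
                sqs.deleteMessage(QUEUE_URL, message.getReceiptHandle());   // acknowledge
            }
        }
    }

    private static void store(String body) {
        // Placeholder for the insert into whatever store backs your reporting.
        System.out.println("storing: " + body);
    }
}
```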
SQS only holds messages for a maximum of 14 days, so you must have some other process that will consume the messages and do some long-term storage.