I wanted to build a Application which listens to a queue and does a series of steps.
Basically the application should listen to Queue1 and:
- Get some data from ServiceA[Small amount of data]
- Get some data from ServiceB[Small amount of data]
- Update Some information in Service C [Based on the data]
- Create number of messages[based on the data] on a Queue2.
Due to the flow based nature of this application I was looking into Job Execution system in Spring. However all the steps are designed to be idempotent and the data being transferred between steps is small, hence I did not want a Database with this application.
I started exploring Spring Batch or Spring Task for this. Spring batch provides really good constructs like Tasklet and Steps but there are number of comments recommending connecting Spring Batch to database and how it is designed to manage massive amounts of data, reliably(I don't need reliability here since the queue and idempotent nature provides that.). While I can pass data using the Execution Context there were recommendations against it.
Question:
- Are there simpler starters in the Spring Boot ecosystem which provide workflows/Job like interface which I should use ?
- Is this a valid use case for spring Batch or is that over engineering/misuse of the steps ?
Thanks a lot for the Help
Ayushman
P.S: I can provide exact details of the job but did not want to conflate the question.
I had two projects worth of experience with Spring Batch. I haven't tried Spring Task.
Having said that, my answer is somewhat bias. Spring Batch is a bit notorious to configure. If your application is simple enough, just use "spring-boot-starter-amqp". It will be enough.
By any chance, you decide to use Spring Batch (for its Job and Step Aspects features or other features), you may want to configure to just use an in-memory database (because you don't need any retry/roll-back feature it is providing).
Related
To understand if the spring events fits the task im working on I need to understand how they work, where are they stored?
as I can guess they are stored in a spring application context and disappears if the application crashes, is my guess correct?
Spring events are intended to use when calling methods directly would create too much coupling. If you need to track events for auditing or replay purposes you have to save the events yourself. Based on your comments, there are many ways to achieve this, based on the topology and purpose of the application (list not complete):
Model entities that represent the events and store them in a repository
Incorporate a message broker such as Kafka that support message persistence
Install an in-memory cache such as Hazelcast
Use a cloud message service such as AWS SQS
Lastly, please make sure that you carefully evaluate which options suits your needs best. Options 2 to 4 all introduce heavy complexity and distributed applications can bring sorrow and misery to your life. Go for the simplest option if you can and only resort the other options if absolutely necessary.
I have a terminal server monitor project. In the backend, I use the Spring MVC, MyBatis and PostgreSQL. Basically I query the session information from DB and send back to front-end and display it to users. But there is some large queries(like searching total users, total sessions, etc.), which slow down the system when user opens the website, So I want to do these queries as asynchronous tasks so the website could be opened fast rather than waiting for the query. Also, I would check terminal server state periodically from DB(every hour), and if terminal server fails or average load is too high, I would notifying admins. I do not know what should I use, maybe AKKA, or any other way to do these two jobs(1.do the large query asynchronously 2. do some periodical query)? Please help me, thanks!
You can achieve this using Spring and caching where necessary.
If the data you're displaying is not required to be "in real-time", but it can be "near real-time" you can read the data from the DB periodically and cache it. Your app then reads from the cache.
There's different approaches you can explore.
You can try to create a materialized view in PostgreSQL which will hold the statistic data you need. Depending on your requirements you have to see how to handle refresh intervals etc.
Another approach is to use application level cache - you can leverage Spring for that(Spring docs). You can populate the cache on start up and refresh it as necessary.
The task that runs every hour can be implemented again leveraging Spring (Spring docs) #Scheduled annotation.
To answer your question - don't use Akka - you have all the tools necessary to achieve the task in the Spring ecosystem.
Akka is not very relevant here, it is for event-driven programming model which deals with concurrency issues to build highly scalable multithreaded applications.
You can use Spring task scheduler for running heavy queries periodically. If you want to keep it simple, you can solve your problem by simply storing the data like total users, total sessions etc, in the global application context. And periodically update this data from database using spring scheduler. You can also store the same in a separate database table, so that this data can be easily loaded at the initialization time.
I really don't see why you need "memcached", "materialized views", "Websockets" and other heavy technologies and frameworks, for a caching a small set of data. All you need is maintain a set of global parameters in your application context, keep them updated using a scheduled task as frequently as desired.
I need to build a pipeline system for setting up different types of data. For example, when a customer signs up, the first time the user logs in, i need to setup some sample data, sample reports. The data is not fully static as it will add some time sensitive stuff, Like tasks that can expire in 30 days, 10 days.
In order to make this happen, we already have some REST services that can insert the data I need to add for the customer. What I was wondering is that for the orchestration, is spring batch the right way or spring integration the right way ?
Spring integration all the way. You could model it in batch but you're not really doing batch processing. You're simply doing repeated tasks per user based on a given signal (first login). I would use integration first. You can always integrate batch later if needs be but integration is a shorter learning curve and I feel suits your needs better.
I am developing java application based on spring framework.
It
Connects to a MySQL database
Gets data from MySQLTable1 in POJOs
Manipulates (update,delete) it in memory
Inserts into a Netezza database table
The above 4 processes are done for each client (A,B,C) every hour.
I am using a spring JDBC template to get the data like this:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
and read each record into a POJO before I write it to a Netezza table.
There are going to be multiple instance of this application running every hour through a scheduler.
So Client A and Client B can be running concurrently but the SELECT will be unique,
I mean data for:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
will be different from
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='B' AND COL4='CONDITION'
But remember all of these are stored in memory as POJOs.
My questions are :
Is there a risk of data contamination?
Is there a need to implement database transaction using spring data transaction manager?
Does my application really need to use something like Spring Batch to deal with this?
I appreciate your thoughts and feedback.
I know this is a perfect scenario for using an ETL tool but that is out of scope.
Is there a risk of data contamination?
It depend on what you are doing with your data but I don't see how you can have data contamination if every instance is independant, you just have to make sure that every instances that run concurrently are not working on the same data (Client ID).
Is there a need to implement database transaction using spring data transaction manager?
You will probably need a transaction for insertion into the Netezza table. You certainly want your data to have a consistent state in the result table. If an error occur in the middle of the process, you'll probably want to rollback everything that was inserted before it failed. Regarding the transaction manager, you don't especially need the Spring transaction manager, but since you are using Spring it might be a good option.
Does my application really need to use something like Spring Batch to deal with this?
Does it really need it, probably not, but Spring Batch was made for those kind of application, so it might help you to structure your application (Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management). Everything can be made without the framework and it might be overkill to use it if you have a really small application. But at the end, if you need those features, you'll probably want to use it...
Spring Batch is ETL, so using it would be a good fit for this use case and also a good alternative to a commercial ETL tool.
Is there a risk of data contamination? Client A and B read separate data, so they can never interfere with each other by reading or writing the same data by accident. The risk would be if two clients with the same ID are created, but that is not the case.
Is there a need to implement database transaction using spring data transaction manager?
There is no mandatory need to do that, although programatic transaction management has many pitfalls and is best avoided. Spring Batch would manage transactions for you, as well as other aspects such as paging.
Does my application really need to use something like Spring Batch to deal with this? There is no mandatory need to do this, although it would help a lot, especially in the paging aspect. How will you handle queries that return thousands of rows? Without a framework this needs to be handled manually.
I am looking for a pattern and/or framework which can model the following problem in an easily configurable way.
Every say 3 minutes, I needs to have a set of jobs kick off in a web application context that will concurrently hit web services to obtain the latest version of data, and push it off to a database. The problem is the database will be being heavily used to read the data from to do tons of complex calculations on the data. We are currently using spring so I have been looking at Spring Batch to run this process does anyone have any suggestions/patterns/examples of using Spring or other technologies of a similar system?
We have used ServletContextlisteners to kick off TimerTasks in our web applications when we needed processes to run repeatedly. The ServletContextListener kicks off when the app server starts the application or when the application is restarted. Then the timer tasks act like a separate thread that repeats your code for the specified period of time.
ServletContextListener
http://www.javabeat.net/examples/2009/02/26/servletcontextlistener-example/
TimerTask
http://enos.itcollege.ee/~jpoial/docs/tutorial/essential/threads/timer.html
Is refactoring the job out of the web application and into a standalone app a possibility?
That way you could stick the batch job onto a separate batch server (so that the extra load of the batch job wouldn't impact your web application), which then calls the web services and updates the database. The job can then be kicked off using something like cron or Autosys.
We're using Spring-Batch for exactly this purpose.
The database design would also depend on what the batched data is used for. If it is for reporting purposes, I would recommend separating the operational database from the reporting database, using a database link to obtain the required data from the operational database into the reporting database and then running the complex queries on the reporting database. That way the load is shifted off the operational database.
I think it's worth also looking into frameworks like camel-integration. Also take a look at the so called Enterprise Integration Patterns. Check the catalog - it might provide you with some useful vocabulary to think about the scaling/scheduling problem at hand.
The framework itself integrates really well with Spring.