I need to build a pipeline system for setting up different types of data. For example, when a customer signs up and the user logs in for the first time, I need to set up some sample data and sample reports. The data is not fully static, as it includes some time-sensitive items, like tasks that expire in 30 or 10 days.
To make this happen, we already have some REST services that can insert the data I need to add for the customer. What I was wondering is: for the orchestration, is Spring Batch the right way or Spring Integration?
Spring Integration all the way. You could model it in Batch, but you're not really doing batch processing; you're simply doing repeated tasks per user based on a given signal (first login). I would use Integration first. You can always add Batch later if need be, but Integration has a shorter learning curve and I feel it suits your needs better.
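As a rough illustration (not from the question): a first-login signal arriving on a message channel could drive your existing REST services through the Java DSL. The channel name and the SampleDataClient bean are assumptions, each handle step receiving the previous step's output:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;

@Configuration
public class SampleDataSetupFlow {

    // "firstLoginChannel" carries the first-login event; SampleDataClient is an
    // assumed wrapper around the existing REST services.
    @Bean
    public IntegrationFlow firstLoginFlow(SampleDataClient sampleDataClient) {
        return IntegrationFlows.from("firstLoginChannel")
                .handle(sampleDataClient, "createSampleReports")  // call existing REST service
                .handle(sampleDataClient, "createSampleTasks")    // e.g. tasks expiring in 10/30 days
                .get();
    }
}
```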
We have a requirement to pull data from multiple REST API services, transform it, and populate it into a new database. There could be a huge number of records to fetch, transform, and update this way. But it is a one-time activity: once all the data from the REST calls has been transformed and loaded into the new DB, we will not have to re-run the transformation later. What is the best way to achieve this in Spring?
Can Spring Batch be a possible solution if it has to be a one-time execution?
If it is a one-time thing I wouldn't bother using Spring Batch. I would simply call the external APIs, get the data, transform it, and then persist it in your database. You could trigger the process either by exposing an endpoint in your own API to start it or by relying on a scheduled task.
Keeping things as simple as possible (but never simpler) is one of the greatest assets you can have while developing software, but it is also one of the hardest things for us to achieve as software engineers, simply because we usually overthink the solutions.
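A minimal sketch of that approach, assuming a Spring Data repository and made-up DTO names for the external records and target entity:

```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class MigrationController {

    // SourceRecord, TargetEntity and RecordRepository are illustrative assumptions.
    private final RestTemplate restTemplate = new RestTemplate();
    private final RecordRepository repository;

    public MigrationController(RecordRepository repository) {
        this.repository = repository;
    }

    @PostMapping("/admin/migration")              // trigger the one-off load manually
    public ResponseEntity<String> migrate() {
        SourceRecord[] records = restTemplate.getForObject(
                "https://old-system.example.com/api/records", SourceRecord[].class);
        for (SourceRecord record : records) {
            repository.save(transform(record));   // transform and persist each record
        }
        return ResponseEntity.ok("migration finished");
    }

    private TargetEntity transform(SourceRecord record) {
        // map the external representation onto the new schema
        return new TargetEntity(record.getId(), record.getName());
    }
}
```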
For this kind of problem, it would be better to use an ETL (extract, transform, and load) tool or framework; my recommendation is Kafka. Check this link, I think it will be helpful: Link
This is more of an architectural question, to check whether my thinking is feasible. I have a Java microservice REST endpoint that takes an int value. For example, if the value is 10, the endpoint gets 10 users from the database, runs them through the business logic one user at a time, and updates different things. This works, but I want to know the best way to see the response for each user in real time, so I can spot any unknown exceptions and stop the job from running further. I am not sure if Postman can do this, or whether I need an executable. If so, please suggest how and I will get going. Thanks!
Firstly, if you want to see the processing as a real-time stream of responses, one per user in a task, you have to implement your microservice in a reactive way.
If that is not possible, then instead of executing one task of 10 users, you can feed your service in small chunks (10 × 1 user) from parallel threads on the client side.
To achieve this, you can create a simple executable leveraging one of the reactive frameworks: for example RxJava, or, if you are using Spring/Spring Boot, the WebFlux framework.
Or you can try a tool like Apache JMeter, if you prefer configuring something instead of developing it.
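For illustration, the server side of the reactive option could stream one result per user as Server-Sent Events with WebFlux. This is only a sketch; UserService and UserResult are assumed names, not part of the original service:

```java
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class UserProcessingController {

    private final UserService userService;   // assumed reactive service wrapping the business logic

    public UserProcessingController(UserService userService) {
        this.userService = userService;
    }

    // Streams one result per user as it is processed, so the client sees
    // progress and exceptions in real time instead of one big response.
    @GetMapping(value = "/users/process/{count}", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<UserResult> process(@PathVariable int count) {
        return userService.findUsers(count)                  // Flux<User> from the database
                .concatMap(userService::applyBusinessLogic); // emits a UserResult per user, in order
    }
}
```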
I want to build an application which listens to a queue and performs a series of steps.
Basically the application should listen to Queue1 and:
- Get some data from ServiceA [small amount of data]
- Get some data from ServiceB [small amount of data]
- Update some information in ServiceC [based on the data]
- Create a number of messages [based on the data] on Queue2.
Due to the flow-based nature of this application, I was looking into a job-execution system in Spring. However, all the steps are designed to be idempotent and the data being transferred between steps is small, so I did not want a database with this application.
I started exploring Spring Batch and Spring Task for this. Spring Batch provides really good constructs like Tasklets and Steps, but there are a number of comments recommending connecting Spring Batch to a database and noting that it is designed to manage massive amounts of data reliably (I don't need that reliability here, since the queue and the idempotent nature of the steps provide it). And while I could pass data between steps using the ExecutionContext, there are recommendations against that as well.
Questions:
- Are there simpler starters in the Spring Boot ecosystem which provide a workflow/job-like interface that I should use?
- Is this a valid use case for Spring Batch, or is that over-engineering/misuse of its steps?
Thanks a lot for the help.
Ayushman
P.S.: I can provide the exact details of the job, but I did not want to clutter the question.
I have had two projects' worth of experience with Spring Batch. I haven't tried Spring Task.
Having said that, my answer is somewhat biased. Spring Batch is a bit notorious for its configuration. If your application is simple enough, just use "spring-boot-starter-amqp". It will be enough.
If, by any chance, you decide to use Spring Batch (for its Job and Step constructs or other features), you may want to configure it to use just an in-memory database (because you don't need the retry/rollback features it provides).
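For the simple spring-boot-starter-amqp route, a listener along these lines could cover the four steps. The queue names, message types, and the Service A/B/C client beans are illustrative assumptions:

```java
import java.util.List;

import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Component;

@Component
public class EnrichmentListener {

    private final ServiceAClient serviceA;       // assumed clients for Services A, B and C
    private final ServiceBClient serviceB;
    private final ServiceCClient serviceC;
    private final RabbitTemplate rabbitTemplate;

    public EnrichmentListener(ServiceAClient serviceA, ServiceBClient serviceB,
                              ServiceCClient serviceC, RabbitTemplate rabbitTemplate) {
        this.serviceA = serviceA;
        this.serviceB = serviceB;
        this.serviceC = serviceC;
        this.rabbitTemplate = rabbitTemplate;
    }

    @RabbitListener(queues = "queue1")
    public void onMessage(IncomingEvent event) {
        DataA a = serviceA.fetch(event.getId());           // step 1: small lookup
        DataB b = serviceB.fetch(event.getId());           // step 2: small lookup
        serviceC.update(event.getId(), a, b);              // step 3: idempotent update
        for (OutgoingMessage m : buildMessages(a, b)) {    // step 4: fan out to Queue2
            rabbitTemplate.convertAndSend("queue2", m);
        }
    }

    private List<OutgoingMessage> buildMessages(DataA a, DataB b) {
        return List.of(new OutgoingMessage(a, b));         // placeholder mapping
    }
}
```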
I have a terminal server monitoring project. In the backend I use Spring MVC, MyBatis and PostgreSQL. Basically, I query the session information from the DB, send it back to the front end, and display it to users. But there are some large queries (like counting total users, total sessions, etc.) which slow down the system when a user opens the website, so I want to run these queries as asynchronous tasks so the website opens fast rather than waiting for the query. I would also check the terminal server state from the DB periodically (every hour), and if a terminal server fails or the average load is too high, I would notify the admins. I do not know what I should use, maybe Akka, or some other way to do these two jobs (1. run the large queries asynchronously, 2. run some periodic queries)? Please help me, thanks!
You can achieve this using Spring and caching where necessary.
If the data you're displaying is not required to be "real-time" but can be "near real-time", you can read the data from the DB periodically and cache it. Your app then reads from the cache.
There are different approaches you can explore.
You can create a materialized view in PostgreSQL which holds the statistics you need. Depending on your requirements, you have to decide how to handle refresh intervals, etc.
Another approach is to use an application-level cache; you can leverage Spring for that (Spring docs). You can populate the cache on startup and refresh it as necessary.
The task that runs every hour can be implemented by again leveraging Spring's @Scheduled annotation (Spring docs).
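Putting those two pieces together, a sketch could look like this. SessionMapper, SessionStats and the cache name are assumptions, and @EnableCaching/@EnableScheduling are expected to be enabled:

```java
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@Service
public class SessionStatsService {

    private final SessionMapper sessionMapper;   // assumed MyBatis mapper for the heavy totals query
    private final CacheManager cacheManager;

    public SessionStatsService(SessionMapper sessionMapper, CacheManager cacheManager) {
        this.sessionMapper = sessionMapper;
        this.cacheManager = cacheManager;
    }

    // The heavy totals query runs once and is then served from the cache,
    // so opening the website no longer waits on it.
    @Cacheable("sessionStats")
    public SessionStats getStats() {
        return sessionMapper.selectTotals();
    }

    // Every hour: drop the cached value so the next read refreshes it,
    // and check the terminal server state so admins can be notified.
    @Scheduled(fixedRate = 3_600_000)
    public void hourlyCheck() {
        cacheManager.getCache("sessionStats").clear();
        SessionStats stats = sessionMapper.selectTotals();
        if (stats.getAverageLoad() > 0.9) {
            // notify the admins here (mail, chat webhook, ...); omitted in this sketch
        }
    }
}
```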
To answer your question: don't use Akka; you have all the tools necessary to achieve this in the Spring ecosystem.
Akka is not very relevant here; it is an event-driven programming model that deals with concurrency issues to build highly scalable multithreaded applications.
You can use the Spring task scheduler to run the heavy queries periodically. If you want to keep it simple, you can solve your problem by storing data like total users, total sessions, etc., in the global application context and periodically updating it from the database with the Spring scheduler. You can also store the same data in a separate database table, so that it can be loaded easily at initialization time.
I really don't see why you need "memcached", "materialized views", "WebSockets", or other heavy technologies and frameworks for caching a small set of data. All you need is to maintain a set of global parameters in your application context and keep them updated with a scheduled task as frequently as desired.
I need to run a very long process in a Java-based Spring Boot web application. The process consists of the following steps:
- Get details for about 300,000 users from the database.
- Iterate over them.
- Generate a PDF file for each user using iText.
- Save the PDF file on the filesystem.
- Update the database to record that the PDF file for the given user has been created.
- Update the PDF path for the user in the database.
Now, this entire process can take a lot of time, maybe many hours or even days, as it consists of creating a PDF file for each user plus lots of DB updates.
Also, I need this process to run in the background so that the rest of the web application can run smoothly.
I am thinking of using Spring Batch or a messaging queue. I haven't really used either of them, so I am not sure whether they are the right frameworks for this kind of problem, or which of the two is the better fit.
What is the ideal way to implement this kind of task?
If you can't name a requirement you expect to be satisfied by a framework/library, you most likely don't need one...
Generating PDFs might need a lot of power; you might want to keep this background process away from your main web application, on its own machines.
If it's a simple Java process, it's usually easier to control and to move around your environment.
To me this looks like a simple task for "plain" Java (KISS). Or am I missing something?
I'd make sure the finder used to fetch the users from the database:
- is restartable, i.e. only fetches unprocessed users (in case you have to stop the processing because shit happens :-)
- runs in batches to keep the DB round trips and load low
- is multi-threadable, i.e. can fetch users split across a given number of threads (userId mod numberOfThreads, assuming userId is evenly distributed), so you can add more machines/threads if necessary; see the sketch after this list.
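A rough sketch of that plain-Java idea, with a hypothetical UserDao and PdfGenerator standing in for the real code:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PdfBatchRunner {

    private static final int NUMBER_OF_THREADS = 4;
    private static final int BATCH_SIZE = 500;

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(NUMBER_OF_THREADS);
        UserDao userDao = new UserDao();               // assumed finder over the user table
        PdfGenerator pdfGenerator = new PdfGenerator(); // assumed iText wrapper

        for (int partition = 0; partition < NUMBER_OF_THREADS; partition++) {
            final int p = partition;
            pool.submit(() -> {
                List<User> batch;
                // only unprocessed users, fetched in pages, restricted to this partition
                // (userId mod NUMBER_OF_THREADS == p)
                while (!(batch = userDao.findUnprocessed(p, NUMBER_OF_THREADS, BATCH_SIZE)).isEmpty()) {
                    for (User user : batch) {
                        String path = pdfGenerator.generate(user);  // write the PDF to disk
                        userDao.markProcessed(user.getId(), path);  // restartability marker + PDF path
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(7, TimeUnit.DAYS);
    }
}
```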
You should use Spring Batch for this process. When the user presses the button, you launch the job asynchronously. It then runs in a separate thread and processes all your records. The current status of the job can be obtained from the job repository. Spring Batch is made for this type of processing.
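A minimal sketch of the launch side, assuming the job itself (reader/processor/writer doing the PDF work) is defined elsewhere and the JobLauncher is configured with an async TaskExecutor so the call returns immediately; the endpoint path is illustrative:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class PdfJobController {

    private final JobLauncher jobLauncher;   // configured with an async TaskExecutor
    private final Job generatePdfJob;        // assumed job bean defined elsewhere

    public PdfJobController(JobLauncher jobLauncher, Job generatePdfJob) {
        this.jobLauncher = jobLauncher;
        this.generatePdfJob = generatePdfJob;
    }

    @PostMapping("/pdf-job")
    public ResponseEntity<String> start() throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addLong("startedAt", System.currentTimeMillis())  // make each run unique
                .toJobParameters();
        JobExecution execution = jobLauncher.run(generatePdfJob, params);
        return ResponseEntity.accepted()
                .body("job started with id " + execution.getJobId());
    }
}
```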