We have a microservice written with Spring Boot that has its own NoSQL datastore. We are working on functionality to delete old data (on the order of 0.5 million documents) on a regular basis (once a day), based on the presence of records of a particular type in the datastore.
Is a scheduler that runs once a day and does the deletion the right approach? Also, since it's a microservice and several instances of it will be running, how do we make sure the scheduler runs on only one instance?
There are multiple options I can think of:
If only a single instance of the microservice is deployed, you can use something like Quartz to schedule the job.
Create a RESTful API for the cleanup and invoke it from a script; see https://stackoverflow.com/a/15090893/2817980 for an example. Since each call is handled by a single instance, only one instance of the service works on the cleanup.
If the instances run in a master-slave setup, have the master assign the cleanup to exactly one instance.
Create a scheduled job using something like Quartz on every instance, and before running it check in ZooKeeper, Redis, a database or any other shared storage whether the job has already been taken up by another instance.
I can discuss more on this.
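The last option (every instance schedules the job, but checks shared storage first) can be sketched roughly as below. This is only an illustration under assumptions: JobLockStore, InMemoryJobLockStore, CleanupJob, the lock name "daily-cleanup" and the one-hour lease are all made up for the example, and the in-memory store merely stands in for Redis, ZooKeeper or a database table so the sketch runs in a single JVM.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical lock store; a real one would be backed by shared storage. */
interface JobLockStore {
    /** Returns true only if the caller now holds the lock for jobName until now + ttl. */
    boolean tryAcquire(String jobName, String instanceId, Instant now, Duration ttl);
}

/** In-memory stand-in, used only to make the sketch runnable in one JVM. */
class InMemoryJobLockStore implements JobLockStore {
    private static final class Lease {
        final String owner;
        final Instant expiresAt;

        Lease(String owner, Instant expiresAt) {
            this.owner = owner;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentMap<String, Lease> leases = new ConcurrentHashMap<>();

    @Override
    public boolean tryAcquire(String jobName, String instanceId, Instant now, Duration ttl) {
        Lease candidate = new Lease(instanceId, now.plus(ttl));
        // compute() is atomic per key: the lease is replaced only if absent or expired.
        Lease winner = leases.compute(jobName, (key, current) ->
                (current == null || !current.expiresAt.isAfter(now)) ? candidate : current);
        return winner == candidate;
    }
}

/** Called by the daily scheduler on every instance; only the lock holder does the work. */
class CleanupJob {
    private final JobLockStore locks;
    private final String instanceId;
    private final Runnable cleanup;

    CleanupJob(JobLockStore locks, String instanceId, Runnable cleanup) {
        this.locks = locks;
        this.instanceId = instanceId;
        this.cleanup = cleanup;
    }

    /** Returns true if this instance acquired the lease and ran the cleanup. */
    boolean runIfLockAcquired(Instant now) {
        if (!locks.tryAcquire("daily-cleanup", instanceId, now, Duration.ofHours(1))) {
            return false; // another instance already took the job
        }
        cleanup.run();
        return true;
    }
}
```

The lease has an expiry so that a crashed instance does not block the next day's run. The lease should be longer than the clock skew between instances and shorter than the interval between runs.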
Related
I have a terminal server monitoring project. On the backend I use Spring MVC, MyBatis and PostgreSQL. Basically, I query session information from the DB, send it back to the front end and display it to users. But some large queries (such as total users, total sessions, etc.) slow the system down when a user opens the website, so I want to run these queries as asynchronous tasks so that the website opens quickly instead of waiting for them. I would also like to check the terminal server state in the DB periodically (every hour) and notify the admins if a terminal server fails or the average load is too high. I do not know what I should use: maybe Akka, or is there another way to do these two jobs (1. run the large queries asynchronously, 2. run some periodic queries)? Please help me, thanks!
You can achieve this using Spring and caching where necessary.
If the data you're displaying is not required to be "in real-time", but it can be "near real-time" you can read the data from the DB periodically and cache it. Your app then reads from the cache.
There are different approaches you can explore.
You can try to create a materialized view in PostgreSQL that holds the statistical data you need. Depending on your requirements, you will have to work out how to handle refresh intervals etc.
Another approach is to use an application-level cache; you can leverage Spring for that (Spring docs). You can populate the cache on startup and refresh it as necessary.
The task that runs every hour can again be implemented with Spring, using the @Scheduled annotation (Spring docs).
To answer your question - don't use Akka - you have all the tools necessary to achieve the task in the Spring ecosystem.
Akka is not very relevant here; it provides an event-driven programming model that deals with concurrency issues in order to build highly scalable multithreaded applications.
You can use the Spring task scheduler to run heavy queries periodically. If you want to keep it simple, you can solve your problem by storing data such as total users, total sessions, etc. in the global application context and periodically updating it from the database using the Spring scheduler. You can also store the same data in a separate database table, so that it can be loaded easily at initialization time.
I really don't see why you need "memcached", "materialized views", "Websockets" and other heavy technologies and frameworks for caching a small set of data. All you need is to maintain a set of global parameters in your application context and keep them updated with a scheduled task as frequently as desired.
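To make the "keep a few values in memory and refresh them on a schedule" idea concrete, here is a minimal sketch using only the JDK. The names Stats, StatsCache and the loader are assumptions for the example; the loader stands in for the heavy MyBatis queries. In a Spring application the refresh method could instead be driven by @Scheduled (for example with a fixedDelay), so the executor here is just a stand-in for the Spring scheduler.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Immutable snapshot of the expensive statistics (field names are illustrative). */
final class Stats {
    final long totalUsers;
    final long totalSessions;

    Stats(long totalUsers, long totalSessions) {
        this.totalUsers = totalUsers;
        this.totalSessions = totalSessions;
    }
}

/** Holds the latest snapshot; page requests read it without touching the database. */
class StatsCache implements AutoCloseable {
    private final AtomicReference<Stats> latest = new AtomicReference<>();
    private final Supplier<Stats> loader; // stands in for the heavy DB queries
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    StatsCache(Supplier<Stats> loader) {
        this.loader = loader;
    }

    /** Loads once synchronously, then refreshes in the background after each period. */
    void start(long period, TimeUnit unit) {
        refresh();
        scheduler.scheduleWithFixedDelay(this::refreshQuietly, period, period, unit);
    }

    /** Runs the loader and atomically publishes the new snapshot. */
    void refresh() {
        latest.set(loader.get());
    }

    private void refreshQuietly() {
        try {
            refresh();
        } catch (RuntimeException e) {
            // Keep serving the old snapshot; an uncaught exception would cancel
            // all further runs of this scheduled task. A real app would log here.
        }
    }

    /** Returns the most recently loaded snapshot (null before the first load). */
    Stats get() {
        return latest.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A controller would then call get() and render whatever snapshot is current, so page loads never wait for the large queries.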
I am looking for a good way to execute a scheduled task on only one of several instances.
The problem:
I have a Java server built with Spring Boot, with a scheduled task that runs via the @Scheduled(cron="...") annotation. My application sits behind a load balancer and usually runs as 3 instances. The scheduled task updates a Postgres DB, and it always runs on all 3 servers simultaneously.
How can I run the scheduled task on only one of the servers?
Thanks a lot!
You have to select a leader somehow, and leader election can be quite hard: https://en.wikipedia.org/wiki/Consensus_(computer_science). There are, however, quite a lot of tools that can help with it.
I personally like http://curator.apache.org/ a lot. However, depending on the tools you already use, there might already be something that provides the needed leader election support, like Redis (https://redis.io/topics/distlock) or your database (Postgres -> Advisory Locks).
The simplest solution however, if you do not need failover capabilities, is to configure one app as your lead in a config file and do not execute the task when the config is not set.
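As a rough sketch of that simplest variant: every instance still fires the schedule, but the task body returns immediately unless the instance is configured as the lead. The property name scheduler.leader and the classes LeaderFlag and NightlyUpdate are invented for this example; in Spring Boot the flag would more likely come from application.properties.

```java
import java.util.Properties;

/** Reads the (hypothetical) leader flag from configuration. */
class LeaderFlag {
    static final String KEY = "scheduler.leader"; // assumed property name

    /** An instance is the lead only if the flag is explicitly set to "true". */
    static boolean isLeader(Properties config) {
        return Boolean.parseBoolean(config.getProperty(KEY, "false"));
    }
}

/** Wraps the scheduled work so that non-lead instances skip it. */
class NightlyUpdate {
    private final boolean leader;
    private final Runnable task;

    NightlyUpdate(Properties config, Runnable task) {
        this.leader = LeaderFlag.isLeader(config);
        this.task = task;
    }

    /** Invoked by the scheduler on every instance; returns true if the task actually ran. */
    boolean runScheduled() {
        if (!leader) {
            return false; // flag not set: this instance does nothing
        }
        task.run();
        return true;
    }
}
```

The trade-off is the one mentioned above: if the lead instance is down, nobody runs the task until the configuration is changed.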
I'm looking to use Cassandra to store data that will be picked up by some scheduling framework and processed asynchronously at some time in the future. Does anyone know of any scheduling frameworks that can be hooked up to Cassandra so that multiple instances of the scheduling framework can work on the data set in parallel? The challenge is that when a particular row is picked up by one instance of the scheduling framework, it should not be picked up by any other instance. In an RDBMS I know we can achieve that with row locking, but I'm not sure whether there is a clean way to achieve it in Cassandra. Please let me know if there is any framework of that nature for picking up those tasks.
I'm implementing a Java application that uses Hibernate for DB management (MySQL 6.0).
One table in my database has a column that stores a future date, like 09/09/2014.
I need a way to do some work on that table (and maybe on another one) when that date becomes the current day.
I was thinking of using a trigger for that, but unfortunately I have no idea how.
Is it possible to do that using Hibernate? Obviously, after the table changes, the data in my application should be updated.
I am open to any solution on either side, Java or Hibernate.
If you are developing a Java EE application (i.e. a web application served by an application server like Wildfly/JBoss AS), then you can use the EJB Scheduler. It lets you have a business class triggered on a schedule, and from there you can obtain an EntityManager instance and manipulate the data you need. More info here: http://docs.oracle.com/javaee/6/tutorial/doc/bnboy.html
I suggest you use the Quartz Scheduler. It's a Java API for triggering activities at a specified time.
You can specify when the scheduler should start. In your code you can read the value from the database, and Quartz will wait until that time and then start whatever process you specify.
Here you can see some samples and documents.
You can download the API, or it's available via Maven.
If you are using Spring, Quartz can be used with it as well.
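To illustrate the "read the date from the database and wait until then" idea without pulling in Quartz, here is a small sketch that uses the JDK's ScheduledExecutorService instead. It is not Quartz's API; DueDateScheduler and its methods are names made up for the example, and the date is passed in rather than loaded through Hibernate.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.ZonedDateTime;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Schedules a one-off action for the start of a stored due date. */
class DueDateScheduler {

    /** Time from now until midnight at the start of dueDate (in now's zone); zero if already past. */
    static Duration delayUntil(LocalDate dueDate, ZonedDateTime now) {
        ZonedDateTime startOfDueDay = dueDate.atStartOfDay(now.getZone());
        Duration delay = Duration.between(now, startOfDueDay);
        return delay.isNegative() ? Duration.ZERO : delay;
    }

    /** Runs action once the due date has started; immediately if it already has. */
    static ScheduledFuture<?> scheduleOn(ScheduledExecutorService executor,
                                         LocalDate dueDate,
                                         ZonedDateTime now,
                                         Runnable action) {
        long delayMillis = delayUntil(dueDate, now).toMillis();
        return executor.schedule(action, delayMillis, TimeUnit.MILLISECONDS);
    }
}
```

Note that an in-memory schedule like this is lost on restart, so the application would have to re-read the dates at startup. Quartz with a persistent job store handles that case for you.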
I am looking for a pattern and/or framework which can model the following problem in an easily configurable way.
Every, say, 3 minutes I need a set of jobs to kick off in a web application context; they will concurrently hit web services to obtain the latest version of the data and push it to a database. The problem is that the database will be heavily used for reads in order to do tons of complex calculations on the data. We are currently using Spring, so I have been looking at Spring Batch to run this process. Does anyone have any suggestions/patterns/examples of using Spring or other technologies for a similar system?
We have used ServletContextListeners to kick off TimerTasks in our web applications when we needed processes to run repeatedly. The ServletContextListener fires when the app server starts the application or when the application is restarted. The timer tasks then act like a separate thread that repeats your code at the specified interval.
ServletContextListener
http://www.javabeat.net/examples/2009/02/26/servletcontextlistener-example/
TimerTask
http://enos.itcollege.ee/~jpoial/docs/tutorial/essential/threads/timer.html
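A minimal sketch of the timer part, using only java.util.Timer so that it runs outside a servlet container. RepeatingJob is a made-up name. In a web application, start() would be called from ServletContextListener.contextInitialized and stop() from contextDestroyed.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Runs a piece of work repeatedly on a background timer thread. */
class RepeatingJob {
    // Daemon thread, so a forgotten timer does not keep the JVM alive.
    private final Timer timer = new Timer("repeating-job", true);

    /** Starts running work immediately and then again every periodMillis. */
    void start(final Runnable work, long periodMillis) {
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    work.run();
                } catch (RuntimeException e) {
                    // An uncaught exception would terminate the timer thread and
                    // stop all further runs, so swallow (or log) it here.
                }
            }
        }, 0L, periodMillis);
    }

    /** Cancels the timer; call this when the application shuts down. */
    void stop() {
        timer.cancel();
    }
}
```

For the 3-minute case the period would be 3 * 60 * 1000 milliseconds. Because a single Timer thread runs its tasks one after another, jobs that must hit several web services concurrently would need their own thread pool inside work.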
Is refactoring the job out of the web application and into a standalone app a possibility?
That way you could stick the batch job onto a separate batch server (so that the extra load of the batch job wouldn't impact your web application), which then calls the web services and updates the database. The job can then be kicked off using something like cron or Autosys.
We're using Spring-Batch for exactly this purpose.
The database design would also depend on what the batched data is used for. If it is for reporting purposes, I would recommend separating the operational database from the reporting database, using a database link to obtain the required data from the operational database into the reporting database and then running the complex queries on the reporting database. That way the load is shifted off the operational database.
I think it's also worth looking into integration frameworks like Apache Camel. Also take a look at the so-called Enterprise Integration Patterns. Check the catalog; it might give you some useful vocabulary for thinking about the scaling/scheduling problem at hand.
The framework itself integrates really well with Spring.