Strategies for handling repetitive background tasks in a Java web application?


I'm building a personal web application using Java EE 6 technologies (the container is an application server, JBoss AS 7). I'm starting from scratch on repetitive background tasks, and I have identified two possible scenarios:
Scheduled tasks (e.g., sending bulk mails every Sunday night)
Triggered tasks based on web events (e.g., running some long background updates from a web action)
What I want to avoid (I don't know if it is possible) is having background tasks scattered around my platform (some of them using cron, others using TimerTask, DB jobs, etc.), which becomes difficult to maintain.
What are the different approaches to handling repetitive background tasks in a Java web application, taking into account the two previous requirements?
Related:
Scheduled Tasks for Web Applications
Scheduled task in a web application?

With EE6 you can get rid of Quartz in almost all situations by using the TimerService with @Timeout annotations.
And you don't need to write a line of XML to get it working.
There is a nice example in the EE Night Hacks book, also available as source here.
You can add an @Timeout method to the bean that processes your triggering web events. This way, both kinds of tasks can be maintained in one place. You can also modify the timer settings from the trigger events.

I'd also look at Quartz. I can't comment on the EE6 TimerService as a substitute since I haven't used it, but I found Quartz to be quite useful.
When I used it (quite a few years ago now), it had a config file that closely resembled what you'd find for cron. You could use that to call whatever methods you need to perform your scheduled jobs, and then simply provide some other mechanism to call the method on demand.

Related

How can I avoid duplication of business logic when batch processing?

I have a web application dedicated to batch processing (the batch service from here on out, API driven) and a main web application dedicated to everything else. I've been struggling to decide on the best way to avoid duplicating business logic in the batch service. Both applications are clustered. The separation for batch processing has been okay for simple jobs, but I have more complex jobs where duplicating the business logic would cause chaos. Here's my use case for the purposes of this question:
Customer schedules a cron job for user updates.
Batch service is given a CSV file with 20,000 user records.
The batch service rips through the file performing validation on the records, basically a dry run.
The batch service will check the allowable change and error thresholds (percentages or counts).
If validation thresholds pass, the batch service will begin creating/updating users.
When users are created or updated, there are a number of modules/features that need to know about these events.
Job progress is tracked and customer can view progress, logs, and status of job.
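The threshold step of the dry run above could look something like the following minimal sketch. The class and method names are hypothetical, and it assumes each limit is configured either as a percentage of all records or as an absolute count.

```java
/**
 * Hypothetical dry-run gate: the job moves on to creating/updating users only if the
 * number of invalid records and the number of changed records stay within their limits.
 */
public final class ThresholdCheck {

    /** A limit given either as a percentage (0-100) of all records or as an absolute count. */
    public static final class Limit {
        private final double value;
        private final boolean percentage;

        private Limit(double value, boolean percentage) {
            this.value = value;
            this.percentage = percentage;
        }

        public static Limit percent(double p) { return new Limit(p, true); }

        public static Limit count(long c) { return new Limit(c, false); }

        /** True if {@code observed} out of {@code total} records is within this limit. */
        boolean allows(long observed, long total) {
            if (percentage) {
                return total == 0 || observed * 100.0 / total <= value;
            }
            return observed <= value;
        }
    }

    private ThresholdCheck() { }

    /** Returns true if the dry-run results allow the real create/update phase to start. */
    public static boolean passes(long total, long errors, long changes,
                                 Limit errorLimit, Limit changeLimit) {
        return errorLimit.allows(errors, total) && changeLimit.allows(changes, total);
    }

    public static void main(String[] args) {
        // 20,000 records: 150 invalid (0.75%) and 4,000 changed, against a 1% / 5,000 limit.
        System.out.println(passes(20_000, 150, 4_000, Limit.percent(1.0), Limit.count(5_000))); // true
    }
}
```

Keeping the check a pure function makes it easy to share between the batch service and the main application, which is the duplication concern raised in this question.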
Here are a few solutions I have been thinking about:
Jar up the business logic and share it across the two applications. This wouldn't necessarily be easy because the main application is a Grails application and it's got GORM littered throughout.
Have the batch service hit APIs on the main application for the creates and updates, and possibly for the more complex validation scenarios. I'm worried about the toll this would take on Tomcat, but the calls would go through the load balancer, so they would be distributed.
Have the batch service hit APIs on the main application for validation, then queue create/update requests and let the main application retrieve them. Same as above, but the queue would help reduce HTTP calls. I would also need a queue to report status back to the batch service.
Duplicate some logic by having the batch service do its own validation and inserts/updates, but then fire a user-created or user-updated event so modules/features in the main app can deal with the changes.
Embed the batch processing service into the main application.
Other details:
The batch service and web application are both clustered
Both are running on AWS, so I have tools like SQS and SNS easily accessible
Java 1.7 applications
Tomcat containers
Main application is Grails
Batch service uses Spring Batch and Quartz at its core
So my question is what are accepted ways to avoid duplication of business logic based on the details above? Can/Should the architecture be changed to better accommodate this?
Another idea to consider is what this would look like in a "microservices" architecture. That word has been tossed around a number of times in the office, and we have been considering breaking up the main web application into services. For example, we may end up with a service for user management.

Say for example you are using a Java EE 6 application.
Your CSV batch updater could be nothing more than a Timer that, every once in a while, reads a CSV file dumped in a folder and, for each user update encoded in that file, pumps a message onto a queue describing the update you want to make.
Somewhere else, you have a message-driven bean that reacts to the update request message and triggers the update business logic for the user named in the JMS message.
After the transaction has committed successfully, if you have ten different applications that are interested in knowing that the user was updated, you could post a message to, for example, a notification topic with, say, messageType='userUpdated'.
Each of your 10 applications that cares about this could be a consumer on this topic.
They would be informed that a user was updated and might internally publish a local event (e.g. a CDI event, a Guava event, whatever), so that the internal stakeholders would know about it.
Every technology stack normally has its own equivalents of these Java EE facilities.
Every decent technology stack offers ways to promote loose coupling between UI and business logic, precisely so that HTML / WEB is just viewed as one of many entry points to an application's business logic.
In Scala, there is the Akka framework, which looks super interesting.
The point is, as long as your business logic is not written in some place that only the web application can tap into, you're fine. Otherwise, you've made the design decision to couple your business logic with the UI.
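A rough in-process analogy of the flow described in this answer is sketched below, using a `BlockingQueue` in place of the JMS queue and a list of listeners in place of the notification topic. It only shows the shape of the design: all names are hypothetical, and a real Java EE deployment would use JMS destinations, a timer bean and a message-driven bean instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** In-process stand-in for the timer -> queue -> MDB -> topic flow (hypothetical names). */
public final class UserUpdatePipeline {

    /** Stand-in for the JMS queue: one update request per CSV line. */
    private final BlockingQueue<String[]> updateQueue = new LinkedBlockingQueue<>();
    /** Stand-in for the notification topic: every subscriber sees every message. */
    private final List<Consumer<String>> topicSubscribers = new CopyOnWriteArrayList<>();
    /** Stand-in for the user table. */
    private final Map<String, String> users = new ConcurrentHashMap<>();

    public void subscribe(Consumer<String> subscriber) { topicSubscribers.add(subscriber); }

    /** What the timer would do: turn each CSV line ("id,email") into a queued update request. */
    public void enqueueCsv(List<String> csvLines) {
        for (String line : csvLines) {
            updateQueue.add(line.split(",", 2));
        }
    }

    /** What the message-driven bean would do: apply each update, then notify the "topic". */
    public int drainAndApply() {
        List<String[]> batch = new ArrayList<>();
        updateQueue.drainTo(batch);
        for (String[] update : batch) {
            users.put(update[0], update[1]);          // the update business logic
            for (Consumer<String> s : topicSubscribers) {
                s.accept("userUpdated:" + update[0]); // published only after the update is applied
            }
        }
        return batch.size();
    }

    public String email(String id) { return users.get(id); }

    public static void main(String[] args) {
        UserUpdatePipeline pipeline = new UserUpdatePipeline();
        pipeline.subscribe(msg -> System.out.println("reporting module got " + msg));
        pipeline.enqueueCsv(Arrays.asList("u1,a@example.com", "u2,b@example.com"));
        pipeline.drainAndApply();
    }
}
```

The point of the split is that neither the CSV reader nor the subscribers call the update logic directly; they only exchange messages, so the business logic lives in one place.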

In your case, I would suggest separating by concern: one plugin that contains only the domain classes (if you are using Grails), other plugins that take care of the services, and so on. Together these would form the application core. I think this is much easier if your application contains a lot of KLOC; moving to microservices will take you too much time if you have a lot of calls between modules.
Communication between functional modules (aka plugins) can be done via events; see the events-si or RabbitMQ plugins.

Best Way to Update Database Periodically in Java Web Application

I have a Java web application running on a GlassFish server, and I deploy it to various servers as a WAR file. To keep my application's database updated, I want to run some class (from inside the application) periodically without any user interaction (it should not depend on whether the application is running or not, on current users, or on sessions). I have seen that I can run any job periodically using the Timer and TimerTask classes. But how do I initialize it for the first time?
Please put your thoughts on how to complete this process.

Use a job scheduler. Consider Quartz (http://quartz-scheduler.org/) and start it when the program starts. The good part about using a scheduler is that your program is more maintainable and you can easily create new jobs.

Create a servlet and make it load on startup. I think you can initialize your task there.

Quartz is a good solution, as already suggested. But if you need something more lightweight, I would have a look at the ScheduledExecutorService:
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ScheduledExecutorService.html
It is less flexible than Quartz, but you don't need to add any dependency and it might be that it is good enough for your needs.
As for starting up: I normally use Spring to wire up my application and its dependencies, so starting schedulers and running scheduled tasks is then a no-brainer.
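A minimal sketch of the ScheduledExecutorService route, assuming the periodic database update is passed in as a `Runnable` (the class name is hypothetical). `start()` could be called from a ServletContextListener, a load-on-startup servlet, or Spring, as the answers here suggest, and `stop()` on shutdown.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Periodic job runner built only on the JDK (hypothetical name). */
public final class PeriodicRefresher {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Runnable job;

    public PeriodicRefresher(Runnable job) { this.job = job; }

    /** Runs the job after initialDelay, then every period; call once at application startup. */
    public ScheduledFuture<?> start(long initialDelay, long period, TimeUnit unit) {
        // Catch runtime exceptions: one escaping the task would suppress all later executions.
        Runnable safe = () -> {
            try {
                job.run();
            } catch (RuntimeException e) {
                e.printStackTrace();
            }
        };
        return scheduler.scheduleAtFixedRate(safe, initialDelay, period, unit);
    }

    /** Call at application shutdown so the worker thread does not outlive the application. */
    public void stop() { scheduler.shutdownNow(); }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch threeRuns = new CountDownLatch(3);
        PeriodicRefresher refresher = new PeriodicRefresher(() -> {
            System.out.println("refreshing database...");
            threeRuns.countDown();
        });
        refresher.start(0, 100, TimeUnit.MILLISECONDS);
        threeRuns.await(5, TimeUnit.SECONDS);
        refresher.stop();
    }
}
```

The try/catch wrapper matters because, per the ScheduledExecutorService contract, a task that throws is silently not rescheduled.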

The answer changes depending on the version of Java EE you are using. In Java EE 5 and previous versions, you would use a ServletContextListener to run code at deployment time (calling an EJB) that uses the Timer API. In Java EE 6+, you can use the @Schedule annotation, which uses a cron-like syntax to schedule your task at deployment time.
Of course, if you don't need automatic scheduling at deployment time, you'd just create a web form that calls an EJB when submitted, which in turn calls the Timer API programmatically.
For more, see the Java EE tutorial.

Fast Multithreaded Online Processing Application Framework Suggestions

I am looking for a pattern and/or framework which can model the following problem in an easily configurable way.
Every 3 minutes or so, I need a set of jobs to kick off in a web application context that will concurrently hit web services to obtain the latest version of the data and push it to a database. The problem is that the database will be heavily used for reading the data to do lots of complex calculations on it. We are currently using Spring, so I have been looking at Spring Batch to run this process. Does anyone have any suggestions, patterns, or examples of a similar system built with Spring or other technologies?
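The cycle described in the question can be sketched with plain JDK executors, independently of Spring Batch: one thread fires every three minutes, and each cycle fans the web-service calls out to a small pool and then hands the combined results to the database writer in one batch. The fetch tasks and the database writer are hypothetical callbacks, and the 60-second timeout per cycle is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Scheduled fan-out/fan-in cycle (hypothetical name and callbacks). */
public final class ConcurrentFetchCycle {

    private final ExecutorService fetchPool;
    private final List<Callable<String>> fetchers;        // one per web service (hypothetical)
    private final Consumer<List<String>> databaseWriter;  // batched DB write (hypothetical)

    public ConcurrentFetchCycle(int threads, List<Callable<String>> fetchers,
                                Consumer<List<String>> databaseWriter) {
        this.fetchPool = Executors.newFixedThreadPool(threads);
        this.fetchers = fetchers;
        this.databaseWriter = databaseWriter;
    }

    /** One cycle: run all fetches in parallel, keep the successful results, write them once. */
    public int runOnce() throws InterruptedException {
        List<String> results = new ArrayList<>();
        // invokeAll waits for every task or cancels the unfinished ones after the timeout.
        for (Future<String> f : fetchPool.invokeAll(fetchers, 60, TimeUnit.SECONDS)) {
            try {
                results.add(f.get());
            } catch (ExecutionException | CancellationException e) {
                // A failed or timed-out service should not abort the whole cycle.
            }
        }
        databaseWriter.accept(results); // one batched write per cycle instead of one per service
        return results.size();
    }

    /** Fixed delay: the next cycle starts 3 minutes after the previous one finishes. */
    public ScheduledExecutorService scheduleEveryThreeMinutes() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleWithFixedDelay(() -> {
            try {
                runOnce();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 0, 3, TimeUnit.MINUTES);
        return timer;
    }

    public void shutdown() { fetchPool.shutdownNow(); }
}
```

Separating `runOnce()` from the scheduling makes the cycle easy to trigger manually or from a batch framework later.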

We have used ServletContextListeners to kick off TimerTasks in our web applications when we needed processes to run repeatedly. The ServletContextListener fires when the app server starts the application or when the application is restarted. The timer tasks then run on a separate background thread that repeats your code at the specified interval.
ServletContextListener
http://www.javabeat.net/examples/2009/02/26/servletcontextlistener-example/
TimerTask
http://enos.itcollege.ee/~jpoial/docs/tutorial/essential/threads/timer.html
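A minimal sketch of that approach follows, kept to the JDK so it runs on its own: `start()` and `stop()` are meant to be called from the listener's `contextInitialized()` and `contextDestroyed()` methods (that wiring is assumed, not shown). The class name is hypothetical.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Timer/TimerTask wrapper meant to be driven by a ServletContextListener (hypothetical name). */
public final class RepeatingJob {

    private Timer timer;

    /** Starts the work on a daemon timer thread, first after delayMs, then every periodMs. */
    public synchronized void start(Runnable work, long delayMs, long periodMs) {
        if (timer != null) {
            return; // already started; ignore duplicate calls
        }
        timer = new Timer("repeating-job", true); // daemon: does not block JVM shutdown
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                try {
                    work.run();
                } catch (RuntimeException e) {
                    // An exception escaping run() terminates the Timer thread for good.
                    e.printStackTrace();
                }
            }
        }, delayMs, periodMs);
    }

    /** Cancels the timer; otherwise its thread can keep running after the app is undeployed. */
    public synchronized void stop() {
        if (timer != null) {
            timer.cancel();
            timer = null;
        }
    }
}
```

Cancelling in `contextDestroyed()` is the important half: a Timer thread left running holds references to the web application's classes after undeploy.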

Is refactoring the job out of the web application and into a standalone app a possibility?
That way you could stick the batch job onto a separate batch server (so that the extra load of the batch job wouldn't impact your web application), which then calls the web services and updates the database. The job can then be kicked off using something like cron or Autosys.
We're using Spring-Batch for exactly this purpose.
The database design would also depend on what the batched data is used for. If it is for reporting purposes, I would recommend separating the operational database from the reporting database, using a database link to obtain the required data from the operational database into the reporting database and then running the complex queries on the reporting database. That way the load is shifted off the operational database.

I think it's also worth looking into frameworks like Apache Camel. Also take a look at the so-called Enterprise Integration Patterns. Check the catalog; it might provide you with some useful vocabulary for thinking about the scaling/scheduling problem at hand.
The framework itself integrates really well with Spring.

Running scheduled methods on tomcat

I am trying to set up a method that will be run automatically by the server at a specific time, for instance a method that sends out emails to contacts every Friday at 9:00 am. I have seen methods that run when the server first starts, and I was wondering whether what I want to do is possible. If it is, can someone point me to where I can start reading up on how to do it? Any help will be highly appreciated.

There is an excellent library, Quartz, which can help you create scheduled tasks within your application. See, e.g., the Job Scheduling in Java guide by O'Reilly.

If you really want to do it manually (and not use specific tools like Quartz), you could use a Timer, which would be created when the application is deployed and canceled when the application is destroyed, using a ServletContextListener declared in your web.xml.
Be prepared for additional complexity if your application is clustered on multiple servers, though.
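If you go the manual route, the trickiest part of the "every Friday at 9:00 am" requirement is computing when the next run is due. A minimal sketch using `java.time`, with a `ScheduledExecutorService` swapped in for `Timer` (both are JDK classes); the class name is hypothetical, and the task would be the mail job.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.TemporalAdjusters;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Weekly scheduling on top of a ScheduledExecutorService (hypothetical name). */
public final class WeeklySchedule {

    private WeeklySchedule() { }

    /** The next occurrence of {@code day} at {@code time} that is strictly after {@code now}. */
    public static ZonedDateTime nextOccurrence(ZonedDateTime now, DayOfWeek day, LocalTime time) {
        ZonedDateTime candidate = now.with(TemporalAdjusters.nextOrSame(day)).with(time);
        return candidate.isAfter(now) ? candidate : candidate.plusWeeks(1);
    }

    /** Runs {@code task} every week at the given day and local time in {@code zone}. */
    public static void scheduleWeekly(ScheduledExecutorService scheduler, ZoneId zone,
                                      DayOfWeek day, LocalTime time, Runnable task) {
        scheduleAt(scheduler, nextOccurrence(ZonedDateTime.now(zone), day, time), task);
    }

    private static void scheduleAt(ScheduledExecutorService scheduler, ZonedDateTime target,
                                   Runnable task) {
        long delayMs = Math.max(0,
                Duration.between(ZonedDateTime.now(target.getZone()), target).toMillis());
        scheduler.schedule(() -> {
            try {
                task.run();
            } finally {
                // Re-arm even if the task threw. plusWeeks keeps the same local time across
                // DST changes, which a fixed 7-day period would not.
                scheduleAt(scheduler, target.plusWeeks(1), task);
            }
        }, delayMs, TimeUnit.MILLISECONDS);
    }
}
```

In a web application, `scheduleWeekly(scheduler, zone, DayOfWeek.FRIDAY, LocalTime.of(9, 0), mailJob)` would be called from `contextInitialized()` and `scheduler.shutdownNow()` from `contextDestroyed()`. On a cluster, every node would do this, which is exactly the extra complexity mentioned above.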

I also recommend using Quartz as Johan already suggested, it is a well-established solution for job scheduling in Java applications and also allows for central job storage in a database and clustering of multiple Tomcat instances.
In case your web application uses the Spring Framework, you could use its built-in scheduling support instead.

How to use Quartz with EJB3?

I want to be able to :
define different jobs and triggers.
modify the expiration dates and intervals on demand
pause or cancel an execution (trigger)
the jobs would be EJBs or would call EJBs, and I would want to manage everything from the website (the user will have to define the executions)
So I looked at the TimerService, timer objects, Timer and TimerHandle, but I don't think they can meet all my needs.
Quartz, on the other hand, allows me to do everything I want, but I haven't the slightest clue how to integrate it into my JBoss.
I read that Quartz uses its own thread pool, and I don't know how to handle all this.
I use JBoss Seam in my project, but the Seam/Quartz integration is very limited (or the documentation is) and not 100% safe (seen on their forum: 'run forever' tasks end after only a few weeks).
If someone has managed to integrate a good scheduler into their application server (JBoss is a plus) and could give me directions, advice, or even code snippets, I would be thrilled.
Thanks in advance.

I have some experience integrating Quartz into a WebLogic application server (no JBoss experience, sorry). Quartz has a built-in listener class that is called on server startup (per the J2EE specs) and automatically configures the Quartz scheduler. Then, in another startup class, you can retrieve that scheduler, add jobs, and begin serving those jobs.
You generally don't need to worry about the thread pool; Quartz can handle all of this itself if you want it to. It reads its settings on startup from a properties file that you can define, or you can use the default one that comes with Quartz. I have been using the default because it works for my purposes.
As far as defining jobs goes, you create your job classes and call your EJBs from there. It is quite simple.
For your reading pleasure:
All Quartz documentation
Quartz JavaDoc
Cookbook containing lots of code snippets
Hope that's enough to get you started!

Great news! JBoss has a built-in scheduler already.
Since the EJB 2.1 specification added running stateless session beans and MDBs at scheduled intervals (the Timer Service), all application servers have included this capability for some time now.
Here is an example of configuring JBoss to run a class using its built-in scheduler:
http://www.jboss.org/community/wiki/Scheduler
The best part about JBoss' implementation is that it is based on the MBean specification, which means that you can create/update/delete scheduled tasks at runtime.
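To illustrate what MBean-based scheduling buys you, here is a plain-JDK analogy (not JBoss' Scheduler service, whose configuration is described on the wiki page above): a job wrapped in a standard MBean whose period can be changed, and which can be paused or resumed, from any JMX client such as JConsole while the application runs. The class, interface, and ObjectName are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import javax.management.JMException;
import javax.management.ObjectName;

/** Runtime-reconfigurable job exposed over JMX (hypothetical names throughout). */
public final class JmxScheduling {

    private JmxScheduling() { }

    /** Standard MBean interface: its name must be the implementation class name plus "MBean". */
    public interface JobControlMBean {
        long getPeriodSeconds();             // read/write attribute "PeriodSeconds"
        void setPeriodSeconds(long seconds);
        boolean isPaused();                  // read-only attribute "Paused"
        void pause();                        // operation
        void resume();                       // operation
    }

    /** A periodic job whose schedule can be changed at runtime through JMX. */
    public static final class JobControl implements JobControlMBean {
        private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        private final Runnable job;
        private long periodSeconds;
        private ScheduledFuture<?> future;   // null while paused

        public JobControl(Runnable job, long periodSeconds) {
            if (periodSeconds <= 0) {
                throw new IllegalArgumentException("period must be positive");
            }
            this.job = job;
            this.periodSeconds = periodSeconds;
            resume();
        }

        @Override public synchronized long getPeriodSeconds() { return periodSeconds; }

        @Override public synchronized void setPeriodSeconds(long seconds) {
            if (seconds <= 0) {
                throw new IllegalArgumentException("period must be positive");
            }
            periodSeconds = seconds;
            if (future != null) {            // running: reschedule with the new period
                pause();
                resume();
            }
        }

        @Override public synchronized boolean isPaused() { return future == null; }

        @Override public synchronized void pause() {
            if (future != null) {
                future.cancel(false);
                future = null;
            }
        }

        @Override public synchronized void resume() {
            if (future == null) {
                future = scheduler.scheduleAtFixedRate(job, periodSeconds, periodSeconds, TimeUnit.SECONDS);
            }
        }

        /** Not in the MBean interface, so JMX clients cannot call it. */
        public void shutdown() { scheduler.shutdownNow(); }
    }

    /** Registers the job with the platform MBean server under the given object name. */
    public static ObjectName register(JobControl control, String name) throws JMException {
        ObjectName objectName = new ObjectName(name);
        ManagementFactory.getPlatformMBeanServer().registerMBean(control, objectName);
        return objectName;
    }
}
```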

OK, sorry, I found just what I needed in the JBoss Seam sources:
QuartzDispatcher, to create a QuartzTriggerHandle, which fires a Seam event at a specified time and date and can be manually paused, resumed, and stopped. I use @Observer on the method I wanted to execute.
It's simple, and it works so far.

As pointed out by Poindexter, the Quartz documentation has nice starting points: Tutorial for Developing with Quartz, Examples of Usage, Cook Book (Quick How-Tos in the form of code examples), etc.
The What Is Quartz article is really good too (even if a bit old now).
For integration with JBoss, maybe have a look at How to configure a Quartz service on the JBoss Wiki.
