REST interface to configure, spawn, monitor, and stop long-running tasks in Java

I'm looking for suggestions about this project.
I would like to build a web UI from which to configure, launch, and monitor long-lived tasks (potentially running for weeks).
I was thinking of a MongoDB database to hold the configuration, plus a monitoring service that detects changes and spawns new tasks when needed.
However, this solution has a lot of issues and drawbacks:
If a service dies, how would I know, and how would I present partial results (or the cause of the failure) to the user?
If I want to monitor a service (e.g. how much data it is collecting), where should I write this info? (Maybe I can write it to Mongo.)
If I want to stop a service, I could write the stop condition to Mongo (a boolean in the configuration document?).
Do you know of an example project? It can be thought of as a task list with a real running task behind each entry.
Other info that could be related to the problem: I usually write in Kotlin (coroutines look like a good option, but I need to study them thoroughly), and I use Spring, Camel, and Kafka. I have an ELK stack to send messages to, and I'm planning to write the UI with Vue.js (the web server is Node.js at the moment). I don't want to use any cloud service. I'm alone on this project.
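The three concerns listed in the question (knowing when a task died and why, reporting progress, and requesting a stop) can be sketched in plain Java before any database is involved. This is only a minimal in-memory sketch: `TaskRegistry`, `TaskRecord`, `Step`, and the field names are hypothetical, and the map stands in for the MongoDB documents the question mentions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical names throughout; the map is an in-memory stand-in for a persistent store.
class TaskRegistry {
    enum Status { RUNNING, STOPPED, FAILED }

    // One record per task: what a UI would read to show status, progress, and failure cause.
    static final class TaskRecord {
        volatile Status status = Status.RUNNING;
        volatile String failureCause;                        // set only when the task dies
        final AtomicLong itemsCollected = new AtomicLong();  // progress counter
        final AtomicBoolean stopRequested = new AtomicBoolean(false);
    }

    // One unit of work; it is called repeatedly until a stop is requested.
    interface Step {
        void run(TaskRecord record) throws Exception;
    }

    private final Map<String, TaskRecord> records = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newCachedThreadPool();

    Future<?> launch(String id, Step step) {
        TaskRecord record = new TaskRecord();
        records.put(id, record);
        return pool.submit(() -> {
            try {
                // Cooperative stop: the flag is checked between steps.
                while (!record.stopRequested.get()) {
                    step.run(record);
                }
                record.status = Status.STOPPED;
            } catch (Exception e) {
                // Record the cause first, then the status, so a reader that sees FAILED also sees the cause.
                record.failureCause = String.valueOf(e.getMessage());
                record.status = Status.FAILED;
            }
        });
    }

    void requestStop(String id) {
        TaskRecord record = records.get(id);
        if (record != null) {
            record.stopRequested.set(true);
        }
    }

    TaskRecord get(String id) {
        return records.get(id);
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

A REST layer would then only need to map "launch", "stop", and "get status" onto these three methods. Surviving a restart of the process itself would still require writing the record to a real store.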

Related

How can I avoid duplication of business logic when batch processing?

I have a web application dedicated to batch processing (batch service here on out, api driven) and I have the main web application that is dedicated to everything else. I've been struggling with making a decision on what the best way is to avoid duplication of business logic in the batch service. Both applications are clustered. The separation for batch processing has been okay for simple jobs, but I have more complex jobs where it would just cause chaos if the business logic were duplicated. Here's my use case for the purposes of this question.
Customer schedules a cron job for user updates.
Batch service is given a CSV file with 20,000 user records.
The batch service rips through the file performing validation on the records, basically a dry run.
The batch service will check the allowable change and error thresholds (percentages or counts).
If validation thresholds pass, the batch service will begin creating/updating users.
When users are created or updated, there are a number of modules/features that need to know about these events.
Job progress is tracked and customer can view progress, logs, and status of job.
Here are a few solutions I have been thinking about:
Jar up the business logic and share it across the two applications. This wouldn't necessarily be easy because the main application is a Grails application and it's got GORM littered throughout.
Have the batch service hit APIs on the main application for the create and updates and possibly the more complex validation scenarios. Worried about the toll this would take on tomcat, but calls would be going through the load balancer so they would be distributed.
Have the batch service hit APIs on the main application for validation, then queue create/update requests and let the main application retrieve them. Same as above, queue would help reduce http calls. Also would need a queue to report status back to batch service.
Duplicate some logic by having the batch service do its own validation and inserts/updates, but then fire a user created event or user updated event so modules/features in the main app can deal with the changes.
Embed the batch processing service into the main application
Other details:
The batch service and web application are both clustered
Both are running on AWS, so I have tools like SQS and SNS easily accessible
Java 1.7 applications
Tomcat containers
Main application is Grails
Batch service uses Spring Batch and Quartz at its core
So my question is what are accepted ways to avoid duplication of business logic based on the details above? Can/Should the architecture be changed to better accommodate this?
Another idea to consider is what this would look like in a "microservices" architecture. That word has been tossed around a number of times in the office and we have been considering the idea of breaking up the main web application into services. So for example, we may end up with a service for user management.
Say for example you are using a Java EE 6 application.
Your CSV batch updater could be nothing more than a Timer that every once in a while reads a CSV file dumped in a folder and for each user update encoded on that file pumps a message to a queue encoding the update you want to do.
Somewhere else, you have a message driven bean that reacts to the update request message and triggers the update business logic for the user reported on the JMS message.
After the transaction is committed successfully, if you have ten different applications that are interested in knowing that the user was updated, you could post a message to, for example, a notification topic with, say, messageType='userUpdated'.
Each of your 10 applications that cares about this could be a consumer on this topic.
They would be informed that a user was updated and could then publish a local event internally (e.g. a CDI event or a Guava event), so that the internal stakeholders would know of it.
Normally, there are equivalents to these Java EE features in every technology stack.
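To make the flow concrete, here is a minimal in-memory sketch of the "update, then notify a topic" idea. It does not use JMS: `NotificationTopic` and `UserUpdater` are hypothetical classes standing in for a JMS topic and a message-driven bean, and only the `messageType` value `userUpdated` comes from the description above.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a JMS notification topic.
class NotificationTopic {
    static final class Message {
        final String messageType;
        final String userId;

        Message(String messageType, String userId) {
            this.messageType = messageType;
            this.userId = userId;
        }
    }

    private final List<Consumer<Message>> subscribers = new CopyOnWriteArrayList<>();

    // Each interested application registers itself as a consumer.
    void subscribe(Consumer<Message> subscriber) {
        subscribers.add(subscriber);
    }

    // Every subscriber receives every message, as with a topic (not a queue).
    void publish(Message message) {
        for (Consumer<Message> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }
}

// Hypothetical stand-in for the message-driven bean that owns the update logic.
class UserUpdater {
    private final NotificationTopic topic;

    UserUpdater(NotificationTopic topic) {
        this.topic = topic;
    }

    void handleUpdateRequest(String userId) {
        // The single copy of the update business logic (and its transaction) would run here.
        // Only after it succeeds is the notification published.
        topic.publish(new NotificationTopic.Message("userUpdated", userId));
    }
}
```

The point of the shape is that the batch reader and the web UI both send update requests to the same handler, so the business logic lives in one place and the ten interested applications only see the resulting event.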
Every decent technology stack offers ways to promote loose coupling between UI and business logic, precisely so that HTML / WEB is just viewed as one of many entry points to an application's business logic.
In Scala, there is the Akka framework, which looks very interesting.
The point is, as long as your business logic is not written in some place that only the web application can tap into, you're fine. Otherwise, you've made the design decision to couple your business logic with the UI.
In your case, I would suggest separating by concern: if you are using Grails, one plugin that gathers only the domain classes and other plugins that take care of the services. Together these would represent the application core. I think this is much easier if your application contains many KLOC; moving to microservices will take too much time if you have a lot of calls between modules.
Communication between functional modules (i.e. plugins) can be done via events; see the events-si or RabbitMQ plugins.

What is the right way to run background task in Play 2.1 (Java)?

In my app I need to process uploaded documents and put results of processing in DB.
Documents are stored in file system and metadata is stored in DB.
For each document, the file needs to be opened from disk and processed, and then the metadata in the DB updated accordingly. Processing may be expensive and take a long time.
What I plan to do is:
Spawn N tasks, each processing a single document at a time
Each task will go and find the oldest "unprocessed" document
The task will mark it as "in progress" in the DB and start processing it
After processing the document, the task will update the metadata and mark it in the DB as "processed"
Task will go to step 2 after that
What is the right / easiest way to implement this leveraging Play and Akka, assuming the application is written in Java, not Scala? Source code examples would also be appreciated.
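The claim-and-process loop in the steps above can be sketched with a plain `ExecutorService` instead of Play or Akka. Everything here is an assumption made for illustration: `DocumentQueue`, `Doc`, and `runWorkers` are hypothetical names, and a synchronized in-memory list stands in for the DB. The part that matters is that finding the oldest unprocessed document and marking it "in progress" happen as one atomic step, so two workers never claim the same document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for the documents table.
class DocumentQueue {
    enum State { UNPROCESSED, IN_PROGRESS, PROCESSED }

    static final class Doc {
        final String path;
        State state = State.UNPROCESSED;   // only touched inside synchronized methods

        Doc(String path) {
            this.path = path;
        }
    }

    private final List<Doc> docs = new ArrayList<>();   // insertion order = oldest first

    synchronized void add(String path) {
        docs.add(new Doc(path));
    }

    // Steps 2 and 3 as one atomic operation: find the oldest unprocessed document and claim it.
    synchronized Optional<Doc> claimOldest() {
        for (Doc doc : docs) {
            if (doc.state == State.UNPROCESSED) {
                doc.state = State.IN_PROGRESS;
                return Optional.of(doc);
            }
        }
        return Optional.empty();
    }

    synchronized void markProcessed(Doc doc) {
        doc.state = State.PROCESSED;
    }

    synchronized long countIn(State state) {
        return docs.stream().filter(d -> d.state == state).count();
    }

    // Step 1: spawn N workers; each loops over claim, process, mark until nothing is left.
    void runWorkers(int workers, Consumer<Doc> processor) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                Optional<Doc> next;
                while ((next = claimOldest()).isPresent()) {
                    processor.accept(next.get());   // the expensive part runs outside the lock
                    markProcessed(next.get());
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

In this sketch the workers exit once the queue is empty; a long-running version would wait and poll again. Against a real database the claim would need an equivalent atomic operation, which depends on the database in use.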
The right way is "Don't run any background tasks in a Play app". Play is a web framework for writing web apps, and a background task, by definition, does not use a web interface. So set up a separate background task runner and send it messages/events via Akka. In fact, you will have a far more scalable application if you push as much business logic as possible into background tasks.
For an example of this model taken to its logical conclusion, have a look at the Mongrel2 web server http://mongrel2.org/manual/book-final.html
Given that we have tools like Akka and Camel in the JVM world, and that frameworks like Play are weaning us off the servlet architecture, I think it is about time to follow Mongrel2's lead and get back to more of a 3 tier architecture where the web app layer only does the minimum of work.
If you follow this architecture, you would bundle up all the information needed to run the background task into a message, send that to an external actor which does the work, and then possibly have that actor send a completion message to another actor which would update the database.

Java enterprise architecture for delegating tasks between applications

In my environment I need to schedule long-running tasks. I have application A, which just shows the client the list of currently running tasks and allows new ones to be scheduled. There is also application B, which does the actual hard work.
So app A needs to schedule a task in app B. The only thing they have in common is the database. The simplest thing to do seems to be adding a table with a list of tasks and having app B query that table every once in a while and execute newly scheduled tasks.
Yet, it doesn't seem to be the proper way of doing it. At first glance it seems that the tool for the job in an enterprise environment is a message queue. App A sends a message with task description to the queue, app B reads a message from the queue and executes the task. Is it possible in such case for app A to get the status of all the tasks scheduled (persistent queue?) without creating a table like the one mentioned above to which app B would write the status of completed tasks? Note also that there may be multiple instances of app A and each of them needs to know about all tasks of all instances.
The disadvantage of the 'table approach' is that I need to have DB polling.
The disadvantage of the 'message queue approach' is that I'm introducing a new communication channel into the infrastructure (yet another thing that can fail).
What do you think? Any other ideas?
Thank you in advance for any advice :)
========== UPDATE ==========
Eventually I decided on the following approach: there are two sides of this problem: one is communication between A and B. The other is getting information about the tasks.
For communication the right tool for the job is JMS. For getting data the right tool is the database.
So I'll have app A add a new row to the 'tasks' table describing a task (I can query this table later on to get a list of all tasks). Then A will send a message to B via JMS just to say 'you have work to do'. B will do the work and update the task status in the table.
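A minimal in-memory sketch of that split, with the table as the source of truth and the message as a wake-up signal only. All names are hypothetical: a `ConcurrentHashMap` stands in for the 'tasks' table and a `BlockingQueue` stands in for the JMS queue. As an extra assumption, the wake-up message carries the task id, which saves B from querying for new rows.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins: taskStatus plays the 'tasks' table, wakeUps plays the JMS queue.
class TaskHandoff {
    final Map<Long, String> taskStatus = new ConcurrentHashMap<>();   // task id -> status
    final BlockingQueue<Long> wakeUps = new LinkedBlockingQueue<>();
    private final AtomicLong nextId = new AtomicLong();

    // App A: write the row first, then signal B that there is work to do.
    long schedule() {
        long id = nextId.incrementAndGet();
        taskStatus.put(id, "NEW");
        wakeUps.add(id);
        return id;
    }

    // App B: block until signalled (no polling), do the work, and record the outcome in the table.
    void workOnce(Runnable work) throws InterruptedException {
        long id = wakeUps.take();
        taskStatus.put(id, "RUNNING");
        try {
            work.run();
            taskStatus.put(id, "DONE");
        } catch (RuntimeException e) {
            taskStatus.put(id, "FAILED");
        }
    }
}
```

Because every instance of A reads status from the table rather than from the queue, all instances see all tasks, which was one of the stated requirements.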
Thank you for all responses!
You need to think about your deployment environment both now and likely changes in the future.
You're effectively looking at two problems, both of which can be solved in several ways, depending on how much infrastructure you are able to obtain and are willing to introduce. It's also important to "right size" your design for your problems.
Whilst you're correct to think about the use of both databases and messaging, you need to consider whether these items are overkill for your domain and only you and others who know your domain can really answer that.
My advice would be to look at what is already in use in your area. If you already have database infrastructure that you can build into, then monitoring task activity and scheduling jobs in a database are not a bad idea. However, if you would have to run your own database, get new hardware, don't have sufficient support resources then introduction of a database may not be a sensible option and you could look at a simpler, but potentially more fragile approach of having your processes write files to schedule jobs and report tasks.
At the same time, don't look at the introduction of a DB or JMS as inherently error prone. Correctly implemented they are stable and proven technologies that will make your system scalable and manageable.
As @kan says, exposing a web service interface is also a useful option.
Another option is to make B a service, e.g. expose control and status interfaces as REST or SOAP interfaces. In this case A will just be a client application of B. B stores its state in the database. A is a stateless application which just communicates with B.
BTW, using Spring Remoting you could expose an interface and use any of JMS, REST, SOAP or RMI as the transport layer, which could be changed later if necessary.
You have messages (JMS) in enterprise architecture. Use these; they are available in Java EE containers like GlassFish. Messages can be persisted so that they are still delivered even if the server reboots while they are in the queue. And you do not even need to care how all this is implemented.
There can be a couple of approaches here. First, as @kan suggested, have app B expose some web service for the interactions. This will allow heterogeneous clients to communicate with app B. It seems a good approach. App B can internally use whatever persistent store it deems fit.
Alternatively, you can have app B expose some management interface via JMX and have applications like app A talk to app B through this management interface. Implementing task submission, retrieving statistics, etc. would be simpler. Additionally, you can also leverage JMX notifications for real-time updates on task submissions, completions, etc. The downside is that this would be a Java-specific solution, and hence supporting heterogeneous clients would be a distant dream.

Fast Multithreaded Online Processing Application Framework Suggestions

I am looking for a pattern and/or framework which can model the following problem in an easily configurable way.
Every 3 minutes or so, I need a set of jobs to kick off in a web application context; they will concurrently hit web services to obtain the latest version of the data and push it to a database. The problem is that the database will also be heavily used for reads that feed a lot of complex calculations on the data. We are currently using Spring, so I have been looking at Spring Batch to run this process. Does anyone have any suggestions, patterns, or examples of using Spring or other technologies for a similar system?
We have used ServletContextListeners to kick off TimerTasks in our web applications when we needed processes to run repeatedly. The ServletContextListener kicks off when the app server starts the application or when the application is restarted. The timer tasks then run on a separate thread that repeats your code at the specified interval.
ServletContextListener
http://www.javabeat.net/examples/2009/02/26/servletcontextlistener-example/
TimerTask
http://enos.itcollege.ee/~jpoial/docs/tutorial/essential/threads/timer.html
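As a rough sketch of the timer half of that approach, using only `java.util.Timer` so it runs without a servlet container: `RepeatingJob` is a hypothetical class, and the comments mark where a `ServletContextListener` would call into it.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper around java.util.Timer for a job that repeats at a fixed period.
class RepeatingJob {
    // Daemon timer thread, so it does not keep the JVM alive on its own.
    private final Timer timer = new Timer("repeating-job", true);
    final AtomicInteger runs = new AtomicInteger();

    // Would be called from ServletContextListener.contextInitialized(...).
    void start(long periodMillis, Runnable job) {
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                try {
                    job.run();
                } catch (RuntimeException e) {
                    // Swallowed on purpose: an uncaught exception would terminate the
                    // timer thread and silently stop all future runs.
                }
                runs.incrementAndGet();
            }
        }, 0L, periodMillis);
    }

    // Would be called from ServletContextListener.contextDestroyed(...).
    void stop() {
        timer.cancel();
    }
}
```

For the question's case this would be started with a period of 3 minutes (`3 * 60 * 1000L`). Note that a single `Timer` runs its tasks on one thread, so the concurrent web service calls would still need their own thread pool inside the job.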
Is refactoring the job out of the web application and into a standalone app a possibility?
That way you could stick the batch job onto a separate batch server (so that the extra load of the batch job wouldn't impact your web application), which then calls the web services and updates the database. The job can then be kicked off using something like cron or Autosys.
We're using Spring-Batch for exactly this purpose.
The database design would also depend on what the batched data is used for. If it is for reporting purposes, I would recommend separating the operational database from the reporting database, using a database link to obtain the required data from the operational database into the reporting database and then running the complex queries on the reporting database. That way the load is shifted off the operational database.
I think it's worth also looking into frameworks like camel-integration. Also take a look at the so called Enterprise Integration Patterns. Check the catalog - it might provide you with some useful vocabulary to think about the scaling/scheduling problem at hand.
The framework itself integrates really well with Spring.

Strategies for handling repetitive background tasks in a Java web application?

I'm building a personal web application using Java EE 6 technologies (the container is an application server, JBoss AS 7). I'm starting from scratch to create repetitive background tasks, and I have identified two possible scenarios:
Scheduled tasks (e.g. sending bulk mails every Sunday night)
Tasks triggered by a web event (e.g. running some long background updates from a web action)
What I want to avoid (I don't know if it is possible) is having background tasks scattered around my platform (some of them using cron, others using TimerTask, DB jobs, etc.), which would become difficult to maintain.
What are the different approaches to handling repetitive background tasks in a Java web application, taking into account the two previous requirements?
Related:
Scheduled Tasks for Web Applications
Scheduled task in a web application?
With EE6 you can get rid of Quartz for almost all situations by using the TimerService with @Timeout annotations.
And you don't need to write a line of XML to get it working.
There is a nice example in the EE Night Hacks book, also available as source here.
You can add a Timeout method to a bean processing your trigger web events. This way, they can be maintained in one place. You can also modify the timer settings by trigger events.
I'd still look at Quartz also. I can't comment on TimerService with EE6 as a substitute as I haven't used it, but I found Quartz to be quite useful.
When I used it (quite a few years ago now), it had a config file that closely resembled what you'd find for cron. You could use that to call whatever methods you need to perform your scheduled jobs, and then simply provide some other mechanism to call the method on demand.
