We have an application which does a lot of imports and exports - basically between CSV files and database tables.
Some of the imports and exports are conflicting (you can't execute them simultaneously) for various reasons (like "legacy code").
We were looking into javax.batch. Conceptually it suits very well. But what we really failed to find is the possibility to somehow manage the "exclusiveness" of certain jobs we want to run.
Could someone please provide a pointer on that? How would we implement exclusive batch jobs with javax.batch? Or should we implement our own JobOperator for this?
Update
What I mean by "exclusiveness" is the ability to define that certain jobs may not be executed in parallel. In the most trivial case this would mean "execute one and only one job at a time". More complex cases need more complex rules, like "a job of type A can't run with other jobs of type A or B, but C is OK". The "type of job" is, for instance, jobXmlName here (regardless of job parameters).
JSR-352 (and Spring Batch) both avoid the topic of orchestration on purpose. To do so would require a particular approach which prevents the inherent flexibility these batch frameworks offer. Because of that, the JobOperator in JSR-352 does not have any notion of preventing one job from running while another one is running.
While you could accomplish this via your own custom JobOperator, I wouldn't recommend that approach. Instead, you'd be better off moving that one layer higher, into whoever is calling the JobOperator so that the logic for that type of orchestration concern is separated from the implementation details of launching a job. For example, if you're using a scheduler to launch jobs, I'd put the logic there as to what jobs can run in parallel and which ones cannot...not in a custom JobOperator.
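A minimal sketch of what such a layer above the JobOperator could look like, assuming the conflict rules are keyed by job name (e.g. jobXmlName). The class name ExclusiveJobGate and its methods are hypothetical, not part of JSR-352:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical gate that decides, above the JobOperator, whether a job may start now. */
class ExclusiveJobGate {
    // Symmetric conflict table keyed by job name (e.g. jobXmlName).
    private final Map<String, Set<String>> conflicts = new HashMap<>();
    // Number of currently running executions per job name.
    private final Map<String, Integer> running = new HashMap<>();

    /** Declares that jobs a and b must never run at the same time (a may equal b). */
    synchronized void addConflict(String a, String b) {
        conflicts.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        conflicts.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Reserves a slot for jobName; returns false if a conflicting job is running. */
    synchronized boolean tryAcquire(String jobName) {
        for (String other : conflicts.getOrDefault(jobName, Collections.emptySet())) {
            if (running.getOrDefault(other, 0) > 0) {
                return false;
            }
        }
        running.merge(jobName, 1, Integer::sum);
        return true;
    }

    /** Releases the slot once the execution has finished, successfully or not. */
    synchronized void release(String jobName) {
        running.computeIfPresent(jobName, (k, n) -> n > 1 ? n - 1 : null);
    }
}
```

The caller (a scheduler, for instance) would call tryAcquire before JobOperator.start(...) and release when the execution ends, for example from a JobListener's afterJob(). This only works within one JVM; with several launcher instances the same bookkeeping would have to live in shared storage such as a database table.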
Related
Our application uses java / sql server.
We have ETL jobs (around 35 for different upstreams) using Spring Batch. Some of the code is in Java and some in the database. We want to track the lifecycle of a job from the database, e.g. when a job started, when a particular component got called, when a method or stored procedure got called and how much time that took. The purpose is a health check showing which component takes the most time; if some stored procedure takes a lot of time in production, we should be able to query the database for it. Moreover, we also want to store intermediate calculations for audit and debugging purposes.
This time tracking and intermediate calculations would be stored besides normal application logging.
The current solution we have implemented is a set of normalized tables in the database (e.g. Job, Task, Status, etc.), wrapped by stored procedures, plus Java classes that call those stored procedures.
We are not redesigning our application, so we wanted to check what the best approach is to track such information. AOP? I believe that advice usually runs before and after a method, so what about the intermediate calculations we want to store?
Our current approach is working, but it clutters the code: each method does logging and auditing instead of concentrating on its main logic.
A free and open-source tool you should consider is JAMon. It is a comprehensive monitoring framework that provides a lot of useful features:
JAMon allows developers to track their applications performance and
behavior using predefined modules. There are modules that
automatically monitor : SQL, HTTP page requests, Spring beans, method
invocations, Log4j, and Exceptions. Other modules are often easy to
build. JAMon keeps track of the following metrics for any of the items
it tracks in the modules: hits, total, average, min, max and
concurrency (average, max, current/active) to name a few.
As for storing the intermediate calculations, I would suggest breaking your methods into smaller sub-methods and then using AOP or any other tool to capture the returned values and perform whatever operation you want on that data.
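To illustrate the idea without pulling in an AOP library, here is a minimal sketch that uses a plain JDK dynamic proxy as a stand-in for an "around" advice. It records the method name, the elapsed time and the returned value of each call. The names AuditRecord, PriceCalculator and AuditingProxy are made up for this example, and the in-memory list stands in for whatever audit table you write to:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

/** One audit entry: which method ran, how long it took, and what it returned. */
class AuditRecord {
    final String method;
    final long elapsedNanos;
    final Object result;

    AuditRecord(String method, long elapsedNanos, Object result) {
        this.method = method;
        this.elapsedNanos = elapsedNanos;
        this.result = result;
    }
}

/** Hypothetical ETL step; each intermediate calculation is its own small method. */
interface PriceCalculator {
    double netAmount(double gross, double taxRate);
}

class AuditingProxy {
    /** Wraps target so that every interface call is timed and its return value recorded. */
    static <T> T wrap(T target, Class<T> iface, List<AuditRecord> sink) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            Object result = null;
            try {
                result = method.invoke(target, args);
                return result;
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the original exception from the target
            } finally {
                // Recorded on success and on failure (result stays null on failure).
                sink.add(new AuditRecord(method.getName(), System.nanoTime() - start, result));
            }
        };
        return iface.cast(
                Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

The business code only sees PriceCalculator; the auditing lives in one place. With Spring AOP or AspectJ the same shape would be an around advice, and the sink would write to your Job/Task tables instead of a list.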
In addition, if you need to have more details on the database layer I would recommend log4jdbc, which will give you nice audit and metrics around jdbc calls. For example you'll be able to get the execution time, the in and out parameters of called procedures, parameters provided to any statements.
You can even extend this tool to provide custom behavior (audit only some procedures, do something specific with the collected data).
Aspects are a very good way to isolate the timing code in one place.
Stored procedures seem unnecessary to me. A simple SQL INSERT ought to do the trick. It's fine if you're using the stored proc as an interface to hide the schema from users, but I doubt that this table will evolve much.
Logging, timing, and auditing are the "hello world" of aspect oriented programming.
Sorry if title is confusing, let me explain my question.
Our team needs to develop a web service which is supposed to run on several nodes (web farm, horizontal scaling). We know how to implement this "manually", but we're pretty excited about Spring Integration, which is new to us. So we are trying to understand whether it is a good fit for our scenario, and if so we'll try to make use of it.
Typical scenario:
Several servers ("nodes") running the same web application (let's call it "OurWebService")
We need to pull files from external systems ("InboundExtSystems")
Process this data with help of other external systems (involves local resource-consuming operations) ("UtilityExtServices")
Submit processing results to another set of external systems ("OutboundExtSystems")
Non-functional requirements:
For performance reasons we cannot query UtilityExtServices on demand, AND local processing is also CPU-intensive. So we need a queue in order to control the pace at which we perform requests and process results
We expect several nodes will equally pull tasks from this queue and process them
We need to make sure that every queued task pulled from InboundExtSystems will be handled - we need to guarantee that none of them will disappear.
We need to make sure timeouts are handled as well. If task processing timed out, we need to "requeue" the task (and make sure the previous handler will not submit results for it)
We need to be able to perform rolling updates. Let's say 5 nodes are processing the queue. We want to be able to sequentially stop-upgrade-start each node without noticeably impacting system performance.
So the question is: is Spring Integration a good fit for such a case?
If the answer is "Yes", could you kindly name the primary components we should use?
P.S. Sure enough, we would probably also need to pick something as a message bus and queue accessible by every node (maybe Redis, Hazelcast or RabbitMQ, not sure which is more appropriate)
Yes, it's a good fit. I would suggest RabbitMQ for the transport/queuing and the Spring Integration AMQP endpoints.
Rolling updates shouldn't be an issue unless you change the format of the messages sent between nodes. But even then you could handle it relatively easily by moving to a new set of queues.
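The timeout requirement from the question (requeue the task and make sure the previous handler cannot submit results) is independent of the transport you pick. One common way to handle it is to number the attempts and reject results from any attempt that is no longer current. A minimal sketch, assuming a hypothetical AttemptRegistry; in a real multi-node setup this state would have to live in storage shared by all nodes, not in one JVM:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry that tracks the current attempt of each task. */
class AttemptRegistry {
    private final Map<String, Integer> currentAttempt = new HashMap<>();

    /** Called when a node takes the task from the queue; returns the attempt number. */
    synchronized int begin(String taskId) {
        return currentAttempt.merge(taskId, 1, Integer::sum);
    }

    /**
     * Called when an attempt timed out. Invalidates that attempt and returns true
     * if the caller should put the task back on the queue.
     */
    synchronized boolean timeOut(String taskId, int attempt) {
        Integer current = currentAttempt.get(taskId);
        if (current == null || current != attempt) {
            return false; // already completed or already superseded
        }
        currentAttempt.put(taskId, attempt + 1); // no handler holds this number yet
        return true;
    }

    /** Accepts a result only from the attempt that is still current. */
    synchronized boolean tryComplete(String taskId, int attempt) {
        Integer current = currentAttempt.get(taskId);
        if (current == null || current != attempt) {
            return false; // stale handler: its result must be discarded
        }
        currentAttempt.remove(taskId);
        return true;
    }
}
```

A handler would carry its attempt number along with the task and call tryComplete right before submitting to OutboundExtSystems; a false return means another attempt has taken over.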
In my environment I need to schedule long-running tasks. I have application A, which just shows the client the list of currently running tasks and allows scheduling new ones. There is also application B, which does the actual hard work.
So app A needs to schedule a task in app B. The only thing they have in common is the database. The simplest thing to do seems to be adding a table with a list of tasks and having app B query that table every once in a while and execute newly scheduled tasks.
Yet, it doesn't seem to be the proper way of doing it. At first glance it seems that the tool for the job in an enterprise environment is a message queue. App A sends a message with task description to the queue, app B reads a message from the queue and executes the task. Is it possible in such case for app A to get the status of all the tasks scheduled (persistent queue?) without creating a table like the one mentioned above to which app B would write the status of completed tasks? Note also that there may be multiple instances of app A and each of them needs to know about all tasks of all instances.
The disadvantage of the 'table approach' is that I need to have DB polling.
The disadvantage of the 'message queue approach' is that I'm introducing a new communication channel into the infrastructure (yet another thing that can fail).
What do you think? Any other ideas?
Thank you in advance for any advice :)
========== UPDATE ==========
Eventually I decided on the following approach: there are two sides of this problem: one is communication between A and B. The other is getting information about the tasks.
For communication the right tool for the job is JMS. For getting data the right tool is the database.
So I'll have app A add a new row to the 'tasks' table describing a task (I can query this table later on to get the list of all tasks). Then A will send a message to B via JMS just to say 'you have work to do'. B will do the work and update the task status in the table.
Thank you for all responses!
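The flow described in the update can be sketched as follows. This is only an illustration: the TaskTable class is an in-memory stand-in for the real 'tasks' table, a BlockingQueue stands in for the JMS destination, and passing the task id in the message is an assumption (the message could just as well be empty, with B querying for new rows):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;

/** In-memory stand-in for the shared 'tasks' table. */
class TaskTable {
    private final Map<Long, String> statusById = new LinkedHashMap<>();
    private long nextId = 1;

    synchronized long insert(String description) {
        long id = nextId++;
        statusById.put(id, "NEW"); // description would be a column in the real table
        return id;
    }

    synchronized void updateStatus(long id, String status) {
        statusById.put(id, status);
    }

    synchronized String statusOf(long id) {
        return statusById.get(id);
    }
}

/** App A: records the task first, then sends a small wake-up message. */
class AppA {
    private final TaskTable table;
    private final BlockingQueue<Long> channel; // stand-in for the JMS queue

    AppA(TaskTable table, BlockingQueue<Long> channel) {
        this.table = table;
        this.channel = channel;
    }

    long schedule(String description) {
        long id = table.insert(description);
        channel.add(id); // "you have work to do"
        return id;
    }
}

/** App B: reacts to a message, does the work, reports status through the table. */
class AppB {
    private final TaskTable table;
    private final BlockingQueue<Long> channel;

    AppB(TaskTable table, BlockingQueue<Long> channel) {
        this.table = table;
        this.channel = channel;
    }

    /** Handles one pending message; returns false if there was none. */
    boolean handleNext() {
        Long id = channel.poll();
        if (id == null) {
            return false;
        }
        table.updateStatus(id, "RUNNING");
        // ... the long-running work would happen here ...
        table.updateStatus(id, "DONE");
        return true;
    }
}
```

Any instance of app A can read the table to list all tasks, while the message only removes the need for B to poll.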
You need to think about your deployment environment both now and likely changes in the future.
You're effectively looking at two problems, both of which can be solved in several ways, depending on how much infrastructure you are able to obtain and willing to introduce. It's also important to "right size" your design for your problems.
Whilst you're correct to think about the use of both databases and messaging, you need to consider whether these items are overkill for your domain and only you and others who know your domain can really answer that.
My advice would be to look at what is already in use in your area. If you already have database infrastructure that you can build on, then monitoring task activity and scheduling jobs in a database is not a bad idea. However, if you would have to run your own database, get new hardware, or lack sufficient support resources, then introducing a database may not be a sensible option, and you could look at a simpler but potentially more fragile approach of having your processes write files to schedule jobs and report on tasks.
At the same time, don't look at the introduction of a DB or JMS as inherently error prone. Correctly implemented they are stable and proven technologies that will make your system scalable and manageable.
As @kan says, exposing a web service interface is also a useful option.
Another option is to make B a service, e.g. expose its control and status interfaces as REST or SOAP interfaces. In this case A is just a client application of B. B stores its state in the database. A is a stateless application which just communicates with B.
BTW, using Spring Remote you could expose an interface and use any of JMS, REST, SOAP or RMI as a transport layer which could be changed later if necessary.
You have messages (JMS) in enterprise architecture. Use them; they are available in Java EE containers like Glassfish. Messages can be persisted so that they are delivered even if the server reboots while they are in the queue. And you do not even need to care how all this is implemented.
There can be a couple of approaches here. First, as @kan suggested, have app B expose some web service for the interactions. This will allow heterogeneous clients to communicate with app B. It seems a good approach. App B can internally use whatever persistent store it deems fit.
Alternatively, you can have app B expose a management interface via JMX and have applications like app A talk to app B through it. Implementing task submission, retrieving statistics, etc. would be simpler. Additionally, you can leverage JMX notifications for real-time updates on task submission and completion. The downside is that this is a Java-specific solution, so supporting heterogeneous clients would be hard.
Are there any recommendations, best practices or good articles on providing integration hooks ?
Let's say I'm developing a web-based ordering system. Eventually I'd like my client to be able to write some code, package it into a jar, drop it onto the classpath, and have it change the way the software behaves.
For example, if an order comes in, the code
1. may send an email or sms
2. may write some additional data into the database
3. may change data in the database, or decide that the order should not be saved into the database (cancel the data save)
Point 3 is quite dangerous since it interferes too much with data integrity, but if we want integration to be that flexible, is it doable ?
Options so far
1. provide hooks for specific actions, e.g. if this and that occurs, call this method, client will write implementation for that method, this is too rigid though
2. mechanism similar to servlet filters, there is code before the actual action is executed and code after, not quite sure how this could be designed though
We're using Struts2 if that matters.
This integration must be able to detect a "state change", not just the "end state" after the core action executes.
For example, if an order changes state from In Progress to Paid, then it will do something, but if it changes from Draft to Paid, it should not do anything. The core action in this case would be loading the order object from the database, changing the state to Paid, and saving it again (or doing an SQL update).
Many options, including:
Workflow tool
AOP
Messaging
DB-layer hooks
The easiest (for me at the time) was a message-based approach. I did a sort-of ad-hoc thing using Struts 2 interceptors, but a cleaner approach would use Spring and/or JMS.
As long as the relevant information is contained in the message, it's pretty much completely open-ended. Having a system accessible via services/etc. means the messages can tap back in to the main app in ways you haven't anticipated.
If you want this to work without system restarts, another option would be to implement handlers in a dynamic language (e.g., Groovy). Functionality can be stored in a DB. Using a Spring factory makes this pretty fun and reduces some of the complexity of a message-based approach.
One issue with a synchronous approach, however, is if a handler deadlocks or takes a long time; it can impact that thread at the least, or the system as a whole under some circumstances.
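For the synchronous variant, the "filter-like" hook from the question, including state-change detection and the ability to cancel the save, could be sketched like this. All names here (OrderState, OrderStateListener, OrderStateNotifier) are hypothetical; client jars could provide listener implementations, which the application might discover with java.util.ServiceLoader or a Spring context:

```java
import java.util.ArrayList;
import java.util.List;

enum OrderState { DRAFT, IN_PROGRESS, PAID }

/** Hypothetical extension point that a client jar would implement. */
interface OrderStateListener {
    /** Called before the change is saved; returning false cancels the save. */
    default boolean beforeChange(long orderId, OrderState from, OrderState to) {
        return true;
    }

    /** Called after the change has been saved. */
    default void afterChange(long orderId, OrderState from, OrderState to) {
    }
}

class OrderStateNotifier {
    private final List<OrderStateListener> listeners = new ArrayList<>();

    void register(OrderStateListener listener) {
        listeners.add(listener);
    }

    /** Runs the core save between the before and after hooks; returns false if vetoed. */
    boolean change(long orderId, OrderState from, OrderState to, Runnable save) {
        for (OrderStateListener l : listeners) {
            if (!l.beforeChange(orderId, from, to)) {
                return false; // a hook cancelled the save
            }
        }
        save.run();
        for (OrderStateListener l : listeners) {
            l.afterChange(orderId, from, to);
        }
        return true;
    }
}
```

Because both the old and the new state are passed in, a listener can react to In Progress to Paid and ignore Draft to Paid. The caveat above still applies: a slow or deadlocked listener blocks the calling thread.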
I am considering using the Quartz framework to schedule the run of several hundred jobs.
According to their API, jobs can be scheduled to run at certain moments in time but not to run one after the other (and stop a chain of jobs if one fails).
The only recommended methods I was able to find are:
Using a listener which notices the completion of a job and schedules the next trigger to fire (how to coordinate this?)
Each job will receive a parameter containing the next job to run and, after completing the actual work, schedule its run. (Cooperative)
Do you know a better method to create a workflow of jobs in Quartz?
Can you recommend other methods/framework for implementing a workflow in Java ?
EDITED: In the meantime I found out about OSWorkflow which appears to be a good match for what I need. It appears that what I need to implement is a "Sequence Pattern".
When Quartz documentation talks about "Job", it is referring to a class implementing the "Job" Interface, which is really just any class with an "execute" method that takes in the Quartz Context object. When creating this implementation you can really do whatever you want.
You could create an implementation of the Quartz Job interface which simply calls all the jobs in your workflow in series, and throws a JobExecutionException on failure.
It sounds to me like you want Quartz to schedule the first job, and chain everything off that.
Have you looked at encapsulating each task using the Command Pattern, and linking them together ?
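A minimal sketch of such a chain in plain Java, with no Quartz dependency so it can be run on its own. Command and CommandChain are names invented for this example:

```java
import java.util.ArrayList;
import java.util.List;

/** One unit of work in the workflow (Command pattern). */
interface Command {
    void execute() throws Exception;
}

/** Runs commands in the order they were added and stops at the first failure. */
class CommandChain {
    private final List<Command> commands = new ArrayList<>();

    /** Appends a command; returns this so that calls can be chained. */
    CommandChain then(Command command) {
        commands.add(command);
        return this;
    }

    /** Executes every command in sequence; the first exception aborts the rest. */
    void runAll() throws Exception {
        for (Command command : commands) {
            command.execute();
        }
    }
}
```

Inside a Quartz Job implementation, execute(JobExecutionContext) would call runAll() and wrap any exception in a JobExecutionException, so Quartz only schedules the first step and the chain takes care of the sequence.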
I've worked on a project called Dynamic Task Scheduler that uses Quartz to execute job chains, implementing a simple workflow (defined in XML format) in a fault-tolerant way.
Take a look at http://sourceforge.net/projects/dynatasksched/
The project is in beta, but I think it can give you some ideas to start with.
Hope it's useful!
For job chaining support in Quartz, you may want to check the QuartzDesk project that I have been involved in. In version 2.0 we added a powerful job chaining engine that enables you to orchestrate your Quartz jobs without the need to modify your application code.
The engine takes care of propagating the job execution result and other parameters from the source job to the chained target job.
QuartzDesk comes with a GUI that allows you to dynamically update your job chains without disrupting your application.