I'm new to web servers. I have a Java class that does a set of computations. I want this class to run every hour and update my domain on AWS with the data.
My question is how/where do I set this job to run?
Is there a standard for this? Or does AWS have something I can use? I know how to read/write my data to AWS.
Should a cron job be used? Should the cron job run on AWS?
You have 2 options for this.
Set a cron job and let the operating system execute the script that starts your java program every hour or so.
Use something like Quartz Scheduler. In this case your Java program would be running continuously and the scheduler would be within your Java program.
Both approaches have advantages and disadvantages. With cron, if something goes wrong with the program, you know that a fresh process will launch in the next hour. With an in-process scheduler, if your Java program hangs for some reason, you won't know unless you have some kind of monitoring. On the other hand, the second approach lets you keep state between runs. Quartz also has many advanced features, such as keeping information about executions in a database.
You can also have the Quartz Scheduler run within your web server itself (so no need for another process). It's just a few extra .jar files to include. So it depends on what you actually want to do. You can refer to what features it supports here.
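As a rough illustration of the second option (the scheduler living inside a long-running Java process), here is a minimal sketch. It uses the JDK's ScheduledExecutorService rather than Quartz, so it is not Quartz's API. The class name and the computeAndUpload() placeholder are made up for the example.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class InProcessHourlyJob {

    // Time left until the next full hour, so runs line up with a cron-style "0 * * * *".
    static Duration untilNextHour(LocalDateTime now) {
        LocalDateTime nextHour = now.truncatedTo(ChronoUnit.HOURS).plusHours(1);
        return Duration.between(now, nextHour);
    }

    // Placeholder for the real computation and the upload to AWS (hypothetical).
    static void computeAndUpload() {
        System.out.println("run at " + LocalDateTime.now());
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelayMs = untilNextHour(LocalDateTime.now()).toMillis();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                computeAndUpload();
            } catch (RuntimeException e) {
                // An exception escaping the task would silently cancel all later runs.
                e.printStackTrace();
            }
        }, initialDelayMs, TimeUnit.HOURS.toMillis(1), TimeUnit.MILLISECONDS);
        // The scheduler thread is non-daemon, so the JVM stays alive between runs.
    }
}
```

With the first option, the equivalent schedule would be a crontab entry along the lines of `0 * * * * /path/to/run-job.sh` (the path is hypothetical), and the Java program would simply do its work once and exit.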
I have a Java application that needs to run several times. Every time it runs, it checks if there's data to process and if so, it processes the data.
I'm trying to figure out what's the best approach (performance, resource consumption, etc.) to do this:
1.- Launch it once, and if there's nothing to process make it sleep (All Java).
2.- Using a bash script to launch the Java app, and when it finishes, sleep (the script) and then relaunch the java app.
I was wondering if it is best to keep the Java app alive (sleeping) or relaunching every time.
It's hard to answer your question without the specific context. On the face of it, your question sounds like it could be a premature optimization.
Generally, I suggest you do what's easier for you to do (and to maintain), unless you have good reasons not to. Here are some possible good reasons, pick the ones appropriate to your situation:
For sleeping in Java:
The check of whether there's new data is easier in Java
Starting the Java program takes time or other resources, for example if on startup, your program needs to load a bunch of data
Starting the Java process from bash is complex for some reason - maybe it requires you to fiddle with a bunch of environment variables, files or something else.
For re-launching the Java program from bash:
The check of whether there's new data is easier in bash
Getting the Java process to sleep is complex - maybe your Java process is a complex multi-threaded beast, and stopping, and then re-starting the various threads is complicated.
You need the memory in between Java jobs - killing the Java process entirely would free all of its memory.
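For comparison, the "sleep in Java" variant can be as small as the sketch below. The in-memory queue stands in for whatever the real data source is, and the class name and the five-minute interval are assumptions for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class PollingWorker {

    // Drains and processes everything currently pending; returns how many items were handled.
    static int processPending(Queue<String> pending) {
        int handled = 0;
        String item;
        while ((item = pending.poll()) != null) {
            System.out.println("processing " + item); // placeholder for the real work
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<String> pending = new ArrayDeque<>(); // stand-in for the real data source
        long sleepMillis = 5 * 60 * 1000L;          // assumed interval
        while (true) {
            processPending(pending);
            Thread.sleep(sleepMillis); // the JVM, and its memory, stay allocated while sleeping
        }
    }
}
```

The bash variant would run the same processPending step once per launch and let the script do the sleeping, which is what frees the memory between runs.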
I would not keep it alive.
Instead, you can use a job that runs at defined intervals: use Jenkins, or configure the Windows Task Scheduler to run it every 5 minutes (or whatever interval you wish).
Run a batch file with Windows task scheduler
And from your batch file you can do the following:
REM compile the source file
javac JavaFileName.java
REM run the compiled class
java JavaFileName
See here how to execute java file from cmd :
How do I run a Java program from the command line on Windows?
Personally, I would decide based on where the application runs:
If it were my personal computer, I would use the second option with the bash script (resources on my local machine can change a lot because of heavy use of other programs, and at some point I might run out of memory, for example).
If it goes to the cloud (Amazon, Google, whatever), I know exactly what kind of processes are running there (it should not change as dynamically as on my local PC), and a long-running Java process with some scheduler would be fine for me.
I am working on a scheduled job that will run at certain interval (eg. once a day at 1pm), scheduled through Cron. I am working with Java and Spring.
Writing the scheduled job is easy enough. It grabs a list of people with certain criteria from the db, and for each person does some calculation and triggers a message.
I am working on a single-node environment locally and in testing, but when we go to production it will be a multi-node environment (with a load balancer, etc.). My concern is how a multi-node environment would affect the scheduled job.
My guess is I could (or very likely would) end up triggering duplicate messages.
Machine 1: Grab list of people, do calculation
Machine 2: Grab list of people, do calculation
Machine 1: Trigger message
Machine 2: Trigger message
Is my guess correct?
What would be the recommended solution to avoid the above issue? Do I need to create a master/slave distributed system solution to manage multi node environment?
If you have something like three Tomcat instances, each load balanced behind Apache, for example, and on each your application runs then you will have three different triggers and your job will run three times. I don't think you will have a multi-node environment with distributed job execution unless some kind of mechanism for distributing the parts of the job is in place.
If you haven't looked at this project yet, take a peek at Spring XD. It handles Spring Batch Jobs and can be run in distributed mode.
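One common way to keep every node from firing the same run (not something the answers above prescribe, just a sketch of the idea) is to have each node try to claim the run in a shared store first, and only send messages if the claim succeeds. Below, a ConcurrentHashMap stands in for that shared store. In a real multi-node setup it would have to be something all nodes can see, such as a database table with a unique key. All names here are hypothetical.

```java
import java.time.LocalDate;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class RunClaim {

    // Stand-in for a store shared by all nodes (e.g. a DB table with a unique key on job + date).
    private final ConcurrentMap<String, String> claims = new ConcurrentHashMap<>();

    // Returns true only for the first node that claims the given run.
    boolean tryClaim(String jobName, LocalDate runDate, String nodeId) {
        String key = jobName + "@" + runDate;
        return claims.putIfAbsent(key, nodeId) == null;
    }

    public static void main(String[] args) {
        RunClaim store = new RunClaim();
        LocalDate today = LocalDate.now();
        for (String node : new String[] {"machine-1", "machine-2"}) {
            if (store.tryClaim("dailyMessages", today, node)) {
                System.out.println(node + ": grab list, calculate, trigger messages");
            } else {
                System.out.println(node + ": run already claimed, skipping");
            }
        }
    }
}
```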
I have a Java/Database project in Netbeans that I would like to run once a day at a set time. I am using Derby for the database driver. I am trying to automate a process.
How can I 'schedule' this program to run at specified times?
How can I customize this to keep running until a certain criteria is met?
Say my criterion is that it has to populate 500 rows in the database. (So if at the scheduled time it can only populate 400 rows, then maybe 2 hours later it tries running again to fill the last 100 rows.)
Lastly, what are the best practices of automation and scheduled tasks?
How can I 'schedule' this program to run at specified times?
This can be done in one of two ways. The first depends on your operating system: write a job that kicks off the Java program at the intervals you need. You can also set the job up to start at system startup.
On Linux you can accomplish this with a cron job. On Windows you can refer to http://support.microsoft.com/kb/308569.
The second way is to program the scheduler into your Java program using http://quartz-scheduler.org or http://www.sauronsoftware.it/projects/cron4j/ .
How can I customize this to keep running until a certain criteria is met?
This is perhaps best established from within your program, although it is hard to give you directions without much info.
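As a sketch of what "from within your program" could look like for the 500-row example: keep a running total and retry after a wait until the target is reached or you give up. The insertUpTo callback is a stand-in for the real Derby insert logic, and all names and numbers are assumptions.

```java
import java.util.function.IntUnaryOperator;

final class FillUntilDone {

    // Repeats insertUpTo(remaining) until `target` rows exist or attempts run out.
    // insertUpTo returns how many rows it actually managed to insert on that attempt.
    static int run(int target, int maxAttempts, long waitMillis, IntUnaryOperator insertUpTo) {
        int inserted = 0;
        for (int attempt = 1; attempt <= maxAttempts && inserted < target; attempt++) {
            if (attempt > 1) {
                try {
                    Thread.sleep(waitMillis); // e.g. two hours between attempts
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break; // stop retrying if the process is being shut down
                }
            }
            inserted += insertUpTo.applyAsInt(target - inserted);
        }
        return inserted;
    }

    public static void main(String[] args) {
        // Simulated source that can supply at most 400 rows per attempt (assumption for the demo).
        int total = run(500, 5, 10, remaining -> Math.min(remaining, 400));
        System.out.println("rows inserted: " + total); // prints "rows inserted: 500"
    }
}
```

In a real program the running total would more likely come from counting rows in the table, so that a restarted process picks up where the previous one left off.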
Lastly, what are the best practices of automation and scheduled tasks?
Depending on your application architecture, scheduling and automation can be handled either from within the app or with support from the operating system. The choice depends on how much control the application needs, which platform makes scheduling easy, and so on.
Hope this helps.
Quartz is a scheduling project for Java. I have used it in many projects and find it to be very intuitive.
It may be a little over the top for what you're after, but it's worth a look anyway.
You can use java.util.Timer to schedule the events; each event/task must be implemented as a TimerTask.
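A minimal sketch of the Timer approach for a once-a-day run might look like this. The 13:00 run time, the class name and the task body are placeholders, not anything from the question.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneId;
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.TimeUnit;

final class DailyTimerJob {

    // Next occurrence of `at`: today if it is still ahead, otherwise tomorrow.
    static LocalDateTime firstRun(LocalDateTime now, LocalTime at) {
        LocalDateTime today = now.toLocalDate().atTime(at);
        return today.isAfter(now) ? today : today.plusDays(1);
    }

    public static void main(String[] args) {
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                System.out.println("daily job running"); // placeholder for the real work
            }
        };
        LocalDateTime first = firstRun(LocalDateTime.now(), LocalTime.of(13, 0));
        Date firstTime = Date.from(first.atZone(ZoneId.systemDefault()).toInstant());
        // A non-daemon Timer keeps the JVM alive between runs.
        new Timer("daily-job").scheduleAtFixedRate(task, firstTime, TimeUnit.DAYS.toMillis(1));
    }
}
```

Note that the program has to stay running for the Timer to fire, so this is the "scheduler inside the app" style rather than the cron style.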
We have an application deployed on Tomcat 6. It's built on Spring/Struts 2, and has several Quartz tasks scheduled.
We'd like to move some tasks away from Quartz and onto Linux's cron, with the least amount of coding possible. How do I run those Spring/Quartz tasks outside the Tomcat container, in a standalone Java application?
(UPDATE: Since someone wanted to know why we want to do this)
We wanted to move the scheduled tasks to their own Java applications because our Tomcat keeps dying on us. There are no errors logged. We suspect that this one huge Quartz task we have is the culprit, but whether it's because of a memory leak or our Tomcat seg-faulting due to being set-up incorrectly, we still don't know.
We wanted to isolate it by kicking it out of the Tomcat container, and see if Tomcat will still die intermittently. However, since the application is already live (though in closed beta), we wanted to troubleshoot this with the least amount of coding work, while still keeping it running (coz, you know, "new code, new problems" -- FYI, we're already considering a rewrite/re-engineering, but "firefighting" is a more urgent concern right now).
I'm not familiar with Quartz but I am familiar with Struts 2 and cron.
Generally, in Linux you call separate processes with cron, so the first route would be to reduce the Quartz jobs to separate standalone programs. Considering the Java EE nature of your project and the dependency on acquiring services via Spring, I don't think this is a particularly attractive option.
A second route, which I've seen with PHP but which would work equally well with Struts 2, would be to use lynx to call a specific URL that triggers the job, something like:
*/15 * * * * lynx -dump http://localhost/MyApp/MyAction
Which would call your action every 15 min (the dump option prevents lynx from entering interactive mode and just dumps output to stdout so the program will just run for a moment), which could then run your job. You would then want to look at iptables (or similar) to restrict access to those services you would not want accessed externally. You can do this within struts2 as well by putting all these actions in a single package and making an interceptor to check that the requester is the local host.
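The local-host check itself could be something like the sketch below. It is only the address test, not a complete Struts 2 interceptor. The class and method names are made up, and the address string is assumed to come from HttpServletRequest.getRemoteAddr().

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class LocalRequestCheck {

    // True if the address string (e.g. from HttpServletRequest.getRemoteAddr()) is a loopback address.
    static boolean isLoopback(String remoteAddr) {
        // InetAddress.getByName treats null/empty as the loopback address, so reject those first.
        if (remoteAddr == null || remoteAddr.isEmpty()) {
            return false;
        }
        try {
            // For a literal IP address this only parses the string; no DNS lookup is made.
            return InetAddress.getByName(remoteAddr).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLoopback("127.0.0.1"));    // true
        System.out.println(isLoopback("192.168.1.20")); // false
    }
}
```

One caveat: if a reverse proxy on the same machine forwards requests to Tomcat over HTTP, the remote address Tomcat sees may be the proxy's, so the firewall rule is still worth having.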
I think this second method would require the least amount of change.
Apparently, the simplest way to go about this is to create a standalone Java application that calls the bean method you're executing in Quartz:
import org.springframework.context.*;
import org.springframework.context.support.*;

public class SomeJob {
    public static void main(String[] args) {
        // Load the same Spring configuration the web application uses.
        ApplicationContext ctx = new ClassPathXmlApplicationContext("applicationContext.xml");
        // Look up the bean that the Quartz task was invoking and call it directly.
        MyBean myBean = (MyBean) ctx.getBean("myBean");
        myBean.someMethod();
    }
}
...then run this from cron.
Meh.
In my application I need to have periodically run background tasks (which I can easily do with Quartz - i.e. schedule a given job to be run at a specific time periodically).
But I would like to have a little bit more control. In particular I need to:
have the system rerun a task that wasn't run at its scheduled time (i.e. the server was down and because of this the task was not run. In such a situation I want the 'late' task to be run ASAP)
it would be nice to easily control tasks - i.e. run a task on demand or see when a given task was last run or reschedule a given task to be run at a different time
It seems to me that the above points can be achieved with Spring Batch Admin, but I don't have much experience in this area yet. Also, I've seen numerous posts saying that Spring Batch is not a scheduling tool, so I'm beginning to have doubts about what the right tool for the job is here.
So my question is: can the above be achieved with Spring Batch Admin? Or perhaps Quartz is enough but needs configuring to do the above? Or maybe I need both? Or something else?
Thanks a lot :)
Peter
have the system rerun a task that wasn't run at its scheduled time
This feature in Quartz is called Misfire Instructions and does exactly what you need, but is a lot more flexible. All you need is to configure a JDBCJobStore, so that the scheduling state is kept in a database and survives a restart.
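To make the idea concrete, here is a plain-JDK sketch of the kind of check that misfire handling performs on startup. It is not Quartz's API or implementation, just an illustration. The last run time is assumed to be loaded from persistent storage, and the names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

final class CatchUpCheck {

    // True if the job has never run, or its last run is at least one interval old.
    static boolean isOverdue(Instant lastRun, Instant now, Duration interval) {
        return lastRun == null || !now.isBefore(lastRun.plus(interval));
    }

    public static void main(String[] args) {
        Instant lastRun = Instant.parse("2024-01-01T10:00:00Z"); // would be loaded from persistent storage
        Instant now = Instant.parse("2024-01-01T12:30:00Z");     // the server comes back up
        if (isOverdue(lastRun, now, Duration.ofHours(1))) {
            System.out.println("missed a run, executing now");
        }
    }
}
```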
it would be nice to easily control tasks - i.e. run a task on demand or see when a given task was last run or reschedule a given task to be run at a different time
You can use Quartz JMX to access various information (like previous and next run time) or query the Quartz database tables directly. There are also free and commercial management tools based on the above input. I believe you can also manually run jobs there.
Spring Batch can be integrated with Quartz, but not replace it.