Each night a job is invoked which checks whether it is first of a month. During our testing, we can't wait for days and hence want to change the system date to a first of the month so that job gets invoked and processes all transactions logged into database so far.
We are using AWS ECS on EC2. Containers largely run Java (Spring Boot) but other containers run NodeJS and Spring batch jobs (short lived tasks). Now I want to be able to set all containers to use a specific date (main requirement; and may be time too and that means that launch with some specific time and then run their clocks as normal clocks). The long running tasks are auto-scaled (min 3 and max 10).
How can this be achieved?
Additional information
The database is in UTC and all Java containers have time zone set to 'America/New_York'.
Other option thought was to override Java System.currentTimeMillis() through Aspect oriented programming (AOP) but the AOP code needs to invoked from some place. It was thought to do it through API but there are 2 issues, a) Unable not make AOP code to work yet & b) How to call API on each container when it starts?
EC2 instances run using Red Hat Enterprise Linux.
Related
I am working on a scheduled job that will run at certain interval (eg. once a day at 1pm), scheduled through Cron. I am working with Java and Spring.
Writing the scheduled job is easy enough - it does: grab list of people will certain criteria from db, for each person do some calculation and trigger a message.
I am working on a single-node environment locally and in testing, however when we go to production, it will be multi-node environment (with load balancer, etc). My concern is how would multi node environment affect the scheduled job?
My guess is I could (or very likely would) end up with triggering duplicate message.
Machine 1: Grab list of people, do calculation
Machine 2: Grab list of people, do calculation
Machine 1: Trigger message
Machine 2: Trigger message
Is my guess correct?
What would be the recommended solution to avoid the above issue? Do I need to create a master/slave distributed system solution to manage multi node environment?
If you have something like three Tomcat instances, each load balanced behind Apache, for example, and on each your application runs then you will have three different triggers and your job will run three times. I don't think you will have a multi-node environment with distributed job execution unless some kind of mechanism for distributing the parts of the job is in place.
If you haven't looked at this project yet, take a peek at Spring XD. It handles Spring Batch Jobs and can be run in distributed mode.
I'm new to web servers. I have a java class that does a set of computations. I want to have this java class run every hour and update my domain on AWS, with the data.
My question is how/where do I set this job to run?
Is there a standard for this? Or does AWS have something I can use? I know how to read/write my data to AWS.
Should a cron job be used? Should the cron job run on AWS?
You have 2 options for this.
Set a cron job and let the operating system execute the script that starts your java program every hour or so.
Use something like Quartz Scheduler. In this case your Java program would be running continuously and the scheduler would be within your Java program.
There are various advantages and disadvantages to both approaches. In the first case the advantage is that if something wrong happens to the program, you know that in the next hour a new process with a fresh new instance of your program will launch, while in the second case if your Java program hangs for some reason you won't know unless you have some kind of monitoring. However, in case 2 you can maintain some kind of state information you might want to keep between runs. Quartz has also lots of advanced features, like maintaining info about executions in a database.
You can also have the Quartz Scheduler run within your webserver itself (so no need for another process). Its just an extra few .jar files to include. So it depends what you actually want to do. You can refer to what features it supports here.
I have a Java/Database project in Netbeans that I would like to run once a day at a set time. I am using Derby for the database driver. I am trying to automate a process.
How can I 'schedule' this program to run at specified times?
How can I customize this to keep running until a certain criteria is met?
Say my criteria is that It has to populate 500 rows in the database. (So say at the scheduled time it runs it can only populate 400 rows, then maybe 2 hours later it tries running again to fill the last 100 rows)
Lastly, what are the best practices of automation and scheduled tasks?
How can I 'schedule' this program to run at specified times?
This can be done one of two ways, depending on your operating system - write a job that kicks off the java program at the intervals you need. You may then hook up the job to be started off on start up.
In Linux you can accomplish this with a cron job or so. On windows you may refer to this http://support.microsoft.com/kb/308569.
You may also program the scheduler into your java program using http://quartz-scheduler.org or http://www.sauronsoftware.it/projects/cron4j/ .
How can I customize this to keep running until a certain criteria is met?
This is perhaps best established from within your program, although it is hard to give you directions without much info.
Lastly, what are the best practices of automation and scheduled tasks?
Depending on your application architecture, scheduling and automation can be handled either from within the app or get support from the operating system. The criteria depends on how much control the application needs, which platform makes scheduling easy etc.
Hope this helps.
Quartz is a scheduling project for Java. I have used it in many projects and find it to be very intuitive.
It may be a little over the top for what your after but worth a look anyway.
You can make use of Timer for scheduling the events & the events/task must be implemented using TimerTask
I'm trying to write a Spring web application on a Weblogic server that makes several independent database SELECTs(i.e. they can safely be called concurrently), one of which takes 15 minutes to execute.
Once all the results are fetched, an email containing the results will be sent to a user list.
What's a good way to get around this problem? Is there a Spring library that can help or do I go ahead and create daemon threads to do the job?
EDIT: This will have to be done at the application layer (business requirement) and the email will be sent out by the web application.
Are you sure you are doing everything optimally? 15 minutes is a really long time unless you have a gabillion rows across dozens of tables and need a heckofalot of joins....this is your highest priority -- why is it taking so long?
Do you do the email job at set intervals, or is it invoked from your web app? If set intervals, you should do it in an outside job, possibly on another machine. You can use daemons or the quartz scheduler.
If you need to fire this process off from the web app, you need to do it asynchronously. You could use JMS, or you could just have a table into which you enter a new job request, with daemon process that looks for new jobs every X time period. Firing off background threads is possible, but its error prone and not worth the complication, especially since you have other valid options that are simpler.
If you are asking about Spring support for long-running, possibly asynchronous tasks, you have a choice between Spring JMS support and Spring Batch.
You can use spring quartz to schedule the job. That way the jobs will run in the same container but will not require an http request to trigger them.
I have a developed two small Java applications - a vanilla Java app and a Java Web application (i.e. Spring MVC, Servlets, JSP, etc.).
The vanilla application consists of several threads which read data continuously at varying rates (from once a second to twice a minute) from several websites, process the data and write it to a database.
The Web Application reads the data from the database and presents it using JSPs, etc.
I'd now like to deploy the applications to a Linux machine and have them run 24 x 7.
If the applications crash I would like them to be restarted.
What's the best way of doing this?
Your web container will run 24x7 by default. If your deployed application throws an exception, it's captured by the container. I wouldn't normally expect this process to not run. Perhaps if threads run away, then it may become unresponsive, so it's worth monitoring (perhaps by a separate process querying it via HTTP?).
Does your vanilla application need to run at regular intervals ? If so, then cron is a good bet. It'll invoke a new instance every 'n' minutes (or however you configure it). If your instance suffers a problem, then it'll simply bail out and a new instance will be launched at the next configured interval. Again, you should probably monitor this (capture log files?) in case some problem determines that it'll never succeed completely.
with Ubuntus upstart you can respawn processes automatically. A little bit more low-level is to put the respawn directly in /etc/inittab. Both work well, but upstart is more manageable (more tools), but requires a newer system (ubuntu, fedora, and debian is switching soon).
For inittab you need to add a line like this to /etc/inittab (from the inittab manpage):
12:2345:respawn:/path/to/myapp flags
For upstart you do something similar (this is a standard script in ubuntu 9.10):
user#host:/etc/init$ cat tty1.conf
# tty1 - getty
#
# This service maintains a getty on tty1 from the point the system is
# started until it is shut down again.
start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]
respawn
exec /sbin/getty -8 38400 tty1
Check out the ServletContextListener, this allows you to embed your java application inside your web application (by creating a background thread). Then you can have it all running inside the web container.
Consider investigating and using a web container supported by the operating system vendor so all the scripts to bring it up and down (including in case of problems) is written and maintained by somebody else but you.
E.g. Ubuntu has a Tomcat as a package
I have a crontab job running every 15 minutes to see if the script is still running. If not, it restarts the service. The script itself is a piece of Perl code:
#!/usr/bin/perl
use diagnostics;
use strict;
my %is_set;
for (#ARGV) {
$is_set{$_} = 1;
}
my $verbose = -1;
if ($is_set{"--verbose"}) {
$verbose = 1;
}
my #components = ("cdk", "qsar", "rdf");
foreach my $comp (#components) {
print "Checking component $comp\n" if ($verbose == 1);
my $bla = `ps aux | grep component | grep $comp-xws | grep -v "ps aux" | wc -l`;
$bla =~ s/\n|\r//g;
if ($bla eq "1") {
print " ... running\n" if ($verbose == 1);
} else {
print " ... restarting component $comp\n" if ($verbose == 1);
system "cd /home/egonw/runtime/$comp; sh runCDKcomponent.sh &";
}
}
First, when a problem occur, it is in general a good idea to have a human look at it to find the root cause as restarting a service without any action will in many cases not magically solve the issue. The common way to handle this situation is to use a monitoring solution offering some kind of alerting (by email, sms, etc) to let a human know that something is wrong and needs a human action. For example, have a look at HypericHQ, OpenNMS, Zenoss, Nagios, etc.
Second, if you want to offer some kind of highly available service, running multiple instances of the service (this is often referred to as clustering) would be a good idea. When doing so, if one instance goes down, the service won't be totally interrupted, obviously. Note that when using a cluster, if one node goes down because of too heavy load, it's very unlikely that the remaining part of the cluster will be able to handle the load so clustering isn't an absolute guarantee in all situations. Implementing this (at least for the web application) depends on the application server or servlet engine you are using.
But actually, if you are looking for something simple and pretty straight forward, I'd warmly suggest to check monit which is really a better alternative to a custom cron job (don't reinvent the wheel, monit is precisely doing what you want in a smart way). See this article for an introduction:
monit is a utility for managing and monitoring processes, files, directories and devices on a Unix system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations. For example, monit can start a process if it does not run, restart a process if it does not respond and stop a process if it uses to much resources. You may use monit to monitor files, directories and devices for changes, such as timestamps changes, checksum changes or size changes.
Java Service Wrapper may help with keeping the Java program up 24x7 (or very close).
Several years ago I worked on a project using Java 1.2 and our goal was to run 24x7. We never made it. The longest we managed to keep Java running was about 2-3 weeks. Twice it crashed after about 15 days. The first time we just restarted it, the second time a colleague did some research and found that the crash was due to an int variable overflowing in the Calendar class: the JdbcDriver had called new Date(year, month, day, hour minute, second) more than about 300 million times and each call had incremented the int 6 times. I think this particular bug may be fixed but you may find there are others that you encounter as you try to keep the JVM running for a long time.
So you may need to design your application to be restarted occasionally to avoid this kind of thing.