Schedule, a Cron Adjunct
by Jim Scott
The author of a C program that runs other programs
as soon as possible after their scheduled times explains
how it works.
Cron
is the standard UNIX scheduling program. A file,
crontab, controls when programs are run. Cron is a
remarkably versatile program, but it has a deficiency.
If the computer is down when a command is scheduled
to run, cron skips that run and waits for the next
scheduled time. This may be inappropriate for commands
that must run on a particular day, even if they run
later than usual. (Think of payroll, whose absence
will be noticed, rather than the deletion of empty
logs, whose absence might not.)
This
shortcoming can be partially overcome by running programs
at or after the scheduled time on a particular day.
This scheduler, Schedule, runs a program as soon as
possible after the scheduled time. The scheduled time
and the status of the run are stored in a database
table. Results are stored in a logfile.
A Solution
I have written a C program, inspired by a program
that handled scheduling at a former employer's, that
runs a program any time after its scheduled time,
on the day it was scheduled. It resets the database
at midnight, reflecting that it's a new day and nothing
has yet run.
The
inputs are a configuration table and the result of a
check program. The check program can test anything you
need to verify before running the program. For example,
you can check that an input file is present and is no
longer growing, as you might need in a networked environment
with chancy communication. Or perhaps you need 10
input files, and all must be present before processing.
Many circumstances will suggest themselves.
Implementation
The Schedule program (Listing 1) keeps its configuration
and state in a PostgreSQL table. It could
be changed to use two text files or a text file and
a database with little effort. The former employer's
implementation used a text table and a database.
Listing 1. Schedule
The program starts by reading its configuration table,
installing its signal handlers and using the at program
to schedule the reset of the state values in the database.
This done, it enters the scheduling loop.
The
scheduler, run every sixty seconds, determines the
current time and traverses a list of programs to be
run. If a program hasn't been run and its scheduled
time has passed, its check program is run; if the check
succeeds, the program is run, and the fact is logged
and recorded in the database. Once it has run, it
no longer appears in the refilled list.
The
list is refilled from the configuration table at each
iteration of the scheduler, so the latency of a change
to the configuration table is the same as the latency
of the scheduler. The at program and its dæmon,
atd, run programs when requested. You can look at
the queue using atq and remove entries during testing
using atrm. Type man at for details.
If
you change a scheduled time, the change is reflected
as soon as the configuration is refreshed, usually
within a minute. This would be useful if you knew
the data would be late and wanted to avoid unnecessary
notification of failure in the logs. Or, perhaps,
if the data had already arrived early, and your bonus
depended upon prompt processing. Of course, if the
data arrives very late, it is run whether you're there
or not.
The
program currently uses a predefined array of structures
to hold the configuration table. This could be malloced
and freed each time through the read_config procedure;
however, it is small enough that I didn't think it
necessary to add the overhead of allocating and freeing
the memory once a minute. The size of the structure
is 218 bytes, so an array of 100 programs requires
only 21,800 bytes of storage.
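The 218-byte figure is consistent with one character
array per column of the schedule table, each sized to
the column width plus a terminating null byte. The
following is a minimal sketch under that assumption;
the names sched_entry, config_table, MAX_PROGRAMS and
config_table_bytes are hypothetical and are not taken
from Listing 1.

```c
#include <stddef.h>

/* Hypothetical layout: each text column plus one byte for the C string
 * terminator. Field names mirror the table columns; Listing 1 may
 * declare them differently. */
struct sched_entry {
    char days[8];        /* char(7)      + '\0' */
    char time[6];        /* char(5)      + '\0' */
    char program[101];   /* varchar(100) + '\0' */
    char checkprog[101]; /* varchar(100) + '\0' */
    char didrun[2];      /* char(1)      + '\0' */
};

#define MAX_PROGRAMS 100

/* Fixed table, refilled on every pass instead of using malloc/free. */
static struct sched_entry config_table[MAX_PROGRAMS];

/* Number of bytes occupied by the whole table. */
size_t config_table_bytes(void)
{
    return sizeof config_table;
}
```

Because every member is a char array, the compiler adds
no padding, so the structure size is simply the sum of
the array sizes: 8 + 6 + 101 + 101 + 2 = 218.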
The
signal handler is reinstalled each time a signal is
received. This reflects that, in the Linux
world, the signal handler is reset to its default
behavior when a signal is received. You must explicitly
reset the handler unless the default is what you want.
This is the same as SysV behavior, but it differs
from BSD UNIX.
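One way to avoid depending on whether the handler is
reset is to install it with the POSIX sigaction call,
which keeps the handler in place unless SA_RESETHAND is
requested. This is a sketch of that alternative, not
the approach used in Listing 1; the names on_usr1,
install_quit_handler, quit_requested and signals_seen
are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; a main loop would poll this flag. */
static volatile sig_atomic_t quit_requested = 0;
/* Counts deliveries, only to show the handler stays installed. */
static volatile sig_atomic_t signals_seen = 0;

static void on_usr1(int signo)
{
    (void)signo;
    signals_seen++;
    quit_requested = 1;
}

/* Install the SIGUSR1 handler. Returns 0 on success, -1 on error. */
int install_quit_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;   /* no SA_RESETHAND: the handler remains installed */
    return sigaction(SIGUSR1, &sa, NULL);
}
```

With this arrangement the handler does not need to
reinstall itself, and the scheduling loop can test
quit_requested once per pass and shut down in an
orderly way.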
The
scheduler depends upon an array of days and times.
The days start on Sunday (day 0) and end on Saturday
(day 6). The string representing the days some program
is to be run looks like NYNYYNN for a program to be
run on Monday, Wednesday and Thursday. The scheduled
time of a run is entered in local time, in 24-hour
format. For example, 2:20 PM is entered as 14:20.
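The day string and time format described above can be
tested against the current local time roughly as
follows. This is a minimal sketch; the function name
is_due and its return convention are assumptions made
for illustration, not the code in Listing 1.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Returns 1 if the entry is scheduled for the weekday in *now and its
 * HH:MM time has been reached, 0 if not, and -1 if either string is
 * malformed. */
int is_due(const char *days, const char *hhmm, const struct tm *now)
{
    int hour, minute;

    if (strlen(days) != 7)
        return -1;
    if (strlen(hhmm) != 5 || sscanf(hhmm, "%2d:%2d", &hour, &minute) != 2)
        return -1;
    if (hour < 0 || hour > 23 || minute < 0 || minute > 59)
        return -1;

    /* tm_wday uses the same convention: 0 = Sunday ... 6 = Saturday. */
    if (days[now->tm_wday] != 'Y')
        return 0;

    /* Compare minutes since midnight, local time. */
    return now->tm_hour * 60 + now->tm_min >= hour * 60 + minute;
}
```

A caller would obtain now from localtime and would
also confirm that the entry has not yet run today
before starting it.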
The
scheduler runs as a dæmon, in the background.
The code to turn it into a dæmon is taken directly
from Stevens' Unix Network Programming, Vol. 1. The
program has no standard input, standard output or
standard error after it becomes a dæmon. It talks to
you only through the log, and you can communicate
with it only through signals. As designed, it
responds to kill -10 (SIGUSR1, user signal 1) by quitting.
Of course, it may be killed by kill -9, but this should
be a last resort with any program. With kill -9, the
program gets no chance to tidy up or end processing
in an orderly manner; it is simply stopped.
Discussion
The scheduler first checks if the prerequisites for
running a program have been satisfied, then it runs
the program. If the machine is down at the scheduled
time, the program is run as soon as possible, allowing
for the granularity of the program (currently 60 seconds).
This should be adequate for all but critical processing.
If ten programs are scheduled to run at one particular
time, the last to run does so about ten minutes later.
If there is any time between scheduled runs, everything
runs on time.
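The check-then-run step can be sketched with the
standard system call, treating an exit status of 0
from the check program as "prerequisites satisfied".
The function names run_command and check_and_run and
the return values are assumptions for illustration;
Listing 1 may start programs differently.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Run a shell command. Returns its exit code, or -1 if it could not be
 * started or did not exit normally. */
static int run_command(const char *cmd)
{
    int status = system(cmd);

    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Returns  1 if the check passed and the program exited with status 0,
 *          0 if the check did not pass, so the program was not started,
 *         -1 if the program was started but failed or could not be run. */
int check_and_run(const char *checkprog, const char *program)
{
    if (run_command(checkprog) != 0)
        return 0;   /* prerequisites not met; try again on the next pass */
    return run_command(program) == 0 ? 1 : -1;
}
```

A result of 0 leaves the entry untouched, so the
scheduler simply retries on its next pass; a result of
1 is the point at which the run would be logged and the
database updated.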
The
granularity can be adjusted by changing the sleep
period in the scheduling subroutine. As it is, the
scheduler responds within about a minute. Actually,
five minutes is a more reasonable period; for test
purposes, however, the time was set to one minute.
Here
is a sample check file:
#!/usr/bin/perl
# Checks for the existence of the glloadfile. Returns
# 0 if it is present and has a non-zero size.
if (-s "/usr/tmp/glloadfile") {
    exit(0);
} else {
    exit(1);
}
If
you wanted to know if the file was still growing,
you might check its size twice, with a pause in between.
If it's still the same size after a minute, it's probably
all there, although that judgment must reflect the
realities of your processing. For example, some systems
create the output file, then spend many hours processing,
sometimes with waits between bursts of output. In
general, anyone who has dealt with this sort of processing
knows what is required for a particular program to
run.
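The size-twice test could be written as a small C
helper like the one below. It is a sketch only: the
name file_is_stable is hypothetical, and a fixed wait
is no guarantee for producers that pause between bursts
of output.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if path exists, is non-empty and has the same size before
 * and after waiting wait_seconds; otherwise returns 0. */
int file_is_stable(const char *path, unsigned int wait_seconds)
{
    struct stat before, after;

    if (stat(path, &before) != 0 || before.st_size == 0)
        return 0;
    sleep(wait_seconds);
    if (stat(path, &after) != 0)
        return 0;
    return before.st_size == after.st_size;
}
```

A check program built on this would exit with status 0
when file_is_stable("/usr/tmp/glloadfile", 60) returns
1, and with status 1 otherwise, matching the convention
of the Perl check file.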
The
program to be run can be anything. However, all input
must come from the program, its files, the environment
or the database. The easiest way to allow differing
arguments or multiple runs each day is to enter a
script, rather than the program itself, as the program
to run in the database. The script merely invokes the
program with any required arguments. With seven scripts
and appropriate entries in the configuration table, it
would be possible to run a program at a different time
each day, using different arguments each time it is run.
The
other problem is in the database. If you want to run
the program more than once each day, the second and
subsequent runs all must have different names. As
far as the database is concerned, if it has run, it
has run, at least until tomorrow. By using scripts
to invoke a program, you can run it many times, each
with a different configuration line and database entry.
Simply use a different script name. Here is the format
of the database table:
         Table "schedule"
  Column   |          Type          | Modifiers
-----------+------------------------+-----------
 days      | character(7)           |
 time      | character(5)           |
 program   | character varying(100) |
 checkprog | character varying(100) |
 didrun    | character(1)           |
The SQL to create the database table is:
create table schedule (
    days      char(7),
    time      char(5),
    program   varchar(100),
    checkprog varchar(100),
    didrun    char(1)
);
The
entry for program is the name to be invoked. This
could be the program name or a script name. The entry
for didrun has three possible values, Y for yes, N
for no and P for pending. If the program has run,
the scheduler changes it from N to Y. The midnight
reset changes states for all programs to N, reflecting
the new day's reality.
Managing the Database
The database sometimes requires changes. I have written
a Tk procedure to handle such changes. It is included
as Listing 2. You can add, delete, query or change
database entries using this procedure.
Listing 2. Tk Procedure for Database Changes
Conclusion
Schedule is a viable alternative to cron or at for
many types of processing. It's easy to use and can
run unattended. It can significantly ease the burden
of scheduling on the programmer responsible for both
the operation of the machine itself and its production.
After all, when the machine's down, you're trying to
get it back up and stable. You probably don't have time
to worry about what needs to run today. So, within
limits, you can let the scheduler run things. The
principal limitation is that the scheduler won't delay
something until the next day. It thinks something
should run today or not at all. But it does have the
advantage of using all of today, rather than only a
small dedicated slice of today.