I find myself quite often having to split a large write operation such as updating millions of rows in a heavily used table. This usually boils down to writing a small Perl script that runs a small one-off function that performs the task by updating a limited numbers of rows per run, and then committing in between to avoid a long running transaction.
This time I needed to do this I decided to not write yet another Perl script and to instead do something to improve the situation. I’ve not been able to find any cron-ish tools at all to run database functions, except for pgAgent, but that looks more like a client-side daemon, I wanted something bare-minimum that can run server-side and piggy-back on the OS cron.
A few hours later and 382 lines of code later, the project was finished and pushed to Github:
https://github.com/trustly/pgcronjob
createuser pgcronjob psql -f install.sql crontab pgcronjob.crontab psql CREATE OR REPLACE FUNCTION public.CronJob_Function_Template_Skeleton() RETURNS batchjobstate LANGUAGE plpgsql SET search_path TO public, pg_temp AS $FUNC$ DECLARE BEGIN RAISE NOTICE 'Hello world!'; PERFORM pg_sleep(random()); RAISE NOTICE 'Slept for a while.'; IF random() < 0.5 THEN -- Tell CronJob() we have more work to do and we want it to run us again in due time RAISE NOTICE 'See you again!'; RETURN 'AGAIN'; ELSIF random() < 0.5 THEN -- Throw error to CronJob() to test errors RAISE EXCEPTION 'Simulate error in CronJob function'; ELSE -- Tell CronJob() we're done and we don't want it to run us ever again RAISE NOTICE 'Bye world!'; RETURN 'DONE'; END IF; END; $FUNC$; GRANT EXECUTE ON FUNCTION public.CronJob_Function_Template_Skeleton() TO pgcronjob; SELECT CronJob_Register('public','cronjob_function_template_skeleton');
Hopefully this will be useful for others as well. It would be fun to get some feedback.
The initial commit even comes with a nice README and install and uninstall scripts.
Extract from the README:
The settings are conditions that must all be TRUE for the cronjob to run, i.e. they are AND'd together. Always NOT NULL: - Enabled boolean NOT NULL DEFAULT TRUE: Controls whether the cronjob is enabled or not. - RunEvenIfOthersAreWaiting boolean NOT NULL DEFAULT FALSE: Controls whether to run the cronjob or not if there are other waiting db txns (pg_stat_activity.waiting). - RetryOnError boolean NOT NULL DEFAULT FALSE: Controls whether to run the cronjob ever again if the user-defined function would throw an error. Can be NULL (which means setting is ignored): - RunAfterTimestamp timestamptz: Run only after the specified timestamp. - RunUntilTimestamp timestamptz: Run only until the specified timestamp. - RunAfterTime time: Run only after the specified time of the day. - RunBeforeTime time: Run only until the specified time of the day. - RunInterval interval: Run only after having slept for this interval after the last run started. - SleepInterval interval: Run only after having slept for this interval after the last run finished.
Huh, interesting. I actually just wrote a replacement for pgAgent at my company and open sourced it: https://github.com/Tostino/jpgAgent
There really were a lack of schedulers until recently… Good to have options.
You definitely have a cool approach here though, very light weight.