Using pg_badplan to CREATE STATISTICS is a good plan

Thanks to Tomas Vondra’s presentation “CREATE STATISTICS – What is it for?” at Nordic PGDay 2018, my Trustly colleague Claes Jakobsson came up with the idea for a new extension.

The presentation explained how to compare the planner’s estimated row count with the actual row count by manually executing EXPLAIN ANALYZE.
Claes’s idea was to perform this comparison automatically, right after a query has been executed.
During the conference, Claes implemented the idea and emailed me some code to test.

Below is a demo of this extension, using the same data set and queries as in the presentation.

git clone
psql -X -f create-statistics-talk/create.sql

Let’s first try the query without pg_badplan using EXPLAIN ANALYZE:

\copy zip_codes from ~/src/create-statistics-talk/no_postal_codes_utf.csv with csv header;
COPY 4574

EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM zip_codes WHERE city = 'Oslo' AND county = 'Oslo';
                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 Seq Scan on zip_codes  (cost=0.00..108.61 rows=90 width=36) (actual rows=642 loops=1)
   Filter: ((city = 'Oslo'::text) AND (county = 'Oslo'::text))
   Rows Removed by Filter: 3932
 Planning time: 0.357 ms
 Execution time: 0.679 ms
(5 rows)

As you can see, the ratio between actual and estimated rows is approximately 7.133 (642/90).

Next, let’s see how we can automate this manual process using pg_badplan.

git clone
cd pg_badplan
make install
echo "shared_preload_libraries = 'pg_badplan'" >> /usr/local/var/postgres/postgresql.conf
echo "pg_badplan.min_row_threshold = 10" >> /usr/local/var/postgres/postgresql.conf
brew services restart postgresql

The pg_badplan.min_row_threshold has been set to 10 just for testing,
the default is 1000, but the query in the example returns fewer rows than that.
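In essence, pg_badplan compares the planner’s estimated row count against the actual row count after execution, and logs the query when the ratio is too far off. A rough sketch of that check in Python (the real extension does this in C; the parameter defaults and the exact skip logic here are assumptions for illustration):

```python
def is_bad_plan(estimated_rows, actual_rows,
                min_row_threshold=10, ratio_threshold=2.0):
    # Skip small results: below min_row_threshold the ratio is too noisy
    # to be meaningful (assumed semantics of pg_badplan.min_row_threshold).
    if max(estimated_rows, actual_rows) < min_row_threshold:
        return False
    # An estimate off by more than the threshold in either direction
    # counts as a bad plan worth logging.
    ratio = max(actual_rows / estimated_rows, estimated_rows / actual_rows)
    return ratio > ratio_threshold
```

For the example query above, 90 estimated rows versus 642 actual rows gives a ratio of about 7.133, well above the threshold.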

We can now try to run the same query without EXPLAIN ANALYZE,
and rely on pg_badplan to notify us in the normal log file.

SELECT * FROM zip_codes WHERE city = 'Oslo' AND county = 'Oslo';
tail -f /usr/local/var/postgres/log/postgresql.csv | grep pg_badplan
2018-03-19 11:46:54.154 CET,"joel","joel",91666,"[local]",5aaf951a.16612,5,"SELECT",2018-03-19 11:46:50 CET,4/34,0,LOG,00000,"pg_badplan: rows expected/actual ratio 7.133 exceeded for query SELECT * FROM zip_codes WHERE city = 'Oslo' AND county = 'Oslo';",,,,,,"SELECT * FROM zip_codes WHERE city = 'Oslo' AND county = 'Oslo';",,,"psql"

Finally, after having analyzed the pg_badplan log output,
we will hopefully realise it would be a good plan to
CREATE STATISTICS to help the planner make better estimates,
which in turn helps it select better, faster plans.

CREATE STATISTICS s (dependencies) ON city, state, county FROM zip_codes;
ANALYZE zip_codes;

Run the same query again, and watch how pg_badplan now stays silent, since the ratio between the estimated and actual rows is below the pg_badplan.ratio threshold value.

The code is available here:

pg_catalog visualized

I couldn’t find any graph showing all the relations between all the pg_catalog tables,
so just for fun I wrote a little script to parse the SGML and generate a graph using GraphViz.

use strict;
use warnings;

open my $fh, "<:encoding(utf8)", './doc/src/sgml/catalogs.sgml' or die "$!";

my $table;
my $column;
my $references;
my $pg_catalog_fk_map = {};
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ m!^\s+<title><structname>([^<>]+)</> Columns</title>$!) {
        $table = $1;
    } elsif ($line =~ m!^\s+<entry><structfield>([^<>]+)</structfield></entry>$!) {
        $column = $1;
    } elsif ($line =~ m!^\s+<entry><type>(oid|regproc)</type></entry>$!) {
        # oid/regproc column; the referenced table (if any) follows on a later line
    } elsif (defined $column && $line =~ m!^\s+<entry><literal><link\ linkend="[^"]+"><structname>([^<>]+)</structname></link>\.oid</literal></entry>$!) {
        $references = $1;
        if (!defined($pg_catalog_fk_map->{$table}->{$column})) {
            $pg_catalog_fk_map->{$table}->{$column} = $references;
        } elsif ($pg_catalog_fk_map->{$table}->{$column} ne $references) {
            die "Inconsistent data, $table.$column references both $pg_catalog_fk_map->{$table}->{$column} and $references";
        }
        if (!defined($pg_catalog_fk_map->{$references})) {
            $pg_catalog_fk_map->{$references} = {};
        }
    }
}
close $fh;

my $dot = qq!
digraph g {
    graph [
        rankdir = "LR"
    ];
    node [
        fontsize = "16"
        shape = "ellipse"
    ];
!;
foreach my $table (sort keys %{$pg_catalog_fk_map}) {
    $dot .= qq!
    "$table" [
        label = "$table | <oid> oid|!;
    foreach my $column (sort keys %{$pg_catalog_fk_map->{$table}}) {
        $dot .= "<$column> $column|";
    }
    $dot .= qq!"
        shape = "record"
    ];
!;
}
my $id = 0;
foreach my $table (sort keys %{$pg_catalog_fk_map}) {
    foreach my $column (sort keys %{$pg_catalog_fk_map->{$table}}) {
        my $references = $pg_catalog_fk_map->{$table}->{$column};
        $dot .= qq!
    "$table":$column -> "$references":oid [
        id = $id
    ];
!;
        $id++;
    }
}
$dot .= qq!
}
!;

print $dot;
./ > 
dot -Tpng -o pg_catalog.png


PgTerminator: Kill misbehaving db users if important processes are waiting

If you are a DBA and have ever had to manually call pg_terminate_backend() to kill some misbehaving backend process, then this tool might be of interest.

Long running queries are not a problem, as long as they don’t force other important processes to wait.

Another classic is a human user forgetting to COMMIT.

Please see the Github page for more info:

Pgcronjob: Cron daemon to run user-defined functions

I quite often find myself having to split a large write operation, such as updating millions of rows in a heavily used table. This usually boils down to writing a small Perl script that runs a small one-off function, which performs the task by updating a limited number of rows per run and committing in between, to avoid a long-running transaction.
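The pattern itself is simple; a minimal sketch with hypothetical callables (not pgcronjob’s actual code):

```python
def run_in_batches(process_batch, commit):
    """Perform a large write as many small transactions.

    process_batch() is assumed to do a bounded amount of work,
    e.g. UPDATE ... LIMIT n, and return the number of rows it touched;
    commit() commits the current transaction, so no single transaction
    stays open for the whole operation.
    """
    batches = 0
    while process_batch() > 0:
        commit()  # commit in between to avoid one long-running transaction
        batches += 1
    return batches
```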

The last time I needed to do this, I decided not to write yet another Perl script and instead do something to improve the situation. I have not been able to find any cron-ish tools at all to run database functions, except for pgAgent, but that looks more like a client-side daemon; I wanted something bare-minimum that can run server-side and piggy-back on the OS cron.

A few hours and 382 lines of code later, the project was finished and pushed to Github:

createuser pgcronjob
psql -f install.sql
crontab pgcronjob.crontab

CREATE OR REPLACE FUNCTION public.CronJob_Function_Template_Skeleton()
RETURNS batchjobstate
LANGUAGE plpgsql
SET search_path TO public, pg_temp
AS $FUNC$
BEGIN
RAISE NOTICE 'Hello world!';
PERFORM pg_sleep(random());
RAISE NOTICE 'Slept for a while.';
IF random() < 0.5 THEN
 -- Tell CronJob() we have more work to do and we want it to run us again in due time
 RAISE NOTICE 'See you again!';
 RETURN 'AGAIN';
ELSIF random() < 0.5 THEN
 -- Throw error to CronJob() to test errors
 RAISE EXCEPTION 'Simulate error in CronJob function';
ELSE
 -- Tell CronJob() we're done and we don't want it to run us ever again
 RAISE NOTICE 'Bye world!';
 RETURN 'DONE';
END IF;
END;
$FUNC$;

GRANT EXECUTE ON FUNCTION public.CronJob_Function_Template_Skeleton() TO pgcronjob;

SELECT CronJob_Register('public','cronjob_function_template_skeleton');


Hopefully this will be useful for others as well. It would be fun to get some feedback.

The initial commit even comes with a nice README and install and uninstall scripts.

Extract from the README:

The settings are conditions that must all be TRUE for the cronjob to run, i.e. they are AND'd together.

Always NOT NULL:
- Enabled boolean NOT NULL DEFAULT TRUE: Controls whether the cronjob is enabled or not.
- RunEvenIfOthersAreWaiting boolean NOT NULL DEFAULT FALSE: Controls whether to run the cronjob or not if there are other waiting db txns (pg_stat_activity.waiting).
- RetryOnError boolean NOT NULL DEFAULT FALSE: Controls whether to run the cronjob ever again if the user-defined function would throw an error.

Can be NULL (which means setting is ignored):
- RunAfterTimestamp timestamptz: Run only after the specified timestamp.
- RunUntilTimestamp timestamptz: Run only until the specified timestamp.
- RunAfterTime time: Run only after the specified time of the day.
- RunBeforeTime time: Run only until the specified time of the day.
- RunInterval interval: Run only after having slept for this interval after the last run started.
- SleepInterval interval: Run only after having slept for this interval after the last run finished.
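How these conditions combine can be sketched as follows (a Python sketch covering a subset of the settings, with field names mirroring the README; not pgcronjob’s actual code, which runs inside the database):

```python
def should_run(job, now, others_waiting, last_error):
    # All conditions are AND'd together; a NULL (None) setting is ignored.
    checks = [
        job["Enabled"],
        job["RunEvenIfOthersAreWaiting"] or not others_waiting,
        job["RetryOnError"] or not last_error,
        job["RunAfterTimestamp"] is None or now >= job["RunAfterTimestamp"],
        job["RunUntilTimestamp"] is None or now <= job["RunUntilTimestamp"],
    ]
    return all(checks)
```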

Finding missing foreign keys

By coincidence I stumbled upon a table where one of its columns didn’t have a foreign key. I found this strange, since our convention is to always add foreign keys when appropriate.

But humans are humans, and will eventually forget to add a foreign key, so we better have both belt and suspenders and find a way to deal with this inevitable problem in an efficient way.

It would be a tedious job to manually look for missing foreign keys in all tables and columns.
But if you’re lucky enough to have the pleasure of working with a system where all tables and columns have been given their names by following a strict naming convention, it might be possible to fully automate the task.

This is the naming convention we use:

  1. Table names are always in plural form with a trailing “s”, e.g. users.
  2. Primary key column names are always equal to the table name in singular form with a trailing “id”, e.g. userid.
  3. Foreign key columns are always equal to the primary key they are referencing, e.g. transactions.userid -> users.userid.

This means you always know, based on the column name alone, what table (if any) that column might be referencing.

Example: if the table transactions has a column named userid, and there is a table where userid is the primary key, but there isn’t any foreign key on transactions.userid, then it’s a missing foreign key, or else someone has not followed the naming convention.

Thanks to PostgreSQL’s fantastic pg_catalog system tables, we can write a query which uses the rules from the naming convention and returns all the columns which appear to be missing foreign keys. It doesn’t support multi-column keys, but we don’t have many of those, so it’s not a problem in my case.

Thanks to the view below, I automatically found three more missing foreign keys of the same type, which saved me many hours of boring work today.

pg1:joel=#* SELECT * FROM view_missing_foreign_keys;
 nspname |     relname     | attname 
---------+-----------------+---------
 public  | transactions    | userid
 public  | someothertable1 | userid
 public  | someothertable2 | userid
(3 rows)

I’m posting the view I wrote here in the hope that it might be useful for others with a similar naming convention, and/or to inspire others to include the table name in the column names they use as primary/foreign keys.

CREATE OR REPLACE VIEW view_missing_foreign_keys AS
SELECT
    pg_catalog.pg_namespace.nspname,
    pg_catalog.pg_class.relname,
    pg_catalog.pg_attribute.attname
FROM pg_catalog.pg_namespace
INNER JOIN pg_catalog.pg_class ON (pg_catalog.pg_class.relnamespace = pg_catalog.pg_namespace.oid)
INNER JOIN pg_catalog.pg_attribute ON (pg_catalog.pg_attribute.attrelid = pg_catalog.pg_class.oid)
WHERE pg_catalog.pg_class.relkind = 'r'
AND pg_catalog.pg_attribute.attnum > 0
AND NOT pg_catalog.pg_attribute.attisdropped
AND pg_catalog.pg_namespace.nspname NOT IN ('pg_toast','information_schema','pg_catalog')
AND pg_catalog.pg_attribute.attname LIKE '%id'
AND EXISTS (
    -- The column is PRIMARY KEY in some table
    SELECT 1 FROM pg_catalog.pg_constraint
    WHERE pg_catalog.pg_constraint.contype = 'p'
    AND pg_catalog.pg_get_constraintdef(pg_catalog.pg_constraint.oid) = format('PRIMARY KEY (%s)',pg_catalog.pg_attribute.attname)
)
AND NOT EXISTS (
    -- There is no FOREIGN KEY on this column
    SELECT 1 FROM pg_catalog.pg_constraint
    WHERE pg_catalog.pg_constraint.contype = 'f'
    AND pg_catalog.pg_constraint.conrelid = pg_catalog.pg_class.oid
    AND pg_catalog.pg_get_constraintdef(pg_catalog.pg_constraint.oid) LIKE (format('FOREIGN KEY (%s)',pg_catalog.pg_attribute.attname) || '%')
)
AND NOT EXISTS (
    -- This column is not the PRIMARY KEY of its own table,
    -- since if it was, we wouldn't require a FOREIGN KEY on it
    SELECT 1 FROM pg_catalog.pg_constraint
    WHERE pg_catalog.pg_constraint.contype = 'p'
    AND pg_catalog.pg_constraint.conrelid = pg_catalog.pg_class.oid
    AND pg_catalog.pg_get_constraintdef(pg_catalog.pg_constraint.oid) = format('PRIMARY KEY (%s)',pg_catalog.pg_attribute.attname)
);

“How we use PostgreSQL at Trustly” at PGConf.EU

Slides from my talk, How we use PostgreSQL at Trustly, are now available.

psql \watch 1400000000 epoch time countdown counter

\pset format unaligned
\pset fieldsep ' '
\pset tuples_only
SELECT
    (('epoch'::timestamptz + 14*10^8 * '1 s'::interval)-now())::interval(0),
    (14*10^8-extract(epoch from now()))::int,
    extract(epoch from now())::int;
\watch 1

09:18:28 33508 1399966492
09:18:27 33507 1399966493
09:18:26 33506 1399966494
09:18:25 33505 1399966495
09:18:24 33504 1399966496
09:18:23 33503 1399966497