Wednesday 28 March 2018

Don't use "kill" on PostgreSQL... or you're gonna have a bad time!

Have you ever been tempted to "kill -9a non-behaving PostgreSQL client session that won't die via "pg_cancel_backend()" or "pg_terminate_backend()"?  Let's just say that's at first you may believe (incorrectly) that it will affect only the single session you've killed, but in reality it's an incredibly bad idea... as the result will be ALL of your client sessions being reset and the database crashing.





Here's an example.  Let's say there is one session you need to kill, so you use kill -9:

postgres=# select * from pg_stat_activity;
 datid | datname  |  pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |          query_start          |         state_change          | wait_event_type | wait_event | state  | backend_xid | backend_xmin |              query
-------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+------------+--------+-------------+--------------+---------------------------------
 13323 | postgres | 15658 |       10 | postgres | psql             |             |                 |          -1 | 2018-03-28 16:46:27.924226+00 | 2018-03-28 16:46:34.663799+00 | 2018-03-28 16:46:34.663799+00 | 2018-03-28 16:46:34.663808+00 |                 |            | active |             |         1757 | select * from pg_stat_activity;
(1 row)


postgres=# \! kill -9 15658


The postmaster detects that a process has exited abnormally:

16:46:44 UTC [15642]: LOG:  server process (PID 15658) was terminated by signal 9: Killed
16:46:44 UTC [15642]: DETAIL:  Failed process was running: select * from pg_stat_activity;


And even though it wasn't in a transaction and was only running a simple select, the postmaster terminates "any other active service processes":


16:46:44 [15642]: LOG:  terminating any other active server processes


The end result is that the entire PostgreSQL cluster/server resets itself, and does crash recovery to cleanup.  The time taken to do this can be significant, depending on how busy your database is and how long it's been since the last checkpoint.

16:46:44 [15648]: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
16:46:44 [15648]: HINT:  In a moment you should be able to reconnect to the database and repeat your command.
16:46:44 [15642]: LOG:  archiver process (PID 15649) exited with exit code 1
16:46:44 [15642]: LOG:  all server processes terminated; reinitializing
16:46:44 [15661]: LOG:  database system was interrupted; last known up at 2018-03-28 16:46:16 UTC
16:46:44 [15661]: LOG:  database system was not properly shut down; automatic recovery in progress
16:46:44 [15661]: LOG:  redo starts at 0/155C230
16:46:44 [15661]: LOG:  invalid record length at 0/155C2A0: wanted 24, got 0
16:46:44 [15661]: LOG:  redo done at 0/155C268
16:46:44 [15661]: LOG:  checkpoint starting: end-of-recovery immediate
16:46:45 [15661]: LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.030 s, sync=0.000 s, total=0.035 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB
16:46:45 [15661]: LOG:  MultiXact member wraparound protections are now enabled
16:46:45 [15665]: LOG:  autovacuum launcher started

16:46:45 [15642]: LOG:  database system is ready to accept connections


So in short, if you can't kill a session using the standard PostgreSQL Server Signalling Functions, then the only sensible option is to issue a restart during a maintenance window:

psql -c "checkpoint;"
pg_ctl restart -m fast

... or if that still won't work, then the last resort is:

pg_ctl restart -m immediate


Finally, it may seem like a more complicated process to follow when your database has gone awry, but if you don't use the PostgreSQL supported methods then your users will end up in a heap... just like they used to back in the Ingres days:



Stay safe, and keep your fingers off the trigger!

Greg.

Tuesday 20 March 2018

Running multiple PostgreSQL clusters in RedHat/CentOS?

It's always something that comes up, even in today's vitrualised and containerised environments... "How can I run multiple PostgreSQL clusters on a single server?"

If you're used to doing RPM installs on RedHat/CentOS, then you'll be aware that the default location for all database files is:

/var/lib/pgsql/<version>/data

And the related service name is:

postgresql-<version>

That works well when you're only running a single cluster of each major version on a server, but things become complicated when you need more than one.  There are several considerations:
  • What network port should it use?
  • Where should $PGDATA be?
  • What should the services be called?

What network port should it use?

This is the easiest to solve, as the default is "5432".  By convention subsequent clusters running on a host simply increment this number by one, so "5433", then "5434", etc.  In practice, you can choose any number you prefer, and some sites allocate a unique port to each cluster across the entire enterprise, so that there are never any clashes.

Modify the $PGDATA/postgresql.conf, and uncomment this line with the new value, and restart the cluster:


#port = 5432   # (change requires restart)



Where will $PGDATA be?

This becomes more complex, especially if you run SELinux in "enforcing" mode.  The simplest way is to extend the existing structure in a similar way that Ubuntu has done, and include the cluster name after the version:

/var/lib/pgsql/<version>/<cluster_name>/data

If you want several mountpoints because you prefer to split your database, then they can all be mounted at the same level as "data", and this still fits within the default SELinux configuration for PostgreSQL:

/var/lib/pgsql/<version>/<cluster_name>/data
                                       /pg_wal
                                       /pg_log
                                       /<tblspc_name>
                                       /pg_stat_tmp



What will the services be called?

Assuming you're using CentOS v7 and systemd then there are two options here, both of which have merits.  You can either use a separate service file for each new cluster, or create a single service file template to cover all clusters.

Service File Template

The file should be owned by root:root, mode 644, and be called (replacing "<version>" with the actual PostgreSQL version):

/etc/systemd/system/postgresql-<version>@.service

The contents should be:

.include /lib/systemd/system/postgresql-<version>.service

[Unit]
Description=PostgreSQL <version> database server - cluster_name:%i

[Service]
Environment=PGDATA=/var/lib/pgsql/<version>/%i/data


For PostgreSQL v9.6, this would look like:

[root@localhost]# cat /etc/systemd/system/postgresql-9.6@.service
.include /lib/systemd/system/postgresql-9.6.service

[Unit]
Description=PostgreSQL 9.6 database server - cluster_name:%i

[Service]
Environment=PGDATA=/var/lib/pgsql/9.6/%i/data


Service File per Cluster

If you need more control on a per-cluster basis, then a separate file for each cluster must be created:

/etc/systemd/system/postgresql-<version>-<cluster_name>.service

The contents should be:

.include /lib/systemd/system/postgresql-<version>.service

[Unit]
Description=PostgreSQL <version> database server - cluster_name:<cluster_name>

[Service]
User=<postgres_service_user>
Group=<postgres_service_group>
Environment=PGDATA=/var/lib/pgsql/<version>/<cluster_name>/data


Here is an example for PostgreSQL v9.6, a cluster name of "test", running with the Linux user:group "postgresql_test:postgresql_test":

[root@localhost]# cat /etc/systemd/systempostgresql-9.6-test.service
.include /lib/systemd/system/postgresql-9.6.service

[Unit]
Description=PostgreSQL 9.6 database server - cluster_name:test

[Service]
User=postgres_test
Group=postgres_test
Environment=PGDATA=/var/lib/pgsql/9.6/test/data

What should the services be called?

Once you've chosen to use either a template or a service file per cluster, the service names are:

Service File Template

postgresql-<version>@<cluster_name>

To start a PostgreSQL v9.6 cluster called "main", you would use:

sudo systemctl start postgresql-9.6@main

Service File per Cluster

postgresql-<version>-<cluster_name>

To start a PostgreSQL v9.6 cluster called "main", you would use:

sudo systemctl start postgresql-9.6-main

Both are very similar, with the only difference being the use of an at "@", or a dash "-" between the version and the cluster name.

IMPORTANT: The services won't get picked up until you reload the systemd configuration by running:

sudo systemctl daemon-reload

Conclusion

This structure allows for the flexibility of running multiple clusters on a server, without going as far as Ubuntu and splitting the configuration files out of $PGDATA... which in my opinion is very "un-Postgres-ish".

Even if you don't need it now, adopting such a structure could someday make your life easier, because you're already prepared to run multiple clusters on the server.


Video

I presented my ideas to the London PostgreSQL Users Group, and their input was affirming and interesting at the same time.  If you want to hear what they had to say, then click below:


  https://skillsmatter.com/skillscasts/11736-multiple-postgre-clusters-on-redhat-centos#video


New PostgreSQL RDS versions - v10.14 , v11.9, and v12.4

It appears that new patch versions of PostgreSQL RDS are being rolled out in AWS.. so we should expect an announcement soon: REGION: us-east...