Have you ever been tempted to "kill -9" a non-behaving PostgreSQL client session that won't die via "pg_cancel_backend()" or "pg_terminate_backend()"? Let's just say that's at first you may believe (incorrectly) that it will affect only the single session you've killed, but in reality it's an incredibly bad idea... as the result will be ALL of your client sessions being reset and the database crashing.
Here's an example. Let's say there is one session you need to kill, so you use kill -9:
postgres=# select * from pg_stat_activity;
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query
-------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+------------+--------+-------------+--------------+---------------------------------
13323 | postgres | 15658 | 10 | postgres | psql | | | -1 | 2018-03-28 16:46:27.924226+00 | 2018-03-28 16:46:34.663799+00 | 2018-03-28 16:46:34.663799+00 | 2018-03-28 16:46:34.663808+00 | | | active | | 1757 | select * from pg_stat_activity;
(1 row)
postgres=# \! kill -9 15658
The postmaster detects that a process has exited abnormally:
16:46:44 UTC [15642]: LOG: server process (PID 15658) was terminated by signal 9: Killed
16:46:44 UTC [15642]: DETAIL: Failed process was running: select * from pg_stat_activity;
And even though it wasn't in a transaction and was only running a simple select, the postmaster terminates "any other active service processes":
16:46:44 [15642]: LOG: terminating any other active server processes
The end result is that the entire PostgreSQL cluster/server resets itself, and does crash recovery to cleanup. The time taken to do this can be significant, depending on how busy your database is and how long it's been since the last checkpoint.
16:46:44 [15648]: DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
16:46:44 [15648]: HINT: In a moment you should be able to reconnect to the database and repeat your command.
16:46:44 [15642]: LOG: archiver process (PID 15649) exited with exit code 1
16:46:44 [15642]: LOG: all server processes terminated; reinitializing
16:46:44 [15661]: LOG: database system was interrupted; last known up at 2018-03-28 16:46:16 UTC
16:46:44 [15661]: LOG: database system was not properly shut down; automatic recovery in progress
16:46:44 [15661]: LOG: redo starts at 0/155C230
16:46:44 [15661]: LOG: invalid record length at 0/155C2A0: wanted 24, got 0
16:46:44 [15661]: LOG: redo done at 0/155C268
16:46:44 [15661]: LOG: checkpoint starting: end-of-recovery immediate
16:46:45 [15661]: LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.030 s, sync=0.000 s, total=0.035 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB
16:46:45 [15661]: LOG: MultiXact member wraparound protections are now enabled
16:46:45 [15665]: LOG: autovacuum launcher started
16:46:45 [15642]: LOG: database system is ready to accept connections
So in short, if you can't kill a session using the standard PostgreSQL Server Signalling Functions, then the only sensible option is to issue a restart during a maintenance window:
psql -c "checkpoint;"
pg_ctl restart -m fast
... or if that still won't work, then the last resort is:
pg_ctl restart -m immediate
Finally, it may seem like a more complicated process to follow when your database has gone awry, but if you don't use the PostgreSQL supported methods then your users will end up in a heap... just like they used to back in the Ingres days:
Stay safe, and keep your fingers off the trigger!
Greg.
I'm a PostgreSQL DBA, and when I run across interesting facts then I'll post them here.
Wednesday, 28 March 2018
Tuesday, 20 March 2018
Running multiple PostgreSQL clusters in RedHat/CentOS?
It's always something that comes up, even in today's vitrualised and containerised environments... "How can I run multiple PostgreSQL clusters on a single server?"
If you're used to doing RPM installs on RedHat/CentOS, then you'll be aware that the default location for all database files is:
/var/lib/pgsql/<version>/data
And the related service name is:
postgresql-<version>
That works well when you're only running a single cluster of each major version on a server, but things become complicated when you need more than one. There are several considerations:
Modify the $PGDATA/postgresql.conf, and uncomment this line with the new value, and restart the cluster:
#port = 5432 # (change requires restart)
/var/lib/pgsql/<version>/<cluster_name>/data
If you want several mountpoints because you prefer to split your database, then they can all be mounted at the same level as "data", and this still fits within the default SELinux configuration for PostgreSQL:
/var/lib/pgsql/<version>/<cluster_name>/data
/pg_wal
/pg_log
/<tblspc_name>
/pg_stat_tmp
If you're used to doing RPM installs on RedHat/CentOS, then you'll be aware that the default location for all database files is:
/var/lib/pgsql/<version>/data
And the related service name is:
postgresql-<version>
That works well when you're only running a single cluster of each major version on a server, but things become complicated when you need more than one. There are several considerations:
- What network port should it use?
- Where should $PGDATA be?
- What should the services be called?
What network port should it use?
This is the easiest to solve, as the default is "5432". By convention subsequent clusters running on a host simply increment this number by one, so "5433", then "5434", etc. In practice, you can choose any number you prefer, and some sites allocate a unique port to each cluster across the entire enterprise, so that there are never any clashes.Modify the $PGDATA/postgresql.conf, and uncomment this line with the new value, and restart the cluster:
#port = 5432 # (change requires restart)
Where will $PGDATA be?
This becomes more complex, especially if you run SELinux in "enforcing" mode. The simplest way is to extend the existing structure in a similar way that Ubuntu has done, and include the cluster name after the version:/var/lib/pgsql/<version>/<cluster_name>/data
If you want several mountpoints because you prefer to split your database, then they can all be mounted at the same level as "data", and this still fits within the default SELinux configuration for PostgreSQL:
/var/lib/pgsql/<version>/<cluster_name>/data
/pg_wal
/pg_log
/<tblspc_name>
/pg_stat_tmp
What will the services be called?
Assuming you're using CentOS v7 and systemd then there are two options here, both of which have merits. You can either use a separate service file for each new cluster, or create a single service file template to cover all clusters.
Service File Template
The file should be owned by root:root, mode 644, and be called (replacing "<version>" with the actual PostgreSQL version):
/etc/systemd/system/postgresql-<version>@.service
The contents should be:
.include /lib/systemd/system/postgresql-<version>.service
[Unit]
Description=PostgreSQL <version> database server - cluster_name:%i
[Service]
Environment=PGDATA=/var/lib/pgsql/<version>/%i/data
For PostgreSQL v9.6, this would look like:
[root@localhost]# cat /etc/systemd/system/postgresql-9.6@.service
.include /lib/systemd/system/postgresql-9.6.service
[Unit]
Description=PostgreSQL 9.6 database server - cluster_name:%i
[Service]
Environment=PGDATA=/var/lib/pgsql/9.6/%i/data
Service File per Cluster
If you need more control on a per-cluster basis, then a separate file for each cluster must be created:
/etc/systemd/system/postgresql-<version>-<cluster_name>.service
The contents should be:
.include /lib/systemd/system/postgresql-<version>.service
[Unit]
Description=PostgreSQL <version> database server - cluster_name:<cluster_name>
[Service]
User=<postgres_service_user>
Group=<postgres_service_group>
Environment=PGDATA=/var/lib/pgsql/<version>/<cluster_name>/data
Here is an example for PostgreSQL v9.6, a cluster name of "test", running with the Linux user:group "postgresql_test:postgresql_test":
[root@localhost]# cat /etc/systemd/systempostgresql-9.6-test.service
.include /lib/systemd/system/postgresql-9.6.service
[Unit]
Description=PostgreSQL 9.6 database server - cluster_name:test
[Service]
User=postgres_test
Group=postgres_test
Environment=PGDATA=/var/lib/pgsql/9.6/test/data
What should the services be called?
Once you've chosen to use either a template or a service file per cluster, the service names are:
Service File Template
postgresql-<version>@<cluster_name>
To start a PostgreSQL v9.6 cluster called "main", you would use:
sudo systemctl start postgresql-9.6@main
Service File per Cluster
postgresql-<version>-<cluster_name>
To start a PostgreSQL v9.6 cluster called "main", you would use:
sudo systemctl start postgresql-9.6-main
Both are very similar, with the only difference being the use of an at "@", or a dash "-" between the version and the cluster name.
IMPORTANT: The services won't get picked up until you reload the systemd configuration by running:
sudo systemctl daemon-reload
Conclusion
This structure allows for the flexibility of running multiple clusters on a server, without going as far as Ubuntu and splitting the configuration files out of $PGDATA... which in my opinion is very "un-Postgres-ish".
Even if you don't need it now, adopting such a structure could someday make your life easier, because you're already prepared to run multiple clusters on the server.
https://skillsmatter.com/skillscasts/11736-multiple-postgre-clusters-on-redhat-centos#video
Video
I presented my ideas to the London PostgreSQL Users Group, and their input was affirming and interesting at the same time. If you want to hear what they had to say, then click below:https://skillsmatter.com/skillscasts/11736-multiple-postgre-clusters-on-redhat-centos#video
Subscribe to:
Posts (Atom)
New PostgreSQL RDS versions - v10.14 , v11.9, and v12.4
It appears that new patch versions of PostgreSQL RDS are being rolled out in AWS.. so we should expect an announcement soon: REGION: us-east...
-
I realise it's a blanket statement to say NEVER to use pg_resetwal / pg_resetxlog , but in almost 100% of cases this will be true. One...
-
Have you ever restarted your PostgreSQL cluster and wondered why it's taking so long? The cause of this is generally one of these: Y...
-
It's always something that comes up, even in today's vitrualised and containerised environments... "How can I run multiple Post...