Wednesday 10 January 2018

Faster shutdowns with checkpoint

Have you ever restarted your PostgreSQL cluster and wondered why it's taking so long?  The cause of this is generally one of these:

  • You didn't include the "-m fast" flag (v9.4 and earlier), and sessions are still connected
  • A large update query is rolling back
  • There are a large number of dirty blocks, which are not yet written to disk


You can cover off the first two by:

  • Using "pg_ctl -m fast"
  • Checking in pg_stat_activity for long-running queries
But to ensure you minimise your downtime when restarting a PostgreSQL cluster, then consider forcing a "checkpoint" beforehand.

The normal process of a shutdown is:
  1. Stop any new connections
  2. Disconnect any existing connections, and rollback transactions (if "-m fast" is used)
  3. Write all dirty blocks to disk
  4. Shutdown the database cleanly, and restart
The 3rd step can sometimes take a long time if you've got a lot of dirty buffers.  Maybe you've got a very long checkpoint_timeout set, or possibly your server is just very busy.

If you run a manual "CHECKPOINT" immediately prior to running the shutdown, then you can sometimes drastically shorten the time a restart takes and keep your users happy:

postgres@localhost[default:5432] ~/$ psql
psql (10.1)
Type "help" for help.

postgres=# CHECKPOINT;
CHECKPOINT
postgres=# \q

postgres@localhost[default:5432] ~/$ pg_ctl restart -m fast
waiting for server to shut down.... done
server stopped
waiting for server to start....LOG:  listening on IPv4 address "0.0.0.0", port 5432
LOG:  listening on IPv6 address "::", port 5432
LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
LOG:  redirecting log output to logging collector process
HINT:  Future log output will appear in directory "log".
 done
server started


You can combine this all into a single step by running:

postgres@localhost[default:5432] ~/$ psql -c CHECKPOINT && pg_ctl restart -m fast
CHECKPOINT
waiting for server to shut down.... done
server stopped
waiting for server to start....2018-01-11 09:53:21.419 GMT [8919] LOG:  listening on IPv4 address "0.0.0.0", port 5432
LOG:  listening on IPv6 address "::", port 5432
LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
LOG:  redirecting log output to logging collector process
HINT:  Future log output will appear in directory "log".
 done

server started

Bug in pg_rewind overwrites pg_control

For those of you that don't monitor the pgsql-bugs mailing list, this interesting bug was posted recently:

>> I have encountered a bug in PostgreSQL 10.1: when the target 
>> directory for pg_rewind contains a read-only file (e.g. server.key), 
>> pg_rewind exits with "could not open target file" (legitimate) and 
>> corrupts the control file global/pg_control to size 0 (bug). From now 
>> on, pg_rewind always exits with "unexpected control file size 0, 
>> expected 8192" and a restore from pg_basebackup is needed.

> Likely that's reproducible down to 9.5 where pg_rewind has been 
> introduced. I agree that we should do better with failure handling 
> here. Corrupting the control file is not cool.

I can already confirm that this also occurs with PostgreSQL 9.6.


So if you want to use pg_rewind, then make sure you don't have any read-only files in your $PGDATA directory... otherwise your control file will get overwritten. :-O

I suspect this will get fixed in v9.5.11, v9.6.7, and v10.2... but there are no guarantees yet.


Here's the link to the original chat:

https://www.postgresql.org/message-id/20180104200633.17004.16377%40wrigleys.postgresql.org

Autovacuum broken by Incorrect localhost entry

If you're seeing an error like this in your postgresql.log:

LOG:  could not connect socket for statistics collector: Invalid argument
LOG:  disabling statistics collector for lack of working socket
WARNING:  autovacuum not started because of misconfiguration
HINT:  Enable the "track_counts" option.


There's a good chance that your localhost entry in /etc/hosts is broken.  In my case, there was a typo in the IP address, which should have been 127.0.0.1:

27.0.0.1   localhost

Once that was fixed, then the PostgreSQL database started up normally and Autovacuum was running.

New PostgreSQL RDS versions - v10.14 , v11.9, and v12.4

It appears that new patch versions of PostgreSQL RDS are being rolled out in AWS.. so we should expect an announcement soon: REGION: us-east...