Wednesday, 10 January 2018

Faster shutdowns with checkpoint

Have you ever restarted your PostgreSQL cluster and wondered why it's taking so long?  The cause of this is generally one of these:

  • You didn't include the "-m fast" flag (v9.4 and earlier), and sessions are still connected
  • A large update query is rolling back
  • There are a large number of dirty blocks, which are not yet written to disk


You can cover off the first two by:

  • Using "pg_ctl -m fast"
  • Checking in pg_stat_activity for long-running queries
But to ensure you minimise your downtime when restarting a PostgreSQL cluster, then consider forcing a "checkpoint" beforehand.

The normal process of a shutdown is:
  1. Stop any new connections
  2. Disconnect any existing connections, and rollback transactions (if "-m fast" is used)
  3. Write all dirty blocks to disk
  4. Shutdown the database cleanly, and restart
The 3rd step can sometimes take a long time if you've got a lot of dirty buffers.  Maybe you've got a very long checkpoint_timeout set, or possibly your server is just very busy.

If you run a manual "CHECKPOINT" immediately prior to running the shutdown, then you can sometimes drastically shorten the time a restart takes and keep your users happy:

postgres@localhost[default:5432] ~/$ psql
psql (10.1)
Type "help" for help.

postgres=# CHECKPOINT;
CHECKPOINT
postgres=# \q

postgres@localhost[default:5432] ~/$ pg_ctl restart -m fast
waiting for server to shut down.... done
server stopped
waiting for server to start....LOG:  listening on IPv4 address "0.0.0.0", port 5432
LOG:  listening on IPv6 address "::", port 5432
LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
LOG:  redirecting log output to logging collector process
HINT:  Future log output will appear in directory "log".
 done
server started


You can combine this all into a single step by running:

postgres@localhost[default:5432] ~/$ psql -c CHECKPOINT && pg_ctl restart -m fast
CHECKPOINT
waiting for server to shut down.... done
server stopped
waiting for server to start....2018-01-11 09:53:21.419 GMT [8919] LOG:  listening on IPv4 address "0.0.0.0", port 5432
LOG:  listening on IPv6 address "::", port 5432
LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
LOG:  redirecting log output to logging collector process
HINT:  Future log output will appear in directory "log".
 done

server started

Bug in pg_rewind overwrites pg_control

For those of you that don't monitor the pgsql-bugs mailing list, this interesting bug was posted recently:

>> I have encountered a bug in PostgreSQL 10.1: when the target 
>> directory for pg_rewind contains a read-only file (e.g. server.key), 
>> pg_rewind exits with "could not open target file" (legitimate) and 
>> corrupts the control file global/pg_control to size 0 (bug). From now 
>> on, pg_rewind always exits with "unexpected control file size 0, 
>> expected 8192" and a restore from pg_basebackup is needed.

> Likely that's reproducible down to 9.5 where pg_rewind has been 
> introduced. I agree that we should do better with failure handling 
> here. Corrupting the control file is not cool.

I can already confirm that this also occurs with PostgreSQL 9.6.


So if you want to use pg_rewind, then make sure you don't have any read-only files in your $PGDATA directory... otherwise your control file will get overwritten. :-O

I suspect this will get fixed in v9.5.11, v9.6.7, and v10.2... but there are no guarantees yet.


Here's the link to the original chat:

https://www.postgresql.org/message-id/20180104200633.17004.16377%40wrigleys.postgresql.org

Autovacuum broken by Incorrect localhost entry

If you're seeing an error like this in your postgresql.log:

LOG:  could not connect socket for statistics collector: Invalid argument
LOG:  disabling statistics collector for lack of working socket
WARNING:  autovacuum not started because of misconfiguration
HINT:  Enable the "track_counts" option.


There's a good chance that your localhost entry in /etc/hosts is broken.  In my case, there was a typo in the IP address, which should have been 127.0.0.1:

27.0.0.1   localhost

Once that was fixed, then the PostgreSQL database started up normally and Autovacuum was running.

New PostgreSQL RDS versions - v10.14 , v11.9, and v12.4

It appears that new patch versions of PostgreSQL RDS are being rolled out in AWS.. so we should expect an announcement soon: REGION: us-east...