Wednesday, 14 February 2018

Do you have a HUGE table that never seems to vacuum frequently enough?

With very large tables, relying on  autovacuum_vacuum_scale_factor  alone can mean many thousands, or even hundreds of thousands, of rows are updated before a vacuum is triggered.

One way to ensure regular vacuuming is to set  autovacuum_vacuum_scale_factor = 0  for those tables, and raise  autovacuum_vacuum_threshold  to an appropriate level.  Be warned that you can trap yourself in a constant vacuum spiral if you don't set the threshold high enough, so understand your update volume before you make changes, and monitor the table afterwards to ensure you're not vacuuming constantly.
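For reference, autovacuum triggers a vacuum of a table once the number of dead tuples exceeds roughly:

autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples

So with the defaults of 50 and 0.2, a table holding 10 million rows accumulates around 2,000,050 dead tuples before it is vacuumed, whereas with the scale factor set to 0 the trigger becomes a fixed row count regardless of table size.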

I tested this on PostgreSQL v9.6.7 on CentOS v7.4, running with standard parameters except for setting  log_autovacuum_min_duration = 0  so that we can see the autovacuum activity:

postgres=# SELECT version();
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
(1 row)

postgres=# SHOW autovacuum_vacuum_threshold;
 autovacuum_vacuum_threshold 
-----------------------------
 50
(1 row)

postgres=# SHOW autovacuum_vacuum_scale_factor;
 autovacuum_vacuum_scale_factor 
--------------------------------
 0.2
(1 row)

postgres=# show log_autovacuum_min_duration;
 log_autovacuum_min_duration 
-----------------------------
 0
(1 row)


Create a table, and set the autovacuum parameters explicitly:

postgres=# CREATE TABLE vacuum_threshold_test(id int);
CREATE TABLE

postgres=# ALTER TABLE vacuum_threshold_test SET (autovacuum_vacuum_scale_factor = 0);
ALTER TABLE

postgres=# ALTER TABLE vacuum_threshold_test SET (autovacuum_vacuum_threshold = 1000);
ALTER TABLE

postgres=# \d+ vacuum_threshold_test
                Table "public.vacuum_threshold_test"
 Column |  Type   | Modifiers | Storage | Stats target | Description 
--------+---------+-----------+---------+--------------+-------------
 id     | integer |           | plain   |              | 
Options: autovacuum_vacuum_scale_factor=0, autovacuum_vacuum_threshold=1000


Create some dummy data, in excess of the autovacuum_vacuum_threshold:

postgres=# INSERT INTO vacuum_threshold_test(id) (SELECT generate_series(1,2000));
INSERT 0 2000


Update exactly the same number of rows as the autovacuum_vacuum_threshold, and for good measure force a checkpoint:

postgres=# UPDATE vacuum_threshold_test SET id = id + 1 WHERE id BETWEEN 1 AND 1000;
UPDATE 1000
postgres=# CHECKPOINT;
CHECKPOINT


That isn't enough to trigger an autovacuum, as it must be GREATER THAN the  autovacuum_vacuum_threshold,  so update one more row:

postgres=# UPDATE vacuum_threshold_test SET id = id + 1 WHERE id = 2000;
UPDATE 1


That then triggers an autovacuum, but strangely it only removes 1 tuple... not 1001 as you would expect:

< 2018-02-14 11:59:05.571 GMT > LOG:  automatic vacuum of table "postgres.public.vacuum_threshold_test": index scans: 0
pages: 0 removed, 14 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 1 removed, 2000 remain, 0 are dead but not yet removable
buffer usage: 51 hits, 2 misses, 12 dirtied
avg read rate: 0.613 MB/s, avg write rate: 3.677 MB/s
system usage: CPU 0.00s/0.00u sec elapsed 0.02 sec


Alternatively, if you exceed the  autovacuum_vacuum_threshold  in a single transaction, the autovacuum removes all 1001 dead tuples:

postgres=# UPDATE vacuum_threshold_test SET id = id + 1 WHERE id BETWEEN 2 AND 1002;
UPDATE 1001

< 2018-02-14 12:07:05.648 GMT > LOG:  automatic vacuum of table "postgres.public.vacuum_threshold_test": index scans: 0
pages: 0 removed, 18 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 1001 removed, 2000 remain, 0 are dead but not yet removable
buffer usage: 60 hits, 0 misses, 0 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec


If you know your workload, you can tune individual tables to vacuum at the right point to prevent bloat, rather than relying on the cluster-wide parameters in postgresql.conf.
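One way to keep an eye on whether a tuned table is vacuuming as expected is to watch pg_stat_user_tables (bearing in mind the dead tuple count is only an estimate):

SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       autovacuum_count
  FROM pg_stat_user_tables
 WHERE relname = 'vacuum_threshold_test';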

Interestingly, if you set both  autovacuum_vacuum_scale_factor = 0  and  autovacuum_vacuum_threshold = 0, then any single update or delete on the table will result in autovacuum being triggered.
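For completeness, the per-table settings for that scenario would look like this (not something you would normally want on a busy table):

ALTER TABLE vacuum_threshold_test SET (autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = 0);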

This behaviour has been logged as a bug on the PostgreSQL mailing list:

https://www.postgresql.org/message-id/151861949011.1450.18007996621988910831%40wrigleys.postgresql.org

Thursday, 1 February 2018

Monitoring bgwriter efficiency during checkpoints with pg_buffercache

Have you ever wanted to know what's happening inside your PostgreSQL buffer cache?  Is your bgwriter keeping up, or are you suffering with a lot of dirty buffers during a checkpoint?

Uncloak some of the mystery by using the pg_buffercache extension.

CREATE EXTENSION pg_buffercache;


You can then use the following query, and plot the results in your favourite graphing solution:

SELECT buffer_status, sum(count) AS count
  FROM (SELECT CASE isdirty
                 WHEN true THEN 'dirty'
                 WHEN false THEN 'clean'
                 ELSE 'empty'
               END AS buffer_status,
               count(*) AS count
          FROM pg_buffercache
          GROUP BY buffer_status
        UNION ALL
          SELECT * FROM (VALUES ('dirty', 0), ('clean', 0), ('empty', 0)) AS tab2 (buffer_status,count)) tab1
  GROUP BY buffer_status;


The output will look like this:

postgres=# SELECT buffer_status, sum(count) AS count
  FROM (SELECT CASE isdirty
                 WHEN true THEN 'dirty'
                 WHEN false THEN 'clean'
                 ELSE 'empty'
               END AS buffer_status,
               count(*) AS count
          FROM pg_buffercache
          GROUP BY buffer_status
        UNION ALL
          SELECT * FROM (VALUES ('dirty', 0), ('clean', 0), ('empty', 0)) AS tab2 (buffer_status,count)) tab1
  GROUP BY buffer_status;
 buffer_status | count
---------------+--------
 clean         |    544
 dirty         |     18
 empty         | 523726
(3 rows)


And whenever there is a checkpoint, the dirty count should go to zero:

postgres=# checkpoint;
CHECKPOINT

postgres=# SELECT buffer_status, sum(count) AS count
  FROM (SELECT CASE isdirty
                 WHEN true THEN 'dirty'
                 WHEN false THEN 'clean'
                 ELSE 'empty'
               END AS buffer_status,
               count(*) AS count
          FROM pg_buffercache
          GROUP BY buffer_status
        UNION ALL
          SELECT * FROM (VALUES ('dirty', 0), ('clean', 0), ('empty', 0)) AS tab2 (buffer_status,count)) tab1
  GROUP BY buffer_status;
 buffer_status | count
---------------+--------
 clean         |    562
 dirty         |      0
 empty         | 523726
(3 rows)


ACTION: If a large number of dirty buffers is being cleared at each checkpoint, then you should consider tuning the bgwriter to be more aggressive.
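As a rough guide, these are the parameters that control the bgwriter, shown here with their default values; increasing bgwriter_lru_maxpages and bgwriter_lru_multiplier (and/or reducing bgwriter_delay) makes it more aggressive, at the cost of potentially writing hot buffers more than once:

bgwriter_delay = 200ms           # how often the bgwriter wakes up
bgwriter_lru_maxpages = 100      # maximum buffers it will write per round
bgwriter_lru_multiplier = 2.0    # how far ahead of recent buffer demand to clean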


Should you want the output as a single row per sample, with one column per buffer status, then you can use crosstab:

CREATE EXTENSION tablefunc;

SELECT * FROM crosstab(
  $$
    SELECT now() sample_timestamp, buffer_status, sum(buffer_count)::int AS buffer_count
          FROM (SELECT CASE isdirty
                         WHEN false THEN 'clean'
                         WHEN true THEN 'dirty'
                         ELSE 'empty'
                       END AS buffer_status,
                       count(*) AS buffer_count
                  FROM pg_buffercache
                  GROUP BY buffer_status
                UNION ALL
                  SELECT * FROM (VALUES ('clean', 0), ('dirty', 0), ('empty', 0)) AS dummy_table(buffer_status, buffer_count)
               ) outer_table
          GROUP BY buffer_status
          ORDER BY buffer_status
  $$)
AS ct(sample_timestamp timestamptz, clean int, dirty int, empty int);

The output will then look like:

       sample_timestamp        | clean | dirty | empty
-------------------------------+-------+-------+--------
 2018-02-02 11:38:00.915978+00 |  3108 |     0 | 521180
(1 row)


I created a test database, and ran pgbench against it whilst sampling the buffer status every 10 seconds:

pgbench -i -s 100 pgbench

then

pgbench -c 32 -j 4 -P 10 -T 1200 pgbench
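The 10-second sampling can be collected in any number of ways; one minimal sketch is to run the buffer status query under psql's \watch command with the output redirected to a file (the file name buffer_samples.log is just an example):

\o buffer_samples.log
SELECT now() AS sample_timestamp, buffer_status, sum(count) AS count
  FROM (SELECT CASE isdirty
                 WHEN true THEN 'dirty'
                 WHEN false THEN 'clean'
                 ELSE 'empty'
               END AS buffer_status,
               count(*) AS count
          FROM pg_buffercache
          GROUP BY buffer_status
        UNION ALL
          SELECT * FROM (VALUES ('dirty', 0), ('clean', 0), ('empty', 0)) AS tab2 (buffer_status,count)) tab1
  GROUP BY buffer_status
\watch 10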


The database was configured with these parameters:

checkpoint_timeout = 5min
checkpoint_completion_target = 0.9

bgwriter_delay = 100ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 3.0


The resulting graph of the clean, dirty, and empty buffer counts per sample was:


Some notable points:

1) shared_buffers is big enough to fit the entire data set, because we never run out of empty buffers (grey).
2) Work starts at #14 and the dirty buffers (orange) start to climb.
3) A checkpoint starts at #27, and dirty blocks are written to disk.  This results in clean buffers (blue) roughly balancing out against the buffers newly dirtied by the workload.
4) This holds in equilibrium until a forced manual checkpoint at #98, which clears all dirty buffers to disk immediately instead of letting the bgwriter do it in a controlled fashion.
5) Things quickly get back to equilibrium.
6) The test completes at #128, and the buffers stay in their current clean/dirty/empty state.
7) Once the next timed checkpoint is triggered at #140, the dirty blocks are written to disk at the rate determined by the bgwriter parameters.
8) All dirty blocks are written to disk by #167.  The cumulative split between checkpoint, bgwriter, and backend writes can be cross-checked with pg_stat_bgwriter, as shown below.
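pg_buffercache shows the state of the cache, but not which process did the writing.  One way to see that is pg_stat_bgwriter, which keeps cumulative counts of buffers written at checkpoints, by the bgwriter, and directly by backends (ideally the backend figure stays small):

SELECT checkpoints_timed,
       checkpoints_req,
       buffers_checkpoint,
       buffers_clean,
       maxwritten_clean,
       buffers_backend
  FROM pg_stat_bgwriter;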
