Friday 18 September 2020

New PostgreSQL RDS versions - v10.14, v11.9, and v12.4

It appears that new patch versions of PostgreSQL RDS are being rolled out in AWS, so we should expect an announcement soon:

REGION: us-east-2
  "DBEngineVersionDescription": "PostgreSQL 10.11-R1"
  "DBEngineVersionDescription": "PostgreSQL 10.12-R1"
  "DBEngineVersionDescription": "PostgreSQL 10.13-R1"
  "DBEngineVersionDescription": "PostgreSQL 11.6-R1"
  "DBEngineVersionDescription": "PostgreSQL 11.7-R1"
  "DBEngineVersionDescription": "PostgreSQL 11.8-R1"
  "DBEngineVersionDescription": "PostgreSQL 12.2-R1"
  "DBEngineVersionDescription": "PostgreSQL 12.3-R1"

REGION: us-east-1
  "DBEngineVersionDescription": "PostgreSQL 10.11-R1"
  "DBEngineVersionDescription": "PostgreSQL 10.12-R1"
  "DBEngineVersionDescription": "PostgreSQL 10.13-R1"
  "DBEngineVersionDescription": "PostgreSQL 11.6-R1"
  "DBEngineVersionDescription": "PostgreSQL 11.7-R1"
  "DBEngineVersionDescription": "PostgreSQL 11.8-R1"
  "DBEngineVersionDescription": "PostgreSQL 12.2-R1"
  "DBEngineVersionDescription": "PostgreSQL 12.3-R1"

REGION: us-west-1
  "DBEngineVersionDescription": "PostgreSQL 10.11-R1"
  "DBEngineVersionDescription": "PostgreSQL 10.12-R1"
  "DBEngineVersionDescription": "PostgreSQL 10.13-R1"
  "DBEngineVersionDescription": "PostgreSQL 10.14-R1"
  "DBEngineVersionDescription": "PostgreSQL 11.6-R1"
  "DBEngineVersionDescription": "PostgreSQL 11.7-R1"
  "DBEngineVersionDescription": "PostgreSQL 11.8-R1"
  "DBEngineVersionDescription": "PostgreSQL 11.9-R1"
  "DBEngineVersionDescription": "PostgreSQL 12.2-R1"
  "DBEngineVersionDescription": "PostgreSQL 12.3-R1"
  "DBEngineVersionDescription": "PostgreSQL 12.4-R1"

REGION: us-west-2
  "DBEngineVersionDescription": "PostgreSQL 10.11-R1"
  "DBEngineVersionDescription": "PostgreSQL 10.12-R1"
  "DBEngineVersionDescription": "PostgreSQL 10.13-R1"
  "DBEngineVersionDescription": "PostgreSQL 11.6-R1"
  "DBEngineVersionDescription": "PostgreSQL 11.7-R1"
  "DBEngineVersionDescription": "PostgreSQL 11.8-R1"
  "DBEngineVersionDescription": "PostgreSQL 12.2-R1"
  "DBEngineVersionDescription": "PostgreSQL 12.3-R1"

... etc.

Monday 14 September 2020

WARNING: AWS Aurora PostgreSQL v10.12, v10.13, and v11.8 have been removed

If you are using RDS Aurora PostgreSQL v10.12, v10.13, or v11.8, please be aware that they have been removed from the catalogue and you can no longer create new clusters with these versions.  Attempting to do so will probably give you an error like this:

Error: error creating RDS cluster: InvalidParameterCombination:
Cannot find version 10.12 for aurora-postgresql
status code: 400, request id:



They used to exist:

https://aws.amazon.com/about-aws/whats-new/2020/09/amazon-aurora-supports-postgresql-11-8-10-13-and-9-6-18/

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Updates.20180305.html

At the time, that documentation page listed:

Version 3.3, Compatible with PostgreSQL 11.8
Version 3.2, Compatible with PostgreSQL 11.7
Version 3.1, Compatible with PostgreSQL 11.6
Version 3.0, Compatible with PostgreSQL 11.4
Version 2.6, Compatible with PostgreSQL 10.13
Version 2.5, Compatible with PostgreSQL 10.12
Version 2.4, Compatible with PostgreSQL 10.11
Version 2.3, Compatible with PostgreSQL 10.7
Version 2.2, Compatible with PostgreSQL 10.6
Version 2.1, Compatible with PostgreSQL 10.5
Version 2.0, Compatible with PostgreSQL 10.4


But not now:

$ aws rds describe-db-engine-versions --engine aurora-postgresql | grep "DBEngineVersionDescription"
"Aurora PostgreSQL (compatible with PostgreSQL 9.6.3)"
"Aurora PostgreSQL (compatible with PostgreSQL 9.6.6)"
"Aurora PostgreSQL (compatible with PostgreSQL 9.6.8)"
"Aurora PostgreSQL (compatible with PostgreSQL 9.6.9)"
"Aurora PostgreSQL (compatible with PostgreSQL 9.6.11)"
"Aurora PostgreSQL (compatible with PostgreSQL 9.6.12)"
"Aurora PostgreSQL (Compatible with PostgreSQL 9.6.16)"
"Aurora PostgreSQL (Compatible with PostgreSQL 9.6.17)"
"Aurora PostgreSQL (Compatible with PostgreSQL 9.6.18)"

"Aurora PostgreSQL (compatible with PostgreSQL 10.4)"
"Aurora PostgreSQL (compatible with PostgreSQL 10.5)"
"Aurora PostgreSQL (compatible with PostgreSQL 10.6)"
"Aurora PostgreSQL (compatible with PostgreSQL 10.7)"
"Aurora PostgreSQL (compatible with PostgreSQL 10.7)"
"Aurora PostgreSQL (Compatible with PostgreSQL 10.11)"

"Aurora PostgreSQL (Compatible with PostgreSQL 11.4)"
"Aurora PostgreSQL (Compatible with PostgreSQL 11.6)"
"Aurora PostgreSQL (Compatible with PostgreSQL 11.7)"


Bug Details

The problem relates to "NOT EXISTS" queries: v10.12 effectively ignores the "NOT" part and gives you the exact opposite result.  One workaround is to add "AND <column> IS NOT NULL" to all affected queries.  I have no information on whether this also affects v10.13 or v11.8, but I suspect it does, as these versions were all pulled from the catalogue at the same time.
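
As a purely hypothetical illustration (the table and column names below are made up, not from any affected system), the workaround just adds a logically redundant NOT NULL test on the correlated column inside the NOT EXISTS subquery:

-- Anti-join of the shape that returned the opposite rows on the affected versions:
SELECT o.id
FROM   orders o
WHERE  NOT EXISTS (
         SELECT 1
         FROM   invoices i
         WHERE  i.order_id = o.id
           AND  i.order_id IS NOT NULL   -- the workaround described above; redundant, but dodges the bug
       );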

If you can't change your code, then you can either:
* Downgrade to Aurora PostgreSQL v10.11, either by "pg_dump" or DMS
* Upgrade to Aurora PostgreSQL v11.6, as this is the current "preferred minor engine version" for Aurora.
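
If you're not sure exactly what you are running before choosing either option, it can be checked from SQL (a small sketch; aurora_version() is an Aurora-specific function, while server_version is the compatible community version):

SELECT aurora_version();   -- Aurora PostgreSQL release, e.g. something in the 2.x or 3.x series
SHOW server_version;       -- the community PostgreSQL version it is compatible with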


Wednesday 23 October 2019

Think AWS support for RDS gives 100% coverage? Think again!

Just a friendly FYI that AWS do not have access to any of your PostgreSQL RDS database logs, such as postgresql.log.  This is important if you happen to hit a PostgreSQL bug such as this one, which was fixed in v11.3:

  • Fix race conditions in management of dynamic shared memory (Thomas Munro)

    These could lead to "dsa_area could not attach to segment" or "cannot unpin a segment that is not pinned" errors.

Once this problem occurs, your RDS database will appear to be running, and can possibly be used to service some queries... but most queries will hang, and eventually your application servers will time out, causing application downtime.

Unfortunately, from the AWS support perspective the RDS database is up and using some CPU and network, so there's no reason to suspect it's not healthy.  It's only by looking in your postgresql.log and seeing this message that you will know what the problem is:

2018-01-01 12:34:56 UTC::@:[10234]:FATAL:  cannot unpin a segment that is not pinned
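
If you want to search the log yourself from SQL rather than via the console, RDS PostgreSQL provides the log_fdw extension for exactly this (a rough sketch based on the AWS-documented helper functions; the server name, foreign table name, and log file name below are examples only):

CREATE EXTENSION IF NOT EXISTS log_fdw;
CREATE SERVER log_server FOREIGN DATA WRAPPER log_fdw;

-- See which log files exist...
SELECT * FROM list_postgres_log_files() ORDER BY 1;

-- ...then expose one of them as a foreign table and scan it for the FATAL message:
SELECT create_foreign_table_for_log_file('rds_log_20180101', 'log_server',
                                          'postgresql.log.2018-01-01-12');
SELECT * FROM rds_log_20180101;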



So if you run an application team, then please ensure you have a DBA somewhere on staff, because AWS doesn't have your back when you hit problems such as this.  You could always just turn it off and on again, but you will never find the root cause and will be cursed to keep rebooting whenever things break at unpredictable times.

postgresql.log file grows infinitely, and never rotates... did your disk fill up previously?

An interesting (and currently undocumented) behaviour of PostgreSQL is that when there is an error creating the next log file (named per log_filename), for example because the log filesystem has filled up, all future log rotations are disabled by this piece of code in syslogger.c:


/*
 * ENFILE/EMFILE are not too surprising on a busy system; just
 * keep using the old file till we manage to get a new one.
 * Otherwise, assume something's wrong with Log_directory and stop
 * trying to create files.
 */
 if (errno != ENFILE && errno != EMFILE)
 {
     ereport(LOG,
             (errmsg("disabling automatic rotation (use SIGHUP to re-enable)")));
     rotation_disabled = true;
 }


In most default configurations this will never be noticed, because the log_directory is on the same filesystem as the data_directory.  In that case the entire instance will crash, and once space is cleared it can be restarted afresh with log rotation enabled.
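
You can see where your logs are being written from SQL (a minimal sketch; data_directory and friends are only visible to superusers or members of pg_read_all_settings):

SHOW logging_collector;   -- rotation only applies when the logging collector is on
SHOW data_directory;
SHOW log_directory;       -- a relative path here means it lives inside the data directory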




If, however, you had the foresight to put your log_directory onto a different filesystem, then it can fill up without taking the entire cluster down.  Once you clear space then logging will resume, but both log_rotation_age and log_rotation_size will be disabled:


if (Log_RotationAge > 0 && !rotation_disabled)
{
 /* Do a logfile rotation if it's time */
 now = (pg_time_t) time(NULL);
 if (now >= next_rotation_time)
        rotation_requested = time_based_rotation = true;
}

if (!rotation_requested && Log_RotationSize > 0 && !rotation_disabled)
{
 /* Do a rotation if file is too big */
 if (ftell(syslogFile) >= Log_RotationSize * 1024L)
 {
        rotation_requested = true;
        size_rotation_for |= LOG_DESTINATION_STDERR;
 }
 if (csvlogFile != NULL &&
        ftell(csvlogFile) >= Log_RotationSize * 1024L)
 {
        rotation_requested = true;
        size_rotation_for |= LOG_DESTINATION_CSVLOG;
 }
}


The result is that one single log file will be written to, growing and growing until it once again fills the disk.  If you're relying on log_truncate_on_rotation to run, or on cron jobs to clean up old logs, then neither will work.
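
A quick way to spot a runaway log file is to look at the log directory from SQL (pg_ls_logdir() is available from PostgreSQL 10, and is restricted to superusers and members of pg_monitor):

SELECT name, size, modification
FROM   pg_ls_logdir()
ORDER  BY size DESC
LIMIT  5;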

To correct this, once some space has been cleared you just need to send a SIGHUP to the postmaster process.  This is safely done by calling pg_reload_conf(), but you could also use kill or pg_ctl reload if you prefer.


/*
 * If we had a rotation-disabling failure, re-enable rotation
 * attempts after SIGHUP, and force one immediately.
 */
if (rotation_disabled)
{
    rotation_disabled = false;
    rotation_requested = true;
}


After that your logging will go back to normal, and will rotate as you have configured in your postgresql.conf.
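
Putting the fix together from SQL (a minimal sketch; pg_reload_conf() needs superuser or an explicit EXECUTE grant, and pg_current_logfile() is PostgreSQL 10+):

-- Once space is available, send the SIGHUP:
SELECT pg_reload_conf();

-- Then confirm the logger has switched to a fresh file and rotation settings are back in effect:
SELECT pg_current_logfile();
SHOW log_rotation_age;
SHOW log_rotation_size;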

Tuesday 21 May 2019

You should NEVER use pg_resetwal / pg_resetxlog

I realise it's a blanket statement to say NEVER to use pg_resetwal / pg_resetxlog, but in almost 100% of cases this will be true.  One common reason cited is:

"My disk filled up and my database crashed, so I ran pg_resetwal to clear some space."





If you have seen this:

postgres=# INSERT INTO full_disk SELECT generate_series(1,1000000);
PANIC:  could not write to file "pg_wal/xlogtemp.11818": No space left on device
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


or this:

2019-05-21 13:30:20.080 BST [11818] PANIC:  could not write to file "pg_wal/xlogtemp.11818": No space left on device
2019-05-21 13:30:20.080 BST [11818] STATEMENT:  insert into full_disk select generate_series(1,1000000);
2019-05-21 13:30:20.100 BST [11708] LOG:  server process (PID 11818) was terminated by signal 6: Aborted
2019-05-21 13:30:20.100 BST [11708] DETAIL:  Failed process was running: insert into full_disk select generate_series(1,1000000);
2019-05-21 13:30:20.100 BST [11708] LOG:  terminating any other active server processes
2019-05-21 13:30:20.101 BST [11715] WARNING:  terminating connection because of crash of another server process
2019-05-21 13:30:20.101 BST [11715] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-05-21 13:30:20.101 BST [11715] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-05-21 13:30:20.103 BST [11708] LOG:  all server processes terminated; reinitializing
2019-05-21 13:30:20.117 BST [11849] LOG:  database system was interrupted; last known up at 2019-05-21 13:27:43 BST
2019-05-21 13:30:20.119 BST [11850] FATAL:  the database system is in recovery mode
2019-05-21 13:30:20.177 BST [11849] LOG:  database system was not properly shut down; automatic recovery in progress
2019-05-21 13:30:20.179 BST [11849] LOG:  redo starts at 0/16B0250
2019-05-21 13:30:20.468 BST [11849] LOG:  redo done at 0/1FFFF88
2019-05-21 13:30:20.468 BST [11849] LOG:  last completed transaction was at log time 2019-05-21 13:29:41.186055+01
2019-05-21 13:30:20.500 BST [11849] PANIC:  could not write to file "pg_wal/xlogtemp.11849": No space left on device
2019-05-21 13:30:20.522 BST [11708] LOG:  startup process (PID 11849) was terminated by signal 6: Aborted
2019-05-21 13:30:20.522 BST [11708] LOG:  aborting startup due to startup process failure

2019-05-21 13:30:20.527 BST [11708] LOG:  database system is shut down



DO NOT RUN pg_resetwal!


This will result in a corrupted database and lost data.  To fix a full disk situation you should try these, in order of preference:

1) Remove non-PostgreSQL files on the filesystem
2) Add storage to the filesystem
3) Move/remove files in the "pg_log" or "log" directory, after confirming they are all PostgreSQL logs, and not part of the database itself.

... then restart the database, which will perform crash recovery to ensure you once again have a consistent database.  Upon startup you will see the recovery happen, and the database will then come up:

2019-05-21 13:35:02.361 BST [11977] LOG:  database system shutdown was interrupted; last known up at 2019-05-21 13:30:20 BST
2019-05-21 13:35:02.379 BST [11977] LOG:  database system was not properly shut down; automatic recovery in progress
2019-05-21 13:35:02.380 BST [11977] LOG:  redo starts at 0/16B0250
2019-05-21 13:35:02.678 BST [11977] LOG:  redo done at 0/1FFFF88
2019-05-21 13:35:02.678 BST [11977] LOG:  last completed transaction was at log time 2019-05-21 13:29:41.186055+01

2019-05-21 13:35:02.734 BST [11973] LOG:  database system is ready to accept connections


The same goes for blindly deleting files in the pg_wal (formerly pg_xlog) directory: people mistook it for a log directory so frequently that it was the motivation to rename everything from "xlog" to "wal" in PostgreSQL 10.

The only reason you should ever consider running pg_resetwal is if you have an already corrupted database, and you have no backups available.  This will allow you to start the database, and pg_dump what's left... with the understanding that it may not be consistent.

Even if you have an already corrupt database, and you've decided that pg_resetwal is the one and only solution... ALWAYS take a file-level backup (tar, etc.) of the database and all related files (tablespaces, WAL) before doing anything.  It's a one-way command that can actually make things worse.

Tuesday 14 August 2018

Simulating ON COMMIT DELETE Global Temporary Tables on PostgreSQL

Vibhor Kumar did a great post on Leveraging Local Temporary Table for Global Temporary Table in EDB Postgres 9.6, including both the ON COMMIT PRESERVE and ON COMMIT DELETE options:

https://vibhorkumar.blog/2017/07/06/leveraging-local-temporary-table-for-global-temporary-table-in-edb-postgres-9-6/


The code example only covers the INSERT case, so here is an extension of that code which includes INSERT/UPDATE/DELETE:

CREATE UNLOGGED TABLE gtt_test_definition (id int);
CREATE VIEW gtt_test AS SELECT * FROM gtt_test_definition;

CREATE OR REPLACE FUNCTION gtt_test_function()
RETURNS TRIGGER
LANGUAGE plpgsql
AS
$function$
BEGIN
  /* attempt to run the DML */
  IF (TG_OP = 'INSERT') THEN
    INSERT INTO gtt_test_temp VALUES(NEW.*);
    RETURN NEW;
  ELSIF (TG_OP = 'UPDATE') THEN
    UPDATE gtt_test_temp SET id = NEW.id WHERE id = OLD.id;
    RETURN NEW;
  ELSIF (TG_OP = 'DELETE') THEN
    DELETE FROM gtt_test_temp WHERE id = OLD.id;
    RETURN OLD;  /* NEW is NULL for DELETE; returning OLD makes the reported row count correct */
  END IF;

  /* when the temp table doesn't already exist */
  EXCEPTION WHEN undefined_table THEN
    /* create local temporary table */
    CREATE TEMP TABLE gtt_test_temp
      (LIKE gtt_test_definition INCLUDING ALL )
      INHERITS (gtt_test_definition)
      ON COMMIT DELETE ROWS;

    /* re-run the DML */
    IF (TG_OP = 'INSERT') THEN
      INSERT INTO gtt_test_temp VALUES(NEW.*);
      RETURN NEW;
    ELSIF (TG_OP = 'UPDATE') THEN
      UPDATE gtt_test_temp SET id = NEW.id WHERE id = OLD.id;
      RETURN NEW;
    ELSIF (TG_OP = 'DELETE') THEN
      DELETE FROM gtt_test_temp WHERE id = OLD.id;
      RETURN OLD;  /* as above, return OLD for DELETE */
    END IF;

END;
$function$;

CREATE TRIGGER gtt_test_trigger INSTEAD OF INSERT OR UPDATE OR DELETE ON gtt_test
  FOR EACH ROW EXECUTE PROCEDURE gtt_test_function();


And here is a test case, showing it working for ON COMMIT DELETE:

postgres=# \d
No relations found.

postgres=# \i test_gtt.sql
CREATE TABLE
CREATE VIEW
CREATE FUNCTION
CREATE TRIGGER

postgres=# \d
                List of relations
 Schema |        Name         | Type  |  Owner   
--------+---------------------+-------+----------
 public | gtt_test            | view  | postgres
 public | gtt_test_definition | table | postgres
(2 rows)

postgres=# select * from gtt_test;
 id 
----
(0 rows)

postgres=# insert into gtt_test values(1);
NOTICE:  merging column "id" with inherited definition
INSERT 0 1
postgres=# select * from gtt_test;
 id 
----
(0 rows)

postgres=# \d
                 List of relations
  Schema   |        Name         | Type  |  Owner   
-----------+---------------------+-------+----------
 pg_temp_2 | gtt_test_temp       | table | postgres
 public    | gtt_test            | view  | postgres
 public    | gtt_test_definition | table | postgres
(3 rows)

postgres=# begin;
BEGIN
postgres=# insert into gtt_test values(1);
INSERT 0 1
postgres=# select * from gtt_test;
 id 
----
  1
(1 row)

postgres=# insert into gtt_test values(2);
INSERT 0 1
postgres=# select * from gtt_test;
 id 
----
  1
  2
(2 rows)

postgres=# commit;
COMMIT
postgres=# select * from gtt_test;
 id 
----
(0 rows)

postgres=# begin;
BEGIN
postgres=# insert into gtt_test values(1);
INSERT 0 1
postgres=# insert into gtt_test values(2);
INSERT 0 1
postgres=# select * from gtt_test;
 id 
----
  1
  2
(2 rows)

postgres=# update gtt_test set id=100 where id=1;
UPDATE 1
postgres=# select * from gtt_test;
 id  
-----
   2
 100
(2 rows)

postgres=# commit;
COMMIT
postgres=# select * from gtt_test;
 id 
----
(0 rows)

postgres=# begin;
BEGIN
postgres=# insert into gtt_test values(1);
INSERT 0 1
postgres=# insert into gtt_test values(2);
INSERT 0 1
postgres=# insert into gtt_test values(3);
INSERT 0 1
postgres=# select * from gtt_test;
 id 
----
  1
  2
  3
(3 rows)

postgres=# delete from gtt_test where id=2;
DELETE 1
postgres=# select * from gtt_test;
 id 
----
  1
  3
(2 rows)

postgres=# commit;
COMMIT
postgres=# select * from gtt_test;
 id 
----
(0 rows)
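
For the ON COMMIT PRESERVE ROWS behaviour that Vibhor's post also covers, the only change needed to the sketch above is the ON COMMIT clause of the temporary table created inside the trigger function:

    /* keep rows for the life of the session instead of clearing them at commit */
    CREATE TEMP TABLE gtt_test_temp
      (LIKE gtt_test_definition INCLUDING ALL )
      INHERITS (gtt_test_definition)
      ON COMMIT PRESERVE ROWS;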

Wednesday 28 March 2018

Don't use "kill" on PostgreSQL... or you're gonna have a bad time!

Have you ever been tempted to "kill -9" a non-behaving PostgreSQL client session that won't die via "pg_cancel_backend()" or "pg_terminate_backend()"?  At first you may believe (incorrectly) that it will affect only the single session you've killed, but in reality it's an incredibly bad idea... as the result will be ALL of your client sessions being reset and the database crashing.





Here's an example.  Let's say there is one session you need to kill, so you use kill -9:

postgres=# select * from pg_stat_activity;
 datid | datname  |  pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |          query_start          |         state_change          | wait_event_type | wait_event | state  | backend_xid | backend_xmin |              query
-------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+------------+--------+-------------+--------------+---------------------------------
 13323 | postgres | 15658 |       10 | postgres | psql             |             |                 |          -1 | 2018-03-28 16:46:27.924226+00 | 2018-03-28 16:46:34.663799+00 | 2018-03-28 16:46:34.663799+00 | 2018-03-28 16:46:34.663808+00 |                 |            | active |             |         1757 | select * from pg_stat_activity;
(1 row)


postgres=# \! kill -9 15658


The postmaster detects that a process has exited abnormally:

16:46:44 UTC [15642]: LOG:  server process (PID 15658) was terminated by signal 9: Killed
16:46:44 UTC [15642]: DETAIL:  Failed process was running: select * from pg_stat_activity;


And even though it wasn't in a transaction and was only running a simple select, the postmaster terminates "any other active server processes":


16:46:44 [15642]: LOG:  terminating any other active server processes


The end result is that the entire PostgreSQL cluster/server resets itself, and does crash recovery to clean up.  The time taken to do this can be significant, depending on how busy your database is and how long it's been since the last checkpoint.

16:46:44 [15648]: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
16:46:44 [15648]: HINT:  In a moment you should be able to reconnect to the database and repeat your command.
16:46:44 [15642]: LOG:  archiver process (PID 15649) exited with exit code 1
16:46:44 [15642]: LOG:  all server processes terminated; reinitializing
16:46:44 [15661]: LOG:  database system was interrupted; last known up at 2018-03-28 16:46:16 UTC
16:46:44 [15661]: LOG:  database system was not properly shut down; automatic recovery in progress
16:46:44 [15661]: LOG:  redo starts at 0/155C230
16:46:44 [15661]: LOG:  invalid record length at 0/155C2A0: wanted 24, got 0
16:46:44 [15661]: LOG:  redo done at 0/155C268
16:46:44 [15661]: LOG:  checkpoint starting: end-of-recovery immediate
16:46:45 [15661]: LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.030 s, sync=0.000 s, total=0.035 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB
16:46:45 [15661]: LOG:  MultiXact member wraparound protections are now enabled
16:46:45 [15665]: LOG:  autovacuum launcher started

16:46:45 [15642]: LOG:  database system is ready to accept connections
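
For reference, this is what the supported route looks like from SQL before you ever reach for kill (the pid is the one from the pg_stat_activity output above; these functions require superuser, membership of pg_signal_backend, or owning the target session):

-- First try cancelling just the running query:
SELECT pg_cancel_backend(15658);

-- If the backend is truly stuck, terminate the whole session:
SELECT pg_terminate_backend(15658);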


So in short, if you can't kill a session using the standard PostgreSQL Server Signalling Functions, then the only sensible option is to issue a restart during a maintenance window:

psql -c "checkpoint;"
pg_ctl restart -m fast

... or if that still won't work, then the last resort is:

pg_ctl restart -m immediate


Finally, it may seem like a more complicated process to follow when your database has gone awry, but if you don't use the PostgreSQL-supported methods then your users will end up in a heap... just like they used to back in the Ingres days.



Stay safe, and keep your fingers off the trigger!

Greg.
