Wednesday, 23 October 2019

Think AWS support for RDS gives 100% coverage? Think again!

Just a friendly FYI: AWS does not have access to any of your PostgreSQL RDS database logs, such as postgresql.log. This matters if you happen to hit a PostgreSQL bug such as this one, which was fixed in v11.3:

  • Fix race conditions in management of dynamic shared memory (Thomas Munro)

    These could lead to "dsa_area could not attach to segment" or "cannot unpin a segment that is not pinned" errors.

Once this problem occurs, your RDS database will appear to be running and may still service some queries, but most queries will hang, and eventually your application servers will time out, causing application downtime.

Unfortunately, from the AWS support perspective, the RDS database is up and is using some CPU and network, so there's no reason to suspect it's unhealthy. Only by looking in your postgresql.log and seeing this message will you know what the problem is:

2018-01-01 12:34:56 UTC::@:[10234]:FATAL:  cannot unpin a segment that is not pinned
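
If you want to check for this yourself, a rough sketch along these lines could scan log lines for the two error texts quoted in the release note. It assumes you have already obtained the contents of postgresql.log by whatever means you normally use; the function name `find_dsm_errors` and the first sample line are made up for illustration.

```python
import re

# Error texts quoted in the PostgreSQL 11.3 release note above.
DSM_ERROR_PATTERNS = [
    re.compile(r"could not attach to segment"),
    re.compile(r"cannot unpin a segment that is not pinned"),
]


def find_dsm_errors(log_lines):
    """Return the log lines containing one of the known DSM error messages.

    `log_lines` is any iterable of strings, for example an open file handle
    on a downloaded copy of postgresql.log (hypothetical usage).
    """
    hits = []
    for line in log_lines:
        if any(pattern.search(line) for pattern in DSM_ERROR_PATTERNS):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    sample = [
        # Invented harmless line, for illustration only.
        "2018-01-01 12:34:50 UTC::@:[10234]:LOG:  some unrelated message",
        # The message shown above.
        "2018-01-01 12:34:56 UTC::@:[10234]:FATAL:  cannot unpin a segment that is not pinned",
    ]
    for hit in find_dsm_errors(sample):
        print(hit)
```

Run on a schedule against fresh log contents, something like this would at least tell you when you have hit the bug, rather than leaving you to guess from hung queries.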



So if you run an application team, please make sure you have a DBA somewhere on staff, because AWS doesn't have your back when you hit problems such as this. You could always just turn it off and on again, but you will never find the root cause and will be cursed to keep rebooting whenever things break at unpredictable times.

