Tuesday 20 March 2018

Running multiple PostgreSQL clusters in RedHat/CentOS?

It's a question that always comes up, even in today's virtualised and containerised environments... "How can I run multiple PostgreSQL clusters on a single server?"

If you're used to doing RPM installs on RedHat/CentOS, then you'll be aware that the default location for all database files is:

/var/lib/pgsql/<version>/data

And the related service name is:

postgresql-<version>

That works well when you're only running a single cluster of each major version on a server, but things become complicated when you need more than one.  There are several considerations:
  • What network port should it use?
  • Where should $PGDATA be?
  • What should the services be called?

What network port should it use?

This is the easiest to solve, as the default is "5432".  By convention subsequent clusters running on a host simply increment this number by one, so "5433", then "5434", etc.  In practice, you can choose any number you prefer, and some sites allocate a unique port to each cluster across the entire enterprise, so that there are never any clashes.

Edit $PGDATA/postgresql.conf, uncomment this line, set it to the new value, and restart the cluster:


#port = 5432   # (change requires restart)
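If you'd rather script the change than edit the file by hand, something like the following could work. It's only a sketch: the function name "set_pg_port" is made up for this example, and it assumes GNU sed (the default on RedHat/CentOS) and a postgresql.conf that still contains the stock "port" line.

```shell
# Hypothetical helper (not part of PostgreSQL): set the port in a postgresql.conf.
# Usage: set_pg_port <path-to-postgresql.conf> <new-port>
set_pg_port() {
    conf_file="$1"
    new_port="$2"
    # Match the "port = N" line whether or not it is commented out,
    # and leave any trailing comment on the line untouched.
    sed -i -E "s/^#?port[[:space:]]*=[[:space:]]*[0-9]+/port = ${new_port}/" "$conf_file"
}
```

For a second cluster you might call it as set_pg_port "$PGDATA/postgresql.conf" 5433, and then restart that cluster's service so the new port takes effect.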



Where will $PGDATA be?

This becomes more complex, especially if you run SELinux in "enforcing" mode.  The simplest way is to extend the existing structure, much as Ubuntu has done, and include the cluster name after the version:

/var/lib/pgsql/<version>/<cluster_name>/data

If you prefer to split your database across several mountpoints, they can all be mounted at the same level as "data", and this still fits within the default SELinux configuration for PostgreSQL:

/var/lib/pgsql/<version>/<cluster_name>/data
                                       /pg_wal
                                       /pg_log
                                       /<tblspc_name>
                                       /pg_stat_tmp
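As a rough illustration of the layout above, here is a small sketch that builds the data directory path for a named cluster and creates it with the restrictive permissions PostgreSQL expects. The function names and the PG_BASE variable are invented for this example; PG_BASE only exists so that you can try it out somewhere other than /var/lib/pgsql.

```shell
# Hypothetical helper: print the data directory path for a named cluster.
# PG_BASE is an assumption of this sketch; it defaults to the RPM location.
cluster_pgdata() {
    version="$1"
    cluster="$2"
    base="${PG_BASE:-/var/lib/pgsql}"
    printf '%s/%s/%s/data\n' "$base" "$version" "$cluster"
}

# Hypothetical helper: create the data directory with mode 700
# and print its path.
make_cluster_dir() {
    dir="$(cluster_pgdata "$1" "$2")"
    mkdir -p "$dir" && chmod 700 "$dir"
    printf '%s\n' "$dir"
}
```

On a real server the directory would also need to be owned by the user that runs the cluster, and the cluster initialised with "initdb -D" pointing at that path. If SELinux is enforcing, running "restorecon -R" over the new directories afterwards should apply the default PostgreSQL labels, though it's worth checking that on your own system.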



What will the services be called?

Assuming you're using CentOS v7 and systemd then there are two options here, both of which have merits.  You can either use a separate service file for each new cluster, or create a single service file template to cover all clusters.

Service File Template

The file should be owned by root:root, mode 644, and be called (replacing "<version>" with the actual PostgreSQL version):

/etc/systemd/system/postgresql-<version>@.service

The contents should be:

.include /lib/systemd/system/postgresql-<version>.service

[Unit]
Description=PostgreSQL <version> database server - cluster_name:%i

[Service]
Environment=PGDATA=/var/lib/pgsql/<version>/%i/data


For PostgreSQL v9.6, this would look like:

[root@localhost]# cat /etc/systemd/system/postgresql-9.6@.service
.include /lib/systemd/system/postgresql-9.6.service

[Unit]
Description=PostgreSQL 9.6 database server - cluster_name:%i

[Service]
Environment=PGDATA=/var/lib/pgsql/9.6/%i/data
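
If you manage several PostgreSQL versions, you may prefer to generate the template rather than type it out each time. The following is a minimal sketch of that idea; "write_pg_template" is a made-up name, and the optional second argument (the target directory) is only there so the function can be tried out without touching /etc/systemd/system.

```shell
# Hypothetical helper: write the service file template for one PostgreSQL version.
# Usage: write_pg_template <version> [unit-directory]
write_pg_template() {
    version="$1"
    unit_dir="${2:-/etc/systemd/system}"
    unit_file="${unit_dir}/postgresql-${version}@.service"
    # %i is left as-is for systemd to expand to the cluster name.
    cat > "$unit_file" <<EOF
.include /lib/systemd/system/postgresql-${version}.service

[Unit]
Description=PostgreSQL ${version} database server - cluster_name:%i

[Service]
Environment=PGDATA=/var/lib/pgsql/${version}/%i/data
EOF
    chmod 644 "$unit_file"
    printf '%s\n' "$unit_file"
}
```

Run as root, write_pg_template 9.6 should produce the same file as the v9.6 example above. The ownership (root:root) comes from running it as root, which this sketch does not check.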


Service File per Cluster

If you need more control on a per-cluster basis, then a separate file for each cluster must be created:

/etc/systemd/system/postgresql-<version>-<cluster_name>.service

The contents should be:

.include /lib/systemd/system/postgresql-<version>.service

[Unit]
Description=PostgreSQL <version> database server - cluster_name:<cluster_name>

[Service]
User=<postgres_service_user>
Group=<postgres_service_group>
Environment=PGDATA=/var/lib/pgsql/<version>/<cluster_name>/data


Here is an example for PostgreSQL v9.6, a cluster name of "test", running with the Linux user:group "postgres_test:postgres_test":

[root@localhost]# cat /etc/systemd/system/postgresql-9.6-test.service
.include /lib/systemd/system/postgresql-9.6.service

[Unit]
Description=PostgreSQL 9.6 database server - cluster_name:test

[Service]
User=postgres_test
Group=postgres_test
Environment=PGDATA=/var/lib/pgsql/9.6/test/data

What should the services be called?

Once you've chosen to use either a template or a service file per cluster, the service names are:

Service File Template

postgresql-<version>@<cluster_name>

To start a PostgreSQL v9.6 cluster called "main", you would use:

sudo systemctl start postgresql-9.6@main

Service File per Cluster

postgresql-<version>-<cluster_name>

To start a PostgreSQL v9.6 cluster called "main", you would use:

sudo systemctl start postgresql-9.6-main

Both are very similar, with the only difference being the separator between the version and the cluster name: an at sign "@" for the template, or a dash "-" for a service file per cluster.

IMPORTANT: The services won't get picked up until you reload the systemd configuration by running:

sudo systemctl daemon-reload
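
To tie the two naming schemes together, here is a small sketch that builds the service name for either approach. The function name "pg_service_name" and its "template" / "per-cluster" keywords are made up for this example.

```shell
# Hypothetical helper: print the systemd service name for a cluster.
# Usage: pg_service_name <template|per-cluster> <version> <cluster_name>
pg_service_name() {
    style="$1"
    version="$2"
    cluster="$3"
    case "$style" in
        template)    printf 'postgresql-%s@%s\n' "$version" "$cluster" ;;
        per-cluster) printf 'postgresql-%s-%s\n' "$version" "$cluster" ;;
        *)           echo "unknown style: $style" >&2; return 1 ;;
    esac
}
```

After the daemon-reload, a typical sequence might then be "sudo systemctl enable" followed by "sudo systemctl start", each with "$(pg_service_name template 9.6 main)" as the service name, so that the cluster starts now and also comes back after a reboot.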

Conclusion

This structure allows for the flexibility of running multiple clusters on a server, without going as far as Ubuntu and splitting the configuration files out of $PGDATA... which in my opinion is very "un-Postgres-ish".

Even if you don't need it now, adopting such a structure could someday make your life easier, because you're already prepared to run multiple clusters on the server.


Video

I presented my ideas to the London PostgreSQL Users Group, and their input was affirming and interesting at the same time.  If you want to hear what they had to say, then click below:


  https://skillsmatter.com/skillscasts/11736-multiple-postgre-clusters-on-redhat-centos#video

