[feature] support setting target_session_attrs postgres option by passing additional options to a django database backend

Bug #1791957 reported by Dmitrii Shcherbakov
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
Invalid
Medium
Unassigned

Bug Description

libpq coming with PostgreSQL 10 has support for multiple hostnames and IP addresses and iterating over them until it finds a good endpoint to connect to. It also supports resolving a single hostname with multiple address records backing it and iterating over those. However target_session_attrs needs to be passed to the postgres backend in order to always connect to the master instance, not slaves.

# without target_session_attrs
python -c "import psycopg2; psycopg2.connect('user=postgres password=postgres host=pg1,pg2,pg2andpg3 port=5432,5432,5432 dbname=maas connect_timeout=3')"

# with target_session_attrs
python -c "import psycopg2; psycopg2.connect('user=postgres password=postgres host=pg1,pg2,pg2andpg3 port=5432,5432,5432 dbname=maas connect_timeout=3 target_session_attrs=read-write')"

https://paquier.xyz/postgresql-2/postgres-10-multi-host-connstr/

Parsing and handling: https://git.launchpad.net/~usd-import-team/ubuntu/+source/postgresql-10/tree/src/interfaces/libpq/fe-connect.c?h=ubuntu/bionic-updates#n5419

This explains that target_session_attrs=read-write needs to be passed in order to avoid connecting to read-only slaves:
https://paquier.xyz/postgresql-2/postgres-10-libpq-read-write/
“any”, meaning that any kind of servers can be accepted. This is as well the default value.
“read-write”, to disallow connections to read-only servers, hot standbys for example.

"If a failover happens and a standby is promoted and switches to be a primary, target_session_attrs can be used in read-write mode with the addresses of all the nodes of the cluster to allow the application to connect to a primary for read-write actions or any nodes for read-only actions."

MAAS relies on django and psycopg2 backend to connect to postgres.

https://github.com/maas/maas/blob/2.4.2/src/maasserver/djangosettings/settings.py#L105-L115

However, it does not allow passing OPTIONS which could contain "target_session_attrs=read-write"

https://docs.djangoproject.com/en/2.1/ref/settings/#std:setting-OPTIONS
"Extra parameters to use when connecting to the database. Available parameters vary depending on your database backend."

Using the default setting target_session_attrs=any should also work but if we know that there is only one master in the cluster it is better to be specific.

"host=host1,host2,host3 target_session_attrs=any

This simplifies the logic at application level: there is no need for it to know exactly which node is the primary and which ones are the standbys. The cost though, is an increase in connection failures when using the read-write mode, but that may be acceptable if the cluster is in a low-latency environment."

~~
Looks like people did not have any real issues in using this via psycopg2:
https://github.com/psycopg/psycopg2/issues/602
https://github.com/psycopg/psycopg2/issues/669

~~

On failover handling:

From the perspective of using a VIP and gratuitous ARP type of failover on a single subnet (with Pacemaker managing the VIP and GARP), it doesn't look like there is a big difference with the multi-endpoint setup:
We do not have any TCP connection state synchronization as in LVS http://www.linuxvirtualserver.org/docs/sync.html or other connection state replication relevant to PostgreSQL between PostgreSQL nodes - there is only data replication;
on failover a client recreates a TCP connection to the same VIP endpoint.
In the case of multiple endpoints we would just try several of them before the client lib would declare the connection as failed when creating a new connection or during failover handling.

The bulk of the client logic is in PQconnectPoll and the rest should be handled in the client library (psycopg2 that uses libpq).
https://git.launchpad.net/~usd-import-team/ubuntu/+source/postgresql-10/tree/src/interfaces/libpq/fe-connect.c?h=ubuntu/bionic-updates#n2065

https://docs.djangoproject.com/en/2.1/ref/databases/#connection-management (unrecoverable errors seem to affect only one request in django which is used by MAAS).

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Hi Dmitrii,

In what version of psycopg was this introduced/supported?

Also, allowing the passing of options would mean that you would change the behavior of how MAAS connects to PostgreSQL. If that is to cause any issue with MAAS or break things, this would be an unsupported configuration provided that this is not something that the team has tested or validated or event made sure MAAS can work with. THis is because any region can write to postgresql and could potentially lead to issues we won't be able to resolve.

That said, do you have any production clusters that are doing this today by manually modifying MAAS?

Changed in maas:
status: New → Incomplete
milestone: none → 2.5.x
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Hi Andres,

Thanks for the answer.

> In what version of psycopg was this introduced/supported?

I think any contemporary version will do because additional arguments are passed down to libpq in a generic way:
https://git.launchpad.net/~usd-import-team/ubuntu/+source/psycopg2/tree/lib/__init__.py?h=applied/ubuntu/bionic#n85

def connect(dsn=None, connection_factory=None, cursor_factory=None, **kwargs):
...
    Any other keyword parameter will be passed to the underlying client
    library: the list of supported parameters depends on the library version.
# ...
# calls C code
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)

Grepping for target_session_attrs in the psycopg2 code base does not show any result which means that there is no specific support required in that lib - only in libpq.

> If that is to cause any issue with MAAS or break things, this would be an unsupported configuration provided that this is not something that the team has tested or validated or event made sure MAAS can work with.

Currently our deployments implicitly use target_session_attrs=any (as this is a default value in PostgreSQL 10) and rely on a single VIP endpoint set up via pacemaker and corosync. Changes in the client connection state would have to be there for sure during failover because requests will hit a different DB instance after a gratuitous ARP is sent on the broadcast domain.

I guess there could be differences in how retries are handled with multi-endpoint support (I have not looked into it deeply yet) but it does not look like there is a big difference for the application (MAAS) side as connection strings are handled in libpq.

Surely this needs to be verified before use.

> That said, do you have any production clusters that are doing this today by manually modifying MAAS?

Not at the moment but I have a very short-term requirement of using multiple subnets for a MAAS HA setup which means that using a VIP is not possible directly - there is no shared L2 to work with. Specifically, in a L3 leaf-spine setup every leaf switch terminates a complete L2 and every infrastructure node hosting MAAS (connected to its own leaf) will have interfaces with IP addresses on different subnets.

As for alternatives:

1) Using a localhost haproxy between MAAS services and postgres to hide multiple endpoints (providing 127.0.0.1:5432 to MAAS) would require using complex postgres protocol-aware master health-checks;
2) Using a shared VXLAN for all infrastructure nodes and setting up a VIP on that stretched L2 requires some automation changes and is a bit odd to do.

I think I will have to settle on (2) for a workaround.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

> any contemporary version will do

psycopg in bionic supports passing this option.

https://git.launchpad.net/~usd-import-team/ubuntu/+source/psycopg2/tree/lib/__init__.py?h=applied/ubuntu/bionic#n85

Changed in maas:
status: Incomplete → New
summary: - support setting target_session_attrs postgres option by passing
- additional options to a django database backend
+ [feature] support setting target_session_attrs postgres option by
+ passing additional options to a django database backend
Changed in maas:
status: New → Triaged
tags: added: feature
description: updated
Changed in maas:
milestone: 2.5.x → next
Changed in maas:
milestone: next → 2.6.0
Changed in maas:
assignee: nobody → Blake Rouse (blake-rouse)
Revision history for this message
Blake Rouse (blake-rouse) wrote :

Thanks for your feature request, please could you start a thread over on our Discourse (https://discourse.maas.io/c/features) to discuss the feature?

Changed in maas:
importance: Undecided → Medium
assignee: Blake Rouse (blake-rouse) → nobody
status: Triaged → Invalid
milestone: 2.6.0 → none
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.