[2.0b2] DNS zone serials are not stable

Bug #1571645 reported by Gavin Panella on 2016-04-18
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
Critical
Gavin Panella

Bug Description

Revision 4852 introduced a race into maasserver.Sequence.current() that
makes it no longer reliable. The new code ostensibly allows a current
value of a sequence to be obtained without the side-effects associated
with calling nextval() [1].

However, it does have side-effects. The code in question:

  cursor.execute(
      "SELECT setval(%s, nextval(%s) - %s);",
      [self.name, self.name, self.increment])

Sequences work outside of transactions, and are mutated immediately, so
there's a moment between nextval() and setval() above where the sequence
has been incremented. This is visible to other database sessions. This
can be demonstrated with a short example [2].

Calls to current() from different database sessions that occur close in
time — for example, when the database notifies all regiond listeners
that an IP address has been allocated — are at real risk of getting
different values out of current(). This means that zones might be
generated with inconsistent serials, or even a serial _before_ the
previous one.

In practice this _may_ work out okay, or it may not. We MUST, however,
remove this code from Sequence.current(). maasserver.Sequence is an OO
abstraction around PostgreSQL's sequences and is intended to share
essentially the same behaviour.

I can suggest two ways to fix this:

1. Subclass m.Sequence to something specific for DNS serials. Only the
   subclass would contain the problematic setval(nextval()) code, but
   would add an explicit non-transactional advisory database lock around
   it to eliminate the race.

2. In the database triggers that drive DNS serials — like
   sys_dns_subnet_update — instead of calling nextval() for its
   side-effects alone, capture its result and include it in the
   pg_notify() payload. The listening code would then never use
   Sequence.current() and the problematic code could be removed.

Of these I _strongly_ suggest #2; #1 would still be hack, albeit a
lesser one.

[1] Normally, in a database session, there is no current value of a
sequence until nextval() has been called on it.

[2] http://paste.ubuntu.com/15912293/. In a MAAS branch call it like so:

  DJANGO_SETTINGS_MODULE=maas.demo \
    bin/database --preserve run -- bin/py script.py

and leave it running for at least a minute.

Related branches

Changed in maas:
importance: Undecided → High
milestone: none → 2.0.0
status: New → Triaged
summary: - DNS zone serials are not stable
+ [2.0b2] DNS zone serials are not stable
Changed in maas:
importance: High → Critical
Gavin Panella (allenap) on 2016-04-19
Changed in maas:
status: Triaged → In Progress
assignee: nobody → Gavin Panella (allenap)
Gavin Panella (allenap) on 2016-05-26
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers