Many records created at the same time may lead to a race condition

Bug #1871332 reported by Erik Olof Gunnar Andersson
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Designate
Fix Released
Undecided
Erik Olof Gunnar Andersson
Ubuntu Cloud Archive
Fix Released
Undecided
Unassigned
Stein
Fix Released
High
Unassigned
Train
Fix Released
High
Unassigned
designate (Ubuntu)
Fix Released
Undecided
Unassigned
Focal
Fix Released
Undecided
Unassigned

Bug Description

[Impact]

 * When multiple records are created at the very same time the
   serial number may not be updated properly. This is especially
   easy to reproduce with the designate-sink and creating 2+ VMs
   at the same time.

 * The fix is included in the upstream Stein branch but is not
   included in any Stein release.

[Test Plan]

 * Create multiple DNS recordsets in parallel and verify that
   they are all included in the DNS zone on the backend DNS
   server. It is important to created those recordsets very close
   in time so that the bug is triggered. Using a deployment
   mechanism such as Terraform might help.

[Where problems could occur]

 * The upstream fix introduces a central lock which prevents the
   bug but also leads to performance degradation because this
   lock functions as a parallel barrier.

[Original description]

When multiple records are created at the very same time the serial number may not be updated properly. This is especially easy to reproduce with the designate-sink and creating 2+ VMs at the same time.

Changed in designate:
assignee: nobody → Erik Olof Gunnar Andersson (eandersson)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to designate (master)

Reviewed: https://review.opendev.org/717955
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=f6090d885c8ebe00a89307a1e2f5621de75dcda8
Submitter: Zuul
Branch: master

commit f6090d885c8ebe00a89307a1e2f5621de75dcda8
Author: Erik Olof Gunnar Andersson <email address hidden>
Date: Mon Apr 6 22:39:36 2020 -0700

    Adding distributed locking to central

    The current locking implementation is limited to
    the running process. This introduces distributed
    locking that will help prevent race conditions
    when there are many instances of designate-central
    running.

    Closes-Bug: #1871332
    Change-Id: I98f7f80ce365cdee33528f9964c03274f62a795a

Changed in designate:
status: In Progress → Fix Released
Revision history for this message
Akradiy Shinkarev (k3nny0ne) wrote :

Backported this change to locally installed stable/train, but catch race condition on 2 concurrent 'create recordset' operations - zone serial is not being updated for 'second' recordset (logs from designate-central in attachment).

Revision history for this message
Erik Olof Gunnar Andersson (eandersson) wrote :

What coordination backend are you using? The above fix only works if you enable something like Redis or Memcached.

Revision history for this message
Akradiy Shinkarev (k3nny0ne) wrote :

It is Redis (default in Kolla Ansible):

# grep -A1 coordination /etc/kolla/designate-central/designate.conf
[coordination]
backend_url = redis://xxx:yyy@10.10.10.31:26379?sentinel=kolla&sentinel_fallback=10.10.10.32:26379&sentinel_fallback=10.10.10.33:26379&db=0&socket_timeout=60&retry_on_timeout=yes

Revision history for this message
Erik Olof Gunnar Andersson (eandersson) wrote :

There is something that seems off, because your logs is showing a traditional threading lock.
> Acquired lock <zone-id>

Can you make sure that the coordinator is returning a tooz based lock and not the lockutils based lock?
https://github.com/openstack/designate/blob/7ff2f3956d3c543bbeaf5aafaf214965a1f44df3/designate/coordination.py#L63

As an example you can look at the central logs from master. You don't see the lockutils used for zone actions here.
https://zuul.opendev.org/t/openstack/build/d8fbeeb3fdf84561b34e7d4c32b872b4/log/controller/logs/screen-designate-central.txt

Revision history for this message
Akradiy Shinkarev (k3nny0ne) wrote :

I double checked code in designate-central container and realise that it was non-patched version, sorry :)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to designate (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/736055

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to designate (stable/train)

Reviewed: https://review.opendev.org/736055
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=6a48f54dfc74357e8eacb119f881f03db5e28550
Submitter: Zuul
Branch: stable/train

commit 6a48f54dfc74357e8eacb119f881f03db5e28550
Author: Erik Olof Gunnar Andersson <email address hidden>
Date: Mon Apr 6 22:39:36 2020 -0700

    Adding distributed locking to central

    The current locking implementation is limited to
    the running process. This introduces distributed
    locking that will help prevent race conditions
    when there are many instances of designate-central
    running.

    Closes-Bug: #1871332
    Change-Id: I98f7f80ce365cdee33528f9964c03274f62a795a
    (cherry picked from commit f6090d885c8ebe00a89307a1e2f5621de75dcda8)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/designate 9.0.2

This issue was fixed in the openstack/designate 9.0.2 release.

affects: designate (Ubuntu) → cloud-archive
Revision history for this message
Nicolas Bock (nicolasbock) wrote :

[Impact]

 * When multiple records are created at the very same time the
   serial number may not be updated properly. This is especially
   easy to reproduce with the designate-sink and creating 2+ VMs
   at the same time.

 * The fix is included in the upstream Stein branch but is not
   included in any Stein release.

[Test Plan]

 * Create multiple DNS recordsets in parallel and verify that
   they are all included in the DNS zone on the backend DNS
   server. It is important to created those recordsets very close
   in time so that the bug is triggered. Using a deployment
   mechanism such as Terraform might help.

[Where problems could occur]

 * The upstream fix introduces a central lock which prevents the
   bug but also leads to performance degradation because this
   lock functions as a parallel barrier.

Revision history for this message
Nicolas Bock (nicolasbock) wrote :

The package which includes the upstream change can be found here:

https://launchpad.net/~nicolasbock/+archive/ubuntu/bionic-stein/+packages

Revision history for this message
Nicolas Bock (nicolasbock) wrote :

Original description:

When multiple records are created at the very same time the serial number may not be updated properly. This is especially easy to reproduce with the designate-sink and creating 2+ VMs at the same time.

description: updated
Revision history for this message
Nicolas Bock (nicolasbock) wrote :

Debdiff for Bionic / Stein.

Changed in designate (Ubuntu Focal):
status: New → Fix Released
Changed in designate (Ubuntu):
status: New → Fix Released
Changed in cloud-archive:
status: New → Fix Released
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Thanks Nicolas. New package versions with this patch have been uploaded to train-staging and stein-staging PPAs.

Revision history for this message
Nicolas Bock (nicolasbock) wrote :

Thanks Corey!

Revision history for this message
Corey Bryant (corey.bryant) wrote : Please test proposed package

Hello Erik, or anyone else affected,

Accepted designate into train-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:train-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-train-needed to verification-train-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-train-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-train-needed
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Hello Erik, or anyone else affected,

Accepted designate into stein-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:stein-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-stein-needed to verification-stein-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-stein-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-stein-needed
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

Hello Erik, or anyone else affected,

Accepted designate into train-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:train-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-train-needed to verification-train-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-train-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Nicolas Bock (nicolasbock) wrote :

Hi Chris,

I verified that the package from stein-proposed fixes the issue.

tags: added: verification-stein-done
removed: verification-stein-needed
Revision history for this message
Nicolas Bock (nicolasbock) wrote :

Hi Chris,

I also verified train-proposed.

tags: added: verification-train-done
removed: verification-train-needed
Revision history for this message
Corey Bryant (corey.bryant) wrote : Update Released

The verification of the Stable Release Update for designate has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package designate - 1:9.0.2-0ubuntu1~cloud0
---------------

 designate (1:9.0.2-0ubuntu1~cloud0) bionic-train; urgency=medium
 .
   * d/watch: Add trailing slash to URL.
   * New stable point release for OpenStack Train (LP: #1923038).
   * d/p/0001-Adding-distributed-locking-to-central.patch: removed as fix
     landed upstream.
 .
 designate (1:9.0.1-0ubuntu1~cloud1) bionic-train; urgency=medium
 .
   [ Chris MacNaughton ]
   * d/control: Update VCS paths for move to lp:~ubuntu-openstack-dev.
 .
   [ Corey Bryant ]
   * d/p/0001-Adding-distributed-locking-to-central.patch: Fix race with
     multiple instances of designate-central (LP: #1871332).

Revision history for this message
Corey Bryant (corey.bryant) wrote :

The verification of the Stable Release Update for designate has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package designate - 1:8.0.1-0ubuntu1~cloud1
---------------

 designate (1:8.0.1-0ubuntu1~cloud1) bionic-stein; urgency=medium
 .
   [ Chris MacNaughton ]
   * d/control: Update VCS paths for move to lp:~ubuntu-openstack-dev.
 .
   [ Nicolas Bock ]
   * d/p/0001-Adding-distributed-locking-to-central.patch: Fix race with
     multiple instances of designate-central (LP: #1871332).

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/designate stein-eol

This issue was fixed in the openstack/designate stein-eol release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.