Performance degradation archiving DB with large numbers of FK related records

Bug #2024258 reported by melanie witt
This bug affects 1 person
Affects                    Status         Importance  Assigned to   Milestone
OpenStack Compute (nova)   Fix Released   Undecided   melanie witt
  Antelope                 In Progress    Undecided   Unassigned
  Wallaby                  In Progress    Undecided   Unassigned
  Xena                     In Progress    Undecided   Unassigned
  Yoga                     In Progress    Undecided   Unassigned
  Zed                      In Progress    Undecided   Unassigned
nova (Ubuntu)              Won't Fix      Undecided   Unassigned
  Focal                    Fix Committed  Undecided   Chengen Du
  Jammy                    Fix Committed  Undecided   Chengen Du

Bug Description

[Impact]
Originally, Nova archives deleted rows in batches consisting of a maximum number of parent rows (max_rows) plus their child rows, all within a single database transaction.
This approach limits the maximum value of max_rows that can be specified by the caller due to the potential size of the database transaction it could generate.
Additionally, this behavior can cause the cleanup process to frequently encounter the following error:
oslo_db.exception.DBError: (pymysql.err.InternalError) (3100, "Error on observer while running replication hook 'before_commit'.")

The error arises when the transaction exceeds the group replication transaction size limit, a safeguard implemented to prevent potential MySQL crashes [1].
The default value for this limit is approximately 143MB.

[Fix]
An upstream commit has changed the logic to archive one parent row and its related child rows in a single database transaction.
This change allows operators to choose more predictable values for max_rows and achieve more progress with each invocation of archive_deleted_rows.
Additionally, this commit reduces the chances of encountering the issue where the transaction size exceeds the group replication transaction size limit.

commit 697fa3c000696da559e52b664c04cbd8d261c037
Author: melanie witt <email address hidden>
CommitDate: Tue Jun 20 20:04:46 2023 +0000

    database: Archive parent and child rows "trees" one at a time
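
The change in transaction scope can be illustrated with a minimal sketch. This is not Nova's implementation: it uses an in-memory SQLite database, and the table names (instances, instance_children and their shadow_ counterparts), the function names, and the max_parents parameter are all hypothetical. The point is only that each parent row and its child rows are moved and committed on their own, so the size of any single transaction depends on one tree rather than on the whole batch.

```python
import sqlite3

# Hypothetical, simplified schema: one parent table, one child table,
# and a "shadow" (archive) table for each. Not Nova's real schema.
SCHEMA = """
CREATE TABLE instances (id INTEGER PRIMARY KEY, deleted INTEGER NOT NULL);
CREATE TABLE instance_children (
    id INTEGER PRIMARY KEY,
    instance_id INTEGER NOT NULL REFERENCES instances (id),
    deleted INTEGER NOT NULL
);
CREATE TABLE shadow_instances (id INTEGER PRIMARY KEY, deleted INTEGER NOT NULL);
CREATE TABLE shadow_instance_children (
    id INTEGER PRIMARY KEY,
    instance_id INTEGER NOT NULL,
    deleted INTEGER NOT NULL
);
"""


def archive_one_tree(conn, parent_id):
    """Move one parent row and its child rows in a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            "INSERT INTO shadow_instance_children "
            "SELECT * FROM instance_children WHERE instance_id = ?",
            (parent_id,),
        )
        children = conn.execute(
            "DELETE FROM instance_children WHERE instance_id = ?", (parent_id,)
        ).rowcount
        conn.execute(
            "INSERT INTO shadow_instances SELECT * FROM instances WHERE id = ?",
            (parent_id,),
        )
        parents = conn.execute(
            "DELETE FROM instances WHERE id = ?", (parent_id,)
        ).rowcount
    return children + parents


def archive_deleted(conn, max_parents):
    """Archive up to max_parents soft-deleted parents, one tree per commit."""
    parent_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM instances WHERE deleted != 0 ORDER BY id LIMIT ?",
            (max_parents,),
        )
    ]
    return sum(archive_one_tree(conn, pid) for pid in parent_ids)
```

In this sketch a failure while archiving one tree rolls back only that tree; trees committed earlier in the same call stay archived, which is what lets a large batch make steady progress.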

[Test Plan]
1. Create an instance in OpenStack, then delete it.
2. Log in to the Nova database and confirm that there is an entry with a deleted_at value that is not NULL:
    select display_name, deleted_at from instances where deleted_at is not null;
3. Run the following command, ensuring that the timestamp passed to --before is later than the deleted_at value:
    nova-manage db archive_deleted_rows --before "YYYY-MM-DD HH:MM:SS" --verbose --until-complete
4. Log in to the Nova database again and confirm that the entry has been archived and no longer appears in the instances table:
    select display_name, deleted_at from instances where deleted_at is not null;

[Where problems could occur]
The commit changes the logic for archiving deleted entries to reduce the size of transactions generated during the operation.
If the patch contains errors, it will only impact the archiving of deleted entries and will not affect other functionalities.

[1] https://bugs.mysql.com/bug.php?id=84785

[Original Bug Description]

Observed downstream in a large scale cluster with constant create/delete
server activity and hundreds of thousands of deleted instances rows.

Currently, we archive deleted rows in batches of max_rows parents +
their child rows in a single database transaction. Doing it that way
limits how high a value of max_rows can be specified by the caller
because of the size of the database transaction it could generate.

For example, in a large scale deployment with hundreds of thousands of
deleted rows and constant server creation and deletion activity, a
value of max_rows=1000 might exceed the database's configured maximum
packet size or timeout due to a database deadlock, forcing the operator
to use a much lower max_rows value like 100 or 50.

And when the operator has e.g. 500,000 deleted instances rows (and
millions of deleted rows total) they are trying to archive, being
forced to use a max_rows value several orders of magnitude lower than
the number of rows they need to archive is a poor user experience and
makes it unclear if archive progress is actually being made.
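
To put rough numbers on this, here is a small back-of-the-envelope sketch. It assumes each invocation (run without --until-complete) archives at most max_rows parent rows; the function name invocations_needed is hypothetical and exists only for this illustration.

```python
import math


def invocations_needed(total_parent_rows, max_rows):
    """Runs needed if each run archives at most max_rows parent rows."""
    return math.ceil(total_parent_rows / max_rows)


# 500,000 deleted instances rows, as in the example above.
for max_rows in (50, 100, 1000):
    print(max_rows, invocations_needed(500_000, max_rows))
# prints:
# 50 10000
# 100 5000
# 1000 500
```

Under that assumption, dropping max_rows from 1000 to 50 turns 500 runs into 10,000.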

melanie witt (melwitt)
description: updated
Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 28.0.0.0rc1

This issue was fixed in the openstack/nova 28.0.0.0rc1 release candidate.

melanie witt (melwitt) wrote :
Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/yoga)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/yoga
Review: https://review.opendev.org/c/openstack/nova/+/887983
Reason: stable/yoga branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/yoga if you want to further work on this patch.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/ussuri)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/888002
Reason: stable/ussuri branch of openstack/nova transitioned to End of Life and is about to be deleted. To be able to do that, all open patches need to be abandoned.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/victoria)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/887998
Reason: stable/victoria branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/victoria if you want to further work on this patch.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/wallaby)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/887988
Reason: stable/wallaby branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/wallaby if you want to further work on this patch.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/xena)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/887985
Reason: stable/xena branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/xena if you want to further work on this patch.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/zed)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/zed
Review: https://review.opendev.org/c/openstack/nova/+/887981
Reason: stable/zed branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/zed if you want to further work on this patch.

Chengen Du (chengendu) wrote :

Debdiff for Focal

description: updated
Chengen Du (chengendu) wrote :

Debdiff for Jammy

Changed in nova (Ubuntu Focal):
status: New → In Progress
Changed in nova (Ubuntu Jammy):
status: New → In Progress
Changed in nova (Ubuntu Focal):
assignee: nobody → Chengen Du (chengendu)
Changed in nova (Ubuntu Jammy):
assignee: nobody → Chengen Du (chengendu)
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "lp2024258-nova-focal.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
description: updated
Steve Langasek (vorlon) wrote :

What is the status of this bug in noble? It needs to be resolved for noble before we can SRU it to jammy and focal.

Changed in nova (Ubuntu):
status: New → Incomplete
Chengen Du (chengendu) wrote :

Noble 3:29.0.1-0ubuntu1.3 already contains the patch, so no changes are needed.

Changed in nova (Ubuntu):
status: Incomplete → Won't Fix
Lukas Märdian (slyon) wrote :

I can confirm the patch is included in Noble, via new upstream snapshot in 3:27.1.0+git2023071215.f7ce4df5-0ubuntu1

Also, the backported patch matches the upstream logic of https://opendev.org/openstack/nova/commit/697fa3c000696da559e52b664c04cbd8d261c037

Backported changes are documented in the patch header.

Builds fine and passes its build-time tests.

LGTM.

I've rebased on top of the most recent security fixes (3:25.2.1-0ubuntu2.4) and fixed the version string to be "3:25.2.1-0ubuntu2.5".

(Please see https://wiki.ubuntu.com/SecurityTeam/UpdatePreparation#Update_the_packaging for future reference)

Mauricio Faria de Oliveira (mfo) wrote :

The nova upload for Focal has been rejected as it does not include the pending nova upload in the unapproved queue.
Please either re-upload them combined or rebase this upload on top of that after it is in -proposed.
Thanks!

Lukas Märdian (slyon) wrote :

Rebased on top of the pending nova upload and re-uploaded as a combined fix for Focal in 21.2.4-0ubuntu2.10.

I dropped the security debdiff from 21.2.4-0ubuntu2.8, as those changes are already included in focal-updates.

Mauricio Faria de Oliveira (mfo) wrote :

Thanks, Lukas!

I rejected/reuploaded (now I had time :) the combined changes with version 2.9 (as this has not yet been used in the archive).

The delta from 2.8 seen in the queue might be an effect of it not considering -security, I guess; it isn't re-included in the upload.

This should be good to go for review by the SRU team, AFAICT.
Thanks again.

Mauricio Faria de Oliveira (mfo) wrote :

Uploaded a combined SRU for Jammy (ubuntu2.7) with
bug 1999814 (previously in jammy-proposed; ubuntu2.4) and
bug 2024258 (previously in jammy's unapproved queue; ubuntu2.5),
which were superseded by security upload (jammy-security; ubuntu2.6).

Build-tested (with build-time unit tests) in a PPA.

Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello melanie, or anyone else affected,

Accepted nova into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/3:25.2.1-0ubuntu2.7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in nova (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Changed in nova (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed-focal
Andreas Hasenack (ahasenack) wrote :

Hello melanie, or anyone else affected,

Accepted nova into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/2:21.2.4-0ubuntu2.12 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (nova/2:21.2.4-0ubuntu2.12)

All autopkgtests for the newly accepted nova (2:21.2.4-0ubuntu2.12) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

ceilometer/1:14.1.0-0ubuntu1 (amd64, ppc64el, s390x)
nova/2:21.2.4-0ubuntu2.12 (amd64, arm64, armhf, ppc64el, s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#nova

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Chengen Du (chengendu) wrote :

The nova package in jammy-proposed has been tested according to the [Test Plan].
The test results met our expectations.
ubuntu@juju-7d3324-openstack-jammy-7:~$ apt policy nova-common
nova-common:
  Installed: 3:25.2.1-0ubuntu2.7
  Candidate: 3:25.2.1-0ubuntu2.7
  Version table:
 *** 3:25.2.1-0ubuntu2.7 500
        500 http://archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     3:25.2.1-0ubuntu2.6 500
        500 http://availability-zone-2.clouds.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
     3:25.0.0-0ubuntu1 500
        500 http://availability-zone-2.clouds.archive.ubuntu.com/ubuntu jammy/main amd64 Packages

tags: added: verification-done-jammy
removed: verification-needed-jammy
Chengen Du (chengendu) wrote :

@mfo @slyon

I apologize for the indentation issue in the focal patch.
I may have inadvertently modified the patches after testing them.
The issue originates from nova/tests/functional/db/test_archive.py, where the test_archive_deleted_rows_parent_child_trees_one_at_time function is not indented correctly.

Could you please confirm if I need to upload a new patch to fix this? I apologize again for increasing your workload.

tags: added: verification-failed-focal
removed: verification-needed verification-needed-focal
Mauricio Faria de Oliveira (mfo) wrote :

Hi Chengen,

Thanks for clarifying. No worries, I can fix it up this time. I know you'll keep an eye on this. :)

cheers,
Mauricio

Mauricio Faria de Oliveira (mfo) wrote :

Fixed up, tested, uploaded.

Before:
---

Setting up python3-nova (2:21.2.4-0ubuntu2.12) ...
Sorry: IndentationError: unindent does not match any outer indentation level (test_archive.py, line 172)
dpkg: error processing package python3-nova (--configure):
 installed python3-nova package post-installation script subprocess returned error exit status 1
Processing triggers for man-db (2.9.1-1) ...
Errors were encountered while processing:
 python3-nova
E: Sub-process /usr/bin/dpkg returned an error code (1)

After:
---

Setting up python3-nova (2:21.2.4-0ubuntu2.13) ...
Processing triggers for man-db (2.9.1-1) ...
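
This class of failure can be caught before upload with a plain syntax check. The following is a minimal sketch, assuming the post-installation step byte-compiles the installed Python modules (the log does not show the exact command). BAD_SOURCE is an invented stand-in for the mis-indented test method, not the actual content of test_archive.py, and check_syntax is a hypothetical helper.

```python
# Invented example: the second method is dedented to 2 spaces, which
# matches neither the class level (0) nor the method level (4).
BAD_SOURCE = (
    "class TestArchive:\n"
    "    def test_one(self):\n"
    "        pass\n"
    "  def test_two(self):\n"
    "        pass\n"
)

GOOD_SOURCE = BAD_SOURCE.replace("  def test_two", "    def test_two")


def check_syntax(source, filename="test_archive.py"):
    """Return an error description if source does not compile, else None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:  # IndentationError is a SyntaxError subclass
        return f"{type(exc).__name__}: {exc.msg} ({filename}, line {exc.lineno})"
    return None


print(check_syntax(BAD_SOURCE))
print(check_syntax(GOOD_SOURCE))
```

Running an equivalent check (for example python3 -m py_compile on each patched file) after refreshing a patch would flag the problem without needing to install the package.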

Mauricio Faria de Oliveira (mfo) wrote :

For reference, the autopkgtests regression signature:

195s Setting up python3-nova (2:21.2.4-0ubuntu2.12) ...
197s Sorry: IndentationError: unindent does not match any outer indentation level (test_archive.py, line 172)
199s dpkg: error processing package python3-nova (--configure):
199s installed python3-nova package post-installation script subprocess returned error exit status 1
...
199s dpkg: dependency problems prevent configuration of nova-compute-libvirt:
...
199s dpkg: dependency problems prevent configuration of nova-compute-kvm:
...
199s dpkg: dependency problems prevent configuration of nova-compute:
...
199s dpkg: dependency problems prevent configuration of autopkgtest-satdep:
...
199s Errors were encountered while processing:
199s python3-nova
199s nova-compute-libvirt
199s nova-compute-kvm
199s nova-compute
199s autopkgtest-satdep
199s E: Sub-process /usr/bin/dpkg returned an error code (1)
...
205s test-services FAIL badpkg

Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello melanie, or anyone else affected,

Accepted nova into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/2:21.2.4-0ubuntu2.13 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

tags: added: verification-needed verification-needed-focal
removed: verification-failed-focal
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (nova/2:21.2.4-0ubuntu2.13)

All autopkgtests for the newly accepted nova (2:21.2.4-0ubuntu2.13) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

nova/2:21.2.4-0ubuntu2.13 (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#nova

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Chengen Du (chengendu) wrote :

The nova package in focal-proposed has been tested according to the [Test Plan].
The test results met our expectations.
root@juju-757eba-openstack-focal-7:~# apt policy nova-common
nova-common:
  Installed: 2:21.2.4-0ubuntu2.13
  Candidate: 2:21.2.4-0ubuntu2.13
  Version table:
 *** 2:21.2.4-0ubuntu2.13 500
        500 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     2:21.2.4-0ubuntu2.11 500
        500 http://availability-zone-1.clouds.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages
     2:21.0.0~b3~git2020041013.57ff308d6d-0ubuntu2 500
        500 http://availability-zone-1.clouds.archive.ubuntu.com/ubuntu focal/main amd64 Packages

tags: added: verification-done-focal
removed: verification-needed verification-needed-focal
Mauricio Faria de Oliveira (mfo) wrote :

The autopkgtests regression was due to a transient infrastructure issue, and has cleared with a retry.
