OpenStack Compute (nova)

Comment 14 for bug 1837995

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2021-04-27: Fix merged to nova (stable/victoria)

#14

Reviewed: https://review.opendev.org/c/openstack/nova/+/785069
Committed: https://opendev.org/openstack/nova/commit/7b4f4796478941eafa9c0997f7ef03293c442d94
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 7b4f4796478941eafa9c0997f7ef03293c442d94
Author: melanie witt <email address hidden>
Date: Wed Jan 27 22:49:19 2021 +0000

Dynamically archive FK related records in archive_deleted_rows

    Currently, it is possible to "partially archive" the database by
    running 'nova-manage db archive_deleted_rows' with --max_rows or by
    interrupting the archive process in any way. When this happens, it is
    possible to have archived a record with a foreign key relationship to a
    parent record (example: 'instance_extra' table record is archived while
    the 'instances' table record remains).

    When an instance's records become "split" in this way, any API request
    that can (1) access the deleted instance and (2) tries to access data
    that should be in a child table (example: the embedded flavor for an
    instance) will fail with an OrphanedObjectError and HTTP 500 to the
    user. Examples of APIs that are affected by this are the tenant usage
    APIs and listing of deleted instances as admin.

    In the tenant usage example, the API looks at deleted instances to
    calculate usage over a time period. It pulls deleted and non-deleted
    instances and does instance.get_flavor() to calculate their usage. The
    flavor data is expected to be present because
    expecteds_attrs=['flavor'] is used to do a join with the
    'instance_extra' table and populate the instance object's flavor data.
    When get_flavor() is called, it tries to access the instance.flavor
    attribute (which hasn't been populated because the 'instance_extra'
    record is gone). That triggers a lazy-load of the flavor which loads
    the instance from the database again with expected_attrs=['flavor']
    again which doesn't populate instance.flavor (again) because the
    'instance_extra' record is gone. Then the Instance._load_flavor code
    intentionally orphans the instance record to avoid triggering
    lazy-loads while it attempts to populate instance.flavor,
    instance.new_flavor, and instance.old_flavor. Finally, another
    lazy-load is triggered (because instance.flavor is still not populated)
    and fails with OrphanedObjectError.

    One way to solve this problem is to make it impossible for
    archive_deleted_records to orphan records that are related by foreign
    key relationships. The approach is to process parent tables first
    (opposite of today where we process child tables first) and find all of
    the tables that refer to it by foreign keys, create and collect
    insert/delete statements for those child records, and then put them all
    together in a single database transaction to archive all related
    records "atomically". The idea is that if anything were to interrupt
    the transaction (errors or other) it would roll back and keep all the
    related records together. Either all or archived or none are archived.

    This changes the logic of the per table archive to discover tables that
    refer to the table by foreign keys and generates insert/delete query
    statements to execute in the same database transaction as the table
    archive itself. The extra records archived along with the table are
    added to the rows_archived result. The existing code for "archiving
    records if instance is deleted" also has to be removed along with this
    because the new logic does the same thing dynamically and makes it
    obsolete. Finally, some assertions in the unit tests need to be changed
    or removed because they were assuming certain types of archiving
    failures due to foreign key constraint violations that can no longer
    occur with the new dynamic logic for archiving child records.

Closes-Bug: #1837995

Conflicts:
nova/db/sqlalchemy/api.py

    NOTE(melwitt): The conflict is because change
    I23bb9e539d08f5c6202909054c2dd49b6c7a7a0e (Remove six.text_type (1/2))
    is not in Victoria.

Change-Id: Ie653e5ec69d16ae469f1f8171fee85aea754edff
(cherry picked from commit becb94ae643ab4863daa564783646921b4a2b372)

Reviewed:  https://review.opendev.org/c/openstack/nova/+/785069
Committed: https://opendev.org/openstack/nova/commit/7b4f4796478941eafa9c0997f7ef03293c442d94
Submitter: "Zuul (22348)"
Branch:    stable/victoria

commit 7b4f4796478941eafa9c0997f7ef03293c442d94
Author: melanie witt <melwittt@gmail.com>
Date:   Wed Jan 27 22:49:19 2021 +0000

Dynamically archive FK related records in archive_deleted_rows
    
    Currently, it is possible to "partially archive" the database by
    running 'nova-manage db archive_deleted_rows' with --max_rows or by
    interrupting the archive process in any way. When this happens, it is
    possible to have archived a record with a foreign key relationship to a
    parent record (example: 'instance_extra' table record is archived while
    the 'instances' table record remains).
    
    When an instance's records become "split" in this way, any API request
    that can (1) access the deleted instance and (2) tries to access data
    that should be in a child table (example: the embedded flavor for an
    instance) will fail with an OrphanedObjectError and HTTP 500 to the
    user. Examples of APIs that are affected by this are the tenant usage
    APIs and listing of deleted instances as admin.
    
    In the tenant usage example, the API looks at deleted instances to
    calculate usage over a time period. It pulls deleted and non-deleted
    instances and does instance.get_flavor() to calculate their usage. The
    flavor data is expected to be present because
    expecteds_attrs=['flavor'] is used to do a join with the
    'instance_extra' table and populate the instance object's flavor data.
    When get_flavor() is called, it tries to access the instance.flavor
    attribute (which hasn't been populated because the 'instance_extra'
    record is gone). That triggers a lazy-load of the flavor which loads
    the instance from the database again with expected_attrs=['flavor']
    again which doesn't populate instance.flavor (again) because the
    'instance_extra' record is gone. Then the Instance._load_flavor code
    intentionally orphans the instance record to avoid triggering
    lazy-loads while it attempts to populate instance.flavor,
    instance.new_flavor, and instance.old_flavor. Finally, another
    lazy-load is triggered (because instance.flavor is still not populated)
    and fails with OrphanedObjectError.
    
    One way to solve this problem is to make it impossible for
    archive_deleted_records to orphan records that are related by foreign
    key relationships. The approach is to process parent tables first
    (opposite of today where we process child tables first) and find all of
    the tables that refer to it by foreign keys, create and collect
    insert/delete statements for those child records, and then put them all
    together in a single database transaction to archive all related
    records "atomically". The idea is that if anything were to interrupt
    the transaction (errors or other) it would roll back and keep all the
    related records together. Either all or archived or none are archived.
    
    This changes the logic of the per table archive to discover tables that
    refer to the table by foreign keys and generates insert/delete query
    statements to execute in the same database transaction as the table
    archive itself. The extra records archived along with the table are
    added to the rows_archived result. The existing code for "archiving
    records if instance is deleted" also has to be removed along with this
    because the new logic does the same thing dynamically and makes it
    obsolete. Finally, some assertions in the unit tests need to be changed
    or removed because they were assuming certain types of archiving
    failures due to foreign key constraint violations that can no longer
    occur with the new dynamic logic for archiving child records.
    
    Closes-Bug: #1837995
    
    Conflicts:
        nova/db/sqlalchemy/api.py
    
    NOTE(melwitt): The conflict is because change
    I23bb9e539d08f5c6202909054c2dd49b6c7a7a0e (Remove six.text_type (1/2))
    is not in Victoria.
    
    Change-Id: Ie653e5ec69d16ae469f1f8171fee85aea754edff
    (cherry picked from commit becb94ae643ab4863daa564783646921b4a2b372)