Bug #1622545 “archive_deleted_rows isn't archiving instances” : Bugs : OpenStack Compute (nova)

Roman Podoliaka (rpodolyaka) on 2016-09-12

tags:

added: db

leehom (feli5) on 2016-09-12

Changed in nova:
assignee:	nobody → leehom (feli5)

Roman Podoliaka (rpodolyaka) on 2016-09-12

Changed in nova:
status:	New → Confirmed
assignee:	leehom (feli5) → Roman Podoliaka (rpodolyaka)
importance:	Undecided → Medium
status:	Confirmed → New
assignee:	Roman Podoliaka (rpodolyaka) → nobody

Roman Podoliaka (rpodolyaka) wrote on 2016-09-12:

#1

I checked this on devstack master and rows are soft-deleted / archived properly:

https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1838-L1840 (InstanceExtra row is soft-deleted on soft-deletion of an instance)
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L6392 (tables are processed in reverse order based on FKs, so that we delete referencing tables first)

http://paste.openstack.org/show/572335/

It's not clear to me how you can run into this problem, given that creation and soft-deletion of InstanceExtra rows were added at the same time in https://review.openstack.org/#/c/108097
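
To illustrate the reverse-FK ordering mentioned above, here is a minimal standard-library sketch. The table names come from this bug, but the `references` mapping and the `archive_order` helper are hypothetical and are not nova's code, which works from its own SQLAlchemy table metadata.

```python
from graphlib import TopologicalSorter

# Hypothetical FK map for illustration: child table -> parent tables it references.
references = {
    "instance_extra": {"instances"},
    "instance_faults": {"instances"},
    "instances": set(),
}


def archive_order(references):
    """Return table names with referencing (child) tables before their parents."""
    # TopologicalSorter takes node -> predecessors. A parent's predecessors
    # are the children that have to be processed before it.
    must_come_first = {table: set() for table in references}
    for child, parents in references.items():
        for parent in parents:
            must_come_first.setdefault(parent, set()).add(child)
    return list(TopologicalSorter(must_come_first).static_order())


order = archive_order(references)
print(order)
```

With this ordering, `instances` is only reached after every table that points at it has been processed.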

Changed in nova:
assignee:	nobody → leehom (feli5)

Roman Podoliaka (rpodolyaka) wrote on 2016-09-12:

#2

feli5, sorry, I assigned this to myself by accident.

leehom (feli5) on 2016-09-12

Changed in nova:
assignee:	leehom (feli5) → nobody

Matt Riedemann (mriedem) wrote on 2016-09-12:

#3

Do you have instances that existed in the database before upgrading to mitaka and the archive is failing on those? Otherwise I agree with Roman and I'm not sure how you're hitting issues with instance_extra foreign keys.

Maybe check something like:

select flavor from nova.instance_extra where instance_uuid in (select uuid from nova.instances where deleted != 0);

Matt Riedemann (mriedem) wrote on 2016-09-12:

#4

Also, can you confirm that you have this fix? https://review.openstack.org/#/c/246635/

Changed in nova:
status:	New → Incomplete

Derek Higgins (derekh) wrote on 2016-09-13:

#5

@Matt this was a fresh Mitaka install; there was no upgrade.

Also I checked and I have the patch
https://review.openstack.org/#/c/246635/

I'll see if I can get a dump of the sql to help debug the problem.

Derek Higgins (derekh) wrote on 2016-09-16:

#6

I've uploaded a dump of the nova database here
http://goodsquishy.com/downloads/nova.sql.gz

Dan Smith (danms) wrote on 2016-09-27:

#7

So the reason this is happening is that we have some residue lying around in the database from failed cleanups. This would be things like instances that are deleted, but that still have undeleted instance_extra or instance_faults records pointing at them. The archive process is too naive to handle this, and thus those instances can never get purged. Patch coming for discussion shortly.
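
For anyone wanting to check their own data for this kind of residue, here is a rough sketch against an in-memory SQLite database. The two-table schema is a heavily simplified assumption (only `uuid`, `instance_uuid` and `deleted` are modelled), not nova's real schema, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed schema: only the columns relevant to this bug.
    CREATE TABLE instances (uuid TEXT PRIMARY KEY, deleted INTEGER DEFAULT 0);
    CREATE TABLE instance_extra (
        id INTEGER PRIMARY KEY,
        instance_uuid TEXT REFERENCES instances (uuid),
        deleted INTEGER DEFAULT 0
    );
    INSERT INTO instances VALUES ('a', 1), ('b', 1), ('c', 0);
    -- 'a' was cleaned up fully; 'b' left an undeleted child row behind.
    INSERT INTO instance_extra VALUES (1, 'a', 1), (2, 'b', 0), (3, 'c', 0);
""")

# Residue: child rows that are NOT marked deleted although their instance is.
residue = conn.execute("""
    SELECT e.id, e.instance_uuid
    FROM instance_extra AS e
    JOIN instances AS i ON i.uuid = e.instance_uuid
    WHERE i.deleted != 0 AND e.deleted = 0
""").fetchall()
print(residue)
```

Any row returned here is one that an archive pass looking only at the child's own `deleted` flag would skip.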

OpenStack Infra (hudson-openstack) wrote on 2016-09-27: Fix proposed to nova (master)

#8

Fix proposed to branch: master
Review: https://review.openstack.org/377933

Changed in nova:
assignee:	nobody → Dan Smith (danms)
status:	Incomplete → In Progress

Dan Smith (danms) wrote on 2016-09-27:

#9

FYI, using Derek's database dump:

OpenStack Infra (hudson-openstack) wrote on 2016-09-27: Fix merged to nova (master)

#10

Reviewed: https://review.openstack.org/377933
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ceaf853894352b6d0ae12efe85ba5eb4e651e58a
Submitter: Jenkins
Branch: master

commit ceaf853894352b6d0ae12efe85ba5eb4e651e58a
Author: Dan Smith <email address hidden>
Date: Tue Sep 27 10:17:00 2016 -0700

Archive instance-related rows when the parent instance is deleted

    This is something I expect has been very broken for a long time. We
    have rows in tables such as instance_extra, instance_faults, etc that
    pertain to a single instance, and thus have a foreign key on their
    instance_uuid column that points to the instance. If any of those
    records exist, an instance can not be archived out of the main
    instances table.

    The archive routine currently "handles" this by skipping over said
    instances, and eventually iterating over all the tables to pull out
    any records that point to that instance, thus freeing up the instance
    itself for archival. The problem is, this only happens if those extra
    records are actually marked as deleted themselves. If we fail during
    a cleanup routine and leave some of them not marked as deleted, but
    where the instance they reference *is* marked as deleted, we will
    never archive them.

    This patch adds another phase of the archival process for any table
    that has an "instance_uuid" column, which attempts to archive records
    that point to these deleted instances. With this, using a very large
    real world sample database, I was able to archive my way down to
    zero deleted, un-archivable instances (from north of 100k).

    Closes-Bug: #1622545
    Change-Id: I77255c77780f0c2b99d59a9c20adecc85335bb18
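
The extra archival phase described in the commit message can be approximated as below. This is a sketch only, not the merged patch: it assumes a simplified SQLite schema, and the helper names `archive_children_of_deleted_instances` and `archive_deleted_instances` are hypothetical. The `shadow_` prefix mirrors how nova names its archive tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the FK
conn.executescript("""
    CREATE TABLE instances (uuid TEXT PRIMARY KEY, deleted INTEGER DEFAULT 0);
    CREATE TABLE instance_extra (
        id INTEGER PRIMARY KEY,
        instance_uuid TEXT REFERENCES instances (uuid),
        deleted INTEGER DEFAULT 0
    );
    CREATE TABLE shadow_instances (uuid TEXT, deleted INTEGER);
    CREATE TABLE shadow_instance_extra (
        id INTEGER, instance_uuid TEXT, deleted INTEGER
    );
    -- Instance 'b' is deleted, but its instance_extra row is not.
    INSERT INTO instances VALUES ('b', 1), ('c', 0);
    INSERT INTO instance_extra VALUES (2, 'b', 0), (3, 'c', 0);
""")


def archive_children_of_deleted_instances(conn, table):
    """Move rows of `table` that point at soft-deleted instances."""
    where = "instance_uuid IN (SELECT uuid FROM instances WHERE deleted != 0)"
    with conn:  # one transaction: copy to the shadow table, then delete
        conn.execute(
            f"INSERT INTO shadow_{table} SELECT * FROM {table} WHERE {where}"
        )
        moved = conn.execute(f"DELETE FROM {table} WHERE {where}").rowcount
    return moved


def archive_deleted_instances(conn):
    """Move soft-deleted instances; fails if child rows still reference them."""
    with conn:
        conn.execute(
            "INSERT INTO shadow_instances "
            "SELECT * FROM instances WHERE deleted != 0"
        )
        return conn.execute("DELETE FROM instances WHERE deleted != 0").rowcount


moved_children = archive_children_of_deleted_instances(conn, "instance_extra")
moved_instances = archive_deleted_instances(conn)
print(moved_children, moved_instances)
```

Selecting child rows by the parent's `deleted` flag rather than their own is what lets the instance row be moved afterwards without tripping the foreign key.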

Changed in nova:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2016-09-27: Fix proposed to nova (stable/newton)

#11

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/378055

OpenStack Infra (hudson-openstack) wrote on 2016-09-28: Fix proposed to nova (stable/mitaka)

#12

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/378650

OpenStack Infra (hudson-openstack) wrote on 2016-10-05: Fix merged to nova (stable/mitaka)

#13

Reviewed: https://review.openstack.org/378650
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=76d1b24c00a4dc24c9bc3290fca513b5ece7247a
Submitter: Jenkins
Branch: stable/mitaka

commit 76d1b24c00a4dc24c9bc3290fca513b5ece7247a
Author: Dan Smith <email address hidden>
Date: Tue Sep 27 10:17:00 2016 -0700

Archive instance-related rows when the parent instance is deleted

    This is something I expect has been very broken for a long time. We
    have rows in tables such as instance_extra, instance_faults, etc that
    pertain to a single instance, and thus have a foreign key on their
    instance_uuid column that points to the instance. If any of those
    records exist, an instance can not be archived out of the main
    instances table.

    The archive routine currently "handles" this by skipping over said
    instances, and eventually iterating over all the tables to pull out
    any records that point to that instance, thus freeing up the instance
    itself for archival. The problem is, this only happens if those extra
    records are actually marked as deleted themselves. If we fail during
    a cleanup routine and leave some of them not marked as deleted, but
    where the instance they reference *is* marked as deleted, we will
    never archive them.

    This patch adds another phase of the archival process for any table
    that has an "instance_uuid" column, which attempts to archive records
    that point to these deleted instances. With this, using a very large
    real world sample database, I was able to archive my way down to
    zero deleted, un-archivable instances (from north of 100k).

    Conflicts:
            nova/db/sqlalchemy/api.py (indentation change)

    Closes-Bug: #1622545
    Change-Id: I77255c77780f0c2b99d59a9c20adecc85335bb18
    (cherry picked from commit ceaf853894352b6d0ae12efe85ba5eb4e651e58a)

tags:

added: in-stable-mitaka

OpenStack Infra (hudson-openstack) wrote on 2016-10-10: Fix merged to nova (stable/newton)

#14

Reviewed: https://review.openstack.org/378055
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f15561b60957cb67a40c51b1f636a37b26c0205a
Submitter: Jenkins
Branch: stable/newton

commit f15561b60957cb67a40c51b1f636a37b26c0205a
Author: Dan Smith <email address hidden>
Date: Tue Sep 27 10:17:00 2016 -0700

Archive instance-related rows when the parent instance is deleted

    This is something I expect has been very broken for a long time. We
    have rows in tables such as instance_extra, instance_faults, etc that
    pertain to a single instance, and thus have a foreign key on their
    instance_uuid column that points to the instance. If any of those
    records exist, an instance can not be archived out of the main
    instances table.

    The archive routine currently "handles" this by skipping over said
    instances, and eventually iterating over all the tables to pull out
    any records that point to that instance, thus freeing up the instance
    itself for archival. The problem is, this only happens if those extra
    records are actually marked as deleted themselves. If we fail during
    a cleanup routine and leave some of them not marked as deleted, but
    where the instance they reference *is* marked as deleted, we will
    never archive them.

    This patch adds another phase of the archival process for any table
    that has an "instance_uuid" column, which attempts to archive records
    that point to these deleted instances. With this, using a very large
    real world sample database, I was able to archive my way down to
    zero deleted, un-archivable instances (from north of 100k).

    Closes-Bug: #1622545
    Change-Id: I77255c77780f0c2b99d59a9c20adecc85335bb18
    (cherry picked from commit ceaf853894352b6d0ae12efe85ba5eb4e651e58a)

tags:

added: in-stable-newton

OpenStack Infra (hudson-openstack) wrote on 2016-10-10: Fix included in openstack/nova 13.1.2

#15

This issue was fixed in the openstack/nova 13.1.2 release.

OpenStack Infra (hudson-openstack) wrote on 2016-10-12: Fix included in openstack/nova 14.0.1

#16

This issue was fixed in the openstack/nova 14.0.1 release.

OpenStack Infra (hudson-openstack) wrote on 2016-11-10: Fix included in openstack/nova 13.1.2

#17

This issue was fixed in the openstack/nova 13.1.2 release.

OpenStack Infra (hudson-openstack) wrote on 2016-11-17: Fix included in openstack/nova 15.0.0.0b1

#18

This issue was fixed in the openstack/nova 15.0.0.0b1 development milestone.
