archive_deleted_rows isn't archiving instances

Bug #1622545 reported by Derek Higgins
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Dan Smith
Milestone: (none)

Bug Description

Running "nova-manage db archive_deleted_rows ..." archives few or none of the deleted nova instances.

For example, after running the command several times:

$ nova-manage --debug db archive_deleted_rows --max_rows 100000 --verbose

I get
+--------------------------+-------------------------+
| Table                    | Number of Rows Archived |
+--------------------------+-------------------------+
| block_device_mapping     | 10108                   |
| instance_actions         | 31838                   |
| instance_actions_events  | 2                       |
| instance_extra           | 10108                   |
| instance_faults          | 459                     |
| instance_info_caches     | 10108                   |
| instance_metadata        | 6037                    |
| instance_system_metadata | 17883                   |
| reservations             | 9                       |
+--------------------------+-------------------------+

The only way I've been able to get any instances archived is to lower the --max_rows parameter, but this only archives a small number of the instances and sometimes doesn't archive any at all.

In my nova-manage.log I have the following error:

2016-09-12 09:22:21.658 17603 WARNING nova.db.sqlalchemy.api [-] IntegrityError detected when archiving table instances: (pymysql.err.IntegrityError) (1451, u'Cannot delete or update a parent row: a foreign key constraint fails (`nova`.`instance_extra`, CONSTRAINT `instance_extra_instance_uuid_fkey` FOREIGN KEY (`instance_uuid`) REFERENCES `instances` (`uuid`))') [SQL: u'DELETE FROM instances WHERE instances.id in (SELECT T1.id FROM (SELECT instances.id \nFROM instances \nWHERE instances.deleted != %s ORDER BY instances.id \n LIMIT %s) as T1)'] [parameters: (0, 787)]

mysql -e 'select count(*) from instances where deleted_at is not NULL;' nova
+----------+
| count(*) |
+----------+
|    70829 |
+----------+

I'm running mitaka with this patch installed
https://review.openstack.org/#/c/326730/1

tags: added: db
leehom (feli5)
Changed in nova:
assignee: nobody → leehom (feli5)
Changed in nova:
status: New → Confirmed
assignee: leehom (feli5) → Roman Podoliaka (rpodolyaka)
importance: Undecided → Medium
status: Confirmed → New
assignee: Roman Podoliaka (rpodolyaka) → nobody
Roman Podoliaka (rpodolyaka) wrote :

I checked this on devstack master and rows are soft-deleted / archived properly:

https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1838-L1840 (InstanceExtra row is soft-deleted on soft-deletion of an instance)
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L6392 (tables are processed in reverse order based on FKs, so that we delete referencing tables first)

http://paste.openstack.org/show/572335/

It's not clear to me how you could run into this problem, given that creation and soft-deletion of InstanceExtra rows were added at the same time in https://review.openstack.org/#/c/108097
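
The "referencing tables first" ordering mentioned above can be illustrated with a small sketch. This is not nova's code: the FOREIGN_KEYS mapping is a hand-written, hypothetical subset of the schema, and the standard-library graphlib module stands in for whatever dependency sort the real routine uses.

```python
from graphlib import TopologicalSorter

# Hypothetical, hand-written subset of the schema:
# each table maps to the set of parent tables it references.
FOREIGN_KEYS = {
    "instances": set(),
    "instance_extra": {"instances"},
    "instance_faults": {"instances"},
    "instance_actions": {"instances"},
    "instance_actions_events": {"instance_actions"},
}


def archive_order(foreign_keys):
    """Return tables so that every referencing table precedes its parent."""
    # static_order() yields parents before children; reverse it so that
    # child rows are removed before the rows they point at.
    parents_first = list(TopologicalSorter(foreign_keys).static_order())
    return list(reversed(parents_first))


order = archive_order(FOREIGN_KEYS)
print(order)
```

With this ordering, instances always comes after every table that references it, so a delete from instances only fails if a child row survived the earlier passes.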

Changed in nova:
assignee: nobody → leehom (feli5)
Roman Podoliaka (rpodolyaka) wrote :

feli5, sorry, I assigned this to myself by accident.

leehom (feli5)
Changed in nova:
assignee: leehom (feli5) → nobody
Matt Riedemann (mriedem) wrote :

Do you have instances that existed in the database before upgrading to mitaka and the archive is failing on those? Otherwise I agree with Roman and I'm not sure how you're hitting issues with instance_extra foreign keys.

Maybe check something like:

select flavor from nova.instance_extra where instance_uuid in (select uuid from nova.instances where deleted != 0);

Matt Riedemann (mriedem) wrote :

Also, can you confirm that you have this fix? https://review.openstack.org/#/c/246635/

Changed in nova:
status: New → Incomplete
Derek Higgins (derekh) wrote :

@Matt, this was a fresh mitaka install; there was no upgrade.

Also I checked and I have the patch
https://review.openstack.org/#/c/246635/

I'll see if I can get a dump of the sql to help debug the problem.

Derek Higgins (derekh) wrote :

I've uploaded a dump of the nova database here
http://goodsquishy.com/downloads/nova.sql.gz

Dan Smith (danms) wrote :

This is happening because we have some residue lying around in the database from failed cleanups: instances that are deleted but still have undeleted instance_extra or instance_faults records. The archive process is too naive to handle this, so those instances can never be archived. Patch coming for discussion shortly.
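
One way to look for this kind of residue is to count child rows that are still live while their parent instance is soft-deleted. The sketch below is only an illustration: it uses an in-memory SQLite database with a hypothetical two-table schema reduced to the columns that matter, and count_orphans is a made-up helper, not part of nova.

```python
import sqlite3


def count_orphans(conn, child_table):
    """Count live child rows whose parent instance is already soft-deleted."""
    # child_table is interpolated into the SQL, so only pass trusted names.
    query = (
        f"SELECT COUNT(*) FROM {child_table} AS c "
        "JOIN instances AS i ON c.instance_uuid = i.uuid "
        "WHERE i.deleted != 0 AND c.deleted = 0"
    )
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instances (id INTEGER PRIMARY KEY, uuid TEXT UNIQUE, deleted INTEGER);
    CREATE TABLE instance_extra (
        id INTEGER PRIMARY KEY,
        instance_uuid TEXT REFERENCES instances (uuid),
        deleted INTEGER
    );
    -- uuid-a: deleted instance with a leftover live child row (the residue).
    -- uuid-b: deleted instance whose child row was soft-deleted correctly.
    -- uuid-c: live instance.
    INSERT INTO instances VALUES (1, 'uuid-a', 1), (2, 'uuid-b', 2), (3, 'uuid-c', 0);
    INSERT INTO instance_extra VALUES (1, 'uuid-a', 0), (2, 'uuid-b', 2), (3, 'uuid-c', 0);
""")
print(count_orphans(conn, "instance_extra"))  # prints 1
```

A non-zero count for any table with an instance_uuid column would be consistent with the failure mode described here.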

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/377933

Changed in nova:
assignee: nobody → Dan Smith (danms)
status: Incomplete → In Progress
Dan Smith (danms) wrote :

FYI, using Derek's database dump:

dan@guaranine:~$ mysql -u root -pfoo nova < nova.sql
dan@guaranine:~$ nova-manage db sync >/dev/null 2>&1
dan@guaranine:~$ nova-manage db archive_deleted_rows --verbose --max_rows 100000
+-------------------------+-------------------------+
| Table                   | Number of Rows Archived |
+-------------------------+-------------------------+
| instance_actions_events | 39153                   |
| reservations            | 60847                   |
+-------------------------+-------------------------+
dan@guaranine:~$ nova-manage db archive_deleted_rows --verbose --max_rows 100000
+--------------------------+-------------------------+
| Table                    | Number of Rows Archived |
+--------------------------+-------------------------+
| instance_info_caches     | 7210                    |
| instance_metadata        | 7314                    |
| instance_system_metadata | 71278                   |
| reservations             | 14198                   |
+--------------------------+-------------------------+
dan@guaranine:~$ nova-manage db archive_deleted_rows --verbose --max_rows 100000
+----------------------+-------------------------+
| Table                | Number of Rows Archived |
+----------------------+-------------------------+
| block_device_mapping | 8838                    |
| instance_actions     | 38991                   |
| instance_extra       | 12747                   |
| instance_faults      | 37                      |
| instance_info_caches | 1631                    |
| instances            | 37756                   |
+----------------------+-------------------------+
dan@guaranine:~$ nova-manage db archive_deleted_rows --verbose --max_rows 100000
+-----------+-------------------------+
| Table     | Number of Rows Archived |
+-----------+-------------------------+
| instances | 41919                   |
+-----------+-------------------------+
dan@guaranine:~$ nova-manage db archive_deleted_rows --verbose --max_rows 100000
Nothing was archived.
dan@guaranine:~$ echo "SELECT COUNT(1) FROM instances WHERE deleted!=0" | mysql -uroot -pfoo nova
COUNT(1)
0

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/377933
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ceaf853894352b6d0ae12efe85ba5eb4e651e58a
Submitter: Jenkins
Branch: master

commit ceaf853894352b6d0ae12efe85ba5eb4e651e58a
Author: Dan Smith <email address hidden>
Date: Tue Sep 27 10:17:00 2016 -0700

    Archive instance-related rows when the parent instance is deleted

    This is something I expect has been very broken for a long time. We
    have rows in tables such as instance_extra, instance_faults, etc that
    pertain to a single instance, and thus have a foreign key on their
    instance_uuid column that points to the instance. If any of those
    records exist, an instance can not be archived out of the main
    instances table.

    The archive routine currently "handles" this by skipping over said
    instances, and eventually iterating over all the tables to pull out
    any records that point to that instance, thus freeing up the instance
    itself for archival. The problem is, this only happens if those extra
    records are actually marked as deleted themselves. If we fail during
    a cleanup routine and leave some of them not marked as deleted, but
    where the instance they reference *is* marked as deleted, we will
    never archive them.

    This patch adds another phase of the archival process for any table
    that has an "instance_uuid" column, which attempts to archive records
    that point to these deleted instances. With this, using a very large
    real world sample database, I was able to archive my way down to
    zero deleted, un-archivable instances (from north of 100k).

    Closes-Bug: #1622545
    Change-Id: I77255c77780f0c2b99d59a9c20adecc85335bb18
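
The extra phase described in this commit message can be sketched roughly as follows. This is not the code from the patch: it is a minimal illustration using SQLite rather than nova's SQLAlchemy layer, the two-table schema is a hypothetical reduction of the real one, and the function names are invented for the example. Only the shadow_ prefix for archive tables follows nova's convention.

```python
import sqlite3

# Hypothetical two-table schema; nova's real tables have many more columns.
SCHEMA = """
CREATE TABLE instances (id INTEGER PRIMARY KEY, uuid TEXT UNIQUE, deleted INTEGER);
CREATE TABLE instance_extra (
    id INTEGER PRIMARY KEY,
    instance_uuid TEXT REFERENCES instances (uuid),
    deleted INTEGER
);
CREATE TABLE shadow_instances (id INTEGER, uuid TEXT, deleted INTEGER);
CREATE TABLE shadow_instance_extra (id INTEGER, instance_uuid TEXT, deleted INTEGER);
INSERT INTO instances VALUES (1, 'uuid-dead', 1), (2, 'uuid-live', 0);
-- The first child row is the residue: its parent is deleted, but it is not.
INSERT INTO instance_extra VALUES (1, 'uuid-dead', 0), (2, 'uuid-live', 0);
"""

DELETED_PARENTS = "instance_uuid IN (SELECT uuid FROM instances WHERE deleted != 0)"


def archive_children_of_deleted(conn, table):
    """Extra phase: move rows that reference a deleted instance, whatever their own flag."""
    # `table` is interpolated into the SQL, so it must be a trusted name.
    conn.execute(f"INSERT INTO shadow_{table} SELECT * FROM {table} WHERE {DELETED_PARENTS}")
    return conn.execute(f"DELETE FROM {table} WHERE {DELETED_PARENTS}").rowcount


def archive_deleted_instances(conn):
    """Move soft-deleted instances to the shadow table; fails while children still exist."""
    conn.execute("INSERT INTO shadow_instances SELECT * FROM instances WHERE deleted != 0")
    return conn.execute("DELETE FROM instances WHERE deleted != 0").rowcount


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

try:
    archive_deleted_instances(conn)
    blocked = False
except sqlite3.IntegrityError:
    conn.rollback()  # undo the partial copy into shadow_instances
    blocked = True

moved_children = archive_children_of_deleted(conn, "instance_extra")
moved_instances = archive_deleted_instances(conn)
conn.commit()
print(blocked, moved_children, moved_instances)
```

The first attempt is blocked by the foreign key, which mirrors the IntegrityError in the original report. Once the child rows that point at the deleted instance are moved, the instance itself can be archived, and the live instance and its child row are left alone.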

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/378055

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/378650

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/mitaka)

Reviewed: https://review.openstack.org/378650
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=76d1b24c00a4dc24c9bc3290fca513b5ece7247a
Submitter: Jenkins
Branch: stable/mitaka

commit 76d1b24c00a4dc24c9bc3290fca513b5ece7247a
Author: Dan Smith <email address hidden>
Date: Tue Sep 27 10:17:00 2016 -0700

    Archive instance-related rows when the parent instance is deleted

    This is something I expect has been very broken for a long time. We
    have rows in tables such as instance_extra, instance_faults, etc that
    pertain to a single instance, and thus have a foreign key on their
    instance_uuid column that points to the instance. If any of those
    records exist, an instance can not be archived out of the main
    instances table.

    The archive routine currently "handles" this by skipping over said
    instances, and eventually iterating over all the tables to pull out
    any records that point to that instance, thus freeing up the instance
    itself for archival. The problem is, this only happens if those extra
    records are actually marked as deleted themselves. If we fail during
    a cleanup routine and leave some of them not marked as deleted, but
    where the instance they reference *is* marked as deleted, we will
    never archive them.

    This patch adds another phase of the archival process for any table
    that has an "instance_uuid" column, which attempts to archive records
    that point to these deleted instances. With this, using a very large
    real world sample database, I was able to archive my way down to
    zero deleted, un-archivable instances (from north of 100k).

    Conflicts:
     nova/db/sqlalchemy/api.py (indentation change)

    Closes-Bug: #1622545
    Change-Id: I77255c77780f0c2b99d59a9c20adecc85335bb18
    (cherry picked from commit ceaf853894352b6d0ae12efe85ba5eb4e651e58a)

tags: added: in-stable-mitaka
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/378055
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f15561b60957cb67a40c51b1f636a37b26c0205a
Submitter: Jenkins
Branch: stable/newton

commit f15561b60957cb67a40c51b1f636a37b26c0205a
Author: Dan Smith <email address hidden>
Date: Tue Sep 27 10:17:00 2016 -0700

    Archive instance-related rows when the parent instance is deleted

    This is something I expect has been very broken for a long time. We
    have rows in tables such as instance_extra, instance_faults, etc that
    pertain to a single instance, and thus have a foreign key on their
    instance_uuid column that points to the instance. If any of those
    records exist, an instance can not be archived out of the main
    instances table.

    The archive routine currently "handles" this by skipping over said
    instances, and eventually iterating over all the tables to pull out
    any records that point to that instance, thus freeing up the instance
    itself for archival. The problem is, this only happens if those extra
    records are actually marked as deleted themselves. If we fail during
    a cleanup routine and leave some of them not marked as deleted, but
    where the instance they reference *is* marked as deleted, we will
    never archive them.

    This patch adds another phase of the archival process for any table
    that has an "instance_uuid" column, which attempts to archive records
    that point to these deleted instances. With this, using a very large
    real world sample database, I was able to archive my way down to
    zero deleted, un-archivable instances (from north of 100k).

    Closes-Bug: #1622545
    Change-Id: I77255c77780f0c2b99d59a9c20adecc85335bb18
    (cherry picked from commit ceaf853894352b6d0ae12efe85ba5eb4e651e58a)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 13.1.2

This issue was fixed in the openstack/nova 13.1.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.1

This issue was fixed in the openstack/nova 14.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.0.0b1

This issue was fixed in the openstack/nova 15.0.0.0b1 development milestone.
