Nova upgrade fails if PCI devices of type-PF or type-PCI are present in the database

Bug #1680918 reported by Steven Webster
This bug affects 1 person
Affects                  | Status        | Importance | Assigned to    | Milestone
OpenStack Compute (nova) | Fix Released  | High       | Steven Webster |
Newton                   | Fix Committed | High       | Dan Smith      |
Ocata                    | Fix Committed | High       | Dan Smith      |

Bug Description

Description
===========
If a Nova DB is upgraded (migrated) while containing PCI devices with device type 'type-PF' or 'type-PCI',
a validation error similar to this will be thrown:

"ValidationError: There are still 2 unmigrated records in the pci_devices table. Migration cannot continue until all records have been migrated."

The error is generated by the 330_enforce_mitaka_online_migrations.py upgrade script.

The PCI device migration validation fails if it finds any PCI device entry without a populated parent_addr. However, parent_addr only applies to PCI device entries of 'type-VF' (i.e. SRIOV virtual functions). Entries of 'type-PF' or 'type-PCI' have no parent and legitimately leave the column NULL.

This is an example of what the pci_devices table looks like with SRIOV enabled PCI devices if the appropriate entries are whitelisted in nova.conf:

MariaDB [nova]> select * from pci_devices;
+---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+------------+---------------+------------+-----------+--------------+
| created_at | updated_at | deleted_at | deleted | id | compute_node_id | address | product_id | vendor_id | dev_type | dev_id | label | status | extra_info | instance_uuid | request_id | numa_node | parent_addr |
+---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+------------+---------------+------------+-----------+--------------+
| 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL | 0 | 1 | 1 | 0000:05:10.1 | 10ed | 8086 | type-VF | pci_0000_05_10_1 | label_8086_10ed | available | {} | NULL | NULL | 0 | 0000:05:00.1 |
| 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL | 0 | 2 | 1 | 0000:05:10.3 | 10ed | 8086 | type-VF | pci_0000_05_10_3 | label_8086_10ed | available | {} | NULL | NULL | 0 | 0000:05:00.1 |
| 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL | 0 | 3 | 1 | 0000:05:10.5 | 10ed | 8086 | type-VF | pci_0000_05_10_5 | label_8086_10ed | available | {} | NULL | NULL | 0 | 0000:05:00.1 |
| 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL | 0 | 4 | 1 | 0000:05:10.7 | 10ed | 8086 | type-VF | pci_0000_05_10_7 | label_8086_10ed | available | {} | NULL | NULL | 0 | 0000:05:00.1 |
| 2017-04-06 21:53:13 | NULL | NULL | 0 | 5 | 1 | 0000:05:00.0 | 10fb | 8086 | type-PF | pci_0000_05_00_0 | label_8086_10fb | available | {} | NULL | NULL | 0 | NULL |
| 2017-04-06 21:53:13 | NULL | NULL | 0 | 6 | 1 | 0000:05:00.1 | 10fb | 8086 | type-PF | pci_0000_05_00_1 | label_8086_10fb | available | {} | NULL | NULL | 0 | NULL |
+---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+------------+---------------+------------+-----------+--------------+
6 rows in set (0.00 sec)

I think the upgrade script should be checking the PciDevice dev_type field for 'type-VF' when validating the parent_addr.
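
To make the proposal concrete, here is a minimal sketch of the two variants of the check, run against a cut-down copy of the table above. It is not the code from 330_enforce_mitaka_online_migrations.py: it uses an in-memory SQLite table through Python's standard sqlite3 module, keeps only four of the columns, and the helper names make_db and count_unmigrated are made up for illustration.

```python
import sqlite3

# Cut-down copy of the rows shown above: (address, dev_type, parent_addr, deleted).
# Only the columns relevant to the check are kept; this is an illustration,
# not the real nova schema.
ROWS = [
    ("0000:05:10.1", "type-VF", "0000:05:00.1", 0),
    ("0000:05:10.3", "type-VF", "0000:05:00.1", 0),
    ("0000:05:10.5", "type-VF", "0000:05:00.1", 0),
    ("0000:05:10.7", "type-VF", "0000:05:00.1", 0),
    ("0000:05:00.0", "type-PF", None, 0),
    ("0000:05:00.1", "type-PF", None, 0),
]


def make_db():
    """Build an in-memory stand-in for the pci_devices table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pci_devices "
        "(address TEXT, dev_type TEXT, parent_addr TEXT, deleted INTEGER)"
    )
    conn.executemany("INSERT INTO pci_devices VALUES (?, ?, ?, ?)", ROWS)
    return conn


def count_unmigrated(conn, vf_only):
    """Count non-deleted rows with no parent_addr (hypothetical helper).

    vf_only=False mimics the behaviour described in this report;
    vf_only=True adds the proposed dev_type restriction.
    """
    query = (
        "SELECT COUNT(*) FROM pci_devices "
        "WHERE deleted = 0 AND parent_addr IS NULL"
    )
    if vf_only:
        query += " AND dev_type = 'type-VF'"
    return conn.execute(query).fetchone()[0]


conn = make_db()
print(count_unmigrated(conn, vf_only=False))  # prints 2: both type-PF rows are flagged
print(count_unmigrated(conn, vf_only=True))   # prints 0: only VFs are checked
```

The unrestricted count of 2 lines up with the "still 2 unmigrated records" message quoted above, since the two type-PF rows are the only ones with a NULL parent_addr.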

Steps to reproduce
==================

1. Install a Mitaka control node and edit the nova.conf file to include one or more PCI devices in the pci_passthrough_whitelist, e.g.:

pci_passthrough_whitelist = {"vendor_id": "8086", "product_id":"10fb"}

2. Install a second control node running Newton or newer and edit its nova.conf to point to the SQL database of the Mitaka node, e.g.:

[database]
connection = mysql+pymysql://root:supersecret@<MITAKA CONTROLLER NODE>/nova?charset=utf8

[api_database]
connection = mysql+pymysql://root:supersecret@<MITAKA CONTROLLER NODE>/nova_api?charset=utf8

3. From the new control node, issue the following command:

nova-manage db sync
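
As a side note on step 1, the whitelist value is a JSON object. The sketch below shows, under a deliberately simplified assumption, how such an entry would select the type-PF devices (product 10fb) from the table above. The matches function is hypothetical and does plain field equality only; nova's real whitelist matching is not reproduced here.

```python
import json


def matches(whitelist_entry, device):
    """Return True if every field in the entry equals the device's field.

    Hypothetical, simplified stand-in for whitelist matching.
    """
    return all(device.get(key) == value for key, value in whitelist_entry.items())


# The whitelist value from step 1, parsed as JSON.
entry = json.loads('{"vendor_id": "8086", "product_id":"10fb"}')

# Two devices taken from the pci_devices table in the description.
pf = {"vendor_id": "8086", "product_id": "10fb", "address": "0000:05:00.0"}
vf = {"vendor_id": "8086", "product_id": "10ed", "address": "0000:05:10.1"}

print(matches(entry, pf))  # prints True
print(matches(entry, vf))  # prints False
```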

Expected result
===============
Database migration/upgrade should succeed

Actual result
=============
A ValidationError, similar to:

"ValidationError: There are still <N> unmigrated records in the pci_devices table. Migration cannot continue until all records have been migrated."

Environment
===========
1. Exact version of OpenStack you are running:
   Mitaka (old node); Newton or Ocata (new node)

2. Which hypervisor did you use?
   libvirt + kvm

3. Which storage type did you use?
   lvm

4. Which networking type did you use?
   Neutron, OVS

Logs & Configs
==============
See attached

Tags: pci
Revision history for this message
Steven Webster (swebster-wr) wrote :
Changed in nova:
assignee: nobody → Steven Webster (swebster-wr)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/456397

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/456397
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7f3f0ef1fbb51f6f17d2c13840e0f98d17fa9093
Submitter: Jenkins
Branch: master

commit 7f3f0ef1fbb51f6f17d2c13840e0f98d17fa9093
Author: Steven Webster <email address hidden>
Date: Wed Apr 5 09:05:07 2017 -0400

    Fix mitaka online migration for PCI devices

    Currently, a validation error is thrown if we find any PCI device
    records which have not populated the parent_addr column on a nova
    upgrade. However, the only PCI records for which a parent_addr
    makes sense for are those with a device type of 'type-VF' (ie. an
    SRIOV virtual function). PCI records with a device type of 'type-PF'
    or 'type-PCI' will not have a parent_addr. If any of those records
    are present on upgrade, the validation will fail.

    This change checks that the device type of the PCI record is
    'type-VF' when making sure the parent_addr has been correctly
    populated

    Closes-Bug: #1680918
    Change-Id: Ia7e773674a4976fc03deee3f08a6ddb45568ec11

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/458667

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/458668

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/458668
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ad12fa65f5cc06bdee52be49b7370d703855d618
Submitter: Jenkins
Branch: stable/newton

commit ad12fa65f5cc06bdee52be49b7370d703855d618
Author: Steven Webster <email address hidden>
Date: Wed Apr 5 09:05:07 2017 -0400

    Fix mitaka online migration for PCI devices

    Currently, a validation error is thrown if we find any PCI device
    records which have not populated the parent_addr column on a nova
    upgrade. However, the only PCI records for which a parent_addr
    makes sense for are those with a device type of 'type-VF' (ie. an
    SRIOV virtual function). PCI records with a device type of 'type-PF'
    or 'type-PCI' will not have a parent_addr. If any of those records
    are present on upgrade, the validation will fail.

    This change checks that the device type of the PCI record is
    'type-VF' when making sure the parent_addr has been correctly
    populated

    Closes-Bug: #1680918
    Change-Id: Ia7e773674a4976fc03deee3f08a6ddb45568ec11
    (cherry picked from commit 7f3f0ef1fbb51f6f17d2c13840e0f98d17fa9093)
    (cherry picked from commit c23c5e9f747e7127497dfd77ca0b33df2be74a2d)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/458667
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c23c5e9f747e7127497dfd77ca0b33df2be74a2d
Submitter: Jenkins
Branch: stable/ocata

commit c23c5e9f747e7127497dfd77ca0b33df2be74a2d
Author: Steven Webster <email address hidden>
Date: Wed Apr 5 09:05:07 2017 -0400

    Fix mitaka online migration for PCI devices

    Currently, a validation error is thrown if we find any PCI device
    records which have not populated the parent_addr column on a nova
    upgrade. However, the only PCI records for which a parent_addr
    makes sense for are those with a device type of 'type-VF' (ie. an
    SRIOV virtual function). PCI records with a device type of 'type-PF'
    or 'type-PCI' will not have a parent_addr. If any of those records
    are present on upgrade, the validation will fail.

    This change checks that the device type of the PCI record is
    'type-VF' when making sure the parent_addr has been correctly
    populated

    Closes-Bug: #1680918
    Change-Id: Ia7e773674a4976fc03deee3f08a6ddb45568ec11
    (cherry picked from commit 7f3f0ef1fbb51f6f17d2c13840e0f98d17fa9093)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.4

This issue was fixed in the openstack/nova 15.0.4 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.6

This issue was fixed in the openstack/nova 14.0.6 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0b2

This issue was fixed in the openstack/nova 16.0.0.0b2 development milestone.
