Nova upgrade fails if PCI devices of type-PF or type-PCI are present in the database

Bug #1680918 reported by Steven Webster on 2017-04-07
Affects                     Importance   Assigned to
OpenStack Compute (nova)    High         Steven Webster
Newton                      High         Dan Smith
Ocata                       High         Dan Smith

Bug Description

Description
===========
If a Nova DB is upgraded (migrated) while containing PCI devices with device type 'type-PF' or 'type-PCI',
a validation error similar to this will be thrown:

"ValidationError: There are still 2 unmigrated records in the pci_devices table. Migration cannot continue until all records have been migrated."

The error is generated by the 330_enforce_mitaka_online_migrations.py upgrade script.

The PCI device migration validation fails if it finds any PCI device entry without a populated parent_addr. However, parent_addr only applies to PCI device entries of 'type-VF' (i.e. SR-IOV virtual functions).

This is an example of what the pci_devices table looks like with SRIOV enabled PCI devices if the appropriate entries are whitelisted in nova.conf:

MariaDB [nova]> select * from pci_devices;
+---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+------------+---------------+------------+-----------+--------------+
| created_at | updated_at | deleted_at | deleted | id | compute_node_id | address | product_id | vendor_id | dev_type | dev_id | label | status | extra_info | instance_uuid | request_id | numa_node | parent_addr |
+---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+------------+---------------+------------+-----------+--------------+
| 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL | 0 | 1 | 1 | 0000:05:10.1 | 10ed | 8086 | type-VF | pci_0000_05_10_1 | label_8086_10ed | available | {} | NULL | NULL | 0 | 0000:05:00.1 |
| 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL | 0 | 2 | 1 | 0000:05:10.3 | 10ed | 8086 | type-VF | pci_0000_05_10_3 | label_8086_10ed | available | {} | NULL | NULL | 0 | 0000:05:00.1 |
| 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL | 0 | 3 | 1 | 0000:05:10.5 | 10ed | 8086 | type-VF | pci_0000_05_10_5 | label_8086_10ed | available | {} | NULL | NULL | 0 | 0000:05:00.1 |
| 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL | 0 | 4 | 1 | 0000:05:10.7 | 10ed | 8086 | type-VF | pci_0000_05_10_7 | label_8086_10ed | available | {} | NULL | NULL | 0 | 0000:05:00.1 |
| 2017-04-06 21:53:13 | NULL | NULL | 0 | 5 | 1 | 0000:05:00.0 | 10fb | 8086 | type-PF | pci_0000_05_00_0 | label_8086_10fb | available | {} | NULL | NULL | 0 | NULL |
| 2017-04-06 21:53:13 | NULL | NULL | 0 | 6 | 1 | 0000:05:00.1 | 10fb | 8086 | type-PF | pci_0000_05_00_1 | label_8086_10fb | available | {} | NULL | NULL | 0 | NULL |
+---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+------------+---------------+------------+-----------+--------------+
6 rows in set (0.00 sec)

I think the upgrade script should be checking the PciDevice dev_type field for 'type-VF' when validating the parent_addr.
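
To illustrate the difference, here is a minimal sketch of the check with and without a dev_type filter. It is not the code from 330_enforce_mitaka_online_migrations.py: it uses an in-memory SQLite table holding only the columns relevant here, and the helper names (make_db, count_unmigrated) and the "deleted = 0" condition are assumptions made for the example.

```python
import sqlite3

# Toy stand-in for the pci_devices table, reduced to the columns the
# check needs. Values are taken from the table dump in this report.
ROWS = [
    # (address, dev_type, parent_addr, deleted)
    ("0000:05:10.1", "type-VF", "0000:05:00.1", 0),
    ("0000:05:10.3", "type-VF", "0000:05:00.1", 0),
    ("0000:05:00.0", "type-PF", None, 0),
    ("0000:05:00.1", "type-PF", None, 0),
]


def make_db(rows):
    """Create an in-memory database containing the given rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pci_devices ("
        "address TEXT, dev_type TEXT, parent_addr TEXT, deleted INTEGER)"
    )
    conn.executemany("INSERT INTO pci_devices VALUES (?, ?, ?, ?)", rows)
    return conn


def count_unmigrated(conn, vf_only):
    """Count non-deleted rows whose parent_addr is still NULL.

    vf_only=False approximates the behaviour described in this report.
    vf_only=True approximates the proposed check, which only considers
    'type-VF' rows. The 'deleted = 0' condition is an assumption.
    """
    sql = (
        "SELECT COUNT(*) FROM pci_devices "
        "WHERE deleted = 0 AND parent_addr IS NULL"
    )
    if vf_only:
        sql += " AND dev_type = 'type-VF'"
    return conn.execute(sql).fetchone()[0]


conn = make_db(ROWS)
print(count_unmigrated(conn, vf_only=False))  # 2: the two type-PF rows
print(count_unmigrated(conn, vf_only=True))   # 0: every VF has a parent_addr
```

With the filter in place, a 'type-VF' row that is genuinely missing its parent_addr would still be counted, so the validation keeps its purpose.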

Steps to reproduce
==================

1. Install a Mitaka control node and edit the nova.conf file to include one or more PCI devices in the pci_passthrough_whitelist, e.g.:

pci_passthrough_whitelist = {"vendor_id": "8086", "product_id":"10fb"}

2. Install a second control node running Newton or newer and edit its nova.conf to point to the SQL database of the Mitaka node, e.g.:

[database]
connection = mysql+pymysql://root:supersecret@<MITAKA CONTROLLER NODE>/nova?charset=utf8

[api_database]
connection = mysql+pymysql://root:supersecret@<MITAKA CONTROLLER NODE>/nova_api?charset=utf8

3. From the new control node, issue the following command:

nova-manage db sync

Expected result
===============
Database migration/upgrade should succeed

Actual result
=============
A ValidationError, similar to:

"ValidationError: There are still <N> unmigrated records in the pci_devices table. Migration cannot continue until all records have been migrated."

Environment
===========
1. Exact version of OpenStack you are running. See the following
   Mitaka (old node); Newton, Ocata (new node)

2. Which hypervisor did you use?
   libvirt + kvm

3. Which storage type did you use?
   lvm

4. Which networking type did you use?
   Neutron, OVS

Logs & Configs
==============
See attached

Tags: pci
Steven Webster (swebster-wr) wrote :
Changed in nova:
assignee: nobody → Steven Webster (swebster-wr)
status: New → In Progress
Matt Riedemann (mriedem) on 2017-04-12
Changed in nova:
importance: Undecided → High

Reviewed: https://review.openstack.org/456397
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7f3f0ef1fbb51f6f17d2c13840e0f98d17fa9093
Submitter: Jenkins
Branch: master

commit 7f3f0ef1fbb51f6f17d2c13840e0f98d17fa9093
Author: Steven Webster <email address hidden>
Date: Wed Apr 5 09:05:07 2017 -0400

    Fix mitaka online migration for PCI devices

    Currently, a validation error is thrown if we find any PCI device
    records which have not populated the parent_addr column on a nova
    upgrade. However, the only PCI records for which a parent_addr
    makes sense for are those with a device type of 'type-VF' (ie. an
    SRIOV virtual function). PCI records with a device type of 'type-PF'
    or 'type-PCI' will not have a parent_addr. If any of those records
    are present on upgrade, the validation will fail.

    This change checks that the device type of the PCI record is
    'type-VF' when making sure the parent_addr has been correctly
    populated

    Closes-Bug: #1680918
    Change-Id: Ia7e773674a4976fc03deee3f08a6ddb45568ec11

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/458668
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ad12fa65f5cc06bdee52be49b7370d703855d618
Submitter: Jenkins
Branch: stable/newton

commit ad12fa65f5cc06bdee52be49b7370d703855d618
Author: Steven Webster <email address hidden>
Date: Wed Apr 5 09:05:07 2017 -0400

    Fix mitaka online migration for PCI devices

    Currently, a validation error is thrown if we find any PCI device
    records which have not populated the parent_addr column on a nova
    upgrade. However, the only PCI records for which a parent_addr
    makes sense for are those with a device type of 'type-VF' (ie. an
    SRIOV virtual function). PCI records with a device type of 'type-PF'
    or 'type-PCI' will not have a parent_addr. If any of those records
    are present on upgrade, the validation will fail.

    This change checks that the device type of the PCI record is
    'type-VF' when making sure the parent_addr has been correctly
    populated

    Closes-Bug: #1680918
    Change-Id: Ia7e773674a4976fc03deee3f08a6ddb45568ec11
    (cherry picked from commit 7f3f0ef1fbb51f6f17d2c13840e0f98d17fa9093)
    (cherry picked from commit c23c5e9f747e7127497dfd77ca0b33df2be74a2d)

Reviewed: https://review.openstack.org/458667
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c23c5e9f747e7127497dfd77ca0b33df2be74a2d
Submitter: Jenkins
Branch: stable/ocata

commit c23c5e9f747e7127497dfd77ca0b33df2be74a2d
Author: Steven Webster <email address hidden>
Date: Wed Apr 5 09:05:07 2017 -0400

    Fix mitaka online migration for PCI devices

    Currently, a validation error is thrown if we find any PCI device
    records which have not populated the parent_addr column on a nova
    upgrade. However, the only PCI records for which a parent_addr
    makes sense for are those with a device type of 'type-VF' (ie. an
    SRIOV virtual function). PCI records with a device type of 'type-PF'
    or 'type-PCI' will not have a parent_addr. If any of those records
    are present on upgrade, the validation will fail.

    This change checks that the device type of the PCI record is
    'type-VF' when making sure the parent_addr has been correctly
    populated

    Closes-Bug: #1680918
    Change-Id: Ia7e773674a4976fc03deee3f08a6ddb45568ec11
    (cherry picked from commit 7f3f0ef1fbb51f6f17d2c13840e0f98d17fa9093)

This issue was fixed in the openstack/nova 15.0.4 release.

This issue was fixed in the openstack/nova 14.0.6 release.

This issue was fixed in the openstack/nova 16.0.0.0b2 development milestone.
