Unshelving an offloaded server with volume attachments may not attach to the guest in multi-cell env

Bug #1702932 reported by Matt Riedemann
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: High
Assigned to: Dan Smith
Milestone: (none listed)

Bug Description

This is currently based only on code inspection, but it looks like this should fail in the following case:

https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/compute/api.py#L3723

When we attach a volume to a shelved offloaded server, we create the BDM in the API. If the API is configured to point at cell0, then the BDM would be created in cell0.

When we unshelve the instance, conductor asks the scheduler for a host (which is in some cell) and we build the instance in that cell. This could be a different cell, because the conductor task manager currently doesn't restrict the cell when unshelving the way it does for migrate:

https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/conductor/tasks/migrate.py#L63-L66

The fact we don't restrict where the instance goes when it's unshelved is a separate bug.

When unshelving the instance, it gets built on some compute and we pull the BDMs from the database configured for that cell (should be cell1, cell2, ..., cellN - some specific non-cell0 database).

https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/compute/manager.py#L4513

If the BDM was created by the API in cell0, that query in the compute manager code would not return it, so the volume would not be attached to the guest.
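The suspected failure can be sketched with a toy model. This is not nova code: the per-cell dictionaries and the helpers `create_bdm` and `get_bdms` are hypothetical stand-ins for the cell databases and the BDM create/list calls, and the sketch assumes the API really does write to cell0 as described above.

```python
# Toy model (hypothetical, not nova code): each cell has its own database,
# represented here as a dict mapping instance UUID -> list of volume IDs.
cell_dbs = {"cell0": {}, "cell1": {}}


def create_bdm(cell, instance_uuid, volume_id):
    """Record a block device mapping in the given cell's database."""
    cell_dbs[cell].setdefault(instance_uuid, []).append(volume_id)


def get_bdms(cell, instance_uuid):
    """List the block device mappings stored in the given cell's database."""
    return list(cell_dbs[cell].get(instance_uuid, []))


# Suspected flow: the API writes the BDM to cell0 ...
create_bdm("cell0", "inst-1", "vol-1")

# ... and the compute that unshelves the instance reads from its own cell.
bdms_seen_by_compute = get_bdms("cell1", "inst-1")
print(bdms_seen_by_compute)
```

Under that assumption the compute sees no BDMs for the instance, even though one exists in cell0.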

What's most confusing about this is that Tempest has tests for attaching and detaching a volume on a shelved offloaded instance:

https://github.com/openstack/tempest/blob/21dd8a5ee2ab5a068cbb20d0468bd5f444fef59a/tempest/api/compute/volumes/test_attach_volume.py#L148

And those are passing on the devstack change that runs with multiple cells and configures the API to use cell0 for the [database] section where the BDM would live:

https://review.openstack.org/#/c/473565/

Unless maybe that test is broken.

We are configured to run ssh validation in the gate jobs on master (pike) though, so the test is counting the number of partitions on the guest before and after the unshelve operation to see that they show up. It's also listing volume attachments for the instance after unshelve.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/481683

Changed in nova:
assignee: nobody → Dan Smith (danms)
status: New → In Progress
Matt Riedemann (mriedem)
tags: added: cells shelve volumes
Changed in nova:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Dan Smith (<email address hidden>) on branch: master
Review: https://review.openstack.org/481683

Matt Riedemann (mriedem) wrote :

Turns out this was invalid. Volume attach works for a shelved offloaded instance with cells v2 because, when we look up the instance in the API code (in nova.compute.api.API._get_instance), the context is targeted to the cell that the instance lives in. So when the BDM is created using that context, it's also created in the same cell as the instance.
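A minimal sketch of why a targeted context avoids the problem. This is a toy model, not nova's implementation: the `Context` class, the `instance_mappings` dict, and the helpers `target_context_for_instance` and `create_bdm` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Context:
    # None means "not targeted": fall back to the API's default database.
    cell: Optional[str] = None


API_DEFAULT_CELL = "cell0"               # assumed API [database] setting
instance_mappings = {"inst-1": "cell1"}  # hypothetical: instance UUID -> cell
cell_dbs = {"cell0": {}, "cell1": {}}    # one toy database per cell


def target_context_for_instance(ctxt, instance_uuid):
    """Return a copy of the context targeted at the instance's cell."""
    return replace(ctxt, cell=instance_mappings[instance_uuid])


def create_bdm(ctxt, instance_uuid, volume_id):
    """Write the BDM to whichever cell the context points at."""
    cell = ctxt.cell or API_DEFAULT_CELL
    cell_dbs[cell].setdefault(instance_uuid, []).append(volume_id)
    return cell


# Looking up the instance yields a context targeted at its cell, and the
# BDM created with that same context lands in that cell, not in cell0.
targeted = target_context_for_instance(Context(), "inst-1")
bdm_cell = create_bdm(targeted, "inst-1", "vol-1")
print(bdm_cell)
```

In this model an untargeted `Context()` would still write to cell0, which is the case the original report assumed.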

Changed in nova:
status: In Progress → Invalid