Post instance evacuation, image metadata is not retained when using shared storage

Bug #1562681 reported by guo.lei on 2016-03-28
This bug affects 1 person
Affects                   Importance  Assigned to
OpenStack Compute (nova)  Medium      Kashyap Chamarthy
Liberty                   Medium      Tony Breeds
Mitaka                    Medium      Kashyap Chamarthy

Bug Description

When we boot an instance from an image that has metadata, e.g. hw_qemu_guest_agent=yes, the instance has the QEMU guest agent enabled. However, the guest agent configuration disappears after the instance is evacuated.

nova evacuate 10836590-b600-429c-a106-3e67b14f572e node58 --on-shared-storage

# dpkg -l |grep nova
ii nova-common 2:12.0.1-0ubuntu1~cloud0 all OpenStack Compute - common files
ii nova-compute 2:12.0.1-0ubuntu1~cloud0 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:12.0.1-0ubuntu1~cloud0 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 2:12.0.1-0ubuntu1~cloud0 all OpenStack Compute - compute node libvirt support
ii python-nova 2:12.0.1-0ubuntu1~cloud0 all OpenStack Compute Python libraries
ii python-novaclient 2:2.30.1-1~cloud0 all client library for OpenStack Compute API

guo.lei (403554373-w) wrote :
tags: added: agent guest monitor qemu with
tags: added: qemu-guest-agent set-password windows
removed: agent guest qemu with
guo.lei (403554373-w) wrote :

After the evacuation, the QEMU guest agent socket disappears from the instance's libvirt XML.

Changed in nova:
assignee: nobody → guo.lei (403554373-w)
status: New → Confirmed
status: Confirmed → New
status: New → Confirmed
guo.lei (403554373-w) on 2016-03-28
description: updated
description: updated
description: updated
guo.lei (403554373-w) wrote :
guo.lei (403554373-w) on 2016-03-28
description: updated
Sarafraj Singh (sarafraj-singh) wrote :

guo.lei,
Are you also working on a fix? If so, change the status to In Progress; otherwise, set the assignee to nobody.

guo.lei (403554373-w) on 2016-04-19
Changed in nova:
status: Confirmed → In Progress
Kashyap Chamarthy (kashyapc) wrote :

I just did a test with yesterday's Nova Git (output of `git describe`:
"13.0.0-500-g654181b" when I tested). It seems that during instance
evacuation (`nova evacuate vm1 target-host`), the QEMU guest agent metadata
for the evacuated instance _does_ persist on the destination
Compute node.

Test procedure
--------------

(1) Ensure `qemu-guest-agent` is running on both, source _and_ target
    Compute nodes:

    $ systemctl status qemu-guest-agent
    * qemu-guest-agent.service - QEMU Guest Agent
       Loaded: loaded (/usr/lib/systemd/system/qemu-guest-agent.service; static; vendor preset: enabled)
       Active: active (running) since Wed 2016-04-20 05:16:59 EDT; 2h 10min ago
    [...]

(2) Update Glance image metadata property to have
    'hw_qemu_guest_agent=yes':

    $ glance image-update 9c915a2c-5c74-4274-aca4-112d322618dd \
        --property hw_qemu_guest_agent=yes

(3) Boot an instance (and ensure the instance is active):

    $ nova boot --flavor 1 --key_name oskey1 --image \
        9c915a2c-5c74-4274-aca4-112d322618dd cirrvm1

(4) Check for the QEMU guest agent libvirt XML snippet for the instance:

    $ sudo virsh dumpxml 06e11b94-d178-4a3a-99cc-ce76b3738579 | grep agent -A3 -B1
        <channel type='unix'>
          <source mode='bind' path='/var/lib/libvirt/qemu/org.qemu.guest_agent.0.instance-00000003.sock'/>
          <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
          <alias name='channel0'/>
          <address type='virtio-serial' controller='0' bus='0' port='1'/>
        </channel>

(5) Force-down the 'nova-compute' binary on the host where the source
    Compute node is running:

    $ nova service-force-down devstack1 nova-compute
    +-----------+--------------+-------------+
    | Host      | Binary       | Forced down |
    +-----------+--------------+-------------+
    | devstack1 | nova-compute | True        |
    +-----------+--------------+-------------+

(6) 'nova-compute' binary is marked down on devstack1 (where the first
    Compute node is running):

    $ nova service-list
    +----+----------------+-----------+----------+---------+-------+----------------------------+-----------------+
    | Id | Binary         | Host      | Zone     | Status  | State | Updated_at                 | Disabled Reason |
    +----+----------------+-----------+----------+---------+-------+----------------------------+-----------------+
    | 3  | nova-conductor | devstack1 | internal | enabled | up    | 2016-04-20T11:14:10.000000 | -               |
    | 4  | nova-cert      | devstack1 | internal | enabled | up    | 2016-04-20T11:14:07.000000 | -               |
    | 5  | nova-scheduler | devstack1 | internal | enabled | up    | 2016-04-20T11:14:08.000000 | -               |
    | 6  | nova-compute   | devstack1 | nova     | enabled | down  | 2016-04-20T11:14:04.000000 | -               |
    | 7  | nova-compute   | devstack2 | nova     | enabled | up    | 2016-04-20T11:14:05.000000 | -               |
    +----+----------------+-----------+----------+---------+-------+----------------------------+-----------------+

(7) Perform the evacuation of the Nova instance ('cirrvm1') to the
    destinatio...

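The check in step (4) can also be scripted. The following is a minimal sketch, not part of the original test procedure: it parses a libvirt domain XML string (for example, the output of `virsh dumpxml`) and reports whether a channel targeting 'org.qemu.guest_agent.0' is present. The function name and the trimmed sample XML are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Channel target name as shown in the virsh dumpxml output of step (4).
GUEST_AGENT_TARGET = "org.qemu.guest_agent.0"


def has_guest_agent_channel(domain_xml):
    """Return True if the domain XML defines a QEMU guest agent channel."""
    root = ET.fromstring(domain_xml.strip())
    for channel in root.iter("channel"):
        target = channel.find("target")
        if target is not None and target.get("name") == GUEST_AGENT_TARGET:
            return True
    return False


# Hypothetical, heavily trimmed domain XML for illustration only.
SAMPLE_WITH_AGENT = """
<domain type='kvm'>
  <devices>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/org.qemu.guest_agent.0.instance-00000003.sock'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
    </channel>
  </devices>
</domain>
"""
SAMPLE_WITHOUT_AGENT = "<domain type='kvm'><devices/></domain>"

if __name__ == "__main__":
    print(has_guest_agent_channel(SAMPLE_WITH_AGENT))
    print(has_guest_agent_channel(SAMPLE_WITHOUT_AGENT))
```

Running the same check against the domain XML before and after the evacuation makes it easy to see whether the channel was dropped.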

Sylvain Bauza (sylvain-bauza) wrote :

Looking at https://github.com/openstack/nova/blob/0315d46766fa4357c6608541be39aada4eb5941d/nova/compute/manager.py#L2878-L2890

It seems that when the instance is on shared storage and image_ref is None, we create a new ImageMeta object from an empty dict instead of trying to get the older image metadata.

But there is a flaw: evacuate is called by the compute API with image_ref=None.
https://github.com/openstack/nova/blob/6464c0a4c788df816b33a63c6d2bf2c61349f052/nova/compute/api.py#L3514

So there is literally no way to get the previous image_ref when evacuating a shared-storage-backed instance, because we don't use it.
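To make the control flow easier to follow, here is a simplified sketch of the behaviour described in this comment. It is not the actual code from nova/compute/manager.py; the function, its parameters, and the `retain_stored` switch are all hypothetical and exist only to contrast the flawed path (empty dict) with the desired one (reuse the metadata the instance already carries).

```python
def pick_image_meta(image_ref, stored_image_meta, fetch_image, retain_stored=False):
    """Choose the image metadata used when rebuilding an instance elsewhere.

    All names here are illustrative assumptions, not Nova APIs.
    """
    if image_ref:
        # An image reference was supplied, so look the image up.
        return fetch_image(image_ref)
    if retain_stored:
        # Desired behaviour: reuse the metadata stored with the instance.
        return dict(stored_image_meta)
    # Flawed behaviour: no image_ref, so start from an empty dict and
    # silently lose properties such as hw_qemu_guest_agent.
    return {}


def fake_fetch(ref):
    # Stand-in for an image service lookup (hypothetical).
    return {"id": ref}


stored = {"hw_qemu_guest_agent": "yes"}

# Evacuate passes image_ref=None, so the lookup branch is never taken.
before_fix = pick_image_meta(None, stored, fake_fetch)
after_fix = pick_image_meta(None, stored, fake_fetch, retain_stored=True)

if __name__ == "__main__":
    print(before_fix)
    print(after_fix)
```

Because evacuate always arrives with image_ref=None, only the last two branches matter for this bug.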

Changed in nova:
importance: Undecided → Low
importance: Low → Medium
Kashyap Chamarthy (kashyapc) wrote :

Thanks Sylvain for the analysis.

Just to note for clarity, I forgot to make a distinction in my test: I tested _without_ shared storage.

tags: added: compute low-hanging-fruit
removed: monitor set-password windows
summary: - instance evacuate without image's metadata
+ Post instance evacuation, image metadata is not retained

Fix proposed to branch: master
Review: https://review.openstack.org/309440

Changed in nova:
assignee: guo.lei (403554373-w) → Kashyap Chamarthy (kashyapc)
Changed in nova:
assignee: Kashyap Chamarthy (kashyapc) → Diana Clarke (diana-clarke)
Changed in nova:
assignee: Diana Clarke (diana-clarke) → nobody
Matt Riedemann (mriedem) on 2016-05-02
summary: - Post instance evacuation, image metadata is not retained
+ Post instance evacuation, image metadata is not retained when using
+ shared storage
Diana Clarke (diana-clarke) wrote :

This bug should be assigned back to @kashyap, but I can't do that.

Reviewed: https://review.openstack.org/309440
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=82098d06dbaf401966a70c873b8aa97e7eab4b10
Submitter: Jenkins
Branch: master

commit 82098d06dbaf401966a70c873b8aa97e7eab4b10
Author: Kashyap Chamarthy <email address hidden>
Date: Fri Apr 22 12:12:26 2016 +0200

    compute: Retain instance metadata for 'evacuate' on shared storage

    When performing instance evacuation (which is essentially "rebuild an
    instance elsewhere"), image metadata is not persistent -- this manifests
    only when instances are on shared storage.

    So, supply the original instance metadata to 'image_meta' parameter,
    instead of an empty dict.

    Change-Id: Ibea4954c149b9dcb162c5962ab8e9a4f17e51a1d
    Co-Authored-By: Diana Clarke <email address hidden>
    Closes-Bug: 1562681

Changed in nova:
status: In Progress → Fix Released
Changed in nova:
assignee: nobody → Kashyap Chamarthy (kashyapc)

Reviewed: https://review.openstack.org/311990
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=524d59e8c32e0eb1ef7fc86d989897d52180ab83
Submitter: Jenkins
Branch: stable/mitaka

commit 524d59e8c32e0eb1ef7fc86d989897d52180ab83
Author: Kashyap Chamarthy <email address hidden>
Date: Fri Apr 22 12:12:26 2016 +0200

    compute: Retain instance metadata for 'evacuate' on shared storage

    When performing instance evacuation (which is essentially "rebuild an
    instance elsewhere"), image metadata is not persistent -- this manifests
    only when instances are on shared storage.

    So, supply the original instance metadata to 'image_meta' parameter,
    instead of an empty dict.

    Change-Id: Ibea4954c149b9dcb162c5962ab8e9a4f17e51a1d
    Co-Authored-By: Diana Clarke <email address hidden>
    Closes-Bug: 1562681
    (cherry picked from commit 82098d06dbaf401966a70c873b8aa97e7eab4b10)

Change abandoned by Tony Breeds (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/314012
Reason: Moved to https://review.openstack.org/#/c/315423/ to preserve Change-Id from master and mitaka

Reviewed: https://review.openstack.org/315423
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=45fad7807f9b66fd0e29acfc95145902b6454326
Submitter: Jenkins
Branch: stable/liberty

commit 45fad7807f9b66fd0e29acfc95145902b6454326
Author: Kashyap Chamarthy <email address hidden>
Date: Fri May 6 14:19:54 2016 +0200

    compute: Retain instance metadata for 'evacuate' on shared storage

    When performing instance evacuation (which is essentially "rebuild an
    instance elsewhere"), image metadata is not persistent -- this manifests
    only when instances are on shared storage.

    So, supply the original instance metadata to 'image_meta' parameter,
    instead of an empty dict.

    Liberty notes (compared to upstream Newton/Mitaka):

      - We get the image metadata from the instance object's
        system_metadata.
      - We're modifying the tests
        'test_rebuild_on_host_with_shared_storage()' and
        'test_on_shared_storage_not_provided_host_with_shared_storage()' to
        exercise the code change.

    Closes-Bug: #1562681
    Change-Id: Ibea4954c149b9dcb162c5962ab8e9a4f17e51a1d
    (cherry picked from commit 82098d06dbaf401966a70c873b8aa97e7eab4b10)
    (cherry picked from commit 524d59e8c32e0eb1ef7fc86d989897d52180ab83)
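The Liberty notes say the image metadata is taken from the instance object's system_metadata. As a rough illustration of that idea only, the sketch below rebuilds image properties from a flat key/value mapping. It assumes image properties are stored under keys with an "image_" prefix; that prefix, the helper name, and the sample data are assumptions for this example and not the code from the commit.

```python
# Assumed prefix for image properties stored in system_metadata.
IMAGE_PROP_PREFIX = "image_"


def image_properties_from_system_metadata(system_metadata):
    """Collect image properties from a flat system_metadata mapping."""
    props = {}
    for key, value in system_metadata.items():
        if key.startswith(IMAGE_PROP_PREFIX):
            # Strip the prefix to recover the original property name.
            props[key[len(IMAGE_PROP_PREFIX):]] = value
    return props


# Hypothetical system_metadata for an instance booted from an image
# with hw_qemu_guest_agent=yes.
sample_system_metadata = {
    "image_hw_qemu_guest_agent": "yes",
    "image_min_disk": "1",
    "instance_type_id": "5",
}

if __name__ == "__main__":
    print(image_properties_from_system_metadata(sample_system_metadata))
```

Keys without the prefix are ignored, so unrelated instance data does not leak into the rebuilt image metadata.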

This issue was fixed in the openstack/nova 14.0.0.0b1 development milestone.

This issue was fixed in the openstack/nova 12.0.4 release.

This issue was fixed in the openstack/nova 13.1.0 release.

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/308719
Reason: This review is > 6 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
