nova-compute should stop handling virt lifecycle events when it's shutting down

Bug #1444630 reported by Matt Riedemann
This bug affects 2 people
Affects                    Status        Importance  Assigned to         Milestone
OpenStack Compute (nova)   Fix Released  Medium      Matt Riedemann
  Juno                     Fix Released  Medium      Vladik Romanovsky
  Kilo                     Fix Released  Medium      Matt Riedemann

Bug Description

This is a follow on to bug 1293480 and related to bug 1408176 and bug 1443186.

There can be a race when rebooting a compute host: libvirt shuts down the guest VMs and sends STOPPED lifecycle events up to nova-compute, which then tries to stop them via the stop API. That sometimes works and sometimes doesn't - the compute service can go down with a vm_state of ACTIVE and a task_state of powering-off, which isn't resolved on host reboot.

Sometimes the stop API completes and the instance is stopped with power_state=4 (shutdown) in the nova database. When the host comes back up and libvirt restarts, it starts the guest VMs, which sends the STARTED lifecycle event, and nova handles that; but because the vm_state in the nova database is STOPPED while the power_state from the hypervisor is 1 (running), nova thinks the instance started up unexpectedly and stops it:

http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py?id=2015.1.0rc1#n6145

So nova shuts the running guest down.
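
To make the conflict concrete, here is a simplified, illustrative-only sketch of that decision (the function and argument names are made up for this example; the real logic lives in ComputeManager._sync_instance_power_state):

    # power_state constants as used by nova (1 == RUNNING, 4 == SHUTDOWN)
    RUNNING = 1
    SHUTDOWN = 4

    def sync_instance_power_state(db_vm_state, hv_power_state, stop_instance):
        """Simplified view of the sync behaviour described above."""
        if db_vm_state == 'stopped' and hv_power_state == RUNNING:
            # The DB snapshot wins: the guest that libvirt just restarted
            # gets powered off again via the stop API, even though the
            # hypervisor is supposed to be the authority on power state.
            stop_instance()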

Actually the block in:

http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py?id=2015.1.0rc1#n6145

conflicts with the statement in power_state.py:

http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/power_state.py?id=2015.1.0rc1#n19

"The hypervisor is always considered the authority on the status
of a particular VM, and the power_state in the DB should be viewed as a
snapshot of the VMs's state in the (recent) past."

Anyway, that's a different issue, but the point is that when nova-compute is shutting down it should stop accepting lifecycle events from the hypervisor (virt driver code), since it can't reliably act on them anyway - any sync-up that needs to happen can be left to init_host() in the compute manager.
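
A rough sketch of that direction (hedged - this is not the actual patch, just an illustration assuming the virt driver exposes a register_event_listener() hook for lifecycle events):

    class ComputeManagerSketch(object):
        def __init__(self, driver, host):
            self.driver = driver
            self.host = host

        def init_host(self):
            # Normal startup: subscribe to lifecycle events from the virt
            # driver and reconcile any state drift here.
            self.driver.register_event_listener(self.handle_events)

        def cleanup_host(self):
            # Service shutdown: unhook the callback so late STOPPED/STARTED
            # events are dropped instead of triggering stop API calls.
            self.driver.register_event_listener(None)
            self.driver.cleanup_host(host=self.host)

        def handle_events(self, event):
            pass  # power-state sync, omitted in this sketch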

Matt Riedemann (mriedem)
Changed in nova:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Matt Riedemann (mriedem)
tags: added: kilo-backport-potential
Revision history for this message
Matt Riedemann (mriedem) wrote :

Attaching some logs sent from someone at IBM who recreated this on Juno with a debug patch (https://review.openstack.org/#/c/169782/) for logging:

Hi, I finished another round of testing, this time all the VMs were in SHUTOFF state after hypervisor reboot (), here are the key time points in the log file:
13:41:47 Triggered hypervisor reboot, "Emitting event" arrived
13:45:33 Nova compute server started after hypervisor started up
13:46:25 Finished VM state sync up
For more details please check the attached log file: compute_kvm_reboot.2.log.zip
Thanks!

================================ Before host reboot: =============================
================= on kvm001 node Before KVM Reboot ===================
[root@hkg02kvm001ccz023 ~]# date
Wed Apr 15 13:39:32 UTC 2015
[root@hkg02kvm001ccz023 ~]# virsh list
Id Name State
----------------------------------------------------
3 instance-000000a2 running
4 instance-00000058 running

================= on controller node Before KVM Reboot ===================
[root@hkg02ops001ccz023 ~]# date
Wed Apr 15 13:39:52 UTC 2015
[root@hkg02ops001ccz023 ~]# nova list
+--------------------------------------+-------+---------+------------+-------------+---------------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------+---------+------------+-------------+---------------------------------------+
| e53dcdcd-1e19-4a89-8648-1373b4e29e6a | zy001 | ACTIVE | - | Running | Shared-Custom-Network1=192.168.100.18 |
| 3bcdec02-bb42-4eb7-bfca-eca1686f735b | zy002 | SHUTOFF | - | Shutdown | Shared-Custom-Network1=192.168.100.19 |
| e0638150-6ef0-4e98-884d-fb4cfda140a3 | zy004 | SHUTOFF | - | Shutdown | Shared-Custom-Network1=192.168.100.21 |
| 793cd8ba-fcb2-4e42-8b83-fcb8bdf519e6 | zy005 | ACTIVE | - | Running | Shared-Custom-Network1=192.168.100.25 |
+--------------------------------------+-------+---------+------------+-------------+---------------------------------------+

================================ After host reboot: =============================
================= on kvm001 After KVM Reboot ===================
[root@hkg02kvm001ccz023 ~]# date
Wed Apr 15 13:47:46 UTC 2015
[root@hkg02kvm001ccz023 ~]# virsh list
Id Name State
----------------------------------------------------

[root@hkg02kvm001ccz023 ~]# nova list
+--------------------------------------+-------+---------+------------+-------------+---------------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------+---------+------------+-------------+---------------------------------------+
| e53dcdcd-1e19-4a89-8648-1373b4e29e6a | zy001 | SHUTOFF | - | Shutdown | Shared-Custom-Network1=192.168.100.18 |
| 3bcdec02-bb42-4eb7-bfca-eca1686f735b | zy002 | SHUTOFF | - | Shutdown | Shared-Custom-Network1=192.168.100.19...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/174069

Changed in nova:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/174069
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d1fb8d0fbdd6cb95c43b02f754409f1c728e8cd0
Submitter: Jenkins
Branch: master

commit d1fb8d0fbdd6cb95c43b02f754409f1c728e8cd0
Author: Matt Riedemann <email address hidden>
Date: Wed Apr 15 11:51:26 2015 -0700

    compute: stop handling virt lifecycle events in cleanup_host()

    When rebooting a compute host, guest VMs can be getting shutdown
    automatically by the hypervisor and the virt driver is sending events to
    the compute manager to handle them. If the compute service is still up
    while this happens it will try to call the stop API to power off the
    instance and update the database to show the instance as stopped.

    When the compute service comes back up and events come in from the virt
    driver that the guest VMs are running, nova will see that the vm_state
    on the instance in the nova database is STOPPED and shut down the
    instance by calling the stop API (basically ignoring what the virt
    driver / hypervisor tells nova is the state of the guest VM).

    Alternatively, if the compute service shuts down after changing the
    instance task_state to 'powering-off' but before the stop API cast is
    complete, the instance can be in a strange vm_state/task_state
    combination that requires the admin to manually reset the task_state to
    recover the instance.

    Let's just try to avoid some of this mess by disconnecting the event
    handling when the compute service is shutting down like we do for
    neutron VIF plugging events. There could still be races here if the
    compute service is shutting down after the hypervisor (e.g. libvirtd),
    but this is at least a best attempt to mitigate the potential
    damage.

    Closes-Bug: #1444630
    Related-Bug: #1293480
    Related-Bug: #1408176

    Change-Id: I1a321371dff7933cdd11d31d9f9c2a2f850fd8d9

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/174477

tags: added: kilo-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/kilo)

Reviewed: https://review.openstack.org/174477
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b19764d2c6a8160102a806c1d6811c4182a8bac8
Submitter: Jenkins
Branch: stable/kilo

commit b19764d2c6a8160102a806c1d6811c4182a8bac8
Author: Matt Riedemann <email address hidden>
Date: Wed Apr 15 11:51:26 2015 -0700

    compute: stop handling virt lifecycle events in cleanup_host()

    When rebooting a compute host, guest VMs can be getting shutdown
    automatically by the hypervisor and the virt driver is sending events to
    the compute manager to handle them. If the compute service is still up
    while this happens it will try to call the stop API to power off the
    instance and update the database to show the instance as stopped.

    When the compute service comes back up and events come in from the virt
    driver that the guest VMs are running, nova will see that the vm_state
    on the instance in the nova database is STOPPED and shut down the
    instance by calling the stop API (basically ignoring what the virt
    driver / hypervisor tells nova is the state of the guest VM).

    Alternatively, if the compute service shuts down after changing the
    instance task_state to 'powering-off' but before the stop API cast is
    complete, the instance can be in a strange vm_state/task_state
    combination that requires the admin to manually reset the task_state to
    recover the instance.

    Let's just try to avoid some of this mess by disconnecting the event
    handling when the compute service is shutting down like we do for
    neutron VIF plugging events. There could still be races here if the
    compute service is shutting down after the hypervisor (e.g. libvirtd),
    but this is at least a best attempt to mitigate the potential
    damage.

    Closes-Bug: #1444630
    Related-Bug: #1293480
    Related-Bug: #1408176

    Change-Id: I1a321371dff7933cdd11d31d9f9c2a2f850fd8d9
    (cherry picked from commit d1fb8d0fbdd6cb95c43b02f754409f1c728e8cd0)

Thierry Carrez (ttx)
tags: removed: kilo-backport-potential kilo-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/159275
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d09785b97a282e8538642f6f8bcdd8491197ed74
Submitter: Jenkins
Branch: master

commit d09785b97a282e8538642f6f8bcdd8491197ed74
Author: Matt Riedemann <email address hidden>
Date: Wed Feb 25 14:13:45 2015 -0800

    Add config option to disable handling virt lifecycle events

    Historically the _sync_power_states periodic task has had the potential
    for race conditions and several changes have been made to try and
    tighten up this code:

    cc5388bbe81aba635fb757e202d860aeed98f3e8
    aa1792eb4c1d10e9a192142ce7e20d37871d916a
    baabab45e0ae0e9e35872cae77eb04bdb5ee0545
    bd8329b34098436d18441a8129f3f20af53c2b91

    The handle_lifecycle_events method which gets power state change events
    from the compute driver (currently only implemented by the libvirt
    driver) and calls _sync_instance_power_state - the same method that the
    _sync_power_states periodic task uses, except the periodic task at least
    locks when it's running - expands the scope for race problems in the
    compute manager so cloud providers should be able to turn it off. It is
    also known to have races with reboot where rebooted instances are
    automatically shut down because of delayed lifecycle events reporting
    that the instance is stopped even though it's running.

    This is consistent with the view that Nova should manage its own state
    and not rely on external events telling it what to do about state
    changes. For example, in _sync_instance_power_state, if the Nova
    database thinks an instance is stopped but the hypervisor says it's
    running, the compute manager issues a force-stop on the instance.

    Also, although not documented (at least from what I can find), Nova has
    historically held a stance that it does not support out-of-band
    discovery and management of instances, so allowing external events to
    change state somewhat contradicts that stance and should be at least a
    configurable deployment option.

    DocImpact: New config option "handle_virt_lifecycle_events" in the
               DEFAULT group of nova.conf. By default the value is True
               so there is no upgrade impact or change in functionality.

    Related-Bug: #1293480
    Partial-Bug: #1443186
    Partial-Bug: #1444630

    Change-Id: I26a1bc70939fb40dc38e9c5c43bf58ed1378bcc7
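
For illustration, here is a minimal oslo.config-style sketch (assumed helper names and layout, not the actual nova module) of how such a boolean flag can gate whether lifecycle events are handled at all:

    from oslo_config import cfg

    CONF = cfg.CONF
    CONF.register_opts([
        cfg.BoolOpt('handle_virt_lifecycle_events', default=True,
                    help='Whether the compute manager acts on power-state '
                         'events emitted by the virt driver.'),
    ])

    def init_virt_events(manager):
        if CONF.handle_virt_lifecycle_events:
            manager.driver.register_event_listener(manager.handle_events)
        # Otherwise rely on the _sync_power_states periodic task and
        # init_host() to reconcile instance state with the hypervisor.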

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/179284

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/179284
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5228d4e418734164ffa5ccd91d2865d9cc659c00
Submitter: Jenkins
Branch: master

commit 906ab9d6522b3559b4ad36d40dec3af20397f223
Author: He Jie Xu <email address hidden>
Date: Thu Apr 16 07:09:34 2015 +0800

    Update rpc version aliases for kilo

    Update all of the rpc client API classes to include a version alias
    for the latest version implemented in Kilo. This alias is needed when
    doing rolling upgrades from Kilo to Liberty. With this in place, you can
    ensure all services only send messages that both Kilo and Liberty will
    understand.

    Closes-Bug: #1444745

    Conflicts:
     nova/conductor/rpcapi.py

    NOTE(alex_xu): The conflict is due to there are some logs already added
    into the master.

    Change-Id: I2952aec9aae747639aa519af55fb5fa25b8f3ab4
    (cherry picked from commit 78a8b5802ca148dcf37c5651f75f2126d261266e)

commit f191a2147a21c7e50926b288768a96900cf4c629
Author: Hans Lindgren <email address hidden>
Date: Fri Apr 24 13:10:39 2015 +0200

    Add security group calls missing from latest compute rpc api version bump

    The recent compute rpc api version bump missed out on the security group
    related calls that are part of the api.

    One possible reason is that both compute and security group client side
    rpc api:s share a single target, which is of little value and only cause
    mistakes like this.

    This change eliminates future problems like this by combining them into
    one to get a 1:1 relationship between client and server api:s.

    Change-Id: I9207592a87fab862c04d210450cbac47af6a3fd7
    Closes-Bug: #1448075
    (cherry picked from commit bebd00b117c68097203adc2e56e972d74254fc59)

commit a2872a9262985bd0ee2c6df4f7593947e0516406
Author: Dan Smith <email address hidden>
Date: Wed Apr 22 09:02:03 2015 -0700

    Fix migrate_flavor_data() to catch instances with no instance_extra rows

    The way the query was being performed previously, we would not see any
    instances that didn't have a row in instance_extra. This could happen if
    an instance hasn't been touched for several releases, or if the data
    set is old.

    The fix is a simple change to use outerjoin instead of join. This patch
    includes a test that ensures that instances with no instance_extra rows
    are included in the migration. If we query an instance without such a
    row, we create it before doing a save on the instance.

    Closes-Bug: #1447132
    Change-Id: I2620a8a4338f5c493350f26cdba3e41f3cb28de7
    (cherry picked from commit 92714accc49e85579f406de10ef8b3b510277037)

commit e3a7b83834d1ae2064094e9613df75e3b07d77cd
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Apr 23 02:18:41 2015 +0000

    Updated from global requirements

    Change-Id: I5d4acd36329fe2dccb5772fed3ec55b442597150

commit 8c9b5e620eef3233677b64cd234ed2551e6aa182
Author: Divya <email address hidden>
Date: Tue Apr 21 08:26:29 2015 +0200

    Control create/delete flavor api permissions using policy.json

    The permissions of ...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/192244

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/juno)

Reviewed: https://review.openstack.org/192244
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7bc4be781564c6b9e7a519aecea84ddbee6bd935
Submitter: Jenkins
Branch: stable/juno

commit 7bc4be781564c6b9e7a519aecea84ddbee6bd935
Author: Matt Riedemann <email address hidden>
Date: Wed Apr 15 11:51:26 2015 -0700

    compute: stop handling virt lifecycle events in cleanup_host()

    When rebooting a compute host, guest VMs can be getting shutdown
    automatically by the hypervisor and the virt driver is sending events to
    the compute manager to handle them. If the compute service is still up
    while this happens it will try to call the stop API to power off the
    instance and update the database to show the instance as stopped.

    When the compute service comes back up and events come in from the virt
    driver that the guest VMs are running, nova will see that the vm_state
    on the instance in the nova database is STOPPED and shut down the
    instance by calling the stop API (basically ignoring what the virt
    driver / hypervisor tells nova is the state of the guest VM).

    Alternatively, if the compute service shuts down after changing the
    instance task_state to 'powering-off' but before the stop API cast is
    complete, the instance can be in a strange vm_state/task_state
    combination that requires the admin to manually reset the task_state to
    recover the instance.

    Let's just try to avoid some of this mess by disconnecting the event
    handling when the compute service is shutting down like we do for
    neutron VIF plugging events. There could still be races here if the
    compute service is shutting down after the hypervisor (e.g. libvirtd),
    but this is at least a best attempt to mitigate the potential
    damage.

    Closes-Bug: #1444630
    Related-Bug: #1293480
    Related-Bug: #1408176

    Conflicts:
     nova/compute/manager.py
     nova/tests/unit/compute/test_compute_mgr.py

    Change-Id: I1a321371dff7933cdd11d31d9f9c2a2f850fd8d9
    (cherry picked from commit d1fb8d0fbdd6cb95c43b02f754409f1c728e8cd0)

tags: added: in-stable-juno
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-1 → 12.0.0
Revision history for this message
Marian Horban (mhorban) wrote :

Libvirt event threads are not stopped when the nova-compute service is stopped. That's why, when nova-compute is restarted with the SIGHUP signal, we can see this traceback:

2015-11-30 10:03:06.013 INFO nova.service [-] Starting compute node (version 13.0.0)
2015-11-30 10:03:06.013 DEBUG nova.virt.libvirt.host [-] Starting native event thread from (pid=17505) _init_events /opt/stack/nova/nova/virt/libvirt/host.py:452
2015-11-30 10:03:06.014 DEBUG nova.virt.libvirt.host [-] Starting green dispatch thread from (pid=17505) _init_events /opt/stack/nova/nova/virt/libvirt/host.py:458
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/poll.py", line 115, in wait
    listener.cb(fileno)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/opt/stack/nova/nova/utils.py", line 1158, in context_wrapper
    return func(*args, **kwargs)
  File "/opt/stack/nova/nova/virt/libvirt/host.py", line 248, in _dispatch_thread
    self._dispatch_events()
  File "/opt/stack/nova/nova/virt/libvirt/host.py", line 353, in _dispatch_events
    assert _c
AssertionError
Removing descriptor: 9

The threads that were started should be stopped when the nova-compute service shuts down.
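
A hedged sketch of one way to do that (hypothetical helper, not the nova libvirt Host implementation): give the dispatch loop a shutdown flag that the service's stop path can set and join on.

    import threading

    class EventDispatcherSketch(object):
        """Hypothetical stand-in for the libvirt event dispatch thread."""

        def __init__(self):
            self._shutdown = threading.Event()
            self._thread = None

        def start(self):
            self._thread = threading.Thread(target=self._dispatch_loop)
            self._thread.daemon = True
            self._thread.start()

        def _dispatch_loop(self):
            while not self._shutdown.is_set():
                # poll and dispatch pending libvirt events here (omitted)
                self._shutdown.wait(0.5)

        def stop(self):
            # Called from the service's stop/cleanup path so that a SIGHUP
            # restart does not race with stale dispatch threads.
            self._shutdown.set()
            if self._thread is not None:
                self._thread.join()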

Changed in nova:
status: Fix Released → In Progress
assignee: Matt Riedemann (mriedem) → Marian Horban (mhorban)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/259066

Changed in nova:
assignee: Marian Horban (mhorban) → nobody
status: In Progress → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/259066
Reason: Please open a new bug for tracking this rather than re-opening something that was already marked as fixed.

Matt Riedemann (mriedem)
Changed in nova:
status: Confirmed → Fix Released
assignee: nobody → Matt Riedemann (mriedem)