Stop and Delete operations should give the Guest a chance to shutdown

Bug #1196924 reported by Phil Day on 2013-07-02
This bug affects 6 people
Affects                      Importance   Assigned to
OpenStack Compute (nova)     Undecided    Phil Day
nova (Ubuntu)                Undecided    Liang Chen
nova (Ubuntu Trusty)         Undecided    Liang Chen

Bug Description

This feature causes an ACPI event to be sent to the guest when it is being shut down. The acpid daemon running inside the guest can catch the event, giving the guest a chance to shut down cleanly.

[Impact]

 * VMs are shut down without any signal or notification from the hypervisor level, so services running inside the VMs have no chance to perform a clean shutdown.

[Test Case]

 * 1. Stop a VM.
   2. The VM is shut down without any notification.

This can easily be seen by opening an ssh session into the VM before shutting it down. With the patch in place, the ssh session is closed during shutdown, because sshd has the chance to close the connection before it is brought down. Without the patch, the ssh session just hangs until it times out, because the connection is not closed promptly.

To leverage the clean shutdown feature, one can create a file named /etc/acpi/events/power that contains the following:

              event=button/power
              action=/etc/acpi/power.sh "%e"

Then create a file named /etc/acpi/power.sh that contains whatever is required to gracefully shut down that particular server (VM).
With acpid running, shutting down the VM causes the rule in /etc/acpi/events/power to trigger the script /etc/acpi/power.sh, which cleanly shuts down the system.
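As an illustration of how the rule above is interpreted, the Python sketch below roughly mimics acpid's matching: the event= value is treated as a regular expression searched for in the incoming event string, and %e in action= is replaced by that event string. It is a simulation, not acpid itself; the helper names and the sample event strings are assumptions, and real acpid details (quoting, %% escapes, multiple rule files) are not modelled.

```python
import re


def parse_rule(text):
    """Parse an acpid-style rule file into a dict of key=value entries."""
    rule = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        rule[key.strip()] = value.strip()
    return rule


def action_for_event(rule, event):
    """Return the command the rule would run for `event`, or None.

    Assumption: event= is a regex searched for anywhere in the event
    string, and every %e in action= is replaced by the event string.
    """
    if re.search(rule["event"], event) is None:
        return None
    return rule["action"].replace("%e", event)


RULE_TEXT = """
event=button/power
action=/etc/acpi/power.sh "%e"
"""

rule = parse_rule(RULE_TEXT)

# Sample event strings (assumed format) for a power-button press and a lid close.
print(action_for_event(rule, "button/power PBTN 00000080 00000000"))
print(action_for_event(rule, "button/lid LID close"))
```

The power-button event yields the power.sh command with the event string substituted for %e, while the lid event does not match the rule and yields None.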

[Regression Potential]

 * none

Currently, in libvirt, the stop and delete operations simply destroy the underlying VM. Some guest OSes do not react well to this type of power failure, and it would be better if these operations followed the same approach as soft_reboot and gave the guest a chance to shut down gracefully. Even when a VM is being deleted, it may have been booted from a volume that will be reused on another server.


Phil Day (philip-day) on 2013-07-02
Changed in nova:
assignee: nobody → Phil Day (philip-day)

Fix proposed to branch: master
Review: https://review.openstack.org/35303

Changed in nova:
status: New → In Progress

Reviewed: https://review.openstack.org/35303
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b32d01d44ca5711c96d192df51bf7acd34f52556
Submitter: Jenkins
Branch: master

commit b32d01d44ca5711c96d192df51bf7acd34f52556
Author: Phil Day <email address hidden>
Date: Tue Jul 2 15:32:57 2013 +0100

    Stop, Rescue, and Delete should give guest a chance to shutdown

    Currently in libvirt stop, shelve, rescue, and delete simply
    destroy the underlying VM. Some guest OSes do not react well to this
    type of power failure, and so it would be better if these operations
    followed the same approach as soft_reboot and gave the guest a
    chance to shut down gracefully. Even when a VM is being deleted,
    it may have been booted from a volume which will be reused on another
    server.

    The change is implemented by adding a clean_shutdown parameter
    to the relevant methods from the compute/api layer downwards
    and into the virt drivers. The implementation in the libvirt
    driver is also provided. Other drivers are modified just to
    expect the additional parameter.

    The timer configuration value previously used by soft_reboot in
    libvirt is moved up to virt/driver so it can be used by other drivers.

    The timer logic itself is changed from simple loop counting with one
    second sleeps to a more precise approach, since testing showed that
    other calls in the loop could introduce a difference of up to +50% on
    the expected duration. This can extend the default two minutes to
    three minutes, which would not be acceptable in some environments
    and breaks some of the tempest tests.

    A separate config value defines what the default shutdown
    behaviour for delete should be (default False to keep compatibility
    with current behaviour).

    This code is structured to enable a subsequent API change to add
    clean/forced options to the stop and delete methods.

    Also, as a minor tidy-up, moved the repeated definition of
    FakeLoopingCall in test_libvirt to be common across tests.

    Partially-Implements: blueprint user-defined-shutdown
    Closes-Bug: #1196924

    DocImpact

    Change-Id: Ie69aa2621cb52d6fefdc19f664247b069d6782ee
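The timer change described in this commit can be illustrated with a minimal Python sketch. The names (wait_by_counting, wait_until_deadline, FakeClock, make_slow_check) are hypothetical and this is not the nova code. It uses a simulated clock in which every status check costs 0.5 s: counting 120 one-second sleeps then takes 180 s, the +50% drift the commit message describes, while waiting against a deadline fixed up front stops at roughly 120 s.

```python
import time


def wait_by_counting(is_shut_down, timeout, sleep=time.sleep):
    """Old-style wait: assumes every loop iteration takes exactly one second."""
    for _ in range(int(timeout)):
        if is_shut_down():
            return True
        sleep(1)
    return False


def wait_until_deadline(is_shut_down, timeout, clock=time.monotonic, sleep=time.sleep):
    """Deadline-based wait: time spent in is_shut_down() counts against timeout."""
    deadline = clock() + timeout
    while True:
        if is_shut_down():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Sleep at most one second, and never past the deadline.
        sleep(min(1.0, remaining))


class FakeClock:
    """Simulated clock so the comparison runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def make_slow_check(clock):
    """A guest that never shuts down; each status check costs 0.5 s."""
    def check():
        clock.now += 0.5
        return False
    return check


old_clock = FakeClock()
wait_by_counting(make_slow_check(old_clock), 120, sleep=old_clock.sleep)

new_clock = FakeClock()
wait_until_deadline(make_slow_check(new_clock), 120, clock=new_clock, sleep=new_clock.sleep)

print(old_clock.now)  # 180.0: the nominal two minutes stretched to three
print(new_clock.now)  # 120.5: overshoot bounded by one status check
```

The deadline approach can still overshoot by the duration of one check, but unlike loop counting the error no longer grows with the number of iterations.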

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
milestone: none → icehouse-2
Thierry Carrez (ttx) on 2014-01-22
Changed in nova:
status: Fix Committed → Fix Released
Phil Day (philip-day) wrote :

The associated change was reverted because it extended the duration of the gate too much.

Changed in nova:
status: Fix Released → In Progress
Thierry Carrez (ttx) on 2014-01-22
Changed in nova:
milestone: icehouse-2 → icehouse-3
Thierry Carrez (ttx) on 2014-03-05
Changed in nova:
milestone: icehouse-3 → icehouse-rc1
Tracy Jones (tjones-i) on 2014-03-07
tags: added: compute
Tracy Jones (tjones-i) on 2014-03-07
Changed in nova:
milestone: icehouse-rc1 → next
tags: added: icehouse-rc-potential
Thierry Carrez (ttx) on 2014-04-01
tags: added: icehouse-backport-potential
removed: icehouse-rc-potential
Mattieu Puel (vodmat-news) wrote :

What would you think of a --soft switch on "nova stop" command ? This would still allow for a classic power off.

Phil Day (philip-day) wrote :

Hi,

Yep, there is a blueprint up for review which includes that as part of the solution

https://review.openstack.org/#/c/89650

Changed in nova:
assignee: Phil Day (philip-day) → Claudiu Belu (cbelu)
Phil Day (philip-day) wrote :

Assigned back to me as I am actively working on delivering the associated BP:

https://review.openstack.org/#/c/68942/

Reviews welcome :-)

Changed in nova:
assignee: Claudiu Belu (cbelu) → Phil Day (philip-day)

Reviewed: https://review.openstack.org/68942
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c07ed15415c0ec3c5862f437f440632eff1e94df
Submitter: Jenkins
Branch: master

commit c07ed15415c0ec3c5862f437f440632eff1e94df
Author: Phil Day <email address hidden>
Date: Fri Jan 24 15:43:20 2014 +0000

    Power off commands should give guests a chance to shutdown

    Currently in libvirt, operations which power off an instance, such as stop,
    shelve, rescue, and resize, simply destroy the underlying VM. Some
    guest OSes do not react well to this type of power failure, and so it would
    be better if these operations followed the same approach as soft_reboot
    and gave the guest a chance to shut down gracefully.

    The shutdown behavior is defined by two values:

    - shutdown_timeout defines the overall period a Guest is allowed to
    complete its shutdown. The default value is set via nova.conf and can be
    overridden on a per-image basis by image metadata, allowing different types
    of guest OS to specify how long they need to shut down cleanly.

    - shutdown_retry_interval defines how frequently within that period
    the Guest will be signaled to shutdown. This is a protection against
    guests that may not be ready to process the shutdown signal when it
    is first issued (e.g. still booting). This is defined as a constant.

    This is one of a set of changes that will eventually expose the choice
    of whether to give the GuestOS a chance to shutdown via the API.

    This change implements the libvirt changes to power_off() and adds
    a clean shutdown to compute.manager.stop().

    Subsequent patches will:
    - Add clean shutdown to Shelve
    - Add clean shutdown to Rescue
    - Convert soft_reboot to use the same approach
    - Expose clean shutdown via rpcapi
    - Expose clean shutdown via API

    Partially-Implements: blueprint user-defined-shutdown
    Closes-Bug: #1196924
    DocImpact

    Change-Id: I432b0b0c09db82797f28deb5617f02ee45a4278c
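The interaction between shutdown_timeout and shutdown_retry_interval can be sketched in Python as below. This is an illustration under stated assumptions, not nova's libvirt driver: the guest interface (request_shutdown(), is_shut_down(), destroy()), the image property name shutdown_timeout_override, and treating a timeout of 0 as "no clean shutdown" are all hypothetical. With libvirt-python, the request and the hard stop would plausibly correspond to virDomain.shutdown() and virDomain.destroy(). The guest is re-signalled every retry interval in case it was not ready for the first request, and is destroyed only when the overall timeout expires.

```python
def effective_timeout(default_timeout, image_properties):
    """Per-image override of the configured default (property name is hypothetical)."""
    value = image_properties.get("shutdown_timeout_override")
    return int(value) if value is not None else default_timeout


def clean_power_off(guest, timeout, retry_interval, clock, sleep):
    """Ask the guest to shut down, re-signalling every retry_interval seconds.

    Returns True on a clean shutdown. If `timeout` expires (or is 0),
    the guest is destroyed and False is returned.
    """
    deadline = clock() + timeout
    next_signal = clock()
    while clock() < deadline:
        if guest.is_shut_down():
            return True
        if clock() >= next_signal:
            # Re-send in case the guest missed an earlier request (e.g. still booting).
            guest.request_shutdown()
            next_signal = clock() + retry_interval
        sleep(min(1.0, deadline - clock()))
    if guest.is_shut_down():
        return True
    guest.destroy()  # Hard power-off as the last resort.
    return False


class FakeClock:
    """Simulated clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


class FakeGuest:
    """Ignores shutdown requests before `ready_at`, then stops `delay` s after one."""

    def __init__(self, clock, ready_at, delay):
        self.clock, self.ready_at, self.delay = clock, ready_at, delay
        self.requests = 0
        self.off_at = None
        self.destroyed = False

    def request_shutdown(self):
        self.requests += 1
        if self.off_at is None and self.clock() >= self.ready_at:
            self.off_at = self.clock() + self.delay

    def is_shut_down(self):
        return self.destroyed or (self.off_at is not None and self.clock() >= self.off_at)

    def destroy(self):
        self.destroyed = True


clock = FakeClock()
# Guest ignores shutdown requests while "booting" for its first 25 s.
guest = FakeGuest(clock, ready_at=25, delay=5)
timeout = effective_timeout(60, {})  # no image override, so the default applies
ok = clean_power_off(guest, timeout, retry_interval=10, clock=clock, sleep=clock.sleep)
print(ok, guest.requests, clock.now)  # True 4 35.0
```

Here the requests at 0, 10 and 20 s arrive while the guest is still booting; the fourth, at 30 s, is handled and the guest shuts down cleanly at 35 s instead of being destroyed, which is the case the retry interval is meant to cover.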

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2014-10-07
Changed in nova:
milestone: next → none
status: Fix Committed → Fix Released
santhi swaroop (leoswaroop87) wrote :

Hi,

I am currently using Nova version 2.17.0. Is this fix available in that version?

melanie witt (melwitt) wrote :

Hi santhi, 2.17.0 is a version of the novaclient, but this fix is in the nova server, release 2014.2.b3 and later.

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/151515
Reason: This is too risky for stable/icehouse at this point, we're just about to end of life icehouse so we really can only take security/critical and/or trivial fixes for high impact bugs at this point.

Liang Chen (cbjchen) on 2015-07-01
Changed in nova (Ubuntu):
assignee: nobody → Liang Chen (cbjchen)
Liang Chen (cbjchen) on 2015-07-01
description: updated

The attachment "trusty nova patch" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Liang Chen (cbjchen) on 2015-07-11
Changed in nova (Ubuntu):
status: New → In Progress
Liang Chen (cbjchen) on 2015-07-14
description: updated
Liang Chen (cbjchen) on 2015-07-14
Changed in nova (Ubuntu Trusty):
assignee: nobody → Liang Chen (cbjchen)
Changed in nova (Ubuntu Trusty):
status: New → In Progress
James Page (james-page) on 2015-07-16
Changed in nova (Ubuntu):
status: In Progress → Fix Released

Hello Phil, or anyone else affected,

Accepted nova into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/1:2014.1.5-0ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
Liang Chen (cbjchen) wrote :

The proposed build has been deployed and tested, and it works as expected.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 1:2014.1.5-0ubuntu1.2

---------------
nova (1:2014.1.5-0ubuntu1.2) trusty; urgency=medium

  * Add rsyslog retry support (LP: #1459046)
    - d/p/add-support-for-syslog-connect-retries.patch
  * Add vm clean shutdown support (LP: #1196924)
    - d/p/clean-shutdown.patch

 -- Edward Hope-Morley <email address hidden> Thu, 16 Jul 2015 11:55:57 +0100

Changed in nova (Ubuntu Trusty):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
