Tempest has no test for soft reboot

Bug #1014647 reported by David Kranz
This bug affects 4 people
Affects                   Status     Importance  Assigned to  Milestone
OpenStack Compute (nova)  Invalid    Medium      Unassigned
tempest                   Won't Fix  Medium      Unassigned

Bug Description

1. Soft reboot requires support from the guest to operate. The current nova implementation tells the guest to reboot and then waits. If the soft reboot does not happen, nova triggers a hard reboot, but only after a default wait of 2 minutes.

Solution: Provide a new option, soft_reboot_image_ref, defaulting to None, that names the image used for soft reboot tests. If the value is None, the test is skipped.

2. Because of (1), we should only use soft reboot when we are actually testing that feature.

3. The current soft reboot test does not check that a soft reboot was done rather than a hard one. It should check for the server state of REBOOT. The hard reboot test has the same issue.
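
The skip behaviour proposed in (1) could look roughly like the sketch below. Only the option name soft_reboot_image_ref comes from this report; the ComputeConfig class, the CONF object, and the test body are hypothetical stand-ins, not Tempest's actual code.

```python
import unittest


class ComputeConfig:
    """Hypothetical stand-in for the compute section of a test config."""

    def __init__(self, soft_reboot_image_ref=None):
        # Defaults to None, meaning no soft-reboot-capable image is known.
        self.soft_reboot_image_ref = soft_reboot_image_ref


CONF = ComputeConfig()


class SoftRebootTest(unittest.TestCase):
    def setUp(self):
        # Skip unless the deployer named an image that supports soft reboot.
        if CONF.soft_reboot_image_ref is None:
            self.skipTest("soft_reboot_image_ref is not set")

    def test_soft_reboot(self):
        # A real test would boot CONF.soft_reboot_image_ref and request a
        # SOFT reboot here; this placeholder only checks the option.
        self.assertIsNotNone(CONF.soft_reboot_image_ref)
```

With the default of None the test is reported as skipped rather than failed, so deployments without a suitable image do not pay the 2 minute fallback wait.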

Tags: api tempest
Changed in tempest:
assignee: nobody → David Kranz (david-kranz)
David Kranz (david-kranz) wrote :

With regard to (3) we can't really check for REBOOT vs REBOOT_HARD because those states are ephemeral.

David Kranz (david-kranz) wrote :

I filed https://bugs.launchpad.net/openstack-manuals/+bug/1017543 to get the spec settled. It also seems, from an IRC conversation with maoy, that the eventual "success" of the soft reboot we are seeing is actually a timeout that just sets the state to ACTIVE with no reboot. There are a bunch of nova bugs related to this, some fixed and some not. I am going to disable this test until things get more settled.

Jay Pipes (jaypipes)
Changed in tempest:
status: New → Triaged
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/8936

Changed in tempest:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tempest (master)

Reviewed: https://review.openstack.org/8936
Committed: http://github.com/openstack/tempest/commit/9b6129c8f042ae2defb749b41ee33312052b7280
Submitter: Jenkins
Branch: master

commit 9b6129c8f042ae2defb749b41ee33312052b7280
Author: David Kranz <email address hidden>
Date: Mon Jun 25 12:10:48 2012 -0400

    Skip slow/buggy soft reboot test until bug 1014647 is dealt with.

    Change-Id: I41cfa8075214a178fe986cd2845253bd49340400

Changed in tempest:
status: In Progress → Fix Committed
David Kranz (david-kranz) wrote : Re: Soft reboot has multiple issues

Silly jenkins. This bug is not fixed.

Changed in tempest:
status: Fix Committed → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/9503

Changed in tempest:
assignee: David Kranz (david-kranz) → Jay Pipes (jaypipes)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tempest (master)

Reviewed: https://review.openstack.org/9503
Committed: http://github.com/openstack/tempest/commit/257d3f847b06bbc3bed0fddde549e2ab7da13a05
Submitter: Jenkins
Branch: master

commit 257d3f847b06bbc3bed0fddde549e2ab7da13a05
Author: Jay Pipes <email address hidden>
Date: Sun Jul 8 23:01:31 2012 -0400

    Adds a script for tracking bug skips in tempest

    New file tools/skip_tracker.py can be used to show the
    status and priority of bugs that are marking test methods
    for skipping, and instruct the caller to remove skips
    on bugs that have been fixed in upstream. Output looks like this:

    jpipes@uberbox:~/repos/tempest$ python tools/skip_tracker.py
    INFO: Total bug skips found: 52
    INFO: Total unique bugs causing skips: 30
    INFO: Bug # 940500 ( Medium - Fix Released)
    INFO: Bug # 963248 ( Undecided - Invalid)
    INFO: Bug # 966249 ( Undecided - Fix Released)
    INFO: Bug # 987121 ( Medium - Fix Released)
    INFO: Bug # 988920 ( Undecided - Opinion)
    INFO: Bug # 997725 ( Medium - Fix Released)
    INFO: Bug # 999084 ( Medium - Triaged)
    INFO: Bug # 999209 ( Low - Fix Released)
    INFO: Bug # 999219 ( High - Triaged)
    INFO: Bug # 999567 ( Medium - Fix Released)
    INFO: Bug # 999594 ( Medium - In Progress)
    INFO: Bug # 999608 ( Low - Fix Released)
    INFO: Bug #1002892 ( Undecided - Invalid)
    INFO: Bug #1002901 ( Undecided - Invalid)
    INFO: Bug #1002911 ( Undecided - Invalid)
    INFO: Bug #1002918 ( Undecided - Invalid)
    INFO: Bug #1002924 ( Undecided - Incomplete)
    INFO: Bug #1002926 ( Undecided - Invalid)
    INFO: Bug #1002935 ( Undecided - Invalid)
    INFO: Bug #1004007 ( Low - Confirmed)
    INFO: Bug #1004564 ( Low - Confirmed)
    INFO: Bug #1005397 ( Undecided - Invalid)
    INFO: Bug #1005423 ( Low - Triaged)
    INFO: Bug #1006033 ( Undecided - New)
    INFO: Bug #1006725 ( Low - Triaged)
    INFO: Bug #1006857 ( Low - Confirmed)
    INFO: Bug #1006875 ( Low - Confirmed)
    INFO: Bug #1014647 ( Medium - Confirmed)
    INFO: Bug #1014683 ( Undecided - New)
    INFO: Bug #1022411 ( Undecided - In Progress)
    The following bugs have been fixed and the corresponding skips
    should be removed from the test cases:

       940500
       966249
       987121
       997725
       999209
       999567
       999608

    Change-Id: Ic58fc8beb2f6134504d4eb2f6ebe40fa24fe06f6

Changed in tempest:
status: In Progress → Fix Committed
Changed in tempest:
status: Fix Committed → In Progress
Andrea Frittoli (andrea-frittoli) wrote : Re: Soft reboot has multiple issues

For soft reboot we would need two tests, one with an image which reacts to ACPI signals, and one with an image which does not react. It should be ok to use the same Linux image with acpid running for the first test and not running for the second test.

The problem remains of being able to distinguish a SOFT reboot from a HARD reboot. If we had an interface into the virtualization layer, we could check the domain ID. In case of a HARD reboot the domain is recreated and the ID will be different; this is true at least when using libvirt to access qemu/kvm.
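
A minimal sketch of that domain ID check follows. The function name classify_reboot and both callables are hypothetical; how the ID is actually read (for example through libvirt-python's virDomain.ID()) is deliberately left to the caller, and the assumption that a soft reboot keeps the ID is the one stated above for libvirt with qemu/kvm.

```python
def classify_reboot(get_domain_id, do_reboot):
    """Return "HARD" if the domain ID changed across the reboot, else "SOFT".

    get_domain_id: callable returning the hypervisor's current domain ID.
    do_reboot: callable that requests the reboot and returns once the
               server is ACTIVE again.
    """
    before = get_domain_id()
    do_reboot()
    after = get_domain_id()
    # Assumption from the comment above: a hard reboot recreates the domain,
    # so it comes back under a different ID.
    return "HARD" if after != before else "SOFT"
```

Passing the lookups in as callables keeps the comparison independent of any particular hypervisor binding, which matters because Tempest normally has no access to the virtualization layer.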

Jay Pipes (jaypipes)
Changed in tempest:
assignee: Jay Pipes (jaypipes) → nobody
tags: added: tempest
Scott Moser (smoser) wrote :

@Andrea,
another way to distinguish between SOFT and HARD would be to launch an instance that supports SOFT, and then break it from inside (i.e., by removing /sbin/reboot, or killing acpid).
cirros 0.3.1 *should* support ACPI (SOFT) reboot, so this could be accomplished.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/27558

Changed in tempest:
assignee: nobody → Hoisaleshwara Madan V S (mahoisal)
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/27584

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/27704

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/28394

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/28788

Attila Fazekas (afazekas) wrote : Re: Soft reboot has multiple issues

Can we expect that the images used by heat are able to soft-reboot?

Changed in tempest:
status: In Progress → Incomplete
Sean Dague (sdague)
Changed in tempest:
status: Incomplete → Invalid
David Kranz (david-kranz) wrote :

I'm not sure why this bug was closed. We still have no test verifying that a soft reboot request actually does a soft rather than a hard reboot.

summary: - Soft reboot has multiple issues
+ Tempest has no test for soft reboot
Changed in tempest:
status: Invalid → Confirmed
Attila Fazekas (afazekas) wrote :

Cirros is able to do soft reboot.

If you decrease the build timeout below the soft reboot timeout (120 sec), the basic minimum scenario will fail if nova falls back to the hard reboot.

The other way is to modify the acpid action for the power button so that it does something more visible than just a reboot.
But in this case you assume that the image uses acpid to handle the event (different images may handle it differently), and you also assume something about how the acpid configuration files are organized.
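
The timeout idea in this comment can be sketched as a polling loop whose deadline is shorter than nova's fallback wait. The names wait_for_active and SOFT_REBOOT_FALLBACK_SECS are hypothetical, get_status stands in for whatever call returns the server status, and the clock and sleep parameters exist only so the loop can be exercised without real waiting.

```python
import time

# Default wait before nova falls back to a hard reboot, per this report.
SOFT_REBOOT_FALLBACK_SECS = 120


def wait_for_active(get_status, timeout, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns "ACTIVE" or the timeout expires.

    Returns True if ACTIVE was seen before the deadline, False otherwise.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == "ACTIVE":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Calling it with a timeout below SOFT_REBOOT_FALLBACK_SECS means a server that only comes back after the hard reboot fallback is reported as a failure. It cannot tell the two reboot types apart if the guest ignores the request but the fallback wait has been configured shorter than the test timeout.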

Attila Fazekas (afazekas) wrote :
Matt Riedemann (mriedem) wrote :

Maybe the nova get-diagnostics API could be used to get the state of the backing VM while it's rebooting and tell us if soft or hard was used? Seems that would be very time sensitive (read: racey) though.

Matt Riedemann (mriedem) wrote :

Maybe nova just needs an extension to the reboot API that allows the caller to opt out of the fallback to hard reboot when the soft reboot fails. The tempest test would set that option, and if the soft reboot fails the test will fail (since the instance would be in error state).

Matt Riedemann (mriedem) wrote :

Another idea is if there was a change in the virt driver reboot API such that if the virt driver tries a soft reboot and that fails, it saves the failure and then does the hard reboot. When the hard reboot is done, the virt driver raises the exception and the compute manager saves that as an instance fault (this would be some specific SoftRebootFailed nova exception or something so the compute manager code could distinguish it).

Then the client can get the faults off the instance and see if one of them is for a soft reboot failed type situation.
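
The flow described in this comment might look like the sketch below. Nothing here is nova code: SoftRebootFailed is the exception name floated above, while driver_reboot, manager_reboot, and the plain list used for instance faults are hypothetical simplifications.

```python
class SoftRebootFailed(Exception):
    """Hypothetical exception raised when the guest ignores a soft reboot."""


def driver_reboot(soft_reboot, hard_reboot):
    """Virt driver side: try soft, fall back to hard, then report the failure."""
    try:
        soft_reboot()
    except SoftRebootFailed:
        hard_reboot()  # the instance still ends up rebooted
        raise          # but the soft failure is surfaced to the caller


def manager_reboot(instance_faults, soft_reboot, hard_reboot):
    """Compute manager side: record a soft reboot failure as an instance fault."""
    try:
        driver_reboot(soft_reboot, hard_reboot)
    except SoftRebootFailed as exc:
        instance_faults.append(type(exc).__name__)
```

A test client would then request a SOFT reboot, wait for ACTIVE, and fail if the instance's faults include SoftRebootFailed.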

Matt Riedemann (mriedem) wrote :

I think I'll draft up a nova blueprint spec for Kilo to go over some of the ideas.

Changed in nova:
status: New → Triaged
tags: added: api
Changed in nova:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tempest (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/122156

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tempest (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/122156
Reason: meh

Sean Dague (sdague)
Changed in nova:
status: Triaged → Invalid
Matt Riedemann (mriedem)
Changed in tempest:
assignee: Hoisaleshwara Madan V S (madan) → nobody
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/188687

Changed in tempest:
assignee: nobody → Alexander Gubanov (ogubanov)
status: Confirmed → In Progress
Yaroslav Lobankov (ylobankov) wrote :

It looks like the patch https://review.openstack.org/188687 doesn't solve the issue. So moving the status back to "New".

Yaroslav Lobankov (ylobankov) wrote :

Sorry, to "Confirmed".

Changed in tempest:
status: In Progress → Confirmed
assignee: Alexander Gubanov (ogubanov) → nobody
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tempest (master)

Change abandoned by Alexander Gubanov (<email address hidden>) on branch: master
Review: https://review.openstack.org/188687

Ken'ichi Ohmichi (oomichi) wrote :

"soft reboot" makes the gate unstable, and there has been no activity on this bug for a long time.
So we need to drop this bug report from our queue.

Changed in tempest:
status: Confirmed → Won't Fix
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tempest (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/647718

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tempest (master)

Reviewed: https://review.opendev.org/647718
Committed: https://git.openstack.org/cgit/openstack/tempest/commit/?id=104e0b731a4b44e1d1f3e9092a1d1426809fb78b
Submitter: Zuul
Branch: master

commit 104e0b731a4b44e1d1f3e9092a1d1426809fb78b
Author: afazekas <email address hidden>
Date: Tue Mar 26 12:38:29 2019 +0100

    Delete test_reboot_server_soft

    The test has been skipped for more than 6 years already.

    Nova automatically switched to hard reboot if the guest is
    not responding, no way to see the difference from the API.

    The following alternatives were not agreed on at the PTG.
     - Change the acpid config on all supported images before the reboot,
       and log the acpi event in a tempest friendly way
     - ssh the machine before reboot, in this case more likely a
       soft reboot would happen, but we are unable to distinguish it from
       the hard one.

    Take into account that the minimum scenario test uses soft reboot
    and the nova functional test also covers reboot.

    The test hasn't failed for more than 6 years (it's been skipped),
    so nothing prevents us from removing it by the usual removal procedure:

     https://docs.openstack.org/tempest/latest/test_removal.html

    The test deletion was also announced on ML:
     http://lists.openstack.org/pipermail/openstack-discuss/2020-October/017889.html

    Change-Id: I62b48865f5b21e55c28b8ee08ad5786473cc5ddf
    Related-Bug: #1014647
