Comment 2 for bug 1236930

Phil Frost (bitglue) wrote : Re: [Bug 1236930] Re: attempting to reboot a shutdown instance appears to have failed, but then suprisingly succeeds two minutes later

On 10/10/2013 03:47 AM, Jay Lau wrote:
> So it seems to be a valid case. Phil, what do you think? Thanks.

I agree it's a valid case, and the logic makes sense once everything
has happened and been investigated. The problem is that the two minutes
between the failed soft reboot and the eventual hard reboot are really
confusing. Here's what went through my mind:

- let's reboot the instance
- hum...that's taking a while. Why?
- the logs say it failed, but the API indicates that it's still rebooting.
- let's see if I can reproduce
- let's file a bug report, and manually reset the instance state in the
  database (I've run into this before, with other operations)
- what the hell? My instance is running now!

Besides being confusing, it's also unnecessarily slow. In those two
minutes between soft and hard reboot attempts, nothing else can be done
to the instance.

I think this could be avoided in two ways:

1) the reboot procedure can check if the instance is not running, and if
so, just start it, instead of attempting to reboot it, since that's
bound to fail

2) the first soft reboot attempt can do a better job of checking for
failures, and if they are encountered, bypass the two minute timeout and
proceed directly to the hard reboot attempt.
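
To make the two suggestions concrete, here is a rough sketch of how they
might fit together. It assumes a hypothetical driver interface: FakeDriver,
PowerState, SoftRebootFailed and reboot() are illustrative names only, not
nova's actual API, and the real reboot path is certainly more involved.

```python
import enum


class PowerState(enum.Enum):
    RUNNING = "running"
    SHUTDOWN = "shutdown"


class SoftRebootFailed(Exception):
    """Hypothetical error: the soft reboot failed in a detectable way."""


class FakeDriver:
    """Hypothetical stand-in for a hypervisor driver; records calls made."""

    def __init__(self, state, soft_reboot_fails=False):
        self.state = state
        self.soft_reboot_fails = soft_reboot_fails
        self.calls = []

    def power_state(self):
        return self.state

    def start(self):
        self.calls.append("start")
        self.state = PowerState.RUNNING

    def soft_reboot(self):
        self.calls.append("soft_reboot")
        if self.soft_reboot_fails:
            raise SoftRebootFailed("guest did not respond")

    def hard_reboot(self):
        self.calls.append("hard_reboot")
        self.state = PowerState.RUNNING


def reboot(driver):
    # Suggestion 1: an instance that is not running cannot be soft
    # rebooted, so just start it.
    if driver.power_state() is not PowerState.RUNNING:
        driver.start()
        return
    # Suggestion 2: if the soft reboot fails in a detectable way, go
    # straight to the hard reboot instead of waiting for a timeout.
    try:
        driver.soft_reboot()
    except SoftRebootFailed:
        driver.hard_reboot()
```

With this shape, a shut-down instance only ever sees a start call, and a
detected soft reboot failure is followed immediately by the hard reboot,
so the instance is never stuck in a "rebooting" state for two minutes.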