OpenStack Compute (Nova)

'nova reboot' under KVM always does a hard reboot

Reported by David Kranz on 2012-02-23
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Vish Ishaya
Milestone: 2012.1

Bug Description

I really hesitate to report this as a bug because it is so hard to believe, but a user reported it and I can reproduce it.

1. Booted a vm using oneiric-server-cloudimg-amd64 and assigned a floating IP
2. SSH to the vm and run:
   sudo apt-get install -y git
   git clone git://github.com/rackspace/python-novaclient.git
3. nova reboot
4. SSH in again. The python-novaclient directory is intact but all the files have zero length

This happens often but not always. I had been using these vms myself without such a problem.

This was running stable-diablo on oneiric with kvm. I don't know what other info would be useful, but on the vm:

ubuntu@buggy:~$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vda 10321208 687340 9109592 8% /
devtmpfs 1026112 4 1026108 1% /dev
none 205648 196 205452 1% /run
none 5120 0 5120 0% /run/lock
none 1028228 0 1028228 0% /run/shm
/dev/vdb 20642428 176196 19417656 1% /mnt
ubuntu@buggy:~$ mount
/dev/vda on / type ext4 (rw)
none on /proc type proc (rw,noexec,nosuid,nodev)
none on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
devtmpfs on /dev type devtmpfs (rw,mode=0755)
none on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
none on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
/dev/vdb on /mnt type ext3 (rw)

David Kranz (david-kranz) wrote :

After too much manual retrying of this, I conclude with high probability that it does not happen if I reboot from the shell of the vm, only when I reboot through the nova API. Furthermore, once I reboot from the vm shell, a subsequent reboot through nova does not zero the files. Strange. I wanted to try this on trystack, but reboot from the dashboard is not working.

Thierry Carrez (ttx) wrote :

I suspect nova reboot does a hard reboot while files haven't been flushed to disk yet, whereas rebooting from inside the vm does a soft reboot that flushes disks first. In that case I'm not sure it should be considered a bug... What happens if you do a "sync" before nova reboot?
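
For illustration, here is a minimal sketch of what "flushed to disk" means from inside the guest, assuming Python 3 is available there. The names `write_durably` and `demo` are made up for this example and have nothing to do with nova. One plausible reading of the symptom is that the directory entries had reached the disk while the file data was still sitting in the guest's page cache when the power was cut.

```python
import os
import tempfile


def write_durably(path, data):
    """Write bytes to path and force them to stable storage before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to write this file's data to disk


def demo():
    """Write a file durably in a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        write_durably(path, b"hello")
        with open(path, "rb") as f:
            return f.read()
```

Running "sync" in the guest shell has a similar effect for all dirty data at once, which is why it is a useful test before a hard reboot.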

Changed in nova:
status: New → Incomplete

David Kranz (david-kranz) wrote :

I tried doing a sync and the problem did not happen. However, 'nova reboot' is documented to do a SOFT reboot unless you give it the --hard flag. I verified using the --debug flag that it is indeed doing SOFT. Any ideas for further debugging this? Or is even a SOFT reboot not supposed to flush?

send: u'POST /v1.1/testproject/servers/d231cdf2-1967-41f8-9528-6eadee5fd7f1/action HTTP/1.1\r\nHost: 172.18.0.131:8774\r\nContent-Length: 28\r\nx-auth-project-id: testproject\r\naccept-encoding: gzip, deflate\r\naccept: application/json\r\nx-auth-token: 866223043243cc5b0fa536d6cb3094acbefd817e\r\nuser-agent: python-novaclient\r\ncontent-type: application/json\r\n\r\n{"reboot": {"type": "SOFT"}}'
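
The request in the trace above can be reproduced with a small sketch. The helper name `reboot_request` is hypothetical; only the URL path layout and the JSON body are taken from the trace.

```python
import json


def reboot_request(tenant, server_id, reboot_type="SOFT"):
    """Return (path, body) for the server reboot action seen in the trace."""
    if reboot_type not in ("SOFT", "HARD"):
        raise ValueError("reboot type must be SOFT or HARD")
    path = "/v1.1/%s/servers/%s/action" % (tenant, server_id)
    # json.dumps uses ', ' and ': ' separators by default, which matches
    # the body that python-novaclient sent.
    body = json.dumps({"reboot": {"type": reboot_type}})
    return path, body
```

For the default SOFT type the body is 28 bytes long, which agrees with the Content-Length header in the trace.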

David Kranz (david-kranz) wrote :

This TODO in the diablo code appears to be the culprit: reboot_type is validated but never passed to compute_api.reboot(). It was changed in essex. I will test in essex, but this bug might be considered for a diablo backport.

    def _action_reboot(self, input_dict, req, id):
        if 'reboot' in input_dict and 'type' in input_dict['reboot']:
            valid_reboot_types = ['HARD', 'SOFT']
            reboot_type = input_dict['reboot']['type'].upper()
            if not valid_reboot_types.count(reboot_type):
                msg = _("Argument 'type' for reboot is not HARD or SOFT")
                LOG.exception(msg)
                raise exc.HTTPBadRequest(explanation=msg)
        else:
            msg = _("Missing argument 'type' for reboot")
            LOG.exception(msg)
            raise exc.HTTPBadRequest(explanation=msg)
        try:
            # TODO(gundlach): pass reboot_type, support soft reboot in
            # virt driver
            self.compute_api.reboot(req.environ['nova.context'], id)
        except Exception, e:
            LOG.exception(_("Error in reboot %s"), e)
            raise exc.HTTPUnprocessableEntity()
        return webob.Response(status_int=202)
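
To make the gap concrete, here is a rough sketch of a handler that forwards the validated type instead of dropping it. This is not the essex code. `FakeComputeAPI`, `action_reboot` and the three-argument `reboot()` signature are assumptions made up for the example, and plain `ValueError` stands in for the webob HTTP errors.

```python
VALID_REBOOT_TYPES = ("HARD", "SOFT")


class FakeComputeAPI(object):
    """Hypothetical stand-in for the compute API; records what it was asked."""

    def __init__(self):
        self.calls = []

    def reboot(self, context, instance_id, reboot_type):
        self.calls.append((instance_id, reboot_type))


def action_reboot(compute_api, context, input_dict, instance_id):
    """Validate the request body, then forward the reboot type downstream."""
    try:
        reboot_type = input_dict["reboot"]["type"].upper()
    except (KeyError, TypeError, AttributeError):
        raise ValueError("Missing argument 'type' for reboot")
    if reboot_type not in VALID_REBOOT_TYPES:
        raise ValueError("Argument 'type' for reboot is not HARD or SOFT")
    # This is the step the TODO leaves out: pass reboot_type along.
    compute_api.reboot(context, instance_id, reboot_type)
    return 202
```

With something like this in place, the virt driver at least gets the chance to tell a SOFT request from a HARD one.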

Vish Ishaya (vishvananda) wrote :

Soft reboot isn't implemented in nova's libvirt driver.

It looks like it could be done relatively easily by calling dom.shutdown(), waiting for the domain to power off, and then starting it again, instead of doing the hard version, which is destroy/create.

Vish
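
The shutdown/wait/start sequence described above can be sketched roughly as follows. This is only an illustration, not the eventual fix. `FakeDomain` is a hypothetical stand-in that mimics the `shutdown()`, `info()` and `create()` methods of a libvirt domain object so the example runs without a hypervisor, and `soft_reboot`, the timeout and the poll interval are assumptions.

```python
import time

SHUTOFF = 5  # same value as libvirt.VIR_DOMAIN_SHUTOFF in the Python bindings


class FakeDomain(object):
    """Hypothetical stand-in exposing the domain methods used below."""

    def __init__(self, polls_until_off=2):
        self.state = 1  # running
        self._polls_left = polls_until_off
        self._shutdown_requested = False
        self.events = []

    def shutdown(self):
        self.events.append("shutdown")
        self._shutdown_requested = True

    def info(self):
        # Real libvirt returns [state, maxMem, memory, nrVirtCpu, cpuTime].
        if self._shutdown_requested and self.state != SHUTOFF:
            self._polls_left -= 1
            if self._polls_left <= 0:
                self.state = SHUTOFF
        return [self.state, 0, 0, 1, 0]

    def create(self):
        self.events.append("create")
        self.state = 1
        self._shutdown_requested = False


def soft_reboot(dom, timeout=120.0, poll_interval=1.0, sleep=time.sleep):
    """Ask the guest to shut down, wait for power-off, then start it again.

    Returns True if the guest powered off and was restarted, or False if it
    did not power off within the timeout (for example, no ACPI handling).
    """
    dom.shutdown()
    waited = 0.0
    while waited < timeout:
        if dom.info()[0] == SHUTOFF:
            dom.create()
            return True
        sleep(poll_interval)
        waited += poll_interval
    return False
```

A False return leaves the guest running, so the caller can decide whether to give up or force the hard destroy/create path.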

On Feb 24, 2012, at 6:37 AM, David Kranz wrote:

> I tried doing a sync and the problem did not happen. However, 'nova
> reboot' is documented to do a SOFT reboot unless you give it the --hard
> flag. I verified using the --debug flag that it is indeed doing SOFT.
> Any ideas for further debugging this? Or is even a SOFT reboot not
> supposed to flush?

David Kranz (david-kranz) wrote :

This is a pretty serious bug and I hope it can be fixed. I think it is not the only case where the compute API defines something but some hypervisor doesn't do it right. What is the "official" story about these deviations? I would think that the driver for any hypervisor that is supported in nova core (i.e. can be mentioned by name in nova.conf) needs to honor the core API.

Vish Ishaya (vishvananda) wrote :

Agreed.

We tried to minimize all of the feature gaps during this release. Looks like we missed this one. It seems like a relatively easy addition if someone wants to tackle it.

Vish

On Feb 24, 2012, at 11:56 AM, David Kranz wrote:

> This is a pretty serious bug and hope it can be fixed. I think it is not
> the only case where the compute API defines something but some
> hypervisor doesn't do it right. What is the "official" story about these
> deviations? I would think that the driver for any hypervisor that is
> supported in nova core (i.e. can be mentioned by name in nova.conf)
> needs to honor the core API.

David Kranz (david-kranz) wrote :

There is a comment in the essex code about shutdown not working with kvm. This seems to be relevant:

http://wiki.libvirt.org/page/Tips#Debian.2FUbuntu_guests_under_KVM_don.27t_shut_down_properly

It suggests that if the guest installs a certain package then the shutdown will work.

Vish Ishaya (vishvananda) wrote :

It still seems like we could implement it and just document the fact that you need guest config for it to work. Users can always use --hard if need be.

On Feb 25, 2012, at 1:43 PM, David Kranz wrote:

> There is a comment in the essex code about shutdown not working with
> kvm. This seems to be relevant:
>
> http://wiki.libvirt.org/page/Tips#Debian.2FUbuntu_guests_under_KVM_don.27t_shut_down_properly
>
> It suggests that if the guest installs a certain package then the
> shutdown will work.

Thierry Carrez (ttx) on 2012-02-27
summary: - Files in vm sometimes zeroed after 'nova reboot'
+ 'nova reboot' under KVM always does a hard reboot
Changed in nova:
importance: Undecided → High
status: Incomplete → Triaged

David Kranz (david-kranz) wrote :

Vish, I agree. I wonder if the Ubuntu folks consider it a bug that their cloud image does not handle this out-of-the-box?

Thierry Carrez (ttx) wrote :

If it's not a bug, it's at the very least a wanted feature. Let me check with them.

Thierry Carrez (ttx) on 2012-02-28
Changed in nova:
milestone: none → essex-4
assignee: nobody → Vish Ishaya (vishvananda)

Fix proposed to branch: master
Review: https://review.openstack.org/4747

Changed in nova:
status: Triaged → In Progress
Thierry Carrez (ttx) on 2012-03-01
Changed in nova:
milestone: essex-4 → essex-rc1

Reviewed: https://review.openstack.org/4747
Committed: http://github.com/openstack/nova/commit/2efb017a06afeb10b474245455310ec21601a701
Submitter: Jenkins
Branch: master

commit 2efb017a06afeb10b474245455310ec21601a701
Author: Vishvananda Ishaya <email address hidden>
Date: Wed Feb 29 17:29:02 2012 -0800

    Adds soft-reboot support to libvirt

     * Falls back to hard reboot if guest doesn't respond
     * Cleans up reboot/rescue/unrescue interaction
     * Fixed fake for tests
     * Added a force hard reboot test to verify fallback works
     * Fixes bug 939557

    Change-Id: I8d0c9a35725de5e5bfb8f13a2d869c6122ba44ef
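
The fallback behaviour listed in the commit message can be pictured with a short sketch. It is an illustration of the idea only, not the committed code. The function name `reboot` and the `soft`/`hard` callables are hypothetical.

```python
def reboot(reboot_type, soft, hard):
    """Try a soft reboot when asked for one; fall back to hard if it fails.

    `soft` is a callable returning True when the guest shut down and came
    back on its own; `hard` is a callable that forces the reboot.
    Returns the kind of reboot that actually happened.
    """
    if reboot_type == "SOFT" and soft():
        return "SOFT"
    # Either a hard reboot was requested or the guest ignored the soft one.
    hard()
    return "HARD"
```

Keeping the decision in one place means a guest that ignores the shutdown request still ends up rebooted, at the cost of losing unflushed data as in the original report.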

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2012-03-20
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2012-04-05
Changed in nova:
milestone: essex-rc1 → 2012.1