Should shut down domains on system shutdown

Bug #350936 reported by Alexandre Kandalintsev on 2009-03-29
This bug affects 53 people
Affects             Importance   Assigned to
kvm (Ubuntu)        Undecided    Unassigned
  Lucid             Undecided    Unassigned
  Maverick          Undecided    Unassigned
  Natty             Undecided    Unassigned
libvirt (Ubuntu)    High         Serge Hallyn
  Lucid             High         Unassigned
  Maverick          High         Unassigned
  Natty             Undecided    Unassigned

Bug Description

===========================
SRU Justification
1. Impact: libvirt VMs must be manually shut down before host shutdown, else they will be corrupted.
2. How bug was addressed: the libvirt upstart job will now try to shut down all VMs before the host reboots or shuts down
3. patch: see the linked bzr tree (or https://code.launchpad.net/~serge-hallyn/ubuntu/oneiric/libvirt/fix-shutdown/+merge/70245/+preview-diff/+files/preview.diff)
4. TEST CASE: start a KVM vm under libvirt, then reboot the host.
5. Regression potential: shutting down a host can now be delayed until the VMs shut down or the shutdown attempts time out. This should be better than having VMs corrupted, and won't happen if VMs are not running; but could conceivably be seen as a regression.
===========================

I'm using the autostart feature of libvirtd (my guest is defined in /etc/libvirt/qemu/autostart/). It starts normally, but if I reboot the server the guest filesystem gets corrupted because the guest can't stop in time:

WARNING: / was not properly dismounted

We need to give more time to libvirtd to stop all guests.

I'm running jaunty amd64 with all updates.

affects: ubuntu → kvm (Ubuntu)

This is not a KVM issue but rather a libvirt issue. libvirt should delay shutdown until the guests have fully shut down.

Changed in kvm (Ubuntu):
status: New → Invalid
Chuck Short (zulcss) wrote :

Hi,

Which version of libvirt is this with?

Regards
chuck

Changed in libvirt (Ubuntu):
status: New → Incomplete
Changed in libvirt (Ubuntu):
importance: Undecided → Medium

I used the jaunty version of libvirt. I'm on Debian now, so I can't tell you exactly which version it was.

Are you sure this information is necessary? Does the libvirt rc script have any ability to wait for virtual machines at all?

Dustin Kirkland  (kirkland) wrote :

I can hack this into the upstart script. There will need to be a configurable timeout, though, on how long to delay shutdown. I suggest 5 minutes.

Changed in libvirt (Ubuntu):
status: Incomplete → Triaged
assignee: nobody → Dustin Kirkland (kirkland)
importance: Medium → Low
summary: - kvm stop rc
+ Should shut down domains on system shutdown

I think 5 minutes is very long for servers... I suggest two minutes.

Alessandro Bono (a.bono) wrote :

Hi

Any news on this bug? I want to convert my Xen server to libvirt+kvm, but this bug is really serious for me: it's not possible to automate shutdown (UPS or time based), or simply "press the power button for a clean shutdown", without significant modifications to the libvirt upstart script. Any chance of updating the upstart script before the Lucid release?

thanks

Francesco Pretto (ceztko) wrote :

I'm not sure it can be done by hacking the upstart job. When libvirt was a sysvinit script in Ubuntu 9.10, I used this [1] script, adding it to the "stop" function. In Lucid, I added the same script to /etc/init/libvirt-bin.conf with the stanza:

pre-stop exec shut-guests.sh > /var/log/shut-guests.log 2>&1

This works if I manually invoke a stop of libvirt-bin in a running system with:
$ sudo stop libvirt-bin

but **doesn't** work when issuing a reboot or halt. It seems all kvm processes are aggressively killed **before** the pre-stop stanza is executed, so libvirt doesn't find any running guests when shut-guests.sh runs. It's not clear to me why, or whether this is expected behavior. For sure, the kvm processes are detached from the libvirt process tree.

[1] http://www.linux-kvm.com/content/stop-script-running-vms-using-virsh

Francesco Pretto (ceztko) wrote :

Workaround:

Add the lines:

        /usr/local/bin/shut-guests.sh >> /var/log/shut-guests.log 2>&1
        /sbin/initctl emit guests-shutted

just after "do_stop() {" in /etc/init.d/sendsigs

*and* change the line:

stop on (runlevel [!2345])

in the file /etc/init/libvirt-bin.conf to:

stop on guests-shutted

Ugly, but it works, until the libvirt/kvm/distribution crews care more about data integrity in VM guests during shutdown (sigh).

Christopher Hlubek (hlubek) wrote :

This is a very important issue for server virtualization. If Ubuntu wants to be on par with Xen on other distributions, this bug has to be fixed!

Soren Hansen (soren) wrote :

Note to anyone implementing this (maybe I'll get around to it myself, but not right now):

It's important to remember that libvirt intentionally does not kill VMs on termination. This enables us to upgrade libvirt without interrupting running VMs, so we only want to shut down VMs conditionally. What I've done on one of my Hardy boxes is to check $0 to see if the init script was invoked as K??libvirt-bin. If so, I assume we're being shut down and then I go and shut down the virtual machines. If invoked any other way, it just stops libvirt, not the VMs. To achieve something similar with upstart jobs, there's an environment variable that tells us why libvirt-bin is being stopped. This will reveal whether it's being stopped as a result of shutdown being called or whatnot.
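
The $0 check described in this comment might look roughly like the following. This is only a minimal sketch, not the script from that Hardy box: the function name called_as_kill_link is hypothetical, and it assumes the rc symlinks follow the usual "K" plus two digits plus "libvirt-bin" naming.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the given script path looks like an
# rc "kill" symlink for libvirt-bin, e.g. /etc/rc0.d/K20libvirt-bin.
called_as_kill_link() {
    case "$(basename "$1")" in
        K[0-9][0-9]libvirt-bin) return 0 ;;
        *) return 1 ;;
    esac
}
```

In the stop branch of an init script this would presumably be called as called_as_kill_link "$0", shutting down the guests only when it succeeds, so that a plain "/etc/init.d/libvirt-bin stop" leaves the VMs alone.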

Changed in libvirt (Ubuntu):
assignee: Dustin Kirkland (kirkland) → nobody
Francesco Pretto (ceztko) wrote :

2010/5/5 Soren Hansen <email address hidden>:
>
> It's important to remember that libvirt intentionally does not kill VM's
> on termination. This enables us to upgrade libvirt without interruping
> running VM's, so we want to only conditionally shut down VM's.

Correct.

> What I've
> done on one of my Hardy boxes is to check $0 to see if the init script
> was invoked as K??libvirt-bin. If so, I assume we're being shut down and
> then I go and shut down the virtual machines. If invoked any other way,
> it just stops libvirt, not the VM's. To achieve something similar with
> upstart jobs, there's an environment variable that tells us why libvirt-
> bin is being stopped. This will reveal whether it's being shut down as a
> result of shutdown being called or whatnot.
>

But the problem is that the PIDs of the VMs are not part of the libvirt-bin upstart job. Have a look at my *ugly* workaround [1]:
/etc/init.d/sendsigs simply kills, in parallel, every process that is not in upstart's pool of tracked ones. So, when upstart shuts down libvirt-bin, it almost certainly won't find any active guests, because they have already been killed by /etc/init.d/sendsigs (killing those VM processes seems to be very quick). While this is good for libvirt-bin's independence from the kvm guest processes, it makes fine-grained control of guest shutdown harder, and one may wonder whether kvm itself, rather than libvirt, should send the ACPI shutdown signal to guests.

I have another suspect: upstart may be failing here. Doesn't libvirt-bin fork to create the kvm guests? Shouldn't the "expect daemon" upstart stanza track daemons that fork more than once? If so, shouldn't the libvirt-bin upstart job track these guest processes too? I'll have to ask on the upstart mailing list.

[1] https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/350936/comments/8

Reinhard Tartler (siretart) wrote :

How about adding a pre-stop script to /etc/init/libvirt-bin.conf that uses /usr/bin/virsh to iterate over all running guests to shut them down? Probably this should be done a) in parallel and b) supervised by reasonable timeouts. So technically, it might make sense to implement that pre-stop script in C or python to access libvirt directly, or extend /usr/bin/virsh to offer an appropriate command.

Reinhard Tartler (siretart) wrote :

ignore my previous comment, I should have read the whole thread in more detail.

ossjunkie (ossjunkie) wrote :

as this bug can corrupt guests i wrote an old-school init script for my server, but that doesn't work because libvirt-bin always gets stopped first (even at S01). so i removed all bashisms like arrays and tried to create an upstart script. the "core logic" is done, but i need some help with the upstart integration:

-how can we ensure in upstart that the script runs on shutdown and reboot, but always before libvirt-bin and qemu-kvm get stopped?

-should we use return or exit in upstart scripts?

feedback highly welcome :)

BTW better output to screen and a log file wouldn't hurt

On Wed, Jun 23, 2010 at 03:53:18PM -0000, ossjunkie wrote:
> -how can we ensure in upstart that the script got run on shutdown &
> reboots but always before libvirt-bin and qemu-kvm got stopped?

I /think/ this should do the trick:

start on shutdown and stopping libvirt-bin

IIUIC, libvirt-bin will block its stop procedure until your script is
done.

--
Soren Hansen
Ubuntu Developer
http://www.ubuntu.com/

ossjunkie (ossjunkie) wrote :

thanks for the hint, but after trying i found that "start on (runlevel [06] and stopping libvirt-bin)" should do the trick. but i still have problems: i can use libvirt at first in the upstart job, but when testing the timeout loop it seems that libvirt-bin is still stopped before the script has finished.

why does libvirt-bin still stop before the libvirt-shutdown-guests job has finished the script?
do i have to modify the libvirt-bin job to wait for libvirt-shutdown-guests?
do upstart jobs also receive some sort of kill signal?
does upstart in general allow delaying the shutdown just by a loop in the script of an upstart job?

here is my current version. upstart experts required urgently ;)

BTW i have no problems when testing it with "initctl start libvirt-shutdown-guests", so any testing needs to be done on a real shutdown ;(

Soren Hansen (soren) wrote :

On Thu, Jun 24, 2010 at 01:38:04PM -0000, ossjunkie wrote:
> thanks for the hint, but after trying i found that "start on (runlevel
> [06] and stopping libvirt-bin)" should do the trick. but i still got
> problems as i can use libvirt at first in the upstart job, but when
> testing the timeout loop it seems that libvirt-bin still stopped before
> the script finished.

That shouldn't happen. You're doing the right thing with upstart. I
double checked with one of the experts. :) Something else seems to be
killing libvirt-bin. Try putting a post-stop thing in the libvirt-bin
job to log when it's being killed, just to double check.

> why does libvirt-bin still stop before the libvirt-shutdown-guests job
> has finished the script?

Good question. Upstart shouldn't be killing it, but something else
might. I can't imagine what, though.

> do i have to modify the libvirt-bin job to wait for
> libvirt-shutdown-guests?

Nope.

> does upstart jobs also receive some sort of kill signal? does upstart
> in general allow to delay the shutdown just by a loop in the script of
> a upstart job?

Yes.

> BTW i have no problems when testing it with "initctl start libvirt-
> shutdown-guests", so any testing needs to done on real shutdown ;(

> ** Attachment added: "libvirt-shutdown-guests.conf"
> http://launchpadlibrarian.net/50850625/libvirt-shutdown-guests.conf

The script still has some debugging stuff in it. That would need to go
away before we can include this in the package.

Thanks for your efforts so far! This has been a problem for a looong
time.

Francesco Pretto (ceztko) wrote :

2010/6/24 ossjunkie <email address hidden>:
> why does libvirt-bin still stop before the libvirt-shutdown-guests job has finished the script?

Because the kvm guest processes aren't tracked by the upstart job and
they are killed immediately by /etc/init.d/sendsigs. This approach
has pros (independence of libvirt from guests) and cons (this
problem) so there's no easy fix. In my previous post [1] there's an
ugly but safe workaround. If you want a definitive solution try to bug
the libvirt devs by pointing out this exact problem (/etc/init.d/sendsigs
killing guests before the libvirt-bin upstart job finishes).

[1] https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/350936/comments/8

Francesco Pretto (ceztko) wrote :

2010/6/24 Soren Hansen <email address hidden>:
>> why does libvirt-bin still stop before the libvirt-shutdown-guests job
>> has finished the script?
>
> Good question. Upstart shouldn't be killing it, but something else
> might. I can't imagine what, though.
>

I'm not sure Upstart isn't supposed to kill them. Let's suppose
upstart is able to track all pids forked for libvirt-bin job start
(and it doesn't do so, even if the "expect daemon" stanza is used:
this may be an upstart bug): there would be a pid for libvirt-bin and
many pids for kvm guests. Suppose you want to kill the libvirt-bin
job: how can you tell upstart to kill just libvirt-bin and not kvm
guests? I don't think upstart jobs are so flexible.

Soren Hansen (soren) wrote :

On Thu, Jun 24, 2010 at 08:04:48PM -0000, Francesco Pretto wrote:
>>> why does libvirt-bin still stop before the libvirt-shutdown-guests
>>> job has finished the script?
>> Good question. Upstart shouldn't be killing it, but something else
>> might. I can't imagine what, though.
> I'm not sure Upstart isn't supposed to kill them.

Not "them" (as in the kvm guests), but libvirtd itself. Besides, of
course upstart is meant to kill it (or shut it down by some other
means), but it shouldn't be doing it until all tasks that are triggered
on "stopping libvirt-bin" have completed.

> Let's suppose upstart is able to track all pids forked for
> livbvirt-bin job start (and it doesn't do so, even if the "expect
> daemon" stanza is used: this may be an upstart bug): there would be a
> pid for libvirt-bin and many pids for kvm guests. Suppose you want to
> kill the libvirt-bin job: how can you tell upstart to kill just
> libvirt-bin and not kvm guests? I don't think upstart jobs are so
> flexible.

I'm reasonably sure upstart only tracks one pid per upstart job, and as
such has no notion of the started kvm guests.

Soren Hansen (soren) wrote :

Why are you not doing this in the pre-stop part of the libvirt-bin job,
by the way?

ossjunkie (ossjunkie) wrote :

@soren: because i simply don't know how to trigger it only on shutdown, as the premise was it should run on regular stop. but i know it may ease the situation and would like to do so. do you know a solution here? maybe we could do a simple condition that checks for something in the environment that's only present on shutdown/reboot and is a reliable source. and yes, i left some ugly debugging stuff in, but that's because it isn't ready so far ;)

regarding the killing by /etc/init.d/sendsigs: we could easily test when the VMs get killed and how to prevent that while libvirtd is still accessible. so i would say we should get the upstart things right first and then go on with that. maybe we could try to track the pids of the guests and bind them somehow to the upstart job (in case we keep it separate) and prevent it that way.

Soren Hansen (soren) wrote :

On Fri, Jun 25, 2010 at 10:32:07AM -0000, ossjunkie wrote:
> @soren: as i simple don't know how to trigger it only on shutdown, as
> the premise was it should run on regular stop. but i know it may ease
> the situation and would like to do so. do you know some solution here.
> maybe we could do a simple condition that checks for an enviromental
> stuff thats only present on shutdown/reboot and is a reliable source.
> and yes i left some ugly debugging stuff in, but because it isn't
> ready so far ;)

I actually had a modified version of libvirt's init script that did what
you're doing. It checked whether $0 was called K??libvirt-bin, in which
case it was being called as part of a runlevel change. The equivalent in
the world of Upstart is probably something like the UPSTART_EVENTS
environment variable. It will not be set if you're running "stop
libvirt-bin", but will be set if the libvirt-bin job is being stopped as
a result of an Upstart event.

> regarding the killing by /etc/init.d/sendsings we could easily test
> when the VMs got killed and how to prevent when libvirtd would be
> still accessable. so i would say we should get the upstart things
> right first and then go on with that. maybe we could try to track and
> bound the pids of the guest somehow to the upstart job (in case we
> keep it seperate) and prevent it that way.

libvirt makes sure qemu creates pidfiles for kvm processes anyway. It
shouldn't be hard at all to make sendsigs omit them from its killing
spree.

tomcrus (tomcrus) wrote :

I'm currently setting up an intranet mail server for my company (based on Lucid / eBox) in some KVM guests, and therefore have exactly the same problem. Having read all the previous posts, and because I need KVM guest shutdown as well, I will use something like this:

Add the following lines in /etc/init.d/sendsigs just before the sync line:

  # avoid every running kvm (started through libvirt-bin) from being killed
  for pidfile in /var/run/libvirt/qemu/*.pid; do
    [ -e "$pidfile" ] || continue   # glob matched nothing: no guests running
    OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $(cat "$pidfile")"
  done

I think this should prevent kvm guests from being killed on shutdown.
Then in /etc/init/libvirt-bin.conf I would try to add the following lines:

  kill timeout 360

  pre-stop script
    # only shutdown all guests if stopping of task is because of
    # runlevels 0 (halt) or 6 (reboot)
    if [ "$RUNLEVEL" = "0" -o "$RUNLEVEL" = "6" ]; then
      TIMEOUT=300
      VIRSH_CONNECT="qemu:///system"

      function list_running_vms(){
        virsh --connect $VIRSH_CONNECT list | grep running | awk '{ print $2 }'
      }

      # send shutdown to each running vm (they must handle acpi-power-button!)
      list_running_vms | while read vm; do
        virsh --connect $VIRSH_CONNECT shutdown $vm
      done

      # wait until the timeout is reached or no running vms are left
      END_TIME=$(date -d "$TIMEOUT seconds" +%s)
      while [ $END_TIME -gt $(date +%s) ]; do
        test -z "$(list_running_vms)" && break
        sleep 1
      done

      # if any vms left running destroy them now
      if [ -n "$(list_running_vms)" ]; then
        list_running_vms | while read vm; do
          virsh --connect $VIRSH_CONNECT destroy $vm
        done
      fi
      sleep 3
    fi
  end script

I don't know yet if this works, but I will test as soon as possible. Maybe if somebody else is in the mood to do so please report if it succeeded...

ossjunkie (ossjunkie) wrote :

i reworked the changes by tomcrus (thanks for the elegance ;) and removed all the reintroduced bashisms. so here is the current version of the "core logic" to shut down guests.

ossjunkie (ossjunkie) wrote :

i have also tested the hack to /etc/init.d/sendsigs

ossjunkie (ossjunkie) wrote :

but i am still stuck on the upstart integration when using pre-stop in libvirt-bin. while the runlevel condition by tomcrus works great, it seems we are not able to delay the shutdown process with a simple sleep. in my case i only managed to get the pre-stop killed or make the reboot hang forever. i also tried "exec sleep". there is also some stuff in /etc/init.d/sendsigs that might be of interest.

for anyone who likes to solve the puzzle: just make the timeout work in pre-stop and we should have the solution by placing the core script into it.

Reinhard Tartler (siretart) wrote :

instead of destroying vms, wouldn't it be a more elegant solution to "save"
the state of the VM in a statefile?

On bootup, upstart would check for such statefiles and restore the
VMs. This would work even for VMs that don't react to ACPI events
properly and preserves the VMs' uptimes.
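
A rough sketch of this save-and-restore idea, using the existing "virsh save" and "virsh restore" commands. The function names, the STATE_DIR location and the VIRSH override are all hypothetical choices for illustration; nothing like this is shipped in the package.

```shell
#!/bin/sh
# Sketch only: STATE_DIR is a hypothetical location, and VIRSH can be
# overridden (for example to test without a running libvirtd).
: "${VIRSH:=virsh --connect qemu:///system}"
: "${STATE_DIR:=/var/lib/libvirt/qemu/save-on-shutdown}"

# Save every running guest to $STATE_DIR/<name>.state.
# "virsh save" stops the guest once its state has been written.
save_running_guests() {
    mkdir -p "$STATE_DIR" || return 1
    for vm in $($VIRSH list | awk '$3 == "running" { print $2 }'); do
        $VIRSH save "$vm" "$STATE_DIR/$vm.state"
    done
}

# Restore every saved guest and delete its state file on success.
restore_saved_guests() {
    for statefile in "$STATE_DIR"/*.state; do
        [ -e "$statefile" ] || continue   # nothing was saved
        $VIRSH restore "$statefile" && rm -f "$statefile"
    done
}
```

save_running_guests would run at host shutdown and restore_saved_guests at boot, after libvirtd is up. Saving large guests takes time and disk space, so a timeout would still be needed around the save step.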

ossjunkie (ossjunkie) wrote :

@Reinhard: sure it would be, but that should be the next step, as we don't have shutdown working yet. so let's stick to the KISS principle until we've got the upstart integration.

ossjunkie (ossjunkie) wrote :

as i really needed a solution i did a dirty workaround for servers:

move /sbin/shutdown to /sbin/shutdown.real
place the attached script at /sbin/shutdown and make it executable

you can do the same for /sbin/reboot to have it run even on "reboot, halt, poweroff --force". you only need to change REALCMD to /sbin/reboot.real

this way we could at least just run "reboot" and "shutdown" again without worrying about the guests.
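
The attachment itself is not reproduced here, so the following is only a guess at the general shape of such a wrapper. REALCMD is the name used in the comment above; GUEST_HOOK and wrapped_shutdown are hypothetical names, and the hook path is borrowed from an earlier comment in this thread.

```shell
#!/bin/sh
# Sketch of a wrapper installed in place of /sbin/shutdown.
: "${REALCMD:=/sbin/shutdown.real}"
: "${GUEST_HOOK:=/usr/local/bin/shut-guests.sh}"

wrapped_shutdown() {
    # Stop the guests first; carry on with the real command even if that fails.
    "$GUEST_HOOK" || echo "warning: $GUEST_HOOK failed" >&2
    # Hand all original arguments through unchanged.
    "$REALCMD" "$@"
}
```

The wrapper script would end with wrapped_shutdown "$@". One drawback of this sketch: the guests are stopped as soon as the wrapper is invoked, even for a delayed shutdown such as "shutdown -h +10".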

Sergey Svishchev (svs) wrote :

This is implemented in libvirt 0.8.2:

https://bugzilla.redhat.com/show_bug.cgi?id=444273

Valentijn Sessink (valentijn) wrote :

I'm still not sure, but doesn't the /etc/init.d/sendsigs script, combined with the Ubuntu 10.04 /etc/init/rc.conf, make for a giant race condition? Where a shell script tries to find out which processes it should not kill? This is, at least, what I make of it when running libvirt on a server that has no other services running (i.e. the sendsigs script is run almost right away).

Francesco Pretto (ceztko) wrote :

I have seen the commit that implements this functionality and it's, as
usual, a sysvinit script. Unfortunately, I think this is a use case
where upstart can't do anything yet and the only solution (without
moving back to a sysvinit script) is doing an ugly workaround in
/etc/init.d/sendsigs. I've tried to subscribe Scott James Remnant
(maintainer of upstart: thanks!): this is not a bug in upstart (maybe
a missing feature) but I hope he has something to share with us about
this problem.

2010/8/6 Sergey Svishchev <email address hidden>:
> This is implemented in libvirt 0.8.2:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=444273
>
> ** Bug watch added: Red Hat Bugzilla #444273
>   https://bugzilla.redhat.com/show_bug.cgi?id=444273
>
>

Valentijn Sessink (valentijn) wrote :

~$ dpkg -S /etc/init/rc.conf
upstart: /etc/init/rc.conf

While rc.conf has nothing to do with shutting down virtual machines, it does have the consequence of running the sendsigs script in parallel with the upstart scripting. I'm not sure that is the right way to proceed.

Andy (andy-xillean) wrote :

So does this mean that KVM is not properly supported on Ubuntu 10.04 Server LTS? The ability to automatically shut down VMs when someone logs into a production host server and types /sbin/shutdown does not exist on a server operating system flagged LTS that is supposed to be for production? This can't be. There must be something that is being overlooked here.

I tried creating a simple traditional SysV init script to test this out, after the script I wrote failed to shut down the guests on reboot or shutdown even though it works manually from the command line.

[code]

#!/bin/bash
### BEGIN INIT INFO
# Provides: virsh test
# Required-Start:
# Required-Stop:
# Default-Start:
# Default-Stop: 0 1 6
# Short-Description: Graceful shutdown of all KVM guests.
# Description: Shutdown KVM guests on host shutdown
### END INIT INFO

/usr/bin/virsh list >>/var/log/kvmguest.log 2>&1

[/code]

this works when i run it manually.
I created a link from /etc/init.d/testscript to /etc/rc6.d/K5testscript.
reboot server.

and I get this message in the log
error: unable to connect to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor

So of course the guests are not shut down gracefully on host shutdown. A very basic feature of running a virtual machine server. Wow. Just WOW!

Valentijn Sessink (valentijn) wrote :

libvirt-bin gets shut down by Upstart, so when you try to shut down the guests from an init script, chances are that libvirt has already shut down. So if you want to do this, you need to change /etc/init/libvirt-bin.conf as well, and have it wait for the VMs to shut down.

Andy (andy-xillean) wrote :

Ok, I created a pre-stop script in /etc/init/libvirt-bin.conf
and it is running the script, however it's not waiting for the script to complete. Everything shuts down extremely fast, in about 3 seconds, and I am back at the BIOS screen.
So it starts shutting down the guest OS, but the host continues to shut down without waiting for the guest to finish shutting down?
I even added sleep 20 in the pre-stop script in /etc/init/libvirt-bin.conf and it completely ignored the sleep command; or, because sendsigs may be running in parallel, the system shuts down anyway regardless of any sleep command. If this is the case then that's just brilliant. So how does one fix this if this is indeed the case?

I modified the /etc/init.d/sendsigs file as stated above in comment #24 by tomcrus and it just hangs the host on reboot, forcing a hard reset.

Does anyone know if Canonical uses KVM internally on Ubuntu 10.04 LTS? If so, how have they solved this problem? Is there a paid solution?

Thanks.

Andy (andy-xillean) wrote :

Ok, so I came up with a workaround that seems to be working for me. The guests shut down completely. Not sure what this breaks, if anything. There still needs to be a permanent fix for this bug, but here is what I did.

(1)
Edit /etc/init/libvirt-bin.conf
I added this under the "pre-start script" section.

pre-stop script
        /etc/init.d/kvmguests stop
end script

(2)
Edit /etc/init.d/sendsigs

Search in /etc/init.d/sendsigs for "do_stop () {" and add these lines right below it.

kvm_shutdown_timeout=30
while /bin/ls /var/run/libvirt/qemu/*.pid>/dev/null 2>&1 && [ $kvm_shutdown_timeout -gt 0 ]
        do
        sleep 1
        kvm_shutdown_timeout=$(expr $kvm_shutdown_timeout - 1)
done

(3)
Create sys-v init script kvmguests and put it in /etc/init.d and make it executable.
Here is mine attached to this post. It can also be used to manually stop the guests.

**Notes: The kvm_shutdown_timeout=30 in /etc/init.d/sendsigs can be tuned to your environment.

Andy (andy-xillean) wrote :

Update**
In step (2) of my comment #38 above, I replaced the following line in /etc/init.d/sendsigs:

while /bin/ls /var/run/libvirt/qemu/*.pid>/dev/null 2>&1 && [ $kvm_shutdown_timeout -gt 0 ]

with:

while /usr/bin/pgrep -cx kvm > /dev/null 2>&1 && [ $kvm_shutdown_timeout -gt 0 ]

I have rebooted the host several times and both guests shut down cleanly.

Reinhard Tartler (siretart) wrote :

Andy,

I didn't try your suggestion (yet), but is it really necessary to create
the kvmguests init script? Can't you just place the script inside the
pre-stop script section of /etc/init/libvirt-bin.conf?

Moreover, I've noticed that your kvmguests script uses upstart's 'start'
command to ensure that libvirtd is actually running. is that really
necessary? AFAIUI this can be dropped if integrated in libvirt-bin.conf

As for your change to /etc/init.d/sendsigs, wouldn't it be less
intrusive to make /etc/init/libvirt-bin.conf or probably libvirtd itself
drop or copy the pid files into /lib/init/rw/sendsigs.omit.d? AFAIUI,
the upstart job should be executed before the sendsigs init script, or
is there another race here?

--
Gruesse/greetings,
Reinhard Tartler, KeyID 945348A4
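
The "copy the pid files" variant suggested above could be sketched as below. It assumes, as the comment implies, that sendsigs spares the PIDs listed in files under /lib/init/rw/sendsigs.omit.d. Both paths come from this thread; the function name register_omit_pids and the "kvm-" file name prefix are arbitrary, hypothetical choices.

```shell
#!/bin/sh
# Sketch only: copy libvirt's qemu pidfiles into the directory that sendsigs
# is assumed to consult for PIDs it must not kill.
register_omit_pids() {
    src_dir=${1:-/var/run/libvirt/qemu}
    omit_dir=${2:-/lib/init/rw/sendsigs.omit.d}
    for pidfile in "$src_dir"/*.pid; do
        [ -e "$pidfile" ] || continue      # no guests running
        cp "$pidfile" "$omit_dir/kvm-$(basename "$pidfile")" || return 1
    done
}
```

This would have to run before sendsigs starts, for instance from the pre-stop script of the libvirt-bin job, and it does not by itself make the host wait for the guests.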

Nick Barcet (nijaba) on 2010-10-07
Changed in libvirt (Ubuntu):
milestone: none → ubuntu-10.04.2
tags: added: patch
Changed in libvirt (Ubuntu):
milestone: ubuntu-10.04.2 → ubuntu-10.04.3
[32 comments hidden; 112 comments in total]
gollum53 (smid) wrote :

Hi,
I have installed the latest files mentioned here (sendsigs, omit-kvm-vm-pid, libvirt-bin.conf with the 'break' command suggested by Martin Rusko) on our Lucid 64-bit servers, and it did not help. When doing a reboot, the system does not wait for the domains to perform a clean shutdown... is there anyone here with a working configuration? What could be wrong? Can you please post your files?
Thanx
Roman Smid

Martin Rusko (rusko) wrote :

Hi Roman,

actually I ended up with the molly-guard solution. It is a suboptimal solution because it stops your virtual machines only when a shutdown/restart is initiated manually (and only from a remote ssh session). It won't apply if a UPS initiates the shutdown or if the power button is pressed briefly, etc.

In my case "the server" (well, shall I still call it that?) is a desktop PC with no screen and no UPS, accessible only via SSH. Therefore the molly-guard solution is satisfactory for me. Attached is a file which you can copy into the /etc/molly-guard/run.d/ directory if you decide to use it this way.

Cheers,
Martin

Clint Byrum (clint-fewbar) wrote :

On Fri, 2011-03-18 at 11:27 +0000, gollum53 wrote:
> Hi,
> I have installed the latest mentioned files here (sendsigs, omit-kvm-vm-pid, libvirt-bin.conf with 'break' command suggested by Martin Rusko) on our Lucid 64bit. servers, and it did not help. When doing a reboot, the system does not wait for the domains to perform a clean shutdown...is there anyone here with working configuration? What could be wrong? Can you please post your files?
> Thanx
> Roman Smid
>

I'd be interested to hear if adding this to libvirt-bin.conf would work:

pre-stop script
  if [ x$UPSTART_EVENTS = xrunlevel ] ; then
    shutdown_guests
  fi
end script

Obviously the 'shutdown_guests' command needs to be a single command
which stops the guests and waits for them to die.
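
One possible shape for such a blocking shutdown_guests command, as a sketch only. The VIRSH override and the list_running_vms helper are names made up for this example (the latter mirrors the helper in an earlier comment), and the 300 second default is an arbitrary assumption.

```shell
#!/bin/sh
# Sketch only: VIRSH can be overridden, e.g. to test without libvirtd.
: "${VIRSH:=virsh --connect qemu:///system}"

# Print the names of all guests that virsh reports as "running".
list_running_vms() {
    $VIRSH list | awk '$3 == "running" { print $2 }'
}

# Ask every running guest to shut down, then block until none is left or
# the timeout (seconds, default 300) runs out. Returns 1 on timeout.
shutdown_guests() {
    remaining=${1:-300}
    for vm in $(list_running_vms); do
        $VIRSH shutdown "$vm" || true
    done
    while [ -n "$(list_running_vms)" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

The caller can use the return value to decide whether to fall back to "virsh destroy" for guests that ignored the ACPI request. As discussed elsewhere in this thread, this only helps if nothing else (such as sendsigs) kills the guests or the script in the meantime.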

Lars Hansson (romabysen) wrote :

The problem with the upstart solution is that it doesn't work since
Upstart will kill the virsh instances that you spawn in
shutdown_guests.
So far the only somewhat satisfactory solution is molly-guard.

John Morrissey (jwm) wrote :

On Fri, Mar 18, 2011 at 05:13:55PM -0000, Lars Hansson wrote:
> The problem with the upstart solution is that it doesn't work since
> Upstart will kill the virsh instances that you spawn in
> shutdown_guests.

That's only the case if you don't follow the instructions to patch sendsigs
to avoid that bug:

> #44: John Morrissey wrote on 2010-09-17:
> Finally, this modified upstart job requires the fix for
> https://bugs.launchpad.net/ubuntu/+source/sysvinit/+bug/639940. Otherwise,
> the libvirt-bin pre-stop script isn't guaranteed to finish successfully,
> since sendsigs races the child processes executed during the course of the
> job script.

john
--
John Morrissey _o /\ ---- __o
<email address hidden> _-< \_ / \ ---- < \,
www.horde.net/ __(_)/_(_)________/ \_______(_) /_(_)__

Clint Byrum (clint-fewbar) wrote :

On Fri, 2011-03-18 at 17:13 +0000, Lars Hansson wrote:
> The problem with the upstart solution is that it doesn't work since
> Upstart will kill the virsh instances that you spawn in
> shutdown_guests.
> So far the only somewhat satisfactory solution is molly-guard.
>

Right, so you would need shutdown_guests to block until everything is in
fact shut down:

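# keep asking every running guest to shut down until none is left
while vms=$(virsh list | awk '$3 == "running" { print $2 }') && [ -n "$vms" ]; do
  for vm in $vms; do
    virsh shutdown "$vm"
  done
  sleep 1
done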

Note that currently libvirt-bin.conf has

stop on runlevel [!2345]

There is a bug in sendsigs that does not wait for these upstart jobs to
die before moving on to the end of the shutdown... so really, we also
need to fix that bug before the technique above will work.

Serge Hallyn (serge-hallyn) wrote :

@Clint,

is there an open bug for sendsigs? Can you mark this bug as blocked on that one?

gollum53 (smid) wrote :

Hi,
the proposed solution (patched sendsigs, libvirt-bin.conf, omit-kvm-vm-pids) still does not work for me. I have done several tests with sendsigs and found out that the corresponding pids of the running virtuals are successfully included in the omitpids array in sendsigs. However, the virtuals are still being killed instantly upon reboot...:( Any ideas?
Thanx
Roman Smid

Changed in libvirt (Ubuntu):
importance: Low → Medium

This is really a show-stopper and spoils the otherwise solid solution to use Ubuntu and KVM for virtualization.

Why has nobody yet proposed shutting down the VMs from the qemu-kvm script? I think "stop qemu-kvm" should try to shut down all running kvms and kill them after a timeout, independently of libvirt. This would also work if, for instance, libvirtd had been stopped earlier for other reasons and the system is then rebooted. With the monitor (which is accessible after libvirtd has been stopped) you can send a "system_powerdown" command.

At the moment "stop qemu-kvm" just removes the kvm kernel modules. Maybe this is the explanation, why the kvm processes get killed? (Well, normally not, the modprobe -r should not succeed, as long as the module is in use. I cannot try it at the moment.)

Sven

Serge Hallyn (serge-hallyn) wrote :

Hi Clint,

Assigned this to you in the hopes you would reply to comment #79 :)

Changed in libvirt (Ubuntu):
assignee: nobody → Clint Byrum (clint-fewbar)
Clint Byrum (clint-fewbar) wrote :

Shutdown is definitely on the radar for heavy testing this cycle, and I hope to have a good solution for this bug soon. The bug in sendsigs should hopefully become moot as we transition the shutdown to a more upstart-aware solution.

tags: added: upstart

The problem of shutting down the virtual machines consists of two parts:

If an init script is used, libvirtd might be killed before the shutdown command has been sent to the virtual machines. If the pre-stop script in the libvirt-bin upstart job is used, or a new upstart job that starts on stopping libvirt-bin, then when the computer is rebooted the VMs are killed by the sendsigs script. Even if the sendsigs script is modified not to kill the VMs, the reboot or halt script might run before the libvirt-bin upstart job has been stopped.

I use the following approach, which works quite well for me:
I added a libvirt-shutdown-domains upstart job, which is a task starting on stopping libvirt-bin, thus running before libvirt-bin is stopped. This makes sure that libvirt-bin is not stopped before the VMs are shut down. In fact, I also included a runlevel check in the libvirt-shutdown-domains script, so that the VMs are only shut down when libvirtd is stopped because of a runlevel change. This allows libvirtd to be restarted (e.g. when libvirt-bin is upgraded) without having to shut down all VMs.
I also added an init script running before the sendsigs script, which waits for libvirt-bin to be stopped (or for a timeout, whichever occurs first). Thus, sendsigs will not kill VMs and the machine will not be rebooted or powered off before the VMs have been stopped.
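
The runlevel check described above could look roughly like this. It is a sketch under assumptions, not the linked script: the function name is invented, and it takes the target runlevel as an argument so the decision can be tried without changing runlevels.

```shell
#!/bin/sh
# Sketch of the runlevel check: only shut guests down when the host itself
# is halting (runlevel 0) or rebooting (runlevel 6).
# The function name is an illustrative assumption.
host_is_going_down() {
    case "$1" in
        0|6) return 0 ;;
        *)   return 1 ;;
    esac
}

# Possible use inside the job's script (the second field printed by
# `runlevel` is the current runlevel):
#   if host_is_going_down "$(runlevel | awk '{ print $2 }')"; then
#       ...shut down the guests...
#   fi
```

With a check like this, a plain restart of libvirtd in runlevel 2 leaves the guests running.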

The scripts I wrote for this can be found at http://sebastian.marsching.com/wiki/Linux/KVM#Shutdown_virtual_machines_on_host_system_shutdown and some more explanations can be found at http://sebastian.marsching.com/blog/archives/112-KVM-and-Graceful-Shutdown-on-Ubuntu.html.

The solution I created is not perfect though because of two issues:
1. The python-libvirt package is needed, because I use a Python script to shut down the virtual machines. You might want to use a shell script using virsh for that, thus eliminating this dependency.
2. The timeout, after which the shutdown will proceed anyway, is configured in two different places: the init script and the Python script called by the upstart job. You might want to put both in the same place (e.g. /etc/default/libvirt-bin).
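
One way to keep the timeout in a single place, sketched for shell-based scripts only: both the init script and a shell version of the job read the value from the same defaults file. The variable name LIBVIRT_SHUTDOWN_TIMEOUT and the 300-second fallback are hypothetical; nothing in this report says the packaged defaults file defines such a setting.

```shell
#!/bin/sh
# Sketch: read one shared timeout from a defaults file.
# LIBVIRT_SHUTDOWN_TIMEOUT and the 300-second fallback are hypothetical.
load_shutdown_timeout() {
    defaults=${1:-/etc/default/libvirt-bin}
    # Source in a subshell so the file cannot clobber the caller's variables.
    (
        [ -r "$defaults" ] && . "$defaults"
        echo "${LIBVIRT_SHUTDOWN_TIMEOUT:-300}"
    )
}
```

Each script would then call timeout=$(load_shutdown_timeout) instead of hard-coding its own number.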

Thank you, Sebastian, it works great and came just in time.

Francesco Pretto (ceztko) wrote :

Is this fixed in ubuntu 10.04.3?

dbendlin (diego-bendlin) wrote :

Hello Guys,

I wanted to share my workaround for this issue. I'm using Debian squeeze as the OS for a couple of virtualization hosts, each of them holding a number of VMs.

What I need is that when my virtualization host is shut down (from the CLI, by cron, by UPS low battery, or by pressing the power button (ACPI)), the host has to shut down each VM before going down.

I used one of the scripts from earlier posts:
1.- I copied it to /usr/local/bin
2.- Changed ownership (chown root:root /usr/local/bin/libvirt-shutdown-domains)
3.- Changed permissions (chmod 775 /usr/local/bin/libvirt-shutdown-domains)
4.- Edited /etc/init.d/libvirt-bin, adding a reference to this file as the second line in the stop section of the init file

Before actually executing the libvirt-bin stop logic, it calls the first script, which shuts down each VM. When that script ends (after every VM is gracefully halted), the libvirt-bin init script continues.
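
The ordering described in these steps can be sketched as follows. This is not the actual /etc/init.d/libvirt-bin: stop_libvirtd is a placeholder for whatever the real init script already does to stop the daemon, and do_stop and SHUTDOWN_HELPER are names made up for illustration.

```shell
#!/bin/sh
# Sketch of the "stop" branch of an init script that halts guests first.
# The helper path matches the steps above; stop_libvirtd is a placeholder
# for the real daemon-stopping logic of the init script.
SHUTDOWN_HELPER=${SHUTDOWN_HELPER:-/usr/local/bin/libvirt-shutdown-domains}

stop_libvirtd() {
    echo "stopping libvirtd (placeholder)"
}

do_stop() {
    # Runs synchronously: the daemon is only stopped after the helper returns.
    if [ -x "$SHUTDOWN_HELPER" ]; then
        "$SHUTDOWN_HELPER" || echo "guest shutdown helper failed" >&2
    fi
    stop_libvirtd
}
```

Because the helper runs in the foreground, the init script cannot reach the daemon-stopping step while guests are still being halted.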

dbendlin (diego-bendlin) wrote :

Here's my /etc/init.d/libvirt-bin file

Changed in libvirt (Ubuntu):
importance: Medium → High
milestone: ubuntu-10.04.3 → ubuntu-11.10-beta-1
Changed in libvirt (Ubuntu):
assignee: Clint Byrum (clint-fewbar) → Serge Hallyn (serge-hallyn)
status: Triaged → In Progress
Serge Hallyn (serge-hallyn) wrote :

Thanks to ossjunkie for your original script, most of which is in my updated libvirt job.

I'm requesting a review by Spamaps for upstart oddities, but the script in the linked tree is working for me.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 0.9.2-4ubuntu7

---------------
libvirt (0.9.2-4ubuntu7) oneiric; urgency=low

  * libvirt-bin.upstart: add a pre-stop script to shut down running VMs
    before the host shuts down. (LP: #350936)
 -- Serge Hallyn <email address hidden> Tue, 02 Aug 2011 19:49:40 -0500

Changed in libvirt (Ubuntu):
status: In Progress → Fix Released
Andreas Ntaflos (daff) wrote :

So is there any way the proper fix will make it into Lucid? Or has that ship sailed?

In the meantime I have applied Sebastian Marsching's method which works quite well. Thanks for that, Sebastian.

Changed in libvirt (Ubuntu Lucid):
status: New → Triaged
importance: Undecided → High
Clint Byrum (clint-fewbar) wrote :

Andreas, yes, this should make it into lucid soon enough. We'll let Serge's fix bake for about a week in oneiric, and then backport to lucid. I don't know if we'll do the same for maverick and natty, though it's not out of the question (it's just a matter of whether it's important enough to users).

Changed in kvm (Ubuntu Lucid):
status: New → Invalid
Changed in libvirt (Ubuntu Lucid):
milestone: none → ubuntu-10.04.4
Changed in libvirt (Ubuntu Maverick):
status: New → Triaged
importance: Undecided → High
Changed in kvm (Ubuntu Maverick):
status: New → Invalid
Andreas Ntaflos (daff) wrote :

Clint, great to hear, thanks. We have many Lucid servers in production, but no Maverick or Natty, so the fix getting into Lucid is most important for us. But if time and resources allow it, backporting the fix to Maverick and Natty would certainly be nice.

description: updated

Hello exe, or anyone else affected,

Accepted libvirt into natty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Martin Pitt (pitti) wrote :

Hello exe, or anyone else affected,

Accepted libvirt into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in libvirt (Ubuntu Lucid):
status: Triaged → Fix Committed
Changed in libvirt (Ubuntu Maverick):
status: Triaged → Fix Committed
Martin Pitt (pitti) wrote :

Hello exe, or anyone else affected,

Accepted libvirt into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Nathan Crawford (njcrawford) wrote :

Tested on 10.04 64bit. Before installing the proposed package, the guests' /var/log/dmesg contained this every time the host was shut down (either with the shutdown command or by pressing the power button):
[ 7.316252] EXT3-fs: INFO: recovery required on readonly filesystem.
[ 7.316256] EXT3-fs: write access will be enabled during recovery.
[ 36.242794] kjournald starting. Commit interval 5 seconds
[ 36.242815] EXT3-fs: recovery complete.

And after installing proposed package, guest dmesg contains:
[ 7.101435] kjournald starting. Commit interval 5 seconds
[ 7.101455] EXT3-fs: mounted filesystem with ordered data mode.

It gets a thumbs up from me!
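
The same check can be scripted inside a guest. This is only a sketch: it looks for the ext3 message quoted above, so it says nothing about other filesystems, and the function name is an assumption.

```shell
#!/bin/sh
# Sketch: report whether a guest's boot log shows ext3 journal recovery,
# which indicates that the previous shutdown was not clean.
# The function name is illustrative; the default path is the one cited above.
guest_needed_recovery() {
    grep -q 'EXT3-fs: INFO: recovery required' "${1:-/var/log/dmesg}"
}
```

For example, guest_needed_recovery && echo "unclean shutdown detected" could be run in each guest after a host reboot.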

tags: added: verification-done-lucid
Martin Pitt (pitti) on 2011-09-11
tags: added: verification-done
Saman Behnam (sbehnam73) wrote :

OK, I've written something that works on Lucid 10.04.3 without patching or backporting!
Just check the attachment!

That's why penguins can't fly. What doesn't fly can't crash.

have fun

Saman Behnam (sbehnam73) wrote :

sorry
the first one had errors!
here is a corrected version 1.0-1

Martin Pitt (pitti) wrote :

Resetting verification tags for missing maverick testing.

tags: removed: verification-done verification-done-lucid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 0.7.5-5ubuntu27.17

---------------
libvirt (0.7.5-5ubuntu27.17) lucid-proposed; urgency=low

  * debian/libvirt-bin.upstart: add a pre-stop script to shut down running VMs
    before the host shuts down. (LP: #350936)
  * debian/libvirt-bin.default: add a comment that this file is not actually
    used at startup. (LP: #823638)
 -- Serge Hallyn <email address hidden> Mon, 15 Aug 2011 15:20:37 -0500

Changed in libvirt (Ubuntu Lucid):
status: Fix Committed → Fix Released
Chris Halse Rogers (raof) wrote :

Is anyone available to test this for natty and/or maverick?

I could test on Natty later today.

Adding link to pending SRU page for my convenience
http://people.canonical.com/~ubuntu-archive/pending-sru.html

Serge Hallyn (serge-hallyn) wrote :

Verified on maverick-proposed.

Verified on natty-proposed.

$ dpkg -l | grep libvirt | awk '{print $2 " " $3}'
libvirt-bin 0.8.8-1ubuntu6.6
libvirt0 0.8.8-1ubuntu6.6
python-libvirt 0.8.8-1ubuntu6.6
$ lsb_release -ds
Ubuntu 11.04
$ uname -srvi
Linux 2.6.38-13-generic #52-Ubuntu SMP Tue Nov 8 16:53:51 UTC 2011 x86_64
$ cat /var/log/libvirt/shutdownlog.log
libvirt: libvirt-bin: entering pre-stop at Mon Nov 14 17:37:47 EST 2011
libvirt: libvirt-bin: attempting clean shutdown of test-vm at Mon Nov 14 17:37:47 EST 2011
libvirt: libvirt-bin: exiting pre-stop at Mon Nov 14 17:37:53 EST 2011

Changed in libvirt (Ubuntu Natty):
status: New → Fix Committed
tags: added: verification-done
removed: verification-needed
tags: added: verification-done-natty verification-needed

In comment #104 Serge wrote that he did the verification on Maverick. Based on that I'll update the tags.

tags: added: verification-done-maverick
removed: verification-needed
Clint Byrum (clint-fewbar) wrote :

nutznboltz, typically we require somebody who wasn't the developer uploading the fix to verify it (Serge did the fix). However, with maverick in its sunset (< 6 months of life left) I think we should accept the fact that the surrounding releases, lucid and natty, worked fine with the same fix.

clint-fewbar, Thanks, I'll keep that in mind.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in kvm (Ubuntu Natty):
status: New → Confirmed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 0.8.3-1ubuntu19.4

---------------
libvirt (0.8.3-1ubuntu19.4) maverick-proposed; urgency=low

  * New version of debian/patches/lxc-use-own-ptyfns.patch. Previous version
    failed to build.

libvirt (0.8.3-1ubuntu19.3) maverick-proposed; urgency=low

  * lxc_controller: use our own unlockpt+grantpt rather than glibc's, which
    can't handle opening a pty in a devpts not mounted at /dev/pts.
    (LP: #863629)
 -- Serge Hallyn <email address hidden> Tue, 15 Nov 2011 08:06:57 -0600

Changed in libvirt (Ubuntu Maverick):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 0.8.8-1ubuntu6.7

---------------
libvirt (0.8.8-1ubuntu6.7) natty-proposed; urgency=low

  * lxc_controller: use our own unlockpt+grantpt rather than glibc's, which
    can't handle opening a pty in a devpts not mounted at /dev/pts.
    (LP: #863629)
 -- Serge Hallyn <email address hidden> Tue, 01 Nov 2011 18:00:51 +0000

Changed in libvirt (Ubuntu Natty):
status: Fix Committed → Fix Released
Rolf Leggewie (r0lf) wrote :

natty has seen the end of its life and is no longer receiving any updates. Marking the natty task for this ticket as "Won't Fix".

Changed in kvm (Ubuntu Natty):
status: Confirmed → Won't Fix
Displaying first 40 and last 40 comments. View all 112 comments or add a comment.