Stefan Bader [2013-05-29 13:42 -0000]:
> Booting in Xen mode there is
> one systemd-udevd crashing immediately after calling udevadm control
> --exit and that ends the udevadm command (so it does not wait).
That indeed sounds like the most plausible cause for now, and in any
case udevd should not crash; let's fix the crash first, and then see
how much further this gets us.
Thanks for the core file! I can load it fine with the current
202-0ubuntu11 version, and I get the same backtrace as you. The actual crash
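seems obvious:
    #0  udev_monitor_allow_unicast_sender (udev_monitor=udev_monitor@entry=0x6e0110, sender=0x0)
with
    udev_monitor->snl_trusted_sender.nl.nl_pid = sender->snl.nl.nl_pid;
The caller does
    if (worker_monitor == NULL)
            return;
    /* allow the main daemon netlink address to send devices to the worker */
    udev_monitor_allow_unicast_sender(worker_monitor, monitor);
I.e. "monitor" is not checked for NULL here, but it is NULL (see
frame #0, sender=0x0), which causes the crash.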
The global monitor is initialized in main(), and a NULL check follows
the initialization. However, further down in main() it is unreffed and
set to NULL once "udev_exit" becomes true, i.e. once the worker got a
SIGTERM. As far as I can see, this is incompatible with our
0024-avoid-exit-deadlock-for-dm_cookie.patch, which keeps processing
events even after an exit has been requested; in that case, we must
not access monitor any more.
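To illustrate the failure mode, here is a minimal stand-alone sketch. It is not the udevd code and not necessarily the fix that will land: the struct and the *_sketch names are made up for this example, and the only assumption taken from the analysis above is that the global monitor can already be NULL when a worker is set up after an exit request.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct udev_monitor (hypothetical, for illustration). */
struct monitor_sketch {
        uint32_t nl_pid;         /* netlink port id of this monitor */
        uint32_t trusted_nl_pid; /* port id allowed to send unicast to us */
};

/* Stand-in for the global "monitor" that main() resets to NULL on exit. */
struct monitor_sketch *global_monitor;

/*
 * Hypothetical guarded variant of udev_monitor_allow_unicast_sender():
 * report an error for a NULL sender instead of dereferencing it.
 */
int allow_unicast_sender_sketch(struct monitor_sketch *m,
                                const struct monitor_sketch *sender)
{
        /* The caller has already checked its own monitor. */
        assert(m != NULL);

        if (sender == NULL)
                return -EINVAL;

        m->trusted_nl_pid = sender->nl_pid;
        return 0;
}

/*
 * Hypothetical worker setup: returns 0 on success, or a negative errno
 * when the main monitor is already gone because an exit was requested.
 */
int worker_setup_sketch(struct monitor_sketch *worker_monitor)
{
        if (worker_monitor == NULL)
                return -ENOMEM;

        return allow_unicast_sender_sketch(worker_monitor, global_monitor);
}
```

With global_monitor pointing at a live monitor, worker_setup_sketch() copies its nl_pid into the worker. Once global_monitor has been reset to NULL, the call returns -EINVAL and leaves the worker untouched, instead of crashing. Whether the real fix should guard here or stop spawning workers after the monitor is unreffed is a separate question.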
Thanks for your investigations!