On Wed, Sep 21, 2005 at 03:38:06AM +0100, Scott James Remnant wrote:
> Background: in the upcoming Ubuntu 5.10 we've been having some problems
> with /dev/input/mice not being created on startup despite the "mousedev"
> module being hard-loaded early in the boot sequence.
> (http://bugzilla.ubuntu.com/show_bug.cgi?id=12915 for those interested).
>
> Debian has had similar problems too (http://bugs.debian.org/317333) and
> found that starting udevd earlier manually seemed to fix it.

Yes, that's a good way to fix it.

> After much debugging, I've finally figured out what's going on ... it's
> a bit of a story, but here goes...

Great, we finally have an idea why this happens. Thanks for finding that
out.

> On receiving the netlink event for the printer port, udevd disables
> receipt of any "sequence numbered" events from udevsend (ie. those that
> will almost certainly be duplicated over the netlink socket).
> Unfortunately this means all the udevsend events we're about to receive
> from the processes that backed off a second or so while fighting over
> who got to start udevd[1].
>
> These udevsend processes deliver their events to udevd, which cheerfully
> ignores them because it thinks it's going to get another copy over the
> netlink socket any second now. Unfortunately the netlink event has
> already been and gone, and we just ignored an event we weren't supposed
> to.
>
>
> The two problems as I see them are:
>
> 1) The fact that receiving a netlink event disables sequence numbered
>    udevsend events, when there's already code to deal with de-duping
>    events anyway. Is there actually any need for this additional check,
>    can't we just queue both events and have them ignored by
>    msg_queue_insert() ?
>
> 2) That this ignoring of events is done at receipt, rather than in queue
>    order. This means that the "later" parport_pc netlink event is able
>    to disable queueing of udevsend events with a lower sequence number.
>
> I can envisage that #1 is necessary in case the time between receiving
> the udevsend and netlink event is so long that we've already processed
> and removed one of the events by the time the second is queued.

Yes, that was the reason for ignoring the incoming messages.

> In which case the problem becomes fixing #2, however unless the kernel
> promises strict ordering of events over the netlink socket (which I
> doubt, otherwise it wouldn't need sequence numbers)

Netlink events are always in the right order. The SEQNUM is only needed
for the forked events.

> we can't assume
> that we've received all of the pre-netlink events we are going to.

Right, as "/proc/sys/kernel/hotplug" events are forked processes, you
will never know when and in which order they will arrive.

> I suspect the right solution is actually to implement history of what
> events we've already processed, and de-dupe them that way; rather than
> ignoring messages on receipt.

We could just accept all events with a sequence number lower than that
of the first netlink event; that may fix it.

The "right solution" is to start udevd as one of the first things after
taking over control from the kernel. This way you will only catch the
events for the last "non driver core" subsystem, the input layer.

Once the input layer is fixed, the need for udevsend will go away
completely, and /proc/sys/kernel/hotplug should be disabled when taking
over control from the kernel; it is only needed in initramfs.

After input is fixed, the whole event reordering and timeout handling
will be removed from udevd, and we will need to start udevd manually
anyway.

Kay
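
For illustration only, the rule suggested above (keep accepting udevsend
events whose sequence number is lower than that of the first netlink
event) might be sketched roughly as below. This is not udevd's actual
code: struct seq_filter, enum event_source and seq_filter_accept() are
hypothetical names, and the sketch assumes that every event from the
first netlink SEQNUM onwards will also arrive over netlink.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration; not udevd's real structures. */
enum event_source { SRC_NETLINK, SRC_UDEVSEND };

struct seq_filter {
	bool netlink_seen;                 /* has any netlink event arrived? */
	unsigned long long first_netlink;  /* SEQNUM of the first netlink event */
};

/* Return true if the event should be queued, false if it may be dropped. */
static bool seq_filter_accept(struct seq_filter *f, enum event_source src,
			      unsigned long long seqnum)
{
	assert(f != NULL);

	if (src == SRC_NETLINK) {
		/* remember where the netlink stream started */
		if (!f->netlink_seen) {
			f->netlink_seen = true;
			f->first_netlink = seqnum;
		}
		return true;
	}

	/* udevsend event before any netlink event: nothing to compare against */
	if (!f->netlink_seen)
		return true;

	/*
	 * udevsend event older than the first netlink event: assumed never
	 * to arrive over netlink, so it must not be ignored. Anything newer
	 * is assumed to be a duplicate of a netlink event.
	 */
	return seqnum < f->first_netlink;
}
```

With a filter like this, a late udevsend event with SEQNUM 7 would still
be queued after a netlink event with SEQNUM 10 has been seen, while a
udevsend event with SEQNUM 12 would be dropped as a presumed duplicate.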