I added some debugging to the teardown code and managed to reproduce this. What we find is that we unbind and then attempt a bind on the array, which fails; only then do we see the deletes for the unbind complete. This leads to the bind failure:
[ 3.476504] md: bind<sda1>
[...]
[ 35.097882] md: md0 stopped.
[ 35.097897] md: unbind<sda1>
[ 35.097907] APW: sysfs_remove_link ret<0>
[ 35.110198] md: export_rdev(sda1)
[ 35.113254] md: bind<sda1>
[ 35.113297] ------------[ cut here ]------------
[ 35.113300] WARNING: at /home/apw/build/jaunty/ubuntu-jaunty/fs/sysfs/dir.c:462 sysfs_add_one+0x4c/0x50()
[...]
[ 35.115126] APW: deleted something
Here, where we happened to mount successfully, note that the delete falls in
the expected place:
[ 3.479917] md: bind<sda5>
[...]
[ 35.118235] md: md1 stopped.
[ 35.118240] md: unbind<sda5>
[ 35.118244] APW: sysfs_remove_link ret<0>
[ 35.140164] md: export_rdev(sda5)
[ 35.142276] APW: deleted something
[ 35.143848] md: bind<sda1>
[ 35.152288] md: bind<sda5>
[ 35.158571] raid1: raid set md1 active with 1 out of 2 mirrors
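The difference between the two traces is only whether the debug marker shows up before or after the next bind. As a rough illustration, that check could be written as the small user-space helper below. This is a sketch, not part of any fix: the function name delete_before_rebind() is made up for this example, and it assumes the log lines look like the ones quoted above, including my temporary "APW: deleted something" debug message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not kernel code): returns true if, after the first
 * "md: unbind<" line, the given marker appears before the next "md: bind<"
 * line. Returns false if the bind comes first or the marker never appears.
 */
static bool delete_before_rebind(const char *const lines[], size_t n,
                                 const char *marker)
{
    bool seen_unbind = false;

    assert(marker != NULL);
    assert(lines != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (strstr(lines[i], "md: unbind<")) {
            seen_unbind = true;
        } else if (seen_unbind) {
            if (strstr(lines[i], marker))
                return true;    /* delete completed before the re-bind */
            if (strstr(lines[i], "md: bind<"))
                return false;   /* re-bind raced ahead of the delete */
        }
    }
    return false;
}
```

Fed the lines of the first trace it reports false (the bind at 35.113254 precedes the delete), and for the second trace it reports true.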
If we look at the code for stopping the array we see the following:
static int do_md_stop(mddev_t * mddev, int mode, int is_open)
{
[...]
        rdev_for_each(rdev, tmp, mddev)
                if (rdev->raid_disk >= 0) {
                        char nm[20];
                        sprintf(nm, "rd%d", rdev->raid_disk);
                        sysfs_remove_link(&mddev->kobj, nm);
                }
        /* make sure all md_delayed_delete calls have finished */
        flush_scheduled_work();
        export_array(mddev);
[...]
Note that we flush_scheduled_work() to wait for md_delayed_deletes and then
export the array. However it is export_array() which triggers these
deletes:
static void export_array(mddev_t *mddev)
{
[...]
        rdev_for_each(rdev, tmp, mddev) {
                if (!rdev->mddev) {
                        MD_BUG();
                        continue;
                }
                kick_rdev_from_array(rdev);
        }
[...]
}
It does this via unbind_rdev_from_array(), which is called from kick_rdev_from_array():
static void kick_rdev_from_array(mdk_rdev_t * rdev)
{
        unbind_rdev_from_array(rdev);
        export_rdev(rdev);
}
Which triggers the delayed delete:
static void unbind_rdev_from_array(mdk_rdev_t * rdev)
{
[...]
        rdev->sysfs_state = NULL;
        /* We need to delay this, otherwise we can deadlock when
         * writing to 'remove' to "dev/state". We also need
         * to delay it due to rcu usage.
         */
        synchronize_rcu();
        INIT_WORK(&rdev->del_work, md_delayed_delete);
        kobject_get(&rdev->kobj);
        schedule_work(&rdev->del_work);
}
So in reality we do not want to wait for this before the export_array()
but after. Testing with a patch to do this seems to resolve the issue.
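To make the ordering argument concrete, here is a toy user-space model of the two orderings. It is only a sketch under simplifying assumptions: every toy_* name is invented for this example and none of it is kernel code or the actual patch. It also models the worst case, where the queued delete has not run by the time we re-bind; on a real system that is a race, which is why the failure is intermittent.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these are not kernel structures or APIs. */
#define TOY_MAX_PENDING 8

struct toy_array {
    bool name_in_sysfs;   /* stands in for the rdev's sysfs entry */
    int pending;          /* delayed deletes queued but not yet run */
};

/* Models export_array(): the delete is only queued, as schedule_work() does. */
static void toy_export_array(struct toy_array *a)
{
    assert(a->pending < TOY_MAX_PENDING);
    a->pending++;
}

/* Models flush_scheduled_work(): run every queued delete to completion. */
static void toy_flush(struct toy_array *a)
{
    while (a->pending > 0) {
        a->pending--;
        a->name_in_sysfs = false;
    }
}

/* Models the re-bind: it fails while the old entry still exists. */
static bool toy_bind(struct toy_array *a)
{
    if (a->name_in_sysfs)
        return false;     /* duplicate name, as in the sysfs_add_one warning */
    a->name_in_sysfs = true;
    return true;
}

/* Stop the array and re-bind at once, flushing before or after the export. */
static bool toy_stop_then_bind(bool flush_after_export)
{
    struct toy_array a = { .name_in_sysfs = true, .pending = 0 };

    if (!flush_after_export)
        toy_flush(&a);    /* existing order: nothing is queued yet */
    toy_export_array(&a);
    if (flush_after_export)
        toy_flush(&a);    /* reordered: waits for the delete just queued */
    return toy_bind(&a);
}
```

In this model toy_stop_then_bind(false) fails the bind and toy_stop_then_bind(true) succeeds, which matches what I see when testing with the flush moved after export_array().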