mdmonitor doesn't start recovery immediately
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
mdadm (Ubuntu) | Fix Released | Undecided | Unassigned |
Impish | Fix Released | Undecided | Unassigned |
Bug Description
mdmonitor reacts to md events by polling the /proc/mdstat file. The kernel generates these events whenever it observes a change on any md device. This happens asynchronously and can be triggered by a user-space process (mdadm called by udev or by the user) or by the kernel itself (for example, a drive is removed because it has too many errors).
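To make the polling described above concrete, here is a minimal sketch that reads mdstat-formatted text and lists arrays with a missing member. It is not how mdadm implements its monitor; the function name `degraded_arrays` is hypothetical, and the sketch assumes the usual `[UU_U]` status notation in /proc/mdstat.

```shell
# Hypothetical helper, not part of mdadm: print the name of every array
# whose member status field (e.g. [UU_U]) contains a missing slot ("_").
# Reads mdstat-formatted text from standard input.
degraded_arrays() {
    awk '
        # An array header line looks like "md0 : active raid1 ...".
        /^md[^ ]* :/ { name = $1 }
        # The status field holds only U and _ characters; "_" marks a missing member.
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    '
}
```

On a live system it could be run as `degraded_arrays < /proc/mdstat`; an empty result means no array reported a missing member at that moment.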
The problem is that mdmonitor does not coordinate with user space or udev. When a drive with metadata is inserted, udev runs mdadm, which adds the drive to an md device. This generates an md event, and mdmonitor may then try to move the drive to another md device if needed. To do so it relies on by-path links, but the link for the newly appeared device has not been created yet because udev is still processing it. As a result, recovery does not start immediately.
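The race described above can be observed from a shell by waiting for the by-path link to appear. The following is only an illustrative sketch, not the fix that was released: the function name `wait_for_bypath_link`, its arguments, and the one-second retry interval are all assumptions made for this example.

```shell
# Hypothetical helper, not part of mdadm: wait until some symlink in a
# by-path directory resolves to the given device node.
# Usage: wait_for_bypath_link DEVICE [LINK_DIR] [TRIES]
# Prints the matching link and returns 0, or returns 1 after TRIES attempts.
wait_for_bypath_link() {
    dev=$(readlink -f "$1") || return 2
    dir=${2:-/dev/disk/by-path}
    tries=${3:-10}
    while [ "$tries" -gt 0 ]; do
        for link in "$dir"/*; do
            # Skip anything that is not a symlink (including an unmatched glob).
            [ -L "$link" ] || continue
            if [ "$(readlink -f "$link")" = "$dev" ]; then
                printf '%s\n' "$link"
                return 0
            fi
        done
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

For example, `wait_for_bypath_link /dev/nvme0n1` would print the link once udev has created it. Running `udevadm settle` before checking is another way to wait until udev has finished handling its queued events.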
Observed on Ubuntu 20.04.
Steps to reproduce:
1. Create RAID volume:
# mdadm --create /dev/md/imsm0 --metadata=imsm --raid-devices=4 /dev/nvme6n1 /dev/nvme1n1 /dev/nvme7n1 /dev/nvme3n1 --run
# mdadm --create /dev/md/
2. Add spare to container:
# mdadm --add /dev/md/imsm0 /dev/nvme0n1
3. Create appropriate policy line in /etc/mdadm/
POLICY domain=
4. Disconnect spare from container.
5. Start the mdadm monitor with a long delay (e.g. 6000 seconds, which is 100 minutes):
# mdadm --monitor --delay 6000 --scan --mail=
6. Hot remove disk from array (physical disconnect).
7. Connect previously prepared spare.
Expected results:
Rebuild should start.
Actual results:
Rebuild does not start; the added spare ends up in a separate container.
description: updated
tags: added: vroc
Changed in mdadm (Ubuntu Impish):
status: Incomplete → Fix Committed
Changed in mdadm (Ubuntu):
status: Fix Committed → Fix Released
Changed in mdadm (Ubuntu Impish):
status: Fix Committed → Fix Released
Hi,
mdmonitor needs to deal with other tasks; the issue is still in development.
Thanks, Mariusz