[ibp] Provisioning gets stuck on Ubuntu (bare metal): 'mdadm: Cannot get exclusive access to /dev/md127:Perhaps a running process, mounted filesystem or active volume group?\n'
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Released | High | Alexander Adamov |
6.1.x | Won't Fix | High | Fuel Python (Deprecated) |
7.0.x | Fix Released | High | Alexander Adamov |
8.0.x | New | Undecided | Fuel Documentation Team |
Bug Description
Fuel version info (6.1 build #432): http://
Environment deployment hung because one slave node was unavailable after provisioning; here is a part of the fuel-agent logs:
http://
The bug was reproduced on a bare-metal lab. I was able to connect to the failed slave via IPMI, but couldn't log in using the default (root/r00tme) credentials (see screenshot).
Steps to reproduce:
1. Create a new environment on Ubuntu.
2. Add some nodes.
3. Deploy changes.
Expected result:
- cluster is deployed and works fine
Actual result:
- deployment hangs on the provisioning step
The issue seems to be intermittent and reproduces only on bare-metal servers (I've never seen similar failures on CI or in KVM/VirtualBox deployments). A diagnostic snapshot is attached.
Changed in fuel:
assignee: Fuel provisioning team (fuel-provisioning) → Aleksandr Gordeev (a-gordeev)

Changed in fuel:
importance: Undecided → Medium
status: New → Confirmed

Changed in fuel:
importance: Medium → High
status: Confirmed → Incomplete
status: Incomplete → Confirmed

Changed in fuel:
milestone: 6.1 → 7.0
status: In Progress → Confirmed
assignee: Aleksandr Gordeev (a-gordeev) → Fuel Python Team (fuel-python)

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Aleksandr Gordeev (a-gordeev)
status: Confirmed → In Progress

tags: added: release-notes-done rn7.0 removed: release-notes

Changed in fuel:
assignee: Fuel Documentation Team (fuel-docs) → Alexander Adamov (aadamov)
tags: added: area-docs
The question is: was this md device meant to be removed? It's obviously not an md device that Fuel created at some point in the past: it is RAID 0 and its metadata version is 1.2, so it's essentially a user-defined md device. Currently, Fuel Agent has only rudimentary decommissioning support, which is supposed to erase everything (all LVM volumes, md devices, and most plain partitions), but it sometimes fails to do so.
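For context, a minimal sketch of what such a "wipe everything" pass could look like, shelling out to standard LVM/mdadm/wipefs tools from Python; the device names and the choice to tolerate failures are illustrative assumptions, not Fuel Agent's actual implementation:

```python
import subprocess

def run(cmd):
    # Tolerate failures: during decommission we just record the result and
    # keep going, which mirrors the "sometimes fails" behavior noted above.
    return subprocess.run(cmd, capture_output=True, text=True)

def wipe_node(md_devices, disks):
    # Deactivate all LVM volume groups so their logical volumes release
    # any underlying md or plain devices.
    run(["vgchange", "-an"])
    # Stop each md array before touching its member devices.
    for md in md_devices:                      # e.g. ["/dev/md127"]
        run(["mdadm", "--stop", md])
    for disk in disks:                         # e.g. ["/dev/sda", "/dev/sdb"]
        # Erase md superblocks, then any remaining filesystem/LVM/RAID
        # signatures and the partition table.
        run(["mdadm", "--zero-superblock", "--force", disk])
        run(["wipefs", "-a", disk])

wipe_node(["/dev/md127"], ["/dev/sda", "/dev/sdb"])
```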
I'd say this issue is medium priority. We could probably add some additional logic (fuser) to figure out which processes are using this md device and try to kill them. But frankly, we need to develop our decommissioning feature to make it mature and data-driven.
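A minimal sketch of that fuser idea, assuming the device path from the error above; the fuser -km / retry policy is an illustrative choice, not something Fuel Agent currently does:

```python
import subprocess
import time

def stop_md_forcefully(device="/dev/md127", retries=3):
    """Kill holders of an md device, then try to stop the array."""
    for _ in range(retries):
        # fuser -m treats the argument as a mounted filesystem or block
        # device; -k sends SIGKILL to every process found using it.
        subprocess.run(["fuser", "-km", device], capture_output=True)
        # Lazily unmount any filesystem still sitting on the device.
        subprocess.run(["umount", "-l", device], capture_output=True)
        result = subprocess.run(["mdadm", "--stop", device],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return True
        # "Cannot get exclusive access" usually means a holder is still
        # alive (or a volume group on top is still active); retry briefly.
        time.sleep(1)
    return False
```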