Multipath JBOD storage devices are not shown via /dev/mapper but each path as a single device.

Bug #1887558 reported by Mirek
This bug affects 9 people
Affects                 Status         Importance  Assigned to          Milestone
MAAS                    Fix Committed  High        Alexsander de Souza
  3.2                   Triaged        High        Unassigned
  3.3                   Triaged        High        Unassigned
bcache-tools            New            Undecided   Unassigned
bcache-tools (Ubuntu)   New            Undecided   Unassigned

Bug Description

As in the title: if a machine has a multipath storage device connected, e.g. a JBOD via a SAS controller, then each disk in the JBOD is reported N times, depending on how many paths you have. This is problematic during commissioning because everything is done multiple times: the badblocks test takes forever, as each disk is tested once per path. It is also problematic when configuring storage, e.g. for a Ceph OSD deployment, because you don't see the device you are going to use; you first have to ssh into the system and check all the /dev/mapper entries to find the correct device for the OSD.

Lee Trager (ltrager) wrote :

MAAS does not currently support multipath devices. Curtin, the tool MAAS uses to perform installations, does. We'd have to add support for gathering information on multipath devices during commissioning, a way to properly configure multipath storage, and a way to test multipath devices.

Changed in maas:
status: New → Triaged
importance: Undecided → Wishlist
Mirek (mirek186) wrote :

I guess you could either probe for any mapper devices and detect multipath that way, or use the serial number when collecting storage info: if the same serial appears more than once, just grab the first one or look for a mapper device.
Do you know where in the code it is so I could have a look myself?
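
The serial-number idea could be sketched roughly as below. This is only an illustration, not MAAS code: the record shape (dicts with "name" and "serial" keys) and both function names are hypothetical, and a shared serial is a heuristic, since some devices report empty or non-unique serials.

```python
from collections import defaultdict


def group_paths_by_serial(disks):
    """Group device names by serial number.

    `disks` is a hypothetical list of dicts with "name" and "serial" keys.
    Disks without a serial are skipped, since they cannot be matched.
    """
    groups = defaultdict(list)
    for disk in disks:
        serial = disk.get("serial")
        if serial:
            groups[serial].append(disk["name"])
    return {serial: sorted(names) for serial, names in groups.items()}


def multipath_candidates(disks):
    """Return only the serials that are reported by more than one device."""
    return {
        serial: names
        for serial, names in group_paths_by_serial(disks).items()
        if len(names) > 1
    }
```

Any serial that comes back with more than one device name is a candidate for a single multipath device; the matching /dev/mapper entry would still have to be looked up separately.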

Lee Trager (ltrager) wrote :

I looked at adding support for multipath on IBM Z. In that case each LPAR has access to the multipath devices of all other LPARs. There is no way to determine locally which multipath device belongs to which LPAR, nor is there a way to determine whether any are in use. IBM told me that it's expected that a storage administrator tells users which multipath device to use. They were adding an API to get that information, but it would still be configured manually in the JBOD.

The code for storage is in various places, and it would require a lot of work to add multipath support:

* Storage data actually comes from LXD - https://github.com/lxc/lxd/blob/master/lxd/resources/storage.go
* The data is processed in the metadata server - https://git.launchpad.net/maas/tree/src/metadataserver/builtin_scripts/hooks.py
* It has to be modeled. We model block devices and physical block devices. For multipath we'd most likely need to design a new model - https://git.launchpad.net/maas/tree/src/maasserver/models
* The API would have to be updated to interact with the new model - https://git.launchpad.net/maas/tree/src/maasserver/api/blockdevices.py
* The websocket would also have to be updated - https://git.launchpad.net/maas/tree/src/maasserver/websockets/handlers/node.py
* The preseed, which generates Curtin config, would require changes - https://git.launchpad.net/maas/tree/src/maasserver/preseed_storage.py
* We'd also want to update the UI to show that this is a multipath device - https://github.com/canonical-web-and-design/maas-ui

Lee Trager (ltrager)
Changed in maas:
assignee: nobody → Lee Trager (ltrager)
milestone: none → 2.10.0
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 3.0.0 → 3.0-beta1
Changed in maas:
status: Fix Committed → Fix Released
Mirek (mirek186) wrote :

Hi, sorry for the late reply, but the issue is still there. The machine-resources binary is still listing each disk individually. I think the correct way of doing it would be to check multipath first, e.g. with multipath -ll, and if any disks are found there, discard them from the standard SATA disk output. I can see MAAS is now using the LXD resource API, which doesn't take multipath into account, so maybe fixing it there would fix it for MAAS at the same time.
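
A rough sketch of that filtering step follows. It reads sysfs instead of parsing the text output of multipath -ll, and it is not what machine-resources or LXD actually do. It assumes that multipath maps can be recognised by a dm/uuid value starting with "mpath-" and that their path devices are listed under slaves/; both function names are hypothetical.

```python
from pathlib import Path


def multipath_members(sys_block="/sys/block"):
    """Map each multipath dm-N device to the names of its path devices.

    Assumption: a device-mapper node is a multipath map when its dm/uuid
    starts with "mpath-". Other dm devices (LVM, crypt, ...) are ignored.
    """
    members = {}
    for dm in sorted(Path(sys_block).glob("dm-*")):
        uuid_file = dm / "dm" / "uuid"
        if not uuid_file.is_file():
            continue
        if not uuid_file.read_text().strip().startswith("mpath-"):
            continue
        slaves = dm / "slaves"
        names = sorted(p.name for p in slaves.iterdir()) if slaves.is_dir() else []
        members[dm.name] = names
    return members


def drop_path_devices(disk_names, members):
    """Remove disks that are only individual paths of a multipath device."""
    hidden = {name for paths in members.values() for name in paths}
    return [name for name in disk_names if name not in hidden]
```

With the path devices dropped, the remaining list plus the dm-N entries would show each JBOD disk once.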

Carlos Bravo (carlosbravo) wrote :

Hi, as of MAAS 3.1 and 3.2 Beta the issue is still present. I have several multipath devices and they still show up as multiple devices instead of a single mpath device.

Junien Fridrick (axino) wrote :

Hi,

This is still a bug in MAAS 3.2.6. This will be a problem for PS6.

Thanks !

Changed in maas:
status: Fix Released → Triaged
importance: Wishlist → Undecided
assignee: Lee Trager (ltrager) → nobody
milestone: 3.0.0-beta1 → none
importance: Undecided → Wishlist
Changed in maas:
importance: Wishlist → Medium
milestone: none → 3.4.0
Björn Tillenius (bjornt) wrote :

Junien, and Mirek, could you both please attach the output of machine-resources and 'multipath -ll'? It would be good to see if both your setups require the same fix.

Changed in maas:
status: Triaged → Incomplete
Junien Fridrick (axino) wrote :
Changed in maas:
status: Incomplete → Triaged
Andy Wu (qch2012) wrote :

We hit this issue during the PS6 deployment: MAAS 3.2.6 lists each multipath path as a separate device, so it is not possible to configure bcache on a multipath device directly in MAAS.

The workaround (credit to Junien Fridrick) is to do the bcache config post-deployment, as follows:

1. make-bcache -C /dev/nvme6n1p1 -B /dev/mapper/mpatha --writeback
2. copy /lib/udev/rules.d/69-bcache.rules to the /etc/udev/rules.d/ folder and modify the rules so they do not register the backing devices (e.g. sd*), so that those devices can still be used by multipathd

    cp /lib/udev/rules.d/69-bcache.rules /etc/udev/rules.d/.
    sed -i 's/sr\*/sr\*|sd\*/' /etc/udev/rules.d/69-bcache.rules

    # line 7 of the modified rules looks like this
    KERNEL=="fd*|sr*|sd*", GOTO="bcache_end"

3. normally, if bcache is configured in MAAS, curtin creates udev rules to link the bcache device to /dev/disk/by-dname; here we need to do it manually

    uuid=$(bcache-super-show /dev/mapper/mpatha | grep dev.uuid | awk '{print $2}')
    echo "SUBSYSTEM==\"block\", ACTION==\"add|change\", ENV{CACHED_UUID}==\"$uuid\", SYMLINK+=\"disk/by-dname/bcache-osd1\"" > /etc/udev/rules.d/bcache-osd1.rules

4. trigger udev rules

    sudo udevadm trigger

5. update initramfs

    update-initramfs -u -k all
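
For anyone scripting steps 2 and 3 across many machines, the two text edits could be sketched in Python as below. This only mirrors the sed and echo commands above. Both function names are hypothetical, and the sketch assumes the stock rules file contains the match KERNEL=="fd*|sr*" exactly as shown in step 2.

```python
def skip_sd_devices(rules_text):
    """Extend the bcache rule's skip list so sd* devices are ignored too.

    Assumes the stock file contains KERNEL=="fd*|sr*"; if it does not,
    the text is returned unchanged.
    """
    return rules_text.replace('KERNEL=="fd*|sr*"', 'KERNEL=="fd*|sr*|sd*"', 1)


def dname_rule(cached_uuid, dname):
    """Build the udev rule that links a bcache device under disk/by-dname."""
    return (
        'SUBSYSTEM=="block", ACTION=="add|change", '
        f'ENV{{CACHED_UUID}}=="{cached_uuid}", '
        f'SYMLINK+="disk/by-dname/{dname}"\n'
    )
```

The returned strings would still have to be written to /etc/udev/rules.d/ and followed by the udevadm trigger and update-initramfs steps above.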

Junien Fridrick (axino) wrote :

Note that this workaround only works if you don't have sdX devices as part of bcache (in our case, we only have nvme* and mpath* devices). If you have both mpath and sdX devices as part of bcache, then you'd need something more elaborate.

tags: added: bug-council canonical-bootstack
Changed in maas:
importance: Medium → High
tags: removed: bug-council
Alexsander de Souza (alexsander-souza) wrote :

This bug is caused by /lib/udev/rules.d/69-bcache.rules (from bcache-tools) grabbing all block devices where ID_FS_TYPE=bcache, including the disks backing the multipath device. Later, when the multipathd daemon comes up and tries to initialize the mpath devices, it fails because those disks are already in use.

no longer affects: bcache-tools
Changed in maas:
assignee: nobody → Alexsander de Souza (alexsander-souza)
Alberto Donato (ack)
Changed in maas:
milestone: 3.4.0 → 3.4.x
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in bcache-tools (Ubuntu):
status: New → Confirmed
Camille Rodriguez (camille.rodriguez) wrote :

Hi, can we get an update on this bug ?

Paride Legovini (paride) wrote :

Hi @Alexsander, my understanding of comment 12 is that bcache-tools is claiming some devices it shouldn't claim; however, I see that you removed the bcache-tools bug task shortly after. It looks like you had an idea of where this should be fixed; if so, can you share more about it? Thanks!

Changed in bcache-tools (Ubuntu):
status: Confirmed → Incomplete
Alexsander de Souza (alexsander-souza) wrote :

Hi @paride, I removed the bcache-tools task unintentionally. We don't have a good solution to work around this bcache-tools behaviour.

Changed in bcache-tools (Ubuntu):
status: Incomplete → New
Changed in maas:
status: Triaged → Fix Committed