Fuel 7.0 deployment using multiple NVMe disks fails

Bug #1503987 reported by Atze de Vries
This bug affects 2 people
Affects             Status        Importance  Assigned to          Milestone
Fuel for OpenStack  Fix Released  High        Alexander Kislitsky
7.0.x               Fix Released  High        Sergii Rizvan

Bug Description

In Fuel 7.0, deployment fails when a node has more than one NVMe disk.

Fuel agent fails with the error:

Disk not found: <diskname>, where diskname is nvme1n1 or nvme2n1

The error comes from the function '_disk_dev' in fuel_agent/drivers/nailgun.py.

I dug around and found the cause of the error.

In the CentOS-based bootstrap image, 'udevadm' does not work correctly: it reports the disk id incorrectly, so the ids of the two NVMe disks are the same.

Here is an example from a server with 3 disks: 1 hardware RAID and two NVMe disks (which are on the PCI bus).

[root@bootstrap ~]# udevadm info --export-db | grep /dev/disk
E: DEVLINKS=/dev/block/253:0 /dev/disk/by-path/pci-0000:04:00.0 /dev/disk/by-id/wwn-0x65cd2e4080864356494e000000010000
E: DEVLINKS=/dev/block/253:64 /dev/disk/by-path/pci-0000:05:00.0 /dev/disk/by-id/wwn-0x65cd2e4080864356494e000000010000
E: DEVLINKS=/dev/block/8:0 /dev/disk/by-id/scsi-3600304801c1e8f001d5cb9c602dff27b /dev/disk/by-path/pci-0000:82:00.0-scsi-0:2:0:0 /dev/disk/by-id/wwn-0x600304801c1e8f001d5cb9c602dff27b
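
A quick way to spot the collision is to group the DEVLINKS lines by their by-id link. The sketch below is only an illustration: `find_shared_links` is a hypothetical helper name, and its input is the udevadm output pasted above.

```python
from collections import defaultdict

# DEVLINKS lines copied from the udevadm output in this report.
UDEV_LINES = [
    "E: DEVLINKS=/dev/block/253:0 /dev/disk/by-path/pci-0000:04:00.0 "
    "/dev/disk/by-id/wwn-0x65cd2e4080864356494e000000010000",
    "E: DEVLINKS=/dev/block/253:64 /dev/disk/by-path/pci-0000:05:00.0 "
    "/dev/disk/by-id/wwn-0x65cd2e4080864356494e000000010000",
    "E: DEVLINKS=/dev/block/8:0 "
    "/dev/disk/by-id/scsi-3600304801c1e8f001d5cb9c602dff27b "
    "/dev/disk/by-path/pci-0000:82:00.0-scsi-0:2:0:0 "
    "/dev/disk/by-id/wwn-0x600304801c1e8f001d5cb9c602dff27b",
]

MARKER = "E: DEVLINKS="


def find_shared_links(lines, prefix="/dev/disk/by-id/"):
    """Return {link: [block links]} for links claimed by more than one device.

    Hypothetical helper, written for this example only.
    """
    owners = defaultdict(list)
    for line in lines:
        if not line.startswith(MARKER):
            continue
        links = line[len(MARKER):].split()
        # Use the /dev/block/MAJOR:MINOR link as the identity of the device.
        block = next((l for l in links if l.startswith("/dev/block/")), None)
        for link in links:
            if link.startswith(prefix):
                owners[link].append(block)
    return {link: devs for link, devs in owners.items() if len(devs) > 1}


shared = find_shared_links(UDEV_LINES)
for link, devs in shared.items():
    print(link, "is shared by", devs)
```

Only the wwn link of the two NVMe disks is reported; the links of the RAID volume each belong to a single device.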

This causes 'match_device' in fuel_agent/drivers/nailgun.py to match two block devices (for example, the list ['/dev/nvme0n1', '/dev/nvme1n1']).

In '_disk_dev' this makes 'len(found) > 1' true, which raises the "Disk not found" error.
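
To make the failure concrete, here is a minimal sketch of the matching step. `match_device_sketch` is a simplified stand-in written for this example, not the actual fuel-agent `match_device`; it assumes a disk matches when one of its by-id DEVLINKS appears in the 'extra' list of ks_disk. The data is trimmed from the hu_disks and ks_disk dumps in this report.

```python
WWN = "disk/by-id/wwn-0x65cd2e4080864356494e000000010000"

# Trimmed copies of the hu_disks entries from this report.
HU_DISKS = [
    {"device": "/dev/nvme0n1",
     "uspec": {"DEVLINKS": ["/dev/block/253:0",
                            "/dev/disk/by-path/pci-0000:04:00.0",
                            "/dev/" + WWN]}},
    {"device": "/dev/nvme1n1",
     "uspec": {"DEVLINKS": ["/dev/block/253:64",
                            "/dev/disk/by-path/pci-0000:05:00.0",
                            "/dev/" + WWN]}},
    {"device": "/dev/sda",
     "uspec": {"DEVLINKS": [
         "/dev/block/8:0",
         "/dev/disk/by-id/scsi-3600304801c1e8f001d5cb9c602dff27b",
         "/dev/disk/by-path/pci-0000:82:00.0-scsi-0:2:0:0",
         "/dev/disk/by-id/wwn-0x600304801c1e8f001d5cb9c602dff27b"]}},
]

# Trimmed copy of ks_disk from this report.
KS_DISK = {"name": "nvme0n1", "extra": [WWN]}


def match_device_sketch(hu_disk, ks_disk):
    """Assumed rule: any shared by-id link means "same device"."""
    wanted = {"/dev/%s" % link for link in ks_disk.get("extra", [])}
    by_id = {link for link in hu_disk["uspec"].get("DEVLINKS", [])
             if link.startswith("/dev/disk/by-id/")}
    return bool(wanted & by_id)


matched = [d["device"] for d in HU_DISKS if match_device_sketch(d, KS_DISK)]
ambiguous = len(matched) > 1  # the condition that leads to the error
print(matched, ambiguous)
```

Because both NVMe disks carry the same wwn link, the list holds two devices and the lookup cannot pick one.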

A temporary workaround I used (after which I successfully deployed an environment) is to modify the function '_disk_dev' as follows:

def _disk_dev(self, ks_disk):
        # first we try to find a device that matches ks_disk
        # comparing by-id and by-path links
        matched = [hu_disk['device'] for hu_disk in self.hu_disks
                   if match_device(hu_disk, ks_disk)]
        # if we can not find a device by its by-id and by-path links
        # we try to find a device by its name
        fallback = [hu_disk['device'] for hu_disk in self.hu_disks
                    if '/dev/%s' % ks_disk['name'] == hu_disk['device']]

        # fix for CentOS since udevadm is not reporting correctly
        # in case of nvme disks, use only the fallback method
        if any('nvme' in f for f in matched):
            matched = []
        # end of fix

        found = matched or fallback
        if not found or len(found) > 1:
            raise errors.DiskNotFoundError(
                'Disk not found: %s ' % ks_disk['name'])

So for NVMe disks, only the fallback method (matching by device name) is used.

Here is some data to play with:

hu_disks

[{'bspec': {'alignoff': '0',
   'iomin': '512',
   'ioopt': '0',
   'maxsect': '256',
   'pbsz': '512',
   'ra': '256',
   'ro': '0',
   'size64': '800166076416',
   'ss': '512',
   'sz': '1562824368'},
  'device': '/dev/nvme0n1',
  'espec': {'removable': '0', 'vendor': '0x8086'},
  'uspec': {'DEVLINKS': ['/dev/block/253:0',
    '/dev/disk/by-path/pci-0000:04:00.0',
    '/dev/disk/by-id/wwn-0x65cd2e4080864356494e000000010000'],
   'DEVNAME': '/dev/nvme0n1',
   'DEVPATH': '/devices/pci0000:00/0000:00:03.0/0000:04:00.0/block/nvme0n1',
   'DEVTYPE': 'disk',
   'ID_MODEL': 'INTEL_SSDPE2ME80',
   'ID_SERIAL_SHORT': '65cd2e4080864356494e000000010000',
   'ID_VENDOR': 'NVMe',
   'ID_WWN': '0x65cd2e4080864356',
   'MAJOR': '253',
   'MINOR': '0'}},
 {'bspec': {'alignoff': '0',
   'iomin': '512',
   'ioopt': '0',
   'maxsect': '256',
   'pbsz': '512',
   'ra': '256',
   'ro': '0',
   'size64': '800166076416',
   'ss': '512',
   'sz': '1562824368'},
  'device': '/dev/nvme1n1',
  'espec': {'removable': '0', 'vendor': '0x8086'},
  'uspec': {'DEVLINKS': ['/dev/block/253:64',
    '/dev/disk/by-path/pci-0000:05:00.0',
    '/dev/disk/by-id/wwn-0x65cd2e4080864356494e000000010000'],
   'DEVNAME': '/dev/nvme1n1',
   'DEVPATH': '/devices/pci0000:00/0000:00:03.1/0000:05:00.0/block/nvme1n1',
   'DEVTYPE': 'disk',
   'ID_MODEL': 'INTEL_SSDPE2ME80',
   'ID_SERIAL_SHORT': '65cd2e4080864356494e000000010000',
   'ID_VENDOR': 'NVMe',
   'ID_WWN': '0x65cd2e4080864356',
   'MAJOR': '253',
   'MINOR': '64'}},
 {'bspec': {'alignoff': '0',
   'iomin': '4096',
   'ioopt': '0',
   'maxsect': '560',
   'pbsz': '4096',
   'ra': '256',
   'ro': '0',
   'size64': '79456894976',
   'ss': '512',
   'sz': '155189248'},
  'device': '/dev/sda',
  'espec': {'removable': '0',
   'state': 'running',
   'timeout': '90',
   'vendor': 'LSI'},
  'uspec': {'DEVLINKS': ['/dev/block/8:0',
    '/dev/disk/by-id/scsi-3600304801c1e8f001d5cb9c602dff27b',
    '/dev/disk/by-path/pci-0000:82:00.0-scsi-0:2:0:0',
    '/dev/disk/by-id/wwn-0x600304801c1e8f001d5cb9c602dff27b'],
   'DEVNAME': '/dev/sda',
   'DEVPATH': '/devices/pci0000:80/0000:80:01.0/0000:82:00.0/host10/target10:2:0/10:2:0:0/block/sda',
   'DEVTYPE': 'disk',
   'ID_BUS': 'scsi',
   'ID_MODEL': 'SMC3108',
   'ID_SERIAL_SHORT': '600304801c1e8f001d5cb9c602dff27b',
   'ID_VENDOR': 'LSI',
   'ID_WWN': '0x600304801c1e8f00',
   'MAJOR': '8',
   'MINOR': '0'}}]

ks_disk

{'extra': ['disk/by-id/wwn-0x65cd2e4080864356494e000000010000'],
 'free_space': 762469,
 'id': 'disk/by-path/pci-0000:05:00.0',
 'name': 'nvme0n1',
 'size': 763097,
 'type': 'disk',
 'volumes': [{'size': 300, 'type': 'boot'},
  {'file_system': 'ext2',
   'mount': '/boot',
   'name': 'Boot',
   'size': 200,
   'type': 'raid'},
  {'size': 0, 'type': 'lvm_meta_pool'},
  {'lvm_meta_size': 64, 'size': 55360, 'type': 'pv', 'vg': 'os'},
  {'lvm_meta_size': 64, 'size': 707237, 'type': 'pv', 'vg': 'vm'}]}

description: updated
Changed in fuel:
assignee: nobody → Fuel Python Team (fuel-python)
milestone: none → 8.0
importance: Undecided → High
Alexander Gordeev (a-gordeev) wrote :

Hi, Atze de Vries

Thanks for the perfectly reported issue.

tags: added: customer-found
Changed in fuel:
status: New → Confirmed
tags: added: ibp
Alexander Gordeev (a-gordeev) wrote :

It seems that nailgun is affected as well.

https://github.com/stackforge/fuel-web/blob/master/nailgun/nailgun/extensions/volume_manager/manager.py#L630

It relies on 'extra' and compares disks by intersecting sets: https://github.com/stackforge/fuel-web/blob/master/nailgun/nailgun/extensions/volume_manager/manager.py#L648

So even if nailgun-agent reports two NVMe disks, nailgun cannot distinguish one disk from the other, because both have the same by-id 'wwn' link in 'extra'.

tags: added: module-nailgun
Oleg S. Gelbukh (gelbuhos) wrote :

Alexander, we could include all contents of DEVLINKS when generating the 'extra' parameter in fuel-nailgun-agent. That should solve the problem, as the /dev/disk/by-path/ links are different for the two disks.
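
As a rough illustration of the point about DEVLINKS, the sketch below compares the links of the two NVMe disks from the report. The helper name `links_of_kind` is made up for this example, and how nailgun would consume a richer 'extra' is not shown.

```python
# DEVLINKS of the two NVMe disks, copied from the hu_disks dump in this report.
NVME0_LINKS = [
    "/dev/block/253:0",
    "/dev/disk/by-path/pci-0000:04:00.0",
    "/dev/disk/by-id/wwn-0x65cd2e4080864356494e000000010000",
]
NVME1_LINKS = [
    "/dev/block/253:64",
    "/dev/disk/by-path/pci-0000:05:00.0",
    "/dev/disk/by-id/wwn-0x65cd2e4080864356494e000000010000",
]


def links_of_kind(links, kind):
    """Return the set of links under /dev/disk/<kind>/ (hypothetical helper)."""
    prefix = "/dev/disk/%s/" % kind
    return {link for link in links if link.startswith(prefix)}


shared_ids = (links_of_kind(NVME0_LINKS, "by-id")
              & links_of_kind(NVME1_LINKS, "by-id"))
shared_paths = (links_of_kind(NVME0_LINKS, "by-path")
                & links_of_kind(NVME1_LINKS, "by-path"))

print("shared by-id links:", sorted(shared_ids))
print("shared by-path links:", sorted(shared_paths))
```

The by-id sets overlap while the by-path sets do not. Note that a plain intersection over all links would still be non-empty because of the shared by-id link, so the comparison itself would also have to look at by-path.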

Oleg S. Gelbukh (gelbuhos) wrote :

fuel-agent still has to be fixed, of course, as it doesn't take 'extra' into account, as far as I can see.

Dmitry Pyzhov (dpyzhov)
tags: added: area-python
Stanislav Kolenkin (skolenkin) wrote :

Workaround:

1. Disable the nvme driver in the Cobbler bootstrap profile
(Guide: https://docs.mirantis.com/openstack/fuel/fuel-7.0/operations.html#using-the-cobbler-web-ui-to-set-kernel-parameters)
Add to Kernel Options:
nvme.blacklist=1

2. Reboot the node into bootstrap

3. Connect to the node via ssh and check the kernel module:
lsmod | grep nvme

4. Deploy the node

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Alexander Kislitsky (akislitsky)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-agent (master)

Fix proposed to branch: master
Review: https://review.openstack.org/246424

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-nailgun-agent (master)

Fix proposed to branch: master
Review: https://review.openstack.org/246444

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/246995

Roman Rufanov (rrufanov)
tags: added: support
Dmitry Pyzhov (dpyzhov)
tags: added: team-bugfix
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-nailgun-agent (master)

Change abandoned by Alexander Kislitsky (<email address hidden>) on branch: master
Review: https://review.openstack.org/246444
Reason: This change is not required for fix.

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-agent (master)

Reviewed: https://review.openstack.org/246424
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=f7a0ecd242425143d8bed5093572e01c66607ce8
Submitter: Jenkins
Branch: master

commit f7a0ecd242425143d8bed5093572e01c66607ce8
Author: Alexander Kislitsky <email address hidden>
Date: Tue Nov 17 17:24:01 2015 +0300

    Workaround for detection of CentOS NVMe disks added

    On CentOs udevadm returns the same ids for NVMe disks.
    For handling this case we use matched by name devices.

    Change-Id: Iecca4c188fa148be4c4857767a8b57b60def8be9
    Partial-Bug: #1503987

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/246995
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=d7432c144f1756533bdea220fe3024f57fc70b56
Submitter: Jenkins
Branch: master

commit d7432c144f1756533bdea220fe3024f57fc70b56
Author: Alexander Kislitsky <email address hidden>
Date: Wed Nov 18 18:45:56 2015 +0300

    Case with same disk ids covered in VolumeManager

    For NVMe disks udevadm on the CentOs returns the same ids,
    thus disks can't be distincted. Disk matching workflow
    is changed. Now we use composite identifier built from
    disk id and path for workaround issue with same ids.
    If disk don't matched by the composite key we are trying to
    matching them by id. If matching by id failed we are trying
    to match disk by path.

    Closes-Bug: #1503987
    Change-Id: I09d6514e9749f964bb2d697b0adaf9dcb2b0ac73
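
The matching order described in this commit message (composite key, then id, then path) can be sketched roughly as follows. This is not the VolumeManager code: the field names `dev_id` and `dev_path` and the function `find_disk` are assumptions made for the example, and returning the first hit at each stage is a simplification, since the commit message does not say how ambiguous hits are handled.

```python
WWN = "disk/by-id/wwn-0x65cd2e4080864356494e000000010000"

# Two reported disks with the same id but different paths (values from this report).
REPORTED = [
    {"name": "nvme0n1", "dev_id": WWN,
     "dev_path": "disk/by-path/pci-0000:04:00.0"},
    {"name": "nvme1n1", "dev_id": WWN,
     "dev_path": "disk/by-path/pci-0000:05:00.0"},
]


def find_disk(stored, reported_disks):
    """Return the reported disk that best matches `stored`, or None.

    Stages are tried in order: (id, path) composite key, id only, path only.
    """
    stages = (
        lambda d: (d["dev_id"], d["dev_path"])
        == (stored["dev_id"], stored["dev_path"]),
        lambda d: d["dev_id"] == stored["dev_id"],
        lambda d: d["dev_path"] == stored["dev_path"],
    )
    for matches in stages:
        hits = [d for d in reported_disks if matches(d)]
        if hits:
            return hits[0]  # simplification: take the first hit
    return None


stored_second = {"dev_id": WWN, "dev_path": "disk/by-path/pci-0000:05:00.0"}
print(find_disk(stored_second, REPORTED)["name"])
```

With the composite key tried first, the second NVMe disk is resolved even though its id is identical to that of the first one.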

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-agent (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/252912

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/252914

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/7.0)

Reviewed: https://review.openstack.org/252914
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=c43d7df2ac1b6195437cde46658f54e684180953
Submitter: Jenkins
Branch: stable/7.0

commit c43d7df2ac1b6195437cde46658f54e684180953
Author: Alexander Kislitsky <email address hidden>
Date: Wed Nov 18 18:45:56 2015 +0300

    Case with same disk ids covered in VolumeManager

    For NVMe disks udevadm on the CentOs returns the same ids,
    thus disks can't be distincted. Disk matching workflow
    is changed. Now we use composite identifier built from
    disk id and path for workaround issue with same ids.
    If disk don't matched by the composite key we are trying to
    matching them by id. If matching by id failed we are trying
    to match disk by path.

    Closes-Bug: #1503987
    Change-Id: I09d6514e9749f964bb2d697b0adaf9dcb2b0ac73
    (cherry picked from commit d7432c144f1756533bdea220fe3024f57fc70b56)

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-agent (stable/7.0)

Reviewed: https://review.openstack.org/252912
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=8fbc83b7da790e8942e3a86302479b0d87d2d0d6
Submitter: Jenkins
Branch: stable/7.0

commit 8fbc83b7da790e8942e3a86302479b0d87d2d0d6
Author: Alexander Kislitsky <email address hidden>
Date: Tue Nov 17 17:24:01 2015 +0300

    Workaround for detection of CentOS NVMe disks added

    On CentOs udevadm returns the same ids for NVMe disks.
    For handling this case we use matched by name devices.

    Change-Id: Iecca4c188fa148be4c4857767a8b57b60def8be9
    Partial-Bug: #1503987
    (cherry picked from commit f7a0ecd242425143d8bed5093572e01c66607ce8)

Sergii Rizvan (srizvan) wrote :

Verified on 7.0
Packages:
fuel-agent-7.0.0-138.1.git337e782.noarch
nailgun-7.0.0-7684.1.git687cd9d.noarch

Alexander Gordeev (a-gordeev) wrote :

Sergii Rizvan (srizvan), were you able to test the fix against a node with multiple NVMe storage devices?

Sergii Rizvan (srizvan) wrote :

Alexander Gordeev (a-gordeev), no, I wasn't able to test this fix with NVMe storage because we don't have such a lab. We merged the fix based on review and verified that the updated packages contain the changes from the fix.

tags: added: 7.0-mu-2
tags: added: on-verification
Andrey Lavrentyev (alavrentyev) wrote :

Verified on ISO #570

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "570"
  build_id: "570"
  fuel-nailgun_sha: "558ca91a854cf29e395940c232911ffb851899c1"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "658be72c4b42d3e1436b86ac4567ab914bfb451b"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "c2a335b5b725f1b994f78d4c78723d29fa44685a"
  fuel-ostf_sha: "3bc76a63a9e7d195ff34eadc29552f4235fa6c52"
  fuel-mirror_sha: "fb45b80d7bee5899d931f926e5c9512e2b442749"
  fuelmenu_sha: "78ffc73065a9674b707c081d128cb7eea611474f"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "a43cf96cd9532f10794dce736350bf5bed350e9d"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "d605bcbabf315382d56d0ce8143458be67c53434"

tags: removed: on-verification
Changed in fuel:
status: Fix Committed → Fix Released