Incorrect naming of partitions on pmem-based devices

Bug #2002345 reported by Rod Smith
Affects: curtin
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

We have a Lenovo ThinkSystem SR850P (gloomer) with sixteen Intel Optane DCPMMs that provide four "disk" devices -- /dev/pmem0 through /dev/pmem3. MAAS (3.2.6) correctly detects these as disk devices, but after partitions are configured on them, deployment of Ubuntu 22.04 fails, with a message like this on the console (or IPMI SoL):

[ 180.580624] cloud-init[2642]: devsync happy - path /dev/pmem0 now exists
[ 180.600631] cloud-init[2642]: return volume path /dev/pmem0
[ 180.616616] cloud-init[2642]: Running command ['lsblk', '--noheadings', '--bytes', '--pairs', '--output=ALIGNMENT,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,FSTYPE,GROUP,KNAME,LABEL,LOG-SEC,MAJ:MIN,MIN-IO,MODE,MODEL,MOUNTPOINT,NAME,OPT-IO,OWNER,PHY-SEC,RM,RO,ROTA,RQ-SIZE,SIZE,STATE,TYPE,UUID', '/dev/pmem0'] with allowed return codes [0] (capture=True)
[ 180.660629] cloud-init[2642]: get_blockdev_sector_size: info:
[ 180.676628] cloud-init[2642]: {
[ 180.688627] cloud-init[2642]: "pmem0": {
[ 180.704633] cloud-init[2642]: "ALIGNMENT": "0",
[ 180.720633] cloud-init[2642]: "DISC-ALN": "0",
[ 180.736641] cloud-init[2642]: "DISC-GRAN": "0",
[ 180.752614] cloud-init[2642]: "DISC-MAX": "0",
[ 180.768647] cloud-init[2642]: "DISC-ZERO": "0",
[ 180.784637] cloud-init[2642]: "FSTYPE": "",
[ 180.800638] cloud-init[2642]: "GROUP": "disk",
[ 180.816643] cloud-init[2642]: "KNAME": "pmem0",
[ 180.832612] cloud-init[2642]: "LABEL": "",
[ 180.848617] cloud-init[2642]: "LOG-SEC": "4096",
[ 180.864627] cloud-init[2642]: "MAJ:MIN": "259:0",
[ 180.880634] cloud-init[2642]: "MIN-IO": "4096",
[ 180.896622] cloud-init[2642]: "MODE": "brw-rw----",
[ 180.912617] cloud-init[2642]: "MODEL": "",
[ 180.928621] cloud-init[2642]: "MOUNTPOINT": "",
[ 180.944619] cloud-init[2642]: "NAME": "pmem0",
[ 180.960620] cloud-init[2642]: "OPT-IO": "0",
[ 180.976614] cloud-init[2642]: "OWNER": "root",
[ 180.992618] cloud-init[2642]: "PHY-SEC": "4096",
[ 181.008636] cloud-init[2642]: "RM": "0",
[ 181.024625] cloud-init[2642]: "RO": "0",
[ 181.040611] cloud-init[2642]: "ROTA": "0",
[ 181.056621] cloud-init[2642]: "RQ-SIZE": "128",
[ 181.072616] cloud-init[2642]: "SIZE": "541165879296",
[ 181.088616] cloud-init[2642]: "STATE": "",
[ 181.104626] cloud-init[2642]: "TYPE": "disk",
[ 181.120623] cloud-init[2642]: "UUID": "",
[ 181.136627] cloud-init[2642]: "device_path": "/dev/pmem0"
[ 181.152712] cloud-init[2642]: }
[ 181.164626] cloud-init[2642]: }
[ 181.176626] cloud-init[2642]: get_blockdev_sector_size: (log=4096, phys=4096)
[ 181.196639] cloud-init[2642]: pmem0 logical_block_size_bytes: 4096
[ 181.212625] cloud-init[2642]: adding partition 'pmem0-part1' to disk 'pmem0' (ptable: 'gpt')
[ 181.232633] cloud-init[2642]: partnum: 1 offset_sectors: 256 length_sectors: 132118527
[ 181.252630] cloud-init[2642]: Preparing partition location on disk /dev/pmem0
[ 181.272636] cloud-init[2642]: Wiping 1M on /dev/pmem0 at offset 1048576
[ 181.292641] cloud-init[2642]: Running command ['sgdisk', '--new', '1:256:132118783', '--typecode=1:8300', '/dev/pmem0'] with allowed return codes [0] (capture=True)
[ 181.320645] cloud-init[2642]: Running command ['udevadm', 'info', '--query=property', '--export', '/dev/pmem0'] with allowed return codes [0] (capture=True)
[ 181.348620] cloud-init[2642]: /dev/pmem0 is multipath device? False
[ 181.364618] cloud-init[2642]: Running command ['blockdev', '--rereadpt', '/dev/pmem0'] with allowed return codes [0] (capture=True)
[ 181.388614] cloud-init[2642]: Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
[ 181.408621] cloud-init[2642]: TIMED udevadm_settle(): 0.099
[ 181.424616] cloud-init[2642]: Running command ['udevadm', 'settle', '--exit-if-exists=/dev/pmem01'] with allowed return codes [0] (capture=False)
[ 181.448616] cloud-init[2642]: TIMED udevadm_settle(exists='/dev/pmem01'): 0.012
[ 181.468660] cloud-init[2642]: get_path_to_storage_volume for volume pmem0-part1({'device': 'pmem0', 'id': 'pmem0-part1', 'name': 'pmem0-part1', 'number': 1, 'offset': '4194304B', 'size': '541157490688B', 'type': 'partition', 'uuid': 'a8898ccd-3f20-4dfb-9948-0fad284617c7', 'wipe': 'superblock'})
[ 181.508636] cloud-init[2642]: get_path_to_storage_volume for volume pmem0({'id': 'pmem0', 'name': 'pmem0', 'path': '/dev/pmem0', 'ptable': 'gpt', 'type': 'disk', 'wipe': 'superblock'})
[ 181.536630] cloud-init[2642]: Running command ['udevadm', 'info', '--query=property', '--export', '/dev/pmem0'] with allowed return codes [0] (capture=True)
[ 181.564631] cloud-init[2642]: /dev/pmem0 is multipath device member? False
[ 181.584628] cloud-init[2642]: Running command ['partprobe', '/dev/pmem0'] with allowed return codes [0, 1] (capture=False)
[ 181.608631] cloud-init[2642]: Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
[ 181.628631] cloud-init[2642]: TIMED udevadm_settle(): 0.063
[ 181.644621] cloud-init[2642]: devsync happy - path /dev/pmem0 now exists
[ 181.664625] cloud-init[2642]: return volume path /dev/pmem0
[ 181.680631] cloud-init[2642]: An error occured handling 'pmem0-part1': OSError - could not get path to dev from kname: pmem01
[ 181.704612] cloud-init[2642]: finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: configuring partition: pmem0-part1
[ 181.728633] cloud-init[2642]: TIMED BLOCK_META: 14.138
[ 181.744647] cloud-init[2642]: finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: curtin command block-meta
[ 181.768641] cloud-init[2642]: Traceback (most recent call last):
[ 181.784644] cloud-init[2642]: File "/curtin/curtin/commands/main.py", line 202, in main
[ 181.804615] cloud-init[2642]: ret = args.func(args)
[ 181.820648] cloud-init[2642]: File "/curtin/curtin/log.py", line 97, in wrapper
[ 181.840643] cloud-init[2642]: return log_time("TIMED %s: " % msg, func, *args, **kwargs)
[ 181.860645] cloud-init[2642]: File "/curtin/curtin/log.py", line 79, in log_time
[ 181.880615] cloud-init[2642]: return func(*args, **kwargs)
[ 181.896621] cloud-init[2642]: File "/curtin/curtin/commands/block_meta.py", line 117, in block_meta
[ 181.916612] cloud-init[2642]: return meta_custom(args)
[ 181.932617] cloud-init[2642]: File "/curtin/curtin/commands/block_meta.py", line 2057, in meta_custom
[ 181.952621] cloud-init[2642]: handler(command, storage_config_dict, command_handlers)
[ 181.972643] cloud-init[2642]: File "/curtin/curtin/commands/block_meta.py", line 1038, in partition_handler
[ 181.992634] cloud-init[2642]: make_dname(info.get('id'), storage_config)
[ 182.012612] cloud-init[2642]: File "/curtin/curtin/commands/block_meta.py", line 307, in make_dname
[ 182.032623] cloud-init[2642]: path = get_path_to_storage_volume(volume, storage_config)
[ 182.052611] cloud-init[2642]: File "/curtin/curtin/commands/block_meta.py", line 451, in get_path_to_storage_volume
[ 182.076615] cloud-init[2642]: volume_path = block.kname_to_path(partition_kname)
[ 182.096626] cloud-init[2642]: File "/curtin/curtin/block/__init__.py", line 117, in kname_to_path
[ 182.116645] cloud-init[2642]: raise OSError('could not get path to dev from kname: {}'.format(kname))
[ 182.136622] cloud-init[2642]: OSError: could not get path to dev from kname: pmem01
[ 182.156713] cloud-init[2642]: could not get path to dev from kname: pmem01
[ 182.176632] cloud-init[2642]:
[ 182.188622] cloud-init[2642]: Stderr: ''
[ 182.200635] cloud-init[2642]: 2023-01-09 18:11:59,382 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
[ 182.228635] cloud-init[2642]: 2023-01-09 18:11:59,382 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed
[ 182.260654] cloud-init[2642]: Cloud-init v. 22.4.2-0ubuntu0~22.04.1 finished at Mon, 09 Jan 2023 18:12:00 +0000. Datasource DataSourceMAAS [http://10.1.10.3:5248/MAAS/metadata/curtin]. Up 122.98 seconds
[ 186.521297] systemd-journald[873]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 186.534202] systemd-journald[873]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 186.546541] systemd-journald[873]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 186.558861] systemd-journald[873]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 186.571184] systemd-journald[873]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 186.583524] systemd-journald[873]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 186.595815] systemd-journald[873]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 186.608120] systemd-journald[873]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 186.620455] systemd-journald[873]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 186.632701] systemd-journald[873]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 186.645015] systemd-journald[873]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 186.657334] systemd-journald[873]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 186.669609] systemd-journald[873]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 186.681880] systemd-journald[873]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
[ 187.485066] reboot: Restarting system

As I read this, it looks as if curtin (or something) is identifying the partition as /dev/pmem01, whereas the kernel actually names it /dev/pmem0p1. Reconfiguring MAAS to place filesystems directly on the disk devices, without partitioning them first, succeeds.
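For reference, the kernel inserts a 'p' separator whenever the parent device name ends in a digit (nvme0n1p1, mmcblk0p1, pmem0p1) and appends the number directly otherwise (sda1). A minimal Python sketch of that rule, for illustration only (this is not curtin's code):

def partition_kname(disk_kname, partnum):
    # Mimic the kernel's disk_name() convention: a trailing digit
    # in the parent device name forces a 'p' before the partition
    # number; otherwise the number is appended directly.
    if disk_kname and disk_kname[-1].isdigit():
        return '{}p{}'.format(disk_kname, partnum)
    return '{}{}'.format(disk_kname, partnum)

With this rule, partition_kname('pmem0', 1) yields 'pmem0p1', whereas plain concatenation produces the nonexistent 'pmem01' seen in the traceback above.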

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

This was fixed (generically) in e60b738d84c332d5fb2a51625143bfe998bb9b7f. Do we need to do a release to make it easier for you to pick it up?
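For anyone on an unpatched curtin, the generic approach is to look the partition name up in sysfs instead of building it by string concatenation. A rough sketch of that idea (a hypothetical helper, not the actual patch):

import os

def find_partition_kname(disk_kname, partnum):
    # Scan /sys/class/block/<disk> for a child device whose
    # 'partition' attribute matches the requested number, so we
    # return the kernel's own spelling of the partition name.
    sys_dir = '/sys/class/block/{}'.format(disk_kname)
    for entry in os.listdir(sys_dir):
        part_attr = os.path.join(sys_dir, entry, 'partition')
        if os.path.isfile(part_attr):
            with open(part_attr) as fh:
                if fh.read().strip() == str(partnum):
                    return entry
    raise OSError('no partition {} on {}'.format(partnum, disk_kname))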

Revision history for this message
Rod Smith (rodsmith) wrote :

Michael, yes, but I don't think there's any need to rush it, unless the next release is scheduled for months from now.

Revision history for this message
Jeff Lane (bladernr) wrote :

@mwhudson when would a release be possible for Jammy and Focal?

I'm a bit concerned that someone external will try deploying systems with the latest MAAS and hit this issue too.
