Activity log for bug #2039614

Date Who What changed Old value New value Message
2023-10-17 22:22:36 Jeff Lane  bug added bug
2023-10-17 22:24:56 Paul Larson bug added subscriber Paul Larson
2023-10-19 08:47:56 Jerzy Husakowski maas: importance Undecided High
2023-10-19 08:47:56 Jerzy Husakowski maas: status New Triaged
2023-10-19 08:47:56 Jerzy Husakowski maas: milestone 3.4.x
2023-10-20 14:33:32 Jeff Lane  attachment added tabuu_snappy.yaml https://bugs.launchpad.net/maas/+bug/2039614/+attachment/5711748/+files/tabuu_snappy.yaml
2023-10-20 14:33:57 Jeff Lane  description We (Cert) just updated MAAS from 3.3.x to 3.4.0-RC1. We have, in testflinger, default partition definitions that, because of how MAAS identifies partitions and disks is very reliant on MAAS ids for disk devices and partitions. For example, prior to the move to 3.4.0, this was the definition for one server (these change and grow more or less complex depending on the number of disks in a machine): 2 default_disks: 3 - id: '216' 4 name: nvme0n1 5 parent_disk_blkid: '216' 6 ptable: GPT 7 type: disk 8 - device: '882' 9 id: nvme0n1-part1 10 number: '882' 11 parent_disk: '216' 12 parent_disk_blkid: '216' 13 size: '536870912' 14 type: partition 15 - fstype: fat32 16 id: 882-format 17 label: efi 18 parent_disk: '216' 19 parent_disk_blkid: '216' 20 type: format 21 volume: '882' 22 - device: 882-format 23 id: 882-mount 24 parent_disk: '216' 25 parent_disk_blkid: '216' 26 path: /boot/efi 27 type: mount 28 - device: '883' 29 id: nvme0n1-part2 30 number: '883' 31 parent_disk: '216' 32 parent_disk_blkid: '216' 33 size: '1599778848768' 34 type: partition 35 - fstype: ext4 36 id: 883-format 37 label: root 38 parent_disk: '216' 39 parent_disk_blkid: '216' 40 type: format 41 volume: '883' 42 - device: 883-format 43 id: 883-mount 44 parent_disk: '216' 45 parent_disk_blkid: '216' 46 path: / 47 type: mount As you can see, this spells out partitions on a disk with the ID of 216, where the partition id is 882 and 883 to spell out the /boot/efi filesystem and the root filesystem respectively. These IDs were pulled from MAAS and reflected what on would get from a 'maas <name> partition reads <disk_id>. This allows us to provide a means for users to define their own partition scheme (e.g. set up something ceph-like, or bcache or whatever) and then revert things to the default. After the update, all testflinger deployments now fail seemingly because apparently the partition IDs have been changed. 
Looking at a dump of this machine via the MAAS CLI, the disk ID has remained the same but the partition IDs are now all it the 16,000s: bladernr@weavile:~$ maas bladernr partitions read 8pk6f8 216 Success. Machine-readable output follows: [ { "uuid": "b838b3db-3266-44da-bdbe-2a90b75df617", "size": 1599778848768, "bootable": false, "tags": [], "used_for": "ext4 formatted filesystem mounted at /", "type": "partition", "path": "/dev/disk/by-dname/nvme0n1-part2", "device_id": 216, "filesystem": { "fstype": "ext4", "label": "root", "uuid": "21aa8167-f0f7-4166-9e62-57e6504cac8d", "mount_point": "/", "mount_options": "" }, "id": 16153, "system_id": "8pk6f8", "resource_uri": "/MAAS/api/2.0/nodes/8pk6f8/blockdevices/216/partition/16153" }, { "uuid": "94256eca-c024-454b-b9f2-5c3b79b29611", "size": 536870912, "bootable": false, "tags": [], "used_for": "fat32 formatted filesystem mounted at /boot/efi", "type": "partition", "path": "/dev/disk/by-dname/nvme0n1-part1", "device_id": 216, "filesystem": { "fstype": "fat32", "label": "efi", "uuid": "1b93141c-af66-4594-a6d2-56e00f097108", "mount_point": "/boot/efi", "mount_options": "" }, "id": 16152, "system_id": "8pk6f8", "resource_uri": "/MAAS/api/2.0/nodes/8pk6f8/blockdevices/216/partition/16152" } ] I am pretty sure that testflinger is failing because it expects to see a partition ID of 882 and 883 on disk 216, but those no longer exist. Should we expect the partition IDs to change every time MAAS is updated, or is this a weird bug this time around (I don't think we've updated MAAS since we implemented the disk layout in testflinger, so it's possible this has always been the case and we just never had a problem with it before). Note, the only thing that has changed on our end was the MAAS snap update to 3.4.0, we did not update anything in the testflinger agents from yesterday to today, so I'm reasonably certain this is the root cause here, at least from what I have seen over the last 30 minutes or so of poking at this. 
We (Cert) just updated MAAS from 3.3.x to 3.4.0-RC1. In testflinger we have default partition definitions that, because of how MAAS identifies disks and partitions, are very reliant on the MAAS IDs of disk devices and partitions. For example, prior to the move to 3.4.0, this was the definition for one server (these definitions grow more or less complex depending on the number of disks in a machine):

    default_disks:
    - id: '216'
      name: nvme0n1
      parent_disk_blkid: '216'
      ptable: GPT
      type: disk
    - device: '882'
      id: nvme0n1-part1
      number: '882'
      parent_disk: '216'
      parent_disk_blkid: '216'
      size: '536870912'
      type: partition
    - fstype: fat32
      id: 882-format
      label: efi
      parent_disk: '216'
      parent_disk_blkid: '216'
      type: format
      volume: '882'
    - device: 882-format
      id: 882-mount
      parent_disk: '216'
      parent_disk_blkid: '216'
      path: /boot/efi
      type: mount
    - device: '883'
      id: nvme0n1-part2
      number: '883'
      parent_disk: '216'
      parent_disk_blkid: '216'
      size: '1599778848768'
      type: partition
    - fstype: ext4
      id: 883-format
      label: root
      parent_disk: '216'
      parent_disk_blkid: '216'
      type: format
      volume: '883'
    - device: 883-format
      id: 883-mount
      parent_disk: '216'
      parent_disk_blkid: '216'
      path: /
      type: mount

As you can see, this spells out partitions on the disk with ID 216, where partitions 882 and 883 hold the /boot/efi filesystem and the root filesystem respectively. These IDs were pulled from MAAS and reflected what one would get from 'maas <name> partitions read <system_id> <disk_id>'. This gives users a means to define their own partition scheme (e.g. set up something Ceph-like, or bcache, or whatever) and then revert things to the default. After the update, all testflinger deployments fail, apparently because the partition IDs have changed.
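To make the dependency concrete, here is a minimal Python sketch of a pre-flight check that lists the numeric partition IDs a layout like this one hard-codes and flags any that MAAS no longer reports. The helper names are hypothetical and this is not how testflinger itself resolves the layout; the sketch only assumes that the `number` field of partition entries and the `volume` field of format entries carry the MAAS partition ID, as they do in this definition.

```python
# Hypothetical pre-flight check (not part of testflinger or MAAS): find numeric
# MAAS partition IDs that a default_disks layout refers to but MAAS no longer reports.

# The default_disks entries for this server, transcribed as Python dicts
# (only the fields needed for the check are kept).
DEFAULT_DISKS = [
    {"id": "216", "type": "disk", "name": "nvme0n1"},
    {"id": "nvme0n1-part1", "type": "partition", "number": "882", "parent_disk": "216"},
    {"id": "882-format", "type": "format", "volume": "882", "parent_disk": "216"},
    {"id": "882-mount", "type": "mount", "device": "882-format", "parent_disk": "216"},
    {"id": "nvme0n1-part2", "type": "partition", "number": "883", "parent_disk": "216"},
    {"id": "883-format", "type": "format", "volume": "883", "parent_disk": "216"},
    {"id": "883-mount", "type": "mount", "device": "883-format", "parent_disk": "216"},
]


def referenced_partition_ids(layout):
    """Return {disk_id: set of numeric partition IDs hard-coded in the layout}."""
    refs = {}
    for entry in layout:
        if entry["type"] == "partition":
            # Assumption: 'number' holds the MAAS partition ID in this layout.
            refs.setdefault(entry["parent_disk"], set()).add(int(entry["number"]))
        elif entry["type"] == "format":
            # Assumption: 'volume' points at the same MAAS partition ID.
            refs.setdefault(entry["parent_disk"], set()).add(int(entry["volume"]))
    return refs


def stale_partition_ids(layout, live_ids_by_disk):
    """Return {disk_id: IDs the layout expects but MAAS does not currently report}."""
    stale = {}
    for disk_id, wanted in referenced_partition_ids(layout).items():
        missing = wanted - live_ids_by_disk.get(disk_id, set())
        if missing:
            stale[disk_id] = missing
    return stale
```

Given the pre-upgrade IDs for disk 216 (882 and 883), the check reports nothing stale. Given the IDs MAAS 3.4.0 reports for this disk (16152 and 16153), it returns `{'216': {882, 883}}`, which matches the failure described in this report.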
Looking at a dump of this machine via the MAAS CLI, the disk ID has remained the same, but the partition IDs are now all in the 16,000s:

    bladernr@weavile:~$ maas bladernr partitions read 8pk6f8 216
    Success.
    Machine-readable output follows:
    [
        {
            "uuid": "b838b3db-3266-44da-bdbe-2a90b75df617",
            "size": 1599778848768,
            "bootable": false,
            "tags": [],
            "used_for": "ext4 formatted filesystem mounted at /",
            "type": "partition",
            "path": "/dev/disk/by-dname/nvme0n1-part2",
            "device_id": 216,
            "filesystem": {
                "fstype": "ext4",
                "label": "root",
                "uuid": "21aa8167-f0f7-4166-9e62-57e6504cac8d",
                "mount_point": "/",
                "mount_options": ""
            },
            "id": 16153,
            "system_id": "8pk6f8",
            "resource_uri": "/MAAS/api/2.0/nodes/8pk6f8/blockdevices/216/partition/16153"
        },
        {
            "uuid": "94256eca-c024-454b-b9f2-5c3b79b29611",
            "size": 536870912,
            "bootable": false,
            "tags": [],
            "used_for": "fat32 formatted filesystem mounted at /boot/efi",
            "type": "partition",
            "path": "/dev/disk/by-dname/nvme0n1-part1",
            "device_id": 216,
            "filesystem": {
                "fstype": "fat32",
                "label": "efi",
                "uuid": "1b93141c-af66-4594-a6d2-56e00f097108",
                "mount_point": "/boot/efi",
                "mount_options": ""
            },
            "id": 16152,
            "system_id": "8pk6f8",
            "resource_uri": "/MAAS/api/2.0/nodes/8pk6f8/blockdevices/216/partition/16152"
        }
    ]

I am pretty sure that testflinger is failing because it expects to see partition IDs 882 and 883 on disk 216, but those no longer exist.
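One possible way to avoid depending on the numeric IDs, sketched here in Python, is to match layout entries to MAAS partitions by a more stable attribute and rebuild the old-to-new ID mapping from the `partitions read` output. This assumes the by-dname names (`nvme0n1-part1`, `nvme0n1-part2`) survive the upgrade, which holds for this machine but is not guaranteed in general; the helper names are hypothetical and this is not an existing testflinger or MAAS feature.

```python
import json

# Trimmed copy of the `maas bladernr partitions read 8pk6f8 216` output
# (only the fields used here are kept).
PARTITIONS_JSON = """
[
  {"id": 16153, "device_id": 216, "path": "/dev/disk/by-dname/nvme0n1-part2",
   "filesystem": {"fstype": "ext4", "label": "root", "mount_point": "/"}},
  {"id": 16152, "device_id": 216, "path": "/dev/disk/by-dname/nvme0n1-part1",
   "filesystem": {"fstype": "fat32", "label": "efi", "mount_point": "/boot/efi"}}
]
"""

# Partition entries from the default_disks layout: (layout id, hard-coded MAAS ID).
LAYOUT_PARTITIONS = [("nvme0n1-part1", 882), ("nvme0n1-part2", 883)]


def ids_by_dname(partitions_json):
    """Map each partition's by-dname name (last path component) to its current MAAS ID."""
    return {p["path"].rsplit("/", 1)[-1]: p["id"] for p in json.loads(partitions_json)}


def remap_partition_ids(layout_partitions, partitions_json):
    """Return {old_id: new_id}, matching layout entries to MAAS partitions by dname.

    Entries whose dname MAAS does not report are skipped rather than guessed.
    """
    current = ids_by_dname(partitions_json)
    return {old_id: current[name] for name, old_id in layout_partitions if name in current}
```

For this machine the mapping comes out as `{882: 16152, 883: 16153}`, which could then be used to rewrite the `number`, `device`, and `volume` references in the layout before deploying.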
Should we expect the partition IDs to change every time MAAS is updated, or is this a weird bug this time around? (I don't think we've updated MAAS since we implemented the disk layout in testflinger, so it's possible this has always been the case and we just never had a problem with it before.) Note that the only thing that changed on our end was the MAAS snap update to 3.4.0; we did not update anything in the testflinger agents from yesterday to today, so I'm reasonably certain this is the root cause, at least from what I have seen over the last 30 minutes or so of poking at this.
2023-10-20 14:35:21 Jeff Lane  bug added subscriber Adrian Lane
2023-10-25 14:46:21 Jeff Lane  attachment removed tabuu_snappy.yaml https://bugs.launchpad.net/maas/+bug/2039614/+attachment/5711748/+files/tabuu_snappy.yaml
2023-10-25 14:47:05 Jeff Lane  attachment added tabuu_snappy.yml https://bugs.launchpad.net/maas/+bug/2039614/+attachment/5713217/+files/tabuu_snappy.yml
2024-03-07 09:24:39 Jerzy Husakowski maas: milestone 3.4.x 3.5.x
2024-03-07 09:24:47 Jerzy Husakowski nominated for series maas/3.4
2024-03-07 09:24:47 Jerzy Husakowski bug task added maas/3.4
2024-03-07 09:24:52 Jerzy Husakowski maas/3.4: milestone 3.4.x
2024-03-07 09:24:57 Jerzy Husakowski maas/3.4: status New Triaged
2024-03-07 09:24:59 Jerzy Husakowski maas/3.4: importance Undecided High
2024-04-30 15:03:14 Jerzy Husakowski nominated for series maas/3.6
2024-04-30 15:03:14 Jerzy Husakowski bug task added maas/3.6
2024-04-30 15:03:28 Jerzy Husakowski nominated for series maas/3.5
2024-04-30 15:03:28 Jerzy Husakowski bug task added maas/3.5
2024-04-30 15:03:35 Jerzy Husakowski maas/3.5: milestone 3.5.x
2024-04-30 15:03:39 Jerzy Husakowski maas/3.6: milestone 3.5.x 3.6.0
2024-04-30 15:03:44 Jerzy Husakowski maas/3.5: importance Undecided High
2024-04-30 15:03:48 Jerzy Husakowski maas/3.5: status New Triaged