maas 2.7 regression: creates partition when trying to use entire disk as bcache cache device

Bug #1867812 reported by Jason Hobbs
20
This bug affects 4 people
Affects Status Importance Assigned to Milestone
MAAS
Invalid
Medium
Unassigned

Bug Description

I start out with an nvme device with no partitions:

http://paste.ubuntu.com/p/794rqbdWWP/

Then I try to create a cache set using it as the cache device:

maas root bcache-cache-sets create qywwt3 cache_device=46

Instead of turning the whole device into a cache device, it creates a partition table and a partition. Return from above command:

{'id': 8, 'name': 'cache0', 'cache_device': {'uuid': '5721745b-4853-4098-a260-f810efd29579', 'size': 400082075648, 'bootable': False, 'tags': [], 'type': 'partition', 'system_id': 'qywwt3', 'filesystem': {'fstype': 'bcache-cache', 'label': None, 'uuid': 'e5a07151-757d-4720-86ba-1ad00d647e6d', 'mount_point': None, 'mount_options': None}, 'device_id': 46, 'path': '/dev/disk/by-dname/nvme0n1-part1', 'id': 134, 'used_for': 'Cache device for cache0', 'resource_uri': '/MAAS/api/2.0/nodes/qywwt3/blockdevices/46/partition/134'}, 'system_id': 'qywwt3', 'resource_uri': '/MAAS/api/2.0/nodes/qywwt3/bcache-cache-set/8/'}

http://paste.ubuntu.com/p/HsQ8RcjWyj/

Expected behavior: turn the entire block device into a bcache cache device. This is how 2.6 and before behaved. If I had wanted a partition, I would have created it.

Related branches

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Here is the same thing on 2.6:

before creating:
http://paste.ubuntu.com/p/7bCDdwwb4p/

creating and result:
2020-03-17-18:52:21 root DEBUG [localhost]: maas root bcache-cache-sets create r448pr cache_device=24
> /home/ubuntu/cpe/foundation/foundationcloudengine/foundationcloudengine/configurenodes.py(447)setup_bcaches()
-> backing_device = bcache["backing_device"]
(Pdb) cache_set
{'cache_device': {'firmware_version': None, 'serial': 'CVMD552400AX400AGN', 'size': 400088457216, 'tags': ['ssd'], 'model': 'INTEL SSDPEDME400G4', 'available_size': 0, 'used_size': 400088457216, 'uuid': None, 'partition_table_type': None, 'id_path': '/dev/disk/by-id/nvme-nvme.8086-43564d44353532343030415834303041474e-494e54454c205353445045444d453430304734-00000001', 'storage_pool': None, 'path': '/dev/disk/by-dname/nvme0n1', 'partitions': [], 'type': 'physical', 'used_for': 'Cache device for cache0', 'block_size': 4096, 'system_id': 'r448pr', 'filesystem': {'fstype': 'bcache-cache', 'label': None, 'uuid': '344c0027-069d-4473-9957-93437754cfa3', 'mount_point': None, 'mount_options': None}, 'name': 'nvme0n1', 'id': 24, 'resource_uri': '/MAAS/api/2.0/nodes/r448pr/blockdevices/24/'}, 'system_id': 'r448pr', 'name': 'cache0', 'id': 1, 'resource_uri': '/MAAS/api/2.0/nodes/r448pr/bcache-cache-set/1/'}

block-devices read shows the cache device is the entire disk and there are no partitions/partition table.
http://paste.ubuntu.com/p/ZH8h2sWTdg/

summary: - maas 2.7 creates partition when trying to use entire disk as bcache
- cache device
+ maas 2.7 regression: creates partition when trying to use entire disk as
+ bcache cache device
Revision history for this message
Adam Collard (adam-collard) wrote :

When you create a bcache, MAAS will always create a partition on the underlying block device IF it's the "boot device" - see bug 1509536.

When MAAS doesn't know the boot device, it falls back on the first physical block device created and assumes that is the boot device.

1) can you confirm what MAAS has listed as the physical block devices on the node?
2) Are you able to tell MAAS what the boot disk is? e.g. `maas root block-device set-boot-disk r448pr <block-device-id-of-boot-disk>`

Changed in maas:
status: New → Incomplete
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

1) We aren't using bios, so "boot device" shouldn't matter. We're explicitly telling MAAS where to put EFI and it isn't here.
2) This is a behavior change from 2.6 - a regression.

Changed in maas:
status: Incomplete → New
Changed in maas:
importance: Undecided → High
status: New → Triaged
milestone: none → 2.9.0b1
Lee Trager (ltrager)
Changed in maas:
milestone: 2.9.0b1 → 2.9.0b2
Lee Trager (ltrager)
Changed in maas:
milestone: 2.9.0b2 → 2.9.0b3
milestone: 2.9.0b3 → 2.9.0b4
Lee Trager (ltrager)
Changed in maas:
milestone: 2.9.0b4 → 2.9.0b7
Changed in maas:
assignee: nobody → Adam Collard (adam-collard)
Lee Trager (ltrager)
Changed in maas:
milestone: 2.9.0b7 → 2.9.0b8
Lee Trager (ltrager)
Changed in maas:
milestone: 2.9.0b8 → 2.9.0rc1
Changed in maas:
assignee: Adam Collard (adam-collard) → nobody
Dougal Matthews (d0ugal)
Changed in maas:
assignee: nobody → Dougal Matthews (d0ugal)
Dougal Matthews (d0ugal)
Changed in maas:
milestone: 2.9.0rc1 → 2.9.x
Revision history for this message
Dougal Matthews (d0ugal) wrote :

I have spent a bunch of time trying to resolve this bug and I've gotten a little stuck. I wanted to write up a bit about what I have learned so far.

There is a method in maasserver.models.node, get_boot_disk which returns the boot disk for a node. However, if there isn't one specified then it will default to the first physical disk. This seems like a confusing behaviour and I think it is the cause of the issue here. There is also another suspicious method is_boot_disk in maasserver.models.blockdevice, which uses get_boot_disk. This means that if there is no specified boot disk, but if there is a physical disk that can be found then is_boot_disk will return True.

I was able to replicate this bug with this unit test:

+ def test_create_without_creating_partitions(self):
+ """Verify we don't create a partition when creating a bcache cache set
+
+ Test for LP 1867812
+ """
+ self.become_admin()
+ node = factory.make_Node(
+ status=NODE_STATUS.READY, with_boot_disk=False
+ )
+ uri = get_bcache_cache_sets_uri(node)
+ cache_device = factory.make_PhysicalBlockDevice(node=node)
+ response = self.client.post(uri, {"cache_device": cache_device.id})
+ self.assertEqual(
+ http.client.OK, response.status_code, response.content
+ )
+ parsed_device = json_load_bytes(response.content)
+ self.assertFalse(
+ parsed_device["cache_device"]["partitions"],
+ )

This test causes get_boot_disk to be called from three different places.

  File "/home/ubuntu/code/canonical/maas/src/maasserver/models/blockdevice.py", line 302, in create_partition_if_boot_disk

  File "/home/ubuntu/code/canonical/maas/src/maasserver/models/partition.py", line 223, in get_partition_number

  File "/home/ubuntu/code/canonical/maas/src/maasserver/storage_layouts.py", line 59, in __init__

When trying to remove these three uses I broke a test in maasserver.forms.tests.test_cachset; TestCreateCacheSetForm.test_cache_set_creation_with_boot_disk. This test seems to verify the opposite behaviour and expects partitions to be created. Despite the node not having a explicit boot disk assigned.

Looking at when this test had that behaviour introduced, it was in this 2015 change: https://git.launchpad.net/maas/commit/?id=035e8d44c07aef638dae9579627e67129aa3cb27

This makes me wonder how the 2.6 behaviour was ever possible. It might be that I am missing something or my reproduction is incomplete.

From a brief discussion on mattermost it sounds like the best solution here would be to make everything much more explicit. By having users specify which disks should be bootable and we should be clear when partitions will be created.

I have also spent some time diving into the git history to try and track down whatever the breaking change might be. No luck there. I'd really like to better understand why/when the behaviour changed.

Dougal Matthews (d0ugal)
Changed in maas:
milestone: 2.9.x → none
Dougal Matthews (d0ugal)
Changed in maas:
assignee: Dougal Matthews (d0ugal) → nobody
milestone: none → 2.10-next
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

sub'd to field-high; field team is hitting this too.

tags: added: cpe-onsite
Changed in maas:
milestone: 3.0.0 → 3.0.1
Revision history for this message
Jerzy Husakowski (jhusakowski) wrote :

Does this issue still happen on MAAS 3.2 or later?

Changed in maas:
assignee: nobody → Alexander Balderson (asbalderson)
milestone: 3.0.1 → 3.3.0
status: Triaged → Incomplete
Changed in maas:
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Jerzy Husakowski (jhusakowski) wrote :

Closing as outdated and no longer reproducible

Changed in maas:
assignee: Alexander Balderson (asbalderson) → nobody
importance: High → Medium
milestone: 3.3.0 → none
status: Incomplete → Invalid
milestone: none → 3.3.0
Changed in maas:
milestone: 3.3.0 → 3.3.0-beta3
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.