Instance boot into incorrect root with identical partition UUID without cleaning

Bug #1746033 reported by Derek Higgins
Affects              Status   Importance  Assigned to  Milestone
Ironic               Invalid  Low         Unassigned
ironic-python-agent  Invalid  Low         Unassigned

Bug Description

When deploying onto a node, the UUID for the root filesystem is baked into the image. This UUID may then be used to identify the root filesystem, e.g.

[centos@t1 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.17.1.el7.x86_64 root=UUID=cee7ed05-291b-4b28-b489-40566c2862cf ro console=tty0 crashkernel=auto console=ttyS0,115200

If the same node is then used a second time (without cleaning), this time using root device hints to select a different root device, a partition with the same UUID is written to a different disk. The result is a node with two disks whose partitions carry the same UUID.

On boot, either of them may be selected, so the root device can change from one boot to the next (depending on timing factors):

[centos@t1 ~]$ blkid
/dev/vda1: UUID="2018-01-26-17-14-45-00" LABEL="config-2" TYPE="iso9660"
/dev/vda2: LABEL="img-rootfs" UUID="cee7ed05-291b-4b28-b489-40566c2862cf" TYPE="xfs"
/dev/vdc1: UUID="2018-01-29-11-48-48-00" LABEL="config-2" TYPE="iso9660"
/dev/vdc2: LABEL="img-rootfs" UUID="cee7ed05-291b-4b28-b489-40566c2862cf" TYPE="xfs"

[centos@t1 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.17.1.el7.x86_64 root=UUID=cee7ed05-291b-4b28-b489-40566c2862cf ro console=tty0 crashkernel=auto console=ttyS0,115200
[centos@t1 ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 80G 0 disk
├─vda1 252:1 0 1M 0 part
└─vda2 252:2 0 80G 0 part /
vdb 252:16 0 20G 0 disk
vdc 252:32 0 20G 0 disk
├─vdc1 252:33 0 1M 0 part
└─vdc2 252:34 0 20G 0 part

== reboot here ==

[centos@t1 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.17.1.el7.x86_64 root=UUID=cee7ed05-291b-4b28-b489-40566c2862cf ro console=tty0 crashkernel=auto console=ttyS0,115200
[centos@t1 ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 80G 0 disk
├─vda1 252:1 0 1M 0 part
└─vda2 252:2 0 80G 0 part
vdb 252:16 0 20G 0 disk
vdc 252:32 0 20G 0 disk
├─vdc1 252:33 0 1M 0 part
└─vdc2 252:34 0 20G 0 part /

The / partition has moved to /dev/vdc2.
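
A quick way to confirm the duplicate on an affected node (assuming blkid from util-linux is available in the image, as in the output above) is to list all filesystem UUIDs and print any value that occurs more than once; given the blkid output above, this would print the shared root UUID:

[centos@t1 ~]$ blkid -s UUID -o value | sort | uniq -d
cee7ed05-291b-4b28-b489-40566c2862cf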

Derek Higgins (derekh) wrote:

Possible solutions here are:
a) assert that cleaning should be done
b) make IPA abort if it detects duplicate partition UUIDs
c) make IPA clear duplicate partition UUIDs during install (see the sketch after this list)
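
A minimal sketch of option (c), using the device names from the example above and assuming an XFS root as in this report: after the image has been written to the new root device (/dev/vdc here), give the stale copy left on the previously used disk (/dev/vda2) a freshly generated UUID, or wipe its signatures entirely, so that only one filesystem carries the UUID referenced on the kernel command line. xfs_admin needs the filesystem unmounted, and older xfsprogs versions may refuse to change the UUID on v5 (CRC-enabled) filesystems.

# give the stale copy a new random UUID so only the freshly deployed
# root filesystem matches root=UUID=... on the kernel command line
xfs_admin -U generate /dev/vda2

# or, more drastically, remove its filesystem signatures altogether
wipefs -a /dev/vda2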

Dmitry Tantsur (divius) wrote:

> assert that cleaning should be done

YES!

> make IPA abort if it detects duplicate partition UUIDs

This may be a sensible thing to do for people who insist on having cleaning disabled.
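
As an illustration only (this is not existing IPA behaviour), such a guard could, after writing the image, check whether the UUID of the freshly created root partition already exists on any other block device and fail the deployment if it does. Here /dev/vdc2 is a hypothetical stand-in for the partition just written, and the check is assumed to run as root so blkid can probe every device:

# hypothetical post-write guard, not part of IPA
new_uuid=$(blkid -s UUID -o value /dev/vdc2)
dupes=$(blkid -o device -t UUID="$new_uuid" | grep -v '^/dev/vdc2$')
if [ -n "$dupes" ]; then
    echo "duplicate filesystem UUID $new_uuid also found on: $dupes" >&2
    exit 1
fi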

Changed in ironic:
status: New → Triaged
importance: Undecided → Low
summary: - Instance boot into incorrect root with identical partition UUID
         + Instance boot into incorrect root with identical partition UUID without cleaning
Changed in ironic-python-agent:
status: New → Triaged
importance: Undecided → Low

Jay Faulkner (jason-oldos) wrote:

To be clear: the correct fix for this is to warn against the bad/unsupported behavior, not to enable it.

Jay Faulkner (jason-oldos) wrote:

Cleaning ensures a blank slate for Ironic to operate on and deploy images onto. If you disable cleaning for the entire environment, you will get undefined behavior -- of which this is maybe one of the more innocuous cases.

Ironic cannot and will not support fixing this class of error because:
- It is impossible for us to know all the potential incoming states.
- Any partial fix would imply that we support this behavior, when in reality we would only be handling the most common/known failure cases and potentially leaving others in place (for example, IPA images with cloud-init -- how do they behave when they see uncleaned config drives?).

Changed in ironic:
status: Triaged → Invalid
Changed in ironic-python-agent:
status: Triaged → Invalid