curtin race on vivid when /dev/sda1 doesn't exist
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
curtin | Fix Released | Medium | Unassigned |
curtin (Ubuntu) | Fix Released | Medium | Unassigned |
Trusty | Fix Released | Medium | Unassigned |
Vivid | Fix Released | Medium | Unassigned |
Bug Description
=== Begin SRU Template ===
[Description]
Installation fails, with log information showing:
Unexpected error while running command.
Command: ['mkfs.ext4', '-q', '-L', 'cloudimg-rootfs', '/dev/sda1']
Exit code: 1
Reason: -
Stdout: ''
Stderr: ''
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'simple']
This is a result of curtin having done:
a.) partition the disk
b.) invoke wipefs on the block device with an offset to where the partition started
The intent of that was to ensure that there were no filesystem signatures or other interesting metadata (lvm, raid) on the partition.
The issue was twofold:
a.) wipefs always invokes rereadpt on the disk it wipes
b.) wipefs opened the block device, not the partition.
Both of these things cause a flurry of udev events that may result in the device having an open filehandle, and thus mkfs refusing to create a filesystem on it.
The solution is to replace 'wipefs' with our own 'wipe_partition' that opens only the partition, not the whole block device, and does not invoke rereadpt.
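For illustration only, the idea of wiping through the partition node rather than the whole disk might look roughly like the sketch below. This is not curtin's actual 'wipe_partition'; the name `wipe_partition_sketch`, the `wipe_bytes` parameter and the choice to zero both the start and the end of the partition are assumptions made for the example.

```python
import os


def wipe_partition_sketch(path, wipe_bytes=1024 * 1024):
    """Zero the first and last wipe_bytes of the file or device at path.

    Hypothetical helper: it opens only the path it is given (for example
    /dev/sda1), never the parent disk, and never asks the kernel to
    re-read the partition table.
    """
    zeros = b"\0" * wipe_bytes
    with open(path, "r+b") as fp:
        # Seeking to the end yields the size for regular files and
        # block devices alike.
        size = fp.seek(0, os.SEEK_END)
        count = min(wipe_bytes, size)
        # Clear signatures stored near the start of the partition.
        fp.seek(0)
        fp.write(zeros[:count])
        # Clear metadata stored near the end of the partition.
        fp.seek(size - count)
        fp.write(zeros[:count])
        fp.flush()
        os.fsync(fp.fileno())
    return size
```

Because the function takes a plain path, it can be tried safely against a scratch file before being pointed at a real partition.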
[Impact]
Installation fails occasionally.
[Test Case]
In the original bug-opener's environment it fails fairly reliably under heavy host load using vmware. He would do a deploy to several guests on the same host at the same time and this would reproduce. Unfortunately I was unable to come up with a test case in a less complex environment.
[Regression Potential]
Not specifically a regression, but it is possible that we need an additional 'udevadm settle' after the internal 'wipe_partition'.
=== End SRU Template ===
The file /dev/sda1 does not exist and no size was specified.
Unexpected error while running command.
Command: ['mkfs.ext4', '-q', '-L', 'cloudimg-rootfs', '/dev/sda1']
Exit code: 1
Reason: -
Stdout: ''
Stderr: ''
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'simple']
Exit code: 3
Reason: -
Stdout: b"The file /dev/sda1 does not exist and no size was specified.
Stderr: ''
This happened 4 out of 10 times when Ryan tested it.
Could this be a race condition because /dev/sda1 hasn't had time to come into existence between when curtin partitions the disk and when curtin tries to create a filesystem on the partition?
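If the failure really is the device node not yet existing, one way to test that theory would be to wait for the node before running mkfs. The sketch below is only a hypothetical illustration, not something curtin is known to do; `wait_for_path` and its `timeout` and `interval` parameters are invented for the example.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.1):
    """Poll until path exists; return True if it appeared before timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller could run `wait_for_path('/dev/sda1')` between partitioning and mkfs.ext4 and log when it returns False. Running 'udevadm settle' at the same point is the more conventional way to let pending udev events finish.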
cloud-init[1226]: /var/lib/cloud/instance/scripts/user_data.sh: 18: /var/lib/cloud/instance/scripts/user_data.sh: initctl: not found
ubuntu@phony-trick:/var/log$ cat /proc/partitions
major minor #blocks name
11 0 1048575 sr0
8 0 488386584 sda
8 1 488385560 sda1
8 16 488386584 sdb
8 17 487336967 sdb1
8 18 1047552 sdb2
8 32 1433600 sdc
ubuntu@phony-trick:/var/log$ ls -al /dev/sda1
brw-rw---- 1 root disk 8, 1 May 11 21:13 /dev/sda1