cloud-init 19.2.36 fails with python exception "Not all expected physical devices present ..." during bionic image deployment from MAAS
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
cloud-init | Fix Released | Critical | Dan Watkins | |
cloud-init (Ubuntu) | Fix Released | Critical | Dan Watkins | |
Xenial | Fix Released | Undecided | Dan Watkins | |
Bionic | Fix Released | Undecided | Dan Watkins | |
Disco | Fix Released | Undecided | Dan Watkins | |
Eoan | Fix Released | Critical | Dan Watkins | |
Bug Description
[Impact]
Any instances launched with bridges or bonds in their network configuration will fail to bring up networking.
[Test Case]
# Juju bootstrap on maas of a machine sets up a network bridge that triggers a failure in cloud-init init stage.
# This results in a maas machine deployment failure and the machine gets released
Procedure:
# Alternate steps on a maas machine with a bridge already created
A1. confirm a bridge interface is configured for the target machine on interface eno1, name it broam, attach it to a subnet and select auto-assign for IP
A2. click deploy -> bionic
A3. Once manual deployment fails go to step 2 below
# Alternative 2 juju bootstrap failure on maas
B1: juju bootstrap mymaas --no-gui
B2: Once bootstrap fails go to step 2
2. After the deployment fails and the machine is powered off, click on the failed/released node in the MAAS UI
3. Click Rescue Mode from the 'Take Action' drop down in the MAAS UI
4. Grab the IP from the interfaces tab
5. ssh ubuntu@
# Expect failure message
6. Click Exit Rescue Mode on the node in MAAS UI.
7. SSH to the MAAS server and add the following lines to /etc/maas/
system_upgrade: {enabled: True}
apt:
  sources:
    proposed.list:
      source: deb $MIRROR $RELEASE-proposed main universe # upstream -proposed
8. Repeat step 1 and expect bootstrap success
# expect to see MAASDatasource from bootstrapped machine and no errors
9. juju ssh 0 -- cloud-init status --long
Additional verification checks to avoid regression
- DONE oracle
- DONE ec2
- DONE openstack
- DONE gce
- DONE azure
- DONE nocloud kvm
- DONE nocloud lxd
[Regression Potential]
The change being SRU'd adds more conditions to an existing conditional. There is potential to regress the cases that the existing conditional was introduced to cover, so we will be testing those specifically. Other than that, there was some minor refactoring of the existing conditional statement (which did not change the logic it checks), which could cause issues for Oracle netfailover interfaces. We will also specifically test on Oracle.
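The conditional discussed above can be illustrated with a small, self-contained sketch. This is not the cloud-init code: the `Dev` class, the `candidate_macs` and `missing_macs` helpers, the `kind` labels and the MAC addresses are all hypothetical, and the sketch does not model Oracle netfailover interfaces. It only shows the effect the bug title describes: if every interface that has a master is filtered out, a NIC enslaved to a bridge or bond disappears from the candidate list, and its MAC is reported as missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dev:
    """Hypothetical stand-in for one network device seen by the system."""
    name: str
    mac: str
    kind: str = "physical"        # assumed labels: "physical", "bridge", "bond"
    master: Optional[str] = None  # name of the device this one is enslaved to


def candidate_macs(devs, exempt_bridge_bond_members):
    """Return the MACs of devices considered 'physical' candidates.

    With exempt_bridge_bond_members=False, any device that has a master is
    skipped (the behaviour that appears to have caused the regression).
    With True, members of a bridge or bond are kept.
    """
    by_name = {d.name: d for d in devs}
    macs = set()
    for d in devs:
        if d.kind in ("bridge", "bond"):
            continue  # virtual devices themselves are never candidates
        if d.master is not None:
            master = by_name.get(d.master)
            is_member = master is not None and master.kind in ("bridge", "bond")
            if not (exempt_bridge_bond_members and is_member):
                continue
        macs.add(d.mac)
    return macs


def missing_macs(expected, devs, exempt_bridge_bond_members):
    """MACs named in the network config that no candidate device provides."""
    return set(expected) - candidate_macs(devs, exempt_bridge_bond_members)


# Illustrative data only: eno1 is enslaved to the bridge broam.
devs = [
    Dev("eno1", "aa:bb:cc:00:00:01", master="broam"),
    Dev("broam", "aa:bb:cc:00:00:01", kind="bridge"),
    Dev("eno2", "aa:bb:cc:00:00:02"),
]
expected = {"aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"}

print(missing_macs(expected, devs, exempt_bridge_bond_members=False))
print(missing_macs(expected, devs, exempt_bridge_bond_members=True))
```

Under these assumptions the first call reports the MAC of eno1 as missing, while the second reports nothing missing.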
[Original Report]
Symptoms
========
After deployment of Ubuntu Bionic image on MAAS provider (deploying to a bare metal server) juju cannot access any deployed machine due to missing SSH keys and machines are stuck in pending state:
$ juju ssh 0
ERROR retrieving SSH host keys for "0": keys not found
$ juju machines
Machine  State    DNS            Inst id   Series  AZ   Message
0        pending  172.20.10.125  block-3   bionic  AZ3  Deployed
1        pending  172.20.10.124  block-2   bionic  AZ2  Deployed
2        pending  172.20.10.126  block-1   bionic  AZ1  Deployed
3        pending  172.20.10.127  object-2  bionic  AZ1  Deployed
4        pending  172.20.10.128  object-1  bionic  AZ2  Deployed
5        pending  172.20.10.129  object-3  bionic  AZ3  Deployed
It is worth mentioning that pods can be deployed successfully with MAAS; only bare metal deployment fails.
We checked different bionic images: cloud-init 19.2.24 works, and cloud-init 19.2.36 doesn't.
Related branches
- Server Team CI bot: Needs Fixing (continuous-integration)
- Chad Smith: Approve
- Diff: 180 lines (+158/-0), 3 files modified
  debian/changelog (+7/-0)
  debian/patches/cpick-a7d8d032-get_interfaces-don-t-exclude-bridge-and-bond-members (+150/-0)
  debian/patches/series (+1/-0)
- Server Team CI bot: Needs Fixing (continuous-integration)
- Chad Smith: Approve
- Diff: 178 lines (+158/-0), 3 files modified
  debian/changelog (+7/-0)
  debian/patches/cpick-a7d8d032-get_interfaces-don-t-exclude-bridge-and-bond-members (+150/-0)
  debian/patches/series (+1/-0)
- Server Team CI bot: Needs Fixing (continuous-integration)
- Chad Smith: Approve
- Diff: 178 lines (+158/-0), 3 files modified
  debian/changelog (+7/-0)
  debian/patches/cpick-a7d8d032-get_interfaces-don-t-exclude-bridge-and-bond-members (+150/-0)
  debian/patches/series (+1/-0)
- Server Team CI bot: Needs Fixing (continuous-integration)
- Chad Smith: Approve
- Diff: 179 lines (+158/-0), 3 files modified
  debian/changelog (+7/-0)
  debian/patches/cpick-a7d8d032-get_interfaces-don-t-exclude-bridge-and-bond-members (+150/-0)
  debian/patches/series (+1/-0)
- Chad Smith: Approve
- Server Team CI bot: Approve (continuous-integration)
- Diff: 132 lines (+73/-10), 2 files modified
  cloudinit/net/__init__.py (+19/-5)
  cloudinit/net/tests/test_init.py (+54/-5)
tags: | added: regression-update |
tags: | added: cpe-onsite |
Changed in cloud-init (Ubuntu): | |
status: | Confirmed → In Progress |
Changed in cloud-init: | |
status: | New → In Progress |
assignee: | nobody → Dan Watkins (daniel-thewatkins) |
importance: | Undecided → Critical |
Changed in cloud-init (Ubuntu): | |
assignee: | nobody → Dan Watkins (daniel-thewatkins) |
Changed in cloud-init (Ubuntu Bionic): | |
status: | New → In Progress |
Changed in cloud-init (Ubuntu Disco): | |
status: | New → In Progress |
Changed in cloud-init (Ubuntu Xenial): | |
status: | New → In Progress |
Changed in cloud-init (Ubuntu Bionic): | |
assignee: | nobody → Dan Watkins (daniel-thewatkins) |
Changed in cloud-init (Ubuntu Disco): | |
assignee: | nobody → Dan Watkins (daniel-thewatkins) |
description: | updated |
Changed in cloud-init (Ubuntu Xenial): | |
assignee: | nobody → Dan Watkins (daniel-thewatkins) |
Issue was introduced in cloud-init v. 19.2-36-g059d049c-0ubuntu1~18.04.1.
It was not present in cloud-init v. 19.2-24-ge7881d5c-0ubuntu1~18.04.1.
Symptoms:
2019-10-03 13:10:59,100 - __init__.py[WARNING]: Not all expected physical devices present: {'3c:fd:fe:d5:7a:42', '3c:fd:fe:d5:70:d9', '3c:fd:fe:d5:7a:41', '3c:fd:fe:d5:70:d8', '3c:fd:fe:d5:7a:40', '3c:fd:fe:d5:70:da'}
It seems that the following change causes this:
* New upstream snapshot. (LP: #1844334)
- net: add is_master check for filtering device list
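When triaging an affected machine, the warning quoted in the symptoms above can be reduced to a plain set of MAC addresses and compared against the NICs configured in MAAS. The following is a minimal sketch that assumes the log line has the shape shown in this report; the function name `missing_macs_from_log` is hypothetical, and whitespace inside the addresses is stripped so that copies with inserted spaces still parse.

```python
import ast
import re

# Matches the set literal that follows the warning text quoted in this report.
WARNING_RE = re.compile(r"Not all expected physical devices present: (\{.*\})")


def missing_macs_from_log(line):
    """Extract the MAC addresses listed in a 'Not all expected physical
    devices present' warning. Returns an empty set if the line does not
    contain that warning."""
    match = WARNING_RE.search(line)
    if match is None:
        return set()
    # The warning prints a Python set literal, which literal_eval can read.
    raw = ast.literal_eval(match.group(1))
    # Normalise: drop stray whitespace and lower-case the hex digits.
    return {mac.replace(" ", "").lower() for mac in raw}


line = (
    "2019-10-03 13:10:59,100 - __init__.py[WARNING]: Not all expected "
    "physical devices present: {'3c:fd:fe:d5:7a:42', '3c:fd: fe:d5:70: d9'}"
)
print(sorted(missing_macs_from_log(line)))
```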