cloud-init selects sysconfig netconfig renderer if network-manager is installed on Ubuntu

Bug #1819994 reported by duanbenliang on 2019-03-14
This bug affects 1 person
Affects                                                  Importance   Assigned to
MAAS                                                     Undecided    Unassigned
Provider for Plainbox - Canonical Certification Server   Critical     Rod Smith
cloud-init                                               High         Unassigned
cloud-init (Ubuntu)                                      Undecided    Unassigned

Bug Description

Configuration:
UEFI/BIOS: TEE136S
IMM/BMC: CDI333V
CPU: Intel(R) Xeon(R) Platinum 8253 CPU @ 2.20GHz
Memory: 16G DIMM * 12
Raid card: ThinkSystem RAID 530-8i
NIC Card: Intel X722 LOM

Reproduce Steps:
1. Set "network" as the first boot device.
2. Power on the machine.
3. Visit TC through a web browser and commission the machine.
4. When commissioning completes, deploy Ubuntu 18.04 LTS on the SUT.
5. The error appears during OS deployment.

The deploy errors look like the following (see the attachment for details):

cloud-init[xxxx] Date_and_time - handlers.py[WARNING]: failed posting event: start: modules-final/config-xxxx: running config-xxxx

cloud-init[xxxx] Date_and_time - handlers.py[WARNING]: failed posting event: finish: modules-final: SUCCESS: running modules for final

duanbenliang (duanbl1) wrote :
duanbenliang (duanbl1) wrote :
duanbenliang (duanbl1) wrote :
duanbenliang (duanbl1) wrote :
Blake Rouse (blake-rouse) wrote :

Looks like it might be an issue either in curtin or MAAS based on the network configuration.

Once the machine fails to deploy can you provide the output of:

maas {profile} machine get-curtin-config {system_id}

Jeff Lane (bladernr) wrote :

FYI, I've added a cert task for this. I don't know for sure this is curtin, it looks like something may have changed in one of the hundreds of dependency packages that checkbox pulls in causing curtin to fail.

Rod is investigating it on our side.

Changed in plainbox-provider-certification-server:
importance: Undecided → Critical
assignee: nobody → Rod Smith (rodsmith)
status: New → Confirmed
Jeff Lane (bladernr) wrote :

We have a bug for this as well, 1189973, but marking this one as a duplicate of it kills the MAAS (possibly curtin) task, so I un-duped it for now.

Changed in maas:
status: New → Incomplete
Rod Smith (rodsmith) wrote :

We've traced the problem to the network-manager package, which gets pulled in by a dependency in canonical-certification-server. Apparently, curtin or cloud-init (I'm not sure which) is now skipping netplan configuration when the network-manager package is installed.

Ryan Harper (raharper) wrote :

Neither curtin nor cloud-init will *skip* generating networking. However, if the target system contains additional netplan config that cloud-init is not aware of (provided by the NetworkManager package, perhaps, or by something else), the configurations may conflict in a way that prevents netplan apply from bringing up the network.

If possible, getting the systemd journal and what's in /etc/netplan and /run/systemd/{netif,network} and /var/log/cloud-init.log could help see what's going on.
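For anyone gathering the same information, here is a minimal Python sketch that reports which of the requested paths exist under a given root and what they contain. The function name summarize_network_state and the report layout are illustrative assumptions, not part of cloud-init, curtin, or MAAS, and the systemd journal is not covered (it still has to be exported separately).

```python
import os

# Paths requested in the comment above, relative to the root of the target system.
REQUESTED_PATHS = (
    "etc/netplan",
    "run/systemd/netif",
    "run/systemd/network",
    "var/log/cloud-init.log",
)


def summarize_network_state(root="/"):
    """Map each requested path to its directory entries, its file name, or None.

    None means the path does not exist; an empty list means an empty directory.
    """
    report = {}
    for rel in REQUESTED_PATHS:
        path = os.path.join(root, rel)
        if os.path.isdir(path):
            report[rel] = sorted(os.listdir(path))
        elif os.path.isfile(path):
            report[rel] = [os.path.basename(path)]
        else:
            report[rel] = None
    return report
```

Calling summarize_network_state("/") on a deployed node (or passing the mount point of its root filesystem) distinguishes an empty /etc/netplan, reported as an empty list, from a missing directory, reported as None.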

Rod Smith (rodsmith) wrote :

I've attached the /var/log/cloud-init.log file from a node that failed deployment. (This is a different node from the one that generated the earlier logs.) The /etc/netplan directory is empty, and there is no /run/systemd directory on this node either.

Ryan Harper (raharper) wrote :

2019-03-14 17:32:34,606 - __init__.py[DEBUG]: Selected renderer 'sysconfig' from priority list: None

This is a cloud-init bug. The sysconfig renderer has NetworkManager support, which triggered cloud-init to render sysconfig instead of netplan.
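To make the failure mode concrete, here is a simplified Python model of "first usable renderer in a priority list wins". It is a sketch, not cloud-init's actual code: select_renderer, the detector table, and the assumed default order are all illustrative, and the log line above only tells us that no priority list was configured (None).

```python
from typing import Callable, Dict, Optional, Sequence

# Assumption: the fallback order used when no priority list is configured.
# The log line above only shows that the configured list was None.
ASSUMED_DEFAULT_PRIORITY = ("eni", "sysconfig", "netplan")


def select_renderer(
    detectors: Dict[str, Callable[[], bool]],
    priority: Optional[Sequence[str]] = None,
) -> str:
    """Return the first renderer in priority order whose detector passes."""
    if priority is None:
        priority = ASSUMED_DEFAULT_PRIORITY
    for name in priority:
        detector = detectors.get(name)
        if detector is not None and detector():
            return name
    raise LookupError(f"no usable renderer in {list(priority)}")


# Hypothetical detectors for an Ubuntu 18.04 target with network-manager installed.
ubuntu_with_nm = {
    "eni": lambda: False,       # ifupdown is not present
    "sysconfig": lambda: True,  # passes only because NetworkManager was found
    "netplan": lambda: True,
}

print(select_renderer(ubuntu_with_nm))                      # sysconfig
print(select_renderer(ubuntu_with_nm, ["eni", "netplan"]))  # netplan
```

In this model, an explicit priority list that leaves out sysconfig is enough to get netplan selected again, even though the sysconfig detector still passes.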

Changed in cloud-init:
importance: Undecided → High
status: New → Confirmed
Changed in maas:
status: Incomplete → Invalid
Ryan Harper (raharper) on 2019-03-15
summary: - An error occurs when MAAS Deploy 18.04 on ThinkSystem SR590
+ cloud-init selects sysconfig netconfig renderer if network-manager is
+ installed on Ubuntu
Ryan Harper (raharper) wrote :

You can work around this issue by including the following curtin config when deploying.

write_files:
  policy:
    path: /etc/cloud/cloud.cfg.d/01_network_renderer_policy.cfg
    content: |
      #cloud-config
      system_info:
        network:
          renderers: ['eni', 'netplan']
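One way to check which renderer was chosen on a deployed node is to scan /var/log/cloud-init.log for the "Selected renderer" debug line quoted earlier in this report. The Python sketch below does that; the regular expression is derived from that single log line, so treat the exact message format as an assumption, and note that selected_renderers is a hypothetical helper name.

```python
import re

# Pattern derived from the one log line quoted in this bug report.
RENDERER_RE = re.compile(
    r"Selected renderer '(?P<name>[^']+)' from priority list: (?P<priority>.*)$"
)


def selected_renderers(lines):
    """Return (renderer, priority_list_text) for every matching log line."""
    found = []
    for line in lines:
        match = RENDERER_RE.search(line)
        if match:
            found.append((match.group("name"), match.group("priority").strip()))
    return found


sample = (
    "2019-03-14 17:32:34,606 - __init__.py[DEBUG]: "
    "Selected renderer 'sysconfig' from priority list: None"
)
print(selected_renderers([sample]))  # [('sysconfig', 'None')]
```

On a real node, pass an open file object for /var/log/cloud-init.log instead of the sample list; if the override above took effect, the renderer name should no longer be sysconfig.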

Changed in cloud-init:
status: Confirmed → In Progress
Rod Smith (rodsmith) wrote :

Thanks for the quick fix, Ryan! I've confirmed that your curtin config workaround in comment #15 works. Do you have an estimate for how long it'll be before a fix goes live? (I ask so we can plan whether we should push your workaround through one of the certification packages.)

On Fri, Mar 15, 2019 at 2:50 PM Rod Smith <email address hidden> wrote:

> Thanks for the quick fix, Ryan! I've confirmed that your curtin config
> workaround in comment #15 works. Do you have an estimate for how long
> it'll be before a fix goes live? (I ask so we can plan whether we should
> push your workaround through one of the certification packages.)
>

Depends on where you need it. It can likely land upstream either today
or on Monday, and would then be available via the cloud-init-dev daily PPA.
However, an SRU will take at least another week after next. We're almost
done with an existing cloud-init SRU, so we'd likely not start another SRU
until the current one is in -updates.

Amy Gou (goujm1) wrote :

Hi Jeff and all,

After the online upgrade, the version table shows MAAS 0.4.0, but the log still shows 2.4.2. At the same time, the deploy fails again. Please double-check the log and let me know if you have any comments.

Best Regards,
Amy

Jeff Lane (bladernr) wrote :

Hi Amy,

First, which machine failed? I see a bunch of machines in the /var/log/maas/rsyslog/ directory, and I'm not sure exactly which one to look at.

Secondly, the version you posted in the screenshot looks correct. Can you show me the output of:

ls -l /etc/maas/preseeds/curtin_userdata*

Jeff Lane (bladernr) wrote :

Amy: Also, could you send me a tarball containing /etc/maas/preseeds?

duanbenliang (duanbl1) wrote :
Amy Gou (goujm1) wrote :

Hi Jeff,

The machine that failed to deploy with the new MAAS 0.4.0 is an SR590 Cascadelake. The attachment above was collected from the environment with the SR590 Cascadelake.
Besides, the same issue also occurs on the SR650 Cascadelake.

Best Regards,
Amy

Rod Smith (rodsmith) wrote :

Amy, I think you're confusing the MAAS version (which is 2.4.2 on one of our installations) and the maas-cert-server package version (the latest of which is 0.4.0). The maas-cert-server 0.3.9 package includes a workaround (but NOT A FIX) for this bug, and 0.4.0 provides some unrelated improvements, so the installation SHOULD succeed after you've upgraded maas-cert-server to version 0.3.9 or 0.4.0. If it's still failing, then it could be you'll need to apply the workaround described by Ryan Harper in comment #15, which is different from the workaround in maas-cert-server 0.3.9 and 0.4.0. (Post back if you need help applying Ryan's workaround.) It could also be that you're looking at a completely different problem.

Amy Gou (goujm1) wrote :

Hi Rod,

Thanks for your update. We will use the workaround to run the current certification test on Purley Cascadelake.
As for the deploy failure on MAAS 0.4.0, do you advise that we raise a separate defect to track it?

Best Regards,
Amy

This bug is fixed with commit 5de83fc5 to cloud-init on branch master.
To view that commit see the following URL:
https://git.launchpad.net/cloud-init/commit/?id=5de83fc5

Changed in cloud-init:
status: In Progress → Fix Committed

This bug is believed to be fixed in cloud-init in version 19.1. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released
Amy Gou (goujm1) wrote :

Sorry for the late reply. The issue does not occur with the current cloud-init version, 18.5-45-g3554ffe8-0ubuntu1~18.04.1. Please go ahead and close it. Thanks a lot.

Best Regards,
Amy

Jeff Lane (bladernr) wrote :

Hi Amy, it's likely that you're still using our patched tooling that includes a workaround. cloud-init 18.5 should not work.
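A quick way to tell whether an installed package predates the upstream fix is to compare the leading upstream version against 19.1, the release named in this report. The Python sketch below is a rough check under stated assumptions: it only reads the leading "major.minor" of the version string, it cannot see distro-level backports or workarounds in other tooling, and the helper names are hypothetical.

```python
import re

# Upstream release in which this bug is believed to be fixed (from this report).
FIXED_IN = (19, 1)


def upstream_version(package_version):
    """Extract (major, minor) from a string such as '18.5-45-g3554ffe8-0ubuntu1~18.04.1'."""
    match = re.match(r"(\d+)\.(\d+)", package_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {package_version!r}")
    return int(match.group(1)), int(match.group(2))


def has_upstream_fix(package_version):
    """True if the upstream part of the version is at least FIXED_IN."""
    return upstream_version(package_version) >= FIXED_IN


print(has_upstream_fix("18.5-45-g3554ffe8-0ubuntu1~18.04.1"))  # False
print(has_upstream_fix("19.1"))                                # True
```

A False result for 18.5 is consistent with the suggestion above that a successful deploy on that version relies on a workaround rather than on the fix itself.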

Jeff Lane (bladernr) wrote :

Just a heads up, the fix is now in -updates, I've tested this locally on a couple deployments and it seems to resolve the issue we had before. Asking my team to verify on a couple more deployments for due diligence.

Rod Smith (rodsmith) wrote :

I've tested this on three nodes on two MAAS servers (my own home MAAS server and maastiff, our MAAS server in the certification lab), using both 18.04 and 19.04. It looks good to me.

Jeff Lane (bladernr) on 2019-06-19
Changed in plainbox-provider-certification-server:
status: Confirmed → Fix Committed
Amy Gou (goujm1) wrote :

Thanks for your kind update. I will double-check with the latest one.

Hi Amy et al,

I'm going to mark this Fix Released, as 19.1 has made its way in to Ubuntu. Please let us know if you don't think this is fixed!

Dan

Changed in cloud-init (Ubuntu):
status: New → Fix Released
Changed in plainbox-provider-certification-server:
status: Fix Committed → Fix Released