cloud-init re-generates network config every reboot overwriting manual admin changes on CentOS.

Bug #1712680 reported by Andres Rodriguez on 2017-08-23
This bug affects 4 people
Affects        Importance   Assigned to
MAAS           Undecided    Unassigned
cloud-init     Medium       Unassigned
maas-images    Undecided    Lee Trager

Bug Description

Using MAAS 2.2.2 and the newest CentOS image, which uses cloud-init-0.7.9+224.g681baff-1.el7.centos.noarch, the network configuration is re-created after a reboot (this is *not* with network passthrough).

The configuration created was:

[centos@withkvm ~]$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=eth0
HWADDR=52:54:00:03:02:6e
NM_CONTROLLED=no
ONBOOT=yes
TYPE=Ethernet
USERCTL=no

I changed it to:

[centos@withkvm ~]$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=static
DEVICE=eth0
HWADDR=52:54:00:03:02:6e
NM_CONTROLLED=no
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
IPADDR=192.168.122.3
NETMASK=255.255.255.0
GATEWAY=192.168.122.2

However, after a reboot, the network config is changed back:

[centos@withkvm ~]$ sudo cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=eth0
HWADDR=52:54:00:03:02:6e
NM_CONTROLLED=no
ONBOOT=yes
TYPE=Ethernet
USERCTL=no

Related bugs:
 * bug 1782315: cloud-init blocks the boot process if it can't reach its MAAS datasource


Andres Rodriguez (andreserl) wrote :

Adding MAAS for tracking, as this was seen via maas testing affecting production environments.

summary: cloud-init re-generates network config every reboot overwriting manual
- admin changes
+ admin changes on CentOS.
description: updated
Ryan Harper (raharper) wrote :

I think we need the in-image curtin hooks for CentOS to write out a cloud-init config that disables networking if MAAS isn't sending network config to the image; this is what curtin does for Ubuntu.
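A minimal sketch of the kind of hook described above, assuming the installed system is reachable under a target root directory and that the presence of 50-curtin-networking.cfg is the signal that MAAS sent network config. The function name disable_net_if_no_maas_config is hypothetical; this is not curtin's actual hook code. The drop-in filename 99-disable-network-config.cfg is the one Scott Moser suggests later in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical in-image hook (illustration only, not curtin's real implementation).
# If no MAAS-provided network config exists in the target, write a cloud-init
# drop-in that disables cloud-init's network rendering.
disable_net_if_no_maas_config() {
    local target="$1"                         # root of the installed system
    local cfgdir="$target/etc/cloud/cloud.cfg.d"
    # Assumption: curtin writes this file only when MAAS sends network config.
    if [ -f "$cfgdir/50-curtin-networking.cfg" ]; then
        return 0                              # leave networking to cloud-init
    fi
    mkdir -p "$cfgdir"
    echo "network: {config: disabled}" > "$cfgdir/99-disable-network-config.cfg"
}
```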

Gabriel Ramirez (gabriel1109) wrote :

Quick note: this bug originally arose from a customer case. The customer has agreed to close the ticket, as he believes he has a good way to proceed.

Merlin Hartley (scarabmonkey) wrote :

I have the same problem: cloud-init is rewriting /etc/sysconfig/network, thus undoing my change of the default route.

Lee Trager (ltrager) wrote :

@raharper I think the issue is that MAAS always sends the config. On Ubuntu it's only applied on first boot, while on CentOS it's applied on every boot. I've confirmed /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg is being written. I think the bug is in cloud-init and outside of MAAS's control.

Changed in maas:
status: New → Incomplete
Scott Moser (smoser) wrote :

Networking is intended to only be rendered once per instance.
This re-rendering happens in MAAS because MAAS is a network datasource, and
network information is provided to cloud-init via system config (in
/etc/cloud/cloud.cfg.d by curtin).

So what happens is:
 a.) cloud-init-local executes and does not find a local datasource.
 b.) since nothing is found, we render default networking configuration
     so that we can search for network datasources. in this case the default
     networking configuration comes from /etc/cloud/cloud.cfg.d
 c.) cloud-init-net runs, finds the MAAS datasource, and even determines that
     this is not a new instance, which would have prevented it from rendering
     network config.

The problem is that to get the instance-id to verify, we had to render
network config, so it's too late at the point when we could determine
that we did not / should not re-render networking.

The fix for this is to give the MAAS datasource a 'check_instance_id'
function. If that returns true, the instance is determined not to be new,
so we skip 'b', and some portions of 'c' are skipped as well.

The reason that the MAAS datasource does not have a 'check_instance_id' is
that there is no local way to verify that the cached data (in
/var/lib/cloud/instance/) is valid.

The easiest solution is to just add the method and have it return True.
The issue with doing this is that snapshotting an image and re-provisioning
from it then requires cleaning /var/lib/cloud/instance (as otherwise
cloud-init would never know it was a new instance).

MAAS doesn't *have* an instance-id. (bug 944325). However, it seems
that the fix that went in for bug 1507586 will change the token on
each deploy. Since that token is available to cloud-init, and will
be updated on every install (by curtin) we can compare current token to
cached token and use that as 'check_instance_id'.
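To make the proposed comparison concrete, here is a shell sketch of the decision only. It is illustrative: the real fix lives in cloud-init's Python MAAS datasource, and both token file paths are hypothetical parameters, since this thread does not say where curtin writes the token or where a cached copy would live.

```shell
#!/usr/bin/env bash
# Illustration of a token-based "same instance" check (not cloud-init code).
# Returns 0 ("same instance, skip re-rendering") when the current token matches
# the cached one; returns 1 ("new instance") when they differ or one is missing.
same_maas_instance() {
    local current_token_file="$1"   # hypothetical: token written by curtin on deploy
    local cached_token_file="$2"    # hypothetical: copy cached from the previous boot
    [ -f "$current_token_file" ] && [ -f "$cached_token_file" ] || return 1
    cmp -s "$current_token_file" "$cached_token_file"
}
```

Treating a missing cache as "new instance" errs toward rendering networking, which matches first-boot behavior.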

That will mean that capturing an image of a MAAS deployed system and
then deploying that on another datasource will not work correctly.
In order to do that, the user would have to "clean" the image with either:
  rm -Rf /var/lib/cloud/
  rm -f /etc/cloud/cloud.cfg.d/*curtin*

Ryan Harper (raharper) wrote :

Cloud-init certainly knows that MAAS is the datasource without needing networking: during a curtin deploy, MAAS either dpkg-reconfigures cloud-init or provides a cloud-config that writes out a config with the MAAS URL defined (or one is provided on the kernel command line).

At deployment time, curtin renders the network configuration into the target system as cloud-config.

During boot of the node:

1) cloud-init's cloud-id finds a MAAS configuration in /etc/cloud/cloud.cfg.d/* and enables the cloud-init service
2) cloud-init --local will create a DataSourceMAAS based on the cloud-init system config
3) cloud-init --local checks if the datasource has a 'network_config' attribute;
  DataSourceMAAS does not
4) cloud-init --local checks if the cloud-init system config has a network configuration (it does, in /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg)
5) The system network config provided by MAAS gets rendered and brings up the network configuration as configured in MAAS

For this bug, we have local modifications. Local modifications aren't expected to persist; the header of the file explicitly says not to edit it. Note that the same thing would happen on an Ubuntu system deployed with MAAS with respect to modifying network config outside of MAAS control.

Persistent changes need to be applied to the network configuration of the node in MAAS at deployment time.

Possible temporary workarounds include updating /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg so that the change is applied when the node reboots. This requires familiarity with the cloud-init network configuration formats.

http://cloudinit.readthedocs.io/en/latest/topics/network-config-format-v1.html#network-config-v1
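As a sketch of that workaround, the function below writes a network config v1 document with one static interface, using the values from the ifcfg-eth0 in the bug description (NETMASK 255.255.255.0 expressed as /24). The function name write_static_v1 is hypothetical, and it overwrites the whole output file, so back up what curtin wrote first. Note that Frode Nordahl reports later in this thread that this file may be regenerated on boot on some deployments.

```shell
#!/usr/bin/env bash
# Write a cloud-init network config v1 document with a single static interface.
# Arguments: output file, interface name, MAC address, address/prefix, gateway.
write_static_v1() {
    local out="$1" name="$2" mac="$3" cidr="$4" gw="$5"
    cat > "$out" <<EOF
network:
  version: 1
  config:
    - type: physical
      name: $name
      mac_address: "$mac"
      subnets:
        - type: static
          address: $cidr
          gateway: $gw
EOF
}
```

Example (as root): write_static_v1 /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg eth0 52:54:00:03:02:6e 192.168.122.3/24 192.168.122.2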

Now, as Scott says, cloud-init doesn't re-render networking if it can determine that it's running on the same instance as before. This isn't implemented in the MAAS datasource because MAAS does not provide an instance-id.

In light of the new information about oauth tokens, that seems like a reasonable check, possibly combined with the MAAS URL, though I suspect the tokens won't collide.

Ron Lipke (rlipke) wrote :

I'd like to add that we are seeing the same behavior as of CentOS 7.4 with cloud-init 0.7.9-9.el7.centos.2.x86_64 in centos/os and 0.7.9-9.el7.centos.1.x86_64 in centos/updates.

We have a build pipeline that configures /etc/sysconfig/network-scripts/ifcfg-eth0, but on initial boot and every reboot thereafter it ends up as:

# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=eth0
HWADDR=0a:be:79:ba:6e:ec
ONBOOT=yes
TYPE=Ethernet
USERCTL=no

This only started with the new cloud-init versions in CentOS 7.4.

Our current solution is to set

network:
  config: disabled

in /etc/cloud/cloud.cfg which isn't ideal but works.

Scott Moser (smoser) on 2018-01-08
Changed in cloud-init:
status: New → Confirmed
importance: Undecided → Medium
Chad Smith (chad.smith) on 2018-01-10
Changed in cloud-init:
status: Confirmed → Fix Committed
Andres Rodriguez (andreserl) wrote :

@Lee

Adding a task to maas-images to ensure we are using the latest cloud-init which contains this fix.

Changed in maas:
status: Incomplete → Invalid
Changed in maas-images:
assignee: nobody → Lee Trager (ltrager)

This bug is believed to be fixed in cloud-init 18.1. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released
Scott Moser (smoser) wrote :

If you're plagued by this bug, there are at least 2 workarounds and one
long-term solution. 'A' is probably the easiest and most effective
workaround.

A.) set 'manual_cache_clean' to true

After first boot, do this:
  $ echo "manual_cache_clean: true" > /etc/cloud/cloud.cfg.d/99-manual.cfg
  $ touch /var/lib/cloud/instance/manual-clean

The above could also be done with 'in-target' during installation
via 'late_command' or some other mechanism.
In the install environment, only the writing of 99-manual.cfg is necessary.
If you do it after first boot (before a reboot) then you must also
touch the manual-clean file.

The 'manual_cache_clean' setting tells cloud-init that it should not search
for a datasource if there is one already present from a previous boot.
Setting this to 'true' means that the user must remove
/var/lib/cloud/instance/ manually before capturing an image to make new
instances from.

For MAAS, this is a fairly reasonable solution as "capturing" an image and
re-deploying it is not common.
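The rule above (install environment: write 99-manual.cfg only; after first boot: also touch manual-clean) can be captured in one function. This is a sketch; the name apply_manual_cache_clean is hypothetical, the directories are passed in so it can target either a live system or an install target, and it needs root when pointed at /etc/cloud.

```shell
#!/usr/bin/env bash
# Workaround A as a function (hypothetical helper name).
# $1: cloud.cfg.d directory; $2 (optional): cloud-init instance directory.
# Pass $2 only after first boot; in the install environment omit it.
apply_manual_cache_clean() {
    local cfg_dir="$1" instance_dir="${2:-}"
    mkdir -p "$cfg_dir"
    echo "manual_cache_clean: true" > "$cfg_dir/99-manual.cfg"
    if [ -n "$instance_dir" ]; then
        # Mark the already-cached instance as requiring manual cleaning.
        touch "$instance_dir/manual-clean"
    fi
}
```

After first boot, for example: apply_manual_cache_clean /etc/cloud/cloud.cfg.d /var/lib/cloud/instance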

B.) disable cloud-init networking *after* first boot.
To make cloud-init not render networking config on subsequent boots:

 $ echo "network: {config: disabled}" > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg

curtin feeds the networking information it received from MAAS through
to cloud-init in /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg.
The 99-disable-network-config.cfg filename above is chosen because it takes
precedence over the network information provided by curtin.

This solution *cannot* be done from the installation environment, as
doing so would make cloud-init not render networking on first boot, and
thus no networking would be configured on the machine at all.
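Because workaround B is harmful if applied too early, a guard helps. The sketch below refuses to write the drop-in unless the cloud-init instance directory exists. It assumes that directory only appears once cloud-init has processed an instance on that system; that is a heuristic, not a guarantee, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Workaround B with a first-boot guard (hypothetical helper name).
# $1: cloud.cfg.d directory; $2: cloud-init instance directory.
# Assumption: $2 exists only after cloud-init has completed a first boot.
disable_network_after_first_boot() {
    local cfg_dir="$1" instance_dir="$2"
    if [ ! -e "$instance_dir" ]; then
        echo "refusing: $instance_dir missing; first boot not completed?" >&2
        return 1
    fi
    echo "network: {config: disabled}" > "$cfg_dir/99-disable-network-config.cfg"
}
```

On a deployed node (as root): disable_network_after_first_boot /etc/cloud/cloud.cfg.d /var/lib/cloud/instance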

C.) upgrade cloud-init inside the image.
MAAS's CentOS images are built with the cloud-init/el-stable repo [1].
That currently contains a version of cloud-init (0.7.9+224.g681baff-1) that
does not have a fix for this bug.

We also maintain 2 other repos:
  cloud-init/el-testing
  cloud-init/cloud-init-dev

Ultimately we'll want to get cloud-init 18.1 into the el-testing repo
and then into the MAAS images.
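To check whether an image already carries a cloud-init at or above 18.1 (the release this bug is believed fixed in), a version comparison with GNU sort's -V ordering is enough for a quick check. This is a heuristic sketch: a distribution package could backport the fix under an older version number, so a "false" result does not prove the bug is present.

```shell
#!/usr/bin/env bash
# Return 0 if version $1 is at least version $2, using sort's version ordering.
version_at_least() {
    local have="$1" want="$2"
    # The smaller of the two sorts first; if that is "want", then have >= want.
    [ "$(printf '%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}
```

On CentOS, the installed version can be read with rpm -q --qf '%{VERSION}' cloud-init and passed as the first argument, with 18.1 as the second.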

Frode Nordahl (fnordahl) wrote :

Addendum to comment #8

Since /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg is rewritten at every boot with a cloud-init network section, any network section added to /etc/cloud/cloud.cfg will be overridden and have no effect.
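To see which files are competing, the sketch below lists the system config files that define a top-level network key, in the order cloud-init reads them: /etc/cloud/cloud.cfg first, then cloud.cfg.d/*.cfg by filename. It assumes, consistent with the 99- prefix advice earlier in this thread, that a later file overrides an earlier one, so the last line printed is the likely winner. It only greps for lines starting with "network:" and does not parse YAML.

```shell
#!/usr/bin/env bash
# List cloud-init system config files defining a top-level "network:" key,
# in read order (cloud.cfg, then cloud.cfg.d/*.cfg sorted by filename).
list_network_sections() {
    local etc_cloud="${1:-/etc/cloud}" f
    for f in "$etc_cloud/cloud.cfg" "$etc_cloud"/cloud.cfg.d/*.cfg; do
        [ -f "$f" ] || continue   # skip a missing cloud.cfg or an unmatched glob
        if grep -q '^network:' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

On an affected node, list_network_sections with no argument would be expected to end with 50-curtin-networking.cfg, which is why a network section in cloud.cfg alone has no effect.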

A workaround for MAAS deployed CentOS images would be:
1) Log into the affected CentOS server
2) Replace the contents of /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg with the following:

network:
  config: disabled

3) Make the file immutable by running:
    sudo chattr +i /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg
4) Verify success of above command by running:
    lsattr /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg

Output should look like this:
    ----i--------e-- /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg

fermulator (fermulator) wrote :

Can someone please reference the _EXACT_ resolution w.r.t. "cloud-images" that has been released? (reference to git commit?)

Same for the intended fix to be released for MAAS deployments.

I've read through the comments and there are various aspects/considerations to this issue. How are we addressing them?

Scott Moser (smoser) wrote :

Cloud-init fixed this in the commit 5f550420d2.
  https://git.launchpad.net/cloud-init/commit/?id=5f550420d2

This fix is not currently available in MAAS's centos images.

The 'maas-images' task in this bug would track that.

Scott Moser (smoser) on 2018-07-18
description: updated