No network in AWS (EC2-Classic) after stopping and starting instance

Bug #1802073 reported by Jani Ollikainen on 2018-11-07
This bug affects 1 person
Affects Status Importance Assigned to Milestone
cloud-init (Ubuntu)
Undecided
Unassigned

Bug Description

I don't know whether this is cloud-init, netplan, or something else, but this is not good.

Background:
# lsb_release -rd
Description: Ubuntu 18.04.1 LTS
Release: 18.04
# apt-cache policy cloud-init
cloud-init:
  Installed: 18.4-0ubuntu1~18.04.1
  Candidate: 18.4-0ubuntu1~18.04.1
  Version table:
 *** 18.4-0ubuntu1~18.04.1 500
        500 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     18.2-14-g6d48d265-0ubuntu1 500
        500 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic/main amd64 Packages

1. Get newest image to use

$ aws --region eu-west-1 ec2 describe-images --owners 099720109477 --filters Name=root-device-type,Values=ebs Name=architecture,Values=x86_64 Name=name,Values='*hvm-ssd/ubuntu-bionic-18.04*' --query 'sort_by(Images, &Name)[-1].ImageId'

"ami-08596fdd2d5b64915"

2. Start an instance in EC2-Classic with that image.

3. Try to SSH. Everything is ok.

# cat /var/log/cloud-init-output.log
Cloud-init v. 18.4-0ubuntu1~18.04.1 running 'init-local' at Wed, 07 Nov 2018 08:12:16 +0000. Up 10.51 seconds.
Cloud-init v. 18.4-0ubuntu1~18.04.1 running 'init' at Wed, 07 Nov 2018 08:12:21 +0000. Up 15.50 seconds.
ci-info: +++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++++
ci-info: +--------+------+-----------------------------+-----------------+--------+-------------------+
ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
ci-info: +--------+------+-----------------------------+-----------------+--------+-------------------+
ci-info: | eth0 | True | 10.74.200.25 | 255.255.255.192 | global | 22:00:0a:4a:c8:19 |
ci-info: | eth0 | True | fe80::2000:aff:fe4a:c819/64 | . | link | 22:00:0a:4a:c8:19 |
ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
ci-info: | lo | True | ::1/128 | . | host | . |
ci-info: +--------+------+-----------------------------+-----------------+--------+-------------------+
...
Cloud-init v. 18.4-0ubuntu1~18.04.1 running 'modules:config' at Wed, 07 Nov 2018 08:12:41 +0000. Up 35.63 seconds.
Cloud-init v. 18.4-0ubuntu1~18.04.1 running 'modules:final' at Wed, 07 Nov 2018 08:12:44 +0000. Up 38.98 seconds.
Cloud-init v. 18.4-0ubuntu1~18.04.1 finished at Wed, 07 Nov 2018 08:12:45 +0000. Datasource DataSourceEc2Local. Up 39.38 seconds

4. Stop the instance.

5. Start the instance.

6. Try to SSH.
Expected to happen: Instance has network and is working.
What happens: Instance has no working network

The instance console log shows:
[ 11.342357] cloud-init[412]: Cloud-init v. 18.4-0ubuntu1~18.04.1 running 'init-local' at Wed, 07 Nov 2018 08:21:07 +0000. Up 10.77 seconds.
[ OK ] Started Initial cloud-init job (pre-networking).
[ OK ] Reached target Network (Pre).
         Starting Network Service...
[ OK ] Started Network Service.
         Starting Network Name Resolution...
         Starting Wait for Network to be Configured...
[ OK ] Started Wait for Network to be Configured.
         Starting Initial cloud-init job (metadata service crawler)...
[ OK ] Started Network Name Resolution.
[ OK ] Reached target Host and Network Name Lookups.
[ OK ] Reached target Network.
[ 13.036207] cloud-init[637]: Cloud-init v. 18.4-0ubuntu1~18.04.1 running 'init' at Wed, 07 Nov 2018 08:21:08 +0000. Up 12.55 seconds.
[ 13.052849] cloud-init[637]: ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
[ 13.100325] cloud-init[637]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
[ 13.121790] cloud-init[637]: ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
[ 13.129189] cloud-init[637]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
[ 13.144839] cloud-init[637]: ci-info: | eth0 | False | . | . | . | 22:00:0b:0a:cb:2d |[ OK ] Started Initial cloud-init job (metadata service crawler).
[ 13.158694] cloud-init
[ OK ] Reached target System Initialization.[637]: ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
[ OK ] Started Daily apt download activities.
[ 13.179053] ] Started Message of the Day.
cloud-init[637]: ci-info: | lo | True | ::1/128 | . | host | . |[ OK ] Started ACPI Events Check.
[ OK ] Reached target Paths.
[ 13.201012] cloud-init[637]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+] Listening on ACPID Listen Socket.
[ OK ] Listening on Open-iSCSI iscsid Socket.
[ 13.213993] cloud-init[ OK ] Listening on D-Bus System Message Bus Socket.[637]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
         Starting Socket activation for snappy daemon.
[ 13.229707] cloud-init[637]: ci-info: +-------+-------------+---------+-----------+-------+
[ OK ] Started Daily apt upgrade and clean activities.
[ 13.244949] cloud-init[637]: ci-info: | Route | Destination | Gateway | Interface | Flags |] Listening on UUID daemon activation socket.
         Starting LXD - unix socket.
[ OK ] Started Daily Cleanup of Temporary Directories.[ 13.256281] cloud-init[637]: ci-info: +-------+-------------+---------+-----------+-------+
[ OK ] Started Discard unused blocks once a week.
[ OK ] Reached target Timers.
[ OK ] Reached target Cloud-config availability.
[ 13.286424] cloud-init[637]:
[ OK ] Reached target Network is Online.ci-info: +-------+-------------+---------+-----------+-------+

It would be nice if instances kept working after a stop and start, as they used to.
My guess is that the cause is this part of /etc/netplan/50-cloud-init.yaml:
            match:
                macaddress: 22:00:0a:66:16:17

The MAC address changes across the stop and start, and things seem to be handled in the wrong order: the file is not regenerated before the network is brought up, so there is no device with that MAC address. But without console access and internal knowledge of how netplan, cloud-init and systemd work together, it's kind of hard to pinpoint the problematic part.
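
A minimal sketch of how that suspected mismatch could be checked on a running instance. It assumes the netplan YAML has already been loaded into a dict; `find_stale_matches` and `read_current_macs` are hypothetical helper names, not part of cloud-init or netplan, and the two MAC addresses are only copied from the excerpts in this report for illustration.

```python
from pathlib import Path


def find_stale_matches(netplan_config, current_macs):
    """Return (name, configured_mac) pairs whose MAC is not present on any NIC."""
    present = {mac.lower() for mac in current_macs.values()}
    stale = []
    ethernets = netplan_config.get("network", {}).get("ethernets", {})
    for name, cfg in ethernets.items():
        mac = (cfg.get("match") or {}).get("macaddress")
        if mac is not None and mac.lower() not in present:
            stale.append((name, mac))
    return stale


def read_current_macs(sys_net="/sys/class/net"):
    """Map interface name -> MAC address as reported by the kernel via sysfs."""
    macs = {}
    for dev in Path(sys_net).iterdir():
        addr = dev / "address"
        if addr.is_file():
            macs[dev.name] = addr.read_text().strip()
    return macs


# Illustrative config: a MAC pinned in the YAML (assumed structure, already parsed).
config = {"network": {"version": 2, "ethernets": {"eth0": {
    "dhcp4": True,
    "match": {"macaddress": "22:00:0a:66:16:17"},
}}}}

# The NIC now reports a different MAC, so the pinned entry matches nothing.
print(find_stale_matches(config, {"eth0": "22:00:0b:0a:cb:2d"}))
# [('eth0', '22:00:0a:66:16:17')]
```

On a real instance one would pass `read_current_macs()` as the second argument. An empty result would mean every pinned MAC still exists on some interface.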

But I did a small test. I edited /usr/lib/python3/dist-packages/cloudinit/net/netplan.py (the lines marked with + are my additions):

            if if_type == 'physical':
                # required_keys = ['name', 'mac_address']
                eth = {
                    'set-name': ifname,
                    'match': ifcfg.get('match', None),
                }
                if eth['match'] is None:
                    macaddr = ifcfg.get('mac_address', None)
                    if macaddr is not None:
                        eth['match'] = {'macaddress': macaddr.lower()}
                    else:
                        del eth['match']
                        del eth['set-name']
+               del eth['match']
+               del eth['set-name']
                _extract_addresses(ifcfg, eth, ifname)
                ethernets.update({ifname: eth})

And then ran:
cloud-init clean
cloud-init init

# cat /etc/netplan/50-cloud-init.yaml

# This file is generated from information provided by
# the datasource. Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    version: 2
    ethernets:
        eth0:
            dhcp4: true
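
To make the difference between the two rendered forms concrete, here is a rough standalone sketch. `render_physical` is a hypothetical, heavily simplified stand-in written for this report, not cloud-init's actual renderer, and the `pin_mac` switch is an assumption used only to show both outcomes side by side.

```python
def render_physical(ifname, ifcfg, pin_mac=True):
    """Build a netplan-style 'ethernets' entry for one physical interface.

    Simplified illustration only: real address handling is reduced to a
    single dhcp4 flag.
    """
    eth = {}
    macaddr = ifcfg.get("mac_address")
    if pin_mac and macaddr is not None:
        # Pinned form: the entry only applies to a NIC with this exact MAC.
        eth["set-name"] = ifname
        eth["match"] = {"macaddress": macaddr.lower()}
    if ifcfg.get("dhcp4"):
        eth["dhcp4"] = True
    return eth


ifcfg = {"mac_address": "22:00:0A:66:16:17", "dhcp4": True}

# Default behaviour as described in the report: MAC is pinned.
print(render_physical("eth0", ifcfg))

# Behaviour after the test edit: no match/set-name, only dhcp4 remains.
print(render_physical("eth0", ifcfg, pin_mac=False))
```

The second form does not depend on the MAC address at all, which would explain why it keeps working when the address changes.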

Then I stopped the instance and started it again. It got its network and worked. I stopped and started it once more, just to be sure it wasn't one-time magic, and it still worked.

So the problem really does seem to be the match/macaddress entry, but how to fix that properly I'll leave to the people who made it behave like this.

But I think there might be some pretty scared and annoyed people when an instance is unreachable after a stop and start. How bad it is also depends on their skill at troubleshooting the problem, mounting the volume on another instance and fixing it (if it's EBS-backed; if not, sorry, make a new instance).

Jani Ollikainen (bestis) wrote :

A little bit more testing. It really seems that cloud-init does not generate the file again unless cloud-init clean has been run first. So the file is not regenerated after a stop and start, the MAC address differs, and there is no network.

Let's test it by editing the file by hand:

$ cat /etc/netplan/50-cloud-init.yaml
# This file is generated from information provided by
# the datasource. Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
# EDITED
network:
    version: 2
    ethernets:
        eth0:
            dhcp4: true

$ sudo cloud-init init

$ cat /etc/netplan/50-cloud-init.yaml
# This file is generated from information provided by
# the datasource. Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
# EDITED
network:
    version: 2
    ethernets:
        eth0:
            dhcp4: true

$ sudo cloud-init clean

$ cat /etc/netplan/50-cloud-init.yaml
# This file is generated from information provided by
# the datasource. Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
# EDITED
network:
    version: 2
    ethernets:
        eth0:
            dhcp4: true

$ sudo cloud-init init

$ cat /etc/netplan/50-cloud-init.yaml
# This file is generated from information provided by
# the datasource. Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    version: 2
    ethernets:
        eth0:
            dhcp4: true

So both clean and init are needed for the file to be generated again. A normal boot probably does just the init, so the file keeps the wrong MAC address, and there is no network for you.
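
The four steps above can be replayed with a toy model. This is only a sketch of the observed behaviour under one assumption: that init skips rendering while some cached per-instance state exists, and that clean removes that state without touching the netplan file. `ToyInstance` and its fields are hypothetical and say nothing definite about cloud-init internals.

```python
from dataclasses import dataclass

GENERATED = "network: {version: 2, ethernets: {eth0: {dhcp4: true}}}"


@dataclass
class ToyInstance:
    """Hypothetical stand-in for the instance's on-disk state."""
    netplan_file: str = GENERATED
    has_cached_state: bool = True

    def init(self):
        # Assumed: the file is only rendered when no cached state exists.
        if not self.has_cached_state:
            self.netplan_file = GENERATED
            self.has_cached_state = True

    def clean(self):
        # Assumed: clean drops the cached state but leaves the file alone.
        self.has_cached_state = False


def replay_experiment():
    """Return whether the '# EDITED' marker survives each step."""
    inst = ToyInstance()
    inst.netplan_file = "# EDITED\n" + GENERATED  # edit the file by hand
    survived = []
    inst.init()
    survived.append("# EDITED" in inst.netplan_file)
    inst.clean()
    survived.append("# EDITED" in inst.netplan_file)
    inst.init()
    survived.append("# EDITED" in inst.netplan_file)
    return survived


print(replay_experiment())  # [True, True, False]
```

The marker survives init and clean and disappears only on the init that follows a clean, which matches the four cat outputs above.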
