network-manager on uc20 gets new ip address in recover mode

Bug #1911357 reported by Ian Johnson on 2021-01-13
This bug affects 1 person
Affects: snappy-hwe-snaps
Importance: Undecided
Assigned to: Unassigned

Bug Description

After building a new image with network-manager in the snaps section of the model and specifying that network-manager should be included in both run mode and recover mode, I notice that the device gets a different IP address when transitioning to recover mode compared to the IP address the device gets on first boot into run mode.

As far as I can tell, the sequence is as follows. On the first boot in run mode, systemd-networkd automatically gets an IP address; after seeding finishes, network-manager takes over and gets a new, different IP address. When transitioning to recover mode, systemd-networkd runs again and gets an IP address (I think the same one networkd got before network-manager started on the first boot of run mode). Seeding then proceeds in the tmpfs of recover mode, network-manager starts up again and takes over, and it gets yet another IP address, which differs from the one network-manager ended up with during the first boot of run mode.

The expectation is that network-manager upon starting up in recover mode gets the same IP address it had in run mode.

I am attaching the relevant logs for network-manager, the output of `ip a show`, and the config files written to /etc/netplan. I used console-conf to configure the device, but since it was on Ethernet there was nothing to configure for networking in that part of the setup process.

In order to reproduce this you need to ensure that network-manager is an included recover mode snap, by building your own image with your own model assertion, something like this:

```json
{
    "type": "model",
    "series": "16",
    "authority-id": "some-id",
    "brand-id": "some-id",
    "model": "ubuntu-core-20-pi-arm64",
    "architecture": "arm64",
    "timestamp": "2020-03-31T12:00:00.0Z",
    "base": "core20",
    "grade": "dangerous",
    "snaps": [
        {
            "name": "pi",
            "type": "gadget",
            "default-channel": "20/edge",
            "id": "YbGa9O3dAXl88YLI6Y1bGG74pwBxZyKg"
        },
        {
            "name": "pi-kernel",
            "type": "kernel",
            "default-channel": "20/edge",
            "id": "jeIuP6tfFrvAdic8DMWqHmoaoukAPNbJ"
        },
        {
            "name": "core20",
            "type": "base",
            "default-channel": "latest/edge",
            "id": "DLqre5XGLbDqg9jPtiAhRRjDuPVa5X1q"
        },
        {
            "name": "snapd",
            "type": "snapd",
            "default-channel": "latest/edge",
            "id": "PMrrV4ml8uWuEUDBT8dSGnKUYbevVhc4"
        },
        {
            "name": "network-manager",
            "default-channel": "20/beta",
            "modes": [
                "run",
                "recover"
            ]
        }
    ]
}
```
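As a quick sanity check before building the image, the `modes` lists in a model like the one above can be inspected with a short script. This is only a sketch: it assumes that snaps without an explicit `modes` list are available in run mode only and that the essential snap types (snapd, kernel, base, gadget) are present in every mode. The `EXAMPLE_MODEL` dict is a trimmed copy of the model above, and `some-app` is a hypothetical snap added for illustration.

```python
# Assumption: these snap types are available in all modes regardless of "modes".
ESSENTIAL_TYPES = {"snapd", "kernel", "base", "gadget"}


def snaps_in_mode(model: dict, mode: str) -> list[str]:
    """Return the names of snaps in the model that are available in `mode`."""
    names = []
    for snap in model.get("snaps", []):
        if snap.get("type", "app") in ESSENTIAL_TYPES:
            names.append(snap["name"])
        # Assumption: a snap without "modes" defaults to run mode only.
        elif mode in snap.get("modes", ["run"]):
            names.append(snap["name"])
    return names


# Trimmed example; "some-app" is hypothetical and not part of the original model.
EXAMPLE_MODEL = {
    "snaps": [
        {"name": "core20", "type": "base"},
        {"name": "network-manager", "modes": ["run", "recover"]},
        {"name": "some-app"},
    ]
}

if __name__ == "__main__":
    print(snaps_in_mode(EXAMPLE_MODEL, "recover"))
```

If network-manager is missing from the recover list, the behaviour described in this report cannot be reproduced with that image.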

Ian Johnson (anonymouse67) wrote :

Here are the associated logs from recover mode

Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :

NM creates a netplan configuration that sets it as the default renderer. When netplan applies it, networkd releases the DHCP lease and NM requests a new address from the DHCP server, which might assign the same address as the original one or a different one (it tends to be the same if the request comes from the same MAC, but not all DHCP servers behave the same way).

AFAIK there is currently no way to transfer DHCP leases from networkd to NM. One option might be to ask networkd not to release the IP when it stops; I think NM would then take over the current IP address. That can be done with:

[Network]
KeepConfiguration=dhcp-on-stop

in the networkd connection files. However, I have not seen this supported in the netplan reference.
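Since netplan does not appear to expose this option, one possible workaround is a systemd-networkd drop-in placed next to the generated `.network` file. The sketch below only writes such a drop-in. The unit name `10-netplan-eth0.network` and the drop-in file name `keep-dhcp.conf` are assumptions for illustration, and whether NM actually takes over the address afterwards is untested.

```python
from pathlib import Path

# The setting suggested above, as the body of a networkd drop-in file.
DROPIN_BODY = "[Network]\nKeepConfiguration=dhcp-on-stop\n"


def write_keep_config_dropin(network_dir: Path, unit_name: str) -> Path:
    """Write <network_dir>/<unit_name>.d/keep-dhcp.conf and return its path.

    `unit_name` is the name of the .network file to extend; the value used
    in the example below is an assumption, not taken from the report.
    """
    dropin_dir = network_dir / f"{unit_name}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    dropin_path = dropin_dir / "keep-dhcp.conf"  # hypothetical file name
    dropin_path.write_text(DROPIN_BODY)
    return dropin_path


if __name__ == "__main__":
    # Writes into a local scratch directory rather than /etc/systemd/network.
    print(write_keep_config_dropin(Path("./network-test"), "10-netplan-eth0.network"))
```

On a real device the target directory would presumably be /etc/systemd/network, and networkd would need to reload its configuration before the setting takes effect.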

Maybe this will not be an issue once network configuration is controlled by a snapd setting: if netplan knows from the beginning that NM is the default renderer, it won't create configuration files for networkd.

To keep the same address when rebooting into recover mode, you will need to copy NM's DHCP lease files across partitions so that the DHCP client tries to get the old address from the server. That should work.

Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :

(In the snap, DHCP lease info is stored in /var/snap/network-manager/current/var/lib/NetworkManager/.)
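Copying the lease files between the two locations could look roughly like the sketch below. It assumes the lease files end in `.lease`, and the destination directory is left to the caller, since no mechanism for reaching the recover mode data partition exists yet. The function name `copy_lease_files` is made up for this example.

```python
import shutil
from pathlib import Path


def copy_lease_files(src_dir: Path, dst_dir: Path, pattern: str = "*.lease") -> list[Path]:
    """Copy DHCP lease files from src_dir to dst_dir and return the new paths.

    Assumption: lease files match "*.lease"; adjust `pattern` if they do not.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for lease in sorted(src_dir.glob(pattern)):
        if lease.is_file():
            target = dst_dir / lease.name
            shutil.copy2(lease, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied
```

A caller would pass the NetworkManager directory named above as `src_dir` and the matching directory on the other partition as `dst_dir`.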

Ian Johnson (anonymouse67) wrote :

I think it's "ok" that network-manager doesn't take over the same IP address as networkd, but the bug is that network-manager ends up using 2 IP addresses, one for network-manager in run mode and a different one for network-manager in recover mode. The issue I see with this is that you don't have a way to know what IP address to SSH into after a device goes into recover mode, especially on headless devices, and you essentially have to do a network scan to find it (and you also need to know the MAC address, etc.)

Regarding the DHCP lease info being stored in $SNAP_DATA/var/lib/NetworkManager, is it possible to patch network-manager to store this information somewhere else, such as /etc/netplan or another location in writable that we could safely copy from run mode to recover mode? We don't currently have a way to copy data from $SNAP_DATA in run mode to $SNAP_DATA in recover mode, since recover mode starts with a fresh, empty data partition that is populated by seeding, but we do copy some minimal data over, such as the files in /etc/netplan.

We do eventually plan on having some mechanism for snaps to store data in ubuntu-save, and this seems like a good candidate for that, but it would again need patching to network-manager to ensure that network-manager reads/writes the DHCP file to ubuntu-save (likely only writing/saving it there after confirming that the network configuration is good to prevent getting broken networking in recover mode as well).

I will join the meeting this week to discuss this.
