[2.5] Commissioning results in an alias interface automatically created

Bug #1803188 reported by Andres Rodriguez on 2018-11-13
This bug affects 4 people
Affects    Status    Importance    Assigned to      Milestone
MAAS                 High          Mike Pontillo
2.4                  High          Mike Pontillo

Bug Description

I commissioned a machine (which happened to be a VM inside a MAAS-deployed Pod), and the interface obtained two different IP addresses during commissioning.

The network commissioning script captured this and created a new interface on the machine, which ended up in the Ready state. The commissioning output for interfaces shows:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:0f:2d:94 brd ff:ff:ff:ff:ff:ff
    inet 10.90.90.225/24 brd 10.90.90.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet 10.90.90.198/24 brd 10.90.90.255 scope global secondary ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe0f:2d94/64 scope link
       valid_lft forever preferred_lft forever
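
To make the expected behaviour concrete, here is a minimal sketch (not MAAS's actual commissioning code) that groups the IPv4 addresses in `ip addr` output by interface. The function name `group_ipv4_by_interface` and the trimmed sample text are illustrative assumptions. The point is that both addresses belong to the single interface ens4, with the second one flagged as secondary:

```python
import re
from collections import defaultdict

# Trimmed copy of the ens4 stanza from the commissioning output above.
IP_ADDR_OUTPUT = """\
2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:0f:2d:94 brd ff:ff:ff:ff:ff:ff
    inet 10.90.90.225/24 brd 10.90.90.255 scope global ens4
    inet 10.90.90.198/24 brd 10.90.90.255 scope global secondary ens4
    inet6 fe80::5054:ff:fe0f:2d94/64 scope link
"""


def group_ipv4_by_interface(text):
    """Return {interface: [(cidr, is_secondary), ...]} parsed from `ip addr` text."""
    addresses = defaultdict(list)
    current = None
    for line in text.splitlines():
        # Interface header lines look like "2: ens4: <FLAGS> ...".
        header = re.match(r"^\d+: ([^:@\s]+)[:@]", line)
        if header:
            current = header.group(1)
            continue
        # IPv4 address lines start with "inet " (this skips "inet6").
        inet = re.match(r"^\s+inet (\S+)", line)
        if inet and current is not None:
            addresses[current].append((inet.group(1), " secondary " in line))
    return dict(addresses)


result = group_ipv4_by_interface(IP_ADDR_OUTPUT)
for name, addrs in result.items():
    print(name, addrs)
```

With this grouping, one physical interface carries two addresses; nothing in the output suggests a second interface.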

For the resulting machine, please see the attached screenshot.

That said, this doesn't seem like an issue isolated to machines inside a pod because I /think/ I've seen this issue in other machines.

Andres Rodriguez (andreserl) wrote :
Changed in maas:
importance: Undecided → High
status: New → Triaged
milestone: none → 2.5.0rc1
assignee: nobody → Mike Pontillo (mpontillo)
description: updated
Mike Pontillo (mpontillo) wrote :

Just thinking out loud here, but I'm wondering if this could be related to bug #1749019. I know we've had issues with DHCP IP addresses from the PXE environment not playing well with subsequently-acquired addresses from ISC DHCP, but that's the most recent thing that has changed in this area...

Andres Rodriguez (andreserl) wrote :

I've not explored why it would have gotten 2 different IP addresses, but it could also be that during commissioning we try to bring up other interfaces to see if we can discover networks. I wonder if this has regressed and it is now trying to DHCP-scan the interface used for PXE, which could cause it to obtain a new IP?

Or rather, there was a bug where cloud-initramfs-tools (I think it was there) would copy the network config from the initrd to the ephemeral environment, causing the machine not to re-DHCP. That caused a few regressions, such as not renewing the IP lease because the network configuration was "statically" configured?

Mike Pontillo (mpontillo) wrote :

Yes, we should take a look at the handoff between the IP address the machine gets in the pre-boot environment and the next DHCP client that will be taking over the lease. I would be willing to bet that's part of the issue.

The most likely reasons I can think of for this to go wrong:

 - The DHCP server cannot match up the lease acquired at PXE boot time with the DHCP request from the ephemeral environment (possibly if it uses a different client identifier).

 - The lease expires between PXE boot time and the time of the DHCP request in the ephemeral environment. For example, in a small dynamic range with many machines booting, it could have expired and been handed to a different machine.
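
Both scenarios can be illustrated with a toy lease table. This is only a sketch under stated assumptions: `ToyDhcpServer`, the client keys, and the 192.0.2.0/24 addresses are hypothetical, and the allocation policy (an expired address goes back to the front of the free list) is a simplification, not how ISC DHCP actually behaves.

```python
import ipaddress


class ToyDhcpServer:
    """Toy lease table keyed by whatever identifier the client presents."""

    def __init__(self, first, last):
        start = int(ipaddress.IPv4Address(first))
        end = int(ipaddress.IPv4Address(last))
        self.free = [str(ipaddress.IPv4Address(n)) for n in range(start, end + 1)]
        self.leases = {}

    def request(self, client_key):
        # A known key gets its existing lease; an unknown key gets a new address.
        if client_key not in self.leases:
            self.leases[client_key] = self.free.pop(0)
        return self.leases[client_key]

    def expire(self, client_key):
        # Toy policy: an expired address is the next one to be handed out.
        self.free.insert(0, self.leases.pop(client_key))


# Scenario 1: the same machine presents two different identifiers.
server = ToyDhcpServer("192.0.2.10", "192.0.2.12")
pxe_ip = server.request("mac:52:54:00:0f:2d:94")
ephemeral_ip = server.request("client-id:hypothetical-ephemeral-id")
print(pxe_ip, ephemeral_ip)  # two different addresses for one machine

# Scenario 2: the PXE lease expires and another machine takes the address.
server2 = ToyDhcpServer("192.0.2.10", "192.0.2.12")
first_ip = server2.request("machine-a")
server2.expire("machine-a")
other_ip = server2.request("machine-b")
second_ip = server2.request("machine-a")
print(first_ip, other_ip, second_ip)
```

In either case the machine ends up holding an address different from the one it had at PXE time, which matches the two addresses seen on ens4.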

Andres Rodriguez (andreserl) wrote :

Sure. That said, the fact that a single interface with 2 addresses results in an alias after commissioning is still an issue (regardless of what other issues may be happening in the commissioning environment).

Changed in maas:
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
Jason Hobbs (jason-hobbs) wrote :

We are seeing this on 2.4.3 also:

Fetching Juju GUI 2.14.0
Waiting for address
Attempting to connect to 10.244.40.201:22
Attempting to connect to 10.244.40.202:22
Connected to 10.244.40.202
Running machine configuration script...
Bootstrap agent now started
Contacting Juju controller at 10.244.40.201 to verify accessibility...
Bootstrap complete, "foundations-maas" controller now available
Controller machines are in the "controller" model
Initial model "default" added
ERROR juju-ha-space is not set and a unique usable address was not found for machines: 0
run "juju config juju-ha-space=<name>" to set a space for Mongo peer communication

Jason Hobbs (jason-hobbs) wrote :

The failure in comment #6 happened around 2018-12-11-11:38:26

Nicolas Pochet (npochet) wrote :

When composing a machine on a pod with MAAS 2.5.3, an alias is created for the first interface.

The command used to compose the VM:

maas root pod compose 9 cores=2 memory=1024 interfaces='eth0:space=oam-space;eth1:space=maas2'
Success.
Machine-readable output follows: {
    "system_id": "ar68rw",
    "resource_uri": "/MAAS/api/2.0/machines/ar68rw/"
}

Inspecting the interfaces for this machine afterwards:
maas root interfaces read ar68rw
Success.
Machine-readable output follows:
[
    {
        "system_id": "ar68rw",
        "parents": [],
        "effective_mtu": 1500,
        "id": 55,
        "name": "eth0",
        "type": "physical",
        "vlan": {
            "vid": 1,
            "mtu": 1500,
            "dhcp_on": true,
            "external_dhcp": null,
            "relay_vlan": null,
            "id": 5001,
            "name": "untagged",
            "primary_rack": "k87hss",
            "fabric": "default",
            "fabric_id": 1,
            "space": "oam-space",
            "secondary_rack": null,
            "resource_uri": "/MAAS/api/2.0/vlans/5001/"
        },
        "tags": [],
        "discovered": [
            {
                "subnet": {
                    "name": "oam",
                    "vlan": {
                        "vid": 1,
                        "mtu": 1500,
                        "dhcp_on": true,
                        "external_dhcp": null,
                        "relay_vlan": null,
                        "id": 5001,
                        "name": "untagged",
                        "primary_rack": "k87hss",
                        "fabric": "default",
                        "fabric_id": 1,
                        "space": "oam-space",
                        "secondary_rack": null,
                        "resource_uri": "/MAAS/api/2.0/vlans/5001/"
                    },
                    "cidr": "192.168.105.0/24",
                    "rdns_mode": 2,
                    "gateway_ip": "192.168.105.1",
                    "dns_servers": [],
                    "allow_dns": true,
                    "allow_proxy": true,
                    "active_discovery": false,
                    "managed": true,
                    "id": 1,
                    "space": "oam-space",
                    "resource_uri": "/MAAS/api/2.0/subnets/1/"
                },
                "ip_address": "192.168.105.14"
            }
        ],
        "vendor": null,
        "enabled": true,
        "mac_address": "52:54:00:6a:e2:ff",
        "params": "",
        "links": [
            {
                "id": 276,
                "mode": "auto",
                "subnet": {
                    "name": "oam",
                    "vlan": {
                        "vid": 1,
                        "mtu": 1500,
       ...
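
A rough way to spot the problem in output like the above is to count the entries under "links" for each interface. This is a sketch only: it assumes an alias shows up as an additional entry in an interface's "links" list, and the sample data below is a hypothetical, heavily trimmed stand-in for the real (truncated) output, with made-up link ids.

```python
import json

# Hypothetical, trimmed stand-in for `maas root interfaces read <system_id>` output.
SAMPLE_OUTPUT = """
[
    {"name": "eth0", "type": "physical",
     "links": [{"id": 1, "mode": "auto"}, {"id": 2, "mode": "auto"}]},
    {"name": "eth1", "type": "physical",
     "links": [{"id": 3, "mode": "auto"}]}
]
"""


def interfaces_with_extra_links(interfaces):
    """Return {interface name: link count} for interfaces with more than one link."""
    counts = {}
    for interface in interfaces:
        links = interface.get("links", [])
        if len(links) > 1:
            counts[interface["name"]] = len(links)
    return counts


suspects = interfaces_with_extra_links(json.loads(SAMPLE_OUTPUT))
print(suspects)
```

An interface that was composed with a single network attachment but reports more than one link would be a candidate for the unwanted alias.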
