[2.0a1] Can't deploy a node (no interfaces on rack controller)

Bug #1554999 reported by Dean Henrichsmeyer on 2016-03-09
This bug affects 1 person
Affects: MAAS
Importance: Critical
Assigned to: Blake Rouse

Bug Description

I upgraded (via dist-upgrade) to 2.0.0~alpha1+bzr4736-0ubuntu1 from 1.10. MAAS was up and appeared normal. I tried to deploy a node and got "The deploy action for 1 node failed with error: No rack controllers can access the BMC of node: roselia".

I also don't see any way of inspecting that configuration in the UI.


Christian Reis (kiko) on 2016-03-09
Changed in maas:
importance: Undecided → Critical
milestone: none → 2.0.0
Andres Rodriguez (andreserl) wrote :

Hi Dean,

Can you please provide:

1. Logs (/var/log/maas/*.log)
2. Is your BMC directly connected to a rack or is it routed?

Changed in maas:
status: New → Incomplete
Dean Henrichsmeyer (dean) wrote :
summary: - Can't deploy a node after upgrade to 2.0.0~alpha1+bzr4736-0ubuntu1 from
- 1.10
+ [2.0a1] Can't deploy a node after upgrade to
+ 2.0.0~alpha1+bzr4736-0ubuntu1 from 1.10

> Is your BMC directly connected to a rack or is it routed?

Dean Henrichsmeyer (dean) wrote :

My rack controller doesn't appear to have any interfaces.

Dean Henrichsmeyer (dean) wrote :

Oh, and it's directly connected.

Changed in maas:
status: Incomplete → Confirmed
Blake Rouse (blake-rouse) wrote :

If your rack controller has no interfaces, that is why MAAS cannot determine which rack controller can access this BMC. Resolving why your rack controller has no interfaces will fix this bug. If your BMC were not directly connected, this bug would still exist even if your rack controller had interfaces, but that is a different bug.
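One way to read Blake's explanation: MAAS decides which rack controllers can reach a BMC by checking whether the BMC's address falls inside a subnet attached to one of the rack's interfaces. The sketch below illustrates that idea under that assumption only. The function name, the dict layout, and the BMC address 10.7.1.20 are hypothetical and are not MAAS's API; routed BMCs are deliberately ignored, matching the limitation Blake mentions.

```python
import ipaddress
from typing import Dict, List

# Hypothetical data model: rack controller name -> CIDRs configured on
# its interfaces (for example "10.7.0.3/16" on eth0).
RackInterfaces = Dict[str, List[str]]


def racks_that_can_reach_bmc(racks: RackInterfaces, bmc_ip: str) -> List[str]:
    """Return rack controllers with an interface on the BMC's subnet.

    Only directly connected subnets are considered (an assumption of
    this sketch); a routed BMC never matches.
    """
    address = ipaddress.ip_address(bmc_ip)
    reachable = []
    for rack, cidrs in racks.items():
        for cidr in cidrs:
            # ip_interface accepts a host address with a prefix length,
            # e.g. "10.7.0.3/16", and .network gives 10.7.0.0/16.
            network = ipaddress.ip_interface(cidr).network
            if address in network:
                reachable.append(rack)
                break  # one matching interface is enough for this rack
    return reachable


# A rack with eth0 on 10.7.0.3/16 can reach a BMC on the same /16 ...
print(racks_that_can_reach_bmc({"maasx": ["10.7.0.3/16"]}, "10.7.1.20"))
# ... but a rack with no interfaces matches nothing.
print(racks_that_can_reach_bmc({"maasx": []}, "10.7.1.20"))
```

With an empty interface list the result is an empty list, which corresponds to the "No rack controllers can access the BMC" error in the bug description.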

Blake Rouse (blake-rouse) wrote :

Dean,

Please run the following on your MAAS machine and attach the output to this bug.

cat /proc/1/cgroup

find -H /sys/class/net/* | grep bond

python3 -c "from provisioningserver.networks import get_interfaces_definition; from pprint import pprint; pprint(get_interfaces_definition()[0])"

Thanks,
Blake

Changed in maas:
status: Confirmed → Incomplete
summary: - [2.0a1] Can't deploy a node after upgrade to
- 2.0.0~alpha1+bzr4736-0ubuntu1 from 1.10
+ [2.0a1] Can't deploy a node (no interfaces on rack controller)
tags: added: networking
Dean Henrichsmeyer (dean) wrote :

root@maasx:~# ip addr sh
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:a7:a0:4c brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.7.0.3/16 brd 10.7.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fea7:a04c/64 scope link
       valid_lft forever preferred_lft forever

Dean Henrichsmeyer (dean) wrote :

root@maasx:~# cat /proc/1/cgroup
11:blkio:/init.scope
10:memory:/init.scope
9:devices:/init.scope
8:cpuset:/
7:pids:/init.scope
6:hugetlb:/
5:freezer:/
4:cpu,cpuacct:/init.scope
3:perf_event:/
2:net_cls,net_prio:/
1:name=systemd:/init.scope

root@maasx:~# find -H /sys/class/net/* | grep bond
root@maasx:~#
root@maasx:~# python3 -c "from provisioningserver.networks import get_interfaces_definition; from pprint import pprint; pprint(get_interfaces_definition()[0])"
{}
root@maasx:~#
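For illustration only, here is a sketch of a cgroup-based container check of the kind implied by reading /proc/1/cgroup. Each line of that file has the form `hierarchy-ID:controller-list:cgroup-path`. The marker strings and function names are hypothetical and are not MAAS's detection code.

```python
from typing import Iterable, List

# Hypothetical markers a naive heuristic might look for in cgroup paths.
CONTAINER_MARKERS = ("lxc", "docker")


def cgroup_suggests_container(lines: Iterable[str]) -> bool:
    """Guess whether PID 1 runs in a container from its cgroup paths."""
    for line in lines:
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _hierarchy, _controllers, path = parts
        if any(marker in path for marker in CONTAINER_MARKERS):
            return True
    return False


def read_pid1_cgroup(path: str = "/proc/1/cgroup") -> List[str]:
    """Read the cgroup membership lines of PID 1."""
    with open(path) as f:
        return f.read().splitlines()
```

Fed the output above, where every path is `/` or `/init.scope`, a heuristic like this reports no container, even though Dean later confirms MAAS is running in an LXD container.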

Blake Rouse (blake-rouse) wrote :

Some more data, please:

python3 -c "from provisioningserver.utils.ipaddr import get_ip_addr; from pprint import pprint; pprint(get_ip_addr())"

Thanks,
Blake

Dean Henrichsmeyer (dean) wrote :

root@maasx:~# python3 -c "from provisioningserver.utils.ipaddr import get_ip_addr; from pprint import pprint; pprint(get_ip_addr())"
{'eth0': {'flags': ['BROADCAST', 'MULTICAST', 'UP', 'LOWER_UP'],
          'index': 7,
          'inet': ['10.7.0.3/16'],
          'mac': '00:16:3e:a7:a0:4c',
          'name': 'eth0',
          'parent': 'if8',
          'settings': {'group': 'default',
                       'mtu': '1500',
                       'qdisc': 'noqueue',
                       'qlen': '1000',
                       'state': 'UP'},
          'type': 'ethernet'},
 'lo': {'flags': ['LOOPBACK', 'UP', 'LOWER_UP'],
        'index': 1,
        'inet': ['127.0.0.1/8'],
        'inet6': ['::1/128'],
        'name': 'lo',
        'settings': {'group': 'default',
                     'mtu': '65536',
                     'qdisc': 'noqueue',
                     'qlen': '1',
                     'state': 'UNKNOWN'},
        'type': 'loopback'}}

Changed in maas:
status: Incomplete → Confirmed
Blake Rouse (blake-rouse) wrote :

Ah, okay. The issue here is that "eth0" is reported as "ethernet" only. For MAAS to pick up an interface on a physical machine (not in a container; judging by your cgroup output, you're not in a container), we look for "ethernet.physical". For some reason, "eth0" on your machine is not being reported by MAAS as a physical Ethernet device.

We filter the interfaces so that we don't pick up interfaces created just for a container (e.g. vnet devices). We need to determine why that interface is not "ethernet.physical" on your machine.

Could you provide the output of the following commands:

tree /sys/class/net/eth0/*
tree /sys/class/net/if8/*
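A minimal sketch of the filtering described in this comment, assuming the physical check looks for a `device` link under /sys/class/net/<name>. On Linux, physical NICs have such a link pointing at their bus device, while veth pairs and other virtual interfaces do not. That this is how MAAS classifies interfaces is an assumption, and the function names are hypothetical.

```python
import os
from typing import Dict


def classify_ethernet(name: str, sysfs_net: str = "/sys/class/net") -> str:
    """Return "ethernet.physical" if the interface has a backing device.

    Assumption of this sketch: a /sys/class/net/<name>/device entry
    marks a physical NIC; virtual interfaces lack it.
    """
    if os.path.exists(os.path.join(sysfs_net, name, "device")):
        return "ethernet.physical"
    return "ethernet"


def physical_interfaces(ip_addr: Dict[str, dict],
                        sysfs_net: str = "/sys/class/net") -> Dict[str, dict]:
    """Keep only ethernet interfaces classified as physical."""
    return {
        name: info
        for name, info in ip_addr.items()
        if info.get("type") == "ethernet"
        and classify_ethernet(name, sysfs_net) == "ethernet.physical"
    }
```

Applied to the get_ip_addr() output above, eth0 has no `device` entry inside a container, so the result is empty, which is consistent with get_interfaces_definition() returning `{}`.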

Blake Rouse (blake-rouse) wrote :

And

tree /sys/class/net/eth0@if8/*

Changed in maas:
status: Confirmed → Incomplete
Dean Henrichsmeyer (dean) wrote :

It's a container, so eth0 really isn't physical. That's why it's not tagged .physical.

I think we should default to assuming physical if we can't tell.

Dean Henrichsmeyer (dean) wrote :

dean@courage:~$ lxc profile show maas
name: maas
config:
  raw.lxc: |
    lxc.aa_profile=unconfined
    lxc.cgroup.devices.allow = b 7:* rwm
    lxc.cgroup.devices.allow = c 10:237 rwm
  security.privileged: "true"
description: ""
devices:
  eth0:
    nictype: bridged
    parent: br0
    type: nic

Changed in maas:
status: Incomplete → Triaged
Changed in maas:
status: Triaged → In Progress
assignee: nobody → Blake Rouse (blake-rouse)
Changed in maas:
status: In Progress → Fix Committed
Blake Rouse (blake-rouse) wrote :

The actual problem here is that we should have been using "systemd-detect-virt" instead of MAAS doing the detection manually.
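A sketch of the approach described here, delegating container detection to `systemd-detect-virt --container`. That command prints the detected container technology (for example `lxc`) and exits 0, or prints `none` and exits non-zero. The helper names, the fallback when the tool is missing, and the rule for which interface types to accept are assumptions for illustration, not the committed fix.

```python
import subprocess
from typing import Callable, Dict, Set


def detect_container(run: Callable = subprocess.run) -> str:
    """Return the container type reported by systemd-detect-virt, or "none"."""
    try:
        result = run(["systemd-detect-virt", "--container"],
                     stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
                     universal_newlines=True)
    except FileNotFoundError:
        # Assumption of this sketch: a missing tool is treated as "no container".
        return "none"
    output = (result.stdout or "").strip()
    return output if result.returncode == 0 and output else "none"


def usable_interfaces(ip_addr: Dict[str, dict], physical: Set[str],
                      container: str) -> Dict[str, dict]:
    """Inside a container accept any ethernet; otherwise require physical NICs."""
    if container != "none":
        return {n: i for n, i in ip_addr.items() if i.get("type") == "ethernet"}
    return {n: i for n, i in ip_addr.items() if n in physical}
```

The `run` parameter exists only so the command can be replaced with a fake in tests. In an LXD container like Dean's, `detect_container()` would return a non-`none` value and eth0 would be kept even though it is not physical.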

Changed in maas:
status: Fix Committed → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released