Error while calling ScanNetworks: Unable to get RPC connection for rack controller

Bug #1953049 reported by Marco Marino
This bug affects 12 people
Affects  Status         Importance  Assigned to          Milestone
MAAS     Fix Committed  High        Alexsander de Souza
3.4      Triaged        High        Alexsander de Souza
3.5      Triaged        High        Alexsander de Souza

Bug Description

Hi Team,
I have an issue with MaaS 2.7.3:

MaaS is not aware of all used IP addresses in a managed subnet.
    {
        "name": "internal",
        "vlan": {
            "vid": 1763,
            "mtu": 9000,
            "dhcp_on": false,
            "external_dhcp": null,
            "relay_vlan": null,
            "name": "1763",
            "fabric": "default",
            "space": "internal-space",
            "id": 5004,
            "secondary_rack": null,
            "fabric_id": 1,
            "primary_rack": null,
            "resource_uri": "/MAAS/api/2.0/vlans/5004/"
        },
        "cidr": "10.38.3.0/24",
        "rdns_mode": 2,
        "gateway_ip": null,
        "dns_servers": [],
        "allow_dns": true,
        "allow_proxy": true,
        "active_discovery": false,
        "managed": true,
        "space": "internal-space",
        "id": 3,
        "resource_uri": "/MAAS/api/2.0/subnets/3/"
    }

I checked the list of IP addresses with:

maas admin subnet ip-addresses 10.38.3.0/24 > ip_addresses.txt # (please check the attached file)
The IP 10.38.3.78 is missing from the list, even though it is in use by an LXD container deployed with juju (see the juju_status.txt file).
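
A minimal sketch of how the saved output could be checked for a specific address. It assumes that ip_addresses.txt holds the JSON list printed by the command above and that each entry carries an `ip` field; both are assumptions about the output format, and the function name `missing_ips` is hypothetical.

```python
import json


def missing_ips(ip_addresses_json, expected):
    """Return the expected IPs that are absent from the MAAS listing.

    ip_addresses_json: text of the JSON list saved from the CLI
    (assumed format: a list of objects, each with an "ip" key).
    expected: IPs known to be in use on the subnet.
    """
    records = json.loads(ip_addresses_json)
    known = {record["ip"] for record in records}
    return [ip for ip in expected if ip not in known]


if __name__ == "__main__":
    # Illustrative data only, not taken from the attached file.
    sample = '[{"ip": "10.38.3.1"}, {"ip": "10.38.3.2"}]'
    print(missing_ips(sample, ["10.38.3.78", "10.38.3.1"]))
```

Against the real file, the text would be read with `open("ip_addresses.txt").read()` and passed in the same way.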

In regiond.log I see errors like these every 5 minutes:
2021-11-23 15:40:59 maasserver: [error] Error while calling ScanNetworks: Unable to get RPC connection for rack controller 'infra-1' (4ybaww).
2021-11-23 15:40:59 maasserver: [error] Error while calling ScanNetworks: Unable to get RPC connection for rack controller 'infra-2' (sgx8m4).
2021-11-23 15:40:59 maasserver: [error] Error while calling ScanNetworks: Unable to get RPC connection for rack controller 'infra-3' (gcw8qw).
2021-11-23 15:40:59 maasserver.regiondservices.active_discovery: [info] Active network discovery: Unable to initiate network scanning on any rack controller. Verify that the rack controllers are started and have connected to the region.
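
To summarise which rack controllers are affected, the error lines can be tallied with a short script. This is a sketch that assumes the exact message wording shown above; the names `SCAN_ERROR` and `failing_racks` are hypothetical.

```python
import re
from collections import Counter

# Matches the ScanNetworks error lines exactly as they appear in regiond.log above.
SCAN_ERROR = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) maasserver: \[error\] "
    r"Error while calling ScanNetworks: Unable to get RPC connection "
    r"for rack controller '(?P<name>[^']+)' \((?P<system_id>\w+)\)\."
)


def failing_racks(lines):
    """Count ScanNetworks RPC errors per (rack name, system id)."""
    counts = Counter()
    for line in lines:
        match = SCAN_ERROR.match(line)
        if match:
            counts[(match["name"], match["system_id"])] += 1
    return counts
```

For a whole log, `failing_racks(open("regiond.log"))` would return one counter entry per rack controller named in the errors.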

Please let me know if there is a way to fix this and update the list of used IP addresses in MAAS.
I have attached 3 sosreports taken from the MAAS nodes.

Thank you.
Regards,
Marco

description: updated
Revision history for this message
Bill Wear (billwear) wrote :

The MAAS core team cannot access the support portal to view your sosreports. Please attach them directly to this bug, without the link.

Changed in maas:
status: New → Incomplete
Revision history for this message
Marco Marino (marino-mrc) wrote :

Hi Bill,
unfortunately, I cannot upload it. I think the issue is the size of the file (around 400MB): Launchpad drops the connection some time after I start the upload.
Please let me know if there is another way.

Thank you.
Regards,
Marco

Revision history for this message
Alberto Donato (ack) wrote :

Could you please attach just regiond.log and rackd.log?

We can start from there and ask for more logs if needed.

Changed in maas:
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Alberto Donato (ack) wrote :

Also, you mentioned the 10.38.3.78 IP from juju status, but the IP in the output is actually 10.38.0.70, so you'll have to check the IPs for the 10.38.0.0/24 subnet.

Changed in maas:
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Marco Marino (marino-mrc) wrote :

Hi Alberto,
the output of "juju status" only shows a single IP, but the container has multiple IPs.

Here is the output of "ip addr"

27: eth0 inet 10.38.2.74/24 brd 10.38.2.255 scope global eth0\ valid_lft forever preferred_lft forever
27: eth0 inet6 fe80::216:3eff:fe84:2037/64 scope link \ valid_lft forever preferred_lft forever
29: eth1 inet 10.38.3.78/24 brd 10.38.3.255 scope global eth1\ valid_lft forever preferred_lft forever
29: eth1 inet6 fe80::216:3eff:fef8:b030/64 scope link \ valid_lft forever preferred_lft forever
31: eth2 inet 10.38.0.57/24 brd 10.38.0.255 scope global eth2\ valid_lft forever preferred_lft forever
31: eth2 inet6 fe80::216:3eff:fec9:306b/64 scope link \ valid_lft forever preferred_lft forever
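
To make the mapping explicit, here is a small sketch using Python's standard `ipaddress` module to pick out which of the container's addresses fall inside a given subnet. The function name `addresses_in_subnet` is hypothetical.

```python
import ipaddress


def addresses_in_subnet(addresses, cidr):
    """Return the interface addresses (e.g. "10.38.3.78/24") that lie in cidr."""
    network = ipaddress.ip_network(cidr)
    return [
        address
        for address in addresses
        if ipaddress.ip_interface(address).ip in network
    ]


if __name__ == "__main__":
    # IPv4 addresses copied from the "ip addr" output above.
    container = ["10.38.2.74/24", "10.38.3.78/24", "10.38.0.57/24"]
    print(addresses_in_subnet(container, "10.38.3.0/24"))
```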

Regards,
Marco

Revision history for this message
Alberto Donato (ack) wrote :

Hi Marco, would it be possible to upgrade MAAS to a more recent version, and check if the error still occurs?

Changed in maas:
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Marco Marino (marino-mrc) wrote :

Hi Alberto,
probably yes. I'll update the bug as soon as possible.

Thank you.
Regards,
Marco

Alberto Donato (ack)
Changed in maas:
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired
Revision history for this message
Alvin Cura (alvinc) wrote :

I am seeing this behaviour as well.

However, I only have one network interface, only one ip address, and only one region+rack controller; so it has no other controllers to talk to.

Revision history for this message
Alvin Cura (alvinc) wrote :

Server information attached.

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s31f6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:92:04:e7:c5:e9 brd ff:ff:ff:ff:ff:ff
    inet 10.100.127.223/21 brd 10.100.127.255 scope global enp0s31f6
       valid_lft forever preferred_lft forever
    inet6 fe80::a92:4ff:fee7:c5e9/64 scope link
       valid_lft forever preferred_lft forever
it@maas-temp:/var/snap/maas/current$ snap list maas
Name Version Rev Tracking Publisher Notes
maas 3.2.6-12016-g.19812b4da 23947 3.2/stable canonical✓ -

Changed in maas:
status: Expired → Confirmed
Revision history for this message
Björn Tillenius (bjornt) wrote :

Still need the logs to be able to start debugging this issue.

Changed in maas:
status: Confirmed → Incomplete
Revision history for this message
mitzone (mitzone) wrote :

I too am seeing errors about ScanNetworks not being able to start:

2023-03-11 01:36:40 maasserver: [error] Error while calling ScanNetworks: Unable to get RPC connection for rack controller 'maas-test-rack1' (6h7sx7).
2023-03-11 01:36:40 maasserver: [error] Error while calling ScanNetworks: Unable to get RPC connection for rack controller 'maas-test-rack2' (4sq738).
2023-03-11 01:36:40 maasserver.regiondservices.active_discovery: [info] Active network discovery: Unable to initiate network scanning on any rack controller. Verify that the rack controllers are started and have connected to the region.
2023-03-11 01:37:49 regiond: [info] 127.0.0.1 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
2023-03-11 01:39:19 regiond: [info] 127.0.0.1 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
2023-03-11 01:41:40 maasserver: [error] Error while calling ScanNetworks: Unable to get RPC connection for rack controller 'maas-test-rack1' (6h7sx7).
2023-03-11 01:41:40 maasserver: [error] Error while calling ScanNetworks: Unable to get RPC connection for rack controller 'maas-test-rack2' (4sq738).
2023-03-11 01:41:40 maasserver.regiondservices.active_discovery: [info] Active network discovery: Unable to initiate network scanning on any rack controller. Verify that the rack controllers are started and have connected to the region.
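
The spacing between these error bursts can be measured from the timestamps. The sketch below assumes every log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, as in the excerpt above; the function name `burst_intervals` is hypothetical.

```python
from datetime import datetime


def burst_intervals(lines, marker="Error while calling ScanNetworks"):
    """Return the seconds between consecutive distinct timestamps of marker lines."""
    stamps = sorted(
        {
            datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
            for line in lines
            if marker in line
        }
    )
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]
```

Lines that share a timestamp, such as the errors for both rack controllers, are treated as a single burst.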

running snap 3.2.7 on Ubuntu 20
Setup is 2 x region controllers and 2 x rack controllers.

I attached the regiond and rackd logs for all.
Thanks.

Revision history for this message
Anton Troyanov (troyanov) wrote :

Hi mitzone,

Can you please check that your database is healthy and all tables are in place?

1. Is it a fresh MAAS install?
2. When you did `maas init` were there any errors during DB migration?

2023-03-10 22:40:37 maasserver.start_up: [error] Database error during start-up
Traceback (most recent call last):
  File "/snap/maas/26274/usr/lib/python3/dist-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UndefinedTable: relation "maasserver_config" does not exist
LINE 1: ..._config"."name", "maasserver_config"."value" FROM "maasserve...
                                                             ^

Changed in maas:
status: Incomplete → New
status: New → Incomplete
Revision history for this message
mitzone (mitzone) wrote :

Hello,
I have this behaviour on a fresh install of 3.3.x latest and 3.2.7 for both DEB and SNAP installs.
I'm not running any HA setup anymore.
Region and rack on same box. ScanNetworks not working.
Thank you.

Revision history for this message
Anton Troyanov (troyanov) wrote :

1. When you did `maas init region+rack` were there any errors during DB migration step?
2. Are you using standalone PostgreSQL?
3. Were there any errors in PostgreSQL logs during the initial MAAS setup?

I just did a fresh installation of MAAS and `maas init` completed without any errors.
Could it be something with your database?

Changed in maas:
status: Incomplete → Opinion
status: Opinion → New
status: New → Incomplete
Revision history for this message
mitzone (mitzone) wrote :

After a fresh install of MAAS 3.2.7 from the snap (no DB migration, nothing else; the DEB install shows the same issue), everything seems to work correctly: I can enroll, commission and deploy. But I see this error in my logs every 5 minutes or so:

2023-03-11 01:36:40 maasserver: [error] Error while calling ScanNetworks: Unable to get RPC connection for rack controller 'maas-test-rack1' (6h7sx7).

2023-03-11 01:36:40 maasserver.regiondservices.active_discovery: [info] Active network discovery: Unable to initiate network scanning on any rack controller.

I thought this might be only cosmetic, so I booted up 2 machines not managed by MAAS in the same subnet. Their IPs were not recorded in MAAS as being in use, so Active network discovery is definitely not working.

Thanks.

Changed in maas:
status: Incomplete → New
Revision history for this message
Anton Troyanov (troyanov) wrote :

I just did a clean install and I don't see any of the errors that appear in your log. It might be something with your installation.

What OS and version are you using? Are there any CIS hardened configs?

The error I pointed to in https://bugs.launchpad.net/maas/+bug/1953049/comments/22 says that `maas init` (installation) did not complete successfully and that you are missing some tables.

There are also several FileNotFoundError entries in your rack logs.
E.g.
builtins.FileNotFoundError: [Errno 2] No such file or directory: '/var/snap/maas/26274/proxy/.maas-proxy.conf.l7aj2oa9.tmp'

That file is normally created as well.

So far I am not able to reproduce it with 3.2.7 snap and focal

Changed in maas:
status: New → Incomplete
Revision history for this message
mitzone (mitzone) wrote :

Ok, I'll reinstall.
The logs I posted here are from a DEB deployment with HA (2 region controllers and 2 rack controllers). In order to bring up the second region controller, I think it's normal to have those errors until you point it to the correct DB server and so on.

Will do a fresh 3.2.7 SNAP install again and will come back.

Changed in maas:
status: Incomplete → New
status: New → Incomplete
Revision history for this message
mitzone (mitzone) wrote :

Same issue

Test box:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"

kernel 5.4.0-88-generic

Installed MAAS from snap like this :

1. snap install --channel=3.2 maas
2. snap install maas-test-db
3. maas init region+rack --database-uri maas-test-db:///
4. maas createadmin

Then, from the GUI, I went to Subnets, selected my subnet, clicked Edit and checked the Active discovery box (the default is off).

I waited some time, and the same error entries started to appear in regiond every 5 minutes.
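
Since the errors depend on that setting, it can help to confirm which subnets actually have it enabled. This is a sketch, assuming the input is a JSON list of subnet objects with the `cidr` and `active_discovery` fields shown in the bug description; how that list is obtained (for example from the CLI or the API) is left open, and the function name `active_discovery_subnets` is hypothetical.

```python
import json


def active_discovery_subnets(subnets_json):
    """Return the CIDRs of subnets whose active_discovery flag is true."""
    subnets = json.loads(subnets_json)
    return [subnet["cidr"] for subnet in subnets if subnet.get("active_discovery")]


if __name__ == "__main__":
    # Illustrative data only; the first entry mirrors the subnet in the description.
    sample = json.dumps(
        [
            {"cidr": "10.38.3.0/24", "active_discovery": False},
            {"cidr": "10.38.0.0/24", "active_discovery": True},
        ]
    )
    print(active_discovery_subnets(sample))
```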
Attached logs.
Thank you.

Revision history for this message
Anton Troyanov (troyanov) wrote :

mitzone, sorry I somehow missed that `Active discovery` on a Subnet should be enabled.

I am able to reproduce this as well.

Changed in maas:
status: Incomplete → Triaged
importance: Undecided → High
Changed in maas:
milestone: none → 3.4.0
Revision history for this message
Igor Brovtsin (igor-brovtsin) wrote :

Moved `bug-council` tag from #1769471

tags: added: bug-council
tags: removed: bug-council
Changed in maas:
assignee: nobody → Alexsander de Souza (alexsander-souza)
status: Triaged → In Progress
Changed in maas:
milestone: 3.4.0 → 3.5.0
Revision history for this message
maasuser1 (maasuser1) wrote :

Same problem: `2024-01-25 16:55:16 maasserver.regiondservices.active_discovery: [info] Active network discovery: Unable to initiate network scanning on any rack controller. Verify that the rack controllers are started and have connected to the region.`

Both main and secondary controllers running MAAS 3.4.0-14321-g.1027c7664 installed by Snap.

Changed in maas:
milestone: 3.4.x → 3.6.x
Changed in maas:
status: Triaged → Fix Committed