All controllers in HA deploy fail to run refresh scripts

Bug #1998480 reported by Marcelo Subtil Marcal
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---------|--------|------------|-------------|-----------|
| MAAS | Incomplete | Undecided | Adam Collard | |

Bug Description

I'm using the MAAS snap, version/build 3.2.6-12016-g.19812b4da.

When deploying MAAS in HA, it is impossible to create the OAM subnet.

Under "Controllers" > "[Controller name]" > "Network" in the MAAS web UI, the following error is shown:

```
Error: Node must be connected to a network.
```

Each infra node has a single bond (bond0) and two bridges: broam (on bond0.23) and brint (on bond0.3040). There is also a bond0.9 interface for BMC access.
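
For reference, the subnets implied by this layout can be derived from the static addresses that appear in the regiond.log excerpt in this report. This is a minimal sketch using Python's standard `ipaddress` module; the `static_links` dictionary is copied by hand from that log and its name is made up for illustration.

```python
import ipaddress

# Static addresses copied by hand from the regiond.log excerpt in this report.
# The dictionary name and layout are illustrative, not a MAAS data structure.
static_links = {
    "bond0.9": "139.128.31.22/23",   # BMC access VLAN
    "brint": "100.64.11.2/24",       # bridge on bond0.3040
    "broam": "10.129.213.22/24",     # OAM bridge on bond0.23
}

# ip_interface() keeps the host address; .network gives the enclosing subnet.
subnets = {
    name: ipaddress.ip_interface(address).network
    for name, address in static_links.items()
}

for name, network in subnets.items():
    print(f"{name}: {network}")
```

By this calculation the OAM subnet that could not be created would be 10.129.213.0/24, the network behind broam.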

The infra2 regiond.log shows a "Failed to update and/or record network interface configuration" error:

```
2022-12-01 11:15:38 provisioningserver.utils.services: [critical] Failed to update and/or record network interface configuration: Expecting value: line 1 column 1 (char 0); interfaces: {'bond0': {'type': 'bond', 'mac_address': '02:b4:83:84:62:73', 'links': [], 'enabled': True, 'parents': ['ens1f0', 'enp3s0f0'], 'source': 'machine-resources', 'monitored': True}, 'bond0.23': {'type': 'vlan', 'mac_address': '02:b4:83:84:62:73', 'links': [], 'enabled': True, 'parents': ['bond0'], 'source': 'machine-resources', 'vid': 23, 'monitored': False}, 'bond0.3040': {'type': 'vlan', 'mac_address': '02:b4:83:84:62:73', 'links': [], 'enabled': True, 'parents': ['bond0'], 'source': 'machine-resources', 'vid': 3040, 'monitored': False}, 'bond0.9': {'type': 'vlan', 'mac_address': '02:b4:83:84:62:73', 'links': [{'mode': 'static', 'address': '139.128.31.22/23'}], 'enabled': True, 'parents': ['bond0'], 'source': 'machine-resources', 'vid': 9, 'monitored': False}, 'brint': {'type': 'bridge', 'mac_address': '02:b4:83:84:62:73', 'links': [{'mode': 'static', 'address': '100.64.11.2/24'}], 'enabled': True, 'parents': ['bond0.3040'], 'source': 'machine-resources', 'monitored': False}, 'broam': {'type': 'bridge', 'mac_address': '02:b4:83:84:62:73', 'links': [{'mode': 'static', 'address': '10.129.213.22/24', 'gateway': '10.129.213.1'}], 'enabled': True, 'parents': ['bond0.23'], 'source': 'machine-resources', 'monitored': False}, 'enp3s0f0': {'type': 'physical', 'mac_address': 'b4:99:ba:b9:ec:14', 'links': [], 'enabled': True, 'parents': [], 'source': 'machine-resources', 'monitored': False}, 'enp3s0f1': {'type': 'physical', 'mac_address': 'b4:99:ba:b9:ec:16', 'links': [], 'enabled': True, 'parents': [], 'source': 'machine-resources', 'monitored': True}, 'enp4s0f0': {'type': 'physical', 'mac_address': 'b4:99:ba:b9:ec:18', 'links': [], 'enabled': True, 'parents': [], 'source': 'machine-resources', 'monitored': True}, 'enp4s0f1': {'type': 'physical', 'mac_address': 'b4:99:ba:b9:ec:1a', 'links': [], 'enabled': True, 'parents': [], 'source': 'machine-resources', 'monitored': True}, 'ens1f0': {'type': 'physical', 'mac_address': '98:4b:e1:5f:40:e4', 'links': [], 'enabled': True, 'parents': [], 'source': 'machine-resources', 'monitored': False}, 'ens1f1': {'type': 'physical', 'mac_address': '98:4b:e1:5f:40:e6', 'links': [], 'enabled': True, 'parents': [], 'source': 'machine-resources', 'monitored': True}}
        Traceback (most recent call last):
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1475, in gotResult
            _inlineCallbacks(r, g, status)
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
            result = result.throwExceptionIntoGenerator(g)
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
            return g.throw(self.type, self.value, self.tb)
        --- <exception caught here> ---
          File "/snap/maas/23947/lib/python3.8/site-packages/provisioningserver/utils/services.py", line 1091, in do_action
            yield self._updateInterfaces(interfaces)
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
            result = result.throwExceptionIntoGenerator(g)
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
            return g.throw(self.type, self.value, self.tb)
          File "/snap/maas/23947/lib/python3.8/site-packages/provisioningserver/utils/services.py", line 1174, in _updateInterfaces
            yield self._run_refresh(
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
            result = result.throwExceptionIntoGenerator(g)
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
            return g.throw(self.type, self.value, self.tb)
          File "/snap/maas/23947/lib/python3.8/site-packages/provisioningserver/utils/services.py", line 1201, in _run_refresh
            yield deferToThread(
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 250, in inContext
            result = inContext.theWork()
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
            inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/context.py", line 122, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/snap/maas/23947/usr/lib/python3/dist-packages/twisted/python/context.py", line 85, in callWithContext
            return func(*args,**kw)
          File "/snap/maas/23947/lib/python3.8/site-packages/provisioningserver/utils/twisted.py", line 856, in callInContext
            return func(*args, **kwargs)
          File "/snap/maas/23947/lib/python3.8/site-packages/provisioningserver/utils/twisted.py", line 202, in wrapper
            result = func(*args, **kwargs)
          File "/snap/maas/23947/lib/python3.8/site-packages/provisioningserver/refresh/__init__.py", line 63, in refresh
            failed_scripts = runscripts(
          File "/snap/maas/23947/lib/python3.8/site-packages/provisioningserver/refresh/__init__.py", line 172, in runscripts
            post_process_hook(
          File "/snap/maas/23947/lib/python3.8/site-packages/provisioningserver/utils/services.py", line 1226, in _annotate_commissioning
            lxd_data = json.load(fp)
          File "/usr/lib/python3.8/json/__init__.py", line 293, in load
            return loads(fp.read(),
          File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
            return _default_decoder.decode(s)
          File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
            obj, end = self.raw_decode(s, idx=_w(s, 0).end())
          File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
            raise JSONDecodeError("Expecting value", s, err.value) from None
        json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
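
The final frames of the traceback show `json.load(fp)` failing with "Expecting value: line 1 column 1 (char 0)". That is the message Python's `json` module raises when the input is empty or does not begin with a JSON value, so the file read in `_annotate_commissioning` was presumably empty or not JSON. The sketch below reproduces the message with an empty input and shows one defensive way to read such a file. `load_json_or_none` is a hypothetical helper written for illustration; it is not MAAS code and not a proposed fix.

```python
import io
import json


def load_json_or_none(fp):
    """Return parsed JSON from fp, or None if the content is empty or invalid.

    Hypothetical helper for illustration only; not part of MAAS.
    """
    text = fp.read()
    if not text.strip():
        # Empty content: json.loads("") would raise JSONDecodeError.
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Reproduce the error from the traceback with an empty file-like object.
try:
    json.load(io.StringIO(""))
except json.JSONDecodeError as exc:
    error_message = str(exc)

print(error_message)
print(load_json_or_none(io.StringIO("")))
print(load_json_or_none(io.StringIO('{"networks": {}}')))
```

Returning `None` only hides the symptom; the open question in this report is why the refresh script output was empty in the first place.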

Marcelo Subtil Marcal (msmarcal) wrote:

subscribed ~field-critical

Adam Collard (adam-collard) wrote:

Please attach an sos report from the failed node and confirm what firewalling and proxy configuration (if any) are in place.

The telltale error from the logs is:

request to http://127.0.0.1:5240/MAAS/metadata/2012-03-01/ failed. sleeping 1.: HTTP Error 502: cannotconnect

Can you confirm if that URL is accessible from the region controller in question?
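
One way to run that check is to fetch the URL twice, once ignoring any proxy settings and once honouring them, and compare the results. The sketch below uses only Python's standard `urllib`. `probe` is a hypothetical helper named for illustration, and the idea that a proxy sits between the refresh script and the region is an assumption prompted by the 502 response, not something the logs confirm.

```python
import urllib.error
import urllib.request


def probe(url, use_env_proxy=True, timeout=5.0):
    """Fetch url and return (status_code_or_None, detail).

    Hypothetical helper for illustration only; not part of MAAS.
    """
    if use_env_proxy:
        # No argument: proxies are read from the environment (http_proxy etc.).
        handler = urllib.request.ProxyHandler()
    else:
        # An empty mapping disables proxy use for this opener.
        handler = urllib.request.ProxyHandler({})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=timeout) as response:
            body = response.read().decode(errors="replace").strip()
            return response.status, body
    except urllib.error.HTTPError as exc:
        # The server (or a proxy) answered with an error status such as 502.
        return exc.code, str(exc.reason)
    except (urllib.error.URLError, OSError) as exc:
        # No HTTP answer at all: refused connection, timeout, DNS failure.
        return None, str(exc)
```

Calling `probe("http://127.0.0.1:5240/MAAS/metadata/2012-03-01/", use_env_proxy=False)` and then the same URL with `use_env_proxy=True` on the region controller would show whether the two paths behave differently.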

Changed in maas:
status: New → Incomplete
Changed in maas:
assignee: nobody → Adam Collard (adam-collard)
Marcelo Subtil Marcal (msmarcal) wrote:

Hi Adam,

I can access that URL from the region controller:

```
$ curl http://127.0.0.1:5240/MAAS/metadata/2012-03-01/
meta-data
```

There are no firewall rules:

```
$ sudo iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
```

I'm attaching the sosreport and the proxy configuration.

Jerzy Husakowski (jhusakowski) wrote:

Did anything happen to the network around 06:57:48-50 on Dec 1? After that, everything seems to lose connectivity: HAProxy and postgres go down at that time, the HTTP access.log starts showing error 500 responses, the rack loses its connection to the region, and pacemaker loses quorum.

Marcelo Subtil Marcal (msmarcal) wrote:

Hi Jerzy,

The switches were rebooted at that time, which is why connectivity was lost.

summary: - All three controllers show the error: "Error: Node must be connected to
- a network."
+ All controllers in HA deploy fail to run refresh scripts