Failed to start MAAS hardware Sync Service

Bug #2052942 reported by maasuser1
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
Status tracked in 3.6
3.4
Won't Fix
Low
Unassigned
3.5
Won't Fix
Low
Unassigned
3.6
Triaged
Low
Unassigned

Bug Description

MAAS: 3.4/Stable (3.4.0-14321-g.1027c7664)
Machine: standard deployment of Ubuntu server 20.04, ticked "Register as MAAS KVM host" and "Periodically sync hardware".

The below error messages showed after the OS installation is complete.

syslog
```
Feb 12 13:05:05 perlman systemd[1]: snap.lxd.lxd-cd9536b2-9244-4abb-bb63-11e266c05a04.scope: Succeeded.
Feb 12 13:05:05 perlman systemd[1]: Started snap.lxd.lxc-ba8b0408-4801-4a50-a412-8a5506b91cd9.scope.
Feb 12 13:05:06 perlman systemd[1]: snap.lxd.lxc-ba8b0408-4801-4a50-a412-8a5506b91cd9.scope: Succeeded.
Feb 12 13:05:06 perlman systemd[1]: Started snap.lxd.lxc-62131c55-e5a0-442f-ab55-a4812d1ed87b.scope.
Feb 12 13:05:06 perlman systemd[1]: snap.lxd.lxc-62131c55-e5a0-442f-ab55-a4812d1ed87b.scope: Succeeded.
Feb 12 13:05:06 perlman systemd[1]: Reloading.
Feb 12 13:05:06 perlman systemd[1]: Started Timer for periodically running MAAS hardware sync.
Feb 12 13:05:06 perlman systemd[1]: Starting MAAS Hardware Sync Service...
Feb 12 13:05:06 perlman cloud-init[1705]: Created symlink /etc/systemd/system/timers.target.wants/maas_hardware_sync.timer → /lib/systemd/system/maas_hardware_sync.timer.
Feb 12 13:05:06 perlman systemd[1]: Reloading.
Feb 12 13:05:06 perlman systemd[1]: maas_hardware_sync.service: Main process exited, code=exited, status=1/FAILURE
Feb 12 13:05:06 perlman systemd[1]: maas_hardware_sync.service: Failed with result 'exit-code'.
Feb 12 13:05:06 perlman systemd[1]: Failed to start MAAS Hardware Sync Service.
Feb 12 13:05:07 perlman systemd[1]: Finished Execute cloud user/final scripts.
Feb 12 13:05:07 perlman systemd[1]: Reached target Cloud-init target.
Feb 12 13:05:07 perlman systemd[1]: Startup finished in 6min 54.661s (firmware) + 6.626s (loader) + 4.232s (kernel) + 55.009s (userspace) = 8min 530ms.
Feb 12 13:07:40 perlman systemd[1]: Created slice User Slice of UID 1000.
Feb 12 13:07:40 perlman systemd[1]: Starting User Runtime Directory /run/user/1000...
Feb 12 13:07:40 perlman systemd[1]: Finished User Runtime Directory /run/user/1000.
Feb 12 13:07:40 perlman systemd[1]: Starting User Manager for UID 1000...
Feb 12 13:07:40 perlman systemd[3112]: Reached target Paths.
Feb 12 13:07:40 perlman systemd[3112]: Reached target Timers.
Feb 12 13:07:40 perlman systemd[3112]: Starting D-Bus User Message Bus Socket.
Feb 12 13:07:40 perlman systemd[3112]: Listening on GnuPG network certificate management daemon.
Feb 12 13:07:40 perlman systemd[3112]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
Feb 12 13:07:40 perlman systemd[3112]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Feb 12 13:07:40 perlman systemd[3112]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Feb 12 13:07:40 perlman systemd[3112]: Listening on GnuPG cryptographic agent and passphrase cache.
Feb 12 13:07:40 perlman systemd[3112]: Listening on debconf communication socket.
Feb 12 13:07:40 perlman systemd[3112]: Listening on REST API socket for snapd user session agent.
Feb 12 13:07:40 perlman systemd[3112]: Listening on D-Bus User Message Bus Socket.
Feb 12 13:07:40 perlman systemd[3112]: Reached target Sockets.
Feb 12 13:07:40 perlman systemd[3112]: Reached target Basic System.
Feb 12 13:07:40 perlman systemd[3112]: Reached target Main User Target.
Feb 12 13:07:40 perlman systemd[3112]: Startup finished in 91ms.
Feb 12 13:07:40 perlman systemd[1]: Started User Manager for UID 1000.
Feb 12 13:07:40 perlman systemd[1]: Started Session 1 of user ubuntu.
```

`systemctl status maas_hardware_sync`:
```
● maas_hardware_sync.service - MAAS Hardware Sync Service
     Loaded: loaded (/lib/systemd/system/maas_hardware_sync.service; disabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Mon 2024-02-12 13:05:06 UTC; 2min 53s ago
TriggeredBy: ● maas_hardware_sync.timer
       Docs: https://maas.io
    Process: 3007 ExecStartPre=/usr/bin/wget -O /usr/bin/maas-run-scripts http://192.168.0.2:5248/MAAS/maas-run-scripts (code=exited, status=0/SUCCESS)
    Process: 3051 ExecStartPre=/bin/chmod 0755 /usr/bin/maas-run-scripts (code=exited, status=0/SUCCESS)
    Process: 3052 ExecStart=/usr/bin/maas-run-scripts report-results --config /etc/maas/maas-machine-creds.yml (code=exited, status=1/FAILURE)
   Main PID: 3052 (code=exited, status=1/FAILURE)

Feb 12 13:05:06 perlman wget[3007]: HTTP request sent, awaiting response... 200 OK
Feb 12 13:05:06 perlman wget[3007]: Length: 32563 (32K) [text/x-python]
Feb 12 13:05:06 perlman wget[3007]: Saving to: ‘/usr/bin/maas-run-scripts’
Feb 12 13:05:06 perlman wget[3007]: 0K .......... .......... .......... . 100% 119M=0s
Feb 12 13:05:06 perlman wget[3007]: 2024-02-12 13:05:06 (119 MB/s) - ‘/usr/bin/maas-run-scripts’ saved [32563/32563]
Feb 12 13:05:06 perlman maas-run-scripts[3052]: * Fetching scripts from http://192.168.0.2:5248/MAAS/metadata/2012-03-01/ to /tmp/tmpt4xv075c/scripts
Feb 12 13:05:06 perlman maas-run-scripts[3052]: No script returned
Feb 12 13:05:06 perlman systemd[1]: maas_hardware_sync.service: Main process exited, code=exited, status=1/FAILURE
Feb 12 13:05:06 perlman systemd[1]: maas_hardware_sync.service: Failed with result 'exit-code'.
Feb 12 13:05:06 perlman systemd[1]: Failed to start MAAS Hardware Sync Service.
```

Revision history for this message
maasuser1 (maasuser1) wrote :

After 10 minutes, the hardware sync service started automatically.

```
Feb 12 13:07:40 perlman systemd[1]: Started User Manager for UID 1000.
Feb 12 13:07:40 perlman systemd[1]: Started Session 1 of user ubuntu.
Feb 12 13:15:18 perlman systemd[1]: Starting MAAS Hardware Sync Service...
Feb 12 13:15:26 perlman systemd[1]: maas_hardware_sync.service: Succeeded.
Feb 12 13:15:26 perlman systemd[1]: Finished MAAS Hardware Sync Service.
```

description: updated
Revision history for this message
Anton Troyanov (troyanov) wrote :

Hello!

Did it happen only once or you have a stable reproducer?
According to the code it might be related that requested data was not populated on the Region

```
    if res.status == http.client.NO_CONTENT:
        raise ExitError("No script returned")
```

Was there anything interesting on region/rack logs around that time?
/var/snap/maas/common/log/regiond.log
/var/snap/maas/common/log/rackd.log

Is there any proxy/load balancer in front of MAAS?

Changed in maas:
status: New → Incomplete
Revision history for this message
maasuser1 (maasuser1) wrote :
Download full text (10.2 KiB)

Hi Anton,

I have a stable reproducer. I released the same host and deployed it again with the same deployment options, and the issue happened again.

regiond: Lots of 204 NO_CONTENT log and https://bugs.launchpad.net/maas/+bug/2053042
```
2024-02-13 15:36:13 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/4nem8e HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.22.0)
2024-02-13 15:36:13 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/4nem8e HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.22.0)
2024-02-13 15:36:13 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/4nem8e HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.22.0)
2024-02-13 15:36:14 metadataserver: [info] No user data registered for node named perlman
2024-02-13 15:36:14 regiond: [info] 127.0.0.1 GET /MAAS/metadata/2012-03-01/user-data HTTP/1.1 --> 404 NOT_FOUND (referrer: -; agent: python-requests/2.22.0)
2024-02-13 15:36:14 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/4nem8e HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.22.0)
2024-02-13 15:36:15 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/4nem8e HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.22.0)
2024-02-13 15:36:15 regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/4nem8e HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.22.0)
```

```
2024-02-13 15:36:57 regiond: [info] 127.0.0.1 GET /MAAS/metadata/2012-03-01/maas-scripts HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: Python-urllib/3.8)
2024-02-13 15:36:57 provisioningserver.rpc.common: [critical] Unhandled failure dispatching AMP command. This is probably a bug. Please ensure that this error is handled within application code or declared in the signature of the b'ReportNeighbours' command. [maas:pid=320095:cmd=ReportNeighbours:ask=3b]
        Traceback (most recent call last):
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/internet/asyncioreactor.py", line 271, in _onTimer
            self.runUntilCurrent()
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/internet/base.py", line 991, in runUntilCurrent
            call.func(*call.args, **call.kw)
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 700, in errback
            self._startRunCallbacks(fail)
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
            current.result = callback( # type: ignore[misc]
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/protocols/amp.py", line 1138, in checkKnownErrors
            key = error.trap(*command.allErrors)
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/python/failure.py", line 451, in trap
            self.raiseException()
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseExcepti...

Revision history for this message
Anton Troyanov (troyanov) wrote :
Download full text (4.5 KiB)

This one looks interesting.
There was a similar issue reported, and turning off network discovery helped.
Can you please check if that makes any difference on your env?
https://bugs.launchpad.net/maas/+bug/2042869/comments/7

2024-02-13 15:36:57 provisioningserver.rpc.common: [critical] Unhandled failure dispatching AMP command. This is probably a bug. Please ensure that this error is handled within application code or declared in the signature of the b'ReportNeighbours' command. [maas:pid=320095:cmd=ReportNeighbours:ask=3b]
        Traceback (most recent call last):
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/internet/asyncioreactor.py", line 271, in _onTimer
            self.runUntilCurrent()
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/internet/base.py", line 991, in runUntilCurrent
            call.func(*call.args, **call.kw)
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 700, in errback
            self._startRunCallbacks(fail)
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
            current.result = callback( # type: ignore[misc]
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/protocols/amp.py", line 1138, in checkKnownErrors
            key = error.trap(*command.allErrors)
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/python/failure.py", line 451, in trap
            self.raiseException()
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
            raise self.value.with_traceback(self.tb)
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 244, in inContext
            result = inContext.theWork() # type: ignore[attr-defined]
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 260, in <lambda>
            inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/python/context.py", line 117, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/python/context.py", line 82, in callWithContext
            return func(*args, **kw)
          File "/snap/maas/32469/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 856, in callInContext
            return func(*args, **kwargs)
          File "/snap/maas/32469/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 203, in wrapper
            result = func(*args, **kwargs)
          File "/snap/maas/32469/lib/python3.10/site-packages/maasserver/utils/orm.py", line 771, in call_within_transaction
            return func_outside_txn(*args, **kwargs)
          File "/snap/maas/32469/lib...

Read more...

Revision history for this message
maasuser1 (maasuser1) wrote :

Hi Anton, I can reproduce the problem even after turning off network discovery.

Revision history for this message
Nick De Villiers (nickdv99) wrote :

So just to clarify - the first attempt always fails, and then the ones after that succeed?

Revision history for this message
maasuser1 (maasuser1) wrote :

Yeah, that's correct.

Changed in maas:
status: Incomplete → Triaged
importance: Undecided → Low
milestone: none → 3.5.x
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.