Failed commissioning, 500 INTERNAL_SERVER_ERROR

Bug #1976954 reported by Konstantinos Kaskavelis
This bug affects 1 person
Affects   Status      Importance   Assigned to   Milestone
MAAS      Triaged     Medium       Unassigned
3.3       Won't Fix   Medium       Unassigned
3.4       Won't Fix   Medium       Unassigned

Bug Description

During the compose_vms step of our test run we got a failed commissioning:

Status: {'elastic-1': 'Failed commissioning', 'graylog-1': 'Ready', 'juju-1': 'Ready', 'vault-1': 'Ready', 'elastic-2': 'Ready', 'grafana-2': 'Ready', 'graylog-2': 'Ready', 'juju-2': 'Testing', 'vault-2': 'Commissioning', 'elastic-3': 'Commissioning', 'prometheus-3': 'Commissioning', 'graylog-3': 'Commissioning', 'nagios-3': 'Commissioning', 'juju-3': 'Commissioning', 'vault-3': 'Commissioning'}
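
For readers scanning the status dump above, a small sketch that groups the nodes by state may help. The dictionary is copied from the Status line; the variable names (`statuses`, `by_status`, `failed`) are illustrative only and are not part of the test harness.

```python
from collections import defaultdict

# Node states copied verbatim from the Status line in this report.
statuses = {
    'elastic-1': 'Failed commissioning', 'graylog-1': 'Ready',
    'juju-1': 'Ready', 'vault-1': 'Ready', 'elastic-2': 'Ready',
    'grafana-2': 'Ready', 'graylog-2': 'Ready', 'juju-2': 'Testing',
    'vault-2': 'Commissioning', 'elastic-3': 'Commissioning',
    'prometheus-3': 'Commissioning', 'graylog-3': 'Commissioning',
    'nagios-3': 'Commissioning', 'juju-3': 'Commissioning',
    'vault-3': 'Commissioning',
}

# Group node names by their reported status.
by_status = defaultdict(list)
for node, status in statuses.items():
    by_status[status].append(node)

# Nodes that need attention in this run.
failed = by_status['Failed commissioning']

for status, nodes in sorted(by_status.items()):
    print(f"{status}: {', '.join(nodes)}")
```

Only one node is in a failed state; the rest are either done or still in progress at the time of the snapshot.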

Going through MAAS logs, we see these related lines:

2022-06-01 15:05:42 maasserver: [error] ################################ Exception: ################################
2022-06-01 15:05:42 maasserver: [error] Traceback (most recent call last):
  File "/snap/maas/19835/usr/lib/python3/dist-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/snap/maas/19835/lib/python3.8/site-packages/maasserver/utils/views.py", line 284, in view_atomic_with_post_commit_savepoint
    return view_atomic(*args, **kwargs)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/snap/maas/19835/lib/python3.8/site-packages/maasserver/api/support.py", line 56, in __call__
    response = super().__call__(request, *args, **kwargs)
  File "/snap/maas/19835/usr/lib/python3/dist-packages/django/views/decorators/vary.py", line 20, in inner_func
    response = func(*args, **kwargs)
  File "/snap/maas/19835/usr/lib/python3.8/dist-packages/piston3/resource.py", line 197, in __call__
    result = self.error_handler(e, request, meth, em_format)
  File "/snap/maas/19835/usr/lib/python3.8/dist-packages/piston3/resource.py", line 195, in __call__
    result = meth(request, *args, **kwargs)
  File "/snap/maas/19835/lib/python3.8/site-packages/maasserver/api/support.py", line 308, in dispatch
    return function(self, request, *args, **kwargs)
  File "/snap/maas/19835/lib/python3.8/site-packages/metadataserver/api.py", line 817, in signal
    target_status = process(node, request, status)
  File "/snap/maas/19835/lib/python3.8/site-packages/metadataserver/api.py", line 641, in _process_commissioning
    self._store_results(
  File "/snap/maas/19835/lib/python3.8/site-packages/metadataserver/api.py", line 529, in _store_results
    script_result.store_result(
  File "/snap/maas/19835/lib/python3.8/site-packages/metadataserver/models/scriptresult.py", line 274, in store_result
    assert self.status in SCRIPT_STATUS_RUNNING_OR_PENDING
AssertionError

2022-06-01 15:05:42 regiond: [info] 10.245.222.212 POST /MAAS/metadata/2012-03-01/ HTTP/1.0 --> 500 INTERNAL_SERVER_ERROR (referrer: -; agent: Python-urllib/3.8)
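
To make the traceback easier to read, here is a minimal sketch of how a guard like the one in `store_result` can turn into a 500. It is not MAAS code: `ScriptStatus`, `RUNNING_OR_PENDING`, `ScriptResult` and `handle_signal` are simplified, hypothetical stand-ins, and the double submission shown is only one assumed way a result could already be outside the running/pending set when it is stored. The actual cause in this bug has not been confirmed.

```python
from enum import Enum


class ScriptStatus(Enum):
    """Hypothetical, simplified stand-in for the real script statuses."""
    PENDING = "pending"
    RUNNING = "running"
    PASSED = "passed"
    FAILED = "failed"


# Stand-in for SCRIPT_STATUS_RUNNING_OR_PENDING from the traceback.
RUNNING_OR_PENDING = {ScriptStatus.PENDING, ScriptStatus.RUNNING}


class ScriptResult:
    """Toy model of a script result with the same kind of guard."""

    def __init__(self):
        self.status = ScriptStatus.PENDING
        self.exit_status = None

    def store_result(self, exit_status):
        # Same shape as the assertion at the bottom of the traceback.
        assert self.status in RUNNING_OR_PENDING
        self.exit_status = exit_status
        self.status = (
            ScriptStatus.PASSED if exit_status == 0 else ScriptStatus.FAILED
        )


def handle_signal(result, exit_status):
    """Toy request handler: an unhandled assertion becomes a 500."""
    try:
        result.store_result(exit_status)
    except AssertionError:
        return 500
    return 200


result = ScriptResult()
first = handle_signal(result, 0)   # result is PENDING, so it is stored
second = handle_signal(result, 0)  # result is already PASSED, guard trips
print(first, second)
```

In this toy model the second call fails because the first one already moved the result to a final status. Whether something similar happened here (for example a result being posted or processed twice) is an assumption that the logs would need to confirm.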

Test run:

https://solutions.qa.canonical.com/testruns/testRun/eefcccfb-e794-4724-adde-0236f3803b71

Logs:

https://oil-jenkins.canonical.com/artifacts/eefcccfb-e794-4724-adde-0236f3803b71/index.html

Tags: cdo-qa
Christian Grabowski (cgrabowski) wrote :

Hi there, I don't see regiond.log or rackd.log at the provided log link. Would you be able to provide those? The path should be /var/snap/maas/common/var/log/maas/.

Changed in maas:
status: New → Incomplete
Konstantinos Kaskavelis (kaskavel) wrote :

Hi Christian!

These types of logs are under generated/generated/maas/logs-2022-06-01-15.30.35.tar for this particular run.

Attaching rackd.log here for clarity (from 10.246.64.5)

Konstantinos Kaskavelis (kaskavel) wrote :

And regiond.log:

Björn Tillenius (bjornt) wrote :

I'm not sure exactly what the problem is, but I do think we have enough information to debug and fix this issue.

The 'elastic-1' node has system id 'gt3sh4'. If you look at the region logs, you can see that it lost connection to the rack at the same time 'gt3sh4' was posting its commissioning results. That probably has something to do with it. We need to check what happens in such a case.

Changed in maas:
status: Incomplete → Triaged
importance: Undecided → Medium
milestone: none → 3.3.0
Changed in maas:
milestone: 3.3.0 → 3.4.0
tags: added: cdo-qa
Alberto Donato (ack)
Changed in maas:
milestone: 3.4.0 → 3.4.x
Changed in maas:
milestone: 3.4.x → 3.5.x
no longer affects: maas/3.3