Ready machines with owner

Bug #2056330 reported by Björn Tillenius
This bug affects 1 person
Affects | Status        | Importance | Assigned to    | Milestone
MAAS    | Fix Committed | High       | Anton Troyanov |
3.5     | Fix Committed | High       | Anton Troyanov |

Bug Description

Looking at labmaas, which is running 3.5.0~beta1-16219-g.8b07c3d5c, some machines that are in the Ready state have an owner. That shouldn't happen: a machine in the Ready state should never have an owner.

Furthermore, if you go to the machine details page for one of those machines, it is stuck in a loading state and doesn't show the details.

The currently affected machines are chladni, lapras and scheele.

Changed in maas:
status: New → Triaged
importance: Undecided → High
milestone: none → 3.5.0
Javier Fuentes (javier-fs) wrote :

In a fresh installation, I found the expected behaviors for all the transitions to Ready:
- Ready to Allocated to Ready
- Ready to Deployed to Ready
- Ready to Allocated to Deployed to Ready

Looking at the code, there are some cases that might not reset the owner of the node:
- Node.abort_testing()
- Node.mark_fixed()
- Node.override_failed_testing()
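
For illustration, the invariant from the bug description (a Ready machine has no owner) and the kind of code path that could break it can be sketched in Python. The `Machine` class, the `Status` enum, and both helper functions are hypothetical and are not MAAS code; they only show why a path that sets the status directly, without clearing the owner, would leave a machine in this state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical subset of node states, for illustration only.
    READY = "ready"
    ALLOCATED = "allocated"
    TESTING = "testing"


@dataclass
class Machine:
    hostname: str
    status: Status
    owner: Optional[str] = None


def transition(machine: Machine, new_status: Status) -> None:
    """Change status and drop the owner whenever the machine becomes Ready."""
    machine.status = new_status
    if new_status is Status.READY:
        machine.owner = None


def violates_ready_invariant(machine: Machine) -> bool:
    """True if the machine is Ready but still has an owner."""
    return machine.status is Status.READY and machine.owner is not None


# A path that assigns the status directly skips the owner reset.
leaky = Machine("chladni", Status.TESTING, owner="alice")
leaky.status = Status.READY

# A path that goes through the helper keeps the invariant.
clean = Machine("scheele", Status.TESTING, owner="alice")
transition(clean, Status.READY)
```

Here `violates_ready_invariant(leaky)` is true and `violates_ready_invariant(clean)` is false. The owner name "alice" is made up.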
Thorsten Merten (thorsten-merten) wrote :

If it's not a regression we can change this to medium, probably.
Anton Troyanov (troyanov) wrote :

After inspecting WS messages I found the following:
{"type": 1, "request_id": 9, "rtype": 1, "error": "ScriptSet matching query does not exist."}
{"type": 1, "request_id": 7, "rtype": 1, "error": "ScriptSet matching query does not exist."}
{"type": 1, "request_id": 10, "rtype": 1, "error": "ScriptSet matching query does not exist."}

Those are the responses to:
{"method":"machine.set_active","type":0,"params":{"system_id":"pcp68r"},"request_id":7}
{"method":"machine.get","type":0,"params":{"system_id":"pcp68r"},"request_id":9}
{"method":"machine.get","type":0,"params":{"system_id":"pcp68r"},"request_id":10}
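
The responses arrive in a different order than the requests, so it helps to match them by `request_id`. A minimal Python sketch, using only the messages quoted above; the function name `pair_by_request_id` is made up for this example, and nothing is assumed about the meaning of `type` or `rtype`.

```python
import json

# Requests and responses copied from the WebSocket capture above.
requests_raw = [
    '{"method":"machine.set_active","type":0,"params":{"system_id":"pcp68r"},"request_id":7}',
    '{"method":"machine.get","type":0,"params":{"system_id":"pcp68r"},"request_id":9}',
    '{"method":"machine.get","type":0,"params":{"system_id":"pcp68r"},"request_id":10}',
]
responses_raw = [
    '{"type": 1, "request_id": 9, "rtype": 1, "error": "ScriptSet matching query does not exist."}',
    '{"type": 1, "request_id": 7, "rtype": 1, "error": "ScriptSet matching query does not exist."}',
    '{"type": 1, "request_id": 10, "rtype": 1, "error": "ScriptSet matching query does not exist."}',
]


def pair_by_request_id(requests, responses):
    """Return (request_id, method, error) tuples in response order."""
    methods = {}
    for raw in requests:
        message = json.loads(raw)
        methods[message["request_id"]] = message["method"]
    pairs = []
    for raw in responses:
        message = json.loads(raw)
        request_id = message["request_id"]
        # methods.get() yields None for a response with no known request.
        pairs.append((request_id, methods.get(request_id), message.get("error")))
    return pairs


pairs = pair_by_request_id(requests_raw, responses_raw)
for request_id, method, error in pairs:
    print(request_id, method, error)
```

Both `machine.set_active` and `machine.get` fail with the same error for system_id pcp68r.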

Changed in maas:
milestone: 3.5.0 → 3.6.0
assignee: nobody → Anton Troyanov (troyanov)
Changed in maas:
status: Triaged → In Progress
Anton Troyanov (troyanov) wrote :

Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: Traceback (most recent call last):
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: File "/usr/lib/python3.10/threading.py", line 953, in run
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: self._target(*self._args, **self._kwargs)
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: File "/snap/maas/34155/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 821, in worker
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: return target()
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: File "/snap/maas/34155/usr/lib/python3/dist-packages/twisted/_threads/_threadworker.py", line 47, in work
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: task()
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: File "/snap/maas/34155/usr/lib/python3/dist-packages/twisted/_threads/_team.py", line 182, in doWork
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: task()
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: --- <exception caught here> ---
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: File "/snap/maas/34155/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 244, in inContext
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: result = inContext.theWork() # type: ignore[attr-defined]
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: File "/snap/maas/34155/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 260, in <lambda>
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: File "/snap/maas/34155/usr/lib/python3/dist-packages/twisted/python/context.py", line 117, in callWithContext
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: return self.currentContext().callWithContext(ctx, func, *args, **kw)
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: File "/snap/maas/34155/usr/lib/python3/dist-packages/twisted/python/context.py", line 82, in callWithContext
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: return func(*args, **kw)
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: File "/snap/maas/34155/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 856, in callInContext
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: return func(*args, **kwargs)
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: File "/snap/maas/34155/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 203, in wrapper
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: result = func(*args, **kwargs)
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: File "/snap/maas/34155/lib/python3.10/site-packages/maasserver/utils/orm.py", line 771, in call_within_transaction
Mar 18 11:15:45 jenkins-slave-2 maas-regiond[3389615]: return func_outside_txn(*args, **kwargs)
Mar 18 ...

Anton Troyanov (troyanov) wrote :

It seems that the query that returns the wrong result is:

SELECT
    "maasserver_scriptset"."id",
    "maasserver_scriptset"."last_ping",
    "maasserver_scriptset"."node_id",
    "maasserver_scriptset"."result_type",
    "maasserver_scriptset"."power_state_before_transition",
    "maasserver_scriptset"."tags"
FROM "maasserver_scriptset"
WHERE "maasserver_scriptset"."id" = 2594
LIMIT 21

After checking the DB data, I found that this ID is referenced by the node row returned by `select * from maasserver_node where id = 4;` (chladni):
...
current_commissioning_script_set_id | 2594
current_installation_script_set_id |
current_testing_script_set_id | 2595
...

However, neither scriptset exists.

maasdb=# select * from maasserver_scriptset where id = 2594;
(0 rows)

maasdb=# select * from maasserver_scriptset where id = 2595;
(0 rows)

SELECT
n.id,
n.hostname
FROM maasserver_node n
LEFT JOIN maasserver_scriptset scom
ON n.current_commissioning_script_set_id = scom.id
LEFT JOIN maasserver_scriptset sinst
ON n.current_installation_script_set_id = sinst.id
LEFT JOIN maasserver_scriptset stest
ON n.current_testing_script_set_id = stest.id
LEFT JOIN maasserver_scriptset srel
ON n.current_release_script_set_id = srel.id
WHERE (scom.id is null AND n.current_commissioning_script_set_id is not null)
OR (sinst.id is null AND n.current_installation_script_set_id is not null)
OR (stest.id is null AND n.current_testing_script_set_id is not null)
OR (srel.id is null AND n.current_release_script_set_id is not null);

 id | hostname
----+----------
  4 | chladni
  6 | scheele
  7 | lapras
(3 rows)

Did we lose database integrity?
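
The LEFT JOIN check above can be reproduced on a toy schema to confirm what it detects. This is a sketch with Python's built-in `sqlite3`, not the MAAS schema: the tables `node` and `scriptset`, the hostnames `node-a` and `node-b`, and the function `find_dangling_nodes` are made up, only two of the four script set columns are modelled, and no foreign key constraints are declared so that a dangling reference can exist at all. The IDs 2594 and 2595 for chladni are taken from the output above.

```python
import sqlite3


def find_dangling_nodes(conn):
    """Return (id, hostname) for nodes pointing at a scriptset that is gone."""
    query = """
        SELECT n.id, n.hostname
        FROM node n
        LEFT JOIN scriptset scom
            ON n.current_commissioning_script_set_id = scom.id
        LEFT JOIN scriptset stest
            ON n.current_testing_script_set_id = stest.id
        WHERE (scom.id IS NULL
               AND n.current_commissioning_script_set_id IS NOT NULL)
           OR (stest.id IS NULL
               AND n.current_testing_script_set_id IS NOT NULL)
        ORDER BY n.id
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE scriptset (id INTEGER PRIMARY KEY);
    CREATE TABLE node (
        id INTEGER PRIMARY KEY,
        hostname TEXT,
        current_commissioning_script_set_id INTEGER,
        current_testing_script_set_id INTEGER
    );
    INSERT INTO scriptset (id) VALUES (2590), (2591);
    -- chladni references two scriptsets that do not exist.
    INSERT INTO node VALUES (4, 'chladni', 2594, 2595);
    -- node-a references existing rows; node-b references nothing.
    INSERT INTO node VALUES (24, 'node-a', 2590, 2591);
    INSERT INTO node VALUES (30, 'node-b', NULL, NULL);
    """
)

dangling = find_dangling_nodes(conn)
print(dangling)
```

Only chladni is reported. A node whose script set columns are NULL is not flagged, which matches the `is not null` conditions in the original query.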
Anton Troyanov (troyanov) wrote :

maasdb=# select * from maasserver_scriptset where id > 2580 order by id asc;
  id  |           last_ping           | result_type | node_id | power_state_before_transition |      tags
------+-------------------------------+-------------+---------+-------------------------------+-----------------
 2581 |                               |           0 |      21 | off                           | {}
 2582 |                               |           2 |      21 | off                           | {}
 2583 | 2024-03-01 08:35:48.423709+00 |           2 |       4 | off                           | {}
 2584 | 2024-03-01 08:35:47.258141+00 |           0 |       6 | off                           | {}
 2585 | 2024-03-01 08:35:54.715715+00 |           2 |       6 | off                           | {}
 2586 |                               |           0 |       8 | off                           | {}
 2587 |                               |           2 |       8 | off                           | {}
 2588 |                               |           0 |      13 | off                           | {}
 2589 |                               |           2 |      13 | off                           | {}
 2590 | 2024-03-01 08:37:42.151949+00 |           0 |      24 | off                           | {}
 2591 | 2024-03-01 08:37:49.48196+00  |           2 |      24 | off                           | {}
 2604 | 2024-03-01 08:44:14.432544+00 |           0 |      24 | off                           | {}
 2605 | 2024-03-01 08:44:22.346858+00 |           2 |      24 | off                           | {}
 2606 |                               |           1 |      24 | off                           | {}
 2609 | 2024-03-05 11:48:53.083524+00 |           0 |      49 | unknown                       | {}
 2610 | 2024-03-05 11:49:01.476047+00 |           2 |      49 | unknown                       | {commissioning}
 2611 |                               |           0 |       8 | on                            | {}
 2612 |                               |           2 |       8 | on                            | {}
 2613 |                               |           0 |      13 | on                            | {}
 2614 |                               |           2 |      13 | on                            | {}
 2615 |                               |           1 |      49 | unknown                       | {}
 2616 | 2024-03-17 19:54:38.163618+00 |           0 |      50 | unknown                       | {enlisting}
 2617 | 2024-03-18 10:37:26.213948+00 |           2 |      49 | on                            | {}
(23 rows)
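
A quick way to see which IDs are absent from this listing is to compute the gaps in the sequence. The Python sketch below uses the 23 IDs from the output above; `missing_ids` is a made-up helper name. Gaps in a sequence can also come from rolled-back transactions, so this suggests that rows were deleted but does not prove it.

```python
def missing_ids(ids):
    """Return the IDs between min(ids) and max(ids) that are not present."""
    present = set(ids)
    return [i for i in range(min(ids), max(ids) + 1) if i not in present]


# IDs returned by the query above: 2581-2591, 2604-2606, 2609-2617.
listed = list(range(2581, 2592)) + [2604, 2605, 2606] + list(range(2609, 2618))

gaps = missing_ids(listed)
print(gaps)

# The two scriptsets referenced by chladni (node 4).
referenced = [2594, 2595]
print([i for i in referenced if i in gaps])
```

The missing IDs are 2592 through 2603, plus 2607 and 2608. Both scriptsets referenced by chladni, 2594 and 2595, fall inside the first gap.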

Changed in maas:
status: In Progress → Fix Committed