Failed to update regiond's processes and endpoints

Bug #2033505 reported by Anton Troyanov
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| MAAS | Fix Released | High | Christian Grabowski | |
| 3.3 | Fix Released | Critical | Anton Troyanov | |
| 3.4 | Fix Released | Critical | Anton Troyanov | |

Bug Description

It seems that commit 996df30a06ab3c8e75c36f4ee8b24e2118ed9686 introduced this issue, because the return value of `_registerConnection` was changed from a single object to a tuple:
```
- return connection
+ return (connection, created)
```
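
A minimal sketch of the failure mode, assuming a caller that still treats the return value as a single object. The names `Connection`, `register_connection_old`, `register_connection_new`, and the two caller functions are hypothetical stand-ins for illustration and are not the MAAS code:

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical stand-in for a stored connection record."""

    id: int


def register_connection_old(conn_id: int) -> Connection:
    # Shape before the change: a single object is returned.
    return Connection(id=conn_id)


def register_connection_new(conn_id: int) -> tuple[Connection, bool]:
    # Shape after the change: an (object, created) tuple is returned.
    return Connection(id=conn_id), True


def discard_registered(previous_ids: set[int], conn_id: int) -> str:
    """Caller written for the old shape but handed the new one."""
    db_conn = register_connection_new(conn_id)
    try:
        # db_conn is a tuple here, so attribute access fails.
        previous_ids.discard(db_conn.id)
    except AttributeError as exc:
        return str(exc)
    return "ok"


def discard_registered_fixed(previous_ids: set[int], conn_id: int) -> str:
    """Caller updated to unpack the tuple before using the object."""
    db_conn, _created = register_connection_new(conn_id)
    previous_ids.discard(db_conn.id)
    return "ok"


old_shape_id = register_connection_old(1).id
broken_result = discard_registered({1, 2}, 1)
fixed_ids = {1, 2}
fixed_result = discard_registered_fixed(fixed_ids, 1)
```

Under these assumptions, the unpatched caller hits the same `AttributeError` as in the traceback, while the caller that unpacks `(connection, created)` proceeds normally.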

Aug 30 08:54:39 maas maas-regiond[1350]: maasserver.ipc: [critical] Failed to update regiond's processes and endpoints; maas:pid=1350 record's may be out of date
Aug 30 08:54:39 maas maas-regiond[1350]: Traceback (most recent call last):
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/usr/lib/python3/dist-packages/twisted/internet/asyncioreactor.py", line 271, in _onTimer
Aug 30 08:54:39 maas maas-regiond[1350]: self.runUntilCurrent()
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/usr/lib/python3/dist-packages/twisted/internet/base.py", line 991, in runUntilCurrent
Aug 30 08:54:39 maas maas-regiond[1350]: call.func(*call.args, **call.kw)
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 700, in errback
Aug 30 08:54:39 maas maas-regiond[1350]: self._startRunCallbacks(fail)
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
Aug 30 08:54:39 maas maas-regiond[1350]: self._runCallbacks()
Aug 30 08:54:39 maas maas-regiond[1350]: --- <exception caught here> ---
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
Aug 30 08:54:39 maas maas-regiond[1350]: current.result = callback( # type: ignore[misc]
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/lib/python3.10/site-packages/maasserver/ipc.py", line 622, in ignore_cancel
Aug 30 08:54:39 maas maas-regiond[1350]: failure.trap(CancelledError)
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/usr/lib/python3/dist-packages/twisted/python/failure.py", line 451, in trap
Aug 30 08:54:39 maas maas-regiond[1350]: self.raiseException()
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
Aug 30 08:54:39 maas maas-regiond[1350]: raise self.value.with_traceback(self.tb)
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 244, in inContext
Aug 30 08:54:39 maas maas-regiond[1350]: result = inContext.theWork() # type: ignore[attr-defined]
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 260, in <lambda>
Aug 30 08:54:39 maas maas-regiond[1350]: inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/usr/lib/python3/dist-packages/twisted/python/context.py", line 117, in callWithContext
Aug 30 08:54:39 maas maas-regiond[1350]: return self.currentContext().callWithContext(ctx, func, *args, **kw)
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/usr/lib/python3/dist-packages/twisted/python/context.py", line 82, in callWithContext
Aug 30 08:54:39 maas maas-regiond[1350]: return func(*args, **kw)
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 856, in callInContext
Aug 30 08:54:39 maas maas-regiond[1350]: return func(*args, **kwargs)
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 203, in wrapper
Aug 30 08:54:39 maas maas-regiond[1350]: result = func(*args, **kwargs)
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/lib/python3.10/site-packages/maasserver/utils/orm.py", line 771, in call_within_transaction
Aug 30 08:54:39 maas maas-regiond[1350]: return func_outside_txn(*args, **kwargs)
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/lib/python3.10/site-packages/maasserver/utils/orm.py", line 574, in retrier
Aug 30 08:54:39 maas maas-regiond[1350]: return func(*args, **kwargs)
Aug 30 08:54:39 maas maas-regiond[1350]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner
Aug 30 08:54:39 maas maas-regiond[1350]: return func(*args, **kwds)
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/lib/python3.10/site-packages/maasserver/ipc.py", line 590, in _update
Aug 30 08:54:39 maas maas-regiond[1350]: self._updateConnections(process, conn["rpc"]["connections"].copy())
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 203, in wrapper
Aug 30 08:54:39 maas maas-regiond[1350]: result = func(*args, **kwargs)
Aug 30 08:54:39 maas maas-regiond[1350]: File "/snap/maas/x1/lib/python3.10/site-packages/maasserver/ipc.py", line 521, in _updateConnections
Aug 30 08:54:39 maas maas-regiond[1350]: previous_connection_ids.discard(db_conn.id)
Aug 30 08:54:39 maas maas-regiond[1350]: builtins.AttributeError: 'tuple' object has no attribute 'id'
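
The traceback above is easier to scan once the syslog-style prefix is removed from each line. The following is a small sketch of one way to do that; the helper names `strip_prefix` and `final_exception` are hypothetical, and the regular expression assumes the exact prefix layout shown in this report:

```python
import re

# Matches prefixes such as "Aug 30 08:54:39 maas maas-regiond[1350]: ".
PREFIX = re.compile(r"^\w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \S+ \S+\[\d+\]: ")


def strip_prefix(line: str) -> str:
    """Remove the timestamp/host/process prefix from one log line."""
    return PREFIX.sub("", line, count=1)


def final_exception(lines: list[str]) -> str:
    """Return the last non-empty line after stripping prefixes."""
    stripped = [strip_prefix(line).strip() for line in lines]
    non_empty = [line for line in stripped if line]
    return non_empty[-1] if non_empty else ""


sample = [
    "Aug 30 08:54:39 maas maas-regiond[1350]: "
    "previous_connection_ids.discard(db_conn.id)",
    "Aug 30 08:54:39 maas maas-regiond[1350]: "
    "builtins.AttributeError: 'tuple' object has no attribute 'id'",
]
last_line = final_exception(sample)
```

Applied to the last two lines of the log, this yields the bare `builtins.AttributeError` line, which is the part that identifies the bug.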

Changed in maas:
status: New → Triaged
importance: Undecided → Critical
Changed in maas:
assignee: nobody → Christian Grabowski (cgrabowski)
milestone: none → 3.5.0
status: Triaged → In Progress
Changed in maas:
importance: Critical → High
Changed in maas:
status: In Progress → Fix Committed
DUFOUR Olivier (odufourc) wrote :

This issue is currently being seen in a customer environment after upgrading from MAAS 3.3.5 (3.3/stable) to 3.4 (3.4/stable and 3.4/edge).

Is there any workaround to stabilise MAAS?
Out of 3 controllers in HA, I always see at least one or two regiond processes crashing constantly with this error.

Anton Troyanov (troyanov) wrote :

Hello Olivier,

Unfortunately we forgot about this fix when we were backporting this change:
https://git.launchpad.net/maas/commit/?id=b0ddb4d6abedf37def878d7214e757d5d7d8dc0b
https://git.launchpad.net/maas/commit/?id=fdd97926a8c4cc172f8427b96d7d7b4bd384709d

I am about to backport the fix to 3.3 and 3.4. Until new releases are available, I guess the only workaround is to use an edge channel.

Anton Troyanov (troyanov) wrote :

New builds are available on 3.3/edge and 3.4/edge

❯ snap info maas | grep edge
  3.4/edge: 3.4.1-14335-g.e559e7a7b 2024-02-22 (33732) 138MB
  3.3/edge: 3.3.5-13228-g.85d0d55a9 2024-02-22 (33736) 138MB

DUFOUR Olivier (odufourc) wrote :

Thank you for backporting the fix to edge for now; it is way better than nothing.

I will check with the customer whether we can try the 3.4/edge channel next week, and then confirm whether it improves the state of MAAS.

Changed in maas:
milestone: 3.5.0 → 3.5.0-beta1
Changed in maas:
status: Fix Committed → Fix Released
Leonardo Silva (leo-scs) wrote :

I tried to update to 3.4.1/stable and the error persists. Two out of 3 regions always have the error, and at times all 3 do, so we had to roll back to 3.4.0.

The error is shown below:

2024-03-26 03:17:31 maasserver.ipc: [critical] Failed to update regiond's processes and endpoints; prod-web-maas-region3:pid=944396 record's may be out of date
 Traceback (most recent call last):
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/internet/asyncioreactor.py", line 271, in _onTimer
     self.runUntilCurrent()
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/internet/base.py", line 991, in runUntilCurrent
     call.func(*call.args, **call.kw)
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 700, in errback
     self._startRunCallbacks(fail)
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
     self._runCallbacks()
 --- <exception caught here> ---
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
     current.result = callback( # type: ignore[misc]
   File "/snap/maas/34087/lib/python3.10/site-packages/maasserver/ipc.py", line 622, in ignore_cancel
     failure.trap(CancelledError)
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/python/failure.py", line 451, in trap
     self.raiseException()
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
     raise self.value.with_traceback(self.tb)
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 244, in inContext
     result = inContext.theWork() # type: ignore[attr-defined]
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 260, in <lambda>
     inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/python/context.py", line 117, in callWithContext
     return self.currentContext().callWithContext(ctx, func, *args, **kw)
   File "/snap/maas/34087/usr/lib/python3/dist-packages/twisted/python/context.py", line 82, in callWithContext
     return func(*args, **kw)
   File "/snap/maas/34087/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 856, in callInContext
     return func(*args, **kwargs)
   File "/snap/maas/34087/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 203, in wrapper
     result = func(*args, **kwargs)
   File "/snap/maas/34087/lib/python3.10/site-packages/maasserver/utils/orm.py", line 771, in call_within_transaction
     return func_outside_txn(*args, **kwargs)
   File "/snap/maas/34087/lib/python3.10/site-packages/maasserver/utils/orm.py", line 587, in retrier
     return func(*args, **kwargs)
   File "/usr/lib/python3.10/contextlib.py", line 79, in inner
     return func(*args, **kwds)
   File "/snap/maas/34087/lib/python3.10/site-packages/maasserver/ipc.py", line 590, in _update
     self._updateConnections(process, conn["rpc"]["connections"].copy())
   File "/snap/maas/34087/lib/python3.10/site-packages/provisio...

Anton Troyanov (troyanov) wrote :

Hello Leonardo,

I have a feeling that the issue you are facing is a different one. I am not sure how duplicated IDs are possible here. I will try to find a reproducer.

django.db.utils.IntegrityError: duplicate key value violates unique constraint "maasserver_regionrackrpcconne_endpoint_id_52afd589b0be380e_uniq"
 DETAIL: Key (endpoint_id, rack_controller_id)=(51289, 5934) already exists.
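
For readers unfamiliar with this class of error, here is a minimal sketch of a composite unique constraint being violated. It uses an in-memory SQLite database in place of PostgreSQL and Django, the table name `rpc_connection` and the helper `insert_pair` are hypothetical, and it only illustrates the error class; it is not a reproducer of the MAAS issue:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rpc_connection (
        id INTEGER PRIMARY KEY,
        endpoint_id INTEGER NOT NULL,
        rack_controller_id INTEGER NOT NULL,
        UNIQUE (endpoint_id, rack_controller_id)
    )
    """
)


def insert_pair(endpoint_id: int, rack_controller_id: int) -> bool:
    """Insert one row; return False if the unique constraint rejects it."""
    try:
        # "with conn" commits on success and rolls back on error.
        with conn:
            conn.execute(
                "INSERT INTO rpc_connection (endpoint_id, rack_controller_id) "
                "VALUES (?, ?)",
                (endpoint_id, rack_controller_id),
            )
    except sqlite3.IntegrityError:
        return False
    return True


first = insert_pair(51289, 5934)
second = insert_pair(51289, 5934)  # same key pair, rejected
row_count = conn.execute("SELECT COUNT(*) FROM rpc_connection").fetchone()[0]
```

The second insert with the same `(endpoint_id, rack_controller_id)` pair is rejected, which is the generic condition the `IntegrityError` above reports.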
