UnknownRemoteError causes cluster to disconnect
Bug #1457799 reported by Raphaël Badin
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | Critical | Gavin Panella |
Bug Description
When the cluster, called through RPC, raises an exception unknown to the region, the cluster disconnects.
2015-05-21 23:42:49 [RegionServer,
2015-05-21 23:42:49 [-] Unhandled error in Deferred:
2015-05-21 23:42:49 [-] Unhandled Error
Traceback (most recent call last):
Failure: twisted.
Related branches
lp:~allenap/maas/dont-disconnect-on-error--bug-1457799
- Blake Rouse (community): Approve
- Diff: 254 lines (+193/-0), 2 files modified
  - src/provisioningserver/rpc/common.py (+78/-0)
  - src/provisioningserver/rpc/tests/test_common.py (+115/-0)
Changed in maas:
importance: High → Critical
Changed in maas:
assignee: nobody → Gavin Panella (allenap)
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
We should investigate how we can improve this situation:
- have the region report some information about the underlying error instead of the cryptic "UnknownRemoteError: Code<UNKNOWN>: Unknown Error";
- get the cluster *not* to disconnect when this happens. There is no reason for it to: one call to the cluster might crash, but it shouldn't take down the whole connection with it. The cluster will reconnect eventually, but this causes downtime nonetheless, which is a real problem for any installation that operates at scale (like OIL).
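
To make the two points above concrete, here is a minimal toy model in plain Python. It does not use Twisted or any MAAS code, and it is not the fix from the linked branch; `Connection`, `dispatch`, and `KnownError` are hypothetical names invented for this sketch. The idea is only that an unexpected exception in a handler is turned into a non-fatal error response that names the underlying error, while the connection stays open.

```python
class KnownError(Exception):
    """Hypothetical error type that both sides have agreed on in advance."""

    code = "KNOWN"


class Connection:
    """Toy stand-in for one region <-> cluster RPC connection."""

    def __init__(self, handlers):
        self.handlers = handlers  # maps a command name to a callable
        self.open = True

    def dispatch(self, command, **kwargs):
        """Run a handler and always answer with a response dict."""
        if not self.open:
            raise RuntimeError("connection is closed")
        try:
            result = self.handlers[command](**kwargs)
            return {"ok": True, "result": result}
        except KnownError as error:
            # Agreed-upon errors are reported with their own code.
            return {"ok": False, "code": error.code,
                    "description": str(error), "fatal": False}
        except Exception as error:
            # Unknown errors are reported too, with the exception type and
            # message, and are deliberately NOT treated as fatal, so
            # self.open is left untouched.
            description = "%s: %s" % (type(error).__name__, error)
            return {"ok": False, "code": "UNKNOWN",
                    "description": description, "fatal": False}


def crash():
    # Assumed example of a call that fails in a way the caller did not plan for.
    raise ValueError("disk not found")


conn = Connection({"crash": crash, "ping": lambda: "pong"})
first = conn.dispatch("crash")   # fails, but only this one call
second = conn.dispatch("ping")   # the same connection is still usable
```

In this sketch the caller sees `ValueError: disk not found` rather than a bare "Unknown Error", and a later call on the same connection still succeeds. A real implementation would also want to log the full traceback on the side where the error happened.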