UnknownRemoteError causes cluster to disconnect

Bug #1457799 reported by Raphaël Badin
Affects  Status        Importance  Assigned to    Milestone
MAAS     Fix Released  Critical    Gavin Panella

Bug Description

When the cluster, called through RPC, raises an exception unknown to the region, the cluster disconnects.

2015-05-21 23:42:49 [RegionServer,1,127.0.0.1] RegionServer connection lost (HOST:IPv4Address(TCP, '127.0.0.1', 45229) PEER:IPv4Address(TCP, '127.0.0.1', 39400))
2015-05-21 23:42:49 [-] Unhandled error in Deferred:
2015-05-21 23:42:49 [-] Unhandled Error
        Traceback (most recent call last):
        Failure: twisted.protocols.amp.UnknownRemoteError: Code<UNKNOWN>: Unknown Error

Raphaël Badin (rvb) wrote :

We should investigate how we can improve this situation:

- Have the region report some information about the underlying error instead of the cryptic "UnknownRemoteError: Code<UNKNOWN>: Unknown Error".

- Get the cluster *not* to disconnect when this happens. There is no reason for it to: one call to the cluster might crash, but it shouldn't take down the whole connection with it. The cluster will reconnect eventually, but this causes downtime nonetheless, which is a real problem for any installation that operates at scale (like OIL).
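
To make the two goals above concrete, here is a minimal, library-independent sketch in plain Python. It is not the fix that was committed to MAAS and it does not use the Twisted AMP API; `dispatch`, `KnownRPCError`, `NoSuchNode`, the handler names, and the response dictionary layout are all hypothetical names chosen for illustration. The idea is only that an undeclared exception is logged and turned into an error response that carries a description, so the failure stays scoped to the single call.

```python
import logging
import traceback

logger = logging.getLogger("rpc-sketch")


class KnownRPCError(Exception):
    """Hypothetical base class for errors that both ends of the link declare."""


class NoSuchNode(KnownRPCError):
    """Hypothetical example of a declared error."""


def dispatch(handlers, command, **kwargs):
    """Run one RPC handler and always return a response dict.

    No exception escapes this function, so a crash in one call gives the
    transport no reason to drop the connection shared by other calls.
    """
    try:
        handler = handlers[command]
    except KeyError:
        return {"ok": False, "code": "UNHANDLED_COMMAND", "description": command}
    try:
        return {"ok": True, "result": handler(**kwargs)}
    except KnownRPCError as error:
        # Declared errors travel under their own code.
        return {"ok": False, "code": type(error).__name__, "description": str(error)}
    except Exception as error:
        # Undeclared errors: log the full traceback locally, and send the
        # caller a short description instead of a bare "Unknown Error".
        logger.error("Unexpected error in %s:\n%s", command, traceback.format_exc())
        description = "%s: %s" % (type(error).__name__, error)
        return {"ok": False, "code": "UNKNOWN", "description": description}


def power_on(system_id):
    # Stands in for an unexpected bug inside a cluster-side handler.
    raise ZeroDivisionError("division by zero")


def list_nodes():
    return ["node-1"]


HANDLERS = {"power_on": power_on, "list_nodes": list_nodes}

crashed = dispatch(HANDLERS, "power_on", system_id="abc")
still_works = dispatch(HANDLERS, "list_nodes")
```

In this sketch the failing `power_on` call produces an `UNKNOWN` response that names the underlying exception, and the later `list_nodes` call on the same dispatcher still succeeds. Whether exception text is safe to send across the wire is a separate design question that a real implementation would have to settle.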

Raphaël Badin (rvb)
Changed in maas:
importance: High → Critical
Gavin Panella (allenap)
Changed in maas:
assignee: nobody → Gavin Panella (allenap)
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released