Power monitor service hits amp.TooLong errors with > ~600 nodes to a cluster
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | Critical | Gavin Panella |
1.8 | Fix Released | Critical | Gavin Panella |
Bug Description
When setting up my MAAS demo of angular, I was creating nodes in batches of 50. Once my MAAS got somewhere above 600 nodes (I don't know the exact number, as I was not refreshing the page after every batch of 50), the MAAS UI stopped working.
The only way to get it to work again was to disconnect the cluster from the region.
This error doesn't really help, because it does not show the actual call site. It looks like Twisted is mangling that, but since the region works when the cluster is disconnected, that should help narrow down the call.
ERROR 2014-11-03 09:05:11,673 twisted Amp server or network failure unhandled by client application. Dropping connection! To avoid, add errbacks to ALL remote commands!
Traceback (most recent call last):
File "/usr/lib/
self.
File "/usr/lib/
f(*a, **kw)
File "/usr/lib/
self.
File "/usr/lib/
self.
--- <exception caught here> ---
File "/usr/lib/
current.result = callback(
File "/usr/lib/
aBox.
File "/usr/lib/
proto.
File "/usr/lib/
self.
File "/usr/lib/
raise TooLong(False, True, v, k)
twisted.
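For context on the traceback above: as far as I understand Twisted's AMP, every value in a box is prefixed with a 16-bit length, so a single value cannot exceed 65535 bytes, and `TooLong(False, True, v, k)` means a value (not a key) was too long on the sending side. The sketch below only illustrates that limit with the standard library. It does not use Twisted, and `fake_node`, the payload fields, and the JSON encoding are assumptions made for illustration, not MAAS's real schema or serialization.

```python
import json
import struct

# Assumption: mirrors AMP's per-value limit (16-bit length prefix).
MAX_VALUE_LENGTH = 0xFFFF


def encode_value(value: bytes) -> bytes:
    """Prefix a value with a 16-bit big-endian length, refusing oversize values."""
    if len(value) > MAX_VALUE_LENGTH:
        raise ValueError("value too long: %d bytes" % len(value))
    return struct.pack("!H", len(value)) + value


def fake_node(i: int) -> dict:
    """Build a hypothetical per-node power payload (not MAAS's real schema)."""
    return {
        "system_id": "node-%08d" % i,
        "hostname": "host-%04d.example" % i,
        "power_type": "ipmi",
        "context": {
            "power_address": "10.0.%d.%d" % (i // 256, i % 256),
            "power_user": "maas",
        },
    }


def node_list_bytes(count: int) -> bytes:
    """Serialize `count` hypothetical nodes into one value."""
    nodes = [fake_node(i) for i in range(count)]
    return json.dumps(nodes).encode("utf-8")


def fits_in_one_value(count: int) -> bool:
    """Return True if the whole node list fits in a single length-prefixed value."""
    try:
        encode_value(node_list_bytes(count))
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for count in (10, 100, 2000):
        size = len(node_list_bytes(count))
        print(count, size, fits_in_one_value(count))
```

The exact node count at which the limit is crossed depends on how large each node's real payload is, so the sketch only shows that a list sent as one value grows with the number of nodes until it no longer fits.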
Related branches
- Jeroen T. Vermeulen (community): Approve
- Christian Reis (community): Needs Information
- Diff: 7895 lines (+6361/-339), 29 files modified
LICENSE.Twisted (+65/-0)
docs/development/rpc.rst (+12/-8)
scripts/ampclient.py (+1/-1)
src/maasserver/clusterrpc/power.py (+1/-1)
src/maasserver/clusterrpc/utils.py (+1/-1)
src/maasserver/models/node.py (+2/-2)
src/maasserver/models/tests/test_node.py (+6/-4)
src/maasserver/rpc/regionservice.py (+3/-3)
src/maasserver/rpc/testing/fixtures.py (+2/-2)
src/maasserver/rpc/tests/test_regionservice.py (+5/-5)
src/provisioningserver/pserv_services/dhcp_probe_service.py (+1/-1)
src/provisioningserver/rpc/amp32.py (+2646/-0)
src/provisioningserver/rpc/arguments.py (+11/-11)
src/provisioningserver/rpc/cluster.py (+119/-117)
src/provisioningserver/rpc/clusterservice.py (+4/-4)
src/provisioningserver/rpc/common.py (+17/-17)
src/provisioningserver/rpc/monitors.py (+5/-3)
src/provisioningserver/rpc/region.py (+127/-127)
src/provisioningserver/rpc/testing/__init__.py (+10/-8)
src/provisioningserver/rpc/testing/tls.py (+1/-1)
src/provisioningserver/rpc/tests/test_amp32.py (+3270/-0)
src/provisioningserver/rpc/tests/test_arguments.py (+6/-4)
src/provisioningserver/rpc/tests/test_clusterservice.py (+7/-7)
src/provisioningserver/rpc/tests/test_common.py (+5/-3)
src/provisioningserver/rpc/tests/test_docs.py (+3/-3)
src/provisioningserver/rpc/tests/test_monitors.py (+6/-4)
src/provisioningserver/utils/__init__.py (+1/-1)
src/provisioningserver/utils/shell.py (+11/-0)
src/provisioningserver/utils/tests/test_shell.py (+13/-1)
- Gavin Panella (community): Approve
- Diff: 1586 lines (+998/-184), 16 files modified
HACKING.txt (+9/-3)
src/maasserver/migrations/0150_power_parameters_and_state_updated_field.py (+515/-0)
src/maasserver/models/node.py (+13/-3)
src/maasserver/models/tests/test_node.py (+8/-0)
src/maasserver/models/timestampedmodel.py (+1/-0)
src/maasserver/rpc/nodes.py (+88/-22)
src/maasserver/rpc/tests/test_nodes.py (+133/-22)
src/maasserver/rpc/tests/test_regionservice.py (+9/-2)
src/maasserver/testing/factory.py (+23/-9)
src/maasserver/websockets/handlers/device.py (+1/-0)
src/maasserver/websockets/handlers/node.py (+3/-0)
src/provisioningserver/pserv_services/node_power_monitor_service.py (+31/-39)
src/provisioningserver/pserv_services/tests/test_node_power_monitor_service.py (+35/-47)
src/provisioningserver/rpc/power.py (+45/-32)
src/provisioningserver/rpc/region.py (+10/-5)
src/provisioningserver/rpc/tests/test_power.py (+74/-0)
- Gavin Panella (community): Approve
- Diff: 1502 lines (+915/-169), 16 files modified
HACKING.txt (+9/-3)
src/maasserver/migrations/0139_power_parameters_and_state_updated_field.py (+453/-0)
src/maasserver/models/node.py (+13/-3)
src/maasserver/models/tests/test_node.py (+8/-0)
src/maasserver/models/timestampedmodel.py (+1/-0)
src/maasserver/rpc/nodes.py (+88/-22)
src/maasserver/rpc/tests/test_nodes.py (+133/-22)
src/maasserver/rpc/tests/test_regionservice.py (+9/-2)
src/maasserver/testing/factory.py (+23/-9)
src/maasserver/websockets/handlers/device.py (+1/-0)
src/maasserver/websockets/handlers/node.py (+3/-0)
src/provisioningserver/pserv_services/node_power_monitor_service.py (+31/-39)
src/provisioningserver/pserv_services/tests/test_node_power_monitor_service.py (+35/-47)
src/provisioningserver/rpc/power.py (+23/-17)
src/provisioningserver/rpc/region.py (+10/-5)
src/provisioningserver/rpc/tests/test_power.py (+75/-0)
Changed in maas:
milestone: 1.7.0 → 1.7.1
Changed in maas:
status: Triaged → In Progress
Changed in maas:
milestone: 1.7.1 → 1.7.2
Changed in maas:
milestone: 1.7.2 → 1.7.3
Changed in maas:
milestone: 1.7.3 → 1.9.0
Changed in maas:
status: In Progress → Fix Committed
tags: added: amp
Changed in maas:
status: Fix Committed → Fix Released
I don't think this is too critical, as we can scale out to more clusters, but it is annoying for sure. We're aiming for up to ~2000 nodes per cluster.