[2.4] Multiple region/rack controller failures cause the UI to be very slow

Bug #1761601 reported by Andres Rodriguez
Affects: MAAS
Status: Triaged
Importance: Critical
Assigned to: Lee Trager
Milestone: none

Bug Description

4 controllers in total:

2 region/rack
2 racks

With this setup, the UI sometimes failed to load some pages, or was really slow to load them.

Changed in maas:
milestone: none → 2.4.0beta2
importance: Undecided → High
status: New → Triaged
Changed in maas:
assignee: nobody → Lee Trager (ltrager)
Lee Trager (ltrager) wrote :

When using HAProxy with MAAS in HA mode, I am seeing new websocket connections being made every few seconds. Switching to Apache resolves the issue. I tested using the HAProxy configuration from the MAAS docs [1]. We may need to update the documentation as per the HAProxy docs [2] and increase the keep-alive timeout.

[1] https://docs.maas.io/devel/en/manage-ha
[2] https://www.haproxy.com/blog/websockets-load-balancing-with-haproxy/

Andres Rodriguez (andreserl) wrote :

Ok, I've nailed the issue:

My secondary region/rack controller died and I could see this in the logs:

2018-04-11 19:35:17 maasserver: [error] Error while calling DescribePowerTypes: RPC connection timed out to rack controller 'node01' (ah3s6n).
2018-04-11 19:35:31 maasserver: [error] Error while calling DescribePowerTypes: RPC connection timed out to rack controller 'node01' (ah3s6n).
2018-04-11 19:35:31 maasserver: [error] Error while calling DescribePowerTypes: RPC connection timed out to rack controller 'node01' (ah3s6n).
2018-04-11 19:35:41 maasserver: [error] Error while calling DescribePowerTypes: RPC connection timed out to rack controller 'node01' (ah3s6n).
2018-04-11 19:35:47 regiond: [info] 192.168.1.73 POST /ping HTTP/1.1 --> 404 NOT_FOUND (referrer: -; agent: PycURL/7.43.0.1 libcurl/7.58.0 GnuTLS/3.5.18 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3)
2018-04-11 19:35:51 maasserver: [error] Error while calling DescribePowerTypes: RPC connection timed out to rack controller 'node01' (ah3s6n).
2018-04-11 19:35:52 maasserver: [error] Error while calling DescribePowerTypes: RPC connection timed out to rack controller 'node01' (ah3s6n).
2018-04-11 19:36:01 maasserver: [error] Error while calling DescribePowerTypes: RPC connection timed out to rack controller 'node01' (ah3s6n).
2018-04-11 19:36:12 maasserver: [error] Error while calling DescribePowerTypes: RPC connection timed out to rack controller 'node01' (ah3s6n).

Because that controller died, and because the remaining region is still trying to make RPC calls to it, the UI has become really slow to browse.
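
To check whether a similar slowdown lines up with a dead controller, the timeout errors in the log can be tallied per rack controller. The following is only a minimal Python sketch: the regular expression is written against the error lines quoted above, and the names `TIMEOUT_RE` and `count_rpc_timeouts` are illustrative, not part of MAAS.

```python
import re
from collections import Counter

# Matches the "RPC connection timed out" error lines quoted in this report.
# The group names are illustrative; MAAS does not define them.
TIMEOUT_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} maasserver: \[error\] "
    r"Error while calling (?P<call>\w+): RPC connection timed out "
    r"to rack controller '(?P<name>[^']+)' \((?P<system_id>\w+)\)\.$"
)


def count_rpc_timeouts(lines):
    """Count timed-out RPC calls per (controller name, system id, call)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            key = (match["name"], match["system_id"], match["call"])
            counts[key] += 1
    return counts


# Two error lines copied from the log above, plus one unrelated line
# (agent string shortened here) that the pattern should ignore.
sample = [
    "2018-04-11 19:35:17 maasserver: [error] Error while calling "
    "DescribePowerTypes: RPC connection timed out to rack controller "
    "'node01' (ah3s6n).",
    "2018-04-11 19:35:47 regiond: [info] 192.168.1.73 POST /ping HTTP/1.1 "
    "--> 404 NOT_FOUND (referrer: -; agent: PycURL/7.43.0.1)",
    "2018-04-11 19:35:51 maasserver: [error] Error while calling "
    "DescribePowerTypes: RPC connection timed out to rack controller "
    "'node01' (ah3s6n).",
]

counts = count_rpc_timeouts(sample)
for (name, system_id, call), n in counts.items():
    print(f"{name} ({system_id}): {n} timed-out {call} calls")
```

A steadily growing count for a single controller, as with 'node01' here, suggests the region is still repeatedly calling a controller that is no longer reachable.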

Changed in maas:
importance: High → Critical
summary: - [2.4] Multiple region/rack controllers causes the UI to be very slow
+ [2.4] Multiple region/rack controller failures cause the UI to be very slow
Changed in maas:
milestone: 2.4.0beta2 → none