Unclear what '25% connected to region controllers' means

Bug #1695704 reported by Mark Shuttleworth
This bug affects 7 people
Affects: MAAS
Status: Invalid
Importance: High
Assigned to: Unassigned

Bug Description

I have one rack controller that says it is '25% connected'. That isn't very helpful at all! I would like to see which services have problems with a recommendation on how to fix them if that's detectable.

Revision history for this message
Blake Rouse (blake-rouse) wrote :

What that is actually saying is that rackd itself is only 25% connected. All other services that rackd manages are working fine, and rackd itself is working; it's just in a degraded state because it's missing a couple of connections to the regiond. The rackd should automatically fix this issue by reconnecting to the other regiond processes, and it should not cause any issues with operation.

Since this is a connection issue from the rackd to the regiond, we could say which regiond processes you are not connected to instead of showing a percentage. Something like:

rackd -- degraded -- missing connections to my-region:2143, my-region:2144, my-region:2145

or it could be just the regiond names:

rackd -- degraded -- missing 2 connections to my-region and other-region
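A minimal sketch of how the two proposed message formats could be built, assuming the rackd can list the (region host, port) pairs it expects but is not connected to. The function name `describe_rackd_status`, its parameters, and the "running" label for the healthy case are hypothetical, not MAAS code.

```python
from __future__ import annotations


def describe_rackd_status(missing: list[tuple[str, int]], by_host_only: bool = False) -> str:
    """Build a status line naming the missing rackd -> regiond connections.

    `missing` is a hypothetical list of (region_host, port) pairs the rackd
    expected to connect to but currently has no connection for.
    """
    if not missing:
        return "rackd -- running"  # hypothetical label for the healthy case
    if not by_host_only:
        # First proposed form: list every missing endpoint.
        endpoints = ", ".join(f"{host}:{port}" for host, port in missing)
        return f"rackd -- degraded -- missing connections to {endpoints}"
    # Second proposed form: count the missing connections, name only the regions.
    hosts = list(dict.fromkeys(host for host, _ in missing))  # dedupe, keep order
    noun = "connection" if len(missing) == 1 else "connections"
    return f"rackd -- degraded -- missing {len(missing)} {noun} to " + " and ".join(hosts)
```

For example, `describe_rackd_status([("my-region", 2143), ("other-region", 2143)], by_host_only=True)` yields the second message shown above.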

Revision history for this message
Mark Shuttleworth (sabdfl) wrote : Re: [Bug 1695704] Re: Unclear what '25% connected to region controllers' means

I think I only have a single regiond server. How could it be 25%
connected in this case? There are 4 rack controllers in my MAAS, but
only one region-and-rack controller.

Mark

Revision history for this message
Blake Rouse (blake-rouse) wrote :

Each regiond server runs 4 separate regiond processes. Each rack controller connects to each process in the regiond server. Based on the degraded state, your rackd is only connected to 1 of the 4 processes when it should be connected to all 4.
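The arithmetic behind the "25%" can be sketched as connected processes divided by expected processes, assuming (from the explanation above, not from MAAS source) that the expected count is the number of regiond servers times 4 processes each. With one region server and one live connection, that is 1 of 4, or 25%. The names here are hypothetical.

```python
# Assumption taken from the comment above: each regiond server runs 4 processes.
PROCESSES_PER_REGIOND_SERVER = 4


def percent_connected(connected: int, region_servers: int,
                      processes_per_server: int = PROCESSES_PER_REGIOND_SERVER) -> int:
    """Percentage of expected regiond processes the rackd is connected to (floored)."""
    expected = region_servers * processes_per_server
    if expected == 0:
        raise ValueError("no regiond processes expected")
    return 100 * connected // expected


# Single region server, rackd connected to only one of its 4 processes:
print(percent_connected(connected=1, region_servers=1))  # 25
```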

Is this the only rack controller that is reporting this? Or are all rack controllers reporting this degraded state? If it's just that one rack controller, then something in the networking is preventing it from connecting to the other 3 regiond processes.

Do you have any iptables rules in place? Each rackd process connects to the regiond using ports 5250, 5251, 5252, and 5253.
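One way to check for blocked ports from the affected rack controller is a plain TCP connect to each of the four regiond ports listed above. This is a minimal sketch using only the standard library; `probe_regiond_ports` is a hypothetical helper, and a successful TCP connect only shows the port is reachable, not that the regiond RPC handshake will succeed.

```python
import socket


def probe_regiond_ports(host: str, ports=(5250, 5251, 5252, 5253),
                        timeout: float = 2.0) -> dict[int, bool]:
    """Try a TCP connect to each port; True means the connect succeeded."""
    results: dict[int, bool] = {}
    for port in ports:
        try:
            # create_connection resolves the host and opens a TCP socket.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused, timed out (e.g. dropped by iptables), or unresolvable.
            results[port] = False
    return results
```

Run from the rack controller against the region address (for example `probe_regiond_ports("my-region")`), any `False` entry points at the port that needs checking.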

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

Interesting. I believe the flag disappeared after a reboot, not sure
whether it was the region or the rack controller that rebooted.

Nonetheless, the error message itself is poorly constructed, reflecting
internal implementation details. Should a user be concerned? Is there a
degraded performance impact? Is there anything the user can do?

More usefully, the rack and region should self-heal, and provide a log
of attempts to do that, which is accessible from the warning flag. So
seeing the (improved) error, I can click on something which tells me
more about the problem and shows a log of attempts to self-heal, that
allows for better bug reports and debugging. For example, if ports are
blocked and connections fail even though the region controller is
offering them, then that can be logged as "Please check connectivity
from rack-IP to region-IP:port".
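The self-heal-with-log idea could look roughly like the sketch below: retry a connection with exponential backoff, record every attempt, and end with the suggested hint if all attempts fail. Everything here is hypothetical (the `reconnect_with_log` name, the `connect` callable, the retry counts); the `sleep` parameter exists only so the delay can be skipped in tests.

```python
from __future__ import annotations

import time
from typing import Callable


def reconnect_with_log(connect: Callable[[], None], rack_ip: str, region_ip: str,
                       port: int, max_attempts: int = 5, base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> tuple[bool, list[str]]:
    """Retry `connect` with exponential backoff, returning (success, attempt log)."""
    log: list[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            connect()
        except OSError as exc:
            log.append(f"attempt {attempt}/{max_attempts} to {region_ip}:{port} failed: {exc}")
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
        else:
            log.append(f"attempt {attempt}/{max_attempts} to {region_ip}:{port} succeeded")
            return True, log
    # All attempts failed: surface an actionable hint for the user.
    log.append(f"Please check connectivity from {rack_ip} to {region_ip}:{port}")
    return False, log
```

The returned log is the kind of record a warning flag could link to, so a bug report can include the exact endpoint that kept failing.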

Mark

Changed in maas:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.3.0
Changed in maas:
milestone: 2.3.0 → 2.3.x
Revision history for this message
Jerzy Husakowski (jhusakowski) wrote :

Added to MAAS backlog item related to improving diagnosability (internal ref. PF-3726)

Changed in maas:
milestone: 2.3.x → 3.4.0
status: Triaged → Invalid
Alberto Donato (ack)
Changed in maas:
milestone: 3.4.0 → 3.4.0-beta1