regiond stops listening on the API port 5240 until regiond is restarted (listening sockets are lost) after database failover
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
MAAS | Triaged | High | Alberto Donato | | |
Bug Description
Forked from https:/
In some database failover cases, regiond's listening sockets are lost and never re-established, even though the regiond processes keep running and still respond to database notifications (e.g. they update the bind9 configuration after DNS record changes).
This was verified twice on independent test beds after several DB failovers:
https:/
https:/
See maas-vhost2 logs:
https:/
# no listening sockets
ubuntu@
State Recv-Q Send-Q Local Address:Port Peer Address:Port
# however, the processes are running
ubuntu@
1002 /bin/sh -c exec /usr/sbin/regiond 2>&1 | tee -a $LOGFILE
1004 /usr/bin/python3 /usr/sbin/regiond
1005 tee -a /var/log/
1967 /usr/bin/python3 /usr/sbin/regiond
1969 /usr/bin/python3 /usr/sbin/regiond
1972 /usr/bin/python3 /usr/sbin/regiond
1973 /usr/bin/python3 /usr/sbin/regiond
regiond is definitely receiving DNS updates, since the zone files have the right content; however, no API socket is listening (port 5240):
ubuntu@
; Zone file modified: 2019-02-28 23:26:43.245638.
$TTL 30
#...
@ 30 IN NS maas.
maas-region 0 IN A 10.100.1.2
ubuntu@
; Zone file modified: 2019-02-28 23:26:32.619195.
$TTL 30
#...
@ 30 IN NS maas.
maas-region 0 IN A 10.100.1.2
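The symptom described above (regiond processes alive, but nothing accepting connections on port 5240) can be checked with a plain TCP probe. The following is only a minimal sketch, not part of MAAS or of the resource agent: the function name `is_port_open` and the use of `localhost` as the target host are assumptions made for illustration.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # or a timeout) when nothing is accepting connections on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 5240 is the regiond API port named in this report;
    # "localhost" is an assumed target for illustration.
    state = "listening" if is_port_open("localhost", 5240) else "NOT listening"
    print(f"regiond API port 5240 is {state}")
```

A probe like this only shows whether the port accepts connections; it does not say whether the regiond processes themselves are running, which is why the process listing above is still needed to tell this bug apart from a stopped service.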
Given our resource agent tries to use http://
root@maas-vhost2:~# grep -B1 -A1 -a refused /var/log/
Feb 28 23:26:30 [1401] maas-vhost2 lrmd: notice: operation_finished: res_maas_
Feb 28 23:26:30 [1401] maas-vhost2 lrmd: notice: operation_finished: res_maas_
Feb 28 23:26:30 [1401] maas-vhost2 lrmd: notice: operation_finished: res_maas_
--
Feb 28 23:26:30 [1401] maas-vhost2 lrmd: notice: operation_finished: res_maas_
Feb 28 23:26:30 [1401] maas-vhost2 lrmd: notice: operation_finished: res_maas_
Feb 28 23:26:30 [1401] maas-vhost2 lrmd: info: log_finished: finished - rsc:res_
Maybe the sockets were gone even before that because I can see "request to http://
Changed in maas: | |
importance: | Undecided → Critical |
status: | New → Triaged |
assignee: | nobody → Blake Rouse (blake-rouse) |
milestone: | none → 2.5.3 |
importance: | Critical → High |
Changed in maas: | |
milestone: | 2.5.3 → 2.5.4 |
summary: |
- regiond sockets lost after database failover
+ regiond stops listening on the API port 5240 (listening sockets are lost) after database failover |
Changed in maas: | |
assignee: | Blake Rouse (blake-rouse) → Alberto Donato (ack) |
summary: |
- regiond stops listening on the API port 5240 (listening sockets are lost) after database failover
+ regiond stops listening on the API port 5240 until regiond is restarted (listening sockets are lost) after database failover |
Changed in maas: | |
milestone: | 2.5.4 → none |
Subscribed ~field-high for tracking.