Juju controller keeps restarting when deployed with juju-ha-space and juju-mgmt-space
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | High | Ian Booth |
Bug Description
# Version
Juju 2.9.27
MAAS 2.9.2
# Problem
After bootstrapping a Juju controller on MAAS with juju-ha-space and juju-mgmt-space set, the Juju controller agent seems to restart every few minutes. The controller machine logs contain many "connection broken unexpectedly" messages, timeouts and unexpected errors.
When bootstrapping in the same environment without these controller config options, the controller agent works OK (no restarts and no issues in the logs).
# Controller log and IP info
ubuntu@
ubuntu@
lo UNKNOWN 127.0.0.1/8 ::1/128
eth0 UP 10.42.198.122/23 fe80::5054:
eth1 UP 10.42.200.16/23 fe80::5054:
ubuntu@
default via 10.42.200.1 dev eth1 proto static
10.42.198.0/23 dev eth0 proto kernel scope link src 10.42.198.122
10.42.200.0/23 dev eth1 proto kernel scope link src 10.42.200.16
# Other info
ubuntu@
Name Space ID Subnets
alpha 0
site2-oob 1 10.42.196.0/23
site2-os-public 2 10.42.208.0/23
site2-neutron-
site2-os-data 4 172.17.100.0/23
site2-os-internal 5 172.17.102.0/23
site2-ceph-public 6 172.17.104.0/23
site2-ceph-cluster 7 172.17.106.0/23
site2-oam 8 10.42.200.0/23
site2-provision 9 10.42.198.0/23
ubuntu@
default-space: site2-oam
juju-ha-space: site2-oam
juju-mgmt-space: site2-oam
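The following minimal Python sketch cross-references the controller's two addresses with the `juju spaces` listing above, to show which of them actually falls inside the configured juju-ha-space / juju-mgmt-space (site2-oam). It is an illustration under stated assumptions, not Juju's own space-selection logic: the `SPACES` and `CONTROLLER_ADDRS` tables are copied by hand from this report (the truncated `site2-neutron-` entry is omitted), and `space_of` / `addrs_in_space` are hypothetical helper names.

```python
import ipaddress
from typing import Dict, Optional

# Space -> subnets, copied from the `juju spaces` output above.
# "site2-neutron-..." is truncated in the report, so it is left out here.
SPACES: Dict[str, list] = {
    "site2-oob": ["10.42.196.0/23"],
    "site2-os-public": ["10.42.208.0/23"],
    "site2-os-data": ["172.17.100.0/23"],
    "site2-os-internal": ["172.17.102.0/23"],
    "site2-ceph-public": ["172.17.104.0/23"],
    "site2-ceph-cluster": ["172.17.106.0/23"],
    "site2-oam": ["10.42.200.0/23"],
    "site2-provision": ["10.42.198.0/23"],
}

# Controller machine addresses, copied from the `ip` output above.
CONTROLLER_ADDRS = {"eth0": "10.42.198.122", "eth1": "10.42.200.16"}


def space_of(addr: str) -> Optional[str]:
    """Return the name of the space whose subnets contain addr, if any."""
    ip = ipaddress.ip_address(addr)
    for space, subnets in SPACES.items():
        if any(ip in ipaddress.ip_network(s) for s in subnets):
            return space
    return None


def addrs_in_space(space: str) -> Dict[str, str]:
    """Return the controller interfaces whose address lies in the given space."""
    return {dev: a for dev, a in CONTROLLER_ADDRS.items() if space_of(a) == space}


if __name__ == "__main__":
    for dev, addr in CONTROLLER_ADDRS.items():
        print(dev, addr, "->", space_of(addr))
    print("in juju-ha-space (site2-oam):", addrs_in_space("site2-oam"))
```

Run against the values in this report, it places 10.42.198.122 (eth0) in site2-provision and only 10.42.200.16 (eth1) in site2-oam.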
ubuntu@
Creating Juju controller "prodmaas-
Looking for packaged Juju agent version 2.9.27 for amd64
Located Juju agent version 2.9.27-ubuntu-amd64 at https:/
Launching controller instance(s) on prodmaas-
- ccy63p (arch=amd64 mem=8G cores=2)
Installing Juju agent on bootstrap instance
Fetching Juju Dashboard 0.8.1
Waiting for address
Attempting to connect to 10.42.198.122:22
Attempting to connect to 10.42.200.16:22
Connected to 10.42.198.122
Running machine configuration script...
Bootstrap agent now started
Contacting Juju controller at 10.42.198.122 to verify accessibility...
Bootstrap complete, controller "prodmaas-
Controller machines are in the "controller" model
Initial model "default" added
ubuntu@
prodmaas-
details:
uuid: 1bc6a86b-
controller-
api-endpoints: ['10.42.
cloud: prodmaas-site2
region: default
agent-version: 2.9.27
agent-
controller-
mongo-version: 4.4.11
ca-fingerprint: 95:B7:5A:
ca-cert: |
-----BEGIN CERTIFICATE-----
MIIEEjCCA
BQAwITENM
MzE5MTNaF
B2p1anUtY
bw/
hAb9Bg59K
SlUj3bq4u
8ByadTXsl
VFf0WwxNJ
FdAIy9u1v
8oPLj4Y0z
G7mIYtNMR
MEAwDgYDV
hK4JNFdQc
HHcNIHm1K
8LCIYgTBU
G6GkxqBZw
jbC1MQpZW
XWL8jP7oz
1S7fA/
/
3O4ZOsZo6
-----END CERTIFICATE-----
controller-
"0":
instance-id: ccy63p
models:
controller:
uuid: bcc708d7-
model-uuid: bcc708d7-
machine-
core-count: 2
default:
uuid: 252a0d89-
model-uuid: 252a0d89-
current-model: admin/default
account:
user: admin
access: superuser
ubuntu@
Attribute Value
agent-logfile-
agent-logfile-
api-port 17070
api-port-open-delay 2s
audit-log-
audit-log-
audit-log-
audit-log-max-size 300M
auditing-enabled true
batch-raft-fsm false
ca-cert |
-----BEGIN CERTIFICATE-----
MIIEEjCCAnqgA
BQAwITENMAsGA
MzE5MTNaFw0zM
B2p1anUtY2Ewg
bw/jYojYF5A0p
hAb9Bg59Keagr
SlUj3bq4uyqcm
8ByadTXslytNF
VFf0WwxNJPQb9
FdAIy9u1vHf2b
8oPLj4Y0zTgp6
G7mIYtNMR4c1C
MEAwDgYDVR0PA
hK4JNFdQcou+
HHcNIHm1K25JX
8LCIYgTBUDYo7
G6GkxqBZwLVue
jbC1MQpZWrXGP
XWL8jP7ozIrrb
1S7fA/
/Zs6QNXF/
3O4ZOsZo6B8Zm
-----END CERTIFICATE-----
charmstore-url https:/
controller-name prodmaas-
controller-uuid 1bc6a86b-
juju-db-
juju-ha-space site2-oam
juju-mgmt-space site2-oam
max-agent-
max-charm-
max-debug-
max-prune-
max-prune-
max-txn-log-size 10M
metering-url https:/
migration-
model-logfile-
model-logfile-
model-logs-size 20M
mongo-memory-
non-synced-
prune-txn-
prune-txn-
set-numa-
state-port 37017
ubuntu@
15:42:19 INFO juju.cmd supercommand.go:56 running juju [2.9.27 0 acb32588d1752e8
15:42:19 DEBUG juju.cmd supercommand.go:57 args: []string{
15:42:19 INFO juju.juju api.go:78 connecting to API addresses: [10.42.
15:42:19 DEBUG juju.api apiclient.go:1153 successfully dialed "wss://
15:42:19 INFO juju.api apiclient.go:688 connection established to "wss://
Model Controller Cloud/Region Version SLA Timestamp
controller prodmaas-
Machine State DNS Inst id Series AZ Message
0 started 10.42.198.122 hcc-admin26-vm01 focal hcc-rack11
15:42:19 DEBUG juju.api monitor.go:35 RPC connection died
15:42:19 INFO cmd supercommand.go:544 command finished
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Hello,
I notice your controller has 2 network interfaces/spaces. I may be having the same issue right now.
Just to be sure, can you provide the logs of /var/snap/juju-db/common/logs/mongodb.log?
Another good test: is your controller stable if you set juju-ha-space to site2-provision instead when bootstrapping?