unexplained container restarts cause connectivity issues to rabbitmq

Bug #2016541 reported by James Page
This bug affects 2 people
Affects         Status   Importance  Assigned to  Milestone
OpenStack Snap  Triaged  High        Unassigned

Bug Description

During operation of a long-running sunbeam deployment (nearly a month now) I've noted spurious restarts of individual containers. Generally the apps all deal with this fine (mysql, OVN and API services have all restarted in this way), but when the RabbitMQ container gets restarted, the clients panic and end up not interacting with the restarted broker once it's back up and running.

a) we need to understand why the containers are getting restarted
b) something in eventlet/oslo.messaging seems broken in this context at yoga, which needs to be looked at as well

James Page (james-page)
summary: - explained container restarts
+ explained container restarts cause connectivity issues to rabbitmq
Changed in snap-sunbeam:
status: New → Invalid
no longer affects: snap-sunbeam
summary: - explained container restarts cause connectivity issues to rabbitmq
+ unexplained container restarts cause connectivity issues to rabbitmq
James Page (james-page)
Changed in snap-openstack:
status: New → Triaged
importance: Undecided → High
James Page (james-page) wrote :

I managed to capture some K8S events during a set of container restarts; the impacted containers all had an Unhealthy event related to a readiness probe failure.

Pebble fronts the probes for the services; it seems not to have responded in a timely fashion, which is why the containers got restarted.

Note that this is not always the workload container - charm containers are also impacted.
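
For anyone trying to capture the same events, a minimal sketch follows. It assumes the probe failures are recorded with the standard `Unhealthy` event reason; the `unhealthy_events` function name and the `KUBECTL` override variable are hypothetical, introduced only for illustration.

```shell
# Hypothetical helper: list probe-failure events across all namespaces.
# KUBECTL defaults to the microk8s wrapper used in this report; override it
# (e.g. KUBECTL=kubectl) on other clusters.
KUBECTL="${KUBECTL:-microk8s.kubectl}"

unhealthy_events() {
    # reason=Unhealthy is the reason the kubelet records for failed probes;
    # sorting by lastTimestamp puts the most recent failures at the bottom.
    $KUBECTL get events --all-namespaces \
        --field-selector reason=Unhealthy \
        --sort-by=.lastTimestamp
}
```

Calling `unhealthy_events` shortly after a restart should show which containers failed their probes, since Kubernetes only retains events for a limited time.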

James Page (james-page) wrote :

$ microk8s.kubectl get pods --all-namespaces
NAMESPACE                      NAME                                      READY  STATUS   RESTARTS        AGE
controller-microk8s-localhost  modeloperator-5c95787c8b-mfd69            1/1    Running  0               28d
openstack                      modeloperator-68847f7c65-zflt9            1/1    Running  0               28d
kube-system                    calico-kube-controllers-78666d9456-w6c59  1/1    Running  1 (28d ago)     28d
kube-system                    hostpath-provisioner-766849dd9d-qb54f     1/1    Running  3 (6d3h ago)    28d
metallb-system                 speaker-cq5vz                             1/1    Running  0               28d
kube-system                    coredns-d489fb88-lg4zq                    1/1    Running  0               28d
metallb-system                 controller-56c4696b5-n8ql8                1/1    Running  0               28d
kube-system                    calico-node-dbf2p                         1/1    Running  1 (28d ago)     28d
controller-microk8s-localhost  controller-0                              3/3    Running  1 (28d ago)     28d
openstack                      ovn-relay-0                               2/2    Running  12 (17h ago)    28d
openstack                      keystone-0                                2/2    Running  10 (9h ago)     28d
openstack                      placement-0                               2/2    Running  11 (8h ago)     28d
openstack                      traefik-0                                 2/2    Running  21 (6h12m ago)  28d
openstack                      mysql-0                                   2/2    Running  19 (6h12m ago)  28d
openstack                      certificate-authority-0                   1/1    Running  4 (9h ago)      28d
openstack                      horizon-0                                 2/2    Running  10 (18h ago)    28d
openstack                      rabbitmq-0                                2/2    Running  11 (5h27m ago)  28d
openstack                      neutron-0                                 2/2    Running  20 (5h26m ago)  28d
openstack                      ovn-central-0                             4/4    Running  32 (5h26m ago)  28d
openstack                      glance-0                                  2/2    Running  17 (5h26m ago)  28d
openstack                      nova-0                                    4/4    Running  30 (6h13m ago)  28d
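
To see at a glance which pods restart most often, the listing above can be ranked by its RESTARTS column. This is only a sketch: `rank_restarts` is a hypothetical helper name, and it assumes the `--all-namespaces` column layout shown above, where the restart count is the fifth whitespace-separated field.

```shell
# Hypothetical helper: read `kubectl get pods --all-namespaces` output on
# stdin and print "<restarts> <namespace>/<pod>", highest count first.
rank_restarts() {
    # NR > 1 skips the header row; $5 is the restart count, and any
    # trailing "(5h27m ago)" text lands in later fields and is ignored.
    awk 'NR > 1 && NF >= 5 { print $5, $1 "/" $2 }' | sort -k1,1nr
}
```

Usage: `microk8s.kubectl get pods --all-namespaces | rank_restarts | head`.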
