Queens -> Rocky Upgrade breaks cinder/glance

Bug #1793781 reported by Jake Zufelt
This bug affects 1 person
Affects: OpenStack-Ansible
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

We are upgrading from Queens 17.0.8 to Rocky 18.0.0.0rc2 following the guide on https://docs.openstack.org/openstack-ansible/rocky/admin/upgrades/major-upgrades.html#perform-a-controlled-rolling-restart-of-the-galera-containers

After each playbook I check the connectivity of each container by a simple ping, and confirm all backends are up in haproxy.
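A minimal sketch of that per-container ping check, assuming a Linux `ping` that accepts `-c` and `-W`. The function names and the addresses in the usage example are hypothetical, and the `runner` parameter exists only so the check can be exercised without sending real packets.

```python
import subprocess


def ping_ok(address, runner=subprocess.run):
    """Return True if a single ICMP echo to `address` gets a reply."""
    result = runner(
        ["ping", "-c", "1", "-W", "2", address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable(addresses, runner=subprocess.run):
    """Return the addresses that did not answer the ping."""
    return [address for address in addresses if not ping_ok(address, runner)]


if __name__ == "__main__":
    # Hypothetical container addresses; substitute the ones from the inventory.
    containers = ["10.5.25.201", "10.5.25.209"]
    for address in unreachable(containers):
        print(f"no reply from {address}")
```

Running this after each playbook would show the first point at which a container stops answering.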

While running "# openstack-ansible setup-hosts.yml --limit '!galera_all:!neutron_agent:!rabbitmq_all'", the cinder_api, cinder_volumes, and glance containers all fail at "TASK [openstack_hosts : Install distro packages]" because their connectivity is broken. I can no longer ping those 3 containers. From inside the containers I can ping external resources like Google, but I cannot ping anything on the same subnet, so they fail to resolve the yum repos.

No tasks or playbooks have failed up to this point.

Here is a paste of the openstack user config file http://paste.openstack.org/show/730549/

Jake Zufelt (skiedude) wrote :

It's only the containers using the storage network that seem to have the connectivity issue during this playbook. We use the same subnet for the container and storage networks, which seemed to work just fine with Queens.
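A quick way to confirm which networks share a range is to compare the `cidr_networks` entries from the user config. The sketch below assumes hypothetical CIDR values (the real ones are in the pasted config) and a hypothetical helper name. It only reports overlaps; it does not show that the overlap is what breaks under Rocky.

```python
import ipaddress
from itertools import combinations


def overlapping_networks(cidr_networks):
    """Return pairs of network names whose CIDR ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in cidr_networks.items()}
    return [
        (first, second)
        for first, second in combinations(parsed, 2)
        if parsed[first].overlaps(parsed[second])
    ]


# Hypothetical values for illustration only.
example_cidr_networks = {
    "container": "10.5.25.0/24",
    "tunnel": "10.5.26.0/24",
    "storage": "10.5.25.0/24",
}

for first, second in overlapping_networks(example_cidr_networks):
    print(f"{first} and {second} share address space")
```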

Jake Zufelt (skiedude)
description: updated
Jake Zufelt (skiedude) wrote :

I did some poking around and found that the ARP table entries on the infra host are incomplete for those 3 containers:

infra1-glance-container-1e3398b3.openstack.local (10.5.25.201) at <incomplete> on br-mgmt
infra1-cinder-api-container-339018e8.openstack.local (10.5.25.209) at <incomplete> on br-mgmt
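For anyone checking several hosts, a rough sketch that picks the unresolved entries out of output in the format above. The function name is hypothetical, and the pattern assumes the `host (ip) at <hw> on <iface>` layout shown here, with an optional `[ether]`-style tag on resolved entries.

```python
import re

# <hostname> (<ip>) at <mac or "<incomplete>"> [optional hw type] on <interface>
ARP_LINE = re.compile(
    r"^(?P<host>\S+) \((?P<ip>[0-9.]+)\) at (?P<hw>\S+)(?: \[\w+\])? on (?P<iface>\S+)$"
)


def incomplete_arp_entries(arp_output):
    """Return (host, ip, iface) tuples for entries with no resolved MAC address."""
    missing = []
    for line in arp_output.splitlines():
        match = ARP_LINE.match(line.strip())
        if match and match.group("hw") == "<incomplete>":
            missing.append((match.group("host"), match.group("ip"), match.group("iface")))
    return missing


sample = """\
infra1-glance-container-1e3398b3.openstack.local (10.5.25.201) at <incomplete> on br-mgmt
infra1-cinder-api-container-339018e8.openstack.local (10.5.25.209) at <incomplete> on br-mgmt
"""

for host, ip, iface in incomplete_arp_entries(sample):
    print(f"{ip} on {iface}: no ARP reply ({host})")
```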

For kicks I tried creating the ARP entry manually with the container's MAC address, but I still can't ping the container. I can also see ARP requests that seem to go unanswered:

17:34:38.160833 ARP, Request who-has infra1-glance-container-1e3398b3.openstack.local tell infra1.openstack.local, length 28
17:34:39.162931 ARP, Request who-has infra1-glance-container-1e3398b3.openstack.local tell infra1.openstack.local, length 28
17:34:40.164899 ARP, Request who-has infra1-glance-container-1e3398b3.openstack.local tell infra1.openstack.local, length 28
17:34:41.166860 ARP, Request who-has infra1-glance-container-1e3398b3.openstack.local tell infra1.openstack.local, length 28

Jake Zufelt (skiedude) wrote :

I tried a fresh Rocky install this morning after rekicking the bare-metal hosts, and the same 3 containers failed. So the issue seems to be with the combination of my config and Rocky.

Mohammed Naser (mnaser)
Changed in openstack-ansible:
status: New → Invalid