Comment 0 for bug 1881899

Ovidiu Poncea (ovidiuponcea) wrote:

Brief Description
-----------------
Rebooted both controllers, but OpenStack fails to come back up.

[root@controller-1 sysadmin(keystone_admin)]# openstack endpoint list
Failed to discover available identity versions when contacting http://keystone.openstack.svc.cluster.local/v3. Attempting to parse version from URL.
Service Unavailable (HTTP 503)
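
A rough way to confirm the 503 comes from keystone itself rather than the ingress; these are standard kubectl commands (the namespace and pod name are taken from the pod listing further down), output not captured here:

kubectl -n openstack get pods -o wide | grep keystone
kubectl -n openstack get svc | grep keystone
kubectl -n openstack describe pod keystone-api-6c76774bf7-l7c4d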

Node status:
[root@controller-1 sysadmin(keystone_admin)]# system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | degraded     |
| 2  | controller-1 | controller  | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
[root@controller-1 sysadmin(keystone_admin)]# fm alarm-list
+----------+---------------------------------------------------------------------+------------------------+----------+--------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+---------------------------------------------------------------------+------------------------+----------+--------------+
| 400.001 | Service group cloud-services warning; dbmon(enabled-active, ) | service_domain= | minor | 2020-06-03T1 |
| | | controller. | | 3:02:13. |
| | | service_group=cloud- | | 385662 |
| | | services.host= | | |
| | | controller-1 | | |
| | | | | |
| 400.001 | Service group cloud-services warning; dbmon(enabled-standby, ) | service_domain= | minor | 2020-06-03T1 |
| | | controller. | | 3:01:13. |
| | | service_group=cloud- | | 112434 |
| | | services.host= | | |
| | | controller-0 | | |
| | | | | |
| 200.006 | controller-0 is degraded due to the failure of its 'pci-irq- | host=controller-0. | major | 2020-06-03T1 |
| | affinity-agent' process. Auto recovery of this major process is in | process=pci-irq- | | 2:46:13. |
| | progress. | affinity-agent | | 918380 |
| | | | | |
+----------+---------------------------------------------------------------------+------------------------+----------+--------------+
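
The dbmon warnings above can be cross-checked from the service manager's side; as a sketch, the standard StarlingX command below (run as root on a controller) should show the per-service state of the cloud-services group. Its output was not captured for this report:

sudo sm-dump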

[root@controller-1 sysadmin(keystone_admin)]# kubectl get pods -o wide -n openstack | grep -v Running | grep -v Completed
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cinder-api-59979594ff-25hrl 0/1 Init:0/2 0 26m 172.16.192.109 controller-0 <none> <none>
cinder-backup-6dd95fc9dd-svp5r 0/1 Init:0/4 0 26m 172.16.192.123 controller-0 <none> <none>
cinder-scheduler-76c65f6979-5tmt5 0/1 Init:0/2 0 26m 172.16.192.82 controller-0 <none> <none>
cinder-volume-b7dfbb7b9-f47bk 0/1 Init:0/4 0 26m 172.16.192.90 controller-0 <none> <none>
cinder-volume-b7dfbb7b9-mtkz8 0/1 Init:3/4 7 9h 172.16.166.185 controller-1 <none> <none>
cinder-volume-usage-audit-1591187700-g64xf 0/1 Init:0/1 0 27m 172.16.166.157 controller-1 <none> <none>
fm-rest-api-8b5b97bf8-qdlbx 0/1 Init:0/1 0 26m 172.16.192.67 controller-0 <none> <none>
fm-rest-api-8b5b97bf8-v5wz9 0/1 CrashLoopBackOff 9 8h 172.16.166.140 controller-1 <none> <none>
glance-api-6b74f659d-w9t4g 0/1 Init:0/3 0 26m 172.16.192.101 controller-0 <none> <none>
heat-api-846d848bd9-hd46z 0/1 Init:0/1 0 26m 172.16.192.108 controller-0 <none> <none>
heat-cfn-9d6f7ffc5-rvb4d 0/1 Init:0/1 0 26m 172.16.192.73 controller-0 <none> <none>
heat-engine-6487ff65c6-zk4n7 0/1 Init:0/1 0 26m 172.16.192.80 controller-0 <none> <none>
heat-engine-cleaner-1591187700-kd2pn 0/1 Init:0/1 0 27m 172.16.166.156 controller-1 <none> <none>
horizon-65d4b5bdcf-ltms2 0/1 Init:0/1 0 21m 172.16.192.83 controller-0 <none> <none>
keystone-api-6c76774bf7-l7c4d 0/1 Init:0/1 0 26m 172.16.192.113 controller-0 <none> <none>
libvirt-libvirt-default-4mf4v 0/1 Init:0/3 1 9h 192.168.204.2 controller-0 <none> <none>
mariadb-server-0 0/1 CrashLoopBackOff 8 9h 172.16.166.158 controller-1 <none> <none>
neutron-dhcp-agent-controller-0-937646f6-r5skk 0/1 Init:0/1 1 9h 192.168.204.2 controller-0 <none> <none>
neutron-l3-agent-controller-0-937646f6-fk5hc 0/1 Init:0/1 1 9h 192.168.204.2 controller-0 <none> <none>
neutron-metadata-agent-controller-0-937646f6-nzkxz 0/1 Init:0/2 1 9h 192.168.204.2 controller-0 <none> <none>
neutron-ovs-agent-controller-0-937646f6-dbfc2 0/1 Init:0/3 1 9h 192.168.204.2 controller-0 <none> <none>
neutron-server-7c9678cf58-dq52p 0/1 Init:0/1 0 26m 172.16.192.74 controller-0 <none> <none>
neutron-server-7c9678cf58-s85dg 0/1 CrashLoopBackOff 8 9h 172.16.166.191 controller-1 <none> <none>
neutron-sriov-agent-controller-0-937646f6-qxt9g 0/1 Init:0/2 1 9h 192.168.204.2 controller-0 <none> <none>
nova-api-metadata-b9b4fdb9b-d2gr6 0/1 CrashLoopBackOff 8 9h 172.16.166.177 controller-1 <none> <none>
nova-api-metadata-b9b4fdb9b-kg859 0/1 Init:0/2 0 26m 172.16.192.110 controller-0 <none> <none>
nova-api-osapi-856679d49f-4ljnl 0/1 Init:0/1 0 26m 172.16.192.117 controller-0 <none> <none>
nova-compute-controller-0-937646f6-9lrqs 0/2 Init:0/6 1 9h 192.168.204.2 controller-0 <none> <none>
nova-conductor-6cbc75dd89-nxvwc 0/1 Init:0/1 0 26m 172.16.192.66 controller-0 <none> <none>
nova-novncproxy-5bd676cfc4-82r8x 0/1 Init:0/3 0 26m 172.16.192.72 controller-0 <none> <none>
nova-scheduler-7fbf5cdd4-ckmkd 0/1 CrashLoopBackOff 6 9h 172.16.166.172 controller-1 <none> <none>
nova-scheduler-7fbf5cdd4-h65j5 0/1 Init:0/1 0 26m 172.16.192.121 controller-0 <none> <none>
nova-service-cleaner-1591189200-927h5 0/1 Init:0/1 0 6m56s 172.16.192.92 controller-0 <none> <none>
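
mariadb-server-0 on controller-1 is in CrashLoopBackOff, and most of the pods stuck in Init are likely waiting on it (and on the other crash-looping services). A rough way to pull its failure details, using standard kubectl commands and the pod name from the listing above:

kubectl -n openstack describe pod mariadb-server-0
kubectl -n openstack logs mariadb-server-0 --previous
kubectl -n openstack get events --sort-by=.lastTimestamp | tail -n 30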

Severity
--------
Critical: OpenStack is unusable

Steps to Reproduce
------------------
1. Reboot both controllers with 'reboot -f' (a rough command sketch follows below)
2. Wait for them to come back up
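
Sketch of the sequence above; the SSH user and host names are assumptions and should be adjusted to the lab setup:

ssh sysadmin@controller-0 'sudo reboot -f'
ssh sysadmin@controller-1 'sudo reboot -f'
# once both controllers are back up and unlocked/enabled:
system host-list
openstack endpoint list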

Expected Behavior
------------------
'openstack endpoint list' should succeed once both controllers come back up.

Actual Behavior
----------------
'openstack endpoint list' fails with "Service Unavailable (HTTP 503)", and most OpenStack pods remain stuck in Init or CrashLoopBackOff after the reboot (see the pod listing above).

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
AIO-DX ipv4

Branch/Pull Time/Commit
-----------------------
master

Last Pass
---------
Did this test scenario pass previously? If so, please indicate the load/pull time info of the last pass.
Use this section to also indicate if this is a new test scenario.

Test Activity
-------------
Developer Testing

Workaround
----------
None