"openstack server list" cmd not working after controller lock/unlock

Bug #1840262 reported by Peng Peng
This bug affects 1 person
Affects: StarlingX
Status: Invalid
Importance: Medium
Assigned to: Al Bailey

Bug Description

Brief Description
-----------------
On an SX (one-node) system, the "openstack server list" command stops working entirely after a host lock/unlock.

Severity
--------
Major

Steps to Reproduce
------------------
Lock and unlock the host; after the system boots up, run the "openstack server list" command.

TC-name: z_containers/test_custom_containers.py::test_host_operations_with_custom_kubectl_app

Expected Behavior
------------------
"openstack server list" cmd working after controller lock/unlock

Actual Behavior
----------------
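The command does not work: the first attempt fails with "No route to host" when contacting the keystone endpoint, and a later attempt hangs until the expect timeout.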

Reproducibility
---------------
Seen once

System Configuration
--------------------
One node system

Lab-name: SM-3

Branch/Pull Time/Commit
-----------------------
stx master as of 2019-08-14_20-59-00

Last Pass
---------
Lab: SM_3
Load: 2019-08-13_20-59-00

Timestamp/Logs
--------------
[2019-08-15 08:36:03,679] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock controller-0'

[2019-08-15 08:36:53,045] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-0'

[2019-08-15 08:44:40,412] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-show controller-0'
[2019-08-15 08:44:42,174] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+----------------------------------------------------------------------+
| Property | Value |
+---------------------+----------------------------------------------------------------------+
| action | none |
| administrative | unlocked |
| availability | available |

[2019-08-15 08:47:02,247] 301 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne server list'
[2019-08-15 08:47:09,297] 423 DEBUG MainThread ssh.expect :: Output:
Failed to discover available identity versions when contacting http://keystone.openstack.svc.cluster.local/v3. Attempting to parse version from URL.
Unable to establish connection to http://keystone.openstack.svc.cluster.local/v3/auth/tokens: HTTPConnectionPool(host='keystone.openstack.svc.cluster.local', port=80): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7faa6fc83690>: Failed to establish a new connection: [Errno 113] No route to host',))
controller-0:~$

[2019-08-15 09:43:43,184] 301 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne server list'
[2019-08-15 09:43:53,294] 394 WARNING MainThread ssh.expect :: No match found for ['.*controller\\-[01][:| ].*\\$ '].
expect timeout.
[2019-08-15 09:43:53,294] 779 DEBUG MainThread ssh.send_control:: Sending ctrl+c
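
The log above shows the openstack command being sent about two and a half minutes after host-show reported the controller as unlocked/available, and failing because the keystone endpoint was unreachable. A minimal sketch of a readiness check that a test could run before issuing the command, assuming a plain TCP connect is an acceptable signal; the function names `probe` and `wait_for_endpoint`, the retry count and the delay are illustrative and not part of the test framework:

```python
import errno
import socket
import time


def probe(host, port, timeout=5.0):
    """Try one TCP connection; return None on success or an error name on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except socket.timeout:
        # The connect attempt did not complete within `timeout` seconds.
        return "ETIMEDOUT"
    except OSError as exc:
        # Map the numeric errno (e.g. 113 on Linux) to its symbolic name when known.
        return errno.errorcode.get(exc.errno, str(exc))


def wait_for_endpoint(host, port, attempts=30, delay=10.0,
                      probe_fn=probe, sleep_fn=time.sleep):
    """Poll until the endpoint accepts a connection.

    Returns (ok, failures), where failures lists the error name of each
    unsuccessful attempt in order.
    """
    failures = []
    for attempt in range(attempts):
        result = probe_fn(host, port)
        if result is None:
            return True, failures
        failures.append(result)
        if attempt < attempts - 1:
            sleep_fn(delay)
    return False, failures
```

With the endpoint from the log, the call would be `wait_for_endpoint("keystone.openstack.svc.cluster.local", 80)`. The returned list of failures makes it possible to see whether the endpoint went from unreachable to timing out before it recovered.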

Test Activity
-------------
Sanity

Revision history for this message
Peng Peng (ppeng) wrote :
Wendy Mitchell (wmitchellwr) wrote :

Additional note
subsequent host lock operations fail

270.001 Host controller-0 compute services failure, failed to disable nova services host=controller-0.services=compute critical 2019-08-15T10:33:49

Wendy Mitchell (wmitchellwr) wrote :

See sysinv.log in the attached logs:
2019-08-15 14:33:49.463 94484 INFO sysinv.api.controllers.v1.host [-] controller-0 Pending update_vim_progress_status services-disable-failed
2019-08-15 15:28:32.733 94484 INFO sysinv.api.controllers.v1.host [-] controller-0 Pending update_vim_progress_status services-disable-failed

Wendy Mitchell (wmitchellwr) wrote :
Numan Waheed (nwaheed)
tags: added: stx.retestneeded
Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 for now given this was reported once. This should be triaged to better understand what the issue is.

tags: added: stx.containers stx.sanity
tags: added: stx.3.0
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Al Bailey (albailey1974)
Al Bailey (albailey1974) wrote :

{"log":"2019-08-15 11:19:25.417 15 CRITICAL keystonemiddleware.auth_token [-] Unable to validate token: Unable to establish connection to http://keystone.openstack.svc.cluster.local:80/v3/auth/tokens: HTTPConnectionPool(host='keystone.openstack.svc.cluster.local', port=80): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('\u003curllib3.connection.HTTPConnection object at 0x7fa9a8cde3d0\u003e: Failed to establish a new connection: [Errno 110] ETIMEDOUT',)): ConnectFailure: Unable to establish connection to http://keystone.openstack.svc.cluster.local:80/v3/auth/tokens: HTTPConnectionPool(host='keystone.openstack.svc.cluster.local', port=80): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('\u003curllib3.connection.HTTPConnection object at 0x7fa9a8cde3d0\u003e: Failed to establish a new connection: [Errno 110] ETIMEDOUT',))\n","stream":"stdout","time":"2019-08-15T11:19:25.419681183Z"}

This error is shown in:
cinder-api
glance-api
neutron-server
nova-api-osapi
placement-api

There is a similar error in the mariadb-server container log (note that port 443 is SSL):

{"log":"urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.96.0.1', port=443): Max retries exceeded with url: /version/ (Caused by NewConnectionError('\u003curllib3.connection.VerifiedHTTPSConnection object at 0x7f81ba031b10\u003e: Failed to establish a new connection: [Errno 113] No route to host',))\n","stream":"stderr","time":"2019-08-15T08:46:47.33777689Z"}
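
The container log records above share one JSON shape ("log", "stream", "time") but carry different errno values (110 in the API containers, 113 in mariadb-server). A minimal sketch for pulling the target host, port and errno out of such records so they can be compared across containers, assuming each record sits on a single line; the helper name `summarize` and the regular expressions are illustrative:

```python
import json
import re

# Matches the connection pool target, e.g. host='10.96.0.1', port=443
POOL_RE = re.compile(r"host='([^']+)', port=(\d+)")
# Matches the errno and its text, e.g. [Errno 113] No route to host'
ERRNO_RE = re.compile(r"\[Errno (\d+)\] ([^']*)'")


def summarize(line):
    """Extract timestamp, stream, target and errno from one JSON log record."""
    record = json.loads(line)
    message = record["log"]
    pool = POOL_RE.search(message)
    err = ERRNO_RE.search(message)
    return {
        "time": record.get("time"),
        "stream": record.get("stream"),
        "host": pool.group(1) if pool else None,
        "port": int(pool.group(2)) if pool else None,
        "errno": int(err.group(1)) if err else None,
        "reason": err.group(2) if err else None,
    }
```

Running `summarize` over each line of a container log file and grouping by `errno` would show when a container moved from "No route to host" to timeouts.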

Al Bailey (albailey1974) wrote :

Note: at the time this bug was raised, there were a number of performance problems and timeouts being tracked by https://bugs.launchpad.net/starlingx/+bug/1837426.

Those fixes were checked in within a day or two after this bug was raised. They addressed high CPU usage, which could make pods non-responsive, particularly in AIO environments.

Bad performance might explain the TIMEOUTs, since there are no errors reported in the keystone pods.
It would also explain why the bug was not re-encountered. The performance fixes were submitted shortly afterwards.

Al

Al Bailey (albailey1974)
Changed in starlingx:
status: Triaged → Invalid
Peng Peng (ppeng) wrote :

Issue was not reproduced on train
2019-11-21_20-00-00
wcp_3-6

tags: removed: stx.retestneeded