compute-0 host lock failed vim_progress_status is services-disable-failed

Bug #1841448 reported by Anujeyan Manokeran
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
StarlingX
Invalid
Medium
YaoLe

Bug Description

Brief Description
-----------------.
    During the regression run it was observer host-lock on compute was failing VIM progress state is services disabled failed as per below logs.

'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-show compute-0'
[2019-08-25 00:52:54,727] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------------------+
| Property | Value |
+---------------------+--------------------------------------------+
| action | none |
| administrative | unlocked |
| availability | available |
| bm_ip | 128.224.64.25 |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| capabilities | {} |
| config_applied | 4416743c-f0eb-4981-a4d5-3cdb3f2fd342 |
| config_status | None |
| config_target | 4416743c-f0eb-4981-a4d5-3cdb3f2fd342 |
| console | ttyS0,115200 |
| created_at | 2019-08-24T20:33:09.884522+00:00 |
| hostname | compute-0 |
| id | 4 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| inv_state | inventoried |
| invprovision | provisioned |
| location | {u'locn': u''} |
| mgmt_ip | 192.168.204.195 |
| mgmt_mac | 00:1e:67:d0:03:41 |
| operational | enabled |
| personality | worker |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| serialid | None |
| software_load | 19.08 |
| task | |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-08-25T00:52:35.835981+00:00 |
| uptime | 518 |
| uuid | f8800e67-5e9f-485e-988a-4eafb1cc277c |
| vim_progress_status | services-enabled |
+---------------------+--------------------------------------------+

Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock compute-0'
[2019-08-25 00:52:56,455] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------------------+
| Property | Value |
+---------------------+--------------------------------------------+
| action | none |
| administrative | unlocked |
| availability | available |
| bm_ip | 128.224.64.25 |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| capabilities | {} |
| config_applied | 4416743c-f0eb-4981-a4d5-3cdb3f2fd342 |
| config_status | None |
| config_target | 4416743c-f0eb-4981-a4d5-3cdb3f2fd342 |
| console | ttyS0,115200 |
| created_at | 2019-08-24T20:33:09.884522+00:00 |
| hostname | compute-0 |
| id | 4 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| inv_state | inventoried |
| invprovision | provisioned |
| location | {u'locn': u''} |
| mgmt_ip | 192.168.204.195 |
| mgmt_mac | 00:1e:67:d0:03:41 |
| operational | enabled |
| personality | worker |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| serialid | None |
| software_load | 19.08 |
| task | Locking |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-08-25T00:52:35.835981+00:00 |
| uptime | 518 |
| uuid | f8800e67-5e9f-485e-988a-4eafb1cc277c |
| vim_progress_status | services-enabled |
+---------------------+----------------------------------------

'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-show compute-0'
[2019-08-25 00:52:57,997] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------------------+
| Property | Value |
+---------------------+--------------------------------------------+
| action | none |
| administrative | unlocked |
| availability | available |
| bm_ip | 128.224.64.25 |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| capabilities | {} |
| config_applied | 4416743c-f0eb-4981-a4d5-3cdb3f2fd342 |
| config_status | None |
| config_target | 4416743c-f0eb-4981-a4d5-3cdb3f2fd342 |
| console | ttyS0,115200 |
| created_at | 2019-08-24T20:33:09.884522+00:00 |
| hostname | compute-0 |
| id | 4 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| inv_state | inventoried |
| invprovision | provisioned |
| location | {u'locn': u''} |
| mgmt_ip | 192.168.204.195 |
| mgmt_mac | 00:1e:67:d0:03:41 |
| operational | enabled |
| personality | worker |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| serialid | None |
| software_load | 19.08 |
| task | Locking |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-08-25T00:52:56.301829+00:00 |
| uptime | 518 |
| uuid | f8800e67-5e9f-485e-988a-4eafb1cc277c |
| vim_progress_status | services-enabled |
+---------------------+--------------------------------------------+

Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-show compute-0'
[2019-08-25 00:53:16,513] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------------------+
| Property | Value |
+---------------------+--------------------------------------------+
| action | none |
| administrative | unlocked |
| availability | available |
| bm_ip | 128.224.64.25 |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| capabilities | {} |
| config_applied | 4416743c-f0eb-4981-a4d5-3cdb3f2fd342 |
| config_status | None |
| config_target | 4416743c-f0eb-4981-a4d5-3cdb3f2fd342 |
| console | ttyS0,115200 |
| created_at | 2019-08-24T20:33:09.884522+00:00 |
| hostname | compute-0 |
| id | 4 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| inv_state | inventoried |
| invprovision | provisioned |
| location | {u'locn': u''} |
| mgmt_ip | 192.168.204.195 |
| mgmt_mac | 00:1e:67:d0:03:41 |
| operational | enabled |
| personality | worker |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| serialid | None |
| software_load | 19.08 |
| task | Locking- |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-08-25T00:53:12.278134+00:00 |
| uptime | 578 |
| uuid | f8800e67-5e9f-485e-988a-4eafb1cc277c |
| vim_progress_status | services-enabled |
+---------------------+--------------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$
[2019-08-25 00:53:16,513] 301 DEBUG MainThread ssh.send :: Send 'echo $?'
[2019-08-25 00:53:16,616] 423 DEBUG MainThread ssh.expect :: Output:
0
[sysadmin@controller-0 ~(keystone_admin)]$
[2019-08-25 00:53:19,619] 1534 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wcp_3_6
[2019-08-25 00:53:19,619] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-08-25 00:53:19,619] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-show compute-0'
[2019-08-25 00:53:21,138] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------------------+
| Property | Value |
+---------------------+--------------------------------------------+
| action | none |
| administrative | unlocked |
| availability | available |
| bm_ip | 128.224.64.25 |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| capabilities | {} |
| config_applied | 4416743c-f0eb-4981-a4d5-3cdb3f2fd342 |
| config_status | None |
| config_target | 4416743c-f0eb-4981-a4d5-3cdb3f2fd342 |
| console | ttyS0,115200 |
| created_at | 2019-08-24T20:33:09.884522+00:00 |
| hostname | compute-0 |
| id | 4 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| inv_state | inventoried |
| invprovision | provisioned |
| location | {u'locn': u''} |
| mgmt_ip | 192.168.204.195 |
| mgmt_mac | 00:1e:67:d0:03:41 |
| operational | enabled |
| personality | worker |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| serialid | None |
| software_load | 19.08 |
| task | |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-08-25T00:53:18.628005+00:00 |
| uptime | 578 |
| uuid | f8800e67-5e9f-485e-988a-4eafb1cc277c |
| vim_progress_status | services-disable-failed |
+---------------------+--------------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$
[2019-08-25 00:53:21,138] 301 DEBUG MainThread ssh.send :: Send 'echo $?'
[2019-08-25 00:53:21,241] 423 DEBUG MainThread ssh.expect :: Output:
0
[sysadmin@controller-0 ~(keystone_admin)]$
[2019-08-25 00:53:21,242] 3724 INFO MainThread system_helper.wait_for_host_values:: compute-0 task has reached:
[2019-08-25 00:53:21,242] 3732 INFO MainThread system_helper.wait_for_host_values:: compute-0 is in state(s): {'task': ''}
[2019-08-25 00:53:21,242] 3685 INFO MainThread system_helper.wait_for_host_values:: Waiting for compute-0 to reach state(s) - {'administrative': 'locked'}
[2019-08-25 00:53:21,242] 1534 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wcp_3_6
[2019-08-25 00:53:21,242] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-08-25 00:53:21,242] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-show compute-0'
[2019-08-25 00:53:22,771] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------------------+
| Property | Value |
+---------------------+--------------------------------------------+
| action | none |
| administrative | unlocked |
| availability | available |
| bm_ip | 128.224.64.25 |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| capabilities | {} |
| config_applied | 4416743c-f0eb-4981-a4d5-3cdb3f2fd342 |
| config_status | None |
| config_target | 4416743c-f0eb-4981-a4d5-3cdb3f2fd342 |
| console | ttyS0,115200 |
| created_at | 2019-08-24T20:33:09.884522+00:00 |
| hostname | compute-0 |
| id | 4 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| inv_state | inventoried |
| invprovision | provisioned |
| location | {u'locn': u''} |
| mgmt_ip | 192.168.204.195 |
| mgmt_mac | 00:1e:67:d0:03:41 |
| operational | enabled |
| personality | worker |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| serialid | None |
| software_load | 19.08 |
| task | |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-08-25T00:53:18.628005+00:00 |
| uptime | 578 |
| uuid | f8800e67-5e9f-485e-988a-4eafb1cc277c |
| vim_progress_status | services-disable-failed |
+---------------------+--------------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$
[2019-08-25 00:53:22,771] 301 DEBUG MainThread ssh.send :: Send 'echo $?'
[2019-08-25 00:53:22,874] 423 DEBUG MainThread ssh.expect :: Output:
0
[sysadmin@controller-0 ~(keystone_admin)]$
[2019-08-25 00:53:22,875] 3728 INFO MainThread system_helper.wait_for_host_values:: compute-0 administrative is unlocked.
[2019-08-25 00:53:25,878] 1534 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wcp_3_6
[2019-08-25 00:53:25,878] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-08-25 00:53:25,878] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-show compute-0'
[2019-08-25 00:53:27,388] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------------------+
| Property | Value |
+---------------------+--------------------------------------------+
| action | none |
| administrative | unlocked |
| availability | available |
| bm_ip | 128.224.64.25 |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| capabilities | {} |
| config_applied | 4416743c-f0eb-4981-a4d5-3cdb3f2fd342 |
| config_status | None |
| config_target | 4416743c-f0eb-4981-a4d5-3cdb3f2fd342 |
| console | ttyS0,115200 |
| created_at | 2019-08-24T20:33:09.884522+00:00 |
| hostname | compute-0 |
| id | 4 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| inv_state | inventoried |
| invprovision | provisioned |
| location | {u'locn': u''} |
| mgmt_ip | 192.168.204.195 |
| mgmt_mac | 00:1e:67:d0:03:41 |
| operational | enabled |
| personality | worker |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| serialid | None |
| software_load | 19.08 |
| task | |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-08-25T00:53:18.628005+00:00 |
| uptime | 578 |
| uuid | f8800e67-5e9f-485e-988a-4eafb1cc277c |
| vim_progress_status | services-disable-failed |
+---------------------+--------------------------------------------+

-08-25T00:53:19.302 controller-0 VIM_Thread[4151030] INFO _instance_director.py.1820 Completed recovery audit cycle.
2019-08-25T00:53:28.827 controller-0 VIM_Thread[4151030] INFO _network_rebalance.py.1125 Triggering L3 Agent reschedule for enabled l3 agent host(s): [u'compute-0']
2019-08-25T00:53:28.827 controller-0 VIM_Thread[4151030] INFO _dhcp_rebalance.py.925 Triggering DHCP Agent reschedule for enabled dhcp agent host(s): [u'compute-0']
2019-08-25T00:53:32.160 controller-0 VIM_Thread[4151030] INFO _network_rebalance.py.376 L3 Agent routers balanced, max:2 min:0 threshold:3
2019-08-25T00:53:32.266 controller-0 VIM_Thread[4151030] INFO _dhcp_rebalance.py.286 DHCP Agent networks balanced, max:7 min:5 threshold:3
2019-08-25T00:54:10.595 controller-0 VIM_Thread[4151030] INFO _vim_nfvi_audits.py.873 Audit instances called, timer_id=15.
2019-08-25T00:55:03.736 controller-0 VIM_Network-Worker-0_Thread[1580564] ERROR Caught exception trying to disable host compute-0 agents: [OpenStack Exception: method=GET, url=http://neutron-server.openstack.svc.cluster.local:9696//v2.0/agents?host=compute-0, headers={'Content-Type': 'application/json'}, body=None, reason=<urlopen error [Errno 110] Connection timed out>]
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/nfv_plugins/nfvi_plugins/openstack/neutron.py", line 740, in disable_network_agents
    token, host_name)
  File "/usr/lib64/python2.7/site-packages/nfv_plugins/nfvi_plugins/openstack/neutron.py", line 260, in _get_network_agents
    response = rest_api_request(token, "GET", api_cmd, api_cmd_headers)
  File "/usr/lib64/python2.7/site-packages/nfv_plugins/nfvi_plugins/openstack/rest_api.py", line 435, in rest_api_request
    api_cmd_headers, api_cmd_payload)
  File "/usr/lib64/python2.7/site-packages/nfv_plugins/nfvi_plugins/openstack/rest_api.py", line 425, in _rest_api_request
    api_cmd_payload, str(e), str(e))
OpenStackException: [OpenStack Exception: method=GET, url=http://neutron-server.openstack.svc.cluster.local:9696//v2.0/agents?host=compute-0, headers={'Content-Type': 'application/json'}, body=None, reason=<urlopen error [Errno 110] Connection timed out>]
2019-08-25T00:55:03.737 controller-0 VIM_Network-Worker-0_Thread[1580564] INFO _timer_scheduler.py.57 Not scheduling on time, elapsed=127447 ms.
2019-08-25T00:55:03.738 controller-0 VIM_Network-Worker-0_Thread[1580564] INFO _thread.py.218 Thread Network-Worker-0: not scheduling on time
2019-08-25T00:55:03.739 controller-0 VIM_Network-Worker-0_Thread[1580564] INFO _thread.py.218 Thread Network-Worker-0: not scheduling on time
2019-08-25T00:55:03.7

Severity
--------
Major

Steps to Reproduce
------------------
1. Verify system health. No alarm openstack application applied .
2. Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock compute-0'
[2019-08-25 00:52:56,455] 423 DEBUG MainThread ssh.expect :: Output:
3. Verify lock was successful.

Expected Behavior
------------------
As per description failure on VIM

Actual Behavior
----------------
As per description failure .
Reproducibility
---------------
Seen only once .Not tried in this load.
System Configuration
--------------------
2+2 system
Branch/Pull Time/Commit
-----------------------
BUILD_DATE= 2019-08-23_20-59-00
Last Pass
---------
2019-08-09_20-59-00
Timestamp/Logs
--------------
2019-08-25T00:53:32.266
Test Activity
-------------
Regression test

Revision history for this message
Anujeyan Manokeran (anujeyan) wrote :
Revision history for this message
Frank Miller (sensfan22) wrote :

As this issue is intermittent and not easily reproducible (passed in other regressions), making stx.3.0 gating and medium priority.

tags: added: stx.3.0 stx.distro.openstack stx.networking
Changed in starlingx:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Forrest Zhao (forrest.zhao)
Revision history for this message
Frank Miller (sensfan22) wrote :

Assigning to the networking PL to determine who should prime this bug.

Changed in starlingx:
assignee: Forrest Zhao (forrest.zhao) → YaoLe (yaole)
Yang Liu (yliu12)
tags: added: stx.retestneeded
YaoLe (yaole)
Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
YaoLe (yaole) wrote :

This bug cannot be reproduced, the build info:
###
### StarlingX
### Built from master
###

OS="centos"
SW_VERSION="19.09"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20191125T000000Z"

JOB="STX_build_master_master"
<email address hidden>"
BUILD_NUMBER="330"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-11-25 00:00:00 +0000"

And my output is :
| administrative | locked
... |
| vim_progress_status | services-disabled |

It always locked successfully, and the vim_progress_status is in right status: diasbled.

Ghada Khalil (gkhalil)
Changed in starlingx:
status: In Progress → Invalid
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Closing since the issue is not reproducible.
host locks are covered regularly by automated sanity and regression. We haven't received reports of this issue since the initial occurrence. Also openstack has been upversioned to train, so the dhcp timeout failure seen here may no longer be applicable. There was a timeout increase from 60 seconds to 300 seconds in neutron stable/train:
https://review.opendev.org/#/c/692376/

Revision history for this message
Peng Peng (ppeng) wrote :

Issue was not reproduced on train
2019-11-21_20-00-00
wcp_3-6

Ghada Khalil (gkhalil)
tags: removed: stx.retestneeded
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.