Activity log for bug #1858216

Date | Who | What changed | Old value | New value | Message
2020-01-03 16:18:09 | Anujeyan Manokeran | bug | | | added bug
2020-01-03 16:19:43 | Anujeyan Manokeran | description | [old description, below] | [new description, below] |

    Old value: identical to the new value below, except that the opening of the Brief Description read:

    During the cable pull test, with the cluster and mgmt networks configured on the same interface, a multi-node failure occurred in which the standby controller (controller-0) and the worker nodes rebooted immediately.

    New value:

    Brief Description
    -----------------
    During the cable pull test, with the cluster and mgmt networks configured on the same interface, a multi-node failure occurred. The standby controller (controller-0) and the worker nodes rebooted immediately when the cable was pulled on the active controller. The multi-node failure avoidance (MNFA) timeout was set to 0 in this lab. Multi-node failure was not avoided with the service parameters set below:

    | 5c9b64bf-769c-4949-96df-d21fda360bf7 | platform | maintenance | mnfa_threshold      | 2   | None | None |
    | 2d6feb56-b664-4d61-9468-8fba97c88a79 | platform | maintenance | mnfa_timeout        | 0   | None | None |
    | 37d3fc48-54cf-408e-b497-796a9c25cf4b | platform | maintenance | worker_boot_timeout | 720 | None | None |
    | c2ab1f

    Cable pull time (controller-1 kernel log):

    2020-01-03T14:37:10.135 controller-1 kernel: info [19603.064914] i40e 0000:18:00.0 enp24s0f0: NIC Link is Down
    2020-01-03T14:37:11.271 controller-1 kernel: info [19604.196161] i40e 0000:18:00.0 enp24s0f0: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
    2020-01-03T14:37:11.365 controller-1 kernel: info [19604.292773] i40e 0000:18:00.0 enp24s0f0: NIC Link is Down

    Interface configuration for controller-1:

    $ system host-if-list 1
    +--------------------------------------+----------+----------+----------+------+-----------------+---------------+-------------------------+---------------------------+
    | uuid                                 | name     | class    | type     | vlan | ports           | uses i/f      | used by i/f             | attributes                |
    |                                      |          |          |          | id   |                 |               |                         |                           |
    +--------------------------------------+----------+----------+----------+------+-----------------+---------------+-------------------------+---------------------------+
    | 117e4f43-7f4c-4f1d-8edc-02cfb6033d79 | data0    | data     | ethernet | None | [u'enp175s0f0'] | []            | []                      | MTU=1500,accelerated=True |
    | 5c6daf5d-8f78-4143-9193-0aaec5ca7924 | oam0     | platform | ethernet | None | [u'eno1']       | []            | []                      | MTU=1500                  |
    | 8c1f6b74-38a1-4b3d-ae0d-b47c2ec72390 | cluster0 | platform | vlan     | 187  | []              | [u'pxeboot0'] | []                      | MTU=1500                  |
    | a016d64b-a825-4c09-b54f-569fa551f4ce | pxeboot0 | platform | ethernet | None | [u'enp24s0f0']  | []            | [u'mgmt0', u'cluster0'] | MTU=9216                  |
    | a3d1f954-a9e2-482b-ad80-5d2015ad62e3 | mgmt0    | platform | vlan     | 186  | []              | [u'pxeboot0'] | []                      | MTU=1500                  |
    +--------------------------------------+----------+----------+----------+------+-----------------+---------------+-------------------------+---------------------------+

    All the nodes rebooted:

    $ system host-list
    +----+--------------+-------------+----------------+-------------+--------------+
    | id | hostname     | personality | administrative | operational | availability |
    +----+--------------+-------------+----------------+-------------+--------------+
    | 1  | controller-0 | controller  | unlocked       | disabled    | intest       |
    | 2  | compute-0    | worker      | unlocked       | disabled    | intest       |
    | 3  | compute-1    | worker      | unlocked       | disabled    | intest       |
    | 4  | compute-2    | worker      | unlocked       | disabled    | intest       |
    | 5  | controller-1 | controller  | unlocked       | enabled     | available    |
    +----+--------------+-------------+----------------+-------------+--------------+

    Severity
    --------
    Major

    Steps to Reproduce
    ------------------
    1. Use an AIO+ lab with the cluster and mgmt networks provisioned on the same interface.
    2. On the active controller, pull the cable of the port that carries both the cluster and mgmt VLANs.
    3. Verify the host states after the cable pull.

    Expected Behavior
    ------------------
    None of the nodes should reboot.

    Actual Behavior
    ----------------
    All the nodes rebooted after the cable pull. MNFA was not triggered.

    Reproducibility
    ---------------
    Tested once on this load.

    System Configuration
    --------------------
    AIO+ wolfpass 8-12

    Branch/Pull Time/Commit
    -----------------------
    2020-01-02 20:04:12 -0500

    Last Pass
    ---------
    Last tested on load 2019-12-13 19:04:39; that run hit a different issue: https://bugs.launchpad.net/starlingx/+bug/1856614

    Timestamp/Logs
    --------------
    2020-01-03T14:37:10.135

    Test Activity
    -------------
    Regression
2020-01-03 16:26:23 | Eric MacDonald | starlingx: assignee | | Eric MacDonald (rocksolidmtce) |
2020-01-03 16:29:22 | Anujeyan Manokeran | attachment added | | collect logs https://bugs.launchpad.net/starlingx/+bug/1858216/+attachment/5317438/+files/ALL_NODES_20200103.151209.tar |
2020-01-03 16:29:27 | Anujeyan Manokeran | attachment added | | collect logs https://bugs.launchpad.net/starlingx/+bug/1858216/+attachment/5317439/+files/ALL_NODES_20200103.151209.tar |
2020-01-03 16:46:16 | Eric MacDonald | summary | Cable pull test on active controller cause multi node failure lab with cluster and mgmt configured on a same interface | MNFA times out immediately with timeout value of 0 |
2020-01-03 19:29:09 | OpenStack Infra | starlingx: status | New | In Progress |
2020-01-03 19:54:32 | Ghada Khalil | tags | | stx.metal |
2020-01-03 19:55:33 | Ghada Khalil | tags | stx.metal | stx.3.0 stx.4.0 stx.metal |
2020-01-03 19:55:39 | Ghada Khalil | starlingx: importance | Undecided | High |
2020-01-03 19:56:57 | Ghada Khalil | tags | stx.3.0 stx.4.0 stx.metal | stx.4.0 stx.metal |
2020-01-03 19:57:01 | Ghada Khalil | starlingx: importance | High | Medium |
2020-01-03 19:58:11 | Ghada Khalil | starlingx: importance | Medium | High |
2020-01-03 19:58:48 | Ghada Khalil | tags | stx.4.0 stx.metal | stx.3.0 stx.4.0 stx.metal |
2020-01-03 19:59:20 | Ghada Khalil | bug | | | added subscriber Daniel Badea
2020-01-08 21:21:11 | OpenStack Infra | starlingx: status | In Progress | Fix Released |
2020-01-12 14:56:20 | Yang Liu | tags | stx.3.0 stx.4.0 stx.metal | stx.3.0 stx.4.0 stx.metal stx.retestneeded |
2020-01-16 14:46:32 | Ghada Khalil | tags | stx.3.0 stx.4.0 stx.metal stx.retestneeded | in-r-stx30 stx.3.0 stx.4.0 stx.metal stx.retestneeded |
2020-01-27 20:59:21 | Yang Liu | tags | in-r-stx30 stx.3.0 stx.4.0 stx.metal stx.retestneeded | in-r-stx30 stx.3.0 stx.4.0 stx.metal |
2020-02-05 15:17:54 | OpenStack Infra | tags | in-r-stx30 stx.3.0 stx.4.0 stx.metal | in-f-centos8 in-r-stx30 stx.3.0 stx.4.0 stx.metal |
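The interface table in the bug description explains why a single cable pull hit two networks at once: mgmt0 (VLAN 186) and cluster0 (VLAN 187) are both VLAN interfaces stacked on pxeboot0, and pxeboot0 is the only interface on port enp24s0f0, the port the controller-1 kernel log reports going down. The sketch below walks the "uses i/f" relationships to list every interface that loses connectivity when a given physical port loses link. The rows are copied from the table; the `affected_by_port` helper is hypothetical and is not part of the `system` CLI.

```python
from typing import Dict, List, Set, Tuple

# Rows copied from the controller-1 "system host-if-list 1" output:
# interface name -> (physical ports, interfaces it is stacked on)
INTERFACES: Dict[str, Tuple[List[str], List[str]]] = {
    "data0": (["enp175s0f0"], []),
    "oam0": (["eno1"], []),
    "cluster0": ([], ["pxeboot0"]),   # VLAN 187
    "pxeboot0": (["enp24s0f0"], []),
    "mgmt0": ([], ["pxeboot0"]),      # VLAN 186
}


def affected_by_port(port: str,
                     interfaces: Dict[str, Tuple[List[str], List[str]]]) -> Set[str]:
    """Return every interface that depends on `port`, directly or via VLAN stacking."""
    # Interfaces bound directly to the physical port.
    affected = {name for name, (ports, _) in interfaces.items() if port in ports}
    # Keep adding interfaces that use an already-affected interface until
    # nothing new is found, so arbitrarily deep stacking is handled.
    changed = True
    while changed:
        changed = False
        for name, (_, uses) in interfaces.items():
            if name not in affected and any(u in affected for u in uses):
                affected.add(name)
                changed = True
    return affected


print(sorted(affected_by_port("enp24s0f0", INTERFACES)))
# ['cluster0', 'mgmt0', 'pxeboot0']
```

Running the same helper for eno1 or enp175s0f0 returns only oam0 or data0, which is why pulling enp24s0f0 is the one case in this configuration that takes down the mgmt and cluster networks together.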
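Step 3 of the Steps to Reproduce ("Verify the host states after the cable pull") comes down to reading the operational and availability columns of `system host-list`. The following is a minimal sketch that parses the pipe-delimited table exactly as it appears in the description and reports every host that is not enabled/available. It assumes the ASCII-table layout shown there; `parse_table` and `unhealthy_hosts` are illustrative names, not part of any StarlingX tool.

```python
from typing import Dict, List

# `system host-list` output copied from the bug description.
HOST_LIST = """\
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | disabled    | intest       |
| 2  | compute-0    | worker      | unlocked       | disabled    | intest       |
| 3  | compute-1    | worker      | unlocked       | disabled    | intest       |
| 4  | compute-2    | worker      | unlocked       | disabled    | intest       |
| 5  | controller-1 | controller  | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
"""


def parse_table(text: str) -> List[Dict[str, str]]:
    """Turn a pipe-delimited CLI table into a list of {column: value} dicts."""
    # Keep only content rows; border lines start with '+'.
    rows = [line.strip() for line in text.splitlines() if line.strip().startswith("|")]
    cells = [[c.strip() for c in row.strip("|").split("|")] for row in rows]
    header, body = cells[0], cells[1:]
    return [dict(zip(header, row)) for row in body]


def unhealthy_hosts(rows: List[Dict[str, str]]) -> List[str]:
    """Hostnames whose state is anything other than enabled/available."""
    return [r["hostname"] for r in rows
            if (r["operational"], r["availability"]) != ("enabled", "available")]


print(unhealthy_hosts(parse_table(HOST_LIST)))
# ['controller-0', 'compute-0', 'compute-1', 'compute-2']
```

Against the captured output, every host except the active controller (controller-1) is flagged, matching the "All the nodes rebooted" observation.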
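The summary change to "MNFA times out immediately with timeout value of 0" points at how a zero mnfa_timeout is interpreted. Below is a minimal sketch of the intended behavior under stated assumptions: MNFA engages once the number of hosts failing at the same time reaches mnfa_threshold; an mnfa_timeout of 0 means "no timeout" rather than "expire at once"; the timeout is in seconds; and MNFA ends when no hosts remain failing. `MnfaModel` and its methods are hypothetical and do not reflect the StarlingX maintenance (mtce) code. The point is the final check: a plain `elapsed >= timeout` comparison with a timeout of 0 is true from the first instant, which would produce exactly the immediate timeout reported here.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class MnfaModel:
    """Hypothetical model of multi-node failure avoidance (MNFA)."""
    threshold: int      # mirrors mnfa_threshold (2 in this lab)
    timeout_secs: int   # mirrors mnfa_timeout; 0 assumed to mean "no timeout"
    failing: Set[str] = field(default_factory=set)
    active_since: Optional[float] = None

    def host_failed(self, host: str, now: float) -> None:
        self.failing.add(host)
        # Assumption: MNFA engages when concurrent failures reach the threshold.
        if self.active_since is None and len(self.failing) >= self.threshold:
            self.active_since = now

    def host_recovered(self, host: str) -> None:
        self.failing.discard(host)
        # Assumption: MNFA ends once no hosts are failing.
        if not self.failing:
            self.active_since = None

    def timed_out(self, now: float) -> bool:
        if self.active_since is None:
            return False
        if self.timeout_secs == 0:
            # Without this guard, (now - active_since) >= 0 is always true
            # and MNFA would expire the moment it engaged.
            return False
        return now - self.active_since >= self.timeout_secs


mnfa = MnfaModel(threshold=2, timeout_secs=0)
mnfa.host_failed("controller-0", now=0.0)
mnfa.host_failed("compute-0", now=0.0)
print(mnfa.active_since is not None, mnfa.timed_out(now=3600.0))
# True False
```

With a non-zero value such as 720, the same model expires MNFA only after 720 seconds, so the special case for 0 is the single behavioral difference the summary is concerned with.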