Port list was not showing for some computes during install

Bug #1834245 reported by Anujeyan Manokeran
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
| StarlingX | Invalid | High | ChenjieXu | |

Bug Description

Brief Description
-----------------
During the install procedure on a 2+10 lab, after controller-0 was unlocked and all other hosts were bulk added, all hosts came online, but ports could not be listed for any host except compute-9 and compute-5.

$ cat /etc/platform/platform.conf
nodetype=worker
subfunction=worker
system_type=Standard
security_profile=standard
INSTALL_UUID=007a2e78-1065-423b-9a1c-d5b4d1515898
http_port=8080
management_interface=ens801f1
UUID=438661b0-7134-46fc-a51c-bb257b95ae7d

system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | controller-1 | controller | locked | disabled | online |
| 3 | compute-0 | worker | locked | disabled | online |
| 4 | compute-1 | worker | locked | disabled | online |
| 5 | compute-2 | worker | locked | disabled | online |
| 6 | compute-3 | worker | locked | disabled | online |
| 7 | compute-4 | worker | locked | disabled | online |
| 8 | compute-5 | worker | locked | disabled | online |
| 9 | compute-6 | worker | locked | disabled | online |
| 10 | compute-7 | worker | locked | disabled | online |
| 11 | compute-8 | worker | locked | disabled | online |
| 12 | compute-9 | worker | locked | disabled | online |
+----+--------------+-------------+----------------+-------------+--------------+

openstack endpoint list
+----------------------------------+-----------+--------------+-----------------+---------+-----------+--------------------------------+
| ID | Region | Service Name | Service Type | Enabled | Interface | URL |
+----------------------------------+-----------+--------------+-----------------+---------+-----------+--------------------------------+
| f5ae5ffdb7d6450294625286c6028a6f | RegionOne | fm | faultmanagement | True | admin | http://192.168.204.2:18002 |
| 4fd2244db0bd41cea6414d3265bc4c61 | RegionOne | fm | faultmanagement | True | internal | http://192.168.204.2:18002 |
| c9201cdcddcf4c728a8cea9eb6d98263 | RegionOne | fm | faultmanagement | True | public | http://128.224.151.182:18002 |
| e977870c26bf46ed968bd65ef69e1848 | RegionOne | patching | patching | True | admin | http://192.168.204.2:5491 |
| fca00fb93c5440d3a8c063c093b7ff11 | RegionOne | patching | patching | True | internal | http://192.168.204.2:5491 |
| 313b133d21fe4545a8eb9b61f0c30eaa | RegionOne | patching | patching | True | public | http://128.224.151.182:15491 |
| 428a2308ed00477aa855512ce02d7a43 | RegionOne | vim | nfv | True | admin | http://192.168.204.2:4545 |
| 64e937df592542f3a0d745126bf568ec | RegionOne | vim | nfv | True | internal | http://192.168.204.2:4545 |
| b7269fda8fc94453a9a421febddd98c1 | RegionOne | vim | nfv | True | public | http://128.224.151.182:4545 |
| c1ee7f1acb5a4b06aac27e6be3948cff | RegionOne | smapi | smapi | True | admin | http://192.168.204.2:7777 |
| 6a8317d016744ae8bd9beb4393b17f4e | RegionOne | smapi | smapi | True | internal | http://192.168.204.2:7777 |
| 36e836d2805d4704a76fb58638ff3da8 | RegionOne | smapi | smapi | True | public | http://128.224.151.182:7777 |
| 4ce8c51e5d154b8a91a9857d81a7fba7 | RegionOne | keystone | identity | True | admin | http://192.168.204.2:5000/v3 |
| d2d5b4c42be449d2bf898364c77d68d1 | RegionOne | keystone | identity | True | internal | http://192.168.204.2:5000/v3 |
| 7b3f17e4dceb4a85be8a43820eeecece | RegionOne | keystone | identity | True | public | http://128.224.151.182:5000/v3 |
| 2687e0b07d1b48f4a7695574782cff33 | RegionOne | barbican | key-manager | True | admin | http://192.168.204.2:9311 |
| c2f1942b72bc459889a718c370258e24 | RegionOne | barbican | key-manager | True | internal | http://192.168.204.2:9311 |
| 6a78ec9e8ef841f1aa7a19b84796b860 | RegionOne | barbican | key-manager | True | public | http://128.224.151.182:9311 |
| 1892cc3b71af4462bf5442ee58f7234a | RegionOne | sysinv | platform | True | admin | http://192.168.204.2:6385/v1 |
| b39f4293903b45f2837ad62b69a5d3dc | RegionOne | sysinv | platform | True | internal | http://192.168.204.2:6385/v1 |
| 12f3bbe41e274762bcb7aa3a3b1901c6 | RegionOne | sysinv | platform | True | public | http://128.224.151.182:6385/v1 |
+----------------------------------+-----------+--------------+-----------------+---------+-----------+--------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$ system host-port-list controller-1

[sysadmin@controller-0 ~(keystone_admin)]$ system host-port-list compute-0

[sysadmin@controller-0 ~(keystone_admin)]$ system host-port-list compute-9
+--------------------------------------+----------+----------+--------------+--------+-----------+-------------+------------------------------------------------+
| uuid | name | type | pci address | device | processor | accelerated | device type |
+--------------------------------------+----------+----------+--------------+--------+-----------+-------------+------------------------------------------------+
| 63c04dc2-7e70-4fe5-8a0b-42a2b6f783b2 | enp3s0f0 | ethernet | 0000:03:00.0 | 0 | 0 | True | Ethernet Controller 10-Gigabit X540-AT2 |
| c34dc4f2-f9ca-45a6-9434-b301f31354f6 | enp3s0f1 | ethernet | 0000:03:00.1 | 0 | 0 | True | Ethernet Controller 10-Gigabit X540-AT2 |
| 311a8a47-582d-491d-ba36-e2cf01b45f6e | ens787f0 | ethernet | 0000:81:00.0 | 0 | 1 | True | Ethernet Controller X710 for 10GbE SFP+ |
| 1d5532f2-53f9-48e4-b066-da909ee5a31f | ens787f1 | ethernet | 0000:81:00.1 | 0 | 1 | True | Ethernet Controller X710 for 10GbE SFP+ |
| 7c819d6f-e47f-47a6-9557-0b51c4de9af0 | ens787f2 | ethernet | 0000:81:00.2 | 0 | 1 | True | Ethernet Controller X710 for 10GbE SFP+ |
| b059e079-5b31-4360-be5a-b2888b2751a2 | ens787f3 | ethernet | 0000:81:00.3 | 0 | 1 | True | Ethernet Controller X710 for 10GbE SFP+ |
| e95c17ac-b99d-4997-9d94-388a8a9adb23 | ens801f0 | ethernet | 0000:83:00.0 | 0 | 1 | True | 82599ES 10-Gigabit SFI/SFP+ Network Connection |
| 1420e0a2-36bd-485b-ab78-9a86ef5fb496 | ens801f1 | ethernet | 0000:83:00.1 | 0 | 1 | True | 82599ES 10-Gigabit SFI/SFP+ Network Connection |
| e1e940c7-9cc3-4453-b8ed-d8e7db4805bf | ens803f0 | ethernet | 0000:86:00.0 | 0 | 1 | True | 82599ES 10-Gigabit SFI/SFP+ Network Connection |
| 663f1523-2671-4612-9557-03eaf556b5df | ens803f1 | ethernet | 0000:86:00.1 | 0 | 1 | True | 82599ES 10-Gigabit SFI/SFP+ Network Connection |
+--------------------------------------+----------+----------+--------------+--------+-----------+-------------+------------------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$ system host-port-list compute-6

Severity
--------
Major
Steps to Reproduce
------------------
1. Follow the install procedure, starting with installing controller-0.
2. Once controller-0 is installed, FTP the config files and license to it, then run the Ansible bootstrap playbook against localhost to configure it:

 ansible-playbook /usr/share/ansible/stx-ansible/playbooks/bootstrap/bootstrap.yml -e "override_files_dir=/home/sysadmin/ ansible_become_pass=Li69nux*"
3. Bulk add the hosts using system host-bulk-add.

4. Power on all the nodes to install them.
5. All the nodes are installed, but as per the description the port list is not shown for any node except compute-5 and compute-9.
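
The check in step 5 can be scripted instead of running host-port-list by hand for each node. The following is only a sketch, assuming the keystone_admin credentials are already sourced on controller-0 and that a data row can be recognised by the UUID in its first column, as in the compute-9 output above. The function names has_ports and hosts_without_ports are hypothetical helpers, not part of the system CLI.

```shell
#!/usr/bin/env bash
# Sketch: report hosts whose port inventory is empty.
# Assumption: a populated table has rows that start with "| <uuid>".

# has_ports HOST: succeeds if host-port-list prints at least one data row.
has_ports() {
    system host-port-list "$1" 2>/dev/null | grep -q '^| *[0-9a-f]\{8\}-'
}

# hosts_without_ports HOST...: print each host for which no port row is returned.
hosts_without_ports() {
    local host
    for host in "$@"; do
        has_ports "$host" || echo "$host"
    done
}
```

For the lab in this report it could be called as `hosts_without_ports controller-1 compute-0 compute-1` and so on for the remaining computes; an empty result would mean every listed host reported ports.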

Expected Behavior
------------------
The port list should be displayed for every host.
Actual Behavior
----------------
As described above, the port list was not displayed for some hosts in the lab.
Reproducibility
---------------
Seen once on a 2+10 system.
Needs to be retested with the latest load.
System Configuration
--------------------
regular system
Branch/Pull Time/Commit
-----------------------
20190624T233000Z
Last Pass
---------
20190329T061946Z

Timestamp/Logs
--------------

Test Activity
-------------
Regression test

description: updated
Numan Waheed (nwaheed)
tags: added: stx.regression stx.retestneeded
Ghada Khalil (gkhalil) wrote :

@Jeyan, please add the logs. This cannot be investigated without logs.

description: updated
tags: added: stx.config
Changed in starlingx:
status: New → Incomplete
Ghada Khalil (gkhalil) wrote :

Also can you comment on whether the port-list was visible if executed later on? i.e. is this a race condition?
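
One way to answer the race-condition question is to poll the port list for a while rather than checking once. This is a rough sketch under the same assumptions as the transcripts above (credentials sourced on controller-0, data rows start with a UUID); wait_for_ports and its default timeout and interval are hypothetical choices, not values taken from StarlingX.

```shell
#!/usr/bin/env bash
# Sketch: poll host-port-list until ports appear or a timeout expires.

# wait_for_ports HOST [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 as soon as a data row is seen, 1 if none appears within the timeout.
wait_for_ports() {
    local host="$1"
    local timeout="${2:-600}"
    local interval="${3:-30}"
    local waited=0
    while :; do
        if system host-port-list "$host" 2>/dev/null | grep -q '^| *[0-9a-f]\{8\}-'; then
            echo "$host: ports listed after ${waited}s"
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "$host: still no ports after ${waited}s"
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

If `wait_for_ports compute-0` eventually succeeds, the inventory was only late; if it times out, the ports were never reported.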

Changed in starlingx:
assignee: nobody → Anujeyan Manokeran (anujeyan)
Ghada Khalil (gkhalil) wrote :

Any update on this?

Anujeyan Manokeran (anujeyan) wrote :

The ports never showed up, so I couldn't continue the install. After a reinstall the ports were listed, and they were listed again on the next install. It is not always reproducible.

Ghada Khalil (gkhalil) wrote :

Has this issue been seen more than once?

Frank Miller (sensfan22) wrote :

Marking high priority due to impact (a complete re-install is required). Assigning to the networking PL to assign to a networking prime to investigate.

tags: added: stx.networking
Changed in starlingx:
status: Incomplete → Triaged
importance: Undecided → Medium
importance: Medium → High
Frank Miller (sensfan22)
Changed in starlingx:
assignee: Anujeyan Manokeran (anujeyan) → Forrest Zhao (forrest.zhao)
tags: added: stx.2.0
Changed in starlingx:
assignee: Forrest Zhao (forrest.zhao) → ChenjieXu (midone)
ChenjieXu (midone) wrote :

Hi Jeyan,

Could you please confirm that host-port-list doesn't work for compute-9 and compute-5? Because the following log shows that it works for compute-9:
[sysadmin@controller-0 ~(keystone_admin)]$ system host-port-list compute-9
+--------------------------------------+----------+----------+--------------+--------+-----------+-------------+------------------------------------------------+
| uuid | name | type | pci address | device | processor | accelerated | device type |
+--------------------------------------+----------+----------+--------------+--------+-----------+-------------+------------------------------------------------+
| 63c04dc2-7e70-4fe5-8a0b-42a2b6f783b2 | enp3s0f0 | ethernet | 0000:03:00.0 | 0 | 0 | True | Ethernet Controller 10-Gigabit X540-AT2 |
| c34dc4f2-f9ca-45a6-9434-b301f31354f6 | enp3s0f1 | ethernet | 0000:03:00.1 | 0 | 0 | True | Ethernet Controller 10-Gigabit X540-AT2 |

ChenjieXu (midone) wrote :

Hi Jeyan,

The log you attached only contains log for controller-0. Please also collect logs for other nodes (controller-1, compute-0, compute-1, compute-2, compute-3, compute-4, compute-5, compute-6, compute-7, compute-8, compute-9).

Changed in starlingx:
status: Triaged → Incomplete
Anujeyan Manokeran (anujeyan) wrote :

I think collect logs was not working on controller-0 because there was no connection to the other nodes, since the ports couldn't be found.
Jeyan

ChenjieXu (midone) wrote :

Hi Jeyan,

You can collect logs if you can access other nodes by ssh. If the network is not configured, maybe you can collect logs by logging into other nodes directly. It will be hard to debug without logs.

This bug seems to be related to installing multiple compute nodes at the same time. Could you please verify whether the commands below return correct output?
   system host-disk-list ${COMPUTE}
   system host-cpu-list ${COMPUTE}
   system host-device-list ${COMPUTE}
   system host-memory-list ${COMPUTE}
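
The four checks above can be run in one pass per host. The sketch below counts table rows for each inventory type; it assumes the usual table layout seen elsewhere in this report (lines starting with "|", the first of which is the header) and that the keystone_admin credentials are sourced. inventory_rows and check_inventory are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: summarise how many rows each inventory listing returns for a host.

# inventory_rows HOST KIND: print the number of data rows from `system host-KIND-list HOST`.
inventory_rows() {
    local lines
    # Count lines that start with '|'; grep -c prints 0 when nothing matches.
    lines=$(system "host-$2-list" "$1" 2>/dev/null | grep -c '^|' || true)
    # Assumption: the first '|' line is the column header, so do not count it.
    if [ "$lines" -gt 0 ]; then
        lines=$((lines - 1))
    fi
    echo "$lines"
}

# check_inventory HOST: print one summary line per inventory type.
check_inventory() {
    local host="$1" kind
    for kind in disk cpu device memory; do
        echo "$host $kind rows=$(inventory_rows "$host" "$kind")"
    done
}
```

A host where every line reports rows=0 would suggest the whole inventory report is missing rather than only the ports.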

Ghada Khalil (gkhalil) wrote :

If there are no logs from the initial occurrence and the issue is not reproducible in recent loads, I suggest closing this bug.

@Jeyan, Is this issue reproducible?

Anujeyan Manokeran (anujeyan) wrote :

It was not reproduced in PV-1. I will close this LP.

tags: removed: stx.retestneeded
Ghada Khalil (gkhalil) wrote :

Closing as the reporter confirmed that the issue is not reproducible.

Changed in starlingx:
status: Incomplete → Invalid