Nodes are in unusable state. Possible typo "'NoneType' object has no attribute 'startswith'"

Bug #1915864 reported by Alexandru Dimofte
This bug affects 1 person
Affects: StarlingX
Status: Invalid
Importance: Critical
Assigned to: Zhixiong Chi

Bug Description

Brief Description
-----------------
On a Standard configuration (bare metal), provisioning fails while installing compute-0.
The node remains locked, disabled, and offline. Checking /var/log/sysinv.log, I observed this:
sysinv 2021-02-16 18:09:15.176 102434 WARNING sysinv.conductor.manager [-] Failed check_nodes_stable. ('NoneType' object has no attribute 'startswith')
sysinv 2021-02-16 18:09:15.177 102434 INFO sysinv.conductor.manager [-] Node(s) are in an unstable state. Defer audit.
sysinv 2021-02-16 18:10:15.178 102434 WARNING sysinv.conductor.manager [-] Failed check_nodes_stable. ('NoneType' object has no attribute 'startswith')
sysinv 2021-02-16 18:10:15.178 102434 INFO sysinv.conductor.manager [-] Node(s) are in an unstable state. Defer audit.
sysinv 2021-02-16 18:11:15.194 102434 WARNING sysinv.conductor.manager [-] Failed check_nodes_stable. ('NoneType' object has no attribute 'startswith')
sysinv 2021-02-16 18:11:15.195 102434 INFO sysinv.conductor.manager [-] Node(s) are in an unstable state. Defer audit.

Is startswith a typo?

Severity
--------
Critical: System/Feature is not usable due to the defect

Steps to Reproduce
------------------
Try to install stx on a Standard bare-metal configuration. It fails during provisioning.

Expected Behavior
------------------
Installation should work, compute and storage nodes should come online, and no such errors should appear in sysinv.log.

Actual Behavior
----------------
Provisioning fails for Standard configuration baremetal.

Reproducibility
---------------
100% reproducible for me.

System Configuration
--------------------
Multi-node system

Branch/Pull Time/Commit
-----------------------
master

Last Pass
---------
If I am not mistaken, 2 or 3 days ago.

Timestamp/Logs
--------------
I will attach collected logs.

Test Activity
-------------
Sanity

Workaround
----------
-

Alexandru Dimofte (adimofte) wrote :

Collected logs are attached.

Ghada Khalil (gkhalil) wrote :

Based on the changelogs, the additional commits between the two loads are:
./stx-tools 07b8d07a36942e88b45bc5fa7c95cfa76af08463 2021-02-15 15:06:36 +0000 Gerrit Code Review <email address hidden> Merge "nspr/nss/nss-softokn/nss-util: CVE-2018-12404 and CVE-2019-11745"
./stx-tools 6ed078685c413b7199cd18bdde6cb8ad33fdb711 2021-01-28 21:36:35 -0500 Zhixiong Chi <email address hidden> nspr/nss/nss-softokn/nss-util: CVE-2018-12404 and CVE-2019-11745

See:
http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/flock/20210216T003258Z/outputs/CHANGELOG.txt

tags: added: stx.5.0 stx.distro.other
Changed in starlingx:
importance: Undecided → Critical
status: New → Triaged
assignee: nobody → Zhixiong Chi (zhixiongchi)
Zhixiong Chi (zhixiongchi) wrote :

In fact, the merge commit 07b8d07a36942e88b45bc5fa7c95cfa76af08463 does not change anything, because the same change had already been merged earlier in commit https://review.opendev.org/plugins/gitiles/starlingx/tools/+/ebc9f32d7de526b44234a56423dd6785243cc386.

As a next step, I will continue investigating the root cause.

Zhixiong Chi (zhixiongchi) wrote :

According to the following error information:
 sysinv.conductor.manager [-] Failed check_nodes_stable. ('NoneType' object has no attribute 'startswith')
The call path is as follows (in the file sysinv/sysinv/sysinv/sysinv/conductor/manager.py):
   function check_nodes_stable
     hosts = self.dbapi.ihost_get_list()
       ...
       host.vim_progress_status.startswith(...)

The code then uses "except Exception as e" to catch the exception raised by calling startswith on a NoneType. That means the host information is None: we could not get the node information from the dbapi.ihost_get_list function.
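
As a minimal sketch of how this message can arise in plain Python: calling startswith on an attribute whose value is None raises exactly this AttributeError. The Host class, the PREFIX value, and both helper functions below are hypothetical stand-ins for illustration; they are not the sysinv code.

```python
# Hypothetical stand-in for a host record; not the real sysinv model.
class Host:
    def __init__(self, hostname, vim_progress_status):
        self.hostname = hostname
        self.vim_progress_status = vim_progress_status


# Placeholder prefix, chosen only for this illustration.
PREFIX = "enabled"


def nodes_stable_unguarded(hosts, prefix):
    # Raises AttributeError as soon as a host has vim_progress_status == None.
    return all(h.vim_progress_status.startswith(prefix) for h in hosts)


def nodes_stable_guarded(hosts, prefix):
    # Treats a missing status as "not stable" instead of raising.
    return all((h.vim_progress_status or "").startswith(prefix) for h in hosts)


hosts = [Host("controller-0", "enabled-ok"), Host("compute-0", None)]

try:
    nodes_stable_unguarded(hosts, PREFIX)
    error_message = None
except AttributeError as e:
    error_message = str(e)

print(error_message)
print(nodes_stable_guarded(hosts, PREFIX))
```

In this sketch startswith is an ordinary str method, so the message points at a None value rather than at a misspelled name.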
So far I have not found any relationship between this issue and the upgraded nss packages.
However, as the previous comment shows, the following commits

./stx-tools 07b8d07a36942e88b45bc5fa7c95cfa76af08463 2021-02-15 15:06:36 +0000 Gerrit Code Review <email address hidden> Merge "nspr/nss/nss-softokn/nss-util: CVE-2018-12404 and CVE-2019-11745"
./stx-tools 6ed078685c413b7199cd18bdde6cb8ad33fdb711 2021-01-28 21:36:35 -0500 Zhixiong Chi <email address hidden> nspr/nss/nss-softokn/nss-util: CVE-2018-12404 and CVE-2019-11745

are no-ops, because the same code was already merged in the commit https://review.opendev.org/plugins/gitiles/starlingx/tools/+/ebc9f32d7de526b44234a56423dd6785243cc386, which was provided by Joe Slater.
If this issue was introduced by the upgraded nss packages, Joe may be able to provide more useful information and a diagnosis.

Hi Alexandru Dimofte,
Could you provide the detailed command and steps to reproduce this issue? Thanks.

Alexandru Dimofte (adimofte) wrote :

Hi Zhixiong,
In my case the issue is visible only on the Standard configuration and only on bare metal, during provisioning. As a result, I am unable to install stx.
At some point the test runs:
system host-show compute-0|grep -w install_state|awk '{print$4}'
This checks the install state of compute-0, which should be "completed", but it is "None":

Fails if objects are unequal after converting them to strings.
Start / End / Elapsed: 20210218 11:25:27.775 / 20210218 11:25:27.776 / 00:00:00.001
11:25:27.776 FAIL None != completed
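
The check above can be sketched in Python as a rough equivalent of the grep/awk pipeline. The function name field_for_key and the sample table rows are hypothetical and only illustrate the field extraction; they are not captured output from the failing system.

```python
def field_for_key(table_text, key, field_index=4):
    """Return the awk-style field (1-based) of the first line that
    contains `key` as a whitespace-separated word, or None if absent."""
    for line in table_text.splitlines():
        fields = line.split()
        if key in fields and len(fields) >= field_index:
            return fields[field_index - 1]
    return None


# Illustrative rows only, shaped like a two-column table.
sample_output = """\
| hostname      | compute-0 |
| install_state | None      |
"""

state = field_for_key(sample_output, "install_state")
install_completed = (state == "completed")
print(state, install_completed)
```

Note that the extracted value is the string "None", not Python's None, which matches the string comparison reported in the test output.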

Ghada Khalil (gkhalil) wrote :

Just to clarify, I simply compared the changelogs between the green build and the red build and asked Zhixiong to investigate if it could be related to these commits. I'm not sure if it is or not.
The other question would be if there has been a recent change in the sanity setup/automation at Intel.

Alexandru Dimofte (adimofte) wrote :

Over the weekend I tried to reinstall some older green stx images on Standard bare metal, and I was shocked to see that the error was still present. I suspected our S1_STANDARD validation setup, so today I ran the latest available image on our second setup, S2_STANDARD, and it passed.
So it is now clear that the problem is with our S1_STANDARD setup; the image is fine, and this bug should be considered INVALID. Thank you, and sorry for taking up your time with this.

Ghada Khalil (gkhalil) wrote :

Marking this LP as Invalid. As per the comment above, it was confirmed that this is an issue with the sanity setup at Intel, not a StarlingX software issue.

Changed in starlingx:
status: Triaged → Invalid