Simplex Controller-0 does not boot correctly after unlock

Bug #1800757 reported by Juan Carlos Alonso
Affects    Status   Importance  Assigned to  Milestone
StarlingX  Invalid  High        Austin Sun

Bug Description

Title
-----

Simplex Controller-0 does not boot correctly after unlock

Brief Description
-----------------

Followed the steps in the wiki https://docs.starlingx.io/installation_guide/simplex.html. After provisioning and unlocking Controller-0, it reboots and the controller status stays in "Enabling Compute Services" for a long time, then changes to "Critical failure, Auto-recovery, re-enabling", and then the connection is lost.

Severity
--------

<Critical: System/Feature is not usable after the defect>

Steps to Reproduce
------------------

Follow steps in wiki: https://docs.starlingx.io/installation_guide/simplex.html

Expected Behavior
------------------

System up and running

Actual Behavior
----------------

System stays in "Critical failure, Auto-recovery, re-enabling". Connection lost.

Reproducibility
---------------

100%

System Configuration
--------------------

Simplex Virtual environment

Ghada Khalil (gkhalil) wrote :

Please use the full bug template posted at: https://wiki.openstack.org/wiki/StarlingX/BugTemplate
The branch information needs to be added as well as the logs.

Changed in starlingx:
status: New → Incomplete
Ghada Khalil (gkhalil) wrote :

Can you also indicate the last time this test case passed? I don't believe this issue was reported by Ada during the stx.2018.10 release testing.

tags: added: stx.config
Juan Carlos Alonso (juancarlosa) wrote :

The last time a Simplex was deployed for testing was with ISO: stx-2018-10-19-29-r-2018.10.iso
This issue is not related to any test case in the Sanity Test or the Test Plan. I was deploying all configurations in a virtual environment to start automating tests, which is how I found that there is an issue after unlock.
Once a Simplex is deployed again I will get the logs and upload them.

Juan Carlos Alonso (juancarlosa) wrote :

Set up a Simplex virtual environment again with today's ISO: stx-2018-11-01-8-master.iso

The "Enabling Compute Services" status appeared again. Controller-0 seemed fine, but after some minutes it went down and rebooted, and the status "Critical failure, Auto-recovery, re-enabling" appeared again:

Broadcast message from root@controller-0 (Thu 2018-11-01 14:06:34 UTC):

The system is going down for reboot NOW!

While Controller-0 was up and running, I could not find any Error messages in /var/log/puppet/latest/puppet.log.

From /var/log/sm/log I just get:

2018-11-01T13:24:19.000 controller-0 sm: err fmSocket.cpp(140): Socket Error: Failed to write to fd:(63), len:(1584), rc:(-1), error:(Broken pipe)

Can someone replicate this issue?

Ghada Khalil (gkhalil) wrote :

Assigning to Bruce to assign to a developer in Intel to reproduce/investigate this issue.

tags: added: stx.2019.03
Changed in starlingx:
assignee: nobody → Bruce Jones (brucej)
importance: Undecided → High
Bruce Jones (brucej) wrote :

Cindy please assign someone to triage and root cause this issue, thanks!

Changed in starlingx:
assignee: Bruce Jones (brucej) → Cindy Xie (xxie1)
Austin Sun (sunausti) wrote :

Could you provide /var/log/* and the XML for the VM definition?

Changed in starlingx:
assignee: Cindy Xie (xxie1) → Austin Sun (sunausti)
Austin Sun (sunausti) wrote :

I followed https://docs.starlingx.io/installation_guide/simplex.html to set up, but I did not hit this issue. Could we close this issue?

nodetype=controller
subfunction=controller,compute
system_type=All-in-one
security_profile=standard
management_interface=lo
INSTALL_UUID=5504aeea-18b5-4b33-82c4-a91cfee12208
UUID=cf59cbc6-eca1-4598-957a-750bb35913be
oam_interface=enp2s1
sdn_enabled=no
region_config=no
system_mode=simplex
sw_version=18.10
security_feature="nopti nospectre_v2"
vswitch_type=ovs-dpdk
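
The settings above are plain KEY=VALUE pairs, so a deployment can be confirmed as an All-in-one simplex with a few lines of Python. This is only a minimal sketch: the function names are illustrative, the sample text is a subset of the values quoted above, and the set of keys that identify a simplex is an assumption drawn from that listing.

```python
def parse_platform_conf(text):
    """Parse KEY=VALUE lines into a dict, stripping optional double quotes."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not a KEY=VALUE pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip().strip('"')
    return conf


def is_aio_simplex(conf):
    """True for an All-in-one simplex host that also runs the compute subfunction."""
    subfunctions = conf.get("subfunction", "").split(",")
    return (
        conf.get("system_mode") == "simplex"
        and conf.get("system_type") == "All-in-one"
        and "compute" in subfunctions
    )


# Subset of the values posted in this comment.
SAMPLE = """\
nodetype=controller
subfunction=controller,compute
system_type=All-in-one
system_mode=simplex
security_feature="nopti nospectre_v2"
vswitch_type=ovs-dpdk
"""

conf = parse_platform_conf(SAMPLE)
print(is_aio_simplex(conf), conf["vswitch_type"])
```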

[wrsroot@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+

Juan Carlos Alonso (juancarlosa) wrote :

What ISO are you using?

Jose Perez Carranza (jgperezc) wrote :

I just tested with today's master ISO (stx-2018-11-07-22-master.iso) and I can reproduce the issue. It seems the nova services are not being activated. This issue was reported previously during the release validation cycle.

You can check the information collected at

http://paste.openstack.org/show/734439/

Austin Sun (sunausti) wrote :

I deployed the r-10 release and it was OK. Today I deployed the same master ISO (stx-2018-11-07-22-master.iso), and it deployed and unlocked successfully.

Also, the description at http://paste.openstack.org/show/734439/ does not match the title: the controller is unlocked but degraded. Could you provide all log files under /var/log/ on controller-0, your machine config (CPU type, NIC type, disk, etc.) and your VM XML for further investigation?
Thanks.

Juan Carlos Alonso (juancarlosa) wrote :

I figured out that nova-compute service was disabled and down:

+--------------------------------------+------------------+--------------+----------+----------+-------+
| Id                                   | Binary           | Host         | Zone     | Status   | State |
+--------------------------------------+------------------+--------------+----------+----------+-------+
| 650c2034-e61e-4476-b03a-9e16e302038c | nova-compute     | controller-0 | nova     | disabled | down  |
| 6a5aa77d-0550-4a36-b5f7-81c5ffcb29ce | nova-conductor   | controller-0 | internal | enabled  | up    |
| c1d668a2-a580-4372-a583-b7794d1bbaad | nova-consoleauth | controller-0 | internal | enabled  | up    |
| c220223a-c83b-4c39-b87b-0c126bdf29f0 | nova-scheduler   | controller-0 | internal | enabled  | up    |
+--------------------------------------+------------------+--------------+----------+----------+-------+

After executing 'nova service-enable <nova-compute-id>', the nova-compute service was enabled but its state was still down.

How can I bring the nova-compute service up?

I also attached the logs from /var/log/nova/
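
For repeated checks, the service table above can be parsed rather than read by eye. The following is a minimal sketch that works on the pasted text of a '+---+' bordered table and lists every service that is not both enabled and up; the function names are illustrative, and the sample holds only two of the rows shown above.

```python
def parse_table(text):
    """Turn a '+---+' bordered CLI table into a list of dicts keyed by header."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border lines and any surrounding text are ignored
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first '|' line is the header row
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def unhealthy_services(rows):
    """Return binaries whose Status is not 'enabled' or whose State is not 'up'."""
    return [
        row["Binary"]
        for row in rows
        if row["Status"] != "enabled" or row["State"] != "up"
    ]


# Two rows copied from the table in this comment.
SAMPLE = """\
+--------------------------------------+------------------+--------------+----------+----------+-------+
| Id                                   | Binary           | Host         | Zone     | Status   | State |
+--------------------------------------+------------------+--------------+----------+----------+-------+
| 650c2034-e61e-4476-b03a-9e16e302038c | nova-compute     | controller-0 | nova     | disabled | down  |
| 6a5aa77d-0550-4a36-b5f7-81c5ffcb29ce | nova-conductor   | controller-0 | internal | enabled  | up    |
+--------------------------------------+------------------+--------------+----------+----------+-------+
"""

rows = parse_table(SAMPLE)
print(unhealthy_services(rows))
```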

Austin Sun (sunausti) wrote :

Thanks Alonso. The nova service was "disabled by VIM". Could you provide all logs under /var/log and the All-in-one XML for further analysis?

Juan Carlos Alonso (juancarlosa) wrote :

All the content in /var/log

Juan Carlos Alonso (juancarlosa) wrote :

XML of All-in-one Simplex

Juan Carlos Alonso (juancarlosa) wrote :

Output from 'collect --all'

Austin Sun (sunausti) wrote :

Hi Alonso,
I think you are not fully following https://docs.starlingx.io/installation_guide/simplex.html.
According to this wiki, the VM should be set up with 6 CPU cores, but your controller-0.xml defines only 4. Could you change your VM XML to 6 CPU cores and try again? I tried on my setup: with 4 CPU cores, the host ends up degraded.
Waiting for your test result with the new VM config.
Thanks.
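
The CPU-count mismatch described above can be caught before deployment. The following is a minimal sketch, assuming the VM definition is a libvirt domain XML whose core count sits in a top-level `<vcpu>` element. The function names and the sample XML are hypothetical (the attached controller-0.xml is not reproduced here); the value 6 is the one quoted from the installation guide.

```python
import xml.etree.ElementTree as ET

REQUIRED_VCPUS = 6  # core count quoted from the installation guide in this thread


def vcpu_count(domain_xml):
    """Return the integer held in the domain's top-level <vcpu> element."""
    root = ET.fromstring(domain_xml)
    vcpu = root.find("vcpu")
    if vcpu is None or not (vcpu.text or "").strip():
        raise ValueError("no <vcpu> element found in domain XML")
    return int(vcpu.text.strip())


def has_enough_vcpus(domain_xml, required=REQUIRED_VCPUS):
    """True when the VM defines at least the required number of cores."""
    return vcpu_count(domain_xml) >= required


# Hypothetical domain definition with the 4-core setting that led to degradation.
SAMPLE_XML = """
<domain type='kvm'>
  <name>controller-0</name>
  <vcpu placement='static'>4</vcpu>
</domain>
"""

print(vcpu_count(SAMPLE_XML), has_enough_vcpus(SAMPLE_XML))
```

Running the same check against the XML actually used for the VM, before booting it, would flag a 4-core definition right away.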

Juan Carlos Alonso (juancarlosa) wrote :

I increased the CPUs to 6 and it provisioned correctly; controller-0 is unlocked, enabled and available.
I was able to run Installation, Provisioning and Sanity Test :)

Austin Sun (sunausti) wrote :

Great, then I will close this bug. If you have a new issue, please raise a new one.
Thanks.

Changed in starlingx:
status: Incomplete → Invalid
Ken Young (kenyis)
tags: added: stx.2019.05
removed: stx.2019.03
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05