Cannot source '/etc/platform/openrc'

Bug #1833157 reported by Juan Carlos Alonso
Affects: StarlingX
Status: Invalid
Importance: Critical
Assigned to: Lin Shuicheng

Bug Description

Brief Description
-----------------
After unlocking controller-0 (active), the host does not boot correctly.
After a forced reboot the host boots correctly, but it is not possible to log in to keystone with 'source /etc/platform/openrc'.

controller-0:~$ source /etc/platform/openrc
Openstack Admin credentials can only be loaded from the active controller.

Severity
--------
<Critical: System/Feature is not usable due to the defect>

Steps to Reproduce
------------------
Follow the steps to install and provision STX.
$ system host-unlock controller-0
The host does not boot. Force a reboot.
Try 'source /etc/platform/openrc'.

Expected Behavior
------------------
It is possible to log in to keystone with 'source /etc/platform/openrc' and continue the provisioning steps.

Actual Behavior
----------------
Cannot log in to keystone:
Openstack Admin credentials can only be loaded from the active controller.

Reproducibility
---------------
<Reproducible/100%>

System Configuration
--------------------
AIO Simplex Virtual environment
BUILD_ID="20190616T233000Z"
Reproduced on: BUILD_ID="20190617T233000Z"
Baremetal Simplex, Duplex, Standard (2+2 and 2+2+2).

Logs
--------------
See the attached collect.

Test Activity
-------------
[Sanity]

Juan Carlos Alonso (juancarlosa) wrote :
Cristopher Lemus (cjlemusc) wrote :

With image BUILD_ID="20190617T233000Z", this bug was reproduced on baremetal: Simplex, Duplex, and Standard (2+2 and 2+2+2).

The systems have been up for almost 2 hours and it is still not possible to run "source /etc/platform/openrc".

controller-0:~$ uptime
 11:22:33 up 1:53, 1 user, load average: 0.05, 0.08, 0.06
controller-0:~$ !source
source /etc/platform/openrc
Openstack Admin credentials can only be loaded from the active controller.

A collect from Standard Baremetal is attached.

description: updated
Bruce Jones (brucej) wrote :

Cindy, please have someone look into this ASAP. Thanks!

Changed in starlingx:
assignee: nobody → Cindy Xie (xxie1)
Ghada Khalil (gkhalil)
tags: added: stx.sanity
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Critical
Austin Sun (sunausti) wrote :

Puppet did not complete successfully:
2019-06-17T15:22:44.419 Debug: 2019-06-17 15:22:44 +0000 /Stage[main]/Platform::Drbd::Cgcs/Platform::Drbd::Filesystem[drbd-cgcs]/Drbd::Resource[drbd-cgcs]/Drbd::Resource::Enable[drbd-cgcs]/Drbd::Resource::Up[drbd-cgcs]/Exec[initialize DRBD metadata for drbd-cgcs]/unless: Found meta data is "unclean", please apply-al first
2019-06-17T15:22:44.421 Debug: 2019-06-17 15:22:44 +0000 /Stage[main]/Platform::Drbd::Cgcs/Platform::Drbd::Filesystem[drbd-cgcs]/Drbd::Resource[drbd-cgcs]/Drbd::Resource::Enable[drbd-cgcs]/Drbd::Resource::Up[drbd-cgcs]/Exec[initialize DRBD metadata for drbd-cgcs]/unless: Command 'drbdmeta 3 v08 /dev/cgts-vg/cgcs-lv internal dump-md' terminated with exit code 255
2019-06-17T15:22:44.423 Debug: 2019-06-17 15:22:44 +0000 /Stage[main]/Platform::Drbd::Cgcs/Platform::Drbd::Filesystem[drbd-cgcs]/Drbd::Resource[drbd-cgcs]/Drbd::Resource::Enable[drbd-cgcs]/Drbd::Resource::Up[drbd-cgcs]/Exec[initialize DRBD metadata for drbd-cgcs]/unless: 3: Failure: (127) Device minor not allocated
2019-06-17T15:22:44.425 Debug: 2019-06-17 15:22:44 +0000 /Stage[main]/Platform::Drbd::Cgcs/Platform::Drbd::Filesystem[drbd-cgcs]/Drbd::Resource[drbd-cgcs]/Drbd::Resource::Enable[drbd-cgcs]/Drbd::Resource::Up[drbd-cgcs]/Exec[initialize DRBD metadata for drbd-cgcs]/unless: additional info from kernel:
2019-06-17T15:22:44.427 Debug: 2019-06-17 15:22:44 +0000 /Stage[main]/Platform::Drbd::Cgcs/Platform::Drbd::Filesystem[drbd-cgcs]/Drbd::Resource[drbd-cgcs]/Drbd::Resource::Enable[drbd-cgcs]/Drbd::Resource::Up[drbd-cgcs]/Exec[initialize DRBD metadata for drbd-cgcs]/unless: unknown minor
2019-06-17T15:22:44.429 Debug: 2019-06-17 15:22:44 +0000 /Stage[main]/Platform::Drbd::Cgcs/Platform::Drbd::Filesystem[drbd-cgcs]/Drbd::Resource[drbd-cgcs]/Drbd::Resource::Enable[drbd-cgcs]/Drbd::Resource::Up[drbd-cgcs]/Exec[initialize DRBD metadata for drbd-cgcs]/unless: Command 'drbdsetup cstate 3' terminated with exit code 10

Need to check why DRBD failed.

Changed in starlingx:
assignee: Cindy Xie (xxie1) → Lin Shuicheng (shuicheng)
Lin Shuicheng (shuicheng) wrote :

Both logs show a DRBD configuration failure in puppet.log. Puppet configuration must succeed before the sm service can run, which is why the source command cannot be used.

There appears to be a network/IP loss issue in both logs, and the lost IP caused DRBD to fail to initialize.
For the first log (controller-0_20190617.192835.tar), simplex system:
The error message in the puppet log is:
2019-06-17T15:22:12.763 Notice: 2019-06-17 15:22:12 +0000 /Stage[main]/Platform::Drbd::Extension/Platform::Drbd::Filesystem[drbd-extension]/Drbd::Resource[drbd-extension]/Drbd::Resource::Enable[drbd-extension]/Drbd::Resource::Up[drbd-extension]/Exec[reuse existing DRBD resource drbd-extension]/returns: IP 192.168.204.3 not found on this host.
2019-06-17T15:22:12.769 Error: 2019-06-17 15:22:12 +0000 drbdadm adjust drbd-extension returned 10 instead of one of [0]

puppet.log also shows that this mgmt IP was configured successfully, so it was lost after configuration.

For the second log (controller-0_20190618.092729.tar), standard system:
The first error message in the puppet log is:
2019-06-18T08:35:27.009 Notice: 2019-06-18 08:35:26 +0000 /Stage[main]/Platform::Sm/Exec[Configure Cluster Host Interface]/returns: sm-configure interface: error: too few arguments
2019-06-18T08:35:27.010 Error: 2019-06-18 08:35:26 +0000 sm-configure interface controller cluster-host-interface 239.1.1.1 2222 2223 192.168.206.4 2222 2223 returned 2 instead of one of [0]

Need to check further what causes the IP/network issue.
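
For anyone repeating this kind of triage, the first failing puppet step can be found by filtering the log for Error lines. The following is a minimal sketch, not part of the original analysis: the helper name puppet_errors is hypothetical, and the path to puppet.log inside a collect archive is an assumption that must be supplied by the caller. It also strips terminal color escape codes, which raw puppet logs can contain.

```shell
# Hypothetical helper: print puppet "Error:" lines from a log file,
# with ANSI color escape sequences removed.
puppet_errors() {
    # $1: path to a puppet.log file (the location is an assumption;
    #     adjust it to wherever the collect archive was extracted)
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;]*m//g" "$1" | grep 'Error: '
}
```

Calling puppet_errors with the path to an extracted puppet.log prints only the Error lines, so the earliest one can be read off the top of the output.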

Lin Shuicheng (shuicheng) wrote :

Both logs indicate that the management IP (192.168.204.3 for simplex and 10.10.54.3 for standard) is lost after configuration in puppet. Not sure what causes it. I have asked the submitter to provide a live environment to check further.
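
On a live environment, one way to confirm whether the management IP is still assigned is to look for it in the output of the iproute2 ip command. This is a sketch under stated assumptions: the helper name has_ip is hypothetical, and it expects the one-line-per-address format produced by 'ip -o -4 addr show', where the fourth field is the address with its prefix length.

```shell
# Hypothetical helper: succeed if the given IPv4 address appears in
# `ip -o -4 addr show` style output read from stdin.
has_ip() {
    # $1: IPv4 address to look for, e.g. 192.168.204.3
    # Field 4 looks like "192.168.204.3/24"; compare the part before "/"
    # exactly, so 192.168.204.30 does not match 192.168.204.3.
    awk -v want="$1" '
        { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit !found }
    '
}

# Usage on a live host:
#   ip -o -4 addr show | has_ip 192.168.204.3 && echo present || echo missing
```

Reading from stdin keeps the check usable both on a running controller and against saved command output from a collect.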

Cindy Xie (xxie1) wrote :

In the community call, Ada said that GDC root-caused the issue: it was caused by the test automation scripts (one step is missing). It should be fixed very soon.

tags: added: stx.storage
Cindy Xie (xxie1)
tags: added: stx.distro.other
removed: stx.storage
Ghada Khalil (gkhalil) wrote :

Once this is confirmed as a procedural issue, please close as Invalid.
From a gating perspective, I've marked this as stx.2.0 gating because it is causing a red sanity.

Changed in starlingx:
status: New → In Progress
tags: added: stx.2.0
Cristopher Lemus (cjlemusc) wrote :

Confirmed that this was caused by a new change required in the configuration of OAM and MGMT interfaces. We are going to adapt the automation that we have in place. Closing as Invalid.

Changed in starlingx:
status: In Progress → Invalid