StarlingX

Cannot source '/etc/platform/openrc'

Bug #1833157 reported by Juan Carlos Alonso on 2019-06-18

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	StarlingX	Invalid	Critical	Lin Shuicheng

Bug Description

Brief Description
-----------------
After unlocking controller-0 (active), it does not boot correctly.
After forcing a reboot, the host boots correctly, but the keystone credentials cannot be loaded with 'source /etc/platform/openrc':

controller-0:~$ source /etc/platform/openrc
Openstack Admin credentials can only be loaded from the active controller.

Severity
--------
<Critical: System/Feature is not usable due to the defect>

Steps to Reproduce
------------------
Install and provision STX.
$ system host-unlock controller-0
The host does not boot. Force a reboot.
Run 'source /etc/platform/openrc'.

Expected Behavior
------------------
'source /etc/platform/openrc' loads the keystone credentials, and the provisioning steps can continue.

Actual Behavior
----------------
The keystone credentials cannot be loaded:
Openstack Admin credentials can only be loaded from the active controller.

Reproducibility
---------------
<Reproducible/100%>

System Configuration
--------------------
AIO Simplex virtual environment
BUILD_ID="20190616T233000Z"
Reproduced on: BUILD_ID="20190617T233000Z", bare metal Simplex, Duplex, and Standard (2+2 and 2+2+2).

Logs
--------------
See the attached collect output.

Test Activity
-------------
[Sanity]

Juan Carlos Alonso (juancarlosa) wrote on 2019-06-18:

controller-0_20190617.192835.tar (16.2 MiB, application/x-tar)

Cristopher Lemus (cjlemusc) wrote on 2019-06-18:

controller-0_20190618.092729.tar (15.4 MiB, application/x-tar)

With image BUILD_ID="20190617T233000Z", this bug was reproduced on bare metal: Simplex, Duplex, and Standard (2+2 and 2+2+2).

The systems have been up for almost 2 hours, and it is still not possible to run "source /etc/platform/openrc".

controller-0:~$ uptime
11:22:33 up 1:53, 1 user, load average: 0.05, 0.08, 0.06
controller-0:~$ !source
source /etc/platform/openrc
Openstack Admin credentials can only be loaded from the active controller.

A collect from Standard Baremetal is attached.

description:

updated

Bruce Jones (brucej) wrote on 2019-06-18:

Cindy, please have someone look into this ASAP. Thanks!

Changed in starlingx:
assignee:	nobody → Cindy Xie (xxie1)

Ghada Khalil (gkhalil) on 2019-06-18

tags:

added: stx.sanity

Ghada Khalil (gkhalil) on 2019-06-18

Changed in starlingx:
importance:	Undecided → Critical

Austin Sun (sunausti) wrote on 2019-06-19:

Puppet did not complete successfully:
2019-06-17T15:22:44.419 Debug: 2019-06-17 15:22:44 +0000 /Stage[main]/Platform::Drbd::Cgcs/Platform::Drbd::Filesystem[drbd-cgcs]/Drbd::Resource[drbd-cgcs]/Drbd::Resource::Enable[drbd-cgcs]/Drbd::Resource::Up[drbd-cgcs]/Exec[initialize DRBD metadata for drbd-cgcs]/unless: Found meta data is "unclean", please apply-al first
2019-06-17T15:22:44.421 Debug: 2019-06-17 15:22:44 +0000 /Stage[main]/Platform::Drbd::Cgcs/Platform::Drbd::Filesystem[drbd-cgcs]/Drbd::Resource[drbd-cgcs]/Drbd::Resource::Enable[drbd-cgcs]/Drbd::Resource::Up[drbd-cgcs]/Exec[initialize DRBD metadata for drbd-cgcs]/unless: Command 'drbdmeta 3 v08 /dev/cgts-vg/cgcs-lv internal dump-md' terminated with exit code 255
2019-06-17T15:22:44.423 Debug: 2019-06-17 15:22:44 +0000 /Stage[main]/Platform::Drbd::Cgcs/Platform::Drbd::Filesystem[drbd-cgcs]/Drbd::Resource[drbd-cgcs]/Drbd::Resource::Enable[drbd-cgcs]/Drbd::Resource::Up[drbd-cgcs]/Exec[initialize DRBD metadata for drbd-cgcs]/unless: 3: Failure: (127) Device minor not allocated
2019-06-17T15:22:44.425 Debug: 2019-06-17 15:22:44 +0000 /Stage[main]/Platform::Drbd::Cgcs/Platform::Drbd::Filesystem[drbd-cgcs]/Drbd::Resource[drbd-cgcs]/Drbd::Resource::Enable[drbd-cgcs]/Drbd::Resource::Up[drbd-cgcs]/Exec[initialize DRBD metadata for drbd-cgcs]/unless: additional info from kernel:
2019-06-17T15:22:44.427 Debug: 2019-06-17 15:22:44 +0000 /Stage[main]/Platform::Drbd::Cgcs/Platform::Drbd::Filesystem[drbd-cgcs]/Drbd::Resource[drbd-cgcs]/Drbd::Resource::Enable[drbd-cgcs]/Drbd::Resource::Up[drbd-cgcs]/Exec[initialize DRBD metadata for drbd-cgcs]/unless: unknown minor
2019-06-17T15:22:44.429 Debug: 2019-06-17 15:22:44 +0000 /Stage[main]/Platform::Drbd::Cgcs/Platform::Drbd::Filesystem[drbd-cgcs]/Drbd::Resource[drbd-cgcs]/Drbd::Resource::Enable[drbd-cgcs]/Drbd::Resource::Up[drbd-cgcs]/Exec[initialize DRBD metadata for drbd-cgcs]/unless: Command 'drbdsetup cstate 3' terminated with exit code 10

Need to check why DRBD failed.

Lin Shuicheng (shuicheng) on 2019-06-19

Changed in starlingx:
assignee:	Cindy Xie (xxie1) → Lin Shuicheng (shuicheng)

Lin Shuicheng (shuicheng) wrote on 2019-06-19:

Both logs show a DRBD configuration failure in puppet.log. Puppet configuration must succeed for the sm service to run, which is why the source command cannot be used.

Both logs appear to show a network/IP loss issue, and the lost IP caused DRBD initialization to fail.
For the first log (controller-0_20190617.192835.tar), a simplex system:
The error message in puppet.log is:
2019-06-17T15:22:12.763 Notice: 2019-06-17 15:22:12 +0000 /Stage[main]/Platform::Drbd::Extension/Platform::Drbd::Filesystem[drbd-extension]/Drbd::Resource[drbd-extension]/Drbd::Resource::Enable[drbd-extension]/Drbd::Resource::Up[drbd-extension]/Exec[reuse existing DRBD resource drbd-extension]/returns: IP 192.168.204.3 not found on this host.
2019-06-17T15:22:12.769 Error: 2019-06-17 15:22:12 +0000 drbdadm adjust drbd-extension returned 10 instead of one of [0]

puppet.log also shows that this mgmt IP was configured successfully, so it was lost after configuration.
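The DRBD message above reports that the resource's local address is not configured on any interface of the host. On a live system, a quick manual check along these lines could confirm whether the address is still present. This is a minimal sketch that assumes the iproute2 `ip` command is installed; the `ip_present` helper is hypothetical and is not how StarlingX or DRBD performs the check.

```shell
#!/bin/sh
# Hypothetical helper (not part of StarlingX or DRBD): succeed if the given
# IPv4 address is assigned to any local interface, fail otherwise.
ip_present() {
    # `ip -o -4 addr show` prints one line per address; the 4th field is
    # "A.B.C.D/prefix". Strip the prefix length and match the address exactly.
    ip -o -4 addr show 2>/dev/null | awk '{print $4}' | cut -d/ -f1 | grep -qxF "$1"
}

# Example: check the simplex management IP reported in the log above.
if ip_present 192.168.204.3; then
    echo "192.168.204.3 is configured on this host"
else
    echo "192.168.204.3 is NOT configured on this host"
fi
```

Running the check right after puppet reports the address as configured, and again after the failure, would show whether the address is actually being removed.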

For the second log (controller-0_20190618.092729.tar), a standard system:
The first error message in puppet.log is:
2019-06-18T08:35:27.009 Notice: 2019-06-18 08:35:26 +0000 /Stage[main]/Platform::Sm/Exec[Configure Cluster Host Interface]/returns: sm-configure interface: error: too few arguments
2019-06-18T08:35:27.010 Error: 2019-06-18 08:35:26 +0000 sm-configure interface controller cluster-host-interface 239.1.1.1 2222 2223 192.168.206.4 2222 2223 returned 2 instead of one of [0]

Further investigation is needed to determine what causes the IP/network issue.
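Both analyses start from the first Error line in puppet.log. A small sketch of that step is below. It assumes puppet.log is a plain text file that may contain ANSI color sequences like the ones seen in the excerpts; the `first_puppet_error` helper and the example path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first "Error:" line of a puppet log with
# ANSI color escape sequences (e.g. ESC[1;31m ... ESC[0m) removed.
first_puppet_error() {
    esc=$(printf '\033')
    # Strip color codes first, then keep only the first line containing "Error:".
    sed "s/${esc}\[[0-9;]*m//g" "$1" | grep -F 'Error:' | head -n 1
}

# Example usage (path is an assumption; point it at puppet.log inside the
# extracted collect tarball):
# first_puppet_error ./puppet.log
```

Stripping the color codes before matching keeps the output readable when it is pasted into a bug comment.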

Lin Shuicheng (shuicheng) wrote on 2019-06-19:

Both logs indicate that the management IP (192.168.204.3 for simplex and 10.10.54.3 for standard) is lost after puppet configures it. The cause is not yet clear. I have asked the submitter to provide a live environment for further checking.

Cindy Xie (xxie1) wrote on 2019-06-19:

In the community call, Ada said that GDC root-caused the issue to the test automation scripts, which were missing one step. It should be fixed very soon.

tags:

added: stx.storage

Cindy Xie (xxie1) on 2019-06-19

tags:

added: stx.distro.other
removed: stx.storage

Ghada Khalil (gkhalil) wrote on 2019-06-19:

Once this is confirmed as a procedural issue, please close it as Invalid.
From a gating perspective, I've marked this as stx.2.0 gating because it is causing a red sanity.

Changed in starlingx:
status:	New → In Progress
tags:	added: stx.2.0

Cristopher Lemus (cjlemusc) wrote on 2019-06-19:

Confirmed that this was caused by a newly required change in the configuration of the OAM and MGMT interfaces. We are going to adapt the automation we have in place. Closing as Invalid.