Duplex system: controller-1 fails to boot after unlock

Bug #1860529 reported by Lin Shuicheng
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Critical
Assigned to: Lin Shuicheng

Bug Description

Brief Description
-----------------
With the patch below [0] merged, duplex deployment fails when controller-1 is unlocked.
After unlock, puppet apply fails on controller-1 because controller-1 does not have the "/opt/platform/config" directory. This causes code [1] to fail, so controller-1 does not boot successfully after unlock.

[0]: https://review.opendev.org/#/c/703266/
[1]: https://review.opendev.org/#/c/703266/2/puppet-manifests/src/modules/platform/manifests/dockerdistribution.pp

Severity
--------
<Critical: System/Feature is not usable due to the defect>

Steps to Reproduce
------------------
1. Sync the latest code.
2. Perform a duplex deployment.
3. Unlock controller-1; it fails to boot.

Expected Behavior
------------------
controller-1 unlocks and boots successfully.

Actual Behavior
----------------
controller-1 fails to boot, and errors appear in the worker puppet log.

Reproducibility
---------------
100%

System Configuration
--------------------
duplex system

Branch/Pull Time/Commit
-----------------------
Latest code after patch https://review.opendev.org/#/c/703266

Last Pass
---------
Code before https://review.opendev.org/#/c/703266

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Sanity

Workaround
----------
None provided.

Changed in starlingx:
assignee: nobody → Lin Shuicheng (shuicheng)
Changed in starlingx:
assignee: Lin Shuicheng (shuicheng) → nobody
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/703783
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=c94fa4a0174b96e0716d39bbea7e6fbbbee415a9
Submitter: Zuul
Branch: master

commit c94fa4a0174b96e0716d39bbea7e6fbbbee415a9
Author: Shuicheng Lin <email address hidden>
Date: Thu Jan 23 02:45:31 2020 +0800

    Fix duplex system controller-1 fail to boot after unlock

    The failure occurs because controller-1 does not have the
    /opt/platform/config folder, so puppet fails when it uses a
    non-existent file as a source. Restrict the code to worker nodes
    only, since controller nodes already have the CA cert in the ssl folder.

    Test:
    Passed simplex/duplex/multi-node deployments on VMs.

    Closes-Bug: 1860529
    Change-Id: I808ee15e5c78ebead114219d0ec428fb45cc9128
    Signed-off-by: Shuicheng Lin <email address hidden>
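To make the fix concrete, here is a minimal Python model of the guard described in the commit message above. It is a hypothetical sketch, not the Puppet code in dockerdistribution.pp: the Node type, the personality values, the plan_ca_cert_copy helper and the ca-cert.pem file name are assumptions for illustration. It models a file resource whose source lives under /opt/platform/config: without the worker-only restriction, a controller that lacks that directory fails; with it, controllers skip the step.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional

# Directory named in the bug report; the cert file name used below is hypothetical.
PLATFORM_CONFIG_DIR = "/opt/platform/config"


@dataclass(frozen=True)
class Node:
    """Hypothetical, simplified view of a host for this model."""
    hostname: str
    personality: str  # assumed values: "controller" or "worker"
    existing_paths: FrozenSet[str] = field(default_factory=frozenset)


def plan_ca_cert_copy(node: Node,
                      restrict_to_worker: bool = True,
                      cert_name: str = "ca-cert.pem") -> Optional[str]:
    """Return the source path the copy would read, or None if it is skipped.

    restrict_to_worker=False models the behaviour before the fix, where the
    copy ran on every node; True models the fix, where only workers run it
    because controllers already hold the CA cert in their ssl folder.
    """
    if restrict_to_worker and node.personality != "worker":
        return None  # controller: nothing to copy
    source = f"{PLATFORM_CONFIG_DIR}/{cert_name}"
    if source not in node.existing_paths:
        # Models Puppet failing on a file resource whose source does not exist.
        raise FileNotFoundError(source)
    return source


if __name__ == "__main__":
    controller_1 = Node("controller-1", "controller")  # no /opt/platform/config
    worker_0 = Node("worker-0", "worker",
                    frozenset({f"{PLATFORM_CONFIG_DIR}/ca-cert.pem"}))
    try:
        plan_ca_cert_copy(controller_1, restrict_to_worker=False)
    except FileNotFoundError as err:
        print("before fix, controller-1 fails on missing source:", err)
    print("after fix, controller-1:", plan_ca_cert_copy(controller_1))
    print("after fix, worker-0:", plan_ca_cert_copy(worker_0))
```

The model only captures the decision; whether a given node actually has /opt/platform/config is supplied through existing_paths rather than inferred.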

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil) wrote :

stx.4.0 / critical -- issue introduced by recent kata container code changes

tags: added: stx.config
Changed in starlingx:
importance: Undecided → High
assignee: nobody → Lin Shuicheng (shuicheng)
importance: High → Critical
tags: added: stx.4.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/705852

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (f/centos8)

Reviewed: https://review.opendev.org/705852
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=e1f095eb112f76a133734a17f01afeb9828ebaf2
Submitter: Zuul
Branch: f/centos8

commit fc7b9b3d8d811fd50427b584dae5b7488947bb03
Author: Angie Wang <email address hidden>
Date: Tue Jan 28 13:57:52 2020 -0500

    Fix the image download failure on IPv6 system

    "crictl pull" failed to pull images on IPv6 system with
    proxy setting since Containerd doesn't work with the
    NO_PROXY environment variable that has IPv6 addresses
    with square brackets. This commit strips the square brackets
    from the NO_PROXY environment variable.

    Change-Id: I6bb5ad0379f576f66d77a90dfdca94f5e0f28f0c
    Closes-Bug: 1859835
    Signed-off-by: Angie Wang <email address hidden>
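The IPv6 NO_PROXY fix above amounts to removing the square brackets from each entry before the value reaches containerd. The following is a hedged Python sketch of that transformation only; the helper names strip_ipv6_brackets and proxy_env_for_containerd are hypothetical and this is not the stx-puppet implementation.

```python
import os


def strip_ipv6_brackets(no_proxy: str) -> str:
    """Remove square brackets from each entry of a comma-separated NO_PROXY.

    "[fd00::1]" becomes "fd00::1"; hostnames and IPv4 entries are unchanged.
    Empty entries and surrounding whitespace are dropped.
    """
    entries = (entry.strip() for entry in no_proxy.split(","))
    return ",".join(e.replace("[", "").replace("]", "") for e in entries if e)


def proxy_env_for_containerd(base_env: dict) -> dict:
    """Return a copy of base_env with bracket-free NO_PROXY/no_proxy values."""
    env = dict(base_env)
    for key in ("NO_PROXY", "no_proxy"):
        if key in env:
            env[key] = strip_ipv6_brackets(env[key])
    return env


if __name__ == "__main__":
    print(strip_ipv6_brackets("localhost,[fd00::1],[fd01::2]"))
    # prints: localhost,fd00::1,fd01::2
    env = proxy_env_for_containerd(dict(os.environ))
```

One caveat of stripping blindly: an entry that carries a port, such as "[fd00::1]:5000", becomes ambiguous once the brackets are gone, so the sketch assumes NO_PROXY holds bare addresses.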

commit 950670ac1f0bfaa43e29eeb3ffda71a94de66520
Author: Jim Somerville <email address hidden>
Date: Mon Jan 27 17:09:52 2020 -0500

    Security: Add nospectre_v1 to the security params

    Most of the v1 mitigation is baked into the kernel and not
    optional. The swapgs barriers are, however, optional.
    They have a negative performance impact so we disable them
    by using the nospectre_v1 kernel bootarg.

    Partial-Bug: 1860193
    Depends-On: https://review.opendev.org/#/c/704406
    Change-Id: Iaa11ba3f430fc064ebda679cf290474d3be413da
    Signed-off-by: Jim Somerville <email address hidden>
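The nospectre_v1 change works by adding one token to the kernel command line. As an illustration only, here is a hypothetical Python helper that appends a boot argument idempotently, assuming the security params end up as space-separated tokens on the command line; it is not how StarlingX stores or applies its security params.

```python
def add_bootarg(cmdline: str, arg: str) -> str:
    """Append a kernel boot argument unless the exact token is already present."""
    tokens = cmdline.split()
    if arg not in tokens:
        tokens.append(arg)
    return " ".join(tokens)


def has_bootarg(cmdline: str, arg: str) -> bool:
    """True if arg appears as a whole token (e.g. in the contents of /proc/cmdline)."""
    return arg in cmdline.split()


if __name__ == "__main__":
    cmdline = add_bootarg("ro quiet", "nospectre_v1")
    print(cmdline, has_bootarg(cmdline, "nospectre_v1"))
```

Matching whole tokens rather than substrings avoids treating an unrelated argument that merely contains the text as a match.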

commit 83775d38804fb665af518127051b37a1daf31e36
Author: David Sullivan <email address hidden>
Date: Wed Jan 15 23:50:23 2020 -0500

    Install secondary controller nodes with kubeadm join

    kubeadm init is no longer supported for installing secondary nodes in an
    HA Kubernetes cluster; kubeadm join with the --control-plane option
    should be used instead.

    Change-Id: I21a30b9e871d05c59a19e33a9d278f0217682da6
    Closes-Bug: 1846829
    Depends-On: https://review.opendev.org/702797
    Signed-off-by: David Sullivan <email address hidden>
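For reference, kubeadm's flag for joining an additional control-plane node is spelled --control-plane. The sketch below builds such a kubeadm join argument list in Python; the helper name, endpoint, token and hash values are hypothetical placeholders, and the actual StarlingX puppet code may pass different options. --certificate-key is only needed when the control-plane certificates were uploaded with kubeadm init --upload-certs, so the sketch makes it optional.

```python
import shlex
from typing import List, Optional


def kubeadm_join_control_plane_cmd(endpoint: str,
                                   token: str,
                                   ca_cert_hash: str,
                                   certificate_key: Optional[str] = None) -> List[str]:
    """Build argv for joining a secondary control-plane node (hypothetical helper)."""
    cmd = ["kubeadm", "join", endpoint,
           "--token", token,
           "--discovery-token-ca-cert-hash", ca_cert_hash,
           "--control-plane"]
    if certificate_key is not None:
        # Only meaningful if certs were uploaded via "kubeadm init --upload-certs".
        cmd += ["--certificate-key", certificate_key]
    return cmd


if __name__ == "__main__":
    # Placeholder values; do not reuse.
    argv = kubeadm_join_control_plane_cmd(
        "192.168.206.1:6443", "abcdef.0123456789abcdef", "sha256:<hash>")
    print(shlex.join(argv))
```

Building an argument list instead of a shell string keeps the values from being reinterpreted by a shell; shlex.join is used only for display.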

commit c94fa4a0174b96e0716d39bbea7e6fbbbee415a9
Author: Shuicheng Lin <email address hidden>
Date: Thu Jan 23 02:45:31 2020 +0800

    Fix duplex system controller-1 fail to boot after unlock

    It is due to controller-1 doesn't have /opt/platform/config folder.
    And cause puppet failure due to using non-exist file as source.
    Restrict the code for worker node only, since controller node
    already has ca cert in the ssl folder.

    Test:
    Pass simplex/duplex/multi node deployment with vm created.

    Closes-Bug: 1860529
    Change-Id: I808ee15e5c78ebead114219d0ec428fb45cc9128
    Signed-off-by: Shuicheng Lin <email address hidden>

commit 27f167eb14a04bc67ecca59af3b617c115522101
Author: Angie Wang <email address hidden>
Date: Wed Jan 15 16:15:26 2020 -0500

    Remove puppet-manifests code made obsolete by ansible

    As a result of the switch to Ansible, remove the obsolete erb
    templates and the dependency on the is_initial_config_primary
    facter.

    Change-Id: I4ca6525f01a37da971dc66a11ee99ea4e115e3ad
    Partial-Bug: 1834218
    Depends-On: https://review.opendev.org/#/c/703517/
 ...


tags: added: in-f-centos8