Upgrade activation failed, leading to swact

Bug #1954333 reported by Chris Friesen
This bug affects 1 person
Affects    Status        Importance  Assigned to    Milestone
StarlingX  Fix Released  Medium      Chris Friesen

Bug Description

Brief Description

Upgrade activation failed on the DC central cloud. A swact occurred soon after the activation, and the activation ended in a failed state.

The investigation captured the following information.

Code change related to the activation failure:

https://review.opendev.org/c/starlingx/stx-puppet/+/820418

This change causes the puppet manifest to fail because Kubernetes isn’t up. I assume it won’t come up until etcd is restarted, which would normally be the next action in the puppet manifest.

Swacting between the controllers restarts etcd.
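The ordering problem described above can be sketched with a toy model. This is not Puppet code; it only mimics the behaviour visible in the log, where a resource whose dependency failed is skipped. The resource names are taken from the log, while `Resource`, `apply_in_order`, and `simulate` are hypothetical helpers, and the rule "the API server is only reachable once etcd has been restarted" is the reporter's assumption, not a confirmed fact.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Resource:
    """A named step that may depend on earlier steps (toy model only)."""
    name: str
    run: Callable[[dict], bool]
    requires: List[str] = field(default_factory=list)


def apply_in_order(resources: List[Resource], state: dict) -> Dict[str, str]:
    """Run resources in order; skip any resource whose dependency did not succeed."""
    results: Dict[str, str] = {}
    for res in resources:
        if any(results.get(dep) != "ok" for dep in res.requires):
            results[res.name] = "skipped"
            continue
        results[res.name] = "ok" if res.run(state) else "failed"
    return results


def wait_for_api_server(state: dict) -> bool:
    # Assumption from the report: the API server stays down until etcd restarts.
    return state["etcd_restarted"]


def restart_etcd(state: dict) -> bool:
    state["etcd_restarted"] = True
    return True


def simulate(with_wait: bool) -> Dict[str, str]:
    """Model the manifest with or without the wait step placed before the etcd restart."""
    state = {"etcd_restarted": False}
    resources: List[Resource] = []
    deps: List[str] = []
    if with_wait:
        resources.append(Resource("wait_for_kube_api_server", wait_for_api_server))
        deps = ["wait_for_kube_api_server"]
    resources.append(Resource("sm-restart-etcd", restart_etcd, requires=deps))
    return apply_in_order(resources, state)


if __name__ == "__main__":
    print(simulate(with_wait=True))
    print(simulate(with_wait=False))
```

Under that assumption, the model with the wait step ends with the wait failed and the etcd restart skipped, while the model without it restarts etcd normally. That matches why reverting the change unblocked activation.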


2021-12-08 19:37:44,664 p=601519 u=root | changed: [localhost] => (item=apiserver-etcd-client.crt)

2021-12-08 19:37:44,750 p=601519 u=root | changed: [localhost] => (item=apiserver-etcd-client.key)

2021-12-08 19:37:44,821 p=601519 u=root | TASK [Create list of etcd classes to pass to puppet] ***************************

2021-12-08 19:37:44,822 p=601519 u=root | Wednesday 08 December 2021 19:37:44 +0000 (0:00:00.273) 0:00:09.706 ****

2021-12-08 19:37:45,111 p=601519 u=root | changed: [localhost]

2021-12-08 19:37:45,180 p=601519 u=root | TASK [Applying puppet for enabling etcd security] ******************************

2021-12-08 19:37:45,180 p=601519 u=root | Wednesday 08 December 2021 19:37:45 +0000 (0:00:00.358) 0:00:10.064 ****

2021-12-08 19:38:06,749 p=601519 u=root | fatal: [localhost]: FAILED! => changed=true

  cmd:

  - /usr/local/bin/puppet-manifest-apply.sh

  - /opt/platform/puppet/21.12/hieradata/

  - fd01:1::3

  - controller

  - runtime

  - /tmp/etcd.yml

  delta: '0:00:21.453324'

  end: '2021-12-08 19:38:06.732159'

  msg: non-zero return code

  rc: 1

  start: '2021-12-08 19:37:45.278835'

  stderr: ''

  stderr_lines: []

  stdout: |-

    Applying puppet runtime manifest...

    [WARNING]

    Warnings found. See /var/log/puppet/2021-12-08-19-37-45_runtime/puppet.log for details

  stdout_lines: <omitted>

2021-12-08T19:38:06.510 /usr/share/ruby/vendor_ruby/puppet/util/command_line.rb:72:in `execute'

2021-12-08T19:38:06.511 /usr/bin/puppet:5:in `<main>'

2021-12-08T19:38:06.513 Error: 2021-12-08 19:38:06 +0000 /Stage[main]/Platform::Kubernetes::Master::Change_apiserver_parameters/Exec[wait_for_kube_api_server]/returns: change from notrun to 0 failed: Command exceeded timeout

2021-12-08T19:38:06.515 Debug: 2021-12-08 19:38:06 +0000 Class[Platform::Kubernetes::Master::Change_apiserver_parameters]: Resource is being skipped, unscheduling all events

2021-12-08T19:38:06.516 Info: 2021-12-08 19:38:06 +0000 Class[Platform::Kubernetes::Master::Change_apiserver_parameters]: Unscheduling all events on Class[Platform::Kubernetes::Master::Change_apiserver_parameters]

2021-12-08T19:38:06.518 Debug: 2021-12-08 19:38:06 +0000 Platform::Sm::Restart[etcd]: Resource is being skipped, unscheduling all events

2021-12-08T19:38:06.519 Notice: 2021-12-08 19:38:06 +0000 /Stage[main]/Platform::Etcd::Upgrade::Runtime/Platform::Sm::Restart[etcd]/Exec[sm-restart-etcd]: Dependency Exec[wait_for_kube_api_server] has failures: true

2021-12-08T19:38:06.521 Warning: 2021-12-08 19:38:06 +0000 /Stage[main]/Platform::Etcd::Upgrade::Runtime/Platform::Sm::Restart[etcd]/Exec[sm-restart-etcd]: Skipping because of failed dependencies
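When triaging a similar failure, it can help to pull the failed and skipped resources out of puppet.log automatically. The following is a minimal sketch, assuming the line layout seen in the excerpt above (timestamp, level, a second timestamp, then the message). The function name `summarize` and the regular expressions are illustrative, not part of any StarlingX tooling. Raw log lines may carry terminal colour codes, so the sketch strips them first.

```python
import re
from typing import Dict, Iterable, List

# Strips real ANSI colour sequences as well as the mangled forms that appear
# when a log is pasted as text ("^[[0m", "[[1;31m").
ANSI_RE = re.compile(r"(?:\x1b\[|\^\[\[|\[\[)[0-9;]*m")

# Assumed layout: "<ts> <Level>: <date> <time> <tz> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>Error|Warning|Notice|Info|Debug): "
    r"\S+ \S+ \S+ (?P<body>.*)$"
)


def summarize(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect resources that failed and resources skipped due to failed dependencies."""
    report: Dict[str, List[str]] = {"failed": [], "skipped": []}
    for raw in lines:
        match = LINE_RE.match(ANSI_RE.sub("", raw).strip())
        if not match:
            continue  # stack-trace lines and other output are ignored
        # The resource path ends at the first ": " (the "::" in class names has no space).
        resource, _, message = match.group("body").partition(": ")
        if match.group("level") == "Error" and "failed" in message:
            report["failed"].append(resource)
        elif "Skipping because of failed dependencies" in message:
            report["skipped"].append(resource)
    return report


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        result = summarize(handle)
    for kind, resources in result.items():
        for resource in resources:
            print(f"{kind}: {resource}")
```

Run it with the path to a puppet.log file as the only argument. On the excerpt above it would be expected to report the `wait_for_kube_api_server` exec as failed and the `sm-restart-etcd` exec as skipped.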


Severity

Major

Steps to Reproduce

    Follow the upgrade procedure to upgrade the DC central cloud from 21.05 to 21.12.

The failure was seen during the upgrade activation step.

Expected Behavior

Upgrade activation succeeds.

Actual Behavior

Upgrade activation failed, as described above.

Reproducibility

 **

System Configuration

DC-1 Distributed system

Branch/Pull Time/Commit

2021-12-06_23-00-09

Last Pass

"2021-12-04_23-00-07"

Chris Friesen (cbf123) wrote :

The fix delivered for bug 1953183 in the Dec 6 load appears to have triggered this issue, as noted in puppet.log on upgrade-activate:

2021-12-08T19:38:06.519 Notice: 2021-12-08 19:38:06 +0000 /Stage[main]/Platform::Etcd::Upgrade::Runtime/Platform::Sm::Restart[etcd]/Exec[sm-restart-etcd]: Dependency Exec[wait_for_kube_api_server] has failures: true

The workaround used in the lab to progress the upgrade was to revert the changes in https://ala-codereviewti-prod.wrs.com/plugins/gitiles/cgcs/opendev.org.starlingx.stx-puppet/+/89f051d03ecad99f857088320dcda79a5a11fcb4 and rerun upgrade-activate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/821323

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/821323
Committed: https://opendev.org/starlingx/stx-puppet/commit/be27be2bfbe6251352a5b1989c759b224d47204f
Submitter: "Zuul (22348)"
Branch: master

commit be27be2bfbe6251352a5b1989c759b224d47204f
Author: Chris Friesen <email address hidden>
Date: Thu Dec 9 15:34:15 2021 -0600

    Revert "Wait for kube apiserver after apply service parameter"

    This reverts commit a1b99570ae0fd387cb0fe975d8974dcb4edbf367.

    Reason for revert: This causes upgrade activation to fail. We're
    working on an alternative fix, but reverting this immediately to
    unblock current testing activities.

    Closes-bug: 1954333
    Signed-off-by: Chris Friesen <email address hidden>
    Change-Id: Ibf09c83cc4192ed5505ada736d51fabf025865d0

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
description: updated
Changed in starlingx:
assignee: nobody → Chris Friesen (cbf123)
importance: Undecided → Medium
tags: added: stx.containers
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (r/stx.6.0)

Fix proposed to branch: r/stx.6.0
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/821421

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (r/stx.6.0)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/821421
Committed: https://opendev.org/starlingx/stx-puppet/commit/d44434a00374966cde79cc30838ab377b3c4d716
Submitter: "Zuul (22348)"
Branch: r/stx.6.0

commit d44434a00374966cde79cc30838ab377b3c4d716
Author: Chris Friesen <email address hidden>
Date: Thu Dec 9 15:34:15 2021 -0600

    Revert "Wait for kube apiserver after apply service parameter"

    This reverts commit a1b99570ae0fd387cb0fe975d8974dcb4edbf367.

    Reason for revert: This causes upgrade activation to fail. We're
    working on an alternative fix, but reverting this immediately to
    unblock current testing activities.

    Closes-bug: 1954333
    Signed-off-by: Chris Friesen <email address hidden>
    Change-Id: Ibf09c83cc4192ed5505ada736d51fabf025865d0
    (cherry picked from commit be27be2bfbe6251352a5b1989c759b224d47204f)

Ghada Khalil (gkhalil)
tags: added: stx.6.0 stx.cherrypickneeded
Ghada Khalil (gkhalil)
tags: added: in-r-stx60
removed: stx.cherrypickneeded