Comment 0 for bug 1954333

Chris Friesen (cbf123) wrote :

Brief Description

DC central cloud upgrade activation failed. A swact occurred soon after the activation, and the activation was left in a failed state.

The investigation captured the following information.

Code related to the activation failure:

 https://review.opendev.org/c/starlingx/stx-puppet/+/820418

This fails the puppet manifest because Kubernetes isn’t up. I assume it won’t come up until etcd is restarted, which would normally be the next action in the puppet manifest.

http://bitbucket.wrs.com/projects/CGCS/repos/opendev.org.starlingx.stx-puppet/browse/puppet-manifests/src/modules/platform/manifests/etcd.pp?at=refs%2Fheads%2FWRCP_21.12#184

Swacting between the controllers restarts etcd.
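
The ordering problem described above can be illustrated with a toy model. This is only a sketch under the assumption stated in this report (that kube-apiserver does not come up until etcd has been restarted); the function name and event strings are hypothetical and are not taken from the puppet manifest.

```python
def apply_manifest(etcd_restarted):
    """Toy model of the step ordering described in this report.

    Assumption (from the report, not verified): kube-apiserver only
    becomes reachable once etcd has been restarted.
    """
    events = []
    api_up = etcd_restarted
    if api_up:
        events.append("wait_for_kube_api_server: ok")
        events.append("sm-restart-etcd: ran")
    else:
        # The wait times out, so the dependent etcd restart is skipped,
        # which leaves the API server down.
        events.append("wait_for_kube_api_server: timeout")
        events.append("sm-restart-etcd: skipped (failed dependency)")
    return events


if __name__ == "__main__":
    for restarted in (False, True):
        print(restarted, apply_manifest(restarted))
```

In this model the manifest can only succeed if something outside it, such as a swact, has already restarted etcd.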


2021-12-08 19:37:44,664 p=601519 u=root | changed: [localhost] => (item=apiserver-etcd-client.crt)

2021-12-08 19:37:44,750 p=601519 u=root | changed: [localhost] => (item=apiserver-etcd-client.key)

2021-12-08 19:37:44,821 p=601519 u=root | TASK [Create list of etcd classes to pass to puppet] ***************************

2021-12-08 19:37:44,822 p=601519 u=root | Wednesday 08 December 2021 19:37:44 +0000 (0:00:00.273) 0:00:09.706 ****

2021-12-08 19:37:45,111 p=601519 u=root | changed: [localhost]

2021-12-08 19:37:45,180 p=601519 u=root | TASK [Applying puppet for enabling etcd security] ******************************

2021-12-08 19:37:45,180 p=601519 u=root | Wednesday 08 December 2021 19:37:45 +0000 (0:00:00.358) 0:00:10.064 ****

2021-12-08 19:38:06,749 p=601519 u=root | fatal: [localhost]: FAILED! => changed=true

  cmd:

  - /usr/local/bin/puppet-manifest-apply.sh

  - /opt/platform/puppet/21.12/hieradata/

  - fd01:1::3

  - controller

  - runtime

  - /tmp/etcd.yml

  delta: '0:00:21.453324'

  end: '2021-12-08 19:38:06.732159'

  msg: non-zero return code

  rc: 1

  start: '2021-12-08 19:37:45.278835'

  stderr: ''

  stderr_lines: []

  stdout: |-

    Applying puppet runtime manifest...

    [WARNING]

    Warnings found. See /var/log/puppet/2021-12-08-19-37-45_runtime/puppet.log for details

  stdout_lines: <omitted>

2021-12-08T19:38:06.510 /usr/share/ruby/vendor_ruby/puppet/util/command_line.rb:72:in `execute'

2021-12-08T19:38:06.511 /usr/bin/puppet:5:in `<main>'

2021-12-08T19:38:06.513 Error: 2021-12-08 19:38:06 +0000 /Stage[main]/Platform::Kubernetes::Master::Change_apiserver_parameters/Exec[wait_for_kube_api_server]/returns: change from notrun to 0 failed: Command exceeded timeout

2021-12-08T19:38:06.515 Debug: 2021-12-08 19:38:06 +0000 Class[Platform::Kubernetes::Master::Change_apiserver_parameters]: Resource is being skipped, unscheduling all events

2021-12-08T19:38:06.516 Info: 2021-12-08 19:38:06 +0000 Class[Platform::Kubernetes::Master::Change_apiserver_parameters]: Unscheduling all events on Class[Platform::Kubernetes::Master::Change_apiserver_parameters]

2021-12-08T19:38:06.518 Debug: 2021-12-08 19:38:06 +0000 Platform::Sm::Restart[etcd]: Resource is being skipped, unscheduling all events

2021-12-08T19:38:06.519 Notice: 2021-12-08 19:38:06 +0000 /Stage[main]/Platform::Etcd::Upgrade::Runtime/Platform::Sm::Restart[etcd]/Exec[sm-restart-etcd]: Dependency Exec[wait_for_kube_api_server] has failures: true

2021-12-08T19:38:06.521 Warning: 2021-12-08 19:38:06 +0000 /Stage[main]/Platform::Etcd::Upgrade::Runtime/Platform::Sm::Restart[etcd]/Exec[sm-restart-etcd]: Skipping because of failed dependencies
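
For triage, the failure chain in a puppet.log excerpt like the one above can be pulled out with a short script. This is a minimal sketch, not part of any StarlingX tooling: the names `ANSI_RE`, `LINE_RE`, and `summarize` are hypothetical, and the line layout is assumed to match the excerpt in this report (a level keyword, a three-field timestamp, then the message).

```python
import re

# Color codes, with a real ESC byte or with the ESC mangled to "^[" or "["
# by copy/paste.
ANSI_RE = re.compile(r"(?:\x1b|\^\[|\[)\[[0-9;]*m")

# "<level>: <date> <time> <tz> <message>", as in the excerpt in this report.
LINE_RE = re.compile(r"\b(Error|Warning|Notice|Info|Debug): \S+ \S+ \S+ (.*)$")

SKIP_SUFFIX = ": Skipping because of failed dependencies"


def summarize(lines):
    """Return (error messages, resources skipped due to failed dependencies)."""
    failed, skipped = [], []
    for raw in lines:
        line = ANSI_RE.sub("", raw).strip()
        match = LINE_RE.search(line)
        if not match:
            continue
        level, message = match.groups()
        if level == "Error":
            failed.append(message)
        elif level == "Warning" and message.endswith(SKIP_SUFFIX):
            skipped.append(message[: -len(SKIP_SUFFIX)])
    return failed, skipped


if __name__ == "__main__":
    import sys

    # Usage (path is whatever puppet.log you are inspecting):
    #   python summarize_puppet_log.py /path/to/puppet.log
    with open(sys.argv[1], errors="replace") as handle:
        errors, skipped_resources = summarize(handle)
    for message in errors:
        print("FAILED :", message)
    for resource in skipped_resources:
        print("SKIPPED:", resource)
```

Run against the log above, this should list the `wait_for_kube_api_server` timeout as the failure and `Exec[sm-restart-etcd]` as the skipped dependent.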


Severity

Major

Steps to Reproduce

Follow the upgrade procedure to upgrade the DC central cloud from 21.05 to 21.12.

The failure was seen during the upgrade activation step.

Expected Behavior

Upgrade activation success

Actual Behavior

As described above, upgrade activation failed.

Reproducibility

 **

System Configuration

DC-1 Distributed system

Branch/Pull Time/Commit

2021-12-06_23-00-09

Last Pass

2021-12-04_23-00-07