Unable to create coredump on a Standard system using Annotations

Bug #1996054 reported by Heron Vieira
This bug affects 1 person
Affects    Status        Importance  Assigned to   Milestone
StarlingX  Fix Released  Medium      Heron Vieira

Bug Description

Brief Description
-----------------
Unable to create coredump on a Standard system using Annotations
The token is saved only on the active controller, not on all nodes.

Severity
--------
Major

Steps to Reproduce
------------------

1. Install a Standard system with Debian load
2. Launch a test pod with annotations for k8s-coredump
3. The pod is created on one of the compute nodes
4. Bash into the pod and run "kill -s SIGTRAP $(pgrep sleep)"
5. No coredump file is created in /var/log/coredump (assuming this is the persisted volume on the host)
6. Run "cat /var/log/k8s-coredump.log"; the log shows a 401 error

Expected Behavior
------------------
A coredump should be created in /var/log/coredump.
The token must be saved on all nodes, regardless of their personality.
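
A quick per-node check for the expected state can be sketched as below. This is a minimal sketch, not the project's own tooling: the path /etc/k8s-coredump-conf.json comes from the fix commits in this report, while the function name token_file_ok is hypothetical. The report does not state the key names inside the file, so the check only confirms that the file exists and holds a non-empty JSON document.

```python
import json
from pathlib import Path

# Token path as named in the fix commits of this report.
TOKEN_FILE = Path("/etc/k8s-coredump-conf.json")


def token_file_ok(path: Path = TOKEN_FILE) -> bool:
    """Return True if the file exists and holds a non-empty JSON document.

    The key names inside the file are not stated in this report, so this
    check deliberately does not look for any particular field.
    """
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):
        # Missing or unreadable file, or content that is not valid JSON.
        return False
    return bool(data)


if __name__ == "__main__":
    state = "present" if token_file_ok() else "missing or invalid"
    print(f"{TOKEN_FILE}: {state}")
```

Running this on each controller and compute node should report "present" everywhere once the token has been distributed.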

Actual Behavior
----------------
No coredump is created, and the token is present only on the active controller.

Reproducibility
---------------
100%

System Configuration
--------------------
Standard IPv4

Branch/Pull Time/Commit
-----------------------
https://review.opendev.org/c/starlingx/ansible-playbooks/+/861222

Last Pass
---------

Timestamp/Logs
--------------
sysadmin@compute-0:~$ cat /var/log/k8s-coredump.log
CRITICAL:k8s-coredump:Process 1418049 (9223372036854775808) of user 0 dumped core.
DEBUG:k8s-coredump:lookupPod: podUID=1d960d7e-4c50-4e6a-9941-4ec8b3b50578
ERROR:k8s-coredump:Error: File does not appear to exist.
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): localhost:10250
DEBUG:urllib3.connectionpool:https://localhost:10250 "GET /pods HTTP/1.1" 401 12
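
The 401 in the log above can be picked out mechanically when checking several nodes. The following is a minimal sketch that assumes the log keeps the urllib3 debug line format shown here; the names REQUEST_LINE and unauthorized_requests are hypothetical and not part of k8s-coredump.

```python
import re

# Matches urllib3 debug lines such as:
# DEBUG:urllib3.connectionpool:https://localhost:10250 "GET /pods HTTP/1.1" 401 12
REQUEST_LINE = re.compile(
    r'^DEBUG:urllib3\.connectionpool:(?P<base>https?://\S+) '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})\b'
)


def unauthorized_requests(log_text: str) -> list[tuple[str, str, str]]:
    """Return (base_url, method, path) for every request answered with 401."""
    hits = []
    for line in log_text.splitlines():
        match = REQUEST_LINE.match(line.strip())
        if match and match.group("status") == "401":
            hits.append(
                (match.group("base"), match.group("method"), match.group("path"))
            )
    return hits


if __name__ == "__main__":
    try:
        with open("/var/log/k8s-coredump.log") as handle:
            for base, method, path in unauthorized_requests(handle.read()):
                print(f"401 Unauthorized: {method} {base}{path}")
    except OSError as err:
        print(f"could not read log: {err}")
```

A non-empty result on a node is consistent with the request to localhost:10250 being rejected for lack of a valid token on that node.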

Test Activity
-------------
Feature Testing

Workaround
----------
Copy the k8s-coredump-conf.json file from the active controller to all other nodes, including the node where the pod is running. The feature then works as expected.
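
The manual copy can be planned with a small helper like the one below. This is a sketch under stated assumptions: the file lives at /etc/k8s-coredump-conf.json (the path named in the fix commits), scp access with sufficient privileges exists between nodes, and the host names are hypothetical placeholders. The helper only builds and prints the commands; it does not run them.

```python
import shlex

# Token path as named in the fix commits of this report.
TOKEN_FILE = "/etc/k8s-coredump-conf.json"


def copy_commands(hosts: list[str], path: str = TOKEN_FILE) -> list[list[str]]:
    """Build one scp argv per host, copying the file to the same remote path."""
    # -p preserves modification times and file modes on the copy.
    return [["scp", "-p", path, f"{host}:{path}"] for host in hosts]


if __name__ == "__main__":
    # Hypothetical host names; replace with the nodes of the actual system.
    for argv in copy_commands(["controller-1", "compute-0"]):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review the node list before copying anything.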

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/864114

OpenStack Infra (hudson-openstack) wrote : Fix proposed to utilities (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/utilities/+/864115

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/866207

OpenStack Infra (hudson-openstack) wrote : Fix merged to utilities (master)

Reviewed: https://review.opendev.org/c/starlingx/utilities/+/864115
Committed: https://opendev.org/starlingx/utilities/commit/1dc658483af00a4141f50b540cd40a7d67a8140b
Submitter: "Zuul (22348)"
Branch: master

commit 1dc658483af00a4141f50b540cd40a7d67a8140b
Author: Heron Vieira <email address hidden>
Date: Wed Nov 9 10:15:50 2022 -0300

    Fix core_pattern and add token creation script

    Remove trailing double quotes from k8s-coredump-handler
    debian kernel.core_pattern and add a shell script
    that creates the k8s-coredump token that will be used
    by the upgrade procedure on a Standard System.

    Test Plan:
    PASS: Install and bootstrap system for a Standard
      configuration.
    PASS: Verify if kernel.core_pattern is not
          with a trailing double quote.
    PASS: Install standard 22.06, upgrade to 22.12
      and verify if token is created correctly on
      all nodes.

    Regression:

    PASS: After bootstrap, create and crash a pod with
          annotations configured and verify if coredump
          is generated on pod namespace on each node.
    PASS: After bootstrap, crash a non k8s application
          and verify that the coredump is generated as
          previously (by systemd-coredump) on each node.

    Closes-bug: 1996054

    Signed-off-by: Heron Vieira <email address hidden>
    Change-Id: I8b2e8fdefe093f4c3cdf12c65910e16f0fd7a351
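
    The core_pattern test in the plan above can be approximated on a node with the sketch below. It assumes the value is read from /proc/sys/kernel/core_pattern, the standard Linux location for kernel.core_pattern; the function name has_trailing_double_quote is hypothetical and not taken from the commit.

```python
from pathlib import Path

# Standard Linux procfs location of the kernel.core_pattern sysctl.
CORE_PATTERN = Path("/proc/sys/kernel/core_pattern")


def has_trailing_double_quote(pattern: str) -> bool:
    """Return True if the core_pattern value ends with a stray double quote."""
    return pattern.rstrip().endswith('"')


if __name__ == "__main__":
    if CORE_PATTERN.exists():
        value = CORE_PATTERN.read_text()
        verdict = "stray quote" if has_trailing_double_quote(value) else "ok"
        print(f"{verdict}: {value.strip()}")
    else:
        print(f"{CORE_PATTERN} not available on this system")
```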

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/864113
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/0f7a6adb32d17a3e8b0ec8b64f95e0b65e0fe3ce
Submitter: "Zuul (22348)"
Branch: master

commit 0f7a6adb32d17a3e8b0ec8b64f95e0b65e0fe3ce
Author: Heron Vieira <email address hidden>
Date: Wed Nov 9 10:07:02 2022 -0300

    Copy k8s-coredump token for nodes configuration

    Make a copy of the k8s-coredump token on
    /opt/platform/config/{software_version}/k8s-coredump-conf.json
    so it can be used to configure other nodes
    (secondary controllers and worker nodes).

    Test Plan:
    PASS: Install and bootstrap Standard system
    PASS: Install and bootstrap AIO-DX system
    PASS: Verify if /etc/k8s-coredump-conf.json file is copied on
          /opt/platform/config/{software_version}/k8s-coredump-conf.json
    PASS: Install standard 22.06, upgrade to 22.12
      and verify if token is created correctly on
      all nodes.
    PASS: Install AIO-DX 22.06, upgrade to 22.12
      and verify if token is created correctly on
      all nodes.

    Regression:

    PASS: After bootstrap, create and crash a pod with
          annotations configured and verify if coredump
          is generated on pod namespace on each node.
    PASS: After bootstrap, crash a non k8s application
          and verify that the coredump is generated as
          previously (by systemd-coredump) on each node.

    Closes-bug: 1996054

    Signed-off-by: Heron Vieira <email address hidden>
    Change-Id: I7e477e687ab128d1043a0d041bacd508baf2c95b
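
    The staging step this commit describes can be illustrated as follows. This is a sketch only, not the playbook's actual task: the two paths are taken from the commit message, and the function names shared_token_path and publish_token are hypothetical.

```python
import shutil
from pathlib import Path

# Both locations are named in the commit message above.
TOKEN_FILE = Path("/etc/k8s-coredump-conf.json")
PLATFORM_CONFIG = Path("/opt/platform/config")


def shared_token_path(software_version: str, root: Path = PLATFORM_CONFIG) -> Path:
    """Return the shared location of the token for a given software version."""
    return root / software_version / TOKEN_FILE.name


def publish_token(
    software_version: str,
    source: Path = TOKEN_FILE,
    root: Path = PLATFORM_CONFIG,
) -> Path:
    """Copy the token into the shared config directory and return the target."""
    target = shared_token_path(software_version, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copy2 also preserves file metadata
    return target


if __name__ == "__main__":
    print(shared_token_path("22.12"))
```

    Other nodes can then read the token from the shared directory and place their own copy at /etc/k8s-coredump-conf.json.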

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/864114
Committed: https://opendev.org/starlingx/config/commit/638eb21ccdee8fb7b252fd2b384a4eb0de2eec82
Submitter: "Zuul (22348)"
Branch: master

commit 638eb21ccdee8fb7b252fd2b384a4eb0de2eec82
Author: Heron Vieira <email address hidden>
Date: Wed Nov 9 10:10:08 2022 -0300

    Configure k8s-coredump token on other nodes

    Copy k8s-coredump token on install for secondary
    controller nodes and worker nodes.

    Test Plan:
    PASS: Install and bootstrap Standard system
    PASS: Verify if /etc/k8s-coredump-conf.json file is
          created on all controller and compute nodes.

    Regression:

    PASS: After bootstrap, create and crash a pod with
          annotations configured and verify if coredump
          is generated on pod namespace on each node.
    PASS: After bootstrap, crash a non k8s application
          and verify that the coredump is generated as
          previously (by systemd-coredump) on each node.

    Depends-On: https://review.opendev.org/c/starlingx/ansible-playbooks/+/864113
    Closes-bug: 1996054

    Signed-off-by: Heron Vieira <email address hidden>
    Change-Id: Ib15b84ca8cc8ca870a21d314f6ee2b7193532aa1

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/866207
Committed: https://opendev.org/starlingx/stx-puppet/commit/ea4c05156e9e48f86c6952e946511b9d3b2aaa50
Submitter: "Zuul (22348)"
Branch: master

commit ea4c05156e9e48f86c6952e946511b9d3b2aaa50
Author: Heron Vieira <email address hidden>
Date: Wed Nov 30 16:09:29 2022 -0300

    Create k8s-coredump token for standard upgrade

    Tasks to create k8s-coredump token using shell
    script create-k8s-account.sh and copy created token
    to upgrade config dir (/opt/platform/config/SW_VERSION)
    to allow worker node to make a copy for itself on the
    token path (/etc/k8s-coredump-conf.json) through the
    worker_config script.

    Test Plan:
    PASS: Install and bootstrap system
    PASS: Install standard 22.06, upgrade to 22.12
      and verify if token is created correctly on
      all nodes.
    PASS: Install AIO-DX 22.06, upgrade to 22.12
      and verify if token is created correctly on
      all nodes.

    Regression:

    PASS: After bootstrap, create and crash a pod with
          annotations configured and verify if coredump
          is generated on pod namespace on each node.
    PASS: After bootstrap, crash a non k8s application
          and verify that the coredump is generated as
          previously (by systemd-coredump) on each node.

    Depends-On: https://review.opendev.org/c/starlingx/utilities/+/864115
    Depends-On: https://review.opendev.org/c/starlingx/config/+/864114
    Closes-bug: 1996054
    Change-Id: Ia5b00963302dd67d763cf86af694bf6c7a2e4bd1

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Heron Vieira (hevieira)
importance: Undecided → Medium
tags: added: stx.8.0 stx.config