snapshot related cephfs and rbd provisioner error messages

Bug #2045897 reported by Gabriel de Araújo Cabral
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Gabriel de Araújo Cabral
Milestone: (not set)

Bug Description

Brief Description
-----------------
The csi-snapshotter containers deployed by platform-integ-apps log error messages about 8,000 times per day per node.

Severity
--------
Minor

Steps to Reproduce
------------------
Apply platform-integ-apps and inspect the logs of the csi-snapshotter container with the command:
"kubectl logs -n kube-system <cephFS or RBD pod> -c csi-snapshotter"

Expected Behavior
------------------
No error messages

Actual Behavior
----------------
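The csi-snapshotter container repeatedly logs "Failed to watch *v1.VolumeSnapshotClass" errors (see Timestamp/Logs).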

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
One node system, Two node system, Multi-node system, Dedicated storage

Branch/Pull Time/Commit
-----------------------

Timestamp/Logs
--------------
I1112 20:51:36.080907 1 reflector.go:255] Listing and watching *v1.VolumeSnapshotClass from github.com/kubernetes-csi/external-snapshotter/client/v4/informers/externalversions/factory.go:117
E1112 20:51:36.081694 1 reflector.go:138] github.com/kubernetes-csi/external-snapshotter/client/v4/informers/externalversions/factory.go:117: Failed to watch *v1.VolumeSnapshotClass: failed to list *v1.VolumeSnapshotClass: the server could not find the requested resource (get volumesnapshotclasses.snapshot.storage.k8s.io)
I1112 20:52:26.660060 1 reflector.go:255] Listing and watching *v1.VolumeSnapshotClass from github.com/kubernetes-csi/external-snapshotter/client/v4/informers/externalversions/factory.go:117
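
To quantify how often the error recurs, the output of the kubectl logs command can be saved and scanned. The following is a minimal sketch, assuming the log lines follow the klog layout shown above (severity letter, date, time, PID, source location, message); the function name count_watch_failures is illustrative and not part of any StarlingX tooling.

```python
import re
from collections import Counter

# klog header: severity letter (I/W/E/F) + MMDD, time, PID, "file:line]", message.
KLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+\d+ "
    r"(?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def count_watch_failures(log_text: str) -> Counter:
    """Count error-level 'Failed to watch' lines, keyed by resource type."""
    counts = Counter()
    for line in log_text.splitlines():
        match = KLOG_LINE.match(line.strip())
        if not match or match.group("sev") != "E":
            continue  # skip non-klog lines and anything below error severity
        failed = re.search(r"Failed to watch (\*\S+):", match.group("msg"))
        if failed:
            counts[failed.group(1)] += 1
    return counts
```

Feeding it a full day of logs from one pod would give the per-resource error count for that pod.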

Workaround
----------
Create the CRDs and the snapshot-controller manually

Changed in starlingx:
assignee: nobody → Gabriel de Araújo Cabral (g-cabral)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/902811
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/99d1b973269fbe805596c80d25a10157855cde3c
Submitter: "Zuul (22348)"
Branch: master

commit 99d1b973269fbe805596c80d25a10157855cde3c
Author: Gabriel de Araújo Cabral <email address hidden>
Date: Wed Dec 6 16:35:22 2023 -0300

    Enable support for PVC snapshots by default during bootstrap

    Previously, the 'enable_volume_snapshot_support' variable was false
    by default. As a result, when running the bootstrap playbook, the
    CRDs and the snapshot-controller needed to take snapshots of PVCs
    were not created.

    With the change, when running the bootstrap playbook, the CRDs and
    the snapshot-controller will be created by default according to the
    K8s version on the system.

    The main benefit of this change is that CSI apps will not need to
    worry about creating CRDs and the snapshot-controller to be able to
    take snapshots of PVCs, in addition to avoiding possible problems,
    since the resource creation process and the version used will be in
    common for all CSI apps.
    Furthermore, users who do not need these resources can run the
    bootstrap manually with the variable set to false so that the
    resources are not created.

    Currently, the version of the CRDs and snapshot-controller present
    in the system is very old, referring to k8s 1.19 with the
    'snapshot-controller:v2.0.0-rc2' image. Due to this, the creation of
    VolumeSnapshots does not occur correctly with new versions of some
    CSI apps.
    Therefore, the current change depends on the review below, where
    the CRDs and snapshot-controller are updated according to the
    K8s version, working as expected.

    Depends-On: https://review.opendev.org/c/starlingx/ansible-playbooks/+/887797

    Test Plan:
     PASS: Build an image with the code change
     PASS: AIO-SX fresh install + Check if the CRDs
           and snapshot-controller are created
     PASS: AIO-DX fresh install + Check if the CRDs
           and snapshot-controller are created

    Closes-bug: 2045897

    Change-Id: I81c34bd8745a7c06b0fce5fd20a43bd8be73d982
    Signed-off-by: Gabriel de Araújo Cabral <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to platform-armada-app (master)

Reviewed: https://review.opendev.org/c/starlingx/platform-armada-app/+/902722
Committed: https://opendev.org/starlingx/platform-armada-app/commit/8b1f987add87d217f720f505b9f65f80fc8dbd20
Submitter: "Zuul (22348)"
Branch: master

commit 8b1f987add87d217f720f505b9f65f80fc8dbd20
Author: Gabriel de Araújo Cabral <email address hidden>
Date: Sun Dec 3 20:21:14 2023 -0300

    Add SnapshotClass Creation for CephFS/RBD via Helm Override

    This commit introduces the capability to create a snapshot
    class using helm overrides within "cephfs-provisioner"
    and "rbd-provisioner" charts.

    By default, when platform-integ-apps is applied, the
    'snapshotClass.create' field is set to 'false' in both charts.
    The snapshot class(es) are created only after the field is changed
    to 'true' via helm overrides and the app is reapplied.

    This enhancement depends on the changes from the review below,
    where in each installation the CRDs and the snapshot-controller
    will be created by default when running bootstrap playbook.
    This way, each CSI app will be able to implement this functionality
    as support for PVC snapshots will be default during installations.

    Depends-on: https://review.opendev.org/c/starlingx/ansible-playbooks/+/902811

    Test Plan:
     PASS: Build a new app package with the code changes
     PASS: Successfully execute upload, apply, remove, and delete
           operations for 'platform-integ-apps' on both AIO-SX and
           AIO-DX environments.
     PASS: Upon initial application apply and update, check with
           'helm get values' that the 'snapshotClass.create' field
           is set to 'false' in both the cephFS and RBD charts.
           Additionally, confirm on K8s that the snapshotClasses are not
           created, as expected
     PASS: Use 'system helm-override-update' to change the
           'snapshotClass.create' field to 'true' in both charts.
           Reapply the app and validate on K8s that the snapshotClasses
           are indeed created.
     PASS: After creating the SnapshotClass, take a VolumeSnapshot
           from an existing PVC, proceed with K8s upgrade (K8s 1.25,
           1.26, and 1.27) with CRDs and snapshot controller update,
           take a VolumeSnapshot from another PVC and verify that
           all VolumeSnapshots are correct

    Partial-Bug: 2045897

    Change-Id: I47b897d179c4260fad9171586fe2fbd69f7145de
    Signed-off-by: Gabriel de Araújo Cabral <email address hidden>

Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low
tags: added: stx.9.0 stx.storage
Ghada Khalil (gkhalil) wrote :
Changed in starlingx:
status: Fix Released → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to platform-armada-app (master)
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/904360
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/672b7c70d7e53e298399a252c7616bfb6f286f84
Submitter: "Zuul (22348)"
Branch: master

commit 672b7c70d7e53e298399a252c7616bfb6f286f84
Author: Gabriel de Araújo Cabral <email address hidden>
Date: Wed Dec 6 16:35:22 2023 -0300

    Enable support for PVC snapshots by default during bootstrap

    Previously, the 'enable_volume_snapshot_support' variable was false
    by default. As a result, when running the bootstrap playbook, the
    CRDs and the snapshot-controller needed to take snapshots of PVCs
    were not created.

    With the change, when running the bootstrap playbook, the CRDs and
    the snapshot-controller will be created by default according to the
    K8s version on the system.

    The main benefit of this change is that CSI apps will not need to
    worry about creating CRDs and the snapshot-controller to be able to
    take snapshots of PVCs, in addition to avoiding possible problems,
    since the resource creation process and the version used will be in
    common for all CSI apps.
    Furthermore, users who do not need these resources can run the
    bootstrap manually with the variable set to false so that the
    resources are not created.

    Currently, the version of the CRDs and snapshot-controller present
    in the system is very old, referring to k8s 1.19 with the
    'snapshot-controller:v2.0.0-rc2' image. Due to this, the creation of
    VolumeSnapshots does not occur correctly with new versions of some
    CSI apps.
    Therefore, the current change depends on the review below, where
    the CRDs and snapshot-controller are updated according to the
    K8s version, working as expected.

    * This is a new review after the revert of:
    https://review.opendev.org/c/starlingx/ansible-playbooks/+/902811

    Test Plan:
     PASS: Build an image with the code change
     PASS: AIO-SX fresh install + Check if the CRDs
           and snapshot-controller are created
     PASS: AIO-DX fresh install + Check if the CRDs
           and snapshot-controller are created
     PASS: Standard fresh install + Check if the CRDs
           and snapshot-controller are created
     PASS: Storage fresh install + Check if the CRDs
           and snapshot-controller are created

    Closes-bug: 2045897

    Depends-On: https://review.opendev.org/c/starlingx/ansible-playbooks/+/904359

    Change-Id: Ifa8e9fc107a264c7df7eb8e0d1f8fa71e6fb7598
    Signed-off-by: Gabriel de Araújo Cabral <email address hidden>
    Signed-off-by: Erickson Silva de Oliveira <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to platform-armada-app (master)

Reviewed: https://review.opendev.org/c/starlingx/platform-armada-app/+/904361
Committed: https://opendev.org/starlingx/platform-armada-app/commit/313c3b1d1dc1a3b8b800a417dfe409391cb83ad1
Submitter: "Zuul (22348)"
Branch: master

commit 313c3b1d1dc1a3b8b800a417dfe409391cb83ad1
Author: Gabriel de Araújo Cabral <email address hidden>
Date: Sun Dec 3 20:21:14 2023 -0300

    Add SnapshotClass Creation for CephFS/RBD via Helm Override

    This commit introduces the capability to create a snapshot
    class using helm overrides within "cephfs-provisioner"
    and "rbd-provisioner" charts.

    By default, when platform-integ-apps is applied, the
    'snapshotClass.create' field is set to 'false' in both charts.
    The snapshot class(es) are created only after the field is changed
    to 'true' via helm overrides and the app is reapplied.

    This enhancement depends on the changes from the review below,
    where in each installation the CRDs and the snapshot-controller
    will be created by default when running bootstrap playbook.
    This way, each CSI app will be able to implement this functionality
    as support for PVC snapshots will be default during installations.

    * This is a new review after the revert of:
    https://review.opendev.org/c/starlingx/platform-armada-app/+/902722

    Test Plan:
     PASS: Build a new app package with the code changes
     PASS: Successfully execute upload, apply, remove, and delete
           operations for 'platform-integ-apps' on both AIO-SX and
           AIO-DX environments.
     PASS: Upon initial application apply and update, check with
           'helm get values' that the 'snapshotClass.create' field
           is set to 'false' in both the cephFS and RBD charts.
           Additionally, confirm on K8s that the snapshotClasses are not
           created, as expected
     PASS: Use 'system helm-override-update' to change the
           'snapshotClass.create' field to 'true' in both charts.
           Reapply the app and validate on K8s that the snapshotClasses
           are indeed created.
     PASS: After creating the SnapshotClass, take a VolumeSnapshot
           from an existing PVC, proceed with K8s upgrade (K8s 1.25,
           1.26, and 1.27) with CRDs and snapshot controller update,
           take a VolumeSnapshot from another PVC and verify that
           all VolumeSnapshots are correct
     PASS: With an old version of the application, create user overrides
           for 'classes', then update to the application with the
           current changes and verify that the user overrides for
           'classes' have been transferred to 'storageClasses'

    Partial-Bug: 2045897

    Depends-on: https://review.opendev.org/c/starlingx/ansible-playbooks/+/904360

    Change-Id: I6e2fe2009d4cce3e351142359c1f36465cf03ee3
    Signed-off-by: Gabriel de Araújo Cabral <email address hidden>
    Signed-off-by: Erickson Silva de Oliveira <email address hidden>
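
The override-and-reapply flow described in this commit can be sketched as a short command sequence. This is a minimal sketch, not the project's own tooling: it assumes both charts live in the kube-system namespace (the namespace used in the reproduction steps) and that the 'system helm-override-update' and 'system application-apply' commands take the argument order shown; the function name snapshot_class_commands is hypothetical.

```python
import shlex

APP = "platform-integ-apps"
NAMESPACE = "kube-system"  # assumption: namespace where both charts are deployed
CHARTS = ("cephfs-provisioner", "rbd-provisioner")


def snapshot_class_commands(create: bool = True) -> list[list[str]]:
    """Build the CLI calls that set snapshotClass.create and reapply the app."""
    value = "true" if create else "false"
    commands = [
        ["system", "helm-override-update", APP, chart, NAMESPACE,
         "--set", f"snapshotClass.create={value}"]
        for chart in CHARTS
    ]
    # The override only takes effect once the application is reapplied.
    commands.append(["system", "application-apply", APP])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for command in snapshot_class_commands():
        print(shlex.join(command))
```

Printing the commands rather than running them keeps the sketch safe to try on any machine; on a controller they could be passed to subprocess.run one at a time.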

OpenStack Infra (hudson-openstack) wrote : Fix proposed to platform-armada-app (master)
OpenStack Infra (hudson-openstack) wrote : Fix merged to platform-armada-app (master)

Reviewed: https://review.opendev.org/c/starlingx/platform-armada-app/+/906374
Committed: https://opendev.org/starlingx/platform-armada-app/commit/855c7b1da9ab3779397ffffbc49b7a6e03bca115
Submitter: "Zuul (22348)"
Branch: master

commit 855c7b1da9ab3779397ffffbc49b7a6e03bca115
Author: Gabriel de Araújo Cabral <email address hidden>
Date: Tue Jan 23 10:15:33 2024 -0300

    Add check for csi-snapshotter container creation

    This commit introduces a check to determine whether or not the
    csi-snapshotter container will be created in each provisioner's
    pod.

    The only scenario in which the 'provisioner.snapshotter.enabled'
    field will automatically be set to 'true' and the csi-snapshotter
    container will be created during apply is with:
    - CRDs created: volumesnapshotclasses, volumesnapshots and
      volumesnapshotcontents.
    - Snapshot-controller pod created and using the correct image
      according to the K8s version.

    In all other scenarios, 'provisioner.snapshotter.enabled' will be
    set to 'false' during apply and the container will not be created.

    Note that the user can still create or remove the container
    manually via helm override if desired.

    Depends-on: https://review.opendev.org/c/starlingx/platform-armada-app/+/904875

    Test Plan:
     PASS: Build a new app package with the code changes
     PASS: Successfully execute upload, apply, remove, and delete
           operations for platform-integ-apps
     PASS: In an environment with CRDs and snapshot-controller using
           the correct image version according to Kubernetes, apply the
           app and verify that the snapshotter field is 'enabled: true'
           and the container has been created in both provisioner pods
     PASS: In an environment with CRDs but with snapshot-controller
           using the incorrect image version according to Kubernetes,
           apply the app and verify that the snapshotter field is
           'enabled: false' and the container has not been created
     PASS: In an environment without CRDs and snapshot-controller, apply
           the app and verify that the snapshotter field is 'enabled: false'
           and the container has not been created
     PASS: Manually create/remove csi-snapshotter container via
           helm-override

    Partial-Bug: 2045897

    Change-Id: I7dbcbba520c9758de84e8dae5a553ec1fee69518
    Signed-off-by: Gabriel de Araújo Cabral <email address hidden>
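
The rule this commit describes for 'provisioner.snapshotter.enabled' can be expressed as a small predicate. The following is only a sketch of that rule, not the code from the review: the function name and parameters are hypothetical, and the caller is assumed to supply the list of installed CRDs, the image of the running snapshot-controller (or None if there is no such pod), and the image expected for the current K8s version.

```python
from typing import Iterable, Optional

# The three CRDs named in the commit message, with their API group suffix.
REQUIRED_CRDS = frozenset({
    "volumesnapshotclasses.snapshot.storage.k8s.io",
    "volumesnapshots.snapshot.storage.k8s.io",
    "volumesnapshotcontents.snapshot.storage.k8s.io",
})


def snapshotter_enabled(
    installed_crds: Iterable[str],
    controller_image: Optional[str],
    expected_image: str,
) -> bool:
    """Return True only when every prerequisite for csi-snapshotter is met."""
    if not REQUIRED_CRDS.issubset(set(installed_crds)):
        return False  # at least one snapshot CRD is missing
    if controller_image is None:
        return False  # no snapshot-controller pod is running
    # The controller must use the image that matches the K8s version.
    return controller_image == expected_image
```

Any single failed check yields False, which matches the commit's statement that the container is created in only one scenario.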
