aws-k8s-storage hung evaluating manifests

Bug #2007594 reported by Alexander Balderson
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
AWS Cloud Provider Charm | Fix Released | Medium | Adam Dyess |
Charm AWS Kubernetes Storage | Fix Released | High | Adam Dyess |
Charm Azure Cloud Provider | Fix Released | Medium | Adam Dyess |
Charm GCP Kubernetes Storage | Fix Released | Medium | Adam Dyess |
KubeVirt Charm | Fix Released | Medium | Adam Dyess |
vSphere Cloud Provider Charm | Fix Released | Medium | Adam Dyess |

Bug Description

During a run of k8s 1.26 on AWS, the aws-k8s-storage charm hangs for 4 hours with the status "Evaluating Manifests". Unfortunately, I can't seem to find any logs about why this is happening or what it's doing. A few minutes before the message appears, the unit log shows an unauthorized error when trying to set the aws-secret, but that action never retries, so maybe the failure is that the charm never gets permission to access AWS.

Testrun:
https://solutions.qa.canonical.com/v2/testruns/4e926f62-0d81-4c51-88b3-8d931770e4af/
crashdump:
https://oil-jenkins.canonical.com/artifacts/4e926f62-0d81-4c51-88b3-8d931770e4af/generated/generated/kubernetes-aws/juju-crashdump-kubernetes-aws-2023-02-16-06.01.13.tar.gz
bundle:
https://oil-jenkins.canonical.com/artifacts/4e926f62-0d81-4c51-88b3-8d931770e4af/generated/generated/kubernetes-aws/bundle.yaml

George Kraft (cynerva) wrote :

Looks like the manifests failed to apply once due to a temporary API outage during the deployment. The event was deferred[1], so the charm reprocessed the event later.

There is an early return[2] that prevents manifests from being applied if the config has not changed since the last time it was run. I think that early return prevented a reattempt from occurring.

[1]: https://github.com/charmed-kubernetes/aws-k8s-storage/blob/007f315ba090e3c8f02754c51e4bb87c36afdc9e/src/charm.py#L181-L186
[2]: https://github.com/charmed-kubernetes/aws-k8s-storage/blob/007f315ba090e3c8f02754c51e4bb87c36afdc9e/src/charm.py#L168-L169
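To make the suspected interaction concrete, here is a minimal, self-contained sketch of it. It is a simplified model and not the charm's actual code: `FakeEvent`, `FlakyApi`, `Reconciler`, and the `record_hash_before_apply` switch are all hypothetical names, and the assumption that the config hash is stored before the apply step succeeds is an illustration of the analysis above, not something confirmed from the source.

```python
import hashlib
import json


class FakeEvent:
    """Hypothetical stand-in for a framework event that can be deferred."""

    def __init__(self):
        self.deferred = False

    def defer(self):
        self.deferred = True


class FlakyApi:
    """Hypothetical API client: fails the first `failures` calls, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.applied = []

    def apply(self, config):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("API temporarily unavailable")
        self.applied.append(dict(config))


def config_hash(config):
    """Stable hash of a config dict."""
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()


class Reconciler:
    """Simplified model of a handler with an 'unchanged config' early return."""

    def __init__(self, api, record_hash_before_apply):
        self.api = api
        # Assumption: True models the suspected bug, False one possible remedy.
        self.record_hash_before_apply = record_hash_before_apply
        self.stored_hash = None

    def handle(self, event, config):
        new_hash = config_hash(config)
        if new_hash == self.stored_hash:
            # Early return: config looks unchanged, so nothing is applied.
            return "skipped"
        if self.record_hash_before_apply:
            self.stored_hash = new_hash
        try:
            self.api.apply(config)
        except ConnectionError:
            # Ask for the event to be processed again later.
            event.defer()
            return "deferred"
        # Only reached when the apply succeeded.
        self.stored_hash = new_hash
        return "applied"


def run(record_hash_before_apply):
    """Process the same event twice against an API that fails once."""
    api = FlakyApi(failures=1)
    reconciler = Reconciler(api, record_hash_before_apply)
    config = {"example-option": "value"}
    outcomes = [
        reconciler.handle(FakeEvent(), config),  # first attempt, API is down
        reconciler.handle(FakeEvent(), config),  # deferred event reprocessed
    ]
    return outcomes, len(api.applied)
```

In this model, `run(True)` defers the first attempt and then skips the retry because the hash already matches, so nothing is ever applied. `run(False)` records the hash only after a successful apply, so the reprocessed event applies the manifests. Whether the released fix took this approach is not stated in this report.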

Changed in charm-aws-k8s-storage:
importance: Undecided → High
status: New → Triaged
Adam Dyess (addyess) wrote :

If this occurs, one should be able to work around it by running the `sync-resources` action on the unit.

Changed in charm-aws-k8s-storage:
milestone: none → 1.27+ck1
Changed in charm-aws-cloud-provider:
milestone: none → 1.27+ck1
Changed in charm-azure-cloud-provider:
milestone: none → 1.27+ck1
Changed in charm-gcp-k8s-storage:
milestone: none → 1.27+ck1
Changed in charm-vsphere-cloud-provider:
milestone: none → 1.27+ck1
Changed in charm-kubevirt:
milestone: none → 1.27+ck1
Changed in charm-aws-cloud-provider:
status: New → Triaged
Changed in charm-azure-cloud-provider:
status: New → Triaged
Changed in charm-gcp-k8s-storage:
status: New → Triaged
Changed in charm-kubevirt:
status: New → Triaged
Changed in charm-vsphere-cloud-provider:
status: New → Triaged
Changed in charm-aws-cloud-provider:
importance: Undecided → Medium
Changed in charm-azure-cloud-provider:
importance: Undecided → Medium
Changed in charm-gcp-k8s-storage:
importance: Undecided → Medium
Changed in charm-kubevirt:
importance: Undecided → Medium
Changed in charm-vsphere-cloud-provider:
importance: Undecided → Medium
Adam Dyess (addyess)
Changed in charm-aws-cloud-provider:
assignee: nobody → Adam Dyess (addyess)
Changed in charm-aws-k8s-storage:
assignee: nobody → Adam Dyess (addyess)
Changed in charm-azure-cloud-provider:
assignee: nobody → Adam Dyess (addyess)
Changed in charm-gcp-k8s-storage:
assignee: nobody → Adam Dyess (addyess)
Changed in charm-vsphere-cloud-provider:
assignee: nobody → Adam Dyess (addyess)
Adam Dyess (addyess) wrote :
Changed in charm-aws-cloud-provider:
milestone: 1.27+ck1 → 1.27
Changed in charm-aws-k8s-storage:
milestone: 1.27+ck1 → 1.27
Changed in charm-azure-cloud-provider:
milestone: 1.27+ck1 → 1.27
Changed in charm-gcp-k8s-storage:
milestone: 1.27+ck1 → 1.27
Changed in charm-vsphere-cloud-provider:
milestone: 1.27+ck1 → 1.27
Changed in charm-aws-cloud-provider:
status: Triaged → In Progress
Changed in charm-aws-k8s-storage:
status: Triaged → In Progress
Changed in charm-azure-cloud-provider:
status: Triaged → In Progress
Changed in charm-gcp-k8s-storage:
status: Triaged → In Progress
Changed in charm-vsphere-cloud-provider:
status: Triaged → In Progress
Changed in charm-aws-cloud-provider:
status: In Progress → Fix Committed
Changed in charm-aws-k8s-storage:
status: In Progress → Fix Committed
Changed in charm-azure-cloud-provider:
status: In Progress → Fix Committed
Changed in charm-gcp-k8s-storage:
status: In Progress → Fix Committed
Adam Dyess (addyess) wrote :
Changed in charm-kubevirt:
milestone: 1.27+ck1 → 1.27
assignee: nobody → Adam Dyess (addyess)
status: Triaged → In Progress
Changed in charm-vsphere-cloud-provider:
status: In Progress → Fix Committed
Changed in charm-kubevirt:
status: In Progress → Fix Committed
Changed in charm-aws-cloud-provider:
status: Fix Committed → Fix Released
Changed in charm-aws-k8s-storage:
status: Fix Committed → Fix Released
Changed in charm-azure-cloud-provider:
status: Fix Committed → Fix Released
Changed in charm-gcp-k8s-storage:
status: Fix Committed → Fix Released
Changed in charm-kubevirt:
status: Fix Committed → Fix Released
Changed in charm-vsphere-cloud-provider:
status: Fix Committed → Fix Released