timeout waiting for volumes in k8s charm
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Canonical Juju | Triaged | Low | Unassigned | |
Bug Description
We're running into an intermittent issue when deploying charms to our k8s platform: the agent pod hits a timeout while waiting for a volume, leaving the application stuck in the "allocating" state. The root cause isn't really Juju's concern, since it's a failure in the underlying platform, but I'm filing an issue here as well because the problem is never surfaced in Juju, so a user wouldn't know about it unless they went digging after becoming suspicious of how long the deploy was taking.
Our first reaction to this issue was to look for some way to set a timeout for charm deploy operations, since we know the characteristics of a successful deploy vs. one that will never finish and could confidently set a two-minute timeout in this case. Is there a way to achieve this? That might offer a flexible way for users to make sure they know about it when things go haywire under the hood.
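If no built-in deploy timeout is available, one client-side workaround is to poll `juju status` from the deploy script and bail out after a deadline. Below is a minimal sketch; the function name, arguments, and the commented-out `juju status` invocation are hypothetical, not an existing Juju feature:

```shell
#!/usr/bin/env bash
# wait_for_output CMD PATTERN [TIMEOUT_SECS] [INTERVAL_SECS]
# Repeatedly runs CMD until its output matches PATTERN, failing after
# TIMEOUT_SECS. Returns 0 on a match, 1 on timeout.
wait_for_output() {
  local cmd=$1 pattern=$2 timeout=${3:-120} interval=${4:-5}
  local elapsed=0
  while (( elapsed < timeout )); do
    if $cmd | grep -q "$pattern"; then
      return 0
    fi
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
  echo "timed out after ${timeout}s waiting for '$pattern'" >&2
  return 1
}

# Hypothetical usage: fail the deploy script if the unit isn't active
# after the two-minute window described above.
# wait_for_output "juju status --format=oneline myapp" "active" 120 5
```

This at least turns a silently stuck deploy into a hard failure the operator sees, though it doesn't help `juju status` itself report the underlying volume error.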
- juju status: https:/
- juju debug-log: https:/
- k8s events: https:/
- k8s status: https:/
This bug is similar to [1] in that it's caused by the platform taking a long time to do something before eventually timing out, leaving Juju stuck in the "allocating" state, but it's for a different cloud type and the suggested solution on that ticket may not be applicable here.
Changed in juju:
description: updated
Changed in juju:
assignee: nobody → Evan Hanson (evhan)
Changed in juju:
milestone: 2.8-beta1 → 2.8.1
`juju status --storage` would normally be expected to show that storage allocation has an issue. By default we don't show relations and storage in status (`--relations` and `--storage` are needed). `status --format yaml` also shows everything.
Can you confirm whether `status --storage` surfaces the error?