snap refresh on update-status is causing issues with the snap store

Bug #1904665 reported by Chris Johnston
Affects: Kubernetes Worker Charm
Status: Fix Released
Importance: Critical
Assigned to: Cory Johns
Milestone: 1.19+ck1

Bug Description

Due to the frequency of the k8s charms running snap refresh, the snap store has had to throttle the kubelet snap. This is causing the kubernetes-worker units to go into an error status on update-status:

2020-11-17 16:36:02 DEBUG leader-settings-changed snap "kubectl" has no updates available
2020-11-17 16:36:03 DEBUG leader-settings-changed error: cannot refresh "kubelet": unexpectedly empty response from the server
2020-11-17 16:36:03 DEBUG leader-settings-changed (try again later)
2020-11-17 16:36:03 ERROR juju-log Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kubernetes-worker-12/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-kubernetes-worker-12/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-kubernetes-worker-12/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-kubernetes-worker-12/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-kubernetes-worker-12/charm/reactive/kubernetes_worker.py", line 323, in join_or_update_cohorts
    snap.join_cohort_snapshot(snapname, cohort_key)
  File "lib/charms/layer/snap.py", line 445, in join_cohort_snapshot
    '--cohort', cohort_key])
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['snap', 'refresh', 'kubelet', '--cohort', 'XYZ123']' returned non-zero exit status 1.

2020-11-17 16:36:03 DEBUG leader-settings-changed Traceback (most recent call last):
2020-11-17 16:36:03 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-kubernetes-worker-12/charm/hooks/leader-settings-changed", line 18, in <module>
2020-11-17 16:36:03 DEBUG leader-settings-changed main()
2020-11-17 16:36:03 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-kubernetes-worker-12/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 74, in main
2020-11-17 16:36:03 DEBUG leader-settings-changed bus.dispatch(restricted=restricted_mode)
2020-11-17 16:36:03 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-kubernetes-worker-12/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 390, in dispatch
2020-11-17 16:36:03 DEBUG leader-settings-changed _invoke(other_handlers)
2020-11-17 16:36:03 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-kubernetes-worker-12/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 359, in _invoke
2020-11-17 16:36:03 DEBUG leader-settings-changed handler.invoke()
2020-11-17 16:36:03 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-kubernetes-worker-12/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 181, in invoke
2020-11-17 16:36:03 DEBUG leader-settings-changed self._action(*args)
2020-11-17 16:36:03 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-kubernetes-worker-12/charm/reactive/kubernetes_worker.py", line 323, in join_or_update_cohorts
2020-11-17 16:36:03 DEBUG leader-settings-changed snap.join_cohort_snapshot(snapname, cohort_key)
2020-11-17 16:36:03 DEBUG leader-settings-changed File "lib/charms/layer/snap.py", line 445, in join_cohort_snapshot
2020-11-17 16:36:03 DEBUG leader-settings-changed '--cohort', cohort_key])
2020-11-17 16:36:03 DEBUG leader-settings-changed File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
2020-11-17 16:36:03 DEBUG leader-settings-changed **kwargs).stdout
2020-11-17 16:36:03 DEBUG leader-settings-changed File "/usr/lib/python3.6/subprocess.py", line 438, in run
2020-11-17 16:36:03 DEBUG leader-settings-changed output=stdout, stderr=stderr)
2020-11-17 16:36:03 DEBUG leader-settings-changed subprocess.CalledProcessError: Command '['snap', 'refresh', 'kubelet', '--cohort', 'XYZ123']' returned non-zero exit status 1.
2020-11-17 16:36:04 ERROR juju.worker.uniter.operation runhook.go:132 hook "leader-settings-changed" failed: exit status 1

Tags: sts
Cory Johns (johnsca) wrote :

The 1.19+ck1 release includes a fix to prevent the excessive refresh requests. Kevin is also working on gracefully handling this (and other snap store failures) in the charms, which will be included in that release as well.
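For illustration only, one way a charm could avoid redundant refresh requests is to skip the `snap refresh` call when the cohort key has not changed since the last successful refresh. This is a minimal sketch and not the fix that shipped in 1.19+ck1; `refresh_if_cohort_changed`, the `last_keys` dict (standing in for persistent charm state), and the injectable `run` parameter are all hypothetical names.

```python
import subprocess


def refresh_if_cohort_changed(snapname, cohort_key, last_keys,
                              run=subprocess.check_output):
    """Refresh `snapname` into `cohort_key` only if the key is new.

    `last_keys` is a hypothetical dict standing in for persistent charm
    state. Returns True if a refresh command was issued, False if skipped.
    """
    if last_keys.get(snapname) == cohort_key:
        # Same cohort as the last successful refresh: send nothing
        # to the snap store.
        return False
    run(['snap', 'refresh', snapname, '--cohort', cohort_key])
    # Record the key only after the command succeeded, so a failed
    # refresh is attempted again next time.
    last_keys[snapname] = cohort_key
    return True
```

With this shape, repeated hook runs that see the same cohort key issue a single refresh instead of one per hook.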

Changed in charm-kubernetes-worker:
milestone: none → 1.19+ck1
assignee: nobody → Kevin W Monroe (kwmonroe)
importance: Undecided → Critical
status: New → In Progress
assignee: Kevin W Monroe (kwmonroe) → Cory Johns (johnsca)
Cory Johns (johnsca) wrote :

Sorry, I meant that I'm working on gracefully handling this. Either way, it should be fixed in the 1.19+ck1 release.
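As a rough sketch of what handling this failure gracefully could look like (an assumption for illustration, not the code that landed in the charm), the `snap refresh ... --cohort` call from the traceback can be wrapped so that a non-zero exit is reported to the caller instead of raising `subprocess.CalledProcessError` and failing the hook. The function name `try_join_cohort` and the `run` parameter are hypothetical.

```python
import subprocess


def try_join_cohort(snapname, cohort_key, run=subprocess.check_output):
    """Attempt `snap refresh <snap> --cohort <key>`; return True on success.

    On a non-zero exit (for example when the store throttles the request),
    return False so the caller can set a waiting status and retry on a
    later hook instead of letting the exception fail the current hook.
    """
    cmd = ['snap', 'refresh', snapname, '--cohort', cohort_key]
    try:
        run(cmd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as err:
        output = err.output or b''
        if isinstance(output, bytes):
            output = output.decode(errors='replace')
        # Log the store's message rather than propagating the error.
        print('snap refresh failed for {}: {}'.format(snapname, output.strip()))
        return False
    return True
```

A caller could then leave the unit in a "waiting" state and try again on the next hook when this returns False.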

tags: added: sts
Cory Johns (johnsca) wrote :
tags: added: review-needed
Changed in charm-kubernetes-worker:
status: In Progress → Fix Committed
tags: removed: review-needed
Cory Johns (johnsca) wrote :
tags: added: backport-needed
tags: removed: backport-needed
Changed in charm-kubernetes-worker:
status: Fix Committed → Fix Released