The deferral of snap installation to ``layer-snap`` makes the charm susceptible to hook execution error

Bug #1839616 reported by Frode Nordahl on 2019-08-09
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Snap Layer
Undecided
Unassigned
charm-octavia-diskimage-retrofit
High
Unassigned
vault-charm
High
Unassigned

Bug Description

We consume ``layer-snap`` for our snap installation needs. The default behaviour of the layer appears to be fragile in the event of snap installation failure.

A failure could occur due to temporary network error or other environmental issues.

We can keep using the layer for our snap installation needs but we should probably take charge and call the library functions from the charm code instead of using the layer default mode of operation.

Log excerpt from failure:
2019-08-09 12:35:17 INFO juju-log Initializing Snap Layer
2019-08-09 12:35:17 DEBUG install none
2019-08-09 12:35:19 INFO juju-log Installing octavia-diskimage-retrofit from store
2019-08-09 12:35:47 ERROR juju-log Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-octavia-diskimage-retrofit-0/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 72, in main
    hookenv._run_atstart()
  File "/var/lib/juju/agents/unit-octavia-diskimage-retrofit-0/.venv/lib/python3.6/site-packages/charmhelpers/core/hookenv.py", line 1211, in _run_atstart
    callback(*args, **kwargs)
  File "/var/lib/juju/agents/unit-octavia-diskimage-retrofit-0/charm/reactive/snap.py", line 83, in install
    snap.install(snapname, **snap_opts)
  File "lib/charms/layer/snap.py", line 53, in install
    _install_store(snapname, **kw)
  File "lib/charms/layer/snap.py", line 303, in _install_store
    out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['snap', 'install', '--channel=edge', '--devmode', 'octavia-diskimage-retrofit']' returned non-zero exit status 1.

2019-08-09 12:35:47 DEBUG install Traceback (most recent call last):
2019-08-09 12:35:47 DEBUG install File "/var/lib/juju/agents/unit-octavia-diskimage-retrofit-0/charm/hooks/install", line 22, in <module>
2019-08-09 12:35:47 DEBUG install main()
2019-08-09 12:35:47 DEBUG install File "/var/lib/juju/agents/unit-octavia-diskimage-retrofit-0/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 72, in main
2019-08-09 12:35:47 DEBUG install hookenv._run_atstart()
2019-08-09 12:35:47 DEBUG install File "/var/lib/juju/agents/unit-octavia-diskimage-retrofit-0/.venv/lib/python3.6/site-packages/charmhelpers/core/hookenv.py", line 1211, in _run_atstart
2019-08-09 12:35:47 DEBUG install callback(*args, **kwargs)
2019-08-09 12:35:47 DEBUG install File "/var/lib/juju/agents/unit-octavia-diskimage-retrofit-0/charm/reactive/snap.py", line 83, in install
2019-08-09 12:35:47 DEBUG install snap.install(snapname, **snap_opts)
2019-08-09 12:35:47 DEBUG install File "lib/charms/layer/snap.py", line 53, in install
2019-08-09 12:35:47 DEBUG install _install_store(snapname, **kw)
2019-08-09 12:35:47 DEBUG install File "lib/charms/layer/snap.py", line 303, in _install_store
2019-08-09 12:35:47 DEBUG install out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
2019-08-09 12:35:47 DEBUG install File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
2019-08-09 12:35:47 DEBUG install **kwargs).stdout
2019-08-09 12:35:47 DEBUG install File "/usr/lib/python3.6/subprocess.py", line 438, in run
2019-08-09 12:35:47 DEBUG install output=stdout, stderr=stderr)
2019-08-09 12:35:47 DEBUG install subprocess.CalledProcessError: Command '['snap', 'install', '--channel=edge', '--devmode', 'octavia-diskimage-retrofit']' returned non-zero exit status 1.
2019-08-09 12:35:47 ERROR juju.worker.uniter.operation runhook.go:132 hook "install" failed: exit status 1

Related branches

Frode Nordahl (fnordahl) on 2019-08-09
summary: The deferral of snap installation to ``layer-snap`` makes the charm
- susceptible for hook execution error
+ susceptible to hook execution error
Changed in charm-octavia-diskimage-retrofit:
status: New → Triaged
importance: Undecided → High
Frode Nordahl (fnordahl) wrote :

Adding ``snap-layer`` to the bug.

1) The information logged on error is insufficient to provide RCA

2) There probably are more robust ways of handling temporary snap install failures in the layer default mode of operation

Frode Nordahl (fnordahl) wrote :
Download full text (3.7 KiB)

Adding the vault charm to the bug:
2019-08-12 10:42:44 INFO juju-log Invoking reactive handler: reactive/vault_handlers.py:128:snap_install
2019-08-12 10:42:45 INFO juju-log Installing core from store
2019-08-12 10:44:57 ERROR juju-log Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 73, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-vault-0/charm/reactive/vault_handlers.py", line 133, in snap_install
    snap.install('core')
  File "lib/charms/layer/snap.py", line 53, in install
    _install_store(snapname, **kw)
  File "lib/charms/layer/snap.py", line 303, in _install_store
    out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['snap', 'install', '--channel=stable', 'core']' returned non-zero exit status 1.

2019-08-12 10:44:57 DEBUG install Traceback (most recent call last):
2019-08-12 10:44:57 DEBUG install File "/var/lib/juju/agents/unit-vault-0/charm/hooks/install", line 22, in <module>
2019-08-12 10:44:57 DEBUG install main()
2019-08-12 10:44:57 DEBUG install File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 73, in main
2019-08-12 10:44:57 DEBUG install bus.dispatch(restricted=restricted_mode)
2019-08-12 10:44:57 DEBUG install File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 390, in dispatch
2019-08-12 10:44:57 DEBUG install _invoke(other_handlers)
2019-08-12 10:44:57 DEBUG install File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 359, in _invoke
2019-08-12 10:44:57 DEBUG install handler.invoke()
2019-08-12 10:44:57 DEBUG install File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 181, in invoke
2019-08-12 10:44:57 DEBUG install self._action(*args)
2019-08-12 10:44:57 DEBUG install File "/var/lib/juju/agents/unit-vault-0/charm/reactive/vault_handlers.py", line 133, in snap_install
2019-08-12 10:44:57 DEBUG install snap.install('core')
2019-08-12 10:44:57 DEBUG install File "lib/charms/layer/snap.py", line 53, in install
2019-08-12 10:44:57 DEBUG install _install_store(snapname, **kw)
2019-08-12 10:44:57 DEBUG install File "lib/charms/layer/snap.py", line 303, in _install_store
2019-08-12 10:44:57 DEBUG install out = subprocess.check_output(cmd, stderr=subprocess.STD...

Read more...

Changed in vault-charm:
status: New → Triaged
importance: Undecided → High
Frode Nordahl (fnordahl) wrote :

Just to add some flavor to what's going on:

# snap changes
ID Status Spawn Ready Summary
1 Done today at 10:40 UTC today at 10:40 UTC Initialize system state
2 Done today at 10:41 UTC today at 10:41 UTC Change configuration of "core" snap
3 Done today at 10:42 UTC today at 10:42 UTC Change configuration of "core" snap
4 Error today at 10:42 UTC today at 10:44 UTC Install "core" snap
5 Done today at 10:42 UTC today at 10:42 UTC Initialize device

# snap tasks 4
Status Spawn Ready Summary
Done today at 10:42 UTC today at 10:44 UTC Ensure prerequisites for "core" are available
Undone today at 10:42 UTC today at 10:44 UTC Download snap "core" (7270) from channel "stable"
Error today at 10:42 UTC today at 10:44 UTC Fetch and check assertions for snap "core" (7270)
Hold today at 10:42 UTC today at 10:44 UTC Mount snap "core" (7270)
Hold today at 10:42 UTC today at 10:44 UTC Copy snap "core" data
Hold today at 10:42 UTC today at 10:44 UTC Setup snap "core" (7270) security profiles
Hold today at 10:42 UTC today at 10:44 UTC Make snap "core" (7270) available to the system
Hold today at 10:42 UTC today at 10:44 UTC Automatically connect eligible plugs and slots of snap "core"
Hold today at 10:42 UTC today at 10:44 UTC Set automatic aliases for snap "core"
Hold today at 10:42 UTC today at 10:44 UTC Setup snap "core" aliases
Hold today at 10:42 UTC today at 10:44 UTC Run install hook of "core" snap if present
Hold today at 10:42 UTC today at 10:44 UTC Start snap "core" (7270) services
Hold today at 10:42 UTC today at 10:44 UTC Run configure hook of "core" snap if present

......................................................................
Fetch and check assertions for snap "core" (7270)

2019-08-12T10:44:36Z ERROR cannot get nonce from store: store server returned status 503

The timestamps coincide with the charm log and gives a plausible reason for the failure.

However I must maintain that I do not think we can sustain this as a one-shot-to success type of transaction for the charm/layer and should add some resilience here.

Frode Nordahl (fnordahl) on 2019-08-12
Changed in vault-charm:
milestone: none → 19.10

Change abandoned by Frode Nordahl (<email address hidden>) on branch: master
Review: https://review.opendev.org/676145
Reason: The charm has effectively been rebuilt and shipped with the updated layer already as a side effect of other landed commits. No more need for this one.

David Ames (thedac) on 2019-10-24
Changed in vault-charm:
milestone: 19.10 → 20.01
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers