The deferral of snap installation to ``layer-snap`` makes the charm susceptible to hook execution error

Bug #1839616 reported by Frode Nordahl
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
MySQL InnoDB Cluster Charm
Fix Released
High
Unassigned
Snap Layer
Fix Committed
Undecided
Unassigned
charm-octavia-diskimage-retrofit
Fix Released
High
Unassigned
vault-charm
Fix Released
High
Unassigned

Bug Description

We consume ``layer-snap`` for our snap installation needs. The default behaviour of the layer appears to be fragile in the event of snap installation failure.

A failure could occur due to temporary network error or other environmental issues.

We can keep using the layer for our snap installation needs but we should probably take charge and call the library functions from the charm code instead of using the layer default mode of operation.

Log excerpt from failure:
2019-08-09 12:35:17 INFO juju-log Initializing Snap Layer
2019-08-09 12:35:17 DEBUG install none
2019-08-09 12:35:19 INFO juju-log Installing octavia-diskimage-retrofit from store
2019-08-09 12:35:47 ERROR juju-log Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-octavia-diskimage-retrofit-0/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 72, in main
    hookenv._run_atstart()
  File "/var/lib/juju/agents/unit-octavia-diskimage-retrofit-0/.venv/lib/python3.6/site-packages/charmhelpers/core/hookenv.py", line 1211, in _run_atstart
    callback(*args, **kwargs)
  File "/var/lib/juju/agents/unit-octavia-diskimage-retrofit-0/charm/reactive/snap.py", line 83, in install
    snap.install(snapname, **snap_opts)
  File "lib/charms/layer/snap.py", line 53, in install
    _install_store(snapname, **kw)
  File "lib/charms/layer/snap.py", line 303, in _install_store
    out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['snap', 'install', '--channel=edge', '--devmode', 'octavia-diskimage-retrofit']' returned non-zero exit status 1.

2019-08-09 12:35:47 DEBUG install Traceback (most recent call last):
2019-08-09 12:35:47 DEBUG install File "/var/lib/juju/agents/unit-octavia-diskimage-retrofit-0/charm/hooks/install", line 22, in <module>
2019-08-09 12:35:47 DEBUG install main()
2019-08-09 12:35:47 DEBUG install File "/var/lib/juju/agents/unit-octavia-diskimage-retrofit-0/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 72, in main
2019-08-09 12:35:47 DEBUG install hookenv._run_atstart()
2019-08-09 12:35:47 DEBUG install File "/var/lib/juju/agents/unit-octavia-diskimage-retrofit-0/.venv/lib/python3.6/site-packages/charmhelpers/core/hookenv.py", line 1211, in _run_atstart
2019-08-09 12:35:47 DEBUG install callback(*args, **kwargs)
2019-08-09 12:35:47 DEBUG install File "/var/lib/juju/agents/unit-octavia-diskimage-retrofit-0/charm/reactive/snap.py", line 83, in install
2019-08-09 12:35:47 DEBUG install snap.install(snapname, **snap_opts)
2019-08-09 12:35:47 DEBUG install File "lib/charms/layer/snap.py", line 53, in install
2019-08-09 12:35:47 DEBUG install _install_store(snapname, **kw)
2019-08-09 12:35:47 DEBUG install File "lib/charms/layer/snap.py", line 303, in _install_store
2019-08-09 12:35:47 DEBUG install out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
2019-08-09 12:35:47 DEBUG install File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
2019-08-09 12:35:47 DEBUG install **kwargs).stdout
2019-08-09 12:35:47 DEBUG install File "/usr/lib/python3.6/subprocess.py", line 438, in run
2019-08-09 12:35:47 DEBUG install output=stdout, stderr=stderr)
2019-08-09 12:35:47 DEBUG install subprocess.CalledProcessError: Command '['snap', 'install', '--channel=edge', '--devmode', 'octavia-diskimage-retrofit']' returned non-zero exit status 1.
2019-08-09 12:35:47 ERROR juju.worker.uniter.operation runhook.go:132 hook "install" failed: exit status 1

Related branches

Frode Nordahl (fnordahl)
summary: The deferral of snap installation to ``layer-snap`` makes the charm
- susceptible for hook execution error
+ susceptible to hook execution error
Changed in charm-octavia-diskimage-retrofit:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Frode Nordahl (fnordahl) wrote :

Adding ``snap-layer`` to the bug.

1) The information logged on error is insufficient to provide RCA

2) There probably are more robust ways of handling temporary snap install failures in the layer default mode of operation

Revision history for this message
Frode Nordahl (fnordahl) wrote :
Download full text (3.7 KiB)

Adding the vault charm to the bug:
2019-08-12 10:42:44 INFO juju-log Invoking reactive handler: reactive/vault_handlers.py:128:snap_install
2019-08-12 10:42:45 INFO juju-log Installing core from store
2019-08-12 10:44:57 ERROR juju-log Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 73, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-vault-0/charm/reactive/vault_handlers.py", line 133, in snap_install
    snap.install('core')
  File "lib/charms/layer/snap.py", line 53, in install
    _install_store(snapname, **kw)
  File "lib/charms/layer/snap.py", line 303, in _install_store
    out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['snap', 'install', '--channel=stable', 'core']' returned non-zero exit status 1.

2019-08-12 10:44:57 DEBUG install Traceback (most recent call last):
2019-08-12 10:44:57 DEBUG install File "/var/lib/juju/agents/unit-vault-0/charm/hooks/install", line 22, in <module>
2019-08-12 10:44:57 DEBUG install main()
2019-08-12 10:44:57 DEBUG install File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 73, in main
2019-08-12 10:44:57 DEBUG install bus.dispatch(restricted=restricted_mode)
2019-08-12 10:44:57 DEBUG install File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 390, in dispatch
2019-08-12 10:44:57 DEBUG install _invoke(other_handlers)
2019-08-12 10:44:57 DEBUG install File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 359, in _invoke
2019-08-12 10:44:57 DEBUG install handler.invoke()
2019-08-12 10:44:57 DEBUG install File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 181, in invoke
2019-08-12 10:44:57 DEBUG install self._action(*args)
2019-08-12 10:44:57 DEBUG install File "/var/lib/juju/agents/unit-vault-0/charm/reactive/vault_handlers.py", line 133, in snap_install
2019-08-12 10:44:57 DEBUG install snap.install('core')
2019-08-12 10:44:57 DEBUG install File "lib/charms/layer/snap.py", line 53, in install
2019-08-12 10:44:57 DEBUG install _install_store(snapname, **kw)
2019-08-12 10:44:57 DEBUG install File "lib/charms/layer/snap.py", line 303, in _install_store
2019-08-12 10:44:57 DEBUG install out = subprocess.check_output(cmd, stderr=subprocess.STD...

Read more...

Changed in vault-charm:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Frode Nordahl (fnordahl) wrote :

Just to add some flavor to what's going on:

# snap changes
ID Status Spawn Ready Summary
1 Done today at 10:40 UTC today at 10:40 UTC Initialize system state
2 Done today at 10:41 UTC today at 10:41 UTC Change configuration of "core" snap
3 Done today at 10:42 UTC today at 10:42 UTC Change configuration of "core" snap
4 Error today at 10:42 UTC today at 10:44 UTC Install "core" snap
5 Done today at 10:42 UTC today at 10:42 UTC Initialize device

# snap tasks 4
Status Spawn Ready Summary
Done today at 10:42 UTC today at 10:44 UTC Ensure prerequisites for "core" are available
Undone today at 10:42 UTC today at 10:44 UTC Download snap "core" (7270) from channel "stable"
Error today at 10:42 UTC today at 10:44 UTC Fetch and check assertions for snap "core" (7270)
Hold today at 10:42 UTC today at 10:44 UTC Mount snap "core" (7270)
Hold today at 10:42 UTC today at 10:44 UTC Copy snap "core" data
Hold today at 10:42 UTC today at 10:44 UTC Setup snap "core" (7270) security profiles
Hold today at 10:42 UTC today at 10:44 UTC Make snap "core" (7270) available to the system
Hold today at 10:42 UTC today at 10:44 UTC Automatically connect eligible plugs and slots of snap "core"
Hold today at 10:42 UTC today at 10:44 UTC Set automatic aliases for snap "core"
Hold today at 10:42 UTC today at 10:44 UTC Setup snap "core" aliases
Hold today at 10:42 UTC today at 10:44 UTC Run install hook of "core" snap if present
Hold today at 10:42 UTC today at 10:44 UTC Start snap "core" (7270) services
Hold today at 10:42 UTC today at 10:44 UTC Run configure hook of "core" snap if present

......................................................................
Fetch and check assertions for snap "core" (7270)

2019-08-12T10:44:36Z ERROR cannot get nonce from store: store server returned status 503

The timestamps coincide with the charm log and gives a plausible reason for the failure.

However I must maintain that I do not think we can sustain this as a one-shot-to success type of transaction for the charm/layer and should add some resilience here.

Frode Nordahl (fnordahl)
Changed in vault-charm:
milestone: none → 19.10
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-octavia-diskimage-retrofit (master)

Change abandoned by Frode Nordahl (<email address hidden>) on branch: master
Review: https://review.opendev.org/676145
Reason: The charm has effectively been rebuilt and shipped with the updated layer already as a side effect of other landed commits. No more need for this one.

David Ames (thedac)
Changed in vault-charm:
milestone: 19.10 → 20.01
James Page (james-page)
Changed in vault-charm:
milestone: 20.01 → 20.05
Revision history for this message
Frode Nordahl (fnordahl) wrote :
Download full text (3.4 KiB)

2020-05-18 11:54:04 INFO juju-log Installing mysql-shell from store
2020-05-18 11:54:13 ERROR juju-log Installation failed cmd="['snap', 'install', '--channel=stable', 'mysql-shell']" returncode=1 output="b'error: cannot perform the following tasks:\n- Mount snap "mysql-shell" (4) ([start snap-mysql\\x2dshell-4.mount] failed with exit status 1: Job failed. See "journalctl -xe" for details.\n)\n'"
2020-05-18 11:54:13 ERROR juju-log Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-innodb-cluster-1/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 73, in main
    hookenv._run_atstart()
  File "/var/lib/juju/agents/unit-mysql-innodb-cluster-1/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 1332, in _run_atstart
    callback(*args, **kwargs)
  File "/var/lib/juju/agents/unit-mysql-innodb-cluster-1/charm/reactive/snap.py", line 83, in install
    snap.install(snapname, **snap_opts)
  File "lib/charms/layer/snap.py", line 64, in install
    _install_store(snapname, **kw)
  File "lib/charms/layer/snap.py", line 345, in _install_store
    out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['snap', 'install', '--channel=stable', 'mysql-shell']' returned non-zero exit status 1.

2020-05-18 11:54:13 DEBUG install Traceback (most recent call last):
2020-05-18 11:54:13 DEBUG install File "/var/lib/juju/agents/unit-mysql-innodb-cluster-1/charm/hooks/install", line 22, in <module>
2020-05-18 11:54:13 DEBUG install main()
2020-05-18 11:54:13 DEBUG install File "/var/lib/juju/agents/unit-mysql-innodb-cluster-1/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 73, in main
2020-05-18 11:54:13 DEBUG install hookenv._run_atstart()
2020-05-18 11:54:13 DEBUG install File "/var/lib/juju/agents/unit-mysql-innodb-cluster-1/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 1332, in _run_atstart
2020-05-18 11:54:13 DEBUG install callback(*args, **kwargs)
2020-05-18 11:54:13 DEBUG install File "/var/lib/juju/agents/unit-mysql-innodb-cluster-1/charm/reactive/snap.py", line 83, in install
2020-05-18 11:54:13 DEBUG install snap.install(snapname, **snap_opts)
2020-05-18 11:54:13 DEBUG install File "lib/charms/layer/snap.py", line 64, in install
2020-05-18 11:54:13 DEBUG install _install_store(snapname, **kw)
2020-05-18 11:54:13 DEBUG install File "lib/charms/layer/snap.py", line 345, in _install_store
2020-05-18 11:54:13 DEBUG install out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
2020-05-18 11:54:13 DEBUG install File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
2020-05-18 11:54:13 DEBUG install return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
2020-05-18 11:54:13 DEBUG install File "/usr/lib/python3.8/subprocess.py", line 512, in run
2020-05-18 11:54:13 DEBUG install rais...

Read more...

David Ames (thedac)
Changed in vault-charm:
milestone: 20.05 → 20.08
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

Just hit mysql-innodb-cluster here: https://review.opendev.org/#/c/740178/

Changed in charm-mysql-innodb-cluster:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :
Changed in vault-charm:
status: Triaged → Confirmed
Changed in vault-charm:
status: Confirmed → Triaged
James Page (james-page)
Changed in vault-charm:
milestone: 20.08 → none
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

snap install error from charm-mysql-innodb-cluster: https://paste.ubuntu.com/p/JMvv63S7nQ/

unit log: https://paste.ubuntu.com/p/mG7wTBKjqC/

This is from the latest ~openstack-charmers-next

tags: added: cdo-qa cdo-release-blocker foundations-engine
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :
Download full text (4.2 KiB)

I see various glitches quite early:

Aug 10 22:20:54 juju-5b349a-5-lxd-8 systemd[1]: cloud-init.service: Main process exited, code=exited, status=1/FAILURE
...
Aug 10 22:20:54 juju-5b349a-5-lxd-8 chattr[229]: /usr/bin/chattr: Permission denied while reading flags on /sys/firmware/efi/efivars/db-d719b2cb-3d3a-4596-a3bc-dad00e67656f
Aug 10 22:20:54 juju-5b349a-5-lxd-8 rsyslogd[222]: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd. [v8.2001.0]
Aug 10 22:20:54 juju-5b349a-5-lxd-8 chattr[235]: /usr/bin/chattr: Permission denied while reading flags on /sys/firmware/efi/efivars/dbx-d719b2cb-3d3a-4596-a3bc-dad00e67656f
Aug 10 22:20:54 juju-5b349a-5-lxd-8 rsyslogd[222]: imklog: cannot open kernel log (/proc/kmsg): Permission denied.
Aug 10 22:20:54 juju-5b349a-5-lxd-8 rsyslogd[222]: activation of module imklog failed [v8.2001.0 try https://www.rsyslog.com/e/2145 ]
...
Aug 10 22:20:54 juju-5b349a-5-lxd-8 sbkeysync[239]: from /usr/share/secureboot/updates/dbx/MS-2016-08-08.bin
Aug 10 22:20:54 juju-5b349a-5-lxd-8 sbkeysync[239]: 09df5f4e511208ec78b96d12dCan't create key file /sys/firmware/efi/efivars/dbx-d719b2cb-3d3a-4596-a3bc-dad00e67656f: Permission denied
Aug 10 22:20:54 juju-5b349a-5-lxd-8 sbkeysync[239]: Error syncing keystore file /usr/share/secureboot/updates/dbx/MS-2016-08-08.bin
...
Aug 10 22:21:33 juju-5b349a-5-lxd-8 systemd[1]: cloud-final.service: Main process exited, code=exited, status=1/FAILURE
...
Aug 10 22:21:38 juju-5b349a-5-lxd-8 lxd.daemon[3478]: ==> Setting up persistent shmounts path
Aug 10 22:21:39 juju-5b349a-5-lxd-8 lxd.daemon[3543]: Failed to mark /media/.lxd-shmounts as private: Invalid argument
Aug 10 22:21:39 juju-5b349a-5-lxd-8 lxd.daemon[3542]: Failed to setup the shmounts namespace: No such file or directory
Aug 10 22:21:39 juju-5b349a-5-lxd-8 lxd.daemon[3478]: ====> Failed to setup shmounts, continuing without
Aug 10 22:21:39 juju-5b349a-5-lxd-8 lxd.daemon[3478]: ====> Making LXD shmounts use the persistent path
Aug 10 22:21:39 juju-5b349a-5-lxd-8 lxd.daemon[3478]: ====> Making LXCFS use the persistent path
...
Aug 10 22:21:40 juju-5b349a-5-lxd-8 lxd.daemon[3478]: => Starting LXCFS
Aug 10 22:21:40 juju-5b349a-5-lxd-8 lxd.daemon[3645]: Running constructor lxcfs_init to reload liblxcfs
Aug 10 22:21:40 juju-5b349a-5-lxd-8 lxd.daemon[3645]: bindings.c: 791: cgfs_setup_controllers: Failed to set up private lxcfs cgroup mounts
Aug 10 22:21:41 juju-5b349a-5-lxd-8 lxd.daemon[3645]: Failed to setup private cgroup mounts for lxcfs
Aug 10 22:21:41 juju-5b349a-5-lxd-8 lxd.daemon[3648]: cat: /var/snap/lxd/common/lxcfs.pid: No such file or directory
Aug 10 22:21:41 juju-5b349a-5-lxd-8 lxd.daemon[3478]: => Starting LXD
Aug 10 22:21:42 juju-5b349a-5-lxd-8 systemd-resolved[145]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Aug 10 22:21:43 juju-5b349a-5-lxd-8 lxd.daemon[3653]: t=2020-08-10T22:21:43+0000 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored"
Aug 10 22:21:43 juju-5b349a-5-lxd-8 lxd.daemon[3653]: t=2020-08-10T22:21:43+0000 lvl=warn msg=" - Couldn't find the C...

Read more...

Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :
Download full text (3.7 KiB)

In this other problematic run [0] however we don't see all these glitches. Instead the first failure seems to be:

Jul 16 12:10:27 juju-d19098-zaza-f68227589c9c-21 systemd[1]: Started LXD - main daemon.
Jul 16 12:10:41 juju-d19098-zaza-f68227589c9c-21 systemd-timesyncd[526]: Timed out waiting for reply from 91.189.94.4:123 (ntp.ubuntu.com).
Jul 16 12:10:57 juju-d19098-zaza-f68227589c9c-21 snapd[3336]: stateengine.go:150: state ensure error: Get https://api.snapcraft.io/api/v1/snaps/names?confinement=strict%2Cclassic: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Jul 16 12:11:32 juju-d19098-zaza-f68227589c9c-21 snapd[3336]: daemon.go:542: gracefully waiting for running hooks
Jul 16 12:11:32 juju-d19098-zaza-f68227589c9c-21 snapd[3336]: daemon.go:544: done waiting for running hooks
Jul 16 12:11:32 juju-d19098-zaza-f68227589c9c-21 snapd[3336]: daemon stop requested to wait for socket activation
Jul 16 12:11:56 juju-d19098-zaza-f68227589c9c-21 systemd-timesyncd[526]: Timed out waiting for reply from 91.189.94.4:123 (ntp.ubuntu.com).
Jul 16 12:11:56 juju-d19098-zaza-f68227589c9c-21 systemd-timesyncd[526]: Synchronized to time server 91.189.91.157:123 (ntp.ubuntu.com).
Jul 16 12:13:25 juju-d19098-zaza-f68227589c9c-21 systemd[1]: Starting Snap Daemon...
Jul 16 12:13:25 juju-d19098-zaza-f68227589c9c-21 snapd[10081]: AppArmor status: apparmor is enabled and all features are available
Jul 16 12:13:25 juju-d19098-zaza-f68227589c9c-21 snapd[10081]: daemon.go:343: started snapd/2.45.1+18.04 (series 16; classic) ubuntu/18.04 (amd64) linux/4.15.0-111-generic.
Jul 16 12:13:25 juju-d19098-zaza-f68227589c9c-21 snapd[10081]: daemon.go:436: adjusting startup timeout by 30s (pessimistic estimate of 30s plus 5s per snap)
Jul 16 12:13:25 juju-d19098-zaza-f68227589c9c-21 systemd[1]: Started Snap Daemon.
Jul 16 12:13:43 juju-d19098-zaza-f68227589c9c-21 systemd-timesyncd[526]: Timed out waiting for reply from 91.189.89.198:123 (ntp.ubuntu.com).
Jul 16 12:14:26 juju-d19098-zaza-f68227589c9c-21 snapd[10081]: api.go:990: Installing snap "core" revision unset
Jul 16 12:14:58 juju-d19098-zaza-f68227589c9c-21 systemd-timesyncd[526]: Timed out waiting for reply from 91.189.91.157:123 (ntp.ubuntu.com).
Jul 16 12:15:08 juju-d19098-zaza-f68227589c9c-21 systemd-timesyncd[526]: Timed out waiting for reply from 91.189.89.198:123 (ntp.ubuntu.com).
Jul 16 12:15:18 juju-d19098-zaza-f68227589c9c-21 systemd-timesyncd[526]: Timed out waiting for reply from 91.189.89.199:123 (ntp.ubuntu.com).
Jul 16 12:15:18 juju-d19098-zaza-f68227589c9c-21 snapd[10081]: 2020/07/16 12:15:18 Unsolicited response received on idle HTTP channel starting with "HTTP/1.0 408 Request Time-out\r\nCache-Control: no-cache\r\nConnection: close\r\nContent-Type: text/html\r\n\r\n<html><body><h1>408 Request Time-out</h1>\nYour browser didn't send a complete request in time.\n</body></html>\n"; err=<nil>
Jul 16 12:15:28 juju-d19098-zaza-f68227589c9c-21 systemd-timesyncd[526]: Timed out waiting for reply from 91.189.94.4:123 (ntp.ubuntu.com).
Jul 16 12:16:32 juju-d19098-zaza-f68227589c9c-21 systemd[1]: Reloading.
Jul 16 12:16:32 juju-d19098-zaza-f68227589c9c-21 s...

Read more...

Changed in charm-mysql-innodb-cluster:
status: Confirmed → Triaged
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

Apparently the snap mount errors we see in the last run [0] can be caused by the core snap not having been properly installed. Adding some tenacity/retry here [1] should fix the issue.

0: https://paste.ubuntu.com/p/JMvv63S7nQ/
1: https://github.com/stub42/layer-snap/blob/master/lib/charms/layer/snap.py#L344

Changed in layer-snap:
status: New → In Progress
assignee: nobody → Aurelien Lourot (aurelien-lourot)
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

This is the unit log from my current test run; https://paste.ubuntu.com/p/qwDDDQKqr2/

We hit the failure here:
2020-08-11 03:29:23 ERROR juju-log Installation failed cmd="['snap', 'install', '--channel=stable', 'mysql-shell']" returncode=1 output="b'error: cannot perform the following tasks:\n- Mount snap "mysql-shell" (4) ([start snap-mysql\\x2dshell-4.mount] failed with exit status 1: Job failed. See "journalctl -xe" for details.\n)\n'"

And again here, 10 hours later after 'juju resolved':
2020-08-11 13:57:39 ERROR juju-log Installation failed cmd="['snap', 'install', '--channel=stable', 'mysql-shell']" returncode=1 output="b'error: cannot perform the following tasks:\n- Mount snap "mysql-shell" (4) ([start snap-mysql\\x2dshell-4.mount] failed with exit status 1: Job failed. See "journalctl -xe" for details.\n)\n'"

Another 'juju resolved' 15 minutes later got us past it:
2020-08-11 14:13:05 DEBUG juju-log Installation successful cmd="['snap', 'install', '--channel=stable', 'mysql-shell']" output="b'mysql-shell 8.0.20 from Canonical* installed\n'"

So, it seems unlikely to be related to timing on getting the core snap installed.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

The mysql-shell one appears to be the system bumping up against the maxkeys limit discussed here:

https://linuxcontainers.org/lxd/docs/master/production-setup

Here is a line from /proc/key-users on that system, showing the user 1000000 bumping up against this limit:

1000000: 200 200/200 200/200 3732/20000

Revision history for this message
Ryan Beisner (1chb1n) wrote :
Changed in layer-snap:
status: In Progress → Fix Committed
assignee: Aurelien Lourot (aurelien-lourot) → nobody
Revision history for this message
David Ames (thedac) wrote :

Doing some cleanup.

The original bug was resolved in the snap layer and all affected charms have been rebuilt.

The last several comments (#8-#11) are better addressed in https://bugs.launchpad.net/juju/+bug/1891223 which is well understood.

Closing this bug out.

Changed in charm-mysql-innodb-cluster:
status: Triaged → Confirmed
status: Confirmed → Invalid
Changed in charm-octavia-diskimage-retrofit:
status: Triaged → Invalid
Changed in vault-charm:
status: Triaged → Invalid
Changed in charm-mysql-innodb-cluster:
status: Invalid → Fix Released
Changed in charm-octavia-diskimage-retrofit:
status: Invalid → Fix Released
Changed in vault-charm:
status: Invalid → Fix Released
Revision history for this message
Frode Nordahl (fnordahl) wrote :

Ehm, I think the closing out of this bug was an error and this still occurs. The original intent for this bug was to track spurious hook errors caused by connectivity issues with the snap store and how the layer and/or the charms should handle this better.

Just now from the Vault snap:
2020-09-09 15:55:58 ERROR juju-log Installation failed cmd="['snap', 'install', '--channel=stable', 'core']" returncode=1 output="b'error: cannot perform the following tasks:\n- Download snap "core" (9804) from channel "stable" (received an unexpected http response code (408) when trying to download https://canonical-bos01.cdn.snapcraft.io/download-origin/canonical-lgw01/99T7MUlRhtI3U0QFgl5mXXESAiSwt776_9804.snap?interactive=1&token=1599678000_ef78e027e7a1f418499d0512263b926f2890670c)\n'"
2020-09-09 15:55:58 ERROR juju-log Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-vault-0/charm/reactive/vault_handlers.py", line 140, in snap_install
    snap.install('core')
  File "lib/charms/layer/snap.py", line 64, in install
    _install_store(snapname, **kw)
  File "lib/charms/layer/snap.py", line 345, in _install_store
    out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['snap', 'install', '--channel=stable', 'core']' returned non-zero exit status 1.

Revision history for this message
David Ames (thedac) wrote :

@Frode,

Sorry, I had thought your snap layer change addressed the original intent of the bug. It seems to me it is a snap layer bug and not necessarily a charm bug.

In the meantime, this bug has been somewhat conflated with https://bugs.launchpad.net/juju/+bug/1891223.

So I am not entirely sure how to proceed. Would you prefer to re-open the bug on the snap-layer or our charms?

Revision history for this message
Frode Nordahl (fnordahl) wrote :
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.