config-changed can hang indefinitly if it misses a previous upgrade key
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Ceph OSD Charm |
New
|
Undecided
|
Unassigned |
Bug Description
When performing some maintenance on a cloud, bumped into a host that was holding a machine lock.
I could trace back with the logging that it was waiting for an upgrade to finish. But it looks like the 'timeout' that is supposed to happen after 10 minutes, never kicked in.
Code that I suspect is located around: https:/
Logging:
2019-03-19 15:08:57 DEBUG juju.cmd supercommand.go:57 args: []string{
2019-03-19 15:08:57 DEBUG juju.agent agent.go:545 read agent config, format "2.0"
2019-03-19 15:08:57 INFO juju.cmd.jujud agent.go:133 setting logging config to "<root>
2019-03-19 15:09:22 DEBUG juju-log Hardening function 'config_changed'
2019-03-19 15:09:22 DEBUG juju-log No hardening applied to 'config_changed'
2019-03-19 15:09:22 INFO juju-log old_version: luminous
2019-03-19 15:09:23 INFO juju-log new_version: luminous
2019-03-19 15:09:24 DEBUG juju-log Directory "/var/lib/
2019-03-19 15:09:24 INFO juju-log Attempting to resume possibly failed upgrade.
2019-03-19 15:09:27 INFO juju-log Making dir /var/lib/
2019-03-19 15:09:27 INFO juju-log Monitor hosts are ['10.63.4.43:6789', '10.63.4.49:6789', '10.63.4.57:6789']
2019-03-19 15:09:35 DEBUG juju-log Writing file /var/lib/
2019-03-19 15:09:35 INFO juju-log roll_osd_cluster called with luminous
2019-03-19 15:09:36 INFO juju-log osd_sorted_list: [<ceph.
2019-03-19 15:09:36 INFO juju-log upgrade position: 5
2019-03-19 15:09:36 INFO juju-log Previous node is: os-s005
2019-03-19 15:09:37 DEBUG config-changed Error ENOENT: key 'osd_os-
2019-03-19 15:09:37 INFO juju-log os-s005 is not finished. Waiting
2019-03-19 15:09:37 DEBUG config-changed Error ENOENT: error obtaining 'osd_os-
2019-03-19 15:09:37 INFO juju-log Monitor config-key get failed with message: b''
2019-03-19 15:09:37 INFO juju-log waiting for 11 seconds
2019-03-19 15:09:48 DEBUG config-changed Error ENOENT: key 'osd_os-
2019-03-19 15:09:48 INFO juju-log os-s005 is not finished. Waiting
2019-03-19 15:09:49 DEBUG config-changed Error ENOENT: error obtaining 'osd_os-
2019-03-19 15:09:49 INFO juju-log Monitor config-key get failed with message: b''
2019-03-19 15:09:49 INFO juju-log waiting for 19 seconds
2019-03-19 15:10:08 DEBUG config-changed Error ENOENT: key 'osd_os-
2019-03-19 15:10:08 INFO juju-log os-s005 is not finished. Waiting
2019-03-19 15:10:08 DEBUG config-changed Error ENOENT: error obtaining 'osd_os-
2019-03-19 15:10:08 INFO juju-log Monitor config-key get failed with message: b''
2019-03-19 15:10:08 INFO juju-log waiting for 11 seconds
2019-03-19 15:10:20 DEBUG config-changed Error ENOENT: key 'osd_os-
2019-03-19 15:10:20 INFO juju-log os-s005 is not finished. Waiting
2019-03-19 15:10:20 DEBUG config-changed Error ENOENT: error obtaining 'osd_os-
2019-03-19 15:10:20 INFO juju-log Monitor config-key get failed with message: b''
2019-03-19 15:10:20 INFO juju-log waiting for 7 seconds
2019-03-19 15:10:27 DEBUG config-changed Error ENOENT: key 'osd_os-
.
.
.
2019-03-19 19:37:59 DEBUG config-changed Error ENOENT: key 'osd_os-
2019-03-19 19:37:59 INFO juju-log os-s005 is not finished. Waiting
2019-03-19 19:38:00 DEBUG config-changed Error ENOENT: error obtaining 'osd_os-
2019-03-19 19:38:00 INFO juju-log Monitor config-key get failed with message: b''
2019-03-19 19:38:00 INFO juju-log waiting for 5 seconds
2019-03-19 19:38:05 DEBUG config-changed Error ENOENT: key 'osd_os-
2019-03-19 19:38:05 INFO juju-log os-s005 is not finished. Waiting
2019-03-19 19:38:05 DEBUG config-changed Error ENOENT: error obtaining 'osd_os-
2019-03-19 19:38:05 INFO juju-log Monitor config-key get failed with message: b''
2019-03-19 19:38:05 INFO juju-log waiting for 17 seconds
2019-03-19 19:38:23 DEBUG config-changed Error ENOENT: key 'osd_os-
2019-03-19 19:38:23 INFO juju-log os-s005 is not finished. Waiting
.
.
.
2019-03-21 01:35:05 INFO juju-log os-s005 is not finished. Waiting
2019-03-21 01:35:05 DEBUG config-changed Error ENOENT: error obtaining 'osd_os-
2019-03-21 01:35:05 INFO juju-log Monitor config-key get failed with message: b''
2019-03-21 01:35:05 INFO juju-log waiting for 12 seconds
2019-03-21 01:35:17 DEBUG config-changed Error ENOENT: key 'osd_os-
2019-03-21 01:35:17 INFO juju-log os-s005 is not finished. Waiting
2019-03-21 01:35:18 DEBUG config-changed Error ENOENT: error obtaining 'osd_os-
2019-03-21 01:35:18 INFO juju-log Monitor config-key get failed with message: b''
2019-03-21 01:35:18 INFO juju-log waiting for 28 seconds
tags: | added: canonical-bootstack |
After looking a bit closer to the code. The issue is caused by the fact the for some reason (edge case) the start signal (osd_os- s005_luminous_ start in this case) is never received. So the 10 minuted loop never even starts.