We've managed to resolve the situation (see below for the fix). There are several small failures that all contributed to make this bug hard to find and hard to fix.
The root cause is that the copy of the charm on machine 0 in /var/lib/juju/charm-get-cache/ was truncated. This meant that the machines due to host the service would download the charm, but the hash comparison would fail. This sent juju into an endless loop downloading the charm, failing the hash comparison, and downloading the charm again.
There are several issues here:
1) The downloaded charms were never cleared up on disk, resulting in a full disk.
2) The hash comparison log message was at the INFO level, which made it very hard to see amidst the clutter of other log messages. The relevant log messages were:
```
2016-09-22 03:33:15 INFO juju.worker.uniter modes.go:567 ModeContinue starting
2016-09-22 03:33:15 INFO juju.worker.uniter modes.go:59 resuming charm install
2016-09-22 03:33:15 INFO juju.worker.uniter modes.go:569 ModeContinue exiting
2016-09-22 03:33:15 INFO juju.worker.uniter modes.go:567 ModeInstalling local:trusty/snapdeltaservice-11 starting
2016-09-22 03:33:15 INFO juju.worker.uniter.operation executor.go:69 running operation install local:trusty/snapdeltaservice-11
2016-09-22 03:33:15 INFO juju.worker.uniter.operation executor.go:103 preparing operation "install local:trusty/snapdeltaservice-11"
2016-09-22 03:33:15 INFO juju.worker.uniter.charm bundles.go:60 downloading local:trusty/snapdeltaservice-11 from https://10.50.28.64:17070/environment/7da51b 76-a297-4b45-8b9b-fcfb224f3b90/charms?file=%2A&url=local%3Atrusty%2Fsnapdeltaservice-11 2016-09-22 03:33:16 INFO juju.worker.uniter.charm bundles.go:69 download complete
2016-09-22 03:33:16 DEBUG juju.worker.uniter modes.go:31 [AGENT-STATUS] failed: install local:trusty/snapdeltaservice-11
2016-09-22 03:33:16 INFO juju.worker.uniter modes.go:569 ModeInstalling local:trusty/snapdeltaservice-11 exiting
2016-09-22 03:33:16 INFO juju.worker.uniter uniter.go:203 unit "sds-worker-ra28611616d3e50aab788f8a2afbff62b18c02fe2/2" shutting down: ModeInstalling local:tr
usty/snapdeltaservice-11: preparing operation "install local:trusty/snapdeltaservice-11": failed to download charm "local:trusty/snapdeltaservice-11" from ["h
ttps://10.50.28.64:17070/environment/7da51b76-a297-4b45-8b9b-fcfb224f3b90/charms?file=%2A&url=local%3Atrusty%2Fsnapdeltaservice-11"]: expected sha256 "c556018
53a737783602d1e16b759d8e84b26f0e74df8c9036bb7162076152168", got "35372b4cd40573879f3007fc9a3657ad46aae49ecef10e81ff34c4563a98f66a"
2016-09-22 03:33:16 DEBUG juju.worker.uniter runlistener.go:109 juju-run listener stopping
2016-09-22 03:33:16 DEBUG juju.worker.uniter runlistener.go:129 juju-run listener stopped
```
We've managed to resolve the situation (see below for the fix). There are several small failures that all contributed to make this bug hard to find and hard to fix.
The root cause is that the copy of the charm on machine 0 in /var/lib/ juju/charm- get-cache/ was truncated. This meant that the machines due to host the service would download the charm, but the hash comparison would fail. This sent juju into an endless loop downloading the charm, failing the hash comparison, and downloading the charm again.
There are several issues here:
1) The downloaded charms were never cleared up on disk, resulting in a full disk.
2) The hash comparison log message was at the INFO level, which made it very hard to see amidst the clutter of other log messages. The relevant log messages were:
``` snapdeltaservic e-11 starting uniter. operation executor.go:69 running operation install local:trusty/ snapdeltaservic e-11 uniter. operation executor.go:103 preparing operation "install local:trusty/ snapdeltaservic e-11" uniter. charm bundles.go:60 downloading local:trusty/ snapdeltaservic e-11 from https:/ /10.50. 28.64:17070/ environment/ 7da51b 76-a297- 4b45-8b9b- fcfb224f3b90/ charms? file=%2A& url=local% 3Atrusty% 2Fsnapdeltaserv ice-11 2016-09-22 03:33:16 INFO juju.worker. uniter. charm bundles.go:69 download complete snapdeltaservic e-11 snapdeltaservic e-11 exiting ra28611616d3e50 aab788f8a2afbff 62b18c02fe2/ 2" shutting down: ModeInstalling local:tr ervice- 11: preparing operation "install local:trusty/ snapdeltaservic e-11": failed to download charm "local: trusty/ snapdeltaservic e-11" from ["h 10.50.28. 64:17070/ environment/ 7da51b76- a297-4b45- 8b9b-fcfb224f3b 90/charms? file=%2A& url=local% 3Atrusty% 2Fsnapdeltaserv ice-11" ]: expected sha256 "c556018 16b759d8e84b26f 0e74df8c9036bb7 162076152168" , got "35372b4cd40573 879f3007fc9a365 7ad46aae49ecef1 0e81ff34c4563a9 8f66a"
2016-09-22 03:33:15 INFO juju.worker.uniter modes.go:567 ModeContinue starting
2016-09-22 03:33:15 INFO juju.worker.uniter modes.go:59 resuming charm install
2016-09-22 03:33:15 INFO juju.worker.uniter modes.go:569 ModeContinue exiting
2016-09-22 03:33:15 INFO juju.worker.uniter modes.go:567 ModeInstalling local:trusty/
2016-09-22 03:33:15 INFO juju.worker.
2016-09-22 03:33:15 INFO juju.worker.
2016-09-22 03:33:15 INFO juju.worker.
2016-09-22 03:33:16 DEBUG juju.worker.uniter modes.go:31 [AGENT-STATUS] failed: install local:trusty/
2016-09-22 03:33:16 INFO juju.worker.uniter modes.go:569 ModeInstalling local:trusty/
2016-09-22 03:33:16 INFO juju.worker.uniter uniter.go:203 unit "sds-worker-
usty/snapdeltas
ttps://
53a737783602d1e
2016-09-22 03:33:16 DEBUG juju.worker.uniter runlistener.go:109 juju-run listener stopping
2016-09-22 03:33:16 DEBUG juju.worker.uniter runlistener.go:129 juju-run listener stopped
```
The fix was to run:
sudo mv /var/lib/ juju/charm- get-cache/ local_3a_ trusty_ 2f_snapdeltaser vice-11. zip{,.EVIL}
on machine 0.