I've run into this as a side effect of working around https://bugs.launchpad.net/bugs/1492237 (which was my root symptom: controller disk fills up rapidly).
This is a long-running production deployment on Juju 1.25.6. The controllers (3 in HA) bumped up against >98% disk usage, and it became impossible to issue juju commands or even get status.
I stopped the juju services manually on each of the controller units, added storage, moved the contents of /var/lib/juju to the new volume, updated fstab, and rebooted. But then none of the juju-* services would start.
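For reference, the fstab entry for the new volume looked roughly like this (the UUID and filesystem type here are placeholders, not the actual values from our controllers; use blkid to find the real UUID):

```
# /etc/fstab -- hypothetical entry mounting the new volume at /var/lib/juju
UUID=<uuid-of-new-volume>  /var/lib/juju  ext4  defaults  0  2
```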
Systemd unit files are read earlier in the boot process than mounts are handled, and since the juju unit files in /etc/systemd/system were symlinks to files on a separate mount, systemd simply did not load them.
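The symlink-vs-copy distinction can be demonstrated without touching a real controller. This is a rough simulation in a throwaway temp directory (all paths are hypothetical, standing in for /etc/systemd/system and /var/lib/juju): a symlinked unit file dangles when its backing mount is absent, while a plain copy stays readable.

```shell
#!/bin/sh
# Simulate /etc/systemd/system and a separately mounted /var/lib/juju
# inside a temp directory (all paths here are hypothetical).
tmp=$(mktemp -d)
mkdir -p "$tmp/etc-systemd" "$tmp/mnt/juju/init"
printf '[Unit]\nDescription=demo\n' > "$tmp/mnt/juju/init/juju-db.service"

# A symlinked unit file is only valid while the backing mount is present;
# a plain copy is independent of it.
ln -s "$tmp/mnt/juju/init/juju-db.service" "$tmp/etc-systemd/linked.service"
cp "$tmp/mnt/juju/init/juju-db.service" "$tmp/etc-systemd/copied.service"

# "Unmount" by moving the backing directory aside, as if the mount had
# not yet happened at the point in boot where systemd reads unit files.
mv "$tmp/mnt/juju" "$tmp/mnt/juju.unmounted"

link_state=$( [ -r "$tmp/etc-systemd/linked.service" ] && echo readable || echo dangling )
copy_state=$( [ -r "$tmp/etc-systemd/copied.service" ] && echo readable || echo dangling )
echo "symlink: $link_state"   # symlink: dangling
echo "copy:    $copy_state"   # copy:    readable

rm -rf "$tmp"
```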
I removed the symlinks and just copied the systemd unit files in place, and the controllers are happy once again, with a ton of space available. Juju status and other juju commands are back to normal.
Example, on unit 0:
sudo mv -fv /etc/systemd/system/juju-db.service /etc/systemd/system/juju-db.service.hold.$(date +%s)
sudo mv -fv /etc/systemd/system/multi-user.target.wants/juju-db.service /etc/systemd/system/multi-user.target.wants/juju-db.service.hold.$(date +%s)
sudo cp -fvp /var/lib/juju/init/juju-db/juju-db.service /etc/systemd/system/juju-db.service
sudo cp -fvp /var/lib/juju.hold/init/juju-db/juju-db.service /etc/systemd/system/multi-user.target.wants/juju-db.service
sudo mv -fv /etc/systemd/system/jujud-machine-0.service /etc/systemd/system/jujud-machine-0.service.hold.$(date +%s)
sudo mv -fv /etc/systemd/system/multi-user.target.wants/jujud-machine-0.service /etc/systemd/system/multi-user.target.wants/jujud-machine-0.service.hold.$(date +%s)
sudo cp -fvp /var/lib/juju/init/jujud-machine-0/jujud-machine-0.service /etc/systemd/system/jujud-machine-0.service
sudo cp -fvp /var/lib/juju/init/jujud-machine-0/jujud-machine-0.service /etc/systemd/system/multi-user.target.wants/jujud-machine-0.service
That may or may not be the best approach, and will likely require careful attention on upgrades, but it got us back up and out of quite a snag.
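Since the copies in /etc/systemd/system can now drift from what juju writes under /var/lib/juju/init on upgrade, something along these lines could be run periodically to flag stale copies. This is only a sketch: the directory layout is assumed from the commands above, and the paths are parameters so it can be exercised anywhere (on a controller they would be /var/lib/juju/init and /etc/systemd/system).

```shell
#!/bin/sh
# Sketch: report unit files under $2 that differ from the juju-managed
# originals under $1. Returns nonzero if any copy has drifted.
check_drift() {
  juju_init=$1
  systemd_dir=$2
  drift=0
  for src in "$juju_init"/*/*.service; do
    [ -e "$src" ] || continue
    dst=$systemd_dir/$(basename "$src")
    if ! cmp -s "$src" "$dst"; then
      echo "stale copy: $dst (differs from $src)"
      drift=1
    fi
  done
  return $drift
}

# Demo on throwaway directories (hypothetical layout mirroring the real one):
t=$(mktemp -d)
mkdir -p "$t/init/juju-db" "$t/systemd"
printf '[Unit]\n' > "$t/init/juju-db/juju-db.service"
cp "$t/init/juju-db/juju-db.service" "$t/systemd/juju-db.service"
check_drift "$t/init" "$t/systemd" && echo "in sync"
printf '# rewritten by upgrade\n' >> "$t/init/juju-db/juju-db.service"
check_drift "$t/init" "$t/systemd" || echo "drift detected"
```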