jujud services not starting after reboot when /var is on separate partition

Bug #1634390 reported by Sandor Zeestraten on 2016-10-18
This bug affects 6 people
Affects     Importance   Assigned to
juju        High         Vinodhini
juju-core   Critical     Unassigned

Bug Description

# Issue
We have machines in MAAS where we've split /var to a separate partition.
Deploying machines and services with Juju works fine. However, the Juju agent (jujud) services do not start when a machine restarts: systemd does not find the services at boot, I believe because their unit files are symlinked from /var/lib/juju/init/.

As a workaround, you can reload systemd so it finds the services and then manually enable them, however that is not a proper solution.
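
For reference, the workaround amounts to something like the following. This is a minimal sketch, not taken from the pastebin output: the function name recover_juju_units and the SYSTEMCTL override are hypothetical, the unit names must be supplied by the caller, and the start step is an assumption added so the agents come up without another reboot.

```shell
#!/bin/sh
# Sketch of the manual workaround: make systemd re-read its unit files, then
# enable and start the Juju units that were skipped at boot.
# SYSTEMCTL is a hypothetical override so the commands can be previewed
# (for example SYSTEMCTL="echo systemctl") before running them for real.
recover_juju_units() {
    systemctl_cmd=${SYSTEMCTL:-sudo systemctl}
    # Re-scan unit files now that /var is mounted.
    # $systemctl_cmd is left unquoted on purpose so it splits into words.
    $systemctl_cmd daemon-reload || return 1
    for unit in "$@"; do
        $systemctl_cmd enable "$unit" || return 1
        # Assumption: also start the unit now instead of waiting for a reboot.
        $systemctl_cmd start "$unit" || return 1
    done
}
```

For example, recover_juju_units jujud-machine-0.service on machine 0. Setting SYSTEMCTL="echo systemctl" first prints the commands instead of running them.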

# Output from df and systemctl
http://pastebin.com/t4BLGKGx

# Versions
Juju 2.0.0-xenial-amd64
MAAS 2.0.0

Changed in juju:
importance: Undecided → Medium
status: New → Triaged
milestone: none → 2.1.0
Sandor Zeestraten (szeestraten) wrote :

I managed to reproduce the issue in a fresh MAAS setup.
I deployed two machines: one with just / and one with separate partitions for / and /var.

# Works OK
ubuntu@maas-node06:~$ sudo lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL
NAME FSTYPE SIZE MOUNTPOINT LABEL
vda 20G
└─vda1 LVM2_member 20G
  └─vgroot-root ext4 9.3G /
vdb 10G

# Does not work
ubuntu@maas-node07:~$ sudo lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL
NAME FSTYPE SIZE MOUNTPOINT LABEL
vda 20G
└─vda1 LVM2_member 20G
  ├─vgroot-root ext4 9.3G /
  └─vgroot-var ext4 4.7G /var
vdb 10G

Changed in juju:
importance: Medium → Critical
Mick Gregg (macgreagoir) wrote :

@anastasia-macmood I've seen a few comments around the place about when systemd [re]mounts /var. I'm guessing some tweaking of the service script might help with any race.

Changed in juju:
milestone: 2.1.0 → 2.2.0
importance: Critical → High
Anastasia (anastasia-macmood) wrote :

Re-targeting to next milestone - further investigations are needed as this may not be something that can be fixed in Juju.

Sandor Zeestraten (szeestraten) wrote :

@anastasia-macmood The offending systemd service files are symlinked into /etc/systemd/system/ from /var/lib/juju/init/, by the Juju agent installer I presume.

Systemd manages to load these files fine when I simply place them in /etc/systemd/system/, like Juju already does with the juju-clean-shutdown.service.

As @macgreagoir mentioned, someone with some more systemd knowledge might have a better idea if the service script can be tweaked or if the location of these files should be reconsidered.

Anyway, I think it is safe to say that Juju should not assume that /var is on the same partition as / as splitting these is a relatively common practice. Perhaps it could be added as a test case?

Changed in juju:
assignee: nobody → Richard Harding (rharding)
Ryan Beisner (1chb1n) wrote :

I've run into this as a side effect of working around https://bugs.launchpad.net/bugs/1492237 (which was my root symptom: controller disk fills up rapidly).

The model is 1.25.6, and is a long-running production deployment. The controllers (3 in HA) bumped up against > 98% disk space usage and it became impossible to issue juju commands or even get status.

I stopped juju services manually on each of the controller units, added storage, moved contents of /var/lib/juju, updated fstab, rebooted. But then none of the juju-* services would start.

Systemd unit files are read earlier in the boot process than mounts are handled, and since they are symlinks to files on a separate mount, the systemd unit files simply did not load.
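
One way to check whether a machine is exposed to this is to list the unit symlinks that point into /var/lib/juju/init. A rough sketch, assuming readlink is available; the function name list_units_linked_into is made up for illustration, and only absolute link targets are matched:

```shell
#!/bin/sh
# List unit files in a systemd directory that are symlinks into a given
# path prefix (by default /var/lib/juju/init), i.e. units whose real file
# may be unavailable while that path sits on a mount that is not up yet.
list_units_linked_into() {
    unit_dir=${1:-/etc/systemd/system}
    prefix=${2:-/var/lib/juju/init}
    for f in "$unit_dir"/*.service; do
        # -L is true for symlinks, including dangling ones.
        [ -L "$f" ] || continue
        # Read the link text as stored; relative targets will not match.
        target=$(readlink "$f")
        case $target in
            "$prefix"/*) printf '%s -> %s\n' "$f" "$target" ;;
        esac
    done
}
```

Called with no arguments it inspects /etc/systemd/system. Any line it prints is a unit that depends on /var being readable when systemd loads its unit files.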

I removed the symlinks and just copied the systemd unit files in place, and the controllers are happy once again, with a ton of space available. Juju status and other juju commands are back to normal.

Example, on unit 0:

sudo mv -fv /etc/systemd/system/juju-db.service /etc/systemd/system/juju-db.service.hold.$(date +%s )
sudo mv -fv /etc/systemd/system/multi-user.target.wants/juju-db.service /etc/systemd/system/multi-user.target.wants/juju-db.service.hold.$(date +%s )

sudo cp -fvp /var/lib/juju/init/juju-db/juju-db.service /etc/systemd/system/juju-db.service
sudo cp -fvp /var/lib/juju.hold/init/juju-db/juju-db.service /etc/systemd/system/multi-user.target.wants/juju-db.service

sudo mv -fv /etc/systemd/system/jujud-machine-0.service /etc/systemd/system/jujud-machine-0.service.hold.$(date +%s )
sudo mv -fv /etc/systemd/system/multi-user.target.wants/jujud-machine-0.service /etc/systemd/system/multi-user.target.wants/jujud-machine-0.service.hold.$(date +%s )

sudo cp -fvp /var/lib/juju/init/jujud-machine-0/jujud-machine-0.service /etc/systemd/system/jujud-machine-0.service
sudo cp -fvp /var/lib/juju/init/jujud-machine-0/jujud-machine-0.service /etc/systemd/system/multi-user.target.wants/jujud-machine-0.service

That may or may not be the best approach, and will likely require careful attention on upgrades, but it got us back up and out of quite a snag.
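
The same steps could be looped over every Juju unit rather than typed per service. A sketch only, assuming GNU readlink -f and the juju*.service naming seen above; copy_units_in_place is a hypothetical helper name, and it carries the same upgrade caveats as the manual commands:

```shell
#!/bin/sh
# Replace each Juju unit symlink in a directory with a copy of its target,
# keeping the original symlink under a timestamped .hold name.
# Run once per directory (e.g. /etc/systemd/system and
# /etc/systemd/system/multi-user.target.wants), as root.
copy_units_in_place() {
    unit_dir=$1
    stamp=$(date +%s)
    for link in "$unit_dir"/juju*.service; do
        [ -L "$link" ] || continue            # only act on symlinks
        target=$(readlink -f "$link") || continue
        [ -f "$target" ] || continue          # skip dangling links
        # Keep the old symlink around in case the change must be reverted.
        mv -f "$link" "$link.hold.$stamp" || return 1
        # Copy the real unit file into place, preserving mode and times.
        cp -fp "$target" "$link" || return 1
    done
}
```

The .hold files are left behind deliberately, so reverting is a matter of moving them back over the copies.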

tags: added: uosci
Ryan Beisner (1chb1n) wrote :

Added juju-core as this impacts long-running production workloads, which at this point in time are vastly on 1.25.x, with no upgrade path to 2.x.

Changed in juju-core:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 1.25.10
Curtis Hovey (sinzui) on 2017-01-27
Changed in juju-core:
milestone: 1.25.10 → none
Changed in juju-core:
milestone: none → 1.25.11
Changed in juju:
assignee: Richard Harding (rharding) → nobody
Curtis Hovey (sinzui) on 2017-03-24
Changed in juju:
milestone: 2.2-beta1 → 2.2-beta2
Curtis Hovey (sinzui) on 2017-03-28
Changed in juju-core:
milestone: 1.25.11 → none
Curtis Hovey (sinzui) on 2017-03-30
Changed in juju:
milestone: 2.2-beta2 → 2.2-beta3
Changed in juju:
milestone: 2.2-beta3 → 2.2-beta4
Changed in juju:
milestone: 2.2-beta4 → 2.2-rc1
Tim Penhey (thumper) wrote :

Firstly, let's be honest, we aren't going to address this on 1.25.

Changed in juju-core:
status: Triaged → Won't Fix
Tim Penhey (thumper) wrote :

Juju shouldn't be storing the systemd files in /var and symlinking. It appears that other apps put them in /lib/systemd/system and symlink into /etc/systemd/system.

Removing the milestone instead of punting down the road.

Changed in juju:
importance: High → Medium
milestone: 2.2-rc1 → none

Would be great if this bug can be prioritized as it is a pain point in our dev and prod environments.

Any chance of a look at this for 2.4?

Ian Booth (wallyworld) on 2017-11-07
Changed in juju:
milestone: none → 2.4-beta1
importance: Medium → High
Ian Booth (wallyworld) on 2018-04-16
Changed in juju:
assignee: nobody → Vinodhini (vinu-b)
Ian Booth (wallyworld) wrote :

We can put the directories currently placed in /var/lib/juju/init into /lib/systemd/juju instead.
The Juju-related symlinks in /etc/systemd/system would just be retargeted.

We'll need an upgrade step to copy across existing service files.
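
To make the proposal concrete, the copy-and-retarget step might look roughly like this in shell. This is an illustration under assumptions, not the upgrade step that shipped in Juju: retarget_juju_units is a hypothetical name, the three directories are passed in by the caller, and only absolute symlink targets under the old directory are rewritten.

```shell
#!/bin/sh
# Hypothetical migration sketch, not Juju's actual upgrade step: copy the
# per-service directories from the old location to the new one, then
# re-point any symlinks in the unit directory at the copied files.
retarget_juju_units() {
    old_dir=$1    # e.g. /var/lib/juju/init
    new_dir=$2    # e.g. /lib/systemd/juju
    unit_dir=$3   # e.g. /etc/systemd/system
    mkdir -p "$new_dir" || return 1
    # Copy the service directories, preserving modes and timestamps.
    cp -Rp "$old_dir"/. "$new_dir"/ || return 1
    for link in "$unit_dir"/*.service; do
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        case $target in
            "$old_dir"/*)
                # Path of the unit file relative to the old directory.
                rel=${target#"$old_dir"/}
                # Replace the link so it points at the new location.
                ln -sfn "$new_dir/$rel" "$link" || return 1
                ;;
        esac
    done
}
```

It would need to be run for each directory holding Juju symlinks (for instance multi-user.target.wants as well). Repeating the copy is harmless, since it only overwrites the files with the same content.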

Given that they are in /etc/systemd isn't this just changing one
failure-when-on-a-different-filesystem for another?

John
=:->

Ian Booth (wallyworld) wrote :

From what I can see, it seems that many systemd services as set up by a distro install are configured by placing the actual service files themselves into /lib/systemd and then linking to them from /etc/systemd. So I assume there's an expectation that /etc and /lib are on the same partition. I can see that in many cases /var would be on a different partition, as that fills with logs etc.

Ian Booth (wallyworld) on 2018-04-18
Changed in juju:
status: Triaged → In Progress
Changed in juju:
milestone: 2.4-beta1 → none
Ian Booth (wallyworld) on 2018-04-20
Changed in juju:
milestone: none → 2.4-beta2
Changed in juju:
milestone: 2.4-beta2 → none
Vinodhini (vinu-b) wrote :
Ian Booth (wallyworld) on 2018-05-13
Changed in juju:
milestone: none → 2.4-rc1
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released