Ceph mon doesn't restart on reboot with Xenial when using ceph-{mon,osd}@ systemd units

Bug #1646583 reported by Samuel Matzek
This bug affects 2 people
Affects        Status        Importance  Assigned to  Milestone
ceph (Ubuntu)  Fix Released  High        James Page
Xenial         Fix Released  High        James Page
Yakkety        Fix Released  High        James Page
Zesty          Fix Released  High        James Page

Bug Description

[Impact]
Users of the ceph-osd@ and ceph-mon@ systemd units cannot reboot servers and have the daemons come back automatically; the targets which ensure that daemons managed this way are started on boot are missing from the packages. There is also no convenient way to restart all ceph daemons on a machine, because of the same missing systemd targets.

[Test Case]
Install ceph-mon and ceph-osd machines
Enable and initialise cluster using ceph-osd@ and ceph-mon@ unit files.
Confirm working
Reboot machines
ceph-mon units will not start automatically after reboot
ceph-osd units will start, but only due to udev rule processing
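
The post-reboot check in this test case could be scripted roughly as follows. This is a minimal sketch, not part of the SRU test plan: `missing_targets` is a hypothetical helper name, and the sample listing is illustrative input in the `systemctl list-units` column layout (UNIT LOAD ACTIVE SUB DESCRIPTION) with ceph-mon.target deliberately left out.

```shell
#!/usr/bin/env bash
# missing_targets (hypothetical helper): reads `systemctl list-units` style
# lines on stdin (UNIT LOAD ACTIVE SUB DESCRIPTION) and prints every target
# named in the arguments that is not listed with ACTIVE == "active".
missing_targets() {
    local listing target
    listing=$(cat)
    for target in "$@"; do
        printf '%s\n' "$listing" |
            awk -v t="$target" '$1 == t && $3 == "active" { found = 1 } END { exit !found }' ||
            printf '%s\n' "$target"
    done
}

# Illustrative input only: a listing in which ceph-mon.target is absent.
sample='ceph-osd.target loaded active active ceph target allowing to start/stop all ceph-osd@.service instances at once
ceph.target loaded active active ceph target allowing to start/stop all ceph*@.service instances at once'

missing=$(printf '%s\n' "$sample" | missing_targets ceph-mon.target ceph-osd.target ceph.target)
echo "targets not active: ${missing:-none}"
```

On a real host the input would presumably come from something like `systemctl list-units --plain --no-legend 'ceph*'` piped into `missing_targets`; that pairing is an assumption and has not been run against the packages discussed here.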

[Regression Potential]
Minimal; we're introducing the missing targets to the packages; these targets will be enabled and started on install, and the change to the packaging ensures that the ceph-mon/mds/create-keys systemd service units provided directly by the packaging are managed as before (not auto enabled and started).

[Original Bug Report]
The ceph monitor and osd daemons do not restart automatically on server reboot on Xenial.

$ apt-cache policy ceph
ceph:
  Installed: 10.2.2-0ubuntu0.16.04.2
  Candidate: 10.2.2-0ubuntu0.16.04.2
  Version table:
 *** 10.2.2-0ubuntu0.16.04.2 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     10.1.2-0ubuntu1 500
        500 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

This is likely caused by the same clash between upstream target files and Ubuntu systemd unit files that is noted in [1], but I have opened a separate bug because I believe that Ceph not restarting automatically on server reboot is more severe than not being able to start/stop all services with one command, which is what is reported in [1].

The source of the Ubuntu package [2] does not have the necessary target files that upstream Ceph added to allow for restart on reboot in commit [3], which is included in Ceph's GitHub tag 10.2.2.

[1] https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1627640
[2] https://bugs.launchpad.net/ubuntu/+source/ceph/10.2.3-0ubuntu0.16.04.2
[3] https://github.com/ceph/ceph/commit/15c4ad44010c798af7804e287ba71dcc289f806f

Samuel Matzek (smatzek) wrote :

I will add that Ceph Ansible v2.0.0 was used to install Ceph from both the distro apt repo and UCA, both with the same result.

James Page (james-page) wrote :

I agree that there is a behavioural difference between the systemd configurations provided by upstream, and the systemd configuration in distro (and the reason why is outlined in bug 1627640).

Services not restarting post reboot is not something I've seen (deploying using charms which don't do anything specific with regards to systemd configurations).

Can you confirm which systemd service is used for the ceph-mons in ceph-ansible? I think this is most likely the cause - the vanilla ceph-mon service provided by the Ubuntu and Debian packaging should automatically start up on reboots, but I'm not 100% sure about the ceph-mon@ version provided by upstream packages.

James Page (james-page) wrote :

Comparing upstream tip of jewel vs ubuntu:

 install -m0644 systemd/ceph.target debian/ceph-common/lib/systemd/system

 install -m0644 systemd/ceph-mon.target debian/ceph-mon/lib/systemd/system
 install -m0644 systemd/ceph-osd.target debian/ceph-osd/lib/systemd/system
 install -m0644 systemd/ceph-mds.target debian/ceph-mds/lib/systemd/system
 install -m0644 systemd/ceph-radosgw.target debian/radosgw/lib/systemd/system
 install -m0644 systemd/ceph-rbd-mirror.target debian/rbd-mirror/lib/systemd/system

vs
 install -m0644 systemd/ceph.target debian/ceph-common/lib/systemd/system

the latter 5 targets not being installed by the Ubuntu packaging.
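
To see which of these target files are actually present on a machine, a check along the following lines could be used. It is only a sketch: `missing_target_files` is a made-up helper name, the target list is copied from the upstream install lines above, and the temporary directory merely stands in for /lib/systemd/system.

```shell
#!/usr/bin/env bash
# missing_target_files DIR (hypothetical helper): print each upstream ceph
# target file that does not exist in DIR.
missing_target_files() {
    local dir=$1 target
    for target in ceph.target ceph-mon.target ceph-osd.target \
                  ceph-mds.target ceph-radosgw.target ceph-rbd-mirror.target; do
        [ -f "$dir/$target" ] || printf '%s\n' "$target"
    done
}

# Stand-in for /lib/systemd/system with only ceph.target installed,
# mirroring the Ubuntu packaging described in this comment.
demo_dir=$(mktemp -d)
touch "$demo_dir/ceph.target"

missing_files=$(missing_target_files "$demo_dir")
printf '%s\n' "$missing_files"
rm -rf "$demo_dir"
```

Note that on a real system a target would also be reported as missing simply because the package that ships it (for example ceph-mds or radosgw) is not installed.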

James Page (james-page) wrote :

OK, did a quick sanity check using charms; the ceph-mon daemon is managed using the distro provided ceph-mon systemd unit (not the upstream provided ceph-mon@ systemd unit); ceph-osd daemons are managed in the standard way and do restart OK on reboot - specifically, I think the udev events that happen on boot trigger the OSDs to start up independently of any top level system targets.

James Page (james-page) wrote :

I switched a ceph-mon unit over to use the ceph-mon@ unit instead; looks like that won't work well:

Created symlink from /<email address hidden> to /lib/systemd/system/ceph-mon@.service.

as the ceph-mon target does not exist. We could introduce the targets into the packaging but I'm unsure of the systemd behaviour with units and targets of the same names.

James Page (james-page) wrote :

OK so making the assumption that the ceph-ansible tooling relies on the ceph-mon@ units, I can confirm that a ceph-mon process is not automatically started on reboot.

This is a bit of a misalignment between ceph provided packaging and what's in ubuntu and debian (due to historical introduction of systemd support in Debian prior to upstream ceph).

Changed in ceph (Ubuntu):
importance: Undecided → High
status: New → Triaged
James Page (james-page) wrote :

I'll do some testing around running with targets and units of the same name to ensure we don't regress tools which use the older, debian provided systemd units; if that works OK for zesty, we can SRU the target file updates back to xenial as well.

Changed in ceph (Ubuntu Yakkety):
importance: Undecided → High
status: New → Triaged
Changed in ceph (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → High
James Page (james-page) wrote : Re: Ceph mon and osd don't restart on reboot with Xenial when using ceph-{mon,osd}@ systemd units

A quick test to introduce the ceph-mon target indicates that this looks OK from a systemd perspective.

summary: - Ceph mon and osd don't restart on reboot with Xenial
+ Ceph mon and osd don't restart on reboot with Xenial when using
+ ceph-{mon,osd}@ systemd units
James Page (james-page)
Changed in ceph (Ubuntu Zesty):
status: Triaged → In Progress
assignee: nobody → James Page (james-page)
Samuel Matzek (smatzek) wrote :

I reproduced this again as well and it seems the OSDs do restart OK on reboot while the Mon does not. I must have been confused when writing the bug as I opened it several days after encountering the issue.

Ceph-Ansible does the following to enable the ceph-mon in the init sequence, and it is using the ceph-mon@ unit.

https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-mon/tasks/start_monitor.yml#L46
- name: start and add that the monitor service to the init sequence (for or after infernalis)
  command: systemctl enable ceph-mon@{{ monitor_name }}
  changed_when: false
  failed_when: false
  when:
    - use_systemd
    - ceph_release_num.{{ ceph_release }} > ceph_release_num.hammer

summary: - Ceph mon and osd don't restart on reboot with Xenial when using
+ Ceph mon doesn't restart on reboot with Xenial when using
ceph-{mon,osd}@ systemd units
Samuel Matzek (smatzek) wrote :

I installed Ceph from the SRU PPA using Ceph ansible and while the ceph-mon target file is there the Ceph monitor does not auto-start on reboot.

I've installed 10.2.5 from ceph.com apt repo simply using apt-get install and these are the units listed:
# systemctl list-units | grep ceph
ceph-mds.target
ceph-mon.target
ceph-osd.target
ceph.target

I believe the units after an apt install of the Canonical package do not match this. I'm pretty sure ceph-mon.target and ceph-osd.target are missing, but I don't have a clean system to test this on at the moment. I'll update again later today after I try it out.

Samuel Matzek (smatzek) wrote :

I installed Ceph 10.2.5 from the SRU PPA, https://launchpad.net/~openstack-ubuntu-testing/+archive/ubuntu/ceph-sru, on a clean system and "# systemctl list-units | grep ceph" lists nothing.

As noted in comment 10, an install of Ceph 10.2.5 from Ceph.com gives this output:
# systemctl list-units | grep ceph
ceph-mds.target loaded active active ceph target allowing to start/stop all ceph-mds@.service instances at once
ceph-mon.target loaded active active ceph target allowing to start/stop all ceph-mon@.service instances at once
ceph-osd.target loaded active active ceph target allowing to start/stop all ceph-osd@.service instances at once
ceph.target loaded active active ceph target allowing to start/stop all ceph*@.service instances at once

Using Ceph 10.2.5 from the SRU PPA, the enabling of a ceph monitor daemon using systemctl enable ceph-mon@<mon_name> makes the symlinks for the ceph-mon target to want the ceph-mon@<name>. However, since the ceph-mon.target itself is not active, the monitor won't start on reboot.

Debian packaging and systemd are a bit of a blackbox to me so I'm not sure where the targets are being auto-enabled during debian package install in the upstream packages and thus am not sure what should possibly change in the distro package to make them compatible.
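
The situation described above (the instance unit is wanted by ceph-mon.target, but nothing pulls in ceph-mon.target itself) can be illustrated with a toy model. This is not a query against systemd: it only walks `<unit>.wants` directories inside a temporary directory. `reachable` and the monitor name `mon1` are hypothetical, and it is an assumption here that enabling ceph-mon.target links it under multi-user.target.

```shell
#!/usr/bin/env bash
# reachable ROOT START UNIT (hypothetical helper, toy model only): succeeds
# if UNIT is pulled in, directly or transitively, through entries in
# ROOT/<unit>.wants/ directories, starting from START.
reachable() {
    local root=$1 start=$2 unit=$3 entry name
    for entry in "$root/$start.wants"/*; do
        [ -e "$entry" ] || continue          # no wants directory or empty
        name=${entry##*/}
        [ "$name" = "$unit" ] && return 0
        reachable "$root" "$name" "$unit" && return 0
    done
    return 1
}

demo=$(mktemp -d)

# State after `systemctl enable ceph-mon@mon1` alone: the target wants the
# instance, but the target is not wanted by anything.
mkdir -p "$demo/ceph-mon.target.wants"
touch "$demo/ceph-mon.target.wants/ceph-mon@mon1.service"
reachable "$demo" multi-user.target ceph-mon@mon1.service && before=yes || before=no

# Assumed state once ceph-mon.target itself is enabled.
mkdir -p "$demo/multi-user.target.wants"
touch "$demo/multi-user.target.wants/ceph-mon.target"
reachable "$demo" multi-user.target ceph-mon@mon1.service && after=yes || after=no

echo "started at boot before enabling target: $before, after: $after"
rm -rf "$demo"
```

Empty files stand in for the symlinks that systemd would create; the model has no cycle detection and ignores every dependency type other than wants.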

Samuel Matzek (smatzek) wrote :

I am very new to debian package rules but I suspect the reason the units are enabled in the ceph.com package and not in the Ubuntu package is that upstream debian rules has:
 dh_systemd_enable
at [1], whereas Ubuntu Ceph has:

override_dh_systemd_enable:
 ## Do not enable services to match `dh_installinit --no-start`
 ## behaviour.
 ## Users are expected to "systemctl enable" services once their
 ## configuration is correct.

[1] https://github.com/ceph/ceph/blob/v10.2.5/debian/rules#L176

James Page (james-page) wrote :

Samuel

Your analysis LGTM - I'll figure out how we enable things appropriately with the mix of stuff we have in distro - I think targets need to be enabled by default, but the distro specific bits don't.
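
Until packaging handles this, an administrator could presumably enable and start the targets by hand. The sketch below is a manual workaround idea, not what the package does: `run` and `enable_and_start` are hypothetical helper names, it assumes the target files are installed and carry an [Install] section, and by default it only prints the commands.

```shell
#!/usr/bin/env bash
# Set DRY_RUN=0 to execute the commands instead of printing them.
DRY_RUN=${DRY_RUN:-1}

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# enable_and_start TARGET...: enable and start each target in turn.
enable_and_start() {
    local target
    for target in "$@"; do
        run systemctl enable "$target"
        run systemctl start "$target"
    done
}

plan=$(enable_and_start ceph.target ceph-mon.target)
printf '%s\n' "$plan"
```

Keeping the dry-run default makes it possible to review the exact `systemctl` calls before touching a live monitor node.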

James Page (james-page) wrote :

OK after a few iterations I think I have this nailed for zesty - I'll update the xenial and yakkety packages in the ppa with the same change for further testing.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 10.2.5-0ubuntu2

---------------
ceph (10.2.5-0ubuntu2) zesty; urgency=medium

  * d/rules: Install upstream provided systemd targets and ensure they
    are enabled and started on install to ensure that integrations aligned
    to upstream packaging work with Ubuntu packages (LP: #1646583).
  * d/rules,d/p/powerpc_libatomic.patch: Ensure linking with -latomic,
    resolving FTBFS on powerpc architecture.

 -- James Page <email address hidden> Tue, 17 Jan 2017 11:10:40 +0000

Changed in ceph (Ubuntu Zesty):
status: In Progress → Fix Released
James Page (james-page)
description: updated
James Page (james-page) wrote :

OK PPA versions updated - xenial still building but yakkety and zesty LGTM now.

Changed in ceph (Ubuntu Yakkety):
assignee: nobody → James Page (james-page)
Changed in ceph (Ubuntu Xenial):
assignee: nobody → James Page (james-page)
Samuel Matzek (smatzek) wrote :

Thanks James.

I did a ceph-ansible cluster deploy using the SRU PPA on Xenial.

I rebooted the ceph monitor node and the ceph mon daemon started on reboot.
I tried out these commands on an OSD node and the OSDs were all stopped/started:
systemctl stop ceph-osd\*.service ceph-osd.target
systemctl start ceph-osd\*.service ceph-osd.target

which means that bug
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1627640

is likely fixed as well.

For completeness the systemd unit list post-install:
systemctl list-units | grep ceph
<email address hidden>
system-ceph\x2dcreate\x2dkeys.slice
system-ceph\x2dmon.slice
ceph-mon.target
ceph-osd.target
ceph.target

James Page (james-page) wrote :

SRUs uploaded to xenial and yakkety to resolve this issue (and a number of other things).

James Page (james-page) wrote :

Thanks for the testing Samuel - much appreciated!

Brian Murray (brian-murray) wrote : Please test proposed package

Hello Samuel, or anyone else affected,

Accepted ceph into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/10.2.5-0ubuntu0.16.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ceph (Ubuntu Yakkety):
status: Triaged → Fix Committed
tags: added: verification-needed
Łukasz Zemczak (sil2100) wrote :

Hello Samuel, or anyone else affected,

Accepted ceph into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/10.2.5-0ubuntu0.16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ceph (Ubuntu Xenial):
status: Triaged → Fix Committed
James Page (james-page) wrote :

Confirmed OK on upgraded xenial install; targets installed and started automatically, ceph-*@ daemons correctly started up on reboot.

tags: added: verification-done-xenial verification-needed-yakkety
removed: verification-needed
James Page (james-page) wrote :

Also verified OK on yakkety.

tags: added: verfication-done
removed: verification-done-xenial verification-needed-yakkety
Samuel Matzek (smatzek) wrote :

I have verified on Xenial ppc64le with a Ceph-ansible deployment of the xenial-proposed 10.2.5. The ceph monitor starts on reboot.

James Page (james-page) wrote :

Thanks for the testing Samuel!

James Page (james-page)
tags: added: verification-done
removed: verfication-done
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 10.2.5-0ubuntu0.16.04.1

---------------
ceph (10.2.5-0ubuntu0.16.04.1) xenial; urgency=medium

  * New upstream stable release (LP: #1649856):
    - d/p/32bit-ftbfs.patch: Drop, no longer required.
    - d/p/*: Refresh.
    - d/ceph-common.install: Switch to RSA keys for drop.ceph.com.
  * d/rules: Install upstream provided systemd targets and ensure they
    are enabled and started on install to ensure that integrations aligned
    to upstream packaging work with Ubuntu packages (LP: #1646583).
  * d/ceph.*,d/*.logrotate: Install logrotate configuration
    in ceph-common, ensuring that all daemons get log rotation on
    log files, deal with removal of logrotate configuration in
    ceph for upgrades (LP: #1609866).

 -- James Page <email address hidden> Wed, 18 Jan 2017 13:59:57 +0000

Changed in ceph (Ubuntu Xenial):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for ceph has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 10.2.5-0ubuntu0.16.10.1

---------------
ceph (10.2.5-0ubuntu0.16.10.1) yakkety; urgency=medium

  * New upstream stable release (LP: #1649856):
    - d/p/32bit-ftbfs.patch: Drop, no longer required.
    - d/p/*: Refresh.
    - d/ceph-common.install: Switch to RSA keys for drop.ceph.com.
  * d/rules: Install upstream provided systemd targets and ensure they
    are enabled and started on install to ensure that integrations aligned
    to upstream packaging work with Ubuntu packages (LP: #1646583).
  * d/ceph.{postinst,preinst,postrm}: Ensure that ceph logrotate
    configuration is purged on upgrade from pre-yakkety installs
    (LP: #1635844).
  * d/ceph-base.*,d/*.logrotate: Install logrotate configuration
    in ceph-common, ensuring that all daemons get log rotation on
    log files, deal with removal of logrotate configuration in
    ceph-base for upgrades (LP: #1609866).

 -- James Page <email address hidden> Wed, 18 Jan 2017 11:39:04 +0000

Changed in ceph (Ubuntu Yakkety):
status: Fix Committed → Fix Released