Upgrading ceph-osd from Kilo to Mitaka "misses" /var/lib/ceph ownership change

Bug #1611082 reported by Rafael David Tinoco
This bug affects 2 people
Affects                              Status        Importance  Assigned to   Milestone
Ceph OSD Charm                       Fix Released  High        James Page
ceph-osd (Juju Charms Collection)    Invalid       High        James Page

Bug Description

I just upgraded ceph-{osd,mon} from Kilo to Mitaka (when doing a full OpenStack upgrade)
and it looks like the charm missed the step described in:

http://docs.ceph.com/docs/master/release-notes/#upgrading-from-hammer

Fix the ownership:

chown -R ceph:ceph /var/lib/ceph
chown -R ceph:ceph /var/log/ceph

As you can see in the log files:

http://paste.ubuntu.com/22719924/
http://paste.ubuntu.com/22719976/

Fixing the ownership by hand makes the error:

 No block devices detected using current configuration

disappear from juju status.
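
A quick way to check whether a unit is affected (a minimal sketch using only the Python standard library; this is an illustration, not part of the charm or the upstream docs):

import grp
import os
import pwd

for path in ('/var/lib/ceph', '/var/log/ceph'):
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    group = grp.getgrgid(st.st_gid).gr_name
    # Before the fix these typically report root:root; after the chown they
    # should report ceph:ceph (the user/group newer Ceph releases run as).
    print('%s is owned by %s:%s' % (path, owner, group))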

Tags: sts
tags: added: sts
Changed in ceph-osd (Juju Charms Collection):
assignee: nobody → Jorge Niedbalski (niedbalski)
importance: Undecided → Medium
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I just triggered the charm's upgrade process by changing the source option on both charms:

juju set ceph-mon source='trusty-proposed/mitaka'
juju set ceph-osd source='trusty-proposed/mitaka'


Revision history for this message
Edward Hope-Morley (hopem) wrote :

I think that the package itself should be responsible for ensuring that the path is accessible by the version of ceph it is installing. The charm can/should certainly implement a workaround as well, though.

Revision history for this message
Chris Holcombe (xfactor973) wrote :

I actually committed a fix for this; I hit it myself as well. Here are the patches: https://review.openstack.org/#/c/345646/ for ceph-mon and https://review.openstack.org/#/c/345647/ for ceph-osd. Just be aware that if you have a lot of data in your cluster, this could be extremely slow or fail.
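
For reference, the heart of such a charm-side workaround is just a recursive ownership change over the Ceph directories; a minimal sketch using only the Python standard library (an illustration of the approach, not the code from the reviews above):

import os
import shutil

def chownr(path, owner='ceph', group='ceph'):
    # Walk every directory and file under path and hand it to the ceph user.
    # On an OSD holding a lot of data this touches every object file, which
    # is why the operation can be extremely slow or time out, as noted above.
    shutil.chown(path, owner, group)
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            target = os.path.join(root, name)
            if not os.path.islink(target):
                shutil.chown(target, owner, group)

# e.g. chownr('/var/lib/ceph') and chownr('/var/log/ceph') while the daemons are stopped.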

Revision history for this message
Chris Holcombe (xfactor973) wrote :

I agree with you Ed. The package should ensure this. The charm fix is a band-aid.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

+1

Revision history for this message
James Page (james-page) wrote :

Actually the package *can't* do this - the packaging is configured not to restart or attempt to manage the ceph daemons during normal package updates; the reason being that you need to co-ordinate this just like any other outage, and doing it during a package upgrade is not ideal and can cause huge lag in a large cluster during the package update transaction.

Revision history for this message
James Page (james-page) wrote :

Rafael

Which charm versions were you using? The ownership change during upgrade only just landed in the stable charms (as of the end of July), so is it possible that you had an older charm version at the point of the ceph upgrade?

Changed in ceph (Ubuntu):
status: New → Opinion
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Hello James,

I'm creating and destroying this deployment every day (trying to reproduce another bug). For this particular error, the environment was created a few hours earlier using:

$ cat ceph-osd.yaml
ceph-osd:
    source: 'cloud:trusty-proposed/kilo'
    osd-devices: /dev/vdb
    osd-reformat: "yes"

juju deploy --config=$CONFIGDIR/ceph-osd.yaml --to $tkcompute01 cs:trusty/ceph-osd ;
juju add-unit --to $tkcompute02 ceph-osd ; sleep 5
juju add-unit --to $tkcompute03 ceph-osd ; sleep 5

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I'm re-deploying everything again... upgrading nova-compute right now. I will update ceph again in a few more minutes. Let me know if you need anything from my setup.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Ok I just re-deployed everything.

After I finished the upgrade, I rebooted the whole environment.

After the ceph machines came back, I had:

https://pastebin.canonical.com/162720/

So I guess something is wrong... this is the second time I've been able to reproduce the same behaviour.

When changing permissions:

juju ssh ceph-osd/0 -- sudo chown -R ceph:ceph /var/lib/ceph
juju ssh ceph-osd/1 -- sudo chown -R ceph:ceph /var/lib/ceph
juju ssh ceph-osd/2 -- sudo chown -R ceph:ceph /var/lib/ceph

juju ssh ceph-osd/0 -- sudo chown -R ceph:ceph /var/log/ceph
juju ssh ceph-osd/1 -- sudo chown -R ceph:ceph /var/log/ceph
juju ssh ceph-osd/2 -- sudo chown -R ceph:ceph /var/log/ceph

Everything went back to normal:

https://pastebin.canonical.com/162725/

Revision history for this message
James Page (james-page) wrote :

Interestingly, the packaging should already be attempting some directory ownership changes:

       # 5. adjust file and directory permissions
       if ! dpkg-statoverride --list $SERVER_HOME >/dev/null
       then
           chown $SERVER_USER:$SERVER_GROUP $SERVER_HOME
           chmod u=rwx,g=rx,o= $SERVER_HOME
       fi
       if ! dpkg-statoverride --list /var/log/ceph >/dev/null
       then
           chown -R $SERVER_USER:$SERVER_GROUP /var/log/ceph
           # members of group ceph can log here, but cannot remove
           # others' files. non-members cannot read any logs.
           chmod u=rwx,g=rwxs,o=t /var/log/ceph
       fi

       # 6. fix /var/run/ceph
       if [ -d /var/run/ceph ]; then
           chown $SERVER_USER:$SERVER_GROUP /var/run/ceph
       fi

Revision history for this message
James Page (james-page) wrote :

Hi Rafael

I tried reproducing your issue this morning. I found a different problem (bug 1611719), but that's unrelated to this, and I did end up with a ceph deployment that had the correct permissions.

I deployed trusty-kilo; upgraded to trusty-liberty and then to trusty-mitaka.

Revision history for this message
James Page (james-page) wrote :

I see you appear to go directly from kilo -> mitaka.

I'll try that and see if that makes a difference.

Revision history for this message
James Page (james-page) wrote :

ceph-mon log for direct upgrade attempt:

2016-08-10 11:10:27 INFO juju-log old_version: cloud:trusty-kilo
2016-08-10 11:10:27 INFO juju-log new_version: cloud:trusty-mitaka
2016-08-10 11:10:27 INFO juju-log Invalid upgrade path from cloud:trusty-kilo to cloud:trusty-mitaka. Valid paths are: ['cloud:trusty-liberty -> cloud:trusty-mitaka', 'cloud:trusty-juno -> cloud:trusty-kilo', 'cloud:trusty-kilo -> cloud:trusty-liberty']

Revision history for this message
James Page (james-page) wrote :

ditto on ceph-osd:

2016-08-10 11:13:17 INFO juju-log old_version: cloud:trusty-kilo
2016-08-10 11:13:17 INFO juju-log new_version: cloud:trusty-mitaka
2016-08-10 11:13:18 INFO juju-log Invalid upgrade path from cloud:trusty-kilo to cloud:trusty-mitaka. Valid paths are: ['cloud:trusty-liberty -> cloud:trusty-mitaka', 'cloud:trusty-juno -> cloud:trusty-kilo', 'cloud:trusty-kilo -> cloud:trusty-liberty']

Revision history for this message
James Page (james-page) wrote :

Hmm, I wonder whether this is due to the co-location of ceph-osd with nova-compute; which order did you perform the upgrade in? Switching openstack-origin on the nova-compute node will trigger a package upgrade for all packages on the machine, including ceph.

ceph-osd can be more selective, so only the ceph packages are upgraded during the rolling upgrade process.

So the normal order would be ceph-osd first, and then nova-compute.

Also, looking at the code, the ceph-mon and ceph-osd charms do not support charm-driven upgrades from proposed pockets; the map of supported paths is as follows:

# A dict of valid ceph upgrade paths. Mapping is old -> new
upgrade_paths = {
    'cloud:trusty-juno': 'cloud:trusty-kilo',
    'cloud:trusty-kilo': 'cloud:trusty-liberty',
    'cloud:trusty-liberty': 'cloud:trusty-mitaka',
}

So setting the source to anything other than the values in that map will not trigger any upgrade code in the charm.
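
To illustrate (a sketch of the effect, not the charm's actual function): the check boils down to a dictionary lookup, so any (old, new) pair not present in the map, including proposed pockets or a direct kilo -> mitaka jump, simply falls through without running the upgrade handler:

def upgrade_allowed(old_source, new_source, paths=upgrade_paths):
    # Only an exact old -> new match in the map triggers the upgrade code.
    return paths.get(old_source) == new_source

upgrade_allowed('cloud:trusty-kilo', 'cloud:trusty-liberty')            # True
upgrade_allowed('cloud:trusty-kilo', 'cloud:trusty-mitaka')             # False - direct jump
upgrade_allowed('cloud:trusty-proposed/kilo', 'cloud:trusty-mitaka')    # False - proposed pocket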

Revision history for this message
James Page (james-page) wrote :

(just a hunch but I suspect your ceph-mon units never actually upgraded).

Revision history for this message
James Page (james-page) wrote :

Marking the Ubuntu task as invalid - this problem is not in the packaging.

Changed in ceph (Ubuntu):
status: Opinion → Invalid
James Page (james-page)
Changed in ceph-osd (Juju Charms Collection):
status: New → Incomplete
James Page (james-page)
Changed in ceph-osd (Juju Charms Collection):
status: Incomplete → Triaged
assignee: Jorge Niedbalski (niedbalski) → nobody
milestone: none → 17.01
no longer affects: ceph (Ubuntu)
James Page (james-page)
Changed in ceph-osd (Juju Charms Collection):
importance: Medium → High
James Page (james-page)
Changed in ceph-osd (Juju Charms Collection):
assignee: nobody → James Page (james-page)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-osd (master)

Fix proposed to branch: master
Review: https://review.openstack.org/417616

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-osd (master)

Reviewed: https://review.openstack.org/417616
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-osd/commit/?id=a60775be188a6450f1ec3c8486aad92032acf1d0
Submitter: Jenkins
Branch: master

commit a60775be188a6450f1ec3c8486aad92032acf1d0
Author: James Page <email address hidden>
Date: Sat Jan 7 16:07:36 2017 +0000

    Generalize upgrade paths for osds

    Make use of new charms.ceph utils to generalize the upgrade
    paths for OSD upgrades, ensuring that only supported upgrade
    paths are undertaken for Ubuntu 16.04 UCA pockets.

    Partial-Bug: 1611082

    Change-Id: Ifbf3a7ffbb5ab17e839099658c7a474784ab4083

James Page (james-page)
Changed in charm-ceph-osd:
assignee: nobody → James Page (james-page)
importance: Undecided → High
status: New → In Progress
Changed in ceph-osd (Juju Charms Collection):
status: In Progress → Invalid
James Page (james-page)
Changed in charm-ceph-osd:
status: In Progress → Fix Released
milestone: none → 17.02