M/N upgrades - Ceph will break after the convergence step if the image uses ext4

Bug #1628874 reported by Michele Baldessari
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Giulio Fidente

Bug Description

After the M/N (Mitaka to Newton) convergence step, if the image uses ext4, Ceph Jewel refuses to start:
[root@overcloud-cephstorage-0 ~]# ceph -s
    cluster 03168daa-85f5-11e6-967b-002c0fb6a95e
     health HEALTH_ERR

[root@overcloud-cephstorage-0 ~]# tail /var/log/ceph/ceph-osd.2.log
2016-09-29 09:04:16.406971 7f98ee6fd800 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2016-09-29 09:04:16.406976 7f98ee6fd800 1 journal _open /var/lib/ceph/osd/ceph-2/journal fd 15: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
2016-09-29 09:04:16.407155 7f98ee6fd800 1 journal _open /var/lib/ceph/osd/ceph-2/journal fd 15: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
2016-09-29 09:04:16.407769 7f98ee6fd800 1 filestore(/var/lib/ceph/osd/ceph-2) upgrade
2016-09-29 09:04:16.408062 7f98ee6fd800 -1 osd.2 0 backend (filestore) is unable to support max object name[space] len
2016-09-29 09:04:16.408075 7f98ee6fd800 -1 osd.2 0 osd max object name len = 2048
2016-09-29 09:04:16.408077 7f98ee6fd800 -1 osd.2 0 osd max object namespace len = 256
2016-09-29 09:04:16.408079 7f98ee6fd800 -1 osd.2 0 (36) File name too long
2016-09-29 09:04:16.409545 7f98ee6fd800 1 journal close /var/lib/ceph/osd/ceph-2/journal
2016-09-29 09:04:16.410254 7f98ee6fd800 -1 ** ERROR: osd init failed: (36) File name too long
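The key lines in the log above are the limits the OSD tries to enforce (name len 2048, namespace len 256) and the final "osd init failed: (36) File name too long". A minimal Python sketch that scans OSD log lines for this specific failure is shown below; the function name `diagnose_osd_log` is hypothetical, and it assumes the log line format shown in this excerpt and the Linux value of ENAMETOOLONG (36).

```python
import re

# Linux errno for ENAMETOOLONG, as printed in the OSD log ("(36) File name too long").
LINUX_ENAMETOOLONG = 36

# Fatal line, e.g. "** ERROR: osd init failed: (36) File name too long".
INIT_FAILED = re.compile(r"osd init failed: \((\d+)\) (.+)$")
# Limit lines, e.g. "osd max object name len = 2048".
LIMIT_LINE = re.compile(r"osd max object (name|namespace) len = (\d+)")


def diagnose_osd_log(lines):
    """Hypothetical helper: report an ENAMETOOLONG OSD init failure, or None.

    Collects the name/namespace length limits the OSD logged and the errno
    of the final "osd init failed" line.
    """
    limits = {}
    failure = None
    for line in lines:
        m = LIMIT_LINE.search(line)
        if m:
            limits[m.group(1)] = int(m.group(2))
        m = INIT_FAILED.search(line)
        if m:
            failure = (int(m.group(1)), m.group(2).strip())
    if failure is None or failure[0] != LINUX_ENAMETOOLONG:
        return None
    return {"errno": failure[0], "message": failure[1], "limits": limits}
```

Fed the lines of /var/log/ceph/ceph-osd.2.log from this report, it would return errno 36 together with the 2048/256 limits, which points at the filestore-on-ext4 object name limit rather than a generic startup failure.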

This happened even though the following parameters were set in the puppet hieradata:
/etc/puppet/hieradata/extraconfig.yaml:ceph::profile::params::osd_max_object_name_len: 256
/etc/puppet/hieradata/extraconfig.yaml:ceph::profile::params::osd_max_object_namespace_len: 64

The Ceph configuration file already contained the settings:
osd_max_object_namespace_len = 64
osd_max_object_name_len = 256
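Having the right values on disk is not the same as having them in effect: the file is only read when an OSD starts. A minimal Python sketch that checks a ceph.conf-style file for the two values used in this report is below. The helper names are hypothetical; it assumes the options live in `[global]` and/or `[osd]` (with `[osd]` taking precedence), that keys are not indented, and it relies on Ceph treating spaces, dashes and underscores in option names interchangeably.

```python
import configparser

# Values used in this report for ext4-backed filestore OSDs.
EXT4_LIMITS = {
    "osd_max_object_name_len": 256,
    "osd_max_object_namespace_len": 64,
}


def _normalize(key):
    # "osd max object name len" and "osd_max_object_name_len" are the same option.
    return key.strip().lower().replace(" ", "_").replace("-", "_")


def effective_osd_limits(conf_text):
    """Hypothetical helper: return the configured limits found in conf_text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    values = {}
    # Later sections win, so [osd] overrides [global].
    for section in ("global", "osd"):
        if parser.has_section(section):
            for key, value in parser.items(section):
                norm = _normalize(key)
                if norm in EXT4_LIMITS:
                    values[norm] = int(value)
    return values


def missing_ext4_limits(conf_text):
    """Return the option names that are absent or differ from EXT4_LIMITS."""
    values = effective_osd_limits(conf_text)
    return sorted(k for k, v in EXT4_LIMITS.items() if values.get(k) != v)
```

An empty result only says the file is correct; as this report shows, OSDs that have not been restarted since the file changed may still be running without these limits.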

The issue is that the convergence step does not restart the OSDs (which is a good thing), so the new settings are not applied at that point, and Ceph Jewel (Newton) fails to start.
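Not restarting every OSD at once matters because, as the commit message later in this report puts it, bringing all OSDs down unconditionally would make guests die. When OSDs do need a restart to pick up new settings, a common approach is to restart them one at a time and wait for the cluster to recover in between. The sketch below illustrates that pattern in Python; it is not what tripleo does, and `restart_osd` and `cluster_healthy` are hypothetical callbacks (they might, for example, wrap `systemctl restart ceph-osd@N` and `ceph health`).

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    osd_ids: Iterable[int],
    restart_osd: Callable[[int], None],      # hypothetical: restarts one OSD
    cluster_healthy: Callable[[], bool],     # hypothetical: True once the cluster recovered
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[int]:
    """Restart OSDs one by one, waiting for recovery between restarts.

    Returns the ids that were restarted. Raises TimeoutError if the cluster
    does not become healthy within timeout_s after a restart, leaving the
    remaining OSDs untouched so that only one OSD is ever down at a time.
    """
    restarted = []
    for osd_id in osd_ids:
        restart_osd(osd_id)
        restarted.append(osd_id)
        deadline = clock() + timeout_s
        while not cluster_healthy():
            if clock() >= deadline:
                raise TimeoutError(f"cluster not healthy after restarting osd.{osd_id}")
            sleep(poll_s)
    return restarted
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; stopping on the first timeout is the conservative choice, since continuing would risk taking a second OSD down while the first has not recovered.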

OpenStack Infra (hudson-openstack) wrote: Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/379401

Changed in tripleo:
assignee: nobody → Giulio Fidente (gfidente)
status: New → In Progress
Changed in tripleo:
milestone: none → ocata-1
tags: added: newton-backport-potential newton-rc-potential
Changed in tripleo:
milestone: ocata-1 → newton-rc3
OpenStack Infra (hudson-openstack) wrote: Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/379401
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=27e1d105fb805327993f468aeccef60ab2743854
Submitter: Jenkins
Branch: master

commit 27e1d105fb805327993f468aeccef60ab2743854
Author: Giulio Fidente <email address hidden>
Date: Thu Sep 29 13:52:32 2016 +0200

    Set ceph osd max object name and namespace len on upgrade when on ext4

    As per [1] we need to lower osd max object name and namespace len when
    upgrading from Hammer and the OSD is backed by ext4.

    These could also be given via ExtraConfig but on upgrade we only run
    puppet apply after this script is executed, so the values won't be
    effective unless the daemon is restarted. Yet we do not want puppet
    to restart the daemon because we can't bring all OSDs down
    unconditionally or guests will die.

    1. http://tracker.ceph.com/issues/16187

    Co-Authored-By: Michele Baldessari <email address hidden>
    Co-Authored-By: Dimitri Savineau <email address hidden>
    Change-Id: I7fec4e2426bdacd5f364adbebd42ab23dcfa523a
    Closes-Bug: 1628874
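The commit message describes the approach (set the lower limits during the upgrade, only when the OSD is backed by ext4, before the daemons run with Jewel) but the script itself is not reproduced in this report. The following hypothetical Python sketch shows only the ext4 detection step: it finds the filesystem type of an OSD data directory by matching it against the longest mount point in /proc/mounts content. The names `fs_type_for_path`, `settings_needed` and `EXT4_OSD_SETTINGS` are illustrative, and the real tripleo script may detect ext4 differently.

```python
import os
from typing import Dict, Iterable, Optional

# Values used in this report for ext4-backed filestore OSDs.
EXT4_OSD_SETTINGS = {
    "osd_max_object_name_len": "256",
    "osd_max_object_namespace_len": "64",
}


def fs_type_for_path(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount containing `path`.

    `mounts_text` is the content of /proc/mounts. The longest matching
    mount point wins, so /var/lib/ceph/osd/ceph-2 beats /.
    """
    path = os.path.normpath(path)
    best_mnt, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts escapes spaces in mount points as \040.
        mnt, fstype = fields[1].replace("\\040", " "), fields[2]
        # Match the mount point itself or anything below it (not ceph-20 for ceph-2).
        if path == mnt or path.startswith(mnt.rstrip("/") + "/"):
            if len(mnt) > len(best_mnt):
                best_mnt, best_type = mnt, fstype
    return best_type


def settings_needed(osd_data_dirs: Iterable[str], mounts_text: str) -> Dict[str, str]:
    """Return the settings to write before restarting OSDs, if any are on ext4."""
    if any(fs_type_for_path(d, mounts_text) == "ext4" for d in osd_data_dirs):
        return dict(EXT4_OSD_SETTINGS)
    return {}
```

In this report the OSD data lives in /var/lib/ceph/osd/ceph-2 on the image's own filesystem (note the file-based journal in the log), so the lookup falls through to the root mount and reports ext4 there.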

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote: Fix proposed to tripleo-heat-templates (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/381375

OpenStack Infra (hudson-openstack) wrote: Fix merged to tripleo-heat-templates (stable/newton)

Reviewed: https://review.openstack.org/381375
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=c4703293e98b57865b19f3f12f675cda85519231
Submitter: Jenkins
Branch: stable/newton

commit c4703293e98b57865b19f3f12f675cda85519231
Author: Giulio Fidente <email address hidden>
Date: Thu Sep 29 13:52:32 2016 +0200

    Set ceph osd max object name and namespace len on upgrade when on ext4

    As per [1] we need to lower osd max object name and namespace len when
    upgrading from Hammer and the OSD is backed by ext4.

    These could also be given via ExtraConfig but on upgrade we only run
    puppet apply after this script is executed, so the values won't be
    effective unless the daemon is restarted. Yet we do not want puppet
    to restart the daemon because we can't bring all OSDs down
    unconditionally or guests will die.

    1. http://tracker.ceph.com/issues/16187

    Co-Authored-By: Michele Baldessari <email address hidden>
    Co-Authored-By: Dimitri Savineau <email address hidden>
    Change-Id: I7fec4e2426bdacd5f364adbebd42ab23dcfa523a
    Closes-Bug: 1628874
    (cherry picked from commit 27e1d105fb805327993f468aeccef60ab2743854)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/tripleo-heat-templates 5.0.0.0rc3

This issue was fixed in the openstack/tripleo-heat-templates 5.0.0.0rc3 release candidate.

OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/tripleo-heat-templates 5.0.0

This issue was fixed in the openstack/tripleo-heat-templates 5.0.0 release.

OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/tripleo-heat-templates 6.0.0.0b1

This issue was fixed in the openstack/tripleo-heat-templates 6.0.0.0b1 development milestone.
