Ocata -> Pike upgrade of an environment with Ceph nodes fails because the 'Check legacy Ceph hieradata' task fails on OSD nodes

Bug #1756363 reported by Marius Cornea
This bug affects 1 person
Affects   Status      Importance   Assigned to   Milestone
tripleo   Won't Fix   High         Unassigned

Bug Description

Description of problem:
An Ocata -> Pike upgrade of an environment with Ceph nodes fails because the 'Check legacy Ceph hieradata' task fails on the OSD nodes.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Deploy Ocata with 3 controller + 2 computes + 3 ceph nodes

timeout 100m openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml \
-e /home/stack/virt/network/network-environment-v6.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
--log-file overcloud_deployment_81.log

Content of: /home/stack/virt/internal.yaml

parameter_defaults:
    CinderEnableIscsiBackend: false
    CinderEnableRbdBackend: true
    CinderEnableNfsBackend: false
    NovaEnableRbdBackend: true
    GlanceBackend: rbd
    CinderRbdPoolName: "volumes"
    NovaRbdPoolName: "vms"
    GlanceRbdPoolName: "images"
    ExtraConfig:
      ceph::profile::params::osd_pool_default_pg_num: 32
      ceph::profile::params::osd_pool_default_pgp_num: 32
      ceph::profile::params::osds:
       '/dev/vdb': {}

2. Run major upgrade composable steps:

openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml \
-e /home/stack/virt/network/network-environment-v6.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml \
-e /home/stack/ceph-ansible-env.yaml \
-e /home/stack/extraconfig_override.yaml \
-e /home/stack/docker-osp12.yaml

Note that /home/stack/extraconfig_override.yaml is meant to override ExtraConfig. Its content:

parameter_defaults:
    ExtraConfig:
        ceph::profile::params::osd_pool_default_pgp_num: 32
        ceph::profile::params::osd_pool_default_pg_num: 32

Actual results:
The upgrade fails because the 'Check legacy Ceph hieradata' task fails on the OSD nodes:

overcloud.AllNodesDeploySteps.CephStorageUpgrade_Step0.1:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: a9b9c88b-b88c-4a5a-8bb2-30911c9540e1
  status: CREATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    TASK [Gathering Facts] *********************************************************
    ok: [localhost]

    TASK [Check legacy Ceph hieradata] *********************************************
    fatal: [localhost]: FAILED! => {"changed": true, "cmd": "test \"nil\" == \"$(hiera -c /etc/puppet/hiera.yaml ceph::profile::params::osds)\"", "delta": "0:00:00.094349", "end": "2018-03-02 21:01:47.052440", "msg": "non-zero return code", "rc": 1, "start": "2018-03-02 21:01:46.958091", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
     to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/39bb3320-0bd6-4365-8f0a-a0ca7674142c_playbook.retry

    PLAY RECAP *********************************************************************
    localhost : ok=1 changed=0 unreachable=0 failed=1

    (truncated, view all with --long)
  deploy_stderr: |

overcloud.AllNodesDeploySteps.CephStorageUpgrade_Step0.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: e83240df-2a7c-4ea9-b8b0-3edab4a0c4c9
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
  deploy_stdout: |
None
  deploy_stderr: |
None
overcloud.AllNodesDeploySteps.CephStorageUpgrade_Step0.2:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 15e04b1d-bdd3-405d-9440-61face8ef484
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
  deploy_stdout: |
    ...
    TASK [Gathering Facts] *********************************************************
    ok: [localhost]

    TASK [Check legacy Ceph hieradata] *********************************************
    fatal: [localhost]: FAILED! => {"changed": true, "cmd": "test \"nil\" == \"$(hiera -c /etc/puppet/hiera.yaml ceph::profile::params::osds)\"", "delta": "0:00:00.097766", "end": "2018-03-02 21:01:47.675402", "msg": "non-zero return code", "rc": 1, "start": "2018-03-02 21:01:47.577636", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
     to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/cead8171-a86d-4c4c-9597-a7d960de0170_playbook.retry

    PLAY RECAP *********************************************************************
    localhost : ok=1 changed=0 unreachable=0 failed=1
Heat Stack update failed.

    (truncated, view all with --long)
  deploy_stderr: |
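
For context, the command in the failing task above exits 0 only when the hiera lookup prints the literal string nil. A minimal shell sketch of that comparison follows. The function names are hypothetical and not part of tripleo, and the lookup result is passed in as an argument so the logic can be tried away from an overcloud node.

```shell
#!/bin/sh
# Sketch of the comparison made by the 'Check legacy Ceph hieradata' task.
# Function names are hypothetical, chosen for illustration only.

# Succeeds (exit 0) only when the lookup result is the literal string "nil",
# i.e. no legacy ceph::profile::params::osds hieradata is defined.
legacy_osds_absent() {
    [ "nil" = "$1" ]
}

# Print a one-line verdict for a given lookup result.
report_legacy_osds() {
    if legacy_osds_absent "$1"; then
        echo "ok: no legacy ceph::profile::params::osds hieradata"
    else
        echo "fail: legacy ceph::profile::params::osds hieradata is still present"
    fi
}

# On an overcloud node the argument would come from the command in the log:
#   report_legacy_osds "$(hiera -c /etc/puppet/hiera.yaml ceph::profile::params::osds)"
```

Because test prints nothing itself, the empty stdout and rc 1 in the log only say that the lookup returned something other than nil. They do not show what the lookup returned.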

Expected results:
Upgrade succeeds.

Additional info:

It looks like the ExtraConfig override did not get applied on the nodes; the legacy osds key is still present in the hieradata:

cat controller-0/etc/puppet/hieradata/extraconfig.json
{
    "ceph::profile::params::osd_pool_default_pg_num": 32,
    "ceph::profile::params::osd_pool_default_pgp_num": 32,
    "ceph::profile::params::osds": {
        "/dev/vdb": {}
    }
}

cat ceph-0/etc/puppet/hieradata/extraconfig.json
{
    "ceph::profile::params::osd_pool_default_pg_num": 32,
    "ceph::profile::params::osd_pool_default_pgp_num": 32,
    "ceph::profile::params::osds": {
        "/dev/vdb": {}
    }
}

Marius Cornea (mcornea) wrote :

Instead of overriding ExtraConfig in an additional environment file, I removed the old hieradata from the existing environment file, and this allowed the upgrade to move forward.

Original environment file containing the legacy hieradata:

parameter_defaults:
    CinderEnableIscsiBackend: false
    CinderEnableRbdBackend: true
    CinderEnableNfsBackend: false
    NovaEnableRbdBackend: true
    GlanceBackend: rbd
    CinderRbdPoolName: "volumes"
    NovaRbdPoolName: "vms"
    GlanceRbdPoolName: "images"
    ExtraConfig:
      ceph::profile::params::osd_pool_default_pg_num: 32
      ceph::profile::params::osd_pool_default_pgp_num: 32
      ceph::profile::params::osds:
       '/dev/vdb': {}

Adjusted file used during the upgrade, which allows the upgrade to pass:

parameter_defaults:
    CinderEnableIscsiBackend: false
    CinderEnableRbdBackend: true
    CinderEnableNfsBackend: false
    NovaEnableRbdBackend: true
    GlanceBackend: rbd
    CinderRbdPoolName: "volumes"
    NovaRbdPoolName: "vms"
    GlanceRbdPoolName: "images"
    ExtraConfig: {}
    CephPoolDefaultPgNum: 32
    CephAnsibleDisksConfig:
        devices:
            - '/dev/vdb'
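
Since the workaround is to remove the legacy key from the existing environment files, it may help to scan the files passed with -e before starting the upgrade. The following is a minimal sketch, assuming the key appears on its own line as in the original file above. The function name is hypothetical, and the check is a plain text match rather than a YAML parse.

```shell
#!/bin/sh
# Sketch: list environment files that still set the legacy OSD hieradata key.
# find_legacy_osds_envs is a hypothetical helper, not a tripleo command.
find_legacy_osds_envs() {
    for env_file in "$@"; do
        # Match the key at the start of a line, allowing leading indentation.
        # A commented-out key (leading '#') is not matched.
        if grep -Eq "^[[:space:]]*ceph::profile::params::osds:" "$env_file"; then
            echo "$env_file"
        fi
    done
}

# Example, using two of the paths from this report:
#   find_legacy_osds_envs /home/stack/virt/internal.yaml /home/stack/extraconfig_override.yaml
```

Any file printed by this check still carries the key that the 'Check legacy Ceph hieradata' task looks up. An empty result does not guarantee that the nodes are clean, because stale hieradata may already be present on them.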

Changed in tripleo:
milestone: none → rocky-1
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-upgrade (stable/pike)

Reviewed: https://review.openstack.org/553572
Committed: https://git.openstack.org/cgit/openstack/tripleo-upgrade/commit/?id=07e6d2673660ecada04233136c2ee611449c0b0d
Submitter: Zuul
Branch: stable/pike

commit 07e6d2673660ecada04233136c2ee611449c0b0d
Author: Marius Cornea <email address hidden>
Date: Thu Mar 15 14:04:10 2018 -0400

    Remove ceph osd hieradata during upgrade

    Overriding ExtraConfig in an extra environment file is not working
    as expected and stale hieradata is still remaining on the overcloud
    nodes. This removes the hieradata from existing environment file
    instead of overriding ExtraConfig.

    Related-bug: 1756363

    Change-Id: I4915f48b6711add3d6f013c375e08ca1b0d0b22e

tags: added: in-stable-pike
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
milestone: stein-3 → stein-rc1
Lukas Bezdicka (social-b) wrote :

The fix in tripleo-upgrade is the proper process. The user has to unset ExtraConfig by setting it to {}. Overriding it during the upgrade could pose a risk.

Changed in tripleo:
status: Triaged → Won't Fix