composed role for hci fails ceph-ansible deploy on step 2

Bug #1712912 reported by John Fulton on 2017-08-24
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Giulio Fidente
Milestone: pike-rc2

Bug Description

A Pike deploy with ceph-ansible using 3 storage nodes, 1 compute node, and 1 controller node works as described in the docs [1]. However, deleting that overcloud and redeploying it as an HCI deployment, with the variations described below, causes the step 2 Heat > Mistral > Ansible workflow to fail [2].

The only variations made to turn the working deployment into an HCI one were to run the same deployment command [3] with -r referencing an updated roles_data.yaml file that defines an OsdCompute role [4], and to change the node type counts and flavors [5].

Footnotes:

[1] https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ceph_config.html

[2]
overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: 3408993b-c569-468b-b4f4-be2ea913ea27
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR
Heat Stack create failed.
Heat Stack create failed.

[3]
time openstack overcloud deploy --templates ~/templates \
-r ~/tripleo-ceph-ansible/tht/roles_data.yaml \
-e ~/templates/environments/docker.yaml \
-e ~/templates/environments/docker-ha.yaml \
-e ~/templates/environments/low-memory-usage.yaml \
-e ~/templates/environments/disable-telemetry.yaml \
-e ~/docker_registry.yaml \
-e ~/templates/environments/ceph-ansible/ceph-ansible.yaml \
-e ~/templates/environments/ceph-ansible/ceph-mds.yaml \
-e ~/tripleo-ceph-ansible/tht/overcloud-ceph-ansible.yaml

[4]
###############################################################################
# Role: OsdCompute #
###############################################################################
- name: OsdCompute
  description: |
    Basic Compute Node role
  CountDefault: 0
  networks:
    - InternalApi
    - Tenant
    - Storage
    - StorageMgmt
  HostnameFormatDefault: '%stackname%-osd-compute-%index%'
  # Deprecated & backward-compatible values (FIXME: Make parameters consistent)
  # Set uses_deprecated_params to True if any deprecated params are used.
  uses_deprecated_params: True
  deprecated_param_image: 'NovaImage'
  deprecated_param_extraconfig: 'NovaComputeExtraConfig'
  deprecated_param_metadata: 'NovaComputeServerMetadata'
  deprecated_param_scheduler_hints: 'NovaComputeSchedulerHints'
  deprecated_param_ips: 'NovaComputeIPs'
  deprecated_server_resource_name: 'NovaCompute'
  disable_upgrade_deployment: True
  ServicesDefault:
    - OS::TripleO::Services::CephOSD
    - OS::TripleO::Services::AuditD
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CephClient
    - OS::TripleO::Services::CephExternal
    - OS::TripleO::Services::CertmongerUser
    - OS::TripleO::Services::Collectd
    - OS::TripleO::Services::ComputeCeilometerAgent
    - OS::TripleO::Services::ComputeNeutronCorePlugin
    - OS::TripleO::Services::ComputeNeutronL3Agent
    - OS::TripleO::Services::ComputeNeutronMetadataAgent
    - OS::TripleO::Services::ComputeNeutronOvsAgent
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::FluentdClient
    - OS::TripleO::Services::Iscsid
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::MySQLClient
    - OS::TripleO::Services::NeutronLinuxbridgeAgent
    - OS::TripleO::Services::NeutronSriovAgent
    - OS::TripleO::Services::NeutronVppAgent
    - OS::TripleO::Services::NovaCompute
    - OS::TripleO::Services::NovaLibvirt
    - OS::TripleO::Services::NovaMigrationTarget
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::ContainersLogrotateCrond
    - OS::TripleO::Services::OpenDaylightOvs
    - OS::TripleO::Services::Securetty
    - OS::TripleO::Services::SensuClient
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Sshd
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::Tuned
    - OS::TripleO::Services::Vpp
    - OS::TripleO::Services::OVNController

[5]
  OvercloudControlFlavor: control
  ControllerCount: 1
  OvercloudComputeFlavor: compute
  ComputeCount: 0
  #OvercloudCephStorageFlavor: ceph-storage
  #CephStorageCount: 3
  OvercloudOsdComputeFlavor: ceph-storage
  OsdComputeCount: 3

John Fulton (jfulton-org) wrote :

Is it possible that env() is empty, as the Mistral logs suggest (note data={} below)?

2017-08-24 16:57:33.764 20688 ERROR mistral.engine.task_handler [-] Failed to
               handle action completion [error=Can not evaluate YAQL expression
[expression=env().get('role_merged_configs').items().select($[1].get('ceph_osd_ansible_vars', {})).aggregate($1.mergeWith($2)), error=unhashable type: 'dict', data={}],
               wf=tripleo.storage.v1.ceph-install, task=set_role_vars, action=std.noop]:

 http://sprunge.us/KRGI
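
For readers less familiar with YAQL, the expression in the log above can be approximated in Python as follows. This is only a sketch: deep_merge is a hypothetical stand-in for mergeWith (it simply lets the right-hand side win and does not reproduce how yaql merges lists), and the sample env is made up for illustration rather than taken from the failing deployment.

```python
from functools import reduce


def deep_merge(left, right):
    """Hypothetical stand-in for mergeWith: recursive dict merge, right side wins."""
    merged = dict(left)
    for key, value in right.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def collect_osd_vars(env):
    """Gather ceph_osd_ansible_vars from every role and fold them into one dict."""
    # Like the expression, this assumes 'role_merged_configs' is present in env.
    role_configs = env.get('role_merged_configs')
    per_role = [
        config.get('ceph_osd_ansible_vars', {})
        for _role_name, config in role_configs.items()
    ]
    return reduce(deep_merge, per_role, {})


# Illustrative data only: one role without OSD vars, one with.
sample_env = {
    'role_merged_configs': {
        'Controller': {},
        'OsdCompute': {
            'ceph_osd_ansible_vars': {
                'devices': ['/dev/vdb', '/dev/vdc'],
                'journal_size': 256,
            },
        },
    },
}
merged_osd_vars = collect_osd_vars(sample_env)
```

With an empty env (as data={} in the log might suggest), role_configs would be None and the .items() call would fail, although with a different error than the one logged.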

John Fulton (jfulton-org) wrote :

The output of the following:

 heat resource-show overcloud-AllNodesDeploySteps-2qschexqlpmb WorkflowTasks_Step2_Execution

is available at: http://sprunge.us/HaOA

John Fulton (jfulton-org) wrote :

Cleaned up output from the following containing the state_info, params, and output fields:

 heat resource-show overcloud-AllNodesDeploySteps-2qschexqlpmb WorkflowTasks_Step2_Execution

view with:

 head -1 state_info-params-output | sed -e 's/\\n/\n/g' -e 's/\\"/"/g'
 head -2 state_info-params-output | sed -e 's/\\n/\n/g' -e 's/\\"/"/g'
 head -3 state_info-params-output | sed -e 's/\\n/\n/g' -e 's/\\"/"/g'
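
The two sed substitutions above only undo escaping: a literal backslash followed by n becomes a real newline, and a backslash followed by a double quote becomes a plain double quote. A minimal Python sketch of the same transformation, in case sed is not at hand (the function name and sample string are made up for illustration):

```python
def unescape_heat_field(text):
    """Turn literal backslash-n into newlines and backslash-quote into quotes."""
    return text.replace('\\n', '\n').replace('\\"', '"')


# Illustrative input, not taken from the attached file.
raw = 'first line\\nsecond \\"quoted\\" line'
print(unescape_heat_field(raw))
```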

John Fulton (jfulton-org) wrote :

Cleaned up backslashes in params and set a few variables to make it a parsable data structure:

 python params.py | jq "."

John Fulton (jfulton-org) wrote :

Wasn't able to reproduce the failing YAQL query from comment #1. Perhaps the cleanup removed something relevant to the bug?

[jfulton@skagra Desktop]$ python params.py | jq "." > params.json
[jfulton@skagra Desktop]$ yaql
Yet Another Query Language - command-line query tool
Version 1.1.0
Copyright (c) 2013-2015 Mirantis, Inc

No data loaded into context
Type '@load data-file.json' to load data

yaql> @load params.json
Data from file params.json loaded into context
yaql>
yaql> $.env.get('role_merged_configs').items().select($[1].get('ceph_osd_ansible_vars', {})).aggregate($1.mergeWith($2))
{
    "ceph_conf_overrides": {
        "global": {
            "osd_pool_default_pg_num": 32,
            "osd_pool_default_size": 1
        }
    },
    "user_config": "True",
    "ceph_docker_image_tag": "tag-build-master-jewel-centos-7",
    "containerized_deployment": "True",
    "public_network": "192.168.24.0/24",
    "generate_fsid": "False",
    "monitor_address_block": "192.168.24.0/24",
    "raw_journal_devices": [
        "/dev/vdd"
    ],
    "keys": [
        {
            "mon_cap": "allow r",
            "osd_cap": "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics",
            "name": "client.openstack",
            "key": "AQDCAJ9ZAAAAABAANewaxKzo+r5iX2PrtoLhjg==",
            "mode": "0644"
        },
        {
            "mon_cap": "allow r, allow command auth del, allow command auth caps, allow command auth get, allow command auth get-or-create",
            "mds_cap": "allow *",
            "name": "client.manila",
            "mode": "0644",
            "key": "AQDCAJ9ZAAAAABAAFSO4TiGeDWHuO24n2+DIkQ==",
            "osd_cap": "allow rw"
        }
    ],
    "openstack_keys": [
        {
            "mon_cap": "allow r",
            "osd_cap": "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics",
            "name": "client.openstack",
            "key": "AQDCAJ9ZAAAAABAANewaxKzo+r5iX2PrtoLhjg==",
            "mode": "0644"
        },
        {
            "mon_cap": "allow r, allow command auth del, allow command auth caps, allow command auth get, allow command auth get-or-create",
            "mds_cap": "allow *",
            "name": "client.manila",
            "mode": "0644",
            "key": "AQDCAJ9ZAAAAABAAFSO4TiGeDWHuO24n2+DIkQ==",
            "osd_cap": "allow rw"
        }
    ],
    "journal_collocation": "False",
    "ntp_service_enabled": "False",
    "ceph_docker_image": "ceph/daemon",
    "docker": "True",
    "fsid": "810a20d4-88ea-11e7-8968-00979f13efb1",
    "journal_size": 256,
    "openstack_config": "True",
    "ceph_docker_registry": "docker.io",
    "ceph_stable": "True",
    "devices": [
        "/dev/vdb",
        "/dev/vdc"
    ],
    "raw_multi_journal": "True",
    "ceph_origin": "distro",
    "openstack_pools": [
        {
            "rule_name": "",
            "pg_num": 32,
            "name": "volumes"
        },
        {
            "rule_name...


John Fulton (jfulton-org) wrote :

More logs about YAQL error at http://sprunge.us/KRGI

Changed in tripleo:
milestone: pike-rc1 → pike-rc2
Changed in tripleo:
importance: Medium → High
John Fulton (jfulton-org) wrote :

- Modified the task to log the full env() on which the yaql failed [1].
- A non-truncated view of that env() was then extracted [2] from the logs to http://sprunge.us/hcMW

Footnotes:
[1]
 [root@undercloud mistral]# grep -A 2 -B 2 show_env /home/stack/tripleo-common/workbooks/ceph-ansible.yaml
      - ceph_ansible_playbook: /usr/share/ceph-ansible/site-docker.yml.sample
    tasks:
      show_env:
        action: std.echo output=<% env() %>
        publish:
          output: <% task(show_env).result %>
        on-success: enable_ssh_admin
      enable_ssh_admin:
 [root@undercloud mistral]#

[2]
 [root@undercloud mistral]# grep show_env engine.log | grep "2017-08-25 16:12:45.433" | curl -F 'sprunge=<-' http://sprunge.us
http://sprunge.us/hcMW
 [root@undercloud mistral]#

John Fulton (jfulton-org) wrote :

A similar issue was resolved using the safer queries found in
https://review.openstack.org/#/c/499624. I will re-test with that patch and update this report.
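
As a rough illustration of what "safer" can mean when collecting per-role variables, here is a Python sketch. It is not the content of the patch above; the function name and the shallow-merge behaviour are assumptions made for illustration. It tolerates a missing role_merged_configs key, roles whose config is not a dict, and roles that do not define the requested vars.

```python
def collect_role_vars(env, var_name):
    """Shallow-merge one ansible vars key across all roles, tolerating gaps."""
    role_configs = env.get('role_merged_configs') or {}
    merged = {}
    for role_config in role_configs.values():
        if not isinstance(role_config, dict):
            continue  # skip roles with no usable config
        role_vars = role_config.get(var_name) or {}
        merged.update(role_vars)  # later roles override earlier ones
    return merged


# Illustrative data only.
example_env = {
    'role_merged_configs': {
        'Controller': {},
        'OsdCompute': {'ceph_osd_ansible_vars': {'journal_size': 256}},
    },
}
osd_vars = collect_role_vars(example_env, 'ceph_osd_ansible_vars')
client_vars = collect_role_vars(example_env, 'ceph_client_ansible_vars')
```

A shallow update never has to compare or de-duplicate list items, so values that are lists of dicts (such as keys or openstack_pools) are carried over as-is.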

Reviewed: https://review.openstack.org/499624
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=e325fffb8e3e10ff3d45c1951302092ef39ca2e4
Submitter: Jenkins
Branch: master

commit e325fffb8e3e10ff3d45c1951302092ef39ca2e4
Author: Giulio Fidente <email address hidden>
Date: Thu Aug 31 16:40:55 2017 +0200

    Parse ceph_client_ansible_vars in ceph-ansible workbook

    We might emit ceph_client_ansible_vars when configuring a Ceph
    client in the overcloud with an external Ceph cluster.

    Also refactors the YAQL to collect ceph-ansible parameters to
    be safer.

    Co-Authored-By: John Fulton <email address hidden>
    Change-Id: Ifc57c9cf6ca8017a2abc78d6320c0675ad49ca9f
    Related-Bug: #1714271
    Related-Bug: #1712912

Changed in tripleo:
milestone: pike-rc2 → queens-1
Giulio Fidente (gfidente) wrote :

I think this was fixed by https://review.openstack.org/#/c/499624 and we can close it

Changed in tripleo:
status: Triaged → Fix Committed
milestone: queens-1 → pike-rc2

Reviewed: https://review.openstack.org/500580
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=ed0c9c31a1b0aa4fef6388827af1f8490d4771c2
Submitter: Jenkins
Branch: stable/pike

commit ed0c9c31a1b0aa4fef6388827af1f8490d4771c2
Author: Giulio Fidente <email address hidden>
Date: Thu Aug 31 16:40:55 2017 +0200

    Parse ceph_client_ansible_vars in ceph-ansible workbook

    We might emit ceph_client_ansible_vars when configuring a Ceph
    client in the overcloud with an external Ceph cluster.

    Also refactors the YAQL to collect ceph-ansible parameters to
    be safer.

    Co-Authored-By: John Fulton <email address hidden>
    Change-Id: Ifc57c9cf6ca8017a2abc78d6320c0675ad49ca9f
    Related-Bug: #1714271
    Related-Bug: #1712912
    (cherry picked from commit e325fffb8e3e10ff3d45c1951302092ef39ca2e4)

tags: added: in-stable-pike
Changed in tripleo:
status: Fix Committed → Fix Released