Composed role for HCI fails ceph-ansible deploy on step 2

Bug #1712912 reported by John Fulton
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Giulio Fidente

Bug Description

A Pike deployment with ceph-ansible using 3 storage nodes, 1 compute node, and 1 controller node works as described in the docs [1]. However, deleting that overcloud and then redeploying it with the following variations to produce an HCI deployment causes the step 2 Heat > Mistral > Ansible workflow to fail [2].

The only changes made to the working deployment to introduce HCI were to run the same deployment command [3] with -r referencing an updated roles_data.yaml file that defines an OsdCompute role [4], and to change the node counts and flavors [5].

Footnotes:

[1] https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ceph_config.html

[2]
overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: 3408993b-c569-468b-b4f4-be2ea913ea27
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR
Heat Stack create failed.
Heat Stack create failed.

[3]
time openstack overcloud deploy --templates ~/templates \
-r ~/tripleo-ceph-ansible/tht/roles_data.yaml \
-e ~/templates/environments/docker.yaml \
-e ~/templates/environments/docker-ha.yaml \
-e ~/templates/environments/low-memory-usage.yaml \
-e ~/templates/environments/disable-telemetry.yaml \
-e ~/docker_registry.yaml \
-e ~/templates/environments/ceph-ansible/ceph-ansible.yaml \
-e ~/templates/environments/ceph-ansible/ceph-mds.yaml \
-e ~/tripleo-ceph-ansible/tht/overcloud-ceph-ansible.yaml

[4]
###############################################################################
# Role: OsdCompute #
###############################################################################
- name: OsdCompute
  description: |
    Basic Compute Node role
  CountDefault: 0
  networks:
    - InternalApi
    - Tenant
    - Storage
    - StorageMgmt
  HostnameFormatDefault: '%stackname%-osd-compute-%index%'
  # Deprecated & backward-compatible values (FIXME: Make parameters consistent)
  # Set uses_deprecated_params to True if any deprecated params are used.
  uses_deprecated_params: True
  deprecated_param_image: 'NovaImage'
  deprecated_param_extraconfig: 'NovaComputeExtraConfig'
  deprecated_param_metadata: 'NovaComputeServerMetadata'
  deprecated_param_scheduler_hints: 'NovaComputeSchedulerHints'
  deprecated_param_ips: 'NovaComputeIPs'
  deprecated_server_resource_name: 'NovaCompute'
  disable_upgrade_deployment: True
  ServicesDefault:
    - OS::TripleO::Services::CephOSD
    - OS::TripleO::Services::AuditD
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CephClient
    - OS::TripleO::Services::CephExternal
    - OS::TripleO::Services::CertmongerUser
    - OS::TripleO::Services::Collectd
    - OS::TripleO::Services::ComputeCeilometerAgent
    - OS::TripleO::Services::ComputeNeutronCorePlugin
    - OS::TripleO::Services::ComputeNeutronL3Agent
    - OS::TripleO::Services::ComputeNeutronMetadataAgent
    - OS::TripleO::Services::ComputeNeutronOvsAgent
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::FluentdClient
    - OS::TripleO::Services::Iscsid
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::MySQLClient
    - OS::TripleO::Services::NeutronLinuxbridgeAgent
    - OS::TripleO::Services::NeutronSriovAgent
    - OS::TripleO::Services::NeutronVppAgent
    - OS::TripleO::Services::NovaCompute
    - OS::TripleO::Services::NovaLibvirt
    - OS::TripleO::Services::NovaMigrationTarget
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::ContainersLogrotateCrond
    - OS::TripleO::Services::OpenDaylightOvs
    - OS::TripleO::Services::Securetty
    - OS::TripleO::Services::SensuClient
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Sshd
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::Tuned
    - OS::TripleO::Services::Vpp
    - OS::TripleO::Services::OVNController
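
As a quick sanity check on a role definition like the one in [4], the following minimal Python sketch tests whether a role is hyperconverged, that is, whether its ServicesDefault list contains both the CephOSD and NovaCompute services. The helper name is_hyperconverged and the trimmed role dictionary are illustrative assumptions and are not part of TripleO.

```python
# Hypothetical helper: not part of TripleO or tripleo-heat-templates.
HCI_SERVICES = {
    "OS::TripleO::Services::CephOSD",
    "OS::TripleO::Services::NovaCompute",
}


def is_hyperconverged(role):
    """Return True if the role runs both Ceph OSD and Nova compute services."""
    services = set(role.get("ServicesDefault", []))
    return HCI_SERVICES <= services


# Trimmed-down copy of the OsdCompute role from footnote [4].
osd_compute = {
    "name": "OsdCompute",
    "ServicesDefault": [
        "OS::TripleO::Services::CephOSD",
        "OS::TripleO::Services::CephClient",
        "OS::TripleO::Services::NovaCompute",
        "OS::TripleO::Services::NovaLibvirt",
    ],
}

print(is_hyperconverged(osd_compute))
```

In practice the role dictionary would come from parsing roles_data.yaml; the literal above only keeps the sketch self-contained.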

[5]
  OvercloudControlFlavor: control
  ControllerCount: 1
  OvercloudComputeFlavor: compute
  ComputeCount: 0
  #OvercloudCephStorageFlavor: ceph-storage
  #CephStorageCount: 3
  OvercloudOsdComputeFlavor: ceph-storage
  OsdComputeCount: 3

John Fulton (jfulton-org) wrote :

Is it possible that env() is empty, as the Mistral logs suggest?

2017-08-24 16:57:33.764 20688 ERROR mistral.engine.task_handler [-] Failed to handle action completion [error=Can not evaluate YAQL expression [expression=env().get('role_merged_configs').items().select($[1].get('ceph_osd_ansible_vars', {})).aggregate($1.mergeWith($2)), error=unhashable type: 'dict', data={}], wf=tripleo.storage.v1.ceph-install, task=set_role_vars, action=std.noop]:

 http://sprunge.us/KRGI
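
To make the intent of the failing expression concrete, here is a rough Python analogue of what it computes: for every role in role_merged_configs, take that role's ceph_osd_ansible_vars (defaulting to an empty dict) and merge the results into one dictionary. This is only a sketch of the semantics, assuming that mergeWith performs a recursive dict merge in which the right-hand side wins on conflicts; lists are simply replaced here, which is a simplification. It is neither the Mistral/YAQL implementation nor the fix, and the names merge_dicts and collect_osd_vars are hypothetical.

```python
from functools import reduce


def merge_dicts(left, right):
    """Recursively merge right into a copy of left; right wins on conflicts."""
    merged = dict(left)
    for key, value in right.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_dicts(merged[key], value)
        else:
            merged[key] = value
    return merged


def collect_osd_vars(env):
    """Merge every role's ceph_osd_ansible_vars into a single dict."""
    role_configs = env.get("role_merged_configs") or {}
    per_role = (
        (config or {}).get("ceph_osd_ansible_vars") or {}
        for config in role_configs.values()
    )
    # The {} seed keeps reduce() from raising when there are no roles at all.
    return reduce(merge_dicts, per_role, {})


# Made-up sample data shaped like the env() discussed above.
env = {
    "role_merged_configs": {
        "Controller": {},
        "OsdCompute": {
            "ceph_osd_ansible_vars": {
                "devices": ["/dev/vdb", "/dev/vdc"],
                "journal_size": 256,
            }
        },
    }
}

print(collect_osd_vars(env))
print(collect_osd_vars({}))
```

The defensive defaults (`or {}` at each level and the seed passed to reduce) are the point of the sketch: the merge still returns an empty dict when env() is empty or a role carries no Ceph variables.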

John Fulton (jfulton-org) wrote :

The output of the following:

 heat resource-show overcloud-AllNodesDeploySteps-2qschexqlpmb WorkflowTasks_Step2_Execution

is available at: http://sprunge.us/HaOA

John Fulton (jfulton-org) wrote :

Cleaned up output from the following containing the state_info, params, and output fields:

 heat resource-show overcloud-AllNodesDeploySteps-2qschexqlpmb WorkflowTasks_Step2_Execution

view with:

 head -1 state_info-params-output | sed -e 's/\\n/\n/g' -e 's/\\"/"/g'
 head -2 state_info-params-output | sed -e 's/\\n/\n/g' -e 's/\\"/"/g'
 head -3 state_info-params-output | sed -e 's/\\n/\n/g' -e 's/\\"/"/g'
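
For anyone without sed at hand, the same unescaping can be sketched in Python. The two sed expressions turn a literal backslash-n into a real newline and a backslash-quote into a plain double quote; the function name unescape and the sample string below are made up for illustration.

```python
def unescape(line):
    """Turn literal \\n into real newlines and \\" into plain double quotes."""
    return line.replace("\\n", "\n").replace('\\"', '"')


# Made-up sample in the same escaped style as the attached file.
sample = 'params: {\\"fsid\\": \\"abc\\"}\\nstate_info: null'
print(unescape(sample))
```

Note that head -N prints the first N lines, so the second and third commands above also repeat the earlier lines.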

John Fulton (jfulton-org) wrote :

Cleaned up the backslashes in params and set a few variables to make it a parsable data structure:

 python params.py | jq "."
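
The params.py script itself is not reproduced in this report. A minimal sketch of the approach described, assuming the cleaned-up params text is JSON-like and that the "few variables" were names such as true, false, and null bound to their Python equivalents, might look like the following. The params content shown is a tiny made-up stand-in, not the real data.

```python
import json

# Assumption: these bindings let JSON-style literals evaluate as Python.
true, false, null = True, False, None

# Tiny made-up stand-in for the cleaned-up params pasted into the script.
params = {
    "env": {
        "role_merged_configs": {
            "OsdCompute": {
                "ceph_osd_ansible_vars": {"journal_size": 256, "docker": true}
            },
            "Controller": {"ceph_osd_ansible_vars": null},
        }
    }
}

# Emit real JSON so that tools such as jq or yaql can load it.
print(json.dumps(params))
```

Piping the output through jq "." as above then pretty-prints it and confirms that it is valid JSON.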

John Fulton (jfulton-org) wrote :

I wasn't able to reproduce the failing YAQL query from comment #1. Perhaps the cleanup removed something relevant to the bug?

[jfulton@skagra Desktop]$ python params.py | jq "." > params.json
[jfulton@skagra Desktop]$ yaql
Yet Another Query Language - command-line query tool
Version 1.1.0
Copyright (c) 2013-2015 Mirantis, Inc

No data loaded into context
Type '@load data-file.json' to load data

yaql> @load params.json
Data from file params.json loaded into context
yaql>
yaql> $.env.get('role_merged_configs').items().select($[1].get('ceph_osd_ansible_vars', {})).aggregate($1.mergeWith($2))
{
    "ceph_conf_overrides": {
        "global": {
            "osd_pool_default_pg_num": 32,
            "osd_pool_default_size": 1
        }
    },
    "user_config": "True",
    "ceph_docker_image_tag": "tag-build-master-jewel-centos-7",
    "containerized_deployment": "True",
    "public_network": "192.168.24.0/24",
    "generate_fsid": "False",
    "monitor_address_block": "192.168.24.0/24",
    "raw_journal_devices": [
        "/dev/vdd"
    ],
    "keys": [
        {
            "mon_cap": "allow r",
            "osd_cap": "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics",
            "name": "client.openstack",
            "key": "AQDCAJ9ZAAAAABAANewaxKzo+r5iX2PrtoLhjg==",
            "mode": "0644"
        },
        {
            "mon_cap": "allow r, allow command auth del, allow command auth caps, allow command auth get, allow command auth get-or-create",
            "mds_cap": "allow *",
            "name": "client.manila",
            "mode": "0644",
            "key": "AQDCAJ9ZAAAAABAAFSO4TiGeDWHuO24n2+DIkQ==",
            "osd_cap": "allow rw"
        }
    ],
    "openstack_keys": [
        {
            "mon_cap": "allow r",
            "osd_cap": "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics",
            "name": "client.openstack",
            "key": "AQDCAJ9ZAAAAABAANewaxKzo+r5iX2PrtoLhjg==",
            "mode": "0644"
        },
        {
            "mon_cap": "allow r, allow command auth del, allow command auth caps, allow command auth get, allow command auth get-or-create",
            "mds_cap": "allow *",
            "name": "client.manila",
            "mode": "0644",
            "key": "AQDCAJ9ZAAAAABAAFSO4TiGeDWHuO24n2+DIkQ==",
            "osd_cap": "allow rw"
        }
    ],
    "journal_collocation": "False",
    "ntp_service_enabled": "False",
    "ceph_docker_image": "ceph/daemon",
    "docker": "True",
    "fsid": "810a20d4-88ea-11e7-8968-00979f13efb1",
    "journal_size": 256,
    "openstack_config": "True",
    "ceph_docker_registry": "docker.io",
    "ceph_stable": "True",
    "devices": [
        "/dev/vdb",
        "/dev/vdc"
    ],
    "raw_multi_journal": "True",
    "ceph_origin": "distro",
    "openstack_pools": [
        {
            "rule_name": "",
            "pg_num": 32,
            "name": "volumes"
        },
        {
            "rule_name...


John Fulton (jfulton-org) wrote :

More logs about YAQL error at http://sprunge.us/KRGI

Changed in tripleo:
milestone: pike-rc1 → pike-rc2
Changed in tripleo:
importance: Medium → High
John Fulton (jfulton-org) wrote :

- Modified the task to log the full env() on which the yaql failed [1].
- A non-truncated view of that env() was then extracted [2] from the logs to http://sprunge.us/hcMW

Footnotes:
[1]
 [root@undercloud mistral]# grep -A 2 -B 2 show_env /home/stack/tripleo-common/workbooks/ceph-ansible.yaml
      - ceph_ansible_playbook: /usr/share/ceph-ansible/site-docker.yml.sample
    tasks:
      show_env:
        action: std.echo output=<% env() %>
        publish:
          output: <% task(show_env).result %>
        on-success: enable_ssh_admin
      enable_ssh_admin:
 [root@undercloud mistral]#

[2]
 [root@undercloud mistral]# grep show_env engine.log | grep "2017-08-25 16:12:45.433" | curl -F 'sprunge=<-' http://sprunge.us
http://sprunge.us/hcMW
 [root@undercloud mistral]#

John Fulton (jfulton-org) wrote :

A similar issue was resolved using the safer queries found in
https://review.openstack.org/#/c/499624. I will re-test with that patch and update this report.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/499624
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=e325fffb8e3e10ff3d45c1951302092ef39ca2e4
Submitter: Jenkins
Branch: master

commit e325fffb8e3e10ff3d45c1951302092ef39ca2e4
Author: Giulio Fidente <email address hidden>
Date: Thu Aug 31 16:40:55 2017 +0200

    Parse ceph_client_ansible_vars in ceph-ansible workbook

    We might emit ceph_client_ansible_vars when configuring a Ceph
    client in the overcloud with an external Ceph cluster.

    Also refactors the YAQL to collect ceph-ansible parameters to
    be safer.

    Co-Authored-By: John Fulton <email address hidden>
    Change-Id: Ifc57c9cf6ca8017a2abc78d6320c0675ad49ca9f
    Related-Bug: #1714271
    Related-Bug: #1712912

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/500580

Changed in tripleo:
milestone: pike-rc2 → queens-1
Giulio Fidente (gfidente) wrote :

I think this was fixed by https://review.openstack.org/#/c/499624 and we can close it

Changed in tripleo:
status: Triaged → Fix Committed
milestone: queens-1 → pike-rc2
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (stable/pike)

Reviewed: https://review.openstack.org/500580
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=ed0c9c31a1b0aa4fef6388827af1f8490d4771c2
Submitter: Jenkins
Branch: stable/pike

commit ed0c9c31a1b0aa4fef6388827af1f8490d4771c2
Author: Giulio Fidente <email address hidden>
Date: Thu Aug 31 16:40:55 2017 +0200

    Parse ceph_client_ansible_vars in ceph-ansible workbook

    We might emit ceph_client_ansible_vars when configuring a Ceph
    client in the overcloud with an external Ceph cluster.

    Also refactors the YAQL to collect ceph-ansible parameters to
    be safer.

    Co-Authored-By: John Fulton <email address hidden>
    Change-Id: Ifc57c9cf6ca8017a2abc78d6320c0675ad49ca9f
    Related-Bug: #1714271
    Related-Bug: #1712912
    (cherry picked from commit e325fffb8e3e10ff3d45c1951302092ef39ca2e4)

tags: added: in-stable-pike
Changed in tripleo:
status: Fix Committed → Fix Released