tripleo-ci-centos-7-3nodes-multinode timing out frequently in the gate

Bug #1768142 reported by Alex Schultz
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Alex Schultz
Milestone: (none listed)
Alex Schultz (alex-schultz) wrote :

See additional comments in Bug 1772403

tags: added: promotion-blocker
Alex Schultz (alex-schultz) wrote :

It's failing because the controller services are not being correctly deployed on the 'controller' node. The only services running on subnode-1 (controller) are mariadb, rabbitmq, memcache, haproxy and cinder-volume. The deployment is successful, but the post-deploy check in quickstart fails because the endpoint list is empty.

Alex Schultz (alex-schultz) wrote :

Actually, on review of the roles file [0], the controller node is correctly deployed. We're missing something else on the ControllerApi node.

[0] https://github.com/openstack/tripleo-heat-templates/blob/master/ci/environments/multinode-3nodes.yaml#L59-L85

Alex Schultz (alex-schultz) wrote :

It appears to be broken because we don't actually run the bootstrap bits for keystone: the ControllerApi node thinks that the Controller node is the bootstrap node for that role. It's not.

In the deployment log we see that the 'Write docker-puppet-tasks json files' task is skipped on the ControllerApi node.

http://logs.openstack.org/50/569550/8/check/tripleo-ci-centos-7-3nodes-multinode/fddbbcb/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2018-05-21_16_43_31

2018-05-21 16:43:31 | TASK [Write docker-puppet-tasks json files] ************************************
2018-05-21 16:43:31 | task path: /var/lib/mistral/1b504e8a-df7f-43e7-9673-ceaae3c65ffe/common_deploy_steps_tasks.yaml:160
2018-05-21 16:43:31 | skipping: [centos-7-rax-dfw-0004113538] => (item={'value': [{'puppet_tags': u'keystone_config,keystone_domain_config,keystone_endpoint,keystone_identity_provider,keystone_paste_ini,keystone_role,keystone_service,keystone_tenant,keystone_user,keystone_user_role,keystone_domain', 'config_volume': u'keystone_init_tasks', 'step_config': u'include ::tripleo::profile::base::keystone', 'config_image': u'192.168.24.1:8787/tripleomaster/centos-binary-keystone:20b99f6998c088650b0c0cb066cc6aac3e5f9312_b333f915'}], 'key': u'step_3'}) => {"changed": false, "item": {"key": "step_3", "value": [{"config_image": "192.168.24.1:8787/tripleomaster/centos-binary-keystone:20b99f6998c088650b0c0cb066cc6aac3e5f9312_b333f915", "config_volume": "keystone_init_tasks", "puppet_tags": "keystone_config,keystone_domain_config,keystone_endpoint,keystone_identity_provider,keystone_paste_ini,keystone_role,keystone_service,keystone_tenant,keystone_user,keystone_user_role,keystone_domain", "step_config": "include ::tripleo::profile::base::keystone"}]}, "skip_reason": "Conditional result was False"}

From common_deploy_steps_tasks.yaml the conditional is...

http://logs.openstack.org/50/569550/8/check/tripleo-ci-centos-7-3nodes-multinode/fddbbcb/logs/undercloud/var/lib/mistral/1b504e8a-df7f-43e7-9673-ceaae3c65ffe/common_deploy_steps_tasks.yaml.txt.gz

        - name: Write docker-puppet-tasks json files
          copy: content="{{item.value|to_json}}" dest=/var/lib/docker-puppet/docker-puppet-tasks{{item.key.replace("step_", "")}}.json force=yes mode=0600
          with_dict: "{{role_data_docker_puppet_tasks}}"
          when: deploy_server_id == bootstrap_server_id
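
To make the skip concrete, the guard on the task above can be sketched in Python. This only illustrates the conditional and the destination path; it is not how Ansible evaluates the task internally. The function name and the ControllerApi server id are hypothetical, since the log does not show that node's deploy_server_id.

```python
def docker_puppet_task_files(tasks, deploy_server_id, bootstrap_server_id):
    """Return the files the task would write, or [] when it is skipped.

    Mirrors `when: deploy_server_id == bootstrap_server_id` and the
    dest template `docker-puppet-tasks{{item.key.replace("step_", "")}}.json`.
    """
    if deploy_server_id != bootstrap_server_id:
        return []  # "Conditional result was False" -> task skipped
    return [
        "/var/lib/docker-puppet/docker-puppet-tasks{}.json".format(
            key.replace("step_", "")
        )
        for key in tasks
    ]


# bootstrap_server_id as it appears in the inventory from the logs.
BOOTSTRAP_ID = "ece919cf-6814-442d-89fa-0879678fcc7e"
# Hypothetical placeholder: the real id of the ControllerApi server is not shown.
CONTROLLERAPI_ID = "hypothetical-controllerapi-server-id"

# Trimmed-down stand-in for role_data_docker_puppet_tasks.
tasks = {"step_3": [{"config_volume": "keystone_init_tasks"}]}

written_on_bootstrap = docker_puppet_task_files(tasks, BOOTSTRAP_ID, BOOTSTRAP_ID)
written_on_controllerapi = docker_puppet_task_files(tasks, CONTROLLERAPI_ID, BOOTSTRAP_ID)
print(written_on_bootstrap)
print(written_on_controllerapi)
```

Any node whose own id differs from bootstrap_server_id writes nothing, which matches the skipped keystone_init_tasks item in the log.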

If we look at the tripleo-ansible-inventory.yaml we see...

http://logs.openstack.org/50/569550/8/check/tripleo-ci-centos-7-3nodes-multinode/fddbbcb/logs/undercloud/var/lib/mistral/1b504e8a-df7f-43e7-9673-ceaae3c65ffe/tripleo-ansible-inventory.yaml.txt.gz

Controller:
  hosts:
    centos-7-rax-dfw-0004113541:
      <snip>
  vars: {ansible_ssh_user: tripleo-admin, bootstrap_server_id: ece919cf-6814-442d-89fa-0879678fcc7e,
    tripleo_role_name: Controller}
ControllerApi:
  hosts:
    centos-7-rax-dfw-0004113538:
      <snip>
  vars: {ansible_ssh_user: tripleo-admin, bootstrap_server_id: ece919cf-6814-442d-89fa-0879678fcc7e,
    tripleo_role_name: ControllerApi}

So because the bootstrap_server_id for the ControllerApi role is set to a Controller node, it never runs some of the steps, which in this case includes t...
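
A quick way to spot this condition in an inventory is to group roles by their bootstrap_server_id. The sketch below is a rough check, not part of tripleo; the function name is made up, and the inventory is reduced by hand to the two role groups shown above (in practice it would be loaded from tripleo-ansible-inventory.yaml with a YAML parser).

```python
from collections import defaultdict

# Hand-reduced copy of the two role groups from the inventory above.
inventory = {
    "Controller": {
        "hosts": {"centos-7-rax-dfw-0004113541": {}},
        "vars": {
            "bootstrap_server_id": "ece919cf-6814-442d-89fa-0879678fcc7e",
            "tripleo_role_name": "Controller",
        },
    },
    "ControllerApi": {
        "hosts": {"centos-7-rax-dfw-0004113538": {}},
        "vars": {
            "bootstrap_server_id": "ece919cf-6814-442d-89fa-0879678fcc7e",
            "tripleo_role_name": "ControllerApi",
        },
    },
}


def roles_sharing_bootstrap_id(inventory):
    """Map each bootstrap_server_id used by more than one role to those roles."""
    roles_by_id = defaultdict(list)
    for role, group in inventory.items():
        bootstrap_id = group.get("vars", {}).get("bootstrap_server_id")
        if bootstrap_id is not None:
            roles_by_id[bootstrap_id].append(role)
    return {
        bootstrap_id: sorted(roles)
        for bootstrap_id, roles in roles_by_id.items()
        if len(roles) > 1
    }


shared = roles_sharing_bootstrap_id(inventory)
print(shared)
```

A shared id is not necessarily a misconfiguration on its own; it only means that bootstrap-only tasks for every listed role will run on a single node, so services that need those tasks must be placed there.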


Changed in tripleo:
assignee: nobody → Alex Schultz (alex-schultz)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/570252

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/570275

Bogdan Dobrelya (bogdando) wrote :

As was commented in IRC, a per-service bootstrap calculation is the way to go. We already have it implemented via https://github.com/openstack/tripleo-heat-templates/blob/7b74e8b90e04a7390598f70f8a7e00eabc7ec870/overcloud.j2.yaml#L724 , but it seems not to have been adopted in the puppet modules.
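
The idea of a per-service bootstrap node can be sketched as follows. This is an illustration under stated assumptions, not the logic in overcloud.j2.yaml: it assumes the bootstrap node for a service is simply the first host of the first role that runs it, and the role-to-service mapping below is a simplified, hypothetical placement (only the host names come from the inventory in this bug).

```python
def bootstrap_node_per_service(role_services, role_hosts):
    """Map each service to the first host, in role order, that runs it.

    Assumption for this sketch: "first host wins". The real templates
    may pick the bootstrap node differently.
    """
    bootstrap = {}
    for role, services in role_services.items():
        hosts = role_hosts.get(role, [])
        if not hosts:
            continue  # a role with no hosts cannot bootstrap anything
        for service in services:
            bootstrap.setdefault(service, hosts[0])
    return bootstrap


# Hypothetical, simplified service placement for illustration only.
role_services = {
    "Controller": ["mysql", "haproxy"],
    "ControllerApi": ["keystone"],
}
# Host names taken from the inventory excerpt in this bug.
role_hosts = {
    "Controller": ["centos-7-rax-dfw-0004113541"],
    "ControllerApi": ["centos-7-rax-dfw-0004113538"],
}

bootstrap = bootstrap_node_per_service(role_services, role_hosts)
print(bootstrap)
```

With a calculation like this, keystone's bootstrap tasks would be tied to a node that actually runs keystone rather than to a single node chosen for the whole deployment.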

Bogdan Dobrelya (bogdando) wrote :

An example of a partial adoption, done for the mysql puppet module: http://codesearch.openstack.org/?q=mysql_short_bootstrap_node_name&i=nope&files=&repos=

There is also no need to backport this: the problem was introduced in Rocky, when we switched to config-download. In Queens we had a per-role bootstrap node.

Thanks @jaosorior and @shardy for the inputs!

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.openstack.org/570275
Reason: it is passing now w/o this.

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/570252
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=0ef3058459c082b0f19b4fe275c38cd8e60fb8d8
Submitter: Zuul
Branch: master

commit 0ef3058459c082b0f19b4fe275c38cd8e60fb8d8
Author: Alex Schultz <email address hidden>
Date: Wed May 23 13:12:54 2018 -0600

    Fix 3node deployment

    The keystone role needs to be on the same node as the mysql/haproxy node
    due to a TLS requirement that the haproxy be on the 'primary' tagged
    node. Since the keystone bootstrap stuff only runs on the primary node
    as well, they need to be tied together.

    Change-Id: Ifa7ed93993082466a2a6ddff56bee58b074be512
    Closes-Bug: #1768142

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 9.0.0.0b3

This issue was fixed in the openstack/tripleo-heat-templates 9.0.0.0b3 development milestone.
