ci: racy deployments with Nova / Neutron

Bug #1663273 reported by Emilien Macchi
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Emilien Macchi

Bug Description

We promoted TripleO CI to deploy OpenStack from trunk on 9 February, which means we run Nova from trunk again rather than from ocata-3.

It seems our CI now intermittently fails to register the Nova Placement API service, after which nova-compute does not work. Because this is a race, we need to make sure nova-compute starts only after the Nova Placement API service is running and its Keystone resources have been created.
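
To illustrate the dependency described above, here is a minimal Python sketch that gates compute startup on the placement entry being visible in the service catalog. It is not what the fixes in this bug do (they reorder deployment steps instead of polling). `fetch_catalog`, `wait_for_service` and `make_fake_catalog` are hypothetical names, and the catalog shape (a list of dicts with a `type` key) is an assumption made for the example.

```python
import time
from typing import Callable, Dict, List


def wait_for_service(
    fetch_catalog: Callable[[], List[Dict[str, str]]],
    service_type: str = "placement",
    attempts: int = 10,
    delay: float = 1.0,
) -> bool:
    """Poll the catalog until `service_type` appears or attempts run out.

    `fetch_catalog` is a hypothetical stand-in for a real Keystone query.
    """
    for attempt in range(attempts):
        catalog = fetch_catalog()
        if any(entry.get("type") == service_type for entry in catalog):
            return True
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


def make_fake_catalog(ready_after: int) -> Callable[[], List[Dict[str, str]]]:
    """Return a fake catalog that gains a placement entry after N calls."""
    calls = {"count": 0}

    def fetch() -> List[Dict[str, str]]:
        calls["count"] += 1
        catalog = [{"type": "identity"}, {"type": "compute"}]
        if calls["count"] > ready_after:
            catalog.append({"type": "placement"})
        return catalog

    return fetch
```

With the fake catalog, `wait_for_service(make_fake_catalog(2), attempts=5, delay=0)` succeeds on the third poll, which mimics a placement endpoint that is registered shortly after deployment begins.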

Changed in tripleo:
importance: Undecided → Critical
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.openstack.org/431606

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to instack-undercloud (master)

Fix proposed to branch: master
Review: https://review.openstack.org/431641

OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.openstack.org/431725

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/431606
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=3b00ffc728b47e132b3ed8bc460f8697ddb32047
Submitter: Jenkins
Branch: master

commit 3b00ffc728b47e132b3ed8bc460f8697ddb32047
Author: Emilien Macchi <email address hidden>
Date: Thu Feb 9 10:30:54 2017 -0500

    start nova-compute when keystone resources are created

    1. Move keystone resources management to step 4.
    2. Move nova-compute startup to step 5.

    That way, we make sure nova-compute will start when all Keystone
    resources are ready.

    Change-Id: I6e153e11b8519254d2a67b9142bf774a25bce69d
    Closes-Bug: #1663273

Changed in tripleo:
status: In Progress → Fix Released
tags: removed: alert
Emilien Macchi (emilienm) wrote :
tags: added: alert
Changed in tripleo:
status: Fix Released → In Progress
summary: - ci: nova-placement-api randomly fails to register the resource
+ ci: racy deployments with Nova / Neutron
OpenStack Infra (hudson-openstack) wrote : Fix merged to instack-undercloud (master)

Reviewed: https://review.openstack.org/431641
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=ee1a836d4b8283905f364941ab649d8ce1bbe2ce
Submitter: Jenkins
Branch: master

commit ee1a836d4b8283905f364941ab649d8ce1bbe2ce
Author: Emilien Macchi <email address hidden>
Date: Thu Feb 9 11:41:09 2017 -0500

    nova: start compute after keystone endpoints/services

    The Nova team suggests this to avoid situations where nova-compute does not
    find the placement service in the service catalog, which could lead to
    compute resources not being registered.

    Change-Id: I9692ddbeb0d06889c1a18a7203ab9d0409bfa04b
    Closes-Bug: #1663273

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/431725
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=bb63f514d22ea82d17947a5972b4da16e66b5a36
Submitter: Jenkins
Branch: master

commit bb63f514d22ea82d17947a5972b4da16e66b5a36
Author: Emilien Macchi <email address hidden>
Date: Thu Feb 9 14:34:13 2017 -0500

    Run nova-cell_v2-discover_hosts at step 5

    We need to run nova-cell_v2-discover_hosts at the very end of the
    deployment because the nova database needs to be aware of all registered
    compute hosts.

    1. Move keystone resources management to step 3.
    2. Move nova-compute service to step 4.
    3. Move nova-placement-api to step 3.
    4. Run nova-cell_v2-discover_hosts at step 5 on one nova-api node.
    5. Run neutron-ovs-agent at step 5 to avoid racy deployments where
       it starts before neutron-server when doing HA deployments.

    With that change, we expect Nova to be aware of all compute services
    deployed in TripleO during an initial deployment.

    Depends-On: If943157b2b4afeb640919e77ef0214518e13ee15
    Change-Id: I6f2df2a83a248fb5dc21c2bd56029eb45b66ceae
    Related-Bug: #1663273
    Related-Bug: #1663458
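
The step assignments in the commit message above can be checked with a small Python sketch. The step numbers are taken from that message; the task labels and the `plan` and `runs_before` helpers are hypothetical and only model the ordering, not the actual puppet-tripleo code.

```python
from collections import defaultdict
from typing import Dict, List

# Step numbers as listed in the commit message; labels are illustrative.
STEPS: Dict[str, int] = {
    "keystone-resources": 3,
    "nova-placement-api": 3,
    "nova-compute": 4,
    "nova-cell_v2-discover_hosts": 5,
    "neutron-ovs-agent": 5,
}


def plan(steps: Dict[str, int]) -> List[List[str]]:
    """Group tasks by step and return the groups in ascending step order."""
    grouped = defaultdict(list)
    for task, step in steps.items():
        grouped[step].append(task)
    return [sorted(grouped[step]) for step in sorted(grouped)]


def runs_before(steps: Dict[str, int], first: str, second: str) -> bool:
    """True if `first` is assigned to a strictly earlier step than `second`."""
    return steps[first] < steps[second]
```

Under this model, `runs_before(STEPS, "nova-placement-api", "nova-compute")` holds, which is the guarantee the bug asks for. Tasks that share a step, such as the Keystone resources and nova-placement-api, have no ordering guarantee relative to each other in this sketch.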

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/instack-undercloud 6.0.0.0rc1

This issue was fixed in the openstack/instack-undercloud 6.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 6.2.0

This issue was fixed in the openstack/puppet-tripleo 6.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/523508

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/523508
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=c54c6222d8b517b8ee492caddaf47d2ef5780f31
Submitter: Zuul
Branch: master

commit c54c6222d8b517b8ee492caddaf47d2ef5780f31
Author: Alex Schultz <email address hidden>
Date: Tue Nov 28 12:15:14 2017 -0700

    Fix neutron agent start order

    In the baremetal deployment, we used to ensure that neutron-server was
    started prior to starting up the various agents. In the containerized
    deployment we need to ensure that we launch the agents after the server
    has been started. We can do this by configuring a start_order for each
    of the services.

    It should be noted that the ovs agent was actually configured to start
    in step5 on baremetal due to previous race conditions under HA
    deployments. This change leaves it in step4 but configures the
    start_order to be after the neutron-api service.

    Change-Id: I3794400ef5c8ae620961914831ff85e3438b0399
    Closes-Bug: #1734976
    Related-Bug: #1663273
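
The idea of ordering containers within a single step by `start_order` can be sketched in Python as follows. This is an illustration only: the `Container` type, the `launch_sequence` helper, the container names and the numeric `start_order` values are all assumptions, since the commit message does not give the real values.

```python
from typing import List, NamedTuple


class Container(NamedTuple):
    name: str
    step: int
    start_order: int


def launch_sequence(containers: List[Container]) -> List[str]:
    """Order containers by step, then by start_order within each step.

    The name is used as a final tie-breaker so the result is deterministic.
    """
    ordered = sorted(containers, key=lambda c: (c.step, c.start_order, c.name))
    return [c.name for c in ordered]


# Illustrative values only; the real start_order numbers are not given above.
EXAMPLE = [
    Container("neutron_ovs_agent", step=4, start_order=10),
    Container("neutron_api", step=4, start_order=0),
    Container("neutron_dhcp", step=4, start_order=10),
]
```

Here `launch_sequence(EXAMPLE)` places `neutron_api` first even though all three containers share step 4, which is the effect the commit describes: the agents stay in the same step but launch after the server.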

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/524056

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/pike)

Reviewed: https://review.openstack.org/524056
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6f250f2e05823d25db75287b921dc4f7a70ae542
Submitter: Zuul
Branch: stable/pike

commit 6f250f2e05823d25db75287b921dc4f7a70ae542
Author: Alex Schultz <email address hidden>
Date: Tue Nov 28 12:15:14 2017 -0700

    Fix neutron agent start order

    In the baremetal deployment, we used to ensure that neutron-server was
    started prior to starting up the various agents. In the containerized
    deployment we need to ensure that we launch the agents after the server
    has been started. We can do this by configuring a start_order for each
    of the services.

    It should be noted that the ovs agent was actually configured to start
    in step5 on baremetal due to previous race conditions under HA
    deployments. This change leaves it in step4 but configures the
    start_order to be after the neutron-api service.

    Change-Id: I3794400ef5c8ae620961914831ff85e3438b0399
    Closes-Bug: #1734976
    Related-Bug: #1663273
    (cherry picked from commit c54c6222d8b517b8ee492caddaf47d2ef5780f31)

tags: added: in-stable-pike