gate failures: Containers not accessible via ssh

Bug #1490142 reported by Jesse Pretorius
This bug affects 1 person
Affects            Status        Importance  Assigned to      Milestone
OpenStack-Ansible  Fix Released  Critical    Jesse Pretorius
Kilo               Fix Released  Critical    Jesse Pretorius
Trunk              Fix Released  Critical    Jesse Pretorius

Bug Description

Since https://review.openstack.org/216301 merged, the gate checks have failed consistently because Ansible cannot access the containers via ssh.

The issue is caused directly by the above change: the AppArmor profile is changed in a playbook pre-task, which restarts the container, but the playbook does not wait for the container to be accessible via ssh again before continuing.
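
For illustration only, the missing wait can be sketched in Python. This is not the playbook change itself; the function name `wait_for_ssh`, its parameters, and the default values are hypothetical choices made for this sketch. It retries a TCP connection to the container's ssh port and treats the host as reachable only once the server sends an identification line starting with "SSH-".

```python
import socket
import time


def wait_for_ssh(host, port=22, retries=3, delay=5.0, timeout=10.0):
    """Return True once an SSH banner is read from host:port, else False.

    Hypothetical helper: the name and defaults are illustrative assumptions,
    not taken from the playbooks.
    """
    for attempt in range(1, retries + 1):
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.settimeout(timeout)
                banner = sock.recv(64)
                # An SSH server identifies itself with a line starting "SSH-".
                if banner.startswith(b"SSH-"):
                    return True
        except OSError:
            # Refused or timed out: the container may still be restarting.
            pass
        if attempt < retries:
            time.sleep(delay)
    return False
```

A caller would run something like `wait_for_ssh("10.0.3.15")` (a made-up address) after the profile change and abort if it returns False. Checking for the banner rather than only an open port avoids continuing while sshd is still starting up.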

Tags: in-kilo
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (master)

Fix proposed to branch: master
Review: https://review.openstack.org/218572

Changed in openstack-ansible:
status: New → In Progress
Jesse Pretorius (jesse-pretorius) wrote :
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/219638

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (master)

Reviewed: https://review.openstack.org/218572
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=a40cb5811822181369ee3269bc57d8bd19f05913
Submitter: Jenkins
Branch: master

commit a40cb5811822181369ee3269bc57d8bd19f05913
Author: Jesse Pretorius <email address hidden>
Date: Sat Aug 29 13:23:54 2015 +0100

    Wait for container ssh after apparmor profile update

    This patch adds a wait for the container's sshd to be available
    after the container's apparmor profile is updated. When the
    profile is updated the container is restarted, so this wait is
    essential to the success of the playbook's completion.

    It also includes 3 retries, which have been found to improve the
    rate of success.

    Due to an upstream change in behaviour with netaddr 0.7.16 we
    need to pin the package to a lower version until Neutron is
    adjusted and we bump the Neutron SHA.

    Change-Id: I30575ee31929b0c9af6353b7255cdfb6cebd2104
    Closes-Bug: #1490142

Changed in openstack-ansible:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (kilo)

Reviewed: https://review.openstack.org/217014
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=5b9c77a55709925fea349ab128288218a50ef5a8
Submitter: Jenkins
Branch: kilo

commit 5b9c77a55709925fea349ab128288218a50ef5a8
Author: kevin <email address hidden>
Date: Mon Aug 24 16:24:02 2015 +0100

    Change AppArmor profile application order

    This patch is a combination of two patches committed to master as the
    first patch on its own results in continual gate check fails:

    Patch 1:

    Removed default lxc profile on container create

    Having the lxc container create role drop the lxc-openstack apparmor
    profile on all containers any time it is executed leads to the possibility
    of the lxc container create task overwriting the running profile on a given
    container. If this happens it is likely to cause service interruption until the
    correct profile is loaded for all containers affected by the action.

    To fix this issue the default "lxc-openstack" profile has been removed from the
    lxc container create task and added to all plays that are known to be executed
    within an lxc container. This will ensure that the profile is untouched on
    subsequent runs of the lxc-container-create.yml play.

    Closes-Bug: 1487130
    (cherry picked from commit ffb701f8a3a325e0c321fb2d3e37eea25e66a8af)

    Patch 2:

    Wait for container ssh after apparmor profile update

    This patch adds a wait for the container's sshd to be available
    after the container's apparmor profile is updated. When the
    profile is updated the container is restarted, so this wait is
    essential to the success of the playbook's completion.

    It also includes 3 retries, which have been found to improve the
    rate of success.

    Due to an upstream change in behaviour with netaddr 0.7.16 we
    need to pin the package to a lower version until Neutron is
    adjusted and we bump the Neutron SHA.

    Closes-Bug: #1490142
    (cherry picked from commit a40cb5811822181369ee3269bc57d8bd19f05913)

    Change-Id: Ifa4640be60c18f1232cc7c8b281fb1dfc0119e56

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (master)

Reviewed: https://review.openstack.org/219638
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=91fc87ae3825ac4e59b2e6e18e4e7ff8e3dc1e5b
Submitter: Jenkins
Branch: master

commit 91fc87ae3825ac4e59b2e6e18e4e7ff8e3dc1e5b
Author: Jesse Pretorius <email address hidden>
Date: Wed Sep 2 11:51:45 2015 +0100

    Additional retries for ssh wait check

    This patch adds additional retries to the ssh wait check
    tasks which were unintentionally omitted from
    https://review.openstack.org/218572

    Change-Id: Id8f7df5e283a9f61373f1bfb167a4c0bd098cc25
    Closes-Bug: #1490142

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (kilo)

Fix proposed to branch: kilo
Review: https://review.openstack.org/220430

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (kilo)

Reviewed: https://review.openstack.org/220430
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=da2d9d0e1e29277abb228118820e3d89cf214f28
Submitter: Jenkins
Branch: kilo

commit da2d9d0e1e29277abb228118820e3d89cf214f28
Author: Jesse Pretorius <email address hidden>
Date: Wed Sep 2 11:51:45 2015 +0100

    Additional retries for ssh wait check

    This patch adds additional retries to the ssh wait check
    tasks which were unintentionally omitted from
    https://review.openstack.org/218572

    Change-Id: Id8f7df5e283a9f61373f1bfb167a4c0bd098cc25
    Closes-Bug: #1490142
    (cherry picked from commit 91fc87ae3825ac4e59b2e6e18e4e7ff8e3dc1e5b)

tags: added: in-kilo
Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.11

This issue was fixed in the openstack/openstack-ansible 11.2.11 release.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/openstack-ansible 11.2.12

This issue was fixed in the openstack/openstack-ansible 11.2.12 release.

Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.14

This issue was fixed in the openstack/openstack-ansible 11.2.14 release.
