OpenStack-Ansible

Juno to Kilo upgrade: Containers seem to lose br-mgmt (eth1) configuration

Bug #1474585 reported by Bjoern on 2015-07-14

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
OpenStack-Ansible	Fix Released	Medium	Kevin Carter
Kilo	Fix Released	Medium	Kevin Carter	OpenStack-Ansible 11.1.0
Trunk	Fix Released	Medium	Kevin Carter

Bug Description

I noticed connection errors when talking to the containers while running run-upgrade.sh (picked 93c7ae4a6687f787ab7ba797307bbe7b1b2e63cc), and the containers no longer showed the additional NIC.
I will look into this more deeply, but I am reporting it early.

NAME                                          STATE    IPV4                        IPV6  AUTOSTART
----------------------------------------------------------------------------------------------------------------
cinder_api_container-eb9bffe2                 RUNNING  10.0.3.113                  -     YES (onboot, rpc)
infra01_galera_container-1a0b344b             RUNNING  10.0.3.235                  -     YES (onboot, rpc)
infra01_glance_container-07a97943             RUNNING  10.0.3.117                  -     YES (onboot, rpc)
infra01_heat_apis_container-d3157068          RUNNING  10.0.3.62                   -     YES (onboot, rpc)
infra01_heat_engine_container-7793c32b        RUNNING  10.0.3.192                  -     YES (onboot, rpc)
infra01_horizon_container-ffe8588e            RUNNING  10.0.3.88                   -     YES (onboot, rpc)
infra01_keystone_container-1bb980d8           RUNNING  10.0.3.90                   -     YES (onboot, rpc)
infra01_memcached_container-9896a13d          RUNNING  10.0.3.22                   -     YES (onboot, rpc)
infra01_neutron_agents_container-0ab3c5c0     RUNNING  10.0.3.180                  -     YES (onboot, rpc)
infra01_neutron_server_container-e177bc60     RUNNING  10.0.3.193                  -     YES (onboot, rpc)
infra01_nova_api_metadata_container-ed8d2cdf  RUNNING  10.0.3.230                  -     YES (onboot, rpc)
nova_api_os_compute_container-0c409acc        RUNNING  10.0.3.30                   -     YES (onboot, rpc)
nova_cert_container-a0570897                  RUNNING  10.0.3.40                   -     YES (onboot, rpc)
infra01_nova_conductor_container-476d4030     RUNNING  10.0.3.101                  -     YES (onboot, rpc)
infra01_nova_console_container-17501a4b       RUNNING  10.0.3.109, 172.29.238.157  -     YES (onboot, openstack)
infra01_nova_console_container-de2c1a4b       RUNNING  10.0.3.44, 172.29.239.203   -     YES (onboot, openstack)
infra01_nova_scheduler_container-39f61bdc     RUNNING  10.0.3.47                   -     YES (onboot, rpc)
infra01_rabbit_mq_container-634a2e62          RUNNING  10.0.3.240                  -     YES (onboot, rpc)
infra01_repo_container-11bc565d               RUNNING  10.0.3.140, 172.29.238.134  -     YES (onboot, openstack)
infra01_repo_container-ee7ffa45               RUNNING  10.0.3.106, 172.29.239.163  -     YES (onboot, openstack)
infra01_swift_proxy_container-b8a5368e        RUNNING  10.0.3.161                  -     YES (onboot, rpc)
infra01_utility_container-52da13b5            RUNNING  10.0.3.150                  -     YES (onboot, rpc)
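
To spot affected containers quickly, a listing like this one can be scanned for running containers that have no management address. The following is only a rough sketch: the function name is hypothetical, and the management network is assumed to be 172.29.236.0/22 (chosen because it covers the 172.29.238.x and 172.29.239.x addresses in the report, not because the report states it).

```python
import ipaddress

# Assumption: the br-mgmt network is 172.29.236.0/22. Adjust to the real
# deployment; the bug report does not state the network range.
MGMT_NET = ipaddress.ip_network("172.29.236.0/22")


def containers_missing_mgmt(listing, mgmt_net=MGMT_NET):
    """Return names of RUNNING containers with no address inside mgmt_net."""
    missing = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, and non-running containers.
        if len(fields) < 3 or fields[1] != "RUNNING":
            continue
        name = fields[0]
        addrs = []
        for token in fields[2:]:
            try:
                addrs.append(ipaddress.ip_address(token.rstrip(",")))
            except ValueError:
                break  # first non-address token ends the IPV4 column
        if not any(addr in mgmt_net for addr in addrs):
            missing.append(name)
    return missing
```

Fed the listing as a single string, this would return every container that only has its 10.0.3.x address.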

Bjoern (bjoern-t) wrote on 2015-07-14:

FYI, the containers had eth1.cfg in place inside /etc/network/interfaces.d, and a normal restart fixed the issue until I reran run-upgrade.sh.

Bjoern (bjoern-t) wrote on 2015-07-15:

Running the standard playbooks has not caused any issues so far, so it might only be related to re-running the upgrade script.

Bjoern (bjoern-t) wrote on 2015-07-15:

I've noticed that one container lost the interface after running /tmp/fix_container_interfaces.yml, which removes the interface files; if the container then reboots, it loses connectivity.

Jul 15 21:23:06 infra01-repo_container-11bc565d ansible-<stdin>: Invoked with directory_mode=None force=False remote_src=None path=/etc/network/interfaces.d/eth1.cfg owner=None follow=False group=None state=absent content=NOT_LOGGING_PARAMETER serole=None diff_peek=None setype=None selevel=None original_basename=None regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=None backup=None
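
When scanning syslog for entries like this one, the module arguments can be pulled out of the `Invoked with` line to check whether a file under /etc/network/interfaces.d was removed. A minimal sketch follows; both function names are hypothetical, and it assumes no argument value contains spaces, which holds for the line above but not for Ansible log lines in general.

```python
def parse_invocation(log_line):
    """Split the key=value pairs that follow 'Invoked with' into a dict.

    Values are kept as strings (for example "None" and "False").
    """
    _, _, args = log_line.partition("Invoked with ")
    params = {}
    for pair in args.split():
        key, sep, value = pair.partition("=")
        if sep:
            params[key] = value
    return params


def removes_interface_file(log_line, directory="/etc/network/interfaces.d/"):
    """True if the logged call deletes a file below `directory`."""
    params = parse_invocation(log_line)
    return (
        params.get("state") == "absent"
        and params.get("path", "").startswith(directory)
    )
```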

Bjoern (bjoern-t) wrote on 2015-07-15:

I seem to be able to reproduce the issue by running /tmp/fix_container_interfaces.yml and lxc-containers-create.yml, in that order.

Kevin Carter (kevin-carter) wrote on 2015-07-21:

In-flight review to fix this: https://review.openstack.org/#/c/202821/

Changed in openstack-ansible:
status:	New → In Progress
milestone:	none → 11.1.0
importance:	Undecided → Low
importance:	Low → Medium
assignee:	nobody → Kevin Carter (kevin-carter)

OpenStack Infra (hudson-openstack) wrote on 2015-07-21: Fix merged to os-ansible-deployment (master)

Reviewed: https://review.openstack.org/202821
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=2badb5341f79fdec0e983aadf81f527a645416d8
Submitter: Jenkins
Branch: master

commit 2badb5341f79fdec0e983aadf81f527a645416d8
Author: kevin <email address hidden>
Date: Thu Jul 16 17:26:43 2015 -0500

Fix general upgrade issues for Juno > Kilo

    This change adds a container task to ensure that container networks are up
    and using the new configs as written by the lxc-container-create play. This
    should resolve an issue where the container networks could be in a down
    state after an upgrade due to a configuration file change.

    A run function was also added to make it possible for a deployer to know
    where in the upgrade process something might have failed and the order in
    which the tasks may need to be rerun to continue the upgrade.

    Change-Id: If02c4e269375368b6f613c5a9e3c947dddbd27f9
    Closes-Bug: #1474585
    Partial-Bug: #1475727
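
The "run function" idea from the commit message (report where the upgrade failed and what has to be rerun, in order) can be illustrated roughly as follows. This is not the code from the commit, only a sketch of the concept in Python; `run_steps` and the step names in the usage note are hypothetical.

```python
def run_steps(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Returns (completed, failed, remaining); `failed` is None when every
    step succeeded.
    """
    completed = []
    for index, (name, action) in enumerate(steps):
        try:
            action()
        except Exception as exc:
            remaining = [step_name for step_name, _ in steps[index + 1:]]
            # Tell the deployer what failed and the order to resume in.
            print(f"Step failed: {name} ({exc})")
            print("Rerun in this order: " + ", ".join([name] + remaining))
            return completed, name, remaining
        completed.append(name)
    return completed, None, []
```

A deployer-facing script would pass its upgrade tasks as the `steps` list, for example `[("step-a", do_a), ("step-b", do_b)]`, and resume from the reported step after fixing the failure.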

Changed in openstack-ansible:
status:	In Progress → Fix Committed

Kevin Carter (kevin-carter) on 2015-07-22

Changed in openstack-ansible:
status:	Fix Committed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2015-07-22: Fix merged to os-ansible-deployment (kilo)

Reviewed: https://review.openstack.org/204278
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=31b2edfddcea321f25d99556e3d1844fa7359b58
Submitter: Jenkins
Branch: kilo

commit 31b2edfddcea321f25d99556e3d1844fa7359b58
Author: kevin <email address hidden>
Date: Thu Jul 16 17:26:43 2015 -0500

Fix general upgrade issues for Juno > Kilo

    Change-Id: If02c4e269375368b6f613c5a9e3c947dddbd27f9
    Closes-Bug: #1474585
    Partial-Bug: #1475727
    (cherry picked from commit 2badb5341f79fdec0e983aadf81f527a645416d8)