Juno to Kilo upgrade: Containers seem to lose br-mgmt (eth1) configuration

Bug #1474585 reported by Bjoern
This bug affects 1 person
Affects            Status        Importance  Assigned to   Milestone
OpenStack-Ansible  Fix Released  Medium      Kevin Carter
Kilo               Fix Released  Medium      Kevin Carter
Trunk              Fix Released  Medium      Kevin Carter

Bug Description

I noticed connection errors when talking to the containers while running run-upgrade.sh (picked 93c7ae4a6687f787ab7ba797307bbe7b1b2e63cc), and the containers no longer showed the additional NIC.
I will look into this issue more deeply, but I am reporting it early.

NAME                                            STATE    IPV4                        IPV6  AUTOSTART
------------------------------------------------------------------------------------------------------------------
infra01_cinder_api_container-eb9bffe2           RUNNING  10.0.3.113                  -     YES (onboot, rpc)
infra01_galera_container-1a0b344b               RUNNING  10.0.3.235                  -     YES (onboot, rpc)
infra01_glance_container-07a97943               RUNNING  10.0.3.117                  -     YES (onboot, rpc)
infra01_heat_apis_container-d3157068            RUNNING  10.0.3.62                   -     YES (onboot, rpc)
infra01_heat_engine_container-7793c32b          RUNNING  10.0.3.192                  -     YES (onboot, rpc)
infra01_horizon_container-ffe8588e              RUNNING  10.0.3.88                   -     YES (onboot, rpc)
infra01_keystone_container-1bb980d8             RUNNING  10.0.3.90                   -     YES (onboot, rpc)
infra01_memcached_container-9896a13d            RUNNING  10.0.3.22                   -     YES (onboot, rpc)
infra01_neutron_agents_container-0ab3c5c0       RUNNING  10.0.3.180                  -     YES (onboot, rpc)
infra01_neutron_server_container-e177bc60       RUNNING  10.0.3.193                  -     YES (onboot, rpc)
infra01_nova_api_metadata_container-ed8d2cdf    RUNNING  10.0.3.230                  -     YES (onboot, rpc)
infra01_nova_api_os_compute_container-0c409acc  RUNNING  10.0.3.30                   -     YES (onboot, rpc)
infra01_nova_cert_container-a0570897            RUNNING  10.0.3.40                   -     YES (onboot, rpc)
infra01_nova_conductor_container-476d4030       RUNNING  10.0.3.101                  -     YES (onboot, rpc)
infra01_nova_console_container-17501a4b         RUNNING  10.0.3.109, 172.29.238.157  -     YES (onboot, openstack)
infra01_nova_console_container-de2c1a4b         RUNNING  10.0.3.44, 172.29.239.203   -     YES (onboot, openstack)
infra01_nova_scheduler_container-39f61bdc       RUNNING  10.0.3.47                   -     YES (onboot, rpc)
infra01_rabbit_mq_container-634a2e62            RUNNING  10.0.3.240                  -     YES (onboot, rpc)
infra01_repo_container-11bc565d                 RUNNING  10.0.3.140, 172.29.238.134  -     YES (onboot, openstack)
infra01_repo_container-ee7ffa45                 RUNNING  10.0.3.106, 172.29.239.163  -     YES (onboot, openstack)
infra01_swift_proxy_container-b8a5368e          RUNNING  10.0.3.161                  -     YES (onboot, rpc)
infra01_utility_container-52da13b5              RUNNING  10.0.3.150                  -     YES (onboot, rpc)
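
A listing like this can be checked mechanically. The following is a minimal Python sketch, not part of the original report, that flags running containers whose only IPv4 address is on the LXC bridge network. It assumes that the 10.0.3.0/24 range seen in the listing is the LXC bridge network and that any address outside it belongs to br-mgmt (eth1); the function name and the two sample rows are illustrative.

```python
import ipaddress
import re

# Assumption: 10.0.3.0/24 is the LXC bridge network seen in the listing;
# any address outside it is treated as the br-mgmt (eth1) address.
LXC_BRIDGE_NET = ipaddress.ip_network("10.0.3.0/24")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def containers_missing_mgmt(listing):
    """Return names of RUNNING containers with no address outside LXC_BRIDGE_NET."""
    missing = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[1] != "RUNNING":
            continue  # skips the header, the separator, and non-running containers
        addresses = [ipaddress.ip_address(a) for a in IPV4_RE.findall(line)]
        if not any(addr not in LXC_BRIDGE_NET for addr in addresses):
            missing.append(fields[0])
    return missing


SAMPLE = """\
NAME STATE IPV4 IPV6 AUTOSTART
infra01_galera_container-1a0b344b RUNNING 10.0.3.235 - YES (onboot, rpc)
infra01_repo_container-11bc565d RUNNING 10.0.3.140, 172.29.238.134 - YES (onboot, openstack)
"""

print(containers_missing_mgmt(SAMPLE))
```

With the two sample rows, only the galera container is reported, since the repo container still has a 172.29.238.x address.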

Tags: in-kilo
Bjoern (bjoern-t) wrote :

FYI, the containers had eth1.cfg in place inside /etc/network/interfaces.d, and a normal restart fixed the issue until I reran run-upgrade.sh.

Bjoern (bjoern-t) wrote :

Running the standard playbooks has not caused any issue so far, so it might only be related to re-running the upgrade script.

Bjoern (bjoern-t) wrote :

I've noticed that one container lost the interface after running /tmp/fix_container_interfaces.yml, which does in fact remove the interface files; if the container then reboots, it loses connectivity.

Jul 15 21:23:06 infra01-repo_container-11bc565d ansible-<stdin>: Invoked with directory_mode=None force=False remote_src=None path=/etc/network/interfaces.d/eth1.cfg owner=None follow=False group=None state=absent content=NOT_LOGGING_PARAMETER serole=None diff_peek=None setype=None selevel=None original_basename=None regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=None backup=None
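
To spot such removals across many log lines, the key=value arguments can be pulled out of each entry. This is a rough Python sketch under the assumption that the arguments follow the text "Invoked with " and contain no embedded spaces, as in the entry above; the helper name is hypothetical and the sample line is an abbreviated copy of that entry.

```python
def parse_module_args(log_line):
    """Split the key=value tokens after 'Invoked with ' into a dict.

    Naive on purpose: a value containing spaces would be split incorrectly.
    """
    _, _, arg_text = log_line.partition("Invoked with ")
    return dict(token.split("=", 1) for token in arg_text.split() if "=" in token)


# Abbreviated copy of the log entry from the comment above.
LOG_LINE = (
    "Jul 15 21:23:06 infra01-repo_container-11bc565d ansible-<stdin>: "
    "Invoked with directory_mode=None force=False "
    "path=/etc/network/interfaces.d/eth1.cfg state=absent recurse=False"
)

args = parse_module_args(LOG_LINE)
print(args["path"], args["state"])
```

A line where `state` is `absent` and `path` points into /etc/network/interfaces.d marks an invocation that deleted an interface file.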

Bjoern (bjoern-t) wrote :

I seem to be able to reproduce the issue when running /tmp/fix_container_interfaces.yml and lxc-containers-create.yml in that order

Kevin Carter (kevin-carter) wrote :

In-flight review to fix this: https://review.openstack.org/#/c/202821/

Changed in openstack-ansible:
status: New → In Progress
milestone: none → 11.1.0
importance: Undecided → Low
importance: Low → Medium
assignee: nobody → Kevin Carter (kevin-carter)
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (master)

Reviewed: https://review.openstack.org/202821
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=2badb5341f79fdec0e983aadf81f527a645416d8
Submitter: Jenkins
Branch: master

commit 2badb5341f79fdec0e983aadf81f527a645416d8
Author: kevin <email address hidden>
Date: Thu Jul 16 17:26:43 2015 -0500

    Fix general upgrade issues for Juno > Kilo

    This change adds a container task to ensure that container networks are up
    and using the new configs as written by the lxc-container-create play. This
    should resolve an issue where the container networks could be in a down
    state after an upgrade due to a configuration file change.

    A run function was also added to make it possible for a deployer to know
    where in the upgrade process something might have failed and the order in
    which the tasks may need to be rerun to continue the upgrade.

    Change-Id: If02c4e269375368b6f613c5a9e3c947dddbd27f9
    Closes-Bug: #1474585
    Partial-Bug: #1475727
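
The run function itself is not shown in this report. As an illustration of the idea only, the Python sketch below runs named steps in order, stops at the first failure, and reports which step failed and which steps would need to be rerun, in order. The function, the step names, and the simulated failure are all hypothetical and are not taken from the upgrade script.

```python
def run_steps(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Returns (failed_step, remaining_steps), or (None, []) if every step succeeded.
    """
    for index, (name, action) in enumerate(steps):
        try:
            action()
        except Exception as exc:
            # The failed step and everything after it still need to run.
            remaining = [step_name for step_name, _ in steps[index:]]
            print(f"Step failed: {name} ({exc})")
            print("Rerun in this order to continue:", ", ".join(remaining))
            return name, remaining
    return None, []


def failing_step():
    raise RuntimeError("simulated failure")


# Hypothetical step names, for illustration only.
result = run_steps([
    ("step-one", lambda: None),
    ("step-two", failing_step),
    ("step-three", lambda: None),
])
```

Reporting the remaining steps in their original order is what lets a deployer resume from the point of failure instead of restarting the whole upgrade.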

Changed in openstack-ansible:
status: In Progress → Fix Committed
Changed in openstack-ansible:
status: Fix Committed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (kilo)

Reviewed: https://review.openstack.org/204278
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=31b2edfddcea321f25d99556e3d1844fa7359b58
Submitter: Jenkins
Branch: kilo

commit 31b2edfddcea321f25d99556e3d1844fa7359b58
Author: kevin <email address hidden>
Date: Thu Jul 16 17:26:43 2015 -0500

    Fix general upgrade issues for Juno > Kilo

    This change adds a container task to ensure that container networks are up
    and using the new configs as written by the lxc-container-create play. This
    should resolve an issue where the container networks could be in a down
    state after an upgrade due to a configuration file change.

    A run function was also added to make it possible for a deployer to know
    where in the upgrade process something might have failed and the order in
    which the tasks may need to be rerun to continue the upgrade.

    Change-Id: If02c4e269375368b6f613c5a9e3c947dddbd27f9
    Closes-Bug: #1474585
    Partial-Bug: #1475727
    (cherry picked from commit 2badb5341f79fdec0e983aadf81f527a645416d8)

tags: added: in-kilo
Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.11

This issue was fixed in the openstack/openstack-ansible 11.2.11 release.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/openstack-ansible 11.2.12

This issue was fixed in the openstack/openstack-ansible 11.2.12 release.

Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.14

This issue was fixed in the openstack/openstack-ansible 11.2.14 release.
