Containerized compute node loses connectivity during deployment

Bug #1646897 reported by Martin André
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Steve Baker

Bug Description

Since [1], the containerized compute node loses connectivity during the deployment, causing the deployment to time out.

This is likely due to our usage of os-net-config [2] in the heat-agents container.

[1] https://github.com/openstack/tripleo-heat-templates/commit/2985cd9a3a04acfe069c063c65ebf487a1413388
[2] https://github.com/openstack/tripleo-common/blob/master/heat_docker_agent/run-os-net-config

Martin André (mandre)
Changed in tripleo:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Martin André (mandre)
Changed in tripleo:
milestone: none → ocata-3
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

AFAIK this is blocking us from having a containers job in CI.

Changed in tripleo:
assignee: Martin André (mandre) → Steve Baker (steve-stevebaker)
status: Triaged → In Progress
Steve Baker (steve-stevebaker) wrote :
wes hayutin (weshayutin) wrote :

hrm... I'm trying this patch out but hitting an error.
For some reason the route configs are added to /etc/sysconfig/network-scripts, but the ifcfg files for the devices themselves are not.

details are here: http://paste.openstack.org/show/592159/

[heat-admin@overcloud-novacompute-0 ~]$ sudo ls /etc/sysconfig/network-scripts/route-br-ex
/etc/sysconfig/network-scripts/route-br-ex

[heat-admin@overcloud-novacompute-0 ~]$ sudo ls /etc/sysconfig/network-scripts/ifcfg-br-ex
ls: cannot access /etc/sysconfig/network-scripts/ifcfg-br-ex: No such file or directory

[heat-admin@overcloud-novacompute-0 ~]$ sudo ls /etc/sysconfig/network-scripts/ifcfg*
/etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-lo

This prevents the device from starting and ultimately causes the deployment to fail, afaict.

[2016/12/12 10:31:03 PM] [INFO] running ifup on bridge: br-ex
Traceback (most recent call last):
  File "/usr/bin/os-net-config", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/os_net_config/cli.py", line 187, in main
    activate=not opts.no_activate)
  File "/usr/lib/python2.7/site-packages/os_net_config/impl_ifcfg.py", line 931, in apply
    self.ifup(bridge, iftype='bridge')
  File "/usr/lib/python2.7/site-packages/os_net_config/__init__.py", line 244, in ifup
    self.execute(msg, '/sbin/ifup', interface)
  File "/usr/lib/python2.7/site-packages/os_net_config/__init__.py", line 224, in execute
    processutils.execute(cmd, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 394, in execute
    cmd=sanitized_cmd)
oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: /sbin/ifup br-ex
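
The mismatch shown in the listings above (a route-br-ex file with no ifcfg-br-ex next to it) can be checked for mechanically. The following is a minimal Python 3 sketch, not part of os-net-config: the function name routes_without_ifcfg is hypothetical, and the demo recreates the file names from the listings in a throwaway directory instead of touching /etc/sysconfig/network-scripts.

```python
import os
import tempfile


def routes_without_ifcfg(scripts_dir):
    """Return devices that have a route-<dev> file but no ifcfg-<dev> file."""
    names = set(os.listdir(scripts_dir))
    missing = []
    for name in sorted(names):
        if name.startswith("route-"):
            device = name[len("route-"):]
            if "ifcfg-" + device not in names:
                missing.append(device)
    return missing


def demo():
    # Recreate the file names from the listings above in a temporary directory.
    with tempfile.TemporaryDirectory() as scripts_dir:
        for name in ("ifcfg-eth0", "ifcfg-lo", "route-br-ex"):
            open(os.path.join(scripts_dir, name), "w").close()
        return routes_without_ifcfg(scripts_dir)


if __name__ == "__main__":
    print(demo())
```

On a real node the function would presumably be pointed at /etc/sysconfig/network-scripts; any device it reports is one that ifup is likely to fail on, as in the traceback above.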

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/407289
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=bb73874310ce7a7a2a7da6a848dbd60e4e0aff12
Submitter: Jenkins
Branch: master

commit bb73874310ce7a7a2a7da6a848dbd60e4e0aff12
Author: Steve Baker <email address hidden>
Date: Tue Dec 6 16:27:04 2016 +1300

    docker: don't use custom run-os-net-config

    The script run-os-net-config[1] copies in ifcfg-* from the host before
    running os-net-config. Apparently it was done this way because the
    other scripts in /etc/sysconfig/network-scripts/ differed between host
    and agent container. This should be less of an issue now that host and
    heat-agents run centos-7 (even when the host is atomic)

    tripleo-heat-templates recently changed to running os-net-config in a
    deployment script instead of an os-refresh-config script [2]. This
    means that our current run-os-net-config approach is currently
    resulting in os-net-config being executed twice.

    Another issue with run-os-net-config is that it copies ifcfg-* from
    host to container, but not back again. This means that rebooting the
    server will result in unconfigured interfaces until os-net-config is
    somehow run again.

    This change bind mounts /etc/sysconfig/network-scripts/ from the host
    and uses the conventional approach to running os-refresh-config.

    This may fix the issue where compute nodes are losing network
    connectivity, so
    Closes-Bug: #1646897

    [1] http://git.openstack.org/cgit/openstack/tripleo-common/tree/heat_docker_agent/run-os-net-config
    [2] I0ed08332cfc49a579de2e83960f0d8047690b97a

    Change-Id: I763fc8d8e3eb10ac64d33e46c92888d211003e72
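
The difference between the two approaches described in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual fix: configure_bridge is a hypothetical stand-in for os-net-config that only writes one file, temporary directories stand in for the host and container file systems, and operating directly on the host directory stands in for a bind mount.

```python
import glob
import os
import shutil
import tempfile


def configure_bridge(scripts_dir):
    """Hypothetical stand-in for os-net-config: write an ifcfg file for br-ex."""
    with open(os.path.join(scripts_dir, "ifcfg-br-ex"), "w") as f:
        f.write("DEVICE=br-ex\n")


def run_with_copy(host_dir):
    """Old approach: copy ifcfg-* into a private directory and never copy back."""
    with tempfile.TemporaryDirectory() as container_dir:
        for path in glob.glob(os.path.join(host_dir, "ifcfg-*")):
            shutil.copy(path, container_dir)
        configure_bridge(container_dir)
    # The new file lived only in container_dir, which is gone now.
    return sorted(os.listdir(host_dir))


def run_with_shared_dir(host_dir):
    """New approach: the tool writes into the host directory itself."""
    configure_bridge(host_dir)
    return sorted(os.listdir(host_dir))


def demo():
    with tempfile.TemporaryDirectory() as host_dir:
        open(os.path.join(host_dir, "ifcfg-eth0"), "w").close()
        after_copy = run_with_copy(host_dir)
        after_shared = run_with_shared_dir(host_dir)
    return after_copy, after_shared


if __name__ == "__main__":
    print(demo())
```

In this model the copy-based run leaves the host directory with only ifcfg-eth0, which corresponds to the reboot problem described in the commit message, while the shared-directory run leaves ifcfg-br-ex on the host as well.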

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 6.0.0.0b2

This issue was fixed in the openstack/tripleo-heat-templates 6.0.0.0b2 development milestone.
