Docker default bridge conflict with network address 172.17.0.0/16

Bug #1996916 reported by Jim Gauld
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Jim Gauld

Bug Description

Brief Description
If a customer provisions a network with address 172.17.0.0/16 (or a similar network) and its gateway address is 172.17.0.1, this IP address conflicts with the docker0 bridge.

controller-0:~$ ifconfig docker0
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
        inet6 fe80::42:14ff:fe25:73b prefixlen 64 scopeid 0x20<link>
        ether 02:42:14:25:07:3b txqueuelen 0 (Ethernet)
        RX packets 499 bytes 40292 (39.3 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 502 bytes 47544 (46.4 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
controller-0:~$

As a result, packets from the customer's gateway (GW) are lost, which causes communication errors between the GW and the application pods.
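
A minimal sketch of how a planned network could be checked against the docker0 defaults before provisioning. It uses only Python's standard ipaddress module; the helper name conflicts_with_docker0 is hypothetical and not part of StarlingX, and the docker0 values are the ones shown in the ifconfig output above.

```python
import ipaddress

# docker0 defaults taken from the ifconfig output in this report.
DOCKER0_NETWORK = ipaddress.ip_network("172.17.0.0/16")
DOCKER0_ADDRESS = ipaddress.ip_address("172.17.0.1")


def conflicts_with_docker0(network_cidr, gateway):
    """Return (overlaps, gateway_clash) for a planned network.

    Hypothetical helper for illustration only.
    overlaps:      the planned subnet shares addresses with docker0's subnet.
    gateway_clash: the planned gateway is exactly docker0's own address.
    """
    network = ipaddress.ip_network(network_cidr, strict=False)
    overlaps = network.overlaps(DOCKER0_NETWORK)
    gateway_clash = ipaddress.ip_address(gateway) == DOCKER0_ADDRESS
    return overlaps, gateway_clash


if __name__ == "__main__":
    # The network from this report: both checks are True.
    print(conflicts_with_docker0("172.17.0.0/16", "172.17.0.1"))
    # An unrelated example network: both checks are False.
    print(conflicts_with_docker0("192.168.204.0/24", "192.168.204.1"))
```

A "similar network" in the sense of this report would be any subnet for which the first value is True, for example a /24 inside 172.17.0.0/16.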

Severity
--------
Critical. Applications not usable due to the defect.

Steps to Reproduce
------------------
1: Launch a pod on any of the controllers.
2: Ping 172.17.0.1.
  -> The ping gets a response.
3: Capture packets on the OAM interface.
  -> No packet with destination IP address 172.17.0.1 is captured.

Expected Behavior
------------------
When pinging 172.17.0.1, the OAM interface captures packets whose destination address is 172.17.0.1.

Actual Behavior
----------------
docker0 responds to 172.17.0.1 because this is the default IP address set by docker.

Reproducibility
---------------
100% with this specific Network address.

System Configuration
--------------------
Any configuration. IPv4

Branch/Pull Time/Commit
-----------------------
Day one issue.

Last Pass
---------
This is the first time this specific network address has been used.

Timestamp/Logs
--------------
-

Test Activity
-------------
Installation on commercial network.

Workaround
----------
On each node with docker.service active:
Manually modify /etc/docker/daemon.json and add the key-value pair "bridge": "none".
e.g.,
{
    "bridge": "none",
    "insecure-registries": []
}

NOTE: this file is configured via puppet template:
stx-puppet/puppet-manifests/src/modules/platform/templates/insecuredockerregistry.conf.erb

Restart docker.service
sudo pmon-restart dockerd

Verify the "bridge" network no longer exists.
sudo docker network ls
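
The manual edit in this workaround could be scripted along the following lines. This is only a sketch under assumptions: the helper name set_bridge_none is hypothetical, it must run with enough privileges to write the file, and because the file is generated from the puppet template noted above, the change may be overwritten when puppet reapplies its configuration. The restart and verification commands listed above still have to be run afterwards.

```python
import json
from pathlib import Path


def set_bridge_none(path="/etc/docker/daemon.json"):
    """Ensure daemon.json contains "bridge": "none", keeping other keys.

    Hypothetical helper for illustration only.
    Returns True if the file was changed, False if it was already set.
    """
    config_path = Path(path)
    text = config_path.read_text().strip() if config_path.exists() else ""
    # A missing or empty file is treated as an empty configuration.
    config = json.loads(text) if text else {}

    if config.get("bridge") == "none":
        return False

    config["bridge"] = "none"
    config_path.write_text(json.dumps(config, indent=4) + "\n")
    return True


if __name__ == "__main__":
    if set_bridge_none():
        print("daemon.json updated; restart dockerd to apply")
    else:
        print("daemon.json already sets bridge to none")
```

Existing keys such as "insecure-registries" are preserved, and running the helper a second time makes no further change.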

Jim Gauld (jgauld)
Changed in starlingx:
assignee: nobody → Jim Gauld (jgauld)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/864923

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on stx-puppet (master)

Change abandoned by "Jim Gauld <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/864923
Reason: docker.service and dockerdistribution.service are coupled, and there is an ordering and dependency problem: dockerdistribution updates daemon.json after the docker service starts, does not properly restart the service, and does not configure the file on storage hosts. Although not technically required, this also doesn't handle docker configuration during bootstrap.

Using the config-files docker service override to redefine the ExecStart option with "--bridge=none" appended is now the preferred method to set this. Note that this option is generic, but the options in the default docker.service differ between CentOS and Debian, so an OS-specific override file is required.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config-files (master)
OpenStack Infra (hudson-openstack) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to config-files (master)

Reviewed: https://review.opendev.org/c/starlingx/config-files/+/865330
Committed: https://opendev.org/starlingx/config-files/commit/4624ec3c459ace85822a07f7dcfbcf3379b47052
Submitter: "Zuul (22348)"
Branch: master

commit 4624ec3c459ace85822a07f7dcfbcf3379b47052
Author: Jim Gauld <email address hidden>
Date: Tue Nov 22 15:10:49 2022 -0500

    CentOS: Remove docker network bridge default

    This disables the docker network bridge that is created by default
    when no bridge options are provided by docker.service or daemon.json.
    Since docker bridge is not used, it can be safely removed.

    The docker.service file is provided by RPM docker-ce, i.e.,
    rpm -q --whatprovides /usr/lib/systemd/system/docker.service
    docker-ce-18.09.6-3.el7.x86_64

    This file contains the default ExecStart:
    [Service]
    ExecStart=/usr/bin/dockerd -H fd:// \
     --containerd=/run/containerd/containerd.sock

    The ExecStart gets overridden by a Drop-In. The previous default
    setting gets wiped out using "ExecStart=", then the value is redefined
    with same options and "--bridge=none" appended.
      Drop-In: /etc/systemd/system/docker.service.d
               └─docker-stx-override.conf

    If a network has address 172.17.0.0/16 (or a similar network) and its
    gateway address is 172.17.0.1, this IP address conflicts with the
    docker0 bridge. This results in packet loss between GW and application
    pods.

    Closes-Bug: 1996916

    Test Plan:
    PASS: AIO-SX Fresh install ISO. Verify docker bridge not configured.
          i.e., 'sudo docker network ls'
    PASS: Designer in-service patch apply and remove (with this change).
          Verify docker bridge not configured.
          i.e., 'sudo docker network ls'

    Signed-off-by: Jim Gauld <email address hidden>
    Change-Id: Ibd0164002744f1bd56e14fdb53c5b9a935b1fcc4

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to config-files (master)

Reviewed: https://review.opendev.org/c/starlingx/config-files/+/865329
Committed: https://opendev.org/starlingx/config-files/commit/2f5eebfa22030c218109e44adcacad32fae3defc
Submitter: "Zuul (22348)"
Branch: master

commit 2f5eebfa22030c218109e44adcacad32fae3defc
Author: Jim Gauld <email address hidden>
Date: Tue Nov 22 15:10:30 2022 -0500

    Debian: Remove docker network bridge default

    This disables the docker network bridge that is created by default
    when no bridge options are provided by docker.service or daemon.json.
    Since docker bridge is not used, it can be safely removed.

    The docker.service file is provided by package docker.io, i.e.,
    dpkg-query -S /lib/systemd/system/docker.service
    docker.io: /lib/systemd/system/docker.service

    dpkg -s docker.io | grep Version
    Version: 20.10.5+dfsg1-1+deb11u1

    This file contains the default ExecStart:
    [Service]
    ExecStart=/usr/sbin/dockerd -H fd:// $DOCKER_OPTS

    The ExecStart gets overridden by a Drop-In. The previous default
    setting gets wiped out using "ExecStart=", then the value is redefined
    with same options and "--bridge=none" appended.
      Drop-In: /etc/systemd/system/docker.service.d
               └─docker-stx-override.conf

    If a network has address 172.17.0.0/16 (or a similar network) and its
    gateway address is 172.17.0.1, this IP address conflicts with the
    docker0 bridge. This results in packet loss between GW and application
    pods.

    Closes-Bug: 1996916

    Test Plan:
    PASS: AIO-SX Fresh install ISO. Verify docker bridge not configured.
          i.e., 'sudo docker network ls'
    PASS: STORAGE: Fresh install ISO. Verify docker bridge not configured.
          i.e., 'sudo docker network ls'

    Signed-off-by: Jim Gauld <email address hidden>
    Change-Id: Ied12dffd3d2894c05bd174ea937ae4bd9a800084

OpenStack Infra (hudson-openstack) wrote : Fix proposed to utilities (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/utilities/+/865731

OpenStack Infra (hudson-openstack) wrote : Fix proposed to update (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/update/+/865732

OpenStack Infra (hudson-openstack) wrote : Fix merged to utilities (master)

Reviewed: https://review.opendev.org/c/starlingx/utilities/+/865731
Committed: https://opendev.org/starlingx/utilities/commit/6d86d620500decb664430a9921abc16f1b1cc6a6
Submitter: "Zuul (22348)"
Branch: master

commit 6d86d620500decb664430a9921abc16f1b1cc6a6
Author: Jim Gauld <email address hidden>
Date: Fri Nov 25 14:32:43 2022 -0500

    Support dockerd in-service patching

    This adds 'dockerd' process to patch-restart-processes to support
    in-service patching of docker.service.

    Closes-Bug: 1996916

    Test Plan:
    PASS: CentOS: Apply/remove designer in-service patch that calls
          'patch-restart-processes dockerd'.
    PASS: Manually verify docker.service restarts after issuing:
          'sudo /usr/local/sbin/patch-restart-processes dockerd'

    Signed-off-by: Jim Gauld <email address hidden>
    Change-Id: Ib8e6c101303cea62dd84d6d9c9ddc2beeb00f5ae

OpenStack Infra (hudson-openstack) wrote : Fix merged to update (master)

Reviewed: https://review.opendev.org/c/starlingx/update/+/865732
Committed: https://opendev.org/starlingx/update/commit/443681e20970ba1457bd68ca476da48493dbb310
Submitter: "Zuul (22348)"
Branch: master

commit 443681e20970ba1457bd68ca476da48493dbb310
Author: Jim Gauld <email address hidden>
Date: Fri Nov 25 14:55:18 2022 -0500

    Support dockerd in-service patching with EXAMPLE_DOCKER script

    This adds the EXAMPLE_DOCKER in-service patching script.
    Packaging currently for CentOS, but script itself is generic.

    The docker.service has required patching, so this supports future
    patching.

    Closes-Bug: 1996916
    Depends-On: https://review.opendev.org/c/starlingx/utilities/+/865731

    Test Plan:
    PASS: CentOS: Apply/remove designer in-service patch including
          EXAMPLE_DOCKER and verify docker.service restarts.

    Signed-off-by: Jim Gauld <email address hidden>
    Change-Id: I2c630eac88da030af69240a2badd11f06cbd5475

Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.8.0 stx.containers