StarlingX

Containers: cluster IP address not assigned on computes after lock/unlock - I/F name too long

Bug #1817593 reported by Yang Liu on 2019-02-25

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	StarlingX	Fix Released	Medium	Teresa Ho

Bug Description

Brief Description
-----------------
garbd pod is in CrashLoopBackOff status since install and config

Severity
--------
Major

Steps to Reproduce
------------------
1. install and config platform
2. deploy stx-openstack
3. check pods status

Expected Behavior
------------------
3. all pods are healthy

Actual Behavior
----------------
[wrsroot@controller-0 ~(keystone_admin)]$ kubectl get pods --all-namespaces -o wide | grep -v -e Completed -e Running
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
openstack osh-openstack-garbd-garbd-5744f5f85-9t5lz 0/1 CrashLoopBackOff 116 10h 172.16.4.2 compute-1 <none>

Reproducibility
---------------
Reproducible (maybe, seen 2/2 times on a regular system)

System Configuration
--------------------
Multi-node system

Branch/Pull Time/Commit
-----------------------
f/stein as of 2019-02-21

Timestamp/Logs
--------------
{"log":"2019-02-25 15:39:15.012 INFO: protonet asio version 0\n","stream":"stderr","time":"2019-02-25T15:39:15.012273135Z"}
{"log":"2019-02-25 15:39:15.012 INFO: Using CRC-32C for message checksums.\n","stream":"stderr","time":"2019-02-25T15:39:15.012417963Z"}
{"log":"2019-02-25 15:39:15.012 INFO: backend: asio\n","stream":"stderr","time":"2019-02-25T15:39:15.01245371Z"}
{"log":"2019-02-25 15:39:15.012 INFO: gcomm thread scheduling priority set to other:0 \n","stream":"stderr","time":"2019-02-25T15:39:15.012475804Z"}
{"log":"2019-02-25 15:39:15.012 WARN: access file(./gvwstate.dat) failed(No such file or directory)\n","stream":"stderr","time":"2019-02-25T15:39:15.012588844Z"}
{"log":"2019-02-25 15:39:15.012 INFO: restore pc from disk failed\n","stream":"stderr","time":"2019-02-25T15:39:15.012594681Z"}
{"log":"2019-02-25 15:39:15.012 INFO: GMCast version 0\n","stream":"stderr","time":"2019-02-25T15:39:15.012702816Z"}
{"log":"2019-02-25 15:39:35.046 WARN: Failed to resolve tcp://mariadb-server-0.mariadb-discovery.openstack.svc.cluster.local:4567\n","stream":"stderr","time":"2019-02-25T15:39:35.046357123Z"}
{"log":"2019-02-25 15:39:55.071 WARN: Failed to resolve tcp://mariadb-server-1.mariadb-discovery.openstack.svc.cluster.local:4567\n","stream":"stderr","time":"2019-02-25T15:39:55.071237092Z"}
{"log":"2019-02-25 15:39:55.071 INFO: (816dbf37, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567\n","stream":"stderr","time":"2019-02-25T15:39:55.071427289Z"}
{"log":"2019-02-25 15:39:55.071 INFO: (816dbf37, 'tcp://0.0.0.0:4567') multicast: , ttl: 1\n","stream":"stderr","time":"2019-02-25T15:39:55.071432896Z"}
{"log":"2019-02-25 15:39:55.071 INFO: EVS version 0\n","stream":"stderr","time":"2019-02-25T15:39:55.07169498Z"}
{"log":"2019-02-25 15:39:55.071 INFO: gcomm: connecting to group 'mariadb-server_openstack', peer 'mariadb-server-0.mariadb-discovery.openstack.svc.cluster.local:,mariadb-server-1.mariadb-discovery.openstack.svc.cluster.local:'\n","stream":"stderr","time":"2019-02-25T15:39:55.071748318Z"}
{"log":"2019-02-25 15:39:55.071 ERROR: failed to open gcomm backend connection: 131: No address to connect (FATAL)\n","stream":"stderr","time":"2019-02-25T15:39:55.071806276Z"}
{"log":"\u0009 at gcomm/src/gmcast.cpp:connect_precheck():310\n","stream":"stderr","time":"2019-02-25T15:39:55.071811556Z"}
{"log":"2019-02-25 15:39:55.071 ERROR: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -131 (State not recoverable)\n","stream":"stderr","time":"2019-02-25T15:39:55.071813648Z"}
{"log":"2019-02-25 15:39:55.071 ERROR: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'mariadb-server_openstack' at 'gcomm://mariadb-server-0.mariadb-discovery.openstack.svc.cluster.local,mariadb-server-1.mariadb-discovery.openstack.svc.cluster.local': -131 (State not recoverable)\n","stream":"stderr","time":"2019-02-25T15:39:55.071867156Z"}
{"log":"2019-02-25 15:39:55.071 FATAL: Exception in creating receive loop: Failed to open connection to group: 131 (State not recoverable)\n","stream":"stderr","time":"2019-02-25T15:39:55.071884553Z"}
{"log":"\u0009 at garb/garb_gcs.cpp:Gcs():35\n","stream":"stderr","time":"2019-02-25T15:39:55.071888396Z"}

Frank Miller (sensfan22) on 2019-02-25

tags:	added: stx.containers
Changed in starlingx:
importance:	Undecided → High

Ghada Khalil (gkhalil) wrote on 2019-02-25:

Marking as release gating; high priority as containers do not come up in standard and storage configs. Recent commits may be suspect.

Changed in starlingx:
assignee:	nobody → Chris Friesen (cbf123)
status:	New → Triaged
tags:	added: stx.2019.05

Frank Miller (sensfan22) on 2019-02-26

summary:

- Containers: garbd not coming up on freshly installed system
+ Containers: cluster IP address was not assigned on some computes after
+ lock/unlock

Ghada Khalil (gkhalil) on 2019-02-26

Changed in starlingx:
assignee:	Chris Friesen (cbf123) → Teresa Ho (teresaho)
summary:

- Containers: cluster IP address was not assigned on some computes after
- lock/unlock
+ Containers: cluster IP address not assigned on computes after
+ lock/unlock - - I/F name too long

summary:

Containers: cluster IP address not assigned on computes after
- lock/unlock - - I/F name too long
+ lock/unlock - I/F name too long

Ghada Khalil (gkhalil) wrote on 2019-02-26:

Changing the bug summary as further investigation shows that the issue is related to assigning the cluster network IP address.

Details from Don Penney:
------------------------
The cluster IP address was not assigned on compute-1 and compute-2, but was assigned on compute-0. When kubernetes brings up its node, it sees the mgmt IP only, which cannot reach the 172.16.1.11 address. We'd expect to see the 192.168.206.X address here instead:

[wrsroot@controller-0 ~(keystone_admin)]$ kubectl describe node compute-1
Name: compute-1
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=compute-1
                    openstack-compute-node=enabled
                    openvswitch=enabled
                    sriov=enabled
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.204.251/24
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 25 Feb 2019 04:13:11 +0000
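
The mismatch described above can be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes the cluster network is 192.168.206.0/24 (the report only says "192.168.206.X", so the prefix length is a guess), and the function name on_cluster_network is hypothetical.

```python
import ipaddress

# Assumption: the cluster network is a /24; the report only gives "192.168.206.X".
CLUSTER_NET = ipaddress.ip_network("192.168.206.0/24")


def on_cluster_network(annotation: str, network=CLUSTER_NET) -> bool:
    """Return True if an address/prefix string (the format used by the
    projectcalico.org/IPv4Address annotation) lies inside `network`."""
    return ipaddress.ip_interface(annotation).ip in network


# Annotation value reported for compute-1 (a mgmt address).
print(on_cluster_network("192.168.204.251/24"))  # False
# Cluster address seen in the compute-0 logs, with an assumed /24 prefix.
print(on_cluster_network("192.168.206.149/24"))  # True
```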

So it seems that the root cause here is the failure to assign the cluster IP, which appears to be due to the interface name being too long in this lab when suffixed with a VLAN tag and an alias identifier (e.g. enp134s0f0.169:5, enp136s0f0.169:5).

It looks like there are a couple of issues exposed here:
1. The kernel defines IFNAMSIZ as 16 bytes, so the maximum interface name length is 15 characters. For the aliased cluster IP address on a VLAN, the full interface name is IFNAME.VLAN:ALIAS, e.g. enp134s0f0.169:5, which is 16 characters. This results in "RTNETLINK answers: Numerical result out of range" errors that block the aliased address from being assigned with "ifup IFNAME.VLAN".
2. The apply_network_config.sh script does ifdown/ifup for changed interfaces, but does not handle aliased files or sort the interfaces. If the "find" run by the script returns the base VLAN interface filename first, the script does an ifup on the IFNAME.VLAN:ALIAS name afterwards, which a manual test showed gives the RTNETLINK error but still assigns the IP address.
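
The length arithmetic in item 1 can be reproduced with a short sketch. The helper names alias_label and fits_ifnamsiz are made up for illustration; only the IFNAMSIZ value and the interface names come from this report.

```python
IFNAMSIZ = 16                  # kernel buffer size, including the trailing NUL
MAX_IFNAME_LEN = IFNAMSIZ - 1  # 15 usable characters


def alias_label(device: str, vlan_id: int, alias: int) -> str:
    """Build the IFNAME.VLAN:ALIAS label described in the report."""
    return f"{device}.{vlan_id}:{alias}"


def fits_ifnamsiz(name: str) -> bool:
    """Return True if the name fits within the kernel limit."""
    return len(name) <= MAX_IFNAME_LEN


label = alias_label("enp134s0f0", 169, 5)
print(label, len(label), fits_ifnamsiz(label))  # enp134s0f0.169:5 16 False
print(fits_ifnamsiz("enp134s0f0.169"))          # True (14 characters)
```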

In this lab, we saw that on the initial unlock of each compute, apply_network_config.sh tried to ifup the aliased names ahead of the base VLAN name on compute-1 and compute-2, which fails because the underlying VLAN interface hadn't been set up yet. As a result, the cluster IP was never assigned on those hosts. On compute-0, it did the base VLAN interface first, and was then able to assign the cluster IP address from the aliased filename.
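
To illustrate the ordering problem, here is a hedged sketch of one way to guarantee that a base interface is brought up before its aliases. It is only an illustration of the dependency, not what apply_network_config.sh does; the function name ifup_order is hypothetical.

```python
def ifup_order(names):
    """Order interface names so that non-alias interfaces come first.

    An alias is recognised by the ':' in its name (IFNAME.VLAN:ALIAS).
    Within each group, names are sorted lexically, which also places a
    device before the VLANs named after it (e.g. "eth0" before "eth0.10").
    """
    return sorted(names, key=lambda n: (":" in n, n))


found = ["enp134s0f0.169:5", "enp134s0f0.169", "enp134s0f0"]
print(ifup_order(found))
# ['enp134s0f0', 'enp134s0f0.169', 'enp134s0f0.169:5']
```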

On a reboot, the /etc/init.d/network script brings up the interfaces, but does not have a separate ifup for the aliased name. So we see it attempt to assign the cluster IP, but it fails a check and cannot assign the address. As a result, subsequent reboots of compute-0 leave the cluster IP address unassigned (the apply_network_config.sh script only does the ifdown/ifup if there's a config change). The following logs are seen in daemon.log:

2019-02-25T22:17:48.255 compute-0 network[1875]: info Determining IP information for enp134s0f0.169...RTNETLINK answers: Numerical result out of range 
2019-02-25T22:17:48.000 compute-0 dhclient[2466]: info bound to 192.168.204.210 -- renewal in 41172 seconds. 
2019-02-25T22:17:48.274 compute-0 network[1875]: info done. 
2019-02-25T22:17:48.319 compute-0 network[1875]: info Determining if ip address 192.168.206.149 is already in use for device enp134s0f0.169... 
2019-02-25T22:17:52.328 compute-0 network[1875]: info RTNETLINK answers: Numerical result out of range 
2019-02-25T22:17:52.329 compute-0 network[1875]: info bind: Cannot assign requested address

OpenStack Infra (hudson-openstack) wrote on 2019-02-27: Fix proposed to stx-config (master)

Fix proposed to branch: master
Review: https://review.openstack.org/639683

Changed in starlingx:
status:	Triaged → In Progress

Frank Miller (sensfan22) wrote on 2019-02-27:

Lowering priority to medium. Analysis indicates this only impacts 1 lab and can be worked around by changing a vlan name.

Changed in starlingx:
importance:	High → Medium

OpenStack Infra (hudson-openstack) wrote on 2019-02-28: Fix merged to stx-config (master)

Reviewed: https://review.openstack.org/639683
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=575be13d684eb4763092bd391f41d168341dbe1f
Submitter: Zuul
Branch: master

commit 575be13d684eb4763092bd391f41d168341dbe1f
Author: Teresa Ho <email address hidden>
Date: Wed Feb 27 09:32:05 2019 -0500

Fix to not restart alias interface

    The alias interface may at times be brought up before its parent
    interface (ethernet or vlan) depending on how the interface config
    files are listed in the filesystem.
    The fix is to not restart the alias interface. If there is a
    change in the alias interface configuration, the parent interface
    would be restarted instead.

Partial-Bug: 1817593

Change-Id: I83b88e1587c6468d85e14a50de934908586cbc9b
Signed-off-by: Teresa Ho <email address hidden>
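
The idea in this commit message, restarting the parent instead of the alias, can be sketched as follows. This is an illustration under stated assumptions rather than the code from the review: the function name interfaces_to_restart is hypothetical, and "eth1:2" is an invented example name.

```python
def interfaces_to_restart(changed):
    """Map changed interface config names to the interfaces to restart.

    An alias (IFNAME:ALIAS) is never restarted itself; its parent is
    restarted instead. Input order is kept and duplicates are dropped.
    """
    result = []
    for name in changed:
        parent = name.split(":", 1)[0]  # strip the alias suffix, if any
        if parent not in result:
            result.append(parent)
    return result


print(interfaces_to_restart(["enp134s0f0.169:5", "enp134s0f0.169", "eth1:2"]))
# ['enp134s0f0.169', 'eth1']
```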

Ken Young (kenyis) on 2019-04-05

tags:

added: stx.2.0
removed: stx.2019.05

Ghada Khalil (gkhalil) on 2019-04-09

tags:

added: stx.retestneeded

Ghada Khalil (gkhalil) on 2019-06-04

tags:

added: stx.networking
removed: stx.containers

Ghada Khalil (gkhalil) wrote on 2019-07-08:

Based on input from Matt Peters (networking TL), we should investigate an alternate naming scheme for vlan interfaces (e.g. vlanN) so that the vlanN:alias will not exceed the 15 char limit.

Forrest Zhao (forrest.zhao) wrote on 2019-08-08:

Can be worked around by explicitly defining the cluster network. Defer to stx 3.0 for final fix.

tags:

removed: stx.2.0

Frank Miller (sensfan22) on 2019-08-09

tags:

added: stx.3.0

Ghada Khalil (gkhalil) on 2019-09-04

Changed in starlingx:
status:	In Progress → Confirmed

Matt Peters (mpeters-wrs) wrote on 2019-09-12:

The vlanNN naming scheme is documented here:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-naming_scheme_for_vlan_interfaces

OpenStack Infra (hudson-openstack) wrote on 2019-10-04: Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/686728

Changed in starlingx:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2019-10-04: Fix proposed to metal (master)

Fix proposed to branch: master
Review: https://review.opendev.org/686834

OpenStack Infra (hudson-openstack) wrote on 2019-10-17: Fix merged to config (master)

Reviewed: https://review.opendev.org/686728
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=c823a241f9bf695883ac97c92681478986d1da55
Submitter: Zuul
Branch: master

commit c823a241f9bf695883ac97c92681478986d1da55
Author: Teresa Ho <email address hidden>
Date: Fri Oct 4 09:48:50 2019 -0400

Change VLAN interface naming scheme

    The VLAN alias interface name may exceed the maximum interface
    name length as defined by the kernel, depending on some hardware.
    This commit changes the vlan naming convention from device name plus
    VLAN ID to VLAN plus VLAN ID for platform interfaces.
    A semantic check is also added to ensure that VLAN IDs are unique
    across all platform interfaces of a host.

Closes-Bug: 1817593

Change-Id: Idf732e85aa7a6d19491f13e29cc1b408bb5bfe5d
Signed-off-by: Teresa Ho <email address hidden>
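
A rough sketch of the two changes this commit describes, the "VLAN plus VLAN ID" name and the uniqueness check, is shown below. The exact name format ("vlan" followed by the ID) and the helper names vlan_ifname and duplicate_vlan_ids are assumptions for illustration, not taken from the merged code.

```python
from collections import Counter


def vlan_ifname(vlan_id: int) -> str:
    """Name a platform VLAN interface by its ID only (assumed format)."""
    return f"vlan{vlan_id}"


def duplicate_vlan_ids(vlan_ids):
    """Return the VLAN IDs used more than once on a host, sorted.

    With ID-only names, two platform VLANs sharing an ID would collide,
    which is why a per-host uniqueness check is needed.
    """
    counts = Counter(vlan_ids)
    return sorted(v for v, c in counts.items() if c > 1)


alias = f"{vlan_ifname(169)}:5"
print(alias, len(alias))                    # vlan169:5 9
print(duplicate_vlan_ids([169, 170, 169]))  # [169]
```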

Changed in starlingx:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2019-10-17: Fix merged to metal (master)

Reviewed: https://review.opendev.org/686834
Committed: https://git.openstack.org/cgit/starlingx/metal/commit/?id=d6ea5446e9d008a065a12cecc6c6a4692aa2c3e9
Submitter: Zuul
Branch: master

commit d6ea5446e9d008a065a12cecc6c6a4692aa2c3e9
Author: Teresa Ho <email address hidden>
Date: Fri Oct 4 16:09:23 2019 -0400

Change VLAN interface naming scheme

Closes-Bug: 1817593
Depends-On: https://review.opendev.org/#/c/686728/

Change-Id: I8a74e1d47e0ab3ef261f9512a8887d7f0de66064
Signed-off-by: Teresa Ho <email address hidden>

Yang Liu (yliu12) wrote on 2019-11-07:

Verified on 1102 load. IF naming convention is changed as described.

tags:

removed: stx.retestneeded
