IPv6 AIO-DX + 3 workers: worker nodes fail to become ready

Bug #1844192 reported by Yosief Gebremariam
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Joseph Richard

Bug Description

Brief Description
-----------------
Attempted to install AIO-DX + 3 worker hosts in an IPv6 configuration, but the worker nodes fail to become ready after unlock.

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
compute-0 NotReady <none> 96m v1.15.3
compute-1 NotReady <none> 96m v1.15.3
compute-2 NotReady <none> 96m v1.15.3
controller-0 Ready master 173m v1.15.3
controller-1 Ready master 107m v1.15.3
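(A quick way to pull the not-ready nodes out of a listing like the one above; the `not_ready` helper name is made up here for illustration, it is not part of StarlingX or kubectl:)

```shell
# not_ready: print the names of nodes whose STATUS column is not "Ready".
# Feed it the output of `kubectl get nodes --no-headers`.
not_ready() {
    awk '$2 != "Ready" { print $1 }'
}

# Usage: kubectl get nodes --no-headers | not_ready
```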

Some of the pods on the worker nodes are stuck in Init or ContainerCreating state.

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl -n kube-system get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-767467f9cf-82zqh 1/1 Running 2 114m dead:beef::8e22:765f:6121:eb44 controller-0 <none> <none>
calico-node-4twmg 1/1 Running 0 49m face::4 controller-1 <none> <none>
calico-node-p8bhg 0/1 Init:0/2 0 37m face::54ab:ab94:4966:f04 compute-2 <none> <none>
calico-node-pdwkd 1/1 Running 1 114m face::3 controller-0 <none> <none>
calico-node-sppdw 0/1 Init:0/2 0 37m face::265b:64e0:5e56:bae compute-0 <none> <none>
calico-node-xzr6n 0/1 Init:0/2 0 37m face::9b18:93dd:1592:9333 compute-1 <none> <none>
ceph-pools-audit-1568662500-wgv2n 0/1 Completed 0 13m dead:beef::8e22:765f:6121:eb4b controller-0 <none> <none>
ceph-pools-audit-1568662800-9p24l 0/1 Completed 0 8m17s dead:beef::8e22:765f:6121:eb4c controller-0 <none> <none>
ceph-pools-audit-1568663100-75dvk 0/1 Completed 0 3m17s dead:beef::8e22:765f:6121:eb4d controller-0 <none> <none>
coredns-7cf476b5c8-2mqh6 1/1 Running 3 114m dead:beef::8e22:765f:6121:eb43 controller-0 <none> <none>
coredns-7cf476b5c8-8mxjx 1/1 Running 0 47m dead:beef::a4ce:fec1:5423:e301 controller-1 <none> <none>
kube-apiserver-controller-0 1/1 Running 2 113m face::3 controller-0 <none> <none>
kube-apiserver-controller-1 1/1 Running 1 49m face::4 controller-1 <none> <none>
kube-controller-manager-controller-0 1/1 Running 1 113m face::3 controller-0 <none> <none>
kube-controller-manager-controller-1 1/1 Running 1 49m face::4 controller-1 <none> <none>
kube-multus-ds-amd64-5xkjb 1/1 Running 1 114m face::3 controller-0 <none> <none>
kube-multus-ds-amd64-djm6k 0/1 ContainerCreating 0 37m face::54ab:ab94:4966:f04 compute-2 <none> <none>
kube-multus-ds-amd64-g77rc 0/1 ContainerCreating 0 37m face::9b18:93dd:1592:9333 compute-1 <none> <none>
kube-multus-ds-amd64-ghbgd 0/1 ContainerCreating 0 37m face::265b:64e0:5e56:bae compute-0 <none> <none>
kube-multus-ds-amd64-smmw5 1/1 Running 1 49m face::4 controller-1 <none> <none>
kube-proxy-445n4 1/1 Running 1 49m face::4 controller-1 <none> <none>
kube-proxy-97nnn 0/1 ContainerCreating 0 37m face::265b:64e0:5e56:bae compute-0 <none> <none>
kube-proxy-mpxdw 1/1 Running 1 114m face::3 controller-0 <none> <none>
kube-proxy-nl54j 0/1 ContainerCreating 0 37m face::54ab:ab94:4966:f04 compute-2 <none> <none>
kube-proxy-zrj5c 0/1 ContainerCreating 0 37m face::9b18:93dd:1592:9333 compute-1 <none> <none>
kube-scheduler-controller-0 1/1 Running 1 113m face::3 controller-0 <none> <none>
kube-scheduler-controller-1 1/1 Running 1 49m face::4 controller-1 <none> <none>
kube-sriov-cni-ds-amd64-4fmc9 0/1 ContainerCreating 0 37m face::9b18:93dd:1592:9333 compute-1 <none> <none>
kube-sriov-cni-ds-amd64-cvbj9 0/1 ContainerCreating 0 37m face::54ab:ab94:4966:f04 compute-2 <none> <none>
kube-sriov-cni-ds-amd64-fsbr8 0/1 ContainerCreating 0 37m face::265b:64e0:5e56:bae compute-0 <none> <none>
kube-sriov-cni-ds-amd64-plqfz 1/1 Running 1 49m face::4 controller-1 <none> <none>
kube-sriov-cni-ds-amd64-xxv4d 1/1 Running 1 114m face::3 controller-0 <none> <none>
rbd-provisioner-65db585fd6-n676r 1/1 Running 1 30m dead:beef::8e22:765f:6121:eb47 controller-0 <none> <none>
rbd-provisioner-65db585fd6-wwm49 1/1 Running 1 30m dead:beef::a4ce:fec1:5423:e302 controller-1 <none> <none>
storage-init-rbd-provisioner-qq4fd 0/1 Completed 0 30m dead:beef::8e22:765f:6121:eb48 controller-0 <none> <none>
tiller-deploy-7855f54f57-874f9 1/1 Running 1 113m face::3 controller-0 <none> <none>

Severity
--------
Major

Steps to Reproduce
------------------
Install AIO-DX system
Add additional workers to the AIO-DX system
Unlock the worker hosts

Expected Behavior
------------------
After adding worker hosts to an AIO-DX system, the workers should become ready after unlock.

Actual Behavior
----------------
Worker hosts failed to become ready after unlock.

Reproducibility
---------------
Tested once

System Configuration
--------------------
AIO-DX + 3 additional worker hosts

Branch/Pull Time/Commit
-----------------------

Timestamp/Logs
--------------
Attached
Fri Jul 25 20:40:01 UTC 2019

Test Activity
-------------
IPv6 Installation and Configuration

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Joseph Richard (josephrichard)
Revision history for this message
Joseph Richard (josephrichard) wrote :

The default route isn't getting added on the compute node, so pulling the docker image fails.
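On an affected compute node, the missing route can be confirmed with iproute2. A minimal check (illustrative only, not part of the fix) might look like:

```shell
# Report whether this node has an IPv6 default route. Without one,
# pulls from the external docker registry cannot leave the node, so the
# CNI pods stay in Init/ContainerCreating and the node never goes Ready.
check_default_route() {
    if ip -6 route show default 2>/dev/null | grep -q '^default'; then
        echo "IPv6 default route present"
    else
        echo "WARNING: no IPv6 default route"
    fi
}
check_default_route
```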

Ghada Khalil (gkhalil)
tags: added: stx.networking
Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 / high priority - AIO+worker configuration not working with IPv6

tags: added: stx.3.0
Changed in starlingx:
status: New → Triaged
importance: Undecided → High
Ghada Khalil (gkhalil)
Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/685112

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/685112
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=09eb2106f39658a1283c41696c3809c64c8d0c3e
Submitter: Zuul
Branch: master

commit 09eb2106f39658a1283c41696c3809c64c8d0c3e
Author: Joseph Richard <email address hidden>
Date: Wed Sep 25 14:32:24 2019 -0400

    Disable ipv6 autoconf

    This commit disables ipv6 autoconf for all interfaces.
    This is necessary in order to prevent invalid routes from setting the
    default route on worker nodes, which will cause them to not go ready.

    Closes-bug: 1844192
    Change-Id: I1f397b1b050854a83b37b75085f0ded80716be0a
    Signed-off-by: Joseph Richard <email address hidden>
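In effect, the merged change boils down to per-interface sysctl settings along these lines (this fragment is illustrative; the exact file and interface scope are in the review linked above):

```
# Disable IPv6 address/route autoconfiguration from router advertisements,
# so an RA-learned route cannot displace the provisioned default route.
net.ipv6.conf.all.autoconf = 0
net.ipv6.conf.default.autoconf = 0
```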

Changed in starlingx:
status: In Progress → Fix Released
Yang Liu (yliu12)
tags: added: stx.retestneeded
Yang Liu (yliu12) wrote :

This is not seen in recent DX+Worker IPv6 sanity testing.

tags: removed: stx.retestneeded