IPv6: stx-openstack app apply failed by time out waiting for the condition "grpc_status"

Bug #1859641 reported by Peng Peng (ppeng) on 2020-01-14
This bug affects 2 people

Affects: StarlingX
Status: Confirmed
Importance: High
Assigned to: zhipeng liu (zhipengs)

Bug Description

Brief Description
-----------------
During the initial regular system setup, applying the stx-openstack application failed. The armada/stx-openstack-apply log shows a timeout while waiting for the condition, reported through the "grpc_status" gRPC error.

The mariadb-ingress pods are not ready:
mariadb-ingress-5bb8b69fc8-czb7z 0/1 Running 0 5h18m dead:beef::8e22:765f:6121:eb4e controller-0 <none> <none>
mariadb-ingress-5bb8b69fc8-h8zx5 0/1 Running 0 5h18m dead:beef::a4ce:fec1:5423:e306 controller-1 <none> <none>

Severity
--------
Critical

Steps to Reproduce
------------------
Install an IPv6 system
Apply the stx-openstack application

TC-name: installation

Expected Behavior
------------------
Application is applied successfully

Actual Behavior
----------------
- apply failed
- mariadb-ingress pods are not ready:
mariadb-ingress-5bb8b69fc8-czb7z 0/1 Running 0 5h18m dead:beef::8e22:765f:6121:eb4e controller-0 <none> <none>
mariadb-ingress-5bb8b69fc8-h8zx5 0/1 Running 0 5h18m dead:beef::a4ce:fec1:5423:e306 controller-1 <none> <none>

Reproducibility
---------------
Unknown; this is the first time this has been tried on an IPv6 system

System Configuration
--------------------
Multi-node system

Lab-name: WCp_71-75

Branch/Pull Time/Commit
-----------------------
20200111T023000Z

Last Pass
---------
unknown

Timestamp/Logs
--------------
2020-01-14 09:14:02.085 458 ERROR armada.handlers.tiller [-] [chart=openstack-mariadb]: Error while installing release osh-openstack-mariadb: grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "release osh-openstack-mariadb failed: timed out waiting for the condition"
        debug_error_string = "{"created":"@1578993242.085407734","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"release osh-openstack-mariadb failed: timed out waiting for the condition","grpc_status":2}"
>
2020-01-14 09:14:02.085 458 ERROR armada.handlers.tiller Traceback (most recent call last):
2020-01-14 09:14:02.085 458 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py", line 473, in install_release
2020-01-14 09:14:02.085 458 ERROR armada.handlers.tiller metadata=self.metadata)
2020-01-14 09:14:02.085 458 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 533, in __call__
2020-01-14 09:14:02.085 458 ERROR armada.handlers.tiller return _end_unary_response_blocking(state, call, False, None)
2020-01-14 09:14:02.085 458 ERROR armada.handlers.tiller File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
2020-01-14 09:14:02.085 458 ERROR armada.handlers.tiller raise _Rendezvous(state, None, None, deadline)
2020-01-14 09:14:02.085 458 ERROR armada.handlers.tiller grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
2020-01-14 09:14:02.085 458 ERROR armada.handlers.tiller status = StatusCode.UNKNOWN
2020-01-14 09:14:02.085 458 ERROR armada.handlers.tiller details = "release osh-openstack-mariadb failed: timed out waiting for the condition"
2020-01-14 09:14:02.085 458 ERROR armada.handlers.tiller debug_error_string = "{"created":"@1578993242.085407734","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"release osh-openstack-mariadb failed: timed out waiting for the condition","grpc_status":2}"
2020-01-14 09:14:02.085 458 ERROR armada.handlers.tiller >
2020-01-14 09:14:02.085 458 ERROR armada.handlers.tiller
2020-01-14 09:14:02.087 458 DEBUG armada.handlers.tiller [-] [chart=openstack-mariadb]: Helm getting release status for release=osh-openstack-mariadb, version=0 get_release_status /usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py:539
2020-01-14 09:14:02.230 458 DEBUG armada.handlers.tiller [-] [chart=openstack-mariadb]: GetReleaseStatus= name: "osh-openstack-mariadb"
info {
  status {
    code: FAILED
  }
  first_deployed {
    seconds: 1578991441
    nanos: 428638329
  }
  last_deployed {
    seconds: 1578991441
    nanos: 428638329
  }
  Description: "Release \"osh-openstack-mariadb\" failed: timed out waiting for the condition"
}
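
For reference, the numeric grpc_status of 2 in the debug_error_string above is gRPC's StatusCode.UNKNOWN, which is the status Tiller reports here when the Helm release wait times out. A quick illustration with the Python grpc package (not taken from the Armada code, just a sanity check of the mapping):

import grpc

# StatusCode members are (numeric code, name) pairs; UNKNOWN is code 2,
# matching the "grpc_status":2 seen in the debug_error_string above.
print(grpc.StatusCode.UNKNOWN.value)  # (2, 'unknown')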

Test Activity
-------------
installation

Yang Liu (yliu12) on 2020-01-14
summary: - stx-openstack app apply failed by time out waiting for the condition
- "grpc_status"
+ IPv6: stx-openstack app apply failed by time out waiting for the
+ condition "grpc_status"
description: updated
Yang Liu (yliu12) on 2020-01-16
tags: added: stx.retestneeded
Yang Liu (yliu12) wrote :

Comments from Joseph from the initial investigation:

The helm charts are also pulled from the IPv4 address. I tried updating it to use the hostname, but unfortunately there seems to be a bug in build-helm-charts where the image-records are not parsed correctly when pulling from hosts with hyphens in the name.
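
For illustration only, a hypothetical Python sketch of how an overly strict host pattern can reject registries with hyphens in their name; the record string and both regular expressions below are made up and are not the actual build-helm-charts image-record format or code:

import re

# Hypothetical image record pointing at a hyphenated registry host.
record = "tis-lab-registry.cumulus.wrs.com:9001/wrcp-staging/docker.io/stx-mariadb:latest"

# A naive host pattern that only allows letters, digits and dots never
# matches a label such as "tis-lab-registry", so the record is dropped.
naive = re.match(r'^([A-Za-z0-9.]+(?::\d+)?)/', record)
print(naive)  # None

# Allowing hyphens inside each DNS label fixes the match.
fixed = re.match(
    r'^((?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.?)+(?::\d+)?)/', record)
print(fixed.group(1))  # tis-lab-registry.cumulus.wrs.com:9001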

Ghada Khalil (gkhalil) wrote :

stx.4.0 / high priority -- it seems that the openstack helm charts are not working for IPv6.
Assigning to the openstack PL/team for further investigation.

tags: added: stx.containers
tags: added: stx.distro.openstack
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → yong hu (yhu6)
Ghada Khalil (gkhalil) on 2020-01-20
tags: added: stx.4.0
Joseph Richard (josephrichard) wrote :

That helm chart issue was a separate issue affecting an IPv6 lab, and has been fixed.

mariadb is binding to 0.0.0.0 [1] (as specified in the chart defaults) even in IPv6 configs. This can be overridden by a change in the mariadb helm plugin [2] for IPv6 environments.

There is also a bug in openstack-helm-infra [3]: it does not enclose IPv6 addresses in square brackets, which leaves them unparseable.

[1] https://github.com/openstack/openstack-helm-infra/blob/master/mariadb/values.yaml#L322
[2] https://opendev.org/starlingx/config/src/branch/master/sysinv/sysinv/sysinv/sysinv/helm/mariadb.py
[3] https://github.com/openstack/openstack-helm-infra/blame/master/mariadb/files/nginx.tmpl#L476
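
To illustrate the square-bracket issue in [3]: any host:port string built from an IPv6 literal needs the address wrapped in brackets so the address colons are not read as the port separator. A minimal Python sketch of that rule (the helper name is hypothetical; the actual fix belongs in the nginx.tmpl Go template, not in Python):

import ipaddress

def format_url_host(host):
    """Wrap IPv6 literals in square brackets; leave hostnames/IPv4 alone."""
    try:
        if ipaddress.ip_address(host).version == 6:
            return "[{}]".format(host)
    except ValueError:
        pass  # not an IP literal (e.g. a hostname): use as-is
    return host

# IPv6 pod address from this report vs. a plain IPv4 address:
print("{}:3306".format(format_url_host("dead:beef::8e22:765f:6121:eb4e")))  # [dead:beef::...]:3306
print("{}:3306".format(format_url_host("10.0.0.5")))                        # 10.0.0.5:3306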

Peng Peng (ppeng) wrote :

Issue reproduced on
Lab: WCP_71_75
Load: 2020-01-22_20-00-00
log @
https://files.starlingx.kube.cengn.ca/launchpad/1859641

Peng Peng (ppeng) wrote :

Please ignore comment #5. I posted the comment in the wrong place. Sorry about that.

zhipeng liu (zhipengs) on 2020-02-11
Changed in starlingx:
assignee: yong hu (yhu6) → zhipeng liu (zhipengs)
Yan Chen (ychen2u) wrote :

@Peng, please share with us the localhost.yml file you used to deploy StarlingX.

zhipeng liu (zhipengs) wrote :

Hi Peng Peng,

As Yan said, we need your help to set up an IPv6 environment for our IPv6-related bug analysis.

Thanks!
Zhipeng

Changed in starlingx:
status: Triaged → Incomplete
Peng Peng (ppeng) wrote :

2020-01-22_20-00-00/lab/yow/cgcs-wildcat-71_75/localhost.yml

system_mode: duplex
dns_servers:
- 2620:10a:a001:a103::2

management_subnet: face::/64
management_multicast_subnet: ff05::1b:0/124
cluster_host_subnet: feed:beef::/64
cluster_pod_subnet: dead:beef::/64
cluster_service_subnet: fd04::/112

external_oam_subnet: 2620:10a:a001:a103::6:0/64
external_oam_gateway_address: 2620:10a:a001:a103::6:0
external_oam_floating_address: 2620:10A:A001:A103::1218
external_oam_node_0_address: 2620:10A:A001:A103::1215
external_oam_node_1_address: 2620:10A:A001:A103::1024

admin_password: Li69nux*
ansible_become_pass: Li69nux*
docker_http_proxy: http://yow-proxomatic.wrs.com:3128
pxeboot_subnet: 192.168.202.0/24
docker_no_proxy:
- registry.local
- tis-lab-registry.cumulus.wrs.com
docker_registries:
  quay.io:
    url: tis-lab-registry.cumulus.wrs.com:9001/wrcp-staging/quay.io
  docker.elastic.co:
    url: tis-lab-registry.cumulus.wrs.com:9001/wrcp-staging/docker.elastic.co
  gcr.io:
    url: tis-lab-registry.cumulus.wrs.com:9001/wrcp-staging/gcr.io
  k8s.gcr.io:
    url: tis-lab-registry.cumulus.wrs.com:9001/wrcp-staging/k8s.gcr.io
  docker.io:
    url: tis-lab-registry.cumulus.wrs.com:9001/wrcp-staging/docker.io
  defaults:
    type: docker
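
As a quick sanity check of the address families in the bootstrap config above, the standard ipaddress module can classify each subnet; everything except pxeboot_subnet is IPv6. This is just an illustration using values copied from the localhost.yml, not part of the Ansible bootstrap:

import ipaddress

subnets = {
    "management_subnet": "face::/64",
    "cluster_host_subnet": "feed:beef::/64",
    "cluster_pod_subnet": "dead:beef::/64",
    "cluster_service_subnet": "fd04::/112",
    "external_oam_subnet": "2620:10a:a001:a103::6:0/64",
    "pxeboot_subnet": "192.168.202.0/24",
}

for name, cidr in subnets.items():
    net = ipaddress.ip_network(cidr, strict=False)
    print("{:24s} IPv{}".format(name, net.version))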

Yan Chen (ychen2u) wrote :

@Peng,
Please also share with us a simplex localhost.yml.
I also want to try setting up an IPv6 environment for a simplex system.
Thanks!

Peng Peng (ppeng) wrote :

from WCP_112:
BUILD_ID="2020-02-18_10-46-00"

system_mode: simplex
dns_servers:
- 2620:10a:a001:a103::2

management_subnet: abcd:204::/64
management_multicast_subnet: ff05::1b:0/124

cluster_host_subnet: abcd:205::/64
cluster_pod_subnet: abcd:206::/64
cluster_service_subnet: abcd:207::/112

external_oam_subnet: 2620:10a:a001:a103::6:0/64
external_oam_gateway_address: 2620:10a:a001:a103::6:0
external_oam_floating_address: 2620:10a:a001:a103::148

admin_password: Li69nux*
ansible_become_pass: Li69nux*
pxeboot_subnet: 192.168.202.0/24
docker_http_proxy: http://yow-proxomatic.wrs.com:3128
docker_https_proxy: http://yow-proxomatic.wrs.com:3129
docker_no_proxy:
- registry.local
- tis-lab-registry.cumulus.wrs.com

Yan Chen (ychen2u) wrote :

Thanks a lot.

Yang Liu (yliu12) on 2020-02-20
Changed in starlingx:
status: Incomplete → Confirmed