RPC timeout error when creating barbican secret on host-bulk-add after controller-0 is up
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Medium | Alexander Kozyrev | | |
Bug Description
Brief Description
-----------------
After controller-0 is installed and configured, running system host-bulk-add <xmlfile> fails with an RPC timeout.
Severity
--------
Major
Steps to Reproduce
------------------
- Install and configure controller-0 using ansible
- Unlock controller-0 and wait until it is available and its pods are running
- Run system host-bulk-add to add the other hosts
TC-name: system install
Expected Behavior
------------------
- hosts successfully added
Actual Behavior
----------------
[2019-08-09 14:06:49,438] 301 DEBUG MainThread ssh.send :: Send 'system --os-endpoint-type internalURL --os-region-name RegionOne host-bulk-add hosts_bulk_add.xml'
[2019-08-09 14:08:51,518] 423 DEBUG MainThread ssh.expect :: Output:
Error:
controller: Timeout while waiting on RPC response - topic: "sysinv.
controller: Timeout while waiting on RPC response - topic: "sysinv.
Reproducibility
---------------
Reproducible
This is seen on multiple systems with the 0808 and later loads, and was not seen with the 0805 load.
To work around it, rerun this command after some time.
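The rerun workaround could be scripted roughly as follows. This is only a sketch: the attempt count, the delay, and the helper names (run_bulk_add, bulk_add_with_retry) are assumptions made for illustration, and it relies on the error text shown under Actual Behavior to detect the timeout. It also assumes that rerunning host-bulk-add after a timeout is safe, as the workaround implies.

```python
import subprocess
import time

# Substring taken from the error output in this report.
RPC_TIMEOUT_MARKER = "Timeout while waiting on RPC response"


def run_bulk_add(xml_path):
    """Run 'system host-bulk-add <xmlfile>' once and return its combined output."""
    result = subprocess.run(
        ["system", "host-bulk-add", xml_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def bulk_add_with_retry(xml_path, attempts=3, delay_s=60,
                        run=run_bulk_add, sleep=time.sleep):
    """Rerun host-bulk-add while its output contains the RPC timeout message.

    Returns (no_timeout, last_output). no_timeout only means the RPC timeout
    text was absent; other failures are not detected by this sketch.
    The attempts and delay_s defaults are arbitrary illustrative values.
    """
    output = ""
    for attempt in range(1, attempts + 1):
        output = run(xml_path)
        if RPC_TIMEOUT_MARKER not in output:
            return True, output
        if attempt < attempts:
            # Give the controller services time to settle before rerunning.
            sleep(delay_s)
    return False, output
```

Calling bulk_add_with_retry("hosts_bulk_add.xml") would then rerun the command up to three times, one minute apart, and stop at the first run whose output does not contain the timeout message.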
System Configuration
-------
AIO-DX, Standard, Dedicated storage systems
Branch/Pull Time/Commit
-------
We see this issue on both the master and stx.2.0 branches.
stx master as of 20190809
stx.2.0 as of 20190808
Last Pass
---------
20190805
Timestamp/Logs
--------------
# controller-0 is up and pods are running
[2019-08-09 14:06:46,730] 301 DEBUG MainThread ssh.send :: Send 'kubectl get pod -n=kube-system -o=wide'
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-
calico-node-grtkc 1/1 Running 1 44m 192.168.204.3 controller-0 <none> <none>
coredns-
coredns-
kube-apiserver-
kube-controller
kube-multus-
kube-proxy-6sxg6 1/1 Running 1 44m 192.168.204.3 controller-0 <none> <none>
kube-scheduler-
kube-sriov-
tiller-
# cmd failed
[2019-08-09 14:06:49,438] 301 DEBUG MainThread ssh.send :: Send 'system --os-endpoint-type internalURL --os-region-name RegionOne host-bulk-add hosts_bulk_add.xml'
[2019-08-09 14:08:51,518] 423 DEBUG MainThread ssh.expect :: Output:
Error:
controller: Timeout while waiting on RPC response - topic: "sysinv.
controller: Timeout while waiting on RPC response - topic: "sysinv.
Test Activity
-------------
Sanity
Changed in starlingx: | |
assignee: | nobody → John Kung (john-kung) |
importance: | Undecided → High |
Changed in starlingx: | |
importance: | High → Undecided |
Changed in starlingx: | |
assignee: | John Kung (john-kung) → Alex Kozyrev (akozyrev) |
summary: |
- RPC timeout error when bulk-add-hosts after controller-0 is up
+ RPC timeout error when creating barbican secret on host-bulk-add after controller-0 is up |
tags: | added: stx.3.0 stx.config |
tags: | added: stx.retestneeded |
This issue could be related to the change from https://bugs.launchpad.net/starlingx/+bug/1834673. At least, the return code handling reported in bug 1834673 did not cover the RPC timeout case.
Revert request for the suspect update is in Gerrit: https://review.opendev.org/#/c/675698/