Containers: stx-openstack reapply gets stuck at applying armada-manifest with connection timeout error when retrieving release info

Bug #1817770 reported by Yang Liu
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: John Kung

Bug Description

Brief Description
-----------------
When reapplying stx-openstack, the apply gets stuck at applying the armada-manifest due to a connection timeout when retrieving release info to determine whether a re-apply is needed.

Severity
--------
Minor

Steps to Reproduce
------------------
- Install and configure system
- Apply/reapply the stx-openstack application (see the example commands below)
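
For reference, the apply in this run was triggered through the system CLI (the full command, including authentication options, appears in the logs below). A minimal sequence to drive a reproduction attempt, assuming the application has already been uploaded, is:

    # Apply (or reapply) the stx-openstack application
    system application-apply stx-openstack

    # Watch the apply progress; when the issue hits, it stalls at the armada-manifest step
    system application-list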

Expected Behavior
------------------
- stx-openstack application is successfully applied

Actual Behavior
----------------
- The stx-openstack apply gets stuck at the armada-manifest step due to a connection timeout when retrieving release info to determine whether a re-apply is needed.

Reproducibility
---------------
Intermittent

System Configuration
--------------------
Multi-node system

Branch/Pull Time/Commit
-----------------------
f/stein as of 2019-02-25

Timestamp/Logs
--------------
[2019-02-26 16:20:37,367] 262 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-apply stx-openstack'

2019-02-26 16:21:50.884 18 DEBUG armada.handlers.tiller [-] Getting known releases from Tiller... list_charts /usr/local/lib/python3.5/site-packages/armada/handlers/tiller.py:286
2019-02-26 16:21:50.885 18 DEBUG armada.handlers.tiller [-] Tiller ListReleases() with timeout=300 list_releases /usr/local/lib/python3.5/site-packages/armada/handlers/tiller.py:205
2019-02-26 16:26:50.886 18 ERROR armada.cli [-] Caught unexpected exception: grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded)>
2019-02-26 16:26:50.886 18 ERROR armada.cli Traceback (most recent call last):
2019-02-26 16:26:50.886 18 ERROR armada.cli File "/usr/local/lib/python3.5/site-packages/armada/cli/__init__.py", line 39, in safe_invoke
2019-02-26 16:26:50.886 18 ERROR armada.cli self.invoke()
2019-02-26 16:26:50.886 18 ERROR armada.cli File "/usr/local/lib/python3.5/site-packages/armada/cli/apply.py", line 217, in invoke
2019-02-26 16:26:50.886 18 ERROR armada.cli resp = armada.sync()
2019-02-26 16:26:50.886 18 ERROR armada.cli File "/usr/local/lib/python3.5/site-packages/armada/handlers/armada.py", line 234, in sync
2019-02-26 16:26:50.886 18 ERROR armada.cli deployed_releases, failed_releases = self._get_releases_by_status()
2019-02-26 16:26:50.886 18 ERROR armada.cli File "/usr/local/lib/python3.5/site-packages/armada/handlers/armada.py", line 200, in _get_releases_by_status
2019-02-26 16:26:50.886 18 ERROR armada.cli known_releases = self.tiller.list_charts()
2019-02-26 16:26:50.886 18 ERROR armada.cli File "/usr/local/lib/python3.5/site-packages/armada/handlers/tiller.py", line 288, in list_charts
2019-02-26 16:26:50.886 18 ERROR armada.cli for latest_release in self.list_releases():
2019-02-26 16:26:50.886 18 ERROR armada.cli File "/usr/local/lib/python3.5/site-packages/armada/handlers/tiller.py", line 209, in list_releases
2019-02-26 16:26:50.886 18 ERROR armada.cli for y in release_list:
2019-02-26 16:26:50.886 18 ERROR armada.cli File "/usr/local/lib/python3.5/site-packages/grpc/_channel.py", line 347, in _next_
2019-02-26 16:26:50.886 18 ERROR armada.cli return self._next()
2019-02-26 16:26:50.886 18 ERROR armada.cli File "/usr/local/lib/python3.5/site-packages/grpc/_channel.py", line 341, in _next
2019-02-26 16:26:50.886 18 ERROR armada.cli raise self
2019-02-26 16:26:50.886 18 ERROR armada.cli grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded)>
2019-02-26 16:26:50.886 18 ERROR armada.cli
armada@f91232ac6052:~$

[storage/driver] 2019/02/26 16:37:18 list: failed to list: Get https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%3DTILLER: read tcp 172.16.0.112:49838->10.96.0.1:443: read: connection timed out
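
The failing request above is Tiller reading its release records (ConfigMaps labelled OWNER=TILLER in kube-system) from the API server. A couple of diagnostic commands to correlate this with where Tiller landed after the swact (a sketch; the exact tiller pod name and output depend on the Helm v2 deployment in this lab):

    # Which node is the tiller pod on, and is it the controller that was swacted away from?
    kubectl -n kube-system get pods -o wide | grep tiller

    # The same ConfigMap query that is timing out in the log above, issued through kubectl
    kubectl -n kube-system get configmaps -l OWNER=TILLER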

Ghada Khalil (gkhalil)
tags: added: stx.containers
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
assignee: nobody → Tee Ngo (teewrs)
status: New → Triaged
tags: added: stx.2019.05
Dariush Eslimi (deslimi)
Changed in starlingx:
assignee: Tee Ngo (teewrs) → John Kung (john-kung)
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05
Ghada Khalil (gkhalil)
tags: added: stx.retestneeded
Revision history for this message
John Kung (john-kung) wrote :

The following error occurs on the helm list attempt triggered after a host-swact away from the controller running the tiller pod:
(see: https://bugs.launchpad.net/starlingx/+bug/1817941)

"[storage/driver] 2019/02/26 16:37:18 list: failed to list: Get https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%3DTILLER: read tcp 172.16.0.112:49838->10.96.0.1:443: read: connection timed out"

Therefore, this bug can be tracked as a duplicate of bug 1817941.
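
For context, the same symptom can be exercised directly from the CLI (a sketch assuming a standard two-controller lab; substitute the controller that is currently hosting the tiller pod):

    # Swact away from the controller running the tiller pod
    system host-swact controller-0

    # From the now-active controller, list Helm releases; with the duplicate bug present this times out
    helm list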

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Duplicate bug was fixed on 2019-05-07
https://review.opendev.org/657087

Marking as Fix Released

Changed in starlingx:
status: Triaged → Fix Released
Revision history for this message
Yang Liu (yliu12) wrote :

Test passed on the following load: 2019-06-03_18-34-53.
helm list worked and the reapply completed shortly after a swact.
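
The verification amounts to confirming that Tiller stays reachable and the application apply completes after a swact (a sketch of the checks; the exact status string is as reported by the system CLI in this lab):

    # Tiller should respond promptly rather than hitting the gRPC deadline
    helm list

    # stx-openstack should finish the reapply and report an applied status
    system application-list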

tags: removed: stx.retestneeded