"helm list" failed by network is unreachable after host-unlock

Bug #1863795 reported by Peng Peng
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Frank Miller

Bug Description

Brief Description
-----------------
In a regular system, after host-lock/unlock, running the command "helm list" failed with the following error message:
Error: forwarding ports: error upgrading connection: unable to upgrade connection: error dialing backend: dial tcp 128.224.151.83:38136: connect: network is unreachable

Severity
--------
Major

Steps to Reproduce
------------------
host-lock
host-unlock
helm list

TC-name: mtc/test_lock_unlock_host.py::test_lock_unlock_host[controller]
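
The steps above can be sketched as a small Python helper. This is only an illustration, not the sanity test named in TC-name: it assumes the `system` and `helm` CLIs are on the PATH with platform credentials already loaded, and that the host under test is controller-1. The names `run_cmd`, `reproduce`, and `wait_for_host` are hypothetical; `wait_for_host` is a placeholder for whatever polling the real test does until the lock or unlock has completed.

```python
import subprocess


def run_cmd(args):
    """Run a command and return (exit_code, combined stdout/stderr)."""
    proc = subprocess.run(
        args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout


def reproduce(host="controller-1", run=run_cmd,
              wait_for_host=lambda host, action: None):
    """Lock and unlock `host`, then report whether `helm list` succeeds."""
    for action in ("host-lock", "host-unlock"):
        code, out = run(["system", action, host])
        if code != 0:
            raise RuntimeError(f"system {action} {host} failed: {out.strip()}")
        # Placeholder (assumption): block until the host reaches the
        # expected state, e.g. by polling `system host-show`.
        wait_for_host(host, action)
    code, out = run(["helm", "list"])
    return code == 0, out
```

Calling `reproduce()` returns a `(succeeded, output)` pair; on an affected system `succeeded` would be `False` and `output` would carry the "network is unreachable" error.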

Expected Behavior
------------------
helm list works

Actual Behavior
----------------
helm list fails with "connect: network is unreachable"

Reproducibility
---------------
Unknown - first time this is seen in sanity, will monitor

System Configuration
--------------------
regular system

Lab-name: WCP_63-66

Branch/Pull Time/Commit
-----------------------
2020-02-17_04-10-00

Last Pass
---------
Lab: WP_8_12
Load: 2020-02-17_04-10-00

Timestamp/Logs
--------------
[2020-02-18 16:20:23,597] 479 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2020-02-18 16:20:23,597] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-1'

[2020-02-18 16:25:49,424] 479 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2020-02-18 16:25:49,425] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-show controller-1'
[2020-02-18 16:25:50,704] 436 DEBUG MainThread ssh.expect :: Output:
+-----------------------+-----------------------------------------------------------------------+
| Property              | Value                                                                 |
+-----------------------+-----------------------------------------------------------------------+
| action                | none                                                                  |
| administrative        | unlocked                                                              |
| availability          | available                                                             |

[2020-02-18 16:25:50,813] 314 DEBUG MainThread ssh.send :: Send 'helm list'
[2020-02-18 16:25:50,991] 436 DEBUG MainThread ssh.expect :: Output:
Error: forwarding ports: error upgrading connection: unable to upgrade connection: error dialing backend: dial tcp 128.224.151.83:38136: connect: network is unreachable

[2020-02-18 16:25:59,823] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2020-02-18 16:26:01,034] 436 DEBUG MainThread ssh.expect :: Output:
+---------------------+---------+-------------------------------+---------------+----------+-----------+
| application         | version | manifest name                 | manifest file | status   | progress  |
+---------------------+---------+-------------------------------+---------------+----------+-----------+
| oidc-auth-apps      | 1.0-0   | oidc-auth-manifest            | manifest.yaml | uploaded | completed |
| platform-integ-apps | 1.0-8   | platform-integration-manifest | manifest.yaml | applied  | completed |
+---------------------+---------+-------------------------------+---------------+----------+-----------+

Test Activity
-------------
Sanity

Peng Peng (ppeng) wrote :
tags: added: stx.retestneeded
Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority - intermittent issue

tags: added: stx.containers
tags: added: stx.4.0
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Frank Miller (sensfan22)
Yang Liu (yliu12) wrote :

We are seeing this on multiple IPv6 standard systems in daily sanity.
However, the "helm ls" command eventually recovered on its own after some time, so this looks like a slow-recovery issue.

Yang Liu (yliu12) wrote :

I withdraw my previous comment about helm commands recovering on their own.
I confirmed in the latest sanity run that, after lock/unlock, helm commands fail and never recover.
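
One way to tell slow recovery apart from no recovery is to poll "helm list" for a bounded time after the unlock. The sketch below is only an illustration of that idea: the function names, the 30-minute timeout, and the 30-second interval are all assumptions, not values used by the sanity suite.

```python
import subprocess
import time


def helm_list_ok():
    """Return True if `helm list` exits with status 0."""
    result = subprocess.run(
        ["helm", "list"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def seconds_until_recovery(check=helm_list_ok, timeout=1800.0, interval=30.0,
                           clock=time.monotonic, sleep=time.sleep):
    """Poll `check` until it passes.

    Returns the elapsed seconds at the first success, or None if the next
    poll would fall after `timeout` (treated as "never recovered").
    """
    start = clock()
    while True:
        if check():
            return clock() - start
        if clock() - start + interval > timeout:
            return None
        sleep(interval)
```

A `None` result corresponds to the "never recovered" observation; a number gives a rough recovery time to compare across runs and configurations.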

The error message is the same as in the following LP, and the workaround is also the same: delete the tiller pod. They are therefore likely to share the same root cause. What puzzles me is that this tiller issue keeps occurring more often on IPv6 standard systems: initially it was very rare, then it happened on every fresh install, and now it happens on controller lock/unlock as well.

https://bugs.launchpad.net/starlingx/+bug/1856078
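
The workaround described above (delete the tiller pod when helm reports this error) could be scripted roughly as follows. This is a hedged sketch: the `kube-system` namespace and the `app=helm,name=tiller` label selector are assumptions based on a default Helm v2 tiller deployment and should be checked against the actual system, and the function names are hypothetical. It also assumes the pod is managed by a deployment that recreates it after deletion.

```python
import re
import subprocess

# Failure signature quoted in this report; the address differs per system.
TILLER_UNREACHABLE = re.compile(
    r"error dialing backend: dial tcp (?P<addr>\S+): "
    r"connect: network is unreachable"
)


def unreachable_backend(helm_output):
    """Return the backend address helm failed to dial, or None if no match."""
    match = TILLER_UNREACHABLE.search(helm_output)
    return match.group("addr") if match else None


def tiller_delete_command(namespace="kube-system",
                          selector="app=helm,name=tiller"):
    """Build the kubectl command that deletes the tiller pod.

    Namespace and selector are assumptions, not taken from the report.
    """
    return ["kubectl", "delete", "pod", "-n", namespace, "-l", selector]


def apply_workaround(helm_output, run=subprocess.run):
    """Delete the tiller pod only if the output shows the known error."""
    if unreachable_backend(helm_output) is None:
        return False
    run(tiller_delete_command(), check=True)
    return True
```

Matching on the error text first keeps the script from deleting the pod when "helm list" fails for an unrelated reason.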

Frank Miller (sensfan22) wrote :

Thanks for the update, Yang. It is a mystery why this is showing up more often on the standard IPv6 config. We believe this may be an initialization-order issue for upstream pods/components; for some reason, the order in which things come up on the standard IPv6 config might differ from the other configs, triggering the issue more often.

As this has the same trigger/root cause as https://bugs.launchpad.net/starlingx/+bug/1856078, I will mark this as a duplicate.

Ghada Khalil (gkhalil) wrote :

Updating the status to match the duplicate LP: https://bugs.launchpad.net/starlingx/+bug/1856078
Merged on 2020-04-22

Changed in starlingx:
status: Triaged → Fix Released
Peng Peng (ppeng)
tags: removed: stx.retestneeded