IPv6: custom app apply failed by applying process terminated

Bug #1843932 reported by Peng Peng
This bug affects 1 person
Affects:      StarlingX
Status:       Fix Released
Importance:   Medium
Assigned to:  Dan Voiculeasa

Bug Description

Brief Description
-----------------
Uploaded a custom app (hello-kitty) and applied it. The reported progress reached 100% completion, but the status eventually changed to apply-failed with the message "Unexpected process termination while application-apply was in progress."

Severity
--------
Major

Steps to Reproduce
------------------
1. Upload a custom app, e.g. hello-kitty
2. Apply the application (system application-apply hello-kitty)
3. Monitor the apply status (system application-list)

TC-name: z_containers/test_custom_containers.py::test_launch_app_via_sysinv

Expected Behavior
------------------
apply completed

Actual Behavior
----------------
apply failed

Reproducibility
---------------
Seen once

System Configuration
--------------------
Multi-node system
IPv6
Lab-name: WCP_71-75

Branch/Pull Time/Commit
-----------------------
stx master as of 2019-09-12_20-00-00

Last Pass
---------
This TC has not passed on an IPv6 lab before.
Passes on IPv4. Last Pass: 2019-09-09_00-10-00

Timestamp/Logs
--------------
[2019-09-13 14:37:04,313] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::2]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-apply hello-kitty'

[2019-09-13 14:48:33,768] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::2]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2019-09-13 14:48:35,160] 433 DEBUG MainThread ssh.expect :: Output:
+---------------------+---------+-------------------------------+---------------+----------+--------------------------------------------------------------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+---------+-------------------------------+---------------+----------+--------------------------------------------------------------+
| hello-kitty | 1.0 | hello-kitty | manifest.yaml | applying | processing chart: hk-hello-kitty, overall completion: 100.0% |
| platform-integ-apps | 1.0-7 | platform-integration-manifest | manifest.yaml | applied | completed |
+---------------------+---------+-------------------------------+---------------+----------+--------------------------------------------------------------+
controller-1:~$

[2019-09-13 14:50:46,445] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::2]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2019-09-13 14:50:47,897] 433 DEBUG MainThread ssh.expect :: Output:
+---------------------+---------+-------------------------------+---------------+--------------+-------------------------------------------------------------------------------------------------------------------------------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+---------+-------------------------------+---------------+--------------+-------------------------------------------------------------------------------------------------------------------------------+
| hello-kitty | 1.0 | hello-kitty | manifest.yaml | apply-failed | Unexpected process termination while application-apply was in progress. The application status has changed from 'applying' to |
| | | | | | 'apply-failed'. |
| | | | | | |
| platform-integ-apps | 1.0-7 | platform-integration-manifest | manifest.yaml | applied | completed |
+---------------------+---------+-------------------------------+---------------+--------------+-------------------------------------------------------------------------------------------------------------------------------+
controller-0:~$

Test Activity
-------------
Sanity

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Another apply failure is reported for the same app:
https://bugs.launchpad.net/starlingx/+bug/1843932

description: updated
tags: added: stx.containers
Revision history for this message
Ghada Khalil (gkhalil) wrote :
Changed in starlingx:
assignee: nobody → Bob Church (rchurch)
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Assigning to Bob to do initial triage

Ghada Khalil (gkhalil)
description: updated
summary: - custom app apply failed by applying process terminated
+ IPv6: custom app apply failed by applying process terminated
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 gating until further investigation

Changed in starlingx:
assignee: Bob Church (rchurch) → Dan Voiculeasa (dvoicule)
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.3.0
Revision history for this message
Dan Voiculeasa (dvoicule) wrote :

I've seen a sanity lab in the reproduced state once. The hello-kitty pod failed to respond to the liveness and readiness probes:
"Liveness probe failed ..."
"Readiness probe failed ..."
While trying to set up a virtual IPv6 lab I saw the exact same log for calico-node, but unfortunately didn't save the output.

I tried to reproduce the issue with a newer build; the output changed from "Liveness probe failed" to the one below. The same behavior appears for calico-node/hello-kitty depending on the default route set on the host.

fd00::1 is on a NAT64/DNS64 node.

ansible-playbook bootstrap overrides:
 dns_servers:
 - fd00::1
 external_oam_subnet: fd00::/64
 external_oam_gateway_address: fd00::1
 external_oam_floating_address: fd00::2
 external_oam_node_0_address: fd00::3
 external_oam_node_1_address: fd00::4
 management_subnet: fd01::/64
 management_multicast_subnet: ff08::1:1:0/124
 cluster_host_subnet: fd02::/64
 cluster_pod_subnet: fd03::/64
 cluster_service_subnet: fd04::/112
 docker_no_proxy:
   - tis-lab-registry.cumulus.wrs.com

Apply hello-kitty:
 kubectl --kubeconfig /etc/kubernetes/admin.conf apply -f /home/sysadmin/custom_apps/hellokitty.yaml

No default route:
   Warning  FailedCreatePodSandBox  74s  kubelet, controller-0  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "0dbbc088b1baba5581533c73dff77f1605bfddb5c37a4cc8c7a396beeb183aea" network for pod "hellokitty": NetworkPlugin cni failed to set up pod "hellokitty_default" network: Multus: Err adding pod to network "chain": Multus: error in invoke Conflist add - "chain": error in getting result from AddNetworkList: error getting ClusterInformation: Get https://[fd04::1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp [fd04::1]:443: connect: network is unreachable, failed to clean up sandbox container "0dbbc088b1baba5581533c73dff77f1605bfddb5c37a4cc8c7a396beeb183aea" network for pod "hellokitty": NetworkPlugin cni failed to teardown pod "hellokitty_default" network: Multus: error in invoke Conflist Del - "chain": error in getting result from DelNetworkList: error getting ClusterInformation: Get https://[fd04::1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp [fd04::1]:443: connect: network is unreachable]
   Warning  FailedCreatePodSandBox  67s  kubelet, controller-0  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "7b94821cda718783348d63b82e0c16a170ee4b3841bdfa2e3e90d672042681a3" network for pod "hellokitty": NetworkPlugin cni failed to set up pod "hellokitty_default" network: Multus: Err adding pod to network "chain": Multus: error in invoke Conflist add - "chain": error in getting result from AddNetworkList: error getting ClusterInformation: Get https://[fd04::1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp [fd04::1]:443: connect: network is unreachable, failed to clean up sandbox container "7b94821cda718783348d63b82e0c16a170ee4b3841bdfa2e3e90d672042681a3" network for pod "hellokitty": NetworkPlugin cni failed to teardown pod "hellokitty_default" network: Multus: error in invoke Conflist Del - "chain": erro...


Revision history for this message
Dan Voiculeasa (dvoicule) wrote :

I believe the issue here is the missing default route.
Duplicate of https://bugs.launchpad.net/starlingx/+bug/1844192

Revision history for this message
Dan Voiculeasa (dvoicule) wrote :

Finally managed to reproduce it; it's a different issue. Looking into it now.

Revision history for this message
Dan Voiculeasa (dvoicule) wrote :

It is a hello-kitty app issue.

The liveness/readiness probes are HTTP GETs against the app running inside the container, and the app listens only on IPv4.

app.py:
  app.run(host='0.0.0.0', port=80)

Changing it to listen on both IPv4 and IPv6 gives HTTP Error 500:
  app.run(host='::', port=80)

The probes pass only when an HTTP status in the 2xx-3xx range is returned, so the 500 fails them.
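
For illustration only, a rough Python sketch of that probe decision (my own sketch, not kubelet code): the kubelet issues a GET against the pod address and treats only a 2xx/3xx response as healthy, which is why the 500 above fails both probes. The URL in the usage comment is made up.

  import urllib.request
  import urllib.error

  def http_probe(url, timeout=1.0):
      # Healthy only for a 2xx/3xx answer; anything else (or no answer) fails.
      try:
          with urllib.request.urlopen(url, timeout=timeout) as resp:
              return 200 <= resp.status < 400
      except urllib.error.HTTPError:
          return False   # 4xx/5xx, e.g. the 500 returned with host='::'
      except OSError:
          return False   # refused, unreachable, or timed out

  # Example (address is illustrative): http_probe("http://[fd03::10]:80/")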

Replacing the whole app.py with the following passes the probes:
from flask import Flask
import os
import socket

app = Flask(__name__)

@app.route("/")
def hello():
    html = "<h3>Hello Kitty {name}!</h3>"
    return html

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80)

Revision history for this message
Dan Voiculeasa (dvoicule) wrote :

netstat -tulpn
Active Internet connections (only servers)
Proto  Recv-Q  Send-Q  Local Address    Foreign Address    State    PID/Program name
tcp6   0       0       :::80            :::*               LISTEN   1/python

I meant the following, but then it doesn't listen on IPv4 either:
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    html = "<h3>Hello Kitty {name}!</h3>"
    return html

if __name__ == "__main__":
    app.run(host='::', port=80)
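
For completeness, one way to answer the probe on both address families would be a single dual-stack socket with IPV6_V6ONLY cleared, so IPv4 clients arrive as IPv4-mapped IPv6 addresses. This is only a sketch under that assumption (Linux behavior, standard library server), not the change made to the hello-kitty app:

  import socket
  from http.server import HTTPServer, BaseHTTPRequestHandler

  class DualStackHTTPServer(HTTPServer):
      address_family = socket.AF_INET6

      def server_bind(self):
          # 0 = also accept IPv4 connections as IPv4-mapped IPv6 addresses
          self.socket.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
          super().server_bind()

  class Hello(BaseHTTPRequestHandler):
      def do_GET(self):
          body = b"<h3>Hello Kitty!</h3>"
          self.send_response(200)   # a 2xx keeps the liveness/readiness probes happy
          self.send_header("Content-Type", "text/html")
          self.send_header("Content-Length", str(len(body)))
          self.end_headers()
          self.wfile.write(body)

  if __name__ == "__main__":
      DualStackHTTPServer(("::", 80), Hello).serve_forever()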

Yang Liu (yliu12)
tags: added: stx.retestneeded
Changed in starlingx:
status: Triaged → Fix Released
Revision history for this message
Peng Peng (ppeng) wrote :

Not seeing this issue recently

tags: removed: stx.retestneeded