os_tempest: Ping router ip address - Destination Host Unreachable

Bug #1953738 reported by Ananya Banerjee
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

The periodic-tripleo-ci-centos-9-scenario007-standalone-master job is failing at task [os_tempest : Ping router ip address] with:

TASK [os_tempest : Ping router ip address] *************************************
2021-12-08 08:42:29.331858 | primary | Wednesday 08 December 2021 08:42:29 -0500 (0:00:00.043) 0:32:50.959 ****
2021-12-08 08:42:33.921328 | primary | FAILED - RETRYING: Ping router ip address (5 retries left).
2021-12-08 08:42:48.749615 | primary | FAILED - RETRYING: Ping router ip address (4 retries left).
2021-12-08 08:43:03.284398 | primary | FAILED - RETRYING: Ping router ip address (3 retries left).
2021-12-08 08:43:17.743868 | primary | FAILED - RETRYING: Ping router ip address (2 retries left).
2021-12-08 08:43:32.346524 | primary | FAILED - RETRYING: Ping router ip address (1 retries left).
2021-12-08 08:43:46.894235 | primary | fatal: [undercloud]: FAILED! => {"attempts": 5, "changed": true, "cmd": "set -e\nping -c2 \"192.168.24.178\"\n", "delta": "0:00:03.096241", "end": "2021-12-08 13:43:46.635251", "msg": "non-zero return code", "rc": 1, "start": "2021-12-08 13:43:43.539010", "stderr": "", "stderr_lines": [], "stdout": "PING 192.168.24.178 (192.168.24.178) 56(84) bytes of data.\nFrom 192.168.24.1 icmp_seq=1 Destination Host Unreachable\nFrom 192.168.24.1 icmp_seq=2 Destination Host Unreachable\n\n--- 192.168.24.178 ping statistics ---\n2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1040ms\npipe 2", "stdout_lines": ["PING 192.168.24.178 (192.168.24.178) 56(84) bytes of data.", "From 192.168.24.1 icmp_seq=1 Destination Host Unreachable", "From 192.168.24.1 icmp_seq=2 Destination Host Unreachable", "", "--- 192.168.24.178 ping statistics ---", "2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1040ms", "pipe 2"]}
2021-12-08 08:43:46.895013 | primary |

https://logserver.rdoproject.org/08/36508/16/check/periodic-tripleo-ci-centos-9-scenario007-standalone-master/46d365d/job-output.txt
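For triage, the ping output captured in the task's "stdout" field can be summarized mechanically. The following is a minimal sketch, assuming iputils-style ping output in the format shown in the log above; the function name `summarize_ping` and the returned keys are hypothetical and are not part of os_tempest.

```python
import re


def summarize_ping(stdout: str) -> dict:
    """Summarize ping output (format assumed from the job log above)."""
    stats = re.search(
        r"(\d+) packets transmitted, (\d+) received(?:, \+(\d+) errors)?, "
        r"(\d+(?:\.\d+)?)% packet loss",
        stdout,
    )
    if stats is None:
        raise ValueError("no ping statistics line found")
    # Addresses that sent back an ICMP "Destination Host Unreachable" error.
    reporters = sorted(set(re.findall(
        r"^From (\S+) icmp_seq=\d+ Destination Host Unreachable",
        stdout,
        re.MULTILINE,
    )))
    return {
        "transmitted": int(stats.group(1)),
        "received": int(stats.group(2)),
        "errors": int(stats.group(3) or 0),
        "loss_percent": float(stats.group(4)),
        "unreachable_reported_by": reporters,
    }


# The stdout of the failed task, copied from the log above.
SAMPLE = (
    "PING 192.168.24.178 (192.168.24.178) 56(84) bytes of data.\n"
    "From 192.168.24.1 icmp_seq=1 Destination Host Unreachable\n"
    "From 192.168.24.1 icmp_seq=2 Destination Host Unreachable\n"
    "\n"
    "--- 192.168.24.178 ping statistics ---\n"
    "2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1040ms\n"
    "pipe 2"
)
summary = summarize_ping(SAMPLE)
print(summary)
```

Here the unreachable errors are reported by 192.168.24.1 rather than by the router address itself, which points at the router side never coming up rather than at a filtered reply.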

Changed in tripleo:
status: New → Triaged
importance: Undecided → High
milestone: none → yoga-1
tags: added: ci
tags: added: alert promotion-blocker
Slawek Kaplonski (slaweq) wrote :

In the L3 agent logs of that failed job, I see errors when spawning the keepalived sidecar container:

2021-12-08 13:42:35.111 131589 ERROR neutron.agent.l3.agent [-] Failed to process compatible router: ac39aabb-b846-44d3-a702-54a58e5b187a: neutron_lib.exceptions.ProcessExecutionError: Exit code: 127; Cmd: ['ip', 'netns', 'exec', 'qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a', 'keepalived', '-P', '-f', '/var/lib/neutron/ha_confs/ac39aabb-b846-44d3-a702-54a58e5b187a/keepalived.conf', '-p', '/var/lib/neutron/ha_confs/ac39aabb-b846-44d3-a702-54a58e5b187a.pid.keepalived', '-r', '/var/lib/neutron/ha_confs/ac39aabb-b846-44d3-a702-54a58e5b187a.pid.keepalived-vrrp', '-D']; Stdin: ; Stdout: Starting a new child container neutron-keepalived-qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a
; Stderr: + export DOCKER_HOST=
+ DOCKER_HOST=
+ ARGS='-P -f /var/lib/neutron/ha_confs/ac39aabb-b846-44d3-a702-54a58e5b187a/keepalived.conf -p /var/lib/neutron/ha_confs/ac39aabb-b846-44d3-a702-54a58e5b187a.pid.keepalived -r /var/lib/neutron/ha_confs/ac39aabb-b846-44d3-a702-54a58e5b187a.pid.keepalived-vrrp -D'
++ ip netns identify
+ NETNS=qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a
+ NAME=neutron-keepalived-qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a
+ CLI='nsenter --net=/run/netns/qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a --preserve-credentials -m -t 1 podman'
+ LOGGING='--log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-keepalived-qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a.log'
+ CMD='/usr/sbin/keepalived -n -l -D'
++ nsenter --net=/run/netns/qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-keepalived- --format '{{.ID}}:{{.Names}}:{{.Status}}'
++ awk '{print $1}'
+ LIST=
++ printf '%s\n' ''
++ grep -E ':(Exited|Created)'
+ ORPHANTS=
+ '[' -n '' ']'
+ printf '%s\n' ''
+ grep -q 'neutron-keepalived-qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a$'
+ echo 'Starting a new child container neutron-keepalived-qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a'
+ nsenter --net=/run/netns/qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-keepalived-qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /lib/modules:/lib/modules:ro -v /sbin/modprobe:/sbin/modprobe:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:shared -v /dev/log:/dev/log --net host --pid host --cgroupns host --privileged -u root --name neutron-keepalived-qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a 192.168.24.1:8787/tripleomastercentos9/openstack-neutron-l3-agent:9d27ee6e0e4f5143dbc49d4c775c9cd5-updated-20211208080944 /usr/sbin/keepalived -n -l -D -P -f /var/lib/neutron/ha_confs/ac39aabb-b846-44d3-a702-54a58e5b187a/keepalived.conf -p /var/lib/neutron/ha_confs/ac39aabb-b846-44d3-a702-54a58e5b187a.pid.keepalived -r /var/lib/neutron/ha_confs/ac39aabb-b846-44d3-a702-54a58e5b187a.pid.keepalived-vrrp -D
Error: create directory `/sys/fs/cgroup/../../libpod-74030e1ee20a0de5fd4791acb2b5751fe0e1e8e03f6bb4f5714da41487e27d94.scope`: No...
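As a reading aid for the log above: the exit code in the ProcessExecutionError can be pulled out and compared with the exit statuses documented in the podman-run man page. This is only a sketch. It assumes the keepalived wrapper passes podman's exit status through unchanged, which the trace suggests but does not prove; the helper names are hypothetical, and the sample line is an abbreviated excerpt of the log entry.

```python
import re

# Exit statuses described in the podman-run man page.
PODMAN_EXIT_HINTS = {
    125: "error in podman itself",
    126: "contained command could not be invoked",
    127: "contained command could not be found",
}


def exit_code_from_log(line: str):
    """Return the integer after 'Exit code:' in a neutron log line, or None."""
    match = re.search(r"Exit code: (\d+);", line)
    return int(match.group(1)) if match else None


def explain_exit(line: str) -> str:
    code = exit_code_from_log(line)
    if code is None:
        return "no exit code found"
    hint = PODMAN_EXIT_HINTS.get(code, "exit status of the contained command")
    return f"exit code {code}: {hint}"


# Abbreviated excerpt of the L3 agent log entry above (Cmd list shortened).
SAMPLE = (
    "ERROR neutron.agent.l3.agent [-] Failed to process compatible router: "
    "ac39aabb-b846-44d3-a702-54a58e5b187a: "
    "neutron_lib.exceptions.ProcessExecutionError: Exit code: 127; "
    "Cmd: ['ip', 'netns', 'exec']"
)
print(explain_exit(SAMPLE))
```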

Slawek Kaplonski (slaweq) wrote :

Ahh, one more thing. If this issue is a blocker and needs to be fixed or worked around quickly, you can set the neutron server's "l3_ha" config option to False. That way it will use legacy, non-HA routers, so the keepalived container will not be needed at all.
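To illustrate the option being discussed, here is a minimal sketch that flips "l3_ha" in a neutron.conf-style text. It assumes the option lives in the [DEFAULT] section and that the file is simple enough for Python's configparser (comments are dropped on rewrite). The function name is hypothetical, and in a TripleO deployment the configuration is generated by the deployment tooling, so editing the file directly is shown only to make the setting concrete.

```python
import configparser
import io


def disable_l3_ha(conf_text: str) -> str:
    """Return conf_text with l3_ha set to False in [DEFAULT] (assumed section)."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    parser["DEFAULT"]["l3_ha"] = "False"
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical input; a real neutron.conf has many more options.
EXAMPLE = "[DEFAULT]\nl3_ha = True\ndebug = False\n"
updated = disable_l3_ha(EXAMPLE)
print(updated)
```

Routers created while the option was True keep their HA flag, so this only affects routers created afterwards.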

Brent Eagles (beagles) wrote :

"Error: create directory `/sys/fs/cgroup/../../libpod-74030e1ee20a0de5fd4791acb2b5751fe0e1e8e03f6bb4f5714da41487e27d94.scope`: No...:"

This suggests we should look at how the sidecars are being launched and whether there are new requirements for cgroup2, etc. in CentOS 9.
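One quick check when comparing hosts is whether a node is actually running the unified cgroup v2 hierarchy. A minimal sketch, relying on the fact that a cgroup v2 mount exposes a `cgroup.controllers` file at its root; the function name and return strings are hypothetical.

```python
from pathlib import Path


def cgroup_mode(root: str = "/sys/fs/cgroup") -> str:
    """Guess the cgroup setup from the layout under the given mount point."""
    base = Path(root)
    if (base / "cgroup.controllers").is_file():
        # cgroup v2 exposes this file at the root of the unified hierarchy.
        return "v2"
    if base.is_dir():
        # Per-controller directories (cpu, memory, ...) indicate v1 or hybrid.
        return "v1 or hybrid"
    return "unknown"


print(cgroup_mode())
```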

Bogdan Dobrelya (bogdando) wrote :

Raised to Critical since the neutron sidecars are on a critical path.

Changed in tripleo:
importance: High → Critical
Cédric Jeanneret (cjeanner) wrote :

Hello,

Indeed, I can't say more for now (I haven't really tested CS9), but this is the relevant error:

+ nsenter --net=/run/netns/qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a --preserve-credentials -m -t 1 podman run --detach --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/neutron-keepalived-qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /lib/modules:/lib/modules:ro -v /sbin/modprobe:/sbin/modprobe:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:shared -v /dev/log:/dev/log --net host --pid host --cgroupns host --privileged -u root --name neutron-keepalived-qrouter-ac39aabb-b846-44d3-a702-54a58e5b187a 192.168.24.1:8787/tripleomastercentos9/openstack-neutron-l3-agent:9d27ee6e0e4f5143dbc49d4c775c9cd5-updated-20211208080944 /usr/sbin/keepalived -n -l -D -P -f /var/lib/neutron/ha_confs/ac39aabb-b846-44d3-a702-54a58e5b187a/keepalived.conf -p /var/lib/neutron/ha_confs/ac39aabb-b846-44d3-a702-54a58e5b187a.pid.keepalived -r /var/lib/neutron/ha_confs/ac39aabb-b846-44d3-a702-54a58e5b187a.pid.keepalived-vrrp -D
Error: create directory `/sys/fs/cgroup/../../libpod-74030e1ee20a0de5fd4791acb2b5751fe0e1e8e03f6bb4f5714da41487e27d94.scope`: No such file or directory: OCI runtime attempted to invoke a command that was not found

It would be interesting to check the SELinux denials in parallel (though CI runs in permissive mode), and to see whether we get any other errors in the system logs.

A local reproducer would also be nice so that we can iterate on it.

I wonder whether the *path* is different with cgroup2. That needs some investigation, IMHO.

Also, maybe nsenter has some cgroup2 option?

Marios Andreou (marios-b) wrote :

https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/821392 was posted as a workaround based on comment #2 above.

chandan kumar (chkumar246) wrote :
David Vallee Delisle (valleedelisle) wrote :

I don't have a reproducer for this issue, but this is how we fixed it in nova, applied here to the L3 agent:

~~~
diff --git a/deployment/neutron/neutron-l3-container-puppet.yaml b/deployment/neutron/neutron-l3-container-puppet.yaml
index 15fd5660f..49f860a7b 100644
--- a/deployment/neutron/neutron-l3-container-puppet.yaml
+++ b/deployment/neutron/neutron-l3-container-puppet.yaml
@@ -292,6 +292,7 @@ outputs:
             pid: host
             privileged: true
             restart: always
+            cgroupns: host
             depends_on:
               - openvswitch.service
             healthcheck: {get_attr: [ContainersCommon, healthcheck_rpc_port]}
~~~
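To confirm on a deployed node that a process really shares the host cgroup namespace after such a change, the `ns/cgroup` links under /proc can be compared. A minimal sketch with hypothetical helper names; on a real host, reading the link for another user's process (including PID 1) typically requires root.

```python
import os


def cgroup_ns_id(pid, proc_root: str = "/proc") -> str:
    """Return the namespace identifier, e.g. 'cgroup:[4026531835]'."""
    return os.readlink(os.path.join(proc_root, str(pid), "ns", "cgroup"))


def shares_cgroup_namespace(pid_a, pid_b, proc_root: str = "/proc") -> bool:
    """True when both processes are in the same cgroup namespace."""
    return cgroup_ns_id(pid_a, proc_root) == cgroup_ns_id(pid_b, proc_root)
```

For example, `shares_cgroup_namespace(1, agent_pid)` would be expected to return True for the L3 agent process once the container runs with `cgroupns: host`, where `agent_pid` stands for the host PID of that process.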

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)
Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by "David Vallee Delisle <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/821709
Reason: zuul stuck?

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/821709
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/157d0c112bf21139b4d9ca076f1121a941a35114
Submitter: "Zuul (22348)"
Branch: master

commit 157d0c112bf21139b4d9ca076f1121a941a35114
Author: David Vallee Delisle <email address hidden>
Date: Tue Dec 14 09:58:06 2021 -0500

    Start the l3 agent with cgroupns: host

    Since the l3 agent is spinning containers, it should use the host cgroups
    namespaces just like we did in nova [1]

    [1] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/802489/

    Related-Bug: #1936005
    Closes-Bug: #1953738
    Change-Id: Ic83e946e1f3dc912bc4cf8270d66ecc7c2324c96

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by "Ananya <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/821392
Reason: the proper fix https://review.opendev.org/c/openstack/tripleo-heat-templates/+/821709 has been merged

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/827642

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/827642
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/f962b8e14829d896b732867e2a9b862b4323ecb4
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit f962b8e14829d896b732867e2a9b862b4323ecb4
Author: David Vallee Delisle <email address hidden>
Date: Tue Dec 14 09:58:06 2021 -0500

    Start the l3 agent with cgroupns: host

    Since the l3 agent is spinning containers, it should use the host cgroups
    namespaces just like we did in nova [1]

    [1] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/802489/

    Related-Bug: #1936005
    Closes-Bug: #1953738
    Closes-Bug: #1959582
    Change-Id: Ic83e946e1f3dc912bc4cf8270d66ecc7c2324c96
    (cherry picked from commit 157d0c112bf21139b4d9ca076f1121a941a35114)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 16.0.0

This issue was fixed in the openstack/tripleo-heat-templates 16.0.0 release.
