stx-openstack: neutron netns cleanup job is deadlocking the cpu

Bug #1964507 reported by Thiago Paiva Brito
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Thiago Paiva Brito
Milestone: (none listed)

Bug Description

Brief Description
-----------------
The neutron-netns-cleanup-cron pods are erroring out, and because the script does not handle that failure, it keeps looping and causes high CPU consumption (>50%). The root cause is that we override the default root_helper configuration in the neutron chart.

Severity
--------
<Major: System/Feature is usable but degraded>

Steps to Reproduce
------------------
Apply stx-openstack and look at the neutron-netns-cleanup-cron-default-* jobs. Resource consumption can be verified using top.
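
A minimal sketch for spotting the affected pods, assuming the application runs in the `openstack` namespace (the namespace used in the workaround below) and that `kubectl get pods` prints its usual NAME/READY/STATUS columns. The helper name `netns_cleanup_pod_status` is hypothetical; it only filters text on stdin.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# "<pod name> <status>" for every neutron-netns-cleanup-cron pod.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
netns_cleanup_pod_status() {
    awk '$1 ~ /^neutron-netns-cleanup-cron/ { print $1, $3 }'
}
```

Possible usage: `kubectl get pods -n openstack | netns_cleanup_pod_status`, followed by `top` on the host to check CPU consumption.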

Expected Behavior
------------------
Job should run and sleep for a whole day.

Actual Behavior
----------------
The job loops continuously and consumes more than 50% of CPU.
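
As an illustration only, here is one way a periodic job can avoid this kind of busy loop: sleep after every attempt, whether or not the command succeeded. This is a hypothetical sketch and not the actual neutron-netns-cleanup-cron script; the function name `run_cleanup_loop`, its arguments, and the iteration cap are assumptions made so the example terminates.

```shell
# Hypothetical sketch, not the actual job script.
# run_cleanup_loop CMD INTERVAL MAX_ITERS
#   CMD        command to run on each iteration
#   INTERVAL   seconds to sleep after every attempt, success or failure
#   MAX_ITERS  iteration cap so the sketch terminates
# Prints the number of failed attempts on stdout.
run_cleanup_loop() {
    cmd=$1
    interval=$2
    max_iters=$3
    i=0
    failures=0
    while [ "$i" -lt "$max_iters" ]; do
        # $cmd is intentionally unquoted so a multi-word command splits into words.
        if ! $cmd; then
            failures=$((failures + 1))
            echo "cleanup failed (attempt $((i + 1)))" >&2
        fi
        # Sleep unconditionally so a persistent failure cannot become a busy loop.
        sleep "$interval"
        i=$((i + 1))
    done
    echo "$failures"
}
```

With an interval of 86400 seconds, a loop of this shape would match the expected once-a-day cadence even when every attempt fails.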

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
All configurations

Branch/Pull Time/Commit
-----------------------
N/A

Last Pass
---------
N/A

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Developer Testing

Workaround
----------
system helm-override-update stx-openstack neutron openstack --set conf.neutron.agent.root_helper="sudo /var/lib/openstack/bin/neutron-rootwrap /etc/neutron/rootwrap.conf"
system application-apply stx-openstack
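
To confirm the override took effect, the stored overrides can be inspected before re-applying. The sketch below assumes the output of `system helm-override-show` contains a `root_helper` line; the exact output layout is not taken from this report, and the helper name `check_root_helper` is hypothetical.

```shell
# Hypothetical helper: read override output on stdin and succeed only if
# a line mentioning root_helper also mentions neutron-rootwrap.
check_root_helper() {
    grep 'root_helper' | grep -q 'neutron-rootwrap'
}
```

Possible usage: `system helm-override-show stx-openstack neutron openstack | check_root_helper && echo "rootwrap override present"`.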

Thiago Paiva Brito (outbrito) wrote :

The bug in the script itself is dealt with in https://review.opendev.org/c/openstack/openstack-helm/+/833160, which will prevent this from happening in case of any future misconfiguration.

This bug only covers openstack-armada-app. The aim is to remove the override that is causing the error. We override conf.neutron.agent.root_helper with just "sudo", but calling "ip netns list" with sudo causes a problem on the current version of our base OS for the neutron image:

```
sh-4.2$ sudo ip netns list
sudo: unable to mkdir /run/sudo: Read-only file system

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

[sudo] password for neutron:
```

Using the default root_helper, which calls the rootwrap that OSH uses by default [1], solves the problem, so I'll remove the override.

[1] https://github.com/openstack/openstack-helm/blame/master/neutron/values.yaml#L1903
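
A quick way to compare the two helpers from inside the container is to probe each one and look only at the exit status. This is a sketch under assumptions: the function name `probe_root_helper` is hypothetical, and `sudo -n` (sudo's non-interactive flag) is used here so that a password prompt turns into a failure instead of blocking.

```shell
# Hypothetical helper.
# probe_root_helper HELPER CMD [ARGS...]
#   HELPER  root helper as a single string, e.g. "sudo -n"
#   CMD     command to run through the helper
# Output is discarded; only the exit status is reported.
probe_root_helper() {
    helper=$1
    shift
    # $helper is intentionally unquoted so a multi-word helper splits into words.
    $helper "$@" </dev/null >/dev/null 2>&1
}
```

Possible usage: `probe_root_helper "sudo -n" ip netns list || echo "plain sudo fails"`, then the same probe with `"sudo -n /var/lib/openstack/bin/neutron-rootwrap /etc/neutron/rootwrap.conf"` as the helper.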

Changed in starlingx:
assignee: nobody → Thiago Paiva Brito (outbrito)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-armada-app (master)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-armada-app (master)

Reviewed: https://review.opendev.org/c/starlingx/openstack-armada-app/+/833162
Committed: https://opendev.org/starlingx/openstack-armada-app/commit/352e34cd2aa12c0bbf8b5a06c7b621aa480f9a9b
Submitter: "Zuul (22348)"
Branch: master

commit 352e34cd2aa12c0bbf8b5a06c7b621aa480f9a9b
Author: Thiago Brito <email address hidden>
Date: Thu Mar 10 15:19:28 2022 -0300

    Fix misconfig causing neutron netns job fail

    We should be using the default rootwrap that OSH uses, not overwriting
    this with sudo. Using just sudo cause a permission problem running this
    script.

    Closes-Bug: #1964507
    Signed-off-by: Thiago Brito <email address hidden>
    Change-Id: I0b16483504fcc83020e3387b7696b55b618a0942

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.7.0 stx.distro.openstack