openvswitch.agent.ovs_neutron_agent fails with Cmd: ['iptables-restore', '-n']

Bug #2033683 reported by Alex Glebov
This bug affects 1 person
Affects   Status   Importance  Assigned to  Milestone
neutron   Invalid  Undecided   Unassigned
tripleo   Invalid  Undecided   Unassigned

Bug Description

Description
===========
Wallaby deployments via undercloud/overcloud recently started failing at the overcloud node provision step.
Neutron constantly reports that it cannot update iptables, which in turn causes the baremetal node to fail to boot from PXE.
From my review, it seems that the /usr/bin/update-alternatives call that sets iptables to legacy fails because the neutron user cannot run it via sudo.
The neutron user is allowed to run the following subset of commands:
...
    (root) NOPASSWD: /usr/bin/update-alternatives --set iptables /usr/sbin/iptables-legacy
    (root) NOPASSWD: /usr/bin/update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
    (root) NOPASSWD: /usr/bin/update-alternatives --auto iptables
    (root) NOPASSWD: /usr/bin/update-alternatives --auto ip6tables

But the issue is that the command isn't found at that path, as it was moved to /usr/sbin/update-alternatives.
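
One way to spot this kind of path mismatch is to check every command path in the sudo listing against the filesystem. The following is a minimal sketch, assuming the listing uses the `(root) NOPASSWD: <command> ...` format shown above; the function name and the injectable `exists` parameter are hypothetical and only there to make the check easy to try outside the container.

```python
import os
import re

# Matches entries such as:
#   (root) NOPASSWD: /usr/bin/update-alternatives --auto iptables
# and captures only the command path (the first token after "NOPASSWD:").
SUDOERS_CMD = re.compile(r"^\s*\([^)]*\)\s+NOPASSWD:\s+(\S+)")


def missing_sudoers_commands(sudoers_listing, exists=os.path.exists):
    """Return the distinct command paths in the listing that do not exist."""
    missing = []
    for line in sudoers_listing.splitlines():
        match = SUDOERS_CMD.match(line)
        if match is None:
            continue  # not a NOPASSWD entry in the assumed format
        path = match.group(1)
        if not exists(path) and path not in missing:
            missing.append(path)
    return missing
```

Run inside the affected container with the listing pasted in, a result containing /usr/bin/update-alternatives would be consistent with the command having moved to /usr/sbin.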

Steps to reproduce
==================
1. Deploy undercloud
2. Deploy networks and VIP
3. Add and introspect a node
4. Execute overcloud node provision ... which will time out

Expected result
===============
Successful overcloud node baremetal provisioning

Logs & Configs
==============
2023-08-31 18:21:28.613 4413 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-18d52177-9c93-401c-b97d-0334e488a257 - - - - -] Error while processing VIF ports: neutron_lib.exceptions.ProcessExecutionError: Exit code: 1; Cmd: ['iptables-restore', '-n']; Stdin: # Generated by iptables_manager

2023-08-31 18:21:28.613 4413 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent COMMIT
2023-08-31 18:21:28.613 4413 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent # Completed by iptables_manager
2023-08-31 18:21:28.613 4413 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent ; Stdout: ; Stderr: iptables-restore: line 23 failed
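
The Stdin dump is truncated in the log above, so the rule at line 23 is not visible here. With the full dump at hand, the rejected line can be looked up from the Stderr message. This is a rough sketch, assuming iptables-restore counts input lines starting from 1; the function name is hypothetical.

```python
import re

# Stderr format as it appears in the log: "iptables-restore: line 23 failed"
FAILED_LINE = re.compile(r"iptables-restore: line (\d+) failed")


def failing_rule(stdin_dump, stderr):
    """Return (line_number, rule_text) for the rejected line, or None.

    rule_text is None when the reported number is outside the dump,
    for example when only a truncated dump is available.
    """
    match = FAILED_LINE.search(stderr)
    if match is None:
        return None
    number = int(match.group(1))
    lines = stdin_dump.splitlines()
    if not 1 <= number <= len(lines):
        return number, None
    # Assumption: reported line numbers are 1-based.
    return number, lines[number - 1]
```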

Environment
===========
CentOS 9 Stream and the undercloud deployment tool

Alex Glebov (alexiusflavius) wrote :

Additional info: from what I can see, the affected image is
https://quay.io/repository/tripleowallaby/openstack-neutron-server

Step:
cp /usr/share/openstack-tripleo-common-containers/container-images/kolla/neutron-base/neutron_sudoers /etc/sudoers.d/neutron_sudoers

The source file and its execution should be updated with the new command location, or a link should be created so that both command paths are available.

tags: added: tripleo-common
Dr. Jens Harbott (j-harbott) wrote :

IIUC the "update-alternatives" commands are part of what tripleo does during deployment? Then I'd say that neutron as a project is not directly affected.

yatin (yatinkarel) wrote :

https://review.opendev.org/c/openstack/kolla/+/761182 looks to be missing from the ported TripleO images.

From what I see, /usr/sbin/update-alternatives has been the actual location for a long time, so why is the issue being uncovered only now?

Changed in neutron:
status: New → Invalid
Alex Glebov (alexiusflavius) wrote :

Hello,
It's hard to say which project should be patched, but the sudo rules in the tripleowallaby/openstack-neutron-server image have to be patched.
The image itself uses the neutron user, which doesn't seem to be able to run the commands needed to apply the correct iptables rules.
This in turn prevents deployment of new nodes, as provisioning isn't working, and renders the whole cluster failed.
Can someone take a look at why the patch https://review.opendev.org/c/openstack/kolla/+/761182 mentioned above has been excluded from the neutron image?

Changed in neutron:
status: Invalid → New
yatin (yatinkarel) wrote :

Hi Alex,

<< Can someone take a look why the above patch https://review.opendev.org/c/openstack/kolla/+/761182 mentioned here has been excluded from the neutron image?

It was probably just missed: since the Train release, TripleO builds container images natively and does not use kolla. You can propose a patch in tripleo-common to fix it.

As I said, I was more interested to know why the issue is being seen only now, as /usr/sbin/update-alternatives has been the path for a long time.

But considering you are using CentOS 8 Stream containers on a CentOS 9 Stream host, I think you are hitting a recent iptables issue in CentOS 8 Stream [1]. You can check the version in your running container; if it matches iptables-1.8.5-8, you can downgrade it to resolve the issue temporarily, as the fix for it is not yet merged.
If there is no real reason to use CentOS 8 images, you can move to CentOS 9 Stream based images [2].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2236501
[2] https://quay.io/repository/tripleowallabycentos9/openstack-neutron-server?tab=tags

Again marking it as invalid for neutron; feel free to reopen, but please share what fix is expected in the neutron project.
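
The version check suggested above can be scripted. Here is a minimal sketch, assuming the package is named `iptables` and the input is the output of `rpm -q iptables` run inside the container (for example a line starting with iptables-1.8.5-8 followed by a dist tag). The function name is hypothetical, and the only affected version it knows about is the one named in this comment.

```python
import re

# Version-release named in the comment above as the affected build.
# The trailing group stops "1.8.5-80" from matching "1.8.5-8".
AFFECTED = re.compile(r"iptables-1\.8\.5-8(\.|$)")


def looks_affected(rpm_query_output):
    """Return True if any queried package line is iptables-1.8.5-8."""
    for line in rpm_query_output.splitlines():
        # match() anchors at the start, so iptables-libs-* is ignored.
        if AFFECTED.match(line.strip()):
            return True
    return False
```

A True result only indicates that the version string matches; whether the downgrade resolves the failure still has to be verified on the deployment.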

Changed in neutron:
status: New → Invalid
Changed in tripleo:
status: New → Confirmed
Alex Glebov (alexiusflavius) wrote :

Hello,
Thanks for such a detailed response. However, I already have a running deployment that uses CentOS 8 containers, and I would really appreciate it if someone could merge the bugfix into the neutron image again.

For new deployments I'll use CentOS 9 if those images don't have the above issue.
Thanks

Takashi Kajinami (kajinamit) wrote :

Sorry, I overlooked comment 5; it seems this is a bug in iptables in c8s.
https://bugzilla.redhat.com/show_bug.cgi?id=2236501

Changed in neutron:
status: Invalid → New
status: New → Invalid
Takashi Kajinami (kajinamit) wrote :

We are facing this issue in Puppet OpenStack CI, which uses RDO stable/yoga and c8s, so this looks like a legitimate bug in iptables.
I don't think this is related to TripleO either, so I'll close this as invalid.

Changed in tripleo:
status: Confirmed → Invalid