dhcp-agent failed to open netns

Bug #1864856 reported by Kevin Zhao
This bug affects 2 people
Affects          Status         Importance   Assigned to
kolla-ansible    Fix Released   High         Radosław Piliszek
  Rocky          Fix Released   High         Radosław Piliszek
  Stein          Fix Released   High         Radosław Piliszek
  Train          Fix Released   High         Michal Nasiadka
  Ussuri         Fix Released   High         Radosław Piliszek

Bug Description

Train release. The DHCP agent log shows the following (the l3 agent log shows the same error):

openstack subnet create: error: argument --subnet-range: expected one argument
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/agent/linux/dhcp.py", line 228, in _enable
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent interface_name = self.device_manager.setup(self.network)
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/agent/linux/dhcp.py", line 1545, in setup
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent namespace=network.namespace):
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/agent/linux/ip_lib.py", line 949, in ensure_device_is_ready
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent if not dev.link.address:
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/agent/linux/ip_lib.py", line 476, in address
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent return self.attributes.get('link/ether')
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/agent/linux/ip_lib.py", line 509, in attributes
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent self._parent.namespace)
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 53, in sync_inner
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent return input_func(*args, **kwargs)
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/oslo_privsep/priv_context.py", line 245, in _wrap
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent return self.channel.remote_call(name, args, kwargs)
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/oslo_privsep/daemon.py", line 204, in remote_call
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent raise exc_type(*result[2])
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent OSError: [Errno 22] failed to open netns
2020-02-26 14:16:44.000 6 ERROR neutron.agent.dhcp.agent
2020-02-26 14:16:44.002 6 INFO neutron.agent.dhcp.agent [-] Finished network a0b00759-ac50-495a-ac5e-9b03e361f3cb dhcp configuration
2020-02-26 14:16:44.003 6 ERROR neutron.agent.linux.utils [-] Exit code: 1; Stdin: ; Stdout: ; Stderr: setting the network namespace "qdhcp-9d13dabf-a892-415e-a923-a0c2a619c04a" failed: Invalid argument

2020-02-26 14:16:44.003 6 WARNING neutron.agent.linux.ip_lib [-] Setting ['sysctl', '-w', 'net.ipv6.conf.default.accept_ra=0'] in namespace qdhcp-9d13dabf-a892-415e-a923-a0c2a619c04a failed: Exit code: 1; Stdin: ; Stdout: ; Stderr: setting the network namespace "qdhcp-9d13dabf-a892-415e-a923-a0c2a619c04a" failed: Invalid argument
.: neutron_lib.exceptions.ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr: setting the network namespace "qdhcp-9d13dabf-a892-415e-a923-a0c2a619c04a" failed: Invalid argument
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent [-] Unable to enable dhcp for 9d13dabf-a892-415e-a923-a0c2a619c04a.: OSError: [Errno 22] failed to open netns
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/agent/dhcp/agent.py", line 160, in call_driver
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent getattr(driver, action)(**action_kwargs)
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/agent/linux/dhcp.py", line 217, in enable
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent common_utils.wait_until_true(self._enable, timeout=300)
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/common/utils.py", line 701, in wait_until_true
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent while not predicate():
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/agent/linux/dhcp.py", line 228, in _enable
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent interface_name = self.device_manager.setup(self.network)
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/agent/linux/dhcp.py", line 1545, in setup
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent namespace=network.namespace):
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/agent/linux/ip_lib.py", line 949, in ensure_device_is_ready
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent if not dev.link.address:
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/agent/linux/ip_lib.py", line 476, in address
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent return self.attributes.get('link/ether')
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/agent/linux/ip_lib.py", line 509, in attributes
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent self._parent.namespace)
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 53, in sync_inner
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent return input_func(*args, **kwargs)
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/oslo_privsep/priv_context.py", line 245, in _wrap
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent return self.channel.remote_call(name, args, kwargs)
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.7/site-packages/oslo_privsep/daemon.py", line 204, in remote_call
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent raise exc_type(*result[2])
2020-02-26 14:16:44.068 6 ERROR neutron.agent.dhcp.agent OSError: [Errno 22] failed to open netns

Changed in kolla-ansible:
status: New → In Progress
Kevin Zhao (kevin-zhao) wrote :

It is hard to reproduce. The procedure:
set enable_octavia: "yes" in /etc/kolla/globals.yml

and run the deploy with the all-in-one inventory (kolla-ansible -i all-in-one deploy).

After that, create the Octavia management network:

OCTAVIA_MGMT_SUBNET=192.168.100.0/24
OCTAVIA_MGMT_SUBNET_START=192.168.100.2
OCTAVIA_MGMT_SUBNET_END=192.168.100.254
openstack network create lb-mgmt-net
openstack subnet create --subnet-range $OCTAVIA_MGMT_SUBNET --allocation-pool start=$OCTAVIA_MGMT_SUBNET_START,end=$OCTAVIA_MGMT_SUBNET_END --network lb-mgmt-net lb-mgmt-subnet
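The stray "openstack subnet create: error: argument --subnet-range: expected one argument" line pasted above the DHCP agent log is most likely what happens when $OCTAVIA_MGMT_SUBNET is empty: the unquoted variable expands to nothing, so --subnet-range is followed directly by --allocation-pool and the CLI's argument parser finds no value for it. A small guard (a hypothetical helper, not part of the original procedure) can stop the script before that happens:

```shell
# Hypothetical helper: fail if any of the named variables is unset or empty,
# so the openstack CLI never sees an option whose value is missing.
require_vars() {
  local name value missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (empty if unset)
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, before "openstack subnet create":
#   require_vars OCTAVIA_MGMT_SUBNET OCTAVIA_MGMT_SUBNET_START OCTAVIA_MGMT_SUBNET_END || exit 1
```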

openstack security group create lb-mgmt-sec-grp
openstack security group rule create --protocol icmp lb-mgmt-sec-grp
openstack security group rule create --protocol tcp --dst-port 22 lb-mgmt-sec-grp
openstack security group rule create --protocol tcp --dst-port 9443 lb-mgmt-sec-grp

OCTAVIA_HM_LISTEN_PORT=5555
OCTAVIA_AMP_LOG_ADMIN_PORT=10514
OCTAVIA_AMP_LOG_TENANT_PORT=20514

openstack security group create lb-health-mgr-sec-grp
openstack security group rule create --protocol udp --dst-port $OCTAVIA_HM_LISTEN_PORT lb-health-mgr-sec-grp
openstack security group rule create --protocol udp --dst-port $OCTAVIA_AMP_LOG_ADMIN_PORT lb-health-mgr-sec-grp
openstack security group rule create --protocol udp --dst-port $OCTAVIA_AMP_LOG_TENANT_PORT lb-health-mgr-sec-grp

MGMT_PORT_ID=$(openstack port create --security-group lb-health-mgr-sec-grp --device-owner Octavia:health-mgr --host=j12-d05-07 -c id -f value --network lb-mgmt-net octavia-health-manager-standalone-listen-port)

MGMT_PORT_MAC=$(openstack port show -c mac_address -f value $MGMT_PORT_ID)

MGMT_PORT_IP=$(openstack port show -f yaml -c fixed_ips $MGMT_PORT_ID | awk -v IP_VER=$SERVICE_IP_VERSION '{FS=",|";gsub(",","");gsub("'\''","");for(line = 1; line <= NF; ++line) {if ($line ~ /^.*- ip_address:/) {split($line, word, " ");if ((IP_VER == "4" || IP_VER == "") && word[3] ~ /\./) print word[3];if (IP_VER == "6" && word[3] ~ /:/) print word[3];} else {split($line, word, " ");for(ind in word) {if (word[ind] ~ /^ip_address=/) {split(word[ind], token, "=");if ((IP_VER == "4" || IP_VER == "") && token[2] ~ /\./) print token[2];if (IP_VER == "6" && token[2] ~ /:/) print token[2];}}}}}')
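The awk program above accepts both the YAML list form (- ip_address: ...) and the key=value form (ip_address=...) of the fixed_ips column; since $SERVICE_IP_VERSION is never set in this procedure, it falls back to IPv4. For an IPv4-only setup like this one, a shorter hypothetical helper can pull the address out with grep, assuming the address directly follows an ip_address key:

```shell
# Hypothetical IPv4-only alternative to the awk program: print every IPv4
# address that directly follows an "ip_address" key, whether written as
# "ip_address: X", "ip_address='X'" or "'ip_address': 'X'".
extract_ipv4() {
  grep -Eo "ip_address[=':\" ]+[0-9]+(\.[0-9]+){3}" |
    grep -Eo '[0-9]+(\.[0-9]+){3}'
}

# Usage:
#   MGMT_PORT_IP=$(openstack port show -f yaml -c fixed_ips $MGMT_PORT_ID | extract_ipv4 | head -n 1)
```

The pattern is deliberately loose (it does not reject octets above 255); it only has to pick the address out of well-formed CLI output.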

# Run ovs-vsctl inside the openvswitch_vswitchd container; using docker exec
# lets the host shell expand $MGMT_PORT_MAC and $MGMT_PORT_ID
docker exec openvswitch_vswitchd ovs-vsctl -- --may-exist add-port br-int o-hm0 \
  -- set Interface o-hm0 type=internal \
  -- set Interface o-hm0 external-ids:iface-status=active \
  -- set Interface o-hm0 external-ids:attached-mac=$MGMT_PORT_MAC \
  -- set Interface o-hm0 external-ids:iface-id=$MGMT_PORT_ID \
  -- set Interface o-hm0 external-ids:skip_cleanup=true

# Then, on the host (outside Docker), set the port MAC on o-hm0 and request an address via DHCP
sudo ip link set dev o-hm0 address $MGMT_PORT_MAC
sudo dhclient -v o-hm0

Michal Nasiadka (mnasiadka) wrote :

I don't think Octavia is needed to reproduce it - Dincer had the same problem on neutron-l3-agent. Kevin - can you check if https://review.opendev.org/#/c/710051/ fixes the bug?

Radosław Piliszek (yoctozepto) wrote :

This is reproducible on CentOS 7 / Train (Dincer's case):
reconfigure, then restart the l3 agent container one extra time.

The issue is that after the second restart the container holds invalid references to network namespaces: instead of proc mounts, /run/netns contains a bunch of regular files, which the l3 agent cannot use.
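To tell the two cases apart inside a container, one can check whether each entry under /run/netns is a mount point. The sketch below does this by looking the path up in /proc/self/mountinfo (similar to what mountpoint(1) does); the function names are hypothetical:

```shell
# Succeeds if PATH appears as a mount point (field 5) in the current
# mount namespace's /proc/self/mountinfo.
is_mounted() {
  awk -v p="$1" '$5 == p { found = 1 } END { exit !found }' /proc/self/mountinfo
}

# Print every entry under DIR (default /run/netns) that is NOT a mount
# point, i.e. a leftover regular file rather than a bound namespace.
list_stale_netns() {
  local dir="${1:-/run/netns}" ns
  for ns in "$dir"/*; do
    [ -e "$ns" ] || continue   # empty directory: the glob did not match
    is_mounted "$ns" || printf '%s\n' "$ns"
  done
}
```

Run it in a shell inside the affected container (for example neutron_l3_agent); any path it prints is a plain file the agent cannot enter.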

Logs:

2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent [-] Error while initializing router 87c3c36f-1038-45d4-afc0-c9c109239b66: ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr: setting the network namespace "qrouter-87c3c36f-1038-45d4-afc0-c9c109239b66" failed: Invalid argument
2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 485, in _router_added
2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent ri.initialize(self.process_monitor)
2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 167, in initialize
2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent self.router_namespace.create()
2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/l3/namespaces.py", line 97, in create
2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent ip_wrapper.netns.execute(cmd)
2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 713, in execute
2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent run_as_root=run_as_root)
2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 147, in execute
2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent returncode=returncode)
2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr: setting the network namespace "qrouter-87c3c36f-1038-45d4-afc0-c9c109239b66" failed: Invalid argument
2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent
2020-02-27 12:20:58.626 17 ERROR neutron.agent.l3.agent

2020-02-27 12:20:58.644 17 WARNING neutron.agent.l3.agent [-] Hit retry limit with router update for 87c3c36f-1038-45d4-afc0-c9c109239b66, action 3
2020-02-27 12:20:58.644 17 WARNING neutron.agent.l3.agent [-] Info for router 87c3c36f-1038-45d4-afc0-c9c109239b66 was not found. Performing router cleanup
2020-02-27 12:20:58.657 17 ERROR neutron.agent.l3.agent [-] Error while deleting router 87c3c36f-1038-45d4-afc0-c9c109239b66: OSError: [Errno 22] failed to open netns
2020-02-27 12:20:58.657 17 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2020-02-27 12:20:58.657 17 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 506, in _safe_router_removed
2020-02-27 12:20:58.657 17 ERROR neutron.agent.l3.agent self._router_removed(ri, router_id)
2020-02-27 12:20:58.657 17 ERROR neutron.agent.l3.agent File "/...

Radosław Piliszek (yoctozepto) wrote :

^ This also breaks router connectivity, as the l3 agent "could not restore it".

It can be worked around by clearing /run/netns inside the container (rm /run/netns/*) and then restarting the container.
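The workaround can be written as a small shell function. This is a sketch assuming Docker as the container runtime and kolla's default container names (neutron_l3_agent here, neutron_dhcp_agent for the DHCP agent); the function name is hypothetical:

```shell
# Sketch of the workaround: clear /run/netns inside an agent container,
# then restart that container. Assumes Docker and kolla container names.
clear_netns_and_restart() {
  local container="${1:-neutron_l3_agent}"
  # Run as root, since the container's default user may not be root.
  # Stop here if the cleanup command fails.
  docker exec -u root "$container" sh -c 'rm -f /run/netns/*' || return
  # On restart the agent should re-sync and recreate its namespaces.
  docker restart "$container"
}

# Usage:
#   clear_netns_and_restart neutron_l3_agent
#   clear_netns_and_restart neutron_dhcp_agent
```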

Radosław Piliszek (yoctozepto) wrote :

Kevin, please confirm https://review.opendev.org/710051 fixes your issue. It fixes at least one other one.

Changed in kolla-ansible:
assignee: Michal Nasiadka (mnasiadka) → Radosław Piliszek (yoctozepto)
Kevin Zhao (kevin-zhao) wrote :

@yoctozepto,
Thanks! I did not have time to test it yesterday. I will test it today and report the result.

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/710051
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=61a59e015f91041e01e51310229ff91c2cac730d
Submitter: Zuul
Branch: master

commit 61a59e015f91041e01e51310229ff91c2cac730d
Author: Michal Nasiadka <email address hidden>
Date: Wed Feb 26 16:00:47 2020 +0100

    Add /run/netns bindmount to Neutron containers

    Closes-Bug: #1864856
    Change-Id: I725eeb18a22b3fa7838f16761d19f7e699ab5e82
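The fix bind-mounts /run/netns into the Neutron containers. Assuming the volume is declared with shared propagation (for example /run/netns:/run/netns:shared; check the merged change for the exact options), namespace mounts created by the agents should be visible on the host and in the other Neutron containers. The hypothetical helper below reads the propagation flags from a mountinfo file, which can help confirm on a node, or inside a container, that /run/netns is mounted and shared:

```shell
# Hypothetical helper: print the propagation type ("shared", "slave" or
# "private") of mount point TARGET, read from a mountinfo file
# (default /proc/self/mountinfo). Returns 1 if TARGET is not a mount point.
mount_propagation() {
  awk -v p="$1" '
    # Field 5 is the mount point; optional fields run from field 7 up to "-".
    $5 == p {
      prop = "private"
      for (i = 7; i <= NF && $i != "-"; i++) {
        if ($i ~ /^shared:/) prop = "shared"
        else if ($i ~ /^master:/ && prop == "private") prop = "slave"
      }
    }
    # If the same path is mounted more than once, the last entry wins.
    END { if (prop == "") exit 1; print prop }
  ' "${2:-/proc/self/mountinfo}"
}
```

For example, with the fix deployed, running mount_propagation /run/netns inside neutron_l3_agent would be expected to print "shared" if the assumption above holds.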

Changed in kolla-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/710445

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/train)

Reviewed: https://review.opendev.org/710445
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=9f9e1efa8c54880a5a328b9c8a23744026b7c2e0
Submitter: Zuul
Branch: stable/train

commit 9f9e1efa8c54880a5a328b9c8a23744026b7c2e0
Author: Michal Nasiadka <email address hidden>
Date: Wed Feb 26 16:00:47 2020 +0100

    Add /run/netns bindmount to Neutron containers

    Closes-Bug: #1864856
    Change-Id: I725eeb18a22b3fa7838f16761d19f7e699ab5e82
    (cherry picked from commit 61a59e015f91041e01e51310229ff91c2cac730d)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/710513

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/710514

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/rocky)

Reviewed: https://review.opendev.org/710514
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=aed54c4ff0b71efa2ec075003d269259b0b6dbc6
Submitter: Zuul
Branch: stable/rocky

commit aed54c4ff0b71efa2ec075003d269259b0b6dbc6
Author: Michal Nasiadka <email address hidden>
Date: Wed Feb 26 16:00:47 2020 +0100

    Add /run/netns bindmount to Neutron containers

    Closes-Bug: #1864856
    Change-Id: I725eeb18a22b3fa7838f16761d19f7e699ab5e82
    (cherry picked from commit 61a59e015f91041e01e51310229ff91c2cac730d)
    (cherry picked from commit 9f9e1efa8c54880a5a328b9c8a23744026b7c2e0)

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/stein)

Reviewed: https://review.opendev.org/710513
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=710af675dfb456e7cba861f162b8e3e2e0128a89
Submitter: Zuul
Branch: stable/stein

commit 710af675dfb456e7cba861f162b8e3e2e0128a89
Author: Michal Nasiadka <email address hidden>
Date: Wed Feb 26 16:00:47 2020 +0100

    Add /run/netns bindmount to Neutron containers

    Closes-Bug: #1864856
    Change-Id: I725eeb18a22b3fa7838f16761d19f7e699ab5e82
    (cherry picked from commit 61a59e015f91041e01e51310229ff91c2cac730d)
    (cherry picked from commit 9f9e1efa8c54880a5a328b9c8a23744026b7c2e0)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 7.2.1

This issue was fixed in the openstack/kolla-ansible 7.2.1 release.
