[ML2/OVN] After upgrading from Xena to Yoga neutron-dhcp-agent is not working for Baremetals

Bug #1995287 reported by Grzegorz Koper
Affects    Status    Importance    Assigned to    Milestone
neutron    New       Undecided     Unassigned

Bug Description

After upgrading from Xena to Yoga, DHCP stopped working for baremetal instances.

neutron-dhcp-agent is stuck in a loop:

https://paste.opendev.org/show/bB3s1Zpd86i2R6ue1sPw/

neutron-server logs:

https://paste.opendev.org/show/bcMFvWkKByzjKhW7v5qO/

The agents report as up and running:

[stack@server .ssh]$ openstack network agent list | grep -i dhcp
| 02cecb18-841d-47f2-8b3e-c05134cf17b6 | DHCP agent | server3 | nova | :-) | UP | neutron-dhcp-agent |
| 2768ae2b-3869-4acd-9a83-d5446e62099c | DHCP agent | server1 | nova | :-) | UP | neutron-dhcp-agent |
| 8ed34d8b-30f4-45f3-8bfb-4689f616f5c6 | DHCP agent | server2 | nova | :-) | UP | neutron-dhcp-agent |

Michal Nasiadka (mnasiadka) wrote :

After enabling debug in neutron-dhcp-agent:
2022-11-02 09:43:39.106 7 DEBUG neutron.common.utils [req-1d32d76b-a61e-47cd-8cea-7abfc06e1487 - - - - -] Calling throttled function clear wrapper /var/lib/kolla/venv/lib/python3.6/site-packages/neutron/common/utils.py:111
2022-11-02 09:43:39.106 7 DEBUG neutron.agent.dhcp.agent [req-1d32d76b-a61e-47cd-8cea-7abfc06e1487 - - - - -] resync (a56b71cc-2220-479d-97a5-082b722e9e4c): ['Remote error: NetworkNotFound Network a56b71cc-2220-479d-97a5-082b722e9e4c could not be found.\n[\'Traceback (most recent call last):\\n\', \' File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 241, in inner\\n return func(*args, **kwargs)\\n\', \' File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 241, in inner\\n return func(*args, **kwargs)\\n\', \' File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron_lib/db/api.py", line 139, in wrapped\\n setattr(e, \\\'_RETRY_EXCEEDED\\\', True)\\n\', \' File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in __exit__\\n self.force_reraise()\\n\', \' File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in force_reraise\\n raise self.value\\n\', \' File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron_lib/db/api.py", line 135, in wrapped\\n return f(*args, **kwargs)\\n\', \' File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_db/api.py", line 154, in wrapper\\n ectxt.value = e.inner_exc\\n\', \' File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in __exit__\\n self.force_reraise()\\n\', \' File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in force_reraise\\n raise self.value\\n\', \' File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_db/api.py", line 142, in wrapper\\n return f(*args, **kwargs)\\n\', \' File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron_lib/db/api.py", line 183, in wrapped\\n LOG.debug("Retry wrapper got retriable exception: %s", e)\\n\', \' File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in __exit__\\n self.force_reraise()\\n\', \' File 
"/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in force_reraise\\n raise self.value\\n\', \' File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron_lib/db/api.py", line 179, in wrapped\\n return f(*dup_args, **dup_kwargs)\\n\', \' File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/api/rpc/handlers/dhcp_rpc.py", line 310, in update_dhcp_port\\n raise exceptions.NetworkNotFound(net_id=network_id)\\n\', \'neutron_lib.exceptions.NetworkNotFound: Network a56b71cc-2220-479d-97a5-082b722e9e4c could not be found.\\n\'].'] _periodic_resync_helper /var/lib/kolla/venv/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:364

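The escaped traceback in the resync message above is hard to read as a single line. The following is a minimal sketch, assuming the log line is available as a string, that pulls out the stack frames and the network ID reported as missing. The names `extract_frames`, `missing_network_ids` and `SAMPLE` are illustrative and not part of Neutron; the sample is a shortened copy of the log line above.

```python
import re

# Matches frames such as: File "/path/to/file.py", line 310, in update_dhcp_port
FRAME_RE = re.compile(r'File "([^"]+)", line (\d+), in (\w+)')
# Matches "NetworkNotFound Network <uuid> could not be found" with or without a colon.
NETWORK_NOT_FOUND_RE = re.compile(
    r"NetworkNotFound:? Network ([0-9a-f-]{36}) could not be found"
)


def extract_frames(log_line):
    """Return (path, line_number, function) tuples found in an escaped traceback."""
    return [
        (path, int(lineno), func)
        for path, lineno, func in FRAME_RE.findall(log_line)
    ]


def missing_network_ids(log_line):
    """Return the distinct network IDs reported as not found, in order of appearance."""
    seen = []
    for net_id in NETWORK_NOT_FOUND_RE.findall(log_line):
        if net_id not in seen:
            seen.append(net_id)
    return seen


# Shortened sample of the agent log line (illustrative variable name).
SAMPLE = (
    r"resync (a56b71cc-2220-479d-97a5-082b722e9e4c): ['Remote error: NetworkNotFound "
    r"Network a56b71cc-2220-479d-97a5-082b722e9e4c could not be found.\n[\'Traceback "
    r"(most recent call last):\\n\', \' File "
    r'"/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/api/rpc/handlers/dhcp_rpc.py", '
    r"line 310, in update_dhcp_port\\n raise exceptions.NetworkNotFound(net_id=network_id)\\n\']"
)

if __name__ == "__main__":
    for frame in extract_frames(SAMPLE):
        print(frame)
    print(missing_network_ids(SAMPLE))
```

In the full log line the innermost frame is `update_dhcp_port` in `dhcp_rpc.py`, which is where the server raises `NetworkNotFound`.
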
Michal Nasiadka (mnasiadka) wrote :

neutron-server side for a similar request:

2022-11-02 09:59:12.703 50 DEBUG neutron.api.rpc.handlers.dhcp_rpc [req-cb2cd6f0-0e2b-46bf-83f0-86c1eaf08bc5 - - - - -] get_active_networks_info from kef1p-phycon0001 get_active_networks_info /var/lib/kolla/venv/lib/python3.6/site-packages/neutron/api/rpc/handlers/dhcp_rpc.py:159
2022-11-02 09:59:13.428 50 DEBUG neutron_lib.db.resource_extend [req-cb2cd6f0-0e2b-46bf-83f0-86c1eaf08bc5 - - - - -] It took 0.21 seconds to run function 'neutron_lib.db.resource_extend.apply_funcs' wrapper /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/timeutils.py:388

Michal Nasiadka (mnasiadka) wrote :

Additionally, this appears in neutron-server:
2022-11-02 10:01:19.725 50 WARNING neutron.api.rpc.handlers.dhcp_rpc [req-ba479896-8e2d-4dbb-bb3a-cbb62257ed72 - - - - -] The DHCP agent on kef1p-phycon0002 does not host the network a56b71cc-2220-479d-97a5-082b722e9e4c.

The network itself exists:

% openstack network show a56b71cc-2220-479d-97a5-082b722e9e4c
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | UP                                   |
| availability_zone_hints   |                                      |
| availability_zones        |                                      |
| created_at                | 2021-06-29T07:14:50Z                 |
| description               |                                      |
| dns_domain                | compute.sms-lab.cloud.               |
| id                        | a56b71cc-2220-479d-97a5-082b722e9e4c |
| ipv4_address_scope        | None                                 |
| ipv6_address_scope        | None                                 |
| is_default                | None                                 |
| is_vlan_transparent       | None                                 |
| mtu                       | 1500                                 |
| name                      | stackhpc-ipv4-vlan-v2                |
| port_security_enabled     | True                                 |
| project_id                | 96f3ea773bb94bdba912ba2af1a9a5c6     |
| provider:network_type     | vlan                                 |
| provider:physical_network | physnet1                             |
| provider:segmentation_id  | 209                                  |
| qos_policy_id             | None                                 |
| revision_number           | 11                                   |
| router:external           | Internal                             |
| segments                  | None                                 |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   | 39fee5c2-c38c-4050-9578-711c76cc0e5a |
| tags                      |                                      |
| updated_at                | 2021-07-13T08:10:21Z                 |
+---------------------------+--------------------------------------+

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

As discussed on IRC, we need to check the Neutron server logs (in debug mode) at the moment the DHCP agent requests that network, to see why the server raises the "NetworkNotFound" exception.

Another issue reported in the logs is a problem accessing the PID files:
2022-11-02 09:57:14.969 7 DEBUG neutron.agent.linux.utils [-] Unable to access /var/lib/neutron/dhcp/a56b71cc-2220-479d-97a5-082b722e9e4c/pid; Error: [Errno 2] No such file or directory: '/var/lib/neutron/dhcp/a56b71cc-2220-479d-97a5-082b722e9e4c/pid' get_value_from_file /var/lib/kolla/venv/lib/python3.6/site-packages/neutron/agent/linux/utils.py:253

Is this directory accessible from the DHCP agent? Did you try to delete these PID/config files before restarting the agent?
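
The accessibility question can be answered with a small check run in the same environment as the agent (inside the agent's container, if the deployment is containerized as the kolla paths in the logs suggest). This is a minimal sketch: the directory layout `<state_dir>/<network-id>/pid` is taken from the log line above, while the function name and the returned status strings are made up for illustration.

```python
import os


def check_dhcp_pid_file(network_id, state_dir="/var/lib/neutron/dhcp"):
    """Describe the state of the per-network PID file the agent log complains about."""
    net_dir = os.path.join(state_dir, network_id)
    pid_file = os.path.join(net_dir, "pid")
    if not os.path.isdir(net_dir):
        return "missing-directory"
    if not os.path.isfile(pid_file):
        return "missing-pid-file"
    if not os.access(pid_file, os.R_OK):
        return "unreadable"
    with open(pid_file) as handle:
        content = handle.read().strip()
    # A usable PID file contains only the process ID.
    return "ok" if content.isdigit() else "invalid-content"


if __name__ == "__main__":
    print(check_dhcp_pid_file("a56b71cc-2220-479d-97a5-082b722e9e4c"))
```

A result of "missing-directory" or "missing-pid-file" matches the `[Errno 2] No such file or directory` in the log, whereas "unreadable" would point to a permissions problem on the state directory instead.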

Regards.

Bartosz Bezak (bbezak) wrote :

I'm also unable to list the DHCP agents on any network:

openstack network agent list --network provision-net --long

ResourceNotFound: 404: Client Error for url: https://test.cloud:9696/v2.0/networks/a9aec3da-d23c-4d5b-be1e-37b50c51a7b5/dhcp-agents, The resource could not be found.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Bartosz:

Please, attach the Neutron server logs in debug mode.

This seems to be a permissions problem. If your user has no permissions or the scope is invalid, the result will be a "webob.exc.HTTPNotFound" exception. Check [1]. This patch is not in Xena.

Update the oslo.policy library (if not updated), check your policy file and check the user permissions.

Regards.

[1] https://review.opendev.org/c/openstack/neutron/+/838697

Bartosz Bezak (bbezak) wrote :

Hi Rodolfo, we're already on Yoga, a fairly recent build from the stable/yoga branch. It was working on Xena, though.

logs from client side:

openstack --debug network agent list --network provision-net --long

https://api.sms-lab.cloud:9696 "GET /v2.0/networks/a9aec3da-d23c-4d5b-be1e-37b50c51a7b5/dhcp-agents HTTP/1.1" 404 103
RESP: [404] content-length: 103 content-type: application/json date: Wed, 02 Nov 2022 17:21:50 GMT x-openstack-request-id: req-a0283fc6-04f4-4377-ae6d-de653547393f
RESP BODY: {"NeutronError": {"type": "HTTPNotFound", "message": "The resource could not be found.", "detail": ""}}
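
The response body follows Neutron's usual `NeutronError` envelope. Here is a minimal sketch of extracting the error type and message from it; the function name is illustrative and `BODY` is copied from the response above. The body alone does not say whether the network, the route or a policy check produced the 404, which is why the server-side log is needed as well.

```python
import json


def parse_neutron_error(body):
    """Return (type, message) from a Neutron error body, or None if it is not one."""
    try:
        payload = json.loads(body)
    except ValueError:
        return None
    error = payload.get("NeutronError") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    return error.get("type"), error.get("message")


# Body copied from the client debug output above.
BODY = (
    '{"NeutronError": {"type": "HTTPNotFound", '
    '"message": "The resource could not be found.", "detail": ""}}'
)

if __name__ == "__main__":
    print(parse_neutron_error(BODY))
```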

logs from neutron-server side:

2022-11-02 17:21:50.514 23 WARNING neutron.pecan_wsgi.controllers.resource [req-a0283fc6-04f4-4377-ae6d-de653547393f 562e695ace82486486f7fbebf5d4fd4f 96f3ea773bb94bdba912ba2af1a9a5c6 - default default] No controller found for: dhcp-agents - returning response code 404: pecan.routing.PecanNotFound
2022-11-02 17:21:50.515 23 INFO neutron.pecan_wsgi.hooks.translation [req-a0283fc6-04f4-4377-ae6d-de653547393f 562e695ace82486486f7fbebf5d4fd4f 96f3ea773bb94bdba912ba2af1a9a5c6 - default default] GET failed (client error): The resource could not be found.
2022-11-02 17:21:50.516 23 INFO neutron.wsgi [req-a0283fc6-04f4-4377-ae6d-de653547393f 562e695ace82486486f7fbebf5d4fd4f 96f3ea773bb94bdba912ba2af1a9a5c6 - default default] 91.90.162.148,10.103.1.12 "GET /v2.0/networks/a9aec3da-d23c-4d5b-be1e-37b50c51a7b5/dhcp-agents HTTP/1.1" status: 404 len: 285 time: 0.0062070

Elvira García Ruiz (elviragr) wrote :

Hi Bartosz!

As ralonsoh said in the previous comment, to debug this we need to check the policy.json. It's also important to check the user's permissions, since this looks a lot like a permissions issue.

Kind regards.

Bartosz Bezak (bbezak) wrote :

We're using the default Neutron policy.yaml, i.e. we are not overriding anything in Neutron.

Michal Nasiadka (mnasiadka) wrote :

Additional interesting entry from the log:
2022-11-04 08:55:06.532 7 INFO neutron.api.extensions [req-0536ead4-c0f9-4b05-8f94-6225b2eb4b68 - - - - -] Extension dhcp_agent_scheduler not supported by any of loaded plugins
2022-11-04 08:56:22.298 7 DEBUG neutron.api.extensions [req-e5b84e82-6f67-4326-9620-3afe063e11a6 - - - - -] Ext name="DHCP Agent Scheduler" alias="dhcp_agent_scheduler" description="Schedule networks among dhcp agents" updated="2013-02-07T10:00:00-00:00" _check_extension /var/lib/kolla/venv/lib/python3.6/site-packages/neutron/api/extensions.py:421

I added agent and dhcpagentscheduler to the ML2 configuration, but that didn't really help.
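
One way to confirm from the API side whether the extension is actually loaded is to look for its alias in the extension list (`GET /v2.0/extensions`, or `openstack extension list --network`). This is a minimal sketch that works on an already-decoded response body; the helper name and the shortened sample payload are hypothetical, and a real payload has many more entries and fields.

```python
def has_extension(extensions_payload, alias):
    """Return True if the alias appears in a decoded GET /v2.0/extensions body."""
    extensions = extensions_payload.get("extensions", [])
    return any(ext.get("alias") == alias for ext in extensions)


# Hypothetical, shortened payload for illustration only.
SAMPLE_PAYLOAD = {
    "extensions": [
        {"alias": "agent", "name": "agent"},
        {"alias": "port-security", "name": "Port Security"},
    ]
}

if __name__ == "__main__":
    print(has_extension(SAMPLE_PAYLOAD, "dhcp_agent_scheduler"))
```

If the alias `dhcp_agent_scheduler` is absent from the real list, the 404 on `/v2.0/networks/<id>/dhcp-agents` would be consistent with the "No controller found for: dhcp-agents" message in the server log.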

Our current ml2 conf:
[ml2]
type_drivers = flat,vlan,geneve
tenant_network_types = vlan
mechanism_drivers = ovn,genericswitch,sriovnicswitch
extension_drivers = qos,port_security,subnet_dns_publish_fixed_ip

[ml2_type_vlan]
network_vlan_ranges = physnet1:170:199

[ml2_type_flat]
flat_networks = *

[ml2_type_vxlan]
vni_ranges = 1:1000

[ml2_type_geneve]
vni_ranges = 1001:2000
max_header_size = 38

[ovn]
ovn_nb_connection = tcp:10.103.1.10:6641,tcp:10.103.1.11:6641,tcp:10.103.1.12:6641
ovn_sb_connection = tcp:10.103.1.10:6642,tcp:10.103.1.11:6642,tcp:10.103.1.12:6642
ovn_metadata_enabled = True
enable_distributed_floating_ip = True
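
To make the relevant parts of this configuration easier to compare between environments, here is a minimal sketch that summarizes the driver lists and flags `[ml2_type_*]` sections whose type driver is not enabled. Neutron itself reads this file through oslo.config; the standard-library `configparser` is only a stand-in here, and `summarize_ml2` and the shortened `ML2_CONF` excerpt are illustrative names.

```python
import configparser

# Shortened excerpt of the configuration above.
ML2_CONF = """
[ml2]
type_drivers = flat,vlan,geneve
tenant_network_types = vlan
mechanism_drivers = ovn,genericswitch,sriovnicswitch
extension_drivers = qos,port_security,subnet_dns_publish_fixed_ip

[ml2_type_vxlan]
vni_ranges = 1:1000

[ml2_type_geneve]
vni_ranges = 1001:2000
"""


def split_list(value):
    """Split a comma-separated option value into a list of non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def summarize_ml2(text):
    """Return the enabled drivers and any type sections without a matching driver."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    type_drivers = split_list(parser.get("ml2", "type_drivers", fallback=""))
    mechanism_drivers = split_list(parser.get("ml2", "mechanism_drivers", fallback=""))
    prefix = "ml2_type_"
    unused = [
        section[len(prefix):]
        for section in parser.sections()
        if section.startswith(prefix) and section[len(prefix):] not in type_drivers
    ]
    return {
        "type_drivers": type_drivers,
        "mechanism_drivers": mechanism_drivers,
        "unused_type_sections": unused,
    }


if __name__ == "__main__":
    print(summarize_ml2(ML2_CONF))
```

For the configuration above this reports the `vxlan` section as unused, because `vxlan` is not listed in `type_drivers`. The sketch only flags this and makes no claim that it is related to the DHCP problem.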
