2023-10-01 08:20:27 |
Grisha Tsukerman |
bug |
|
|
added bug |
2023-10-02 11:45:15 |
Lajos Katona |
tags |
|
l3-ipam-dhcp |
|
2023-10-02 11:45:30 |
Lajos Katona |
neutron: importance |
Undecided |
Critical |
|
2023-10-02 12:02:37 |
Jeremy Stanley |
description |
Steps to reproduce:
1. Create a network with several subnets and a router.
2. Delete the router and, shortly afterwards, delete the subnets and finally the network.
Expected behavior:
- The subnets and the network are deleted as expected after the router is deleted.
Actual behavior:
1. The router is not deleted properly (its port is not deleted).
2. Because of the above, the subnet and network deletion tasks are also dropped, owing to how the DHCP agent manages its task queue.
RCA:
1. Router deletion failure:
a. The router deletion eventually triggers a port_delete_end task for the router port: https://github.com/openstack/neutron/blob/stable/yoga/neutron/agent/dhcp/agent.py
b. As part of the event queue processing, the resource's __lt__ function is called to check the IP addresses:
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/common/resource_processing_queue.py#L177C1-L178C1
c. The __lt__ function fails because, when port_delete_end comes from a router deletion, the fixed_ip 'ip_address' key is not accessible.
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/dhcp/agent.py#L86
d. Since the primary loop has no error handling, all other tasks that were in the queue are lost.
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/common/resource_processing_queue.py#L156
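The failure chain in steps b to d can be modelled in a few lines of Python. This is a simplified, hypothetical sketch, not the actual code in agent.py or resource_processing_queue.py: the `Update` class, `run_worker`, and the payload contents (in particular, a router port whose fixed IP entry has no 'ip_address') are illustrative assumptions. `heapq` orders entries with `<`, so `__lt__` runs on every push and pop, and an exception raised there propagates straight out of the loop.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Update:
    """Hypothetical, simplified stand-in for a DHCP agent resource update."""
    priority: int
    timestamp: float
    action: str
    resource: dict = field(default_factory=dict)

    def __lt__(self, other):
        # Simplified model of step c: when both updates carry fixed_ips,
        # compare their IP sets, assuming every entry has 'ip_address'.
        if 'fixed_ips' in self.resource and 'fixed_ips' in other.resource:
            mine = {ip['ip_address'] for ip in self.resource['fixed_ips']}
            theirs = {ip['ip_address'] for ip in other.resource['fixed_ips']}
            if mine & theirs:
                return self.timestamp < other.timestamp
        # Otherwise order by priority, then by arrival time.
        if self.priority != other.priority:
            return self.priority < other.priority
        return self.timestamp < other.timestamp


def run_worker(incoming):
    """Naive worker with no error handling (step d)."""
    heap, processed = [], []
    for update in incoming:
        heapq.heappush(heap, update)  # every push may call __lt__
    while heap:
        processed.append(heapq.heappop(heap).action)
    return processed


# Assumed payload shape: the router port's fixed IP lacks 'ip_address'.
events = [
    Update(1, 1.0, 'port_delete_end',
           {'id': 'router-port', 'fixed_ips': [{'subnet_id': 'sub-1'}]}),
    Update(1, 2.0, 'port_delete_end',
           {'id': 'vm-port',
            'fixed_ips': [{'subnet_id': 'sub-1', 'ip_address': '10.0.0.5'}]}),
    Update(1, 3.0, 'subnet_delete_end', {'id': 'sub-1'}),
    Update(1, 4.0, 'network_delete_end', {'id': 'net-1'}),
]

try:
    run_worker(events)
except KeyError as exc:
    # The exception escapes the loop, so the subnet and network deletions
    # queued behind it are never processed.
    print(f"worker died with {exc!r}")
```

Pushing the second port update compares it against the router port update, the set comprehension hits the missing key, and nothing in the queue is processed. Dropping the router port update from `events` lets the same loop process the rest in timestamp order.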
As far as I understand, there are two problems:
1. Commit https://github.com/openstack/neutron/commit/53000704f211bbbd5e439890015891039ef6752e changed the __lt__ functionality but did not account for router port deletion.
2. The primary worker loop does not cope with unexpected errors such as crashes. Is it by design that all other tasks are dropped in this case?
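Both problems could be addressed along the lines of the following minimal sketch, using the same hypothetical model (this is not a proposed Neutron patch, and `Update`, `ip_set`, `run_worker`, and `handler` are illustrative names). For problem 1, `__lt__` only compares IP sets when every fixed IP on both sides has 'ip_address', and otherwise falls back to priority/timestamp ordering. For problem 2, the worker catches and logs exceptions per update, so one failing update does not discard the rest of the queue.

```python
import heapq
import logging
from dataclasses import dataclass, field

LOG = logging.getLogger(__name__)


@dataclass
class Update:
    """Hypothetical, simplified stand-in for a DHCP agent resource update."""
    priority: int
    timestamp: float
    action: str
    resource: dict = field(default_factory=dict)

    @staticmethod
    def ip_set(resource):
        """Return the set of IPs, or None if any fixed IP lacks 'ip_address'."""
        fixed_ips = resource.get('fixed_ips')
        if fixed_ips is None or any('ip_address' not in ip for ip in fixed_ips):
            return None
        return {ip['ip_address'] for ip in fixed_ips}

    def __lt__(self, other):
        mine, theirs = self.ip_set(self.resource), self.ip_set(other.resource)
        # Only use the IP comparison when both sides are fully populated.
        if mine is not None and theirs is not None and mine & theirs:
            return self.timestamp < other.timestamp
        if self.priority != other.priority:
            return self.priority < other.priority
        return self.timestamp < other.timestamp


def run_worker(incoming, handler):
    """Process every update; one failing update does not stop the loop."""
    heap, failed = [], []
    for update in incoming:
        heapq.heappush(heap, update)
    while heap:
        update = heapq.heappop(heap)
        try:
            handler(update)
        except Exception:
            # Log and move on instead of letting the error end the worker.
            LOG.exception("Update %s failed", update.action)
            failed.append(update)
    return failed


events = [
    Update(1, 1.0, 'port_delete_end',
           {'id': 'router-port', 'fixed_ips': [{'subnet_id': 'sub-1'}]}),
    Update(1, 2.0, 'port_delete_end',
           {'id': 'vm-port',
            'fixed_ips': [{'subnet_id': 'sub-1', 'ip_address': '10.0.0.5'}]}),
    Update(1, 3.0, 'subnet_delete_end', {'id': 'sub-1'}),
    Update(1, 4.0, 'network_delete_end', {'id': 'net-1'}),
]
processed = []


def handler(update):
    # Simulated handler: pretend the router port cleanup itself fails.
    if update.resource.get('id') == 'router-port':
        raise RuntimeError('simulated router port cleanup failure')
    processed.append(update.action)


failed = run_worker(events, handler)
print(processed)  # → ['port_delete_end', 'subnet_delete_end', 'network_delete_end']
```

The two changes are complementary: the per-update `try`/`except` keeps the subnet and network deletions alive even if some other handler misbehaves, but on its own it would still leave the router port undeleted, which is why the comparison also needs to tolerate the incomplete payload.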
Here's a small visualization: TBD
Version:
Yoga |
This issue is being treated as a potential security risk under
embargo. Please do not make any public mention of embargoed
(private) security vulnerabilities before their coordinated
publication by the OpenStack Vulnerability Management Team in the
form of an official OpenStack Security Advisory. This includes
discussion of the bug or associated fixes in public forums such as
mailing lists, code review systems and bug trackers. Please also
avoid private disclosure to other individuals not already approved
for access to this information, and provide this same reminder to
those who are made aware of the issue prior to publication. All
discussion should remain confined to this private bug report, and
any proposed fixes should be added to the bug as attachments. This
embargo shall not extend past 2023-12-31 and will be made
public by or on that date even if no fix is identified.
Steps to reproduce:
1. Create a network with several subnets and a router.
2. Delete the router and, shortly afterwards, delete the subnets and finally the network.
Expected behavior:
- The subnets and the network are deleted as expected after the router is deleted.
Actual behavior:
1. The router is not deleted properly (its port is not deleted).
2. Because of the above, the subnet and network deletion tasks are also dropped, owing to how the DHCP agent manages its task queue.
RCA:
1. Router deletion failure:
a. The router deletion eventually triggers a port_delete_end task for the router port: https://github.com/openstack/neutron/blob/stable/yoga/neutron/agent/dhcp/agent.py
b. As part of the event queue processing, the resource's __lt__ function is called to check the IP addresses:
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/common/resource_processing_queue.py#L177C1-L178C1
c. The __lt__ function fails because, when port_delete_end comes from a router deletion, the fixed_ip 'ip_address' key is not accessible.
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/dhcp/agent.py#L86
d. Since the primary loop has no error handling, all other tasks that were in the queue are lost.
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/common/resource_processing_queue.py#L156
As far as I understand, there are two problems:
1. Commit https://github.com/openstack/neutron/commit/53000704f211bbbd5e439890015891039ef6752e changed the __lt__ functionality but did not account for router port deletion.
2. The primary worker loop does not cope with unexpected errors such as crashes. Is it by design that all other tasks are dropped in this case?
Here's a small visualization: TBD
Version:
Yoga |
|
2023-10-02 12:02:45 |
Jeremy Stanley |
bug task added |
|
ossa |
|
2023-10-02 12:02:54 |
Jeremy Stanley |
ossa: status |
New |
Incomplete |
|
2023-10-02 12:03:13 |
Jeremy Stanley |
bug |
|
|
added subscriber Neutron Core Security reviewers |
2024-01-04 14:50:10 |
Jeremy Stanley |
description |
This issue is being treated as a potential security risk under
embargo. Please do not make any public mention of embargoed
(private) security vulnerabilities before their coordinated
publication by the OpenStack Vulnerability Management Team in the
form of an official OpenStack Security Advisory. This includes
discussion of the bug or associated fixes in public forums such as
mailing lists, code review systems and bug trackers. Please also
avoid private disclosure to other individuals not already approved
for access to this information, and provide this same reminder to
those who are made aware of the issue prior to publication. All
discussion should remain confined to this private bug report, and
any proposed fixes should be added to the bug as attachments. This
embargo shall not extend past 2023-12-31 and will be made
public by or on that date even if no fix is identified.
Steps to reproduce:
1. Create a network with several subnets and a router.
2. Delete the router and, shortly afterwards, delete the subnets and finally the network.
Expected behavior:
- The subnets and the network are deleted as expected after the router is deleted.
Actual behavior:
1. The router is not deleted properly (its port is not deleted).
2. Because of the above, the subnet and network deletion tasks are also dropped, owing to how the DHCP agent manages its task queue.
RCA:
1. Router deletion failure:
a. The router deletion eventually triggers a port_delete_end task for the router port: https://github.com/openstack/neutron/blob/stable/yoga/neutron/agent/dhcp/agent.py
b. As part of the event queue processing, the resource's __lt__ function is called to check the IP addresses:
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/common/resource_processing_queue.py#L177C1-L178C1
c. The __lt__ function fails because, when port_delete_end comes from a router deletion, the fixed_ip 'ip_address' key is not accessible.
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/dhcp/agent.py#L86
d. Since the primary loop has no error handling, all other tasks that were in the queue are lost.
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/common/resource_processing_queue.py#L156
As far as I understand, there are two problems:
1. Commit https://github.com/openstack/neutron/commit/53000704f211bbbd5e439890015891039ef6752e changed the __lt__ functionality but did not account for router port deletion.
2. The primary worker loop does not cope with unexpected errors such as crashes. Is it by design that all other tasks are dropped in this case?
Here's a small visualization: TBD
Version:
Yoga |
Steps to reproduce:
1. Create a network with several subnets and a router.
2. Delete the router and, shortly afterwards, delete the subnets and finally the network.
Expected behavior:
- The subnets and the network are deleted as expected after the router is deleted.
Actual behavior:
1. The router is not deleted properly (its port is not deleted).
2. Because of the above, the subnet and network deletion tasks are also dropped, owing to how the DHCP agent manages its task queue.
RCA:
1. Router deletion failure:
a. The router deletion eventually triggers a port_delete_end task for the router port: https://github.com/openstack/neutron/blob/stable/yoga/neutron/agent/dhcp/agent.py
b. As part of the event queue processing, the resource's __lt__ function is called to check the IP addresses:
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/common/resource_processing_queue.py#L177C1-L178C1
c. The __lt__ function fails because, when port_delete_end comes from a router deletion, the fixed_ip 'ip_address' key is not accessible.
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/dhcp/agent.py#L86
d. Since the primary loop has no error handling, all other tasks that were in the queue are lost.
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/common/resource_processing_queue.py#L156
As far as I understand, there are two problems:
1. Commit https://github.com/openstack/neutron/commit/53000704f211bbbd5e439890015891039ef6752e changed the __lt__ functionality but did not account for router port deletion.
2. The primary worker loop does not cope with unexpected errors such as crashes. Is it by design that all other tasks are dropped in this case?
Here's a small visualization: TBD
Version:
Yoga |
|
2024-01-04 14:50:17 |
Jeremy Stanley |
information type |
Private Security |
Public Security |
|