network-ip-availabilities' result is not correct when the subnet has no allocation-pool

Bug #1843211 reported by yangjianfeng
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Opinion
Wishlist
Unassigned

Bug Description

Rocky & maybe master:
I create a network and a subnet.
Then, I update the subnet to no-allocation-pool
 openstack subnet set 0f4d5c95-879d-4b2a-90d0-350e09375f80 --no-allocation-pool

Next,I check the availability ips of the network:
  openstack ip availability show 440453b6-d5be-47a8-abed-df7c491e6501
+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
| id | 440453b6-d5be-47a8-abed-df7c491e6501 |
| name | None |
| network_id | 440453b6-d5be-47a8-abed-df7c491e6501 |
| network_name | yjfaaa |
| project_id | 7be52ee6c397430085c1dc3accf4e795 |
| subnet_ip_availability | cidr='6.6.6.0/24', ip_version='4', subnet_id='0f4d5c95-879d-4b2a-90d0-350e09375f80', subnet_name='yjfaaa-sub', total_ips='256', used_ips='3' |
| total_ips | 256 |
| used_ips | 3 |
+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+

But I can't create port in the network:
  openstack port create test --network 440453b6-d5be-47a8-abed-df7c491e6501
{"NeutronError": {"message": "No more IP addresses available on network 440453b6-d5be-47a8-abed-df7c491e6501.", "type": "IpAddressGenerationFailure", "detail": ""}}

description: updated
Revision history for this message
yangjianfeng (yangjianfeng) wrote :

I read about the codes, It will calculate the total ip according to subnet's cidr when the subnet has no IPAllocationPool. I think this is not reasonable. I think we should add a field "available_ips", it is calculated base on IPAllocationPool. The "total_ips" was calculated according to "subnet.cidr".

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/681705

Changed in neutron:
assignee: nobody → yangjianfeng (yangjianfeng)
status: New → In Progress
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello yangjianfeng:

I you remove the allocation pools in the subnet, how do you expect the IPAM driver to return an IP address? When a port without a fixed IP is created, the IPAM driver will read the subnets of the network all will retrieve a non used IP address from the allocation pools on those subnets.

Please, do not match terms:
- The allocation pool in a subnet is, from the subnet CIDR, the range of IPs that the subnet can "give" a new port.
- The IP availability is the number of free IPs in the CIDR range.

IMO, this bug is invalid.

Revision history for this message
Akihiro Motoki (amotoki) wrote :

"allocation_pools" defines a set of IP addresses that neutron can assign to a port *automatically*. You remove all addresses in the allocation pools in the subnet, so neutron considers there is no IP address for automatic allocation for a new port. That's the expected behavior. You can still create a port by specifying an IP address from the command line.

Revision history for this message
Akihiro Motoki (amotoki) wrote :

Regarding the behavior of the IP availability API, I think total_ips needs to be fixed.

In my understanding, "total_ips" is the size of the allocation pools (the number of IP addresses in the allocation pools) and "user_ips" is the number of IP address used by neutron ports.

My understanding above looks correct based on my test http://paste.openstack.org/show/775414/.
I created a subnet with an allocation pool which has 100 IP addresses. I got 100 as total_ips.

In the case of this bug report, the size of the allocation pools is zero, so "total_ips" should be zero.

Note that we can create a port with IP address outside of the allocation pools so the number of available IP addresses does not match "total_ips - used_ips" necessarily.

Changed in neutron:
importance: Undecided → Low
Revision history for this message
Akihiro Motoki (amotoki) wrote :

Looking at the proposed patch, you introduced a new field "available_ips".
This is an API behavior change. If you want an API change, it is worth RFE.
If so, you describe what kind of improvement is needed.
As of now, I don't consider this as RFE and it is just a normal bug report.
Normal bug fixes should change the API change.

Revision history for this message
yangjianfeng (yangjianfeng) wrote :

Hello Rodolfo:
Maybe I didn’t make it clear, you are right. if I remove the allocation pools in the subnet. the port shouldn't create successfully. Because of the subnet can't allocate IP, so the number of assignable IPs should be 0. But, the "openstack ip availability show" command tells me that the network's number of total IPs is 256, the number of used IPs is 3. So, I can't accurately calculate the number of available IPs from the return result of the command line. Obviously,this is unreasonable.

So, I think we should make some adjustments to this API[1]. We should add a 'available_ips' field that represents the number of assignable IPs directly. The 'total_ips' only represents the number of IPs for subnet CIDR.

[1] https://docs.openstack.org/api-ref/network/v2/index.html?#network-ip-availability-and-usage-stats

Revision history for this message
yangjianfeng (yangjianfeng) wrote :

Hi, Akihiro:
    Yes, If we can find a way to calculate the "total_ips" correctly, the "available_ips" shouldn't be introduced. I have thought about some calculation methods, but I always feel that there are some logic holes.
As you commented at #5, the number of available IP addresses can't accurately be calculated only according to "total_ips" and "used_ips". This will cause some confusion to the users.

I suggest that the "total_ips" is the number of the all IP address of a network, it be calculated according to subnet's CIDR. The introduced a new field "available_ips" rename as "allocable_ips", it represents the rest of the number of IPs in allocation pools.

tags: added: rfe
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Proposal from yangjianfeng makes sense for me. We can use IMO "total_ips" as number of all IPs in subnet - so IP addresses which user can manually use to create ports (see comment #4 from Akihiro). And new field with number of IPs available to allocate automatically could be added to this list also.
Let's discuss that on next drivers team meeting.

tags: added: rfe-triaged
tags: removed: rfe
Changed in neutron:
importance: Low → Wishlist
Revision history for this message
YAMAMOTO Takashi (yamamoto) wrote :

yangjianfeng,

i don't understand your motivation to use ip-availabilities without allocation pool.
can you please explain your use case a bit?

Revision history for this message
yangjianfeng (yangjianfeng) wrote :

Hi, all:
I am so sorry, I absent the neutron driver meeting yesterday because of terrible network environment. But I read the meeting record[1].

yamamoto: This bug is reported by our QA team. Usually, I don't agree a subnet without allocation pools. But, updating of allocation pools is useful, it can help user to control the allocation of ip address. So, this maybe results in the "total_ips" < "used_ips", this will cause some confusion to the users.

[1] http://eavesdrop.openstack.org/meetings/neutron_drivers/2019/neutron_drivers.2019-09-27-14.00.log.html

Revision history for this message
yangjianfeng (yangjianfeng) wrote :

I think below parameters maybe necessary for users:
1. A parameter, it represents the number of all ips in network.
2. A parameter, it represents the number of all ips in allocaltion pools
3. A parameter, it represents the number of the used ips in network.
4. A parameter, it represents the number of the used ips in allocaltion pools.

Revision history for this message
Miguel Lavalle (minsel) wrote :

"This bug is reported by our QA team". What organization are you with?

Revision history for this message
yangjianfeng (yangjianfeng) wrote :

I work for Troila, locate on TianJin, China. We just registered to become a openstack gold member. We are developing our cloud products base on openstack.

https://www.kunluncloud.com/#/

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I checked that carefully today and compared it with documentation (api-ref).
Docs says that "total_ips" is "The total number of IP addresses in a network." - so it should be independent of allocation pools for subnets. But in reality it works in slightly different way. For subnet without any allocation pool defined, it indeed shows total number of IPs (based on subnet's cidr). But if there is any allocation pool created, total_ips value reflects total number of IPs IN allocation pools, not based on subnet's CIDR.
My test is on http://paste.openstack.org/show/780885/

So now the question is: should we update api-ref that it will describe what total_ips really means in case when there are and where there aren't any allocation pools in subnet?
Or maybe we should just treat it as a bug in our API and fix it that total_ips would always show total number of IPs based on subnet's CIDR?

IMO the second option is better as for me it is clearly bug in our code but I would like to know opinions of others.

When we will decide about this part, we can than talk about "rfe part" of this report and think about adding new field called e.g. "total_allocation_pools_ips" (or something like that) if that would be valid usecase for someone.

Revision history for this message
Miguel Lavalle (minsel) wrote :

I did some additional testing and thinking and I disagree with the conclusion in #15 that this behavior is a bug for the following reasons:

1) When we set a subnet to --no-allocation-pool, we are not saying that the subnet doesn't have available IPs anymore. What we are saying is that we don't have a pool or pools inside the subnet's CIDR from which to automatically allocate IPs when the port create request doesn't specify explicitly an address. Please look at http://paste.openstack.org/show/780902/. Once I set the subnet to --no-allocation pool I need to explicitly specify an ip address to create a port in that subnet. Also note in http://paste.openstack.org/show/780902/ that the accounting of available IPs is correct after the subnet has been set to --no-allocation-pool

2) Now let's look at the code: https://github.com/openstack/neutron/blob/92f7484d2a59f0ee900eca7280f92f6c1c0de35e/neutron/db/network_ip_availability_db.py#L125-L139. This code explicitly and prominently distinguishes between the case where the subnet has allocation pools (L132-136) which are then used for the IP availability calculation, and the case where the allocation pools are absent. In the latter case, the IP availability calculation is based on the subnet's CIDR (L138-139). Note that this loop iterates over a query generated here https://github.com/openstack/neutron/blob/92f7484d2a59f0ee900eca7280f92f6c1c0de35e/neutron/db/network_ip_availability_db.py#L107-L115. Based on this code, the semantics of the network-ip-availability extension is the following in regards to subnet allocation pools: if there are allocation pools, we will base the calculations on them. If there aren't we will use the subnet's CIDR. These semantics might need to be better documented, but that doesn't mean we have a bug in the code.

3) Please also note that the loop mentioned in the previous point hasn't changed since the code was originally merged: https://review.opendev.org/#/c/212955/23/neutron/db/network_ip_availability_db.py@124

4) Look at the patch that implemented the network-ip-availability extension: https://review.opendev.org/#/c/212955. The developers are from GoDaddy. They also submitted the original RFE (https://bugs.launchpad.net/neutron/+bug/1457986) and wrote the spec (https://review.opendev.org/#/c/180803/). GoDaddy is a BIG OpenStack deployer with an $11B market cap (https://finance.yahoo.com/quote/GDDY?p=GDDY&.tsrc=fin-srch). So we know that at least they have tooling based on the current behavior of network-ip-availability. And it is very likely that other deployers rely on this behavior, since the code was merged almost 3 years and a half ago (March 1 2016). Are we going to risk breaking them?

5) Finally, I am confused about the submitter's intent with this bug. If he sets the subnet to --no-allocation-pool, why is he surprised that he cannot create a port without explicitly specifying an address, as he seems to be in #1? That's the expected behavior and as mentioned in my point 1, network-ip-availability does the correct accounting

Revision history for this message
Miguel Lavalle (minsel) wrote :

I want also to point out that the network-ip-availability extension default policy is 'RULE_ADMIN_ONLY': https://github.com/openstack/neutron/blob/master/neutron/conf/policies/network_ip_availability.py#L21. And it was like this since the beginning: https://review.opendev.org/#/c/212955/23/neutron/tests/etc/policy.json@46. So clearly the extension was meant to be used by tools and admins. This implies the following:

1) There is tooling relying on the extension's behavior
2) In principle, this is no source of confusion to normal users or tenants

Revision history for this message
Akihiro Motoki (amotoki) wrote :

I also looked into the code and tested the current behavior.
The response of "total_ips" is changed based on whether there is allocation_pools or not.
I agree this is confusing, but as Miguel pointed out this API is for admins and exists for three years,
so operators might have tools which depends on the current behavior.

Thus, my first suggestion is to update the description of "total_ips" in the API reference without changing the current behavior

On the other hand I agree the current response is not enough too from the following points:
- The response of "total_ips" is conditional (based on allocation_pools)
- The response of "used_ips" does not consider "allocation_pools". "used_ips" counts the number of IPs both from inside and outside of allocation pools of the subnet.

If we need to know the IP usage correctly, we need four information:
- The total number of IP addresses of a subnet
- The total number of IP addresses in allocation pools
- The number of used IP addresses of a subnet (<- it is same as the response of the current "used_ips" IIUC)
- The number of used IP addresses from allocation pools

We can add extra attributes to provide more detail information.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Hi,

We discussed this on last drivers meeting: http://eavesdrop.openstack.org/meetings/neutron_drivers/2019/neutron_drivers.2019-10-04-14.00.log.html#l-14

Here is what we decided:

* we basically agree with proposal from Akihiro from comment #18 but as there is couple potential ways to improve this API we would like to propose spec and discuss it in details there. Miguel Lavalle volunteered to push such initial patch with spec,

* Can You explain us in more details what use case You have and want to address with this RFE? We want to know exactly WHY this API should be improved to find the best way to accomplish that.

Revision history for this message
Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: yangjianfeng (yangjianfeng) → nobody
status: In Progress → New
tags: added: timeout-abandon
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.opendev.org/681705
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Due to no activity in this RFE for very long time, I'm marking it for now as postponed. Feel free to reopen it if You will want to work on it.

tags: added: rfe-postponed
removed: rfe-triaged
Changed in neutron:
status: New → Opinion
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers