Client should not blindly resend/retry POST/PUT (create/update) requests

Bug #2012047 reported by Marcin Wilk
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
keystoneauth
New
Undecided
Unassigned
python-openstackclient
New
Undecided
Unassigned
python-openstackclient (Ubuntu)
New
Undecided
Unassigned

Bug Description

Env: Focal/Ussuri
Client: python3-openstackclient 5.2.0-0ubuntu1.20.04.1

When user submits create request (ie, create subnet) to openstack api, and the request is processed synchronously (unlike server create for example), the client waits for the API response. If that request takes extensive time to complete, longer than any client idle time value on the API backend, the server closes the TCP connection with the openstackclient. In such cases the client automatically repeats the POST request, which by default is a wrong behaviour.
Following are logs from the 'openstack sbunet create example' where execution of the initial request took over 100s (keystone auth logs omitted for clarity):
# the command:
os subnet create \
  --network cd275f37-deec-4f0f-9ad3-4816ca52678e \
  --gateway 172.16.0.1 \
  --allocation-pool start=172.16.0.100,end=172.16.0.200 \
  --subnet-range 172.16.0.0/24 \
  --dns-nameserver 8.8.8.8 \
  --no-dhcp external-subnet-2 \
  --debug

# the command output:
Network client initialized using OpenStack SDK: <openstack.network.v2._proxy.Proxy object at 0x7f7ec8f34490>
REQ: curl -g -i -X GET http://10.5.100.3:9696/v2.0/networks/cd275f37-deec-4f0f-9ad3-4816ca52678e -H "User-Agent: openstacksdk/0.46.0 keystoneauth1/4.0.0 python-requests/2.22.0 CPython/3.8.10" -H "X-Auth-Token: {SHA256}6d3a82e66bdeb642bcfef551c78dbc0063e1bad49431683e6a0023f28dbc1d7d"
Starting new HTTP connection (1): 10.5.100.3:9696
http://10.5.100.3:9696 "GET /v2.0/networks/cd275f37-deec-4f0f-9ad3-4816ca52678e HTTP/1.1" 200 726
RESP: [200] Connection: keep-alive Content-Length: 726 Content-Type: application/json Date: Thu, 16 Mar 2023 07:23:59 GMT X-Openstack-Request-Id: req-3abe5525-cde2-436d-bf39-34746d76f07a
RESP BODY: {"network":{"id":"cd275f37-deec-4f0f-9ad3-4816ca52678e","name":"ext_net","tenant_id":"d6761dacbb0649189a07a4a1a191a8c0","admin_state_up":true,"mtu":1500,"status":"ACTIVE","subnets":["040a6d57-b6fc-436a-b531-03baf4abb609"],"shared":false,"availability_zone_hints":[],"availability_zones":["nova"],"ipv4_address_scope":null,"ipv6_address_scope":null,"router:external":true,"description":"","port_security_enabled":true,"dns_domain":"","l2_adjacency":true,"is_default":false,"tags":[],"created_at":"2022-05-06T10:18:25Z","updated_at":"2023-03-16T06:35:33Z","revision_number":16,"project_id":"d6761dacbb0649189a07a4a1a191a8c0","provider:network_type":"flat","provider:physical_network":"physnet1","provider:segmentation_id":null}}
GET call to network for http://10.5.100.3:9696/v2.0/networks/cd275f37-deec-4f0f-9ad3-4816ca52678e used request id req-3abe5525-cde2-436d-bf39-34746d76f07a
REQ: curl -g -i -X POST http://10.5.100.3:9696/v2.0/subnets -H "Content-Type: application/json" -H "User-Agent: openstacksdk/0.46.0 keystoneauth1/4.0.0 python-requests/2.22.0 CPython/3.8.10" -H "X-Auth-Token: {SHA256}6d3a82e66bdeb642bcfef551c78dbc0063e1bad49431683e6a0023f28dbc1d7d" -d '{"subnet": {"allocation_pools": [{"start": "172.16.0.100", "end": "172.16.0.200"}], "ip_version": 4, "gateway_ip": "172.16.0.1", "name": "external-subnet-2", "dns_nameservers": ["8.8.8.8"], "enable_dhcp": false, "cidr": "172.16.0.0/24", "network_id": "cd275f37-deec-4f0f-9ad3-4816ca52678e"}}'
Failure: Unable to establish connection to http://10.5.100.3:9696/v2.0/subnets: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')). Retrying in 0.5s.
Starting new HTTP connection (2): 10.5.100.3:9696
http://10.5.100.3:9696 "POST /v2.0/subnets HTTP/1.1" 400 221
RESP: [400] Connection: keep-alive Content-Length: 221 Content-Type: application/json Date: Thu, 16 Mar 2023 07:25:30 GMT X-Openstack-Request-Id: req-d0632669-e1ce-4157-b5a5-e52be6e51873
RESP BODY: {"NeutronError": {"type": "InvalidInput", "message": "Invalid input for operation: Requested subnet with cidr: 172.16.0.0/24 for network: cd275f37-deec-4f0f-9ad3-4816ca52678e overlaps with another subnet.", "detail": ""}}
POST call to network for http://10.5.100.3:9696/v2.0/subnets used request id req-d0632669-e1ce-4157-b5a5-e52be6e51873
BadRequestException: 400: Client Error for url: http://10.5.100.3:9696/v2.0/subnets, Invalid input for operation: Requested subnet with cidr: 172.16.0.0/24 for network: cd275f37-deec-4f0f-9ad3-4816ca52678e overlaps with another subnet.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cliff/app.py", line 400, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python3/dist-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3/dist-packages/cliff/display.py", line 117, in run
    column_names, data = self.take_action(parsed_args)
  File "/usr/lib/python3/dist-packages/openstackclient/network/v2/subnet.py", line 377, in take_action
    obj = client.create_subnet(**attrs)
  File "/usr/lib/python3/dist-packages/openstack/network/v2/_proxy.py", line 3491, in create_subnet
    return self._create(_subnet.Subnet, **attrs)
  File "/usr/lib/python3/dist-packages/openstack/proxy.py", line 417, in _create
    return res.create(self, base_path=base_path)
  File "/usr/lib/python3/dist-packages/openstack/resource.py", line 1291, in create
    self._translate_response(response, has_body=has_body)
  File "/usr/lib/python3/dist-packages/openstack/resource.py", line 1113, in _translate_response
    exceptions.raise_from_response(response, error_message=error_message)
  File "/usr/lib/python3/dist-packages/openstack/exceptions.py", line 234, in raise_from_response
    raise cls(
openstack.exceptions.BadRequestException: BadRequestException: 400: Client Error for url: http://10.5.100.3:9696/v2.0/subnets, Invalid input for operation: Requested subnet with cidr: 172.16.0.0/24 for network: cd275f37-deec-4f0f-9ad3-4816ca52678e overlaps with another subnet.
clean_up CreateSubnet: BadRequestException: 400: Client Error for url: http://10.5.100.3:9696/v2.0/subnets, Invalid input for operation: Requested subnet with cidr: 172.16.0.0/24 for network: cd275f37-deec-4f0f-9ad3-4816ca52678e overlaps with another subnet.
END return value: 1

From the neutron-server.log the first request from client came in:
2023-03-16 07:23:59.608 2384978 DEBUG neutron.api.v2.base [req-66884669-2253-4faf-95a4-ebbe353896a0 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] Request body: {'subnet': {'allocation_pools': [{'start': '172.16.0.100', 'end': '172.16.0.200'}], 'ip_version': 4, 'gateway_ip': '172.16.0.1', 'name': 'external-subnet-2', 'dns_nameservers': ['8.8.8.8'], 'enable_dhcp': False, 'cidr': '172.16.0.0/24', 'network_id': 'cd275f37-deec-4f0f-9ad3-4816ca52678e'}} prepare_request_body /usr/lib/python3/dist-packages/neutron/api/v2/base.py:719

and ended successfully after 131s:
2023-03-16 07:26:10.768 2384978 INFO neutron.wsgi [req-66884669-2253-4faf-95a4-ebbe353896a0 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] 10.5.2.174 "POST /v2.0/subnets HTTP/1.1" status: 201 len: 0 time: 131.1619370

But in the meaintime, the second request came as a result of the client retry action:
2023-03-16 07:25:30.129 2384976 DEBUG neutron.api.v2.base [req-d0632669-e1ce-4157-b5a5-e52be6e51873 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] Request body: {'subnet': {'allocation_pools': [{'start': '172.16.0.100', 'end': '172.16.0.200'}], 'ip_version': 4, 'gateway_ip': '172.16.0.1', 'name': 'external-subnet-2', 'dns_nameservers': ['8.8.8.8'], 'enable_dhcp': False, 'cidr': '172.16.0.0/24', 'network_id': 'cd275f37-deec-4f0f-9ad3-4816ca52678e'}} prepare_request_body /usr/lib/python3/dist-packages/neutron/api/v2/base.py:719

And of course it failed validation:
2023-03-16 07:25:30.409 2384976 INFO neutron.wsgi [req-d0632669-e1ce-4157-b5a5-e52be6e51873 93eeeebc2920450faaaf2395443505fd d6761dacbb0649189a07a4a1a191a8c0 - 6a4439622e71431c8b96073e33c3b7e1 6a4439622e71431c8b96073e33c3b7e1] 10.5.2.174 "POST /v2.0/subnets HTTP/1.1" status: 400 len: 429 time: 0.2894270

The first request comlete sucessfully but the user wouldn't know about it. Instad he got measleading error as above.
This problem comes from the keystoneauth1 library, used by the python-openstackclient:
https://github.com/openstack/keystoneauth/blob/stable/ussuri/keystoneauth1/session.py#L1019
https://github.com/openstack/keystoneauth/blob/stable/ussuri/keystoneauth1/session.py#L1031

In order to reproduce the problem I set python debugger on the neutron server code, here:
https://opendev.org/openstack/neutron/src/branch/master/neutron/api/rpc/handlers/resources_rpc.py#L237

But this problem originally comes from the production environment with 60+ compute nodes (DVR) and notification about changes due to the subnet creations takes over 100s.

Kind regards,
Marcin

Revision history for this message
Marcin Wilk (wilkmarcin) wrote :

And increasing default 'timeout client' and 'timeout server' values on the neturon-api haproxy (from default 90000 ms) is a workaround, but still client should handle these type of situations differently.

Cheers,
Marcin

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Hello Marcin,

Thank you for reporting this. Do you know by any chance if this is also an issue on newer versions of the openstackclient?

Corey

Revision history for this message
Marcin Wilk (wilkmarcin) wrote :
Download full text (9.7 KiB)

Hey Corey,
I just checked with:
- python3-openstackclient 5.8.0-0ubuntu1
- python3-openstackclient 6.0.0-0ubuntu1~cloud0

and it looks like the newer versions are also affected:

 python3-openstackclient 5.8.0-0ubuntu1
(...)
 Starting new HTTP connection (1): 10.5.100.3:9696
http://10.5.100.3:9696 "GET /v2.0/networks/cd275f37-deec-4f0f-9ad3-4816ca52678e HTTP/1.1" 200 765
RESP: [200] Connection: keep-alive Content-Length: 765 Content-Type: application/json Date: Wed, 22 Mar 2023 14:46:37 GMT X-Openstack-Request-Id: req-eed04f66-58e4-48df-b733-91fe086a2176
RESP BODY: {"network":{"id":"cd275f37-deec-4f0f-9ad3-4816ca52678e","name":"ext_net","tenant_id":"d6761dacbb0649189a07a4a1a191a8c0","admin_state_up":true,"mtu":1500,"status":"ACTIVE","subnets":["040a6d57-b6fc-436a-b531-03baf4abb609","3184d8e7-e9be-4071-b1bf-52cca10f28ec"],"shared":false,"availability_zone_hints":[],"availability_zones":["nova"],"ipv4_address_scope":null,"ipv6_address_scope":null,"router:external":true,"description":"","port_security_enabled":true,"dns_domain":"","l2_adjacency":true,"is_default":false,"tags":[],"created_at":"2022-05-06T10:18:25Z","updated_at":"2023-03-22T14:43:07Z","revision_number":27,"project_id":"d6761dacbb0649189a07a4a1a191a8c0","provider:network_type":"flat","provider:physical_network":"physnet1","provider:segmentation_id":null}}
GET call to network for http://10.5.100.3:9696/v2.0/networks/cd275f37-deec-4f0f-9ad3-4816ca52678e used request id req-eed04f66-58e4-48df-b733-91fe086a2176
Attribute [] not found in [<openstack.resource._ComponentManager object at 0x7f9a6d25d750>]: ''.
REQ: curl -g -i -X POST http://10.5.100.3:9696/v2.0/subnets -H "Content-Type: application/json" -H "User-Agent: openstacksdk/0.61.0 keystoneauth1/4.4.0 python-requests/2.25.1 CPython/3.10.6" -H "X-Auth-Token: {SHA256}05006deb621e646b64dc83b8718ec106a32226420306ac5a39f2d84b844d4dff" -d '{"subnet": {"allocation_pools": [{"start": "172.16.4.100", "end": "172.16.4.200"}], "enable_dhcp": false, "gateway_ip": "172.16.4.1", "dns_nameservers": ["8.8.8.8"], "network_id": "cd275f37-deec-4f0f-9ad3-4816ca52678e", "ip_version": 4, "cidr": "172.16.4.0/24", "name": "external-subnet-4"}}'
Failure: Unable to establish connection to http://10.5.100.3:9696/v2.0/subnets: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')). Retrying in 0.5s.
Starting new HTTP connection (2): 10.5.100.3:9696
http://10.5.100.3:9696 "POST /v2.0/subnets HTTP/1.1" 400 221
RESP: [400] Connection: keep-alive Content-Length: 221 Content-Type: application/json Date: Wed, 22 Mar 2023 14:48:07 GMT X-Openstack-Request-Id: req-7e183803-909a-4582-86af-818ec49dac84
RESP BODY: {"NeutronError": {"type": "InvalidInput", "message": "Invalid input for operation: Requested subnet with cidr: 172.16.4.0/24 for network: cd275f37-deec-4f0f-9ad3-4816ca52678e overlaps with another subnet.", "detail": ""}}
POST call to network for http://10.5.100.3:9696/v2.0/subnets used request id req-7e183803-909a-4582-86af-818ec49dac84
BadRequestException: 400: Client Error for url: http://10.5.100.3:9696/v2.0/subnets, Invalid input for operation: Requested subnet with cidr: 172.16.4.0/24 for network: cd275f37-deec-4f0f-9...

Read more...

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Thanks for checking. If we think this is a bug in the upstream project we should target that project as well.

Revision history for this message
Marcin Wilk (wilkmarcin) wrote :

Hi Corey,
I added keystoneauth project as per your suggestion.
Kind regards,
Marcin

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.