While checking another issue https://bugs.launchpad.net/grenade/+bug/2020643 which is triggered by this stuck issue, i saw one request in neutron too timed out after 900 seconds[1].
Tempest triggered security group delete but it timed out and retried and succeeded in the second attempt:-
2023-05-22 04:28:48.568 69207 WARNING urllib3.connectionpool [-] Retrying (Retry(total=9, connect=None, read=None, redirect=5, status=None)) after connection broken by 'ReadTimeoutError("HTTPConnectionPool(host='173.231.255.73', port=80): Read timed out. (read timeout=60)")': /compute/v2.1/os-security-groups/76a08f39-c1e7-47e5-b52f-c4de53a46146 2023-05-22 04:28:48.707 69207 INFO tempest.lib.common.rest_client [req-1fd0ae50-5f36-40de-9d48-243f74c59335 req-1fd0ae50-5f36-40de-9d48-243f74c59335 ] Request (SecurityGroupRulesTestJSON:tearDownClass): 202 DELETE http://173.231.255.73/compute/v2.1/os-security-groups/76a08f39-c1e7-47e5-b52f-c4de53a46146 60.179s
The original request is stuck in nova/neutron and times out after 900s(client_socket_timeout).
nova:- May 22 04:27:48.532171 np0034092361 <email address hidden>[51094]: DEBUG nova.api.openstack.wsgi [None req-fc94b238-88f2-4e4b-8a25-9ed456ac652a tempest-SecurityGroupRulesTestJSON-54591405 tempest-SecurityGroupRulesTestJSON-54591405-project-member] Calling method '<function Controller.__getattribute__.<locals>.version_select at 0x7f9129c5dea0>' {{(pid=51094) _process_stack /opt/stack/old/nova/nova/api/openstack/wsgi.py:513}} May 22 04:42:48.780841 np0034092361 <email address hidden>[51094]: DEBUG neutronclient.v2_0.client [None req-fc94b238-88f2-4e4b-8a25-9ed456ac652a tempest-SecurityGroupRulesTestJSON-54591405 tempest-SecurityGroupRulesTestJSON-54591405-project-member] Error message: {"error": {"code": 401, "title": "Unauthorized", "message": "The request you have made requires authentication."}} {{(pid=51094) _handle_fault_response /usr/local/lib/python3.10/dist-packages/neutronclient/v2_0/client.py:262}} May 22 04:42:48.781975 np0034092361 <email address hidden>[51094]: INFO nova.api.openstack.requestlog [None req-fc94b238-88f2-4e4b-8a25-9ed456ac652a tempest-SecurityGroupRulesTestJSON-54591405 tempest-SecurityGroupRulesTestJSON-54591405-project-member] 173.231.255.73 "DELETE /compute/v2.1/os-security-groups/76a08f39-c1e7-47e5-b52f-c4de53a46146" status: 500 len: 0 microversion: 2.1 time: 900.252509 May 22 04:42:48.792104 np0034092361 <email address hidden>[51094]: Mon May 22 04:42:48 2023 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /compute/v2.1/os-security-groups/76a08f39-c1e7-47e5-b52f-c4de53a46146 (ip 173.231.255.73) !!! May 22 04:42:48.792104 np0034092361 <email address hidden>[51094]: Mon May 22 04:42:48 2023 - uwsgi_response_writev_headers_and_body_do(): Broken pipe [core/writer.c line 306] during DELETE /compute/v2.1/os-security-groups/76a08f39-c1e7-47e5-b52f-c4de53a46146 (173.231.255.73) May 22 04:42:48.792104 np0034092361 <email address hidden>[51094]: CRITICAL nova [None req-fc94b238-88f2-4e4b-8a25-9ed456ac652a tempest-SecurityGroupRulesTestJSON-54591405 tempest-SecurityGroupRulesTestJSON-54591405-project-member] Unhandled error: OSError: write error May 22 04:42:48.792104 np0034092361 <email address hidden>[51094]: ERROR nova OSError: write error May 22 04:42:48.792104 np0034092361 <email address hidden>[51094]: ERROR nova
nova worker 51094 not doing anything and stuck during those 900 seconds.
neutron, an another req(security-group-rule delete associated with the security group):- May 22 04:27:48.525507 np0034092361 neutron-server[51851]: DEBUG neutron.api.rpc.handlers.resources_rpc [req-b7d96c61-9d75-4c14-8fd7-6f2dd459b68f req-40ff2231-f1d6-4785-9596-2ccd296618b6 tempest-SecurityGroupRulesTestJSON-54591405 tempest-SecurityGroupRulesTestJSON-54591405-project-member] Pushing event deleted for resources: {'SecurityGroupRule': ['ID=75e2d70f-0a9d-4bc3-958f-220455905b69,revision_number=None']} {{(pid=51851) push /opt/stack/old/neutron/neutron/api/rpc/handlers/resources_rpc.py:237}} May 22 04:42:48.524257 np0034092361 neutron-server[51851]: DEBUG neutron.wsgi [req-b7d96c61-9d75-4c14-8fd7-6f2dd459b68f req-40ff2231-f1d6-4785-9596-2ccd296618b6 tempest-SecurityGroupRulesTestJSON-54591405 tempest-SecurityGroupRulesTestJSON-54591405-project-member] (51851) timed out ('173.231.255.73', 33990) {{(pid=51851) process_request /usr/local/lib/python3.10/dist-packages/eventlet/wsgi.py:842}}
neutron-api worker processing other requests during this time so where it's stuck?
[1] https://zuul.opendev.org/t/openstack/build/4a37d236576b41648ef16c25b6daaff0
While checking another issue https:/ /bugs.launchpad .net/grenade/ +bug/2020643 which is triggered by this stuck issue, i saw one request in neutron too timed out after 900 seconds[1].
Tempest triggered security group delete but it timed out and retried and succeeded in the second attempt:-
2023-05-22 04:28:48.568 69207 WARNING urllib3. connectionpool [-] Retrying (Retry(total=9, connect=None, read=None, redirect=5, status=None)) after connection broken by 'ReadTimeoutErr or("HTTPConnect ionPool( host='173. 231.255. 73', port=80): Read timed out. (read timeout=60)")': /compute/ v2.1/os- security- groups/ 76a08f39- c1e7-47e5- b52f-c4de53a461 46 lib.common. rest_client [req-1fd0ae50- 5f36-40de- 9d48-243f74c593 35 req-1fd0ae50- 5f36-40de- 9d48-243f74c593 35 ] Request (SecurityGroupR ulesTestJSON: tearDownClass) : 202 DELETE http:// 173.231. 255.73/ compute/ v2.1/os- security- groups/ 76a08f39- c1e7-47e5- b52f-c4de53a461 46 60.179s
2023-05-22 04:28:48.707 69207 INFO tempest.
The original request is stuck in nova/neutron and times out after 900s(client_ socket_ timeout) .
nova:- openstack. wsgi [None req-fc94b238- 88f2-4e4b- 8a25-9ed456ac65 2a tempest- SecurityGroupRu lesTestJSON- 54591405 tempest- SecurityGroupRu lesTestJSON- 54591405- project- member] Calling method '<function Controller. __getattribute_ _.<locals> .version_ select at 0x7f9129c5dea0>' {{(pid=51094) _process_stack /opt/stack/ old/nova/ nova/api/ openstack/ wsgi.py: 513}} v2_0.client [None req-fc94b238- 88f2-4e4b- 8a25-9ed456ac65 2a tempest- SecurityGroupRu lesTestJSON- 54591405 tempest- SecurityGroupRu lesTestJSON- 54591405- project- member] Error message: {"error": {"code": 401, "title": "Unauthorized", "message": "The request you have made requires authentication."}} {{(pid=51094) _handle_ fault_response /usr/local/ lib/python3. 10/dist- packages/ neutronclient/ v2_0/client. py:262} } openstack. requestlog [None req-fc94b238- 88f2-4e4b- 8a25-9ed456ac65 2a tempest- SecurityGroupRu lesTestJSON- 54591405 tempest- SecurityGroupRu lesTestJSON- 54591405- project- member] 173.231.255.73 "DELETE /compute/ v2.1/os- security- groups/ 76a08f39- c1e7-47e5- b52f-c4de53a461 46" status: 500 len: 0 microversion: 2.1 time: 900.252509 v2.1/os- security- groups/ 76a08f39- c1e7-47e5- b52f-c4de53a461 46 (ip 173.231.255.73) !!! writev_ headers_ and_body_ do(): Broken pipe [core/writer.c line 306] during DELETE /compute/ v2.1/os- security- groups/ 76a08f39- c1e7-47e5- b52f-c4de53a461 46 (173.231.255.73) 88f2-4e4b- 8a25-9ed456ac65 2a tempest- SecurityGroupRu lesTestJSON- 54591405 tempest- SecurityGroupRu lesTestJSON- 54591405- project- member] Unhandled error: OSError: write error
May 22 04:27:48.532171 np0034092361 <email address hidden>[51094]: DEBUG nova.api.
May 22 04:42:48.780841 np0034092361 <email address hidden>[51094]: DEBUG neutronclient.
May 22 04:42:48.781975 np0034092361 <email address hidden>[51094]: INFO nova.api.
May 22 04:42:48.792104 np0034092361 <email address hidden>[51094]: Mon May 22 04:42:48 2023 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /compute/
May 22 04:42:48.792104 np0034092361 <email address hidden>[51094]: Mon May 22 04:42:48 2023 - uwsgi_response_
May 22 04:42:48.792104 np0034092361 <email address hidden>[51094]: CRITICAL nova [None req-fc94b238-
May 22 04:42:48.792104 np0034092361 <email address hidden>[51094]: ERROR nova OSError: write error
May 22 04:42:48.792104 np0034092361 <email address hidden>[51094]: ERROR nova
nova worker 51094 not doing anything and stuck during those 900 seconds.
neutron, an another req(security- group-rule delete associated with the security group):- server[ 51851]: DEBUG neutron. api.rpc. handlers. resources_ rpc [req-b7d96c61- 9d75-4c14- 8fd7-6f2dd459b6 8f req-40ff2231- f1d6-4785- 9596-2ccd296618 b6 tempest- SecurityGroupRu lesTestJSON- 54591405 tempest- SecurityGroupRu lesTestJSON- 54591405- project- member] Pushing event deleted for resources: {'SecurityGroup Rule': ['ID=75e2d70f- 0a9d-4bc3- 958f-220455905b 69,revision_ number= None']} {{(pid=51851) push /opt/stack/ old/neutron/ neutron/ api/rpc/ handlers/ resources_ rpc.py: 237}} server[ 51851]: DEBUG neutron.wsgi [req-b7d96c61- 9d75-4c14- 8fd7-6f2dd459b6 8f req-40ff2231- f1d6-4785- 9596-2ccd296618 b6 tempest- SecurityGroupRu lesTestJSON- 54591405 tempest- SecurityGroupRu lesTestJSON- 54591405- project- member] (51851) timed out ('173.231.255.73', 33990) {{(pid=51851) process_request /usr/local/ lib/python3. 10/dist- packages/ eventlet/ wsgi.py: 842}}
May 22 04:27:48.525507 np0034092361 neutron-
May 22 04:42:48.524257 np0034092361 neutron-
neutron-api worker processing other requests during this time so where it's stuck?
[1] https:/ /zuul.opendev. org/t/openstack /build/ 4a37d236576b416 48ef16c25b6daaf f0