Sometimes grenade job fails with NetworkNotFound because a network delete request took too long
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | | Critical | Ihar Hrachyshka |
Bug Description
The DELETE request in question is:
2017-04-07 03:45:19.220 | 2017-04-07 03:41:31,143 18539 WARNING [urllib3.
2017-04-07 03:45:19.220 | 2017-04-07 03:41:34,053 18539 INFO [tempest.
2017-04-07 03:45:19.220 | 2017-04-07 03:41:34,053 18539 DEBUG [tempest.
2017-04-07 03:45:19.220 | Body: None
2017-04-07 03:45:19.221 | Response - Headers: {u'content-length': '138', u'content-type': 'application/json', u'x-openstack-
2017-04-07 03:45:19.221 | Body: {"NeutronError": {"message": "Network 46b0776a-
What we see is that the first attempt to delete the network failed after 60 seconds (the client read timeout), so we retried the DELETE, at which point the network no longer existed.
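To make the client-side sequence concrete, here is a minimal toy model, assuming the HTTP client gives up after a fixed read timeout and retries the DELETE once while the server still finishes the first request. `FakeServer`, `client_delete` and the local `NetworkNotFound` class are hypothetical stand-ins, not tempest or neutron APIs.

```python
"""Toy model of the timeout-then-retry race (all names hypothetical)."""
from dataclasses import dataclass, field


class NetworkNotFound(Exception):
    """Stand-in for the 404 NeutronError returned for a missing network."""


@dataclass
class FakeServer:
    networks: set = field(default_factory=set)
    # Seconds the server spends on a network DELETE (e.g. looping on ports).
    delete_duration: float = 0.0

    def delete_network(self, net_id: str) -> float:
        """Delete the network and return how long the request took."""
        if net_id not in self.networks:
            raise NetworkNotFound(net_id)
        self.networks.discard(net_id)
        return self.delete_duration


def client_delete(server: FakeServer, net_id: str, read_timeout: float = 60.0) -> str:
    """Send DELETE; if the server is slower than read_timeout, retry once."""
    elapsed = server.delete_network(net_id)  # the server finishes the work either way
    if elapsed <= read_timeout:
        return "deleted"
    # The client hit a read timeout and never saw the successful response,
    # so it retries; the first request has already removed the network.
    try:
        server.delete_network(net_id)
    except NetworkNotFound:
        return "NetworkNotFound on retry"
    return "deleted on retry"
```

With `delete_duration=61.5`, `client_delete` returns `"NetworkNotFound on retry"`, mirroring the 404 in the log above; a 2-second delete returns `"deleted"`.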
In the neutron-server log, we see that the first DELETE attempt was received with req_id req-933bd8b3-
In the log entries for the first DELETE request, we see the server looping while de-allocating ports:
2017-04-07 03:40:34.227 8785 DEBUG neutron.
2017-04-07 03:40:34.231 8785 DEBUG neutron.
2017-04-07 03:40:34.338 8785 DEBUG neutron.
2017-04-07 03:40:34.340 8785 DEBUG neutron.
This continues until:
2017-04-07 03:41:32.644 8785 DEBUG neutron.
2017-04-07 03:41:32.644 8785 DEBUG neutron.ipam.driver [req-933bd8b3-
2017-04-07 03:41:32.698 8785 DEBUG neutron.
Right before the thread is unblocked and makes progress, we see there is no longer a port to de-allocate:
2017-04-07 03:41:32.421 8785 DEBUG neutron.
I think this was unblocked by another cleanup (DELETE) request for the port that happened just before:
2017-04-07 03:41:31.994 8785 DEBUG neutron.
I suspect this is related to the following patch; we first hit this situation during its review but landed the patch nevertheless: https:/
We may want to revert those patches. We may also want to cut a new Newton release, because the patch made it into 9.3.0.
Ihar Hrachyshka (ihar-hrachyshka) wrote (#1):
Changed in neutron:
importance: Undecided → Critical
tags: added: gate-failure
tags: added: db
Changed in neutron:
status: New → Confirmed
assignee: nobody → Ihar Hrachyshka (ihar-hrachyshka)
Fix proposed to branch: stable/ocata
Review: https:/
Fix proposed to branch: stable/newton
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: stable/newton
commit 0a2ca17d871c88a
Author: Ihar Hrachyshka <email address hidden>
Date: Fri Apr 7 17:14:18 2017 +0000
Revert "Fix DetachedInstanc
This reverts commit 20b9b8934331556
The patch made us loop indefinitely on network delete request.
Change-Id: I18517bafcee495
Closes-Bug: #1680912
tags: added: in-stable-newton
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: stable/ocata
commit a919f2b3541e233
Author: Ihar Hrachyshka <email address hidden>
Date: Fri Apr 7 17:13:35 2017 +0000
Revert "Fix DetachedInstanc
This reverts commit b9242c348cbf44e
The patch made us loop indefinitely on network delete request.
Change-Id: I67eed7c0cb9ca7
Closes-Bug: #1680912
tags: added: in-stable-ocata
This issue was fixed in the openstack/neutron 10.0.1 release.
This issue was fixed in the openstack/neutron 9.3.1 release.
Changed in neutron:
status: Confirmed → Fix Released
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Read%20timed%20out.%20(read%20timeout%3D60)%5C%22
The query above suggests there were 3 failures of gate-grenade-dsvm-neutron-ubuntu-xenial in the last week overall, though there are more timeouts in other jobs.
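The logstash URL is percent-encoded, and its query lives inside the `#` fragment. A short sketch using only the standard library decodes it, which shows that the search matches the urllib3 "Read timed out" message for the 60-second read timeout discussed in the description. `decode_query` is a hypothetical helper name.

```python
from urllib.parse import unquote, urlsplit

URL = ("http://logstash.openstack.org/#dashboard/file/logstash.json"
       "?query=message%3A%5C%22Read%20timed%20out.%20(read%20timeout%3D60)%5C%22")


def decode_query(url: str) -> str:
    """Return the decoded `query=` value stored in the URL fragment."""
    fragment = urlsplit(url).fragment  # everything after '#', including '?query=...'
    _, _, query = fragment.partition("query=")
    return unquote(query)


if __name__ == "__main__":
    print(decode_query(URL))
```

Running it prints `message:\"Read timed out. (read timeout=60)\"`.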