nova_server_resize_poll_interval=5.0 causes intermittent failures with NovaServers.boot_server_from_volume_and_resize

Bug #1946912 reported by Nobuto Murata
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
Rally     Fix Released   Undecided    Unassigned

Bug Description

NovaServers.boot_server_from_volume_and_resize test case fails intermittently with the following connection error when polling resize status:

ITER: 10 END: Error GetResourceFailure: Failed to get the resource <Server: s_rally_d7e83197_5mMdRhmh>: Unable to establish connection to https://nova.fqdn:8774/v2.1/servers/a1976740-eee9-4582-bed1-b0421f418c41: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

On closer inspection, the default nova_server_resize_poll_interval of 5 seconds can hit a corner case when OpenStack API services run behind Apache2 wsgi, where the HTTP Keep-Alive timeout is also 5 seconds by default. The client can then send its next poll request on a kept-alive connection just as the server closes it.
https://github.com/openstack/rally-openstack/blob/ec1f31a685bdcbe10212701f8fd0c35598d4745f/rally_openstack/common/cfg/nova.py#L240-L243
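
To make the corner case concrete, here is a minimal sketch of the timing, not Rally or Apache code. It assumes a simplified model in which the connection sits idle for exactly the poll interval and the server closes idle connections after the Keep-Alive timeout. The function name classify_poll and the race_window tolerance are hypothetical and exist only for illustration.

```python
def classify_poll(poll_interval: float, keepalive_timeout: float,
                  race_window: float = 0.1) -> str:
    """Classify what likely happens to a kept-alive connection between polls.

    Simplified model: the connection is idle for poll_interval seconds and the
    server closes idle connections after keepalive_timeout seconds.
    race_window is a hypothetical tolerance for network and scheduling jitter.
    """
    gap = poll_interval - keepalive_timeout
    if abs(gap) <= race_window:
        # The next request and the server-side close happen at about the same
        # time, so the request may land on a connection that is being closed.
        return "race"
    if gap < 0:
        # The next request arrives well before the server gives up on the
        # idle connection, so the connection is still open.
        return "reused"
    # The server closed the connection well before the next request, so the
    # client has a chance to notice and open a new connection.
    return "reconnected"


for interval in (2.0, 5.0, 10.0):
    print(interval, classify_poll(interval, keepalive_timeout=5.0))
```

In this model only the 5.0 case falls into the "race" bucket, which matches the intermittent nature of the failure: most polls succeed, and only those that collide with the server-side close are aborted.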

We can mitigate the issue by explicitly setting nova_server_resize_poll_interval to a value other than 5 seconds, or by increasing the HTTP Keep-Alive timeout on the server side. Still, it would be nice if nova_server_resize_poll_interval defaulted to something other than 5 out of the box.
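
As a sketch of the client-side mitigation, an override in rally.conf could look like the fragment below. The [openstack] section name is an assumption about where rally-openstack registers this option, and 2.0 is an arbitrary example value, not necessarily the new default; any value that differs from the server's Keep-Alive timeout should serve the same purpose.

```ini
[openstack]
# Example value only: chosen to differ from Apache2's default
# Keep-Alive timeout of 5 seconds.
nova_server_resize_poll_interval = 2.0
```

On the server side, the corresponding knob in Apache2 is the KeepAliveTimeout directive.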

ref:
https://github.com/psf/requests/issues/4664
https://bugs.python.org/issue41345
https://bugs.python.org/msg374466

OpenStack Infra (hudson-openstack) wrote : Fix proposed to rally-openstack (master)
Changed in rally:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to rally-openstack (master)

Reviewed: https://review.opendev.org/c/openstack/rally-openstack/+/813742
Committed: https://opendev.org/openstack/rally-openstack/commit/0ec255aad23aea7693f9b830962370a0500767b9
Submitter: "Zuul (22348)"
Branch: master

commit 0ec255aad23aea7693f9b830962370a0500767b9
Author: Nobuto Murata <email address hidden>
Date: Wed Oct 13 14:30:29 2021 +0900

    Avoid poll_interval to be the same as HTTP Keep-Alive timeout

    When OpenStack API services are behind Apache2, HTTP Keep-Alive timeout
    is 5 seconds out of the box. Polling operations with the same interval
    as HTTP Keep-Alive timeout can cause intermittent failures like:

    Unable to establish connection to
    https://nova.example.com:8774/v2.1/servers/UUID:
    ('Connection aborted.', RemoteDisconnected('Remote end closed connection
    without response',))

    Let's avoid the same value by default as Apache2's HTTP Keep-Alive
    timeout.

    Closes-Bug: #1946912
    Change-Id: Ibda414b129a44d38ed3c3a4b5a43fd45e63ec122

Changed in rally:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/rally-openstack 2.2.0

This issue was fixed in the openstack/rally-openstack 2.2.0 release.
