504 Gateway Timeout while trying to verify_metadata for a server

Bug #1912845 reported by melanie witt
This bug affects 1 person
Affects         Status  Importance  Assigned to  Milestone
OpenStack-Gate  New     Undecided   Unassigned

Bug Description

Seen recently in the gate: a test fails while trying to verify server metadata. Example:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 89, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/scenario/test_server_basic_ops.py", line 136, in test_server_basic_ops
    self.verify_metadata()
  File "/opt/stack/tempest/tempest/scenario/test_server_basic_ops.py", line 81, in verify_metadata
    if not test_utils.call_until_true(exec_cmd_and_verify_output,
  File "/opt/stack/tempest/tempest/lib/common/utils/test_utils.py", line 116, in call_until_true
    if func(*args, **kwargs):
  File "/opt/stack/tempest/tempest/scenario/test_server_basic_ops.py", line 78, in exec_cmd_and_verify_output
    self.assertEqual(self.ip, result, msg)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/testtools/testcase.py", line 415, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/testtools/testcase.py", line 502, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: !=:
reference = '172.24.5.114'
actual = '''\
<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
'''
: Failed while verifying metadata on server. Result of command "curl http://169.254.169.254/latest/meta-data/public-ipv4" is NOT "172.24.5.114".
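Reading the traceback, the assertion fires inside the callable that call_until_true polls, so a non-empty but wrong response (such as the 504 page) ends the test immediately instead of being retried. Below is a minimal Python sketch of that control flow. The names poll_until_true and make_check, and the fake fetcher standing in for the curl call, are hypothetical; this is not tempest's code, and it assumes the callable asserts as soon as it gets any output.

```python
import time


def poll_until_true(func, duration, sleep_for):
    """Call func until it returns a truthy value or duration seconds pass."""
    deadline = time.monotonic() + duration
    while True:
        if func():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(sleep_for)


def make_check(fetch, expected_ip):
    """Build a callable shaped like the one polled in the traceback (hypothetical)."""
    def check():
        result = fetch()
        if not result:
            # Empty output: report "not ready" so the poller tries again.
            return False
        # Any non-empty output is compared right away; a 504 page raises here.
        assert result == expected_ip, (
            "metadata returned %r, expected %r" % (result, expected_ip))
        return True
    return check


# Fake fetcher standing in for curl: empty output first, then a 504 page.
responses = iter(["", "<html><body><h1>504 Gateway Time-out</h1></body></html>"])
check = make_check(lambda: next(responses), "172.24.5.114")

try:
    poll_until_true(check, duration=5, sleep_for=0.01)
    outcome = "passed"
except AssertionError:
    outcome = "failed on first non-empty mismatch"
print(outcome)
```

Under these assumptions, a single slow metadata response is enough to fail the test, which would be consistent with every logstash hit being a failure.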

Logstash query:

54 hits in the last 7 days, all failures

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Failed%20while%20verifying%20metadata%20on%20server%5C%22%20AND%20tags%3Aconsole&from=7d

Here is the metadata API config we currently use, nova-metadata-uwsgi.ini:

[uwsgi]
http = 0.0.0.0:8775
lazy-apps = true
add-header = Connection: close
buffer-size = 65535
hook-master-start = unix_signal:15 gracefully_kill_them_all
thunder-lock = true
plugins = http,python3
enable-threads = true
worker-reload-mercy = 90
exit-on-reload = false
die-on-term = true
master = true
processes = 2
wsgi-file = /usr/local/bin/nova-metadata-wsgi
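As a quick way to confirm the worker count in a config like the one above, the file can be read with Python's standard configparser. This is only a sketch: the inline string is an abbreviated copy of the config rather than the real file, and restricting the delimiter to "=" is a choice made here so that values containing ":" are left intact.

```python
import configparser

# Abbreviated copy of the uwsgi config shown above (illustration only).
UWSGI_INI = """\
[uwsgi]
http = 0.0.0.0:8775
add-header = Connection: close
master = true
processes = 2
wsgi-file = /usr/local/bin/nova-metadata-wsgi
"""

# Split keys and values on "=" only, and leave "%" characters uninterpreted.
parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
parser.read_string(UWSGI_INI)

processes = parser.getint("uwsgi", "processes")
listen = parser.get("uwsgi", "http")
print(processes, listen)
```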

We're using 2 API workers here, as specified by the formula [1]:

# Sets the maximum number of workers for most services to reduce
# the memory used where there are a large number of CPUs present
# (the default number of workers for many services is the number of CPUs)
# Also sets the minimum number of workers to 2.
API_WORKERS=${API_WORKERS:=$(( ($(nproc)/4)<2 ? 2 : ($(nproc)/4) ))}
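To make the formula concrete, here is the same arithmetic as a small Python function. The name api_workers is hypothetical, and the CPU counts are illustrative inputs, not measurements of gate nodes. Integer division by 4 with a floor of 2 means any host with fewer than 12 CPUs ends up with 2 workers.

```python
def api_workers(nproc):
    """Mirror of $(( (nproc/4)<2 ? 2 : (nproc/4) )) for a positive CPU count."""
    quarter = nproc // 4  # shell arithmetic also truncates for positive values
    return 2 if quarter < 2 else quarter


# Illustrative CPU counts only; the report does not state the gate node size.
for cpus in (1, 4, 8, 11, 12, 16, 32):
    print(cpus, api_workers(cpus))
```

Note that the `${API_WORKERS:=...}` form only applies this default when API_WORKERS is unset or empty, so the computed value can be overridden from the environment.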

I'm wondering if we should try increasing this for the metadata API only, considering how many gate failures we observe where the metadata API times out or otherwise fails to respond to requests [2][3].

[1] https://github.com/openstack/devstack/blob/edee6dc341e40939360b36ce9fd09052dea1ee4d/stackrc#L789-L793
[2] https://bugs.launchpad.net/openstack-gate/+bug/1911574
[3] https://bugs.launchpad.net/openstack-gate/+bug/1808010
