neutron.tests.functional.agent.test_l3_agent.MetadataL3AgentTestCase.test_access_to_metadata_proxy times out intermittently

Bug #1461172 reported by Matt Riedemann on 2015-06-02
22
This bug affects 3 people
Affects Status Importance Assigned to Milestone
neutron
Critical
Armando Migliaccio

Bug Description

Elastic recheck:
http://status.openstack.org//elastic-recheck/index.html#1461172

http://logs.openstack.org/90/186290/2/gate/gate-neutron-dsvm-functional/599b742/console.html.gz#_2015-05-29_19_44_46_232

2015-05-29 19:44:46.232 | 2015-05-29 19:44:46.177 | neutron.tests.functional.agent.test_l3_agent.MetadataL3AgentTestCase.test_access_to_metadata_proxy
2015-05-29 19:44:46.232 | 2015-05-29 19:44:46.178 | --------------------------------------------------------------------------------------------------
2015-05-29 19:44:46.232 | 2015-05-29 19:44:46.179 |
2015-05-29 19:44:46.233 | 2015-05-29 19:44:46.180 | Captured traceback:
2015-05-29 19:44:46.233 | 2015-05-29 19:44:46.181 | ~~~~~~~~~~~~~~~~~~~
2015-05-29 19:44:46.233 | 2015-05-29 19:44:46.182 | Traceback (most recent call last):
2015-05-29 19:44:46.233 | 2015-05-29 19:44:46.183 | File "neutron/tests/functional/agent/test_l3_agent.py", line 900, in test_access_to_metadata_proxy
2015-05-29 19:44:46.233 | 2015-05-29 19:44:46.185 | self.fail('metadata proxy unreachable on %s before timeout' % url)
2015-05-29 19:44:46.233 | 2015-05-29 19:44:46.186 | File "/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/unittest2/case.py", line 666, in fail
2015-05-29 19:44:46.234 | 2015-05-29 19:44:46.187 | raise self.failureException(msg)
2015-05-29 19:44:46.234 | 2015-05-29 19:44:46.188 | AssertionError: metadata proxy unreachable on http://169.254.169.254:80 before timeout
2015-05-29 19:44:46.234 | 2015-05-29 19:44:46.189 |
2015-05-29 19:44:46.234 | 2015-05-29 19:44:46.191 |
2015-05-29 19:44:46.234 | 2015-05-29 19:44:46.192 | Captured stdout:
2015-05-29 19:44:46.234 | 2015-05-29 19:44:46.193 | ~~~~~~~~~~~~~~~~
2015-05-29 19:44:46.235 | 2015-05-29 19:44:46.194 | 2015-05-29 19:43:27.257 15907 INFO neutron.common.config [req-8a89f51c-7d0f-4317-a897-c68b90af405d - - - - -] Logging enabled!
2015-05-29 19:44:46.235 | 2015-05-29 19:44:46.195 | 2015-05-29 19:43:27.257 15907 INFO neutron.common.config [req-8a89f51c-7d0f-4317-a897-c68b90af405d - - - - -] /opt/stack/new/neutron/.tox/dsvm-functional/lib/python2.7/site-packages/subunit/run.py version 2015.2.0.dev499
2015-05-29 19:44:46.235 | 2015-05-29 19:44:46.196 | 2015-05-29 19:43:27.331 15907 INFO eventlet.wsgi.server [-] (15907) wsgi starting up on http:///:t/
2015-05-29 19:44:46.235 | 2015-05-29 19:44:46.197 | 2015-05-29 19:43:30.762 15907 INFO eventlet.wsgi.server [-] (15907) wsgi starting up on http:///:t/
2015-05-29 19:44:46.235 | 2015-05-29 19:44:46.198 | 2015-05-29 19:43:31.229 15907 ERROR neutron.agent.linux.utils [req-d285dd8a-7dab-41a8-9692-57608f0cd9b2 - - - - -]
2015-05-29 19:44:46.236 | 2015-05-29 19:44:46.199 | Command: ['ip', 'netns', 'exec', 'func-7b734de4-1879-4c2e-87f5-070911a7b223', 'curl', '--max-time', 60, '-D-', 'http://169.254.169.254:80']
2015-05-29 19:44:46.261 | 2015-05-29 19:44:46.200 | Exit code: 7
2015-05-29 19:44:46.261 | 2015-05-29 19:44:46.201 | Stdin:
2015-05-29 19:44:46.261 | 2015-05-29 19:44:46.202 | Stdout:
2015-05-29 19:44:46.261 | 2015-05-29 19:44:46.203 | Stderr: % Total % Received % Xferd Average Speed Time Time Time Current
2015-05-29 19:44:46.261 | 2015-05-29 19:44:46.204 | Dload Upload Total Spent Left Speed
2015-05-29 19:44:46.262 | 2015-05-29 19:44:46.205 |
  0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to 169.254.169.254 port 80: Connection refused
2015-05-29 19:44:46.262 | 2015-05-29 19:44:46.206 |
2015-05-29 19:44:46.262 | 2015-05-29 19:44:46.207 | 2015-05-29 19:43:31.293 15907 INFO eventlet.wsgi.server [-] (15907) wsgi exited, is_accepting=True

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiQXNzZXJ0aW9uRXJyb3I6IG1ldGFkYXRhIHByb3h5IHVucmVhY2hhYmxlIG9uXCIgQU5EIG1lc3NhZ2U6XCJiZWZvcmUgdGltZW91dFwiIEFORCB0YWdzOlwiY29uc29sZVwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDMzMjY0NzAwNjIzfQ==

20 hits in 7 days, all failures, check and gate queues.

Matt Riedemann (mriedem) on 2015-06-02
tags: added: functional-tests
Changed in neutron:
status: New → Confirmed
Kyle Mestery (mestery) wrote :

Assaf, assigning this to you to triage a bit further. We'll also discuss in the Neutron meeting today. Thanks!

Changed in neutron:
assignee: nobody → Assaf Muller (amuller)
importance: Undecided → High
Brian Haley (brian-haley) wrote :

Notice the startup line, "wsgi starting up on http:///:t/" so somehow the host was '/' and the port was ':t' (or /: and t). I looked at this last week with Henry and never did get to the bottom of it, but that exact string, '/:t' is the same in all the failures with this test.

Assaf Muller (amuller) wrote :

Thanks Brian that's helpful.

Assaf Muller (amuller) wrote :

Looks like 'wsgi starting up on http:///:t/' is always the case, even on successful runs.

@Assaf, that is a bug in eventlet that we've fixed as https://github.com/eventlet/eventlet/commit/3ba7b02dfacb00e51dcad8097c3854f51c2b7d71 It should not have any relation to actual failures.

Assaf Muller (amuller) on 2015-07-27
Changed in neutron:
assignee: Assaf Muller (amuller) → nobody
Kyle Mestery (mestery) wrote :

Assigning to Carl to triage a bit more.

tags: added: l3
Changed in neutron:
assignee: nobody → Carl Baldwin (carl-baldwin)
Miguel Lavalle (minsel) on 2015-08-11
tags: added: l3-ipam-dhcp
removed: l3
Miguel Lavalle (minsel) on 2015-08-13
Changed in neutron:
assignee: Carl Baldwin (carl-baldwin) → Miguel Lavalle (minsel)
Assaf Muller (amuller) on 2015-08-28
description: updated
tags: added: gate-failure

12 hits last 7 days

Changed in neutron:
importance: High → Critical
Miguel Lavalle (minsel) wrote :

Analyzed latest log stash entry. Found the following traceback:

2015-09-01 12:27:53.906 | 2015-09-01 12:27:53.883 | Traceback (most recent call last):
2015-09-01 12:27:53.916 | 2015-09-01 12:27:53.894 | File "neutron/tests/functional/agent/test_l3_agent.py", line 930, in test_access_to_metadata_proxy
2015-09-01 12:27:53.927 | 2015-09-01 12:27:53.904 | router = self.manage_router(self.agent, router_info)
2015-09-01 12:27:53.937 | 2015-09-01 12:27:53.915 | File "neutron/tests/functional/agent/test_l3_agent.py", line 129, in manage_router
2015-09-01 12:27:53.948 | 2015-09-01 12:27:53.925 | agent._process_added_router(router)
2015-09-01 12:27:53.959 | 2015-09-01 12:27:53.936 | File "neutron/agent/l3/agent.py", line 439, in _process_added_router
2015-09-01 12:27:53.969 | 2015-09-01 12:27:53.946 | self._router_added(router['id'], router)
2015-09-01 12:27:53.979 | 2015-09-01 12:27:53.956 | File "neutron/agent/l3/agent.py", line 328, in _router_added
2015-09-01 12:27:53.988 | 2015-09-01 12:27:53.966 | ri.initialize(self.process_monitor)
2015-09-01 12:27:53.998 | 2015-09-01 12:27:53.975 | File "neutron/agent/l3/router_info.py", line 85, in initialize
2015-09-01 12:27:54.008 | 2015-09-01 12:27:53.985 | self.router_namespace.create()
2015-09-01 12:27:54.018 | 2015-09-01 12:27:53.995 | File "neutron/agent/l3/namespaces.py", line 71, in create
2015-09-01 12:27:54.032 | 2015-09-01 12:27:54.009 | ip_wrapper = self.ip_wrapper_root.ensure_namespace(self.name)
2015-09-01 12:27:54.043 | 2015-09-01 12:27:54.020 | File "neutron/agent/linux/ip_lib.py", line 161, in ensure_namespace
2015-09-01 12:27:54.056 | 2015-09-01 12:27:54.031 | if not self.netns.exists(name):
2015-09-01 12:27:54.066 | 2015-09-01 12:27:54.044 | File "neutron/agent/linux/ip_lib.py", line 737, in exists
2015-09-01 12:27:54.076 | 2015-09-01 12:27:54.054 | run_as_root=cfg.CONF.AGENT.use_helper_for_ns_read)
2015-09-01 12:27:54.086 | 2015-09-01 12:27:54.063 | File "neutron/agent/linux/ip_lib.py", line 88, in _execute
2015-09-01 12:27:54.097 | 2015-09-01 12:27:54.074 | log_fail_as_error=log_fail_as_error)
2015-09-01 12:27:54.107 | 2015-09-01 12:27:54.085 | File "neutron/agent/linux/utils.py", line 156, in execute
2015-09-01 12:27:54.118 | 2015-09-01 12:27:54.095 | raise RuntimeError(m)
2015-09-01 12:27:54.128 | 2015-09-01 12:27:54.105 | RuntimeError:
2015-09-01 12:27:54.152 | 2015-09-01 12:27:54.116 | Command: ['ip', '-o', 'netns', 'list']
2015-09-01 12:27:54.152 | 2015-09-01 12:27:54.128 | Exit code: 126
2015-09-01 12:27:54.162 | 2015-09-01 12:27:54.139 | Stdin:
2015-09-01 12:27:54.173 | 2015-09-01 12:27:54.149 | Stdout: {"data":[[65534]],"headings":["ofport"]}
2015-09-01 12:27:54.187 | 2015-09-01 12:27:54.161 |
2015-09-01 12:27:54.199 | 2015-09-01 12:27:54.176 | Stderr:

Logstash indicates that this happened with change 196149. Investigating in Gerrit, found https://review.openstack.org/#/c/196149, which creates on purpose the failure shown in the traceback above. Investigating with the patchset author

Paul Michali (pcm) wrote :

That is a test commit to try to get the VPN Functional test job for NEUTRON working. This is an experimental neutron job that will run the VPN functional jobs on the Neutron changeset. They should be ignored.

Changed in neutron:
milestone: none → liberty-rc1

Hi Miguel: are you actively looking into this one?

In a failed run, I can see that iptables rules are acknowledged to be applied successfully well after the test failure [1]. I wonder if there is a race where the rules to map the metadata port are not effective making the curl require blow up.

[1] http://logs.openstack.org/91/220091/2/check/gate-neutron-dsvm-functional/a19990f/logs/neutron.tests.functional.agent.test_l3_agent.UnprivilegedUserGroupMetadataL3AgentTestCase.test_access_to_metadata_proxy.log.txt.gz#_2015-09-04_11_06_35_587

Unfortunately in this scenario the timeout (60 second on the curl command) is not effective, because the 'server is not accepting requests'.

Perhaps we can make the test retry on connection reset, or we could synchronize the spawn of the metadata proxy with the completion of iptables-save/restore in [1].

In other words, we could wait that the apply operation is really complete before spawning the proxy. I can hack something together to prove my point.

[1] https://github.com/openstack/neutron/blob/master/neutron/agent/metadata/driver.py#L155

um I might have this wrong. I am no longer sure on comments #13/#14.

I am no longer sure on comments #13/#14 :(

Fix proposed to branch: master
Review: https://review.openstack.org/220968

Changed in neutron:
assignee: Miguel Lavalle (minsel) → Armando Migliaccio (armando-migliaccio)
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/220968
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4b2e6842f320405cd963f560bc06849b4b7bb1eb
Submitter: Jenkins
Branch: master

commit 4b2e6842f320405cd963f560bc06849b4b7bb1eb
Author: armando-migliaccio <email address hidden>
Date: Mon Sep 7 04:53:50 2015 -0700

    Retry metadata request on connection refused error

    This testcase may fail intermittently on 'Connection refused' error.
    This could be due to the fact that the metadata proxy setup is not exactly
    complete at the time the request is issued; in fact there is no
    synchronization between the router being up and the metadata request being
    issued, and clearly this may be the reason of accidental but seldom failures.

    In order to rule out this possibility and stabilize the test, let's retry
    on connection refused only. If we continue to fail, then the next step would
    be to dump the content of iptables to figure out why the error occurs.

    Closes-bug: #1461172

    Change-Id: I65a5bf4fbbcad6ba93a46d36cabe7844ff528d8d

Changed in neutron:
status: In Progress → Fix Committed
Miguel Lavalle (minsel) wrote :

0 events as of 9/11. Thanks Armando!

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/224357

Download full text (73.6 KiB)

Reviewed: https://review.openstack.org/224357
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fdc3431ccd219accf6a795079d9b67b8656eed8e
Submitter: Jenkins
Branch: feature/pecan

commit fe236bdaadb949661a0bfb9b62ddbe432b4cf5f1
Author: Miguel Angel Ajo <email address hidden>
Date: Thu Sep 3 15:40:12 2015 +0200

    No network devices on network attached qos policies

    Network devices, like internal router legs, or dhcp ports
    should not be affected by bandwidth limiting rules.

    This patch disables application of network attached policies
    to network/neutron owned ports.

    Closes-bug: #1486039
    DocImpact

    Change-Id: I75d80227f1e6c4b3f5fa7762b8dc3b0c0f1abd46

commit db4a06f7caa20a4c7879b58b20e95b223ed8eeaf
Author: Ken'ichi Ohmichi <email address hidden>
Date: Wed Sep 16 10:04:32 2015 +0000

    Use tempest-lib's token_client

    Now tempest-lib provides token_client modules as library and the
    interface is stable. So neutron repogitory doesn't need to contain
    these modules.
    This patch makes neutron use tempest-lib's token_client and removes
    the own modules for the maintenance.

    Change-Id: Ieff7eb003f6e8257d83368dbc80e332aa66a156c

commit 78aed58edbe6eb8a71339c7add491fe9de9a0546
Author: Jakub Libosvar <email address hidden>
Date: Thu Aug 13 09:08:20 2015 +0000

    Fix establishing UDP connection

    Previously, in establish_connection() for UDP protocol data were sent
    but never read on peer socket. That lead to successful read on peer side
    if this connection was filtered. Having constant testing string masked
    this issue as we can't distinguish to which test of connectivity data
    belong.

    This patch makes unique data string per test_connectivity() and
    also makes establish_connection() to create an ASSURED entry in
    conntrack table. Finally, in last test after firewall filter was
    removed, connection is re-established in order to avoid troubles with
    terminated processes or TCP continuing sending packets which weren't
    successfully delivered.

    Closes-Bug: 1478847
    Change-Id: I2920d587d8df8d96dc1c752c28f48ba495f3cf0f

commit e6292fcdd6262434a7b713ad8802db6bc8a6d3dc
Author: YAMAMOTO Takashi <email address hidden>
Date: Wed Sep 16 13:20:51 2015 +0900

    ovsdb: Fix a few docstring

    Change-Id: I53e1e21655b28fe5da60e58aeeb7cbbd103ae014

commit c22949a4449d96a67caa616290cf76b67b182917
Author: fumihiko kakuma <email address hidden>
Date: Wed Sep 16 11:52:59 2015 +0900

    Remove requirements.txt for the ofagent mechanism driver

    It is no longer used.

    Related-Blueprint: core-vendor-decomposition
    https://blueprints.launchpad.net/neutron/+spec/core-vendor-decomposition

    Change-Id: Ib31fb3febf8968e50d86dd66e1e6e1ea2313f8ac

commit d1d4de19d85f961d388c91e70f31b3bafec418c5
Author: Kevin Benton <email address hidden>
Date: Thu Sep 3 20:25:57 2015 -0700

    Always return iterables in L3 get_candidates

    The caller of this function expects iterables.

    Closes-Bug: #1494996
    Change-Id: I3d103e63f4e127a77268502415c0ddb0d804b54a

commit 1ad6ac448067306...

tags: added: in-feature-pecan

Change abandoned by Doug Wiegley (<email address hidden>) on branch: feature/pecan
Review: https://review.openstack.org/224334

Thierry Carrez (ttx) on 2015-09-24
Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2015-10-15
Changed in neutron:
milestone: liberty-rc1 → 7.0.0

Reviewed: https://review.openstack.org/233544
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2051ffb822dd89112cf41681659a2d434bb8534d
Submitter: Jenkins
Branch: stable/kilo

commit 2051ffb822dd89112cf41681659a2d434bb8534d
Author: armando-migliaccio <email address hidden>
Date: Mon Sep 7 04:53:50 2015 -0700

    Retry metadata request on connection refused error

    This testcase may fail intermittently on 'Connection refused' error.
    This could be due to the fact that the metadata proxy setup is not exactly
    complete at the time the request is issued; in fact there is no
    synchronization between the router being up and the metadata request being
    issued, and clearly this may be the reason of accidental but seldom failures.

    In order to rule out this possibility and stabilize the test, let's retry
    on connection refused only. If we continue to fail, then the next step would
    be to dump the content of iptables to figure out why the error occurs.

    Closes-bug: #1461172

    Conflicts:
     neutron/tests/functional/agent/test_l3_agent.py

    Change-Id: I65a5bf4fbbcad6ba93a46d36cabe7844ff528d8d
    (cherry picked from commit 4b2e6842f320405cd963f560bc06849b4b7bb1eb)

tags: added: in-stable-kilo
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers