test_api_extension_validation_with_good_dns_names fails with 500 error

Bug #1594796 reported by Armando Migliaccio
Affects: neutron
Status: Fix Released
Importance: Critical
Assigned to: Kevin Benton

Bug Description

The trace:

ft296.52: neutron.tests.unit.extensions.test_dns.DnsExtensionTestCase.test_api_extension_validation_with_good_dns_names_StringException: Empty attachments:
  pythonlogging:''
  stdout

stderr: {{{
/home/jenkins/workspace/gate-neutron-python34/.tox/py34/lib/python3.4/site-packages/paste/deploy/loadwsgi.py:22: DeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately.
  return pkg_resources.EntryPoint.parse("x=" + s).load(False)
}}}

Traceback (most recent call last):
  File "/home/jenkins/workspace/gate-neutron-python34/neutron/tests/unit/extensions/test_dns.py", line 497, in test_api_extension_validation_with_good_dns_names
    self.assertEqual(201, res.status_code)
  File "/home/jenkins/workspace/gate-neutron-python34/.tox/py34/lib/python3.4/site-packages/testtools/testcase.py", line 411, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/home/jenkins/workspace/gate-neutron-python34/.tox/py34/lib/python3.4/site-packages/testtools/testcase.py", line 498, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: 201 != 500

A few instances:

http://logs.openstack.org/99/331999/2/check/gate-neutron-python34/ecd478b/testr_results.html.gz
http://logs.openstack.org/68/330368/2/check/gate-neutron-python34/f0beed0/testr_results.html.gz

Logstash:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22test_api_extension_validation_with_good_dns_names%5C%22

Changed in neutron:
status: New → Confirmed
importance: Undecided → Critical
tags: added: dns gate-failure
Miguel Lavalle (minsel)
Changed in neutron:
assignee: nobody → Miguel Lavalle (minsel)
Carl Baldwin (carl-baldwin) wrote :

I don't have a root cause yet; I'm just making a checkpoint here to show what I've been looking at. I don't think this has anything to do with DNS. More likely, it has to do with the new MAC address duplicate detection [1].

The new MAC address duplicate detection relies on retries if there is a race to get the same MAC. The stack trace in [2] goes right through the _create method in api/v2/base.py [3], which is annotated with db_api.retry_db_errors [4]; a sketch of that retry pattern follows below. So, presumably, this is trying 10 times and failing with a duplicate MAC each time. If that is true, it smells of a problem with random, because such a streak of collisions is just not likely to ever happen in anyone's lifetime with a half-decent random generator. But we should check into these assumptions more.
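
For readers unfamiliar with the mechanism, here is a minimal sketch of the retry pattern being described, assuming a generic retriable DB error class; it is not the real db_api.retry_db_errors decorator, just an illustration of the "try up to 10 times, then let the failure surface" idea:

    import functools
    import random
    import time

    MAX_RETRIES = 10  # matches the "trying 10 times" assumption above


    class RetriableDBError(Exception):
        """Hypothetical stand-in for the DB errors the real decorator retries."""


    def retry_db_errors(func):
        """Simplified sketch of a DB-retry decorator (not Neutron's real one)."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(MAX_RETRIES - 1):
                try:
                    return func(*args, **kwargs)
                except RetriableDBError:
                    # Back off briefly with jitter so racing workers de-synchronize.
                    time.sleep(random.uniform(0, 0.1) * (attempt + 1))
            # Final attempt: let any failure propagate to the API layer,
            # where it becomes the HTTP 500 seen in the test.
            return func(*args, **kwargs)
        return wrapper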

I have a vague memory of having to reseed the random generator after forking a process to deal with a problem like this (a toy demonstration follows below). I thought I had fixed something, but I can't find it. It might have been a problem that was local to the old HP cloud. Are there multiple API processes involved?
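
The fork hazard mentioned above can be shown in a few lines (a toy demonstration, not Neutron code): a child process forked after the parent has already used the PRNG inherits the parent's generator state, so both sides produce the same "random" values unless the child reseeds.

    import os
    import random

    random.random()                  # parent touches the PRNG before forking
    pid = os.fork()                  # POSIX-only toy example
    value = random.getrandbits(48)   # e.g. a would-be random MAC suffix
    who = "child" if pid == 0 else "parent"
    print(who, hex(value))           # both processes print the same value
    # Calling random.seed() in the child right after fork() avoids this.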

[1] https://review.openstack.org/#/c/327413
[2] http://paste.openstack.org/show/520931/
[3] https://github.com/openstack/neutron/blob/7245b172b0/neutron/api/v2/base.py#L535
[4] https://github.com/openstack/neutron/blob/7245b172b0/neutron/api/v2/base.py#L426

tags: added: l3-ipam-dhcp
Carl Baldwin (carl-baldwin) wrote :

Here are a couple of things that might be interesting to note. The failure, so far, has only happened with py34.

Also, it is strange that the first occurrence of the problem was with a patch that changed the code in _create_db_port_obj having to do with duplicate MAC address detection [1]. This could be a coincidence, but I thought it was worth noting.

[1] https://review.openstack.org/#/c/330368/2/neutron/db/db_base_plugin_v2.py

Carl Baldwin (carl-baldwin) wrote :

I haven't been able to reproduce this locally.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/332487

Changed in neutron:
assignee: Miguel Lavalle (minsel) → Kevin Benton (kevinbenton)
status: Confirmed → In Progress
Carl Baldwin (carl-baldwin) wrote :

After looking at Kevin's proposal, I realized that DBDuplicateEntry isn't caught by the regular retry decorator in base.py, so all my confused wondering about how we could collide on a MAC address 10 times in a row was off track.
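
Conceptually, the proposed fix widens the set of exceptions the retry decorator treats as retriable, so that a DBDuplicateEntry triggers another attempt instead of bubbling up to the API layer as an HTTP 500. A rough sketch of the idea using oslo.db's exception classes (illustrative only, not the actual patch):

    from oslo_db import exception as db_exc

    # Exceptions that signal a transient race worth retrying. This list is
    # illustrative; the authoritative one lives in Neutron's DB retry machinery.
    _RETRIABLE = (
        db_exc.DBDeadlock,
        db_exc.RetryRequest,
        db_exc.DBDuplicateEntry,  # the addition discussed in this bug
    )


    def is_retriable(exc):
        """Return True if the API-layer retry decorator should try again."""
        return isinstance(exc, _RETRIABLE)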

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/332487
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1af862791ef83d2d67d71a4db6d37b919accb3d5
Submitter: Jenkins
Branch: master

commit 1af862791ef83d2d67d71a4db6d37b919accb3d5
Author: Kevin Benton <email address hidden>
Date: Tue Jun 21 14:39:57 2016 -0700

    Retry DBDuplicate errors in retry decorator

    The MAC duplicate detection logic now expects that the core
    plugin catches DBDuplicate entries and converts them into
    retry attempts. This assumption was valid for ML2 because it
    catches them, but it's not true for anything else, including
    unit tests that just use db_base_plugin_v2 directly.

    Instead of expecting each plugin to catch and convert
    DBDuplicate errors into RetryRequests, this patch just has
    the retry logic check for DBDuplicate errors. A DBDuplicate
    being raised to the API layer for anything other than a race
    that needs to be retried is a bug anyway since it will be
    turned into an HTTP 500.

    The MAC generation for the unit test that failed in the bug
    report defines the 4th octet of the MAC in the test config
    so there are only 65k MAC addresses available per network.
    The unit test would then proceed to create 13 ports on the
    network, which would give us a ~1/5000 chance of a dup mac
    in the unit test using db_base_plugin_v2, which was not
    catching the DBDuplicate exception.

    Closes-Bug: #1594796
    Change-Id: I828f529db8c389ba0ab1eaa5f93ca2f5563048a8

Changed in neutron:
status: In Progress → Fix Released
tags: added: neutron-proactive-backport-potential
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 9.0.0.0b2

This issue was fixed in the openstack/neutron 9.0.0.0b2 development milestone.

tags: removed: neutron-proactive-backport-potential