Intermittent test failure caused by missing request

Bug #1682026 reported by Matthew Oliver
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Low
Assigned to: Matthew Oliver

Bug Description

There is an intermittent test failure that has been seen on the stable/newton branch. In testing it only happens every now and again: sometimes it takes 10 iterations to reproduce, other times 200.

  NEWTON

  ======================================================================
  FAIL: test_GET (test.unit.proxy.test_server.TestAccountController)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "/home/matt/swift/test/unit/proxy/test_server.py", line 7798, in test_GET
      do_test(*test)
    File "/home/matt/swift/test/unit/proxy/test_server.py", line 7777, in do_test
      self.assert_status_map(controller.GET, *args)
    File "/home/matt/swift/test/unit/proxy/test_server.py", line 7732, in assert_status_map
      self.assertEqual(res.status_int, expected)
  AssertionError: 404 != 503
      '404 != 503' = '%s != %s' % (safe_repr(404), safe_repr(503))
      '404 != 503' = self._formatMessage('404 != 503', '404 != 503')
  >> raise self.failureException('404 != 503')

  number of runs: 114

This feels like it shouldn't be limited to just newton so I checked on master:

  MASTER

  ======================================================================
  FAIL: test_GET (test.unit.proxy.test_server.TestAccountController)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "/home/matt/swift/test/unit/proxy/test_server.py", line 8036, in test_GET
      self.assert_status_map(controller.GET, (404, 503, 503), 503)
    File "/home/matt/swift/test/unit/proxy/test_server.py", line 7977, in assert_status_map
      self.assertEqual(res.status_int, expected)
  AssertionError: 404 != 503
      '404 != 503' = '%s != %s' % (safe_repr(404), safe_repr(503))
      '404 != 503' = self._formatMessage('404 != 503', '404 != 503')
  >> raise self.failureException('404 != 503')

  number of runs: 135

It turns out it isn't: it happens on master too. While debugging, I see that during a failure we lose a request:

  0.1s fake_http_connect: code_iter=(404, 503, 503)
  0.1s connect: status=404
  0.1s connect: status=503
  0.1s connect: status=503
  0.1s fake_http_connect: code_iter=(404, 503, 503)
  0.1s connect: status=404
  0.1s connect: status=503

   ^-- Losing a request somewhere.

Matthew Oliver (matt-0) wrote :

After much more debugging, it comes down to the proxy server error-limiting a node; the proxy's node_iter then fails to yield this node. As the tests use a default FakeRing with no extra handoffs, we only get 2 requests, which is not enough for quorum.

I've managed to fix it by clearing the error_limited dict on each call, but a better way would be either to add handoff nodes or to reduce the error_limited suppression time to 0. Initial patch incoming.

This should fix the above test failure, but I want to see whether the container and object controllers suffer from the same problem and, if so, make a single patch to correct them all. Stay tuned.
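
The mechanism described above can be illustrated with a small, self-contained sketch. This is a toy model, not Swift's code: `ToyErrorLimiter`, `iter_usable_nodes`, `requests_made`, the node names, and the error threshold of 10 are all hypothetical names and values chosen for illustration. It only shows why an error-limited node plus a ring without handoffs yields fewer requests than replicas, and why a suppression interval of 0 or an extra handoff node avoids that.

```python
class ToyErrorLimiter:
    """Toy stand-in for per-node error limiting (not Swift's implementation)."""

    def __init__(self, suppression_interval, suppression_limit=10):
        self.suppression_interval = suppression_interval
        self.suppression_limit = suppression_limit  # assumed threshold
        self._stats = {}  # node id -> (error count, time of last error)

    def record_errors(self, node_id, count, now):
        errors, _ = self._stats.get(node_id, (0, now))
        self._stats[node_id] = (errors + count, now)

    def is_limited(self, node_id, now):
        if node_id not in self._stats:
            return False
        errors, last_error = self._stats[node_id]
        # Once the suppression interval has elapsed, forget the errors.
        # With an interval of 0 this is always the case.
        if now - last_error >= self.suppression_interval:
            del self._stats[node_id]
            return False
        return errors > self.suppression_limit


def iter_usable_nodes(primaries, handoffs, limiter, now):
    """Yield up to len(primaries) nodes, skipping error-limited ones."""
    wanted = len(primaries)
    yielded = 0
    for node_id in list(primaries) + list(handoffs):
        if yielded >= wanted:
            return
        if limiter.is_limited(node_id, now):
            continue  # this is where a request gets "lost"
        yielded += 1
        yield node_id


def requests_made(suppression_interval, handoffs=()):
    """Return the nodes contacted when node-b was recently error-limited."""
    limiter = ToyErrorLimiter(suppression_interval)
    limiter.record_errors("node-b", count=11, now=100.0)
    primaries = ["node-a", "node-b", "node-c"]
    return list(iter_usable_nodes(primaries, handoffs, limiter, now=100.5))


print(requests_made(60))                       # two nodes: node-b is skipped
print(requests_made(0))                        # all three primaries
print(requests_made(60, handoffs=["node-d"]))  # handoff replaces node-b
```

In this model, both of the proposed approaches restore the third request: a zero interval means the node is never treated as limited, while a handoff node gives the iterator a replacement to yield.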

Changed in swift:
status: New → Confirmed
importance: Undecided → Low
assignee: nobody → Matthew Oliver (matt-0)
Matthew Oliver (matt-0) wrote :

Initially testing the account test error with this, and it seems to be working.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/460364

Changed in swift:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/460364
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=a07f7dc8c0b98f76ea083145e991ed56f1cdb99a
Submitter: Jenkins
Branch: master

commit a07f7dc8c0b98f76ea083145e991ed56f1cdb99a
Author: Matthew Oliver <email address hidden>
Date: Thu Apr 27 01:03:29 2017 +0000

    Fix sporadic failure in TestAccountController unit test

    The proxy server has on occasion error limited a node by the time the
    test runs, causing the proxy's node_iter to fail to iter out this
    error limited node. As the test uses a default FakeRing with no
    extra handoffs, on this occasion we only get 2 requests which is not
    enough for quorum, causing it to return a 503.

    This patch sets the error_suppression_interval to 0 when creating
    the proxy server. Meaning a node effectively isn't error_limited.

    Change-Id: I96cf4c4d63594f803cc1cd57e874d1624db8e249
    Closes-Bug: #1682026

Changed in swift:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.15.0

This issue was fixed in the openstack/swift 2.15.0 release.
