Intermittent unit test failures for incorrect notifier count

Bug #1412513 reported by John Griffith
48
This bug affects 8 people
Affects Status Importance Assigned to Milestone
Cinder
Fix Released
High
Gorka Eguileor
Nominated for Kilo by Eric Harney

Bug Description

Started seeing a couple of these over the weekend:
http://logs.openstack.org/97/148097/1/check/gate-cinder-python27/47f4c47/console.html#_2015-01-19_15_18_22_638

Yet to reproduce locally, but found a couple in the gates. Need to setup an ERQ and see how often we're actually hitting this.

Revision history for this message
John Griffith (john-griffith) wrote :

The notifier count is incorrect intermittently giving MismatchError
on assertEqual NotifierCount. This seems to hit either in the volume_create
    (giving a MismatchError 3 != 2) OR
in the snapshot_create
    (giving a MismatchError 5 != 4)

ERQ submitted here: https://review.openstack.org/#/c/148320/

Revision history for this message
John Griffith (john-griffith) wrote :

10 hits on this starting on Jan 16

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/148340

Revision history for this message
Eric Harney (eharney) wrote :

The above patch should help us determine what extra notification is queued when the failure described by John occurs.

Mike Perez (thingee)
Changed in cinder:
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/148340
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=8740e0630938a88cd03c514ffe873613801f04b9
Submitter: Jenkins
Branch: master

commit 8740e0630938a88cd03c514ffe873613801f04b9
Author: Eric Harney <email address hidden>
Date: Mon Jan 19 14:01:45 2015 -0500

    Make test_create_delete_snapshot more robust

    This patch does two things to improve this test, and
    help debug issues with it failing.

    - Add additional checks for contents of the first two
      expected notifications.
    - Assert the length of the notifications list after checking
      its contents so that if it contains unexpected items, we can
      see what they are. This should help with the current gate
      failures.

    Change-Id: Ib92247389fb3be6d65a658880fd47e634f1da8d1
    Related-Bug: #1412513

Revision history for this message
Jordan Pittier (jordan-pittier) wrote :

IMO, there's still a bug with this test_create_delete_snapshot unit test. I've hit it here :http://logs.openstack.org/17/152117/1/check/gate-cinder-python27/915a0af/console.html#_2015-02-02_13_58_46_759

The ERQ is not accurate anymore.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/163191

Revision history for this message
Eric Harney (eharney) wrote :
Revision history for this message
Eric Harney (eharney) wrote :

The test_create_delete_snapshot changes actually revealed something here. Looks like we are getting backup.createprogress notifications in a test which shouldn't deal with backup.

Maybe the backup tests are leaking something?

http://logs.openstack.org/85/147185/2/check/gate-cinder-python27/ceb7d84/console.html

2015-03-12 15:37:51.378 | cinder.tests.test_volume.VolumeTestCase.test_create_delete_snapshot
2015-03-12 15:37:51.378 | -------------------------------------------------------------------
2015-03-12 15:37:51.378 |
2015-03-12 15:37:51.378 | Captured traceback:
2015-03-12 15:37:51.378 | ~~~~~~~~~~~~~~~~~~~
2015-03-12 15:37:51.378 | Traceback (most recent call last):
2015-03-12 15:37:51.378 | File "cinder/tests/test_volume.py", line 2488, in test_create_delete_snapshot
2015-03-12 15:37:51.378 | self.assertEqual(msg['event_type'], 'snapshot.delete.end')
2015-03-12 15:37:51.378 | File "/home/jenkins/workspace/gate-cinder-python27/.tox/py27/local/lib/python2.7/site-packages/testtools/testcase.py", line 350, in assertEqual
2015-03-12 15:37:51.379 | self.assertThat(observed, matcher, message)
2015-03-12 15:37:51.379 | File "/home/jenkins/workspace/gate-cinder-python27/.tox/py27/local/lib/python2.7/site-packages/testtools/testcase.py", line 435, in assertThat
2015-03-12 15:37:51.379 | raise mismatch_error
2015-03-12 15:37:51.379 | testtools.matchers._impl.MismatchError: 'backup.createprogress' != 'snapshot.delete.end'

Changed in cinder:
assignee: nobody → Eric Harney (eharney)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/163915

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/163191
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=d1b03bf268303d05f1067239fd3f7e87498fe091
Submitter: Jenkins
Branch: master

commit d1b03bf268303d05f1067239fd3f7e87498fe091
Author: Eric Harney <email address hidden>
Date: Tue Mar 10 16:34:16 2015 -0400

    Tests: Harden fake_notifier asserts

    This is a continuation of
    8740e063 - Make test_create_delete_snapshot more robust

    The same issue cropped up for me in
    test_create_consistencygroup_from_src().

    Add some more specific asserts to try to reveal what
    is going wrong here. The idea is to assert more
    specific things (examining individual notifications)
    before more general asserts (count of notifications) when
    possible, to fail the test on the check that will provide
    the most information back.

    Related-Bug: #1412513

    Change-Id: I267bd5933247f69f220a7be871a63e4db9eb9a80

Revision history for this message
John Griffith (john-griffith) wrote :

Seeing this again lately on Master, seems to be around consistency group tests each time I see it and I've managed to hit it locally a couple times now.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by Eric Harney (<email address hidden>) on branch: master
Review: https://review.openstack.org/163915

Eric Harney (eharney)
tags: added: tests
Revision history for this message
Gorka Eguileor (gorka) wrote :

This is still happening: http://logs.openstack.org/09/161209/11/check/gate-cinder-python27/a2efc0d/console.html.gz

2015-05-21 06:16:04.705 | Captured traceback:
2015-05-21 06:16:04.705 | ~~~~~~~~~~~~~~~~~~~
2015-05-21 06:16:04.705 | Traceback (most recent call last):
2015-05-21 06:16:04.705 | File "/home/jenkins/workspace/gate-cinder-python27/.tox/py27/local/lib/python2.7/site-packages/mock.py", line 1201, in patched
2015-05-21 06:16:04.705 | return func(*args, **keywargs)
2015-05-21 06:16:04.705 | File "cinder/tests/unit/test_volume.py", line 472, in test_delete_driver_not_initialized
2015-05-21 06:16:04.706 | self.assertEqual(0, len(fake_notifier.NOTIFICATIONS))
2015-05-21 06:16:04.706 | File "/home/jenkins/workspace/gate-cinder-python27/.tox/py27/local/lib/python2.7/site-packages/testtools/testcase.py", line 350, in assertEqual
2015-05-21 06:16:04.706 | self.assertThat(observed, matcher, message)
2015-05-21 06:16:04.706 | File "/home/jenkins/workspace/gate-cinder-python27/.tox/py27/local/lib/python2.7/site-packages/testtools/testcase.py", line 435, in assertThat
2015-05-21 06:16:04.706 | raise mismatch_error
2015-05-21 06:16:04.706 | testtools.matchers._impl.MismatchError: 0 != 1

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/185635

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/185635
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=4bbe1c87e26150c62e1dc8a2a5b8979896c1ddd9
Submitter: Jenkins
Branch: master

commit 4bbe1c87e26150c62e1dc8a2a5b8979896c1ddd9
Author: Gorka Eguileor <email address hidden>
Date: Tue May 26 16:39:28 2015 +0200

    Display NOTIFICATIONS on assert failure

    From time to time we have itermitent errors regarding the
    fake_notifier.NOTIFICATIONS count.

    This patch does two things to try to find out which notifications are
    being fired unexpectedly:
    - Make sure that fake_notifier is reset in cleanup regardless of
      exception raised by shutil.rmtree
    - Display fake_notifier.NOTIFICATIONS whenever count assertion fails.

    Related-Bug: #1412513
    Change-Id: I07370636d8b30098dcb2a70ef1985c2f5aee81be

Revision history for this message
Eric Harney (eharney) wrote :
Download full text (7.0 KiB)

This failure is still being hit:

http://logs.openstack.org/65/177665/7/check/gate-cinder-python27/ab09abb/console.html

2015-06-02 18:26:38.502 | cinder.tests.unit.test_volume.VolumeTestCase.test_update_consistencygroup
2015-06-02 18:26:38.502 | -------------------------------------------------------------------------
2015-06-02 18:26:38.502 |
2015-06-02 18:26:38.502 | Captured traceback:
2015-06-02 18:26:38.502 | ~~~~~~~~~~~~~~~~~~~
2015-06-02 18:26:38.503 | Traceback (most recent call last):
2015-06-02 18:26:38.503 | File "/home/jenkins/workspace/gate-cinder-python27/.tox/py27/local/lib/python2.7/site-packages/mock.py", line 1201, in patched
2015-06-02 18:26:38.503 | return func(*args, **keywargs)
2015-06-02 18:26:38.503 | File "cinder/tests/unit/test_volume.py", line 4607, in test_update_consistencygroup
2015-06-02 18:26:38.503 | fake_notifier.NOTIFICATIONS)
2015-06-02 18:26:38.503 | File "/home/jenkins/workspace/gate-cinder-python27/.tox/py27/local/lib/python2.7/site-packages/testtools/testcase.py", line 350, in assertEqual
2015-06-02 18:26:38.504 | self.assertThat(observed, matcher, message)
2015-06-02 18:26:38.504 | File "/home/jenkins/workspace/gate-cinder-python27/.tox/py27/local/lib/python2.7/site-packages/testtools/testcase.py", line 435, in assertThat
2015-06-02 18:26:38.504 | raise mismatch_error
2015-06-02 18:26:38.543 | testtools.matchers._impl.MismatchError: 10 != 11: [{'priority': 'INFO', 'publisher_id': 'consistencygroup.bare-trusty-1433254444', 'payload': {'status': u'available', 'user_id': u'fake', 'name': u'test_cg', 'availability_zone': u'nova', 'tenant_id': u'fake', 'created_at': '2015-06-02T19:27:08.108679', 'consistencygroup_id': u'e9f5dbad-daf2-43e2-b633-d3f10167189b'}, 'event_type': 'consistencygroup.create.start'}, {'priority': 'INFO', 'publisher_id': 'consistencygroup.bare-trusty-1433254444', 'payload': {'status': 'available', 'user_id': u'fake', 'name': u'test_cg', 'availability_zone': u'nova', 'tenant_id': u'fake', 'created_at': '2015-06-02T19:27:08.108679', 'consistencygroup_id': u'e9f5dbad-daf2-43e2-b633-d3f10167189b'}, 'event_type': 'consistencygroup.create.end'}, {'priority': 'INFO', 'publisher_id': 'backup.bare-trusty-1433254444', 'payload': {'status': None, 'backup_id': u'123', 'user_id': None, 'service': None, 'availability_zone': None, 'tenant_id': None, 'created_at': '2015-06-02 18:22:02.592221', 'list': [], 'host': None, 'backup_percent': 0, 'display_name': None, 'prefix': u'volume_1234-5678-1234-8888/20150602182202/az_nova_backup_123', 'volume_meta': None, 'volume_id': u'1234-5678-1234-8888', 'fail_reason': None, 'service_metadata': u'volume_1234-5678-1234-8888/20150602182202/az_nova_backup_123', 'id': 1, 'size': 1}, 'event_type': 'backup.createprogress'}, {'priority': 'INFO', 'publisher_id': u'volume.bare-trusty-1433254444', 'payload': {'status': u'creating', 'display_name': u'test_volume', 'availability_zone': u'fake_az', 'tenant_id': u'fake', 'created_at': '2015-06-02T19:27:08.108679', 'volume_id': u'c795d5e3-284e-4f1c-a26f-ef1d22c64c90', 'volume_type': None, 'host': u'bare-trusty-1433254444', 'replication_driver_data': None, 'replic...

Read more...

tags: added: gate-failure
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/187766

Revision history for this message
Eric Harney (eharney) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/187766
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=e5f6ed69e0031900f97da185b5363ebd3dd60c0c
Submitter: Jenkins
Branch: master

commit e5f6ed69e0031900f97da185b5363ebd3dd60c0c
Author: Eric Harney <email address hidden>
Date: Tue Jun 2 16:26:58 2015 -0400

    Disable backup progress notifications for unit tests

    We have code somewhere which is firing backup createprogress
    notifications into the notifier, which causes random unit
    test failures.

    Until we determine exactly why this happening, attempt to
    avoid this problem in the gate by disabling these notifications.

    Related-Bug: #1412513

    Change-Id: I50b7946562e757ee26cb68b17b4a5ffcd5f3fabd

Matt Riedemann (mriedem)
Changed in cinder:
importance: Undecided → High
Revision history for this message
Eric Harney (eharney) wrote :

I was able to hit this only running a small subset of Cinder tests, which may narrow things down a bit:

$ tox -epy27 -- --concurrency=25 --regex "(cinder.tests.unit.test_volume|cinder.tests.unit.test_backup)"
...
    testtools.matchers._impl.MismatchError: 'consistencygroup.create.start' != 'volume.create.end'

Revision history for this message
Eric Harney (eharney) wrote :

Also hit this only running 38 tests with this command, instead of the above which is 400+ :

$ tox -epy27 -- --concurrency=25 --regex "(test_backup.BackupTestCase|VolumeTestCase.test_create_delete_volume|VolumeTestCase.test_create_consistencygroup_from_src)"

testtools.matchers._impl.MismatchError: 0 != 1
   in test_create_delete_volume

Changed in cinder:
status: Confirmed → In Progress
Revision history for this message
Eric Harney (eharney) wrote :
Changed in cinder:
assignee: Eric Harney (eharney) → Gorka Eguileor (gorka)
Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Revision history for this message
Tom Barron (tpb) wrote :
Revision history for this message
Eric Harney (eharney) wrote :

Hit stable/kilo gate here: https://review.openstack.org/#/c/217197/

Thierry Carrez (ttx)
Changed in cinder:
milestone: liberty-1 → 7.0.0
Revision history for this message
Goutham Pacha Ravi (gouthamr) wrote :
Download full text (4.1 KiB)

And this problem continues:

14:27:19 ==============================
14:27:19 Failed 1 tests - output below:
14:27:19 ==============================
14:27:19
14:27:19 cinder.tests.unit.test_volume.VolumeMigrationTestCase.test_retype_volume_with_type_only
14:27:19 ---------------------------------------------------------------------------------------
14:27:19
14:27:19 Captured traceback:
14:27:19 ~~~~~~~~~~~~~~~~~~~
14:27:19 Traceback (most recent call last):
14:27:19 File "cinder/tests/unit/test_volume.py", line 5302, in test_retype_volume_with_type_only
14:27:19 self._retype_volume_exec(True, reserve_vol_type_only=True)
14:27:19 File "cinder/tests/unit/test_volume.py", line 5250, in _retype_volume_exec
14:27:19 (self.notifier.notifications))
14:27:19 File "/home/jenkins/workspace/check-cinder-tox/.tox/py27/local/lib/python2.7/site-packages/testtools/testcase.py", line 411, in assertEqual
14:27:19 self.assertThat(observed, matcher, message)
14:27:19 File "/home/jenkins/workspace/check-cinder-tox/.tox/py27/local/lib/python2.7/site-packages/testtools/testcase.py", line 498, in assertThat
14:27:19 raise mismatch_error
14:27:19 testtools.matchers._impl.MismatchError: 1 != 2: Notifier count incorrect [{'priority': 'INFO', 'publisher_id': 'backup.slave-59824-bare-trusty-rax-iad', 'payload': {'status': None, 'user_id': u'c853ca26-e8ea-4797-8a52-ee124a013d0e', 'backup_id': '707844eb-6d8a-4ac1-8b98-618e1c0b3a3a', 'fail_reason': None, 'service': None, 'availability_zone': None, 'tenant_id': u'89afd400-b646-4bbc-b12b-c0a4d63e5bd3', 'created_at': '2016-06-14 14:20:47.566865+00:00', 'volume_id': 'c09b1ad4-5f0e-4d3f-8b9e-0000004caec8', 'list': [], 'prefix': 'volume_c09b1ad4-5f0e-4d3f-8b9e-0000004caec8/20160614142047/az_nova_backup_707844eb-6d8a-4ac1-8b98-618e1c0b3a3a', 'backup_percent': 0, 'parent_id': None, 'host': None, 'volume_meta': None, 'snapshot_id': None, 'display_name': None, 'num_dependent_backups': 0, 'service_metadata': u'volume_c09b1ad4-5f0e-4d3f-8b9e-0000004caec8/20160614142047/az_nova_backup_707844eb-6d8a-4ac1-8b98-618e1c0b3a3a', 'id': 1, 'size': 1}, 'event_type': 'backup.createprogress'}, {'priority': 'INFO', 'publisher_id': 'volume.slave-59824-bare-trusty-rax-iad', 'payload': {'status': u'available', 'display_name': u'test_volume', 'volume_attachment': [], 'availability_zone': u'fake_az', 'tenant_id': u'7f265bd4-3a85-465e-a899-5dc4854a86d3', 'created_at': '2016-06-14T14:24:47.561090+00:00', 'volume_id': '937a7a68-6127-4c59-a8c3-841512cf3650', 'volume_type': u'15a0548b-0a38-4602-8c73-7cb6b91cb84d', 'host': u'slave-59824-bare-trusty-rax-iad', 'replication_driver_data': None, 'replication_status': u'disabled', 'snapshot_id': None, 'replication_extended_status': None, 'user_id': u'c853ca26-e8ea-4797-8a52-ee124a013d0e', 'metadata': [], 'launched_at': '2016-06-14T14:24:47.788337', 'size': 1}, 'event_type': 'volume.retype'}]
14:27:19
14:27:19
14:27:19 Captured pythonlogging:
14:27:19 ~~~~~~~~~~~~~~~~~~~~~~~
14:27:19 2016-06-14 14:24:47,450 INFO [cinder.rpc] Automatically selected cinder-scheduler objects version 1.4 as minimum service version.
14:27:19 2016-06-14 14:2...

Read more...

Revision history for this message
Eric Harney (eharney) wrote :

https://review.openstack.org/#/c/317055/ should clean up the current issues.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.