mongodb functional tests have race condition due to fixed sleep time

Bug #1518468 reported by Ryan Beisner
This bug affects 3 people
Affects: MongoDB Charm
Status: Fix Released
Importance: Undecided
Assigned to: Mario Splivalo
Milestone: (none listed)

Bug Description

The mongodb amulet tests have been variably passing/failing in automation since I began observing them not long ago. This is due to the arbitrary one-time sleep period, apparently intended to allow enough time for the charm and its deployed services to quiesce.

A sleep is not reliable, as the wait time is not always long enough for the charm and replica to settle before tests commence. The amount of time necessary to wait will vary, depending on test rig load, internet speed, and other factors, and is not reliably predictable.

Increasing the sleep duration isn't an ideal solution: it only defers the potential for a race rather than removing it, and it leaves test rigs idle and blocking unnecessarily when they are not under increased load.

Checking that the underlying service is ready, in a polling interval loop with a timeout of several minutes, before testing, would be ideal.
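A minimal sketch of such a polling loop, in Python. The names `wait_until` and `port_is_open` are hypothetical helpers written for illustration; they are not part of Amulet or of the charm's test suite. The TCP check is only one possible readiness probe and is weaker than asking mongod for its replica set state.

```python
import socket
import time


def wait_until(is_ready, timeout=300.0, interval=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met in time, False on timeout.
    `clock` and `sleep` are injectable so the loop can be unit tested.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def port_is_open(host, port, connect_timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False
```

A test could then call something like `wait_until(lambda: port_is_open(unit_address, 27017))` before its first assertion, where `unit_address` is a placeholder for the unit's public address and 27017 is MongoDB's default port. The test proceeds as soon as the service answers, and fails with a clear timeout otherwise.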

An alternative or complementary fix is to implement extended status, so that the charm advertises when it is settled and ready, and to run the tests only once that condition is met.

Tags: amulet uosci


Ryan Beisner (1chb1n)
description: updated
summary: - mongodb functional tests are racey due to fixed sleep time
+ mongodb functional tests have race condition due to fixed sleep time
Ryan Beisner (1chb1n) wrote :

FYI, here are two recent proposals whose automated tests fail in various race windows. In a handful of spot checks of those results, the failures are caused by tests jumping the gun.

https://code.launchpad.net/~evarlast/charms/trusty/mongodb/fix-dump-actions/+merge/277191

https://code.launchpad.net/~tvansteenburgh/charms/trusty/mongodb/use-charm-benchmark-lib/+merge/278044

Matt Bruzek (mbruzek) wrote :

Ryan, I merged 277191 because the change replaced some arbitrary sleep() calls with Amulet wait() methods. There is still room for improvement in these functional tests. I highly encourage anyone who knows the mongodb service well to contribute tests that replace all sleep() calls with reliable ways to determine the correct time to run tests, eliminating the race condition: for example, running a command on the unit to tell when it is ready, or using mongo libraries to determine the ready state.

Mario Splivalo (mariosplivalo) wrote :

I've reviewed the amulet failures for the first merge proposal that Ryan Beisner linked in comment #1 (the second URL is no longer available). The failure is in 03_deploy_replicaset.py: sometimes, when the replica set is being formed, the mongodb/0 unit is started after mongodb/1, so you end up with two PRIMARYs in the replica set.
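As a rough illustration of how a test could detect this condition, the sketch below takes the `replSetGetStatus` document reported by each unit and lists the units that consider themselves PRIMARY. The helper name `units_claiming_primary` and the sample data are made up for this example; the only MongoDB detail relied on is that `myState` is 1 for a PRIMARY member. How the status documents are collected from the units is left out.

```python
PRIMARY_STATE = 1  # replica set member state code for PRIMARY


def units_claiming_primary(status_by_unit):
    """Return the sorted names of units whose own state is PRIMARY.

    `status_by_unit` maps a unit name to the replSetGetStatus
    document obtained from that unit.
    """
    return sorted(
        unit
        for unit, status in status_by_unit.items()
        if status.get("myState") == PRIMARY_STATE
    )


# Made-up sample data resembling the failure described above.
sample = {
    "mongodb/0": {"myState": 1},
    "mongodb/1": {"myState": 1},
    "mongodb/2": {"myState": 2},
}
claimants = units_claiming_primary(sample)
```

A healthy deployment would yield exactly one name; more than one indicates that the replica set was initialized on multiple units.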

Because the code for building the replica set was written before Juju had leader election, there was no safe way to determine which unit should initialize the replica set. The unit with the lowest number therefore initializes it, which, as shown, sometimes fails.

I will fix this in the charm, and then amulet tests should not fail any more.

This is the reason for the failure of mongodb when used with landscape, as explained in this bug:
https://bugs.launchpad.net/charms/+source/mongodb/+bug/1467742

Changed in mongodb (Juju Charms Collection):
status: New → Confirmed
assignee: nobody → Mario Splivalo (mariosplivalo)
Changed in mongodb (Juju Charms Collection):
status: Confirmed → In Progress
affects: mongodb (Juju Charms Collection) → mongodb-charm
Mario Splivalo (mariosplivalo) wrote :

This is now fixed as mongodb charm code uses leader election when choosing a unit to initialize replicaset from: https://code.launchpad.net/~mariosplivalo/mongodb-charm/+git/mongodb-charm/+merge/340136
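For readers unfamiliar with the approach, here is a minimal sketch of gating replica set initialization on leadership. It is not the code from the merge proposal above. It assumes the Juju `is-leader` hook tool is available in the hook environment; `maybe_initiate_replicaset` and the `initiate` callback are hypothetical names used for illustration.

```python
import json
import subprocess


def is_leader(run=subprocess.check_output):
    """Ask Juju whether this unit is the application leader.

    `run` is injectable so the function can be tested outside a hook.
    """
    output = run(["is-leader", "--format=json"])
    return json.loads(output.decode("utf-8")) is True


def maybe_initiate_replicaset(initiate, run=subprocess.check_output):
    """Call initiate() only on the leader unit.

    Returns True if this unit performed the initialization.
    """
    if is_leader(run):
        initiate()
        return True
    return False
```

Because Juju guarantees at most one leader per application at a time, only one unit attempts the initialization regardless of the order in which units start.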

Changed in mongodb-charm:
status: In Progress → Fix Released