Ceilometer

ceilometer/tests/test_bin.py can fail on slow machine

Bug #1342765 reported by Chris Dent on 2014-07-16

This bug affects 1 person

Affects      Status      Importance   Assigned to   Milestone
Ceilometer   Won't Fix   Low          Unassigned

Bug Description

test_bin.py spawns a variety of subprocesses for the various "bins" (scripts) that ceilometer has. One subprocess runs an API server, and the test for it makes requests to /v2/meters to confirm a 200 response and an empty JSON list.

This test can sometimes fail:

self.assertEqual(200, response.status)
AttributeError: 'NoneType' object has no attribute 'status'

This happens because the HTTP request to the server gets "connection refused" more than 10 times: the server does not finish initializing before a for loop with a 0.5-second sleep between HTTP requests completes, so the response is still None when the assertion runs.

The lame fix for this is to increase either the timeout or the number of loops, but this doesn't strike me as robust. In fact I'd go so far as to say that any test which depends on a timeout (in a high-latency environment) is a pretty bad smell. A slightly less bad option may be to sleep a bit (in the parent) before entering the loop. Still pretty icky.
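For illustration, a minimal sketch of a deadline-based wait that at least fails with a clear result rather than an AttributeError on None. This is not the code in test_bin.py; the function name `wait_for_port` and its parameters are hypothetical, and it assumes Python 3 (`time.monotonic`). It is still a timeout, so the objection above still applies.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once the port accepts a connection, or False if the
    overall deadline passes first. Hypothetical helper, not part of
    ceilometer.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. connection refused)
            # while the server is not yet listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test could then assert on the return value before issuing any HTTP request, so a slow start shows up as "server never came up" instead of a confusing attribute error.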

So what's the other option? Not really clear. The test's goal is to test the viability of the ceilometer-api console script. Is that really necessary? Are we interested in that specifically, or are we more concerned that the WSGI interface works as designed? I don't know. Anyone?
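To make the WSGI alternative concrete, here is a rough sketch of exercising a WSGI application in-process, with no subprocess and no timeout. `hypothetical_app` is a stand-in written for this example, not ceilometer's actual application, and `call_wsgi` is likewise an assumed helper name; only `wsgiref.util.setup_testing_defaults` comes from the standard library.

```python
import json
from wsgiref.util import setup_testing_defaults


def hypothetical_app(environ, start_response):
    """Stand-in WSGI app: returns an empty JSON list for /v2/meters."""
    if environ["PATH_INFO"] == "/v2/meters":
        start_response("200 OK", [("Content-Type", "application/json")])
        return [b"[]"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def call_wsgi(app, path):
    """Invoke a WSGI app directly and return (status line, body bytes)."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": "GET"}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_wsgi(hypothetical_app, "/v2/meters")
assert status.startswith("200")
assert json.loads(body.decode("utf-8")) == []
```

This checks the same 200 and empty-list conditions as the current test, but it does not cover the console script itself, which is exactly the trade-off being asked about.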

Chris Dent (cdent) wrote on 2014-07-23:

In my specific case the slowness is caused by 0.0.0.0 resolving very slowly in a call to `socket.getfqdn()`. It can be resolved by making some changes to the local `/etc/hosts`.
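A quick way to check whether a machine has the same problem is to time the lookup directly. This is only a diagnostic sketch; `time_getfqdn` is a hypothetical helper name, and it assumes Python 3 for `time.monotonic`.

```python
import socket
import time


def time_getfqdn(name="0.0.0.0"):
    """Return (fqdn, seconds taken) for a socket.getfqdn() lookup."""
    start = time.monotonic()
    fqdn = socket.getfqdn(name)
    elapsed = time.monotonic() - start
    return fqdn, elapsed


if __name__ == "__main__":
    fqdn, elapsed = time_getfqdn()
    print("getfqdn returned %r in %.2f seconds" % (fqdn, elapsed))
```

If the reported time is on the order of seconds, name resolution rather than CPU speed is a likely cause of the slow server start.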

However the same point remains: The test is very fragile because of the way it handles timeouts and uses subprocesses. Is this wise?

gordon chung (chungg) wrote on 2014-08-29:

define slow machine? i'm marking as low priority since i assume this doesn't affect a machine with specs from last 5 years.

Changed in ceilometer:
importance:	Undecided → Low
status:	New → Triaged

Chris Dent (cdent) wrote on 2014-08-29:

I think we can probably just kill it. The slowness turned out to be mostly a DNS problem, but the secondary point -- using a loop around a subprocess -- still stands: it is a bit fragile for a test scenario.

ZhiQiang Fan (aji-zqfan) wrote on 2014-11-03:

Even my 5-year-old laptop can run this test...

gordon chung (chungg) on 2015-09-15

Changed in ceilometer:
status:	Triaged → Won't Fix
