ceilometer/tests/test_bin.py can fail on a slow machine

Bug #1342765 reported by Chris Dent
This bug affects 1 person
Affects      Status      Importance  Assigned to  Milestone
Ceilometer   Won't Fix   Low         Unassigned

Bug Description

test_bin.py spawns a variety of subprocesses for the various "bins" (scripts) that ceilometer has. One subprocess runs an API server, and its test makes requests to /v2/meters to confirm a 200 response and an empty JSON list.

This test can sometimes fail:

    self.assertEqual(200, response.status)
    AttributeError: 'NoneType' object has no attribute 'status'

This happens because the HTTP request to the server gets "connection refused" more than 10 times: the server does not finish initializing before the for loop, which sleeps 0.5 seconds between HTTP requests, runs out of attempts.
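To make the failure mode concrete, here is a minimal sketch of the kind of polling loop described above. It is not the actual test_bin.py code: `poll_until_up` and its `request` callable are hypothetical names, and only the 10 attempts and the 0.5 second sleep come from this report. The point is that when the server needs more attempts than the loop allows, the loop quietly yields None, and the later `response.status` is what blows up.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_up(
    request: Callable[[], T],
    attempts: int = 10,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Retry `request` while the server refuses connections (hypothetical helper).

    Fixed number of attempts, fixed sleep between them. If the server is
    still starting after the last attempt, this returns None instead of
    raising, so the caller only fails later on `response.status`.
    """
    for _ in range(attempts):
        try:
            return request()
        except ConnectionRefusedError:
            # Server not listening yet; wait and try again.
            sleep(delay)
    return None
```

With a server that only starts accepting connections on the 11th attempt, `poll_until_up` returns None and `None.status` raises exactly the AttributeError quoted above. Raising or bumping `attempts`/`delay` only moves the threshold, which is the fragility being complained about.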

The lame fix for this is to increase either the timeout or the number of loops, but this doesn't strike me as robust. In fact, I'd go so far as to say that any test contingent on a timeout is a pretty bad smell in a high-latency environment. A slightly less bad option may be to sleep a bit (in the parent) before entering the loop. Still pretty icky.

So what's the other option? Not really clear. The test's goal is to test the viability of the ceilometer-api console-script. Is that really necessary? Are we interested in that specifically, or are we more concerned that the WSGI interface works as designed? I don't know. Anyone?
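If the WSGI interface is what matters, one option is to call the application object in-process, with no subprocess, port, or sleep loop at all. The sketch below assumes that framing; `meters_app` is a hypothetical stand-in returning an empty JSON list for /v2/meters, not ceilometer's real API application (how to construct that is not covered here), and `call_wsgi` is likewise a hypothetical helper built on the standard library's `wsgiref.util.setup_testing_defaults`.

```python
import json
from wsgiref.util import setup_testing_defaults


def meters_app(environ, start_response):
    """Hypothetical stand-in for the API's WSGI application."""
    if environ["PATH_INFO"] == "/v2/meters":
        body = json.dumps([]).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def call_wsgi(app, path):
    """Call a WSGI app in-process and return (status_code, body_bytes)."""
    environ = {"PATH_INFO": path, "SCRIPT_NAME": "", "REQUEST_METHOD": "GET"}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}
    written = []

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return written.append  # the write() callable PEP 3333 requires

    chunks = app(environ, start_response)
    try:
        body = b"".join(chunks)  # iterate first: generator apps call start_response lazily
    finally:
        if hasattr(chunks, "close"):
            chunks.close()
    return int(captured["status"].split()[0]), b"".join(written) + body
```

For example, `call_wsgi(meters_app, "/v2/meters")` gives `(200, b"[]")`. The trade-off is that this no longer exercises the console-script entry point itself, only the application it would serve, but it removes the timing dependency entirely.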

Chris Dent (cdent) wrote :

In my specific case the slowness is caused by `0.0.0.0` resolving very slowly in a call to `socket.getfqdn()`. It can be fixed by making some changes to the local `/etc/hosts`.
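A quick way to check whether a machine suffers from the same slow lookup is to time the call directly. This is a minimal sketch: `time_lookup` is a hypothetical helper, its default resolver is the standard library's `socket.getfqdn`, and the assumption that the API server's startup time is dominated by this lookup comes from the comment above rather than from reading the server code.

```python
import socket
import time
from typing import Callable, Tuple


def time_lookup(
    name: str,
    resolver: Callable[[str], str] = socket.getfqdn,
) -> Tuple[str, float]:
    """Return (resolved_name, elapsed_seconds) for one lookup of `name`."""
    start = time.monotonic()
    result = resolver(name)
    elapsed = time.monotonic() - start
    return result, elapsed
```

Running `print(time_lookup("0.0.0.0"))` on an affected machine should show an elapsed time of seconds rather than milliseconds; anything approaching the 10 × 0.5 s budget of the test's retry loop would be enough to make it fail.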

However the same point remains: The test is very fragile because of the way it handles timeouts and uses subprocesses. Is this wise?

gordon chung (chungg) wrote :

define slow machine? i'm marking as low priority since i assume this doesn't affect a machine with specs from the last 5 years.

Changed in ceilometer:
importance: Undecided → Low
status: New → Triaged
Chris Dent (cdent) wrote :

I think we can probably just kill it. The slowness turned out to be mostly a DNS problem, but the secondary point stands: using a loop around a subprocess is a bit fragile for a test scenario.

ZhiQiang Fan (aji-zqfan) wrote :

Even my 5-year-old laptop can run this test...

gordon chung (chungg)
Changed in ceilometer:
status: Triaged → Won't Fix