Compute worker silently fails when XS host runs out of space

Bug #699878 reported by Rick Harris
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Sandy Walsh
Milestone: 2011.2

Bug Description

This bug only affects compute workers using the XenServer hypervisor.

If a XenServer host's Dom0 disk fills up, the compute worker's XenServerSession login call hangs. This prevents the compute worker service from starting, which in turn causes messages to accumulate on the worker's queue, so the failure wrongly appears to be a problem in the RPC code path (Rabbit, carrot, etc.).

Proposed fix:

We should be able to detect the login hanging, time it out, and then log an error message on the compute worker. Perhaps something like:

"Unable to login using XenServerSession (is the disk full?)"
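
A minimal sketch of the proposed behaviour, not the fix that was actually committed: it runs a blocking call in a daemon thread, gives up after a timeout, and logs the message suggested above. The names call_with_timeout and LoginTimeout are hypothetical, and the sketch assumes the hung login can simply be abandoned in its background thread.

```python
import logging
import threading

LOG = logging.getLogger(__name__)


class LoginTimeout(Exception):
    """Raised when the wrapped call does not return in time (hypothetical name)."""


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func(*args, **kwargs) in a daemon thread and wait up to `timeout` seconds.

    Hypothetical helper: a hung call is left behind in its daemon thread
    rather than being forcibly killed.
    """
    result = {}

    def _target():
        try:
            result["value"] = func(*args, **kwargs)
        except Exception as exc:
            # Remember the error so it can be re-raised in the calling thread.
            result["error"] = exc

    worker = threading.Thread(target=_target, daemon=True)
    worker.start()
    worker.join(timeout)

    if worker.is_alive():
        # The call is still blocked: report it instead of hanging the service.
        LOG.error("Unable to login using XenServerSession (is the disk full?)")
        raise LoginTimeout("login did not return within %s seconds" % timeout)
    if "error" in result:
        raise result["error"]
    return result["value"]
```

Assuming a session object that exposes login_with_password(), the call might look like call_with_timeout(session.login_with_password, 10, username, password). A daemon thread is used so that a login that never returns cannot keep the process alive at exit.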

Armando Migliaccio (armando-migliaccio) wrote :

Maybe this is related to bug #692994.

Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
Changed in nova:
assignee: nobody → Sandy Walsh (sandy-walsh)
Sandy Walsh (sandy-walsh) wrote :

Ok, I'm able to reproduce this now. Working on a fix.

Sandy Walsh (sandy-walsh) wrote :

Hmm, there's an even easier way to reproduce this problem: just stop the xapi service running on Dom0.

Sandy Walsh (sandy-walsh) wrote :

Sorry, just stopping the service gives different results (the socket simply can't connect). In low-disk conditions, login_with_password() freezes completely.

Josh Kearney (jk0)
Changed in nova:
status: Confirmed → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → 2011.2
status: Fix Committed → Fix Released