OpenStack Compute (Nova)

glance plugin should try multiple servers on failure

Reported by Johannes Erdfelt on 2012-03-01
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: Undecided
Assigned to: Johannes Erdfelt

Bug Description

The glance plugin for xenapi retries contacting a glance host a configurable number of times before giving up. Unfortunately, although multiple glance hosts can be configured in nova, nova picks only one host for each call to the glance plugin. If the picked host is not functioning correctly, the plugin never tries another host that might be up.

Nova should instead send the list of hosts to the plugin and let the plugin's retry logic try multiple hosts rather than just one.

Changed in nova:
assignee: nobody → Johannes Erdfelt (johannes.erdfelt)

I think such logic should live in the compute manager so that every driver can benefit from it (e.g. the manager tries a HEAD request or something like that). This would leave the driver unaware that multiple glance APIs are available. Once the manager has determined that there is a valid glance API to get images from or push snapshots to, it passes that API reference to the driver. The driver code hence remains fairly unchanged.
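For illustration only, the manager-side probe suggested above might look roughly like the sketch below. It is not code from nova: head_probe and pick_working_host are hypothetical names, and probing "/" with a HEAD request and treating any status below 500 as healthy are assumptions, not something this report specifies.

```python
import http.client


def head_probe(host, port, timeout=2.0):
    """Hypothetical probe: True if a HEAD request gets a non-5xx response."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # Assumption: "/" is a cheap path to probe on the glance API server.
        conn.request("HEAD", "/")
        return conn.getresponse().status < 500
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, malformed response, and so on.
        return False
    finally:
        conn.close()


def pick_working_host(hosts, probe=head_probe):
    """Return the first (host, port) pair that passes the probe, else None."""
    for host, port in hosts:
        if probe(host, port):
            return (host, port)
    return None
```

The manager would then hand the returned (host, port) pair to the driver. Note that a host can pass the probe and still fail during the actual transfer, so this would not replace retries entirely.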

That would require a fairly significant restructuring of the code since the compute manager is pretty ignorant about image services or how the drivers actually work. Implementing retries at the manager level would be pretty awkward right now.

I'm testing a change which moves the retry logic from the glance plugin to the xenapi driver. This will allow it to cycle through glance hosts to find one that is actually working.
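As a rough sketch of the idea (not the actual patch), the driver-side loop could rotate through the configured hosts and hand a different one to the plugin on each attempt. The names download_with_host_rotation and GlanceDownloadError, and the attempt callable standing in for the plugin call, are all hypothetical.

```python
import itertools
import time


class GlanceDownloadError(Exception):
    """Hypothetical error raised when a single download attempt fails."""


def download_with_host_rotation(hosts, attempt, max_attempts, delay=0.0):
    """Call attempt(host) with a different glance host on each try.

    hosts is a list of (host, port) pairs; attempt stands in for the
    call into the dom0 plugin and must raise GlanceDownloadError on failure.
    """
    if not hosts:
        raise ValueError("at least one glance host is required")
    host_cycle = itertools.cycle(hosts)  # wraps around when hosts run out
    last_error = None
    for attempt_number in range(1, max_attempts + 1):
        host = next(host_cycle)
        try:
            return attempt(host)
        except GlanceDownloadError as exc:
            last_error = exc
            if attempt_number < max_attempts:
                time.sleep(delay)  # optional pause before the next host
    raise GlanceDownloadError(
        "all %d attempts failed; last error: %s" % (max_attempts, last_error))
```

With hosts [("glance1", 9292), ("glance2", 9292)] and glance1 down, the first attempt fails and the second goes to glance2, instead of hitting glance1 again as the plugin-internal retry would.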

I'm not against a more significant restructuring, but as an intermediate step I think the patch I'll propose later today will be a good minimal step.

Makes sense... let's see what you've got :)

However, how about doing it in a way that makes it common code in the virt layer? At least then it could be reused by the libvirt and vmwareapi drivers.

The glance driver already handles retries correctly so it works fine for any use in nova-compute.

It's just broken in the glance xenapi plugin, since it has its own implementation for communicating with glance (because it runs on dom0 and not in nova-compute).

Fix proposed to branch: master
Review: https://review.openstack.org/4821

Changed in nova:
status: New → In Progress

Reviewed: https://review.openstack.org/4821
Committed: http://github.com/openstack/nova/commit/c4a2e17dcfbd7b6434a7dfae3c7a3e5f30a3fc87
Submitter: Jenkins
Branch: master

commit c4a2e17dcfbd7b6434a7dfae3c7a3e5f30a3fc87
Author: Johannes Erdfelt <email address hidden>
Date: Thu Mar 1 18:49:44 2012 +0000

    Retry download_vhd with different glance host each time

    Fixes bug 944096

    Change-Id: I33aa3774ba7f266e85f09c6c569fdd0f895478b4

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2012-03-20
Changed in nova:
milestone: none → essex-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2012-04-05
Changed in nova:
milestone: essex-rc1 → 2012.1