Uploading and downloading VHDs via Glance XenAPI plugin doesn't always retry when it should
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Compute (nova) | Fix Released | Undecided | Jesse J. Cook | |
Bug Description
Encountered a situation where one glance node could not talk to the registry, which resulted in a high number of upload_vhd errors. The Glance XenAPI plugin doesn't properly differentiate between errors that are permanent for a single server and errors that are permanent globally, so it gives up on a server error instead of trying another node. Not retrying is only reasonable behavior when there is a single glance node. When there are many glance nodes, retrying on a different server is preferable.
Ideally:
Retry until:
1. A non-retryable error is encountered (e.g. 403)
2. Max retries is reached
3. No servers left to retry (i.e. every server was dropped from the retry list due to a permanent error)
If the glance nodes sit behind a load balancer (proxy), this approach could result in the LB being treated as a single glance endpoint, so server errors would not be retried. The alternative, retrying on server errors without dropping the failing servers from the list, could result in unnecessary retries, especially when there is only a single glance node.
Additionally, if multiple errors are encountered, only the last error is logged as an instance error. Every error should be recorded.
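The retry policy and error recording described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the exception classes, `upload_with_retries`, and the `upload` callable are all hypothetical names, and it assumes the caller has already mapped HTTP responses onto the three error classes (for example 403 as globally permanent, 500 as permanent for that server).

```python
from collections import deque


class RetryableError(Exception):
    """Ephemeral failure: worth retrying, possibly on another server."""


class ServerPermanentError(Exception):
    """Failure tied to one server (e.g. HTTP 500): drop it, try another."""


class GloballyPermanentError(Exception):
    """Failure no server can resolve (e.g. HTTP 403): stop retrying."""


class UploadFailed(Exception):
    """Raised when all attempts fail; carries every error encountered."""

    def __init__(self, errors):
        super().__init__("upload failed after %d error(s)" % len(errors))
        self.errors = errors


def upload_with_retries(servers, upload, max_retries=3):
    """Call upload(server) until it succeeds or a stop condition is met.

    Returns (server_used, errors) on success, where errors lists every
    (server, exception) pair seen along the way. Raises UploadFailed,
    carrying the same kind of list, otherwise.
    """
    queue = deque(servers)
    errors = []
    attempts = 0
    # Stop condition 2 (max retries) and 3 (no servers left).
    while queue and attempts <= max_retries:
        server = queue[0]
        attempts += 1
        try:
            upload(server)
        except GloballyPermanentError as exc:
            errors.append((server, exc))
            break  # Stop condition 1: non-retryable error.
        except ServerPermanentError as exc:
            errors.append((server, exc))
            queue.popleft()  # Drop this server from the retry list.
        except RetryableError as exc:
            errors.append((server, exc))
            queue.rotate(-1)  # Keep the server, but try a different one next.
        else:
            return server, errors
    raise UploadFailed(errors)
```

With a single server in the list, a server-permanent error empties the queue immediately, so there are no wasted retries. With a load balancer as the only entry, the same thing happens, which is the caveat noted above.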
Examples:
Current:
* The plugin tries to upload using 1 of n glance nodes (n > 1)
* An ephemeral (retryable) error is encountered
* The plugin retries using a different glance node
* An error related to a server fault (e.g. 500) is encountered
* The plugin does not retry
* Instance fault
Expected:
* The plugin tries to upload using 1 of n glance nodes (n > 1)
* An ephemeral (retryable) error is encountered
* Instance fault
* The plugin retries using a different glance node
* An error related to a server fault (e.g. 500) is encountered
* The plugin retries using a different glance node
* Success
Changed in nova:
assignee: nobody → Jesse J. Cook (jesse-j-cook)
status: New → In Progress
Changed in nova:
milestone: none → kilo-1
status: Fix Committed → Fix Released
Changed in nova:
milestone: kilo-1 → 2015.1.0
Fix proposed to branch: master
Review: https://review.openstack.org/128090