[SRU] (libvirt) KeyError updating resources for some node, guest.uuid is not in BDM list
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | Dan Smith |
Mitaka | Won't Fix | Undecided | Edward Hope-Morley |
Newton | Fix Committed | Medium | Lee Yarwood |
Ubuntu Cloud Archive | Fix Released | Undecided | Unassigned |
Mitaka | Fix Released | Medium | Edward Hope-Morley |
Newton | Fix Released | Medium | Unassigned |
nova (Ubuntu) | Fix Released | Medium | Unassigned |
Xenial | Fix Released | Medium | Edward Hope-Morley |
Bug Description
[Impact]
There is a race condition in which the compute resource_tracker periodic task polls existing instances and checks their BDMs (block device mappings) before any mappings have been created, e.g. the root disk mapping for a new instance. This patch ensures that instances without any BDMs are skipped.
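The race and the guard can be sketched in a few lines of Python. This is an illustration only, not nova's actual code: `sum_over_committed`, `size_from_bdms`, and the dictionary shapes are hypothetical, with `bdms` standing in for the per-instance mappings keyed by instance UUID.

```python
def sum_over_committed(guest_uuids, local_instances, bdms, instance_size):
    """Sum per-instance over-committed disk size, skipping unmapped guests.

    All names here are hypothetical stand-ins used for illustration.
    """
    total = 0
    for uuid in guest_uuids:
        # A guest can be visible to the hypervisor before its mappings
        # exist; indexing bdms[uuid] directly would raise KeyError.
        if uuid not in local_instances or uuid not in bdms:
            continue
        total += instance_size(local_instances[uuid], bdms[uuid])
    return total


def size_from_bdms(instance, mappings):
    # Hypothetical size function: add up the sizes of the mapped disks.
    return sum(m["size_gb"] for m in mappings)


local_instances = {"a": {"name": "vm-a"}, "b": {"name": "vm-b"}}
bdms = {"a": [{"size_gb": 10}]}  # "b" has no mappings yet

print(sum_over_committed(["a", "b"], local_instances, bdms, size_from_bdms))
```

Without the membership check, the lookup for guest "b" would fail in the same way as the `bdms[guest.uuid]` lookup in the traceback reported in this bug.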
[Test Case]
* deploy OpenStack Mitaka with debug logging enabled (not essential but helps)
* create an instance
* delete its BDMs - pastebin.
* watch /var/log/
* ensure that the exception mentioned in this LP does not occur (it happens after "Auditing locally available compute resources for node")
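The last step can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the two message strings quoted in this report appear verbatim in the compute log; the function name and the sample lines are hypothetical, and the log path is left to the caller because it is not given here.

```python
AUDIT_MARKER = "Auditing locally available compute resources for node"
ERROR_MARKER = "Error updating resources for node"


def audits_followed_by_error(lines):
    """Count audit passes that are followed by the resource-update error."""
    failures = 0
    in_audit = False
    for line in lines:
        if AUDIT_MARKER in line:
            # A new periodic audit pass has started.
            in_audit = True
        elif ERROR_MARKER in line and in_audit:
            # The audit that was in progress ended in the error.
            failures += 1
            in_audit = False
    return failures


# Made-up sample lines, shortened for readability.
sample = [
    "INFO Auditing locally available compute resources for node node1",
    "ERROR Error updating resources for node node1.",
    "INFO Auditing locally available compute resources for node node1",
    "INFO some unrelated message",
]
print(audits_followed_by_error(sample))
```

Any iterable of lines works as input, including an open file handle for the compute log, so the same function can be run before and after applying the patch; a count of zero after the patch is the expected result.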
[Regression Potential]
The resource tracker information is used by the scheduler when deciding which compute hosts are able to have instances scheduled to them. In this case the resource tracker would be skipping instances that would contribute to disk overcommit ratios. As such, it is possible that the scheduler will have momentarily skewed information about resource consumption on that compute host until the next resource_tracker tick. Since the likelihood of this race condition occurring is hopefully slim, and provided that users have a reasonable frequency for the resource_tracker, the likelihood of this becoming a long-term problem is low, since the issue will always be corrected by a subsequent tick (although if the compute host in question were saturated, that would not be fixed until an instance was deleted or migrated).
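The temporary skew can be made concrete with a toy model of two consecutive ticks. This is only a sketch: `reported_over_commit_gb`, the instance names, and the sizes are invented for illustration and do not come from nova.

```python
def reported_over_commit_gb(instance_sizes_gb, mapped_uuids):
    """Over-committed disk as one tick would report it (hypothetical model).

    Instances whose mappings are not yet visible are skipped.
    """
    return sum(size for uuid, size in instance_sizes_gb.items()
               if uuid in mapped_uuids)


sizes = {"old-vm": 20, "new-vm": 15}  # illustrative figures only

# Tick 1: the new instance has no mappings yet, so it is skipped.
tick_1 = reported_over_commit_gb(sizes, mapped_uuids={"old-vm"})
# Tick 2: the mappings now exist, so the figure is corrected.
tick_2 = reported_over_commit_gb(sizes, mapped_uuids={"old-vm", "new-vm"})

print(tick_1, tick_2)
```

Between the two ticks the scheduler in this model would see 15 GB less over-commit than is really there, which is the window in which an extra instance could be placed on the host.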
[Other]
Note that this patch did not make it into the upstream stable/mitaka branch due to the stable cutoff, so the proposal is to carry it in the archive (indefinitely).
--------
2016-07-12 09:54:36.021 10056 ERROR nova.compute.
2016-07-12 09:54:36.021 10056 ERROR nova.compute.
2016-07-12 09:54:36.021 10056 ERROR nova.compute.
2016-07-12 09:54:36.021 10056 ERROR nova.compute.
2016-07-12 09:54:36.021 10056 ERROR nova.compute.
2016-07-12 09:54:36.021 10056 ERROR nova.compute.
2016-07-12 09:54:36.021 10056 ERROR nova.compute.
2016-07-12 09:54:36.021 10056 ERROR nova.compute.
2016-07-12 09:54:36.021 10056 ERROR nova.compute.
2016-07-12 09:54:36.021 10056 ERROR nova.compute.
2016-07-12 09:54:36.021 10056 ERROR nova.compute.
2016-07-12 09:54:36.021 10056 ERROR nova.compute.
Changed in fuel-plugin-contrail: | |
assignee: | nobody → shiliang (shiliang) |
affects: | fuel-plugin-contrail → nova |
Changed in nova: | |
status: | New → In Progress |
Changed in nova: | |
assignee: | shiliang (shiliang) → Dan Smith (danms) |
tags: | added: sts |
Changed in cloud-archive: | |
status: | New → Fix Released |
tags: | added: canonical-bootstack |
no longer affects: | ubuntu |
no longer affects: | Ubuntu Xenial |
tags: | added: sts-sru |
tags: | added: sts-sru-needed removed: sts-sru |
Changed in nova (Ubuntu Xenial): | |
assignee: | nobody → Edward Hope-Morley (hopem) |
summary: |
- (libvirt) KeyError updating resources for some node, guest.uuid is not in BDM list
+ [SRU] (libvirt) KeyError updating resources for some node, guest.uuid is not in BDM list
description: | updated |
tags: | added: sts-sponsor |
description: | updated |
tags: | removed: sts-sponsor |
Changed in nova (Ubuntu): | |
importance: | Undecided → Medium |
Changed in nova (Ubuntu Xenial): | |
importance: | Undecided → Medium |
tags: | added: verification-mitaka-done removed: verification-mitaka-needed |
tags: | added: sts-sru-done removed: sts-sru-needed |
I confirm this case

2016-07-12 12:34:33.724 3955 INFO nova.compute.resource_tracker [req-11cba8bf-6613-4d41-8e1d-8bf310942ced - - - - -] Auditing locally available compute resources for node node1.parking.cloud
2016-07-12 12:34:33.807 3955 ERROR nova.compute.manager [req-11cba8bf-6613-4d41-8e1d-8bf310942ced - - - - -] Error updating resources for node node1.parking.cloud.
2016-07-12 12:34:33.807 3955 ERROR nova.compute.manager Traceback (most recent call last):
2016-07-12 12:34:33.807 3955 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 6452, in update_available_resource
2016-07-12 12:34:33.807 3955 ERROR nova.compute.manager     rt.update_available_resource(context)
2016-07-12 12:34:33.807 3955 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 500, in update_available_resource
2016-07-12 12:34:33.807 3955 ERROR nova.compute.manager     resources = self.driver.get_available_resource(self.nodename)
2016-07-12 12:34:33.807 3955 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5376, in get_available_resource
2016-07-12 12:34:33.807 3955 ERROR nova.compute.manager     disk_over_committed = self._get_disk_over_committed_size_total()
2016-07-12 12:34:33.807 3955 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 7054, in _get_disk_over_committed_size_total
2016-07-12 12:34:33.807 3955 ERROR nova.compute.manager     local_instances[guest.uuid], bdms[guest.uuid])
2016-07-12 12:34:33.807 3955 ERROR nova.compute.manager KeyError: 'c2d1e02b-2e71-44c9-8d6b-4adb6be0a34f'