xen volume auto device selection always picks xvdb

Bug #1061944 reported by clayg on 2012-10-04
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Medium
clayg
Folsom
Medium
Vish Ishaya
nova (Ubuntu)
Undecided
Unassigned
Quantal
Undecided
Unassigned

Bug Description

on most of the xen deploy's I've seen, the 'root_device_name' attribute on the instance is None, and instance_block_mapping will short-circuit the return path with-out evaluating bdms.

This is the attach request, after do_reserve has already returned the selected "mountpoint" (it's not really a *mount* is it?).

2012-10-04 23:07:49 DEBUG nova.openstack.common.rpc.amqp [-] received {u'_context_roles': [u'admin'], u'_context_request_id': u'req-45592fb5-143b-4f40-a168-d1ff0c0c0aa4', u'_context_quota_class': None, u'_context_project_name': u'openStackCastleLab', u'_context_service_catalog': None, u'_context_user_name': u'clayg', u'_context_auth_token': '<SANITIZED>', u'args': {u'mountpoint': u'/dev/xvdb', u'instance': {u'vm_state': u'active', u'availability_zone': None, u'terminated_at': None, u'ephemeral_gb': 0, u'instance_type_id': 1, u'user_data': None, u'vm_mode': u'xen', u'deleted_at': None, u'reservation_id': u'r-pax0p2jn', u'id': 1, u'security_groups': [{u'deleted_at': None, u'user_id': u'ed4bd089a2f449dca0828a2c42dbfb77', u'name': u'default', u'deleted': False, u'created_at': u'2012-10-04T01:29:31.000000', u'updated_at': None, u'rules': [], u'project_id': u'3f4884d9c31d4393a11158e09b816a5b', u'id': 1, u'description': u'default'}], u'disable_terminate': False, u'user_id': u'ed4bd089a2f449dca0828a2c42dbfb77', u'uuid': u'6819fd4d-e1db-4cdf-86f3-3cb8594eb1ca', u'server_name': None, u'default_swap_device': None, u'info_cache': {u'instance_uuid': u'6819fd4d-e1db-4cdf-86f3-3cb8594eb1ca', u'deleted': False, u'created_at': u'2012-10-04T01:29:31.000000', u'updated_at': u'2012-10-04T01:29:34.000000', u'network_info': u'[{"network": {"bridge": "xenbr0", "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "10.127.0.130"}], "version": 4, "meta": {}, "dns": [{"meta": {}, "version": 4, "type": "dns", "address": "10.6.23.4"}, {"meta": {}, "version": 4, "type": "dns", "address": "10.6.23.5"}], "routes": [], "cidr": "10.127.0.0/24", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "10.127.0.1"}}, {"ips": [], "version": null, "meta": {}, "dns": [], "routes": [], "cidr": null, "gateway": {"meta": {}, "version": null, "type": "gateway", "address": null}}], "meta": {"tenant_id": null}, "id": "130460c1-8dd6-45d0-94bb-870777dc1f21", "label": "public"}, "meta": {}, "id": "0b8f347d-b301-439e-8b88-274a43f7cc2e", "address": "fa:16:3e:2a:89:f7"}]', u'deleted_at': None, u'id': 1}, u'hostname': u'test01', u'launched_on': u'localhost.localdomain', u'display_description': u'test01', u'key_data': None, u'kernel_id': u'', u'power_state': 1, u'default_ephemeral_device': None, u'progress': 100, u'project_id': u'3f4884d9c31d4393a11158e09b816a5b', u'launched_at': u'2012-10-04T01:34:19.000000', u'scheduled_at': u'2012-10-04T01:29:31.000000', u'ramdisk_id': u'', u'access_ip_v6': None, u'access_ip_v4': None, u'deleted': False, u'key_name': None, u'updated_at': u'2012-10-04T01:35:30.000000', u'host': u'localhost.localdomain', u'display_name': u'test01', u'task_state': None, u'shutdown_terminate': False, u'architecture': None, u'root_gb': 40, u'locked': False, u'name': u'instance-6819fd4d-e1db-4cdf-86f3-3cb8594eb1ca', u'created_at': u'2012-10-04T01:29:31.000000', u'launch_index': 0, u'metadata': [], u'memory_mb': 4096, u'instance_type': {u'disabled': False, u'root_gb': 40, u'deleted_at': None, u'name': u'm1.medium', u'deleted': False, u'created_at': None, u'ephemeral_gb': 0, u'updated_at': None, u'memory_mb': 4096, u'vcpus': 2, u'swap': 0, u'rxtx_factor': 1.0, u'is_public': True, u'flavorid': u'3', u'vcpu_weight': None, u'id': 1}, u'vcpus': 2, u'image_ref': u'6a86087a-2c71-4f50-90ae-1b884eb6bb22', u'root_device_name': None, u'auto_disk_config': None, u'os_type': None, u'config_drive': u''}, u'volume_id': u'bf5ddd27-6f8e-427b-b0a1-e66ad67c6bce'}, u'_context_instance_lock_checked': False, u'_context_is_admin': True, u'version': u'2.0', u'_context_project_id': u'3f4884d9c31d4393a11158e09b816a5b', u'_context_timestamp': u'2012-10-04T23:07:49.337942', u'_context_read_deleted': u'no', u'_context_user_id': u'ed4bd089a2f449dca0828a2c42dbfb77', u'method': u'attach_volume', u'_context_remote_address': u'10.6.61.108'} from (pid=4897) _safe_log /opt/nova/nova/openstack/common/rpc/common.py:195

Even though this instance already had a "auto device_name" volume attached, the do_reserve is returning/attach call is sending:

'mountpoint': u'/dev/xvdb'

And the problem seems to mostly be the instance has:

'root_device_name': None

Which unsurprisingly leads to a traceback:

2012-10-04 23:08:59 ERROR nova.virt.xenapi.volumeops [req-c6af01f8-1d42-4f93-ada0-4fc217c56bdc ed4bd089a2f449dca0828a2c42dbfb77 3f4884d9c31d4393a11158e09b816a5b] ['DEVICE_ALREADY_EXISTS', '1']
2012-10-04 23:08:59 TRACE nova.virt.xenapi.volumeops Traceback (most recent call last):
2012-10-04 23:08:59 TRACE nova.virt.xenapi.volumeops File "/opt/nova/nova/virt/xenapi/volumeops.py", line 178, in attach_volume
2012-10-04 23:08:59 TRACE nova.virt.xenapi.volumeops dev_number, bootable=False)
2012-10-04 23:08:59 TRACE nova.virt.xenapi.volumeops File "/opt/nova/nova/virt/xenapi/vm_utils.py", line 332, in create_vbd
2012-10-04 23:08:59 TRACE nova.virt.xenapi.volumeops vbd_ref = session.call_xenapi('VBD.create', vbd_rec)
2012-10-04 23:08:59 TRACE nova.virt.xenapi.volumeops File "/opt/nova/nova/virt/xenapi/driver.py", line 714, in call_xenapi
2012-10-04 23:08:59 TRACE nova.virt.xenapi.volumeops return session.xenapi_request(method, args)
2012-10-04 23:08:59 TRACE nova.virt.xenapi.volumeops File "/usr/local/lib/python2.6/dist-packages/XenAPI.py", line 133, in xenapi_request
2012-10-04 23:08:59 TRACE nova.virt.xenapi.volumeops result = _parse_result(getattr(self, methodname)(*full_params))
2012-10-04 23:08:59 TRACE nova.virt.xenapi.volumeops File "/usr/local/lib/python2.6/dist-packages/XenAPI.py", line 203, in _parse_result
2012-10-04 23:08:59 TRACE nova.virt.xenapi.volumeops raise Failure(result['ErrorDescription'])
2012-10-04 23:08:59 TRACE nova.virt.xenapi.volumeops Failure: ['DEVICE_ALREADY_EXISTS', '1']
2012-10-04 23:08:59 TRACE nova.virt.xenapi.volumeops

There's some more in the error handling:

2012-10-04 23:09:00 ERROR nova.compute.manager [req-c6af01f8-1d42-4f93-ada0-4fc217c56bdc ed4bd089a2f449dca0828a2c42dbfb77 3f4884d9c31d4393a11158e09b816a5b] [instance: 6819fd4d-e1db-4cdf-86f3-3cb8594eb1ca] Failed to attach volume 8d8b78b8-bff3-48fe-be34-a3d589076028 at /dev/xvdb
2012-10-04 23:09:00 TRACE nova.compute.manager [instance: 6819fd4d-e1db-4cdf-86f3-3cb8594eb1ca] Traceback (most recent call last):
2012-10-04 23:09:00 TRACE nova.compute.manager [instance: 6819fd4d-e1db-4cdf-86f3-3cb8594eb1ca] File "/opt/nova/nova/compute/manager.py", line 2001, in _attach_volume
2012-10-04 23:09:00 TRACE nova.compute.manager [instance: 6819fd4d-e1db-4cdf-86f3-3cb8594eb1ca] mountpoint)
2012-10-04 23:09:00 TRACE nova.compute.manager [instance: 6819fd4d-e1db-4cdf-86f3-3cb8594eb1ca] File "/opt/nova/nova/virt/xenapi/driver.py", line 381, in attach_volume
2012-10-04 23:09:00 TRACE nova.compute.manager [instance: 6819fd4d-e1db-4cdf-86f3-3cb8594eb1ca] mountpoint)
2012-10-04 23:09:00 TRACE nova.compute.manager [instance: 6819fd4d-e1db-4cdf-86f3-3cb8594eb1ca] File "/opt/nova/nova/virt/xenapi/volumeops.py", line 183, in attach_volume
2012-10-04 23:09:00 TRACE nova.compute.manager [instance: 6819fd4d-e1db-4cdf-86f3-3cb8594eb1ca] ' instance %(instance_name)s') % locals())
2012-10-04 23:09:00 TRACE nova.compute.manager [instance: 6819fd4d-e1db-4cdf-86f3-3cb8594eb1ca] Exception: Unable to use SR OpaqueRef:fd0c55a7-7584-2907-4791-654dd06428ed for instance instance-6819fd4d-e1db-4cdf-86f3-3cb8594eb1ca
2012-10-04 23:09:00 TRACE nova.compute.manager [instance: 6819fd4d-e1db-4cdf-86f3-3cb8594eb1ca]

clayg (clay-gerrard) wrote :
description: updated

Fix proposed to branch: master
Review: https://review.openstack.org/14068

Changed in nova:
assignee: nobody → clayg (clay-gerrard)
status: New → In Progress

Reviewed: https://review.openstack.org/14068
Committed: http://github.com/openstack/nova/commit/07845ad68f83952bbae36b4b115ff6c433e81fa3
Submitter: Jenkins
Branch: master

commit 07845ad68f83952bbae36b4b115ff6c433e81fa3
Author: Clay Gerrard <email address hidden>
Date: Thu Oct 4 18:50:39 2012 -0500

    Always use bdm in instance_block_mapping on Xen

    It doesn't seem that Xen will set the 'root_device_name' property on
    instances. This is causing instance_block_mapping to return early
    without evaluating the bdm before returning the list of device_names
    in use. I'm not sure why instance_block_mapping does this
    (optimization?) but on Xen at least it seems that it should not.

    Added a check for Xen compute_driver flag in instance_block_mapping to
    "guess" the root_device_name (similarlly to what is done in
    compute.utils for swap and ephemeral).

    fixes bug #1061944

    Change-Id: If5b1a2b7377232c78f0629a3624552ecf6ceb0ee

Changed in nova:
status: In Progress → Fix Committed
tags: added: folsom-backport-potential
Changed in nova:
importance: Undecided → Medium
Vish Ishaya (vishvananda) wrote :

Note: backport also needs https://review.openstack.org/#/c/14991/

Thierry Carrez (ttx) on 2012-11-21
Changed in nova:
milestone: none → grizzly-1
status: Fix Committed → Fix Released
tags: removed: folsom-backport-potential

Reviewed: https://review.openstack.org/16713
Committed: http://github.com/openstack/nova/commit/a369303b93500694b22b7d49785b47dcd734acf4
Submitter: Jenkins
Branch: stable/folsom

commit a369303b93500694b22b7d49785b47dcd734acf4
Author: Clay Gerrard <email address hidden>
Date: Thu Oct 4 18:50:39 2012 -0500

    Always use bdm in instance_block_mapping on Xen

    It doesn't seem that Xen will set the 'root_device_name' property on
    instances. This is causing instance_block_mapping to return early
    without evaluating the bdm before returning the list of device_names
    in use. I'm not sure why instance_block_mapping does this
    (optimization?) but on Xen at least it seems that it should not.

    Added a check for Xen compute_driver flag in instance_block_mapping to
    "guess" the root_device_name (similarlly to what is done in
    compute.utils for swap and ephemeral).

    fixes bug #1061944

    Change-Id: If5b1a2b7377232c78f0629a3624552ecf6ceb0ee
    (cherry picked from commit 07845ad68f83952bbae36b4b115ff6c433e81fa3)

Changed in nova (Ubuntu):
status: New → Fix Released
Changed in nova (Ubuntu Quantal):
status: New → Confirmed

Hello clayg, or anyone else affected,

Accepted nova into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/nova/2012.2.1+stable-20121212-a99a802e-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Quantal):
status: Confirmed → Fix Committed
tags: added: verification-needed
Launchpad Janitor (janitor) wrote :
Download full text (8.3 KiB)

This bug was fixed in the package nova - 2012.2.1+stable-20121212-a99a802e-0ubuntu1

---------------
nova (2012.2.1+stable-20121212-a99a802e-0ubuntu1) quantal-proposed; urgency=low

  * Ubuntu updates:
    - debian/control: Ensure novaclient is upgraded with nova,
      require python-keystoneclient >= 1:2.9.0. (LP: #1073289)
    - d/p/avoid_setuptools_git_dependency.patch: Refresh.
  * Dropped patches, applied upstream:
    - debian/patches/CVE-2012-5625.patch: [a99a802]
  * Resynchronize with stable/folsom (b55014ca) (LP: #1085255):
    - [a99a802] create_lvm_image allocates dirty blocks (LP: #1070539)
    - [670b388] RPC exchange name defaults to 'openstack' (LP: #1083944)
    - [3ede373] disassociate_floating_ip with multi_host=True fails
      (LP: #1074437)
    - [22d7c3b] libvirt imagecache should handle shared image storage
      (LP: #1075018)
    - [e787786] Detached and deleted RBD volumes remain associated with insance
      (LP: #1083818)
    - [9265eb0] live_migration missing migrate_data parameter in Hyper-V driver
      (LP: #1066513)
    - [3d99848] use_single_default_gateway does not function correctly
      (LP: #1075859)
    - [65a2d0a] resize does not migrate DHCP host information (LP: #1065440)
    - [102c76b] Nova backup image fails (LP: #1065053)
    - [48a3521] Fix config-file overrides for nova-dhcpbridge
    - [69663ee] Cloudpipe in Folsom: no such option: cnt_vpn_clients
      (LP: #1069573)
    - [6e47cc8] DisassociateAddress can cause Internal Server Error
      (LP: #1080406)
    - [22c3d7b] API calls to dis-associate an auto-assigned floating IP should
      return proper warning (LP: #1061499)
    - [bd11d15] libvirt: if exception raised during volume_detach, volume state
      is inconsistent (LP: #1057756)
    - [dcb59c3] admin can't describe all images in ec2 api (LP: #1070138)
    - [78de622] Incorrect Exception raised during Create server when metadata
      over 255 characters (LP: #1004007)
    - [c313de4] Fixed IP isn't released before updating DHCP host file
      (LP: #1078718)
    - [f4ab42d] Enabling Return Reservation ID with XML create server request
      returns no body (LP: #1061124)
    - [3db2a38] 'BackupCreate' should accept rotation parameter greater than or
      equal to zero (LP: #1071168)
    - [f7e5dde] libvirt reboot sometimes fails to reattach volumes
      (LP: #1073720)
    - [ff776d4] libvirt: detaching volume may fail while terminating other
      instances on the same host concurrently (LP: #1060836)
    - [85a8bc2] Used instance uuid rather than id in remove-fixed-ip
    - [42a85c0] Fix error on invalid delete_on_termination value
    - [6a17579] xenapi migrations fail w/ swap (LP: #1064083)
    - [97649b8] attach-time field for volumes is not updated for detach volume
      (LP: #1056122)
    - [8f6a718] libvirt: rebuild is not using kernel and ramdisk associated with
      the new image (LP: #1060925)
    - [fbe835f] live-migration and volume host assignement (LP: #1066887)
    - [c2a9150] typo prevents volume_tmp_dir flag from working (LP: #1071536)
    - [93efa21] Instances deleted during spawn leak network allocations
      (LP: #1068716)
    - [ebabd02] After restartin...

Read more...

Changed in nova (Ubuntu Quantal):
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2013-04-04
Changed in nova:
milestone: grizzly-1 → 2013.1
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers