After folsom upgrade, instances can no longer access existing volumes.

Bug #1065702 reported by Adam Gandelman on 2012-10-11
This bug affects 1 person
Affects                     Importance  Assigned to
Cinder                      High        John Griffith
  Folsom                    High        John Griffith
OpenStack Compute (nova)    High        John Griffith
  Folsom                    High        John Griffith
Ubuntu Cloud Archive        High        Unassigned
cinder (Ubuntu)             Undecided   Unassigned
  Quantal                   High        Unassigned
  Raring                    Undecided   Unassigned
nova (Ubuntu)               High        Unassigned
  Quantal                   High        Unassigned
  Raring                    High        Unassigned

Bug Description

After an upgrade from Essex to Folsom, volumes that existed before the upgrade can no longer be attached to instances. Attempting to attach one results in a traceback ending in:

'.join(cmd))
2012-10-11 15:18:53 TRACE nova.compute.manager [instance: e1eb74b1-c063-48b5-ab3a-1ec935b26001] ProcessExecutionError: Unexpected error while running command.
2012-10-11 15:18:53 TRACE nova.compute.manager [instance: e1eb74b1-c063-48b5-ab3a-1ec935b26001] Command: sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-00000002 -p 192.168.20.9:3260 --rescan
2012-10-11 15:18:53 TRACE nova.compute.manager [instance: e1eb74b1-c063-48b5-ab3a-1ec935b26001] Exit code: 255
2012-10-11 15:18:53 TRACE nova.compute.manager [instance: e1eb74b1-c063-48b5-ab3a-1ec935b26001] Stdout: ''
2012-10-11 15:18:53 TRACE nova.compute.manager [instance: e1eb74b1-c063-48b5-ab3a-1ec935b26001] Stderr: 'iscsiadm: No portal found.\n'
2012-10-11 15:18:53 TRACE nova.compute.manager [instance: e1eb74b1-c063-48b5-ab3a-1ec935b26001]

It looks like compute is trying to find the volume with target named iqn.2010-10.org.openstack:volume-00000002. However, the upgrade from Essex to Folsom migrates all volume IDs to UUIDs. On the volume host:

 ~# tgt-admin --dump
default-driver iscsi

<target iqn.2010-10.org.openstack:volume-cd04dd3b-f8af-4400-bb91-8051aa05ef63>
</target>

<target iqn.2010-10.org.openstack:volume-9def01e3-0929-4735-932e-338a29374df3>
</target>

<target iqn.2010-10.org.openstack:volume-1eba5cc6-747b-4df9-810c-1f8debfd2dad>
</target>

The volumes table in the database, which I assume is where this connection information comes from on attach, still contains the old data:
mysql> select provider_location from volumes;

+-----------------------------------------------------------------+
| provider_location                                               |
+-----------------------------------------------------------------+
| 192.168.20.9:3260,2 iqn.2010-10.org.openstack:volume-00000002 1 |
| 192.168.20.9:3260,1 iqn.2010-10.org.openstack:volume-00000001 1 |
| 192.168.20.9:3260,3 iqn.2010-10.org.openstack:volume-00000003 1 |
+-----------------------------------------------------------------+
3 rows in set (0.00 sec)
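
To make the mismatch concrete, here is a minimal Python sketch that splits a provider_location value like the rows above and flags entries whose IQN still carries a pre-upgrade volume name. The field names (group_tag, lun) and both helper functions are hypothetical, chosen for illustration; the report only shows the raw strings.

```python
import re
from typing import NamedTuple

# UUID form used for volume IDs after the migration (8-4-4-4-12 hex digits).
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


class ProviderLocation(NamedTuple):
    portal: str     # e.g. "192.168.20.9:3260"
    group_tag: str  # number after the comma (meaning assumed)
    iqn: str        # iSCSI target name
    lun: str        # trailing number (assumed to be the LUN)


def parse_provider_location(value: str) -> ProviderLocation:
    """Split '<portal>,<tag> <iqn> <lun>' into its parts."""
    location, iqn, lun = value.split()
    portal, group_tag = location.split(",")
    return ProviderLocation(portal, group_tag, iqn, lun)


def has_legacy_name(loc: ProviderLocation) -> bool:
    """True when the IQN suffix after 'volume-' is not a UUID."""
    suffix = loc.iqn.rsplit(":volume-", 1)[-1]
    return UUID_RE.match(suffix) is None


row = "192.168.20.9:3260,2 iqn.2010-10.org.openstack:volume-00000002 1"
loc = parse_provider_location(row)
print(loc.portal, loc.iqn, has_legacy_name(loc))
```

Any row for which has_legacy_name returns True points at a target name that tgt no longer exports, which is consistent with the "No portal found" error in the traceback.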

Changed in nova (Ubuntu):
importance: Undecided → High
Vish Ishaya (vishvananda) wrote :

ugh, this is nasty. Looks like we need a migration for this. For backport we could add a translation using the ec2_id table to convert from the old name to the new one. Some testing needs to be done to figure out if the new iqns are actually there in all cases or if there is a case where the old iqn will exist.
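
A rough sketch of the translation idea, assuming a lookup from old integer volume IDs to new UUIDs is available. The ID_TO_UUID dict is a hypothetical stand-in for such a lookup, the "volume-%08x" template is an assumption inferred from the names in this report, and the pairing of ID 2 with that particular UUID is made up for illustration.

```python
# Hypothetical stand-in for a lookup of old integer volume IDs to new UUIDs.
# The UUID is borrowed from the tgt-admin dump purely as an example; the
# report does not say which UUID belongs to which old volume.
ID_TO_UUID = {2: "cd04dd3b-f8af-4400-bb91-8051aa05ef63"}

# Assumed pre-upgrade naming scheme, inferred from "volume-00000002".
OLD_TEMPLATE = "volume-%08x"


def translate_iqn(iqn: str, id_to_uuid: dict) -> str:
    """Return the IQN with an old-style volume name replaced by its UUID name.

    IQNs that are not old-style, or whose ID is unknown, are returned unchanged.
    """
    prefix, sep, name = iqn.rpartition(":")
    if not sep or not name.startswith("volume-"):
        return iqn
    suffix = name[len("volume-"):]
    try:
        old_id = int(suffix, 16)
    except ValueError:
        # A UUID suffix contains hyphens, so it lands here and is left alone.
        return iqn
    if OLD_TEMPLATE % old_id != name or old_id not in id_to_uuid:
        return iqn
    return f"{prefix}:volume-{id_to_uuid[old_id]}"


print(translate_iqn("iqn.2010-10.org.openstack:volume-00000002", ID_TO_UUID))
```

Leaving unknown or already-converted names untouched keeps the translation safe to apply repeatedly, which matters given the open question of whether old IQNs can still exist in some cases.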

Changed in nova:
importance: Undecided → High
Changed in cinder:
importance: Undecided → High
tags: added: openstack-ubuntu-upgrade
Changed in cinder:
assignee: nobody → John Griffith (john-griffith)
Changed in nova:
assignee: nobody → John Griffith (john-griffith)
Dave Spano (dspano) wrote :

I used the block_device_mapping table to figure out what to rename. Just throwing that out there for people who need to change their existing volumes before a fix is put out.

Chuck Short (zulcss) on 2012-10-22
Changed in nova (Ubuntu):
status: New → Confirmed
Changed in nova:
status: New → Confirmed

Fix proposed to branch: master
Review: https://review.openstack.org/14615

Changed in nova:
status: Confirmed → In Progress
Changed in cinder:
status: New → In Progress
Changed in cinder:
milestone: none → grizzly-1

Reviewed: https://review.openstack.org/14790
Committed: http://github.com/openstack/cinder/commit/524c7fa6dfade6efcabda19b105d58cab8c434e2
Submitter: Jenkins
Branch: master

commit 524c7fa6dfade6efcabda19b105d58cab8c434e2
Author: John Griffith <email address hidden>
Date: Wed Oct 24 16:28:02 2012 -0600

    Detect and fix issues caused by vol ID migration

    The migration from volume ID to UUID neglected to update the provider_location
    field on the volume. As a result the iqn and volume name no longer match and
    existing volumes are no longer able to be attached after an upgrade
    (essex -> folsom and then nova-vol->cinder).

    This patch adds a method to the volume driver that will check for the
    mismatch of volume name in the iqn during service start up. If
    detected it will update the provider_location field in the database
    to include the new ID. Also it will create a symlink to the device backing
    file that also has the correct naming convention.

    Note: We don't disturb any connections that are currently attached.
    For this case we add a check in manager.detach and do any provider_location
    cleanup that's needed at that time. This ensures that connections
    persist on restarts of tgtd and reboot.

    Change-Id: I8224824b793c98a9767c5d8dd741d892be720c4f
    Fixes: bug 1065702

Changed in cinder:
status: In Progress → Fix Committed
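
The check described in the commit message above can be sketched roughly as follows. This is not the code from the patch: both function names are hypothetical, the backing files are modelled as plain files in a temporary directory, and the pairing of the old name with this UUID is assumed for illustration.

```python
import os
import tempfile


def expected_provider_location(volume_id: str, provider_location: str) -> str:
    """Return provider_location with the IQN suffix rewritten to match volume_id."""
    location, iqn, lun = provider_location.split()
    prefix, _, _old_name = iqn.rpartition(":")
    return f"{location} {prefix}:volume-{volume_id} {lun}"


def link_backing_file(volumes_dir: str, old_name: str, new_name: str) -> None:
    """Create new_name as a symlink to old_name unless it already exists."""
    new_path = os.path.join(volumes_dir, new_name)
    if not os.path.lexists(new_path):
        os.symlink(os.path.join(volumes_dir, old_name), new_path)


# Example pairing of a UUID with an old provider_location (assumed).
volume_id = "cd04dd3b-f8af-4400-bb91-8051aa05ef63"
stored = "192.168.20.9:3260,2 iqn.2010-10.org.openstack:volume-00000002 1"

fixed = expected_provider_location(volume_id, stored)
needs_update = fixed != stored  # True means the database row is stale

# Temporary directory standing in for wherever the backing files live.
with tempfile.TemporaryDirectory() as volumes_dir:
    open(os.path.join(volumes_dir, "volume-00000002"), "w").close()
    link_backing_file(volumes_dir, "volume-00000002", f"volume-{volume_id}")
    linked = os.path.islink(os.path.join(volumes_dir, f"volume-{volume_id}"))

print(fixed, needs_update, linked)
```

Running expected_provider_location on an already-corrected value returns it unchanged, so a check like this could run on every service start without touching rows that are already consistent.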

Reviewed: https://review.openstack.org/15157
Committed: http://github.com/openstack/cinder/commit/7f34ba39abcaa4f885d7448b427148fab4e49892
Submitter: Jenkins
Branch: stable/folsom

commit 7f34ba39abcaa4f885d7448b427148fab4e49892
Author: John Griffith <email address hidden>
Date: Wed Oct 31 16:43:09 2012 -0600

    Detect and fix issues caused by vol ID migration

    The migration from volume ID to UUID neglected to update the provider_location
    field on the volume. As a result the iqn and volume name no longer match and
    existing volumes are no longer able to be attached after an upgrade
    (essex -> folsom and then nova-vol->cinder).

    This patch adds a method to the volume driver that will check for the
    mismatch of volume name in the iqn during service start up. If
    detected it will update the provider_location field in the database
    to include the new ID. Also it will create a symlink to the device backing
    file that also has the correct naming convention.

    Note: We don't disturb any connections that are currently attached.
    For this case we add a check in manager.detach and do any provider_location
    cleanup that's needed at that time. This ensures that connections
    persist on restarts of tgtd and reboot.

    Change-Id: I4683df4ef489972752dc58cb4e91d458a79a8ef2
    Fixes: bug 1065702

tags: added: in-stable-folsom

Reviewed: https://review.openstack.org/15005
Committed: http://github.com/openstack/nova/commit/e3d7f8c7de9bad73bf1f9b5ee9b2cf46eb452351
Submitter: Jenkins
Branch: stable/folsom

commit e3d7f8c7de9bad73bf1f9b5ee9b2cf46eb452351
Author: John Griffith <email address hidden>
Date: Mon Oct 29 17:57:40 2012 -0600

    Detect and fix issues caused by vol ID migration

    The migration from volume ID to UUID neglected to update the provider_location
    field on the volume. As a result the iqn and volume name no longer match and
    existing volumes are no longer able to be attached after an upgrade
    (essex -> folsom and then nova-vol->cinder).

    This patch adds a method to the volume driver that will check for the
    mismatch of volume name in the iqn during service start up. If
    detected it will update the provider_location field in the database
    to include the new ID. Also it will create a symlink to the device backing
    file that also has the correct naming convention.

    Note: We don't disturb any connections that are currently attached.
    For this case we add a check in manager.detach and do any provider_location
    cleanup that's needed at that time. This ensures that connections
    persist on restarts of tgtd and reboot.

    Change-Id: Ib41ebdc849ebc31a9225bdc4902209c5a23da104
    Fixes: bug 1065702

Adam Gandelman (gandelman-a) wrote :

Cinder Fix in Raring as of 2013.1~g1~20121101.361-0ubuntu1

Changed in cinder (Ubuntu):
status: New → Fix Released
Changed in cinder (Ubuntu Quantal):
status: New → Confirmed
importance: Undecided → Critical
importance: Critical → High
Changed in nova (Ubuntu Quantal):
importance: Undecided → High
Mark McLoughlin (markmc) on 2012-11-01
Changed in nova:
status: In Progress → Invalid
Changed in nova (Ubuntu Quantal):
status: New → Confirmed
tags: added: cloud-archive
Chuck Short (zulcss) on 2012-11-13
Changed in cloud-archive:
importance: Undecided → High
status: New → Confirmed
Thierry Carrez (ttx) on 2012-11-22
Changed in cinder:
status: Fix Committed → Fix Released
Changed in nova (Ubuntu Raring):
status: Confirmed → Fix Released

Hello Adam, or anyone else affected,

Accepted cinder into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/cinder/2012.2.1-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cinder (Ubuntu Quantal):
status: Confirmed → Fix Committed
tags: added: verification-needed
Clint Byrum (clint-fewbar) wrote :

Hello Adam, or anyone else affected,

Accepted nova into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/nova/2012.2.1+stable-20121212-a99a802e-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Quantal):
status: Confirmed → Fix Committed
Mark McLoughlin (markmc) on 2013-01-22
tags: removed: in-stable-folsom
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cinder - 2012.2.1-0ubuntu1

---------------
cinder (2012.2.1-0ubuntu1) quantal-proposed; urgency=low

  * Ubuntu updates:
    - Cinder should suggest ceph-common, not python-ceph (LP: #1065901):
      - debian/control: cinder-volume Suggests: python-ceph -> ceph-common
  * Resynchronize with stable/folsom (87d839a5) (LP: #1085255):
    - [f990ff0] Remove unused python-daemon dependency
    - [940f363] Detached and deleted RBD volumes remain associated with instance
      (LP: #1083818)
    - [7f34ba3] After folsom upgrade, instances can no longer access existing
      volumes. (LP: #1065702)
    - [1c99b24] Jenkins jobs fail because of incompatibility between sqlalchemy-
      migrate and the newest sqlalchemy-0.8.0b1 (LP: #1073569)
    - [d12d4b6] Add SIGPIPE handler to subprocess execution in rootwrap and
      utils.execute (LP: #1053364)
    - [ce5e002] Set defaultbranch in .gitreview to stable/folsom
 -- Adam Gandelman <email address hidden> Tue, 04 Dec 2012 09:19:29 -0800

Changed in cinder (Ubuntu Quantal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2012.2.1+stable-20121212-a99a802e-0ubuntu1

---------------
nova (2012.2.1+stable-20121212-a99a802e-0ubuntu1) quantal-proposed; urgency=low

  * Ubuntu updates:
    - debian/control: Ensure novaclient is upgraded with nova,
      require python-keystoneclient >= 1:2.9.0. (LP: #1073289)
    - d/p/avoid_setuptools_git_dependency.patch: Refresh.
  * Dropped patches, applied upstream:
    - debian/patches/CVE-2012-5625.patch: [a99a802]
  * Resynchronize with stable/folsom (b55014ca) (LP: #1085255):
    - [a99a802] create_lvm_image allocates dirty blocks (LP: #1070539)
    - [670b388] RPC exchange name defaults to 'openstack' (LP: #1083944)
    - [3ede373] disassociate_floating_ip with multi_host=True fails
      (LP: #1074437)
    - [22d7c3b] libvirt imagecache should handle shared image storage
      (LP: #1075018)
    - [e787786] Detached and deleted RBD volumes remain associated with instance
      (LP: #1083818)
    - [9265eb0] live_migration missing migrate_data parameter in Hyper-V driver
      (LP: #1066513)
    - [3d99848] use_single_default_gateway does not function correctly
      (LP: #1075859)
    - [65a2d0a] resize does not migrate DHCP host information (LP: #1065440)
    - [102c76b] Nova backup image fails (LP: #1065053)
    - [48a3521] Fix config-file overrides for nova-dhcpbridge
    - [69663ee] Cloudpipe in Folsom: no such option: cnt_vpn_clients
      (LP: #1069573)
    - [6e47cc8] DisassociateAddress can cause Internal Server Error
      (LP: #1080406)
    - [22c3d7b] API calls to dis-associate an auto-assigned floating IP should
      return proper warning (LP: #1061499)
    - [bd11d15] libvirt: if exception raised during volume_detach, volume state
      is inconsistent (LP: #1057756)
    - [dcb59c3] admin can't describe all images in ec2 api (LP: #1070138)
    - [78de622] Incorrect Exception raised during Create server when metadata
      over 255 characters (LP: #1004007)
    - [c313de4] Fixed IP isn't released before updating DHCP host file
      (LP: #1078718)
    - [f4ab42d] Enabling Return Reservation ID with XML create server request
      returns no body (LP: #1061124)
    - [3db2a38] 'BackupCreate' should accept rotation parameter greater than or
      equal to zero (LP: #1071168)
    - [f7e5dde] libvirt reboot sometimes fails to reattach volumes
      (LP: #1073720)
    - [ff776d4] libvirt: detaching volume may fail while terminating other
      instances on the same host concurrently (LP: #1060836)
    - [85a8bc2] Used instance uuid rather than id in remove-fixed-ip
    - [42a85c0] Fix error on invalid delete_on_termination value
    - [6a17579] xenapi migrations fail w/ swap (LP: #1064083)
    - [97649b8] attach-time field for volumes is not updated for detach volume
      (LP: #1056122)
    - [8f6a718] libvirt: rebuild is not using kernel and ramdisk associated with
      the new image (LP: #1060925)
    - [fbe835f] live-migration and volume host assignment (LP: #1066887)
    - [c2a9150] typo prevents volume_tmp_dir flag from working (LP: #1071536)
    - [93efa21] Instances deleted during spawn leak network allocations
      (LP: #1068716)
    - [ebabd02] After restartin...


Changed in nova (Ubuntu Quantal):
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2013-04-04
Changed in cinder:
milestone: grizzly-1 → 2013.1
Changed in cloud-archive:
status: Confirmed → Fix Released