Equallogic driver and multi-host access

Bug #1296677 reported by Kalle Happonen on 2014-03-24
This bug affects 2 people
Affects: Cinder
Importance: Undecided
Assigned to: Rajini Karthik

Bug Description

The eqlx volume driver currently has no setting for multi-host access when creating volumes.

By default, a volume created on an Equallogic box has multi-host access set to false. This seems to break live migrations.

When a VM is live migrated, the target machine tries to log in to the iSCSI target but gets a "Not Authorized" error, because the source machine is still logged in to the same volume. If multi-host access is enabled, live migrations work.
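
As an illustration of the requested behavior, here is a minimal sketch that enables multi-host access right after creating a volume. `FakeEqlxBackend` and `create_volume` are hypothetical names used only for this example, and the `multihost-access enable` CLI syntax is an assumption that should be checked against the array's CLI reference; this is not the driver's actual code.

```python
class FakeEqlxBackend:
    """Stand-in for the array CLI session; it only records commands."""

    def __init__(self):
        self.commands = []

    def execute(self, *args):
        self.commands.append(args)


def create_volume(backend, name, size_gb):
    """Create a volume, then allow more than one initiator to log in."""
    backend.execute('volume', 'create', name, '%sG' % size_gb)
    # Assumed CLI syntax: volume select <name> multihost-access enable
    backend.execute('volume', 'select', name, 'multihost-access', 'enable')


backend = FakeEqlxBackend()
create_volume(backend, 'volume-0001', 10)
for command in backend.commands:
    print(' '.join(command))
```

With the flag enabled, the source and target hosts can both be logged in to the volume for the duration of the migration.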

Mike Perez (thingee) on 2014-03-25
tags: added: drivers eqlx
Kalle Happonen (kalle-happonen) wrote :

There is another bug related to this one.

In eqlx.py, the terminate_connection function runs:

self._eql_execute('volume', 'select', volume['name'], 'access', 'delete', '1')

The '1' is hardcoded and is incorrect after migrations. It should be the rule ID that matches the initiator.

This bug makes the eqlx driver completely break on (live) migrations.
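
A rough sketch of how the rule ID could be looked up instead of hardcoded is shown below. The layout of the access listing (a numeric ID, then the initiator name, with long names wrapping onto the next line) is an assumption for illustration, and `find_access_record_id` is a hypothetical helper, not a function from eqlx.py.

```python
def find_access_record_id(access_lines, initiator):
    """Return the ID of the access record matching the initiator, or None.

    Assumes each record starts with a numeric ID followed by the initiator
    name, and that a long name may wrap onto the following line.
    """
    lines = [line for line in access_lines if line.strip()]
    for index, line in enumerate(lines):
        fields = line.split()
        if not fields[0].isdigit() or len(fields) < 2:
            continue  # header, separator, or continuation line
        name = fields[1]
        # A continuation line holds only the wrapped tail of the name.
        if index + 1 < len(lines):
            tail = lines[index + 1].split()
            if len(tail) == 1:
                name += tail[0]
        if name == initiator:
            return fields[0]
    return None


# Hypothetical listing; the real CLI output may differ.
sample = [
    'ID  Initiator                       Ipaddress  AuthMethod  Apply-To',
    '--- ------------------------------- ---------- ----------- --------',
    '2   iqn.1994-05.com.example:host-b  *.*.*.*    none        both',
    '3   iqn.1994-05.com.example:host-c- *.*.*.*    none        both',
    '    0123456789ab',
]

print(find_access_record_id(sample, 'iqn.1994-05.com.example:host-b'))
```

The returned ID would then be passed to the 'access', 'delete' command in place of the literal '1', and a result of None would mean there is no record to delete for that initiator.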

Kalle Happonen (kalle-happonen) wrote :

Sorry, to clarify: live migrations themselves work, but after one live migration all volume tasks break (since the driver tries to delete the non-existent rule 1).

Fix proposed to branch: master
Review: https://review.openstack.org/103647

Changed in cinder:
assignee: nobody → Rajini Ram (rajini-ram)
status: New → In Progress
Rajini Karthik (rajini-karthik) wrote :

The proposed fix also takes care of deleting the correct access record by matching the ID to the initiator in terminate_connection. Please review the fix.

Kalle Happonen (kalle-happonen) wrote :

Fix reviewed. The patched code worked in my tests. Thanks.

Reviewed: https://review.openstack.org/103647
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=2339351c04e0f99897ca0365343066b0f4065378
Submitter: Jenkins
Branch: master

commit 2339351c04e0f99897ca0365343066b0f4065378
Author: rajinir <email address hidden>
Date: Fri Jun 27 14:53:07 2014 -0500

    Fixes EqualLogic volume live migration.

    Fixes the issue by enabling the multihost flag for the volume and also
    discovering the correct access record to delete when terminating
    the connection from the source vm and then deleting the record.

    Change-Id: If3580c84a4efd3a58c19e9e74d0a13eb68e67031
    Closes-Bug: 1296677

Changed in cinder:
status: In Progress → Fix Committed
Changed in cinder:
milestone: none → juno-2
status: Fix Committed → Fix Released
Tzach Shefi (tshefi) wrote :

Verified. Please review the process, as I wasn't sure which live migration to use.
RHEL7
python-cinder-2014.1.2-2.el7ost.noarch
openstack-cinder-2014.1.2-2.el7ost.noarch
python-cinderclient-1.0.9-1.el7ost.noarch

1. Packstack deployed AIO + a second compute node.
2. Configured AIO /var/lib/nova/instance as shared NFS storage for live migration.
3. Mapped second compute's nova folder to NFS share on AIO.
4. Configured needed live migration settings in nova.conf.
5. Tested live migration with Cirros instance.
6. Configured Cinder driver for Dell EQL PS6000 iSCSI storage.
7. Created empty Cinder volume, checked on Dell that the volume was created.
8. Attached this new volume to instance on AIO compute host.
9. Noticed a single iSCSI connection on Dell to this volume.
10. Live migrated instance with attached volume to second compute node.
11. During migration, noticed Dell volume info showed two iSCSI connections.
12. Migration completed successfully, Cinder volume remained attached.
13. Checked volume info again; Dell again shows only one iSCSI connection.

Hope this verifies the bug OK.

BTW, when I tried to create a new volume from a Glance image I ran into problems; I'm opening a new bug about it.

Kalle Happonen (kalle-happonen) wrote :

For problems with creating a volume from Glance, this might be related:
https://bugs.launchpad.net/cinder/+bug/1323065

Also, your review would have triggered the original bug, so the review confirms that the bug is fixed.

As per comment #1, there was another bug in the code. This bug was triggered on the second migration of a volume, because of a wrong assumption about ACL numbering.

The patch has a fix for this too, and I have verified the fix on our system. For the sake of completeness, to test for both bugs, do two live migrations instead of one.
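
Why the second migration matters can be illustrated with a toy model. It assumes that access record IDs keep incrementing and are never reused, which is a guess about the array's behavior, and `ToyVolumeAcl`, `migrate`, and `run` are hypothetical names made up for this sketch.

```python
class ToyVolumeAcl:
    """Toy access list for one volume; IDs are assumed never to be reused."""

    def __init__(self):
        self.records = {}  # record ID -> initiator name
        self.next_id = 1

    def add(self, initiator):
        self.records[self.next_id] = initiator
        self.next_id += 1

    def delete(self, record_id):
        if record_id not in self.records:
            raise KeyError('no access record %d' % record_id)
        del self.records[record_id]

    def id_for(self, initiator):
        for record_id, name in self.records.items():
            if name == initiator:
                return record_id
        return None


def migrate(acl, source, target, hardcoded):
    """Grant access to the target host, then revoke it from the source."""
    acl.add(target)
    acl.delete(1 if hardcoded else acl.id_for(source))


def run(hardcoded):
    """Attach on host-a, then live migrate to host-b and on to host-c."""
    acl = ToyVolumeAcl()
    acl.add('host-a')
    migrate(acl, 'host-a', 'host-b', hardcoded)
    migrate(acl, 'host-b', 'host-c', hardcoded)
    return acl.records


print(run(hardcoded=False))
```

In this model the hardcoded variant gets through the first migration only because the source's record happens to have ID 1; `run(hardcoded=True)` raises on the second migration, while the lookup variant completes both.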

Thierry Carrez (ttx) on 2014-10-16
Changed in cinder:
milestone: juno-2 → 2014.2