Cinder

Failed to iscsi login to 3PAR after controller restart

Bug #1770611 reported by Pedro Rubio on 2018-05-11

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Cinder	Incomplete	Undecided	Unassigned

Bug Description

Env:
Helion OpenStack 4.0 (Mitaka)
3PAR
<snip>
        3.0.10 - Remove metadata that tracks the instance ID. bug #1572665
        3.99.10.1 - _create_3par_iscsi_host() now accepts iscsi_iqn as list
                 only. Bug #1590180
        3.99.10.2 - Added entry point tracing
        3.99.10.3 - Handling HTTP conflict 409, host WWN/iSCSI name already
                used by another host, while creating 3PAR iSCSI Host.
                bug #1642945
        3.99.10.4 - Fix snapCPG error during backup of attached volume.

Also recently applied:

- Update CHAP on host record when volume is migrated to new compute host. bug # 1737181
https://review.openstack.org/#/c/531669/4
(fixed for Newton, Ocata and Pike)

- 3PAR: Get host from os-brick bug #1690244
https://review.openstack.org/#/c/482103/
(fixed for Newton, Ocata)

In the following scenarios, the iSCSI session intermittently fails to log back in because of an authentication failure:
"iscsiadm: Could not login to [iface: default, target: <target>, portal: <portal>].
iscsiadm: initiator reported error (24 - iSCSI login failed due to authorization failure)"

On the 3PAR (showevents) we can see:
"CHAP initiation auth.: authentication of <initiator> failed (wrong secret?)"

Scenario 1.- 3PAR controller port restarted

Scenario 2.- Compute node restarted (e.g. after maintenance or an evacuation)

The issue was traced to the CHAP secret for the compute node having been updated on the 3PAR, so that it no longer matches the one stored on the compute node itself (/etc/iscsi/nodes/*/*).

Workaround:
1.- Connect to the compute node and get the CHAP secret it uses for the 3PAR node:
$ grep pass /etc/iscsi/nodes/*/*
2.- Connect to the 3PAR and update the CHAP secret for the compute node, using the secret from step 1:
$ sethost initchap -chapname ha-volume-manager <secret> <compute-node>
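
Step 1 of the workaround can also be scripted. The following is a minimal sketch, assuming the node records under /etc/iscsi/nodes use the usual open-iscsi "key = value" layout with node.session.auth.username and node.session.auth.password entries. The function names are hypothetical, and reading the records normally requires root.

```python
from pathlib import Path


def parse_node_record(text):
    """Return the CHAP username/password found in one open-iscsi node record."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not "key = value".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "node.session.auth.username":
            creds["username"] = value
        elif key == "node.session.auth.password":
            creds["password"] = value
    return creds


def collect_chap_credentials(root="/etc/iscsi/nodes"):
    """Map each node record file under root to the CHAP credentials it holds."""
    found = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return found
    for path in root_path.rglob("*"):
        if path.is_file():
            creds = parse_node_record(path.read_text(errors="replace"))
            if creds:
                found[str(path)] = creds
    return found


if __name__ == "__main__":
    for record, creds in sorted(collect_chap_credentials().items()):
        print(record, creds.get("username"), creds.get("password"))
```

The printed secret is the value to pass to sethost in step 2.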

I couldn't find a way to reproduce the issue, but it seems it could be related to some driver operation updating the CHAP secret from the Cinder DB.

While checking the DB I found some active volumes that keep a "wrong" CHAP secret (not matching the one on the compute node) in their information (provider_auth).
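
The DB check above can be sketched as follows, assuming provider_auth holds a string of the form "CHAP <username> <secret>" and that the (volume id, provider_auth) rows have already been fetched from the volumes table. The function names and the sample rows are hypothetical.

```python
def parse_provider_auth(provider_auth):
    """Split 'CHAP <username> <secret>' into (username, secret).

    Returns None when the value is empty or does not match that layout.
    """
    if not provider_auth:
        return None
    parts = provider_auth.split()
    if len(parts) != 3 or parts[0].upper() != "CHAP":
        return None
    return parts[1], parts[2]


def find_stale_volumes(rows, node_secret):
    """Return IDs of volumes whose stored CHAP secret differs from the node's."""
    stale = []
    for volume_id, provider_auth in rows:
        parsed = parse_provider_auth(provider_auth)
        if parsed is not None and parsed[1] != node_secret:
            stale.append(volume_id)
    return stale


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        ("vol-1", "CHAP ha-volume-manager goodsecret"),
        ("vol-2", "CHAP ha-volume-manager oldsecret"),
        ("vol-3", None),
    ]
    print(find_stale_volumes(rows, "goodsecret"))  # ['vol-2']
```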

I wonder whether this is related to one of the issues already fixed (bug #1737181) and whether, beyond the fix, some extra cleanup of the DB is required to keep out-of-sync CHAP secrets from being reused.

I also know that one of the operations that sets CHAP for a host is the creation of the host definition, which could explain Scenario 2: when a compute node has no volumes presented (e.g. after a host evacuation, with all instances migrated), the host definition is removed from the 3PAR. Once the node is running again and tries to create new iSCSI sessions, it fails because the host definition was recreated with a wrong secret.

Regarding Scenario 1, there must be some operation that updates CHAP without affecting the current sessions, so the mismatch only shows up when a re-login is required for some reason (e.g. a controller port restart).

If I find a way to reproduce it, I will add more details.

Should you need anything else, just let me know.

Sean McGinnis (sean-mcginnis) on 2018-05-11

tags: added: 3par drivers hpe

Vivek Soni (viveksoni) wrote on 2018-05-16:

Hi Pedro,

Please find the response to your query below:
>>> The problem here is that I couldn't reproduce it yet, but to me it's important to know whether invalid CHAPs in the DB can somehow be used to update the host definition and secret on the 3PAR. Could you answer this question? If this is the case, do we need to perform some update on the Cinder DB to avoid this issue?

The HPE 3PAR driver does not update the cinder.volumes table with the CHAP secret directly; the Cinder layer that sits on top of the HPE 3PAR driver updates the Cinder DB, and this is the recommended approach.

On the driver side, create_export() is responsible for the CHAP secret update in the cinder.volumes table.

Since you already have VLUNs created on the 3PAR with a CHAP secret key, the HPE 3PAR driver's create_export() will fetch that secret from the existing VLUNs and set the same secret for newly created ones.
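
The behaviour described here (reuse the secret already attached to the host's existing VLUNs, otherwise generate a new one) can be illustrated with a simplified sketch. This is not the driver's code: the dictionary layout, the "chap_secret" key and the function name are assumptions made purely for illustration.

```python
import secrets


def choose_chap_secret(existing_vluns, generate=lambda: secrets.token_hex(8)):
    """Reuse the first CHAP secret found on existing VLUNs, else create one.

    existing_vluns is a list of dicts with a hypothetical 'chap_secret' key.
    Returns (secret, is_new).
    """
    for vlun in existing_vluns:
        secret = vlun.get("chap_secret")
        if secret:
            return secret, False  # reused from an existing VLUN
    return generate(), True  # nothing to reuse, so generate a fresh secret


if __name__ == "__main__":
    print(choose_chap_secret([{"chap_secret": "abc"}]))  # ('abc', False)
```

Under this logic, a secret on the existing VLUNs that no longer matches what the compute node has stored would be carried over to every new export.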

Some operation may have triggered bug #1737181 or #1690244, leaving the CHAP secret unset or incorrectly set on the 3PAR and therefore out of sync with create_export().

You have already tried applying the patches https://review.openstack.org/#/c/531669/4 and https://review.openstack.org/#/c/482103/, but that did not work.

I suggest removing all the VLUNs on the 3PAR for any ONE specific host, then applying both patches, restarting the services, and repeating the same operation from that host.

Pedro Rubio (prubio) wrote on 2018-07-27:

Hi Vivek,
We finally confirmed that the issue was fixed after the patches were installed. Some fixes were also needed on the compute side to correct CHAP credentials that had been modified manually.
After some time running with the patches installed, the new CHAP name scheme (using the hostname as the username instead of ha-volume-managed) is applied and populated properly. Some computes still hold old information until they are restarted or all their instances are live-migrated off them.
You can close this Launchpad bug.
Regards,
Pedro

Eric Harney (eharney) on 2019-07-01

Changed in cinder:
status:	New → Incomplete
