failing to mount iSCSI path with first volume

Bug #1969087 reported by DUFOUR Olivier
This bug affects 1 person
Affects: python-os-brick (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

On a customer deployment on focal-ussuri, with iSCSI backends and multipath enabled, we face an issue where iscsiadm fails to mount one of the paths of an iSCSI volume with the following error:
"iscsiadm: Could not make /etc/iscsi/nodes: File exists\niscsiadm: Error while adding record: encountered iSCSI database failure"
(see nova-compute.log for more details)

In terms of impact, every server mounting an iSCSI volume for the first time fails to mount the first path of the iSCSI target. The failure is "silent": the end user is not made aware of it.

I noticed that the iSCSI database error happens only on the first iSCSI volume mounted on each affected server in the deployment, such as units running the cinder-volume and nova-compute services.
After some investigation, this appears to be a race condition between os-brick and the iscsid daemon.
After a deployment or a reboot, iscsid is not running, and os-brick tries to mount the first path of the iSCSI volume before iscsid has finished initialising, which leads to the error we see in the cinder-volume or nova-compute logs.
If iscsid is started manually on the server beforehand, the error disappears and all the target paths are mounted properly for the first volume.

Here is an example of the processes running on a nova-compute unit:
# before an instance creation with the first iSCSI volume
ubuntu@nova-compute:~$ ps aux | grep iscsi
ubuntu 3705821 0.0 0.0 6304 2624 pts/1 S+ 14:18 0:00 grep --color=auto iscsi
# after the instance creation
ubuntu@nova-compute:~$ ps aux | grep iscsi
root 3707866 0.0 0.0 5108 248 ? Ss 14:21 0:00 /sbin/iscsid
root 3707867 0.0 0.0 5964 5816 ? S<Ls 14:21 0:00 /sbin/iscsid
root 3707869 0.0 0.0 0 0 ? I< 14:21 0:00 [iscsi_eh]
root 3707878 0.0 0.0 0 0 ? I< 14:21 0:00 [iscsi_q_1]
ubuntu 3708321 0.0 0.0 6436 2524 pts/1 S+ 14:21 0:00 grep --color=auto iscsi

To avoid iscsiadm hitting the database error, the workaround I have found for now is to start and enable iscsid on every cinder-volume and nova-compute unit before mounting any iSCSI volume.
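
For illustration, the workaround could be scripted roughly as follows. This is only a sketch: it assumes the systemd unit is named "iscsid" (as on the focal units above), that it runs as root, and the function name ensure_iscsid_running is made up for this example.

```python
import subprocess


def ensure_iscsid_running(unit="iscsid", runner=subprocess.run):
    """Enable and start the unit, then report whether systemd sees it as active.

    The unit name is an assumption; `runner` is injectable so the logic can be
    exercised without touching the real system.
    """
    # "enable --now" both enables the unit at boot and starts it immediately.
    # check=True raises CalledProcessError if systemctl itself fails.
    runner(["systemctl", "enable", "--now", unit], check=True)
    # "is-active --quiet" exits with 0 only when the unit is active.
    return runner(["systemctl", "is-active", "--quiet", unit]).returncode == 0
```

Running this once on each cinder-volume and nova-compute unit before the first volume attachment should have the same effect as starting iscsid by hand.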

Looking more in depth, this issue is also mentioned in os-brick bug #1944474, where a retry mechanism was implemented to attempt the path mount again if iscsiadm returns the database failure error code (6).
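
To make the idea concrete, here is a minimal sketch of such a retry. It is not the actual os-brick patch from #1944474: the function name, the number of attempts and the delay are assumptions chosen for illustration. Only the exit code 6 for the database failure comes from the report above.

```python
import subprocess
import time

ISCSI_ERR_IDBM = 6  # iscsiadm exit code for "iSCSI database failure"


def run_iscsiadm_with_retry(args, attempts=3, delay=1.0,
                            runner=subprocess.run, sleep=time.sleep):
    """Run iscsiadm, retrying only when it exits with the database failure code.

    `attempts` and `delay` are illustrative defaults, not os-brick's values.
    Returns the result of the last invocation.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = runner(["iscsiadm", *args], capture_output=True, text=True)
        if result.returncode != ISCSI_ERR_IDBM:
            # Success, or an unrelated error that a retry would not fix.
            return result
        if attempt < attempts:
            # Give iscsid time to finish initialising before trying again.
            sleep(delay)
    return result
```

The caller still has to inspect the returned exit code, since the last attempt may also have failed.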

Would it be possible to backport the fix from #1944474 to the package, and/or to see whether it is feasible to start iscsid beforehand through a charm configuration?

Revision history for this message
DUFOUR Olivier (odufourc) wrote :
summary: - os-brick failing to mount iSCSI path
+ failing to mount iSCSI path with first volume