Volumes fail to attach without discovery using tgt

Bug #922232 reported by Adam Gandelman on 2012-01-26
This bug affects 1 person
Affects                    Status         Importance   Assigned to      Milestone
OpenStack Compute (nova)   Fix Released   Medium       Vish Ishaya      2012.1
nova (Ubuntu)              Fix Released   High         Adam Gandelman

Bug Description

I'm not sure if this is a general bug or specific to the tgt iscsi helper (will compare with ietd soon)

Attaching a volume to an instance results in the following errors in nova-compute.log on a libvirt compute node: http://paste.ubuntu.com/817916/

Manually walking thru the initiator commands on the compute node ends in the same results, mainly:

# iscsiadm -m node -T iqn.2010-10.org.openstack:volume-00000001 -p 192.168.20.4:3260
iscsiadm: no records found!

It's not until I send a discovery to the target that I can list any records, which (I assume) is the first step toward a successful volume attachment in nova.

root@test-03:~# iscsiadm -m discovery -t sendtargets -p 192.168.20.4:3260
192.168.20.4:3260,1 iqn.2010-10.org.openstack:volume-00000001
root@test-03:~# iscsiadm -m node -T iqn.2010-10.org.openstack:volume-00000001 -p 192.168.20.4:3260
# BEGIN RECORD 2.0-871
node.name = iqn.2010-10.org.openstack:volume-00000001
node.tpgt = 1
node.startup = manual
iface.hwaddress = <empty>
iface.ipaddress = <empty>
iface.iscsi_ifacename = default
iface.net_ifacename = <empty>
iface.transport_name = tcp
iface.initiatorname = <empty>
node.discovery_address = 192.168.20.4
node.discovery_port = 3260
node.discovery_type = send_targets
node.session.initial_cmdsn = 0
node.session.initial_login_retry_max = 8
node.session.xmit_thread_priority = -20
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.auth.authmethod = None
node.session.auth.username = <empty>
node.session.auth.password = <empty>
node.session.auth.username_in = <empty>
node.session.auth.password_in = <empty>
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 20
node.session.err_timeo.host_reset_timeout = 60
node.session.iscsi.FastAbort = Yes
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.DefaultTime2Wait = 2
node.session.iscsi.MaxConnections = 1
node.session.iscsi.MaxOutstandingR2T = 1
node.session.iscsi.ERL = 0
node.conn[0].address = 192.168.20.4
node.conn[0].port = 3260
node.conn[0].startup = manual
node.conn[0].tcp.window_size = 524288
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.HeaderDigest = None
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No
# END RECORD
root@test-03:~#

I have found code in nova to do this kind of discovery, but it appears deprecated in favor of passing the target location directly through nova's messaging layer. I'm wondering if discovery should be attempted as a fall-back if the initiator fails to find records directly.
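The fallback suggested here could be sketched roughly as follows. This is only an illustration, not nova's code: `ensure_node_record`, `run`, and the `runner` parameter are hypothetical names, and the only iscsiadm invocations used are the two shown in the session above.

```python
import subprocess


def run(cmd):
    """Run a command and return (exit_code, stdout, stderr) without raising."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def ensure_node_record(target_iqn, portal, runner=run):
    """Return True once a node record for target_iqn at portal exists.

    Hypothetical helper: look the record up directly and, only if that
    finds nothing, send a sendtargets discovery and look again.
    """
    node_cmd = ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal]

    def has_record():
        code, out, _err = runner(node_cmd)
        # Check the output as well as the exit code, so the result does not
        # depend on how a particular iscsiadm build reports "no records".
        return code == 0 and ("node.name = " + target_iqn) in out

    if has_record():
        return True
    # Fall back to discovery, which populates the initiator's record database.
    runner(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal])
    return has_record()
```

Passing the command runner in as a parameter keeps the lookup logic testable on a machine without iscsiadm installed.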

Changed in nova:
assignee: nobody → Adam Gandelman (gandelman-a)
status: New → In Progress

Interesting. I didn't think discovery was necessary if you had the rest of the data already. Is there a way to work around this without using discovery? The fallback seems fine, but I was trying to deprecate discovery. I assume this is passing in devstack because the volume host is the same host as the compute host so the record exists.

Adam Gandelman (gandelman-a) wrote :

Vish-

I think this is the problem, from the libvirt volume driver:

        iscsi_properties = connection_info['data']
        try:
            # NOTE(vish): if we are on the same host as nova volume, the
            # discovery makes the target so we don't need to
            # run --op new
            self._run_iscsiadm(iscsi_properties, ())
        except exception.ProcessExecutionError:
            self._run_iscsiadm(iscsi_properties, ('--op', 'new'))

Unfortunately, iscsiadm returns 0 if no records are found, so an exception is never raised. Deploying a multi-node setup right now to test a fix.

Changed in nova (Ubuntu):
importance: Undecided → High
assignee: nobody → Adam Gandelman (gandelman-a)

Reviewed: https://review.openstack.org/3479
Committed: http://github.com/openstack/nova/commit/ea2c8c8b363dceb3c73be8f02f078d7b78b2c712
Submitter: Jenkins
Branch: master

commit ea2c8c8b363dceb3c73be8f02f078d7b78b2c712
Author: Adam Gandelman <email address hidden>
Date: Thu Jan 26 12:36:55 2012 -0800

    Fix multinode libvirt volume attachment lp #922232

    iscsiadm returns 0 if local db contains no target records. As a result,
    no exception is caught and no entry gets created (--op new) before continuing
    to login. Devstack/single-node users avoided this because, apparently, records
    are created in initiator db on target creations.

    Update: Address smokestack failures if err == None

    fixes bug #922232

    Change-Id: I39c3574b8d75ca32eba3716efc3b488e596fbaf6

Changed in nova:
status: In Progress → Fix Committed
Changed in nova (Ubuntu):
status: New → Fix Released
Roman Sokolkov (rsokolkov) wrote :

Hi! I have a problem with this fix. I use devstack on Ubuntu 11.10, and I need to discover targets manually.

(nova.rpc): TRACE: Traceback (most recent call last):
(nova.rpc): TRACE: File "/opt/stack/nova/nova/rpc/amqp.py", line 249, in _process_data
(nova.rpc): TRACE: rval = node_func(context=ctxt, **node_args)
(nova.rpc): TRACE: File "/opt/stack/nova/nova/exception.py", line 126, in wrapped
(nova.rpc): TRACE: return f(*args, **kw)
(nova.rpc): TRACE: File "/opt/stack/nova/nova/compute/manager.py", line 150, in decorated_function
(nova.rpc): TRACE: function(self, context, instance_uuid, *args, **kwargs)
(nova.rpc): TRACE: File "/opt/stack/nova/nova/compute/manager.py", line 173, in decorated_function
(nova.rpc): TRACE: self.add_instance_fault_from_exc(context, instance_uuid, e)
(nova.rpc): TRACE: File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
(nova.rpc): TRACE: self.gen.next()
(nova.rpc): TRACE: File "/opt/stack/nova/nova/compute/manager.py", line 168, in decorated_function
(nova.rpc): TRACE: return function(self, context, instance_uuid, *args, **kwargs)
(nova.rpc): TRACE: File "/opt/stack/nova/nova/compute/manager.py", line 1635, in attach_volume
(nova.rpc): TRACE: connector)
(nova.rpc): TRACE: File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
(nova.rpc): TRACE: self.gen.next()
(nova.rpc): TRACE: File "/opt/stack/nova/nova/compute/manager.py", line 1627, in attach_volume
(nova.rpc): TRACE: mountpoint)
(nova.rpc): TRACE: File "/opt/stack/nova/nova/exception.py", line 126, in wrapped
(nova.rpc): TRACE: return f(*args, **kw)
(nova.rpc): TRACE: File "/opt/stack/nova/nova/virt/libvirt/connection.py", line 454, in attach_volume
(nova.rpc): TRACE: mount_device)
(nova.rpc): TRACE: File "/opt/stack/nova/nova/virt/libvirt/connection.py", line 446, in volume_driver_method
(nova.rpc): TRACE: return method(connection_info, *args, **kwargs)
(nova.rpc): TRACE: File "/opt/stack/nova/nova/virt/libvirt/volume.py", line 112, in connect_volume
(nova.rpc): TRACE: (out, err) = self._run_iscsiadm(iscsi_properties, ())
(nova.rpc): TRACE: File "/opt/stack/nova/nova/virt/libvirt/volume.py", line 96, in _run_iscsiadm
(nova.rpc): TRACE: check_exit_code=check_exit_code)
(nova.rpc): TRACE: File "/opt/stack/nova/nova/utils.py", line 232, in execute
(nova.rpc): TRACE: cmd=' '.join(cmd))
(nova.rpc): TRACE: ProcessExecutionError: Unexpected error while running command.
(nova.rpc): TRACE: Command: sudo iscsiadm -m node -T iqn.2010-10.org.openstack:volume-00000009 -p 10.100.0.34:3260
(nova.rpc): TRACE: Exit code: 255
(nova.rpc): TRACE: Stdout: ''
(nova.rpc): TRACE: Stderr: 'iscsiadm: no records found!\n'
(nova.rpc): TRACE:

Vish Ishaya (vishvananda) wrote :

hmm, looks like that fix wasn't quite right. Let me try another one.

Changed in nova:
assignee: Adam Gandelman (gandelman-a) → Vish Ishaya (vishvananda)
status: Fix Committed → In Progress
Vish Ishaya (vishvananda) wrote :

Roman: can you see if the above fix works for you?

Brian Waldon (bcwaldon) on 2012-02-08
Changed in nova:
milestone: none → essex-4
importance: Undecided → Medium

Reviewed: https://review.openstack.org/3865
Committed: http://github.com/openstack/nova/commit/a933e3628ba8cc2fb985665a724799ee0a58aa16
Submitter: Jenkins
Branch: master

commit a933e3628ba8cc2fb985665a724799ee0a58aa16
Author: Vishvananda Ishaya <email address hidden>
Date: Tue Feb 7 11:23:59 2012 -0800

    Check return code instead of output for iscsiadm

     * iscsiadm returns 255 on no records
     * Refixes bug 922232

    Change-Id: If177c3c79c6ad974c2bed0ad72a62e956af451e0
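The approach this commit message describes, checking the exit code rather than the output, might look roughly like the sketch below. It is not the committed code: `ProcessExecutionError` here is a simplified stand-in for nova's exception, `ensure_record` and `run_iscsiadm` are hypothetical names, and the value 255 is taken from the commit message.

```python
class ProcessExecutionError(Exception):
    """Simplified stand-in for nova's exception of the same name."""

    def __init__(self, exit_code, stderr=""):
        super().__init__("exit code %s: %s" % (exit_code, stderr))
        self.exit_code = exit_code
        self.stderr = stderr


NO_RECORDS_EXIT_CODE = 255  # value taken from the commit message above


def ensure_record(run_iscsiadm, iscsi_properties):
    """Create the node record with --op new only when none exists."""
    try:
        # A plain lookup; succeeds when the record is already present.
        run_iscsiadm(iscsi_properties, ())
    except ProcessExecutionError as exc:
        if exc.exit_code != NO_RECORDS_EXIT_CODE:
            raise  # some other failure; do not mask it
        run_iscsiadm(iscsi_properties, ("--op", "new"))
```

Re-raising on any other exit code keeps unrelated failures, such as a missing binary or a permissions problem, visible instead of hiding them behind an attempt to create a record.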

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2012-02-29
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2012-04-05
Changed in nova:
milestone: essex-4 → 2012.1
Marcelo Dieder (mdieder) wrote :

I have the same problem. Is the fix OK?

On Essex (Ubuntu 12.04), I created a new compute node (multinode) and then tried to attach a volume to an instance.

I get this error on the compute node (/var/log/nova/nova-compute.log):

Stderr: 'iscsiadm: No portal found.\n', and manually:

# iscsiadm -m node -T iqn.2010-10.org.openstack:volume-00000007 -p 192.168.56.2:3260 --rescan
iscsiadm: No portal found.

But if I run a discovery...

root@cloud02:~# iscsiadm -m discovery -t st -p 192.168.56.2
192.168.156.66:3260,1 iqn.2010-10.org.openstack:volume-00000007
192.168.56.2:3260,1 iqn.2010-10.org.openstack:volume-00000006
10.1.0.4:3260,1 iqn.2010-10.org.openstack:volume-00000006
10.0.2.15:3260,1 iqn.2010-10.org.openstack:volume-00000006
192.168.122.1:3260,1 iqn.2010-10.org.openstack:volume-00000006
169.254.169.254:3260,1 iqn.2010-10.org.openstack:volume-00000006
192.168.56.2:3260,1 iqn.2010-10.org.openstack:volume-00000007
10.1.0.4:3260,1 iqn.2010-10.org.openstack:volume-00000007
10.0.2.15:3260,1 iqn.2010-10.org.openstack:volume-00000007
192.168.122.1:3260,1 iqn.2010-10.org.openstack:volume-00000007
169.254.169.254:3260,1 iqn.2010-10.org.openstack:volume-00000007

and then a login:

root@cloud02:~# iscsiadm -m node -T iqn.2010-10.org.openstack:volume-00000007 -p 192.168.56.2:3260 --login
Logging in to [iface: default, target: iqn.2010-10.org.openstack:volume-00000007, portal: 192.168.56.2,3260]
Login to [iface: default, target: iqn.2010-10.org.openstack:volume-00000007, portal: 192.168.56.2,3260]: successful

and rescan:

root@cloud02:~# iscsiadm -m node -T iqn.2010-10.org.openstack:volume-00000007 -p 192.168.56.2:3260 --rescan
Rescanning session [sid: 1, target: iqn.2010-10.org.openstack:volume-00000007, portal: 192.168.56.2,3260]

and now the volume is attached to the instance!

It seems that logging in to a new node is not automatic?

Vish Ishaya (vishvananda) wrote :

Do you have iscsi_ip_address set on the volume host? It needs to be an IP address that the compute hosts can reach. When the volume is created, this IP is stored in the database as connection information, so if you don't have it set, you will probably need to set it, restart nova-volume, and recreate the volumes before attaching.
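One way to see which portals a target is actually advertised on, and so whether the address stored for a volume is one the compute host can use, is to group the sendtargets output by target. A minimal sketch, assuming the "portal,tpgt iqn" line format shown in the discovery output earlier in this thread; `parse_sendtargets` is a hypothetical helper, not part of nova or iscsiadm.

```python
from collections import defaultdict


def parse_sendtargets(output):
    """Map each target IQN to the list of portals that advertise it."""
    portals_by_target = defaultdict(list)
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2 or "," not in parts[0]:
            continue  # skip blank or unexpected lines
        portal = parts[0].rsplit(",", 1)[0]  # drop the ",<tpgt>" suffix
        portals_by_target[parts[1]].append(portal)
    return dict(portals_by_target)
```

Comparing the portal nova passes to iscsiadm against this mapping on the compute host shows quickly whether the stored address is among the advertised ones.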
