verify_iscsi_connection causes 'No such file or directory'...

Bug #1417307 reported by Dan Prince
This bug affects 1 person
Affects  Status        Importance  Assigned to   Milestone
Ironic   Fix Released  Critical    Chris Krelle

Bug Description

As of 46067160e6787631fb5a55193a02794449f22ebf I now get iscsi detection errors on all of my deployments. I'm seeing the following from my openstack-ironic-conductor logs on Fedora 20:

-- Logs begin at Mon 2015-02-02 20:38:20 UTC, end at Mon 2015-02-02 22:43:09 UTC. --
Feb 02 20:39:30 localhost systemd[1]: Starting OpenStack Ironic Conductor service...
Feb 02 20:39:30 localhost systemd[1]: Started OpenStack Ironic Conductor service.
Feb 02 20:40:20 localhost ironic-conductor[4139]: 2015-02-02 20:40:20.383 4139 WARNING ironic.conductor.utils [-] Not going to change_node_power_state because current state = requested state = 'power off'.
Feb 02 20:44:04 localhost sudo[5621]: ironic : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/ironic-rootwrap /etc/ironic/rootwrap.conf iscsiadm -m discovery -t st -p 172.19.0.3:3260
Feb 02 20:44:04 localhost sudo[5637]: ironic : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/ironic-rootwrap /etc/ironic/rootwrap.conf iscsiadm -m node -p 172.19.0.3:3260 -T iqn-c30a1850-384a-48a8-a550-2bdacbb48cdd --login
Feb 02 20:44:05 localhost sudo[5649]: ironic : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/ironic-rootwrap /etc/ironic/rootwrap.conf iscsiadm -m node -S
Feb 02 20:44:05 localhost sudo[5654]: ironic : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/ironic-rootwrap /etc/ironic/rootwrap.conf iscsiadm -m node -T iqn-c30a1850-384a-48a8-a550-2bdacbb48cdd -R
Feb 02 20:44:05 localhost sudo[5658]: ironic : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/ironic-rootwrap /etc/ironic/rootwrap.conf iscsiadm -m node -p 172.19.0.3:3260 -T iqn-c30a1850-384a-48a8-a550-2bdacbb48cdd --logout
Feb 02 20:44:06 localhost sudo[5661]: ironic : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/ironic-rootwrap /etc/ironic/rootwrap.conf iscsiadm -m node -p 172.19.0.3:3260 -T iqn-c30a1850-384a-48a8-a550-2bdacbb48cdd -o delete
Feb 02 20:44:06 localhost ironic-conductor[4139]: 2015-02-02 20:44:06.113 4139 ERROR ironic.drivers.modules.iscsi_deploy [-] Deploy failed for instance 1ee26037-a284-47c1-9fc2-c1c61d362f6e. Error: [Errno 2] No such file or directory: '/dev/disk/by-path/ip-172.19.0.3:3260-iscsi-iqn-c30a1850-384a-48a8-a550-2bdacbb48cdd-lun-1'

----

I've confirmed that reverting this single commit fixes my issues.

Dan Prince (dan-prince)
Changed in ironic:
assignee: nobody → Dan Prince (dan-prince)
importance: Undecided → Critical
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/152328

Dan Prince (dan-prince) wrote :

Yeah. So actually just adding back in the sleep fixes everything. So it appears the new verification functions aren't quite doing what we need yet... at least not on Fedora.

OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/152328
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=53a93ad49d5ffcd1ae76747ca826676e4c4ef5cb
Submitter: Jenkins
Branch: master

commit 53a93ad49d5ffcd1ae76747ca826676e4c4ef5cb
Author: Dan Prince <email address hidden>
Date: Mon Feb 2 17:58:05 2015 -0500

    Partial revert of 4606716 until we debug further

    This patch partially reverts 46067160e6787631fb5a55193a02794449f22ebf
    which breaks deployment on some distros (Fedora). Using this
    patch fixes deployment when trying to deploy on a real baremetal
    undercloud using Fedora (my dev environment).

    Not doing a full revert because this patch was large and it now
    causes conflicts to revert cleanly.

    Partial bug: 1417307

    Change-Id: I2a4f5c4f6ef22092ca97734d02b54838f38b0602

Chris Krelle (nobodycam) wrote :

Dan, while poking around I found this old Nova bare metal patch from Derek Higgins ( https://review.openstack.org/#/c/121155 ). That patch sets the sleep time to 10 seconds, so I was wondering whether reverting to 3 seconds is enough. This is the exact same code as was in Nova bare metal.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/152734

Changed in ironic:
assignee: Dan Prince (dan-prince) → Chris Krelle (nobodycam)
aeva black (tenbrae)
Changed in ironic:
milestone: none → kilo-2
Thierry Carrez (ttx)
Changed in ironic:
milestone: kilo-2 → kilo-3
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/152734
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=235641fc4c0bb2864bc9b97553865dac3c4b6b1d
Submitter: Jenkins
Branch: master

commit 235641fc4c0bb2864bc9b97553865dac3c4b6b1d
Author: Chris Krelle <email address hidden>
Date: Wed Feb 4 09:33:38 2015 -0800

    improve iSCSI connection check

    This patch improves the iSCSI connection verification by polling
    /dev/disk/by-path for the iSCSI connection. This should ensure that
    the iSCSI block device is seen by the file system after login.

    It also removes the hard coded sleep in the login method.

    Change-Id: Ibf6b6a15471a6130f33788b8c3f26d4577bf79a3
    Closes-bug: #1417307
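
For illustration only, the polling described in the commit message above might look roughly like the sketch below. This is not the code merged in 235641f: the function names, the attempts/delay parameters, and the injectable exists/sleep hooks are all assumptions made for the example. The by-path name format is copied from the error in the bug description.

```python
import os
import time


def iscsi_device_path(portal_address, portal_port, target_iqn, lun):
    """Build the /dev/disk/by-path name seen in the error message above.

    Hypothetical helper; the format string mirrors the failing path
    reported in the bug description.
    """
    return "/dev/disk/by-path/ip-%s:%s-iscsi-%s-lun-%s" % (
        portal_address, portal_port, target_iqn, lun)


def wait_for_device(path, attempts=3, delay=1.0,
                    exists=os.path.exists, sleep=time.sleep):
    """Poll until `path` exists instead of sleeping a fixed time.

    Returns True as soon as the path shows up and False once all
    attempts are used. `attempts` and `delay` are assumed knobs, not
    Ironic option names.
    """
    for attempt in range(attempts):
        if exists(path):
            return True
        # Do not sleep after the final failed check.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The point of polling rather than a hard-coded sleep is that a fast host returns immediately, while a slow one (such as the Fedora setup reported here) gets several chances before the deploy is failed.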

Changed in ironic:
status: In Progress → Fix Committed
Chris Krelle (nobodycam) wrote :

I have encountered this error again in testing. After debugging, it was found that the "is_block_device" function in deploy_utils was causing the error. Wrapping the os.stat call in retry logic fixed the error in testing.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/156888

OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/156888
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=271c773e7474c7e8e84e2def92afe17a5c2c3b3c
Submitter: Jenkins
Branch: master

commit 271c773e7474c7e8e84e2def92afe17a5c2c3b3c
Author: Chris Krelle <email address hidden>
Date: Tue Feb 17 21:28:03 2015 -0800

    add retry logic to is_block_device function

    In testing, the os.stat command was throwing an unhandled exception.
    This patch adds a try block around the os.stat command and, if it is
    unable to execute without error in "CONF.deploy.iscsi_verify_attempts"
    attempts, it raises InstanceDeployFailure.

    Change-Id: Ibf5483435b02fab64bd4f0d368326b9ecbb4cc07
    Closes-bug: #1417307
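
As a rough sketch of the retry logic this commit message describes (not the merged code), wrapping os.stat could look like the following. The attempts parameter stands in for CONF.deploy.iscsi_verify_attempts; the delay parameter, the stat_fn/sleep hooks, and the local InstanceDeployFailure class are assumptions added so the example is self-contained.

```python
import os
import stat
import time


class InstanceDeployFailure(Exception):
    """Local stand-in for the Ironic exception named in the commit."""


def is_block_device(dev, attempts=3, delay=1.0,
                    stat_fn=os.stat, sleep=time.sleep):
    """Return True if `dev` is a block device, retrying os.stat.

    os.stat raises OSError (errno 2 in this bug) while the by-path
    link has not appeared yet, so the call is retried up to
    `attempts` times before the deploy is failed.
    """
    for attempt in range(attempts):
        try:
            st = stat_fn(dev)
        except OSError:
            # Device node not there yet; wait unless this was the last try.
            if attempt < attempts - 1:
                sleep(delay)
        else:
            return stat.S_ISBLK(st.st_mode)
    raise InstanceDeployFailure(
        "Unable to stat device %s after %d attempts" % (dev, attempts))
```

Raising a deploy-specific exception after the last attempt turns the bare "[Errno 2] No such file or directory" from the original log into an error that names the device and the number of attempts made.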

Thierry Carrez (ttx)
Changed in ironic:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in ironic:
milestone: kilo-3 → 2015.1.0