Cinder doesn't restore LIO target ACLs after the Cinder node reboots

Bug #1536248 reported by Oleg Borisenko
This bug affects 2 people
Affects: Cinder
Status: Fix Released
Importance: High
Assigned to: Mitsuhiro Tanino

Bug Description

The problem:

If the Cinder node reboots or crashes, Cinder doesn't restore the LIO target ACL rules for existing targets. This leads to:
1) On the Cinder node, targetcli reports, for example:
o- iqn.2010-10.org.openstack:volume-9b86b9d7-6db6-4a14-afac-a4bd6f1cc6c5 ............................................... [1 TPG]
  | o- tpgt1 ........................................................................................................... [enabled]
  | o- acls ........................................................................................................... [0 ACLs]
  | o- luns ............................................................................................................ [1 LUN]
  | | o- lun0 [iblock/iqn.2010-10.org.openstack:volume-9b86b9d7-6db6-4a14-afac-a4bd6f1cc6c5 (/dev/mediumgroup/volume-9b86b9d7-6db6-4a14-afac-a4bd6f1cc6c5)]
  | o- portals ...................................................................................................... [1 Portal]
  | o- 10.10.21.32:3260 .................................................................................. [OK, iser disabled]

[1 ACL] is expected instead of [0 ACLs], because that volume was in use at the moment of the reboot.
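To spot this symptom without targetcli, the saved configuration can be inspected directly. The following is a minimal sketch, assuming the rtslib-style saveconfig.json layout (a top-level "targets" list whose entries carry "wwn" and "tpgs", each TPG holding a "node_acls" list); the helper names are hypothetical. A target with zero ACLs is only suspicious if its volume was in use, since a detached volume legitimately has none.

```python
import json


def load_saveconfig(path):
    """Load an LIO saveconfig.json file into a dict."""
    with open(path) as f:
        return json.load(f)


def acl_counts(config):
    """Map each target IQN to the number of node ACLs across all its TPGs."""
    counts = {}
    for target in config.get("targets", []):
        counts[target["wwn"]] = sum(
            len(tpg.get("node_acls", [])) for tpg in target.get("tpgs", [])
        )
    return counts


def targets_without_acls(config):
    """Return the sorted IQNs of targets that have no ACLs at all."""
    return sorted(wwn for wwn, n in acl_counts(config).items() if n == 0)
```

For example, `targets_without_acls(load_saveconfig("/etc/target/saveconfig.json"))` lists candidate IQNs, which can then be cross-checked against the volumes Cinder reports as in-use.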

The following message is also repeated in syslog every second:
iSCSI Initiator Node: iqn.2015-12.ru.ispras.cloud:01:a34486dd48c is not authorized to access iSCSI target portal group

2) On the Compute node that hosts this VM, syslog shows:
iscsid: conn 0 login rejected: initiator failed authorization with target

3) On the VM that uses the volume, every attempt to access the volume fails with an I/O error.

The environment:
OpenStack multinode setup; each component is set up from the git branch stable/liberty (the last Cinder commit is 6d0981b252835a525149714e39cc1e4ee7568315).

Cinder host: Ubuntu 14.04, Python 2.7.6, Cinder from stable/liberty (6d0981b252835a525149714e39cc1e4ee7568315), kernel 3.19.0-47-generic, targetcli 2.1-1 from the distribution repository, python-rtslib 2.2-1 from the distribution repository.

Configuration section for these volumes:
[fast-1]
volume_group=fastgroup
volume_driver=cinder.volume.drivers.lvm.LVMVolumeDriver
volume_backend_name=LVM_iSCSI_fast
iscsi_protocol = iscsi
iscsi_helper = lioadm
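Only backends with `iscsi_helper = lioadm` go through the LIO code path discussed in this report. A minimal Python 3 sketch for checking a backend section of cinder.conf; `uses_lio` is a hypothetical helper, not part of Cinder. Because configparser falls back to [DEFAULT], a helper set globally there is also detected.

```python
import configparser


def uses_lio(conf_text, backend):
    """Return True if the given backend section resolves iscsi_helper to lioadm."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section(backend):
        return False
    # get() also consults [DEFAULT] when the option is not set in the section.
    return parser.get(backend, "iscsi_helper", fallback="").strip() == "lioadm"
```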

Steps to reproduce:
1) Install and set up Cinder from the stable/liberty GitHub branch with the [fast-1] configuration section and LIO.
2) Create a volume and attach it to any VM.
3) Simulate a 'crash': reboot the Cinder node without stopping the VM.
4) After the reboot, start Cinder and look for errors and the missing ACL (the volume is no longer usable).

The clues:
1) After volume attachment, /etc/target/saveconfig.json seems to be correct. After the reboot and Cinder restart, it becomes incorrect (loses ACLs).
2) This may be related to these patches:
the first one: https://github.com/openstack/cinder/commit/6686b896d7ef5e11f1f277ded008d153edc183a8, and the second one: https://github.com/openstack/cinder/commit/706878deaaa5acbb8a577942a5a774c58fe16332

The workaround:
Nevertheless, there is a workaround (but it seems painful). If you use "sudo targetctl save backup/before-reboot.json" and "sudo targetctl restore backup/before-reboot.json" (just in case: by targetctl I mean the tool that is in Cinder itself), everything seems to be restored correctly. However, this doesn't protect us from Cinder node crashes, and the save has to be done with cron or something similar (on a schedule rather than on actual Cinder events).
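A scheduled backup along the lines of this workaround could look like the following minimal sketch. It only wraps the `targetctl save` command quoted in this report; the backup directory, file naming scheme, and helper names are assumptions. Writing a new timestamped file on each run, rather than overwriting a single file, means a snapshot taken after a bad restart does not clobber the last good one.

```python
import datetime
import subprocess


def backup_command(backup_dir, now=None):
    """Build a `targetctl save` command that writes to a timestamped file."""
    now = now or datetime.datetime.now()
    path = "%s/lio-%s.json" % (backup_dir.rstrip("/"), now.strftime("%Y%m%d-%H%M%S"))
    return ["sudo", "targetctl", "save", path]


def run_backup(backup_dir, dry_run=True):
    """Run the backup (or only return the command when dry_run is True)."""
    cmd = backup_command(backup_dir)
    if not dry_run:
        subprocess.check_call(cmd)  # raises CalledProcessError on failure
    return cmd
```

Invoked periodically (for example from cron) with `dry_run=False`, this keeps a history of snapshots to choose from with `sudo targetctl restore <file>`.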

Targetcli configs (IMPORTANT):
1) Right after volume creation and attachment: https://gist.github.com/anonymous/5f1295270c36d19df9d7
2) After Cinder node reboot and Cinder restart: https://gist.github.com/anonymous/0f110ce214359e41b871
3) Diff between them (this is where the bug actually shows up): https://gist.github.com/anonymous/bc41f4e51b46e8426c79

NOTE: all the LIO configs already contain broken volumes, so look at the diff file first.
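The same comparison can be computed from two saveconfig.json snapshots, which shows exactly which initiators lost access to which targets. A minimal sketch, assuming the rtslib-style layout in which each TPG has a "node_acls" list whose items carry a "node_wwn" key; `lost_acls` is a hypothetical helper.

```python
def lost_acls(before, after):
    """Return {target_iqn: [initiator_iqns]} for ACLs in `before` missing from `after`."""

    def acl_map(config):
        # Collect the initiator IQNs allowed on each target, across all TPGs.
        result = {}
        for target in config.get("targets", []):
            initiators = set()
            for tpg in target.get("tpgs", []):
                for acl in tpg.get("node_acls", []):
                    initiators.add(acl["node_wwn"])
            result[target["wwn"]] = initiators
        return result

    old, new = acl_map(before), acl_map(after)
    missing = {}
    for wwn, initiators in old.items():
        gone = initiators - new.get(wwn, set())
        if gone:
            missing[wwn] = sorted(gone)
    return missing
```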

Mitsuhiro Tanino (mitsuhiro-tanino) wrote :

Let me try to reproduce and investigate.

Changed in cinder:
assignee: nobody → Mitsuhiro Tanino (mitsuhiro-tanino)
Mitsuhiro Tanino (mitsuhiro-tanino) wrote :

From my investigation, cinder-rtstool has a bug: the tool cannot set the user name and password passed on the command line.

When the cinder service is restarted, ensure_export() recreates the iSCSI target using the original user ID and password stored in the DB. In this case, I confirmed that ensure_export() properly passed the original user ID and password to create_iscsi_target() in lio.py, but the created iSCSI target no longer has an ACL.

After the iSCSI target is recreated, this configuration is saved to /etc/targetcli/saveconfig.json; therefore, the saved configuration has no ACL information.

I will try to debug cinder-rtstool.

Changed in cinder:
importance: Undecided → Medium
status: New → Confirmed
Mitsuhiro Tanino (mitsuhiro-tanino) wrote :

After further investigation, this is not a bug, but we need to know the recovery steps for a c-vol node crash with lioadm.

From the commit message in https://github.com/openstack/cinder/commit/706878deaaa5acbb8a577942a5a774c58fe16332,
I think RHEL has a "target.service", and this service restores /etc/targetcli/saveconfig.json after the server reboots. In that case, I expect the LIO iSCSI configuration to be restored automatically.

On a non-RHEL OS, the existing LIO configuration has to be restored by hand (or by a boot script).
By default, Cinder automatically saves the LIO iSCSI configuration to /etc/targetcli/saveconfig.json,
so we can use that file.

So the correct recovery steps after a c-vol node crash are:
(1) Reboot the server
(2) Restore /etc/targetcli/saveconfig.json
       sudo targetctl restore /etc/targetcli/saveconfig.json
(3) Start c-vol service

After that, the iSCSI connections from iSCSI clients will be recovered.
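The ordering of these steps matters: the saved configuration must be restored before c-vol starts, because c-vol saves the LIO configuration again on startup (without ACLs if it had to recreate the target). A minimal sketch of that sequence; the service name `cinder-volume` and the use of the `service` command are assumptions that depend on the distribution and on how Cinder was installed.

```python
import subprocess

SAVECONFIG = "/etc/targetcli/saveconfig.json"  # path used in the steps above


def recovery_plan(saveconfig=SAVECONFIG, volume_service="cinder-volume"):
    """Return the post-reboot commands in order: restore LIO config, then start c-vol."""
    return [
        ["sudo", "targetctl", "restore", saveconfig],
        ["sudo", "service", volume_service, "start"],  # service name is an assumption
    ]


def recover(dry_run=True):
    """Print (dry run) or execute the plan, stopping at the first failing command."""
    plan = recovery_plan()
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)
    return plan
```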

Changed in cinder:
status: Confirmed → Won't Fix
importance: Medium → Undecided
status: Won't Fix → Invalid
Oleg Borisenko (al-foo) wrote :

Mitsuhiro, are you sure this is not a bug? I already described the solution you propose in the section "The workaround".

But it seems rather dangerous: if anyone forgets to restore the config by hand, Cinder saves the "wrong" configuration and overwrites the "correct" backup (which is really not a good approach). Wouldn't it be smarter to restore the "correct" backup in the first place?

Mitsuhiro Tanino (mitsuhiro-tanino) wrote :

Thank you for your reply. Let me reconsider.

Changed in cinder:
status: Invalid → New
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/271424

Changed in cinder:
status: New → In Progress
Mitsuhiro Tanino (mitsuhiro-tanino) wrote :

Could you check whether the patch works for you?

Jay Bryant (jsbryant)
Changed in cinder:
importance: Undecided → High
Oleg Borisenko (al-foo) wrote :

Thank you for the patch; the code looks OK. Unfortunately, I will only be able to try it on Monday. I'll write back as soon as possible.

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/271424
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=5cec4056eb061004b400c1dc5b946bf890b4bab0
Submitter: Jenkins
Branch: master

commit 5cec4056eb061004b400c1dc5b946bf890b4bab0
Author: Mitsuhiro Tanino <email address hidden>
Date: Fri Jan 22 11:31:25 2016 -0500

    [LVM] Restore target config during ensure_export

    If server crash or reboot happened, LIO target configuration
    will be initialized after boot up a server. Currently, Cinder
    has a functionality to save LIO target configuration to save
    file at several checkpoint.

    If LIO target service is configured properly, LIO target
    configuration is restored by this service during boot up
    a server, but if not, existing in-use volumes would become
    inconsistent status after c-vol service starts.

    If there is no iSCSI target configuration during
    ensure_export, LIO driver should restore the saved
    configuration file to avoid the problem.

    Closes-Bug: #1536248
    Change-Id: I74d300ba26a08b6f423f5ed3e13495b73cfbbd52
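The decision described in this commit message can be modeled as follows. This is a simplified, hypothetical sketch, not Cinder's actual lio.py code: `FakeLio` stands in for the kernel LIO state plus saveconfig.json, and `ensure_export` only models the branch added by the fix (restore the saved file when no targets exist) instead of recreating targets from the DB, which is what lost the ACLs.

```python
class FakeLio:
    """In-memory stand-in for kernel LIO state plus its saved config file."""

    def __init__(self, saved):
        self.saved = saved  # saveconfig.json contents: {target_iqn: [initiator_iqns]}
        self.live = {}      # kernel state; empty right after a reboot

    def list_targets(self):
        return list(self.live)

    def restore_saved_config(self):
        # Restoring the file brings back targets together with their ACLs.
        self.live = {iqn: list(acls) for iqn, acls in self.saved.items()}


def ensure_export(lio):
    """Restore the saved config if LIO has no targets; otherwise leave them alone."""
    if lio.list_targets():
        return "skipped"  # e.g. already restored at boot by target.service
    lio.restore_saved_config()
    return "restored"
```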

Changed in cinder:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/283085

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/cinder 8.0.0.0b3

This issue was fixed in the openstack/cinder 8.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/liberty)

Reviewed: https://review.openstack.org/283085
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=f5ec1be2d8bdecb0b94bfb298b03bb1e49ee8cc4
Submitter: Jenkins
Branch: stable/liberty

commit f5ec1be2d8bdecb0b94bfb298b03bb1e49ee8cc4
Author: Mitsuhiro Tanino <email address hidden>
Date: Fri Jan 22 11:31:25 2016 -0500

    [LVM] Restore target config during ensure_export

    If server crash or reboot happened, LIO target configuration
    will be initialized after boot up a server. Currently, Cinder
    has a functionality to save LIO target configuration to save
    file at several checkpoint.

    If LIO target service is configured properly, LIO target
    configuration is restored by this service during boot up
    a server, but if not, existing in-use volumes would become
    inconsistent status after c-vol service starts.

    If there is no iSCSI target configuration during
    ensure_export, LIO driver should restore the saved
    configuration file to avoid the problem.

    Closes-Bug: #1536248
    Change-Id: I74d300ba26a08b6f423f5ed3e13495b73cfbbd52
    (cherry picked from commit 5cec4056eb061004b400c1dc5b946bf890b4bab0)

tags: added: in-stable-liberty
Oleg Borisenko (al-foo) wrote :

Thank you Mitsuhiro!

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/cinder 7.0.2

This issue was fixed in the openstack/cinder 7.0.2 release.

OpenStack Infra (hudson-openstack) wrote :

This issue was fixed in the openstack/cinder 7.0.2 release.
