[bionic][azure] fence_scsi unable to unfence from another node

Bug #1864419 reported by Rafael David Tinoco
This bug affects 1 person
Affects                  Status    Importance  Assigned to  Milestone
fence-agents (Ubuntu)    Invalid   Undecided   Unassigned
  Bionic                 Invalid   Undecided   Unassigned

Bug Description

While testing fence_scsi with Azure's new shared disk feature, I found that I can register a key and acquire a reservation with that key on the host the key was generated for. However, when I try to unregister that key from another host, I get errors.

Because of this, I can't use the fence_scsi agent for fencing in Microsoft Azure.

-------------------------------

rafaeldtinoco@clubionic01:~$ cat /etc/fence_scsi.key
3abe0000

#### Registering a node into the shared disk manually:

rafaeldtinoco@clubionic01:~$ sudo fence_scsi --verbose -n 10.250.3.10 -d /dev/sdc -k 3abe0000 -D debug.log -o on
Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdc
0 PR generation=0x8, there are NO registered reservation keys
No registration for key 3abe0000 on device /dev/sdc

Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_persist -n -o -I -S 3abe0000 -d /dev/sdc
0

Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdc
0 PR generation=0x9, 1 registered reservation key follows:
    0x3abe0000

Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_persist -n -i -r -d /dev/sdc
0 PR generation=0x9, there is NO reservation held

Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_persist -n -o -R -T 5 -K 3abe0000 -d /dev/sdc
0

Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdc
0 PR generation=0x9, 1 registered reservation key follows:
    0x3abe0000

Success: Powered ON
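
The transcript above checks registration by reading the output of `sg_persist -n -i -k`. A minimal sketch of how that output could be parsed, assuming the text format shown in this report (the function name `parse_registered_keys` is hypothetical and is not part of fence_scsi):

```python
import re
from typing import List, Tuple


def parse_registered_keys(output: str) -> Tuple[int, List[int]]:
    """Return (PR generation, registered keys) from `sg_persist -n -i -k` output.

    Assumes the text layout seen in this bug report; other sg3-utils
    versions may format the output differently.
    """
    gen_match = re.search(r"PR generation=(0x[0-9a-fA-F]+)", output)
    if gen_match is None:
        raise ValueError("no 'PR generation=' field found")
    generation = int(gen_match.group(1), 16)
    # Keys are the hex tokens after the generation field, either on their
    # own indented lines or on the same line as "... key follows:".
    rest = output[gen_match.end():]
    keys = [int(tok, 16) for tok in re.findall(r"0x[0-9a-fA-F]+", rest)]
    return generation, keys
```

Comparing the generation before and after a register call (0x8 and then 0x9 above) is one way to confirm that the device accepted the change.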

#### Trying to un-register from the same node HAS TO FAIL (fence_scsi does not allow it)

rafaeldtinoco@clubionic01:~$ sudo fence_scsi --verbose -n 10.250.3.10 -d /dev/sdc -k 3abe0000 -D debug.log -o off
Delay 0 second(s) before logging in to the fence device

Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdc
0 PR generation=0x9, 1 registered reservation key follows:
    0x3abe0000

Failed: keys cannot be same. You can not fence yourself.
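
The refusal above comes from comparing the local node's key with the key being fenced. A rough sketch of such a guard, assuming keys are hex strings that may or may not carry a `0x` prefix (both helper names are hypothetical; this is not fence_scsi's actual code):

```python
def normalize_key(key: str) -> int:
    """Parse a reservation key written in hex, with or without a 0x prefix."""
    return int(key.strip(), 16)


def would_fence_self(local_key: str, target_key: str) -> bool:
    """True when the key to be removed is the local node's own key."""
    return normalize_key(local_key) == normalize_key(target_key)
```

Comparing the numeric values rather than the raw strings avoids a mismatch between `3abe0000` (as stored in /etc/fence_scsi.key) and `0x3abe0000` (as printed by sg_persist).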

#### Trying to un-register from another node (has to succeed, that is the fencing purpose!)

rafaeldtinoco@clubionic02:~$ sudo fence_scsi --verbose -n 10.250.3.10 -d /dev/sdc -k 3abe0000 -D debug.log -o off
Delay 0 second(s) before logging in to the fence device

Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdc
0 PR generation=0x9, 1 registered reservation key follows:
    0x3abe0000

Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdc
0 PR generation=0x9, 1 registered reservation key follows:
    0x3abe0000

Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_persist -n -o -A -T 5 -K 62ed0001 -S 3abe0000 -d /dev/sdc
99 persistent reserve out: transport: Host_status=0x07 [DID_ERROR]
Driver_status=0x00 [DRIVER_OK]

PR out (Preempt and abort): Sense category: -1, try '-v' option for more information

Executing: /usr/bin/sg_turs /dev/sdc
0

Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdc
0 PR generation=0x9, 1 registered reservation key follows:
    0x3abe0000

Failed to remove key 3abe0000 on device /dev/sdc

Failed to verify 1 device(s)
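
For reference, the three PERSISTENT RESERVE OUT invocations that appear in the logs above can be rebuilt as argument lists. This is a sketch that only mirrors the command lines as logged; the function names are hypothetical, and the option meanings in the comments follow my reading of the sg_persist man page:

```python
from typing import List

SG_PERSIST = "/usr/bin/sg_persist"  # path as shown in the logs


def register_cmd(device: str, key: str) -> List[str]:
    # -o: PR OUT, -I: register and ignore existing key, -S: service action key
    return [SG_PERSIST, "-n", "-o", "-I", "-S", key, "-d", device]


def reserve_cmd(device: str, key: str) -> List[str]:
    # -R: reserve, -T 5: write exclusive registrants only, -K: our registered key
    return [SG_PERSIST, "-n", "-o", "-R", "-T", "5", "-K", key, "-d", device]


def preempt_abort_cmd(device: str, own_key: str, victim_key: str) -> List[str]:
    # -A: preempt and abort; -K is the fencing node's key, -S the key to remove
    return [SG_PERSIST, "-n", "-o", "-A", "-T", "5",
            "-K", own_key, "-S", victim_key, "-d", device]
```

Note that in the failing call the `-K` key (62ed0001) does not appear in the list of registered keys printed just before it, which may be worth checking when reproducing the problem.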

-------------------------

I also found that, on the SAME node (the one whose key is in place), I am able to release the reservation and unregister the node's key:

rafaeldtinoco@clubionic01:~$ sudo sg_persist --out --release --param-rk=0x3abe0000 --prout-type=5 /dev/sdc
  Msft Virtual Disk 1.0
  Peripheral device type: disk

rafaeldtinoco@clubionic01:~$ sudo sg_persist --out --register --param-rk=0x3abe0000 --prout-type=5 /dev/sdc
  Msft Virtual Disk 1.0
  Peripheral device type: disk

-------------------------

But from a different node, after the reservation has been set by the first node, I CANNOT release the reservation or unregister that node's key:

rafaeldtinoco@clubionic02:~$ sudo sg_persist --in --read-keys --device=/dev/sdc
  Msft Virtual Disk 1.0
  Peripheral device type: disk
  PR generation=0x9, 1 registered reservation key follows:
    0x3abe0000

rafaeldtinoco@clubionic02:~$ sudo sg_persist --out --release --param-rk=0x3abe0000 --prout-type=5 /dev/sdc
  Msft Virtual Disk 1.0
  Peripheral device type: disk
persistent reserve out: transport: Host_status=0x07 [DID_ERROR]
Driver_status=0x00 [DRIVER_OK]

rafaeldtinoco@clubionic02:~$ sudo sg_persist --out --register --device=/dev/sdc --param-rk=0x3abe0000
  Msft Virtual Disk 1.0
  Peripheral device type: disk
persistent reserve out: transport: Host_status=0x07 [DID_ERROR]
Driver_status=0x00 [DRIVER_OK]

Changed in fence-agents (Ubuntu):
importance: Undecided → High
importance: High → Undecided
Changed in fence-agents (Ubuntu Bionic):
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Rafael David Tinoco (rafaeldtinoco) wrote :

Creating an easier reproducer:

Node 01:

rafaeldtinoco@clubionic01:~$ sudo sg_persist --out --register --param-sark=3abe0000 /dev/sdc
  Msft Virtual Disk 1.0
  Peripheral device type: disk

rafaeldtinoco@clubionic01:~$ sudo sg_persist --out --reserve --param-rk=3abe0000 --prout-type=5 /dev/sdc
  Msft Virtual Disk 1.0
  Peripheral device type: disk

rafaeldtinoco@clubionic01:~$ sudo sg_persist -r /dev/sdc
  Msft Virtual Disk 1.0
  Peripheral device type: disk
  PR generation=0xb, Reservation follows:
    Key=0x3abe0000
    scope: LU_SCOPE, type: Write Exclusive, registrants only
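
The reservation holder can be extracted from the `sg_persist -r` output shown above. A minimal sketch, assuming the exact text layout printed in this report (`parse_reservation` is a hypothetical helper name):

```python
import re
from typing import Dict, Optional


def parse_reservation(output: str) -> Optional[Dict[str, str]]:
    """Return key, scope and type of the current reservation, or None."""
    if "NO reservation held" in output:
        return None
    key = re.search(r"Key=(0x[0-9a-fA-F]+)", output)
    detail = re.search(r"scope:\s*(\w+),\s*type:\s*(.+)", output)
    if key is None or detail is None:
        raise ValueError("unrecognized reservation output")
    return {
        "key": key.group(1),
        "scope": detail.group(1),
        "type": detail.group(2).strip(),
    }
```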

----

Node 02 (has to be able to remove node 01's reservation):

rafaeldtinoco@clubionic02:~$ sudo sg_persist -v --out --release --param-rk=3abe0000 --prout-type=5 /dev/sdc
    inquiry cdb: 12 00 00 00 24 00
  Msft Virtual Disk 1.0
  Peripheral device type: disk
    Persistent Reservation Out cmd: 5f 02 05 00 00 00 00 00 18 00
persistent reserve out: transport: Host_status=0x07 [DID_ERROR]
Driver_status=0x00 [DRIVER_OK]

PR out (Release): Sense category: -1

The command fails: the transport reports Host_status=0x07 [DID_ERROR], and sg_persist shows no usable sense category (-1).
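
The failing output above carries a host status code rather than a decoded sense category. A small sketch for pulling that code out of the sg_persist output, assuming the line format shown in this report; the name table is a subset of the Linux SCSI host byte codes as I understand them, and `parse_transport_error` is a hypothetical helper:

```python
import re
from typing import Optional, Tuple

# Subset of Linux SCSI host byte codes (assumed from the kernel headers).
HOST_STATUS = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def parse_transport_error(output: str) -> Optional[Tuple[int, str]]:
    """Return (code, name) if the output reports a Host_status, else None."""
    match = re.search(r"Host_status=(0x[0-9a-fA-F]+)", output)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return code, HOST_STATUS.get(code, "UNKNOWN")
```

A host-level status like this is reported by the initiator side, so it may be worth distinguishing it from a RESERVATION CONFLICT status returned by the target when analyzing similar failures.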

----

But if we try to remove the reservation from the same node it works:

rafaeldtinoco@clubionic01:~$ sudo sg_persist -v --out --release --param-rk=3abe0000 --prout-type=5 /dev/sdc
    inquiry cdb: 12 00 00 00 24 00
  Msft Virtual Disk 1.0
  Peripheral device type: disk
    Persistent Reservation Out cmd: 5f 02 05 00 00 00 00 00 18 00
PR out: command (Release) successful

Rafael David Tinoco (rafaeldtinoco) wrote :

I tried another environment, using LIO as the storage server and Focal (the only release I had available), and there I was able to release all keys from the shared disk:

(k)rafaeldtinoco@clufocal01:~$ sudo sg_persist -v --out --release --param-rk=0xbf090000 --prout-type=5 /dev/sda
    inquiry cdb: 12 00 00 00 24 00
  LIO-ORG cluster.focal.t 4.0
  Peripheral device type: disk
    Persistent reservation out cdb: 5f 02 05 00 00 00 00 00 18 00
PR out: command (Release) successful

(k)rafaeldtinoco@clufocal01:~$ sudo sg_persist -v --out --release --param-rk=0xbf090001 --prout-type=5 /dev/sda
    inquiry cdb: 12 00 00 00 24 00
  LIO-ORG cluster.focal.t 4.0
  Peripheral device type: disk
    Persistent reservation out cdb: 5f 02 05 00 00 00 00 00 18 00
PR out: command (Release) successful

(k)rafaeldtinoco@clufocal01:~$ sudo sg_persist -v --out --release --param-rk=0xbf090002 --prout-type=5 /dev/sda
    inquiry cdb: 12 00 00 00 24 00
  LIO-ORG cluster.focal.t 4.0
  Peripheral device type: disk
    Persistent reservation out cdb: 5f 02 05 00 00 00 00 00 18 00
PR out: command (Release) successful

Note that the SCSI command (the CDB bytes shown above) is identical to the one that fails in the Azure environment.
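
To make the comparison concrete, the CDB `5f 02 05 00 00 00 00 00 18 00` printed in both environments can be decoded field by field. This is a sketch based on my understanding of the SPC-3 PERSISTENT RESERVE OUT layout (opcode 0x5F, service action in byte 1, scope and type in byte 2, parameter list length in bytes 5 to 8); the function name and the lookup tables are illustrative and cover only a subset of values:

```python
SERVICE_ACTIONS = {
    0x00: "REGISTER",
    0x01: "RESERVE",
    0x02: "RELEASE",
    0x03: "CLEAR",
    0x04: "PREEMPT",
    0x05: "PREEMPT AND ABORT",
    0x06: "REGISTER AND IGNORE EXISTING KEY",
}

PR_TYPES = {
    0x1: "Write Exclusive",
    0x3: "Exclusive Access",
    0x5: "Write Exclusive, registrants only",
    0x6: "Exclusive Access, registrants only",
}


def decode_prout_cdb(hex_bytes: str) -> dict:
    """Decode a 10-byte PERSISTENT RESERVE OUT CDB given as hex text."""
    cdb = bytes.fromhex(hex_bytes)  # spaces between byte pairs are accepted
    if len(cdb) != 10 or cdb[0] != 0x5F:
        raise ValueError("not a 10-byte PERSISTENT RESERVE OUT CDB")
    return {
        "service_action": SERVICE_ACTIONS.get(cdb[1] & 0x1F, "unknown"),
        "scope": cdb[2] >> 4,            # 0 means logical-unit scope
        "type": PR_TYPES.get(cdb[2] & 0x0F, "unknown"),
        "param_list_length": int.from_bytes(cdb[5:9], "big"),
    }
```

Decoded this way, both the Azure and the LIO commands are a RELEASE with type 5 and a 24-byte parameter list, so any difference has to come from the parameter data (the key) or from how the storage side handles the request.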

Rafael David Tinoco (rafaeldtinoco) wrote :

@vybava,

I just realized the bug description already has the exact command sequence you need, but I'm repeating it in this comment to make things easier. Please follow the order written below:

##
## NODE #1
##

# tell "fence_scsi" agent that node 1 is "on"

$ sudo fence_scsi --verbose -n 10.250.3.10 -d /dev/sda -k 123abc -o on

2020-02-24 18:37:01,701 INFO: Executing: /usr/bin/sg_turs /dev/sda
2020-02-24 18:37:01,705 DEBUG: 0
2020-02-24 18:37:01,706 INFO: Executing: /usr/bin/sg_turs /dev/sda
2020-02-24 18:37:01,709 DEBUG: 0
2020-02-24 18:37:01,709 INFO: Executing: /usr/bin/sg_persist -n -i -k -d /dev/sda
2020-02-24 18:37:01,712 DEBUG: 0 PR generation=0x2, there are NO registered reservation keys
2020-02-24 18:37:01,714 DEBUG: No registration for key 123abc on device /dev/sda
2020-02-24 18:37:01,714 INFO: Executing: /usr/bin/sg_turs /dev/sda
2020-02-24 18:37:01,718 DEBUG: 0
2020-02-24 18:37:01,718 INFO: Executing: /usr/bin/sg_persist -n -o -I -S 123abc -d /dev/sda
2020-02-24 18:37:01,723 DEBUG: 2 PR out (Register and ignore existing key): Device not ready
  sg_persist failed: Device not ready
2020-02-24 18:37:01,724 INFO: Executing: /usr/bin/sg_turs /dev/sda
2020-02-24 18:37:01,727 DEBUG: 0
2020-02-24 18:37:01,727 INFO: Executing: /usr/bin/sg_persist -n -i -k -d /dev/sda
2020-02-24 18:37:01,731 DEBUG: 0 PR generation=0x3, 1 registered reservation key follows: 0x123abc
2020-02-24 18:37:01,732 INFO: Executing: /usr/bin/sg_turs /dev/sda
2020-02-24 18:37:01,735 DEBUG: 0
2020-02-24 18:37:01,735 INFO: Executing: /usr/bin/sg_persist -n -i -r -d /dev/sda
2020-02-24 18:37:01,739 DEBUG: 0 PR generation=0x3, there is NO reservation held
2020-02-24 18:37:01,740 INFO: Executing: /usr/bin/sg_turs /dev/sda
2020-02-24 18:37:01,744 DEBUG: 0
2020-02-24 18:37:01,745 INFO: Executing: /usr/bin/sg_persist -n -o -R -T 5 -K 123abc -d /dev/sda
2020-02-24 18:37:01,749 DEBUG: 0
2020-02-24 18:37:01,750 INFO: Executing: /usr/bin/sg_turs /dev/sda
2020-02-24 18:37:01,753 DEBUG: 0
2020-02-24 18:37:01,754 INFO: Executing: /usr/bin/sg_turs /dev/sda
2020-02-24 18:37:01,757 DEBUG: 0
2020-02-24 18:37:01,757 INFO: Executing: /usr/bin/sg_persist -n -i -k -d /dev/sda
2020-02-24 18:37:01,761 DEBUG: 0 PR generation=0x3, 1 registered reservation key follows: 0x123abc
Success: Powered ON

# key 0x123abc is registered:

(k)rafaeldtinoco@clufocal01:~$ sudo sg_persist --in --read-keys --device=/dev/sda
  LIO-ORG cluster.focal.t 4.0
  Peripheral device type: disk
  PR generation=0x3, 1 registered reservation key follows:
    0x123abc

# key 0x123abc holds the reservation:

(k)rafaeldtinoco@clufocal01:~$ sudo sg_persist -r /dev/sda
  LIO-ORG cluster.focal.t 4.0
  Peripheral device type: disk
  PR generation=0x3, Reservation follows:
    Key=0x123abc
    scope: LU_SCOPE, type: Write Exclusive, registrants only

##
## NODE 02
##

# tell "fence_scsi" agent that node 2 is "on"

(k)rafaeldtinoco@clufocal02:~$ sudo fence_scsi --verbose -n 10.250.3.11 -d /dev/sda -k 321abc -o on
2020-02-24 18:37:37,888 INFO: Executing: /usr/bin/sg_turs /dev/sda
2020-02-24 18:37:37,892 DEBUG: 0
2020-02-24 18:37:37,892 INFO: Executing: /usr/bin/sg_turs /dev/sda
2020-02-24 18:37:37,895 DEBUG: 0...


Rafael David Tinoco (rafaeldtinoco) wrote :

The previous comment shows the full set of commands the fence_scsi agent sends to the storage server for the shared disk used by the cluster. Below is the exact same test in the Azure environment. I was having issues before, but it seems I am not having them anymore. I'll have to investigate what the cluster does differently from this sequence:

##
## NODE 01 (registration)
##

rafaeldtinoco@clubionic01:~$ sudo fence_scsi --verbose -n 10.250.3.10 -d /dev/sdc -k 123abc -o on
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdc
0 PR generation=0x12, there are NO registered reservation keys
No registration for key 123abc on device /dev/sdc
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_persist -n -o -I -S 123abc -d /dev/sdc
0
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdc
0 PR generation=0x13, 1 registered reservation key follows:
    0x123abc
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_persist -n -i -r -d /dev/sdc
0 PR generation=0x13, there is NO reservation held
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_persist -n -o -R -T 5 -K 123abc -d /dev/sdc
0
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdc
0 PR generation=0x13, 1 registered reservation key follows:
    0x123abc
Success: Powered ON

rafaeldtinoco@clubionic01:~$ sudo sg_persist --in --read-keys --device=/dev/sdc
  Msft Virtual Disk 1.0
  Peripheral device type: disk
  PR generation=0x13, 1 registered reservation key follows:
    0x123abc
rafaeldtinoco@clubionic01:~$ sudo sg_persist -r /dev/sdc
  Msft Virtual Disk 1.0
  Peripheral device type: disk
  PR generation=0x13, Reservation follows:
    Key=0x123abc
    scope: LU_SCOPE, type: Write Exclusive, registrants only

##
## NODE 02 (registration)
##

rafaeldtinoco@clubionic02:~$ sudo fence_scsi --verbose -n 10.250.3.11 -d /dev/sdc -k 321abc -o on
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdc
0 PR generation=0x13, 1 registered reservation key follows:
    0x123abc
No registration for key 321abc on device /dev/sdc
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_persist -n -o -I -S 321abc -d /dev/sdc
0
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdc
0 PR generation=0x14, 2 registered reservation keys follow:
    0x123abc
    0x321abc
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_persist -n -i -r -d /dev/sdc
0 PR generation=0x14, Reservation follows:
    Key=0x123abc
    scope: LU_SCOPE, type: Write Exclusive, registrants only
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_turs /dev/sdc
0
Executing: /usr/bin/sg_persist -n -i -k -d /dev/sdc
0 PR generation=0x14, 2 registered reservation keys follow:
    0x123abc
    0x321abc
Success: Powered ON

rafaeldtinoco@clubionic02:~$ sudo sg_persist --in --read-keys --device=/dev/sdc
  Msft Virtual Disk 1.0
  Peripheral device type: disk
 ...


Rafael David Tinoco (rafaeldtinoco) wrote :

I'm flagging this bug as invalid because I was able to make the fence_scsi agent work in a pacemaker 1.1.19 environment. I have opened the following bug to continue working on this:

https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1865523

Changed in fence-agents (Ubuntu):
status: New → Invalid
Changed in fence-agents (Ubuntu Bionic):
status: Confirmed → Invalid
importance: High → Undecided
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody