tgt doesn't export some LUNs after a reload operation (SIGHUP)

Bug #1294267 reported by Rarylson Freitas
This bug affects 1 person
Affects: tgt (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

Sometimes, after running `reload tgt`, some LUNs aren't correctly added to the targets.

The error can be reproduced as shown below:

$ tgt-admin --dump

default-driver iscsi

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_4>
 backing-store /dev/vg_vmware/vmware_4
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_3>
 backing-store /dev/vg_vmware/vmware_3
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_2>
 backing-store /dev/vg_vmware/vmware_2
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_1>
 backing-store /dev/vg_vmware/vmware_1
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_5>
 backing-store /dev/vg_vmware/vmware_5
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

Now, we'll send a SIGHUP signal to tgt:

$ reload tgt
# running `kill -SIGHUP $TGT_ROOT_PID` has the same behavior
$ tgt-admin --dump

default-driver iscsi

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_4>
 backing-store /dev/vg_vmware/vmware_4
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_3>
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_2>
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_1>
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_5>
 backing-store /dev/vg_vmware/vmware_5
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

Now, some LUNs weren't correctly attached to their targets (/dev/vg_vmware/vmware_1, /dev/vg_vmware/vmware_2 and /dev/vg_vmware/vmware_3 in my example).
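
Spotting the dropped LUNs by eye gets tedious with many targets. The comparison above could be scripted roughly as follows. This is only a minimal sketch: it assumes the two `tgt-admin --dump` outputs were saved to files, and the function names `list_backing_stores` and `missing_backing_stores` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of tgt.

# Print the backing-store paths found in `tgt-admin --dump` style text on stdin.
list_backing_stores() {
    sed -n 's/^[[:space:]]*backing-store[[:space:]][[:space:]]*\(.*\)$/\1/p'
}

# Print every backing store listed in file $1 (dump taken before the reload)
# that no longer appears in file $2 (dump taken after the reload).
missing_backing_stores() {
    before=$1
    after=$2
    list_backing_stores < "$before" | while IFS= read -r dev; do
        if ! list_backing_stores < "$after" | grep -Fxq -- "$dev"; then
            printf '%s\n' "$dev"
        fi
    done
}
```

For example, after `tgt-admin --dump > before.txt; reload tgt; tgt-admin --dump > after.txt` (file names are arbitrary), running `missing_backing_stores before.txt after.txt` should list the three devices named above.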

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: tgt 1:1.0.17-1ubuntu2
ProcVersionSignature: Ubuntu 3.8.0-35.52~precise1-generic 3.8.13.13
Uname: Linux 3.8.0-35-generic x86_64
ApportVersion: 2.0.1-0ubuntu17.6
Architecture: amd64
Date: Tue Mar 18 14:57:37 2014
MarkForUpload: True
SourcePackage: tgt
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Rarylson Freitas (rarylson) wrote :

I think the problem may be related to this fact: the TGT daemon doesn't add a block device as a LUN if some process is accessing it.

The following commands show another operation in which a similar problem occurs:

# All block storage added to the targets
$ tgt-admin --dump | grep -o "backing-store \(.*\)" | sed -e "s/^backing-store \(.*\)$/\1/"
/dev/vg_vmware/vmware_4
/dev/vg_vmware/vmware_3
/dev/vg_vmware/vmware_2
/dev/vg_vmware/vmware_1
/dev/vg_vmware/vmware_5
# Now we'll run a stop/start operation. Some block devices (previously LUNs) are still held open by some process
# between the `stop` and `start` operations. In our case, the block devices were opened by the `blkid` command,
# running as the 'root' user.
$ stop tgt; lsof /dev/mapper/vg_vmware-vmware_1 /dev/mapper/vg_vmware-vmware_2 \
> /dev/mapper/vg_vmware-vmware_3 /dev/mapper/vg_vmware-vmware_4 \
> /dev/mapper/vg_vmware-vmware_5; start tgt; tgt-admin --dump | grep -o "backing-store \(.*\)" | \
> sed -e "s/^backing-store \(.*\)$/\1/"
tgt stop/waiting
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
blkid 29870 root 3r BLK 252,2 0t212992 1585695 /dev/mapper/../dm-2
blkid 29871 root 3r BLK 252,0 0t143360 1605724 /dev/mapper/../dm-0
blkid 29872 root 3r BLK 252,3 0t180224 1585696 /dev/mapper/../dm-3
blkid 29875 root 3r BLK 252,4 0x1fd70bce200 1585697 /dev/mapper/../dm-4
tgt start/running, process 29878
/dev/vg_vmware/vmware_5
$ ls -lh /dev/mapper/vg_vmware-vmware_1 /dev/mapper/vg_vmware-vmware_2 /dev/mapper/vg_vmware-vmware_3 /dev/mapper/vg_vmware-vmware_4 /dev/mapper/vg_vmware-vmware_5
lrwxrwxrwx 1 root root 7 Mar 18 15:27 /dev/mapper/vg_vmware-vmware_1 -> ../dm-0
lrwxrwxrwx 1 root root 7 Mar 18 15:27 /dev/mapper/vg_vmware-vmware_2 -> ../dm-2
lrwxrwxrwx 1 root root 7 Mar 18 15:27 /dev/mapper/vg_vmware-vmware_3 -> ../dm-3
lrwxrwxrwx 1 root root 7 Mar 18 15:27 /dev/mapper/vg_vmware-vmware_4 -> ../dm-4
lrwxrwxrwx 1 root root 7 Mar 18 15:27 /dev/mapper/vg_vmware-vmware_5 -> ../dm-5

In our test, four block devices (vg_vmware-vmware_1, vg_vmware-vmware_2, vg_vmware-vmware_3 and vg_vmware-vmware_4) were held open by the blkid command, and the `start tgt` command (which runs `/usr/sbin/tgt-admin -e`) only added the remaining LUN (vg_vmware-vmware_5).

I hope this test helps.
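
To see at a glance which devices are still held open before starting tgt, the `lsof` check above could be wrapped in a small helper. This is a rough sketch, not something shipped with tgt: the name `busy_devices` is made up, and it relies on `lsof` exiting with status 0 when at least one of the named files is open.

```shell
#!/bin/sh
# Hypothetical helper, not part of tgt.
# Print each given device that lsof reports as open by some process.
# lsof exits 0 when it finds the file open, non-zero otherwise.
busy_devices() {
    for dev in "$@"; do
        if lsof "$dev" >/dev/null 2>&1; then
            printf '%s\n' "$dev"
        fi
    done
}
```

Calling `busy_devices /dev/mapper/vg_vmware-vmware_1 /dev/mapper/vg_vmware-vmware_5` between `stop tgt` and `start tgt` would print only the devices that are currently in use.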

Revision history for this message
Rarylson Freitas (rarylson) wrote :

Sometimes the restart operation doesn't work either:

$ tgt-admin --dump

default-driver iscsi

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_4>
 backing-store /dev/vg_vmware/vmware_4
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_3>
 backing-store /dev/vg_vmware/vmware_3
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_2>
 backing-store /dev/vg_vmware/vmware_2
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_1>
 backing-store /dev/vg_vmware/vmware_1
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_5>
 backing-store /dev/vg_vmware/vmware_5
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

$ restart tgt
tgt start/running, process 30315

$ tgt-admin --dump
default-driver iscsi

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_4>
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_3>
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_2>
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_1>
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

<target iqn.2014-03.br.com.2aliancas.storage2:vmware_iscsi_5>
 backing-store /dev/vg_vmware/vmware_5
 initiator-address 192.168.130.0/24
 initiator-address 127.0.0.1
</target>

In this case, only the /dev/vg_vmware/vmware_5 block device was added as a LUN.

Revision history for this message
Rarylson Freitas (rarylson) wrote :

I discovered that, although sending a SIGHUP signal to TGT (using the `reload tgt` command) doesn't work, I can correctly "reload" TGT using the following command:

    tgt-admin --update ALL

I know upstart has an option to change the signal sent to a daemon. However, if we could change the `reload` behavior to send no signal and execute a script instead, we could correct the wrong `reload` behavior.

Revision history for this message
Rarylson Freitas (rarylson) wrote :

In the attachment, I implemented a fix for the `restart` and `stop+start` bug.

It's a poor fix, but it may help.

Unfortunately, I haven't found any fix for the reload method, since upstart (in Ubuntu 12.04) always sends a SIGHUP signal to the daemon.

Changed in tgt (Ubuntu):
importance: Undecided → High
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks for taking the time to report this bug. It doesn't look like it's been addressed upstream at all (no catching of sighup going on in git head).

Changed in tgt (Ubuntu):
status: New → Confirmed
Revision history for this message
Rarylson Freitas (rarylson) wrote :

Hi Serge Hallyn,

I ran some new tests and, in short, the problem I found is actually three problems (maybe this bug should be split into three).
I will summarize the problems I found:

- Tgtd breaks when it receives a SIGHUP;
  - When we send a SIGHUP to tgtd (`reload tgt`), the daemon stops (and logs a fault in syslog), and then upstart respawns the daemon. That is, running `reload tgt` appears to be equivalent to running `kill -9 TGT_ROOT_PID; start tgt;`;
- Tgtd behaves unexpectedly when we run a `restart tgt` or a `stop tgt; start tgt` command;
  - When tgtd stops and closes the devices, the system will probably run `blkid` on these devices. So, when tgtd starts again, some devices may still be held open by the blkid command, and neither tgtd nor `tgt-admin -e` will add these devices (I'm assuming "allow-in-use" is set to "off" at this point);
- The `stop tgt` command really stops tgtd even if some LUNs are still in use;
  - This can cause data corruption.

Debian wheezy (https://packages.debian.org/source/wheezy/tgt) solves these problems using the initd.sample from upstream.

In the next explanations, I'm using the source code found in: http://packages.ubuntu.com/source/precise/tgt

- Tgtd breaks when it receives a SIGHUP:

From the initd.sample file (tgt-1.0.17/scripts/initd.sample, line 79):

reload()
{
 echo "Updating target framework daemon configuration"
 # Update configuration for targets. Only targets which
 # are not in use will be updated.
 tgt-admin --update ALL -c $TGTD_CONFIG &>/dev/null
 RETVAL=$?
 if [ "$RETVAL" -eq 107 ] ; then
     echo "tgtd is not running"
     exit 1
 fi
}

That is, the `service tgt reload` command runs `tgt-admin --update ALL`, and this command works fine. The `service tgt reload` command DOESN'T send a SIGHUP signal in Debian (nor in any Linux distribution that uses the initd.sample file).

I don't see how to fix this in Ubuntu using upstart, because in Ubuntu 12.04 Upstart always sends a SIGHUP signal. In newer versions of Ubuntu we can use the `reload signal` stanza, but we can't run a script (`tgt-admin --update ALL`) when `reload tgt` is called.

- Tgtd behaves unexpectedly when we run a `restart tgt` or a `stop tgt; start tgt` command:

First, let's read the tgt-admin source code (tgt-1.0.17/scripts/tgt-admin, line 1240):

 # Check if userspace uses this device
 my $lsof_check = check_exe("lsof");
 if ($lsof_check ne 1) {
  system("lsof $backing_store &>/dev/null");
  my $exit_value = $? >> 8;
  if ($exit_value eq 0) {
   execute("# Device $backing_store is used (already tgtd target?).");
   execute("# Run 'lsof $backing_store' to see the details.");
   return 0;
  }
 }

That is, when `tgt-admin -e` (in the `post-start` stanza) is executed, if some device is still open (in our case, held by the `blkid` command), that device is not added as a LUN.

In my attachment tgt.override, a hack was proposed to avoid this problem. However, my tests showed that waiting one second between stopping and starting tgtd solves the problem. Actually, a 0.3 second wait solved the problem in all my tests (but 1 second is a safer interval).
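
Instead of a fixed sleep, the wait could poll until the devices are actually released. The following is only a sketch of that idea, not what tgt.override does: `wait_for_release` is a hypothetical name, the timeout is in whole seconds, and it uses the same `lsof` exit-status convention as the tgt-admin check above.

```shell
#!/bin/sh
# Hypothetical helper, not part of tgt.
# Usage: wait_for_release TIMEOUT_SECONDS DEVICE...
# Returns 0 as soon as lsof reports none of the devices open,
# or 1 if some device is still open after TIMEOUT_SECONDS retries.
wait_for_release() {
    remaining=$1
    shift
    while :; do
        busy=0
        for dev in "$@"; do
            if lsof "$dev" >/dev/null 2>&1; then
                busy=1
            fi
        done
        if [ "$busy" -eq 0 ]; then
            return 0
        fi
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
}
```

It could then be used as `stop tgt; wait_for_release 5 /dev/vg_vmware/vmware_1 /dev/vg_vmware/vmware_2; start tgt`, where the 5 second limit is an arbitrary choice.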

From the initd.sample file (tgt-1.0.17/scripts/initd.sample, lin...

Revision history for this message
Rarylson Freitas (rarylson) wrote :

Hi everyone.

It seems that the TGT bug (when stopping, reloading or restarting the TGT daemon) is solved in Ubuntu Artful (17.10). I have not tested it, but TGT now uses the upstream init.d script or the TGT systemd config file (both of which address all of the previously described problems).

The commands `tgtadm --op update --mode sys --name State -v offline`, `tgt-admin --offline ALL` and `tgt-admin -e -c /etc/tgt/targets.conf`, for example, correctly bring TGT down (the first two commands) and reload it in the correct way (the last command).

I think this bug report can be closed now, since it seems to be solved in newer versions.

Below, the systemd tgt service config file:

```
[Unit]
Description=(i)SCSI target daemon
Documentation=man:tgtd(8)
After=network.target

[Service]
Type=notify
TasksMax=infinity
ExecStart=/usr/sbin/tgtd -f
ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v offline
ExecStartPost=/usr/sbin/tgt-admin -e -c /etc/tgt/targets.conf
ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v ready

ExecStop=/usr/sbin/tgtadm --op update --mode sys --name State -v offline
ExecStop=/usr/sbin/tgt-admin --offline ALL
ExecStop=/usr/sbin/tgt-admin --update ALL -c /dev/null -f
ExecStop=/usr/sbin/tgtadm --op delete --mode system

ExecReload=/usr/sbin/tgt-admin --update ALL -c /etc/tgt/targets.conf
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks for coming back rarylson!
It seems no one realized this implicitly fixed this issue when the changes were made.
Closing the bug as fix released (since artful).

Changed in tgt (Ubuntu):
status: Confirmed → Fix Released