Comment 7 for bug 1294267

Revision history for this message
Rarylson Freitas (rarylson) wrote :

Hi Serge Hallyn,

I run some new tests and, in short, the problem I found is actually three problems (maybe this bug should be broke in three).
I will summarize the problems I found:

- Tgtd breaks when it receives a SIGHUP;
  - When we send a SIGHUP to tgtd (`reload tgt`), the deamon stops (and it logs an fault in syslog), and then upstart respawns the deamon. It is, run `reload tgt` appears to be equivalent to run `kill -9 TGT_ROOT_PID; start tgt;`;
- Tgtd has an unexpected behavior when we run a `restart tgt` or a `stop tgt; start tgt` command;
  - When tgtd stops and closes the devices, the system will probably run a `blkid` in this devices. So, when tgtd starts again, some devices may be still opened by the blkid command, and neither tgtd or `tgt-admin -e` will add this device (I'm considering "allow-in-use" is configured to "off" in this moment);
- The `stop tgt` command really stops tgtd even if some LUNs still in use;
  - This can cause data corruption.

Debian wheezy (https://packages.debian.org/source/wheezy/tgt) solves these problem using the initd.sample from upstream.

In the next explanations, I'm using the source code found in: http://packages.ubuntu.com/source/precise/tgt

- Tgtd breaks when it receives a SIGHUP:

From the initd.sample file (tgt-1.0.17/scripts/initd.sample, line 79):

reload()
{
 echo "Updating target framework daemon configuration"
 # Update configuration for targets. Only targets which
 # are not in use will be updated.
 tgt-admin --update ALL -c $TGTD_CONFIG &>/dev/null
 RETVAL=$?
 if [ "$RETVAL" -eq 107 ] ; then
     echo "tgtd is not running"
     exit 1
 fi
}

It is, the `service tgt reload` command will run a `tgt-admin --update ALL` command, and this last command works fine. The `service tgt reload` command DOESN'T send a SIGHUP signal in Debian (and in all Linux distributions that uses the initd.sample file).

I don't see how fix this in Ubuntu using upstart, because in Ubuntu 12.04 Upstart always send a SIGHUP signal. In newer versions of Ubuntu we can use the `reload signal` stanza, but we can't run a script (`tgt-admin --update ALL`) when `reload tgt` was called.

- Tgtd has an unexpected behavior when we run a `restart tgt` or a `stop tgt; start tgt` command:

First, let's read the tgt-admin source code (tgt-1.0.17/scripts/tgt-admin, line 1240):

 # Check if userspace uses this device
 my $lsof_check = check_exe("lsof");
 if ($lsof_check ne 1) {
  system("lsof $backing_store &>/dev/null");
  my $exit_value = $? >> 8;
  if ($exit_value eq 0) {
   execute("# Device $backing_store is used (already tgtd target?).");
   execute("# Run 'lsof $backing_store' to see the details.");
   return 0;
  }
 }

It is, when `tgt-admin -e` (in the `post-start` stanza) is executed, if some device still opened (in our case, by the `blkid` command), we can add this device as a LUN.

In my attachment tgt.override, a hack was proposed to avoid this problem. However, my tests showed that waiting one second between stops and starts tgtd solves the problem. Actually, 0.3 second waiting solved my problem in all tests (but 1 second is a more secure interval).

From the initd.sample file (tgt-1.0.17/scripts/initd.sample, line 125):

 restart)
  TASK=restart
  stop && start
  ;;

And from line 19:

 # Put tgtd into "offline" state until all the targets are configured.
 # We don't want initiators to (re)connect and fail the connection
 # if it's not ready.
 tgtadm --op update --mode sys --name State -v offline
 # Configure the targets.
 tgt-admin -e -c $TGTD_CONFIG
 # Put tgtd into "ready" state.
 tgtadm --op update --mode sys --name State -v ready

A fix for this problem can be:

 # Put tgtd into "offline" state until all the targets are configured.
 # We don't want initiators to (re)connect and fail the connection
 # if it's not ready.
 tgtadm --op update --mode sys --name State -v offline
+ # Wait the system free the backing-stores (a `blkid` command
+ # may still be running)
+ # See: https://bugs.launchpad.net/ubuntu/+source/tgt/+bug/1294267
+ [ "$TASK" == "restart" ] && sleep 1
 # Configure the targets.
 tgt-admin -e -c $TGTD_CONFIG
 # Put tgtd into "ready" state.
 tgtadm --op update --mode sys --name State -v ready

- The `stop tgt` command really stops tgtd even if some LUNs still in use:

Let's see the source code again (tgt-1.0.17/scripts/initd.sample, line 35):

 # Remove all targets. It only removes targets which are not in use.
 tgt-admin --update ALL -c /dev/null &>/dev/null
 # tgtd will exit if all targets were removed
 tgtadm --op delete --mode system &>/dev/null
 RETVAL=$?
 if [ "$RETVAL" -eq 107 ] ; then
     echo "tgtd is not running"
     [ "$TASK" != "restart" ] && exit 1
 elif [ "$RETVAL" -ne 0 ] ; then
     echo "Some initiators are still connected - could not stop tgtd"
     exit 2
 fi

It is, in the tgt upstream initd script `service tgt stop` command shouldn't stop the deamon if some target still in use.

However, in the Ubuntu upstart version, according to Upstart cookbook, when we run `stop tgt`, upstart sends a SIGTERM signal and waits the time specified by the `kill timeout` stanza. After this, it sends a SIGKILL signal, forcing the deamon to stop.

See: http://upstart.ubuntu.com/cookbook/#kill-signal, http://upstart.ubuntu.com/cookbook/#kill-timeout

--------------

Solutions (patches):

There are two main kind of solutions: fix upstart, or use the initd.sample script again.

- Tgtd breaks when it receives a SIGHUP:

-- Initd.sample: Already solved
-- Upstart: No way to solve this problem

- Tgtd has an unexpected behavior when we run a `restart tgt` or a `stop tgt; start tgt` command:

-- Initd.sample:

Patch applied to initd.sample file (tgt-1.0.17/scripts/initd.sample, line 19):

 # Put tgtd into "offline" state until all the targets are configured.
 # We don't want initiators to (re)connect and fail the connection
 # if it's not ready.
 tgtadm --op update --mode sys --name State -v offline
+ # Wait the system free the backing-stores (a `blkid` command
+ # may still be running)
+ # See: https://bugs.launchpad.net/ubuntu/+source/tgt/+bug/1294267
+ [ "$TASK" == "restart" ] && sleep 1
 # Configure the targets.
 tgt-admin -e -c $TGTD_CONFIG
 # Put tgtd into "ready" state.
 tgtadm --op update --mode sys --name State -v ready

-- Upstart:

Patch applied to tgt.conf file (debian/upstart, line 11):

- post-start exec /usr/sbin/tgt-admin -e
+ post-start script
+ # Wait the system free the backing-stores (a `blkid` command
+ # may still be running)
+ # See: https://bugs.launchpad.net/ubuntu/+source/tgt/+bug/1294267
+ sleep 1
+ /usr/sbin/tgt-admin -e
+ end script

Note: I don't tested this patch. I don't know if the post-script stanza runs after a restart operation. It's possible it doesn't work, since the Upstart cookbook says: "Further note that if the job contains post-stop, pre-start or post-start stanzas, these will NOT be run for a restart. However, a pre-stop stanza will be run." See: http://upstart.ubuntu.com/cookbook/#restart

- The `stop tgt` command really stops tgtd even if some LUNs still in use:

-- Initd.sample: Already solved
-- Upstart:

Patch applied to tgt.conf file (debian/upstart, line 11):

+ pre-stop script
+ # Remove all targets. It only removes targets which are not in use.
+ /usr/sbin/tgt-admin --update ALL -c /dev/null
+ # tgtd will exit if all targets were removed
+ /usr/sbin/tgtadm --op delete --mode system
+ RETVAL=$?
+ if [ "$RETVAL" -ne 0 ] ; then
+ # Some initiators are still connected - could not stop tgtd
+ # TODO Log this message in some place
+ exit 2
+ fi
+ end script
+
post-start exec /usr/sbin/tgt-admin -e

-----------

This comment is bigger, but it describes well all problems and solutions I found. I hope it helps.