$FAIL not always set → --keep does not always work

Bug #996789 reported by gmoore777
This bug affects 3 people
Affects             Status       Importance  Assigned to   Milestone
xen-tools (Ubuntu)  In Progress  Undecided   Axel Beckert

Bug Description

On 64-bit Ubuntu 12.04 (Precise Pangolin), the
/usr/lib/xen-tools/karmic.d/70-install-ssh file
from the 4.2.1-1 xen-tools package does not prevent
`sshd` from starting up when the openssh-server package
gets installed.
I don't know why it doesn't; I just see it running.

[ Aside: if the dom0 machine happens to have
 sshd running, then no additional `sshd` process will start
 to run during the installation of the openssh-server package
 on the domU machine. Then the entire run of xen-create-image
 will be successful.
]

[ Aside: I did have to do the following before I could use
 xen-create-image:

    cd /usr/share/doc/xen-tools
    sudo ln -s karmic.d precise.d

    sudo vi /etc/default/xen
        Change:
            TOOLSTACK=
        to:
            TOOLSTACK=xl
]

The short story on how to fix my dilemma was to edit:
/usr/lib/xen-tools/karmic.d/70-install-ssh

and to add a line that shuts down the sshd process that was just started
by the installation of the openssh-server package:

    chroot ${prefix} /usr/sbin/service ssh stop

Longer story:
The xen-create-image perl script has the intention of preventing the sshd
process from starting up, but it's not working. I don't really know why.

Because this sshd binary starts up upon the installation of the
openssh-server package and loads up other libraries, the
clean up "END" subroutine in xen-create-image fails while unmounting
the /tmp/<tempname> directory.
That failed unmounting is not fatal.
The new machine is still intact at that moment.
But when the Perl script, xen-create-image, exits, any temporary
directory created via the File::Temp Perl class gets removed
automajically.
The /tmp/<tempname> directory was created via File::Temp, so it
gets removed too. Unfortunately, because the unmounting failed, it is
not the empty directory it should be after a successful unmount; it is
the entire file hierarchy of the new machine.
Thus after xen-create-image finishes, when I list the contents of the
root of the new machine (for me, the top of my volume), I see no
directories and no files. Nothing.
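
One way to avoid this kind of data loss, sketched here in shell rather than in the Perl that xen-create-image actually uses: never delete a mount point recursively. After a successful umount the mount point is an empty directory, so plain `rmdir` is enough, and `rmdir` refuses to touch a directory that still has contents or is still mounted. The function name `remove_mount_point` is made up for illustration and is not part of xen-tools.

```shell
# Hypothetical helper, not part of xen-tools: remove a temporary mount
# point without ever deleting anything inside it.
remove_mount_point() {
    dir=$1
    # rmdir only succeeds on an empty, unmounted directory. If the umount
    # failed and the guest file system is still visible here, rmdir fails
    # and the data stays in place.
    if rmdir -- "$dir" 2>/dev/null; then
        return 0
    fi
    echo "leaving $dir in place: not empty or still mounted" >&2
    return 1
}
```

With this approach a failed umount leaves a stray directory behind, which is a nuisance, but the freshly installed guest survives.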

Tags: precise
Axel Beckert (xtaran) wrote : Re: [Bug 996789] [NEW] 70-install-ssh does not prevent sshd from starting up

Hi,

thanks for the bug report!

gmoore777 wrote:
> The xen-create-image perl script has the intention of preventing the sshd
> process from starting up, but it's not working. I don't really know why.

Hrm, could be upstart related. Have to investigate.

> Because this sshd binary starts up upon the installation of the
> openssh-server package and loads up other libraries, the
> clean up "END" subroutine in xen-create-image fails while unmounting
> the /tmp/<tempname> directory.
[...]
> But when the perl script, xen-create-image, exits, any temporary
> directory location as created by the File::Temp Perl class gets
> removed automajically.

Does adding --keep as an option to xen-create-image help here?

A new xen-tools is due soon and I'll try to include a fix for this.

  Regards, Axel
--
 ,''`. | Axel Beckert <email address hidden>, http://people.debian.org/~abe/
: :' : | Debian Developer, ftp.ch.debian.org Admin
`. `' | 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
  `- | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5

gmoore777 (guy-moore) wrote : Re: 70-install-ssh does not prevent sshd from starting up

Looking at the code of xen-create-image, --keep would only matter if somewhere along the line $FAIL was set to 1 and
"exit" was not called at the point of the error.
The failure occurs during a `umount` of a busy device (it's still busy because `sshd` is running).
The Perl program simply exits at that point, which is inside the routine runCommand(),
at its last two lines:
        $FAIL = 1;
        exit 127;

If --keep were in play, I would have had to see this line get printed:
      Removing failed install:
The installation is not being removed programmatically via the xen-delete-image command.
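
The interplay described above can be sketched as a shell analog. This is not xen-create-image's Perl code; `run_install`, `keep`, and `fail_step` are made-up names. It shows an exit handler that still runs after `exit 127`, and that preserves the work directory only because it checks both the failure flag and the keep switch. A cleanup that fires unconditionally at exit, as the File::Temp removal appears to do here, never consults that switch.

```shell
# Shell analog (assumption: illustrative only, not the xen-tools code).
# A work directory is cleaned up at exit unless the run failed AND the
# caller asked to keep it.
run_install() {
    (
        keep=$1        # 1 = keep the work directory after a failure
        fail_step=$2   # 1 = simulate a failing command
        workdir=$(mktemp -d) || exit 1
        FAIL=0

        cleanup() {
            if [ "$FAIL" -eq 1 ] && [ "$keep" -eq 1 ]; then
                echo "kept $workdir"
            else
                rm -rf -- "$workdir"
                echo "removed $workdir"
            fi
        }
        # The handler runs on every exit path of this subshell,
        # including the "exit 127" below.
        trap cleanup EXIT

        touch "$workdir/rootfs-placeholder"

        if [ "$fail_step" -eq 1 ]; then
            FAIL=1
            exit 127
        fi
        exit 0
    )
}
```

Calling `run_install 1 1` simulates a failing run with the keep switch on and leaves the directory in place; `run_install 0 1` and `run_install 1 0` both remove it.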

The installation is being removed because the temporary directory comes from File::Temp, imported by this line near the beginning:
        use File::Temp qw/ tempdir /;
Meaning, once the Perl program exits, and before control is given back to the user,
Perl automajically removes any directories that were created with the intention of them
being temporary. The "tempdir" holds the entire installation, and it is this directory that is
being "correctly" cleaned up behind the scenes without any notice to the end user.
"Correctly" meaning, Perl is only doing what it is supposed to do.

Anyways, I just did this per your request:

Ran Update Manager to get any updates (none).
Rebooted the machine.
Shut down `sshd` on Dom0.
Reverted my hack in /usr/lib/xen-tools/karmic.d/70-install-ssh.
Ran:
sudo xen-create-image --fs=ext4 --image=full --memory=1Gb --size=20Gb --swap=2Gb --install-method=debootstrap --arch=amd64 --dist=precise --lvm=vg1 --hostname=xen004 --vcpus=1 --ip=192.168.0.245 --gateway=192.168.0.1 --broadcast=192.168.0.255 --netmask=255.255.255.0 --keep --verbose

and that resulted in an empty installation under my mounted volume at /tmp/P_4Wff0FTJ.

These were the final lines of the xen-create-image output:
...
Setting up root password
Generating a password for the new guest.
All done
Executing : umount /tmp/P_4Wff0FTJ/proc
Finished : umount /tmp/P_4Wff0FTJ/proc 2>&1
Unmounting : /tmp/P_4Wff0FTJ/dev/pts
Executing : umount /tmp/P_4Wff0FTJ/dev/pts
Finished : umount /tmp/P_4Wff0FTJ/dev/pts 2>&1
Unmounting : /tmp/P_4Wff0FTJ
Executing : umount /tmp/P_4Wff0FTJ
umount: /tmp/P_4Wff0FTJ: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
Finished : umount /tmp/P_4Wff0FTJ 2>&1
Running command 'umount /tmp/P_4Wff0FTJ 2>&1' failed with exit code 256.
Aborting
See /var/log/xen-tools/xen004.log for details
cannot remove directory for /tmp/P_4Wff0FTJ: Device or resource busy at /usr/share/perl/5.14/File/Temp.pm line 902
$

and then I surmise that, between the "cannot remove directory" message and the Unix prompt,
Perl is effectively doing this: `rm -rf /tmp/P_4Wff0FTJ`.
I actually discovered this because, during my 100s of installations fighting with this problem, I had commented
out one of the umount commands for .../proc. So when xen-create-image failed and printed the regular messages
shown above, I also got a lot of messages about how it couldn't "unlink" a bunch of files under /tmp/XXXXXX/proc/...
I said to myself: huh? Where is the code that is doing all this removing? I then stumbled upon this line:
   use File::Temp qw/ tempdir /;
and lo...


gmoore777 (guy-moore) wrote :

Oh, and I forgot to mention. After the failed installation, `sshd` is running:

$ ps -alef | grep sshd
4 S root 1302 1 0 80 0 - 12487 poll_s 08:03 ? 00:00:00 /usr/sbin/sshd -D
$

and I cannot stop it via the `service` command from Dom0 (because it was started from a different machine/root):

$ sudo service ssh stop
stop: Unknown instance:
$

But I certainly can kill it (which would not work if this sshd had been started from Dom0,
since another sshd would get started as soon as something detected that it died):
$ sudo kill -9 1302
$ ps -alef | grep sshd
$
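
To see which processes are still rooted in the guest file system after such a failed run, one option is to compare each process's root link under /proc with the mount point. A minimal sketch, assuming a Linux /proc and assuming the stray `sshd` really was started inside the chroot; `pids_rooted_at` is a hypothetical helper name, and reading the links of other users' processes generally needs root. The lsof(8) and fuser(1) tools mentioned in the umount message are the more standard route.

```shell
# Hypothetical helper: print the PIDs of all processes whose root
# directory (the /proc/<pid>/root link) is the given directory.
pids_rooted_at() {
    # Resolve the argument to a physical path so it compares cleanly.
    target=$(cd "$1" && pwd -P) || return 1
    for link in /proc/[0-9]*/root; do
        # Unreadable links (permission denied, process gone) are skipped.
        root=$(readlink "$link" 2>/dev/null) || continue
        if [ "$root" = "$target" ]; then
            pid=${link#/proc/}
            echo "${pid%/root}"
        fi
    done
}
```

Run as root with the busy mount point as the argument (here that would be /tmp/P_4Wff0FTJ), it prints one PID per line; those are the candidates to stop before retrying the umount.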

Axel Beckert (xtaran) wrote : Re: [Bug 996789] Re: 70-install-ssh does not prevent sshd from starting up

Hi,

gmoore777 wrote:
> Looking at the code of xen-create-image, --keep would only matter if somewhere along the lines FAIL was set to 1 and that
> "exit" was not called at the point of the error.
> The failure occurs during a `umount` of a busy device. (it's still busy cause `sshd` is running.)
> The perl program just exits at that point which at this point is inside the routine, runCommand()
> at the last 2 lines:
> $FAIL = 1;
> exit 127;

The next release will contain quite a few fixes in that area
(consistent $FAIL usage instead of just die, and such), so this may
already be fixed in git, but I'll have a closer look at that concrete
example.

Thanks for all the details. I suspect the fix for the initially
reported issue (and not this side issue with --keep) is given in
https://bugs.launchpad.net/ubuntu/+source/xen-tools/+bug/997063

  Regards, Axel

Changed in xen-tools (Ubuntu):
assignee: nobody → Axel Beckert (xtaran)
status: New → In Progress
gmoore777 (guy-moore) wrote : Re: 70-install-ssh does not prevent sshd from starting up

Yes, I believe 997063 should address my issue with sshd starting up (which is the beginning of the end for me) during the installation of openssh-server package.

Axel Beckert (xtaran)
tags: added: precise
Axel Beckert (xtaran) wrote :

I'm now tracking the --keep vs. $FAIL issue here, while the initial SSH issue is tracked in #997063.

summary: - 70-install-ssh does not prevent sshd from starting up
+ $FAIL not always set → --keep does not always work