lxc-destroy/lxc-stop gets stuck

Bug #1377973 reported by Nikola Krzalic
This bug affects 2 people
Affects       Status   Importance  Assigned to  Milestone
lxc (Ubuntu)  Expired  Medium      Unassigned

Bug Description

On Ubuntu 14.04 I ran lxc-start with a custom-built container and interrupted the setup before it could finish. Now lxc-ls --fancy, lxc-destroy -n container, and lxc-stop -n container all get stuck. The LXC package version is 1.0.5-0ubuntu0.1.

Output from strace:

# strace lxc-destroy -n myvm
<....>
mkdir("/", 0755) = -1 EEXIST (File exists)
mkdir("/run/", 0755) = -1 EEXIST (File exists)
mkdir("/run/lock/", 0755) = -1 EEXIST (File exists)
mkdir("/run/lock/lxc//", 0755) = -1 EEXIST (File exists)
mkdir("/run/lock/lxc//var/", 0755) = -1 EEXIST (File exists)
mkdir("/run/lock/lxc//var/lib/", 0755) = -1 EEXIST (File exists)
mkdir("/run/lock/lxc//var/lib/lxc", 0755) = -1 EEXIST (File exists)
stat("/var/lib/lxc/myvm/config", 0x7fff06d64f50) = -1 ENOENT (No such file or directory)
stat("/var/lib/lxc/myvm/partial", 0x7fff06d64e80) = -1 ENOENT (No such file or directory)
socket(PF_LOCAL, SOCK_STREAM, 0) = 3
connect(3, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/myvm/command"}, 28) = 0
getuid() = 0
getgid() = 0
sendmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=2028, uid=0, gid=0}}, msg_flags=0}, MSG_NOSIGNAL) = 16
recvmsg(3,

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1377973] [NEW] lxc-destroy/lxc-stop gets stuck

Thanks for reporting this bug.

Could you please show:

1. The result of 'netstat -x | grep lxc'

2. ps -ef | grep lxc

 status: incomplete

Changed in lxc (Ubuntu):
status: New → Incomplete
Revision history for this message
Nikola Krzalic (crypt1d) wrote :

Looks like the lxc-start command got stuck; after killing the process,
everything went back to normal.

root@ares:~# netstat -x | grep lxc
unix 2 [ ] STREAM CONNECTING 0 @/var/lib/lxc/myvm/command
unix 2 [ ] STREAM CONNECTING 0 @/var/lib/lxc/myvm/command
unix 2 [ ] STREAM CONNECTING 0 @/var/lib/lxc/myvm/command
unix 2 [ ] STREAM CONNECTING 0 @/var/lib/lxc/myvm/command
unix 2 [ ] STREAM CONNECTING 0 @/var/lib/lxc/myvm/command
unix 2 [ ] STREAM CONNECTING 0 @/var/lib/lxc/myvm/command

root@ares:~# ps -ef | grep lxc
root 1697 1250 0 Okt06 ? 00:00:00 strace lxc-start -n myvm -f myvm.conf --logfile=vm.log --logpriority=NOTICE
root 1704 1697 0 Okt06 ? 00:00:00 lxc-start -n myvm -f myvm.conf --logfile=vm.log --logpriority=NOTICE
lxc-dns+ 13856 1 0 Okt05 ? 00:00:00 dnsmasq -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/run/lxc/dnsmasq.pid --conf-file= --listen-address 10.0.3.1 --dhcp-range 10.0.3.2,10.0.3.254 --dhcp-lease-max=253 --dhcp-no-override --except-interface=lo --interface=lxcbr0 --dhcp-leasefile=/var/lib/misc/dnsmasq.lxcbr0.leases --dhcp-authoritative

root@ares:~# ps -ef | grep 1697
root 1697 1250 0 Okt06 ? 00:00:00 strace lxc-start -n myvm -f myvm.conf --logfile=vm.log --logpriority=NOTICE
root 1704 1697 0 Okt06 ? 00:00:00 lxc-start -n myvm -f myvm.conf --logfile=vm.log --logpriority=NOTICE
root 15861 21116 0 08:12 pts/16 00:00:00 grep --color=auto 1697

root@ares:~# kill -3 1697
root@ares:~# ps -ef | grep lxc-start
root 15870 21116 0 08:13 pts/16 00:00:00 grep --color=auto lxc-start

root@ares:~# lxc-ls --fancy
NAME STATE IPV4 IPV6 AUTOSTART
----------------------------------

root@ares:~# lxc-destroy -n myvm
Container is not defined

Still, I find it a bit strange that not even lxc-ls was working properly.


Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Quoting Nikola Krzalic (<email address hidden>):
> Looks like the lxc-start command got stuck, after killing the process
> everything went back to normal.
...
> still, I find it a bit strange that not even lxc-ls was working properly....

Right, we should find a way to keep lxc-stop and lxc-ls from hanging in
this case. Perhaps after a (reasonably long) timeout we should assume
it is dead, hard-kill the container (in this case the owner of the
command socket), and continue.
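
The timeout idea above can be sketched roughly as follows. This is only an illustration in Python, not how LXC implements its commands: it connects to an abstract Unix socket (the kind shown as @/var/lib/lxc/myvm/command in the netstat output) and gives up after a timeout instead of blocking forever in the receive call. The function name probe_command_socket, the "ping" payload, and the 5 second default are assumptions made for the example; the real LXC command protocol (the 16-byte request with SCM_CREDENTIALS visible in the strace) is not reproduced here, and no hard-kill step is attempted.

```python
import socket


def probe_command_socket(abstract_name, timeout=5.0, payload=b"ping"):
    """Classify an abstract Unix socket as "alive", "hung", "closed" or "absent".

    abstract_name is the socket name without the leading "@" that netstat
    prints. The payload is a placeholder, not a real LXC command.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # applies to connect, send and recv
    try:
        # A leading NUL byte selects the Linux abstract socket namespace.
        sock.connect(b"\0" + abstract_name.encode())
        sock.sendall(payload)
        data = sock.recv(16)
        # An empty read means the peer closed the connection without replying.
        return "alive" if data else "closed"
    except (socket.timeout, BlockingIOError):
        # Nobody accepted or answered in time: treat the owner as hung.
        return "hung"
    except (ConnectionRefusedError, FileNotFoundError):
        # No process is listening on that name.
        return "absent"
    finally:
        sock.close()
```

A caller could then decide what to do on "hung" (report it, skip the container in a listing, or escalate) rather than blocking indefinitely.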

(marking medium importance rather than high, per guidelines, since it
can be detected and worked around)

 status: confirmed
 importance: medium

Changed in lxc (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Confirmed
Revision history for this message
Dan Poler (l-dan) wrote :

Quoting Serge Hallyn
> Perhaps after a (reasonably long) timeout we should assume
> it is dead, hard-kill the container (in this case the owner of the
> command socket), and continue.

I'm seeing this behaviour happen in Vivid, although in my case the "stuck" container is perfectly responsive and is working fine, e.g. if I ssh into it. So the assumption that the container is dead and should be killed may not be the answer... I'd rather have the container work and lxc-ls broken than the other way around.

Revision history for this message
Dan Poler (l-dan) wrote :

Correction, I meant to say "I'm seeing this behaviour happen in Trusty" not Vivid.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Interesting, so apparently only the lxc monitor itself is hung. Could you please strace that process (it will be the parent of the container init) so we can see what it is hung trying to do?
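
One way to locate that parent process is to read the parent PID from /proc. The snippet below is a small Python sketch under the assumption that the PID of the container's init is already known (for example from ps output); the helper names parse_ppid and parent_pid are made up for this example. The resulting PID could then be passed to strace -p.

```python
def parse_ppid(stat_line):
    """Extract the parent PID from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ")", so split on the last ")" before reading fields.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is the process state, rest[1] is the parent PID.
    return int(rest[1])


def parent_pid(pid):
    """Return the parent PID of a running process (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        return parse_ppid(f.read())
```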

Did you happen to upgrade lxc to a significantly newer version after starting the container? I don't know of any changes we've made to the monitor protocol, but it's not impossible.

Can you give more details about how you set up your custom container? Is this (somewhat - i.e. you can do it more than once even if not always) reproducible with a precise set of steps?

Changed in lxc (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Dan Poler (l-dan) wrote :

The container has since been restarted (by issuing a 'reboot' inside the container), which "unstuck" lxc-ls; it's now behaving normally. If it happens again I'll see what I can see. It's not the first time I've seen this happen.

LXC has not been upgraded since the container was started, no.

The container was created via 'sudo lxc-create -t download -n <name>' - nothing terribly unusual that separates it from a couple other containers running on the same hardware. It runs a desktop (xfce) which is unique but not a particularly "wacky" use case. The only "special" config is autostart, an lxc.cgroup.devices.allow statement, and an lxc.mount.entry.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for lxc (Ubuntu) because there has been no activity for 60 days.]

Changed in lxc (Ubuntu):
status: Incomplete → Expired