login sessions hangs in lxc

Bug #1572061 reported by brumbjorn
This bug affects 2 people
Affects       Status   Importance  Assigned to  Milestone
lxc (Ubuntu)  Expired  High        Unassigned

Bug Description

Logging in to an LXC container hangs after the login prompt. This happens to a random LXC container while the other containers on the same host work as expected.
Several protocols are affected: ssh, dovecot and smtp, for example. The DNS config is OK. lxc-attach -n name works, and from there I can reach the outside network, resolve names, etc.
If I do an lxc-stop and start, the container works again. After a day or two the same thing happens again to a random container on the same host.
Could it be some kind of AppArmor blocking?
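One way to test the AppArmor theory is to look for denial records in the host's kernel log while a container is hung. The sketch below is only a text filter. It assumes the usual apparmor="DENIED" audit record format, and lxc-container-default is used purely as an example profile name; neither appears in this report.

```shell
#!/bin/sh
# Keep only AppArmor denial records from kernel-log text read on stdin.
# The optional first argument narrows the match to one profile name
# (the profile names used below are examples, not taken from this report).
apparmor_denials() {
    profile="${1:-}"
    if [ -n "$profile" ]; then
        grep 'apparmor="DENIED"' | grep -F "profile=\"$profile\""
    else
        grep 'apparmor="DENIED"'
    fi
}

# Typical use on the host (needs permission to read the kernel log):
#   dmesg | apparmor_denials
#   dmesg | apparmor_denials lxc-container-default
```

If this prints nothing while a container is hung, AppArmor is a less likely culprit, although that alone does not rule it out.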

HostOs
Description: Ubuntu 16.04 LTS
Release: 16.04

network mode

iface br0 inet static
        bridge_ports eno1
         address x.x.x.x
        netmask 255.255.255.0
        network x.x.x.x
        broadcast x.x.x.x
        gateway x.x.x.x
        dns-nameservers x.x.x.x
        dns-search domain.com

lxc-os ubuntu xenial & debian jessie

configured with static ip

best regards
Bjorn

Tags: xenial
brumbjorn (fafners)
tags: added: xenial
brumbjorn (fafners) wrote :

anyone please?

Serge Hallyn (serge-hallyn) wrote :

Please show us the exact commands you used to create, start, and log into the container, as well as the container configuration file.

Changed in lxc (Ubuntu):
status: New → Incomplete
brumbjorn (fafners) wrote :

Debian container:
lxc-create -t download -n kjell -B zfs (selected -> debian jessie amd64)

lxc-start -n kjell

The container's function is remote access over ssh.

# Template used to create this container: /usr/share/lxc/templates/lxc-download
# Parameters passed to the template:
# For additional config options, please look at lxc.container.conf(5)

# Uncomment the following line to support nesting containers:
#lxc.include = /usr/share/lxc/config/nesting.conf
# (Be aware this has security implications)

# Distribution configuration
lxc.include = /usr/share/lxc/config/debian.common.conf
lxc.arch = x86_64

# Container specific configuration
lxc.rootfs = /tank/host/kjell/rootfs
lxc.rootfs.backend = zfs
lxc.utsname = kjell

lxc.start.auto = 1
lxc.start.delay = 5
lxc.start.order = 50

# Network configuration
lxc.network.type = veth
lxc.network.link = br0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:13:89:92
lxc.logfile = /tank/host/kjell/kjell.log
lxc.loglevel = 1
==================================================

Ubuntu Container:

lxc-create -t download -n mail -B zfs (selected -> ubuntu xenial amd64)

lxc-start -n mail

The container is a mail server (postfix, dovecot).

config:

# Template used to create this container: /usr/share/lxc/templates/lxc-ubuntu
# Parameters passed to the template:
# For additional config options, please look at lxc.container.conf(5)

# Uncomment the following line to support nesting containers:
#lxc.include = /usr/share/lxc/config/nesting.conf
# (Be aware this has security implications)
#lxc.aa_profile = unconfined
lxc.start.auto = 1
lxc.start.delay = 8
lxc.start.order = 8

# Common configuration
lxc.include = /usr/share/lxc/config/ubuntu.common.conf

# Container specific configuration
lxc.rootfs = /tank/host/mail/rootfs
lxc.utsname = mail
lxc.arch = amd64

# Network configuration
lxc.network.type = veth
lxc.network.link = br0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:e9:2c:b6

================

Serge Hallyn (serge-hallyn) wrote :

Hi,

when I simply do

sudo lxc-create -t download -n kjell -B zfs -- -d debian -r jessie -a amd64

the resulting container works for me. I can apt-get install openssh-server and log in. However, your configuration file shows 'br0' is in use. Please show how br0 is configured on the host. Please show 'ifconfig -a' output in the container.

brumbjorn (fafners) wrote :

Hi,
it's an intermittent error, meaning it only occurs on rare occasions on the containers.
I am fairly sure that pam or apparmor is the issue here, though. I get the login prompt during the incident, but it hangs after the password. This holds for both the imaps and ssh protocols.

 ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:16:3e:e9:2c:b6
          inet addr:192.168.60.1 Bcast:192.168.60.255 Mask:255.255.255.0
          inet6 addr: fe80::216:3eff:fee9:2cb6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:122486 errors:0 dropped:1809 overruns:0 frame:0
          TX packets:45613 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:54922882 (54.9 MB) TX bytes:44895323 (44.8 MB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:425 errors:0 dropped:0 overruns:0 frame:0
          TX packets:425 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:69808 (69.8 KB) TX bytes:69808 (69.8 KB)

================================

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto br0
iface br0 inet static
        bridge_ports eno1
        address 192.168.60.250
        netmask 255.255.255.0
        network 192.168.60.0
        broadcast 192.168.60.255
        gateway 192.168.60.252
        # dns-* options are implemented by the resolvconf package, if installed
        dns-nameservers 192.168.60.4
        dns-search bjornes.net

Serge Hallyn (serge-hallyn) wrote :

It might help if you can show a ps -ef and screenshot of top while it is hanging, or even better yet an strace -f of the sshd. One somewhat common cause of such hangs is /etc/hosts in the container not having an entry for the localhost.

Changed in lxc (Ubuntu):
importance: Undecided → High
brumbjorn (fafners) wrote :

I just got a hanging lxc; this is the Debian container mentioned above. (127.0.0.1 localhost is defined in /etc/hosts.)

 ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 28148 3176 ? Ss Apr23 0:00 /sbin/init
root 38 0.0 0.0 32968 4364 ? Ss Apr23 0:00 /lib/systemd/systemd-journald
root 96 0.0 0.0 27476 1432 ? Ss Apr23 0:00 /usr/sbin/cron -f
root 102 0.0 0.0 12664 1236 tty3 Ss+ Apr23 0:00 /sbin/agetty --noclear tty3 linux
root 103 0.0 0.0 12664 1212 tty2 Ss+ Apr23 0:00 /sbin/agetty --noclear tty2 linux
root 104 0.0 0.0 12664 1172 tty4 Ss+ Apr23 0:00 /sbin/agetty --noclear tty4 linux
root 105 0.0 0.0 12664 1224 tty1 Ss+ Apr23 0:00 /sbin/agetty --noclear tty1 linux
root 106 0.0 0.0 14236 1272 console Ss+ Apr23 0:00 /sbin/agetty --noclear --keep-baud console 1152
root 121 0.0 0.0 8440 1160 ? S Apr23 0:00 /usr/sbin/syslogd --no-forward
Debian-+ 364 0.0 0.0 53248 2260 ? Ss Apr23 0:00 /usr/sbin/exim4 -bd -q30m
root 1027 0.0 0.0 55184 3908 ? Ss Apr24 0:00 /usr/sbin/sshd -D
root 2016 0.0 0.0 82812 4540 ? Ss 15:56 0:00 sshd: bjorn [priv]
bjorn 2018 0.0 0.0 82812 3116 ? S 15:56 0:00 sshd: bjorn@pts/4
bjorn 2019 0.0 0.0 4336 632 pts/4 Ss 15:56 0:00 -sh
bjorn 2024 0.0 0.0 46476 3768 pts/4 S+ 15:56 0:00 ssh grundbult
root 2190 0.0 0.0 21880 2616 ? Ss 18:01 0:00 /bin/bash
root 2219 0.0 0.0 58600 4092 ? Ss 18:09 0:00 sshd: [accepted]
sshd 2220 0.0 0.0 56528 2744 ? S 18:09 0:00 sshd: [net]
root 2221 0.2 0.0 58600 4148 ? Ss 18:10 0:00 sshd: [accepted]
sshd 2222 0.0 0.0 56528 2740 ? S 18:10 0:00 sshd: [net]
root 2224 0.0 0.0 58600 4156 ? Ss 18:10 0:00 sshd: [accepted]
sshd 2225 0.0 0.0 56528 2652 ? S 18:10 0:00 sshd: [net]
root 2228 0.0 0.0 81592 4536 ? Ss 18:11 0:00 sshd: bjorn [priv]
sshd 2229 0.0 0.0 0 0 ? Z 18:11 0:00 [sshd] <defunct>
root 2231 0.0 0.0 19100 1640 ? R+ 18:11 0:00 ps aux

===============================================================================

strace -f sshd
execve("/usr/sbin/sshd", ["sshd"], [/* 18 vars */]) = 0
brk(0) = 0x559e98a77000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe860909000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=10501, ...}) = 0
mmap(NULL, 10501, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fe860906000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libwrap.so.0", O_RDONLY|O_CLOEXEC...

brumbjorn (fafners) wrote :

strace -f sshd
execve("/usr/sbin/sshd", ["sshd"], [/* 18 vars */]) = 0
brk(0) = 0x559e98a77000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe860909000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=10501, ...}) = 0
mmap(NULL, 10501, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fe860906000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libwrap.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\3000\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=40624, ...}) = 0
mmap(NULL, 2138176, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe8604e0000
mprotect(0x7fe8604e9000, 2093056, PROT_NONE) = 0
mmap(0x7fe8606e8000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8000) = 0x7fe8606e8000
mmap(0x7fe8606ea000, 64, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fe8606ea000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpam.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300&\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=64024, ...}) = 0
mmap(NULL, 2159200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe8602d0000
mprotect(0x7fe8602dd000, 2097152, PROT_NONE) = 0
mmap(0x7fe8604dd000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd000) = 0x7fe8604dd000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20c\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=142728, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe860905000
mmap(NULL, 2246896, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe8600ab000
mprotect(0x7fe8600cc000, 2097152, PROT_NONE) = 0
mmap(0x7fe8602cc000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x21000) = 0x7fe8602cc000
mmap(0x7fe8602ce000, 6384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fe8602ce000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0#\7\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=2066816, ...}) = 0
mmap(NULL, 4176824, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe85fcaf000
mprotect(0x7fe85fe7b000, 2097152, PROT_NONE) = 0
mmap(0x7fe86007b000, 184320, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DE...

brumbjorn (fafners) wrote :

Sorry, my first strace was cut off.
I will leave my lxc hanging with the ssh problem for now, to be able to answer your questions more quickly.
Thank you

//Bjorn

brumbjorn (fafners) wrote :

Hi,
I lxc-attached to the container, stopped sshd and started it again.
The next step was to run strace -p PID on sshd while opening a new ssh session from outside.
As you can see, sshd started no child processes. See the output below from ps aux and strace -p. I have also attached output from netstat -an inside the lxc during the attempted login connection.

===============================================
ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 28208 3312 ? Ss Apr23 0:01 /sbin/init
root 38 0.0 0.0 32968 4372 ? Ss Apr23 0:00 /lib/systemd/systemd-journald
root 96 0.0 0.0 27476 1432 ? Ss Apr23 0:00 /usr/sbin/cron -f
root 102 0.0 0.0 12664 1236 tty3 Ss+ Apr23 0:00 /sbin/agetty --noclear tty3 linux
root 103 0.0 0.0 12664 1212 tty2 Ss+ Apr23 0:00 /sbin/agetty --noclear tty2 linux
root 104 0.0 0.0 12664 1172 tty4 Ss+ Apr23 0:00 /sbin/agetty --noclear tty4 linux
root 105 0.0 0.0 12664 1224 tty1 Ss+ Apr23 0:00 /sbin/agetty --noclear tty1 linux
root 106 0.0 0.0 14236 1272 console Ss+ Apr23 0:00 /sbin/agetty --noclear --keep-baud console 115200 38400 9600 vt102
root 121 0.0 0.0 8440 1160 ? S Apr23 0:00 /usr/sbin/syslogd --no-forward
Debian-+ 364 0.0 0.0 53248 2260 ? Ss Apr23 0:00 /usr/sbin/exim4 -bd -q30m
root 2190 0.0 0.0 21884 2628 ? Ss 18:01 0:00 /bin/bash
root 2284 0.0 0.0 42336 1960 ? S 18:17 0:00 /usr/sbin/CRON -f
root 2475 0.0 0.0 42336 1960 ? S 19:17 0:00 /usr/sbin/CRON -f
root 2686 0.0 0.0 42336 1960 ? S 20:17 0:00 /usr/sbin/CRON -f
root 2773 0.0 0.0 55184 4064 ? Ss 20:22 0:00 /usr/sbin/sshd -D
root 2774 0.0 0.0 19100 1656 ? R+ 20:22 0:00 ps aux
root@kjell:/etc/init.d# strace -p 2773
Process 2773 attached
sendto(4, "<38>Apr 25 20:22:26 sshd[2773]: "..., 68, MSG_NOSIGNAL, NULL, 0

===================================================================

netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp 49 0 192.168.60.2:22 192.168.60.55:52195 ESTABLISHED
tcp 22 0 192.168.60.2:22 192.168.60.55:52190 CLOSE_WAIT
tcp 22 0 192.168.60.2:22 192.168.60.55:52186 CLOSE_WAIT
tcp6 0 0 ::1:25 :::* LISTEN
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
unix 2 [ ] DGRAM 27827 /run/systemd/notify
unix 10 [ ] DGRAM 28002 /dev/log
unix 2 [ ACC ] STREAM LISTENING 27828 /run/systemd/private
unix 2 [ ] DGRAM 27831 /run/systemd/shutdownd
unix 2 [ ] ...
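In the netstat listing above, the connections to port 22 show a non-zero Recv-Q, which for a non-listening TCP socket is the number of bytes the owning process has not yet read. That is consistent with sshd being stuck before it services the connection, though it does not say why. A minimal sketch for pulling such sockets out of `netstat -an` output (the function name `unread_tcp` is made up for illustration):

```shell
#!/bin/sh
# Read `netstat -an` output on stdin and print non-listening TCP sockets
# whose Recv-Q (second column) is non-zero, i.e. data nobody has read yet.
unread_tcp() {
    awk '$1 ~ /^tcp/ && $6 != "LISTEN" && $2 + 0 > 0 {
        print $4, $5, $6, "recvq=" $2
    }'
}

# Typical use inside the container:
#   netstat -an | unread_tcp
```

Sockets that stay in this list across several runs are the ones worth matching to a process and attaching strace to.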


Serge Hallyn (serge-hallyn) wrote :

Odd, I wonder why you are getting the re-exec requires full path error, as execve is clearly getting a full path.

So it seems like an sshd bug, let's see when we get the un-cut strace.

brumbjorn (fafners) wrote : Re: [Bug 1572061] Re: login sessions hangs in lxc

Serge,
Sorry for being unclear.
This is not only related to ssh; as I mentioned earlier, it happens to
dovecot too.


brumbjorn (fafners) wrote :

Check my earlier post for the un-cut strace.


Serge Hallyn (serge-hallyn) wrote :

Can you show the contents of the container's /etc/hosts? Does adding an entry for

127.0.0.1 kjell

help?

brumbjorn (fafners) wrote :

Ok I tested as described.
127.0.0.1 localhost kjell
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

No, it did not solve the issue :(

//Bjorn

brumbjorn (fafners) wrote :

My latest observation is that lxc-console hangs as well. During login via lxc-console it hangs after the motd.

lxc-console -n kjell

Connected to tty 1
Type <Ctrl+a q> to exit the console, <Ctrl+a Ctrl+a> to enter Ctrl+a itself

Debian GNU/Linux 8 kjell tty1

kjell login: NNNN
Password:
Last login: Mon Apr 25 15:56:13 CEST 2016 from 151.156.192.9 on pts/4
Linux kjell 4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

__________________________________

I listed the processes and found that even CRON was stuck. See the list below.

ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 28208 3316 ? Ss Apr23 0:01 /sbin/init
root 38 0.0 0.0 32968 4372 ? Ss Apr23 0:00 /lib/systemd/systemd-journald
root 96 0.0 0.0 27476 1432 ? Ss Apr23 0:00 /usr/sbin/cron -f
root 102 0.0 0.0 12664 1236 tty3 Ss+ Apr23 0:00 /sbin/agetty --noclear tty3 linux
root 103 0.0 0.0 12664 1212 tty2 Ss+ Apr23 0:00 /sbin/agetty --noclear tty2 linux
root 104 0.0 0.0 12664 1172 tty4 Ss+ Apr23 0:00 /sbin/agetty --noclear tty4 linux
root 105 0.0 0.0 63316 2268 tty1 Ss+ Apr23 0:00 /bin/login --
root 106 0.0 0.0 14236 1272 console Ss+ Apr23 0:00 /sbin/agetty --noclear --keep-baud console 115200 38400 9600 vt102
root 121 0.0 0.0 8440 1160 ? S Apr23 0:00 /usr/sbin/syslogd --no-forward
Debian-+ 364 0.0 0.0 53248 2260 ? Ss Apr23 0:00 /usr/sbin/exim4 -bd -q30m
root 2284 0.0 0.0 42336 1960 ? S Apr25 0:00 /usr/sbin/CRON -f
root 2475 0.0 0.0 42336 1960 ? S Apr25 0:00 /usr/sbin/CRON -f
root 2686 0.0 0.0 42336 1960 ? S Apr25 0:00 /usr/sbin/CRON -f
root 2789 0.0 0.0 42336 1960 ? S Apr25 0:00 /usr/sbin/CRON -f
root 2794 0.0 0.0 42336 1960 ? S Apr25 0:00 /usr/sbin/CRON -f
root 2813 0.0 0.0 55184 4056 ? Ss Apr25 0:00 /usr/sbin/sshd -D
root 2864 0.0 0.0 42336 1960 ? S Apr25 0:00 /usr/sbin/CRON -f
root 2869 0.0 0.0 42336 1960 ? S 00:17 0:00 /usr/sbin/CRON -f
root 2874 0.0 0.0 42336 1960 ? S 01:17 0:00 /usr/sbin/CRON -f
root 2879 0.0 0.0 42336 1960 ? S 02:17 0:00 /usr/sbin/CRON -f
root 2884 0.0 0.0 42336 1960 ? S 03:17 0:00 /usr/sbin/CRON -f
root 2889 0.0 0.0 42336 1960 ? S 04:17 0:00 /usr/sbin/CRON -f
root 2894 0.0 0.0 42336 1960 ? S 05:17 0:00 /usr/sbin/CRON -f
root 2900 0.0 0.0 42336 1960 ? S 06:17 0:00 /usr/sbin/CRON -f
root 2901 0.0 0.0 21888 2668 ? Ss 06:20 0:00 /bin/bash
root 2934 0.0 0.0 19100 1628 ? R+ 06:21 0:00 ps aux
___________...
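The listing above contains 13 `/usr/sbin/CRON -f` entries started roughly an hour apart; these appear to be cron's forked job children that never finished. A quick way to spot this kind of pile-up is to count processes per command path. The sketch below assumes standard `ps aux` column order (command path in column 11); `count_by_command` is a name made up for illustration.

```shell
#!/bin/sh
# Read `ps aux` output on stdin (header line included) and print how many
# processes share each command path, most frequent first.
count_by_command() {
    awk 'NR > 1 { n[$11]++ } END { for (c in n) print n[c], c }' |
        sort -k1,1nr -k2
}

# Typical use inside the container:
#   ps aux | count_by_command | head
```

A count that keeps growing between runs for a daemon's child processes suggests those children are blocked rather than merely busy.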


Serge Hallyn (serge-hallyn) wrote :

Hi,

thanks again for reporting this. I simply cannot reproduce this and haven't
found any likely cause. If you can reliably reproduce this in some sort
of cloud instance, i.e. so we could start a digitalocean or amazon instance
with a cloud-init script to set up a bad container, that would be
immensely helpful.

My best guess is still that this is a result of the static host bridge
configuration. Have you been able to reproduce this using the default
lxcbr0?

brumbjorn (fafners) wrote :

Hi,
Switching to the default rsyslog in the Ubuntu lxc seems stable; I have not run into any hanging sessions since replacing syslogd.
I still have an lxc with jessie and syslogd (the one from the mails above). I can see if I can reproduce this error with a clone of that container and lxcbr0.
This is an intermittent error, so it can take some time for me to hit the bug again.
All the best,

brumbjorn (fafners) wrote :

I got a hanging session again in the container mentioned in my last post.

 strace -f syslogd
execve("/usr/sbin/syslogd", ["syslogd"], [/* 18 vars */]) = 0
brk(0) = 0xa67000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9128bad000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=10501, ...}) = 0
mmap(NULL, 10501, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9128baa000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libutil.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\17\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=10680, ...}) = 0
mmap(NULL, 2105624, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f912878c000
mprotect(0x7f912878e000, 2093056, PROT_NONE) = 0
mmap(0x7f912898d000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7f912898d000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1738176, ...}) = 0
mmap(NULL, 3844640, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f91283e1000
mprotect(0x7f9128583000, 2093056, PROT_NONE) = 0
mmap(0x7f9128782000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a1000) = 0x7f9128782000
mmap(0x7f9128788000, 14880, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9128788000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9128ba9000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9128ba8000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9128ba7000
arch_prctl(ARCH_SET_FS, 0x7f9128ba8700) = 0
mprotect(0x7f9128782000, 16384, PROT_READ) = 0
mprotect(0x7f912898d000, 4096, PROT_READ) = 0
mprotect(0x611000, 4096, PROT_READ) = 0
mprotect(0x7f9128baf000, 4096, PROT_READ) = 0
munmap(0x7f9128baa000, 10501) = 0
brk(0) = 0xa67000
brk(0xa88000) = 0xa88000
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(3) = 0
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(3) = 0
open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=497, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVA...

Serge Hallyn (serge-hallyn) wrote :

Hi - could you show the result of

ls -l /dev/kmsg

in the container?

brumbjorn (fafners) wrote :

Hi,
There is no such file in any of my containers.

I had to take action on this issue, so I removed inetutils-syslogd in my affected lxc:s.
Everything is now stable, so I think it's clear that this is related to the behavior of syslogd in 16.04 lxc:s.
I'm sorry to say that I have no cloud resources to set this up as you asked.

Best regards
Bjorn

Launchpad Janitor (janitor) wrote :

[Expired for lxc (Ubuntu) because there has been no activity for 60 days.]

Changed in lxc (Ubuntu):
status: Incomplete → Expired
Stanislav Sinyagin (ssinyagin) wrote :

I have exactly the same problem on my host. Host OS: Debian 8.10. Container OS: Debian 9.4

I started "/usr/sbin/sshd -d -p 8022", and when I connected to it from another host, the process hung, with this as the last log message:

debug1: rexec start in 5 out 5 newsock 5 pipe -1 sock 8

Also strace shows a similar line:

# strace -p 12014
strace: Process 12014 attached
sendto(3, "<39>May 6 02:42:08 sshd[12014]:"..., 94, MSG_NOSIGNAL, NULL, 0

Then while this was hanging, I did "apt-get remove inetutils-syslogd", and the SSH session immediately went through.

But RDP client still cannot connect to xrdp running in this container. I'll keep watching it after a reboot.
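Both strace captures in this thread end on an unfinished sendto() whose payload starts with "<38>" or "<39>" followed by a timestamp. That looks like a syslog message (the number being the syslog priority, facility * 8 + severity), so a plausible reading is that the daemon is blocked writing to the local syslog socket; the thread itself never confirms this. The sketch below checks whether the last line of an strace capture has that shape and decodes the priority. `blocked_syslog_send` is a hypothetical helper name.

```shell
#!/bin/sh
# Read strace output on stdin. If the LAST line is a sendto() carrying a
# syslog-style "<PRI>" prefix and no ") = result" yet, print the decoded
# priority (PRI = facility * 8 + severity) and return 0; otherwise return 1.
blocked_syslog_send() {
    tail -n 1 | awk '
        /^sendto\([0-9]+, "<[0-9]+>/ && !/\) = / {
            pri = $0
            sub(/^[^<]*</, "", pri)   # drop everything up to the first "<"
            sub(/>.*/, "", pri)       # drop everything after the number
            printf "pri=%d facility=%d severity=%d\n", pri, int(pri / 8), pri % 8
            found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Typical use, with a saved capture of a hung process:
#   blocked_syslog_send < sshd.strace
```

For the two lines quoted in this report, 38 decodes to facility 4 (auth), severity 6 (info), and 39 to facility 4, severity 7 (debug), which fits sshd logging and sshd run with -d respectively.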
