pam login 25s timeout delay due to tickling ntp systemd units

Bug #1916226 reported by kehlarn
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu
New
Undecided
Unassigned

Bug Description

TLDR:

attempting to run ntp in standalone instead of dhcp driven somehow leads to logind/dbus pam timeouts and throw errors to auth.log and results in pam logins via console/ssh/su to be delayed by 25s for said timeout but otherwise work. at first I thought I had a regression on 1591411 but this is different. I do not understand why this happened but presume it is because nm has some weird side effect on network-pre.target so disabling the two ntp units didn't work as I intended.

I have since run into 1823098 which has the same starting issue I had which is to run ntp in standalone mode with a provided /etc/ntp.conf that isn't modified by dhcp. I am filing this bug in case somebody else travels down this road, hits these log messages and wonders wtf broke.

the ntp issues:
- there should be a knob in /etc/default/ntp to specify which form of ntp you want (none/standalone/dhcp/netplan/etc)
- vaguely using netplan 'use-ntp' shouldn't be the only way to handle this and even that's locked into a bool instead of a selection since it has sync semantics
- dhcp shouldn't assume 'ntp-servers' is requested without checking for it. there's no /etc/default/dhcp that I can see so you have to modify /etc/dhcp/dhclient.conf directly as it is in the request block and not a useful hook.

perhaps there's an intended way to do this but I didn't run into it thus far. open to suggestions. meanwhile my solution was simply to disable the two units and no longer be a network-pre.target. it seemed to work for 2days and then I started seeing these on pam auths:

===

DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"

$ uptime #all users are just xterm login shells for me in X
 23:34:24 up 34 days, 2:55, 31 users, load average: 0.55, 0.44, 0.38

$ dpkg -l|egrep 'systemd |dbus-user-session'|awk '{print $1,$2,$3,$4}'|column -t
ii dbus-user-session 1.12.2-1ubuntu1.2 amd64
ii libsystemd0:amd64 237-3ubuntu10.42 amd64
ii libsystemd0:i386 237-3ubuntu10.42 i386
ii python3-systemd 234-1build1 amd64
ii systemd 237-3ubuntu10.42 amd64

$ systemd --version
systemd 237
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid

$ ps laxwww|grep dbus-[d]
4 106 1021 1 20 0 52740 3548 - Ss ? 13:45 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
0 1000 1839 1712 20 0 51580 2160 ep_pol Ss ? 1:37 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
0 1000 2286 2273 20 0 50052 984 ep_pol S ? 0:16 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3

===8<: trigger1
$ date;sudo su - build -c /bin/date
Thu Feb 18 22:54:40 PST 2021
[sudo] password for luser:
Thu Feb 18 22:55:06 PST 2021

===8<: trigger1 auth.log
Feb 18 22:54:10 bahx dbus-daemon[1021]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
Feb 18 22:54:10 bahx sshd[22024]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Feb 18 22:54:10 bahx systemd-logind[1030]: Failed to start session scope session-941.scope: Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
Feb 18 22:54:10 bahx sshd[22090]: Received disconnect from 127.0.0.1 port 50456:11: disconnected by user
Feb 18 22:54:10 bahx sshd[22090]: Disconnected from user luser 127.0.0.1 port 50456
Feb 18 22:54:10 bahx sshd[22024]: pam_unix(sshd:session): session closed for user luser
Feb 18 22:54:41 bahx sudo: luser : TTY=pts/26 ; PWD=/home/luser ; USER=root ; COMMAND=/bin/su - build -c /bin/date
Feb 18 22:54:41 bahx sudo: pam_unix(sudo:session): session opened for user root by luser(uid=0)
Feb 18 22:54:41 bahx su[22111]: Successful su for build by root
Feb 18 22:54:41 bahx su[22111]: + /dev/pts/26 root:build
Feb 18 22:54:41 bahx su[22111]: pam_unix(su:session): session opened for user build by luser(uid=0)
Feb 18 22:55:06 bahx dbus-daemon[1021]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
Feb 18 22:55:06 bahx su[22111]: pam_systemd(su:session): Failed to create session: Connection timed out
Feb 18 22:55:06 bahx systemd-logind[1030]: Failed to start user slice user-20000.slice, ignoring: Connection timed out (org.freedesktop.DBus.Error.Timeout)
Feb 18 22:55:06 bahx su[22111]: pam_unix(su:session): session closed for user build
Feb 18 22:55:06 bahx sudo: pam_unix(sudo:session): session closed for user root

===8<: trigger2
$ date;ssh localhost date
Thu Feb 18 22:55:52 PST 2021
generic banner
luser@localhost's password:
Thu Feb 18 22:56:18 PST 2021

===8<: trigger2 auth.log
Feb 18 22:55:53 bahx sshd[22135]: Accepted password for luser from 127.0.0.1 port 50608 ssh2
Feb 18 22:55:53 bahx sshd[22135]: pam_unix(sshd:session): session opened for user luser by (uid=0)
Feb 18 22:55:56 bahx systemd-logind[1030]: Failed to start session scope session-2.scope: Connection timed out
Feb 18 22:55:56 bahx dbus-daemon[1021]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
Feb 18 22:56:12 bahx dbus-daemon[1021]: [system] Failed to activate service 'org.freedesktop.nm_dispatcher': timed out (service_start_timeout=25000ms)
Feb 18 22:56:18 bahx sshd[22135]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Feb 18 22:56:18 bahx sshd[22202]: Received disconnect from 127.0.0.1 port 50608:11: disconnected by user
Feb 18 22:56:18 bahx sshd[22202]: Disconnected from user luser 127.0.0.1 port 50608
Feb 18 22:56:18 bahx sshd[22135]: pam_unix(sshd:session): session closed for user luser
Feb 18 22:56:21 bahx dbus-daemon[1021]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
Feb 18 22:56:21 bahx systemd-logind[1030]: Failed to stop user service: Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)

===

all other references to these log messages end up with a generic restart logind/daemon reexec/reload and ignore the cause so I didn't repeat them blindly.

instead i reverted and reenabled/started the two units I disabled/stopped and voila pam worked and no 25s delay on login.

===
$ for i in /var/log/auth.log*;do echo $i|egrep -q gz;(test $? -eq 0&&gzip -dc $i||cat $i)|egrep -i systemd1;done|awk 'BEGIN{IFS=":"}{print $1,$2}'|sort|uniq -c
      2 Feb 15
    277 Feb 17
    424 Feb 18 #mostly me triaging ssh/sudo su

$ for i in /var/log/auth.log*;do echo $i|egrep -q gz\$;(test $? -eq 0&&gzip -dc $i||cat $i)|egrep -i systemd1|awk '{print "'$i'",$0}';done|egrep 'Feb 15'
/var/log/auth.log Feb 15 06:51:05 bahx polkitd(authority=local): Operator of unix-session:c2 FAILED to authenticate to gain authorization for action org.freedesktop.systemd1.manage-units for system-bus-name::1.12149 [systemctl start ntp.service] (owned by unix-user:luser)
/var/log/auth.log Feb 15 07:06:43 bahx polkitd(authority=local): Operator of unix-session:c2 FAILED to authenticate to gain authorization for action org.freedesktop.systemd1.manage-units for system-bus-name::1.12173 [systemctl reset-failed ntp-systemd-netif.service] (owned by unix-user:luser)

===
this was me attempting to shut them down in a xterm without sudo and canceling the auth dialog and redoing them with sudo. the units disabled/stopped:

ntp-systemd-netif.service
ntp-systemd-netif.path

because of:

===
$ systemctl list-units --failed --all
  UNIT LOAD ACTIVE SUB DESCRIPTION
● ntp-systemd-netif.service loaded failed failed ntp-systemd-netif.service

with:

sudo systemctl reset-failed ntp-systemd-netif.service

===

at this point I have disabled/stopped the units again and pam is still happy. I'll let it run to see if it returns. i did change one thing though since then. because the failure of the unit was because the target script hook for ntp was removed I replaced the ntp hook with a simple exit 0 bash script so it doesn't trigger failure. perhaps that will be enough to prevent this cascade? time will tell

in any event something this minor in a single daemon shouldn't break pam across the system I should think. that's rather a violent overreach methinks.

whether this classifies under nm, dbus, systemd-logind, or ntp is the real mystery. nm barked first it seems. there's some nuance in ntp's systemd config not aligning with nm I guess? why it took 2 days to trigger I assume is like the previous issue requiring enough usage or SOMAXCONN or something like it before 'it' happens. still tbd. i triggered another nonsudo auth dialog to try to reproduce but seems fine still

Tags: bot-comment
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1916226/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.