ppc64el ubuntu-server ISO does not install libpam-systemd

Bug #1561658 reported by Breno Leitão on 2016-03-24
This bug affects 2 people
Affects               Status        Importance  Assigned to  Milestone
Ubuntu Seeds          Fix Released  Critical    Unassigned
ubuntu-meta (Ubuntu)  Fix Released  Critical    Martin Pitt

Bug Description

On Ubuntu 16.04/ppc64el, when a user logs in to the machine over SSH, the user session (bash) stays in the cgroup of ssh.service instead of getting its own session cgroup.

This causes the number of processes to be limited by DefaultTasksMax=512 from /etc/systemd/system.conf.

This does not seem to happen on amd64. The cgroup trees differ as follows:

On x64, bash (in this case, PID 19405) spawned by sshd belongs to the cgroup session-5.scope->user-1003.slice->user.slice:

└─user.slice
  ├─user-1000.slice
  │ ├─session-1.scope
  │ │ ├─634 sshd: brenohl [priv]
  │ │ ├─660 sshd: brenohl@pts/0
  │ │ └─661 -bash
  │ └─user@1000.service
  │   ├─636 /lib/systemd/systemd --user
  │   └─637 (sd-pam)
  └─user-1003.slice
    ├─session-5.scope
    │ ├─19379 sshd: gromero [priv]
    │ ├─19404 sshd: gromero@pts/1
    │ ├─19405 -bash

However, on ppc64le, bash (in this case, PID 1913) spawned by sshd belongs to the cgroup ssh.service->system.slice->-.slice:

-.slice
├─1720 /sbin/cgmanager -m name=systemd
├─init.scope
└─system.slice
  ├─dbus.service
  │ └─1699 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  ├─cron.service
  │ └─1702 /usr/sbin/cron -f
  ├─<email address hidden>
  │ └─1833 /sbin/dhclient
  ├─accounts-daemon.service
  │ └─1717 /usr/lib/accountsservice/accounts-daemon
  ├─system-serial\x2dgetty.slice
  │ └─<email address hidden>
  │   └─1875 /sbin/agetty --keep-baud 115200 38400 9600 hvc0 vt220
  ├─systemd-journald.service
  │ └─1382 /lib/systemd/systemd-journald
  ├─systemd-timesyncd.service
  │ └─1639 /lib/systemd/systemd-timesyncd
  ├─ssh.service
  │ ├─1863 /usr/sbin/sshd -D
  │ ├─1897 sshd: gromero [priv]
  │ ├─1912 sshd: gromero@pts/0
  │ ├─1913 -bash

Having the user session associated with the ssh.service cgroup (/system.slice/ssh.service) instead of the normal user/session cgroups (such as user-XXXX.slice/session-5.scope) subjects every process in the session to the service's systemd TasksMax limit. This causes "Cannot fork" and "Resource temporarily unavailable" errors once the number of processes reaches the limit of 512.
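
A minimal sketch of how the two placements shown in the trees above could be told apart programmatically. It assumes the cgroup v1 "hierarchy-ID:controllers:path" line format of /proc/<pid>/cgroup; the function name session_kind is hypothetical and not part of systemd or any library.

```python
def session_kind(cgroup_text, controller="name=systemd"):
    """Classify a process from the text of its /proc/<pid>/cgroup file.

    Returns "session" if the process sits in a session-N.scope,
    "service" if it sits directly in a *.service cgroup,
    and "other" for anything else (or if the controller is not listed).
    """
    for line in cgroup_text.splitlines():
        # Each cgroup v1 line looks like: hierarchy-ID:controller-list:path
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        _, controllers, path = parts
        if controller not in controllers.split(","):
            continue
        leaf = path.rstrip("/").rsplit("/", 1)[-1]
        if leaf.startswith("session-") and leaf.endswith(".scope"):
            return "session"
        if leaf.endswith(".service"):
            return "service"
        return "other"
    return "other"
```

Fed the contents of /proc/self/cgroup from inside an SSH login, this would be expected to report "session" on a correctly configured system and "service" on a system showing the problem described here.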

Gustavo Romero has more details about this problem, and will comment soon.

Changed in systemd (Ubuntu):
assignee: nobody → Taco Screen team (taco-screen-team)
status: New → Confirmed
Gustavo Romero (gromero) wrote :

Using this Makefile with "make -j500" will trigger the problematic behavior described in this bug.

Steve Langasek (vorlon) wrote :

Martin, please have a look at this (next week).

Changed in systemd (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Martin Pitt (pitti)
importance: Undecided → Critical
milestone: none → ubuntu-16.04
Gustavo Romero (gromero) wrote :

We can also check the different behavior of cgroups on ppc64le and on amd64 by means of specifying the exact cgroups under which we execute 'make':

## ppc64le
make clean; sudo cgexec -g devices:/system.slice --sticky make -j500 ## Fails on ppc64le
make clean; sudo cgexec -g devices:/system.slice/ssh.service --sticky make -j500 ## Fails on ppc64le
make clean; sudo cgexec -g pids:/system.slice --sticky make -j500 ## Works on ppc64le
make clean; sudo cgexec -g pids:/system.slice/ssh.service --sticky make -j500 ## Fails on ppc64le

## amd64
make clean; sudo cgexec -g devices:/system.slice --sticky make -j500 ## Works on x64
make clean; sudo cgexec -g devices:/system.slice/ssh.service --sticky make -j500 ## Works on x64
make clean; sudo cgexec -g pids:/system.slice --sticky make -j500 ## Cgroup does not exist on x64
make clean; sudo cgexec -g pids:/system.slice/ssh.service --sticky make -j500 ## Cgroup does not exist on x64

Gustavo Romero (gromero) wrote :

When make fails, we get something like:

sleep 1
touch makefork-466
make: fork: Resource temporarily unavailable
sleep 1
make: fork: Resource temporarily unavailable
make: *** Deleting file 'makefork-464'
sleep 1
make: fork: Resource temporarily unavailable
make: *** Deleting file 'makefork-465'

Martin Pitt (pitti) wrote :

I checked this on a ppc64el xenial scalingstack instance, and ssh sessions are in the expected controller:

$ egrep 'systemd|pids' /proc/self/cgroup
5:pids:/user.slice/user-1000.slice/session-2.scope
1:name=systemd:/user.slice/user-1000.slice/session-2.scope

$ cat /sys/fs/cgroup/pids//user.slice/user-1000.slice/session-2.scope/pids.max
max

What is "cat /proc/self/cgroup" in an ssh session for you? The cgroup tree output from above is likely the name=systemd controller, but TasksMax translates to the "pids.max" setting in the "pids" cgroup controller.
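
The relationship described here, TasksMax ending up as pids.max under the pids controller, can be sketched as follows. This assumes the cgroup v1 mount point /sys/fs/cgroup/pids used in the command above; the helper names are hypothetical.

```python
import posixpath


def pids_cgroup_path(cgroup_text):
    """Return the cgroup path of the 'pids' controller, or None if absent."""
    for line in cgroup_text.splitlines():
        # cgroup v1 line format: hierarchy-ID:controller-list:path
        parts = line.strip().split(":", 2)
        if len(parts) == 3 and "pids" in parts[1].split(","):
            return parts[2]
    return None


def pids_max_file(cgroup_text, root="/sys/fs/cgroup/pids"):
    """Build the path of the pids.max file for the process's own cgroup."""
    path = pids_cgroup_path(cgroup_text)
    if path is None:
        return None
    return posixpath.join(root, path.lstrip("/"), "pids.max")


def parse_pids_max(value):
    """Convert pids.max content to an int, or None when it reads 'max'."""
    value = value.strip()
    return None if value == "max" else int(value)
```

This only looks at the leaf cgroup. The pids controller enforces limits hierarchically, so a lower limit on a parent cgroup would still apply even when the leaf reports "max".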

So we need to find out what's different on your system:

 - Do you see any error messages in "sudo journalctl -t sshd"?
 - Does your session have a $XDG_SESSION_ID (should be a small number)?
 - What is the output of "sudo journalctl -t systemd-logind", "loginctl", and "loginctl show-session $XDG_SESSION_ID"?
 - Can you please just attach the output of the full journal, just in case? (sudo journalctl -b > /tmp/journal.txt)

Thank you!

summary: - “Cannot fork” and "Resource temporary unavailable"
+ ppc64el ssh sessions don't run in session cgroup but in sshd's
Changed in systemd (Ubuntu):
status: Confirmed → Incomplete

Output produced by "sudo journalctl -b > /tmp/journal.txt"

Gustavo Romero (gromero) wrote :

Hi Martin,

The ssh session controller here is different from yours:

$ egrep 'systemd|pids' /proc/self/cgroup
11:pids:/system.slice/ssh.service
1:name=systemd:/system.slice/ssh.service

$ cat /sys/fs/cgroup/pids/system.slice/pids.max
max

$ cat /sys/fs/cgroup/pids/system.slice/ssh.service/pids.max
512

Here is the evidence you requested (journal.txt is attached above this comment):

$ cat /proc/self/cgroup
11:perf_event:/
10:blkio:/
9:memory:/
8:net_cls,net_prio:/
7:freezer:/
6:cpuset:/
5:cpu,cpuacct:/
4:devices:/system.slice/ssh.service
3:hugetlb:/
2:pids:/system.slice/ssh.service
1:name=systemd:/system.slice/ssh.service

No error messages in "sudo journalctl -t sshd":
$ sudo journalctl -t sshd -l
-- Logs begin at Wed 2016-03-30 09:55:43 BRT, end at Wed 2016-03-30 10:01:01 BRT. --
Mar 30 09:55:44 gromero2 sshd[1902]: Server listening on 0.0.0.0 port 22.
Mar 30 09:55:44 gromero2 sshd[1902]: Server listening on :: port 22.
Mar 30 09:57:29 gromero2 sshd[1942]: Accepted publickey for gromero from 192.168.122.1 port 46308 ssh2: RSA SHA256:KD/WZk56NzE26ubz6Aw8LE1RdJfeQlRzJ36wwYHyE0c
Mar 30 09:57:29 gromero2 sshd[1942]: pam_unix(sshd:session): session opened for user gromero by (uid=0)

$XDG_SESSION_ID is unset, echo $XDG_SESSION_ID shows nothing.

$ sudo journalctl -t systemd-logind
-- Logs begin at Wed 2016-03-30 09:55:43 BRT, end at Wed 2016-03-30 10:03:34 BRT. --
Mar 30 09:55:44 gromero2 systemd-logind[1780]: New seat seat0.

$ loginctl
   SESSION UID USER SEAT

0 sessions listed.

$ loginctl show-session $XDG_SESSION_ID
EnableWallMessages=no
NAutoVTs=6
KillExcludeUsers=root
KillUserProcesses=no
RebootToFirmwareSetup=no
IdleHint=yes
IdleSinceHint=0
IdleSinceHintMonotonic=0
InhibitDelayMaxUSec=5s
HandlePowerKey=poweroff
HandleSuspendKey=suspend
HandleHibernateKey=hibernate
HandleLidSwitch=suspend
HandleLidSwitchDocked=ignore
HoldoffTimeoutUSec=30s
IdleAction=ignore
IdleActionUSec=30min
PreparingForShutdown=no
PreparingForSleep=no
Docked=no

Thank you!

Martin Pitt (pitti) wrote :

A-ha! So there's no logind session, and indeed if I remove pam_systemd.so from /etc/pam.d/common-session I get the same effect.

Do you have "session optional pam_systemd.so" in /etc/pam.d/common-session? If so, it fails for some reason and we need to find out why. But there's no trace in the journal of an error or of it trying to create a session, so it's more likely just missing.

Assuming that it is missing indeed, how was this system installed? Do you do any customizations to that file? Normally this is handled automatically by pam-auth-update. If you run that (via sudo), it should have "Create cgroups for user login sessions" enabled, and create a file with pam_systemd. What happens there?
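
A minimal sketch of the check asked for above: does a PAM configuration contain a session line that loads pam_systemd.so? The function name has_pam_systemd is hypothetical. The parsing is deliberately simple and does not follow "@include" directives, so it has to be given the text of /etc/pam.d/common-session itself rather than a file that merely includes it.

```python
def has_pam_systemd(pam_config_text):
    """Return True if any active 'session' line references pam_systemd.so."""
    for raw_line in pam_config_text.splitlines():
        # Drop comments, then split the rule into whitespace-separated tokens.
        tokens = raw_line.split("#", 1)[0].split()
        if not tokens:
            continue
        # The type field may carry a leading '-' (e.g. "-session").
        if tokens[0].lstrip("-") != "session":
            continue
        # The module may be given as a bare name or as a full path.
        if any(tok.rsplit("/", 1)[-1] == "pam_systemd.so" for tok in tokens[1:]):
            return True
    return False
```

On a system where libpam-systemd is not installed, this would be expected to return False for common-session, matching the missing logind session seen in the loginctl output.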

summary: - ppc64el ssh sessions don't run in session cgroup but in sshd's
+ ssh sessions don't run in session cgroup but in sshd's
summary: - ssh sessions don't run in session cgroup but in sshd's
+ ssh sessions don't run in session cgroup but in sshd's -- pam_systemd
+ missing

Gustavo Romero (gromero) wrote :

Martin, you are right! libpam-systemd is not installed, so there is no pam_systemd.so file and no "session optional pam_systemd.so" entry in /etc/pam.d/common-session. I've just installed libpam-systemd and logged in again, and the issue vanished. However, it seems "sudo pam-auth-update" is not handling it automatically, so executing it has no effect (I see just two options: Unix authentication [x] and Create home directory on login [ ]; so no "Create home directory on login" option). There are no customizations in any file.

Now ssh sessions are in the right controller:

$ egrep 'systemd|pids' /proc/self/cgroup
5:pids:/user.slice/user-1000.slice/session-2.scope
1:name=systemd:/user.slice/user-1000.slice/session-2.scope

and also, for instance, "make -j500" works fine.

The system was installed initially from a daily build (about a month ago) and updated/upgraded/dist-upgraded/do-release-upgraded continuously ever since.

I'm installing a fresh one to verify if this issue still exists or not. I'll let you know.

Thanks!

Gustavo Romero (gromero) wrote :

Martin, I meant "so no `Create cgroups for user login sessions` option" in the previous comment.

Breno Leitão (breno-leitao) wrote :

It seems that libpam-systemd is installed as a dependency of policykit-1 on amd64. policykit-1 is installed automatically on amd64 but not on ppc64el.

So, I understand that we might have two solutions:

a) Install policykit-1 on ppc64el, which will bring in libpam-systemd as a dependency

b) Make systemd depend on libpam-systemd.

Option b) seems better than a) at first glance, mainly because with solution a) a user can remove policykit-1 (or even libpam-systemd) and start being bitten by this problem.

Martin Pitt (pitti) wrote :

> b) Make systemd depend on libpam-systemd.

systemd already Recommends: libpam-systemd, so it should already be installed by default. It's not strictly required, so it can be removed for people who know what they are doing, don't need user logins, and need a minimal footprint (embedded devices or container workloads, for example). Similar story for policykit-1.

Hence my question how you installed this box -- if this is any supported method such as netboot, and that doesn't install it by default, this is what we need to fix.
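
The distinction between Depends and Recommends discussed here can be illustrated with a small sketch that reports which relationship field of a Debian control stanza mentions a given package. The function names are hypothetical, the parser handles only the simple "Field: value" layout with continuation lines, and any stanza passed to it in an example is illustrative rather than the real systemd package metadata.

```python
def parse_fields(stanza):
    """Parse 'Field: value' lines (with indented continuations) into a dict."""
    fields = {}
    key = None
    for line in stanza.splitlines():
        if line[:1] in (" ", "\t") and key is not None:
            # Continuation of the previous field.
            fields[key] += " " + line.strip()
        elif ":" in line:
            key, value = line.split(":", 1)
            key = key.strip()
            fields[key] = value.strip()
    return fields


def relation_to(stanza, package):
    """Return the strongest relationship field naming 'package', or None."""
    fields = parse_fields(stanza)
    for field in ("Pre-Depends", "Depends", "Recommends", "Suggests"):
        for group in fields.get(field, "").split(","):
            for alternative in group.split("|"):
                # Strip version constraints "(>= 1.0)" and ":arch" qualifiers.
                name = alternative.split("(")[0].strip().split(":")[0]
                if name == package:
                    return field
    return None
```

For a stanza containing "Recommends: libpam-systemd", relation_to(stanza, "libpam-systemd") returns "Recommends". An installer that skips Recommends would leave such a package out, which is the scenario being asked about.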

Gustavo Romero (gromero) wrote :

Hi, Martin. It seems Breno talked to Adam Conrad, and the ppc64le ISO is not installing policykit-1, although it should, as we can see here: http://goo.gl/WTK54h. The ISO I installed and tested is this one:

http://cdimage.ubuntu.com/ubuntu-server/daily/current/xenial-server-ppc64el.iso

Regarding moving libpam-systemd higher in the install stack or not: I think (but am not sure) the install procedure doesn't install by default "Recommends", so it won't install libpam-systemd anyway if policykit-1 is left out of the install. On x64 we get libpam-systemd only because policykit-1 is installed, and given the importance of libpam-systemd I would vote for b) as Breno said, i.e. not "Recommends" but "Depends".

Martin Pitt (pitti) wrote :

> the install procedure doesn't install by default "Recommends",

That would be a grave bug indeed. "Recommends" are pretty strong, and I'd consider a system without any recommends as pretty broken. I retitle the bug accordingly for now.

summary: - ssh sessions don't run in session cgroup but in sshd's -- pam_systemd
- missing
+ ppc64el ubuntu-server ISO does not install libpam-systemd (not
+ installing recommends?)
affects: systemd (Ubuntu) → debian-installer (Ubuntu)

I downloaded and installed today's amd64 xenial server ISO (as I don't have ppc64el hardware where I could run the installer). I kept all the default language/mode options. I went with "Guided - Use entire disk", and left the default task selection (which was only "standard system utilities", the rest was disabled). This installed libpam-systemd by default, as expected.

So I figure this is either ppc64el image specific, or you installed in some other way. Can you please attach /var/log/installer/syslog from that system? Thanks!

Gustavo Romero (gromero) wrote :

/var/log/installer/syslog from the ppc64le ISO install mentioned

Gustavo Romero (gromero) wrote :

Hi, Martin

We've already checked that libpam-systemd not being installed from the ISO occurs only on ppc64le (because policykit-1 is not installed there, as it is on amd64).

The log you requested is attached above.

Thank you!

Martin Pitt (pitti) wrote :

I now explicitly seeded libpam-systemd for server, like we do on desktop. It is already transitively pulled in on x86, so let's make this explicit to have a consistent install on all architectures.

We can't just bump the Recommends:, as that would pull libpam-systemd and dbus into a debootstrap and thus make D-Bus essential (which in turn makes porting to new architectures harder, bloats chroots, etc.). It'd also be conceptually wrong. But I suppose "systemd" being in the essential set is related to why its recommends are not installed by default.

affects: debian-installer (Ubuntu) → ubuntu-meta (Ubuntu)
Changed in ubuntu-meta (Ubuntu):
status: Incomplete → In Progress
status: In Progress → Fix Committed
summary: - ppc64el ubuntu-server ISO does not install libpam-systemd (not
- installing recommends?)
+ ppc64el ubuntu-server ISO does not install libpam-systemd
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ubuntu-meta - 1.353

---------------
ubuntu-meta (1.353) xenial; urgency=medium

  * Refreshed dependencies
  * Added libpam-systemd to server (LP: #1561658)

 -- Martin Pitt <email address hidden> Thu, 31 Mar 2016 15:27:53 +0200

Changed in ubuntu-meta (Ubuntu):
status: Fix Committed → Fix Released
affects: baltix → ubuntu-rtm
Changed in ubuntu-rtm:
importance: Undecided → Critical
status: New → Fix Released
affects: ubuntu-rtm → ubuntu-seeds