systemd-logind spins in cgmanager_ping_sync()

Bug #1303649 reported by James Page on 2014-04-07
56
This bug affects 9 people
Affects Status Importance Assigned to Milestone
cgmanager (Ubuntu)
High
Unassigned
libnih (Ubuntu)
High
Unassigned
lxc (Ubuntu)
High
Unassigned
systemd (Ubuntu)
High
Stéphane Graber

Bug Description

systemd-logind is consuming a high level of cpu on a continual basis:

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  676 root 20 0 43644 2144 1568 R 100.0 0.0 74:43.77 systemd-logind

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: systemd-services 204-5ubuntu17
ProcVersionSignature: Ubuntu 3.13.0-22.44-generic 3.13.8
Uname: Linux 3.13.0-22-generic x86_64
ApportVersion: 2.14.1-0ubuntu1
Architecture: amd64
CurrentDesktop: Unity
Date: Mon Apr 7 09:09:37 2014
EcryptfsInUse: Yes
InstallationDate: Installed on 2013-04-23 (348 days ago)
InstallationMedia: Ubuntu 13.04 "Raring Ringtail" - Release amd64 (20130423)
SourcePackage: systemd
UpgradeStatus: Upgraded to trusty on 2013-11-11 (146 days ago)

James Page (james-page) wrote :
James Page (james-page) wrote :

This appears to inhibit other operations such as logout/restart as well; having to sudo reboot from the command line when this occurs.

Martin Pitt (pitti) wrote :

This looks like fallout from the cgmanager integration, as it just started recently (confirmed on IRC) and otherwise logind hasn't changed for the whole trusty cycle. When it happens, could you attach strace -vvtts1024 to it and see what it's spinning on? Thanks!

Changed in systemd (Ubuntu):
status: New → Incomplete
James Page (james-page) wrote :
Download full text (3.8 KiB)

08:15:51.211285 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.211353 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.211424 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.211500 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.211573 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.211649 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.211725 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.211797 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.211879 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.211952 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.212024 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.212095 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.212165 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.212242 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.212313 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.212389 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.212464 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.212536 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.212608 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.212681 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.212752 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.212825 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.212901 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.212975 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.213051 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.213125 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.213202 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.213276 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.213348 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.213423 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.213496 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.213572 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.213648 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.213725 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.213800 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.213889 write(6, "\0", 1) = -1 EAGAIN (Resource temporarily unavailable)
08:15:51.213987 write(6, "\0", 1) = -1 EAGAIN (Resource tempor...

Read more...

Martin Pitt (pitti) wrote :

Thanks. That's essentially just one write(). Could you check in /proc/<pid>/fd/ what fd 6 points to, or grab the full strace and see if there's a corresponding open.* = 6 or socket.* = 6 or similar, to find what fd 6 is?

James Page (james-page) wrote :

root@hendrix:/proc/649/fd# ls -l
total 0
lrwx------ 1 root root 64 Apr 10 09:12 0 -> /dev/null
lrwx------ 1 root root 64 Apr 10 09:12 1 -> /dev/pts/1
lrwx------ 1 root root 64 Apr 10 09:12 10 -> socket:[7848]
lrwx------ 1 root root 64 Apr 10 09:12 11 -> socket:[7849]
lrwx------ 1 root root 64 Apr 10 09:12 12 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Apr 10 09:12 13 -> socket:[7849]
lrwx------ 1 root root 64 Apr 10 09:12 14 -> /dev/input/event3
lrwx------ 1 root root 64 Apr 10 09:12 15 -> /dev/input/event0
lrwx------ 1 root root 64 Apr 10 09:12 16 -> /dev/input/event2
lrwx------ 1 root root 64 Apr 10 09:12 17 -> /dev/input/event1
lrwx------ 1 root root 64 Apr 10 09:12 18 -> /dev/tty6
lr-x------ 1 root root 64 Apr 10 09:12 19 -> /run/systemd/sessions/5.ref
lrwx------ 1 root root 64 Apr 10 09:12 2 -> /dev/pts/1
lr-x------ 1 root root 64 Apr 10 09:12 20 -> /run/systemd/sessions/c132.ref
l-wx------ 1 root root 64 Apr 10 09:12 21 -> /run/systemd/sessions/c132.ref
lrwx------ 1 root root 64 Apr 10 09:12 22 -> socket:[2523405]
lrwx------ 1 root root 64 Apr 10 09:12 3 -> socket:[9647]
lrwx------ 1 root root 64 Apr 10 09:12 4 -> anon_inode:[eventpoll]
lr-x------ 1 root root 64 Apr 10 09:12 5 -> pipe:[7828]
l-wx------ 1 root root 64 Apr 10 09:12 6 -> pipe:[7828]
lr-x------ 1 root root 64 Apr 10 09:12 7 -> /sys/devices/virtual/tty/tty0/active
lrwx------ 1 root root 64 Apr 10 09:12 8 -> socket:[7846]
lrwx------ 1 root root 64 Apr 10 09:12 9 -> socket:[7847]

James Page (james-page) wrote :

#0 0x00007fad4e680700 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:81
No locals.
#1 0x00007fad4efbe377 in nih_main_loop_interrupt () from /lib/x86_64-linux-gnu/libnih.so.1
No symbol table info available.
#2 0x00007fad4eb8464b in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#3 0x00007fad4eb84f8e in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#4 0x00007fad4eb85626 in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#5 0x00007fad4eb843ff in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#6 0x00007fad4eb6e9dc in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#7 0x00007fad4eb6f389 in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#8 0x00007fad4eb6f94d in dbus_connection_send_with_reply_and_block () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#9 0x00007fad4f1d0a85 in cgmanager_ping_sync () from /lib/x86_64-linux-gnu/libcgmanager.so.0
No symbol table info available.
#10 0x00000000004223a3 in ?? ()
No symbol table info available.
#11 0x000000000041647f in ?? ()
No symbol table info available.
#12 0x000000000041655d in ?? ()
No symbol table info available.
#13 0x000000000040f3ed in ?? ()
No symbol table info available.
#14 0x00000000004107dc in ?? ()
No symbol table info available.
#15 0x000000000040c4d3 in ?? ()
No symbol table info available.
#16 0x00007fad4eb7de26 in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#17 0x00007fad4eb70a01 in dbus_connection_dispatch () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#18 0x0000000000409467 in ?? ()
No symbol table info available.
#19 0x000000000040622c in ?? ()
No symbol table info available.
#20 0x00007fad4e5b5ec5 in __libc_start_main (main=0x4060d0, argc=1, argv=0x7fff596b4ca8, init=<optimised out>, fini=<optimised out>,
    rtld_fini=<optimised out>, stack_end=0x7fff596b4c98) at libc-start.c:287
        result = <optimised out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, 5644452072334886061, 4219731, 140734693592224, 0, 0, -5644226841702234963, -5688182704516430675},
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x7fff596b4cb8, 0x7fad5004c1c8}, data = {prev = 0x0, cleanup = 0x0, canceltype = 1500204216}}}
        not_first_call = <optimised out>
#21 0x000000000040637c in ?? ()
No symbol table info available.

Martin Pitt (pitti) on 2014-04-10
summary: - systemd-logind high cpu consumption
+ systemd-logind spins in cgmanager_ping_sync()
Changed in systemd (Ubuntu):
assignee: nobody → Stéphane Graber (stgraber)
status: Incomplete → New
James Page (james-page) wrote :
Download full text (5.0 KiB)

Backtrace with debug symbols installed

(gdb) bt full
#0 0x00007fad4e680700 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:81
No locals.
#1 0x00007fad4efbe377 in nih_main_loop_interrupt () from /lib/x86_64-linux-gnu/libnih.so.1
No symbol table info available.
#2 0x00007fad4eb8464b in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#3 0x00007fad4eb84f8e in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#4 0x00007fad4eb85626 in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#5 0x00007fad4eb843ff in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#6 0x00007fad4eb6e9dc in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#7 0x00007fad4eb6f389 in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#8 0x00007fad4eb6f94d in dbus_connection_send_with_reply_and_block () from /lib/x86_64-linux-gnu/libdbus-1.so.3
No symbol table info available.
#9 0x00007fad4f1d0a85 in cgmanager_ping_sync () from /lib/x86_64-linux-gnu/libcgmanager.so.0
No symbol table info available.
#10 0x00000000004223a3 in cgm_dbus_connect () at ../src/shared/cgmanager.c:73
        dbus_error = {name = 0x0, message = 0x0, dummy1 = 1, dummy2 = 0, dummy3 = 0, dummy4 = 0, dummy5 = 1, padding1 = 0x2161b10}
        connection = <optimised out>
        __func__ = "cgm_dbus_connect"
#11 0x000000000041647f in cg_create (controller=controller@entry=0x212a3d0 "hugetlb", path=path@entry=0x2155dc0 "/user/0.user/c132.session",
    suffix=suffix@entry=0x0) at ../src/shared/cgroup-label.c:47
        fs = 0x0
        r = <optimised out>
        existed = 0
#12 0x000000000041655d in cg_create_and_attach (controller=controller@entry=0x212a3d0 "hugetlb", path=path@entry=0x2155dc0 "/user/0.user/c132.session",
    pid=20656) at ../src/shared/cgroup-label.c:89
        r = <optimised out>
        q = <optimised out>
        __PRETTY_FUNCTION__ = "cg_create_and_attach"
#13 0x000000000040f3ed in session_create_one_group (s=s@entry=0x2165df0, controller=controller@entry=0x212a3d0 "hugetlb",
    path=path@entry=0x2155dc0 "/user/0.user/c132.session") at ../src/login/logind-session.c:442
        r = <optimised out>
        __PRETTY_FUNCTION__ = "session_create_one_group"
#14 0x00000000004107dc in session_create_cgroup (s=0x2165df0) at ../src/login/logind-session.c:511
        k = 0x212a730
        p = 0x2155dc0 "/user/0.user/c132.session"
        r = <optimised out>
#15 session_start (s=0x2165df0) at ../src/login/logind-session.c:567
        r = <optimised out>
        __PRETTY_FUNCTION__ = "session_start"
        __func__ = "session_start"
#16 0x000000000040c4d3 in bus_manager_create_session (_reply=0x7fff596b4828, message=0x212dcf0, m=<optimised out>) at ../src/login/logind-dbus.c:672
        audit_id = 0
        vtnr = 0
        remote_user = 0x2137228 "root"
        leader = 20656
        id = 0x2166390 "c132"
        r = <optimised out>
        seat = 0x0
        remote = 1
        exists = 0
        type = 0x21371e8 "unspecified"
        remote_host = 0x2137234 "hendrix"
        service = 0x2137...

Read more...

Stéphane Graber (stgraber) wrote :

The stacktrace suggests we are in libcgmanager, so adding a cgmanager task.

Serge Hallyn (serge-hallyn) wrote :
Download full text (3.2 KiB)

It turns out the write() is not a part of the dbus transaction with cgmanager, but actually a part of libnih's mainloop exiting code:

(gdb) where
#0 0x00007f70ea509700 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f70eae47377 in nih_main_loop_interrupt () at main.c:630
#2 0x00007f70eaa0d64b in _dbus_transport_queue_messages (transport=transport@entry=0x2329290) at ../../dbus/dbus-transport.c:1157
#3 0x00007f70eaa0df8e in do_reading (transport=0x2329290) at ../../dbus/dbus-transport-socket.c:851
#4 0x00007f70eaa0e626 in socket_do_iteration (transport=0x2329290, flags=6, timeout_milliseconds=<optimized out>) at ../../dbus/dbus-transport-socket.c:1162
#5 0x00007f70eaa0d3ff in _dbus_transport_do_iteration (transport=0x2329290, flags=3940863025, flags@entry=6, timeout_milliseconds=1,
    timeout_milliseconds@entry=25000) at ../../dbus/dbus-transport.c:976
#6 0x00007f70ea9f79dc in _dbus_connection_do_iteration_unlocked (connection=connection@entry=0x232d580, pending=pending@entry=0x232bf90, flags=flags@entry=6,
    timeout_milliseconds=timeout_milliseconds@entry=25000) at ../../dbus/dbus-connection.c:1234
#7 0x00007f70ea9f8389 in _dbus_connection_block_pending_call (pending=0x232bf90) at ../../dbus/dbus-connection.c:2415
#8 0x00007f70eaa0772a in dbus_pending_call_block (pending=<optimized out>) at ../../dbus/dbus-pending-call.c:748
#9 0x00007f70ea9f894d in dbus_connection_send_with_reply_and_block (connection=0x232d580, message=0x23056e0, timeout_milliseconds=-1, error=0x7fff22097b70)
    at ../../dbus/dbus-connection.c:3530
#10 0x00007f70eb05b4f5 in cgmanager_create_sync () from /lib/x86_64-linux-gnu/libcgmanager.so.0
#11 0x0000000000422520 in ?? ()
#12 0x00000000004164a4 in ?? ()
#13 0x000000000041655d in ?? ()
#14 0x000000000040f3ed in ?? ()
#15 0x00000000004107dc in ?? ()
#16 0x000000000040c4d3 in ?? ()
#17 0x00007f70eaa06e26 in _dbus_object_tree_dispatch_and_unlock (tree=0x22f92a0, message=message@entry=0x22fb9d0, found_object=found_object@entry=0x7fff22098114)
    at ../../dbus/dbus-object-tree.c:862
#18 0x00007f70ea9f9a01 in dbus_connection_dispatch (connection=0x22f8870) at ../../dbus/dbus-connection.c:4672
#19 0x0000000000409467 in ?? ()
#20 0x000000000040622c in ?? ()
#21 0x00007f70ea43eec5 in __libc_start_main (main=0x4060d0, argc=1, argv=0x7fff220982d8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
    stack_end=0x7fff220982c8) at libc-start.c:287
#22 0x000000000040637c in ?? ()

This is happening in this part of nih/main.c:

/**
 * nih_main_loop_interrupt:
 *
 * Interrupts the current (or next) main loop iteration because of an
 * event that potentially needs immediate processing, or because some
 * condition of the main loop has been changed.
 **/
void
nih_main_loop_interrupt (void)
{
        nih_main_loop_init ();

        if (interrupt_pipe[1] != -1)
                while (write (interrupt_pipe[1], "", 1) < 0)
                        ;
}

Why the interrupt_pipe[0] woudl be closed is beyond me.

A simple fix would be to add a check for errno == EAGAIN in the while loop to avoid this condition. However, we should figure out why this is happening and hopef...

Read more...

Changed in libnih (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Changed in cgmanager (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Serge Hallyn (serge-hallyn) wrote :

This patch seems to prevent logins from hanging and systemd-logind from taking all cpu time.

It probably should not be blindly applied as there remains the question: why is the interrupt buffer filling up.

tags: added: patch
James Hunt (jamesodhunt) wrote :

The problem here is that applications linked to the cgmanager client library are not calling nih_main_loop(). That's a problem because the nih-dbus code calls nih_main_loop_interrupt() which adds a byte to the main loop interrupt pipe. After 65536 bytes have been added to that pipe, the next call to nih_main_loop_interrupt() will result in the repeated EAGAIN shown above.

The correct solution is for the clients to use an nih main loop.

Changed in lxc (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Changed in systemd (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Serge Hallyn (serge-hallyn) wrote :

Actually the source of the problem is that I was calling nih_dbus_setup(), which actually sets up the mainloop. It is not needed - as I thought it was - for making *_sync calls from cgmanager-client.so. So the right fix is to not call that.

Two comments on this. First, nih_dbus_setup() seems misleadingly named, though that may just be how I read it. The comments above that function are plenty clear and I should have seen those.

Second, the default nih dbus connection is a shared one, which always does nih_dbus_setup(). So for anyone to use libcgmanager without a nih mainloop, they *must* use the dbus_connection_open_private(). We should probably mention that somewhere in the libcgmanager documentation.

James Hunt (jamesodhunt) wrote :

Right. So to be completely clear, by calling nih_dbus_setup(), nih was registering a libdbus handler that runs on the client side such that whenever libdbus deemed necessary, it would attempt to wake up the clients main loop by calling nih_main_loop_interrupt(). However, since the client didn't have an nih main loop, the interrupt pipe was never drained and eventually filled leading to the EAGAIN tight loop.

Yes, I agree - maybe it should have been called nih_dbus_main_loop_setup() to make its usage clearer.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxc - 1.0.3-0ubuntu2

---------------
lxc (1.0.3-0ubuntu2) trusty; urgency=medium

  * Cherry-pick upstream fix for cgmanager integration. (LP: #1303649)
 -- Stephane Graber <email address hidden> Fri, 11 Apr 2014 12:17:41 -0400

Changed in lxc (Ubuntu):
status: Confirmed → Fix Released
Changed in libnih (Ubuntu):
status: Confirmed → Invalid
Changed in cgmanager (Ubuntu):
status: Confirmed → Invalid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 204-5ubuntu19

---------------
systemd (204-5ubuntu19) trusty; urgency=medium

  * cgmanager: don't call nih_dbus_setup() since we won't use the
    nih_mainloop(). (LP: #1303649)
 -- Serge Hallyn <email address hidden> Fri, 11 Apr 2014 10:17:04 -0500

Changed in systemd (Ubuntu):
status: Confirmed → Fix Released
Dale Gulledge (dgulledge) wrote :

I don't know whether we've had a regression here, but I started seeing what looks like this exact same problem this morning.

Marcelo de G. Malheiros (mgm) wrote :

Same here.

I'm running a 14.04.5 distribution and the problem has appeared after the last dist-upgrade.

I'm using the 'proposed' repository for Trusty, and a few packages have been updated recently. I suspect it was caused by the following update:

https://www.ubuntuupdates.org/packages/latest_logs?commit=Filter&dist=trusty

systemd 03-08 22:07 UTC
Release: trusty Repo: universe Level: proposed New version: 204-5ubuntu20.27
Packages in group: python-systemd

  systemd (204-5ubuntu20.27) trusty; urgency=medium

  * logind: fix session not added to gc and other memleaks (LP: #1750013)
    - Added cgmanager as dependency, needed to clean up old cgroups

Marcelo de G. Malheiros (mgm) wrote :

Continuing my previous comment: after the update I got both 'cgmanager' and 'systemd-logind' daemons eating 70% of processor time each (in a multi-core cpu).

Tried to reinstall and reconfigure all recently updated packages, with no result.

The visible effect is a long delay when logging (either on terminal or graphics environment). I also got problems with the NetworkManager, which started to show an 'insufficient privileges' when selecting a wireless network.

A stopgap measure to access wifi was killing the user-level 'nm-applet' process and starting it again as root.

Marcelo de G. Malheiros (mgm) wrote :

Solved the issue temporarily for me:

1. Had to manually download and install older packages:

  sudo dpkg -i libpam-systemd_204-5ubuntu20.25_amd64.deb libsystemd-daemon0_204-5ubuntu20.25_amd64.deb systemd-services_204-5ubuntu20.25_amd64.deb

2. Put those versions on hold:

  sudo apt-mark hold systemd-services libsystemd-daemon0 libpam-systemd

3. Removed completely 'cgmanager':

  sudo apt-get purge cgmanager

4. Restarted

Dale Gulledge (dgulledge) wrote :

Marcelo,

Thanks for the information. My system was unresponsive enough that diagnosing the problem was a huge pain. I decided to back up /home and do an install of 16.04 from scratch since I'd been intending to do an upgrade soon anyway.

I was also using the "proposed" repo, so I should have been running the same versions you were of those packages. I installed them Thursday evening, but didn't reboot then. When I booted on Friday morning, the problem started. I remembered them because they don't get frequent updates so it was a bit of a surprise to see them.

Serge Hallyn (serge-hallyn) wrote :

Hi,

just to get this straight to narrow down scenarios to try to reproduce:

1. Dale with 16.04 you are *not* seeing this, right? You saw it with 14.04 with proposed enabled?

2. Marcelo, you are seeing this with 14.04.5 with proposed enabled?

Marcelo de G. Malheiros (mgm) wrote :

Serge,

Yes, this happened to me last friday (March 9th), after the latest update of my 14.04.5 which had 'proposed' enabled.

I have since reverted to the 204-5ubuntu20.25 version of the three above-mentioned packages, by manual installation. Had also marked the same packages on hold to prevent update.

Dale Gulledge (dgulledge) wrote :

Serge,

I am not seeing anything like this with 16.04.4. Since I did a full wipe and reinstall, I didn't expect to. Yes, I definitely saw it with 14.04.5 with "proposed" enabled. And the packages Marcelo indicated were among the most recently installed. I think you can treat his system as representative of the problem I saw.

Thanks Dale and Marcelo for your quick report - this is really useful. The problem seem to be caused due to cgmanager being added as dependency of systemd in -proposed - this request was clearly explained in LP #1750013 (not a dup for this, it's another issue with systemd-logind).

During my test I didn't observe such bad behavior of high CPU utilization - might be some interaction I didn't explore/reproduce. Can you check please if you're using -backports repo ? If so, then the cgmanager version installed in your system was 0.39 (the default version, without -backports repo, is 0.24).

Anyway, the systemd package with cgmanager as dependency was _removed_ from -proposed, so as next step I'll focus in fixing the other LP without adding cgmanager, and eventually investigate what bad interaction happened in your case to address it.

Thanks,

Guilherme

Dale Gulledge (dgulledge) wrote :

Yes, I was using -backports. I dumped a list of everything I had installed before I reinstalled, and I had cgmanager/trusty-updates 0.24-0ubuntu7.5 installed right before the I wiped my system. I was still experiencing the problem.

Marcelo de G. Malheiros (mgm) wrote :

Guilherme,

Yes, I am using 'backports' too.

However, the 'cgmanager' version that was automatically installed was 'cgmanager_0.24-0ubuntu7.5_amd64', and it is still available to me.

Thank you for sorting this!

I am having the same issue since this morning (14 march): cgmanager and systemd-logind both using 60-70% of system resources, gui unusable, login via console possible.
Looking at /var/log/apt/term.log I see that the update has been installed on 9 march, but did not give any issues until now.
Intermediate updates were samba packages yesterday (13 march) evening ... but the system worked after that just fine for the first bootup this morning.
Only on the second boot did things go wrong.

I tried to apply the fix outlined in 4 steps by Marcelo [4 steps], just for i386 instead of amd64.
The problem I have is getting those 3 deb packages.
I found the build for libpam 20.25 [libpam20.25], but this one offers no artefacts.
On the various package sites only 20.26 is available.
Do I have to build on my own?
Or is a timeline for an update known, so that I can just upgrade?

Cordially,
  htc

[4 steps]: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1303649/comments/20
[libpam20.25]: https://launchpad.net/ubuntu/+source/systemd/204-5ubuntu20.25/+build/13545611

Marcelo de G. Malheiros (mgm) wrote :

Hofer-Temmel Christoph,

I've got the binary packages for the 204-5ubuntu20.25 version from Launchpad itself. The pages for the x86 versions you need are:

https://launchpad.net/ubuntu/trusty/i386/libpam-systemd/204-5ubuntu20.25
https://launchpad.net/ubuntu/trusty/i386/libsystemd-daemon0/204-5ubuntu20.25
https://launchpad.net/ubuntu/trusty/i386/systemd-services/204-5ubuntu20.25

Hope it helps,
  Marcelo

vadimo (michalgejdos-azet) wrote :

I cca two days the same problem. Two Core CPU's all time on 60% and nm-applet not shown on panel:

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  459 root 20 0 45500 868 680 S 62,2 0,0 53:20.26 systemd-logind
25499 root 20 0 25708 1344 1120 R 58,2 0,0 1:55.28 cgmanager

Cameron Paine (cbp) wrote :

FWIW I'm experiencing similar symptoms on an ancient desktop that has been stable, reliable and used every day for years. I too ran dist-upgrade a couple of days ago. I've no idea what repos I use. That config hasn't been touched for a long time. Can investigate if the information might be useful.

$ top -b -n 1 | head
top - 19:17:12 up 1 day, 11:17, 2 users, load average: 1.74, 1.65, 1.96
Tasks: 191 total, 2 running, 189 sleeping, 0 stopped, 0 zombie
%Cpu(s): 47.8 us, 51.4 sy, 0.3 ni, 0.3 id, 0.1 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 1025144 total, 720844 used, 304300 free, 171616 buffers
KiB Swap: 1048572 total, 30508 used, 1018064 free. 190468 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  353 root 20 0 3076 1172 1068 R 31.6 0.1 922:38.21 cgmanager
  892 root 20 0 4216 1344 1248 S 31.6 0.1 911:28.89 systemd-logind
 1539 root 20 0 89256 37060 14232 S 7.9 3.6 45:56.98 Xorg
$ uname -a
Linux lab-netvista 3.13.0-87-generic #133-Ubuntu SMP Tue May 24 18:33:01 UTC 2016 i686 i686 i686 GNU/Linux
zaphod@lab-netvista:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty

Could you find the pid of cgmanager ( 353 below)‎ and do

Strafe -f -p 353 -o trace.txt

for maybe 5 seconds, ctrl-c it, and attach trace.txt here?

Mauro Franzoni (mauf) wrote :

I was also experiencing cpu overload for cgmanager (up to 99%) then I disabled it in grub's default (with "cgroup_disable=cpu") but systemd-logind started eating CPU (80%) instead.

I see nothing strange in the logs, but if I can be of any help by providing any information please just ask me.

mauf

Damo (damomilhuisen) wrote :

Same problem as well strace on cgmanager attached.

Damo (damomilhuisen) wrote :

Strace on systemd-logind as well.

Cameron Paine (cbp) wrote :

@serge-hallyn, I've attached output of as requested.

I hadn't subscribed to notification. That's now been fixed. Missed your request when you posted it.

Happy to do more.

Folks that are experiencing this issue: the best way to circumvent it for now I guess it's downgrade systemd package to version 204-5ubuntu20.26 and remove cgmanager.

To remove cgmanager: "sudo apt-get remove cgmanager"

To downgrade systemd version: "sudo apt-get install systemd-services=204-5ubuntu20.26 libsystemd-login0=204-5ubuntu20.26 libsystemd-daemon0=204-5ubuntu20.26 libpam-systemd=204-5ubuntu20.26 libudev1=204-5ubuntu20.26 udev=204-5ubuntu20.26"

The version 204-5ubuntu20.26 is in the official repos, so no need to download manually deb packages and whatnot; the proposed 20.27 version should be removed for now.

Thanks,

Guilherme

For the folks that observed the recent cgmanager issue with logind: there's a new systemd version in -proposed; it's version "204-5ubuntu20.28". I've just tested it with and without cgmanager, in kernels 4.4.0-119 and 3.13.0-145 (according to LP #1750013), and didn't observe any constant CPU utilization increase.

Important to notice this new version of systemd ("204-5ubuntu20.28") _does not depend_ on cgmanager, so it's not mandatory to have cgmanager installed.

Tests with the new version are welcome, with and without cgmanager. I'd like to ask, if possible, that anyone that observes the issue again to test with and without cgmanager, and if possible, in kernels 4.4.0 and 3.13.0, so we have a better idea of the environments reproducing the issue.

Thanks in advance,

Guilherme

Marcelo de G. Malheiros (mgm) wrote :

Dear Guilherme,

I have a mostly vanilla Ubuntu 14.04.5, right now with kernel 3.13.0-145.

I've just tested the new "204-5ubuntu20.28" packages from "-proposed", and they run fine, with and without the latest "cgmanager". No problems detected.

Thank you for your support.

Thanks a lot for your feedback Marcelo; I'm glad everything seems fine now.
Cheers,

Guilherme

Sorry for the late reply, I have not been subscribed to this bug. I have not written earlier because the bug was limited to one location (it seems in connection with one wireless network made available to all users and logging in to the network before logging into the user account), and I wanted to test all cases of old packages, Marcelo's workaround and the new version.

@Marcelo de G. Malheiros: Thanks for your pointer to the packages. Your temporary fix worked well. That enabled me to actually work at this location.

@Guilherme Picooli: Thanks for version "204-5ubuntu20.28". I upgraded to it and it the problem has disappeared.

Cordially,
  htc

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers