systemd-logind assert failure: cgmanager-client.c:6322: Assertion failed in cgmanager_list_children_sync: proxy != NULL

Bug #1309025 reported by Para Siva on 2014-04-17
This bug affects 25 people
Affects: systemd (Ubuntu); importance: High; assigned to: Stéphane Graber
Affects: systemd (Ubuntu Trusty); importance: High; assigned to: Stéphane Graber
Affects: systemd (Ubuntu Utopic); importance: High; assigned to: Stéphane Graber

Bug Description

SRU:

Rationale: systemd-logind crashes intermittently on seemingly random systems, usually with a similar traceback or, most commonly, with a corrupted one. We've identified a few problems in the cgmanager patch for systemd-logind, most of which can account for the symptoms people have seen, and all of which are obviously correct bug fixes.

Test case: Get the fixes into utopic and trusty-proposed and wait a week for new reports (here and on errors.ubuntu.com); if none are reported, push to -updates. While we know what we've fixed, actually reproducing the bug in the wild is notoriously difficult: we attempted various kinds of stress tests over the past two months without much luck.

Regression potential: All the fixes are very simple, very targeted and pretty obvious, so if we do end up breaking something else as a result, it's most likely another bug that was hidden behind incorrect behaviour. Any such bug should be easy to deal with, or we can always revert to the current state (better the devil you know).

=== Original bug report ===
Occurred after a dist-upgrade, reboot and logging in.

Any needed logs will be added later

ProblemType: Crash
DistroRelease: Ubuntu 14.04
Package: systemd-services 204-5ubuntu20
ProcVersionSignature: Ubuntu 3.13.0-24.46-generic 3.13.9
Uname: Linux 3.13.0-24-generic x86_64
ApportVersion: 2.14.1-0ubuntu3
Architecture: amd64
AssertionMessage: cgmanager-client.c:6322: Assertion failed in cgmanager_list_children_sync: proxy != NULL
Date: Thu Apr 17 14:53:58 2014
ExecutablePath: /lib/systemd/systemd-logind
InstallationDate: Installed on 2012-10-08 (555 days ago)
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Beta amd64 (20121008)
ProcCmdline: /lib/systemd/systemd-logind
ProcEnviron:
 TERM=linux
 PATH=(custom, no user)
Signal: 6
SourcePackage: systemd
StacktraceTop:
 cgmanager_list_children_sync () from /lib/x86_64-linux-gnu/libcgmanager.so.0
 ?? ()
 ?? ()
 ?? ()
 ?? ()
Title: systemd-logind assert failure: cgmanager-client.c:6322: Assertion failed in cgmanager_list_children_sync: proxy != NULL
UpgradeStatus: Upgraded to trusty on 2013-10-26 (172 days ago)
UserGroups: utah

Para Siva (psivaa) wrote :
information type: Private → Public

StacktraceTop:
 cgmanager_list_children_sync (parent=parent@entry=0x0, proxy=0x0, controller=controller@entry=0x426d47 "systemd", cgroup=cgroup@entry=0x9d3e51 "user/1000.user/c2.session", output=output@entry=0x7ffffff78078) at cgmanager-client.c:6323
 cgm_list_children (controller=0x426d47 "systemd", cgroup_path=0x9d3e51 "user/1000.user/c2.session", cgroup_path@entry=0x9d3e50 "/user/1000.user/c2.session", children=children@entry=0x7ffffff78078) at ../src/shared/cgmanager.c:194
 cg_trim (controller=controller@entry=0x425c0a "name=systemd", path=0x9d3e50 "/user/1000.user/c2.session", delete_root=delete_root@entry=false) at ../src/shared/cgroup-util.c:750
 session_terminate_cgroup (s=0x9e3740) at ../src/login/logind-session.c:625
 session_stop (s=s@entry=0x9e3740) at ../src/login/logind-session.c:709
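In this trace the top frame shows `proxy=0x0`: `cgm_list_children` handed `cgmanager_list_children_sync` a NULL D-Bus proxy, so the library's `proxy != NULL` check fired and the process died with signal 6 (SIGABRT). A minimal sketch of that failure shape, and of a caller that returns false when no connection exists (the approach the later trusty changelog describes as "Always return false on connection failure"). Every name here (`struct cg_proxy`, `cg_connect`, `cg_list_children_sync`, `cg_list_children`) is a hypothetical stand-in, not the libcgmanager or systemd code, and the daemon connection is simulated with a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the D-Bus proxy a client holds to cgmanager. */
struct cg_proxy {
    bool connected;
};

/* Hypothetical stand-in for a generated client call. Like the assertion in
 * this report, it checks proxy != NULL, so a NULL proxy aborts the process. */
static int cg_list_children_sync(struct cg_proxy *proxy, const char *cgroup,
                                 char ***output)
{
    assert(proxy != NULL);
    (void)cgroup;
    *output = NULL; /* a real call would return the child cgroup names */
    return 0;
}

/* Hypothetical connection helper: NULL means the daemon was unreachable. */
static struct cg_proxy *cg_connect(bool daemon_up)
{
    static struct cg_proxy proxy;

    if (!daemon_up)
        return NULL;
    proxy.connected = true;
    return &proxy;
}

/* Guarded caller: report failure (false) rather than pass a NULL proxy on. */
static bool cg_list_children(bool daemon_up, const char *cgroup,
                             char ***children)
{
    struct cg_proxy *proxy = cg_connect(daemon_up);

    *children = NULL;
    if (proxy == NULL)
        return false; /* connection failed: never reach the assertion */
    return cg_list_children_sync(proxy, cgroup, children) == 0;
}
```

With `daemon_up` false, `cg_list_children` returns false and leaves `children` NULL; an unguarded caller would instead pass the NULL proxy straight into the assertion.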

Changed in systemd (Ubuntu):
importance: Undecided → Medium
tags: removed: need-amd64-retrace
dino99 (9d9) wrote :

Got that crash after a cold boot; fresh system installed.

tags: added: i386
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Martin Pitt (pitti) on 2014-04-23
Changed in systemd (Ubuntu):
assignee: nobody → Stéphane Graber (stgraber)
Stéphane Graber (stgraber) wrote :

I don't see anything obviously wrong in the call made by systemd-logind itself, so this may be a problem with cgmanager, adding a task.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cgmanager (Ubuntu):
status: New → Confirmed
Serge Hallyn (serge-hallyn) wrote :

Has this re-occurred at all since the initial report?

Changed in cgmanager (Ubuntu):
importance: Undecided → Medium
Para Siva (psivaa) wrote :

Yes, this keeps happening to me. I think I have a system that keeps reproducing this. I had to move away from it, since I was not able to log in at all on that machine, and I assumed this bug was the reason.

Luke Faraone (lfaraone) wrote :

This bug occurred for me on the same day as ~psivaa (2014-04-17), but I don't think it has happened again. I don't believe it prevented me from logging in.

Serge Hallyn (serge-hallyn) wrote :

@psivaa,

would you mind turning that machine on, with cgroup-lite installed and cgmanager disabled, and with the ubuntu user existing (in sudo group) with my ssh key set up (ssh-import-id serge-hallyn), and with a port forward through your access point so I can log in and try to debug?

If that's possible, please send me a private email with the ip address and forwarded ssh port #.

Serge Hallyn (serge-hallyn) wrote :

@lfaraone,

just a shot in the dark, but do you by chance still have cgroup-lite installed? If /sys/fs/cgroup/devices (etc.) still exist, that would explain why you can still log in even if you're experiencing this bug.

Martin Pitt (pitti) wrote :

For the record, I also regularly get this crash. Yes, I do have cgroup-lite installed, shouldn't I? /sys/fs/cgroup/devices/ does exist. Also, I'm always able to log in, there is no noticeable brokenness except for apport reporting this crash.

Martin Pitt (pitti) wrote :

I purged cgroup-lite now, will report whether the crash comes back. Although I'm mostly running systemd now as I'm currently developing it.

Para Siva (psivaa) wrote :

@hallyn
Unfortunately, the system that I first saw this bug on is not reproducing the crash any more. I uninstalled/installed many packages, somewhat blindly at times, on that machine to fix the login issue I had on it, and now the crash no longer occurs.

dino99 (9d9) wrote :

Got some packages from the edgers PPA: nvidia-337, mesa, libdrm2 and some Wayland updates.

While installing, I first saw "no nvidia-337 in the dkms tree" at the very beginning of the upgrade (this seems to reflect the previous state).
Then depmod ran successfully for the 3.13, 3.14 and 3.15 kernels.
When all the upgrades were done, "startx" loaded X without trouble.

As my system has a Maxwell NVIDIA card, it would be useful for me and other people with recent hardware to get the latest driver and related packages (mesa, drm) into the Utopic archive. As everything is working again, I'm disabling the edgers PPA (rather than waiting for the next breakage).

So, the cgmanager issue is fixed on my system.

Serge Hallyn (serge-hallyn) wrote :

Thanks - so psivaa and dino99 both no longer see this issue. That leaves pitti, who has a somewhat nonstandard system. Do you still see this crash, after uninstalling cgroup-lite?

Having cgroup-lite installed shouldn't really affect this. If it does (through a race in mounting /sys/fs/cgroup) we obviously must address it.

It does seem just as likely that the thing we're racing against is systemd, although IIUC systemd mounts cgroups before starting any daemons, so that race should not exist.

Martin Pitt (pitti) wrote :

TBH I completely forgot about this for a week, as I didn't get that crash again. I actually reinstalled cgroup-lite, as after purging it I got a login, that logind crash, and then pretty much nothing was working (quite expectedly). However, this might now be mitigated in utopic, as logind has become fully D-BUS activatable (it wasn't in trusty), so the effects of such a crash are hardly visible. But as I said, I've got no recent .crash report for it. Did anything change in cgmanager etc. recently, or did some systemd rearrangement, like the on-demand startup via D-BUS activation, change this? If no one else gets this any more in utopic, I'm fine with getting this closed; I just wonder whether it still affects trusty.

dino99 (9d9) wrote :

@Martin

since that report was filed, we got a new upload:

cgmanager (0.25-0ubuntu5) utopic; urgency=medium

  * d/p/0008-get_controller_path-use-the-is_same_controller-helpe.patch:
    correctly handle requests pertaining to named systems (i.e.
    'name=systemd').

 -- Serge Hallyn <email address hidden> Fri, 02 May 2014 13:28:24 -0500

but it did not resolve the problem by itself. My issue has been fixed by the latest nvidia-337.19 + libdrm2 + mesa packages from the edgers PPA.

Note: on that machine, only Utopic was affected; Trusty with a stock install never had trouble.

Thanks, everyone. I don't believe anything in cgmanager fixed this,
so rather than mark it 'fix released' I'll mark it 'invalid' in the
'can no longer be reproduced' sense. If it happens again please do
reopen this.

 status: invalid

Changed in cgmanager (Ubuntu):
status: Confirmed → Invalid
Roman V. Isaev (rm-isaev) wrote :

I still have this problem. I have 14.04 with all the latest updates.

Andreas Hasenack (ahasenack) wrote :

Happens all the time to me too. Cold start, login, crash. Doesn't prevent me from logging in, it's just apport annoying me.

Bryan Quigley (bryanquigley) wrote :

This bug accounts for the first and third top crashers in systemd-services (as of May 23rd):
https://errors.ubuntu.com/?release=Ubuntu%2014.04&package=systemd-services&period=day

Serge Hallyn (serge-hallyn) wrote :

Following from the errors.ubuntu.com link, and looking at the thread stack
trace there, it has

#3 0xb7694d63 in cgmanager_set_value_scm_sync (parent=0x0, proxy=0x0, controller=0x8071f35 "systemd", cgroup=0x868b579 "user/140.user/c2.session", key=0x807d84e "tasks", value=0xbfa6b5ec "", sockfd=68) at cgmanager-client.c:4320
        method_call = 0x1
        iter = {dummy1 = 0xb7683004 <__nih_free>, dummy2 = 0xb766f1cf <nih_free+623>, dummy3 = 0, dummy4 = -1079593560, dummy5 = 6897520, dummy6 = -1751146240, dummy7 = -1219624852, dummy8 = 0, dummy9 = -1, dummy10 = 134769652, dummy11 = 1, pad1 = 141098216, pad2 = 141098208, pad3 = 0x0}
        error = {name = 0xbfa6b58c "\364k\b\b\001", message = 0xb7682e58 "H}\001", dummy1 = 0, dummy2 = 0, dummy3 = 0, dummy4 = 1, dummy5 = 0, padding1 = 0x868fce8}
        reply = <optimized out>
        __FUNCTION__ = "cgmanager_set_value_scm_sync"
#4 0x0806cdfa in cgm_get (controller=0x8071f35 "systemd", cgroup_path=0x868b579 "user/140.user/c2.session", cgroup_path@entry=0x868b578 "/user/140.user/c2.session", key=key@entry=0x807d84e "tasks") at ../src/shared/cgmanager.c:135
        result = 0x0
        __func__ = "cgm_get"

which is confusing since cgm_get calls cgmanager_get_value_scm_sync, not
cgmanager_set_value_scm_sync. This suggests memory corruption, which
could of course be happening anywhere.

I've tried this before, but will (on Tuesday) set up an attempt to
reproduce with a great number of repeated logins.

Changed in cgmanager (Ubuntu):
status: Invalid → Confirmed
amir sanjar (asanjar) wrote :

I have been having the same problem multiple times daily on two systems, both upgraded to 14.04 LTS from 13.10. I have not been able to reproduce it on a fresh install.

Serge Hallyn (serge-hallyn) wrote :

Very interesting, thanks for that.

Can anyone say whether they have reproduced this on a machine which was
a fresh 14.04 install?

I will do a few attempts to reproduce with an upgrade from 13.10.

Andreas Hasenack (ahasenack) wrote :

On Fri, May 30, 2014 at 1:57 PM, Serge Hallyn <email address hidden>
wrote:

> Very interesting, thanks for that.
>
> Can anyone say whether they have reproduced this on a machine which was
> a fresh 14.04 install?
>
>
Mine was a fresh install, but I can't guarantee it wasn't the last beta
before release. It might have been.

I had this problem as well at startup (14.04 upgraded from an old version, 10.04).
Purging cgroup-lite solved it.

Roman V. Isaev (rm-isaev) wrote :

I had to remove whole lxc to get rid of this problem.

Serge Hallyn (serge-hallyn) wrote :

Hi Roman,

Did you have any auto-started containers?

Do you know whether you had cgroup-lite installed? (Could you attach your /var/log/apt/history.log?)

Helge Jung (youngage) wrote :

I'm experiencing this bug on every startup on a fresh install of Ubuntu GNOME 14.04 x64 (it seems to occur already while lightdm is running or early in GNOME Shell startup, as window decorations are missing at first). I have docker.io installed (which uses lxc and cgroups, if I'm correct). Additionally, in case it matters, my system boots in UEFI Secure Boot mode.

If you need logs from me or want me to test something experimental, feel free to ask.

Serge Hallyn (serge-hallyn) wrote :

Can you verify whether cgroup-lite is installed? What does

ls /sys/fs/cgroup

show?

Helge Jung (youngage) wrote :

cgroup-lite is installed (1.9). /sys/fs/cgroup shows the following directories (all owned by root:root and with access 0755): blkio, cgmanager, cpu, cpuacct, cpuset, devices, freezer, hugetlb, memory, perf_event, systemd

Roman V. Isaev (rm-isaev) wrote :

On 16/Jun 13:20, Serge Hallyn wrote:
> Can you verify whether cgroup-lite is installed? What does
>
> ls /sys/fs/cgroup
>
> show?

dragon pts/15#ls /sys/fs/cgroup
systemd
dragon pts/15#ls /sys/fs/cgroup/systemd
cgroup.clone_children cgroup.event_control cgroup.procs cgroup.sane_behavior notify_on_release release_agent tasks user
dragon pts/15#

root@dragon:/home/test# dpkg -l|grep cgroup
rc cgmanager 0.24-0ubuntu5 amd64 Central cgroup manager daemon
rc cgroup-lite 1.9 all Light-weight package to set up cgroups at system boot
ii libcgmanager0:amd64 0.24-0ubuntu6 amd64 Central cgroup manager daemon (client library)
ii libcgmanager0:i386 0.24-0ubuntu6 i386 Central cgroup manager daemon (client library)

--
 Roman V. Isaev http://www.gunlab.com.ru Moscow, Russia

Serge Hallyn (serge-hallyn) wrote :

Thanks. Unfortunately I still cannot reproduce this...

Helge, can you please do

dpkg -l > packages.list
echo "lxc-ls -f" > containers.list
sudo lxc-ls -f >> containers.list
echo "autostart lines" >> containers.list
sudo grep 'lxc.start.auto' /var/lib/lxc/*/config >> containers.list
echo "lxc-autostart output" >> containers.list
sudo lxc-autostart -L >> containers.list

and attach packages.list and containers.list?

 status: incomplete

Changed in cgmanager (Ubuntu):
status: Confirmed → Incomplete
Helge Jung (youngage) wrote :

packages.list is attached, the containers.list is almost empty (I have nothing matching /var/lib/lxc/*/config):

lxc-ls -f
NAME STATE IPV4 IPV6 AUTOSTART
----------------------------------
autostart lines
lxc-autostart output

description: updated
Stéphane Graber (stgraber) wrote :

Ok, so after spending some more time on it with Serge, we found and fixed a few problems with the cgmanager patch for systemd-logind. Each of the fixes could plausibly explain what we've been seeing, so hopefully this will be the end of it (combined with the cgmanager fix from Michael Terry, which has already landed in both utopic and trusty).

I'm removing the cgmanager task as I believe this side has now been separately fixed and am tracking these bugfixes for trusty and utopic now.

Changed in cgmanager (Ubuntu):
status: Incomplete → Invalid
no longer affects: cgmanager (Ubuntu)
no longer affects: cgmanager (Ubuntu Trusty)
no longer affects: cgmanager (Ubuntu Utopic)
Changed in systemd (Ubuntu Utopic):
status: Confirmed → In Progress
Changed in systemd (Ubuntu Trusty):
status: New → In Progress
importance: Undecided → High
Changed in systemd (Ubuntu Utopic):
importance: Medium → High
Changed in systemd (Ubuntu Trusty):
assignee: nobody → Stéphane Graber (stgraber)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 204-10ubuntu9

---------------
systemd (204-10ubuntu9) utopic; urgency=medium

  * Fix various issues with the cgmanager integration (LP: #1309025):
     - Always nih_free variables that were potentially nih allocated.
     - Always initialize the children listings to NULL.
 -- Stephane Graber <email address hidden> Wed, 18 Jun 2014 23:34:41 -0400
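The two utopic items guard against the same hazard: if a listing pointer is declared without an initializer and the call fails before assigning it, the cleanup path frees or walks whatever happened to be on the stack, the kind of bug that could plausibly produce the corrupted tracebacks mentioned in the SRU rationale. A minimal sketch of the safe pattern; `list_children`, `free_children` and `count_children` are hypothetical, plain `malloc`/`free` stand in for libnih's `nih_alloc`/`nih_free`, and this is not the actual systemd patch.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lister. On success, *output receives a malloc'd,
 * NULL-terminated array of malloc'd names. On failure it returns -1 and
 * leaves *output untouched, as a call that bails out early might. */
static int list_children(bool fail, char ***output)
{
    char **list;

    assert(output != NULL);
    if (fail)
        return -1;
    list = calloc(2, sizeof(*list)); /* one name plus the NULL terminator */
    if (list == NULL)
        return -1;
    list[0] = strdup("c2.session");
    if (list[0] == NULL) {
        free(list);
        return -1;
    }
    *output = list;
    return 0;
}

/* Free a NULL-terminated array of strings; a NULL array is a no-op. */
static void free_children(char **children)
{
    if (children == NULL)
        return;
    for (size_t i = 0; children[i] != NULL; i++)
        free(children[i]);
    free(children);
}

/* Count the children of a cgroup. Initializing the listing to NULL is what
 * makes the unconditional cleanup below safe when list_children() fails;
 * without it, free_children() would walk stack garbage. */
static int count_children(bool fail)
{
    char **children = NULL;
    int n = -1;

    if (list_children(fail, &children) == 0) {
        n = 0;
        while (children[n] != NULL)
            n++;
    }
    free_children(children);
    return n;
}
```

The design choice is to make cleanup unconditional and NULL-tolerant, so every exit path frees exactly what was allocated and nothing else.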

Changed in systemd (Ubuntu Utopic):
status: In Progress → Fix Released

Hello Parameswaran, or anyone else affected,

Accepted systemd into trusty-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/systemd/204-5ubuntu20.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in systemd (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed

Since someone mentioned LXC above, and I have two machines recently upgraded to 14.04 that had both LXC and the crashes, I purged LXC, rebooted, reinstalled LXC, and voilà: no more crashes!

I started experiencing this error suddenly after a reboot and could no longer log in in graphical mode. I promptly found this page and tried to enable the trusty-proposed repository to install the fixed version.

The first strange thing I noticed is that I couldn't find a systemd package in the -proposed package database (is this normal?). In any case, I decided to install, one by one, the components mentioned in the package description. All of them appeared in trusty-proposed, and I was able to install them.

The problem, however, was that apt said it needed to remove some packages that were not compatible with the newer ones I was installing. This included things such as Skype, but also others such as libgl1-mesa-glx. This sounded weird, but I decided to proceed anyway. After doing that and rebooting, I was able to log in without getting the original error message, but something was still wrong: many of the Unity components stopped working. The Unity panel is gone, window resizing is not working, window decorations are gone, etc. I also have XMonad installed, which seems to work just fine, and also GNOME with XMonad, which doesn't seem to work at all.

Any ideas on how to fix this?

Thanks

The crash has come back today after an upgrade, and I decided to try the systemd package in proposed, but as Arthur said, it doesn't exist.

Stéphane Graber (stgraber) wrote :

systemd is the source package name; the binary package in this case is systemd-services.

tags: added: verification-done
removed: verification-needed
Stéphane Graber (stgraber) wrote :

I've been running it on a variety of systems; the last crash I got was prior to the upgrade, so things at least aren't any worse and are likely fixed by this update.

The verification of the Stable Release Update for systemd has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 204-5ubuntu20.3

---------------
systemd (204-5ubuntu20.3) trusty; urgency=medium

  * Fix various issues with the cgmanager integration (LP: #1309025):
     - Always return false on connection failure.
     - Always nih_free variables that were potentially nih allocated.
     - Always initialize the children listings to NULL.
     - Always initialize the list iterator to 0.
 -- Stephane Graber <email address hidden> Wed, 18 Jun 2014 23:37:50 -0400
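The "list iterator" item can be illustrated on its own: an index declared without an initializer starts at an arbitrary value, so a walk over a NULL-terminated children array may skip entries or read past its end. A minimal sketch with a hypothetical `trim_children` helper standing in for a loop that visits a session's child cgroups (compare `cg_trim` in the stack trace earlier in this report); it is not the systemd code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-child action; here it only counts visits. */
static void trim_one(const char *name, size_t *visited)
{
    assert(name != NULL);
    (*visited)++;
}

/* Walk a NULL-terminated list of child cgroup names. The index must start
 * at 0: an uninitialized "size_t i;" would begin at an arbitrary value. */
static size_t trim_children(char *const *children)
{
    size_t visited = 0;

    if (children == NULL) /* nothing listed, e.g. after a failed call */
        return 0;
    for (size_t i = 0; children[i] != NULL; i++)
        trim_one(children[i], &visited);
    return visited;
}
```

Declaring the index in the `for` initializer, as here, makes it impossible to forget the starting value.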

Changed in systemd (Ubuntu Trusty):
status: Fix Committed → Fix Released
Luis Boullosa (luisboullosa) wrote :

The problem has arisen again on my system after updating to systemd-services 204-5ubuntu20.5.

I downgraded to the 204-5ubuntu20.4 version and the error disappeared, so I guess the latest version is causing the error now.

Martin Pitt (pitti) wrote :

@Luis: That version (http://launchpadlibrarian.net/182673897/systemd_204-5ubuntu20.4_204-5ubuntu20.5.diff.gz) does not change anything related to this bug at all, so this sounds like a race condition and pure coincidence. However, you posted this both here and on bug 1302264, and both are already fixed/closed. If you still experience crashes, please report them as new bugs, as they are most certainly different. Thanks!
