From #ubuntu-devel
[23:23] robert_ancell, poke -- I think lightdm 1.11.8 is causing problems on krillin devices
[23:23] mterry, oh, rsalveti said it was working
[23:23] which problems?
[23:24] robert_ancell, rsalveti: I'm seeing that cgroups aren't being set for the user session correctly, so a lot of policykit requests are failing
[23:24] mterry, you have a lightdm.log?
[23:24] robert_ancell, uh hold on just flashed
[23:24] 1.11.7 would have those problems
[23:25] right, the issue I had got fixed with 1.11.8
[23:26] So, anyone want to send me a krillin? :)
[23:26] Interesting. I'm not seeing the problem with 1.11.7
[23:27] mterry, it was a crash triggered by a race, so you might not have seen it
[23:28] robert_ancell, well the failure is reliable with 1.11.8 (only on krillin)
[23:28] So I suppose your issue is different, though they both sound related to logind
[23:37] cjwatson: now, they were all straight rebuilds
[23:38] cjwatson: btw, libav now migrated to debian/testing
[23:51] mterry, are you getting a log?
[23:51] robert_ancell, yeah sorry had problems
[23:51] np
[23:51] just checking :)
[00:02] robert_ancell, http://paste.ubuntu.com/8254607/
[00:02] worth the wait!
[00:03] mterry, hmm, so that log looks good. There's definitely a logind session open and we activated it
[00:04] mterry, have you tried running loginctl and checking everything looks happy there?
[00:04] robert_ancell, it does yes
[00:04] robert_ancell, it's a cgroup thing as far as I can tell
[00:04] root@ubuntu-phablet:/# cat /proc/`pidof unity8`/cgroup
[00:04] 4:name=systemd:/
[00:04] 3:freezer:/user.slice/user-32011.slice/session-c1.scope
[00:04] 2:cpuacct:/user.slice/user-32011.slice/session-c1.scope
[00:04] 1:cpu:/user.slice/user-32011.slice/session-c1.scope
[00:05] mterry, right, but the cgroups are all set by pam_systemd aren't they?
[00:05] mterry, have you confirmed 1.11.7 works fine?
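The cgroup listing pasted above shows the symptom: the name=systemd hierarchy is at "/" while the other controllers point at session-c1.scope. A minimal Python sketch of that check follows. The function names and the "ends with .scope under a session- path" heuristic are assumptions made for illustration, not how logind or PolicyKit actually decide session membership.

```python
def systemd_cgroup_path(cgroup_text):
    """Return the path of the name=systemd hierarchy, or None if absent."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-id:controllers:path".
        parts = line.strip().split(":", 2)
        if len(parts) == 3 and parts[1] == "name=systemd":
            return parts[2]
    return None


def in_session_scope(cgroup_text):
    """Heuristic (an assumption): the systemd path names a session scope."""
    path = systemd_cgroup_path(cgroup_text)
    return path is not None and "/session-" in path and path.endswith(".scope")


# Listing copied from the failing run pasted in the log.
BAD_RUN = """\
4:name=systemd:/
3:freezer:/user.slice/user-32011.slice/session-c1.scope
2:cpuacct:/user.slice/user-32011.slice/session-c1.scope
1:cpu:/user.slice/user-32011.slice/session-c1.scope
"""

print(systemd_cgroup_path(BAD_RUN))
print(in_session_scope(BAD_RUN))
```

On a device the same functions could be fed the contents of /proc/<pid>/cgroup read from disk instead of the pasted string.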
[00:05] robert_ancell, it does for me yes
[00:06] let me try downgrading on this same image to double confirm
[00:09] mterry, can you get a lightdm.log from 1.11.7 so we can diff
[00:09] robert_ancell, yup, downgrading works fine
[00:10] there's just nothing significant in the diff in how we interact with PAM... very odd...
[00:10] robert_ancell, http://paste.ubuntu.com/8254652/
[00:10] and more relevantly:
[00:10] root@ubuntu-phablet:/# cat /proc/`pidof unity8`/cgroup
[00:10] 4:name=systemd:/user.slice/user-32011.slice/session-c1.scope
[00:10] 3:freezer:/user.slice/user-32011.slice/session-c1.scope
[00:10] 2:cpuacct:/user.slice/user-32011.slice/session-c1.scope
[00:10] 1:cpu:/user.slice/user-32011.slice/session-c1.scope
[00:11] mterry, that's on a good run?
[00:11] What does "loginctl session-status $XDG_SESSION_ID" say on a good and bad run
[00:11] robert_ancell, right
[00:12] root@ubuntu-phablet:/# loginctl session-status c1
[00:12] Failed to query ControlGroup: No such interface 'org.freedesktop.DBus.Properties' on object at path /org/freedesktop/systemd1/unit/session_2dc1_2escope
[00:12] c1 - phablet (32011)
[00:12] Since: Fri 2014-09-05 00:07:59 UTC; 4min 11s ago
[00:12] Leader: 1439 (lightdm)
[00:12] Seat: seat0; vc1
[00:12] Service: lightdm-autologin; type unspecified; class background
[00:12] State: active
[00:12] Unit: session-c1.scope
[00:12] that's a bad run
[00:12] Would need to reflash to get good run
[00:12] robert_ancell, like I said, logind thinks everything is fine
[00:13] robert_ancell, the problem is that apps don't think they are part of the active session
[00:13] not sure about "class background"
[00:13] robert_ancell, I don't know what that means either
[00:13] mterry, you can't just update lightdm back to 1.11.8?
[00:14] robert_ancell, oh right! ahem
[00:14] robert_ancell, I forgot I had downgraded in fact
[00:14] robert_ancell, so this is from a good run
[00:14] let me upgrade again
[00:14] rsalveti, did we confirm if your krillin does the PK stuff right with lightdm 1.11.8?
[00:14] robert_ancell, I'm a little loopy right now
[00:14] no worries
[00:16] root@ubuntu-phablet:/# loginctl session-status c1
[00:16] Failed to query ControlGroup: No such interface 'org.freedesktop.DBus.Properties' on object at path /org/freedesktop/systemd1/unit/session_2dc1_2escope
[00:16] c1 - phablet (32011)
[00:16] Since: Fri 2014-09-05 00:15:09 UTC; 44s ago
[00:16] Leader: 1359 (lightdm)
[00:16] Seat: seat0; vc1
[00:16] Service: lightdm-autologin; type unspecified; class background
[00:16] State: active
[00:16] Unit: session-c1.scope
[00:16] robert_ancell, that's the bad
[00:16] seems identical
[00:16] yeah
[00:16] will be really curious to see how lightdm could cause it
[00:17] hallyn, if we were setting up PAM incorrectly that would do it, but I can't see any significant change there
[00:17] mterry, could you do a bisect?
[00:17] mterry, also, is it just you or is this widespread?
[00:17] robert_ancell, widespread
[00:17] on krillin
[00:18] robert_ancell, sure, I can bisect... let me get a build env set up for krillin
[00:19] It's got to be either r2037+r2038 or r2041
[00:21] ok, will check those explicitly
[00:45] robert_ancell, looks like it wasn't 2041
[01:22] mterry, how's it going with the bisecting?
[01:22] robert_ancell, good but ran into a problem with re-running a debuild without cleaning (didn't seem to like that)
[01:22] robert_ancell, I'm now testing the 2037/2038 commits
[01:23] so to confirm - it works with 2036 (1.11.7) and not 2043 (1.11.8)
[01:23] what other revs?
[01:25] robert_ancell, it does not work with 2040
[01:26] robert_ancell, and I'm testing 2038 now
[01:28] robert_ancell, as expected, 2038 doesn't work either
[01:28] robert_ancell, is it worth trying 2037 specifically?
[01:28] mterry, have you tried a locally built 2036?
[01:28] robert_ancell, yes, it worked
[01:29] you should probably try for completeness, but 2038 is probably the candidate
[01:32] mterry, what method are you using to build on your device? Remount rw then bzr-buildpackage?
[01:33] robert_ancell, uh, I put the device in rw mode sure. Not remounting each time though
[01:33] robert_ancell, and then debuild sure
[01:34] mterry, you just do a "# mount / -o,remount,rw" to go from a stock image? No other special tricks?
[01:34] robert_ancell, there's a file you can put in /userdata that makes the image rw on boot
[01:35] ah, what is that?
[01:35] robert_ancell, then I can do normal stuff like apt-get build-dep lightdm etc
[01:35] robert_ancell, uh I think the file is .writable_image, but I usually just run "phablet-config writable-image" from my host device
[01:35] awesome, thanks
[01:36] robert_ancell, do you have a krillin? post 3.4 to llvm/IR/Verifier.h
[01:36] mterry, no, I was hoping I might be able to reproduce on mako
[01:36] robert_ancell, I have not been able to myself
[01:36] :(
[01:38] mterry, The only thing I can think of now is to manually start removing parts of r2038 to find out what's actually triggering it.
[01:39] robert_ancell, nothing seems especially suspicious to you?
[01:40] mterry, I've been reading through the code here and there's nothing that stands out. We listen to more D-Bus events from logind, we activate sessions slightly differently (but from reading the systemd source both methods should have the same outcome) and we check CanGraphical on the seat. But the logs indicate that's always TRUE in your case so it should be the same behaviour as previously
[01:42] mterry, dmesg doesn't show logind complaining about anything does it?
[01:43] [ 9.925763] (1)[1099:systemd-logind]systemd-logind[1099]: New seat seat0.
[01:43] [ 11.310221] (0)[1099:systemd-logind]systemd-logind[1099]: Failed to start unit user@32011.service: Unknown unit: user@32011.service
[01:43] [ 11.311741] (0)[1099:systemd-logind]systemd-logind[1099]: Failed to start user service: Unknown unit: user@32011.service
[01:43] robert_ancell, but those are normal errors I think
[01:43] yeah, I get them here
[01:44] mterry, I have a "New session c1 of user lightdm." on my desktop, do you get those on a good run?
[01:44] what's going on with llvm in ubuntu? it seems to me that debian testing has both llvm 3.4 and 3.5, whereas ubuntu dropped 3.4 already. why is that?
[01:44] robert_ancell, hold on, testing 2037
[01:44] k
[01:45] robert_ancell, 2037 is good
[01:45] well, that confirms it, 2038 it is
[01:45] robert_ancell, you have that message in your dmesg?
[01:45] mterry, on my desktop
[01:46] robert_ancell, I don't here, even on a good run
[01:46] robert_ancell, ok... shall I start disassembling 2037?
[01:46] *2038
[01:47] yes please
[02:18] robert_ancell, ah hrm... I looked in the folder I built my 2038 bisect in, and it wasn't valid -- rebuilding 2038 and it works.
[02:19] mterry, ah
[02:19] mterry, so where does that leave the range of possibilities?
[02:19] RAOF, can you check lightdm --show-config
[02:20] robert_ancell, I no longer have my 2040 folder lying around. But assuming I did do that one right, either 2039 or 2040 are broken
[02:20] mterry, 2040 was reverted in 2042
[02:20] mterry, and 2039 is only test changes
[02:21] hrm. Then let me redo my 2040 bisect
[02:38] robert_ancell, 2040 works too, will try 2041
[02:38] which I'm guessing will fail?
[02:39] that's the guess
[02:51] robert_ancell, OK 2041 seems to work too... Are the later commits risky?
[02:52] Or am I just screwing up my bisects
[02:52] 2042 is a reversion and 2043 is the release
[02:53] :(
[02:54] is 2043 working now?
[02:54] grr, it better not be
[03:12] robert_ancell, ok 2043 did not work
[03:12] so that's good I guess
[03:12] I guess?
[03:13] Assuming my 2041 test was valid, which I'm not guaranteeing, I'll test 2042 next... Is that likely to be a problem commit?
[03:13] You indicated no
[03:15] * mterry is beginning to doubt his ability to bisect
[03:16] * mterry tries 2041 again -- that really should be the problem commit
[03:37] mterry, how goes 2041?
[03:37] robert_ancell, it was good again
[03:37] robert_ancell, trying 2042 for the lulz
[03:37] mterry, do you have all the packages built so you can easily switch between?
[03:38] robert_ancell, I unfortunately have been building the .debs over each other. But I have the build directories sitting there so could regenerate debs in relatively shorter order
[03:39] robert_ancell, and 2042 predictably fails (if it worked and 2043 failed, I'd just give up)
[03:40] robert_ancell, so assuming my 2041 tests have been valid, and I did redo it... That means 2042 is the problem
[03:41] that makes no sense
[03:41] robert_ancell, I do see something about add_login1_seat in that commit... any chance that is odd?
[03:42] s/odd/buggy/
[03:42] 2042 = "Revert globbing changes - there are problems with it"
[03:42] right... so we're going back to historical behavior
[03:43] only has changes in src/lightdm.c regarding configuration loading
[03:43] And they only relate to multi-seat anyway, which is not being used on krillin
[03:44] robert_ancell, :(
[03:44] robert_ancell, so I'm guessing I mis-tested 2041... twice
[03:45] mterry, do repeated runs have the same result?
[03:45] robert_ancell, it always has in the past
[03:51] mterry, So to summarise: LightDM 1.11.8 is breaking PolicyKit on krillin, but not mako. 1.11.7 doesn't break PolicyKit on krillin, but does have a race-condition triggered crash. The bisect is indicating r2042 could cause the problem, but it doesn't make sense. There's been some inconsistency in the bisecting
[03:51] I should also say "not mako or desktop"
[03:52] robert_ancell, the inconsistency in the bisecting so far was just me building the wrong revisions -- not actual inconsistencies in results observed -- i.e. so far each revision has exhibited reliable behavior
[03:53] I think
[03:53] robert_ancell, I take that back -- I built the same 2041 deb and this time it is broken
[03:54] so maybe I need to do more reboots before declaring a given revision as "clean"
[03:54] I have to head out soon, and you must be almost falling asleep. What do you think we should do?
[03:54] I'll go back and test the 2040 revision
[03:55] robert_ancell, I think the likely culprit is 2041 -- but if this problem isn't 100% reproducible, I need to retest my older bisects
[03:55] robert_ancell, but my 2041 test has proven that the problem existed in at least 204
[03:55] 2041
[03:55] even if it doesn't always show up
[03:56] robert_ancell, I am about to fall asleep, so I'm going to bed. I can continue testing tomorrow. But I'm not sure I'm qualified to fix it, especially since I don't have the context for the original 2041 commit
[03:57] mterry, 2041 is the fix for bug 1364725
[03:57] robert_ancell, and it's tough for you to investigate without a krillin
[03:57] bug 1364725 in lightdm (Ubuntu Trusty) "logind session ID not used due to race condition" [High,In progress] https://launchpad.net/bugs/1364725
[03:57] yeah, I'm just guessing here
[03:58] robert_ancell, so I can continue tomorrow and work on it myself
[03:58] robert_ancell, but if you have any guesses, email them to me and I can try
[03:58] mterry, an option is trying r2040.1.1 which is the sticking plaster solution to the above bug instead of the proper solution in r2041
[03:58] robert_ancell, OK, will try that tomorrow
[03:59] mterry, thanks for staying up. bye
[03:59] but first I should probably stress test r2040 itself to make sure that it doesn't exhibit the bug
[03:59] it's annoying that it creeped back
[03:59] I thought it was always reproducible
[03:59] anyway, bye
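The bisect above went wrong because a revision was declared "clean" after a single boot even though the failure turned out to be intermittent. Below is a minimal Python sketch of a bisect that only calls a revision good after several passing trials. Everything here is hypothetical: the function names, the trial count, and the simulated test (where the culprit revision is picked purely for illustration); the log itself does not establish which revision is at fault.

```python
import itertools


def classify(revision, run_once, trials=5):
    """Return 'bad' on the first failing trial, 'good' only if all trials pass."""
    for _ in range(trials):
        if not run_once(revision):
            return "bad"
    return "good"


def first_bad(revisions, run_once, trials=5):
    """Binary-search an ordered revision list for the first bad revision.

    Assumes the first revision is good, the last is bad, and every
    revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if classify(revisions[mid], run_once, trials) == "bad":
            hi = mid
        else:
            lo = mid
    return revisions[hi]


def make_flaky_run(culprit):
    """Simulated test: revisions at or after `culprit` fail on every other boot."""
    boots = itertools.count()

    def run_once(revision):
        if revision < culprit:
            return True
        return next(boots) % 2 == 0  # passes on even-numbered boots only

    return run_once


revisions = list(range(2036, 2044))  # r2036 (1.11.7) through r2043 (1.11.8)
print(first_bad(revisions, make_flaky_run(culprit=2041)))
```

With trials=1 the same simulated test can report a broken revision as good, which matches the "it was good again" results in the log. More trials cost more reboots per step but make a false "good" less likely.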