MachineSuite.TestMachineWorkers timed out waiting for workers zesty because dbus is in interactive mode

Bug #1665160 reported by Curtis Hovey
This bug affects 1 person
Affects             Status        Importance  Assigned to  Milestone
Canonical Juju      Fix Released  High        Unassigned
  2.1               Won't Fix     High        Unassigned
dbus (Ubuntu)       Fix Released  Undecided   Unassigned

Bug Description

As seen at
    http://reports.vapour.ws/releases/issue/5768c750749a563f2d7daa6e

A unit test is failing only on our new Zesty machine. The previous issues are not related to the Zesty failures.

machine-0: has workers [agent api-address-updater api-caller api-config-watcher disk-manager log-forwarder log-sender logging-config-updater machine-action-runner machiner migration-fortress migration-inactive-flag migration-minion proxy-config-updater reboot-executor ssh-authkeys-updater state-config-watcher storage-provisioner termination-signal-handler unconverted-api-workers upgrade-check-flag upgrade-check-gate upgrade-steps-flag upgrade-steps-gate upgrader]
machine-0: waiting for [unit-agent-deployer]
machine-0: unexpected []
machine-0: report:
{}

machine_test.go:1272:
    WaitMatch(c, matcher.Check, coretesting.LongWait, s.BackingState.StartSync)
engine_test.go:234:
    c.Fatalf("timed out waiting for workers")
... Error: timed out waiting for workers

Changed in juju:
importance: Critical → High
milestone: 2.1.0 → none
Curtis Hovey (sinzui)
Changed in juju:
milestone: none → 2.2.0-alpha1
Tim Penhey (thumper) wrote :

The cause of the worker failing is this:

[LOG] 0:02.914 DEBUG juju.worker.dependency "unit-agent-deployer" manifold worker stopped: dbus stop request failed for application "jujud-unit-zesty-juju-ci-0": Interactive authentication required.
[LOG] 0:02.914 ERROR juju.worker.dependency "unit-agent-deployer" manifold worker returned unexpected error: dbus stop request failed for application "jujud-unit-zesty-juju-ci-0": Interactive authentication required.

Anastasia (anastasia-macmood) wrote :

@Curtis,
According to comment #1, dbus on Zesty seems to want to run interactive authentication. This does not seem to be Juju's responsibility.

Changed in juju:
status: Triaged → Incomplete
Anastasia (anastasia-macmood) wrote :

Marking as Won't Fix for 2.1.1 as it requires a bit more investigation and we are focused on 2.2.

This is related to dbus changes in zesty and 2.2 will be out before zesty.

Curtis Hovey (sinzui)
summary: MachineSuite.TestMachineWorkers timed out waiting for workers zesty
+ because dbus is in interactive mode
Iain Lane (laney) wrote :

It is not dbus which is asking for authentication here. That's not really a thing that happens.

What does happen is that some services that own names on the bus will check that a method's caller is authorised to make the call before performing the action. This is usually achieved with PolicyKit - I think that you're seeing a PolicyKit denial here.

A notable user is systemd - am I right in saying that this is a response from a dbus call to a systemd method (StopUnit?)? If so, this behaviour is certainly nothing new.

The relevant polkit action id is "org.freedesktop.systemd1.manage-units". It is callable by admins (unix group admin or unix group sudo) after they have authenticated with their password. If you call the method as root, you won't get prompted and the action will be performed.

I would look at any changes to how you run the tests (which user?), then at any changes to the systemd stuff (is it new? is it now being called from a non-root process when before it wasn't?). Note that PolicyKit logs to the journal when it authorises or denies action requests, which might be helpful.
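The question of which user runs the tests can be checked from inside the test binary. The Go sketch below is illustrative only: the hypothetical `expectedPolkitOutcome` function encodes the rule described in this comment (root is not prompted; members of the `admin` or `sudo` groups must authenticate with their own password; anyone else would need an administrator to authenticate), and it uses the standard `os/user` package to resolve the current user's group names. It does not query polkit itself.

```go
package main

import (
	"fmt"
	"os"
	"os/user"
)

// expectedPolkitOutcome (hypothetical) summarises how the default
// org.freedesktop.systemd1.manage-units policy would treat a caller,
// following the rule described above. groups holds group names.
func expectedPolkitOutcome(euid int, groups []string) string {
	if euid == 0 {
		return "root: no prompt, action performed"
	}
	for _, g := range groups {
		if g == "admin" || g == "sudo" {
			return "admin: must authenticate with own password"
		}
	}
	return "non-admin: needs an administrator to authenticate"
}

// currentGroupNames resolves the current user's group IDs to group names.
func currentGroupNames() ([]string, error) {
	u, err := user.Current()
	if err != nil {
		return nil, err
	}
	ids, err := u.GroupIds()
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(ids))
	for _, id := range ids {
		g, err := user.LookupGroupId(id)
		if err != nil {
			continue // skip groups that cannot be resolved
		}
		names = append(names, g.Name)
	}
	return names, nil
}

func main() {
	groups, err := currentGroupNames()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot look up groups:", err)
		os.Exit(1)
	}
	fmt.Println(expectedPolkitOutcome(os.Geteuid(), groups))
}
```

In a non-interactive CI run there is nobody to type a password, so the middle case is the one that would likely surface as "Interactive authentication required".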

Curtis Hovey (sinzui)
Changed in juju:
status: Incomplete → Triaged
Changed in juju:
milestone: 2.2-alpha1 → 2.2-rc1
Curtis Hovey (sinzui) wrote :

Is the unit test messing with the services on the host system? I hope not.

Changed in juju:
milestone: 2.2-beta2 → 2.2-beta3
Christopher Lee (veebers) wrote :

These tests are run across different archs using the same script (the same user etc.). Nothing has changed in how it's run but we're only seeing this failure on Zesty.

I should be able to get the PolicyKit output from the journal in case that is of use.

Christopher Lee (veebers) wrote :

I should also note that the test runs as the ubuntu user, which has sudo access to the whole machine.

Changed in juju:
status: Triaged → Incomplete
James Henstridge (jamesh) wrote :

Anastasia asked me to take a look at this. I'm still not sure what the underlying cause is, but here is some of what I've discovered so far.

1. There has been a change in policykit-1 that sounds like it might be relevant. Namely:

  [ Martin Pitt ]
  * Use PAM's common-session-noninteractive modules for pkexec instead of
    common-session. The latter also runs pam_systemd (the only difference
    normally) which is a no-op under the classic session-centric
    D-BUS/graphical login model (as it won't start a new one if it is already
    running within a logind session), but very expensive when using
    dbus-user-session and being called from a service that runs outside the
    PAM session. This causes long delays in e. g. gnome-settings-daemon's
    backlight helpers. (LP: #1626651)

So if something is being run by pkexec, it is no longer executing pam_systemd.so. That module is responsible for registering with logind, setting up the user bus, etc.

So it seems possible that there is some code in the test suite that looked like it was running under an interactive user session on Xenial, but doesn't any more on Zesty.

2. The default policies for systemd1.manage-units haven't changed between versions. In both cases, it reads:

        <action id="org.freedesktop.systemd1.manage-units">
                <defaults>
                        <allow_any>auth_admin</allow_any>
                        <allow_inactive>auth_admin</allow_inactive>
                        <allow_active>auth_admin_keep</allow_active>
                </defaults>
        </action>

So it is possible that the test worked on previous versions by piggybacking on an earlier authorisation via auth_admin_keep. If we look like a non-interactive invocation on Zesty, we may now need to authenticate every time.

3. Maybe we can bypass all of this by configuring the polkit local authority to allow the test process to manage units unconditionally. This would work for all versions irrespective of whether we look like an interactive session.

This would involve placing a policy file under /etc/polkit-1/localauthority/50-local.d before running the tests.

James Henstridge (jamesh) wrote :

With Chris's help we got the tests to get past the D-Bus errors on Zesty by creating a file /etc/polkit-1/localauthority/50-local.d/manage-units.pkla on the test system with the following contents:

    # Allow the "ubuntu" user to manage systemd units unconditionally for testing

    [Allow manage-units]
    Identity=unix-user:ubuntu
    Action=org.freedesktop.systemd1.manage-units;org.freedesktop.systemd1.manage-unit-files
    ResultAny=yes
    ResultInactive=yes
    ResultActive=yes

(It turned out that manage-unit-files was also needed to enable/disable services). This basically instructs polkitd to tell systemd that it is okay for the "ubuntu" user to manage system services without having to re-enter their password. Assuming that's what you expect while running the tests, the CI scripts should be updated to add this file to the container.

There shouldn't be any need to special case this for Zesty: it should be fine to create the .pkla file on older releases too.

Changed in juju:
milestone: 2.2-beta3 → none
Anastasia (anastasia-macmood) wrote :

This failure was not due to Juju but to the setup of the CI environment. We have not seen any subsequent failures of this nature, so the issue has been resolved.

I am closing this report.

Changed in juju:
status: Incomplete → Fix Released
Changed in dbus (Ubuntu):
status: New → Fix Released