rootwrap filter fails when killing a pid that doesn't exist

Bug #1010275 reported by dan wendlandt
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Thierry Carrez

Bug Description

I've seen this several times, but only now tracked it down.

I get the following error when running nova-network:

2012-06-07 14:26:58 CRITICAL nova [-] Unexpected error while running command.
Command: sudo nova-rootwrap kill -9 4514
Exit code: 99
Stdout: 'Unauthorized command: kill -9 4514\n'
Stderr: ''

The issue is that we're running kill_dhcp in linux_net. kill_dhcp gets its pid not from a running process, but from a pid file in the network directory (e.g., /var/lib/nova/networks).

Thus, if the pid file still exists but dnsmasq is not running, the above kill command is trying to kill a pid that doesn't exist.

However, the process of applying the filter is the following:

        try:
            command = os.readlink("/proc/%d/exe" % int(args[1]))
            # NOTE(dprince): /proc/PID/exe may have ' (deleted)' on
            # the end if an executable is updated or deleted
            if command.endswith(" (deleted)"):
                command = command[:command.rindex(" ")]
            if command not in self.args[1]:
                # Affected executable not in accepted list
                return False
        except (ValueError, OSError):
            # Incorrect PID
            return False

Importantly, if /proc/<pid>/exe does not exist, the filter fails. In this case, because the process is no longer running, os.readlink raises OSError, the filter returns False, and rootwrap rejects the command as unauthorized.
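To make the failure mode concrete, here is a minimal sketch of the same check as a standalone function. The name kill_filter_matches and the allowed-executable list are hypothetical, not nova's actual API, and the PID 4194305 is assumed to lie above the largest pid_max Linux permits, so it stands in for a stale pid read from a leftover pid file.

```python
import os


def kill_filter_matches(pid_arg, allowed_executables):
    """Return True only if pid_arg names a live process whose executable is allowed.

    Simplified, hypothetical stand-in for the filter excerpt quoted in this report.
    """
    try:
        command = os.readlink("/proc/%d/exe" % int(pid_arg))
        # /proc/PID/exe may end in ' (deleted)' if the binary was replaced
        if command.endswith(" (deleted)"):
            command = command[:command.rindex(" ")]
        return command in allowed_executables
    except (ValueError, OSError):
        # Non-numeric PID, or no /proc entry because the process is gone
        return False


# Assumed to be above any valid Linux PID, so no such process exists.
STALE_PID = "4194305"
print(kill_filter_matches(STALE_PID, ["/usr/sbin/dnsmasq"]))  # False
```

A dead process and a disallowed executable both come back as False, so the caller cannot tell "already gone" apart from "not permitted".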

Perhaps the kill filter should allow kill commands for pids that do not exist? Either that, or we should raise some kind of specific exception that could be caught and ignored by higher-level code like kill_dhcp, which is perfectly happy if the PID no longer exists. Right now the rootwrap failure will prevent nova-network from starting until you figure out you need to clear out the old dnsmasq pid file.

Tags: rootwrap
Thierry Carrez (ttx)
tags: added: rootwarp
tags: added: rootwrap
removed: rootwarp
Thierry Carrez (ttx) wrote :

Nice find.

The catch is that allowing kills of PIDs that no longer (or do not yet) exist creates an (admittedly minimal) flaw in the filter. And you can't really raise a higher-level error, since the filter and the calling code run in two different processes.

The solution, I think, is to accept nova-rootwrap's "Unauthorized command" result in that specific case. nova-rootwrap exits with 99 when it rejects a command (it otherwise returns the exit code of the command it ran), so this can be achieved by calling utils.execute with check_exit_code=[0,99].
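As a rough illustration of the proposal in this comment, the sketch below treats exit code 99 as acceptable alongside 0. The execute function here is a hypothetical stand-in built on subprocess, not nova's utils.execute, and a small Python one-liner plays the part of rootwrap rejecting the command.

```python
import subprocess
import sys

ROOTWRAP_UNAUTHORIZED = 99  # exit code shown in the log in the bug description


def execute(*cmd, check_exit_code=(0,)):
    """Hypothetical stand-in for utils.execute: raise on unexpected exit codes."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode not in check_exit_code:
        raise RuntimeError(
            "Unexpected exit code %d running %r" % (proc.returncode, cmd))
    return proc.returncode


# Simulate rootwrap rejecting the command by exiting with 99.
fake_rootwrap = (sys.executable, "-c", "import sys; sys.exit(99)")
rc = execute(*fake_rootwrap, check_exit_code=(0, ROOTWRAP_UNAUTHORIZED))
```

With the default check_exit_code, the same call would raise, which corresponds to the CRITICAL error in the original log.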

Would that work for you?

Changed in nova:
assignee: nobody → Thierry Carrez (ttx)
importance: Undecided → High
status: New → Triaged
dan wendlandt (danwent) wrote :

Yeah, that seems like the best available approach, though it seems like we'll have to touch each place in the code where utils.execute specifies kill.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/8389

Changed in nova:
status: Triaged → In Progress
Thierry Carrez (ttx) wrote :

Actually, all the other "kill" calls check that the PID is valid before trying to kill it. The kill_dhcp call should do the same, rather than ignoring an "Unauthorized command" error.
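A minimal sketch of that "check before kill" idea follows. It is not the merged nova change: pid_looks_like and kill_if_alive are hypothetical names, the check reads /proc/<pid>/cmdline directly, and os.kill stands in for a kill issued through rootwrap.

```python
import os


def pid_looks_like(pid, expected_fragment):
    """Return True if /proc/<pid>/cmdline exists and mentions expected_fragment."""
    try:
        with open("/proc/%d/cmdline" % int(pid), "rb") as f:
            # Arguments in cmdline are separated by NUL bytes
            cmdline = f.read().replace(b"\0", b" ").decode("utf-8", "replace")
    except (TypeError, ValueError, OSError):
        # No pid, malformed pid, or the process is no longer running
        return False
    return expected_fragment in cmdline


def kill_if_alive(pid, expected_fragment, kill=os.kill):
    """Kill pid only when it is still running and looks like the expected process."""
    if not pid_looks_like(pid, expected_fragment):
        return False  # stale pid file: nothing to kill
    kill(int(pid), 9)
    return True


# A pid assumed to be above any valid Linux PID is skipped rather than killed.
print(kill_if_alive(4194305, "dnsmasq"))  # False
```

Because the stale pid is filtered out before any kill is attempted, rootwrap never sees a command it would have to reject.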

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/8389
Committed: http://github.com/openstack/nova/commit/294de0ec6c6b2f5d39dcdc3687a42095fca01288
Submitter: Jenkins
Branch: master

commit 294de0ec6c6b2f5d39dcdc3687a42095fca01288
Author: Thierry Carrez <email address hidden>
Date: Mon Jun 11 11:37:07 2012 +0200

    Do not attempt to kill already-dead dnsmasq

    Check that the dnsmasq process is running (and actually looks like a
    dnsmasq process) before attempting to kill it. Fixes bug 1010275.

    Change-Id: Ib49209e1624dfb30470adbe13d7fc045ec1fdf83

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → folsom-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-2 → 2012.2