Checkbox (Legacy)

Bug #1294007
Comment #8

Comment 8 for bug 1294007

Revision history for this message

Daniel Manrique (roadmr) wrote on 2014-03-19:

I've been doing some quick archaeology here.

Looking at the Job class (checkbox/job.py), it clearly expects environ (second parameter to its constructor) to contain a list of strings indicating environment variables and their values. Each string should contain at least a "=" to separate key and value. It will then split them, add them as keys/values to a process_environ dictionary (which is initialized from os.environ) and pass the completed process_environ to an instance of checkbox.lib.process.Process which handles actually executing the command.

This Process thing eventually does os.execve passing this environment as the method's "env" parameter. So clearly we want the environ we pass while constructing a Job to contain strings of the form "KEY=VALUE" and nothing else.

Now, to see where this environ parameter is built. When executing a job, either the message_info or the backend_info plugins will handle the request to execute it ("message-exec" event). Backend_info handles messages with user: while message_info handles those without user:.

We know that backend_info handles this correctly. This is what it does with the environ value as taken from the job definition:
if "environ" in message:
                #Prepare variables to be "exported" from my environment
                #to the backend's.
                backend_environ = ["%s=%s" % (key, os.environ[key])
                             for key in message["environ"]
                             if key in os.environ]
                message = dict(message) # so as to not wreck the
                                         # original message
                message["environ"] = backend_environ

AH, so this is what we should do: read the desired keys from the invoking environment and rewrite the message's "environ" to contain a list of "K=V" strings. So we receive something like ['USER'] in the original message, but we're expected to create the job with ['USER=roadmr'] and similar. This "mangled" message is then passed to the backend, which simply instantiates Job as follows:

job = Job(message["command"], message.get("environ"),
message.get("timeout"))

By now message.get("environ") will contain the as-desired strings, ready for Job to use.

---

Now, this is what message_info does:

            job = Job(message["command"], message.get("environ"),
                message.get("timeout"))
            status, data, duration = job.execute()

So it's not doing the preprocessing of environ; as initially suspected, it just passes the ['USER'] list unaltered, which will fail for the reasons discussed above.

Strictly speaking, environ: should never be used without user:; the point of environ: is to "pass on" environment variables from the invoking environment, but without user:, the invoking environment is available without any further handling.

One way to fix this is to have the message_info plugin silently drop the message's environ value, so any non-user: jobs will simply ignore it. This should be safe since we don't anticipate new development with this code and the risk of someone instantiating Job with an invalid environ is extremely low.

The second possibility is to have Job itself gracefully handle invalid environ entries; if a separator (=) can't be found in one of the list's elements, then just silently ignore that one, or raise a warning in the log file; this would clue users to the fact that the job is unnecessarily using environ:. I think I prefer this solution as Job is the lowest layer where we can implement this check, a class that can "protect" itself rather than relying on callers feels more robust to me.

I've been doing some quick archaeology here.

We know that backend_info handles this correctly. This is what it does with the environ value as taken from the job definition:
if "environ" in message:
                #Prepare variables to be "exported" from my environment
                #to the backend's.
                backend_environ = ["%s=%s" % (key, os.environ[key])
                             for key in message["environ"]
                             if key in os.environ]
                message = dict(message)  # so as to not wreck the
                                         # original message
                message["environ"] = backend_environ

job = Job(message["command"], message.get("environ"),
                    message.get("timeout"))

By now message.get("environ") will contain the as-desired strings, ready for Job to use.

---

Now, this is what message_info does:

job = Job(message["command"], message.get("environ"),
                message.get("timeout"))
            status, data, duration = job.execute()

So it's not doing the preprocessing of environ; as initially suspected, it just passes the ['USER'] list unaltered, which will fail for the reasons discussed above.