`gpg --batch [...]` error caused by race in BootSourceCacheService

Bug #1376024 reported by Gavin Panella
24
This bug affects 3 people
Affects Status Importance Assigned to Milestone
MAAS
Fix Released
High
Raphaël Badin

Bug Description

Using a self-built 1.7.0~beta4+bzr3127-0ubuntu1 on a NUC.

ERROR 2014-09-30 22:42:17,179 twisted Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 191, in _worker
    result = context.call(ctx, function, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
  File "/usr/lib/python2.7/dist-packages/provisioningserver/utils/twisted.py", line 143, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/bootresources.py", line 901, in _import_resources
    download_all_boot_resources(sources, product_mapping)
  File "/usr/lib/python2.7/dist-packages/maasserver/bootresources.py", line 804, in download_all_boot_resources
    keyring_file=source.get('keyring'))
  File "/usr/lib/python2.7/dist-packages/maasserver/bootresources.py", line 785, in download_boot_resources
    writer.sync(reader, rpath)
  File "/usr/lib/python2.7/dist-packages/simplestreams/mirrors/__init__.py", line 88, in sync
    return self.sync_index(reader, path, data, content)
  File "/usr/lib/python2.7/dist-packages/simplestreams/mirrors/__init__.py", line 240, in sync_index
    self.sync(reader, path=epath)
  File "/usr/lib/python2.7/dist-packages/simplestreams/mirrors/__init__.py", line 82, in sync
    content, payload = reader.read_json(path)
  File "/usr/lib/python2.7/dist-packages/simplestreams/mirrors/__init__.py", line 40, in read_json
    return raw, self.policy(content=raw, path=path)
  File "/usr/lib/python2.7/dist-packages/simplestreams/util.py", line 254, in policy_read_signed
    return read_signed(content=content, keyring=keyring)
  File "/usr/lib/python2.7/dist-packages/simplestreams/util.py", line 271, in read_signed
    raise e
subprocess.CalledProcessError: Command '['gpg', '--batch', '--verify', u'--keyring=/tmp/maas-_Y7SGckeyrings/maas.ubuntu.com-images-ephemeral-v2-releases.gpg', '-']' returned non-zero exit status 2

Tags: boot-images

Related branches

Gavin Panella (allenap)
summary: - `gpg --batch --verify --keyring=/tmp/maas-
- _Y7SGckeyrings/maas.ubuntu.com-images-ephemeral-v2-releases.gpg -`
- returned non-zero exit status 2
+ `gpg --batch --verify --keyring=.../maas.ubuntu.com-images-
+ ephemeral-v2-releases.gpg -` returned non-zero exit status 2
Changed in maas:
assignee: nobody → Blake Rouse (blake-rouse)
tags: added: boot-images
Changed in maas:
milestone: none → 1.7.0
Revision history for this message
Andreas Hasenack (ahasenack) wrote : Re: `gpg --batch --verify --keyring=.../maas.ubuntu.com-images-ephemeral-v2-releases.gpg -` returned non-zero exit status 2

I keep seeing this too in my 1.7.0~beta4+bzr3139-0ubuntu1~trusty1

Revision history for this message
Blake Rouse (blake-rouse) wrote :

The best thing we can do is handle the error better, no way of stopping the error from occurring as the validation is on the simplestreams side.

Changed in maas:
status: Triaged → In Progress
Changed in maas:
status: In Progress → Triaged
importance: Critical → High
assignee: Blake Rouse (blake-rouse) → nobody
milestone: 1.7.0 → next
Revision history for this message
Christian Reis (kiko) wrote :

<smoser> i dont think you're really seeing a invalid gpg signature
<smoser> i suspect download failed or proxy failed or something.
<smoser> race conditions should not be possible
<smoser> its a inline signed file that its verifiying

Revision history for this message
Scott Moser (smoser) wrote :

Heres some information that shoudl be useful:
a.) there is a bug (bug 1342807) where gpg is not thread safe.
     I don't think thats the problem though.
b.) If you do catch this error, the subprocess.CalledProcessError that is raised has stderr and stdout available.
    in download_boot_resources you could definitely catch the CalledProcessError and log its
     e.output[0] and e.output[1].
c.) simplestreams read_signed has a debug log with the stdout and stderr
    so just having python logging configured would get you the stdout /stderr also, and that would likely help in figuring this out.

d.) the simplestreams code mirror code is not threadsafe for multiple threads writing to a single mirror target
     (multiple threads downloading the same file and writing to the same location would not be protected).
     I know i've said this to gavin before.

If you can recreate this failure with any sort of regularity, enable debugging or change maas to log the output and that will help a lot.

Regarding race of the server side data (ie, you downloaded a file that had an invalid signature), that is pretty unlikely
a.) the simplestreams data format was designed for this to not happen. The only file you're checking a signature on is inline signed.
b.) the server side data should not be changing very often.

Revision history for this message
Gavin Panella (allenap) wrote :

I've captured the output from one of these failures:
  gpg: fatal: can't create directory `/home/maas/.gnupg': No such file or directory

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Sounds like GPGHOME is not getting set properly, and it's defaulting back to $HOME/.gnupg. And IIRC the packaging doesn't create a home dir for the maas user.

Christian Reis (kiko)
Changed in maas:
milestone: next → 1.7.1
Revision history for this message
Raphaël Badin (rvb) wrote :

We already have provision in the download code for setting GNUPGHOME:

src/maasserver/bootresources.py
      env = {
            'GNUPGHOME': get_maas_user_gpghome(),
            }
        http_proxy = Config.objects.get_config('http_proxy')
        if http_proxy is not None:
            env['http_proxy'] = http_proxy
            env['https_proxy'] = http_proxy
        with environment_variables(env), tempdir('keyrings') as keyrings_path:
            # download stuff

And get_maas_user_gpghome() should return '/var/lib/maas/gnupg'.

It seems that this value isn't correctly propagated to the gpg call.

Revision history for this message
Raphaël Badin (rvb) wrote :

This problem seems to appear when both the services BootSourceCacheService (the service which cache boot source information, running every hour) and ImportResourcesService (boot resources, running every hour) run at the same time.

Both these services do:
        env = {
            'GNUPGHOME': <home>,
            'http_proxy': <proxy>,
            'https_proxy': <proxy>,
        }
        with environment_variables(env):
            # do stuff

environment_variables() is a context manager that sets and *unsets* the variables upon entry and exit (respectively).

The trick is that, since the variables are per-process, when one service exits the context manager, it *clears* GNUPGHOME for the other service (which is still running).

Possible solutions:
a) Set the GNUPGHOME once and for all at start-up. This will fix the immediate problem we have with GNUPGHOME. But a similar problem exists with 'http_proxy' and it cannot be set at startup because it's dynamic.
b) Have the context manager not clear the env variables when it exits. This means we can still run into conflicts (for instance when 'http_proxy' gets changed) like: have a service run partially with a value of 'http_proxy' and partially with another value of 'http_proxy'. The consequences of this conflict (given the fact that the services in question can be kicked off manually and are run periodically) are, I think, minor.
c) Have the services run the import code inside a different processes. This is probably too big a hammer for the problem we have.

I think we should go with b): it's very low risk and would fix this bug (plus the similar problem we have with 'http_proxy').

Revision history for this message
Christian Reis (kiko) wrote :

I'm curious if it would be possible to ensure that when the proxy is changed that the change is queued and only applied when no current context managers are currently running.

For the moment, in 1.7.1 I agree that doing b fixes the obvious problem.

Revision history for this message
Gavin Panella (allenap) wrote :

I don't think changing the proxy mid-flight is a problem. It won't affect HTTP requests that are already in progress, it'll only affect the next one. It doesn't matter if one HTTP request comes via one proxy (or no proxy) and the next via another proxy (or no proxy), as long as the proxy is valid and/or the network routes correctly.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

(The way to implement option ‘b’ IMHO would be to stop using a context manager for those environment variables and just manipulate the environment, not to patch the context manager.)

I don't suppose we could pass the proxy settings as parameters, e.g. with a change to simplestreams, so we don't need to mess with the environment at runtime?

Revision history for this message
Raphaël Badin (rvb) wrote :

> I don't suppose we could pass the proxy settings as parameters, e.g. with a change to simplestreams, so we don't need to mess with the environment at runtime?

That would be the cleanest way to fix this… but it seems like too big a change to me: a change to the signature of simplestream's methods and a change to how simplestreams works internally.

Raphaël Badin (rvb)
Changed in maas:
assignee: nobody → Raphaël Badin (rvb)
status: Triaged → In Progress
Raphaël Badin (rvb)
Changed in maas:
status: In Progress → Fix Committed
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

For the record: I just had a quick look and we *could* have passed these parameters on the command line, without any signature changes outside MAAS. The gpg error happens inside an overridable callback. We import it from simplestreams, then pass it (or a different one, depending) into simplestreams as the signature verification function.

So something we could do, if the current solution should turn out to be a problem, is write our own version of simplestreams' policy_read_signed (as either a closure or a callable object I guess) and prefix the gpg command line with our variables. That would also allow us to improve its error reporting, by the way — it wouldn't be hard to make it report "signature does not verify" differently from "I was unable to do my job."

Christian Reis (kiko)
summary: - `gpg --batch --verify --keyring=.../maas.ubuntu.com-images-
- ephemeral-v2-releases.gpg -` returned non-zero exit status 2
+ `gpg --batch [...]` error caused by race in BootSourceCacheService
Changed in maas:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.