[SRU] ERROR Nonce already used

Bug #1190986 reported by Konrad Meier
62
This bug affects 7 people
Affects Status Importance Assigned to Milestone
MAAS
Fix Released
High
Julian Edwards
1.5
Fix Released
High
Julian Edwards
1.6
Fix Released
High
Julian Edwards
juju-core
Fix Released
High
Ian Booth
juju (Ubuntu)
Confirmed
Undecided
Unassigned
Saucy
Won't Fix
Undecided
Unassigned
Trusty
Confirmed
Undecided
Unassigned
maas (Ubuntu)
Fix Released
Medium
Unassigned
Saucy
Won't Fix
Medium
Unassigned
Trusty
Fix Released
High
Unassigned

Bug Description

SRU for secondary fix (branch lp:~julian-edwards/maas/nonce-fix-bug-1190986)
[Impact]
 * Minimize possibility of hitting a nonce collision by using a longer nonce (uuid4) in both python-maas-client and commissioning scripts.

[Testcase]
 * Do some api calls and make sure that the piston_nonce postgres table is using the larger uuid nonce for the calls

[Regression Potential]
 * Minimal; replace random string of length 8 with uuidgen.

The error below happens sometimes. (~5% of all requests fail)

System: Ubuntu 12.04.2
MAAS-version: 1.2+bzr1373+dfsg-0ubuntu1~12.04.1
Juju-version: 0.7+bzr628+bzr631~precise1

$ juju status
2013-06-14 13:59:36,213 INFO Connecting to environment...
2013-06-14 13:59:36,295 ERROR Cannot connect to environment: Nonce already used: 55941407
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/juju/providers/common/connect.py", line 43, in run
    client = yield self._internal_connect(share)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1037, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python2.7/dist-packages/twisted/python/failure.py", line 382, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/juju/providers/common/connect.py", line 59, in _internal_connect
    candidates = yield self._provider.get_zookeeper_machines()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1037, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python2.7/dist-packages/twisted/python/failure.py", line 382, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/juju/providers/common/findzookeepers.py", line 32, in find_zookeepers
    machine = yield provider.get_machine(instance_id)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1037, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python2.7/dist-packages/twisted/python/failure.py", line 382, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/juju/providers/maas/provider.py", line 125, in get_machines
    instances = yield self.maas_client.get_nodes(instance_ids)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/juju/providers/maas/maas.py", line 95, in _process_error
    raise ProviderError(error.response)
ProviderError: Nonce already used: 55941407
2013-06-14 13:59:36,297 ERROR Nonce already used: 55941407

I think the problem is caused by the MAAS-database table "piston_nonce"
In my system this table contains a hughe amount of tokens that are never deleted:
maasdb=# SELECT count(id) FROM piston_nonce;
  count
---------
 3619099
(1 row)

Related branches

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Can you attach all the maas logs under /var/log/maas please.

Thanks.

Changed in maas:
status: New → Incomplete
Revision history for this message
Konrad Meier (konrad-meier) wrote :

Requested logfiles in attachment.

Revision history for this message
Raphaël Badin (rvb) wrote :

Looks like piston is not cleaning up the nonce table (see https://bitbucket.org/jespern/django-piston/issue/187/too-many-nonces).

Revision history for this message
Konrad Meier (konrad-meier) wrote :

@Raphaël: Yes I think that is the problem.
In my system the "piston_nonce" table is growing fast.
Right now I have 4016258 nonces for a single token_key.

I am not sure whose responsibility it is to clean that table.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

For reference, the django-social-auth project added a "clean_nonces" command to take of their own manifestation of this problem in December 2012: https://github.com/omab/django-social-auth/issues/410 — but unfortunately that particular solution seems to be based on a field Nonce.timestamp. I'm not seeing that field even in piston upstream.

Revision history for this message
Konrad Meier (konrad-meier) wrote :

As a workaround I added a cronjob to clean the table every 5 min:
*/5 * * * * su -c 'psql -d maasdb -c "DELETE FROM piston_nonce WHERE id < (SELECT max(id)-1000 FROM piston_nonce);"' postgres >> /var/log/maas/nonce_cron.log
Since there is no timestamp in the table I keep the last 1k entries.

Changed in maas:
status: Incomplete → Triaged
importance: Undecided → Low
David Britton (dpb)
tags: added: theme-oil
Revision history for this message
Raphaël Badin (rvb) wrote :

There is no 'timestamp' column in the Nonce table so fixing this is not as easy as cleaning up the old nonces with a simple SQL query.

But I think I've got a solution that is both simple and doable without performing brain surgery on piston:
Let timestamp_threshold be the duration for which a nonce is valid (the value is in piston's codebase)
Every 5 minutes, run a script that does this:
  1. create a (fake) Nonce with token_key='', consumer_key='' and key=time.time()
  2.a. find the most recent Nonce (sort by id and take the nonce with the biggest id) with token_key='', consumer_key='' and key < time.time() - timestamp_threshold
  2.b. delete all the nonces which are older than the nonce fetched in 2.a., if any.

The idea is basically to create "checkpoint" nonces and then use them to clean up old nonces. The "checkpoint" nonces won't point to a customer so won't be considered by the oauth machinery. The Nonce.key field is a charfield with 255 which should be plenty to store a timestamp (storing a timestamp in a charfield is a bit hacky, I give you that, but it should work).

Gavin Panella (allenap)
Changed in maas:
importance: Low → High
Raphaël Badin (rvb)
Changed in maas:
assignee: nobody → Raphaël Badin (rvb)
status: Triaged → In Progress
Raphaël Badin (rvb)
Changed in maas:
status: In Progress → Fix Committed
James Page (james-page)
Changed in maas (Ubuntu Saucy):
status: New → Triaged
importance: Undecided → Medium
Changed in maas:
milestone: none → 14.04
Revision history for this message
Raphaël Badin (rvb) wrote :

If you're using a earlier version of MAAS that doesn't contain the fix yet, it's easy to reproduce the fix:
Save the script from http://paste.ubuntu.com/6762313/ into nonces_cleanup.py and then run it using cron, every 5 minutes:
cat nonces_cleanup.py | sudo maas shell

Note that the cleanup will not happen the first time you run the script, only the second time (and all the other times after that); Read the script if you want to know why ;).

Changed in maas:
status: Fix Committed → Fix Released
Revision history for this message
David Britton (dpb) wrote :

Got another instance maybe? maas package version: 1.6.1+bzr2548-0ubuntu1~ppa1

ERROR 2014-08-19 20:01:18,871 maasserver ################################ Exception: 'Nonce already used: 48723931' ################################
ERROR 2014-08-19 20:01:18,871 maasserver Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python2.7/dist-packages/django/views/decorators/vary.py", line 19, in inner_func
    response = func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/piston/resource.py", line 128, in __call__
    actor, anonymous = self.authenticate(request, rm)
  File "/usr/lib/python2.7/dist-packages/maasserver/api_support.py", line 67, in authenticate
    RestrictedResource, self).authenticate(request, rm)
  File "/usr/lib/python2.7/dist-packages/piston/resource.py", line 103, in authenticate
    if not authenticator.is_authenticated(request):
  File "/usr/lib/python2.7/dist-packages/maasserver/api_auth.py", line 60, in is_authenticated
    raise OAuthUnauthorized(error)
OAuthUnauthorized: 'Nonce already used: 48723931'

Revision history for this message
David Britton (dpb) wrote :

comment #9 was while running juju bootstrap -v

Revision history for this message
Greg Lutostanski (lutostag) wrote :

Since it was hit while running juju I'm inclined to believe that it is because http://bazaar.launchpad.net/~juju/gomaasapi/trunk/view/head:/oauth.go is still using the old nonce generation method ([0-9]{8} rather than uuid) which is much more collision prone.

Revision history for this message
Andres Rodriguez (andreserl) wrote :
Ian Booth (wallyworld)
Changed in juju-core:
milestone: none → 1.21-alpha1
milestone: 1.21-alpha1 → 1.20.6
importance: Undecided → High
status: New → Triaged
Ian Booth (wallyworld)
Changed in juju-core:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Ian Booth (wallyworld)
Changed in juju-core:
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in juju (Ubuntu Saucy):
status: New → Confirmed
Changed in juju (Ubuntu):
status: New → Confirmed
Aaron Bentley (abentley)
Changed in juju-core:
status: Fix Committed → Fix Released
description: updated
summary: - ERROR Nonce already used
+ [SRU] ERROR Nonce already used
Changed in maas (Ubuntu Trusty):
importance: Undecided → High
milestone: none → trusty-updates
status: New → Fix Committed
Chris J Arges (arges)
Changed in maas (Ubuntu):
status: Triaged → Fix Released
Changed in maas (Ubuntu Saucy):
status: Triaged → Won't Fix
Revision history for this message
Chris J Arges (arges) wrote : Please test proposed package

Hello Konrad, or anyone else affected,

Accepted maas into trusty-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/maas/1.5.4+bzr2294-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in juju (Ubuntu Trusty):
status: New → Confirmed
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Been tested and issue fixed.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package maas - 1.5.4+bzr2294-0ubuntu1.1

---------------
maas (1.5.4+bzr2294-0ubuntu1.1) trusty-proposed; urgency=medium

  * Add hardware enablement for armhf/keystone (LP: #1350103)
 -- Greg Lutostanski <email address hidden> Thu, 18 Sep 2014 16:43:56 -0500

Changed in maas (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Rolf Leggewie (r0lf) wrote :

saucy has seen the end of its life and is no longer receiving any updates. Marking the saucy task for this ticket as "Won't Fix".

Changed in juju (Ubuntu Saucy):
status: Confirmed → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.