intermittent auth failure

Bug #740529 reported by Thomi Richards
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
sloecode
Triaged
High
Unassigned

Bug Description

I have just seen an auth failure with the BZR smart server, and then auth success when retrying the same command without changing anything on the client-side. The client-executed commands were:

lee@ubuntu:~/source/trunk$ bzr push sc:elasticmonkey
Permission denied (publickey).
bzr: ERROR: Connection closed: Unexpected end of message. Please check connectivity and permissions, and report a bug if problems persist.
lee@ubuntu:~/source/trunk$ bzr sc-login
farlelr1
lee@ubuntu:~/source/trunk$ bzr push sc:elasticmonkey
Sloecode Bazaar Smart Server
Pushed up to revision 2.

And the relevant section from the twistd.log file are below. Note the two separate authentications, and the error message for the first failed authentication:

Here's the attempt that failed:

2011-03-23 10:24:00+1300 [twisted.protocols.policies.TimeoutFactory] disabling diffie-hellman-group-exchange because we cannot find moduli file
2011-03-23 10:24:00+1300 [TimeoutProtocol,84453,10.102.100.83] kex alg, key alg: diffie-hellman-group1-sha1 ssh-rsa
2011-03-23 10:24:00+1300 [TimeoutProtocol,84453,10.102.100.83] outgoing: aes128-ctr hmac-md5 none
2011-03-23 10:24:00+1300 [TimeoutProtocol,84453,10.102.100.83] incoming: aes128-ctr hmac-md5 none
2011-03-23 10:24:00+1300 [TimeoutProtocol,84453,10.102.100.83] NEW KEYS
2011-03-23 10:24:00+1300 [TimeoutProtocol,84453,10.102.100.83] starting service ssh-userauth
2011-03-23 10:24:00+1300 [SSHService ssh-userauth on TimeoutProtocol,84453,10.102.100.83] farlelr1 trying auth none
2011-03-23 10:24:00+1300 [SSHService ssh-userauth on TimeoutProtocol,84453,10.102.100.83] farlelr1 trying auth publickey
2011-03-23 10:24:00+1300 [SSHService ssh-userauth on TimeoutProtocol,84453,10.102.100.83] [Failure instance: Traceback: <class 'socket.error'>: [Errno 32] Broken pipe
        /home/sloecode/sloecode/sloecode/sshserver/auth.py:246:auth_publickey
        /usr/lib/python2.6/dist-packages/twisted/cred/portal.py:115:login
        /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:125:maybeDeferred
        /usr/lib/python2.6/dist-packages/twisted/conch/checkers.py:92:requestAvatarId
        --- <exception caught here> ---
        /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:125:maybeDeferred
        /home/sloecode/sloecode/sloecode/sshserver/auth.py:267:checkKey
        /home/sloecode/sloecode/sloecode/sshserver/auth.py:113:lookupUserDetails
        /usr/lib/python2.6/xmlrpclib.py:1199:__call__
        /usr/lib/python2.6/xmlrpclib.py:1489:__request
        /usr/lib/python2.6/xmlrpclib.py:1235:request
        /usr/lib/python2.6/xmlrpclib.py:1351:send_content
        /usr/lib/python2.6/httplib.py:759:send
        <string>:1:sendall
        ]
2011-03-23 10:24:00+1300 [SSHService ssh-userauth on TimeoutProtocol,84453,10.102.100.83] farlelr1 failed auth publickey
2011-03-23 10:24:00+1300 [SSHService ssh-userauth on TimeoutProtocol,84453,10.102.100.83] unauthorized login: unable to get avatar id
2011-03-23 10:24:00+1300 [TimeoutProtocol,84453,10.102.100.83] connection lost

And here's the subsequent attempt that succeeded (just a few seconds later):

2011-03-23 10:24:43+1300 [twisted.protocols.policies.TimeoutFactory] disabling diffie-hellman-group-exchange because we cannot find moduli file
2011-03-23 10:24:43+1300 [TimeoutProtocol,84454,10.102.100.83] kex alg, key alg: diffie-hellman-group1-sha1 ssh-rsa
2011-03-23 10:24:43+1300 [TimeoutProtocol,84454,10.102.100.83] outgoing: aes128-ctr hmac-md5 none
2011-03-23 10:24:43+1300 [TimeoutProtocol,84454,10.102.100.83] incoming: aes128-ctr hmac-md5 none
2011-03-23 10:24:43+1300 [TimeoutProtocol,84454,10.102.100.83] NEW KEYS
2011-03-23 10:24:43+1300 [TimeoutProtocol,84454,10.102.100.83] starting service ssh-userauth
2011-03-23 10:24:43+1300 [SSHService ssh-userauth on TimeoutProtocol,84454,10.102.100.83] farlelr1 trying auth none
2011-03-23 10:24:43+1300 [SSHService ssh-userauth on TimeoutProtocol,84454,10.102.100.83] farlelr1 trying auth publickey
2011-03-23 10:24:43+1300 [SSHService ssh-userauth on TimeoutProtocol,84454,10.102.100.83] Key with type ssh-rsa and comment lee@ubuntu matched credentials.
2011-03-23 10:24:43+1300 [SSHService ssh-userauth on TimeoutProtocol,84454,10.102.100.83] farlelr1 trying auth publickey
2011-03-23 10:24:43+1300 [SSHService ssh-userauth on TimeoutProtocol,84454,10.102.100.83] Key with type ssh-rsa and comment lee@ubuntu matched credentials.
2011-03-23 10:24:43+1300 [SSHService ssh-userauth on TimeoutProtocol,84454,10.102.100.83] farlelr1 authenticated with publickey
2011-03-23 10:24:43+1300 [SSHService ssh-userauth on TimeoutProtocol,84454,10.102.100.83] starting service ssh-connection
2011-03-23 10:24:43+1300 [SSHService ssh-connection on TimeoutProtocol,84454,10.102.100.83] got channel session request
2011-03-23 10:24:43+1300 [SSHChannel session (0) on SSHService ssh-connection on TimeoutProtocol,84454,10.102.100.83] channel open
2011-03-23 10:24:43+1300 [SSHChannel session (0) on SSHService ssh-connection on TimeoutProtocol,84454,10.102.100.83] unhandled request for env
2011-03-23 10:24:43+1300 [SSHChannel session (0) on SSHService ssh-connection on TimeoutProtocol,84454,10.102.100.83] executing command "bzr serve --inet --directory=/ --allow-writes"
2011-03-23 10:24:43+1300 [SSHChannel session (0) on SSHService ssh-connection on TimeoutProtocol,84454,10.102.100.83] Running: 'bzr', ['bzr', 'sc-serve', '--inet', 'farlelr1']

Tags: smartserver
Revision history for this message
Thomi Richards (thomir-deactivatedaccount) wrote :

I've now seen this twice. It seems that after a period of inactivity the auth will fail, but once it's been "woken up" it will work consistently. At a first glance it seems to be failing in the XMLRPC code when we attempt to get user auth details from the web app.

description: updated
Changed in sloecode:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Thomi Richards (thomir)
Revision history for this message
Thomi Richards (thomir-deactivatedaccount) wrote :

...now with the corresponding pylons web-app logs.

Here's the request that failed:

10:24:00,222 WARNI [paste.httpserver.ThreadPool] [MainThread] Thread 140540316600064 hung (working on task for 150193 seconds)
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 49385)
Traceback (most recent call last):
  File "/usr/lib/python2.6/SocketServer.py", line 283, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/pymodules/python2.6/paste/httpserver.py", line 1052, in process_request
    lambda: self.process_request_in_thread(request, client_address))
  File "/usr/lib/pymodules/python2.6/paste/httpserver.py", line 632, in add_task
    self.kill_hung_threads()
  File "/usr/lib/pymodules/python2.6/paste/httpserver.py", line 793, in kill_hung_threads
    self.kill_worker(worker.thread_id)
  File "/usr/lib/pymodules/python2.6/paste/httpserver.py", line 720, in kill_worker
    killthread.async_raise(thread_id, SystemExit)
  File "/usr/lib/pymodules/python2.6/paste/util/killthread.py", line 25, in async_raise
    raise ValueError("invalid thread id")
ValueError: invalid thread id
----------------------------------------
10:24:00,235 DEBUG [routes.middleware] [worker 10] Matched POST /xmlrpc
10:24:00,235 DEBUG [routes.middleware] [worker 10] Route path: '/xmlrpc', defaults: {'action': u'index', 'controller': u'xmlrpc'}
10:24:00,235 DEBUG [routes.middleware] [worker 10] Match dict: {'action': u'index', 'controller': u'xmlrpc'}

Revision history for this message
Thomi Richards (thomir-deactivatedaccount) wrote :

Since adding extra logging I cannot reproduce. Annoying...

Revision history for this message
Thomi Richards (thomir-deactivatedaccount) wrote :

Cannot Reproduce

Changed in sloecode:
status: Confirmed → Triaged
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.