Librarian: psycopg2.OperationalError: could not send data to server: Connection timed out

Bug #1035112 reported by David Ames on 2012-08-09
20
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Critical
Unassigned
Storm
Low
Sidnei da Silva

Bug Description

 smagoun1 began getting intermittent 'Processing failed' errors trying to load private attachments from the librarian e.g.
https://i100944525.restricted.launchpadlibrarian.net/100944525/blacklist_xps13z.patch?token=f86fa57da3d51ceca33c6bb8303e3d7a

Found these erros in the librarian logs:

2012-08-09 21:05:40+0000 [-] Unhandled error in Deferred:
2012-08-09 21:05:40+0000 [-] Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib/python2.6/threading.py", line 504, in __bootstrap
            self.__bootstrap_inner()
          File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
            self.run()
          File "/usr/lib/python2.6/threading.py", line 484, in run
            self.__target(*self.__args, **self.__kwargs)
        --- <exception caught here> ---
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/eggs/Twisted-11.1.0-py2.6-linux-x86_64.egg/twisted/python/threadpool.py", line 207, in _worker
            result = context.call(ctx, function, *args, **kwargs)
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/eggs/Twisted-11.1.0-py2.6-linux-x86_64.egg/twisted/python/context.py", line 118, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/eggs/Twisted-11.1.0-py2.6-linux-x86_64.egg/twisted/python/context.py", line 81, in callWithContext
            return func(*args,**kw)
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/lib/lp/services/database/__init__.py", line 37, in retry_transaction_decorator
            return func(*args, **kwargs)
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/lib/lp/services/database/sqlbase.py", line 555, in reset_store_decorator
            return func(*args, **kwargs)
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/lib/lp/services/database/__init__.py", line 73, in write_transaction_decorator
            ret = func(*args, **kwargs)
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/lib/lp/services/librarianserver/web.py", line 129, in _getFileAlias
            alias = self.storage.getFileAlias(aliasID, token, path)
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/lib/lp/services/librarianserver/storage.py", line 82, in getFileAlias
            return self.library.getAlias(aliasid, token, path)
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/lib/lp/services/librarianserver/db.py", line 60, in getAlias
            TimeLimitedToken.path==path).is_empty()
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/eggs/storm-0.19.0.99_lpwithnodatetime_r406-py2.6-linux-x86_64.egg/storm/store.py", line 1077, in is_empty
            result = self._store._connection.execute(select)
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/eggs/storm-0.19.0.99_lpwithnodatetime_r406-py2.6-linux-x86_64.egg/storm/databases/postgres.py", line 266, in execute
            return Connection.execute(self, statement, params, noresult)
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/eggs/storm-0.19.0.99_lpwithnodatetime_r406-py2.6-linux-x86_64.egg/storm/database.py", line 238, in execute
            raw_cursor = self.raw_execute(statement, params)
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/eggs/storm-0.19.0.99_lpwithnodatetime_r406-py2.6-linux-x86_64.egg/storm/databases/postgres.py", line 276, in raw_execute
            return Connection.raw_execute(self, statement, params)
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/eggs/storm-0.19.0.99_lpwithnodatetime_r406-py2.6-linux-x86_64.egg/storm/database.py", line 322, in raw_execute
            self._check_disconnect(raw_cursor.execute, *args)
          File "/srv/launchpadlibrarian.net/production/launchpad5-rev-15654/eggs/storm-0.19.0.99_lpwithnodatetime_r406-py2.6-linux-x86_64.egg/storm/database.py", line 371, in _check_disconnect
            return function(*args, **kwargs)
        psycopg2.OperationalError: could not send data to server: Connection timed out

lifeless was able to reproduce and began stress testing

We ruled out pgbouncer and postgres

After restarting the librarians the error did not return even under stress test.

Related branches

David Ames (thedac) on 2012-08-09
tags: added: canonical-losa-lp
Robert Collins (lifeless) wrote :

I suspect that we suffer a network 'glitch' for some other unidentified reason. Then the connection is bad, and we're not actively closing it, rather trying to reset it, or some such thing.

Changed in launchpad:
status: New → Triaged
importance: Undecided → Critical
tags: added: oops
Robert Collins (lifeless) wrote :

(note this doesn't strictly speaking oops, but its the codepath that normally generates oopses, so same triage rule IMNSHO).

Sidnei da Silva (sidnei) wrote :

Fixed in storm by treating this kind of error as DisconnectionError, you might want to pull that patch.

Changed in storm:
status: New → Fix Committed
importance: Undecided → Low
assignee: nobody → Sidnei da Silva (sidnei)
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers