asynchronous connection failed in postgresql jobs

Bug #1338841 reported by Matt Riedemann
This bug affects 6 people
Affects                   | Status       | Importance | Assigned to    | Milestone
Cinder                    | Invalid      | Undecided  | Unassigned     |
OpenStack Compute (nova)  | Invalid      | High       | Unassigned     |
devstack                  | Fix Released | Undecided  | Matt Riedemann |

Bug Description

The trace for the failure is here:

http://logs.openstack.org/57/105257/4/check/check-tempest-dsvm-postgres-full/f72b818/logs/tempest.txt.gz?level=TRACE#_2014-07-07_23_43_37_250

This is the console error:

2014-07-07 23:44:59.590 | tearDownClass (tempest.thirdparty.boto.test_ec2_keys.EC2KeysTest)
2014-07-07 23:44:59.590 | -----------------------------------------------------------------
2014-07-07 23:44:59.590 |
2014-07-07 23:44:59.590 | Captured traceback:
2014-07-07 23:44:59.590 | ~~~~~~~~~~~~~~~~~~~
2014-07-07 23:44:59.590 | Traceback (most recent call last):
2014-07-07 23:44:59.590 | File "tempest/thirdparty/boto/test.py", line 272, in tearDownClass
2014-07-07 23:44:59.590 | raise exceptions.TearDownException(num=fail_count)
2014-07-07 23:44:59.590 | TearDownException: 1 cleanUp operation failed

There isn't much in the n-api logs, just the 400 response.
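
For context, the "asynchronous connection failed" in the summary surfaces as a psycopg2 OperationalError in the nova-api logs (see the logstash query in the comments below). A minimal sketch of the suspected failure mode, assuming a postgres server with the default max_connections=100 and made-up credentials:

    # Open more connections than the server allows; psycopg2 raises
    # OperationalError once the limit is hit ("FATAL: sorry, too many
    # clients already" on a plain synchronous connection).
    import psycopg2

    conns = []
    try:
        for i in range(200):
            conns.append(psycopg2.connect(
                host="localhost", dbname="postgres",
                user="postgres", password="secret"))  # credentials are assumptions
    except psycopg2.OperationalError as exc:
        print("connection %d failed: %s" % (len(conns) + 1, exc))
    finally:
        for c in conns:
            c.close()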

Revision history for this message
Matt Riedemann (mriedem) wrote :

There was a new boto package uploaded to PyPI on 7/1, but it seems we started seeing the keypair failures later than that.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Logstash query:

message:"Unexpected OperationalError raised: (OperationalError) asynchronous connection failed" AND tags:"screen-n-api.txt"

That query shows up a lot more now, in both the check and gate queues, across multiple changes, and all of the hits are failures.

Interestingly, it only shows up in the postgres jobs.
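
The same query can be run programmatically; a rough sketch, assuming the elasticsearch-py client and that the logstash cluster is reachable at the endpoint shown (both are assumptions):

    # Count hits for the query above against an Elasticsearch/Logstash index.
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://logstash.openstack.org:9200"])  # assumed endpoint
    query = {
        "query": {
            "query_string": {
                "query": ('message:"Unexpected OperationalError raised: '
                          '(OperationalError) asynchronous connection failed" '
                          'AND tags:"screen-n-api.txt"')
            }
        }
    }
    result = es.search(index="logstash-*", body=query, size=0)
    print("matching hits:", result["hits"]["total"])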

Changed in nova:
importance: Undecided → High
Revision history for this message
Matt Riedemann (mriedem) wrote :

The postgres jobs are the only ones running the nova-api-meta service, so that might have something to do with this as well.

Revision history for this message
Matt Riedemann (mriedem) wrote :

The "asynchronous connection failed" error doesn't show up before 7/8 in logstash.

tags: added: postgresql
Revision history for this message
Matt Riedemann (mriedem) wrote :

Looks like we're running with a really old version of python-psycopg2 (the PostgreSQL driver):

ii python-psycopg2 2.4.5-1build5 amd64 Python module for PostgreSQL
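
A quick way to double-check which driver version the services are actually importing (a trivial sketch run on the gate node; the dpkg line above only shows the packaged version):

    # Print the psycopg2 version visible to the Python interpreter.
    import psycopg2

    print(psycopg2.__version__)  # e.g. "2.4.5 (dt dec pq3 ext)"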

Revision history for this message
Matt Riedemann (mriedem) wrote :

That might trace back to this change:

https://review.openstack.org/#/c/103206/

That merged on 7/8, which is when I first saw the concurrency issue while that patch was going through the gate (comment 4 above). Now it's hitting everywhere, so that change probably triggered the connection failures in postgresql.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Multiple trove workers also merged on 7/8:

https://review.openstack.org/#/c/103239/

And the change making the glance api/registry workers default to the number of CPUs merged on 7/3:

https://review.openstack.org/102665

So maybe the cinder API workers change was just the one that tipped it over.
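
A back-of-the-envelope sketch of why multiple workers would exhaust connections. The per-service worker counts below are illustrative assumptions (one worker per CPU on an 8-vCPU node); pool_size=5 and max_overflow=10 are the SQLAlchemy QueuePool defaults:

    # Worst case: each worker process can open pool_size + max_overflow
    # connections, so five services with eight workers each can ask for far
    # more than postgresql's default max_connections=100.
    POOL_SIZE = 5
    MAX_OVERFLOW = 10
    per_worker_max = POOL_SIZE + MAX_OVERFLOW

    workers = {"nova-api": 8, "cinder-api": 8, "glance-api": 8,
               "glance-registry": 8, "trove-api": 8}

    total = sum(n * per_worker_max for n in workers.values())
    print(total)  # 600 potential connections vs. a limit of 100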

summary: - EC2KeysTest fails in tearDownClass with InvalidKeyPair.Duplicate
+ asynchronous connection failed in postgresql jobs
Revision history for this message
Matt Riedemann (mriedem) wrote :

We could try playing with max_connections, but that seems iffy.

There are also synchronous_commit=off and fsync=off, which according to the docs provide a performance benefit when you don't need to worry about losing data, and we don't in the gate since the VMs are throw-away.
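
A rough sketch for inspecting those knobs on the gate node's postgres; the connection parameters are assumptions and would need to match the devstack database credentials:

    # Show the current values of the settings discussed above.
    import psycopg2

    conn = psycopg2.connect(host="localhost", dbname="postgres",
                            user="root", password="secretdatabase")  # assumed credentials
    cur = conn.cursor()
    for setting in ("max_connections", "synchronous_commit", "fsync"):
        cur.execute("SHOW " + setting)  # SHOW does not accept bind parameters
        print(setting, "=", cur.fetchone()[0])
    conn.close()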

Revision history for this message
Matt Riedemann (mriedem) wrote :

Consensus is to try setting max_connections=200, given that the postgresql default is 100 and the mysql default is 151:

http://dev.mysql.com/doc/refman/5.5/en/too-many-connections.html
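
To see how close a run actually gets to the limit, something like the following could be used (again, the connection parameters are assumptions):

    # Compare live backends against the configured connection limit.
    import psycopg2

    conn = psycopg2.connect(host="localhost", dbname="postgres",
                            user="root", password="secretdatabase")  # assumed credentials
    cur = conn.cursor()
    cur.execute("SELECT count(*) FROM pg_stat_activity")
    in_use = cur.fetchone()[0]
    cur.execute("SHOW max_connections")
    limit = int(cur.fetchone()[0])
    print("connections in use: %d / %d" % (in_use, limit))
    conn.close()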

Changed in cinder:
status: New → Invalid
Changed in devstack:
status: New → In Progress
Changed in nova:
status: New → Invalid
Changed in devstack:
assignee: nobody → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (master)

Reviewed: https://review.openstack.org/105854
Committed: https://git.openstack.org/cgit/openstack-dev/devstack/commit/?id=94c654ef37f6a0247a307578f3240f97201a3cba
Submitter: Jenkins
Branch: master

commit 94c654ef37f6a0247a307578f3240f97201a3cba
Author: Matt Riedemann <email address hidden>
Date: Wed Jul 9 12:38:36 2014 -0700

    Set postgresql max_connections=200

    Now that we have multiple workers running by default
    in various projects (nova/cinder/glance/trove), the
    postgresql job is failing intermittently with connection
    failures to the database.

    The default max_connections for postgresql is 100 so here
    we double that.

    Note that the default max_connections for mysql used to
    be 100 but is now 151, so this change brings the postgresql
    configuration more in line with mysql.

    Change-Id: I2fcae8184a82e303103795a7bf57c723e27190c9
    Closes-Bug: #1338841

Changed in devstack:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/121952

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on devstack (stable/icehouse)

Change abandoned by Sean Dague (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/121952
