buildd-manager intermittently trying to make incorrect DB connection
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Launchpad itself | | High | [LEGACY] Canonical WebOps | | |
Bug Description
2010-05-04 10:49:47+0100 [-] Log opened.
2010-05-04 10:49:47+0100 [-] twistd 10.0.0 (/usr/bin/python2.5 2.5.2) starting up.
2010-05-04 10:49:47+0100 [-] reactor class: twisted.
2010-05-04 10:49:47+0100 [-] Starting scanning cycle.
2010-05-04 10:49:47+0100 [-] Scanning failed with: could not connect to server: No such file or directory
2010-05-04 10:49:47+0100 [-] Is the server running locally and accepting
2010-05-04 10:49:47+0100 [-] connections on Unix domain socket "/var/run/
2010-05-04 10:49:47+0100 [-]
2010-05-04 10:49:47+0100 [-] Traceback (most recent call last):
2010-05-04 10:49:47+0100 [-] File "/home/
2010-05-04 10:49:47+0100 [-]
2010-05-04 10:49:47+0100 [-] File "/home/
2010-05-04 10:49:47+0100 [-]
2010-05-04 10:49:47+0100 [-] File "/home/
2010-05-04 10:49:47+0100 [-]
2010-05-04 10:49:47+0100 [-] File "/srv/launchpad
2010-05-04 10:49:47+0100 [-] d = defer.maybeDefe
2010-05-04 10:49:47+0100 [-] --- <exception caught here> ---
2010-05-04 10:49:47+0100 [-] File "/home/
2010-05-04 10:49:47+0100 [-]
2010-05-04 10:49:47+0100 [-] File "/srv/launchpad
2010-05-04 10:49:47+0100 [-] return func(*args, **kwargs)
2010-05-04 10:49:47+0100 [-] File "/srv/launchpad
2010-05-04 10:49:47+0100 [-] _get_sqlobject_
2010-05-04 10:49:47+0100 [-] File "/srv/launchpad
2010-05-04 10:49:47+0100 [-] return getUtility(
2010-05-04 10:49:47+0100 [-] File "/srv/launchpad
2010-05-04 10:49:47+0100 [-] return db_policy.
2010-05-04 10:49:47+0100 [-] File "/srv/launchpad
2010-05-04 10:49:47+0100 [-] store_name, 'launchpad:%s' % store_name)
2010-05-04 10:49:47+0100 [-] File "/home/
2010-05-04 10:49:47+0100 [-]
2010-05-04 10:49:47+0100 [-] File "/home/
2010-05-04 10:49:47+0100 [-]
2010-05-04 10:49:47+0100 [-] File "/home/
2010-05-04 10:49:47+0100 [-]
2010-05-04 10:49:47+0100 [-] File "/home/
2010-05-04 10:49:47+0100 [-]
2010-05-04 10:49:47+0100 [-] File "/home/
2010-05-04 10:49:47+0100 [-]
2010-05-04 10:49:47+0100 [-] File "/srv/launchpad
2010-05-04 10:49:47+0100 [-] raw_connection = super(Launchpad
2010-05-04 10:49:47+0100 [-] File "/home/
2010-05-04 10:49:47+0100 [-]
2010-05-04 10:49:47+0100 [-] psycopg2.
2010-05-04 10:49:47+0100 [-] Is the server running locally and accepting
2010-05-04 10:49:47+0100 [-] connections on Unix domain socket "/var/run/
See:
https:/
https:/
This is a new issue as of 10.04.
I have a tar of the pertinent logfiles available (the files will be rotated, so keeping them around might be useful).
From bigjools and danilo:
- look for "psycopg2.
- specifically it tries to connect to PG via /var/run/
- the only way I know that this can happen is if LPCONFIG is not set, so I guess we also need to look at the startup scripts (the one that starts up the daemons)
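To illustrate the symptom described in the list above: libpq (which psycopg2 wraps) falls back to a Unix domain socket when the connection string names no host, or when the host value is a path beginning with a slash. The sketch below is only an illustration of that rule, not Launchpad code. The function names and the example DSNs are hypothetical, and it deliberately ignores `hostaddr`, the `PGHOST` environment variable, and quoted DSN values.

```python
def parse_dsn(dsn):
    """Split a simple 'key=value key=value' libpq-style DSN into a dict.

    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in dsn.split():
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value
    return params


def transport_for(dsn):
    """Return 'tcp' if the DSN names a network host, else 'unix-socket'.

    Assumption: PGHOST and hostaddr are not in play; only the 'host'
    keyword in the DSN is considered.
    """
    host = parse_dsn(dsn).get("host", "")
    # An empty host, or one that starts with '/', means a local socket.
    if not host or host.startswith("/"):
        return "unix-socket"
    return "tcp"


if __name__ == "__main__":
    # Hypothetical DSNs, for illustration only.
    print(transport_for("dbname=example_db"))
    print(transport_for("dbname=example_db host=db.example.com"))
```

If the wrong (or a default) configuration is loaded and its DSN lacks a host, a check like this would report `unix-socket`, which matches the "connections on Unix domain socket" message in the log.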
Michael Nelson (michael.nelson) wrote : | #1 |
Stuart Bishop (stub) wrote : | #2 |
Which LPCONFIG is being used when this error is triggered?
Does this always happen, or does the system sometimes start up correctly?
Данило Шеган (danilo) wrote : | #3 |
Tom tells me it's LPCONFIG=ftpmaster that's being used. We saw this during the original rollout: it failed at first, and then succeeded when it was restarted.
Gary Poster (gary) wrote : | #4 |
Reminder to self: pertinent logs are devpad:
Gary Poster (gary) wrote : | #5 |
Given the comments from bigjools, danilo, and noodles, LPCONFIG looks like the key. Chex let me look at the script that starts the buildd-manager: https:/
Could LOSAs add the export in start_buildd_
Thank you
Changed in launchpad-foundations: | |
assignee: | nobody → Canonical LOSAs (canonical-losas) |
Michael Nelson (michael.nelson) wrote : | #6 |
Thanks Gary... that certainly makes sense of the situation.
Stuart Bishop (stub) wrote : | #7 |
If it is just a matter of ensuring LPCONFIG is exported correctly, a quick fix would be to create ~/.lpconfig - the contents of this will be used as the configuration name if LPCONFIG environment variable is not set.
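The lookup order described here (environment variable first, then the contents of ~/.lpconfig) could be sketched roughly as follows. This is a minimal illustration under assumptions, not Launchpad's actual config loader: the function name `resolve_config_name` and its parameters are hypothetical, and what happens when neither source is present is left to the caller via `default`.

```python
import os


def resolve_config_name(environ=None, home=None, default=None):
    """Pick a config name: LPCONFIG if set, else the text of ~/.lpconfig.

    Hypothetical helper for illustration; 'environ' and 'home' are
    injectable so the lookup can be exercised without touching the
    real environment or home directory.
    """
    environ = os.environ if environ is None else environ
    name = environ.get("LPCONFIG", "").strip()
    if name:
        return name

    home = os.path.expanduser("~") if home is None else home
    path = os.path.join(home, ".lpconfig")
    try:
        with open(path) as f:
            name = f.read().strip()
    except (IOError, OSError):
        # No readable fallback file: fall through to the default.
        name = ""
    return name or default


if __name__ == "__main__":
    print(resolve_config_name())
```

Logging the resolved name at daemon startup would make it easier to tell from the logs which configuration a given run actually used.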
Tom Haddon (mthaddon) wrote : | #8 |
We've added "export LPCONFIG=
Changed in launchpad-foundations: | |
status: | Triaged → Fix Released |
Just a bit of info from when I tried to look into this on Friday:
On Fri, May 7, 2010 at 5:47 PM, Michael Nelson <email address hidden> wrote:
> So I've been playing with r9336 production stable, and running the
> buildd-manager with a non-local postgres connection (and killing my
> local psql), and I cannot reproduce this without removing the host
> option for the db config.
>
> So far, I cannot see why we are trying to connect to a local postgres
> instance on cesium, or why the issue disappears after a restart, nor
> can I reproduce the issue :/
>
> The possibilities:
> 1) LPCONFIG env. variable was not set during first run - unlikely but possible?
> 2) A change to some of the dbconfig loading options (there were some
> during the 10.04 cycle) introduced some kind of config loading issue -
> very unlikely.
>
> I'd recommend that we ask a losa to re-update cesium to
> production-stable and restart again (and find out exactly what command
> they run to restart it - the wiki only has examples for the old slave
> scanner/buildd-sequencer).
It would have been nice to be able to test the above using the same startup script used in production.
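Without the production startup script, one rough way to explore possibility 1 from the quoted message is to start a child process with and without LPCONFIG in its environment and see what the child observes. The sketch below is an assumption-laden stand-in: the helper name is hypothetical, and the child is a one-line Python script rather than the real buildd-manager.

```python
import os
import subprocess
import sys

# The child just reports what it sees; it stands in for the real daemon.
CHILD = "import os; print(os.environ.get('LPCONFIG', '<unset>'))"


def lpconfig_seen_by_child(env):
    """Run a child interpreter with the given environment and return
    the LPCONFIG value it observes ('<unset>' if the variable is missing)."""
    out = subprocess.check_output([sys.executable, "-c", CHILD], env=env)
    return out.decode().strip()


if __name__ == "__main__":
    # Simulate a launcher that does not export LPCONFIG.
    base = dict(os.environ)
    base.pop("LPCONFIG", None)
    print(lpconfig_seen_by_child(base))

    # Simulate a launcher that exports it explicitly.
    print(lpconfig_seen_by_child(dict(base, LPCONFIG="ftpmaster")))
```

The same comparison against the actual startup script would show whether the variable reaches the daemon on a first run.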