fti.py wants to lock all replication sets

Bug #435674 reported by Stuart Bishop
Affects: Launchpad itself
Status: Won't Fix
Importance: High
Assigned to: Stuart Bishop

Bug Description

fti.py is attempting to lock all replication sets when it is run. This is bad, as it will either deadlock with the SSO servers and fail, or succeed and block the SSO servers from running while it updates the fti indexes.

Revision history for this message
Stuart Bishop (stub) wrote :

postgres@wildcherry:~/launchpad/database/schema$ ./fti.py

2009-09-23 23:06:20 INFO No need to rebuild full text index on archive
2009-09-23 23:06:20 INFO No need to rebuild full text index on bug
2009-09-23 23:06:20 INFO No need to rebuild full text index on bugtask
2009-09-23 23:06:20 INFO No need to rebuild full text index on binarypackagerelease
2009-09-23 23:06:20 INFO No need to rebuild full text index on cve
2009-09-23 23:06:20 INFO No need to rebuild full text index on distribution
2009-09-23 23:06:20 INFO No need to rebuild full text index on distributionsourcepackagecache
2009-09-23 23:06:20 INFO No need to rebuild full text index on distroseriespackagecache
2009-09-23 23:06:20 INFO No need to rebuild full text index on faq
2009-09-23 23:06:20 INFO No need to rebuild full text index on message
2009-09-23 23:06:20 INFO No need to rebuild full text index on messagechunk
2009-09-23 23:06:20 INFO No need to rebuild full text index on person
2009-09-23 23:06:20 INFO No need to rebuild full text index on product
2009-09-23 23:06:20 INFO No need to rebuild full text index on productreleasefile
2009-09-23 23:06:20 INFO No need to rebuild full text index on project
2009-09-23 23:06:20 INFO No need to rebuild full text index on shippingrequest
2009-09-23 23:06:20 INFO No need to rebuild full text index on specification
2009-09-23 23:06:20 INFO No need to rebuild full text index on question
2009-09-23 23:06:20 INFO Executing generated SQL using slonik
/tmp/slonikpX2jNU.sk:32: PGRES_FATAL_ERROR select "_sl".ddlScript_prepare(1, -1); - ERROR: deadlock detected
DETAIL: Process 19773 waits for AccessExclusiveLock on relation 876869 of database 876387; blocked by process 10280.
Process 10280 waits for AccessShareLock on relation 876880 of database 876387; blocked by process 19773.
CONTEXT: SQL statement "lock table "public"."account" in access exclusive mode"
PL/pgSQL function "altertablerestore" line 47 at EXECUTE statement
SQL statement "SELECT "_sl".alterTableRestore(tab_id) from "_sl".sl_table where tab_set in (select set_id from "_sl".sl_set where set_origin = "_sl".getLocalNodeId('_sl'))"
PL/pgSQL function "ddlscript_prepare" line 34 at PERFORM
2009-09-23 23:06:29 ERROR slonik script failed
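The DETAIL lines above describe a textbook lock-ordering cycle: slonik's ddlScript_prepare (process 19773) wants an AccessExclusiveLock on a table an SSO backend (process 10280) already holds, while 10280 wants an AccessShareLock on a table 19773 already holds. PostgreSQL resolves this by walking the wait-for graph and aborting one transaction when it finds a cycle. The sketch below is an illustrative Python model of that check, not PostgreSQL's actual detector; only the PIDs are taken from the log.

```python
def find_cycle(waits_for):
    """Return a list of PIDs forming a cycle in the wait-for graph,
    or None if every chain of waiters eventually terminates.

    waits_for maps a waiting PID to the PID holding the lock it wants.
    """
    for start in waits_for:
        seen = []
        pid = start
        while pid in waits_for:
            if pid in seen:
                # We revisited a PID: everything from its first
                # appearance onward is the deadlock cycle.
                return seen[seen.index(pid):]
            seen.append(pid)
            pid = waits_for[pid]
    return None

# From the log: 19773 (slonik) waits on 10280 (an SSO backend), which
# in turn waits on 19773 -- a cycle, so one transaction must be aborted.
print(find_cycle({19773: 10280, 10280: 19773}))  # -> [19773, 10280]
```

The usual cure, acquiring locks in a consistent order, is not available here: slonik's DDL path locks every replicated table at once, so it inevitably collides with concurrent SSO transactions.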

Stuart Bishop (stub)
Changed in launchpad-foundations:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Stuart Bishop (stub)
milestone: none → 3.1.10
Stuart Bishop (stub) wrote:

"""In Slony-I version 1.0, this would only lock the tables in the specified replication set. As of 1.1, all replicated tables are locked (e.g. - triggers are removed at the start, and restored at the end). This deals with the risk that one might request DDL changes on tables in multiple replication sets."""

So this is unfortunately expected behavior and throws a spanner into our upgrade procedures.

Stuart Bishop (stub) wrote:

I believe this behavior stops being a problem if we move the master for the auth replication set to a dedicated server that does not have a replica of the lpmain replication set.

In this setup, the authdb tables will get locked on the Launchpad databases (launchpad_prod*). They will not be locked on the standalone database used for read-only mode, since it is not part of the replication cluster. They will not be locked on the authdb replication set master, since there is no lpmain replica there.
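The proposed topology can be modelled with a small sketch. Under the Slony-I 1.1 rule quoted earlier, DDL pushed through a replication set locks the replicated tables on every node carrying that set; a node that does not subscribe to the set (or sits outside the cluster entirely) is untouched. This is a simplified illustrative model, not Slony-I code, and the node and set names below are hypothetical placeholders for the topology described above.

```python
def locked_nodes(subscriptions, ddl_set):
    """Nodes whose replicated tables get locked when a slonik DDL
    script is executed against ddl_set: every node carrying that set.
    Simplified model of the Slony-I 1.1 behaviour quoted above."""
    return {node for node, sets in subscriptions.items() if ddl_set in sets}

# Hypothetical topology for the proposed split (names are illustrative):
topology = {
    "launchpad_prod": {"lpmain", "authdb"},  # main cluster nodes
    "authdb_master": {"authdb"},             # dedicated auth master
    # the standalone read-only database is outside the cluster entirely,
    # so it never appears in the subscription map at all
}

print(sorted(locked_nodes(topology, "lpmain")))  # -> ['launchpad_prod']
```

Because "authdb_master" carries no lpmain replica, lpmain DDL never reaches it, which is exactly the property the proposal is after.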

Francis J. Lacoste (flacoste) wrote: Re: [Bug 435674] Re: fti.py wants to lock all replication sets

On October 7, 2009, Stuart Bishop wrote:
> I believe this behavior stops being a problem if we move the master for
> the auth replication set to a dedicated server that does not have a
> replica of the lpmain replication set.
>
> In this setup, the authdb tables will get locked on the Launchpad
> databases (launchpad_prod*). They will not be locked on the standalone
> database used for read only mode, as it is stand alone. They will not be
> locked on the authdb replication set master as there is no lpmain
> replica there.
>

The plan was to still have a replica of the lpmain replication set on the auth
master DB. Otherwise, we'll need to set up another separate Launchpad replica
for the ShipIt/OpenID servers.

--
Francis J. Lacoste
<email address hidden>

Curtis Hovey (sinzui)
Changed in launchpad-foundations:
milestone: 3.1.10 → 3.1.11
Stuart Bishop (stub) wrote:

The current situation seems to be 'good enough' until the launchpad/auth split is finished early next year.

Changed in launchpad-foundations:
status: Triaged → Won't Fix